In part two of this Multi-Modal Data blog series, you will learn which modality is best for you. Part one of the series covers the differences between the various data warehousing modalities.
Which modality is best?
When is one technology or modality better than another? Are there trade-offs in some that can be mitigated with others? How should you develop a data warehousing strategy that balances current needs against future needs without locking yourself into technology that may be significantly out of date in the near future? Using the framework outlined earlier, we can break these questions down into their component parts for closer study.
All things being equal, traditional data warehouses typically operate at the smallest scale of the modalities. Appliances are closely coupled arrays of traditional database technologies, so they can generate more performance per unit than a stand-alone traditional data warehouse. Hadoop-style architectures are based on commodity hardware and low-overhead file management instead of heavier-overhead database technologies, allowing Hadoop environments to grow massively. Finally, elastic cloud solutions completely decouple hardware configuration from the logical configuration, allowing massive scalability that can outpace all the others.
Traditional database technology can handle most basic data tasks. In addition to basic data tasks, appliances can often execute collections of advanced algorithms for rapid return of predictive scoring and clustering. Hadoop environments, by virtue of being file-based, can handle a wider array of data styles, both structured and unstructured. Elastic cloud solutions vary from traditional database as a service, to massively parallel structured data processing akin to virtual appliances, to full-blown Hadoop-style environments as a service.
Hadoop technologies are mostly based on open-source code bases and communities. As such, while they offer an amazing array of options for any number of innovative data movement, data management, data querying and data storage tasks, teams that own and operate Hadoop environments must be quite sophisticated. Cloud technologies, by virtue of virtualization at scale, can simplify many tasks by making them configuration-only exercises. Appliances can be challenging to keep in tune for optimized performance, but tend to require a narrower band of skills than Hadoop or cloud. Traditional database technologies have been around for decades, and many of the skills required to maintain them are considered to be closer to commodity.
Traditional databases are slower than the other modalities due to the higher-overhead technologies they are built on. Hadoop is dramatically faster than traditional databases for very large workloads, but because each query carries the same fixed overhead regardless of size, it can be comparatively slow for smaller tasks. Elastic cloud technologies have the advantage of being flexible, so scenarios that require more memory for query speed or more distributed computing power for high concurrency can return results at very high rates. Appliances are designed for performance, and while they’re expensive, they can be fine-tuned to provide the highest performance of all modalities.
In summary, traditional database technologies are for smaller-scale, lower-complexity, and lower-performance needs, but have the advantage of not requiring highly sophisticated teams. Appliances handle larger scale and complexity, but need some specialized skill to yield the highest relative performance. Hadoop-style environments are for very large and complex data sets, and can provide high performance for the scale and complexity when managed by a highly skilled team. Finally, elastic cloud can provide the best combination of all four factors.
Why don’t we move everything to the cloud then? Many workloads have already been migrated to the cloud, and as companies are becoming more and more comfortable not maintaining direct physical control of their data, the cost will come down and we’ll likely see hybrid cloud environments dominate the data landscape.
Today the comparative costs favor Hadoop for very large workloads, elastic cloud for medium-volume but high-concurrency uses, appliances for specialized needs, and traditional databases for those not quite ready to explore the broader world of multi-modal data warehousing.
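For readers who like to see a rule of thumb written down, the cost guidance above can be sketched as a small Python function. This is only an illustration: the `Workload` fields, the `suggest_modality` name, and the order in which the conditions are checked are assumptions made for the example, not part of any product or formal methodology.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical, coarse description of a data warehousing workload."""
    very_large: bool = False        # very large data volume
    high_concurrency: bool = False  # medium volume, many simultaneous users
    specialized: bool = False       # specialized, performance-critical needs


def suggest_modality(workload: Workload) -> str:
    """Return the modality today's comparative costs tend to favor.

    The precedence (volume first, then concurrency, then specialization)
    is an assumption made for this sketch.
    """
    if workload.very_large:
        return "Hadoop"
    if workload.high_concurrency:
        return "elastic cloud"
    if workload.specialized:
        return "appliance"
    # Default for teams not yet exploring multi-modal data warehousing.
    return "traditional"


if __name__ == "__main__":
    print(suggest_modality(Workload(high_concurrency=True)))
```

A real decision would weigh scale, complexity, team skill and performance together rather than stopping at the first match, so treat this as a starting point for discussion rather than a selection tool.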