Clarifying Terascala’s Approach to Big Compute – and Why It is NOT the Same as the Big Data Challenge

At Terascala, we build high-performance storage appliances that rack up impressive showings in the "numbers game": gigabytes per second of throughput and petabytes of storage. To some people, that sounds a lot like Big Data, one of today's hottest buzzwords, so we often get inquiries from people with Big Data problems asking if we can help. We like to hear from those potential customers, but the honest answer is a definitive "maybe." There are some similarities and overlaps between Big Data and Big Compute, but most of the time they are separate worlds.

When people talk about Big Data, they are usually thinking in terms of MapReduce and Hadoop, which offer a great way to grind through mountains of unstructured and semi-structured data in search of business value. With unstructured data growing rapidly and business insight buried inside it, it's no wonder so many enterprises are looking to Hadoop and similar offerings. But that doesn't mean all large data sets are best processed by Hadoop.

At the risk of oversimplifying the architectures, frameworks like Hadoop let developers group independent nodes that each combine compute and storage. By contrast, Big Compute (often called HPC, for high-performance computing) is usually split into two clusters: a consolidated computing resource (often many servers functioning as one) accessing a very large pool of clustered storage.

These two architectures are best suited to different scenarios. Hadoop excels at breaking a huge data set into subsets and running the same task on each of them. Hadoop problems, in other words, are about taking massive amounts of data and searching through it or building an analysis of it in a divide-and-conquer fashion, as the sketch below illustrates.
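To make the divide-and-conquer pattern concrete, here is a minimal Python sketch (not Hadoop itself, and the chunk contents are purely illustrative): each worker independently counts words in its own slice of the data, and the partial results are merged at the end. This map-then-merge shape is the essence of what MapReduce frameworks do at scale.

# Map step: each worker processes only its local subset of the data.
from collections import Counter
from multiprocessing import Pool

def count_words(chunk):
    return Counter(chunk.split())

if __name__ == "__main__":
    # In a real cluster these chunks would live on separate nodes.
    chunks = ["error warn info", "info info error", "warn error error"]
    with Pool() as pool:
        partial_counts = pool.map(count_words, chunks)
    # Reduce step: merge the independent partial results into one answer.
    total = sum(partial_counts, Counter())
    print(total)  # Counter({'error': 4, 'info': 3, 'warn': 2})

The key property is that no chunk ever needs to see another chunk's data, which is exactly what lets Hadoop-style systems scale out on independent compute-plus-storage nodes.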

By contrast, in the Big Compute world the answer is not "in" the terabytes or petabytes of data; rather, it is the result of running computationally intensive algorithms against the data. For example, a manufacturer simulating the operation of an engine needs to "crunch" massive amounts of data covering combustion, lubrication, heat dissipation, Newtonian forces on pistons and crankshafts, and many other factors, all interrelated. Big Compute must consider the masses of data as a whole rather than parse them for trends and patterns as Big Data does. Indeed, it is almost the opposite activity: in Big Compute, data that already has clear characteristics and patterns must be rendered into something wholly new.
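A toy illustration of why this kind of data resists being split into independent chunks: in the one-dimensional heat-diffusion step below, every cell's new value depends on its neighbors, so each timestep needs the whole array. (This is not an engine simulation; the grid size, timestep count, and coefficient are arbitrary.)

# Each new value couples a cell to both of its neighbors, so the data
# must be treated as a whole at every step, unlike the MapReduce case.
temps = [0.0] * 50
temps[25] = 100.0  # a single hot spot in the middle of the grid

def diffuse(grid, alpha=0.1):
    return [grid[i] + alpha * (grid[i - 1] - 2 * grid[i] + grid[i + 1])
            if 0 < i < len(grid) - 1 else grid[i]
            for i in range(len(grid))]

for _ in range(100):
    temps = diffuse(temps)
print(round(max(temps), 2))  # the hot spot has spread to its neighbors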

Terascala's sweet spot is solving challenges where Big Compute meets Big Data and the goal is simulation, analysis, or modeling. Historically, scale-out NAS has been used to address these needs, but scale-out NAS simply isn't designed to deliver the I/O throughput that ever-growing data sizes and performance demands require. Terascala appliances use Lustre as the parallel file system, allowing us to provide a single large namespace for petabytes of data and to scale linearly to tens of gigabytes per second of reads and writes. The large data sets enable higher-fidelity engineering, and the throughput can cut the time to complete a job from days of computing down to just a few hours.
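For readers curious how a parallel file system reaches those numbers, here is a conceptual Python sketch of striping, the core idea behind Lustre's throughput (the stripe size and target count are illustrative, and this is not Lustre's actual implementation): a large file is cut into fixed-size stripes that are spread round-robin across many storage targets, so a single sequential read or write is served by all of them in parallel.

STRIPE_SIZE = 4 * 1024 * 1024   # 4 MiB stripes (an illustrative choice)
NUM_OSTS = 8                    # storage targets the file is striped over

def ost_for_offset(byte_offset):
    # Stripe N lands on target N mod NUM_OSTS, so consecutive stripes
    # of one file hit different servers and their bandwidth adds up.
    stripe_index = byte_offset // STRIPE_SIZE
    return stripe_index % NUM_OSTS

# The first 32 MiB of the file touch every target exactly once.
print([ost_for_offset(i * STRIPE_SIZE) for i in range(8)])
# [0, 1, 2, 3, 4, 5, 6, 7]

Because aggregate bandwidth grows with the number of storage targets, adding targets is what lets throughput scale toward tens of gigabytes per second.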

So, getting back to Big Data (Hadoop) versus Big Compute (HPC): while Big Data problems are often addressed by building more pathways and lanes, Terascala's approach to Big Compute is to provide one very large superhighway for customer data, reducing or eliminating the latency and uncertainty of traditional connections and speeding the results.

It's what makes us a much-appreciated problem solver for customers like Florida State University, which relies on a Dell | Terascala HPC storage solution to support a wide range of research projects. Our value stack not only delivers the raw power they need, it also simplifies their operations, making life easier for administrators and reducing the time researchers need to solve problems.

For more case studies, visit http://www.terascala.com/resources/.
