And then, of course, you needed something to coordinate all of this.
So Chubby came along as a coordination system that would manage all of
the products in this one ecosystem, so that together they
could process these large amounts of structured data seamlessly.
Well, then we can see that Facebook's stack looks very similar.
You have Zookeeper that orchestrates things.
You have HBase, you have Hive and Databee, and
then you have Scribe, which is used for ingestion of large data sets.
Then, if we move on and look at the Yahoo stack,
you can see that they use some of the same components, but they also have something
called Data Highway, along with Oozie, and then HCatalog for the metadata catalog.
So things are still very similar.
They are still used for the same purposes.
However, they might have different names
because they have somewhat different implementations of these tools.
Moving on, we can see that LinkedIn has their own version of this stack.
Again, you can see that there are some of the same components, and
then there are some specific versions of these tools that these organizations
developed on their own.
So you can see that there's a pattern that emerges across all the stacks
that these different organizations use.
And now we come down to Cloudera's distribution for Hadoop,
and we can see which pieces of the system Cloudera has.
So you can see that in Cloudera's distribution for Hadoop, or the
Cloudera stack as we call it, we have Sqoop and Flume for ingestion.
We use HBase for the columnar store.
We have Oozie as the coordination and the workflow engine.
We use Pig and Hive as high-level languages for querying some of the data.
And then we use Zookeeper as a coordination service at the bottom
of this stack.
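
To make the recurring pattern concrete, here is a minimal sketch that writes two of the stacks down as role-to-component mappings and checks which roles they share. The component names and roles come from the description above; the dictionary layout, the role labels, and the function name `shared_roles` are illustrative assumptions, and the Facebook entry lists only the roles that were stated explicitly.

```python
# Roles and component names follow the lecture; the data layout and
# helper name are hypothetical, chosen only for illustration.
CLOUDERA_STACK = {
    "ingestion": ["Sqoop", "Flume"],
    "store": ["HBase"],
    "workflow": ["Oozie"],
    "high-level languages": ["Pig", "Hive"],
    "coordination": ["Zookeeper"],
}

# Only the roles the lecture names explicitly for Facebook are included.
FACEBOOK_STACK = {
    "ingestion": ["Scribe"],
    "coordination": ["Zookeeper"],
}


def shared_roles(stack_a, stack_b):
    """Return the roles that appear in both stacks, sorted by name."""
    return sorted(set(stack_a) & set(stack_b))


for role in shared_roles(CLOUDERA_STACK, FACEBOOK_STACK):
    print(role, CLOUDERA_STACK[role], FACEBOOK_STACK[role])
```

The same role shows up in both stacks, sometimes filled by the same tool (Zookeeper for coordination) and sometimes by a different one (Sqoop and Flume versus Scribe for ingestion).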