for instance, most data centers are running some form of intrusion detection
system, and all of these require real-time analysis of the data that's coming in.
Essentially, real-time analysis means that you need to process very large
amounts of data within a few seconds, with very low latency and
with very high throughput, that is, as much data per second as possible.
And you want to produce some amount of knowledge or
information out of that data that you're getting.
So you want to convert data into knowledge, but fairly quickly.
Even if the data cannot be processed completely,
you still want to be able to glean at least some information out of it.
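
One way to picture this, as a minimal sketch rather than anything a particular
system does, is to look only at a recent window of the incoming data. The
class name SlidingWindowCounter and the 2-second window below are hypothetical,
chosen just for illustration. Each new event gives you an up-to-date count
right away, without waiting for the rest of the data.

```python
from collections import deque


class SlidingWindowCounter:
    """Hypothetical helper: counts events seen in the last `window_seconds`."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self.timestamps = deque()  # event timestamps, oldest first

    def add(self, timestamp: float) -> int:
        """Record one event and return how many events are in the window."""
        self.timestamps.append(timestamp)
        # Drop events that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        return len(self.timestamps)


if __name__ == "__main__":
    counter = SlidingWindowCounter(window_seconds=2.0)
    # Assumed example timestamps (seconds) of suspicious connection attempts.
    for ts in [0.0, 0.5, 1.0, 5.0]:
        print(ts, counter.add(ts))
```

An intrusion detection system could, for example, raise an alert whenever the
returned count crosses some threshold. The threshold would be a tuning choice
and is not shown here.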
So that's the stream processing challenge. Of course, you might think: well,
the MapReduce system is available and we've already used it for
batch computing before, so why not just use it?
And in fact there are variants of MapReduce that are aimed at
stream processing.
But essentially MapReduce is a batch processing framework.
And you need to wait for an entire computation on a large dataset to complete
before you can get the results.
There is no notion of partial results in MapReduce at all.
Even the variants of MapReduce that are tuned to stream processing or
to incremental processing (there are incremental versions of MapReduce as well)
still have fairly high latency.
Essentially, MapReduce is not intended for stream processing applications,
and it's not intended to run as a long-running application,
which is what stream processing really is.
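
To make the batch versus streaming difference concrete, here is a small sketch
in plain Python. It does not use MapReduce or any stream processing system,
and the function names batch_count and streaming_count are made up for this
illustration. The batch version returns nothing until it has consumed the whole
input, while the streaming version hands back partial results as events arrive.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_count(events: Iterable[str]) -> Counter:
    """Batch style: consume the whole dataset, then return one final result."""
    return Counter(events)


def streaming_count(events: Iterable[str], report_every: int = 2) -> Iterator[Counter]:
    """Streaming style: yield a snapshot of the running counts as events arrive."""
    counts: Counter = Counter()
    for i, event in enumerate(events, start=1):
        counts[event] += 1
        if i % report_every == 0:
            # Partial result, available before the input ends.
            yield counts.copy()


if __name__ == "__main__":
    events = ["login", "scan", "login", "scan", "scan"]
    print("batch:", batch_count(events))
    for snapshot in streaming_count(events):
        print("partial:", snapshot)
```

Because streaming_count is a generator, it could in principle be fed an input
that never ends, which is the long-running behavior a batch job does not have.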
So the Storm system has become very popular in the last few years.
It's an Apache project in the Apache incubator.
It's one of the more highly active JVM projects today.
Multiple languages are supported in the API including Python and Ruby.
And Storm as of today is used by many companies including Twitter for
personalization and for search.
So every time you do a Twitter search, you're likely using, indirectly,
the Storm system that Twitter is running in its data centers.
Flipboard uses Storm for generating custom feeds.
And a variety of other companies and websites use Storm as well,
including the Weather Channel, WebMD.com, and others.