Amazon recently introduced Kinesis. Its marketing material describes it as an elastic service for processing streaming data. That sounds an awful lot like what Storm is, and since I know a thing or two about Storm, I figured I’d dive in and see what Kinesis might have to offer.
As usual, Amazon does a good job marketing the product so that you believe it will solve all your problems. Having been burned by AWS offerings in the past (I’m looking at you, DataPipeline), I was hesitant to get excited. After digging in, though, I found that Kinesis does offer some cool stuff, along with one major drawback.
First, the cool stuff. Kinesis does appear to be elastic and able to scale automatically based on load. This is a great feature to have since, if you’ve ever managed an EC2 cluster before, you know it’s not always easy and straightforward to get this right. Assuming this works as advertised (always iffy at first, but I’m sure eventually it will), this will greatly simplify cluster operations over what you do with a Storm setup.
Kinesis also has a fairly straightforward design that should make it easy to develop for. Doing work on a data stream managed by Kinesis should be as simple as writing some Java code that conforms to a small interface. And unlike with Hadoop batch processing, you don’t need to write MapReduce jobs, so the barrier to entry is lower. Essentially any old procedure you care to write can operate on a stream in parallel using Kinesis.
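To make the "one procedure behind an interface" idea concrete, here is a minimal sketch in plain Java. It does not use the Kinesis client library: `StreamProcedure`, `WordCounter`, and `StreamSketch` are hypothetical names invented for illustration, and a parallel stream over an in-memory list stands in for the service handing records to your code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface for illustration only; this is NOT the Kinesis client API.
interface StreamProcedure {
    void process(byte[] record);
}

// One "old procedure": counts the words in each record it is handed.
class WordCounter implements StreamProcedure {
    private long words = 0;

    // synchronized because records may arrive from several threads at once
    @Override
    public synchronized void process(byte[] record) {
        String text = new String(record, StandardCharsets.UTF_8).trim();
        if (!text.isEmpty()) {
            words += text.split("\\s+").length;
        }
    }

    public synchronized long total() {
        return words;
    }
}

class StreamSketch {
    // Stand-in for the service: hands every record to the procedure in parallel.
    static void run(List<byte[]> records, StreamProcedure procedure) {
        records.parallelStream().forEach(procedure::process);
    }

    public static void main(String[] args) {
        List<byte[]> records = new ArrayList<>();
        records.add("hello stream".getBytes(StandardCharsets.UTF_8));
        records.add("one more record".getBytes(StandardCharsets.UTF_8));

        WordCounter counter = new WordCounter();
        run(records, counter);
        System.out.println(counter.total()); // prints 5
    }
}
```

The point of the sketch is only the shape: you supply one procedure, and something else decides how records get distributed to it.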
The downside, though, is that every Kinesis application consists of just one procedure, so you can’t do the kind of complex stream processing Storm supports unless you connect multiple Kinesis applications together. Naturally, I have some concerns about this.
- Communication costs between Kinesis applications may prevent low-latency complex stream processing.
- There’s no built-in way to roll back partial processing of a record if it fails mid-process.
- More generally, coordination between Kinesis applications is limited.
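The concerns above can be illustrated with a toy model. This is a sketch under loud assumptions: in-memory queues stand in for streams, each call to the hypothetical `runApp` helper stands in for one single-procedure application, and none of it reflects the actual Kinesis API. It shows how a record that fails in the second application leaves the first application's work in place, with nothing to undo it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Function;

// Hypothetical model: each "application" is one procedure that reads one
// stream and writes another. Queues stand in for streams.
class ChainedApps {
    // Drains the input, applies the procedure, and appends results to the output.
    // Returns the records whose processing threw; nothing upstream is undone.
    static <A, B> List<A> runApp(Queue<A> in, Queue<B> out, Function<A, B> procedure) {
        List<A> failed = new ArrayList<>();
        A record;
        while ((record = in.poll()) != null) {
            try {
                out.add(procedure.apply(record));
            } catch (RuntimeException e) {
                failed.add(record);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Queue<String> source = new ArrayDeque<>(List.of("3", "x", "5"));
        Queue<String> middle = new ArrayDeque<>();
        Queue<Integer> sink = new ArrayDeque<>();
        int[] seen = {0}; // side effect of the first application

        // Application 1: normalizes each record and counts it.
        runApp(source, middle, s -> { seen[0]++; return s.toUpperCase(); });
        // Application 2: parses each record; "X" fails here.
        List<String> failed = runApp(middle, sink, Integer::parseInt);

        System.out.println("sink=" + sink + " failed=" + failed + " seen=" + seen[0]);
    }
}
```

With this sample input, the sink ends up holding 3 and 5, "X" is reported as failed by the second application, and the first application has still counted all three records. Any compensation for that mismatch would have to be built by hand across the two applications.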
This suggests to me that Kinesis is, like many AWS offerings, a nice tool for simple workloads. If you need to do something complex, though, Kinesis probably cannot deliver the results you want. For that you will still need Storm.
That being said, it does seem like Kinesis could be used to feed Storm instead of Kafka so long as your stream is fairly simple and you don’t need Kafka’s advanced partitioning features. That is, if you’re just using Kafka or Redis or something else as a queue for Storm to pull from, Kinesis might be able to serve you better depending on how it performs in the real world.
Cross posted from g3’s blog