Last year I started learning Scala and Akka for my then day-job. This post isn’t so much about that learning experience, which is an entire topic in its own right, given the notoriously large learning curve Scala has. Instead I wanted to write about the project I then went ahead and built as part of that learning process, which I’ve called CurioDB — a distributed and persistent Redis clone! Please note that this is a toy project, hence the name “Curio”, and any suitability as a drop-in replacement for Redis is purely incidental.
Firstly though, let’s talk about Scala a little. If you’ve never heard of Scala, it’s an advanced, hybrid functional and object oriented programming language for the JVM, developed by a company called Typesafe. As I mentioned, I found its reputation for being relatively hard to learn well founded. I can only reiterate all the common points brought up when people write about Scala — the type system goes very deep, personally I’ve only scratched the surface of it. On one hand this is a good thing. Working with a technology that takes a very long time to achieve mastery in can serve as a constant source of motivation, given the right attitude.
The language itself is very powerful, with an overwhelming number of advanced features. This power comes at a cost though — the syntax at first seems to contain many ambiguities, and in that regard, it reminds me of a very modern Perl. I imagine that one developer’s Scala may look very different to another’s, and that working with Scala on a team would require a relatively decent amount of discipline to conform to a consistent set of language features and style. On the other hand, that can all be thrown out of the window with a one-person project, as was my case, and as such, learning Scala and developing CurioDB has been a huge amount of fun!
Given only the above, I’d be on the fence as to whether Scala as a language itself is a worthwhile time investment in the long-term toolbox, but when you take Akka into consideration, I’m definitely sold. Akka is a framework, also by Typesafe, that allows you to develop massively distributed systems in a safe and transparent way, using the actor model. This is a weird analogy, but I see Akka as a killer framework for distributed systems in the same way I’ve seen Django as a killer framework for web development over the years — both gave me a profound sense of rapid development, by providing just the right level of abstraction that handles all the nitty-gritty details of their respective domains, allowing you to focus specifically on developing your application logic.
So why build a Redis clone? I’ve used Redis heavily in the past, and Akka gave me some really cool ideas for implementing a clone, based on each key/value pair (or KV pair) in the system being implemented as an actor:
Since each KV pair in the system is an actor, CurioDB will happily use all your CPU cores, so you can run 1 server using 32 cores instead of 32 servers each using 1 core (or use all 1,024 cores of your 32 server cluster, why not). Each actor operates on its own value atomically, so the atomic nature of Redis commands is still present, it just occurs at the individual KV level instead of in the context of an entire running instance of Redis.
Distributed by Default
Since each KV pair in the system is an actor, the interaction between
multiple KV pairs works the same way when they’re located across the
network as it does when they’re located on different processes on a
single machine. This negates the need for features of Redis like “hash
tagging”, and allows commands that deal with multiple keys (
MSET, etc) to operate seamlessly across a cluster.
Since each KV pair in the system is an actor, the unit of disk storage is the individual KV pair, not a single instance’s entire data set. This makes Redis’ abandoned virtual memory feature a lot more feasible. With CurioDB, an actor can simply persist its value to disk after some criteria occurs, and shut itself down until requested again.
Scala is concise, you get a lot done with very little code, but that’s just the start — CurioDB leverages Akka very heavily, taking care of clustering, concurrency, persistence, and a whole lot more. This means the bulk of CurioDB’s code mostly deals with implementing all of the Redis commands, so far weighing in at only a paltry 1,000 lines of Scala! Currently, the majority of commands have been fully implemented, as well as the Redis wire protocol itself, so existing client libraries can be used. Some commands have been purposely omitted where they don’t make sense, such as cluster management, and things specific to Redis’ storage format.
Since Akka Persistence is used for storage, many strange scenarios become available. Want to use PostgreSQL or Cassandra for storage, with CurioDB as the front-end interface for Redis commands? This should be possible! By default, CurioDB uses Akka’s built-in LevelDB storage.
Let’s look at the overall design. Here’s a bad diagram representing one server in the cluster, and the flow of a client sending a command:
Message flow through actors on a single cluster node
Not diagrammed, but in addition to the above:
CurioDB also supports
a HTTP/WebSocket JSON API, as well as the same wire protocol that Redis
implements over TCP. Commands are issued with HTTP POST requests, or
WebSocket messages, containing a JSON Object with a single
containing an Array of arguments. Responses are returned as a JSON
Object with a single
PSUBSCRIBE commands work as expected over WebSockets,
and are also fully supported by the HTTP API, by using chunked transfer
encoding to allow a single HTTP connection to receive a stream of
published messages over an extended period of time.
In the case of errors such as invalid arguments to a command, WebSocket
connections will transmit a JSON Object with a single
containing the error message, while HTTP requests will return a
response with a 400 status, contaning the error message in the response
PUNSUBSCRIBEcommands get broadcast to all of them. This needs rethinking!
Mainly though, Redis is an extremely mature and battle-tested project that’s been developed by many over the years, while CurioDB is a one-man hack project worked on over a few months. As much as this article attempts to compare them, they’re really not comparable in that light.
These are the results of
redis-benchmark -q on an early 2014
MacBook Air running OSX 10.9 (the numbers are requests per second):
Generated with the bundled benchmark.py script.
That’s it so far! I had a lot of fun building CurioDB, and Akka has really blown me away. If nothing else, I hope this project can provide a great showcase for how trivial Akka makes building distributed systems.