I've been prepping for some interviews and brushing up on system design topics. Below are three talks I've found interesting, along with brief descriptions of what I took away from them.
Zuul Push
This talk is about Zuul Push, a push notification system designed and used by Netflix. When I think of push notifications, I think of those annoying messages you get on your phone from apps trying to get you to use them more frequently. Zuul Push is a different sort of push notification: it pushes updated data from Netflix's servers to client devices, for example when your recommendations have changed. Internally it is implemented with WebSockets. I have a special fondness for WebSockets because I once spent a couple of weeks implementing the WebSocket protocol on an embedded device so that it could send updates to a laptop that monitored the device's status. The two things I found most interesting were how they handle long-lived WebSocket connections (essentially by enforcing timeouts and running servers that can each hold tens of thousands of connections simultaneously) and an optimization for TCP TIME_WAIT that I was not aware of.
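For context on why TIME_WAIT matters here: the TIME_WAIT state is held by whichever side initiates the close, so a server that actively tears down tens of thousands of short-lived connections can accumulate a lot of them. One common mitigation, which may or may not be the trick from the talk, is a zero-timeout SO_LINGER, which makes close() send an RST instead of a FIN and skip TIME_WAIT entirely. A minimal sketch:

```python
import socket
import struct

def arm_rst_close(sock: socket.socket) -> None:
    """Configure SO_LINGER with a zero timeout so that close() sends a
    TCP RST instead of a FIN and the socket never enters TIME_WAIT.
    Any unsent data is discarded, so this is only safe once the
    connection has been fully drained."""
    # struct linger { int l_onoff; int l_linger; } packed as two ints
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))

# Example: a server tearing down an idle push connection without
# leaving a TIME_WAIT entry behind.
conn = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
arm_rst_close(conn)
conn.close()
```

The trade-off is that an RST close is abrupt; it's appropriate for idle push connections the server is recycling anyway, not for connections with in-flight data.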
Scaling Memcache at Facebook
As the title implies, this talk is about scaling Memcache at Facebook. Facebook is obviously a complicated webapp, and most user-generated requests, even seemingly small ones, need to process a large amount of data. This becomes hard to handle at Facebook scale without caching. Fortunately, there is roughly a 100:1 read-to-write ratio, so Facebook can cache a lot of data in memcache and return it to users without having to redo DB lookups and computation. The basic idea is simple: just stick some cache in front of the DBs, get data from the cache if it exists, otherwise get it from the DB and fill the cache. The complexity comes from avoiding stale state in the caches and from avoiding network congestion. I was particularly impressed with how they tail the commit log in order to issue deletes to the caches.
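The look-aside pattern described above can be sketched in a few lines. Here a plain dict stands in for memcache and sqlite stands in for the MySQL tier; the function names are my own illustration, not Facebook's actual API. Note the write path invalidates with a delete rather than writing the new value into the cache, so the next read refills from the DB:

```python
import sqlite3

# In-memory stand-ins: sqlite for the database tier, a dict for memcache.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'alice')")
cache = {}

def get_user(user_id):
    key = f"user:{user_id}"
    if key in cache:                      # cache hit: skip the DB entirely
        return cache[key]
    row = db.execute("SELECT name FROM users WHERE id = ?",
                     (user_id,)).fetchone()
    value = row[0] if row else None
    cache[key] = value                    # fill the cache on a miss
    return value

def update_user(user_id, name):
    db.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))
    # Invalidate with a delete, not a set: the next reader refills from
    # the DB, which is simpler to reason about under concurrent writes.
    cache.pop(f"user:{user_id}", None)
```

The commit-log tailing mentioned above fits the same shape: instead of the application issuing the cache delete, a separate process watches the DB's log of committed writes and issues the deletes itself, which catches invalidations the application might miss.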
Tao
The last of the talks I'll cover in this post is on Tao, Facebook's data store for things like posts, comments, friend relations, etc. You will probably appreciate this talk more if you watch the one on Memcache first, because then you will better understand how Tao and Memcache work together. I found it interesting that Tao uses MySQL as the primary datastore, but queries are issued through a NoSQL-style API that goes through Memcache. This made me wonder whether Facebook would choose a SQL DB for their backend if they were doing a from-scratch rewrite.
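To make that split concrete, here is a toy sketch of the idea: storage is relational (sqlite here, MySQL in Tao), but callers see a graph-flavored API of objects and typed associations rather than SQL. The schema and function names are my own illustration, loosely modeled on the object/association calls described in the Tao talk, not its real interface:

```python
import sqlite3

# Relational storage underneath, graph API on top.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, otype TEXT, data TEXT)")
db.execute("CREATE TABLE assocs (id1 INTEGER, atype TEXT, id2 INTEGER, ts INTEGER)")

def obj_add(oid, otype, data):
    """Create a node, e.g. a post or a comment."""
    db.execute("INSERT INTO objects VALUES (?, ?, ?)", (oid, otype, data))

def assoc_add(id1, atype, id2, ts):
    """Create a typed, timestamped edge from id1 to id2."""
    db.execute("INSERT INTO assocs VALUES (?, ?, ?, ?)", (id1, atype, id2, ts))

def assoc_range(id1, atype, limit=10):
    """Newest-first edges of one type from one object --
    e.g. 'the most recent comments on this post'."""
    rows = db.execute(
        "SELECT id2 FROM assocs WHERE id1 = ? AND atype = ? "
        "ORDER BY ts DESC LIMIT ?", (id1, atype, limit))
    return [r[0] for r in rows]

obj_add(1, "post", "hello world")
obj_add(2, "comment", "first!")
obj_add(3, "comment", "nice post")
assoc_add(1, "comment", 2, ts=100)
assoc_add(1, "comment", 3, ts=200)
print(assoc_range(1, "comment"))  # newest comment first: [3, 2]
```

In the real system the cache tier answers these calls most of the time, with MySQL as the durable source of truth behind it, which is why the API can stay narrow even though the storage is SQL.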