Wednesday, April 01, 2015

Federating THE friends table in a Sharded mySQL environment without downtime or users noticing


A friends table is the cornerstone of social applications. Its purpose is to define relationships and help answer the question what are my friends doing.

Here is an example friend’s table:

 CREATE TABLE `friends` (
  `user_id` bigint(20) unsigned NOT NULL,
  `friend_id` bigint(20) unsigned NOT NULL,
  `auto_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`user_id`,`friend_id`),
  KEY `user_id-auto_ts` (`user_id`,`auto_ts`),
  KEY `friend_id-auto_ts` (`friend_id`,`auto_ts`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci


With the table above we can get a list of user_ids a user follows (following), or a list of people who follow said user (followers), or get a list of mutual follows. This is a very simple table structure yet very powerful.

The problem is this table doesn't scale on a single server, when you have millions of users, each user has many friends, all users are semi to deeply connected the table becomes a problem. Mix this with a huge request rate, with lots of concurrency a single server just doesn't scale.

One can replicate the friends table but what starts to cause lag is when many users start adding or removing friends at once. So, how can we distribute this table across many servers holding a small % of the friend graph?

Let's look at the friends table.  It defines whom a user follows and who follows the user ordered by insertion time.

Let's create two tables:

CREATE TABLE `following` (
  `user_id` bigint(20) unsigned NOT NULL DEFAULT '0',
  `friend_id` bigint(20) unsigned NOT NULL DEFAULT '0',
  `mutual` tinyint(3) unsigned NOT NULL DEFAULT '0' COMMENT 'Flag to denote mutual connections',
  `auto_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`user_id`,`friend_id`),
  KEY `user_id-auto_ts` (`user_id`,`auto_ts`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8


CREATE TABLE `followers` (
  `user_id` bigint(20) unsigned NOT NULL DEFAULT '0',
  `friend_id` bigint(20) unsigned NOT NULL DEFAULT '0',
  `mutual` tinyint(3) unsigned NOT NULL DEFAULT '0' COMMENT 'Flag to denote mutual connections',
  `auto_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`user_id`,`friend_id`),
  KEY `friend_id-auto_ts` (`friend_id`,`auto_ts`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8


The 'following' table defines whom a said user follows. The table is federated by the user_id so this table exists on the user_id's shard.

The 'followers' table fines that is following the said user. On every follow instead of writing one row, we now write two rows. One write on the following user's shard. One write on the followed users shard. Thus the followers table is federated by friend_id.

This can be best described by an example on reads:

How many people am I user_id 3306 following?

Connect to my Shard-x, execute the query

SELECT COUNT(*) FROM following WHERE user_id = 3306


How many people are following me (user_id 3306)

Connect to my Shard-x, execute the following query

SELECT COUNT(*) FROM followers WHERE user_id = 3306


Now let's look at a write, of me (user_id:3306) following friend_id:11211

3306 is on Shard-x
11211 is on Shard-y

So, 1st we write to the fact that 3306 is following 11211. We connect to Shard-x and execute the transaction

BEGIN
INSERT INTO following (user_id, friend_id, mutual, auto_ts) VALUES(3306, 11211, 0, NOW());
// DO NOT COMMIT YET


Now connect to Shard-y to write the followers row. If the connection fails rollback the transaction on 3306's Shard-x, otherwise

BEGIN
INSERT INTO followers (user_id, friend_id, mutual, auto_ts) VALUES(3306, 11211, 0, NOW());
if affected rows == 1 (no error)
COMMIT on Shard-x
COMMIT on Shard-y


Now we can answer the main questions.


But what about something like. Give me my friends photos sorted by last upload time 10 at a time?

Well here is the magic sauce. We are going to do a FANOUT reads and hit all the shards, which my friends are on. For my environment this is much better than a FANOUT of writes, since we like to customize in real-time the feed as well as duplicating the data 10000s of times becomes very expensive quickly as servers start turning cold. We can go into this topic a bit more in another post.


Now I execute the query across from friends shards

SELECT p.id FROM photos p JOIN followers f ON(f.friend_id=p.user_id) WHERE f.user_id = 3306 ORDER BY p.id DESC LIMIT 10;

If I have a 1000 friends and 100 shards, each friend has 10 photos I am going to get back 1000 rows.

But the Order is not what I am going to display because I want to display the latest 10 photos. Thus I will need to sort in memory on the application server and take a slice of the results.


But what if I want the 2nd page?
SELECT p.id FROM photos p JOIN followers f ON (f.friend_id=p.user_id) WHERE f.user_id = 3306 p.id < [LAST_ID_FROM_FIRST_PAGE] ORDER BY p.id DESC LIMIT 10

In the application we pass the last_id from the 1st page and execute the same FANOUT on reads again do the same logic and return the photos.

Your questions might be, but isn't this slow because people with large networks will have to hit every shard each time and you have to loop - execute - read on each connection?

This can be mitigated with memory, pipelining and parallel SQL execution.

If you're social graph is like twitter where all active users follows 100K users and the feed doesn't change dynamically writing the data to each shard may be for you. But, again this is out of scope for this post.

What about answering the question mutual connections?

On ever write of a friend relationship, do a select to see if the followed person follows the follower. Then mark the row on both shards as mutual.

For all my personal cases, this distributed friends table solves all my needs. Lots of friend writes from importing friends from say an address book or email or other social network friend graph and a large concurrency is not going to affect me SINCE the table has been removed from a Single Point and is now distributed across many servers.

Reads are fast because only a % of data is on each shard, 90% of the queries hit only that shard for a given user.

Feed type queries are fast because the SQL is executed in parallel if we have to go to the SQL Layer. Most data is cached, reducing the need to FANOUT on reads.

Finally federating without downtime or users notices requires a backfill script and writes to the old friends table as well as writes to the new friend tables. Once this is done, fix all the queries to use the new format. Then sit back and feel good that good work was done :)

No comments: