Notes on Dynamo DB

We read Dynamo today, and this is a summary of my own key takeaways. Amazon's Dynamo DB is interesting because it's a highly available and incrementally scaleable key-value store, optimized for Amazon's shopping cart, which requires a 99.9% write availability SLA. In particular, it allows service owners to customize their durability, consistency, and availabilities by configuring their read / write replication factors. Key ideas used and addressed in this paper include:

  • Consistent hashing with virtual nodes
  • Partition schemes for incremental scaleability
  • Merkle Trees for conflict detection
  • Sloppy quorum and hinted handoff for dealing with failures
  • Gossip protocol for detecting membership changes
  • Vector clocks for data versioning

Consistent Hashing with virtual nodes

  • Consistent hashing
    • Each node in the system is assigned a random value within this space which represents its “position” on the ring.
    • Each data item identified by a key is assigned to a node by hashing the data item’s key to yield its position on the ring, and then walking the ring clockwise to find the first node with a position larger than the item’s position.
    • Thus, each node becomes responsible for the region in the ring between it and its predecessor node on the ring.
    • The principle advantage of consistent hashing is that departure or arrival of a node only affects its immediate neighbors and other nodes remain unaffected.
  • Problems
    • Random position assignment of each node on the ring leads to non-uniform data and load distribution.
    • Every time a new node comes online or offline, you have to rehash everything and recompute all Merkle Trees.
    • Algorithm is oblivious to the heterogeneity in the performance of nodes.
  • Virtual nodes as a solution
    • Instead of mapping a node to a single point in the circle, each node gets assigned to multiple “virtual nodes” in the ring.
    • The number of virtual nodes that a node is responsible can decided based on its capacity—allows exploitation of performance heterogeneity in nodes.
    • Giving each physical node multiple virtual nodes also increases the number of partitions (i.e. key ranges), mitigating non-uniform data / load distribution.

Merkle trees for conflict detection

  • Merkle trees are hash trees that allows conflict detection in logarithmic time. The idea is that all leaves are hashes of their data, and that all nodes are hashes of a concatenation of its direct children.
  • Dynamo nodes keep a Merkle tree for each key range (set of keys owned by a virtual node).

Partition Schemes for Incremental Scaleability

  • The default strategy was just to give out T random tokens per node and partition by token value.
    • This had the problem of random key ranges, which changed as nodes joined and left the system.
    • Key range changes resulted in scanning and rebuilding entire Merkle trees, which was expensive.
  • Dynamo eventually migrated to a strategy with a fixed number of equally sized partitions. This saves recalculating any Merkel trees, since key ranges are fixed and leaves / joiners can just copy over a Merkle tree.

Sloppy Quorum

  • We should preface this with how requests are fielded in Dynamo.
    • Any client request (get or put) is eventually routed to a physical node that serves as a coordinator. The coordinator walks down a preference list of N nodes and requests a read or a write from each. We separately configure read replication factors (R) and write replication factors (W). Read requests requires a quorum of R consistent reads; writes requires a quorum of W consistent writes to succeed. If this is not achieved, the coordinator fails the request.
  • Dynamo's quorum is called “sloppy” because instead of requiring the first N nodes, we accept that some nodes will be down and walk the first N healthy nodes.
  • Hinted handoff is just the practice of storing writes temporarily on a backup node if the target write node is down. For example, a write to node A which is down, will be stored on node B until A comes back online again. When A comes back online, B transfers the written data to A and deletes it from itself.
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 202,980评论 5 476
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 85,178评论 2 380
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 149,868评论 0 336
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 54,498评论 1 273
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 63,492评论 5 364
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 48,521评论 1 281
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 37,910评论 3 395
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 36,569评论 0 256
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 40,793评论 1 296
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 35,559评论 2 319
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 37,639评论 1 329
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 33,342评论 4 318
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 38,931评论 3 307
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 29,904评论 0 19
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 31,144评论 1 259
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 42,833评论 2 349
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 42,350评论 2 342

推荐阅读更多精彩内容

  • 今天来讲一个真实的故事,故事的主人公是S姑娘。S姑娘是我在参加注会课程中认识的,她是我的小组长。中午收到她的短信,...
    肖肖爱吃鱼阅读 154评论 0 0
  • 今天取得了一个朋友的谅解,我希望我们还和以前一样,希望我们四个都好好的都得到自己的幸福。
    AnriquejoanChou阅读 222评论 0 0
  • “人人都喜欢怀旧,但没有谁真的愿意回去。而且无论是你厌弃或者钟爱的现在,也很快就摇身成为心心念念的过去。生命是一场...
    图同阅读 198评论 0 0
  • 夏时温度升的老高,但是灵魂的某处却是冰寒一片,只能通过读书略略取暖,心似乎永远捂不热一般,蒙起一层好看的霜。
    云使小短阅读 123评论 0 1