[HN Gopher] Millions of Tiny Databases [pdf]
___________________________________________________________________
Millions of Tiny Databases [pdf]

Author : aratno
Score  : 80 points
Date   : 2020-02-14 19:02 UTC (3 hours ago)

(HTM) web link (assets.amazon.science)
(TXT) w3m dump (assets.amazon.science)

| vimota wrote:
| Tangentially related, BigQuery uses a similar usage-based
| approach to place and replicate data in a manner that's likely to
| be available for users:
|
| https://cloud.google.com/blog/products/data-analytics/how-bi...

| ignoramous wrote:
| Abstract at:
| https://www.amazon.science/publications/millions-of-tiny-dat...
|
| > _...Physalia is a transactional key-value store, optimized for
| use in large-scale cloud control planes, which takes advantage of
| knowledge of transaction patterns and infrastructure design to
| offer both high availability and strong consistency to millions
| of clients. Physalia uses its knowledge of data center topology
| to place data where it is most likely to be available. Instead of
| being highly available for all keys to all clients, Physalia
| focuses on being extremely available for only the keys it knows
| each client needs, from the perspective of that client._
|
| > _...We believe that the same patterns, and approach to design,
| are widely applicable to distributed systems problems like
| control planes, configuration management, and service discovery._
|
| It'd be interesting to contrast this approach with Route53's or
| IAM's datastore, which need to be globally replicated with
| time-bounded eventually-consistent reads, and transactional but
| verifiable writes.
|
| I hope AWS begins publishing about S3, now. One can look at the
| patents AWS engineers author to get a feel for some of the
| internals, but they are (intentionally?) hard to read.
|
| For instance, patents filed by two of the many S3 founding
| engineers:
| https://patents.google.com/?inventor=James+Christopher+Soren...
|
| Also see:
|
| https://aws.amazon.com/builders-library/
|
| https://research.google/pubs/

| mjb wrote:
| It's not really about the design of S3, but if you're
| interested in some of the philosophy and thinking behind S3 you
| might enjoy "Beyond eleven nines: Lessons from Amazon S3
| culture of durability"
| https://www.youtube.com/watch?v=DzRyrvUF-C0

| kthejoker2 wrote:
| Thought for sure this'd be a thinkpiece on Excel in the
| enterprise ...
|
| Seriously, though, this whole paper uses an amazing amount of
| terminology - blast radius, colony, color, game day, split brain
| - and an awesome biological metaphor of the Portuguese man o'war.
|
| Great read even if you don't care about fault tolerance, CAP
| theorem, or distributed balancing at AWS-scale.
|
| One sample quote of the value of cheap heuristics over full-blown
| number-crunching:
|
| > Globally optimizing the placement of Physalia volumes is not
| > feasible for two reasons, one is that it's a non-convex
| > optimization problem across huge numbers of variables, the other
| > is that it needs to be done online because volumes and cells come
| > and go at a high rate in our production environment. Figure 11
| > shows the results of using one very rough placement heuristic: a
| > sort of bubble sort which swaps nodes between two cells at random
| > if doing so would improve locality. In this simulation, we
| > considered 20 candidates per cell. Even with this simplistic and
| > cheap approach to placement, Physalia is able to offer
| > significantly (up to 4x) reduced probability of losing
| > availability.

| mjb wrote:
| This is my paper (along with Tao and Fan). It's a great feeling
| to have this published and available, and I'm super proud of the
| team behind Physalia.
|
| There's a lighter-weight introduction to the work here:
| https://www.amazon.science/blog/amazon-ebs-addresses-the-cha...
| and for those attending NSDI, I'll be talking about Physalia in
| the "Deployment Experience" session on Wednesday.

| maxmcd wrote:
| Very cool. I haven't read the whole paper yet, but from a quick
| overview it seems somewhat similar to SLOG (which also deals
| with world-scale replication by trying to keep data closer to
| the nodes that use it):
|
| - http://www.vldb.org/pvldb/vol12/p1747-ren.pdf
|
| - https://blog.acolyer.org/2019/09/04/slog/
|
| Any thoughts on this comparison?

| mjb wrote:
| Very interesting, I hadn't seen SLOG before. It seems like
| there's a fairly similar core insight: placing data near
| where it's used can help with some system properties. They
| appear to be more latency-focused and we were (primarily)
| aiming at availability.
|
| The other different part of Physalia is our focus (again, for
| availability) on placement for 'blast radius'. That means we
| try to limit the number of cells that any one failure (software,
| infrastructure, etc) can touch. Geo-replicated systems can
| have similar concerns, but I haven't seen the same level of
| focus on it as a key design goal.

| revertts wrote:
| RE: NSDI - Is the Firecracker paper available somewhere, or not
| yet?
|
| Edit: I'm an idiot -
| https://www.amazon.science/publications/firecracker-lightwei...
___________________________________________________________________
(page generated 2020-02-14 23:00 UTC)
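
Editor's sketch of the placement heuristic quoted in kthejoker2's
comment above ("swaps nodes between two cells at random if doing so
would improve locality"). This is not the paper's code. Everything
here is an assumption made for illustration: a node is represented
only by a location label, each cell has one hypothetical "home"
location, and locality cost is just the count of nodes outside that
home. The names cost, total_cost and improve_placement are invented,
and the "20 candidates per cell" detail from the quote is not modeled.

```python
import random


def cost(cell, home):
    """Assumed locality cost: nodes in the cell that are not at its home location."""
    return sum(1 for node_location in cell if node_location != home)


def total_cost(cells, homes):
    """Sum of the assumed locality cost over all cells."""
    return sum(cost(cell, home) for cell, home in zip(cells, homes))


def improve_placement(cells, homes, rounds, rng):
    """Try random pairwise node swaps; keep a swap only if it lowers the cost.

    cells is a list of lists of location labels and is modified in place.
    """
    for _ in range(rounds):
        # Pick two distinct cells and one node position in each.
        a, b = rng.sample(range(len(cells)), 2)
        i = rng.randrange(len(cells[a]))
        j = rng.randrange(len(cells[b]))

        before = cost(cells[a], homes[a]) + cost(cells[b], homes[b])
        cells[a][i], cells[b][j] = cells[b][j], cells[a][i]
        after = cost(cells[a], homes[a]) + cost(cells[b], homes[b])

        if after >= before:
            # Not a strict improvement: undo the swap.
            cells[a][i], cells[b][j] = cells[b][j], cells[a][i]
    return cells


if __name__ == "__main__":
    rng = random.Random(0)
    locations = ["dc-a", "dc-b", "dc-c"]  # hypothetical location labels
    homes = [rng.choice(locations) for _ in range(30)]
    # Cell size of 7 is an illustrative choice for this demo.
    cells = [[rng.choice(locations) for _ in range(7)] for _ in range(30)]

    start = total_cost(cells, homes)
    improve_placement(cells, homes, rounds=5000, rng=rng)
    print("cost before:", start, "cost after:", total_cost(cells, homes))
```

Because a swap is kept only when it strictly lowers the combined cost
of the two cells involved, the total cost can never go up, and the set
of nodes is unchanged; only their assignment to cells moves. That is
the appeal of the approach in the quote: each step is cheap and local,
so it can run online while cells come and go, at the price of possibly
stopping at a local optimum.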