[HN Gopher] Time Series and FoundationDB (2019)
___________________________________________________________________

Time Series and FoundationDB (2019)

Author : richieartoul
Score  : 205 points
Date   : 2022-05-28 14:19 UTC (8 hours ago)

| richieartoul wrote:
| Kind of surprised how fast this took off. I wrote this 2 years
| ago and was showing it to a friend today who was asking how they
| might build an inverted index using FoundationDB, and decided to
| repost it.
|
| If you're looking for more examples of using FoundationDB to
| store timeseries data, we describe a much more scalable approach
| here:
|
| https://www.datadoghq.com/blog/engineering/introducing-husky...
| rektide wrote:
| Recently submitted:
|
| https://news.ycombinator.com/item?id=31416843 (185 points, 11d
| ago, 15 comments)
| yed wrote:
| In Husky, how do you maintain very short cache times in your
| Writers without blowing up your S3 write costs?
| richieartoul wrote:
| It's (very lightly) hinted at in the blog post (the section
| where we discuss the "shard router"), but basically data
| locality by tenant in our pipelines plus a little bit of
| buffering goes a long way.
|
| I'm hoping I can coax one of my colleagues who works on the
| routing side of things to write a detailed blog post on just
| that topic though :)
| yed wrote:
| That would be great! Really appreciate your write-up; this
| space is really fascinating. The short cache / no stateful
| Writer / no searches on Writers thing stood out to me as a
| key differentiator to something like Grafana Loki.
| SnowHill9902 wrote:
| Why does he define the sample schema with "series_id TEXT,
| timestamp integer"? Isn't it more reasonable to use "series_id
| bigint, timestamp timestamptz"?
| Mo3 wrote:
| series_id definitely looks like a mistake - timestamp could
| just be Unix time? I always use ints for Unix times too. I have
| no hard data on that decision, but it feels like it might avoid
| a little bit of overhead.
| SnowHill9902 wrote:
| Someone could chime in, but timestamptz is just an 8-byte
| integer, like a bigint.
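
A minimal sketch of the integer-timestamp approach discussed above
(Go for illustration; the Sample type and the schema in the comment
are hypothetical): a bigint and a timestamptz both occupy 8 bytes in
Postgres, so storing Unix time in an int64 is mostly an ergonomic
choice rather than a space saving.

    // Hypothetical row for a schema like:
    //   CREATE TABLE samples (series_id bigint, ts bigint,
    //                         value double precision);
    package main

    import (
        "fmt"
        "time"
    )

    type Sample struct {
        SeriesID int64
        TS       int64 // Unix seconds; same 8 bytes as a timestamptz
        Value    float64
    }

    func newSample(seriesID int64, at time.Time, v float64) Sample {
        return Sample{SeriesID: seriesID, TS: at.Unix(), Value: v}
    }

    func main() {
        s := newSample(42, time.Now(), 0.99)
        // Round-trip back to wall-clock time for display.
        fmt.Println(s.TS, time.Unix(s.TS, 0).UTC())
    }
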
| pradeepchhetri wrote:
| I see you mentioned M3DB throughout the README. Have you looked
| at VictoriaMetrics?
| richieartoul wrote:
| I only mentioned M3DB because I was a maintainer of it at the
| time, so it's what I was familiar with.
|
| I'm a fan of VictoriaMetrics; it's performant and well
| designed. The main dev Aliaksandr is prolific, and a really
| nice guy to boot. I've messaged him a few times to ask for
| advice / help and he's always sent me very detailed and
| thoughtful responses.
|
| These days I'm working on "non metrics" timeseries storage
| (i.e., full-fidelity events) at Datadog, so my focus has
| shifted to the "real time OLAP data warehouse" side of things,
| which is quite a bit different from traditional metrics stores.
| pradeepchhetri wrote:
| Thank you!
| avinassh wrote:
| > Time series means different things to different people. In this
| case, I want to focus on the type of time series storage engine
| that could efficiently power an OLTP system (strong consistency
| and immediately read your writes) or a monitoring / observability
| workload as opposed to a time series database designed for OLAP
| workloads.
|
| I am more interested in the latter part:
|
| > as opposed to a time series database designed for OLAP
| workloads.
|
| How would it be any different? Can anyone explain how/why time
| series for OLTP would be different than OLAP? (I always thought
| time series data was stored in OLAP databases.)
| richieartoul wrote:
| I described a modern take on an OLAP timeseries system recently
| here: https://www.datadoghq.com/blog/engineering/introducing-
| husky...
|
| I don't know if that helps, but contrasting those two systems
| might give you some idea of the difference.
| alberth wrote:
| When should you use FDB over Redis?
| [deleted]
| ww520 wrote:
| When you need transactions and durability.
| endisneigh wrote:
| Not super familiar with Redis performance - is it much better
| than FDB for a single machine?
| datalopers wrote:
| For single-machine key-value needs (and lists, sets, sorted
| sets, hash tables, and various full-text operations) it's by
| far the fastest in my experience. But it's not going to have
| the level of transactional guarantees in a clustered setup or
| full ACID (writes aren't immediately durable) that some use
| cases will require.
|
| Treat Redis more like a cache or job queue or pub-sub or
| event streaming platform or full-text search engine than a
| primary datastore of mission-critical data.
| endisneigh wrote:
| Is Redis really more performant than FDB on a single
| instance when _Redis persists to disk_? It wouldn't
| surprise me that Redis is faster in-memory, tho.
| datalopers wrote:
| No idea; I haven't benchmarked a 100% durable Redis setup
| (AOF, fsync every query) against FDB, but that will
| definitely take a significant hit to write performance. I
| generally stick to the model of not using Redis for
| mission-critical durability use cases.
|
| I don't really view them as competing technologies in the
| first place though.
| WJW wrote:
| One underappreciated part of Redis is that its
| single-threaded in-memory approach makes everything atomic.
| There is literally no way for simultaneous operations to
| conflict if you only ever run one operation at a time.
|
| That said, you indeed need to be very careful around
| durability since the dataset is kept in memory. Most
| persistence options are a bit meh compared to "real"
| databases IMO. Redis is a very fast ACI-compliant datastore,
| if you will.
| jjtheblunt wrote:
| Can you create a conflict across a power-failure reboot?
| WJW wrote:
| I don't think you can create a conflict as in "corrupt
| the database", but it is definitely possible to lose data
| depending on how you have set the fsync settings in the
| Redis config. The docs mention that a truncated AOF file
| (such as might happen during a power failure) will not
| invalidate the file, but the last command will be lost.
| The default is to sync to disk every second, so you could
| lose up to a second of data that your backend considered
| written. You can also set it to fsync after every write
| query, but this will be much slower.
|
| https://redis.io/docs/manual/persistence/ has all the
| details.
| beebmam wrote:
| It seems like a suboptimal choice to use a garbage-collected
| language when trying to optimize for throughput. What made you
| decide to use Go instead of a language like Rust or C++?
| jerryjerryjerry wrote:
| Interestingly, I have been thinking about related questions too
| (https://news.ycombinator.com/item?id=31541200#31542407)
|
| Language-wise, Rust might be a good choice, as some projects
| like TiKV already use it in pretty large-scale systems. But I'd
| also like to know the pros/cons of both languages.
| jjtheblunt wrote:
| I think it's worth noting that Go HAS a garbage collector
| available, but it's entirely possible to write code which
| doesn't rely on it, at least in the user-written code.
|
| Furthermore, the compiler itself can help in the (re)writing of
| code to effect stack allocations of variables, avoiding runtime
| reliance on the garbage collector. This is done, in general,
| through "escape analysis", which prevents reliance, at runtime,
| on garbage collection.
|
| The punchline is that calling Go a garbage-collected language
| can be entirely inaccurate, depending on the program being
| compiled.
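
A small illustration of the escape analysis described above
(function names invented for the example); building with
`go build -gcflags=-m` prints the compiler's stack/heap decisions:

    package main

    import "fmt"

    type point struct{ x, y int }

    // Stays on the stack: the value never outlives the call.
    func sumCoords() int {
        p := point{1, 2} // -gcflags=-m reports: does not escape
        return p.x + p.y
    }

    // Escapes to the heap: the returned pointer outlives the call.
    func makePoint() *point {
        p := point{3, 4} // -gcflags=-m reports: moved to heap
        return &p
    }

    func main() {
        fmt.Println(sumCoords(), makePoint())
    }
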
| adgjlsfhk1 wrote:
| GC isn't necessarily higher cost than not doing GC. Garbage
| collectors are allowed to free objects when they want to, which
| can be significantly faster than freeing in an arbitrary order,
| as happens with manual memory management.
| simpsond wrote:
| It does require more memory for tracking objects, as well as
| storing objects longer than strictly necessary. So I would
| argue that manual memory management is most useful when there
| are memory constraints. But yes, GC'd systems can yield more
| throughput. They are also easier to work with, IMO.
| adgjlsfhk1 wrote:
| This is only kind of true. Garbage collectors can use lots
| of tricks like alias analysis to automatically free objects
| without involving the garbage collector, and manually
| managed languages actually may keep objects around for
| longer than GC'd ones, because manual management can only
| use compile-time information to determine lifetimes, but
| GCs get run-time info. Also, for both types, the real
| performance win is avoiding unnecessary allocations in the
| first place.
| jjtheblunt wrote:
| > garbage collectors can use lots of tricks like ...
| without involving the garbage collector
|
| Is there a typo: X can use tricks to avoid X?
| [deleted]
| [deleted]
| foobiekr wrote:
| If you are doing real performance coding, you are also avoiding
| allocations and paying attention to cache sizes and so on. Go
| is fine for that, hypothetically, though obviously you can do
| better. But then you aren't generating a lot of garbage in the
| first place, so the GC aspect has minimal impact.
|
| That is, unless you are forced to. One of the things about Go
| that is _very_ irritating is the way so many key APIs in the
| ecosystem make it almost impossible to reuse buffers. For
| example, protobuf and sarama generate throwaway memory at the
| same rate as message throughput.
| richieartoul wrote:
| Yeah, that's true, although there is a growing ecosystem of
| "allocation sensitive" libraries. We use this one a lot for
| protobuf: https://github.com/richardartoul/molecule
|
| jsonparser is great for JSON parsing:
| https://github.com/buger/jsonparser
|
| fastcache is a great "zero GC" cache:
| https://pkg.go.dev/github.com/VictoriaMetrics/fastcache
|
| Etc.
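
A sketch of the buffer-reuse pattern those allocation-sensitive
libraries are built around, here using the standard library's
sync.Pool (the encode helper is hypothetical): callers recycle
buffers instead of allocating a fresh one per message.

    package main

    import (
        "bytes"
        "fmt"
        "sync"
    )

    // bufPool recycles buffers across encode calls instead of
    // leaving one garbage buffer behind per message.
    var bufPool = sync.Pool{
        New: func() any { return new(bytes.Buffer) },
    }

    // encode writes a frame into a pooled buffer. The caller must
    // hand the buffer back via bufPool.Put when finished with it.
    func encode(msg string) *bytes.Buffer {
        buf := bufPool.Get().(*bytes.Buffer)
        buf.Reset()
        buf.WriteString(msg)
        return buf
    }

    func main() {
        buf := encode("hello")
        fmt.Println(buf.String())
        bufPool.Put(buf) // recycle instead of discarding to the GC
    }
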
| AlphaSite wrote:
| Go's GC works well because it has stack-allocated types, which
| Java doesn't, and Java pays a very heavy penalty for that.
| astrange wrote:
| "Stack allocated" types wouldn't be a concept at the Java
| language level; what it's missing is value types. This has
| way worse issues than performance, because reference types
| are unnecessarily mutable and cause lots of bugs.
|
| Theoretically it can recover stack allocations with escape
| analysis though.
| richieartoul wrote:
| This is a controversial opinion, but I've been generally
| unimpressed with the performance of real-world systems written
| in Rust or C++.
|
| While the theoretical performance ceiling is higher, I've
| personally observed that it usually takes several times longer
| to write the systems in Rust/C++, and often the "naive" Rust
| implementation is less performant than the "naive" Go
| implementation.
|
| There are obviously cases where using Rust or C++ makes a lot
| of sense, but I prefer Go. I can get a performant prototype up
| and running quickly, the "critical path" can usually be micro-
| optimized to within +/- 10% of Rust if you know what you're
| doing, and by the time the Rust implementation matures the Go
| implementation has already had time to iterate repeatedly on
| key architectural, data structure, and algorithmic choices
| based on real-world usage.
|
| That's just my 2 cents though, and doesn't reflect the views of
| all the people I work with.
| purplerabbit wrote:
| Very astute observation, although it's a bit painful to
| admit, as Rust scratches a perfectionist itch which Go
| absolutely does not.
|
| I've read a bit about algorithm-driven performance gains vs.
| hardware-driven performance gains... this seems like an
| argument for a third category: DX-driven performance gains.
| the_duke wrote:
| These systems usually spend a lot of time waiting for
| network/disk IO, which means there is plenty of CPU time
| available to do the computations or GC runs, and the added
| memory pressure isn't so significant.
|
| The Go runtime is also finely tuned for exactly these kinds
| of workloads, with language-level insight into blocking
| operations.
|
| Take the Tokio async runtime for Rust: it's not exactly the
| fastest runtime out there and is comparatively new, you need
| to carefully avoid introducing blocking calls that stall the
| worker threads, there is no language-level integration, and
| Rust's trait system currently often requires boxing of
| futures.
|
| Combine all of that and you often end up with code that isn't
| noticeably faster than Go in practice.
|
| One counterpoint though: I've found that Rust code takes
| longer to write, but it's significantly easier to refactor,
| thanks to the much stronger type system.
| jerryjerryjerry wrote:
| Good points on the performance side, but IMHO one of the pain
| points is still memory management. Go's GC can do some of the
| work; however, it can be hard to do real-time tracking and
| management, which is exactly what a DB kernel needs.
| Cthulhu_ wrote:
| Just because it has a garbage collector doesn't mean it's
| suboptimal; Go's GC is much faster and less intrusive than
| those in other languages, and unlike Java, you actually get to
| choose whether you put things on the heap or stack, so you can
| in theory avoid using the heap entirely, if it makes sense for
| your application.
| benhoyt wrote:
| > you actually get to choose whether you put things on the
| heap or stack
|
| I think I know what you mean, but this isn't strictly true.
| In Go, the compiler decides for you, and you don't have to
| worry about it. The compiler allocates on the stack whenever
| it can (because it's faster and uses less memory), but if it
| determines that a variable "escapes the stack" or if it can't
| figure out whether it escapes or not, it allocates on the
| heap. You could have a simplistic (but correct) Go compiler
| which allocated everything on the heap -- interestingly, the
| Go spec doesn't mention the words "heap" or "stack" at all.
|
| Go does give you a lot of control over allocations (stack or
| heap!) and memory layout, though, which makes it pretty good
| for relatively low-level performance work.
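
One hedged, concrete example of that control over allocations and
memory layout (types invented for illustration): a preallocated
slice of value-type structs is a single contiguous allocation
rather than one heap object per element.

    package main

    import "fmt"

    type sample struct {
        ts    int64
        value float64
    }

    func main() {
        // One allocation backs all 1024 elements; appending up to
        // the preallocated capacity never reallocates, and the
        // elements are laid out contiguously in memory.
        samples := make([]sample, 0, 1024)
        for i := 0; i < 1024; i++ {
            samples = append(samples, sample{ts: int64(i), value: float64(i) * 0.5})
        }
        fmt.Println(len(samples), cap(samples))
    }
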
| Xeoncross wrote:
| This is a really neat experiment. I really appreciate all the
| time that went into documenting and walking the reader through
| the why and how. Efficient compression of time series data is
| fun to think about, just like probabilistic structures or
| tries, which provide space/time savings orders of magnitude
| better than naive approaches.
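
For a taste of that compression, here is a much-simplified sketch
of delta-of-delta timestamp encoding, the trick popularized by
Facebook's Gorilla paper and used by many time series stores.
Production encoders bit-pack the residuals; this sketch just uses
varints for brevity.

    package main

    import (
        "encoding/binary"
        "fmt"
    )

    func encodeTimestamps(ts []int64) []byte {
        var out []byte
        var prev, prevDelta int64
        for i, t := range ts {
            var v int64
            switch i {
            case 0:
                v = t // first timestamp stored raw
            case 1:
                v = t - prev // then the first delta
                prevDelta = v
            default:
                d := t - prev
                v = d - prevDelta // then deltas of deltas, usually ~0
                prevDelta = d
            }
            out = binary.AppendVarint(out, v)
            prev = t
        }
        return out
    }

    func main() {
        // Regularly spaced timestamps shrink to ~1 byte each after
        // the first, since the delta-of-delta is zero.
        ts := []int64{1653750000, 1653750010, 1653750020, 1653750030}
        enc := encodeTimestamps(ts)
        fmt.Printf("%d timestamps -> %d bytes\n", len(ts), len(enc))
    }
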
| andrew-ld wrote:
| A general comment, and not just about this post:
|
| On Hacker News we keep seeing relatively dead projects written
| in Go in a few lines of code that promise to handle a really
| large amount of data in a short time, and every time we end up
| with fake benchmarks or partial implementations of what it
| should really be. Please check the claims carefully before
| criticising the real projects (which, unlike these, are larger
| than tens of thousands of lines).
| richieartoul wrote:
| I was pretty clear in the first paragraph that the code is a
| half-baked research PoC that should not be used for anything
| serious.
| javajosh wrote:
| "imagine you have a fleet of 50,000 servers..." It's interesting
| to consider when you would ever need so many servers, from first
| principles, considering that a single server can easily handle a
| million simultaneous connections, that very few applications are
| used by every man, woman, and child on Earth, and that even
| fewer are used by multiples of that.
| tilolebo wrote:
| > considering that a single server can easily handle a million
| simultaneous connections
|
| Can you elaborate on that part?
| joisig wrote:
| Not the GP, but they might be referring to [0] or one of
| several other articles you will find if you Google "handle a
| million connections in *".
|
| Realistically you also usually need to perform some non-
| trivial work from time to time for some non-trivial portion
| of those connections, which will further load your server,
| but still.
|
| [0] https://phoenixframework.org/blog/the-road-to-2-million-
| webs...
| crgwbr wrote:
| Genuinely curious on this -- with port numbers only being 16
| bits, how is it possible for one machine to ever handle
| more than 65k concurrent connections?
| alduin32 wrote:
| For outbound connections, it can be done using multiple
| IP addresses.
| dxhdr wrote:
| A connection is the tuple {source_ip, source_port,
| dest_ip, dest_port}.
| pipe_connector wrote:
| Connections must have unique IP:port _pairs_ between
| client and server. You're limited to 65K concurrent
| connections from the same client. In practice, no one is
| opening that many connections from a single client.
| dtjohnnymonkey wrote:
| You might run into this limitation more quickly if you
| are receiving connections via a load balancer.
| adra wrote:
| As such, the load balancer itself can probably hold a
| group of source IPs to use as a second-hop solution to
| this problem as well, if we're sincerely talking about
| load balancers holding a ton of largely idle connections
| simultaneously.
|
| The more likely load balancer outcome would be a DNS
| split on inbound client IPs, scaling out until each load
| balancer handles the appropriate amount of traffic (by
| some measure, scaling out further if exceeded).
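
A toy sketch of the 4-tuple point above (addresses made up): the
16-bit limit applies to source ports per client IP, not to the
total number of connections a server can hold, because each
connection is keyed by all four fields.

    package main

    import "fmt"

    type connKey struct {
        srcIP   string
        srcPort uint16
        dstIP   string
        dstPort uint16
    }

    func main() {
        conns := map[connKey]bool{
            {"10.0.0.1", 40001, "10.0.0.9", 443}: true,
            {"10.0.0.1", 40002, "10.0.0.9", 443}: true, // same client, new port
            {"10.0.0.2", 40001, "10.0.0.9", 443}: true, // same port, new client
        }
        fmt.Println(len(conns), "distinct connections to one server port")
    }
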
| unboxingelf wrote:
| Not every problem is network- or IO-bound. Imagine a system
| that ingests data from clients. Only a subset of the capacity
| may service client connections, while the rest may be running
| computations over data.
| javajosh wrote:
| Yes, of course you might be CPU-bound, in which case even a
| single user might take up 50k servers (think of a scientist
| doing a climate simulation, or an AI researcher building a
| very large model).
|
| All things being equal, CPU-bound is the exception, not the
| rule. Almost every program we think of as an "application" is
| chat with some structure, persistence, and access control
| added, and is indeed IO-bound.
| foobiekr wrote:
| Basically, in an ideal system you are only bound by hardware
| limits:
|
| memory bandwidth bound
|
| memory size bound
|
| cache hit rate and bandwidth bound
|
| TLB size bound
|
| CPU decode and issue logic bound
|
| CPU renaming / OOO buffer bound
|
| kernel implementation bound (esp. locks, interrupts, ...)
|
| network physical layer bound
|
| GPU/TPU bound (which are variations of the above, though
| mostly memory)
|
| power and cooling bound
|
| ...
|
| ... so basically, unless you're running out of functional
| units, you're hopefully I/O bound, depending on where you
| consider I/O.
|
| But in reality, most SW is just "crappy SW making poor use of
| resources bound." That's where we mostly are now as an
| industry. Bad language choices, terrible design, no cross-
| layer comprehension.
| javajosh wrote:
| _> But in reality, most SW is just "crappy SW making poor
| use of resources bound."_
|
| I agree, but the number one culprit is premature
| distribution, thanks to the widely pervasive cargo-culting
| of Amazon-style microservices.
| legulere wrote:
| Having enough computing power to be able to write crappy
| software is an enormous productivity boost. Writing
| software with crappy performance is way easier than writing
| software with good performance, and it takes less time and
| less-experienced developers. Writing crappy software is
| often better at achieving business goals as cheaply as
| possible.
| DSingularity wrote:
| "Cross-layer comprehension" - what is that? Wanna move
| computation to the NIC?
| foobiekr wrote:
| It is useful to understand the underlying layers so as to
| take advantage of what they are good at and not end up
| relying on operations where they perform badly.
|
| "What are the natural, performant operations for the
| layer below me? How can I construct my solution from
| these operations instead of pretending the layers are
| orthogonal?"
|
| A good example would be the linear read/write performance
| of hard disks. If you took the option to avoid random
| access and take advantage of read-ahead and other methods,
| you'd see much better performance than with an approach
| that ignored this behavior.
|
| There are many, many examples of this.
| astrange wrote:
| That is a good idea, and we do that (TSO/LRO/TLS offload).
| jiggawatts wrote:
| There was a post (here?) about how Netflix uses NICs that
| offload TLS encryption because that's literally the only
| way to hit 200 Gbps. It's not that the CPUs can't encrypt
| that fast -- the limit is the memory bandwidth.
|
| I see some clouds deploying servers with 100 Gbps NICs, and
| I wonder what percentage of deployed applications could get
| anywhere near that...
| SnowHill9902 wrote:
| I guess it's just an example. It could be sensors, cars, apps,
| ...
| dchuk wrote:
| I work at a telematics company. Technically, every black box in
| a vehicle is a server, and we have many more on the road than
| that figure.
| [deleted]
| Johnny555 wrote:
| A single server can only handle a million connections if each
| "connection" is doing a trivial amount of work. There are lots
| of compute-bound services that need more power than a single
| server (or even 50,000 servers) can provide.
| maccard wrote:
| I work on multiplayer video games, and our game servers are
| often stateful programs that persist for 10-60 minutes. If you
| have 5 players per session and 250,000 peak concurrent users
| (remember that Steam is only one platform; consoles exist too,
| as do mobile platforms), you can have 50k servers. Sure,
| they're not necessarily 50k unique machines - they might be
| container instances or just processes on a single box - but
| there's 50k individual processes doing "stuff".
| [deleted]
| foobiekr wrote:
| Think differently about this problem.
|
| Not all applications are web pages where the lifecycle of a
| given connection is "establish TCP, establish TLS, receive a
| series of requests and produce a series of responses, mostly by
| hitting external caches or DBs" [or the QUIC variant of this].
| That problem space was one of the very first scale challenges
| to arrive, in 1998, and was one of the very first that actually
| got addressed. It's not the challenge space now and has not
| been, basically, for 20+ years, except perhaps the whole
| scaling-of-database problem, which has been dealt with by
| sharding, and the distribution-of-flows problem, which has
| mostly been solved with clever applications of intensely
| performant _scale-up_ hardware in the form of modern switch
| NPUs running variations of ECMP and clever uses of anycast, DNS
| load balancing, routing, etc.
|
| All of this stuff was pretty common by the mid-2000s.
|
| But bottlenecks in systems always exist. They move around. They
| can be anywhere in the stack.
|
| Connections are the simplest one, and they got a lot of
| attention for twenty years - real pre-emptive threads (in
| Linux, and Solaris's weird diversion into m:n), select()
| scalability (both in terms of bookkeeping and in terms of basic
| limits - that is, the lack thereof) giving rise to kqueue on
| FreeBSD, WFMO() on NT, and years of attempts on Linux to get
| something that actually worked, which took a while, _then_ the
| c10k problem, and so on.
|
| After connectivity you have issues in layer 3: TCP offload, the
| cost of TLS, etc. Userland-to-kernel copies (basic stuff like
| sendfile() through various userland driver schemes). And so on.
| At the physical layer, servers moved from NICs with lots of
| copies, to ring-based with scatter-gather, to TSO, to crypto
| offload, to ... while going from 10 Mb to 100, 1000, 2.5G, 10G,
| 40G, 100G, and sooner or later 400G on a server will not be as
| uncommon as it is now.
|
| But as your networking capacity and throughput scale, you start
| bumping into other things. Once you're talking high-speed links
| and various schemes to get the kernel out of the way, you are
| mostly - not always, but mostly - talking about data movement
| problems. Elephant flows have their own system-level problems
| in networks, and for data moving and staging you actually don't
| want the host _CPU_ involved if you can avoid it, let alone the
| kernel. Now you are in the area of doing
| (remote)->NIC----PCIe---->NVMe (or --->GPU) directly, if you
| can. Now your NVMe storage device becomes the bottleneck.
|
| 90s-era supercomputing clusters had all of these problems with
| slightly different technologies; AI clusters have them today.
| These are not connection-limited, and they do not scale with
| people. Their primary scale challenge is utilization/CAPEX, but
| that's a longer discussion.
| yashap wrote:
| In 2016, Gartner estimated that Google ran ~2.5 million servers
| (https://www.datacenterknowledge.com/archives/2017/03/16/goog..
| .), and it's probably even more today. Obviously Google is an
| extreme outlier, but that's 50x 50,000.
|
| Most servers do far, far more compute-intensive things than
| handling connections - that's a pretty meaningless number. I
| work in the mobility space; we've got servers that solve
| large/complex vehicle routing problems, and ideally they're
| performing computation for just ONE user at a time.
|
| Certainly 50,000 servers is a lot, but tonnes of large tech
| companies run 100s to 1000s of servers at a time.
| javajosh wrote:
| Sure, but the simple Postgres solution the author mentions
| _works_ for 1000 servers. Assuming a power-law distribution,
| technology like this is useful for a vanishingly small number
| of organizations. What's even more interesting is that it's
| pure admin overhead for those looking to centralize control
| over vast numbers of systems. So often technologists are
| asked to consider the moral or political implications of
| their work, and I think this is a good opportunity to do so.
| hinkley wrote:
| On my first scaled project, we were aiming at 30 req/s per
| server. On UltraSPARC boxes, so maybe 8 cores? At a time when
| memcached was brand spanking new and thus not trustworthy, and
| F5 had just got their traffic-shaping logic to actually work.
| We did it, but the architecture was terrible, and we should
| have been able to do 50-100 req/s per server if we had taken
| certain principles more seriously.
|
| My current project is doing 5x the traffic but with 20x the
| cores; it's embarrassing. And that's just counting "our"
| servers, which are only about 1/3 of the whole enterprise
| (heading toward 50% if I weren't on the scene). I look at all
| the waste in our project, and then I think about why I would
| ever need 50k cores, let alone servers, and I just can't fathom
| it. Who is handling tens of millions of requests per second?
| And what on earth are you fucking up so badly that you need 2
| million servers? Is Google doing 4 billion requests per second?
|
| At some point I have to ask myself if making it easy to manage
| more servers was really their best strategy. Often friction and
| constraints are where innovation comes from. When things are
| easy, people don't think about them until they are gone.
| morelisp wrote:
| > Who is handling tens of millions of requests per second?
|
| Speaking as someone less than one order of magnitude below
| that, it takes us about 100-200 cores (depending on how you
| amortize shared infrastructure like our Kafka brokers, etc.).
| So even 10x'ing our infrastructure, I can't imagine 50k
| servers.
| hinkley wrote:
| It's very reminiscent of that feeling when you were a child
| that adults have everything figured out, and so you're in a
| big hurry to get there because then everything will make
| sense.
|
| Then there's that moment of dawning horror when you see that
| nobody knows what the fuck they're doing, and everyone is
| faking it at best, or is just a child playing dress-up at
| worst.
|
| Google has certainly figured out a number of things, but
| anyone using 2.5 million servers in 2016 on a planet of only
| 7 billion people is playing dress-up.
| Cthulhu_ wrote:
| Connections are fine, but applications generally do more than
| just accept connections.
| endisneigh wrote:
| I thought about making something using SQLite virtual tables to
| back all of the data on FDB, in order to basically provide a
| SQL interface to FDB without writing a new layer, but some
| others tried and apparently it didn't go well, since you can't
| have indexes on virtual tables. Truly a shame.
|
| Any other storage engines that support using another interface
| as the underlying store?
| zitterbewegung wrote:
| FoundationDB uses SQLite as the storage layer - unless you mean
| to access the storage layer in FoundationDB to perform that
| task?
| endisneigh wrote:
| Yes - access to the storage layer through FDB using a SQL
| interface, aka a SQL interface with FDB guarantees.
| richieartoul wrote:
| Yeah, that would be cool. You could try forking an existing
| "SQL over key-value" system like Cockroach or TiDB and
| replacing the KV store with FoundationDB. That is probably what
| I would explore first.
|
| Some of my old colleagues/friends are working on building a
| transactional document store (among other things) on top of
| FoundationDB, so if you can forgo SQL, that's another project
| you could explore working on:
| https://github.com/tigrisdata/tigris
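
A toy sketch of the "SQL over key-value" idea (not Cockroach's or
TiDB's actual encoding, and not FoundationDB's tuple layer; the
rowKey helper is hypothetical): rows map to order-preserving keys,
so a WHERE clause on the primary key becomes a range scan over the
sorted keyspace.

    package main

    import (
        "encoding/binary"
        "fmt"
    )

    // rowKey builds an order-preserving key: table name, series id,
    // then a big-endian timestamp, so lexicographic byte order
    // matches (table, series_id, ts) order.
    func rowKey(table string, seriesID, ts uint64) []byte {
        k := append([]byte(table), 0x00) // toy separator byte
        k = binary.BigEndian.AppendUint64(k, seriesID)
        k = binary.BigEndian.AppendUint64(k, ts)
        return k
    }

    func main() {
        // SELECT ... WHERE series_id = 7 AND ts >= 100 AND ts < 200
        // becomes a scan over [start, end) in the sorted KV store.
        start := rowKey("samples", 7, 100)
        end := rowKey("samples", 7, 200)
        fmt.Printf("scan [%x, %x)\n", start, end)
    }
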
| alexchamberlain wrote:
| Postgres Foreign Data Wrappers?
| snissn wrote:
| Yeah, I was gonna suggest this! I'm not the biggest expert, but
| you can put Postgres in front of all your data sources, have pg
| route the queries through a heterogeneous infrastructure, and
| get a nice interface.
___________________________________________________________________
(page generated 2022-05-28 23:00 UTC)