[HN Gopher] Show HN: Serverless OLAP with Seafowl and GCP
       ___________________________________________________________________
        
       Show HN: Serverless OLAP with Seafowl and GCP
        
       Hello HN! I'm an engineer at Splitgraph and recently started
       learning Rust so I could make my first contribution to Seafowl [0],
       an early-stage analytical database. Along the way I figured out a
       database hosting hack on GCP and wanted to share it with HN. It's a
       way to achieve "true" scale-to-zero database hosting that could be
       useful for certain side projects or spiky traffic situations.

       A recurring problem I've faced with side projects is the need for
       Postgres, but no desire to deploy or maintain new instances. So
       when I learned GCP's "always free" tier includes serverless
       (Cloud Run) [1] I got curious to see if I could run a database.

       While a lot of classic databases aren't usually a great fit for
       serverless, Seafowl separates compute, storage and catalog (the
       catalog is a SQLite file of metadata). [2] Last month I was able to
       introduce GCS bucket compatibility to Seafowl, which enabled me to
       mount the catalog via gcsfuse (an adapter that attaches GCS
       buckets to the local filesystem). Upshot: while FUSE does add HTTP
       requests to container startup, init time remains comparatively
       quick, even on cold starts, because fetching is limited to the
       single catalog SQLite file.

       With this approach you get a URL you can query directly from your
       frontend if you want, e.g. fetch() can send SELECT * ... queries
       straight from your users' browsers. You could plot a graph from a
       static React frontend, or the observablehq.com editor, with no
       persistent backend needed. So when nobody's using your app, 100%
       of your stack can scale to zero, with obvious cloud spend
       advantages. And even if you exceed the free tier limits, being
       pay-as-you-go offers a good chance you'll come out ahead on
       hosting costs anyway.

       NB: Seafowl is an early-stage project, so it's not really suitable
       if you need transactions or fast single-row writes. Otherwise, this
       could be a nice way to get free database hosting at a big-three
       cloud provider, especially for e.g. read-only analytical reporting
       queries.

       Feedback and suggestions are appreciated. Hope it helps you! More
       details are available if you want [3].

       [0] https://seafowl.io/docs/getting-started/introduction

       [1] https://cloud.google.com/run/pricing#cpu-requests

       [2] Neon is another interesting project that separates compute and
       storage. https://neon.tech/blog/architecture-decisions-in-neon
       One issue I observed was a noticeably longer startup time vs this
       FUSE approach, which I believe may be related to Postgres
       connection setup time/roundtrips. Looking forward to trying Neon
       again in future.

       [3] https://www.splitgraph.com/blog/deploying-serverless-seafowl
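
       The "query it over a URL" idea from the post can be sketched
       roughly as below. This is a minimal Python sketch rather than
       the browser fetch() call itself, and it rests on assumptions:
       that the instance accepts a JSON body of the form
       {"query": "..."} via POST on a /q route and answers with
       newline-delimited JSON (one object per row). SEAFOWL_URL is a
       hypothetical placeholder, and the helper names are made up for
       illustration. Check the Seafowl docs for the exact API.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL of your own deployment.
SEAFOWL_URL = "https://seafowl.example.invalid"


def build_query_request(base_url: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one SQL query.

    Assumption: queries are accepted as JSON on the /q route.
    """
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/q",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_rows(payload: str) -> list[dict]:
    """Parse a newline-delimited JSON payload into a list of row dicts."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


def run_query(base_url: str, sql: str) -> list[dict]:
    """Send the query and return its rows (needs a reachable instance)."""
    with urllib.request.urlopen(build_query_request(base_url, sql)) as resp:
        return parse_rows(resp.read().decode("utf-8"))
```

       Only run_query() touches the network, e.g.
       run_query(SEAFOWL_URL, "SELECT 1"). The two helpers above it
       are pure functions, so the request shape can be checked without
       a running instance.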
        
       Author : paws
       Score  : 44 points
       Date   : 2023-06-06 16:18 UTC (1 day ago)
        
 (HTM) web link (github.com)
        
       | aleda145 wrote:
       | I use Seafowl hosted on Cloud Run for a side project on Swedish
       | real estate data. Around a million rows, and Seafowl works great!
       | 
       | One killer feature (aside from scaling to zero) is that the
       | queries can be constructed as GET requests. That means we can
       | cache the query results with Cloudflare.
       | 
       | I have it exposed here if you want to write some SQL and check it
       | out live: https://bostadsbussen.se/sold/query
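       | 
       | A rough sketch of why GET-based queries cache well: if the URL
       | is derived from a hash of the SQL text, identical queries map
       | to the same URL, which a CDN can use as a cache key. The
       | /q/<sha256> route and the X-Seafowl-Query header below are
       | assumptions based on a reading of the Seafowl docs, and
       | build_cached_get is a hypothetical helper name. Verify the
       | details against the docs before relying on them.

```python
import hashlib
import urllib.parse


def build_cached_get(base_url: str, sql: str) -> tuple[str, dict]:
    """Return (url, headers) for a cache-friendly GET query.

    Assumptions: the URL ends in the SHA-256 hex digest of the trimmed
    query text, and the query itself travels URL-encoded in a header.
    """
    query = sql.strip()
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
    url = f"{base_url.rstrip('/')}/q/{digest}"
    headers = {"X-Seafowl-Query": urllib.parse.quote(query)}
    return url, headers
```

       | Because the URL depends only on the query text, two users
       | running the same report hit the same cache entry, while any
       | change to the SQL produces a different URL.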
        
         | paws wrote:
          | Glad to hear you've had good results! Yes, @mildbyte did a
          | great job making Seafowl comply with HTTP cache semantics (i.e.
          | ETag/Cache-Control headers), so it should work well with both
          | CDNs and browsers. When building Open Data Monitor [0] I
          | certainly observed some nice speedups.
          | 
          | For those interested in how caching works (e.g. if your dataset
          | is public it could be an easy win), more info is in the docs [1].
          | 
          | [0] https://open-data-monitor.splitgraph.io/week/2023-05-22
          | 
          | It's a Socrata scraper that renders diffs of public/government
          | datasets.
          | 
          | [1] https://seafowl.io/docs/getting-started/tutorial-fly-io/part...
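          | 
          | For readers unfamiliar with ETag revalidation, here is a
          | minimal sketch of the client side of standard HTTP caching.
          | It is generic HTTP behaviour, not Seafowl-specific code, and
          | the names CachedResult, revalidation_headers and
          | resolve_response are made up for illustration. The client
          | sends the stored ETag in If-None-Match, and a 304 reply
          | means the cached body can be reused without re-downloading.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedResult:
    """A previously fetched query result and the ETag it was served with."""
    etag: str
    body: str


def revalidation_headers(cached: Optional[CachedResult]) -> dict:
    """Headers asking the server to skip the body if our copy is current."""
    return {"If-None-Match": cached.etag} if cached else {}


def resolve_response(
    status: int,
    etag: Optional[str],
    body: str,
    cached: Optional[CachedResult],
) -> CachedResult:
    """Pick the result to use after a (re)validation round trip."""
    if status == 304:
        if cached is None:
            raise ValueError("304 Not Modified received without a cached copy")
        return cached  # unchanged on the server: reuse the cached body
    if status == 200:
        # New or changed result: keep it with its ETag for next time.
        return CachedResult(etag=etag or "", body=body)
    raise ValueError(f"unexpected HTTP status {status}")
```

          | Browsers and CDNs do this automatically for GET requests,
          | so a sketch like this only matters if you write your own
          | client.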
        
       ___________________________________________________________________
       (page generated 2023-06-07 23:01 UTC)