[HN Gopher] Dbdev - A database package manager for PostgreSQL tr...
       ___________________________________________________________________
        
       Dbdev - A database package manager for PostgreSQL trusted language
       extensions
        
       Author : samwillis
       Score  : 101 points
       Date   : 2023-04-14 15:38 UTC (7 hours ago)
        
 (HTM) web link (database.dev)
 (TXT) w3m dump (database.dev)
        
       | koolba wrote:
       | Is this like pgxn but specific to trusted languages so presumably
       | less installation issues?
        
         | oliverrice wrote:
         | yeah, exactly. pgxn also houses native extensions, some of
         | which may be TLE compatible. database.dev is trusted language
         | extension specific
         | 
         | We've reached out to David, who runs pgxn to see how we might
         | collaborate and allow pgxn to be used as another registry that
         | users can install TLEs from
        
       | mmmeff wrote:
       | What's stopping me from adding an extension to this repository
       | that adds something valuable while simultaneously exfiltrating
       | entire databases back to me?
       | 
       | It's a cool idea, but I won't be the first to try it. You're
       | placing an awful lot of trust in these modules.
        
         | thewataccount wrote:
         | Nothing? It's a classic supply chain attack.
         | 
         | The same goes for pypi, npm, crates.io, docker.com, github,
         | chrome extensions, etc.
         | 
         | Generally it comes down to trusted authors/package names. Many
         | people trust `requests` from pypi. That being said even this
         | doesn't prevent the package to be sold, or hacked, or from the
         | owner later adding things, or a 3rd party sneaking something in
         | via a PR. Some like docker (and technically most git based ones
         | because of md5) can have specific tags changed at any time, so
         | you could later add a malicious package to version 3.0.4 or
         | whatever.
         | 
         | Many package managers (mostly linux ones) include key-signing,
         | some package managers also have built-in scanners. Preferably
         | you proxy/cache versions you need, and have a way to audit
         | version upgrades - but in reality it's a lot of work to do
         | properly.
         | 
         | Recent example of `pytorch` nightly on pypi accidentally having
         | a dependancy named `torchtriton` that I think was name
         | squatting? Anyway installing the pytorch nightly between
         | December 25th to 30th would result in your home directory being
         | uploaded, etc.
         | https://www.bleepingcomputer.com/news/security/pytorch-discl...
         | 
         | EDIT: That said if anyone from supabase reads this - I'd love
         | to see GPG signing, version hashing/integrity verification, 2FA
         | if it's not their already, and I know this one is a lot of
         | extra work - but any type of automated even cursory glance for
         | malicious code.
        
           | michelpp wrote:
           | > That said if anyone from supabase reads this - I'd love to
           | see GPG signing, version hashing/integrity verification, 2FA
           | if it's not their already
           | 
           | Yep, we've got all these on the road map, thanks!
           | 
           | > and I know this one is a lot of extra work - but any type
           | of automated even cursory glance for malicious code.
           | 
           | Unfortunately a "cursory glance" is trivially sidestepped,
           | and anything complete enough to be useful is probably
           | impossible. Maybe you want to write an extension that
           | automates logically replicating your data off the server, or
           | truncates a table after processing it, to you that's useful,
           | to someone else that's malicious, there's just no way to
           | know.
           | 
           | If you don't trust the authors then the only way to trust a
           | TLE is to inspect the code yourself in the context of what
           | "safe" means to you. The pytorch example you mentioned was
           | very likely caught by a human, if an automated system were
           | possible for python (but likely impossible) it would have
           | caught it sooner.
        
         | michelpp wrote:
         | Like any package or plugin for any language or framework, you
         | are responsible for understanding what you are installing.
         | Since TLEs do not come with any compiled code, this is
         | relatively easy for you to verify yourself with code
         | inspection.
         | 
         | We expect a lot of DBAs to say "no way" to something like TLEs,
         | that's fine, for similar reasons people don't use pip, npm, or
         | cpan. You make your own risk/benefit analysis.
        
       | bashinator wrote:
       | You talking with the PostGIS folks?
        
         | oliverrice wrote:
         | PostGIS is a native extension mostly written in C. Since there
         | is no C trusted language it would be hard to fit PostGIS into
         | the TLE paradigm
         | 
         | Its less of a problem for well known and trusted extensions
         | like PostGIS though because it comes pre-installed on most
         | hosted providers (Supabase, RDS, etc)
        
       | mike_hearn wrote:
       | At KotlinConf today I gave a talk on designing apps with two-tier
       | architecture, where you implement your entire app without the web
       | stack appearing anywhere at all. Instead you publish desktop and
       | mobile apps that connect directly to an RDBMS like PostgreSQL via
       | its native protocol, and use server extensions for any logic that
       | is inconvenient to do with SQL.
       | 
       | This approach might seem horrifyingly outside-the-box but has a
       | lot of advantages, and some of the reasons we didn't do things
       | this way historically have been solved in recent years.
       | 
       | Because it was KotlinConf the demo uses PL/Java, which is pretty
       | nice because there's such a healthy ecosystem of stuff based
       | around JDBC and because deploying JVM stuff doesn't require any
       | sort of cross-compilation. PL/Java also supports (for now)
       | trusted extensions using sandboxing, although of course the
       | sandbox can just get in the way and normally you trust your own
       | server anyway so this is a double edged sword.
       | 
       | The demo code can be found here (it's a prototype and nobody
       | reviewed it yet so be gentle)
       | 
       | https://github.com/hydraulic-software/bugzino
       | 
       | I'll write up a blog post version of the talk, but for now I had
       | to mention that DBaaS providers don't actually enable this sort
       | of design because they like to wall off the full power of the
       | RDBMS behind custom APIs. But in two-tier design you really lean
       | into the database and use all of its features. So, it'd be nice
       | if:
       | 
       | a. database.dev were to support PL/Java extensions.
       | 
       | b. Supabase were to allow direct connections, as the native DB
       | protocol supports a _lot_ of features that otherwise have to be
       | sort of hacked on top of HTTP. Ultimately, HTTP is designed to
       | fetch hypertext whereas the PG native protocol is designed to
       | work with data, and that difference shines through in a bunch of
       | ways.
        
         | saghm wrote:
         | > This approach might seem horrifyingly outside-the-box but has
         | a lot of advantages, and some of the reasons we didn't do
         | things this way historically have been solved in recent years.
         | 
         | From a development standpoint, I definitely think this seems
         | easier! My concerns about this would be almost entirely about
         | security. How does this approach avoid leaking credentials that
         | people could use to access the database outside of the
         | application? More generally, how does this approach deal with
         | the increased attack surface for the database from exposing it
         | to the open internet?
        
           | mike_hearn wrote:
           | See my other post for a discussion of non-user security
           | topics.
           | 
           | Every app user has a database user, mapped 1:1, so there are
           | no credentials to leak. The DB credentials are the user
           | credentials. In the demo app repository I implement open
           | signup with email confirmation using stored procedures/a
           | server extension. It works by creating a guest user with a
           | well known password, but it's locked down so that the only
           | things it can do are run a couple of stored procedures. Those
           | then send an email to confirm ownership of the address before
           | creating a real DB user.
           | 
           | Open work: don't use passwords, mint SSL client certificates
           | during signup with the private key stored in the OS keystore.
           | Now there's no cookies or passwords to steal! Also, OAuth
           | integration.
           | 
           | The guest user / open signup is certainly a weakness for the
           | demo, which is public on the internet (a dedicated cluster
           | though). It's locked down as much as possible but there are
           | probably lots of ways to screw with the server even without
           | being granted access to anything. And there are no rate
           | limits on signup. It's really a very basic exploration.
        
             | conceptme wrote:
             | Connections are pretty expensive in PG how can you do
             | pooling if every user connects directly?
        
               | giovannibonetti wrote:
               | Connection poolers like PgBouncer [1] (traditional) and
               | Supabase's Supavisor [2] (new) come to mind.
               | 
               | [1] https://www.pgbouncer.org/ [2]
               | https://github.com/supabase/supavisor
        
               | mike_hearn wrote:
               | Supavisor looks great. Unclear if it can be configured to
               | use https://github.com/pgaudit/set_user though.
               | 
               | That said, there are quite a lot of CRUD apps in the
               | world that don't need to support lots of simultaneous
               | users, or where you can just add read replicas quite
               | cheaply. Think internal apps, dashboards, etc.
        
         | GordonS wrote:
         | > This approach might seem horrifyingly outside-the-box
         | 
         | I mean, this is how we routinely built client-server apps 20-30
         | years back. Everything old is new again!
         | 
         | One of the big reasons so many moved away from this approach
         | was security. Giving end users direct access to the database
         | (even if "obscured"), gives me the shivers!
        
           | mike_hearn wrote:
           | One of the things that's changed in recent years is that free
           | databases like PostgreSQL have far more security features.
           | You can grant fine grained privileges on individual objects
           | like queries (via views), individual table rows, stored
           | procedures/functions and so on.
           | 
           | In the demo app, I use a simple approach to security: users
           | don't have permissions to directly access any tables. Read
           | access is via views and could use row-level security (RLS),
           | and write access is always via stored procedures. Stored
           | procs can also act as queries, and they can be written in any
           | supported extension language, so any access control and
           | privacy policy is implementable.
           | 
           | Still, security in two-tier apps is a nuanced topic. Some
           | things get better and other things are worse.
           | 
           | Some of the remaining things to solve are:
           | 
           | 1. RDBMS engines tend to be written in C. They're good code
           | and fairly trustworthy, but still, web servers tend to be
           | written in memory safe languages and databases aren't.
           | 
           | 2. DDoS defense companies have more options for protecting
           | web apps (sending js challenges and captchas). If you have a
           | mobile app then you're exposing raw APIs anyway, so this is
           | maybe not a big change.
           | 
           | 3. You have to set resource quotas to stop malicious users
           | submitting expensive queries. Of course this is also a
           | concern with web apps, just translating HTTP to PG doesn't
           | make it impossible to spam servers with expensive queries.
           | 
           | 4. Data can be protected with ACLs pretty well, but schemas
           | not so much. So you could leak an upcoming feature via the
           | appearance of new object names. Web apps sometimes have the
           | same issue with leaking info about new features in their JS
           | though, and the sky doesn't fall.
           | 
           | For many app types the above concerns don't matter much. Any
           | app that's internal or business-to-business for example. If
           | you build auth around client certs then unauthorized users
           | never get to connect at all.
           | 
           | Also! Consider that this design also eliminates many kinds of
           | security issue. The talk goes into this. For example, XSS and
           | SQL injection are eliminated by design! Those have been two
           | of the most common and destructive bug classes over time.
           | Also, web servers do not exactly have a track record of being
           | unhackable, especially as wrapping SQL with HTTP handlers
           | introduces many new places where access checks might be
           | forgotten or incorrectly implemented. A SQL GRANT statement
           | is pretty transparent and the implementation is well tested
           | compared to app specific logic that might have been written
           | by one person and never properly tested.
           | 
           | There's also other security-related benefits, for example,
           | certificate expiry routinely causes massive and eye-
           | bleedingly disruptive outages, but is eliminated as a problem
           | entirely by this design. It works because code signatures are
           | timestamped so don't break when the cert expires, and the db
           | connection can be then be encrypted using a self-signed
           | certificate that never expires. To rotate the key you roll
           | out a new client.
        
             | majkinetor wrote:
             | This is basically how its done today with tools like
             | Postgrest.
        
         | benatkin wrote:
         | > This approach might seem horrifyingly outside-the-box but has
         | a lot of advantages
         | 
         | You don't need to worry, there are a lot of people who are into
         | stored procedures. If there weren't, there wouldn't be people
         | disagreeing with them, like the author of Rails:
         | http://web.archive.org/web/20060418215514/http://www.loudthi...
         | 
         | The slippery slope of liking stored procedures a whole lot
         | means them taking over the rest of the middle of the stack, and
         | that isn't a new idea.
         | 
         | I like the idea of having instant access to the database, and
         | microservices are my favored approach.
        
         | nijave wrote:
         | Are there any performance issues with connection handling? Not
         | sure how/where a connection pooler fits in.
        
           | mwcampbell wrote:
           | Mike has called this out as a weakness of PostgreSQL, and an
           | advantage of Microsoft SQL Server and Oracle, before.
        
           | mike_hearn wrote:
           | This is also discussed in the talk, which I hope will be on
           | YouTube soon enough (it was live streamed).
           | 
           | DB connections are heavier than web server connections, in
           | particular in terms of memory (when using postgres, not sure
           | about mysql). Multiplexers like pgbouncer can address this,
           | but then you lose a bit of the deployment simplicity.
           | Fortunately RAM is pretty cheap nowadays, you can configure
           | the DB to disconnect idle connections, you can use
           | multiplexers and you can make the client transparently
           | reconnect when a connection is needed. So it turns into a
           | RAM/user latency tradeoff.
           | 
           | There are some benefits though. Native DB protocols will tend
           | to stream results in binary, avoiding a JSON encoding that
           | can expand the bandwidth requirements. Also you can get rid
           | of the server side load balancing, because good db drivers
           | know how to load balance amongst R/O replicas already. A lot
           | of deployment complexity in web apps is driven by the fact
           | that browsers handle errors by just throwing up an error page
           | and stopping, so you need to avoid even tiny amounts of
           | downtime. When you control the client you can do failover and
           | recovery more gracefully, which in turn relaxes the
           | availability constraints on the servers a bit.
        
             | nijave wrote:
             | It seems like this uses native Postgres security/roles
             | which I think would prevent a connection pooler from
             | working (unless you held open 1 connection per user)
             | 
             | Ultimately it'd be nice if Postgres could keep a hot worker
             | pool for connections instead of forking processes but afaik
             | there's been plenty of debate over the years and no
             | consensus
             | 
             | I think high connection thrashing also eats up CPU and
             | there's the issue of increasing load eats more RAM which
             | reduces Postgres memory cache. We had a connection leak
             | problem and clearing ~2600 PG connections saved something
             | like 80-100GB of RAM
        
               | mike_hearn wrote:
               | I think you can use https://github.com/pgaudit/set_user
               | combined with some pgbouncer configuration to solve that,
               | but agreed that postgres doesn't have perfect support for
               | this scenario. It'd be better to have a proxy that is
               | designed specifically for this use case that knows how to
               | drive something like the set_user extension (and better
               | if it were to be merged upstream).
        
       | noob4life wrote:
       | How does `index_advisor` work?
        
         | oliverrice wrote:
         | index_advisor is a project that almost made the cut for LW7. We
         | decided to launch it as a pgtle instead to get some feedback
         | before we (spoiler) probably release it in LW8
         | 
         | It uses the hypopg package to very quickly generate an index
         | for each column referenced in the query and then re-run's an
         | explain plan to measure the estimated "total cost".
         | 
         | If the expected "total_cost" reduces, we add the indexes to the
         | list of recommendations and continue testing the remaining
         | columns.
         | 
         | Its currently limited to single column indexes only but in a
         | later update we'll enhance that to do more comprehensive
         | searches with multiple columns in a single index
        
           | claytongulick wrote:
           | Cool idea for an extension!
           | 
           | I had a whole set of comments about concerns about doing this
           | to a large table in prod, and statistics and such - then I
           | read about hypopg.
           | 
           | What a neat package, learned something new today.
        
       | claytongulick wrote:
       | I'd _love_ to see plv8 added to this.
       | 
       | Jerry S. has done an amazing job of wrangling the insane v8 build
       | process into something that mostly works across platforms, but
       | installation is still not particularly simple.
       | 
       | Unfortunately, due to the problems with that build process,
       | Ubuntu and other flavors dropped the prebuilt apt packages.
       | 
       | Making plv8 a simple install would be a huge boost to the
       | community.
        
         | michelpp wrote:
         | Good news for you then! plv8 is a trusted language and works
         | with pg_tle just fine.
         | 
         | EDIT: I think I misunderstood your comment, plv8 itself is a
         | compiled extension and thus cannot be a TLE, however TLEs _can_
         | be written in plv8 once you have it installed. Sorry for the
         | confusion!
        
         | jerrysievert wrote:
         | wow, thanks for the compliment! it hasn't always been easy,
         | believe me.
        
           | LunaSea wrote:
           | Thank you for the amazing work indeed!
           | 
           | I know that at one point the idea of using a non-v8
           | JavaScript engine was floating around, due in part to the V8
           | build complexity and the API breaking changes.
           | 
           | Is this still something that is up in the air?
           | 
           | I know that Bun decided to use JavaScriptCore as a recent
           | example.
        
             | jerrysievert wrote:
             | I have been slowly but surely working on pljs in my spare
             | time. it is nowhere near feature parity with plv8, but I
             | hope to get it closer with more time. currently missing:
             | * query execution/cursors         * window functions
             | * sub transactions
             | 
             | it is based on quickjs, is very fast/easy to compile, and
             | while in long-running functions it is slower than v8, it
             | has a very fast startup time that negates that for shorter
             | functions.
             | 
             | but, it's not ready for prime time yet.
        
           | mike_hearn wrote:
           | That's interesting - could you describe what makes the build
           | and install so hard? I can't think of any obvious reason why
           | it should be harder than PL/Java. V8 is pretty widely used!
        
             | jerrysievert wrote:
             | v8 has its own build chain, which involves ~5gb of
             | downloads. that build chain doesn't work on many platforms,
             | and requires very specific versions of prerequisites
             | including specific glibc versions.
             | 
             | the build chain consists of specific compilers, specific
             | versions of chromium parts and pieces, its own sys root,
             | everything.
             | 
             | since the API for v8 changes on a regular basis (not as
             | much as it has in the past, at least), you cannot simply
             | change out v8 versions, as they likely will not work.
             | 
             | v8 is also compiled differently for each embedding
             | application, and thus a shared object won't work correctly
             | across multiple applications, and besides is strongly
             | advised against by the v8 team.
             | 
             | nodejs has its own build process that they've been
             | maintaining since the v8 3.1.4 days, but it is specific to
             | node.
             | 
             | so, it just adds up to a difficult maintenance.
        
               | mike_hearn wrote:
               | Wow. That's kind of surprising. HotSpot is just
               | autotools: ./configure && make and you're done. Plus the
               | embedding APIs are stable.
               | 
               | Do you have any insight into what led them to have such a
               | complex build? Is it just because it's a part of the
               | Chromium tree? It doesn't seem inherent to the problem
               | space.
        
               | jerrysievert wrote:
               | they use their own build systems for everything, even
               | skia. though skia has a bit less strict build
               | requirements.
        
       | foreigner wrote:
       | Can I use this on AWS RDS?
        
         | oliverrice wrote:
         | RDS does have the underlying native extension that is required
         | to install user defined packages (pg_tle) from database.dev. It
         | doesn't currently have support for the extension that is
         | necessary to make the in-database client work to install
         | packages directly from the registry.
         | 
         | That's great feedback though: We'll add a new page to the
         | website for each package that provides the snippet you can run
         | on an RDS instance to load the extension and then run `create
         | extension`.
         | 
         | We're also in talks with RDS team and experimenting with a few
         | options that would allow users to query the registry directly.
         | The most promising one we have found so far is using AWS Lambda
         | with RDS's aws_lambda extension to do that http part, and pass
         | the contents back to the database. Still a WIP though!
        
         | paulddraper wrote:
         | No.
         | 
         | RDS has a strict list of approved extensions. (Since AWS is on
         | the hook for the system stability, security, replication, etc.)
        
           | supamichelp wrote:
           | RDS for Postgres 14+ _does_ come with pg_tle, but it does not
           | come with pgsql-http, so you cannot use the dbdev installer.
           | But, you _can_ still install TLEs, you just have to do so
           | "by hand" by calling `pgtle.install_extension()` yourself,
           | probably from a migration script depending on your tooling.
           | 
           | See the RDS docs here: https://docs.aws.amazon.com/AmazonRDS/
           | latest/UserGuide/Postg...
        
           | samwillis wrote:
           | Partially correct, this doesn't work with RDS as you can't
           | install pgsql-http (a requirement for automatic install).
           | However all the extensions listed are "Trusted Language
           | Extensions for PostgreSQL" which was developed by AWS to
           | enable installing extensions written in "Safe" languages. So
           | I believe that all the extensions listed could be installed
           | manually.
           | 
           | https://github.com/aws/pg_tle
           | 
           | https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Postg.
           | ..
        
             | paulddraper wrote:
             | Oh cool!
        
       | [deleted]
        
       | nextaccountic wrote:
       | From https://supabase.com/blog/dbdev
       | 
       | > dbdev fills the same role for PostgreSQL as npm for JavaScript,
       | pip for Python and cargo for Rust in that it enables publishing
       | libraries and applications for repeatable deployment. We'll be
       | releasing the tooling necessary for third-parties to publish
       | pglets to the registry once we've collected some community
       | feedback and incorporate any great new ideas. Our goal is to
       | create an open ecosystem for packaging and discovering SQL.
       | 
       | How does https://pgxn.org/ fit in this ecosystem?
        
         | supamichelp wrote:
         | pgxn does not (yet) support the notion of TLEs, but it
         | certainly could with minimal changes. We're currently in
         | discussion with the pgxn maintainers about how that would look.
        
           | ccleve wrote:
           | TLEs = trusted language extensions?
        
             | [deleted]
        
             | supamichelp wrote:
             | Yes.
        
       ___________________________________________________________________
       (page generated 2023-04-14 23:00 UTC)