[HN Gopher] Dbdev - A database package manager for PostgreSQL tr... ___________________________________________________________________ Dbdev - A database package manager for PostgreSQL trusted language extensions Author : samwillis Score : 101 points Date : 2023-04-14 15:38 UTC (7 hours ago) (HTM) web link (database.dev) (TXT) w3m dump (database.dev) | koolba wrote: | Is this like pgxn but specific to trusted languages so presumably | less installation issues? | oliverrice wrote: | yeah, exactly. pgxn also houses native extensions, some of | which may be TLE compatible. database.dev is trusted language | extension specific | | We've reached out to David, who runs pgxn to see how we might | collaborate and allow pgxn to be used as another registry that | users can install TLEs from | mmmeff wrote: | What's stopping me from adding an extension to this repository | that adds something valuable while simultaneously exfiltrating | entire databases back to me? | | It's a cool idea, but I won't be the first to try it. You're | placing an awful lot of trust in these modules. | thewataccount wrote: | Nothing? It's a classic supply chain attack. | | The same goes for pypi, npm, crates.io, docker.com, github, | chrome extensions, etc. | | Generally it comes down to trusted authors/package names. Many | people trust `requests` from pypi. That being said even this | doesn't prevent the package to be sold, or hacked, or from the | owner later adding things, or a 3rd party sneaking something in | via a PR. Some like docker (and technically most git based ones | because of md5) can have specific tags changed at any time, so | you could later add a malicious package to version 3.0.4 or | whatever. | | Many package managers (mostly linux ones) include key-signing, | some package managers also have built-in scanners. Preferably | you proxy/cache versions you need, and have a way to audit | version upgrades - but in reality it's a lot of work to do | properly. | | Recent example of `pytorch` nightly on pypi accidentally having | a dependancy named `torchtriton` that I think was name | squatting? Anyway installing the pytorch nightly between | December 25th to 30th would result in your home directory being | uploaded, etc. | https://www.bleepingcomputer.com/news/security/pytorch-discl... | | EDIT: That said if anyone from supabase reads this - I'd love | to see GPG signing, version hashing/integrity verification, 2FA | if it's not their already, and I know this one is a lot of | extra work - but any type of automated even cursory glance for | malicious code. | michelpp wrote: | > That said if anyone from supabase reads this - I'd love to | see GPG signing, version hashing/integrity verification, 2FA | if it's not their already | | Yep, we've got all these on the road map, thanks! | | > and I know this one is a lot of extra work - but any type | of automated even cursory glance for malicious code. | | Unfortunately a "cursory glance" is trivially sidestepped, | and anything complete enough to be useful is probably | impossible. Maybe you want to write an extension that | automates logically replicating your data off the server, or | truncates a table after processing it, to you that's useful, | to someone else that's malicious, there's just no way to | know. | | If you don't trust the authors then the only way to trust a | TLE is to inspect the code yourself in the context of what | "safe" means to you. The pytorch example you mentioned was | very likely caught by a human, if an automated system were | possible for python (but likely impossible) it would have | caught it sooner. | michelpp wrote: | Like any package or plugin for any language or framework, you | are responsible for understanding what you are installing. | Since TLEs do not come with any compiled code, this is | relatively easy for you to verify yourself with code | inspection. | | We expect a lot of DBAs to say "no way" to something like TLEs, | that's fine, for similar reasons people don't use pip, npm, or | cpan. You make your own risk/benefit analysis. | bashinator wrote: | You talking with the PostGIS folks? | oliverrice wrote: | PostGIS is a native extension mostly written in C. Since there | is no C trusted language it would be hard to fit PostGIS into | the TLE paradigm | | Its less of a problem for well known and trusted extensions | like PostGIS though because it comes pre-installed on most | hosted providers (Supabase, RDS, etc) | mike_hearn wrote: | At KotlinConf today I gave a talk on designing apps with two-tier | architecture, where you implement your entire app without the web | stack appearing anywhere at all. Instead you publish desktop and | mobile apps that connect directly to an RDBMS like PostgreSQL via | its native protocol, and use server extensions for any logic that | is inconvenient to do with SQL. | | This approach might seem horrifyingly outside-the-box but has a | lot of advantages, and some of the reasons we didn't do things | this way historically have been solved in recent years. | | Because it was KotlinConf the demo uses PL/Java, which is pretty | nice because there's such a healthy ecosystem of stuff based | around JDBC and because deploying JVM stuff doesn't require any | sort of cross-compilation. PL/Java also supports (for now) | trusted extensions using sandboxing, although of course the | sandbox can just get in the way and normally you trust your own | server anyway so this is a double edged sword. | | The demo code can be found here (it's a prototype and nobody | reviewed it yet so be gentle) | | https://github.com/hydraulic-software/bugzino | | I'll write up a blog post version of the talk, but for now I had | to mention that DBaaS providers don't actually enable this sort | of design because they like to wall off the full power of the | RDBMS behind custom APIs. But in two-tier design you really lean | into the database and use all of its features. So, it'd be nice | if: | | a. database.dev were to support PL/Java extensions. | | b. Supabase were to allow direct connections, as the native DB | protocol supports a _lot_ of features that otherwise have to be | sort of hacked on top of HTTP. Ultimately, HTTP is designed to | fetch hypertext whereas the PG native protocol is designed to | work with data, and that difference shines through in a bunch of | ways. | saghm wrote: | > This approach might seem horrifyingly outside-the-box but has | a lot of advantages, and some of the reasons we didn't do | things this way historically have been solved in recent years. | | From a development standpoint, I definitely think this seems | easier! My concerns about this would be almost entirely about | security. How does this approach avoid leaking credentials that | people could use to access the database outside of the | application? More generally, how does this approach deal with | the increased attack surface for the database from exposing it | to the open internet? | mike_hearn wrote: | See my other post for a discussion of non-user security | topics. | | Every app user has a database user, mapped 1:1, so there are | no credentials to leak. The DB credentials are the user | credentials. In the demo app repository I implement open | signup with email confirmation using stored procedures/a | server extension. It works by creating a guest user with a | well known password, but it's locked down so that the only | things it can do are run a couple of stored procedures. Those | then send an email to confirm ownership of the address before | creating a real DB user. | | Open work: don't use passwords, mint SSL client certificates | during signup with the private key stored in the OS keystore. | Now there's no cookies or passwords to steal! Also, OAuth | integration. | | The guest user / open signup is certainly a weakness for the | demo, which is public on the internet (a dedicated cluster | though). It's locked down as much as possible but there are | probably lots of ways to screw with the server even without | being granted access to anything. And there are no rate | limits on signup. It's really a very basic exploration. | conceptme wrote: | Connections are pretty expensive in PG how can you do | pooling if every user connects directly? | giovannibonetti wrote: | Connection poolers like PgBouncer [1] (traditional) and | Supabase's Supavisor [2] (new) come to mind. | | [1] https://www.pgbouncer.org/ [2] | https://github.com/supabase/supavisor | mike_hearn wrote: | Supavisor looks great. Unclear if it can be configured to | use https://github.com/pgaudit/set_user though. | | That said, there are quite a lot of CRUD apps in the | world that don't need to support lots of simultaneous | users, or where you can just add read replicas quite | cheaply. Think internal apps, dashboards, etc. | GordonS wrote: | > This approach might seem horrifyingly outside-the-box | | I mean, this is how we routinely built client-server apps 20-30 | years back. Everything old is new again! | | One of the big reasons so many moved away from this approach | was security. Giving end users direct access to the database | (even if "obscured"), gives me the shivers! | mike_hearn wrote: | One of the things that's changed in recent years is that free | databases like PostgreSQL have far more security features. | You can grant fine grained privileges on individual objects | like queries (via views), individual table rows, stored | procedures/functions and so on. | | In the demo app, I use a simple approach to security: users | don't have permissions to directly access any tables. Read | access is via views and could use row-level security (RLS), | and write access is always via stored procedures. Stored | procs can also act as queries, and they can be written in any | supported extension language, so any access control and | privacy policy is implementable. | | Still, security in two-tier apps is a nuanced topic. Some | things get better and other things are worse. | | Some of the remaining things to solve are: | | 1. RDBMS engines tend to be written in C. They're good code | and fairly trustworthy, but still, web servers tend to be | written in memory safe languages and databases aren't. | | 2. DDoS defense companies have more options for protecting | web apps (sending js challenges and captchas). If you have a | mobile app then you're exposing raw APIs anyway, so this is | maybe not a big change. | | 3. You have to set resource quotas to stop malicious users | submitting expensive queries. Of course this is also a | concern with web apps, just translating HTTP to PG doesn't | make it impossible to spam servers with expensive queries. | | 4. Data can be protected with ACLs pretty well, but schemas | not so much. So you could leak an upcoming feature via the | appearance of new object names. Web apps sometimes have the | same issue with leaking info about new features in their JS | though, and the sky doesn't fall. | | For many app types the above concerns don't matter much. Any | app that's internal or business-to-business for example. If | you build auth around client certs then unauthorized users | never get to connect at all. | | Also! Consider that this design also eliminates many kinds of | security issue. The talk goes into this. For example, XSS and | SQL injection are eliminated by design! Those have been two | of the most common and destructive bug classes over time. | Also, web servers do not exactly have a track record of being | unhackable, especially as wrapping SQL with HTTP handlers | introduces many new places where access checks might be | forgotten or incorrectly implemented. A SQL GRANT statement | is pretty transparent and the implementation is well tested | compared to app specific logic that might have been written | by one person and never properly tested. | | There's also other security-related benefits, for example, | certificate expiry routinely causes massive and eye- | bleedingly disruptive outages, but is eliminated as a problem | entirely by this design. It works because code signatures are | timestamped so don't break when the cert expires, and the db | connection can be then be encrypted using a self-signed | certificate that never expires. To rotate the key you roll | out a new client. | majkinetor wrote: | This is basically how its done today with tools like | Postgrest. | benatkin wrote: | > This approach might seem horrifyingly outside-the-box but has | a lot of advantages | | You don't need to worry, there are a lot of people who are into | stored procedures. If there weren't, there wouldn't be people | disagreeing with them, like the author of Rails: | http://web.archive.org/web/20060418215514/http://www.loudthi... | | The slippery slope of liking stored procedures a whole lot | means them taking over the rest of the middle of the stack, and | that isn't a new idea. | | I like the idea of having instant access to the database, and | microservices are my favored approach. | nijave wrote: | Are there any performance issues with connection handling? Not | sure how/where a connection pooler fits in. | mwcampbell wrote: | Mike has called this out as a weakness of PostgreSQL, and an | advantage of Microsoft SQL Server and Oracle, before. | mike_hearn wrote: | This is also discussed in the talk, which I hope will be on | YouTube soon enough (it was live streamed). | | DB connections are heavier than web server connections, in | particular in terms of memory (when using postgres, not sure | about mysql). Multiplexers like pgbouncer can address this, | but then you lose a bit of the deployment simplicity. | Fortunately RAM is pretty cheap nowadays, you can configure | the DB to disconnect idle connections, you can use | multiplexers and you can make the client transparently | reconnect when a connection is needed. So it turns into a | RAM/user latency tradeoff. | | There are some benefits though. Native DB protocols will tend | to stream results in binary, avoiding a JSON encoding that | can expand the bandwidth requirements. Also you can get rid | of the server side load balancing, because good db drivers | know how to load balance amongst R/O replicas already. A lot | of deployment complexity in web apps is driven by the fact | that browsers handle errors by just throwing up an error page | and stopping, so you need to avoid even tiny amounts of | downtime. When you control the client you can do failover and | recovery more gracefully, which in turn relaxes the | availability constraints on the servers a bit. | nijave wrote: | It seems like this uses native Postgres security/roles | which I think would prevent a connection pooler from | working (unless you held open 1 connection per user) | | Ultimately it'd be nice if Postgres could keep a hot worker | pool for connections instead of forking processes but afaik | there's been plenty of debate over the years and no | consensus | | I think high connection thrashing also eats up CPU and | there's the issue of increasing load eats more RAM which | reduces Postgres memory cache. We had a connection leak | problem and clearing ~2600 PG connections saved something | like 80-100GB of RAM | mike_hearn wrote: | I think you can use https://github.com/pgaudit/set_user | combined with some pgbouncer configuration to solve that, | but agreed that postgres doesn't have perfect support for | this scenario. It'd be better to have a proxy that is | designed specifically for this use case that knows how to | drive something like the set_user extension (and better | if it were to be merged upstream). | noob4life wrote: | How does `index_advisor` work? | oliverrice wrote: | index_advisor is a project that almost made the cut for LW7. We | decided to launch it as a pgtle instead to get some feedback | before we (spoiler) probably release it in LW8 | | It uses the hypopg package to very quickly generate an index | for each column referenced in the query and then re-run's an | explain plan to measure the estimated "total cost". | | If the expected "total_cost" reduces, we add the indexes to the | list of recommendations and continue testing the remaining | columns. | | Its currently limited to single column indexes only but in a | later update we'll enhance that to do more comprehensive | searches with multiple columns in a single index | claytongulick wrote: | Cool idea for an extension! | | I had a whole set of comments about concerns about doing this | to a large table in prod, and statistics and such - then I | read about hypopg. | | What a neat package, learned something new today. | claytongulick wrote: | I'd _love_ to see plv8 added to this. | | Jerry S. has done an amazing job of wrangling the insane v8 build | process into something that mostly works across platforms, but | installation is still not particularly simple. | | Unfortunately, due to the problems with that build process, | Ubuntu and other flavors dropped the prebuilt apt packages. | | Making plv8 a simple install would be a huge boost to the | community. | michelpp wrote: | Good news for you then! plv8 is a trusted language and works | with pg_tle just fine. | | EDIT: I think I misunderstood your comment, plv8 itself is a | compiled extension and thus cannot be a TLE, however TLEs _can_ | be written in plv8 once you have it installed. Sorry for the | confusion! | jerrysievert wrote: | wow, thanks for the compliment! it hasn't always been easy, | believe me. | LunaSea wrote: | Thank you for the amazing work indeed! | | I know that at one point the idea of using a non-v8 | JavaScript engine was floating around, due in part to the V8 | build complexity and the API breaking changes. | | Is this still something that is up in the air? | | I know that Bun decided to use JavaScriptCore as a recent | example. | jerrysievert wrote: | I have been slowly but surely working on pljs in my spare | time. it is nowhere near feature parity with plv8, but I | hope to get it closer with more time. currently missing: | * query execution/cursors * window functions | * sub transactions | | it is based on quickjs, is very fast/easy to compile, and | while in long-running functions it is slower than v8, it | has a very fast startup time that negates that for shorter | functions. | | but, it's not ready for prime time yet. | mike_hearn wrote: | That's interesting - could you describe what makes the build | and install so hard? I can't think of any obvious reason why | it should be harder than PL/Java. V8 is pretty widely used! | jerrysievert wrote: | v8 has its own build chain, which involves ~5gb of | downloads. that build chain doesn't work on many platforms, | and requires very specific versions of prerequisites | including specific glibc versions. | | the build chain consists of specific compilers, specific | versions of chromium parts and pieces, its own sys root, | everything. | | since the API for v8 changes on a regular basis (not as | much as it has in the past, at least), you cannot simply | change out v8 versions, as they likely will not work. | | v8 is also compiled differently for each embedding | application, and thus a shared object won't work correctly | across multiple applications, and besides is strongly | advised against by the v8 team. | | nodejs has its own build process that they've been | maintaining since the v8 3.1.4 days, but it is specific to | node. | | so, it just adds up to a difficult maintenance. | mike_hearn wrote: | Wow. That's kind of surprising. HotSpot is just | autotools: ./configure && make and you're done. Plus the | embedding APIs are stable. | | Do you have any insight into what led them to have such a | complex build? Is it just because it's a part of the | Chromium tree? It doesn't seem inherent to the problem | space. | jerrysievert wrote: | they use their own build systems for everything, even | skia. though skia has a bit less strict build | requirements. | foreigner wrote: | Can I use this on AWS RDS? | oliverrice wrote: | RDS does have the underlying native extension that is required | to install user defined packages (pg_tle) from database.dev. It | doesn't currently have support for the extension that is | necessary to make the in-database client work to install | packages directly from the registry. | | That's great feedback though: We'll add a new page to the | website for each package that provides the snippet you can run | on an RDS instance to load the extension and then run `create | extension`. | | We're also in talks with RDS team and experimenting with a few | options that would allow users to query the registry directly. | The most promising one we have found so far is using AWS Lambda | with RDS's aws_lambda extension to do that http part, and pass | the contents back to the database. Still a WIP though! | paulddraper wrote: | No. | | RDS has a strict list of approved extensions. (Since AWS is on | the hook for the system stability, security, replication, etc.) | supamichelp wrote: | RDS for Postgres 14+ _does_ come with pg_tle, but it does not | come with pgsql-http, so you cannot use the dbdev installer. | But, you _can_ still install TLEs, you just have to do so | "by hand" by calling `pgtle.install_extension()` yourself, | probably from a migration script depending on your tooling. | | See the RDS docs here: https://docs.aws.amazon.com/AmazonRDS/ | latest/UserGuide/Postg... | samwillis wrote: | Partially correct, this doesn't work with RDS as you can't | install pgsql-http (a requirement for automatic install). | However all the extensions listed are "Trusted Language | Extensions for PostgreSQL" which was developed by AWS to | enable installing extensions written in "Safe" languages. So | I believe that all the extensions listed could be installed | manually. | | https://github.com/aws/pg_tle | | https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Postg. | .. | paulddraper wrote: | Oh cool! | [deleted] | nextaccountic wrote: | From https://supabase.com/blog/dbdev | | > dbdev fills the same role for PostgreSQL as npm for JavaScript, | pip for Python and cargo for Rust in that it enables publishing | libraries and applications for repeatable deployment. We'll be | releasing the tooling necessary for third-parties to publish | pglets to the registry once we've collected some community | feedback and incorporate any great new ideas. Our goal is to | create an open ecosystem for packaging and discovering SQL. | | How does https://pgxn.org/ fit in this ecosystem? | supamichelp wrote: | pgxn does not (yet) support the notion of TLEs, but it | certainly could with minimal changes. We're currently in | discussion with the pgxn maintainers about how that would look. | ccleve wrote: | TLEs = trusted language extensions? | [deleted] | supamichelp wrote: | Yes. ___________________________________________________________________ (page generated 2023-04-14 23:00 UTC)