[HN Gopher] PGX: Write Postgres extensions in Rust instead of C ___________________________________________________________________ PGX: Write Postgres extensions in Rust instead of C Author : adamnemecek Score : 226 points Date : 2020-07-13 14:27 UTC (8 hours ago) (HTM) web link (github.com) (TXT) w3m dump (github.com) | jarym wrote: | Guys, this is really really cool. Been looking to learn Rust and | it just so happens I need to write a Postgres extension in the | near future so two birds and one stone. | | Btw - Also, I've seen ZomboDB and while I've not used it I must | say it looks EXCELLENT. So congrats on that too. | zombodb wrote: | _blush_ | [deleted] | rukittenme wrote: | Just out of curiosity, what sort of requirements compel a person | to write a postgres extension? I know of things like PostGIS but | I'm not well versed enough in this world to know the totality of | the problem. | gshulegaard wrote: | This is a sort of broad question, but if I were to try and | answer I would say that if there is anything you want to be | handled by the database you would gravitate towards an | extension. | | PostGIS is a great example since one of the most well known it | additions it makes is a datatype, but that's not the only thing | it adds. A datatype on it's own wouldn't be that useful without | Spatial Indexing (https://postgis.net/workshops/postgis- | intro/indexing.html) or spatial utilities to make spatial joins | more expressive (https://postgis.net/workshops/postgis- | intro/joins_exercises....). | | Rich data types aren't the only candidates for extensions, | automated partition management | (https://github.com/pgpartman/pg_partman), data sharding | (https://www.citusdata.com/), or even database cron scheduling | (https://github.com/citusdata/pg_cron) are also good examples | of things that are well suited to be extensions. | | To be honest, I have a hard time imagining alternative paths | where some of this functionality _isn't_ an extension. You | might imagine an external daemon for things like pg_cron and | pg_partman, but if PostGIS wasn't an extension you would | probably be using a different database/tool (or a fork) if you | had geospatial requirements. It's worth noting that Citus was a | fork of PostgreSQL before it was refactored to be an extension. | | I know this isn't a direct answer to your question, but | hopefully you find it somewhat useful. | | Edit: I totally forgot to add 1 more thing about extensions. | When functionality is packaged as an extension there is a much | higher likelihood you can mix and match them. For example, I | currently run a PostgreSQL cluster using Citus, with pg_partman | for automatic time-based partitioning, pg_cron regularly | scheduling partition creation, and PostGIS for geospatial | datatypes. You could extrapolate the various ways you might use | this kind of setup...one that jumps to my mind is a scalable | GPS history that could be queried by time period as well as by | geographic region. | zombodb wrote: | Yeah, "rich data types" is a good point. | | pgx provides a #[derive(PostgresType)] macro that lets you | represent any normal Rust struct (that serde_cbor can | (de)serialize) as a Postgres type. You write no code. | | It even generates all the boring SQL boilerplate for you. | | I plan on putting together an example about this and doing a | twitch stream this week to discuss in detail. | derefr wrote: | The most common reason to write a PG extension is to add native | support for a structure/data type that PG doesn't already have | native support for. | | For example, despite Postgres having a (base-10000) | numeric/decimal type, there are certain things that can be done | much more cheaply in base-256 (e.g. ser/des from bit-packed | and/or hex strings), and thus pgmp | (https://github.com/dvarrazzo/pgmp) exists as a Postgres libgmp | binding. | | There are also "data service" type extensions--exposing some | other data "into" Postgres so it can participate in a query. | PG's own pg_stat_statements extension is an example. Postgres's | support for Foreign Data Wrappers has mostly obviated writing | these as extensions, though; it's much easier (and safer for | your RDBMS-side data!) to just write these as standalone | daemons that Postgres mounts as FDW servers. | cdaringe wrote: | We have to write a Json logger versus the default log format | zombodb wrote: | `pgx` would let you do that. | | You could hook Postgres "emit_log_hook" and probably just use | serde to xform the provided "ErrorData" pointer right to json | and ship it off where ever you want. | | (edit: typeos) | Tostino wrote: | It just depends on the type of problem you'd like to solve. | | I've got a whole bunch of ETL helper functions and patterns | i've written for my business that could be packaged up as an | extension if I were so inclined for example. Those are all SQL | or PL/PGSQL functions, so no need for native code there | though... | | For PostGIS, a whole lot of that code is just SQL or PL/PGSQL | functions...but there are quite a few C functions as well, to | take care of the heavy lifting which is less efficient to | implement in SQL or PL/PGSQL. | harikb wrote: | I think "extension" covers any custom code (even functions that | need to import a c library), not just major db features. I have | had to do it recently [1] to customize partitioning function to | be the way I like. It would be great to do it in a friendlier | language | | [1] https://harikb.github.io/posts/postgres-11-hash- | partitioning... | Hendrikto wrote: | > It would be great to do it in a friendlier language | | Whether Rust is "friendlier" than C depends a lot on your | definition of what that means. | zombodb wrote: | You're not wrong, but barring bugs in `pgx` (of which I'm | sure there are plenty right now), at least Rust gives you | compile-time guarantees around not crashing. | | And when running inside your database process, that's a | huge win. | zombodb wrote: | As a follow-up. Here's what Postgres' "generate_series" | function looks like with pgx: https://github.com/zombodb/ | pgx/blob/0803bd3aa2f8ef2bd80d2e03... | | And here's what you'd have to do to implement it in C: ht | tps://github.com/postgres/postgres/blob/dad75eb4a8d5835ec | c... | zombodb wrote: | One more follow-up... | | The top one is pgx, the bottom is Postgres. So there's a | little room for improvement here with pgx, but that's | okay for a v0.0.3 release. test=# | select count(*) from srf.generate_series(1, 10000000); | Time: 1552.115 ms (00:01.552) test=# select | count(*) from generate_series(1, 10000000); Time: | 1406.357 ms (00:01.406) | anarazel wrote: | The largest part of the time executing the above query | isn't inside the function, so this isn't that a material | comparison. The reason for that is that SRFs in FROM to | be materialized into a tuplestore, which isn't free: | postgres[607045][1]=# SELECT count(*) FROM | generate_series(1, 10000000); +----------+ | | count | +----------+ | 10000000 | | +----------+ (1 row) Time: 1249.224 | ms (00:01.249) postgres[607045][1]=# SELECT | count(*) FROM (SELECT generate_series(1, 10000000)) s; | +----------+ | count | +----------+ | | 10000000 | +----------+ (1 row) | Time: 460.206 ms | | For mostly historical reasons SRFs in the target list can | use the non-materializing SRF query protocol, but SRFs in | the FROM list can't. | | Any chance you could show the timings for the pgx version | of the second query? | zombodb wrote: | Sure! Top is Postgres, bottom is pgx, after running each | 5 times... test=# SELECT count(*) FROM | (SELECT generate_series(1, 10000000)) s; count | ---------- 10000000 (1 row) | Time: 399.630 ms test=# SELECT count(*) FROM | (SELECT srf.generate_series(1, 10000000)) s; | count ---------- 10000000 (1 | row) Time: 478.194 ms | | Thanks for the reply. I'm not surprised there's room for | optimization in pgx, especially in the Iterator-->SRF | path. | | edit: PS, thanks for the idea of putting what I assume is | the backend pid in psql's prompt. I need to go figure out | how to do that right now! | | edit edit: hmm, I guess that's not the PID. | anarazel wrote: | Thanks for the update. | | > edit: PS, thanks for the idea of putting what I assume | is the backend pid in psql's prompt. I need to go figure | out how to do that right now! | | Here's my ~/.psqlrc: | andres@awork3:~/src/postgresql$ cat ~/.psqlrc | \set QUIET 1 \pset pager always \set | VERBOSITY verbose \pset null '(null)' | \set COMP_KEYWORD_CASE upper \pset linestyle | unicode \pset border 2 \set PROMPT1 | '%/[%p][%l]%x%R%# ' \set PROMPT2 | '%/[%p][%l]%x%R%# ' \set PROMPT3 | 'c:%/[%p][%l]%x%R%# ' \set HISTCONTROL ignoredups | \set HISTSIZE 100000 \timing on \set | QUIET 0 | db48x wrote: | That is certainly nicer. | derefr wrote: | One might say that Rust is friendlier _downstream_ , to | the ops people running the compiled binary :) | chc wrote: | Does Rust give you compile-time guarantees around not | crashing? I thought any function could panic without so | much as a peep from the compiler. | zombodb wrote: | Fair. With pgx, however, Rust "panic!"s are translated | into standard Postgres "ERROR"s, such that instead of | crashing, only the current transaction aborts. | | So while you're pretty much correct in general, pgx | handles it in the way a PG extension author would expect. | zombodb wrote: | Author of `pgx` here. | | We developed pgx so that we could rewrite "ZomboDB" | (https://github.com/zombodb/zombodb) in Rust. ZDB is a custom | "Index Access Method" for Postgres. Think btree index, but | stored in Elasticsearch. | | So that's definitely a thing. | | Other ideas might be custom data | analytics/transformation/processing engines that you'd prefer | to run in the database instead of externally. | | Custom data types to represent, for example, street addresses | or genetic information. | | The only limit is yourself! ;) | rukittenme wrote: | > The only limit is yourself! ;) | | You have no idea how significant of a limit that is... I'm | kidding. | | Thanks for the response! And zombodb looks very cool. I might | bring it up since we have a couple of heavy analytics | endpoints right now. | boulos wrote: | > The only limit is yourself! ;) | | For folks who don't get the joke, please visit zombo.com | before Flash is finally dead :). | mumblemumble wrote: | Alternatively, there's https://html5zombo.com | Dowwie wrote: | Have you evaluated Tantivy/Toshi as possible alternative to | ES? I only ask because of your native use of Rust. PGX seems | like a more robust offering than pg-extend-rs. This is a | great addition to the community! | | Also, v.0.0.3 doesn't create much confidence about stability! | I'll keep an eye on the project and wish the team the best. | zombodb wrote: | I've looked into them. It seems they're designed to work | within a single process, and it's not quite clear to me how | sharing the underlying data files across postgres backends | (even with proper Postgres locking) would work. | | That's not say they aren't good frameworks. I'm sure they | are. It just seems like they're designed for different use | cases. | | That said, I have other ideas on this front that I can't | talk about today. ;) | zombodb wrote: | re: v0.0.3 -- sure. I just published it last night. | | We've been working on it since November last year, and have | now fully ported ZomboDB to it. | | It's proving out nicely, but keep in mind that Postgres' | internals are infinitely complex. Getting safe wrappers | around all its "things" is going to take a very very long | time. | | I'd rather get something that seems very stable now, and | continue to iterate on it over time. | faizshah wrote: | Interesting, it makes coding a postgres extension seem | delightfully simple. I have been looking for something to try | using rust on! Thanks for posting this. | zombodb wrote: | Thank you! | | If you decide to jump into it, definitely let us know any pain | points you have. | | It takes a bit of time to work out the kinks in a thing like | this. | [deleted] | gavinray wrote: | This is really cool! The only other similar thing I know of is | WASMer's extension (which only supports Rust atm I believe): | | https://github.com/wasmerio/postgres-ext-wasm | | One question though: If you built and compiled an extension with | this, how would you deploy it on top of a Postgres Docker image? | | I see instructions for installing and testing locally, via the | cargo command, but couldn't find anything on this. | | Does it spit out a static asset that you put in the project | folder, and copy over during the build pipeline? Something like: | FROM postgres:12-alpine COPY ./my-rust-extension | /some/container/directory RUN install-command-here | zombodb wrote: | That's a great question, and one probably best answered over on | pgx's GitHub page. | | But! I plan on adding a command to "cargo-pgx" to package up | the extension for you into a directory structure (or tarball, | maybe). | | The idea is that you'd just run: cargo pgx package | | And it would just build a --release library, and create the | proper directory structure (based on what pg_config) says for | the shared library and the associated .sql. | | I actually need this ASAP for supporting ZomboDB proper, so... | Coming Soon! | gavinray wrote: | Awesome, look forward to it =D | fnord123 wrote: | Also of interest: rpgffi - https://github.com/posix4e/rpgffi | | "R(Rust) PG(Postgresql) FFI (Foreign Function Interface)" | zombodb wrote: | That is interesting. | | pgx does a lot of that too, but for only pg10/11/12. And then | it post-processes the bindings and builds a "common.rs" for | symbols that are identical across versions. | | That way you can target all three versions at once, and then | you can use Rust's #[cfg(feature=)] directive to opt into | support for specific versions. | | pgx has also done a little bit of work to mask differences | between the three versions so they appear as part of | "common.rs" anyways. | | I wanted a framework that let me know, at development time, if | I was trying to do a thing that isn't supported (or the same) | across various PG versions. | tcyrus wrote: | Is there any way to make custom types using pgx? | zombodb wrote: | Yes there is. It's not documented/example'd yet tho. | | There's a derive macro called #[derive(PostgresType)]. Combine | that with serde's Serialize, Deserialize, and you're gtg. | | I'm going to be working on more docs and twitch streams over | this week. | YalsXHAN wrote: | Are there still other languages apart from Rust? | | Is any of the daily rewrite-x-in-rust submissions used in | production? | | C is actually used in production ... | speedgoose wrote: | JavaScript is still pretty strong these days. | ficklepickle wrote: | Typo in the title: oc->of | dang wrote: | Fixed. Thanks! | [deleted] ___________________________________________________________________ (page generated 2020-07-13 23:00 UTC)