[HN Gopher] PGX: Write Postgres extensions in Rust instead of C
       ___________________________________________________________________
        
       PGX: Write Postgres extensions in Rust instead of C
        
       Author : adamnemecek
       Score  : 226 points
       Date   : 2020-07-13 14:27 UTC (8 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | jarym wrote:
       | Guys, this is really really cool. Been looking to learn Rust and
       | it just so happens I need to write a Postgres extension in the
       | near future so two birds and one stone.
       | 
       | Btw - Also, I've seen ZomboDB and while I've not used it I must
       | say it looks EXCELLENT. So congrats on that too.
        
         | zombodb wrote:
         | _blush_
        
       | [deleted]
        
       | rukittenme wrote:
       | Just out of curiosity, what sort of requirements compel a person
       | to write a postgres extension? I know of things like PostGIS but
       | I'm not well versed enough in this world to know the totality of
       | the problem.
        
         | gshulegaard wrote:
         | This is a sort of broad question, but if I were to try and
         | answer I would say that if there is anything you want to be
         | handled by the database you would gravitate towards an
         | extension.
         | 
         | PostGIS is a great example since one of the most well known it
         | additions it makes is a datatype, but that's not the only thing
         | it adds. A datatype on it's own wouldn't be that useful without
         | Spatial Indexing (https://postgis.net/workshops/postgis-
         | intro/indexing.html) or spatial utilities to make spatial joins
         | more expressive (https://postgis.net/workshops/postgis-
         | intro/joins_exercises....).
         | 
         | Rich data types aren't the only candidates for extensions,
         | automated partition management
         | (https://github.com/pgpartman/pg_partman), data sharding
         | (https://www.citusdata.com/), or even database cron scheduling
         | (https://github.com/citusdata/pg_cron) are also good examples
         | of things that are well suited to be extensions.
         | 
         | To be honest, I have a hard time imagining alternative paths
         | where some of this functionality _isn't_ an extension. You
         | might imagine an external daemon for things like pg_cron and
         | pg_partman, but if PostGIS wasn't an extension you would
         | probably be using a different database/tool (or a fork) if you
         | had geospatial requirements. It's worth noting that Citus was a
         | fork of PostgreSQL before it was refactored to be an extension.
         | 
         | I know this isn't a direct answer to your question, but
         | hopefully you find it somewhat useful.
         | 
         | Edit: I totally forgot to add 1 more thing about extensions.
         | When functionality is packaged as an extension there is a much
         | higher likelihood you can mix and match them. For example, I
         | currently run a PostgreSQL cluster using Citus, with pg_partman
         | for automatic time-based partitioning, pg_cron regularly
         | scheduling partition creation, and PostGIS for geospatial
         | datatypes. You could extrapolate the various ways you might use
         | this kind of setup...one that jumps to my mind is a scalable
         | GPS history that could be queried by time period as well as by
         | geographic region.
        
           | zombodb wrote:
           | Yeah, "rich data types" is a good point.
           | 
           | pgx provides a #[derive(PostgresType)] macro that lets you
           | represent any normal Rust struct (that serde_cbor can
           | (de)serialize) as a Postgres type. You write no code.
           | 
           | It even generates all the boring SQL boilerplate for you.
           | 
           | I plan on putting together an example about this and doing a
           | twitch stream this week to discuss in detail.
        
         | derefr wrote:
         | The most common reason to write a PG extension is to add native
         | support for a structure/data type that PG doesn't already have
         | native support for.
         | 
         | For example, despite Postgres having a (base-10000)
         | numeric/decimal type, there are certain things that can be done
         | much more cheaply in base-256 (e.g. ser/des from bit-packed
         | and/or hex strings), and thus pgmp
         | (https://github.com/dvarrazzo/pgmp) exists as a Postgres libgmp
         | binding.
         | 
         | There are also "data service" type extensions--exposing some
         | other data "into" Postgres so it can participate in a query.
         | PG's own pg_stat_statements extension is an example. Postgres's
         | support for Foreign Data Wrappers has mostly obviated writing
         | these as extensions, though; it's much easier (and safer for
         | your RDBMS-side data!) to just write these as standalone
         | daemons that Postgres mounts as FDW servers.
        
         | cdaringe wrote:
         | We have to write a Json logger versus the default log format
        
           | zombodb wrote:
           | `pgx` would let you do that.
           | 
           | You could hook Postgres "emit_log_hook" and probably just use
           | serde to xform the provided "ErrorData" pointer right to json
           | and ship it off where ever you want.
           | 
           | (edit: typeos)
        
         | Tostino wrote:
         | It just depends on the type of problem you'd like to solve.
         | 
         | I've got a whole bunch of ETL helper functions and patterns
         | i've written for my business that could be packaged up as an
         | extension if I were so inclined for example. Those are all SQL
         | or PL/PGSQL functions, so no need for native code there
         | though...
         | 
         | For PostGIS, a whole lot of that code is just SQL or PL/PGSQL
         | functions...but there are quite a few C functions as well, to
         | take care of the heavy lifting which is less efficient to
         | implement in SQL or PL/PGSQL.
        
         | harikb wrote:
         | I think "extension" covers any custom code (even functions that
         | need to import a c library), not just major db features. I have
         | had to do it recently [1] to customize partitioning function to
         | be the way I like. It would be great to do it in a friendlier
         | language
         | 
         | [1] https://harikb.github.io/posts/postgres-11-hash-
         | partitioning...
        
           | Hendrikto wrote:
           | > It would be great to do it in a friendlier language
           | 
           | Whether Rust is "friendlier" than C depends a lot on your
           | definition of what that means.
        
             | zombodb wrote:
             | You're not wrong, but barring bugs in `pgx` (of which I'm
             | sure there are plenty right now), at least Rust gives you
             | compile-time guarantees around not crashing.
             | 
             | And when running inside your database process, that's a
             | huge win.
        
               | zombodb wrote:
               | As a follow-up. Here's what Postgres' "generate_series"
               | function looks like with pgx: https://github.com/zombodb/
               | pgx/blob/0803bd3aa2f8ef2bd80d2e03...
               | 
               | And here's what you'd have to do to implement it in C: ht
               | tps://github.com/postgres/postgres/blob/dad75eb4a8d5835ec
               | c...
        
               | zombodb wrote:
               | One more follow-up...
               | 
               | The top one is pgx, the bottom is Postgres. So there's a
               | little room for improvement here with pgx, but that's
               | okay for a v0.0.3 release.                   test=#
               | select count(*) from srf.generate_series(1, 10000000);
               | Time: 1552.115 ms (00:01.552)              test=# select
               | count(*) from generate_series(1, 10000000);         Time:
               | 1406.357 ms (00:01.406)
        
               | anarazel wrote:
               | The largest part of the time executing the above query
               | isn't inside the function, so this isn't that a material
               | comparison. The reason for that is that SRFs in FROM to
               | be materialized into a tuplestore, which isn't free:
               | postgres[607045][1]=# SELECT count(*) FROM
               | generate_series(1, 10000000);         +----------+
               | |  count   |         +----------+         | 10000000 |
               | +----------+         (1 row)              Time: 1249.224
               | ms (00:01.249)              postgres[607045][1]=# SELECT
               | count(*) FROM (SELECT generate_series(1, 10000000)) s;
               | +----------+         |  count   |         +----------+
               | | 10000000 |         +----------+         (1 row)
               | Time: 460.206 ms
               | 
               | For mostly historical reasons SRFs in the target list can
               | use the non-materializing SRF query protocol, but SRFs in
               | the FROM list can't.
               | 
               | Any chance you could show the timings for the pgx version
               | of the second query?
        
               | zombodb wrote:
               | Sure! Top is Postgres, bottom is pgx, after running each
               | 5 times...                   test=# SELECT count(*) FROM
               | (SELECT generate_series(1, 10000000)) s;           count
               | ----------          10000000         (1 row)
               | Time: 399.630 ms         test=# SELECT count(*) FROM
               | (SELECT srf.generate_series(1, 10000000)) s;
               | count            ----------          10000000         (1
               | row)              Time: 478.194 ms
               | 
               | Thanks for the reply. I'm not surprised there's room for
               | optimization in pgx, especially in the Iterator-->SRF
               | path.
               | 
               | edit: PS, thanks for the idea of putting what I assume is
               | the backend pid in psql's prompt. I need to go figure out
               | how to do that right now!
               | 
               | edit edit: hmm, I guess that's not the PID.
        
               | anarazel wrote:
               | Thanks for the update.
               | 
               | > edit: PS, thanks for the idea of putting what I assume
               | is the backend pid in psql's prompt. I need to go figure
               | out how to do that right now!
               | 
               | Here's my ~/.psqlrc:
               | andres@awork3:~/src/postgresql$ cat ~/.psqlrc
               | \set QUIET 1              \pset pager always         \set
               | VERBOSITY verbose         \pset null '(null)'
               | \set COMP_KEYWORD_CASE upper         \pset linestyle
               | unicode         \pset border 2         \set PROMPT1
               | '%/[%p][%l]%x%R%# '         \set PROMPT2
               | '%/[%p][%l]%x%R%# '         \set PROMPT3
               | 'c:%/[%p][%l]%x%R%# '         \set HISTCONTROL ignoredups
               | \set HISTSIZE 100000              \timing on         \set
               | QUIET 0
        
               | db48x wrote:
               | That is certainly nicer.
        
               | derefr wrote:
               | One might say that Rust is friendlier _downstream_ , to
               | the ops people running the compiled binary :)
        
               | chc wrote:
               | Does Rust give you compile-time guarantees around not
               | crashing? I thought any function could panic without so
               | much as a peep from the compiler.
        
               | zombodb wrote:
               | Fair. With pgx, however, Rust "panic!"s are translated
               | into standard Postgres "ERROR"s, such that instead of
               | crashing, only the current transaction aborts.
               | 
               | So while you're pretty much correct in general, pgx
               | handles it in the way a PG extension author would expect.
        
         | zombodb wrote:
         | Author of `pgx` here.
         | 
         | We developed pgx so that we could rewrite "ZomboDB"
         | (https://github.com/zombodb/zombodb) in Rust. ZDB is a custom
         | "Index Access Method" for Postgres. Think btree index, but
         | stored in Elasticsearch.
         | 
         | So that's definitely a thing.
         | 
         | Other ideas might be custom data
         | analytics/transformation/processing engines that you'd prefer
         | to run in the database instead of externally.
         | 
         | Custom data types to represent, for example, street addresses
         | or genetic information.
         | 
         | The only limit is yourself! ;)
        
           | rukittenme wrote:
           | > The only limit is yourself! ;)
           | 
           | You have no idea how significant of a limit that is... I'm
           | kidding.
           | 
           | Thanks for the response! And zombodb looks very cool. I might
           | bring it up since we have a couple of heavy analytics
           | endpoints right now.
        
           | boulos wrote:
           | > The only limit is yourself! ;)
           | 
           | For folks who don't get the joke, please visit zombo.com
           | before Flash is finally dead :).
        
             | mumblemumble wrote:
             | Alternatively, there's https://html5zombo.com
        
           | Dowwie wrote:
           | Have you evaluated Tantivy/Toshi as possible alternative to
           | ES? I only ask because of your native use of Rust. PGX seems
           | like a more robust offering than pg-extend-rs. This is a
           | great addition to the community!
           | 
           | Also, v.0.0.3 doesn't create much confidence about stability!
           | I'll keep an eye on the project and wish the team the best.
        
             | zombodb wrote:
             | I've looked into them. It seems they're designed to work
             | within a single process, and it's not quite clear to me how
             | sharing the underlying data files across postgres backends
             | (even with proper Postgres locking) would work.
             | 
             | That's not say they aren't good frameworks. I'm sure they
             | are. It just seems like they're designed for different use
             | cases.
             | 
             | That said, I have other ideas on this front that I can't
             | talk about today. ;)
        
             | zombodb wrote:
             | re: v0.0.3 -- sure. I just published it last night.
             | 
             | We've been working on it since November last year, and have
             | now fully ported ZomboDB to it.
             | 
             | It's proving out nicely, but keep in mind that Postgres'
             | internals are infinitely complex. Getting safe wrappers
             | around all its "things" is going to take a very very long
             | time.
             | 
             | I'd rather get something that seems very stable now, and
             | continue to iterate on it over time.
        
       | faizshah wrote:
       | Interesting, it makes coding a postgres extension seem
       | delightfully simple. I have been looking for something to try
       | using rust on! Thanks for posting this.
        
         | zombodb wrote:
         | Thank you!
         | 
         | If you decide to jump into it, definitely let us know any pain
         | points you have.
         | 
         | It takes a bit of time to work out the kinks in a thing like
         | this.
        
       | [deleted]
        
       | gavinray wrote:
       | This is really cool! The only other similar thing I know of is
       | WASMer's extension (which only supports Rust atm I believe):
       | 
       | https://github.com/wasmerio/postgres-ext-wasm
       | 
       | One question though: If you built and compiled an extension with
       | this, how would you deploy it on top of a Postgres Docker image?
       | 
       | I see instructions for installing and testing locally, via the
       | cargo command, but couldn't find anything on this.
       | 
       | Does it spit out a static asset that you put in the project
       | folder, and copy over during the build pipeline? Something like:
       | FROM postgres:12-alpine       COPY ./my-rust-extension
       | /some/container/directory       RUN install-command-here
        
         | zombodb wrote:
         | That's a great question, and one probably best answered over on
         | pgx's GitHub page.
         | 
         | But! I plan on adding a command to "cargo-pgx" to package up
         | the extension for you into a directory structure (or tarball,
         | maybe).
         | 
         | The idea is that you'd just run: cargo pgx package
         | 
         | And it would just build a --release library, and create the
         | proper directory structure (based on what pg_config) says for
         | the shared library and the associated .sql.
         | 
         | I actually need this ASAP for supporting ZomboDB proper, so...
         | Coming Soon!
        
           | gavinray wrote:
           | Awesome, look forward to it =D
        
       | fnord123 wrote:
       | Also of interest: rpgffi - https://github.com/posix4e/rpgffi
       | 
       | "R(Rust) PG(Postgresql) FFI (Foreign Function Interface)"
        
         | zombodb wrote:
         | That is interesting.
         | 
         | pgx does a lot of that too, but for only pg10/11/12. And then
         | it post-processes the bindings and builds a "common.rs" for
         | symbols that are identical across versions.
         | 
         | That way you can target all three versions at once, and then
         | you can use Rust's #[cfg(feature=)] directive to opt into
         | support for specific versions.
         | 
         | pgx has also done a little bit of work to mask differences
         | between the three versions so they appear as part of
         | "common.rs" anyways.
         | 
         | I wanted a framework that let me know, at development time, if
         | I was trying to do a thing that isn't supported (or the same)
         | across various PG versions.
        
       | tcyrus wrote:
       | Is there any way to make custom types using pgx?
        
         | zombodb wrote:
         | Yes there is. It's not documented/example'd yet tho.
         | 
         | There's a derive macro called #[derive(PostgresType)]. Combine
         | that with serde's Serialize, Deserialize, and you're gtg.
         | 
         | I'm going to be working on more docs and twitch streams over
         | this week.
        
       | YalsXHAN wrote:
       | Are there still other languages apart from Rust?
       | 
       | Is any of the daily rewrite-x-in-rust submissions used in
       | production?
       | 
       | C is actually used in production ...
        
         | speedgoose wrote:
         | JavaScript is still pretty strong these days.
        
       | ficklepickle wrote:
       | Typo in the title: oc->of
        
         | dang wrote:
         | Fixed. Thanks!
        
         | [deleted]
        
       ___________________________________________________________________
       (page generated 2020-07-13 23:00 UTC)