[HN Gopher] Postgres Auditing in 150 lines of SQL
       ___________________________________________________________________
        
       Postgres Auditing in 150 lines of SQL
        
       Author : oliverrice
       Score  : 146 points
       Date   : 2022-03-09 15:26 UTC (7 hours ago)
        
 (HTM) web link (supabase.com)
 (TXT) w3m dump (supabase.com)
        
       | jph wrote:
        | The post leads to supa_audit, a good Postgres auditing tool from
        | Supabase, packaged as a PG extension:
       | 
       | https://github.com/supabase/supa_audit
        
       | simonw wrote:
       | I've battled this problem many times over my career in many
       | different ways - this approach looks very solid to me. I
       | particularly like the way the schema design incorporates a record
       | ID, old record ID, updated record and old record.
       | 
       | You do end up storing a lot of extra data, but it's very thorough
       | - it's not going to miss anything, and figuring out what happened
       | later on should be very easy.
        
       | lichtenberger wrote:
       | I'd argue that's still a lot of work to manually do. However,
       | great work and detail, thanks a lot :-)
       | 
        | I'm working on a database system[1] in my spare time, which
        | automatically retains all revisions and assigns revision
        | timestamps during commits (a single timestamp in the
        | RevisionRootPage). Furthermore, it is tamper-proof and the whole
        | storage can be verified by comparing a single UberPage hash, as
        | in ZFS.
       | 
        | Basically it is a persistent trie-based revision index (plus
        | document and secondary indexes) mapped to durable storage, a
        | simple log-structured append-only file. A second file tracks
        | revision offsets to provide binary search over an in-memory map
        | of timestamps. Because the root of the tree is atomically
        | swapped, it does not need a WAL, which would basically be
        | another data file and can be omitted in this case.
       | 
       | Besides versioning the data itself in a binary encoding similar
       | to BSON it tracks changes and writes simple JSON diff files for
       | each new revision.
       | 
        | The data pages are furthermore not simply copied on write;
        | instead, a sliding snapshot algorithm ensures that, for the most
        | part, only changed records have to be written. Before the page
        | fragments are written to durable storage they are compressed,
        | and in the future they might be encrypted.
       | 
       | [1] https://sirix.io | https://github.com/sirixdb/sirix
        
         | jmalicki wrote:
          | Every Google hit for "UberPage" seems to be your writing. Is
          | there standard ZFS terminology, findable in the ZFS
          | documentation, for what you're referring to?
        
           | lichtenberger wrote:
            | In our case it's the tree root of the index, which is always
            | written after all the descendant pages have been written as
            | a new revision is committed (during a postorder traversal of
            | the new pages).
            | 
            | In ZFS the UberPage is called the UberBlock. We borrowed
            | some of the concepts, such as storing checksums in the
            | parent pages instead of in the pages themselves. In ZFS they
            | are blocks :-)
           | 
           | Thanks for asking.
        
         | ithrow wrote:
         | Yes but sirix is not a SQL/RDBMS.
        
           | lichtenberger wrote:
            | True. In general I could add support for storing relational
            | data as well, but currently I'm entirely focusing on JSON
            | and auto-indexing for secondary indexes, as well as
            | higher-order function support in Brackit.
           | 
           | Of course we'd need more man-power as I'm more or less the
           | only one working on the core in my spare time (since 2012).
           | 
           | Moshe mainly works on the clients and a frontend. A new
           | frontend based on SolidJS is in the works showing the history
           | and diffs as in
           | 
            | https://raw.githubusercontent.com/sirixdb/sirix/master/Scree...
           | 
           | However, we're of course looking forward to suggestions, bug
           | reports, real world use cases and contributions :-)
        
         | lichtenberger wrote:
          | To provide fast audits of subtrees, the system optionally
          | stores a Merkle hash tree (a hash in each node) and updates
          | the hashes of all ancestors automatically during updates.
        
         | chatmasta wrote:
         | You might like what we're doing with Splitgraph. Our command
         | line tool (sgr) installs an audit log into Postgres to track
         | changes [0]. Then `sgr commit` can write these changes to
         | delta-compressed objects [1], where each object is a columnar
         | fragment of data, addressable by the LTHash of rows
         | added/deleted by the fragment, and attached to metadata
         | describing its index [2].
         | 
         | I haven't explored sirix before, but at first glance it looks
         | like we have some similar ideas -- thanks for sharing, I'm
         | excited to learn more, especially about its application of ZFS.
         | 
          | [0] https://www.splitgraph.com/docs/working-with-data/tracking-c...
         | 
         | [1] https://www.splitgraph.com/docs/concepts/objects
         | 
         | [2]
         | https://github.com/splitgraph/splitgraph/blob/master/splitgr...
        
           | lichtenberger wrote:
            | Very interesting, thanks for pointing it out :-)
        
       | oliverrice wrote:
       | author here (of the blog post and
       | https://github.com/supabase/supa_audit)
       | 
       | happy to answer any questions
        
         | simonw wrote:
         | Are you at all worried about the size of the audit table?
         | Keeping a JSON copy of both the old and the new data is going
         | to add up pretty quickly for fast-changing tables.
         | 
         | I've built audit systems in the past which track just the new
         | values, with the idea being that you can replay the audit log
         | for a particular record from the beginning to reconstruct its
         | state. I'm not convinced that's a better approach than yours
         | here though, just interested in your thinking.
        
           | oliverrice wrote:
           | > Are you at all worried about the size of the audit table?
           | Keeping a JSON copy of both the old and the new data is going
           | to add up pretty quickly for fast-changing tables.
           | 
            | That's def a valid concern if you're auditing a
            | high-frequency insert/update table, or have your entire
            | database under audit. If you do have those needs, a system
            | that logs outside of postgres (like pgaudit) would be a
            | better fit.
           | 
            | In my experience most startups and mid-size enterprises
            | sprinkle in auditing around the sensitive parts of the DB,
            | like `account` or `access_control` tables where write load
            | isn't much of a concern.
           | 
           | > I've built audit systems in the past which track just the
           | new values
           | 
            | Yeah, that solution works great! The one place it breaks
            | down is if you apply auditing to an existing table and need
            | to be able to recover the records as they existed when
            | auditing was enabled (the initial state is not snapshotted).
            | 
            | We toyed with another approach to avoid having to track
            | old_record: first check whether the `record_id` already
            | exists in the audit table, and insert a row with the OLD
            | values and a `SNAPSHOT` `operation` if it does not (roughly
            | as sketched below).
           | 
           | Even though that query/check was operating on an index, the
           | performance overhead was higher than we were comfortable
           | with.
           | 
           | Storage is pretty cheap these days so we opted to optimize
           | for write throughput + reduced memory usage rather than any
           | considerations wrt disk.
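            | 
            | For reference, a rough sketch of what that variant looked
            | like inside the trigger (not the code we shipped;
            | `_record_id` stands in for the id already computed for OLD,
            | the column names follow the post's `audit.record_version`
            | table, and `SNAPSHOT` would have been an extra value of the
            | operation enum):
            | 
            | ```
            | -- only on the first write seen for a given record_id,
            | -- capture the pre-audit state as a SNAPSHOT row
            | if not exists (
            |     select 1
            |     from audit.record_version
            |     where record_id = _record_id
            | ) then
            |     insert into audit.record_version (record_id, op, record)
            |     values (_record_id, 'SNAPSHOT', to_jsonb(OLD));
            | end if;
            | ```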
        
           | jsmith99 wrote:
            | I've also used an update trigger to track only changed
            | columns, inspired by this post[0], which wraps a comparison
            | function into a new subtract operator for JSONB, similar to
            | the built-in subtract operator for HSTORE, which removes any
            | key pairs that haven't changed.
           | 
           | [0] http://8kb.co.uk/blog/2015/01/19/copying-pavel-stehules-
           | simp...
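            | 
            | In code, the operator from [0] boils down to roughly this (a
            | sketch in that spirit, not the post's exact function):
            | 
            | ```
            | -- keep only the key/value pairs of a that are absent from,
            | -- or different in, b
            | create or replace function jsonb_minus(a jsonb, b jsonb)
            | returns jsonb language sql immutable as $$
            |     select coalesce(jsonb_object_agg(key, value), '{}'::jsonb)
            |     from jsonb_each(a)
            |     where not (b @> jsonb_build_object(key, value));
            | $$;
            | 
            | create operator - (leftarg = jsonb, rightarg = jsonb,
            |                    function = jsonb_minus);
            | 
            | -- then in a trigger, to_jsonb(NEW) - to_jsonb(OLD) yields
            | -- just the changed columns with their new values
            | ```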
        
             | oliverrice wrote:
              | that's a cool solution for reducing storage. nice!
        
           | craigkerstiens wrote:
            | Building a bit off Simon's question above: I'm curious
            | whether any thought was given to using JSON over JSONB. The
            | trade-off of being able to ingest and save records really
            | quickly seems like it could beat JSONB, which has to do some
            | conversion and, if there are any indexes on the JSONB
            | columns, could really dramatically slow down throughput.
        
             | oliverrice wrote:
             | > I'm curious if any thought was given to using JSON over
             | JSONB
             | 
              | For sure! There is a conversion penalty that is paid
              | incrementally (at write time). In exchange for that
              | penalty, jsonb gets you reduced storage size and
              | dramatically faster column value extraction with `->` or
              | `->>` if you end up having to filter the audit table by
              | something other than the `record_id`.
             | 
             | It is a tradeoff though and depending on the specifics of
             | your use-case JSON could be a better choice.
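              | 
              | For example (a sketch; the `email` filter and the index
              | name are just illustrative):
              | 
              | ```
              | -- pull every audited version where the row's email matched
              | select *
              | from audit.record_version
              | where record ->> 'email' = 'person@example.com';
              | 
              | -- an expression index keeps that kind of lookup fast
              | create index record_version_record_email
              |     on audit.record_version ((record ->> 'email'));
              | ```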
        
         | netcraft wrote:
         | this is great! I've done something similar many times, but the
         | ability to do it on arbitrary tables is really nice.
         | 
          | If you have the HSTORE extension, you can also use it to get a
          | diff of the old and new: `HSTORE(OLD) - HSTORE(NEW)` and vice
          | versa; it's very nice to have the deltas so easily.
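          | 
          | For instance, inside a row-level UPDATE trigger (a sketch; it
          | assumes `create extension hstore`, and the function and
          | variable names are just illustrative):
          | 
          | ```
          | create or replace function log_changed_columns()
          | returns trigger language plpgsql as $$
          | declare
          |     old_vals hstore := hstore(OLD) - hstore(NEW); -- changed keys, OLD values
          |     new_vals hstore := hstore(NEW) - hstore(OLD); -- changed keys, NEW values
          | begin
          |     raise notice 'changed: % -> %', old_vals, new_vals;
          |     return NEW;
          | end;
          | $$;
          | ```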
        
         | perlgeek wrote:
         | First of all, this looks super awesome, and at least an order
         | of magnitude better than our home-grown solution (grown over
         | 25+ years on a DB that is now mariadb).
         | 
         | How well does this work when you audit multiple tables with
         | foreign key relationships between them?
         | 
         | If the user references a customer, and you want to find out
         | what customer the user belonged to at time X (but the customer
         | might have been deleted in the mean time), you have to track
         | through the history of multiple tables to find answers. More if
         | you have n2m relationships...
         | 
         | It sounds doable, but not trivial to get right for more
         | complicated models.
         | 
         | ---
         | 
          | Update: to clarify, I want to be able to present a somewhat
          | human-readable history of an object without doing much table-
          | specific logic.
         | 
          | If an author can publish many books, you typically have an n2m
          | mapping table, like book_authors. If you want to add newly
          | published books to the "author" history view, you have to
          | realize that you need to include parts of the history of the
          | book_authors table. I don't think you can do that through a
          | join though? Trying to think my way through this...
         | 
          | Maybe this can be dealt with more easily if you can assume
          | each table has a non-composite primary key and include that
          | primary key in the history table; then you can do joins on the
          | history more easily?
        
           | oliverrice wrote:
            | All the data would be available, and joining it would be
            | possible with some jsonb gymnastics, but it's optimized more
            | for compliance-style auditing than for rapid lookups with
            | complex queries.
            | 
            | For that scenario, replicating the audit data to an OLAP
            | system may be a better fit.
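            | 
            | Something in this direction for the user/customer example (a
            | sketch only; it assumes the audit table's `table_name`/`ts`
            | columns and made-up business tables and ids):
            | 
            | ```
            | -- customer row the user pointed at, as audited at or before time X
            | select c.record
            | from audit.record_version u
            | join audit.record_version c
            |   on c.table_name = 'customer'
            |  and c.record ->> 'id' = u.record ->> 'customer_id'
            |  and c.ts <= timestamptz '2022-03-01'
            | where u.table_name = 'app_user'
            |   and u.record ->> 'id' = '42'
            |   and u.ts <= timestamptz '2022-03-01'
            | order by u.ts desc, c.ts desc
            | limit 1;
            | ```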
        
         | simonw wrote:
         | I'm not completely clear on the reason for both record_id and
         | old_record_id - is this so that you can capture instances where
         | one of the columns that makes up the primary key for the row is
         | itself modified for some reason?
        
           | oliverrice wrote:
           | Being able to see when the primary key changes using
           | `record_id` and `old_record_id` is a perk, but we also wanted
           | to make sure a snapshot of the existing data was available
           | when auditing is enabled on a table that already has data in
           | it. See comment ^ for more color
           | 
           | Long story short: it is avoidable if you're okay with paying
           | the penalty of querying the audit table by `record_id` before
           | every write. We decided to use more disk and keep the higher
           | write throughput
        
       | bob1029 wrote:
       | We have a new business system we are trying to build that has a
       | _lot_ of very granular auditing and versioning requirements
       | throughout, and this is where I started looking.
       | 
       | After a while of screwing with this kind of approach, I realized
       | that the number of different types of objects involved was going
       | to cause an explosion in the number of related auditing and
       | versioning tables. Keeping track of not only who changed
       | something but also the exact facts they changed (such that you
       | could revert as needed) requires a lot of additional boilerplate
       | in each instance.
       | 
       | So, I made a pretty big gamble and went all-in on an event-
       | sourced abstraction where everything is written to 1 gigantic
       | append-only log of events. Turns out, this was worth it in our
       | case. As long as everything has a timestamp and you have a way to
       | refer to specific log entries (i.e. by offset), you basically
       | have a complete solution. The hard parts are handled by a JSON
       | serializer and gzip. We use in-memory indexes that are simply
       | long/long dictionaries that map an object identity to a physical
       | log offset.
       | 
       | The only downside with what I have proposed is that there is no
       | garbage collection or cleanup possible. At this time, this is
       | actually a highly-desirable property of the system. Disk space is
       | cheap. Losing 3 weeks of work to an accidental button click is
       | not.
        
         | infogulch wrote:
          | I've done this before on smaller scales. By writing a set of
          | table-valued functions that expose the event data in
          | relational form "as of the date given by param X", plus views
          | on top of them where X is set to getdate(), you get a very
          | ergonomic way to consume "current" and "historical" data.
          | Materialized views can make querying these more natural
          | relations very cheap if your engine supports them.
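          | 
          | Against an audit table like the one in the article, that
          | pattern looks roughly like this in Postgres (a sketch; it
          | assumes `record_version` has `table_name`, `record_id`,
          | `record` and `ts` columns, and it doesn't yet filter out rows
          | whose latest entry is a DELETE):
          | 
          | ```
          | -- "account" rows as of a point in time, rebuilt from history
          | create or replace function account_as_of(p_at timestamptz)
          | returns setof jsonb language sql stable as $$
          |     select distinct on (record_id) record
          |     from audit.record_version
          |     where table_name = 'account'
          |       and ts <= p_at
          |     order by record_id, ts desc
          | $$;
          | 
          | -- "current" is just "as of now()"
          | create or replace view account_current as
          |     select * from account_as_of(now());
          | ```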
        
         | pstuart wrote:
         | > everything is written to 1 gigantic append-only log of events
         | 
         | Do you replicate that stream to avoid SPOF?
        
           | bob1029 wrote:
            | I am currently investigating synchronous replication of each
            | compressed log batch to a 2nd host. We are totally fine with
            | availability issues (i.e. the system being down because log
            | replication is broken). Availability problems are far
            | preferable to consistency problems and other nasty bugs.
        
         | lichtenberger wrote:
          | As posted in another comment, that's basically what
          | https://github.com/sirixdb/sirix supports, along with easily
          | reconstructing former revisions of a JSON document (or any
          | node therein), sophisticated secondary (also versioned)
          | indexes, querying with JSONiq, and query-compile-time
          | optimizations for joins and aggregates....
         | 
         | In our case we opted for three types of secondary user-defined
         | indexes (besides the main document trie index to find unique,
         | stable nodes by their 64bit ID), namely
         | 
         | 1. field indexes
         | 
         | 2. path indexes
         | 
         | 3. content-and-structure indexes which index paths and their
         | typed values
         | 
         | Furthermore a path summary keeps track of all distinct paths in
         | the JSON resource.
        
       | scwoodal wrote:
        | I didn't see this mentioned in the article, but with JSONB you
        | can leverage operators to know what data changed.
        | 
        |     > -- record = {"id": 1, "name": "bar"}, old_record = {"id": 1, "name": "foo"}
        |     > select record - old_record where id=3;
        |     name => bar
        |     >
        
         | netcraft wrote:
          | I don't think this is built in; I think you must have an
          | extension or custom operator providing that:
         | https://dbfiddle.uk/?rdbms=postgres_14&fiddle=273a440d043a82...
        
           | scwoodal wrote:
           | Ah yes, I was thinking HSTORE.
           | 
            | https://dbfiddle.uk/?rdbms=postgres_14&fiddle=2f406630f367b5...
        
         | efxhoy wrote:
         | This SO answer has a good jsonb diff function:
         | https://stackoverflow.com/a/36043269
        
       | craigkerstiens wrote:
        | Love the detail and approach here, though (and it is mentioned
        | at the bottom of the article) this shouldn't be used at any very
        | large scale... the estimate of about 1k transactions per second
        | or less seems about right. For any larger scale you want to look
        | at something like pg_audit, which can scale much, much further
        | and be used with other extensions to ensure you have a full
        | audit log that can't be modified (when dealing with compliance).
        
       | sususu wrote:
        | I really like this approach; however, I have a use case where
        | the id of the application user that made the operation must be
        | saved in the audit table, and unfortunately I cannot see how to
        | do that with a pure SQL solution.
       | 
       | Has anyone done something similar with SQL only?
        
         | oliverrice wrote:
         | if `user_id` is in the table you're auditing you could do
         | 
          | ```
          | -- generated column (PG 12+); a DEFAULT can't reference other columns
          | alter table audit.record_version add column user_id bigint
          |     generated always as ((coalesce(record, old_record) ->> 'user_id')::bigint) stored;
          | ```
         | 
         | or if you meant the SQL `current_user` you could extend the
         | trigger to track it.
         | 
         | but if the user_id is only available in application space (not
         | in the db) it will not be possible using straight SQL
        
         | timwis wrote:
         | PostgREST has an interesting approach to authentication that
         | might give you some ideas:
         | 
         | https://postgrest.org/en/stable/auth.html
         | 
         | Supabase is built on top of PostgREST, but I can't say for sure
         | it uses the same means of authentication (I think it has its
         | own Auth layer).
        
           | oliverrice wrote:
            | Supabase passes all the JWT info to PostgREST, so it is
            | still available in transaction-local config.
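            | 
            | So a trigger could read, say, the user id from there (the
            | exact GUC name depends on the PostgREST version; newer
            | versions expose the whole claims object):
            | 
            | ```
            | -- second arg 'true' means: return NULL instead of erroring if unset
            | select current_setting('request.jwt.claims', true)::jsonb ->> 'sub';
            | ```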
        
         | netcraft wrote:
         | we do that by setting a local variable before the query and
         | then reading that in the triggers:
         | 
          | ```
          | SET LOCAL foo.who_id = 'some-uuid';
          | UPDATE table SET ...
          | ```
          | 
          | ```
          | -- function to make getting the setting easier
          | DROP FUNCTION IF EXISTS get_who_id (text);
          | CREATE OR REPLACE FUNCTION get_who_id (default_value text DEFAULT null::text)
          | RETURNS text AS $get_who_id$
          | DECLARE
          |     who_id text;
          | BEGIN
          |     BEGIN
          |         who_id := current_setting('foo.who_id');
          |     EXCEPTION
          |         WHEN SQLSTATE '42704' THEN
          |             RETURN default_value;
          |     END;
          |     IF (length(who_id) = 0) THEN
          |         RETURN default_value;
          |     END IF;
          |     RETURN who_id;
          | END
          | $get_who_id$ LANGUAGE plpgsql VOLATILE;
          | ```
          | 
          | ```
          | CREATE OR REPLACE FUNCTION some_table_audit () RETURNS TRIGGER AS $some_table_audit$
          | DECLARE
          |     who_id text;
          | BEGIN
          |     who_id := get_who_id(null::text);
          |     ...
          | ```
         | 
          | Identifiers changed, but hopefully this will give you the idea.
        
           | tda wrote:
            | I've experimented with a very similar solution and it felt a
            | bit dirty, but so far it seems to be working just fine. I
            | have made an integration for auditing with sqlmodel which I
            | intend to share, but it is kind of rough and I was a bit
            | stuck trying to clean it up. The idea is that you add e.g. a
            | HeroHistory model derived from HeroBase and a HistoryMixin
            | that creates the triggers and relationships needed to have
            | an audit log of the Hero table.
           | 
           | If anyone is interested give me a shout out
        
           | sususu wrote:
           | That's really interesting and gave me some (hopefully) good
           | ideas.
           | 
           | Thank you very much!
        
       | raimille1 wrote:
        | This is a great technique; we had it set up very similarly at a
        | previous startup and it worked wonders. Thanks for sharing a
        | generic way of doing it @oliverrice!
        
       | ithrow wrote:
       | Debezium can be used for this in embedded mode if you are on the
       | JVM.
        
         | johndfsgdgdfg wrote:
         | Can you please elaborate? Last time I checked it needed Kafka
         | to work.
        
       ___________________________________________________________________
       (page generated 2022-03-09 23:00 UTC)