[HN Gopher] Postgres Auditing in 150 lines of SQL
___________________________________________________________________
Postgres Auditing in 150 lines of SQL
Author : oliverrice
Score  : 146 points
Date   : 2022-03-09 15:26 UTC (7 hours ago)
(HTM) web link (supabase.com)
(TXT) w3m dump (supabase.com)
  | jph wrote:
  | The post leads to a good Supabase Postgres auditing tool, a PG extension named supa_audit:
  |
  | https://github.com/supabase/supa_audit
  | simonw wrote:
  | I've battled this problem many times over my career in many different ways - this approach looks very solid to me. I particularly like the way the schema design incorporates a record ID, old record ID, updated record and old record.
  |
  | You do end up storing a lot of extra data, but it's very thorough - it's not going to miss anything, and figuring out what happened later on should be very easy.
  | lichtenberger wrote:
  | I'd argue that's still a lot of work to do manually. However, great work and detail, thanks a lot :-)
  |
  | I'm working on a database system[1] in my spare time, which automatically retains all revisions and assigns revision timestamps during commits (a single timestamp in the RevisionRootPage). Furthermore, it is tamper-proof, and the whole storage can be verified by comparing a single UberPage hash, as in ZFS.
  |
  | Basically it is a persistent trie-based revision index (plus document and secondary indexes) mapped to durable storage, a simple log-structured append-only file. A second file tracks revision offsets to provide binary search on an in-memory map of timestamps. As the root of the tree is atomically swapped, it does not need a WAL, which would basically be another data file and can be omitted in this case.
  |
  | Besides versioning the data itself in a binary encoding similar to BSON, it tracks changes and writes simple JSON diff files for each new revision.
  |
  | Furthermore, the data pages are not simply copied on write; a sliding snapshot algorithm ensures that mostly only changed records have to be written. Before the page fragments are written to durable storage they are compressed, and in the future they might be encrypted.
  |
  | [1] https://sirix.io
  | https://github.com/sirixdb/sirix
  | jmalicki wrote:
  | Every Google hit for "UberPage" seems to be your writing. Is there standard ZFS terminology, found in the ZFS documentation, for what you're referring to?
  | lichtenberger wrote:
  | In our case it's the tree root of the index, which is always written after all the descendant pages have been written, as a new revision is committed (during a postorder traversal of the new pages).
  |
  | In ZFS the UberPage is called the UberBlock. We borrowed some of the concepts, such as storing checksums in the parent pages instead of in the pages themselves. In ZFS they are blocks :-)
  |
  | Thanks for asking.
  | ithrow wrote:
  | Yes, but sirix is not a SQL/RDBMS.
  | lichtenberger wrote:
  | True. In general I could add support for storing relational data as well, but currently I'm focusing entirely on JSON, auto-indexing for secondary indexes, and higher-order function support in Brackit.
  |
  | Of course we'd need more manpower, as I'm more or less the only one working on the core in my spare time (since 2012).
  |
  | Moshe mainly works on the clients and a frontend. A new frontend based on SolidJS is in the works, showing the history and diffs as in
  |
  | https://raw.githubusercontent.com/sirixdb/sirix/master/Scree...
  |
  | However, we're of course looking forward to suggestions, bug reports, real-world use cases and contributions :-)
  | lichtenberger wrote:
  | In order to provide fast audits of subtrees, the system optionally stores a Merkle hash tree (a hash in each node) and updates the hashes of all ancestors automatically during updates.
  | chatmasta wrote:
  | You might like what we're doing with Splitgraph. Our command line tool (sgr) installs an audit log into Postgres to track changes [0]. Then `sgr commit` can write these changes to delta-compressed objects [1], where each object is a columnar fragment of data, addressable by the LTHash of rows added/deleted by the fragment, and attached to metadata describing its index [2].
  |
  | I haven't explored sirix before, but at first glance it looks like we have some similar ideas -- thanks for sharing, I'm excited to learn more, especially about its application of ZFS.
  |
  | [0] https://www.splitgraph.com/docs/working-with-data/tracking-c...
  |
  | [1] https://www.splitgraph.com/docs/concepts/objects
  |
  | [2] https://github.com/splitgraph/splitgraph/blob/master/splitgr...
  | lichtenberger wrote:
  | Very interesting, thanks for pointing it out :-)
  | oliverrice wrote:
  | author here (of the blog post and https://github.com/supabase/supa_audit)
  |
  | happy to answer any questions
  | simonw wrote:
  | Are you at all worried about the size of the audit table? Keeping a JSON copy of both the old and the new data is going to add up pretty quickly for fast-changing tables.
  |
  | I've built audit systems in the past which track just the new values, with the idea being that you can replay the audit log for a particular record from the beginning to reconstruct its state. I'm not convinced that's a better approach than yours here though, just interested in your thinking.
  | oliverrice wrote:
  | > Are you at all worried about the size of the audit table? Keeping a JSON copy of both the old and the new data is going to add up pretty quickly for fast-changing tables.
  |
  | That's definitely a valid concern if you're auditing a high-frequency insert/update table, or have your entire database under audit. If you do have those needs, a system that logs outside of Postgres (like pgaudit) would be a better fit.
  |
  | In my experience most startups and mid-size enterprises sprinkle in auditing around the sensitive parts of the DB, like `account` or `access_control` tables, where write load isn't much of a concern.
  |
  | > I've built audit systems in the past which track just the new values
  |
  | Yeah, that solution works great! The one place it breaks down is if you apply auditing to an existing table and need to be able to recover the records from when auditing was enabled (the initial state is not snapshotted).
  |
  | We toyed with another approach to avoid having to track old_record: first checking to see if the `record_id` exists in the audit table, and then inserting a row with the OLD values with a `SNAPSHOT` `operation` if it does not.
  |
  | Even though that query/check was operating on an index, the performance overhead was higher than we were comfortable with.
  |
  | Storage is pretty cheap these days, so we opted to optimize for write throughput + reduced memory usage rather than for disk usage.
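A minimal sketch of the "SNAPSHOT on first write" idea described above, with illustrative names (an audit_log table and an "id uuid" primary key) rather than supa_audit's actual schema; the EXISTS probe is the per-write index check whose overhead is mentioned:

    create table if not exists audit_log (
        id        bigint generated always as identity primary key,
        record_id uuid not null,
        op        text not null,            -- INSERT / UPDATE / DELETE / SNAPSHOT
        ts        timestamptz not null default now(),
        record    jsonb
    );

    create index if not exists audit_log_record_id_idx on audit_log (record_id);

    create or replace function audit_with_snapshot()
    returns trigger
    language plpgsql
    as $$
    declare
        rid uuid;
    begin
        if tg_op = 'INSERT' then
            rid := new.id;                   -- assumes an "id uuid" primary key
        else
            rid := old.id;
        end if;

        -- The extra per-write check: if this row has never been audited,
        -- snapshot its prior state so the pre-audit data is not lost.
        if tg_op <> 'INSERT'
           and not exists (select 1 from audit_log where record_id = rid) then
            insert into audit_log (record_id, op, record)
            values (rid, 'SNAPSHOT', to_jsonb(old));
        end if;

        if tg_op = 'DELETE' then
            insert into audit_log (record_id, op, record)
            values (rid, 'DELETE', to_jsonb(old));
            return old;
        end if;

        insert into audit_log (record_id, op, record)
        values (rid, tg_op, to_jsonb(new));
        return new;
    end;
    $$;

    -- usage on a hypothetical table "account(id uuid primary key, ...)":
    --   create trigger account_audit
    --       after insert or update or delete on account
    --       for each row execute function audit_with_snapshot();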
  | jsmith99 wrote:
  | I've also used an update trigger to track only changed columns, inspired by this post[0], which wraps a comparison function in a new subtract operator for JSONB, similar to the built-in subtract operator for HSTORE which removes any key pairs that haven't changed.
  |
  | [0] http://8kb.co.uk/blog/2015/01/19/copying-pavel-stehules-simp...
  | oliverrice wrote:
  | that's a cool solution for reducing storage. nice!
  | craigkerstiens wrote:
  | Building a bit off Simon's question above, I'm curious whether any thought was given to using JSON over JSONB. The trade-off of being able to ingest and save the values really quickly seems like it could be better than JSONB, which has to do some conversion and which, if there are any indexes on the JSONB columns, could really dramatically slow down throughput.
  | oliverrice wrote:
  | > I'm curious whether any thought was given to using JSON over JSONB
  |
  | For sure! There is a conversion penalty that is paid incrementally (at write time). For paying that penalty, jsonb gets you reduced storage size and dramatically faster column value extraction with `->` or `->>` if you end up having to filter the audit table by something other than the `record_id`.
  |
  | It is a tradeoff though, and depending on the specifics of your use case JSON could be a better choice.
  | netcraft wrote:
  | This is great! I've done something similar many times, but the ability to do it on arbitrary tables is really nice.
  |
  | If you have the HSTORE extension, you can also use it to get a diff of the old and new: `HSTORE(OLD) - HSTORE(NEW)` and vice versa - very nice for easily getting the deltas.
  | perlgeek wrote:
  | First of all, this looks super awesome, and at least an order of magnitude better than our home-grown solution (grown over 25+ years on a DB that is now MariaDB).
  |
  | How well does this work when you audit multiple tables with foreign key relationships between them?
  |
  | If the user references a customer, and you want to find out what customer the user belonged to at time X (but the customer might have been deleted in the meantime), you have to track through the history of multiple tables to find answers. More so if you have n:m relationships...
  |
  | It sounds doable, but not trivial to get right for more complicated models.
  |
  | ---
  |
  | Update: to clarify, I want to be able to present a somewhat human-readable history of an object without doing much table-specific logic.
  |
  | If an author can publish many books, you typically have an n:m mapping table, like book_authors. If you want to add newly published books to the "author" history view, you have to work out that you need to include parts of the history of the book_authors table. I don't think you can do that through a join though? Trying to think my way through this...
  |
  | Maybe this can be dealt with more easily if you can assume each table has a non-composite primary key, include that primary key in the history table, and then do joins more easily on the history?
  | oliverrice wrote:
  | All the data would be available, and joining it would be possible with some jsonb gymnastics, but it's optimized more for compliance-style auditing than for rapid lookups with complex queries.
  |
  | For that scenario, replicating the audit data to an OLAP system may be a better fit.
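A rough illustration of those jsonb gymnastics for perlgeek's example ("which customer did this user belong to at time X?"), assuming both tables are audited; the table names app_user and customer are made up, the column names (ts, table_name, record) only approximately follow the post's audit.record_version, and DELETE rows / old_record are ignored for brevity:

    -- state of user 42 and its customer as of 2022-03-01, reconstructed from the audit log
    with user_at_x as (
        select record
        from audit.record_version
        where table_name = 'app_user'
          and record ->> 'id' = '42'
          and ts <= timestamptz '2022-03-01'
        order by ts desc
        limit 1
    ),
    customer_at_x as (
        select rv.record
        from audit.record_version rv
        join user_at_x u
          on rv.record ->> 'id' = u.record ->> 'customer_id'
        where rv.table_name = 'customer'     -- still answerable if the row was deleted later
          and rv.ts <= timestamptz '2022-03-01'
        order by rv.ts desc
        limit 1
    )
    select record ->> 'name' as customer_name
    from customer_at_x;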
  | simonw wrote:
  | I'm not completely clear on the reason for both record_id and old_record_id - is this so that you can capture instances where one of the columns that makes up the primary key for the row is itself modified for some reason?
  | oliverrice wrote:
  | Being able to see when the primary key changes using `record_id` and `old_record_id` is a perk, but we also wanted to make sure a snapshot of the existing data was available when auditing is enabled on a table that already has data in it. See the comment above for more color.
  |
  | Long story short: it is avoidable if you're okay with paying the penalty of querying the audit table by `record_id` before every write. We decided to use more disk and keep the higher write throughput.
  | bob1029 wrote:
  | We have a new business system we are trying to build that has a _lot_ of very granular auditing and versioning requirements throughout, and this is where I started looking.
  |
  | After a while of screwing around with this kind of approach, I realized that the number of different types of objects involved was going to cause an explosion in the number of related auditing and versioning tables. Keeping track of not only who changed something but also the exact facts they changed (such that you could revert as needed) requires a lot of additional boilerplate in each instance.
  |
  | So, I made a pretty big gamble and went all-in on an event-sourced abstraction where everything is written to one gigantic append-only log of events. Turns out, this was worth it in our case. As long as everything has a timestamp and you have a way to refer to specific log entries (i.e. by offset), you basically have a complete solution. The hard parts are handled by a JSON serializer and gzip. We use in-memory indexes that are simply long/long dictionaries mapping an object identity to a physical log offset.
  |
  | The only downside with what I have proposed is that no garbage collection or cleanup is possible. At this time, this is actually a highly desirable property of the system. Disk space is cheap. Losing 3 weeks of work to an accidental button click is not.
  | infogulch wrote:
  | I've done this before on smaller scales. By writing a set of table-valued functions that expose the event data in relational form "as of the date given by param X", and views on top of that where X is set to getdate(), you get a very ergonomic way to consume "current" & "historical" data. Materialized views can make querying these more natural relations very cheap if your engine supports them.
  | pstuart wrote:
  | > everything is written to one gigantic append-only log of events
  |
  | Do you replicate that stream to avoid a SPOF?
  | bob1029 wrote:
  | I am currently investigating synchronous replication of each compressed log batch to a 2nd host. We are totally fine with availability issues (i.e. the system being down because log replication is broken). Availability problems are far preferable to consistency issues and other nasty bugs.
  | lichtenberger wrote:
  | As posted in another comment, that's basically what https://github.com/sirixdb/sirix supports, along with easily reconstructing former revisions of a JSON document (or any node therein), sophisticated secondary (also versioned) indexes, querying with JSONiq, and optimizations at query compile time for joins and aggregates....
  |
  | In our case we opted for three types of secondary user-defined indexes (besides the main document trie index to find unique, stable nodes by their 64-bit ID), namely
  |
  | 1. field indexes
  |
  | 2. path indexes
  |
  | 3. content-and-structure indexes, which index paths and their typed values
  |
  | Furthermore, a path summary keeps track of all distinct paths in the JSON resource.
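A sketch of the table-valued-function pattern infogulch describes above, translated into Postgres terms (illustrative names; an append-only account_events table stands in for the big event log, and now() plays the role of getdate()):

    create table if not exists account_events (
        seq        bigint generated always as identity primary key,
        account_id bigint not null,
        ts         timestamptz not null default now(),
        payload    jsonb not null           -- full state of the account after the event
    );

    -- relational view of the log "as of" a point in time
    create or replace function accounts_as_of(p_at timestamptz)
    returns table (account_id bigint, payload jsonb)
    language sql stable
    as $$
        select distinct on (e.account_id) e.account_id, e.payload
        from account_events e
        where e.ts <= p_at
        order by e.account_id, e.seq desc;
    $$;

    -- "current" data is the same function evaluated at now()
    create or replace view accounts_current as
        select * from accounts_as_of(now());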
  | scwoodal wrote:
  | I didn't see this mentioned in the article, but with JSONB you can leverage operators to know what data changed.
  |
  | -- record = {"id": 1, "name": "bar"}, old_record = {"id": 1, "name": "foo"}
  | select record - old_record where id=3;
  | -- name => bar
  | netcraft wrote:
  | I don't think this is built in; I think you must have an extension or a custom operator providing that:
  | https://dbfiddle.uk/?rdbms=postgres_14&fiddle=273a440d043a82...
  | scwoodal wrote:
  | Ah yes, I was thinking of HSTORE.
  |
  | https://dbfiddle.uk/?rdbms=postgres_14&fiddle=2f406630f367b5...
  | efxhoy wrote:
  | This SO answer has a good jsonb diff function: https://stackoverflow.com/a/36043269
  | craigkerstiens wrote:
  | Love the detail and approach here, though (and it is mentioned at the bottom of the article) this shouldn't be used at very large scale... the estimate of about 1k transactions per second or less seems about right. For any larger scale you want to look at something like pgaudit, which can scale much, much further and be used with other extensions to ensure you have a full audit log that can't be modified (when dealing with compliance).
  | sususu wrote:
  | I really like this approach; however, I have a use case where the id of the application user that made the operation must be saved in the audit table, and unfortunately I cannot see how to do that with a pure SQL solution.
  |
  | Has anyone done something similar with SQL only?
  | oliverrice wrote:
  | if `user_id` is in the table you're auditing you could do
  |
  | ```
  | alter table audit.record_version
  |     add column user_id bigint
  |     generated always as ((coalesce(record, old_record) ->> 'user_id')::bigint) stored;
  | ```
  |
  | or, if you meant the SQL `current_user`, you could extend the trigger to track it.
  |
  | but if the user_id is only available in application space (not in the db) it will not be possible using straight SQL
  | timwis wrote:
  | PostgREST has an interesting approach to authentication that might give you some ideas:
  |
  | https://postgrest.org/en/stable/auth.html
  |
  | Supabase is built on top of PostgREST, but I can't say for sure it uses the same means of authentication (I think it has its own Auth layer).
  | oliverrice wrote:
  | Supabase passes all the JWT info to PostgREST, so it is still available in transaction-local config
  | netcraft wrote:
  | we do that by setting a local variable before the query and then reading it in the triggers:
  |
  | ```
  | SET LOCAL foo.who_id = 'some-uuid';
  | UPDATE table SET ...
  | ```
  |
  | ```
  | -- function to make getting the setting easier
  | DROP FUNCTION IF EXISTS get_who_id (text);
  | CREATE OR REPLACE FUNCTION get_who_id (default_value text DEFAULT null::text)
  | RETURNS text AS $get_who_id$
  | DECLARE
  |     who_id text;
  | BEGIN
  |     BEGIN
  |         who_id := current_setting('foo.who_id');
  |     EXCEPTION WHEN SQLSTATE '42704' THEN
  |         RETURN default_value;
  |     END;
  |     IF (length(who_id) = 0) THEN
  |         RETURN default_value;
  |     END IF;
  |     RETURN who_id;
  | END
  | $get_who_id$ LANGUAGE plpgsql VOLATILE;
  | ```
  |
  | ```
  | CREATE OR REPLACE FUNCTION some_table_audit ()
  | RETURNS TRIGGER AS $some_table_audit$
  | DECLARE
  |     who_id text;
  | BEGIN
  |     who_id := get_who_id(null::text);
  |     ...
  | ```
  |
  | Identifiers changed, but hopefully this will give you the idea.
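A smaller variant of the same SET LOCAL pattern, using the two-argument form of current_setting() (missing_ok, available since PostgreSQL 9.6) so no exception handling is needed; the audit_log table from the earlier sketch and the app.actor setting name are illustrative, not part of the post:

    -- extra column on the audit table to record who performed the change
    alter table audit_log add column if not exists actor text;

    create or replace function current_actor()
    returns text
    language sql stable
    as $$
        select nullif(current_setting('app.actor', true), '');
    $$;

    -- inside the audit trigger, the insert just gains one more column:
    --   insert into audit_log (record_id, op, record, actor)
    --   values (rid, tg_op, to_jsonb(new), current_actor());

    -- and the application sets the actor per transaction:
    --   begin;
    --   set local app.actor = 'some-user-uuid';
    --   update account set ... where id = ...;
    --   commit;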
  | tda wrote:
  | I've experimented with a very similar solution and it felt a bit dirty, but so far it seems to be working just fine. I have made an integration for auditing with sqlmodel which I intend to share, but it is kind of rough and I was a bit stuck trying to clean it up. The idea is that you add e.g. a HeroHistory model derived from HeroBase and a HistoryMixin that creates the triggers and relationships, so that you have an audit log of the Hero table.
  |
  | If anyone is interested, give me a shout out.
  | sususu wrote:
  | That's really interesting and gave me some (hopefully) good ideas.
  |
  | Thank you very much!
  | raimille1 wrote:
  | This is a great technique; we used to have it set up very similarly at a previous startup and it worked wonders. Thanks for sharing a generic way of doing it, @oliverrice!
  | ithrow wrote:
  | Debezium can be used for this in embedded mode if you are on the JVM.
  | johndfsgdgdfg wrote:
  | Can you please elaborate? Last time I checked it needed Kafka to work.
___________________________________________________________________
(page generated 2022-03-09 23:00 UTC)