[HN Gopher] Postgres Observability ___________________________________________________________________ Postgres Observability Author : fforflo Score : 346 points Date : 2020-11-11 13:12 UTC (9 hours ago) (HTM) web link (pgstats.dev) (TXT) w3m dump (pgstats.dev) | iamwil wrote: | Are these usually used for monitoring (like a read-only api)? Or | do people build on top of this? Like say, you only need the write | ahead log, so you build out a key value store, ignoring all the | other stuff postgres built on top of the write ahead log. | | I guess I'm asking if you can take postgres and "turn it inside | out" by cherry-picking parts of it to build other storage | software with it. | anarazel wrote: | > Are these usually used for monitoring (like a read-only api)? | | Yes. | | > I guess I'm asking if you can take postgres and "turn it | inside out" by cherry-picking parts of it to build other | storage software with it. | | Unfortunately many subsystems of postgres are too | interdependent to easily be used independently. Including the | WAL. | | Part of that is historical / unintentional. But there's also | plenty places where generalizing subsystems would make the | whole slower and/or more complicated. | | There definitely are bits that I'd like to have used outside of | postgres many times. If you have to write C, something like | postgres' memory context API makes it so much more convenient | (and often more performant too!). | technimad wrote: | I don't like the devaluation of the word observability. A chart | like this used to be called something like 'tools to monitor and | understand postgres'. Observability is the ability to derive the | internal state of a system from the outside. Usually done through | traces. | | It would be great if Postgres could emit a trace per query, | showing in real-time which internal components were hit by this | query. A sort of continuous query explain service. | | Combine these traces to database clients and other front end | services and you'll be able to point to the front end service | version which causes cache misses deep inside postgres. | anarazel wrote: | What kind of in information would you like to see in that | trace? | | Not planning to work on it directly, but I am planning to do | some larger executor work as one of the next bigger projects, | and having ideas about what kind of information people would | like to see could make it easier to implement them later. | technimad wrote: | A trace should answer why a query is fast or slow. It should | provide enough information to a dba (which I'm not) to | determine in which domain the most likely optimization is | possible. I.e. the query itself, db configuration and tuning | or underlying infra. | anarazel wrote: | Unfortunately some of the data for that isn't cheap to | collect. Postgres' EXPLAIN (ANALYZE, SETTINGS) <query>, | which shows a good bit of what you want, can slow queries | down substantially (in some workloads). Primarily because | of timestamp overhead. | | Just always collecting that information just in case it may | get accessed thus isn't really feasible. It'd be good to | make it possible to query cheaper information on-demand | though (e.g. asking for the EXPLAIN of a query running in | another session, without analyze, should be doable with | some effort). | | You can already set up things in a way that allows to | correlate connections / queries with distributed tracing. | But it's a more work than it should be. Postgres' | pg_stat_activity shows queries, and it can include | information that allows to correlate in the connection's | 'application_name'. | tybit wrote: | The point the OP was trying to make I believe, is not | that it should do all of that, but that if it doesn't do | that a different name to observability should be chosen | such as monitoring. | anarazel wrote: | "It would be great if Postgres could emit a trace per | query, showing in real-time which internal components | were hit by this query. A sort of continuous query | explain service." | Someone wrote: | https://www.postgresql.org/docs/current/dynamic-trace.html: | | "PostgreSQL provides facilities to support dynamic tracing of | the database server. This allows an external utility to be | called at specific points in the code and thereby trace | execution. | | A number of probes or trace points are already inserted into | the source code. These probes are intended to be used by | database developers and administrators. By default the probes | are not compiled into PostgreSQL; the user needs to explicitly | tell the configure script to make the probes available. | | Currently, the DTrace utility is supported, which, at the time | of this writing, is available on Solaris, macOS, FreeBSD, | NetBSD, and Oracle Linux. The SystemTap project for Linux | provides a DTrace equivalent and can also be used. Supporting | other dynamic tracing utilities is theoretically possible by | changing the definitions for the macros in | src/include/utils/probes.h." | hobs wrote: | Not super familiar with PG but I was under the impression it | did have traces - https://www.postgresql.org/docs/9.1/dynamic- | trace.html | [deleted] | thamer wrote: | Observability is not just about traces. This great article[1] | by @copyconstruct describes it in detail, and lists monitoring, | alerting/visualization, tracing, and log aggregation/analytics | as key components of an observability stack. | | [1] https://copyconstruct.medium.com/monitoring-and- | observabilit... | valyala wrote: | I'm curios whether all the metrics mentioned here are exported by | postgres_exporter [1]? Nowadays Prometheus becomes de-facto | standard for monitoring, so it would be great if | postgres_exporter could provide all these metrics to Prometheus. | Moreover, there are other monitoring solutions such as | VictoriaMetrics [2] or vmagent [3], which can scrape metrics from | Prometheus-compatible exporters. | | [1] https://github.com/wrouesnel/postgres_exporter | | [2] https://github.com/VictoriaMetrics/VictoriaMetrics/ | | [3] https://victoriametrics.github.io/vmagent.html | pgtruesdell wrote: | This is a handy tool, big thanks to the dev who put this | together. This is going to be an excellent educational tool. My | only suggestion I have is to add links to the docs for the detail | page's relevant function. | michaelcampbell wrote: | I find the UX kind of jarring; the fade-out/in feels too fast for | me, but that may be pretty subjective. | andrewstuart2 wrote: | Yeah that was my one element of constructive feedback. The | flashing literally made my eyes uncomfortable, to the point | that I don't really want to explore too much. | lesovsky wrote: | I'm not a professional UX designer, just a Postgres DBA ))) | I'm not promising night mode, but try to experiment with | colors and make animation more smooth. | scott_s wrote: | The problem for me is that by changing what I'm _not_ | looking at, it draws me to look at what was taken away. We | are drawn to look at movement. It may be better to leave | the not-relevant stuff alone, and instead highlight what | the user is currently hovering over. Something as simpler | as making the relevant text bold, increasing the thickness | of the lines, and maybe making the text box a darker shade | may help. | jyoshi wrote: | This is very interesting. Another valuable information, albeit | involving quite a bit of work, would be to show parameters | associated with that specific area. | Munksgaard wrote: | What am I looking at here? | tempest_ wrote: | Postgres views and functions that contain info you might want | to know about a Postgres instance | | You can click on them and it will tell you more. | tirumaraiselvan wrote: | For those interested in similar chart but for linux processes: | http://www.brendangregg.com/linuxperf.html | | Amazing stuff. | candiddevmike wrote: | I would love to see PostgreSQL integrate with something like | Jaeger, specifically to have spans for triggers. Database | triggers are terrible to introspect and debug. | sargun wrote: | I added integration for Zipkin in via Opencensus some time ago | into a project I work on. This has been running in production | for a while now: https://github.com/Netflix/titus- | executor/tree/master/vpc/se... | | You need to: | | 1. Have an Opencensus tracing context | | 2. Have query logging setup on your PG server. There are a | bunch of ways to do this with minimal overhead. You can log | slow queries (say queries >5ms), or log queries that fail, or | sample queries. | | Your query / log ends up getting something like: | 2020-11-11 21:00:00 UTC:titusvpcservice@titusvpcservice: | [60294]:HINT: The transaction might succeed if retried. | 2020-11-11 21:00:00 UTC:titusvpcservice@titusvpcservice: | [60294]:STATEMENT: /* md: {"spanID":"34c1a9f38fb44cad"} */ | INSERT INTO assignments(branch_eni_association, assignment_id) | VALUES ($1, $2) RETURNING id | | You can then look at a Zipkin, and use the value within MD | (spanID) to get the trace. I did this originally, because I | wanted to transparently wrap the PG SQL Driver for Zipkin. | Postgres can be oblivious to the fact there's "Zipkin Inside", | because it's a terminal node. You can get the span ID from the | BEGIN TRANSACTION / first query, and then tie that to the pid | and timestamp, and then use that to go back and look through | things with standard(ish) postgres introspection, since almost | all of it has the pg_backend_pid + timestamp in it. | anarazel wrote: | Have you considered to add the span ID to the application | name instead? Combined with including the application name in | the log_line_prefix that will make the logs easier to inspect | in quite a few cases. Also makes querying pg_stat_activity | etc easier, since you can see the current application_name | for each connection in there. | awinter-py wrote: | yeah -- I wonder how DBs will change over the next few years. | Their performance advantage is partially code quality, but | partially that they can run complex searches 'close to the | data'. You could do the latter with a stateful REST server too, | and it would support a lot of modern tooling. | brunoqc wrote: | Did you check https://github.com/open-telemetry/opentelemetry- | js-contrib/t... ? | | I'm not sure if it's the only one but opentelemetry.io does | list PostgreSQL. | candiddevmike wrote: | Correct me if I'm wrong, but this just seems like using the | opentelemetry library in node to send query info in a span. | This doesn't seem like a full PostgreSQL integration? | brunoqc wrote: | Yeah it looks like it runs on the client side right? While | I was expecting a proper plugin running on the postgresql | side. | sargun wrote: | Why would you need any "server side" logic in PG? | majewsky wrote: | They want to trace the goings-on inside PG, e.g. if you | have a query that runs in 3 seconds and joins five | different tables, it might be interesting to see | information like "spent 0.467 seconds joining table B and | F". Like what EXPLAIN ANALYZE does, but live. | purple-again wrote: | Lovely and interesting tool. If submitted by the author; one | small probable bug detected. When you mouse over "Stats | Collector" it doesn't highlight any packages | lesovsky wrote: | Unfortunately, it's not a bug )). There is no any view or | functions for inspecting stats collector. | sor-axxel wrote: | I was searching for something like this. Thanks- | nickjj wrote: | If anyone is using Flask I wrote an extension that helps get at | some of these stats by adding a bunch of Flask CLI commands to | your project: https://github.com/nickjj/flask-pg-extras | | It's super inspired (read: mostly copy / pasted) from Heroku's | "PG Extras" tool, except in this case it all works without | Heroku. You just need to be using SQLAlchemy. | i_have_to_speak wrote: | pgmetrics [1] is an open source tool that can collect information | from most of the views/functions described here. Can emit that | information in text or JSON formats. | | [1] https://pgmetrics.io | maxekman wrote: | Very useful, thanks for sharing. | rickspencer3 wrote: | InfluxDB has a template to aid with monitoring a postgres | instances[0]. I wonder if the Telegraf Plugin[1] could be | enhanced to use more of these metrics, and then the template | could be enhanced as well? | | [0] https://github.com/influxdata/community- | templates/tree/maste... [1] | https://github.com/influxdata/telegraf/tree/master/plugins/i... | abhinav22 wrote: | As a side note, .dev is Such a great domain -- great to see its | popularity increasing! | [deleted] ___________________________________________________________________ (page generated 2020-11-11 23:01 UTC)