[HN Gopher] Postgres Observability
       ___________________________________________________________________
        
       Postgres Observability
        
       Author : fforflo
       Score  : 346 points
       Date   : 2020-11-11 13:12 UTC (9 hours ago)
        
 (HTM) web link (pgstats.dev)
 (TXT) w3m dump (pgstats.dev)
        
       | iamwil wrote:
       | Are these usually used for monitoring (like a read-only api)? Or
       | do people build on top of this? Like say, you only need the write
       | ahead log, so you build out a key value store, ignoring all the
       | other stuff postgres built on top of the write ahead log.
       | 
       | I guess I'm asking if you can take postgres and "turn it inside
       | out" by cherry-picking parts of it to build other storage
       | software with it.
        
         | anarazel wrote:
         | > Are these usually used for monitoring (like a read-only api)?
         | 
         | Yes.
         | 
         | > I guess I'm asking if you can take postgres and "turn it
         | inside out" by cherry-picking parts of it to build other
         | storage software with it.
         | 
         | Unfortunately many subsystems of postgres are too
         | interdependent to easily be used independently. Including the
         | WAL.
         | 
          | Part of that is historical / unintentional. But there are also
          | plenty of places where generalizing subsystems would make the
          | whole slower and/or more complicated.
         | 
          | There definitely are bits that I'd have liked to use outside
          | of postgres many times. If you have to write C, something like
          | postgres' memory context API makes it so much more convenient
          | (and often more performant too!).
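          | A rough standalone sketch of the idea behind that API: a
          | region ("arena") allocator where every allocation is tied to
          | a context and freed in one call. This is an illustrative
          | analogue, not Postgres' implementation (whose entry points
          | are palloc() and MemoryContextDelete()); the names below are
          | made up.

```c
#include <stdlib.h>

/* One chunk of memory owned by an arena; payload follows the header. */
typedef struct chunk {
    struct chunk *next;
} chunk;

/* A toy "memory context": everything allocated from it
 * is released with a single arena_destroy() call. */
typedef struct arena {
    chunk *head;
} arena;

static arena *arena_create(void) {
    arena *a = malloc(sizeof(arena));
    a->head = NULL;
    return a;
}

static void *arena_alloc(arena *a, size_t size) {
    chunk *c = malloc(sizeof(chunk) + size);
    c->next = a->head;
    a->head = c;
    return c + 1;          /* memory right after the header */
}

/* Free every allocation at once -- no per-pointer free() calls,
 * which is what makes this pattern convenient in C. */
static void arena_destroy(arena *a) {
    chunk *c = a->head;
    while (c) {
        chunk *next = c->next;
        free(c);
        c = next;
    }
    free(a);
}
```

          | Postgres additionally nests contexts by lifetime (per query,
          | per transaction, ...), so stray allocations can't outlive
          | their owning context.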
        
       | technimad wrote:
       | I don't like the devaluation of the word observability. A chart
       | like this used to be called something like 'tools to monitor and
       | understand postgres'. Observability is the ability to derive the
       | internal state of a system from the outside. Usually done through
       | traces.
       | 
       | It would be great if Postgres could emit a trace per query,
       | showing in real-time which internal components were hit by this
       | query. A sort of continuous query explain service.
       | 
        | Combine these traces with those from database clients and other
        | front-end services and you'd be able to point to the front-end
        | service version that causes cache misses deep inside postgres.
        
         | anarazel wrote:
          | What kind of information would you like to see in that
          | trace?
         | 
         | Not planning to work on it directly, but I am planning to do
         | some larger executor work as one of the next bigger projects,
         | and having ideas about what kind of information people would
         | like to see could make it easier to implement them later.
        
           | technimad wrote:
           | A trace should answer why a query is fast or slow. It should
           | provide enough information to a dba (which I'm not) to
           | determine in which domain the most likely optimization is
           | possible. I.e. the query itself, db configuration and tuning
           | or underlying infra.
        
             | anarazel wrote:
             | Unfortunately some of the data for that isn't cheap to
             | collect. Postgres' EXPLAIN (ANALYZE, SETTINGS) <query>,
             | which shows a good bit of what you want, can slow queries
             | down substantially (in some workloads). Primarily because
             | of timestamp overhead.
             | 
              | Always collecting that information just in case it may
              | get accessed thus isn't really feasible. It'd be good to
             | make it possible to query cheaper information on-demand
             | though (e.g. asking for the EXPLAIN of a query running in
             | another session, without analyze, should be doable with
             | some effort).
             | 
              | You can already set up things in a way that allows you to
              | correlate connections / queries with distributed tracing.
              | But it's more work than it should be. Postgres'
              | pg_stat_activity shows queries, and the connection's
              | 'application_name' can carry information that allows
              | correlation.
        
               | tybit wrote:
                | I believe the point the OP was trying to make is not
                | that it should do all of that, but that if it doesn't,
                | a name other than observability should be chosen, such
                | as monitoring.
        
               | anarazel wrote:
               | "It would be great if Postgres could emit a trace per
               | query, showing in real-time which internal components
               | were hit by this query. A sort of continuous query
               | explain service."
        
         | Someone wrote:
         | https://www.postgresql.org/docs/current/dynamic-trace.html:
         | 
         | "PostgreSQL provides facilities to support dynamic tracing of
         | the database server. This allows an external utility to be
         | called at specific points in the code and thereby trace
         | execution.
         | 
         | A number of probes or trace points are already inserted into
         | the source code. These probes are intended to be used by
         | database developers and administrators. By default the probes
         | are not compiled into PostgreSQL; the user needs to explicitly
         | tell the configure script to make the probes available.
         | 
         | Currently, the DTrace utility is supported, which, at the time
         | of this writing, is available on Solaris, macOS, FreeBSD,
         | NetBSD, and Oracle Linux. The SystemTap project for Linux
         | provides a DTrace equivalent and can also be used. Supporting
         | other dynamic tracing utilities is theoretically possible by
         | changing the definitions for the macros in
         | src/include/utils/probes.h."
        
         | hobs wrote:
         | Not super familiar with PG but I was under the impression it
         | did have traces - https://www.postgresql.org/docs/9.1/dynamic-
         | trace.html
        
         | [deleted]
        
         | thamer wrote:
         | Observability is not just about traces. This great article[1]
         | by @copyconstruct describes it in detail, and lists monitoring,
         | alerting/visualization, tracing, and log aggregation/analytics
         | as key components of an observability stack.
         | 
         | [1] https://copyconstruct.medium.com/monitoring-and-
         | observabilit...
        
       | valyala wrote:
        | I'm curious whether all the metrics mentioned here are exported
        | by postgres_exporter [1]? Nowadays Prometheus is becoming the
        | de-facto standard for monitoring, so it would be great if
        | postgres_exporter could provide all these metrics to Prometheus.
       | Moreover, there are other monitoring solutions such as
       | VictoriaMetrics [2] or vmagent [3], which can scrape metrics from
       | Prometheus-compatible exporters.
       | 
       | [1] https://github.com/wrouesnel/postgres_exporter
       | 
       | [2] https://github.com/VictoriaMetrics/VictoriaMetrics/
       | 
       | [3] https://victoriametrics.github.io/vmagent.html
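        | For reference, a minimal Prometheus scrape config for
        | postgres_exporter might look like this (9187 is the exporter's
        | default listen port; adjust the target for your deployment):

```yaml
scrape_configs:
  - job_name: 'postgres'
    static_configs:
      - targets: ['localhost:9187']   # postgres_exporter default port
```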
        
       | pgtruesdell wrote:
        | This is handy, big thanks to the dev who put this together.
        | It's going to be an excellent educational tool. The only
        | suggestion I have is to add links to the docs for each detail
        | page's relevant function.
        
       | michaelcampbell wrote:
       | I find the UX kind of jarring; the fade-out/in feels too fast for
       | me, but that may be pretty subjective.
        
         | andrewstuart2 wrote:
         | Yeah that was my one element of constructive feedback. The
         | flashing literally made my eyes uncomfortable, to the point
         | that I don't really want to explore too much.
        
           | lesovsky wrote:
            | I'm not a professional UX designer, just a Postgres DBA )))
            | I'm not promising night mode, but I'll try to experiment
            | with colors and make the animation smoother.
        
             | scott_s wrote:
             | The problem for me is that by changing what I'm _not_
             | looking at, it draws me to look at what was taken away. We
             | are drawn to look at movement. It may be better to leave
             | the not-relevant stuff alone, and instead highlight what
              | the user is currently hovering over. Something as simple
              | as making the relevant text bold, increasing the
              | thickness of the lines, and maybe making the text box a
              | darker shade may help.
        
       | jyoshi wrote:
        | This is very interesting. Another valuable addition, albeit one
        | involving quite a bit of work, would be to show the parameters
        | associated with each specific area.
        
       | Munksgaard wrote:
       | What am I looking at here?
        
         | tempest_ wrote:
         | Postgres views and functions that contain info you might want
         | to know about a Postgres instance
         | 
         | You can click on them and it will tell you more.
        
       | tirumaraiselvan wrote:
        | For those interested in a similar chart, but for linux
        | processes: http://www.brendangregg.com/linuxperf.html
       | http://www.brendangregg.com/linuxperf.html
       | 
       | Amazing stuff.
        
       | candiddevmike wrote:
       | I would love to see PostgreSQL integrate with something like
       | Jaeger, specifically to have spans for triggers. Database
       | triggers are terrible to introspect and debug.
        
         | sargun wrote:
          | I added Zipkin integration via OpenCensus into a project I
          | work on some time ago. This has been running in production
         | for a while now: https://github.com/Netflix/titus-
         | executor/tree/master/vpc/se...
         | 
         | You need to:
         | 
         | 1. Have an Opencensus tracing context
         | 
          | 2. Have query logging set up on your PG server. There are a
         | bunch of ways to do this with minimal overhead. You can log
         | slow queries (say queries >5ms), or log queries that fail, or
         | sample queries.
         | 
         | Your query / log ends up getting something like:
         | 2020-11-11 21:00:00 UTC:titusvpcservice@titusvpcservice:
         | [60294]:HINT: The transaction might succeed if retried.
         | 2020-11-11 21:00:00 UTC:titusvpcservice@titusvpcservice:
         | [60294]:STATEMENT: /* md: {"spanID":"34c1a9f38fb44cad"} */
         | INSERT INTO assignments(branch_eni_association, assignment_id)
         | VALUES ($1, $2) RETURNING id
         | 
          | You can then look at Zipkin, and use the value within md
         | (spanID) to get the trace. I did this originally, because I
         | wanted to transparently wrap the PG SQL Driver for Zipkin.
         | Postgres can be oblivious to the fact there's "Zipkin Inside",
         | because it's a terminal node. You can get the span ID from the
         | BEGIN TRANSACTION / first query, and then tie that to the pid
         | and timestamp, and then use that to go back and look through
         | things with standard(ish) postgres introspection, since almost
         | all of it has the pg_backend_pid + timestamp in it.
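          | The comment-injection step can be sketched as a tiny helper
          | that prepends the span ID to the SQL text before it's sent
          | (the helper name tag_query is made up; the md comment format
          | mirrors the log line above):

```c
#include <stdio.h>
#include <stdlib.h>

/* Prepend a tracing comment to a SQL statement so the span ID
 * shows up in Postgres' statement logs, as in the example above.
 * The caller frees the returned string. */
static char *tag_query(const char *span_id, const char *sql) {
    const char *fmt = "/* md: {\"spanID\":\"%s\"} */ %s";
    int len = snprintf(NULL, 0, fmt, span_id, sql);  /* measure */
    char *out = malloc(len + 1);
    snprintf(out, len + 1, fmt, span_id, sql);       /* format */
    return out;
}
```

          | Because the comment travels with the statement text, it also
          | appears verbatim in pg_stat_activity's query column while the
          | statement runs.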
        
           | anarazel wrote:
            | Have you considered adding the span ID to the application
            | name instead? Combined with including the application name
            | in the log_line_prefix, that will make the logs easier to
            | inspect
           | in quite a few cases. Also makes querying pg_stat_activity
           | etc easier, since you can see the current application_name
           | for each connection in there.
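            | As a sketch of that setup (the span ID value here is
            | illustrative; %a expands to the application name in
            | log_line_prefix):

```
# postgresql.conf: include the application name in each log line
log_line_prefix = '%m [%p] app=%a '

-- per connection, from the client:
SET application_name = 'titusvpcservice/34c1a9f38fb44cad';

-- then correlate live sessions:
SELECT pid, application_name, state, query FROM pg_stat_activity;
```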
        
         | awinter-py wrote:
         | yeah -- I wonder how DBs will change over the next few years.
         | Their performance advantage is partially code quality, but
         | partially that they can run complex searches 'close to the
         | data'. You could do the latter with a stateful REST server too,
         | and it would support a lot of modern tooling.
        
         | brunoqc wrote:
         | Did you check https://github.com/open-telemetry/opentelemetry-
         | js-contrib/t... ?
         | 
         | I'm not sure if it's the only one but opentelemetry.io does
         | list PostgreSQL.
        
           | candiddevmike wrote:
           | Correct me if I'm wrong, but this just seems like using the
           | opentelemetry library in node to send query info in a span.
           | This doesn't seem like a full PostgreSQL integration?
        
             | brunoqc wrote:
             | Yeah it looks like it runs on the client side right? While
             | I was expecting a proper plugin running on the postgresql
             | side.
        
               | sargun wrote:
               | Why would you need any "server side" logic in PG?
        
               | majewsky wrote:
               | They want to trace the goings-on inside PG, e.g. if you
               | have a query that runs in 3 seconds and joins five
               | different tables, it might be interesting to see
               | information like "spent 0.467 seconds joining table B and
               | F". Like what EXPLAIN ANALYZE does, but live.
        
       | purple-again wrote:
        | Lovely and interesting tool. If submitted by the author: one
        | small probable bug detected. When you mouse over "Stats
        | Collector" it doesn't highlight any packages.
        
         | lesovsky wrote:
          | Unfortunately, it's not a bug )). There aren't any views or
          | functions for inspecting the stats collector.
        
       | sor-axxel wrote:
       | I was searching for something like this. Thanks-
        
       | nickjj wrote:
       | If anyone is using Flask I wrote an extension that helps get at
       | some of these stats by adding a bunch of Flask CLI commands to
       | your project: https://github.com/nickjj/flask-pg-extras
       | 
       | It's super inspired (read: mostly copy / pasted) from Heroku's
       | "PG Extras" tool, except in this case it all works without
       | Heroku. You just need to be using SQLAlchemy.
        
       | i_have_to_speak wrote:
       | pgmetrics [1] is an open source tool that can collect information
        | from most of the views/functions described here. It can emit
        | that information in text or JSON formats.
       | 
       | [1] https://pgmetrics.io
        
       | maxekman wrote:
       | Very useful, thanks for sharing.
        
       | rickspencer3 wrote:
       | InfluxDB has a template to aid with monitoring a postgres
        | instance[0]. I wonder if the Telegraf Plugin[1] could be
       | enhanced to use more of these metrics, and then the template
       | could be enhanced as well?
       | 
       | [0] https://github.com/influxdata/community-
       | templates/tree/maste... [1]
       | https://github.com/influxdata/telegraf/tree/master/plugins/i...
        
       | abhinav22 wrote:
        | As a side note, .dev is such a great domain -- great to see its
       | popularity increasing!
        
       | [deleted]
        
       ___________________________________________________________________
       (page generated 2020-11-11 23:01 UTC)