[HN Gopher] Discussion: structured, leveled logging
       ___________________________________________________________________
        
       Discussion: structured, leveled logging
        
       Author : philosopher1234
       Score  : 135 points
       Date   : 2022-09-11 15:44 UTC (7 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | svnpenn wrote:
       | Every time I come across a logging package, I don't understand
       | what the point of them are. Finally in the linked issue, I see
       | this:
       | 
       | > For many programs that aren't run directly by person, like
       | servers, logging is the main way for developers to observe the
       | detailed behavior of the system, and often the first place they
       | go to debug it.
       | 
       | and it makes sense. I don't deal with servers at all, other than
       | a couple I have made just to host some basic sites or programs
       | locally. I think Go team might be overlooking the fact that a big
       | chunk of the userbase doesnt understand, and has no need for a
       | logging package. Couple that with the giant API proposed, along
       | with a poor design decision (kvs ...any), and the large amount of
       | bikeshedding that already happened in this issue, I really hope
       | they think hard before just shoving this into "std".
        
         | xani_ wrote:
         | Develop bigger app and you will understand.
         | 
         | But one example - having an logging interface that is not shit
         | (the current builtin is so garbage it might as well not exist
         | at all) allows for packages to adopt it as default method and
         | you magically can just fed same logger to every dep you pull
         | with no loss of features or having to write wrapper.
         | 
         | Contexted logger means you can just fed ["plugin":pluginname]
         | tagged one to every plugin and get logs pointing you to the
         | place where it happened.
         | 
         | The structured logging part is mostly to avoid woes of then
         | having to parse it back to structured format via bunch of
         | regexpes that will inevitably break next release. If you still
         | don't get it, here is a regexp to parse HAProxy logs:
         | .*haproxy\[(\d+)]: (.+?):(\d+) \[(.+?)\] (.+?)(|[\~])
         | (.+?)\/(.+?)
         | ([\-\d]+)\/([\-\d]+)\/([\-\d]+)\/([\-\d]+)\/([\-\d]+) ([\-\d]+)
         | ([\-\d]+) (\S+) (\S+) (\S)(\S)(\S)(\S)
         | ([\-\d]+)\/([\-\d]+)\/([\-\d]+)\/([\-\d]+)\/([\-\d]+)
         | ([\-\d]+)\/([\-\d]+)(| \{.*\}) (".*)([\n|\s]*?)$`)
         | 
         | > I think Go team might be overlooking the fact that a big
         | chunk of the userbase doesnt understand, and has no need for a
         | logging package
         | 
         | How does that proposal impede them ?
         | 
         | if you don't need any of the extras you'd be able to just
         | `log.Info"
        
         | tptacek wrote:
         | It's not going into std; it's going into /x.
        
           | marcus_holmes wrote:
           | effectively the same place at this point. There's so many
           | "standard" libs in /x that trying to constraint to "just the
           | standard lib" has to include /x to be actually useful.
        
             | philosopher1234 wrote:
             | Such as?
        
               | marcus_holmes wrote:
               | goimports, godoc and the other tools
               | (https://pkg.go.dev/golang.org/x/tools - you basically
               | can't use Go properly without these)
               | 
               | all the crypto stuff
               | (https://pkg.go.dev/golang.org/x/crypto) if you're doing
               | anything with crypto then this is needed
               | 
               | others are really useful (see index at
               | https://pkg.go.dev/golang.org/x) - I've imported the
               | image, sync and text packages
               | 
               | it's a lot of the "missing bits" from the standard lib. I
               | get why they're in /x but the distinction is moot in most
               | cases.
        
               | tptacek wrote:
               | Almost nobody uses goimports or godoc as a declared
               | dependency; they're tools, not standard library
               | components.
               | 
               | It's perfectly possible to implement cryptography without
               | touching x/crypto; std includes SHA2, HMAC, an AEAD
               | interface, ED25519, and, of course, Go's entire TLS
               | stack. Arguably there's _too much_ cryptography in std
               | (almost nobody is better off directly building on cipher
               | cores and signature algorithms). Thankfully: most Go
               | programs don 't implement cryptography, which is
               | dangerous even with Go's relatively solid library
               | interfaces.
        
             | tptacek wrote:
             | They're not at all effectively the same place, and much of
             | what's in /x competes with third-party packages, which are
             | on a somewhat level playing field in that both /x and
             | third-party code have to be explicitly brought in to a
             | project as a dep.
        
               | marcus_holmes wrote:
               | meh, I'm always going to prefer an /x package to a 3rd-
               | party package.
        
         | atsjie wrote:
         | The majority of Golang devs actually do work on web services,
         | API's, or libs/frameworks that would benefit from a
         | standardized structured logging interface.
         | 
         | Source: https://go.dev/blog/survey2022-q2-results (scroll
         | almost to the bottom)
         | 
         | For the audience of this proposal it's probably fairly evident
         | what the advantages of standardization would be.
         | 
         | Also you argue quite strongly about the poor API design
         | decisions being made; while you similarly admit not having
         | worked with such tools or with web services in Golang general.
         | That's odd.
        
           | svnpenn wrote:
           | I don't work with servers much, but I do have extensive
           | experience in Go. I know a bad API when I see one.
           | strings.Replacer [1] uses the same syntax, and several others
           | have commented in the discussion about it being a bad idea.
           | 
           | 1. https://godocs.io/strings#Replacer
        
             | xani_ wrote:
             | From what I've skimmed the most problems seem to be
             | "golang's type system won't allow expressing all of that on
             | compile time so it will be costly" and I don't think that
             | can be sensibly fixed in logging package
             | 
             | > I know a bad API when I see one. strings.Replacer [1]
             | uses the same syntax, and several others have commented in
             | the discussion about it being a bad idea.
             | 
             | What's wrong with it ? Never used it. But I think the
             | complaints has been about wasting time for runtime type
             | handling and Replacer doesn't have that problem.
        
               | Joker_vD wrote:
               | The main performance problem is with "kvs ...any"
               | parameter in the Logger's functions because it allocates
               | _lots_ of stuff: the array itself, and then the wrappers
               | around the values. I 've tried to use something like this
               | in a central loop of a CPU-emulator to trace every
               | instruction executed and IIRC it slowed things about 3x
               | times down. The only thing you can do to escape this cost
               | is                  if logging.Level() >= logging.DEBUG {
               | logging.Logger.Debug(...)        }
               | 
               | but it's ugly and can't be abstracted away. And yes, I've
               | put several of such guards in the several hot paths of
               | some actual production code base and seen about 25% speed
               | up after that edit.
        
               | jen20 wrote:
               | > What's wrong with it ?
               | 
               | It relies on lining up pairs of values in a way that
               | cannot be checked by the compiler rather than having
               | dedicated functions that ensure they are kept together.
               | 
               | This is the fundamental difference between bad logging
               | libraries (like klog, hclog etc) and good logging
               | libraries (like zerolog).
        
               | xani_ wrote:
               | I guess simplest fix for that particular one would be
               | accepting `map[string]string` but that won't exactly work
               | for the any type, and you can't easily add more.
               | 
               | Maybe `...map[string]interface{}` but we're back to "the
               | overhead is kinda shit"
        
         | kaz-inc wrote:
         | I also didn't understand the need for a logging package for a
         | while, but I think it makes sense for go because they're
         | focusing on an audience that writes server applications.
        
           | xani_ wrote:
           | It's less about package and more about interface.
           | 
           | One "good enough" in stdlib means libraries could just
           | require passing a interface matching it. Then no matter which
           | logging package you use, as long as it fulfils the interface
           | of stdlib logger it can be used with any library or other
           | component you pulled.
           | 
           | So with simple app you can just have simple logger that dumps
           | everything on stderr, complex app can have whatever complex
           | logging logic it needs, but lib "just using stdlib logger"
           | doesn't care about that.
           | 
           | And as a lib author, again, you can just use the stdlib one
           | and most likely cover all your use cases.
        
       | GauntletWizard wrote:
       | Levels are a great abstraction for debug logs. Debug logs are
       | often abused for much more than their reasonable purpose.
       | 
       | Audit logs are different from debug logs, and it's pretty common
       | for people to not think about the reasons why. Audit logs are
       | records, and they don't have levels; they might have metadata,
       | but you need to capture all of them, reliably. Debug logs are
       | ephemeral and if you miss some of them you shouldn't care.
       | 
       | The problem with structured logging libraries is that they try to
       | do both. There's some in common, but the mechanisms of output
       | should be really different - Debug logs in a circular buffer in
       | RAM makes a lot of sense, but you can't do that for Audit logs.
       | The current approach seems to be "Write them all to the same
       | place, then filter them out", and that only adds more latency in
       | which to lose your audit logs.
        
         | jen20 wrote:
         | > Audit logs are different from debug logs
         | 
         | Very much so - audit logs, where they are important, are often
         | part of the actual business domain and should be explicitly
         | modelled, not delegated to infrastructure with a slightly
         | different level.
        
         | marcus_holmes wrote:
         | When I last discussed this we ended up in a huge debate about
         | whether we should be sending different log levels to different
         | destinations, because of this dichotomy.
         | 
         | Debug messages shouldn't be stored on permanent logs (they
         | frequently contain sensitive/private info). Alerts need to be
         | sent to someone immediately (even if that's a different process
         | monitoring the log file for alerts and raising them). Errors
         | need context in order to diagnose them (ideally a stack trace).
         | All of these are doing different things using the same
         | interface.
         | 
         | Log levels end up being abused because of this. What we want is
         | to be able to handle different situations differently. What we
         | get is log levels.
        
         | tptacek wrote:
         | This appears to be why there's a separation of concerns between
         | "handlers" and "loggers".
        
           | GauntletWizard wrote:
           | That's exactly what I'm railing against. You shouldn't be
           | pushing all these things into the same stream and then
           | separating them out again. You may want to make it easy to
           | generate a debug log when you generate an alert. You
           | shouldn't do that by writing a log. You should have a very
           | different set of parameters for
           | 
           | If you're a bank, your login "Logs" shouldn't just be an
           | annotated web request; you should have an explicit
           | "AuthenticationSuccess" type in a non-ephemeral data store,
           | and it should contain more than just "301 /auth 1.2.3.4", it
           | should include information about the factors used to make
           | that decision.
           | 
           | Semi-structured logs don't have that, but will be abused to
           | provide that. It should be a type system error to create an
           | AuthSuccess without all that, not a set of default parameters
           | to a generic logging call.
           | 
           | Structured logging is great for moving fast, and I use it all
           | the time to make it easy to filter through my debug logs, but
           | levels are an abstraction in debug logs - and handlers are
           | completely an anti-pattern
        
             | tptacek wrote:
             | You've lost me, because the point of having a handler
             | interface is exactly to be able to do things like ensuring
             | that your "authentication success" events get stored
             | differently than your debug messages.
             | 
             | You can see upthread that the trend is kind of in exactly
             | the opposite direction to what you're advocating for:
             | plenty of people think there's no such thing as a log or an
             | audit record, only "events", expressed as otel spans. Have
             | fun with those people! :)
        
       | reggieband wrote:
       | I see some asking for stack support in the logs, and that is
       | something I would be looking for. Often when an error happens on
       | a server it might be inside some generic utility. At least with a
       | stack, you might realize that all of the calls to that utility
       | function are coming from the same location a few steps up the
       | stack. In my experience that can significantly speed up
       | identifying the root cause of issues.
       | 
       | I'm not sure I like the way they are handling attributes either.
       | One pattern I strongly discourage on projects I work on is
       | formatting data into the log message. When I've had access to
       | decent log aggregation tools (e.g. Splunk, Datadog) I like to
       | generate reports on error messages. That is usually possible with
       | regex magic but in my experience it is much much cleaner to have
       | a static unchanging message for all messages. I then store all
       | additional details in a separate part of the log line. I tend to
       | prefer JSON for those details since it is well supported and even
       | parsed by default by many tools. Another benefit of JSON for
       | those additional details is that it is supports arrays and hash-
       | tables. The attribute system proposed here doesn't seem to have
       | that same hierarchical attribute support.
       | 
       | IMO, unified, clean and consistent logging is one of the most
       | underrated techniques for maintaining long running services.
        
       | didip wrote:
       | If Zap and Zerolog can agree on a common interface, I think the
       | rest of Go world would follow.
        
         | xani_ wrote:
         | Seriously, Zap nails most of what I want from logger and the
         | remaining few % isn't exactly hard to add on top.
         | 
         | Could maybe get a bit of ergonomics pass but that's it
        
       | glenjamin wrote:
       | I'm increasingly of the opinion that we should stop using
       | "logging" libraries, and instead we should produce telemetry via
       | events using something like OpenTelemetry's trace API.
       | 
       | These traces can be rendered as strings for people who like their
       | logs in files
        
         | chrsig wrote:
         | I'm inclined to agree.
         | 
         | Traditional logging has a number of problems. Developers tend
         | to just stick `log.Info` (or whatever equivalent) anywhere.
         | 
         | Using an http server as an example:
         | 
         | The result is that one http request may produce dozens of log
         | messages. But because multiple http requests are running at the
         | same time, they all get interleaved in the actual log.
         | 
         | They can be tied together after the fact if you're using a
         | request id, but that requires post processing. It can be
         | difficult to know when a request ends, so that means there
         | needs to be an explicit "end of request" type message,
         | otherwise assume that it could still be active until the end of
         | the file...unless your log is rotating, in which case it could
         | be in the next file, or the next.
         | 
         | On top of that, it's a big performance hit. On every log
         | message, you're serializing the message, acquiring a mutex,
         | flushing it out to disk. Depending on the library, some of that
         | may or may not get evaluated depending on message severity.
         | It's a non-trivial overhead, and individual messages can cause
         | quite a bit of contention for that mutex.
         | 
         | Using something more like tracing, a log message gets attached
         | to a context for the task, and it's not until the task is
         | complete does anything get logged. This makes for a single log
         | message that has a list of all the individual informational
         | messages, as well as other metadata about the request. One
         | mutex acquisition, one serialization, one file flush.
         | 
         | It's at that point where tracing and logging are arguably the
         | same thing.
        
           | xani_ wrote:
           | > Using something more like tracing, a log message gets
           | attached to a context for the task, and it's not until the
           | task is complete does anything get logged. This makes for a
           | single log message that has a list of all the individual
           | informational messages, as well as other metadata about the
           | request. One mutex acquisition, one serialization, one file
           | flush.
           | 
           | That's kinda terrible design. You don't see anything in
           | progress, only if it finishes, and anything in progress just
           | disappears, never to be seen again, if app crashes.
           | 
           | Splitting to event per action in a given context is a
           | feature, not a bug
        
           | wokkel wrote:
           | Just look into log4j: mdc/ndc to include context
           | (user/session), include the thread name in your pattern and
           | use an async appender to take care of performance. All of
           | these things have been solved already.
        
             | chrsig wrote:
             | > All of these things have been solved already.
             | 
             | I mean...yes, that's how I was able to highlight the
             | benefits of having a single log message per request.
             | 
             | > Just look into log4j: mdc/ndc to include context
             | (user/session), include the thread name in your pattern and
             | use an async appender to take care of performance.
             | 
             | At a glance this looks like it will still produce 1 log
             | line per logging call, versus buffering with the request.
             | adding the thread id just helps stitch them together after
             | the fact. This doesn't get rid of any performance overhead
             | -- it may get it out of the way of a user request, but it's
             | still there under the hood.
             | 
             | In the go case, using the runtime/trace library as an
             | example[0], a `context.Context` is passed down the call
             | stack which gets decorated with trace spans/regions/log
             | messages on the way down. At the end of a task, all of the
             | collected context gets written to the active trace log.
             | 
             | [0] https://pkg.go.dev/runtime/trace
        
               | wokkel wrote:
               | I like my log to include where something happened and in
               | some complex cases having a few thousand lines of logging
               | info per request is not unreasonable. So stuffing al that
               | info into a single log event/line doesn't make sense to
               | me. I recently had the unfortunate situation where two
               | seemingly independant sessions were interfering, here a
               | sequential logfile where things were logged in the order
               | occurring was the only way to correlate them. Even though
               | we do telemetry via events, old school logfiles are a
               | godsend to debug the complex cases.
        
           | GeneralMayhem wrote:
           | > But because multiple http requests are running at the same
           | time, they all get interleaved in the actual log.
           | 
           | Many logging libraries will export a structured preamble
           | including the timestamp and PID, which is enough to
           | reconstruct single-threaded time histories when you want
           | them, while still allowing you to see cross-thread time-
           | correlated issues when you need that as well.
        
           | KronisLV wrote:
           | > It's at that point where tracing and logging are arguably
           | the same thing.
           | 
           | On some level I agree with this, but honestly tracing doesn't
           | have nearly the same adoption that logging has (at least in
           | my experience).
           | 
           | Why? Because generally it's easier to approach logging, even
           | if in ways that don't necessarily scale that well: like
           | getting your application writing logs to a file on the server
           | directory.
           | 
           | Note: I mention Java here, but a lot of it also applies to
           | Node/Python/Go/Ruby etc.
           | 
           | Eventually they can be moved over to a networked share, have
           | Logrotate running against them, or perhaps have proper log
           | shipping in place. That is perhaps the hardest aspect, since
           | for tracing (outside of something like JavaMelody which is
           | integrated inside of your app) you'll also need to set up
           | some sort of a platform for the logs to be shipped to and
           | processed in any number of ways.
           | 
           | Personally, I've found that something like Apache Skywalking
           | is what many who want to look into it should consider, a
           | setup utilizing which consists approximately of the
           | following:                 - some sort of a database for it
           | (ElasticSearch recommended, apparently
           | PostgreSQL/MariaDB/MySQL viable)       - the server
           | application which will process data (can run as an OCI
           | container)       - the web application if you need an
           | interface of this sort (can run as an OCI container)       -
           | an agent for your agent of choice, for example a set of .jar
           | files for Java which can be setup with -javaagent       -
           | optionally, some JS for integrating with your web application
           | (if that's what you're developing)
           | 
           | Technically, you can also use Skywalking for log aggregation,
           | but personally the setup isn't as great and their log view UI
           | is a bit awkward (e.g. it's not easy to preview all logs for
           | a particular service/instance in a file-like view), see the
           | demo: https://skywalking.apache.org/
           | 
           | For logs in particular, Graylog feels reasonably sane, since
           | it has a similarly "manageable" amount of components, for a
           | configuration example see:
           | https://docs.graylog.org/docs/docker#settings
           | 
           | Contrast that to some of the more popular solutions out
           | there, like Sentry, which gets way more complicated really
           | quickly: https://github.com/getsentry/self-
           | hosted/blob/master/docker-...
           | 
           | For most of the people who have to deal with self-hosted
           | setups where you might benefit from something like tracing or
           | log shipping, actually getting the platform up and running
           | will be an uphill battle, especially if not everyone sees the
           | value in setting something like this up, or setting aside
           | enough resources for it. Sometimes people will be more okay
           | with having no idea why a system goes down randomly, rather
           | than administering something like this constantly and
           | learning new approaches, instead of just rotating a bunch of
           | files.
           | 
           | For others, there are no such worries, because they can open
           | their wallets (without worrying about certain regulations and
           | where their data can be stored, hopefully) and have some
           | cloud provider give them a workable solution, so they just
           | need to integrate their apps with some agent for shipping the
           | information.
           | 
           | For others yet, throwing the requirement over to some other
           | team who's supposed to provide such platform components for
           | them is also a possibility.
        
             | phillipcarter wrote:
             | > Why? Because generally it's easier to approach logging
             | 
             | That's due to a significant head start. When one practice
             | has more than a decade to mature by spreading across an
             | ecosystem and becoming more ergonomic, it's definitely
             | going to be easier to adopt.
             | 
             | However, tracing as come a LONG way with OpenTelemetry even
             | in the past year. There's still a big mountain to climb,
             | but making tracing better/more widespread and providing on-
             | ramps to metrics and logging systems is exciting.
        
           | nine_k wrote:
           | Logging to syslog already solves the problem of asynchronous
           | / streamed logging.
           | 
           | Structuring log messages better than an arbitrary text string
           | is harder (even though I heard the widespread systemd offers
           | a way). Mostly it's because it's hard to agree upon a
           | universal standard. Logging in a self-describing structure is
           | basically the same as logging in JSON (which is not unheard
           | of). Maybe you could do a bit better logging in protobuf or
           | thrift records.
        
             | r3trohack3r wrote:
             | > Mostly it's because it's hard to agree upon a universal
             | standard
             | 
             | This is one of the things I've appreciated in my corner of
             | the Nodejs ecosystem. It's fairly standardized around pino
             | and bunyan. With async hooks you can instrument all
             | downstream async APIs too and associate them with the
             | incoming request. So, by default, in my services all logs
             | are associated with their incoming requests.
             | 
             | That being said - I'm inclined to agree that, as the
             | distributed tracing ecosystem becomes increasingly mature,
             | it really is eating the use case of logs. The real trick
             | with tracing is being able to associate a spike in latency
             | for 5xx requests in metrics with all their associated
             | traces. With logs I can aggregate all 5xx logs, filter by
             | latency, and get the subset of logs for the high latency
             | requests to dig in. In more mature stacks, I can even
             | conditionally increase log sampling on-demand for those
             | requests to get a better view into the problem (especially
             | helpful for rare events that are unlikely to make it past
             | the sampler).
             | 
             | I'm not good enough with traces yet to do this. Today I
             | have to start with logs to find the request IDs to get to
             | the traces.
        
             | dfee wrote:
             | My God. Logging in protobuf?
             | 
             | Logging is the lowest of all debugging utilities - its the
             | first thing you _ever_ do writing software - "hello world".
             | And, while I admire structural logging, the truth is
             | printing strings remains (truly) the lowest common
             | denominator across software developers.
             | 
             | I often see confusion on how to interpolate variables - let
             | alone zero standardization on field formatting, and poorly
             | managed logging infra, too.
             | 
             | Adding another barrier to entry (protobufs) isn't going to
             | drive better usage. Any solution I could prescribe would
             | inevitably chop off the long tail of SWE (raise the common
             | denominator), and I think that's going to be quite an
             | unpopular position to advance in any established company.
             | 
             | To be clear: structure is good, our expectations of how to
             | structurally log are too low, and introducing a build step
             | to compile PB to native, and then dealing with a poorly
             | generated (my opinion) API to log, sounds like a miserable
             | experience.
        
               | arinlen wrote:
               | > _My God. Logging in protobuf?_
               | 
               | Yes, or any other data format and/or transport protocol.
               | 
               | I'm surprised this is up for debate.
               | 
               | > _Logging is the lowest of all debugging utilities - its
               | the first thing you ever do writing software - "hello
               | world". And, while I admire structural logging, the truth
               | is printing strings remains (truly) the lowest common
               | denominator across software developers._
               | 
               | This sort of comment is terribly miopic. You can have a
               | logging API, and then configure your logging to transport
               | the events anywhere, any way. This is a terribly basic
               | feature and requirement, and one that comes out of the
               | box with some systems. Check how SLF4J[1] is pervasive in
               | Java, and how any SLF4J implementation offers logging to
               | stdout or a local file as a very specific and basic
               | usecase.
               | 
               | It turns out that nowadays most developers write software
               | that runs on many computers that aren't stashed over or
               | under their desks, and thus they need efficient and
               | convenient ways to check what's happening either in a
               | node or in all deployments.
               | 
               | [1] https://www.slf4j.org/
        
             | chrsig wrote:
             | Syslog doesn't solve any of the problems outlined above -
             | you're just trading a file for a unix domain socket. And it
             | introduces it's own problems, such as sharing that socket
             | with the containers (if you just mount /dev/log into the
             | container, you're gonna have a bad time when the syslog
             | daemon restarts). Log rotation also becomes a problem at
             | higher volumes. Want a 5 minute rotation? logrotate isn't
             | of much help.
             | 
             | re: logging in protobuf/thrift. Thrift is more well suited
             | towards it due to it's stream friendliness. With protobufs,
             | you can do better performance wise than json, but does tend
             | to be a bit more annoying because you need to write a
             | length header between messages. (at this point, I've
             | considered logging directly into a `.tar` file with one
             | file per request. I haven't ever done it, and wouldn't in
             | production, but it's an intriguing thought to me.) With
             | either format, you have to deal with the fact that they're
             | now binary logs, and you lose out on the ability to use a
             | number of tools like grep & awk, or you have some program
             | to cat the log as plain text.
             | 
             | Logging to json, while it's a bit more overhead, offers
             | access to a _lot_ of tooling which is lost with binary
             | logging or unstructured logging. Athena, snowflake, sqlite,
             | kafka /ksql, jq, it's incredibly ELK friendly...The list
             | goes on. In general, it's usable without the message
             | definitions, so it can be much easier to use.
             | 
             | All of these tradeoffs wildly depend on what it is that
             | you're logging and what your needs are.
        
         | rcarmo wrote:
         | Many tracing libraries get their data off logging libraries.
         | Logging doesn't always mean "whoa, let's stop everything and
         | write to a file", it is often a matter of buffering data or
         | increasing counters until data can be sent out in various
         | fashions.
         | 
         | Log files are just one of those.
        
           | glenjamin wrote:
           | The key difference is my experience is not about destination
           | - backends are as flexible as you say
           | 
           | One key difference is having an event name + properties,
           | rather than a human-readable description and maybe some
           | fields if you're lucky
           | 
           | Another key difference is wrapping units of work rather than
           | just scattering log lines in places you think need them
           | somewhat arbitrarily
        
             | rcarmo wrote:
             | Most decent logging frameworks allow for structured output.
             | That change came about roughly when NewRelic became "the
             | thing" to use.
        
             | jayd16 wrote:
             | >One key difference is having an event name + properties
             | [...] Another key difference is wrapping units of work
             | [...]
             | 
             | Modern "enterprise" logging libraries already work this
             | way, don't they?
        
         | tptacek wrote:
         | The otel trace API is nowhere near ergonomic enough to replace
         | logging. I think people mostly tolerate otel because the
         | idiomatic way to use it is just a funcall and a defer at the
         | top of a subset of your functions, and because if you're otel-
         | instrumenting, you're probably already passing a context.
         | 
         | https://pkg.go.dev/go.opentelemetry.io/otel/trace
        
           | phillipcarter wrote:
           | FWIW tracing in Go is primarily made more difficult by having
           | explicit context propagation and no helper to auto-close a
           | span based on scope. If you use Python or .NET for example,
           | it's implicit and you can "just create some spans" like you
           | would a structured log somewhere in your app, then use built-
           | in language constructs to have it handle closing for you.
        
             | xani_ wrote:
             | ...the helper like WithCancel/WithDeadline ?
        
             | Rapzid wrote:
             | .Net has task-local storage which is pretty amazing and
             | Seri log made great use of it(my all-time fav structured
             | log lib).
             | 
             | But iirc task local is a perf concern, and JavaScript has
             | yet to have anything comparable that's ubiquitous(nodejs
             | async hooks though so one day maybe).
             | 
             | So, explicit contexts for Go and JavaScript unfortunately.
        
               | phillipcarter wrote:
               | JS does have convenience APIs in the form of
               | _startActiveSpan_ and _getActiveSpan_ that take care of
               | most of the pains here. There 's likely some scenarios
               | where you can't use these (e.g., instrumenting a tool
               | must use the sdk-trace-base package) but I think in most
               | cases they help a lot. Go doesn't have these kinds of
               | convenience APIs though.
        
             | tptacek wrote:
             | Sure. If it was a different language, or a different otel
             | library, my opinion would probably be different. I do not
             | think it's realistic to replace logging with otel in Go,
             | though.
        
               | glenjamin wrote:
               | For web-style application code my experience writing Go
               | code says that you totally can stop using logs and just
               | use traces and have just as much visibility into a
               | running process, if not more.
               | 
               | For library code or other deployment models I'm not sure
               | - it's not something I've had a need to experiment with
        
               | phillipcarter wrote:
               | I think it's an inclusive OR here. When logging
               | stabilizes in the Go SDK (which will be a while, perhaps
               | a year) the idea will be that you can keep all your
               | existing logging and then start to add tracing
               | judiciously so they're easily correlated with requests. I
               | think the juice is worth that particular squeeze for more
               | folks at least.
               | 
               | From there I would personally advocate for adding more
               | manual traces rather than more logs, but it's
               | understandable if it's a heavier lift to pass context
               | around all the time. The more options the better.
        
           | henvic wrote:
           | I've played with trying to create an idiomatic wrapper for
           | their Go module precisely because of this reason, but then
           | didn't find time to continue with this.
           | 
           | https://github.com/henvic/tel
        
         | jyounker wrote:
         | Or you include the trace ID as one field in the structured log.
         | That ID can be threaded through the log library.
        
         | akira2501 wrote:
         | I'm of the opinion that we should use both. They're both
         | fundamentally useful and solve separate problems, and I don't
         | think there's any resource constraints that would make this an
         | issue.
         | 
         | A logging API that just lets me write either a textual string,
         | a structured telemetry object, or just both would be ideal.
         | Then I can have either source separately, or I can view how
         | they generated events collectively.
        
           | phillipcarter wrote:
           | OpenTelemetry's design says: why not both! Logging is still
           | early days, but the idea is that you can take your existing
           | structured logs and have them automatically correlated with
           | traces that you can add later.
           | 
           | (there's other use cases some can envision, but I'm team
           | "tracing is better than logging", while recognizing that most
           | apps don't start from scratch and need to bring their logs
           | along for the ride)
        
             | jen20 wrote:
             | I'm in the position of working on a green-field system
             | currently, and spent a bit of time looking at OpenTelemetry
             | both as a logging system and for metrics and tracing. If
             | OpenTelemetry can indeed suit that use case today, it's not
             | evident from any of the documentation or examples, and the
             | naive "go get" approach in a test project led to
             | incompatible versions of unstable libraries being pulled
             | in.
             | 
             | Further, the OpenTelemetry Go library pulls in all kind of
             | dependencies which are not acceptable: a YAML parser which
             | doesn't even use compatible tagging and versioning via
             | Testify, logr and friends, and go-cmp. Combined with the
             | insistence of GRPC on pulling glog and a bunch of other
             | random stuff like appengine libraries, you end up with a
             | huge dependency tree before you've even started writing
             | code.
             | 
             | The result is I'm sticking with zerolog, and a custom
             | metrics library, and will let Splunk do it's thing. It's a
             | shame, as distributed tracing is very desirable, but not
             | desirable enough to ignore libraries that have chosen
             | concrete dependencies vs interfaces and adapters.
             | 
             | Fundamental things like this need to be part of the
             | standard library, or have zero external dependencies. Rust
             | does (in my opinion) a better job here despite requiring on
             | average more libraries to achieve things, because features
             | can be switched off via Cargo.toml, reducing the overall
             | dependency set.
        
               | phillipcarter wrote:
               | (note: I'm not a maintainer in the Go SIG so I can't
               | speak towards these issues with much authority)
               | 
               | Definitely interested in how things can be improved here.
               | What are you generally looking to accomplish? Hopefully
               | it's just a matter of docs not accurately expressing
               | what's possible today (a consistent theme for quite some
               | time that's getting resolved, but it takes time).
               | 
               | As for the dependency issues, is this something you'd
               | have the time to file an issue about[0], or make an
               | agenda item for the next Go SIG? At the end of each SIG
               | there's also a moment where the SIG leader asks for
               | stories about adoption (good or bad) and this would
               | clearly count. I'd like to think that these issues are
               | solvable, or at least solvable enough to make it worth it
               | for you to adopt.
               | 
               | [0]: https://github.com/open-telemetry/opentelemetry-
               | go/issues?q=...
        
               | jen20 wrote:
               | Thanks for the thoughtful response.
               | 
               | I think the aim is to have a single "sink" for metrics,
               | logs and tracing, and it's clear that OpenTelemetry is
               | aiming to be that. Ideally from a single "go get", one
               | should be able to pull in a library without additional
               | dependencies that:
               | 
               | - permits recording of structured logs (perhaps the
               | stdlib interface under discussion will help here!). From
               | a personal perspective I'd rather not have an API that
               | accepts key-value pairs as varargs, but if that was the
               | only thing available I'd take it I think.
               | 
               | - permits recording of counters, gauges, timers and
               | histograms,
               | 
               | - permits recording of spans, and associating both logs
               | and metrics with those spans.
               | 
               | The Go library README shows the logs section as "frozen"
               | and states that no pull requests to improve logging are
               | being accepted. With no visibility into the process
               | (which is, to be clear, because I haven't looked into it
               | and not the fault of the project per se!), it looks like
               | this might be an option in future but not today?
        
               | phillipcarter wrote:
               | I see! Yeah, this is one where where otel-go is a lot
               | harder to use, but it's something the SIG is looking at.
               | A coworker of mine is helping drive a design that's sort
               | of an "easy button" to configure all the things with the
               | least-surprising defaults[0] and we're seeing how people
               | like it in our SDK distribution that uses it[1]. I hope
               | that sometime soon we'll have the design polished-up
               | enough to get merged in. Like most OSS projects, it'll
               | take some time but I'm confident we can get it done.
               | 
               | The main challenge is that there's a large variety of use
               | cases to fulfill (e.g., someone wants custom context
               | propagation, a custom span processor, and export over
               | HTTP+json but not HTTP+protobuf) and today the answer to
               | that is that you have to pull in all the libraries for
               | all the things you need. It's a lot more energy you need
               | to expend to get started with all of this than it needs
               | to be.
               | 
               | As for logging support in the Go SDK, it's frozen mostly
               | just due to lack of bandwidth and a need to finish what's
               | already been started. Metrics have proven to be much more
               | difficult and time-consuming to implement correctly
               | across all languages, with Go being impacted harder than
               | other languages (e.g., Python and .NET). I think you can
               | expect logging integrations in the near-ish future
               | though.
               | 
               | This is great feedback. I'll pass it on folks who haven't
               | seen it. Thank you! And please feel free to file issues
               | about all the things that rub you the wrong way
               | 
               | [0]: https://github.com/open-telemetry/opentelemetry-go-
               | contrib/p...
               | 
               | [1]: https://github.com/honeycombio/honeycomb-
               | opentelemetry-go
        
         | arinlen wrote:
         | > _I'm increasingly of the opinion that we should stop using
         | "logging" libraries, and instead we should produce telemetry
         | via events using something like OpenTelemetry's trace API._
         | 
         | That really depends on what you mean by "events". Even
         | OpenTelemetry supports logging events. Logging is not replaced
         | by emitting metrics, as they have very distinct
         | responsibilities.
         | 
         | > _These traces can be rendered as strings for people who like
         | their logs in files_
         | 
         | You're missing the whole point of logging stuff. The point of
         | logging things is not to have stuff show up in files. Logging
         | events are the epitome of observability, and metrics and
         | tracing events end up specializations of logging events which
         | only cover very precise and specific usecase.
        
           | glenjamin wrote:
           | > Logging events are the epitome of observability, and
           | metrics and tracing events end up specializations of logging
           | events which only cover very precise and specific usecase.
           | 
           | This is a common way of thinking about things, but having now
           | worked with event-based tracing for the last few years I no
           | longer believe traditional logs are necessary.
           | 
           | If you do "good" logging you likely have entirely structured
           | data and no dynamic portions of the main log message. This is
           | almost exactly equivalent to a span as represented by tracing
           | implementations.
           | 
           | I have come to believe that these events are the core piece
           | of telemetry, and from them you can derive metrics, spans,
           | traces and even normal-looking logs.
        
             | arinlen wrote:
             | > _This is a common way of thinking about things, but
             | having now worked with event-based tracing for the last few
             | years I no longer believe traditional logs are necessary._
             | 
             | You are free to use stuff as you see fit, even if you
             | intentionally miss out on features.
             | 
             | Logging is a very basic feature whose usecases are not
             | covered by metrics events, or even traces. Logs are used to
             | help developers troubleshoot problems by giving them an
             | open format to output data that's relevant to monitor
             | specific aspects of a system's behavior.
             | 
             | It makes no sense to emit a metrics event with tons of
             | stuff as dimensions and/or annotations that's emitted in a
             | single line of code that's executed by a single code path
             | when all you need is a single log message.
             | 
             | Also, stack traces are emitted as logging events, and
             | metrics events are pretty useless at tracking that sort of
             | info.
             | 
             | But hey, to each its own. If you feel that emitting a
             | counter or a timer gives you enough info to check why your
             | code caught an exception and what made it crash, more power
             | to you. The whole world seems to disagree, given standard
             | logging frameworks even added support for logging stack
             | traces as structured logs as a basic feature.
        
               | glenjamin wrote:
               | A counter or a timer is clearly no substitute for a log,
               | that would be an absurd statement for me to have made -
               | which is why I said no such thing.
               | 
               | A span as modelled by OpenTelemetry is a a bag of
               | developer-chosen structured data containing a name, a
               | timestamp and some auto-generated IDs to correlate it
               | with others.
               | 
               | It is effectively an open standard for how to collect
               | good structured logs - although it is currently not
               | commonly pitched that way by either tracing advocates or
               | structured logging advocates.
               | 
               | These are commonly used to create trace waterfalls, but
               | can also be used to derive metrics or simple dev-debug
               | output.
               | 
               | I have used this approach myself to great effect, and
               | also spoken to a number of people who have also found
               | success with such an approach - but currently the
               | momentum of existing approaches (and the fact that
               | traditional logging still works well-enough for most
               | people) has not led to wide adoption
        
         | cube2222 wrote:
         | I agree.
         | 
         | I'm not necessarily a fan of OpenTelemetry, as every time I
         | looked at it it was still "in progress" (though the last time
         | was almost 2 years ago), but in general I agree.
         | 
         | We started with tracing+logging, but as time passed we moved
         | more and more stuff into tracing. But not just span-per-service
         | kind of tracing - detailed orchestration on the level of
         | "interesting" functions, with important info added as tags.
         | 
         | Right now we use almost exclusively tracing (and metrics ofc)
         | and it's working great, even if each trace often has hundreds
         | of spans.
         | 
         | In our case we use Datadog, which works well for most bits and
         | purposes, but we also send all traces over Firehose to S3 so
         | that we can query them with Athena at 100% retention. This way
         | you get your cake and can eat it too.
        
           | phillipcarter wrote:
           | If you've got the time for instrumenting something relatively
           | small, it's definitely worth picking up OTel again to see how
           | it works for you. Tracing is stable in most languages (Go
           | included) and there's a lot more support in each language's
           | ecosystem.
           | 
           | FWIW I think OTel will always be in a state of "in progress"
           | though. There's more scenarios and signal types that will get
           | picked up (profiling, RUM) and more things to spread across
           | the ecosystem (k8s operator improvements, collector
           | processors for all kinds of things, etc.) and more
           | integration into relevant tools and frameworks (.NET does
           | this today, maybe flask or spring could do it tomorrow). But
           | I wouldn't let that stop you from trying it out again to see
           | if you can get some value out of it!
        
         | jayd16 wrote:
         | I guess Golang is catching up but in something like C# or Java,
         | the logging libraries already feel like tracing libraries. They
         | already have contexts and structure.
         | 
         | What are you proposing should change? Log files should default
         | to a verbose binary representation that captures more? What do
         | you think is not captured by today's logging?
        
           | phillipcarter wrote:
           | Tracing gives you correlation between "log lines" and across
           | process/network boundaries by propagating context for you.
           | You could replicate everything that tracing does by manually
           | propagating around context-like metadata yourself, stitching
           | the log lines together into a "trace". But why not just use
           | an SDK that does that for you?
        
             | jbjbjbjb wrote:
             | It is a config away in C# world, and even when it wasn't so
             | in built it was very simple to add to structured logging.
        
               | phillipcarter wrote:
               | In the C# world, tracing is also just a config away. It's
               | baked into the runtime. But you don't need to manually
               | stitch together logs into a "trace" - it just does it for
               | you.
        
               | jbjbjbjb wrote:
               | What I meant was you can easily add a correlation id to
               | your logs, and send those to wherever you collect and
               | query the logs. Then using that you can easily see the
               | log across your services. So there isn't any manual
               | stitching in the logging either.
        
         | azlev wrote:
         | Telemetry and logs are distinct. It makes sense to gather
         | metrics and it makes sense to catch sporadic messages of what's
         | going on in details.
        
       | latchkey wrote:
       | While we can debate all day long what should and should not be
       | part of the standard lib (bike shed), I just ignore all of this
       | and happily use zerolog.
       | 
       | Problem solved, one dependency, it just works. Back to writing
       | code.
        
         | jen20 wrote:
         | The problem is, it's not one dependency. Several important
         | libraries pull in a variety of wrappers and adapters - just to
         | name three:
         | 
         | - glog (gRPC)
         | 
         | - logr (OpenTelemetry and friends)
         | 
         | - klog (Kubernetes client)
         | 
         | It's quite conceivable that even having decided to use zerolog,
         | you end up with at least four separate logging libraries in
         | your dependency tree before having written any of your own
         | code.
        
           | latchkey wrote:
           | True, but as long as I can either control the logging or
           | write an adapter, I'm ok with that.
           | 
           | At the end of the day, I'd still have that issue, unless the
           | standard lib was being used everywhere... which is something
           | that is equally not controllable.
        
             | jen20 wrote:
             | The hope of a standard structured logging interface in the
             | standard library is that serious libraries will adapt
             | themselves to use it, and the proliferation will disappear
             | over time. Whether or not this will actually happen, or
             | whether the libraries which do not play ball will have to
             | be forked is a different matter.
        
               | latchkey wrote:
               | "serious" libraries today all provide the ability to pass
               | in adapters.
               | 
               | Having been in the group of a multi-year effort to do
               | this in Java (JSR-47), there is always a limitation in
               | the API that prevents someone from getting it right.
               | 
               | Funny enough, even though Java ended up with JUL
               | (java.util.logging), SLF4J ended up being the defacto
               | standard that everyone went with.
        
               | jen20 wrote:
               | > "serious" libraries today all provide the ability to
               | pass in adapters.
               | 
               | Sadly they don't, unless we count things like gRPC,
               | Kubernetes clients and so forth as non-serious.
        
         | tptacek wrote:
         | The proposal acknowledges that. Part of the point of working
         | this out is that it creates a common interface so that your
         | zerolog-logging code plays nice with someone else's hclog-
         | logging code.
        
           | latchkey wrote:
           | A long time ago, I was part of JSR-47. I wonder if anything
           | can be learned from that process (which was a bit of a mess).
        
           | oezi wrote:
           | Unless the new generic logging API is designed badly so that
           | certain functionalities become hard in the future because
           | they deviate.
           | 
           | For instance when Rails did standardize ActiveJob (background
           | tasks) they didn't include a way in the API to query the
           | current length of a queue of background tasks. Because
           | everyone is now only trying to conform to the API accessing
           | such information is now hard in many of the implementations.
        
       | motiejus wrote:
       | If we ignore the proposal, this gives at least an awesome
       | overview of the state-of-the-art of logging in go. That was the
       | best part. Thanks for the heads up!
        
         | egnehots wrote:
         | _Prior Work_
         | 
         | - The existing log package has been in the standard library
         | since the release of Go 1 in March 2012. It provides formatted
         | logging, but not structured logging or levels.
         | 
         | - Logrus, one of the first structured logging packages, showed
         | how an API could add structure while preserving the formatted
         | printing of the log package. It uses maps to hold key-value
         | pairs, which is relatively inefficient.
         | 
         | - Zap grew out of Uber's frustration with the slow log times of
         | their high-performance servers. It showed how a logger that
         | avoided allocations could be very fast.
         | 
         | - zerolog reduced allocations even further, but at the cost of
         | reducing the flexibility of the logging backend.
         | 
         | - All the above loggers include named levels along with key-
         | value pairs. Logr and Google's own glog use integer verbosities
         | instead of named levels, providing a more fine-grained approach
         | to filtering high-detail logs.
         | 
         | Other popular logging packages are Go-kit's log, HashiCorp's
         | hclog, and klog.
        
       ___________________________________________________________________
       (page generated 2022-09-11 23:00 UTC)