[HN Gopher] Discussion: structured, leveled logging
___________________________________________________________________
Discussion: structured, leveled logging
Author : philosopher1234
Score  : 135 points
Date   : 2022-09-11 15:44 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| svnpenn wrote:
| Every time I come across a logging package, I don't understand what the point of them is. Finally, in the linked issue, I see this:
|
| > For many programs that aren't run directly by a person, like servers, logging is the main way for developers to observe the detailed behavior of the system, and often the first place they go to debug it.
|
| and it makes sense. I don't deal with servers at all, other than a couple I have made just to host some basic sites or programs locally. I think the Go team might be overlooking the fact that a big chunk of the userbase doesn't understand, and has no need for, a logging package. Couple that with the giant API proposed, a poor design decision (kvs ...any), and the large amount of bikeshedding that has already happened in this issue, and I really hope they think hard before just shoving this into "std".
| xani_ wrote:
| Develop a bigger app and you will understand.
|
| But one example - having a logging interface that is not shit (the current builtin is so garbage it might as well not exist at all) allows packages to adopt it as the default method, and you can magically just feed the same logger to every dep you pull in with no loss of features or having to write a wrapper.
|
| A contexted logger means you can just feed a ["plugin":pluginname]-tagged one to every plugin and get logs pointing you to the place where it happened.
|
| The structured logging part is mostly to avoid the woes of having to parse it back into structured form via a bunch of regexes that will inevitably break next release.
| If you still don't get it, here is a regexp to parse HAProxy logs:
|
|     .*haproxy\[(\d+)]: (.+?):(\d+) \[(.+?)\] (.+?)(|[\~]) (.+?)\/(.+?) ([\-\d]+)\/([\-\d]+)\/([\-\d]+)\/([\-\d]+)\/([\-\d]+) ([\-\d]+) ([\-\d]+) (\S+) (\S+) (\S)(\S)(\S)(\S) ([\-\d]+)\/([\-\d]+)\/([\-\d]+)\/([\-\d]+)\/([\-\d]+) ([\-\d]+)\/([\-\d]+)(| \{.*\}) (".*)([\n|\s]*?)$
|
| > I think the Go team might be overlooking the fact that a big chunk of the userbase doesn't understand, and has no need for, a logging package
|
| How does that proposal impede them?
|
| If you don't need any of the extras, you'd be able to just `log.Info`.
| tptacek wrote:
| It's not going into std; it's going into /x.
| marcus_holmes wrote:
| Effectively the same place at this point. There are so many "standard" libs in /x that trying to constrain to "just the standard lib" has to include /x to be actually useful.
| philosopher1234 wrote:
| Such as?
| marcus_holmes wrote:
| goimports, godoc and the other tools (https://pkg.go.dev/golang.org/x/tools - you basically can't use Go properly without these)
|
| all the crypto stuff (https://pkg.go.dev/golang.org/x/crypto) - if you're doing anything with crypto then this is needed
|
| others are really useful (see index at https://pkg.go.dev/golang.org/x) - I've imported the image, sync and text packages
|
| it's a lot of the "missing bits" from the standard lib. I get why they're in /x but the distinction is moot in most cases.
| tptacek wrote:
| Almost nobody uses goimports or godoc as a declared dependency; they're tools, not standard library components.
|
| It's perfectly possible to implement cryptography without touching x/crypto; std includes SHA2, HMAC, an AEAD interface, ED25519, and, of course, Go's entire TLS stack. Arguably there's _too much_ cryptography in std (almost nobody is better off directly building on cipher cores and signature algorithms).
| Thankfully: most Go programs don't implement cryptography, which is dangerous even with Go's relatively solid library interfaces.
| tptacek wrote:
| They're not at all effectively the same place, and much of what's in /x competes with third-party packages, which are on a somewhat level playing field in that both /x and third-party code have to be explicitly brought into a project as a dep.
| marcus_holmes wrote:
| meh, I'm always going to prefer an /x package to a 3rd-party package.
| atsjie wrote:
| The majority of Golang devs actually do work on web services, APIs, or libs/frameworks that would benefit from a standardized structured logging interface.
|
| Source: https://go.dev/blog/survey2022-q2-results (scroll almost to the bottom)
|
| For the audience of this proposal it's probably fairly evident what the advantages of standardization would be.
|
| Also, you argue quite strongly about the poor API design decisions being made, while you similarly admit not having worked with such tools or with web services in Golang in general. That's odd.
| svnpenn wrote:
| I don't work with servers much, but I do have extensive experience in Go. I know a bad API when I see one. strings.Replacer [1] uses the same syntax, and several others have commented in the discussion about it being a bad idea.
|
| 1. https://godocs.io/strings#Replacer
| xani_ wrote:
| From what I've skimmed, the most common complaint seems to be "golang's type system won't allow expressing all of that at compile time, so it will be costly", and I don't think that can be sensibly fixed in a logging package.
|
| > I know a bad API when I see one. strings.Replacer [1] uses the same syntax, and several others have commented in the discussion about it being a bad idea.
|
| What's wrong with it? Never used it. But I think the complaints have been about wasting time on runtime type handling, and Replacer doesn't have that problem.
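The runtime cost being debated above is easier to see concretely. Below is a minimal sketch of a kvs-style leveled logger with an Enabled guard; every name here is invented for illustration and is not the proposed API:

```go
package main

import (
	"fmt"
	"os"
)

type Level int

const (
	DEBUG Level = iota
	INFO
)

// Logger is a hypothetical kvs-style logger, loosely modeled on the
// design under discussion; none of these names are the real API.
type Logger struct {
	min Level
}

// Enabled reports whether a message at level l would be emitted.
// Guarding call sites with it avoids building the ...any slice at all.
func (lg *Logger) Enabled(l Level) bool { return l >= lg.min }

// Log pairs up the variadic kvs at runtime; the compiler cannot
// check that keys and values line up, which is the design complaint.
func (lg *Logger) Log(l Level, msg string, kvs ...any) {
	if !lg.Enabled(l) {
		return
	}
	fmt.Fprintf(os.Stderr, "msg=%q", msg)
	for i := 0; i+1 < len(kvs); i += 2 {
		fmt.Fprintf(os.Stderr, " %v=%v", kvs[i], kvs[i+1])
	}
	fmt.Fprintln(os.Stderr)
}

func main() {
	lg := &Logger{min: INFO}
	// Hot path: the guard skips argument construction entirely
	// when DEBUG is filtered out.
	if lg.Enabled(DEBUG) {
		lg.Log(DEBUG, "executed", "pc", 0x1234, "op", "ADD")
	}
	lg.Log(INFO, "request served", "status", 200)
}
```

The `if lg.Enabled(...)` guard is the "ugly but fast" pattern described in the perf comments below: without it, the `...any` arguments are allocated and boxed even when the message is discarded.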
| Joker_vD wrote:
| The main performance problem is with the "kvs ...any" parameter in the Logger's functions, because it allocates _lots_ of stuff: the array itself, and then the wrappers around the values. I've tried to use something like this in a central loop of a CPU emulator to trace every instruction executed, and IIRC it slowed things down about 3x. The only thing you can do to escape this cost is
|
|     if logging.Level() >= logging.DEBUG {
|         logging.Logger.Debug(...)
|     }
|
| but it's ugly and can't be abstracted away. And yes, I've put several such guards in the hot paths of some actual production code base and seen about a 25% speedup after that edit.
| jen20 wrote:
| > What's wrong with it?
|
| It relies on lining up pairs of values in a way that cannot be checked by the compiler, rather than having dedicated functions that ensure they are kept together.
|
| This is the fundamental difference between bad logging libraries (like klog, hclog etc.) and good logging libraries (like zerolog).
| xani_ wrote:
| I guess the simplest fix for that particular one would be accepting `map[string]string`, but that won't exactly work for the any type, and you can't easily add more.
|
| Maybe `...map[string]interface{}`, but we're back to "the overhead is kinda shit".
| kaz-inc wrote:
| I also didn't understand the need for a logging package for a while, but I think it makes sense for Go because they're focusing on an audience that writes server applications.
| xani_ wrote:
| It's less about the package and more about the interface.
|
| One "good enough" interface in the stdlib means libraries could just require passing an interface matching it. Then no matter which logging package you use, as long as it fulfils the interface of the stdlib logger, it can be used with any library or other component you pulled in.
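The interface-in-stdlib idea above can be sketched roughly as follows; the Logger interface and all names here are hypothetical, not the actual proposal:

```go
package main

import (
	"fmt"
	"os"
)

// Logger is the kind of minimal interface a stdlib package could
// define; the name and method set are invented for illustration.
type Logger interface {
	Info(msg string, kvs ...any)
}

// StderrLogger is the "simple app" case: dump everything to stderr.
type StderrLogger struct{}

func (StderrLogger) Info(msg string, kvs ...any) {
	fmt.Fprintln(os.Stderr, append([]any{msg}, kvs...)...)
}

// Client stands in for any third-party library: it depends only on
// the interface, not on a concrete logging package.
type Client struct {
	Log Logger
}

func (c *Client) Do() {
	c.Log.Info("doing work", "step", 1)
}

func main() {
	// A complex app would pass its own Logger implementation here;
	// the library code is identical either way.
	c := &Client{Log: StderrLogger{}}
	c.Do()
}
```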
|
| So with a simple app you can just have a simple logger that dumps everything to stderr, while a complex app can have whatever complex logging logic it needs, but a lib "just using the stdlib logger" doesn't care about that.
|
| And as a lib author, again, you can just use the stdlib one and most likely cover all your use cases.
| GauntletWizard wrote:
| Levels are a great abstraction for debug logs. Debug logs are often abused for much more than their reasonable purpose.
|
| Audit logs are different from debug logs, and it's pretty common for people to not think about the reasons why. Audit logs are records, and they don't have levels; they might have metadata, but you need to capture all of them, reliably. Debug logs are ephemeral, and if you miss some of them you shouldn't care.
|
| The problem with structured logging libraries is that they try to do both. There's some overlap, but the mechanisms of output should be really different - debug logs in a circular buffer in RAM make a lot of sense, but you can't do that for audit logs. The current approach seems to be "write them all to the same place, then filter them out", and that only adds more latency in which to lose your audit logs.
| jen20 wrote:
| > Audit logs are different from debug logs
|
| Very much so - audit logs, where they are important, are often part of the actual business domain and should be explicitly modelled, not delegated to infrastructure with a slightly different level.
| marcus_holmes wrote:
| When I last discussed this we ended up in a huge debate about whether we should be sending different log levels to different destinations, because of this dichotomy.
|
| Debug messages shouldn't be stored in permanent logs (they frequently contain sensitive/private info). Alerts need to be sent to someone immediately (even if that's a different process monitoring the log file for alerts and raising them).
| Errors need context in order to diagnose them (ideally a stack trace). All of these are doing different things using the same interface.
|
| Log levels end up being abused because of this. What we want is to be able to handle different situations differently. What we get is log levels.
| tptacek wrote:
| This appears to be why there's a separation of concerns between "handlers" and "loggers".
| GauntletWizard wrote:
| That's exactly what I'm railing against. You shouldn't be pushing all these things into the same stream and then separating them out again. You may want to make it easy to generate a debug log when you generate an alert. You shouldn't do that by writing a log. You should have a very different set of parameters for
|
| If you're a bank, your login "logs" shouldn't just be an annotated web request; you should have an explicit "AuthenticationSuccess" type in a non-ephemeral data store, and it should contain more than just "301 /auth 1.2.3.4"; it should include information about the factors used to make that decision.
|
| Semi-structured logs don't have that, but will be abused to provide that. It should be a type-system error to create an AuthSuccess without all that, not a set of default parameters to a generic logging call.
|
| Structured logging is great for moving fast, and I use it all the time to make it easy to filter through my debug logs, but levels are an abstraction in debug logs - and handlers are completely an anti-pattern.
| tptacek wrote:
| You've lost me, because the point of having a handler interface is exactly to be able to do things like ensuring that your "authentication success" events get stored differently than your debug messages.
|
| You can see upthread that the trend is in exactly the opposite direction to what you're advocating for: plenty of people think there's no such thing as a log or an audit record, only "events", expressed as otel spans.
| Have fun with those people! :)
| reggieband wrote:
| I see some asking for stack trace support in the logs, and that is something I would be looking for. Often when an error happens on a server, it might be inside some generic utility. At least with a stack, you might realize that all of the calls to that utility function are coming from the same location a few steps up the stack. In my experience that can significantly speed up identifying the root cause of issues.
|
| I'm not sure I like the way they are handling attributes either. One pattern I strongly discourage on projects I work on is formatting data into the log message. When I've had access to decent log aggregation tools (e.g. Splunk, Datadog) I like to generate reports on error messages. That is usually possible with regex magic, but in my experience it is much cleaner to have a static, unchanging message for all messages. I then store all additional details in a separate part of the log line. I tend to prefer JSON for those details since it is well supported and even parsed by default by many tools. Another benefit of JSON for those additional details is that it supports arrays and hash-tables. The attribute system proposed here doesn't seem to have that same hierarchical attribute support.
|
| IMO, unified, clean and consistent logging is one of the most underrated techniques for maintaining long-running services.
| didip wrote:
| If Zap and Zerolog can agree on a common interface, I think the rest of the Go world would follow.
| xani_ wrote:
| Seriously, Zap nails most of what I want from a logger, and the remaining few % isn't exactly hard to add on top.
|
| Could maybe use a bit of an ergonomics pass, but that's it.
| glenjamin wrote:
| I'm increasingly of the opinion that we should stop using "logging" libraries, and instead we should produce telemetry via events using something like OpenTelemetry's trace API.
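The static-message pattern reggieband describes above — a constant message with nested details kept separate — might look like this; the field names are illustrative:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// entry keeps the human-searchable message constant and pushes all
// variable, possibly nested data into a separate details field, so
// aggregation tools can group on msg without regex magic.
type entry struct {
	Msg     string         `json:"msg"`
	Details map[string]any `json:"details,omitempty"`
}

func logEntry(msg string, details map[string]any) string {
	b, _ := json.Marshal(entry{Msg: msg, Details: details})
	return string(b)
}

func main() {
	// The message never changes; arrays and nested maps live in
	// details, giving the hierarchical support discussed above.
	fmt.Println(logEntry("upstream request failed", map[string]any{
		"upstream": "billing",
		"attempts": []int{1, 2, 3},
		"last":     map[string]any{"status": 503, "ms": 1240},
	}))
}
```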
|
| These traces can be rendered as strings for people who like their logs in files.
| chrsig wrote:
| I'm inclined to agree.
|
| Traditional logging has a number of problems. Developers tend to just stick `log.Info` (or whatever equivalent) anywhere.
|
| Using an http server as an example: the result is that one http request may produce dozens of log messages. But because multiple http requests are running at the same time, they all get interleaved in the actual log.
|
| They can be tied together after the fact if you're using a request id, but that requires post-processing. It can be difficult to know when a request ends, so that means there needs to be an explicit "end of request" type message; otherwise, assume that it could still be active until the end of the file... unless your log is rotating, in which case it could be in the next file, or the next.
|
| On top of that, it's a big performance hit. On every log message, you're serializing the message, acquiring a mutex, and flushing it out to disk. Depending on the library, some of that may or may not get evaluated based on message severity. It's a non-trivial overhead, and individual messages can cause quite a bit of contention for that mutex.
|
| Using something more like tracing, a log message gets attached to a context for the task, and it's not until the task is complete that anything gets logged. This makes for a single log message that has a list of all the individual informational messages, as well as other metadata about the request. One mutex acquisition, one serialization, one file flush.
|
| It's at that point that tracing and logging are arguably the same thing.
| xani_ wrote:
| > Using something more like tracing, a log message gets attached to a context for the task, and it's not until the task is complete that anything gets logged.
| > This makes for a single log message that has a list of all the individual informational messages, as well as other metadata about the request. One mutex acquisition, one serialization, one file flush.
|
| That's kinda terrible design. You don't see anything in progress, only if it finishes, and anything in progress just disappears, never to be seen again, if the app crashes.
|
| Splitting into one event per action in a given context is a feature, not a bug.
| wokkel wrote:
| Just look into log4j: MDC/NDC to include context (user/session), include the thread name in your pattern, and use an async appender to take care of performance. All of these things have been solved already.
| chrsig wrote:
| > All of these things have been solved already.
|
| I mean... yes, that's how I was able to highlight the benefits of having a single log message per request.
|
| > Just look into log4j: MDC/NDC to include context (user/session), include the thread name in your pattern and use an async appender to take care of performance.
|
| At a glance this looks like it will still produce one log line per logging call, versus buffering with the request; adding the thread id just helps stitch them together after the fact. This doesn't get rid of any performance overhead -- it may get it out of the way of a user request, but it's still there under the hood.
|
| In the Go case, using the runtime/trace library as an example[0], a `context.Context` is passed down the call stack which gets decorated with trace spans/regions/log messages on the way down. At the end of a task, all of the collected context gets written to the active trace log.
|
| [0] https://pkg.go.dev/runtime/trace
| wokkel wrote:
| I like my log to include where something happened, and in some complex cases having a few thousand lines of logging info per request is not unreasonable. So stuffing all that info into a single log event/line doesn't make sense to me.
| I recently had the unfortunate situation where two seemingly independent sessions were interfering; there, a sequential logfile where things were logged in the order they occurred was the only way to correlate them. Even though we do telemetry via events, old-school logfiles are a godsend for debugging the complex cases.
| GeneralMayhem wrote:
| > But because multiple http requests are running at the same time, they all get interleaved in the actual log.
|
| Many logging libraries will export a structured preamble including the timestamp and PID, which is enough to reconstruct single-threaded time histories when you want them, while still allowing you to see cross-thread time-correlated issues when you need that as well.
| KronisLV wrote:
| > It's at that point where tracing and logging are arguably the same thing.
|
| On some level I agree with this, but honestly tracing doesn't have nearly the same adoption that logging has (at least in my experience).
|
| Why? Because it's generally easier to approach logging, even if in ways that don't necessarily scale that well: like getting your application writing logs to a file on the server.
|
| Note: I mention Java here, but a lot of it also applies to Node/Python/Go/Ruby etc.
|
| Eventually those files can be moved over to a networked share, have logrotate running against them, or perhaps have proper log shipping in place. That is perhaps the hardest aspect, since for tracing (outside of something like JavaMelody, which is integrated inside of your app) you'll also need to set up some sort of a platform for the logs to be shipped to and processed in any number of ways.
|
| Personally, I've found that something like Apache Skywalking is what many who want to look into it should consider. A setup consists approximately of the following:
|
| - some sort of a database for it (ElasticSearch recommended; apparently PostgreSQL/MariaDB/MySQL are viable)
| - the server application which will process data (can run as an OCI container)
| - the web application if you need an interface of this sort (can run as an OCI container)
| - an agent for your language of choice, for example a set of .jar files for Java which can be set up with -javaagent
| - optionally, some JS for integrating with your web application (if that's what you're developing)
|
| Technically, you can also use Skywalking for log aggregation, but personally I find the setup isn't as great and their log view UI is a bit awkward (e.g. it's not easy to preview all logs for a particular service/instance in a file-like view); see the demo: https://skywalking.apache.org/
|
| For logs in particular, Graylog feels reasonably sane, since it has a similarly "manageable" amount of components; for a configuration example see: https://docs.graylog.org/docs/docker#settings
|
| Contrast that to some of the more popular solutions out there, like Sentry, which gets way more complicated really quickly: https://github.com/getsentry/self-hosted/blob/master/docker-...
|
| For most of the people who have to deal with self-hosted setups where you might benefit from something like tracing or log shipping, actually getting the platform up and running will be an uphill battle, especially if not everyone sees the value in setting something like this up, or in setting aside enough resources for it. Sometimes people will be more okay with having no idea why a system goes down randomly, rather than administering something like this constantly and learning new approaches, instead of just rotating a bunch of files.
|
| For others, there are no such worries, because they can open their wallets (without worrying about certain regulations and where their data can be stored, hopefully) and have some cloud provider give them a workable solution, so they just need to integrate their apps with some agent for shipping the information.
|
| For others yet, throwing the requirement over to some other team who's supposed to provide such platform components for them is also a possibility.
| phillipcarter wrote:
| > Why? Because generally it's easier to approach logging
|
| That's due to a significant head start. When one practice has had more than a decade to mature by spreading across an ecosystem and becoming more ergonomic, it's definitely going to be easier to adopt.
|
| However, tracing has come a LONG way with OpenTelemetry, even in the past year. There's still a big mountain to climb, but making tracing better/more widespread and providing on-ramps to metrics and logging systems is exciting.
| nine_k wrote:
| Logging to syslog already solves the problem of asynchronous / streamed logging.
|
| Structuring log messages better than an arbitrary text string is harder (even though I heard the widespread systemd offers a way). Mostly it's because it's hard to agree upon a universal standard. Logging in a self-describing structure is basically the same as logging in JSON (which is not unheard of). Maybe you could do a bit better logging in protobuf or thrift records.
| r3trohack3r wrote:
| > Mostly it's because it's hard to agree upon a universal standard
|
| This is one of the things I've appreciated in my corner of the Node.js ecosystem. It's fairly standardized around pino and bunyan. With async hooks you can instrument all downstream async APIs too and associate them with the incoming request. So, by default, in my services all logs are associated with their incoming requests.
|
| That being said, I'm inclined to agree that, as the distributed tracing ecosystem becomes increasingly mature, it really is eating the use case of logs. The real trick with tracing is being able to associate a spike in latency for 5xx requests in metrics with all their associated traces. With logs I can aggregate all 5xx logs, filter by latency, and get the subset of logs for the high-latency requests to dig in. In more mature stacks, I can even conditionally increase log sampling on-demand for those requests to get a better view into the problem (especially helpful for rare events that are unlikely to make it past the sampler).
|
| I'm not good enough with traces yet to do this. Today I have to start with logs to find the request IDs to get to the traces.
| dfee wrote:
| My God. Logging in protobuf?
|
| Logging is the lowest of all debugging utilities - it's the first thing you _ever_ do writing software - "hello world". And, while I admire structured logging, the truth is printing strings remains (truly) the lowest common denominator across software developers.
|
| I often see confusion on how to interpolate variables - let alone zero standardization on field formatting, and poorly managed logging infra, too.
|
| Adding another barrier to entry (protobufs) isn't going to drive better usage. Any solution I could prescribe would inevitably chop off the long tail of SWE (raise the common denominator), and I think that's going to be quite an unpopular position to advance in any established company.
|
| To be clear: structure is good, our expectations of how to structurally log are too low, and introducing a build step to compile PB to native, and then dealing with a poorly generated (my opinion) API to log, sounds like a miserable experience.
| arinlen wrote:
| > _My God. Logging in protobuf?_
|
| Yes, or any other data format and/or transport protocol.
|
| I'm surprised this is up for debate.
|
| > _Logging is the lowest of all debugging utilities - it's the first thing you ever do writing software - "hello world". And, while I admire structured logging, the truth is printing strings remains (truly) the lowest common denominator across software developers._
|
| This sort of comment is terribly myopic. You can have a logging API, and then configure your logging to transport the events anywhere, any way. This is a terribly basic feature and requirement, and one that comes out of the box with some systems. Check how SLF4J[1] is pervasive in Java, and how any SLF4J implementation offers logging to stdout or a local file as a very specific and basic use case.
|
| It turns out that nowadays most developers write software that runs on many computers that aren't stashed over or under their desks, and thus they need efficient and convenient ways to check what's happening either in a node or in all deployments.
|
| [1] https://www.slf4j.org/
| chrsig wrote:
| Syslog doesn't solve any of the problems outlined above - you're just trading a file for a unix domain socket. And it introduces its own problems, such as sharing that socket with the containers (if you just mount /dev/log into the container, you're gonna have a bad time when the syslog daemon restarts). Log rotation also becomes a problem at higher volumes. Want a 5-minute rotation? logrotate isn't of much help.
|
| re: logging in protobuf/thrift. Thrift is better suited to it due to its stream friendliness. With protobufs, you can do better performance-wise than JSON, but it does tend to be a bit more annoying because you need to write a length header between messages. (At this point, I've considered logging directly into a `.tar` file with one file per request. I haven't ever done it, and wouldn't in production, but it's an intriguing thought to me.)
| With either format, you have to deal with the fact that they're now binary logs, and you lose the ability to use a number of tools like grep & awk, or you need some program to cat the log as plain text.
|
| Logging to JSON, while it's a bit more overhead, offers access to a _lot_ of tooling which is lost with binary logging or unstructured logging: Athena, Snowflake, sqlite, kafka/ksql, jq; it's incredibly ELK-friendly... the list goes on. In general, it's usable without the message definitions, so it can be much easier to use.
|
| All of these tradeoffs wildly depend on what it is that you're logging and what your needs are.
| rcarmo wrote:
| Many tracing libraries get their data off logging libraries. Logging doesn't always mean "whoa, let's stop everything and write to a file"; it is often a matter of buffering data or incrementing counters until data can be sent out in various fashions.
|
| Log files are just one of those.
| glenjamin wrote:
| The key difference in my experience is not about the destination - backends are as flexible as you say.
|
| One key difference is having an event name + properties, rather than a human-readable description and maybe some fields if you're lucky.
|
| Another key difference is wrapping units of work rather than just scattering log lines in places you think need them somewhat arbitrarily.
| rcarmo wrote:
| Most decent logging frameworks allow for structured output. That change came about roughly when NewRelic became "the thing" to use.
| jayd16 wrote:
| > One key difference is having an event name + properties [...] Another key difference is wrapping units of work [...]
|
| Modern "enterprise" logging libraries already work this way, don't they?
| tptacek wrote:
| The otel trace API is nowhere near ergonomic enough to replace logging.
| I think people mostly tolerate otel because the idiomatic way to use it is just a funcall and a defer at the top of a subset of your functions, and because if you're otel-instrumenting, you're probably already passing a context.
|
| https://pkg.go.dev/go.opentelemetry.io/otel/trace
| phillipcarter wrote:
| FWIW, tracing in Go is primarily made more difficult by having explicit context propagation and no helper to auto-close a span based on scope. If you use Python or .NET, for example, it's implicit and you can "just create some spans" like you would a structured log somewhere in your app, then use built-in language constructs to have it handle closing for you.
| xani_ wrote:
| ...a helper like WithCancel/WithDeadline?
| Rapzid wrote:
| .NET has task-local storage, which is pretty amazing, and Serilog made great use of it (my all-time fav structured log lib).
|
| But IIRC task-local storage is a perf concern, and JavaScript has yet to have anything comparable that's ubiquitous (Node.js has async hooks, though, so one day maybe).
|
| So, explicit contexts for Go and JavaScript, unfortunately.
| phillipcarter wrote:
| JS does have convenience APIs in the form of _startActiveSpan_ and _getActiveSpan_ that take care of most of the pains here. There are likely some scenarios where you can't use these (e.g., instrumenting a tool that must use the sdk-trace-base package), but I think in most cases they help a lot. Go doesn't have these kinds of convenience APIs though.
| tptacek wrote:
| Sure. If it was a different language, or a different otel library, my opinion would probably be different. I do not think it's realistic to replace logging with otel in Go, though.
| glenjamin wrote:
| For web-style application code, my experience writing Go says that you totally can stop using logs and just use traces, and have just as much visibility into a running process, if not more.
|
| For library code or other deployment models I'm not sure - it's not something I've had a need to experiment with.
| phillipcarter wrote:
| I think it's an inclusive OR here. When logging stabilizes in the Go SDK (which will be a while, perhaps a year), the idea will be that you can keep all your existing logging and then start to add tracing judiciously, so logs are easily correlated with requests. I think the juice is worth that particular squeeze for most folks at least.
|
| From there I would personally advocate for adding more manual traces rather than more logs, but it's understandable if it's a heavier lift to pass context around all the time. The more options the better.
| henvic wrote:
| I've played with trying to create an idiomatic wrapper for their Go module for precisely this reason, but then didn't find time to continue with it.
|
| https://github.com/henvic/tel
| jyounker wrote:
| Or you include the trace ID as one field in the structured log. That ID can be threaded through the logging library.
| akira2501 wrote:
| I'm of the opinion that we should use both. They're both fundamentally useful and solve separate problems, and I don't think there are any resource constraints that would make this an issue.
|
| A logging API that just lets me write either a textual string, a structured telemetry object, or both would be ideal. Then I can have either source separately, or I can view how they generated events collectively.
| phillipcarter wrote:
| OpenTelemetry's design says: why not both! Logging is still early days, but the idea is that you can take your existing structured logs and have them automatically correlated with traces that you can add later.
| | (there's other use cases some can envision, but I'm team | "tracing is better than logging", while recognizing that most | apps don't start from scratch and need to bring their logs | along for the ride) | jen20 wrote: | I'm in the position of working on a green-field system | currently, and spent a bit of time looking at OpenTelemetry | both as a logging system and for metrics and tracing. If | OpenTelemetry can indeed suit that use case today, it's not | evident from any of the documentation or examples, and the | naive "go get" approach in a test project led to | incompatible versions of unstable libraries being pulled | in. | | Further, the OpenTelemetry Go library pulls in all kinds of | dependencies which are not acceptable: a YAML parser which | doesn't even use compatible tagging and versioning via | Testify, logr and friends, and go-cmp. Combined with the | insistence of GRPC on pulling glog and a bunch of other | random stuff like appengine libraries, you end up with a | huge dependency tree before you've even started writing | code. | | The result is I'm sticking with zerolog, and a custom | metrics library, and will let Splunk do its thing. It's a | shame, as distributed tracing is very desirable, but not | desirable enough to ignore libraries that have chosen | concrete dependencies vs interfaces and adapters. | | Fundamental things like this need to be part of the | standard library, or have zero external dependencies. Rust | does (in my opinion) a better job here despite requiring on | average more libraries to achieve things, because features | can be switched off via Cargo.toml, reducing the overall | dependency set. | phillipcarter wrote: | (note: I'm not a maintainer in the Go SIG so I can't | speak towards these issues with much authority) | | Definitely interested in how things can be improved here. | What are you generally looking to accomplish?
Hopefully | it's just a matter of docs not accurately expressing | what's possible today (a consistent theme for quite some | time that's getting resolved, but it takes time). | | As for the dependency issues, is this something you'd | have the time to file an issue about[0], or make an | agenda item for the next Go SIG? At the end of each SIG | there's also a moment where the SIG leader asks for | stories about adoption (good or bad) and this would | clearly count. I'd like to think that these issues are | solvable, or at least solvable enough to make it worth it | for you to adopt. | | [0]: https://github.com/open-telemetry/opentelemetry- | go/issues?q=... | jen20 wrote: | Thanks for the thoughtful response. | | I think the aim is to have a single "sink" for metrics, | logs and tracing, and it's clear that OpenTelemetry is | aiming to be that. Ideally from a single "go get", one | should be able to pull in a library without additional | dependencies that: | | - permits recording of structured logs (perhaps the | stdlib interface under discussion will help here!). From | a personal perspective I'd rather not have an API that | accepts key-value pairs as varargs, but if that was the | only thing available I'd take it I think. | | - permits recording of counters, gauges, timers and | histograms, | | - permits recording of spans, and associating both logs | and metrics with those spans. | | The Go library README shows the logs section as "frozen" | and states that no pull requests to improve logging are | being accepted. With no visibility into the process | (which is, to be clear, because I haven't looked into it | and not the fault of the project per se!), it looks like | this might be an option in future but not today? | phillipcarter wrote: | I see! Yeah, this is one where otel-go is a lot | harder to use, but it's something the SIG is looking at.
| A coworker of mine is helping drive a design that's sort | of an "easy button" to configure all the things with the | least-surprising defaults[0] and we're seeing how people | like it in our SDK distribution that uses it[1]. I hope | that sometime soon we'll have the design polished up | enough to get merged in. Like most OSS projects, it'll | take some time but I'm confident we can get it done. | | The main challenge is that there's a large variety of use | cases to fulfill (e.g., someone wants custom context | propagation, a custom span processor, and export over | HTTP+json but not HTTP+protobuf) and today the answer to | that is that you have to pull in all the libraries for | all the things you need. It's a lot more energy you need | to expend to get started with all of this than it needs | to be. | | As for logging support in the Go SDK, it's frozen mostly | just due to lack of bandwidth and a need to finish what's | already been started. Metrics have proven to be much more | difficult and time-consuming to implement correctly | across all languages, with Go being impacted harder than | other languages (e.g., Python and .NET). I think you can | expect logging integrations in the near-ish future | though. | | This is great feedback. I'll pass it on to folks who haven't | seen it. Thank you! And please feel free to file issues | about all the things that rub you the wrong way. | | [0]: https://github.com/open-telemetry/opentelemetry-go- | contrib/p... | | [1]: https://github.com/honeycombio/honeycomb- | opentelemetry-go | arinlen wrote: | > _I'm increasingly of the opinion that we should stop using | "logging" libraries, and instead we should produce telemetry | via events using something like OpenTelemetry's trace API._ | | That really depends on what you mean by "events". Even | OpenTelemetry supports logging events. Logging is not replaced | by emitting metrics, as they have very distinct | responsibilities.
| | > _These traces can be rendered as strings for people who like | their logs in files_ | | You're missing the whole point of logging stuff. The point of | logging things is not to have stuff show up in files. Logging | events are the epitome of observability, and metrics and | tracing events end up as specializations of logging events which | only cover very precise and specific use cases. | glenjamin wrote: | > Logging events are the epitome of observability, and | metrics and tracing events end up as specializations of logging | events which only cover very precise and specific use cases. | | This is a common way of thinking about things, but having now | worked with event-based tracing for the last few years I no | longer believe traditional logs are necessary. | | If you do "good" logging you likely have entirely structured | data and no dynamic portions of the main log message. This is | almost exactly equivalent to a span as represented by tracing | implementations. | | I have come to believe that these events are the core piece | of telemetry, and from them you can derive metrics, spans, | traces and even normal-looking logs. | arinlen wrote: | > _This is a common way of thinking about things, but | having now worked with event-based tracing for the last few | years I no longer believe traditional logs are necessary._ | | You are free to use stuff as you see fit, even if you | intentionally miss out on features. | | Logging is a very basic feature whose use cases are not | covered by metrics events, or even traces. Logs are used to | help developers troubleshoot problems by giving them an | open format to output data that's relevant to monitor | specific aspects of a system's behavior. | | It makes no sense to emit a metrics event with tons of | stuff as dimensions and/or annotations that's emitted in a | single line of code that's executed by a single code path | when all you need is a single log message.
| | Also, stack traces are emitted as logging events, and | metrics events are pretty useless at tracking that sort of | info. | | But hey, to each its own. If you feel that emitting a | counter or a timer gives you enough info to check why your | code caught an exception and what made it crash, more power | to you. The whole world seems to disagree, given standard | logging frameworks even added support for logging stack | traces as structured logs as a basic feature. | glenjamin wrote: | A counter or a timer is clearly no substitute for a log, | that would be an absurd statement for me to have made - | which is why I said no such thing. | | A span as modelled by OpenTelemetry is a bag of | developer-chosen structured data containing a name, a | timestamp and some auto-generated IDs to correlate it | with others. | | It is effectively an open standard for how to collect | good structured logs - although it is currently not | commonly pitched that way by either tracing advocates or | structured logging advocates. | | These are commonly used to create trace waterfalls, but | can also be used to derive metrics or simple dev-debug | output. | | I have used this approach myself to great effect, and | also spoken to a number of people who have also found | success with such an approach - but currently the | momentum of existing approaches (and the fact that | traditional logging still works well-enough for most | people) has not led to wide adoption. | cube2222 wrote: | I agree. | | I'm not necessarily a fan of OpenTelemetry, as every time I | looked at it, it was still "in progress" (though the last time | was almost 2 years ago), but in general I agree. | | We started with tracing+logging, but as time passed we moved | more and more stuff into tracing. But not just span-per-service | kind of tracing - detailed orchestration on the level of | "interesting" functions, with important info added as tags.
| | Right now we use almost exclusively tracing (and metrics ofc) | and it's working great, even if each trace often has hundreds | of spans. | | In our case we use Datadog, which works well for most intents | and purposes, but we also send all traces over Firehose to S3 so | that we can query them with Athena at 100% retention. This way | you get your cake and can eat it too. | phillipcarter wrote: | If you've got the time for instrumenting something relatively | small, it's definitely worth picking up OTel again to see how | it works for you. Tracing is stable in most languages (Go | included) and there's a lot more support in each language's | ecosystem. | | FWIW I think OTel will always be in a state of "in progress" | though. There's more scenarios and signal types that will get | picked up (profiling, RUM) and more things to spread across | the ecosystem (k8s operator improvements, collector | processors for all kinds of things, etc.) and more | integration into relevant tools and frameworks (.NET does | this today, maybe flask or spring could do it tomorrow). But | I wouldn't let that stop you from trying it out again to see | if you can get some value out of it! | jayd16 wrote: | I guess Golang is catching up but in something like C# or Java, | the logging libraries already feel like tracing libraries. They | already have contexts and structure. | | What are you proposing should change? Log files should default | to a verbose binary representation that captures more? What do | you think is not captured by today's logging? | phillipcarter wrote: | Tracing gives you correlation between "log lines" and across | process/network boundaries by propagating context for you. | You could replicate everything that tracing does by manually | propagating around context-like metadata yourself, stitching | the log lines together into a "trace". But why not just use | an SDK that does that for you?
| jbjbjbjb wrote: | It is a config away in C# world, and even when it wasn't so | in built it was very simple to add to structured logging. | phillipcarter wrote: | In the C# world, tracing is also just a config away. It's | baked into the runtime. But you don't need to manually | stitch together logs into a "trace" - it just does it for | you. | jbjbjbjb wrote: | What I meant was you can easily add a correlation id to | your logs, and send those to wherever you collect and | query the logs. Then using that you can easily see the | log across your services. So there isn't any manual | stitching in the logging either. | azlev wrote: | Telemetry and logs are distinct. It makes sense to gather | metrics and it makes sense to catch sporadic messages of what's | going on in details. | latchkey wrote: | While we can debate all day long what should and should not be | part of the standard lib (bike shed), I just ignore all of this | and happily use zerolog. | | Problem solved, one dependency, it just works. Back to writing | code. | jen20 wrote: | The problem is, it's not one dependency. Several important | libraries pull in a variety of wrappers and adapters - just to | name three: | | - glog (gRPC) | | - logr (OpenTelemetry and friends) | | - klog (Kubernetes client) | | It's quite conceivable that even having decided to use zerolog, | you end up with at least four separate logging libraries in | your dependency tree before having written any of your own | code. | latchkey wrote: | True, but as long as I can either control the logging or | write an adapter, I'm ok with that. | | At the end of the day, I'd still have that issue, unless the | standard lib was being used everywhere... which is something | that is equally not controllable. | jen20 wrote: | The hope of a standard structured logging interface in the | standard library is that serious libraries will adapt | themselves to use it, and the proliferation will disappear | over time. 
Whether or not this will actually happen, or | whether the libraries which do not play ball will have to | be forked is a different matter. | latchkey wrote: | "serious" libraries today all provide the ability to pass | in adapters. | | Having been in the group of a multi-year effort to do | this in Java (JSR-47), there is always a limitation in | the API that prevents someone from getting it right. | | Funny enough, even though Java ended up with JUL | (java.util.logging), SLF4J ended up being the de facto | standard that everyone went with. | jen20 wrote: | > "serious" libraries today all provide the ability to | pass in adapters. | | Sadly they don't, unless we count things like gRPC, | Kubernetes clients and so forth as non-serious. | tptacek wrote: | The proposal acknowledges that. Part of the point of working | this out is that it creates a common interface so that your | zerolog-logging code plays nice with someone else's hclog- | logging code. | latchkey wrote: | A long time ago, I was part of JSR-47. I wonder if anything | can be learned from that process (which was a bit of a mess). | oezi wrote: | Unless the new generic logging API is designed badly so that | certain functionalities become hard in the future because | they deviate. | | For instance, when Rails standardized ActiveJob (background | tasks), they didn't include a way in the API to query the | current length of a queue of background tasks. Because | everyone is now only trying to conform to the API, accessing | such information is now hard in many of the implementations. | motiejus wrote: | If we ignore the proposal, this gives at least an awesome | overview of the state-of-the-art of logging in go. That was the | best part. Thanks for the heads up! | egnehots wrote: | _Prior Work_ | | - The existing log package has been in the standard library | since the release of Go 1 in March 2012. It provides formatted | logging, but not structured logging or levels.
| | - Logrus, one of the first structured logging packages, showed | how an API could add structure while preserving the formatted | printing of the log package. It uses maps to hold key-value | pairs, which is relatively inefficient. | | - Zap grew out of Uber's frustration with the slow log times of | their high-performance servers. It showed how a logger that | avoided allocations could be very fast. | | - zerolog reduced allocations even further, but at the cost of | reducing the flexibility of the logging backend. | | - All the above loggers include named levels along with key- | value pairs. Logr and Google's own glog use integer verbosities | instead of named levels, providing a more fine-grained approach | to filtering high-detail logs. | | Other popular logging packages are Go-kit's log, HashiCorp's | hclog, and klog. ___________________________________________________________________ (page generated 2022-09-11 23:00 UTC)