[HN Gopher] My Logging Best Practices (2020) ___________________________________________________________________ My Logging Best Practices (2020) Author : isaacfrond Score : 254 points Date : 2021-10-05 13:52 UTC (9 hours ago) (HTM) web link (tuhrig.de) (TXT) w3m dump (tuhrig.de) | yongjik wrote: | Also, it goes without saying, but never buffer logs. Each line | should end up with a write(2) call or something equivalent. If | you buffer logs, and the process crashes, you lose logs exactly | when you really really need it. | Spivak wrote: | I agree with you in theory if resources were infinite but there | is no way to log at the rates developers actually want to log | and have a performant application without buffering. | | I think the more practical advice is "buffer logs outside your | application" or "buffer logs in a separate thread/process that | crashes independently of your application code." | | Even the small places I've worked could easily push 250G logs a | day. | yongjik wrote: | I guess "buffering in a separate process" can be handy, esp. | if you have strict performance requirements, but I guess | using shared memory correctly can be tricky...? | | (Without shared memory, I imagine you'll have one system call | per log anyway - in which case it's not much different from | calling one write() per line.) | TameAntelope wrote: | I love these suggestions, but what is most important is that you | make these choices (or others) and stick to them. | | Consistency in how you log is probably Best Practice #0, IMO. | adrianmsmith wrote: | > Log after, not before | | My problem with that is what if the program crashes or produces | an error. It's good to know what the program was attempting to | do. | | Imagine the output Successfully connected to | DB. Successfully read config file. Uncaught | Exception: NullPointerException | | vs. Will connect to DB... Will read | config file... Will connect to API server... 
| Uncaught Exception: NullPointerException | jlokier wrote: | I've found the most useful log before/after pattern is a | little-known dynamic variant. | | Imagine this success output, at a low (not tracing) log level: | Opened DB connection | | Error output _at the same logging level_: | Opening DB connection Opening TCP connection to | 127.0.0.1:5678 Error: No more file descriptors | Error: Failed opening TCP connection Error: Failed | opening DB connection | | Crash output _at the same logging level_: | Opening DB connection Opening TCP connection to | 127.0.0.1:5678 Starting TLS negotiation | Fatal: Segmentation fault <stacktrace...> | | Success output with tracing turned up and no error: | Opening DB connection Opened TCP connection to | 127.0.0.1:5678 TLS negotiation completed in 0.1ms | DB version 5.1 features A, B, C Opened DB connection | | The key feature here is that there's a scoped before/after | message pair ("Opening DB connection" and "Opened DB | connection"), and the before part of the pair is output _if and | only if_ any messages are output in between. | | The contour around messages in between is visible (here with | indentation), and the before message is _not_ shown if there | are no intermediate messages. Sometimes it's useful to show a | different after-message if the before/after pair is not output, | to include a small amount of information in the single message, | or just to improve the language in the quieter case. Sometimes | it's useful to suppress the after-message just like the before-message, if there is nothing in between. | | Anything that might terminate the program, such as the | Segmentation fault with stacktrace, counts as an intermediate | message, and so triggers the output of all the before messages | as part of the termination message logging. For this to work, | of course you must have some way for the before messages to be | queued that is available to the crash handler.
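A minimal Python sketch of the scoped before/after pattern jlokier describes: before-messages are queued and emitted only if something actually logs inside the scope. The names `scope` and `log` are invented for illustration, and log levels and crash handling are omitted for brevity.

```python
from contextlib import contextmanager

LOG_LINES = []  # collected output lines, printed at the end
_pending = []   # queued before-messages, not yet emitted
_depth = 0      # current scope nesting, used for indentation

def _emit(depth, text):
    LOG_LINES.append("  " * depth + text)

def log(msg):
    """Emit msg, first flushing any queued before-messages for context."""
    global _pending
    for depth, text in _pending:
        _emit(depth, text)
    _pending = []
    _emit(_depth, msg)

@contextmanager
def scope(before, after):
    """Queue `before`; it is emitted iff anything logs inside the scope.
    The `after` summary is always emitted (error paths omitted here)."""
    global _depth
    marker = (_depth, before)
    _pending.append(marker)
    _depth += 1
    try:
        yield
    finally:
        _depth -= 1
        if marker in _pending:
            _pending.remove(marker)  # quiet scope: drop the before-message
        _emit(_depth, after)

# Quiet path: only the after-message appears.
with scope("Opening DB connection", "Opened DB connection"):
    pass

# Noisy path: the intermediate message pulls the before-message out too.
with scope("Opening DB connection", "Opened DB connection"):
    log("Opening TCP connection to 127.0.0.1:5678")

print("\n".join(LOG_LINES))
```

The first `with` block produces only "Opened DB connection"; the second also emits the before-message, with the intermediate message indented beneath it.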
| | This is a poorly known technique, which is a shame as in my | opinion it's a very helpful one. | | Its chief benefit isn't saving a few messages, it's that it | _scales_ to complex, deeply called systems with heavy amounts | of tracepoints encouraged throughout the code. When studying | issues, you may turn on tracing for some components and | actions. As you turn on tracing for a particular component, | before-messages from _caller_ components become activated | dynamically, ensuring there's always context about what's | happening. | | In case you're wondering, it works fine with multiple threads, | async contexts, and even across network services, if you have | the luxury of being able to implement your choice of logging | across services, but of course there needs to be some context | id on each line to disambiguate. | spaetzleesser wrote: | Agreed. Logging before gives you a lot of information to | diagnose an issue. Logging just exceptions usually lacks a lot | of necessary information. | goohle wrote: | Just print out all necessary information, in case of | exception, and suggest a potential solution to the problem. | slaymaker1907 wrote: | This does not work in practice a lot of the time. What if | the problem was actually due to some other thread and said | other thread didn't crash? If you only log after | success/failure, you would have absolutely no idea why the | program crashed. Oftentimes if the actual error were | anticipated enough for good reporting, said error would not | exist in the first place. | AlexCoventry wrote: | Would it make sense to log a traceback, on unexpected errors? | jayknight wrote: | I think he is assuming that the logged exception will include | the details needed to track down the error (line number, stack | trace) | bobbyi wrote: | Yes. I think logging before is still useful, but it is for | hangs, not exceptions.
| | If your program disappears into a black hole doing some | operation that doesn't have (but needs) a timeout specified, | you want to know what it was about to do when you last heard | from it. | adrianmsmith wrote: | Of course in an ideal scenario that's the case. Nevertheless | programs do sometimes just "crash" without an error message, | or maybe they hang and never produce any output due to an | infinite loop, or excessive memory allocation and GC | overhead, etc. Logging what the program was attempting to do | is useful in those scenarios. | raldi wrote: | That's what the DEBUG level is for, as explained at the end of | the section you're referring to. So it would be like: | D: Connecting to DB D: Reading config file D: | Connecting to API server D: Sending updated record | I: Updated user record for $USER | | Or, if it fails, the last line is replaced by: | E: Failed to connect to API server; could not update user | record. [IP=..., ErrNo=..] | mirekrusin wrote: | Debug level is not for production, it's for debugging. | yongjik wrote: | Personally I'm not a fan of DEBUG level (unless you always | keep it on). You can have a transient system issue that | appeared for 30 minutes and then went away. If DEBUG logs | weren't turned on at the time, they won't help me. | Jowsey wrote: | Why not have the program write logs to a temp file | somewhere at trace/debug level, and only output the | standard level to the console? | Cicero22 wrote: | Wouldn't that end up slowing down the whole system from | writing to a file though? | Jowsey wrote: | Ah, suppose it depends on how much you're doing and/or | how verbose your logs are. If you've got lines speeding | past then yeah I maybe wouldn't write them all. | grey-area wrote: | Writing verbose logs is very expensive. | worik wrote: | It depends which costs what. | | In a case where there are plenty of resources and the | thing you are logging is very heavy then even verbose | logging is trivial.
| If you are beset by failures it is much cheaper than | making your developers guess. | raldi wrote: | Of course you always keep it on -- the point of the | different levels is that you can filter them out at _read_ | time (e.g. to eyeball logs, or to alert when the rate of | >= WARN is too high) | codemac wrote: | This traditionally has not been log level's use case. | | syslog() and setlogmask() being the obvious examples. | | Many (if not all) companies I have worked at filtered | out everything DEBUG at compile time to improve | performance in production as well. | | The best option I've seen is putting everything in DEBUG | or lower into a memory ring buffer per process that's | dumped and collated with other logs at each ERROR/FATAL | log. | hurflmurfl wrote: | I've actually used this idea in production and it worked | great. | | In my web app if nothing unexpected happens, only | INFO/LOG level stuff is pushed to logs. If, however, I | get an exception, I will instead print out all the | possible traces. I.e. I always store all the logs in-memory and choose what should be printed out based on the | success/failure of the request. | | Now, of course, this is just a web API running in AWS | Lambda, and I don't have to care overly much about the | machine just silently dying mid-request, so this might | not work for some of you, but this works great for my | use-case and I'm sure it will be enough for a lot of | people out there. | spockz wrote: | That is actually quite nifty. That way you can carry | debug logs and only materialise/send them in case of | errors when you are likely interested in it without | having to put those logs at error and see them all the | time etc. | | Does anyone know of an implementation for this in any of | the Java logging frameworks? | masklinn wrote: | Fwiw sentry does that out of the box, anything below | warning (I think) is used as context for "actual events".
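The buffer-then-flush idea codemac and hurflmurfl describe can be approximated in Python's standard library with `logging.handlers.MemoryHandler`, which holds records in memory and dumps them to a target handler only when a record at or above `flushLevel` arrives. One caveat: it flushes, rather than discards, when `capacity` is exceeded, so it isn't a true ring buffer.

```python
import io
import logging
import logging.handlers

stream = io.StringIO()  # stands in for a file or log collector
target = logging.StreamHandler(stream)

# Buffer everything (including DEBUG) in memory; dump the whole buffer
# to `target` only when a record at ERROR or above shows up.
memory = logging.handlers.MemoryHandler(
    capacity=1000, flushLevel=logging.ERROR, target=target, flushOnClose=False)

logger = logging.getLogger("buffered-demo")
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(memory)

logger.debug("Opening TCP connection")    # buffered, nothing written yet
assert stream.getvalue() == ""

logger.error("No more file descriptors")  # flushes the buffered context too
assert "Opening TCP connection" in stream.getvalue()
```

This way the DEBUG-level context is only materialized when an error actually occurs, which is exactly the trade-off hurflmurfl describes.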
| snak wrote: | Performance degradation is noticeable with DEBUG level in | production. | raldi wrote: | Performance degradation is noticeable with excessive | logging; the level is irrelevant. | mplewis wrote: | Leaving debug levels on in production will bankrupt most | companies. | raldi wrote: | Addressed here: | https://news.ycombinator.com/item?id=28764834 | dezzeus wrote: | Even though it can be lost by an optimistic configuration, I | prefer to use the TRACE level(s) and leave the DEBUG one for, | well, debugging. | raldi wrote: | We're discussing a NullPointerException situation; how is | that not debugging? | Leherenn wrote: | I think OP's case was not debugging a crash, just that if | a crash happens he would like the log to be before rather | than after. The log statements would be there no matter | whether there is a crash or not, just in case. | BrandoElFollito wrote: | I prefer to have both pieces of information (the actual logs would be | structured) Connecting to DB | Connected to DB - OK Reading config file Read | config file - OK | | Even better, but hardly doable because of the nature of logging | Connecting to DB ... OK Reading config file ... OK | kulikalov wrote: | Performance though? | worik wrote: | Diagnosis of failure? | whatupmd wrote: | Users sure do appreciate people who put love into their logging. | | Remember that just because you love to hear what your application | is doing, it's not always helpful or useful to your user. Give | consideration to the Rule of Silence | (http://www.linfo.org/rule_of_silence.html) and even if you do | not agree with it remember that some of your users might. | [deleted] | ryandrake wrote: | One thing I noticed in one of the article's examples is that the | log contained an example user's PII: INFO | | User registered for newsletter. [user="Thomas", | email="thomas@tuhrig.de"] | | Ruh-roh! You now have a potential privacy incident ready to | occur, if an inadvertent log leak ever happens.
I think, at least | these days, one vital component of any logging API is the ability | to flag particular bits of information as PII so that it can get | handled differently upon output, storage, or ingestion by other | tools. For example, if the user's actual e-mail address is not | specifically useful for debugging, do you need to display it in | the default visualization of the log? | | I think in 2020, with companies' privacy practices getting more | and more scrutiny, any logging best practices doc should at least | mention treatment of PII. | tobbebex wrote: | I used to work with medical software where we turned off | logging completely in production, to mitigate the risk of | accidentally logging patient information. | | We needed to log that kind of data during development and just | tagging what messages were sensitive would have been too error | prone, so we couldn't reliably have any logs once we went live. | | We did however have a special crash handler that logged the | exception type and callstack, but not even the exception | message could touch the hard drive. | slaymaker1907 wrote: | What I find works best is to avoid logging raw datatypes and | instead limit it to objects/structs with custom toString. Then | just make sure any PII fields are not included with toString | either through some sort of annotation or by writing the | implementation manually. This also encourages minimizing the | code that actually interacts with PII such as replacing emails | with user IDs ASAP. | mholt wrote: | Ah yeah, that's why in Caddy 2 while we do emit full details | from HTTP requests in our structured JSON logs, you have the | ability to traverse into each log message and replace or delete | data within it: | https://caddyserver.com/docs/caddyfile/directives/log#filter | kordlessagain wrote: | This reminds me of the time a developer turned on debugging to | fix some credit card thing and was sending us credit card | numbers in their logs.
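slaymaker1907's suggestion of keeping PII out of the default string representation maps neatly onto Python dataclasses, where `field(repr=False)` excludes a field from the generated `repr`. The `User` type and its fields below are made up for illustration (note the article's real address is swapped for `example.com`).

```python
from dataclasses import dataclass, field

@dataclass
class User:
    user_id: int
    name: str
    # repr=False keeps these fields out of the generated repr, so a
    # careless log.info("registered: %s", user) cannot leak them.
    email: str = field(repr=False)
    phone: str = field(repr=False)

user = User(user_id=42, name="Thomas",
            email="thomas@example.com", phone="555-0100")
print(user)  # → User(user_id=42, name='Thomas')
```

The PII is still available to code that explicitly asks for `user.email`, but the path of least resistance (logging the whole object) is safe by construction.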
| pugworthy wrote: | Agreed. | | I found out a few years back that a large industrial device our | company made was recording all users' passwords due to a logging | statement that was blindly just logging every piece of user | input via UI. | cloverich wrote: | Don't forget exit logging[1], I am a bit ashamed how far into my | career I was before understanding this concept. | | [1]: https://pinecoder.dev/blog/2020-04-27/node-exit-logging | boffinism wrote: | "Best Practices" really is synonymous with "Personal Preferences" | if it can work in headlines like this, isn't it? | the_gipsy wrote: | The author does make some effort to explain why. | SideburnsOfDoom wrote: | That author however has some reading to do around structured | logging and related practices, which might raise their | personal "best". | kleiba wrote: | "my" + "best practices" = "personal preferences" | worik wrote: | "my" + "personal preferences" = "best practices" | jonnycomputer wrote: | I like this post, and I think the recommendations are sound. I'm | not quite sold on the technical vs business logic distinction | between debug and info. I use debug for all those messages I only | want to see if something has gone wrong, and info for all those I | want to see to know it's working. That _might_ fall along the | business vs technical distinction, but it might not. | camel_gopher wrote: | } catch(e: UnauthorizedException) { log.warn("Request to REST API | was rejected | | Don't throw exceptions for auth failures. | rond2911 wrote: | I was expecting way too much from this article. But, I don't see | anything about logging requestId's etc. | | Sharing a few things briefly below I have learned over the past | few years - | | 1. Log every single message for a transaction/request with the | requestId to look at the entire picture. Ideally, that requestId | would be used upstream/downstream too internally for finding | exact point of failure in a sea of micro services. 2.
Log all | essential id's/PK's for that particular transaction. - makes | debugging easier. 3. Log all the messages possibly in a json | format, if the log aggregator you are using supports that. | Parsing later for dashboards makes lives WAY EASIER and is more | readable too. Might also reduce overall computations being run on | the cloud to extract values you want -> hence, lower cost. 4. | Having error/warn logs is good, but having success/positive logs | is equally important imo. 5. Oh, and be very very careful about | what you end up logging. We don't want to log any sort of user | PII. At one of my previous companies, there were mistakes made | where we were logging user's phone numbers, addresses etc. | | I am sure the community here already knows about these, and might | have even more to add. Would love to hear what other people are | doing. | | Cheers. | slaymaker1907 wrote: | I completely disagree with Thomas' first point. It is better IMO | to always log before you do something so that if you crash during | whatever was happening, you know what you were trying to do. | Additionally, if you have enough logging, you can figure out that | some call did succeed by implication so long as you have access | to the source code/know what the order of high level operations | are. | | If for some reason you can't easily figure out whether something | succeeded, you should probably log before and after the | operation. I personally try to avoid doing this since it can | clutter up the log, but it can be useful depending on how code is | structured. | | Another minor point is that I prefer to have INFO cram as much | information as possible with DEBUG reserved for stuff which is | ignored unless you explicitly turn it on, due to its low | utility relative to cost. | gwbas1c wrote: | Don't SPAM the logs. | | You don't need to log every method call, every variable, every | "if" statement, etc. That's what the debugger is for.
| | And, I shouldn't need to say it: Don't log passwords or other | sensitive information. | | Also: Make sure your logger is configured correctly. Log4Net used | to (and it still might) calculate an entire stack trace with each | call to the logger. It can be the single highest consumption of | CPU in an application; even if you aren't actually logging stack | traces. | TameAntelope wrote: | If you're using a tool like Splunk, verbosity of logging is | usually not much of a problem. | albertgoeswoof wrote: | But if I don't log everything, why did I build a massive ELK | stack for my big data logs | tombert wrote: | I'm surprised there's no mention of tagging with some kind of | correlation ID. I find that even generating a quick one-off GUID | at the beginning of the request (to be used across the entire | request) makes tracing server logs substantially easier. I just | tag every log (and exception) with that ID, so when searching | with grep or Splunk or whatever, I have a quick and easy way to | disambiguate which log is connected to other logs. | wainstead wrote: | And there's a standard for it: https://www.w3.org/TR/trace- | context/ | | Many libraries and tools support the standard now. | tombert wrote: | And now I feel embarrassed for not knowing about | this...Thanks for the link! | wainstead wrote: | Don't, I only learned about it recently, and the standard | itself is quite young! | kerblang wrote: | Just an opinion, but I tend to think of this as distributed | monolithic development... the more log correlation is needed, | the more likely a requirement has been chopped up and then | duct-taped back together over HTTP. | jniedrauer wrote: | What about distributed systems? I had to reach for this tool | recently to trace requests as they propagate through up to 5 | separate devices. Not everything is a web app. | NovemberWhiskey wrote: | Not sure I follow that logic? 
| | If you have a heavily used service, it's likely to have many | concurrent requests logging into the same log stream and you | absolutely need a request correlation identifier to be able | to thread the log statements for a single request back | together. | | That's essential even if you don't actually call any other | services as part of your implementation. | comeonseriously wrote: | > Log after, not before | | I'd say this is good for production. During development, I | sometimes like to log the name of the function I'm in as the | first line of the function. That way I'm assured to know if the | function was even called. I will also put before and after logs | if I want to see how data is changed (Yeah, there are things | called debuggers that let you watch variables, but sometimes a | log is just so much easier - such as in heavily threaded code or | code with a number of timers and events firing). | raldi wrote: | That's why the author wrote: | | _> I make exceptions for DEBUG._ | jonnycomputer wrote: | Yes, but I think the author could have expanded on that tbh. | 8008135 wrote: | But that makes me engage my monkey brain to decide what level | and where to put the message. Easier to have a consistent | rule that all logging is before/after (I'm strongly in the | before camp). | progx wrote: | In asynchronous apps it could be better to log both, before and | after. You don't know what will possibly be logged in the | meantime. | | With info and success level you could log info before and success | after execution. | | In dev mode you show info, in production you could filter | success. | chriswarbo wrote: | Will Sargent did a nice talk on logging at Scalacon. My biggest | takeaway was that pre-choosing a 'log level' for what to skip is | making the decision too early (it's hard to guess what messages | we'll want), and too coarse-grained (we don't want verbose | messages when we follow the happy-path, but tricky/exceptional | paths may want more logging).
| | His suggestion is to accumulate verbose messages in memory/disk, | so we can _later_ decide whether to dump them out (e.g. in a top-level error handler). | | The talk suggested an SQLite DB, with dynamically-configurable | filters for what to keep, etc., but that's a step beyond ;) | worik wrote: | I really do not think that logs belong in any sort of database. | Perhaps there is something I have not thought of? | | Structured logs (of course structured) in a simple text file. | Rotate the files if your antiquated system cannot handle big | files or if your wildly successful cryptocurrency scam | generates many many gigabytes per time period..... | | `grep` is your friend! | adrianmsmith wrote: | > His suggestion is to accumulate verbose messages in | memory/disk, so we can later decide whether to dump them out | (e.g. in a top-level error handler). | | Interesting; I've definitely seen that with unit test | frameworks. Where in addition to assertEquals(...) there was an | annotate(...) method which just took a string, and was stored | for that unit test. If the unit test eventually failed, all the | previous annotations for that unit test were printed, otherwise | they weren't. | | https://metacpan.org/pod/Test::Unit::TestCase#annotate-(MESS... | bjt2n3904 wrote: | Another thing to think about is how the log looks on info, vs on | warn, and debug. | | In production, if you set to warn only, will you have enough | data? If you set to info, will you have too much? | spaetzleesser wrote: | I totally disagree with the "log after". I have seen it multiple | times where the log ended but it wasn't possible to tell in what | code section because some call had thrown an exception. I much | prefer to see a log where I see lines like "attempting to do x | with parameters a,b,c". | | In general I set the bar for "too much logging" very high. It's | a better problem to have too much log to store and sift through | vs having no data.
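The "attempting to do x with parameters a,b,c" style is easy to demonstrate: if the call dies mid-operation, the parameters are already on record. The `charge` function and its failure below are invented for the example, and output is captured in a buffer so the behaviour is visible.

```python
import io
import logging

log = logging.getLogger("before-demo")
log.setLevel(logging.INFO)
buf = io.StringIO()
log.addHandler(logging.StreamHandler(buf))

def charge(user_id, amount_cents):
    raise RuntimeError("gateway timeout")  # simulated failure mid-operation

user_id, amount_cents = "u123", 1999
try:
    # Log-before style: the parameters are on record even though
    # charge() blows up before any success line can be written.
    log.info("Charging card [user=%s, amount_cents=%d]", user_id, amount_cents)
    charge(user_id, amount_cents)
    log.info("Charged card [user=%s]", user_id)  # never reached
except RuntimeError:
    pass

captured = buf.getvalue()
```

With log-after only, `captured` would be empty here and the failed request would leave no trace of what it was attempting.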
| ziml77 wrote: | I've also seen issues with logging after. | | It feels more natural to log when something is complete instead | of constantly saying when you're starting to do something, but | it leads to issues where you have no idea what the system was | actually doing when it failed. It's very easy to have the | process die without logging what made it die, especially if | you're using a language that doesn't provide stack traces or | you have them turned off for performance. | | Logging before an action also means it's possible to know | what's going on when a process hangs. It might lead me to where | I should look to resolve the problem instead of forcing me to | just restart the service and hope for the best. | raldi wrote: | The author does log before, just at DEBUG level. In the | scenario you describe, the information you desire would be | available. | SideburnsOfDoom wrote: | In that case, the stack trace from the exception should be able | to tell you that you failed on the line _before_ the "log | after". | | It won't be nearly as convenient to dig that fact out, though. | [deleted] | pugworthy wrote: | That assumes that all fail cases will create an exception | and properly pass on that a failure occurred, with enough | information so that the caller can decide what to do next. | SideburnsOfDoom wrote: | The assumption upfront is that you don't reach the "log | after" point. IMHO, given that, not having an exception | will be rare, and probably involve code that looks "off" | and fixing that is the first task. YMMV though, depending | on language | georgekin wrote: | Another option is to log after, but in a cross-cutting way, | e.g. using a middleware or filter. Extra properties can be | added along the way, and if there's an exception the middleware | can handle it and add the problem to the log event. | | The power of structured logging here is in enriching events to | make them queryable later, so you could e.g.
view all events | for a request. | xtracto wrote: | A friend of mine wrote a similar article back in 2006: | https://nerds.kueski.com/better-logging/ | | We used that as a standard for logging at some point; it has | some pretty good insights. | macfaran wrote: | I always try to think of the log like I think of the UI, i.e. | make it usable with as little knowledge as possible. If the | user/1st line support can use it without calling you that's the | goal. | rhn_mk1 wrote: | > Log after, not before | | This is already weird, what about operations that take a long | time? If the session is interactive, the user should definitely | know that nothing hung up. Or is it out of scope somehow? | raldi wrote: | The author later explains that in such situations, you log | before at DEBUG level, and after at INFO. | namanyayg wrote: | Yeah, it's best to log BOTH before and after. | stinos wrote: | Or if you think both is too much, probably only before? Might | depend on the application but I've seen enough native | application logfiles with "did FOO" as the very last line meaning | the crash happened on the action executed next, but that | wasn't logged because 'log after!'... | rhn_mk1 wrote: | Logging before has the disadvantage of being noisier than | after only, at least if you want to inspect the result of | the operation. | piaste wrote: | I tend to use VERBOSE severity if I'm just logging "we've | reached this line in the code and local variables are set to | $whatever". | | Since you generally only want to see them when diagnosing a | frozen job or something equally obtuse. | SideburnsOfDoom wrote: | For that there is usually some log wrapper that times the | operation, e.g. | using(_log.TimeOperation("Doing the thing")) { | DoTheThing(); } | | So that you get a start message, and an end message with | timing, and if you exited with normal control flow vs. with an | exception. | | Author doesn't seem to have reached that level of best practice | yet.
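A Python equivalent of the `TimeOperation` wrapper mentioned above might look like the sketch below; the name `time_operation` and the level choices (DEBUG for start, INFO for success, ERROR for failure) are illustrative, not from any particular library. Output is captured in a buffer so the behaviour is easy to see.

```python
import io
import logging
import time
from contextlib import contextmanager

log = logging.getLogger("timing-demo")
log.setLevel(logging.DEBUG)
buf = io.StringIO()
log.addHandler(logging.StreamHandler(buf))

@contextmanager
def time_operation(name):
    """Log start at DEBUG, completion with duration at INFO,
    and failure with duration at ERROR if an exception escapes."""
    start = time.perf_counter()
    log.debug("%s: started", name)
    try:
        yield
    except Exception:
        log.error("%s: failed after %.1f ms", name,
                  (time.perf_counter() - start) * 1e3)
        raise
    else:
        log.info("%s: completed in %.1f ms", name,
                 (time.perf_counter() - start) * 1e3)

with time_operation("Doing the thing"):
    time.sleep(0.01)

captured = buf.getvalue()
```

This gives the before-message at DEBUG (for hang diagnosis) and the after-message with timing at INFO, matching the before/after compromise several commenters converge on.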
| m_ke wrote: | + Use structured logging (ex: | https://www.structlog.org/en/stable/why.html) | no_circuit wrote: | Totally agree. One huge advantage not mentioned specifically at | that structured logging link is handling privacy. A logging | pipeline can be set up to have a collector that automatically | redacts privacy-related or sensitive fields. Only give | developers access to redacted logs to help prevent leaks. | xxpor wrote: | Who would have access in the first place besides said devs? | wainstead wrote: | The ops people. In many companies access to production is | limited. | [deleted] | closeparen wrote: | It's important for data to be limited to those with | business need to know, which is why developers (who are | responsible for the application) should have access to | its data while ops (who are responsible for core | infrastructure) must not. | | I would not put an email address in logs, but definitely | some kind of account identifier so that we can | investigate bug reports from specific users. | Spivak wrote: | Structured logging is just tracing but less powerful. "Context" | and "binds" are just poor-man's spans. If you're going to go | through the effort to redo your whole application's logging | infrastructure just bite the bullet and use OpenTelemetry. | | Edit: Good lord people, set the collector to never sample and | use one of the many file exporters and you get completely bog | standard logs but with a huge amount of work already done for | you and enough context to follow complicated multi- | threaded/multi-process programs without making your head spin. | billti wrote: | Indeed. For those unfamiliar, take a quick read of | https://github.com/open-telemetry/opentelemetry- | specificatio... | | There are a few concepts to understand, but instrumenting it | right from the start is valuable as it grows into a bigger | system. Being a cross-cutting concern, the more code that | standardizes on this the more useful it will be. 
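For those not ready to adopt structlog or OpenTelemetry, a bare-bones structured (JSON Lines) setup is possible with the standard `logging` module alone. The `fields` key passed via `extra=` is a convention invented for this sketch, not a stdlib feature.

```python
import io
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line (JSON Lines)."""
    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge in structured parameters passed via extra={"fields": ...}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)

buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("structured-demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("User registered for newsletter",
            extra={"fields": {"user": "Thomas"}})

record = json.loads(buf.getvalue())
```

Keeping the message and the parameters as separate JSON keys is what makes the logs queryable later, which is the main argument for structured logging in this subthread.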
| closeparen wrote: | Maybe it's just a budget pathology, but we sample our traces. | Browsing the available traces can be informative, but you | can't just pull a trace for an arbitrary request. We do keep | and index all the logs (for a few days). | jeffbee wrote: | So much this. What's important to know is that structured | logging is _as expensive_ as tracing. If you are forced into | the cost of a trace span for every message, you won't do much | logging. Maybe this makes sense to Python programmers to whom | a few microseconds here and there don't seem significant, but | to C++ programmers it is pure lunacy. The cost of | informational logging with something like glog or spdlog can | be as little as a few CPU cycles. | CodesInChaos wrote: | I don't see why basic structured logging would be | significantly more expensive or complex than unstructured | logging. | | In the simplest case you could just output a sequence of | values (for example as a csv, or one json array per line), | instead of an interpolated string. The size and CPU | overhead of this approach is minimal, but makes it much | easier and more robust to parse the log afterwards. | | Such basic structured logging is quite similar to what the | article proposes in "Separate parameters and messages", but | more consistent and unambiguous even if one of the arguments | contains your separator character. | handrous wrote: | Correctly and safely serializing to CSV and JSON is more | overhead than spitting out bytes. It's also easier to | spit or ingest bytes as a stream--yes, there are tools | that understand streams of json objects, but you still | have to have all the data in memory at once for a given | record, to serialize it, and then you've basically got it | in memory _twice_ at the same time (original + serialized | text). Your minimum record size to consider becomes | "this entire object" rather than "this field". You've | also now got the code for CSV or JSON serialization in | memory.
| staticassertion wrote: | Serializing is very fast, and ingesting JSON as a stream | is trivial when it's newline-delimited. No one emits a | giant array of json log objects, they emit a stream of | newline-delimited json. | | You almost certainly also have it in memory more than | twice since you're copying a buffer out to IO. | | If you care about record size use protobuf, compression, | backend parsers, etc. | no_circuit wrote: | Unless we're talking about something performance critical | up front like an embedded system, odds are by the time an | organization is going to care about logging performance | overhead, they probably already have a few engineers | dedicated to just optimizing logging. | | Of course some developers may just go ahead and create a | JSON/CSV string from scratch, which is about as safe as | string interpolation instead of parameter bindings for | SQL statements /s. But provided they're using a proper | JSON/CSV/proto object builder library that uses String | references, no data is going to get copied until a | logging component needs to write it to a file or a | network buffer. Therefore the only overhead is the | pointers that form the structure. | CodesInChaos wrote: | You can json or csv encode data in a streaming manner, | you just need to escape some special characters (only " | for csv) and add a separator. But I never worked on an | application where log records were big enough for | streaming to make sense. | | If you want to preserve the "one line = one record" | property typically expected from logs, standard csv will | not be suitable (since it doesn't escape linebreaks), but | json without unnecessary linebreaks will handle that | (known as json-lines). | jeffreygoesto wrote: | Wrong. Structured logging in C++ done right [0] is the most | efficient you can do. Plus you get all mentioned benefits.
| | [0] https://youtu.be/FyJI4Z6jD4w
| IggleSniggle wrote:
| Structured logging is just taking whatever information _you
| already intended to log anyway_ and giving it a consistent
| structure. I.e., instead of
|
| "$TIMESTAMP $LEVEL: Hey look index $INDEX works!"
|
| you have
|
| $TIMESTAMP $LEVEL $FORMATTER $MSG $INDEX
|
| I'm quite intentionally drawing a parallel to `echo` vs `printf`
| here, where printf has an interface that communicates intended
| format while echo does not.
|
| The only overhead of structured logs done right is that you need
| to be explicit about your intended formatting on some level, but
| now with the bonus of reliable structure. Whether you use glog
| or spdlog or whatever is somewhat inconsequential. It's about
| whether you have an essentially unknown format for your data,
| having thrown away the context of whatever you wish to convey,
| or whether you embed the formatting with the string itself so
| that it can later be parsed and understood by more than just a
| human.
|
| If you're concerned about the extra volume of including
| something like "[%H:%M:%S %z] [%n] [%^---%L---%$] [thread %t]
| %v" on every log entry, then you use, e.g. (in the glog
| equivalent for your language), LOG_IF, DLOG, CHECK, or whatever
| additional macros you need for performance.
|
| If I'm wrong or just misunderstanding, please do correct me.
| jeffbee wrote:
| LOG_IF and VLOG, and related concepts from other libraries, have
| their places, but I don't consider not logging to be a solution
| to expensive logging. In fact, that was my whole original point.
|
| With structured logging the outputs may be trivial, in which
| case the compiler can do a good job. But if you are inserting a
| std::string into a log line and your log output format is JSON,
| there's nothing the compiler can do about the fact that the
| string must be scanned for metacharacters and possibly escaped,
| which will be strictly more costly than just copying the string
| into the output.
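IggleSniggle's echo-vs-printf analogy maps onto Python's stdlib `logging`, which already keeps the format string and its parameters separate until emission. A sketch under that assumption -- the `JsonFormatter` class here is my own illustration, not a stdlib or glog API:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    # Keeps the message template and its parameters separate, printf-style,
    # instead of baking them into one opaque echo-style string.
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "msg": record.getMessage(),       # template rendered with args
            "template": record.msg,           # original format string survives
            "args": list(record.args or ()),  # parameters, individually queryable
        })

logger = logging.getLogger("demo")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Args are only interpolated if the record is actually emitted.
logger.info("Hey look index %s works!", 42)
```

Because the raw template survives in `record.msg`, records logged with different parameters still group under one message shape when searched later.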
| bpicolo wrote:
| I think there's room for both.
|
| Tracing is usually sampled, or much more time-limited, and with
| most providers the aggregation tooling isn't nearly as good as
| it is for logs. It's much better for deep dives into a narrow
| context: for a single-request drilldown or aggregated
| performance/error metrics, I'd reach for tracing every time.
|
| Structured logging tends to give you rich search across broader
| contexts (e.g. long spans of time, many requests in aggregate,
| random non-performance/error attributes you may care about)
| without the same concerns about data cardinality.
|
| I've felt nowhere near the same power with tracing tooling for
| random broad context as I've had with e.g. Splunk for logs.
| That's not to say you can't somehow have the best of both
| worlds, but I haven't seen it in the ecosystem just yet.
| bcrosby95 wrote:
| I use WARN for "might be a problem". Only ERROR definitely
| requires action.
|
| An elevated number of WARNs indicates a problem, so this is a
| metric we monitor. And obviously ERRORs, since they mean there
| definitely is a problem.
| radus wrote:
| For python logging specifically, here's an excellent article:
| https://blog.guilatrova.dev/how-to-log-in-python-like-a-pro/
| golergka wrote:
| Great post! It's also the first time I've seen a concise
| explanation of the difference between info and debug levels.
| lbriner wrote:
| + Use correlation ids on more complex applications so you can
| track more than a single failed call into an application.
|
| Getting an error like "this thing fell over because of a number
| parsing error" might not be very useful when you can't recreate
| it.
|
| With correlation, you then realise that the user did something
| specific beforehand, and then the cause is more obvious.
| ishitatsuyuki wrote:
| Short, clear, and pragmatic. This is the kind of post I'd love
| to see every day.
___________________________________________________________________
(page generated 2021-10-05 23:00 UTC)