[HN Gopher] My Logging Best Practices (2020)
       ___________________________________________________________________
        
       My Logging Best Practices (2020)
        
       Author : isaacfrond
       Score  : 254 points
       Date   : 2021-10-05 13:52 UTC (9 hours ago)
        
 (HTM) web link (tuhrig.de)
 (TXT) w3m dump (tuhrig.de)
        
       | yongjik wrote:
       | Also, it goes without saying, but never buffer logs. Each line
       | should end up with a write(2) call or something equivalent. If
       | you buffer logs, and the process crashes, you lose logs exactly
        | when you really really need them.
        
         | Spivak wrote:
         | I agree with you in theory if resources were infinite but there
         | is no way to log at the rates developers actually want to log
         | and have a performant application without buffering.
         | 
         | I think the more practical advice is "buffer logs outside your
         | application" or "buffer logs in a separate thread/process that
         | crashes independently of your application code."
         | 
         | Even the small places I've worked could easily push 250G logs a
         | day.
        
           | yongjik wrote:
            | I guess "buffering in a separate process" can be handy, esp.
            | if you have strict performance requirements, though using
            | shared memory correctly can be tricky...?
           | 
           | (Without shared memory, I imagine you'll have one system call
           | per log anyway - in which case it's not much different from
           | calling one write() per line.)
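The "buffer in a separate thread" variant Spivak describes needs no shared memory in higher-level runtimes. A minimal sketch using Python's stdlib `logging` (logger names here are illustrative): application threads only pay for an in-memory enqueue, while a dedicated listener thread performs the blocking writes.

```python
import logging
import logging.handlers
import queue

# Rough sketch of "buffer logs in a separate thread": application code
# only pays for an in-memory enqueue; a listener thread performs the
# actual blocking writes to the destination handler.
log_queue = queue.Queue(-1)  # unbounded in-memory buffer

log = logging.getLogger("app")
log.setLevel(logging.INFO)
log.addHandler(logging.handlers.QueueHandler(log_queue))

# The listener drains the queue on its own thread and forwards each
# record to the real handler (stderr here; could be a file or socket).
listener = logging.handlers.QueueListener(log_queue, logging.StreamHandler())
listener.start()

log.info("connected to DB")
listener.stop()  # joins the thread and flushes remaining records
```

Note this trades away the crash-safety yongjik wants: records still sitting in the queue when the process dies are lost, which is exactly why the separate-process (or syslog) variants come up in this subthread.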
        
       | TameAntelope wrote:
       | I love these suggestions, but what is most important is that you
       | make these choices (or others) and stick to them.
       | 
       | Consistency in how you log is probably Best Practice #0, IMO.
        
       | adrianmsmith wrote:
       | > Log after, not before
       | 
       | My problem with that is what if the program crashes or produces
       | an error. It's good to know what the program was attempting to
       | do.
       | 
        | Imagine the output
        | 
        |     Successfully connected to DB.
        |     Successfully read config file.
        |     Uncaught Exception: NullPointerException
        | 
        | vs.
        | 
        |     Will connect to DB...
        |     Will read config file...
        |     Will connect to API server...
        |     Uncaught Exception: NullPointerException
        
         | jlokier wrote:
         | I've found the most useful log before/after pattern is a
         | little-known dynamic variant.
         | 
          | Imagine this success output, at a low (not tracing) log level:
          | 
          |     Opened DB connection
          | 
          | Error output _at the same logging level_:
          | 
          |     Opening DB connection
          |       Opening TCP connection to 127.0.0.1:5678
          |         Error: No more file descriptors
          |       Error: Failed opening TCP connection
          |     Error: Failed opening DB connection
          | 
          | Crash output _at the same logging level_:
          | 
          |     Opening DB connection
          |       Opening TCP connection to 127.0.0.1:5678
          |         Starting TLS negotiation
          |     Fatal: Segmentation fault
          |     <stacktrace...>
          | 
          | Success output with tracing turned up and no error:
          | 
          |     Opening DB connection
          |       Opened TCP connection to 127.0.0.1:5678
          |       TLS negotiation completed in 0.1ms
          |       DB version 5.1 features A, B, C
          |     Opened DB connection
         | 
         | The key feature here is that there's a scoped before/after
         | message pair ("Opening DB connection" and "Opened DB
         | connection"), and the before part of the pair is output _if and
         | only if_ any messages are output in between.
         | 
         | The contour around messages in between is visible (here with
         | indentation), and the before message is _not_ shown if there
          | are no intermediate messages. Sometimes it's useful to show a
         | different after-message if the before/after pair is not output,
         | to include a small amount of information in the single message,
         | or just to improve the language in the quieter case. Sometimes
         | it's useful to suppress the after-message just like the before-
         | message, if there is nothing in between.
         | 
         | Anything that might terminate the program, such as the
         | Segmentation fault with stacktrace, counts as an intermediate
         | message, and so triggers the output of all the before messages
         | as part of the termination message logging. For this to work,
         | of course you must have some way for the before messages to be
         | queued that is available to the crash handler.
         | 
          | This technique is not widely known, which is a shame, because
          | in my opinion it's very helpful.
         | 
         | Its chief benefit isn't saving a few messages, it's that it
         | _scales_ to complex, deeply called systems with heavy amounts
         | of tracepoints encouraged throughout the code. When studying
         | issues, you may turn on tracing for some components and
         | actions. As you turn on tracing for a particular component,
         | before-messages from _caller_ components become activated
          | dynamically, ensuring there's always context about what's
         | happening.
         | 
         | In case you're wondering, it works fine with multiple threads,
         | async contexts, and even across network services, if you have
         | the luxury of being able to implement your choice of logging
         | across services, but of course there needs to be some context
         | id on each line to disambiguate.
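A toy sketch of the deferred before-message idea jlokier describes, in Python (`Scope` and `log` are illustrative names of mine, not any real library): the before-message is queued on scope entry and printed only if something is logged inside its scope; on exit, the after-message (or a failure variant) is printed, and an exception counts as an intermediate message.

```python
# Toy sketch of the scoped before/after pattern described above.
_pending = []   # queued before-messages, innermost last
_emitted = 0    # how many of the queued messages have been printed

def log(msg):
    """Print msg, first flushing any queued before-messages."""
    global _emitted
    while _emitted < len(_pending):
        print("  " * _emitted + _pending[_emitted])
        _emitted += 1
    print("  " * len(_pending) + msg)

class Scope:
    def __init__(self, before, after, failed=None):
        self.before, self.after, self.failed = before, after, failed

    def __enter__(self):
        _pending.append(self.before)
        return self

    def __exit__(self, exc_type, exc, tb):
        global _emitted
        if exc_type is not None:
            # An exception counts as an intermediate message: flush the
            # queued before-messages so there's context for the failure.
            while _emitted < len(_pending):
                print("  " * _emitted + _pending[_emitted])
                _emitted += 1
        _pending.pop()
        _emitted = min(_emitted, len(_pending))
        msg = self.failed if (exc_type and self.failed) else self.after
        print("  " * len(_pending) + msg)
        return False  # never swallow the exception

with Scope("Opening DB connection", "Opened DB connection"):
    pass  # quiet success: only "Opened DB connection" is printed
```

A real implementation would route through a logging framework rather than `print`, and carry the context ids jlokier mentions, but the enable-on-demand behaviour is the same.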
        
         | spaetzleesser wrote:
         | Agreed. Logging before gives you a lot of information to
         | diagnose an issue. Logging just exceptions usually lacks a lot
          | of necessary information.
        
           | goohle wrote:
           | Just print out all necessary information, in case of
           | exception, and suggest a potential solution to the problem.
        
             | slaymaker1907 wrote:
             | This does not work in practice a lot of the time. What if
             | the problem was actually due to some other thread and said
             | other thread didn't crash? If you only log after
             | success/failure, you would have absolutely no idea why the
             | program crashed. Oftentimes if the actual error were
             | anticipated enough for good reporting, said error would not
             | exist in the first place.
        
         | AlexCoventry wrote:
         | Would it make sense to log a traceback, on unexpected errors?
        
         | jayknight wrote:
         | I think he is assuming that the logged exception will include
         | the details needed to track down the error (line number, stack
         | trace)
        
           | bobbyi wrote:
           | Yes. I think logging before is still useful, but it is for
           | hangs, not exceptions.
           | 
           | If your program disappears into a black hole doing some
           | operation that doesn't have (and needs) a timeout specified,
           | you want to know what it was about to do when you last heard
           | from it.
        
           | adrianmsmith wrote:
           | Of course in an ideal scenario that's the case. Nevertheless
           | programs do sometimes just "crash" without an error message,
           | or maybe they hang and never produce any output due to an
           | infinite loop, or excessive memory allocation and GC
           | overhead, etc. Logging what the program was attempting to do
           | is useful in those scenarios.
        
         | raldi wrote:
          | That's what the DEBUG level is for, as explained at the end of
          | the section you're referring to. So it would be like:
          | 
          |     D: Connecting to DB
          |     D: Reading config file
          |     D: Connecting to API server
          |     D: Sending updated record
          |     I: Updated user record for $USER
          | 
          | Or, if it fails, the last line is replaced by:
          | 
          |     E: Failed to connect to API server; could not update user
          |        record. [IP=..., ErrNo=..]
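In code, the pattern raldi describes might look like this Python sketch (the functions `connect_api` and `update_record` are hypothetical stand-ins, not from the article): before-messages at DEBUG, one ERROR line with context on failure, one INFO summary on success.

```python
import logging

log = logging.getLogger("app")

def connect_api(ip):            # hypothetical stand-in
    return object()

def update_record(conn, user):  # hypothetical stand-in
    pass

def update_user(user, api_ip):
    # Before-messages at DEBUG: cheap context for hangs and crashes.
    log.debug("Connecting to API server [ip=%s]", api_ip)
    try:
        conn = connect_api(api_ip)
        log.debug("Sending updated record")
        update_record(conn, user)
    except OSError as err:
        # On failure, a single ERROR line carries the context.
        log.error("Failed to connect to API server; could not update "
                  "user record. [ip=%s, err=%s]", api_ip, err)
        raise
    # On success, a single INFO summary.
    log.info("Updated user record for %s", user)
```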
        
           | mirekrusin wrote:
           | Debug level is not for production, it's for debugging.
        
           | yongjik wrote:
           | Personally I'm not a fan of DEBUG level (unless you always
           | keep it on). You can have a transient system issue that
           | appeared for 30 minutes and then went away. If DEBUG logs
           | weren't turned on at the time, they won't help me.
        
             | Jowsey wrote:
             | Why not have the program write logs to a temp file
             | somewhere at trace/debug level, and only output the
             | standard level to the console?
        
               | Cicero22 wrote:
               | Wouldn't that end up slowing down the whole system from
               | writing to a file though?
        
               | Jowsey wrote:
               | Ah, suppose it depends on how much you're doing and/or
               | how verbose your logs are. If you've got lines speeding
               | past then yeah I maybe wouldn't write them all.
        
               | grey-area wrote:
               | Writing verbose logs is very expensive.
        
               | worik wrote:
               | It depends which costs what.
               | 
               | In a case where there are plenty of resources and the
               | thing you are logging is very heavy then even verbose
               | logging is trivial.
               | 
               | If you are beset by failures it is much cheaper than
               | making your developers guess.
        
             | raldi wrote:
             | Of course you always keep it on -- the point of the
             | different levels is that you can filter them out at _read_
             | time (e.g. to eyeball logs, or to alert when the rate of
             | >= WARN is too high)
        
               | codemac wrote:
               | This traditionally has not been log level's use case.
               | 
               | syslog() and setlogmask() being the obvious examples.
               | 
                | Many (if not all) companies I have worked at filtered
               | out everything DEBUG at compile time to improve
               | performance in production as well.
               | 
               | The best option I've seen is putting everything in DEBUG
               | or lower into a memory ring buffer per process that's
               | dumped and collated with other logs at each ERROR/FATAL
               | log.
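Python's stdlib ships a simplified version of this idea: `logging.handlers.MemoryHandler` holds records in memory and flushes them all to a target handler when a record at or above a trigger level arrives. (Unlike a true ring buffer it flushes rather than discards when full, so size the capacity generously.)

```python
import logging
import logging.handlers

# DEBUG records are held in memory; when an ERROR (or worse) arrives,
# the whole buffer - debug context included - is dumped to the target.
target = logging.StreamHandler()
memory = logging.handlers.MemoryHandler(
    capacity=1000,             # flushes when full, so size it generously
    flushLevel=logging.ERROR,  # the trigger level
    target=target,
)

log = logging.getLogger("buffered")
log.setLevel(logging.DEBUG)
log.addHandler(memory)

log.debug("opening socket")      # buffered, nothing written yet
log.error("connection refused")  # flushes buffer; debug line comes too
```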
        
               | hurflmurfl wrote:
               | I've actually used this idea in production and it worked
               | great.
               | 
               | In my web app if nothing unexpected happens, only
               | INFO/LOG level stuff is pushed to logs. If, however, I
               | get an exception, I will instead print out all the
               | possible traces. I.e. I always store all the logs in-
               | memory and choose what should be printed out based on the
               | success/failure of the request.
               | 
               | Now, of course, this is just a web API running in AWS
               | Lambda, and I don't have to care overly much about the
               | machine just silently dying mid-request, so this might
               | not work for some of you, but this works great for my
               | use-case and I'm sure it will be enough for a lot of
               | people out there.
        
               | spockz wrote:
               | That is actually quite nifty. That way you can carry
               | debug logs and only materialise/send them in case of
               | errors when you are likely interested in it without
               | having to put those logs at error and see them all the
               | time etc.
               | 
               | Does anyone know of an implementation for this in any of
               | the Java logging frameworks?
        
               | masklinn wrote:
               | Fwiw sentry does that out of the box, anything below
               | warning (I think) is used as context for "actual events".
        
               | snak wrote:
               | Performance degradation is noticeable with DEBUG level in
               | production.
        
               | raldi wrote:
               | Performance degradation is noticeable with excessive
               | logging; the level is irrelevant.
        
               | mplewis wrote:
               | Leaving debug levels on in production will bankrupt most
               | companies.
        
               | raldi wrote:
               | Addressed here:
               | https://news.ycombinator.com/item?id=28764834
        
           | dezzeus wrote:
           | Even though it can be lost by an optimistic configuration, I
           | prefer to use the TRACE level(s) and leave the DEBUG one for,
           | well, debugging.
        
             | raldi wrote:
             | We're discussing a NullPointerException situation; how is
             | that not debugging?
        
               | Leherenn wrote:
               | I think OP's case was not debugging a crash, just that if
               | a crash happens he would like the log to be before rather
               | than after. The log statements would be there no matter
               | whether there is a crash or not, just in case.
        
         | BrandoElFollito wrote:
         | I prefer to have both information (the actual logs would be
         | structured)                   Connecting to DB
         | Connected to DB - OK         Reading config file         Read
         | config file - OK
         | 
         | Even better, but hardly doable because of the nature of logging
         | Connecting to DB ... OK         Reading config file ... OK
        
           | kulikalov wrote:
           | Performance though?
        
             | worik wrote:
             | Diagnosis of failure?
        
       | whatupmd wrote:
       | Users sure do appreciate people who put love into their logging.
       | 
        | Remember that just because you love to hear what your application
        | is doing, it's not always helpful or useful to your user. Give
       | consideration to the Rule of Silence
       | (http://www.linfo.org/rule_of_silence.html) and even if you do
       | not agree with it remember that some of your users might.
        
       | [deleted]
        
       | ryandrake wrote:
        | One thing I noticed in one of the article's examples is that the
        | log contained an example user's PII:
        | 
        |     INFO  | User registered for newsletter. [user="Thomas",
        |             email="thomas@tuhrig.de"]
       | 
       | Ruh-roh! You now have a potential privacy incident ready to
       | occur, if an inadvertent log leak ever happens. I think, at least
       | these days, one vital component of any logging API is the ability
       | to flag particular bits of information as PII so that it can get
       | handled differently upon output, storage, or ingestion by other
        | tools. For example, if the user's actual e-mail address is not
       | specifically useful for debugging, do you need to display it in
       | the default visualization of the log?
       | 
       | I think in 2020, with companies' privacy practices getting more
       | and more scrutiny, any logging best practices doc should at least
       | mention treatment of PII.
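One lightweight way to get the "handled differently upon output" behaviour is a filter attached in front of every handler. This Python sketch (deliberately crude and illustrative, not a complete PII solution) masks anything email-shaped before a record is written:

```python
import logging
import re

# Crude, illustrative sketch: mask email-shaped strings in formatted
# messages before any handler writes them. A real system would redact
# flagged fields at the structured-logging layer instead.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

class EmailRedactingFilter(logging.Filter):
    def filter(self, record):
        # Fold any %-args into the message, then redact the result.
        record.msg = EMAIL_RE.sub("<redacted>", record.getMessage())
        record.args = ()  # args are already folded into msg above
        return True

handler = logging.StreamHandler()
handler.addFilter(EmailRedactingFilter())
log = logging.getLogger("signup")
log.setLevel(logging.INFO)
log.addHandler(handler)

log.info('User registered for newsletter. [user="Thomas", email="%s"]',
         "thomas@tuhrig.de")
```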
        
         | tobbebex wrote:
         | I used to work with a medical software where we turned off
         | logging completely in production, to mitigate the risk of
         | accidentally log patient information.
         | 
         | We needed to log that kind of data during development and just
         | tagging what messages were sensitive would have been too error
          | prone, so we couldn't reliably have any logs once we went live.
         | 
         | We did however have a special crash handler that logged the
         | exception type and callstack, but not even the exception
         | message could touch the harddrive.
        
         | slaymaker1907 wrote:
         | What I find works best is to avoid logging raw datatypes and
         | instead limit it to objects/structs with custom toString. Then
         | just make sure any PII fields are not included with toString
         | either through some sort of annotation or by writing the
         | implementation manually. This also encourages minimizing the
         | code that actually interacts with PII such as replacing emails
         | with user IDs ASAP.
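In Python terms, the equivalent of a PII-excluding `toString` is a `repr` that skips those fields, e.g. via `dataclasses.field(repr=False)` (the `User` type here is a hypothetical example):

```python
from dataclasses import dataclass, field

# The log-facing representation only ever includes the non-PII fields;
# logging the object "raw" is then safe by default.
@dataclass
class User:
    user_id: int
    name: str = field(repr=False)   # PII: excluded from repr and logs
    email: str = field(repr=False)  # PII: excluded from repr and logs

u = User(42, "Thomas", "thomas@tuhrig.de")
print(f"User registered for newsletter. [{u!r}]")
# Only the ID reaches the log line: User(user_id=42)
```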
        
         | mholt wrote:
         | Ah yeah, that's why in Caddy 2 while we do emit full details
         | from HTTP requests in our structured JSON logs, you have the
         | ability to traverse into each log message and replace or delete
         | data within it:
         | https://caddyserver.com/docs/caddyfile/directives/log#filter
        
         | kordlessagain wrote:
         | This reminds me of the time a developer turned on debugging to
         | fix some credit card thing and was sending us credit card
         | numbers in their logs.
        
         | pugworthy wrote:
         | Agreed.
         | 
         | I found out a few years back that a large industrial device our
         | company made was recording all users passwords due to a logging
         | statement that was blindly just logging every piece of user
         | input via UI.
        
       | cloverich wrote:
       | Don't forget exit logging[1], I am a bit ashamed how far into my
       | career I was before understanding this concept.
       | 
       | [1]: https://pinecoder.dev/blog/2020-04-27/node-exit-logging
        
       | boffinism wrote:
       | "Best Practices" really is synonymous with "Personal Preferences"
       | if it can work in headlines like this, isn't it?
        
         | the_gipsy wrote:
         | The author does make some effort to explain why.
        
           | SideburnsOfDoom wrote:
           | That author however has some reading to do around structured
           | logging and related practices, which might raise their
           | personal "best".
        
         | kleiba wrote:
         | "my" + "best practices" = "personal preferences"
        
           | worik wrote:
           | "my" + "personal preferences" = "best practices"
        
       | jonnycomputer wrote:
       | I like this post, and I think the recommendations are sound. I'm
       | not quite sold on the technical vs business logic distinction
       | between debug and info. I use debug for all those messages I only
       | want to see if something has gone wrong, and info for all those I
        | want to see to know it's working. That _might_ fall along the
       | business vs technical distinction, but it might not.
        
       | camel_gopher wrote:
        |     } catch(e: UnauthorizedException) {
        |         log.warn("Request to REST API was rejected")
        | 
        | Don't throw exceptions for auth failures.
        
       | rond2911 wrote:
       | I was expecting way too much in this article. But, I don't see
       | anything about logging requestId's etc.
       | 
       | Sharing a few things briefly below I have learned over the past
       | few years -
       | 
        | 1. Log every single message for a transaction/request with the
        | requestId to look at the entire picture. Ideally, that requestId
        | would be used upstream/downstream too internally for finding the
        | exact point of failure in a sea of microservices.
        | 
        | 2. Log all essential IDs/PKs for that particular transaction - it
        | makes debugging easier.
        | 
        | 3. Log all the messages in a JSON format, if the log aggregator
        | you are using supports that. Parsing later for dashboards makes
        | lives WAY EASIER and is more readable too. Might also reduce
        | overall computations being run on the cloud to extract values
        | you want -> hence, lesser cost.
        | 
        | 4. Having error/warn logs is good, but having success/positive
        | logs is equally important imo.
        | 
        | 5. Oh, and be very very careful about what you end up logging. We
        | don't want to log any sort of user PII. At one of my previous
        | companies, there were mistakes made where we were logging users'
        | phone numbers, addresses etc.
       | 
       | I am sure the community here already knows about these, and might
       | have even more to add. Would love to hear what other people are
       | doing.
       | 
       | Cheers.
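Point 3 (JSON log lines) needs nothing beyond the stdlib in Python. A sketch of a one-object-per-line formatter, where extra fields such as a requestId are passed via `extra=` (field names here are illustrative):

```python
import json
import logging

# One JSON object per line; fields passed via `extra=` surface as
# top-level keys so the aggregator can index them without parsing.
class JsonFormatter(logging.Formatter):
    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if hasattr(record, "request_id"):
            payload["request_id"] = record.request_id
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("svc")
log.setLevel(logging.INFO)
log.addHandler(handler)

log.info("payment processed", extra={"request_id": "req-123"})
```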
        
       | slaymaker1907 wrote:
       | I completely disagree with Thomas' first point. It is better IMO
       | to always log before you do something so that if you crash during
       | whatever was happening, you know what you were trying to do.
       | Additionally, if you have enough logging, you can figure out that
       | some call did succeed by implication so long as you have access
       | to the source code/know what the order of high level operations
       | are.
       | 
       | If for some reason you can't easily figure out whether something
       | succeeded, you should probably log before and after the
       | operation. I personally try to avoid doing this since it can
       | clutter up the log, but it can be useful depending on how code is
       | structured.
       | 
        | Another minor point is that I prefer to have INFO cram in as much
        | information as possible, with DEBUG reserved for stuff that is
        | ignored unless you explicitly turn it on, because its cost
        | outweighs its utility.
        
       | gwbas1c wrote:
       | Don't SPAM the logs.
       | 
       | You don't need to log every method call, every variable, every
        | "if" statement, etc. That's what the debugger is for.
       | 
       | And, I shouldn't need to say it: Don't log passwords or other
       | sensitive information.
       | 
        | Also: Make sure your logger is configured correctly. Log4Net used
        | to (and it still might) calculate an entire stack trace with each
        | call to the logger. It can be the single biggest consumer of CPU
        | in an application, even if you aren't actually logging stack
        | traces.
        
         | TameAntelope wrote:
         | If you're using a tool like Splunk, verbosity of logging is
         | usually not much of a problem.
        
         | albertgoeswoof wrote:
         | But if I don't log everything, why did I build a massive ELK
         | stack for my big data logs
        
       | tombert wrote:
       | I'm surprised there's no mention of tagging with some kind of
       | correlation ID. I find that even generating a quick one-off GUID
       | at the beginning of the request (to be used across the entire
       | request) makes tracing server logs substantially easier. I just
       | tag every log (and exception) with that ID, so when searching
       | with grep or Splunk or whatever, I have a quick and easy way to
       | disambiguate which log is connected to other logs.
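One Python sketch of the one-ID-per-request idea: a `contextvars.ContextVar` holds the current request's ID, and a `logging.Filter` stamps it onto every record so it appears on every line without being threaded through call signatures:

```python
import contextvars
import logging
import uuid

# The contextvar is isolated per thread / async task, so concurrent
# requests each see their own ID.
request_id = contextvars.ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    def filter(self, record):
        record.request_id = request_id.get()
        return True

handler = logging.StreamHandler()
handler.addFilter(RequestIdFilter())
handler.setFormatter(logging.Formatter("[%(request_id)s] %(message)s"))

log = logging.getLogger("api")
log.setLevel(logging.INFO)
log.addHandler(handler)

def handle_request():
    request_id.set(uuid.uuid4().hex)  # one ID for the whole request
    log.info("fetching user record")
    log.info("updating user record")  # same ID, trivially grep-able
```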
        
         | wainstead wrote:
          | And there's a standard for it: https://www.w3.org/TR/trace-context/
         | 
         | Many libraries and tools support the standard now.
        
           | tombert wrote:
           | And now I feel embarrassed for not knowing about
           | this...Thanks for the link!
        
             | wainstead wrote:
             | Don't, I only learned about it recently, and the standard
             | itself is quite young!
        
         | kerblang wrote:
         | Just an opinion, but I tend to think of this as distributed
         | monolithic development... the more log correlation is needed,
         | the more likely a requirement has been chopped up and then
         | duct-taped back together over HTTP.
        
           | jniedrauer wrote:
           | What about distributed systems? I had to reach for this tool
           | recently to trace requests as they propagate through up to 5
           | separate devices. Not everything is a web app.
        
           | NovemberWhiskey wrote:
           | Not sure I follow that logic?
           | 
           | If you have a heavily used service, it's likely to have many
           | concurrent requests logging into the same log stream and you
           | absolutely need a request correlation identifier to be able
           | to thread the log statements for a single request back
           | together.
           | 
           | That's essential even if you don't actually call any other
           | services as part of your implementation.
        
       | comeonseriously wrote:
       | > Log after, not before
       | 
       | I'd say this is good for production. During development, I
       | sometimes like to log the name of the function I'm in as the
       | first line of the function. That way I'm assured to know if the
       | function was even called. I will also put before and after logs
       | if I want to see how data is changed (Yeah, there are things
       | called debuggers that let you watch variables, but sometimes a
       | log is just so much easier - such as in heavily threaded code or
       | code with a number of timers and events firing).
        
         | raldi wrote:
         | That's why the author wrote:
         | 
         |  _> I make exceptions for DEBUG._
        
           | jonnycomputer wrote:
           | Yes, but I think the author could have expanded on that tbh.
        
           | 8008135 wrote:
           | But that makes me engage my monkey brain to decide what level
           | and where to put the message. Easier to have a consistent
           | rule that all logging is before/after (I'm strongly in the
           | before camp).
        
       | progx wrote:
        | In asynchronous apps it could be better to log both before and
        | after. You don't know what else will be logged in the meantime.
       | 
       | With info and success level you could log info before and success
       | after execution.
       | 
       | In dev mode you show info, in production you could filter
       | success.
        
       | chriswarbo wrote:
       | Will Sargent did a nice talk on logging at Scalacon. My biggest
       | takeaway was that pre-choosing a 'log level' for what to skip is
       | making the decision too early (it's hard to guess what messages
       | we'll want), and too coarse-grained (we don't want verbose
       | messages when we follow the happy-path, but tricky/exceptional
       | paths may want more logging).
       | 
       | His suggestion is to accumulate verbose messages in memory/disk,
       | so we can _later_ decide whether to dump them out (e.g. in a top-
       | level error handler).
       | 
       | The talk suggested an SQLite DB, with dynamically-configurable
       | filters for what to keep, etc. but that's a step beyond ;)
        
         | worik wrote:
         | I really do not think that logs belong in any sort of database.
         | Perhaps there is something I have not thought of?
         | 
         | Structured logs (of course structured) in a simple text file.
         | Rotate the files if your antiquated system cannot handle big
          | files or if your wildly successful cryptocurrency scam
          | generates many many gigabytes per time period.....
         | 
         | `grep` is your friend!
        
         | adrianmsmith wrote:
         | > His suggestion is to accumulate verbose messages in
         | memory/disk, so we can later decide whether to dump them out
         | (e.g. in a top-level error handler).
         | 
         | Interesting; I've definitely seen that with unit test
         | frameworks. Where in addition to assertEquals(...) there was an
         | annotate(...) method which just took a string, and was stored
         | for that unit test. If the unit test eventually failed, all the
         | previous annotations for that unit test were printed, otherwise
         | they weren't.
         | 
         | https://metacpan.org/pod/Test::Unit::TestCase#annotate-(MESS...
        
       | bjt2n3904 wrote:
       | Another thing to think about is how the log looks on info, vs on
       | warn, and debug.
       | 
       | In production, if you set to warn only, will you have enough
       | data? If you set to info, will you have too much?
        
       | spaetzleesser wrote:
        | I totally disagree with the "log after". I have seen it multiple
        | times where the log ended but it wasn't possible to tell in
        | which code section, because some call had thrown an exception. I
        | much prefer to see a log with lines like "attempting to do x
        | with parameters a,b,c".
       | 
        | In general I set the bar for "too much logging" very high. It's
        | a better problem to have too many logs to store and sift through
        | than to have no data.
        
         | ziml77 wrote:
         | I've also seen issues with logging after.
         | 
         | It feels more natural to log when something is complete instead
         | of constantly saying when you're starting to do something, but
         | it leads to issues where you have no idea what the system was
         | actually doing when it failed. It's very easy to have the
         | process die without logging what made it die, especially if
         | you're using a language that doesn't provide stack traces or
         | you have them turned off for performance.
         | 
         | Logging before an action also means it's possible to know
         | what's going on when a process hangs. It might lead me to where
         | I should look to resolve the problem instead of forcing me to
         | just restart the service and hope for the best.
        
         | raldi wrote:
         | The author does log before, just at DEBUG level. In the
         | scenario you describe, the information you desire would be
         | available.
        
         | SideburnsOfDoom wrote:
         | In that case, the stack trace from the exception should be able
         | to tell you that you failed on the line _before_ the "log
         | after".
         | 
         | It won't be nearly as convenient to dig that fact out, though.
        
           | [deleted]
        
           | pugworthy wrote:
           | That assumes that the all fail cases will create an exception
           | and properly pass on that a failure occurred, with enough
           | information so that the caller can decide what to do next.
        
             | SideburnsOfDoom wrote:
             | The assumption upfront is that you don't reach the "log
             | after" point. IMHO, given that, not having an exception
             | will be rare, and probably involve code that looks "off"
             | and fixing that is the first task. YMMV though, depending
             | on language
        
         | georgekin wrote:
         | Another option is to log after, but in a cross cutting way,
         | e.g. using a middleware or filter. Extra properties can be
         | added along the way, and if there's an exception the middleware
         | can handle it and add the problem to the log event.
         | 
         | The power of structured logging here is in enriching events to
         | make them queryable later, so you could e.g. view all events
         | for a request.
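As an illustration of the cross-cutting "log after" idea, here is a sketch in Python's stdlib logging; the decorator and all names are invented for the example, not taken from the article:

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("app")

def log_after(operation):
    """Cross-cutting 'log after' wrapper: one log event per call,
    enriched with duration and, on failure, the exception."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                result = func(*args, **kwargs)
                log.info("%s succeeded duration_ms=%.1f",
                         operation, (time.monotonic() - start) * 1000)
                return result
            except Exception as exc:
                # The wrapper, not the business code, adds the problem
                # to the log event.
                log.error("%s failed duration_ms=%.1f error=%r",
                          operation, (time.monotonic() - start) * 1000, exc)
                raise
        return wrapper
    return decorator

@log_after("create_user")
def create_user(name):
    return {"name": name}

create_user("alice")
```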
        
       | xtracto wrote:
       | A friend of mine wrote a similar article back in 2006:
       | https://nerds.kueski.com/better-logging/
       | 
       | We used that as a standard for logging at some point; it has
       | some pretty good insights.
        
       | macfaran wrote:
       | I always try to think of the log like I think of the UI, i.e.
       | make it usable with as little knowledge as possible. If the
       | user/1st line support can use it without calling you that's the
       | goal.
        
       | rhn_mk1 wrote:
       | > Log after, not before
       | 
       | This is already weird; what about operations that take a long
       | time? If the session is interactive, the user should definitely
       | be able to tell that nothing has hung. Or is it out of scope
       | somehow?
        
         | raldi wrote:
         | The author later explains that in such situations, you log
         | before at DEBUG level, and after at INFO.
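A minimal sketch of that split, using Python's stdlib logging (the function and messages are illustrative):

```python
import logging

logging.basicConfig(level=logging.DEBUG,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("app")

def import_file(path):
    # Before: DEBUG, so it's there when diagnosing a hang,
    # but filtered out at the usual INFO threshold in production.
    log.debug("Importing file %s", path)
    record_count = 42  # stand-in for the real work
    # After: INFO, stating what actually happened.
    log.info("Imported file %s records=%d", path, record_count)
    return record_count

import_file("data.csv")
```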
        
         | namanyayg wrote:
         | Yeah, it's best to log BOTH before and after.
        
           | stinos wrote:
            | Or if you think both is too much, probably only before?
            | Might depend on the application, but I've seen enough
            | native application logfiles with "did FOO" as the very
            | last line, meaning the crash happened on the action
            | executed next, but that wasn't logged because 'log
            | after!'...
        
             | rhn_mk1 wrote:
             | Logging before has the disadvantage of being noisier than
             | after only, at least if you want to inspect the result of
             | the operation.
        
         | piaste wrote:
         | I tend to use VERBOSE severity if I'm just logging "we've
         | reached this line in the code and local variables are set to
         | $whatever".
         | 
         | Since you generally only want to see them when diagnosing a
         | frozen job or something equally obtuse.
        
         | SideburnsOfDoom wrote:
         | For that there is usually some log wrapper that times the
         | operation, e.g.
         | 
         |     using (_log.TimeOperation("Doing the thing"))
         |     {
         |         DoTheThing();
         |     }
         | 
         | So that you get a start message, and an end message with
         | timing, and whether you exited with normal control flow or
         | with an exception.
         | 
         | Author doesn't seem to have reached that level of best
         | practice yet.
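A rough Python equivalent of such a wrapper, as a context manager (the name time_operation is invented for the example):

```python
import contextlib
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("app")

@contextlib.contextmanager
def time_operation(name):
    """Log a start message, and an end message with duration,
    noting whether we left normally or via an exception."""
    start = time.monotonic()
    log.info("%s: started", name)
    try:
        yield
    except Exception:
        log.exception("%s: failed after %.1f ms", name,
                      (time.monotonic() - start) * 1000)
        raise
    else:
        log.info("%s: completed in %.1f ms", name,
                 (time.monotonic() - start) * 1000)

with time_operation("Doing the thing"):
    sum(range(1000))
```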
        
       | m_ke wrote:
       | + Use structured logging (ex:
       | https://www.structlog.org/en/stable/why.html)
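For illustration, the core idea can be sketched without any library as one JSON object per line; the helper and field names here are invented, not structlog's API:

```python
import json
import sys
from datetime import datetime, timezone

def log_event(event, **fields):
    """Emit one JSON object per line: a message plus named,
    queryable key/value pairs instead of an interpolated string."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "event": event,
        **fields,
    }
    line = json.dumps(record)
    sys.stdout.write(line + "\n")
    return line

log_event("user_logged_in", user_id=4711, transaction_id="abc-123")
```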
        
         | no_circuit wrote:
         | Totally agree. One huge advantage not mentioned specifically at
         | that structured logging link is handling privacy. A logging
         | pipeline can be set up to have a collector that automatically
         | redacts privacy-related or sensitive fields. Only give
         | developers access to redacted logs to help prevent leaks.
        
           | xxpor wrote:
           | Who would have access in the first place besides said devs?
        
             | wainstead wrote:
             | The ops people. In many companies access to production is
             | limited.
        
               | [deleted]
        
               | closeparen wrote:
               | It's important for data to be limited to those with
               | business need to know, which is why developers (who are
               | responsible for the application) should have access to
               | its data while ops (who are responsible for core
               | infrastructure) must not.
               | 
               | I would not put an email address in logs, but definitely
               | some kind of account identifier so that we can
               | investigate bug reports from specific users.
        
         | Spivak wrote:
         | Structured logging is just tracing but less powerful. "Context"
         | and "binds" are just poor-man's spans. If you're going to go
         | through the effort to redo your whole application's logging
         | infrastructure just bite the bullet and use OpenTelemetry.
         | 
         | Edit: Good lord people, set the collector to never sample and
         | use one of the many file exporters and you get completely bog
         | standard logs but with a huge amount of work already done for
         | you and enough context to follow complicated multi-
         | threaded/multi-process programs without making your head spin.
        
           | billti wrote:
           | Indeed. For those unfamiliar, take a quick read of
           | https://github.com/open-telemetry/opentelemetry-
           | specificatio...
           | 
           | There are a few concepts to understand, but instrumenting it
           | right from the start is valuable as it grows into a bigger
           | system. Being a cross-cutting concern, the more code that
           | standardizes on this the more useful it will be.
        
           | closeparen wrote:
           | Maybe it's just a budget pathology, but we sample our traces.
           | Browsing the available traces can be informative, but you
           | can't just pull a trace for an arbitrary request. We do keep
           | and index all the logs (for a few days).
        
           | jeffbee wrote:
           | So much this. What's important to know is that structured
           | logging is _as expensive_ as tracing. If you are forced into
           | the cost of a trace span for every message, you won't do much
           | logging. Maybe this makes sense to Python programmers to whom
           | a few microseconds here and there don't seem significant, but
           | to C++ programmers it is pure lunacy. The cost of
           | informational logging with something like glog or spdlog can
           | be as little as a few CPU cycles.
        
              | CodesInChaos wrote:
              | I don't see why basic structured logging would be
              | significantly more expensive or complex than
              | unstructured logging.
             | 
             | In the simplest case you could just output a sequence of
             | values (for example as a csv, or one json array per line),
             | instead of an interpolated string. The size and CPU
             | overhead of this approach is minimal, but makes it much
             | easier and more robust to parse the log afterwards.
             | 
              | Such basic structured logging is quite similar to what
              | the article proposes in "Separate parameters and
              | messages", but more consistent, and unambiguous even if
              | one of the arguments contains your separator character.
        
               | handrous wrote:
               | Correctly and safely serializing to CSV and JSON is more
               | overhead than spitting out bytes. It's also easier to
               | spit or ingest bytes as a stream--yes, there are tools
               | that understand streams of json objects, but you still
               | have to have all the data in memory at once for a given
               | record, to serialize it, and then you've basically got it
               | in memory _twice_ at the same time (original + serialized
               | text). Your minimum record size to consider becomes
               | "this entire object" rather than "this field". You've
               | also now got the code for CSV or JSON serialization in
               | memory.
        
               | staticassertion wrote:
               | Serializing is very fast, and ingesting JSON as a stream
               | is trivial when it's newline delineated. No one emits a
               | giant array of json log objects, they emit a stream of
               | newline delineated json.
               | 
               | You almost certainly also have it in memory more than
               | twice since you're copying a buffer out to IO.
               | 
               | If you care about record size use protobuf, compression,
               | backend parsers, etc.
        
               | no_circuit wrote:
               | Unless we're talking about something performance critical
               | up front like an embedded system, odds are by the time an
               | organization is going to care about logging performance
               | overhead, they probably already have a few engineers
               | dedicated to just optimizing logging.
               | 
               | Of course some developers may just go ahead and create a
               | JSON/CSV string from scratch, which is about as safe as
               | string interpolation instead of parameter bindings for
               | SQL statements /s. But provided they're using a proper
               | JSON/CSV/proto object builder library that uses String
               | references, no data is going to get copied until a
               | logging component needs to write it to a file or a
               | network buffer. Therefore the only overhead are the
               | pointers that form the structure.
        
               | CodesInChaos wrote:
               | You can json or csv encode data in a streaming manner,
               | you just need to escape some special characters (only "
               | for csv) and add a separator. But I never worked on an
               | application where log records were big enough for
               | streaming to make sense.
               | 
               | If you want to preserve the "one line = one record"
               | property typically expected from logs, standard csv will
               | not be suitable (since it doesn't escape linebreaks), but
               | json without unnecessary linebreaks will handle that
               | (known as json-lines).
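A small demonstration of that property, using Python's stdlib json module:

```python
import json

# json.dumps escapes linebreaks and quotes inside values, so each
# record stays on exactly one line even with awkward input.
record = {"msg": "line one\nline two", "user": 'alice "admin"'}
line = json.dumps(record)

assert "\n" not in line            # one line = one record
assert json.loads(line) == record  # round-trips losslessly
print(line)
```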
        
             | jeffreygoesto wrote:
             | Wrong. Structured logging in C++ done right [0] is the most
             | efficient you can do. Plus you get all mentioned benefits.
             | 
             | [0] https://youtu.be/FyJI4Z6jD4w
        
             | IggleSniggle wrote:
             | Structured logging is just taking whatever information _you
             | already intended to log anyway_ and giving it a consistent
             | structure. Ie, instead of
             | 
             | "$TIMESTAMP $LEVEL: Hey look index $INDEX works!"
             | 
             | you have
             | 
             | $TIMESTAMP $LEVEL $FORMATTER $MSG $INDEX
             | 
             | I'm quite intentionally drawing a parallel to `echo` vs
             | `printf` here, where printf has an interface that
             | communicates intended format while echo does not.
             | 
             | The only overhead of structured logs done right is that you
             | need to be explicit about your intended formatting on some
             | level, but now with the bonus of reliable structure.
             | Whether you use glog or spdlog or whatever is somewhat
             | inconsequential. It's about whether you have an essentially
             | unknown format for your data, having thrown away the
             | context of whatever you wish to convey, or whether you
             | embed the formatting with the string itself so that it can
             | later be parsed and understood by more than just a human.
             | 
             | If you're concerned about the extra volume of including
             | something like "[%H:%M:%S %z] [%n] [%^---%L---%$] [thread
             | %t] %v" on every log entry, then you use eg (in GLOG
             | equivalent for your language) LOG_IF, DLOG, CHECK, or
             | whatever additional macros you need for performance.
             | 
             | If I'm wrong or just misunderstanding, please do correct
             | me.
        
               | jeffbee wrote:
               | LOG_IF and VLOG, and related concepts from other
               | libraries, have their places but I don't consider not
               | logging to be a solution to expensive logging. In fact,
               | that was my whole original point.
               | 
               | With structured logging the outputs may be trivial in
               | which case the compiler can do a good job. But if you are
               | inserting a std::string into a log line and your log
               | output format is JSON, there's nothing the compiler can
               | do about the fact that the string must be scanned for
               | metacharacters and possibly escaped, which will be
               | strictly more costly than just copying the string into
               | the output.
        
           | bpicolo wrote:
           | I think there's room for both.
           | 
           | Tracing is usually sampled, or much more time limited, and
           | aggregation tooling isn't nearly as good compared to logs
           | with most providers. Much better for deep dives into a narrow
           | context. For a single request drilldown or aggregated
           | performance/error metrics, tracing every time.
           | 
           | Structured logging tends to give you rich search against
           | broader contexts (e.g. long spans of time, many requests in
           | aggregate, random non performance/error attributes you may
           | care about) without the same concerns of data cardinality.
           | 
           | I felt nowhere near the same power with tracing tooling for
           | random broad context as I've had with e.g. Splunk for logs.
           | That's not to say you can't somehow have the best of both
           | worlds, but I haven't seen it in the ecosystem just yet.
        
       | bcrosby95 wrote:
       | I use WARN for "might be a problem". Only ERROR definitely
       | requires action.
       | 
       | An elevated number of WARNs indicates a problem. So this is a
       | metric we monitor. And obviously ERRORs since that means there is
       | a problem.
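One way to turn that policy into a metric, sketched with a custom stdlib logging handler (the handler and names are illustrative):

```python
import logging

class LevelCounter(logging.Handler):
    """Counts records per level so an elevated WARN rate can be
    exported as a metric and alerted on."""
    def __init__(self):
        super().__init__()
        self.counts = {}

    def emit(self, record):
        self.counts[record.levelname] = self.counts.get(record.levelname, 0) + 1

log = logging.getLogger("monitored")
counter = LevelCounter()
log.addHandler(counter)
log.setLevel(logging.INFO)

log.warning("disk 85%% full")
log.warning("retrying upstream call")
log.error("upstream call failed permanently")

print(counter.counts)  # e.g. {'WARNING': 2, 'ERROR': 1}
```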
        
       | radus wrote:
       | For python logging specifically, here's an excellent article:
       | https://blog.guilatrova.dev/how-to-log-in-python-like-a-pro/
        
       | golergka wrote:
       | Great post! It's also the first time that I see a concise
       | explanation of the difference between info and debug levels.
        
       | lbriner wrote:
       | + Use correlation ids on more complex applications so you can
       | track more than a single failed call into an application.
       | 
       | Getting an error like "this thing fell over because of a number
       | parsing error" might not be very useful when you can't recreate
       | it.
       | 
       | With correlation, you then realise that the user did something
       | specific beforehand and then the cause is more obvious.
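A sketch of one way to do this in Python, stamping a per-request id onto every record via contextvars and a logging filter (all names are illustrative):

```python
import contextvars
import logging
import uuid

correlation_id = contextvars.ContextVar("correlation_id", default="-")

class CorrelationFilter(logging.Filter):
    """Stamps every record with the current request's correlation id."""
    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True

logging.basicConfig(
    level=logging.INFO,
    format="%(levelname)s [%(correlation_id)s] %(message)s")
log = logging.getLogger("app")
log.addFilter(CorrelationFilter())

def handle_request():
    correlation_id.set(uuid.uuid4().hex)  # set once, at the entry point
    log.info("request received")
    log.info("parsing number")  # same id, so the lines group together

handle_request()
```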
        
       | ishitatsuyuki wrote:
       | Short, clear, and pragmatic. This is the kind of post I'd love to
       | see every day.
        
       ___________________________________________________________________
       (page generated 2021-10-05 23:00 UTC)