[HN Gopher] Tips for analyzing logs
       ___________________________________________________________________
        
       Tips for analyzing logs
        
       Author : historynops
       Score  : 186 points
       Date   : 2022-12-13 16:47 UTC (6 hours ago)
        
 (HTM) web link (jvns.ca)
 (TXT) w3m dump (jvns.ca)
        
       | phillipcarter wrote:
       | Surprised to see that under the section "correlate between
       | different systems" tracing isn't mentioned as an alternative
       | approach. That's what tracing is: logging across different
       | systems and getting that structure all stitched together for you.
        
       | tetha wrote:
        | As much as I approve of a skillset for analyzing local logs,
        | beyond a relatively small scale (10-20 systems) a decent central
        | log aggregation like OpenSearch or ELK just brings so much
        | value, even on 1-3 nodes. It'd be one of the first changes I'd
        | make to an infrastructure because it's so powerful.
       | 
        | And it's not just log searching and correlation value. At work,
       | the entire discussion "oh but we need access to all servers
       | because of logs" just died when all logs were accessible via one
       | web interface. I added a log aggregation and suddenly only ops
       | needed access to servers.
       | 
       | Designing that thing with accessibility and discoverability in
        | mind is a whole other topic, though.
        
         | rsdbdr203 wrote:
         | Agreed! Centralizing logs is so helpful, but you don't know it
         | until you've done it. Too many people rely upon grep, when a
          | handy tool is just a download away. Plug for said tool:
         | https://log-store.com
        
       | emmelaich wrote:
       | I found the histogram technique to be really helpful. Slight mod
       | - I tend to sort reverse at the end of the pipeline (sort -rn);
       | then |head is often more useful.
       | 
       | It's also good to have histograms by hour or day. I've hacked up
       | scripts to do this but I should really make something better!
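        | 
        | For example, a rough sketch of both (the field number and the
        | timestamp layout are assumptions about your log format):
        | 
        |     # histogram of field 9 (e.g. status code), biggest buckets
        |     # first, then just the top entries
        |     awk '{print $9}' access.log | sort | uniq -c | sort -rn | head
        | 
        |     # histogram by hour: keep the "YYYY-MM-DD HH" prefix of
        |     # each line and count occurrences
        |     cut -c1-13 app.log | sort | uniq -c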
        
       | baud147258 wrote:
        | A few weeks ago I had a Windows installer that was silently
        | failing when upgrading from an older version (installation from
        | scratch worked without issues). And as Windows install logs
        | aren't exactly easy to read, I was stumped, until I took an
        | upgrade log from an older, working build, stripped all date
        | information from both files, and compared them, checking all
        | the sections that differed, until I found a line indicating
        | that a colleague had forgotten about a limitation when dealing
        | with msp files (don't delete components on minor upgrades - but
        | I didn't throw any stones, as I've made the same mistake twice,
        | one and two years ago...)
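        | 
        | A minimal sketch of that strip-and-diff trick (the sed pattern
        | and file names are assumptions; adjust to the installer's
        | timestamp format):
        | 
        |     # strip anything that looks like a time of day, then diff
        |     sed -E 's/[0-9]{2}:[0-9]{2}:[0-9]{2}//g' working.log > a
        |     sed -E 's/[0-9]{2}:[0-9]{2}:[0-9]{2}//g' broken.log  > b
        |     diff a b | less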
        
       | the_arun wrote:
        | Do we still use utilities like grep for searching logs? Are
        | these for when we cannot stream logs to tools like Splunk and
        | Loggly and use their search services?
        
         | ranger207 wrote:
         | All of our logs are in Kibana, but sometimes I'll `kubectl logs
         | pod > tmp; grep pattern tmp` because Kibana's search and
         | filtering is often annoying. Actually, I'll usually open the
          | logs in vim and use its search, which also gets me the
          | ability to e.g. delete every line not containing the pattern
          | and then search only within what's left. I'm going to try
         | lnav as mentioned in this thread but I've gotten by fine with
         | judicious use of grep
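          | 
          | For reference, a sketch of that vim move (`:v` runs a command
          | on every non-matching line):
          | 
          |     :v/pattern/d      " delete every line NOT containing pattern
          |     /second pattern   " then search within what's left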
        
       | chaps wrote:
       | One thing I've done to identify infrequent log entries within a
        | log file is to remove all numbers from the file and print the
        | frequency of each remaining line. That basically helps to
        | disregard timestamps (not just at the beginning of the line),
        | line numbers, etc.
        | 
        |     cat file.log | sed 's/[0-9]//g' | sort | uniq -c | sort -nr
       | 
       | This has been incredibly helpful in quickly resolving outages
       | more than once.
        
       | chasil wrote:
        | A minor optimization is collapsing the grep -v, from this:
        | 
        |     cat file | grep -v THING1 | grep -v THING2 | grep -v THING3 | grep -v THING4
        | 
        | to this:
        | 
        |     egrep -v 'THING1|THING2|THING3|THING4' file
       | 
       | That gets rid of the cat and three greps. Both POSIX and GNU
       | encourage grep -E to be used in preference to egrep.
       | 
       | A pcregrep utility also used to exist, if you want expansive
       | perl-compatible regular expressions. This has been absorbed into
       | GNU grep with the -P option.
        
         | anotherevan wrote:
         | Or the following if pipes or egrep make you nervous:
         | 
         | grep -v -e THING1 -e THING2 -e THING3 -e THING4 file
        
         | aargh_aargh wrote:
          | I once compared the speed of these two approaches, rather
          | accidentally. I did output colorization by adding ANSI
          | sequences. I thought, of course one process must be more
          | efficient than a pipe of processes. After the rewrite, I was
          | disappointed by the slowdown and reverted back to the pipe.
          | 
          | PS I checked back and I used sed rather than grep. I think
          | the result would hold for grep, but the moral is that you
          | should verify rather than assume.
          | 
          | I have around 50 seds in the pipe, running in parallel (which
          | is what makes it faster); it would have been about half that
          | when I tried the rewrite.
        
         | yakubin wrote:
         | I usually prefer to pipe fgrep -v into fgrep -v. With egrep you
         | need to escape brackets and other characters.
        
           | mdaniel wrote:
            | If you haven't seen it, there's been some noise recently
            | about fgrep and its egrep friend:
            | 
            |     $ fgrep -h
            |     fgrep: warning: fgrep is obsolescent; using grep -F
            |     $ egrep -h
            |     egrep: warning: egrep is obsolescent; using grep -E
        
           | chasil wrote:
           | True, for absolute fixed strings, the author's approach is
           | superior.
           | 
           | Off the cuff, another way to do it is with awk's index
           | function. I don't know what speed penalty this might impose.
            | 
            |     $ cat ~/fmgrep
            |     #!/bin/awk -f
            |     BEGIN { split(ARGV[1], s, ","); ARGV[1]="" }
            |     { m = 1; for(c in s) if(index($0, s[c])) m = 0; if(m) print }
            | 
            |     $ ~/fmgrep sshd,systemd,polkitd /var/log/secure
        
           | anotherevan wrote:
           | This also works:
           | 
           | fgrep -v -e THING1 -e THING2 -e THING3 -e THING4
        
         | burntsushi wrote:
         | > A pcregrep utility also used to exist, if you want expansive
         | perl-compatible regular expressions. This has been absorbed
         | into GNU grep with the -P option.
         | 
         | 'pcregrep' still exists. But with PCRE2 supplanting PCRE, it is
         | now spelled 'pcre2grep'.
         | 
         | I don't know the precise history of 'grep -P' and whether
         | 'pcregrep' was actually absorbed into it, but 'pcregrep' is its
         | own thing with its own features. For example, it has a
         | -M/--multiline flag that no standard grep (that I'm aware of)
         | has. (Although there are some work-arounds, e.g., by treating
         | NUL as the line terminator via the -z/--null-data flag in GNU
         | grep.)
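          | 
          | For instance, a quick sketch of -M (the log file name and the
          | exact exception pattern are just for illustration):
          | 
          |     # match a Java exception line together with its indented
          |     # "at ..." stack frames as one multiline hit
          |     pcre2grep -M 'Exception.*\n(\s+at .*\n)+' app.log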
        
           | chasil wrote:
           | Oddly, there are pcre2 packages in RedHat/Alma 9, but they do
           | not include a pcre2grep.
           | 
            | GNU grep is also linked to pcre, not pcre2.
            | 
            |     # pcre2grep
            |     bash: pcre2grep: command not found...
            | 
            |     # yum install pcre2grep
            |     Last metadata expiration check: 1:58:58 ago on Tue 13 Dec 2022 11:45:44 AM CST.
            |     No match for argument: pcre2grep
            |     Error: Unable to find a match: pcre2grep
            | 
            |     # yum whatprovides pcre2grep
            |     Last metadata expiration check: 2:09:25 ago on Tue 13 Dec 2022 11:45:44 AM CST.
            |     Error: No matches found.
            | 
            |     # rpm -qa | grep pcre2 | sort
            |     pcre2-10.40-2.0.2.el9.x86_64
            |     pcre2-syntax-10.40-2.0.2.el9.noarch
            |     pcre2-utf32-10.40-2.0.2.el9.x86_64
            | 
            |     # which grep
            |     /usr/bin/grep
            | 
            |     # ldd /usr/bin/grep | grep pcre
            |     libpcre.so.1 => /lib64/libpcre.so.1 (0x00007efc473c4000)
        
       | jldugger wrote:
       | Honestly, the most amazing thing I did with logs was learn how to
       | do subtraction. Any time you have multiple instances of a thing
       | and only some of them are bad, you can easily find the problem
       | (if anyone bothered to log it) by performing bad - good.
       | 
       | The way you do this is by aggregating logs by fingerprints.
       | Removing everything but punctuation is a generic approach to
        | fingerprinting, but is not exactly human-friendly. For Java,
        | log4j can include the class in your logging pattern, and that
        | plus log level is usually pretty specific.
       | 
        | Once you have a fingerprint, the rest is just counting and
        | division. Over a specific time window, count the number of log
        | events for every fingerprint, for both good and bad systems.
        | Then score every fingerprint as (1 + # of bad events) / (1 + #
        | of good events), and everything at the top is most strongly
        | bad. The more often it's logged, the further up it will be. No
        | more lecturing people about "correct" interpretations of ERROR
        | vs INFO vs DEBUG. No more "this ERROR is always logged, even
        | during normal operations".
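        | 
        | A sketch of the counting-and-scoring step in awk (file names
        | and the digit-stripping fingerprint are assumptions; swap in
        | whatever fingerprint suits your logs):
        | 
        |     awk '
        |       # crude fingerprint: strip digits from the line
        |       { fp = $0; gsub(/[0-9]+/, "", fp) }
        |       FILENAME == "bad.log"  { bad[fp]++ }
        |       FILENAME == "good.log" { good[fp]++ }
        |       END {
        |         for (fp in bad)
        |           printf "%.2f\t%s\n", (1 + bad[fp]) / (1 + good[fp]), fp
        |       }
        |     ' bad.log good.log | sort -rn | head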
        
         | 10000truths wrote:
         | Isn't this basically what structured binary logs are? Instead
         | of writing a string like `Error: foo timeout after
         | ${elapsed_amt} ms` to the log, you write a 4-byte error code
          | and a 4-byte integer for elapsed_amt. I know there are
         | libraries like C++'s nanolog that do this for you, under the
         | hood.
        
           | jldugger wrote:
           | That solves the fingerprint, but you still need to count and
           | score.
        
         | nerdponx wrote:
         | You might be interested in the TF-IDF algorithm used in
         | information retrieval and text classification.
        
           | jldugger wrote:
           | Yes, this is pretty much TF-IDF for people too lazy to count
           | the number of unique items in the corpus.
           | 
           | Since that number should be the same (or at least close!) in
           | both good and bad datasets, I'm not sure the extra math
           | matters much.
        
       | alexpetralia wrote:
       | Has anyone tried passing logs into chatGPT? I had thought it
       | would be especially effective here.
        
         | viraptor wrote:
         | It may work for a few lines, but you can't really pipe a GB of
         | logs into it, which is what we typically deal with.
         | 
         | It works for simple explanations though.
        
       | didip wrote:
       | If you are using Kubernetes, I highly recommend using
       | https://github.com/stern/stern
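        | 
        | For example (a sketch; the pod-name pattern and flag values are
        | just illustrative):
        | 
        |     # tail all pods whose names match "my-service", going back
        |     # 15 minutes
        |     stern my-service --since 15m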
        
       | [deleted]
        
       | kqr wrote:
       | One thing that's improved my log analysis is learning awk (well
        | actually Perl, but I think 95% of what I do with Perl I could
       | also do with just awk). Often the most useful way to look at logs
       | is statistically, and awk can quickly let you aggregate things
       | like time-between-statements, counts, rates, state transition
       | probabilities, etc for arbitrary patterns.
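        | 
        | For example, time-between-statements is a one-liner (assuming
        | an epoch-seconds timestamp in field 1, which is an assumption
        | about your format):
        | 
        |     # print the gap between consecutive lines, largest first
        |     awk 'NR > 1 { print $1 - prev } { prev = $1 }' app.log \
        |       | sort -rn | head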
        
       | digitalsushi wrote:
       | One of my pet peeves is "The Useless Use of cat Award". Someone
       | awarded it to me as a teenager in the late 90s and I've been sore
       | ever since.
       | 
       | Yup, it's often a waste of resources to run an extra 'cat'. It
       | really demonstrates that you don't have the usage of the command
       | receiving the output completely memorized. You know, the thousand
       | or so commands you might be piping it into.
       | 
       | But, if you're doing a 'useless' use of cat, you're probably just
       | doing it in an interactive session. You're not writing a script.
       | (Or maybe you are, but even still, I bet that script isn't
       | running thousands of times per second. And if it is, ok, time to
       | question it).
       | 
        | So you're wasting a few clock cycles. The computer does a few
        | billion of those per second. By the time you've explained the
        | 'useless' use of cat to someone, the time you wasted explaining
        | why they're wrong is greater than the total time their lifetime
        | usage of cat was going to waste.
       | 
       | There's a set of people who correct the same three pairs of
       | homophones that get used incorrectly, but don't know what the
        | word 'homophone' is. (Har har, they're/their/there). I put the
        | people who are so quick to chew someone out for using cat in
        | the same batch as the people who do that. What if I just want
        | to use cat because it makes my command easier to edit? I can
        | hit up-arrow, warp to the front of the line, and change it real
        | quick.
       | 
       | Sorry. I did say, it is a pet peeve.
        
         | amarshall wrote:
         | I do "useless" use of cat quite often because, in my brain, the
         | pipeline naturally starts with "given this file", so it makes
         | the pipeline more consistent e.g. `cat f | a | b | c` rather
         | than `a < f | b | c` where one must start with the first
         | "function" rather than with the data. I see starting with `cat`
         | analogous to the `->` thread macro in Clojure, `|>` pipe in
         | Elixir, and `&` reverse application operator in Haskell. If
         | bash permitted putting the filename first, I'd stop using
         | `cat`; alas, it does not.
        
           | aendruk wrote:
            | There's this, though it would look nicer with another |:
            | 
            |     <f a | b | c
        
             | amarshall wrote:
             | That's actually not half-bad. I think I'll use that,
             | thanks!
        
         | civopsec wrote:
          | Don't people like Rob Pike argue for `cat |`, since it fits
          | so orthogonally with the rest of the pipeline?
        
         | infogulch wrote:
         | > the time [used to] explain the 'useless[ness]' .. of cat to
         | someone .. is greater than the total time that their lifetime
         | usage of cat was going to waste
         | 
         | If you look for situations like this they are surprisingly
         | common.
         | 
         | > _pet_ peeve
        
         | dllthomas wrote:
         | It's a lot more than the few extra cycles to spin up the
         | process - it's also an extra copy of all the data. Usually
         | that's also not much, but occasionally it's everything, as the
         | consuming program can seek in a file, but not in a pipe, so
         | might otherwise only need a tiny bit of the data.
        
         | marcosdumay wrote:
         | Well, you waste an entire fork and exec, so I believe you are
         | underestimating the time by a few orders of magnitude. Also,
         | it's almost always grep following the cat, so it's not much to
         | memorize.
         | 
         | But it's well worth wasting a process to have a nice pipeline
         | where each command does a single thing so you can easily reason
         | about them.
        
       | sisk wrote:
       | Loosely related: a few years ago I wanted a simpler alternative
       | to some of the more feature-full log viewers out there so I threw
       | together a tiny (50kb) app that might be useful to some folks in
       | here.
       | 
        | All it does is consistently color the first field in a line from
       | stdin so you can quickly see which log lines have the same first
       | field.
       | 
       | I used it in combination with the parallel[0] command to prefix
       | log lines by replica name when tailing logs across machines:
       | https://github.com/jasisk/color-prefix-pipe
       | 
       | [0]: https://www.gnu.org/software/parallel/
        
       | chrisweekly wrote:
       | Glad to see lnav (https://lnav.org) already getting some love in
       | the comments. Hands-down the most reliable source of "thank you!
        | I wish I'd known about this tool sooner!" responses, even from
        | very experienced sysadmins / SREs.
        
       | AtNightWeCode wrote:
       | Usage of correlation ids and structured logs is pretty much
       | standard. Don't go down this path.
        
       | oxff wrote:
        | Just use structured logs, it's 2022, ffs.
        
       | kamma4434 wrote:
       | Jeez, this stuff is frontpage on HN? Sounds... pretty basic. I'm
       | sure our AI overlords could produce deeper content.
        
         | viraptor wrote:
          | People on HN have varied levels of skill, and this is a well-
          | structured introduction to diving into logs. It has already
          | started a conversation about better tooling. Let's celebrate
          | today's lucky 10000 (https://xkcd.com/1053/) rather than talk
          | something down for being basic.
        
         | hex4def6 wrote:
         | I'm interested in this idea as well. Seems like it would be
         | useful for detecting unusual issues assuming you have a large
         | enough data set of "normal" runs.
         | 
         | I've been mulling this idea over in my head as well. I have a
         | fleet of PCs out in the wild, all running the same software. It
         | would be nice to have an easy way to detect strange behavior
         | like processes that are continually respawning / segfaults /
         | crashes / etc, without explicitly writing a bunch of search
         | terms.
        
         | phillipcarter wrote:
         | Yours is an unhelpful and non-constructive comment. Clearly a
         | lot of people have been getting something out of the content in
         | this post, as it's started several discussions.
        
       | kerblang wrote:
       | > you'll get overwhelmed by a million irrelevant messages because
       | the log level is set to INFO
       | 
       | I know this happens, but I think it's because programmers are
       | abusing INFO. In principle it's reserved for messages that are
       | informative at a level sys admins and a few others can make sense
        | of and use. Unfortunately, abuse often leads to "we turned INFO
        | off", making it much harder to diagnose things after the fact.
        
         | pantulis wrote:
          | I think there should be a counterpart to "log analysis",
          | which is "logging strategies for your app": WHAT to log and
          | WHEN.
          | 
          | Stuff like: if you are exposing an HTTP endpoint, you should
          | log the request URL and the time it took to serve it. Or if
          | you are invoking an external service, you should log the
          | response time of that service.
        
           | lowbloodsugar wrote:
           | And you should produce a single line of output for each
           | request that identifies all of the pertinent information. You
           | can have more than one, for e.g. a thread dump, but there
           | should be one that provides a complete summary. I've lived
           | with apps that logged at each stage of the process as
           | separate lines, and that's just not useful data when grepping
           | for anomalies.
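            | 
            | Something like this, say (a made-up example of such a
            | summary line):
            | 
            |     2022-12-13T16:47:01Z INFO request_id=abc123 method=GET path=/api/users status=200 duration_ms=142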
        
             | rsdbdr203 wrote:
              | I hypothesize it's not useful _because_ you're using grep.
             | If you use a tool that can show you multiple lines all tied
             | by a request ID, it becomes much more helpful.
        
       | jeppesen-io wrote:
       | also, journalctl. Very nice to have all logs in one place in a
       | semi-structured nature
       | 
       | `journalctl -u myservice -S "5 min ago"`
        
       | heliostatic wrote:
       | My biggest quality of life improvement for understanding logs has
       | been lnav (https://lnav.org/) -- does everything mentioned in
       | this post in a single tool with interactive filtering and quick
       | logical and time based navigation.
        
         | TeMPOraL wrote:
         | Thanks for linking to this!
         | 
          | Until now I thought logview.el[0] was the bee's knees, but now
         | can feel feature envy set in. There are some seriously powerful
         | ideas listed on the lnav page, and it's also the first time I
         | saw SQLite virtual tables used in the wild.
         | 
         | --
         | 
         | [0] - https://github.com/doublep/logview
        
         | chrisweekly wrote:
         | Yes! lnav (https://lnav.org) is phenomenal. Embedded SQLite...
         | easily scriptable... OOB log formats galore, or define your
         | own... it's a mini ETL powertool that scales to at least a few
         | million rows and runs in your terminal. Maintainer's a friendly
         | dude, too.
        
         | francislavoie wrote:
         | At my current company, we're using pdsh with tail -f piped into
         | sed to follow logs from multiple remote servers at the same
         | time, label them and colorize them. Works okay. Decent solution
         | without needing to install much other software. Not my
         | favourite because it doesn't deal well with SSH sessions timing
         | out and leaves some tail -f processes hanging, and some other
         | quirks. But out of laziness and risk of breaking things in
         | prod, we haven't tried experimenting with much else.
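          | 
          | Roughly this shape, for the curious (host list, log path, and
          | the GNU-sed colorizing are assumptions):
          | 
          |     # pdsh prefixes each output line with "host: "; sed then
          |     # colorizes that prefix (-u keeps it unbuffered)
          |     pdsh -w 'web[01-03]' 'tail -f /var/log/app.log' \
          |       | sed -u 's/^\([^ :]*\):/\x1b[36m\1:\x1b[0m/'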
        
           | tstack wrote:
           | lnav has some support for tailing files on remote hosts --
           | https://lnav.org/2021/05/03/tailing-remote-files.html
           | 
           | It works by uploading a stub that is an "actually portable
           | executable" to the remote over ssh. lnav then talks to the
           | stub to sync files back to the local host. I don't know the
           | scale of the hosts or files, but if they're not too big, it
           | might work fine.
        
           | UI_at_80x24 wrote:
           | Run that in a tmux session. Problem solved.
        
         | clumsysmurf wrote:
         | Interesting! Do you know how it compares to multitail?
         | (https://www.vanheusden.com/multitail/) They look very similar.
        
           | tstack wrote:
            | I don't think multitail really understands logs like lnav
            | does; it's just following the last lines in the file. For
            | example, if you try to follow multiple files in multitail
            | like so:
            | 
            |     $ multitail /var/log/install.log -I /var/log/system.log
            | 
            | you get a view with the tail from one file followed by the
            | tail from the other; they are not collated by timestamp. In
            | contrast, if you do the same thing in lnav:
            | 
            |     $ lnav /var/log/install.log /var/log/system.log
            | 
            | you will get a single view with all of the log messages
            | from both files, sorted by their timestamps. Here is all of
            | what lnav is doing:
            | 
            | * Monitoring files/directories for updates
            | * Decompressing files/archives
            | * Detecting the log format for each file
            | * Creating SQLite vtables that provide access to log messages
            | * Building an index of all the log messages in all the
            |   files, so you can jump to a certain point in time or to
            |   the next/previous error message
            | * Displaying all log messages with syntax highlighting
        
             | aargh_aargh wrote:
             | I migrated from multitail to lnav. Turned out to be a no-
             | brainer.
             | 
              | I second the above, with just one pain point with
              | multitail to add. I often page/search/filter in the
              | scrollback buffer (I typoed "bugger" - Freudian slip?),
              | and in multitail the scrollback is a separate window with
              | a frame and everything, which is a pain (copying whole
              | lines with the mouse includes the frame, ugh). The
              | filtering/searching is a separate pain.
             | 
              | One thing I used in multitail, and I'm not sure I've
              | migrated wholly to lnav, is log file syntax highlighting
              | using regexes.
        
         | oefrha wrote:
         | Huh, I almost posted a duplicate recommendation. My only
         | complaint with lnav was that it had to be built from source on
         | Linux and the build was frigging huge. Apparently they have a
         | pre-compiled linux-musl binary now.
        
           | amarshall wrote:
           | > that it had to be built from source on Linux
           | 
           | Seems to be packaged quite widely
           | https://repology.org/project/lnav/versions
        
         | hermitcrab wrote:
         | Our Easy Data Transform software is intended for data wrangling
         | and desktop ETL. But it has lots of features useful for
         | browsing and manipulating log files, including:
         | 
         | * powerful filtering (including with regex)
         | 
         | * smooth scrolling of millions of rows of data
         | 
         | * support for csv, text, xml, json, Excel and other formats
         | 
          | * available for Windows and Mac
        
         | dj_gitmo wrote:
         | I love lnav and use it constantly, but it crashes a lot. I do
         | wish there was something like lnav that was a little simpler to
         | use, and written in a more resilient way that crashed less. I
         | can cut lnav some slack for the crashes because identifying and
          | parsing arbitrary log formats seems like a messy problem.
          | Still, it shouldn't crash a third of the time I use it.
        
         | heinrichhartman wrote:
         | Looks great, but I struggle to get this working with the two
         | main log sources on my system:
         | 
         | 1) systemd logs (binary data stored at /var/log/journal)
         | 
         | 2) docker logs (from various containers running on the host)
         | 
         | Any hints on getting this working? I don't see this covered in
         | the docs (https://docs.lnav.org/en/latest/formats.html).
        
       | ljw1004 wrote:
       | I wrote https://github.com/ljw1004/seaoflogs - an interactive
       | filtering tool, for similar ends to what's described here. I
       | wrote it because my team was struggling to analyze LSP logs
       | (that's the protocol used by VSCode to communicate with language
        | servers). But I made it general-purpose, able to analyze more
        | log formats too - for instance, we want to correlate LSP logs
        | with server logs and other traffic logs.
       | 
       | (1) I wanted something where colleagues could easily share links
       | in workplace chat with each other, so we could cooperatively
       | investigate bugs.
       | 
       | (2) For LSP we're often concerned with responsiveness, and I
       | thought the best way to indicate times when viewing a log is with
       | whitespace gaps between log messages in proportion to their time
       | gap.
       | 
       | (3) For LSP we have lots of interleaved activity going on, and I
       | wanted to have visual "threads" connecting related logs.
       | 
       | (4) As the post and lnav say, interactivity is everything. I
       | tried to take it a step further with (1) javascript, (2)
       | playground-style updates as you type, (3) autocomplete which
       | "learns" what fields are available from structured logs.
       | 
       | My tool runs all in the browser. (I spent effort figuring out how
       | people can distribute it safely and use it for their own
       | confidential logs too). It's fast enough up to about 10k lines of
       | logs.
        
         | unwind wrote:
         | That sounds really interesting, although too limited for my
         | current work focus which deals with Really Big logs.
         | 
          | I tried the demo link; it rendered a lonely @ at the top left
          | of the page and the rest was blank. The console said:
          | 
          |     Uncaught TypeError: [].filter(...).reverse()[0] is undefined
          |         onload https://ljw1004.github.io/seaoflogs/demo.html:14
          | 
          | This was in Firefox 98.0 on Windows.
        
       | Manjuuu wrote:
       | > Often log lines will include a request ID.
       | 
        | Yes, always include a request ID in every request structure you
        | create, include it in the response too, and print it. It would
        | seem like something obvious that everyone does by default, but
        | no, apparently it's not so obvious.
        
         | janci wrote:
          | Not so obvious. How do you implement it without passing the
          | request ID to all the functions, when they are unrelated to
          | the request/HTTP? Especially in languages without
          | thread-locals, such as JavaScript?
        
           | yakshaving_jgt wrote:
           | You can stick it in a middleware which wraps the request
           | lifecycle.
        
           | gh123man wrote:
           | Many frameworks solve this with logger context. Add the
           | properties you want to the logging context and all future
           | logs in that context will have that property.
           | 
           | One simple example - Serilog enrichment
           | https://github.com/serilog/serilog/wiki/Enrichment
           | 
           | This does however assume your entire application uses the
           | same logger. But this is generally a requirement in services
           | that require tracing like this.
        
           | mrloba wrote:
            | Node.js now has async local storage built in, which makes
           | fairly easy.
        
       | 29athrowaway wrote:
       | Your friends
       | 
       | - grep (or ripgrep), sort, uniq, head, tail, cut, colrm, tr
       | 
       | - diff
       | 
       | - jq
       | 
       | - bat
       | 
       | - ministat
       | 
       | - less, more, percol
       | 
       | - Facebook Path Picker (fpp)
        
       | Severian wrote:
       | My tips:
       | 
       | 1) Fuck grep, use ripgrep, especially if you have to scour over
       | an entire directory.
       | 
       | 2) Get good with regex, seriously, it will shave hours off your
       | searching.
       | 
       | 3) For whatever application you are using, get to know how the
       | logging is created. Find the methods used where said logs are
       | made, and understand why such a log line exists.
       | 
        | 4) Get good with piping into awk when you need some nice
        | readable output.
        
         | nerdponx wrote:
         | Piping into AWK feels like a misuse of the tool in all but the
         | simplest of cases. Don't forget that you can write complete AWK
         | scripts and invoke them from a file!
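          | 
          | For instance, a small sketch (the log layout "YYYY-MM-DD
          | HH:MM:SS LEVEL ..." is an assumption):
          | 
          |     $ cat errors_by_hour.awk
          |     # count ERROR lines per hour of day
          |     /ERROR/ { hours[substr($2, 1, 2)]++ }
          |     END     { for (h in hours) printf "%s:00  %d\n", h, hours[h] }
          | 
          |     $ awk -f errors_by_hour.awk app.log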
        
       | geocrasher wrote:
        | One thing I didn't see was how to use grep to view the lines
        | before and after a match:
        | 
        |     grep regex /var/log/logfile -A5   # view the next 5 lines
        |     grep regex /var/log/logfile -B5   # view the previous 5 lines
        |     grep regex /var/log/logfile -C5   # view the 5 lines before *and* after the match
       | 
       | This is super handy to find out what happened just before a
       | service crashed, for example.
        
       | danjc wrote:
       | We've added a log tailing feature into our product UI which also
       | has a basic find/filter. It's been enormously useful for cases
       | where something weird happens as you can immediately access the
       | last few mins of logs.
        
       | terom wrote:
       | Re timing, logs like nginx access logs have their timestamp from
       | when the request completed, not when the request came in. That's
       | a significant difference for long duration (~10s+) requests, and
       | matters when trying to correlate logs or metrics to a request.
        
       | trashcan wrote:
        | Also, sometimes log entries stretch across multiple lines, and
        | the later lines won't have the identifier you are searching for
        | - Java stack traces, for example. In that case, if you are stuck
        | parsing unstructured logs, the simplest thing to do is to open
        | the entire file and search for the timestamp of the line that
        | matched.
        
       | par wrote:
       | My tips for analyzing logs:
       | 
       | cd ~/logs
       | 
       | cat * | grep <my string>
        
         | mxuribe wrote:
         | I guess it depends, but would you not encounter issues like
         | permission blocks, or there being so much content in the logs
         | that it is slow to find what you need?
        
       ___________________________________________________________________
       (page generated 2022-12-13 23:00 UTC)