[HN Gopher] Using eBPF and predefined inspections to minimize "o...
       ___________________________________________________________________
        
       Using eBPF and predefined inspections to minimize "observability
       tax"
        
       Author : apetruhin
       Score  : 79 points
       Date   : 2022-12-27 16:02 UTC (6 hours ago)
        
 (HTM) web link (coroot.com)
 (TXT) w3m dump (coroot.com)
        
       | PeterZaitsev wrote:
       | eBPF is great! Fantastic to see it used more for observability!
        
       | VectorLock wrote:
       | It's too bad they had to build a whole separate tool because
       | Grafana wasn't capable. The Node graph panel for Grafana really
       | needs some love.
       | 
       | Edit: Also they need a SaaS to sell, so of course.
        
         | nikolay_sivko wrote:
         | Grafana offers one of the best solutions for storing and
         | navigating through telemetry data, but there remains a
         | challenge in using this data to generate insights. Our goal is
         | to address this issue, even if it means occasionally
         | reinventing the wheel. This is a necessary step at this stage.
        
         | buro9 wrote:
         | An update to the node graph is currently in the works, some
         | love is being shown (I work at Grafana Labs).
         | 
         | It's also too bad that they built their own eBPF
         | instrumentation as the Cloudflare eBPF exporter also exists and
         | is very good https://github.com/cloudflare/ebpf_exporter
         | 
         | Alternatively, if what you want specifically are the
         | integrations mentioned on the coroot page, then my money would
         | be on Isovalent and Cilium.
        
           | nikolay_sivko wrote:
           | With ebpf-exporter it is not possible to implement complex
           | logic, such as converting the PID of each TCP connection into
           | a container name and the destination IP into a real IP
           | according to the conntrack table.
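
The PID-to-container half of that mapping can be sketched in user space. This is a minimal illustration, not Coroot's actual implementation: it assumes a Linux host where the container runtime embeds a 64-character hex container ID in the process's cgroup path (as Docker and Kubernetes commonly do). The function names are hypothetical, and turning the ID into a human-readable container name would still require a query to the container runtime, which is not shown.

```python
import re
from typing import Optional

# Assumption: the runtime puts a 64-char lowercase hex container ID
# somewhere in the cgroup path, e.g. /docker/<id> or docker-<id>.scope.
_CONTAINER_ID = re.compile(r"([0-9a-f]{64})")


def container_id_from_cgroup(cgroup_text: str) -> Optional[str]:
    """Extract a container ID from the contents of /proc/<pid>/cgroup.

    Each line has the form "hierarchy-ID:controller-list:cgroup-path".
    Returns None when no path contains a 64-char hex ID.
    """
    for line in cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip malformed lines
        match = _CONTAINER_ID.search(parts[2])
        if match:
            return match.group(1)
    return None


def container_id_for_pid(pid: int) -> Optional[str]:
    """Hypothetical helper: read /proc/<pid>/cgroup and parse it.

    Returns None if the process is gone or /proc is unavailable.
    """
    try:
        with open(f"/proc/{pid}/cgroup") as f:
            return container_id_from_cgroup(f.read())
    except OSError:
        return None
```

An agent would call something like `container_id_for_pid(pid)` for the PID observed on each TCP connection event. Resolving the destination IP through conntrack is a separate step and is not covered here.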
        
             | alexeldeib wrote:
             | This seems like a limitation that could be lifted instead
             | of introducing a separate product (disclaimer: familiar
             | with ebpf exporter but haven't dug into OP).
             | 
             | IIRC ebpf exporter had some limitations, but they weren't
             | fundamental. However it was also fairly light, so maybe
             | another tool is just the right solve.
        
               | nikolay_sivko wrote:
               | Coroot's agent collects data from various sources to
               | cover all aspects of container behavior. Ebpf-exporter
               | perfectly solves the problem of running custom eBPF
               | programs and turning their output into metrics, but using
               | it as a foundation for more specific solutions doesn't
               | seem reasonable.
        
           | prpl wrote:
           | Are there more data sources or ways to overlay node graph
           | information over time series data?
           | 
           | I've wanted to use it, but haven't had time to write a custom
           | data source.
        
           | tptacek wrote:
           | There's not much to the eBPF profiler they've built, and not
           | very much overlap with ebpf_exporter; ebpf_exporter also
           | seems to require CO-RE kernels.
        
       | ekiauhce wrote:
       | Thanks for the great article!
       | 
       | At my current employer we have a company-wide service for
       | aggregating error logs in particular (WARN and ERROR level log
       | rows, plus stack traces for exceptions) so developers can
       | analyze them for debugging purposes. It also automatically
       | gathers information about the incoming HTTP request (geo, IP
       | address, user agent, etc.), so you can easily see a particular
       | segment of errors and what kinds of users are getting them.
       | 
       | As far as I can see, you have a quantitative logs metric
       | https://community-demo.coroot.com/p/oc1vhnmq/app/default:Dep...
       | but without any detail (maybe it works this way only for the
       | demo app). I mean, it would be great to be able to inspect each
       | ERROR event separately or to define a custom SLO with an alert
       | for a particular type of error.
       | 
       | Another great feature we use a lot is historical data, so you can
       | find patterns of error spikes on a scale of months and see when
       | they went away after a fix.
       | 
       | FYI, the error service I'm talking about is built on top of
       | ClickHouse, so it's quite responsive despite the large volumes
       | of data.
       | 
       | Another thing I want to mention is cron-like workloads (or batch
       | jobs, whatever you call them). Is there any support or useful
       | metrics for them?
        
       | john-tells-all wrote:
       | This looks really useful! For my business I want 1) high-res data
       | about local CPU-memory-IO (e.g., how can I speed up tests), and
       | 2) summary sampling data from production, to detect weird bugs or
       | attacks. eBPF might be able to solve both cases!
        
       ___________________________________________________________________
       (page generated 2022-12-27 23:00 UTC)