[HN Gopher] Using eBPF and predefined inspections to minimize "o...
___________________________________________________________________

Using eBPF and predefined inspections to minimize "observability tax"

Author : apetruhin
Score  : 79 points
Date   : 2022-12-27 16:02 UTC (6 hours ago)

(HTM) web link (coroot.com)
(TXT) w3m dump (coroot.com)

| PeterZaitsev wrote:
| eBPF is great! Fantastic to see it used more for observability!

| VectorLock wrote:
| It's too bad they had to build a whole separate tool because
| Grafana wasn't capable. The Node graph panel for Grafana really
| needs some love.
|
| Edit: Also they need a SaaS to sell, so of course.

  | nikolay_sivko wrote:
  | Grafana offers one of the best solutions for storing and
  | navigating through telemetry data, but there remains a
  | challenge in using this data to generate insights. Our goal is
  | to address this issue, even if it means occasionally
  | reinventing the wheel. This is a necessary step at this stage.

  | buro9 wrote:
  | An update to the node graph is currently in the works, some
  | love is being shown (I work at Grafana Labs).
  |
  | It's also too bad that they built their own eBPF
  | instrumentation, as the Cloudflare eBPF exporter also exists
  | and is very good: https://github.com/cloudflare/ebpf_exporter
  |
  | Alternatively, if what you want specifically are the
  | integrations mentioned on the coroot page, then my money would
  | be on isovalent and cilium.

    | nikolay_sivko wrote:
    | With ebpf-exporter it is not possible to implement complex
    | logic, such as converting the PID of each TCP connection into
    | a container name and the destination IP into a real IP
    | according to the conntrack table.

      | alexeldeib wrote:
      | This seems like a limitation that could be lifted instead
      | of introducing a separate product (disclaimer: familiar
      | with ebpf exporter but haven't dug into OP).
      |
      | Iirc ebpf exporter had some limitations, but they weren't
      | fundamental.
      | However it was also fairly light, so maybe another tool is
      | just the right solve.

        | nikolay_sivko wrote:
        | Coroot's agent collects data from various sources to
        | cover all aspects of container behavior. Ebpf-exporter
        | perfectly solves the problem of running custom eBPF
        | programs and turning their output into metrics, but using
        | it as a foundation for more specific solutions doesn't
        | seem reasonable.

  | prpl wrote:
  | Are there more data sources or ways to overlay node graph
  | information over time series data?
  |
  | I've wanted to use it, but haven't had time to write a custom
  | data source.

  | tptacek wrote:
  | There's not much to the eBPF profiler they've built, and not
  | very much overlap with ebpf_exporter; ebpf_exporter also
  | seems to require CO-RE kernels.

| ekiauhce wrote:
| Thanks for the great article!
|
| At my current employer we have a company-wide service for
| aggregating error logs in particular (WARN and ERROR level log
| rows, plus stack traces for exceptions) so developers can
| analyze them for debugging purposes. It also automatically
| gathers information about incoming HTTP requests (geo, IP
| address, user agent, etc.), so you can easily see a particular
| segment of errors and what kind of users are getting them.
|
| As far as I can see, you have a quantitative log metric
| https://community-demo.coroot.com/p/oc1vhnmq/app/default:Dep...
| but without any detail (maybe it works this way only for the
| demo app). I mean, it would be great to be able to inspect each
| ERROR event separately or to define a custom SLO with an alert
| for a particular type of error.
|
| Another great feature we use a lot is historical data, so you can
| find patterns of error spikes on a scale of months and see when
| they went away after a fix.
|
| FYI this error service I'm talking about is built on top of
| ClickHouse, so it's quite responsive regardless of the large
| volumes of data.
|
| Another thing I want to mention is cron-like workload (or batch
| jobs, you name it).
| Is there any support or useful metrics for it?

| john-tells-all wrote:
| This looks really useful! For my business I want 1) high-res data
| about local CPU-memory-IO (e.g. how can I speed up tests), and
| 2) summary sampling data from production, to detect weird bugs or
| attacks. eBPF might be able to solve both cases!
___________________________________________________________________
(page generated 2022-12-27 23:00 UTC)