hngopher.com

       [HN Gopher] Show HN: HyperDX - open-source dev-friendly Datadog ...
       ___________________________________________________________________
        
       Show HN: HyperDX - open-source dev-friendly Datadog alternative
        
       Hi HN, Mike and Warren here! We've been building HyperDX
       (hyperdx.io). HyperDX allows you to easily search and correlate
       logs, traces, metrics (alpha), and session replays all in one
       place. For example, if a user reports a bug "this button doesn't
       work," an engineer can play back what the user was doing in their
       browser and trace API calls back to the backend logs for that
       specific request, all from a single view.  Github Repo:
       https://github.com/hyperdxio/hyperdx  Coming from an observability
       nerd background, with Warren being SRE #1 at his last startup and
       me previously leading dev experience at LogDNA/Mezmo, we knew there
       were gaps in the existing tools we were used to using. Our previous
       stack of tools like Bugsnag, LogRocket, and Cloudwatch required us
       to switch between different tools, correlate timestamps (UTC?
       local?), and manually cross-check IDs to piece together what was
       actually happening. This often meant small issues required
       hours of frustration to root cause.  Other tools like Datadog or
       New Relic come with high price tags - when estimating costs for
       Datadog in the past, we found that our Datadog bill would exceed
       our AWS bill! Other teams have had to adjust their infrastructure
       just to appease the Datadog pricing model.  To build HyperDX, we've
       centralized all the telemetry in one place by leveraging
       OpenTelemetry (a CNCF project for standardizing/collecting
       telemetry) to pull and correlate logs, metrics, traces, and
       replays. In-app, we can correlate your logs/traces together in one
       panel by joining everything automatically via trace ids and session
       ids, so you can go from log <> trace <> replay in the same panel.
       To keep costs low, we store everything in Clickhouse (w/ S3
       backing) to make it extremely affordable to store large amounts of
       data (compared to Elasticsearch) while still being able to query it
       efficiently (compared to services like Cloudwatch or Loki), in
       large part thanks to Clickhouse's bloom filters + columnar layout.
       On top of that, we've focused on providing a smooth developer
       experience (the DX in HyperDX!). This includes features like native
       parsing of JSON logs, full-text search on any log or trace, 2-click
       alert creation, and SDKs that help you get started with
       OpenTelemetry faster than the default OpenTelemetry SDKs.  I'm
       excited to share what we've been working on with you all and would
       love to hear your feedback and opinions!  Hosted Demo -
       https://api.hyperdx.io/login/demo  Open Source Repo:
       https://github.com/hyperdxio/hyperdx  Landing Page:
       https://hyperdx.io
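
The correlation described in the post, joining logs, traces, and replays on trace ids and session ids, can be sketched in a few lines. This is only an illustration and not HyperDX's implementation: the record shapes and the field names (`kind`, `session_id`, `trace_id`, `body`) are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical telemetry records; the field names are assumptions for
# illustration and are not taken from HyperDX or OpenTelemetry schemas.
EVENTS = [
    {"kind": "replay", "session_id": "s1", "trace_id": None, "body": "click #save"},
    {"kind": "span", "session_id": "s1", "trace_id": "t1", "body": "POST /api/save"},
    {"kind": "log", "session_id": None, "trace_id": "t1", "body": "db timeout"},
    {"kind": "log", "session_id": None, "trace_id": "t2", "body": "unrelated"},
]


def correlate(events, session_id):
    """Return every event reachable from a session via shared trace ids."""
    by_trace = defaultdict(list)
    for event in events:
        if event["trace_id"] is not None:
            by_trace[event["trace_id"]].append(event)

    # Events that carry the session id directly (replay events, browser spans).
    direct = [e for e in events if e["session_id"] == session_id]
    trace_ids = {e["trace_id"] for e in direct if e["trace_id"] is not None}

    # Pull in backend logs/spans that only share a trace id with the session.
    related = list(direct)
    for trace_id in sorted(trace_ids):
        for event in by_trace[trace_id]:
            if event not in related:
                related.append(event)
    return related


timeline = correlate(EVENTS, "s1")
bodies = [e["body"] for e in timeline]
```

The backend log line never mentions the session; it is found only because the browser span and the log share a trace id, which is the "log <> trace <> replay" hop the post describes.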
        
       Author : mikeshi42
       Score  : 321 points
       Date   : 2023-09-18 16:25 UTC (6 hours ago)
        
        
       | lopkeny12ko wrote:
       | I remember when every SaaS landing page looked like Slack, then
       | they all looked like Stripe, and I guess now they all look like
       | Linear.
        
         | ilrwbwrkhv wrote:
         | Designers at startups are some of the most cargo culty groups
         | in tech
        
         | fuddle wrote:
         | Linear seems to be the latest trend. https://www.linears.art/ -
         | A collection of websites inspired by Linear
        
           | pests wrote:
           | I don't think the Linear trend is as strong as the Slack or
           | Stripe trend. I agree they look similar but I feel they all
           | just look like a standard modern page.
        
         | mikeshi42 wrote:
         | I designed our landing page - and I definitely took heavy
         | inspiration from Linear. As an engineer, creating novel
         | beautiful designs isn't first-nature to me, but I know how
         | critical it can be to make a clean/impactful landing page so I
         | try to take some elements from the best.
         | 
         | Some other landing pages I loved and had alongside while
         | designing ours were Vercel, Resend, and WorkOS :)
        
         | [deleted]
        
         | VTimofeenko wrote:
         | If the project comes close to linear.app's platform UI
         | responsiveness - wouldn't be a bad thing.
        
           | mikeshi42 wrote:
           | Let's say my React profiler tab gets lots of love - though if
           | you find anything sluggish, let us know and I'd love to fix
           | it. The last thing I want HyperDX to feel like is Jira
           | sluggishness.
        
       | btown wrote:
       | The union of session replay and OpenTelemetry is fascinating -
       | because what is a browser session, really, other than a sequence
       | of RPCs between backend (micro)services <-> API server(s) <->
       | browser <-> human at the keyboard?
       | 
       | Being able to see that a user bounced because they couldn't
       | handle the input that they were seeing - is it all that different
       | from a service erroring because it cannot handle a certain type
       | of input?
       | 
       | Honeycomb is great for the OpenTelemetry part on the server side
       | (and with
       | https://docs.honeycomb.io/getting-data-in/opentelemetry/brow...
       | is moving towards full-stack), and systems like Posthog and Heap
       | are great for sending session replay + browser events ->
       | Clickhouse. But I don't think I've seen a great DX that ties
       | everything together.
       | 
       | To that point - I would love to see different font/color options
       | for HyperDX: the monospaced font can become tiring to read when
       | so dense. Will be following this project closely though - this is
       | amazing work so far!
        
         | [deleted]
        
         | mikeshi42 wrote:
         | Oh yeah browsers are really just another service (and that's
         | what we try to treat it as, as well!) and it's really the same
         | set of questions you'd ask of any service, but for some reason
         | the tooling completely stops either at the frontend or at the
         | backend.
         | 
         | As for monospace font - feedback received! Is there a
         | particular section you think is too overwhelming? (search page,
         | nav bar, etc.) We've been thinking about how we can balance
         | the ease of monospace for reading against having it literally
         | be the default on every UI surface :P
        
       | mfkp wrote:
       | Looks very interesting, although a lot of the OpenTelemetry
       | libraries are incomplete:
       | https://opentelemetry.io/docs/instrumentation/
       | 
       | Especially Ruby, which is the one that I would be most interested
       | in using.
        
         | mikeshi42 wrote:
         | The OpenTelemetry ecosystem is definitely still young depending
         | on the language, but we have Ruby users onboard (typically
         | using OpenTelemetry for the tracing portion, and piping logs
         | via Heroku or something else via the regular Ruby logger).
         | 
         | Feel free to pop in on the Discord if you'd like to chat
         | more/share your thoughts!
        
         | [deleted]
        
       | specialist wrote:
       | First paragraph https://github.com/hyperdxio/hyperdx
       | 
       |  _" HyperDX helps engineers figure out why production is broken
       | faster by centralizing and correlating logs, metrics, traces,
       | exceptions and session replays in one place. An open source and
       | developer-friendly alternative to Datadog and New Relic."_
       | 
       | Just perfect. Bravo.
       | 
       | --
       | 
       | As a merc, I never understood the why of Datadog (or equiv). The
       | teams and projects I rotated thru each embraced the "LOG ALL THE
       | THINGS!" strategy. No guiding purpose, no esthetics. General
       | agreement about need to improve signal to noise ratio. But little
       | courage or gumption to act. And any such efforts would be easily
       | rebuffed by citing the parable of Chesterfordstorm's Fences of
       | Doom and something something about velocity.
       | 
       | Late last century, IT projects, like CRMs and ERPs, were plagued
       | by over collection of data. Opaque provenance, dubious (data)
       | quality, unclear ownership, subtractive value propositions (where
       | the whole is worth less than the parts). No, no, don't remove
       | that field. We might need it some day.
       | 
       | Today's "analytics" projects are the same, right? Every drive-by
       | stakeholder tosses in a few tags, some misc fields, a little
       | extra meta. And before anyone can say "kanban", the stone soup
       | accreted enough mass to become a gravity well threatening
       | implosion dragging the entire org-chart into the gaping maw of
       | our universe's newest black hole.
       | 
       | Am I wrong?
       | 
       | But logging is useful, right? Or at least has that potential.
       | 
       | The last time I designed a system end-to-end, that's kinda what
       | we did. Listed all the kinds of things we wanted to log. Sorta
       | settled on formats and content (never really ever done). Did
       | regular log bashes to explain and clear anomalies. Scripts for
       | grooming and archiving. (For one team I rotated thru, most of
       | their spend was on just cloudwatch. Hysterical.)
       | 
       | But my stuff wasn't B2C, so wasn't tainted by the attention
       | economy, manufactured outrage, or recommenders. No tags,
       | referrers, campaigns, etc. It was just about keeping the system
       | up and true. And resolving customer support incidents asap.
       | 
       | Does anyone talk or write about this? (Those SRE themed novels
       | are now buried deep in my to-read pile.)
       | 
       | I'd like some cookbooks or blueprints which show some idealized
       | logging strategies, with depictions of common enough
       | troubleshooting scenarios.
       | 
       | Having something authoritative to cite could reduce my resemblance
       | to an Eeyore. "Hey, team mates, you know what'd be really great?!
       | Correlation IDs! So we can see how an action percolates thru our
       | system!"
       | 
       | Just curious.
       | 
       | PS- Datadog's server hexagon map/chart thingie is something else.
       | The kind of innovation that wins prizes.
        
         | mikeshi42 wrote:
         | Yes! You should definitely be thoughtful about what you log and
         | how you expect to use it. My biggest gripe with logs is often
         | people writing them never think about "how would I use this
         | when things are on fire?" and tend to log useless information
         | or fail to tag them in ways that are actually searchable.
         | 
         | Tagging the right IDs is a huge thing - customer X is saying
         | their instance is really slow, but if none of your logs let you
         | link service performance to customer X, the telemetry you're
         | paying for is absolutely useless!
         | 
         | You have an ally in me on this one :) I'm hoping given a bit
         | more time we get to write things like this - practical
         | observability from the perspective of a dev, as opposed to the
         | SRE angle that I think is well covered. Feel free to join us on
         | discord btw if you want to chat more - I (for better/worse)
         | love musing about these things :)
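
One way to get the searchable tagging discussed above is to set a correlation ID once per request and let every log line inherit it. The sketch below uses only Python's standard `logging` and `contextvars` modules; the logger name, field names, and the `handle_request` function are hypothetical, and nothing here is specific to HyperDX or Datadog.

```python
import contextvars
import io
import json
import logging

# Context variable holding the current correlation ID (name is hypothetical).
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    """Attach the active correlation ID to every log record."""

    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line so fields stay searchable."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "correlation_id": record.correlation_id,
            "customer_id": getattr(record, "customer_id", None),
        })


stream = io.StringIO()  # stands in for stdout or a log shipper
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
handler.addFilter(CorrelationFilter())

logger = logging.getLogger("checkout")  # hypothetical service name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False


def handle_request(request_id, customer_id):
    """Set the ID once at the edge; every log below inherits it."""
    token = correlation_id.set(request_id)
    try:
        logger.info("request started", extra={"customer_id": customer_id})
        logger.info("charging card", extra={"customer_id": customer_id})
    finally:
        correlation_id.reset(token)


handle_request("req-123", "customer-x")
lines = [json.loads(line) for line in stream.getvalue().splitlines()]
```

With both IDs on every line, "customer X says their instance is slow" becomes a filter on `customer_id`, and following one action through the system becomes a filter on `correlation_id`.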
        
         | [deleted]
        
         | TheBengaluruGuy wrote:
         | > I'd like some cookbooks or blueprints which show some
         | idealized logging strategies, with depictions of common enough
         | troubleshooting scenarios.
         | 
         | > "Hey, team mates, you know what'd be really great?!
         | Correlation IDs! So we can see how an action percolates thru
         | our system!"
         | 
         | Hi, I'm building Doctor Droid -- https://drdroid.io/ -- which
         | enables you to join structured application logs via correlation
         | IDs and then build multiple types of rules / frameworks on it
         | -- some are at granular level and some are at aggregate levels
         | (like funnels).
         | 
         | We are early in the development lifecycle, would love to hear
         | your feedback / connect with you.
        
       | podoman wrote:
       | Looks very similar to what we're doing at https://highlight.io.
       | Would love to trade notes at some point.
       | 
       | One thing to consider with your messaging is that when you start
       | speaking to large companies, they won't see you as a datadog
       | alternative. They'll see you as a mix of sentry + fullstory +
       | honeycomb.
       | 
       | Datadog originally found its success with its metrics products,
       | and the larger the buyer of datadog gets, the more metrics-esque
       | use case a company finds. The session replay, logging and other
       | things are simply products that datadog tacks on.
       | 
       | That being said, this is clearly a large market (which is why
       | we're working on it). I particularly like the tracing UI that
       | y'all have and I'd love to chat with your team at some point.
       | Good luck.
        
         | distantsounds wrote:
         | You're charging for your product, this is MIT licensed. As the
         | meme goes, "we are not the same."
        
           | paulgb wrote:
           | Highlight is Apache-2, which is for all intents and purposes
           | equivalent to MIT if the work is not subject to patent. (this
           | is my understanding, IANAL)
        
           | podoman wrote:
           | As other commenters mentioned, we are both comparable
           | (pending your opinion on the MIT license).
           | 
           | We both charge a cloud saas fee as well:
           | 
           | https://www.hyperdx.io/pricing
           | https://www.highlight.io/pricing
        
           | endisneigh wrote:
           | they both charge money and they're both some variant of open
           | source.
        
         | [deleted]
        
       | dangoodmanUT wrote:
       | S3-backed CH merge trees are notoriously expensive due to the
       | high API call rates. We have a table doing over 11M API calls per
       | day. What are you seeing?
        
         | parhamn wrote:
         | Is anyone doing these on cloudflare r2 where the cost is
         | significantly lower?
        
           | mikeshi42 wrote:
           | I'd love to be using Cloudflare as our cloud provider, but it
           | didn't seem to make a lot of sense for our use case.
           | 
           | We were concerned with some of the performance benchmarks
           | we've seen with R2 in the past (though they've probably
           | improved), not to mention our compute options become a bit
           | more limited to bandwidth alliance clouds otherwise we'll be
           | eating network egress fees (which I do hate with a HUGE
           | passion).
           | 
           | Though I can imagine if you're comfortable with one of the
           | bandwidth alliance clouds already and can take a bit of a
           | perf hit for search, R2 and Backblaze both can provide some
           | cost savings depending on your workload.
        
           | cldellow wrote:
           | R2 is significantly cheaper for egress, but only marginally
           | cheaper for API calls, by about 10%:
           | 
           | - 1M GETs $0.36 (R2) vs $0.40 (S3)
           | 
           | - 1M PUTs $4.50 (R2) vs $5.00 (S3)
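
To put the per-request prices quoted above next to the 11M calls per day mentioned earlier in the thread, here is a rough back-of-the-envelope calculation. The split between GET-class and PUT-class requests is not given anywhere in the thread, so the sketch sweeps it as a parameter, and the 30-day month is an assumption.

```python
# Prices per 1M requests as quoted in the comment above (USD).
PRICE_PER_MILLION = {
    "s3": {"get": 0.40, "put": 5.00},
    "r2": {"get": 0.36, "put": 4.50},
}

CALLS_PER_DAY = 11_000_000  # figure quoted earlier in the thread
DAYS_PER_MONTH = 30         # assumption for a round monthly estimate


def monthly_cost(provider, put_fraction):
    """Estimate monthly request cost for a given share of PUT-class calls."""
    prices = PRICE_PER_MILLION[provider]
    calls = CALLS_PER_DAY * DAYS_PER_MONTH
    puts = calls * put_fraction
    gets = calls - puts
    return (gets * prices["get"] + puts * prices["put"]) / 1_000_000


for put_fraction in (0.0, 0.5, 1.0):  # the real mix is unknown
    s3 = monthly_cost("s3", put_fraction)
    r2 = monthly_cost("r2", put_fraction)
    print(f"PUT share {put_fraction:.0%}: S3 ${s3:,.2f}  R2 ${r2:,.2f}")
```

Under these assumptions the S3 request bill for that one table falls somewhere between about $132 (all GETs) and $1,650 (all PUTs) per month, and R2 comes out 10% lower for any mix, since both quoted prices differ by the same ratio.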
        
         | mikeshi42 wrote:
         | We use a mix of SSDs and S3 for storage depending on the
         | workload - as you're right, merging on S3 is awful and we try
         | to avoid it!
        
         | [deleted]
        
       | choppaface wrote:
       | what is DX?
       | 
       | why not grafana / prometheus / loki?
        
         | coel wrote:
         | DX is Developer eXperience
        
       | jamesmcintyre wrote:
       | This looks really promising, will definitely look into using this
       | for a project i'm working on! Btw I've used both datadog and
       | newrelic in large-scale production apps and for the costs I still
       | am not very impressed by the dx/ux. If hyperdx can undercut price
       | and deliver parity features/dx (or above) i can easily see this
       | doing well in the market. Good luck!
        
         | mikeshi42 wrote:
         | Thank you! Absolutely agree on Datadog/New Relic DX, I think
         | the funny thing we learned is that most customers of theirs
         | mention how few developers on their team actually comfortably
         | engage with either New Relic or Datadog, and most of the time
         | end up relying on someone to help get the data they need!
         | 
         | Definitely striving to be the opposite of that - and would love
         | to hear how it goes and any place we can improve!
        
         | Hamuko wrote:
         | Datadog feels like they've used a shotgun to shoot
         | functionality all over the place. New Relic felt a bit more
         | focused, but even then I had to go attend a New Relic seminar
         | to properly learn how to use the bloody thing.
        
       | fuddle wrote:
       | Congrats on the launch! Are you planning to release the cloud
       | features as source available or are they closed source?
        
         | mikeshi42 wrote:
         | Our cloud features are closed source in a downstream repo - I
         | think repos that have a very clear separation between OSS and
         | closed are best - this also enforces that our OSS is always a
         | fully-featured product that we develop on the OSS-only version
         | day to day, and our cloud features are only a minor addition on
         | top.
         | 
         | I've historically hit issues with repos that do an `ee` folder
         | and blur the line between what is truly open source and self-
         | hostable, vs need a license/cloud-only. I understand why they
         | do that, but I hope we don't replicate that confusion ourselves
         | :)
        
         | [deleted]
        
       | candiddevmike wrote:
       | Since this is MIT, someone should fork it and add SSO to the OSS
       | version/remove the SSO tax. Looks like they're just using
       | Passport for auth, shouldn't take much to enable the OAuth bits
       | of it.
       | 
       | That's why this is MIT right, so folks can contribute stuff like
       | this?
        
         | fuddle wrote:
         | The "SSO tax" is used to fund development of the project.
        
         | [deleted]
        
         | mikeshi42 wrote:
         | We're more than happy to have users self-host and deploy in a
         | way that works with their SSO provider! Whether that's via SSO
         | on Nginx or forking and adding SSO to Passport in their fork.
         | Depending on the provider, it's likely very straightforward to
         | do.
         | 
         | We did explicitly choose MIT for the freedom of end users to
         | deploy and modify the code how they want - and tried to open
         | source pretty much everything that doesn't have a hard 3rd
         | party dependency. We do touch a bit on how we think about the
         | open core model as well in the README, and largely align with
         | GitLab's stewardship model [1] when it comes to paid vs OSS. In
         | this case, a contribution to add SAML specifically to OSS will
         | likely not be merged. It'd also introduce complexities with
         | maintaining that alongside our cloud version that already
         | includes a specific implementation of SAML.
         | 
         | [1] https://handbook.gitlab.com/handbook/company/stewardship/
        
           | candiddevmike wrote:
           | Balancing open core needs is pretty much an impossible task
           | IMO. You will never do enough to placate your open source
           | users, and you will constantly be competing against yourself
           | and spending cycles on non-value add things. Your cloud
           | offering will be a huge time sink chasing regulatory
           | compliance, security, and data sovereignty needs as well.
           | It's for all these reasons that I personally think open core
           | with a SaaS model is no longer a sustainable option.
           | 
           | There's nothing wrong with asking folks to pay for software
           | instead of giving it away via FOSS, especially if you're
           | honest about your intentions and goals. When you choose FOSS
           | to gain traction and rug pull your users when no one converts
           | later on, you end up reaping what you sow.
        
             | freedomben wrote:
             | Just clarifying, your alternative to open core is open
             | nothing? Just proprietary it up?
        
               | candiddevmike wrote:
               | Alternatives depend on what the goals of the person or
               | organization who wrote the code are. There are various
               | FOSS and source available options that can grant some
               | freedoms while protecting others for the creator, such as
               | if they want to let users still contribute and view the
               | source.
               | 
               | My main point was you should get these ducks in order
               | first and be genuine with your intentions. Don't use FOSS
               | as a growth hack, it never ends well for the creator or
               | the user. I don't think HyperDX is genuine with their
               | intentions, as with all open core, it's all kumbaya FOSS
               | until you start encroaching on their enterprise feature
               | set.
        
               | mikeshi42 wrote:
               | I'd genuinely love to learn the OSS options we'd
               | have available here, as we'd genuinely want to build a
               | sustainable open source project and community, while
               | preserving as many user freedoms as possible.
               | 
               | I think that HyperDX is a bit different from tools like
               | Mongo, Redis or Hashicorp in that we're a vertically
               | integrated product from SDKs/UIs to ingestion pipeline
               | and DBs, which is the opposite kind of offering from those
               | of the above companies (which has made them more vulnerable
               | to the kind of rug pull you mentioned)
               | 
               | We're trying to be permissive with freedoms granted to
               | the user of our code, while still maintaining governance
               | over the project to make it sustainable.
               | 
               | We don't want to be source-available, as that's pretty
               | much the opposite of what we want to accomplish (and is
               | why we consciously did not pick a license such as
               | BSL/SSPL/etc.)
        
               | candiddevmike wrote:
               | > we'd genuinely want to build a sustainable open source
               | project and community
               | 
               | How do you plan on doing that while being VC-backed? Why
               | did you choose to be VC backed in the first place? You
               | can create a sustainable open source project and
               | community without any VC funding.
        
               | darkwater wrote:
               | Honest question: where do you see they are VC backed?
        
               | candiddevmike wrote:
               | It's on the front page of the app, the company behind
               | this (DeploySentinel) is YC backed:
               | https://www.crunchbase.com/organization/deploysentinel.
               | The original product seems like some kind of CI tool.
               | 
               | Interestingly, it seems like "HyperDX" might've been part
               | of their original product offering that they decided to
               | open source--their main website
               | (https://www.deploysentinel.com) doesn't include any
               | references to "HyperDX for CI" in May of 2023:
               | https://web.archive.org/web/20230321102146/https://www.deplo....
               | Seems like they're pivoting to metrics? Even more of a
               | reason to be wary about this.
        
               | danr4 wrote:
               | funnily enough i made a similar comment on an exact same
               | "OS" product:
               | https://news.ycombinator.com/item?id=36774611#36775934
        
               | freedomben wrote:
               | Nice thanks, I think you make some great points.
        
       | bg46z wrote:
       | For highly regulated workloads, would it be possible to have a
       | self-hosted version that is supported?
        
         | mikeshi42 wrote:
         | Absolutely! You can either self-host the OSS version today, or
         | chat with us (mike@hyperdx.io) directly if you need a managed
         | on-prem solution or any other custom requirements depending on
         | your deployment.
        
       | Dockson wrote:
       | Just want to heap on with the praise here and say that this was
       | definitely the best experience I've had with any tool trying to
       | add monitoring for a Next.js full-stack application. The Client
       | Sessions tab where I, out of the box, can correlate front-end
       | actions and back-end operations for a particular user is
       | especially nice.
       | 
       | Great job!
        
         | wrn14897 wrote:
         | Thank you. This means a lot to us.
        
         | [deleted]
        
       | hernantz wrote:
       | There is also SigNoz [0] solving the same problem with a similar
       | stack (OpenTelemetry and Clickhouse)
       | 
       | [0] https://github.com/SigNoz/signoz
        
         | [deleted]
        
       | kcsavvy wrote:
       | The session playback looks useful - I find this is missing from
       | many DD alternatives I have seen.
        
         | mikeshi42 wrote:
         | Absolutely! It's pretty magical to go from a user report ->
         | session replay -> exact API call being made and the backend
         | error logs.
         | 
         | We dogfood a ton internally and (while obviously biased) we're
         | always surprised how much faster we can pinpoint issues and
         | connect alarms with bug reports.
         | 
         | Hope you give us a spin and feel free to hop on our discord or
         | open an issue if you run into anything!
        
       | jefc1111 wrote:
       | Hey, cool product. I know that marketing success is not
       | predicated on good grammar, nevertheless I felt moved to suggest
       | a minor edit to your blurb:
       | 
       | "HyperDX helps engineers figure out why production is broken,
       | faster. HyperDX centralises and correlates logs, metrics, traces,
       | exceptions and session replays in one place."
       | 
       | Good luck!
        
         | mikeshi42 wrote:
         | Thank you! I'm assuming this is in reference to our README?
         | (Sorry I'm a _tad_ lacking in sleep)
         | 
         | If so, would you like to open a PR? I'm also happy to edit it
         | myself but of course don't want to be stealing credit if you'd
         | like to be attributed that way.
        
         | joshxyz wrote:
         | Everyone says that.
         | 
         | How about: "9 out of 10 devs are now pushing to prod on
         | fridays. Thanks to HyperDX. Hehe."
        
           | quintes wrote:
           | This helps coordinate Spotify releases on Fridays to get on
           | the release radar?
        
         | [deleted]
        
       | vadman97 wrote:
       | How do you think about the query syntax? Are you defining your
       | own or are you following an existing specification? I
       | particularly love the trace view you have, connecting a frontend
       | HTTP request to server side function-level tracing.
        
         | mikeshi42 wrote:
         | This one is a fun one that I've spent too many nights on -
         | we're largely similar to Google-style search syntax (bare
         | terms, "OR" "AND" logical operators, and property:value kind of
         | search).
         | 
         | We include a "query explainer" - which translates the parsed
         | query AST into something more human readable under the search
         | bar, hopefully giving good feedback to the user on whether
         | we understand their query or not. Though there's lots of
         | room to improve here!
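
A toy version of the idea described above (bare terms, AND/OR operators, property:value pairs, and an "explainer" that renders the parsed AST back as readable text) might look like the following. The grammar, the tuple-based AST, and the wording of the explanation are all assumptions for illustration; this is not HyperDX's parser, which per the thread is built on a fork of a `lucene` grammar.

```python
import re

# Quoted phrases, parentheses, or runs of non-space characters.
TOKEN = re.compile(r'"[^"]*"|\(|\)|[^\s()]+')


def parse(query):
    """Parse into a nested tuple AST: ('or'|'and', left, right) or a leaf."""
    tokens = TOKEN.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():
        node = parse_and()
        while peek() == "OR":
            take()
            node = ("or", node, parse_and())
        return node

    def parse_and():
        node = parse_atom()
        # Adjacent terms are treated as an implicit AND.
        while peek() not in (None, "OR", ")"):
            if peek() == "AND":
                take()
            node = ("and", node, parse_atom())
        return node

    def parse_atom():
        token = take()
        if token == "(":
            node = parse_or()
            if take() != ")":
                raise ValueError("expected closing parenthesis")
            return node
        if token.startswith('"'):
            return ("term", token.strip('"'))
        if ":" in token:
            prop, value = token.split(":", 1)
            return ("prop", prop, value)
        return ("term", token)

    ast = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]}")
    return ast


def explain(node):
    """Render the AST as a human-readable description of the search."""
    kind = node[0]
    if kind == "term":
        return f'text contains "{node[1]}"'
    if kind == "prop":
        return f'{node[1]} is "{node[2]}"'
    joiner = " AND " if kind == "and" else " OR "
    return "(" + explain(node[1]) + joiner + explain(node[2]) + ")"


ast = parse('level:error timeout OR "connection reset"')
explanation = explain(ast)
```

Showing `explanation` under the search bar makes the precedence visible: here the implicit AND binds tighter than OR, which a user might not expect. This sketch does not handle quoted property values or negation.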
        
           | gajus wrote:
           | Potentially useful resource - https://github.com/gajus/liqe
        
             | mikeshi42 wrote:
             | I've tried liqe! I really wanted to love it - and I think
             | it's amazing for the use case you've built it for, but I
             | recall we ran into a few fatal issues (maybe it was
             | supporting URLs or something as a property value?) and had
             | to fork one of the `lucene` forks to get the grammar that
             | we wanted.
             | 
             | Edit: happy to chat more about it as well if you're looking
             | for more specific feedback - it's an area I've spent a
             | decent amount of time on and would love to improve projects
             | like liqe or others based on our experience if we can.
        
       | t1mmen wrote:
       | This looks really cool, congrats on the launch!
       | 
       | I haven't had time to dig in proper, but this seems like
       | something that would fit perfectly for "local dev" logging as
       | well. I struggled to find a good solution for this, ending up
       | with Winston -> JSON, with a simpler "dump to terminal" script
       | running.
       | 
       | (The app I'm building does a ton of "in the background" work, and
       | I wanted to present both "user interactions" and "background
       | worker" logs in context)
       | 
       | I don't see Winston being supported as a transport, but
       | presumably easy to add/contribute.
       | 
       | Good luck!
        
         | mikeshi42 wrote:
         | Thank you! We do support Winston (docs:
         | https://www.hyperdx.io/docs/install/javascript#winston-trans...)
         | and use it a lot internally. Let me know if you run into any
         | issues with it (or have suggestions on how to make it more
         | clear)
         | 
         | In fact this is actually how we develop locally - because even
         | our local stack is comparatively noisy, we enable self-logging
         | in HyperDX so our local logs/traces go to our own dev instance,
         | and we can quickly trace a 500 that way. (Literally was doing
         | this last night for a PR I'm working on).
        
           | t1mmen wrote:
           | Oh sweet! I was in a bit of a hurry and must've missed it,
           | thanks for clarifying. This will be super helpful for us,
           | very excited to play with it!
        
             | mikeshi42 wrote:
             | No worries - excited to hear what you think! Feel free to
             | drop by our discord if you run into any issues or have any
             | other feedback as well :)
        
       | dgoncharov wrote:
       | This could be huge for healthcare companies like Metriport [1] -
       | do you sign BAAs with customers for HIPAA compliance?
       | 
       | [1] https://github.com/metriport/metriport
        
         | mikeshi42 wrote:
         | Definitely familiar with the compliance needs there - more than
         | happy to chat further about BAAs and HIPAA compliance
         | requirements with you guys. Always love partnering with others
         | in the OSS space :)
        
         | [deleted]
        
       | pranay01 wrote:
       | Congrats on the launch!
       | 
       | Do also check out SigNoz [1] We are working on a similar problem
       | statement ;)
       | 
       | [1] https://github.com/signoz/signoz
        
         | [deleted]
        
       | nodesocket wrote:
       | Congrats on the launch. Perhaps I missed it, but what are the
       | system requirements to run the self-hosted version? Seems
       | decently heavy (Clickhouse, MongoDB, Redis, HyperDX services)? Is
       | there a Helm chart to install into k8s?
       | 
       | Look forward to the syslog integration which says coming soon. I
       | have a hobby project which uses systemd services for each of my
       | Python apps and the path of least resistance is to just ingest
       | syslog (aware that I lose stack traces, session replay, etc).
        
         | mikeshi42 wrote:
         | The absolute bare minimum I'd say is 2GB RAM, though in the
         | README we do say 4GB and 2 cores for testing, obviously more if
         | you're at scale and need performance.
         | 
         | For Syslog - it's something we're actually pretty close to
         | because we already support Heroku's syslog based messages
         | (though it's over HTTP), but we largely need to test that the
         | otel Syslog receiver + parsing pipeline will translate as well
         | as it should (PRs always welcome of course, but it shouldn't be
         | too far out from now ourselves :)). I'm curious, are you using
         | TLS/TCP syslog or plain TCP or UDP?
         | 
         | Here's my docker stats on a x64 linux VM where it's doing some
         | minimal self-logging, I suspect the otel collector memory can
         | be tuned down to bring the memory usage closer to 1GB, but this
         | is the default out-of-the-box stats, and the miner can be
         | turned off if log patterns isn't needed:
         | 
         | CONTAINER ID   NAME                        CPU %   MEM USAGE / LIMIT     MEM %    NET I/O           BLOCK I/O         PIDS
         | 439e3f426ca6   hdx-oss-miner               0.89%   167.2MiB / 7.771GiB   2.10%    3.25MB / 6.06MB   8.85MB / 0B       21
         | 7dae9d72913d   hdx-oss-task-check-alerts   0.03%   83.65MiB / 7.771GiB   1.05%    6.79MB / 9.54MB   147kB / 0B        11
         | 5abd59211cd7   hdx-oss-app                 0.00%   56.32MiB / 7.771GiB   0.71%    467kB / 551kB     6.23MB / 0B       11
         | 90c0ef1634c7   hdx-oss-api                 0.02%   93.71MiB / 7.771GiB   1.18%    13.2MB / 7.87MB   57.3kB / 0B       11
         | 39737209c58f   hdx-oss-hostmetrics         0.03%   72.27MiB / 7.771GiB   0.91%    3.83GB / 173MB    3.84MB / 0B       11
         | e13c9416c06e   hdx-oss-ingestor            0.04%   23.11MiB / 7.771GiB   0.29%    73.2MB / 89.4MB   77.8kB / 0B       5
         | 36d57eaac8b2   hdx-oss-otel-collector      0.33%   880MiB / 7.771GiB     11.06%   104MB / 68.9MB    1.24MB / 0B       11
         | 78ac89d8e28d   hdx-oss-aggregator          0.07%   88.08MiB / 7.771GiB   1.11%    141MB / 223MB     147kB / 0B        11
         | 8a2de809efed   hdx-oss-redis               0.19%   3.738MiB / 7.771GiB   0.05%    4.36MB / 76.5MB   8.19kB / 4.1kB    5
         | 2f2eac07bedf   hdx-oss-db                  1.34%   75.62MiB / 7.771GiB   0.95%    105MB / 3.79GB    1.32MB / 246MB    56
         | 032ae2b50b2f   hdx-oss-ch-server           0.54%   128.7MiB / 7.771GiB   1.62%    194MB / 45MB      88.4MB / 65.5kB   316
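
As a rough cross-check of the "2GB bare minimum" figure, the MEM USAGE column in the stats above can simply be added up. A minimal Python sketch, with the values copied by hand from that output (the dictionary and function names are only for illustration):

```python
# Memory usage per container in MiB, copied from the docker stats output above.
MEM_USAGE_MIB = {
    "hdx-oss-miner": 167.2,
    "hdx-oss-task-check-alerts": 83.65,
    "hdx-oss-app": 56.32,
    "hdx-oss-api": 93.71,
    "hdx-oss-hostmetrics": 72.27,
    "hdx-oss-ingestor": 23.11,
    "hdx-oss-otel-collector": 880.0,
    "hdx-oss-aggregator": 88.08,
    "hdx-oss-redis": 3.738,
    "hdx-oss-db": 75.62,
    "hdx-oss-ch-server": 128.7,
}


def total_mib(usage, exclude=()):
    """Sum memory usage in MiB, skipping any containers named in `exclude`."""
    return sum(mib for name, mib in usage.items() if name not in exclude)


if __name__ == "__main__":
    total = total_mib(MEM_USAGE_MIB)
    no_miner = total_mib(MEM_USAGE_MIB, exclude=("hdx-oss-miner",))
    print(f"all containers: {total:.1f} MiB ({total / 1024:.2f} GiB)")
    print(f"without miner:  {no_miner:.1f} MiB ({no_miner / 1024:.2f} GiB)")
```

That works out to roughly 1.6 GiB with everything running, and the otel collector alone accounts for a bit over half of it, which fits the remark that tuning the collector is the main lever for getting closer to 1GB.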
        
           | nodesocket wrote:
           | Thanks for the reply and providing detailed system
           | requirements and docker stats. Seems I missed the note in the
           | README. :-)
           | 
           | Actually I am not really using syslog per se, but systemd
           | journalctl, whose default behaviour on Debian (rsyslog) also
           | duplicates to /var/log/syslog.
           | 
           | StandardOutput=journal
           | StandardError=journal
           | 
           | Is there a better integration to pull logs from my systemd
           | services and journalctl up to HyperDX?
        
             | mikeshi42 wrote:
             | Ah yeah the easiest way is probably using the OpenTelemetry
             | collector to set up a process to pull your logs out of
             | journald and send them via otel logs to HyperDX (or
             | anywhere else that speaks otel) - the docs might be a bit
             | tricky to go around depending on your familiarity with
             | OpenTelemetry but this is what you'd be looking for:
             | 
             | https://github.com/open-telemetry/opentelemetry-collector-
             | co...
             | 
             | Happy to dive more into the discord too if you'd like!
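
The route suggested above is the OpenTelemetry collector, and the sketch below is not a replacement for it. It only illustrates, in plain Python, what the journal data looks like before something ships it onward: `journalctl -o json` prints one JSON object per line, and fields such as `MESSAGE`, `PRIORITY`, `_SYSTEMD_UNIT` and `__REALTIME_TIMESTAMP` map naturally onto a log record. The output keys (`service`, `severity`, `body`) and the unit name `myapp.service` are made up for the example.

```python
import json
from datetime import datetime, timezone

# syslog priority numbers (journald's PRIORITY field) mapped to names
SEVERITY = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def parse_journal_line(line):
    """Turn one line of `journalctl -o json` output into a small log record."""
    entry = json.loads(line)
    # __REALTIME_TIMESTAMP is microseconds since the Unix epoch, as a string
    micros = int(entry["__REALTIME_TIMESTAMP"])
    ts = datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)
    priority = int(entry.get("PRIORITY", 6))  # assumption: no priority means "info"
    return {
        "timestamp": ts.isoformat(),
        "service": entry.get("_SYSTEMD_UNIT", "unknown"),
        "severity": SEVERITY[priority],
        "body": entry.get("MESSAGE", ""),
    }


if __name__ == "__main__":
    # Hypothetical entry shaped like journalctl's JSON output.
    sample = json.dumps({
        "__REALTIME_TIMESTAMP": "1695078000000000",
        "_SYSTEMD_UNIT": "myapp.service",
        "PRIORITY": "3",
        "MESSAGE": "Traceback (most recent call last): ...",
    })
    print(parse_journal_line(sample))
```

Lines could be fed in from something like `journalctl -f -o json -u myapp.service`. Real entries carry many more fields, and `MESSAGE` is not always a plain string, so anything beyond a hobby setup is probably better served by the collector.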
        
       | codegeek wrote:
       | How are you different compared to similar tools like signoz ?
        
         | [deleted]
        
         | mikeshi42 wrote:
         | Overall we're highly focused on providing solid developer
         | workflows, ex. with HyperDX users can correlate a log to a
         | trace (and vice-versa) really easily in the same UI, we don't
         | silo out features that are commonly needed in a single
         | workflow. You can also search everything from a single panel,
         | whether it's a log, trace, or client-side event, using the same
         | syntax which means there's less to learn.
         | 
         | Feature-to-feature, I'd say the things we do better are
         | browser-side monitoring (session replay) and event
         | patterns/clustering, and we have first-party SDKs built on
         | OpenTelemetry to make the setup a lot easier than vanilla
         | OpenTelemetry.
         | 
         | I think Signoz has built a nice one-stop platform for
         | observability, whereas we go one step further and focus on the
         | developer experience to ensure anyone can fully leverage that
         | observability data!
        
       | addisonj wrote:
       | Wow, there is a lot here, and what's here is at a pretty
       | impressive level of polish for how far along this is.
       | 
       | The background of someone with a DX background comes through! I
       | will be looking into this a lot more.
       | 
       | Here are a few comments, notes, and questions:
       | 
       | * I like the focus on DX (especially compared to other OSS
       | solutions) in your messaging here, and I think your hero
       | messaging tells that story, but it isn't reinforced as much
       | through the features/benefits section
       | 
       | * It seems like clickhouse is obviously a big piece of the tech
       | here, which is an obvious choice, but from my experience with
       | high data rate ingest, especially logs, you can run into issues
       | at larger scale. Is that something you expect to give options
       | around in open source? Or is the cloud backend a bit different
       | where you can offer that scale without making open source so
       | complex?
       | 
       | * I saw what is in OSS vs cloud and I think it is a reasonable
       | way to segment, especially multi-tenancy, but do you see the
       | split always being more management/security features? Or are you
       | considering functional things? Especially with recent HashiCorp
       | "fun" I think more and more it is useful to be open about what
       | you think the split will be. Obviously that will evolve, but I
       | think that sort of transparency is useful if you really want to
       | grow the OSS side
       | 
       | * on OSS, I was surprised to see MIT license. This is full
       | featured enough and stand alone enough that AGPL (for server
       | components) seems like a good middle ground. This also gives some
       | options for potentially a license for an "enterprise" edition, as
       | I am certain there is a market for a modern APM that can run all
       | in a customer environment
       | 
       | * On that note, I am curious what your target persona and GTM
       | plan is looking like? This space is a bit tricky IMHO, because
       | small teams have so many options at okay price points, but the
       | enterprise is such a difficult beast in switching costs. This
       | looks pretty PLG focused atm, and I think for a first release it
       | is impressive, but I am curious to know if you have more you are
       | thinking to differentiate yourself in a pretty crowded space.
       | 
       | Once again, really impressive what you have here and I will be
       | checking it out more. If you have any more questions, happy to
       | answer in thread or my email is in profile.
        
         | dangoodmanUT wrote:
         | For clickhouse, just batch insert. They probably have something
         | batching every few s before inserting directly to their hosted
         | version
        
           | dangoodmanUT wrote:
           | There's also async inserts
        
           | vadman97 wrote:
           | ClickHouse Async insert docs [1].
           | 
           | We ran into some challenges with async inserts at
           | highlight.io [2]. Namely, ClickHouse Cloud has an async flush
           | size configured (that can't be changed AFAIK) that isn't
           | large enough for our scale. Once you async insert more than
           | can be flushed, you get back pressure on your application
           | waiting to write while Clickhouse flushes the queue. We found
           | that implementing our own batched flushing via kafka [3] is
           | far more performant, allowing us to insert 500k+ RPS on the
           | smallest cloud instance type.
           | 
           | [1] https://clickhouse.com/docs/en/optimize/asynchronous-
           | inserts [2] https://github.com/highlight/highlight/tree/main
           | [3] https://github.com/highlight/highlight/blob/4d28451b19357
           | 96d...
        
           | addisonj wrote:
           | Generally, any sort of async/batch inserts will get you
           | _decently_ far, but will still have limitations well before
           | you get to a million rows a second, mostly because it is
           | really difficult to get your batch size large enough from
           | individual producers without some sort of aggregation, and
           | that aggregation is a challenge if you care about durability.
           | 
           | So often that means you need something like a Kafka to get
           | the bulk ingest to really perform to get batch sizes large
           | enough.
           | 
           | That kind of gets into one of the challenges of OSS
           | observability systems, you don't want to make the
           | dependencies insane for someone who only has a few thousand
           | logs a second, but generally at some point of scale you do
           | need more.
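
To make the batching idea in this sub-thread concrete, here is a minimal in-memory sketch in Python. It is not how HyperDX, highlight.io or ClickHouse do it; `flush_fn` is a stand-in for whatever performs the bulk insert, and the thresholds are arbitrary. Rows are buffered and handed off once the batch is large enough or old enough.

```python
import time


class Batcher:
    """Buffer rows in memory and hand them to `flush_fn` in bulk."""

    def __init__(self, flush_fn, max_rows=1000, max_age_s=2.0, clock=time.monotonic):
        self.flush_fn = flush_fn    # hypothetical sink, e.g. a bulk INSERT
        self.max_rows = max_rows    # flush once this many rows are buffered
        self.max_age_s = max_age_s  # ...or once the oldest row is this old
        self.clock = clock
        self.rows = []
        self.oldest = None          # arrival time of the first buffered row

    def add(self, row):
        if not self.rows:
            self.oldest = self.clock()
        self.rows.append(row)
        too_big = len(self.rows) >= self.max_rows
        too_old = self.clock() - self.oldest >= self.max_age_s
        if too_big or too_old:
            self.flush()

    def flush(self):
        if self.rows:
            batch, self.rows = self.rows, []
            self.flush_fn(batch)


if __name__ == "__main__":
    inserted = []  # collects each batch that would have been written
    batcher = Batcher(inserted.append, max_rows=3)
    for i in range(7):
        batcher.add({"n": i})
    batcher.flush()  # drain whatever is left on shutdown
    print([len(batch) for batch in inserted])
```

The age check only runs when a new row arrives, so a quiet producer needs a timer or an explicit `flush()`. More importantly, everything buffered here is lost if the process dies, which is the durability concern raised above and the reason a log like Kafka tends to show up in front of the database at higher volumes.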
        
         | fnord77 wrote:
         | Clickhouse is proprietary, though.
         | 
         | I wonder why not Apache Druid
        
         | mikeshi42 wrote:
         | Thank you, really appreciate the feedback and encouragement!
         | 
         | > It seems like clickhouse is obviously a big piece of the tech
         | here, which is an obvious choice, but from my experience with
         | high data rate ingest, especially logs, you can run into issues
         | at larger scale. Is that something you expect to give options
         | around in open source?
         | 
         | Scaling any system can be challenging - luckily, our experience
         | so far is that Clickhouse has a fraction of the overhead that
         | systems like Elasticsearch have previously demanded. That being
         | said, I think there's always going to be a combination of
         | learnings we'd love to open source for operators that are self-
         | hosting/managing Clickhouse, and tooling we use internally that
         | is purpose-built for our specific setup and workloads.
         | 
         | > I saw what is in OSS vs cloud and I think it is a reasonable
         | way to segment, especially multi-tenancy, but do you see the
         | split always being more management/security features?
         | 
         | Our current release - we've open sourced the vast majority of
         | our feature set, including I think some novel features like
         | event patterns that typically are SaaS-only and that'll
         | definitely be the way we want to continue to operate. Given the
         | nature of observability - we feel comfortable continuing to
         | keep pushing a fully-featured OSS version while having a
         | monetizable SaaS that focuses on the fact that it's completely
         | managed, rather than needing to gate heavily based on features.
         | 
         | > on OSS, I was surprised to see MIT license
         | 
         | We want to make observability accessible and we think AGPL will
         | accomplish the opposite of that. While we need to make money at
         | the end of the day - we believe that a well-positioned
         | enterprise + cloud offering is better suited to pull in those
         | that are willing to pay, rather than forcing it via a license.
         | I also love the MIT license and use it whenever I can :)
         | 
         | > On that note, I am curious what your target persona and GTM
         | plan is looking like?
         | 
         | I think for small teams, imo the options available are largely
         | untantalizing, it ranges from narrow tools like Cloudwatch to
         | enterprise-oriented tools like New Relic or Datadog. We're
         | working hard to make it easier for those kinds of teams to
         | adopt good monitoring and observability from day 1, without the
         | traditional requirement of needing an observability expert or
         | dedicated SRE to get it set up. (Admittedly, we still have a
         | ways to improve today!) On the enterprise side, switching costs
         | are definitely high, but most enterprises are highly
         | decentralized in decision making, where I routinely hear F500s
         | having a handful of observability tools in production at a
         | given time! I'll say it's not as locked-in as it seems :)
        
           | addisonj wrote:
           | Thanks for the answers Mike!
           | 
           | One more follow-up on the scale side (which I mentioned in a
           | sibling comment): it isn't so much about clickhouse itself,
           | but about scaling up ingest. From my own experience and from
           | talking with quite a few APM players (I previously worked in
           | streaming space), a Kafka / durable log storage kind of
           | becomes a requirement, so I was curious if you think at some
           | point you need a log to further scale ingest.
           | 
           | For enterprise side, I was previously in data streaming space
           | and had quite a few conversations with APM players and
           | companies building their own observability platforms, happy
           | to chat and share more if that would be useful!
        
             | mikeshi42 wrote:
             | Ah got it, yeah a queue of some sort is definitely useful
             | when scaling up to buffer pre-inserted data. This is
             | something on the OSS side we've kept open to
             | implementation. However it's something that is highly
             | coupled with infra footprint and internal SLA guarantees
             | the user wants to preserve. It can range anywhere from just
             | relying on client-side retries to setting up a HA Kafka
             | cluster early in the ingestion pipeline.
             | 
             | Similar to Elastic - I think a lot of architectures are
             | available to choose on that side when users want to scale.
             | 
             | Will reach out to connect!
        
       | vosper wrote:
       | We've seen a fair few "Datadog alternatives" on HN over the
       | years. Does that mean that Datadog is the reference or gold-
       | standard system to beat, or to compare your product to?
       | 
       | Kind of like how people mostly promote "Elasticsearch
       | alternatives" and not "Solr alternatives".
        
         | [deleted]
        
         | mikeshi42 wrote:
         | It's a pretty scattered landscape with everyone wanting
         | something slightly different, but everyone has likely heard of
         | Datadog at one point or another (whether they wanted to or
         | not... but that's another story).
         | 
         | It becomes convenient short-hand for what they do (collect
         | logs, metrics, traces, RUM, etc. for engineers to debug).
         | 
         | Though with more characters to write, I'd like to think we have
         | a different take on both how our pricing model works and how
         | easy it should be for an engineer to get started with us :)
        
         | viraptor wrote:
         | It's a relatively ok priced system which has almost everything:
         | server and client performance, alerts, dashboards, logs,
         | profiling, tracing, etc. It's not amazing and has some issues,
         | but it's one place to get lots of things you want and it's good
         | enough for many. I wouldn't say gold-standard, but rather a
         | benchmark for "you have to be this tall to play the
         | observability product game".
        
       | robertlagrant wrote:
       | If you want my two Datadog favourite features, they were: 1)
       | clicking on a field and making it a custom search dimension in
       | another click, and 2) flame graphs. Delicious flame graphs.
        
         | [deleted]
        
         | mikeshi42 wrote:
         | We should have both! If you hover over a property value, a
         | magnify/plus icon comes up to allow you to search on that
         | property value (no manual facets required) - and our traces all
         | come with delicious flame graphs :) Let me know if you were
         | thinking of something different.
         | 
         | One other thing I think you'd love if you're coming from
         | Datadog is that you're able to full text search on structured
         | logs as well, so even if the value you're looking for lives in
         | a property, it's still full text searchable (this is a huge
         | pain we hear from other Datadog users)
         | 
         | If there's anything you love/hate about Datadog - would love to
         | learn more!
        
           | robertlagrant wrote:
           | Well - the worst thing about Datadog is the sales process :-)
           | But I'll save that for my memoirs. I seem to remember at the
           | time their K8s/Helm integration was a little buggy, but no
           | other pain than that. Plugging our software in was very easy,
           | I recall. We had Python in the backend and we just installed
           | their software and wired it into our API services. I also
           | remember they had a consumer for Auth0 via Auth0's log
           | streaming feature, which we were using at the time.
           | 
           | Btw I haven't checked your product out yet; I was just
           | reminiscing :-) I'll take a look soon.
        
             | mikeshi42 wrote:
             | Awesome, let me know what you think when you get a chance
             | to take a look!
        
       | user3939382 wrote:
       | I'm interested. Datadog is cool but the price is ridiculously
       | high for small orgs.
        
         | thelastparadise wrote:
         | Is prometheus/grafana still the recommended FOSS solution?
        
           | bovermyer wrote:
           | Prometheus isn't quite enough on its own. You need
           | Prometheus, Grafana, Tempo, Loki, Faro, and Pyroscope to get
           | close to Datadog's feature set.
        
           | gazby wrote:
           | I believe so, but have recently stumbled upon Netdata which
           | scratches the "I don't want to maintain an entire monitoring
           | stack for these few boxes" kind of itch. Need to work with it
           | some more to nail down the trade-offs.
        
         | mikeshi42 wrote:
         | Agreed! Its per-host pricing can obliterate budgets if you use
         | a fleet of small instances (which is crazy to me their pricing
         | dictates your infra...)
         | 
         | Would love to have you check us out! Let me know if you run
         | into any issues - feel free to hop on our discord as well :)
        
         | [deleted]
        
       ___________________________________________________________________
       (page generated 2023-09-18 23:00 UTC)