[HN Gopher] Monitoring tiny web services ___________________________________________________________________ Monitoring tiny web services Author : mfrw Score : 108 points Date : 2022-07-09 17:29 UTC (5 hours ago) (HTM) web link (jvns.ca) (TXT) w3m dump (jvns.ca) | dafelst wrote: | I like this apparent shift back to "small is okay", where not | every service has to be an overengineered, allegedly hyper-scalable | distributed mess with five-nines uptime and enterprise | logging, alerting and monitoring. | | Those things are nice when you have a bazillion users and | downtime means hordes of unhappy users and dollars flushing away | at insane rates, but for the vast majority of hobby projects and | even mid-stage startups, what is described in this article is | plenty good enough. | is_true wrote: | I've thought about posting an Ask HN about simple infrastructure | for some time, but I'm not sure how to word it to attract as | many responses as possible. | rozenmd wrote: | My particular favourite is how GraphQL servers respond with "200 | OK" and send the errors in a key called "errors". That makes | regular healthchecks almost useless. | | I ended up writing my own service[0] to detect problems with | GraphQL responses, before expanding it to cover websites and web | apps too. | | -[0]: https://onlineornot.com | BiteCode_dev wrote: | GitHub answers 404 instead of a 403 when you try to access a | private repository while not being logged in. | | I assume the rationale is to not leak information about what's | private. But still, it's weird. | [deleted] | dmlittle wrote: | AWS S3 does the opposite when querying objects that don't | exist. If you don't have the s3:ListBucket permission on the | bucket, you'll get a 403 error (you can't differentiate | between the object not existing and you not having access to | it). | | I think either approach is valid as long as you're | consistent. You can make a case for either a 404 or a 403 when | you don't have enough permissions.
In GitHub's case you can | argue that a 404 makes sense because, from your auth context, | the resource does not exist. In AWS' case you can argue | that a 403 makes sense because you don't have permission to | know the answer to your query. | OJFord wrote: | I honestly hate that so much; it's a relief to read someone | saying the same. | | I sort of almost made myself feel a bit better about it by | thinking 'no, it's not REST, we _have_ reached the graphql | server successfully and got a .. "successful" response from | _it_ , it's sort of a "Layer 8" on top of HTTP'. The problem is | that none of the bloody tooling is 'Layer 8', so you end up in | browser dev tools with all these 200 responses and no idea | which ones are errorful. If any. | bdd wrote: | Google's uptime monitoring also allows writing JSONPath checks, | so one can monitor HTTP 200 JSON responses semantically. | KronisLV wrote: | Currently I've got the cheapest VPS that I could find (in my case from | Time4VPS; others might prefer Hetzner, or Scaleway Stardust | instances) and have set up Uptime Kuma on it | (https://github.com/louislam/uptime-kuma); I now have checks every | 5 minutes against 30+ URLs (could easily check each minute, but | don't need that sort of resolution yet). | | It's integrated with Mattermost currently and seems to work pretty | well. Could also set it up on another VPS, for example on Hetzner | (which also has excellent pricing), and could integrate another | alerting method such as sending e-mails, or anything else that's | supported out of the box: https://github.com/louislam/uptime-kuma/issues/284 | | Oh, also Zabbix for the servers themselves.
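A healthcheck for the GraphQL situation described upthread has to parse the response body instead of trusting the status line. A minimal sketch in Python, assuming a generic endpoint that accepts POSTed queries (the trivial `__typename` query and the URL handling are illustrative, not from the thread):

```python
import json
import urllib.request

def is_healthy(status: int, body: bytes) -> bool:
    # A GraphQL server can answer 200 OK while the payload carries
    # failures, so inspect the body, not just the status code.
    if status != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    # Per the GraphQL spec, failures appear under a top-level "errors" key.
    return isinstance(payload, dict) and not payload.get("errors")

def check(url: str, query: str = "{ __typename }") -> bool:
    # POST a trivial query and apply the body-aware health test.
    req = urllib.request.Request(
        url,
        data=json.dumps({"query": query}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return is_healthy(resp.status, resp.read())
```

The point is that `is_healthy` rejects a 200 response whose payload carries a top-level "errors" key, which is how GraphQL reports failures regardless of the HTTP status.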
Honestly, if things | are as simple to set up as they are nowadays and you have about 50 EUR per | year per node that you want (1 is usually enough, 2 is better | from a redundancy standpoint, since then it becomes feasible to | monitor the monitoring; others might go for 3 nodes for important | things, etc.), you don't even need to look at cloud services or | complex systems out there. | | Of course, if someone knows of some affordable options for cloud | services, feel free to share! | | I briefly checked the prices for a few and most of them are a | little bit more expensive than just getting a VPS, setting up | sshd to only use key-based auth, throwing Let's Encrypt in front | of the web UI (or maybe additional auth, or making it accessible | only through a VPN, whatever you want), adding fail2ban and | unattended updates, and doing some other basic configuration that | you probably have automated anyway. | | The good news is that if you prefer cloud services and would | rather have that piece of your setup be someone else's problem, | they're not even an order of magnitude off in most cases - though | I've yet to see how Uptime Kuma in particular scales once I get | to 100 endpoints. It seems like at a certain scale it's a bit | cheaper to run your own monitoring, but at that point you might | still find it easier to just pay a vendor. | | At the end of the day, there are lots of great options out there, | both cloud-based and self-hosted, whichever is your personal | preference.
| | I guess I'd personally also mention Contabo as an affordable | host in general (though their web UI is antiquated), | especially their storage nodes: | https://contabo.com/en/storage-vps/ | | For the most part, though, use whichever host you've been | with for a few years (though feel free to experiment with | whatever new platforms catch your eye), but ideally still | keep local backups of everything (as long as you don't have | to deal with regulations that'd make that impossible) so you | can migrate elsewhere. | tatoalo wrote: | I have been using cronitor[0] for a few months now and I have | been really satisfied with it so far! | | [0]: https://cronitor.io | pkrumins wrote: | If you have a popular service, then one of the best approaches is | to have your users notify you when something is down or broken. | This pattern follows the famous quote: "Given enough | eyeballs, all bugs are shallow." I have employed this approach to | great success and haven't had a need for any monitoring services. | redleader55 wrote: | If users see the problem, it is too late. You will be seen as | unable to keep the service up and the service will be seen as | flaky. | | Also, the Holy Grail of monitoring is being able to remediate | the problem automatically - which is pretty hard when users are | the ones reporting it. | dimitar wrote: | If I have to do one thing to monitor a simple website, I'm | probably going to use something that takes a screenshot | periodically and checks it for changes. There are open-source | solutions, but I just prefer to pay a bit for a managed service to | do it. | | I think it covers quite a lot of things - the servers are up, DNS | is OK, assets are OK. It can also be a safety net in case | other, more sophisticated monitoring fails to detect an unusual | state. | | This doesn't work well for websites with too much javascript, ads | or widgets. | radus wrote: | What are the OSS solutions for this?
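The screenshot-diff idea above can be approximated in the DIY spirit of this thread by fingerprinting each fetch and comparing across runs. A rough Python sketch (hashing the raw HTML is my substitution for actual screenshots, and it inherits the same false-positive problem with dynamic content that dimitar mentions):

```python
import hashlib
import urllib.request
from typing import Optional

def page_fingerprint(url: str) -> str:
    # Fetch the page and hash its body; a different hash on the
    # next run means something in the served content changed.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return hashlib.sha256(resp.read()).hexdigest()

def has_changed(previous: Optional[str], current: str) -> bool:
    # The first run has no baseline, so it never counts as a change.
    return previous is not None and previous != current
```

Run it from cron, persist the last hash to a file, and alert when `has_changed` flips to True; stripping volatile markup (ads, timestamps) before hashing reduces the false positives.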
| xrd wrote: | I installed Uptime Kuma (https://github.com/louislam/uptime-kuma) | on my dokku paas to monitor my dokku apps. It works great. It is | great for pure HTTP services, but it can also be used against things | like RTMP servers, because it permits configuring a | health check with TCP pings. It sends me an email when things are | down, and supports retries, heartbeat intervals, and can validate a | string in the HTML retrieved. I love it. | jslakro wrote: | I considered this option but then realized that both sides - | the API/services and the uptime checker - would be on the same | server, so any problem impacting the server itself would take | the monitoring offline as well. | jewel wrote: | Another approach that has been working great for me: | https://www.webalert.me. This app runs on your phone; you can | configure it to check once an hour whether any content on a page | has changed. | blondin wrote: | have to say, this is exactly what kubernetes was designed to | solve. but the focus was on microservices and containers. and | things also got out of hand. | nickjj wrote: | > have to say, this is exactly what kubernetes was designed to | solve | | Kubernetes probes are much different, in my opinion. | | Your Kubernetes liveness check verifies that things are working | inside of your cluster, which is great for a high-frequency | checkup that can modify the state of your pod based on | the result. | | But Uptime Robot is an end-to-end test. It tests a real | connection over the internet to your domain, which exercises | external DNS, traffic flowing through any reverse proxies, your | SSL certificate, etc. | | Both complement each other for different use cases. | dinvlad wrote: | I really wish managed Kubernetes offerings remained "free" for | small use, and would only expose "empty" nodes ready for full | utilization by end-user containers.
| | The reality, however, is that every managed node (like on GKE) | uses quite a lot of CPU and memory out of the box, for which | the user pays. On top of that there are cluster fees, just for | having a cluster around. This makes it completely unfriendly to | hobbyist projects, unless one is ready to pay dozens of dollars just | to have Kubernetes (prior to deploying any apps to it). | | (And sure, there are free tiers here and there, but they never | solve this problem completely on any of the big cloud | providers, at least.) | | Compare that to managed "serverless" offerings (even pseudo- | compatible with the K8s API, like Cloud Run), which eliminate the | management fees but impose a tax in latency. Oh well. | epelesis wrote: | One reason this is not feasible is that K8s is not designed | for secure multitenancy, so for every tenant you'll need to | spin up an entire K8s control plane, which includes a | database and several services - this is what's driving the | cluster fees. Keep in mind that customers also expect managed | K8s to be highly available, so this cost also goes into | things like replicating data, setting up load balancers, | etc. | | Compare this to a serverless offering that is multitenant by | design: the control plane is shared, making the overhead cost | of an extra user basically zero, which is why they don't | charge you a fee like this. | | IMO if you're a hobbyist interested in K8s, your best way to | go is to install K3s, which is a lightweight, API-compatible | K8s alternative that runs on a single node. It's pretty nice | if you don't care about fault tolerance or high availability. | | https://k3s.io/ | dinvlad wrote: | I'm not so sure about the economics of what you describe. It | could very well be that small customers don't | really consume that much "bandwidth", and their resource | requirements could be subsumed entirely by larger users.
It | doesn't make much sense that both large and small customers | have to pay the same cluster fee, for example - it would be | much fairer to charge more the more you use, and | approach "near zero" the less you use it. | | At the end of the day, all resources are run by the cloud | provider on KVMs sharing the same physical machines | anyway, so it's up to them how much to charge. The fact | that both small and large customers pay for the same | amount of resources allocated to them only means these | resources are not allocated in the most efficient manner. | A cloud provider could fix this. | | We should also not discount the net positive effect of | attracting more hobbyists and startups to your platform. | That's how AWS and GCP started, for example, but now | they're just focusing on more enterprise business, so | smaller customers mean less to them (although arguably less | so for AWS). But we shouldn't forget that while they don't | contribute as much to the revenue, they're essentially a | free advertising resource that makes your platform stay | "relevant" (especially the burgeoning startups that could | grow to bring more revenue in the future!). The | moment they leave, the platform just becomes another IBM | that's bound to die, for better or worse. | | On top of that, the anti-analogy with serverless for the | control plane breaks down, because one could always run it | on the same shared pool of resources in gVisor or | Firecracker, just like with serverless. | daverobbins1 wrote: | Since everyone is posting their favorite free-tier monitoring | products - does anyone have a recommendation for a cloud product | that will allow us to create a group of ping monitors and alert | only if all monitors in the group are down for N minutes? | machinerychorus wrote: | You could hack that together with huginn pretty easily | | https://github.com/huginn/huginn | zoover2020 wrote: | > [...]
recommend a cloud product | | Hacker mentality has never left this site since its inception :) | prakashn27 wrote: | I am curious about the use case for this. What group of servers do | you want to monitor? | daverobbins1 wrote: | We have dual internet connections coming into a satellite | office and we only want to be alerted if both are down. | bdd wrote: | You can get free uptime monitoring from Google Cloud. The limit | is 100 uptime checks per monitoring scope, which may mean either | a project or an organization, depending on how you configure it, | IIUC: https://cloud.google.com/monitoring/uptime-checks. The checks are | run from 6 locations around the world, so you can also catch | network issues - though you likely cannot do much about those when | you're running a tiny service. My uptime checks show the probes come | from: usa-{virginia,oregon,iowa}, eur-belgium, apac-singapore, | sa-brazil-sao_paulo | | Another neat monitoring thing I rely on is | https://healthchecks.io. Anything that needs to run periodically | checks in with the API at the start and the end of execution, so | you can be sure those jobs are running as they should, on time, and | without errors. Its free tier allows 20 checks. | ydant wrote: | healthchecks.io is a great service (and apparently can be | self-hosted - https://github.com/healthchecks/healthchecks) that I | use for both personal projects and at work. | | It works really well for cron jobs - while it works with a | single call, you can also call a /start endpoint before and the | regular endpoint when finished, and get extra insights such as | runtime for your jobs. | | It would be nice if it had slightly more complex alerting rules | available - for example, a "this service should run | successfully at least once every X hours, but is fine to fail | multiple times otherwise" type of alert.
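The start-plus-finish pattern described above amounts to two extra HTTP GETs around the job. A hedged sketch in Python, assuming the documented healthchecks.io ping endpoints (hc-ping.com/<uuid>, /<uuid>/start, /<uuid>/fail); the UUID below is a placeholder:

```python
import subprocess
import urllib.request

PING_URL = "https://hc-ping.com/your-check-uuid"  # placeholder check UUID

def suffix_for(returncode: int) -> str:
    # Success pings the bare URL; any nonzero exit reports a failure.
    return "" if returncode == 0 else "/fail"

def ping(suffix: str = "") -> None:
    # A monitoring hiccup should never kill the job itself.
    try:
        urllib.request.urlopen(PING_URL + suffix, timeout=10)
    except OSError:
        pass

def run_job(cmd: list[str]) -> int:
    # /start marks the beginning of the run, which lets healthchecks.io
    # measure runtime and catch jobs that start but never finish.
    ping("/start")
    result = subprocess.run(cmd)
    ping(suffix_for(result.returncode))
    return result.returncode
```

Wrapping the cron command in `run_job(["/usr/local/bin/backup.sh"])` (a hypothetical script name) is enough for the dead-man's-switch behavior the thread describes.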
| | We wanted to use it for monitoring some periodic downloads | (like downloading partners' reports), where the expectation is | that the call will often time out, fail, or have no data to | download - which is technically a "failure", but only a problem | if it goes on for more than a day. Since healthchecks.io doesn't | really support this, we ended up writing our own "stale data" | monitoring logic and alerting inside the downloader, and just | use healthchecks.io to monitor that the script isn't crashing. | jacooper wrote: | What is the interval for the checks? | | It's written that it's 100 per metric scope, but I don't really | know what that means.(2) | | Also, there seems to be no status monitor page? | | 2- https://cloud.google.com/monitoring/uptime-checks | yawgmoth wrote: | Continuing the tooling thread: the free tier of | https://www.uptimetoolbox.com/ is quite good. ___________________________________________________________________ (page generated 2022-07-09 23:00 UTC)