[HN Gopher] Ask HN: Do You Test in Production?
___________________________________________________________________

Ask HN: Do You Test in Production?

There are a lot of blog posts arguing that testing in prod should
not be the taboo it may have been in the 90s. I've read some of
these [1] [2], I get the arguments in favour of it, and I want to
try some experiments. My question is -- how does one go about
doing it _safely_?

In particular, I'm thinking about data. Is it common practice to
inject fabricated data into a prod system to run such tests?
What's the best practice or prior art on doing this well?

Ultimately, I think this will end up looking like implementing
SLIs and SLOs in PROD, but for some of my SLOs I think I need to
actually _fake_ the data in order to get the SLIs I need, so how
do I do this?

Suggestions appreciated -- thanks.

[1] https://increment.com/testing/i-test-in-production/
[2] https://segment.com/blog/we-test-in-production-you-should-too/

Author : bradwood
Score  : 19 points
Date   : 2023-01-14 22:13 UTC (46 minutes ago)

| tonymet wrote:
| Yes, you can do so with a canary tier. Assuming your code is
| well instrumented enough to distinguish performance and quality
| regressions, a canary tier served to customers will catch more
| regressions than synthetic testing.

| rr808 wrote:
| Depends a lot on your application and how big the changes are.
| If you're an online store and you're pushing out incremental
| changes to a subset of users, it's a good strategy. If it's
| aircraft autopilot, not so much.

| csours wrote:
| Everyone tests in production. Some people also test before
| production!
|
| Some people try NOT to test in production, but everyone does
| test in prod in a very real sense, because dependencies and
| environments are different in prod.
|
| I think the real question is "Do you INTENTIONALLY test in
| production?"

| bradwood wrote:
| I see a lot of suggestions in the comments for feature flags --
| we've been using these from the beginning, to very good effect.
|
| However, flags turn on/off _code_, not data, and my main area
| of interest here is how to deal with the test data problem in
| prod.

| quickthrower2 wrote:
| In a multi-tenant system, one of the accounts can be a test
| account. Within that you can run integration tests. You might
| need special cases: test payment accounts and credit cards,
| test pricing plans, and so on.
|
| Some basic ping tests and other checks before swapping a new
| version into production (as in preparing and initializing it,
| then pointing the load balancer at it) would also be smart.

| HereBeBeasties wrote:
| Good testing is an exercise in pushing I/O to the fringes, as
| that's what has stateful side effects. (Some might even argue
| that anything that tests I/O is an integration test. The term
| "integration test" is not well defined and not worth getting
| hung up over, IME.)
|
| Once you're into testing I/O, which is ultimately unavoidable
| no matter how hard you try, you either need cooperative third
| parties who can give you truly representative test systems
| (rare) or a certain amount of test-in-prod.
|
| Testing database stuff remains hard. You either wrap things in
| some kind of layer you can mock out, or dupe prod (or some
| subset of it) into a staging environment with a daily snapshot
| or similar, and hope any differences (scale, normally) aren't
| too bad.
|
| Copy-on-write systems, or those with time travel and/or
| immutability, help immensely with test-in-prod, especially if
| you can effectively branch your data. If it's your own systems
| you are testing against, things like lakefs.io look pretty
| useful in this regard.
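(As a rough illustration of the "wrap things in some kind of
layer you can mock out" approach above, here is a minimal Python
sketch; the OrderStore interface and every name in it are
hypothetical, not from the thread or any particular library.)

    # Hypothetical sketch: hide database I/O behind a small
    # interface so tests can substitute an in-memory fake
    # without touching a real database.
    from typing import Protocol

    class OrderStore(Protocol):
        def save_total(self, order_id: str, cents: int) -> None: ...
        def get_total(self, order_id: str) -> int: ...

    class InMemoryOrderStore:
        """Test double standing in for the real database layer."""
        def __init__(self) -> None:
            self._totals: dict[str, int] = {}

        def save_total(self, order_id: str, cents: int) -> None:
            self._totals[order_id] = cents

        def get_total(self, order_id: str) -> int:
            return self._totals[order_id]

    def apply_discount(store: OrderStore, order_id: str, pct: int) -> int:
        # Business logic depends only on the interface, so it runs
        # against the fake in tests and the real store in prod.
        new_total = store.get_total(order_id) * (100 - pct) // 100
        store.save_total(order_id, new_total)
        return new_total

    store = InMemoryOrderStore()
    store.save_total("o-1", 1000)
    assert apply_discount(store, "o-1", 10) == 900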
| cloudking wrote:
| I think it depends on how your application works. If you have
| the concept of customers, then you can have a test customer in
| production with test data that doesn't affect real customers,
| for example. You can reset the test customer's data each time
| you want to test.

| brianwawok wrote:
| Any time you need to talk to a third-party API, you need to
| test in prod.
|
| Some people have sandbox APIs. They are generally broken and
| not worth it. See eBay for a super in-depth sandbox API that
| never works.
|
| You can read the docs 100 times over. At the end of the day,
| the API is going to work like it works. So you kind of "have
| to" test in prod for these guys.

| lpapez wrote:
| Ditto regarding PayPal: you need a sandbox API token to get
| started with it. Their sandbox token generator was broken for
| MONTHS; I could not believe it. By the time we got the token,
| we had already fixed all the bugs on our side the hard way, by
| testing in prod, and moved on.

| __s wrote:
| Yes.
|
| Just because you have staging doesn't mean you don't need unit
| tests. Similarly, test in stage, then test in prod. Ideally in
| a way isolated from real prod users (e.g., in an insurance
| system we had fake dealer accounts for testing).

| fleekonpoint wrote:
| We run canaries in prod. They aren't as extensive as the
| integration tests that run in our test stages, but they still
| cover the happy paths for most of our APIs.

| natoliniak wrote:
| Feature flags.

| atemerev wrote:
| In electronic trading, most new systems are tested in
| production by running with a smaller capital allocation first.
| It is hard to iron out all the bugs unless you are on the real
| market with real money and real effects (of course, simulation
| testing and unit testing are heavily employed too).

| revskill wrote:
| It's more about handling production errors quickly than about
| testing in production. Feature flags are a good way.

| paxys wrote:
| There are lots of ways to test in production. IMO the way you
| are suggesting (injecting synthetic data into prod) is the
| worst of both worlds: you aren't actually testing real-world
| use cases, and you end up polluting your prod environment.
|
| Some common ways to go about this:
|
| - Feature flags: every new change goes into production behind a
|   flag. You can flip the flag for a limited set of users and do
|   a broader rollout when ready.
|
| - Staged rollouts: have staging/canary etc. environments and
|   roll out new deployments to them first. Observe metrics and
|   alerts to check if something is wrong.
|
| - Beta releases: have a group of internal/external power users
|   test your features before they go out to the world.
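(Since several commenters point to feature flags and staged
rollouts, here is a toy sketch of percentage-based flag
bucketing. It assumes an in-process flag table; a real system
would use a flag service, but the mechanics are similar. All
names are invented.)

    # Hypothetical sketch: gate a new code path behind a flag
    # and enable it for a stable percentage of users.
    import hashlib

    # Flag name -> rollout percentage (0 disables, 100 is everyone).
    FLAGS = {"new_checkout": 5}

    def is_enabled(flag: str, user_id: str) -> bool:
        pct = FLAGS.get(flag, 0)
        # Hash flag+user so each user lands in a stable bucket per
        # flag and doesn't flicker between code paths on each call.
        digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
        bucket = int.from_bytes(digest[:2], "big")  # 0..65535
        return bucket % 100 < pct

    def checkout(user_id: str) -> str:
        if is_enabled("new_checkout", user_id):
            return "new checkout flow"  # the code under test in prod
        return "old checkout flow"      # the known-good fallback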
| cuuupid wrote:
| I work for a B2E company that has a structure similar to
| Salesforce. We test in production all the time, even for our
| secure environments where the data is highly sensitive.
|
| Re: data, it's a somewhat common practice to notionalize data
| (think isomorphically faking data). We regularly do this and
| will often designate rows as notional to hide them from users
| who aren't admins. I've found this to work exceptionally well;
| we do this 1-2 times a week, ensure there's a closed circuit
| for notional data, and for more critical systems we'll inform
| our customers that testing will occur.
|
| I'm sure there are more complex and automated solutions, but
| when it comes to testing, simple and flexible is often the way
| to go.

| bradwood wrote:
| Thanks. This sounds interesting.
|
| Can you give a bit more colour on "notionalizing" and
| "isomorphically faking", please?

| cuuupid wrote:
| Essentially, creating fake data that looks very realistic and
| creates narratives that would span real use cases. Some of this
| is simple (fake names with faker), some of it is a bit more
| manually guided (customer-specific terminology and specific
| business logic).
|
| The goal here is for the data to be useful for testing and to
| provide coverage not just at a software level, but at a user-
| story level. This helps test things like cross-application
| interactions; it's also doubly helpful since we can use it for
| demos without screwing up production data.

| piyh wrote:
| > notionalize data (think isomorphically faking data)
|
| Are these just $5 words for setting a fake-data flag on the
| records?

| cuuupid wrote:
| We do that too, but notionalizing for us is usually creating
| data that looks and behaves realistically but is actually fake.
| (A side benefit to this is that we can then use it for demos!)

| nonethewiser wrote:
| So you mock data and then flag it as fake.

| cuuupid wrote:
| Essentially, yes! We usually try to follow some sort of
| theoretical user story and paint some sort of narrative, but at
| the end of the day it's just adjusting the mocking.
|
| Just now realizing that notionalizing isn't a widely accepted
| term for this.

| sethammons wrote:
| Note: if you plan on accurate financial planning and metrics
| (especially if going public), you need to be able to separate
| your test-in-prod stats from your real prod stats for
| reporting.

| dmitriid wrote:
| A/B tests and feature flags are basically testing in prod. And
| yes, some of those features sometimes run as a "well, it should
| work, but we're not entirely sure until we get a significant
| number of users using the system". It could be an edge case
| failing or scalability requirements being wrong.
|
| Another variation on the same theme is rewriting systems, where
| you run production data through both systems. Quite often
| that's the only way of doing migrations to a new platform, or a
| new database, or, yes, a newly rewritten system.
|
| > Is it common practice to inject fabricated data into a prod
| > system to run such tests? What's the best practice or prior
| > art on doing this well?
|
| A very common practice is to run a snapshot of prod data (e.g.
| the last hour, the last 24 hours, or even a week/month/year)
| through the system in staging (or cooking, or pre-cooking, or
| whatever name you give the system that's just about to be
| released). However, doing it properly may not be easy, and
| depends on the systems involved.
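(A toy sketch of the dual-run idea dmitriid describes, where
captured prod traffic is replayed through the old and new systems
and only the differences are reported; real users keep getting
the old system's answers. Everything here is illustrative.)

    # Hypothetical sketch: replay recorded prod requests through
    # both implementations and collect any divergence.
    from typing import Callable, Iterable

    Handler = Callable[[dict], dict]

    def shadow_compare(requests: Iterable[dict],
                       old: Handler, new: Handler) -> list[dict]:
        mismatches = []
        for req in requests:
            old_resp = old(req)  # what real users actually received
            try:
                new_resp = new(req)
            except Exception as exc:  # the new system may just crash
                mismatches.append({"req": req, "error": repr(exc)})
                continue
            if new_resp != old_resp:
                mismatches.append(
                    {"req": req, "old": old_resp, "new": new_resp})
        return mismatches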
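(And a minimal sketch of the row-flagging approach cuuupid and
nonethewiser converge on above: fabricated rows carry a notional
flag and are filtered out of every non-admin read path. The
schema is made up for illustration.)

    # Hypothetical sketch: mark fake rows as notional and hide
    # them from regular users so test data never leaks into
    # their views.
    from dataclasses import dataclass

    @dataclass
    class Customer:
        name: str
        is_notional: bool = False  # True only for fabricated rows

    def visible_customers(rows: list[Customer],
                          is_admin: bool) -> list[Customer]:
        # Admins see everything, including test data; everyone
        # else sees only real rows, keeping the loop closed.
        return rows if is_admin else [r for r in rows if not r.is_notional]

    rows = [Customer("Acme Ltd"), Customer("Test Narrative Co", True)]
    assert [c.name for c in visible_customers(rows, False)] == ["Acme Ltd"]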
| turtleyacht wrote:
| Sometimes one cannot get the exact same specs on test hardware
| versus production, yet a rollout depends on simulating system
| load to shake out issues.
|
| Performance testing needs a schedule, visibility, a timebox,
| known scope, a backout plan, a data revert plan, and pre- and
| post-graphs.
|
|     Schedule. Folks are clearly tagged in a table with times
|     down the side.
|
|     Visibility. Folks who should know, know when it's going to
|     happen, are invited to the session, and are mentioned in
|     the distributed schedule.
|
|     Timebox. It's going to start at a defined time and end at
|     a defined time.
|
|     Known scope. Is it going to fulfill an order? How many
|     accounts created?
|
|     Backout plan. DBA and DevOps on standby for stopping the
|     test.
|
|     Data revert plan. We know what rows to delete or update
|     after testing.
|
|     Pretty pictures. You want to show graphs during the test,
|     so that you know what to improve and everyone's time
|     wasn't wasted.
|
| Reference: observing successful runs that didn't result in
| problems later.
___________________________________________________________________
(page generated 2023-01-14 23:00 UTC)