[HN Gopher] Ask HN: Do You Test in Production?
       ___________________________________________________________________
        
       Ask HN: Do You Test in Production?
        
        There are a lot of blog posts arguing that testing in prod should
        not be the taboo it may have been in the 90s. I've read some of
        these [1] [2], I get the arguments in favour of it, and I want to
        try some experiments.
         
        My question is -- how does one go about doing it _safely_? In
        particular, I'm thinking about data. Is it common practice to
        inject fabricated data into a prod system to run such tests?
        What's the best practice or prior art on doing this well?
         
        Ultimately, I think this will end up looking like implementing
        SLIs and SLOs in PROD, but for some of my SLOs I think I need to
        actually _fake_ the data in order to get the SLIs I need -- so
        how should I go about that?
         
        Suggestions appreciated -- thanks.
         
        [1] https://increment.com/testing/i-test-in-production/
        [2] https://segment.com/blog/we-test-in-production-you-should-too/
        
       Author : bradwood
       Score  : 19 points
       Date   : 2023-01-14 22:13 UTC (46 minutes ago)
        
       | tonymet wrote:
        | Yes, you can do so with a canary tier. Assuming your code is
        | well instrumented to distinguish performance and quality
        | regressions, a canary tier served to customers will catch more
        | regressions than synthetic testing.
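 
        A minimal sketch of the canary split described above, assuming a
        deterministic hash-based bucket per user; the CANARY_PERCENT
        value and the tier names are illustrative, not from the comment:
 
            import hashlib

            CANARY_PERCENT = 5  # serve ~5% of traffic from the canary tier (assumed)

            def backend_for(user_id: str) -> str:
                """Route a deterministic slice of users to the canary tier."""
                # Hash the user id so the same user always lands on the same tier.
                bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
                return "canary" if bucket < CANARY_PERCENT else "stable"

            if __name__ == "__main__":
                sample = [f"user-{i}" for i in range(1000)]
                share = sum(backend_for(u) == "canary" for u in sample) / len(sample)
                print(f"canary share: {share:.1%}")  # roughly CANARY_PERCENT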
        
       | rr808 wrote:
        | Depends a lot on your application and how big the changes are.
        | If you're an online store and you're pushing out incremental
        | changes to a subset of users, it's a good strategy. If it's
        | aircraft autopilot, not so much.
        
       | csours wrote:
       | Everyone tests in production. Some people also test before
       | production!
       | 
       | Some people try to NOT test in production, but everyone does test
       | in prod in a very real sense because dependencies and
       | environments are different in prod.
       | 
        | I think the question was "Do you INTENTIONALLY test in
        | production?"
        
       | bradwood wrote:
       | I see a lot of suggestions in the comments for feature flags --
       | we've been using these from the beginning, to very good effect.
       | 
        | However, flags turn _code_ on/off, not data, and my main area
        | of interest here is how to deal with the test data problem in
        | prod.
        
       | quickthrower2 wrote:
        | In a multi-tenant system, one of the accounts can be a test
        | account. Within that you can run integration tests. You might
        | need special cases: test payment accounts and credit cards,
        | test pricing plans, and so on.
       | 
        | Some basic ping tests and other checks before swapping a new
        | version into production (as in preparing it, initiating it,
        | and pointing the load balancer at it) would be smart.
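 
        A small sketch of the pre-swap checks mentioned above, run
        against the not-yet-live version through a dedicated test
        tenant; the hostname, header, and paths are hypothetical:
 
            import urllib.request

            NEW_VERSION = "https://green.example.internal"  # hypothetical pre-swap host
            TEST_TENANT = "tenant-smoke-test"                # hypothetical test account

            def check(path: str) -> bool:
                """Return True if the new version answers 200 for this path."""
                req = urllib.request.Request(f"{NEW_VERSION}{path}",
                                             headers={"X-Tenant": TEST_TENANT})
                try:
                    with urllib.request.urlopen(req, timeout=5) as resp:
                        return resp.status == 200
                except Exception:
                    return False

            smoke_paths = ["/healthz", "/api/orders?limit=1", "/api/pricing-plans"]
            if all(check(p) for p in smoke_paths):
                print("checks passed; point the load balancer at the new version")
            else:
                print("checks failed; keep traffic on the current version")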
        
       | HereBeBeasties wrote:
       | Good testing is an exercise in pushing I/O to the fringes, as
       | that's what has stateful side-effects. (Some might even argue
       | that anything that tests I/O is an integration test. The term
       | "integration test" is not well defined and not worth getting hung
       | up over IME.)
       | 
       | Once you're into testing I/O, which is ultimately unavoidable no
       | matter how hard you try not to, you either need cooperative third
       | parties who can give you truly representative test systems (rare)
       | or a certain amount of test-in-prod.
       | 
        | Testing database stuff remains hard. You either wrap things in
        | some kind of layer you can mock out, or dupe prod (or some
        | subset of it) into a staging environment with a daily snapshot
        | or similar, and hope any differences (scale, normally) aren't
        | too bad.
       | 
       | Copy-on-write systems or those with time-travel and/or
       | immutability help immensely with test-in-prod, especially if you
       | can effectively branch your data. If it's your own systems you
       | are testing against, things like lakefs.io look pretty useful in
       | this regard.
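 
        One way to read the "pushing I/O to the fringes" advice above,
        as a minimal sketch: business logic depends on a thin store
        interface, so a fake can stand in for the database during tests.
        All names here are hypothetical:
 
            from typing import Protocol

            class OrderStore(Protocol):
                """The thin I/O layer that can be mocked out or swapped."""
                def save(self, order_id: str, total: float) -> None: ...
                def get(self, order_id: str) -> float: ...

            class InMemoryOrderStore:
                """Fake used in tests; a production implementation would talk
                to the real database behind the same interface."""
                def __init__(self):
                    self._rows = {}  # order_id -> total
                def save(self, order_id: str, total: float) -> None:
                    self._rows[order_id] = total
                def get(self, order_id: str) -> float:
                    return self._rows[order_id]

            def apply_discount(store: OrderStore, order_id: str, pct: float) -> float:
                # Pure business logic; the stateful side effects live behind `store`.
                discounted = store.get(order_id) * (1 - pct)
                store.save(order_id, discounted)
                return discounted

            store = InMemoryOrderStore()
            store.save("o-1", 100.0)
            print(apply_discount(store, "o-1", 0.1))  # 90.0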
        
       | cloudking wrote:
       | I think it depends on how your application works. If you have the
       | concept of customers, then you can have a test customer in
       | production with test data that doesn't affect real customers for
       | example. You can reset the test customer data each time you want
       | to test.
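 
        A tiny sketch of the reset-the-test-customer idea, using sqlite3
        as a stand-in for the production database; the table layout and
        the customer id are made up:
 
            import sqlite3

            TEST_CUSTOMER_ID = "cust-test-0001"  # hypothetical dedicated test customer

            def reset_test_customer(conn):
                """Wipe and reseed the test customer's data before a test run."""
                conn.execute("DELETE FROM orders WHERE customer_id = ?",
                             (TEST_CUSTOMER_ID,))
                conn.execute("INSERT INTO orders (customer_id, sku, qty) VALUES (?, ?, ?)",
                             (TEST_CUSTOMER_ID, "sku-known-good", 1))
                conn.commit()

            # Demo with an in-memory schema standing in for the real tables.
            conn = sqlite3.connect(":memory:")
            conn.execute("CREATE TABLE orders (customer_id TEXT, sku TEXT, qty INTEGER)")
            reset_test_customer(conn)
            print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1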
        
       | brianwawok wrote:
       | Anytime you need to talk to a third party API, you need to test
       | in prod.
       | 
        | Some people have sandbox APIs. They are generally broken and
        | not worth it. See eBay's super in-depth sandbox API that never
        | works.
       | 
       | You can read the docs 100 times over. At the end of the day, the
       | API is going to work like it works. So you kind of "have to" test
       | in prod for these guys.
        
         | lpapez wrote:
         | Ditto regarding Paypal: you need a sandbox API token to get
         | started with it. Their sandbox token generator was broken for
         | MONTHS, I could not believe it. By the time we got the token,
         | we already fixed all bugs on our side the hard way - by testing
         | in prod - and moved on.
        
       | __s wrote:
       | Yes
       | 
       | Just because you have staging doesn't mean you don't need unit
       | tests. Similarly, test in stage, then test in prod. Ideally in a
       | way isolated from real prod users (eg, in an insurance system we
       | had fake dealer accounts for testing)
        
       | fleekonpoint wrote:
        | We run canaries in prod. They aren't as extensive as the
        | integration tests that run in our test stages, but they still
        | exercise the happy paths for most of our APIs.
        
       | natoliniak wrote:
       | Feature flags.
        
       | atemerev wrote:
        | In electronic trading, most new systems are tested in
        | production by running with a smaller capital allocation first.
        | It is hard to iron out all bugs unless you are on the real
        | market with real money and real effects (of course, simulation
        | testing and unit testing are heavily employed too).
        
       | revskill wrote:
        | It's more about handling production errors quickly than
        | testing in production. Feature flags are a good way.
        
       | paxys wrote:
       | Lots of ways to test in production. IMO the way you are
       | suggesting - injecting synthetic data into prod - is the worst of
       | both worlds. You aren't actually testing real world use cases,
       | and end up polluting your prod environment.
       | 
       | Some common ways to go about this:
       | 
       | - Feature flags: every new change goes into production behind a
       | flag. You can flip the flag for a limited set of users and do a
       | broader rollout when ready.
       | 
       | - Staged rollouts: have staging/canary etc. environments and roll
       | out new deployments to them first. Observe metrics and alerts to
       | check if something is wrong.
       | 
       | - Beta releases: have a group of internal/external power users
       | test your features before they go out to the world.
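 
        A minimal sketch of the feature-flag and staged-rollout bullets
        above: an allowlist of internal users first, then a
        deterministic percentage ramp. The flag name, groups, and
        percentage are illustrative:
 
            import hashlib

            # Hypothetical rollout config: allowlist first, then a percentage ramp.
            ROLLOUT = {"new_search": {"allowlist": {"internal-qa"}, "percent": 10}}

            def flag_on(flag: str, user_id: str) -> bool:
                cfg = ROLLOUT.get(flag)
                if cfg is None:
                    return False
                if user_id in cfg["allowlist"]:
                    return True
                # Deterministic bucket so a user keeps the same experience as the
                # percentage is ramped from 10 -> 50 -> 100.
                bucket = int(hashlib.md5(f"{flag}:{user_id}".encode()).hexdigest(), 16) % 100
                return bucket < cfg["percent"]

            print(flag_on("new_search", "internal-qa"))                       # True (allowlist)
            print(sum(flag_on("new_search", f"u{i}") for i in range(1000)))   # roughly 100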
        
       | cuuupid wrote:
       | I work for a B2E company that has a structure similar to
       | Salesforce. We test in production all the time even for our
       | secure environments where the data is highly sensitive.
       | 
       | Re: data, it's a somewhat common practice to notionalize data
       | (think isomorphically faking data). We regularly do this and will
       | often designate rows as notional to hide them from users who
       | aren't admins. I've found this to work exceptionally well; we do
       | this 1-2 times a week, ensure there's a closed circuit for
       | notional data, and for more critical systems we'll inform our
       | customers that testing will occur.
       | 
       | I'm sure there are more complex and automated solutions but when
       | it comes to testing, simple and flexible is often the way to go.
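 
        A small sketch of the "designate rows as notional and hide them
        from non-admins" approach described above; the row shape and
        flag name are assumptions:
 
            from dataclasses import dataclass

            @dataclass
            class Row:
                id: int
                payload: str
                notional: bool = False  # marks fabricated test data

            def visible_rows(rows, is_admin: bool):
                """Hide notional (fabricated) rows from everyone but admins."""
                return rows if is_admin else [r for r in rows if not r.notional]

            rows = [Row(1, "real order"), Row(2, "fabricated order", notional=True)]
            print([r.payload for r in visible_rows(rows, is_admin=False)])  # real only
            print([r.payload for r in visible_rows(rows, is_admin=True)])   # both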
        
         | bradwood wrote:
         | Thanks. This sounds interesting.
         | 
          | Can you give a bit more colour on "notionalizing" and
          | "isomorphically faking", please?
        
           | cuuupid wrote:
           | Essentially creating fake data that looks very realistic and
           | creates narratives that would span real use cases. Some of
           | this is simple (fake names with faker), some of it is a bit
           | more manually guided (customer-specific terminology and
           | specific business logic).
           | 
           | The goal here is for the data to both be useful for testing
           | and provide coverage not just at a software level, but at a
           | user story level. This helps test things like cross-
            | application interactions; it's also doubly helpful since we
            | can use it for demos without screwing up production data.
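 
        A minimal sketch of generating that kind of record with the
        faker package mentioned above (pip install faker); the field
        names and the notional flag are illustrative:
 
            from faker import Faker  # third-party package

            fake = Faker()

            def notional_customer() -> dict:
                """One fabricated-but-plausible record, flagged so it can be
                hidden from non-admin users and excluded from reporting."""
                return {
                    "name": fake.name(),
                    "company": fake.company(),
                    "email": fake.email(),
                    "notional": True,  # marker separating test data from real data
                }

            for row in (notional_customer() for _ in range(5)):
                print(row)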
        
         | piyh wrote:
         | >notionalize data (think isomorphically faking data)
         | 
         | Are these just $5 words for setting a fake data flag on the
         | records?
        
           | cuuupid wrote:
           | We do that too but notionalizing for us is usually creating
           | data that looks and behaves realistically but is actually
           | fake. (A side benefit to this is that we can then use it for
           | demos!)
        
             | nonethewiser wrote:
             | So you mock data and then flag it as fake.
        
               | cuuupid wrote:
               | Essentially yes! We usually try to follow some sort of
               | theoretical user story/paint some sort of narrative but
               | at the end of the day it's just adjusting the mocking.
               | 
               | Just now realizing notionalizing isn't a widely accepted
               | term for this
        
       | sethammons wrote:
        | Note: if you care about accurate financial planning and
        | metrics (esp. if going public), you need to be able to
        | separate your test-in-prod stats from the real prod stats for
        | reporting.
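 
        A tiny illustration of the separation being warned about here,
        assuming each order row carries an is_test marker:
 
            orders = [
                {"amount": 120.0, "is_test": False},
                {"amount": 80.0,  "is_test": False},
                {"amount": 999.0, "is_test": True},  # synthetic order from a prod test
            ]

            reported_revenue = sum(o["amount"] for o in orders if not o["is_test"])
            test_revenue = sum(o["amount"] for o in orders if o["is_test"])

            print(f"reported revenue: {reported_revenue}")  # excludes test traffic
            print(f"test-only revenue: {test_revenue}")     # tracked separately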
        
       | dmitriid wrote:
       | A/B tests and feature flags are basically testing in prod. And
       | yes, some of those features sometimes run as a "well, it should
       | work, but we're not entirely sure until we get a significant
       | number of users using the system". It could be an edge case
       | failing or scalability requirements being wrong.
       | 
       | Another variation on the same theme is rewriting systems when you
       | run production data through both systems. Quite often that's the
        | only way of doing migrations to a new platform, or a new database,
       | or yes, a newly re-written system.
       | 
       | > Is it common practice to inject fabricated data into a prod
       | system to run such tests? What's the best practice or prior art
       | on doing this well?
       | 
       | A very common practice is to run a snapshot of prod data (e.g.
       | last hour, or last 24 hours, or even a week/month/year) through a
       | system in staging (or cooking, or pre-cooking, or whatever name
       | you give the system that's just about to be released). However,
       | doing it properly may not be easy, and depends on the systems
       | involved.
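 
        A minimal sketch of the "run production data through both
        systems" idea above (shadow comparison); both pricing functions
        are stand-ins:
 
            def legacy_price(order: dict) -> float:
                # Existing production implementation (stand-in).
                return order["qty"] * order["unit_price"]

            def rewritten_price(order: dict) -> float:
                # New implementation being migrated to (stand-in).
                return round(order["qty"] * order["unit_price"], 2)

            def shadow_compare(orders):
                """Run the same real data through both systems, keep serving
                the legacy answer, and record any mismatches for review."""
                mismatches = []
                for order in orders:
                    old, new = legacy_price(order), rewritten_price(order)
                    if old != new:
                        mismatches.append({"order": order, "legacy": old, "rewrite": new})
                return mismatches

            snapshot = [{"qty": 3, "unit_price": 19.99}, {"qty": 1, "unit_price": 5.00}]
            print(shadow_compare(snapshot))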
        
       | turtleyacht wrote:
       | Sometimes one cannot get the exact same specs on test hardware
       | versus production, yet a rollout depends on simulating system
       | load to shake out issues.
       | 
        | Performance testing needs a schedule, visibility, timebox,
        | known scope, backout plan, data revert plan, and pre- and
        | post-graphs.
        | 
        |   Schedule.          Folks are clearly tagged in a table with
        |                      times down the side.
        |   Visibility.        Folks who should know, know when it's
        |                      going to happen, are invited to the
        |                      session, and are mentioned in the
        |                      distributed schedule.
        |   Timebox.           It's going to start at a defined time and
        |                      end at a defined time.
        |   Known scope.       Is it going to fulfill an order? How many
        |                      accounts created?
        |   Backout plan.      DBA and DevOps on standby for stopping
        |                      the test.
        |   Data revert plan.  We know what rows to delete or update
        |                      after testing.
        |   Pretty pictures.   You want to show graphs during the test,
        |                      so that you know what to improve and
        |                      everyone's time wasn't wasted.
       | 
       | Reference: observing successful runs that didn't result in
       | problems later.
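 
        A small sketch of the data revert plan item above: tag every row
        created during the timeboxed run with a run id so cleanup is a
        single targeted delete. sqlite3 stands in for the real database,
        and the table is made up:
 
            import sqlite3
            import uuid

            RUN_ID = f"perftest-{uuid.uuid4().hex[:8]}"  # one tag per scheduled window

            conn = sqlite3.connect(":memory:")
            conn.execute(
                "CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT, test_run TEXT)")

            # A pre-existing real row, untouched by the revert.
            conn.execute("INSERT INTO accounts (name, test_run) VALUES (?, NULL)",
                         ("real-account",))

            # During the timeboxed test: every created row carries the run id.
            for i in range(100):
                conn.execute("INSERT INTO accounts (name, test_run) VALUES (?, ?)",
                             (f"load-test-account-{i}", RUN_ID))

            # Data revert plan: we know exactly which rows to delete afterwards.
            deleted = conn.execute("DELETE FROM accounts WHERE test_run = ?",
                                   (RUN_ID,)).rowcount
            conn.commit()
            remaining = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
            print(f"reverted {deleted} test rows; {remaining} real row(s) remain")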
        
       ___________________________________________________________________
       (page generated 2023-01-14 23:00 UTC)