[HN Gopher] The $440M software error at Knight Capital (2019) ___________________________________________________________________ Author : bfm Score : 119 points Date : 2022-05-02 18:36 UTC (4 hours ago) web link (www.henricodolfing.com) | inter_netuser wrote: | Peanuts, just a regular day in DeFi. | | https://rekt.news | Terry_Roll wrote: | I read this | | >Under stock exchange rules, Knight would have been required to | pay for those shares three days later. However, there was no way | it could pay, since the trades were unintentional and had no | source of funds behind them. The only alternatives were to try to | have the trades canceled, or to sell the newly acquired shares | the same day. | | And then I understand why /r/WallStreetBets and /r/Antiwork are | gaining traction. | | All it takes is a bit of organisation and the adoption of Govt | tactics and practices, which is ultimately violence, and then just | maybe you might see a Govt that works for the people and not the | criminals, but I can't picture Bernie Sanders wielding a | pitchfork! | | Still, I see Musk was market making with his tweet. I don't think | you can be any more blatant! LOL | https://twitter.com/elonmusk/status/1520650036865949696?cxt=... | NovemberWhiskey wrote: | (2019) | bfm wrote: | Updated the title | randomhodler84 wrote: | Back in the day, a $440M loss due to a coding error was a landmark | warning case. How could this happen?? | | In 2021 alone something like $10B was lost due to bugs in DeFi | land. | | Somehow the worst thing that could happen tends to | happen eventually, and it gets worse every passing year. | pingeroo wrote: | Was just about to comment along these lines. If I had read about | this a few years ago I would have been shocked.
Now, after seeing so | many flubs in the crypto space, my reaction is just 'meh'. | vmception wrote: | I actually always think of the Knight case and similar ones | when people see a DeFi organization have an issue and | extrapolate that to an issue with the entire DeFi concept. | | It's so obvious that those people have no clue what's going on in | the markets they respect. Truth be told, many of them don't like | markets at all. So it's just a lack of exposure and compounded | ignorance. | colechristensen wrote: | Many traditional finance issues are fixable, though; there are | many more errors which don't become big stories because they | are reasonably reversed as only minor inconveniences. | vmception wrote: | Like how Credit Suisse is going to reverse their Bill Hwang | losses? I guess in this conversation we can't distinguish | between irreversible asset-value and liquidity issues and | misdirected transactions that are partially | reversible. | | Similarly, maybe you/they just don't see the headlines of | thwarted attacks in DeFi that work specifically due to | design considerations. | | I'll take the permission to fail. The rapid iteration | creates some really fascinating systems in very short time | periods, for me. One project implodes, 100 (or 1000) more | harden, bigger money comes in creating more assurances for | users like easier recovery and compensation paths, all | while continuing to rapidly iterate. | nikanj wrote: | Those $440M were lost by rich people who had invested in a | hedge fund, not poor people who bought crypto lottery tickets | in the hopes of getting rich quick. | bfm wrote: | The OP details how poor software engineering practices brought | down a 1.4B market maker with 1400 employees in 2012. | | Some of the issues mentioned include: - Keeping | synthetic test data generation as part of a production build. | - Keeping dead code for years. - Re-purposing a feature | flag. 
- Refactoring without regression tests.
- | Manual deployments without peer reviews. They forgot to update | one of their servers with the new code. - Automated alerts | sent via email were ignored. - Rolled back to a version of | the code running on the server they forgot to update, making | things worse. - Rushing out a release without proper | software engineering hygiene. | | The article suggests improvements that could have prevented the | chain of events. | | For those here who are in HFT circles, have things improved after | the Knight Capital Group debacle? | | edit: formatting | rebelos wrote: | Some of this is unforgivable, but reflecting on it I also | realized that software engineering at quant firms has an almost | impossible mandate. You want something akin to the extreme | rigor of mission-critical software (airplanes, cars, NASA, | etc.), while also remaining nimble enough to modify strategies | as market conditions rapidly evolve. | SilasX wrote: | Same is true for blockchain smart contracts, which have | similar catastrophic consequences. | ChrisClark wrote: | That truly is scary to me. I can easily* write advanced | Solidity and could try to make something big. But I won't, | because I know I would not be able to handle the stress and | responsibility. One tiny logic error and millions lost. | Thanks but no thanks. | | *The fact I believe I could easily do it is probably | exactly why I'd end up making some huge mistake. ;) | posterboy wrote: | That's a weird statement. | | The extreme rigor on the one hand seems to require a value | judgement of the real benefits to HFT that I'm not willing to | make. The remaining nimble'ity, on the other hand, is an odd | word to use over _agility_ or old-fashioned _responsibility_. | The benefit is proportional to it, but not exclusively. | | The rapidly evolving market conditions concern regular trade | too. Swift reactions are expected in any other systems | application. "almost impossible" is a weasel word.
It's | almost impossible to win except for the last man standing, is | that it? And there's no practical upper limit to nimble'y, | though conservative estimates indicate that less work is | more. | | What's missing are the perverse incentives, corrupt policies, | sociopathic leadership, ... | nradov wrote: | Why unforgivable? It's only numbers in an account. No one | died. | bfm wrote: | It is challenging, but with financial markets it seems | like it would be simpler to have some automatic anomaly | detection mechanism to unplug or slow things down to prevent | further damage. | WJW wrote: | There are a lot of preventative measures they could have | taken, starting with just not leaving in dead code and | paying attention to automated alerting. But the moral of | the story is that they got away with it for so long that | nobody cared about it anymore. After all, if it were truly | a big deal, why hadn't it broken years earlier? Then when | the technical debt finally got called, it bankrupted the | entire firm in one go. | | Most of us (hopefully) have less devastating technical debt | to deal with, but it is still a cautionary tale about what | could happen if you ignore it for too long. | pclmulqdq wrote: | I used to work in HFT. I have seen highly variable practices in | this case, including a "mini-Knight" incident in the single-digit | millions due to tech debt and poor test coverage. | However, the most useful change that has resulted from the KCG | debacle was adding several layers of kill switches, a dedicated | ops team to watch trading and flip the kill switches, and | embracing devops automation. | | There is a much more serious focus on having a defense in | depth, and making sure that problems like this are noticed | before they become an issue. Rollbacks are no longer the first | action when something goes wrong: the kill switch comes first. | | Dead code, tech debt, repurposed flags, and spotty test | coverage are everywhere still.
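The layered kill switches pclmulqdq describes, and the automatic anomaly detection bfm suggests, can be sketched as a pre-trade gate that every order must pass. This is a minimal C sketch under stated assumptions. It has two hypothetical layers with made-up limits: an order-rate cap per monitoring window, and a cap on net exposure that treats every sent order as if it will fill. The names (`risk_check_order`, `risk_operator_reset`, and so on) are illustrative only and do not come from any firm's actual risk system. The switch latches. Once tripped, it rejects every order until a human resets it, which fits the "kill switch comes first" approach better than a rollback does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits: real values would come from a risk configuration. */
#define MAX_ORDERS_PER_WINDOW 100    /* orders allowed per monitoring window */
#define MAX_ABS_EXPOSURE      50000  /* shares, net long or net short */

typedef struct {
    int64_t  exposure;          /* signed shares: + long, - short; assumes every sent order fills */
    uint32_t orders_in_window;  /* orders sent in the current monitoring window */
    bool     killed;            /* latched: stays true until an operator resets it */
} risk_state;

/* Called by a timer at the start of each monitoring window. */
static void risk_new_window(risk_state *rs) { rs->orders_in_window = 0; }

/* Manual reset: only a human should clear a tripped kill switch. */
static void risk_operator_reset(risk_state *rs) {
    rs->killed = false;
    rs->orders_in_window = 0;
}

/* Pre-trade gate. qty > 0 buys, qty < 0 sells. Returns true if the order may go out. */
static bool risk_check_order(risk_state *rs, int64_t qty) {
    assert(qty != 0);
    if (rs->killed)
        return false;
    /* Layer 1: order-rate anomaly detection. */
    if (rs->orders_in_window >= MAX_ORDERS_PER_WINDOW) {
        rs->killed = true;
        return false;
    }
    /* Layer 2: exposure limit, checked against the position if this order fills. */
    int64_t projected = rs->exposure + qty;
    if (projected > MAX_ABS_EXPOSURE || projected < -MAX_ABS_EXPOSURE) {
        rs->killed = true;
        return false;
    }
    rs->orders_in_window++;
    rs->exposure = projected;
    return true;
}

/* Demo: a runaway strategy fires n buy orders of `lot` shares within one window.
   Returns how many orders the gate let through. */
static int simulate_runaway(risk_state *rs, int n, int64_t lot) {
    int sent = 0;
    for (int i = 0; i < n; i++)
        if (risk_check_order(rs, lot))
            sent++;
    return sent;
}
```

Consider a runaway loop of 1,000 orders of 100 shares each. With these limits, 100 orders get through before the rate layer trips. With 1,000-share lots, the exposure layer trips first, after 50 orders. Making both layers latch the whole switch, instead of rejecting only the offending order, is a design choice: it assumes a breach means the strategy itself is misbehaving.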
| aaronharnly wrote: | I'm curious about the "repurposed flags" part. | | I wouldn't think of flags as expensive / effortful to make | more of, but clearly they must be if people are tempted to | reuse them. Can you help me understand what is meant by a | flag in this context, and why it would be repurposed? | isogon wrote: | Repurposing flags is not always well-motivated, but one | legitimate reason to do this is the memory (and | particularly cache) footprint. | | Often flags are local to a particular object. If there are | lots of such objects, you want each to take as little space | as possible. You should check out the contortions Linux | devs go through to make struct page small [0]. This is | important, because there is one such struct per page of | physical memory. The memory use is a near-constant | percentage of your total memory, and you wouldn't want it | to be any larger than necessary. | | Even when there are not a lot of these objects, in low-latency | software it's important to hit the cache. Your | program should always just be as compact in memory as | possible. | | Semantically, flags are booleans (is proposition P true of | this object). They are stored compactly as bitsets, often | implicitly, say:

    #include <stdint.h>

    /* kernel-style fixed-width aliases */
    typedef uint32_t u32;
    typedef uint16_t u16;
    typedef uint8_t  u8;

    #define FLAG_1 0x01
    #define FLAG_2 0x02
    /* ... */
    #define FLAG_8 0x80

    struct order {
        u32 qty;
        u16 id;
        u8  type;
        u8  flags;   /* one bit per FLAG_n */
    };

| This struct will fit into 8 bytes. This is great, as you | probably won't waste space on alignment in many cases -- 8 | is a good multiple. But if you wanted to add FLAG_9 here, | your flags would become a u16, and your struct would, | frustratingly, stop fitting into 8 bytes. To avoid this, | one might repurpose flags. | | Another example of this is intrusive flagging, using, for | example, the high or low bits of a pointer aligned to 2^n | bytes. If you run out of bits there, there's not much you can do. | | [0] https://github.com/torvalds/linux/blob/master/include/linux/...
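The size arithmetic in isogon's comment can be checked at compile time. This is a minimal C11 sketch. It assumes a common ABI (such as x86-64 or AArch64) in which a 32-bit integer is 4-byte aligned. The C standard itself does not fix struct padding, so these numbers are platform facts, not language guarantees. `struct order_wide` is a hypothetical variant in which `flags` has been widened to make room for a ninth flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Kernel-style short names, as in the comment above. */
typedef uint32_t u32;
typedef uint16_t u16;
typedef uint8_t  u8;

/* The original layout: 4 + 2 + 1 + 1 = 8 bytes, no padding. */
struct order {
    u32 qty;
    u16 id;
    u8  type;
    u8  flags;     /* room for exactly 8 one-bit flags */
};

/* Hypothetical variant after adding FLAG_9: flags widens to u16. */
struct order_wide {
    u32 qty;
    u16 id;
    u8  type;
    u16 flags;     /* needs 2-byte alignment, so a padding byte precedes it */
};

/* Compile-time checks of the layout on the assumed ABI. */
static_assert(sizeof(struct order) == 8, "order fits in 8 bytes");
static_assert(offsetof(struct order_wide, flags) == 8, "one padding byte before flags");
static_assert(sizeof(struct order_wide) == 12, "widening flags costs 4 bytes");
```

In the widened layout, a padding byte sits before `flags` and the struct rounds up to a multiple of 4. One extra flag therefore costs 4 bytes per object, or per message if the struct is sent over the wire.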
| pclmulqdq wrote: | This is pretty much why flags get repurposed. It's also | important to mention that things like JSON and protobufs | are too expensive for HFT, so you are likely going to be | sending structs over the wire. Repurposing flags lets you | change a wire format with a lot less friction than adding | a byte to a struct. Essentially, it lets you change the | minor version number on a protocol and only recompile the | endpoints without changing the major version number and | recompiling everything. | commandlinefan wrote: | > poor test coverage | | Yet you don't have to hang around here long to be told that | "Unit Testing is Overrated": https://tyrrrz.me/blog/unit-testing-is-overrated | kevstev wrote: | I worked in algo trading for years and eventually got out because, | quite frankly, the level of risk I was carrying on my shoulders | every day for what I was being paid was just way out of whack; | I at least personally never got the huge paydays that people | talked about until after I left finance for more pure tech. | Interestingly, I worked at Knight and my team pioneered trying | to blow up the firm, but that was in 2004, and things were much | friendlier: instead of front-page news, it was a small blurb on | page 3 of the markets section of the WSJ. | | Anyway, I still have friends in that business. It hasn't really | changed; they have too few people covering systems that are | quite complex, and while there are checks and such, no one | really understands things entirely from end to end in enough detail | to prevent all problems. | | I will never invest directly in an investment bank: either | through carelessness or maliciousness I could have easily | caused a 9-figure loss, if not more, and there were probably a | thousand other people in the same position. | | When I read the detailed writeup around this a few years back, | I think by far the biggest issue was reusing a tag that had | been previously used to denote which strategy to use.
I | understand why they may have chosen to do so: at the Big Bank I | was working at, getting a new FIX tag to be passed through all | the layers properly would involve at least two other teams and | coordinating releases and probably several weeks' worth of | meetings. If you just reuse an old value you can avoid all that | since everything is already set up. | sjtindell wrote: | I appreciate your comment about pay. Recruiters will often | tell me "it's finance so of course the pay will be | substantial." Then when we get to talking numbers they're | like "300k a year". Oh, you mean the going rate at a FAANG? | And I have to move to New York or Chicago, work more hours, | and actively work for people who I know are taking home | paychecks with 7+ zeroes on them? Come on. Sometimes it's 400 | plus bonus or whatever, which is based on fund performance | and yada yada. But it feels way off. I had heard so much | about the staggering paydays at these places but it seems you | need an ML PhD or some trading chops to be part of that. | caffeine wrote: | The attitude that finance pays more is a leftover from a | previous era. 10-15 years ago it was true: the profits from | HFT were also way, way bigger and split up amongst a | much smaller group of firms. | | Now those firms are all in a completely competitive | industry squeezing each other for basis points. | | Meanwhile the definition of a FAANG is that it has an | effective monopoly, and these companies are taking in way | more money than the HFT industry. (Netflix is losing its | monopoly but we can't really drop N from the acronym | without a replacement.) | 22SAS wrote: | Tbf, most of us don't really like being called HFTs; we prefer | market makers. Different name, but we still use | the same ultra-low-latency techniques to get the job | done. | spacemanmatt wrote: | > but we can't really drop N from the acronym without a | replacement | | Huh, yeah. That would be quite a GAAF.
Gotta come up with | something before Netflix is forced out of the FAANG club. | snotrockets wrote: | I've seen MAAM being used. | gjs278 wrote: | asjre34marakf wrote: | Why pay more than market rate of a replaceable ML person? | | Is there any realistic path for a demonstrably smart and | hardworking person into that 7+zeros club? Evidence | suggests no: leetcode grinders and FAANGers are not in that | club, and most of them will never even make it into the | 6+zeros club. Net wealth -- sure, but not income. | 22SAS wrote: | It's all about making $$ for the firm. If the strategies | developed are very profitable then 7 figures is | definitely reachable for the researchers at a prop | trading firm. | isogon wrote: | I cannot confirm this. ~300k is the pay (excluding sign-on) | fresh out of college at a big HFT -- sufficiently senior | devs make 7 figures. | hatesinterviews wrote: | At our firm, the numbers are similar: $600k TC for new | grads ($200k base, $100k minimum first-year bonus, $300k | signing bonus). | 22SAS wrote: | WTF! I am at an HFT firm in Chicago; this is insane. This | seems to be a lot like an offer from Radix, or Headlands, | or maybe Algo Dev at HRT. | isogon wrote: | There is certainly much variance between the firms, | especially the sign-on IME. People I know have turned | down HRT core dev for big tech because their offers were | unimpressive. | | I think an interesting target for comparison with big | tech is Jane Street, since their culture and WLB are | good, so the main QoL drawbacks of finance don't apply. A | new grad will get ~300k at Jane Street, though probably | not with this large a sign-on. | 22SAS wrote: | This is interesting; I didn't know that HRT Core Dev | offers were below FAANG. My understanding is that | core devs are basically the folks who work on all the | low-latency stuff, so they'd be paid pretty well. 
| | Jane Street, from what I recall, is 300K (base + bonus) | and 125K sign-on, and also it is non-negotiable.
No idea | what their numbers are like for experienced hires from | competitors. | 22SAS wrote: | Honestly, that depends on the firm. There are some that | do pay very well like this, e.g. HRT, Jane Street (they | are not an HFT though), Headlands, Radix. At others, | like Jump and Optiver, the pay varies depending on whether | it's front office or back office. | | Where I work, the new grad offers are slightly better | than FAANG, but the growth is very good based on | performance; we also pay very well to people coming in | from a competitor. | kevstev wrote: | Yeah, pay at the big banks is shit really, especially when | you consider the utter lack of work/life balance. I left in | 2013 making 150k, which was supposed to be supplemented by | a ~40% bonus for the level I was at, but each year was | "well, it's been a tough year..." and after getting a token | amount one year, and then zeroes the next 2, after working | 50-60 hour weeks, I was like, I am not only done with this | place, but this industry, and left for a 50% pay raise; my | TC is now 4x where it was in those days. A neighbor of mine | is more or less sitting in my exact seat there, and is | somewhere in the 200-250k range. | | That said, I went back to finance to work at one of the | premier hedge funds out there, and they actually lived up | to their expectations in terms of comp; that place was more | like a tech firm than any other firm I have ever | worked at, aside from maybe Knight. 8% annual increases were | normal there. You can look in my post history back to 2018 | if you want the name. I recently left after 5 years there | and just want to stay out of their crosshairs: they monitor | social media aggressively and there is deferred comp at | stake.
| | At big banks, there are really only a very small number of | people in tech who are getting paid; you have to | know which questions to ask: where is the bonus pool coming | from? Are you "in the business" or in the tech pool, which is | second-class? I would have to be in a pretty | bad place to ever consider going back to a bank; it was | borderline abusive... always dangling the prospect of that | big check that would make it worth it. | rosege wrote: | I spent a few years at an investment bank, not in the US, | and the only people on serious money were some of the top | managers. But my overall opinion of these people was | that they did very little, while the people lower down I met were | some of the most talented people I ever worked with. | | The top ones would spend all their time traveling the | world to the offices, meeting with staff in each | location and then sending emails to the rest of the | department about what the staff in that location were | working on. They would harvest ideas from the staff as | they went and then present them as their own or approve | projects that staff had suggested to them. I really | didn't see how they were worth the $5M they were earning | since they didn't come up with the ideas for what would | be done and didn't do any real work. | 22SAS wrote: | Most quantitative hedge funds and prop trading firms are | now following a very tech-like culture since they realize | now that technology is just as important as the | strategies. To get the best engineers, especially from | FAANG, they need to have a similar culture; otherwise | they'll have a hard time getting new hires. | benjaminwootton wrote: | I worked in a lot of front-office groups in investment banking. | The short spell I did in HFT had great software development and | DevOps practices. | idohft wrote: | Hard to speak for HFT in general. Like in software, different | firms have different levels of hygiene.
About half of your | bullet points were true of my previous employer at the time | I left. | aledalgrande wrote: | This is all basic stuff I look to set up in every team, and | it's crazy that, given how directly these firms work with tons of | money, they don't have an even higher standard. Guess I | wasn't wrong turning down these roles. | bnastic wrote: | I remember the Knight Cap event; I was working on order routing | at the time. | | Things have changed a lot since 2012, and at the same time | haven't. Circuit breakers and position monitoring are No. 1 in | any sane market-making firm. What happened then I can't imagine | happening now (accumulating a huge position for, what was it, | 30 minutes? With nobody killing the algos within a couple of | minutes?). On the other hand, the perfect world of "code | hygiene" and 100% test coverage will never exist in this world; | things will slip, and they do frequently. What's better, | externally, is the availability of good tools for development | and change reviews (Bitbucket taking hold, for example), | automated deployments, containers, testing frameworks and | similar. This type of software, end to end, is incredibly | complex and difficult to reason about when the unexpected happens | (there was a TTL misconfig for multicast and we never got such | and such update? Well, no one thought of that!), especially these days | with the influx of ML algos for price generation. | 22SAS wrote: | Currently work at an HFT firm. Most of the firms invest well | into good DevOps, Trading Systems and SRE teams, to ensure that | everything from installing a trading server at the colocation | facility, to CI/CD and making changes to the systems configs, | is done well. There are also guards in place so that if | the system seems to make trades that are way too odd, it pulls | the plug and goes down immediately. | | Also, any code that does not need to be there is promptly | removed.
| | Where I work, we have a few people from KCG, i.e. what was | formed after Knight Capital merged with GETCO after this | incident. Sometimes this incident is brought up, although I think none | of them ever worked for Knight Capital before this | incident. | bob1029 wrote: | Repurposing feature flags is some kind of next-dimension horror | for me. We've got quite a few of these to deal with, and if | someone started changing what they mean we'd be fucked super | fast. Simply _suggesting_ that we alter the meaning of an | existing FF would result in the resignation of a non-zero | number of project managers on my team. | | Rolling back code is another thing I have no tolerance for | anymore. The only option we entertain these days is a | roll-forward. If your software takes so long to iterate/build that | you need to go back to an old version in an emergency, you | need to review your languages/tools/frameworks/processes. We | maintain a contractual obligation to our customers for same-day | code updates (in cases of production/regulatory emergencies) | because we have enough confidence in our processes. | robofanatic wrote: | At the end... it's just money going from one account to another, | right? It's not like some physical thing that has perished and | can't be brought back. Why is it difficult to reverse the | transactions? | ceejayoz wrote: | Because those transactions cause other transactions, which | cause others, and so on and so forth. You'd have to reset the | market for the day. | | Imagine how pissed you'd be if you made money off Knight's | mistake and it all just disappeared the next day. | strgcmc wrote: | Except, well, cancelling transactions obviously does happen, | sometimes: https://www.reuters.com/business/lme-suspends-nickel-trading... | | Knight was probably too messy to roll back cleanly, but that | just means it's a matter of cost/complexity/politics...
if | you're a big enough player, then the exchange will do you | favors, like in the LME case. | | Free markets, lol | bfm wrote: | From the OP: | > Rules were established after | the "flash crash" of May 2010 to govern when trades should | be canceled. Knight's buying binge did not drive up the | price of the purchased stocks by more than 30 percent, the | cancellation threshold, except for six stocks. Those | transactions were reversed. In the other cases, the trades | stood. | ceejayoz wrote: | > The LME announced that all trades will be voided from | midnight until 8:15 a.m. on Tuesday when trading stopped | and added that it was considering a closure of several | days. | | > "People will be asking if this really a functioning | market... This is meant to be a market of last resort and | people can't get inventories to deliver against positions," | said Colin Hamilton, managing director of commodities | research at BMO Capital Markets. | | There's gonna be a pretty high threshold for this sort of | thing. Higher than "one company fucked up and wants a | do-over". | rubyskills wrote: | This is much easier to do in a centralized futures market. | I can't imagine a rollback in stocks being easy or | possible. | rubyskills wrote: | Exactly this. If you're a market maker, likely your trades | impact your own trades too. As you accumulate a position, | your average price is going up with it. Trades should not | just roll back because one large hedge fund screwed up. | Imagine being a retail trader with that expectation. Would be | nice! | anamax wrote: | > Why is it difficult to reverse the transactions? | | Why should the transactions be reversed? | | If things had gone according to plan, Knight would have made | several million dollars that day, some likely because of a | mistake by someone else or an unavoidable circumstance, just | like it did on other days. | | Those other people weren't made whole, so why should Knight be | any different?
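In its simplest form, the cancellation rule bfm quotes reduces to a percentage-move test. This is a minimal C sketch. It models only the single 30 percent threshold quoted from the article, measured against a hypothetical reference price. The actual post-2010 rules have more conditions than this, and they are not reproduced here. Checking moves in both directions is an assumption: the article only discusses prices being driven up. Names like `exceeds_cancel_threshold` are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The single threshold quoted in the article: price moves of more than 30 percent. */
#define CANCEL_THRESHOLD_PCT 30

/* Prices are integer cents so the boundary test has no floating-point rounding.
   ref_cents is a hypothetical reference price, e.g. the last price before the
   erroneous trading started. Both directions are checked (an assumption). */
static bool exceeds_cancel_threshold(int64_t ref_cents, int64_t trade_cents) {
    assert(ref_cents > 0);
    int64_t diff = trade_cents - ref_cents;
    if (diff < 0)
        diff = -diff;
    /* diff / ref > 30 / 100  <=>  diff * 100 > 30 * ref, all in integers */
    return diff * 100 > (int64_t)CANCEL_THRESHOLD_PCT * ref_cents;
}

/* Counts the trades in a batch that this simplified rule would reverse;
   every other trade stands. */
static int count_reversed(int64_t ref_cents, const int64_t *trades_cents, int n) {
    int reversed = 0;
    for (int i = 0; i < n; i++)
        if (exceeds_cancel_threshold(ref_cents, trades_cents[i]))
            reversed++;
    return reversed;
}
```

Because the quoted rule says "more than", a trade exactly 30 percent away from the reference stands. Against a $10.00 reference, a fill at $13.00 stands and a fill at $13.01 would be reversed.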
| user3939382 wrote: | Here's a 225 million dollar oopsie from 2005 | https://www.foxnews.com/story/typing-error-causes-225m-loss-... | bfm wrote: | Today there was a 300B oopsie in Europe caused by a Citibank | "glitch" bloomberg.com/news/articles/2022-05-02/citi-s-london-trading-desk-behind-rare-european-flash-crash | chmod775 wrote: | It was only a sudden drop in share prices, which quickly | rebounded. The amount of money that actually changed hands | due to that mistake will be tiny in comparison. | nuclearnice1 wrote: | Here's the same oops in 2001 | | https://www.wsj.com/articles/SB1007117680496415760 | gzer0 wrote: | _The incident happened after a technician forgot to copy the new | Retail Liquidity Program (RLP) code to one of the eight SMARS | computer servers, which was Knight's automated routing system | for equity orders. RLP code repurposed a flag that was formerly | used to activate an old function known as 'Power Peg'. Power Peg | was designed to move stock prices higher and lower in order to | verify the behavior of trading algorithms in a controlled | environment. Therefore, orders sent with the repurposed flag to | the eighth server triggered the defective Power Peg code still | present on that server_ [1] | | > Power Peg was designed to move stock prices higher and lower in | order to verify the behavior of trading algorithms in a | controlled environment. | | This is insane. Makes one wonder what _is_ or _isn't_ actually | being deployed in prod in 2022. | | [1] | https://en.wikipedia.org/wiki/Knight_Capital_Group#2012_stoc... | codeulike wrote: | _coder running down corridor to the trading room, bumping past | people and sending sheaves of papers flying_ | | "Power Peg has triggered! Tell them Power Peg has triggered!" | bovermyer wrote: | Interesting. Five years prior, this story was posted on this | blog: https://dougseven.com/2014/04/17/knightmare-a-devops-caution...
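The failure mode gzer0 quotes can be reproduced in miniature. The same bit carries two meanings, and which one applies depends on the build a server runs. This is a minimal C sketch under stated assumptions: a hypothetical bit value (the thread does not give the real encoding), round-robin routing of orders across the eight servers, and made-up names (`handle_order`, `count_power_peg`, `fleet_consistent`). It models only how the flag is interpreted, not what Power Peg itself did.

```c
#include <assert.h>
#include <stdint.h>

#define N_SERVERS 8

/* Hypothetical bit value; the thread does not give the real flag's encoding. */
#define FLAG_REUSED 0x01u   /* old builds: Power Peg; new builds: RLP */

enum build  { BUILD_OLD, BUILD_NEW };                    /* code a server is running */
enum action { ROUTE_NORMAL, ROUTE_RLP, RUN_POWER_PEG };  /* what the server does */

/* The meaning of the bit is decided by the receiving server's code,
   not by the intent of whoever set it. */
static enum action handle_order(enum build b, uint8_t flags) {
    if (!(flags & FLAG_REUSED))
        return ROUTE_NORMAL;
    return b == BUILD_NEW ? ROUTE_RLP : RUN_POWER_PEG;
}

/* Sends n flagged orders round-robin across the fleet (an assumed routing
   scheme) and counts how many land on the dead code path. */
static int count_power_peg(const enum build fleet[N_SERVERS], int n_orders) {
    assert(n_orders >= 0);
    int hits = 0;
    for (int i = 0; i < n_orders; i++)
        if (handle_order(fleet[i % N_SERVERS], FLAG_REUSED) == RUN_POWER_PEG)
            hits++;
    return hits;
}

/* Deploy-time guard: only enable the reused flag if every server
   reports the new build. */
static int fleet_consistent(const enum build fleet[N_SERVERS]) {
    for (int i = 0; i < N_SERVERS; i++)
        if (fleet[i] != BUILD_NEW)
            return 0;
    return 1;
}
```

Suppose seven servers run the new build and one runs the old build. In that case one flagged order in eight reaches the old code path. Rolling every server back to the old build, as bfm's summary of the incident describes, turns every flagged order into a Power Peg order. A deploy-time check like `fleet_consistent` would close the gap. So would allocating a fresh bit instead of reusing one.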
| dang wrote: | Related: | | _Knight Capital Says Trading Glitch Cost It $440 Million_ - | https://news.ycombinator.com/item?id=4329101 - Aug 2012 (90 | comments) | throwyawayyyy wrote: | Random, but I interviewed at Knight Capital for a software | engineering position a few weeks before this all went down. I was | in London, so the interview was done over the phone. Picture me | in the evening, handwriting C to solve some problem (the fog of | time too thick to remember what that problem was), then reading | out what I'd written, semicolons and all, to the interviewer. | Because of course there was no shared doc. I did very badly. But | then, so did they. | coolhoody wrote: | > handwriting C /.../ reading out what I'd written, semicolons | and all, to the interviewer. | | I had to re-read it to make sure you are not joking. The fact | that you were not made me laugh harder. | | I'm now just saying "retuuurn" in various exaggerated accents. | nogridbag wrote: | Same! Although I interviewed in person in their NYC office. I | was very junior at the time and the team I interviewed with was | awesome. I (luckily) didn't get the job. I did a few more | interviews and accepted an offer from another company where I | met my wife! ___________________________________________________________________ (page generated 2022-05-02 23:00 UTC)