[HN Gopher] The anatomy of an ML-powered stock picking engine
___________________________________________________________________

Author : muggermuch
Score  : 140 points
Date   : 2022-09-27 17:01 UTC (5 hours ago)

(HTM) web link (principiamundi.com)

| igorkraw wrote:
| Nice writeup, thank you for sharing so openly!
|
| The three things I always want to know from stock picking ML
| people:
|
| 1. Did you put your own money in it?
|
| 2. How'd it go?
|
| 3. How well does your engine do vs a fixed stock allocation based
| on trend-statistics computed on the whole time window (i.e.,
| compared to a fixed optimal portfolio computed with mean/std
| values you don't have access to, but which isn't allowed to
| change its choice)? What's the regret, if you are familiar with
| online learning?
| muggermuch wrote:
| Thank you for appreciating the article; I tried to disclose all
| that I could!
|
| 1. Yes, I did put my own money in it (low 6 figures).
|
| 2. It went as described in the article - for the capital I
| allocated to Didact, I beat the market (SPY) by ~20% since
| inception.
|
| 3. If I understand your question correctly, this would be the
| equivalent of the payoff on an optimal lookback option
| (https://en.wikipedia.org/wiki/Lookback_option). I haven't
| actually done that analysis, but it sounds like a nice idea.
| adamsmith143 wrote:
| >2. It went as described in the article - for the capital I
| allocated to Didact, I beat the market (SPY) by ~20% since
| inception.
|
| This seems extremely hard to believe. You should be running a
| multi-billion $ Quant fund if this is the case. The idea that
| you would try to push this as a newsletter rather than just
| taking investor money and becoming a billionaire literally
| makes the story seem farcical.
| HFguy wrote:
| It is very easy to believe.
|
| I could have flipped a coin, gone long or short at the
| beginning of this year.
|
| I would have had a 50% chance of outperforming the market
| by 40% this year (given it is down roughly 20%).
| [deleted]
| muggermuch wrote:
| >You should be running a multi-billion $ Quant fund if this
| is the case.
|
| You seem to underestimate the level of effort and rigor
| required to achieve this level of capital allocation. In
| contrast, beating the market by 20% is table stakes. Folks
| in the industry do it all the time; the difference here
| simply is that I built an ML-powered engine to do it
| systematically.
| colinmhayes wrote:
| Starting a hedge fund is a lot harder than beating the
| market by 20%.
| gbasin wrote:
| If your predictions are good, I'd be happy to get you $100
| million in assets to manage. It's very unlikely that your
| predictions are good...
| notacop31337 wrote:
| It's very unlikely that you're able to get OP $100 million in
| assets...
| mbarras_ing wrote:
| Brilliantly written. As someone considering a move into the Quant
| field, I found it very informative.
| muggermuch wrote:
| Thank you!
| ajoseps wrote:
| This is very cool! Where did you get your data from and how's the
| transition to Airflow?
| muggermuch wrote:
| There are commercial feeds available via Nasdaq Data Link (FKA
| Quandl). I also bought bulk historical data to feed through my
| backtester (I haven't talked about this in the post; it was
| getting to be a bit too long).
| timeserious wrote:
| Let's get a write-up of your backtesting framework too
| please! Terrific post @muggermuch - thank you!
| Joel_Mckay wrote:
| Every gambler thinks they have a system, but often fails to
| recognize the game was unfair long before they arrived. lol =)
| darepublic wrote:
| You can think outside the box to beat the unfair game but then
| you end up in jail.
| jesuslop wrote:
| Nice report. How did you do risk management? Have you been
| leveraged? Have you paid for data? Kudos for a view from the
| trenches.
| muggermuch wrote:
| Thank you!
|
| >How did you do risk management?
| I put in a basic position management layer (1% fixed stop).
| Also, the market regime module would modulate participation,
| i.e. in really risky environments it would dial down the number
| of stock picks. I can definitely do much more on this front,
| but I wanted to nail down the stock picking first! :)
|
| >Have you been leveraged?
| No leverage.
|
| >Have you paid for data?
| Yes, my monthly running costs for data are ~$1.2k.
| pneumatic1 wrote:
| Have you looked into the Kelly criterion?
| muggermuch wrote:
| Yes! I use fractional Kelly extensively in my (separate)
| higher-frequency strategies (on MES/ES/NQ/VX futures).
|
| I'm thinking of writing some follow-up posts on how to
| reason about ML-driven strategies in an intraday setting.
| Thanks to low-cost brokerages, there's a lot of alpha that
| can be captured by small-league speculators such as myself.
| muggermuch wrote:
| Hi, fellow HN'ers! Author here, please let me know if you have
| any questions or thoughts!
| krschultz wrote:
| I'm not at all interested in finance / stock picking but found
| this to be one of the best walkthroughs of an ML system
| end-to-end that I've ever read. I'm not in the field of ML but
| I'm interested in learning more and this was fantastic, thank you.
| muggermuch wrote:
| Thank you so much for your kind words! Your comment made my
| day! :)
| dennisy wrote:
| This is great! Thanks for writing this!
|
| I have wanted to do something like this for a while, purely for
| learning. The thing which puts me off is that there is a huge
| amount of knowledge needed to understand the features, as
| opposed to the ML.
|
| Could you recommend a base system / reference one could use to
| get started which explains or bakes in some of the feature /
| signals engineering work?
|
| Also would this approach work with crypto?
| muggermuch wrote:
| > Also would this approach work with crypto?
|
| Some of it works on crypto.
| TBH I've stayed away from the
| asset class, but only because I find it difficult to build
| mental models and think about features (in my mind, it's a
| mix of commodity factors and currency factors, but I'd have
| to test it out).
|
| I seem to remember coming across papers that have tested
| momentum factors at larger time-frames (e.g. weeklies).
|
| > Could you recommend a base system / reference one could use
| to get started which explains or bakes in some of the feature
| / signals engineering work?
|
| The references I put in at the end of the post will really
| help with this! I might actually write out a separate blog
| post about starting out in this space from an ML perspective.
| Thanks for the idea!
| idoh wrote:
| If you have a tool that can generate great returns, then why fall
| back to a newsletter?
| muggermuch wrote:
| Great question.
|
| If I beat the market by 20% (say SPY generated 0% for the year,
| very optimistic at this point), and I have allocated $100k to
| this, I make $20k before taxes.
|
| That's less than minimum wage.
|
| Meanwhile, allocators expect a track record of at least 3-5
| years.
|
| Ideally, if I have an asset, I'd like to extract as much
| revenue as I can.
|
| Hope this makes sense.
| xapata wrote:
| If you're sitting on a gold mine, you can wait 5 years. This
| does not make sense.
| [deleted]
| M3L0NM4N wrote:
| You also don't know if your alpha is going to last 5 years.
| The gold mine can run out of gold.
| beambot wrote:
| OP could parlay this experience into a high-paying finance
| job. Algorithmic edge tends to be short-lived.
| muggermuch wrote:
| Indeed. I haven't shut down development, just shut down the
| newsletter. I'm continuing to work on it.
| shadycuz wrote:
| I'm a self-proclaimed world-class DevOps engineer. Can I
| help contribute in order to get access to the model?
| muggermuch wrote:
| :) hmu on LinkedIn!
| YetAnotherNick wrote:
| Learn about hedging.
| Basically, for $100k, if your prediction
| could consistently beat some index, you don't just buy a
| stock, but you also short some other stock/index at the same
| time. So you own net 0 worth of stock but you get the difference
| in the increase as your profit. Obviously in the real world, you
| would need some sort of deposit, but you could bet millions
| for $100k.
| seanhunter wrote:
| You're talking about both hedging and leverage and this is
| a very important difference.
|
| Turning a long-only equity strategy into a long/short
| strategy or an "outperformance" strategy[1] with added
| leverage can seriously affect the volatility of returns and
| the risk of ruin so it's really important to understand
| well before embarking on this, because it will affect
| position sizing and a bunch of other things. You can indeed
| bet millions for $100k, but if your strategy has 10%
| volatility unlevered you can get completely wiped out in
| doing so whereas the risk of ruin of the unleveraged
| strategy is far lower.
|
| [1] You could say long/short is where you long some things
| and short some other things generally whereas
| outperformance is where you long some things and
| specifically short an index. So in the latter case you are
| betting on the outperformance of your picks in particular
| and in the former you are just saying you have the ability
| to pick both things that go up and things that go down.
| Straw wrote:
| How difficult is it to get investors when you can show your
| model beats the market consistently?
|
| Of course, they have to check you're not trading a strategy
| with extreme tail risk, but here it sounds like that's not
| the case?
| muggermuch wrote:
| It's difficult. We made a lot of pitches.
| Investors/allocators require a fairly long track record and
| are extremely reluctant to fund (what they perceive to be)
| black box strategies.
| [deleted]
| sanp wrote:
| OP, what are you using to draw the diagrams?
| They look nice and
| are very readable.
| muggermuch wrote:
| Thank you!
|
| I used Excalidraw (https://excalidraw.com), and I highly
| recommend it! It gives me 'xkcd' vibes.
| alpineidyll3 wrote:
| My heart goes out to this author, but you can tell even by his
| first table that he doesn't quite understand the mathematics of
| financial markets, the purpose of a hedge fund, how they grow
| etc.
|
| 1) It's plain from a quick look at the allocation of capital in
| investment firms that AUM is not made by performance; it's
| marketing. At best people invest when they believe a person is
| connected to inside information. Saying you have an ML advisor is
| really just a pre-req to these people.
|
| 2) Is that allocation stupid? No, it's not, because actually the
| powers of mathematics and by extension ML are intrinsically
| limited for investment returns because returns are fat-tailed
| </Taleb>. For example this author quotes a realistic Sharpe
| (0.8), but didn't calculate the standard deviation in his Sharpe,
| which I would bet a large sum was _at least_ 0.8. I.e., he doesn't
| really know what his Sharpe is. This is because equity assets
| behave like a Student's t-distribution with a degree-of-freedom
| parameter ~2 or less </Mandelbrot, /Bergomi, /Gatheral etc.>. I.e.,
| higher moments, such as uncertainty in Sharpe, literally do not
| exist or converge and are unknowable. The only exception is if
| your strategy explicitly cuts off tails.
|
| Once you understand 2) you begin to understand that there's no
| such thing as a real quant fund (i.e. a fund which truly makes
| money predictably using models) which doesn't trade a
| liquidity-limited book that has quite advanced hedging. Wealthy
| people are aware of this, which is why the author can't market
| this product.
|
| If you're doing something silly like holding equities without
| tail risk control, you literally cannot be quantitatively
| investing.
| You are just slowly rediscovering what Kelly, Bergomi,
| Mandelbrot, Bernay's etc. realized with a little deep thought
| over pen and paper (while clumsily writing boilerplate software):
| that markets are entropy machines rougher than a normal
| distribution, and any gains come directly from information (see
| Kelly, "A New Interpretation of Information Rate").
|
| For a high latency (ms) market data feed, the returns on
| information are very very small. Markets are efficient.
| chollida1 wrote:
| Someone asked about how difficult it is to get outside
| investment....
|
| It's usually very difficult and it takes a lot of money to run a
| proper fund.
|
| Let's say you raise $50M. You can maybe charge 1 and 20, meaning
| you get 1% of assets each year for running the fund and 20% of
| profits.
|
| 1% of $50M (and keep in mind this is a large raise for someone
| without a track record on the sell side or inside another fund)
| gives you $500,000 a year to pay:
|
| - salaries (let's say you pay yourself $100,000 all in plus the
| same for a single analyst)
|
| - a Bloomberg terminal $30,000 including data feeds
|
| - market data feeds you need $25,000/year for basic market data
| and fundamental data that you are allowed to warehouse (you can't
| store data you get from the Bloomberg terminal).
|
| - rent $50,000/year for office space
|
| - outside lawyer fees and outside accounting fees $100,000/year
|
| - similar fees for someone to run your back office, roughly
| $100,000/year.
|
| And on the other side of expenses you have the money making side
| of things. Which as the OP pointed out isn't great. If you return
| 10% on the $50M, that gives $5M in profits, and you keep 20% of
| that, i.e. $1M.
|
| That allows you to bonus out yourself and analysts on good years.
| If you lose money one year then you get no bonus and have to
| bonus out the employees out of the retained earnings you kept
| from previous bonuses.
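
The fee arithmetic in the comment above can be checked with a short sketch. All dollar figures are the ones quoted in the comment; the function and variable names are illustrative, and the simplification that the performance fee is charged on gross profit (no hurdle rate, no high water mark) is an assumption of the sketch rather than something the thread specifies.

```python
# Rough "1 and 20" fund economics, using the figures quoted in the thread.
# Names are hypothetical; the performance fee is assumed to apply to gross
# profit with no hurdle and no high water mark.

AUM = 50_000_000          # assets raised
MGMT_FEE_RATE = 0.01      # the "1": 1% of assets per year
PERF_FEE_RATE = 0.20      # the "20": 20% of profits

ANNUAL_COSTS = {
    "founder salary": 100_000,
    "analyst salary": 100_000,
    "Bloomberg terminal": 30_000,
    "market and fundamental data": 25_000,
    "office rent": 50_000,
    "outside legal and accounting": 100_000,
    "back office": 100_000,
}


def fund_economics(gross_return: float) -> dict:
    """Return the manager's fee income and costs for one year."""
    management_fee = AUM * MGMT_FEE_RATE
    profits = max(AUM * gross_return, 0.0)   # no performance fee on a loss
    performance_fee = profits * PERF_FEE_RATE
    costs = sum(ANNUAL_COSTS.values())
    return {
        "management_fee": management_fee,
        "performance_fee": performance_fee,
        "costs": costs,
        "net_to_manager": management_fee + performance_fee - costs,
    }


if __name__ == "__main__":
    for r in (0.10, 0.0, -0.10):
        result = fund_economics(r)
        print(f"return {r:+.0%}: " +
              ", ".join(f"{k}={v:,.0f}" for k, v in result.items()))
```

With these inputs the listed costs sum to $505,000, slightly more than the $500,000 management fee, so under the sketch's assumptions the firm only comes out ahead in years when a performance fee is earned.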
|
| It usually gets worse as most funds have what's called a high
| water mark. This means you don't collect the performance fee
| until your fund gets back to the high water mark. So if you are
| down 10% one year you need to make that back before you start to
| make any performance fee, which is why most funds shut down if
| they go down more than 20%.
|
| As to raising money.....Anyone can show a model that makes money.
| That doesn't mean it's easy to create a model, it's just that
| there are a lot of people capable of building such a model.
|
| It's the risk management that people with money are really looking
| for and sadly that's just really hard to show out of a model, as
| part of the risk management is things like position sizing and
| showing your model doesn't pile into one asset class or trade
| correlated products.
|
| It bodes well for the OP that they talk about market regimes as,
| IMHO, this is one of the biggest risk management tools that
| aspiring traders ignore.
|
| And this risk management is why people ask for a track record of
| more than a year.
| muggermuch wrote:
| Thank you for this comprehensive response!
|
| I have often found myself struggling to explain the difference
| between building a strategy or trading system (which reduces to
| a technical/intellectual challenge) and running a hedge fund
| (essentially running a complex information-driven business).
|
| Your cost breakdown really puts matters into perspective.
|
| > it bodes well for the OP that they talk about market regimes
|
| I concur. Market regimes (modeling, detecting, reasoning about
| them) are too delicious of an intellectual puzzle to resist.
| HFguy wrote:
| This is actually way too optimistic.
|
| Your first 1-2 seed investors will:
|
| - Only pay 1 and 10 (1% fixed fee and 10% of PNL)
|
| - Also get ownership of the actual fund management
| firm, in the form of 20% of REVENUE (not
| equity, revenue, think about that)
|
| This is one reason new fund formation is way down. The
| economics are bad for years. I know a bunch of HF people who
| started VC-backed tech firms instead.
|
| The other reason is the 10+ year run where stocks, bonds, private
| firms and real estate just went up. No need for diversifying
| return streams.
| HFguy wrote:
| BTW, the data costs are also too low.
|
| Just a BB terminal is around 30k and a lot of extra data from BB
| costs extra (can be 200-300k per additional product).
|
| For a quant strategy you are probably looking at 500k up to 2M
| for data initially. And you will likely be at a disadvantage to
| existing firms that have been collecting data for years.
|
| And that is at the low end. We spent many millions per year for
| 1 strategy at my last large firm. And that was a small fraction
| of total firm spend.
| rmah wrote:
| Working in the industry, I can confirm that the above numbers
| are approximately correct except for the employee costs --
| those are roughly double and up. You also need to hire a fund
| administrator, auditors and compliance firms (maybe $50k to
| $100k per year each) which add on even more costs. And you
| can't skip the lawyers, outside administrator, outside
| compliance, etc. as they are required by regulations/law.
| prabdude wrote:
| Excellent article
| muggermuch wrote:
| Thanks!
| unpwn wrote:
| Lmao this engine is down 6.9% for the year, when literally it's
| as simple as just buying some puts.
| Jabbles wrote:
| You realise that puts have a cost that is determined by the
| market?
| ramesh31 wrote:
| > this engine is down 6.9% for the year
|
| That's pretty damn good, and still beating the market by nearly
| 20%. Of course you can always make more with riskier
| strategies.
| [deleted]
| antognini wrote:
| Have you considered submitting your predictions to
| Numerai Signals? It's market neutral so as long as your models can
| generate some alpha you can still get good returns.
| muggermuch wrote:
| That's a good idea. I'll try it out, thanks!
___________________________________________________________________
(page generated 2022-09-27 23:00 UTC)