[HN Gopher] The anatomy of an ML-powered stock picking engine
       ___________________________________________________________________
        
       The anatomy of an ML-powered stock picking engine
        
       Author : muggermuch
       Score  : 140 points
       Date   : 2022-09-27 17:01 UTC (5 hours ago)
        
 (HTM) web link (principiamundi.com)
 (TXT) w3m dump (principiamundi.com)
        
       | igorkraw wrote:
       | Nice writeup, thank you for sharing so openly!
       | 
       | The three things I always want to know from stock picking ML
       | people:
       | 
       | 1. Did you put your own money in it?
       | 
       | 2. How'd it go?
       | 
       | 3. How well does your engine do vs. a fixed stock allocation
       | based on trend statistics computed over the whole time window?
       | That is, compare it to a fixed optimal portfolio computed with
       | mean/std values you don't have access to, but which isn't
       | allowed to change its choice. If you are familiar with online
       | learning: what's the regret?
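       | 
       | A minimal sketch of the regret comparison in point 3, assuming
       | two assets, per-period simple returns, and a grid search over
       | fixed (rebalanced) weights chosen in hindsight. The function
       | names and the toy returns are hypothetical; nothing here comes
       | from the article.

```python
import math

def log_wealth(weights, returns):
    """Cumulative log growth of a portfolio rebalanced to weights[t] each period."""
    total = 0.0
    for w, r in zip(weights, returns):
        growth = sum(wi * (1.0 + ri) for wi, ri in zip(w, r))
        total += math.log(growth)
    return total

def best_fixed_log_wealth(returns, grid_steps=100):
    """Best fixed two-asset allocation chosen in hindsight (grid search)."""
    best = -math.inf
    for k in range(grid_steps + 1):
        w = k / grid_steps
        fixed = [(w, 1.0 - w)] * len(returns)
        best = max(best, log_wealth(fixed, returns))
    return best

def regret(strategy_weights, returns):
    """Hindsight-optimal fixed log wealth minus the strategy's log wealth."""
    return best_fixed_log_wealth(returns) - log_wealth(strategy_weights, returns)

# Toy data: two periods, two assets (hypothetical numbers).
returns = [(0.10, -0.05), (-0.05, 0.10)]
strategy = [(0.0, 1.0), (1.0, 0.0)]  # picks the losing asset each period
print(round(regret(strategy, returns), 4))
```

       | A strategy that changes its weights can end up with negative
       | regret under this definition, since the benchmark is held to a
       | single fixed allocation.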
        
         | muggermuch wrote:
         | Thank you for appreciating the article; I tried to disclose all
         | that I could!
         | 
         | 1. Yes, I did put my own money in it (low 6 figures).
         | 
         | 2. It went as described in the article - for the capital I
         | allocated to Didact, I beat the market (SPY) by ~20% since
         | inception.
         | 
         | 3. If I understand your question correctly, this would be the
         | equivalent of the payoff on an optimal lookback option
         | (https://en.wikipedia.org/wiki/Lookback_option). I haven't
         | actually done that analysis, but it sounds like a nice idea.
        
           | adamsmith143 wrote:
           | >2. It went as described in the article - for the capital I
           | allocated to Didact, I beat the market (SPY) by ~20% since
           | inception.
           | 
           | This seems extremely hard to believe. You should be running a
           | multi-billion $ Quant fund if this is the case. The idea that
           | you would try to push this as a newsletter rather than just
           | taking investor money and becoming a billionaire literally
           | makes the story seem farcical.
        
             | HFguy wrote:
             | It is very easy to believe.
             | 
             | I could have flipped a coin and gone long or short at the
             | beginning of this year.
             | 
             | I would have had a 50% chance of outperforming the market
             | by 40% this year (given it is down roughly 20%).
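             | 
             | The arithmetic behind the coin flip, as a small sketch. It
             | assumes the short side earns exactly the negative of the
             | market return and ignores borrow costs and dividends; the
             | variable names are hypothetical.

```python
market_return = -0.20  # "down roughly 20%" per the comment

# Long the market: you simply match it. Short the market: you earn the opposite.
outcomes = {"long": market_return, "short": -market_return}

for side, ret in outcomes.items():
    excess = ret - market_return
    print(f"{side}: return {ret:+.0%}, vs market {excess:+.0%}")
```

             | Only the short side produces the 40-point gap; the long side
             | just matches the market.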
        
             | muggermuch wrote:
             | >You should be running a multi-billion $ Quant fund if this
             | is the case.
             | 
             | You seem to underestimate the level of effort and rigor
             | required to achieve this level of capital allocation. In
             | contrast, beating the market by 20% is table stakes. Folks
             | in the industry do it all the time; the difference here
             | simply is that I built an ML-powered engine to do it
             | systematically.
        
             | colinmhayes wrote:
             | Starting a hedge fund is a lot harder than beating the
             | market by 20%.
        
       | gbasin wrote:
       | If your predictions are good, I'd be happy to get you $100
       | million in assets to manage. It's very unlikely that your
       | predictions are good...
        
         | notacop31337 wrote:
         | It's very unlikely that you're able to get OP $100 million in
         | assets...
        
       | mbarras_ing wrote:
       | Brilliantly written. As someone considering a move into the Quant
       | field it is very informative.
        
         | muggermuch wrote:
         | Thank you!
        
       | ajoseps wrote:
       | this is very cool! where did you get your data from and how's the
       | transition to airflow?
        
         | muggermuch wrote:
         | There are commercial feeds available via Nasdaq DataLink (FKA
         | Quandl). I also bought bulk historical data to feed through my
         | backtester (I haven't talked about this in the post; it was
         | getting to be a bit too long).
        
           | timeserious wrote:
           | Let's get a write up of your backtesting framework too
           | please! Terrific post @muggermuch - thank you!
        
       | Joel_Mckay wrote:
       | Every gambler thinks they have a system, but often fails to
       | recognize that the game was unfair long before they arrived. lol =)
        
         | darepublic wrote:
         | You can think outside the box to beat the unfair game but then
         | you end up in jail.
        
       | jesuslop wrote:
       | Nice report. How did you do risk management? Have you been
       | leveraged? Have you paid for data? Kudos for a view from the
       | trenches.
        
         | muggermuch wrote:
         | Thank you!
         | 
         | >How did you do risk management?
         | 
         | I put in a basic position management layer (1% fixed stop).
         | Also, the market regime module would modulate participation,
         | i.e. in really risky environments it would dial down the number
         | of stock picks. I can definitely do much more on this front,
         | but I wanted to nail down the stock picking first! :)
         | 
         | >Have you been leveraged?
         | 
         | No leverage.
         | 
         | >Have you paid for data?
         | 
         | Yes, my monthly running costs for data are ~$1.2k.
        
           | pneumatic1 wrote:
           | Have you looked into Kelly criterion?
        
             | muggermuch wrote:
             | Yes! I use fractional Kelly extensively in my (separate)
             | higher-frequency strategies (on MES/ES/NQ/VX futures).
             | 
             | I'm thinking of writing some follow-up posts on how to
             | reason about ML-driven strategies in an intraday setting.
             | Thanks to low-cost brokerages, there's a lot of alpha that
             | can be captured by small league speculators such as myself.
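             | 
             | For readers unfamiliar with the term, a sketch of fractional
             | Kelly using the textbook binary-bet form f* = p - (1 - p) / b.
             | This is an assumption for illustration only: the comment does
             | not describe how the author actually sizes futures positions,
             | and the function names and the 0.5 fraction are hypothetical.

```python
def kelly_fraction(p_win, payoff_ratio):
    """Full-Kelly fraction of capital for a binary bet.

    p_win: probability of winning; payoff_ratio: amount won per unit risked.
    """
    f = p_win - (1.0 - p_win) / payoff_ratio
    return max(f, 0.0)  # no edge -> bet nothing

def fractional_kelly(p_win, payoff_ratio, fraction=0.5):
    """Scale the full-Kelly bet down to reduce volatility and estimation risk."""
    return fraction * kelly_fraction(p_win, payoff_ratio)

# A 55% win rate at even odds: full Kelly is 10% of capital, half Kelly is 5%.
print(fractional_kelly(0.55, 1.0, fraction=0.5))
```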
        
       | muggermuch wrote:
       | Hi, fellow HN'ers! Author here, please let me know if you have
       | any questions or thoughts!
        
         | krschultz wrote:
         | I'm not at all interested in finance / stock picking but found
         | this to be one of the best walkthroughs of an ML system end-to-
         | end that I've ever read. I'm not in the field of ML but I'm
         | interested in learning more and this was fantastic, thank you.
        
           | muggermuch wrote:
           | Thank you so much for your kind words! Your comment made my
           | day! :)
        
         | dennisy wrote:
         | This is great! Thanks for writing this!
         | 
         | I have wanted to do something like this for a while, purely for
         | learning. The thing which puts me off is that there is a huge
         | amount of knowledge needed in understanding the features vs the
         | ML.
         | 
         | Could you recommend a base system / reference one could use to
         | get started which explains or bakes in some of the feature /
         | signals engineering work?
         | 
         | Also would this approach work with crypto?
        
           | muggermuch wrote:
           | > Also would this approach work with crypto?
           | 
           | Some of it works on crypto. TBH I've stayed away from the
           | asset class, but only because I find it difficult to build
           | mental models and think about features (in my mind, it's a
           | mix of commodity factors and currency factors, but I'd have
           | to test it out).
           | 
           | I seem to remember coming across papers that have tested
           | momentum factors at larger time-frames (e.g. weeklies).
           | 
           | > Could you recommend a base system / reference one could use
           | to get started which explains or bakes in some of the feature
           | / signals engineering work?
           | 
           | The references I put in at the end of the post will really
           | help with this! I might actually write out a separate blog
           | post about starting out in this space from an ML perspective.
           | Thanks for the idea!
        
       | idoh wrote:
       | If you have a tool that can generate great returns, then why fall
       | back to a newsletter?
        
         | muggermuch wrote:
         | Great question.
         | 
         | If I beat the market by 20% (say SPY generated 0% for the year,
         | very optimistic at this point), and I have allocated $100k to
         | this, I make $20k before taxes.
         | 
         | That's less than minimum wage.
         | 
         | Meanwhile, allocators expect a track record of at least 3-5
         | years.
         | 
         | Ideally, if I have an asset, I'd like to extract as much
         | revenue as I can.
         | 
         | Hope this makes sense.
        
           | xapata wrote:
           | If you're sitting on a gold mine, you can wait 5 years. This
           | does not make sense.
        
             | M3L0NM4N wrote:
             | You also don't know if your alpha is going to last 5 years.
             | The gold mine can run out of gold.
        
             | beambot wrote:
             | OP could parlay this experience into a high-paying finance
             | job. Algorithmic edge tends to be short lived.
        
             | muggermuch wrote:
             | Indeed. I haven't shut down development, just shut down the
             | newsletter. I'm continuing to work on it.
        
               | shadycuz wrote:
               | I'm a self proclaimed world class DevOps engineer. Can I
               | help contribute in order to get access to the model?
        
               | muggermuch wrote:
               | :) hmu on LinkedIn!
        
           | YetAnotherNick wrote:
           | Learn about hedging. Basically, for $100k, if your prediction
           | could consistently beat some index, you don't just buy a
           | stock, but you also sell (short) some other stock/index at the
           | same time. So you own net 0 worth of stock but you get the
           | difference in the increase as your profit. Obviously in the
           | real world, you would need some sort of deposit, but you could
           | bet millions for $100k.
        
             | seanhunter wrote:
             | You're talking about both hedging and leverage and this is
             | a very important difference.
             | 
             | Turning a long-only equity strategy into a long/short
             | strategy or an "outperformance" strategy[1] with added
             | leverage can seriously affect the volatility of returns and
             | the risk of ruin so it's really important to understand
             | well before embarking on this, because it will affect
             | position sizing and a bunch of other things. You can indeed
             | bet millions for $100k, but if your strategy has 10%
             | volatility unlevered you can get completely wiped out in
             | doing so whereas the risk of ruin of the unleveraged
             | strategy is far lower.
             | 
             | [1] You could say long/short is where you long some things
             | and short some other things generally whereas
             | outperformance is where you long some things and
             | specifically short an index. So in the latter case you are
             | betting on the outperformance of your picks in particular
             | and in the former you are just saying you have the ability
             | to pick both things that go up and things that go down.
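             | 
             | A rough sketch of why leverage changes the picture, assuming
             | an "outperformance" book that is long some notional of picks
             | and short the same notional of an index. Financing, borrow
             | fees, and margin calls are ignored, and all numbers and names
             | are hypothetical.

```python
def long_short_pnl(capital, notional, picks_return, index_return):
    """P&L of being long `notional` of picks and short `notional` of an index.

    Returns (pnl, return_on_capital). Financing, borrow fees, and margin calls
    are ignored, which flatters the levered result.
    """
    pnl = notional * (picks_return - index_return)
    return pnl, pnl / capital

capital = 100_000
for notional in (100_000, 1_000_000):
    for picks, index in ((0.05, 0.03), (-0.07, 0.03)):
        pnl, roc = long_short_pnl(capital, notional, picks, index)
        print(f"notional={notional:>9,} spread={picks - index:+.0%} "
              f"pnl={pnl:>+10,.0f} return on capital={roc:+.0%}")
```

             | With $1M of notional on $100k of capital, a 10-point
             | underperformance of the picks vs. the index loses the entire
             | stake, while the same spread unlevered costs 10%.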
        
           | Straw wrote:
           | How difficult is it to get investors when you can show your
           | model beats the market consistently?
           | 
           | Of course, they have to check you're not trading a strategy
           | with extreme tail risk, but here it sounds like that's not
           | the case?
        
             | muggermuch wrote:
             | It's difficult. We made a lot of pitches.
             | Investors/allocators require a fairly long track record and
             | are extremely reluctant to fund (what they perceive to be)
             | black box strategies.
        
       | sanp wrote:
       | OP, what are you using to draw the diagrams? They look nice and
       | are very readable.
        
         | muggermuch wrote:
         | Thank you!
         | 
         | I used Excalidraw (https://excalidraw.com), and I highly
         | recommend it! It gives me 'xkcd' vibes.
        
       | alpineidyll3 wrote:
       | My heart goes out to this author, but you can tell even by his
       | first table that he doesn't quite understand the mathematics of
       | financial markets, the purpose of a hedge fund, how they grow
       | etc.
       | 
       | 1) It's plain by quickly looking at the allocation of capital in
       | investment firms, that AUM is not made by performance; it's
       | marketing. At best people invest when they believe a person is
       | connected to inside information. Saying you have an ML advisor is
       | really just a pre-req to these people.
       | 
       | 2) Is that allocation stupid? No, it's not, because actually the
       | powers of mathematics and by extension ML are intrinsically
       | limited for investment returns because they are fat-tailed
       | </Taleb>. For example this author quotes a realistic Sharpe
       | (0.8), but didn't calculate the standard deviation in his Sharpe,
       | which I would bet a large sum was _at least_ 0.8. I.e., he doesn't
       | really know what his Sharpe is. This is because equity assets
       | behave like a Student-t distribution with a degree-of-freedom
       | parameter ~2 or less </Mandelbrot, /Bergomi, /Gatheral etc.>. I.e.,
       | higher moments, such as uncertainty in Sharpe, literally do not
       | exist or converge and are unknowable. The only exception is if
       | your strategy explicitly cuts off tails.
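       | 
       | A sketch of how large the uncertainty in a Sharpe estimate is
       | even in the friendliest case. It assumes one year of daily IID
       | normal returns with a true annualized Sharpe of 0.8; the track
       | record length, the 1% daily volatility, and the function names
       | are my assumptions, not figures from the article. It compares a
       | simulation with the usual large-sample approximation for IID
       | returns.

```python
import math
import random
import statistics

def annualized_sharpe(daily_returns, periods_per_year=252):
    """Sample Sharpe ratio (risk-free rate taken as zero), annualized."""
    mean = statistics.fmean(daily_returns)
    stdev = statistics.stdev(daily_returns)
    return mean / stdev * math.sqrt(periods_per_year)

def sharpe_spread(true_sharpe=0.8, n_days=252, n_trials=500, seed=0):
    """Std. dev. of the estimated Sharpe across simulated track records."""
    rng = random.Random(seed)
    daily_vol = 0.01  # assumed daily volatility; the spread does not depend on it
    daily_mean = true_sharpe * daily_vol / math.sqrt(252)
    estimates = [
        annualized_sharpe([rng.gauss(daily_mean, daily_vol) for _ in range(n_days)])
        for _ in range(n_trials)
    ]
    return statistics.stdev(estimates)

# Large-sample approximation for IID returns, annualized:
# SE(SR) ~ sqrt((1 + SR_daily^2 / 2) / n_days) * sqrt(252)
n_days = 252
daily_sr = 0.8 / math.sqrt(252)
analytic_se = math.sqrt((1 + 0.5 * daily_sr**2) / n_days) * math.sqrt(252)

print(f"simulated spread: {sharpe_spread():.2f}, analytic: {analytic_se:.2f}")
```

       | Under these assumptions the standard error is roughly 1.0 for a
       | one-year record, which is in line with the "at least 0.8" bet.
       | Fat tails would only widen it.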
       | 
       | Once you understand 2) you begin to understand that there's no
       | such thing as a real quant fund (ie a fund which truly makes
       | money predictably using models) which doesn't trade a liquidity
       | limited book that has quite advanced hedging. Wealthy people are
       | aware of this, which is why the author can't market this product.
       | 
       | If you're doing something silly like holding equities without
       | tail risk control, you literally cannot be quantitatively
       | investing. You are just slowly rediscovering what Kelly, Bergomi,
       | Mandelbrot, Bernay's etc. realized with a little deep thought
       | over pen and paper (while clumsily writing boilerplate software):
       | that markets are entropy machines rougher than a normal
       | distribution, and any gains come directly from information. (See
       | Kelly: "A New Interpretation of Information Rate".)
       | 
       | For a high latency (ms) market data feed, the returns on
       | information are very very small. Markets are efficient.
        
       | chollida1 wrote:
       | Someone asked about how difficult it is to get outside
       | investment....
       | 
       | It's usually very difficult and it takes a lot of money to run a
       | proper fund.
       | 
       | Let's say you raise $50M. You can maybe charge 1 and 20, meaning
       | you get 1% of assets each year for running the fund and 20% of
       | profits.
       | 
       | 1% of $50M (and keep in mind this is a large raise for someone
       | without a track record on the sell side or inside another fund)
       | gives you $500,000 a year to pay:
       | 
       | - salaries (let's say you pay yourself $100,000 all in, plus the
       | same for a single analyst)
       | 
       | - a Bloomberg terminal $30,000 including data feeds
       | 
       | - market data feeds: you need $25,000/year for basic market data
       | and fundamental data that you are allowed to warehouse (you can't
       | store data you get from the Bloomberg terminal).
       | 
       | - rent $50,000/year for office space
       | 
       | - outside lawyer fees and outside accounting fees $100,000/year
       | 
       | - similar fees for someone to run your back office, roughly
       | $100,000/year.
       | 
       | And on the other side of expenses you have the money making side
       | of things, which as the OP pointed out isn't great. If you return
       | 10% on the $50M, that is $5M in profits, and you get to keep 20%
       | of that, i.e. $1M.
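       | 
       | Adding up the figures listed above, as a quick sketch. The
       | dollar amounts are the ones in this comment; the dictionary
       | labels and variable names are mine.

```python
aum = 50_000_000
management_fee_rate = 0.01
performance_fee_rate = 0.20

# Annual fixed costs as listed in the comment (USD).
costs = {
    "salaries (founder + one analyst)": 200_000,
    "Bloomberg terminal": 30_000,
    "market and fundamental data": 25_000,
    "office rent": 50_000,
    "outside legal and accounting": 100_000,
    "back office": 100_000,
}

management_fee = aum * management_fee_rate
total_costs = sum(costs.values())
performance_fee = aum * 0.10 * performance_fee_rate  # on a 10% return

print(f"management fee:  ${management_fee:,.0f}")
print(f"fixed costs:     ${total_costs:,.0f}")
print(f"shortfall:       ${total_costs - management_fee:,.0f}")
print(f"performance fee: ${performance_fee:,.0f}")
```

       | The listed costs come to $505,000, so the $500,000 management
       | fee does not quite cover them and everything else has to come
       | out of the performance fee.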
       | 
       | That allows you to bonus out yourself and analysts on good years.
       | If you lose money one year then you get no bonus and have to
       | bonus out the employees out of the retained earnings you kept
       | from previous bonuses.
       | 
       | It usually gets worse, as most funds have what's called a high
       | water mark. This means you don't collect the performance fee
       | until your fund gets back to the high water mark. So if you are
       | down 10% one year you need to make that back before you start to
       | make any performance fee, which is why most funds shut down if
       | they go down more than 20%.
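       | 
       | A simplified sketch of how a high water mark gates the
       | performance fee. It assumes annual crystallization, no
       | management fee, and that the fee is not deducted from NAV; the
       | function name and the return path are hypothetical.

```python
def performance_fees(yearly_returns, start_nav=100.0, fee_rate=0.20):
    """Per-year performance fee with a high-water mark.

    A fee is charged only on gains above the highest NAV reached so far.
    Simplified: no management fee, and the fee itself is not deducted from NAV.
    """
    nav = start_nav
    high_water_mark = start_nav
    fees = []
    for r in yearly_returns:
        nav *= 1.0 + r
        gain_above_mark = max(nav - high_water_mark, 0.0)
        fees.append(fee_rate * gain_above_mark)
        high_water_mark = max(high_water_mark, nav)
    return fees

# Down 10%, then up 10%, then up 10% (hypothetical path).
print([round(f, 2) for f in performance_fees([-0.10, 0.10, 0.10])])
```

       | A 10% loss followed by a 10% gain leaves NAV at 99, still below
       | the mark of 100, so the second year earns no fee despite being
       | a good year.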
       | 
       | As to raising money... Anyone can show a model that makes money.
       | That doesn't mean it's easy to create a model; it's just that
       | there are a lot of people capable of building such a model.
       | 
       | It's the risk management that people with money are really looking
       | for, and sadly that's just really hard to show out of a model, as
       | part of the risk management is things like position sizing and
       | showing your model doesn't pile into one asset class or trade
       | correlated products.
       | 
       | It bodes well for the OP that they talk about market regimes as,
       | IMHO, this is one of the biggest risk management tools that
       | aspiring traders ignore.
       | 
       | And this risk management is why people ask for a track record of
       | more than a year.
        
         | muggermuch wrote:
         | Thank you for this comprehensive response!
         | 
         | I have often found myself struggling to explain the difference
         | between building a strategy or trading system (which reduces to
         | a technical/intellectual challenge) and running a hedge fund
         | (essentially running a complex information-driven business).
         | 
         | Your cost breakdown really puts matters into perspective.
         | 
         | > it bodes well for the OP that they talk about market regimes
         | 
         | I concur. Market regimes (modeling, detecting, reasoning about
         | them) are too delicious of an intellectual puzzle to resist.
        
         | HFguy wrote:
         | This is actually way too optimistic.
         | 
         | Your first 1-2 seed investors will:
         | 
         | - Only pay 1 and 10 (1% fixed fee and 10% of PNL)
         | 
         | - They will also get ownership of the actual fund management
         | firm and will get that in the form of 20% of REVENUE (not
         | equity, revenue, think about that)
         | 
         | This is one reason new fund formation is way down. The
         | economics are bad for years. Know a bunch of HF people that
         | started vc-backed tech firms instead.
         | 
         | The other reason is 10+ year run where stocks, bonds, private
         | firms and real estate just went up. No need for diversifying
         | return streams.
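         | 
         | To put numbers on the seed terms, a sketch that reuses the
         | parent comment's $50M and 10% return and assumes the whole
         | fund is on seed terms. The function and parameter names are
         | hypothetical.

```python
def manager_revenue(aum, gross_return, mgmt_rate, perf_rate, seeder_revenue_share=0.0):
    """Fee revenue the manager keeps after the seeder's revenue share."""
    fees = aum * mgmt_rate + max(aum * gross_return, 0.0) * perf_rate
    return fees * (1.0 - seeder_revenue_share)

aum, gross_return = 50_000_000, 0.10  # figures reused from the parent comment

standard = manager_revenue(aum, gross_return, 0.01, 0.20)
seeded = manager_revenue(aum, gross_return, 0.01, 0.10, seeder_revenue_share=0.20)
print(f"1 and 20, no seeder:     ${standard:,.0f}")
print(f"1 and 10, 20% rev share: ${seeded:,.0f}")
```

         | On these assumptions the manager keeps $800,000 instead of
         | $1,500,000, before paying any of the costs listed above.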
        
           | HFguy wrote:
           | BTW, data costs also too low.
           | 
           | Just a BB terminal around 30k and a lot of extra data from BB
           | costs extra (can be 200-300k per additional product).
           | 
           | For quant strategy probably looking at 500k up to 2M for data
           | initially. And you will likely be at a disadvantage to
           | existing firms that have been collecting data for years.
           | 
           | And that is at the low end. Spent many millions per year for
           | 1 strategy at last large firm. And that was small fraction of
           | total firm spend.
        
         | rmah wrote:
         | Working in the industry, I can confirm that the above numbers
         | are approximately correct except for the employee costs --
         | those are roughly double and up. You also need to hire a fund
         | administrator, auditors and compliance firms (maybe $50k to
         | $100k per year each) which add on even more costs. And you
         | can't skip the lawyers, outside administrator, outside
         | compliance, etc. as they are required by regulations/law.
        
       | prabdude wrote:
       | Excellent article
        
         | muggermuch wrote:
         | Thanks!
        
       | unpwn wrote:
       | Lmao this engine is down 6.9% for the year, when literally it's
       | as simple as just buying some puts.
        
         | Jabbles wrote:
         | You realise that puts have a cost that is determined by the
         | market?
        
         | ramesh31 wrote:
         | > this engine is down 6.9% for the year
         | 
         | That's pretty damn good, and still beating the market by nearly
         | 20%. Of course you can always make more with riskier
         | strategies.
        
       | antognini wrote:
       | Have you considered submitting your predictions to Numerai
       | Signals? It's market neutral so as long as your models can
       | generate some alpha you can still get good returns.
        
         | muggermuch wrote:
         | That's a good idea. I'll try it out, thanks!
        
       ___________________________________________________________________
       (page generated 2022-09-27 23:00 UTC)