[HN Gopher] Jane Street Market Prediction ($100k Kaggle competit...
       ___________________________________________________________________
        
       Jane Street Market Prediction ($100k Kaggle competition)
        
       Author : tosh
       Score  : 169 points
       Date   : 2020-11-24 18:25 UTC
        
 (HTM) web link (www.kaggle.com)
 (TXT) w3m dump (www.kaggle.com)
        
       | usmannk wrote:
       | As a frequent Kaggler (perhaps too frequent... it's a bit
       | addicting, in a way I'm sure others on HN will understand), I was
       | fairly intrigued to see this one pop up in the competition list a
       | few days ago. Finance shops have tried their hand at Kaggle
       | before, but I think they've normally been out of their domain.
       | For example, Two Sigma recently did a reinforcement learning game
       | competition.
       | 
       | I'd caution the HN crowd not to expect production-level quant
       | models out of this, like I'm seeing some doing in the comments
       | already. Kagglers are excellent machine learning practitioners
       | and the models that come out of many competitions are top-notch
       | stuff, often making their way into research papers. But this is a
       | short competition on limited data in a non-real-world scenario.
       | The winning models will be very interesting educational exercises
       | and probably wonderful recruiting material for Jane Street, but
       | won't be the underpinnings of a new fund.
       | 
       | That said, I can't wait to see what comes out of this one. It
       | ticks all of my competitive boxes :)
        
         | riazrizvi wrote:
         | Mathematical analysis of financial markets is more celebrated
         | when applied to relative valuation of different assets, rather
         | than prediction of the market. Black-Scholes, for example,
         | applied calculus with an underlying no-arbitrage assumption to
         | create a thriving market in option pricing, by giving traders a
         | mechanism to reduce risk and thereby reduce bid offer spreads.
         | Same in fixed income, mortgage, and credit market assets over
         | the years.
         | 
         | The problem with predicting absolute levels, is that there is a
         | game theoretic aspect which undermines any mathematical trading
         | strategy as soon as it is public. Optimal game-theoretic trading
         | strategies don't produce great results, and they are relatively
         | trivial to identify. Instead, strong profits in market
         | long/short macro positions are mostly created by information
         | advantages, which don't really make for interesting Kaggle
         | competitions. For example, big profits in macro trading have
         | historically been consistently achieved by front running
         | customer orders, by building timing advantages on top of
         | trading infrastructure, by funding research analysts that
         | inspect operations on the ground, by lobbying for regulations
         | that change market directions and so on.
         | 
         | It's very hard to tell whether a best-performing hedge fund that
         | doesn't have an unfair advantage, and that declares it's only
         | using quantitative strategies, is in fact just a statistical
         | anomaly with a hollow narrative.
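To make the Black-Scholes example concrete, here is a minimal sketch, assuming a European call on a non-dividend-paying stock with constant interest rate and volatility; the function names and inputs are hypothetical, chosen for illustration. It shows the "relative valuation" point: the option is priced from the underlying's spot price and the risk-free rate, and the stock's expected return never appears, so no prediction of the market's direction is needed.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot: float, strike: float, rate: float, vol: float, t: float) -> float:
    """Black-Scholes price of a European call on a non-dividend-paying stock.

    Note that the stock's expected return is not an input: only spot,
    strike, the risk-free rate, volatility and time to expiry matter.
    """
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * t) / (vol * math.sqrt(t))
    d2 = d1 - vol * math.sqrt(t)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * t) * norm_cdf(d2)


def bs_put(spot: float, strike: float, rate: float, vol: float, t: float) -> float:
    """European put from put-call parity: C - P = S - K * exp(-r * T)."""
    return bs_call(spot, strike, rate, vol, t) - spot + strike * math.exp(-rate * t)


if __name__ == "__main__":
    print(round(bs_call(100, 100, 0.05, 0.2, 1.0), 4))
    print(round(bs_put(100, 100, 0.05, 0.2, 1.0), 4))
```

With spot 100, strike 100, a 5% rate, 20% volatility and one year to expiry, the call comes out at about 10.45. The put is derived from the call by a no-arbitrage identity rather than by forecasting anything.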
        
           | hogFeast wrote:
           | Also, prices aren't stationary. For an equity security, you
           | are predicting the price for a company that is compounding
           | capital over time. You can predict the price for the company
           | at one point; the relative valuation for the company (for
           | example, against its peer group) may not change in one year,
           | but that company is investing its capital at X%, so you get
           | price growth.
           | 
           | The reason why relative valuation models are more effective
           | is the same reason why most sports betting models use current
           | odds as an input. Prices contain information but, in my
           | experience, these methods aren't totally effective because
           | they often miss important information about the company
           | itself (big price moves happen because relative valuations
           | are wrong). Value or quality appears to do fundamental work
           | but is often woefully blind (for example, there are proven
           | accounting issues with value strategies...does your average
           | quant understand this? No. Have they ever read a set of
           | accounts? No. They have no hope. None.)
           | 
           | Just imo, I think quant strategies are almost totally
           | worthless beyond liquidity provision (even a strategy like
           | front-running news in FX...humans do this better, and I know
           | people who are still making tons of money doing this). I
           | think there is massive value in that mode of analysis but the
           | people who make the most are always going to be people who
           | know the fundamentals better (I think firms like Marshall
           | Wace that are doing this synthesis will move ahead) because
           | that information is often not in the price at all.
        
           | blovescoffee wrote:
           | I've wanted to start learning about this for a while but I'm
           | really not sure where to start. I have a degree in CS and
           | Math so I'm not a total layman wrt the maths. Do you have any
           | suggestions?
        
             | riazrizvi wrote:
             | What's your goal?
        
           | georgeecollins wrote:
           | This!
           | 
           | >> The problem with predicting absolute levels, is that there
           | is a game theoretic aspect which undermines any mathematical
           | trading strategy as soon as it is public.
           | 
           | I took finance in Business school, coming from doing a lot of
           | statistical analysis in a research lab. I hated my finance
           | professors and their pseudo-science. Pricing formulas work
           | great until they don't. The problem is when they don't, they
           | really don't, in a catastrophic way. Read "When Genius
           | Failed." Real traders know this. But some economists and
           | finance professors act like these mathematical models are
           | describing a predictable physical phenomenon.
        
             | riazrizvi wrote:
             | To clarify, the hedge fund LTCM in "When Genius Failed",
             | collapsed not because it relied on arbitrage 'pricing
             | formulae', rather because it failed to properly execute
             | arbitrage trades.
             | 
             | LTCM, being overly leveraged, relied on other market
             | participants to maintain short-term price alignment, which
             | meant it was not arbitrage. Salomon reduced its role as a
             | market-maker maintaining short-term price alignment, which
             | increased short-term price anomalies and thus increased
             | LTCM's vulnerability. The Asian financial crisis increased
             | the frequency and extent of those pricing anomalies, and
             | the subsequent Russian Default crisis did the same. Margin
             | calls were made on LTCM that it couldn't cover, forcing
             | them to close out of their positions at very unprofitable
             | times of the trade strategy.
             | 
             | So I don't think "pseudo-science" is a great description
             | for what those B-School profs are teaching. Rather the
             | pricing formulae are just the beginnings of the financial
             | theory you need to run arbitrage strategies, but they are
             | not sufficient. You need to augment them with a broader
             | picture of market dynamics and capital management, just
             | like you'd need to learn about financial law, financial
             | market technology, and a bunch of other stuff to run a
             | successful market-making desk.
        
             | milesvp wrote:
             | To add to this, unless your model is situated and can
             | perturb the market, it has no way of knowing what happens
             | when you flex your muscle. I have a friend who did
             | algorithmic trading professionally for a few years, and he
             | said it was amazing to watch the data. Said he could see
             | other bots come along and poke him, trying to look for
             | weaknesses in his algorithm to exploit. I would expect a
             | purely formulaic trader to underperform other traders who
             | can take advantage of others. It's no different from how
             | you have to win a roshambo tournament.
        
         | fractionalhare wrote:
         | Yes, it's overwhelmingly unlikely that the winning model will
         | actually be a competitive trading strategy.
         | 
         | Kaggle encourages a domain agnostic approach to modeling, in
         | the sense that participants use sophisticated machine learning
         | and statistical methods but typically have no domain expertise
         | in the underlying data. This kind of approach to finance has
         | historically performed poorly. [1]
         | 
         | Good quantitative trading is usually backed by a strong
         | fundamental thesis and an _interpretable_ model, which is
         | obtained by cross-pollinating sophisticated math and statistics
         | with domain expertise in some part of finance. That domain
         | expertise might be in different kinds of assets, liquidity or
         | market microstructure, but it's there.
         | 
         | $100k is cheap for Jane Street. If nothing else they have a new
         | recruiting pipeline of people with demonstrable machine
         | learning skills.
         | 
         | ______________
         | 
         | 1. I would also say this is a poor way to approach statistical
         | analysis in _most_ domains, and usually leads to spurious or
         | overfit results. But the idea that you can just run a model and
         | find patterns in pricing data is especially attractive and
         | insidious.
        
           | usmannk wrote:
           | > Kaggle encourages a domain agnostic approach to modeling,
           | in the sense that participants use sophisticated machine
           | learning and statistical methods but typically have no domain
           | expertise in the underlying data.
           | 
           | Yes this is accurate and put very well. This is so much the
           | case that if you have a strong background understanding of
           | the field, the ML part can actually be picked up quite
           | quickly or contributed by someone else. There are a few
           | notable users who are both domain and ML experts and they
           | tend to absolutely clean up in their field. I'm thinking of a
           | couple of med students in particular who are formidable in
           | every medical imaging competition.
        
           | rahimnathwani wrote:
           | "have no domain expertise in the underlying data. This kind
           | of approach to finance has historically performed poorly"
           | 
           | I recently read 'The Man Who Solved the Market', about Jim
           | Simons and Renaissance Technologies. The way the book tells it,
           | looking for patterns without seeking domain expertise (e.g.
           | ignoring fundamental valuation of equities) is exactly what
           | Renaissance did, and it worked out very well.
        
             | fractionalhare wrote:
             | I can see why someone would characterize RenTech that way
             | but it's not really fair to do so. There is a lot of mythos
             | about how Simons hired computer scientists, mathematicians,
             | signal processing and NLP experts, etc. When Mercer came
             | over from IBM, he definitely contributed a significant
             | amount of analytical expertise that was probably
             | nonexistent in financial trading at the time (with the
             | possible exception of the Ed Thorp diaspora). The
             | astrophysicists RenTech hires every year bring new insights
             | in ways to model and understand vast amounts of data with
             | absurd dimensionality.
             | 
             | But all of this has to be utilized in the context of the
             | data. The reality is that you're not going to develop a
             | sophisticated options trading strategy without a strong
             | understanding of what an option (and more generally, a
             | derivative) _is_. You can't develop a viable statistical
             | arbitrage strategy just by treating market microstructure
             | as a blackbox signal to be solved with e.g. Fourier
             | analysis. You can certainly find an edge in using
             | fundamentally superior methods of analysis, but you still
             | need to know what that data represents in the context of
             | the market.
             | 
             | Don't be fooled: people working at firms like RenTech have
             | a strong understanding of the underlying finance. It's just
             | that they learned it on the job, because the ethos at these
             | firms is that learning fundamental theory in math and
             | statistics is harder than learning fundamental theory in
             | finance. You don't have to take my word for it though. Read
             | about one of the few strategies of RenTech's which has been
             | publicized:
             | https://www.bloomberg.com/opinion/articles/2014-07-22/senate....
             | Deutsche and RenTech didn't team up on
             | this strategy (to fantastic success) by treating basket
             | options as some kind of blackbox abstraction devoid of
             | delta, gamma, theta and vega.
        
               | reese_john wrote:
               | How much of their advantage do you think stems from
               | having high-quality proprietary/alternative datasets?
        
               | ciamac wrote:
               | That DB/RenTech basket option is a tax avoidance scheme.
               | It has nothing to do with options pricing and concepts like
               | delta, gamma, etc.
        
               | fractionalhare wrote:
               | Yeah, yeah. That controversy has been litigated on HN a
               | dozen times already, I'm not going to rehash it. Do you
               | dispute my primary point here? If so, why?
               | 
               | (Also, even if I agree it was purely intended for tax
               | avoidance, I don't understand why you think that would
               | obviate having to understand how the options work
               | intimately well).
        
               | nojito wrote:
               | Yup.
               | 
               | https://www.rentec.com/Careers.action?computerProgrammer=tru...
               | 
               | They look for programmers with knowledge of Tax and Risk
               | Management.
        
         | x87678r wrote:
         | They even say in the instructions:
         | 
         | Admittedly, this challenge far oversimplifies the depth of the
         | quantitative problems Jane Streeters work on daily, and Jane
         | Street is happy with the performance of its existing trading
         | model for this particular question.
        
         | Spinnaker_ wrote:
         | We should also remember that Jane Street is primarily an ETF
         | market maker. Their main business isn't betting on prices of
         | stocks or managing a portfolio.
         | 
         | I've only taken a quick look at the data, but the problem
         | doesn't seem to be focused on their core competencies, but
         | instead is much more general.
        
       | npmisdown wrote:
       | I'm sorry, if you could build a model to predict markets, why
       | would you post it to Kaggle to win a $40k prize instead of
       | applying this model to your own broker account?
        
         | tikhonj wrote:
         | It is _much_ harder to turn a model into a profitable trading
         | strategy than people realize. Apart from transaction costs,
         | risk management and market impact there are also a lot of small
         | operational details which can make or break your execution. One
         | example I vaguely recall was that the details of how a specific
         | foreign exchange conducted its closing auction could make a
         | substantial difference to a strategy that involved executing
         | there alongside other trading venues.
         | 
         | The payoff for getting these operational details right or wrong
         | is massively asymmetrical. If you get everything right, you'll
         | only do as well as your model lets you. But if you get anything
         | wrong, you run a real chance of losing far more money than you
         | could have hoped to make!
         | 
         | Even just validating your strategy on historical data (i.e.,
         | backtesting) is harder than it sounds. If you make a mistake
         | that leaks future information to the code you're testing, you
         | can end up with a much rosier return and risk profile than you
         | really have. That's another way to lose money when you put your
         | model into action.
         | 
         | If you get over these challenges and run your strategy
         | successfully for a while, other market participants are going
         | to start adjusting against it and you have to adjust in turn.
         | You can't just "set and forget".
         | 
         | I should note that I am far from an expert on any of this,
         | though! I just know enough to not trade with serious money--my
         | real savings are all in index funds I don't touch, thank you
         | very much :).
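The information-leak problem in backtesting is easy to reproduce. Below is a minimal sketch, assuming a toy long/short rule run on simulated returns that are pure noise, so no honest strategy can have an edge. The leaky version picks each position using the same bar's return it is about to earn; the honest version only uses the previous bar. All function names and parameters are hypothetical.

```python
import random


def simulate_returns(n, seed=0, vol=0.01):
    """Pure-noise daily returns: by construction nothing here is predictable."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, vol) for _ in range(n)]


def sign(x):
    return 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)


def backtest(returns, leak):
    """Toy momentum rule: long if the signal return is positive, short if negative.

    leak=True reads the same bar's return it is about to earn (look-ahead bias);
    leak=False reads only the previous bar's return, which was known in time.
    Returns total P&L in return units, ignoring costs.
    """
    pnl = 0.0
    for t in range(1, len(returns)):
        signal = returns[t] if leak else returns[t - 1]
        pnl += sign(signal) * returns[t]
    return pnl


if __name__ == "__main__":
    rets = simulate_returns(10_000)
    print("leaky :", backtest(rets, leak=True))
    print("honest:", backtest(rets, leak=False))
```

On pure-noise data the honest backtest's P&L hovers around zero, while the leaky one gains on every bar (it earns the absolute value of each return). That is the "much rosier return and risk profile" failure in miniature.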
        
           | hogFeast wrote:
           | I believe what you are referring to is the fix. Foreign
           | exchange markets, that I am aware of, do not have closing
           | auctions.
           | 
           | I have heard of some quants trading foreign exchange markets,
           | agreeing to trade at the fix with their counter-party, and
           | not realising that traders often manipulate the fix resulting
           | in the quant's strategy appearing not to work. It is almost
           | comical (I worked in finance but not in FX, everyone knew
           | this was going on for decades before the SEC started fining
           | people) that someone who managed money was making this error.
           | 
           | You are 100% correct about all the other stuff. Lots of
           | issues with "production"...that is why financial firms employ
           | traders/risk people/etc. Most people who trade themselves
           | tend to go for lower-frequency strategies that they can
           | implement personally. I actually don't think there are huge
           | barriers; smaller investors have a huge advantage (when you
           | trade at scale, the market moves against you) but you have to
           | work with what you have and realise that you will get crushed
           | if you try to replicate what someone with more money is
           | doing.
           | 
           | Also, data. Data is expensive, and a huge fixed cost.
        
             | nstj wrote:
             | A "foreign exchange" not "foreign exchange market"
        
               | hogFeast wrote:
               | Ah, same principle. I have heard of many similar stories.
        
         | madrafi wrote:
         | Well, because quant trading isn't about "import xgboost". You
         | need sustainable infrastructure to handle API failovers, bad
         | data... not even going to mention risk management, which is 50%
         | of what quant trading is about. The data provided is anonymized
         | but would probably be a mix of lagging measurements (moving
         | averages, RSI...) and maybe some flow data. Quant trading isn't
         | really about finding "secret stuff"; most profitable strats you
         | can deploy can be based on stat-arb, basis trading, or even just
         | delta-neutral funding farming and such.
        
         | homie wrote:
         | Mostly because it's impossible to accurately predict the market
         | - and this is just a competition to see who can build the best
         | model.
        
           | thegjp210 wrote:
           | HFT firms aren't trying to predict "the market" as a whole -
           | just small eddies of it. A typical example of this is arbing
           | names at the bottom of index fund rebalances. Speed is
           | important mostly to make sure someone else doesn't hit the
           | arb first.
        
         | mbesto wrote:
         | It's a good question. The basic answer is not everyone has
         | capital and risk, but they may have the time and intellect.
        
       | 1helloworld1 wrote:
       | Isn't it pretty well known in the finance world that using stale
       | public information to predict the market is a fool's errand?
       | 
       | Unless you have some kind of specialized non-public data (e.g.
       | satellite images of the number of cars parked outside malls, the
       | number of cargo ships moving in and out), trying to predict the
       | market with historical data does worse than "Just give me some
       | monkeys, darts and a dart board".
        
         | Tinyyy wrote:
         | That's not necessarily true.
        
         | 2-tpg wrote:
         | Using purely historical price data it is harrowingly difficult.
         | There are 130 anonymized features, so that's unlikely to be
         | only price data. It could include information on the order
         | book, correlated assets, fundamentals, vectorized/embedded
         | text, etc.
         | 
         | Besides, I bet you can train monkeys to do (slightly) better
         | than blindfolded random throwing. Even with public data
         | (replace satellite images with Youtube mentions, or number of
         | links moving into a company website) it is _very_ possible to
         | do better than average guessing on quite a lot of assets
         | (especially smaller and newer markets).
         | 
         | Most hedge funds, even with specialized expensive non-public
         | data, are not magical unicorns. Their quants really may just
         | run a gradient boosting machine and leave it at that. Some
         | hedge funds even prefer linear methods, because this lowers
         | risk through lower variance. Such models _can_ be beaten by
         | experienced Kagglers for sure. For one, I did.
        
           | Traster wrote:
           | One thing we need to be clear about is that you're not aiming
           | to be better than average. You're aiming to make a profit.
           | There are probably hundreds of thousands of day traders,
           | but there are probably <100 market makers and trading firms
           | (far fewer than that for some specific products), and you'll
           | probably find 99% of the day traders aren't making systematic
           | profits. There are lots of strategies that are much better
           | than average and still worse than putting your cash in a
           | bank.
        
             | 2-tpg wrote:
             | You can aim for both. If you just aim for profit, then you
             | can get lucky with just average, or even random, betting.
             | If you find a weighted coinflip (which is not impossible),
             | then, provided you can flip that coin enough times, you will
             | see steady systematic profits. Of course, the majority of day
             | traders are getting owned by the big players, and they
             | would do better doing more reasoned and long-term
             | investments. Most day traders are not even using predictive
             | models though.
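The "weighted coinflip" argument can be made concrete. This is a minimal sketch, assuming each bet wins or loses one unit with a fixed win probability p and ignoring transaction costs. It computes the exact binomial probability of finishing with a positive total after n independent bets, working in log space so large n does not underflow. The function name is hypothetical.

```python
import math


def prob_profit(p, n):
    """Exact probability that n independent +1/-1 bets, each won with
    probability p, end with a strictly positive total (more than n/2 wins).

    Binomial terms are computed in log space via lgamma so large n
    does not underflow.
    """
    log_p, log_q = math.log(p), math.log(1.0 - p)
    total = 0.0
    for k in range(n // 2 + 1, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
    return total


if __name__ == "__main__":
    for n in (10, 100, 1_000, 10_000):
        print(n, round(prob_profit(0.52, n), 4))
```

With p = 0.52, the chance of being ahead is only modestly better than even after 100 bets but approaches 1 after 10,000. A small edge needs many independent repetitions before it looks like steady systematic profit.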
        
               | Traster wrote:
               | On that point - it's pretty clear that this Kaggle
               | competition is highly likely to result in a decent number
               | of submissions that make more money through luck than
               | others make through strategy.
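One way to see this is to generate many strategies with no edge at all and look only at the best one, which is roughly what a leaderboard does. A minimal sketch with hypothetical names, assuming each "submission" is just i.i.d. Gaussian daily P&L with zero mean; it reports the t-statistic of the mean daily P&L for the best performer.

```python
import math
import random


def t_stat(xs):
    """t-statistic of the mean of xs (sample standard deviation)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean / math.sqrt(var / n)


def best_of_random(n_strategies, n_days, seed=0):
    """Best t-statistic among n_strategies strategies with zero true edge."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(n_strategies):
        daily_pnl = [rng.gauss(0.0, 1.0) for _ in range(n_days)]  # mean 0: pure luck
        best = max(best, t_stat(daily_pnl))
    return best


if __name__ == "__main__":
    print(best_of_random(1_000, 250))
```

With 1,000 zero-edge strategies over 250 days, the winner's t-statistic typically lands around 3, a level that would look convincing if you saw it in isolation.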
        
         | minimaxir wrote:
         | Granted, a typical Kaggle metagame-that-is-technically-against-
         | the-rules is to use data from outside the dataset, which is one
         | of the reasons winners have to be validated.
        
           | usmannk wrote:
           | Generally this is allowed if you publish the data you're
           | bringing in. They even create a sponsored thread for it in
           | most competitions.
        
           | justjonathan wrote:
           | From: kaggle.com/c/jane-street-market-prediction/overview/code-requirements
           | 
           | "Freely & publicly available external data is allowed,
           | including pre-trained models"
        
         | blhack wrote:
         | >Unless you have some kind of specialized non-public data (e.g.
         | satellite images of the number of cars parked outside malls,
         | the number of cargo ships moving in and out)
         | 
         | Planet Labs will sell you all of that data, in case people
         | reading along here are curious.
        
         | cultus wrote:
         | I'm sure they've got more/better data in production. There
         | seems to be some arbitrage that can be shaved off the edges for
         | players with innovative enough strategies and good and timely
         | enough data.
        
         | logicslave wrote:
         | There's actually still money to be made in small-scale
         | strategies. Sophisticated funds are running billions. They can't
         | focus on strategies that only work for $100-500k. This is where
         | big returns can be made. Even Warren Buffett will say that if he
         | were only managing $1 million, he would get 100% a year returns.
        
           | xapata wrote:
           | That doesn't make sense unless there are very few viable
           | small scale strategies, at which point they'd probably be
           | difficult to identify. Your assertion might have been true
           | before computers were able to help someone manage many
           | strategies simultaneously.
        
         | smabie wrote:
         | No, not really. You can use public information to give you an
         | edge. And I say this as a person who trades and develops models
         | at a _very_ successful market making firm.
         | 
         | Alternative data no one else can get easily certainly has
         | tremendous value though.
         | 
         | Of course, predicting one or two seconds into the future (my
         | primary concern) is easier than days or years, so there's that.
        
         | hogFeast wrote:
         | None of that information is non-public (you can find cargo ship
         | data online for free), none of it is particularly valuable (you
         | are looking for information, it is hard to know how much
         | information is in cargo ship movement...it depends), and most
         | non-quant hedge funds have been doing stuff like this for
         | decades (i.e. hiring people to stand outside a retailer's
         | stores and count customers)...most of this stuff is less useful
         | than people think (again, you need information, data with
         | intent).
         | 
         | Also, most of this stuff isn't in the price. Lots of people are
         | collecting new data, it is definitely becoming more widespread
         | but the actual synthesis is tricky (most people who are quants
         | do not understand fundamentals, and most fundamental analysts
         | don't understand data...most firms are swirling in a perfect
         | storm of ignorance).
        
       | master_yoda_1 wrote:
       | These are the LeetCode equivalent for data science and quant, so
       | invest time only if you're interested.
        
       | baobabKoodaa wrote:
       | Market prediction with anonymized feature set, sounds like
       | Numerai: https://numer.ai/
        
       | toomuchtodo wrote:
       | Any model superior to what Jane Street is running is worth vastly
       | more than the prizes they're offering.
       | 
       | If you prove such a model out, get licensed (SEC, FINRA) and
       | start soliciting to manage assets.
       | 
       | Disclaimer: Not investment advice. Not a lawyer, not your
       | fiduciary.
        
         | csomar wrote:
         | You are ignoring execution, infrastructure and real-market
         | conditions. The model is just one part of the game.
        
         | georgeek wrote:
         | Such competitions might have two goals in mind: recruiting and
         | signal diversification. The recruiting angle is obvious.
         | 
         | Any alpha that is not fully correlated to existing alpha is
         | worth its weight in gold for an organization with the size,
         | sophistication and complexity of JS. That's part of the reason
         | why efforts such as 2Sigma's Alpha Studio exist:
         | https://alphastudio.com/
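The value of a signal that is not fully correlated with existing ones can be quantified with standard mean-variance arithmetic. A minimal sketch (a hypothetical function, not anything Jane Street or Two Sigma has published), assuming two return streams each scaled to unit volatility with Sharpe ratios s1 and s2 and correlation rho: the blend's expected return is the weighted sum of the Sharpes, and its variance shrinks as rho falls.

```python
import math


def combined_sharpe(s1, s2, rho, w1=0.5, w2=0.5):
    """Sharpe ratio of a weighted blend of two return streams, each scaled to
    unit volatility, with Sharpe ratios s1 and s2 and correlation rho.

    Expected return of the blend is w1*s1 + w2*s2; its variance is
    w1**2 + w2**2 + 2*w1*w2*rho. Assumes that variance is nonzero
    (e.g. rho > -1 for equal weights).
    """
    mean = w1 * s1 + w2 * s2
    var = w1 ** 2 + w2 ** 2 + 2 * w1 * w2 * rho
    return mean / math.sqrt(var)


if __name__ == "__main__":
    for rho in (1.0, 0.5, 0.0):
        print(rho, round(combined_sharpe(1.0, 1.0, rho), 3))
```

Two equally good strategies with zero correlation blend to a Sharpe about 1.41 times either one alone (a factor of sqrt(2)), while perfectly correlated ones add nothing. That gap is why a weakly correlated alpha can be worth more to a large shop than its standalone Sharpe suggests.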
        
         | usmannk wrote:
         | You don't have to (and certainly won't) beat all of Jane
         | Street. The goal is to beat everyone else on Kaggle. A still
         | difficult but much more accomplishable task.
        
           | toomuchtodo wrote:
           | Yeah, I'm arguing to not disclose the model. It's worth far
           | more held close.
           | 
           | If you want to work at Jane Street, go work for Jane Street.
           | If you want to build your own models and run your own shop,
           | the tools exist for you to do that without Jane Street
           | (although there's probably some amount of value learning the
           | ropes there while they pay you, if that's your thing).
           | 
           | My comments in thread are primarily around not having
           | someone's work exploited by sophisticated hedge/prop
           | trading/investment professionals, which I've seen happen more
           | than once, and for which you have no recourse.
        
             | Spinnaker_ wrote:
             | A backtested model is similar to a great startup idea.
             | There is a huge amount of work to be done before it is
             | worth much.
        
             | jjallen wrote:
             | Assuming it passes some absolute measure(s) of quality. You
             | could be the best on Kaggle and still have a mediocre
             | model.
        
             | usmannk wrote:
             | My point is the winning model of the competition will not
             | be worth anything beyond the prize. It will only be the
             | best amongst other kagglers, almost none of whom are domain
             | experts in finance. It will not stand a chance in "prod".
        
         | lostcolony wrote:
         | 'Admittedly, this challenge far oversimplifies the depth of the
         | quantitative problems Jane Streeters work on daily, and Jane
         | Street is happy with the performance of its existing trading
         | model for this particular question'
        
         | nv-vn wrote:
         | >Jane Street has spent decades developing their own trading
         | models and machine learning solutions to identify profitable
         | opportunities and quickly decide whether to execute trades.
         | These models help Jane Street trade thousands of financial
         | products each day across 200 trading venues around the world.
         | 
         | >Admittedly, this challenge far oversimplifies the depth of the
         | quantitative problems Jane Streeters work on daily, and Jane
         | Street is happy with the performance of its existing trading
         | model for this particular question. However, there's nothing
         | like a good puzzle, and this challenge will hopefully serve as
         | a fun introduction to a type of data science problem that a
         | Jane Streeter might tackle on a daily basis.
         | 
         | Sounds like it's just for fun/recruiting rather than trying to
         | crowdsource new strategies -- I'm sure if they were looking to
         | crowdsource strats they'd pay a whole lot more than $40k for
         | first place.
        
           | tcbawo wrote:
           | This contest seems like the equivalent of the "inventor's
           | hotline" infomercial. If it identifies one promising new
           | approach that they can iterate on, it has probably paid for
           | itself. It also serves as a good PR and recruiting tool. The
           | prize is probably designed to bring in clever non-
           | professionals. It's a win-win for Jane Street.
        
             | Traster wrote:
             | If someone has a good idea, you don't want the idea, you
             | want the person. If you take the idea, at best you'll split
             | the market with the person who had the idea. At worst
             | they'll iterate and you'll get nothing. Far better to find
             | people who have the skills to develop an idea.
             | 
             | Having said that you also want to find the (vastly more in
             | number) people who can take someone else's idea and
             | actually implement it.
        
             | amznthrwaway wrote:
             | I sincerely doubt they think they'll get actionable ideas.
             | It seems like a fun recruiting play from a company that
             | takes pride in hiring non-traditional talent.
        
             | elil17 wrote:
             | The inventor's hotline infomercial is a scam where they get
             | you to pay for expensive patent filing, consulting, and
             | marketing packages. They never intend to actually use any
             | of the inventions.
        
           | b20000 wrote:
           | why would they need to pay more? the nature of smart people
           | is to undervalue themselves and to not negotiate, so as long
           | as smart people keep doing that other people can take
           | advantage of that.
        
         | heipei wrote:
         | True, and I don't think they expect models that are superior to
         | their own; I'd look at this as a hiring/marketing tool. Plus,
         | even if you had a model that from a pure engineering standpoint
         | was able to match Jane Street's approach, the model would not
         | work without the wealth of proprietary (and expensive) data
         | sources that Jane Street is sure to ingest, so you still
         | couldn't just go out and do it yourself without some serious
         | upfront investment first to get the same data. That is assuming
         | all data they use is even available commercially, which I doubt
         | as well. There are probably data sources that only become
         | available to you through personal relationships with the right
         | folks at the right places.
        
         | uponcoffee wrote:
         | To me, this seems more like a funnel for recruiting
        
           | renewiltord wrote:
           | With an engineer phone screen and three on-site interviews,
           | that's 4 hours of engineer time. $150/hr compensation per
           | engineer, so cost is roughly 2x, $300/hr. So $1.2k to run a
           | candidate through the pipeline post-initial-qualification.
           | 
           | To get one candidate and come out superior, acceptance rate
           | should be 1% (i.e. 99 failures). But if there are 50 leads
           | from the program, and you convert a fifth, that's 10
           | candidates for a cost / successful recruit of $10k which
           | means you have 10% acceptance rate to break even.
           | 
           | Hmm, back of the envelope seems to do all right as a strat.
           | Relatively cheap. I recall the last time we were hiring, we
           | projected cost per hire at $35k with the bulk of that
           | actually being the recruiter referral fee.
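           | A sketch of the arithmetic above in Python, using the
           | comment's own estimates (4 interview hours, $150/hr, a 2x
           | loaded-cost multiplier, 50 leads converting one in five)
           | plus a $100k prize pool taken from the competition title.
           | The helper names are illustrative, and the acceptance rate
           | is left as a parameter because the comment does not pin
           | down a single value.

```python
# Back-of-envelope recruiting cost, reproducing the figures in the comment.
# All inputs are the commenter's estimates (prize pool from the thread
# title); none of them are verified numbers.

INTERVIEW_HOURS = 4      # phone screen + three on-site interviews
HOURLY_COMP = 150        # USD per engineer-hour
LOADED_MULTIPLIER = 2    # "cost is roughly 2x" compensation
PRIZE_POOL = 100_000     # USD, from the competition title
LEADS = 50               # leads generated by the competition
CONVERT_ONE_IN = 5       # "convert a fifth" of leads into candidates


def interview_cost_per_candidate() -> int:
    """Loaded cost of engineer time to interview one candidate."""
    return INTERVIEW_HOURS * HOURLY_COMP * LOADED_MULTIPLIER


def candidates_from_program() -> int:
    """Number of leads expected to become interviewed candidates."""
    return LEADS // CONVERT_ONE_IN


def prize_cost_per_candidate() -> float:
    """Prize pool spread evenly over the candidates it produced."""
    return PRIZE_POOL / candidates_from_program()


def cost_per_hire(acceptance_rate: float) -> float:
    """Prize share plus interview cost, divided by the fraction hired.

    acceptance_rate is a free parameter (0 < rate <= 1); the comment
    does not fix a single value for it.
    """
    per_candidate = prize_cost_per_candidate() + interview_cost_per_candidate()
    return per_candidate / acceptance_rate


print(interview_cost_per_candidate())  # 1200
print(candidates_from_program())       # 10
print(prize_cost_per_candidate())      # 10000.0
```

           | Plugging in different acceptance rates, e.g.
           | cost_per_hire(0.1), shows how much the per-hire figure
           | depends on that one unknown.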
        
             | aaronblohowiak wrote:
             | I think you are significantly underestimating the cost per
             | hour of Jane Street employees.
        
               | renewiltord wrote:
               | I have first party information, two years out of date. Do
               | you have contradictory first party information? Yes/no
               | will be sufficient for me to adjust my priors and I will
               | be grateful.
        
               | throwaway378692 wrote:
               | Package for just-graduated SWEs is about $400k.
        
               | renewiltord wrote:
               | Thank you.
        
               | theptip wrote:
               | The correct metric is not what the employee's salary is,
               | but the opportunity cost of their time. If all your
               | engineers are working on urgent stuff, and each engineer
               | adds in $1-2m of revenue a year, then the cost to your
               | business of taking them off feature work to do recruiting
               | is not $150/hr.
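               | A quick sketch that puts the comment's revenue range on
               | the same per-hour footing as the $150/hr salary figure.
               | It assumes roughly 2,000 working hours per engineer per
               | year (50 weeks x 40 hours); that figure is not from the
               | thread. The 4 interview hours come from the estimate
               | upthread.

```python
# Converts annual revenue per engineer into an hourly opportunity cost.
# ASSUMPTION: ~2,000 working hours per year (50 weeks x 40 hours); this
# figure is not from the thread.
WORKING_HOURS_PER_YEAR = 50 * 40
INTERVIEW_HOURS = 4  # per-candidate interview time estimated upthread


def revenue_per_hour(annual_revenue: float) -> float:
    return annual_revenue / WORKING_HOURS_PER_YEAR


for annual in (1_000_000, 2_000_000):  # the "$1-2m of revenue a year" range
    hourly = revenue_per_hour(annual)
    per_candidate = hourly * INTERVIEW_HOURS
    print(f"${annual:,}/yr -> ${hourly:,.0f}/hr -> ${per_candidate:,.0f} per candidate")

# $1,000,000/yr -> $500/hr -> $2,000 per candidate
# $2,000,000/yr -> $1,000/hr -> $4,000 per candidate
```

               | Under that assumption the hourly figure lands between
               | $500 and $1,000, versus the $150/hr used upthread.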
        
               | renewiltord wrote:
               | Yes, of course. Reasoning for not using that is as
               | follows: if conservative estimates yield a yes, you don't
               | need to assume more.
               | 
               | I know salary (2 years out of date). I don't know the
               | opportunity cost.
        
               | kgwgk wrote:
               | Disclaimer: I have no first hand info.
               | 
               | According to
               | https://news.efinancialcareers.com/ch-en/307393/jane-street-...
               | "Last year, Jane Street's graduate hires straight from
               | college were said to be paid a $200k annual base salary,
               | plus a $100k sign-on bonus, plus a $100k-$150k guaranteed
               | performance bonus."
               | 
               | According to random people on Reddit
               | https://www.reddit.com/r/cscareerquestions/comments/69k0ap/d...
               | "somebody said they got an offer from JS for $150k +
               | $50k/yr "performance"-based bonus"
               | 
               | Both may be true.
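               | Adding up the first-year components in the two reports,
               | taken at face value (figures in thousands of USD). The
               | sign-on bonus is one-time, so later years would differ.

```python
# First-year totals implied by the two secondhand reports quoted above,
# in thousands of USD. Nothing here is verified compensation data.
base, sign_on = 200, 100
bonus_low, bonus_high = 100, 150
efc_low = base + sign_on + bonus_low    # eFinancialCareers, low end
efc_high = base + sign_on + bonus_high  # eFinancialCareers, high end
reddit = 150 + 50                       # Reddit: base + "performance" bonus

print(efc_low, efc_high, reddit)  # 400 450 200
```

               | The first report's low end matches the roughly $400k
               | package mentioned upthread.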
        
               | renewiltord wrote:
               | Thank you.
        
         | paxys wrote:
         | Why do you think this competition will result in a model
         | superior to what Jane Street is running?
        
           | 2-tpg wrote:
           | The winning model will likely outperform anything that Jane
           | Street could come up with for this 130-feature set. With 3000+
           | competitors, the top 10 will likely be superior to what Jane
           | Street can do in-house. Then an ensemble of the top solutions
           | will be the best possible model anyone can come up with.
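           | A minimal sketch of the simplest kind of ensemble: an
           | equal-weight average of several models' predicted
           | probabilities, thresholded into a binary act/don't-act
           | decision. The three models' outputs and the 0.5 threshold
           | are made up for illustration, and the binary-decision
           | framing is an assumption; strong Kaggle teams often use
           | weighted blends or stacking instead.

```python
from statistics import fmean


def average_ensemble(prob_lists):
    """Equal-weight average of per-row probabilities from several models."""
    return [fmean(row_probs) for row_probs in zip(*prob_lists)]


def to_actions(probs, threshold=0.5):
    """Turn blended probabilities into 1 (act) / 0 (don't act) decisions."""
    return [1 if p > threshold else 0 for p in probs]


# Hypothetical predictions from three models for the same three rows.
model_a = [0.62, 0.40, 0.55]
model_b = [0.58, 0.45, 0.35]
model_c = [0.70, 0.52, 0.48]

blend = average_ensemble([model_a, model_b, model_c])
print(to_actions(blend))  # [1, 0, 0]
```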
        
         | ACow_Adonis wrote:
         | That's a general problem with a lot of Kaggle-esque comps. But
         | then, I don't discount the number of unemployed or
         | intellectually curious, very intelligent people out there,
         | even if they're doing the equivalent of pushing down against
         | the value of my wage/bargaining power and doing the data
         | science equivalent of working for reputation/recruitment.
         | 
         | hell, the thing about us hackers is I can KNOW it's a dud deal,
         | yet part of me still wants to give it a go because it's a
         | problem and it's right there!
        
         | ampdepolymerase wrote:
         | Quantopian tried this. Didn't work out. Now they have been
         | acquihired by Robinhood.
        
         | huac wrote:
         | part of what makes trading hard, and especially quantitative
         | trading, is the necessary infrastructure. not just the obvious
         | stuff like execution but also the infra to manage backtesting,
         | risk sizing and management, etc. big firms offer this and lots
         | of data and allow researchers to focus on the small parts where
         | they can become subject domain experts.
        
       | basicneo wrote:
       | By a similar argument to https://danluu.com/sounds-easy/ , no one
       | will beat Jane Street in a weekend.
       | 
       | Jane Street's hiring standards exceed FAANG's.
       | 
       | This is a hiring/branding strategy. Good luck to them.
        
         | 2-tpg wrote:
         | I actually give it 48 hours before the top 3 equal what Jane
         | Street can do in-house on this exact same dataset. A week
         | before the reasonable plateau is reached, and a month or so
         | before the absolute most information is squeezed out.
        
       | Goosee wrote:
       | Since this is a competition, does someone mind explaining the
       | reason for people to publicly post notebooks in the "Notebooks"
       | section?
       | 
       | Seems counter-intuitive to provide competitors with free
       | information, unless you are trying to throw them off.
        
         | minkzilla wrote:
         | Not a frequent Kaggle user, but from what I've seen, the ones
         | posted in notebooks are baseline examples: things that anyone
         | competitive enough to win has probably already thought of and
         | dismissed, or could implement in half an hour and iterate on
         | from there.
        
           | mjn wrote:
           | As a current example: the highest-scoring entry that has an
           | associated notebook at the moment is basically a clean
           | example of how to apply XGBoost [1] to this dataset. XGBoost
           | ends up being tried in nearly every Kaggle competition, so
           | the person isn't giving away many secrets there.
           | 
           | [1] https://xgboost.readthedocs.io/en/latest/
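           | To show roughly what boosting does without depending on the
           | library, here is a toy gradient-boosting classifier in plain
           | Python: depth-1 trees ("stumps") fitted one after another to
           | the negative gradient of the log loss. This is a swapped-in
           | illustration of the general technique, not XGBoost itself
           | (XGBoost adds second-order leaf weights, regularization,
           | deeper trees and much more engineering), and the six-row
           | dataset is invented; it does not use the competition's data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Pick the (feature, threshold) split whose two constant leaves best
    fit the residuals in squared error. Returns (feature, threshold,
    left_value, right_value)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lv = sum(left) / len(left)
            rv = sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    if best is None:  # every feature is constant: fall back to a no-op stump
        return 0, math.inf, 0.0, 0.0
    return best[1:]


def stump_predict(stump, row):
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv


def fit_boosted(X, y, n_rounds=20, learning_rate=0.3):
    """Additively fit stumps to y - p, the negative gradient of log loss
    with respect to each row's raw log-odds score."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return stumps, learning_rate


def predict_proba(model, row):
    stumps, learning_rate = model
    return sigmoid(sum(learning_rate * stump_predict(s, row) for s in stumps))


# Invented toy data: the label is 1 when the first feature exceeds 0.5.
X = [[0.1, 0.9], [0.2, 0.1], [0.4, 0.5], [0.6, 0.3], [0.8, 0.7], [0.9, 0.2]]
y = [0, 0, 0, 1, 1, 1]
model = fit_boosted(X, y)
print([round(predict_proba(model, row)) for row in X])  # [0, 0, 0, 1, 1, 1]
```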
        
         | 2-tpg wrote:
         | Posting notebooks can get you upvotes, which contribute towards
         | becoming a Kaggle (Grand)Master. It is also a good way to "win"
         | some attention and goodwill, without spending months trying to
         | actually win the competition itself. Publishing Notebooks also
         | helps you improve your coding/presentation skills, for a
         | popular notebook needs to be useful for a wide audience (or
         | fairly competitive).
         | 
         | The best techniques, certainly coming from teams, are hardly
         | ever published as Notebooks. But yes, many winning teams will
         | eventually incorporate some of the information in the
         | Notebooks, if only to hedge against the others doing the same.
        
       | x87678r wrote:
       | Jane Street is big in OCaml and CompSci worlds but the only guy I
       | know who worked there did ETF arbitrage/redemption/creation which
       | has to be one of the most boring businesses on the street. Is it
       | worth working there (aside from the salaries)?
        
         | nvarsj wrote:
         | It depends what you're looking for.
         | 
         | Best-in-class engineering and internet-scale problems? Nope,
         | you aren't going to find that at any hedge fund. They are much
         | more like small startup cultures. Speed and results are
         | favored over a mature engineering culture and maintainable
         | code.
         | 
         | Want to have the potential to make a large direct impact and
         | make a crap load of money? Well then, a hedge fund may be a
         | good fit.
        
       | b20000 wrote:
       | I never heard of Jane Street. I visited the website and saw a
       | bunch of people crammed into an open office like chickens in a
       | breeding factory. That is not attractive to me. Competitions like
       | this just seem to me like a cheap way to outsource problem
       | solving without having to pay anyone. Like bringing someone in
       | for an interview, letting them solve your problem, and then
       | telling them they are not a good fit while profiting off their
       | work. Same idea, different package.
        
       | ArtWomb wrote:
       | So, if I parsed the training data correctly, one's output
       | algorithm is completely agnostic to any actual market conditions.
       | It's merely learned on the anonymized feature set of 130
       | variables, making it qualitatively no different than any other
       | abstract ML forecasting problem. There are no considerations of
       | market microstructure, news-driven events, leverage, etc.?
        
         | mawise wrote:
         | In their "code requirements" section they say:
         | 
         | > Freely & publicly available external data is allowed
         | 
         | So I presume it would be fair to fetch and leverage additional
         | data on your own.
        
       | yodsanklai wrote:
       | Just curious, how well-known is Jane Street outside of the OCaml
       | world?
        
         | nwsm wrote:
         | Often mentioned on Blind as one of the places top engineers can
         | be paid handsomely.
        
         | reducesuffering wrote:
         | Well known to Ivy-League undergrads and FAANG engineers for
         | having some of the highest paying jobs available.
        
         | smabie wrote:
         | It's well known as a top-tier market-making firm. They're not
         | as special as they are made out to be, though. They just have
         | good marketing.
        
       | blatchcorn wrote:
       | Just give me some monkeys, darts and a dart board
        
       | bigdict wrote:
       | This would be much more interesting if the features weren't
       | anonymized.
       | 
       | At this point this is just a widest/deepest neural net
       | competition on some unknown bunch of features.
        
         | rrjjww wrote:
         | I was just about to post the same thing. I'm a statistician in
         | my day job and the raw math is only one part of building a
         | model. Human judgement (while often flawed) can be key to
         | improving actual model performance.
        
       | arcticbull wrote:
       | Stonks only go up what more do you need to know?
        
       | bklyn11201 wrote:
       | Is this just a hiring competition under a different name? If
       | you can beat their model, obviously you deserve a 100k signing
       | bonus and a very generous compensation package.
        
         | sjg007 wrote:
         | Doing well on Kaggle is basically a resume builder and probably
         | would get you into jobs you wouldn't otherwise get. Not much
         | different than topcoder in that sense. Some people can make a
         | living just winning competitions.
        
         | usmannk wrote:
         | You don't beat their model, just those of everyone else in the
         | competition. A very different game.
        
         | hnracer wrote:
         | You can't beat their model because they don't include all the
         | data they use in the competition dataset. It's a highly
         | sanitized and simplified toy problem used for hiring and
         | marketing. They don't even tell you what the features are so
         | it's impossible to use domain expertise to constrain the
         | fitting problem (this is a critical component to building
         | profitable trading models because of the signal-to-noise ratio
         | in the data).
        
       | lwigo wrote:
       | What's the over/under on someone entering the inverse of what
       | WSB does?
        
         | x87678r wrote:
         | Pump in 2020, dump in 2021.
        
       | afrojack123 wrote:
       | Nothing like free code monkey data scientists.
        
       ___________________________________________________________________
       (page generated 2020-11-24 23:00 UTC)