[HN Gopher] Jane Street Market Prediction ($100k Kaggle competition) Author : tosh Score : 169 points Date : 2020-11-24 18:25 UTC (4 hours ago) (HTM) web link (www.kaggle.com) (TXT) w3m dump (www.kaggle.com) | usmannk wrote: | As a frequent Kaggler (perhaps too frequent... it's a bit | addicting, in a way I'm sure others on HN will understand), I was | fairly intrigued to see this one pop up in the competition list a | few days ago. Finance shops have tried their hand at Kaggle | before, but I think they've normally been out of their domain, | e.g. Two Sigma recently did a reinforcement learning game | competition. | | I'd caution the HN crowd not to expect production-level quant | models out of this, as I'm seeing some do in the comments | already. Kagglers are excellent machine learning practitioners | and the models that come out of many competitions are top-notch | stuff, often making their way into research papers. But this is a | short competition on limited data in a non-real-world scenario. | The winning models will be very interesting educational exercises | and probably wonderful recruiting material for Jane Street, but | won't be the underpinnings of a new fund. | | That said, I can't wait to see what comes out of this one. It | ticks all of my competitive boxes :) | riazrizvi wrote: | Mathematical analysis of financial markets is more celebrated | when applied to relative valuation of different assets, rather | than prediction of the market. Black-Scholes, for example, | applied calculus with an underlying no-arbitrage assumption to | create a thriving market in option pricing, by giving traders a | mechanism to reduce risk and thereby reduce bid-offer spreads. | The same has happened in fixed income, mortgage, and credit market assets over | the years. 
| | The problem with predicting absolute levels, is that there is a | game theoretic aspect which undermines any mathematical trading | strategy as soon as it is public. Optimal game theory trading | strategies don't produce great results, and they are relatively | trivial to identify. Instead, strong profits in market | long/short macro positions are mostly created by information | advantages, which don't really make for interesting Kaggle | competitions. For example, big profits in macro trading have | historically been consistently achieved by front running | customer orders, by building timing advantages on top of | trading infrastructure, by funding research analysts that | inspect operations on the ground, by lobbying for regulations | that change market directions and so on. | | It's very hard to tell if a best-performing hedge fund that | doesn't have an unfair advantage, and that declares it's only using | quantitative strategies, is in fact just a statistical anomaly | with a hollow narrative. | hogFeast wrote: | Also, prices aren't stationary. For an equity security, you | are predicting the price for a company that is compounding | capital over time. You can predict the price for the company | at one point, and the relative valuation for the company (for | example, against its peer group) may not change in one year, but | that company is investing its capital at X% so you get | price growth. | | The reason why relative valuation models are more effective | is the same reason why most sports betting models use current | odds as an input. Prices contain information but, in my | experience, these methods aren't totally effective because | they often miss important information about the company | itself (big price moves happen because relative valuations | are wrong). Value or quality appears to do fundamental work | but is often woefully blind (for example, there are proven | accounting issues with value strategies...does your average | quant understand this? No. 
Have they ever read a set of | accounts? No. They have no hope. None.) | | Just imo, I think quant strategies are almost totally | worthless beyond liquidity provision (even a strategy like | front-running news in FX...humans do this better, and I know | people who are still making tons of money doing this). I | think there is massive value in that mode of analysis but the | people who make the most are always going to be people who | know the fundamentals better (I think firms like Marshall | Wace that are doing this synthesis will move ahead) because | that information is often not in the price at all. | blovescoffee wrote: | I've wanted to start learning about this for a while but I'm | really not sure where to start. I have a degree in CS and | Math so I'm not a total layman wrt the maths. Do you have any | suggestions? | riazrizvi wrote: | What's your goal? | georgeecollins wrote: | This! | | >> The problem with predicting absolute levels, is that there | is a game theoretic aspect which undermines any mathematical | trading strategy as soon as it is public. | | I took finance in business school, coming from doing a lot of | statistical analysis in a research lab. I hated my finance | professors and their pseudoscience. Pricing formulas work | great until they don't. The problem is when they don't, they | really don't, in a catastrophic way. Read "When Genius | Failed." Real traders know this. But some economists and | finance professors act like these mathematical models are | describing a predictable physical phenomenon. | riazrizvi wrote: | To clarify, the hedge fund LTCM in "When Genius Failed" | collapsed not because it relied on arbitrage 'pricing | formulae', but rather because it failed to properly execute | arbitrage trades. | | LTCM, in being overly leveraged, relied on other market | participants to maintain short term price alignment, which | meant it was not arbitrage. 
Salomon reduced its role as a | market-maker maintaining short term price alignment, which | increased short term price anomalies, and thus increased | LTCM's vulnerability. The Asian financial crisis increased | the frequency and extent of those pricing anomalies, and | the subsequent Russian Default crisis did the same. Margin | calls were made on LTCM that it couldn't cover, forcing | them to close out of their positions at very unprofitable | times of the trade strategy. | | So I don't think "pseudo-science" is a great description | for what those B-School profs are teaching. Rather the | pricing formulae are just the beginnings of the financial | theory you need to run arbitrage strategies, but they are | not sufficient. You need to augment them with a broader | picture of market dynamics and capital management, just | like you'd need to learn about financial law, financial | market technology, and a bunch of other stuff to run a | successful market-making desk. | milesvp wrote: | To add to this, unless your model is situated, and can | perturb the market, it has no way of knowing what happens | when you flex your muscle. I have a friend who did | algorithmic trading professionally for a few years, and he | said it was amazing to watch the data. He said he could see | other bots come along and poke him, trying to look for | weaknesses in his algorithm to exploit. I would expect a | purely formulaic trader to underperform other traders who | can take advantage of others. It's no different than how | you have to win a roshambo tournament. | fractionalhare wrote: | Yes, it's overwhelmingly unlikely that the winning model will | actually be a competitive trading strategy. | | Kaggle encourages a domain agnostic approach to modeling, in | the sense that participants use sophisticated machine learning | and statistical methods but typically have no domain expertise | in the underlying data. This kind of approach to finance has | historically performed poorly. 
[1] | | Good quantitative trading is usually backed by a strong | fundamental thesis and an _interpretable_ model, which is | obtained by cross-pollinating sophisticated math and statistics | with domain expertise in some part of finance. That domain | expertise might be in different kinds of assets, liquidity or | market microstructure, but it's there. | | $100k is cheap for Jane Street. If nothing else they have a new | recruiting pipeline of people with demonstrable machine | learning skills. | | ______________ | | 1. I would also say this is a poor way to approach statistical | analysis in _most_ domains, and usually leads to spurious or | overfit results. But the idea that you can just run a model and | find patterns in pricing data is especially attractive and | insidious. | usmannk wrote: | > Kaggle encourages a domain agnostic approach to modeling, | in the sense that participants use sophisticated machine | learning and statistical methods but typically have no domain | expertise in the underlying data. | | Yes this is accurate and put very well. This is so much the | case that if you have a strong background understanding of | the field, the ML part can actually be picked up quite | quickly or contributed by someone else. There are a few | notable users who are both domain and ML experts and they | tend to absolutely clean up in their field. I'm thinking of a | couple of med students in particular who are formidable in | every medical imaging competition. | rahimnathwani wrote: | "have no domain expertise in the underlying data. This kind | of approach to finance has historically performed poorly" | | I recently read 'The Man Who Solved the Market', about Jim | Simons and Renaissance Technologies. The way the book tells it, | looking for patterns without seeking domain expertise (e.g. | ignoring fundamental valuation of equities) is exactly what | Renaissance did, and it worked out very well. 
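The footnote above about "spurious or overfit results" can be made concrete with a small sketch. It is an illustration under stated assumptions, not anything from the competition: every feature and the target are pure Gaussian noise, the count of 130 features only echoes the number mentioned elsewhere in this thread, and the sample sizes, seed, and function names are made up for the example. It picks whichever noise feature correlates best with the target in-sample and then re-checks that same feature on held-out data.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_noise_feature(rng, n_features=130, n_train=50, n_test=50):
    """Pick the noise feature that looks best in-sample, then re-check it.

    Returns (abs in-sample correlation, abs out-of-sample correlation).
    """
    n = n_train + n_test
    target = [rng.gauss(0, 1) for _ in range(n)]
    features = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_features)]
    in_sample = [pearson(f[:n_train], target[:n_train]) for f in features]
    best = max(range(n_features), key=lambda i: abs(in_sample[i]))
    out_of_sample = pearson(features[best][n_train:], target[n_train:])
    return abs(in_sample[best]), abs(out_of_sample)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
trials = [best_noise_feature(rng) for _ in range(20)]
avg_in = sum(t[0] for t in trials) / len(trials)
avg_out = sum(t[1] for t in trials) / len(trials)
print(f"avg |corr| of the 'best' feature in-sample:     {avg_in:.2f}")
print(f"avg |corr| of the same feature out-of-sample: {avg_out:.2f}")
```

The selected feature looks meaningfully correlated in-sample purely because it was the best of many tries; on held-out data it falls back to noise level. That selection effect is one reason a pattern found by searching without a thesis deserves suspicion.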
| fractionalhare wrote: | I can see why someone would characterize RenTech that way | but it's not really fair to do so. There is a lot of mythos | about how Simons hired computer scientists, mathematicians, | signal processing and NLP experts, etc. When Mercer came | over from IBM, he definitely contributed a significant | amount of analytical expertise that was probably | nonexistent in financial trading at the time (with the | possible exception of the Ed Thorp diaspora). The | astrophysicists RenTech hires every year bring new insights | in ways to model and understand vast amounts of data with | absurd dimensionality. | | But all of this has to be utilized in the context of the | data. The reality is that you're not going to develop a | sophisticated options trading strategy without a strong | understanding of what an option (and more generally, a | derivative) _is_. You can't develop a viable statistical | arbitrage strategy just by treating market microstructure | as a blackbox signal to be solved with e.g. Fourier | analysis. You can certainly find an edge in using | fundamentally superior methods of analysis, but you still | need to know what that data represents in the context of | the market. | | Don't be fooled: people working at firms like RenTech have | a strong understanding of the underlying finance. It's just | that they learned it on the job, because the ethos at these | firms is that learning fundamental theory in math and | statistics is harder than learning fundamental theory in | finance. You don't have to take my word for it though. Read | about one of the few strategies of RenTech's which has been | publicized: https://www.bloomberg.com/opinion/articles/2014-07-22/senate.... Deutsche and RenTech didn't team up on | this strategy (to fantastic success) by treating basket | options as some kind of blackbox abstraction devoid of | delta, gamma, theta and vega. 
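For readers who have not met delta, gamma, theta and vega before, here is a minimal sketch of what those quantities are for the simplest possible case: a plain European call with no dividends under textbook Black-Scholes assumptions. It is not a model of the basket options discussed above, and the input numbers and function names are purely illustrative.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def call_price_and_greeks(spot, strike, rate, vol, years):
    """Black-Scholes price and Greeks of a European call (no dividends).

    theta is per year of calendar time passing; vega is per 1.00 (100 vol
    points) change in volatility.
    """
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    discount = math.exp(-rate * years)
    return {
        "price": spot * norm_cdf(d1) - strike * discount * norm_cdf(d2),
        "delta": norm_cdf(d1),                            # sensitivity to spot
        "gamma": norm_pdf(d1) / (spot * vol * sqrt_t),    # sensitivity of delta to spot
        "theta": (-spot * norm_pdf(d1) * vol / (2.0 * sqrt_t)
                  - rate * strike * discount * norm_cdf(d2)),  # time decay
        "vega": spot * norm_pdf(d1) * sqrt_t,             # sensitivity to volatility
    }


# Hypothetical at-the-money one-year call, chosen only for illustration.
greeks = call_price_and_greeks(spot=100.0, strike=100.0, rate=0.05, vol=0.2, years=1.0)
for name, value in greeks.items():
    print(f"{name:>5}: {value:.4f}")
```

Each Greek is just a partial derivative of the price with respect to one input, which is why a strategy built on options cannot treat the instrument as an opaque signal: the position's risk changes as spot, time and volatility move.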
| reese_john wrote: | How much of their advantage do you think stems from | having high quality proprietary/alternative datasets? | ciamac wrote: | That DB/RenTech basket option is a tax avoidance scheme. | It has nothing to do with options pricing and concepts like | delta, gamma, etc. | fractionalhare wrote: | Yeah, yeah. That controversy has been litigated on HN a | dozen times already, I'm not going to rehash it. Do you | dispute my primary point here? If so, why? | | (Also, even if I agree it was purely intended for tax | avoidance, I don't understand why you think that would | obviate having to understand how the options work | intimately well). | nojito wrote: | Yup. | | https://www.rentec.com/Careers.action?computerProgrammer=tru... | | They look for programmers with knowledge of Tax and Risk | Management. | x87678r wrote: | They even say in the instructions: | | Admittedly, this challenge far oversimplifies the depth of the | quantitative problems Jane Streeters work on daily, and Jane | Street is happy with the performance of its existing trading | model for this particular question. | Spinnaker_ wrote: | We should also remember that Jane Street is primarily an ETF | market maker. Their main business isn't betting on prices of | stocks or managing a portfolio. | | I've only taken a quick look at the data, but the problem | doesn't seem to be focused on their core competencies, but | instead is much more general. | npmisdown wrote: | I'm sorry, if you could build a model to predict markets, why | would you post it to Kaggle to get $40k in prize money instead of | applying this model to your own brokerage account? | tikhonj wrote: | It is _much_ harder to turn a model into a profitable trading | strategy than people realize. Apart from transaction costs, | risk management and market impact there are also a lot of small | operational details which can make or break your execution. 
One | example I vaguely recall was that the details of how a specific | foreign exchange conducted its closing auction could make a | substantial difference to a strategy that involved executing | there alongside other trading venues. | | The payoff for getting these operational details right or wrong | is massively asymmetrical. If you get everything right, you'll | only do as well as your model lets you. But if you get anything | wrong, you run a real chance of losing far more money than you | could have hoped to make! | | Even just validating your strategy on historical data (i.e. back-testing) | is harder than it sounds. If you make a mistake that | leaks information to the code you're testing, you can end up | with a much rosier return and risk profile than you really | have. That is another way to lose money when you put your model into | action. | | If you get over these challenges and run your strategy | successfully for a while, other market participants are going | to start adjusting against it and you have to adjust in turn. | You can't just "set and forget". | | I should note that I am far from an expert on any of this, | though! I just know enough to not trade with serious money--my | real savings are all in index funds I don't touch, thank you | very much :). | hogFeast wrote: | I believe what you are referring to is the fix. Foreign | exchange markets, that I am aware of, do not have closing | auctions. | | I have heard of some quants trading foreign exchange markets, | agreeing to trade at the fix with their counter-party, and | not realising that traders often manipulate the fix resulting | in the quant's strategy appearing not to work. It is almost | comical (I worked in finance but not in FX, everyone knew | this was going on for decades before the SEC started fining | people) that someone who managed money was making this error. | | You are 100% correct about all the other stuff. 
Lots of | issues with "production"...that is why financial firms employ | traders/risk people/etc. Most people who trade themselves | tend to go for lower-frequency strategies that they can | implement personally. I actually don't think there are huge | barriers, smaller investors have a huge advantage (when you | trade at scale, the market moves against you) but you have to | work with what you have and realise that you will get crushed | if you try to replicate what someone with more money is | doing. | | Also, data. Data is expensive, and a huge fixed cost. | nstj wrote: | A "foreign exchange" not "foreign exchange market" | hogFeast wrote: | Ah, same principle. I have heard of many similar stories. | madrafi wrote: | well because quant trading isn't about import xgboost, you need | a sustainable infra to handle api failovers, bad data... not | even going to mention risk management which is 50% of what | quant trading is about. the data provided is anonymized but | would probably be a mix of laggard measurements (moving | averages, rsi...) and maybe some flow data... quant trading | isn't really about finding "secret stuff" most profitable | strats you can deploy can be based on stat-arb, basis trading | or even just delta-neutral funding farming and such | homie wrote: | Mostly because it's impossible to accurately predict the market | - and this is just a competition to see who can build the best | model. | thegjp210 wrote: | HFT firms aren't trying to predict "the market" as a whole - | just small eddies of it. A typical example of this is arbing | names at the bottom of index fund rebalances. Speed is | important mostly to make sure someone else doesn't hit the | arb first. | mbesto wrote: | It's a good question. The basic answer is not everyone has | capital and risk, but they may have the time and intellect. 
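The back-testing point raised earlier in this sub-thread (a mistake that "leaks information to the code you're testing") is easy to reproduce. The sketch below is a toy under loud assumptions: the returns are pure Gaussian noise with no edge at all, the strategy and the `lag` parameter are invented for the illustration, and costs are ignored. The only difference between the two runs is whether the signal for a bar uses the previous bar's return or, by an off-by-one mistake, that bar's own return.

```python
import random


def sign(x):
    """Return +1, 0 or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def backtest(returns, lag):
    """Total P&L of 'go long after an up move, short after a down move'.

    lag=1 trades bar t using the return of bar t-1, which was known at the
    time. lag=0 uses bar t's own return, i.e. it leaks the answer into the
    signal.
    """
    pnl = 0.0
    for t in range(1, len(returns)):
        position = sign(returns[t - lag])
        pnl += position * returns[t]
    return pnl


rng = random.Random(42)  # fixed seed so the sketch is repeatable
returns = [rng.gauss(0.0, 0.01) for _ in range(10_000)]  # pure noise, no edge

honest = backtest(returns, lag=1)
leaky = backtest(returns, lag=0)
print(f"properly lagged signal: {honest:+.2f}")
print(f"look-ahead leak:        {leaky:+.2f}")
```

With the leak, every bar is "predicted" correctly, so the total P&L is simply the sum of absolute returns, while the honest version hovers near zero. Real leaks are usually subtler (a feature computed with data from the close, a normalisation fitted on the full history), but the effect on the reported returns is the same kind of flattery.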
| 1helloworld1 wrote: | Isn't it pretty well known in the finance world that using stale | public information to predict the market is a fool's errand? | | Unless you have some kind of specialized non-public data (e.g | satellite images of number of cars parked outside parking malls, | number of cargo ships moving in and out), trying to predict the | market with historical data does worse than "Just give me some | monkeys, darts and a dart board". | Tinyyy wrote: | That's not necessarily true. | 2-tpg wrote: | Using purely historical price data it is harrowingly difficult. | There are 130 anonymized features, so that's unlikely to be | only price data. It could include information on the order | book, correlated assets, fundamentals, vectorized/embedded | text, etc. | | Besides, I bet you can train monkeys to do (slightly) better | than blindfolded random throwing. Even with public data | (replace satellite images with YouTube mentions, or number of | links moving into a company website) it is _very_ possible to | do better than average guessing on quite a lot of assets | (especially smaller and newer markets). | | Most hedge funds, even with specialized expensive non-public | data, are not magical unicorns. Their quants really may just | run a gradient boosting machine and leave it at that. Some | hedge funds even prefer linear methods, because this lowers | risk through lower variance. Such models _can_ be beaten by | experienced Kagglers for sure. For one, I did. | Traster wrote: | One thing we need to be clear about is that you're not aiming | to be better than average. You're aiming to make a profit. | There are probably hundreds of thousands of day traders, | there are probably <100 market makers and trading firms (far | fewer than that for some specific products) and you'll | probably find 99% of the day traders aren't making systematic | profits. 
There are lots of strategies that are much better | than average and still worse than putting your cash in a | bank. | 2-tpg wrote: | You can aim for both. If you just aim for profit, then you | can get lucky with just average, or even random, betting. | If you find a weighted coinflip (which is not impossible), | then, provided you can flip that coin enough times, you will | see steady systematic profits. Of course, the majority of day | traders are getting owned by the big players, and they | would do better making more reasoned and long-term | investments. Most day traders are not even using predictive | models though. | Traster wrote: | On that point - it's pretty clear that this Kaggle | competition is highly likely to result in a decent number | of submissions that make more money through luck than | others make through strategy. | minimaxir wrote: | Granted, a typical Kaggle metagame-that-is-technically-against-the-rules | is to use data from outside the dataset, which is one | of the reasons winners have to be validated. | usmannk wrote: | Generally this is allowed if you publish the data you're | bringing in. They even create a sponsored thread for it in | most competitions. | justjonathan wrote: | From: kaggle.com/c/jane-street-market-prediction/overview/code-requirements | | "Freely & publicly available external data is allowed, | including pre-trained models" | blhack wrote: | >Unless you have some kind of specialized non-public data (e.g | satellite images of number of cars parked outside parking | malls, number of cargo ships moving in and out) | | Planet Labs will sell you all of that data, in case people | reading along here are curious. | cultus wrote: | I'm sure they've got more/better data in production. There | seems to be some arbitrage that can be shaved off the edges for | players with innovative enough strategies and good and timely | enough data. | logicslave wrote: | There's actually still money to be made in small scale | strategies. 
Sophisticated funds are running billions. They can't | focus on strategies that only work for 100-500k. This is where | big returns can be made. Even Warren Buffett will say, if he was | only managing 1 million, he would get 100% a year returns. | xapata wrote: | That doesn't make sense unless there are very few viable | small scale strategies, at which point they'd probably be | difficult to identify. Your assertion might have been true | before computers were able to help someone manage many | strategies simultaneously. | smabie wrote: | No, not really. You can use public information to give you an | edge. And I say this as a person who trades and develops models | at a _very_ successful market making firm. | | Alternative data no one else can get easily certainly has | tremendous value though. | | Of course, predicting one or two seconds into the future (my | primary concern) is easier than days or years, so there's that. | hogFeast wrote: | None of that information is non-public (you can find cargo ship | data online for free), none of it is particularly valuable (you | are looking for information, it is hard to know how much | information is in cargo ship movement...it depends), and most | non-quant hedge funds have been doing stuff like this for | decades (i.e. hiring people to stand outside a retailer's | stores and count customers)...most of this stuff is less useful | than people think (again, you need information, data with | intent). | | Also, most of this stuff isn't in the price. Lots of people are | collecting new data, it is definitely becoming more widespread | but the actual synthesis is tricky (most people who are quants | do not understand fundamentals, and most fundamental analysts | don't understand data...most firms are swirling in a perfect | storm of ignorance). | master_yoda_1 wrote: | These are the LeetCode equivalent for data science and quant. So | invest time only if interested. 
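The "weighted coinflip" remark earlier in this thread, and the reply that many submissions will win through luck, both come down to how many bets it takes for a small edge to stand out from noise. A rough sketch of that arithmetic follows, assuming independent even-money bets of one unit each; the 51% win rate and the function names are hypothetical and chosen only for illustration.

```python
import math


def expected_profit(p_win, n_bets):
    """Mean profit of n even-money one-unit bets won with probability p_win."""
    return n_bets * (2 * p_win - 1)


def profit_std(p_win, n_bets):
    """Standard deviation of that profit (each bet pays +1 or -1)."""
    return 2 * math.sqrt(n_bets * p_win * (1 - p_win))


def bets_needed(p_win, z=2.0):
    """Approximate smallest n at which expected profit is z std devs above zero."""
    edge = 2 * p_win - 1
    return math.ceil(z ** 2 * 4 * p_win * (1 - p_win) / edge ** 2)


p = 0.51  # hypothetical 51% coin, chosen only for illustration
for n in (100, 1_000, 10_000, 100_000):
    mean, std = expected_profit(p, n), profit_std(p, n)
    print(f"{n:>7} bets: mean {mean:8.1f}, std {std:7.1f}, mean/std {mean / std:.2f}")
print("bets needed for a 2-sigma edge:", bets_needed(p))
```

Because the mean grows with n while the standard deviation grows with the square root of n, a 51% coin needs on the order of ten thousand flips before skill reliably dominates luck. Over a short evaluation window, a leaderboard with thousands of entrants will contain lucky coins that outrank genuinely weighted ones.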
| baobabKoodaa wrote: | Market prediction with anonymized feature set, sounds like | Numerai: https://numer.ai/ | toomuchtodo wrote: | Any model superior to what Jane Street is running is worth vastly | more than the prizes they're offering. | | If you prove such a model out, get licensed (SEC, FINRA) and | start soliciting to manage assets. | | Disclaimer: Not investment advice. Not a lawyer, not your | fiduciary. | [deleted] | csomar wrote: | You are ignoring execution, infrastructure and real-market | conditions. The model is just one part of the game. | georgeek wrote: | Such competitions might have two goals in mind: recruiting and | signal diversification. The recruiting angle is obvious. | | Any alpha that is not fully correlated to existing alpha is | worth its weight in gold for an organization with the size, | sophistication and complexity of JS. That's part of the reason | why efforts such as 2Sigma's Alpha Studio exist: | https://alphastudio.com/ | usmannk wrote: | You don't have to (and certainly won't) beat all of Jane | Street. The goal is to beat everyone else on Kaggle. A still | difficult but much more accomplishable task. | toomuchtodo wrote: | Yeah, I'm arguing to not disclose the model. It's worth far | more held close. | | If you want to work at Jane Street, go work for Jane Street. | If you want to build your own models and run your own shop, | the tools exist for you to do that without Jane Street | (although there's probably some amount of value learning the | ropes there while they pay you, if that's your thing). | | My comments in thread are primarily around not having | someone's work exploited by sophisticated hedge/prop | trading/investment professionals, which I've seen happen more | than once, and for which you have no recourse. | Spinnaker_ wrote: | A backtested model is similar to a great startup idea. | There is a huge amount of work to be done before it is | worth much. 
| jjallen wrote: | Assuming it passes some absolute measure(s) of quality. You | could be the best on Kaggle and still have a mediocre | model. | usmannk wrote: | My point is the winning model of the competition will not | be worth anything beyond the prize. It will only be the | best amongst other kagglers, almost none of whom are domain | experts in finance. It will not stand a chance in "prod". | lostcolony wrote: | 'Admittedly, this challenge far oversimplifies the depth of the | quantitative problems Jane Streeters work on daily, and Jane | Street is happy with the performance of its existing trading | model for this particular question' | nv-vn wrote: | >Jane Street has spent decades developing their own trading | models and machine learning solutions to identify profitable | opportunities and quickly decide whether to execute trades. | These models help Jane Street trade thousands of financial | products each day across 200 trading venues around the world. | | >Admittedly, this challenge far oversimplifies the depth of the | quantitative problems Jane Streeters work on daily, and Jane | Street is happy with the performance of its existing trading | model for this particular question. However, there's nothing | like a good puzzle, and this challenge will hopefully serve as | a fun introduction to a type of data science problem that a | Jane Streeter might tackle on a daily basis. | | Sounds like it's just for fun/recruiting rather than trying to | crowd source new strategies -- I'm sure if they were looking to | crowd source strats they'd pay a whole lot more than 40k for | first place | tcbawo wrote: | This contest seems like the equivalent of the "inventor's | hotline" infomercial. If it identifies one promising new | approach that they can iterate on, it has probably paid for | itself. It also serves as a good PR and recruiting tool. The | prize is probably designed to bring in clever non- | professionals. 
It's a win-win for Jane Street | Traster wrote: | If someone has a good idea, you don't want the idea, you | want the person. If you take the idea, at best you'll split | the market with the person who had the idea. At worst | they'll iterate and you'll get nothing. Far better to find | people who have the skills to develop an idea. | | Having said that you also want to find the (vastly more in | number) people who can take someone else's idea and | actually implement it. | amznthrwaway wrote: | I sincerely doubt they think they'll get actionable ideas. | It seems like a fun recruiting play from a company that | takes pride in hiring non-traditional talent. | elil17 wrote: | The inventors hotline infomercial is a scam where they get | you to pay for expensive patent filing, consulting, and | marketing packages. They never intend to actually use any | of the inventions. | b20000 wrote: | why would they need to pay more? the nature of smart people | is to undervalue themselves and to not negotiate, so as long | as smart people keep doing that other people can take | advantage of that. | heipei wrote: | True, and I don't think they expect models that are superior to | their own, I'd look at this as a hiring / marketing tool. Plus, | even if you had a model that from a pure engineering standpoint | was able to match Jane Street's approach, the model would not | work without the wealth of proprietary (and expensive) data | sources that Jane Street is sure to ingest, so you still | couldn't just go out and do it yourself without some serious | upfront investment first to get the same data. That is assuming | all data they use is even available commercially, which I doubt | as well. There are probably data sources that only become | available to you through personal relationships with the right | folks at the right places. 
| uponcoffee wrote: | To me, this seems more like a funnel for recruiting | renewiltord wrote: | With an engineer phone screen and three on-site interviews, | that's 4 hours of engineer time. $150/hr compensation per | engineer, so cost is roughly 2x, $300/hr. So $1.2k to run a | candidate through the pipeline post-initial-qualification. | | To get one candidate and come out superior, acceptance rate | should be 1%. (i.e. 99 failures). But if there are 50 leads | from the program, and you convert a fifth, that's 10 | candidates for a cost / successful recruit of $10k which | means you have 10% acceptance rate to break even. | | Hmm, back of the envelope seems to do all right as a strat. | Relatively cheap. I recall the last time we were hiring, we | projected cost per hire at $35k with the bulk of that | actually being the recruiter referral fee. | aaronblohowiak wrote: | I think you are significantly underestimating the cost per | hour of Jane Street employees | renewiltord wrote: | I have first party information, two years out of date. Do | you have contradictory first party information? Yes/no | will be sufficient for me to adjust my priors and I will | be grateful. | throwaway378692 wrote: | Package for just-graduated SWEs is about $400k. | [deleted] | renewiltord wrote: | Thank you. | theptip wrote: | The correct metric is not what the employee's salary is, | but the opportunity cost of their time. If all your | engineers are working on urgent stuff, and each engineer | adds in $1-2m of revenue a year, then the cost to your | business of taking them off feature work to do recruiting | is not $150/hr. | renewiltord wrote: | Yes, of course. Reasoning for not using that is as | follows: if conservative estimates yield a yes, you don't | need to assume more. | | I know salary (2 years out of date). I don't know oppo | cost. | kgwgk wrote: | Disclaimer: I have no first hand info. | | According to https://news.efinancialcareers.com/ch-en/307393/jane-street-... 
"Last year, Jane Street's | graduate hires straight from college were said to be paid | a $200k annual base salary, plus a $100k sign-on bonus, | plus a $100k-$150k guaranteed performance bonus." | | According to random people on Reddit https://www.reddit.com/r/cscareerquestions/comments/69k0ap/d... "somebody | said they got an offer from JS for $150k + $50k/yr | "performance"-based bonus" | | Both may be true. | renewiltord wrote: | Thank you. | paxys wrote: | Why do you think this competition will result in a model | superior to what Jane Street is running? | 2-tpg wrote: | The winning model will likely outperform anything that Jane | Street could come up with on this 130-feature set. With 3000+ | competitors, the top 10 will likely be superior to what Jane | Street can do in-house. Then an ensemble of the top solutions | will be the best possible model anyone can come up with. | ACow_Adonis wrote: | That's a general problem with a lot of kaggle-esque comps: but | then I don't discount the number of unemployed or | intellectually curious very intelligent people out there, even | if they're doing the equivalent of pushing down against the | value of my wage/bargaining power and doing the datascience | equivalent of working for reputation/recruitment. | | Hell, the thing about us hackers is I can KNOW it's a dud deal, | yet part of me still wants to give it a go because it's a | problem and it's right there! | [deleted] | ampdepolymerase wrote: | Quantopian tried this. Didn't work out. Now they have been | acquihired by Robinhood. | huac wrote: | Part of what makes trading hard, and especially quantitative | trading, is the necessary infrastructure: not just the obvious | stuff like execution but also the infra to manage backtesting, | risk sizing and management, etc. Big firms offer this and lots | of data and allow researchers to focus on the small parts where | they can become subject domain experts. 
| basicneo wrote: | By a similar argument to https://danluu.com/sounds-easy/, no one | will beat Jane Street in a weekend. | | Jane Street's hiring standards exceed FAANG's. | | This is a hiring/branding strategy. Good luck to them. | 2-tpg wrote: | I actually give it 48 hours before the top 3 equal what Jane | Street can do in-house on this exact same dataset. A week | before the reasonable plateau is reached, and a month or so | before the absolute most information is squeezed out. | Goosee wrote: | Since this is a competition, does someone mind explaining the | reason for people to publicly post notebooks in the "Notebooks" | section? | | Seems counter-intuitive to provide competitors with free | information, unless you are trying to throw them off. | minkzilla wrote: | Not a frequent Kaggle user, but from what I've seen the ones | posted in notebooks are baseline examples: the things anyone | who is competitive enough to win has probably thought of and | dismissed or could implement themselves in half an hour and | iterate from there. | mjn wrote: | As a current example: the highest-scoring entry that has an | associated notebook at the moment is basically a clean | example of how to apply XGBoost [1] to this dataset. XGBoost | ends up being tried in nearly every Kaggle competition, so | the person isn't giving away many secrets there. | | [1] https://xgboost.readthedocs.io/en/latest/ | 2-tpg wrote: | Posting notebooks can get you upvotes, which contribute towards | becoming a Kaggle (Grand)Master. It is also a good way to "win" | some attention and goodwill, without spending months trying to | actually win the competition itself. Publishing Notebooks also | helps you improve your coding/presentation skills, for a | popular notebook needs to be useful for a wide audience (or | fairly competitive). | | The best techniques, certainly coming from teams, are hardly | ever published as Notebooks. 
But yes, many winning teams will | eventually incorporate some of the information in the | Notebooks, if only to hedge against the others doing the same. | [deleted] | x87678r wrote: | Jane Street is big in the OCaml and CompSci worlds but the only guy I | know who worked there did ETF arbitrage/redemption/creation which | has to be one of the most boring businesses on the street. Is it | worth working there (aside from the salaries)? | nvarsj wrote: | It depends what you're looking for. | | Best-in-class engineering and internet-scale problems? Nope, | you aren't going to find that at any hedge fund. They are much | more like small startup cultures. Speed and results are | favored over a mature engineering culture and maintainable | code. | | Want to have the potential to make a large direct impact and | make a crap load of money? Well then, a hedge fund may be a | good fit. | b20000 wrote: | I never heard of Jane Street. I visited the website and saw a | bunch of people crammed into an open office like chickens in a | breeding factory. That is not attractive to me. Competitions like | this just seem to me like a cheap way to outsource problem | solving without having to pay anyone. Like bringing someone in | for an interview, letting them solve your problem, and then | telling them they are not a good fit while profiting off their | work. Same idea, different package. | ArtWomb wrote: | So, if I parsed the training data correctly, one's output | algorithm is completely agnostic to any actual market conditions. | It's merely learned on the anonymized feature set of 130 | variables, making it qualitatively no different than any other | abstract ML forecasting problem. There are no considerations of | market microstructure, news-driven events, leverage, etc? | mawise wrote: | In their "code requirements" section they say: | | > Freely & publicly available external data is allowed | | So I presume it would be fair to fetch and leverage additional | data on your own.
| yodsanklai wrote: | Just curious, how well-known is Jane Street outside of the OCaml | world? | nwsm wrote: | Often mentioned on Blind as one of the places top engineers can | be paid handsomely. | reducesuffering wrote: | Well known to Ivy League undergrads and FAANG engineers for | having some of the highest-paying jobs available. | smabie wrote: | It's well known as a top-tier market-making firm. They're not | as special as they are made out to be, though. They just have | good marketing. | blatchcorn wrote: | Just give me some monkeys, darts and a dart board | bigdict wrote: | This would be much more interesting if the features weren't | anonymized. | | At this point this is just a widest/deepest neural net | competition on some unknown bunch of features. | rrjjww wrote: | I was just about to post the same thing. I'm a statistician in | my day job and the raw math is only one part of building a | model. Human judgement (while often flawed) can be key to | improving actual model performance. | arcticbull wrote: | Stonks only go up, what more do you need to know? | bklyn11201 wrote: | Is this just a hiring competition under a different name? If | you can beat their model, obviously you deserve a $100k signing | bonus and a very generous compensation package. | sjg007 wrote: | Doing well on Kaggle is basically a resume builder and probably | would get you into jobs you wouldn't otherwise get. Not much | different than Topcoder in that sense. Some people can make a | living just winning competitions. | usmannk wrote: | You don't beat their model, just those of everyone else in the | competition. A very different game. | hnracer wrote: | You can't beat their model because they don't include all the | data they use in the competition dataset. It's a highly | sanitized and simplified toy problem used for hiring and | marketing. 
They don't even tell you what the features are, so | it's impossible to use domain expertise to constrain the | fitting problem (this is a critical component of building | profitable trading models because of the low signal-to-noise ratio | in the data). | lwigo wrote: | What's the over/under on someone entering the inverse of what | WSB does? | x87678r wrote: | Pump in 2020, dump in 2021. | afrojack123 wrote: | Nothing like free code monkey data scientists. ___________________________________________________________________ (page generated 2020-11-24 23:00 UTC)