[HN Gopher] Show HN: Exp. Smoothing is 32% more accurate and 100...
___________________________________________________________________
Show HN: Exp. Smoothing is 32% more accurate and 100x faster than
Neural-Prophet

We benchmarked on more than 55K series and show that ETS improves
MAPE and sMAPE forecast accuracy by 32% and 19%, respectively, with
104x less computational time than NeuralProphet. We hope this
exercise helps the forecasting community avoid adopting yet another
overpromising and unproven forecasting method.

Author : maxmc
Score  : 115 points
Date   : 2022-08-17 19:33 UTC (3 hours ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| ren_engineer wrote:
| what's the consensus on machine learning vs more classical
| methods for time series forecasting? I know in 2018 a hybrid
| model won the M4 competition; obviously in this case classical
| still beats AI/ML
|
| https://en.wikipedia.org/wiki/Makridakis_Competitions
| rich_sasha wrote:
| I think it depends massively on what you mean by "time series".
| If it is really an ARMA model you're looking at, then ML can
| only bring noise to the problem. If it is a complex large system
| that happens to be indexed by time, ML can well be better.
|
| AFAIK Prophet had a more modest scope than "be all and end all
| of TS modelling", rather a decent model for everything. It might
| indeed be excellent at that...
| dylanjcastillo wrote:
| In the M5 competition[1], most winning solutions used LightGBM.
| So ML beat classical.
|
| Just a couple of the winning solutions used DL.
|
| [1]
| https://www.sciencedirect.com/science/article/pii/S016920702...
| gillesjacobs wrote:
| This wouldn't pass peer review if it were a paper. Major issues:
|
| - No fair hyperparameter tuning for NeuralProphet. They mention
| multiple times that they used default hyperparams or ad-hoc
| example hyperparams.
|
| - ETS outperforming on 3 of 4 benchmark datasets (on one of
| which they didn't finish training) is not strong evidence of
| all-round robustness.
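For readers unfamiliar with the headline method: ETS generalizes exponential smoothing with error, trend, and seasonal components. Its simplest member, simple exponential smoothing, can be sketched in a few lines (a minimal illustration, not the benchmark code from the repo):

```python
def simple_exp_smoothing(series, alpha=0.5):
    """One-step-ahead forecast via simple exponential smoothing.

    The smoothed level is l_t = alpha * y_t + (1 - alpha) * l_{t-1},
    so recent observations get geometrically more weight.
    """
    level = series[0]  # initialize the level at the first observation
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level  # the flat forecast for the next step
```

In practice, libraries such as statsmodels estimate alpha (and any trend/seasonal parameters) from the data rather than fixing it by hand.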
| Benchmarks like SuperGLUE for NLP combine 10 completely
| different tasks with more subtasks to assess language model
| performance. And even SuperGLUE is not uncontroversial.
| gillesjacobs wrote:
| While the results don't convincingly prove superiority, ETS does
| seem to be a good candidate as a first go-to in practical
| applications. "In practice, practice and theory are the same. In
| theory, they are not."
| variaga wrote:
| In _theory_ practice and theory are the same. In _practice_
| they are not.
| gillesjacobs wrote:
| HN pedantry ruins the fun of wordplay yet again.
| anon_123g987 wrote:
| In theory, his version is right. In practice, yours.
| Imnimo wrote:
| In the original Prophet paper
| (https://peerj.com/preprints/3190.pdf) they claim that Prophet
| outperforms ETS (see Figure 7, for example). And in the
| NeuralProphet paper, they claim that it outperforms Prophet (but
| do not, as far as I can see, compare directly to ETS). Here we
| see ETS outperform NeuralProphet.
|
| Presumably this apparent non-transitivity stems from differences
| in each evaluation. If we fix the evaluation to the method used
| here, is it still the case that NeuralProphet outperforms
| Prophet (and therefore the claim that Prophet outperforms ETS is
| not correct)? Or is it that NeuralProphet does not outperform
| Prophet, but Prophet does outperform ETS?
| beernet wrote:
| As usual in ML, the appropriate solution depends on the problem
| and context.
|
| ML (particularly DL) tends to outperform "classical" statistical
| time series forecasting when the data is (strongly) nonlinear,
| high-dimensional and large. The opposite holds as well.
|
| It is also important to note that accuracy is not the only
| relevant metric in practical applications.
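The MAPE and sMAPE figures quoted in the submission are straightforward to compute; a minimal pure-Python sketch (definitions as commonly used in the M-competitions; the repo may normalize slightly differently):

```python
def mape(actual, forecast):
    # Mean Absolute Percentage Error, in percent.
    # Undefined when any actual value is zero.
    return 100.0 * sum(abs((a - f) / a)
                       for a, f in zip(actual, forecast)) / len(actual)

def smape(actual, forecast):
    # Symmetric MAPE, bounded in [0, 200]: each error is scaled by the
    # combined magnitude of actual and forecast, not actual alone.
    return 100.0 * sum(2.0 * abs(f - a) / (abs(a) + abs(f))
                       for a, f in zip(actual, forecast)) / len(actual)
```

Both average percentage errors over every point of every series, which is why results are so sensitive to the exact benchmark setup being debated here.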
| Explainability is of particular interest in time series
| forecasting: it is good to know whether your sales are going to
| increase or decrease, but it is even more valuable to know which
| input variables are likely to account for that change. Hence, a
| "simple" model with inferior forecasting accuracy might be
| preferred to a stronger estimator if it can give insight into
| not only "what" will happen, but also "why".
| tomwphillips wrote:
| > ML (particularly DL) tends to outperform "classical"
| statistical time series forecasting when the data is (strongly)
| nonlinear, high-dimensional and large.
|
| This claim about forecasting with DL comes up a lot, but I've
| seen little evidence to back it up.
|
| Personally, I've never managed to have the same success others
| apparently have with DL time series forecasting.
| beernet wrote:
| It's true simply because large ANNs have higher capacity, which
| is great for large, nonlinear data but less so for small
| datasets or simple functions.
|
| In any case, Transformers are eating ML right now and I'm
| actually surprised there's no "GPT-3 for time series" yet. It's
| technically the same problem as language modeling (that is,
| multi-step prediction of numerics); however, there is
| comparatively little human-generated data for self-supervised
| learning of a time series forecasting model. Another reason
| might be that the expected applications and potential of such a
| pre-trained model aren't as glamorous as generating language.
| time_to_smile wrote:
| > It's technically the same problem as language modeling
|
| You're thinking of modeling event sequences, which is not,
| strictly speaking, the same as time series modeling.
|
| Plenty of people do use LSTMs to model event sequences, using
| the hidden state of the model as a vector representation of a
| process's current location when walking a graph (i.e. a user's
| journey through a mobile app, or following links on the web).
|
| Time series is different because the ticks of timed events come
| at consistent intervals and are themselves part of the problem
| being modeled. In general, time series models have often been
| distinct from sequence models.
|
| The reason there's no GPT-3 for any general sequence is the lack
| of data. Typically the vocabulary of events is much smaller than
| that of natural languages, and the corpus of sequences much
| smaller.
| time_to_smile wrote:
| A larger problem is that time series modeling is particularly
| resistant to black-box approaches, since a lot of information is
| encoded in the model itself.
|
| Take even a simple moving average model on daily observations.
| Consider stock ticker data (where there are no weekends) and web
| traffic data (where there is an observation every day). The
| stock ticker data should be smoothed with a 5-day window and the
| web traffic with a 7-day window, to help reduce the impact of
| weekly effects (which probably shouldn't exist in the stock
| market anyway).
|
| It's possible in either of these cases you might find a moving
| average that performs better on some chosen metric, say 4 or 8
| days. However, neither of these alternatives makes any sense as
| a window if we're trying to remove a day-of-week effect, and
| unless you can come up with a justifiable explanation, smoothing
| over arbitrary windows should be avoided.
|
| If you let a black box optimize even a simple moving average,
| you would be avoiding some very essential introspection into
| what your model is actually claiming.
|
| Not to mention that we can often do more than just prediction
| with these intentional model tunings (for example, a day-of-week
| effect can be explicitly differenced out of the data to measure
| exactly how much sales should increase on a Saturday).
| rich_sasha wrote:
| Hmm, wow. When I saw the headline, I assumed they used like one
| dataset or something similarly limiting.
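The window-choice argument above can be made concrete: a trailing moving average whose window matches the seasonal period covers each day-of-week exactly once, and seasonal differencing isolates the weekly effect (a minimal sketch; function names are illustrative, not from any library):

```python
def moving_average(series, window):
    # Trailing moving average. Choose window = seasonal period
    # (7 for daily web traffic, 5 for weekday-only ticker data)
    # so every average spans each day-of-week exactly once.
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def seasonal_difference(series, period=7):
    # y_t - y_{t-period}: removes a stable weekly pattern, and
    # shows how much, e.g., Saturdays shift week over week.
    return [series[i] - series[i - period]
            for i in range(period, len(series))]
```

A 4- or 8-day window may score better on some metric, but it mixes days of the week unevenly, which is exactly the introspection point being made in the comment.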
|
| I'd need to dig out the original paper, but I would be surprised
| if the original didn't compare to basic benchmark methods. But
| from memory, I never saw such a comparison (until now).
| cercatrova wrote:
| Can someone explain this? I don't know what the context is for
| this Show HN.
| IshKebab wrote:
| They're time series prediction methods. E.g. they mention
| electricity usage forecasting - given historical data, what will
| the usage be in 1 hour?
|
| Facebook's Prophet is quite popular in this space, I understand.
| No idea about the other two.
| mkl wrote:
| A minor language error: "this model does not outperform
| classical statistical methods _neither_ in accuracy _nor_
| speed." should say "either" and "or".
| ISV_Damocles wrote:
| https://dictionary.cambridge.org/grammar/british-grammar/nei...
| mkl wrote:
| Nothing there seems to contradict me. The problem in the linked
| page is that "neither ... nor" is used after "not", which makes
| it a double negative.
| lightedman wrote:
| The "Not-Neither-Nor" sequence is typical, even with regard to
| American English versus British English (the Queen's English).
| In either case, both are technically correct.
| bo1024 wrote:
| As part of a double negative?
| kevin_thibedeau wrote:
| English is nothing if not inconsistent.
___________________________________________________________________
(page generated 2022-08-17 23:00 UTC)