[HN Gopher] Stan is a state-of-the-art platform for statistical ... ___________________________________________________________________ Stan is a state-of-the-art platform for statistical modeling Author : Tomte Score : 170 points Date : 2020-12-23 10:19 UTC (12 hours ago) (HTM) web link (mc-stan.org) | pvitz wrote: | By looking at the user's guide, it seems that Stan also has use | cases other than Bayesian inference. Examples are linear | regression, mixture models and even ODEs. Does anybody here have | experience with Stan and R and could comment on the strengths of | Stan in non-Bayesian contexts? | elsherbini wrote: | Stan uses MCMC (specifically NUTS, which is a Hamiltonian Monte | Carlo sampler) to optimize parameter fitting, so it can be used | for things like ODEs. | | Here is an example from a class taught last January that uses | stan to fit a simple ODE (using the `integrate_ode_rk45` | function in stan): | | https://github.com/gregbritten/BayesianEcosystems_IAP/blob/m... | standevbob wrote: | Stan provides both frequentist inference (penalized maximum | likelihood with bootstrapped confidence intervals) and Bayesian | inference (MCMC sampling or approximate variational inference). | | As currymj says, the differential equations (same for all the | linear algebra solvers like eigendecomposition) can be used in | defining likelihoods for either Bayesian or frequentist | estimation. Same for all of our linear algebra operations and | special functions. | | Not every model that can be programmed in Stan has a | well-defined MLE or proper posterior. Standard | hierarchical/multilevel models don't have MLEs, even with | standard shrinkage. Bayesian models with improper priors and no | data wind up with improper posteriors, etc. | | Having said all that, almost all of the use of Stan is for | Bayesian inference. | nlpNick wrote: | You may find the `rstanarm` package interesting/useful. 
I've | used it for linear regression and HLMs. | https://mc-stan.org/rstanarm/articles/index.html | currymj wrote: | I believe the main reason for including ODE solvers is | basically to do Bayesian parameter estimation of ODEs from | data. | | Likewise, as far as I know, linear regression and mixture models | are both done in a Bayesian style (a hierarchical model giving | priors for parameters). | ChrisRackauckas wrote: | DiffEqBayes.jl can transpile Julia ODE code to Stan. This is a | nice interface to use Stan directly from Julia, and also makes | it easy to benchmark the ODE inference in a bunch of PPLs. Some | benchmarks: | | https://benchmarks.sciml.ai/html/ParameterEstimation/DiffEqB... | | https://benchmarks.sciml.ai/html/ParameterEstimation/DiffEqB... | Crye wrote: | I don't know what your experience with Bayesian modeling is, | and I'll admit mine is limited, but STAN can solve linear | regression by defining a linear model and then setting the | dimension parameters as a normal distribution to solve. This is | great because it gives you a measure of certainty for each of | your parameters. | elsherbini wrote: | Does anyone know what a practical upper limit is for Stan in | terms of size of data set / number of parameters to fit? Can you | use stan to fit a model with ~10^4 parameters and ~10^6 rows of | data if you had access to ~10^3 cores? How long would it take? | standevbob wrote: | Yes, we regularly use Stan's MCMC to fit relatively simple | time-series regression models or item-response theory type | models with 10^5 parameters and 10^6 rows of data on a desktop | computer. It can take a day, though. It's much faster with | variational inference, but that can be less stable and it | doesn't give you the same uncertainty quantification because of | the way the KL-divergence is ordered in the objective. | | Stan can parallelize multiple chains and it can parallelize the | density/gradient calculations in a single chain. 
But for the | latter to be efficient, the chunks being parallelized need to | be compute intensive, like you might get in a pharmacometric | compartment model where you might have to solve a bunch of | differential equations for each of thousands of patients in a | clinical trial. | gbrown wrote: | I imagine it depends on which algorithm you're using - maybe | with their VB functionality (I almost entirely use the souped | up NUTS algorithm for full Bayesian inference). | | It also likely depends on how well conditioned your model is - | even if you can get it to run for huge models on reasonable | hardware, convergence may not be practical. | hendzen wrote: | Stan used to only be able to parallelize across chains but they | introduced within-chain parallelism this year. Even then some | of the work is still serial so I don't think you can expect | linear speedups past a certain point. | mark_l_watson wrote: | I will give PyStan a try over the holidays. Also, many thanks for | all of the great comments here (statistical modeling is not in my | toolbox yet, and the discussion here is grounding). | ogogmad wrote: | Obligatory question: What applications does Stan have? I'm aware | of Facebook's Prophet model for time series prediction, but what | about others? | | I find there's a lot of excitement around Bayesian inference and | MCMC, but I do wonder about the substance. | usgroup wrote: | Bayesian modellers are typically scientists and Stan probably | has more indirect than direct users. For example, rstanarm and | BRMS are both R regression packages which use Stan which are | wildly popular in Bayesian circles. They enable hierarchical | Bayesian modelling which can be used to perform a very flexible | kind of regression which allows the integration of lots of | prior information, and better quantification of uncertainties | than previous alternatives. 
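A minimal sketch of the "sharing information" that hierarchical models provide, assuming the simplest normal-normal case. The population mean `mu`, population spread `tau`, and within-group noise `sigma` are fixed by hand here purely for illustration; in a real hierarchical model fit with Stan, rstanarm, or brms they would be inferred from the data along with everything else. The group names and numbers are made up.

```python
def partial_pool(group_mean, n, sigma, mu, tau):
    """Posterior mean of one group's effect in a normal-normal model.

    The estimate is a precision-weighted average of the group's own
    sample mean and the population mean mu.
    """
    data_precision = n / sigma**2      # grows with the group's sample size
    prior_precision = 1.0 / tau**2     # strength of the population-level prior
    weight = data_precision / (data_precision + prior_precision)
    return weight * group_mean + (1.0 - weight) * mu


# Hypothetical values: both groups observe the same sample mean (12.0),
# but with very different amounts of data.
mu, tau, sigma = 8.0, 2.0, 4.0
groups = {"small_clinic": (12.0, 2), "large_hospital": (12.0, 200)}

estimates = {
    name: partial_pool(mean, n, sigma, mu, tau)
    for name, (mean, n) in groups.items()
}

for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

The group with only two observations is pulled strongly toward the population mean, while the group with 200 observations stays close to its own sample mean. That is the shrinkage behaviour the comments in this thread refer to.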
| [deleted] | j7ake wrote: | Bayesian methods are great if you want to squeeze all the | information you get from each of your data points, and also | inject specific prior information to help prevent | overfitting. | | These methods are ideal for small datasets with correlation | structures that aren't necessarily independent. | | Also great if you want uncertainty with your estimates. | standevbob wrote: | This is the main reason that people use Stan---squeezing as | much info out of your data as possible. That and the ability | to write custom models for these situations. | | There are hundreds of different applications of Stan across | the physical, biological, and social sciences, as well as in | finance, education, sports analytics, actuarial sciences, | transportation planning, all sorts of material and chemical | and civil engineering, clinical trials and pharmacometrics, | etc. etc. It's most popular in fields like ecology and | epidemiology where Bayesian methods are already popular. For | instance, many of the Covid models (like the one for NY | state) are being built with Stan. All four baseball teams in | the semifinals (LCS) use Stan for analytics, for example. | Google and Facebook use Stan for ad attribution and resource | allocation. It's been used for models of neutrino mass and | models of galactic mass, models of supernovas, and it's even | used in the LIGO gravitational wave experiments. | stdbrouw wrote: | Any kind of statistical modeling that doesn't fit neatly into | an existing "one (meta)model to rule them all" framework such | as generalized linear models. | usgroup wrote: | I don't think that's accurate. Sampling based approaches | scale badly with data so, although there are a few | exceptions, if you're tackling the problem as a hierarchical | Bayesian model - which is most often what Stan is used for - | you're working with a dataset with a small number of features | and fewer than 10k rows. 
| | Stan fittings can be made parallel, some models will scale | linearly with the data, but in the main you won't find many | big data use cases here. | | You also can't use Stan for online learning. | harperlee wrote: | > You also can't use Stan for online learning. | | Can't you loop posteriors as the next iteration's priors to get | a system that learns online? | usgroup wrote: | Stan will get you from a prior to a posterior | distribution that is best supported by your data, but | typically the posterior and prior distributions will not | be of the same form, so there's no loop back to make. | | In the case that your model is extremely simple such that | your posterior has a "conjugate prior" (i.e. the | posterior and prior are the same family of distribution), | this sort of loop back is possible. But where this is | possible you have no reason at all to use Stan or MCMC | since you can just update your posterior directly. | harperlee wrote: | "Typically" depends on your problem, right? So if I know | that I want to feed predictions back into the next | iteration, I need to take care to structure a model that | enables that, by having posteriors with same shape as | priors. But from what I understand it seems a design | consideration, not a fundamental limitation. | ploika wrote: | You can, but it's slow and computationally intensive - | you still need to be fairly sure that the sampler has | converged on the true posterior (I might be a bit off | with the terminology there but you know what I mean) | before that can become your new prior. | stdbrouw wrote: | Fair enough. I read the word "application" as "field of | inquiry" where I do think the sky is the limit, but it's | true that Stan is primarily geared towards scientific work | with small data sets. | standevbob wrote: | That's right---Stan doesn't have any online learning | facilities. It's very hard to approximate posteriors and | chain them, so we don't try. 
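The conjugate-prior case raised earlier in this subthread can be sketched in a few lines, assuming the textbook Beta-Binomial pair: a Beta prior on a success probability stays a Beta after observing binomial data, so each posterior can be reused directly as the next prior with no sampler involved. The `Beta` class and the batch counts below are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    """Beta(a, b) distribution over a success probability."""
    a: float
    b: float

    def update(self, successes: int, failures: int) -> "Beta":
        # Conjugacy: the posterior is again a Beta, with the counts added on.
        return Beta(self.a + successes, self.b + failures)

    @property
    def mean(self) -> float:
        return self.a / (self.a + self.b)


prior = Beta(1.0, 1.0)                 # flat prior
batches = [(3, 7), (5, 5), (8, 2)]     # (successes, failures) arriving over time

# Online learning: yesterday's posterior is today's prior.
posterior = prior
for successes, failures in batches:
    posterior = posterior.update(successes, failures)

# Processing all the data at once gives exactly the same answer.
batch_posterior = prior.update(16, 14)

print(posterior, posterior.mean)
```

This only works because the posterior stays in the same family as the prior. For the non-conjugate models that Stan is typically used for, the posterior is available only as a set of draws, which is why chaining it into a new prior is hard.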
| | If by "big data", we're talking about too big to fit in | memory, that's right. Stan's fully in-memory. Compute can | be distributed and GPU-powered for matrix ops, but all of | the data and parameters and the core autodiff expression | graph need to fit in memory. | | For "medium data", Stan's adaptive Hamiltonian Monte Carlo | sampling is much more efficient and scalable to complex | models and higher dimensions than Gibbs or Metropolis. I'm | fitting a Covid prevalence model using a custom trend- | following and mean-reverting second-order autoregression | model over 400 distinct regions with weekly data that has | 5M data points and 10K parameters and adjusts for | sensitivity and specificity of various tests taken. It fits | in a single thread using MCMC in 24 hours or so, but we can | fit the model with variational inference in a couple | minutes. Although variational inference often produces | reasonable point estimates in bigger data settings, it | doesn't reasonably quantify uncertainty. I'm also working | on a genomics model for differential expression of splice | variants that involves 120K measurements and just as many | parameters to deal with overdispersion of biological | replicates in a control and treatment group. We're using | variational inference and it fits in a couple minutes for | the comparative event probabilities we need to estimate. | RA_Fisher wrote: | Check out stan's variational inference algos. They're | relatively fast (compared to MCMC) at the cost of being | approximative. | elsherbini wrote: | This was useful. Do you know how painful it would be to use | Stan with 100k rows, or even 1m? (For a sorta normal | hierarchical model) | usgroup wrote: | Under the hood Stan attempts to find globally optimal | parameter values for your function which you've expressed | as a joint probability density. 
To do this it relies on | the same MCMC theoretical results which indicate how the | recursive process of sampling and posterior updating | leads to the global optimum. The big deal about Stan is | that its algorithm for doing this is state-of-the-art, | and that it can work with a huge variety of density | functions (including custom ones) by utilising | auto-differentiation. | | Sampling is a slow approach when there are other | alternatives. For example, if you are after OLS | regression, you can do the equivalent with Stan but it | may be an order of magnitude slower than plain OLS. | Further, the calculation of your likelihood function will | scale linearly with the size of the data. But adding new | parameters will scale exponentially, so you may find that | a model with 2 free parameters which takes 10 minutes to | fit takes 2 hours with 3 parameters. | | A good thing about Stan, however, is that it is | parallelisable so you can run it on many cores (and it | will scale linearly for a good while) and you can also | run it on MPI across many machines. Some regression | functions with very large matrices support GPUs (although | Stan requires double precision to work). So to some | extent you can "throw more money at it" to get a result | out and it has been used for very big data problems in | astronomy for example which however utilised something | like 600k cores if memory serves correctly. | standevbob wrote: | Stan supports optimization (L-BFGS) to find (penalized) | maximum likelihood or MAP estimates where they exist. | Bayesian estimates are typically posterior means, which | involve MCMC rather than optimization, and the result is | usually far away from the maximum likelihood estimate in | high dimensions. I wrote a case study with some simple | examples here: | https://mc-stan.org/users/documentation/case-studies/curse-d... | | Adding new parameters scales as O(N^(5/4)) in HMC, whereas | it scales as O(N^2) in Metropolis or Gibbs. 
It's | quadrature that scales exponentially in dimension. | There's also a constant factor for posterior correlation, | which can get nasty. I regularly fit regressions for | epidemiology or genomics or education with 10s or even | 100s of thousands of parameters on my notebook with one | core and no GPU. | | MCMC or optimization can be sub-linear or super-linear in | the data, depending on the statistical properties of the | posterior. Some non-parametric models like Gaussian | processes can be cubic in the data size, whereas | regressions are often sub-linear (doubling the data | doesn't double computation time) because posteriors are | better behaved (more normal in the Gaussian sense) when | there's more data and hence easier to explore in fewer | log density and gradient evaluations. | noelsusman wrote: | It's really good for hierarchical models. I used it this year | to model PPE usage for a large health system. It let me easily | share information across hospitals and embed knowledge of how | different PPE items interact with each other. As always, there | are other ways to accomplish this, but it felt natural in Stan. | wodenokoto wrote: | What kind of problems does Stan / Bayesian inference beat the | much more hyped Tensorflow / deep learning approach? | | Often you hear that deep learning is best at unstructured data | (images, sound and recently raw text) and boosted trees / XG | boost for tabular data. | eggie5 wrote: | Managing uncertainty with Distributions instead of point | estimates | kj98uo wrote: | I am still learning about Bayesian inference so this might be | off-base but isn't the point to compute the full posterior | distribution (or an approximation thereof) of the underlying | parameters. Whether this is done in the context of a linear | model or a deep neural network is a question of tractability. | | The other distinction is between discriminative and generative | models. 
In a discriminative model, the output/label is being | predicted based on the input features: p(y|x, theta). For | example, the probability of an image containing a dog, y based | on pixels, x. Theta here refers to the parameters one needs to | discover. | | In a generative model, one instead models the distribution | p(x|y, beta) i.e. given the label, say dog, predicting the | joint distribution of all the images. | | Neural networks with backpropagation can be used for both | discriminative and generative models. Bayesian methods can be | applied to both discriminative and generative models to compute | the full posterior distribution of the parameters, theta and | beta. | | Edit for clarity: The claim is that the choice of the model vs | the choice of inferential methodology (Bayesian vs max | likelihood for example) are orthogonal choices. | | A neural network doing (discriminative) binary classification | based on cross-entropy is maximizing likelihood instead of | maximizing the posterior. Most Bayesian examples seem to | specify a generative model (a Hidden Markov Model for example) | and then infer the posterior. But there's nothing preventing | one from using Bayesian methods with discriminative models | (generalized linear models) or max likelihood with generative | models. | credit_guy wrote: | Both Bayesian inference and deep learning can do function | fitting, i.e. given a number of observations y and explanatory | variables x, you try to find a function so that y ~ f(x). The | function f can have few parameters (e.g. f(x)= ax+b for linear | regression) or millions of parameters (the usual case for deep | learning). You can try to find the best value for each of these | parameters, or admit that each parameter has some uncertainty | and try to infer a distribution for it. The first approach uses | optimization, and in the last decade, that's done via various | flavors of gradient descent. The second uses Monte Carlo. 
When | you have few parameters, gradient descent is smoking fast. | Above a number of parameters (which is surprisingly small, | let's say about 100), gradient descent fails to converge to the | optimum, but in many cases gets to a place that is "good | enough". Good enough to make the practical applications useful. | In pretty much all cases though, Bayesian inference via MCMC is | painfully slow compared to gradient descent. | | But there is a case where it makes sense: when you have | reasonably few parameters, and you can understand their | meaning. And this is exactly the case of what's called | "statistical models". That's why STAN is called a statistical | modeling language. | | How is that? Gradient descent for these small'ish models is | just MLE (maximum likelihood estimation). People have been | doing MLE for 100 years, and they understand the ins and outs | of MLE. There are some models that are simply unsuited for MLE; | their likelihood function is called "singular"; there are | places where the likelihood becomes infinite despite the fit | being quite poor. One way to fix that is to "regularize" the | problem, i.e. to add some artificial penalty that does not | allow the reward function to become infinite. But this | regularization is often subjective. You never know when the | penalty you add is small enough to not alter the final fit. | Another way is to do Bayesian inference. It's very slow, but | you don't get pulled towards the singular parameters. | celrod wrote: | It's used a lot for things like analyzing clinical trials, e.g. | making futility or early stopping calls in interims, or for | meta analysis. JAGS may still be the most popular, at least in | some companies, but Stan is starting to catch on thanks to its | greater flexibility in most respects. | abeppu wrote: | I like many of the answers to your question. But a refinement | of your question is when do we really have to choose between | Bayesian inference and deep learning? 
Under what conditions | should one pick Stan over Edward or Pyro? | nabla9 wrote: | 1) You have too little data for Deep Learning | | 2) You want to do statistical modelling, not a black box. You | already have a statistical model in mind, you just want to fit | parameters. | | Stan is a probabilistic programming system. You describe the | data-producing mechanism (the model of reality), and the level | and form of approximations used in the estimation. The compiler | generates code for the estimators. | jsinai wrote: | Other comments point out that Bayesian inference is good for | modelling an uncertain outcome, while deep learning is good for | prediction. | | However Bayesian inference is a good choice for prediction when | you have few data points (deep learning is sample-size hungry). | And it is especially good when you have high uncertainty in | your labelled training data (i.e. large variance in the response | variable for given input). Here a Bayesian regression (or even | classification) model wouldn't magically remove the uncertainty | but rather you'd be able to account for the predictive variance | (instead of being none the wiser using just good ole deep | learning). You can then take it from there how you wish to | treat the predictions, given the predictive variance as well. | borroka wrote: | The choice is not between Bayesian methods and Deep Learning, | but between statistical models and machine learning models | (say, from random forest to GBM to xgboost and then maybe | Deep Learning). There is overlap between statistical models | and machine learning models--it is a matter sometimes of | focus--and Bayesian methods can also be applied to what are | typically considered ML approaches (see for example Bayesian | hierarchical random forest). 
| usgroup wrote: | Stan is exceptional if what you need is a hierarchical Bayesian | model, and if what you want is a rigorous way of quantifying the | uncertainty associated with the parameter selections in your | model. | | Stan users are more often R users than Python users and mostly | come from science backgrounds. They often use Stan via a | package called BRMS which stands for "Bayesian Regression | Models using Stan" which should give you some idea of its core | use case. | | You wouldn't use Stan if you weren't trying to model your | problem as a distribution based probabilistic model. | scottfr wrote: | Stan: Predict the values of parameters in a model | | Deep Learning: Predict an outcome variable | | For example, if I want to know what effect household income has | on a student's chance of getting into college, Stan would allow | you to estimate that given a proposed model. | | If instead I wanted to predict a given student's chance of | getting into college, I might use Machine Learning. | | Of course, those two problems are linked, but it's a | fundamental difference of focus. | [deleted] | nightski wrote: | While it is true that Bayesian inference is very powerful in | that it allows one to introspect and view effects of the | model's parameters on the outcome, it is equally as good at | predicting the outcome variable as well. It just depends on | what you want to get out of it. In fact you get more | information about your outcome variable from Bayesian | Inference as it is a distribution. | | I'm not saying it is better than DL by any means, as DL can | scale much better. Just that I don't think it's necessary to | pigeonhole Bayesian inference to just predicting the | parameters. In my opinion the "fundamental difference of | focus" is just a personal decision, not something inherent to | the method. 
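To make "the outcome is a distribution" concrete, here is a small sketch under loudly stated assumptions. It uses a plain random-walk Metropolis sampler, which is not what Stan runs (Stan uses NUTS, an adaptive Hamiltonian Monte Carlo method); it is swapped in only because it fits in a few lines of standard-library Python. The model (normal data with known `sigma = 1` and a wide normal prior on the mean), the data values, and all function names are hypothetical.

```python
import math
import random


def log_posterior(mu, data, sigma=1.0, prior_sd=10.0):
    """Unnormalized log posterior of mu for Normal(mu, sigma) data
    with a Normal(0, prior_sd) prior on mu."""
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_lik = sum(-0.5 * ((y - mu) / sigma) ** 2 for y in data)
    return log_prior + log_lik


def metropolis(data, n_draws=5000, step=0.5, seed=0):
    """Random-walk Metropolis over mu; returns the post-warm-up draws."""
    rng = random.Random(seed)
    mu = 0.0
    lp = log_posterior(mu, data)
    draws = []
    for _ in range(n_draws):
        proposal = mu + rng.gauss(0.0, step)
        lp_prop = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            mu, lp = proposal, lp_prop
        draws.append(mu)
    return draws[n_draws // 5:]  # discard the first 20% as warm-up


data = [0.8, 1.3, 0.4, 1.1, 0.9, 1.6, 0.7, 1.2]
draws = metropolis(data)

# Parameter-level summaries come straight from the draws.
post_mean = sum(draws) / len(draws)
prob_mu_positive = sum(d > 0.1 for d in draws) / len(draws)

# Outcome-level summary: simulate one new observation per posterior draw.
rng = random.Random(1)
predictive = sorted(rng.gauss(mu, 1.0) for mu in draws)
lo = predictive[int(0.025 * len(predictive))]
hi = predictive[int(0.975 * len(predictive))]

print(f"posterior mean of mu: {post_mean:.2f}")
print(f"P(mu > 0.1 | data):   {prob_mu_positive:.3f}")
print(f"95% predictive interval for a new y: ({lo:.2f}, {hi:.2f})")
```

The same set of draws answers both kinds of question: statements about the parameter (the posterior mean, the probability that `mu` exceeds a threshold) and statements about a future outcome (the predictive interval), which is the point made in the comment above.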
| borroka wrote: | The focus of statistical models (including Bayesian models) | is on inference and uncertainty (both for parameter values | and for predictions), the focus of ML models (including DL | models) is on prediction and it is rarely possible to | obtain any quantification of uncertainty. | jgalt212 wrote: | > rarely possible to obtain any quantification of | uncertainty. | | Can't this be estimated via bootstrapping? | peteradio wrote: | I guess Bayesian will tend to be underfit while DL may tend | to overfit. | ogogmad wrote: | I asked essentially the same question as you 5 minutes before | you did. Have an upvote anyway. | | [Edit] I don't understand these downvotes. | ogogmad wrote: | Anonymous passive aggressive downvoting cowards go to hell. | tel wrote: | Bayesian modeling has a somewhat distinct feeling to both | (typical) deep learning algorithms and boosting/bagging | classifiers. | | Most particularly, Bayesian modeling tends to be generative | modeling as opposed to discriminative. This means that you | construct your model by describing a process which generates | your observed data from a set of latent/unknown quantities. | | For instance, we might observe that n[u, d] clicks are observed | on user u on day d for various choices of u and d. We could | build a variety of generative stories here: that n[u, d] is | independent of u and d, just being a random draw from a | Normal(mu, sigma) distribution; that n[u, d] incorporates | another unknown parameter p[u], the user's propensity to click, | and then is a random draw from Normal(mu + b p[u], sigma); or | that we also include season trends sm[d] and ss[d] to both the | mean and spread of n[u, d], saying it's Normal(mu + b p[u] + | sm[d], sigma * ss[d]). | | In these examples, the unknown latents are parameters like mu, | sigma, and b as well as any latent data needed to give shape to | p[-], sm[-], and ss[-]. 
Once we've posited the structure of | this generative model, we'd like to infer what values those | latents might take as informed by the data. | | This is the bread and butter of Stan modeling. It lets you | describe these generative models as a "forward" process where | we sample latents in a simple forward program. Similar to | Tensorflow/etc, Stan extracts from this forward program a DAG | and computes derivatives, but instead of simply maximizing an | objective function through backprop, Stan uses these | derivatives to perform a sampling algorithm over the latents | (mu, sigma, b). | | Ultimately, this gives you a distribution of plausible latent | configurations given the data you've observed. This | _distribution_ is a key point of Bayesian modeling and can | provide a lot of information beyond what the objective- | maximizing value would. As a simple example, it's trivial from | a Bayesian output distribution to make statements like "we're | 95% confident that mu > 0.1". | gbrown wrote: | This question would be super bizarre to anyone coming from a | stats background. | | Others have commented on the role of inference/estimation, and | prediction in small data or non-black-box contexts, so I'll | just add that there are deep theoretical reasons to do Bayesian | inference. It's a framework grounded firmly in decision theory, | and provides a coherent way to reason about the world. You can | prove, under sensible axioms, that beliefs can be described in | terms of probability distributions, and that we should update | beliefs based on Bayes' Rule. | darthdeus wrote: | Stan gives you the ability to do probabilistic reasoning. There | is actually Tensorflow Probability | (https://www.tensorflow.org/probability) which has a lot of | overlapping algorithms, but isn't as mature and approaches some | things differently. 
| | The main difference is that with Stan you think in terms of | random variables and distributions (and their transformations), | while with Tensorflow/DL you think in terms of predicting | directly from data. Stan lets you model a problem with | probabilities and do arbitrary inference, generally asking any | question you want about your model. | | There are many other interesting alternatives, e.g. | http://pyro.ai/ which takes yet another approach merging DL | and probabilistic programming with variational inference. (Stan | and TFP can do variational inference too, but I guess it's like | Python vs JavaScript vs Ruby vs Java - all of them can be used | for programming, but not the same way). | usgroup wrote: | The next cut of Stan will likely use TFP as a backend. I | think that PyMC4 will also. The Stan team wrote everything | from scratch in C++ including their own autodiff code which | many regard as quite a stretch in terms of long term | maintenance. Since TFP executes on top of Tensorflow things | like autodiff and many of the other performance concerns that | take up so much Stan-dev time are already taken care of. | [deleted] | abhgh wrote: | PyMC4 on TFP was the plan, but they made a recent | announcement [1] indicating those efforts would stop, and | instead, they would develop PyMC3+JAX+Theano. | | [1] | https://pymc-devs.medium.com/the-future-of-pymc3-or-theano-i... | diab0lic wrote: | Woah. Thanks for the link, as a PyMC3 user I was not | looking forward to the transition to 4 expecting to have | to relearn the API like the transition from 2 to 3. I was | debating whether I should learn 4 or switch to a different | library when all I really wanted to do was stick with 3. | | Looks like I get the best of both worlds now. | jsinai wrote: | Please no, we don't need Stan to be rebuilt with a Python | backend. That it's built in C++ and can be called with | higher level APIs is part of the appeal. 
| dstick wrote: | My name is Stan and I just finished a feature on the product I'm | working on that used statistical analysis for anomaly detection. | So this headline made me smile - thanks for sharing, and | apologies for a rather pointless comment otherwise ;-) | harry8 wrote: | Named for Stanislaw Ulam. | https://en.wikipedia.org/wiki/Stanislaw_Ulam | | Were you? | mhh__ wrote: | The name of the main developer of Microsoft's C++ library is a | never-ending source of fun | hendzen wrote: | If you want to learn Stan I highly recommend the book Statistical | Rethinking (2nd Ed) by Richard McElreath. It's a pedagogical | masterpiece and light years away the best resource I've found on | learning Bayesian inference. | elsherbini wrote: | Seconded. He has a full course on youtube as well, and a free | version of the textbook that is just missing the last chapter | available on his website (password is in the first or second | lecture on youtube) | | https://www.youtube.com/watch?v=4WVelCswXo4&list=PLDcUM9US4X... | nextos wrote: | Statistical Rethinking is not bad, but I think it's for people | with backgrounds different than CS (or Math). | | Personally, I think https://probmods.org/ is an exceptionally | good introduction to probabilistic programming for someone that | knows CS or just some programming and likes a SICP-like | textbook that goes into the essence of the topic. | | Learning Stan is great, but not as a first probabilistic | programming language, because it's quite limited (it trades | model expressiveness for performance). So you can't represent a | large set of models, such as infinite mixtures, which may | become really relevant in the future developments of deep | learning. It also has poor performance in models that involve | many discrete variables. | stevesimmons wrote: | The Statistical Rethinking book uses R. 
| | For people wanting Python, Jupyter notebooks with Python code | examples are here: | | * | https://github.com/pymc-devs/resources/tree/master/Rethinkin... | glial wrote: | Facebook's (very good) Prophet forecasting library is a wrapper | for Stan models. | | https://facebook.github.io/prophet/ | PLenz wrote: | Stan is one of those technologies I keep finding is actually | powering the more 'friendly' interfaces I run for one-off jobs - | especially in the mcmc world. Every so often I think I'll spend | some time to learn stan proper but it's such an all-encompassing | project that I get intimidated and stick to the derivatives. My | loss! | | Bravo to the team behind it and for making and supporting such a | powerful tool! | deugtniet wrote: | I've dabbled in Stan, and it's really good and state of the art | for Bayesian inference. Getting started with Stan is a bit difficult | though, as it has a C like programming language that is difficult | to master initially. Especially since statistics is usually done | in languages like R, so the learning curve is a bit steep for | beginners. | | I've personally liked PyMC for simple models and relative ease of | inference, as it's more integrated with the Python language. That | being said, if you want the latest in inference methods and | statistical alchemy, Stan is the place to go. | phillc73 wrote: | There is a good range of other programming language interfaces | to Stan. The R one is quite popular.[1] | | You do still need the C++ toolchain, but can just write your | code in R. | | [1] https://mc-stan.org/rstan/ | standevbob wrote: | Stan requires models to be coded in the Stan language, which | is a simple imperative language that's like MATLAB with | explicit data types. This is the same as was done in Stan's | predecessors, BUGS and JAGS. | | A Stan program can be run in any of our interfaces in Python, | Julia, R, MATLAB, Stata, etc. But you can't mix any of those | languages into a Stan program. 
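As a rough illustration of that separation, the sketch below keeps a small Stan program (a simple linear regression in the style of the Stan User's Guide) as plain text inside a Python script and writes it, together with a JSON data file, to a temporary directory. The file names and data values are made up. Nothing here compiles or runs the model: that step needs an installed interface such as CmdStanPy plus the C++ toolchain, so it is only mentioned in a comment.

```python
import json
import tempfile
from pathlib import Path

# The model is written in the Stan language, not in Python.
# The host language only handles the program text and the data.
STAN_PROGRAM = """
data {
  int<lower=0> N;        // number of observations
  vector[N] x;           // predictor
  vector[N] y;           // outcome
}
parameters {
  real alpha;            // intercept
  real beta;             // slope
  real<lower=0> sigma;   // residual standard deviation
}
model {
  y ~ normal(alpha + beta * x, sigma);
}
"""

# Hypothetical data matching the declarations in the data block.
data = {"N": 3, "x": [1.0, 2.0, 3.0], "y": [2.1, 3.9, 6.2]}

workdir = Path(tempfile.mkdtemp())
stan_file = workdir / "linreg.stan"
data_file = workdir / "linreg.data.json"
stan_file.write_text(STAN_PROGRAM)
data_file.write_text(json.dumps(data))

# With CmdStanPy installed, the next step would be along the lines of
#   cmdstanpy.CmdStanModel(stan_file=str(stan_file)).sample(data=str(data_file))
# which is deliberately not executed in this sketch.
print(f"wrote {stan_file.name} and {data_file.name} to {workdir}")
```

The same two files could be handed unchanged to the R, Julia, or command-line interfaces, since none of the host language leaks into the model itself.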
| | The C++ toolchain is required because Stan transpiles its | programs to C++, then compiles those against the Stan math | library, which does autodiff. But you don't need to write | any C++ to use Stan, just to develop extensions for it. | melling wrote: | There's a Julia Stan too: | | http://stanjulia.github.io/Stan.jl/stable/INTRO.html | | https://astrostatistics.psu.edu/su14/lectures/BayesComp2014L... | mushufasa wrote: | Does anyone know of a good article comparing Stan and PyMC3 | for real-world Bayesian modelling tasks? E.g. to be used | in a production system. ___________________________________________________________________ (page generated 2020-12-23 23:01 UTC)