[HN Gopher] Stan is a state-of-the-art platform for statistical ...
       ___________________________________________________________________
        
       Stan is a state-of-the-art platform for statistical modeling
        
       Author : Tomte
       Score  : 170 points
       Date   : 2020-12-23 10:19 UTC (12 hours ago)
        
 (HTM) web link (mc-stan.org)
 (TXT) w3m dump (mc-stan.org)
        
       | pvitz wrote:
       | Looking at the user's guide, it seems that Stan also has use
       | cases other than Bayesian inference. Examples are linear
       | regression, mixture models and even ODEs. Does anybody here have
       | experience with Stan and R and could comment on the strengths of
       | Stan in non-Bayesian contexts?
        
         | elsherbini wrote:
         | Stan uses MCMC (specifically NUTS, which is a Hamiltonian Monte
         | Carlo sampler) to fit model parameters, so it can be used
         | for things like ODEs.
         | 
         | Here is an example from a class taught last January that uses
         | Stan to fit a simple ODE (using the `integrate_ode_rk45`
         | function in Stan):
         | 
         | https://github.com/gregbritten/BayesianEcosystems_IAP/blob/m...
        
         | standevbob wrote:
         | Stan provides both frequentist inference (penalized maximum
         | likelihood with bootstrapped confidence intervals) and Bayesian
         | inference (MCMC sampling or approximate variational inference).
         | 
         | As currymj says, the differential equations (same for all the
         | linear algebra solvers like eigendecomposition) can be used in
         | defining likelihoods for either Bayesian or frequentist
         | estimation. Same for all of our linear algebra operations and
         | special functions.
         | 
         | Not every model that can be programmed in Stan has a well-
         | defined MLE or proper posterior. Standard
         | hierarchical/multilevel models don't have MLEs, even with
         | standard shrinkage. Bayesian models with improper priors and no
         | data wind up with improper posteriors, etc.
         | 
         | Having said all that, almost all of the use of Stan is for
         | Bayesian inference.
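A minimal Python sketch of why a standard hierarchical model has no MLE. It assumes a hypothetical normal model with one made-up observation per group, known observation noise, group effects theta_j ~ Normal(mu, tau), and the joint density in (theta, mu, tau) as the objective. If every theta_j is collapsed onto mu and tau is shrunk, the joint log density grows without bound:

```python
import math

def normal_logpdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

y = [1.2, -0.3, 0.8, 2.1, 0.4]  # hypothetical: one observation per group
SIGMA_OBS = 1.0                  # observation noise, assumed known

def joint_log_density(mu, tau, theta):
    """log p(y, theta | mu, tau) with theta_j ~ Normal(mu, tau), y_j ~ Normal(theta_j, SIGMA_OBS)."""
    lp = 0.0
    for y_j, theta_j in zip(y, theta):
        lp += normal_logpdf(theta_j, mu, tau)         # group-level term
        lp += normal_logpdf(y_j, theta_j, SIGMA_OBS)  # data term
    return lp

mu = sum(y) / len(y)
theta = [mu] * len(y)  # collapse every group effect onto the population mean
taus = [1.0, 1e-2, 1e-4, 1e-6]
values = [joint_log_density(mu, t, theta) for t in taus]
for t, v in zip(taus, values):
    print(f"tau={t:g}  joint log density={v:.2f}")
```

Each group adds a -log(tau) term, so there is no finite maximum to report.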
        
         | nlpNick wrote:
         | You may find the `rstanarm` package interesting/useful. I've
         | used it for linear regression and HLMs.
         | https://mc-stan.org/rstanarm/articles/index.html
        
         | currymj wrote:
         | I believe the main reason for including ODE solvers is
         | basically to do Bayesian parameter estimation of ODEs from
         | data.
         | 
         | Likewise, as far as I know, linear regression and mixture
         | models are both done in a Bayesian style (a hierarchical model
         | giving priors for parameters).
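The following is a small Python sketch of Bayesian parameter estimation for an ODE. It is not Stan code. It fits the decay rate k in dy/dt = -k*y to synthetic noisy data. The likelihood calls a hand-written fixed-step RK4 solver (Stan's `integrate_ode_rk45` is an adaptive solver), and the sampler is random-walk Metropolis rather than NUTS. The true rate, noise level, half-normal prior and step sizes are all illustrative assumptions:

```python
import math
import random

def solve_decay(k, y0, t_obs, h=0.05):
    """Integrate dy/dt = -k * y with fixed-step RK4 and return y at each time in t_obs."""
    ys, t, y = [], 0.0, y0
    for t_target in t_obs:
        while t_target - t > 1e-9:
            step = min(h, t_target - t)
            k1 = -k * y
            k2 = -k * (y + 0.5 * step * k1)
            k3 = -k * (y + 0.5 * step * k2)
            k4 = -k * (y + step * k3)
            y += step / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
            t += step
        ys.append(y)
    return ys

random.seed(1)
TRUE_K, Y0, NOISE_SD = 0.7, 10.0, 0.3  # hypothetical ground truth
t_obs = [0.5 * i for i in range(1, 11)]
data = [y + random.gauss(0.0, NOISE_SD) for y in solve_decay(TRUE_K, Y0, t_obs)]

def log_posterior(k):
    if k <= 0:
        return -math.inf  # decay rate must be positive
    log_prior = -0.5 * (k / 2.0) ** 2  # half-normal(0, 2) prior, up to a constant
    pred = solve_decay(k, Y0, t_obs)
    log_lik = sum(-0.5 * ((d - p) / NOISE_SD) ** 2 for d, p in zip(data, pred))
    return log_prior + log_lik

# Random-walk Metropolis over the single parameter k; first 500 iterations discarded.
k_cur, lp_cur, draws = 1.0, log_posterior(1.0), []
for it in range(3000):
    k_prop = k_cur + random.gauss(0.0, 0.05)
    lp_prop = log_posterior(k_prop)
    if random.random() < math.exp(min(0.0, lp_prop - lp_cur)):
        k_cur, lp_cur = k_prop, lp_prop
    if it >= 500:
        draws.append(k_cur)

post_mean = sum(draws) / len(draws)
print(f"posterior mean of k: {post_mean:.3f}")
```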
        
         | ChrisRackauckas wrote:
         | DiffEqBayes.jl can transpile Julia ODE code to Stan. This is a
         | nice interface to use Stan directly from Julia, and also makes
         | it easy to benchmark the ODE inference in a bunch of PPLs. Some
         | benchmarks:
         | 
         | https://benchmarks.sciml.ai/html/ParameterEstimation/DiffEqB...
         | 
         | https://benchmarks.sciml.ai/html/ParameterEstimation/DiffEqB...
        
         | Crye wrote:
         | I don't know what your experience with Bayesian modeling is,
         | and I'll admit mine is limited, but Stan can solve linear
         | regression by defining a linear model and then giving the
         | coefficient parameters normal prior distributions. This is
         | great because it gives you a measure of uncertainty for each of
         | your parameters.
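In the simplest version of this setup, the posterior has a closed form. This minimal sketch makes several assumptions: a no-intercept model y = b*x + noise, a known noise standard deviation, a Normal(0, prior_sd) prior on b, and made-up data. Stan would get the same answer by sampling, but the conjugate case shows where the per-parameter uncertainty comes from:

```python
import math

def slope_posterior(xs, ys, noise_sd, prior_sd):
    """Posterior mean and sd of b in y = b*x + eps, eps ~ Normal(0, noise_sd), prior b ~ Normal(0, prior_sd)."""
    precision = 1.0 / prior_sd**2 + sum(x * x for x in xs) / noise_sd**2
    mean = (sum(x * y for x, y in zip(xs, ys)) / noise_sd**2) / precision
    return mean, math.sqrt(1.0 / precision)

xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical inputs
ys = [2.1, 3.9, 6.2, 7.8]  # hypothetical outputs
mean, sd = slope_posterior(xs, ys, noise_sd=0.5, prior_sd=10.0)
print(f"slope = {mean:.3f} +/- {sd:.3f}")
```

With a wide prior, the posterior mean is essentially the least-squares slope (1.99 here). The posterior standard deviation is the uncertainty measure.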
        
       | elsherbini wrote:
       | Does anyone know what a practical upper limit is for Stan in
       | terms of size of data set / number of parameters to fit? Can you
       | use stan to fit a model with ~10^4 parameters and ~10^6 rows of
       | data if you had access to ~10^3 cores? How long would it take?
        
         | standevbob wrote:
         | Yes, we regularly use Stan's MCMC to fit relatively simple
         | time-series regression models or item-response theory type
         | models with 10^5 parameters and 10^6 rows of data on a desktop
         | computer. It can take a day, though. It's much faster with
         | variational inference, but that can be less stable and it
         | doesn't give you the same uncertainty quantification because of
         | the way the KL-divergence is ordered in the objective.
         | 
         | Stan can parallelize multiple chains and it can parallelize the
         | density/gradient calculations in a single chain. But for the
         | latter to be efficient, the chunks being parallelized need to
         | be compute intensive, like you might get in a pharmacometric
         | compartment model where you might have to solve a bunch of
         | differential equations for each of thousands of patients in a
         | clinical trial.
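Within-chain parallelism is essentially a map-reduce over the log density. The likelihood is split into chunks, each chunk is evaluated independently, and the partial results are summed. Stan exposes this as `reduce_sum`. The Python sketch below only mirrors that structure, and its function names are hypothetical. Because CPython threads share the GIL, it shows the decomposition rather than a real speedup:

```python
import math
from concurrent.futures import ThreadPoolExecutor

def chunk_loglik(chunk, mu, sigma):
    """Log likelihood of one chunk of observations under Normal(mu, sigma)."""
    const = -0.5 * math.log(2 * math.pi) - math.log(sigma)
    return sum(const - 0.5 * ((x - mu) / sigma) ** 2 for x in chunk)

def chunked_loglik(data, mu, sigma, n_chunks=4):
    """Map: evaluate each chunk independently. Reduce: sum the partial results."""
    size = math.ceil(len(data) / n_chunks)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(chunk_loglik, chunks,
                                 [mu] * len(chunks), [sigma] * len(chunks)))
    return sum(partials)

data = [0.1 * i for i in range(1000)]  # hypothetical observations
serial = chunk_loglik(data, 50.0, 30.0)
parallel = chunked_loglik(data, 50.0, 30.0)
print(serial, parallel)
```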
        
         | gbrown wrote:
         | I imagine it depends on which algorithm you're using, maybe
         | with their VB functionality (I almost entirely use the
         | souped-up NUTS algorithm for full Bayesian inference).
         | 
         | It also likely depends on how well conditioned your model is -
         | even if you can get it to run for huge models on reasonable
         | hardware, convergence may not be practical.
        
         | hendzen wrote:
         | Stan used to only be able to parallelize across chains but they
         | introduced within-chain parallelism this year. Even then some
         | of the work is still serial so I don't think you can expect
         | linear speedups past a certain point.
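The limit on speedup from the remaining serial work is Amdahl's law. This short sketch assumes, hypothetically, that 90% of each gradient evaluation parallelizes perfectly and the rest is serial:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only parallel_fraction of the work can be split across cores."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

for cores in (1, 4, 16, 64, 1000):
    print(cores, round(amdahl_speedup(0.9, cores), 2))
```

Even with 1000 cores, the speedup stays below 1 / 0.1 = 10.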
        
       | mark_l_watson wrote:
       | I will give PyStan a try over the holidays. Also, many thanks for
       | all of the great comments here (statistical modeling is not in my
       | toolbox yet, and the discussion here is grounding).
        
       | ogogmad wrote:
       | Obligatory question: What applications does Stan have? I'm aware
       | of Facebook's Prophet model for time series prediction, but what
       | about others?
       | 
       | I find there's a lot of excitement around Bayesian inference and
       | MCMC, but I do wonder about the substance.
        
         | usgroup wrote:
         | Bayesian modellers are typically scientists and Stan probably
         | has more indirect than direct users. For example, rstanarm and
         | BRMS are both R regression packages built on Stan, and both are
         | wildly popular in Bayesian circles. They enable hierarchical
         | Bayesian modelling which can be used to perform a very flexible
         | kind of regression which allows the integration of lots of
         | prior information, and better quantification of uncertainties
         | than previous alternatives.
        
         | j7ake wrote:
         | Bayesian methods are great if you want to squeeze all the
         | information you can out of each of your data points, and also
         | to inject specific prior information to help prevent
         | overfitting.
         | 
         | These methods are ideal for small datasets with correlation
         | structures, where observations aren't necessarily independent.
         | 
         | Also great if you want uncertainty with your estimates.
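Partial pooling shows both points (extracting more information per data point and guarding against overfitting) most directly. This minimal sketch assumes a normal hierarchical model whose hyperparameters mu, sigma and tau are treated as known; in Stan they would be estimated too. The group summaries are made up. Each group's estimate is a precision-weighted average of its own mean and the population mean, so small groups are pulled toward mu far more than large ones:

```python
def partial_pool(group_means, group_sizes, sigma, mu, tau):
    """Posterior mean of each theta_j when y_ij ~ Normal(theta_j, sigma), theta_j ~ Normal(mu, tau),
    with mu, sigma and tau treated as known."""
    pooled = []
    for ybar, n in zip(group_means, group_sizes):
        data_precision = n / sigma**2
        prior_precision = 1.0 / tau**2
        pooled.append((data_precision * ybar + prior_precision * mu)
                      / (data_precision + prior_precision))
    return pooled

means = [0.9, 0.4, 0.5]  # hypothetical group means; the first group is small and extreme
sizes = [3, 200, 150]
estimates = partial_pool(means, sizes, sigma=1.0, mu=0.45, tau=0.1)
print([round(e, 3) for e in estimates])
```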
        
           | standevbob wrote:
           | This is the main reason that people use Stan---squeezing as
           | much info out of your data as possible. That and the ability
           | to write custom models for these situations.
           | 
           | There are hundreds of different applications of Stan across
           | the physical, biological, and social sciences, as well as in
           | finance, education, sports analytics, actuarial sciences,
           | transportation planning, all sorts of material and chemical
           | and civil engineering, clinical trials and pharmacometrics,
           | etc. etc. It's most popular in fields like ecology and
           | epidemiology where Bayesian methods are already popular. For
           | instance, many of the Covid models (like the one for NY
           | state) are being built with Stan. All four baseball teams in
           | the semifinals (LCS) use Stan for analytics, for example.
           | Google and Facebook use Stan for ad attribution and resource
           | allocation. It's been used for models of neutrino mass and
           | models of galactic mass, models of supernovas, and it's even
           | used in the LIGO gravitational wave experiments.
        
         | stdbrouw wrote:
         | Any kind of statistical modeling that doesn't fit neatly into
         | an existing "one (meta)model to rule them all" framework such
         | as generalized linear models.
        
           | usgroup wrote:
           | I don't think that's accurate. Sampling-based approaches
           | scale badly with data so, although there are a few
           | exceptions, if you're tackling the problem as a hierarchical
           | Bayesian model - which is most often what Stan is used for -
           | you're working with a dataset with a small number of features
           | and fewer than 10k rows.
           | 
           | Stan fits can be parallelized, and some models will scale
           | linearly with the data, but in the main you won't find many
           | big data use cases here.
           | 
           | You also can't use Stan for online learning.
        
             | harperlee wrote:
             | > You also can't use Stan for online learning.
             | 
             | Can't you loop posteriors back as the next iteration's
             | priors to get a system that learns online?
        
               | usgroup wrote:
               | Stan will get you from a prior to a posterior
               | distribution that is best supported by your data, but
               | typically the posterior and prior distributions will not
               | be of the same form, so there's no loop back to make.
               | 
               | In the case that your model is simple enough that the
               | prior is "conjugate" to the likelihood (i.e. the
               | posterior and prior are in the same family of distributions),
               | this sort of loop back is possible. But where this is
               | possible you have no reason at all to use Stan or MCMC
               | since you can just update your posterior directly.
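Here is the conjugate loop-back as a sketch, using the Beta-Binomial pair with hypothetical batch counts. Updating batch by batch, with each posterior reused as the next prior, gives exactly the same Beta posterior as one update on all the data:

```python
def beta_update(alpha, beta, successes, failures):
    """Conjugate update: Beta(alpha, beta) prior + binomial data -> Beta posterior."""
    return alpha + successes, beta + failures

batches = [(3, 7), (5, 5), (9, 1)]  # hypothetical (successes, failures) arriving over time

a, b = 1, 1  # uniform Beta(1, 1) prior
for s, f in batches:
    a, b = beta_update(a, b, s, f)  # yesterday's posterior becomes today's prior

total_s = sum(s for s, _ in batches)
total_f = sum(f for _, f in batches)
batch_a, batch_b = beta_update(1, 1, total_s, total_f)
print((a, b), (batch_a, batch_b))
```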
        
               | harperlee wrote:
               | "Typically" depends on your problem, right? So if I know
               | that I want to feedback predictions into the next
               | iteration, I need to take care to structure a model that
               | enables that, by having posteriors with same shape as
               | priors. But from what I understand it seems a design
               | consideration, not a fundamental limitation.
        
               | ploika wrote:
               | You can, but it's slow and computationally intensive -
               | you still need to be fairly sure that the sampler has
               | converged on the true posterior (I might be a bit off
               | with the terminology there but you know what I mean)
               | before that can become your new prior.
        
             | stdbrouw wrote:
             | Fair enough. I read the word "application" as "field of
             | inquiry" where I do think the sky is the limit, but it's
             | true that Stan is primarily geared towards scientific work
             | with small data sets.
        
             | standevbob wrote:
             | That's right---Stan doesn't have any online learning
             | facilities. It's very hard to approximate posteriors and
             | chain them, so we don't try.
             | 
             | If by "big data", we're talking about too big to fit in
             | memory, that's right. Stan's fully in-memory. Compute can
             | be distributed and GPU-powered for matrix ops, but all of
             | the data and parameters and the core autodiff expression
             | graph need to fit in memory.
             | 
             | For "medium data", Stan's adaptive Hamiltonian Monte Carlo
             | sampling is much more efficient and scalable to complex
             | models and higher dimensions than Gibbs or Metropolis. I'm
             | fitting a Covid prevalence model using a custom trend-
             | following and mean-reverting second-order autoregression
             | model over 400 distinct regions with weekly data that has
             | 5M data points and 10K parameters and adjusts for
             | sensitivity and specificity of various tests taken. It fits
             | in a single thread using MCMC in 24 hours or so, but we can
             | fit the model with variational inference in a couple
             | minutes. Although variational inference often produces
             | reasonable point estimates in bigger data settings, it
             | doesn't reasonably quantify uncertainty. I'm also working
             | on a genomics model for differential expression of splice
             | variants that involves 120K measurements and just as many
             | parameters to deal with overdispersion of biological
             | replicates in a control and treatment group. We're using
             | variational inference and it fits in a couple minutes for
             | the comparative event probabilities we need to estimate.
        
             | RA_Fisher wrote:
             | Check out Stan's variational inference algorithms. They're
             | relatively fast (compared to MCMC) at the cost of being
             | approximate.
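The uncertainty problem with variational inference is visible in a textbook case. Take a Gaussian target and a fully factorized (mean-field) Gaussian approximation that minimizes KL(q || p). The optimal approximation's variances are the reciprocals of the diagonal of the target's precision matrix, and these understate the true marginal variances when parameters are correlated. This 2x2 sketch assumes a correlation of 0.9. It illustrates the general mean-field result and does not reproduce Stan's ADVI implementation:

```python
def mean_field_variances(cov):
    """Variances of the mean-field Gaussian q minimizing KL(q || p) for p = Normal(0, cov), 2x2 case.
    The optimum sets each variance to 1 / (diagonal entry of the precision matrix)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    precision_diag = (d / det, a / det)  # diagonal of the inverse covariance
    return tuple(1.0 / p for p in precision_diag)

rho = 0.9
true_vars = (1.0, 1.0)
vi_vars = mean_field_variances([[1.0, rho], [rho, 1.0]])
print(true_vars, vi_vars)
```

With rho = 0.9, the approximation reports a variance of 0.19 for each coordinate, against a true marginal variance of 1.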
        
             | elsherbini wrote:
             | This was useful. Do you know how painful it would be to use
             | Stan with 100k rows, or even 1m? (For a sorta normal
             | hierarchical model)
        
               | usgroup wrote:
               | Under the hood Stan attempts to find globally optimal
               | parameter values for your function which you've expressed
               | as a joint probability density. To do this it relies on
               | the same MCMC theoretical results which indicate how the
               | recursive process of sampling and posterior updating
               | leads to the global optimum. The big deal about Stan is
               | that its algorithm for doing this is state-of-the-art,
               | and that it can work with a huge variety (including
               | custom) density functions by utilising auto-
               | differentiation.
               | 
               | Sampling is a slow approach when there are other
               | alternatives. For example, if you are after OLS
               | regression, you can do the equivalent with Stan but it
               | may be an order of magnitude slower than plain OLS.
               | Further, the calculation of your likelihood function will
               | scale linearly with the size of the data. But adding new
               | parameters will scale exponentially, so you may find that
               | a model with 2 free parameters which takes 10 minutes to
               | fit takes 2 hours with 3 parameters.
               | 
               | A good thing about Stan however, is that it is
               | parallelisable so you can run it on many cores (and it
               | will scale linearly for a good while) and you can also
               | run it on MPI across many machines. Some regression
               | functions with very large matrices support GPUs (although
               | Stan requires double precision to work). So to some
               | extent you can "throw more money at it" to get a result,
               | and it has been used for very big data problems in
               | astronomy, for example, though that work used something
               | like 600k cores, if memory serves.
        
               | standevbob wrote:
               | Stan supports optimization (L-BFGS) to find (penalized)
               | maximum likelihood or MAP estimates where they exist.
               | Bayesian estimates are typically posterior means, which
               | involve MCMC rather than optimization, and the result is
               | usually far away from the maximum likelihood estimate in
               | high dimensions. I wrote a case study with some simple
               | examples here:
               | https://mc-stan.org/users/documentation/case-studies/curse-d...
               | 
               | Adding new parameters scales as O(N^(5/4)) in HMC, whereas
               | it scales as O(N^2) in Metropolis or Gibbs. It's
               | quadrature that scales exponentially in dimension.
               | There's also a constant factor for posterior correlation,
               | which can get nasty. I regularly fit regressions for
               | epidemiology or genomics or education with 10s or even
               | 100s of thousands of parameters on my notebook with one
               | core and no GPU.
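Some rough arithmetic on those rates follows, ignoring constants (including the posterior-correlation factor mentioned above). The sketch shows how much more work per effective sample a larger dimension costs if cost grows like d**(5/4) for HMC versus d**2 for random-walk Metropolis or Gibbs:

```python
def cost_ratio(exponent, dim_factor):
    """Relative work per effective sample when dimension grows by dim_factor, if cost ~ d**exponent."""
    return dim_factor ** exponent

for name, exponent in [("HMC, d^(5/4)", 1.25), ("Metropolis/Gibbs, d^2", 2.0)]:
    print(name, [round(cost_ratio(exponent, f), 1) for f in (2, 10, 100)])
```

Multiplying the dimension by 100 costs roughly 316 times more under the HMC rate, versus 10,000 times under the d**2 rate.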
               | 
               | MCMC or optimization can be sub-linear or super-linear in
               | the data, depending on the statistical properties of the
               | posterior. Some non-parametric models like Gaussian
               | processes can be cubic in the data size, whereas
               | regressions are often sub-linear (doubling the data
               | doesn't double computation time) because posteriors are
               | better behaved (more normal in the Gaussian sense) when
               | there's more data and hence easier to explore in fewer
               | log density and gradient evaluations.
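For Gaussian processes, the cubic cost comes from factoring the n x n covariance matrix, typically with a Cholesky decomposition. This sketch counts the inner-loop multiplications of a plain Cholesky factorization. The matrix is an arbitrary symmetric positive definite stand-in, not a real GP kernel:

```python
import math

def cholesky_with_count(A):
    """Cholesky-Banachiewicz factorization A = L L^T; also counts inner-loop multiplications."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    mults = 0
    for i in range(n):
        for j in range(i + 1):
            s = 0.0
            for k in range(j):
                s += L[i][k] * L[j][k]
                mults += 1
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L, mults

def spd_matrix(n):
    """Stand-in covariance: 1 everywhere plus 1 on the diagonal (symmetric positive definite)."""
    return [[1.0 + (i == j) for j in range(n)] for i in range(n)]

_, m40 = cholesky_with_count(spd_matrix(40))
_, m80 = cholesky_with_count(spd_matrix(80))
print(m40, m80, m80 / m40)
```

Doubling n from 40 to 80 multiplies the count by about 8, which is the n**3 scaling.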
        
         | noelsusman wrote:
         | It's really good for hierarchical models. I used it this year
         | to model PPE usage for a large health system. It let me easily
         | share information across hospitals and embed knowledge of how
         | different PPE items interact with each other. As always, there
         | are other ways to accomplish this, but it felt natural in Stan.
        
       | wodenokoto wrote:
       | For what kinds of problems does Stan / Bayesian inference beat
       | the much more hyped TensorFlow / deep learning approach?
       | 
       | Often you hear that deep learning is best at unstructured data
       | (images, sound and recently raw text) and boosted trees /
       | XGBoost for tabular data.
        
         | eggie5 wrote:
         | Managing uncertainty with distributions instead of point
         | estimates
        
         | kj98uo wrote:
         | I am still learning about Bayesian inference so this might be
         | off-base, but isn't the point to compute the full posterior
         | distribution (or an approximation thereof) of the underlying
         | parameters? Whether this is done in the context of a linear
         | model or a deep neural network is a question of tractability.
         | 
         | The other distinction is between discriminative and generative
         | models. In a discriminative model, the output/label is being
         | predicted based on the input features: p(y|x, theta). For
         | example, the probability of an image containing a dog, y based
         | on pixels, x. Theta here refers to the parameters one needs to
         | discover.
         | 
         | In a generative model, one instead models the distribution
         | p(x|y, beta), i.e. given the label, say dog, predicting the
         | joint distribution of the image pixels.
         | 
         | Neural networks with backpropagation can be used for both
         | discriminative and generative models. Bayesian methods can be
         | applied to both discriminative and generative models to compute
         | the full posterior distribution of the parameters, theta and
         | beta.
         | 
         | Edit for clarity: The claim is that the choice of the model vs
         | the choice of inferential methodology (Bayesian vs max
         | likelihood for example) are orthogonal choices.
         | 
         | A neural network doing (discriminative) binary classification
         | based on cross-entropy is maximizing likelihood instead of
         | maximizing the posterior. Most Bayesian examples seem to
         | specify a generative model (a Hidden Markov Model for example)
         | and then infer the posterior. But there's nothing preventing
         | one from using Bayesian methods with discriminative models
         | (generalized linear models) or max likelihood with generative
         | models.
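This sketch illustrates the orthogonality point by fitting the same discriminative model two ways: by maximum likelihood and by Bayesian inference. The model is a no-intercept logistic regression, p(y=1|x) = sigmoid(b*x). The data and the Normal(0, 1) prior are made up. The posterior is computed on a grid, which only works for one or two parameters; Stan would sample instead:

```python
import math

xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]  # hypothetical inputs
ys = [0, 0, 1, 0, 1, 1]                 # hypothetical binary labels (not separable)

def log_lik(b):
    """Bernoulli log likelihood of p(y=1|x) = sigmoid(b*x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = b * x
        # log sigmoid(z) = -log(1 + exp(-z)); log(1 - sigmoid(z)) = -log(1 + exp(z))
        total += -math.log1p(math.exp(-z)) if y == 1 else -math.log1p(math.exp(z))
    return total

grid = [-5.0 + 0.001 * i for i in range(10001)]
ll = [log_lik(b) for b in grid]
log_post = [l - 0.5 * b * b for l, b in zip(ll, grid)]  # Normal(0, 1) prior on b

b_mle = grid[ll.index(max(ll))]
b_map = grid[log_post.index(max(log_post))]

# Normalize the grid posterior to get the full distribution, not just its mode.
m = max(log_post)
weights = [math.exp(lp - m) for lp in log_post]
z_sum = sum(weights)
post_mean = sum(w * b for w, b in zip(weights, grid)) / z_sum
post_sd = math.sqrt(sum(w * (b - post_mean) ** 2 for w, b in zip(weights, grid)) / z_sum)
print(f"MLE {b_mle:.3f}, MAP {b_map:.3f}, posterior mean {post_mean:.3f} +/- {post_sd:.3f}")
```

The prior pulls the MAP estimate toward zero relative to the MLE. The normalized grid also gives a posterior mean and standard deviation, which a point estimate alone cannot provide.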
        
         | credit_guy wrote:
         | Both Bayesian inference and deep learning can do function
         | fitting, i.e. given a number of observations y and explanatory
         | variables x, you try to find a function so that y ~ f(x). The
         | function f can have few parameters (e.g. f(x)= ax+b for linear
         | regression) or millions of parameters (the usual case for deep
         | learning). You can try to find the best value for each of these
         | parameters, or admit that each parameter has some uncertainty
         | and try to infer a distribution for it. The first approach uses
         | optimization, and in the last decade, that's done via various
         | flavors of gradient descent. The second uses Monte Carlo. When
         | you have few parameters, gradient descent is smoking fast.
         | Above a number of parameters (which is surprisingly small,
         | let's say about 100), gradient descent fails to converge to the
         | optimum, but in many cases gets to a place that is "good
         | enough". Good enough to make the practical applications useful.
         | In pretty much all cases though, Bayesian inference via MCMC is
         | painfully slow compared to gradient descent.
         | 
         | But there is a case where it makes sense: when you have
         | reasonably few parameters, and you can understand their
         | meaning. And this is exactly the case of what's called
         | "statistical models". That's why STAN is called a statistical
         | modeling language.
         | 
         | How is that? Gradient descent for these smallish models is
         | just MLE (maximum likelihood estimation). People have been
         | doing MLE for 100 years, and they understand the ins and outs
         | of MLE. There are some models that are simply unsuited for MLE;
         | their likelihood function is called "singular"; there are
         | places where the likelihood becomes infinite despite the fit
         | being quite poor. One way to fix that is to "regularize" the
         | problem, i.e. to add some artificial penalty that does not
         | allow the reward function to become infinite. But this
         | regularization is often subjective. You never know when the
         | penalty you add is small enough to not alter the final fit.
         | Another way is to do Bayesian inference. It's very slow, but
         | you don't get pulled towards the singular parameters.
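This sketch shows the singular-likelihood problem in a hypothetical two-component normal mixture. The first component is fixed at Normal(0.5, 1), and the second component's mean is pinned to one data point. As that component's standard deviation sigma2 shrinks, the log likelihood grows without bound even though the fit is poor. Adding a Gamma(2, 1) log prior on sigma2 cancels the blow-up; this is one illustrative zero-avoiding choice, playing the role of the penalty:

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

data = [-1.3, -0.2, 0.4, 0.9, 1.7, 2.2]  # hypothetical observations

def mixture_loglik(sigma2):
    """Log likelihood of 0.5*Normal(0.5, 1) + 0.5*Normal(data[0], sigma2)."""
    return sum(math.log(0.5 * normal_pdf(x, 0.5, 1.0) + 0.5 * normal_pdf(x, data[0], sigma2))
               for x in data)

def penalized(sigma2):
    # Gamma(2, 1) log prior on sigma2, up to a constant: log(sigma2) - sigma2.
    return mixture_loglik(sigma2) + math.log(sigma2) - sigma2

for s in (1e-1, 1e-3, 1e-6, 1e-9):
    print(f"sigma2={s:g}  loglik={mixture_loglik(s):8.2f}  penalized={penalized(s):8.2f}")
```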
        
         | celrod wrote:
         | It's used a lot for things like analyzing clinical trials, e.g.
         | making futility or early-stopping calls at interim analyses, or
         | for meta-analysis. JAGS may still be the most popular, at least
         | in some companies, but Stan is starting to catch on thanks to
         | its greater flexibility in most respects.
        
         | abeppu wrote:
         | I like many of the answers to your question. But a refinement
         | of your question is when do we really have to choose between
         | Bayesian inference and deep learning? Under what conditions
         | should one pick Stan over Edward or Pyro?
        
         | nabla9 wrote:
         | 1) You have too little data for Deep Learning
         | 
         | 2) You want to do statistical modelling, not a black box. You
         | already have a statistical model in mind, you just want to fit
         | parameters.
         | 
         | Stan is a probabilistic programming system. You describe the
         | data-producing mechanism (the model of reality), and the level
         | and form of approximations used in the estimation. The compiler
         | generates code for the estimators.
        
         | jsinai wrote:
         | Other comments point to Bayesian inference being good for
         | modelling an uncertain outcome, while deep learning is good for
         | prediction.
         | 
         | However Bayesian inference is a good choice for prediction when
         | you have few data points (deep learning is sample-size hungry).
         | And it is especially good when you have high uncertainty in
         | your labelled training data (ie large variance in the response
         | variable for given input). Here a Bayesian regression (or even
         | classification) model wouldn't magically remove the uncertainty
         | but rather you'd be able to account for the predictive variance
         | (instead of being none the wiser using just good ole deep
         | learning). You can then take it from there how you wish to
         | treat the predictions, given the predictive variance as well.
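This sketch computes the predictive variance described above. It assumes a no-intercept linear model with known noise and a Normal prior on the slope, and all values are hypothetical. The predictive variance is the irreducible noise variance plus the parameter uncertainty scaled by x_new**2, so it grows with |x_new|:

```python
def predictive_variance(x_new, xs, noise_sd, prior_sd):
    """Posterior predictive variance of y at x_new for y = b*x + eps with a Normal prior on b.
    Var = noise variance (irreducible) + x_new**2 * posterior variance of b (parameter uncertainty)."""
    post_precision = 1.0 / prior_sd**2 + sum(x * x for x in xs) / noise_sd**2
    return noise_sd**2 + x_new**2 / post_precision

xs = [0.5, 1.0, 1.5]  # hypothetical training inputs, clustered near 1
for x_new in (1.0, 5.0, 20.0):
    print(x_new, round(predictive_variance(x_new, xs, noise_sd=1.0, prior_sd=10.0), 3))
```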
        
           | borroka wrote:
           | The choice is not between Bayesian methods and Deep Learning,
           | but between statistical models and machine learning models
           | (say, from random forest to GBM to xgboost and then maybe
           | Deep Learning). There is overlap between statistical models
           | and machine learning models--it is a matter sometimes of
           | focus--and Bayesian methods can also be applied to what are
           | typically considered ML approaches (see for example Bayesian
           | hierarchical random forest).
        
         | usgroup wrote:
         | Stan is exceptional if what you need is a hierarchical Bayesian
         | model, and if what you want is a rigorous way of quantifying the
         | uncertainty associated with the parameter estimates in your
         | model.
         | 
         | Stan users are more often R users than Python users and mostly
         | come from science backgrounds. They often use Stan via a
         | package called BRMS which stands for "Bayesian Regression
         | Models using Stan" which should give you some idea of its core
         | use case.
         | 
         | You wouldn't use Stan if you weren't trying to model your
         | problem as a distribution-based probabilistic model.
        
         | scottfr wrote:
         | Stan: Predict the values of parameters in a model
         | 
         | Deep Learning: Predict an outcome variable
         | 
         | For example, if I want to know what effect household income has
         | on a student's chance of getting into college, Stan would allow
         | you to estimate that given a proposed model.
         | 
         | If instead I wanted to predict a given student's chance of
         | getting into college, I might use Machine Learning.
         | 
         | Of course, those two problems are linked, but it's a
         | fundamental difference of focus.
        
           | nightski wrote:
           | While it is true that Bayesian inference is very powerful in
           | that it allows one to introspect and view effects of the
           | model's parameters on the outcome, it is equally good at
           | predicting the outcome variable as well. It just depends on
           | what you want to get out of it. In fact you get more
           | information about your outcome variable from Bayesian
           | Inference as it is a distribution.
           | 
           | I'm not saying it is better than DL by any means, as DL can
           | scale much better. Just that I don't think it's necessary to
           | pigeonhole Bayesian inference to just predicting the
           | parameters. In my opinion the "fundamental difference of
           | focus" is just a personal decision, not something inherent to
           | the method.
        
             | borroka wrote:
             | The focus of statistical models (including Bayesian models)
             | is on inference and uncertainty (both for parameter values
             | and for predictions), the focus of ML models (including DL
             | models) is on prediction and it is rarely possible to
             | obtain any quantification of uncertainty.
        
               | jgalt212 wrote:
               | > rarely possible to obtain any quantification of
               | uncertainty.
               | 
               | Can't this be estimated via bootstrapping?
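Bootstrapping is one common answer. This minimal sketch computes a percentile bootstrap interval for a mean from a made-up sample. For an ML model you would refit the model on each resample, which is where the cost comes in. The resulting interval reflects the estimator's sampling variability under the usual bootstrap assumptions:

```python
import random
import statistics

random.seed(0)
data = [random.gauss(5.0, 2.0) for _ in range(50)]  # hypothetical sample

def bootstrap_ci(sample, stat, n_boot=2000, level=0.95):
    """Percentile bootstrap: resample with replacement, recompute stat, take quantiles."""
    reps = sorted(stat(random.choices(sample, k=len(sample))) for _ in range(n_boot))
    lo = reps[int((1 - level) / 2 * n_boot)]
    hi = reps[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi

estimate = statistics.mean(data)
lo, hi = bootstrap_ci(data, statistics.mean)
print(f"mean {estimate:.2f}, 95% bootstrap interval ({lo:.2f}, {hi:.2f})")
```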
        
             | peteradio wrote:
             | I guess Bayesian will tend to be underfit while DL may tend
             | to overfit.
        
         | ogogmad wrote:
         | I asked essentially the same question as you 5 minutes before
         | you did. Have an upvote anyway.
         | 
         | [Edit] I don't understand these downvotes.
        
           | ogogmad wrote:
           | Anonymous passive aggressive downvoting cowards go to hell.
        
         | tel wrote:
         | Bayesian modeling has a somewhat distinct feeling to both
         | (typical) deep learning algorithms and boosting/bagging
         | classifiers.
         | 
         | Most particularly, Bayesian modeling tends to be generative
         | modeling as opposed to discriminative. This means that you
         | construct your model by describing a process which generates
         | your observed data from a set of latent/unknown quantities.
         | 
         | For instance, we might observe n[u, d] clicks for user u on
         | day d, for various choices of u and d. We could
         | build a variety of generative stories here: that n[u, d] is
         | independent of u and d, just being a random draw from a
         | Normal(mu, sigma) distribution; that n[u, d] incorporates
         | another unknown parameter p[u], the user's propensity to click,
         | and then is a random draw from Normal(mu + b p[u], sigma); or
         | that we also include seasonal trends sm[d] and ss[d] in both the
         | mean and spread of n[u, d], saying it's Normal(mu + b p[u] +
         | sm[d], sigma * ss[d]).
         | 
         | In these examples, the unknown latents are parameters like mu,
         | sigma, and b as well as any latent data needed to give shape to
         | p[-], sm[-], and ss[-]. Once we've posited the structure of
         | this generative model, we'd like to infer what values those
         | latents might take as informed by the data.
         | 
         | This is the bread and butter of Stan modeling. It lets you
         | describe these generative models as a "forward" process where
         | we sample latents in a simple forward program. Similar to
         | Tensorflow/etc Stan extracts from this forward program a DAG
         | and computes derivatives, but instead of simply maximizing an
         | objective function through backprop, Stan uses these
         | derivatives to perform a sampling algorithm over the latents
         | (mu, sigma, b).
         | 
         | Ultimately, this gives you a distribution of plausible latent
         | configurations given the data you've observed. This
         | _distribution_ is a key point of Bayesian modeling and can
         | provide a lot of information beyond what the objective-
         | maximizing value would. As a simple example, it's trivial from
         | a Bayesian output distribution to make statements like "we're
         | 95% confident that mu > 0.1".
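         | Once you have posterior draws, a statement like that reduces to
         | counting. Below is a minimal sketch, assuming the draws for mu
         | are already in hand. Here they are faked with a Normal(0.3, 0.1)
         | stand-in rather than real MCMC output, and the helper names are
         | hypothetical.

```python
import random


def prob_greater(draws, threshold):
    """Posterior P(quantity > threshold), estimated as the fraction of draws above it."""
    return sum(d > threshold for d in draws) / len(draws)


def central_interval(draws, level=0.95):
    """Central credible interval taken from the empirical quantiles of the draws."""
    s = sorted(draws)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi


# Stand-in for MCMC output: pretend the posterior for mu is Normal(0.3, 0.1).
rng = random.Random(42)
mu_draws = [rng.gauss(0.3, 0.1) for _ in range(10_000)]
print(prob_greater(mu_draws, 0.1))  # close to Phi(2) ~ 0.977 for this stand-in
print(central_interval(mu_draws))   # close to (0.104, 0.496)
```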
        
         | gbrown wrote:
         | This question would be super bizarre to anyone coming from a
         | stats background.
         | 
         | Others have commented on the role of inference/estimation, and
         | prediction in small data or non-black-box contexts, so I'll
         | just add that there are deep theoretical reasons to do Bayesian
         | inference. It's a framework grounded firmly in decision theory,
         | and provides a coherent way to reason about the world. You can
         | prove, under sensible axioms, that beliefs can be described in
         | terms of probability distributions, and that we should update
         | beliefs based on Bayes' Rule.
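         | Here is a concrete instance of updating beliefs with Bayes' Rule:
         | a minimal sketch over two hypotheses about a coin (fair, or
         | heads with probability 0.8). The hypotheses, prior and data are
         | made up for illustration.

```python
def bayes_update(prior, likelihood):
    """Return the posterior p(h | data), proportional to p(h) * p(data | h), normalized."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # p(data), the normalizing constant
    return {h: w / evidence for h, w in unnormalized.items()}


# Hypothetical example: observe three heads in a row.
prior = {"fair": 0.5, "biased": 0.5}
likelihood = {"fair": 0.5 ** 3, "biased": 0.8 ** 3}
posterior = bayes_update(prior, likelihood)
print(posterior)  # fair ~ 0.196, biased ~ 0.804
```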
        
         | darthdeus wrote:
         | Stan gives you the ability to do probabilistic reasoning. There
         | is actually Tensorflow Probability
         | (https://www.tensorflow.org/probability) which has a lot of
         | overlapping algorithms, but isn't as mature and approaches some
         | things differently.
         | 
         | The main difference is that with Stan you think in terms of
         | random variables and distributions (and their transformations),
         | while with Tensorflow/DL you think in terms of predicting
         | directly from data. Stan lets you model a problem with
         | probabilities and do arbitrary inference, generally asking any
         | question you want about your model.
         | 
         | There are many other interesting alternatives, e.g.
         | http://pyro.ai/, which takes yet another approach, merging DL
         | and probabilistic programming with variational inference. (Stan
         | and TFP can do variational inference too, but I guess it's like
         | Python vs JavaScript vs Ruby vs Java - all of them can be used
         | for programming, but not the same way).
        
           | usgroup wrote:
           | The next cut of Stan will likely use TFP as a backend. I
           | think that PyMC4 will also. The Stan team wrote everything
           | from scratch in C++, including their own autodiff code, which
           | many regard as quite a stretch in terms of long-term
           | maintenance. Since TFP executes on top of Tensorflow, things
           | like autodiff and many of the other performance concerns that
           | take up so much Stan-dev time are already taken care of.
        
             | abhgh wrote:
             | PyMC4 on TFP was the plan, but they made a recent
             | announcement [1] indicating those efforts would stop, and
             | instead, they would develop PyMC3+JAX+Theano.
             | 
             | [1] https://pymc-devs.medium.com/the-future-of-pymc3-or-theano-i...
        
               | diab0lic wrote:
               | Woah. Thanks for the link. As a PyMC3 user I was not
               | looking forward to the transition to 4, expecting to have
               | to relearn the API as in the transition from 2 to 3. I was
               | debating whether I should learn 4 or switch to a different
               | library when all I really wanted to do was stick with 3.
               | 
               | Looks like I get the best of both worlds now.
        
             | jsinai wrote:
             | Please no, we don't need Stan to be rebuilt with a Python
             | backend. That it's built in C++ and can be called from
             | higher-level APIs is part of the appeal.
        
       | dstick wrote:
       | My name is Stan and I just finished a feature on the product I'm
       | working on that used statistical analysis for anomaly detection.
       | So this headline made me smile - thanks for sharing, and
       | apologies for a rather pointless comment otherwise ;-)
        
         | harry8 wrote:
         | Named for Stanislaw Ulam.
         | https://en.wikipedia.org/wiki/Stanislaw_Ulam
         | 
         | Were you?
        
         | mhh__ wrote:
         | The name of the main developer of Microsoft's C++ library is a
         | never-ending source of fun.
        
       | hendzen wrote:
       | If you want to learn Stan I highly recommend the book Statistical
       | Rethinking (2nd Ed) by Richard McElreath. It's a pedagogical
         | masterpiece and by far the best resource I've found for
         | learning Bayesian inference.
        
         | elsherbini wrote:
         | Seconded. He has a full course on youtube as well, and a free
         | version of the textbook that is just missing the last chapter
         | available on his website (password is in the first or second
         | lecture on youtube)
         | 
         | https://www.youtube.com/watch?v=4WVelCswXo4&list=PLDcUM9US4X...
        
         | nextos wrote:
         | Statistical Rethinking is not bad, but I think it's for people
         | with backgrounds other than CS (or Math).
         | 
         | Personally, I think https://probmods.org/ is an exceptionally
         | good introduction to probabilistic programming for someone who
         | knows CS or just some programming and likes a SICP-like
         | textbook that goes into the essence of the topic.
         | 
         | Learning Stan is great, but not as a first probabilistic
         | programming language, because it's quite limited (it trades
         | model expressiveness for performance). So you can't represent a
         | large set of models, such as infinite mixtures, which may
         | become really relevant in future developments of deep
         | learning. It also has poor performance in models that involve
         | many discrete variables.
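         | Stan samples only continuous parameters, so discrete latents
         | such as mixture-component labels are usually summed out of the
         | likelihood. Below is a minimal Python sketch of that
         | marginalization for a two-component Normal mixture. It uses
         | log-sum-exp for numerical stability; Stan programs typically use
         | the built-in log_sum_exp or log_mix for this. The function names
         | and the two-component setup are illustrative assumptions.

```python
import math


def log_sum_exp(a, b):
    """Stable log(exp(a) + exp(b)): factor out the max to avoid overflow."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def normal_lpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def mixture_log_lik(ys, theta, mu1, mu2, sigma):
    """Log-likelihood with each discrete label z_i in {1, 2} summed out:

    p(y_i) = theta * Normal(y_i | mu1, sigma) + (1 - theta) * Normal(y_i | mu2, sigma)
    """
    total = 0.0
    for y in ys:
        lp1 = math.log(theta) + normal_lpdf(y, mu1, sigma)
        lp2 = math.log1p(-theta) + normal_lpdf(y, mu2, sigma)
        total += log_sum_exp(lp1, lp2)
    return total
```

         | Here the labels are independent, so the sum costs K terms per
         | observation. When discrete variables are coupled, the sum runs
         | over joint configurations, which is one reason such models get
         | expensive.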
        
         | stevesimmons wrote:
         | The Statistical Rethinking book uses R.
         | 
         | For people wanting Python, Jupyter notebooks with Python code
         | examples are here:
         | 
         | * https://github.com/pymc-devs/resources/tree/master/Rethinkin...
        
       | glial wrote:
       | Facebook's (very good) Prophet forecasting library is a wrapper
       | for Stan models.
       | 
       | https://facebook.github.io/prophet/
        
       | PLenz wrote:
       | Stan is one of those technologies I keep finding is actually
       | powering the more 'friendly' interfaces I use for one-off jobs,
       | especially in the MCMC world. Every so often I think I'll spend
       | some time to learn stan proper but it's such an all-encompassing
       | project that I get intimidated and stick to the derivatives. My
       | loss!
       | 
       | Bravo to the team behind it and for making and supporting such a
       | powerful tool!
        
       | deugtniet wrote:
       | I've dabbled in Stan, and it's really good and state of the art
       | for Bayesian inference. Getting started with Stan is a bit
       | difficult, though, as it has a C-like programming language that
       | is difficult to master initially. Since statistics is usually
       | done in languages like R, the learning curve is a bit steep for
       | beginners.
       | 
       | I've personally liked PyMC for simple models and relative ease of
       | inference, as it's more integrated with the Python language. That
       | being said, if you want the latest in inference methods and
       | statistical alchemy, Stan is the place to go.
        
         | phillc73 wrote:
         | There is a good range of other programming language interfaces
         | to Stan. The R one is quite popular.[1]
         | 
         | You do still need the C++ toolchain, but can just write your
         | code in R.
         | 
         | [1] https://mc-stan.org/rstan/
        
           | standevbob wrote:
           | Stan requires models to be coded in the Stan language, which
           | is a simple imperative language that's like MATLAB with
           | explicit data types. This is the same as was done in Stan's
           | predecessors, BUGS and JAGS.
           | 
           | A Stan program can be run in any of our interfaces in Python,
           | Julia, R, MATLAB, Stata, etc. But you can't mix any of those
           | languages into a Stan program.
           | 
           | The C++ toolchain is required because Stan transpiles its
           | programs to C++, then compiles those against the Stan math
           | library, which does autodiff. But you don't need to write
           | any C++ to use Stan, just to develop extensions for it.
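           | Here is what "does autodiff" means, in miniature: a toy
           | reverse-mode automatic differentiation sketch in Python. It
           | only illustrates the idea of recording an expression graph
           | during the forward pass and then propagating adjoints backwards
           | to get every partial derivative in one sweep. It is not how the
           | Stan math library is implemented, and the names Var, log and
           | backward are hypothetical.

```python
import math


class Var:
    """Scalar node in a toy reverse-mode autodiff graph."""

    def __init__(self, value, parents=()):
        self.value = value
        # Each parent is (node, local partial derivative of self w.r.t. node).
        self.parents = parents
        self.grad = 0.0  # adjoint, filled in by backward()

    @staticmethod
    def _lift(x):
        return x if isinstance(x, Var) else Var(float(x))

    def __add__(self, other):
        other = Var._lift(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __sub__(self, other):
        other = Var._lift(other)
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        other = Var._lift(other)
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

    __rmul__ = __mul__


def log(x):
    """Natural-log node: d/dx log(x) = 1/x."""
    return Var(math.log(x.value), ((x, 1.0 / x.value),))


def backward(output):
    """Propagate adjoints from output back to every node in its graph."""
    order, seen = [], set()

    def visit(node):  # depth-first topological sort
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad
```

           | For f = x * y + log(x) - y at x = 2, y = 3, calling backward(f)
           | leaves x.grad = y + 1/x = 3.5 and y.grad = x - 1 = 1.0.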
        
           | melling wrote:
           | There's a Julia Stan too:
           | 
           | http://stanjulia.github.io/Stan.jl/stable/INTRO.html
           | 
           | https://astrostatistics.psu.edu/su14/lectures/BayesComp2014L...
        
       | mushufasa wrote:
       | Does anyone know of a good article of a comparison between Stan
       | vs PyMC3 for real-world Bayesian modelling tasks? E.g. to be used
       | in a production system.
        
       ___________________________________________________________________
       (page generated 2020-12-23 23:01 UTC)