comment on the p-direction post #40

@bob-carpenter

Description

I couldn't figure out how to leave comments on the blog, so I hope you don't mind if I open an issue relating to the post on the p-direction. Apologies in advance if this is obvious and just considered too much for the blog post!

You can compute traditional p-values for Bayesian estimators using the bootstrap. Using the maximum a posteriori (MAP) estimator will then produce results identical to the traditional p-value derived from penalized maximum likelihood where the prior is treated as the "penalty". But MAP isn't a Bayesian estimator and doesn't have the nice properties of the two common Bayesian estimators, the posterior mean (minimizes expected squared error) and posterior median (minimizes expected absolute error). Deriving a point estimate isn't particularly Bayesian, but at least the posterior mean and median have natural probabilistic interpretations as an expectation and as the point at which 50% probability obtains. With those estimators, results will differ from MAP depending on how skewed the posterior is.
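
As a concrete illustration of how the estimators come apart under skew, here is a minimal sketch with a made-up Gamma(2, 1) "posterior" (nothing from the post); the mean, median, and mode all land in different places:

```python
import numpy as np

rng = np.random.default_rng(1)

# Skewed "posterior": Gamma(shape=2, scale=1), chosen only for illustration.
draws = rng.gamma(shape=2.0, scale=1.0, size=100_000)

post_mean = draws.mean()          # minimizes expected squared error; analytically 2.0
post_median = np.median(draws)    # minimizes expected absolute error; roughly 1.68
map_estimate = (2.0 - 1.0) * 1.0  # analytic mode of Gamma(2, 1); the MAP is 1.0

print(post_mean, post_median, map_estimate)
```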

A bigger issue is that MAP doesn't even exist for our bread-and-butter hierarchical models. The frequentist approach is to use maximum marginal likelihood (this is often called "empirical Bayes" in that the MML estimate is for the hierarchical or "prior" parameters). By construction, this leads to underestimates of the uncertainty in the lower-level regression coefficients, as you see in packages like lme4 in R.
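
A minimal sketch of the non-existence problem, using a made-up two-level normal model and made-up data: pin every group-level parameter at the population mean and the joint density grows without bound as the hierarchical scale shrinks toward zero, so there is no mode to find.

```python
import numpy as np
from scipy import stats

# Toy hierarchical normal model: y_j ~ normal(theta_j, sigma), theta_j ~ normal(mu, tau).
y = np.array([28.0, 8.0, -3.0, 7.0])  # made-up group estimates
sigma = 10.0
mu = y.mean()

def log_joint(tau):
    theta = np.full_like(y, mu)  # put every theta_j exactly at mu
    return (stats.norm.logpdf(y, loc=theta, scale=sigma).sum()
            + stats.norm.logpdf(theta, loc=mu, scale=tau).sum())

for tau in [1.0, 1e-2, 1e-4, 1e-8]:
    print(tau, log_joint(tau))  # the joint log density grows without bound as tau -> 0
```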

Part of the point of Bayesian inference is to not have to collapse to a point estimate. When we want to do downstream predictive inference, we don't want to just plug in a point estimate; we want to do posterior predictive inference and average over our uncertainty in the parameter estimates.
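
Here's a minimal sketch of that point with a made-up conjugate normal model (known sigma, normal prior on the mean); the numbers are illustrative only, but the plug-in predictive is visibly narrower than the posterior predictive:

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up data from a normal with known sigma = 1 and unknown mean mu.
y = rng.normal(1.0, 1.0, size=20)
sigma = 1.0

# Conjugate posterior for mu under a normal(0, 10) prior.
prior_sd = 10.0
post_var = 1.0 / (1.0 / prior_sd**2 + len(y) / sigma**2)
post_mean = post_var * (y.sum() / sigma**2)

# Plug-in predictive: pretend mu equals its point estimate exactly.
plug_in_sd = sigma

# Posterior predictive: average over posterior uncertainty in mu.
mu_draws = rng.normal(post_mean, np.sqrt(post_var), size=100_000)
y_rep = rng.normal(mu_draws, sigma)

print(plug_in_sd, y_rep.std())  # the posterior predictive is wider than the plug-in
```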

Defining what it means for a prior to be "informative" is tricky, and it wasn't defined in this post. This is particularly vexed because of changes of variables. A flat prior on a probability variable in [0, 1] is very different from a flat prior on a log-odds variable in (-infinity, infinity). Pushing a flat prior on [0, 1] through the logit transform yields a standard logistic prior on the log odds, which is not flat. In maximum likelihood estimation, changing variables doesn't matter, but it does in Bayes.
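
A quick numerical check of that change-of-variables point, with simulated draws rather than anything from the post:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

p = rng.uniform(0.0, 1.0, size=200_000)  # flat prior on the probability scale
alpha = np.log(p / (1.0 - p))            # logit transform to the log-odds scale

# Quantiles of the transformed draws match the standard logistic, not a flat density.
qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.quantile(alpha, qs))
print(stats.logistic.ppf(qs))
```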

I wouldn't say that changing the threshold for significance with regularization is a good thing. While regularization can be good for error control (trading variance for bias), the whole notion of a dichotomous up/down decision through significance is the problem, not the threshold used. Also, we tend to use regularization that shrinks not toward zero but toward the population mean. This is also common in frequentist penalized maximum likelihood estimates (see, e.g., Efron and Morris's famous paper on predicting batting averages in baseball using empirical Bayes, which despite the name is a frequentist maximum marginal likelihood method). That's even better for error control than shrinkage to zero, but it's going to have the "wrong" effect on this notion of p-direction unless you talk about the p-direction of the difference from the population estimate rather than of the random effect itself (that is, you don't want to say Connecticut is significantly different from zero, but significantly different from other states).
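
Here's the kind of contrast I mean, as a minimal sketch on simulated posterior draws (the correlation structure and numbers are made up); the p-direction of the raw state effect looks decisive only because the population mean is far from zero:

```python
import numpy as np

rng = np.random.default_rng(4)

# Posterior draws for the population mean and for one state's effect,
# which is shrunk toward (and highly correlated with) that mean.
mu_draws = rng.normal(0.6, 0.05, size=50_000)
state_draws = mu_draws + rng.normal(0.01, 0.05, size=50_000)

def p_direction(x):
    """Probability of the dominant sign (the post's p-direction)."""
    return max(np.mean(x > 0), np.mean(x < 0))

print(p_direction(state_draws))             # ~1: "significant" only because mu is far from zero
print(p_direction(state_draws - mu_draws))  # much weaker: the state vs. the population mean
```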

P.S. For reference, Gelman et al. use a similar, but not equivalent, notion in calculating posterior predictive p-values in Bayesian Data Analysis, but without flipping signs (so that values near either 0 or 1 are evidence the model doesn't fit the data well). These are not intended to be used in hypothesis tests, though, just as a diagnostic.
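
For concreteness, a minimal sketch of a posterior predictive p-value in that sense, with made-up data, a known-sigma normal model, and an arbitrary test statistic:

```python
import numpy as np

rng = np.random.default_rng(5)

# Made-up data and a normal model with known sigma = 1 and a flat prior on mu,
# so the posterior for mu is normal(mean(y), 1 / sqrt(n)).
y = rng.normal(0.0, 1.0, size=30)
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(len(y)), size=10_000)

T = lambda data: data.max()  # test statistic; anything of interest works
t_obs = T(y)
t_rep = np.array([T(rng.normal(mu, 1.0, size=len(y))) for mu in mu_draws])

ppp = np.mean(t_rep >= t_obs)  # values near 0 or 1 would signal misfit
print(ppp)
```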
