Three Pieces of Stat Nerd Trivia

I’ve recently learned three bits of trivia that only a statistics nerd could love. They won’t help you think deeply about data. They won’t help you analyze data. In fact, they’ll probably push useful knowledge out of your head and make you dumber. Read further at your own risk.

1. Where does the word “logit” come from?

One of the earliest probabilistic models of a binary process is the probit. Chester Bliss developed it in 1934 and coined the term as a shortening of the phrase “probability unit”. The idea is that a one unit increase in an independent variable shifts the outcome by a constant number of probability units (probits), even though the implied change in the probability itself is not constant. Probits can be translated into actual probabilities using a particular non-linear transformation: the cumulative distribution function of the standard normal. That is, we have:

$\Pr(y_i = 1) = \Phi(\beta_0 + \beta_1 x_i)$

In 1944, Joseph Berkson developed a similar model that also mapped the real line to the (0,1) interval, this time using the logistic function:

$\Pr(y_i = 1) = \frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)}$

By analogy, this model was given the name logit.
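To see how similar the two transformations are, here is a minimal Python sketch that pushes the same linear predictor through both. The function names and the coefficient values are made up for illustration; nothing here comes from Bliss or Berkson.

```python
import math

def probit_prob(eta):
    """Standard normal CDF, Phi(eta), written via the error function."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def logit_prob(eta):
    """Logistic function, exp(eta) / (1 + exp(eta)), in an equivalent form."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients, chosen only to have something to plug in.
beta0, beta1 = -1.0, 0.5

for x in (0, 2, 4):
    eta = beta0 + beta1 * x  # the linear predictor
    print(x, round(probit_prob(eta), 3), round(logit_prob(eta), 3))
```

Both functions map any real number into (0,1) and both return 0.5 when the linear predictor is zero. They differ mainly in the tails, where the logistic curve approaches 0 and 1 more slowly.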

2. Why does Stata store dates as the number of days since January 1, 1960?

Nick Cox knows a lot about Stata, and even he does not know exactly why this date was chosen as the origin for internal dates. His best guess is that it was copied from “some data base or spreadsheet program”. My guess is that Stata copied SAS, which uses the same date as its date-time origin.

This led me to wonder why SAS chose the date, and Derek Morgan (in The Essential Guide to SAS Dates and Times) says “One story has it that the founders of SAS wanted to use the approximate birth date of the IBM 370 system, and they chose January 1, 1960 as an easy-to-remember approximation.” According to Wikipedia, S-Plus uses the same reference date.
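If you ever need to convert these day counts by hand, the arithmetic is simple. Here is a small Python sketch, assuming a plain daily date stored as an integer number of days since January 1, 1960. The helper names are my own and are not part of Stata or SAS.

```python
from datetime import date, timedelta

# Day 0 for daily dates counted from January 1, 1960.
EPOCH_1960 = date(1960, 1, 1)

def to_days_since_1960(d):
    """Convert a calendar date to an integer day count (hypothetical helper)."""
    return (d - EPOCH_1960).days

def from_days_since_1960(n):
    """Convert an integer day count back to a calendar date (hypothetical helper)."""
    return EPOCH_1960 + timedelta(days=n)

print(to_days_since_1960(date(1970, 1, 1)))
print(from_days_since_1960(-1))
```

Dates before 1960 simply come out as negative integers, so the choice of origin does not limit which dates can be stored.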

3. How are we supposed to pronounce McNemar’s test?

Suppose you have a set of matched pairs of observations and want to know if the distribution of some categorical outcome is the same for each side of the pair. You should use McNemar’s test to see if the observed differences are statistically significant. I went years pronouncing Quinn McNemar’s name as “mik-nee’-mahr” and likely getting laughed at behind my back by biostatisticians. It turns out the proper pronunciation is “mak’ne-mahr”. Ashwini Kalantri pronounces it well as he explains how it works.
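For the curious, here is a rough Python sketch of the test for the simplest case, a binary outcome. It assumes you have already counted the discordant pairs: b pairs where only the first member has the outcome and c pairs where only the second does. The function name and the returned tuple are my own choices, and the chi-square version omits the continuity correction that some textbooks apply.

```python
import math

def mcnemar(b, c):
    """McNemar's test from the two discordant-pair counts (hypothetical helper).

    Returns (chi-square statistic, chi-square p-value, exact binomial p-value).
    """
    n = b + c
    if n == 0:
        raise ValueError("no discordant pairs, so the test is undefined")
    # Chi-square statistic with 1 degree of freedom, no continuity correction.
    stat = (b - c) ** 2 / n
    # Upper tail of a chi-square(1) variable: P(X > stat) = erfc(sqrt(stat / 2)).
    p_chi2 = math.erfc(math.sqrt(stat / 2.0))
    # Exact two-sided test: under the null, b ~ Binomial(n, 0.5).
    tail = sum(math.comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    p_exact = min(1.0, 2.0 * tail)
    return stat, p_chi2, p_exact

stat, p_chi2, p_exact = mcnemar(10, 0)
print(stat, p_chi2, p_exact)
```

Only the discordant pairs enter the calculation. Pairs where both members have the same outcome say nothing about a difference between the two sides, which is why they drop out.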

Note: The pretty math equations are rendered with MathJax, and integrating this with JavaScript was trivial thanks to Lucy Park.

High Variance
