

# Symbols, formulas, statistics and parameters {#StatisticsAndParameters}

## Symbols and standard errors {#Symbols}

(ref:DEFSamplingDistributionPhat) Def. \@ref(def:DEFSamplingDistributionPhat)

(ref:DEFSamplingDistributionPhatKnown) Def. \@ref(def:SamplingDistProp)

(ref:DEFSamplingDistributionXbar) Def. \@ref(def:DEFSamplingDistributionXbar)

(ref:DEFSamplingDistributionDbar) Def. \@ref(def:DEFSamplingDistributionDbar)


```{r ParametersStatistics}
ParStat <- array( dim = c(11, 4))

colnames(ParStat) <- c("Parameter",
                       "Statistic",
                       "Standard error",
                       "S.E. formula reference")
rownames(ParStat) <- c("Proportion (CI)",
                       "Proportion (Test)",
                       "Mean",
                       "Standard deviation",
                       "Mean difference",
                       "Diff. between means",
                       "Odds ratio",
                       "Correlation",
                       "Slope of regression line",
                       "Intercept of regression line",
                       "R-squared")

ParStat[1, ] <- c("$p$", 
                  "$\\hat{p}$", 
                  "$\\displaystyle\\text{s.e.}(\\hat{p}) = \\sqrt{\\frac{ \\hat{p} \\times (1 - \\hat{p})}{n}}$",
                  "(ref:DEFSamplingDistributionPhat)")
ParStat[2, ] <- c("$p$", 
                  "$\\hat{p}$", 
                  "$\\displaystyle\\text{s.e.}(\\hat{p}) = \\sqrt{\\frac{ p \\times (1 - p)}{n}}$",
                  "(ref:DEFSamplingDistributionPhatKnown)")
ParStat[3, ] <- c("$\\mu$", 
                  "$\\bar{x}$", 
                  "$\\displaystyle\\text{s.e.}(\\bar{x}) = \\frac{s}{\\sqrt{n}}$", 
                  "(ref:DEFSamplingDistributionXbar) ")
ParStat[4, ] <- c("$\\sigma$", 
                  "$s$", 
                  "", 
                  "")
ParStat[5, ] <- c("$\\mu_d$", 
                  "$\\bar{d}$", 
                  "$\\displaystyle\\text{s.e.}(\\bar{d}) = \\frac{s_d}{\\sqrt{n}}$", 
                  "(ref:DEFSamplingDistributionDbar)")
ParStat[6, ] <- c("$\\mu_1 - \\mu_2$", 
                  "$\\bar{x}_1 - \\bar{x}_2$", 
                  "$\\displaystyle\\text{s.e.}(\\bar{x}_1 - \\bar{x}_2)$", 
                  "Value given")
ParStat[7, ] <- c("Pop. OR", 
                  "Sample OR", 
                  "", # $\\displaystyle\\text{s.e.}(\\text{sample OR})$", 
                  "Value given")
ParStat[8, ] <- c("$\\rho$", 
                  "$r$", 
                  "", 
                  "")
ParStat[9, ] <- c("$\\beta_1$", 
                  "$b_1$", 
                  "$\\text{s.e.}(b_1)$", 
                  "Value given")
ParStat[10, ] <- c("$\\beta_0$", 
                  "$b_0$", 
                  "$\\text{s.e.}(b_0)$", 
                  "Value given")
ParStat[11, ] <- c("", 
                   "$R^2$", 
                   "", 
                   "")

if( knitr::is_latex_output() ) {
  kable( ParStat,
         format = "latex",
         booktabs = TRUE,
         escape = FALSE,
         align = c("c", "c", "c", "c"),
         linesep = "\\addlinespace\\addlinespace",
         #linesep = c("", "", "\\addlinespace", "\\addlinespace", "", "", "", ""),  # Else adds a space every 5 lines... 
         caption = "Some sample statistics used to estimate population parameters. Empty table cells indicate items that are not studied in this textbook. For statistics with standard errors given, the sampling distribution is approximately normal under certain (statistical validity) conditions."
         ) %>%
    row_spec(0, bold = TRUE) %>% # Column headings in bold
    kable_styling(font_size = 10) %>%
    column_spec(column = 1, width = "33mm") %>%
    column_spec(column = 2, width = "14mm") %>%
    column_spec(column = 3, width = "15mm") %>%
    column_spec(column = 4, width = "39mm") %>%
    column_spec(column = 5, width = "23mm")
}
if( knitr::is_html_output() ) {
  kable( ParStat,
         format = "html",
         booktabs = TRUE,
         align = c("c", "c", "c", "c"),
         caption = "Some sample statistics used to estimate population parameters. Empty table cells indicate items that are not studied in this textbook. For statistics with standard errors given, the sampling distribution is approximately normal under certain (statistical validity) conditions.") %>%
    row_spec(0, bold = TRUE)
}
```


\pagebreak


## Confidence intervals {#FormulasCI}

When the sampling distribution of the statistic has an approximate normal distribution, **confidence intervals** have the form  
\[ 
    \text{statistic} \pm ( \text{multiplier} \times \text{s.e.}(\text{statistic})).
\]

**Notes:**

* The multiplier is *approximately* 2 for an *approximate* 95% CI (based on the 68--95--99.7 rule).
* $\text{multiplier} \times \text{s.e.}(\text{statistic})$ is called the *margin of error*.
* When the sampling distribution for the statistic does not have an approximate normal distribution (e.g., for *odds ratios* and *correlation coefficients*), **this formula does not apply**.
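
As a minimal sketch of the confidence interval formula, the following R code computes an approximate 95% CI for a proportion using a multiplier of 2. The values of `phat` and `n` are hypothetical, chosen only for illustration; they do not come from the textbook.

```r
# Hypothetical sample values (assumptions, for illustration only)
phat <- 0.4    # sample proportion
n    <- 100    # sample size

# Standard error of the sample proportion (the CI version, using phat)
se_phat <- sqrt(phat * (1 - phat) / n)

# Approximate 95% CI: statistic +/- (multiplier x s.e.), with multiplier = 2
multiplier      <- 2
margin_of_error <- multiplier * se_phat

ci <- c(lower = phat - margin_of_error,
        upper = phat + margin_of_error)
round(ci, 3)
```

With these values, the standard error is about 0.049, so the approximate 95% CI runs from about 0.302 to about 0.498.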


## Hypothesis testing {#FormulasTest}

When the sampling distribution of the statistic has an approximate normal distribution, the *test statistic* for **hypothesis tests** is a $t$-score, which has the form  
\[
  t = \frac{\text{statistic} - \text{parameter}}{\text{s.e.}(\text{statistic})}.
\]

**Notes:**

* Since $t$-scores are a little like $z$-scores, the 68--95--99.7 rule can be used to *approximate* $P$-values.
* When the sampling distribution for the statistic does not have an approximate normal distribution (e.g., for *odds ratios* and *correlation coefficients*), **this formula does not apply**.
* In particular, hypothesis tests about **odds ratios** do not use $t$-scores. They use a $\chi^2$ test statistic, which is approximately equivalent to a $z$-score with a value of  
\[
   \sqrt{\frac{\chi^2}{\text{df}}},
\]
where $\text{df}$ is the 'degrees of freedom' given in the software output.
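
The two calculations in this section can be sketched in R as follows. All the numbers are hypothetical (assumptions made for illustration); in practice, the $\chi^2$ value and the degrees of freedom come from software output.

```r
# Hypothetical values for a test about a mean (assumptions, for illustration)
xbar <- 52.5   # sample mean (the statistic)
mu0  <- 50     # population mean assumed in the null hypothesis (the parameter)
s    <- 10     # sample standard deviation
n    <- 64     # sample size

# t-score: (statistic - parameter) / s.e.(statistic)
se_xbar <- s / sqrt(n)
t_score <- (xbar - mu0) / se_xbar
t_score

# Hypothetical chi-squared output for a test about odds ratios
chisq <- 8     # chi-squared test statistic
df    <- 2     # degrees of freedom

# Approximately equivalent z-score
z_equiv <- sqrt(chisq / df)
z_equiv
```

Both results equal 2 here. By the 68--95--99.7 rule, a score of 2 corresponds to a two-tailed $P$-value of roughly 0.05.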
 
(ref:AboutHypotheses) Sect. \@ref(AboutHypotheses)

(ref:TestStatObs) Sect. \@ref(TestStatObs)

(ref:AboutCIs) Chap. \@ref(AboutCIs)

(ref:StandardError) Def. \@ref(def:StandardError)


\pagebreak

## Sample size estimation {#FormulasSampleSize}

* To estimate the sample size needed (Sect. \@ref(SampleSizeProportions)) for **estimating a proportion**:  
\[
   n = \frac{1}{(\text{Margin of error})^2}.
\]
* To estimate the sample size needed (Sect. \@ref(SampleSizeOneMean)) for **estimating a mean**:  
\[
   n = \left( \frac{2\times s}{\text{Margin of error}}\right)^2.
\]
* To estimate the sample size needed (Sect. \@ref(SampleSizeMeanDifferences)) for **estimating a mean difference**:  
\[
   n = \left( \frac{2 \times s_d}{\text{Margin of error}}\right)^2.
\]

**Notes:**

* In **sample size calculations**, always **round up** the sample size found from the above formulas.
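
A short sketch of these three calculations in R, using `ceiling()` to round up. The margins of error and standard deviations are hypothetical values chosen for illustration. The multiplier of 2 in the formulas corresponds to an approximate 95% CI.

```r
# Hypothetical inputs (assumptions, for illustration only)
moe_prop <- 0.03   # desired margin of error for a proportion
moe_mean <- 5      # desired margin of error for a mean
s        <- 12     # estimate of the standard deviation
moe_diff <- 3      # desired margin of error for a mean difference
s_d      <- 8      # estimate of the standard deviation of the differences

# Estimating a proportion: n = 1 / (margin of error)^2
n_prop <- ceiling(1 / moe_prop^2)

# Estimating a mean: n = (2 x s / margin of error)^2
n_mean <- ceiling((2 * s / moe_mean)^2)

# Estimating a mean difference: n = (2 x s_d / margin of error)^2
n_diff <- ceiling((2 * s_d / moe_diff)^2)

c(proportion = n_prop, mean = n_mean, mean_difference = n_diff)
```

For these inputs, the formulas give about 1111.1, 23.04 and 28.4, which round up to sample sizes of 1112, 24 and 29 respectively.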


## Other formulas {#FormulasOther}

* To calculate $z$-scores (Sect. \@ref(z-scores)): $\displaystyle z = \frac{x - \mu}{\sigma}$ or, more generally,\bigskip  
\[
   z = \frac{\text{value of variable} - \text{mean of distribution}}{\text{standard deviation of distribution}}.
\]

* The **unstandardizing formula** (Sect. \@ref(Unstandardising)): $x = \mu + (z\times \sigma)$.

**Notes:**

* $t$-scores are like $z$-scores, except that the standard deviation of the distribution includes some values estimated from the sample.
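
As a brief illustration, the $z$-score formula and the unstandardizing formula undo each other. The values of `mu`, `sigma` and `x` below are hypothetical.

```r
# Hypothetical distribution and observation (assumptions, for illustration)
mu    <- 100   # mean of the distribution
sigma <- 15    # standard deviation of the distribution
x     <- 130   # observed value

# z-score: number of standard deviations that x lies from the mean
z <- (x - mu) / sigma
z

# Unstandardizing: recover the original value from the z-score
x_back <- mu + (z * sigma)
x_back
```

Here `z` is 2, and unstandardizing returns the original value of 130.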

\pagebreak

## Other symbols used

```{r OtherSymbols}
SymbolsTab <- array( dim = c(7, 3))

colnames(SymbolsTab) <- c("Symbol", 
                          "Meaning", 
                          "Reference")

SymbolsTab[1, ] <- c("$H_0$", 
                     "Null hypothesis", 
                     "(ref:AboutHypotheses)")
SymbolsTab[2, ] <- c("$H_1$", 
                     "Alternative hypothesis", 
                     "(ref:AboutHypotheses)")
SymbolsTab[3, ] <- c("df", 
                     "Degrees of freedom", 
                     "(ref:TestStatObs)")
SymbolsTab[4, ] <- c("CI", 
                     "Confidence interval", 
                     "(ref:AboutCIs) ")
SymbolsTab[5, ] <- c("s.e.", 
                     "Standard error" , 
                     "(ref:StandardError)")
SymbolsTab[6, ] <- c("$n$", 
                     "Sample size", "")
SymbolsTab[7, ] <- c("$\\chi^2$", 
                     "The chi-squared test statistic", 
                     "(ref:TestStatObs) ")


if( knitr::is_latex_output() ) {
  kable( SymbolsTab,
         format = "latex",
         booktabs = TRUE,
         escape = FALSE,
         align = c("c", "l", "c"),
         linesep = c("", "", "\\addlinespace", "\\addlinespace", "", "", "")) %>%  # Else adds a space every 5 lines...
        # caption = "Some symbols used") %>%
    row_spec(0, bold = TRUE) %>%
    kable_styling(font_size = 11)
}
if( knitr::is_html_output() ) {
  kable( SymbolsTab,
         format = "html",
         booktabs = TRUE,
         align = c("c", "l", "c")) %>%
         # caption = "Some symbols used")
    row_spec(0, bold = TRUE)
}
```