03-results-4-extrapolation.Rmd

## Expenses Extrapolation

```{r, include=FALSE}
if (file.exists("output/r-extrapolation.rds")) {
    r_extrapolation <- readRDS("output/r-extrapolation.rds")
} else {
    source("src/r-extrapolation.R", local = knitr::knit_global())
    saveRDS(r_extrapolation, "output/r-extrapolation.rds")
}
```

The rough extrapolation of the survey mean expenses with the total sum of honey bee colonies in Austria are based on the reported numbers by the Austrian beekeeping federation 'Biene Österreich'[^r-vis].
This extrapolation with the arithmetic mean sums up to EUR&nbsp;`r format(r_extrapolation$data %>% pull(extrapolation_estimate)%>% .[1], big.mark = ",")` (95% CI: `r format(r_extrapolation$data %>% pull(extrapolation_estimate_lower)%>% .[1], big.mark = ",")` - `r format(r_extrapolation$data %>% pull(extrapolation_estimate_upper)%>% .[1], big.mark = ",")`) in
2018/19, to EUR&nbsp;`r format(r_extrapolation$data %>% pull(extrapolation_estimate)%>% .[2], big.mark = ",")` (95% CI: `r format(r_extrapolation$data %>% pull(extrapolation_estimate_lower)%>% .[2], big.mark = ",")` - `r format(r_extrapolation$data %>% pull(extrapolation_estimate_upper)%>% .[2], big.mark = ",")`) in 2019/20 and to EUR&nbsp;`r format(r_extrapolation$data %>% pull(extrapolation_estimate)%>% .[3], big.mark = ",")` (95% CI: `r format(r_extrapolation$data %>% pull(extrapolation_estimate_lower)%>% .[3], big.mark = ",")` - `r format(r_extrapolation$data %>% pull(extrapolation_estimate_upper)%>% .[3], big.mark = ",")`) in 2020/21 (Table&nbsp;\@ref(tab:extrapolation-tab-extrapolation), Fig.&nbsp;\@ref(fig:extrapolation)).

In addition the same extrapolation was calculated with the geometric mean from the natural logarithm. For this calculation the single zero value in the dataset was replaced with EUR&nbsp;0.01. The geometric mean is less influenced by outliers and typically used as point estimate for a log-normal distribution. The results are always lower than the arithmetic mean and the difference in the extrapolation is on average EUR&nbsp;1,258,377.

Over the three survey years the extrapolation with the geometric mean sums up to `r format(r_extrapolation$data %>% pull(extrapolation_estimate)%>% .[4], big.mark = ",")` (95% CI: `r format(r_extrapolation$data %>% pull(extrapolation_estimate_lower) %>% .[4], big.mark = ",")` - `r format(r_extrapolation$data %>% pull(extrapolation_estimate_upper)%>% .[4], big.mark = ",")`) in 2018/19, to `r format(r_extrapolation$data %>% pull(extrapolation_estimate) %>% .[5], big.mark = ",")` (95% CI: `r format(r_extrapolation$data %>% pull(extrapolation_estimate_lower)%>% .[5], big.mark = ",")` - `r format(r_extrapolation$data %>% pull(extrapolation_estimate_upper)%>% .[5], big.mark = ",")`) in 2019/20 and to `r format(r_extrapolation$data %>% pull(extrapolation_estimate) %>% .[6], big.mark = ",")` (95% CI: `r format(r_extrapolation$data %>% pull(extrapolation_estimate_lower)%>% .[6], big.mark = ",")` - `r format(r_extrapolation$data %>% pull(extrapolation_estimate_upper)%>% .[6], big.mark = ",")`) in 2020/21 (Table&nbsp;\@ref(tab:extrapolation-tab-extrapolation), Fig. \@ref(fig:extrapolation)). The extrapolation done with the estimated treatment expenses is a little bit higher but similar to the reported expenses from the survey, see appendix Fig.&nbsp;\@ref(fig:extrapolation-estimate).

To evaluate how good the two means capture the expense distribution, the extrapolated mean sum if applied to the survey data was compared to the sum as reported by the participants, by calculating the total expenses for each participant by the reported expenses multiplied by colonies going into winter. It showed that the real survey sum is in-between the arithmetic and geometric mean (Table&nbsp;\@ref(tab:extrapolation-tab-survey)). 

For better comparison between years an extrapolation on a fixed number of 100.000 colonies was done. It showed similar mean sums with a maximum arithmetic mean difference of EUR&nbsp;122.030 and geometric mean maximum difference of EUR&nbsp;53.465 (Table&nbsp;\@ref(tab:extrapolation-tab-extrapolation)).

```{r extrapolation-tab-survey, include=T}
c <- "Column 'Sum-Mean' represents the multiplication of the average treatment expenses per colony with the total number of honey bee colonies wintered from the survey participants. Column 'Sum-Survey' is the observed total sum from the survey, where the total expenses for each participant was calculated by reported expenses multiplied by colonies going into winter and afterwards summarised for each year. Confidence intervals for point estimates are calculated with 1,000 bootstrap resamples. Two methods applied arithmetic mean and geometric mean."
cn <- c("Year", "Colonies [#]", "CI Lower [EUR]", "Mean [EUR]", "CI Higher [EUR]", "Mean [EUR]", "Survey [EUR]")
kable(
    r_extrapolation$data %>%
        select(year, colonies_survey, conf.low, statistic, conf.high, survey_estimate, survey_real) %>%
        mutate(
            across(conf.low:conf.high, fr, r = 1),
            colonies_survey = ft(colonies_survey),
            across(survey_estimate:survey_real, ~ .x %>%
                round() %>%
                ft())
        ),
    align = c("c", rep("r", 5)),
    booktabs = TRUE,
    row.names = FALSE,
    col.names = cn,
    caption = c,
    escape = TRUE,
) %>%
    add_header_above(c(" " = 5, "Sum" = 2)) %>%
    pack_rows("Arithmetic Mean", 1, 3) %>%
    pack_rows("Geometric Mean", 4, 6) %>%
    kable_styling(latex_options = "scale_down")
rm(c, cn)
```

```{r extrapolation-tab-extrapolation, include=T}
c <- "Extrapolation for Austria with the number of colonies as reported from the Austrian beekeeping federation 'Biene Österreich' by multiplication of the average treatment expenses per colony per year. Second extrapolation is based on 100.000 colonies. Confidence intervals for point estimates are calculated with 1,000 bootstrap resamples. Two methods applied arithmetic mean and geometric mean."
cn <- c("Year", "Colonies [N]", "CI Lower [EUR]", "Mean Sum [EUR]", "CI Upper [EUR]", "CI Lower [EUR]", "Mean Sum [EUR]", "CI Upper [EUR]")
kable(
    r_extrapolation$data_estimate %>%
        select(year, colonies, extrapolation_estimate_lower, extrapolation_estimate, extrapolation_estimate_upper, extrapolation_estimate_lower_100k, extrapolation_estimate_100k, extrapolation_estimate_upper_100k) %>%
        mutate(
            across(colonies:extrapolation_estimate_upper_100k, ~ .x %>%
                round() %>%
                ft())
        ),
    align = c("c", rep("r", 7)),
    booktabs = TRUE,
    row.names = FALSE,
    col.names = cn,
    caption = c,
    escape = TRUE,
) %>%
    add_header_above(c(" " = 1, "Extrapolation Population" = 4, "Extrapolation with 100.000 colonies" = 3)) %>%
    pack_rows("Arithmetic Mean", 1, 3) %>%
    pack_rows("Geometric Mean", 4, 6) %>%
    kable_styling(latex_options = "scale_down")
rm(c, cn)
```

(ref:extrapolation) Multiplication of the average treatment expenses per colony with the total number of honey bee colonies wintered from the survey participants and extrapolation for Austria with the number of colonies as reported from the national beekeeping association 'Biene Österreich'. Confidence intervals for point estimates are calculated with 1,000 bootstrap resamples. Two methods applied arithmetic mean and geometric mean. Basis of the calculation are the mean expenses for one colony, see Table&nbsp;\@ref(tab:extrapolation-tab-survey).
```{r extrapolation, include=T, fig.cap="(ref:extrapolation)", out.width="100%"}
include_custom("output/figs/extrapolation.png")
```