Calculates Horvitz-Thompson estimate with different methods for variance estimation such as Yates and Grundy, Hansen-Hurwitz and Hajek.

htestimate(y, N, PI, pk, pik, method = 'yg')

Arguments

y

vector of observations

N

integer for population size

PI

square matrix of second order inclusion probabilities with n rows and cols. It is necessary to be specified for variance estimation by methods 'ht' and 'yg'.

pk

vector of first order inclusion probabilities of length n for the sample elements. It is necessary to be specified for variance estimation by methods 'hh' and 'ha'.

pik

an optional vector of first order inclusion probabilities of length N for the population elements . It can be used for variance estimation by method 'ha'.

method

method to be used for variance estimation. Options are 'yg' (Yates and Grundy) and 'ht' (Horvitz-Thompson), approximate options are 'hh' (Hansen-Hurwitz) and 'ha' (Hajek).

Details

For using methods 'yg' or 'ht' has to be provided matrix PI, and for 'hh' and 'ha' has to be specified vector pk of inclusion probabilities. Additionally, for Hajek method 'ha' can be specified pik. Unless, an approximate Hajek method is used.

Value

The function htestimate returns a value, which is a list consisting of the components

call

is a list of call components: y observations, N population size, PI inclusion probabilities, pk inclusion probabilities of sample, pik full inclusion probabilities and method method for variance estimation

mean

mean estimate

se

standard error of the mean estimate

References

Kauermann, Goeran/Kuechenhoff, Helmut (2010): Stichproben. Methoden und praktische Umsetzung mit R. Springer.

Author

Juliane Manitz

See also

Examples

data(influenza) summary(influenza)
#> id district population cases #> Min. : 1001 LK Aachen : 1 Min. : 34719 Min. : 0.00 #> 1st Qu.: 5877 LK Ahrweiler : 1 1st Qu.: 104553 1st Qu.: 9.00 #> Median : 8331 LK Aichach-Friedberg: 1 Median : 145130 Median : 27.00 #> Mean : 8468 LK Alb-Donau-Kreis : 1 Mean : 193910 Mean : 44.58 #> 3rd Qu.: 9778 LK Altenburger Land : 1 3rd Qu.: 244154 3rd Qu.: 59.00 #> Max. :16077 LK Altenkirchen : 1 Max. :1770629 Max. :410.00 #> (Other) :418
# pps.sampling() set.seed(108506) pps <- pps.sampling(z=influenza$population,n=20,method='midzuno') sample <- influenza[pps$sample,] # htestimate() N <- nrow(influenza) # exact variance estimate PI <- pps$PI htestimate(sample$cases, N=N, PI=PI, method='yg')
#> #> htestimate object: Estimator for samples with probabilities proportional to size #> Method of Yates and Grundy: #> #> Mean estimator: 40.36766 #> Standard Error: 8.059507 #>
htestimate(sample$cases, N=N, PI=PI, method='ht')
#> #> htestimate object: Estimator for samples with probabilities proportional to size #> Method of Horvitz-Thompson: #> #> Mean estimator: 40.36766 #> Standard Error: 8.227719 #>
# approximate variance estimate pk <- pps$pik[pps$sample] htestimate(sample$cases, N=N, pk=pk, method='hh')
#> #> htestimate object: Estimator for samples with probabilities proportional to size #> Method of Hansen-Hurwitz (approximate variance): #> #> Mean estimator: 40.36766 #> Standard Error: 8.534792 #>
pik <- pps$pik htestimate(sample$cases, N=N, pk=pk, pik=pik, method='ha')
#> #> htestimate object: Estimator for samples with probabilities proportional to size #> Method of Hajek (approximate variance): #> #> Mean estimator: 40.36766 #> Standard Error: 8.262482 #>
# without pik just approximate calculation of Hajek method htestimate(sample$cases, N=N, pk=pk, method='ha')
#> Warning: Without input of 'pik' just approximative calculation of Hajek method is possible.
#> #> htestimate object: Estimator for samples with probabilities proportional to size #> Method of Hajek (approximate variance): #> #> Mean estimator: 40.36766 #> Standard Error: 8.244296 #>
# calculate confidence interval based on normal distribution for number of cases est.ht <- htestimate(sample$cases, N=N, PI=PI, method='ht') est.ht$mean*N
#> [1] 17115.89
lower <- est.ht$mean*N - qnorm(0.975)*N*est.ht$se upper <- est.ht$mean*N + qnorm(0.975)*N*est.ht$se c(lower,upper)
#> [1] 10278.45 23953.33
# true number of influenza cases sum(influenza$cases)
#> [1] 18900