Draws a systematic sample of size n. Each unit's probability of selection is proportional to its size measure.
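The selection scheme described above can be sketched in base R. This is a minimal illustration of textbook systematic PPS sampling, not the package's actual implementation: the helper name `pps_hits_sketch` and the toy sizes are hypothetical, and drawing the random start uniformly between 0 and k is an assumption.

```r
# Hypothetical helper: counts how many selection points land on each unit.
pps_hits_sketch <- function(size, n) {
  k <- sum(size) / n                    # sampling interval
  r <- runif(1, min = 0, max = k)       # random start (assumed uniform on (0, k))
  points <- r + k * (seq_len(n) - 1)    # n equally spaced selection points
  upper <- cumsum(size)                 # upper cumulative-size bound of each unit
  # Unit i owns the interval (upper[i - 1], upper[i]]; find each point's owner.
  owner <- findInterval(points, upper, left.open = TRUE) + 1
  hits <- tabulate(owner, nbins = length(size))
  list(k = k, r = r, hits = hits, expected = n * size / sum(size))
}

set.seed(1)
res <- pps_hits_sketch(size = c(10, 40, 5, 25, 20), n = 4)
res$hits      # number of hits per unit; the hits sum to n
res$expected  # expected hits per unit, n * size / sum(size)
```

In this sketch a unit whose size exceeds k is always hit at least once and may be hit several times, which is why the number of distinct selected units can be smaller than n.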
Arguments
- frame
The sampling frame, supplied as a data frame.
- n
The sample size, given as an integer of length 1. The function checks that n is less than or equal to the number of rows in frame.
- mos
The measure of size, given as a character string naming the variable to be used as the measure of size. The variable must exist on the frame and must be a non-missing, non-negative numeric variable.
- outall
Whether to output all records or only the selected records. If outall is TRUE, all records are returned and the following variables are created: SelectionIndicator, SamplingWeight, NumberHits, and ExpectedHits. If outall is FALSE, only the selected records are returned and the following variables are created: SamplingWeight, NumberHits, and ExpectedHits.
- curstrat
A character value that identifies the current stratum. It is used only in the assertion for the n == N test.
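The requirements documented for n and mos can be expressed as explicit checks. The following is a sketch only: the helper name `check_pps_args_sketch` and the toy frame are hypothetical, and the checks the function actually performs, along with their error messages, may differ.

```r
# Hypothetical helper mirroring the documented requirements on `n` and `mos`.
check_pps_args_sketch <- function(frame, n, mos) {
  stopifnot(
    is.data.frame(frame),
    is.numeric(n), length(n) == 1, n == round(n),  # integer of length 1
    n <= nrow(frame),                              # sample cannot exceed the frame
    is.character(mos), length(mos) == 1,
    mos %in% names(frame)                          # variable must exist on the frame
  )
  size <- frame[[mos]]
  # The measure of size must be numeric, non-missing, and non-negative.
  stopifnot(is.numeric(size), !anyNA(size), all(size >= 0))
  invisible(TRUE)
}

toy <- data.frame(id = 1:5, size = c(10, 40, 5, 25, 20))
ok <- check_pps_args_sketch(toy, n = 3, mos = "size")                    # passes silently
too_big <- try(check_pps_args_sketch(toy, n = 6, mos = "size"), silent = TRUE)
inherits(too_big, "try-error")                                          # n exceeds the frame size
```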
Value
Returns a tidytable that contains the sampling weight, selection probability, number of hits, and related variables, plus all original variables.
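To show how hit counts and weights might be combined in estimation, here is a small base R sketch. How the function defines SamplingWeight is not stated on this page, so the sketch assumes one common convention: expected hits equal n * size / sum(size), and the weight is the inverse of expected hits. The toy sizes and the hit pattern (one possible outcome for n = 4) are hypothetical.

```r
size <- c(10, 40, 5, 25, 20)          # toy measure of size for five units
n    <- 4                             # number of selection points
hits <- c(0, 2, 0, 1, 1)              # one possible outcome; sums to n

expected <- n * size / sum(size)      # assumed definition of expected hits
weight   <- 1 / expected              # assumed definition of the sampling weight

# Weighting each hit by the inverse of expected hits estimates a frame total.
# For the size variable itself, the estimate reproduces the total exactly.
estimate <- sum(hits * weight * size)
estimate
sum(size)
```

Under these assumptions, a unit hit twice contributes twice its weighted value, which is why NumberHits matters when selected records are used for estimation.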
Examples
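# PPS sample of 75 counties using Pop_Tot as the measure of size
# Return only the sampled counties; a county larger than the sampling
# interval can be hit more than once, so fewer than 75 rows may be returned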
sys_pps(county_2023, mos = "Pop_Tot", n = 75, outall = FALSE)
#> Frame size: 3144
#> Sample size: 75
#> Sampling interval (k): 4431834
#> Random start (r): 3657350
#> # A tidytable: 73 × 28
#> GEOID Name State Region Division Pop_Tot Pop_Pct_White_NH Pop_Pct_Black_NH
#> <chr> <chr> <chr> <fct> <fct> <dbl> <dbl> <dbl>
#> 1 01097 Mobile… AL South East So… 413162 55.3 35.9
#> 2 04013 Marico… AZ West Mountain 4491987 53.4 5.49
#> 3 04021 Pinal … AZ West Mountain 449219 55.9 4.86
#> 4 06001 Alamed… CA West Pacific 1651949 28.2 9.63
#> 5 06029 Kern C… CA West Pacific 910433 30.7 4.83
#> 6 06037 Los An… CA West Pacific 9848406 25.2 7.54
#> 7 06059 Orange… CA West Pacific 3164063 37.7 1.52
#> 8 06065 Rivers… CA West Pacific 2449909 32.0 6.12
#> 9 06073 San Di… CA West Pacific 3282782 43.2 4.44
#> 10 06077 San Jo… CA West Pacific 787416 27.9 6.68
#> # ℹ 63 more rows
#> # ℹ 20 more variables: Pop_Pct_AIAN_NH <dbl>, Pop_Pct_Asian_NH <dbl>,
#> # Pop_Pct_NHPI_NH <dbl>, Pop_Pct_Other_NH <dbl>, Pop_Pct_Hispanic <dbl>,
#> # HU_Tot <dbl>, HU_Pct_Occupied <dbl>, HU_Pct_Vacant <dbl>,
#> # Pop_Pct_0004 <dbl>, Pop_Pct_0509 <dbl>, Pop_Pct_1014 <dbl>,
#> # Pop_Pct_2544 <dbl>, Pop_Pct_4564 <dbl>, Pop_Pct_6574 <dbl>,
#> # Pop_Pct_75plus <dbl>, Pop_Pct_1517 <dbl>, Pop_Pct_1824 <dbl>, …
# Return the full dataset with selection indicators
sys_pps(county_2023, mos = "Pop_Tot", n = 75, outall = TRUE)
#> Frame size: 3144
#> Sample size: 75
#> Sampling interval (k): 4431834
#> Random start (r): 3563587
#> # A tidytable: 3,144 × 29
#> GEOID Name State Region Division Pop_Tot Pop_Pct_White_NH Pop_Pct_Black_NH
#> <chr> <chr> <chr> <fct> <fct> <dbl> <dbl> <dbl>
#> 1 01001 Autaug… AL South East So… 59285 71.7 20.0
#> 2 01003 Baldwi… AL South East So… 239945 81.4 7.94
#> 3 01005 Barbou… AL South East So… 24757 43.7 46.9
#> 4 01007 Bibb C… AL South East So… 22152 73.7 20.7
#> 5 01009 Blount… AL South East So… 59292 85.0 1.26
#> 6 01011 Bulloc… AL South East So… 10157 21.1 71.2
#> 7 01013 Butler… AL South East So… 18807 50.7 44.7
#> 8 01015 Calhou… AL South East So… 116141 69.8 21.6
#> 9 01017 Chambe… AL South East So… 34450 53.9 39.7
#> 10 01019 Cherok… AL South East So… 25224 90.7 3.73
#> # ℹ 3,134 more rows
#> # ℹ 21 more variables: Pop_Pct_AIAN_NH <dbl>, Pop_Pct_Asian_NH <dbl>,
#> # Pop_Pct_NHPI_NH <dbl>, Pop_Pct_Other_NH <dbl>, Pop_Pct_Hispanic <dbl>,
#> # HU_Tot <dbl>, HU_Pct_Occupied <dbl>, HU_Pct_Vacant <dbl>,
#> # Pop_Pct_0004 <dbl>, Pop_Pct_0509 <dbl>, Pop_Pct_1014 <dbl>,
#> # Pop_Pct_2544 <dbl>, Pop_Pct_4564 <dbl>, Pop_Pct_6574 <dbl>,
#> # Pop_Pct_75plus <dbl>, Pop_Pct_1517 <dbl>, Pop_Pct_1824 <dbl>, …