Draws a systematic sample of size n from a data frame. Each unit has the same probability of being selected, n / N, where N is the number of rows in the frame.
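A minimal sketch of equal-probability systematic sampling with a fractional interval. This is an assumption about how sys() selects units, inferred from the "Sampling interval (k)" and "Random start (r)" values it prints. The helper sys_positions() is hypothetical and not part of the package.

```r
# Hypothetical helper: returns the row positions of a systematic sample.
# Assumes a fractional-interval rule: k = N / n, random start r in (0, k),
# and unit j is selected when some point r + (i - 1) * k falls in (j - 1, j].
sys_positions <- function(N, n) {
  stopifnot(N >= 1, n >= 1, n <= N)
  k <- N / n                         # sampling interval, may be fractional
  r <- runif(1, min = 0, max = k)    # random start strictly inside (0, k)
  pos <- ceiling(r + (seq_len(n) - 1) * k)
  pmin(pos, N)                       # guard against floating-point overshoot
}

set.seed(1)
pos <- sys_positions(N = 2462, n = 50)   # k = 2462 / 50 = 49.24
```

Because every unit occupies an interval of length 1 and r is uniform over an interval of length k, each unit is hit with probability 1 / k = n / N, which matches the documented SelectionProbability.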
Arguments
- frame
A data.frame, tibble, or data.table containing the sampling frame. Must have at least one row.
- n
Integer. The desired sample size. Must be less than or equal to the number of rows in frame.
- curstrat
Character or NULL. Optional stratum name used in printed messages.
- outall
Logical. If TRUE, the full frame is returned with selection indicators; if FALSE, only the sampled rows are returned.
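The documented constraints on these arguments can be expressed as input checks. The sketch below is hypothetical: the function name check_sys_args() and its messages are illustrative, and the lower bound n >= 1 is an added assumption that the documentation does not state.

```r
# Hypothetical validation helper mirroring the documented argument constraints.
# data.frame, tibble, and data.table objects all inherit from "data.frame",
# so is.data.frame() accepts each of them.
check_sys_args <- function(frame, n, curstrat = NULL, outall = FALSE) {
  if (!is.data.frame(frame) || nrow(frame) < 1) {
    stop("`frame` must be a data frame with at least one row.")
  }
  # Assumption: n must also be a positive whole number
  if (!is.numeric(n) || length(n) != 1 || is.na(n) ||
      n != round(n) || n < 1 || n > nrow(frame)) {
    stop("`n` must be a whole number between 1 and nrow(frame).")
  }
  if (!is.null(curstrat) && !is.character(curstrat)) {
    stop("`curstrat` must be a character string or NULL.")
  }
  if (!is.logical(outall) || length(outall) != 1 || is.na(outall)) {
    stop("`outall` must be TRUE or FALSE.")
  }
  invisible(TRUE)
}
```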
Value
A data.table with the original columns plus:
- SelectionProbability
Equal to n / N for all units.
- SamplingWeight
Equal to N / n for all units.
- SelectionIndicator
TRUE if selected, FALSE otherwise. Only included if outall = TRUE.
- NumberHits
1 if selected, 0 otherwise.
- ExpectedHits
Equal to SelectionProbability.
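The relationships among the added columns can be sketched in base R. This is an illustration under assumptions, not the package's data.table implementation; add_sys_columns() is a hypothetical helper that takes already-selected row positions.

```r
# Hypothetical base-R sketch of the documented output columns.
# `selected` holds the row positions chosen by the systematic draw.
add_sys_columns <- function(frame, selected, n, outall = FALSE) {
  N <- nrow(frame)
  hit <- seq_len(N) %in% selected
  frame$SelectionProbability <- n / N            # same for every unit
  frame$SamplingWeight <- N / n                  # inverse of the probability
  frame$SelectionIndicator <- hit
  frame$NumberHits <- as.integer(hit)            # 1 if selected, 0 otherwise
  frame$ExpectedHits <- frame$SelectionProbability
  if (outall) {
    frame
  } else {
    out <- frame[hit, , drop = FALSE]
    out$SelectionIndicator <- NULL               # only kept when outall = TRUE
    out
  }
}

fr <- data.frame(id = 1:10)
s <- add_sys_columns(fr, selected = c(2, 7), n = 2)
full <- add_sys_columns(fr, selected = c(2, 7), n = 2, outall = TRUE)
```

Since every sampled unit carries the weight N / n, the weights of the n sampled rows sum to N, the frame size.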
References
Kalton, G. (1983). Introduction to Survey Sampling. SAGE Publications. https://doi.org/10.4135/9781412984683
Examples
# Sort by Region, Division, and Pop_Tot, then take a systematic sample
puma_2023 |>
  tidytable::arrange(Region, Division, Pop_Tot) |>
  sys(n = 50, outall = FALSE)
#> Frame size: 2462
#> Sample size: 50
#> Sampling interval (k): 49.24
#> Random start (r): 18.4339
#> # A tidytable: 50 × 27
#> GEOID Name State Region Division Pop_Tot Pop_Pct_White_NH Pop_Pct_Black_NH
#> <chr> <chr> <chr> <fct> <fct> <dbl> <dbl> <dbl>
#> 1 2300600 Andr… ME North… New Eng… 112323 88.2 4.61
#> 2 2500904 Norf… MA North… New Eng… 139589 64.4 12.8
#> 3 3401105 Monm… NJ North… Middle … 100369 74.5 3.82
#> 4 3603002 Rock… NY North… Middle … 109404 65.0 8.22
#> 5 3400903 Midd… NJ North… Middle … 117567 32.0 14.3
#> 6 4201803 Alle… PA North… Middle … 127233 88.2 3.32
#> 7 4203230 Phil… PA North… Middle … 139415 68.4 11.0
#> 8 4201200 Cent… PA North… Middle … 158041 83.5 3.21
#> 9 4201701 Pitt… PA North… Middle … 196078 58.1 24.8
#> 10 1802401 Mari… IN Midwe… East No… 109931 32.8 46.7
#> # ℹ 40 more rows
#> # ℹ 19 more variables: Pop_Pct_AIAN_NH <dbl>, Pop_Pct_Asian_NH <dbl>,
#> # Pop_Pct_NHPI_NH <dbl>, Pop_Pct_Other_NH <dbl>, Pop_Pct_Hispanic <dbl>,
#> # HU_Tot <dbl>, HU_Pct_Occupied <dbl>, HU_Pct_Vacant <dbl>,
#> # Pop_Pct_0004 <dbl>, Pop_Pct_0509 <dbl>, Pop_Pct_1014 <dbl>,
#> # Pop_Pct_2544 <dbl>, Pop_Pct_4564 <dbl>, Pop_Pct_6574 <dbl>,
#> # Pop_Pct_75plus <dbl>, Pop_Pct_1517 <dbl>, Pop_Pct_1824 <dbl>, …
# Return full dataset with selection indicators
puma_2023 |>
  tidytable::arrange(Region, Division, Pop_Tot) |>
  sys(n = 50, outall = TRUE)
#> Frame size: 2462
#> Sample size: 50
#> Sampling interval (k): 49.24
#> Random start (r): 43.60074
#> # A tidytable: 2,462 × 28
#> GEOID Name State Region Division Pop_Tot Pop_Pct_White_NH Pop_Pct_Black_NH
#> <chr> <chr> <chr> <fct> <fct> <dbl> <dbl> <dbl>
#> 1 2500505 Worc… MA North… New Eng… 99893 51.7 9.42
#> 2 2500903 Norf… MA North… New Eng… 101322 54.4 6.06
#> 3 2500705 Esse… MA North… New Eng… 104233 37.8 10.1
#> 4 0920902 West… CT North… New Eng… 104909 80.9 1.97
#> 5 2501101 Plym… MA North… New Eng… 105080 26.6 37.0
#> 6 2500603 Midd… MA North… New Eng… 105224 80.4 3.56
#> 7 2500504 Worc… MA North… New Eng… 105608 49.4 13.4
#> 8 3300602 Grea… NH North… New Eng… 106658 88.9 0.991
#> 9 0920301 Nort… CT North… New Eng… 107291 87.3 1.52
#> 10 2500606 Midd… MA North… New Eng… 107565 74.7 1.73
#> # ℹ 2,452 more rows
#> # ℹ 20 more variables: Pop_Pct_AIAN_NH <dbl>, Pop_Pct_Asian_NH <dbl>,
#> # Pop_Pct_NHPI_NH <dbl>, Pop_Pct_Other_NH <dbl>, Pop_Pct_Hispanic <dbl>,
#> # HU_Tot <dbl>, HU_Pct_Occupied <dbl>, HU_Pct_Vacant <dbl>,
#> # Pop_Pct_0004 <dbl>, Pop_Pct_0509 <dbl>, Pop_Pct_1014 <dbl>,
#> # Pop_Pct_2544 <dbl>, Pop_Pct_4564 <dbl>, Pop_Pct_6574 <dbl>,
#> # Pop_Pct_75plus <dbl>, Pop_Pct_1517 <dbl>, Pop_Pct_1824 <dbl>, …
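The examples sort the frame before sampling because a systematic sample taken from a sorted list spreads the selections across the sort groups roughly in proportion to their size (implicit stratification). The sketch below illustrates this with a made-up frame rather than puma_2023, and it assumes the same fractional-interval selection rule described earlier; whether sys() uses exactly this rule is an assumption.

```r
# Made-up frame with four regions of unequal size (not puma_2023)
set.seed(42)
frame <- data.frame(
  Region  = rep(c("Northeast", "Midwest", "South", "West"),
                times = c(300, 450, 800, 450)),
  Pop_Tot = round(runif(2000, 1e5, 2e5))
)
# Sorting puts each region in one contiguous block
frame <- frame[order(frame$Region, frame$Pop_Tot), ]

n <- 50
N <- nrow(frame)
k <- N / n                                   # 40
r <- runif(1, 0, k)
pos <- ceiling(r + (seq_len(n) - 1) * k)     # assumed selection rule
sampled <- frame[pos, ]

lev <- unique(frame$Region)
counts <- table(factor(sampled$Region, levels = lev))
expected <- table(factor(frame$Region, levels = lev)) * n / N
```

Each region's sample count differs from its proportional share by less than one unit, because an arithmetic sequence with step k hits a contiguous block of N_h rows either floor(N_h / k) or ceiling(N_h / k) times.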