| Title: | Quantile-Based Inequality Indicators for Complex Survey Data |
|---|---|
| Description: | Estimates quantile-based inequality indicators from complex survey data, including the quantile ratio index (QRI), quintile share ratio (QSR), Palma ratio, and percentile ratios, together with the Gini coefficient. Influence functions are provided for linearization and variance estimation, along with a rescaled bootstrap for complex sampling designs. Estimation from grouped data is also supported. See Scarpa et al. (2025) <doi:10.1093/jssam/smaf024> for details. |
| Authors: | Silvia Scarpa [aut, cre, cph] (ORCID: <https://orcid.org/0000-0003-4003-1621>), Stefan Sperlich [aut] |
| Maintainer: | Silvia Scarpa |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-06-04 11:12:22 UTC |
| Source: | https://github.com/silviascarpa/inequantiles |
Computes quantiles for weighted or unweighted data, allowing for sampling weights and several interpolation types. The method extends the quantile definitions of Hyndman and Fan (1996) and the Harrell and Davis (1982) estimator to complex survey data by incorporating sampling weights into the cumulative distribution function and interpolation points, as proposed in Scarpa et al. (2025).
    csquantile(y, weights = NULL, probs = seq(0, 1, 0.1), type = 4, na.rm = FALSE)
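| Argument | Description |
|---|---|
| `y` | Numeric vector of observations. |
| `weights` | Optional numeric vector of sampling weights; if `NULL` (default), the data are treated as unweighted. |
| `probs` | Numeric vector of probabilities (default: `seq(0, 1, 0.1)`). |
| `type` | Quantile estimation type: integer `4` to `9`, or `"HD"` for the Harrell-Davis estimator (default: `4`). |
| `na.rm` | Logical; if `TRUE`, missing values are removed before computation (default: `FALSE`). |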
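Consider a random sample $s$ of size $n$. Let $y_1, \dots, y_n$ be the sample observations from a finite population,
with order statistics $y_{(1)} \le \dots \le y_{(n)}$ and corresponding sampling
weights $w_{(1)}, \dots, w_{(n)}$. Define the cumulative weights $W_k = \sum_{j=1}^{k} w_{(j)}$
and the total weight $W_n = \sum_{j=1}^{n} w_{(j)}$.
The weighted quantile estimator is computed as a linear interpolation between
adjacent order statistics:

$$\hat{Q}(p) = y_{(k-1)} + \frac{p - \hat{F}_{k-1}}{\hat{F}_{k} - \hat{F}_{k-1}} \left( y_{(k)} - y_{(k-1)} \right),$$

where $\hat{F}_k$ denotes the estimated cumulative distribution function at $y_{(k)}$
(the “plotting position”), and the order $k$ is such that
$\hat{F}_{k-1} \le p < \hat{F}_{k}$,
with the plotting positions determined by the interpolation method.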
The table below summarizes the six interpolation types (4–9) extended from Hyndman and Fan (1996) to incorporate sampling weights, as described in Scarpa et al. (2025).
| Type | Estimator | Interpolation | Selection rule for $k$ |
|---|---|---|---|
| 4 | | 0 | |
| 5 | | | |
| 6 | | | |
| 7 | | | |
| 8 | | | |
| 9 | | | |
The function supports several interpolation rules (types 4–9) and extends the quantile definitions in Hyndman and Fan (1996) to incorporate sampling weights. For unweighted data, the function returns the standard R quantiles.
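The idea of interpolating the weighted cumulative distribution can be illustrated with a minimal base-R sketch. It is not the package implementation: the function name `weighted_quantile_sketch` is hypothetical, and the plotting position is assumed to be the cumulative weight share (a type-4-like rule), so that equal weights reproduce `quantile(..., type = 4)`.

```r
# Weighted quantile by linear interpolation of the weighted ECDF.
# Assumption: plotting position F_k = W_k / W_n (type-4-like rule).
weighted_quantile_sketch <- function(y, w = rep(1, length(y)), probs = 0.5) {
  stopifnot(length(y) == length(w), length(y) >= 2, all(w > 0))
  ord <- order(y)
  ys <- y[ord]                       # order statistics
  Fk <- cumsum(w[ord]) / sum(w)      # cumulative weight shares
  # rule = 2 returns the smallest observation when p lies below F_1
  approx(x = Fk, y = ys, xout = probs, rule = 2)$y
}

y <- c(12, 7, 30, 18, 25)
w <- c(1, 3, 1, 2, 1)
weighted_quantile_sketch(y, w, probs = c(0.25, 0.5, 0.75))
```

Other interpolation types would change only how the positions `Fk` are built from the cumulative weights.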
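The Harrell–Davis estimator ("HD") is extended to the weighted case as
proposed in Kreutzmann (2018), by redefining the weighting coefficients
for order statistics as:

$$\omega_k = I_{W_k / W_n}(a, b) - I_{W_{k-1} / W_n}(a, b), \qquad a = (n + 1)\,p, \quad b = (n + 1)(1 - p),$$

where $I_x(a, b)$ denotes the incomplete beta function, $W_k$ is the cumulative weight of the first $k$ order statistics, and $W_0 = 0$.
The resulting quantile estimator is

$$\hat{Q}_{HD}(p) = \sum_{k=1}^{n} \omega_k \, y_{(k)}.$$

For unweighted data, the function returns the Harrell-Davis quantile estimator.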
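A weighted Harrell-Davis estimate can be sketched in a few lines with `pbeta()`. This is only an illustration under stated assumptions: the name `hd_quantile_sketch` is hypothetical, the Beta parameters are taken from the unweighted Harrell and Davis (1982) estimator, and cumulative weight shares are assumed to replace the usual `k / n` grid.

```r
# Weighted Harrell-Davis sketch.
# Assumptions: a = (n + 1) p and b = (n + 1)(1 - p) as in the unweighted
# estimator; cumulative weight shares replace k / n.
hd_quantile_sketch <- function(y, w = rep(1, length(y)), p = 0.5) {
  n <- length(y)
  ord <- order(y)
  shares <- c(0, cumsum(w[ord]) / sum(w))   # W_0 / W_n, ..., W_n / W_n
  a <- (n + 1) * p
  b <- (n + 1) * (1 - p)
  omega <- diff(pbeta(shares, a, b))        # coefficients, summing to 1
  sum(omega * y[ord])
}

hd_quantile_sketch(c(1, 2, 3), p = 0.5)
```

Because every order statistic receives a positive coefficient, the estimate is smoother than an interpolation between two adjacent observations.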
A named numeric vector of estimated quantiles corresponding to
probs.
Harrell FE, Davis CE (1982). “A new distribution-free quantile estimator.” Biometrika, 69, 635–640.
Hyndman RJ, Fan Y (1996). “Sample quantiles in statistical packages.” The American Statistician, 50, 361–365.
Kreutzmann AK (2018). “Estimation of sample quantiles: challenges and issues in the context of income and wealth distributions.” AStA Wirtschafts-und Sozialstatistisches Archiv, 12, 245–270.
Scarpa S, Ferrante MR, Sperlich S (2025). “Inference for the quantile ratio inequality index in the context of survey data.” Journal of Survey Statistics and Methodology. doi:10.1093/jssam/smaf024.
    data(synthouse)
    y <- synthouse$eq_income
    w <- synthouse$weight

    # Unweighted quantiles
    csquantile(y, probs = c(0.25, 0.5, 0.75), type = 6)

    # Weighted quantiles
    csquantile(y, weights = w, probs = c(0.25, 0.5, 0.75), type = 6)

    # Harrell-Davis estimator
    csquantile(y, weights = w, probs = c(0.25, 0.5, 0.75), type = "HD")
Computes the Gini coefficient from grouped income data based on linear interpolation of income shares.
    gini_grouped(Y, freq)
| Argument | Description |
|---|---|
| `Y` | Numeric vector of total amounts per group (e.g., total income per income class). |
| `freq` | Numeric vector of frequencies per group or class. |
Consider grouped data divided into $K$ classes with known boundaries,
observed frequencies $f_k$ and total amounts $Y_k$, $k = 1, \dots, K$.
The Gini coefficient is approximated by linear interpolation of cumulative shares, as:

$$G = 1 - \sum_{k=1}^{K} p_k \left( L_{k-1} + L_k \right),$$

where:
$p_k = f_k / \sum_{j=1}^{K} f_j$ is the population share of group $k$;
$s_k = Y_k / \sum_{j=1}^{K} Y_j$ is the share of the variable of interest in group $k$;
$L_k = \sum_{j \le k} s_j$ is the cumulative share of the variable up to group $k$;
$P_k = \sum_{j \le k} p_j$ is the cumulative population share up to group $k$, so that $p_k = P_k - P_{k-1}$;
$L_0 = P_0 = 0$ by convention.
This formula computes twice the area between the egalitarian line (perfect equality)
and the Lorenz curve obtained by linearly interpolating the points $(P_k, L_k)$.
Since it assumes all observations within a group have identical values, it provides
a lower-bound estimate of the true Gini coefficient; actual inequality may be larger
(Jorda et al. 2021). The bias magnitude depends on the number of groups and how they are defined.
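The trapezoid construction described above can be sketched in base R. The function name `gini_grouped_sketch` is hypothetical, and sorting the classes by their mean amount before building the Lorenz curve is an assumption of this sketch, not a documented behaviour of `gini_grouped()`.

```r
# Grouped-data Gini via the trapezoid rule on the piecewise-linear Lorenz curve.
gini_grouped_sketch <- function(Y, freq) {
  stopifnot(length(Y) == length(freq), all(freq > 0))
  ord <- order(Y / freq)          # assumption: classes sorted by mean amount
  p <- freq[ord] / sum(freq)      # population shares
  L <- cumsum(Y[ord]) / sum(Y)    # cumulative shares of the variable
  L_prev <- c(0, head(L, -1))     # L_0 = 0 by convention
  1 - sum(p * (L_prev + L))
}

# Two equally sized groups holding 25% and 75% of the total
gini_grouped_sketch(Y = c(1, 3), freq = c(1, 1))   # 0.25
```

With identical group means the Lorenz points fall on the diagonal and the sketch returns zero (up to floating-point error).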
A numeric value representing the estimated Gini coefficient on grouped data. The Gini coefficient ranges from 0 (perfect equality) to 1 (complete inequality). Note that it assumes equality within groups.
Jorda V, Sarabia JM, Jäntti M (2021). “Inequality measurement with grouped data: Parametric and non-parametric methods.” Journal of the Royal Statistical Society Series A: Statistics in Society, 184(3), 964–984.
qri_grouped for computing the quantile ratio index from grouped data.
Other grouped data functions:
qri_grouped(),
quantile_grouped()
    income_freq <- c(1200, 1800, 1500, 800, 400, 20, 10)
    income_tot <- c(18800, 16300, 44700, 33900, 21500, 22100, 98300)
    gini_grouped(Y = income_tot, freq = income_freq)
Computes the influence function for the Gini coefficient, useful for variance estimation and linearization in complex survey designs (Langel and Tillé 2013).
    if_gini(y, weights = NULL, na.rm = TRUE)
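| Argument | Description |
|---|---|
| `y` | Numeric vector of income or variable of interest. |
| `weights` | Numeric vector of sampling weights. If `NULL` (default), the data are treated as unweighted. |
| `na.rm` | Logical. Should missing values be removed? Default is `TRUE`. |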
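The influence function for the Gini coefficient is computed using the linearization method, following the framework of Deville (1999) and as defined by Langel and Tillé (2013). The influence function for the Gini coefficient is:

$$z_k = \frac{1}{\hat{N}\hat{Y}} \left[ 2 \hat{N}_k \left( y_k - \bar{Y}_k \right) + \hat{Y} - \hat{N} y_k - \hat{G} \left( \hat{Y} + y_k \hat{N} \right) \right],$$

where:
$\hat{N}_k$ is the cumulative sum of weights up to rank $k$
$\bar{Y}_k$ is the weighted mean of values up to rank $k$
$\hat{N}$ is the total sum of weights
$\hat{Y}$ is the weighted total of the variable
$\hat{G}$ is the Gini coefficient estimate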
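The quantities listed above map directly onto cumulative sums over the sorted sample. The following base-R sketch follows the linearized variable of Langel and Tillé (2013) as best understood here. The function name `if_gini_sketch` is hypothetical, and the Gini estimate inside it uses a weighted cumulative-sum formula that is assumed, not confirmed, to match the package.

```r
# Linearized variable for the Gini coefficient (sketch, distinct y values assumed).
if_gini_sketch <- function(y, w = rep(1, length(y))) {
  ord <- order(y)
  ys <- y[ord]
  ws <- w[ord]
  N <- sum(ws)                        # total sum of weights
  Y <- sum(ws * ys)                   # weighted total of the variable
  Nk <- cumsum(ws)                    # cumulative weights up to rank k
  Ybar_k <- cumsum(ws * ys) / Nk      # weighted mean up to rank k
  # Assumed weighted Gini estimator based on cumulative weight sums
  G <- 2 * sum(ws * Nk * ys) / (N * Y) - 1 - sum(ws^2 * ys) / (N * Y)
  z_sorted <- (2 * Nk * (ys - Ybar_k) + Y - N * ys - G * (Y + ys * N)) / (N * Y)
  z <- numeric(length(y))
  z[ord] <- z_sorted                  # back to the input order
  z
}

y <- c(10, 25, 4, 60, 33)
w <- c(2, 1, 3, 1, 2)
z <- if_gini_sketch(y, w)
sum(w * z)   # the weighted total of the linearized variable is zero
```

A design-based variance estimate would then treat `z` as the study variable of a total under the sampling design.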
A numeric vector of the same length as y containing the
influence function values for each observation, returned in the same order
as the input y.
Deville J (1999). “Variance estimation for complex statistics and estimators: linearization and residual techniques.” Survey methodology, 25, 193–204.
Langel M, Tillé Y (2013). “Variance estimation of the Gini index: revisiting a result several times published.” Journal of the Royal Statistical Society Series A, 176, 521–540.
Other influence functions:
if_qri(),
if_quantile(),
if_ratio_quantiles(),
if_share_ratio()
    data(synthouse)
    eq <- synthouse$eq_income  # Equivalized disposable income

    # Simple example
    z <- if_gini(eq)

    # With weights
    w <- synthouse$weight
    z_weighted <- if_gini(y = eq, weights = w)
Computes the influence function of the quantile ratio index (QRI) for every observation in a finite-population setting, as defined in Scarpa et al. (2025), under simple and complex sampling. See Deville (1999) for an introduction to influence functions in finite population theory.
    if_qri(y, weights = NULL, type = 6, na.rm = TRUE)
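| Argument | Description |
|---|---|
| `y` | A numeric vector of data values. |
| `weights` | A numeric vector of sampling weights (optional). If `NULL` (default), the data are treated as unweighted. |
| `type` | Quantile estimation type, as accepted by `csquantile()` (default: `6`). |
| `na.rm` | Logical. Should missing values be removed? Default is `TRUE`. |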
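The influence function for the QRI is computed on each observation as

$$z_k = -\int_0^1 \frac{1}{\hat{Q}(1 - p/2)} \left[ \frac{p/2 - \mathbb{1}\{y_k \le \hat{Q}(p/2)\}}{\hat{N}\,\hat{f}(\hat{Q}(p/2))} - \frac{\hat{Q}(p/2)}{\hat{Q}(1 - p/2)} \cdot \frac{1 - p/2 - \mathbb{1}\{y_k \le \hat{Q}(1 - p/2)\}}{\hat{N}\,\hat{f}(\hat{Q}(1 - p/2))} \right] dp,$$

where:
$\hat{Q}(p)$ is the weighted sample quantile of order $p$,
computed using the internal function csquantile(),
$\hat{f}$ denotes the estimated income density function,
$\hat{N} = \sum_{j \in s} w_j$ is the estimated population size, where
$w_j$ is the sampling weight associated with the $j$-th individual.
The density function is estimated via a Gaussian kernel smoother:

$$\hat{f}(y) = \frac{1}{\hat{N} h} \sum_{j \in s} w_j \, K\!\left( \frac{y - y_j}{h} \right),$$

where $K$ is the Gaussian kernel.
The bandwidth $h$ is chosen as a function of the interquartile range of the weighted sample.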
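For equally weighted data, the construction can be sketched with base R alone. This is a rough illustration, not the package code. The name `if_qri_sketch` is hypothetical, the average over quantile ratios is approximated on a midpoint grid of `M` points, quantiles come from `quantile(type = 6)`, and `bw.nrd0()` is a stand-in bandwidth because the package's bandwidth rule is not reproduced here.

```r
# QRI influence values for an unweighted sample (sketch).
if_qri_sketch <- function(y, M = 100) {
  n <- length(y)
  h <- bw.nrd0(y)                                 # stand-in bandwidth (assumption)
  f_hat <- function(q) mean(dnorm((q - y) / h)) / h
  if_q <- function(p) {                           # quantile and its influence values
    q <- unname(quantile(y, p, type = 6))
    list(q = q, z = (p - (y <= q)) / (n * f_hat(q)))
  }
  p_grid <- (seq_len(M) - 0.5) / M                # midpoint grid on (0, 1)
  z <- numeric(n)
  for (p in p_grid) {
    lo <- if_q(p / 2)                             # numerator quantile
    hi <- if_q(1 - p / 2)                         # denominator quantile
    # delta method for the ratio, with a minus sign because QRI = 1 - mean ratio
    z <- z - (lo$z - (lo$q / hi$q) * hi$z) / hi$q / M
  }
  z
}

set.seed(1)
z <- if_qri_sketch(rlnorm(200, 9, 0.7))
summary(z)
```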
A numeric vector of influence function values (one per observation),
returned in the same order as the input y.
Deville J (1999). “Variance estimation for complex statistics and estimators: linearization and residual techniques.” Survey methodology, 25, 193–204.
Scarpa S, Ferrante MR, Sperlich S (2025). “Inference for the quantile ratio inequality index in the context of survey data.” Journal of Survey Statistics and Methodology. doi:10.1093/jssam/smaf024.
qri for the QRI inequality indicator estimator, csquantile
for weighted quantile estimation.
Other influence functions:
if_gini(),
if_quantile(),
if_ratio_quantiles(),
if_share_ratio()
    # On synthetic data
    eq_synth <- rlnorm(30, 9, 0.7)
    IF_synth <- if_qri(y = eq_synth)

    # On real data
    data(synthouse)
    eq <- synthouse$eq_income[1:30]
    w <- synthouse$weight[1:30]
    IF_qri <- if_qri(y = eq, weights = w, type = 6)
Computes the influence function of sample quantiles in a finite-population setting, allowing for both simple random sampling and complex survey designs with sampling weights. See Hampel et al. (1986) for an explanation of the influence function and Deville (1999) for its definition in finite population theory.
    if_quantile(y, weights = NULL, probs, type = 6, na.rm = TRUE)
| Argument | Description |
|---|---|
| `y` | A numeric vector of data values. |
| `weights` | A numeric vector of sampling weights (optional). |
| `probs` | A numeric value specifying the probability for the quantile (e.g., 0.5 for the median). |
| `type` | Quantile estimation type, as accepted by `csquantile()` (default: `6`). |
| `na.rm` | Logical; should missing values be removed? (default: `TRUE`) |
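Following Van der Vaart (2000) and Osier (2009),
the population influence function of the quantile $Q(p)$ is defined as:

$$I_k\big(Q(p)\big) = \frac{p - \mathbb{1}\{y_k \le Q(p)\}}{N \, f(Q(p))},$$

where $f(Q(p))$ is the population density function evaluated at the quantile and
$N$ is the population size.
In the sample, this is estimated as:

$$\hat{z}_k = \frac{p - \mathbb{1}\{y_k \le \hat{Q}(p)\}}{\hat{N} \, \hat{f}(\hat{Q}(p))},$$

where $\hat{Q}(p)$ is the weighted sample quantile estimated by
csquantile(), and $\hat{N} = \sum_{j \in s} w_j$ is the estimated population size.
The density is estimated using a Gaussian kernel density function:

$$\hat{f}(y) = \frac{1}{\hat{N} h} \sum_{j \in s} w_j \, K\!\left( \frac{y - y_j}{h} \right),$$

with bandwidth $h$.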
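The two ingredients, a weighted kernel density evaluated at the quantile and the indicator-based influence value, can be sketched as follows. The helper name `if_quantile_sketch` is hypothetical. The quantile `q` and bandwidth `h` are passed in by the caller, and `bw.nrd0()` is used in the example only as a stand-in, since the package's own bandwidth rule is not reproduced here.

```r
# Influence values of a quantile with a weighted Gaussian kernel density (sketch).
if_quantile_sketch <- function(y, w, p, q, h) {
  N_hat <- sum(w)                                   # estimated population size
  f_q <- sum(w * dnorm((q - y) / h)) / (N_hat * h)  # weighted density at q
  (p - as.numeric(y <= q)) / (N_hat * f_q)
}

set.seed(42)
y <- rlnorm(50, 9, 0.7)
w <- rep(20, 50)                                    # equal weights, N_hat = 1000
q_med <- unname(quantile(y, 0.5, type = 6))
z <- if_quantile_sketch(y, w, p = 0.5, q = q_med, h = bw.nrd0(y))
range(z)
```

Observations at or below the quantile receive one negative value and those above it one positive value, so the influence function of a single quantile takes only two distinct values.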
A numeric vector containing the estimated influence function values for each observation.
Hampel FR, Ronchetti E, Rousseeuw P, Stahel W (1986). Robust statistics: the approach based on influence functions. John Wiley & Sons.
Deville J (1999). “Variance estimation for complex statistics and estimators: linearization and residual techniques.” Survey methodology, 25, 193–204.
Van der Vaart AW (2000). Asymptotic statistics, volume 3. Cambridge University Press.
Osier G (2009). “Variance estimation for complex indicators of poverty and inequality using linearization techniques.” Survey Research Methods, 3, 167–195.
csquantile for weighted quantile estimation.
Other influence functions:
if_gini(),
if_qri(),
if_ratio_quantiles(),
if_share_ratio()
    # On synthetic data
    eq_synth <- rlnorm(30, 9, 0.7)
    IF_synth <- if_quantile(y = eq_synth, probs = 0.3)

    # On real data
    data(synthouse)
    eq <- synthouse$eq_income[1:30]  # First 30 observations
    w <- synthouse$weight[1:30]
    IF_quantile <- if_quantile(y = eq, weights = w, type = 6, probs = 0.5)
Computes the influence function of the ratio between two quantiles (e.g., P90/P10) for all observations in the sample. See Deville (1999) and Osier (2009) for the definition of influence functions in finite population theory.
    if_ratio_quantiles(
      y,
      weights = NULL,
      type = 6,
      prob_numerator = 0.9,
      prob_denominator = 0.1,
      na.rm = TRUE
    )
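| Argument | Description |
|---|---|
| `y` | A numeric vector of data values. |
| `weights` | A numeric vector of sampling weights (optional). If `NULL` (default), the data are treated as unweighted. |
| `type` | Quantile estimation type, as accepted by `csquantile()` (default: `6`). |
| `prob_numerator` | Numeric in (0, 1); order of the numerator quantile (default: `0.9`). |
| `prob_denominator` | Numeric in (0, 1); order of the denominator quantile (default: `0.1`). |
| `na.rm` | Logical; remove missing values before computing? Default: `TRUE`. |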
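The influence function for the ratio $R = Q(p_1) / Q(p_2)$
is derived via the delta method applied to the quantile influence function of
Deville (1999):

$$z_k = \frac{1}{\hat{Q}(p_2)} \left[ \frac{p_1 - \mathbb{1}\{y_k \le \hat{Q}(p_1)\}}{\hat{N}\,\hat{f}(\hat{Q}(p_1))} - \frac{\hat{Q}(p_1)}{\hat{Q}(p_2)} \cdot \frac{p_2 - \mathbb{1}\{y_k \le \hat{Q}(p_2)\}}{\hat{N}\,\hat{f}(\hat{Q}(p_2))} \right],$$

where:
$\hat{Q}(p)$ is the weighted sample quantile of order $p$,
computed via csquantile,
$p_1$ and $p_2$ are the orders of the numerator and denominator
quantiles, respectively,
$\hat{f}$ is the estimated density function,
$\hat{N}$ is the estimated population size.
The density is estimated via a Gaussian kernel:

$$\hat{f}(y) = \frac{1}{\hat{N} h} \sum_{j \in s} w_j \, K\!\left( \frac{y - y_j}{h} \right),$$

with bandwidth $h$.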
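The delta-method step can be written out explicitly. The notation here is chosen for illustration only: $Q_1$ and $Q_2$ stand for the numerator and denominator quantiles, and $z_k^{(1)}$, $z_k^{(2)}$ for their influence values at observation $k$.

```latex
R = \frac{Q_1}{Q_2}
\quad\Longrightarrow\quad
dR = \frac{1}{Q_2}\, dQ_1 - \frac{Q_1}{Q_2^{2}}\, dQ_2 ,
\qquad\text{so that}\qquad
z_k^{(R)} = \frac{1}{Q_2} \left( z_k^{(1)} - R \, z_k^{(2)} \right).
```

The influence value of the ratio is therefore a linear combination of the two quantile influence values, with coefficients evaluated at the estimated quantiles.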
A numeric vector of influence function values, one per observation.
Deville J (1999). “Variance estimation for complex statistics and estimators: linearization and residual techniques.” Survey methodology, 25, 193–204.
Osier G (2009). “Variance estimation for complex indicators of poverty and inequality using linearization techniques.” Survey Research Methods, 3, 167–195.
Other influence functions:
if_gini(),
if_qri(),
if_quantile(),
if_share_ratio()
    # On synthetic data
    set.seed(1)
    eq_synth <- rlnorm(30, 9, 0.7)
    IF_synth <- if_ratio_quantiles(y = eq_synth,
                                   prob_numerator = 0.80,
                                   prob_denominator = 0.20)

    # On survey data
    data(synthouse)
    eq <- synthouse$eq_income[1:30]
    w <- synthouse$weight[1:30]
    IF_vals <- if_ratio_quantiles(y = eq, weights = w, type = 6)
Estimates one or more quantile-based inequality indicators simultaneously — QRI, quantile-based share ratio (QSR, Palma, or custom), percentile ratio — together with the Gini coefficient as a widely used benchmark. When standard errors are requested, all indicators are evaluated on the same bootstrap replicates, ensuring full comparability.
    inequantiles(
      y, weights = NULL, indicators = "all", se = FALSE,
      type = 6, na.rm = TRUE, M = 100, B = 200, seed = NULL,
      data = NULL, strata = NULL, psu = NULL,
      N_h = NULL, m_h = NULL, verbose = TRUE
    )

    ## S3 method for class 'inequantiles'
    print(x, digits = 4, ...)
| Argument | Description |
|---|---|
| `y` | A numeric vector of strictly positive values (e.g. income, wealth, expenditure). |
| `weights` | A numeric vector of sampling weights. If `NULL` (default), the data are treated as unweighted. |
| `indicators` | Character vector specifying which indicators to compute. Use `"all"` (default) to compute every available indicator. |
| `se` | Logical; if `TRUE`, bootstrap standard errors are computed (default: `FALSE`). |
| `type` | Quantile estimation type, as accepted by `csquantile()` (default: `6`). |
| `na.rm` | Logical; remove missing values before computing? Default: `TRUE`. |
| `M` | Integer; number of quantile-ratio grid points for the QRI (default: `100`). |
| `B` | Integer; number of bootstrap replicates (default: `200`). |
| `seed` | Integer; random seed for reproducibility. Only used when `se = TRUE`. |
| `data` | A data frame containing the survey design variables (strata, PSU). Required when `strata` or `psu` is specified. |
| `strata` | Character string; name of the stratification column in `data`. |
| `psu` | Character string; name of the PSU column in `data`. |
| `N_h` | Optional named numeric vector of stratum population sizes for the finite population correction. See `rescaled_bootstrap`. |
| `m_h` | Optional vector of bootstrap sample sizes per stratum. Defaults to the Rao-Wu formula. See `rescaled_bootstrap`. |
| `verbose` | Logical; if `TRUE` (default), progress information is printed. |
| `x` | An object of class `inequantiles`. |
| `digits` | Integer; number of decimal places for rounding (default: `4`). |
| `...` | Further arguments passed to or from other methods. |
All quantile-based indicators are computed from the same specified
csquantile type. When se = TRUE, a single
bootstrap loop is run through the rescaled bootstrap method
(see rescaled_bootstrap) and all indicators are evaluated on
each replicate, so standard errors are based on identical resamples and are
directly comparable.
The Gini coefficient is estimated following Langel and Tillé (2013), equation 6, using a weighted formula based on cumulative weight sums.
A list with components:
| Component | Description |
|---|---|
| `estimates` | Numeric vector of point estimates of inequality indicators. |
| `se` | Numeric vector of standard errors (when `se = TRUE`). |
| `B` | Number of bootstrap replicates used (when `se = TRUE`). |
| `design` | Sampling design type detected by the bootstrap (when `se = TRUE`). |
| `call` | The matched function call. |

The print method returns its argument x invisibly.
qri, share_ratio,
ratio_quantiles, rescaled_bootstrap
Other inequality indicators based on quantiles:
plot_inequality_curve(),
qri(),
ratio_quantiles(),
share_ratio(),
superpop_qri()
    data(synthouse)
    eq <- synthouse$eq_income
    w <- synthouse$weight

    # Point estimates only
    inequantiles(eq, weights = w)

    # Subset of indicators
    inequantiles(eq, weights = w, indicators = c("qri", "palma"))

    # With bootstrap standard errors (complex design)
    inequantiles(eq, weights = w, se = TRUE, B = 50, seed = 42,
                 data = synthouse, strata = "NUTS2", psu = "municipality",
                 verbose = FALSE)
Plots the inequality curve $R(p) = Q(p/2) / Q(1 - p/2)$ over
$p \in [0, 1]$, from either sampling survey data or a parametric
distribution. The shaded area between the curve and the line $R(p) = 1$
equals the QRI.
    plot_inequality_curve(
      y = NULL, qfunction = NULL, qfun_args = list(), weights = NULL,
      M = 100, type = 6, na.rm = TRUE,
      shade = TRUE, add = FALSE,
      col = "steelblue", shade_col = NULL, lwd = 1.5, lty = 1,
      legend_qri = TRUE, label = NULL,
      xlab = "p", ylab = "R(p)", main = "Inequality curve"
    )
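| Argument | Description |
|---|---|
| `y` | Numeric vector of strictly positive values (e.g. income). Provide either `y` (estimation mode) or `qfunction` (parametric mode). |
| `qfunction` | A parametric quantile function, e.g. `qlnorm`. |
| `qfun_args` | Named list of additional arguments passed to `qfunction`. |
| `weights` | Numeric vector of sampling weights. Only used in estimation mode. If `NULL` (default), the data are treated as unweighted. |
| `M` | Integer; number of grid points for evaluating $R(p)$ (default: `100`). |
| `type` | Quantile estimation type, as accepted by `csquantile()` (default: `6`). |
| `na.rm` | Logical; remove missing values? Default: `TRUE`. |
| `shade` | Logical; if `TRUE` (default), the area between the curve and the equality line is shaded. |
| `add` | Logical; if `TRUE`, the curve is added to an existing plot (default: `FALSE`). |
| `col` | Colour of the inequality curve (default: `"steelblue"`). |
| `shade_col` | Colour for the shaded area. Defaults to a transparent version of `col`. |
| `lwd` | Line width (default: `1.5`). |
| `lty` | Line type (default: `1`). |
| `legend_qri` | Logical; if `TRUE` (default), the QRI value is reported in the legend. |
| `label` | Character string; overrides the auto-generated legend label. |
| `xlab` | x-axis label (default: `"p"`). |
| `ylab` | y-axis label (default: `"R(p)"`). |
| `main` | Plot title (default: `"Inequality curve"`). |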
The inequality curve plots the ratio of symmetric quantiles
around the median:

$$R(p) = \frac{Q(p/2)}{Q(1 - p/2)}$$

against $p \in [0, 1]$.
For a perfectly equal distribution $R(p) = 1$ for all $p$, and the
curve coincides with the horizontal line at 1. The further the curve lies
below the equality line, the more unequal the distribution. The QRI is the
area between the equality line and the curve.
Boundary values $R(0) = 0$ and $R(1) = 1$ are set by convention
(see Prendergast and Staudte (2018)).
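For a parametric distribution, the curve and the area above it can be computed directly from the quantile function. The sketch below is illustrative only: `inequality_curve_sketch` is a hypothetical name, and the midpoint grid used to approximate the area is an assumption, not a documented detail of `plot_inequality_curve()`.

```r
# Ratio of symmetric quantiles on a grid, and the area between the
# equality line R(p) = 1 and the curve (sketch).
inequality_curve_sketch <- function(qfunction, qfun_args = list(), M = 100) {
  p <- (seq_len(M) - 0.5) / M                      # midpoint grid on (0, 1)
  Q <- function(u) do.call(qfunction, c(list(u), qfun_args))
  Rp <- Q(p / 2) / Q(1 - p / 2)
  list(p = p, Rp = Rp, qri = mean(1 - Rp))
}

out <- inequality_curve_sketch(qlnorm, list(meanlog = 9, sdlog = 0.9))
out$qri
head(out$Rp)
```

For the log-normal distribution the ratio does not depend on `meanlog`, which reflects the scale invariance of the curve.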
Multiple curves can be overlaid by calling the function repeatedly with
add = TRUE. The legend outside the plot accumulates an entry for
each curve automatically.
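In addition to drawing the plot, the function returns a named list with three elements:

| Component | Description |
|---|---|
| `p` | Numeric vector of grid points. |
| `Rp` | Numeric vector of $R(p)$ values. |
| `qri` | The estimated QRI (area between the equality line and the curve). |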
The list is returned invisibly, meaning it is not printed to the console
when the function is called without assignment. Assign the output to a
variable (e.g. out <- plot_inequality_curve(...)) to inspect it.
Prendergast LA, Staudte RG (2018). “A simple and effective inequality measure.” The American Statistician, 72, 328–343.
Scarpa S, Ferrante MR, Sperlich S (2025). “Inference for the quantile ratio inequality index in the context of survey data.” Journal of Survey Statistics and Methodology. doi:10.1093/jssam/smaf024.
qri for the sample-based QRI estimator,
superpop_qri for the theoretical QRI of a parametric
distribution.
Other inequality indicators based on quantiles:
inequantiles(),
qri(),
ratio_quantiles(),
share_ratio(),
superpop_qri()
    # Parametric mode: single curve
    plot_inequality_curve(
      qfunction = qlnorm,
      qfun_args = list(meanlog = 9, sdlog = 0.9),
      main = "Log-Normal inequality curve"
    )

    # Overlay multiple curves: the legend accumulates automatically
    plot_inequality_curve(
      qfunction = qlnorm,
      qfun_args = list(meanlog = 9, sdlog = 0.3),
      main = "Log-Normal inequality curves",
      col = "steelblue",
      label = "LogN(9, 0.3)"
    )
    plot_inequality_curve(
      qfunction = qlnorm,
      qfun_args = list(meanlog = 9, sdlog = 0.9),
      col = "tomato", lty = 2, add = TRUE,
      label = "LogN(9, 0.9)"
    )

    # Empirical mode: survey data with sampling weights
    data(synthouse)
    out <- plot_inequality_curve(
      y = synthouse$eq_income,
      weights = synthouse$weight,
      main = "Inequality curve: synthouse"
    )

    # Inspect the returned list
    out$qri       # estimated QRI
    head(out$p)   # grid points
    head(out$Rp)  # R(p) values
Computes the quantile ratio index (QRI) estimator for measuring inequality from simple random samples and complex survey data.
    qri(y, weights = NULL, M = 100, type = 6, na.rm = TRUE)
| Argument | Description |
|---|---|
| `y` | A numeric vector of strictly positive values (e.g. income, wealth, expenditure). |
| `weights` | A numeric vector of sampling weights. If `NULL` (default), the data are treated as unweighted. |
| `M` | Integer; number of quantile ratios to average (default: `100`). |
| `type` | Quantile estimation type, as accepted by `csquantile()` (default: `6`). |
| `na.rm` | Logical; should missing values be removed? (default: `TRUE`) |
Consider a random sample $s$, where $w_j$, $j \in s$, defines
the sampling weight associated with the $j$-th individual.
The QRI estimator is defined as:

$$\widehat{QRI} = 1 - \frac{1}{M} \sum_{m=1}^{M} \frac{\hat{Q}(p_m / 2)}{\hat{Q}(1 - p_m / 2)},$$

where $p_m = (m - 0.5)/M$ and $m = 1, \dots, M$.
The estimated quantiles $\hat{Q}(\cdot)$ are computed via the function
csquantile(), which accounts for sampling weights and the specified
quantile type. This allows $\widehat{QRI}$ to be used both for simple
random samples and for complex survey data with design weights.
This index was proposed by Prendergast and Staudte (2018), and extended to survey data by Scarpa et al. (2025).
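For unweighted data, the estimator reduces to an average of quantile ratios that base R can compute directly. The sketch below uses `quantile(type = 6)` in place of `csquantile()` and a midpoint grid of `M` points. Both the helper name `qri_sketch` and the grid are assumptions made for illustration.

```r
# Unweighted QRI as one minus the mean ratio of symmetric quantiles (sketch).
qri_sketch <- function(y, M = 100) {
  stopifnot(all(y > 0))
  p <- (seq_len(M) - 0.5) / M
  q_low  <- quantile(y, p / 2, type = 6, names = FALSE)
  q_high <- quantile(y, 1 - p / 2, type = 6, names = FALSE)
  1 - mean(q_low / q_high)
}

qri_sketch(rep(5, 10))             # identical incomes: no inequality
set.seed(123)
qri_sketch(rlnorm(500, 9, 0.7))    # skewed incomes: value between 0 and 1
```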
A scalar numeric value representing the estimated inequality by the quantile ratio index (QRI).
Prendergast LA, Staudte RG (2018). “A simple and effective inequality measure.” The American Statistician, 72, 328–343.
Scarpa S, Ferrante MR, Sperlich S (2025). “Inference for the quantile ratio inequality index in the context of survey data.” Journal of Survey Statistics and Methodology. doi:10.1093/jssam/smaf024.
qri_grouped for QRI estimation from grouped data,
superpop_qri for the theoretical QRI of a parametric distribution,
if_qri for the influence function used for linearization.
Other inequality indicators based on quantiles:
inequantiles(),
plot_inequality_curve(),
ratio_quantiles(),
share_ratio(),
superpop_qri()
    data(synthouse)
    eq <- synthouse$eq_income  # Income data

    # Compute unweighted QRI with default type 6 quantile estimator
    qri(y = eq)

    # Consider the sampling weights and change quantile estimation type
    w <- synthouse$weight
    qri(y = eq, weights = w, type = 5)

    # Compare QRI across macro-regions (NUTS1)
    tapply(seq_len(nrow(synthouse)), synthouse$NUTS1, function(area) {
      qri(y = synthouse$eq_income[area],
          weights = synthouse$weight[area],
          type = 6)
    })
Computes the quantile ratio index (QRI) for measuring inequality from grouped frequency data using linear interpolation for quantile estimation. This function is intended to be used for administrative or tax data, which are very often in the form of grouped data. Therefore, sampling weights are not considered.
    qri_grouped(
      freq,
      lower_bounds,
      upper_bounds,
      M = 100,
      midpoints = NULL,
      na.rm = TRUE
    )
| Argument | Description |
|---|---|
| `freq` | Numeric vector of class frequencies (counts). Must be non-negative. |
| `lower_bounds` | Numeric vector of lower class bounds. |
| `upper_bounds` | Numeric vector of upper class bounds. |
| `M` | Integer; number of quantile ratios to average (default: `100`). |
| `midpoints` | Optional numeric vector of class midpoints. Used only as fallback when a quantile class has zero frequency. |
| `na.rm` | Logical; should missing values in frequencies be removed? (default: `TRUE`) |
Consider grouped data divided into $K$ classes with known boundaries and
observed frequencies $f_k$, $k = 1, \dots, K$. The QRI estimator for grouped
data is approximated as:

$$\widehat{QRI} = 1 - \frac{1}{M} \sum_{m=1}^{M} \frac{\hat{Q}(p_m / 2)}{\hat{Q}(1 - p_m / 2)},$$

where:
$p_m = (m - 0.5)/M$ for $m = 1, \dots, M$
$\hat{Q}(p)$ denotes the $p$-th quantile computed from
grouped data using linear interpolation (see quantile_grouped)
$M$ is the number of quantile ratios to average (default: 100)
The quantiles are computed via quantile_grouped(),
which uses linear interpolation within classes and automatically handles
open-ended classes (with -Inf or Inf bounds).
The QRI ranges from 0 (perfect equality) to 1 (maximum inequality). The index measures inequality by averaging the relative differences between symmetric quantiles below and above the median, across the entire distribution.
A scalar numeric value representing the estimated inequality by the quantile ratio index (QRI) for grouped data.
When individual-level data (microdata) are available, use qri instead,
which provides more accurate estimates. The grouped data version
qri_grouped should be used when only frequency distributions are available,
such as in published statistical tables or administrative aggregates.
The grouped QRI will generally approximate the microdata QRI well when:
Classes are sufficiently narrow
The distribution within classes is approximately uniform
Sample sizes within classes are adequate
Prendergast LA, Staudte RG (2018). “A simple and effective inequality measure.” The American Statistician, 72, 328–343.
qri for QRI estimation with microdata.
superpop_qri for QRI computation on parametric distributions.
Other grouped data functions:
gini_grouped(),
quantile_grouped()
    # Basic example with closed classes
    income_freq <- c(120, 180, 150, 80, 40, 20, 10)
    income_lower <- c(0, 15000, 30000, 45000, 60000, 80000, 100000)
    income_upper <- c(15000, 30000, 45000, 60000, 80000, 100000, 150000)
    qri_grouped(income_freq, income_lower, income_upper)

    # Example with open-ended classes (Italian MEF-style data)
    wage_freq <- c(150, 200, 180, 220, 180, 50, 15, 5)
    wage_lower <- c(-Inf, 0, 10000, 15000, 26000, 55000, 75000, 120000)
    wage_upper <- c(0, 10000, 15000, 26000, 55000, 75000, 120000, Inf)

    # Compute QRI (automatically handles open classes)
    qri_grouped(wage_freq, wage_lower, wage_upper)
Computes quantiles from grouped frequency data using linear interpolation within the quantile class.
    quantile_grouped(
      freq,
      lower_bounds,
      upper_bounds,
      probs = 0.5,
      midpoints = NULL
    )
| Argument | Description |
|---|---|
| `freq` | Numeric vector of class frequencies (counts). Must be non-negative. |
| `lower_bounds` | Numeric vector of lower class bounds. Must be strictly increasing. |
| `upper_bounds` | Numeric vector of upper class bounds. Must be strictly increasing and greater than corresponding `lower_bounds`. |
| `probs` | Numeric vector of probabilities (between 0 and 1) for which to compute the quantiles. Default is 0.5 (median). |
| `midpoints` | Optional numeric vector of class midpoints. Used only as fallback when a quantile class has zero frequency. If `NULL` (default), they are computed from the class bounds. |
Consider grouped data divided into classes with known boundaries. Let:
$L_p$ be the lower bound of the $p$-th quantile class
$U_p$ be the upper bound of the $p$-th quantile class
$h_p = U_p - L_p$ be the $p$-th quantile class width
$C_p$ be the cumulative frequency up to the previous class
$f_p$ be the frequency within the quantile class
$N$ be the total frequency
The quantile class for the $p$-th quantile is the first class such that:
$C_p + f_p \ge pN$.
The $p$-th quantile is then estimated by linear interpolation within the
quantile class:

$$\hat{Q}(p) = L_p + \frac{pN - C_p}{f_p} \, h_p.$$
The method assumes a uniform distribution of observations within each class interval. This is a standard approach for grouped data when individual observations are not available.
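The interpolation rule can be sketched for closed classes in a few lines of base R. The helper name `quantile_grouped_sketch` is hypothetical, and the sketch deliberately omits the open-class imputation and the zero-frequency fallback that `quantile_grouped()` provides.

```r
# Grouped-data quantiles by linear interpolation within the quantile class
# (closed classes with positive frequencies only).
quantile_grouped_sketch <- function(freq, lower, upper, probs = 0.5) {
  N <- sum(freq)
  cum <- cumsum(freq)
  vapply(probs, function(p) {
    k <- which(cum >= p * N)[1]               # first class reaching p * N
    C_prev <- if (k == 1) 0 else cum[k - 1]   # cumulative frequency before class k
    lower[k] + (p * N - C_prev) / freq[k] * (upper[k] - lower[k])
  }, numeric(1))
}

freq  <- c(5, 8, 10, 4, 3)
lower <- c(0, 10, 20, 30, 40)
upper <- c(10, 20, 30, 40, 50)
quantile_grouped_sketch(freq, lower, upper, probs = c(0.25, 0.5, 0.75))
# 13.125 22.000 29.500
```

For the median, half of the total frequency is 15, which falls in the third class: 13 observations lie below it, so the estimate is 20 + (2 / 10) * 10 = 22.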
A vector of estimated quantiles on grouped data corresponding to probs.
Returns NA if total frequency is zero or missing.
When dealing with administrative or tax data, the first class is often defined
as negative income (or incomes below zero) and the last class as incomes above
a certain threshold. In such cases, we have -Inf as the lower bound of the
first class and Inf as the upper bound of the last class.
If Inf values are present in the given bounds, the function imputes finite
bounds as follows, with $K$ denoting the number of classes:
For open left class (first lower bound = -Inf): The imputed first lower bound is given by:
$$L_1 = U_1 - h_2,$$
where $U_1$ is the upper bound of the first class and $h_2 = U_2 - L_2$
is the width of the second class. This assumes the first class has the same
width as the second class.
For open right class (last upper bound = Inf):
The imputed upper bound is given by:
$$U_K = L_K + h_{K-1},$$
where $L_K$ is the lower bound of the last class and $h_{K-1} = U_{K-1} - L_{K-1}$
is the width of the second-to-last class.
This assumes the last class has the same width as the penultimate class.
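A minimal base R sketch of this imputation rule follows. The helper name `impute_open_bounds_sketch` is hypothetical and the code only mirrors the description above (at least two classes are assumed); the package may handle further edge cases.

```r
# Hypothetical helper: replace infinite outer bounds using neighbouring class widths.
# Assumes at least two classes and finite bounds everywhere else.
impute_open_bounds_sketch <- function(lower, upper) {
  K <- length(lower)
  if (is.infinite(lower[1])) {
    # Open left class: give the first class the width of the second class
    lower[1] <- upper[1] - (upper[2] - lower[2])
  }
  if (is.infinite(upper[K])) {
    # Open right class: give the last class the width of the second-to-last class
    upper[K] <- lower[K] + (upper[K - 1] - lower[K - 1])
  }
  list(lower = lower, upper = upper)
}

wage_lower <- c(-Inf, 0, 10000, 15000, 26000, 55000, 75000, 120000)
wage_upper <- c(0, 10000, 15000, 26000, 55000, 75000, 120000, Inf)

bounds <- impute_open_bounds_sketch(wage_lower, wage_upper)
bounds$lower[1]  # 0 - (10000 - 0) = -10000
bounds$upper[8]  # 120000 + (120000 - 75000) = 165000
```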
If the quantile class has zero frequency, the function returns the class midpoint as a fallback.
If total frequency is zero or NA, the function returns NA
for all requested quantiles.
quantile for quantiles of ungrouped data.
Other grouped data functions:
gini_grouped(),
qri_grouped()
# Basic usage: compute quartiles
freq <- c(5, 8, 10, 4, 3)
lower <- c(0, 10, 20, 30, 40)
upper <- c(10, 20, 30, 40, 50)
quantile_grouped(freq, lower, upper, probs = c(0.25, 0.5, 0.75))

# Compute deciles
quantile_grouped(freq, lower, upper, probs = seq(0.1, 0.9, by = 0.1))

# With custom midpoints
midpts <- c(5, 15, 25, 35, 45)
quantile_grouped(freq, lower, upper, probs = 0.5, midpoints = midpts)

# Income distribution example
income_freq <- c(120, 180, 150, 80, 40, 20, 10)
income_lower <- c(0, 15000, 30000, 45000, 60000, 80000, 100000)
income_upper <- c(15000, 30000, 45000, 60000, 80000, 100000, 150000)

# Compute median income
quantile_grouped(income_freq, income_lower, income_upper, probs = 0.5)

# Compute income quintiles
quantile_grouped(income_freq, income_lower, income_upper, probs = seq(0.2, 0.8, by = 0.2))
Estimates the ratio of two quantiles (e.g., P90/P10) from simple or complex survey sample data.
ratio_quantiles(y, weights = NULL, prob_numerator = 0.9, prob_denominator = 0.1, type = 6, na.rm = TRUE)
y |
A numeric vector of data values |
weights |
A numeric vector of sampling weights (optional). If |
prob_numerator |
The percentile to be considered at the numerator (default: 0.9). |
prob_denominator |
The percentile to be considered at the denominator (default: 0.1). |
type |
Quantile estimation type: integer |
na.rm |
Logical, should missing values be removed? (default: TRUE) |
Consider a random sample $s$ of size $n$, and let $w_j$, $j \in s$, define
the sampling weight and $y_j$ be the observed characteristic (e.g., income)
associated with the $j$-th individual, $j = 1, \dots, n$.
Let $p_{num}$ be the order of the quantile at the numerator and
$p_{den}$ be the order of the quantile at the denominator. For example, set $p_{num} = 0.9$ and
$p_{den} = 0.1$. Then the popular P90/P10 ratio can be estimated by
$$\hat{R} = \frac{\hat{Q}(0.9)}{\hat{Q}(0.1)},$$
where the estimated quantiles $\hat{Q}(\cdot)$ are computed via the function
csquantile(), which accounts for sampling weights and the specified
quantile type.
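For intuition, the unweighted analogue of this ratio can be computed with base R alone. The sketch below ignores sampling weights and uses `stats::quantile()` with `type = 6`; the income values are made up for illustration, and no claim is made that the result coincides with `ratio_quantiles()` in every case.

```r
# Unweighted illustration only: sampling weights are ignored here.
# Made-up incomes, not taken from any dataset in the package.
y <- c(8, 12, 15, 18, 21, 25, 30, 38, 60) * 1000

# Type 6 quantiles at the denominator (0.1) and numerator (0.9) orders
q <- quantile(y, probs = c(0.1, 0.9), type = 6, names = FALSE)

p90_p10 <- q[2] / q[1]
p90_p10  # 60000 / 8000 = 7.5
```

With nine observations and type 6, the orders 0.1 and 0.9 fall exactly on the first and last order statistics, which makes the example easy to check by hand.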
A scalar numeric value representing the estimated ratio of quantiles
Other inequality indicators based on quantiles:
inequantiles(),
plot_inequality_curve(),
qri(),
share_ratio(),
superpop_qri()
data(synthouse)
eq <- synthouse$eq_income # Income data

# Compute unweighted P90/P10 with default type 6 quantile estimator
ratio_quantiles(y = eq)

# Consider the sampling weights, change quantile estimation type and orders of quantiles
w <- synthouse$weight
ratio_quantiles(y = eq, weights = w, prob_numerator = 0.6, prob_denominator = 0.1, type = 5)

# Compare the P90/P10 across macro-regions (NUTS1)
tapply(1:nrow(synthouse), synthouse$NUTS1, function(area) {
  ratio_quantiles(y = synthouse$eq_income[area], weights = synthouse$weight[area])
})
Implements the rescaled bootstrap method for variance estimation in survey data, supporting both stratified simple random sampling and multistage complex designs.
rescaled_bootstrap(data, y, strata, N_h = NULL, psu = NULL, weights = NULL, estimator, by_strata = TRUE, B = 200, m_h = NULL, seed = NULL, verbose = TRUE)
data |
A data frame containing the survey data. |
y |
A character string specifying the variable name to be used for the target variable. |
strata |
A character string specifying the stratification variable. |
N_h |
Optional vector of stratum population sizes, used for the finite population correction (FPC). Can be a single value (applied to all strata) or one value per stratum. |
psu |
Optional character string specifying the Primary Sampling Unit (PSU) variable. Required for multistage complex designs. |
weights |
Optional character string specifying the sampling weight variable. Required for complex designs with unequal inclusion probabilities. |
estimator |
A function that computes the statistic of interest, accepting arguments y and weights (complex design) or y only (simple design). |
by_strata |
Logical; if |
B |
Integer; number of bootstrap replicates (default = 200). |
m_h |
Optional vector of bootstrap sample sizes per stratum (PSUs for complex designs).
If |
seed |
Optional integer for reproducibility. |
verbose |
Logical; if |
The rescaled bootstrap is a resampling technique designed for complex survey data that preserves stratification and primary sampling unit (PSU) structure, providing consistent variance estimation for both smooth and non-smooth statistics. The methodology is based on Rao and Wu (1988) and Rao et al. (1992).
(1) Stratified Simple Random Sampling design
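Consider a finite population divided into $H$ strata, each of size $N_h$, with a sample of size
$n_h$ selected independently in each stratum. Suppose interest lies in some parameter $\theta$,
with sampling estimator $\hat{\theta}$.
For each bootstrap replicate $b = 1, \dots, B$ and stratum $h = 1, \dots, H$:
Draw a bootstrap sample of size $m_h$ with replacement from the sampled units.
By default, $m_h = n_h - 1$.
Compute rescaled bootstrap values:
$$\tilde{y}_{hi} = \bar{y}_h + \sqrt{\frac{m_h (1 - f_h)}{n_h - 1}} \left( y^{*}_{hi} - \bar{y}_h \right),$$
where $y^{*}_{hi}$ is the bootstrap observation, $(1 - f_h)$ is the FPC, with $f_h = n_h / N_h$, and
$\bar{y}_h$ is the sample stratum mean.
Compute the statistic of interest using rescaled values.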
The variance is then estimated by the bootstrap variance.
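The rescaling step for a single stratum can be sketched in base R as follows. This is a hedged illustration rather than the package's code: it assumes the standard Rao and Wu (1988) rescaling with factor sqrt(m_h (1 - f_h) / (n_h - 1)) and a default of m_h = n_h - 1, and the helper name `rescale_stratum_sketch` is hypothetical.

```r
# Hypothetical sketch of the rescaling step for one stratum.
# Assumptions: standard Rao-Wu rescaling and m_h = n_h - 1 by default.
rescale_stratum_sketch <- function(y_h, N_h, m_h = length(y_h) - 1) {
  n_h   <- length(y_h)
  f_h   <- n_h / N_h                                 # sampling fraction
  ybar  <- mean(y_h)                                 # stratum sample mean
  ystar <- sample(y_h, size = m_h, replace = TRUE)   # bootstrap draw with replacement
  scale <- sqrt(m_h * (1 - f_h) / (n_h - 1))         # rescaling factor including the FPC
  ybar + scale * (ystar - ybar)                      # rescaled bootstrap values
}

set.seed(1)
y_h <- c(12, 15, 9, 20, 14)
tilde <- rescale_stratum_sketch(y_h, N_h = 50)
length(tilde)  # m_h = 4 rescaled values

# With a census (n_h = N_h) the FPC is zero, so every rescaled value equals the mean
rescale_stratum_sketch(y_h, N_h = 5)
```

The census case is a useful sanity check: when the whole stratum is observed there is no sampling variability, and the rescaled values collapse to the stratum mean.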
(2) Two-Stage Stratified Sampling design
For designs with PSUs and sampling weights:
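Within each stratum $h$, draw $m_h$ PSUs with replacement from the $n_h$ sampled PSUs.
By default, $m_h = n_h - 1$.
Let $r_{hi}^{(b)}$ denote the number of times PSU $i$ is selected in replicate $b$.
Each observation in the $i$-th PSU is assigned a rescaled bootstrap weight:
$$w_{hij}^{*(b)} = w_{hij} \left[ 1 - \sqrt{\frac{m_h}{n_h - 1}} + \sqrt{\frac{m_h}{n_h - 1}} \, \frac{n_h}{m_h} \, r_{hi}^{(b)} \right],$$
where $w_{hij}$ is the sampling weight associated with individual $j$
in PSU $i$ in stratum $h$.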
The statistic is computed using the rescaled weights.
The variance is then estimated by the bootstrap variance.
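A rough base R sketch of the weight rescaling for one stratum is given below. It assumes the Rao, Wu and Yue (1992) adjustment without a finite population correction and a default of m_h = n_h - 1; the package's exact implementation may differ, and the helper name `rescale_weights_sketch` is hypothetical.

```r
# Hypothetical sketch of rescaled bootstrap weights within one stratum.
# Assumptions: Rao-Wu-Yue adjustment, no FPC, m_h = n_h - 1 by default.
rescale_weights_sketch <- function(w, psu, m_h = NULL) {
  ids <- unique(psu)
  n_h <- length(ids)                     # number of sampled PSUs in the stratum
  stopifnot(n_h >= 2)                    # at least two PSUs are needed
  if (is.null(m_h)) m_h <- n_h - 1
  # Draw m_h PSUs with replacement and count how often each one is selected
  draws <- ids[sample.int(n_h, size = m_h, replace = TRUE)]
  r <- as.vector(table(factor(draws, levels = ids)))
  lambda <- sqrt(m_h / (n_h - 1))
  adj <- 1 - lambda + lambda * (n_h / m_h) * r   # PSU-level adjustment factor
  names(adj) <- as.character(ids)
  unname(w * adj[as.character(psu)])     # every unit inherits its PSU's factor
}

set.seed(2)
w   <- c(10, 10, 20, 20, 30, 30)
psu <- c("a", "a", "b", "b", "c", "c")
w_star <- rescale_weights_sketch(w, psu)
w_star / w   # adjustment factors, constant within each PSU
```

With m_h = n_h - 1 the factor reduces to n_h / (n_h - 1) times the selection count, so units in unselected PSUs receive a weight of zero and no weight becomes negative.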
Multiple estimators
The estimator argument accepts any function with signature
f(y, weights) (complex design) or f(y) (simple design),
including functions from this package and user-defined ones.
When estimator returns a named numeric vector, variances are
computed for all outputs simultaneously from the same bootstrap replicates,
so the resulting standard errors are directly comparable across indicators.
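The idea of one variance per named output can be illustrated with base R. Note that the sketch below deliberately uses a plain i.i.d. bootstrap, not the rescaled bootstrap, and made-up lognormal data; `multi_stat_sketch` is a hypothetical estimator used only to show the pattern.

```r
# Illustration with a plain i.i.d. bootstrap (NOT the rescaled bootstrap).
# An estimator returning a named vector yields one variance per output.
multi_stat_sketch <- function(y) c(mean = mean(y), median = median(y))

set.seed(42)
y <- rlnorm(200, meanlog = 9, sdlog = 0.5)   # simulated incomes
B <- 100

# Each column of `reps` holds the estimates from one bootstrap replicate
reps <- replicate(B, multi_stat_sketch(sample(y, replace = TRUE)))

# Row-wise variance: both indicators are evaluated on the same replicates
boot_var <- apply(reps, 1, var)
boot_var
```

Because every indicator is evaluated on the same resampled data, differences between the resulting variances reflect the estimators themselves rather than different resampling noise.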
A list containing:
variance |
Bootstrap variance estimate |
boot_estimates |
Vector of B bootstrap estimates |
B |
Number of bootstrap replicates |
by_strata |
Logical; |
design |
Character string: |
strata_info |
Data frame with number of observations/PSUs per stratum. |
call |
The matched function call. |
Rao J, Wu C (1988). “Resampling inference with complex survey data.” Journal of the American Statistical Association, 83, 231–241.
Rao J, Wu C, Yue K (1992). “Some recent work on resampling methods for complex surveys.” Survey methodology, 18, 209–217.
Kolenikov S (2010). “Resampling variance estimation for complex survey data.” The Stata Journal, 10, 165–199.
Scarpa S, Ferrante MR, Sperlich S (2025). “Inference for the quantile ratio inequality index in the context of survey data.” Journal of Survey Statistics and Methodology. doi:10.1093/jssam/smaf024.
For a convenience wrapper that automatically computes all package inequality
indicators and their standard errors in a single call, see inequantiles.
data(synthouse)

# ================================================================
# Example 1: Stratified Simple Random Sampling (SRS)
# ================================================================
# Use NUTS2 as strata
set.seed(123)

# Simulate population sizes per stratum (for FPC)
N_values <- sample(2000:5000, length(unique(synthouse$NUTS2)), replace = TRUE)
names(N_values) <- sort(unique(synthouse$NUTS2))

# Define a simple mean estimator
mean_estimator <- function(y) mean(y, na.rm = TRUE)

# Apply the rescaled bootstrap under stratified SRS
boot_srs <- rescaled_bootstrap(
  data = synthouse,
  y = "eq_income",
  strata = "NUTS2",
  N_h = N_values,
  estimator = mean_estimator,
  by_strata = TRUE,
  B = 50, # small number for illustration
  seed = 123,
  verbose = FALSE
)

# View results
boot_srs$variance

# ================================================================
# Example 2: Two-stage Complex Design
# ================================================================
# Estimate the QRI estimator sampling variance.
boot_complex <- rescaled_bootstrap(
  data = synthouse,
  y = "eq_income",
  strata = "NUTS2",
  psu = "municipality",
  weights = "weight",
  estimator = qri,
  by_strata = TRUE,
  B = 50,
  seed = 456,
  verbose = FALSE
)

# Display variance and bootstrap estimates
summary(boot_complex$variance)

# Strata and PSU summary
boot_complex$strata_info

# ================================================================
# Example 3: Multiple estimators in a single bootstrap loop
# ================================================================
# Create a function returning a named vector of estimates,
# including package functions and user-defined ones. All indicators share
# the same bootstrap replicates, ensuring directly comparable standard errors.
multi_estimator <- function(y, weights) {
  c(
    w_mean = sum(y * weights) / sum(weights), # custom: weighted mean
    qri = qri(y, weights = weights), # package function
    qsr = share_ratio(y, weights = weights) # package function
  )
}

boot_multi <- rescaled_bootstrap(
  data = synthouse,
  y = "eq_income",
  strata = "NUTS2",
  psu = "municipality",
  weights = "weight",
  estimator = multi_estimator,
  by_strata = FALSE,
  B = 50,
  seed = 42,
  verbose = FALSE
)

# One variance per indicator, all from the same replicates
boot_multi$variance

# ================================================================
# Note:
# These examples use small B for speed. For actual analysis,
# use B >= 200 for stable estimates.
# ================================================================
Computes the theoretical quantile ratio index (QRI) for measuring inequality for a given parametric distribution.
superpop_qri(qfunction, lower = 0, upper = 1, subdivisions = 1000L, ...)
qfunction |
A quantile function (e.g., |
lower |
Lower bound of integration. Default is 0. |
upper |
Upper bound of integration. Default is 1. |
subdivisions |
Maximum number of subintervals for integration. Default is 1000L. |
... |
Additional parameters to pass to qfunction. |
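The QRI was proposed by Prendergast and Staudte (2018) for measuring
economic inequality. Consider a random variable $Y$ with positive support, which admits
a continuous CDF $F$ and quantile function $Q(p) = F^{-1}(p)$, for any $p \in (0, 1)$.
It is calculated as:
$$QRI = \int_0^1 \left[ 1 - R(p) \right] dp,$$
where $R(p) = Q(p/2) / Q(1 - p/2)$ is the ratio of symmetric quantiles, with $p \in [0, 1]$ and $0 \le R(p) \le 1$.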
This function computes the (superpopulation) QRI for
theoretical parametric distributions, as opposed to qri which estimates
the QRI from sample data.
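As a minimal sketch of the underlying computation, the index can be approximated with `stats::integrate()`, taking the ratio of symmetric quantiles as R(p) = Q(p/2) / Q(1 - p/2) following Prendergast and Staudte (2018). The helper name `qri_sketch` is hypothetical and this is not the code behind `superpop_qri()`.

```r
# Hypothetical sketch: numerically integrate 1 - R(p) over (0, 1),
# where R(p) = Q(p / 2) / Q(1 - p / 2) is the ratio of symmetric quantiles.
qri_sketch <- function(qfunction, ...) {
  integrand <- function(p) {
    1 - qfunction(p / 2, ...) / qfunction(1 - p / 2, ...)
  }
  integrate(integrand, lower = 0, upper = 1, subdivisions = 1000L)$value
}

# Uniform on (0, 1): Q(p) = p, so the index has the closed form 2 - 2 * log(2)
qri_sketch(qunif)
2 - 2 * log(2)

# Log-normal: a larger sdlog means more dispersion and a larger index
qri_sketch(qlnorm, meanlog = 9, sdlog = 0.3)
qri_sketch(qlnorm, meanlog = 9, sdlog = 1.4)
```

The uniform case gives a closed-form value that makes it easy to check the numerical integration before moving to distributions without a simple analytic answer.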
A numeric value representing the theoretical QRI for the specified parametric distribution. Values range from 0 (perfect equality) to 1 (maximum inequality).
Prendergast LA, Staudte RG (2018). “A simple and effective inequality measure.” The American Statistician, 72, 328–343.
qri for the sample-based QRI estimator, plot_inequality_curve for its representation
Other inequality indicators based on quantiles:
inequantiles(),
plot_inequality_curve(),
qri(),
ratio_quantiles(),
share_ratio()
# Log-normal distribution
superpop_qri(qlnorm, meanlog = 9, sdlog = 0.3)
superpop_qri(qlnorm, meanlog = 9, sdlog = 1.4)

# Weibull distribution
superpop_qri(qweibull, shape = 1.7, scale = 30000)
superpop_qri(qweibull, shape = 3, scale = 30000)
A realistic synthetic dataset based on the empirical structure of real IT-SILC (Italian Survey on Income and Living Conditions) 2024 data.
synthouse
A data frame with 20,034 rows (individuals nested in 10,099 households) and 17 variables covering demographic, socio-economic, and geographic information:
Character. Unique person identifier, composed of the household ID followed by a person index within the household (format: HH000001P1, HH000001P2, HH000002P1, ...)
Character. Household identifier. All individuals in the same household share this ID (format: HH000001, HH000002, ...)
Character. NUTS1 region code (5 macro-regions):
N
S
NE
NO
C
Character. NUTS2 region code (30 regions, format: N01-N06, S01-S06, ...)
Character. NUTS3 province code (120 provinces, format: N01001-N01004, ...)
Character. Municipality code (1,079 municipalities, format: N010010001-N010010008, ...)
Integer. Age in years (0-85)
Factor. Age class with 7 levels: "0-14", "15-17", "18-24", "25-34", "35-49", "50-64", "65+"
Integer. Gender code:
1 = Male
2 = Female
Character. Education level (adults 18+ only, NA for minors):
"Low" = No education, primary, or lower secondary (ISCED 0-2)
"Medium" = Upper secondary or post-secondary non-tertiary (ISCED 3-5)
"High" = Tertiary education (ISCED 6-8)
Character. Main activity status:
"Employed" = In employment
"Unemployed" = Unemployed
"Retired" = Retired
"Student" = Student or pupil
"Other" = Other (unable to work, domestic tasks, etc.)
Integer. Household size (number of members): 1-7
Character. Household type:
"Single" = One-person household
"Couple" = Two adults without children
"Single_parent" = Single parent with children
"Family" = Household with children (2+ adults)
"Other" = Other household types
Numeric. Equivalised disposable household income in euros. This is the total household income divided by the OECD modified equivalence scale. All household members share the same equivalised income.
Numeric. Total disposable household income in euros before equivalisation. All household members share the same total income.
Numeric. OECD modified equivalence scale for the household:
First adult (14+): weight = 1.0
Other adults (14+): weight = 0.5 each
Children (< 14): weight = 0.3 each
Formula: modif_oecd_scale = 1.0 + 0.5 × (n_adults - 1) + 0.3 × n_children
Numeric. Sampling weight (inverse inclusion probability). Represents the number of individuals in the population represented by this sample unit. All household members share the same weight.
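The modified OECD equivalence scale quoted in the variable list can be reproduced with a one-line helper. The function name `oecd_modified_scale_sketch` and the income figure are hypothetical and serve only to illustrate the formula; at least one adult per household is assumed.

```r
# Hypothetical helper reproducing the equivalence-scale formula quoted above.
# Assumes every household has at least one adult (aged 14 or over).
oecd_modified_scale_sketch <- function(n_adults, n_children) {
  1.0 + 0.5 * (n_adults - 1) + 0.3 * n_children
}

# Two adults and two children under 14: 1.0 + 0.5 + 0.6
oecd_modified_scale_sketch(n_adults = 2, n_children = 2)   # 2.1

# Equivalised income is total household income divided by the scale
total_income <- 42000                                      # made-up figure
total_income / oecd_modified_scale_sketch(2, 2)            # 20000
```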
The synthetic dataset was generated to reproduce key characteristics of
IT-SILC data, but contains fictional values; it is therefore suitable for
methodology illustration and testing, not for policy analysis.
It is primarily intended to demonstrate the computation of quantile-based
inequality indicators provided by the inequantiles package,
such as quantiles, quantile-based indicators, influence functions, and
variance estimation.
Geographic variables follow a hierarchical NUTS structure with realistic proportions across macro-regions and were created randomly; they do not correspond to real codes. Individual characteristics (age, gender, education, ...) were assigned randomly based on conditional empirical distributions from IT-SILC. Income was generated using a regression model fitted to IT-SILC data:
where the suffix _head identifies variables measured for the
household head (e.g., education_head is the education level of
the household head, age_head is their age).
Sampling weights follow a lognormal distribution fitted to IT-SILC.
Key Statistics:
Sample size: 20,034 individuals in 10,099 households
Average household size: ~1.99 (matching IT-SILC)
Estimated population: 15,749,925 individuals (the sum of the weights)
Geographic coverage: 5 macro-regions, 30 NUTS2, 120 NUTS3, 1,079 municipalities
Eurostat (2024). EU Statistics on Income and Living Conditions (EU-SILC): Methodology. https://ec.europa.eu/eurostat/
# Load the dataset
data(synthouse)

# Basic structure
str(synthouse)
head(synthouse)

# Summary statistics
summary(synthouse$eq_income)
summary(synthouse$age)

# Number of households and individuals
length(unique(synthouse$hh_id)) # Households
nrow(synthouse) # Individuals

# Average household size
mean(table(synthouse$hh_id))

# Distribution of household types
table(unique(synthouse[, c("hh_id", "hh_type")])$hh_type)

# Age distribution
table(synthouse$age_class)

# Weighted quantiles
csquantile(synthouse$eq_income, weights = synthouse$weight, probs = c(0.25, 0.5, 0.75), type = 6)

# Quantile Ratio Index
qri(synthouse$eq_income, weights = synthouse$weight, type = 6)