10 Data Envelopment and Deprivation
10.1 What is deprivation analysis?
Synthetic scores of multiple deprivation
mdepriv is an R package for combining binary, continuous and suitably transformed ordinal items/indicators of deprivation into synthetic measures of deprivation. As such, it is a tool of poverty analysis. It is also suitable for generating composite measures of severity and vulnerability in the humanitarian realm. The R implementation translates the original Stata version by Pi Alperin & Van Kerm (2009), with additional features (notably, non-integer sampling weights are admitted).
mdepriv returns unit-level synthetic scores of multiple deprivation and their statistical summaries. It offers several methods for determining item/indicator weights in response to user preferences for rewarding better discrimination and penalizing redundancy.
mdepriv is particularly appropriate in situations where the underlying concept of deprivation / severity / vulnerability is intuitively multi-dimensional, but the structure of dimensions is poorly understood; plausibly they overlap, i.e. they reinforce each other to unknown degrees.
Also, the measures produced under mdepriv do not presume normative standards (e.g., poverty lines). There are no a-priori cut-offs of the kind that are fundamental to multi-dimensional poverty measures in the Alkire-Foster tradition (implemented in R, for example, in the functions svyafc and svyafcdec of the package convey). Shortfall indicators (e.g., years of basic education missed) can be used the same way as non-normative ones (e.g., workdays lost to illness).
# install package devtools if not yet installed
# install.packages("devtools")
# install fast from GitHub without vignettes (not recommended)
# devtools::install_github("a-benini/mdepriv")
Fundamentally, the four methods
- Equi-proportionate
- Desai & Shah (1988)
- Cerioli & Zani (1990)
- Betti & Verma (1998)
differ by their increasing sensitivity to deprivations / adverse conditions suffered by the poorest / most oppressed units (individuals, households, communities). The idea is to weight more strongly specific deprivations / adverse conditions that the more advantaged units have overcome, but which are still afflicting those at the bottom. Typically, in poverty studies, this makes good sense with durable household assets, which are observed as dichotomous “has / does not have” and recorded as binary (0/1) variables.
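The ordering of the four methods can be made concrete with a small base-R sketch for a single dichotomous item with prevalence p. The weight formulas used here are assumptions based on how these methods are commonly stated in the literature (Desai-Shah: 1 − p; Cerioli-Zani: ln(1/p); Betti-Verma first factor: the coefficient of variation). They are not copied from the mdepriv source, and the function name first_factor is hypothetical.

```r
# Illustrative first-factor weights for one dichotomous item with prevalence p.
# The formulas are assumed textbook forms, not taken from the mdepriv source.
first_factor <- function(p) {
  data.frame(
    prevalence   = p,
    equal        = rep(1, length(p)),   # equi-proportionate: ignores prevalence
    desai_shah   = 1 - p,               # bounded above by 1
    cerioli_zani = log(1 / p),          # grows logarithmically as p -> 0
    betti_verma  = sqrt((1 - p) / p)    # C.o.V. of a 0/1 variable
  )
}

w <- first_factor(c(0.50, 0.10, 0.01))

# Rescale each method by its own weight at p = 0.5, so that the rows show
# how much more weight a rarer deprivation receives under each method.
rel <- sweep(as.matrix(w[, -1]), 2, unlist(w[1, -1]), "/")
rownames(rel) <- paste0("p = ", w$prevalence)
print(round(rel, 2))
```

Reading the last row from left to right shows the relative weight of a very rare deprivation rising from the equi-proportionate method to Betti-Verma, in line with the ordering of the list above.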
In humanitarian assessments it is less common that the severity of a lack / suffering / threat is inversely proportional to its prevalence. When prevalence-dependent weighting is not appropriate, mdepriv gives the user three options:
- Equi-proportionate weighting
- Unequal weights chosen for substantive considerations
- Switching off prevalence-sensitivity
This third option is meaningful primarily in combination with the Betti-Verma method. This method deserves particular attention. Its defining feature is a double-weighting algorithm:
- The first weighting factor gauges the discriminating power of an item / indicator through its coefficient of variation (= standard deviation / mean). For dichotomous items / indicators, this implies high sensitivity to prevalence: for a proportion p, the C.o.V. equals sqrt((1 − p) / p), which grows without bound as the proportion tends towards zero. For continuous ones, higher C.o.V.s imply higher information content.
- The second weighting factor results from the correlations among the items / indicators. Those correlated strongly positively with most others are considered more highly redundant and are penalized with lower weights. Those with comparatively low positive or negative sums of correlation coefficients are considered to capture more unique aspects of deprivation / humanitarian conditions and are rewarded with higher weights.
Betti-Verma is the only one among the four methods sensitive to the information content of continuous items / indicators. mdepriv automatically computes the weight of each item / indicator proportionate to the product of its two weighting factors.
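The double-weighting logic can be illustrated with a simplified base-R sketch. It is not the mdepriv implementation: the fixed threshold rho_h separating weak from strong correlations is a hypothetical value chosen for the example, the data are constructed by hand, and plain Pearson correlations are used throughout.

```r
# Simplified double-weighting sketch on three hand-made dichotomous items.
# rho_h is a hypothetical fixed threshold; this is not the mdepriv algorithm.
x1 <- rep(c(1, 0), times = c(30, 70))
x2 <- x1
flip <- c(1:3, 31:35)
x2[flip] <- 1 - x2[flip]                              # near-duplicate of x1
x3 <- as.numeric((seq_len(100) %% 10) %in% c(1, 2, 3)) # unrelated to x1
items <- data.frame(x1, x2, x3)

# First factor: coefficient of variation of each item
wa <- sapply(items, function(v) sd(v) / mean(v))

# Second factor: penalize items that correlate strongly with the others
rho   <- cor(items)
rho_h <- 0.5
wb <- sapply(seq_along(items), function(j) {
  r_other <- rho[j, -j]
  low  <- sum(r_other[r_other <  rho_h])  # weak correlations
  high <- sum(r_other[r_other >= rho_h])  # strong correlations
  1 / ((1 + low) * (1 + high))            # assumes 1 + low > 0
})
names(wb) <- names(items)

# Final weights: proportionate to the product of both factors, summing to 1
weights <- wa * wb / sum(wa * wb)
print(round(weights, 3))

# Unit-level synthetic scores as the weighted sum of the items
scores <- drop(as.matrix(items) %*% weights)
summary(scores)
```

In this constructed example x1 and x2 carry nearly the same information, so each is penalized by the second factor, while x3 receives the largest weight although its prevalence equals that of x1.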
The mdepriv function makes the second weighting factor available also for the other three methods, but not all combinations are practically meaningful. mdepriv automatically recognizes the appropriate type of correlations between pairs of items / indicators, but the user can impose a particular type collectively on all pairs (rarely meaningful!).
When the user opts for double-weighting (other than in Betti-Verma, where it is the default), the options have to be specified for both factors. The mdepriv notation for the first factor is wa; for the second it is wb. Recommended combinations depend on the analytic objective. Plausibly, the most relevant are:
Objective | First weighting factor (wa) | Second weighting factor (wb) |
---|---|---|
Indifferent to prevalence (dichotomous items) and to information content (continuous). But controlling for redundancy is important. | Equi-proportionate | on |
Items are all dichotomous. Limited sensitivity to low prevalence desired. Not controlling for redundancy. | Desai-Shah | off |
Items are all dichotomous. High sensitivity to low prevalence desired. Not controlling for redundancy. | Betti-Verma | off |
All items are continuous, and controlling for redundancy is important. | Betti-Verma | on |
Items are mixed dichotomous / continuous. Indifferent to prevalence of the dichotomous, but concerned for information content of the continuous. Redundancy control is important, i.e., a two-level deprivation / severity model: | | |
Level 1: Combine all dichotomous items, save scores as one more continuous indicator for the second level. | Equi-proportionate | on |
Level 2: Combine continuous items and indicator saved from first level. Produces aggregate deprivation / severity statistic. | Betti-Verma | on |
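As a rough guide to how the single-level rows of the table might translate into function calls, the sketch below maps each objective to a pair of wa / wb values and loops over them. The argument values ("equal", "ds", "bv" for wa; "mixed" for redundancy control on and "diagonal" for off in wb) and the item names y1 to y7 in simul_data are assumptions that should be checked against ?mdepriv before use. The loop only demonstrates the syntax: it does not check whether the items of simul_data match the item types named in each row.

```r
# Assumed mapping of table rows to mdepriv arguments; verify with ?mdepriv.
combos <- data.frame(
  objective = c("redundancy control only",
                "limited sensitivity to low prevalence",
                "high sensitivity to low prevalence",
                "continuous items with redundancy control"),
  wa = c("equal", "ds", "bv", "bv"),
  wb = c("mixed", "diagonal", "diagonal", "mixed"),
  stringsAsFactors = FALSE
)
print(combos)

# Run only if the package is installed; otherwise just show the mapping.
if (requireNamespace("mdepriv", quietly = TRUE)) {
  data("simul_data", package = "mdepriv")
  items <- paste0("y", 1:7)  # assumed item columns of simul_data
  for (i in seq_len(nrow(combos))) {
    res <- mdepriv::mdepriv(simul_data, items,
                            wa = combos$wa[i], wb = combos$wb[i])
    cat("\n", combos$objective[i], "\n")
    str(res, max.level = 1)
  }
}
```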
The Cerioli-Zani model appears to be primarily of historic interest. It was one of the first to pursue the so-called fuzzy-set approach to multi-dimensional poverty measurement, which Betti and Verma subsequently deepened. The interested user may consult the reader edited by Lemmi and Betti (2006), but familiarity with fuzzy sets is not required for the understanding and application of mdepriv.
As a final introductory remark, it should be said that the Betti-Verma method is particularly appropriate when a concept (deprivation, severity of conditions, etc.) has many aspects, its dimensionality is not well understood, and classical methods to unravel the dimensions (e.g., factor analysis) are likely distorted by redundancies among the available indicators.
The following focuses on the technical handling of the mdepriv package. Context and rationale are not discussed here. The purpose is to walk the user through the many arguments and outputs of the core function mdepriv. Familiarity with the basics of R is presumed. Still, the examples are kept simple enough that users with little R experience can follow. The code provided is intended as copy-and-paste material, which users can modify for practice or for real-world data analysis.
Chiefly we will use the simulated dataset simul_data with 100 observations, which are enough to demonstrate functionality. To showcase a two-level deprivation model we put the dataset MSNA_HC to use. Both datasets are part of the mdepriv package; thus they do not require further sourcing.