# Survey Data Analysis
## <code>Cookbook</code>
### 1 December 2021

---
class: inverse, left, middle

# A Vision for Data Analysis

"_Multi-functional teams, with strengthened data literacy, regularly conduct meaningful and documented joint data interpretation sessions to define their strategic directions based on statistical evidence_"

???
Slides made with Xaringan - https://arm.rbind.io/slides/xaringan.html
https://slides.earo.me/rladiesakl20/#1
https://xaringantutorial.netlify.app

---

# A Theory of Change for Data Analysis

Proper use of data for advocacy & programmatic decision making

&#8618; Corporate __Standards__ exist to define how to encode & process household survey datasets

&#8618; Field data experts are trained based on precise recipes and predefined tools at each step of the __data life cycle__

&#8618; Data are presented, discussed and linked to expert knowledge during data __interpretation__ sessions with a multi-functional team

&#8618; All potential valid interpretations, including diverging views, are systematically __recorded__

&#8618; __Persuasive__ "Data Stories" and policy papers are generated

---

# Learning objectives

### 1. How to build charts quickly?

### 2. How to calculate impact indicators from survey data?

### 3. How to answer Key Research Questions? Describe...

_Explore and Explain will be presented in the next webinar!_

???
Depending on the pace of the group, if we do not finish today we will organise a second session.

---

# Webinar rules

Leverage this opportunity and make this session __lively__ - there are no stupid questions!

Use the __chat__ to send your questions - we are two facilitators, and one of us focuses on replying to all questions directly in the chat while the session is ongoing

All practical exercises are designed to get you __testing the commands__:

> Start RStudio if you already have it installed, or log in now to the [cloud-based version of RStudio](https://login.rstudio.cloud/register?redirect=https%3A%2F%2Fclient.login.rstudio.cloud%2Foauth%2Flogin%3Fshow_auth%3D0%26show_login%3D0) for this session

> Paste the command from the chat into your RStudio session and check what happens

> If it does not work as expected, share a screenshot or the error messages from the console in the chat

---

# Click-based Workflow...

Associate data with other tables with **ACCESS** ![](images/Logo_Microsoft_Access_2013.png)

...then explore through graphs with **EXCEL** ![](images/Microsoft_Excel_2013_logo.svg.png)

...then mapping with **ArcGIS** ![](images/ArcGIS.png)

...then write up narratives in **WORD** ![](images/Microsoft_Word_logo.png)

... and design a full document with **INDESIGN** ![](images/Adobe_InDesign_icon.png)

... or create an infographic with **ILLUSTRATOR** ![](images/Adobe_Illustrator_icon.png)

???
and eventually some VBA macros

---

## ... coming up with challenges!

As a coauthor, reader or peer reviewer, one would like to see the whole **research process** (_how we arrived at that conclusion_), rather than a finished manuscript with inserted tables and figures.

What analysis is **behind the figure**? Did it account for [..._new last minute question_...] in the analysis?

What **dataset** (_final vs preliminary version_) was used? Were **outliers** identified? How did you **weight** your sample?

Oops, there is an error in the data. Can we **repeat the analysis**? And quickly update the figures, graphs and tables in the report and the presentation!

![](images/exhausted.png)

???

When managing numerous analyses in a collaborative mode, with data that may change, this workflow is **not** the most effective.

* Data are manipulated through "point-and-click" user interfaces, so the steps are not __captured__!
* Data move from one software to another (Excel, GIS, Word...) using different __formats__!
* All results (figures, tables) are **manually** copied/pasted to the final publishing system...
 
---

# Science is '_show me_' - not '_trust me_'!

### Reproducible Research Manifesto, aka the _"Ten Commandments"_

For every result, **keep track** of how it was produced

**Avoid manual data manipulation** steps

**Archive** the exact versions of all external programs used

**Version control** all custom scripts

**Record all intermediate results**, when possible in standardized formats

For analyses that include randomness, **note underlying random seeds**

Always **store raw data** behind plots

Generate hierarchical analysis output, allowing layers of increasing detail to be inspected

Connect **textual statements** to underlying results

Provide **public access** to scripts, runs, and results
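
Two of these rules translate into a few lines of R. The sketch below is only an illustration: the seed value `2021` and the object names are arbitrary choices, not part of the manifesto.

```r
# Fix the random seed so that a random draw can be repeated exactly
set.seed(2021)                      # the seed value itself is arbitrary
draw_1 <- sample(1:100, size = 5)

set.seed(2021)                      # same seed, hence the same draw
draw_2 <- sample(1:100, size = 5)
identical(draw_1, draw_2)

# Keep a record of the R version and package versions used for the results
session_log <- utils::capture.output(utils::sessionInfo())
head(session_log, 3)
```

Saving `session_log` next to the outputs is one simple way to document which software versions produced them.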

---

## Enable a fully auditable workflow

As soon as all steps (i.e. **DATA + TIDYING + MODELING + VISUALS + NARRATIVE**) are done through a **series of written commands recorded in scripts**:

- When spotting an error in the data, or using a different dataset, one just needs to adjust the script and the report will update automatically;

- Data manipulation becomes *de facto* fully documented (no more manual changes in Excel);

- Analysis is self-explanatory and ready for any kind of collaborative review;

- Customization is easier and makes it possible to deliver a final product with professional branding and styling.

> Analysis becomes streamlined and [reproducible](https://unhcr-americas.github.io/reproducibility)!

> A "collaboration mode" is enabled from the beginning of the process!

> As your analysis can be reviewed, you become "covered"...

???
 instead of **hundreds of mouse clicks**
See also http://muschellij2.github.io/summerR_2015/modules/module12.html
---

### Key Concept 1: From "click" to "script"

Using the right combination of packages, you can integrate all necessary data analysis steps into **scripts**:

Data management (import, clean, recode, merge, reshape)

Data analysis (test, regression, multivariate analysis, etc...)
 
Data visualization (plot, map, graph...)

Writing up results (report and presentation generation)

![](images/data-science-wrangle.png)

---

### Key Concept 2: Everything is an object & Anything can be packaged....

![](images/design-cake3.png)


`Vectors` are the basic one-dimensional data structure, created with `c()`; all elements share one type

`Data.frame` is a table where each column is a vector; different columns can hold different types

`Matrix` is like a data frame, except that all elements share one type (typically numeric)

`List` can hold elements of any type and size, mix and match

`Factors` are a special class that R uses for categorical variables, which also allows for value labeling and ordering.

`Functions` are objects designed to transform one object into a new one

`Charts` are objects designed to generate an image

`Models` are objects recording a computation based on specific data
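
To make these object types concrete, here is a minimal sketch in base R. All names and values are made up for illustration.

```r
v  <- c(10, 20, 30)                                   # vector
df <- data.frame(country = c("A", "B", "C"),          # data frame:
                 value   = v)                         # columns of different types
m  <- matrix(1:6, nrow = 2)                           # matrix: one type only
l  <- list(numbers = v, table = df, text = "anything")  # list: mix and match
f  <- factor(c("low", "high", "low"),
             levels = c("low", "high"))               # factor with ordered levels
double_it <- function(x) x * 2                        # function
model <- lm(value ~ seq_along(value), data = df)      # model

# Every object has a class that tells R how to handle it
sapply(list(v, df, m, l, f, double_it, model), function(o) class(o)[1])
```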


???
Elements in a vector must be of the same type.
Reference link on [Manipulating data](http://www.cookbook-r.com/Manipulating_data/)

---

### Key Concept 3: Search, Test, Try...

![](images/change.png)

Get a certification in the R language recognized by UNHCR on [learn.unhcr.org](https://unhcr.csod.com/ui/lms-learner-playlist/PlaylistDetails?playlistId=e90e2279-e3a4-4ef2-8b74-757f91d224b2) and play with [RStudio Primers](https://rstudio.cloud/learn/primers)

Search and ask in [Stackoverflow](https://stackoverflow.com/questions/tagged/r)

Go through [Cheat-sheets](https://rstudio.cloud/learn/cheat-sheets)

Consult Key Manuals, maybe starting with [R for Data Science](https://r4ds.had.co.nz/)

Browse chart library on [`UnhcrDataPackage`]()

Follow blogs such as the general [Rbloggers](https://www.r-bloggers.com/) and [Tidyverse blog](https://www.tidyverse.org/blog/), or the more specific [HumanitaRian-useR-group](https://humanitarian-user-group.github.io/), as well as some Twitter accounts.

Join a forum such as the [Inter-Agency R skype group](https://join.skype.com/qYBKC5q3wKp4) or the internal UNHCR Ms Discussion group (ask to join!).


---

# Learning stages...

![](images/learning.png)

__Step 1.__ Develop an understanding of what data science is and what programming and math concepts are needed for it

__Step 2.__ Break data science challenges into small steps - Acquire basic command syntax through very practical and focused projects

__Step 3.__ Develop Reproducible Analysis Workflow - Understand the relevance, inputs, constraints, and limitations of the various techniques

__Step 4.__ Optimize your problem solving approaches in elegant ways - Build packages


???
https://towardsdatascience.com/the-stages-of-learning-data-science-3cc8be181f54

See Video - https://www.youtube.com/watch?v=hpMc6TgT34I

---
# Analytics Models / Algorithms

![](images/analytics.png)
---
class: center, middle, inverse

# Practical Use Case 1 - Charting

### *My boss needs a slide with the main origin of Asylum Seekers and Refugees from the Americas in this country... in 5 minutes....*

---

## How to get to this chart in a couple of lines?

???
https://r4ds.had.co.nz/graphics-for-communication.html#figure-sizing

---

### Install required packages

Go to your [locally installed RStudio](https://www.rstudio.com/products/rstudio/download/#download) or [sign up for a free RStudio Cloud account](https://login.rstudio.cloud/register?redirect=https%3A%2F%2Fclient.login.rstudio.cloud%2Foauth%2Flogin%3Fshow_auth%3D0%26show_login%3D0)

First create a new project within RStudio, then make sure we have the [tidyverse](https://www.tidyverse.org/packages/) plus additional UNHCR packages

```r
# Tidyverse
if (!require("tidyverse")) install.packages("tidyverse", dependencies = TRUE)
if (!require("here")) install.packages("here")

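# UnhcRverse (installed from GitHub, which requires the remotes package)
if (!require("remotes")) install.packages("remotes")
if (!require("unhcrdatapackage")) remotes::install_github('unhcr/unhcrdatapackage')
if (!require("unhcRstyle")) remotes::install_github('unhcr-web/unhcRstyle')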
```

![tidyverse](images/forcats.png)

???

The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.

---

### Get the data

To get the data, multiple approaches are possible.

Go to the [UNHCR dataset page on HDX](https://data.humdata.org/dataset/unhcr-population-data-for-world) to download the dataset `end_year_population_totals_residing_world.csv` and save it locally within your project in a folder named `data-raw`

```r
popdata <- read.csv(here::here("data-raw", "end_year_population_totals_residing_world.csv"))
```

or __save time__ and directly use the unhcrdatapackage, which includes a reshaped `long` version of the data

```r
popdata <- unhcrdatapackage::end_year_population_totals_long

## check the name of the variable
#names(popdata)

## Check the top 5 lines for select variables
# head(popdata %>% select(Year, CountryOriginCode,CountryAsylumCode,Population.type, Value),5) 
```

```
## [1] "Year"                        "CountryOriginCode"          
## [3] "CountryAsylumCode"           "CountryOriginName"          
## [5] "CountryAsylumName"           "Population.type"            
## [7] "Value"                       "Population.type.label"      
## [9] "Population.type.label.short"
```

<table>
 <thead>
 <tr>
 <th style="text-align:right;"> Year </th>
 <th style="text-align:left;"> CountryAsylumCode </th>
 <th style="text-align:left;"> Population.type </th>
 <th style="text-align:right;"> Value </th>
 </tr>
 </thead>
<tbody>
 <tr>
 <td style="text-align:right;"> 1951 </td>
 <td style="text-align:left;"> AUS </td>
 <td style="text-align:left;"> REF </td>
 <td style="text-align:right;"> 180000 </td>
 </tr>
 <tr>
 <td style="text-align:right;"> 1951 </td>
 <td style="text-align:left;"> AUT </td>
 <td style="text-align:left;"> REF </td>
 <td style="text-align:right;"> 282000 </td>
 </tr>
 <tr>
 <td style="text-align:right;"> 1951 </td>
 <td style="text-align:left;"> BEL </td>
 <td style="text-align:left;"> REF </td>
 <td style="text-align:right;"> 55000 </td>
 </tr>
 <tr>
 <td style="text-align:right;"> 1951 </td>
 <td style="text-align:left;"> CAN </td>
 <td style="text-align:left;"> REF </td>
 <td style="text-align:right;"> 168511 </td>
 </tr>
 <tr>
 <td style="text-align:right;"> 1951 </td>
 <td style="text-align:left;"> DNK </td>
 <td style="text-align:left;"> REF </td>
 <td style="text-align:right;"> 2000 </td>
 </tr>
</tbody>
</table>

---

### Reshape the data  - 1 - Merge with reference

```r
# first merge it with the reference table to get the bureau filter 
#names(unhcrdatapackage::reference)
*Origin <- dplyr::left_join(
 x= unhcrdatapackage::end_year_population_totals_long, 
 y= unhcrdatapackage::reference, 
 by = c("CountryAsylumCode" = "iso_3"))

# head(names(unhcrdatapackage::reference), 9)
#knitr::kable(head(unhcrdatapackage::reference %>% select(UNHCRBureau,iso_3, ctryname )%>% filter(UNHCRBureau== "Americas" ),   5), format = 'html')
```

```
## [1] "iso_3"       "UNHCRcode"   "ctryname"    "namepostat"  "namepostat2"
## [6] "gis_name"    "UNHCRBureau" "main_office" "hcr_region"
```

<table>
 <thead>
 <tr>
 <th style="text-align:left;"> UNHCRBureau </th>
 <th style="text-align:left;"> iso_3 </th>
 <th style="text-align:left;"> ctryname </th>
 </tr>
 </thead>
<tbody>
 <tr>
 <td style="text-align:left;"> Americas </td>
 <td style="text-align:left;"> ABW </td>
 <td style="text-align:left;"> Aruba </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Americas </td>
 <td style="text-align:left;"> AIA </td>
 <td style="text-align:left;"> Anguilla </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Americas </td>
 <td style="text-align:left;"> ARG </td>
 <td style="text-align:left;"> Argentina </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Americas </td>
 <td style="text-align:left;"> ASM </td>
 <td style="text-align:left;"> American Samoa </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Americas </td>
 <td style="text-align:left;"> ATG </td>
 <td style="text-align:left;"> Antigua and Barbuda </td>
 </tr>
</tbody>
</table>
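
To see what the join does without downloading anything, here is a minimal sketch on two toy tables. The values are made up; only the column names mirror the ones used above.

```r
library(dplyr)

# Toy population table (made-up values)
pop <- data.frame(
  CountryAsylumCode = c("PAN", "COL", "FRA"),
  Value = c(100, 200, 300)
)

# Toy reference table: note that FRA is deliberately missing
ref <- data.frame(
  iso_3 = c("PAN", "COL"),
  UNHCRBureau = c("Americas", "Americas")
)

# left_join keeps every row of x and adds the matching columns of y;
# rows without a match get NA in the added columns
joined <- left_join(pop, ref, by = c("CountryAsylumCode" = "iso_3"))
joined
```

Checking for `NA` in the added columns after a join is a quick way to spot codes missing from the reference table.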

---

### Reshape the data - 2 - Filter

```r
nrow(Origin) # Number of Rows before filter
```

```
# [1] 161788
```

```r
## Using Pipe (%>%) Operator
# allows us to pass the result of one function/argument to the other one in sequence
# assigning each functional output as an argument to the next one, and so on
Origin <- Origin %>%
 
* ## First handling functions..
* filter(
 # Other dplyr verbs for handling rows and columns include: 
 # arrange, distinct, filter, mutate, rename, 
 # select, slice, slice_sample, summarise
 ## https://dplyr.tidyverse.org/reference/index.html
 
 ## we will use the 4 filters below
 CountryAsylumName == "Panama" & 
 UNHCRBureau == "Americas" &
 Year == max(unhcrdatapackage::end_year_population_totals_long$Year) &
 Population.type %in% c("REF", "ASY", "VDA"))

nrow(Origin) # Checking Number of rows after filter
```

```
# [1] 46
```
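
The same filtering logic can be tried on a small toy table first. This is a sketch with made-up values; only the column names follow the dataset above.

```r
library(dplyr)

toy <- data.frame(
  Year = c(2019, 2020, 2020, 2020),
  Population.type = c("REF", "REF", "IDP", "ASY"),
  Value = c(10, 20, 30, 40)
)

latest <- toy %>%
  filter(
    Year == max(Year) &                              # keep the latest year only
    Population.type %in% c("REF", "ASY", "VDA")      # keep selected population types
  )

nrow(toy)     # rows before the filter
nrow(latest)  # rows after the filter
```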

---

### Reshape the data - 3 - Shorten country name

```r
## Check the possible values of CountryOriginName
## (levels() returns NULL because the column is character, not factor)
levels(Origin$CountryOriginName)
```

```
# NULL
```

```r
## Convert to factor and display the last 6 values
tail(levels(as.factor(Origin$CountryOriginName)),6)
```

```
# [1] "Sierra Leone"                       "Somalia"                           
# [3] "Sri Lanka"                          "Sudan"                             
# [5] "Ukraine"                            "Venezuela (Bolivarian Republic of)"
```

```r
Origin <- Origin %>% 
* mutate(
 CountryOriginName = str_replace(CountryOriginName,
 " \\(Bolivarian Republic of\\)", "")) 
## We used a string replacement function; the parentheses are escaped
## because they are special characters in regular expressions

## check replacement 
tail(levels(as.factor(Origin$CountryOriginName)),4)
```

```
# [1] "Sri Lanka" "Sudan"     "Ukraine"   "Venezuela"
```
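
Parentheses are special characters in regular expressions, so they need care when they appear in the text to replace. A minimal sketch of two equivalent options with `stringr`, on a made-up vector:

```r
library(stringr)

x <- c("Venezuela (Bolivarian Republic of)", "Sudan")

# Option 1: escape the parentheses in the regular expression
a <- str_replace(x, " \\(Bolivarian Republic of\\)", "")

# Option 2: treat the pattern as plain text with fixed()
b <- str_replace(x, fixed(" (Bolivarian Republic of)"), "")

a
identical(a, b)
```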

---

### Reshape the data - 4 - Aggregate by Country of Origin

```r
Origin <- Origin %>% 
* group_by( CountryOriginName) %>%
* summarise(DisplacedAcrossBorders = sum(Value) )

### Explore the results in a filterable table
# DT::datatable(Origin,
#    fillContainer = FALSE, options = list(pageLength = 4))
```

.pull-right[
<div id="htmlwidget-87f37838c3d2b31201bf" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-87f37838c3d2b31201bf">{"x":{"filter":"none","vertical":false,"fillContainer":false,"data":[["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35"],["Afghanistan","Armenia","Bangladesh","Belarus","Bosnia and Herzegovina","Cameroon","Colombia","Côte d'Ivoire","Croatia","Cuba","Democratic Republic of the Congo","Dominican Republic","Ecuador","El Salvador","Eritrea","Guatemala","Haiti","Honduras","India","Iran (Islamic Republic of)","Italy","Jamaica","Liberia","Mexico","Nepal","Nicaragua","Nigeria","Pakistan","Peru","Sierra Leone","Somalia","Sri Lanka","Sudan","Ukraine","Venezuela"],[11,5,15,7,5,14,2745,5,5,1674,5,5,15,1315,38,20,19,94,7,5,5,5,5,5,14,5409,44,16,23,5,101,7,5,5,124384]],"container":"<table class=\"display\">\n <thead>\n <tr>\n <th> <\/th>\n <th>CountryOriginName<\/th>\n <th>DisplacedAcrossBorders<\/th>\n <\/tr>\n <\/thead>\n<\/table>","options":{"pageLength":4,"columnDefs":[{"className":"dt-right","targets":2},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false,"lengthMenu":[4,10,25,50,100]}},"evals":[],"jsHooks":[]}</script>
]
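
The aggregation step is easier to follow on a toy table. A minimal sketch with made-up values; `na.rm = TRUE` is an addition here, to show how missing values could be ignored in the sum.

```r
library(dplyr)

toy <- data.frame(
  CountryOriginName = c("A", "A", "B"),
  Value = c(5, 7, 3)
)

totals <- toy %>%
  group_by(CountryOriginName) %>%                               # one group per origin
  summarise(DisplacedAcrossBorders = sum(Value, na.rm = TRUE))  # one row per group

totals
```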

---

### Reshape the data - 5 - Create data labels

```r
Origin <- Origin %>% 
 
* ## using the scales::label_number_si function
 ## which is only needed for values above 1000: smaller values are kept as they are
 mutate( DisplacedAcrossBordersRound = 
 ## choose the label format with ifelse
 ifelse(DisplacedAcrossBorders > 1000, 
 ## case 1 - above 1000: abbreviated label such as "124.4K"
 paste(scales::label_number_si(accuracy = 0.1)(DisplacedAcrossBorders)),
 ## case 2 - 1000 or below: plain number
 as.character(DisplacedAcrossBorders) ) ) 
```

.pull-right[
<table>
 <thead>
 <tr>
 <th style="text-align:left;"> CountryOriginName </th>
 <th style="text-align:left;"> DisplacedAcrossBordersRound </th>
 </tr>
 </thead>
<tbody>
 <tr>
 <td style="text-align:left;"> Venezuela </td>
 <td style="text-align:left;"> 124.4K </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Nicaragua </td>
 <td style="text-align:left;"> 5.4K </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Colombia </td>
 <td style="text-align:left;"> 2.7K </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Cuba </td>
 <td style="text-align:left;"> 1.7K </td>
 </tr>
 <tr>
 <td style="text-align:left;"> El Salvador </td>
 <td style="text-align:left;"> 1.3K </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Somalia </td>
 <td style="text-align:left;"> 101 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Honduras </td>
 <td style="text-align:left;"> 94 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Nigeria </td>
 <td style="text-align:left;"> 44 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Eritrea </td>
 <td style="text-align:left;"> 38 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Peru </td>
 <td style="text-align:left;"> 23 </td>
 </tr>
</tbody>
</table>
]
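
The `ifelse()` logic can also be written without the `scales` package. The sketch below uses a hypothetical helper, `short_label()`, which is not part of any package and only imitates the labels shown in the table.

```r
# Hypothetical helper: abbreviate values of 1000 and above, keep smaller ones as is
short_label <- function(x) {
  ifelse(x >= 1000,
         sprintf("%.1fK", x / 1000),   # e.g. 124384 becomes "124.4K"
         as.character(x))              # e.g. 101 stays "101"
}

short_label(c(124384, 5409, 101))
```

`ifelse()` is vectorised, so the helper can be used directly inside `mutate()` on a whole column.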

---

### Reshape the data - 6 - Select top 10 countries

```r
Origin <- Origin %>% 
 ## Sorting resulting by descending numbers
 arrange(desc(DisplacedAcrossBorders)) %>%
 ## keep only the first 10 records
 head(10)

## check results
#knitr::kable(head(Origin %>%
#    select(CountryOriginName,DisplacedAcrossBordersRound ),5 ), format = 'html')
```
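
A minimal sketch of the same "top n" logic on a toy table with made-up values. `slice_max()` is shown as a possible alternative and assumes dplyr 1.0.0 or later.

```r
library(dplyr)

toy <- data.frame(
  CountryOriginName = c("A", "B", "C", "D"),
  DisplacedAcrossBorders = c(5, 50, 20, 1)
)

# Sort in descending order, then keep the first 2 records
top2 <- toy %>%
  arrange(desc(DisplacedAcrossBorders)) %>%
  head(2)

# Possible alternative in one step
top2_alt <- toy %>%
  slice_max(DisplacedAcrossBorders, n = 2, with_ties = FALSE)

top2
```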

---

### Build the chart - 1 - Initial bar plot

```r
## A chart is indeed an object

## We create below a chart called plot using the basic ggplot function

plot <- ggplot(data = Origin, 
 
 ## Construct aesthetic mappings of variables to the chart
 aes(
 # Reordering country name by Value to avoid default alphabetical
 x = reorder(CountryOriginName, DisplacedAcrossBorders), 
 y = DisplacedAcrossBorders)) +
 
 # here we configure that it will be bar chart 
 geom_bar(
 stat = "identity", 
## stat indicates how data should be summarised:
# the default behavior is to count the rows for each x value;
# stat = "identity" tells ggplot2 to skip aggregation and use y as is
# An alternative is to use geom_col
## indicate the color to use to fill the bar 
 fill = "#0072bc")

## Print the plot!!
plot
```
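
As the comment in the code mentions, `geom_col()` is an alternative to `geom_bar(stat = "identity")`. A minimal sketch on a made-up table shows that both draw the same bars:

```r
library(ggplot2)

toy <- data.frame(country = c("A", "B", "C"), n = c(30, 10, 20))

# Bars whose height is taken directly from the y variable
p_bar <- ggplot(toy, aes(x = reorder(country, n), y = n)) +
  geom_bar(stat = "identity", fill = "#0072bc")

# Same chart, shorter syntax
p_col <- ggplot(toy, aes(x = reorder(country, n), y = n)) +
  geom_col(fill = "#0072bc")

p_col
```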


???
See full reference here https://ggplot2.tidyverse.org/reference/

---

### Build the chart - 2 - Format axis number

```r
## A plot is constructed through layered instructions - 
# I can re-use the same plot object (here: plot) and add more instructions with a +

plot <- plot + 
 scale_y_continuous(labels = scales::label_number_si())

## simply typing the name of the object as a command prints it
plot
```


---

### Build the chart - 3 - Flip the chart

```r
plot <- plot + 
 coord_flip()

plot
```


---

### Build the chart - 4 - Add and position data label

```r
plot <- plot + 
 ## Add label outside the bar in black for the shorter bars
 geom_label( data = subset(Origin, DisplacedAcrossBorders < max(DisplacedAcrossBorders) / 1.5),
 aes(x = reorder(CountryOriginName, DisplacedAcrossBorders), 
 y = DisplacedAcrossBorders,
 label= DisplacedAcrossBordersRound),
 hjust = -0.1 ,
 vjust = 0.5, 
 colour = "black", 
 fill = NA, label.size = NA, 
 family = "Lato", size = 4 ) + 
 ## Add label inside the bar in white for the longest bars
 geom_label( data = subset(Origin, DisplacedAcrossBorders >= max(DisplacedAcrossBorders) / 1.5),
 aes(x = reorder(CountryOriginName, DisplacedAcrossBorders), 
 y = DisplacedAcrossBorders,
 label= DisplacedAcrossBordersRound),
 hjust = 1.1 ,
 vjust = 0.5, 
 colour = "white", 
 fill = NA, label.size = NA, 
 family = "Lato", size = 4 ) 
plot
```
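
The two `geom_label()` layers rely on splitting the data at a threshold of `max / 1.5`. A minimal sketch of that split on a made-up table, in base R:

```r
toy <- data.frame(country = c("A", "B", "C"), n = c(120, 90, 10))

threshold <- max(toy$n) / 1.5            # here 120 / 1.5 = 80

# Long bars: the label fits inside the bar (white text, hjust above 1)
inside <- subset(toy, n >= threshold)

# Short bars: the label goes outside the bar (black text, negative hjust)
outside <- subset(toy, n < threshold)

inside$country
outside$country
```

Because the two subsets never overlap, every bar gets exactly one label.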


---

### Build the chart - 5 - Add chart labels

```r
plot <- plot + 
 ## and the chart labels
 labs(title = "What are the main origins of forced displacement across borders?",
 subtitle = paste0("Top 10 Origin from Americas - Data as of ",max(unhcrdatapackage::end_year_population_totals_long$Year), " in Panama" ), 
 x = " ",
 y = "# of forcibly displaced people",
 caption = "Data: UNHCR Refugee Population Statistics Database.\n Forced Displacement includes Refugees, Asylum Seekers and Venezuelans Displaced Abroad population groups.") 
plot
```


---

### Build the chart - 6 - Apply unhcRstyle

```r
plot <- plot + 
 unhcRstyle::unhcr_theme(base_size = 8) 
plot
```


---

### Build the chart - 7 - Adjust graphical element

```r
plot <- plot + 
 ## Highlight axis line
 geom_hline(yintercept = 0, size = 1.1, colour = "#333333") +
 theme(
 ### changing grid line that should appear
 panel.grid.major.x = element_line(color = "#cbcbcb"),
 panel.grid.major.y = element_blank()) 
plot
```


---

# The Grammar of Graphics in 7 key components

.img75[![Structure of ggplot2](images/struct_ggplot.png)]
[@CedScherer](https://twitter.com/CedScherer/status/1229418108122783744?s=20)
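
As an illustration, the seven components can all appear in one short `ggplot2` call. This sketch uses the built-in `mtcars` dataset, which is unrelated to the UNHCR data and chosen only because it ships with R.

```r
library(ggplot2)

p <- ggplot(data = mtcars,                                  # 1. data
            aes(x = wt, y = mpg, colour = factor(cyl))) +   # 2. aesthetics
  geom_point() +                                            # 3. layer (geom + stat)
  scale_colour_brewer(palette = "Dark2") +                  # 4. scales
  coord_cartesian() +                                       # 5. coordinates
  facet_wrap(~ am) +                                        # 6. facets
  theme_minimal()                                           # 7. theme

p
```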

???
**Data**
- Data is not just data
- Representation defines what can be done with it
- Grammar requires a tidy format (though it precedes the notion)

**Aesthetics**
- Allow generic datasets to be understood by the graphic system.
- Link variables in data to graphical properties in the geometry.

**Layers**
1. Geom
    - How to interpret aesthetics as graphical representations
    - Is a progression of positional aesthetics a number of points, a line, a single polygon, or something else entirely?
2. Stats
    - Transform input variables to displayed values
    - Is implicit in many plot-types but can often be done prior to plotting

**Scales**
- A scale translates back and forth between variable ranges and property ranges
    - Categories > Colour
    - Numbers > Position

**Coordinates**
- Defines the physical mapping of the aesthetics to the paper

**Facets**
- Define the number of panels with equal logic and split data among them…
- Small multiples

**Themes**
- Theming spans every part of the graphic that is not linked to data

---

### Insert this in a slide deck -  Report template....

All are based on R Markdown, which allows creating multiple outputs from the same content format.

- Powerpoint with UNHCR style

- Word with UNHCR style

- html/bootstrap - scrollable report

- html/slide - slideable report (WIP)

- Paginated report built on top of pagedown.

- Analysis Repository contribution

Access them from the **"From Template"** panel when creating a new Rmd document

![UNHCR Rmd templates](images/rmd_templates.png)
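
The "one content, several outputs" idea can be sketched with `rmarkdown::render()`. This example uses the standard `html_document` and `word_document` formats as stand-ins (an assumption, since the UNHCR templates are not installed here) and a throwaway file in a temporary folder.

```r
# Write a tiny Rmd file to a temporary folder
fence <- strrep("`", 3)   # three backticks, built in code to keep this example readable
rmd_file <- file.path(tempdir(), "demo.Rmd")
writeLines(c(
  "---",
  "title: 'Demo report'",
  "---",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
), rmd_file)

# Render the same source to two formats, only if rmarkdown and pandoc are available
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  outputs <- c(
    rmarkdown::render(rmd_file, output_format = "html_document", quiet = TRUE),
    rmarkdown::render(rmd_file, output_format = "word_document", quiet = TRUE)
  )
  basename(outputs)
}
```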


---
class: center, middle, inverse

# Practical Use Case 2 - Building impact indicators from Survey Data

### *The National Household Survey includes Forcibly Displaced People and was published yesterday... We need to report our indicators by tomorrow!*

???

We will use the “Encuesta Nacional de Calidad de Vida ECV 2020” published by Colombia's national statistics office, DANE (http://microdatos.dane.gov.co/index.php/catalog/718/get_microdata), and explore different types of statistical analysis based on three example research questions.

---

## How to get to this chart in a couple of lines?

---

### Install packages

The particularity of household survey datasets stored in SPSS (`.sav`), SAS (`.sas7bdat`) or Stata (`.dta`) format is that they usually contain both values and __associated labels__ (we call them `labelled data`).

```r
if (!require("haven")) install.packages("haven", dependencies = TRUE) 
if (!require("labelled")) install.packages("labelled", dependencies = TRUE)
if (!require("sjPlot")) install.packages("sjPlot", dependencies = TRUE)
if (!require("DT")) install.packages("DT", dependencies = TRUE)
```
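
To see what labelled data looks like in R, here is a minimal sketch with `haven`. The vector is made up; the codes and labels mimic the `Sexo` variable of the survey.

```r
library(haven)

# A labelled vector stores numeric codes plus their labels
sex <- labelled(
  c(1, 2, 2, 1),
  labels = c(Hombre = 1, Mujer = 2),   # value labels
  label  = "Sexo"                      # variable label
)

attr(sex, "label")    # the variable label
attr(sex, "labels")   # the value labels

# Convert the codes to a factor that uses the labels as levels
sex_factor <- as_factor(sex)
table(sex_factor)
```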

???
https://www.pipinghotdata.com/posts/2020-12-23-leveraging-labelled-data-in-r/

---

### Get the data

The dataset, [Encuesta Nacional de Calidad de Vida ECV 2020, Colombia, DANE](http://microdatos.dane.gov.co/index.php/catalog/718/get_microdata), is available through a microdata library. Download the data from [here](https://unhcr365-my.sharepoint.com/:f:/g/personal/legoupil_unhcr_org/El_KBosz8pFJilV0mSYUXycBhXqXx2dXtGey4T3_R1L8pA?e=kFXm5U) to the `data-raw` folder within your project

```r
## Household composition where disaggregation variables are
data0 <- haven::read_sav(here::here("data-raw", "Características y composición del hogar.sav"))

## Display frame content
# data0 %>% 
#        sjPlot::view_df()

# view_df() offers many options, e.g. to add the frequencies of values, 
# the amount of missing values per variable, 
#or even weighted frequencies.
  # show.na = TRUE, 
  # show.type = TRUE, 
  # show.frq = TRUE, 
  # show.prc = TRUE, 
  # show.string.values = TRUE, 
  # show.id = TRUE 
```
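
If the survey file is not at hand yet, the reading step can be rehearsed on a toy `.sav` file. This sketch writes a made-up table to a temporary file and reads it back; the variable names `P6020` and `P6040` are borrowed from the survey, the values are invented.

```r
library(haven)

# Toy data frame with one plain and one labelled column
toy <- data.frame(P6040 = c(34, 12, 57))
toy$P6020 <- labelled(c(1, 2, 2),
                      labels = c(Hombre = 1, Mujer = 2),
                      label = "Sexo")

# Write to a temporary SPSS file, then read it back
tmp <- tempfile(fileext = ".sav")
write_sav(toy, tmp)
back <- read_sav(tmp)

# The labels survive the round trip
attr(back$P6020, "labels")
```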

.right-column[
<table style="border-collapse:collapse; border:none;">
<caption>Data frame: .</caption>
<tr>
<th style="border-bottom:double; font-style:italic; font-weight:normal; padding:0.2cm; text-align:left; vertical-align:top;">ID</th><th style="border-bottom:double; font-style:italic; font-weight:normal; padding:0.2cm; text-align:left; vertical-align:top;">Name</th><th style="border-bottom:double; font-style:italic; font-weight:normal; padding:0.2cm; text-align:left; vertical-align:top;">Label</th><th style="border-bottom:double; font-style:italic; font-weight:normal; padding:0.2cm; text-align:left; vertical-align:top;">Values</th><th style="border-bottom:double; font-style:italic; font-weight:normal; padding:0.2cm; text-align:left; vertical-align:top;">Value Labels</th>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">1</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">DIRECTORIO</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">Directorio</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;" colspan="2">range: 7247300-7407013</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">2</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">SECUENCIA_ENCUESTA</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Secuencia_encuesta</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee" colspan="2">range: 1-22</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">3</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">SECUENCIA_P</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">Secuencia_p</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;" colspan="2">range: 1-4</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">4</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">ORDEN</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Orden</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee" colspan="2">range: 1-22</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">5</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">FEX_C</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">Factor de expansión</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;" colspan="2">range: 1.9-3061.6</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">6</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P6016</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Número de orden de la persona que proporciona la información:</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee" colspan="2">range: 1-18</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">7</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">P1894</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">Tipo de documento de identidad</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">1 2 3 4 5</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">Registro civil de nacimiento Tarjeta de identidad Cédula de ciudadanía Cédula de extranjería No tiene documento de identidad</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">8</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P6020</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Sexo</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">1 2</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Hombre Mujer</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">9</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">P6034</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">¿Cuál es la fecha de nacimiento de _____?</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;" colspan="2">range: 1-2</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">10</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P6040</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">¿cuántos años cumplidos tiene...?</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee" colspan="2">range: 0-106</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">11</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">P6051</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">Cuál es el parentesco de....con el jefe o la jefa de este hogar?</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">1 2 3 4 5 6 7 8 9 10 11 12 13 14</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">Jefe (a) del hogar Pareja, esposo (a), cónyuge, compañero(a) Hijo(a) hijastro(a) Nieto (a) Padre, madre, padrastro y madrastra Suegro o suegra Hermano (a), hermanastro (a) Yerno, nuera Otro pariente del jefe(a) Empleado(a) del servicio doméstico Parientes del servicio doméstico Trabajador Pensionista Otro pariente</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">12</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P5502</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Actualmente…:</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">1 2 3 4 5 6</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee"> No está casado(a) y vive en pareja hace menos de dos años No está casado(a) y vive en pareja hace dos años o más Está viudo(a) Está separado(a) o divorciado(a) Está soltero(a) Está casado(a)</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">13</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">P6071</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">El (la) cónyuge de .. vive en este hogar</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">1 2</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">Sí No</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">14</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P6071S1</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">No de orden</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee" colspan="2">range: 1-19</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">15</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">P756</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">¿Dónde nació ________?</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">1 2 3</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">En este municipio En otro municipio En otro país</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">16</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P756S1</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Departamento</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee" colspan="2">range: 5-99</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">17</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">P756S2</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">Municipio</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;" colspan="2">range: 5001-99773</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">18</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P756S3</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">En otro país</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">1 2 3 4 5 6 7 8 9 10 11</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Estados Unidos España Venezuela Ecuador Panamá Perú Costa Rica Argentina Francia Italia Otro país</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">19</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">P6074</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">¿_______________ siempre ha vivido aquí en este municipio?</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">1 2</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">Sí No</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">20</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P767</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">¿cuántos años continuos hace que vive ___ aquí en este municipio?</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee" colspan="2">range: 0-90</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">21</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">P6076</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">Antes de venir a este municipio_______________ vivía en</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">1 2</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">Otro país Otro municipio</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">22</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P6076S1</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Departamento</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee"></td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee"></td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">23</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">P6076S2</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">Municipio</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;"></td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;"></td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">24</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P6077</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">_____ vivía en</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">1 2</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">El centro urbano donde está la alcaldía Un corregimiento, inspección de policía, caserío, vereda o c</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">25</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">P6096</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">¿cuál fue la razón principal para cambiar la residencia al municipio actual?</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">1 2 3 4 5 6 7 8 9 10 11 12</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;"> Dificultad para encontrar trabajo o ausencia de medios de s Riesgo o consecuencia de desastre natural (inundación, aval Amenaza o riesgo para su vida, su libertad o su integridad f Necesidad de educación Porque se casó o formó pareja Motivos de salud Mejorar la vivienda o localización Mejores oportunidades laborales o de negocio Acompañar a otro(s) miembro(s) del hogar Adquisición de vivienda Búsqueda de tranquilidad o mejor calidad de vida Otra</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">26</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P6081</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">El padre de .. vive en este hogar</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">1 2 3</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Sí No Fallecido</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">27</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">P6081S1</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">No. De orden</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;" colspan="2">range: 1-15</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">28</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P6087</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">¿cuál es o fue el nivel de educación más alto alcanzado por el padre de…..?</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">1 2 3 4 5 6 7 8 9 10</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Algunos años de primaria Toda la primaria Algunos años de secundaria Toda la secundaria Uno o mas años de técnica o tecnológica Técnica o tecnológica completa Uno o mas años de universidad Universitaria completa Ninguno No sabe</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">29</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">P6083</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">La madre de .. vive en este hogar</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">1 2 3</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">Sí No Fallecida</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">30</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P6083S1</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">No. De orden</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee" colspan="2">range: 1-18</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">31</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">P6088</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">¿cuál es o fue el nivel de educación más alto alcanzado por la madre de......?</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">1 2 3 4 5 6 7 8 9 10</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">Algunos años de primaria Toda la primaria Algunos años de secundaria Toda la secundaria Uno o mas años de técnica o tecnológica Técnica o tecnológica completa Uno o mas años de universidad Universitaria completa Ninguno No sabe</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">32</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P6080</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">De acuerdo con su cultura, pueblo o rasgos físicos, _____ es o se reconoce comó:</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">1 2 3 4 5 6</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Indígena Gitano (a) (rom) Raizal del archipiélago de San Andrés, Providencia y Santa C Palenquero (a) de San Basilio Negro (a), mulato (a) (afrodescendiente), afrocolombiano(a) Ninguno de los anteriores</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">33</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">P5667</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;"> ¿a cuál pueblo o etnia indígena pertenece _____?</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;"></td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;"></td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">34</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P2057</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">¿Usted se considera campesino(a)?</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">1 2 3</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Si No No informa</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">35</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">P2059</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;"> ¿Usted considera que alguna vez fue campesino(a)?</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">1 2 3</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">Si No No informa</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">36</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P2061</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee"> ¿Usted considera que la comunidad en que vive es campesina?</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">1 2 3</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Si No No informa</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">37</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">P1895</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">En general, qué tan satisfecho(a) se siente ... con su vida actualmente?</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">0 1 2 3 4 5 6 7 8 9 10</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">Totalmente insatisfecho(a) 1 2 3 4 5 6 7 8 9 Totalmente satisfecho(a)</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">38</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P1896</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">En general, qué tan satisfecho(a) se siente ... con su ingreso actualmente?</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">0 1 2 3 4 5 6 7 8 9 10</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Totalmente insatisfecho(a) 1 2 3 4 5 6 7 8 9 Totalmente satisfecho(a)</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">39</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">P1897</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">En general, qué tan satisfecho(a) se siente ... con su salud actualmente?</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">0 1 2 3 4 5 6 7 8 9 10</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">Totalmente insatisfecho(a) 1 2 3 4 5 6 7 8 9 Totalmente satisfecho(a)</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">40</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P1898</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">En general, qué tan satisfecho(a) se siente ... con su nivel de seguridad actualmente?</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">0 1 2 3 4 5 6 7 8 9 10</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Totalmente insatisfecho(a) 1 2 3 4 5 6 7 8 9 Totalmente satisfecho(a)</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">41</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">P1899</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">En general, qué tan satisfecho(a) se siente ... con su trabajo/actividad actualmente?</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">0 1 2 3 4 5 6 7 8 9 10</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">Totalmente insatisfecho(a) 1 2 3 4 5 6 7 8 9 Totalmente satisfecho(a)</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">42</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P3175</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee"> En general, ¿qué tan satisfecho/a se siente _____ con su tiempo libre?</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">0 1 2 3 4 5 6 7 8 9 10</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Totalmente insatisfecho(a) 1 2 3 4 5 6 7 8 9 Totalmente satisfecho(a)</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">43</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">P1901</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;"> ¿qué tan feliz se sintió ... el día de ayer?</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">0 1 2 3 4 5 6 7 8 9 10</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">Para nada feliz 1 2 3 4 5 6 7 8 9 Todo el tiempo feliz</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">44</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P1903</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">¿qué tan preocupado(a) se sintió ... el día de ayer?</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">0 1 2 3 4 5 6 7 8 9 10</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Para nada preocupado(a) 1 2 3 4 5 6 7 8 9 Todo el tiempo preocupado(a)</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">45</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">P1904</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">¿qué tan triste se sintió ... el día de ayer?</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">0 1 2 3 4 5 6 7 8 9 10</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">Para nada triste 1 2 3 4 5 6 7 8 9 Todo el tiempo triste</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">46</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P1905</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Qué tanto considera ... que las cosas que hace en su vida valen la pena?</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">0 1 2 3 4 5 6 7 8 9 10</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">No valen la pena 1 2 3 4 5 6 7 8 9 Valen totalmente la pena</td>
</tr>
<tr>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">47</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">P1927</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">¿En cuál escalón diría usted que se encuentra parado(a) en este momento?</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">0 1 2 3 4 5 6 7 8 9 10</td>
<td style="padding:0.2cm; text-align:left; vertical-align:top;">Peor vida 1 2 3 4 5 6 7 8 9 Mejor vida</td>
</tr>

</table>
]

???
https://www.pipinghotdata.com/posts/2020-12-23-leveraging-labelled-data-in-r/

Here are operations I commonly perform on labelled data:

Evaluate if variable is of class haven_labelled.

Why? Troubleshooting, exploring, mutating.

Function(s): haven::is.labelled()

Convert haven_labelled variable to numeric value codes.

Why? To treat the variable as continuous for analysis. For example, if a 1-7 rating scale imports as labelled and you want to compute a mean.

Function(s): base::as.numeric() (strips the variable of all metadata), haven::zap_labels() and labelled::remove_val_labels() (remove value labels, retain other metadata)

Convert haven_labelled variable to factor with value labels.

Why? To treat the variable as categorical for analysis.

Function(s): haven::as_factor(), labelled::to_factor(), sjlabelled::as_label(). As far as I can tell, these three functions have the same result. By default, the factor levels are ordered by value codes.

Convert variable label to variable name.

Why? For more informative or readable variable names.

Function(s): sjlabelled::label_to_colnames()
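
The first three operations above can be tried on a small example. This is a minimal sketch, assuming the `haven` package is installed; the vector `sexo` is a hypothetical toy variable that mimics `P6020` from the dictionary and is not taken from the survey data.

```r
library(haven)

# Hypothetical toy variable mimicking P6020 (Sexo): codes 1 and 2 with value labels
sexo <- labelled(c(1, 2, 2, 1),
                 labels = c(Hombre = 1, Mujer = 2),
                 label = "Sexo")

# 1. Check whether the variable is of class haven_labelled
is.labelled(sexo)

# 2. Drop the value labels to work with the numeric codes
codes <- zap_labels(sexo)
mean(codes)

# 3. Convert to a factor that uses the value labels as levels
cats <- as_factor(sexo)
levels(cats)
```

Working on the codes suits numeric summaries, while the factor version is usually more convenient for tables and plots.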

---

### Search in generated data dictionary

```r
# create data dictionary 
dictionary <- labelled::generate_dictionary(data0)
View(dictionary)

# library(DT)
# dictionary %>% 
#   DT::datatable()
```
]

<div id="htmlwidget-03fb4177b97acfe9d309" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-03fb4177b97acfe9d309">{"x":{"filter":"none","vertical":false,"data":[["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47"],[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47],["DIRECTORIO","SECUENCIA_ENCUESTA","SECUENCIA_P","ORDEN","FEX_C","P6016","P1894","P6020","P6034","P6040","P6051","P5502","P6071","P6071S1","P756","P756S1","P756S2","P756S3","P6074","P767","P6076","P6076S1","P6076S2","P6077","P6096","P6081","P6081S1","P6087","P6083","P6083S1","P6088","P6080","P5667","P2057","P2059","P2061","P1895","P1896","P1897","P1898","P1899","P3175","P1901","P1903","P1904","P1905","P1927"],["Directorio","Secuencia_encuesta","Secuencia_p","Orden","Factor de expansión","Número de orden de la persona que proporciona la información:","Tipo de documento de identidad","Sexo","¿Cuál es la fecha de nacimiento de _____?","¿cuántos años cumplidos tiene...?","Cuál es el parentesco de....con el jefe o la jefa de este hogar?","Actualmente…:","El (la) cónyuge de .. vive en este hogar","No de orden","¿Dónde nació ________?","Departamento","Municipio","En otro país","¿_______________ siempre ha vivido aquí en este municipio?","¿cuántos años continuos hace que vive ___ aquí en este municipio?","Antes de venir a este municipio_______________ vivía en","Departamento","Municipio\r\n","_____ vivía en","¿cuál fue la razón principal para cambiar la residencia al municipio actual?","El padre de .. vive en este hogar","No. De orden","¿cuál es o fue el nivel de educación más alto alcanzado por el padre de…..?","La madre de .. vive en este hogar","No. 
De orden","¿cuál es o fue el nivel de educación más alto alcanzado por la madre de......?","De acuerdo con su cultura, pueblo o rasgos físicos, _____ es o se reconoce comó:"," ¿a cuál pueblo o etnia indígena pertenece _____?","¿Usted se considera campesino(a)?"," ¿Usted considera que alguna vez fue campesino(a)?"," ¿Usted considera que la comunidad en que vive es campesina?","En general, qué tan satisfecho(a) se siente ... con su vida actualmente?","En general, qué tan satisfecho(a) se siente ... con su ingreso actualmente?","En general, qué tan satisfecho(a) se siente ... con su salud actualmente?","En general, qué tan satisfecho(a) se siente ... con su nivel de seguridad actualmente?","En general, qué tan satisfecho(a) se siente ... con su trabajo/actividad actualmente?"," En general, ¿qué tan satisfecho/a se siente _____ con su tiempo libre?"," ¿qué tan feliz se sintió ... el día de ayer?","¿qué tan preocupado(a) se sintió ... el día de ayer?","¿qué tan triste se sintió ... el día de ayer?","Qué tanto considera ... 
que las cosas que hace en su vida valen la pena?","¿En cuál escalón diría usted que se encuentra parado(a) en este momento?"],["dbl","dbl","dbl","dbl","dbl","dbl","dbl+lbl","dbl+lbl","dbl","dbl","dbl+lbl","dbl+lbl","dbl+lbl","dbl","dbl+lbl","dbl","dbl","dbl+lbl","dbl+lbl","dbl","dbl+lbl","chr","chr","dbl+lbl","dbl+lbl","dbl+lbl","dbl","dbl+lbl","dbl+lbl","dbl","dbl+lbl","dbl+lbl","chr","dbl+lbl","dbl+lbl","dbl+lbl","dbl+lbl","dbl+lbl","dbl+lbl","dbl+lbl","dbl+lbl","dbl+lbl","dbl+lbl","dbl+lbl","dbl+lbl","dbl+lbl","dbl+lbl"],[null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,[1,2,3,4,5],[1,2],null,null,[1,2,3,4,5,6,7,8,9,10,11,12,13,14],[1,2,3,4,5,6],[1,2],null,[1,2,3],null,null,[1,2,3,4,5,6,7,8,9,10,11],[1,2],null,[1,2],null,null,[1,2],[1,2,3,4,5,6,7,8,9,10,11,12],[1,2,3],null,[1,2,3,4,5,6,7,8,9,10],[1,2,3],null,[1,2,3,4,5,6,7,8,9,10],[1,2,3,4,5,6],null,[1,2,3],[1,2,3],[1,2,3],[0,1,2,3,4,5,6,7,8,9,10],[0,1,2,3,4,5,6,7,8,9,10],[0,1,2,3,4,5,6,7,8,9,10],[0,1,2,3,4,5,6,7,8,9,10],[0,1,2,3,4,5,6,7,8,9,10],[0,1,2,3,4,5,6,7,8,9,10],[0,1,2,3,4,5,6,7,8,9,10],[0,1,2,3,4,5,6,7,8,9,10],[0,1,2,3,4,5,6,7,8,9,10],[0,1,2,3,4,5,6,7,8,9,10],[0,1,2,3,4,5,6,7,8,9,10]]],"container":"<table class=\"display\">\n <thead>\n <tr>\n <th> <\/th>\n <th>pos<\/th>\n <th>variable<\/th>\n <th>label<\/th>\n <th>col_type<\/th>\n <th>levels<\/th>\n <th>value_labels<\/th>\n <\/tr>\n <\/thead>\n<\/table>","options":{"columnDefs":[{"className":"dt-right","targets":1},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false}},"evals":[],"jsHooks":[]}</script>

]

???

---

### Simple plot with labelled data

```r
plot <- sjPlot::plot_frq(
 data = data0$P756S3, 
 type = "bar",
 sort.frq = "asc",
 coord.flip = TRUE,
 weight.by = as.vector(data0$FEX_C),
 show.ci = TRUE) 
plot
```
]

<img src="presentation_files/figure-html/unnamed-chunk-38-1.png" width="1800" />
]

---

### Style the plot
.pull-left[

```r
plot <- plot + 
 scale_y_continuous(labels = scales::label_number_si()) + 
 labs(title = paste0(sjlabelled::get_label(data0$P756S3)),
 x = "", y = "",
 caption = "Source: Encuesta Nacional de Calidad de Vida ECV 2020, Colombia, DANE")+
 unhcRstyle::unhcr_theme(base_size = 8) + 
 theme(legend.position = "none",
 panel.grid.major.x = element_line(color = "#cbcbcb"), 
 panel.grid.major.y = element_blank(), 
 panel.grid.minor = element_blank()) 
plot
```
]

<img src="presentation_files/figure-html/unnamed-chunk-40-1.png" width="1800" />
]

---

### Get all data files together

```r
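# List the files with the .sav extension in the data-raw folder
datasetlist <- fs::dir_ls(here::here("data-raw"), 
                          glob = "*.sav") 
# head(datasetlist, 2)

# Read all the files: compose() runs haven::read_sav() first,
# then converts the labelled variables to factors.
# as_factor is prefixed with haven:: because forcats (loaded
# with the tidyverse) exports a function with the same name.
# All the datasets are stored in a single list object
datasets <- datasetlist %>% 
  map(compose(haven::as_factor, haven::read_sav))
# class(datasets)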
```
]

```
##  Print datasetlist first 2 lines
```

```
## /home/edouard/R-projects/tested_script/SurveyAnalysisTutorial/data-raw/Atencion integral de los niños y niñas menores de 5 años.sav
## /home/edouard/R-projects/tested_script/SurveyAnalysisTutorial/data-raw/Características y composición del hogar.sav
```

```
##  Print dataset class
```

```
## [1] "list"
```
]

???
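
The import pattern on this slide can be rehearsed without the survey files. This is a minimal sketch, assuming the `dplyr`, `purrr`, `haven` and `fs` packages are installed; the folder `sav-demo` and the two toy files are hypothetical and are written to a temporary directory only for the demonstration.

```r
library(dplyr)
library(purrr)

# Hypothetical demo folder inside the session's temporary directory
tmp <- file.path(tempdir(), "sav-demo")
dir.create(tmp, showWarnings = FALSE)

# Write two toy SPSS files, one of them with a labelled variable
toy_a <- tibble(id = c(1, 2),
                sex = haven::labelled(c(1, 2), c(Hombre = 1, Mujer = 2)))
toy_b <- tibble(id = c(1, 2), rooms = c(3, 2))
haven::write_sav(toy_a, file.path(tmp, "a.sav"))
haven::write_sav(toy_b, file.path(tmp, "b.sav"))

# Same pattern as on the slide: list the files, then read each one
# and convert its labelled variables to factors
files <- fs::dir_ls(tmp, glob = "*.sav")
toy_sets <- files %>%
  map(compose(haven::as_factor, haven::read_sav))

length(toy_sets)
class(toy_sets[[1]]$sex)
```

The resulting list is named after the file paths, which helps to trace each data frame back to its source file.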

---

### Join data frame for Individuals

```r
# Bind together the datasets that share the same unit of analysis
persons <- datasets %>% 
  # keep() works like base Filter(): it retains the list
  # elements that satisfy the condition.
  # Person-level files are those where neither sequence
  # variable equals 1 on every row
  keep(~!all(.$SECUENCIA_P == 1) &
         !all(.$SECUENCIA_ENCUESTA == 1)) %>% 
  # reduce() applies a binary function iteratively (like a loop);
  # here it chains full joins across all the kept data frames
  reduce(full_join)

# Harmonize the key variables by renaming them.
# Notice how SECUENCIA_ENCUESTA changes meaning between datasets
persons <- persons %>% 
  rename(dwelling = DIRECTORIO, 
         household = SECUENCIA_P, 
         person = SECUENCIA_ENCUESTA)
# nrow(persons)
# ncol(persons)
```
]

```
## [1] 267098
```

```
## [1] 527
```
]

???
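
The `keep()` then `reduce(full_join)` pattern can be checked on a tiny list of tables. This is a minimal sketch, assuming `dplyr` and `purrr` are installed; the three toy tables and their values are hypothetical, only the key column names come from the slide.

```r
library(dplyr)
library(purrr)

keys <- c("DIRECTORIO", "SECUENCIA_P", "SECUENCIA_ENCUESTA")

# Hypothetical toy list: two person-level tables and one household-level table
toy <- list(
  people_a = tibble(DIRECTORIO = c(1, 1, 1), SECUENCIA_P = c(1, 1, 2),
                    SECUENCIA_ENCUESTA = c(1, 2, 1), age = c(34, 5, 61)),
  people_b = tibble(DIRECTORIO = c(1, 1, 1), SECUENCIA_P = c(1, 1, 2),
                    SECUENCIA_ENCUESTA = c(1, 2, 1), sex = c("Mujer", "Hombre", "Mujer")),
  homes    = tibble(DIRECTORIO = c(1, 1), SECUENCIA_P = c(1, 1),
                    SECUENCIA_ENCUESTA = c(1, 2), rooms = c(3, 2))
)

persons_toy <- toy %>%
  # Keep only the tables where neither sequence variable is constant at 1
  keep(~ !all(.x$SECUENCIA_P == 1) && !all(.x$SECUENCIA_ENCUESTA == 1)) %>%
  # Join the remaining tables one after the other on the shared keys
  reduce(function(x, y) full_join(x, y, by = keys))

dim(persons_toy)
```

Naming the join keys explicitly, as done here, avoids joining by accident on any other column that the tables happen to share.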

---

### Join data frame for Household

```r
households <- datasets %>% 
  # Household-level files are those where SECUENCIA_P equals 1
  # on every row while SECUENCIA_ENCUESTA does not
  keep(~all(.$SECUENCIA_P == 1) & 
         !all(.$SECUENCIA_ENCUESTA == 1)) %>% 
  reduce(full_join)

# Harmonize the key variables.
# Notice how SECUENCIA_ENCUESTA changes meaning between datasets:
# here it identifies the household
households <- households %>% 
  rename(dwelling = DIRECTORIO, 
         household = SECUENCIA_ENCUESTA) %>% 
  select(-ORDEN)
# nrow(households)
# ncol(households)
```
]

```
## [1] 88310
```

```
## [1] 235
```
]

---

### Identify Venezuelans

```r
library(tidyverse)
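data <- 
  left_join(persons, households) %>% 
  # P756S3 ("En otro país") is only filled for people born abroad,
  # so a missing value is read here as born in Colombia
  transmute(pop = case_when(is.na(P756S3) ~ "Colombians",
                            P756S3 == "Venezuela" ~ "Venezuelans",
                            TRUE ~ "Others"))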

# table(data$pop, useNA = "ifany")
```
]

```
## 
##  Colombians      Others Venezuelans 
##      257044         496        9558
```
]

???
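
The classification rule can be tested on a few rows before running it on the full data. This is a minimal sketch, assuming `dplyr` is installed; the toy values are hypothetical. `case_when()` evaluates its conditions in order, so the `is.na()` test has to come before the catch-all `TRUE`, otherwise missing values would fall into "Others".

```r
library(dplyr)

# Hypothetical toy data: country of birth is only recorded for people born abroad
toy <- tibble(P756S3 = factor(c(NA, "Venezuela", "Ecuador", NA)))

toy_pop <- toy %>%
  mutate(pop = case_when(is.na(P756S3) ~ "Colombians",
                         P756S3 == "Venezuela" ~ "Venezuelans",
                         TRUE ~ "Others"))

table(toy_pop$pop, useNA = "ifany")
```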
 
---

### Create Indicators

```r
indicators1 <- left_join(persons, households) %>% 
 
 # transmute() creates new variables like mutate(),
 # but it drops all the original columns
 transmute(pop = case_when(is.na(P756S3) ~ "Colombians",
 P756S3 == "Venezuela" ~ "Venezuelans",
 TRUE ~ "Others"),
 
 # 2.1 Proportion of PoC living below the national poverty line
 rbm2.1 = PERCAPITA < 327674,
 # 2.3 Proportion of PoC with access to health services
 rbm2.3 = if_else(P5665 == "Sí", 
 is.na(P6153), 
 NA),
 
 dpto = "dummy", # FIXME: should be dwellings$dpto,
 wt = FEX_C)

#knitr::kable(head(indicators1 %>%
#                    select(pop, rbm2.1, rbm2.3)),
#             format = 'html')
```
]

.pull-right[
<table>
 <thead>
 <tr>
 <th style="text-align:left;"> pop </th>
 <th style="text-align:left;"> rbm2.1 </th>
 <th style="text-align:left;"> rbm2.3 </th>
 </tr>
 </thead>
<tbody>
 <tr>
 <td style="text-align:left;"> Colombians </td>
 <td style="text-align:left;"> TRUE </td>
 <td style="text-align:left;"> NA </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Colombians </td>
 <td style="text-align:left;"> FALSE </td>
 <td style="text-align:left;"> NA </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Colombians </td>
 <td style="text-align:left;"> TRUE </td>
 <td style="text-align:left;"> NA </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Colombians </td>
 <td style="text-align:left;"> FALSE </td>
 <td style="text-align:left;"> NA </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Colombians </td>
 <td style="text-align:left;"> FALSE </td>
 <td style="text-align:left;"> NA </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Colombians </td>
 <td style="text-align:left;"> TRUE </td>
 <td style="text-align:left;"> NA </td>
 </tr>
</tbody>
</table>
]

???
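
The difference between `mutate()` and `transmute()` is easy to see on a toy table. This is a minimal sketch, assuming `dplyr` is installed; the income and weight values are hypothetical, while the threshold is the one used on the slide.

```r
library(dplyr)

# Hypothetical toy data: per-capita income and survey weight
toy <- tibble(PERCAPITA = c(250000, 400000, NA),
              FEX_C = c(438.2, 495.9, 440.2))

# mutate() adds the indicator and keeps the original columns
kept <- toy %>%
  mutate(rbm2.1 = PERCAPITA < 327674)

# transmute() returns only the columns that are named in the call
only_new <- toy %>%
  transmute(rbm2.1 = PERCAPITA < 327674,
            wt = FEX_C)

names(kept)
names(only_new)
```

A missing income gives a missing indicator, which is why the aggregation step later has to decide how to treat `NA` values.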

---

### Add Indicators from multiple variables

```r
indicators2 <- left_join(persons, households) %>% 
 transmute(pop = case_when(is.na(P756S3) ~ "Colombians",
 P756S3 == "Venezuela" ~ "Venezuelans",
 TRUE ~ "Others"),
 
 water = P792 != "No tienen el servicio",
 sanitation = P5032 != "No tienen el servicio",
 overcrowding = CANT_PERSONAS_HOGAR/P5000 > 3,
 secure_tenure = 
 P5095 == "Propia, totalmente pagada" |
 P5095 == "Propia, lo están pagando" |
 (P5095 == "En arriendo o subarriendo" & P3006 == "Escrito"),
 housing_durability = TRUE, # FIXME: correct when we get the missing data file
 
 # 2.2 Proportion of PoCs residing in physically safe and secure settlements 
 rbm2.2 = water & sanitation & !overcrowding & housing_durability & secure_tenure,
 
 dpto = "dummy", # FIXME: should be dwellings$dpto,
 wt = FEX_C)

#knitr::kable(head(indicators2 %>%
#                    select(pop, rbm2.2, wt )),
#             format = 'html')
```
]

.pull-right[
<table>
 <thead>
 <tr>
 <th style="text-align:left;"> pop </th>
 <th style="text-align:left;"> rbm2.2 </th>
 <th style="text-align:right;"> wt </th>
 </tr>
 </thead>
<tbody>
 <tr>
 <td style="text-align:left;"> Colombians </td>
 <td style="text-align:left;"> TRUE </td>
 <td style="text-align:right;"> 438.1767 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Colombians </td>
 <td style="text-align:left;"> TRUE </td>
 <td style="text-align:right;"> 495.8612 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Colombians </td>
 <td style="text-align:left;"> TRUE </td>
 <td style="text-align:right;"> 440.2299 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Colombians </td>
 <td style="text-align:left;"> TRUE </td>
 <td style="text-align:right;"> 466.8462 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Colombians </td>
 <td style="text-align:left;"> TRUE </td>
 <td style="text-align:right;"> 466.8462 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Colombians </td>
 <td style="text-align:left;"> FALSE </td>
 <td style="text-align:right;"> 466.4742 </td>
 </tr>
</tbody>
</table>
]

???
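
The composite indicator combines logical variables with `&`, so missing answers follow R's three-valued logic. A minimal sketch on made-up data (the `toy` table and its `persons` and `rooms` columns are hypothetical, not ECV variables) shows when the result is `NA` rather than `TRUE` or `FALSE`:

```r
library(dplyr)

# Hypothetical toy data: four households with made-up answers
toy <- tibble(
  water      = c(TRUE, TRUE, FALSE, TRUE),
  sanitation = c(TRUE, NA,   NA,    TRUE),
  persons    = c(4, 4, 4, 8),
  rooms      = c(2, 2, 2, 2)
)

toy_ind <- toy %>%
  mutate(
    # More than 3 persons per room counts as overcrowded
    overcrowding = persons / rooms > 3,
    # TRUE & NA is NA, but FALSE & NA is FALSE
    composite = water & sanitation & !overcrowding
  )

toy_ind$composite
```

The second household stays `NA` because every known criterion is met and one answer is missing, while the third is `FALSE` because one failed criterion is enough to settle the result.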

---

### Aggregate Value

```r
indicators.computed <- 
  ## Join all indicators
  dplyr::left_join(x = indicators1,
                   y = indicators2) %>%
  ## Drop respondents who are neither Colombian nor Venezuelan
  filter(pop != "Others") %>% 

  ## Convert to a weighted survey design object
  srvyr::as_survey_design(strata = dpto, 
                          weights = wt) %>% 
  ## Compute indicators by population group
  group_by(pop) %>% 
  summarize(
    # Apply the same function across multiple columns
    across(
      ## Select all variables whose name contains "rbm"
      contains("rbm"), 
      srvyr::survey_mean, 

      # Report variability as a confidence interval ("ci")
      vartype = "ci", 
      # Drop missing values
      na.rm = TRUE,
      ## Append a suffix to every column name to 
      # distinguish the main estimate from its CI bounds
      .names = "{.col}_est")) 
```
]

.pull-right[
<table>
 <thead>
 <tr>
 <th style="text-align:left;"> pop </th>
 <th style="text-align:right;"> rbm2.2_est </th>
 <th style="text-align:right;"> rbm2.2_est_low </th>
 </tr>
 </thead>
<tbody>
 <tr>
 <td style="text-align:left;"> Colombians </td>
 <td style="text-align:right;"> 0.5501188 </td>
 <td style="text-align:right;"> 0.5490104 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Venezuelans </td>
 <td style="text-align:right;"> 0.2246054 </td>
 <td style="text-align:right;"> 0.2161841 </td>
 </tr>
</tbody>
</table>
]

???
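
For a TRUE/FALSE indicator, the point estimate returned by `survey_mean()` is the weighted share of `TRUE` values within each group. A minimal sketch, assuming a made-up `toy` table with a hypothetical `safe` indicator and no strata or clusters:

```r
library(dplyr)
library(srvyr)

# Hypothetical toy data: six respondents with made-up weights
toy <- tibble(
  pop  = c("Colombians", "Colombians", "Colombians",
           "Venezuelans", "Venezuelans", "Venezuelans"),
  safe = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE),
  wt   = c(400, 500, 100, 300, 300, 600)
)

toy_est <- toy %>%
  as_survey_design(weights = wt) %>%   # weights only in this sketch
  group_by(pop) %>%
  summarize(safe = survey_mean(as.numeric(safe), vartype = "ci"))

# Same point estimates by hand: sum(wt * x) / sum(wt) within each group
by_hand <- toy %>%
  group_by(pop) %>%
  summarize(est = weighted.mean(safe, wt))

toy_est
by_hand
```

With `vartype = "ci"`, the result holds the estimate plus two extra columns with the `_low` and `_upp` suffixes, which is the naming the pivot step relies on.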
 
---

### Pivot Frame

```r
indicators <- indicators.computed %>%
  ## Pivot from wide to long, keeping pop as the identifier:
  ## "rbm2.2_est_low" becomes ind = "rbm2.2" and column est_low
  pivot_longer(-pop, 
               names_to = c("ind", ".value"), 
               names_pattern = "(.+?)_(.+)")

# Save indicators in a csv file
# write.csv(indicators, "indicators.csv")
# knitr::kable(head(indicators %>%
#                    select(pop, ind, est )), format = 'html')
```
]

.pull-right[
<table>
 <thead>
 <tr>
 <th style="text-align:left;"> pop </th>
 <th style="text-align:left;"> ind </th>
 <th style="text-align:right;"> est </th>
 </tr>
 </thead>
<tbody>
 <tr>
 <td style="text-align:left;"> Colombians </td>
 <td style="text-align:left;"> rbm2.1 </td>
 <td style="text-align:right;"> 0.3027933 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Colombians </td>
 <td style="text-align:left;"> rbm2.3 </td>
 <td style="text-align:right;"> 0.6900830 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Colombians </td>
 <td style="text-align:left;"> rbm2.2 </td>
 <td style="text-align:right;"> 0.5501188 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Venezuelans </td>
 <td style="text-align:left;"> rbm2.1 </td>
 <td style="text-align:right;"> 0.6496214 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Venezuelans </td>
 <td style="text-align:left;"> rbm2.3 </td>
 <td style="text-align:right;"> 0.3959790 </td>
 </tr>
 <tr>
 <td style="text-align:left;"> Venezuelans </td>
 <td style="text-align:left;"> rbm2.2 </td>
 <td style="text-align:right;"> 0.2246054 </td>
 </tr>
</tbody>
</table>
]

???
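
The `".value"` keyword in `names_to` splits each column name in two: the part before the first underscore becomes the indicator code, and the rest becomes the name of an output column. A small sketch with illustrative numbers only (the `wide` table below is made up for the example):

```r
library(tidyr)

# Illustrative values only, in the wide layout produced by the aggregation
wide <- data.frame(
  pop            = c("Colombians", "Venezuelans"),
  rbm2.1_est     = c(0.30, 0.65),
  rbm2.1_est_low = c(0.29, 0.64),
  rbm2.1_est_upp = c(0.31, 0.66),
  rbm2.2_est     = c(0.55, 0.22),
  rbm2.2_est_low = c(0.54, 0.21),
  rbm2.2_est_upp = c(0.56, 0.23)
)

# The lazy group (.+?) stops at the first underscore, so "rbm2.1_est_low"
# is split into ind = "rbm2.1" and the output column "est_low"
long <- pivot_longer(wide, -pop,
                     names_to = c("ind", ".value"),
                     names_pattern = "(.+?)_(.+)")
long
```

The result has one row per population and indicator, with `est`, `est_low` and `est_upp` side by side, which is the shape `geom_pointrange()` needs.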

---

### Build chart

```r
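plot <- ggplot(data = indicators,
               aes(x = est, 
                   y = fct_rev(ind))) +
  ## Point estimate with its confidence interval, colored by population
  geom_pointrange(aes(xmin = est_low, 
                      xmax = est_upp, 
                      color = pop), 
                  size = .75) +
  ## Print the estimate as a percentage above each point
  geom_text(aes(label = scales::label_percent(.1)(est)), 
            vjust = -1.25) 
plot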
```
]

???
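
The double pair of parentheses in `scales::label_percent(.1)(est)` can look odd at first: `label_percent()` returns a function, which is then applied to the estimates. A short sketch using the two rbm2.2 estimates shown earlier:

```r
library(scales)

# label_percent() builds a formatting function;
# accuracy = 0.1 keeps one decimal place
fmt <- label_percent(accuracy = 0.1)

fmt(c(0.5501188, 0.2246054))
# "55.0%" "22.5%"
```

The same formatter can be passed without the second pair of parentheses to the `labels` argument of a scale, where ggplot2 calls it on the axis breaks.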

---

### Add scale

```r
plot <- plot +
  scale_x_continuous(labels = scales::label_percent(), 
                     limits = c(0, 1), 
                     breaks = c(0, .5, 1)) +
  scale_color_manual(values = c(Colombians = "black", Venezuelans = "#0072BC"))

plot
```
]

???

---

### Change labels

```r
## Labels for each calculated indicator
rbm_indicators <- 
  c(rbm2.1 = "2.1 Proportion of PoC living below the national poverty line",
    rbm2.2 = "2.2 Proportion of PoCs residing in physically safe and secure settlements with access to basic facilities.",
    rbm2.3 = "2.3 Proportion of PoC with access to health services")

## Labels are matched by position: rev() mirrors the fct_rev() used on the y axis
plot <- plot +
  scale_y_discrete(labels = str_wrap(rev(rbm_indicators), 40)) 
plot
```
]

???
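
Matching labels by position works as long as the label vector and the axis levels stay in the same order. As an alternative sketch, not the approach used on the slide, labels can be looked up by indicator code; `wrap_label()` and `rbm_labels` are hypothetical names introduced for this example:

```r
library(stringr)

rbm_labels <- c(
  rbm2.1 = "2.1 Proportion of PoC living below the national poverty line",
  rbm2.3 = "2.3 Proportion of PoC with access to health services"
)

# Hypothetical helper: look labels up by indicator code, then wrap them
wrap_label <- function(codes, width = 40) {
  str_wrap(unname(rbm_labels[codes]), width)
}

wrapped <- wrap_label(c("rbm2.3", "rbm2.1"))
cat(wrapped, sep = "\n\n")
```

Because ggplot2 accepts a function for `labels`, the helper could be passed as `scale_y_discrete(labels = wrap_label)`, so each label follows its indicator whatever the axis order.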

---

### Title and Style

```r
plot <- plot + 
  labs(x = "Estimate", y = "",
       title = "Impact Area 2: Realizing Rights in Safe Environments",
       ## `data` is assumed to be the person-level table with pop and wt columns
       caption = glue::glue("Encuesta Nacional de Calidad de Vida (ECV) 2020\n",
                            "# of Venezuelans = {scales::label_comma()(sum(data$pop=='Venezuelans'))} obs. ",
                            "({scales::label_number_si(.1)(sum((data$pop=='Venezuelans')*data$wt))} weighted)")) +
  unhcRstyle::unhcr_theme() +
  theme(panel.grid.minor.x = element_blank())

plot
```
]

???
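
The caption reports two different counts: the number of interviewed respondents and the population they represent once the weights are applied. A minimal sketch on a made-up `toy` table (hypothetical values, standing in for the person-level data):

```r
library(glue)
library(scales)

# Hypothetical toy data standing in for the person-level table
toy <- data.frame(
  pop = c("Venezuelans", "Colombians", "Venezuelans"),
  wt  = c(400.5, 500, 600.5)
)

is_ven <- toy$pop == "Venezuelans"
n_obs  <- sum(is_ven)            # unweighted number of respondents
n_wt   <- sum(is_ven * toy$wt)   # weighted population estimate

# glue() concatenates its arguments and evaluates the code inside {}
caption <- glue("Venezuelans: {label_comma()(n_obs)} obs. ",
                "({label_comma()(n_wt)} weighted)")
caption
```

Multiplying the logical vector by the weights keeps the weight of the selected rows and turns the others into zero, which is the same trick used in the slide's caption.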

---

## Congrats!

Today you learned the basics!

* Objects are __words__
* Functions are __verbs__ used to transform objects
* Charts are __complements__ created with the __grammar__ of graphics

???

In grammar, a complement is a word, phrase, or clause that is necessary to complete the meaning of a given expression. Complements are often also arguments.

---
class: center, middle, inverse

# Practical Use Case 3 - Evidence for key research questions

### _What programmatic assumptions do we have evidence for? Describe, Explore, Explain..._

---

## Statistical association analysis: Crosstabulation & Association

> __Question__: How does the perception of the current situation, compared with that of 5 years ago, differ between the "undocumented Venezuelans" and the rest of the population?

See you in the R master webinar in January!

???

---

## Regression analysis between a target variable and multiple predictors

> __Question__: What can explain why some "undocumented Venezuelans" feel worse now than five years ago?

See you in the R master webinar in January!

---

## Clustering analysis to identify a homogeneous group of individuals based on multiple variables

> __Question__: What are the main profiles of "undocumented Venezuelans" who have a worse situation now than 5 years ago?

See you in the R master webinar in February!

---
class: left, top

# Good luck in your R journey!

### Reach out to us for questions, analysis mentoring or code peer review!

__DIMA Americas__: 
Edouard Legoupil, _Snr Statistics & Data Analysis Officer_, <a href="mailto:legoupil@unhcr.org">&nbsp; legoupil@unhcr.org</a> 
Hisham Galal, _Associate Statistics & Data Analysis Officer_, <a href="mailto:galal@unhcr.org">&nbsp; galal@unhcr.org</a>

Check our GitHub @ <a href="http://github.com/unhcr-americas">&nbsp;unhcr-americas</a> and learn about the __UnhcRverse packages__

Slides created with [**remark.js**](http://remarkjs.com/) and the R package [**xaringan**](https://github.com/yihui/xaringan). Press `p` to display the slide notes for this presentation and `h` for navigation help.

]

]

???

Last, we hope that this session will motivate you to join the vibrant community of R users in UNHCR and soon become an R champion. To make the most of the session, we advise you to install the following open-source environment:

R - https://cran.r-project.org/bin/windows/base/

RStudio (free version) - https://www.rstudio.com/products/rstudio/download/

Create an account on GitHub - https://github.com/join? and install GitHub Desktop - https://desktop.github.com/

You may also start installing the UNHCR packages, following the instructions in their respective documentation published on GitHub:

Use UNHCR open data - https://unhcr.github.io/unhcrdatapackage/docs/

Connect to internal data sources through an API - https://unhcr-web.github.io/hcrdata/docs/

Perform high-frequency checks - https://unhcr.github.io/HighFrequencyChecks/docs/

Crunch survey datasets - https://unhcr.github.io/koboloadeR/docs/

Use the UNHCR graphical template - https://unhcr-web.github.io/unhcRstyle/docs/

See some practical tutorials on https://humanitarian-user-group.github.io/

The best way to start learning is to have a concrete project! If you have one and need mentoring, we can liaise after the session.

ggplot2 resources:

- [ggplot2 main documentation](https://ggplot2.tidyverse.org/index.html)
- [The ggplot flipbook](https://evamaerey.github.io/ggplot_flipbook/ggplot_flipbook_xaringan.html#1) by Gina Reynolds
- [A ggplot2 tutorial for beautiful plotting in R](https://www.cedricscherer.com/2019/08/05/a-ggplot2-tutorial-for-beautiful-plotting-in-r/) and [ggplot Wizardry Hands-On](https://z3tt.github.io/OutlierConf2021/) by Cedric Scherer
- ggplot2 workshop [Part1](https://www.youtube.com/watch?v=h29g21z0a68)/[Part2](https://www.youtube.com/watch?v=0m4yywqNPVY) by Thomas Lin Pedersen (one of the main maintainers of ggplot2)

Extension packages for ggplot2:

- patchwork to combine multiple plots into one

- gganimate to create simple animations

- ggtext and ggrepel to deal with annotations and text style

- ggforce to group some content visually

- and much more... just go out there and experiment