class: center, middle, inverse, title-slide # Survey Data Analysis ##
<font color="black" size="30">Cookbook
### 1 December 2021 --- layout: true <div class="my-footer"><span></span></div> --- class: inverse, left, middle # A Vision for Data Analysis <span style='font-size:50px; color:grey15;'>"_Multi-functional teams, with strengthened data literacy, regularly conduct meaningful and documented joint data interpretation sessions to define their strategic directions based on statistical evidence_"</span> ??? Slides made with Xaringan: https://arm.rbind.io/slides/xaringan.html https://slides.earo.me/rladiesakl20/#1 https://xaringantutorial.netlify.app --- # A Theory of Change for Data Analysis <span style='font-size:30px;'>Proper use of data for advocacy & programmatic decision-making </span> <span style='font-size:40px;'>↪</span> Corporate __Standards__ exist to define how to encode & process household survey datasets <span style='font-size:40px;'>↪</span> Field data experts are trained on precise recipes and predefined tools at each step of the __data life cycle__ <span style='font-size:40px;'>↪</span> Data are presented, discussed and linked to expert knowledge during data __interpretation__ sessions with a multi-functional team <span style='font-size:40px;'>↪</span> All potentially valid interpretations, including diverging views, are systematically __recorded__ <span style='font-size:40px;'>↪</span> __Persuasive__ "Data Stories" and policy papers are generated --- class: left, middle # Learning objectives ### 1. How to build charts quickly? ### 2. How to calculate impact indicators from survey data? ### 3. How to answer Key Research Questions? Describe... _(Explore and Explain will be presented in the next webinar!)_ ??? Depending on the pace of the group, if we do not finish today we will organise a second session --- # Webinar rules <i class="fa fa-spinner fa-spin fa-fw fa-2x"></i> Leverage this opportunity and make this session __lively__: there are no stupid questions!
<i class="fa fa-check-square fa-fw fa-2x"></i> Use the __chat__ to send your questions: there are two facilitators, and one of us focuses on replying to all questions directly in the chat while the session is ongoing <i class="fa fa-cog fa-fw fa-2x"></i> All practical exercises are designed to get you __testing the commands__: > Start RStudio if you already have it installed, or log in now to the [cloud-based version of RStudio](https://login.rstudio.cloud/register?redirect=https%3A%2F%2Fclient.login.rstudio.cloud%2Foauth%2Flogin%3Fshow_auth%3D0%26show_login%3D0) for this session > Paste the command from the chat into your RStudio session and check what happens > If it does not work as expected, share a screenshot or the error messages from the console in the chat --- # Click-based Workflow... .pull-left[ Associate data with other tables in **ACCESS** ![](images/Logo_Microsoft_Access_2013.png) ...then explore through graphs in **EXCEL** ![](images/Microsoft_Excel_2013_logo.svg.png) ...then map with **ArcGIS** ![](images/ArcGIS.png) ] .pull-right[ ...then write up narratives in **WORD** ![](images/Microsoft_Word_logo.png) ... and design a full document with **INDESIGN** ![](images/Adobe_InDesign_icon.png) ... or create an infographic with **ILLUSTRATOR** ![](images/Adobe_Illustrator_icon.png) ] ??? and possibly some VBA macros --- ## ... coming up with challenges! As a coauthor/reader/peer reviewer, one would like to see the whole **research process** (_how we arrived at that conclusion_), rather than a polished manuscript with inserted tables/figures. .pull-left[ What analysis is **behind the figure**? Did it account for [..._new last minute question_...] in the analysis? What **dataset** (_final vs preliminary version_) was used? Were **outliers** identified? How did you **weight** your sample? Oops, there is an error in the data. Can we **repeat the analysis**? And quickly update the figures, graphs and tables in the report and the presentation!
] .pull-right[ > This consumes time and leaves room for errors... ![](images/exhausted.png) ] ??? When managing numerous analyses with data that may change, and in a collaborative mode, this workflow is **not** the most effective. * Data are manipulated through "point-and-click" user interfaces whose steps are not __captured__! * Data move from one software to another (Excel, GIS, Word...) using different __formats__! * All results (figures, tables) are **manually** copied/pasted into the final publishing system... --- # Science is '_show me_' - not '_trust me_'! ### Reproducible Research Manifesto; aka the _"Ten Commandments"_ .pull-left[ For every result, **keep track** of how it was produced **Avoid manual data manipulation** steps **Archive** the exact versions of all external programs used **Version control** all custom scripts **Record all intermediate results**, when possible in standardized formats ] .pull-right[ For analyses that include randomness, **note underlying random seeds** Always **store raw data** behind plots Generate hierarchical analysis output, allowing layers of increasing detail to be inspected Connect **textual statements** to underlying results Provide **public access** to scripts, runs, and results ] --- ## Enable a fully auditable workflow As soon as all steps (i.e. **DATA + TIDYING + MODELING + VISUALS + NARRATIVE**) are done through a **series of written commands recorded in scripts**: - when you spot an error in the data, or switch to a different dataset, you just need to adjust the script and the report updates automatically; - data manipulation becomes *de facto* fully documented (no more manual changes in Excel); - the analysis is self-explanatory and ready for any kind of collaborative review; - customization is easier, making it possible to deliver a final product with professional branding and styling. > Analysis becomes streamlined and [reproducible](https://unhcr-americas.github.io/reproducibility)!
> A "collaboration mode" is enabled from the beginning of the process! > As your analysis can be reviewed, you become "covered"... ??? instead of **hundreds of mouse clicks** See also http://muschellij2.github.io/summerR_2015/modules/module12.html --- ### Key Concept 1: From "click" to "script" Using the right combination of packages, you can integrate all necessary data analysis steps into **scripts**: .pull-left[ Data management (import, clean, recode, merge, reshape) Data analysis (test, regression, multivariate analysis, etc.) Data visualization (plot, map, graph...) Writing up results (report and presentation generation) ] .pull-right[ ![](images/data-science-wrangle.png) ] --- ### Key Concept 2: Everything is an object & anything can be packaged... .left-column[ ![](images/design-cake3.png) ] .right-column[ `Vectors` are the core data structure, created with `c()`. `Data.frame` is a table where each column is a vector, and different columns can hold different types `Matrix` is like a data frame except that all its elements share one type (typically numeric) `List` can mix and match elements of any type and length, including other lists `Factors` are a special class that R uses for categorical variables, which also allows for value labeling and ordering. `Functions` are objects designed to transform one object into a new one `Charts` are objects designed to generate an image `Models` are objects recording a computation performed on specific data ] ??? Elements in a vector must be of the same type. Reference link on [Manipulating data](http://www.cookbook-r.com/Manipulating_data/) --- ### Key Concept 3: Search, Test, Try...
.pull-left[ ![](images/change.png) ] .pull-right[ Get an R certification recognized by UNHCR on [learn.unhcr.org](https://unhcr.csod.com/ui/lms-learner-playlist/PlaylistDetails?playlistId=e90e2279-e3a4-4ef2-8b74-757f91d224b2) and play with the [RStudio Primers](https://rstudio.cloud/learn/primers) Search and ask on [Stack Overflow](https://stackoverflow.com/questions/tagged/r) Go through the [Cheat-sheets](https://rstudio.cloud/learn/cheat-sheets) Consult key manuals, maybe starting with [R for Data Science](https://r4ds.had.co.nz/) Browse the chart library in [`UnhcrDataPackage`]() Follow blogs, from general ones like [Rbloggers](https://www.r-bloggers.com/) and the [Tidyverse blog](https://www.tidyverse.org/blog/) to more specific ones like [HumanitaRian-useR-group](https://humanitarian-user-group.github.io/), as well as some Twitter accounts. Join forums like the [Inter-Agency R skype group](https://join.skype.com/qYBKC5q3wKp4) or the internal UNHCR Ms Discussion group (ask to join!). ] --- # Learning stages... .pull-left[ ![](images/learning.png) ] .pull-right[ __Step 1.__ Develop an understanding of what data science is and which programming and math concepts it requires __Step 2.__ Break data science challenges into small steps - Acquire basic command syntax through very practical and focused projects __Step 3.__ Develop a reproducible analysis workflow - Understand the relevance, inputs, constraints, and limitations of the various techniques __Step 4.__ Optimize your problem-solving approaches in elegant ways - Build packages ] ??? https://towardsdatascience.com/the-stages-of-learning-data-science-3cc8be181f54 See Video - https://www.youtube.com/watch?v=hpMc6TgT34I --- # Analytics Models / Algorithms ![](images/analytics.png) --- class: center, middle, inverse # Practical Use Case 1 - Charting ### *My boss needs a slide with the main origins of Asylum Seekers and Refugees from the Americas in this country...
in 5 minutes...* <i class="fa fa-exclamation-circle fa-fw fa-2x"></i> --- ## How to get to this chart in a couple of lines? <img src="presentation_files/figure-html/unnamed-chunk-1-1.png" width="80%" /> ??? https://r4ds.had.co.nz/graphics-for-communication.html#figure-sizing --- ### Install required packages Go to your [locally installed RStudio](https://www.rstudio.com/products/rstudio/download/#download) or [sign up for a free RStudio Cloud account](https://login.rstudio.cloud/register?redirect=https%3A%2F%2Fclient.login.rstudio.cloud%2Foauth%2Flogin%3Fshow_auth%3D0%26show_login%3D0) .pull-left[ First create a new project within RStudio, then make sure you have the [tidyverse](https://www.tidyverse.org/packages/) plus the additional UNHCR packages installed ```r # Tidyverse if (!require("tidyverse")) install.packages("tidyverse", dependencies = TRUE) if (!require("here")) install.packages("here") # UnhcRverse (GitHub packages, installed with remotes) if (!require("remotes")) install.packages("remotes") if (!require("unhcrdatapackage")) remotes::install_github('unhcr/unhcrdatapackage') if (!require("unhcRstyle")) remotes::install_github('unhcr-web/unhcRstyle') ``` ] .pull-right[ ![tidyverse](images/forcats.png) ] ??? The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures. --- ### Get the data .pull-left[ To get the data, multiple approaches are possible.
Go to the [UNHCR dataset page on HDX](https://data.humdata.org/dataset/unhcr-population-data-for-world) to download the dataset `end_year_population_totals_residing_world.csv` and save it locally within your project in a folder named `data-raw` ```r popdata <- read.csv(here::here("data-raw", "end_year_population_totals_residing_world.csv")) ``` or __save time__ and directly use the unhcrdatapackage, which includes a reshaped `long` version of the data ```r popdata <- unhcrdatapackage::end_year_population_totals_long ## check the variable names #names(popdata) ## check the first 5 rows for selected variables # head(popdata %>% select(Year, CountryOriginCode,CountryAsylumCode,Population.type, Value),5) ``` ] -- .pull-right[ ``` ## [1] "Year" "CountryOriginCode" ## [3] "CountryAsylumCode" "CountryOriginName" ## [5] "CountryAsylumName" "Population.type" ## [7] "Value" "Population.type.label" ## [9] "Population.type.label.short" ``` <table> <thead> <tr> <th style="text-align:right;"> Year </th> <th style="text-align:left;"> CountryAsylumCode </th> <th style="text-align:left;"> Population.type </th> <th style="text-align:right;"> Value </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1951 </td> <td style="text-align:left;"> AUS </td> <td style="text-align:left;"> REF </td> <td style="text-align:right;"> 180000 </td> </tr> <tr> <td style="text-align:right;"> 1951 </td> <td style="text-align:left;"> AUT </td> <td style="text-align:left;"> REF </td> <td style="text-align:right;"> 282000 </td> </tr> <tr> <td style="text-align:right;"> 1951 </td> <td style="text-align:left;"> BEL </td> <td style="text-align:left;"> REF </td> <td style="text-align:right;"> 55000 </td> </tr> <tr> <td style="text-align:right;"> 1951 </td> <td style="text-align:left;"> CAN </td> <td style="text-align:left;"> REF </td> <td style="text-align:right;"> 168511 </td> </tr> <tr> <td style="text-align:right;"> 1951 </td> <td style="text-align:left;"> DNK </td> <td style="text-align:left;">
REF </td> <td style="text-align:right;"> 2000 </td> </tr> </tbody> </table> ] --- ### Reshape the data - 1 - Merge with reference .pull-left[ ```r # First, merge with the reference table to get the UNHCR Bureau of each asylum country #names(unhcrdatapackage::reference) *Origin <- dplyr::left_join( x= unhcrdatapackage::end_year_population_totals_long, y= unhcrdatapackage::reference, by = c("CountryAsylumCode" = "iso_3")) # head(names(unhcrdatapackage::reference), 9) #knitr::kable(head(unhcrdatapackage::reference %>% select(UNHCRBureau,iso_3, ctryname )%>% filter(UNHCRBureau== "Americas" ), 5), format = 'html') ``` ] -- .pull-right[ ``` ## [1] "iso_3" "UNHCRcode" "ctryname" "namepostat" "namepostat2" ## [6] "gis_name" "UNHCRBureau" "main_office" "hcr_region" ``` <table> <thead> <tr> <th style="text-align:left;"> UNHCRBureau </th> <th style="text-align:left;"> iso_3 </th> <th style="text-align:left;"> ctryname </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Americas </td> <td style="text-align:left;"> ABW </td> <td style="text-align:left;"> Aruba </td> </tr> <tr> <td style="text-align:left;"> Americas </td> <td style="text-align:left;"> AIA </td> <td style="text-align:left;"> Anguilla </td> </tr> <tr> <td style="text-align:left;"> Americas </td> <td style="text-align:left;"> ARG </td> <td style="text-align:left;"> Argentina </td> </tr> <tr> <td style="text-align:left;"> Americas </td> <td style="text-align:left;"> ASM </td> <td style="text-align:left;"> American Samoa </td> </tr> <tr> <td style="text-align:left;"> Americas </td> <td style="text-align:left;"> ATG </td> <td style="text-align:left;"> Antigua and Barbuda </td> </tr> </tbody> </table> ] --- ### Reshape the data - 2 - Filter ```r nrow(Origin) # Number of rows before filter ``` ``` # [1] 161788 ``` ```r ## Using the pipe (%>%) operator # passes the result of one function as the first argument of the next one, # so that each output feeds the next step in sequence Origin <- Origin %>%
* ## First data-handling function: filter() keeps the rows that meet all conditions * filter( # Other dplyr verbs include: # arrange, distinct, group_by, mutate, rename, # sample_frac, sample_n, select, slice, summarise ## https://dplyr.tidyverse.org/reference/index.html ## we combine the 4 conditions below with & CountryAsylumName == "Panama" & UNHCRBureau == "Americas" & Year == max(unhcrdatapackage::end_year_population_totals_long$Year) & Population.type %in% c("REF", "ASY", "VDA" )) nrow(Origin) # Number of rows after filter ``` ``` # [1] 46 ``` --- ### Reshape the data - 3 - Shorten country name ```r ## Check the possible values (levels) of CountryOriginName levels(Origin$CountryOriginName) ``` ``` # NULL ``` ```r ## NULL: CountryOriginName is a character vector, not a factor ## Convert it to a factor and display the last 6 values tail(levels(as.factor(Origin$CountryOriginName)),6) ``` ``` # [1] "Sierra Leone" "Somalia" # [3] "Sri Lanka" "Sudan" # [5] "Ukraine" "Venezuela (Bolivarian Republic of)" ``` ```r Origin <- Origin %>% * mutate( CountryOriginName = str_replace(CountryOriginName, " \\(Bolivarian Republic of\\)", "")) ## str_replace() (from stringr) matches a regular expression, hence the escaped parentheses ## check replacement tail(levels(as.factor(Origin$CountryOriginName)),4) ``` ``` # [1] "Sri Lanka" "Sudan" "Ukraine" "Venezuela" ``` --- ### Reshape the data - 4 - Aggregate by Country of Origin .pull-left[ ```r Origin <- Origin %>% * group_by( CountryOriginName) %>% * summarise(DisplacedAcrossBorders = sum(Value) ) ### Explore the results in a filterable table # DT::datatable(Origin, # fillContainer = FALSE, options = list(pageLength = 4)) ``` ] -- .pull-right[
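As a rough illustration of what `group_by()` + `summarise()` do in the step above, here is a sketch on a small hypothetical table (the origins "A", "B", "C" and the values are made up, not UNHCR figures): all rows sharing the same `CountryOriginName` are collapsed into one row, and their `Value` is summed across population types.

```r
library(dplyr)

# Hypothetical toy data (made-up values): one row per origin x population type
toy <- data.frame(
  CountryOriginName = c("A", "A", "B", "B", "C"),
  Population.type   = c("REF", "ASY", "REF", "VDA", "ASY"),
  Value             = c(100, 50, 20, 5, 70),
  stringsAsFactors  = FALSE
)

toy_agg <- toy %>%
  group_by(CountryOriginName) %>%                  # one group per origin
  summarise(DisplacedAcrossBorders = sum(Value))   # add up all population types

# One row per origin: A = 150, B = 25, C = 70
print(toy_agg)
```

If `Value` can contain missing values, `sum(Value, na.rm = TRUE)` avoids getting an `NA` total for the whole group.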
] --- ### Reshape the data - 5 - Create data labels .pull-left[ ```r Origin <- Origin %>% * ## short labels such as "124.4K" with scales::label_number_si() ## (superseded in scales >= 1.2.0 by label_number(scale_cut = cut_short_scale())) ## applied only above 1000, so that smaller values stay plain integers mutate( DisplacedAcrossBordersRound = ## test each value with ifelse() ifelse(DisplacedAcrossBorders > 1000, ## condition 1 - above 1000 paste(scales::label_number_si(accuracy = 0.1)(DisplacedAcrossBorders)), ## condition 2 - 1000 or below as.character(DisplacedAcrossBorders) ) ) ``` ] -- .pull-right[ <table> <thead> <tr> <th style="text-align:left;"> CountryOriginName </th> <th style="text-align:left;"> DisplacedAcrossBordersRound </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Venezuela </td> <td style="text-align:left;"> 124.4K </td> </tr> <tr> <td style="text-align:left;"> Nicaragua </td> <td style="text-align:left;"> 5.4K </td> </tr> <tr> <td style="text-align:left;"> Colombia </td> <td style="text-align:left;"> 2.7K </td> </tr> <tr> <td style="text-align:left;"> Cuba </td> <td style="text-align:left;"> 1.7K </td> </tr> <tr> <td style="text-align:left;"> El Salvador </td> <td style="text-align:left;"> 1.3K </td> </tr> <tr> <td style="text-align:left;"> Somalia </td> <td style="text-align:left;"> 101 </td> </tr> <tr> <td style="text-align:left;"> Honduras </td> <td style="text-align:left;"> 94 </td> </tr> <tr> <td style="text-align:left;"> Nigeria </td> <td style="text-align:left;"> 44 </td> </tr> <tr> <td style="text-align:left;"> Eritrea </td> <td style="text-align:left;"> 38 </td> </tr> <tr> <td style="text-align:left;"> Peru </td> <td style="text-align:left;"> 23 </td> </tr> </tbody> </table> ] --- ### Reshape the data - 6 - Select top 10 countries .pull-left[ ```r Origin <- Origin %>% ## Sort by descending number of displaced people arrange(desc(DisplacedAcrossBorders)) %>% ## keep only the first 10 records head(10) ## check results #knitr::kable(head(Origin %>% # select(CountryOriginName,DisplacedAcrossBordersRound ),5 ), format = 'html') ``` ]
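The `arrange(desc(...)) %>% head(10)` idiom keeps the largest groups. A minimal sketch of the same logic on a hypothetical aggregated table (made-up values), keeping the top 2 instead of the top 10 so the result is easy to check by eye:

```r
library(dplyr)

# Hypothetical aggregated table (made-up values)
agg <- data.frame(
  CountryOriginName      = c("A", "B", "C", "D"),
  DisplacedAcrossBorders = c(150, 25, 70, 300),
  stringsAsFactors       = FALSE
)

top2 <- agg %>%
  arrange(desc(DisplacedAcrossBorders)) %>%  # largest values first
  head(2)                                    # keep the first 2 rows (10 in the slide)

print(top2)   # D (300) then A (150)
```

In dplyr 1.0.0 and later, `slice_max(DisplacedAcrossBorders, n = 10)` does the same in one step; note that it keeps ties by default, so it can return more than 10 rows.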
-- .pull-right[ <table> <thead> <tr> <th style="text-align:left;"> CountryOriginName </th> <th style="text-align:left;"> DisplacedAcrossBordersRound </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Venezuela </td> <td style="text-align:left;"> 124.4K </td> </tr> <tr> <td style="text-align:left;"> Nicaragua </td> <td style="text-align:left;"> 5.4K </td> </tr> <tr> <td style="text-align:left;"> Colombia </td> <td style="text-align:left;"> 2.7K </td> </tr> <tr> <td style="text-align:left;"> Cuba </td> <td style="text-align:left;"> 1.7K </td> </tr> <tr> <td style="text-align:left;"> El Salvador </td> <td style="text-align:left;"> 1.3K </td> </tr> </tbody> </table> ] --- ### Build the chart - 1 - Initial bar plot .pull-left[ ```r ## A chart is indeed an object ## Below we create a chart object called plot with the basic ggplot() function plot <- ggplot(data = Origin, ## Construct aesthetic mappings of variables to the chart aes( # Reorder country names by DisplacedAcrossBorders instead of the default alphabetical order x = reorder(CountryOriginName, DisplacedAcrossBorders), y = DisplacedAcrossBorders)) + # make it a bar chart geom_bar( stat = "identity", ## indicate how data should be summarised... # the default behavior is to count the rows for each x value # stat = "identity" tells ggplot2 to skip aggregation and use the y values as they are # An alternative is to use geom_col() ## indicate the color used to fill the bars fill = "#0072bc") ## Print the plot!! plot ``` ] -- .pull-right[ <img src="presentation_files/figure-html/unnamed-chunk-18-1.png" width="1800" /> ] ??? See the full reference at https://ggplot2.tidyverse.org/reference/
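Two details of the chart code above can be checked on a toy example with hypothetical values: `reorder()` sets the factor levels by increasing value (so after `coord_flip()` the largest bar ends up at the top), and `geom_col()` computes the same bars as `geom_bar(stat = "identity")`. This sketch compares the two layers through `ggplot2::layer_data()`; the data frame and column names are illustrative assumptions.

```r
library(ggplot2)

# Hypothetical toy data (made-up values)
df <- data.frame(
  country = c("A", "B", "C"),
  n       = c(5, 1, 3),
  stringsAsFactors = FALSE
)

# reorder() sets the factor levels by increasing n: B (1), C (3), A (5)
ordered_levels <- levels(reorder(df$country, df$n))
print(ordered_levels)

# The same bars drawn two ways
p_bar <- ggplot(df, aes(x = reorder(country, n), y = n)) +
  geom_bar(stat = "identity")
p_col <- ggplot(df, aes(x = reorder(country, n), y = n)) +
  geom_col()

# layer_data() returns the computed data of a layer; compare bar positions and heights
cols <- c("x", "ymin", "ymax")
same_bars <- isTRUE(all.equal(layer_data(p_bar)[, cols], layer_data(p_col)[, cols]))
print(same_bars)
```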
See full reference here https://ggplot2.tidyverse.org/reference/ --- ### Build the chart - 2 - Format axis number .pull-left[ ```r ## A plot is constructed through layered instruction - # I can re-use the same plot object (here: plot) and add more instructions with a + plot <- plot + scale_y_continuous( label = scales::label_number_si()) ## simply inputing the name of the object as a commend will print its status plot ``` ] -- .pull-right[ <img src="presentation_files/figure-html/unnamed-chunk-20-1.png" width="1800" /> ] --- ### Build the chart - 3 - Flip the chart .pull-left[ ```r plot <- plot + coord_flip() plot ``` ] -- .pull-right[ <img src="presentation_files/figure-html/unnamed-chunk-22-1.png" width="1800" /> ] --- ### Build the chart - 4 - Add and position data label .pull-left[ ```r plot <- plot + ## Add label inside the bar in white - outside bar in black geom_label( data = subset(Origin, DisplacedAcrossBorders < max(DisplacedAcrossBorders) / 1.5), aes(x = reorder(CountryOriginName, DisplacedAcrossBorders), y = DisplacedAcrossBorders, label= DisplacedAcrossBordersRound), hjust = -0.1 , vjust = 0.5, colour = "black", fill = NA, label.size = NA, family = "Lato", size = 4 ) + ## Add label outside bar in black geom_label( data = subset(Origin, DisplacedAcrossBorders >= max(DisplacedAcrossBorders) / 1.5), aes(x = reorder(CountryOriginName, DisplacedAcrossBorders), y = DisplacedAcrossBorders, label= DisplacedAcrossBordersRound), hjust = 1.1 , vjust = 0.5, colour = "white", fill = NA, label.size = NA, family = "Lato", size = 4 ) plot ``` ] -- .pull-right[ <img src="presentation_files/figure-html/unnamed-chunk-24-1.png" width="1800" /> ] --- ### Build the chart - 5 - Add chart labels .pull-left[ ```r plot <- plot + ## and the chart labels labs(title = "What are the main Origin of Forced Displacement across Borders?", subtitle = paste0("Top 10 Origin from Americas - Data as of ",max(unhcrdatapackage::end_year_population_totals_long$Year), " in Panama" ), x = " ", y 
= "# of Forcibly displaced people", caption = "Data: UNHCR Refugee Population Statistics Database.\n Forced Displacement includes Refugees, Asylum Seekers and Venezuelans Displaced Abroad.") plot ``` ] -- .pull-right[ <img src="presentation_files/figure-html/unnamed-chunk-26-1.png" width="1800" /> ] --- ### Build the chart - 6 - Apply unhcRstyle .pull-left[ ```r plot <- plot + unhcRstyle::unhcr_theme(base_size = 8) plot ``` ] -- .pull-right[ <img src="presentation_files/figure-html/unnamed-chunk-28-1.png" width="1800" /> ] --- ### Build the chart - 7 - Adjust graphical elements .pull-left[ ```r plot <- plot + ## Highlight the zero baseline of the value axis geom_hline(yintercept = 0, size = 1.1, colour = "#333333") + theme( ### choose which grid lines appear panel.grid.major.x = element_line(color = "#cbcbcb"), panel.grid.major.y = element_blank()) plot ``` ] -- .pull-right[ <img src="presentation_files/figure-html/unnamed-chunk-30-1.png" width="1800" /> ] --- # The Grammar of Graphics in 7 key components .img75[![Structure of ggplot2](images/struct_ggplot.png)] [@CedScherer](https://twitter.com/CedScherer/status/1229418108122783744?s=20) ??? **Data** - Data is not just data - Representation defines what can be done with it - Grammar requires a tidy format (though it precedes the notion) **Aesthetics** - Allow generic datasets to be understood by the graphic system. - Link variables in data to graphical properties in the geometry. **Layers** 1. Geom - How to interpret aesthetics as graphical representations - Is a progression of positional aesthetics a number of points, a line, a single polygon, or something else entirely? 2.
Stats - Transform input variables to displayed values - Are implicit in many plot types but can often be done prior to plotting **Scales** - A scale translates back and forth between variable ranges and property ranges - Categories > Colour - Numbers > Position **Coordinates** - Define the physical mapping of the aesthetics to the paper **Facets** - Define the number of panels with equal logic and split data among them… - Small multiples **Themes** - Theming spans every part of the graphic that is not linked to data --- .pull-left[ ### Insert this in a slide deck or report template... All templates are based on R Markdown, which can create multiple outputs from the same content. - PowerPoint with UNHCR style - Word with UNHCR style - html/bootstrap - scrollable report - html/slide - slide-able report (WIP) - Paginated report built on top of pagedown. - Analysis Repository contribution ] .pull-right[ Access them from the **"From Template"** panel when creating a new Rmd document ![UNHCR Rmd templates](images/rmd_templates.png) ] --- class: center, middle, inverse # Practical Use Case 2 - Building impact indicators from Survey Data ### *The National Household Survey now includes Forcibly Displaced People and was published yesterday... We need to report our indicators by tomorrow!* <i class="fa fa-exclamation-triangle fa-fw fa-2x"></i> ??? We will use the "Encuesta Nacional de Calidad de Vida ECV 2020" (National Quality of Life Survey) published by DANE, the national statistics office of Colombia (http://microdatos.dane.gov.co/index.php/catalog/718/get_microdata), and explore different types of statistical analysis based on three example research questions: --- ## How to get to this chart in a couple of lines? <img src="presentation_files/figure-html/unnamed-chunk-31-1.png" width="80%" /> --- ### Install packages A particularity of household survey datasets stored in SPSS (`.sav`), SAS (`.sas7bdat`) or Stata (`.dta`) format is that they usually contain both values and __associated labels__ (we call them `labelled data`).
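To see what a labelled variable looks like before opening the survey file, here is a sketch that builds one by hand with `haven::labelled()` once haven is installed (it is among the packages installed on this slide). It mimics the `P6020` ("Sexo") variable of the household composition file, but the five values are hypothetical. `haven::as_factor()` then swaps the numeric codes for their value labels, which is usually what charts and tables need.

```r
library(haven)

# Hypothetical codes mimicking P6020 ("Sexo"): 1 = Hombre (man), 2 = Mujer (woman)
sexo <- labelled(
  c(1, 2, 2, 1, 2),
  labels = c(Hombre = 1, Mujer = 2),
  label  = "Sexo"
)

# The numeric codes are still there...
codes <- as.vector(zap_labels(sexo))
# ...and as_factor() replaces them with the value labels
sexo_f <- as_factor(sexo)

print(codes)          # 1 2 2 1 2
print(table(sexo_f))  # Hombre: 2, Mujer: 3
```

Keeping both the codes and the labels is what makes these files convenient: codes for filtering and computation, labels for display.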
.pull-left[ ```r if (!require("haven")) install.packages("haven", dependencies = TRUE) if (!require("labelled")) install.packages("labelled", dependencies = TRUE) if (!require("sjPlot")) install.packages("sjPlot", dependencies = TRUE) if (!require("DT")) install.packages("DT", dependencies = TRUE) ``` ] -- .pull-right[ ![labelled](images/labelled_preview.PNG) ] ??? https://www.pipinghotdata.com/posts/2020-12-23-leveraging-labelled-data-in-r/ --- ### Get the data The dataset [Encuesta Nacional de Calidad de Vida ECV 2020, Colombia, DANE](http://microdatos.dane.gov.co/index.php/catalog/718/get_microdata) is available through a microdata library. Download the data from [here](https://unhcr365-my.sharepoint.com/:f:/g/personal/legoupil_unhcr_org/El_KBosz8pFJilV0mSYUXycBhXqXx2dXtGey4T3_R1L8pA?e=kFXm5U) to the "`data-raw`" folder within your project .left-column[ ```r ## Household composition file ("Características y composición del hogar" = household characteristics and composition), which holds the disaggregation variables data0 <- haven::read_sav(here::here("data-raw", "Características y composición del hogar.sav")) ## Display the data frame content # data0 %>% # sjPlot::view_df() # view_df() offers many options, e.g. to add the frequencies of values, # the number of missing values per variable, # or even weighted frequencies.
# show.na = TRUE, # show.type = TRUE, # show.frq = TRUE, # show.prc = TRUE, # show.string.values = TRUE, # show.id = TRUE ``` ] -- .right-column[ <table style="border-collapse:collapse; border:none;"> <caption>Data frame: .</caption> <tr> <th style="border-bottom:double; font-style:italic; font-weight:normal; padding:0.2cm; text-align:left; vertical-align:top;">ID</th><th style="border-bottom:double; font-style:italic; font-weight:normal; padding:0.2cm; text-align:left; vertical-align:top;">Name</th><th style="border-bottom:double; font-style:italic; font-weight:normal; padding:0.2cm; text-align:left; vertical-align:top;">Label</th><th style="border-bottom:double; font-style:italic; font-weight:normal; padding:0.2cm; text-align:left; vertical-align:top;">Values</th><th style="border-bottom:double; font-style:italic; font-weight:normal; padding:0.2cm; text-align:left; vertical-align:top;">Value Labels</th> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top;">1</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">DIRECTORIO</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">Directorio</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;" colspan="2"><em>range: 7247300-7407013</em></td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">2</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">SECUENCIA_ENCUESTA</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Secuencia_encuesta</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee" colspan="2"><em>range: 1-22</em></td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top;">3</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">SECUENCIA_P</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">Secuencia_p</td> <td 
style="padding:0.2cm; text-align:left; vertical-align:top;" colspan="2"><em>range: 1-4</em></td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">4</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">ORDEN</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Orden</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee" colspan="2"><em>range: 1-22</em></td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top;">5</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">FEX_C</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">Factor de expansión</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;" colspan="2"><em>range: 1.9-3061.6</em></td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">6</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P6016</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Número de orden de la persona que proporciona la<br>información:</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee" colspan="2"><em>range: 1-18</em></td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top;">7</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">P1894</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">Tipo de documento de identidad</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">1<br>2<br>3<br>4<br>5</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">Registro civil de nacimiento<br>Tarjeta de identidad<br>Cédula de ciudadanía<br>Cédula de extranjería<br>No tiene documento de identidad</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; 
vertical-align:top; background-color:#eeeeee">8</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P6020</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Sexo</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">1<br>2</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Hombre<br>Mujer</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top;">9</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">P6034</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">¿Cuál es la fecha de nacimiento de _____?</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;" colspan="2"><em>range: 1-2</em></td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">10</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P6040</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">¿cuántos años cumplidos tiene...?</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee" colspan="2"><em>range: 0-106</em></td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top;">11</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">P6051</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">Cuál es el parentesco de....con el jefe o la jefa<br>de este hogar?</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>10<br>11<br>12<br>13<br>14</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">Jefe (a) del hogar<br>Pareja, esposo (a), cónyuge, compañero(a)<br>Hijo(a) hijastro(a)<br>Nieto (a)<br>Padre, madre, padrastro y madrastra<br>Suegro o suegra<br>Hermano (a), hermanastro (a)<br>Yerno, 
nuera<br>Otro pariente del jefe(a)<br>Empleado(a) del servicio doméstico<br>Parientes del servicio doméstico<br>Trabajador<br>Pensionista<br>Otro pariente</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">12</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P5502</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Actualmente…:</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">1<br>2<br>3<br>4<br>5<br>6</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee"> No está casado(a) y vive en pareja hace menos de dos años<br>No está casado(a) y vive en pareja hace dos años o más<br> Está viudo(a)<br>Está separado(a) o divorciado(a)<br> Está soltero(a)<br> Está casado(a)</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top;">13</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">P6071</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">El (la) cónyuge de .. 
vive en este hogar</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">1<br>2</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">Sí<br>No</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">14</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P6071S1</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">No de orden</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee" colspan="2"><em>range: 1-19</em></td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top;">15</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">P756</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">¿Dónde nació ________?</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">1<br>2<br>3</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">En este municipio<br>En otro municipio<br>En otro país</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">16</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P756S1</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Departamento</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee" colspan="2"><em>range: 5-99</em></td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top;">17</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">P756S2</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">Municipio</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;" colspan="2"><em>range: 5001-99773</em></td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">18</td> <td style="padding:0.2cm; 
text-align:left; vertical-align:top; background-color:#eeeeee">P756S3</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">En otro país</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>10<br>11</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Estados Unidos<br>España<br>Venezuela<br>Ecuador<br>Panamá<br>Perú<br>Costa Rica<br>Argentina<br>Francia<br>Italia<br>Otro país</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top;">19</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">P6074</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">¿_______________ siempre ha vivido aquí en este<br>municipio?</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">1<br>2</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">Sí<br>No</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">20</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P767</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">¿cuántos años continuos hace que vive ___ aquí en<br>este municipio?</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee" colspan="2"><em>range: 0-90</em></td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top;">21</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">P6076</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">Antes de venir a este municipio_______________<br>vivía en</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">1<br>2</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">Otro país<br>Otro municipio</td> </tr> <tr> <td style="padding:0.2cm; 
text-align:left; vertical-align:top; background-color:#eeeeee">22</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P6076S1</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Departamento</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee"></td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee"></td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top;">23</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">P6076S2</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">Municipio </td> <td style="padding:0.2cm; text-align:left; vertical-align:top;"></td> <td style="padding:0.2cm; text-align:left; vertical-align:top;"></td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">24</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P6077</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">_____ vivía en</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">1<br>2</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">El centro urbano donde está la alcaldía<br>Un corregimiento, inspección de policía, caserío, vereda o c</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top;">25</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">P6096</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">¿cuál fue la razón principal para cambiar la<br>residencia al municipio actual?</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>10<br>11<br>12</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;"> Dificultad para encontrar 
trabajo o ausencia de medios de s<br> Riesgo o consecuencia de desastre natural (inundación, aval<br>Amenaza o riesgo para su vida, su libertad o su integridad f<br> Necesidad de educación<br>Porque se casó o formó pareja<br> Motivos de salud<br> Mejorar la vivienda o localización<br> Mejores oportunidades laborales o de negocio<br>Acompañar a otro(s) miembro(s) del hogar<br>Adquisición de vivienda<br>Búsqueda de tranquilidad o mejor calidad de vida<br> Otra</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">26</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P6081</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">El padre de .. vive en este hogar</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">1<br>2<br>3</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Sí<br>No<br>Fallecido</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top;">27</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">P6081S1</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">No. 
De orden</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;" colspan="2"><em>range: 1-15</em></td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">28</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P6087</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">¿cuál es o fue el nivel de educación más alto<br>alcanzado por el padre de…..?</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>10</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Algunos años de primaria<br>Toda la primaria<br>Algunos años de secundaria<br>Toda la secundaria<br>Uno o mas años de técnica o tecnológica<br>Técnica o tecnológica completa<br>Uno o mas años de universidad<br>Universitaria completa<br>Ninguno<br>No sabe</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top;">29</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">P6083</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">La madre de .. vive en este hogar</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">1<br>2<br>3</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">Sí<br>No<br>Fallecida</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">30</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P6083S1</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">No. 
De orden</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee" colspan="2"><em>range: 1-18</em></td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top;">31</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">P6088</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">¿cuál es o fue el nivel de educación más alto<br>alcanzado por la madre de......?</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>10</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">Algunos años de primaria<br>Toda la primaria<br>Algunos años de secundaria<br>Toda la secundaria<br>Uno o mas años de técnica o tecnológica<br>Técnica o tecnológica completa<br>Uno o mas años de universidad<br>Universitaria completa<br>Ninguno<br>No sabe</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">32</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P6080</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">De acuerdo con su cultura, pueblo o rasgos<br>físicos, _____ es o se reconoce comó:</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">1<br>2<br>3<br>4<br>5<br>6</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Indígena<br>Gitano (a) (rom)<br>Raizal del archipiélago de San Andrés, Providencia y Santa C<br>Palenquero (a) de San Basilio<br>Negro (a), mulato (a) (afrodescendiente), afrocolombiano(a)<br>Ninguno de los anteriores</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top;">33</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">P5667</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;"> ¿a cuál pueblo o etnia indígena pertenece 
_____?</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;"></td> <td style="padding:0.2cm; text-align:left; vertical-align:top;"></td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">34</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P2057</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">¿Usted se considera campesino(a)?</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">1<br>2<br>3</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Si<br>No<br>No informa</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top;">35</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">P2059</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;"> ¿Usted considera que alguna vez fue campesino(a)?</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">1<br>2<br>3</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">Si<br>No<br>No informa</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">36</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P2061</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee"> ¿Usted considera que la comunidad en que vive es<br>campesina?</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">1<br>2<br>3</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Si<br>No<br>No informa</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top;">37</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">P1895</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">En general, qué tan 
satisfecho(a) se siente ...<br>con su vida actualmente?</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">0<br>1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>10</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">Totalmente insatisfecho(a)<br>1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>Totalmente satisfecho(a)</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">38</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P1896</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">En general, qué tan satisfecho(a) se siente ...<br>con su ingreso actualmente?</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">0<br>1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>10</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Totalmente insatisfecho(a)<br>1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>Totalmente satisfecho(a)</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top;">39</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">P1897</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">En general, qué tan satisfecho(a) se siente ...<br>con su salud actualmente?</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">0<br>1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>10</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">Totalmente insatisfecho(a)<br>1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>Totalmente satisfecho(a)</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">40</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P1898</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">En general, qué tan satisfecho(a) 
se siente ...<br>con su nivel de seguridad actualmente?</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">0<br>1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>10</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Totalmente insatisfecho(a)<br>1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>Totalmente satisfecho(a)</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top;">41</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">P1899</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">En general, qué tan satisfecho(a) se siente ...<br>con su trabajo/actividad actualmente?</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">0<br>1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>10</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">Totalmente insatisfecho(a)<br>1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>Totalmente satisfecho(a)</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">42</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P3175</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee"> En general, ¿qué tan satisfecho/a se siente _____<br>con su tiempo libre?</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">0<br>1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>10</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Totalmente insatisfecho(a)<br>1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>Totalmente satisfecho(a)</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top;">43</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">P1901</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;"> ¿qué tan feliz se sintió ... 
el día de ayer?</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">0<br>1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>10</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">Para nada feliz<br>1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>Todo el tiempo feliz</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">44</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P1903</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">¿qué tan preocupado(a) se sintió ... el día de<br>ayer?</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">0<br>1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>10</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Para nada preocupado(a)<br>1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>Todo el tiempo preocupado(a)</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top;">45</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">P1904</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">¿qué tan triste se sintió ... el día de ayer?</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">0<br>1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>10</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">Para nada triste<br>1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>Todo el tiempo triste</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">46</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">P1905</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">Qué tanto considera ... 
que las cosas que hace en<br>su vida valen la pena?</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">0<br>1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>10</td> <td style="padding:0.2cm; text-align:left; vertical-align:top; background-color:#eeeeee">No valen la pena<br>1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>Valen totalmente la pena</td> </tr> <tr> <td style="padding:0.2cm; text-align:left; vertical-align:top;">47</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">P1927</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">¿En cuál escalón diría usted que se encuentra<br>parado(a) en este momento?</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">0<br>1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>10</td> <td style="padding:0.2cm; text-align:left; vertical-align:top;">Peor vida<br>1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>Mejor vida</td> </tr> </table> ]

???

https://www.pipinghotdata.com/posts/2020-12-23-leveraging-labelled-data-in-r/

Here are operations I commonly perform on labelled data:

* Evaluate whether a variable is of class `haven_labelled`. Why? Troubleshooting, exploring, mutating. Function(s): `haven::is.labelled()`
* Convert a `haven_labelled` variable to its numeric value codes. Why? To treat the variable as continuous for analysis, for example when a 1-7 rating scale imports as labelled and you want to compute a mean. Function(s): `base::as.numeric()` (strips the variable of all metadata), `haven::zap_labels()` and `labelled::remove_val_labels()` (remove value labels, retain other metadata)
* Convert a `haven_labelled` variable to a factor with value labels. Why? To treat the variable as categorical for analysis. Function(s): `haven::as_factor()`, `labelled::to_factor()`, `sjlabelled::as_label()`. As far as I can tell, these three functions give the same result. By default, the factor levels are ordered by value codes.
* Convert a variable label to a variable name. Why? For more informative or readable variable names. Function(s): `sjlabelled::label_to_colnames()`

---

### Search the generated data dictionary

.left-column[
```r
# create data dictionary
dictionary <- labelled::generate_dictionary(data0)
View(dictionary)
# library(DT)
# dictionary %>%
#   DT::datatable()
```
]

--

.right-column[
]

???

---

### Simple plot with labelled data

.pull-left[
```r
plot <- sjPlot::plot_frq(
  data = data0$P756S3,
  type = "bar",
  sort.frq = "asc",
  coord.flip = TRUE,
  weight.by = as.vector(data0$FEX_C),
  show.ci = TRUE)
plot
```
]

--

.pull-right[
<img src="presentation_files/figure-html/unnamed-chunk-38-1.png" width="1800" />
]

---

### Style the plot

.pull-left[
```r
plot <- plot +
  scale_y_continuous(labels = scales::label_number_si()) +
  labs(title = sjlabelled::get_label(data0$P756S3),
       x = "", y = "",
       caption = "Source: Encuesta Nacional de Calidad de Vida ECV 2020, Colombia, DANE") +
  unhcRstyle::unhcr_theme(base_size = 8) +
  theme(legend.position = "none",
        panel.grid.major.x = element_line(color = "#cbcbcb"),
        panel.grid.major.y = element_blank(),
        panel.grid.minor = element_blank())
plot
```
]

--

.pull-right[
<img src="presentation_files/figure-html/unnamed-chunk-40-1.png" width="1800" />
]

---

### Get all data files together

.pull-left[
```r
# list the files with a .sav extension in the data-raw folder
datasetlist <- fs::dir_ls(here::here("data-raw"), glob = "*.sav")
# head(datasetlist, 2)

# read all the files and convert labelled columns to factors:
# compose() applies its functions right to left (read_sav() first).
# All the datasets are stored in a list object
datasets <- datasetlist %>%
  map(compose(haven::as_factor, haven::read_sav))
# class(datasets)
```
]

--

.pull-right[
```
## Print datasetlist first 2 lines
```
```
## /home/edouard/R-projects/tested_script/SurveyAnalysisTutorial/data-raw/Atencion integral de los niños y niñas menores de 5 años.sav
## /home/edouard/R-projects/tested_script/SurveyAnalysisTutorial/data-raw/Características y composición del hogar.sav
```
```
## Print dataset class
```
```
## [1] "list"
```
]

???
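The next two slides select files with `keep()` and combine them with `reduce(full_join)`. Below is a base-R sketch of that pattern. Three tiny data frames stand in for the .sav files: only the key column names (`DIRECTORIO`, `SECUENCIA_P`, `SECUENCIA_ENCUESTA`) come from the survey, and the other columns and all values are hypothetical. `Filter()` plays the role of `keep()`, and `Reduce()` with `merge(all = TRUE)` plays the role of `reduce(full_join)`. The helper names `is_person_level()`, `is_household_level()` and `full_join_all()` are illustrative only.

```r
# Hypothetical stand-ins for the files read with haven::read_sav();
# only the key column names mimic the ECV files, the values are invented.
datasets <- list(
  # household-level file: SECUENCIA_P is always 1, SECUENCIA_ENCUESTA numbers households
  hogar = data.frame(DIRECTORIO = c(10, 10, 20),
                     SECUENCIA_P = c(1, 1, 1),
                     SECUENCIA_ENCUESTA = c(1, 2, 1),
                     P5000 = c(2, 1, 3)),
  # person-level files: SECUENCIA_P numbers households, SECUENCIA_ENCUESTA numbers persons
  personas = data.frame(DIRECTORIO = c(10, 10, 10, 20),
                        SECUENCIA_P = c(1, 1, 2, 1),
                        SECUENCIA_ENCUESTA = c(1, 2, 1, 1),
                        P6020 = c(1, 2, 2, 1)),
  salud = data.frame(DIRECTORIO = c(10, 10, 20),
                     SECUENCIA_P = c(1, 2, 1),
                     SECUENCIA_ENCUESTA = c(2, 1, 1),
                     P5665 = c(2, 1, 2))
)

# Predicates equivalent to the keep() formulas used on the slides
is_person_level <- function(d) {
  !all(d$SECUENCIA_P == 1) && !all(d$SECUENCIA_ENCUESTA == 1)
}
is_household_level <- function(d) {
  all(d$SECUENCIA_P == 1) && !all(d$SECUENCIA_ENCUESTA == 1)
}

# Full outer join of all selected tables on their shared column names
full_join_all <- function(tables) {
  Reduce(function(x, y) merge(x, y, all = TRUE), tables)
}

persons <- full_join_all(Filter(is_person_level, datasets))
households <- full_join_all(Filter(is_household_level, datasets))
persons
```

Like `full_join()` without `by`, `merge()` matches on every column name the two tables share. A column that happens to appear in two files (a weight, for example) therefore also becomes a join key. Passing `by` explicitly avoids that.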

---

### Join data frame for Individuals

.pull-left[
```r
# Bind together the datasets that share the same unit of analysis
persons <- datasets %>%
  # keep() is purrr's counterpart of base Filter(): it retains
  # the list elements for which the predicate is TRUE.
  ## Person-level files: neither sequence variable is constant at 1
  keep(~ !all(.$SECUENCIA_P == 1) & !all(.$SECUENCIA_ENCUESTA == 1)) %>%
  ## reduce() iteratively applies a binary function, i.e. a loop...
  reduce(
    ## ...here performing successive full joins
    full_join)

# harmonize key variables by renaming them
# notice how SECUENCIA_ENCUESTA changes meaning between datasets
persons <- persons %>%
  rename(dwelling = DIRECTORIO,
         household = SECUENCIA_P,
         person = SECUENCIA_ENCUESTA)
# nrow(persons)
# ncol(persons)
```
]

--

.pull-right[
```
## [1] 267098
```
```
## [1] 527
```
]

???

---

### Join data frame for Household

.pull-left[
```r
households <- datasets %>%
  ## Household-level files: SECUENCIA_P is always 1,
  # while SECUENCIA_ENCUESTA numbers the households
  keep(~ all(.$SECUENCIA_P == 1) & !all(.$SECUENCIA_ENCUESTA == 1)) %>%
  reduce(full_join)

# harmonize key variables -
# notice how SECUENCIA_ENCUESTA changes meaning between datasets
households <- households %>%
  rename(dwelling = DIRECTORIO,
         household = SECUENCIA_ENCUESTA) %>%
  select(-ORDEN)
# nrow(households)
# ncol(households)
```
]

--

.pull-right[
```
## [1] 88310
```
```
## [1] 235
```
]

---

### Identify Venezuelans

.pull-left[
```r
library(tidyverse)
data <- left_join(persons, households) %>%
  transmute(pop = case_when(is.na(P756S3) ~ "Colombians",
                            P756S3 == "Venezuela" ~ "Venezuelans",
                            TRUE ~ "Others"),
            # keep the survey weight: the chart caption sums it later
            wt = FEX_C)
# table(data$pop, useNA = "ifany")
```
]

--

.pull-right[
```
## 
##  Colombians      Others Venezuelans 
##      257044         496        9558 
```
]

???

---

### Create Indicators

.pull-left[
```r
indicators1 <- left_join(persons, households) %>%
  # transmute() creates new variables like mutate(),
  # but it drops all the original columns!
  transmute(pop = case_when(is.na(P756S3) ~ "Colombians",
                            P756S3 == "Venezuela" ~ "Venezuelans",
                            TRUE ~ "Others"),
            # 2.1 Proportion of PoC living below the national poverty line
            rbm2.1 = PERCAPITA < 327674,
            # 2.3 Proportion of PoC with access to health services
            rbm2.3 = if_else(P5665 == "Sí", is.na(P6153), NA),
            dpto = "dummy", # FIXME: should be dwellings$dpto
            wt = FEX_C)
# knitr::kable(head(indicators1 %>%
#   select(pop, rbm2.1, rbm2.3)),
#   format = 'html')
```
]

--

.pull-right[
<table> <thead> <tr> <th style="text-align:left;"> pop </th> <th style="text-align:left;"> rbm2.1 </th> <th style="text-align:left;"> rbm2.3 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Colombians </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> NA </td> </tr> <tr> <td style="text-align:left;"> Colombians </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> NA </td> </tr> <tr> <td style="text-align:left;"> Colombians </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> NA </td> </tr> <tr> <td style="text-align:left;"> Colombians </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> NA </td> </tr> <tr> <td style="text-align:left;"> Colombians </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:left;"> NA </td> </tr> <tr> <td style="text-align:left;"> Colombians </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:left;"> NA </td> </tr> </tbody> </table>
]

???
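Below is a base-R sketch, on hypothetical toy records, of the two patterns on this slide. First, nested `ifelse()` stands in for `case_when()`. Second, an indicator is set to `NA` outside its eligible subgroup (rows where `P5665` is not "Sí"), so that `na.rm = TRUE` later restricts the denominator to that subgroup. Only the column names and the 327674 threshold come from the slide; the values are invented.

```r
# Hypothetical toy records: column names from the slide, values invented
toy <- data.frame(
  P756S3    = c(NA, "Venezuela", "Venezuela", "Ecuador"),
  PERCAPITA = c(500000, 200000, 327674, 100000),
  P5665     = c("Sí", "No", "Sí", "Sí"),
  P6153     = c(NA, NA, 3, NA),
  stringsAsFactors = FALSE
)

# Same classification as the case_when() call: NA country of birth -> Colombians
pop <- ifelse(is.na(toy$P756S3), "Colombians",
              ifelse(toy$P756S3 == "Venezuela", "Venezuelans", "Others"))

# Strict "<": a per-capita income exactly at the threshold is not below it
rbm2.1 <- toy$PERCAPITA < 327674

# Defined only when P5665 == "Sí"; NA for everyone else
rbm2.3 <- ifelse(toy$P5665 == "Sí", is.na(toy$P6153), NA)

# na.rm = TRUE drops the ineligible row, so the denominator is 3, not 4
mean(rbm2.3, na.rm = TRUE)
```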

---

### Add Indicators from multiple variables

.pull-left[
```r
indicators2 <- left_join(persons, households) %>%
  transmute(pop = case_when(is.na(P756S3) ~ "Colombians",
                            P756S3 == "Venezuela" ~ "Venezuelans",
                            TRUE ~ "Others"),
            water = P792 != "No tienen el servicio",
            sanitation = P5032 != "No tienen el servicio",
            overcrowding = CANT_PERSONAS_HOGAR/P5000 > 3,
            secure_tenure = P5095 == "Propia, totalmente pagada" |
              P5095 == "Propia, lo están pagando" |
              (P5095 == "En arriendo o subarriendo" & P3006 == "Escrito"),
            housing_durability = TRUE, # FIXME: correct when we get the missing data file
            # 2.2 Proportion of PoCs residing in physically safe and secure settlements
            rbm2.2 = water & sanitation & !overcrowding & housing_durability & secure_tenure,
            dpto = "dummy", # FIXME: should be dwellings$dpto
            wt = FEX_C)
# knitr::kable(head(indicators2 %>%
#   select(pop, rbm2.2, wt)),
#   format = 'html')
```
]

--

.pull-right[
<table> <thead> <tr> <th style="text-align:left;"> pop </th> <th style="text-align:left;"> rbm2.2 </th> <th style="text-align:right;"> wt </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Colombians </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:right;"> 438.1767 </td> </tr> <tr> <td style="text-align:left;"> Colombians </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:right;"> 495.8612 </td> </tr> <tr> <td style="text-align:left;"> Colombians </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:right;"> 440.2299 </td> </tr> <tr> <td style="text-align:left;"> Colombians </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:right;"> 466.8462 </td> </tr> <tr> <td style="text-align:left;"> Colombians </td> <td style="text-align:left;"> TRUE </td> <td style="text-align:right;"> 466.8462 </td> </tr> <tr> <td style="text-align:left;"> Colombians </td> <td style="text-align:left;"> FALSE </td> <td style="text-align:right;"> 466.4742 </td> </tr> </tbody> </table>
]

???
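R's logical operators use three-valued logic, which matters for composite indicators such as `rbm2.2`: `FALSE & NA` is `FALSE` and `TRUE | NA` is `TRUE`, but `TRUE & NA` is `NA`. The hypothetical toy households below reuse the category labels from the slide, with invented values and the other components assumed `TRUE`. They show that a renter whose contract type `P3006` is missing gets `secure_tenure = NA`, and `na.rm = TRUE` then drops that renter from the denominator instead of counting them as insecure.

```r
# Hypothetical toy households: labels copied from the slide, values invented
hh <- data.frame(
  P5095 = c("Propia, totalmente pagada", "En arriendo o subarriendo",
            "En arriendo o subarriendo", "Propia, lo están pagando"),
  P3006 = c(NA, "Escrito", NA, NA),
  CANT_PERSONAS_HOGAR = c(4, 7, 3, 5),
  P5000 = c(2, 2, 1, 1),
  stringsAsFactors = FALSE
)

overcrowding <- hh$CANT_PERSONAS_HOGAR / hh$P5000 > 3   # a ratio of exactly 3 is not overcrowded

# Same expression as on the slide: the third renter (missing P3006) ends up NA
secure_tenure <- hh$P5095 == "Propia, totalmente pagada" |
  hh$P5095 == "Propia, lo están pagando" |
  (hh$P5095 == "En arriendo o subarriendo" & hh$P3006 == "Escrito")

# Assumption for the sketch: the other components are all TRUE
water <- sanitation <- housing_durability <- rep(TRUE, nrow(hh))
rbm2.2 <- water & sanitation & !overcrowding & housing_durability & secure_tenure

# Alternative with %in%, which never returns NA: missing contract = insecure
secure_alt <- hh$P5095 %in% c("Propia, totalmente pagada", "Propia, lo están pagando") |
  (hh$P5095 %in% "En arriendo o subarriendo" & hh$P3006 %in% "Escrito")

data.frame(overcrowding, secure_tenure, rbm2.2, secure_alt)
```

Which convention is right depends on how missing answers should count for the indicator. The choice changes the denominator, so it is worth deciding explicitly.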

---

### Aggregate Value

.pull-left[
```r
indicators.computed <-
  ## get all indicators together: both tables come from the same
  # left_join(persons, households), so their rows are aligned.
  # (Joining them on pop/dpto/wt would duplicate the members of
  # households that share the same weight.)
  dplyr::bind_cols(indicators1,
                   dplyr::select(indicators2, rbm2.2)) %>%
  ## Drop people who are neither Colombian nor Venezuelan
  filter(pop != "Others") %>%
  ## transform into a weighted survey object
  srvyr::as_survey_design(strata = dpto, weights = wt) %>%
  ## Compile indicators by group
  group_by(pop) %>%
  summarize(
    # Apply functions across multiple columns
    across(
      ## select all variables whose name contains "rbm"
      contains("rbm"),
      srvyr::survey_mean,
      # Report variability as confidence interval ("ci")
      vartype = "ci",
      # missing values should be dropped
      na.rm = TRUE,
      ## append a suffix to all column names to
      # distinguish the main estimate from the CI bounds
      .names = "{.col}_est"))
```
]

--

.pull-right[
<table> <thead> <tr> <th style="text-align:left;"> pop </th> <th style="text-align:right;"> rbm2.2_est </th> <th style="text-align:right;"> rbm2.2_est_low </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Colombians </td> <td style="text-align:right;"> 0.5501188 </td> <td style="text-align:right;"> 0.5490104 </td> </tr> <tr> <td style="text-align:left;"> Venezuelans </td> <td style="text-align:right;"> 0.2246054 </td> <td style="text-align:right;"> 0.2161841 </td> </tr> </tbody> </table>
]

???
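Below is a sketch of what `survey_mean()` estimates for a 0/1 indicator when the design declares only weights and a single stratum: the weighted proportion plus a Taylor-linearization 95% confidence interval. It assumes every row is its own sampling unit (no clusters) and a with-replacement variance, and it uses the normal quantile. srvyr treats groups and missing values as domains of the full design and may use a t quantile, so small differences are expected; for large samples the results should be close. The function name `weighted_prop_ci()` is hypothetical.

```r
# Weighted proportion of a 0/1 (logical) indicator with a confidence
# interval from Taylor linearization. Simplifying assumptions: one stratum,
# every row is its own sampling unit, with-replacement variance,
# normal quantile. Missing values are dropped first, like na.rm = TRUE.
weighted_prop_ci <- function(y, w, level = 0.95) {
  keep <- !is.na(y)
  y <- as.numeric(y[keep])
  w <- w[keep]
  n <- length(y)
  if (n < 2) stop("need at least two non-missing observations")
  p <- sum(w * y) / sum(w)          # weighted proportion (ratio estimator)
  z <- w * (y - p) / sum(w)         # linearized values; they sum to zero
  se <- sqrt(n / (n - 1) * sum(z^2))
  q <- qnorm(1 - (1 - level) / 2)
  c(est = p, est_low = p - q * se, est_upp = p + q * se)
}

# Toy example with invented values: the NA is dropped before estimation
weighted_prop_ci(y = c(TRUE, FALSE, NA, TRUE), w = c(100, 300, 50, 100))
```

With equal weights the standard error reduces to `sd(y) / sqrt(n)`, which is a quick sanity check. Applying the function within `split(df, df$pop)` gives one row per group, and the srvyr pipeline above should be used for the actual figures.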

---

### Pivot Frame

.pull-left[
```r
indicators <- indicators.computed %>%
  ## pivot from wide to long: pop stays as identifier and each
  # column name is split into the indicator (ind) and the
  # statistic (est, est_low, est_upp)
  pivot_longer(-pop,
               names_to = c("ind", ".value"),
               names_pattern = "(.+?)_(.+)")
# Save indicators in a csv file
# write.csv(indicators, "indicators.csv")
# knitr::kable(head(indicators %>%
#   select(pop, ind, est)), format = 'html')
```
]

--

.pull-right[
<table> <thead> <tr> <th style="text-align:left;"> pop </th> <th style="text-align:left;"> ind </th> <th style="text-align:right;"> est </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Colombians </td> <td style="text-align:left;"> rbm2.1 </td> <td style="text-align:right;"> 0.3027933 </td> </tr> <tr> <td style="text-align:left;"> Colombians </td> <td style="text-align:left;"> rbm2.3 </td> <td style="text-align:right;"> 0.6900830 </td> </tr> <tr> <td style="text-align:left;"> Colombians </td> <td style="text-align:left;"> rbm2.2 </td> <td style="text-align:right;"> 0.5501188 </td> </tr> <tr> <td style="text-align:left;"> Venezuelans </td> <td style="text-align:left;"> rbm2.1 </td> <td style="text-align:right;"> 0.6496214 </td> </tr> <tr> <td style="text-align:left;"> Venezuelans </td> <td style="text-align:left;"> rbm2.3 </td> <td style="text-align:right;"> 0.3959790 </td> </tr> <tr> <td style="text-align:left;"> Venezuelans </td> <td style="text-align:left;"> rbm2.2 </td> <td style="text-align:right;"> 0.2246054 </td> </tr> </tbody> </table>
]

???

---

### Build chart

.pull-left[
```r
plot <- ggplot(
  data = indicators,
  aes(x = est, y = fct_rev(ind))) +
  geom_pointrange(aes(xmin = est_low, xmax = est_upp, color = pop),
                  size = .75) +
  geom_text(aes(label = scales::label_percent(.1)(est)),
            vjust = -1.25)
plot
```
]

--

.pull-right[
<img src="presentation_files/figure-html/unnamed-chunk-58-1.png" width="1080" />
]

???
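The `est_low` and `est_upp` columns mapped to `xmin` and `xmax` come from the `names_pattern` split on the Pivot Frame slide. The base-R sketch below uses `sub()` on hypothetical column names in the same format to show why the lazy `.+?` matters. The lazy pattern splits at the first underscore, whereas a greedy `.+` would split at the last one and produce "low" and "upp" instead of "est_low" and "est_upp".

```r
# Hypothetical column names in the format produced by survey_mean() + .names
cols <- c("rbm2.1_est", "rbm2.1_est_low", "rbm2.1_est_upp", "rbm2.2_est")

lazy   <- "^(.+?)_(.+)$"  # same groups as names_pattern = "(.+?)_(.+)"
greedy <- "^(.+)_(.+)$"   # counter-example: splits at the last "_"

ind         <- sub(lazy, "\\1", cols, perl = TRUE)
stat        <- sub(lazy, "\\2", cols, perl = TRUE)
greedy_stat <- sub(greedy, "\\2", cols, perl = TRUE)

data.frame(cols, ind, stat, greedy_stat)
```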
---
### Add scale

.pull-left[
```r
plot <- plot +
  ## Percent axis from 0% to 100%
  scale_x_continuous(labels = scales::label_percent(),
                     limits = c(0, 1),
                     breaks = c(0, .5, 1)) +
  ## Fixed colour for each population group
  scale_color_manual(values = c(Colombians = "black",
                                Venezuelans = "#0072BC"))
plot
```
]

--

.pull-right[
<img src="presentation_files/figure-html/unnamed-chunk-60-1.png" width="1080" />
]

???

---
### Change labels

.pull-left[
```r
## Add a label for each calculated indicator
rbm_indicators <- c(
  rbm2.1 = "2.1 Proportion of PoC living below the national poverty line",
  rbm2.2 = "2.2 Proportion of PoCs residing in physically safe and secure settlements with access to basic facilities.",
  rbm2.3 = "2.3 Proportion of PoC with access to health services")

plot <- plot +
  ## Labels are matched by position, so they are reversed to
  ## follow the fct_rev() order of the y axis; wrapped at 40 characters
  scale_y_discrete(labels = str_wrap(rev(rbm_indicators), 40))
plot
```
]

--

.pull-right[
<img src="presentation_files/figure-html/unnamed-chunk-62-1.png" width="1080" />
]

???

---
### Title and Style

.pull-left[
```r
plot <- plot +
  labs(x = "Estimate", y = "",
       title = "Impact Area 2: Realizing Rights in Safe Environments",
       ## Caption: data source, then raw and weighted counts of Venezuelans
       caption = glue::glue(
         "Encuesta Nacional de Calidad de Vida (ECV) 2020\n",
         "# of Venezuelans = {scales::label_comma()(sum(data$pop=='Venezuelans'))} obs. ",
         ## label_number_si() is superseded in scales >= 1.2.0 by
         ## label_number(scale_cut = cut_short_scale())
         "({scales::label_number_si(.1)(sum((data$pop=='Venezuelans')*data$wt))} weighted)")) +
  ## Apply the UNHCR graphical template
  unhcRstyle::unhcr_theme() +
  theme(panel.grid.minor.x = element_blank())
plot
```
]

--

.pull-right[
<img src="presentation_files/figure-html/unnamed-chunk-64-1.png" width="1080" />
]

???

---
## Congratulations! Today you learned the basics!

* Objects are __words__
* Functions are __verbs__ used to transform objects
* Charts are __complements__ created with the __grammar__ of graphics

???

In grammar, a complement is a word, phrase, or clause that is necessary to complete the meaning of a given expression. Complements are often also arguments.

---
class: center, middle, inverse

# Practical Use Case 3 - Evidence for Key Research Questions

### * What programmatic assumptions do we have evidence for?
Describe, Explore, Explain...

* <i class="fa fa-eye fa-fw fa-2x"></i>

---
## Statistical association analysis: Crosstabulation & Association

> __Question__: How does the perception of the current situation, compared to five years ago, differ between "undocumented Venezuelans" and the rest of the population?

See you in the R master webinar in January!

???

---
## Regression analysis between a target variable and multiple predictors

> __Question__: What can explain why some "undocumented Venezuelans" feel worse off now than five years ago?

See you in the R master webinar in January!

---
## Clustering analysis to identify homogeneous groups of individuals based on multiple variables

> __Question__: What are the main profiles of "undocumented Venezuelans" whose situation is worse now than five years ago?

See you in the R master webinar in February!

---
class: left, top

# Good luck in your R journey!

### Reach out to us for questions, analysis mentoring or code peer review!

.pull-left[
__DIMA Americas__: <br>
Edouard Legoupil, _Snr Statistics & Data Analysis Officer_,<br>
<a href="mailto:legoupil@unhcr.org"><i class="fa fa-paper-plane fa-fw"></i> legoupil@unhcr.org</a> <br>
Hisham Galal, _Associate Statistics & Data Analysis Officer_,<br>
<a href="mailto:galal@unhcr.org"><i class="fa fa-paper-plane fa-fw"></i> galal@unhcr.org</a>

Check our GitHub @ <a href="http://github.com/unhcr-americas"><i class="fa fa-github fa-fw"></i> unhcr-americas</a> and learn about __UnhcRverse packages__<br>

Slides created with [**remark.js**](http://remarkjs.com/) and the R package [**xaringan**](https://github.com/yihui/xaringan).

Slide notes for this presentation can be displayed with the keyboard shortcut `p`; navigation help is available with the keyboard shortcut `h`.
]

.pull-right[
<img src="images/giphy.gif" width="310px"/>
]

???

Finally, we hope that this session will motivate you to join the vibrant R user community in UNHCR and soon become an R champion.
To make the most of the session, we advise you to install the following open-source environment:

- R - https://cran.r-project.org/bin/windows/base/
- RStudio free version - https://www.rstudio.com/products/rstudio/download/
- Create a GitHub account - https://github.com/join? - and install GitHub Desktop - https://desktop.github.com/

You may also start installing the UNHCR packages, following the instructions in their respective documentation published on GitHub:

- Use UNHCR Open data - https://unhcr.github.io/unhcrdatapackage/docs/
- API to connect to internal data sources - https://unhcr-web.github.io/hcrdata/docs/
- Perform High Frequency Checks - https://unhcr.github.io/HighFrequencyChecks/docs/
- Process survey datasets - https://unhcr.github.io/koboloadeR/docs/
- Use the UNHCR graphical template - https://unhcr-web.github.io/unhcRstyle/docs/

See some practical tutorials on https://humanitarian-user-group.github.io/

The best way to start learning is to have a concrete project! If you have one and need mentoring, we can liaise after the session.

Resources

ggplot2:

- [ggplot2 main documentation](https://ggplot2.tidyverse.org/index.html)
- [The ggplot flipbook](https://evamaerey.github.io/ggplot_flipbook/ggplot_flipbook_xaringan.html#1) by Gina Reynolds
- [A ggplot2 tutorial for beautiful plotting in R](https://www.cedricscherer.com/2019/08/05/a-ggplot2-tutorial-for-beautiful-plotting-in-r/) and [ggplot Wizardry Hands-On](https://z3tt.github.io/OutlierConf2021/) by Cedric Scherer
- ggplot2 workshop [Part1](https://www.youtube.com/watch?v=h29g21z0a68)/[Part2](https://www.youtube.com/watch?v=0m4yywqNPVY) by Thomas Lin Pedersen (one of the main maintainers of ggplot2)

Extension packages for ggplot2:

- patchwork to combine multiple plots into one
- gganimate to create simple animations
- ggtext and ggrepel to handle annotations and text styling
- ggforce to group some content visually
- and much more... just go out there and experiment!