Data Wrangling using R: Application to Netflix Data

About

Overview:

This case teaches students the basics of ETL (extracting, transforming, and loading data) or data wrangling (e.g., loading a dataframe, subsetting data, etc.) in R. It covers concepts such as data discovery, data structuring, and data cleaning. The case questions are solved using a dataset on TV shows and movies from Netflix, which is loaded and analyzed in R, a popular programming language for statistical analysis and machine learning.


Usage:

Here is one suggested way to teach the case and its concepts:

Discuss the case and the case concepts in class.

Students then complete the case materials.

Finally, the solution file is discussed in a subsequent class.


Lecture #1: Walk through the PowerPoint deck, help students install R and RStudio, walk through a demo of R using the demo files, and introduce the Netflix data and case before class ends. Use the slide deck to present the key concepts of R to students in class. Plan for about 30 minutes of lecture and 30 minutes for the R installation and the demo that introduces the key concepts. Activity 1: Give students some time in class to install R and RStudio. Activity 2: Introduce the basic R concepts listed in the slide deck through an in-class demo using the demo .Rmd file and the demo data .xlsx file.


Lecture #2: Discuss the solution to the Netflix case after students have had some time to work on it in class/at home.


Level:

This case is suitable for novices.


Target Classes:

It could be used in advanced undergraduate or graduate classes in any business discipline.


Learning Objectives:

  • basics of ETL (extracting, transforming, and loading data) or data wrangling (e.g., loading a dataframe, subsetting data, etc.) in R

  • data discovery

  • data structuring

  • data cleaning
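
As a rough illustration of the objectives above, the following base R sketch walks through discovery, structuring, cleaning, and subsetting on a tiny made-up data frame. The rows and the column names (title, type, release_year) are assumptions for illustration only; they are not taken from the case dataset.

```r
# Hypothetical toy data; NOT the case dataset. Column names are assumed.
shows <- data.frame(
  title        = c("Show A", "Movie B", "Show C", "Movie D"),
  type         = c("TV Show", "Movie", "TV Show", "Movie"),
  release_year = c(2015, NA, 2019, 2020),
  stringsAsFactors = FALSE
)

# Data discovery: inspect the structure and count missing values per column.
str(shows)
missing_per_column <- colSums(is.na(shows))

# Data structuring: store the categorical column as a factor.
shows$type <- factor(shows$type)

# Data cleaning: drop rows with a missing release year.
shows_clean <- shows[!is.na(shows$release_year), ]

# Subsetting: keep only movies released in or after 2016.
recent_movies <- subset(shows_clean, type == "Movie" & release_year >= 2016)
print(recent_movies)
```

Dropping incomplete rows is only one possible cleaning choice; the case itself may ask students to handle missing values differently.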


File titles and explanations:

Lecture #1: Walk through the PowerPoint deck, help students install R and RStudio, walk through a demo of R using the demo files, and introduce the Netflix data and case before class ends.


1. "Instructional Materials.ppt": Use this slide deck to present the key concepts of R to students in class. Plan for about 30 minutes of lecture and 30 minutes for the demo of R installation, introducing the key concepts.

    a. Activity 1 in this lecture: Give students some time in class to install R and RStudio.

    b. Activity 2 in this lecture: Introduce the basic R concepts listed in this slide deck through an in-class demo of the following files:

        i. Instructional Materials Demo_intro_to_R.Rmd and its HTML version, Instructional Materials Demo_intro_to_R.html

        ii. Instructional Materials demo_data.xlsx is used for the second half of the demo to show students how to load data files in R and work with dataframes.


2. "Case Document.docx": Use this file in class to discuss the problem and explain to students what needs to be done in the case. This file can also be posted as the case question on a learning management system (LMS), such as Canvas. It includes all the tasks and exercises that students need to complete.


3. "S_Data.csv": This file contains the dataset for the case, which will be loaded in R.


4. "S_Data Description Sheet.txt": This file contains information about the variables used in this case.
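
Loading the case data might look like the sketch below. So that the snippet runs on its own, it first writes a small stand-in CSV to a temporary file; the stand-in's contents and column names are invented for illustration. With the real materials, the path would point at S_Data.csv instead, and the variables would be those listed in the data description sheet.

```r
# Stand-in file so the example is self-contained; contents are hypothetical.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "title,type,release_year",
  "Show A,TV Show,2015",
  "Movie B,Movie,2018"
), csv_path)

# Load the CSV into a data frame. For the case, replace csv_path with the
# location of S_Data.csv on your machine.
netflix <- read.csv(csv_path, stringsAsFactors = FALSE)

# Quick checks after loading: dimensions, column names, first rows.
dim(netflix)
names(netflix)
head(netflix)
```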


Lecture #2: Discuss the solution to the Netflix case after students have had some time to work on it in class/at home.


5. "Case Solution.Rmd": This file is instructor-only and includes the solution code along with the outputs and results of the analysis. It is an R Markdown file that instructors can run chunk by chunk to demonstrate solutions in class. Note that for questions 2a-2b, the current year and time since release variables depend on the year in which you run the code, so your answers will differ from those in these files, which are based on the year in which the case was written. Make sure to re-run the solution code file before posting the solution for students!


6. "Case Solution.html": This file is instructor-only and is the knitted html version of the Case Solution.Rmd file in case the instructor wants to show the solution without running it in R. It can be used as a supplement to the .Rmd file. The instructor can also demonstrate how .Rmd files can be exported to .html for ease of use by non-R users or readers.
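
The note on questions 2a-2b in item 5 can be made concrete with a small sketch. It assumes the solution derives the current year from the system date and subtracts a release-year column; the actual variable names and method in Case Solution.Rmd may differ, and the release years here are made up.

```r
# Hypothetical helper: years elapsed since release, relative to a given year.
years_since_release <- function(release_year, current_year) {
  current_year - release_year
}

# Current year taken from the system clock, so it changes with the run date.
current_year <- as.integer(format(Sys.Date(), "%Y"))

# Made-up release years for illustration.
release_year <- c(2015L, 2019L)
years_since_release(release_year, current_year)

# Fixing the year shows why older knitted output goes stale:
years_since_release(release_year, 2021L)  # 6 2
years_since_release(release_year, 2024L)  # 9 5
```

Because the result depends on the system date, a knitted .html file produced in an earlier year will show smaller values than a fresh run of the same code.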


Installations: To work on this case, install R from the CRAN project website and RStudio from the Posit website.


Contact: For any questions about the case or the files, please email unnati@illinois.edu.

Author(s)

Unnati Narang

Assistant Professor of Business Administration and RC Evans Data Analytics Scholar
