class: title-slide, middle, center

# BUS 320

# Excel to R - I

# Introduction to R and RStudio

## Elizabeth Stanny

---
layout: true

<div class="my-footer"><span>http://estannydotcom.netlify.com</span></div>

---
class: center

# What are R and RStudio?

<img src="images/R_vs_RStudio.png" width="80%" />

.pull-left[
R, the programming language, does the work

.footnote[.font80[Source: Figure 1.1 from Modern Dive https://moderndive.com/1-getting-started.html]]
]

.pull-right[
RStudio, an *integrated development environment (IDE)*, is the interface that makes R easier to use
]

---

# Open RStudio not R

<div class="figure" style="text-align: center">
<img src="images/R_vs_RStudio_logos.png" alt="Icons of R versus RStudio on your computer." width="90%" />
<p class="caption">Icons of R versus RStudio on your computer.</p>
</div>

---
class: center

# RStudio

<img src="images/rstudio.png" width="70%" />

---

.pull-left[
# Delete the welcome post
]

--

.pull-right[
# Create a new post in Distill

In the Console type

```
distill::create_post("Summary statistics")
```

This creates a new file, `summary-statistics.Rmd`

```
---
title: "Summary statistics"
description: |
  Comparison of R and Excel
date: 01-28-2021
output:
  distill::distill_article:
    self_contained: false
---
```
]

---

# Rmarkdown files

Files that end in .Rmd combine **R** code and text formatted using **m**ark**d**own

- code goes in R chunks fenced by lines of three backticks
- text goes outside the chunks
- knit the .Rmd document to generate a webpage with the text and the output of the code

---

# What are R packages?

<img src="images/R_vs_R_packages.png" width="70%" style="display: block; margin: auto;" />

.footnote[.font80[Source: Modern Dive https://moderndive.com/1-getting-started.html]]

---

# Package installation - using menu

--

.pull-left[
a) Click on the "Packages" tab.

b) Click on "Install" next to Update.

c) Type the name of the package under "Packages (separate multiple with space or comma):" In this case, type `tidyverse`

d) Click "Install."
.footnote[.font80[Source: Modern Dive https://moderndive.com/1-getting-started.html]]
]

--

.pull-right[
<img src="images/install_packages_easy_way.png" width="50%" style="display: block; margin: auto 0 auto auto;" />
]

---

- Menu of commands: Cmd + Shift + P
- install the package `skimr`
- create an R chunk
- load the libraries in the R chunk

```r
library(tidyverse)
library(readxl)
library(skimr)
```

---

.pull-left[
# Exploring data

1. About the data

*Corporate Tax Avoidance in the First Year of the Trump Tax Law* here https://itep.org/corporate-tax-avoidance-in-the-first-year-of-the-trump-tax-law/

>Profitable Fortune 500 companies avoided $73.9 billion in taxes under the first year of the Trump-GOP tax law. The study includes financial filings by 379 Fortune 500 companies that were profitable in 2018; it excludes companies that reported a loss. The report builds on a previous ITEP analysis released in April 2019, which reviewed corporate filings available as of that date.
]

--

.pull-right[
# Load the data into R

Copy the link [https://estannydotcom.netlify.com/static/week2/corp_tax.xlsx](https://estannydotcom.netlify.com/static/week2/corp_tax.xlsx)
]

---

.pull-left[
### First step in data exploration in R

Use the function `skim` from the package `skimr` to calculate descriptive statistics

**Character variables**

- minimum length of character variable
- maximum length of character variable
- number of unique values
- number of blank cells
]

--

.pull-right[
**Numeric variables**

- mean (average)
- standard deviation
- quartiles provide a summary of the distribution
  - p0 is the minimum value, 0% of the values are less than it
  - p25 is the 1st quartile, 25% of the values are smaller
  - p50 is the 2nd quartile, the middle value, 50% of the values are smaller
  - p75 is the 3rd quartile, 75% of the values are smaller
  - p100 is the maximum value, all values are less than or equal to it
]

---

# Use package skimr to calculate summary statistics

```r
skim(corp_tax)
```

---

# Duplicating the output of the R skim function in Excel

- `COLUMNS(array)` number of columns
- `ROWS(array)` number of rows
- `AVERAGE()` mean
- `STDEV()` standard deviation
- Count unique values among duplicates
- Quartiles
  - p0 = `QUARTILE(array, 0)`
  - p25 = `QUARTILE(array, 1)`
  - p50 = `QUARTILE(array, 2)`
  - p75 = `QUARTILE(array, 3)`
  - p100 = `QUARTILE(array, 4)`

---

# Tips on learning to code

* **Computers are not actually that smart**
* **The best way to learn to code is by doing**
  - Analyze data you are interested in
* **Practice is key**

.footnote[.font80[Source: Modern Dive https://moderndive.com/1-getting-started.html]]

---

# Errors, warnings, and messages

* **Errors**: <span style="color:red">go to the start of the error to figure out the problem</span>
* **Warnings**: <span style="color:orange">anything unexpected? if not, don't worry</span>
* **Messages**: <span style="color:green">don't worry</span>

.footnote[.font80[Source: Modern Dive https://moderndive.com/1-getting-started.html]]
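
---

# Telling errors, warnings, and messages apart

The three kinds of feedback described in this deck can also be told apart in code. The chunk below is a minimal sketch for illustration, not part of the course material: `classify_condition()` is a hypothetical helper name, and it uses only base R's `tryCatch()`.

```r
# Hypothetical helper (an assumption for illustration): evaluates an
# expression and reports which kind of condition R signalled, if any.
classify_condition <- function(expr) {
  tryCatch(
    {
      force(expr)        # evaluate the expression passed in
      "no condition"     # reached only if nothing was signalled
    },
    error   = function(e) "error",
    warning = function(w) "warning",
    message = function(m) "message"
  )
}

classify_condition(stop("something broke"))   # "error"
classify_condition(log(-1))                   # "warning" (NaNs produced)
classify_condition(message("just a note"))    # "message"
classify_condition(1 + 1)                     # "no condition"
```

In everyday work there is usually no need to catch these. Reading the red, orange, or plain text in the Console is enough, but seeing them as three distinct signals may make the output less intimidating.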