Introduction to R and RStudio

# BUS 320

# Excel to R - I

# Introduction to R and RStudio

## Elizabeth Stanny

---

<div class="my-footer"><span>http://estannydotcom.netlify.com</span></div>

---

# What are R and RStudio?

R, a programming language, does the work

.footnote[.font80[Source: Figure 1.1 from ModernDive, https://moderndive.com/1-getting-started.html]]

RStudio, an *integrated development environment (IDE)*, is the interface that makes R easier to use


---

# Open RStudio not R

<div class="figure" style="text-align: center">
<img src="images/R_vs_RStudio_logos.png" alt="Icons of R versus RStudio on your computer." width="90%" />
<p class="caption">Icons of R versus RStudio on your computer.</p>
</div>

---

# RStudio

---

# Create a new post in Distill

In the Console, type:

```
distill::create_post("Summary statistics")
```

This creates a new file, `summary-statistics.Rmd`, that starts with the following header:

```
---
title: "Summary statistics"
description: |
  Comparison of R and Excel
date: 01-28-2021
output:
  distill::distill_article:
    self_contained: false
---

```

---

# R Markdown files

Files that end in `.Rmd` combine **R** code and text formatted using **m**ark**d**own

- code goes in R chunks, which start and end with three backticks

- text goes outside the chunks

- knit the `.Rmd` document to generate a webpage with the text and the output of the code
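Knitting can also be done from code. A minimal sketch, assuming the `knitr` package is installed; the document contents and the names `rmd_lines`, `rmd_path`, and `md_path` are made up for illustration.

```r
library(knitr)

# A tiny R Markdown document: text outside, R code inside a chunk.
fence <- strrep("`", 3)  # three backticks mark the start and end of a chunk
rmd_lines <- c(
  "Adding numbers in R:",
  "",
  paste0(fence, "{r}"),
  "1 + 1",
  fence
)

# Write the document to a temporary .Rmd file (hypothetical path).
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_path)

# Knit it: knitr runs the chunk and writes a markdown file
# that contains the text, the code, and the output of the code.
md_path <- knit(rmd_path, output = tempfile(fileext = ".md"), quiet = TRUE)
knitted <- readLines(md_path)
print(knitted)
```

The Knit button in RStudio goes one step further and turns the result into a webpage.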

---

# What are R packages?

---

# Package installation - using menu

a) Click on the "Packages" tab.

b) Click on "Install" next to Update.

c) Type the name of the package under "Packages (separate multiple with space or comma):". In this case, type `tidyverse`.

d) Click "Install."

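The same installation can be done from the Console with `install.packages()`. A minimal sketch; the helper name `missing_packages` is made up for illustration and is not part of any package.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("tidyverse"))
to_install

# Installing needs an internet connection, so it is left commented out here.
# It does the same job as clicking "Install" in the Packages tab.
# if (length(to_install) > 0) install.packages(to_install)
```

Installing is done once per computer; loading with `library()` is done in every new session.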

---

- Menu of commands: Cmd + Shift + P

- install the package `skimr`

- create an R chunk

- load the libraries in the R chunk

```r
library(tidyverse)
library(readxl)
library(skimr)
```

---

# Exploring data

1. About the data: *Corporate Tax Avoidance in the First Year of the Trump Tax Law*, available at https://itep.org/corporate-tax-avoidance-in-the-first-year-of-the-trump-tax-law/

> Profitable Fortune 500 companies avoided $73.9 billion in taxes under the first year of the Trump-GOP tax law. The study includes financial filings by 379 Fortune 500 companies that were profitable in 2018; it excludes companies that reported a loss. The report builds on a previous ITEP analysis released in April 2019, which reviewed corporate filings available as of that date.


# Load the data into R

Copy the link [https://estannydotcom.netlify.com/static/week2/corp_tax.xlsx](https://estannydotcom.netlify.com/static/week2/corp_tax.xlsx)

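One possible way to get the file into R is sketched below, assuming the `readxl` package is installed. The helper name `load_excel_from_url` is made up for illustration, and the data frame is called `corp_tax` to match the rest of the slides.

```r
library(readxl)

# Link to the Excel file from the slide above.
corp_tax_url <- "https://estannydotcom.netlify.com/static/week2/corp_tax.xlsx"

# Hypothetical helper: download an Excel file and read its first sheet.
load_excel_from_url <- function(url) {
  # read_excel() needs a file on disk, so download to a temporary file first.
  local_file <- tempfile(fileext = ".xlsx")
  # mode = "wb" keeps the binary .xlsx file intact on Windows.
  download.file(url, destfile = local_file, mode = "wb")
  read_excel(local_file)
}

# Needs an internet connection, so it is left commented out here:
# corp_tax <- load_excel_from_url(corp_tax_url)
```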

---

### First step in data exploration in R

Use the function `skim()` from the package `skimr` to calculate descriptive statistics

**Character variables**

- minimum length of character variable

- maximum length of character variable

- number of unique values

- number of blank cells

**Numeric variables**

- mean (average)

- standard deviation

- quartiles, which summarize the distribution
  - p0 is the minimum value, 0% of the values are smaller
  - p25 is the 1st quartile, 25% of the values are smaller
  - p50 is the 2nd quartile (the median), the middle value, 50% of the values are smaller
  - p75 is the 3rd quartile, 75% of the values are smaller
  - p100 is the maximum value, no values are larger

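The statistics listed above can be sketched with base R functions. The values below are made up for illustration and are not taken from the corporate tax data; `skim()` may count some things (such as blanks) slightly differently.

```r
# Made-up values, for illustration only.
profit  <- c(2, 4, 6, 8, 10)
company <- c("Apple", "Nike", "", "Nike")

# Character variable
min(nchar(company))      # minimum length
max(nchar(company))      # maximum length
length(unique(company))  # number of unique values (the empty string counts as one)
sum(company == "")       # number of blank (empty) values

# Numeric variable
mean(profit)      # mean (average)
sd(profit)        # standard deviation
quantile(profit)  # p0, p25, p50, p75, p100
```

For these five evenly spaced numbers, the quartiles are simply 2, 4, 6, 8, and 10, and the mean equals the median (6).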

---

# Use package skimr to calculate summary statistics

```r
skim(corp_tax)
```

---

# Duplicating the output of the R skim function in Excel

- `COLUMNS(array)` number of columns
- `ROWS(array)` number of rows
- `AVERAGE()` mean
- `STDEV()` standard deviation
- Count unique values among duplicates
- Quartiles
  - p0 = `QUARTILE(array, 0)` 
  - p25 = `QUARTILE(array, 1)` 
  - p50 = `QUARTILE(array, 2)`
  - p75 = `QUARTILE(array, 3)`
  - p100 = `QUARTILE(array, 4)`
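For comparison, here is a rough mapping from these Excel functions to base R, using a made-up data frame named `toy` (not the corporate tax data). R's default `quantile()` uses the same interpolation as Excel's `QUARTILE()`, so the two should agree.

```r
# Made-up data frame standing in for a spreadsheet range.
toy <- data.frame(
  company = c("Apple", "Nike", "Nike", "Intel"),
  profit  = c(10, 20, 20, 40)
)

ncol(toy)                    # like COLUMNS(array)
nrow(toy)                    # like ROWS(array), not counting the header row
mean(toy$profit)             # like AVERAGE()
sd(toy$profit)               # like STDEV()
length(unique(toy$company))  # count of unique values
quantile(toy$profit, 0.25)   # like QUARTILE(array, 1)
```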

---

# Tips on learning to code

* **Computers are not actually that smart**

* **The best way to learn to code is by doing**
  - Analyze data you are interested in

* **Practice is key**

---

# Errors, warnings, and messages

* **Errors**: <span style="color:red">go to the start of the error to figure out the problem</span>

* **Warnings**: <span style="color:orange">anything unexpected? if not, don't worry</span>

* **Messages**: <span style="color:green">don't worry</span>
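A small sketch of what each of the three looks like in practice. The inputs are made up for illustration, and `tryCatch()` is used only so the example keeps running after the error.

```r
# Message: information only, nothing is wrong.
message("Loading the data...")

# Warning: R finishes the job but flags something unexpected.
# "abc" is not a number, so it becomes NA and R warns about it.
x <- as.numeric(c("1", "2", "abc"))
x

# Error: R stops. Here tryCatch() captures the text of the error
# instead of halting, so the start of the error message can be read.
problem <- tryCatch(log("abc"), error = function(e) conditionMessage(e))
problem
```

The warning here is worth a look because the `NA` was not intended; had it been expected, it could be ignored.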