class: title-slide, middle, center

# BUS 320

# Topic 5

# Data manipulation

## Elizabeth Stanny

---
layout: true

<div class="my-footer"><span>http://estannydotcom.netlify.com</span></div>

---

# Learning objectives for the course

- Ask the right questions

--

- Extract, transform and load relevant data (ETL process)
  - Extract (`tidyverse/dplyr`)
  - Transform (`tidyverse/dplyr`)
  - Load Excel Workbooks (`tidyverse/readxl`)

--

- Apply appropriate data analytic techniques
  - Descriptive statistics
    - `tidyverse/dplyr`
    - `skimr/skim`

--

- Interpret and share the results
  - `rmarkdown` files (file names end in `.Rmd`)

---

## Set up

- Create a new post
- Delete the line with `author`
- Delete everything below the second set of `---`

---

### 1. Load the packages and data

```r
library(tidyverse)
```

--

### 2. Read the data in the file `drug_cos.csv` into R and assign it to `drug_cos`

```r
drug_cos <- read_csv("https://estannydotcom.netlify.com/static/week5/drug_cos.csv")
```

--

### 3. Use `glimpse()` to get a quick look at your data

```r
glimpse(drug_cos)
```

---

# drug_cos variables

.left-column[
- `ticker`
- `name`
- `location`
- `year`
]

--

.right-column[
Profitability ratios

- [`netmargin`](https://www.investopedia.com/terms/n/net_margin.asp): Net Income / Revenue. The percentage of revenue left after all expenses
- [`grossmargin`](https://www.investopedia.com/terms/g/grossmargin.asp): Gross Profit / Revenue, where Gross Profit = Revenue - Cost of Revenue
- [`ros`](https://www.investopedia.com/terms/r/ros.asp): Return on Sales (or operating profit margin) measures operational efficiency. Earnings Before Interest and Taxes (EBIT) / Revenue
- [`ebitdamargin`](https://www.investopedia.com/terms/e/ebitda-margin.asp): measures operating profit as a percentage of revenue. EBITDA / Revenue. EBITDA is Earnings Before Interest, Taxes, Depreciation and Amortization (cash income)

Financial performance ratio

- [`roe`](https://www.investopedia.com/terms/r/returnonequity.asp): Return on Equity measures financial performance. Net Income / Shareholders' Equity
]

---

# Questions this data could answer?

---

## Recall dplyr verbs:

`function()`  | Action
--------------|--------------------------------------------------------
`filter()`    | extracts **observations** based on their values
`select()`    | selects a subset of **variables**
`arrange()`   | orders **observations** based on their values
`mutate()`    | creates new **variables**
`group_by()`  | creates subsets of data to apply functions to
`summarize()` | creates summary statistics

---

# Filter

### Extract rows by comparisons

- `==`, `<=`, `>=`, `!=`
- `%in%` tests whether a value is in a set
- `grepl("pattern", variable)` tests whether a text variable matches a pattern

--

### Combine criteria using logical operators:

- `|` or
- `&` and
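
--

A minimal sketch of the Filter operators inside `filter()`. It uses a small hypothetical tibble, `cos`, in place of `drug_cos`, so it runs without the download. The tickers, names, years and net margins are made up for illustration and are not from the course data.

```r
library(dplyr)

# Hypothetical stand-in for drug_cos (all values are made up)
cos <- tibble(
  ticker    = c("AAA", "BBB", "CCC", "DDD"),
  name      = c("Alpha Pharma", "Beta Labs", "Gamma Pharma", "Delta Bio"),
  year      = c(2017, 2018, 2018, 2018),
  netmargin = c(0.12, 0.25, -0.03, 0.18)
)

# == : keep rows from 2018
cos_2018 <- filter(cos, year == 2018)

# != : drop one ticker
cos_not_bbb <- filter(cos, ticker != "BBB")

# %in% : keep rows whose ticker is in a set
cos_set <- filter(cos, ticker %in% c("AAA", "DDD"))

# grepl() : keep rows whose name contains "Pharma"
cos_pharma <- filter(cos, grepl("Pharma", name))

# & (and) : 2018 rows with a positive net margin
cos_and <- filter(cos, year == 2018 & netmargin > 0)

# | (or) : rows from 2017 or with a negative net margin
cos_or <- filter(cos, year == 2017 | netmargin < 0)
```

With `drug_cos` loaded, the same calls work after swapping `cos` for `drug_cos`. They use the column names listed on the variables slide.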