class: center, middle, inverse, title-slide

.title[
# ECON 4370 / 6370 Computing for Economics
]
.subtitle[
## Lecture 1: R Programming
]
.author[
### 
]
.date[
### 02 September 2025
]

---
class: middle
name: Overview

## Overview

1. [Checklist and Tips](#3)
2. [Use R](#4)
3. [Objects](#22)
4. [Namespace](#47)
5. [Indexing](#58)
6. [Cleaning up](#67)

---

# Checklist and Tips

☑ Install `R` and `RStudio` (or [`Positron`](https://positron.posit.co/) if you prefer a VSC experience).

☑ Install/update the following packages:

``` r
# Check, install, and load required packages
# Set CRAN mirror first
options(repos = c(CRAN = "https://cran.rstudio.com/"))
required_packages <- c("tidyverse", "dplyr", "foreach", "doParallel")
# Install only the packages that are not installed yet
new_packages <- required_packages[!(required_packages %in% installed.packages()[,"Package"])]
if (length(new_packages)) install.packages(new_packages, dependencies = TRUE)
# invisible(lapply(required_packages, library, character.only = TRUE))
update.packages(ask = FALSE, checkBuilt = TRUE)
```

- Learn coding the <em><span class="font120" style="color:#CC0035">hard way</span></em>, especially when you are just starting out
  - Slightly more painful in the beginning, but much better payoff in the long-run

---
class: inverse, center, middle
name: started

# Use R

<html><div style='float:left'></div><hr color='#354CA1' size=1px width=1096px></html>

---

# Basic arithmetic

R is a powerful <span style="color:#CC0035">*calculator*</span> and recognizes all of the standard arithmetic operators:

``` r
1+2 ## Addition
```

```
## [1] 3
```

``` r
6-7 ## Subtraction
```

```
## [1] -1
```

``` r
5/2 ## Division
```

```
## [1] 2.5
```

---

# Basic arithmetic (cont.)

``` r
2^3 ## Exponentiation
```

```
## [1] 8
```

``` r
2 + 4 * 2^3 ## Standard order of precedence (`*` before `+`, etc.)
```

```
## [1] 34
```

---

# Basic arithmetic (cont.)

We can also invoke modulo operators (integer division & remainder).

- Very useful when dealing with time, for example.

``` r
100 %/% 60 ## How many whole hours in 100 minutes?
```

```
## [1] 1
```

``` r
100 %% 60 ## How many minutes are left over?
```

```
## [1] 40
```

---

# Logical operators

R also comes equipped with a full set of logical operators (and Boolean functions), which follow standard programming protocol. For example:

``` r
1 > 2
```

```
## [1] FALSE
```

``` r
1 == 2
```

```
## [1] FALSE
```

``` r
1 > 2 | 0.5 ## The "|" stands for "or" (not a pipe a la the shell)
```

```
## [1] TRUE
```

``` r
1 > 2 & 0.5 ## The "&" stands for "and"
```

```
## [1] FALSE
```

---

# Logical operators (cont.)

``` r
isTRUE(1 < 2)
```

```
## [1] TRUE
```

``` r
is.na(1:10)
```

```
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
```

etc. You can read more about these logical operators <a href="https://stat.ethz.ch/R-manual/R-devel/library/base/html/logical.html" target="_blank">here</a> and <a href="https://stat.ethz.ch/R-manual/R-devel/library/base/html/Logic.html" target="_blank">here</a>.

---

# Logical operators: Order of precedence

Much like standard arithmetic, logic statements follow a strict order of precedence. Logical operators (`>`, `==`, etc.) are evaluated before Boolean operators (`&` and `|`). Failure to recognise this can lead to unexpected behaviour...

``` r
1 > 0.5 & 2
```

```
## [1] TRUE
```

--

What's happening here is that R is evaluating two separate "logical" statements:

- `1 > 0.5`, which is obviously TRUE.
- `2`, which is TRUE(!) because R is "helpfully" converting it to `as.logical(2)`.

--

**Solution:** Be explicit about each component of your logic statement(s).

``` r
1 > 0.5 & 1 > 2
```

```
## [1] FALSE
```

---

# Logical operators: Negation: `!`

We use `!` as a shorthand for negation. This will come in very handy when we start filtering data objects based on non-missing (i.e. non-NA) observations.
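
A minimal sketch of that filtering use case, assuming a hypothetical vector `x` with one missing value (it is not part of the original examples):

```r
## Hypothetical vector with one missing value (for illustration only).
x = c(3, NA, 7)
is.na(x)        ## Flags the missing element
x[!is.na(x)]    ## Keeps only the non-missing elements
```

Here `!` flips the logical vector returned by `is.na()`, and the flipped vector is then used inside `[]` to keep the non-missing entries.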

``` r
is.na(1:10)
```

```
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
```

``` r
!is.na(1:10)
```

```
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
```

``` r
# Negate(is.na)(1:10) ## This also works. Try it yourself.
```

---

# Logical operators: Value matching: `%in%`

To see whether an object is contained within (i.e. matches one of) a list of items, use `%in%`.

``` r
4 %in% 1:10
```

```
## [1] TRUE
```

``` r
4 %in% 5:10
```

```
## [1] FALSE
```

--

There's no equivalent "not in" command, but how might we go about creating one?

- Hint: Think about negation...

--

``` r
`%ni%` = Negate(`%in%`) ## The backticks (`) help to specify functions.
4 %ni% 5:10
```

```
## [1] TRUE
```

---

# Logical operators (cont.)

### Evaluation

We'll get to assignment shortly. However, to preempt it somewhat, we always use two equal signs for logical evaluation.

``` r
# This will cause an error - you can't assign to a literal number
1 = 1
```

```
Error in 1 = 1 : invalid (do_set) left-hand side to assignment
```

--

``` r
1 == 1 ## This does work - comparison
```

```
## [1] TRUE
```

``` r
1 != 2 ## This also works - not equal comparison
```

```
## [1] TRUE
```

.footnote[`!=` is a `!` followed by a `=`. It is rendered as a `!=` b/c of the [font](https://github.com/tonsky/FiraCode) I'm using here.]

---

# Logical operators: Evaluation caveat with Floating-point numbers

What do you think will happen if we evaluate `0.1 + 0.2 == 0.3`?

--

``` r
0.1 + 0.2 == 0.3
```

```
## [1] FALSE
```

Uh-oh! (Or, maybe you're thinking: Huh??)

--

``` r
print(0.1 + 0.2, digits = 20) ## Show 20 digits
```

```
## [1] 0.30000000000000004441
```

---

**Problem:** Computers represent numbers as binary (i.e. base 2) floating-points. More [here](https://floating-point-gui.de/basic/).

- Fast and memory efficient, but can lead to unexpected behaviour.
- Similar to the way that standard decimal (i.e. base 10) representation can't precisely capture certain fractions (e.g.
`\(\frac{1}{3} = 0.3333...\)`).

---

# Logical operators: Evaluation caveat with Floating-point numbers

**Solution:** Use `all.equal()` for evaluating floats (i.e. fractions).

- `all.equal()` was built to check "near equality" rather than strict bitwise equality. It checks whether two numbers are equal within a small tolerance, typically `sqrt(.Machine$double.eps) ≈ 1.5e-8`.

``` r
all.equal(0.1 + 0.2, 0.3)
```

```
## [1] TRUE
```

``` r
all.equal(0.1 + 0.2, 0.3, tolerance = 1e-12)
```

```
## [1] TRUE
```

---

# Assignment

In R, we can use either `=` or `<-` to handle assignment.<sup>1</sup>

.footnote[
<sup>1</sup> The `<-` is really a `<` followed by a `-`. It just looks like one thing b/c of the [font](https://github.com/tonsky/FiraCode) I'm using here.
]

--

### Assignment with `<-`

`<-` is normally read aloud as "gets". You can think of it as a (left-facing) arrow saying *assign in this direction*.

``` r
a <- 10 + 5
a
```

```
## [1] 15
```

--

Of course, an arrow can point in the other direction too (i.e. `->`). So, the following code chunk is equivalent to the previous one, although used much less frequently.

``` r
10 + 5 -> a
```

---

# Assignment (cont.)

### Assignment with `=`

You can also use `=` for assignment.

``` r
b = 10 + 10 ## Note that the assigned object *must* be on the left with "=".
b
```

```
## [1] 20
```

Most R users (purists?) seem to prefer `<-` for assignment, since `=` also has a specific role for evaluation *within* functions.

- We'll see lots of examples of this later.
- But... `=` is quicker to type and is more intuitive if you're coming from another programming language.

More discussion about `<-` vs `=` [here](https://github.com/Robinlovelace/geocompr/issues/319#issuecomment-427376764) and [here](https://www.separatinghyperplanes.com/2018/02/why-you-should-use-and-never.html).

**Bottom line:** Use whichever you prefer. Just be consistent.

---

# Help

For more information on a (named) function or object in R, consult the "help" documentation.
For example:

```R
help(plot)
```

Or, more simply, just use `?` or `??`:

```R
# This is what most people use.
?plot
??sequence
```

--

**Aside 1:** Comments in R are demarcated by `#`.

- Hit `Ctrl+Shift+c` in RStudio to (un)comment whole sections of highlighted code.

--

**Aside 2:** See the *Examples* section at the bottom of the help file?

- You can run them with the `example()` function. Try it: `example(plot)`.

---

# Help: Vignettes

For many packages, you can also try the `vignette()` function, which will provide an introduction to a package and its purpose through a series of helpful examples.

- Try running `vignette("dplyr")` in your console now.

--

I highly encourage reading package vignettes if they are available.

- They are often the best way to learn how to use a package.

--

One complication is that you need to know the exact name of the package vignette(s).

- E.g. The `dplyr` package actually has several vignettes associated with it: "dplyr", "window-functions", "programming", etc.
- You can run `vignette()` (i.e. without any arguments) to list the available vignettes of every *installed* package on your system.
- Or, run `vignette(all = FALSE)` if you only want to see the vignettes of any *loaded* packages.

---

# Help: Demos

Similar to vignettes, many packages come with built-in, interactive demos.

To list all available demos on your system:

```r
demo(package = .packages(all.available = TRUE))
```

--

To run a specific demo, just tell R which one and the name of the parent package.
For example:

```r
demo("graphics", package = "graphics")
```

---
class: inverse, center, middle
name: objects

# Objects

<html><div style='float:left'></div><hr color='#354CA1' size=1px width=1096px></html>

---

# Object-oriented programming (OOP) in R

R's approach to [object-oriented programming](https://en.wikipedia.org/wiki/Object-oriented_programming) (OOP) is often summarised as:

> **"Everything is an object and everything has a name."**

There are actually _multiple_ OOP frameworks in R.

- **S3**, **S4**, **R6**...
- Quite different philosophies from other languages like Python ... functions are more native ...

I'm being loose and sloppy here and won't systematically discuss OOP as in a computer science class.

- Hadley Wickham's "Advanced R" provides a [very thorough overview](https://adv-r.hadley.nz/oo.html) of the main issues.

Instead, I will walk you through some of the basic objects and talk about some special features as we go along.

---

# What are objects?

It's important to emphasise that there are many different *types* (or *classes*) of objects.

We'll revisit the issue of "type" vs "class" in a slide or two. For the moment, it is helpful simply to name some objects that we'll be working with regularly:

- vectors
- matrices
- data frames
- lists
- functions
- etc.

---

# Vectors

The `c()` function creates vectors. This is one of the objects we'll use to store data.

``` r
myvec <- c(1, 2, 3)
print(myvec)
```

```
## [1] 1 2 3
```

Shortcut for consecutive numbers:

``` r
myvec <- 1:3
print(myvec)
```

```
## [1] 1 2 3
```

---

# Vectors (cont.)

.pull-left[
Basic algebraic operations all work entrywise on vectors.

``` r
myvec <- c(1, 3, 7)
myvec2 <- c(5, 14, 3)
myvec3 <- c(9, 4, 8)
myvec + myvec2
```

```
## [1]  6 17 10
```

``` r
myvec / myvec2
```

```
## [1] 0.2000000 0.2142857 2.3333333
```

``` r
myvec * (myvec2^2 + sqrt(myvec3))
```

```
## [1]  28.00000 594.00000  82.79899
```
]

--

.pull-right[
So do the binary logical operations `&`, `|`, and `!=`.

``` r
# logical vectors
logi_1 = c(T,T,F)
logi_2 = c(F,T,T)
logi_12 = logi_1 & logi_2
print(logi_12)
```

```
## [1] FALSE  TRUE FALSE
```
]

---

# Vectors (cont.)

Some useful vector functions:

``` r
length(myvec)
```

```
## [1] 3
```

``` r
mean(myvec)
```

```
## [1] 3.666667
```

``` r
var(myvec)
```

```
## [1] 9.333333
```

---

# Matrices

Matrices are just collections of several vectors of the same length.

``` r
myvec <- c(1, 3, 7)
myvec2 <- c(5, 14, 3)
myvec3 <- c(9, 4, 8)
```

--

.pull-left[

``` r
# creates a matrix whose columns are the input vectors
mat_1 <- cbind(myvec, myvec2, myvec3)
print(mat_1)
```

```
##      myvec myvec2 myvec3
## [1,]     1      5      9
## [2,]     3     14      4
## [3,]     7      3      8
```
]

--

.pull-right[

``` r
# now they're rows instead
mat_2 <- rbind(myvec, myvec2, myvec3)
print(mat_2)
```

```
##        [,1] [,2] [,3]
## myvec     1    3    7
## myvec2    5   14    3
## myvec3    9    4    8
```
]

---

# Matrices (cont.)

.pull-left[

``` r
# Define a matrix by its elements (filled column by column)
mat_3 <- matrix(1:8, 2, 4)
print(mat_3)
```

```
##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    7
## [2,]    2    4    6    8
```

``` r
dim(mat_3)
```

```
## [1] 2 4
```
]

--

.pull-right[

``` r
# Define a matrix by its elements (filled row by row)
mat_4 <- matrix(1:8, 2, 4, byrow = TRUE)
print(mat_4)
```

```
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
```

``` r
dim(mat_4)
```

```
## [1] 2 4
```
]

---

# Matrices (cont.)

Matrix algebra works the same way as vector algebra - it's all done entrywise with the `*` and `+` operators. If you want to do matrix multiplication, use `%*%`.

``` r
mat_1 * mat_2
```

```
##      myvec myvec2 myvec3
## [1,]     1     15     63
## [2,]    15    196     12
## [3,]    63     12     64
```

``` r
mat_1 %*% mat_2 # Note that this differs from the elementwise product
```

```
##      [,1] [,2] [,3]
## [1,]  107  109   94
## [2,]  109  221   95
## [3,]   94   95  122
```

---

# Matrices (cont.)

Other often used matrix operations:

.pull-left[

``` r
t(mat_1) # Transpose
```

```
##        [,1] [,2] [,3]
## myvec     1    3    7
## myvec2    5   14    3
## myvec3    9    4    8
```

``` r
solve(mat_1) # Inverse
```

```
##                [,1]        [,2]         [,3]
## myvec  -0.146842878  0.01908957  0.155653451
## myvec2 -0.005873715  0.08076358 -0.033773862
## myvec3  0.130690162 -0.04698972  0.001468429
```
]

.pull-right[

``` r
eigen(mat_1) # eigenvalues and eigenvectors
```

```
## eigen() decomposition
## $values
## [1] 18.699429  8.556685 -4.256114
## 
## $vectors
##            [,1]       [,2]        [,3]
## [1,] -0.4631214  0.3400869  0.87156297
## [2,] -0.7270618 -0.6713411 -0.03609066
## [3,] -0.5068528  0.6585150 -0.48895343
```
]

For more operations, check out https://www.statmethods.net/advstats/matrix.html.

---

# Data frames

A `data.frame` is a two-dimensional table that stores data, similar to a spreadsheet in Excel. A matrix is also a two-dimensional table, but it only accommodates one type of element. Real-world data can be a collection of integers, real numbers, characters, categorical values and so on. A data frame is the best way to organize data of mixed types in R.

``` r
df_1 <- data.frame(a = 1:2, b = 3:4)
print(df_1)
```

```
##   a b
## 1 1 3
## 2 2 4
```

``` r
df_2 <- data.frame(name = c("Jack", "Rose"), score = c(100, 90))
print(df_2)
```

```
##   name score
## 1 Jack   100
## 2 Rose    90
```

---

# Lists

A vector contains only one type of element. A *list* is a basket for objects of various types. It can serve as a container when a procedure returns more than one useful object. For example, recall that when we invoke `eigen`, we are interested in both eigenvalues and eigenvectors, which are stored in `$values` and `$vectors`, respectively.
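
To make the `eigen` remark concrete, here is a minimal sketch. The matrix `A` and the name `e` are hypothetical, chosen only for illustration:

```r
## Hypothetical symmetric 2x2 matrix (for illustration only).
A = matrix(c(2, 1, 1, 2), nrow = 2)
e = eigen(A)
is.list(e)    ## The result is stored as a list
names(e)      ## Its components are named "values" and "vectors"
e$values      ## Extract the eigenvalues by name
```

A general-purpose list is built the same way, by naming its components: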

``` r
x_list <- list(a = 1:2, b = "hello world")
print(x_list)
```

```
## $a
## [1] 1 2
## 
## $b
## [1] "hello world"
```

---

# What are objects? (cont.)

Most likely, you already have a good idea of what distinguishes these objects and how to use them.

- However, bear in mind that there are subtleties that may confuse you while you're still getting used to R.
- E.g. There are different kinds of data frames. We'll soon encounter "[tibbles](https://tibble.tidyverse.org/)" and "[data.tables](https://rdatatable.gitlab.io/data.table/articles/datatable-intro.html#what-is-datatable-1a)", which are enhanced versions of the standard data frame in R.

---

# What are objects? (cont.)

Each object class has its own set of rules ("methods") for determining valid operations.

- For example, you can perform many of the same operations on matrices and data frames. But there are some operations that only work on a matrix, and vice versa.
- At the same time, you can (usually) convert an object from one type to another.

.pull-left[

``` r
## Create a small data frame called "d".
d = data.frame(x = 1:2, y = 3:4)
d
```

```
##   x y
## 1 1 3
## 2 2 4
```
]

.pull-right[

``` r
## Convert it to (i.e. create) a matrix called "m".
m = as.matrix(d)
m
```

```
##      x y
## [1,] 1 3
## [2,] 2 4
```
]

---

# Object class, type, and structure

Use the `class`, `typeof`, and `str` commands if you want to understand more about a particular object.

``` r
# d = data.frame(x = 1:2, y = 3:4) ## Create a small data frame called "d".
class(d) ## Evaluate its class.
```

```
## [1] "data.frame"
```

``` r
typeof(d) ## Evaluate its type.
```

```
## [1] "list"
```

``` r
str(d) ## Show its structure.
```

```
## 'data.frame':	2 obs. of  2 variables:
##  $ x: int  1 2
##  $ y: int  3 4
```

--

PS: Confused by the fact that `typeof(d)` returns "list"? See [here](https://stackoverflow.com/questions/45396538/typeofdata-frame-shows-list-in-r).

---

# Object class, type, and structure (cont.)

Of course, you can always just inspect/print an object directly in the console.

- E.g. Type `d` and hit Enter.

The `View()` function is also very helpful. This is the same as clicking on the object in your RStudio *Environment* pane. (Try both methods now.)

- E.g. `View(d)`.

---
name: global_env

# Global environment

Let's go back to the simple data frame that we created a few slides earlier.

``` r
d
```

```
##   x y
## 1 1 3
## 2 2 4
```

--

Now, let's try to run a regression<sup>1</sup> on these "x" and "y" variables:

.footnote[
<sup>1</sup> Yes, this is a dumb regression with perfectly collinear variables. Just go with it.
]

``` r
lm(y ~ x) ## The "lm" stands for linear model(s)
```

```
## Error in eval(predvars, data, env): object 'y' not found
```

--

Uh-oh. What went wrong here? (Answer on next slide.)

---

# Global environment (cont.)

The error message provides the answer to our question:

```
*## Error in eval(predvars, data, env): object 'y' not found
```

--

R can't find the variables that we've supplied in our [Global Environment](https://www.datamentor.io/r-programming/environment-scope/):

--

Put differently: Because the variables "x" and "y" do not live as separate objects in the global environment, we have to tell R that they belong to the object `d`.

- Think about how you might do this before clicking through to the next slide.

---

# Global environment (cont.)

There are various ways to solve this problem. One is to simply specify the data source:

``` r
lm(y ~ x, data = d) ## Works when we add "data = d"!
```

```
## 
## Call:
## lm(formula = y ~ x, data = d)
## 
## Coefficients:
## (Intercept)            x  
##           2            1  
```

---

# Global environment (cont.)

I wanted to emphasize this global environment issue, because it is something that Stata users (i.e. many economists) struggle with when they first come to R.

- In Stata, the entire workspace essentially consists of one (and only one) data frame. So there can be no ambiguity where variables are coming from.
- However, that "convenience" comes at a really high price IMO. You can never read more than two separate datasets (let alone object types) into memory at the same time, have to resort to all sorts of hacks to add summary variables to your dataset, etc.
- Speaking of which...

---

# Working with multiple objects

As I keep saying, R's ability to keep multiple objects in memory at the same time is a huge plus when it comes to effective data work.

E.g. We can copy an existing data frame, or create a new one entirely from scratch. Either will exist happily with our existing objects in the global environment.

Again, however, it does mean that you have to pay attention to the names of those distinct data frames and be specific about which objects you are referring to.

``` r
d2 = data.frame(x = rnorm(10), y = runif(10))
```

<div style="text-align: center">
<img src="images/environment2.png" alt="Now with d2 added" style="width: 60%;">
</div>

---
class: inverse, center, middle
name: ename

# "Everything has a name"

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1096px></html>

---

# Reserved words

We've seen that we can assign objects to different names. However, there are a number of special words that are "reserved" in R.

- These are fundamental commands, operators and relations in base R that you cannot (re)assign, even if you wanted to.
- We already encountered examples with the logical operators.

See [here](http://stat.ethz.ch/R-manual/R-devel/library/base/html/Reserved.html) for a full list, including (but not limited to): `if`, `else`, `while`, `function`, `for`, `TRUE`, `FALSE`, `NULL`, `Inf`, `NaN`, `NA`...

---

# Semi-reserved words

In addition to the list of strictly reserved words, there is a class of words and strings that I am going to call "semi-reserved".

- These are named functions or constants (e.g. `pi`) that you can re-assign if you really wanted to... but already come with important meanings from base R.
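
The difference between the two groups can be sketched as follows. This is only an illustration: the name `res` is hypothetical, and `T` is used because it is an ordinary variable in base R rather than a reserved word.

```r
## Strictly reserved: assigning to TRUE fails, so tryCatch() returns our marker string.
res = tryCatch(eval(parse(text = "TRUE = 0")), error = function(e) "not allowed")
res
## Semi-reserved: T is just a variable that base R sets to TRUE, so it can be overwritten.
T = 0
isTRUE(T)
rm(T)         ## Remove our copy so that T refers to base R's TRUE again
isTRUE(T)
```

The overwrite of `T` succeeds silently, which is exactly why semi-reserved names deserve care.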

Arguably the most important semi-reserved character is `c()`, which we use for concatenation; i.e. creating vectors and binding different objects together.

``` r
my_vector = c(1, 2, 5)
my_vector
```

```
## [1] 1 2 5
```

--

What happens if you type the following? (Try it in your console.)

```R
c = 4
c(1, 2, 5)
```

???

Vectors are very important in R, because the language has been optimised for them. Don't worry about this now; later you'll learn what I mean by "vectorising" a function.

---

# Semi-reserved words (cont.)

In this case, thankfully nothing. R is "smart" enough to distinguish between the variable `c = 4` that we created and the built-in function `c()` that calls for concatenation.

--

However, this is still *extremely* sloppy coding. R won't always be able to distinguish between conflicting definitions. And neither will you. For example:

``` r
pi
```

```
## [1] 3.141593
```

``` r
pi = 2
pi
```

```
## [1] 2
```

--

**Bottom line:** Don't use (semi-)reserved characters!

---

# Namespace conflicts

A similar issue crops up when we load two packages, which have functions that share the same name. E.g. Look what happens when we load the `dplyr` package.

``` r
library(dplyr)
```

--

The messages that you see about some object being *masked from 'package:X'* are warning you about a namespace conflict.

- E.g. Both `dplyr` and the `stats` package (which gets loaded automatically when you start R) have functions named "filter" and "lag".

---

# Namespace conflicts (cont.)

The potential for namespace conflicts is a result of the OOP approach.<sup>1</sup>

- Also reflects the fundamental open-source nature of R and the use of external packages. People are free to call their functions whatever they want, so some overlap is only to be expected.

.footnote[
<sup>1</sup> Similar problems arise in virtually every other programming language (Python, C, etc.)
]

--

Whenever a namespace conflict arises, the most recently loaded package will gain preference.
So the `filter()` function now refers specifically to the `dplyr` variant.

But what if we want the `stats` variant? Well, we have two options:

1. Temporarily use `stats::filter()`
2. Permanently assign `filter = stats::filter`

---

# Solving namespace conflicts

### 1. Use `package::function()`

We can explicitly call a conflicted function from a particular package using the `package::function()` syntax. For example:

``` r
stats::filter(1:10, rep(1, 2))
```

```
## Time Series:
## Start = 1 
## End = 10 
## Frequency = 1 
##  [1]  3  5  7  9 11 13 15 17 19 NA
```

---

# Solving namespace conflicts (cont.)

We can also use `::` for more than just conflicted cases.

E.g. Being explicit about where a function (or dataset) comes from can help add clarity to our code. Try these lines of code in your R console.

``` r
print(dplyr::starwars, n = 4) ## Print the starwars data frame from the dplyr package
```

```
## # A tibble: 87 × 14
##   name      height  mass hair_color skin_color eye_color birth_year sex   gender
##   <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
## 1 Luke Sky…    172    77 blond      fair       blue            19   male  mascu…
## 2 C-3PO        167    75 <NA>       gold       yellow         112   none  mascu…
## 3 R2-D2         96    32 <NA>       white, bl… red             33   none  mascu…
## 4 Darth Va…    202   136 none       white      yellow          41.9 male  mascu…
## # ℹ 83 more rows
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
```

``` r
scales::comma(c(1000, 1000000)) ## Use the comma function, which comes from the scales package
```

```
## [1] "1,000"     "1,000,000"
```

???

The `::` syntax also means that we can call functions without loading a package first. E.g. As long as `dplyr` is installed on our system, then `dplyr::filter(iris, Species=="virginica")` will work.

---

# Solving namespace conflicts (cont.)

### 2. Assign `function = package::function`

A more permanent solution is to assign a conflicted function name to a particular package.
This will hold for the remainder of your current R session, or until you change it back. E.g.

``` r
filter = stats::filter ## Note the lack of parentheses.
filter = dplyr::filter ## Change it back again.
```

---

# Solving namespace conflicts (cont.)

### General advice

I would generally advocate for the temporary `package::function()` solution.

Another good rule of thumb is that you want to load your most important packages last. (E.g. Load the tidyverse after you've already loaded any other packages.)

Other than that, simply pay attention to any warnings when loading a new package and `?` is your friend if you're ever unsure. (E.g. `?filter` will tell you which variant is being used.)

- In truth, problematic namespace conflicts are rare. But it's good to be aware of them.

---

# User-side namespace conflicts

A final thing to say about namespace conflicts is that they don't only arise from loading packages. They can arise when users create their own functions with a conflicting name.

- E.g. If I was naive enough to create a new function called `c()`.

--

</br>

In a similar vein, one of the most common and confusing errors that even experienced R programmers run into is related to the habit of calling objects "df" or "data"... both of which are functions in base R!

- See for yourself by typing `?df` or `?data`.

Again, R will figure out what you mean if you are clear/lucky enough. But, much the same as with `c()`, it's relatively easy to run into problems.

- Case in point: Triggering the infamous "object of type closure is not subsettable" error message. (See from 1:45 [here](https://rstudio.com/resources/rstudioconf-2020/object-of-type-closure-is-not-subsettable/).)

---
class: inverse, center, middle
name: indexing

# Indexing

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1096px></html>

---

# Option 1: []

We've already seen an example of indexing in the form of R console output.
For example:

``` r
1+2
```

```
## [1] 3
```

The `[1]` above denotes the first (and, in this case, only) element of our output.<sup>1</sup> In this case, a vector of length one equal to the value "3".

--

Try the following in your console to see a more explicit example of indexed output:

``` r
rnorm(n = 100, mean = 0, sd = 1)
# rnorm(100) ## Would work just as well. (Why? Hint: see ?rnorm)
```

.footnote[
<sup>1</sup> Indexing in R begins at 1. Not 0 like some languages (e.g. Python and JavaScript).
]

---

# Option 1: [] (cont.)

More importantly, we can also use `[]` to index objects that we create in R.

``` r
a = 1:10
a[4] ## Get the 4th element of object "a"
```

```
## [1] 4
```

``` r
a[c(4, 6)] ## Get the 4th and 6th elements
```

```
## [1] 4 6
```

---

# Option 1: [] (cont.)

It also works on larger arrays (vectors, matrices, data frames, and lists). For example:

``` r
starwars[1, 1] ## Show the cell corresponding to the 1st row & 1st column of the data frame.
```

```
## # A tibble: 1 × 1
##   name          
##   <chr>         
## 1 Luke Skywalker
```

--

``` r
starwars[1:3, 1:2]
```

```
## # A tibble: 3 × 2
##   name           height
##   <chr>           <int>
## 1 Luke Skywalker    172
## 2 C-3PO             167
## 3 R2-D2              96
```

---

# Option 1: [] (cont.)

We haven't covered them properly yet (patience), but **lists** are a more complex type of array object in R.

- They can contain a random assortment of objects that don't share the same class, or have the same shape (e.g. rank) or common structure.
- E.g. A list can contain a scalar, a string, and a data frame. Or you can have a list of data frames, or even lists of lists.

--

The relevance to indexing is that lists require two square brackets `[[]]` to index the parent list item and then the standard `[]` within that parent item.
An example might help to illustrate:

``` r
my_list = list(a = "hello", b = c(1,2,3), c = data.frame(x = 1:5, y = 6:10))
my_list[[1]] ## Return the 1st list object
```

```
## [1] "hello"
```

``` r
my_list[[2]][3] ## Return the 3rd element of the 2nd list object
```

```
## [1] 3
```

---

# Option 2: $

Lists provide a nice segue to our other indexing operator: `$`.

``` r
my_list
```

```
## $a
## [1] "hello"
## 
## $b
## [1] 1 2 3
## 
## $c
##   x  y
## 1 1  6
## 2 2  7
## 3 3  8
## 4 4  9
## 5 5 10
```

---
count: false

# Option 2: $

```
*## $a
## [1] "hello"
## 
*## $b
## [1] 1 2 3
## 
*## $c
##   x  y
## 1 1  6
## 2 2  7
## 3 3  8
## 4 4  9
## 5 5 10
```

Notice how our (named) parent list objects are demarcated: "$a", "$b" and "$c".

---

# Option 2: $ (cont.)

We can call these objects directly by name using the dollar sign, e.g.

``` r
my_list$a ## Return list object "a"
```

```
## [1] "hello"
```

``` r
my_list$b[3] ## Return the 3rd element of list object "b"
```

```
## [1] 3
```

``` r
my_list$c$x ## Return column "x" of list object "c"
```

```
## [1] 1 2 3 4 5
```

--

</br>

**Aside:** Typing `View(my_list)` (or, equivalently, clicking on the object in RStudio's environment pane) provides a nice interactive window for exploring the nested structure of lists.

---

# Option 2: $ (cont.)

The `$` form of indexing also works (and in the manner that you probably expect) for other object types in R.

In some cases, you can also combine the two index options.

- E.g. Get the 1st element of the "name" column from the starwars data frame.

``` r
starwars$name[1]
```

```
## [1] "Luke Skywalker"
```

--

However, note some key differences between the output from this example and that of our previous `starwars[1, 1]` example. What are they?

- Hint: Apart from the visual cues, try wrapping each command in `str()`.

---

# Option 2: $ (cont.)

The last thing that I want to say about `$` is that it provides another way to avoid the "object not found" problem that we ran into with our earlier regression example.

``` r
lm(y ~ x) ## Doesn't work
```

```
## Error in eval(predvars, data, env): object 'y' not found
```

``` r
lm(d$y ~ d$x) ## Works!
```

```
## 
## Call:
## lm(formula = d$y ~ d$x)
## 
## Coefficients:
## (Intercept)          d$x  
##           2            1  
```

---
class: inverse, center, middle
name: cleaning

# Cleaning up

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1096px></html>

---

# Removing objects (and packages)

Use `rm()` to remove an object or objects from your working environment.

``` r
a = "hello"
b = "world"
rm(a, b)
```

You can also use `rm(list = ls())` to remove all objects in your working environment (except packages), but this is [frowned upon](https://www.tidyverse.org/articles/2017/12/workflow-vs-script/).

- Better just to start a new R session.

--

Detaching packages is more complicated, because there are so many cross-dependencies (i.e. one package depends on, and might even automatically load, another). However, you can try, e.g. `detach(package:dplyr)`

- Again, better just to restart your R session.

---

# Removing plots

You can use `dev.off()` to remove any (i.e. all) plots that have been generated during your session. For example, try this in your R console:

``` r
plot(1:10)
dev.off()
```

--

You may also have noticed that RStudio has convenient buttons for clearing your workspace environment and removing (individual) plots. Just look for these icons in the relevant window panels:

---
class: inverse, center, middle
name: started

# References

<html><div style='float:left'></div><hr color='#354CA1' size=1px width=1096px></html>

---

# Additional resources

- Official Manual
  - [R-Introduction](https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf)
- Books
  - Grolemund, G. [Hands-On Programming with R](https://rstudio-education.github.io/hopr/)
  - Wickham, H. and Grolemund, G. [R for data science](https://r4ds.had.co.nz/)
  - Wickham, H. [Advanced R](https://adv-r.hadley.nz/)
  - Hanck, C., Arnold, M., Gerber, A., and Schmelzer, M.
    [Econometrics with R](https://www.econometrics-with-r.org/)
  - McDermott, G. [Data Science for Economists and Other Animals](https://grantmcdermott.com/ds4e/index.html)
- Online Course Notes
  - [EC 607](https://github.com/uo-ec510-2020-spring/lectures) taught by [Grant McDermott](https://grantmcdermott.com/) at the University of Oregon
    - <span style="color:#CC0035"><strong>This deck of slides is mostly derived from the lecture notes of EC 607.</strong></span>
  - Tibshirani, R. [Statistics 36-350](https://www.stat.cmu.edu/~ryantibs/statcomp/) taught by [Prof. Ryan Tibshirani](https://www.stat.cmu.edu/~ryantibs/index.html) at Carnegie Mellon University.
- Stack Overflow/Google/GitHub/AI-oriented programming
- SMU Office of Information Technology Help Desk

---
class: inverse, center, middle
name: started

# Next lecture: More about R programming

<html><div style='float:left'></div><hr color='#354CA1' size=1px width=1096px></html>