Session 1 of 4: The basics
Problem: We are given an Excel file with 1M+ rows containing species names and the latitude and longitude of each occurrence, and we must find errors in the species' geographical distributions. The analysis is complicated and involves multiple steps that others need to be able to replicate in the future.
Solution: R
| | Mechanics | Science |
|---|---|---|
| Problem | Bolt won’t loosen by hand | I have a hypothesis and data; I want results for a paper |
| Solution | Use a spanner/wrench/socket/hammer | Use functions in R to import, clean, visualise and model data |
| Outcome | Loosen bolt | Figures and statistics |

Task: Start a new R project

- Open RStudio
- New Project
- New Directory > New Project
- Browse to the location where you keep all your R projects
- Give the analysis a good title (e.g. “phd_chapter_1”)
- Avoid spaces and capitals
- Objects are created with `<-`. This is called the “assignment operator”

Task: Create an object called x and make it equal to 5, and then modify x.
What does z equal?
What does x equal?
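
The idea behind the task can be sketched as follows. This is only an illustration, not the original exercise code: the values and the object `z` below are made up to show that an object keeps the value it was given at the time of assignment.

```r
# Create an object called x and make it equal to 5
x <- 5

# Modify x by overwriting it with a new value based on the old one
x <- x + 2

# Create z from the current value of x
z <- x * 10

# Overwrite x again; z is NOT recalculated automatically
x <- 1

x
z
```

Running `x` or `z` on its own line prints the current value, which is a quick way to check what each object holds.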
- Use the `class()` function to see what the class of an object is

| Name | Examples | Abbreviation (as shown in tibbles) |
|---|---|---|
| Numeric | 6.7, 8.9, 1.0 | dbl |
| Character string | “cat”, “dog” | chr |
| Boolean/logical | TRUE, FALSE | lgl |
| Integer | 2L, 5L, 149L | int |
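
A short sketch of checking classes with `class()`, using values like those in the table. One detail worth knowing: a plain number such as `2` is stored as a numeric (double) in R, and the `L` suffix is needed to ask for an integer.

```r
class(6.7)    # "numeric"  (a double; shown as dbl in tibbles)
class("cat")  # "character"
class(TRUE)   # "logical"
class(2L)     # "integer"  (the L suffix requests an integer)
class(2)      # "numeric"  (a plain 2 is a double, not an integer)
```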
- Use the `==` syntax to see if two things are equal (not `=`)
- `=` behaves similarly to the assignment operator `<-`, but avoid using it

Task: Using `+`, `()` and `/`, calculate the mean of 5, 10, and 3.

- Use the `mean()` function to calculate the mean: `mean(x = c(5, 10, 3))`
- `mean` is the function, `x` is the argument of the function
- We pass `c(5, 10, 3)` to the `x` argument of the function
- `mean()` is a very simple function, but other functions can be extremely complex
- Use the `?functionname` notation to see information about the function
- The `readr` package provides `read_csv()`, which allows us to read in Excel data saved in comma-separated-value format (.csv)
- Base R has `read.csv()`; here we use the `read_csv()` function from the `readr` package

`library(readr)`

`read_csv("cape_howe.csv", n_max = 6)`
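
The comparison and mean steps can be sketched in a few lines of base R. The object names (`manual_mean`, `values`, `function_mean`) are made up for this illustration.

```r
# == tests for equality and returns TRUE or FALSE
5 == 5          # TRUE
"cat" == "dog"  # FALSE

# Mean by hand: add the numbers, then divide by how many there are
manual_mean <- (5 + 10 + 3) / 3
manual_mean     # 6

# Same calculation with mean(): c() combines the numbers into one vector,
# which is passed to the x argument
values <- c(5, 10, 3)
function_mean <- mean(x = values)
function_mean   # 6

# To read the help page for a function, type this in the console:
# ?mean
```

The brackets in the manual version matter: without them, only the 3 would be divided by 3.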
- Compare `read.csv()` and `read_csv()`
- Use the `head()` function to see the first rows of the data:

# A tibble: 6 × 12
survey_id species_name size_class n_500m2 survey_date site_code depth program
<dbl> <chr> <dbl> <dbl> <date> <chr> <dbl> <chr>
1 2002715 Ophthalmolep… 25 3 2011-04-18 JBMP-S2 10.2 RLS
2 2002715 Pseudolabrus… 7.5 1 2011-04-18 JBMP-S2 10.2 RLS
3 2002715 Pempheris af… 5 2 2011-04-18 JBMP-S2 10.2 RLS
4 2002715 Pempheris af… 7.5 10 2011-04-18 JBMP-S2 10.2 RLS
5 2002715 Pempheris af… 7.5 10 2011-04-18 JBMP-S2 10.2 RLS
6 2002715 Trachinops t… 2.5 65 2011-04-18 JBMP-S2 10.2 RLS
# ℹ 4 more variables: latitude <dbl>, longitude <dbl>, ecoregion <chr>,
# method <dbl>
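
To try the reading step without the workshop file, here is a minimal sketch that writes a tiny made-up CSV to a temporary file and reads it back. The species names and values are invented stand-ins for `cape_howe.csv`, and the `readr` part only runs if that package is installed.

```r
# Write a small, made-up CSV to a temporary file (stand-in for cape_howe.csv)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "species_name,size_class,depth",
  "Species A,25,10.2",
  "Species B,7.5,10.2",
  "Species C,5,10.2",
  "Species D,2.5,10.2"
), csv_path)

# Base R reader: returns a data.frame
fish_base <- read.csv(csv_path)
class(fish_base)  # "data.frame"

# readr reader: returns a tibble (skipped if readr is not installed)
if (requireNamespace("readr", quietly = TRUE)) {
  fish_tbl <- readr::read_csv(csv_path, show_col_types = FALSE)
  print(class(fish_tbl))
}

# head() shows the first rows; the n argument controls how many
head(fish_base, n = 2)
```

The same `head()` call works on the tibble returned by `read_csv()`.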
- Use the `tail()` function to see the last rows of the data
- Task: Use the `n =` argument of `tail()` - what does this do?
- Use the `glimpse()` function from the `dplyr` package

Rows: 269,123
Columns: 12
$ survey_id <dbl> 2002715, 2002715, 2002715, 2002715, 2002715, 2002715, 200…
$ species_name <chr> "Ophthalmolepis lineolatus", "Pseudolabrus luculentus", "…
$ size_class <dbl> 25.0, 7.5, 5.0, 7.5, 7.5, 2.5, 2.5, 5.0, 5.0, 10.0, 10.0,…
$ n_500m2 <dbl> 3, 1, 2, 10, 10, 65, 65, 110, 110, 71, 71, 2, 5, 5, 6, 6,…
$ survey_date <date> 2011-04-18, 2011-04-18, 2011-04-18, 2011-04-18, 2011-04-…
$ site_code <chr> "JBMP-S2", "JBMP-S2", "JBMP-S2", "JBMP-S2", "JBMP-S2", "J…
$ depth <dbl> 10.2, 10.2, 10.2, 10.2, 10.2, 10.2, 10.2, 10.2, 10.2, 10.…
$ program <chr> "RLS", "RLS", "RLS", "RLS", "RLS", "RLS", "RLS", "RLS", "…
$ latitude <dbl> -35.08, -35.08, -35.08, -35.08, -35.08, -35.08, -35.08, -…
$ longitude <dbl> 150.8, 150.8, 150.8, 150.8, 150.8, 150.8, 150.8, 150.8, 1…
$ ecoregion <chr> "Cape Howe", "Cape Howe", "Cape Howe", "Cape Howe", "Cape…
$ method <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
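
A small sketch of `tail()` and a structure overview, using a made-up ten-row data frame called `surveys` rather than the real survey data. `glimpse()` is used if `dplyr` is installed; otherwise the base R function `str()` gives a similar column-by-column summary.

```r
# A made-up data frame with 10 rows, standing in for the survey data
surveys <- data.frame(
  survey_id = 1:10,
  depth = seq(5, 14)
)

# tail() with no n argument shows the last 6 rows
tail(surveys)

# With n = 3, only the last 3 rows are shown
last_three <- tail(surveys, n = 3)
last_three

# One line per column, with its type and first few values
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr::glimpse(surveys)
} else {
  str(surveys)
}
```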
- Use the `View()` function in R
- `readr` and `dplyr` are part of the `tidyverse` (a special collection of packages)