Session 2 of 4: Data import and manipulation
What does <- mean?
What does == mean?
Task
session2 or day2, something informative to you.
Task
Task
%>%
Suppose I have the functions import(), clean(), and visualise(), and I want to apply them to some dataset x.
Task
Create a new object called arctic_sites which has only the rows where the ‘Realm’ is ‘Arctic’
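As a rough illustration of the pipe and of filter() with ==, here is a minimal sketch on a small made-up tibble. The object toy_sites and its columns are hypothetical and are not the workshop dataset. The same operation is written once as a nested call and once with %>%.

```r
library(tidyverse)

# Hypothetical toy data, invented purely for illustration
toy_sites <- tibble(
  site   = c("A", "B", "C", "D"),
  region = c("North", "South", "North", "East"),
  depth  = c(5, 12, 8, 20)
)

# Without the pipe: the calls are nested and read inside-out
nested <- nrow(filter(toy_sites, region == "North"))

# With the pipe: each result is passed on to the next function
piped <- toy_sites %>%
  filter(region == "North") %>%
  nrow()
```

Both versions should give the same answer; the piped one just reads in the order the steps happen.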
You can use other operators beyond just the == operator that tests for equality:
- > means “greater than”
- < means “less than”
- >= means “greater than or equal to”
- <= means “less than or equal to”
- != means “not equal to”
- & means “and”
- | means “or”

>, <, >=, !=, &, |
Task
library(tidyverse) # where the functions are stored
filter() # subsetting the data (e.g., order == "Aves")
select() # Selecting a single or multiple columns
mutate() # Creating a new column
rename() # Renaming a column
summarise() # Summarising a dataset
arrange() # Ordering a column (E.g., sort by smallest arrival time)
arrange(desc()) # ... or by largest arrival time
distinct() # give me only the unique rows (no repeats)
pull() # pull out a column and make it a vector
count() # count the number of rows
select()
Task
Create a new object called slim_data which has only the columns survey_id, site_code, species_name, and total.
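A minimal sketch of select() on a made-up tibble (toy and its column names are hypothetical). It shows three common ways of choosing columns: by name, by a range of adjacent columns, and by dropping a column.

```r
library(tidyverse)

# Hypothetical toy data, invented purely for illustration
toy <- tibble(
  id    = 1:3,
  site  = c("A", "B", "C"),
  depth = c(5, 12, 8),
  notes = c("x", "y", "z")
)

# Keep columns by name
by_name <- toy %>% select(id, site, depth)

# Keep a range of adjacent columns
by_range <- toy %>% select(id:depth)

# Drop a column with a minus sign
dropped <- toy %>% select(-notes)
```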
mutate()
Task
Create a new object called cunner which is only the species Tautogolabrus adspersus and has a new column called biomass_kg which is the biomass, but in kg instead of g.
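A minimal sketch of mutate() doing a unit conversion, on a made-up tibble (toy_fish, mass_g and mass_kg are hypothetical names, not columns of the workshop dataset). The new column is calculated from an existing one, and the original column is left untouched.

```r
library(tidyverse)

# Hypothetical toy data, invented purely for illustration
toy_fish <- tibble(
  fish   = c("a", "b", "c"),
  mass_g = c(250, 1500, 40)
)

# Add a new column; 1 kg = 1000 g
toy_fish_kg <- toy_fish %>%
  mutate(mass_kg = mass_g / 1000)
```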
rename()
Task
Rename the visibility column to something a bit more informative, maybe to include the units?
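A minimal sketch of rename() on a made-up tibble (toy_dives and its columns are hypothetical). The thing to remember is the order: the new name goes on the left and the old name on the right.

```r
library(tidyverse)

# Hypothetical toy data, invented purely for illustration
toy_dives <- tibble(
  dive  = 1:3,
  depth = c(5, 12, 8)
)

# rename(new_name = old_name)
toy_dives_renamed <- toy_dives %>%
  rename(depth_m = depth)
```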
summarise()
Task
What is the mean_biomass of all the rows? What happens if you don’t include na.rm=TRUE in your code? Why?
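A minimal sketch of summarise() on a made-up tibble that contains a missing value (toy_mass and its columns are hypothetical). It collapses the whole table to a single row, here with a mean that skips NA values and a count of the rows.

```r
library(tidyverse)

# Hypothetical toy data with one missing value
toy_mass <- tibble(mass_g = c(10, 20, NA, 30))

# One row out: the mean (ignoring NA values) and the number of rows
toy_summary <- toy_mass %>%
  summarise(
    mean_mass = mean(mass_g, na.rm = TRUE),
    n_rows    = n()
  )
```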
.by argument
You may see this written with group_by() instead; for a grouped summary like this it gives the same values.
Task
Which species, on average, has approximately 1.27 individuals per survey?
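A minimal sketch of a grouped summary written both ways, on a made-up tibble (toy_counts and its columns are hypothetical). Note that the .by argument needs dplyr 1.1.0 or newer; on older versions only the group_by() form will work.

```r
library(tidyverse)

# Hypothetical toy data, invented purely for illustration
toy_counts <- tibble(
  species = c("a", "a", "b", "b", "b"),
  count   = c(2, 4, 1, 1, 4)
)

# Grouping inside summarise() with .by
with_by <- toy_counts %>%
  summarise(mean_count = mean(count), .by = species)

# The older style: group first, then summarise
with_group_by <- toy_counts %>%
  group_by(species) %>%
  summarise(mean_count = mean(count))
```

The two results hold the same means per species. The row order can differ, because group_by() sorts the groups while .by keeps them in order of first appearance.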
arrange()
Task
Order the dataframe so that mean_abun is descending (high to low).
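A minimal sketch of arrange() on a made-up tibble (toy_means and its columns are hypothetical). By default the rows are sorted from smallest to largest; wrapping the column in desc() reverses that.

```r
library(tidyverse)

# Hypothetical toy data, invented purely for illustration
toy_means <- tibble(
  species  = c("a", "b", "c"),
  mean_val = c(2.5, 0.4, 7.1)
)

# Smallest to largest (the default)
ascending <- toy_means %>% arrange(mean_val)

# Largest to smallest
descending <- toy_means %>% arrange(desc(mean_val))
```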
count()
Task
Looking only at Tautogolabrus adspersus, what is the most common site it is observed at?
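A minimal sketch of count() on a made-up tibble (toy_obs and its column are hypothetical). It returns one row per group with the number of rows in a column called n; adding sort = TRUE puts the most frequent group first.

```r
library(tidyverse)

# Hypothetical toy data, invented purely for illustration
toy_obs <- tibble(site = c("A", "B", "A", "C", "A", "B"))

# Number of rows per site, most frequent first
site_counts <- toy_obs %>%
  count(site, sort = TRUE)
```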
| Action | Base R code | Tidyverse code |
|---|---|---|
| Extract a column from a dataframe | df$col1 OR df[,"col1"] | df %>% pull(col1) |
| Filter specific rows | df[df$col1=="ABC",] | df %>% filter(col1 == "ABC") |
| Select specific columns | df[, c("col1", "col3", "col4", "col5")] | df %>% select(col1, col3:col5) |
| Create a new column | df$col9 <- sqrt(df$col8) | df %>% mutate(col9 = sqrt(col8)) |
| Rename a column | names(df)[names(df) == "old_col_name"] <- "new_col_name" | df %>% rename(new_col_name = old_col_name) |
| Summarise data (e.g. mean of a column) | mean(df$col1) | df %>% summarise(mean(col1)) |
| Sort rows by a column | df[order(df$col1), ] | df %>% arrange(col1) |
| Keep unique rows | df[!duplicated(df[, c("col1", "col2")]), ] | df %>% distinct(col1, col2, .keep_all = TRUE) |
| Count the number of rows in each group | as.data.frame(table(df$col1)) | df %>% count(col1) |
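To check a few of these equivalences for yourself, here is a minimal sketch on a made-up tibble. The object df and its values are hypothetical; the column names just follow the table. It compares the base R and tidyverse versions of extracting a column, filtering rows, and keeping unique rows.

```r
library(tidyverse)

# Hypothetical toy data, invented purely for illustration
df <- tibble(
  col1 = c("ABC", "XYZ", "ABC", "ABC"),
  col2 = c(1, 2, 1, 3)
)

# Extract a column as a vector
base_col <- df$col1
tidy_col <- df %>% pull(col1)

# Filter specific rows
base_rows <- df[df$col1 == "ABC", ]
tidy_rows <- df %>% filter(col1 == "ABC")

# Keep unique rows, comparing every column
base_unique <- df[!duplicated(df), ]
tidy_unique <- df %>% distinct()
```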
Task
Using the example dataset, for Chordates only, get the mean abundance of individuals (rename ‘total’ to ‘abundance’), within each taxonomic family.