tidyverse
An essential tool for researchers
R
packages
First, let's get to know tidyverse, the core collection of R packages that every researcher should know [@tidyverse-2].
1 Setup
2 Basic operations
2.1 Selecting existing variables (columns): select
2.2 Filtering observations (rows): filter
2.3 Creating new variables: mutate
# A tibble: 6 × 14
name height mass hair_color skin_color eye_color birth_year sex gender
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 Luke Sky… 172 77 blond fair blue 19 male mascu…
2 C-3PO 167 75 <NA> gold yellow 112 none mascu…
3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…
4 Darth Va… 202 136 none white yellow 41.9 male mascu…
5 Leia Org… 150 49 brown light brown 19 fema… femin…
6 Owen Lars 178 120 brown, gr… light blue 52 male mascu…
# ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
# vehicles <list>, starships <list>
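The three verbs listed above can each be tried on their own before they are chained together. The following is a minimal sketch, assuming the `starwars` data that ships with dplyr (loaded here directly instead of through the full tidyverse); the object names `vars`, `humans`, and `with_height_m` are illustrative only.

```r
library(dplyr)  # part of the tidyverse; also provides the starwars data

# select(): keep only the named columns (variables)
vars <- starwars %>% select(name, height, mass)

# filter(): keep only the rows (observations) that satisfy a condition
humans <- starwars %>% filter(species == "Human")

# mutate(): add a new column computed from existing ones
with_height_m <- vars %>% mutate(height_m = height / 100)

head(with_height_m)
```

Note that `select()` leaves the number of rows unchanged, while `filter()` leaves the number of columns unchanged.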
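starwars %>%
  select(gender, mass, height, species) %>%  # variable names are suggested as you type, which speeds up input
  filter(species == "Human") %>%
  na.omit() %>%                              # drop rows that contain NA
  mutate(height = height / 100,              # reusing an existing name overwrites that variable (cm to m)
         BMI = mass / height^2) %>%          # a new name creates a new variable; height is already in metres here
  summarise(Average_BMI = mean(BMI), .by = gender)  # in dplyr 1.1.0 and later, .by can replace a separate group_by()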
# A tibble: 2 × 2
gender Average_BMI
<chr> <dbl>
1 masculine 25.7
2 feminine 20.8
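To make the `.by` remark concrete, the sketch below computes the same summary twice, once with `.by` and once with the classic `group_by()` followed by `summarise()`. It assumes dplyr 1.1.0 or later; the names `humans`, `by_inline`, and `by_group` are illustrative.

```r
library(dplyr)

# Same preparation steps as the pipeline above
humans <- starwars %>%
  select(gender, mass, height, species) %>%
  filter(species == "Human") %>%
  na.omit() %>%
  mutate(BMI = mass / (height / 100)^2)

# Per-operation grouping with .by (dplyr >= 1.1.0)
by_inline <- humans %>%
  summarise(Average_BMI = mean(BMI), .by = gender)

# Classic form: group_by() followed by summarise()
by_group <- humans %>%
  group_by(gender) %>%
  summarise(Average_BMI = mean(BMI))
```

Both forms should give the same averages. One visible difference is row order: `.by` keeps groups in order of first appearance, whereas `group_by()` sorts the group keys.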
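2.4 Tutorial video
3 Classification with case_when()
case_when() sorts observations into categories: it checks a list of conditions in order and assigns each observation the value of the first condition it meets.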
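As a minimal sketch of this idea, the example below grades a handful of made-up scores. The data frame `scores` and the cut-offs (90, 80, 70, 60) are assumptions for illustration and are not taken from the sample data used later in this section.

```r
library(dplyr)

# Hypothetical scores; both the data and the cut-offs are assumed
scores <- data.frame(
  name  = paste("student", 1:5),
  score = c(95, 84, 72, 66, 49)
)

graded <- scores %>%
  mutate(grade = case_when(
    score >= 90 ~ "A",  # conditions are checked from top to bottom
    score >= 80 ~ "B",  # reached only if the condition above was FALSE
    score >= 70 ~ "C",
    score >= 60 ~ "D",
    TRUE        ~ "F"   # catch-all for everything that is left
  ))

graded
```

Because the first matching condition wins, the conditions go from strictest to loosest; reversing them would label every score of 60 or more as "D".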
3.1 Importing the sample data
3.2 Demonstration
# A tibble: 34 × 3
name score grade
<chr> <dbl> <chr>
1 student 1 80 B
2 student 2 66 D
3 student 3 72 c
4 student 4 75 c
5 student 5 74 c
6 student 6 71 c
7 student 7 77 c
8 student 8 49 F
9 student 9 66 D
10 student 10 84 B
# ℹ 24 more rows