Tools for handling categorical variables (factors)

R
packages
Author

Reddy Lee

Published

Saturday, October 28, 2023

Modified

Sunday, October 29, 2023


1 Setup

Code
library(tidyverse)

2 Dropping unused levels

Code
# gss_cat is a sample dataset that ships with forcats, which the tidyverse loads
# view(gss_cat)
levels(gss_cat$race)
[1] "Other"          "Black"          "White"          "Not applicable"
Code
table(gss_cat$race) 

         Other          Black          White Not applicable 
          1959           3129          16395              0 
Code
gss_cat %>%
  mutate(race = fct_drop(race)) %>%
  select(race) %>%
  table()
race
Other Black White 
 1959  3129 16395 
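
To see what fct_drop() does without the full dataset, here is a minimal sketch on a toy factor. The values and level names are made up for illustration. It also shows the `only` argument, which limits which unused levels may be removed.

```r
library(forcats)

# Toy factor (made-up values) with two declared but unused levels: "d" and "e"
f <- factor(c("a", "b", "a", "c"), levels = c("a", "b", "c", "d", "e"))

levels(f)                        # "a" "b" "c" "d" "e"
levels(fct_drop(f))              # "a" "b" "c"

# `only` restricts dropping to the named levels, so the unused "e" is kept
levels(fct_drop(f, only = "d"))  # "a" "b" "c" "e"
```

Base R's droplevels() gives the same result as the plain fct_drop(f) call; the forcats version mainly adds the `only` argument.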

3 Changing the order of levels

Code
gss_cat %>%
  mutate(race = fct_drop(race),
         race = fct_relevel(race, c("White", "Black", "Other"))) %>%
  select(race) %>%
  table()
race
White Black Other 
16395  3129  1959 
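
The call above spells out the complete order. When only one level needs to move, fct_relevel() can also take just that level, optionally with `after`. A small sketch on a toy factor that reuses the three level names from the output above:

```r
library(forcats)

# Toy factor with the same three levels, in their original order
f <- factor(c("Other", "Black", "White"),
            levels = c("Other", "Black", "White"))

# Naming one level moves it to the front; the others keep their relative order
levels(fct_relevel(f, "White"))               # "White" "Other" "Black"

# after = Inf moves the named level to the end instead
levels(fct_relevel(f, "Other", after = Inf))  # "Black" "White" "Other"
```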

4 Ordering bars by frequency

Code
gss_cat %>%
  mutate(marital = fct_infreq(marital)) %>%
  ggplot(aes(marital)) +
  geom_bar(fill = "purple")

Code
gss_cat %>%
  mutate(marital = fct_rev(fct_infreq(marital))) %>%
  ggplot(aes(marital)) +
  geom_bar(fill = "purple")
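
geom_bar() draws the bars in the order of the factor levels, so both plots come down to how the levels are arranged. A quick sketch on a toy factor (made-up values) shows what the two calls do to the levels:

```r
library(forcats)

# Made-up values: "c" occurs 3 times, "b" twice, "a" once
f <- factor(c("b", "a", "c", "c", "b", "c"))

levels(f)                       # default alphabetical order: "a" "b" "c"
levels(fct_infreq(f))           # most frequent first:        "c" "b" "a"
levels(fct_rev(fct_infreq(f)))  # least frequent first:       "a" "b" "c"
```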

5 Computing group means and sorting by them

Code
gss_cat %>%
  summarise(meantv = mean(tvhours, na.rm = TRUE), .by = relig) %>%
  mutate(relig = fct_reorder(relig, meantv)) %>%
  ggplot(aes(meantv, relig)) +
  geom_point(size = 4, color = "steelblue")
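
fct_reorder() sorts the levels of a factor by the values of a second variable. The sketch below uses a hypothetical three-row summary (the names `grp` and `avg` and the numbers are invented) to show the ascending default and the `.desc` switch:

```r
library(forcats)

# Hypothetical summary: one made-up mean per group
grp <- factor(c("x", "y", "z"))
avg <- c(3.1, 1.4, 2.2)

# Levels sorted by the matching value of avg, smallest first
levels(fct_reorder(grp, avg))                # "y" "z" "x"

# .desc = TRUE puts the largest value first
levels(fct_reorder(grp, avg, .desc = TRUE))  # "x" "z" "y"
```

With ggplot2, the first level is drawn at the bottom of a discrete y axis, so the ascending default places the group with the largest mean at the top of the plot.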

6 Adjusting the category order

Code
gss_cat %>% 
  count(partyid)
# A tibble: 10 × 2
   partyid                n
   <fct>              <int>
 1 No answer            154
 2 Don't know             1
 3 Other party          393
 4 Strong republican   2314
 5 Not str republican  3032
 6 Ind,near rep        1791
 7 Independent         4119
 8 Ind,near dem        2499
 9 Not str democrat    3690
10 Strong democrat     3490
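
The output above only lists the counts in the current level order. One possible next step, which is an assumption on my part rather than something shown in the original post, is to move the three non-substantive answers to the end so that the republican-to-democrat scale comes first. The name `partyid2` is made up for this sketch.

```r
library(forcats)  # gss_cat ships with forcats

# Move the three non-substantive answers to the end;
# the remaining levels keep their existing relative order
partyid2 <- fct_relevel(
  gss_cat$partyid,
  "No answer", "Don't know", "Other party",
  after = Inf
)

levels(partyid2)
```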

7 Video tutorial

Using R programming to manage categorical variables or factors using the forcats package
