Long and wide data reshaping with the reshape2 package (2021-02-28)

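We use airquality, one of R's built-in test datasets, to illustrate what long data and wide data are. (The column names appear in lower case here because they are converted with tolower() in section 3.1.)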

head(airquality)

  ozone solar.r wind temp month day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

str(airquality)

'data.frame': 153 obs. of 6 variables:

$ ozone : int 41 36 12 18 NA 28 23 19 8 NA ...

$ solar.r: int 190 118 149 313 NA NA 299 99 19 194 ...

$ wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...

$ temp : int 67 72 74 62 56 66 65 59 61 69 ...

$ month : int 5 5 5 5 5 5 5 5 5 5 ...

$ day : int 1 2 3 4 5 6 7 8 9 10 ...

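Long data:

"ozone", "solar.r", "wind", "temp", "month" and "day" are the variable names of airquality. In long format these names are stored in the variable column, and the value column holds the measurement recorded for each observation. Data in this shape is well suited to visualization.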

head(melt(airquality), n = 10)

No id variables; using all as measure variables

   variable value
1     ozone    41
2     ozone    36
3     ozone    12
4     ozone    18
5     ozone    NA
6     ozone    28
7     ozone    23
8     ozone    19
9     ozone     8
10    ozone    NA
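
To make the point about long data concrete, here is a minimal sketch. It assumes reshape2 is installed; the object names aq, aq_long and var_means are illustrative and not from the original post. It keeps month and day as identifiers, then uses one grouped call to summarise every measured variable, which is the kind of operation the long layout makes easy.

```r
library(reshape2)

# Work on a copy with lower-case names, matching the output shown in this post.
aq <- airquality
names(aq) <- tolower(names(aq))

# Keep month and day as identifiers; melt the four measurements.
aq_long <- melt(aq, id.vars = c("month", "day"))

# Wide: 153 rows x 6 columns. Long: 153 * 4 = 612 rows x 4 columns.
dim(aq)
dim(aq_long)

# One grouped call summarises every measured variable at once.
var_means <- tapply(aq_long$value, aq_long$variable, mean, na.rm = TRUE)
var_means
```

Because the variable name is now ordinary data, the same pattern applies to grouping, faceting or colouring by variable in a plot.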

Wide data:

Wide data is usually a data frame in which each column is a variable and each row is an observation.

head(airquality, n = 10)

   ozone solar.r wind temp month day
1     41     190  7.4   67     5   1
2     36     118  8.0   72     5   2
3     12     149 12.6   74     5   3
4     18     313 11.5   62     5   4
5     NA      NA 14.3   56     5   5
6     28      NA 14.9   66     5   6
7     23     299  8.6   65     5   7
8     19      99 13.8   59     5   8
9      8      19 20.1   61     5   9
10    NA     194  8.6   69     5  10

# 1. Working directory

setwd("reshape2")

# 2. Install and load

# install.packages("reshape2")

library(reshape2)

# 3. Function tests

help(package="reshape2")

### 3.1 acast() and dcast(): cast a molten data frame into an array or data frame

str(acast)

# function (data, formula, fun.aggregate = NULL, ..., margins = NULL, subset = NULL,

# fill = NULL, drop = TRUE, value.var = guess_value(data))

# Cast functions: cast a molten data frame into an array (acast) or a data frame (dcast).

names(airquality) <- tolower(names(airquality))

head(airquality)

#   ozone solar.r wind temp month day
# 1    41     190  7.4   67     5   1
# 2    36     118  8.0   72     5   2
# 3    12     149 12.6   74     5   3
# 4    18     313 11.5   62     5   4
# 5    NA      NA 14.3   56     5   5
# 6    28      NA 14.9   66     5   6

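# aqm is the molten (long) data frame; it must exist before casting (see 3.2).
aqm <- melt(airquality, id.vars = c("month", "day"), na.rm = TRUE)

head(acast(aqm, day ~ month ~ variable))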

# , , ozone
#
#    5  6   7  8  9
# 1 41 NA 135 39 96
# 2 36 NA  49  9 78
# 3 12 NA  32 16 73
# 4 18 NA  NA 78 91
# 5 NA NA  64 35 47
# 6 28 NA  40 66 32
#
# , , solar.r
#
#     5   6   7  8   9
# 1 190 286 269 83 167
# 2 118 287 248 24 197
# 3 149 242 236 77 183
# 4 313 186 101 NA 189
# 5  NA 220 175 NA  95
# 6  NA 264 314 NA  92
#
# , , wind
#
#      5    6    7    8    9
# 1  7.4  8.6  4.1  6.9  6.9
# 2  8.0  9.7  9.2 13.8  5.1
# 3 12.6 16.1  9.2  7.4  2.8
# 4 11.5  9.2 10.9  6.9  4.6
# 5 14.3  8.6  4.6  7.4  7.4
# 6 14.9 14.3 10.9  4.6 15.5
#
# , , temp
#
#    5  6  7  8  9
# 1 67 78 84 81 91
# 2 72 74 85 81 92
# 3 74 67 81 82 93
# 4 62 84 84 86 93
# 5 56 85 83 85 87
# 6 66 79 83 87 84

acast(aqm, month ~ variable, mean)

#      ozone  solar.r      wind     temp
# 5 23.61538 181.2963 11.622581 65.54839
# 6 29.44444 190.1667 10.266667 79.10000
# 7 59.11538 216.4839  8.941935 83.90323
# 8 59.96154 171.8571  8.793548 83.96774
# 9 31.44828 167.4333 10.180000 76.90000

acast(aqm, month ~ variable, mean, margins = TRUE)

#          ozone  solar.r      wind     temp    (all)
# 5     23.61538 181.2963 11.622581 65.54839 68.70696
# 6     29.44444 190.1667 10.266667 79.10000 87.38384
# 7     59.11538 216.4839  8.941935 83.90323 93.49748
# 8     59.96154 171.8571  8.793548 83.96774 79.71207
# 9     31.44828 167.4333 10.180000 76.90000 71.82689
# (all) 42.12931 185.9315  9.957516 77.88235 80.05722

dcast(aqm, month ~ variable, mean, margins = c("month", "variable"))

#   month    ozone  solar.r      wind     temp    (all)
# 1     5 23.61538 181.2963 11.622581 65.54839 68.70696
# 2     6 29.44444 190.1667 10.266667 79.10000 87.38384
# 3     7 59.11538 216.4839  8.941935 83.90323 93.49748
# 4     8 59.96154 171.8571  8.793548 83.96774 79.71207
# 5     9 31.44828 167.4333 10.180000 76.90000 71.82689
# 6 (all) 42.12931 185.9315  9.957516 77.88235 80.05722
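
The calls above mix acast() and dcast(), so a short comparison may help. The sketch below assumes reshape2 is installed and rebuilds the molten data itself so that it runs on its own; the names aq, m and d are illustrative. It casts the same molten data both ways with the same aggregation function and checks that only the container differs, not the numbers.

```r
library(reshape2)

# Self-contained setup: lower-case copy of airquality, melted with NAs dropped.
aq <- airquality
names(aq) <- tolower(names(aq))
aqm <- melt(aq, id.vars = c("month", "day"), na.rm = TRUE)

# acast() returns a matrix; month ends up in the row names.
m <- acast(aqm, month ~ variable, fun.aggregate = mean)
is.matrix(m)
rownames(m)

# dcast() returns a data frame; month is an ordinary column.
d <- dcast(aqm, month ~ variable, fun.aggregate = mean)
is.data.frame(d)
names(d)

# The aggregated values agree between the two results.
all.equal(unname(m[, "temp"]), d$temp)
```

As a rule of thumb, dcast() is convenient when the result feeds further data frame operations, while acast() suits matrix-style work or formulas with more than two dimensions such as day ~ month ~ variable.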

### 3.2 melt(): convert wide data to long data (convert an object into a molten data frame)

# month and day identify each observation; rows with a missing value are dropped.
aqm <- melt(airquality, id.vars = c("month", "day"), na.rm = TRUE)

head(aqm)

#   month day variable value
# 1     5   1    ozone    41
# 2     5   2    ozone    36
# 3     5   3    ozone    12
# 4     5   4    ozone    18
# 6     5   6    ozone    28
# 7     5   7    ozone    23
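
melt() also accepts measure.vars, variable.name and value.name, which the call above does not use. The following sketch assumes reshape2 is installed; the names aq, aq_long, aq_wide, measurement and reading are illustrative choices. It melts two columns under custom column names and then casts the result back to wide form to show that the round trip recovers the original values.

```r
library(reshape2)

aq <- airquality
names(aq) <- tolower(names(aq))

# Melt only two measured columns and give the output columns descriptive names.
aq_long <- melt(
  aq,
  id.vars       = c("month", "day"),
  measure.vars  = c("ozone", "temp"),
  variable.name = "measurement",
  value.name    = "reading"
)
head(aq_long, 3)

# Cast back to wide. Each (month, day) pair occurs once per measurement,
# so no aggregation function is needed.
aq_wide <- dcast(aq_long, month + day ~ measurement, value.var = "reading")

# The round trip recovers the original columns, missing values included.
all.equal(aq_wide$ozone, aq$ozone)
all.equal(aq_wide$temp, aq$temp)
```

Note that value.var has to be given explicitly here because the value column was renamed to reading.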

### 3.3 colsplit()

?colsplit

# Split a vector into multiple columns

x <- c("a_1_T", "a_2_F", "b_2_T", "c_3_F")

vars <- colsplit(x, "_", c("trt", "time", "Boolean_value"))

vars

#   trt time Boolean_value
# 1   a    1          TRUE
# 2   a    2         FALSE
# 3   b    2          TRUE
# 4   c    3         FALSE

str(vars)

# 'data.frame': 4 obs. of 3 variables:

# $ trt : chr "a" "a" "b" "c"

# $ time : int 1 2 2 3

# $ Boolean_value: logi TRUE FALSE TRUE FALSE
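
A typical use of colsplit() is to unpack a variable column whose labels encode more than one piece of information. The sketch below assumes reshape2 is installed and uses a small made-up data frame; long, parts, long2 and wide are hypothetical names and the values are invented for illustration. It splits the labels, binds the new columns back on, and casts the result.

```r
library(reshape2)

# Hypothetical long data whose variable column encodes treatment and time point.
long <- data.frame(
  variable = c("a_1", "a_2", "b_1", "b_2"),
  value    = c(10.5, 11.0, 9.8, 12.1),
  stringsAsFactors = FALSE
)

# Split "a_1" into trt = "a" and time = 1, then bind the new columns on.
parts <- colsplit(long$variable, "_", names = c("trt", "time"))
long2 <- cbind(parts, value = long$value)
long2

# With separate columns the data can be cast by treatment and time.
wide <- dcast(long2, trt ~ time, value.var = "value")
wide
```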

### 3.4 recast(): melt and cast in a single step

?recast

recast(french_fries, time ~ variable, id.var = 1:4)

# Aggregation function missing: defaulting to length

#    time potato buttery grassy rancid painty
# 1     1     72      72     72     72     72
# 2     2     72      72     72     72     72
# 3     3     72      72     72     72     72
# 4     4     72      72     72     72     72
# 5     5     72      72     72     72     72
# 6     6     72      72     72     72     72
# 7     7     72      72     72     72     72
# 8     8     72      72     72     72     72
# 9     9     60      60     60     60     60
# 10   10     60      60     60     60     60
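
The output above counts rows because no aggregation function was supplied. As a sketch of the "single step" idea, assuming reshape2 is installed (two_step and one_step are illustrative names), the code below writes the same operation once as melt() followed by dcast() and once as recast(), this time with mean as the aggregation function and na.rm = TRUE passed on to it.

```r
library(reshape2)

# Two-step version: melt, then cast with an explicit aggregation function.
# Extra arguments to dcast() (here na.rm) are handed to the aggregation function.
ff_long  <- melt(french_fries, id.vars = 1:4)
two_step <- dcast(ff_long, time ~ variable, fun.aggregate = mean, na.rm = TRUE)

# One-step version: recast() melts with id.var, then casts with the formula.
one_step <- recast(french_fries, time ~ variable, id.var = 1:4,
                   fun.aggregate = mean, na.rm = TRUE)

# Both routes should give the same table of mean scores per time point.
all.equal(one_step, two_step)
head(one_step, 3)
```

Note the argument spelling: recast() takes id.var and measure.var, whereas melt() takes id.vars and measure.vars.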

### 3.5 reshape2: built-in data

str(tips)

# 'data.frame': 244 obs. of 7 variables:

# $ total_bill: num 17 10.3 21 23.7 24.6 ...

# $ tip : num 1.01 1.66 3.5 3.31 3.61 4.71 2 3.12 1.96 3.23 ...

# $ sex : Factor w/ 2 levels "Female","Male": 1 2 2 2 1 2 2 2 2 2 ...

# $ smoker : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...

# $ day : Factor w/ 4 levels "Fri","Sat","Sun",..: 3 3 3 3 3 3 3 3 3 3 ...

# $ time : Factor w/ 2 levels "Dinner","Lunch": 1 1 1 1 1 1 1 1 1 1 ...

# $ size : int 2 3 3 2 4 4 2 4 2 2 ...

# In all he recorded 244 tips. The data was reported in a collection of case studies for business statistics (Bryant & Smith 1995).

str(smiths)

# 'data.frame': 2 obs. of 5 variables:

# $ subject: Factor w/ 2 levels "John Smith","Mary Smith": 1 2

# $ time : int 1 1

# $ age : num 33 NA

# $ weight : num 90 NA

# $ height : num 1.87 1.54

# A small demo dataset describing John and Mary Smith. Used in the introductory vignette.

str(french_fries)

# 'data.frame': 696 obs. of 9 variables:

# $ time : Factor w/ 10 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...

# $ treatment: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...

# $ subject : Factor w/ 12 levels "3","10","15",..: 1 1 2 2 3 3 4 4 5 5 ...

# $ rep : num 1 2 1 2 1 2 1 2 1 2 ...

# $ potato : num 2.9 14 11 9.9 1.2 8.8 9 8.2 7 13 ...

# $ buttery : num 0 0 6.4 5.9 0.1 3 2.6 4.4 3.2 0 ...

# $ grassy : num 0 0 0 2.9 0 3.6 0.4 0.3 0 3.1 ...

# $ rancid : num 0 1.1 0 2.2 1.1 1.5 0.1 1.4 4.9 4.3 ...

# $ painty : num 5.5 0 0 0 5.1 2.3 0.2 4 3.2 10.3 ...

# This data was collected from a sensory experiment conducted at Iowa State University in 2004. The investigators were interested in the effect that using three different fryer oils had on the taste of the fries.
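
The built-in datasets are handy for trying out the functions from the earlier sections. As one possible exercise, assuming reshape2 is installed (tips2, tips_long, rate_tbl and the derived column tip_rate are illustrative names, not part of the package), the sketch below computes the tip as a fraction of the bill and casts it into a table of means by sex and smoker.

```r
library(reshape2)

# Tip as a fraction of the bill, computed on the wide data.
tips2 <- transform(tips, tip_rate = tip / total_bill)

# Melt a single measured column, keeping sex and smoker as identifiers.
tips_long <- melt(tips2, id.vars = c("sex", "smoker"), measure.vars = "tip_rate")

# Cast into a sex x smoker table holding the mean tip rate of each group.
rate_tbl <- dcast(tips_long, sex ~ smoker, fun.aggregate = mean)
rate_tbl
```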

### 3.6 View the reshape2 package description

help(package="reshape2")

# Package: reshape2
# Title: Flexibly Reshape Data: A Reboot of the Reshape Package
# Version: 1.4.4
# Author: Hadley Wickham <h.wickham@gmail.com>
# Maintainer: Hadley Wickham <h.wickham@gmail.com>
# Description: Flexibly restructure and aggregate data using just two
#     functions: melt and 'dcast' (or 'acast').
# License: MIT + file LICENSE
# URL: https://github.com/hadley/reshape
# BugReports: https://github.com/hadley/reshape/issues
# Depends: R (>= 3.1)
# Imports: plyr (>= 1.8.1), Rcpp, stringr
# Suggests: covr, lattice, testthat (>= 0.8.0)
# LinkingTo: Rcpp
# Encoding: UTF-8
# LazyData: true
# RoxygenNote: 7.1.0
# NeedsCompilation: yes
# Packaged: 2020-04-09 12:27:19 UTC; hadley
# Repository: CRAN
# Date/Publication: 2020-04-09 13:50:02 UTC
# Built: R 4.0.0; x86_64-w64-mingw32; 2020-05-02 21:38:15 UTC; windows
# Archs: i386, x64

# 4. Wrap-up

sessionInfo()

# R version 4.0.3 (2020-10-10)

# Platform: x86_64-w64-mingw32/x64 (64-bit)

# Running under: Windows 10 x64 (build 18363)

# Matrix products: default

# locale:

# [1] LC_COLLATE=Chinese (Simplified)_China.936

# [2] LC_CTYPE=Chinese (Simplified)_China.936

# [3] LC_MONETARY=Chinese (Simplified)_China.936

# [4] LC_NUMERIC=C

# [5] LC_TIME=Chinese (Simplified)_China.936

# attached base packages:

# [1] stats graphics grDevices utils datasets methods base

# other attached packages:

# [1] reshape2_1.4.4

# loaded via a namespace (and not attached):

# [1] compiler_4.0.3 magrittr_2.0.1 plyr_1.8.6 tools_4.0.3 yaml_2.2.1

# [6] Rcpp_1.0.6 tinytex_0.29 stringi_1.5.3 stringr_1.4.0 xfun_0.21

Last edited: 2021.02.28 18:45:19
