ggplot cheat sheet

ggplot(data=NULL,mapping=aes(x=,y=,color=,),environment=parent.frame())

## one variable:
ggplot(data=NULL,mapping=aes(x=,y=,alpha=,color=,fill=,linetype=,size=,))+
geom_area(aes(y=..density..),stat="bin") #with shadow (filled area)

ggplot(data=NULL,mapping=aes(x=,y=,alpha=,color=,fill=,linetype=,size=,weight=))+
geom_density(aes(y=..count..),kernel="gaussian") #without shadow (outline only)

ggplot(data=NULL,mapping=aes(x=,y=,alpha=,color=,fill=,))+
geom_dotplot() #plot dots

ggplot(data=NULL,mapping=aes(x=,y=,alpha=,color=,fill=,linetype=,size=,weight=))+
geom_histogram(aes(y=..density..),binwidth=5)
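
A minimal runnable sketch of the one-variable templates above. It assumes the mpg data set bundled with ggplot2 and its hwy column, neither of which these notes prescribe. It also uses after_stat(density), which in recent ggplot2 versions is the preferred spelling of ..density...

```r
library(ggplot2)

# Histogram of highway mileage on the density scale, with a density curve on top.
# mpg and hwy are illustrative choices, not part of the templates above.
p_hist <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 5,
                 fill = "grey70", color = "white") +
  geom_density(kernel = "gaussian", color = "steelblue")

p_hist  # printing the object draws the plot
```

Putting both layers on the density scale keeps the histogram and the curve comparable; with the default count scale the curve would sit flat near zero.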

## two variables
### continuous X, continuous Y
ggplot(data=NULL,mapping=aes(x=,y=,alpha=,color=,fill=,shape=,size=,))+
geom_jitter() #jittering adds a small amount of random noise to the data; it is often used to spread out points that would otherwise be overplotted

ggplot(data=NULL,mapping=aes(x=,y=,alpha=,color=,fill=,shape=,size=,))+
geom_point()

ggplot(data=NULL,mapping=aes(x=,y=,alpha=,color=,linetype=,size=,weight=))+
geom_quantile()

ggplot(data=NULL,mapping=aes(x=,y=,alpha=,color=,linetype=,size=,))+
geom_rug(sides="bl")

ggplot(data=NULL,mapping=aes(x=,y=,alpha=,color=,fill=,linetype=,size=,weight=))+
geom_smooth(method=lm)

ggplot(data=NULL,mapping=aes(x=,y=,label=,alpha=,angle=,color=,family=,fontface=,hjust=,lineheight=,size=,vjust=))+
geom_text(aes(label=cty))
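
The continuous X, continuous Y geoms are often layered on one plot. The sketch below is one possible combination, again assuming the mpg data set from ggplot2 (cty and hwy are its city and highway mileage columns); the jitter widths and alpha value are arbitrary.

```r
library(ggplot2)

# Jittered points, a linear fit, and marginal rugs on the bottom and left sides.
p_scatter <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_jitter(alpha = 0.4, width = 0.3, height = 0.3) +  # spread out overplotted points
  geom_smooth(method = lm, formula = y ~ x) +            # straight-line fit with confidence band
  geom_rug(sides = "bl")                                 # "b" = bottom, "l" = left

p_scatter
```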

###continuous bivariate distribution
ggplot(data=NULL,mapping=aes(xmax=,xmin=,ymax=,ymin=,alpha=,color=,fill=,linetype=,size=,weight=))+
geom_bin2d()

ggplot(data=NULL,mapping=aes(x=,y=,alpha=,color=,linetype=,size=,))+
geom_density2d()

ggplot(data=NULL,mapping=aes(x=,y=,alpha=,color=,fill=,size=,))+
geom_hex()

###discrete X, continuous Y
ggplot(data=NULL,mapping=aes(x=,y=,alpha=,color=,fill=,linetype=,size=,weight=))+
geom_bar(stat="identity")

ggplot(data=NULL,mapping=aes(x=,y=,alpha=,color=,fill=,shape=,size=,))+
geom_boxplot() #aesthetics: lower, middle, upper, x, ymax, ymin, alpha, color, fill, linetype, shape, size, weight

ggplot(data=NULL,mapping=aes(x=,y=,alpha=,color=,fill=))+
geom_dotplot(binaxis="y",stackdir="center")

##visualizing error
ggplot(data=NULL,aes(x=,y=,ymin=,ymax=,alpha=,color=,fill=,linetype=,size=))+
geom_crossbar(fatten=2)

ggplot(data=NULL,aes(x=,ymin=,ymax=,alpha=,color=,linetype=,size=,width=))+
geom_errorbar()

ggplot(data=NULL,aes(x=,ymin=,ymax=,alpha=,color=,linetype=,size=))+
geom_linerange()

ggplot(data=NULL,aes(x=,y=,ymin=,ymax=,alpha=,color=,fill=,linetype=,shape=,size=))+
geom_pointrange()
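
The error geoms need ymin and ymax to exist in the data, so a summary table usually has to be built first. A hedged sketch, assuming the mpg data set and a mean plus or minus one standard error as the interval (both are illustrative choices; err, mean, and se are hypothetical names):

```r
library(ggplot2)

# Per-class mean and standard error of highway mileage, computed with base R.
means <- aggregate(hwy ~ class, data = mpg, FUN = mean)
ses   <- aggregate(hwy ~ class, data = mpg,
                   FUN = function(v) sd(v) / sqrt(length(v)))
err   <- data.frame(class = means$class, mean = means$hwy, se = ses$hwy)

# One point per class with a vertical bar from mean - se to mean + se.
p_err <- ggplot(err, aes(x = class, y = mean,
                         ymin = mean - se, ymax = mean + se)) +
  geom_errorbar(width = 0.2) +
  geom_point()

p_err
```

Swapping geom_errorbar() + geom_point() for geom_pointrange(), geom_linerange(), or geom_crossbar() reuses the same aesthetics.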

##Stats:
some plots visualize a transformation of the original data set; each stat creates additional variables to map aesthetics to
stat_smooth(method="",formula=)
method: by default, loess is used for fewer than 1,000 observations and gam otherwise
how to define smooths in gam formulae:
s(...,k=-1,fx=FALSE,bs="tp")
... represents the list of variables that are the covariates this smooth is a function of
k is the dimension of the basis used to represent the smooth term
fx indicates whether the term is a fixed d.f. regression spline
bs is a short character string (usually two letters) indicating the penalized smoothing basis (smooth terms in GAM)
thin plate regression spline: tp;
duchon splines: ds
cubic regression splines: cr,cs,cc
splines on the sphere: sos
P-splines: ps
Random effects: re
Markov Random Fields: mrf
Gaussian process smooths: gp
soap film smooths: so, sf, sw

thin plate regression splines tend to give the best MSE performance but are slower to set up; knot-based penalized cubic regression splines are usually the second best
https://www.rdocumentation.org/packages/mgcv/versions/1.8-24/topics/smooth.terms
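
As an illustration of passing a smooth term to stat_smooth(), the sketch below requests a GAM fit with a cubic regression spline basis. It assumes the mgcv package is installed (it normally ships with R) and uses the mpg data set; k = 5 is an arbitrary basis dimension chosen for the example.

```r
library(ggplot2)

# GAM smooth of highway mileage against engine displacement.
# Inside the formula, x and y refer to the mapped aesthetics, not column names.
p_gam <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.4) +
  stat_smooth(method = "gam", formula = y ~ s(x, bs = "cs", k = 5))

p_gam  # the model is fitted when the plot is drawn
```

Changing bs to "tp" switches to a thin plate regression spline without touching the rest of the call.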

##Scales
scales control how a plot maps data values to the visual values of an aesthetic
scale_fill_manual(
values=c(),
limits=c(),
breaks=c(),
name="",
labels=c()):
specify your own set of mappings from levels in the data to aesthetic values
scale_x_continuous():
set the range and the breaks of the x axis; maps x values to visual values
scale_color_manual():
specify your own mapping from data values to colors
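
A short sketch of a manual color scale combined with a continuous x scale. The data set (mpg), the hex colors, and the legend labels are all assumptions made for the example.

```r
library(ggplot2)

# drv in mpg has three levels: "4", "f", "r".
p_scale <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  scale_color_manual(
    values = c("4" = "#1b9e77", "f" = "#d95f02", "r" = "#7570b3"),  # level -> color
    breaks = c("4", "f", "r"),                                      # legend order
    name   = "Drive train",                                         # legend title
    labels = c("4wd", "front", "rear")                              # legend text
  ) +
  scale_x_continuous(limits = c(1, 7), breaks = 1:7)  # axis range and tick positions

p_scale
```

Naming the elements of values ties each color to a level, so the mapping does not depend on the order of the levels.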

##themes
theme_bw():
a theme with a white background, light grey gridlines, and a dark panel border
theme():
theme(axis.text.x=element_text(size= , angle= , hjust= )):
when there are many x axis labels they tend to overlap, so rotate them with angle and adjust their horizontal justification with hjust
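
For instance, the rotation trick might look like this on a crowded discrete axis. The mpg data set, the 45 degree angle, and the font size are assumptions for illustration.

```r
library(ggplot2)

# 15 manufacturer names on the x axis; rotated labels avoid overlap.
p_theme <- ggplot(mpg, aes(x = manufacturer)) +
  geom_bar() +
  theme_bw() +                                           # complete theme first
  theme(axis.text.x = element_text(size = 9, angle = 45, hjust = 1))  # then the tweak

p_theme
```

The order matters: a complete theme such as theme_bw() replaces earlier theme() settings, so the tweak comes after it.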

##Faceting
divide a plot into subplots (panels) based on the values of one or more discrete variables
facet_wrap(~fl): wraps a 1d sequence of panels, one per value of fl, into a 2d rectangular layout
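
A minimal faceting sketch, assuming fl is the fuel type column of the mpg data set; ncol = 3 is an arbitrary layout choice.

```r
library(ggplot2)

# One scatter panel per fuel type, laid out in rows of three.
p_facet <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~fl, ncol = 3)

p_facet
```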

##Legends
place the legend at "bottom", "top", "left", or "right"
ggplot()+geom_point()+theme(legend.position="bottom")
ggplot()+geom_point()+guides(color="none") #remove the color legend
ggplot()+geom_point()+scale_fill_discrete(name="Title",labels=c("A","B","C")) #set the legend title and labels
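
The three legend adjustments, sketched on one base plot. This assumes the mpg data set with drv mapped to color; because the mapped aesthetic here is color rather than fill, the example uses scale_color_discrete() so that the scale matches the mapping.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()

p_bottom <- base + theme(legend.position = "bottom")   # move the legend
p_none   <- base + guides(color = "none")               # drop the color legend
p_named  <- base + scale_color_discrete(                # retitle and relabel it
  name = "Drive", labels = c("4wd", "front", "rear"))

p_bottom
```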

#some other functions not related to ggplot
gsub(): replaces all matches of a pattern in a string
with(): evaluates an expression inside a data set; for example, with(mtcars,summary(mpg)) returns the summary statistics of mpg in the mtcars data
cut(): cuts a numeric vector into intervals
melt(): transforms data from wide format to long format; the long format is needed when there are grouping factors
subset(): subsets the data you are interested in
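
A quick base R sketch of these helpers on the built-in mtcars data. melt() is left out because it is not part of base R (it lives in add-on packages such as reshape2); the break points and the weight threshold are arbitrary.

```r
# gsub(): replace every underscore with a space.
label <- gsub("_", " ", "city_miles_per_gallon")

# with(): evaluate summary(mpg) inside mtcars without writing mtcars$mpg.
mpg_summary <- with(mtcars, summary(mpg))

# cut(): bin miles per gallon into three intervals: (10,20], (20,30], (30,40].
mpg_bins <- cut(mtcars$mpg, breaks = c(10, 20, 30, 40))

# subset(): keep cars heavier than 4 (thousand lbs) and two columns only.
heavy <- subset(mtcars, wt > 4, select = c(mpg, wt))

print(label)
print(table(mpg_bins))
```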