| ggparcoord {GGally} | R Documentation |
A function for plotting static parallel coordinate plots, utilizing
the ggplot2 graphics package.
ggparcoord(data, columns, groupColumn = NULL, scale = "std", scaleSummary = "mean", centerObsID = 1, missing = "exclude", order = columns, showPoints = FALSE, splineFactor = FALSE, alphaLines = 1, boxplot = FALSE, shadeBox = NULL, mapping = NULL, title = "")
data |
the dataset to plot |
columns |
a vector of variables (either names or indices) to be axes in the plot |
groupColumn |
a single variable to group (color) by |
scale |
method used to scale the variables (see Details) |
scaleSummary |
if scale=="center", summary statistic to univariately center each variable by |
centerObsID |
if scale=="centerObs", row number of the case on which the plot should be univariately centered |
missing |
method used to handle missing values (see Details) |
order |
method used to order the axes (see Details) |
showPoints |
logical operator indicating whether points should be plotted or not |
splineFactor |
logical or numeric value indicating whether spline interpolation should be used. Numeric values will be multiplied by the number of columns. |
alphaLines |
value of the alpha scalar for the lines of the parcoord plot, or the name of a column of the data |
boxplot |
logical operator indicating whether or not boxplots should underlay the distribution of each variable |
shadeBox |
color of the underlying box, which extends from the min to the max for each variable (no box is plotted if shadeBox is NULL) |
mapping |
aes string to pass to ggplot object |
title |
character string denoting the title of the plot |
scale is a character string that denotes how to scale the variables
in the parallel coordinate plot. Options:
std: univariately, subtract mean and divide by standard deviation
robust: univariately, subtract median and divide by median absolute deviation
uniminmax: univariately, scale so the minimum of the variable is zero, and the maximum is one
globalminmax: no scaling is done; the range of the graphs is defined
by the global minimum and the global maximum
center: use uniminmax to standardize vertical height, then
center each variable at a value specified by the scaleSummary param
centerObs: use uniminmax to standardize vertical height, then
center each variable at the value of the observation specified by the centerObsID param
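As a rough illustration of the scaling options above, the following base R sketch applies them to a single numeric vector. The helper `scale_one` and the sample vector are hypothetical and not part of GGally. Whether ggparcoord uses the default consistency constant of `mad()` is an assumption here.

```r
# Hypothetical helper (not part of GGally): scale one numeric vector
# using the option names described above.
scale_one <- function(x, method = c("std", "robust", "uniminmax")) {
  method <- match.arg(method)
  switch(method,
    std       = (x - mean(x)) / sd(x),
    # Assumption: mad() with R's default consistency constant (1.4826)
    robust    = (x - median(x)) / mad(x),
    uniminmax = (x - min(x)) / (max(x) - min(x))
  )
}

x <- c(2, 4, 6, 8, 20)  # made-up values
z_std <- scale_one(x, "std")        # mean 0, standard deviation 1
z_rob <- scale_one(x, "robust")     # median 0
z_mm  <- scale_one(x, "uniminmax")  # minimum 0, maximum 1

# "center" with scaleSummary = "median": uniminmax first, then subtract
# the median of the rescaled values
z_center <- z_mm - median(z_mm)

# "centerObs" with centerObsID = 1: uniminmax first, then subtract the
# rescaled value of observation 1
z_center_obs <- z_mm - z_mm[1]
```

With globalminmax, by contrast, the values would be left as they are and only the axis range would change.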
missing is a character string that denotes how to handle
missing values. Options:
exclude: remove all cases with missing values
mean: set missing values to the mean of the variable
median: set missing values to the median of the variable
min10: set missing values to 10% below the minimum of the variable
random: set missing values to value of randomly chosen observation
on that variable
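The imputation options above can be sketched in base R for a single variable. The helper `impute_one` is hypothetical and not part of GGally. In particular, reading "10% below the minimum" as `min - 0.1 * abs(min)` is an assumption, since the exact formula is not given here.

```r
# Hypothetical helper (not part of GGally): fill the NAs of one numeric vector.
impute_one <- function(x, method = c("mean", "median", "min10", "random")) {
  method <- match.arg(method)
  obs <- x[!is.na(x)]           # observed (non-missing) values
  n_missing <- sum(is.na(x))
  fill <- switch(method,
    mean   = mean(obs),
    median = median(obs),
    # Assumption: "10% below the minimum" read as min - 0.1 * |min|
    min10  = min(obs) - 0.1 * abs(min(obs)),
    # Draw observed values by index, with replacement
    random = obs[sample.int(length(obs), n_missing, replace = TRUE)]
  )
  x[is.na(x)] <- fill
  x
}

x <- c(10, NA, 30, 50, NA)  # made-up values
x_mean   <- impute_one(x, "mean")    # NAs replaced by 30
x_median <- impute_one(x, "median")  # NAs replaced by 30
x_min10  <- impute_one(x, "min10")   # NAs replaced by 9
x_random <- impute_one(x, "random")  # NAs replaced by 10, 30, or 50

# "exclude" works on whole cases (rows) rather than single variables
df <- data.frame(a = x, b = 1:5)
df_complete <- df[complete.cases(df), ]
```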
order is either a vector of indices or a character string that denotes how to
order the axes (variables) of the parallel coordinate plot. Options:
(default): order by the vector denoted by columns
(given vector): order by the vector specified
anyClass: order variables by their separation between any one class and
the rest (as opposed to their overall variation between classes). This is accomplished
by calculating the F-statistic for each class vs. the rest, for each axis variable.
The axis variables are then ordered (decreasing) by their maximum of k F-statistics,
where k is the number of classes.
allClass: order variables by their overall F statistic (decreasing) from
an ANOVA with groupColumn as the explanatory variable (note: it is required
to specify a groupColumn with this ordering method). Basically, this method
orders the variables by their variation between classes (most to least).
skewness: order variables by their sample skewness (most skewed to
least skewed)
Outlying: order by the scagnostic measure, Outlying, as calculated
by the package scagnostics. Other scagnostic measures available to order
by are Skewed, Clumpy, Sparse, Striated, Convex, Skinny, Stringy, and
Monotonic. Note: To use these methods of ordering, you must have the scagnostics
package loaded.
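The two class-based orderings can be approximated in base R as follows. This is a sketch of the idea described above, not the code ggparcoord runs. The helper `f_stat` is hypothetical, and the iris data stand in for any dataset with a grouping column.

```r
# Hypothetical helper: F statistic from a one-way ANOVA of x on the factor g
f_stat <- function(x, g) anova(lm(x ~ g))[["F value"]][1]

vars <- names(iris)[1:4]
cls  <- iris$Species

# allClass-style ordering: overall F statistic per variable, decreasing
f_all <- sapply(iris[vars], f_stat, g = cls)
order_all <- names(sort(f_all, decreasing = TRUE))

# anyClass-style ordering: for each variable, the maximum over the k classes
# of the F statistic for "this class vs. the rest", decreasing
f_any <- sapply(iris[vars], function(x) {
  max(sapply(levels(cls), function(k) f_stat(x, factor(cls == k))))
})
order_any <- names(sort(f_any, decreasing = TRUE))
```

The two orderings can differ when one variable separates a single class very well but the remaining classes poorly.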
A ggplot object that, if called, will print.
Jason Crowley crowley.jason.s@gmail.com, Barret Schloerke schloerke@gmail.com, Di Cook dicook@iastate.edu, Heike Hofmann hofmann@iastate.edu, Hadley Wickham h.wickham@gmail.com
# use sample of the diamonds data for illustrative purposes
data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(nrow(diamonds), 100), ]
# basic parallel coordinate plot, using default settings
# ggparcoord(data = diamonds.samp,columns = c(1,5:10))
# this time, color by diamond cut
gpd <- ggparcoord(data = diamonds.samp,columns = c(1,5:10),groupColumn = 2)
# gpd
# underlay univariate boxplots, add title, use uniminmax scaling
gpd <- ggparcoord(data = diamonds.samp,columns = c(1,5:10),groupColumn = 2,
scale = "uniminmax",boxplot = TRUE,title = "Parallel Coord. Plot of Diamonds Data")
# gpd
# utilize ggplot2 aes to switch to thicker lines
gpd <- ggparcoord(data = diamonds.samp,columns = c(1,5:10),groupColumn = 2,
title="Parallel Coord. Plot of Diamonds Data",mapping = ggplot2::aes(size = 1))
# gpd
# basic parallel coord plot of the msleep data, using 'random' imputation and
# coloring by diet (can also use variable names in the columns and groupColumn
# arguments)
data(msleep, package="ggplot2")
gpd <- ggparcoord(data = msleep, columns = 6:11, groupColumn = "vore", missing =
"random", scale = "uniminmax")
# gpd
# center each variable by its median, using the default missing value handler,
# 'exclude'
gpd <- ggparcoord(data = msleep, columns = 6:11, groupColumn = "vore", scale =
"center", scaleSummary = "median")
# gpd
# with the iris data, order the axes by how well they separate any one class
# (Species) from the rest, using the anyClass option
gpd <- ggparcoord(data = iris, columns = 1:4, groupColumn = 5, order = "anyClass")
# gpd
# add points to the plot, add a title, and use an alpha scalar to make the lines
# transparent
gpd <- ggparcoord(data = iris, columns = 1:4, groupColumn = 5, order = "anyClass",
showPoints = TRUE, title = "Parallel Coordinate Plot for the Iris Data",
alphaLines = 0.3)
# gpd
# set the line transparency (alpha) according to a column
iris2 <- iris
iris2$alphaLevel <- c("setosa" = 0.2, "versicolor" = 0.3, "virginica" = 0)[iris2$Species]
gpd <- ggparcoord(data = iris2, columns = 1:4, groupColumn = 5, order = "anyClass",
showPoints = TRUE, title = "Parallel Coordinate Plot for the Iris Data",
alphaLines = "alphaLevel")
# gpd
## Use splines on values, rather than lines (all produce the same result)
columns <- c(1, 5:10)
gpd <- ggparcoord(diamonds.samp, columns, groupColumn = 2, splineFactor = TRUE)
# gpd
gpd <- ggparcoord(diamonds.samp, columns, groupColumn = 2, splineFactor = 3)
# gpd
splineFactor <- length(columns) * 3
gpd <- ggparcoord(diamonds.samp, columns, groupColumn = 2, splineFactor = I(splineFactor))
# gpd
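The spline examples give the same result because a numeric splineFactor is multiplied by the number of columns, whereas a value wrapped in I() is used as given: with seven columns, 3 * 7 and I(21) both come to 21. The base R sketch below shows this arithmetic for one observation. `stats::spline` is used only as a stand-in, since the interpolation routine ggparcoord uses is not described here, and the values are made up.

```r
# One observation's (made-up) scaled values across seven axes
values <- c(0.2, 0.9, 0.4, 0.7, 0.1, 0.6, 0.5)
n_axes <- length(values)

# A numeric splineFactor is multiplied by the number of columns
spline_factor <- 3
n_points <- n_axes * spline_factor

# Stand-in interpolation: cubic spline evaluated at n_points equally
# spaced positions between the first and the last axis
interp <- stats::spline(x = seq_len(n_axes), y = values, n = n_points)
length(interp$x)  # equals n_points
```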