median

Erin L. Keller

The purpose of this function is to calculate the median value of a vector. The median of a set of numbers is the value at the midpoint of the sorted values, so that an equal number of items fall on either side of it. In the case of a vector with an even number of values, the median function will take the average of the two midpoint values. The input for this function is a numeric vector and the output is a single numeric value (not necessarily an integer, as the 5.5 in the example below shows).

To use this function, you will need two arguments:
* x - the numeric vector from which you want the median
* na.rm - logical value specifying whether NA values should be removed before the calculation
  * na.rm = FALSE keeps NA values (default); if x contains any NA, the result is NA
  * na.rm = TRUE removes NA values before the median is computed

z <- 1:10
print(z)
##  [1]  1  2  3  4  5  6  7  8  9 10
median(z, na.rm=FALSE)
## [1] 5.5
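
As a small sketch of how the vector length and na.rm affect the result, the following uses two made-up vectors (odd_values and with_missing are illustrative names, not part of the original example):

```r
# Hypothetical example vectors, chosen only for illustration.
odd_values <- c(7, 1, 3)         # sorted: 1 3 7, so the middle value is 3
with_missing <- c(1, 2, NA, 4)

median(odd_values)
median(with_missing)                # NA, because na.rm defaults to FALSE
median(with_missing, na.rm = TRUE)  # NA is dropped first, leaving 1, 2, 4
```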

complete.cases

Erin L. Keller

The function complete.cases indicates what values in your data (vectors, matrix, data frames) are complete (do not have missing values, no NA). The input for this function needs to be a vector, matrix, or data frame and the output will be a logical vector with “TRUE” indicating that the value is complete (no missing data) while a “FALSE” indicates that the value is incomplete (missing data, NA). This function is useful for identifying missing data in data sets, although if the intent is to discard NA value…(line truncated)…

test<-c(0,1.5,NA,5,4.5,NA,3,3,NA)
complete.cases(test)
## [1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE
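
For a data frame, complete.cases returns one value per row rather than per element. A minimal sketch, assuming a small made-up data frame called obs:

```r
# Hypothetical data frame, used only for illustration.
obs <- data.frame(
  id   = 1:4,
  temp = c(20.5, NA, 22.1, 19.8),
  site = c("a", "b", NA, "d")
)

keep <- complete.cases(obs)   # one TRUE/FALSE per row
print(keep)

complete_obs <- obs[keep, ]   # rows 1 and 4 have no NA in any column
print(complete_obs)
```

Indexing with the logical vector is one way to keep only the complete rows.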

seq_len

Erin L. Keller

seq_len is a sequence generator that outputs the sequence of integers from 1 to the value specified in the parentheses and is equivalent to seq(length.out=). seq_len(0) returns an empty integer vector. When the start and end values are given in the seq function instead, the sequence length is inferred automatically (e.g. for seq(from=1, to=20), the sequence length will be 20). It is important to note that all numerical inputs should be finite.

a<-seq_len(10)
b<-seq_len(0)
print(a)
##  [1]  1  2  3  4  5  6  7  8  9 10
print(b)
## integer(0)
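
One common reason to prefer seq_len over 1:n is the empty case shown above. The sketch below, with an assumed length n of 0, compares the two:

```r
n <- 0                   # e.g. the length of an empty input (assumed here)

backwards <- 1:n         # counts down from 1 to 0, so it has two elements
safe <- seq_len(n)       # empty, so a loop over it never runs

iterations <- 0
for (i in seq_len(n)) {
  iterations <- iterations + 1
}

print(length(backwards))
print(length(safe))
print(iterations)
```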

read.fwf

Erin L. Keller

The purpose of read.fwf is to read a table of fixed-width formatted data into a data frame. To use this function, you will need multiple arguments, including:
* file - the name of the file containing the data
* widths - the widths of the fixed-width fields, given as a vector
* header - a logical value specifying whether the first line of the file contains the names of the variables
  * if the names are present, they must be delimited by sep
* sep - a character not used in the data set that will be used as the separator
* row.names and col.names - the row and column names
* n - the maximum number of lines to be included
* skip - the number of lines that should be skipped when reading the file
* buffersize - the maximum number of lines to be read at one time (reducing this may reduce memory use when dealing with large files)

The output of read.fwf will be a data frame as produced by read.table.

x <- read.fwf(
  file=url("http://www.cpc.ncep.noaa.gov/data/indices/wksst8110.for"),
  skip=4,
  widths=c(12, 7, 4, 9, 4, 9, 4, 9, 4))

head(x)
##             V1   V2   V3   V4   V5   V6   V7   V8  V9
## 1  03JAN1990   23.4 -0.4 25.1 -0.3 26.6  0.0 28.6 0.3
## 2  10JAN1990   23.4 -0.8 25.2 -0.3 26.6  0.1 28.6 0.3
## 3  17JAN1990   24.2 -0.3 25.3 -0.3 26.5 -0.1 28.6 0.3
## 4  24JAN1990   24.4 -0.5 25.5 -0.4 26.5 -0.1 28.4 0.2
## 5  31JAN1990   25.1 -0.2 25.8 -0.2 26.7  0.1 28.4 0.2
## 6  07FEB1990   25.8  0.2 26.1 -0.1 26.8  0.1 28.4 0.3
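
Because the example above depends on a remote file, here is a self-contained sketch that writes a tiny fixed-width file to a temporary location first. The file contents and the column names (site, count, value) are invented for illustration:

```r
# Hypothetical fixed-width file: 3 characters for the site code,
# 3 for the count, and 4 for the measured value.
tmp <- tempfile(fileext = ".txt")
writeLines(c("AAA 12 3.5",
             "BBB  7 1.2"), tmp)

fwf <- read.fwf(
  file = tmp,
  widths = c(3, 3, 4),
  col.names = c("site", "count", "value"))
unlink(tmp)   # remove the temporary file

print(fwf)
```

The widths must add up to the line length (here 3 + 3 + 4 = 10 characters).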

+

Erin L. Keller

The + operator in R adds numbers, just as it does in arithmetic. Using + is simple and only requires two values to be added together. Spaces between the values and the + are optional. The inputs can be vectors, data frames, or matrices, and the addition is carried out element by element. The output is numeric and has the same shape as the input; it is of integer type only when both inputs are integers.

a<-4.4
b<-1.2
c<-a+b
print(c)
## [1] 5.6
d<-1:2
e<-d+1
print(e)
## [1] 2 3
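
A short sketch of element-wise addition on a matrix, and of how the input types determine the output type (the matrix m is an invented example):

```r
m <- matrix(1:4, 2, 2)
doubled <- m + m          # element-wise: still a 2 x 2 matrix
print(doubled)

print(typeof(1L + 2L))    # "integer": both inputs are integers
print(typeof(1L + 0.5))   # "double": one input is a double
```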

is.numeric

Erin L. Keller

is.numeric determines whether a particular variable is numeric or not. The only input needed is the object to be tested (variable) and the output is a single logical value, with “TRUE” indicating that the variable is numeric (double or integer) and “FALSE” indicating that the variable is not numeric. It is important to note that although double is identical to numeric as a storage mode, is.double is not the same as is.numeric: is.double returns FALSE for integers, while is.numeric returns TRUE.

x<-4
y<-"Yes"
is.numeric(x)
## [1] TRUE
is.numeric(y)
## [1] FALSE
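
The difference between is.numeric and is.double can be seen with an integer value. The inputs below are illustrative only:

```r
is.numeric(1L)                   # TRUE: integers count as numeric
is.double(1L)                    # FALSE: an integer is not a double
is.numeric(2.5)                  # TRUE
is.numeric(factor(c("a", "b")))  # FALSE: factors are not numeric
is.numeric(c(1, 2, 3))           # a single TRUE for the whole vector
```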

<=

Erin L. Keller

<= is a comparison operator that can be used to determine whether the value to the left of the operator is smaller than or equal to the value to the right of it. The input arguments can be integers, atomic vectors, etc., and the output is a logical vector in which “TRUE” indicates that the inequality holds and “FALSE” indicates that it does not.

a<-1
b<-2
a<=b
## [1] TRUE
b<=a
## [1] FALSE
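
When one side is a vector, the comparison is made element by element, and the logical result can be used for indexing or counting. A sketch with a made-up vector temps:

```r
temps <- c(1, 5, 3)      # hypothetical values
below <- temps <= 3      # TRUE FALSE TRUE

print(below)
print(temps[below])      # keeps the elements that satisfy the condition
print(sum(below))        # counts how many elements satisfy it
```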

sum

Erin L. Keller

The sum function calculates the sum of all values in its arguments. If no arguments are given, the sum is “0” by definition. The input arguments can be numeric, complex, or logical vectors; the output is normally an integer when every input is an integer or logical, and a numeric or complex value otherwise. If a non-numeric value such as a character string is present in the arguments, you will receive an error message.

sum(1:5)
## [1] 15
a<-1:5
sum(a+1)
## [1] 20
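
A few edge cases are worth seeing side by side. This sketch uses an invented vector readings that contains a missing value:

```r
readings <- c(1, NA, 3)        # hypothetical data

sum(readings)                  # NA: a missing value propagates
sum(readings, na.rm = TRUE)    # 4: NA is removed first
sum()                          # 0: the sum of no values
sum(c(TRUE, FALSE, TRUE))      # 2: TRUE counts as 1
```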

nrow

Erin L. Keller

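nrow reports the number of rows of an array. The input can be a matrix, array, or data frame and the output will be a single integer reporting the number of rows. For a plain vector, which has no dimensions, nrow returns NULL; NROW treats the vector as a one-column matrix instead.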

matrix<-matrix(1:10,2,5)
print(matrix)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10
nrow(matrix)
## [1] 2
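
The sketch below contrasts nrow and NROW on a plain vector and shows nrow on a data frame. The objects v and df are made up for this illustration:

```r
v <- 1:3
df <- data.frame(x = 1:4, y = letters[1:4])

is.null(nrow(v))   # TRUE: a plain vector has no dimensions
NROW(v)            # 3: the vector is treated as a one-column matrix
nrow(df)           # 4
ncol(df)           # 2
```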

as.data.frame

Erin L. Keller

as.data.frame will report whether a given input value is a data frame. This is important as some functions require data to be in a data frame format (which is similar to a matrix but is distinct from a matrix). The input value can be any element of interest and the output value will be a logical vector, where “TRUE” indicates that the element of interest is in data frame format while “FALSE” indicates that the element of interest is not in data frame format. To coerce data into a data frame, use the as.data…(line truncated)…

matrix<-matrix(1:10,2,5)
is.data.frame(matrix)
## [1] FALSE
matrix2<-as.data.frame(matrix)
is.data.frame(matrix2)
## [1] TRUE
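
In the example above, is.data.frame performs the test and as.data.frame performs the conversion. The following sketch repeats the conversion with different object names (m and df, chosen here for illustration) and inspects the result:

```r
m <- matrix(1:10, 2, 5)
df <- as.data.frame(m)    # coerce the matrix to a data frame

is.data.frame(m)          # FALSE: still a matrix
is.data.frame(df)         # TRUE
names(df)                 # default column names V1 to V5
dim(df)                   # the shape is preserved: 2 rows, 5 columns
```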

rle

Erin L. Keller

rle stands for “Run Length Encoding” and determines the lengths and values of runs of equal values in a vector. The input argument must be an atomic vector and the output will indicate the lengths of the runs (integer) and the values of the runs (the same type as the input, numeric in the example below). The function inverse.rle can also be used to transform the output of rle back into a vector.

a<-c(1,1,2,2,2,2,3,3,3,4,5,5,5,6)
rle(a)
## Run Length Encoding
##   lengths: int [1:6] 2 4 3 1 3 1
##   values : num [1:6] 1 2 3 4 5 6
b<-inverse.rle(rle(a))
print(b)
##  [1] 1 1 2 2 2 2 3 3 3 4 5 5 5 6
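
The lengths and values components can be used directly, for example to find the longest run. A sketch with an invented character vector flips:

```r
flips <- c("H", "H", "T", "T", "T", "H")   # hypothetical coin flips
runs <- rle(flips)

print(runs$lengths)                        # 2 3 1
print(runs$values)                         # "H" "T" "H"

longest <- max(runs$lengths)               # length of the longest run
longest_value <- runs$values[which.max(runs$lengths)]
print(longest)
print(longest_value)
```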

cumsum

Erin L. Keller

The cumsum function returns a vector which contains the cumulative sums of the elements of the input argument (i.e. the first value in the output vector is the first element, the second value is the sum of the first two elements, and so on). The input argument must be a numeric or complex object and the output is a vector of the same length. If NA is included in the input argument, each sum from the position of NA on will be reported as “NA”.

cumsum(1:10)
##  [1]  1  3  6 10 15 21 28 36 45 55
a<-c(1,2,3,NA,4)
cumsum(a)
## [1]  1  3  6 NA NA
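
A running total is a typical use. In this sketch, daily is a made-up vector of daily counts and the threshold of 5 is an arbitrary choice:

```r
daily <- c(2, 0, 5, 1)               # hypothetical daily counts
running <- cumsum(daily)             # 2 2 7 8
print(running)

# The last cumulative sum equals the overall sum.
print(running[length(running)] == sum(daily))

# First day on which the running total reaches 5.
first_day <- which(running >= 5)[1]
print(first_day)
```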

-

Erin L. Keller

The - symbol, or the “minus sign”, subtracts the value after the “-” from the value before the “-”, as it does in arithmetic, and reports the difference between those two values. The input value can be a vector, data frame, or matrix and the output will be numeric values in a vector, matrix, or data frame (matching the input format).

a<-matrix(1:6,3,2)
b<-matrix(3:8,3,2)
c<-b-a
print(c)
##      [,1] [,2]
## [1,]    2    2
## [2,]    2    2
## [3,]    2    2
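
The same operator also works between a vector and a single number, and with one operand it negates. A brief sketch using an invented vector x:

```r
x <- c(10, 20, 30)

x - 1             # 9 19 29: the single value is reused for every element
x - c(1, 2, 3)    # 9 18 27: element-by-element subtraction
-x                # -10 -20 -30: unary minus negates each element
```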

subset

Erin L. Keller

subset is a function that pulls out the data from the input object that meet the criteria of interest specified in the function. The input object can be a vector, data frame, or matrix, and the arguments to be included are:
* x - the object to be subsetted
* subset - a logical expression specifying which elements or rows to keep. Missing values are treated as FALSE, so they are not kept
* select - an expression which indicates which columns to select (only used for data frames and matrices)
* drop - passed on to the [ indexing operator; eliminates dimensions of an array that have only one level

The output of this function will be an object similar to the input, containing only the selected elements (vector) or rows/columns (matrix, data frame). It is important to note that factors may have empty levels after subsetting; unused levels are not removed automatically, but they can be dropped afterwards with droplevels.

a <- matrix(c(runif(10, min = 80, max = 100),(runif(10, min=85, max = 115))),nrow=10,ncol=2)
colnames(a)<-c("MaxTemp2015","MaxTemp2017")
print(a)
##       MaxTemp2015 MaxTemp2017
##  [1,]    84.60452    89.42439
##  [2,]    98.73616   100.35983
##  [3,]    90.45584    95.70610
##  [4,]    85.83882   108.16546
##  [5,]    94.29872    97.58203
##  [6,]    90.04143   113.36021
##  [7,]    96.17224   112.89776
##  [8,]    99.33855   114.73410
##  [9,]    94.01010   107.07337
## [10,]    95.37309    88.23599
subset(a, a[,1]>90 & a[,2]>100) # keeps only the rows where MaxTemp2015 is over 90 and MaxTemp2017 is over 100
##      MaxTemp2015 MaxTemp2017
## [1,]    98.73616    100.3598
## [2,]    90.04143    113.3602
## [3,]    96.17224    112.8978
## [4,]    99.33855    114.7341
## [5,]    94.01010    107.0734
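
With a data frame, the columns can be referred to by name and select can pick the columns to return. A minimal sketch, assuming a small made-up data frame temps (the site column and all values are invented):

```r
# Hypothetical data; note the NA in the last row.
temps <- data.frame(
  site        = c("A", "B", "C", "D"),
  MaxTemp2015 = c(84.6, 98.7, 90.5, 99.3),
  MaxTemp2017 = c(89.4, 100.4, 95.7, NA)
)

# Keep rows meeting both conditions, and only two of the columns.
hot <- subset(temps,
              MaxTemp2015 > 90 & MaxTemp2017 > 100,
              select = c(site, MaxTemp2017))
print(hot)
```

Site D is dropped because its condition evaluates to NA, which subset treats as FALSE.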

t

Erin L. Keller

t is a function that transposes a matrix or data frame (i.e. it will swap the columns and rows). The input object, x, is typically a matrix or data frame; however, a vector can be used as well, in which case it is treated as a column and t returns a matrix with one row. The output of t is a matrix, with dim and dimnames constructed from those of x and the other attributes of the original copied across.

a <- matrix(1:10, 2, 5)
colnames(a) <- LETTERS[1:5]
print(a)
##      A B C D  E
## [1,] 1 3 5 7  9
## [2,] 2 4 6 8 10
ta <- t(a)
print(ta)
##   [,1] [,2]
## A    1    2
## B    3    4
## C    5    6
## D    7    8
## E    9   10
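
Two properties of t can be checked quickly: a vector becomes a one-row matrix, and transposing twice gives back the original matrix. The objects v and m below are illustrative:

```r
v <- 1:3
tv <- t(v)
print(dim(tv))              # 1 3: the vector was treated as a column

m <- matrix(1:6, 2, 3)
print(dim(t(m)))            # 3 2: rows and columns are swapped
print(identical(t(t(m)), m))
```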

print

Erin L. Keller

print is a function that displays its input object and returns that object invisibly. Typically, this function is used to inspect data, and at the console the same result can be achieved by simply typing the variable name. For this function, inputs of any form can be used; however, some forms accept additional arguments, including:
* quote - a logical value indicating whether strings should be printed with surrounding quotes
* max.levels - an integer indicating how many levels of a factor should be printed. The default is NULL, which prints as many levels as fit on one line of a specified width
* digits - the minimum number of significant digits
* na.print - a character string used to indicate NA values in the output
* zero.print - a character string specifying how zeros should be printed (for sparse tables, using "." can be easier to read)
* justify - a character string that indicates whether strings should be right- or left-justified or not justified
* useSource - a logical value indicating whether the internally stored source should be used for printing when present (e.g. if options(keep.source=TRUE) has been in use)

print.factor allows further customization for factors, while print.table allows further customization for tables. The output of this function will take the form of the input object.

Rainfall <- matrix(c(1.2,2.4,4.2,6.3,8.9,11.1,12.2,10.7),4,2)
colnames(Rainfall)<-c("Desert","Deciduous")
rownames(Rainfall)<-c("December","February","March","April")
print(Rainfall)
##          Desert Deciduous
## December    1.2       8.9
## February    2.4      11.1
## March       4.2      12.2
## April       6.3      10.7
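
A short sketch of two of the optional arguments, digits and quote, and of the invisible return value (the variable name returned is chosen for illustration):

```r
print(pi, digits = 3)               # [1] 3.14
print(c("a", "b"), quote = FALSE)   # [1] a b

returned <- print("done")           # prints, and also returns its argument
```

Because print returns its argument invisibly, the result can be assigned without being displayed a second time.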