Statistical Analysis in R

R provides several built-in functions for statistical analysis, such as mean(), median(), min(), and max(). These functions ship with the standard R installation, so no extra packages are needed. They take an R vector as input, along with optional arguments, and return the result.

Mean, Median, Mode

Mean

The mean is calculated by dividing the sum of the values by the number of values in the dataset. In R, the mean() function computes it.

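Its syntax is given below:

mean(x, trim = 0, na.rm = FALSE, ...)

Here:

  • x is the input vector.
  • trim is the fraction (0 to 0.5) of observations to drop from each end of the sorted vector before the mean is calculated.
  • na.rm is a logical value; set it to TRUE to remove missing values (NA) before the calculation.

m <- c(12,78,34,4.32,18,2,24,21,8,25)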

res <- mean(m)
print(res)

Output

[1] 22.632
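To connect the definition of the mean to the function call, here is a minimal sketch that computes it by hand for the same vector. The variable name manual_mean is only illustrative.

```r
# Same vector as in the example above
m <- c(12, 78, 34, 4.32, 18, 2, 24, 21, 8, 25)

# Mean by definition: sum of the values divided by the number of values
manual_mean <- sum(m) / length(m)
print(manual_mean)

# Compare with the built-in function (all.equal tolerates rounding error)
print(isTRUE(all.equal(manual_mean, mean(m))))
```

Both approaches should agree; mean() is simply the more convenient form.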

Applying Trim Option

When the trim parameter is supplied, the values in the vector are sorted and the required number of observations is dropped from each end before the mean is calculated.

With trim = 0.2 and 10 values, 2 values are dropped from each end of the sorted vector: (2, 4.32) from the low end and (34, 78) from the high end.

m <- c(12,78,34,4.32,18,2,24,21,8,25)

res <- mean(m, trim = 0.2)
print(res)

Output

[1] 18
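The trimming step can be reproduced by hand. The following is a rough sketch, assuming the same 10-value vector; the names sorted_m, k, and kept are made up for illustration and are not part of R.

```r
m <- c(12, 78, 34, 4.32, 18, 2, 24, 21, 8, 25)
trim <- 0.2

# Sort the values, then work out how many to drop from each end
sorted_m <- sort(m)
n <- length(sorted_m)
k <- floor(n * trim)

# Keep only the middle values and average them
kept <- sorted_m[(k + 1):(n - k)]
print(kept)
print(mean(kept))

# Should match the built-in trimmed mean
print(isTRUE(all.equal(mean(kept), mean(m, trim = 0.2))))
```

Because the number of dropped values is rounded down, a small trim on a short vector may drop nothing at all.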

Applying NA Option

The mean() function returns NA if the vector contains missing values. To drop the missing values from the calculation, use na.rm = TRUE, which removes the NA values first.

m <- c(12,78,34,4.32,18,2,24,21,8,25,NA)

# Without na.rm, the result is NA
res <- mean(m)
print(res)

# With na.rm = TRUE, the NA is ignored
res <- mean(m, na.rm = TRUE)
print(res)

Output

[1] NA
[1] 22.632
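Before dropping missing values it can help to know how many there are. Here is a small sketch, assuming the same vector with one NA; n_missing and clean_mean are illustrative names.

```r
m <- c(12, 78, 34, 4.32, 18, 2, 24, 21, 8, 25, NA)

# Count the missing values
n_missing <- sum(is.na(m))
print(n_missing)

# Filtering the NAs by hand gives the same result as na.rm = TRUE
clean_mean <- mean(m[!is.na(m)])
print(isTRUE(all.equal(clean_mean, mean(m, na.rm = TRUE))))
```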

Median

The median is the middle value of a sorted data series; when the series has an even number of values, it is the average of the two middle values. In R, the median() function computes it.

The syntax for calculating the median in R is given below:

median(x, na.rm = FALSE, ...)

Here:

  • x is the input vector.
  • na.rm is a logical value; set it to TRUE to remove missing values (NA) before the calculation.

m <- c(12,78,34,4.32,18,2,24,21,8,25)

res <- median(m)
print(res)

Output

[1] 19.5
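To show what median() does with an odd or even number of values, here is a minimal sketch of the calculation. The function name manual_median is hypothetical, and it does not handle NA values or empty vectors.

```r
# Illustrative helper: median computed from the sorted values
manual_median <- function(x) {
  s <- sort(x)
  n <- length(s)
  if (n %% 2 == 1) {
    # Odd count: the single middle value
    s[(n + 1) / 2]
  } else {
    # Even count: average of the two middle values
    (s[n / 2] + s[n / 2 + 1]) / 2
  }
}

m <- c(12, 78, 34, 4.32, 18, 2, 24, 21, 8, 25)
print(manual_median(m))
print(manual_median(c(3, 1, 2)))
```

For the 10-value vector, the two middle values of the sorted series are 18 and 21, so the result matches median(m).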

Mode

The mode is the value that occurs most often in a set of data. Unlike the mean and median, the mode can be computed for both numeric and character data.

R has no standard built-in function to calculate the statistical mode (the built-in mode() function returns an object's storage mode instead). That's why we create a user function to calculate the mode of a data set. The function takes a vector as input and returns the mode as output. If several values are tied for the highest count, it returns the one that appears first in the vector.

getmode <- function(v) {
   uniqv <- unique(v)
   uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Numeric mode
v <- c(12,78,34,4.32,18,12,24,21,8,25)

res <- getmode(v)
print(res)

# Character mode
charv <- c("she","the","the","it","he")

# Calculate the mode using the user function.
res <- getmode(charv)
print(res)

Output

[1] 12
[1] "the"
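Since getmode() returns only the first value when there is a tie, a variant that returns every tied value can be useful. This is a sketch under that assumption; getmodes is a hypothetical name that reuses the same unique/match/tabulate approach.

```r
# Illustrative variant: return all values that share the highest count
getmodes <- function(v) {
  uniqv <- unique(v)
  counts <- tabulate(match(v, uniqv))
  uniqv[counts == max(counts)]
}

# 2 and 3 both occur twice, so both are returned
print(getmodes(c(1, 2, 2, 3, 3)))

# With a single most frequent value the result matches getmode()
print(getmodes(c("she", "the", "the", "it", "he")))
```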

Min and Max

The min() and max() functions in R return the lowest and highest value in a set. Their syntax is given below:

max(..., na.rm = FALSE)
min(..., na.rm = FALSE)

An example of calculating min and max is given below:

x <- c(12,78,34,4.32,18,12,24,21,8,25)

max(x)
min(x)

Output

[1] 78
[1] 4.32

An example of calculating min and max while ignoring NA values:

x <- c(12,78,34,4.32,18,12,24,21,8,25,NA)

max(x, na.rm=TRUE)
min(x, na.rm=TRUE)

Output

[1] 78
[1] 4.32
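As a related sketch, base R also offers range() to get both extremes in one call, and which.min() / which.max() to find where they sit in the vector. The variable names r, idx_min, and idx_max are only illustrative.

```r
x <- c(12, 78, 34, 4.32, 18, 12, 24, 21, 8, 25, NA)

# range() returns the minimum and maximum together
r <- range(x, na.rm = TRUE)
print(r)

# which.min() and which.max() return the position of the extreme value
# (missing values are skipped)
idx_min <- which.min(x)
idx_max <- which.max(x)
print(idx_min)
print(idx_max)
```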