R Package Dplyr Comparison of Functions

In R, you can accomplish the same task in different ways.

This document explains functions from the R package dplyr and, in some places, compares them with their base R counterparts.
# import the dplyr library
# we are going to work with R's built-in dataset airquality
library(dplyr)
head(airquality)
## Ozone Solar.R Wind Temp Month Day
## 1 41 190 7.4 67 5 1
## 2 36 118 8.0 72 5 2
## 3 12 149 12.6 74 5 3
## 4 18 313 11.5 62 5 4
## 5 NA NA 14.3 56 5 5
## 6 28 NA 14.9 66 5 6
Filter function--dplyr vs subset vs [ ]
You might have guessed what the filter function does: it keeps only the rows of the data that satisfy one or more conditions. Here I discuss the three commonly used ways of subsetting rows.
# filter data with Wind > 7.0 for the month of May
#dplyr way
filter(airquality,Wind > 7.0, Month == 5)
# base way
#first
airquality[airquality$Wind > 7.0 & airquality$Month == 5,]
#second
subset(airquality,Wind > 7.0 & Month == 5)
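I don't recommend using the subset function unless you fully understand how it works. Its help page describes it as a convenience function intended for interactive use and warns that its non-standard evaluation can have unanticipated consequences, so use caution, especially inside your own functions. For further reading, see the Stack Overflow thread http://stackoverflow.com/questions/9860090/in-r-why-is-better-than-subset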
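One practical difference between the three approaches shows up when the condition involves missing values. The sketch below is my own illustration, not from the original post: it filters on Ozone, which contains NAs in airquality. filter and subset drop rows where the condition evaluates to NA, whereas plain [ ] indexing returns an all-NA row for each of them unless you wrap the condition in which(). The object names are illustrative only.

```r
library(dplyr)

# Ozone has missing values, so Ozone > 100 is NA for those rows
cond <- airquality$Ozone > 100

by_filter  <- filter(airquality, Ozone > 100)   # NA conditions are dropped
by_subset  <- subset(airquality, Ozone > 100)   # NA conditions are dropped
by_bracket <- airquality[cond, ]                # NA conditions give all-NA rows
by_which   <- airquality[which(cond), ]         # which() skips NA, like filter

nrow(by_filter)
nrow(by_subset)
nrow(by_bracket)
nrow(by_which)

# rows produced by an NA condition are entirely NA
sum(is.na(by_bracket$Month))
```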
Mutate function
The mutate function creates new variables without affecting the existing ones: it adds the new variable and keeps all the old ones (the transmute function below shows why this is even worth noting). It can also transform an existing variable if you assign the result back to that variable's name.
The transmute function (yes, you read that right: it's transmute, not transmutate) does the same, but it drops every variable except the newly created ones. So if you only want the new variable(s) in the result, use transmute; otherwise, use mutate.
# mutate creates the new variable and keeps the existing variables
mutate(airquality, TempInC = (Temp - 32) * 5 / 9)
# transmute creates the new variable and drops the existing variables (I said variables):
# it keeps only the new variable and drops all other variables
transmute(airquality, TempInC = (Temp - 32) * 5 / 9)
# base way
# somehow I find it easier to use than mutate
# work on a copy so that airquality keeps its original 6 columns for the examples below
aq <- airquality
aq$TempInC <- (aq$Temp - 32) * 5 / 9
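A quick way to see the difference is to compare the columns each call returns. This is a minimal sketch; the object names m, m2 and t_only are only for illustration. It also shows mutate overwriting an existing column, and that neither function modifies airquality unless you assign the result.

```r
library(dplyr)

# mutate keeps all 6 original columns and adds TempInC
m <- mutate(airquality, TempInC = (Temp - 32) * 5 / 9)
ncol(m)

# transmute keeps only the newly created column
t_only <- transmute(airquality, TempInC = (Temp - 32) * 5 / 9)
names(t_only)

# assigning to an existing name transforms that column (still 6 columns)
m2 <- mutate(airquality, Temp = (Temp - 32) * 5 / 9)
ncol(m2)

# the original data frame is untouched
ncol(airquality)
```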
Arrange function
The arrange function sorts the rows of a data frame by one or more variables.
# dplyr--arrange
arrange(airquality,Month,desc(Temp))
# base--order()
airquality[order(airquality$Month,-airquality$Temp),]
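# Note: for descending order, wrap the variable in desc() in arrange(), or negate it in order() (negation works for numeric variables only)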
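To check that the two calls above sort the rows the same way, you can compare the sort keys of both results. This is a small sketch with illustrative object names (a, b); comparing only the key columns sidesteps any difference in row names between the dplyr and base results.

```r
library(dplyr)

# dplyr: sort by Month ascending, then Temp descending within each month
a <- arrange(airquality, Month, desc(Temp))

# base: order() with a minus sign on the numeric Temp column
b <- airquality[order(airquality$Month, -airquality$Temp), ]

# the sort keys come out in the same order
identical(a$Month, b$Month)
identical(a$Temp, b$Temp)

# the hottest May day appears first
head(a[, c("Month", "Temp")], 3)
```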
group_by and summarise functions
First let me explain the summarise function, and then we will move on to group_by.
The summarise function takes a vector (a column) as input and outputs a single value: you can ask for the min, max, mean, sd, var, median, etc. of a column, and summarise returns the result. Of course, base R gives you all these summary statistics too, but there is a catch: summarise works with the group_by function, while the base functions don't. Let me explain with examples.
# Both base and summarise give the same result for an ordinary (ungrouped) data frame/tibble
mean(airquality$Temp)
## [1] 77.88235
summarise(airquality, mean(Temp))
## mean(Temp)
## 1 77.88235
There is a subtle difference between the two outputs: the first returns a double (a numeric vector of length 1), while the latter returns a data frame with one row. But that doesn't concern us here; the key difference appears when they are used with the group_by function.
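The sketch below makes the return-type difference visible and also shows that summarise can compute several named statistics in one call. The column names mean_temp, max_wind, n_days and mean_ozone are my own choices; n() is dplyr's row-count helper. Because Ozone contains missing values, mean() needs na.rm = TRUE in both styles.

```r
library(dplyr)

base_mean <- mean(airquality$Temp)
class(base_mean)              # a plain numeric value

dplyr_mean <- summarise(airquality, mean(Temp))
is.data.frame(dplyr_mean)     # a one-row data frame

# several statistics at once, each with its own (illustrative) column name
stats <- summarise(airquality,
                   mean_temp  = mean(Temp),
                   max_wind   = max(Wind),
                   n_days     = n(),
                   mean_ozone = mean(Ozone, na.rm = TRUE))  # Ozone has NAs
stats

# without na.rm, a single NA makes the mean NA
mean(airquality$Ozone)
```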
If you know SQL, the group_by function may deceive you. In dplyr, group_by doesn't return one output row per group, as you might expect from SQL's GROUP BY; instead, it returns a new grouped table. Later operations on this table, such as summarise, are then carried out separately for each group of the grouping variable.
grouped_table<-group_by(airquality,Month)
head(grouped_table)
## Source: local data frame [6 x 6]
## Groups: Month [1]
##
## Ozone Solar.R Wind Temp Month Day
## <int> <int> <dbl> <int> <int> <int>
## 1 41 190 7.4 67 5 1
## 2 36 118 8.0 72 5 2
## 3 12 149 12.6 74 5 3
## 4 18 313 11.5 62 5 4
## 5 NA NA 14.3 56 5 5
## 6 28 NA 14.9 66 5 6
# Dimensions of the original dataset and the grouped dataset
dim(airquality)
## [1] 153 6
dim(grouped_table)
## [1] 153 6
Both have the same dimensions, and the first records of grouped_table look the same as those of the original dataset. But grouped_table is grouped by the Month variable: the 'Groups' line in the printed output names the variable(s) used in group_by.
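If you want to confirm the grouping programmatically rather than by reading the printed header, dplyr provides helpers such as group_vars() and n_groups() (group_vars() assumes dplyr 0.7.0 or later). The sketch also shows one of the actions a grouped table enables: mutate on a grouped table computes mean(Temp) within each month, so the hypothetical column temp_dev is each day's deviation from its own month's average.

```r
library(dplyr)

grouped_table <- group_by(airquality, Month)

group_vars(grouped_table)   # the grouping variable(s)
n_groups(grouped_table)     # one group per month

# inside a grouped mutate, mean(Temp) is the mean of the current month only
with_dev <- mutate(grouped_table, temp_dev = Temp - mean(Temp))

# deviations from each month's own mean sum to (numerically) zero per month
tapply(with_dev$temp_dev, with_dev$Month, sum)

# ungroup() removes the grouping again, leaving a single group
n_groups(ungroup(with_dev))
```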
You can group by more than one variable and then ask summarise for output. Now let's ask for the average (mean) using both mean and summarise and compare the results.
mean(grouped_table$Temp)
## [1] 77.88235
summarise(grouped_table,mean(Temp))
## # A tibble: 5 x 2
## Month mean(Temp)
## <int> <dbl>
## 1 5 65.54839
## 2 6 79.10000
## 3 7 83.90323
## 4 8 83.96774
## 5 9 76.90000
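In base R, per-group statistics are usually computed with functions such as tapply() or aggregate() rather than through a grouped table. As a rough cross-check (my own sketch, not part of the original comparison), the grouped summarise above and these base functions produce the same monthly means; mean_temp is an illustrative column name.

```r
library(dplyr)

# dplyr: grouped table, then summarise once per group
by_month <- summarise(group_by(airquality, Month), mean_temp = mean(Temp))

# base R: tapply splits Temp by Month and applies mean to each piece
base_means <- tapply(airquality$Temp, airquality$Month, mean)

# aggregate returns a data frame, closer in shape to summarise
base_df <- aggregate(Temp ~ Month, data = airquality, FUN = mean)

by_month
base_means
base_df
```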
dplyr::distinct vs base::unique
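As the names suggest, both unique and distinct return the unique (distinct) values. The difference is in what they work on: distinct takes a data frame and returns a data frame, while unique is a generic base function that also works on vectors and lists (of course).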
distinct(airquality,Month)
## Month
## 1 5
## 2 6
## 3 7
## 4 8
## 5 9
unique(airquality$Month)
## [1] 5 6 7 8 9
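distinct also accepts several columns, and with .keep_all = TRUE it keeps the whole first row for each distinct value. The sketch below compares it with unique() applied to a data frame, which removes duplicate rows. Since airquality has exactly one row per day from May to September, every Month/Day pair is distinct. Object names are illustrative.

```r
library(dplyr)

# one row per distinct month, as a data frame
months_df <- distinct(airquality, Month)
# one value per distinct month, as a vector
months_vec <- unique(airquality$Month)

# unique() on a data frame drops duplicate rows, like distinct() on those columns
nrow(distinct(airquality, Month, Day))
nrow(unique(airquality[, c("Month", "Day")]))

# keep every column of the first row seen for each month
first_days <- distinct(airquality, Month, .keep_all = TRUE)
first_days$Day
```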
dplyr::sample_n/sample_frac vs base::sample
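dplyr's sample_n and sample_frac are wrappers around the base sample.int function: sample_n draws a fixed number of rows, and sample_frac draws a fraction of the rows.
# the rows are drawn at random, so your output will differ
sample_n(airquality, size = 2)
## Ozone Solar.R Wind Temp Month Day
## 124 96 167 6.9 91 9 1
## 65 NA 101 10.9 84 7 4
# 1% of 153 rows, rounded to 2 rows
sample_frac(airquality, size = 0.01)
## Ozone Solar.R Wind Temp Month Day
## 57 NA 127 8.0 78 6 26
## 135 21 259 15.5 76 9 12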
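Because sampling is random, it helps to call set.seed() first when you need reproducible results. The sketch below shows that the same seed gives the same rows, and what the base equivalent looks like with sample() on row indices. In dplyr 1.0.0 and later, sample_n and sample_frac are superseded by slice_sample(), so the last call assumes a recent dplyr; the seed value 42 is arbitrary.

```r
library(dplyr)

set.seed(42)                       # arbitrary seed, for reproducibility
s1 <- sample_n(airquality, size = 2)
set.seed(42)
s2 <- sample_n(airquality, size = 2)
identical(s1, s2)                  # same seed, same rows

# base R: sample row indices, then index the data frame
set.seed(42)
base_rows <- airquality[sample(nrow(airquality), 2), ]

# assumes dplyr >= 1.0.0: slice_sample() supersedes sample_n()/sample_frac()
slice_sample(airquality, n = 2)
```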
Piping
The symbol %>% is the pipe operator (it comes from the magrittr package and is re-exported by dplyr). It connects function calls so that they run one after another without you having to save the intermediate results.
Simply put, the operator passes the value on its left as the first argument to the function on its right. If you want to pass the left-hand value as some other argument, use the . (dot) placeholder. Let me show you:
airquality %>% group_by(Month) %>%
  summarise(mean_wind = mean(Wind)) %>%
  arrange(desc(mean_wind))
## # A tibble: 5 x 2
## Month mean_wind
## <int> <dbl>
## 1 5 11.622581
## 2 6 10.266667
## 3 9 10.180000
## 4 7 8.941935
## 5 8 8.793548
The airquality data is used as the first argument of group_by. The intermediate grouped table is then passed as the first argument of summarise, and finally the summarised table is passed to arrange, which produces the output.
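To make the "first argument" rule concrete, here is a sketch showing that the pipeline above is equivalent to nesting the same calls, plus a use of the dot placeholder to send airquality to lm()'s data argument instead of its first argument. The lm() model is my own example, not from the original post.

```r
library(dplyr)  # re-exports %>% from magrittr

piped <- airquality %>%
  group_by(Month) %>%
  summarise(mean_wind = mean(Wind)) %>%
  arrange(desc(mean_wind))

# the same computation written inside-out, without the pipe
nested <- arrange(summarise(group_by(airquality, Month),
                            mean_wind = mean(Wind)),
                  desc(mean_wind))

identical(piped, nested)

# the dot places the left-hand side at data = . instead of the first argument
fit <- airquality %>% lm(Temp ~ Wind, data = .)
coef(fit)
```

R 4.1.0 and later also provide a native pipe, |>, which behaves similarly for simple chains but has different placeholder rules.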

Piping is commonly used when experimenting with data, and it also reduces the number of temporary variables you create during an analysis.
Other functions to look for
na_if - converts a specific value to NA, which is handy for suspicious placeholder values.
coalesce - picks the first non-missing value at each position when you give it more than one vector of the same length. Inspired by SQL's COALESCE.
tbl - creates a table from a data source.
recode - replaces values in both numeric and character vectors: numeric values are replaced based on position and character values based on name.
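Here is a small sketch of three of these helpers. The toy vectors and the -99 missing-value code are made up for illustration, and filling Ozone with its median is only one possible use of coalesce, not a recommendation from the original post.

```r
library(dplyr)

# na_if: treat the (hypothetical) placeholder -99 as missing
readings <- c(12, -99, 30)
na_if(readings, -99)

# coalesce: first non-missing value at each position, 0 as a final fallback
first_try  <- c(1, NA, NA)
second_try <- c(NA, 2, NA)
coalesce(first_try, second_try, 0)

# recode on a character vector: matched by name, unmatched values are kept
fruit <- c("a", "b", "c")
recode(fruit, a = "apple", b = "banana")

# illustrative use: fill missing Ozone readings with the column median
ozone_median <- median(airquality$Ozone, na.rm = TRUE)
filled <- coalesce(as.numeric(airquality$Ozone), ozone_median)
sum(is.na(filled))
```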
This completes our introduction to dplyr. It should help you start working with your data. Have fun!
About Rang Technologies:

Headquartered in New Jersey, Rang Technologies has
dedicated over a decade delivering innovative solutions and
best talent to help businesses get the most out of the latest
technologies in their digital transformation journey.
 BY: SANTHOSH SUBRAMANIAN  Jul 12 2016
© 2023 Rang Technologies Inc. All rights reserved.