Author: Sara

April 22, 2020

R is for read_

The tidyverse is full of functions for reading data, beginning with “read_”. The read_csv I’ve used to access my reads2019 data is one example, falling under the read_delim functions. read_tsv...continue reading.

Sara

April 21, 2020

Q is for qplot versus ggplot

Two years ago, when I did Blogging A to Z of R, I talked about qplots. qplots are great for quick plots – which is why they’re named as such...continue reading.

Sara

April 19, 2020

P is for percent

We’ve used ggplots throughout this blog series, but today, I want to introduce another package that helps you customize scales on your ggplots – the scales package. I use this...continue reading.

Sara

April 18, 2020

O is for order_by

This will be a quick post on another tidyverse function, order_by. I’ll admit, I don’t use this one as often as arrange. It can be useful, though, if you don’t...continue reading.

Sara

April 17, 2020

N is for n_distinct

Today, we’ll start digging into some of the functions used to summarise data. The full summarise function will be covered for the letter S. For now, let’s look at one...continue reading.

Sara

April 16, 2020

M is for mutate

Today, we finally talk about the mutate function! I’ve used it a lot throughout the series so far, so it’s nice to get to discuss what it is and how...continue reading.

Sara

April 15, 2020

L is for Log Transformation

When visualizing data, outliers and skewed data can have a huge impact, potentially making your visualization difficult to understand. We can use many of the tricks covered so far to...continue reading.

Sara

April 14, 2020

H2O.ai Academic Program for Professors and Students: Part 2 – Creating Your First (Time Series) Experiment

Part 1 of this blog series discussed how to: apply for a free academic license for the H2O.ai automated machine learning (AutoML) platform Driverless AI, spin up a VM with the budget-oriented cloud provider Paperspace...continue reading.

Sara

April 14, 2020

K is for Keep or Drop Variables

A few times in this series, I’ve wanted to display part of a dataset, such as key variables, like Title, Rating, and Pages. The tidyverse allows you to easily keep...continue reading.

Sara

April 12, 2020

J is for Join

Today, we’ll start digging into the wonderful world of joins! The tidyverse offers several different types of joins between two datasets, X and Y: left_join – keeps all rows from X...continue reading.

Sara

April 11, 2020

I is for I Want to Learn More

This could have easily been a post about a function beginning with the letter I. But I wanted to take the opportunity to share some of the resources that really helped...continue reading.

Sara

April 10, 2020

H is for haven

The tidyverse includes many packages meant to make importing, wrangling, analyzing, and visualizing data easier. The haven package allows you to import files from other statistical software, such as SPSS,...continue reading.

Sara

April 9, 2020

G is for group_by

For the letter G, I’d like to introduce a very useful function: group_by. This function lets you group data by one or more variables. By itself, it may not seem...continue reading.

Sara

April 8, 2020

F is for filter

For the letter F – filters! Filters are incredibly useful, especially when combined with the main pipe %>%. I frequently use filters along with ggplot functions, to chart a specific...continue reading.

Sara

April 7, 2020

E is for Exposition Pipe

For the letter E, I want to talk about a set of operators provided by the tidyverse (specifically the magrittr package) that make for much prettier, easier-to-read code: pipes. The main...continue reading.

Sara

April 5, 2020

D is for dummy_cols

For the letter D, I’m going to talk about the dummy_cols function, which isn’t actually part of the tidyverse, but hey: my posts, my rules. This function is incredibly useful...continue reading.

Sara

April 4, 2020

Developing the right mindset for learning statistics: Some suggestions

Developing the right mindset for learning statistics: Some suggestions...continue reading.

Sara

April 4, 2020

C is for coalesce

For the letter C, we’ll talk about the coalesce function. If you’re familiar with SQL, you may have seen this function before. It combines two or more variables into a...continue reading.

Sara

April 3, 2020

B is for bind_rows

Moving on to the letter B, today we’ll talk about merging datasets that contain the same variables but add new cases. This is easily done with bind_rows. Let’s say I...continue reading.

Sara

April 2, 2020

A is for arrange

The arrange function allows you to sort a dataset by one or more variables, in either ascending or descending order. This function is especially helpful if you plan on aggregating your data...continue reading.


