## nice student project

In all of my undergraduate classes, I require a term project, done in groups of 3-4 students. Though the topic is specified, it is largely open-ended, a level of “freedom”...continue reading.

## Using R and H2O to identify product anomalies during the manufacturing process.

Introduction: We will identify anomalous products on the production line by using measurements from testing stations and deep learning models. Anomalous products are not failures; these anomalies are products close to...continue reading.

## How cdata Control Table Data Transforms Work

With all of the excitement surrounding cdata style control table based data transforms (the cdata ideas being named as the “replacements” for tidyr’s current methodology, by the tidyr authors themselves!)...continue reading.

## Why we Did Not Name the cdata Transforms wide/tall/long/short

We recently saw this UX (user experience) question from the tidyr author as he adapts tidyr to cdata techniques. The terminology that he is not adopting from cdata is “unpivot_to_blocks()”...continue reading.

## Decode Lyrics in Pop Music: Visualise Prose with the Songsim algorithm

The post Decode Lyrics in Pop Music: Visualise Prose with the Songsim algorithm appeared first on The Lucid Manager. The lyrics of songs are more and more repetitive. Within this...continue reading.

## A checklist for choosing between #rstats packages

The paradox of choice can at times be a challenge. There are well over 10,000 packages on CRAN now (likely 16,000), and there have been suggestions on how to find...continue reading.

## How to Speed Up Gradient Boosting by a Factor of Two

Our latest tool development at STATWORX: random boost, an algorithm twice as fast as gradient boosting, with comparable prediction performance. The post How to Speed Up Gradient Boosting by a...continue reading.

## How long since your team scored 100+ points? This blog’s first foray into the fitzRoy R package

When this blog moved from bioinformatics to data science I ran a Twitter poll to ask whether I should start afresh at a new site or continue here. “Continue here”,...continue reading.

## RStudio Connect 1.7.2

RStudio Connect 1.7.2 is ready to download, and this release contains some long-awaited functionality that we are excited to share. Several authentication and user-management tooling improvements have been added, including...continue reading.

## Upcoming talks in spring 2019

This spring, I’ll be giving talks at a couple of Meetups and conferences. March 26th: At the data lounge Bremen, I’ll be talking about Explainable Machine Learning. April 11th: At...continue reading.

## How to Avoid Publishing Credentials in Your Code

Roland Stevenson is a data scientist and consultant who may be reached on LinkedIn. When accessing an API or database in R, it is often necessary to provide credentials such...continue reading.

## Pivoting data frames just got easier thanks to pivot_wide() and pivot_long()

There’s a lot going on in the development version of {tidyr}. New functions for pivoting data frames, pivot_wide() and pivot_long(), are coming, and will replace the current functions, spread() and...continue reading.

## Data Science Software Reviews: Forrester vs. Gartner

In my previous post, I discussed Gartner’s reviews of data science software companies. In this post, I show Forrester’s coverage and discuss how radically different it is. As usual, this...continue reading.

## R and labelled data: Using quasiquotation to add variable and value labels #rstats

Labelling data is typically a task for end-users and is applied in their own scripts or functions rather than in packages. However, sometimes it can be useful for both end-users and...continue reading.

## Tidyverse users: gather/spread are on the way out

From https://twitter.com/sharon000/status/1107771331012108288: From https://tidyr.tidyverse.org/dev/articles/pivot.html: There are two important new features inspired by other R packages that have been advancing reshaping in R: The reshaping operation can be specified with...continue reading.

## Review of Data Walking

The Data Walking project was organised and written up by David Hunter at Ravensbourne University London (which you might remember …continue reading.

## Assumptions Matter More Than Dependencies

There’s been a lot of talk about “dependencies” in the R universe of late. This is not really a post about that but more of a “really, don’t do this” if...continue reading.

## tibble 2.1.1

Version 2.1.1 of the tibble package is on CRAN now. Tibbles are a modern reimagining of the data frame, keeping what time has shown to be effective, and throwing out...continue reading.

## RStudio Connect Quickstart

RStudio have recently announced ‘RStudio Connect QuickStart’, which is a VM containing a full suite of RStudio’s pro tools, available to be trialled for a 45-day period. RStudio Connect...continue reading.