## Fully General Record Transforms with cdata

One of the design goals of the cdata R package is that very powerful and arbitrary record transforms should be convenient and take only one or two steps. In fact...
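The control-table idea behind such record transforms can be sketched in base R. The function `rowrecs_to_blocks_sketch()` below is hypothetical and is not cdata's API. It assumes a control table with a `measure` column (the label each new row gets) and a `column` column (the wide column that supplies the value), and it turns each row record into a block of rows.

```r
# Hypothetical sketch of a control-table-driven record transform (not cdata's API).
# Each row of `control` names one wide column (`column`) and the label it gets (`measure`).
rowrecs_to_blocks_sketch <- function(d, key_col, control) {
  pieces <- lapply(seq_len(nrow(control)), function(i) {
    data.frame(
      key = d[[key_col]],
      measure = control$measure[i],
      value = d[[control$column[i]]],
      stringsAsFactors = FALSE
    )
  })
  res <- do.call(rbind, pieces)
  names(res)[names(res) == "key"] <- key_col
  # Order so each original row record becomes a contiguous block of rows
  res <- res[order(res[[key_col]], res$measure), , drop = FALSE]
  rownames(res) <- NULL
  res
}

d <- data.frame(id = c(1, 2), a = c(10, 20), b = c(30, 40))
control <- data.frame(measure = c("x", "y"), column = c("a", "b"),
                      stringsAsFactors = FALSE)
rowrecs_to_blocks_sketch(d, "id", control)
```

Because the whole transform is described by the control table, changing the record shape means editing data rather than writing new reshaping code.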

To make teaching R quasi-quotation easier, it would be nice if R string interpolation and quasi-quotation both used the same notation. They are related concepts, so some commonality of notation would...
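Base R's `bquote()` already marks spliced values with `.()`. The sketch below pairs it with a hypothetical string interpolator, `interp_sketch()`, that reuses the same `.()` marker inside strings. It is an illustration of the shared-notation idea only, not a function from any package, and it does not handle nested parentheses inside `.()`.

```r
# Quasi-quotation in base R: .(v) splices the value of v into the expression
v <- 5
expr <- bquote(x + .(v))
print(expr)  # x + 5

# Hypothetical string interpolation using the same .() marker (illustration only).
# Limitation: the code inside .() must not itself contain parentheses.
interp_sketch <- function(template, env = parent.frame()) {
  pattern <- "\\.\\(([^)]+)\\)"
  tokens <- regmatches(template, gregexpr(pattern, template, perl = TRUE))[[1]]
  for (tok in tokens) {
    code <- sub(pattern, "\\1", tok, perl = TRUE)    # text between .( and )
    val <- eval(parse(text = code), envir = env)     # evaluate it in the caller's environment
    template <- sub(tok, format(val), template, fixed = TRUE)
  }
  template
}

interp_sketch("v is .(v), v + 1 is .(v + 1)")
```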

R Tip: use inline operators for legibility. A Python feature I miss when working in R is the convenience of Python’s inline `+` operator. In Python, `+` does the right...
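R lets users define their own infix operators of the form `%name%`, which gives a similarly compact notation. The operator `%plus%` below is a hypothetical illustration (not from any package): it joins two single strings as text and otherwise concatenates vectors or lists with `c()`, loosely mirroring what Python's `+` does for strings and lists.

```r
# Hypothetical inline operator (illustration only), loosely mimicking Python's +
`%plus%` <- function(a, b) {
  if (is.character(a) && is.character(b) && length(a) == 1 && length(b) == 1) {
    paste0(a, b)   # string + string: join the text
  } else {
    c(a, b)        # otherwise: concatenate vectors or lists
  }
}

"abc" %plus% "def"
1:3 %plus% 4:5
list(1) %plus% list("a")
```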

Please help share our news and this discount. The second edition of our best-selling book, *Practical Data Science with R* (Zumel, Mount), is featured as deal of the day at...

R Tip: use `seqi()` for indexing. R’s “1:0 trap” is a mal-feature that confuses newcomers and is a reliable source of bugs. This note will show how to use `seqi()`...
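The trap is that `1:length(x)` counts down when `x` is empty, so a loop over it runs twice instead of zero times. The sketch shows the base-R behavior and a hypothetical helper, `seqi_sketch()`, written in the spirit of `seqi()`; it is an assumption about the intended semantics (empty result when the start exceeds the end), not wrapr's implementation.

```r
x <- c()                 # an empty vector (NULL), length 0
1:length(x)              # 1 0  -- the "1:0 trap": two iterations, not zero
seq_len(length(x))       # integer(0): a loop over this never runs

# Hypothetical helper (illustration only): empty sequence when a > b
seqi_sketch <- function(a, b) {
  if (a > b) integer(0) else a:b
}

seqi_sketch(1, length(x))   # integer(0)
seqi_sketch(2, 4)           # 2 3 4
```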

While working on a variation of the RcppDynProg algorithm, we derived the following beautiful identity of 2-by-2 real matrices: The superscript $\top$ denotes the transpose operation, and $\|\cdot\|_2^2$...

While developing the RcppDynProg R package, I took a little extra time to port the core algorithm from C++ to both R and Python. This means I can time the...

One often hears that R cannot be fast (false), or, more correctly, that for fast code in R you may have to consider “vectorizing.” A lot of knowledgeable R...
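The loop-versus-vectorized distinction can be seen in a toy base-R example. The functions `square_loop()` and `square_vec()` are hypothetical illustrations, not code from the note. Both return the same values; only their running times differ, by an amount that depends on the machine and R version.

```r
# Loop version: explicit element-by-element work into a preallocated result
square_loop <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) {
    out[i] <- x[i]^2
  }
  out
}

# Vectorized version: a single arithmetic operation over the whole vector
square_vec <- function(x) {
  x^2
}

x <- runif(1e5)
stopifnot(isTRUE(all.equal(square_loop(x), square_vec(x))))
system.time(square_loop(x))  # timings depend on the machine and R version
system.time(square_vec(x))
```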

RcppDynProg is a new Rcpp-based R package that implements simple, but powerful, table-based dynamic programming. This package can be used to optimally solve the minimum-cost partition into intervals...
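The table-based recurrence behind a minimum-cost partition into intervals can be written in a few lines of plain R: the best cost for the prefix `y[1..j]` is the minimum, over start points `i`, of the best cost for `y[1..(i-1)]` plus the cost of the interval `i..j`. This is a slow O(n^3) sketch for illustration, not RcppDynProg's implementation or API. The cost function (within-interval sum of squared deviations plus a fixed per-interval `penalty`) is an assumption chosen for the example.

```r
# Assumed (hypothetical) interval cost: squared deviations from the interval mean,
# plus a fixed penalty per interval so that fewer intervals are preferred.
interval_cost <- function(y, i, j, penalty) {
  seg <- y[i:j]
  sum((seg - mean(seg))^2) + penalty
}

partition_intervals <- function(y, penalty = 1) {
  n <- length(y)
  best <- c(0, rep(Inf, n))  # best[k + 1] = minimum cost of partitioning y[1..k]
  prev <- integer(n)         # prev[j] = start of the last interval in the best partition of y[1..j]
  for (j in seq_len(n)) {
    for (i in seq_len(j)) {
      cand <- best[i] + interval_cost(y, i, j, penalty)  # best[i] covers y[1..(i-1)]
      if (cand < best[j + 1]) {
        best[j + 1] <- cand
        prev[j] <- i
      }
    }
  }
  # Walk back through the table to recover the interval start points
  starts <- integer(0)
  j <- n
  while (j > 0) {
    starts <- c(prev[j], starts)
    j <- prev[j] - 1
  }
  list(cost = best[n + 1], starts = starts)
}

y <- c(0, 0.1, -0.1, 5, 5.2, 4.9)
partition_intervals(y, penalty = 1)   # two intervals, starting at positions 1 and 4
```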

vtreat’s purpose is to produce purely numeric R `data.frame`s that are ready for supervised predictive modeling (predicting a value from other values). By “ready” we mean: a purely numeric data...

In our last note we used `wrapr::qe()` to help quote expressions. In this note we will discuss quoting and code-capturing interfaces (interfaces that capture user source code) a bit more...

Pipelines in R are popular, the most popular one being magrittr as used by dplyr. This note will discuss the advanced reusable piping systems: rquery/rqdatatable operator trees and wrapr function...

Reusable modeling pipelines are a practical idea that gets redeveloped many times in many contexts. wrapr supplies a particularly powerful pipeline notation and a pipe-stage reuse system. We...

This note is a comment on some of the timings shared in the dplyr-0.8.0 pre-release announcement. The original published timings were as follows: With performance metrics: measurements are marketing. So...

Our group has done a lot of work with non-standard calling conventions in R. Our tools work hard to eliminate non-standard calling (as is the purpose of `wrapr::let()`) or at...

Many R users appear to be big fans of “code capturing” or “non-standard evaluation” (NSE) interfaces. In this note we will discuss quoting and non-quoting interfaces in R. The...
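The difference between the two interface styles can be shown with two small base-R functions. Both `col_mean_se()` and `col_mean_nse()` are hypothetical illustrations: the first takes the column name as an ordinary string value, while the second captures the unevaluated argument with `substitute()` and turns it into a name with `deparse()`.

```r
# Non-quoting (standard evaluation) interface: the column name is an ordinary string value
col_mean_se <- function(d, col) {
  mean(d[[col]])
}

# Code-capturing (non-standard evaluation) interface: the argument is captured as code
col_mean_nse <- function(d, col) {
  col_name <- deparse(substitute(col))
  mean(d[[col_name]])
}

col_mean_se(iris, "Sepal.Length")
col_mean_nse(iris, Sepal.Length)

# The string-based version composes directly with variables holding column names
v <- "Petal.Length"
col_mean_se(iris, v)
```

Calling `col_mean_nse(iris, v)` instead would look for a column literally named `v`, which is the usual friction point when code-capturing interfaces are used inside other functions.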

This note is just a quick follow-up to our last note on correcting the bias in estimated standard deviations for binomial experiments. For normal deviates there is, of course, a...
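As background, a well-known correction for i.i.d. normal samples of size $n$ is the factor $c_4(n) = \sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)$, which satisfies $E[s] = c_4(n)\,\sigma$ for the usual sample standard deviation $s$. The excerpt does not show which form the note uses, so treat this as a sketch of the standard result. `lgamma()` is used to avoid overflow for large $n$; `sd_unbiased_normal()` is a hypothetical helper name.

```r
# c4(n): E[s] = c4(n) * sigma for i.i.d. normal samples of size n (n >= 2)
c4 <- function(n) {
  sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))
}

c4(c(2, 5, 10, 30))   # below 1, approaching 1 as n grows

# Hypothetical helper: bias-corrected estimate of sigma for normal data
sd_unbiased_normal <- function(x) {
  sd(x) / c4(length(x))
}
```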

This note is about attempting to remove the bias introduced by using sample standard deviations to estimate the unknown true standard deviation of a population. We establish there...
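The bias is easy to see exactly for a binomial experiment. If $k$ of $n$ Bernoulli draws are 1, the sample variance (with the $n - 1$ denominator) is $k(n-k)/(n(n-1))$, so the expected sample standard deviation is a finite sum over $k$ weighted by `dbinom()`. This enumeration is a sketch for illustration, not necessarily the method the note uses; the function names are hypothetical. By Jensen's inequality $E[s] \le \sqrt{E[s^2]} = \sigma$, so the sum always comes out at or below the true value.

```r
# Exact E[s] for n independent Bernoulli(p) draws, summing over the number of successes k
expected_sample_sd_bernoulli <- function(n, p) {
  k <- 0:n
  s <- sqrt(k * (n - k) / (n * (n - 1)))   # sample sd when k of the n draws are 1
  sum(dbinom(k, n, p) * s)
}

true_sd <- function(p) sqrt(p * (1 - p))

expected_sample_sd_bernoulli(2, 0.5)    # well below true_sd(0.5) = 0.5
expected_sample_sd_bernoulli(20, 0.5)   # closer to 0.5, but still below it
```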

R is designed to make working with statistical models fast, succinct, and reliable. For instance, building a model is a one-liner: `model <- lm(Petal.Length ~ Sepal.Length, data = iris)`. And...
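The object returned by that one-liner plugs directly into R's standard generic functions. The short sketch below uses only base R and the built-in `iris` data; the new `Sepal.Length` values are arbitrary example inputs.

```r
model <- lm(Petal.Length ~ Sepal.Length, data = iris)

coef(model)                 # intercept and slope
summary(model)$r.squared    # fraction of variance explained

# Predictions for new rows; newdata must contain the predictor column by name
predict(model, newdata = data.frame(Sepal.Length = c(5, 6, 7)))
```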

`coalesce` is a classic, useful SQL operator that picks the first non-NULL value in a sequence of values. We thought we would share a nice version of it for picking...
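In R, missing values are marked with `NA` rather than SQL's `NULL`, so an element-wise analogue picks, position by position, the first non-`NA` value across several vectors. The function `coalesce_sketch()` below is a hypothetical illustration built from base `Reduce()` and `ifelse()`, not the version the note shares.

```r
# Hypothetical element-wise coalesce (illustration only): for each position,
# keep the value from the earliest argument that is not NA.
coalesce_sketch <- function(...) {
  Reduce(function(a, b) ifelse(is.na(a), b, a), list(...))
}

coalesce_sketch(c(NA, 2, NA), c(1, NA, NA), c(9, 9, 9))   # 1 2 9
```

Folding left with `Reduce()` means earlier arguments always take priority, which matches SQL's left-to-right `coalesce` semantics.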