## Summarize content of a vector or data.frame every n entries

I imagine that the same result can be achieved by a proper use of quantile, but I like to have an easy way to obtain summary statistics every n entries...continue reading.

When building visualizations with ggplot2 in R, I decided to create specialized functions that encapsulate plotting logic for some of my creations. In this case, instead of the commonly used aes function...continue reading.

In the last week, I’ve read an interesting article “Dispersion Models in Regression Analysis” by Peter Song (http://www.pakjs.com/journals/25%284%29/25%284%299.pdf), which describes a new class of models more general than classic generalized...continue reading.

Below is a schematic diagram of statistical models for fractional outcomes based on my studies done in early 2012. For details, please refer to my blog series in “Modeling Rates...continue reading.

On 06/23, I posted two SAS macros implementing GRNN (https://statcompute.wordpress.com/2013/06/23/prototyping-a-general-regression-neural-network-with-sas). However, in order to use these macros in the production environment, we still need a scheme to automatically choose the...continue reading.

From a technical perspective, people usually choose GRNN (general regression neural network) for function approximation with a continuous response variable and use PNN (probabilistic neural network) for...continue reading.

The last time I read the paper “A General Regression Neural Network” by Donald Specht was exactly 10 years ago, when I was in graduate school. After reading...continue reading.

With geo-tagging enabled, tweets include information on the location of the user when the tweet was sent. Miguel Rios (@miguelrios) has plotted locations of billions of tweets to create maps...continue reading.

A while ago I was playing around with the JavaScript package D3.js, and I began with this visualization—that I never really finished—of how a one-way ANOVA is calculated. I wanted...continue reading.

A common way of illustrating the idea behind statistical power in null hypothesis significance testing is by plotting the sampling distributions of the null hypothesis (\( H_0 \)) and the...continue reading.

When processing large data sets in R you often also end up creating large temporary objects. In order to keep the memory footprint small, it is always good to remove...continue reading.

In this post I will show some different examples of how to work with map projections and how to plot the maps using ggplot. Many maps that are shown using...continue reading.

The recent elections in Pakistan on May 11 were a great success by all accounts. In spite of the threats of violence by Al-Qaeda and its local franchises in Pakistan...continue reading.

In this post I show some R-examples on how to perform power analyses for mixed-design ANOVAs. The first example is analytical — adapted from formulas used in G*Power (Faul et...continue reading.

The second edition of Michael Crawley’s The R Book is available from Wiley. According to the publisher, the new edition boasts the following features: “Features full colour text and extensive graphics...continue reading.

Today marks 16 years and 367,496 messages since Martin Mächler started the R-help (321,119 msgs), R-devel (45,830 msgs) and R-announce (547 msgs) mailing lists [1] – a great...continue reading.

The Normalized Difference Vegetation Index (NDVI) estimates the greenness of plants covering the surface of the Earth by measuring the light reflected by the vegetation into space. The main idea...continue reading.

The feed-forward neural network is a very powerful classification model in the machine learning context. Since the goodness-of-fit of a neural network is largely determined by the model complexity, it...continue reading.

In my post yesterday comparing the efficiency of joining two data frames, I overlooked the computing cost of converting data.frames to data.tables / ff data objects. Today, I did the...continue reading.

In R, there are multiple ways to merge two data frames. However, there can be a huge disparity in terms of efficiency. Therefore, it is worthwhile to test the performance...continue reading.