Author: statcompute

January 30, 2013

Another Benchmark for Joining Two Data Frames

In my post yesterday comparing the efficiency of joining two data frames, I overlooked the computing cost of converting data.frames to data.table / ff objects. Today, I did the...continue reading.

statcompute

January 29, 2013

Efficiency in Joining Two Data Frames

In R, there are multiple ways to merge two data frames. However, their efficiency can differ enormously. Therefore, it is worthwhile to test the performance...continue reading.
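One general reason join methods can differ so much in cost is how matching rows are found. A naive nested-loop join compares every pair of rows (roughly n*m comparisons), while a hash join indexes one table by key once and then probes that index (roughly n + m work). The sketch below illustrates that idea in plain Python rather than reproducing the post's R benchmark; the list-of-dicts row layout and the function names are hypothetical.

```python
from collections import defaultdict


def nested_loop_join(left, right, key):
    """Inner join by comparing every left row with every right row."""
    out = []
    for l_row in left:
        for r_row in right:
            if l_row[key] == r_row[key]:
                out.append({**l_row, **r_row})
    return out


def hash_join(left, right, key):
    """Inner join by indexing the right table on the key, then probing it."""
    index = defaultdict(list)
    for r_row in right:
        index[r_row[key]].append(r_row)  # build phase: one pass over right
    out = []
    for l_row in left:
        # probe phase: .get() avoids inserting empty lists for missing keys
        for r_row in index.get(l_row[key], []):
            out.append({**l_row, **r_row})
    return out
```

Both functions return the same rows in the same order; only the amount of work differs as the tables grow.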

statcompute

January 12, 2013

PART – A Rule-Learning Algorithm

> require('RWeka')
> require('pROC')
>
> # SEPARATE DATA INTO TRAINING AND TESTING SETS
> df1 <- read.csv('credit_count.csv')
> df2 <- df1[df1$CARDHLDR == 1, 2:12]
> set.seed(2013)
> rows <-...continue reading.
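PART produces its model as a decision list: rules are checked in order, the first rule whose conditions hold assigns the class, and a default rule catches every remaining record. The Python sketch below shows only how such a list classifies a record once it has been learned; it does not learn rules the way RWeka's PART() does, and the feature names, thresholds, and class labels are hypothetical.

```python
# Hypothetical learned rules: (description, condition, predicted class),
# checked strictly from top to bottom.
HYPOTHETICAL_RULES = [
    ("income <= 2.0 and age <= 25", lambda r: r["income"] <= 2.0 and r["age"] <= 25, 1),
    ("major_derogs > 0", lambda r: r["major_derogs"] > 0, 1),
]
DEFAULT_CLASS = 0  # hypothetical default rule for records no rule covers


def classify(record, rules=HYPOTHETICAL_RULES, default=DEFAULT_CLASS):
    """Return the class of the first matching rule, else the default class."""
    for _description, condition, label in rules:
        if condition(record):
            return label
    return default
```

Because the first match wins, rule order matters: a record satisfying both rules above is classified by the first one.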

statcompute

January 2, 2013

Efficiency of Extracting Rows from A Data Frame in R

In the example below, 552 rows are extracted from a data frame with 10 million rows using six different methods. Results show a significant disparity between the least and the...continue reading.

statcompute

December 31, 2012

Modeling in R with Log Likelihood Function

Similar to the NLMIXED procedure in SAS, optim() in R makes it possible to estimate a model by specifying the log likelihood function explicitly. Below is a demo showing how to...continue reading.
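The core idea is to write the log likelihood as an ordinary function of the parameters and hand it to a general-purpose optimizer. As a minimal Python sketch of that idea, the code below maximizes a Poisson log likelihood with a one-dimensional golden-section search, swapped in here for optim(); the Poisson model and the sample counts are illustrative assumptions. For this model the maximizer should equal the sample mean, which gives an easy sanity check.

```python
import math


def poisson_loglik(lam, counts):
    """Poisson log likelihood: sum of y*log(lam) - lam - log(y!)."""
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in counts)


def golden_section_max(f, lo, hi, tol=1e-9):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2.0


def fit_poisson(counts):
    """Estimate the Poisson rate by numerically maximizing the log likelihood."""
    upper = max(counts) + 1.0  # the MLE (sample mean) cannot exceed max(counts)
    return golden_section_max(lambda lam: poisson_loglik(lam, counts), 1e-6, upper)
```

For example, fit_poisson([2, 3, 1, 4, 0, 2]) should land within rounding error of the sample mean, 2.0.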

statcompute

December 29, 2012

Surprising Performance of data.table in Data Aggregation

data.table (http://datatable.r-forge.r-project.org/) inherits from data.frame and provides fast subsetting, fast grouping, and fast joins. Previous posts showed that the shortest CPU time to aggregate a...continue reading.

statcompute

December 25, 2012

More about Aggregation by Group in R

Motivated by my young friend, HongMing Song, I managed to find more handy ways to calculate aggregated statistics by group in R. They require loading additional packages: plyr, doBy, Hmisc,...continue reading.
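Whichever package is used, aggregation by group comes down to splitting rows by a key and reducing each group. The minimal one-pass sketch below is plain Python, not one of the R approaches the posts compare, and the column names are hypothetical: it accumulates a count and a sum per group and derives the mean from them.

```python
from collections import defaultdict


def aggregate_by_group(rows, group_key, value_key):
    """Return {group: {"n", "sum", "mean"}} computed in a single pass."""
    acc = defaultdict(lambda: [0, 0.0])  # group -> [count, running sum]
    for row in rows:
        entry = acc[row[group_key]]
        entry[0] += 1
        entry[1] += row[value_key]
    return {g: {"n": n, "sum": s, "mean": s / n} for g, (n, s) in acc.items()}
```

Keeping only running counts and sums means memory grows with the number of groups, not the number of rows.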

statcompute

December 24, 2012

Aggregation by Group in R

An efficiency comparison among 4 methods...continue reading.

statcompute

December 24, 2012

Data Import Efficiency – A Case in R

Below is an R snippet comparing the data import efficiency of CSV, SQLite, and HDF5. As in the Python case posted yesterday, HDF5 shows the highest efficiency. continue reading.

statcompute

December 21, 2012

Removing Records by Duplicate Values in R – An Efficiency Comparison

After posting “Removing Records by Duplicate Values” yesterday, I had an interesting exchange with my friend Jeffrey Allard tonight about how to code this in R, a combination of...continue reading.

statcompute

December 20, 2012

Removing Records by Duplicate Values

Removing records from a data table based on duplicate values in one or more columns is a common and important data cleaning technique. Below is an example showing how...continue reading.
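As a minimal Python sketch of the idea, assuming records are dicts and that the first record seen for each combination of key values is the one to keep (the post may retain a different record, e.g. after sorting), duplicates can be dropped in one pass with a set of keys already seen.

```python
def drop_duplicates(rows, keys):
    """Keep the first record for each combination of values in `keys`."""
    seen = set()
    out = []
    for row in rows:
        k = tuple(row[c] for c in keys)  # composite key over one or more columns
        if k not in seen:
            seen.add(k)
            out.append(row)
    return out
```

Passing one column or several columns as `keys` covers both cases mentioned above.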

statcompute

December 19, 2012

Generalized Boosted Regression with A Monotonic Marginal Effect for Each Predictor

In the practice of risk modeling, it is sometimes mandatory to maintain a monotonic relationship between the response and each predictor. Below is a demonstration showing how to develop a...continue reading.

statcompute

December 17, 2012

Fractional Logit Model with Python

In [1]: import pandas as pd
In [2]: import statsmodels.api as sm
In [3]: data = pd.read_table('/home/liuwensui/Documents/data/csdata.txt')
In [4]: Y = data.LEV_LT3
In [5]: X = sm.add_constant(data[['COLLAT1', 'SIZE1', 'PROF2', 'LIQ',...continue reading.
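A fractional logit model keeps the logistic mean function but allows the response to be any proportion in [0, 1], estimating the coefficients by maximizing the Bernoulli quasi-log-likelihood. The excerpt imports statsmodels; the sketch below instead swaps in a dependency-free Newton-Raphson fit with a single predictor so the mechanics are visible. The function name and data are illustrative, and the data are constructed so that the coefficients (0.5, 1.0) should be recovered.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_fractional_logit(x, y, iters=25):
    """Fit E[y|x] = sigmoid(b0 + b1*x) for y in [0, 1] by Newton-Raphson on
    the quasi-log-likelihood sum(y*log(mu) + (1 - y)*log(1 - mu))."""
    if any(not 0.0 <= yi <= 1.0 for yi in y):
        raise ValueError("fractional response must lie in [0, 1]")
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0          # score (gradient) components
        h00 = h01 = h11 = 0.0  # information matrix: sum of mu*(1-mu)*x*x'
        for xi, yi in zip(x, y):
            mu = sigmoid(b0 + b1 * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu
            g1 += (yi - mu) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) @ score, 2x2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

Unlike an ordinary logit, nothing here requires y to be 0 or 1; the same score equations simply accept proportions.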

statcompute

December 3, 2012

Exchange Data between Python and R with SQLite

SQLite is a lightweight, zero-configuration database. Being fast, reliable, and simple, SQLite is a good choice to store and query large data, e.g. terabytes, and is well supported by...continue reading.
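The exchange works because both languages can read and write the same SQLite database file. On the Python side, the standard-library sqlite3 module is enough; the sketch below writes rows to a table and reads them back. The helper names, table name, columns, and values are hypothetical, and in practice `con` would come from sqlite3.connect() on a file path that R then opens with its own SQLite driver (e.g. RSQLite).

```python
import sqlite3


def write_table(con, name, columns, rows):
    """(Re)create table `name` with `columns` and insert `rows` (tuples)."""
    # Identifiers cannot be bound as SQL parameters, so restrict them instead.
    if not name.isidentifier() or not all(c.isidentifier() for c in columns):
        raise ValueError("table and column names must be plain identifiers")
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    con.execute(f"DROP TABLE IF EXISTS {name}")
    con.execute(f"CREATE TABLE {name} ({cols})")
    con.executemany(f"INSERT INTO {name} ({cols}) VALUES ({marks})", rows)
    con.commit()


def read_table(con, name):
    """Return (column names, list of row tuples) in insertion order."""
    if not name.isidentifier():
        raise ValueError("table name must be a plain identifier")
    cur = con.execute(f"SELECT * FROM {name} ORDER BY rowid")
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()
```

Values are passed through `?` placeholders, so only the identifiers need validating.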

statcompute

October 12, 2012

Download Stock Price Online with R

library(chron)
library(zoo)
# STOCK TICKER OF Fifth Third Bancorp
stock <- 'FITB'
# DEFINE STARTING DATE
start.date <- 1
start.month <- 1
start.year <- 2012
# DEFINE ENDING DATE
end.date...continue reading.

statcompute

October 8, 2012

Fit and Visualize A MARS Model

#################################################
## FIT A MULTIVARIATE ADAPTIVE REGRESSION      ##
## SPLINES MODEL (MARS) USING MDA PACKAGE      ##
## DEVELOPED BY HASTIE AND TIBSHIRANI          ##
##############################################…continue reading.
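A fitted MARS model is an intercept plus a weighted sum of hinge functions, max(0, x - t) and max(0, t - x), placed at knots t chosen by the algorithm. The Python sketch below only evaluates an additive, single-predictor model of that form with hypothetical knots and coefficients; it does not perform the forward/backward knot search that the mda package's mars() carries out.

```python
def hinge(x, knot, direction):
    """MARS basis function: max(0, x - knot) if direction is +1,
    max(0, knot - x) if direction is -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate intercept + sum(coef * hinge(x, knot, direction)).

    terms: list of (coef, knot, direction) tuples for one predictor
    (degree-1, additive model; interaction terms are not handled here)."""
    return intercept + sum(c * hinge(x, k, d) for c, k, d in terms)


# Hypothetical fitted terms: a mirrored hinge pair sharing the knot at 3.0.
HYPOTHETICAL_TERMS = [(2.0, 3.0, 1), (-0.5, 3.0, -1)]
```

With these terms, the fitted line has slope 0.5 left of the knot and slope 2.0 right of it, which is the piecewise-linear shape a MARS plot typically shows.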
