Author: chris2016

chris2016

February 18, 2020

Part 6: How not to validate your model with optimism corrected bootstrapping

When evaluating a machine learning model, using the same data to train and test the model results in overfitting, so the model performs much better in predictive...continue reading.

chris2016

February 5, 2020

Consensus clustering in R

The logic behind the Monti consensus clustering algorithm is that in the face of resampling the ideal clusters should be stable, thus any pair of samples should either always or...continue reading.

chris2016

December 5, 2019

How to make a precision recall curve in R

Precision recall (PR) curves are useful for machine learning model evaluation when there is an extreme imbalance in the data and the analyst is particularly interested in one class. A...continue reading.

chris2016

November 26, 2019

How to easily make a ROC curve in R

A typical task in evaluating the results of machine learning models is making a ROC curve; this plot can inform the analyst how well a model can discriminate one class...continue reading.

chris2016

October 1, 2019

Fast adaptive spectral clustering in R (brain cancer RNA-seq)

Spectral clustering refers to a family of algorithms that cluster eigenvectors derived from the matrix that represents the input data’s graph. An important step in this method is running the...continue reading.

chris2016

June 8, 2019

Running UMAP for data visualisation in R

UMAP is a non-linear dimensionality reduction algorithm in the same family as t-SNE. In the first phase of UMAP, a weighted k-nearest-neighbour graph is computed; in the...continue reading.

chris2016

May 30, 2019

Quick and easy t-SNE analysis in R

t-SNE is a useful dimensionality reduction method that allows you to visualise data embedded in a lower number of dimensions, e.g. 2, in order to see patterns and trends in...continue reading.

chris2016

May 23, 2019

Easy quick PCA analysis in R

Principal component analysis (PCA) is very useful for doing some basic quality control (e.g. looking for batch effects) and assessment of how the data is distributed (e.g. finding outliers). A...continue reading.

chris2016

January 16, 2019

Using clusterlab to benchmark clustering algorithms

Clusterlab is a CRAN package (https://cran.r-project.org/web/packages/clusterlab/index.html) for the routine testing of clustering algorithms. It can simulate positive (data-sets with >1 clusters) and negative controls (data-sets with 1 cluster). Why test...continue reading.

chris2016

December 29, 2018

Part 5: Code corrections to optimism corrected bootstrapping series

The truth is out there, R readers, but often it is not what we have been led to believe. The previous post examined the strong positive results bias in optimism...continue reading.

chris2016

December 28, 2018

Part 4: More bias and why does bias occur in optimism corrected bootstrapping?

In the previous parts of the series we demonstrated a positive results bias in optimism corrected bootstrapping by simply adding random features to our labels. This problem is due to...continue reading.

chris2016

December 27, 2018

Part 3: Two more implementations of optimism corrected bootstrapping show shocking bias

Welcome to part III of debunking the optimism corrected bootstrap in high dimensions (quite high number of features) in the Christmas holidays. Previously we saw with a reproducible code implementation...continue reading.

chris2016

December 26, 2018

Part 2: Optimism corrected bootstrapping is definitely bias, further evidence

Some people are very fond of the technique known as ‘optimism corrected bootstrapping’; however, this method is clearly biased, and this becomes apparent as we increase the number of features...continue reading.

chris2016

December 25, 2018

Optimism corrected bootstrapping: a problematic method

There are lots of ways to assess how predictive a model is while correcting for overfitting. In Caret the main methods I use are leave one out cross validation, for...continue reading.

chris2016

August 26, 2018

Simulating NXN dimensional Gaussian clusters in R

Gaussian clusters are found in a range of fields, and simulating them is important because we will often want to test a given class discovery tool's performance under conditions where...continue reading.

chris2016

August 21, 2018

How to perform consensus clustering without overfitting and reject the null hypothesis

The Monti et al. (2003) consensus clustering algorithm is one of the most widely used class discovery techniques in the genome sciences and is commonly used to cluster transcriptomic, epigenetic,...continue reading.

chris2016

July 25, 2018

Forecasting the price of bitcoin with the CRAN forecast package

There is interest in bitcoin at the moment because it is displaying signs of steady year to year growth with brief boosts followed by rapid declines. It is considered a...continue reading.

chris2016

June 29, 2018

Bias in high dimensional optimism corrected bootstrap procedure

I have been working in high dimensional analysis to predict drug response in rheumatoid arthritis patients and I was concerned to find the procedure called optimism corrected bootstrapping over-fits as...continue reading.


