Evaluation metrics play a critical role in the machine learning ecosystem. For machine learning products especially, evaluation metrics are like a heartbeat. They show how healthy the model is and...continue reading.
Category: R Books
Consider the following two Spark dataframes:
df1.show()
+----+------+-------+
|id_a|time_a|value_a|
+----+------+-------+
|   1|     1|     CA|
|   1|     2|     CA|
|   2|     1|     TX|
|   3|     5|     NE|
|   4|     6|     WA|
+----+------+-------+
df2.show(…continue reading.
In my last post, I compiled and cleaned publicly available data on over 4.5 million stops over the past 11 years. I also presented preliminary summary statistics showing that blacks had...continue reading.
The NYPD provides publicly available data on stop and frisks with data dictionaries, located here. The data, ranging from 2003 to 2014, contains information on over 4.5 million stops. Several...continue reading.
A widely used generative unsupervised clustering method is the Gaussian Mixture Model (GMM), which is also known as “EM clustering”. The idea of GMM is very simple: for a given dataset, each...continue reading.