Data is as prevalent as it is daunting. It can help us achieve amazing new discoveries, but misinterpreting it can lead to fallacious reasoning. Data has always played a central role in reasoning, but that role has grown exponentially in recent years. Justifications for claims and decisions often come down to data.
For this reason, it's vital to understand the methods used to leverage data. This page provides intuitive applications that you can use right in your browser to explore otherwise confusing topics. Each application comes with a write-up explaining the methods and their common uses.
As data becomes more prevalent in everyone's lives, it's vital to know the basic concepts. However, many people struggle whenever math-talk pops up. I built this app to demonstrate three of the most important concepts in all of statistics and data analysis: Mean, Variance, and Correlation. Seeing how these parameters determine the shape of data provides an intuitive, practical understanding that sidesteps technical jargon.
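To make that concrete outside the app, here's a minimal Python sketch (my own illustration, not the app's actual code) that draws a cloud of points from a chosen mean, variance, and correlation, then recovers those same parameters from the sample:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw a bivariate normal sample whose shape is controlled by
# the mean (location), variance (spread), and correlation (tilt).
mean = [0.0, 0.0]
var_x, var_y, corr = 1.0, 2.0, 0.8
cov = [[var_x, corr * np.sqrt(var_x * var_y)],
       [corr * np.sqrt(var_x * var_y), var_y]]
x, y = rng.multivariate_normal(mean, cov, size=1000).T

# Recover the three parameters from the sample itself.
print("mean:       ", x.mean(), y.mean())
print("variance:   ", x.var(ddof=1), y.var(ddof=1))
print("correlation:", np.corrcoef(x, y)[0, 1])
```

Changing any one of the three inputs and re-running shows exactly what the app shows visually: the mean slides the cloud, the variance stretches it, and the correlation tilts it.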
Probability theory is the cornerstone of data science and decision making. And the cornerstone of probability theory is Bayes' Rule. Understanding Bayes' Rule is key to understanding how to update one's strategies based on evidence. In this project I provide a simple yet effective visualization of probability theory to show how Bayes' Rule works without the formulas.
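For readers who do want a peek behind the visualization, here's a small worked example in Python (the numbers are hypothetical, chosen only for illustration) showing Bayes' Rule updating a prior belief after a positive diagnostic test:

```python
# Bayes' Rule: P(H | E) = P(E | H) * P(H) / P(E)
# Hypothetical numbers for a diagnostic-test example.
p_h = 0.01              # prior: 1% of people have the condition
p_e_given_h = 0.95      # test sensitivity
p_e_given_not_h = 0.05  # false-positive rate

# Total probability of a positive test, P(E).
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Posterior: probability of the condition given a positive test.
p_h_given_e = p_e_given_h * p_h / p_e
print(f"P(condition | positive test) = {p_h_given_e:.3f}")  # ~0.161
```

Even with a fairly accurate test, the low prior keeps the posterior near 16%, which is exactly the kind of counterintuitive result the visualization is designed to make intuitive.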
Clustering is a family of unsupervised machine learning methods that partition data. The point is to find naturally occurring subsets of the data, to carve it at its joints, you might say. In this project, I discuss the differences between three clustering methods: K-means, Gaussian Mixture Modelling, and density-based clustering (DBSCAN). This includes an app that allows you to compare the clusterings in real time!
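As a rough companion to the app (an illustration of mine; the app itself may be built differently), here's a minimal scikit-learn sketch that runs all three methods on the same dataset so their differences are easy to see:

```python
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, DBSCAN
from sklearn.mixture import GaussianMixture

# Two interleaved half-moons: a shape that separates the three methods.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# K-means: assigns each point to the nearest of k centroids.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Gaussian mixture: soft assignment to k Gaussian components.
gmm_labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X)

# DBSCAN: grows clusters from dense regions; no k required.
db_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

for name, labels in [("k-means", km_labels), ("GMM", gmm_labels),
                     ("DBSCAN", db_labels)]:
    print(name, "found", len(set(labels) - {-1}), "clusters")
```

One key difference is already visible in the code: K-means and the Gaussian mixture must be told the number of clusters up front, while DBSCAN infers it from the density of the data.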
In this set of projects, which together comprise my dissertation, I examine how the distribution of statistically generated data across networks affects the reliability and accuracy of the theories that result from that data. This is done by using bandit problems to simulate scientific data generation, graph algorithms to distribute the data, and Bayesian learning to update theories.
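The sketch below is a toy version of that pipeline, not the dissertation's actual model: the arm payoffs, ring network, and myopic arm choice are all illustrative assumptions. Agents on a ring repeatedly test two "theories" (bandit arms), share outcomes with their neighbors, and update Beta-distributed beliefs about each arm:

```python
import random

random.seed(0)

N_AGENTS, N_ROUNDS = 10, 200
TRUE_RATES = [0.5, 0.55]  # arm 1 is objectively better

# Each agent holds Beta(alpha, beta) beliefs for both arms.
beliefs = [[[1, 1], [1, 1]] for _ in range(N_AGENTS)]
neighbors = [((i - 1) % N_AGENTS, (i + 1) % N_AGENTS)
             for i in range(N_AGENTS)]

for _ in range(N_ROUNDS):
    results = []
    for i in range(N_AGENTS):
        # Pull the arm with the higher expected success rate (myopic choice).
        means = [a / (a + b) for a, b in beliefs[i]]
        arm = means.index(max(means))
        outcome = 1 if random.random() < TRUE_RATES[arm] else 0
        results.append((arm, outcome))
    for i in range(N_AGENTS):
        # Update on your own result and on both neighbors' results.
        for j in (i, *neighbors[i]):
            arm, outcome = results[j]
            beliefs[i][arm][0] += outcome
            beliefs[i][arm][1] += 1 - outcome

better = sum(1 for b in beliefs
             if b[1][0] / sum(b[1]) > b[0][0] / sum(b[0]))
print(f"{better}/{N_AGENTS} agents favor the better arm")
```

Varying who shares results with whom in a setup like this is what lets the simulations test how network structure shapes the reliability of the theories the community converges on.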