Data is as prevalent as it is daunting. It can help us achieve amazing new discoveries, but misinterpreting it can lead to fallacious reasoning. Data has always played a central role in reasoning, but that role has grown exponentially in recent years. Justifications for claims and decisions often come down to data.
For this reason, it's vital to understand the methods used to leverage data. This page provides intuitive applications that you can use right in your browser to explore otherwise confusing topics. Each application comes with a write-up explaining the methods and their common uses.
As data becomes more prevalent in everyone's lives, it's vital to know the basic concepts. However, many people struggle whenever math-talk pops up. I built this app to demonstrate three of the most important concepts in all of statistics and data analysis: Mean, Variance, and Correlation. Seeing how these parameters determine the shape of data provides an intuitive, practical understanding that sidesteps technical jargon.
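To make that concrete outside the app, here's a minimal Python sketch (my own illustration, not the app's actual code) that draws a cloud of points from a chosen mean, variance, and correlation, then recovers those same parameters from the sample:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw a bivariate normal sample whose shape is controlled by
# the mean (location), variance (spread), and correlation (tilt).
mean = [0.0, 0.0]
var_x, var_y, corr = 1.0, 2.0, 0.8
cov = [[var_x, corr * np.sqrt(var_x * var_y)],
       [corr * np.sqrt(var_x * var_y), var_y]]
x, y = rng.multivariate_normal(mean, cov, size=1000).T

# Recover the three parameters from the sample itself.
print("mean:       ", x.mean(), y.mean())
print("variance:   ", x.var(ddof=1), y.var(ddof=1))
print("correlation:", np.corrcoef(x, y)[0, 1])
```

Changing any one of the three inputs and re-running shows exactly what the app shows visually: the mean slides the cloud, the variance stretches it, and the correlation tilts it.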
Probability theory is the cornerstone of data science and decision making. And the cornerstone of probability theory is Bayes' Rule. Understanding Bayes' Rule is key to understanding how to update one's strategies based on evidence. In this project I provide a simple yet effective visualization of probability theory to show how Bayes' Rule works without the formulas.
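For readers who do want a peek behind the visualization, here's a small worked example in Python (the numbers are hypothetical, chosen only for illustration) showing Bayes' Rule updating a prior belief after a positive diagnostic test:

```python
# Bayes' Rule: P(H | E) = P(E | H) * P(H) / P(E)
# Hypothetical numbers for a diagnostic-test example.
p_h = 0.01              # prior: 1% of people have the condition
p_e_given_h = 0.95      # test sensitivity
p_e_given_not_h = 0.05  # false-positive rate

# Total probability of a positive test, P(E).
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Posterior: probability of the condition given a positive test.
p_h_given_e = p_e_given_h * p_h / p_e
print(f"P(condition | positive test) = {p_h_given_e:.3f}")  # ~0.161
```

Even with a fairly accurate test, the low prior keeps the posterior near 16%, which is exactly the kind of counterintuitive result the visualization is designed to make intuitive.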
Clustering is a family of unsupervised machine learning methods that partition data. The point is to find naturally occurring subsets of the data, to carve it at its joints, you might say. In this project, I discuss the differences between three clustering methods: K-means, Gaussian Mixture Modelling, and density-based clustering (DBSCAN). This includes an app that allows you to compare the clusterings in real time!
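As a rough companion to the app (an illustration of mine; the app itself may be built differently), here's a minimal scikit-learn sketch that runs all three methods on the same dataset so their differences are easy to see:

```python
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, DBSCAN
from sklearn.mixture import GaussianMixture

# Two interleaved half-moons: a shape that separates the three methods.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# K-means: assigns each point to the nearest of k centroids.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Gaussian mixture: soft assignment to k Gaussian components.
gmm_labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X)

# DBSCAN: grows clusters from dense regions; no k required.
db_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

for name, labels in [("k-means", km_labels), ("GMM", gmm_labels),
                     ("DBSCAN", db_labels)]:
    print(name, "found", len(set(labels) - {-1}), "clusters")
```

One key difference is already visible in the code: K-means and the Gaussian mixture must be told the number of clusters up front, while DBSCAN infers it from the density of the data.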
In this set of projects, which together comprise my dissertation, I examine how the distribution of statistically generated data across networks affects the reliability and accuracy of the theories that result from that data. This is done by using bandit problems to simulate scientific data generation, graph algorithms to distribute the data, and Bayesian learning to update theories.
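The sketch below is a toy version of that pipeline, not the dissertation's actual model: the arm payoffs, ring network, and myopic arm choice are all illustrative assumptions. Agents on a ring repeatedly test two "theories" (bandit arms), share outcomes with their neighbors, and update Beta-distributed beliefs about each arm:

```python
import random

random.seed(0)

N_AGENTS, N_ROUNDS = 10, 200
TRUE_RATES = [0.5, 0.55]  # arm 1 is objectively better

# Each agent holds Beta(alpha, beta) beliefs for both arms.
beliefs = [[[1, 1], [1, 1]] for _ in range(N_AGENTS)]
neighbors = [((i - 1) % N_AGENTS, (i + 1) % N_AGENTS)
             for i in range(N_AGENTS)]

for _ in range(N_ROUNDS):
    results = []
    for i in range(N_AGENTS):
        # Pull the arm with the higher expected success rate (myopic choice).
        means = [a / (a + b) for a, b in beliefs[i]]
        arm = means.index(max(means))
        outcome = 1 if random.random() < TRUE_RATES[arm] else 0
        results.append((arm, outcome))
    for i in range(N_AGENTS):
        # Update on your own result and on both neighbors' results.
        for j in (i, *neighbors[i]):
            arm, outcome = results[j]
            beliefs[i][arm][0] += outcome
            beliefs[i][arm][1] += 1 - outcome

better = sum(1 for b in beliefs
             if b[1][0] / sum(b[1]) > b[0][0] / sum(b[0]))
print(f"{better}/{N_AGENTS} agents favor the better arm")
```

Varying who shares results with whom in a setup like this is what lets the simulations test how network structure shapes the reliability of the theories the community converges on.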