Handling NaNs in your Data: the Titanic Dataset

Jessica Yung | Data Science


Sometimes your data will contain invalid values such as NaN, often because data was lost or could not be collected. There are two ways of handling them: (1) delete the datapoint, or (2) estimate the value of the datapoint. The first option – deleting your data – may be better if the number of anomalous datapoints is tiny and when the estimate of … Read More
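The two options above can be sketched in a few lines of plain Python. This is only an illustration: the `ages` values are made up (they are not taken from the Titanic dataset), the function names are hypothetical, and filling with the mean is just one simple way to estimate a missing value.

```python
import math


def drop_missing(values):
    """Option 1: delete every datapoint whose value is NaN."""
    return [v for v in values if not math.isnan(v)]


def fill_with_mean(values):
    """Option 2: estimate each NaN as the mean of the observed values."""
    present = [v for v in values if not math.isnan(v)]
    if not present:
        raise ValueError("cannot estimate a mean from all-NaN data")
    mean = sum(present) / len(present)
    return [mean if math.isnan(v) else v for v in values]


# Made-up example column with two missing entries.
ages = [22.0, math.nan, 26.0, 36.0, math.nan]

dropped = drop_missing(ages)   # shorter list, no NaNs
filled = fill_with_mean(ages)  # same length, NaNs replaced by the mean

print(dropped)
print(filled)
```

Dropping shrinks the dataset, while filling keeps every row but replaces the missing entries with a guess, which is the trade-off the post goes on to discuss.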

Artificial Intelligence at Apple

Jessica Yung | Data Science, Technology Article Summaries


A summary of the article ‘An Exclusive Look at how AI and Machine Learning work at Apple’ by Steven Levy posted on Backchannel. Apple has been keeping a low profile on its artificial intelligence developments, so much so that critics thought it was far behind companies such as Facebook and Google. In this interview, Apple executives discuss how sophisticated Artificial … Read More

NASA on Mars: Images taken by the Curiosity Rover

Jessica Yung | Data Science


Yesterday we looked at satellite images of the Earth. Today let’s look at images of Mars taken by the Curiosity Rover! (Scroll down for the final product.) I made some quick adjustments to yesterday’s JS Bin app and got this nasty-looking return object: [JS Bin embed on jsbin.com] Parsing the data: Let’s get an idea of what the return object is … Read More

NASA’s released data in action!

Jessica Yung | Data Science

NASA announced last week that they are making their research data available to the public. The key change seems to be the creation of PubSpace, an online, free-to-access archive of original science journal articles produced by NASA-funded research. The data will be available for download, reading and analysis within one year of publication. Much of NASA’s data can be explored … Read More

Debugging a Classification Model: Refining Evaluation Metrics

Jessica Yung | Data Science


A New Evaluation Metric: In the previous post, I discussed the problems of using a pure accuracy metric for multi-label classification when you have many labels and a small number of labels assigned to each input. Even when my model assigned no labels to anything, it had an accuracy of 92%. In this post, I will discuss and go through … Read More
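To see how a model that assigns no labels can still score highly, here is a minimal sketch with toy numbers. Everything in it is an assumption for illustration: the 25 labels, the four inputs with two true labels each, and the function names are invented, and the figures were picked so that the result happens to be 0.92. It is not the author's data or code. Recall is shown only as one contrasting measure, not necessarily the metric the post settles on.

```python
NUM_LABELS = 25  # assumed size of the label set

# Toy ground truth: each input has 2 of the 25 labels.
true_labels = [{0, 3}, {1, 7}, {2, 3}, {5, 24}]

# A "model" that assigns no labels to anything.
predicted = [set() for _ in true_labels]


def labelwise_accuracy(y_true, y_pred, num_labels):
    """Fraction of (input, label) decisions that match the ground truth."""
    correct = 0
    for truth, pred in zip(y_true, y_pred):
        for label in range(num_labels):
            if (label in truth) == (label in pred):
                correct += 1
    return correct / (len(y_true) * num_labels)


def recall(y_true, y_pred):
    """Fraction of true labels that the model actually predicted."""
    true_positives = sum(len(t & p) for t, p in zip(y_true, y_pred))
    total_positives = sum(len(t) for t in y_true)
    return true_positives / total_positives


accuracy = labelwise_accuracy(true_labels, predicted, NUM_LABELS)
rec = recall(true_labels, predicted)

print(accuracy, rec)
```

Because 92 of the 100 (input, label) decisions are "label absent", the empty prediction is right on all of them, while recall exposes that not a single true label was found.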

Multi-label classification: One debating topic, many categories

Jessica Yung | Data Science


What colour is this rainbow? Yesterday we wrangled debating motions data using Google Sheets. Today we’ll discuss building a machine learning model to classify these debating topics (e.g. ‘This House Would Break Up the Eurozone’) into categories (e.g. ‘Economics’ and ‘International Relations’). Why is this problem interesting? (1) It is primarily a text classification problem. (2) It is a multi-label classification problem. … Read More

Data Wrangling in Google Sheets: Debating Motions Example

Jessica Yung | Data Science


Problem: We need to sort information about debating tournaments sent to us in a Word document (that is, rows with text strings) into by-category columns. This will then be entered into the Hello Motions database. Hello Motions is a site I developed to make it easy for people to search for debating topics. In this post, I will focus on extracting … Read More

Big O Notation: A Common Mistake and Documentation

Jessica Yung | Programming

A Question: What’s the time complexity of the following algorithm? (Don’t know how to calculate that? Here’s a nice intro to Big-O Notation from InterviewCake.)

O(n^2)?  Nope. If array_x has length x and array_y has length y, the algorithm has time complexity O(xy) since one loop with a constant number of operations is run y times for each iteration of … Read More
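The original snippet is not reproduced in this excerpt, so the following is a hypothetical stand-in: a nested loop over two arrays of different lengths, using the names `array_x` and `array_y` from the text. The function name and the operation counter are assumptions added to make the O(xy) count visible.

```python
def count_pair_operations(array_x, array_y):
    """Run one constant-time step for every (x, y) pair and count the steps."""
    operations = 0
    for x_item in array_x:        # outer loop: x iterations
        for y_item in array_y:    # inner loop: y iterations per outer iteration
            operations += 1       # constant amount of work per pair
    return operations


# The lengths are independent, so the cost is x * y, not n^2.
short_and_long = count_pair_operations(range(3), range(1000))
print(short_and_long)
```

With x = 3 and y = 1000 the loop body runs 3000 times. Writing the complexity as O(n^2) would hide the fact that the two lengths can differ widely.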

Automate running a script using crontab

Jessica Yung | Programming

Problem: You want to run a script once every 5 minutes or at some other regular interval, but don’t want to do it manually. You will need: (1) access to a UNIX shell (any *nix should work), and (2) a script you want to run. If you don’t have a script you want to run, you can follow along with record_time.py (https://github.com/jessicayung/blog-code-snippets/blob/master/record_time.py), which appends a … Read More
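As a rough sketch of the setup described above, here is a small stand-in script together with a crontab entry (in the leading comment) that would run it every 5 minutes. The script is not the author's record_time.py: the file name `append_time.py`, the function name, and all paths are hypothetical placeholders to replace with your own.

```python
# Hypothetical crontab entry (open your crontab with `crontab -e`).
# The five fields are minute, hour, day of month, month, day of week;
# `*/5` in the minute field means "every 5 minutes".
#
#   */5 * * * * /usr/bin/python3 /path/to/append_time.py /path/to/time_log.txt
#
# Absolute paths are used because cron does not start in your project directory.
import sys
from datetime import datetime


def append_timestamp(log_path):
    """Append the current local time as one line to the file at log_path."""
    with open(log_path, "a") as log_file:
        log_file.write(datetime.now().isoformat() + "\n")


if __name__ == "__main__":
    # Only write when a log path is passed on the command line.
    if len(sys.argv) > 1:
        append_timestamp(sys.argv[1])
```

Once the entry is saved, checking that the log file gains a new line every five minutes is a quick way to confirm the job is running.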

A Classification Process Outlined in Simple Terms

Jessica Yung | Data Science


Classification is determining which category an input belongs to. An example is trying to determine if an image is of a frog, a dog or a log. Udacity’s Deep Learning course gives a nice overview of one classification process (Multinomial Logistic Classification) with the following diagram and four steps (we’ll translate this in a moment): Take an input. Turn input into … Read More