Lending Club Data: Proportion of Loans that end in Default by US State

Jessica Yung · Data Science

[Map: proportion of Lending Club loans ending in default, by US state]

Lending Club Loans Dataset: complete loan data (over 800k records with up to ~70 attributes each!) for all loans issued from 2007 to 2015, including current loan status (Current, Late, Fully Paid, etc.) and latest payment information. I’ve posted the full code on GitHub. It (1) shows how I obtained the data used in the map above and (2) includes relevant exploratory …
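
The core calculation behind a map like this can be sketched in a few lines of pandas. This is a minimal sketch, not the code from the GitHub repository: it assumes the loans are in a DataFrame with `addr_state` and `loan_status` columns and that the statuses "Default" and "Charged Off" both count as ending in default. Both the column names and that definition are assumptions for illustration, and the sample rows are made up.

```python
import pandas as pd

# Assumption: these statuses count as "ended in default"; adjust to your own definition.
DEFAULT_STATUSES = {"Default", "Charged Off"}


def default_rate_by_state(loans: pd.DataFrame) -> pd.Series:
    """Fraction of loans in each state whose status is in DEFAULT_STATUSES."""
    is_default = loans["loan_status"].isin(DEFAULT_STATUSES)
    # The mean of a boolean Series is the proportion of True values.
    return is_default.groupby(loans["addr_state"]).mean().sort_values(ascending=False)


# Tiny made-up sample standing in for the real dataset.
loans = pd.DataFrame({
    "addr_state": ["CA", "CA", "NY", "NY", "NY", "TX"],
    "loan_status": ["Fully Paid", "Charged Off", "Current", "Default", "Charged Off", "Fully Paid"],
})

rates = default_rate_by_state(loans)
print(rates)
```

Whether loans that are still Current belong in the denominator is a judgment call: filtering to finished loans first would give the default rate among completed loans only.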

Improving on code in Jupyter Notebooks

Jessica Yung · Data Science

Here’s a better-organised version of the code I posted yesterday for predicting the survival of passengers in the Titanic dataset. I made three changes: I structured the code, modularised it and removed unnecessary code. 1. Structured code: I added numbered section headings. Section headings provide context for the code that follows and let the reader get a feel for what is going on by skimming …
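
To illustrate the numbered-sections and modularising ideas, here is a small sketch. It assumes the Kaggle Titanic column names (`Sex`, `Pclass`, `SibSp`, `Parch`); the helper functions and the `FamilySize` feature are hypothetical examples, not the code from the post.

```python
import pandas as pd


# 1. Feature engineering helpers (hypothetical; Kaggle-style Titanic column names assumed)
def encode_sex(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with 'Sex' mapped to 0 (male) / 1 (female)."""
    out = df.copy()
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    return out


def add_family_size(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with FamilySize = siblings/spouses + parents/children + the passenger."""
    out = df.copy()
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    return out


# 2. Pipeline: compose small, named steps instead of one long notebook cell
def prepare_features(df: pd.DataFrame, columns=("Pclass", "Sex", "FamilySize")) -> pd.DataFrame:
    return add_family_size(encode_sex(df))[list(columns)]


raw = pd.DataFrame({
    "Pclass": [3, 1, 3],
    "Sex": ["male", "female", "female"],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 2],
})
features = prepare_features(raw)
print(features)
```

Because each helper returns a copy rather than modifying its input, the steps can be tested, reordered or dropped independently.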

Handling NaNs in your Data: the Titanic Dataset

Jessica Yung · Data Science

Sometimes your data will contain invalid values such as NaN, often because data was lost or could not be collected. There are two ways of handling them: (1) delete the datapoint, or (2) estimate the value of the datapoint. The first option, deleting your data, may be better if the number of anomalous datapoints is tiny and when the estimate of …
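
Both options take about one line each in pandas. A minimal sketch on made-up, Titanic-style data; filling with the median is just one possible estimate (the mean, or a model-based estimate, are alternatives).

```python
import numpy as np
import pandas as pd

# Made-up sample with missing ages (Titanic-style column names assumed).
passengers = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "Fare": [7.25, 71.28, 7.93, 53.10, 8.05],
})

# Option 1: delete every row whose Age is NaN.
dropped = passengers.dropna(subset=["Age"])

# Option 2: estimate the missing values, here with the median of the observed ages.
median_age = passengers["Age"].median()  # NaNs are skipped by default
filled = passengers.assign(Age=passengers["Age"].fillna(median_age))

print(len(passengers), len(dropped))  # rows before and after deletion
print(filled["Age"].tolist())
```

Deleting keeps only observed values but shrinks the dataset; median filling keeps every row but pulls the filled ages toward the centre of the distribution.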

Artificial Intelligence at Apple

Jessica Yung · Data Science, Technology Article Summaries

A summary of the article ‘An Exclusive Look at how AI and Machine Learning work at Apple’ by Steven Levy, published on Backchannel. Apple has kept such a low profile on its artificial intelligence developments that critics thought it was far behind companies such as Facebook and Google. In this interview, Apple executives discuss how sophisticated artificial …

NASA on Mars: Images taken by the Curiosity Rover

Jessica Yung · Data Science

Yesterday we looked at satellite images of the Earth. Today let’s look at images of Mars taken by the Curiosity Rover! (Scroll down for the final product.) I made some quick adjustments to yesterday’s JS Bin app and got this nasty-looking return object. Parsing the data: let’s get an idea of what the return object is …
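
The post works in JavaScript in JS Bin; the same parsing idea is sketched here in Python for consistency with the other examples. The sample response is hypothetical and abbreviated: its field names (`photos`, `camera`, `full_name`, `img_src`, `sol`) are assumed to follow NASA’s Mars Rover Photos API, and the URLs are placeholders.

```python
import json

# Hypothetical, abbreviated response. Field names are assumed to follow NASA's
# Mars Rover Photos API; the image URLs are placeholders.
SAMPLE_RESPONSE = """
{"photos": [
  {"id": 1, "sol": 100,
   "camera": {"name": "FHAZ", "full_name": "Front Hazard Avoidance Camera"},
   "img_src": "https://example.com/a.jpg"},
  {"id": 2, "sol": 100,
   "camera": {"name": "NAVCAM", "full_name": "Navigation Camera"},
   "img_src": "https://example.com/b.jpg"}
]}
"""


def extract_images(payload: str) -> list:
    """Return (camera full name, image URL) pairs from the response."""
    data = json.loads(payload)
    # .get() tolerates a response with no "photos" key by returning an empty list.
    return [(p["camera"]["full_name"], p["img_src"]) for p in data.get("photos", [])]


for camera, url in extract_images(SAMPLE_RESPONSE):
    print(camera, url)
```

Pulling out just the camera name and image URL turns the nested return object into a flat list that is easy to render as a gallery.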

NASA’s released data in action!

Jessica Yung · Data Science

NASA announced last week that it is making its research data available to the public. The key change appears to be the creation of PubSpace, a free-to-access online archive of original science journal articles produced by NASA-funded research. The data will be available for download, reading and analysis within one year of publication. Much of NASA’s data can be explored …

Multi-label classification: One debating topic, many categories

Jessica Yung · Data Science

[Image: one rainbow, many colours]

What colour is this rainbow? Yesterday we wrangled debating motions data using Google Sheets. Today we’ll discuss building a machine learning model to classify these debating topics (e.g. ‘This House Would Break Up the Eurozone’) into categories (e.g. ‘Economics’ and ‘International Relations’). Why is this problem interesting? First, it is primarily a text classification problem. Second, it is a multi-label classification problem: a single motion can belong to several categories at once. …
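
One common way to set up a multi-label text classifier, sketched with scikit-learn: TF-IDF features plus one binary logistic-regression classifier per category (one-vs-rest). This is a generic approach chosen for illustration, not necessarily the model the post goes on to build, and the motions and labels below are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Made-up training data: each motion can carry several category labels.
motions = [
    "This House Would Break Up the Eurozone",
    "This House Would raise interest rates",
    "This House Would ban nuclear weapons",
    "This House Would impose tariffs on imports",
]
labels = [
    {"Economics", "International Relations"},
    {"Economics"},
    {"International Relations"},
    {"Economics", "International Relations"},
]

# Turn label sets into a 0/1 matrix with one column per category.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

# One independent yes/no classifier per category, sharing TF-IDF text features.
model = make_pipeline(TfidfVectorizer(), OneVsRestClassifier(LogisticRegression()))
model.fit(motions, Y)

pred = model.predict(["This House Would leave the Eurozone"])
print(mlb.inverse_transform(pred))  # tuple of predicted categories per motion
```

Treating each category as its own yes/no decision is what lets a motion receive zero, one or several labels, unlike ordinary multi-class classification, which picks exactly one.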

Data Wrangling in Google Sheets: Debating Motions Example

Jessica Yung · Data Science

Problem: We need to sort information about debating tournaments, sent to us in a Word document (that is, rows of text strings), into by-category columns. The sorted data will then be entered into the Hello Motions database. Hello Motions is a site I developed to make it easy for people to search for debating topics. In this post, I will focus on extracting …
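
For comparison, a sketch of the same extraction step in Python with pandas, assuming (hypothetically) that each line of the document looks like `Round: motion`. The sample lines are made up. In Google Sheets itself, functions such as `SPLIT` and `REGEXEXTRACT` do the equivalent job.

```python
import pandas as pd

# Hypothetical raw lines pasted from the document; the "Round: motion" layout is assumed.
raw_lines = [
    "Round 1: This House Would Break Up the Eurozone",
    "Round 2: This House Would ban zoos",
    "Final: This House Regrets the rise of social media",
]

rows = pd.Series(raw_lines)

# Split each line at the first colon: named groups become the column names.
table = rows.str.extract(r"^(?P<round>[^:]+):\s*(?P<motion>.+)$")
print(table)
```

Lines that do not match the pattern come out as NaN in both columns, which makes malformed rows easy to spot before they reach the database.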

A Classification Process Outlined in Simple Terms

Jessica Yung · Data Science

[Diagram: Udacity’s multinomial logistic classification process]

Classification is the task of determining which category an input belongs to; for example, deciding whether an image shows a frog, a dog or a log. Udacity’s Deep Learning course gives a nice overview of one classification process (multinomial logistic classification) with the following diagram and four steps (we’ll translate this in a moment): (1) Take an input. (2) Turn input into …
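
A minimal NumPy sketch of the standard multinomial logistic classification pipeline: a linear model producing scores (logits), a softmax turning them into probabilities, and a cross-entropy loss against a one-hot label. This is the textbook formulation rather than a transcription of the course’s four steps; the frog/dog/log classes, the input size and the random weights are hypothetical, and because the weights are untrained the prediction is arbitrary.

```python
import numpy as np

CLASSES = ["frog", "dog", "log"]


def softmax(logits: np.ndarray) -> np.ndarray:
    """Turn raw scores (logits) into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()


def cross_entropy(probs: np.ndarray, one_hot: np.ndarray) -> float:
    """Penalty that grows as the probability given to the true class shrinks."""
    return float(-np.sum(one_hot * np.log(probs)))


rng = np.random.default_rng(0)
x = rng.normal(size=4)       # an input (hypothetical 4-feature vector)
W = rng.normal(size=(3, 4))  # weights: one row per class (random, untrained)
b = np.zeros(3)              # biases

logits = W @ x + b                  # linear model: scores = Wx + b
probs = softmax(logits)             # softmax: scores -> probabilities
label = np.array([0.0, 1.0, 0.0])   # true class is "dog", one-hot encoded
loss = cross_entropy(probs, label)  # the quantity training would minimise

print(dict(zip(CLASSES, probs.round(3))), "loss:", round(loss, 3))
print("predicted:", CLASSES[int(np.argmax(probs))])
```

Training would adjust `W` and `b` to lower the average loss over many labelled inputs; with a one-hot label, the loss reduces to minus the log-probability of the true class.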