Numpy Views vs Copies: Avoiding Costly Mistakes

Jessica Yung | Data Science, Programming

In this post we will talk about the differences between NumPy views and copies. It’s important to know which one you have, otherwise you might run into problems like accidentally modifying an array. What are views and copies? With a view, it’s like you are viewing the original (base) array. The view is actually part of the original array even though …
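To make the view/copy distinction in the excerpt above concrete, here is a minimal sketch. The array names (`base`, `view`, `copy`) are ours for illustration and are not taken from the post. It relies on the fact that basic slicing returns a view while `.copy()` allocates new memory.

```python
import numpy as np

base = np.arange(6)            # [0 1 2 3 4 5]
view = base[1:4]               # basic slicing returns a view on base
copy = base[1:4].copy()        # .copy() allocates separate memory

view[0] = 100                  # writes through to base
copy[1] = -1                   # leaves base untouched

print(base)                    # [  0 100   2   3   4   5]
print(view.base is base)       # True: the view points back at base
print(copy.base is None)       # True: the copy owns its data
print(np.shares_memory(base, view), np.shares_memory(base, copy))  # True False
```

Checking `.base` or calling `np.shares_memory` is a quick way to find out whether modifying one array can change another.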

Generating Autoregressive data for experiments

Jessica Yung | Data Science, Machine Learning

In this post, we will go through how to generate autoregressive data in Python, which is useful for debugging models for sequential prediction like recurrent neural networks. When you’re building a machine learning model, it’s often helpful to check that it works on simple problems before moving on to complicated ones. I’ve found this is especially useful for debugging neural …
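As a rough illustration of what generating autoregressive data can look like, here is a minimal sketch of an AR(p) process, where each value is a weighted sum of the previous p values plus Gaussian noise. The function `generate_ar_data`, its parameters, and the example coefficients are hypothetical and not necessarily the approach used in the post.

```python
import numpy as np

def generate_ar_data(coeffs, n_samples, noise_std=0.1, seed=0):
    """Sample x[t] = sum_i coeffs[i] * x[t-1-i] + noise (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    coeffs = np.asarray(coeffs, dtype=float)
    p = len(coeffs)
    x = np.zeros(n_samples + p)
    x[:p] = rng.normal(size=p)  # random initial values, dropped from the output
    for t in range(p, n_samples + p):
        # x[t - p:t][::-1] is [x[t-1], ..., x[t-p]], matching the order of coeffs
        x[t] = coeffs @ x[t - p:t][::-1] + rng.normal(scale=noise_std)
    return x[p:]

# Example AR(2) series; these coefficients give a stationary process.
series = generate_ar_data([0.6, -0.3], n_samples=200)
print(series.shape)  # (200,)
```

Because the true coefficients are known, a sequence model trained on `series` can be checked against them before moving on to real data.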

Effective Deep Learning Resources: A Shortlist

Jessica Yung | Artificial Intelligence, Data Science, Education, Machine Learning, Studying

A lot of people ask me how to get started with deep learning. In this post I’ve listed a few resources I recommend for getting started. I’ve only chosen a few because I’ve found precise recommendations to be more helpful. Let me know if you have any comments or suggestions! Prelude: If you’re new to machine learning. Deep learning is …

How to use pickle to save and load variables in Python

Jessica Yung | Data Science, Programming

pickle is a Python module that converts Python objects to a byte stream. You can (1) use it to save the state of a program so you can continue running it later. You can also (2) transmit the (secured) pickled data over a network. The latter is important for parallel and distributed computing. How to save variables to a .pickle file: …
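For reference, a minimal sketch of saving and loading a variable with pickle might look like the following. The variable names, the file name `state.pickle`, and the use of a temporary directory are our assumptions for illustration; the post's own example is not shown in this excerpt.

```python
import os
import pickle
import tempfile

# Hypothetical program state to save.
data = {"weights": [0.1, 0.2, 0.3], "epoch": 5}
path = os.path.join(tempfile.mkdtemp(), "state.pickle")

# Pickle files must be opened in binary mode ("wb" to write, "rb" to read).
with open(path, "wb") as f:
    pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == data)  # True
```

Only unpickle data from sources you trust, since loading a pickle can execute arbitrary code.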

Comparing Model Performance with Normalised vs standardised input (Traffic Sign Classifier)

Jessica Yung | Data Science, Self-Driving Car ND, Statistics

In the previous post, we explained (1) what normalisation and standardisation of data are, (2) why you might want to use them and (3) how you can apply them. In this post, we’ll compare the performance of one model on unprocessed, normalised and standardised data. We’d expect using normalised or standardised input to give us higher accuracy, but how much better …
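As a quick reminder of the two transformations being compared, here is a minimal sketch. It assumes that normalisation means min-max scaling to [0, 1] and standardisation means subtracting the mean and dividing by the standard deviation; the post may define them slightly differently, and the pixel-like values are made up for illustration.

```python
import numpy as np

# Made-up pixel-like values in the range 0..255.
x = np.array([0.0, 64.0, 128.0, 255.0])

# Normalisation (assumed here: min-max scaling to [0, 1]).
normalised = (x - x.min()) / (x.max() - x.min())

# Standardisation (assumed here: zero mean, unit standard deviation).
standardised = (x - x.mean()) / x.std()

print(normalised.min(), normalised.max())  # 0.0 1.0
print(standardised)
```

In practice the statistics (min, max, mean, standard deviation) would be computed on the training set and reused for validation and test data.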

How to use AWS EC2 GPU instances with BitFusion

Jessica Yung | Data Science, Uncategorized

If you want to train neural networks seriously, you need more computational power than the typical laptop has. There are two solutions: (1) get (buy or borrow) more computational power (GPUs or servers), or (2) rent servers online. GPUs cost over a hundred dollars each and top models like the NVIDIA Tesla cost thousands, so it’s usually easier and cheaper to rent …

Discovering and Curating Data on Data.World

Jessica Yung | Data Science

To solve problems – particularly if you want to use statistical approaches or AI – you need data. Data is evidence or descriptive information. We usually deal with quantitative data or quantitative representations of e.g. text or images because they are easier to handle. The good news is there’s tons of data out there. The bad news is it’s often hidden …