Improving on code in Jupyter Notebooks

Jessica Yung, Data Science


Here’s a better-organised version of the code I posted yesterday for predicting the survival of passengers in the Titanic dataset.

I made three changes:

  1. Structured code
  2. Modularised code
  3. Removed unnecessary code

1. Structured code: Added numbered section headings.

[Screenshot of the notebook showing numbered section headings]

Section headings

  1. help provide context for the code that follows,
  2. allow the reader to have a feel for what is going on by skimming the page, and
  3. make it easy to jump to different sections of the code. This is especially useful if the reader is only interested in one section of the code, e.g. in the type of model used.

Numbering the headings helps readers get a sense of

  1. where they are in the document. This matters because they could otherwise easily get lost, or feel lost. I know I do!
  2. how one heading relates to another. In the screenshot, ‘1.1.1 Sex’ is more clearly a sub-section of ‘1.1 Transform columns that are non-numerical into numerical’.
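The numbers can simply be typed by hand into the Markdown cells. As a rough sketch of the numbering scheme itself, a small helper could generate hierarchical numbers from plain Markdown headings. The function name number_headings is hypothetical, and apart from the two headings quoted above, the heading titles are made up for illustration.

```python
import re


def number_headings(lines):
    """Prefix Markdown headings with hierarchical numbers such as 1.1.1."""
    counters = []  # counters[i] is the current number at heading depth i + 1
    numbered = []
    for line in lines:
        match = re.match(r"^(#+)\s+(.*)$", line)
        if not match:
            numbered.append(line)  # not a heading: leave it untouched
            continue
        depth = len(match.group(1))
        # Forget counters for deeper levels, then pad if a level was skipped.
        counters = counters[:depth]
        counters += [0] * (depth - len(counters))
        counters[-1] += 1
        label = ".".join(str(n) for n in counters)
        numbered.append(f"{match.group(1)} {label} {match.group(2)}")
    return numbered


# Illustrative headings; only the 'Transform columns' and 'Sex' titles
# come from the notebook, the rest are assumptions.
headings = [
    "# Process data",
    "## Transform columns that are non-numerical into numerical",
    "### Sex",
    "### Embarked",
    "# Model",
]
for heading in number_headings(headings):
    print(heading)
```

Each deeper level of '#' adds one more component to the number, which is what makes the parent-child relationship between headings visible at a glance.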

2. Modularised code

I put code inside functions so that

  1. code can be re-used, e.g. for processing the test set data or for use in a different file, and
  2. code is more readable.

Instead of

we have

The advantage is that when I want to add or remove features, I only need to edit the add_features function. It is much neater, and I can use the same function again when it comes to processing the test set.
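As a minimal sketch of this kind of refactor (not the notebook's actual code), the feature engineering might be wrapped like this. The FamilySize and IsAlone features are hypothetical examples; SibSp and Parch are assumed to be the column names from the Kaggle Titanic data, and the tiny dataframes stand in for the real training and test sets.

```python
import pandas as pd


def add_features(df):
    """Return a copy of df with extra, illustrative feature columns."""
    df = df.copy()  # avoid mutating the caller's dataframe
    # Hypothetical features: the notebook's real features may differ.
    df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
    df["IsAlone"] = (df["FamilySize"] == 1).astype(int)
    return df


# Stand-in data for illustration only.
train = pd.DataFrame({"SibSp": [1, 0], "Parch": [0, 0]})
test = pd.DataFrame({"SibSp": [0, 1], "Parch": [0, 2]})

# The same function processes both sets, so they cannot drift apart.
train = add_features(train)
test = add_features(test)
```

Adding or removing a feature then means changing one line inside add_features, rather than hunting for the same statements in two places.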

Note to self: Consider adding docstrings. These functions are easy-to-read one-liners and I’m still working on this model, so I haven’t done so yet.
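For reference, a docstring on one of these short functions might look something like the sketch below. The function name encode_sex and the 0/1 mapping are assumptions for illustration, not necessarily what the notebook uses.

```python
import pandas as pd


def encode_sex(df):
    """Return a copy of df with the 'Sex' column mapped to integers.

    Assumes 'Sex' holds the strings 'male' and 'female'. The 0/1 mapping
    is an illustrative choice.
    """
    df = df.copy()  # leave the caller's dataframe unchanged
    df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
    return df


# Stand-in data for illustration only.
passengers = pd.DataFrame({"Sex": ["male", "female"]})
encoded = encode_sex(passengers)
```

Even for a one-liner, the docstring records the assumption about the input values, which is easy to forget when the function is reused on the test set.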

3. Removed unnecessary code

It is unclear whether some deletions were a net improvement. Some of the df.head() calls I deleted may have helped clarify what state the dataframe was in. Whether they help depends on the audience and their use cases. In this case, the audience was myself: I was most concerned with being able to work with the code easily, and the df.head() calls were only getting in the way, so I chose to delete them. If I want to check the dataframe, I can trivially add an extra df.head().

Relevant links:

  • Titanic dataset Code
