
Reasons for using Jupyter


This year, Project Jupyter received the ACM Software System Award, which honors software with a lasting influence (previous recipients include the World Wide Web, Unix, Java, LLVM and GCC). That influence has been felt mostly in data science, where Jupyter has become a standard for prototyping and sharing code.

At its heart, Jupyter is a REPL, a Read-Eval-Print Loop (other REPLs include irb, the Python prompt, and the SQLite command-line shell).
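
To make the idea concrete, here is a minimal sketch of a read-eval-print loop in Python. The function name `toy_repl` is hypothetical, and the loop reads from a list instead of a terminal so that it can run unattended. It is an illustration of the concept, not how Jupyter or the Python prompt is implemented.

```python
def toy_repl(lines):
    """Run a tiny read-eval-print loop over an iterable of input lines.

    Returns what would have been printed, as a list of strings.
    """
    namespace = {}  # state shared by all inputs, like one REPL session
    printed = []
    for line in lines:                      # Read
        try:
            result = eval(line, namespace)  # Eval (expressions)
        except SyntaxError:
            exec(line, namespace)           # Eval (statements such as `x = 2`)
            result = None
        if result is not None:
            printed.append(repr(result))    # Print
    return printed                          # Loop ends when the input runs out


print(toy_repl(["x = 2", "x * 21", "'a' + 'b'"]))
```

The assignment produces no output, while the two expressions are echoed back, which is the behavior you see at most interactive prompts.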

You enter a command, the program interprets it, prints the result, and waits for the next input. Nothing revolutionary, but Jupyter has two big advantages over other REPLs:

  • you get much richer outputs than a terminal allows. For example, interactive graphs are a common sight in Jupyter
  • the whole document (called a "notebook"), with its inputs and outputs, is a single file that can be shared and used as a starting point by someone else. That person can start from the same code and re-execute it with their own modifications. The Jupyter project also specifies the format of this notebook document, which uses the .ipynb file extension. A notebook can be rendered as an HTML page by tools such as https://nbviewer.jupyter.org/
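
An .ipynb file is a JSON document, so the "inputs and outputs in a single file" idea can be sketched with the standard library alone. The example below builds a minimal notebook by hand, assuming the version 4 layout of the format; the cell contents are made up for illustration, and real notebooks usually carry more metadata than shown here.

```python
import json

# A minimal notebook: one Markdown cell and one code cell that stores
# its output right next to its input.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# A tiny example notebook"],
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print(6 * 7)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["42\n"]}
            ],
        },
    ],
}

# Serialize to text; saving this string to a file ending in .ipynb
# should give something a notebook viewer can open.
serialized = json.dumps(notebook, indent=1)

# Reading it back shows that inputs and outputs travel together.
reloaded = json.loads(serialized)
for cell in reloaded["cells"]:
    print(cell["cell_type"], "->", "".join(cell["source"]))
```

Because the stored output is part of the file, a reader can see the result of `print(6 * 7)` without running anything.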


To operate, Jupyter is divided into a few components:

  • a web app: this is the interface that the user sees. By default, the notebook opens in a browser, and this is where the user writes her code and sees the results
  • a web server: the code written by the user is not run directly in the browser. The commands are sent to a web server, which delegates the execution and then sends the result back to the web app
  • finally, an "execution environment", called a "kernel" in the Jupyter ecosystem. The kernel is where the code is executed, and the user can pick the kernel that best fits her needs. For example, instead of the default kernel (which lets you write Python 2 or 3), the user could install a Kotlin kernel or one with additional features (like smarter auto-completion).
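
The division of labor between these three components can be sketched as a toy simulation. Every name below (`ToyKernel`, `ToyServer`, `front_end_run`) is hypothetical and none of it is Jupyter's actual API. In Jupyter itself the kernel is a separate process; here everything lives in one process to keep the sketch short.

```python
import io
from contextlib import redirect_stdout


class ToyKernel:
    """Execution environment: runs code and keeps state between requests."""

    def __init__(self):
        self.namespace = {}

    def execute(self, code):
        buffer = io.StringIO()
        with redirect_stdout(buffer):  # capture what the code prints
            exec(code, self.namespace)
        return buffer.getvalue()


class ToyServer:
    """Middle layer: receives code from the front end and delegates it."""

    def __init__(self):
        self.kernels = {}

    def start_kernel(self, name):
        self.kernels[name] = ToyKernel()

    def handle_request(self, kernel_name, code):
        output = self.kernels[kernel_name].execute(code)
        return {"status": "ok", "output": output}


def front_end_run(server, kernel_name, code):
    """Stand-in for the web app: send code, return what would be displayed."""
    reply = server.handle_request(kernel_name, code)
    return reply["output"]


server = ToyServer()
server.start_kernel("python")
front_end_run(server, "python", "total = 40 + 2")
print(front_end_run(server, "python", "print(total)"), end="")
```

The second request can read `total` because the kernel keeps its namespace alive between requests, which is why variables defined in one notebook cell stay available in the next.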

If you want to try Jupyter, Google offers a hosted version named Colaboratory, so there is nothing to install on your end.

To conclude, Jupyter can be very effective for two different use cases:

  • prototyping. Jupyter lets you keep a readable history, get richer outputs, install additional libraries, ... Your prototype is then stored in a document that you can pick up again at any time (which is much harder with other REPLs)
  • sharing. Jupyter has made sharing code in data science much easier and more accessible. Having direct access to code and results has also created an expectation that publications use this format, which has motivated more people to share the code behind their experiments

However, one must stay aware of Jupyter's limits. For example, when writing a significant amount of code, traditional tools are definitely more suitable. Its use also requires a certain rigor: because it is easy to go back and re-run code that was already executed, a notebook can turn into a document where inputs are no longer in the right order and outputs no longer correspond to their associated input.
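
One way to spot this kind of drift is to look at the execution counter that a notebook file stores for each code cell. The sketch below is a heuristic with a hypothetical function name (`executed_in_order`), and it assumes the notebook has already been loaded as a dictionary. It cannot detect a cell whose source was edited after it ran.

```python
def executed_in_order(notebook):
    """Return True if every code cell was run, from top to bottom."""
    counts = [
        cell.get("execution_count")
        for cell in notebook["cells"]
        if cell["cell_type"] == "code"
    ]
    if any(count is None for count in counts):
        return False  # at least one code cell was never run
    # Counters must strictly increase when reading the cells downward.
    return all(a < b for a, b in zip(counts, counts[1:]))


def code_cell(count):
    """Build a bare code cell with the given execution counter."""
    return {"cell_type": "code", "execution_count": count, "source": []}


tidy = {"cells": [code_cell(1), code_cell(2), code_cell(3)]}
messy = {"cells": [code_cell(3), code_cell(1), code_cell(2)]}

print(executed_in_order(tidy))
print(executed_in_order(messy))
```

A simple habit that avoids the problem altogether is to restart the kernel and run all cells from the top before sharing a notebook.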
