Documenting Python Packages with Sphinx and ReadTheDocs

05 Jan 2019 - Tags: tools

Writing, building, and hosting the documentation for a Python package can be a pain to do manually. Luckily there are tools which make it a lot easier. Sphinx is a tool to generate html documentation from reStructuredText files (kind of like Markdown). It even supports automatically generating API references from Python code and the docstrings within! ReadTheDocs is a service which automatically builds and hosts your documentation for free. In this post we’ll take a look at how to use Sphinx and ReadTheDocs to generate and host documentation for a Python project.

Outline

Installation
Setup
Writing the documentation
- File hierarchy
- Syntax
- Math
- Code
- Cross-referencing
Building the Documentation
Autodoc extension
Napoleon extension
- Enabling Napoleon
- Configuring Napoleon
Intersphinx
Macros
Themes
Hosting Documentation on ReadTheDocs

Installation

Installing Sphinx is pretty easy. Just install via pip with:

pip install sphinx

Setup

Let’s assume your project’s main folder is project-name. Create a folder for the documentation within that folder (called, say, docs). In a terminal, navigate to that docs folder and run

sphinx-quickstart

and answer all the questions. Make sure to say yes to enabling the autodoc extension!

This will create an index.rst file, and a conf.py file. The index.rst file contains the home page of your documentation in reStructuredText format. reStructuredText is sort of like Markdown, but made specifically for writing technical documentation. The conf.py file is a python script which sets configuration options for building your documentation.

Writing the documentation

The index.rst file, which was created in your docs folder, is the main page of your documentation. In that file you can write content, and define sub-pages which have their own content (and sub-pages of their own).

To write the documentation, we’ll use reStructuredText(reST). reST is similar to Markdown in that it is easily readable in plain text form (though honestly not as easily readable as Markdown), but is built with the intention of converting the raw text to formatted html (or pdf, or something).

The main difference between Markdown and reST are “roles” and “directives” (in addition to a few other syntactic differences we’ll get to in a bit). Roles are used for signifying things like inline math expressions, cross-references, and arguments to directives. Directives are blocks of explicit markup for things like the table of contents, images, displayed math equations, and tables.

File hierarchy

Each page in your documentation can contain content and links to sub-pages, defining a hierarchy of documentation pages (though it is possible to have isolated pages to which other pages in the main hierarchy link). The sub-pages for a given pages are listed in a directive called the toctree (table of contents tree).

The toctree for a documentation page contains a list of subpages for that page. List elements in the toctree are filenames (minus the .rst extension) of other pages to link to. To make those pages, create .rst files with those filenames, and add a title (the toctree in a parent page will list the page title, not the filename).

For example, an index.rst documentation home page may contain a toctree which looks like this:

.. toctree::
   getting_started
   tutorials
   api

If you create the files getting_started.rst, tutorials.rst, and api.rst, then the home page of your documentation will link to those three pages.

You can change the behavior of a toctree by adding different roles as “arguments”. For example, to set the caption of the toctree to “Contents”, you can add the :caption: role, followed by the desired caption:

.. toctree::
   :caption: Contents
   getting_started
   tutorials
   api

You can also set the maximum depth that the table of contents will display by setting the maxdepth role:

.. toctree::
   :caption: Contents
   :maxdepth: 2
   etc...

Setting the maxdepth to 1 lists only this page’s sub-pages; setting the maxdepth to 2 also lists each subpage’s subpages; and so on.

By default, Spinx adds a side bar with the table of contents, but also displays the table of contents in the main page. If you want to hide the table of contents in the main page (but still show it in the side bar), you can add the :hidden: role:

.. toctree::
   :caption: Contents
   :maxdepth: 2
   :hidden:
   etc...

Syntax

But having a document hierarchy isn’t very useful without adding content to the documents! Titles and headings can be added by underlining the title/heading with certain characters. You can actually use any characters you want, but the convention is to use:

= for page titles,
- for sections,
^ for subsections, and
" for subsubsections.

You can add text in the normal way (by just adding text!). Surround text with one asterisk for italics, two asterisks for boldface, and double backquotes for inline code (aka ‘grave accent marks’ - single backquotes are used in markdown to signify inline code). So, for example:

Title
=====

Here is some normal text

Section
-------

*Italic* text

Subsection
^^^^^^^^^^

**Bold** text

Subsubsection
"""""""""""""

``Code sample``

Bulleted lists use asterisks, numbered lists use numbers, and nesting is done with indentation, like Markdown (though nested lists need to be surrounded by blank lines):

* Bulleted
* list

  * nested
  * list

* con't

1. Numbered
2. list

For hyperlinks, use:

`Link text <http://www.link-address.com>`_

Commented lines begin with two periods and a space (like a directive w/o the double colons):

.. I'm a comment!

And multiline comments are also like directives w/o the colons:

..
   I am
   a multiline

   comment

Sphinx has a page with more info on sphinx-style reStructuredText syntax, and here’s a page with general reStructuredText syntax.

Math

Sphinx supports including LaTeX-style equations in the documentation’s .rst files. There are a few different ways to do this, but I prefer using MathJax via the mathjax extension. To enable the mathjax extension, in your conf.py file, add the string 'sphinx.ext.mathjax' to the extensions list. Assuming that during sphinx-quickstart you said yes to enabling the autodoc extension, there should be a variable in conf.py containing a list of extensions to use which looks like this:

extensions = [
    'sphinx.ext.autodoc',
    #optionally other extensions...
]

To enable the mathjax extension, just add 'sphinx.ext.mathjax' to that list:

extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.mathjax',
    #optionally other extensions...
]

It may already be there if you said yes to enabling the mathjax extension during sphinx-quickstart.

Now you can insert LaTeX-style math expressions and equations in your .rst files and they’ll be rendered in the built html documentation using mathjax!

Inline math can be inserted using the :math: role, and then an expression enclosed in single backquotes (aka ‘grave accent marks’, used in markdown to signify inline code). For example,

... the mean (:math:`\mu`) of ...

will render as:

… the mean ( \( \mu \) ) of …

Displayed math (an equation or expression which has its own line) can be inserted using the .. math:: directive and indentation:

.. math::

    y \sim \mathcal{N}(0, 1)

which renders as

\[y \sim \mathcal{N}(0, 1)\]

You can also reference equations using the :label: and :eq: roles:

.. math:: \beta \sim \text{Poisson}(\lambda=5)
   :label: beta_prior

The prior on :math:`\beta` is a Poisson distribution with rate parameter of 5 :eq:`beta_prior`.

Code

As mentioned above, inline code can be inserted by enclosing the text in double backquotes:

Here's some ``inline code``.

Which renders as:

Here’s some inline code.

Displayed code blocks are a bit different from Markdown. To make a displayed code block, you can use a literal block, which is started with double colons, and continues until text returns to the original indentation:

Here's some normal text. ::

  # And here's some code
  for i in range(5):
    print(i)

  # code block keeps going until un-indent
  
Normal text again

which renders as:

Here’s some normal text.

# And here's some code
for i in range(5):
  print(i)

# code block keeps going until un-indent

Normal text again

You can also use the code-block directive to ensure a specific type of syntax highlighting is used:

.. code-block:: languagename

    Code in that language.

which renders identically to the above example which used a literal block - except you can set which language is used for the syntax highlighting. Valid values for languagename are any language that Pygments supports (like python, c, javascript, etc).

Cross-referencing

Sphinx also supports cross-referencing. To create a label for a section, use a directive which consists of an underscore, then any label name you want, then a colon, followed by the section header:

.. _some-label-name:

Section you want to reference
-----------------------------
etc

Then, in any other document in the documentation, you can reference that label with the :ref: role:

... see :ref:`some-label-name` for more ...

You can also make labels which can be anywhere (and not just before a section header), but when you reference them you also need to specify a label title:

:ref:`Link title <some-other-label-name>`

The one downside of this is that because a label in any document can be referenced from any other, all your label names must be unique.

You can also just reference a specific document. To reference the document by the filename, use the :doc: role. For example, to reference the file api.rst), use:

:doc:`/api`

Like with :ref:, the default is to use the document title as the link text, but you can override the text by:

:doc:`Custom link text </page-to-link>`

Building the Documentation

To build the docs from the .rst files, run

sphinx-build -b html sourcedir builddir

where sourcedir is the folder with your sphinx doc files, and builddir is the directory in which to put the output html. The -b flag indicates what type you want to build the docs as (in this case, html).

For example, to create the built html version of the documentation in the folder project-name/docs/_html, from within the project-name/docs folder, run:

sphinx-build -b html . _html

Autodoc extension

The autodoc extension for sphinx can automatically generate API reference doc pages from the docstrings in your python code. Python docstrings are string literals which occur immediately after function or class definitions. They’re treated as comments, and it’s customary to document a function or class in its docstring. The autodoc extension allows you to include reStructuredText in your docstrings, and will build an API reference automatically for your module from the docstrings, while allowing for further customization.

To enable the autodoc extension, in your conf.py file, add the string 'sphinx.ext.autodoc' to the extensions list (this should have already been done if you answered yes to enabling the autodoc extension during sphinx-quickstart).

You also need to add the path to the folder containing your module’s source code. Also in conf.py, add the lines

import os
import sys
sys.path.insert(0, os.path.abspath('../package-name'))

The above assumes your project directories look like this:

package-name/
    package-name/
        python files, etc
    docs/
        conf.py
        rst files, other docs stuff

Automodule and autoclass

After enabling autodoc, in the documentation’s .rst files, you can use the automodule directive:

.. automodule:: package_name.module
   :members:

which will insert an automatically generated API reference for the entire module. The :members: role makes autodoc generate documentation for each member (function, class, or constant) in that module, and for each member of each class (attributes and methods).

To insert an automatically-generated API reference only for one specific class, you can use the autoclass directive:

.. autoclass:: package_name.module.class_name
   :members:
   :inherited-members:
   :exclude-members: members, to, exclude

which lists all attributes and methods, except those in the exclude-members list.

The :inherited-members: role causes members which are inherited to also be included in the documentation (this role can also be used with automodule).

Speaking of inheritance, to show a list of classes the current class inherits from , add the :show-inheritance: role to the directive.

To include only specific members, instead of specifying :exclude-members:, you can add a list of members you do want to include after :members:, eg: :members: members, to, include.

The default is to not document private members (attribs or methods starting with an underscore, like _private_method). To force autodoc to also include those private members in the generated API documentation, add the :private-members: role.

You can also add :special-members: to include “special” functions - functions with two underscores at the beginning and end, such as __add__, __sub__, and __init__.

If you want to have more control over how the docs are structured, there are also autofunction, automethod, and autoattribute directives. These are used in the same way as automodule and autoclass, but they only insert the documentation for specific functions, methods, or attributes.

Mock importing packages

To build the docs, Sphinx needs to actually import your package. But you might not have all dependencies installed on the machine you’re building the docs on (for example, I don’t have tensorflow and tensorflow-probability installed on my laptop, but a package I’m building and wanting to build the docs for imports them!). To make autodoc “mock” import these packages (don’t actually import the packages but allow the code to run), add a variable called autodoc_mock_imports to your conf.py file containing a list of packages to mock import.

autodoc_mock_imports = ['packages', 'to', 'mock']

For example, to mock import tensorflow and tensorflow-probability, add:

autodoc_mock_imports = ['tensorflow', 'tensorflow_probability']

This allows you to build your documentation on machines where you don’t have the full dependency stack installed and set up.

Including Math in docstrings

Including math expressions or equations in your docstrings can be done in the same way as for .rst files (see above). However, for docstrings which contain math, it’s a good idea to either make the docstring a Python raw string (by putting r in front of the docstring, like so: r"""Docstring..."""), or to double all backslashes. Otherwise the escaped characters are passed to autodoc instead of the backslash literals. I’ve found that unless you use string literals or double backslashes, things (somehow) work fine for displayed math, but not for inline math.

Referencing the API

In docstrings, you can cross-reference labels and documents just like you can in the .rst files. But now that you’ve added the API reference, you can also reference specific modules, classes, functions, attributes, and methods.

To reference modules, use the :mod: role:

see the :mod:`modulename` module...

To reference functions, use the :func: role:

...the :func:`modulename.funcname` returns...

To reference classes, use the :class: role:

...must be a :class:`modulename.classname` object...

To reference methods, use the :meth: role:

...has a :meth:`modulename.classname.` method...

The :attr: role is used to reference attributes in the same way as :meth:. Objects can be referenced with the :obj: role, and exceptions with the :exc: role.

You can also have Sphinx search for what you want to reference, assuming it is uniquely named, by prepending a period instead of all the modules/classes which the thing you’re referencing belongs to. For example, instead of:

:class:`modulename1.modulename2.modulename3.MyCoolClass`

you can simply use:

:class:`.MyCoolClass`

Sphinx has more information about referencing Python objects on their website.

Napoleon extension

The Napoleon extension for Sphinx allows for NumPy/Google style docstrings instead of using the hard-to-read reStructuredText in your docstrings. Napoleon is a pre-processor which takes your NumPy- or Google-style docstrings and converts them to reStructuredText.

For example, using raw reStructuredText, writing a function specification is horrific:

def some_func(foo, bar, baz):
  """Does some stuff
  :param foo: The foo to bar
  :type foo: int
  :param bar: Bar to use on foo
  :type bar: str
  :param baz: Baz to frobnicate
  :type baz: float
  :returns: The frobnicated baz
  :rtype: float
  """

Yuck. Google-style docstrings are slightly better:

def some_func(foo, bar, baz):
  """Does some stuff

  Args:
    foo (int): The foo to bar
    bar (str): Bar to use on foo
    baz (float): Baz to frobnicate

  Returns:
    float: The frobnicated baz
  """

However, especially when it comes to functions which have many parameters with multi-line descriptions (and long lists of types), I prefer Numpy-style docstrings:

def some_func(foo, bar, baz):
  """Does some stuff

  Parameters
  ----------
  foo : int, float, str, or tf.Tensor
    The foo to bar, which has a really really, reeeeeeeeeeeeeeeeally
    unnecessarily long multiline description.
  bar : str
    Bar to use on foo
  baz : float
    Baz to frobnicate

  Returns
  -------
  float
    The frobnicated baz
  """

The Napoleon extension allows you to write Numpy- or Google-style docstrings in your code, but causes Sphinx to render it correctly (as if you had written them with all the complicated reStructuredText roles).

Enabling Napoleon

To enable the Napoleon extension, in your sphinx conf.py file, add the string 'sphinx.ext.napoleon' to the extensions list. So, previously (after enabling autodoc), my conf.py file contained:

extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.intersphinx',
]

And to enable Napoleon, add the string to the list of extensions:

extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.napoleon',
    'sphinx.ext.intersphinx',
]

Configuring Napoleon

There are a few configuration settings that can be set for Napoleon. Set their values by defining them in the conf.py file. For example, to use Numpy-style docstrings and not Google-style docstrings, we can enable napoleon_numpy_docstring and disable napoleon_google_docstring by adding these two lines to conf.py:

napoleon_google_docstring = False
napoleon_numpy_docstring = True

Now you can run the sphinx-build command to generate documentation from NumPy or Google style docstrings!

Intersphinx

Sphinx also has another extension called Intersphinx, which allows you to link to the API reference of another project (as long as the documentation for that project was also built with Sphinx).

To enable Intersphinx, add the string 'sphinx.ext.intersphinx' to the extensions list in your conf.py file.

Then, add a variable to your conf.py file called intersphinx_mapping which contains a dict of module names to their API reference urls, like this:

intersphinx_mapping = {
    'python': ('https://docs.python.org/', None),
    'numpy': ('http://docs.scipy.org/doc/numpy/', None)
}

Now in your documentation and docstrings, you can reference elements of that module. For example, to reference the NumPy ndarray class in a docstring, you can use

:class:`numpy.ndarray`

Unfortunately, with intersphinx, you have to use the full class/function/method name (including the module name), and can’t use dot notation to have Sphinx search for the correct class/function/method to reference. Fortunately, Sphinx supports macros.

Macros

Macros can come in handy for making your docstrings clean and easy to read, especially when using Intersphinx. For example, it’s cumbersome to reference the NumPy ndarray class with

:class:`numpy.ndarray`

In order to create a macro, you can add a replace directive to the rst file

.. |ndarray| replace:: :class:`numpy.ndarray`

And then in your docstrings, you can just use

|ndarray|

and Sphinx will replace it with the formatted link to that class. This also comes in handy for linking to docs which aren’t written using Sphinx (and so intersphinx won’t work), for example:

.. |Tensor| replace:: `Tensor <http://www.tensorflow.org/api_docs/python/tf/Tensor>`__

Note that when using hyperlinks with the replace directive, you need to have a double underscore at the end instead of just one (like with a normal hyperlink in rst).

Although, presumably you’ll be wanting the same macros for most of your rst files. So, instead of re-writing all the macros at the top of each rst file, you can create a single file (named, say, macros.hrst) which contains all your macros. Then to include all those macros in a given rst file, add the following line to that rst file:

.. include:: macros.hrst

Themes

You can set the theme by setting the value of the html_theme in your conf.py file. Here’s a list of themes provided with Sphinx. For example, to use the ReadTheDocs theme, use sphinx_rtd_theme:

html_theme = 'sphinx_rtd_theme'

Though specifically for the ReadTheDocs theme, you’ll need to install it with pip:

pip install sphinx_rtd_theme

Hosting Documentation on ReadTheDocs

If you want others to be able to easily access your built documentation, you could in theory build the html with Sphinx, and then host that html on a server. But, that’s a pain. ReadTheDocs is a service which automatically builds and hosts your documentation, for free! You can link it up to the GitHub repository for your package and documentation, and every time you push to that repository, ReadTheDocs will build the newest version of the package’s documentation and host it at http://your-project-name.readthedocs.io.

Setting up ReadTheDocs

Using ReadTheDocs is easiest if you have a GitHub account and the code and docs for your package are hosted in a GitHub repository. It also works pretty much the same way with GitLab and Bitbucket.

First, you have to sign up for a ReadTheDocs account by going to ReadTheDocs and selecting “Sign Up”. If you already have a GitHub account, choose “Sign Up with GitHub”. Allow ReadTheDocs the required permissions to your GitHub account, and confirm your email address.

Then, go to your ReadTheDocs dashboard, and select “Import a Project”. You might first have to click the “refreshing your accounts” link, but if you’ve linked your Github account, you should see a list of all your GitHub repositories. Click the plus sign next to the repository, and et voila! Wait a few minutes (for ReadTheDocs to build your documentation), and your documentation will be hosted at http://repository-name.readthedocs.io.

ReadTheDocs and Autodoc

To get ReadTheDocs to work with the Autodoc extension, there are a few other things you might have to worry about. If your package only works with a specific version of python, you’ll have to ensure ReadTheDocs knows what python version to use.

Add a readthedocs.yml file in the root directory of your project, which contains the following:

# .readthedocs.yml

build:
  image: latest

python:
  version: 3.6

Also, if your package depends on other non-standard packages, you have to let ReadTheDocs know about them. To do that, create a requirements.txt file in the root directory of your project, which contains a list of package names that your package depends on. For example, if my package depends on numpy, tensorflow, and tensorflow_probability python packages, my requirements.txt file will contain:

numpy
tensorflow
tensorflow_probability

Then, make ReadTheDocs aware of these dependencies by adding the following line to your readthedocs.yml file:

requirements_file: requirements.txt

Adding a Docs Badge

You can also add a documentation status badge to your project’s README.rst file. For example, to add a documentation badge, add a |Badge| macro and then use ReadTheDocs’ badge link:

|Some other badge|  |Docs Badge|  |etc|

... rest of your readme file ...

.. |Docs Badge| image:: https://readthedocs.org/projects/your-project-name/badge/
    :alt: Documentation Status
    :scale: 100%
    :target: http://your-project-name.readthedocs.io

Brendan Hasz