Python Development Standards
-
Techradar
Trial
-
Recommendation
Please see notes below
NOTE: Python has yet to be adopted by the tools authority as "Strategic Adopt". Python is only for the use in AI and Data and is not permitted outside of this remit. Please adhere to the following Defra Software Development Standards for building digital services. For the use of Python in data science and data engineering, please refer to the Data Analytics & Science Hub (DASH) section. The DASH platform provides tools and software that enable users to carry out data analysis with powerful compute, accessible support and secure access to data.
Overview
NOTE: Python should not be used for frontend development. The GDS Service Manual has information on using client-side JavaScript. For server-side JavaScript we use Node.js. The principles of Progressive Enhancement should be used for the design and build of frontend digital services. The building of backend services in Python is limited to the use in AI and Data. A javascript first approach should be taken and where there is adquate justification, Python can then be used.
This manual is designed to aid developers in writing clear and consistent Python code across projects.
Python is the tool of choice for developing artificial intelligence tools (e.g., traditional machine learning and large language models) and is adopted widely by the data analytics and science community. The Adoption of Python for software development in the AI and Data area will allow for easier deployment of artificial intelligence tools. Python’s prevalence in the artificial intelligence world is highlighted by the fact that OpenAI use Pytorch as their standard framework for training their large language models (e.g., GPT models).
Code Formatting
We use Black
to format our code. Black
is an opinionated formatter that follows PEP 8. Where Black
and PEP 8 don't express a view, we defer to the Google Python style guide. Use these as references unless something is explicitly mentioned here.
To add a new rule or exception, please create a pull request against this repo.
Maximum Line Length
The maximum line length is 120 characters. This deviates from PEP 8's preferred 79 characters due to:
- Most developers now use IDEs that can handle longer lines
- GDS developers often work with nested JSON objects, ORM model definitions/queries, and long error/URL strings
Linting
Ruff
We recommend using the Ruff command-line checker as an all-in-one lint, code style, and complexity checker.
How to use Ruff
- Add the
Ruff
module to your 'dev' or 'test' requirements/dependencies. - Run it alongside your unit tests.
Ruff Ignores
Ruff can ignore particular lines or files. You can specify rule exemptions per directory or file.
Common Configuration
An example pyproject.toml
configuration:
Environments
When a new Python project is created, a new virtual environment should be used to store the dependencies required to run the code within the project. Virtual environments are used to manage dependencies and ensure that the correct version of a dependency is available to the correct project. Additionally, they make the project reproducible as the virtual environment can be recreated on any machine. A text file ‘requirements.txt’ can be created with the command pip freeze > requirements.txt after the required dependencies are installed into the virtual environment. This file contains a list of the installed dependencies and their version number. This file should be included in the Git repository for the project. It should be updated if new dependencies are added to the project. After downloading a project, this file will be used to recreate the virtual environment on another machine, ensuring the code functions correctly. GDS Way suggest using pyenv-virtualenv to manage virtual environments.
- Use
Pyenv
to manage different Python versions - Use
pyenv-virtualenv
to manage virtual Python environments - Use
direnv
to manage environment variables
Dependencies
Applications
For applications or script bundles:
- Pin specific versions of dependencies for reproducibility
- Keep dependencies up-to-date for security
- Document the upgrade process in your
README
- Use
requirements.txt
for pinned dependencies - Consider using
pip-tools
for larger projects - Use
requirements-dev.txt
for development/testing dependencies
Libraries
For installable Python repositories:
- Use
setup.py
to specify dependencies and version ranges - Update
setup.py
when testing new versions outside the existing range - Use PEP 440 git references for dependencies not available on PyPI
- Specify testing dependencies in
tox.ini
(if using Tox) orrequirements-dev.txt
Recommended Libraries (through a data science lens)
- Data processing: Pandas, NumPy, SciPy
- Data visualisation / dashboarding: Matplotlib, Seaborn, Plotly, Dash
- ML/AI/NLP: Scikit-learn, LightGBM, XGBoost, CatBoost, PyTorch, TensorFlow, Keras, Langchain, LlamaIndex, Transformers, NLTK, spaCy
- Statistics: Statsmodels
- HTTP: requests, PyCrypto
- Scraping and automation: Selenium, BeautifulSoup
- Unit testing: pytest
- Logging: logging