Planning genetic constructs with many parts and assembly steps, such as recombinant metabolic pathways π§«, are often difficult to properly document as is evident from the poor state of documentation in the scientific literature β’οΈ.
The pydna python package provide a human-readable formal descriptions of 𧬠cloning and genetic assembly strategies in Python π which allow for simulation and verification. Pydna can be used as executable documentation for cloning.
A cloning strategy expressed in pydna is complete, unambiguous and stable.
Pydna provides simulation of:
- Primer design
- PCR
- Restriction digestion
- Ligation
- Gel electrophoresis of DNA with generation of gel images
- Homologous recombination
- Gibson assembly
- Golden gate assembly (in progress)
Virtually any sub-cloning experiment can be described in pydna, and its execution yield the sequences of intermediate and final DNA molecules.
Pydna has been designed with the goal of being understandable for biologists with only some basic understanding of Python.
Pydna can formalize planning and sharing of cloning strategies and is especially useful for complex or combinatorial DNA molecule constructions.
If you use pydna in your research, please reference the paper:
Pereira, F., Azevedo, F., Carvalho, Γ., Ribeiro, G. F., Budde, M. W., & Johansson, B. (2015). Pydna: a simulation and documentation tool for DNA assembly strategies using python. BMC Bioinformatics, 16(142), 142. doi:10.1186/s12859-015-0544-x
Full documentation of all modules and classes can be found at https://bjornfjohansson.github.io/pydna.
To get started, we recommend you to have a look at the example notebooks. Start by having a look at Dseq, Dseq_Features and Importing_Seqs, which cover the basics of working with sequences. The rest of the notebooks cover how to use pydna for different cloning strategies, such as Gibson assembly, Restriction-Ligation, etc.
Most pydna functionality is implemented as methods for the double stranded DNA sequence record classes Dseq and Dseqrecord, which are subclasses of the Biopython Seq and SeqRecord classes.
These classes make PCR primer design, PCR simulation and cut-and-paste cloning very simple:
NOTE: You can run this example in this notebook
from pydna.dseqrecord import Dseqrecord
# Let's create a DNA sequence record, and add a feature to it
dsr = Dseqrecord("ATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAAT")
dsr.add_feature(x=0, y=60,type="gene", label="my_gene") # We add a feature to highlight the sequence as a gene
dsr.figure()
Dseqrecord(-60) ATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAAT TACGTTTGTCATTACTACCTACTGTAAGTTTCGTGACTAAGATAACGACTTTTTCTATTA
# This is how it would look as a genbank file
print(dsr.format("genbank"))
LOCUS name 60 bp DNA linear UNK 01-JAN-1980
DEFINITION description.
ACCESSION id
VERSION id
KEYWORDS .
SOURCE .
ORGANISM .
.
FEATURES Location/Qualifiers
misc 1..60
/type="gene"
/label="my_gene"
ORIGIN
1 atgcaaacag taatgatgga tgacattcaa agcactgatt ctattgctga aaaagataat
//
# Now let's design primers to amplify it
from pydna.design import primer_design
# limit is the minimum length of the primer, target_tm is the desired melting temperature of the primer
amplicon = primer_design(dsr, limit=13, target_tm=55)
# Let's print the primers, and a figure that shows where they align with the template sequence
print("forward primer:", amplicon.forward_primer.seq)
print("reverse primer:", amplicon.reverse_primer.seq)
amplicon.figure()
forward primer: ATGCAAACAGTAATGATGGA
reverse primer: ATTATCTTTTTCAGCAATAGAATCA
5ATGCAAACAGTAATGATGGA...TGATTCTATTGCTGAAAAAGATAAT3
|||||||||||||||||||||||||
3ACTAAGATAACGACTTTTTCTATTA5
5ATGCAAACAGTAATGATGGA3
||||||||||||||||||||
3TACGTTTGTCATTACTACCT...ACTAAGATAACGACTTTTTCTATTA5
# Let's say we don't want to just amplify it, but we want to add restriction sites to it!
from pydna.amplify import pcr
# We add the restriction sites to the primers
forward_primer = "ccccGGATCC" + amplicon.forward_primer
reverse_primer = "ttttGGATCC" + amplicon.reverse_primer
# We do the PCR
pcr_product = pcr(forward_primer, reverse_primer, dsr)
# The PCR product is of class `Amplicon`, a subclass of `Dseqrecord`.
# When doing a figure, it shows where primers anneal.
pcr_product.figure()
5ATGCAAACAGTAATGATGGA...TGATTCTATTGCTGAAAAAGATAAT3
|||||||||||||||||||||||||
3ACTAAGATAACGACTTTTTCTATTACCTAGGtttt5
5ccccGGATCCATGCAAACAGTAATGATGGA3
||||||||||||||||||||
3TACGTTTGTCATTACTACCT...ACTAAGATAACGACTTTTTCTATTA5
# If we want to see the sequence more clearly, we can turn it into a `Dseqrecord`
pcr_product = Dseqrecord(pcr_product)
pcr_product.figure()
Dseqrecord(-80) ccccGGATCCATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAATGGATCCaaaa ggggCCTAGGTACGTTTGTCATTACTACCTACTGTAAGTTTCGTGACTAAGATAACGACTTTTTCTATTACCTAGGtttt
from Bio.Restriction import BamHI # cuts GGATCC
# a, payload, c are the cut fragments
a, payload, c = pcr_product.cut (BamHI)
print(a.figure())
print()
print (payload.figure())
print()
print(c.figure())
Dseqrecord(-9) ccccG ggggCCTAG Dseqrecord(-70) GATCCATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAATG GTACGTTTGTCATTACTACCTACTGTAAGTTTCGTGACTAAGATAACGACTTTTTCTATTACCTAG Dseqrecord(-9) GATCCaaaa Gtttt
# We create a circular vector to insert the amplicon into
vector = Dseqrecord("aatgtttttccctCCCGGGcaaaatAGATCTtgctatgcatcatcgatct", circular=True, name="vect")
vector.figure()
Dseqrecord(o50)
aatgtttttccctCCCGGGcaaaatAGATCTtgctatgcatcatcgatct
ttacaaaaagggaGGGCCCgttttaTCTAGAacgatacgtagtagctaga
from Bio.Restriction import BglII # cuts AGATCT
linear_vector_bgl = vector.cut(BglII)[0] # Linearize the vector at BglII (produces only one fragment)
# Ligate the fragment of interest to the vector, and call looped() to circularize it
# synced is used to place the origin coordinate (0) in the same place for rec_vector and vector
rec_vector= (linear_vector_bgl + payload).looped().synced(vector)
rec_vector.figure()
Dseqrecord(o116) aatgtttttccctCCCGGGcaaaatAGATCCATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAATGGATCTtgctatgcatcatcgatct ttacaaaaagggaGGGCCCgttttaTCTAGGTACGTTTGTCATTACTACCTACTGTAAGTTTCGTGACTAAGATAACGACTTTTTCTATTACCTAGAacgatacgtagtagctaga
# Let's simulate a Gibson assembly
from pydna.assembly import Assembly
fragments = [
Dseqrecord('aatgtttttccctCACTACGtgctatgcatcat', name="fragment_A"),
Dseqrecord('tgctatgcatcatCTATGGAcactctaataatg', name="fragment_B"),
Dseqrecord('cactctaataatgTTACATAaatgtttttccct', name="fragment_C"),
]
# limit is the min. homology length between fragments in the assembly
asm = Assembly(fragments, limit=10)
# From the assembly object, which can generate all possible products, get a circular
product, *rest = asm.assemble_circular()
# We can print a figure that shows the overlaps between fragments
product.figure()
-|fragment_A|13
| \/
| /\
| 13|fragment_B|13
| \/
| /\
| 13|fragment_C|13
| \/
| /\
| 13-
| |
--------------------------------------------
# Or show the final sequence:
Dseqrecord(product).figure()
Dseqrecord(o60)
aatgtttttccctCACTACGtgctatgcatcatCTATGGAcactctaataatgTTACATA
ttacaaaaagggaGTGATGCacgatacgtagtaGATACCTgtgagattattacAATGTAT
As the example above shows, pydna keeps track of sticky ends and features.
Pydna can be very compact. The eleven lines of Python below simulates the construction of a recombinant plasmid. DNA sequences are downloaded from Genbank by accession numbers that are guaranteed to be stable over time.
from pydna.genbank import Genbank
gb = Genbank("myself@email.com") # Tell Genbank who you are!
gene = gb.nucleotide("X06997") # Kluyveromyces lactis LAC12 gene for lactose permease.
from pydna.parsers import parse_primers
primer_f,primer_r = parse_primers(''' >760_KlLAC12_rv (20-mer)
ttaaacagattctgcctctg
>759_KlLAC12_fw (19-mer)
aaatggcagatcattcgag ''')
from pydna.amplify import pcr
pcr_prod = pcr(primer_f,primer_r, gene)
vector = gb.nucleotide("AJ001614") # pCAPs cloning vector
from Bio.Restriction import EcoRV
lin_vector = vector.linearize(EcoRV)
rec_vec = ( lin_vector + pcr_prod ).looped()
By default, pydna is installed with minimal dependencies, but there are optional dependencies for additional functionality.
Click here to see optional dependencies
Enables the pydna.dseqrecord.Dseqrecord.copy_gb_to_clipboard()
and pydna.dseqrecord.Dseqrecord.copy_fasta_to_clipboard()
These methods will put a copy the sequence on the clipboard in either Genbank (gb) or fasta format.
Dependency | Function in pydna |
---|---|
pyperclip | copy sequence to clipboard |
Pyparsing enables the pydna.genbankfixer.gbtext_clean()
function that can automatically
correct malformed sequence files in Genbank format. These are often found online, so this
option also installs requests to enable the pydna.genbankfixer.download.download_text()
function which can be used to get cleaned up text from a URL.
Dependency | Function in pydna |
---|---|
pyparsing | fix corrupt Genbank files with pydna.genbankfixer |
requests | download sequences with pydna.download |
This option enables the pydna.utils.cai()
function and the cai()
method
available from subclasses of pydna.seqrecord.SeqRecord
, such as
pydna.dseqrecord.Dseqrecord
.
| cai2 | codon adaptation index calculations in several modules |
Scipy, matplotlib and pillow (PIL) enable the generation of gel images. Numpy is also needed, but usually installed as a dependency of biopython.
Dependency | Function in pydna |
---|---|
scipy | gel simulation with pydna.gel |
matplotlib | β |
pillow | β |
# use the `--pre` flag to get the latest version of pydna.
pip install --pre --upgrade pydna
# to install the optional dependencies, you can use the following command:
pip install --pre --upgrade pydna[clipboard,download,express,gel]
Remove options inside the square brackets as required, but be sure not to leave spaces as pip will not recognize the options. See below under "Optional dependencies".
If your project uses poetry to manage dependencies, you can install pydna with the following commands:
# Basic installation
poetry add pydna
# With optional dependencies (ommit the options you don't want)
poetry add pydna --extras "clipboard download express gel"
# If you already have it installed and you want to add or remove optional
# dependencies, you have to uninstall and install again
poetry remove pydna
poetry add pydna --extras "express gel"
Feedback & suggestions are very welcome! Please create an issue with your question, comment or suggestion. Please include the version of pydna you are using and code to reproduce the issue if possible.
If you don't have a github account, you can get in touch through the google group for pydna.
Below are the instructions for developers who want to contribute to pydna. Please direct pull requests towards the dev_bjorn
branch.
Fork the entire repository (not just the master
branch by unticking the "Copy the master
branch only" box)
Create your branch starting from dev_bjorn
, and if your changes are related to an issue, call the branch issue_<number>
.
# Clone the repository
git clone https://github.com/<your-username>/pydna.git
# Change to the repository directory
cd pydna
# Go to the dev_bjorn branch
git checkout -b dev_bjorn
# Pull the current version of dev_bjorn
git pull origin dev_bjorn
# Create your own branch
git checkout -b issue_<number>
This is the preferred method to develop on pydna, so if you plan to contribute regularly, it's worth taking this route. If you encounter any issues, create an issue on GitHub and we will be able to help.
Use Poetry to install dependencies and activate virtual environment. This is necessary if you want to edit the project dependencies.
# If you want the virtual environment to be created in this folder
# (this is now the default, see `poetry.toml`)
poetry config virtualenvs.in-project true
# Install dependencies (extras are required for tests to pass)
poetry install --all-extras
# Activate virtual environment
poetry shell
# Install pre-commit hooks
poetry run pre-commit install
TODO
- Make your changes.
- Add the necessary tests in
tests/
. - Run the tests from the root directory with
python run_test.py
.TIP: You can run a particular test file with
pytest -vs test_file.py
(-v
for verbose and-s
to see print statements in the test). If you want to run just a single test, you can usepytest -vs -k test_name
, wheretest_name
is the name of the test function. - Install
pre-commit
hooks if you haven't by runningpre-commit install
.pre-commit
should be available in the environment, since it is installed by poetry.TIP: The hooks are a series of checks that will be run before you commit your code. If any of the checks fail, the commit will not be allowed. Some of them auto-fix the code (e.g.,
black
formatting), so you can simply dogit add .
and commit again. - Push the changes to your fork
- From your fork, make a PR towards the branch
dev_bjorn
in the original repository. - Mention the issue number in the PR description (e.g.,
Closes #123
). - Remember to click the "Allow edits from maintainers" checkbox so that we can make some changes to the PR if necessary.
The test_and_coverage workflow is triggered on all pushed commits for all branches except the master
branch. This workflow run tests, doctests and a series of Jupyter notebooks using pytest on Linux, Windows and macOS with all
supported python versions.
Documentation is built using Sphinx from docstrings using a GitHub action. The numpy docstring format is used.
Below the commands to run a local sphinx server that auto-updated when files are changed.
# Install docs dependency group
poetry install --with docs
# Start the sphinx server to see docs live by default at http://127.0.0.1:8000/
sphinx-autobuild --watch src/ docs docs/_build/html
See the releases for changes and releases.
The build workflow builds a PyPI packages using poetry. This workflow is triggered by publishing a Github release manually from the Github web interface.
Pydna was made public in 2012 on Google code.
π¦
π΅πΉ
Taylor, L. J., & Strebel, K. (2017). Pyviko: an automated Python tool to design gene knockouts in complex viruses with overlapping genes. BMC Microbiology, 17(1), 12. PubMed
Wang, Y., Xue, H., Pourcel, C., Du, Y., & Gautheret, D. (2021). 2-kupl: mapping-free variant detection from DNA-seq data of matched samples. In Cold Spring Harbor Laboratory (p. 2021.01.17.427048). DOI PubMed
ShareYourCloning, a web application for designing and documenting DNA cloning strategies.
An Automated Protein Synthesis Pipeline with Transcriptic and Snakemake
and other projects on github