-
-

Pydna

-
-

Welcome to pydna’s documentation!

-

Planning genetic constructs with many parts and assembly steps, such as recombinant -metabolic pathways :petri_dish:, is often difficult to document properly, as is evident from the poor -state of documentation in the scientific literature :radioactive:.

-

The pydna Python package provides human-readable formal descriptions of :dna: cloning and genetic assembly -strategies in Python :snake: which allow for simulation and verification. -Pydna can be used as executable documentation for cloning.

-

A cloning strategy expressed in pydna is complete, unambiguous and stable.

-

Pydna provides simulation of:

+
+

Welcome to pydna’s documentation!

+

Stuff & other stuff

+
+

Module contents

+
+
copyright:
+

Copyright 2013-2023 by Björn Johansson. All rights reserved.

+
+
license:
+

This code is part of the pydna package, governed by the +license in LICENSE.txt that should be included as part +of this package.

+
+
+
+

pydna

+

Pydna is a python package providing code for simulation of the creation of +recombinant DNA molecules using +molecular biology +techniques. Development of pydna happens in this Github repository.

+
+
Provided:
    +
  1. PCR simulation

  2. +
  3. Assembly simulation based on shared identical sequences

  4. +
  5. Primer design for amplification of a given sequence

  6. +
  7. Automatic design of primer tails for Gibson assembly +or homologous recombination.

  8. +
  9. Restriction digestion and cut&paste cloning

  10. +
  11. Agarose gel simulation

  12. +
  13. Download sequences from Genbank

  14. +
  15. Parsing various sequence formats including the capacity to +handle broken Genbank format

  16. +
+
+
+
+

pydna package layout

+

The most important modules and how to import functions or classes from +them are listed below. Class names start with a capital letter, +functions with a lowercase letter:

+
from pydna.module import function
+from pydna.module import Class
+
+Example: from pydna.gel import Gel
+
+pydna
+   ├── amplify
+   │         ├── Anneal
+   │         └── pcr
+   ├── assembly
+   │          └── Assembly
+   ├── design
+   │        ├── assembly_fragments
+   │        └── primer_design
+   ├── download
+   │          └── download_text
+   ├── dseqrecord
+   │            └── Dseqrecord
+   ├── gel
+   │     └── Gel
+   ├── genbank
+   │         ├── genbank
+   │         └── Genbank
+   ├── parsers
+   │         ├── parse
+   │         └── parse_primers
+   └── readers
+             ├── read
+             └── read_primers
+
+
+
+
+

How to use the documentation

+

Documentation is available as docstrings provided in the source code for +each module. +These docstrings can be inspected by reading the source code directly. +See further below on how to obtain the code for pydna.

+

In the python shell, use the built-in help function to view a +function’s docstring:

+
>>> from pydna import readers
+>>> help(readers.read)
+... 
+
+
+

The docstrings are also used to provide an automatically generated reference +manual available online at +Read the Docs.

+

Docstrings can be explored using IPython, an +advanced Python shell with +TAB-completion and introspection capabilities. To see which functions +are available in pydna, +type pydna.<TAB> (where <TAB> refers to the TAB key). +Use pydna.open_config_folder?<ENTER> to view the docstring or +pydna.open_config_folder??<ENTER> to view the source code.

+

In the Spyder IDE it is possible +to place the cursor immediately before the name of a module, class or +function and press ctrl+i to bring up docstrings in a separate window.

+

Code snippets are indicated by three greater-than signs:

+
>>> x=41
+>>> x=x+1
+>>> x
+42
+
+
+
+
+

pydna source code

+

The pydna source code is +available on Github.

+
+
+

How to get more help

+

Please join the +Google group +for pydna; this is the preferred location for help. If you find bugs +in pydna itself, open an issue at the +Github repository.

+
+
+

Examples of pydna in use

+
+
See this repository for a collection of

examples.

+
+
+
+
+
+
+pydna.open_current_folder()[source]
+

Open the current working directory.

+

Opens in the default file manager. The location for this folder is +given by the os.getcwd() function

+
+ +
+
+pydna.open_cache_folder()[source]
+

Open the pydna cache folder.

+

Opens in the default file manager. The location for this folder is stored +in the pydna_data_dir environmental variable.

+
+ +
+
+pydna.open_config_folder()[source]
+

Open the pydna configuration folder.

+

Opens in the default file manager. The location for this folder is stored +in the pydna_config_dir environmental variable.

+

The pydna.ini file can be edited to make pydna quicker to use. +See the documentation of the configparser.ConfigParser class.

+

Below is the content of a typical pydna.ini file on a Linux +system.

+
[main]
+loglevel=30
+email=myemail@example.org
+data_dir=/home/bjorn/.local/share/pydna
+log_dir=/home/bjorn/.cache/pydna/log
+ape=tclsh /home/bjorn/.ApE/AppMain.tcl
+cached_funcs=Genbank_nucleotide
+primers=/home/bjorn/Dropbox/wikidata/PRIMERS.txt
+enzymes=/home/bjorn/Dropbox/wikidata/RestrictionEnzymes.txt
+
+
+

The email address is set to someone@example.com by default. If you change +this to your own address, the pydna.genbank.genbank() function can be +used to download sequences from Genbank directly without having to +explicitly add the email address.

+

Pydna can cache results from the following functions or methods:

+ +

These can be added, separated by commas, to the cached_funcs entry +in the pydna.ini file or to the pydna_cached_funcs environment variable.
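
As a rough illustration of how such a file could be read, the sketch below parses pydna.ini-style text with the standard-library configparser module and splits the cached_funcs entry on commas. The helper name read_settings and the names f1 and f2 in the usage note are hypothetical; this is not pydna's own loading code, only a minimal example built on the keys shown in the file above.

```python
import configparser

# A shortened copy of the example file above (assumed layout: one [main] section).
INI_TEXT = """
[main]
loglevel=30
email=myemail@example.org
cached_funcs=Genbank_nucleotide
"""


def read_settings(text):
    """Parse pydna.ini-style text and return a small dict of settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    main = parser["main"]
    # cached_funcs is a comma-separated list; drop empty entries and whitespace.
    raw = main.get("cached_funcs", fallback="")
    cached = [name.strip() for name in raw.split(",") if name.strip()]
    return {
        "loglevel": main.getint("loglevel"),
        "email": main["email"],
        "cached_funcs": cached,
    }


settings = read_settings(INI_TEXT)
print(settings["cached_funcs"])
```

An entry such as cached_funcs=f1, f2 would be returned as the list ['f1', 'f2'] by this helper.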

+
+ +
+
+pydna.open_log_folder()[source]
+

docstring.

+
+ +
+
+pydna.get_env()[source]
+

Print an ASCII table containing all environment variables.

+

Pydna related variables have names that start with pydna_
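
The naming convention can be used to pick out the relevant variables by hand. The sketch below filters a mapping of environment variables by the pydna_ prefix; the function name pydna_variables is hypothetical and the code does not reproduce the table that get_env prints.

```python
import os


def pydna_variables(environ=None):
    """Return the variables whose names start with 'pydna_', sorted by name."""
    environ = os.environ if environ is None else environ
    return {
        name: value
        for name, value in sorted(environ.items())
        if name.startswith("pydna_")
    }


# Example with an explicit mapping instead of the real environment.
print(pydna_variables({"pydna_email": "someone@example.com", "PATH": "/usr/bin"}))
```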

+
+ +
+ +

Ascii-art logotype of pydna.

+
+ +
+
+

pydna.dseq module

+

Provides the Dseq class for handling double stranded DNA sequences.

+

Dseq is a subclass of Bio.Seq.Seq. The Dseq class +is mostly useful as a part of the pydna.dseqrecord.Dseqrecord class +which can hold more meta data.

+

The Dseq class supports the notion of circular and linear DNA topology.

+
+
+class pydna.dseq.Dseq(watson: str | bytes, crick: str | bytes | None = None, ovhg=None, circular=False, pos=0)[source]
+

Bases: Seq

+

Dseq holds information for a double stranded DNA fragment.

+

Dseq also holds information describing the topology of +the DNA fragment (linear or circular).

+
+
Parameters:
+
    +
  • watson (str) – a string representing the watson (sense) DNA strand.

  • +
  • crick (str, optional) – a string representing the crick (antisense) DNA strand.

  • +
  • ovhg (int, optional) – A positive or negative number to describe the stagger between the +watson and crick strands. +See below for a detailed explanation.

  • +
  • linear (bool, optional) – True indicates that sequence is linear, False that it is circular.

  • +
  • circular (bool, optional) – True indicates that sequence is circular, False that it is linear.

  • +
+
+
+

Examples

+

Dseq is a subclass of the Biopython Seq object. It stores two +strings representing the watson (sense) and crick (antisense) strands, +two properties called linear and circular, and a numeric value ovhg +(overhang) describing the stagger for the watson and crick strand +in the 5’ end of the fragment.

+

The most common usage is probably to create a Dseq object as a +part of a Dseqrecord object (see pydna.dseqrecord.Dseqrecord).

+

There are three ways of creating a Dseq object directly listed below, but you can also +use the function Dseq.from_full_sequence_and_overhangs() to create a Dseq:

+

Only one argument (string):

+
>>> from pydna.dseq import Dseq
+>>> Dseq("aaa")
+Dseq(-3)
+aaa
+ttt
+
+
+

The given string will be interpreted as the watson strand of a +blunt, linear double stranded sequence object. The crick strand +is created automatically from the watson strand.

+

Two arguments (string, string):

+
>>> from pydna.dseq import Dseq
+>>> Dseq("gggaaat","ttt")
+Dseq(-7)
+gggaaat
+   ttt
+
+
+

If both watson and crick are given, but not ovhg, an attempt +will be made to find the best annealing between the strands. +There are limitations to this. For long fragments it is quite +slow. The length of the annealing sequence has to be at least +half the length of the shortest of the strands.

+

Three arguments (string, string, ovhg=int):

+

The ovhg parameter is an integer describing the length of the +crick strand overhang in the 5’ end of the molecule.

+

The ovhg parameter controls the stagger at the five prime end:

+
dsDNA       overhang
+
+  nnn...    2
+nnnnn...
+
+ nnnn...    1
+nnnnn...
+
+nnnnn...    0
+nnnnn...
+
+nnnnn...   -1
+ nnnn...
+
+nnnnn...   -2
+  nnn...
+
+
+

Example of creating Dseq objects with different amounts of stagger:

+
>>> Dseq(watson="agt", crick="actta", ovhg=-2)
+Dseq(-7)
+agt
+  attca
+>>> Dseq(watson="agt",crick="actta",ovhg=-1)
+Dseq(-6)
+agt
+ attca
+>>> Dseq(watson="agt",crick="actta",ovhg=0)
+Dseq(-5)
+agt
+attca
+>>> Dseq(watson="agt",crick="actta",ovhg=1)
+Dseq(-5)
+ agt
+attca
+>>> Dseq(watson="agt",crick="actta",ovhg=2)
+Dseq(-5)
+  agt
+attca
+
+
+
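
The relationship between ovhg, the two-line figure and the reported length can be sketched with plain strings. The helper names render and total_length below are hypothetical and are not pydna functions; the sketch assumes, as in the examples above, that the crick strand is passed 5’ to 3’ and drawn reversed under the watson strand.

```python
def render(watson, crick, ovhg):
    """Draw the two strands as in the figures above.

    A positive ovhg indents the watson strand, a negative ovhg indents
    the crick strand. The crick string is given 5'->3' and shown 3'->5'.
    """
    top = " " * max(ovhg, 0) + watson
    bottom = " " * max(-ovhg, 0) + crick[::-1]
    return top + "\n" + bottom


def total_length(watson, crick, ovhg):
    """Length of the whole molecule, including single-stranded ends."""
    return max(max(ovhg, 0) + len(watson), max(-ovhg, 0) + len(crick))


print(render("agt", "actta", -2))
print(total_length("agt", "actta", -2))  # 7, matching Dseq(-7) above
```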

If the ovhg parameter is specified a crick strand also +needs to be supplied, otherwise an exception is raised.

+
>>> Dseq(watson="agt", ovhg=2)
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+  File "/usr/local/lib/python2.7/dist-packages/pydna_/dsdna.py", line 169, in __init__
+    else:
+ValueError: ovhg defined without crick strand!
+
+
+

The shape of the fragment is set by circular = True, False

+

Note that both ends of the DNA fragment have to be compatible to set +circular = True.

+
>>> Dseq("aaa","ttt")
+Dseq(-3)
+aaa
+ttt
+>>> Dseq("aaa","ttt",ovhg=0)
+Dseq(-3)
+aaa
+ttt
+>>> Dseq("aaa","ttt",ovhg=1)
+Dseq(-4)
+ aaa
+ttt
+>>> Dseq("aaa","ttt",ovhg=-1)
+Dseq(-4)
+aaa
+ ttt
+>>> Dseq("aaa", "ttt", circular = True , ovhg=0)
+Dseq(o3)
+aaa
+ttt
+
+
+
>>> a=Dseq("tttcccc","aaacccc")
+>>> a
+Dseq(-11)
+    tttcccc
+ccccaaa
+>>> a.ovhg
+4
+
+
+
>>> b=Dseq("ccccttt","ccccaaa")
+>>> b
+Dseq(-11)
+ccccttt
+    aaacccc
+>>> b.ovhg
+-4
+>>>
+
+
+

Coercing to string

+
>>> str(a)
+'ggggtttcccc'
+
+
+
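
One way to picture the string form is that positions covered by the watson strand are copied from it and the remaining single-stranded positions are filled in from the complement of the crick strand. The sketch below follows that idea with plain strings; full_sequence is a hypothetical helper, not pydna's implementation, and it assumes the two strands overlap.

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def full_sequence(watson, crick, ovhg):
    """Build the string form of a staggered duplex (strands must overlap)."""
    # Crick strand read left to right and complemented, so that it is in
    # the same sense as the watson strand.
    bottom = crick[::-1].translate(COMPLEMENT)
    w_start = max(ovhg, 0)
    c_start = max(-ovhg, 0)
    length = max(w_start + len(watson), c_start + len(bottom))
    bases = []
    for i in range(length):
        wi = i - w_start
        if 0 <= wi < len(watson):
            bases.append(watson[wi])            # covered by watson
        else:
            bases.append(bottom[i - c_start])   # single-stranded crick part
    return "".join(bases)


print(full_sequence("tttcccc", "aaacccc", 4))  # ggggtttcccc
```

The same helper gives 'agtaagt' for watson="agt", crick="actta", ovhg=-2.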

A Dseq object can be longer than either the watson or crick strands.

+
<-- length -->
+GATCCTTT
+     AAAGCCTAG
+
+<-- length -->
+      GATCCTTT
+AAAGCCCTA
+
+
+

The slicing of a linear Dseq object works mostly as it does for a string.

+
>>> s="ggatcc"
+>>> s[2:3]
+'a'
+>>> s[2:4]
+'at'
+>>> s[2:4:-1]
+''
+>>> s[::2]
+'gac'
+>>> from pydna.dseq import Dseq
+>>> d=Dseq(s, circular=False)
+>>> d[2:3]
+Dseq(-1)
+a
+t
+>>> d[2:4]
+Dseq(-2)
+at
+ta
+>>> d[2:4:-1]
+Dseq(-0)
+
+
+>>> d[::2]
+Dseq(-3)
+gac
+ctg
+
+
+

The slicing of a circular Dseq object has a slightly different meaning.

+
>>> s="ggAtCc"
+>>> d=Dseq(s, circular=True)
+>>> d
+Dseq(o6)
+ggAtCc
+ccTaGg
+>>> d[4:3]
+Dseq(-5)
+CcggA
+GgccT
+
+
+

The slice [X:X] produces an empty slice for a string, while this +will return the linearized sequence starting at X:

+
>>> s="ggatcc"
+>>> d=Dseq(s, circular=True)
+>>> d
+Dseq(o6)
+ggatcc
+cctagg
+>>> d[3:3]
+Dseq(-6)
+tccgga
+aggcct
+>>>
+
+
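
The wrap-around behaviour can be sketched on an ordinary string. The function circular_slice below is a hypothetical helper that mimics the two examples above for non-negative indices within the sequence; it is not how Dseq implements slicing.

```python
def circular_slice(seq, start, stop):
    """Slice a string as if it were circular.

    When start < stop this is an ordinary slice. Otherwise the slice runs
    from start to the end and continues from the beginning up to stop, so
    [X:X] yields the whole sequence linearized at X.
    """
    if start < stop:
        return seq[start:stop]
    return seq[start:] + seq[:stop]


print(circular_slice("ggAtCc", 4, 3))  # CcggA
print(circular_slice("ggatcc", 3, 3))  # tccgga
```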
+ +
+
+trunc = 30
+
+ +
+
+classmethod quick(watson: str, crick: str, ovhg=0, circular=False, pos=0)[source]
+
+ +
+
+classmethod from_string(dna: str, *args, circular=False, **kwargs)[source]
+
+ +
+
+classmethod from_representation(dsdna: str, *args, **kwargs)[source]
+
+ +
+
+classmethod from_full_sequence_and_overhangs(full_sequence: str, crick_ovhg: int, watson_ovhg: int)[source]
+

Create a linear Dseq object from a full sequence and the 3’ overhangs of each strand.

+

The order of the parameters is like this because the 3’ overhang of the crick strand is the one +on the left side of the sequence.

+
+
Parameters:
+
    +
  • full_sequence (str) – The full sequence of the Dseq object.

  • +
  • crick_ovhg (int) – The overhang of the crick strand in the 3’ end. Equivalent to Dseq.ovhg.

  • +
  • watson_ovhg (int) – The overhang of the watson strand in the 3’ end.

  • +
+
+
Returns:
+

A Dseq object.

+
+
Return type:
+

Dseq

+
+
+

Examples

+
>>> Dseq.from_full_sequence_and_overhangs('AAAAAA', crick_ovhg=2, watson_ovhg=2)
+Dseq(-6)
+  AAAA
+TTTT
+>>> Dseq.from_full_sequence_and_overhangs('AAAAAA', crick_ovhg=-2, watson_ovhg=2)
+Dseq(-6)
+AAAAAA
+  TT
+>>> Dseq.from_full_sequence_and_overhangs('AAAAAA', crick_ovhg=2, watson_ovhg=-2)
+Dseq(-6)
+  AA
+TTTTTT
+>>> Dseq.from_full_sequence_and_overhangs('AAAAAA', crick_ovhg=-2, watson_ovhg=-2)
+Dseq(-6)
+AAAA
+  TTTT
+
+
+
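
The four examples above can be reproduced with a small string sketch: a positive overhang on one side shortens the opposite strand on that side, a negative one shortens the strand itself. The helper name split_strands is hypothetical and only illustrates the sign convention; it returns the two display lines rather than a Dseq object.

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def split_strands(full_sequence, crick_ovhg, watson_ovhg):
    """Return the (top, bottom) display lines for the given overhangs."""
    n = len(full_sequence)
    # Left side: positive crick_ovhg means the crick strand protrudes.
    w_start = max(crick_ovhg, 0)
    c_start = max(-crick_ovhg, 0)
    # Right side: positive watson_ovhg means the watson strand protrudes.
    w_end = n - max(-watson_ovhg, 0)
    c_end = n - max(watson_ovhg, 0)
    top = " " * w_start + full_sequence[w_start:w_end]
    bottom = " " * c_start + full_sequence[c_start:c_end].translate(COMPLEMENT)
    return top, bottom


for crick_ovhg, watson_ovhg in [(2, 2), (-2, 2), (2, -2), (-2, -2)]:
    top, bottom = split_strands("AAAAAA", crick_ovhg, watson_ovhg)
    print(top)
    print(bottom)
```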
+ +
+
+mw() float[source]
+

This method returns the molecular weight of the DNA molecule +in g/mol. The following formula is used:

+
MW = (A x 313.2) + (T x 304.2) +
+     (C x 289.2) + (G x 329.2) +
+     (N x 308.9) + 79.0
+
+
+
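
As a worked example of the formula, the sketch below applies it to a single strand given as a string. The name strand_mw is hypothetical, and how pydna combines the two strands of a Dseq object is not stated here, so this should be read only as arithmetic on the formula above.

```python
# Per-nucleotide weights from the formula above, in g/mol.
WEIGHTS = {"A": 313.2, "T": 304.2, "C": 289.2, "G": 329.2, "N": 308.9}


def strand_mw(strand):
    """Apply the formula to one strand; other characters are ignored."""
    strand = strand.upper()
    return sum(weight * strand.count(base) for base, weight in WEIGHTS.items()) + 79.0


# One each of A, C, G and T: 313.2 + 289.2 + 329.2 + 304.2 + 79.0
print(round(strand_mw("acgt"), 1))  # 1314.8
```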
+ +
+
+upper() DseqType[source]
+

Return an upper case copy of the sequence.

+
>>> from pydna.dseq import Dseq
+>>> my_seq = Dseq("aAa")
+>>> my_seq
+Dseq(-3)
+aAa
+tTt
+>>> my_seq.upper()
+Dseq(-3)
+AAA
+TTT
+
+
+
+
Returns:
+

Dseq object in uppercase

+
+
Return type:
+

Dseq

+
+
+
+

See also

+

pydna.dseq.Dseq.lower

+
+
+ +
+
+lower() DseqType[source]
+

Return a lower case copy of the sequence.

+
>>> from pydna.dseq import Dseq
+>>> my_seq = Dseq("aAa")
+>>> my_seq
+Dseq(-3)
+aAa
+tTt
+>>> my_seq.lower()
+Dseq(-3)
+aaa
+ttt
+
+
+
+
Returns:
+

Dseq object in lowercase

+
+
Return type:
+

Dseq

+
+
+
+

See also

+

pydna.dseq.Dseq.upper

+
+
+ +
+
+find(sub: _SeqAbstractBaseClass | str | bytes, start=0, end=_sys.maxsize) int[source]
+

This method behaves like the python string method of the same name.

+

Returns an integer, the index of the first occurrence of substring +argument sub in the (sub)sequence given by [start:end].

+

Returns -1 if the subsequence is NOT found.

+
+
Parameters:
+
    +
  • sub (string or Seq object) – a string or another Seq object to look for.

  • +
  • start (int, optional) – slice start.

  • +
  • end (int, optional) – slice end.

  • +
+
+
+

Examples

+
>>> from pydna.dseq import Dseq
+>>> seq = Dseq("atcgactgacgtgtt")
+>>> seq
+Dseq(-15)
+atcgactgacgtgtt
+tagctgactgcacaa
+>>> seq.find("gac")
+3
+>>> seq = Dseq(watson="agt",crick="actta",ovhg=-2)
+>>> seq
+Dseq(-7)
+agt
+  attca
+>>> seq.find("taa")
+2
+
+
+
+ +
+
+reverse_complement() Dseq[source]
+

Dseq object where watson and crick have switched places.

+

This represents the same double stranded sequence.

+

Examples

+
>>> from pydna.dseq import Dseq
+>>> a=Dseq("catcgatc")
+>>> a
+Dseq(-8)
+catcgatc
+gtagctag
+>>> b=a.reverse_complement()
+>>> b
+Dseq(-8)
+gatcgatg
+ctagctac
+>>>
+
+
+
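
For a blunt molecule the effect on the watson strand can be checked with plain strings. The helper below is a generic reverse-complement function written for illustration; it is not the pydna method itself.

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(strand):
    """Complement every base, then reverse the result."""
    return strand.translate(COMPLEMENT)[::-1]


watson = "catcgatc"
new_watson = reverse_complement(watson)
print(new_watson)  # gatcgatg

# Applying the operation twice restores the original strand.
print(reverse_complement(new_watson) == watson)  # True
```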
+ +
+
+rc() Dseq
+

Dseq object where watson and crick have switched places.

+

This represents the same double stranded sequence.

+

Examples

+
>>> from pydna.dseq import Dseq
+>>> a=Dseq("catcgatc")
+>>> a
+Dseq(-8)
+catcgatc
+gtagctag
+>>> b=a.reverse_complement()
+>>> b
+Dseq(-8)
+gatcgatg
+ctagctac
+>>>
+
+
+
+ +
+
+shifted(shift: int) DseqType[source]
+

Shifted version of a circular Dseq object.

+
+ +
+
+looped() DseqType[source]
+

Circularized Dseq object.

+

This can only be done if the two ends are compatible, +otherwise a TypeError is raised.

+

Examples

+
>>> from pydna.dseq import Dseq
+>>> a=Dseq("catcgatc")
+>>> a
+Dseq(-8)
+catcgatc
+gtagctag
+>>> a.looped()
+Dseq(o8)
+catcgatc
+gtagctag
+>>> a.T4("t")
+Dseq(-8)
+catcgat
+ tagctag
+>>> a.T4("t").looped()
+Dseq(o7)
+catcgat
+gtagcta
+>>> a.T4("a")
+Dseq(-8)
+catcga
+  agctag
+>>> a.T4("a").looped()
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+  File "/usr/local/lib/python2.7/dist-packages/pydna/dsdna.py", line 357, in looped
+    if type5 == type3 and str(sticky5) == str(rc(sticky3)):
+TypeError: DNA cannot be circularized.
+5' and 3' sticky ends not compatible!
+>>>
+
+
+
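
The condition visible in the traceback above (equal end types, and a 5’ sticky end equal to the reverse complement of the 3’ sticky end) can be sketched as a small check. The function can_circularize is hypothetical; it takes tuples shaped like the return values of five_prime_end() and three_prime_end(), and the tuples used in the example are read off the figures above by hand rather than produced by pydna.

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(strand):
    return strand.translate(COMPLEMENT)[::-1]


def can_circularize(five_end, three_end):
    """Check end compatibility as in the condition shown in the traceback."""
    type5, sticky5 = five_end
    type3, sticky3 = three_end
    return type5 == type3 and sticky5 == reverse_complement(sticky3)


# Blunt molecule: both ends are ('blunt', '').
print(can_circularize(("blunt", ""), ("blunt", "")))   # True
# After T4("t"): single-base 5' overhangs 'c' and 'g'.
print(can_circularize(("5'", "c"), ("5'", "g")))       # True
# After T4("a"): 5' overhangs 'ca' and 'ga' do not match.
print(can_circularize(("5'", "ca"), ("5'", "ga")))     # False
```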
+ +
+
+tolinear() DseqType[source]
+

Returns a blunt, linear copy of a circular Dseq object. This can +only be done if the Dseq object is circular, otherwise a +TypeError is raised.

+

This method is deprecated, use slicing instead. See example below.

+

Examples

+
>>> from pydna.dseq import Dseq
+>>> a=Dseq("catcgatc", circular=True)
+>>> a
+Dseq(o8)
+catcgatc
+gtagctag
+>>> a[:]
+Dseq(-8)
+catcgatc
+gtagctag
+>>>
+
+
+
+ +
+
+five_prime_end() Tuple[str, str][source]
+

Returns a tuple describing the structure of the 5’ end of +the DNA fragment

+

Examples

+
>>> from pydna.dseq import Dseq
+>>> a=Dseq("aaa", "ttt")
+>>> a
+Dseq(-3)
+aaa
+ttt
+>>> a.five_prime_end()
+('blunt', '')
+>>> a=Dseq("aaa", "ttt", ovhg=1)
+>>> a
+Dseq(-4)
+ aaa
+ttt
+>>> a.five_prime_end()
+("3'", 't')
+>>> a=Dseq("aaa", "ttt", ovhg=-1)
+>>> a
+Dseq(-4)
+aaa
+ ttt
+>>> a.five_prime_end()
+("5'", 'a')
+>>>
+
+
+ +
+ +
+
+three_prime_end() Tuple[str, str][source]
+

Returns a tuple describing the structure of the 3’ end of +the DNA fragment

+
>>> from pydna.dseq import Dseq
+>>> a=Dseq("aaa", "ttt")
+>>> a
+Dseq(-3)
+aaa
+ttt
+>>> a.three_prime_end()
+('blunt', '')
+>>> a=Dseq("aaa", "ttt", ovhg=1)
+>>> a
+Dseq(-4)
+ aaa
+ttt
+>>> a.three_prime_end()
+("3'", 'a')
+>>> a=Dseq("aaa", "ttt", ovhg=-1)
+>>> a
+Dseq(-4)
+aaa
+ ttt
+>>> a.three_prime_end()
+("5'", 't')
+>>>
+
+
+ +
+ +
+
+watson_ovhg() int[source]
+

Returns the overhang of the watson strand at the three prime end.

+
+ +
+
+fill_in(nucleotides: None | str = None) Dseq[source]
+

Fill in five prime protruding ends with a DNA polymerase +that has only DNA polymerase activity (such as exo-klenow [1]) +and any combination of A, G, C or T. The default is all four +nucleotides together.

+
+
Parameters:
+

nucleotides (str)

+
+
+

Examples

+
>>> from pydna.dseq import Dseq
+>>> a=Dseq("aaa", "ttt")
+>>> a
+Dseq(-3)
+aaa
+ttt
+>>> a.fill_in()
+Dseq(-3)
+aaa
+ttt
+>>> b=Dseq("caaa", "cttt")
+>>> b
+Dseq(-5)
+caaa
+ tttc
+>>> b.fill_in()
+Dseq(-5)
+caaag
+gtttc
+>>> b.fill_in("g")
+Dseq(-5)
+caaag
+gtttc
+>>> b.fill_in("tac")
+Dseq(-5)
+caaa
+ tttc
+>>> c=Dseq("aaac", "tttg")
+>>> c
+Dseq(-5)
+ aaac
+gttt
+>>> c.fill_in()
+Dseq(-5)
+ aaac
+gttt
+>>>
+
+
+

References

+ +
+ +
+
+transcribe() Seq[source]
+

Transcribe a DNA sequence into RNA and return the RNA sequence as a new Seq object.

+

Following the usual convention, the sequence is interpreted as the +coding strand of the DNA double helix, not the template strand. This +means we can get the RNA sequence just by switching T to U.

+
>>> from Bio.Seq import Seq
+>>> coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
+>>> coding_dna
+Seq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG')
+>>> coding_dna.transcribe()
+Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG')
+
+
+

The sequence is modified in-place and returned if inplace is True:

+
>>> from Bio.Seq import MutableSeq
+>>> sequence = MutableSeq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
+>>> sequence
+MutableSeq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG')
+>>> sequence.transcribe()
+MutableSeq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG')
+>>> sequence
+MutableSeq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG')
+
+
+
>>> sequence.transcribe(inplace=True)
+MutableSeq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG')
+>>> sequence
+MutableSeq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG')
+
+
+

As Seq objects are immutable, a TypeError is raised if +transcribe is called on a Seq object with inplace=True.

+

Trying to transcribe an RNA sequence has no effect. +If you have a nucleotide sequence which might be DNA or RNA +(or even a mixture), calling the transcribe method will ensure +any T becomes U.

+

Trying to transcribe a protein sequence will replace any +T for Threonine with U for Selenocysteine, which has no +biologically plausible rationale.

+
>>> from Bio.Seq import Seq
+>>> my_protein = Seq("MAIVMGRT")
+>>> my_protein.transcribe()
+Seq('MAIVMGRU')
+
+
+
+ +
+
+translate(table='Standard', stop_symbol='*', to_stop=False, cds=False, gap='-') Seq[source]
+

Translate..

+
+ +
+
+mung() Dseq[source]
+

Simulates treatment with a nuclease that has 5’-3’ and 3’-5’ single +strand specific exonuclease activity (such as mung bean nuclease [2])

+
    ggatcc    ->     gatcc
+     ctaggg          ctagg
+
+     ggatcc   ->      ggatc
+    tcctag            cctag
+
+>>> from pydna.dseq import Dseq
+>>> b=Dseq("caaa", "cttt")
+>>> b
+Dseq(-5)
+caaa
+ tttc
+>>> b.mung()
+Dseq(-3)
+aaa
+ttt
+>>> c=Dseq("aaac", "tttg")
+>>> c
+Dseq(-5)
+ aaac
+gttt
+>>> c.mung()
+Dseq(-3)
+aaa
+ttt
+
+
+
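
In terms of positions, removing the single-stranded ends keeps only the region covered by both strands. The sketch below computes that region for the watson strand; double_stranded_core is a hypothetical helper, and the ovhg values in the example are read off the figures above (-1 for b, 1 for c).

```python
def double_stranded_core(watson, crick, ovhg):
    """Return the part of the watson strand that is base-paired."""
    w_start = max(ovhg, 0)
    c_start = max(-ovhg, 0)
    start = max(w_start, c_start)
    end = min(w_start + len(watson), c_start + len(crick))
    if end <= start:
        return ""  # the strands do not overlap at all
    return watson[start - w_start:end - w_start]


print(double_stranded_core("caaa", "cttt", -1))  # aaa
print(double_stranded_core("aaac", "tttg", 1))   # aaa
```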

References

+ +
+ +
+
+T4(nucleotides=None) Dseq[source]
+

Fill in five prime protruding ends and chew back +three prime protruding ends with a DNA polymerase providing both +5’-3’ DNA polymerase activity and 3’-5’ nuclease activity +(such as T4 DNA polymerase). This can be done in the presence of any +combination of the four nucleotides A, G, C or T. Removing one or more nucleotides +can facilitate engineering of sticky ends. The default is all four nucleotides together.

+
+
Parameters:
+

nucleotides (str)

+
+
+

Examples

+
>>> from pydna.dseq import Dseq
+>>> a=Dseq("gatcgatc")
+>>> a
+Dseq(-8)
+gatcgatc
+ctagctag
+>>> a.T4()
+Dseq(-8)
+gatcgatc
+ctagctag
+>>> a.T4("t")
+Dseq(-8)
+gatcgat
+ tagctag
+>>> a.T4("a")
+Dseq(-8)
+gatcga
+  agctag
+>>> a.T4("g")
+Dseq(-8)
+gatcg
+   gctag
+>>>
+
+
+
+ +
+
+t4(nucleotides=None) Dseq
+

Fill in five prime protruding ends and chew back +three prime protruding ends with a DNA polymerase providing both +5’-3’ DNA polymerase activity and 3’-5’ nuclease activity +(such as T4 DNA polymerase). This can be done in the presence of any +combination of the four nucleotides A, G, C or T. Removing one or more nucleotides +can facilitate engineering of sticky ends. The default is all four nucleotides together.

+
+
Parameters:
+

nucleotides (str)

+
+
+

Examples

+
>>> from pydna.dseq import Dseq
+>>> a=Dseq("gatcgatc")
+>>> a
+Dseq(-8)
+gatcgatc
+ctagctag
+>>> a.T4()
+Dseq(-8)
+gatcgatc
+ctagctag
+>>> a.T4("t")
+Dseq(-8)
+gatcgat
+ tagctag
+>>> a.T4("a")
+Dseq(-8)
+gatcga
+  agctag
+>>> a.T4("g")
+Dseq(-8)
+gatcg
+   gctag
+>>>
+
+
+
+ +
+
+exo1_front(n=1) DseqType[source]
+

5’-3’ resection at the start (left side) of the molecule.

+
+ +
+
+exo1_end(n=1) DseqType[source]
+

5’-3’ resection at the end (right side) of the molecule.

+
+ +
+
+no_cutters(batch: RestrictionBatch | None = None) RestrictionBatch[source]
+

Enzymes in a RestrictionBatch not cutting sequence.

+
+ +
+
+unique_cutters(batch: RestrictionBatch | None = None) RestrictionBatch[source]
+

Enzymes in a RestrictionBatch cutting sequence once.

+
+ +
+
+once_cutters(batch: RestrictionBatch | None = None) RestrictionBatch
+

Enzymes in a RestrictionBatch cutting sequence once.

+
+ +
+
+twice_cutters(batch: RestrictionBatch | None = None) RestrictionBatch[source]
+

Enzymes in a RestrictionBatch cutting sequence twice.

+
+ +
+
+n_cutters(n=3, batch: RestrictionBatch | None = None) RestrictionBatch[source]
+

Enzymes in a RestrictionBatch cutting n times.

+
+ +
+
+cutters(batch: RestrictionBatch | None = None) RestrictionBatch[source]
+

Enzymes in a RestrictionBatch cutting sequence at least once.

+
+ +
+
+seguid() str[source]
+

SEGUID checksum for the sequence.

+
+ +
+
+isblunt() bool[source]
+

isblunt.

+

Return True if Dseq is linear and blunt and +False if staggered or circular.

+

Examples

+
>>> from pydna.dseq import Dseq
+>>> a=Dseq("gat")
+>>> a
+Dseq(-3)
+gat
+cta
+>>> a.isblunt()
+True
+>>> a=Dseq("gat", "atcg")
+>>> a
+Dseq(-4)
+ gat
+gcta
+>>> a.isblunt()
+False
+>>> a=Dseq("gat", "gatc")
+>>> a
+Dseq(-4)
+gat
+ctag
+>>> a.isblunt()
+False
+>>> a=Dseq("gat", circular=True)
+>>> a
+Dseq(o3)
+gat
+cta
+>>> a.isblunt()
+False
+
+
+
+ +
+
+cas9(RNA: str) Tuple[slice, ...][source]
+

docstring.

+
+ +
+
+terminal_transferase(nucleotides='a') Dseq[source]
+

docstring.

+
+ +
+
+cut(*enzymes: EnzymesType) Tuple[DseqType, ...][source]
+

Returns a tuple of linear Dseq fragments produced in the digestion. +If there are no cuts, an empty tuple is returned.

+
+
Parameters:
+

enzymes (enzyme object or iterable of such objects) – One or more Bio.Restriction.XXX restriction enzyme objects, or an iterable of them.

+
+
Returns:
+

frags – tuple of Dseq objects formed by the digestion

+
+
Return type:
+

tuple

+
+
+

Examples

+
>>> from pydna.dseq import Dseq
+>>> seq=Dseq("ggatccnnngaattc")
+>>> seq
+Dseq(-15)
+ggatccnnngaattc
+cctaggnnncttaag
+>>> from Bio.Restriction import BamHI,EcoRI
+>>> type(seq.cut(BamHI))
+<class 'tuple'>
+>>> for frag in seq.cut(BamHI): print(repr(frag))
+Dseq(-5)
+g
+cctag
+Dseq(-14)
+gatccnnngaattc
+    gnnncttaag
+>>> seq.cut(EcoRI, BamHI) ==  seq.cut(BamHI, EcoRI)
+True
+>>> a,b,c = seq.cut(EcoRI, BamHI)
+>>> a+b+c
+Dseq(-15)
+ggatccnnngaattc
+cctaggnnncttaag
+>>>
+
+
+
+ +
+
+cutsite_is_valid(cutsite: Tuple[Tuple[int, int], _AbstractCut | None]) bool[source]
+

Returns False if: +- Cut positions fall outside the sequence (could be moved to Biopython) +- Overhang is not double stranded +- Recognition site is not double stranded or is outside the sequence +- For enzymes that cut twice, it checks that at least one possibility is valid

+
+ +
+
+get_cutsites(*enzymes: EnzymesType) List[Tuple[Tuple[int, int], _AbstractCut | None]][source]
+

Returns a list of cutsites, represented as ((cut_watson, ovhg), enz):

+
    +
  • cut_watson is a positive integer contained in [0,len(seq)), where seq is the sequence +that will be cut. It represents the position of the cut on the watson strand, using the full +sequence as a reference. By “full sequence” I mean the one you would get from str(Dseq).

  • +
  • ovhg is the overhang left after the cut. It has the same meaning as ovhg in +the Bio.Restriction enzyme objects, or pydna’s Dseq property.

  • +
  • +
    enz is the enzyme object. It’s not necessary to perform the cut, but can be

    used to keep track of which enzyme was used.

    +
    +
    +
  • +
+

Cuts are only returned if the recognition site and overhang are on the double-strand +part of the sequence.

+
+
Parameters:
+

enzymes (Union[_RestrictionBatch,list[_AbstractCut]])

+
+
Return type:
+

list[tuple[tuple[int,int], _AbstractCut]]

+
+
+

Examples

+
>>> from Bio.Restriction import EcoRI
+>>> from pydna.dseq import Dseq
+>>> seq = Dseq('AAGAATTCAAGAATTC')
+>>> seq.get_cutsites(EcoRI)
+[((3, -4), EcoRI), ((11, -4), EcoRI)]
+
+
+

cut_watson is defined with respect to the “full sequence”, not the +watson strand:

+
>>> dseq = Dseq.from_full_sequence_and_overhangs('aaGAATTCaa', 1, 0)
+>>> dseq
+Dseq(-10)
+ aGAATTCaa
+ttCTTAAGtt
+>>> dseq.get_cutsites([EcoRI])
+[((3, -4), EcoRI)]
+
+
+

Cuts are only returned if the recognition site and overhang are on the double-strand +part of the sequence.

+
>>> Dseq('GAATTC').get_cutsites([EcoRI])
+[((1, -4), EcoRI)]
+>>> Dseq.from_full_sequence_and_overhangs('GAATTC', -1, 0).get_cutsites([EcoRI])
+[]
+
+
+
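
The cut_watson values in the examples can be reproduced for a blunt, linear sequence with a plain string search. The sketch below hard-codes what the outputs above imply for EcoRI (site GAATTC, cut one base into the site on the watson strand, ovhg of -4); ecori_cutsites is a hypothetical helper and it does not check that the site lies on the double-stranded part of the molecule.

```python
def ecori_cutsites(full_sequence):
    """Return (cut_watson, ovhg) pairs for EcoRI on a blunt linear sequence."""
    site, cut_offset, ovhg = "GAATTC", 1, -4
    upper = full_sequence.upper()  # recognition is case-insensitive here
    cuts = []
    index = upper.find(site)
    while index != -1:
        cuts.append((index + cut_offset, ovhg))
        index = upper.find(site, index + 1)
    return cuts


print(ecori_cutsites("AAGAATTCAAGAATTC"))  # [(3, -4), (11, -4)]
```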
+ +
+
+left_end_position() Tuple[int, int][source]
+

The index in the full sequence of the watson and crick start positions.

+

full sequence (str(self)) for all three cases is AAA

+
AAA              AA               AAA
+ TT             TTT               TTT
+Returns (0, 1)  Returns (1, 0)    Returns (0, 0)
+
+
+
+ +
+
+right_end_position() Tuple[int, int][source]
+

The index in the full sequence of the watson and crick end positions.

+

full sequence (str(self)) for all three cases is AAA

+

AAA               AA                   AAA
+TT                TTT                  TTT
+Returns (3, 2)    Returns (2, 3)       Returns (3, 3)

+
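
Both pairs of positions follow from the strand lengths and the stagger at the left end. The sketch below is a hypothetical helper, end_positions, that reproduces the cases drawn for left_end_position and right_end_position; it takes ovhg with the same sign convention as the Dseq constructor.

```python
def end_positions(watson, crick, ovhg):
    """Return ((watson_start, crick_start), (watson_end, crick_end))."""
    w_start = max(ovhg, 0)
    c_start = max(-ovhg, 0)
    left = (w_start, c_start)
    right = (w_start + len(watson), c_start + len(crick))
    return left, right


# Watson AAA over a crick strand of two bases shifted one to the right.
print(end_positions("AAA", "TT", -1))  # ((0, 1), (3, 3))
# Watson AAA over a crick strand of two bases aligned at the left.
print(end_positions("AAA", "TT", 0))   # ((0, 0), (3, 2))
```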
+ +
+
+get_cut_parameters(cut: Tuple[Tuple[int, int], _AbstractCut | None] | None, is_left: bool) Tuple[int, int, int][source]
+

For a given cut expressed as ((cut_watson, ovhg), enz), returns +a tuple (cut_watson, cut_crick, ovhg).

    -
  • Primer design

  • -
  • PCR

  • -
  • Restriction digestion

  • -
  • Ligation

  • -
  • Gel electrophoresis of DNA with generation of gel images

  • -
  • Homologous recombination

  • -
  • Gibson assembly

  • -
  • Golden gate assembly (in progress)

  • +
  • cut_watson: see get_cutsites docs

  • +
  • cut_crick: equivalent of cut_watson in the crick strand

  • +
  • ovhg: see get_cutsites docs

  • +
+

The cut can be None if it represents the left or right end of the sequence. +Then it will return the position of the watson and crick ends with respect +to the “full sequence”. The is_left parameter is only used in this case.

+
+ +
+
+apply_cut(left_cut: Tuple[Tuple[int, int], _AbstractCut | None], right_cut: Tuple[Tuple[int, int], _AbstractCut | None]) Dseq[source]
+

Extracts a subfragment of the sequence between two cuts.

+

For more detail see the documentation of get_cutsite_pairs.

+
+
Parameters:
+
+
+
Return type:
+

Dseq

+
+
+

Examples

+
>>> from Bio.Restriction import EcoRI
+>>> from pydna.dseq import Dseq
+>>> dseq = Dseq('aaGAATTCaaGAATTCaa')
+>>> cutsites = dseq.get_cutsites([EcoRI])
+>>> cutsites
+[((3, -4), EcoRI), ((11, -4), EcoRI)]
+>>> p1, p2, p3 = dseq.get_cutsite_pairs(cutsites)
+>>> p1
+(None, ((3, -4), EcoRI))
+>>> dseq.apply_cut(*p1)
+Dseq(-7)
+aaG
+ttCTTAA
+>>> p2
+(((3, -4), EcoRI), ((11, -4), EcoRI))
+>>> dseq.apply_cut(*p2)
+Dseq(-12)
+AATTCaaG
+    GttCTTAA
+>>> p3
+(((11, -4), EcoRI), None)
+>>> dseq.apply_cut(*p3)
+Dseq(-7)
+AATTCaa
+    Gtt
+
+
+
>>> dseq = Dseq('TTCaaGAA', circular=True)
+>>> cutsites = dseq.get_cutsites([EcoRI])
+>>> cutsites
+[((6, -4), EcoRI)]
+>>> pair = dseq.get_cutsite_pairs(cutsites)[0]
+>>> pair
+(((6, -4), EcoRI), ((6, -4), EcoRI))
+>>> dseq.apply_cut(*pair)
+Dseq(-12)
+AATTCaaG
+    GttCTTAA
+
+
+
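
The linear examples above are consistent with the crick cut position being cut_watson - ovhg. Under that assumption, and for a blunt linear parent sequence only, the fragments can be sketched with string slices. The helper fragment_between is hypothetical; it takes bare (cut_watson, ovhg) tuples without the enzyme and returns the two display lines instead of a Dseq object.

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def fragment_between(full, left, right):
    """Display lines of the fragment between two cuts of a blunt linear sequence.

    left and right are (cut_watson, ovhg) tuples, or None for a sequence end.
    """
    n = len(full)
    # Assumption: the crick strand is cut at cut_watson - ovhg.
    lw, lc = (0, 0) if left is None else (left[0], left[0] - left[1])
    rw, rc = (n, n) if right is None else (right[0], right[0] - right[1])
    start = min(lw, lc)
    top = " " * (lw - start) + full[lw:rw]
    bottom = " " * (lc - start) + full[lc:rc].translate(COMPLEMENT)
    return top, bottom


full = "aaGAATTCaaGAATTCaa"
for left, right in [(None, (3, -4)), ((3, -4), (11, -4)), ((11, -4), None)]:
    top, bottom = fragment_between(full, left, right)
    print(top)
    print(bottom)
```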
+ +
+
+get_cutsite_pairs(cutsites: List[Tuple[Tuple[int, int], _AbstractCut | None]]) List[Tuple[None | Tuple[Tuple[int, int], _AbstractCut | None], None | Tuple[Tuple[int, int], _AbstractCut | None]]][source]
+

Returns pairs of cutsites that render the edges of the resulting fragments.

+

A fragment produced by restriction is represented by a tuple of length 2 that +may contain cutsites or None:

+
+
    +
  • Two cutsites: represents the extraction of a fragment between those two +cutsites, in that orientation. To represent the opening of a circular +molecule with a single cutsite, we put the same cutsite twice.

  • +
  • None, cutsite: represents the extraction of a fragment between the left +edge of linear sequence and the cutsite.

  • +
  • cutsite, None: represents the extraction of a fragment between the cutsite +and the right edge of a linear sequence.

  • +
+
+
+
Parameters:
+

cutsites (list[tuple[tuple[int,int], _AbstractCut]])

+
+
Return type:
+

list[tuple[tuple[tuple[int,int], _AbstractCut]|None],tuple[tuple[int,int], _AbstractCut]|None]

+
+
+

Examples

+
>>> from Bio.Restriction import EcoRI
+>>> from pydna.dseq import Dseq
+>>> dseq = Dseq('aaGAATTCaaGAATTCaa')
+>>> cutsites = dseq.get_cutsites([EcoRI])
+>>> cutsites
+[((3, -4), EcoRI), ((11, -4), EcoRI)]
+>>> dseq.get_cutsite_pairs(cutsites)
+[(None, ((3, -4), EcoRI)), (((3, -4), EcoRI), ((11, -4), EcoRI)), (((11, -4), EcoRI), None)]
+
+
+
>>> dseq = Dseq('TTCaaGAA', circular=True)
+>>> cutsites = dseq.get_cutsites([EcoRI])
+>>> cutsites
+[((6, -4), EcoRI)]
+>>> dseq.get_cutsite_pairs(cutsites)
+[(((6, -4), EcoRI), ((6, -4), EcoRI))]
+
+
+
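
The pairing rule described above can be sketched in a few lines. The helper cutsite_pairs is hypothetical and takes the topology as an explicit argument; returning an empty list when there are no cutsites is an assumption of this sketch, not documented behaviour.

```python
def cutsite_pairs(cutsites, circular):
    """Pair consecutive cutsites into (left, right) fragment edges."""
    cutsites = list(cutsites)
    if not cutsites:
        return []  # assumption: no cuts, no fragments
    if circular:
        # Each cutsite is paired with the next one, wrapping around, so a
        # single cutsite is paired with itself.
        return list(zip(cutsites, cutsites[1:] + cutsites[:1]))
    # Linear: None stands for the left and right edges of the sequence.
    padded = [None] + cutsites + [None]
    return list(zip(padded, padded[1:]))


c1 = ((3, -4), "EcoRI")   # strings stand in for the enzyme objects
c2 = ((11, -4), "EcoRI")
print(cutsite_pairs([c1, c2], circular=False))
print(cutsite_pairs([((6, -4), "EcoRI")], circular=True))
```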
+ +
+ +
+
+

pydna.dseqrecord module

+

This module provides the Dseqrecord class, for handling double stranded +DNA sequences. The Dseqrecord holds sequence information in the form of a pydna.dseq.Dseq +object. The Dseq and Dseqrecord classes are subclasses of Biopython’s +Seq and SeqRecord classes, respectively.

+

The Dseq and Dseqrecord classes support the notion of circular and linear DNA topology.

+
+
+class pydna.dseqrecord.Dseqrecord(record, *args, circular=None, n=5e-14, **kwargs)[source]
+

Bases: SeqRecord

+

Dseqrecord is a double stranded version of the Biopython SeqRecord [3] class. +The Dseqrecord object holds a Dseq object describing the sequence. +Additionally, Dseqrecord holds meta information about the sequence in the +form of a list of SeqFeatures, in the same way as the SeqRecord does.

+

The Dseqrecord can be initialized with a string, Seq, Dseq, SeqRecord +or another Dseqrecord. The sequence information will be stored in a +Dseq object in all cases.

+

Dseqrecord objects can be read or parsed from sequences in FASTA, EMBL or Genbank formats. +See the pydna.readers and pydna.parsers modules for further information.

+

There is a short representation associated with the Dseqrecord. +Dseqrecord(-3) represents a linear sequence of length 3 +while Dseqrecord(o7) +represents a circular sequence of length 7.

+

Dseqrecord and Dseq share the same concept of length. This length can be larger +than each strand alone if they are staggered as in the example below.

+
<-- length -->
+GATCCTTT
+     AAAGCCTAG
+
+
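The length of a staggered duplex can be worked out from the two strand lengths and their relative offset. The sketch below is an assumption-laden illustration: the function `duplex_length` and its `offset` parameter (how many columns the lower strand is shifted to the right of the upper strand) are hypothetical and do not claim to match pydna's internal overhang convention.

```python
def duplex_length(top, bottom, offset):
    """Hypothetical helper: total span of two staggered strands.

    offset > 0: the bottom strand starts `offset` columns to the right.
    offset < 0: the top strand starts `-offset` columns to the right.
    """
    top_start = max(0, -offset)
    bottom_start = max(0, offset)
    return max(top_start + len(top), bottom_start + len(bottom))


# The staggered example above: the lower strand starts five columns in.
print(duplex_length("GATCCTTT", "AAAGCCTAG", 5))
# A blunt duplex has the same length as each strand.
print(duplex_length("aaa", "ttt", 0))
```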
+
+
Parameters:
+
    +
  • record (string, Seq, SeqRecord, Dseq or other Dseqrecord object) – This data will be used to form the seq property

  • +
  • circular (bool, optional) – True or False reflecting the shape of the DNA molecule

  • +
  • linear (bool, optional) – True or False reflecting the shape of the DNA molecule

  • +
+
+
+

Examples

+
>>> from pydna.dseqrecord import Dseqrecord
+>>> a=Dseqrecord("aaa")
+>>> a
+Dseqrecord(-3)
+>>> a.seq
+Dseq(-3)
+aaa
+ttt
+>>> from pydna.seq import Seq
+>>> b=Dseqrecord(Seq("aaa"))
+>>> b
+Dseqrecord(-3)
+>>> b.seq
+Dseq(-3)
+aaa
+ttt
+>>> from Bio.SeqRecord import SeqRecord
+>>> c=Dseqrecord(SeqRecord(Seq("aaa")))
+>>> c
+Dseqrecord(-3)
+>>> c.seq
+Dseq(-3)
+aaa
+ttt
+
+
+

References

+ +
+
+classmethod from_string(record: str = '', *args, circular=False, n=5e-14, **kwargs)[source]
+

docstring.

+
+ +
+
+classmethod from_SeqRecord(record: SeqRecord, *args, circular=None, n=5e-14, **kwargs)[source]
+
+ +
+
+property circular
+

The circular property can not be set directly. +Use looped()

+
+ +
+
+m()[source]
+

This method returns the mass of the DNA molecule in grams. This is +calculated as the product between the molecular weight of the Dseq object +and the

+
+ +
+
+extract_feature(n)[source]
+

Extracts a feature and creates a new Dseqrecord object.

+
+
Parameters:
+

n (int) – Indicates the feature to extract

+
+
+

Examples

+
>>> from pydna.dseqrecord import Dseqrecord
+>>> a=Dseqrecord("atgtaa")
+>>> a.add_feature(2,4)
+>>> b=a.extract_feature(0)
+>>> b
+Dseqrecord(-2)
+>>> b.seq
+Dseq(-2)
+gt
+ca
+
+
+
+ +
+
+add_feature(x=None, y=None, seq=None, type_='misc', strand=1, *args, **kwargs)[source]
+

Add a feature of type misc to the feature list of the sequence.

+
+
Parameters:
+
    +
  • x (int) – Indicates start of the feature

  • +
  • y (int) – Indicates end of the feature

  • +
+
+
+

Examples

+
>>> from pydna.seqrecord import SeqRecord
+>>> a=SeqRecord("atgtaa")
+>>> a.features
+[]
+>>> a.add_feature(2,4)
+>>> a.features
+[SeqFeature(SimpleLocation(ExactPosition(2), ExactPosition(4), strand=1), type='misc', qualifiers=...)]
+
+
+
+ +
+
+seguid()[source]
+

Url safe SEGUID for the sequence.

+

This checksum is the same as seguid but with base64.urlsafe +encoding instead of the normal base64. This means that +the characters + and / are replaced with - and _ so that +the checksum can be part of a URL.
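The difference between the two base64 alphabets can be shown with the standard library alone. This sketch hashes an arbitrary byte string; it does not reproduce how pydna prepares the sequence before hashing, so the printed values are not SEGUID checksums.

```python
import base64
import hashlib

# Arbitrary input for illustration; not the real SEGUID preprocessing.
digest = hashlib.sha1(b"AA").digest()

standard = base64.b64encode(digest).decode("ascii")
url_safe = base64.urlsafe_b64encode(digest).decode("ascii")

# The URL-safe alphabet only swaps '+' for '-' and '/' for '_'.
assert url_safe == standard.replace("+", "-").replace("/", "_")
print(standard)
print(url_safe)
```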

+

Examples

+
>>> from pydna.dseqrecord import Dseqrecord
+>>> a = Dseqrecord("aa")
+>>> a.seguid()
+'ldseguid=TEwydy0ugvGXh3VJnVwgtxoyDQA'
+
+
+
+ +
+
+looped()[source]
+

Circular version of the Dseqrecord object.

+

The underlying linear Dseq object has to have compatible ends.

+

Examples

+
>>> from pydna.dseqrecord import Dseqrecord
+>>> a=Dseqrecord("aaa")
+>>> a
+Dseqrecord(-3)
+>>> b=a.looped()
+>>> b
+Dseqrecord(o3)
+>>>
+
+
+ +
+ +
+
+tolinear()[source]
+

Returns a linear, blunt copy of a circular Dseqrecord object. The +underlying Dseq object has to be circular.

+

This method is deprecated, use slicing instead. See example below.

+

Examples

+
>>> from pydna.dseqrecord import Dseqrecord
+>>> a=Dseqrecord("aaa", circular = True)
+>>> a
+Dseqrecord(o3)
+>>> b=a[:]
+>>> b
+Dseqrecord(-3)
+>>>
+
+
+
+ +
+
+terminal_transferase(nucleotides='a')[source]
+

docstring.

+
+ +
+
+format(f='gb')[source]
+

Returns the sequence as a string using a format supported by Biopython +SeqIO [4]. Default is “gb” which is short for Genbank.

+

Examples

+
>>> from pydna.dseqrecord import Dseqrecord
+>>> x=Dseqrecord("aaa")
+>>> x.annotations['date'] = '02-FEB-2013'
+>>> x
+Dseqrecord(-3)
+>>> print(x.format("gb"))
+LOCUS       name                       3 bp    DNA     linear   UNK 02-FEB-2013
+DEFINITION  description.
+ACCESSION   id
+VERSION     id
+KEYWORDS    .
+SOURCE      .
+  ORGANISM  .
+            .
+FEATURES             Location/Qualifiers
+ORIGIN
+        1 aaa
+//
+
+
+

References

+ +
+ +
+
+write(filename=None, f='gb')[source]
+

Writes the Dseqrecord to a file using the format f, which must +be a format supported by Biopython SeqIO for writing [5]. Default +is “gb” which is short for Genbank. Note that Biopython SeqIO reads +more formats than it writes.

+

Filename is the path to the file where the sequence is to be +written. The filename is optional; if it is not given, the +description property (string) is used together with the format.

+

If obj is the Dseqrecord object, the default file name will be:

+

<obj.locus>.<f>

+

Where <f> is “gb” by default. If the filename already exists +and the sequence it contains is different, a new file name will be +used so that the old file is not lost:

+

<obj.locus>_NEW.<f>

+
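The file naming rule can be sketched as follows. This is a simplified illustration under stated assumptions: the helper `choose_filename` is hypothetical, and it compares raw file text, whereas the description above says the comparison is made on the sequence the file contains.

```python
import tempfile
from pathlib import Path


def choose_filename(locus, fmt, new_text, directory):
    """Hypothetical helper: <locus>.<fmt>, or <locus>_NEW.<fmt> on conflict."""
    path = Path(directory) / f"{locus}.{fmt}"
    if path.exists() and path.read_text() != new_text:
        # A file with different content already exists: leave it untouched.
        path = path.with_name(f"{locus}_NEW.{fmt}")
    return path


with tempfile.TemporaryDirectory() as tmp:
    first = choose_filename("name", "gb", "aaa", tmp)
    first.write_text("aaa")
    same = choose_filename("name", "gb", "aaa", tmp)      # identical content
    changed = choose_filename("name", "gb", "ccc", tmp)   # different content
    print(first.name, same.name, changed.name)
```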

References

+ +
+ +
+
+find(other)[source]
+
+ +
+
+find_aminoacids(other)[source]
+
>>> from pydna.dseqrecord import Dseqrecord
+>>> s=Dseqrecord("atgtacgatcgtatgctggttatattttag")
+>>> s.seq.translate()
+Seq('MYDRMLVIF*')
+>>> "RML" in s
+True
+>>> "MMM" in s
+False
+>>> s.seq.rc().translate()
+Seq('LKYNQHTIVH')
+>>> "QHT" in s.rc()
+True
+>>> "QHT" in s
+False
+>>> slc = s.find_aa("RML")
+>>> slc
+slice(9, 18, None)
+>>> s[slc]
+Dseqrecord(-9)
+>>> code = s[slc].seq
+>>> code
+Dseq(-9)
+cgtatgctg
+gcatacgac
+>>> code.translate()
+Seq('RML')
+
+
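The slice returned in the example follows from codon arithmetic: amino acid number i of a protein translated in frame 0 is encoded by nucleotides 3i to 3i+3. A minimal sketch, assuming frame 0 and an already translated protein string (the helper name `aa_slice` is hypothetical and is not pydna's method):

```python
def aa_slice(protein, motif):
    """Hypothetical helper: nucleotide slice encoding `motif` (frame 0)."""
    i = protein.find(motif)
    if i == -1:
        return None
    return slice(3 * i, 3 * (i + len(motif)))


dna = "atgtacgatcgtatgctggttatattttag"
protein = "MYDRMLVIF*"  # translation of dna, as shown in the example above
slc = aa_slice(protein, "RML")
print(slc, dna[slc])
```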
+
+ +
+
+find_aa(other)
+
>>> from pydna.dseqrecord import Dseqrecord
+>>> s=Dseqrecord("atgtacgatcgtatgctggttatattttag")
+>>> s.seq.translate()
+Seq('MYDRMLVIF*')
+>>> "RML" in s
+True
+>>> "MMM" in s
+False
+>>> s.seq.rc().translate()
+Seq('LKYNQHTIVH')
+>>> "QHT" in s.rc()
+True
+>>> "QHT" in s
+False
+>>> slc = s.find_aa("RML")
+>>> slc
+slice(9, 18, None)
+>>> s[slc]
+Dseqrecord(-9)
+>>> code = s[slc].seq
+>>> code
+Dseq(-9)
+cgtatgctg
+gcatacgac
+>>> code.translate()
+Seq('RML')
+
+
+
+ +
+
+map_trace_files(pth, limit=25)[source]
+
+ +
+
+linearize(*enzymes)[source]
+

Similar to cut().

+

Throws an exception if there is not exactly one cut, +i.e. if there are no digestion products or more than one.

+
+ +
+
+no_cutters(batch: RestrictionBatch | None = None)[source]
+

docstring.

+
+ +
+
+unique_cutters(batch: RestrictionBatch | None = None)[source]
+

docstring.

+
+ +
+
+once_cutters(batch: RestrictionBatch | None = None)[source]
+

docstring.

+
+ +
+
+twice_cutters(batch: RestrictionBatch | None = None)[source]
+

docstring.

+
+ +
+
+n_cutters(n=3, batch: RestrictionBatch | None = None)[source]
+

docstring.

+
+ +
+
+cutters(batch: RestrictionBatch | None = None)[source]
+

docstring.

+
+ +
+
+number_of_cuts(*enzymes)[source]
+

The number of cuts by digestion with the Restriction enzymes +contained in the iterable.

+
+ +
+
+cas9(RNA: str)[source]
+

docstring.

+
+ +
+
+reverse_complement()[source]
+

Reverse complement.

+

Examples

+
>>> from pydna.dseqrecord import Dseqrecord
+>>> a=Dseqrecord("ggaatt")
+>>> a
+Dseqrecord(-6)
+>>> a.seq
+Dseq(-6)
+ggaatt
+ccttaa
+>>> a.reverse_complement().seq
+Dseq(-6)
+aattcc
+ttaagg
+>>>
+
+
+ +
+ +
+
+rc()
+

Reverse complement.

+

Examples

+
>>> from pydna.dseqrecord import Dseqrecord
+>>> a=Dseqrecord("ggaatt")
+>>> a
+Dseqrecord(-6)
+>>> a.seq
+Dseq(-6)
+ggaatt
+ccttaa
+>>> a.reverse_complement().seq
+Dseq(-6)
+aattcc
+ttaagg
+>>>
+
+
+ +
+ +
+
+synced(ref, limit=25)[source]
+

This method returns a new circular sequence (Dseqrecord object), which has been rotated +in such a way that there is maximum overlap between the sequence and +ref, which may be a string, Biopython Seq, SeqRecord object or +another Dseqrecord object.

+

The reason for using this could be to rotate a new recombinant plasmid so +that it starts at the same position after cloning. See the example below:

+

Examples

+
>>> from pydna.dseqrecord import Dseqrecord
+>>> a=Dseqrecord("gaat", circular=True)
+>>> a.seq
+Dseq(o4)
+gaat
+ctta
+>>> d = a[2:] + a[:2]
+>>> d.seq
+Dseq(-4)
+atga
+tact
+>>> insert=Dseqrecord("CCC")
+>>> recombinant = (d+insert).looped()
+>>> recombinant.seq
+Dseq(o7)
+atgaCCC
+tactGGG
+>>> recombinant.synced(a).seq
+Dseq(o7)
+gaCCCat
+ctGGGta
+
+
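The idea of rotating a circular sequence to line up with a reference can be sketched with plain strings. This is a deliberately simplified stand-in: it scores each rotation by the length of its common prefix with the reference, ignores the reverse strand and the limit parameter, and the names `common_prefix_len` and `sync_rotation` are hypothetical. The actual method may measure overlap differently.

```python
def common_prefix_len(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def sync_rotation(circular_seq, ref):
    """Hypothetical helper: rotation of circular_seq that best matches ref."""
    rotations = [circular_seq[i:] + circular_seq[:i] for i in range(len(circular_seq))]
    return max(rotations, key=lambda r: common_prefix_len(r.lower(), ref.lower()))


print(sync_rotation("atgaCCC", "gaat"))
```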
+
+ +
+
+upper()[source]
+

Returns an uppercase copy.

+
>>> from pydna.dseqrecord import Dseqrecord
+>>> my_seq = Dseqrecord("aAa")
+>>> my_seq.seq
+Dseq(-3)
+aAa
+tTt
+>>> upper = my_seq.upper()
+>>> upper.seq
+Dseq(-3)
+AAA
+TTT
+>>>

+
+
Returns:
+

Dseqrecord object in uppercase

+
+
Return type:
+

Dseqrecord

+
+
+ +
+ +
+
+lower()[source]
+
>>> from pydna.dseqrecord import Dseqrecord
+>>> my_seq = Dseqrecord("aAa")
+>>> my_seq.seq
+Dseq(-3)
+aAa
+tTt
+>>> upper = my_seq.upper()
+>>> upper.seq
+Dseq(-3)
+AAA
+TTT
+>>> lower = my_seq.lower()
+>>> lower
+Dseqrecord(-3)
+>>>
+
+
+
+
Returns:
+

Dseqrecord object in lowercase

+
+
Return type:
+

Dseqrecord

+
+
+ +
+ +
+
+orfs(minsize=300)[source]
+

docstring.

+
+ +
+
+orfs_to_features(minsize=300)[source]
+

docstring.

+
+ +
+
+copy_gb_to_clipboard()[source]
+

docstring.

+
+ +
+
+copy_fasta_to_clipboard()[source]
+

docstring.

+
+ +
+
+figure(feature=0, highlight='\x1b[48;5;11m', plain='\x1b[0m')[source]
+

docstring.

+
+ +
+
+shifted(shift)[source]
+

Circular Dseqrecord with a new origin <shift>.

+

This only works on circular Dseqrecords. If we consider the following +circular sequence:

+
+
GAAAT   <-- watson strand
+
CTTTA   <-- crick strand
+
+

The T and the G on the watson strand are linked together, as well +as the A and the C of the crick strand.

+

If shift is 1, this indicates a new origin at position 1:

+
+
+
new origin at the | symbol:
+

+
+
G|AAAT
+
C|TTTA
+
+

new sequence:

+
+
AAATG
+
TTTAC
+
+

Examples

+
>>> from pydna.dseqrecord import Dseqrecord
+>>> a=Dseqrecord("aaat",circular=True)
+>>> a
+Dseqrecord(o4)
+>>> a.seq
+Dseq(o4)
+aaat
+ttta
+>>> b=a.shifted(1)
+>>> b
+Dseqrecord(o4)
+>>> b.seq
+Dseq(o4)
+aata
+ttat
+
+
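On the watson strand alone, moving the origin is a string rotation. A minimal sketch on plain strings (the function below is an illustration and not pydna's implementation, which also has to carry the crick strand and any features along):

```python
def shifted(watson, shift):
    """Rotate a circular sequence so that position `shift` becomes the origin."""
    shift %= len(watson)
    return watson[shift:] + watson[:shift]


print(shifted("GAAAT", 1))  # the worked example above
print(shifted("aaat", 1))   # the doctest example above
```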
+
+ +
+
+cut(*enzymes)[source]
+

Digest a Dseqrecord object with one or more restriction enzymes.

+

Returns a list of linear Dseqrecords. If there are no cuts, an empty +list is returned.

+

See also Dseq.cut()

+
+
Parameters:
+

enzymes (enzyme object or iterable of such objects) – A Bio.Restriction.XXX restriction object or iterable of such.

+
+
Returns:
+

Dseqrecord_frags – list of Dseqrecord objects formed by the digestion

+
+
Return type:
+

list

+
+
+

Examples

+
>>> from pydna.dseqrecord import Dseqrecord
+>>> a=Dseqrecord("ggatcc")
+>>> from Bio.Restriction import BamHI
+>>> a.cut(BamHI)
+(Dseqrecord(-5), Dseqrecord(-5))
+>>> frag1, frag2 = a.cut(BamHI)
+>>> frag1.seq
+Dseq(-5)
+g
+cctag
+>>> frag2.seq
+Dseq(-5)
+gatcc
+    g
+
+
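The sticky ends in the example arise because BamHI cuts G^GATCC: one base into the site on the top strand and five bases into it on the bottom strand. The sketch below hard-codes that single enzyme and handles only the first site on a linear sequence; the function name `cut_bamhi_like` is hypothetical and the real method works on Dseqrecord objects through Bio.Restriction.

```python
def cut_bamhi_like(top):
    """Hypothetical helper: cut at the first ggatcc site, G^GATCC style.

    Returns a list of (top_strand, bottom_strand) pairs, with the bottom
    strand written left to right (3'->5') as in the Dseq display.
    """
    top = top.lower()
    bottom = top.translate(str.maketrans("acgt", "tgca"))
    i = top.find("ggatcc")
    if i == -1:
        return []  # no cut: empty list, as described above
    return [(top[: i + 1], bottom[: i + 5]), (top[i + 1 :], bottom[i + 5 :])]


for watson, crick in cut_bamhi_like("ggatcc"):
    print(watson, crick)
```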
+
+ +
+
+apply_cut(left_cut, right_cut)[source]
+
+ +
+ +
+
+

pydna.amplicon module

+

This module provides the Amplicon class for PCR simulation. +This class is not meant to be used directly but is +used by the amplify module.

+
+
+class pydna.amplicon.Amplicon(record, *args, template=None, forward_primer=None, reverse_primer=None, **kwargs)[source]
+

Bases: Dseqrecord

+

The Amplicon class holds information about a PCR reaction involving two +primers and one template. This class is used by the Anneal class and is not +meant to be instantiated directly.

+
+
Parameters:
+
    +
  • forward_primer (SeqRecord(Biopython)) – SeqRecord object holding the forward (sense) primer

  • +
  • reverse_primer (SeqRecord(Biopython)) – SeqRecord object holding the reverse (antisense) primer

  • +
  • template (Dseqrecord) – Dseqrecord object holding the template (circular or linear)

-

Virtually any sub-cloning experiment can be described in pydna, and its execution yield -the sequences of intermediate and final DNA molecules.

-

Pydna has been designed with the goal of being understandable for biologists with only some basic understanding of Python.

-

Pydna can formalize planning and sharing of cloning strategies and is especially useful for complex or combinatorial -DNA molecule constructions.

-
+
+
+
+
+classmethod from_SeqRecord(record, *args, path=None, **kwargs)[source]
+
+ +
+
+reverse_complement()[source]
+

Reverse complement.

+

Examples

+
>>> from pydna.dseqrecord import Dseqrecord
+>>> a=Dseqrecord("ggaatt")
+>>> a
+Dseqrecord(-6)
+>>> a.seq
+Dseq(-6)
+ggaatt
+ccttaa
+>>> a.reverse_complement().seq
+Dseq(-6)
+aattcc
+ttaagg
+>>>
+
+
+ +
+ +
+
+rc()
+

Reverse complement.

+

Examples

+
>>> from pydna.dseqrecord import Dseqrecord
+>>> a=Dseqrecord("ggaatt")
+>>> a
+Dseqrecord(-6)
+>>> a.seq
+Dseq(-6)
+ggaatt
+ccttaa
+>>> a.reverse_complement().seq
+Dseq(-6)
+aattcc
+ttaagg
+>>>
+
+
+ +
+ +
+
+figure()[source]
+

This method returns a simple figure of the two primers binding +to a part of the template.

+
5tacactcaccgtctatcattatc...cgactgtatcatctgatagcac3
+                           ||||||||||||||||||||||
+                          3gctgacatagtagactatcgtg5
+5tacactcaccgtctatcattatc3
+ |||||||||||||||||||||||
+3atgtgagtggcagatagtaatag...gctgacatagtagactatcgtg5
+
+
+
+
Returns:
+

figure – A string containing a text representation of the primers +annealing on the template (see example above).

+
+
Return type:
+

string

+
+
+
+ +
+
+set_forward_primer_footprint(length)[source]
+
+ +
+
+set_reverse_primer_footprint(length)[source]
+
+ +
+
+program()[source]
+
+ +
+
+dbd_program()[source]
+
+ +
+
+primers()[source]
+
+ +
+
+
+

pydna.amplify module

+

This module provides the Anneal class and the pcr() function +for PCR simulation. The pcr function is simpler to use, but expects only one +PCR product. The Anneal class should be used if more flexibility is required.

+

Primers with 5’ tails as well as inverse PCR on circular templates are handled +correctly.

+
+
+class pydna.amplify.Anneal(primers, template, limit=13, **kwargs)[source]
+

Bases: object

+

The Anneal class has the following important attributes:

+
+
+forward_primers
+

Description of forward_primers.

+
+
Type:
+

list

+
+
+
+ +
+
+reverse_primers
+

Description of reverse_primers.

+
+
Type:
+

list

+
+
+
+ +
+
+template
+

A copy of the template argument. Primer annealing sites have been +added as features that can be visualized in a sequence editor such as +ApE.

+
+
Type:
+

Dseqrecord

+
+
+
+ +
+
+limit
+

The limit of PCR primer annealing, default is 13 bp.

+
+
Type:
+

int, optional

+
+
+
+ +
+
+property products
+
+ +
+
+report()
+

Returns a short report describing whether and where the primers +anneal on the template.

+
+ +
+ +
+
+pydna.amplify.pcr(*args, **kwargs) Amplicon[source]
+

pcr is a convenience function for the Anneal class to simplify its +usage, especially from the command line. If more than one or no PCR +product is formed, a ValueError is raised.

+

args is any iterable of Dseqrecords or an iterable of iterables of +Dseqrecords. args will be greedily flattened.

+
+
Parameters:
+
    +
  • args (iterable containing sequence objects) – Several arguments are also accepted.

  • +
  • limit (int = 13, optional) – limit length of the annealing part of the primers.

  • +
+
+
+

Notes

+

Sequences in args can be of type:

+
    +
  • string

  • +
  • Seq

  • +
  • SeqRecord (or subclass)

  • +
  • Dseqrecord (or subclass)

  • +
+

The last sequence will be assumed to be the template while +all preceding sequences will be assumed to be primers.
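The argument handling described here (flatten greedily, then treat the last item as the template) can be sketched independently of pydna. The helper names `flatten` and `split_primers_template` are hypothetical, and plain strings stand in for sequence objects.

```python
def flatten(args):
    """Recursively flatten nested lists and tuples into one flat list."""
    flat = []
    for item in args:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def split_primers_template(*args):
    """Hypothetical helper: the last sequence is the template, the rest are primers."""
    *primers, template = flatten(args)
    return primers, template


print(split_primers_template("p1", "p2", "template"))
print(split_primers_template(["p1", "p2"], "template"))
print(split_primers_template(("p1", "p2"), "template"))
```

All three call styles give the same split, which is why the forms in the doctest for this function are interchangeable.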

+

This is a powerful function, use with care!

+
+
Returns:
+

product – A pydna.amplicon.Amplicon object representing the PCR +product. The direction of the PCR product will be the same as for +the template sequence.

+
+
Return type:
+

Amplicon

+
+
+

Examples

+
>>> from pydna.dseqrecord import Dseqrecord
+>>> from pydna.readers import read
+>>> from pydna.amplify import pcr
+>>> from pydna.primer import Primer
+>>> template = Dseqrecord("tacactcaccgtctatcattatctactatcgactgtatcatctgatagcac")
+>>> from Bio.SeqRecord import SeqRecord
+>>> p1 = Primer("tacactcaccgtctatcattatc")
+>>> p2 = Primer("cgactgtatcatctgatagcac").reverse_complement()
+>>> pcr(p1, p2, template)
+Amplicon(51)
+>>> pcr([p1, p2], template)
+Amplicon(51)
+>>> pcr((p1,p2,), template)
+Amplicon(51)
+>>>
+
+
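What a PCR simulation has to compute can be sketched on plain strings: locate where the 3' end of each primer anneals, then join the forward primer, the template between the two sites, and the reverse complement of the reverse primer. This is a simplified illustration, not pydna's algorithm: the function names are hypothetical, only exact matches of the last `limit` bases are considered, and circular templates are not handled.

```python
def revcomp(seq):
    """Reverse complement, preserving case."""
    return seq.translate(str.maketrans("acgtACGT", "tgcaTGCA"))[::-1]


def simple_pcr(fwd, rev, template, limit=13):
    """Hypothetical helper: product of one forward and one reverse primer."""
    t = template.lower()
    fwd_site = fwd[-limit:].lower()           # 3' end of the forward primer
    rev_site = revcomp(rev[-limit:]).lower()  # where the reverse primer anneals
    if t.count(fwd_site) != 1 or t.count(rev_site) != 1:
        raise ValueError("expected exactly one annealing site per primer")
    start = t.find(fwd_site) + len(fwd_site)
    end = t.find(rev_site)
    if end < start:
        raise ValueError("primers do not face each other on this template")
    # 5' tails are carried into the product through the primers themselves.
    return fwd + template[start:end] + revcomp(rev)


template = "tacactcaccgtctatcattatctactatcgactgtatcatctgatagcac"
p1 = "tacactcaccgtctatcattatc"
p2 = revcomp("cgactgtatcatctgatagcac")
print(len(simple_pcr(p1, p2, template)))
print(simple_pcr("GGATCC" + p1, "GGATCC" + p2, template))
```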
+
+ +
+
+

pydna.assembly module

+

Assembly of sequences by homologous recombination.

+

Should also be useful for related techniques such as Gibson assembly and fusion +PCR. Given a list of sequences (Dseqrecords), all sequences are analyzed for +shared homology longer than the set limit.

+

A graph is constructed where each overlapping region forms a node and +sequences separating the overlapping regions form edges.

+
            -- A --
+catgatctacgtatcgtgt     -- B --
+            atcgtgtactgtcatattc
+                        catattcaaagttct
+
+
+
+--x--> A --y--> B --z-->   (Graph)
+
+Nodes:
+
+A : atcgtgt
+B : catattc
+
+Edges:
+
+x : catgatctacgt
+y : actgt
+z : aaagttct
+
+
+

The NetworkX package is used to trace linear and circular paths through the +graph.
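For a simple linear chain of fragments, the nodes and edges in the diagram can be derived without a graph library. The sketch below is a minimal illustration, assuming each fragment overlaps only with the start of the next one; the helper names are hypothetical, and the real module searches all fragment pairs and orientations and traces paths with NetworkX.

```python
def overlap(left, right, limit):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for n in range(min(len(left), len(right)), limit - 1, -1):
        if left[-n:] == right[:n]:
            return n
    return 0


def linear_graph(frags, limit):
    """Hypothetical helper: (nodes, edges) for a non-empty chain of fragments."""
    nodes, edges = [], []
    start = 0  # where the next edge begins in the current fragment
    for left, right in zip(frags, frags[1:]):
        n = overlap(left, right, limit)
        if n == 0:
            raise ValueError("adjacent fragments share no overlap >= limit")
        edges.append(left[start : len(left) - n])
        nodes.append(right[:n])
        start = n
    edges.append(frags[-1][start:])
    return nodes, edges


frags = ["catgatctacgtatcgtgt", "atcgtgtactgtcatattc", "catattcaaagttct"]
nodes, edges = linear_graph(frags, limit=7)
print(nodes)
print(edges)
```

Interleaving the edges and nodes (x, A, y, B, z) gives back the assembled linear sequence.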

+
+
+class pydna.assembly.Assembly(frags: List[Dseqrecord], limit: int = 25, algorithm: Callable[[str, str, int], List[Tuple[int, int, int]]] = common_sub_strings)[source]
+

Bases: object

+

Assembly of a list of linear DNA fragments into linear or circular +constructs. The Assembly is meant to replace the Assembly method as it +is easier to use. Accepts a list of Dseqrecords (source fragments) to +initiate an Assembly object. Several methods are available for analysis +of overlapping sequences, graph construction and assembly.

+
+
Parameters:
+
    +
  • fragments (list) – a list of Dseqrecord objects.

  • +
  • limit (int, optional) – The shortest shared homology to be considered

  • +
  • algorithm (function, optional) – The algorithm used to determine the shared sequences.

  • +
  • max_nodes (int) – The maximum number of nodes in the graph. This can be tweaked to +manage sequences with a high number of shared sub sequences.

  • +
+
+
+

Examples

+
>>> from pydna.assembly import Assembly
+>>> from pydna.dseqrecord import Dseqrecord
+>>> a = Dseqrecord("acgatgctatactgCCCCCtgtgctgtgctcta")
+>>> b = Dseqrecord("tgtgctgtgctctaTTTTTtattctggctgtatc")
+>>> c = Dseqrecord("tattctggctgtatcGGGGGtacgatgctatactg")
+>>> x = Assembly((a,b,c), limit=14)
+>>> x
+Assembly
+fragments....: 33bp 34bp 35bp
+limit(bp)....: 14
+G.nodes......: 6
+algorithm....: common_sub_strings
+>>> x.assemble_circular()
+[Contig(o59), Contig(o59)]
+>>> x.assemble_circular()[0].seq.watson
+'acgatgctatactgCCCCCtgtgctgtgctctaTTTTTtattctggctgtatcGGGGGt'
+
+
+
+
+assemble_linear(**kwargs)
+
+ +
+
+assemble_circular(**kwargs)
+
+ +
+ +
+
+

pydna.common_sub_strings module

+

This module is based on the Py-rstr-max package that +was written by Romain Brixtel (rbrixtel_at_gmail_dot_com) +(https://brixtel.users.greyc.fr) and is available from +https://code.google.com/p/py-rstr-max and +https://github.com/gip0/py-rstr-max. +The original code was covered by an MIT licence.

+
+
+pydna.common_sub_strings.common_sub_strings(stringx: str, stringy: str, limit: int = 25) List[Tuple[int, int, int]][source]
+

Finds all common substrings between stringx and stringy, and returns +them sorted by length.

+

This function is case sensitive.
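The shape of the result can be illustrated with a brute-force dynamic programming search. This is not the suffix-array algorithm the module is based on; the function name `common_substrings` is hypothetical, it reports only maximal matches, and sorting longest first is an assumption.

```python
def common_substrings(x, y, limit=25):
    """Hypothetical helper: maximal common substrings as (startx, starty, length)."""
    hits = []
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the match ending one step back
                extends = i < len(x) and j < len(y) and x[i] == y[j]
                if cur[j] >= limit and not extends:
                    hits.append((i - cur[j], j - cur[j], cur[j]))
        prev = cur
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


print(common_substrings("xxabcdefyy", "zzabcdefww", limit=4))
print(common_substrings("ABCDEF", "abcdef", limit=4))  # case sensitive
```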

+
+
Parameters:
+
    +
  • stringx (str)

  • +
  • stringy (str)

  • +
  • limit (int, optional)

  • +
+
+
Returns:
+

[(startx1, starty1, length1),(startx2, starty2, length2), …]

+

startx1 = start position in x where substring 1 starts, +starty1 = start position in y where substring 1 starts, +length1 = length of substring 1

+

+
+
Return type:
+

list of tuple

+
+
+
+ +
+
+pydna.common_sub_strings.terminal_overlap(stringx: str, stringy: str, limit: int = 15) List[Tuple[int, int, int]][source]
+

Finds the flanking common substrings between stringx and stringy +longer than limit. This means that the results only contain substrings +that start or end at the ends of stringx and stringy.

+

This function is case sensitive.

+

Returns a list of tuples describing the substrings. +The list is sorted longest -> shortest.

+
+
Parameters:
+
    +
  • stringx (str)

  • +
  • stringy (str)

  • +
  • limit (int, optional)

  • +
+
+
Returns:
+

[(startx1,starty1,length1),(startx2,starty2,length2), …]

+

startx1 = start position in x where substring 1 starts, +starty1 = start position in y where substring 1 starts, +length1 = length of substring 1

+

+
+
Return type:
+

list of tuple

+
+
+

Examples

+
>>> from pydna.common_sub_strings import terminal_overlap
+>>> terminal_overlap("agctatgtatcttgcatcgta", "gcatcgtagtctatttgcttac", limit=8)
+[(13, 0, 8)]
+
+
+
             <-- 8 ->
+<---- 13 --->
+agctatgtatcttgcatcgta                    stringx
+             gcatcgtagtctatttgcttac      stringy
+             0
+
+
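The example can be reproduced with a direct comparison of suffixes and prefixes. This is a sketch, not the module's implementation: the helper name `suffix_prefix_overlaps` is hypothetical, and treating a match of exactly `limit` characters as a hit is an assumption taken from the example (limit=8 returning an 8 bp overlap).

```python
def suffix_prefix_overlaps(x, y, limit=15):
    """Hypothetical helper: terminal overlaps as (startx, starty, length)."""
    hits = []
    for n in range(min(len(x), len(y)), limit - 1, -1):  # longest first
        if x[-n:] == y[:n]:
            hits.append((len(x) - n, 0, n))  # end of x matches start of y
        if y[-n:] == x[:n]:
            hits.append((0, len(y) - n, n))  # end of y matches start of x
    return hits


print(suffix_prefix_overlaps("agctatgtatcttgcatcgta", "gcatcgtagtctatttgcttac", limit=8))
```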
+
+ +
+
+

pydna.contig module

+
+
+class pydna.contig.Contig(record, *args, graph=None, nodemap=None, **kwargs)[source]
+

Bases: Dseqrecord

+

This class holds information about a DNA assembly. This class is instantiated by +the Assembly class and is not meant to be used directly.

+
+
+classmethod from_string(record: str = '', *args, graph=None, nodemap=None, **kwargs)[source]
+

docstring.

+
+ +
+
+classmethod from_SeqRecord(record, *args, graph=None, nodemap=None, **kwargs)[source]
+
+ +
+
+reverse_complement()[source]
+

Reverse complement.

+

Examples

+
>>> from pydna.dseqrecord import Dseqrecord
+>>> a=Dseqrecord("ggaatt")
+>>> a
+Dseqrecord(-6)
+>>> a.seq
+Dseq(-6)
+ggaatt
+ccttaa
+>>> a.reverse_complement().seq
+Dseq(-6)
+aattcc
+ttaagg
+>>>
+
+
+ +
+ +
+
+rc()
+

Reverse complement.

+

Examples

+
>>> from pydna.dseqrecord import Dseqrecord
+>>> a=Dseqrecord("ggaatt")
+>>> a
+Dseqrecord(-6)
+>>> a.seq
+Dseq(-6)
+ggaatt
+ccttaa
+>>> a.reverse_complement().seq
+Dseq(-6)
+aattcc
+ttaagg
+>>>
+
+
+ +
+ +
+
+detailed_figure()[source]
+

Returns a text representation of the assembled fragments.

+

Linear:

+
acgatgctatactgCCCCCtgtgctgtgctcta
+                   TGTGCTGTGCTCTA
+                   tgtgctgtgctctaTTTTTtattctggctgtatc
+
+
+

Circular:

+
||||||||||||||
+acgatgctatactgCCCCCtgtgctgtgctcta
+                   TGTGCTGTGCTCTA
+                   tgtgctgtgctctaTTTTTtattctggctgtatc
+                                      TATTCTGGCTGTATC
+                                      tattctggctgtatcGGGGGtacgatgctatactg
+                                                           ACGATGCTATACTG
+
+
+
+ +
+
+figure()[source]
+

Compact ascii representation of the assembled fragments.

+

Each fragment is represented by:

+
Size of common 5' substring|Name and size of DNA fragment|
+Size of common 5' substring
+
+
+

Linear:

+
frag20| 6
+       \\/
+       /\\
+        6|frag23| 6
+                 \\/
+                 /\\
+                  6|frag14
+
+
+

Circular:

+
 -|2577|61
+|       \\/
+|       /\\
+|       61|5681|98
+|               \\/
+|               /\\
+|               98|2389|557
+|                       \\/
+|                       /\\
+|                       557-
+|                          |
+ --------------------------
+
+
+
+ +
+ +
+
+

pydna.design module

+

This module contains functions for primer design for various purposes.

+
    +
  • primer_design() designs primers for a sequence, or a matching primer for an existing primer. It returns an Amplicon object (the same type the amplify module returns).

  • +
  • assembly_fragments() adds tails to primers for a linear assembly through homologous recombination or Gibson assembly.

  • +
  • circular_assembly_fragments() adds tails to primers for a circular assembly through homologous recombination or Gibson assembly.

  • +
+
+
+pydna.design.primer_design(template, fp=None, rp=None, limit=13, target_tm=55.0, tm_func=_tm_default, estimate_function=None, **kwargs)[source]
+

This function designs a forward primer and a reverse primer for PCR amplification +of a given template sequence.

+

The template argument is a Dseqrecord object or equivalent containing the template sequence.

+

The optional fp and rp arguments can contain an existing primer for the sequence (either the forward or reverse primer). +Only one of the two primers can be specified, not both, since there would then be nothing to design; use the pydna.amplify.pcr function instead.

+

The limit argument is the minimum length of the primer. The default value is 13.

+

If one of the primers is given, the other primer is designed to match it in terms of Tm. +If both primers are designed, they will be designed to reach target_tm.

+

tm_func is a function that takes an ascii string representing an oligonucleotide as argument and returns a float. +Some useful functions can be found in the pydna.tm module, but a custom function can be substituted.
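The role of limit, target_tm and tm_func can be illustrated by growing a primer from the start of the template until the melting temperature reaches the target. This is a sketch under stated assumptions: `wallace_tm` is the rough Wallace rule (2 °C per A/T, 4 °C per G/C) and is not pydna's default Tm function, and `design_forward` is a hypothetical helper, not the library's design routine.

```python
def wallace_tm(seq):
    """Wallace rule estimate: 2 degrees per A/T plus 4 degrees per G/C."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def design_forward(template, target_tm=55.0, limit=13, tm_func=wallace_tm):
    """Hypothetical helper: shortest 5' prefix (>= limit bases) reaching target_tm."""
    length = limit
    primer = template[:length]
    while tm_func(primer) < target_tm and length < len(template):
        length += 1
        primer = template[:length]
    return primer


t = "atgactgctaacccttccttggtgttgaacaagatcgacgacatttcgttcgaaacttacgatg"
fp = design_forward(t)
print(fp, wallace_tm(fp))
```

Because tm_func is passed in as a plain callable, any function that maps a string to a float can be swapped in, which is the design choice the paragraph above describes.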

+

estimate_function is a tm_func-like function that is used to get a first guess for the primer design, that is then used as starting +point for the final result. This is useful when the tm_func function is slow to calculate (e.g. it relies on an +external API, such as the NEB primer design API). The estimate_function should be faster than the tm_func function. +The default value is None. +To use the default tm_func as estimate function to get the NEB Tm faster, you can do: +primer_design(dseqr, target_tm=55, tm_func=tm_neb, estimate_function=tm_default).

+

The function returns a pydna.amplicon.Amplicon class instance. This object has +the object.forward_primer and object.reverse_primer properties which contain the designed primers.

+
+
Parameters:
+
    +
  • template (pydna.dseqrecord.Dseqrecord) – a Dseqrecord object. The only required argument.

  • +
  • fp (pydna.primer.Primer, optional) – optional pydna.primer.Primer objects containing one primer each.

  • +
  • rp (pydna.primer.Primer, optional) – optional pydna.primer.Primer objects containing one primer each.

  • +
  • target_tm (float, optional) – target tm for the primers, set to 55°C by default.

  • +
  • tm_func (function) – Function used for tm calculation. This function takes an ascii string +representing an oligonucleotide as argument and returns a float. +Some useful functions can be found in the pydna.tm module, but a custom +function can be substituted.

  • +
+
+
Returns:
+

result

+
+
Return type:
+

Amplicon

+
+
+

Examples

+
>>> from pydna.dseqrecord import Dseqrecord
+>>> t=Dseqrecord("atgactgctaacccttccttggtgttgaacaagatcgacgacatttcgttcgaaacttacgatg")
+>>> t
+Dseqrecord(-64)
+>>> from pydna.design import primer_design
+>>> ampl = primer_design(t)
+>>> ampl
+Amplicon(64)
+>>> ampl.forward_primer
+f64 17-mer:5'-atgactgctaacccttc-3'
+>>> ampl.reverse_primer
+r64 18-mer:5'-catcgtaagtttcgaacg-3'
+>>> print(ampl.figure())
+5atgactgctaacccttc...cgttcgaaacttacgatg3
+                     ||||||||||||||||||
+                    3gcaagctttgaatgctac5
+5atgactgctaacccttc3
+ |||||||||||||||||
+3tactgacgattgggaag...gcaagctttgaatgctac5
+>>> pf = "GGATCC" + ampl.forward_primer
+>>> pr = "GGATCC" + ampl.reverse_primer
+>>> pf
+f64 23-mer:5'-GGATCCatgactgct..ttc-3'
+>>> pr
+r64 24-mer:5'-GGATCCcatcgtaag..acg-3'
+>>> from pydna.amplify import pcr
+>>> pcr_prod = pcr(pf, pr, t)
+>>> print(pcr_prod.figure())
+      5atgactgctaacccttc...cgttcgaaacttacgatg3
+                           ||||||||||||||||||
+                          3gcaagctttgaatgctacCCTAGG5
+5GGATCCatgactgctaacccttc3
+       |||||||||||||||||
+      3tactgacgattgggaag...gcaagctttgaatgctac5
+>>> print(pcr_prod.seq)
+GGATCCatgactgctaacccttccttggtgttgaacaagatcgacgacatttcgttcgaaacttacgatgGGATCC
+>>> from pydna.primer import Primer
+>>> pf = Primer("atgactgctaacccttccttggtgttg", id="myprimer")
+>>> ampl = primer_design(t, fp = pf)
+>>> ampl.forward_primer
+myprimer 27-mer:5'-atgactgctaaccct..ttg-3'
+>>> ampl.reverse_primer
+r64 32-mer:5'-catcgtaagtttcga..atc-3'
+
+
+
+ +
+
+pydna.design.assembly_fragments(f, overlap=35, maxlink=40, circular=False)[source]
+

This function returns a list of pydna.amplicon.Amplicon objects where +primers have been modified with tails so that the fragments can be fused in +the order they appear in the list by, for example, Gibson assembly or homologous +recombination.

+

Given two linear pydna.amplicon.Amplicon objects a and b, +we can modify the reverse primer of a and the forward primer of b with tails to allow +fusion by fusion PCR, Gibson assembly or in-vivo homologous recombination. +The basic requirements for the primers for the three techniques are the same.

+
 _________ a _________
+/                     \
+agcctatcatcttggtctctgca
+                  |||||
+                 <gacgt
+agcct>
+|||||
+tcggatagtagaaccagagacgt
+
+                        __________ b ________
+                       /                     \
+                       TTTATATCGCATGACTCTTCTTT
+                                         |||||
+                                        <AGAAA
+                       TTTAT>
+                       |||||
+                       AAATATAGCGTACTGAGAAGAAA
+
+agcctatcatcttggtctctgcaTTTATATCGCATGACTCTTCTTT
+||||||||||||||||||||||||||||||||||||||||||||||
+tcggatagtagaaccagagacgtAAATATAGCGTACTGAGAAGAAA
+\___________________ c ______________________/
+
+
+

Design tailed primers incorporating a part of the next or previous fragment to be assembled.

+
agcctatcatcttggtctctgca
+|||||||||||||||||||||||
+                gagacgtAAATATA
+
+|||||||||||||||||||||||
+tcggatagtagaaccagagacgt
+
+                       TTTATATCGCATGACTCTTCTTT
+                       |||||||||||||||||||||||
+
+                ctctgcaTTTATAT
+                       |||||||||||||||||||||||
+                       AAATATAGCGTACTGAGAAGAAA
+
+
+

PCR products with flanking sequences are formed in the PCR process.

+
agcctatcatcttggtctctgcaTTTATAT
+||||||||||||||||||||||||||||||
+tcggatagtagaaccagagacgtAAATATA
+                \____________/
+
+                   identical
+                   sequences
+                 ____________
+                /            \
+                ctctgcaTTTATATCGCATGACTCTTCTTT
+                ||||||||||||||||||||||||||||||
+                gagacgtAAATATAGCGTACTGAGAAGAAA
+
+
+

The fragments can be fused by any of the techniques mentioned earlier to form c:

+
agcctatcatcttggtctctgcaTTTATATCGCATGACTCTTCTTT
+||||||||||||||||||||||||||||||||||||||||||||||
+tcggatagtagaaccagagacgtAAATATAGCGTACTGAGAAGAAA
+
+
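The effect of the tails on the two PCR products can be sketched with plain strings. The helper `junction_tails` and its `left_is_amplicon` flag are hypothetical, chosen for illustration; the sketch only shows how the shared sequence at the junction arises and does not design or validate real primers.

```python
def junction_tails(a, b, overlap, left_is_amplicon=True):
    """Hypothetical helper: products of a and b after adding junction tails.

    If both fragments are amplicons, each primer at the junction carries
    about half of the overlap. If the left fragment cannot be modified,
    the forward primer of b carries the whole overlap.
    """
    if left_is_amplicon:
        left_tail = overlap // 2             # added to a's reverse primer
        right_tail = overlap - overlap // 2  # added to b's forward primer
    else:
        left_tail = 0
        right_tail = overlap
    a_product = a + b[:left_tail]
    b_product = a[-right_tail:] + b
    return a_product, b_product


a = "agcctatcatcttggtctctgca"
b = "TTTATATCGCATGACTCTTCTTT"
a_prod, b_prod = junction_tails(a, b, overlap=14)
print(a_prod)
print(b_prod)
print(a_prod[-14:] == b_prod[:14])  # identical sequences at the junction
```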
+

The first argument of this function is a list of sequence objects containing +Amplicons and other similar objects.

+

At least every second sequence object needs to be an Amplicon.

+

This rule exists because if a sequence object that is not a PCR product +is to be fused with another fragment, that other fragment needs to be an Amplicon +so that the primer of the other object can be modified to include the whole stretch +of sequence homology needed for the fusion. See the example below where a is a +non-amplicon (a linear plasmid vector for instance).

+
 _________ a _________           __________ b ________
+/                     \         /                     \
+agcctatcatcttggtctctgca   <-->  TTTATATCGCATGACTCTTCTTT
+|||||||||||||||||||||||         |||||||||||||||||||||||
+tcggatagtagaaccagagacgt                          <AGAAA
+                                TTTAT>
+                                |||||||||||||||||||||||
+                          <-->  AAATATAGCGTACTGAGAAGAAA
+
+     agcctatcatcttggtctctgcaTTTATATCGCATGACTCTTCTTT
+     ||||||||||||||||||||||||||||||||||||||||||||||
+     tcggatagtagaaccagagacgtAAATATAGCGTACTGAGAAGAAA
+     \___________________ c ______________________/
+
+
+

In this case only the forward primer of b is fitted with a tail containing part of a:

+
agcctatcatcttggtctctgca
+|||||||||||||||||||||||
+tcggatagtagaaccagagacgt
+
+                       TTTATATCGCATGACTCTTCTTT
+                       |||||||||||||||||||||||
+                                        <AGAAA
+         tcttggtctctgcaTTTATAT
+                       |||||||||||||||||||||||
+                       AAATATAGCGTACTGAGAAGAAA
+
+
+

PCR products with flanking sequences are formed in the PCR process.

+
agcctatcatcttggtctctgcaTTTATAT
+||||||||||||||||||||||||||||||
+tcggatagtagaaccagagacgtAAATATA
+                \____________/
+
+                   identical
+                   sequences
+                 ____________
+                /            \
+                ctctgcaTTTATATCGCATGACTCTTCTTT
+                ||||||||||||||||||||||||||||||
+                gagacgtAAATATAGCGTACTGAGAAGAAA
+
+
+

The fragments can be fused by for example Gibson assembly:

+
agcctatcatcttggtctctgcaTTTATAT
+||||||||||||||||||||||||||||||
+tcggatagtagaacca
+
+                             TCGCATGACTCTTCTTT
+                ||||||||||||||||||||||||||||||
+                gagacgtAAATATAGCGTACTGAGAAGAAA
+
+
+

to form c:

+
agcctatcatcttggtctctgcaTTTATATCGCATGACTCTTCTTT
+||||||||||||||||||||||||||||||||||||||||||||||
+tcggatagtagaaccagagacgtAAATATAGCGTACTGAGAAGAAA
+
+
+

The first argument of this function is a list of sequence objects containing Amplicons and other similar objects.

+

The overlap argument controls how many base pairs of overlap are required between adjacent sequence fragments. In a junction between two Amplicons, a tail of about half this length is added to each of the two primers closest to the junction.

+
>       <
+Amplicon1
+         Amplicon2
+         >       <
+
+         ⇣
+
+>       <-
+Amplicon1
+         Amplicon2
+        ->       <
+
+
+

In the case of an Amplicon adjacent to a Dseqrecord object, the tail will be twice as long (1*overlap) since the recombining sequence is present entirely on this primer:

+
Dseqrecd1
+         Amplicon1
+         >       <
+
+         ⇣
+
+Dseqrecd1
+         Amplicon1
+       -->       <
+
+
+
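The two tail rules described above can be sketched in plain Python. This is only an illustration, not pydna's implementation: the helper name `junction_tails`, the `left_is_amplicon` flag and the rounding with `math.ceil` are assumptions made for the example. The sequences are the fragments a and b from the diagrams earlier in this docstring.

```python
import math

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a plain DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def junction_tails(left, right, overlap=35, left_is_amplicon=True):
    """Hypothetical helper: tails for the primers flanking one junction.

    Returns (tail for the forward primer of `right`,
             tail for the reverse primer of `left`).
    """
    if left_is_amplicon:
        # Both fragments are PCR products: each primer carries about half
        # of the overlap (rounding up is an assumption of this sketch).
        half = math.ceil(overlap / 2)
        return left[-half:], revcomp(right[:half])
    # `left` cannot be modified, so the whole overlap goes on one primer.
    return left[-overlap:], ""


a = "agcctatcatcttggtctctgca"
b = "TTTATATCGCATGACTCTTCTTT"
print(junction_tails(a, b, overlap=14))
print(junction_tails(a, b, overlap=14, left_is_amplicon=False))
```

With an overlap of 14 bp, the first call gives two 7 bp tails and the second a single 14 bp tail, matching the primers drawn in the diagrams.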

Note that if the sequence of DNA fragments starts or stops with an Amplicon, the very first and very last primer will not be modified, i.e. assemblies are always assumed to be linear. There are simple tricks around that for circular assemblies, depicted in the last two examples below.

+

The maxlink argument controls the cut-off length for sequences that will be synthesized by adding them to primers for the adjacent fragment(s). The argument list may contain short spacers (such as spacers between fusion proteins).

+
Example 1: Linear assembly of PCR products (pydna.amplicon.Amplicon class objects)
+
+>       <         >       <
+Amplicon1         Amplicon3
+         Amplicon2         Amplicon4
+         >       <         >       <
+
+                     ⇣
+                     pydna.design.assembly_fragments
+                     ⇣
+
+>       <-       ->       <-                      pydna.assembly.Assembly
+Amplicon1         Amplicon3
+         Amplicon2         Amplicon4     ➤  Amplicon1Amplicon2Amplicon3Amplicon4
+        ->       <-       ->       <
+
+Example 2: Linear assembly of alternating Amplicons and other fragments
+
+>       <         >       <
+Amplicon1         Amplicon2
+         Dseqrecd1         Dseqrecd2
+
+                     ⇣
+                     pydna.design.assembly_fragments
+                     ⇣
+
+>       <--     -->       <--                     pydna.assembly.Assembly
+Amplicon1         Amplicon2
+         Dseqrecd1         Dseqrecd2     ➤  Amplicon1Dseqrecd1Amplicon2Dseqrecd2
+
+Example 3: Linear assembly of alternating Amplicons and other fragments
+
+Dseqrecd1         Dseqrecd2
+         Amplicon1         Amplicon2
+         >       <       -->       <
+
+                     ⇣
+             pydna.design.assembly_fragments
+                     ⇣
+                                                  pydna.assembly.Assembly
+Dseqrecd1         Dseqrecd2
+         Amplicon1         Amplicon2     ➤  Dseqrecd1Amplicon1Dseqrecd2Amplicon2
+       -->       <--     -->       <
+
+Example 4: Circular assembly of alternating Amplicons and other fragments
+
+                 ->       <==
+Dseqrecd1         Amplicon2
+         Amplicon1         Dseqrecd1
+       -->       <-
+                     ⇣
+                     pydna.design.assembly_fragments
+                     ⇣
+                                                   pydna.assembly.Assembly
+                 ->       <==
+Dseqrecd1         Amplicon2                    -Dseqrecd1Amplicon1Amplicon2-
+         Amplicon1                       ➤    |                             |
+       -->       <-                            -----------------------------
+
+Example 5: Circular assembly of Amplicons
+
+>       <         >       <
+Amplicon1         Amplicon3
+         Amplicon2         Amplicon1
+         >       <         >       <
+
+                     ⇣
+                     pydna.design.assembly_fragments
+                     ⇣
+
+>       <=       ->       <-
+Amplicon1         Amplicon3
+         Amplicon2         Amplicon1
+        ->       <-       +>       <
+
+                     ⇣
+             make new Amplicon using the Amplicon1.template and
+             the last fwd primer and the first rev primer.
+                     ⇣
+                                                   pydna.assembly.Assembly
++>       <=       ->       <-
+ Amplicon1         Amplicon3                  -Amplicon1Amplicon2Amplicon3-
+          Amplicon2                      ➤   |                             |
+         ->       <-                          -----------------------------
+
+
+
+
Parameters:
+
    +
  • f (list of pydna.amplicon.Amplicon and other Dseqrecord like objects) – list of Amplicon and Dseqrecord objects for which fusion primers should be constructed.

  • +
  • overlap (int, optional) – Length of required overlap between fragments.

  • +
  • maxlink (int, optional) – Maximum length of spacer sequences that may be present in f. These will be included in tails for designed primers.

  • +
  • circular (bool, optional) – If True, the assembly is circular. If False, the assembly is linear.

  • +
+
+
Returns:
+

seqs

+
[Amplicon1,
+ Amplicon2, ...]
+
+
+

+
+
Return type:
+

list of pydna.amplicon.Amplicon and other Dseqrecord like objects

+
+
+

Examples

+
>>> from pydna.dseqrecord import Dseqrecord
+>>> from pydna.design import primer_design
+>>> a=primer_design(Dseqrecord("atgactgctaacccttccttggtgttgaacaagatcgacgacatttcgttcgaaacttacgatg"))
+>>> b=primer_design(Dseqrecord("ccaaacccaccaggtaccttatgtaagtacttcaagtcgccagaagacttcttggtcaagttgcc"))
+>>> c=primer_design(Dseqrecord("tgtactggtgctgaaccttgtatcaagttgggtgttgacgccattgccccaggtggtcgtttcgtt"))
+>>> from pydna.design import assembly_fragments
+>>> # We would like a circular recombination, so the first sequence has to be repeated
+>>> fa1,fb,fc,fa2 = assembly_fragments([a,b,c,a])
+>>> # Since all fragments are Amplicons, we need to extract the rp of the 1st and fp of the last fragments.
+>>> from pydna.amplify import pcr
+>>> fa = pcr(fa2.forward_primer, fa1.reverse_primer, a)
+>>> [fa,fb,fc]
+[Amplicon(100), Amplicon(101), Amplicon(102)]
+>>> fa.name, fb.name, fc.name = "fa fb fc".split()
+>>> from pydna.assembly import Assembly
+>>> assemblyobj = Assembly([fa,fb,fc])
+>>> assemblyobj
+Assembly
+fragments....: 100bp 101bp 102bp
+limit(bp)....: 25
+G.nodes......: 6
+algorithm....: common_sub_strings
+>>> assemblyobj.assemble_linear()
+[Contig(-231), Contig(-166), Contig(-36)]
+>>> assemblyobj.assemble_circular()[0].seguid()
+'cdseguid=85t6tfcvWav0wnXEIb-lkUtrl4s'
+>>> (a+b+c).looped().seguid()
+'cdseguid=85t6tfcvWav0wnXEIb-lkUtrl4s'
+>>> print(assemblyobj.assemble_circular()[0].figure())
+ -|fa|36
+|     \/
+|     /\
+|     36|fb|36
+|           \/
+|           /\
+|           36|fc|36
+|                 \/
+|                 /\
+|                 36-
+|                    |
+ --------------------
+>>>
+
+
+
+ +
+
+pydna.design.circular_assembly_fragments(f, overlap=35, maxlink=40)[source]
+

Equivalent to assembly_fragments with circular=True.

+

Deprecated, kept for backward compatibility. Use assembly_fragments with circular=True instead.

+
+ +
+
+

pydna.download module

+

Provides a function for downloading online text files.

+
+
+

pydna.editor module

+

This module provides a class for opening a sequence using an editor that accepts a file as a command line argument.

+

ApE - A plasmid Editor [6] is an excellent editor for this purpose.

+

References

+ +
+
+class pydna.editor.Editor(shell_command_for_editor, tmpdir=None)[source]
+

Bases: object

+

The Editor class needs to be instantiated before use.

+
+
Parameters:
+
    +
  • shell_command_for_editor (str) – String containing the path to the editor

  • +
  • tmpdir (str, optional) – String containing the path to the temporary directory where sequence files are stored before opening.

  • +
+
+
+

Examples

+
>>> from pydna.editor import Editor
+>>> #ape = Editor("tclsh8.6 /home/bjorn/.ApE/apeextractor/ApE.vfs/lib/app-AppMain/AppMain.tcl")
+>>> #ape.open("aaa") # This command opens the sequence in the ApE editor
+
+
+
+
+open(seq_to_open)[source]
+

Open a sequence for editing in an external (DNA) editor.

+
+
Parameters:
+

seq_to_open (SeqRecord or Dseqrecord object)

+
+
+
+ +
+ +
+
+pydna.editor.ape(*args, **kwargs)[source]
+

docstring.

+
+ +
+
+

pydna.gel module

+

docstring.

+
+
+pydna.gel.interpolator(mwstd)[source]
+

docstring.

+
+ +
+
+pydna.gel.gel(samples=None, gel_length=600, margin=50, interpolator=interpolator(mwstd=_mwstd))[source]
+
+ +
+
+

pydna.genbank module

+

This module provides a class for downloading sequences from Genbank called Genbank and a function that does the same thing called genbank.

+

The function can be used if the environment variable pydna_email has been set to a valid email address. The easiest way to do this permanently is to edit the pydna.ini file. See the documentation of pydna.open_config_folder().

+
+
+class pydna.genbank.Genbank(users_email: str, *, tool: str = 'pydna')[source]
+

Bases: object

+

Class to facilitate download from Genbank. It is easier and quicker to use the pydna.genbank.genbank() function directly.

+
+
Parameters:
+

users_email (string) – Has to be a valid email address. You should always tell Genbank who you are, so that they can contact you.

+
+
+

Examples

+
>>> from pydna.genbank import Genbank
+>>> gb=Genbank("bjornjobb@gmail.com")
+>>> rec = gb.nucleotide("LP002422.1")   # <- entry from genbank
+>>> print(len(rec))
+1
+
+
+
+
+nucleotide(**kwargs)
+
+ +
+ +
+
+pydna.genbank.genbank(accession: str = 'CS570233.1', *args, **kwargs) GenbankRecord[source]
+

Download a Genbank nucleotide record.

+

This function takes the same parameters as the pydna.genbank.Genbank.nucleotide() method. The email address stored in the pydna_email environment variable is used. The easiest way to set this permanently is to edit the pydna.ini file. See the documentation of pydna.open_config_folder().

+

If no accession is given, a very short Genbank entry is used as an example (see below). This can be useful for testing the connection to Genbank.

+

Please note that this result is also cached by default by settings in the pydna.ini file. See the documentation of pydna.open_config_folder().

+
LOCUS       CS570233                  14 bp    DNA     linear   PAT 18-MAY-2007
+DEFINITION  Sequence 6 from Patent WO2007025016.
+ACCESSION   CS570233
+VERSION     CS570233.1
+KEYWORDS    .
+SOURCE      synthetic construct
+  ORGANISM  synthetic construct
+            other sequences; artificial sequences.
+REFERENCE   1
+  AUTHORS   Shaw,R.W. and Cottenoir,M.
+  TITLE     Inhibition of metallo-beta-lactamase by double-stranded dna
+  JOURNAL   Patent: WO 2007025016-A1 6 01-MAR-2007;
+            Texas Tech University System (US)
+FEATURES             Location/Qualifiers
+     source          1..14
+                     /organism="synthetic construct"
+                     /mol_type="unassigned DNA"
+                     /db_xref="taxon:32630"
+                     /note="This is a 14bp aptamer inhibitor."
+ORIGIN
+        1 atgttcctac atga
+//
+
+
+
+ +
+
+

pydna.genbankfile module

+
+
+class pydna.genbankfile.GenbankFile(record, *args, path=None, **kwargs)[source]
+

Bases: Dseqrecord

+
+
+classmethod from_SeqRecord(record, *args, path=None, **kwargs)[source]
+
+ +
+
+reverse_complement()[source]
+

Reverse complement.

+

Examples

+
>>> from pydna.dseqrecord import Dseqrecord
+>>> a=Dseqrecord("ggaatt")
+>>> a
+Dseqrecord(-6)
+>>> a.seq
+Dseq(-6)
+ggaatt
+ccttaa
+>>> a.reverse_complement().seq
+Dseq(-6)
+aattcc
+ttaagg
+>>>
+
+
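The strand arithmetic behind the example above can be reproduced with plain strings. This is a minimal sketch that does not use pydna; the helper names `complement` and `revcomp` are chosen for illustration only.

```python
# Translation table for the four unambiguous bases, both cases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq):
    """Base-by-base complement, same orientation as the input."""
    return seq.translate(_COMPLEMENT)


def revcomp(seq):
    """Complement the sequence, then reverse it."""
    return complement(seq)[::-1]


watson = "ggaatt"
print(watson)              # top strand as displayed
print(complement(watson))  # bottom strand as displayed underneath
print(revcomp(watson))     # top strand after reverse complementing
```

The second printed line corresponds to the lower strand in the Dseq display, and the third to the upper strand of the reverse-complemented molecule.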
+ +
+ +
+
+rc()
+

Reverse complement.

+

Examples

+
>>> from pydna.dseqrecord import Dseqrecord
+>>> a=Dseqrecord("ggaatt")
+>>> a
+Dseqrecord(-6)
+>>> a.seq
+Dseq(-6)
+ggaatt
+ccttaa
+>>> a.reverse_complement().seq
+Dseq(-6)
+aattcc
+ttaagg
+>>>
+
+
+ +
+ +
+ +
+
+

pydna.genbankfixer module

+

This module provides the gbtext_clean() function which can clean up broken Genbank files enough to pass the BioPython Genbank parser.

+

Almost all of this code was lifted from BioJSON (https://github.com/levskaya/BioJSON) by Anselm Levskaya. The original code was not accompanied by any software licence. This parser is based on pyparsing.

+

There are some modifications to deal with fringe cases.

+

The parser first produces JSON as an intermediate format which is then formatted back into a string in Genbank format.

+

The parser is not complete, so some fields do not survive the roundtrip (see below). This should not be a difficult fix. The returned result has two properties, .jseq which is the intermediate JSON produced by the parser and .gbtext which is the formatted Genbank string.

+
+
+pydna.genbankfixer.parseGBLoc(s, l_, t)[source]
+

retwingles parsed genbank location strings, assumes no joins of RC and FWD sequences

+
+ +
+
+pydna.genbankfixer.strip_multiline(s, l_, t)[source]
+
+ +
+
+pydna.genbankfixer.toInt(s, l_, t)[source]
+
+ +
+
+pydna.genbankfixer.strip_indent(str)[source]
+
+ +
+
+pydna.genbankfixer.concat_dict(dlist)[source]
+

more or less dict(list of string pairs) but merges vals with the same keys so no duplicates occur

+
+ +
+
+pydna.genbankfixer.toJSON(gbkstring)[source]
+
+ +
+
+pydna.genbankfixer.wrapstring(str_, rowstart, rowend, padfirst=True)[source]
+

wraps the provided string in lines of length rowend-rowstart, padded on the left by rowstart. If padfirst is false, the first line is not padded.
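The wrapping rule described for wrapstring can be sketched as follows. This is not the function from the module: `wrap_padded` is a hypothetical stand-in written from the description only, and details such as trailing newlines may differ from the real code.

```python
def wrap_padded(text, rowstart, rowend, padfirst=True):
    """Cut `text` into rows of width rowend - rowstart, indented by rowstart."""
    width = rowend - rowstart
    chunks = [text[i:i + width] for i in range(0, len(text), width)]
    lines = [" " * rowstart + chunk for chunk in chunks]
    if lines and not padfirst:
        # The first row is assumed to follow a label, so it gets no indent.
        lines[0] = chunks[0]
    return "\n".join(lines)


print(wrap_padded("abcdefgh", rowstart=2, rowend=5))
print(wrap_padded("abcdefgh", rowstart=2, rowend=5, padfirst=False))
```

Leaving the first row unpadded is useful when the text follows a keyword that already occupies the left margin, as in Genbank header lines.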

+
+ +
+
+pydna.genbankfixer.locstr(locs, strand)[source]
+

genbank formatted location string, assumes no join’d combo of rev and fwd seqs

+
+ +
+
+pydna.genbankfixer.originstr(sequence)[source]
+

formats dna sequence as broken, numbered lines ala genbank
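As an illustration of the layout meant here, the sketch below formats a sequence the way the ORIGIN blocks elsewhere on this page appear: 60 bases per row, in blocks of ten, preceded by the right-aligned position of the first base. The name `origin_lines` is hypothetical and the code is written from the visible layout, not taken from originstr.

```python
def origin_lines(sequence, row=60, block=10):
    """Format a sequence as numbered, blocked rows (Genbank ORIGIN style)."""
    lines = []
    for start in range(0, len(sequence), row):
        chunk = sequence[start:start + row]
        blocks = " ".join(chunk[i:i + block] for i in range(0, len(chunk), block))
        # The position of the first base on the row, right-aligned in 9 columns.
        lines.append(f"{start + 1:>9} {blocks}")
    return "\n".join(lines)


print(origin_lines("atgttcctacatga"))
```

For the 14 bp example record shown in the pydna.genbank section, this produces a single row starting at position 1.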

+
+ +
+
+pydna.genbankfixer.toGB(jseq)[source]
+

parses json jseq data and prints out ApE compatible genbank

+
+ +
+
+pydna.genbankfixer.gbtext_clean(gbtext)[source]
+

This function takes a string containing one sequence in Genbank format and returns a named tuple containing two fields, gbtext containing a string with the corrected Genbank sequence and jseq which contains the JSON intermediate.

+

Examples

+
>>> s = '''LOCUS       New_DNA      3 bp    DNA   CIRCULAR SYN        19-JUN-2013
+... DEFINITION  .
+... ACCESSION
+... VERSION
+... SOURCE      .
+...   ORGANISM  .
+... COMMENT
+... COMMENT     ApEinfo:methylated:1
+... ORIGIN
+...         1 aaa
+... //'''
+>>> from pydna.readers import read
+>>> read(s)  
+/home/bjorn/anaconda3/envs/bjorn36/lib/python3.6/site-packages/Bio/GenBank/Scanner.py:1388: BiopythonParserWarning: Malformed LOCUS line found - is this correct?
+:'LOCUS       New_DNA      3 bp    DNA   CIRCULAR SYN        19-JUN-2013\n'
+  "correct?\n:%r" % line, BiopythonParserWarning)
+Traceback (most recent call last):
+  File "/home/bjorn/python_packages/pydna/pydna/readers.py", line 48, in read
+    results = results.pop()
+IndexError: pop from empty list
+
+During handling of the above exception, another exception occurred:
+
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+  File "/home/bjorn/python_packages/pydna/pydna/readers.py", line 50, in read
+    raise ValueError("No sequences found in data:\n({})".format(data[:79]))
+ValueError: No sequences found in data:
+(LOCUS       New_DNA      3 bp    DNA   CIRCULAR SYN        19-JUN-2013
+DEFINITI)
+>>> from pydna.genbankfixer import gbtext_clean
+>>> s2, j2 = gbtext_clean(s)
+>>> print(s2)
+LOCUS       New_DNA                    3 bp ds-DNA     circular SYN 19-JUN-2013
+DEFINITION  .
+ACCESSION
+VERSION
+SOURCE      .
+ORGANISM  .
+COMMENT
+COMMENT     ApEinfo:methylated:1
+FEATURES             Location/Qualifiers
+ORIGIN
+        1 aaa
+//
+>>> s3 = read(s2)
+>>> s3
+Dseqrecord(o3)
+>>> print(s3.format())
+LOCUS       New_DNA                    3 bp    DNA     circular SYN 19-JUN-2013
+DEFINITION  .
+ACCESSION   New_DNA
+VERSION     New_DNA
+KEYWORDS    .
+SOURCE
+  ORGANISM  .
+            .
+COMMENT
+            ApEinfo:methylated:1
+FEATURES             Location/Qualifiers
+ORIGIN
+        1 aaa
+//
+
+
+
+ +
+
+

pydna.genbankrecord module

+
+
+class pydna.genbankrecord.GenbankRecord(record, *args, item='accession', start=None, stop=None, strand=1, **kwargs)[source]
+

Bases: Dseqrecord

+
+
+classmethod from_string(record: str = '', *args, item='accession', start=None, stop=None, strand=1, **kwargs)[source]
+

docstring.

+
+ +
+
+classmethod from_SeqRecord(record, *args, item='accession', start=None, stop=None, strand=1, **kwargs)[source]
+
+ +
+
+reverse_complement()[source]
+

Reverse complement.

+

Examples

+
>>> from pydna.dseqrecord import Dseqrecord
+>>> a=Dseqrecord("ggaatt")
+>>> a
+Dseqrecord(-6)
+>>> a.seq
+Dseq(-6)
+ggaatt
+ccttaa
+>>> a.reverse_complement().seq
+Dseq(-6)
+aattcc
+ttaagg
+>>>
+
+
+ +
+ +
+
+rc()
+

Reverse complement.

+

Examples

+
>>> from pydna.dseqrecord import Dseqrecord
+>>> a=Dseqrecord("ggaatt")
+>>> a
+Dseqrecord(-6)
+>>> a.seq
+Dseq(-6)
+ggaatt
+ccttaa
+>>> a.reverse_complement().seq
+Dseq(-6)
+aattcc
+ttaagg
+>>>
+
+
+ +
+ +
+
+pydna_code()[source]
+

docstring.

+
+ +
+
+biopython_code()[source]
+

docstring.

+
+ +
+ +
+
+

pydna.myprimers module

+

Provides a practical way to access a list of primer sequences in a text file.

+

The path of a text file can be specified in the pydna.ini file or by the pydna_primers environment variable.

+

The file is expected to contain sequences in FASTA, Genbank or EMBL formats or any format readable by the parse_primers function.

+

The primer list is expected to follow the convention below. The primer name is expected to begin with its number.

+

The file can, for example, have the format below:

+
>2_third_primer
+tgagtagtcgtagtcgtcgtat
+
+>1_second_primer
+tgatcgtcatgctgactatactat
+
+>0_first_primer
+ctaggatcgtagatctagctg
+...
+
+
+

The primerlist function returns a list of pydna.primer.Primer objects; primerdict returns a dict where the key is the id of the object.

+
+
+class pydna.myprimers.PrimerList(initlist: ~typing.Iterable = None, path: (<class 'str'>, <class 'pathlib.Path'>) = None, *args, identifier: str = "p", **kwargs)[source]
+

Bases: UserList

+

Read a text file with primers.

+

The primers can be of any format readable by the parse_primers function. Lines beginning with # are ignored. Path defaults to the path given by the pydna_primers environment variable.

+

The primer list does not accept new primers. Use the assign_numbers_to_new_primers method and paste the new primers at the top of the list.

+

The primer list remembers the numbers of accessed primers. The indices of accessed primers are stored in the .accessed property.

+
+
+property accessed
+

docstring.

+
+ +
+
+assign_numbers(lst: list)[source]
+

Find new primers in lst.

+

Returns a string containing new primers with their assigned numbers. This string can be copied and pasted to the primer text file.

+
+ +
+
+pydna_code_from_list(lst: list)[source]
+

Pydna code for a list of primer objects.

+
+ +
+
+open_folder()[source]
+

Open folder where primer file is located.

+
+ +
+
+code(lst: list)
+

Pydna code for a list of primer objects.

+
+ +
+ +
+
+pydna.myprimers.check_primer_numbers(pl: list | None = None)[source]
+

Find primers whose numbers do not match their position in the list.
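The idea of this check can be sketched on a plain list of primer names. The sketch assumes, as the naming convention at the top of this module suggests, that each name starts with the primer's number and that the list is ordered so that position and number should agree; `misnumbered` is a hypothetical helper, not the function documented here.

```python
import re


def misnumbered(names):
    """Return (index, name) pairs where the leading number differs from the index."""
    wrong = []
    for index, name in enumerate(names):
        match = re.match(r"\d+", name)  # leading digits, if any
        if match is None or int(match.group()) != index:
            wrong.append((index, name))
    return wrong


names = ["0_first_primer", "1_second_primer", "3_third_primer"]
print(misnumbered(names))
```

A name without a leading number is reported as well, since its position cannot be verified.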

+
+ +
+
+pydna.myprimers.undefined_sequence(pl: list | None = None)[source]
+

Primers in list with N or n instead of a sequence.

+
+ +
+
+pydna.myprimers.find_duplicate_primers(pl: list | None = None)[source]
+

Find a list of lists with duplicated primer sequences.
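One way to picture the "list of lists" result is to group primers by sequence and keep only the groups with more than one member. The sketch below works on (name, sequence) tuples rather than Primer objects, and treating upper and lower case as equal is an assumption of the example; `duplicate_groups` is a hypothetical name.

```python
from collections import defaultdict


def duplicate_groups(primers):
    """Group primer names that share the same sequence (case-insensitive)."""
    by_sequence = defaultdict(list)
    for name, seq in primers:
        by_sequence[seq.upper()].append(name)
    # Keep only sequences that occur more than once.
    return [names for names in by_sequence.values() if len(names) > 1]


primers = [
    ("0_first_primer", "ctaggatcgtagatctagctg"),
    ("1_second_primer", "tgatcgtcatgctgactatactat"),
    ("2_third_primer", "CTAGGATCGTAGATCTAGCTG"),
]
print(duplicate_groups(primers))
```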

+
+ +
+
+

pydna.parsers module

+

Provides two functions, parse and parse_primers.

+
+
+pydna.parsers.extract_from_text(text)[source]
+

docstring.

+
+ +
+
+pydna.parsers.embl_gb_fasta(text)[source]
+

Parse embl, genbank or fasta format from text.

+

Returns list of Bio.SeqRecord.SeqRecord

+

annotations[“molecule_type”] +annotations[“topology”]

+
+ +
+
+pydna.parsers.parse(data, ds=True)[source]
+

Return all DNA sequences found in data.

+

If no sequences are found, an empty list is returned. This is a greedy function, use carefully.

+
+
Parameters:
+
    +
  • data (string or iterable) –

    The data parameter is a string containing:

    +
      +
    1. an absolute path to a local file. The file will be read in text mode and parsed for EMBL, FASTA and Genbank sequences. Can be a string or a Path object.

    2. a string containing one or more sequences in EMBL, GENBANK, or FASTA format. Mixed formats are allowed.

    3. data can be a list or other iterable where the elements are 1 or 2
    +

  • +
  • ds (bool) – If True, double stranded Dseqrecord objects are returned. If False, single stranded Bio.SeqRecord [7] objects are returned.

  • +
+
+
Returns:
+

contains Dseqrecord or SeqRecord objects

+
+
Return type:
+

list

+
+
+

References

+ +
+

See also

+

read

+
+
+ +
+
+pydna.parsers.parse_primers(data)[source]
+

docstring.

+
+ +
+
+

pydna.primer module

+

This module provides the Primer class, which is a subclass of the Biopython SeqRecord.

+
+
+class pydna.primer.Primer(record, *args, amplicon=None, position=None, footprint=0, **kwargs)[source]
+

Bases: SeqRecord

+

Primer and its position on a template, footprint and tail.
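The relation between a primer, its footprint and its tail can be sketched with a small stand-alone class. The sketch assumes that the footprint is the 3' part of the primer that anneals to the template and that the tail is whatever precedes it; `PrimerSketch` and `footprint_length` are hypothetical names, not part of pydna.

```python
from dataclasses import dataclass


@dataclass
class PrimerSketch:
    seq: str
    footprint_length: int = 0  # number of 3' bases annealing to the template

    @property
    def footprint(self):
        """The annealing 3' part of the primer."""
        return self.seq[len(self.seq) - self.footprint_length:]

    @property
    def tail(self):
        """The non-annealing 5' part of the primer."""
        return self.seq[:len(self.seq) - self.footprint_length]


# Tailed forward primer taken from the assembly_fragments diagrams above.
p = PrimerSketch("tcttggtctctgcaTTTATAT", footprint_length=7)
print(p.tail, p.footprint)
```

With a footprint length of zero the whole sequence is reported as tail, which corresponds to a primer that has not yet been placed on a template.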

+
+
+property footprint
+
+ +
+
+property tail
+
+ +
+
+reverse_complement(*args, **kwargs)[source]
+

Return the reverse complement of the sequence.

+
+ +
+ +
+
+

pydna.readers module

+

Provides two functions, read and read_primer.

+
+
+pydna.readers.read(data, ds=True)[source]
+

This function is similar to the parse() function but expects one and only one sequence, or an exception is thrown.

+
+
Parameters:
+
    +
  • data (string) – see below

  • +
  • ds (bool) – Double stranded or single stranded DNA. If True, return Dseqrecord objects, else Bio.SeqRecord objects.

  • +
+
+
Returns:
+

contains the first Dseqrecord or SeqRecord object parsed.

+
+
Return type:
+

Dseqrecord

+
+
+

Notes

+

The data parameter is similar to the data parameter for parse().

+
+

See also

+

parse

+
+
+ +
+
+pydna.readers.read_primer(data)[source]
+

Use this function to read a primer sequence from a string or a local file. The usage is similar to the parse_primers() function.

+
+ +
+
+

pydna.seqrecord module

+

A subclass of the Biopython SeqRecord class.

+

Has a number of extra methods and uses the pydna._pretty_str.pretty_str class instead of str for a nicer output in the IPython shell.

+
+
+class pydna.seqrecord.SeqRecord(seq, *args, id='id', name='name', description='description', **kwargs)[source]
+

Bases: SeqRecord

+

A subclass of the Biopython SeqRecord class.

+

Has a number of extra methods and uses the pydna._pretty_str.pretty_str class instead of str for a nicer output in the IPython shell.

+
+
+classmethod from_Bio_SeqRecord(sr: SeqRecord)[source]
+

Creates a pydna SeqRecord from a Biopython SeqRecord.

+
+ +
+
+property locus
+

Alias for name property.

+
+ +
+
+property accession
+

Alias for id property.

+
+ +
+
+property definition
+

Alias for description property.

+
+ +
+
+reverse_complement(*args, **kwargs)[source]
+

Return the reverse complement of the sequence.

+
+ +
+
+rc(*args, **kwargs)
+

Return the reverse complement of the sequence.

+
+ +
+
+isorf(table=1)[source]
+

Detect if sequence is an open reading frame (orf) in the 5’-3’ direction.

+

Translation tables are numbers according to the NCBI numbering [8].

+
+
Parameters:
+

table (int) – Sets the translation table, default is 1 (standard code)

+
+
Returns:
+

True if sequence is an orf, False otherwise.

+
+
Return type:
+

bool

+
+
+

Examples

+
>>> from pydna.seqrecord import SeqRecord
+>>> a=SeqRecord("atgtaa")
+>>> a.isorf()
+True
+>>> b=SeqRecord("atgaaa")
+>>> b.isorf()
+False
+>>> c=SeqRecord("atttaa")
+>>> c.isorf()
+False
+
+
+
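The three results in the example follow from what an open reading frame requires: a start codon, a stop codon at the very end, a length divisible by three and no stop codon in between. The sketch below checks exactly that for the standard code. It is a simplification written for illustration, not the method itself: it accepts only ATG as start codon and ignores the table argument.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (table 1)


def is_orf(seq):
    """Simplified ORF test: ATG ... stop, in frame, no internal stop."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return (
        codons[0] == "ATG"
        and codons[-1] in STOP_CODONS
        and not any(codon in STOP_CODONS for codon in codons[1:-1])
    )


for candidate in ("atgtaa", "atgaaa", "atttaa"):
    print(candidate, is_orf(candidate))
```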

References

+ +
+ +
+
+translate()[source]
+

docstring.

+
+ +
+
+add_colors_to_features_for_ape()[source]
+

Assign colors to features, compatible with the ApE editor.

+
+ +
+
+add_feature(x=None, y=None, seq=None, type_='misc', strand=1, *args, **kwargs)[source]
+

Add a feature of type misc to the feature list of the sequence.

+
+
Parameters:
+
    +
  • x (int) – Indicates start of the feature

  • +
  • y (int) – Indicates end of the feature

  • +
+
+
+

Examples

+
>>> from pydna.seqrecord import SeqRecord
+>>> a=SeqRecord("atgtaa")
+>>> a.features
+[]
+>>> a.add_feature(2,4)
+>>> a.features
+[SeqFeature(SimpleLocation(ExactPosition(2),
+                           ExactPosition(4),
+                           strand=1),
+            type='misc',
+            qualifiers=...)]
+
+
+
+ +
+
+list_features()[source]
+

Print ASCII table with all features.

+

Examples

+
>>> from pydna.seq import Seq
+>>> from pydna.seqrecord import SeqRecord
+>>> a=SeqRecord(Seq("atgtaa"))
+>>> a.add_feature(2,4)
+>>> print(a.list_features())
++-----+---------------+-----+-----+-----+-----+------+------+
+| Ft# | Label or Note | Dir | Sta | End | Len | type | orf? |
++-----+---------------+-----+-----+-----+-----+------+------+
+|   0 | L:ft2         | --> | 2   | 4   |   2 | misc |  no  |
++-----+---------------+-----+-----+-----+-----+------+------+
+
+
+
+ +
+
+extract_feature(n)[source]
+

Extract feature and return a new SeqRecord object.

+
+
Parameters:
+
    +
  • n (int) – Indicates the feature to extract

  • +
+
+
+

Examples

+
>>> from pydna.seqrecord import SeqRecord
+>>> a=SeqRecord("atgtaa")
+>>> a.add_feature(2,4)
+>>> b=a.extract_feature(0)
+>>> b
+SeqRecord(seq=Seq('gt'), id='ft2', name='part_name',
+          description='description', dbxrefs=[])
+
+
+
+ +
+
+sorted_features()[source]
+

Return a list of the features sorted by start position.

+

Examples

+
>>> from pydna.seqrecord import SeqRecord
+>>> a=SeqRecord("atgtaa")
+>>> a.add_feature(3,4)
+>>> a.add_feature(2,4)
+>>> print(a.features)
+[SeqFeature(SimpleLocation(ExactPosition(3), ExactPosition(4),
+                           strand=1),
+            type='misc', qualifiers=...),
+ SeqFeature(SimpleLocation(ExactPosition(2), ExactPosition(4),
+                           strand=1),
+            type='misc', qualifiers=...)]
+>>> print(a.sorted_features())
+[SeqFeature(SimpleLocation(ExactPosition(2), ExactPosition(4),
+                           strand=1),
+            type='misc', qualifiers=...),
+ SeqFeature(SimpleLocation(ExactPosition(3), ExactPosition(4),
+                           strand=1),
+            type='misc', qualifiers=...)]
+
+
+
+ +
+
+seguid()[source]
+

Return the url safe SEGUID [9] for the sequence.

+

This checksum is the same as seguid but with base64.urlsafe encoding instead of the normal base64. This means that the characters + and / are replaced with - and _ so that the checksum can be part of a URL or a filename.

+

Examples

+
>>> from pydna.seqrecord import SeqRecord
+>>> a=SeqRecord("gattaca")
+>>> a.seguid() # original seguid is +bKGnebMkia5kNg/gF7IORXMnIU
+'lsseguid=tp2jzeCM2e3W4yxtrrx09CMKa_8'
+
+
+
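The difference between the two encodings can be seen with the standard library alone. The sketch below assumes the classic SEGUID definition (SHA-1 of the upper-case sequence, base64 encoded, padding removed); the function names are hypothetical and the output is not claimed to reproduce the prefixed checksum shown in the example above.

```python
import base64
import hashlib


def seguid_classic(seq):
    """SHA-1 of the upper-case sequence, standard base64, no padding."""
    digest = hashlib.sha1(seq.upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def seguid_urlsafe(seq):
    """Same digest, but with the URL- and filename-safe base64 alphabet."""
    digest = hashlib.sha1(seq.upper().encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


print(seguid_classic("gattaca"))
print(seguid_urlsafe("gattaca"))
```

Both strings are 27 characters long and differ only where the classic form contains + or /.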

References

+ +
+ +
+
+comment(newcomment='')[source]
+

docstring.

+
+ +
+
+datefunction()[source]
+

docstring.

+
+ +
+
+stamp(now=datefunction, tool='pydna', separator=' ', comment='')[source]
+

Add seguid checksum to COMMENTS sections

+

The checksum is stored in object.annotations[“comment”]. This shows in the COMMENTS section of a formatted Genbank file.

+

For blunt linear sequences:

+

SEGUID <seguid>

+

For circular sequences:

+

cSEGUID <seguid>

+

For linear sequences which are not blunt:

+

lSEGUID <seguid>

+

Examples

+
>>> from pydna.seqrecord import SeqRecord
+>>> a = SeqRecord("aa")
+>>> a.stamp()
+'lsseguid=gBw0Jp907Tg_yX3jNgS4qQWttjU'
+>>> a.annotations["comment"][:41]
+'pydna lsseguid=gBw0Jp907Tg_yX3jNgS4qQWttj'
+
+
+
+ +
+
+lcs(other, *args, limit=25, **kwargs)[source]
+

Return the longest common substring between the sequence and another sequence (other).

+

The other sequence can be a string, Seq, SeqRecord, Dseq or Dseqrecord. The method returns a SeqFeature with type “read”, as this method is mostly used to map sequence reads to the sequence. This can be changed by passing a type as keyword with some other string value.

+

Examples

+
>>> from pydna.seqrecord import SeqRecord
+>>> a = SeqRecord("GGATCC")
+>>> a.lcs("GGATCC", limit=6)
+SeqFeature(SimpleLocation(ExactPosition(0),
+                          ExactPosition(6), strand=1),
+                          type='read',
+                          qualifiers=...)
+>>> a.lcs("GATC", limit=4)
+SeqFeature(SimpleLocation(ExactPosition(1),
+                          ExactPosition(5), strand=1),
+                          type='read',
+                          qualifiers=...)
+>>> a = SeqRecord("CCCCC")
+>>> a.lcs("GGATCC", limit=6)
+SeqFeature(None)
+
+
+
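The positions reported in the example can be reproduced with a textbook dynamic-programming search for the longest common substring. The sketch is not pydna's algorithm and returns a plain (start, end) tuple instead of a SeqFeature; `lcs_span` is a hypothetical name, and returning None when the match is shorter than limit mirrors the empty feature in the last example.

```python
def lcs_span(seq, other, limit=25):
    """Return (start, end) in `seq` of the longest substring shared with `other`."""
    seq, other = seq.upper(), other.upper()
    best_len, best_end = 0, 0
    prev = [0] * (len(other) + 1)
    for i, a in enumerate(seq, start=1):
        curr = [0] * (len(other) + 1)
        for j, b in enumerate(other, start=1):
            if a == b:
                # Extend the match that ended at the previous pair of positions.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    if best_len < limit:
        return None
    return best_end - best_len, best_end


print(lcs_span("GGATCC", "GGATCC", limit=6))
print(lcs_span("GGATCC", "GATC", limit=4))
print(lcs_span("CCCCC", "GGATCC", limit=6))
```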
+ +
+
+gc()[source]
+

Return GC content.

+
+ +
+
+cai(organism='sce')[source]
+

docstring.

+
+ +
+
+rarecodons(organism='sce')[source]
+

docstring.

+
+ +
+
+startcodon(organism='sce')[source]
+

docstring.

+
+ +
+
+stopcodon(organism='sce')[source]
+

docstring.

+
+ +
+
+express(organism='sce')[source]
+

docstring.

+
+ +
+
+copy()[source]
+

docstring.

+
+ +
+
+dump(filename, protocol=None)[source]
+

docstring.

+
+ +
+ +
+
+class pydna.seqrecord.ProteinSeqRecord(seq, *args, id='id', name='name', description='description', **kwargs)[source]
+

Bases: SeqRecord

+
+
+reverse_complement(*args, **kwargs)[source]
+

Return the reverse complement of the sequence.

+
+ +
+
+rc(*args, **kwargs)
+

Return the reverse complement of the sequence.

+
+ +
+
+isorf(*args, **kwargs)[source]
+

Detect if sequence is an open reading frame (orf) in the 5’-3’ direction.

+

Translation tables are numbers according to the NCBI numbering [10].

+
+
Parameters:
+

table (int) – Sets the translation table, default is 1 (standard code)

+
+
Returns:
+

True if sequence is an orf, False otherwise.

+
+
Return type:
+

bool

+
+
+

Examples

+
>>> from pydna.seqrecord import SeqRecord
+>>> a=SeqRecord("atgtaa")
+>>> a.isorf()
+True
+>>> b=SeqRecord("atgaaa")
+>>> b.isorf()
+False
+>>> c=SeqRecord("atttaa")
+>>> c.isorf()
+False
+
+
+

References

+ +
+ +
+
+gc()[source]
+

Return GC content.

+
+ +
+
+cai(*args, **kwargs)[source]
+

docstring.

+
+ +
+
+rarecodons(*args, **kwargs)[source]
+

docstring.

+
+ +
+
+startcodon(*args, **kwargs)[source]
+

docstring.

+
+ +
+
+stopcodon(*args, **kwargs)[source]
+

docstring.

+
+ +
+
+express(*args, **kwargs)[source]
+

docstring.

+
+ +
+ +
+
+

pydna.tm module

+

This module provide functions for melting temperature calculations.

+
+
+pydna.tm.tm_default(seq, check=True, strict=True, c_seq=None, shift=0, nn_table=_mt.DNA_NN4, tmm_table=None, imm_table=None, de_table=None, dnac1=500 / 2, dnac2=500 / 2, selfcomp=False, Na=40, K=0, Tris=75.0, Mg=1.5, dNTPs=0.8, saltcorr=7, func=_mt.Tm_NN)[source]
+
+ +
+
+pydna.tm.tm_dbd(seq, check=True, strict=True, c_seq=None, shift=0, nn_table=_mt.DNA_NN3, tmm_table=None, imm_table=None, de_table=None, dnac1=250, dnac2=250, selfcomp=False, Na=50, K=0, Tris=0, Mg=1.5, dNTPs=0.8, saltcorr=1, func=_mt.Tm_NN)[source]
+
+ +
+
+pydna.tm.tm_product(seq: str, K=0.050)[source]
+

Tm calculation for the amplicon.

+

according to:

+

Rychlik, Spencer, and Rhoads, 1990, Optimization of the annealing temperature for DNA amplification in vitro. http://www.ncbi.nlm.nih.gov/pubmed/2243783

+
+ +
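As a rough illustration, the product Tm formula commonly quoted from the cited paper can be sketched as below. The function name `tm_product_sketch` is hypothetical, the coefficients are the commonly quoted ones rather than values confirmed from pydna's source, and `K` is assumed to be the monovalent cation concentration in mol/L (so the default 0.050 corresponds to 50 mM).

```python
import math


def tm_product_sketch(seq: str, K: float = 0.050) -> float:
    """Estimate the melting temperature of a PCR product in degrees Celsius.

    Assumed formula: 81.5 + 16.6 * log10([K+]) + 0.41 * (%GC) - 675 / length
    """
    s = seq.upper()
    gc_percent = 100.0 * (s.count("G") + s.count("C")) / len(s)
    return 81.5 + 16.6 * math.log10(K) + 0.41 * gc_percent - 675.0 / len(s)


# A 100 bp product with 50 % GC at 50 mM monovalent cations.
print(round(tm_product_sketch("ATGC" * 25), 1))
```

Longer and more GC-rich products give higher values, while the salt term lowers the estimate at low cation concentrations.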
+
+pydna.tm.ta_default(fp: str, rp: str, seq: str, tm=tm_default, tm_product=tm_product)[source]
+

Ta calculation.

+

according to:

+

Rychlik, Spencer, and Rhoads, 1990, Optimization of the annealing temperature for DNA amplification in vitro. http://www.ncbi.nlm.nih.gov/pubmed/2243783

+

The formula described uses the length and GC content of the product and the salt concentration (monovalent cations).

+
+ +
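A minimal sketch of the annealing temperature rule commonly quoted from the cited paper follows. The name `ta_rychlik_sketch` is hypothetical and the coefficients (0.3, 0.7 and 14.9) are the commonly quoted ones, not values confirmed from pydna's source. The primer and product melting temperatures are passed in as plain numbers, so the sketch does not depend on any particular Tm function.

```python
def ta_rychlik_sketch(tm_fp: float, tm_rp: float, tm_prod: float) -> float:
    """Suggested annealing temperature in degrees Celsius.

    Assumed formula: Ta = 0.3 * Tm(less stable primer) + 0.7 * Tm(product) - 14.9
    """
    return 0.3 * min(tm_fp, tm_rp) + 0.7 * tm_prod - 14.9


# Primer Tms of 59.5 and 59.7 degrees with a product Tm of 80.0 degrees.
print(round(ta_rychlik_sketch(59.5, 59.7, 80.0), 2))
```

Only the lower of the two primer melting temperatures enters the formula, so the less stable primer determines the result.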
+
+pydna.tm.ta_dbd(fp, rp, seq, tm=tm_dbd, tm_product=None)[source]
+
+ +
+
+pydna.tm.program(amplicon, tm=tm_default, ta=ta_default)[source]
+

Returns a string containing a text representation of a suggested PCR program using Taq or similar polymerase.

+
|95°C|95°C               |    |tmf:59.5
+|____|_____          72°C|72°C|tmr:59.7
+|3min|30s  \ 59.1°C _____|____|60s/kb
+|    |      \______/ 0:32|5min|GC 51%
+|    |       30s         |    |1051bp
+
+
+
+ +
+
+pydna.tm.taq_program(amplicon, tm=tm_default, ta=ta_default)
+

Returns a string containing a text representation of a suggested PCR program using Taq or similar polymerase.

+
|95°C|95°C               |    |tmf:59.5
+|____|_____          72°C|72°C|tmr:59.7
+|3min|30s  \ 59.1°C _____|____|60s/kb
+|    |      \______/ 0:32|5min|GC 51%
+|    |       30s         |    |1051bp
+
+
+
+ +
+
+pydna.tm.dbd_program(amplicon, tm=tm_dbd, ta=ta_dbd)[source]
+

Text representation of a suggested PCR program.

+

Using a polymerase with a DNA binding domain such as Pfu-Sso7d.

+
|98°C|98°C               |    |tmf:53.8
+|____|_____          72°C|72°C|tmr:54.8
+|30s |10s  \ 57.0°C _____|____|15s/kb
+|    |      \______/ 0:15|5min|GC 51%
+|    |       10s         |    |1051bp
+
+|98°C|98°C      |    |tmf:82.5
+|____|____      |    |tmr:84.4
+|30s |10s \ 72°C|72°C|15s/kb
+|    |     \____|____|GC 52%
+|    |      3:45|5min|15058bp
+
+
+
+ +
+
+pydna.tm.pfu_sso7d_program(amplicon, tm=tm_dbd, ta=ta_dbd)
+

Text representation of a suggested PCR program.

+

Using a polymerase with a DNA binding domain such as Pfu-Sso7d.

+
|98°C|98°C               |    |tmf:53.8
+|____|_____          72°C|72°C|tmr:54.8
+|30s |10s  \ 57.0°C _____|____|15s/kb
+|    |      \______/ 0:15|5min|GC 51%
+|    |       10s         |    |1051bp
+
+|98°C|98°C      |    |tmf:82.5
+|____|____      |    |tmr:84.4
+|30s |10s \ 72°C|72°C|15s/kb
+|    |     \____|____|GC 52%
+|    |      3:45|5min|15058bp
+
+
+
+ +
+
+pydna.tm.Q5(primer: str, *args, **kwargs)[source]
+

For Q5, the annealing temperature (Ta) is taken as the lower of the two primer Tms plus 1 °C (up to 72 °C). For Phusion, it is the lower of the two Tms plus 3 °C (up to 72 °C).

+
+ +
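The rule stated in the docstring above can be written as a short sketch. The function name `ta_from_lower_tm` and its `offset` and `cap` parameters are hypothetical and only illustrate the arithmetic. They are not part of `pydna.tm`.

```python
def ta_from_lower_tm(tm_fwd: float, tm_rev: float,
                     offset: float = 1.0, cap: float = 72.0) -> float:
    """Annealing temperature from the lower primer Tm plus an offset, capped.

    offset=1.0 corresponds to the Q5 rule and offset=3.0 to the Phusion
    rule described in the docstring.
    """
    return min(min(tm_fwd, tm_rev) + offset, cap)


print(ta_from_lower_tm(60.0, 65.0))              # Q5 rule
print(ta_from_lower_tm(60.0, 65.0, offset=3.0))  # Phusion rule
print(ta_from_lower_tm(72.0, 75.0))              # capped at 72
```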
+
+pydna.tm.tmbresluc(primer: str, *args, primerc=500.0, saltc=50, **kwargs)[source]
+

Returns the tm for a primer using a formula adapted to polymerases with a DNA binding domain, such as the Phusion polymerase.

+
+
Parameters:
+
    +
  • primer (string) – primer sequence 5’-3’

  • +
  • primerc (float) – primer concentration in nM, set to 500.0 nM by default.

  • +
  • saltc (float, optional) – Monovalent cation concentration in mM, set to 50.0 mM by default.

  • +
  • thermodynamics (bool, optional) – prints details of the thermodynamic data to stdout. For debugging only.

  • +
+
+
Returns:
+

tm – the tm of the primer

+
+
Return type:
+

float

+
+
+
+ +
+
+pydna.tm.tm_neb(primer, conc=0.5, prodcode='q5-0')[source]
+

Calculates the melting temperature of a single primer from NEB.

+
+
Parameters:
+
+
+
Returns:
+

tm – primer melting temperature

+
+
Return type:
+

int

+
+
+
+ +
+
+

pydna.utils module

+

Miscellaneous functions.

+
+
+pydna.utils.three_frame_orfs(dna: str, limit: int = 100, startcodons: tuple = ('ATG',), stopcodons: tuple = ('TAG', 'TAA', 'TGA'))[source]
+

Overlapping orfs in three frames.

+
+ +
+
+pydna.utils.shift_location(original_location, shift, lim)[source]
+

docstring.

+
+ +
+
+pydna.utils.shift_feature(feature, shift, lim)[source]
+

Return a new feature with shifted location.

+
+ +
+
+pydna.utils.smallest_rotation(s)[source]
+

Smallest rotation of a string.

+

Algorithm described in Duval, Jean-Pierre. 1983. Factorizing Words over an Ordered Alphabet. Journal of Algorithms 4 (4) (December 1): 363–381, and in Algorithms on strings and sequences based on Lyndon words, David Eppstein 2011. https://gist.github.com/dvberkel/1950267

+

Examples

+
>>> from pydna.utils import smallest_rotation
+>>> smallest_rotation("taaa")
+'aaat'
+
+
+
+ +
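For comparison, the result can be reproduced with a naive sketch that builds every rotation and takes the lexicographically smallest one. This is deliberately not the linear-time algorithm cited above. It runs in quadratic time and is meant only to make the definition concrete. The name `smallest_rotation_naive` is hypothetical.

```python
def smallest_rotation_naive(s: str) -> str:
    """Return the lexicographically smallest rotation of s (O(n^2) sketch)."""
    if not s:
        return s
    # Rotation i moves the first i characters to the end of the string.
    return min(s[i:] + s[:i] for i in range(len(s)))


print(smallest_rotation_naive("taaa"))  # same input as the doctest above
```

A canonical rotation of this kind is useful for circular sequences, since every rotation of the same circular molecule maps to the same string.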
+
+pydna.utils.cai(seq: str, organism: str = 'sce', weights: dict = _weights)[source]
+

docstring.

+
+ +
+
+pydna.utils.rarecodons(seq: str, organism='sce')[source]
+

docstring.

+
+ +
+
+pydna.utils.express(seq: str, organism='sce')[source]
+

docstring.

+

NOT IMPLEMENTED YET

+
+ +
+
+pydna.utils.open_folder(pth)[source]
+

docstring.

+
+ +
+
+pydna.utils.rc(sequence: StrOrBytes) StrOrBytes[source]
+

Reverse complement.

+

accepts mixed DNA/RNA

+
+ +
+
+pydna.utils.complement(sequence: str)[source]
+

Complement.

+

accepts mixed DNA/RNA

+
+ +
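A minimal sketch of complementing and reverse complementing a string is shown below. The names `complement_sketch` and `rc_sketch` are hypothetical. The sketch handles only `str` input and unambiguous bases, and it assumes that U is complemented to A while A is always complemented to T. How pydna treats mixed DNA/RNA in detail is not stated here.

```python
# Translation table covering DNA bases plus U, preserving case.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def complement_sketch(seq: str) -> str:
    """Return the complement of seq without reversing it."""
    return seq.translate(_COMPLEMENT)


def rc_sketch(seq: str) -> str:
    """Return the reverse complement of seq."""
    return complement_sketch(seq)[::-1]


print(complement_sketch("AUGC"))
print(rc_sketch("ggatcca"))
```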
+
+pydna.utils.memorize(filename)[source]
+

Cache functions and classes.

+

see pydna.download

+
+ +
+
+pydna.utils.identifier_from_string(s: str) str[source]
+

Return a valid python identifier.

+

based on the argument s or an empty string

+
+ +
+
+pydna.utils.flatten(*args) List[source]
+

Flattens an iterable of iterables.

+

Down to str, bytes, bytearray or any of the pydna or Biopython seq objects

+
+ +
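The idea of flattening nested iterables while treating strings and bytes as atoms can be sketched as follows. The name `flatten_sketch` is hypothetical, and the sketch stops only at `str`, `bytes` and `bytearray`. It does not know about pydna or Biopython sequence objects.

```python
from collections.abc import Iterable


def flatten_sketch(*args) -> list:
    """Flatten arbitrarily nested iterables into a single list."""
    result = []
    for item in args:
        is_atom = isinstance(item, (str, bytes, bytearray))
        if is_atom or not isinstance(item, Iterable):
            # Strings and non-iterables are kept as single elements.
            result.append(item)
        else:
            # Recurse into lists, tuples, generators and similar containers.
            result.extend(flatten_sketch(*item))
    return result


print(flatten_sketch([1, [2, ("ab", [b"cd"])]], "ef"))
```

Treating strings as atoms matters because a string is itself iterable, and recursing into it would split a sequence into single characters.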
+
+pydna.utils.seq31(seq)[source]
+

Turn a three letter code protein sequence into one with one letter code.

+

The single input argument ‘seq’ should be a protein sequence using three letter codes, as a python string.

+

This function returns the amino acid sequence as a string using the one letter amino acid codes. Output follows the IUPAC standard (including ambiguous characters B for “Asx”, J for “Xle” and X for “Xaa”, and also U for “Sel” and O for “Pyl”) plus an asterisk for the terminator “Ter”.

+

Any unknown character (including possible gap characters) is changed into ‘Xaa’.

+

Examples

+
>>> from Bio.SeqUtils import seq3
+>>> seq3("MAIVMGRWKGAR*")
+'MetAlaIleValMetGlyArgTrpLysGlyAlaArgTer'
+>>> from pydna.utils import seq31
+>>> seq31('MetAlaIleValMetGlyArgTrpLysGlyAlaArgTer')
+'M  A  I  V  M  G  R  W  K  G  A  R  *'
+
+
+
+ +
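The three letter to one letter conversion can be sketched with a plain lookup table. The name `seq31_sketch` is hypothetical. Unlike the padded output shown in the example above, this sketch returns a compact string, and it maps any unrecognised three letter code to X, which is an assumption.

```python
# Three letter to one letter amino acid codes, including ambiguity codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Asx": "B", "Glx": "Z", "Xle": "J", "Xaa": "X",
    "Sec": "U", "Sel": "U",  # "Sel" as spelled in the docstring above
    "Pyl": "O", "Ter": "*",
}


def seq31_sketch(seq: str) -> str:
    """Convert a three letter code protein string to one letter codes."""
    codes = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # capitalize() normalises e.g. "MET" or "met" to "Met" before lookup.
    return "".join(THREE_TO_ONE.get(code.capitalize(), "X") for code in codes)


print(seq31_sketch("MetAlaIleValMetGlyArgTrpLysGlyAlaArgTer"))
```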
+
+pydna.utils.randomRNA(length, maxlength=None)[source]
+

docstring.

+
+ +
+
+pydna.utils.randomDNA(length, maxlength=None)[source]
+

docstring.

+
+ +
+
+pydna.utils.randomORF(length, maxlength=None)[source]
+

docstring.

+
+ +
+
+pydna.utils.randomprot(length, maxlength=None)[source]
+

docstring.

+
+ +
+
+pydna.utils.eq(*args, **kwargs)[source]
+

Compare two or more DNA sequences for equality.

+

Compares two or more DNA sequences for equality, i.e. whether they represent the same double stranded DNA molecule.

+
+
Parameters:
+
    +
  • args (iterable) – iterable containing sequences; args can be strings, Biopython Seq or SeqRecord, Dseqrecord or dsDNA objects.

  • +
  • circular (bool, optional) – Consider all molecules circular or linear

  • +
  • linear (bool, optional) – Consider all molecules circular or linear

  • +
+
+
Returns:
+

eq – Returns True or False

+
+
Return type:
+

bool

+
+
+

Notes

+

Compares two or more DNA sequences for equality, i.e. whether they represent the same DNA molecule.

+

Two linear sequences are considered equal if either:

+
    +
  1. They have the same sequence (case insensitive)

  2. +
  3. One sequence is the reverse complement of the other

  4. +
+

Two circular sequences are considered equal if they are circular permutations, meaning that they have the same length and either:

+
    +
  1. One sequence can be found in the concatenation of the other sequence with itself.

  2. +
  3. The reverse complement of one sequence can be found in the concatenation of the other sequence with itself.

  4. +
+

The topology for the comparison can be set by passing one of the keywords linear or circular as True or False.

+

If circular or linear is not set, it will be deduced from the topology of each sequence for sequences that have a linear or circular attribute (like Dseq and Dseqrecord).

+

Examples

+
>>> from pydna.dseqrecord import Dseqrecord
+>>> from pydna.utils import eq
+>>> eq("aaa","AAA")
+True
+>>> eq("aaa","AAA","TTT")
+True
+>>> eq("aaa","AAA","TTT","tTt")
+True
+>>> eq("aaa","AAA","TTT","tTt", linear=True)
+True
+>>> eq("Taaa","aTaa", linear = True)
+False
+>>> eq("Taaa","aTaa", circular = True)
+True
+>>> a=Dseqrecord("Taaa")
+>>> b=Dseqrecord("aTaa")
+>>> eq(a,b)
+False
+>>> eq(a,b,circular=True)
+True
+>>> a=a.looped()
+>>> b=b.looped()
+>>> eq(a,b)
+True
+>>> eq(a,b,circular=False)
+False
+>>> eq(a,b,linear=True)
+False
+>>> eq(a,b,linear=False)
+True
+>>> eq("ggatcc","GGATCC")
+True
+>>> eq("ggatcca","GGATCCa")
+True
+>>> eq("ggatcca","tGGATCC")
+True
+
+
+
+ +
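The comparison rules listed in the Notes can be sketched for plain strings as follows. The names `same_molecule` and `eq_sketch` are hypothetical, and the sketch does not deduce topology from sequence objects: the caller chooses linear or circular comparison through a single `circular` flag.

```python
_COMP = str.maketrans("ACGT", "TGCA")


def _rc(s: str) -> str:
    """Reverse complement of an upper case DNA string."""
    return s.translate(_COMP)[::-1]


def same_molecule(a: str, b: str, circular: bool = False) -> bool:
    """Compare two sequences as double stranded molecules."""
    a, b = a.upper(), b.upper()
    if len(a) != len(b):
        return False
    if circular:
        # Any rotation of b is a substring of b concatenated with itself.
        doubled = b + b
        return a in doubled or _rc(a) in doubled
    return a == b or a == _rc(b)


def eq_sketch(*seqs: str, circular: bool = False) -> bool:
    """True if every sequence equals the first one under the chosen topology."""
    first = seqs[0]
    return all(same_molecule(first, s, circular) for s in seqs[1:])


print(eq_sketch("aaa", "AAA", "TTT", "tTt"))
print(eq_sketch("Taaa", "aTaa"))
print(eq_sketch("Taaa", "aTaa", circular=True))
print(eq_sketch("ggatcca", "tGGATCC"))
```

Comparing every sequence to the first is sufficient here because both relations (same or reverse complement, and rotation of either strand) are symmetric and transitive.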
+
+pydna.utils.cuts_overlap(left_cut, right_cut, seq_len)[source]
+
+ +
+
+pydna.utils.location_boundaries(loc: SimpleLocation | CompoundLocation)[source]
+
+ +
+
+pydna.utils.locations_overlap(loc1: SimpleLocation | CompoundLocation, loc2: SimpleLocation | CompoundLocation, seq_len)[source]
+
+ +
+
+
+

Indices and tables

+