diff --git a/docs/_static/assembly_fragment_slide_circular.png b/docs/_static/assembly_fragment_slide_circular.png new file mode 100644 index 00000000..fca398b6 Binary files /dev/null and b/docs/_static/assembly_fragment_slide_circular.png differ diff --git a/docs/_static/assembly_fragment_slide_linear.png b/docs/_static/assembly_fragment_slide_linear.png new file mode 100644 index 00000000..a0091a08 Binary files /dev/null and b/docs/_static/assembly_fragment_slide_linear.png differ diff --git a/docs/notebooks/primer_design.ipynb b/docs/notebooks/primer_design.ipynb new file mode 100644 index 00000000..859bb2af --- /dev/null +++ b/docs/notebooks/primer_design.ipynb @@ -0,0 +1,454 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Primer design in pydna\n", + "\n", + "You can use `pydna` for primer design in different contexts, let's start with some basic primer functionalities." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Checking the Tm of a primer\n", + "\n", + "Primer design in pydna is very flexible, and supports different methods. For typical use-cases, we recommend using `tm_default`, which uses the method `Bio.SeqUtils.MeltingTemp` from biopython, nearest neighbor thermodynamics values from [SantaLucia & Hicks (2004)](https://pubmed.ncbi.nlm.nih.gov/15139820/) and common values for nucleotide concentration, salt concentration, etc. You can of course change those settings. For a full dive check `src/pydna/tm.py`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "55.047602641480864\n", + "54.55481807340169\n" + ] + } + ], + "source": [ + "from pydna.tm import tm_default\n", + "\n", + "# The primers from the readme example\n", + "print(tm_default(\"ATGCAAACAGTAATGATGGA\"))\n", + "print(tm_default(\"ATTATCTTTTTCAGCAATAGAATCA\"))\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Using NEB Tm calculator\n", + "\n", + "If you are used to the NEB Tm calculator, and you want to use it programmatically, you can do so as well. The function takes three arguments:\n", + "\n", + "- `primer`: The primer sequence.\n", + "- `conc`: The primer concentration.\n", + "- `prodcode`: The product code, which you can find on [NEB's website](https://tmapi.neb.com/docs/productcodes).\n", + "\n", + "> **NOTE:** When you call the function, it will make a request to the NEB server. This makes it much slower than using the builtin methods. In addition, we cannot guarantee that the NEB server will always be available, nor that the calculations that they use will not change in the future, since the code is not available." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "59\n", + "57\n" + ] + } + ], + "source": [ + "from pydna.tm import tm_neb\n", + "\n", + "print(tm_neb(\"ATGCAAACAGTAATGATGGA\", 0.5, \"q5-0\"))\n", + "print(tm_neb(\"ATTATCTTTTTCAGCAATAGAATCA\", 0.5, \"q5-0\"))\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Designing primers for PCR\n", + "\n", + "Let's use pydna to amplify a region from a DNA sequence. You can use the `primer_design` function to design primers for a given target Tm (`target_tm`) and indicate a minimum primer hybridization length in basepairs (`limit`)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Forward primer Tm: 59.71997924024873\n", + "Forward primer sequence: ATGCAAACAGTAATGATGGATGAC\n", + "\n", + "Reverse primer Tm: 60.22377911083646\n", + "Reverse primer sequence: TTATTCAGCAATAGAATCAGTGCTTTG\n" + ] + } + ], + "source": [ + "from pydna.dseqrecord import Dseqrecord\n", + "from Bio.SeqFeature import SeqFeature, SimpleLocation\n", + "from pydna.design import primer_design\n", + "\n", + "dna = Dseqrecord(\"ggttcaATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAATAAcatttacatca\")\n", + "\n", + "# Let's add a feature representing the CDS\n", + "dna.features.append(SeqFeature(SimpleLocation(start=6, end=60), type=\"CDS\"))\n", + "\n", + "# To design the primer, we extract the template sequence we want to amplify, and use the `primer_design` method.\n", + "template = dna.features[0].location.extract(dna)\n", + "\n", + "# We get an amplicon object (a subclass of Dseqrecord), that also contains extra info\n", + "# of where the primers align etc.\n", + "amplicon = primer_design(template, target_tm=60.0, limit=15)\n", + "\n", + "# We extract the primers\n", + "fwd_primer, rvs_primer = amplicon.primers()\n", + "\n", + "# We print the Tms\n", + "print(\"Forward primer Tm:\", tm_default(fwd_primer.seq))\n", + "print(\"Forward primer sequence:\", fwd_primer.seq)\n", + "print()\n", + "print(\"Reverse primer Tm:\", tm_default(rvs_primer.seq))\n", + "print(\"Reverse primer sequence:\", rvs_primer.seq)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Special primers\n", + "\n", + "We saw an example where we simply want to amplify a region of DNA. But what if we want to design primers for a specific restriction enzyme, or for Gibson Assembly? That's also easy." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Restriction enzyme\n", + "\n", + "Simply append the sequence you want at the 5' end of the primers." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "ttGAATTCATGCAAACAGTAATGATGGATGAC\n", + "ttGAATTCTTATTCAGCAATAGAATCAGTGCTTTG\n" + ] + } + ], + "source": [ + "from Bio.Restriction import EcoRI\n", + "fwd_primer_EcoRI = 'ttGAATTC' + fwd_primer\n", + "# You can also do it like this!\n", + "rvs_primer_EcoRI = 'tt' + EcoRI.site + rvs_primer\n", + "\n", + "print(fwd_primer_EcoRI.seq)\n", + "print(rvs_primer_EcoRI.seq)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> **Edge case:** Some recognition sites contain ambiguous bases, for instance `Bst4CI` cuts at site `ACNGT`, where `N` can be any nucleotide. In that case, you can use the dictionary provided by biopython (`from Bio.Data.IUPACData import ambiguous_dna_values`) to produce a concrete DNA sequence you can use in real life." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Gibson Assembly\n", + "\n", + "To design primers for Gibson Assembly, you can use the `assembly_fragments` function." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "##### Linear Gibson Assembly" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Primers for fragment 1:\n", + "ATGCAAACAGTAATGATGGATGAC\n", + "GAGTGATTATCTTTTTCAGCAATAGAATCAGTGC\n", + "\n", + "Primers for fragment 2:\n", + "ATAATCACTCTAATAATGAATCTAACTTTACTTGGAAA\n", + "ATGCTTTTCCACTTGTTCACG\n", + "\n", + "PCR product 1:\n", + "ATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAATCACTC\n", + "\n", + "PCR product 2:\n", + "ATAATCACTCTAATAATGAATCTAACTTTACTTGGAAAGCGTTTCGTGAACAAGTGGAAAAGCAT\n", + "\n", + "Overlap\n", + "ATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAATCACTC\n", + " ATAATCACTCTAATAATGAATCTAACTTTACTTGGAAAGCGTTTCGTGAACAAGTGGAAAAGCAT\n" + ] + } + ], + "source": [ + "from pydna.design import assembly_fragments\n", + "# Let's imagine we want to join these two sequences together linearly with Gibson Assembly\n", + "seq1 = Dseqrecord('ATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAAT')\n", + "seq2 = Dseqrecord('CACTCTAATAATGAATCTAACTTTACTTGGAAAGCGTTTCGTGAACAAGTGGAAAAGCAT')\n", + "\n", + "# First, we design primers for each fragment, as before:\n", + "pre_amplicon1 = primer_design(seq1, target_tm=60.0, limit=15)\n", + "pre_amplicon2 = primer_design(seq2, target_tm=60.0, limit=15)\n", + "\n", + "# Then, we use the `assembly_fragments` function to design primers for Gibson Assembly\n", + "amplicon1, amplicon2 = assembly_fragments([pre_amplicon1, pre_amplicon2], overlap=10)\n", + "\n", + "# We print the primers:\n", + "fwd_1, rvs_1 = amplicon1.primers()\n", + "fwd_2, rvs_2 = amplicon2.primers()\n", + "\n", + "print('Primers for fragment 1:')\n", + "print(fwd_1.seq)\n", + "print(rvs_1.seq)\n", + "print()\n", + "print('Primers for fragment 2:')\n", + "print(fwd_2.seq)\n", + "print(rvs_2.seq)\n", + "print()\n", + "\n", + "# The amplicons contain the PCR products (note the overlap between the two fragments)\n", + "print('PCR product 1:')\n", + "print(amplicon1.seq)\n", + "print()\n", + "print('PCR product 2:')\n", + "print(amplicon2.seq)\n", + "print()\n", + "\n", + "print('Overlap')\n", + "print(amplicon1.seq)\n", + "print(' '*55,amplicon2.seq, sep='')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Once you have the amplicons, you can use `Assembly` to join them together (see the `Gibson` notebook for more details)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "65bp_PCR_prod|10\n", + " \\/\n", + " /\\\n", + " 10|65bp_PCR_prod\n", + "\n", + "Dseqrecord(-120)\n", + "\u001b[48;5;11mATGCAAACAGTAATGATGGATGAC\u001b[0mATTCAAAGCACTGATTCTATTGCTGAAAAAGATAATCACTCTAATAATGAATCTAACTTTACTTGGAAAGCGTTTCGTGAACAAGTGGAAAAGCAT\n", + "TACGTTTGTCATTACTACCTACTGTAAGTTTCGTGACTAAGATAACGACTTTTTCTATTAGTGAGATTATTACTTAGATTGAAATGAACCTTTCGCAAAGCACTTGTTCACCTTTTCGTA\n" + ] + } + ], + "source": [ + "from pydna.assembly import Assembly\n", + "from pydna.common_sub_strings import terminal_overlap\n", + "\n", + "\n", + "assembly = Assembly([amplicon1, amplicon2], limit=10, algorithm=terminal_overlap)\n", + "product = assembly.assemble_linear()[0]\n", + "print(product.figure())\n", + "\n", + "print()\n", + "\n", + "print(Dseqrecord(product).figure())\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "##### Circular Gibson Assembly" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Primers for fragment 1:\n", + "AGCATATGCAAACAGTAATGATGGATGAC\n", + "GAGTGATTATCTTTTTCAGCAATAGAATCAGTGC\n", + "\n", + "Primers for fragment 2:\n", + "ATAATCACTCTAATAATGAATCTAACTTTACTTGGAAA\n", + "TGCATATGCTTTTCCACTTGTTCACG\n", + "\n", + " -|70bp_PCR_prod|10\n", + "| \\/\n", + "| /\\\n", + "| 10|70bp_PCR_prod|10\n", + "| \\/\n", + "| /\\\n", + "| 10-\n", + "| |\n", + " ------------------------------------\n", + "\n", + "Dseqrecord(o120)\n", + "AGCAT\u001b[48;5;11mATGCAAACAGTAATGATGGATGAC\u001b[0mATTCAAAGCACTGATTCTATTGCTGAAAAAGATAATCACTCTAATAATGAATCTAACTTTACTTGGAAAGCGTTTCGTGAACAAGTGGAAA\n", + "TCGTATACGTTTGTCATTACTACCTACTGTAAGTTTCGTGACTAAGATAACGACTTTTTCTATTAGTGAGATTATTACTTAGATTGAAATGAACCTTTCGCAAAGCACTTGTTCACCTTT\n" + ] + } + ], + "source": [ + "# We use the `assembly_fragments` function with `circular=True`\n", + "amplicon1, amplicon2 = assembly_fragments([pre_amplicon1, pre_amplicon2], overlap=10, circular=True)\n", + "\n", + "# We print the primers:\n", + "fwd_1, rvs_1 = amplicon1.primers()\n", + "fwd_2, rvs_2 = amplicon2.primers()\n", + "\n", + "print('Primers for fragment 1:')\n", + "print(fwd_1.seq)\n", + "print(rvs_1.seq)\n", + "print()\n", + "print('Primers for fragment 2:')\n", + "print(fwd_2.seq)\n", + "print(rvs_2.seq)\n", + "print()\n", + "\n", + "assembly = Assembly([amplicon1, amplicon2], limit=10, algorithm=terminal_overlap)\n", + "\n", + "# Here we use assemble_circular!\n", + "product = assembly.assemble_circular()[0]\n", + "print(product.figure())\n", + "\n", + "print()\n", + "\n", + "print(Dseqrecord(product).figure())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Adding spacers / linkers to Gibson Assembly primers\n", + "\n", + "In this case, you can pass a list of `Amplicons` and `Dseqrecords` to the `assembly_fragments` function. The `amplicons` will be used as the fragments to assemble, and the `dseqrecords` will be used as spacers between the fragments, as long as they are shorter than the argument `maxlink`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "GCATtttATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAATaaaCACTCTAATAATGAATCTAACTTTACTTGGAAAGCGTTTCGTGAACAAGTGGAAAA\n", + " ^^^ ^^^\n" + ] + } + ], + "source": [ + "# We create two spacers as dseqrecords\n", + "spacer1 = Dseqrecord('aaa')\n", + "spacer2 = Dseqrecord('ttt')\n", + "\n", + "amplicon1, amplicon2 = assembly_fragments([pre_amplicon1, spacer1, pre_amplicon2, spacer2], overlap=10, circular=True)\n", + "\n", + "assembly = Assembly([amplicon1, amplicon2], limit=10, algorithm=terminal_overlap)\n", + "\n", + "# Here we use assemble_circular!\n", + "product = assembly.assemble_circular()[0]\n", + "\n", + "# See the linkers that have been added\n", + "print()\n", + "print(Dseqrecord(product).seq)\n", + "print(4*' ', '^^^', 60*' ', '^^^', sep='')\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Summary of assembly_fragments behaviour\n", + "\n", + "The behaviour is summarised in the following graphics for linear and circular assembly.\n", + "\n", + "![assembly_fragments behaviour](../_static/assembly_fragment_slide_linear.png)\n", + "![assembly_fragments behaviour](../_static/assembly_fragment_slide_circular.png)\n", + "\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.6" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/src/pydna/design.py b/src/pydna/design.py index be7a5904..ef45127f 100755 --- a/src/pydna/design.py +++ b/src/pydna/design.py @@ -236,7 +236,7 @@ def design(target_tm, template): return prod -def assembly_fragments(f, overlap=35, maxlink=40): +def assembly_fragments(f, overlap=35, maxlink=40, circular=False): """This function return a list of :mod:`pydna.amplicon.Amplicon` objects where primers have been modified with tails so that the fragments can be fused in the order they appear in the list by for example Gibson assembly or homologous @@ -563,6 +563,9 @@ def assembly_fragments(f, overlap=35, maxlink=40): maxlink : int, optional Maximum length of spacer sequences that may be present in f. These will be included in tails for designed primers. + circular : bool, optional + If True, the assembly is circular. If False, the assembly is linear. + Returns ------- seqs : list of :mod:`pydna.amplicon.Amplicon` and other Dseqrecord like objects :mod:`pydna.amplicon.Amplicon` objects @@ -620,6 +623,15 @@ def assembly_fragments(f, overlap=35, maxlink=40): >>> """ + + # Recursive call for circular assemblies + if circular: + fragments = assembly_fragments(f + f[0:1], overlap=overlap, maxlink=maxlink, circular=False) + + if hasattr(fragments[0], "template"): + fragments[0] = _pcr((fragments[-1].forward_primer, fragments[0].reverse_primer), fragments[0].template) + return fragments[:-1] + # sanity check for arguments nf = [item for item in f if len(item) > maxlink] if not all(hasattr(i[0], "template") or hasattr(i[1], "template") for i in zip(nf, nf[1:])): @@ -742,11 +754,19 @@ def assembly_fragments(f, overlap=35, maxlink=40): def circular_assembly_fragments(f, overlap=35, maxlink=40): - fragments = assembly_fragments(f + f[0:1], overlap=overlap, maxlink=maxlink) + """ + Equivalent to `assembly_fragments` with `circular=True`. - if hasattr(fragments[0], "template"): - fragments[0] = _pcr((fragments[-1].forward_primer, fragments[0].reverse_primer), fragments[0].template) - return fragments[:-1] + Deprecated, kept for backward compatibility. Use `assembly_fragments` with `circular=True` instead. + """ + import warnings + + warnings.warn( + "The circular_assembly_fragments function is deprecated. Use assembly_fragments with circular=True instead.", + DeprecationWarning, + stacklevel=2, + ) + return assembly_fragments(f, overlap=overlap, maxlink=maxlink, circular=True) if __name__ == "__main__": diff --git a/tests/test_module_design.py b/tests/test_module_design.py index 8c2fefc6..9c993fce 100755 --- a/tests/test_module_design.py +++ b/tests/test_module_design.py @@ -220,24 +220,24 @@ def test_primer_Design_with_linker(): """test_primer_design""" b = Dseqrecord("agctactgactattaggggttattctgatcatctgatctactatctgactgtactgatcta") - l = Dseqrecord("AAATTTCCCGGG") + linker = Dseqrecord("AAATTTCCCGGG") c = Dseqrecord("tctgatctactatctgactgtactgatctattgacactgtgatcattctagtgtattactc") - frags = assembly_fragments((primer_design(b), l, primer_design(c))) + frags = assembly_fragments((primer_design(b), linker, primer_design(c))) asm1 = Assembly(frags) - assert asm1.assemble_linear()[0].seguid(), (b + l + c).seguid() == "l95igKB8iKAKrvvqE9CYksyNx40" + assert asm1.assemble_linear()[0].seguid(), (b + linker + c).seguid() == "l95igKB8iKAKrvvqE9CYksyNx40" - frags = assembly_fragments((primer_design(b), l, primer_design(c), primer_design(b))) + frags = assembly_fragments((primer_design(b), linker, primer_design(c), primer_design(b))) b2 = pcr(frags[-1].forward_primer, frags[0].reverse_primer, b) asm2 = Assembly((b2, frags[1], frags[2])) - assert (b + l + c).looped().seguid() == asm2.assemble_circular()[0].seguid() + assert (b + linker + c).looped().seguid() == asm2.assemble_circular()[0].seguid() - assert (b + l + c).looped().seguid() == "cdseguid=LqQ1_uMp2AmEZ_L2I1_njIMkVDc" + assert (b + linker + c).looped().seguid() == "cdseguid=LqQ1_uMp2AmEZ_L2I1_njIMkVDc" def test_primer_Design_given_fw_primer(): @@ -275,7 +275,7 @@ def test_primer_Design_multiple_products(): from pydna import _PydnaWarning with pytest.warns(_PydnaWarning): - a = primer_design(b) + primer_design(b) def test_circular_assembly_fragments(): @@ -346,5 +346,27 @@ def tm_alt_upper(*args): assert amp_upper_with_estimate == amp_upper_no_estimate +def test_primer_design_correct_value(): + from pydna.tm import tm_default + + for original_target_tm in range(60, 65): + for frag in frags: + amp = primer_design(frag, target_tm=original_target_tm, limit=15) + fwd, rvs = amp.primers() + possible_fwd = [Primer(frag[0:i].seq) for i in range(15, 40)] + possible_rvs = [Primer(frag[-i:].reverse_complement().seq) for i in range(15, 40)] + + # Finds the closest forward primer + fwd_diff = [abs(tm_default(f.seq) - original_target_tm) for f in possible_fwd] + correct_fwd = possible_fwd[fwd_diff.index(min(fwd_diff))] + assert str(fwd.seq) == str(correct_fwd.seq) + + # Uses that primers Tm as the target Tm for the reverse primer + new_target_tm = tm_default(correct_fwd.seq) + rvs_diff = [abs(tm_default(f.seq) - new_target_tm) for f in possible_rvs] + correct_rvs = possible_rvs[rvs_diff.index(min(rvs_diff))] + assert str(rvs.seq) == str(correct_rvs.seq) + + if __name__ == "__main__": pytest.cmdline.main([__file__, "-v", "-s"])