paper: fixing pandas citation

ebi-jdispatcher · Sep 26, 2024 · fac5c40
1 parent 9497e64
commit fac5c40

Showing 2 changed files with 3 additions and 3 deletions.
diff --git a/paper/paper.bib b/paper/paper.bib
@@ -66,14 +66,14 @@ @incollection{kans_entrez_2024
 	year = {2024},
 }
 
-@misc{team_pandas-devpandas_2024,
+@misc{pandas_2024,
 	title = {pandas-dev/pandas: {Pandas}},
 	shorttitle = {pandas-dev/pandas},
 	url = {https://zenodo.org/records/13819579},
 	abstract = {Pandas is a powerful data structures for data analysis, time series, and statistics.},
 	urldate = {2024-09-25},
 	publisher = {Zenodo},
-	author = {team, The pandas development},
+	author = {The pandas development team},
 	month = sep,
 	year = {2024},
 	doi = {10.5281/zenodo.13819579},

diff --git a/paper/paper.md b/paper/paper.md
@@ -53,7 +53,7 @@ Taxonomy Resolver has been developed with simplicity in mind and it can be used
 * **filtering** a tree based on the inclusion and/or exclusion of certain TaxIDs
 * **writing and loading** tree data structures using Python’s object serialisation
 
-A taxonomy tree is a hierarchical structure that can be seen as a collection of deeply nested containers - nodes connected by edges, following the hierarchy, from the parent node - the root, all the way down to the children nodes - the leaves. An object-oriented programming (OOP) tree implementation based on recursion does not typically scale well for large trees, such as the NCBI Taxonomy, which is composed of >2.6 million nodes. To improve performance, Taxonomy Resolver represents the tree structure following the Nested Set Model, which is a technique developed to represent hierarchical data in relational databases lacking recursion capabilities. This allows for efficient and inexpensive querying of parent-child relationships. The full tree is traversed following the Modified Preorder Tree Traversal (MPTT) strategy [@celko_chapter_2004], in which each node in the tree is visited twice. In a preorder traversal, the root node is visited first, then recursively a preorder traversal of the left sub-tree, followed by a recursive preorder traversal of the right subtree, in order, until every node has been visited. The modified strategy allows capturing the 'left' and 'right' ($lft$ and $rgt$, respectively) boundaries of each subtree, which are stored as two additional attributes. Finding a subtree is as simple as searching for the nodes of interest where $lft > node lft$ and $rgt < node rgt$. Likewise, finding the full path to a node is as simple as searching for the nodes where $lft < node lft$ and $rgt > node rgt$. Traversal attributes, depth and node indexes are captured for each tree node and are stored as a pandas DataFrame [@team_pandas-devpandas_2024].
+A taxonomy tree is a hierarchical structure that can be seen as a collection of deeply nested containers - nodes connected by edges, following the hierarchy, from the parent node - the root, all the way down to the children nodes - the leaves. An object-oriented programming (OOP) tree implementation based on recursion does not typically scale well for large trees, such as the NCBI Taxonomy, which is composed of >2.6 million nodes. To improve performance, Taxonomy Resolver represents the tree structure following the Nested Set Model, which is a technique developed to represent hierarchical data in relational databases lacking recursion capabilities. This allows for efficient and inexpensive querying of parent-child relationships. The full tree is traversed following the Modified Preorder Tree Traversal (MPTT) strategy [@celko_chapter_2004], in which each node in the tree is visited twice. In a preorder traversal, the root node is visited first, then recursively a preorder traversal of the left sub-tree, followed by a recursive preorder traversal of the right subtree, in order, until every node has been visited. The modified strategy allows capturing the 'left' and 'right' ($lft$ and $rgt$, respectively) boundaries of each subtree, which are stored as two additional attributes. Finding a subtree is as simple as searching for the nodes of interest where $lft > node lft$ and $rgt < node rgt$. Likewise, finding the full path to a node is as simple as searching for the nodes where $lft < node lft$ and $rgt > node rgt$. Traversal attributes, depth and node indexes are captured for each tree node and are stored as a pandas DataFrame [@pandas_2024].
 
 In conclusion, Taxonomy Resolver has been developed to take advantage of the Nested Set Model tree structure, so it can perform fast validation and create lists of taxa that compose a particular subtree. Inclusion and exclusion lists can also be seamlessly used to produce subset trees with wide applications, particularly for sequence similarity search.
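The Nested Set Model paragraph above can be illustrated with a minimal sketch. This is not the Taxonomy Resolver implementation. It assumes the tree arrives as a list of `(parent, child)` edges with a known root. It assigns `lft`, `rgt` and `depth` with an iterative MPTT traversal (iterative so that deep trees do not hit Python's recursion limit), then answers the subtree and path queries using the inequalities given in the paper. The helper names (`nested_set`, `subtree`, `path`, `filter_ids`) are hypothetical. The inclusion/exclusion semantics of `filter_ids` (keep included subtrees, then drop excluded subtrees) are an assumed interpretation for illustration. Rows are plain dicts so the example needs no third-party packages. The paper itself stores these attributes in a pandas DataFrame.

```python
from collections import defaultdict


def nested_set(edges, root):
    """Assign MPTT lft/rgt/depth to every node reachable from root (hypothetical helper)."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    rows = {}  # insertion order == preorder, i.e. rows end up sorted by lft
    counter = 1
    # Each stack entry is (node, depth, leaving); leaving=True marks the second visit.
    stack = [(root, 0, False)]
    while stack:
        node, depth, leaving = stack.pop()
        if leaving:
            rows[node]["rgt"] = counter  # second visit: close the node's interval
            counter += 1
            continue
        rows[node] = {"id": node, "lft": counter, "depth": depth}  # first visit
        counter += 1
        stack.append((node, depth, True))
        # Push children in reverse so the first child is visited first.
        for child in reversed(children[node]):
            stack.append((child, depth + 1, False))
    return list(rows.values())


def _row(rows, node_id):
    return next(r for r in rows if r["id"] == node_id)


def subtree(rows, node_id):
    """Descendants of node_id: lft > node lft and rgt < node rgt."""
    n = _row(rows, node_id)
    return [r["id"] for r in rows if r["lft"] > n["lft"] and r["rgt"] < n["rgt"]]


def path(rows, node_id):
    """Ancestors of node_id, root first: lft < node lft and rgt > node rgt."""
    n = _row(rows, node_id)
    return [r["id"] for r in rows if r["lft"] < n["lft"] and r["rgt"] > n["rgt"]]


def filter_ids(rows, include, exclude=()):
    """Assumed semantics: union of included subtrees minus excluded subtrees."""
    keep = set()
    for t in include:
        keep |= {t, *subtree(rows, t)}
    for t in exclude:
        keep -= {t, *subtree(rows, t)}
    return keep


# Toy tree: 1 -> (2, 3), 2 -> 4
rows = nested_set([(1, 2), (1, 3), (2, 4)], root=1)
print(subtree(rows, 2))  # [4]
print(path(rows, 4))     # [1, 2]
```

With the rows loaded into a pandas DataFrame (for example `df = pd.DataFrame(rows)`), the same subtree query becomes a vectorised boolean mask, `df[(df["lft"] > lft) & (df["rgt"] < rgt)]`, which avoids walking the tree at query time.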