Merge pull request #303 from Steinbeck-Lab/development
Development
CS76 authored Dec 4, 2024
2 parents 69d416f + 0f4145c commit ab28195
Showing 99 changed files with 14,988 additions and 2,357 deletions.
30 changes: 14 additions & 16 deletions README.md
@@ -15,40 +15,38 @@
[![RDKit badge](https://img.shields.io/badge/Powered%20by-RDKit-3838ff.svg?logo=)](https://www.rdkit.org/)
![Workflow](https://GitHub.com/Steinbeck-Lab/coconut/actions/workflows/dev-build.yml/badge.svg)
[![Powered by Laravel](https://img.shields.io/badge/Powered%20by-Laravel-red.svg?style=flat&logo=Laravel)](https://laravel.com)
[![DOI](https://zenodo.org/badge/778260166.svg)](https://zenodo.org/doi/10.5281/zenodo.13283948)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.13897048.svg)](https://doi.org/10.5281/zenodo.13382750)

</div>

## ![About](https://www.google.com/s2/favicons?domain=naturalproducts.net) About COCONUT
## ![About](https://www.google.com/s2/favicons?domain=coconut.naturalproducts.net) About

COCONUT is an open-access database dedicated to collecting and disseminating natural products. It aims to provide researchers, scientists, and enthusiasts with comprehensive and easily accessible data on a wide variety of natural compounds. The database includes detailed information on the chemical structures, literature references and sources of these compounds, facilitating research and discovery in natural products.
A comprehensive platform facilitating natural product research by providing data, tools, and services for deposition, curation, and reuse. It aims to provide researchers, scientists, and enthusiasts with comprehensive and easily accessible data on a wide variety of natural compounds. The database includes detailed information on the chemical structures, literature references and sources of these compounds, facilitating research and discovery in natural products.

## ![Features](https://www.google.com/s2/favicons?domain=github.com) Features

- **Extensive Database**: Contains information on thousands of natural products from diverse sources (63).
- **Chemical Structures**: Provides detailed chemical structures for each natural product, aiding research and identification.
- **Search and Filter**: Advanced search and filtering options to find compounds based on specific criteria easily.
- **Online Submission and Curation**: Allows users to contribute new data, ensuring the database remains current and comprehensive.
- **API Access**: Provides API access for seamless integration with other tools and databases.
[https://coconut.naturalproducts.net/](https://coconut.naturalproducts.net/)

## ![Calculations](https://www.google.com/s2/favicons?domain=python.org) Descriptor calculations
## ![Features](https://www.google.com/s2/favicons?domain=github.com) Features

COCONUT data curation and descriptors calculation are performed using our [microservices](https://github.com/Steinbeck-Lab/cheminformatics-python-microservice). More details can be found in our [API documentation](https://api.naturalproducts.net/docs).
- **Standardised data aggregation**: COCONUT 2.0 aggregates data from [more than 63 sources](https://coconut.naturalproducts.net/collections?q=) using the ChEMBL curation pipeline with RDKit post-processing, standardising molecular structures and metadata while preserving the stereochemistry reported by the sources.
- **Descriptor calculations**: COCONUT data curation and descriptor calculations are performed using the [Cheminformatics Microservice](https://docs.api.naturalproducts.net/).
- **Comprehensive download options**: The database offers downloads in CSV, SDF, and SQL dump formats, with specialized CSV files for mass spectrometry, molecular descriptors, and substructure analyses accessible [here](https://coconut.naturalproducts.net/download).
- **Search and Filtering**: Users can search by chemical structure (exact, substructure, similarity), text-based queries for names/SMILES, and filter by organism, chemical class, or literature references.
- **Online Submission and Curation**: Features community-driven data submission and curation through a web interface, allowing users to submit new structures, report issues, and request changes with full audit trail tracking.
- **API Access**: Provides a REST API compliant with OpenAPI specifications for programmatic access to chemical structures, metadata, and audit information with real-time data updates - [API documentation](https://coconut.naturalproducts.net/api-documentation).

## ![License](https://www.google.com/s2/favicons?domain=opensource.org) License

COCONUT infrastructure code is licensed under the MIT license - see the [LICENSE](https://GitHub.com/Steinbeck-Lab/coconut/blob/documentation/LICENSE). Every source on COCONUT comes with its own specific license. It is essential to review the license details for each dataset before using it.

## ![Citations](https://www.google.com/s2/favicons?domain=doi.org) Citations

### COCONUT 2.0
- Venkata Chandrasekhar, Kohulan Rajan, Sri Ram Sagar Kanakam, Nisha Sharma, Viktor Weißenborn, Jonas Schaub, Christoph Steinbeck, COCONUT 2.0: a comprehensive overhaul and curation of the collection of open natural products database, Nucleic Acids Research, 2024;, gkae1063, https://doi.org/10.1093/nar/gkae1063

### COCONUT (Legacy)
- Sorokina, M., Merseburger, P., Rajan, K. et al. (2021). COCONUT online: COlleCtion of Open Natural prodUcTs database. *Journal of Cheminformatics*, 13, 2.
https://doi.org/10.1186/s13321-020-00478-9

### COCONUT 2.0
- Nainala, V.C., Rajan, K., Kanakam, S.R.S., Sharma, N., Weißenborn, V., Schaub, J., et al. (2024). COCONUT 2.0: A comprehensive overhaul and curation of the collection of open natural products database. *ChemRxiv*.
https://doi.org/10.26434/chemrxiv-2024-fxq2s

## ![Maintained](https://www.google.com/s2/favicons?domain=uni-jena.de) Maintained by

The COCONUT database and its infrastructure are developed and maintained by the [Steinbeck group](https://cheminf.uni-jena.de) at the [Friedrich Schiller University](https://www.uni-jena.de/en/) Jena, Germany.
50 changes: 50 additions & 0 deletions app/Actions/Coconut/AssignDOI.php
@@ -0,0 +1,50 @@
<?php

namespace App\Actions\Coconut;

use App\Models\Collection;
use App\Models\Ticker;
use App\Services\DOI\DOIService;

class AssignDOI
{
private $doiService;

/**
* Create a new class instance.
*
* @return void
*/
public function __construct(DOIService $doiService)
{
$this->doiService = $doiService;
}

/**
* Assign a sequential identifier (if missing) and a DOI to the given collection.
*
* @param mixed $model
* @return void
*/
public function assign($model)
{
$collection = null;
if ($model instanceof Collection) {
$collection = $model;
}
if ($collection) {
$collectionIdentifier = $collection->identifier ? $collection->identifier : null;
if ($collectionIdentifier == null) {
$collectionTicker = Ticker::whereType('collection')->first();
$collectionIdentifier = $collectionTicker->index + 1;
$collectionTicker->index = $collectionIdentifier;
$collectionTicker->save();

$collection->identifier = $collectionIdentifier;
$collection->save();
}
$collection->fresh()->generateDOI($this->doiService);
echo $collection->identifier."\r\n";
}
}
}
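The identifier allocation in `assign()` can be illustrated in isolation. The following is a minimal sketch, not the project's code: `InMemoryTicker` and `allocateIdentifier` are hypothetical stand-ins for the Eloquent `Ticker` model and the logic inside `assign()`, and the sketch checks strictly for `null` where the original uses a loose `== null` comparison.

```php
<?php
// Hypothetical in-memory stand-in for the Ticker model (not part of COCONUT).
final class InMemoryTicker
{
    public function __construct(public int $index = 0) {}
}

/**
 * Return the existing identifier, or allocate the next one from the ticker.
 */
function allocateIdentifier(?int $current, InMemoryTicker $ticker): int
{
    if ($current !== null) {
        // Already assigned: the ticker is left untouched.
        return $current;
    }

    // Advance the counter and hand out the new value.
    $ticker->index += 1;

    return $ticker->index;
}

$ticker = new InMemoryTicker(41);
$first = allocateIdentifier(null, $ticker);   // allocates a new identifier
$again = allocateIdentifier($first, $ticker); // reuses the existing one
```

Calling the function a second time with an identifier already present leaves the counter unchanged, which is the property that makes re-running the assignment safe in this sketch.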
75 changes: 25 additions & 50 deletions app/Actions/Coconut/SearchMolecule.php
@@ -37,6 +37,7 @@ public function query($query, $size, $type, $sort, $tagType, $page)
$this->query = $query;
$this->size = $size;
$this->type = $type;

$this->sort = $sort;
$this->tagType = $tagType;
$this->page = $page;
@@ -61,10 +62,15 @@ public function query($query, $size, $type, $sort, $tagType, $page)
}
$queryType = strtolower($queryType);

$filterMap = $this->getFilterMap();
$filterMap = getFilterMap();

if ($queryType == 'tags') {
$results = $this->buildTagsStatement($offset);
} elseif ($queryType == 'filters') {
$statement = $this->buildStatement($queryType, $offset, $filterMap);
if ($statement) {
$results = $this->executeQuery($statement);
}
} else {
$statement = $this->buildStatement($queryType, $offset, $filterMap);
if ($statement) {
@@ -73,11 +79,9 @@ public function query($query, $size, $type, $sort, $tagType, $page)
}

return [$results, $this->collection, $this->organisms];

} catch (QueryException $exception) {

return $this->handleException($exception);

}
}

@@ -89,6 +93,7 @@ private function determineQueryType($query)
$patterns = [
'inchi' => '/^((InChI=)?[^J][0-9BCOHNSOPrIFla+\-\(\)\\\\\/,pqbtmsih]{6,})$/i',
'inchikey' => '/^([0-9A-Z\-]{27})$/i', // Modified to ensure exact length
'parttialinchikey' => '/^([A-Z]{14})$/i',
'smiles' => '/^([^J][0-9BCOHNSOPrIFla@+\-\[\]\(\)\\\\\/%=#$]{6,})$/i',
];

@@ -102,6 +107,8 @@ private function determineQueryType($query)
return 'inchi';
} elseif ($type == 'inchikey' && substr($query, 14, 1) == '-' && strlen($query) == 27) {
return 'inchikey';
} elseif ($type == 'parttialinchikey' && strlen($query) == 14) {
return 'parttialinchikey';
} elseif ($type == 'smiles') {
return 'smiles';
}
@@ -111,43 +118,6 @@ private function determineQueryType($query)
return 'text';
}
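The distinction between a full and a partial InChIKey query can be sketched on its own. This is a simplified illustration that reuses only the two key patterns shown in the hunks above; `classifyInchiKeyQuery` is a hypothetical name, the InChI and SMILES branches are omitted, and the returned labels keep the spelling used in the diff.

```php
<?php
/**
 * Classify a query as a full InChIKey, a 14-letter first block, or plain text.
 */
function classifyInchiKeyQuery(string $query): string
{
    // Full key: 27 characters with a hyphen after the 14-character first block.
    if (preg_match('/^([0-9A-Z\-]{27})$/i', $query) === 1 && substr($query, 14, 1) === '-') {
        return 'inchikey';
    }

    // Partial key: exactly the 14-letter first block.
    if (preg_match('/^([A-Z]{14})$/i', $query) === 1) {
        return 'parttialinchikey'; // label spelled as in the diff
    }

    return 'text';
}

$full = classifyInchiKeyQuery('BSYNRYMUTXBXSQ-UHFFFAOYSA-N');
$partial = classifyInchiKeyQuery('BSYNRYMUTXBXSQ');
$other = classifyInchiKeyQuery('aspirin');
```

Because both labels fall through to the same `LIKE` statement in `buildStatement`, the partial form effectively matches every molecule that shares the first block of the key.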

/**
* Return a mapping of filter codes to database columns.
*/
private function getFilterMap()
{
return [
'mf' => 'molecular_formula',
'mw' => 'molecular_weight',
'hac' => 'heavy_atom_count',
'tac' => 'total_atom_count',
'arc' => 'aromatic_ring_count',
'rbc' => 'rotatable_bond_count',
'mrc' => 'minimal_number_of_rings',
'fc' => 'formal_charge',
'cs' => 'contains_sugar',
'crs' => 'contains_ring_sugars',
'cls' => 'contains_linear_sugars',
'npl' => 'np_likeness_score',
'alogp' => 'alogp',
'topopsa' => 'topo_psa',
'fsp3' => 'fsp3',
'hba' => 'h_bond_acceptor_count',
'hbd' => 'h_bond_donor_count',
'ro5v' => 'rule_of_5_violations',
'lhba' => 'lipinski_h_bond_acceptor_count',
'lhbd' => 'lipinski_h_bond_donor_count',
'lro5v' => 'lipinski_rule_of_5_violations',
'ds' => 'found_in_databases',
'class' => 'chemical_class',
'subclass' => 'chemical_sub_class',
'superclass' => 'chemical_super_class',
'parent' => 'direct_parent_classification',
'org' => 'organism',
'cite' => 'ciatation',
];
}

/**
* Build the SQL statement based on the query type.
*/
@@ -175,6 +145,7 @@ private function buildStatement($queryType, $offset, $filterMap)
break;

case 'inchikey':
case 'parttialinchikey':
$statement = "SELECT id, COUNT(*) OVER ()
FROM molecules
WHERE standard_inchi_key LIKE '%{$this->query}%'
@@ -262,19 +233,23 @@ private function buildTagsStatement($offset)
private function buildFiltersStatement($filterMap)
{
$orConditions = explode('OR', $this->query);
$statement = 'SELECT molecule_id as id, COUNT(*) OVER ()
FROM properties WHERE ';

foreach ($orConditions as $index => $orCondition) {
if ($index > 0) {
$statement = 'SELECT properties.molecule_id as id, COUNT(*) OVER ()
FROM properties
INNER JOIN molecules ON properties.molecule_id = molecules.id
WHERE molecules.active = TRUE
AND NOT (molecules.is_parent = TRUE AND molecules.has_variants = TRUE)
AND ';

foreach ($orConditions as $outerIndex => $orCondition) {
if ($outerIndex > 0) {
$statement .= ' OR ';
}

$andConditions = explode(' ', trim($orCondition));
$statement .= '(';

foreach ($andConditions as $index => $andCondition) {
if ($index > 0) {
foreach ($andConditions as $innerIndex => $andCondition) {
if ($innerIndex > 0) {
$statement .= ' AND ';
}
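The nested loops above group conditions into parenthesised AND blocks joined by OR. A minimal sketch of that grouping follows; `groupConditions` is a hypothetical helper, the tokens are left untranslated (the per-condition SQL is elided in this hunk), and the sketch splits on a whitespace-delimited `OR` rather than the bare `explode('OR', ...)` used in the diff, which is an assumption made to avoid splitting inside a token.

```php
<?php
/**
 * Group space-separated tokens into "(a AND b) OR (c)" form.
 */
function groupConditions(string $query): string
{
    // Each OR-separated chunk becomes one parenthesised group.
    $orGroups = preg_split('/\s+OR\s+/', trim($query));
    $parts = [];

    foreach ($orGroups as $group) {
        // Tokens inside a group are joined with AND.
        $tokens = preg_split('/\s+/', trim($group), -1, PREG_SPLIT_NO_EMPTY);
        $parts[] = '(' . implode(' AND ', $tokens) . ')';
    }

    return implode(' OR ', $parts);
}

$grouped = groupConditions('a b OR c');
```

Renaming the loop variables to `$outerIndex` and `$innerIndex`, as this commit does, matters for the same reason the sketch uses two separate loops: the inner counter must not overwrite the outer one.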

@@ -326,7 +301,7 @@ private function buildDefaultStatement($offset)
WHEN \"name\"::TEXT ILIKE '%{$this->query}%' THEN 4
WHEN \"synonyms\"::TEXT ILIKE '%{$this->query}%' THEN 5
WHEN \"identifier\"::TEXT ILIKE '%{$this->query}%' THEN 6
ELSE 7
ELSE 7
END
LIMIT {$this->size} OFFSET {$offset}";
} else {
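The statements in this file interpolate `$this->query` directly into SQL text. One possible alternative is to pass user input as bound parameters. The sketch below is an assumption-laden simplification, not the project's query: `buildNameSearch` is a hypothetical helper, the statement is reduced to a single `ILIKE` on `name`, and the result would be handed to something like Laravel's `DB::select($sql, $bindings)`.

```php
<?php
/**
 * Build a parameterised name search.
 *
 * @return array{0: string, 1: array<int, string|int>} SQL text and bindings
 */
function buildNameSearch(string $query, int $size, int $offset): array
{
    // Escape LIKE wildcards so user input is matched literally
    // (backslash is PostgreSQL's default LIKE escape character).
    $escaped = addcslashes($query, '%_\\');

    $sql = 'SELECT id, COUNT(*) OVER () FROM molecules '
         . 'WHERE name ILIKE ? LIMIT ? OFFSET ?';

    return [$sql, ['%' . $escaped . '%', $size, $offset]];
}

[$sql, $bindings] = buildNameSearch('50%_off', 10, 20);
```

Keeping the SQL text constant and moving the values into the bindings array means quotes or wildcards in the search term cannot change the structure of the statement.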
@@ -350,16 +325,16 @@ private function executeQuery($statement)

$ids_array = collect($hits)->pluck('id')->toArray();
$ids = implode(',', $ids_array);
// dd($ids);

if ($ids != '') {

$statement = "
SELECT identifier, canonical_smiles, annotation_level, name, iupac_name, organism_count, citation_count, geo_count, collection_count
FROM molecules
WHERE id = ANY (array[{$ids}])
WHERE id = ANY (array[{$ids}]) AND active = TRUE AND NOT (is_parent = TRUE AND has_variants = TRUE)
ORDER BY array_position(array[{$ids}], id);
";

if ($this->sort == 'recent') {
$statement .= ' ORDER BY created_at DESC';
}
4 changes: 3 additions & 1 deletion app/Console/Commands/AssignCollectionsIdentifiers.php
@@ -50,7 +50,9 @@ public function handle()

public function generateIdentifier($index)
{
return 'CNPC'.str_pad($index, 4, '0', STR_PAD_LEFT);
$prefix = (env('APP_ENV') === 'production') ? 'CNPC' : 'CNPC_DEV';

return $prefix.str_pad($index, 6, '0', STR_PAD_LEFT);
}

public function fetchLastIndex()
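The effect of the new prefix and the wider padding in `generateIdentifier` can be shown with a standalone function. This is a sketch: `formatCollectionIdentifier` is a hypothetical name, and the environment is passed in as a parameter instead of being read through `env()`.

```php
<?php
/**
 * Format a collection identifier: environment-dependent prefix plus a
 * zero-padded, six-digit index.
 */
function formatCollectionIdentifier(int $index, string $appEnv): string
{
    $prefix = ($appEnv === 'production') ? 'CNPC' : 'CNPC_DEV';

    // str_pad only pads; an index longer than six digits is kept whole.
    return $prefix . str_pad((string) $index, 6, '0', STR_PAD_LEFT);
}

$prod = formatCollectionIdentifier(42, 'production');
$dev = formatCollectionIdentifier(42, 'local');
$wide = formatCollectionIdentifier(1234567, 'production');
```

Passing the environment explicitly also sidesteps a known Laravel caveat: once the configuration is cached, `env()` calls outside config files no longer read the `.env` file, so `config('app.env')` or `app()->environment()` may be the safer source for this value.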
45 changes: 45 additions & 0 deletions app/Console/Commands/AssignDOIs.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
<?php

namespace App\Console\Commands;

use App\Actions\Coconut\AssignDOI;
use App\Models\Collection;
use Illuminate\Console\Command;
use Illuminate\Support\Facades\DB;

class AssignDOIs extends Command
{
/**
* The name and signature of the console command.
*
* @var string
*/
protected $signature = 'coconut:collection-assign-doi';

/**
* The console command description.
*
* @var string
*/
protected $description = 'Assigns dois to all unassigned public collections';

/**
* Execute the console command.
*
* @return int
*/
public function handle(AssignDOI $assigner)
{
return DB::transaction(function () use ($assigner) {
$collections = Collection::where([
['is_public', true],
['doi', null],
])->get();

foreach ($collections as $collection) {
$collectionDOI = $collection->doi ? $collection->doi : null;
$assigner->assign($collection);
}
});
}
}
18 changes: 9 additions & 9 deletions app/Console/Commands/DashWidgetsRefresh.php
@@ -32,49 +32,49 @@ public function handle()
// Cache::flush();

// Create the cache for all DashboardStats widgets
Cache::remember('stats.collections', 172800, function () {
Cache::rememberForever('stats.collections', function () {
return DB::table('collections')->selectRaw('count(*)')->get()[0]->count;
});
$this->info('Cache for collections refreshed.');

Cache::remember('stats.citations', 172800, function () {
Cache::rememberForever('stats.citations', function () {
return DB::table('citations')->selectRaw('count(*)')->get()[0]->count;
});
$this->info('Cache for citations refreshed.');

Cache::remember('stats.organisms', 172800, function () {
Cache::rememberForever('stats.organisms', function () {
return DB::table('organisms')->selectRaw('count(*)')->get()[0]->count;
});
$this->info('Cache for organisms refreshed.');

Cache::remember('stats.geo_locations', 172800, function () {
Cache::rememberForever('stats.geo_locations', function () {
return DB::table('geo_locations')->selectRaw('count(*)')->get()[0]->count;
});
$this->info('Cache for geo locations refreshed.');

Cache::remember('stats.reports', 172800, function () {
Cache::rememberForever('stats.reports', function () {
return DB::table('reports')->selectRaw('count(*)')->get()[0]->count;
});
$this->info('Cache for reports refreshed.');

// Create the cache for all DashboardStatsMid widgets

Cache::remember('stats.molecules.non_stereo', 172800, function () {
Cache::rememberForever('stats.molecules.non_stereo', function () {
return DB::table('molecules')->selectRaw('count(*)')->whereRaw('has_stereo=false and is_parent=false')->get()[0]->count;
});
$this->info('Cache for molecules non-stereo refreshed.');

Cache::remember('stats.molecules.stereo', 172800, function () {
Cache::rememberForever('stats.molecules.stereo', function () {
return DB::table('molecules')->selectRaw('count(*)')->whereRaw('has_stereo=true')->get()[0]->count;
});
$this->info('Cache for molecules stereo refreshed.');

Cache::remember('stats.molecules.parent', 172800, function () {
Cache::rememberForever('stats.molecules.parent', function () {
return DB::table('molecules')->selectRaw('count(*)')->whereRaw('has_stereo=false and is_parent=true')->get()[0]->count;
});
$this->info('Cache for molecules parent refreshed.');

Cache::remember('stats.molecules', 172800, function () {
Cache::rememberForever('stats.molecules', function () {
return DB::table('molecules')->selectRaw('count(*)')->whereRaw('active=true and NOT (is_parent=true AND has_variants=true)')->get()[0]->count;
});
$this->info('Cache for molecules refreshed.');
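Switching from `Cache::remember` with a TTL to `Cache::rememberForever` changes when the callback runs: a remember-style call only computes the value if the key is absent. The sketch below mimics that behaviour with a hypothetical `TinyCache` class (not Laravel's implementation) to show that an existing entry is returned unchanged until it is explicitly forgotten.

```php
<?php
// Hypothetical minimal cache mimicking remember-forever semantics.
final class TinyCache
{
    /** @var array<string, mixed> */
    private array $items = [];

    public function rememberForever(string $key, callable $callback): mixed
    {
        // The callback runs only when the key is missing.
        if (!array_key_exists($key, $this->items)) {
            $this->items[$key] = $callback();
        }

        return $this->items[$key];
    }

    public function forget(string $key): void
    {
        unset($this->items[$key]);
    }
}

$cache = new TinyCache();

$count = 10;
$initial = $cache->rememberForever('stats.molecules', fn () => $count);

$count = 25;
$stale = $cache->rememberForever('stats.molecules', fn () => $count);

$cache->forget('stats.molecules');
$fresh = $cache->rememberForever('stats.molecules', fn () => $count);
```

If the command is meant to recompute the statistics on every run, this suggests either forgetting each key first (the `Cache::flush()` call above is commented out) or writing the value unconditionally; whether that is the intent here is not stated in the commit.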