Releases: brunoamaral/gregory-ai
The António Lopes Edition
Hope you're ready, what we have is heavy. Let's hear it for @antoniolopes who shines from the shadows and gave Gregory an AI upgrade. Let's get to it before your patience starts to fade.
António has been helping Gregory since the early stage, with the relevancy algorithm, and advice worthy of a sage. This time he brought a new summariser for the abstracts that can process the database through Django's management commands.
./manage.py get_takeaways
will populate the "takeaways" column with the key points within the abstract of each article.
In future releases we may use this to improve the newsletters and automatic tweets.
And his magic didn't stop here. There is a new API endpoint that allows you to add new articles via http POST requests.
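As a rough illustration of what adding an article over HTTP POST could look like, here is a minimal sketch using only the Python standard library. The endpoint path, the authorization header, and the payload field names are assumptions made for this example, so check the API documentation for the real ones.

```python
import json
import urllib.request

# Hypothetical values: the real endpoint path, auth scheme, and field
# names are not stated in these notes.
API_URL = "https://api.example.com/articles/post/"
API_KEY = "replace-me"


def build_article_request(title, link, doi=None):
    """Build (but do not send) a POST request carrying one article."""
    payload = {"title": title, "link": link}
    if doi is not None:  # the DOI is optional when adding content, see #281
        payload["doi"] = doi
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": API_KEY,
        },
        method="POST",
    )


request = build_article_request("An example paper", "https://example.com/paper")
# urllib.request.urlopen(request) would actually send it.
```

Building the request separately from sending it makes it easy to inspect the payload before anything reaches the server.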
There is also a new SciencePaper
class to make sure we have all the required information when saving an article. It is also used to clean up the abstracts of any weird characters or html.
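To give an idea of what cleaning an abstract can involve, here is a small standalone sketch. It is not the code inside SciencePaper; the function name tidy_abstract and the exact rules (strip tags, decode entities, drop control characters, collapse whitespace) are assumptions chosen for illustration.

```python
import html
import re
import unicodedata

TAG_RE = re.compile(r"<[^>]+>")
WHITESPACE_RE = re.compile(r"\s+")


def tidy_abstract(text):
    """Strip markup and invisible characters from an abstract (illustrative)."""
    text = TAG_RE.sub(" ", text)   # drop html/xml tags such as <jats:p>
    text = html.unescape(text)     # turn &amp; and friends into characters
    # Replace control and format characters (Unicode categories C*) with spaces.
    text = "".join(
        " " if unicodedata.category(ch).startswith("C") else ch for ch in text
    )
    return WHITESPACE_RE.sub(" ", text).strip()


print(tidy_abstract("<jats:p>Background &amp; aims:\u200b MS\n</jats:p>"))
# Background & aims: MS
```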
To save on CPU, and be gentle with the crossref API, we now stop trying to fetch missing data after trying for 30 days.
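The 30 day cutoff boils down to a date comparison. A minimal sketch, assuming the window is counted from the day the article was discovered (the notes don't say which date is used, and the function name is made up for this example):

```python
from datetime import datetime, timedelta, timezone

RETRY_WINDOW = timedelta(days=30)  # the limit mentioned in this release


def should_retry_crossref(discovery_date, now=None):
    """Return True while an article is still inside the retry window."""
    now = now or datetime.now(timezone.utc)
    return now - discovery_date <= RETRY_WINDOW


found = datetime(2023, 1, 1, tzinfo=timezone.utc)
print(should_retry_crossref(found, now=datetime(2023, 1, 15, tzinfo=timezone.utc)))  # True
print(should_retry_crossref(found, now=datetime(2023, 3, 1, tzinfo=timezone.utc)))   # False
```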
A special word of appreciation goes out to @codeZenon for taking the time to help us improve the documentation.
Development of new features and improvements has been 3x faster than documentation, and I don't expect it to improve. Our time is scarce. Which isn't the same as saying we don't care.
If you have any questions, please reach out by posting an issue or adding a thread in the discussion page.
Final note, remember to run ./manage.py migrate
and pip install -r requirements.txt
in the admin container when upgrading.
What's Changed
- quick fix by @brunoamaral in #258
- remove hardcoded information from crossref script by @brunoamaral in #259
- removes utm parameters from urls in feedreader by @brunoamaral in #261
- API returns authors as an object with first name, family name, and ORCID url by @brunoamaral in #264
- add information from crossref.org upon fetching articles from the rss feeds by @brunoamaral in #269
- Improves the way we fetch authors by using Django's ORM by @brunoamaral in #267
- Refactor pipeline by @brunoamaral in #270
- Apply method to avoid excessive queries to crossreforg by @brunoamaral in #273
- partial fix for naive datetime warning by @brunoamaral in #275
- fixed some grammer issues by @codeZenon in #276
- Auth API by @antoniolopes by @brunoamaral in #271
- Added two methods to SciencePaper class refresh() and clean_abstract() by @brunoamaral in #280
- Make DOI optional when adding content from the API by @brunoamaral in #281
- adds a new endpoint to list articles by journal name by @brunoamaral in #283
- Adds a script to calculate the summary of abstracts by @brunoamaral in #284
- fix variable name by @brunoamaral in #286
- Clean up debug prints, limit results to 100 rows, fix #285 by @brunoamaral in #287
- create shell command to process takeaways by @brunoamaral in #288
- include takeaways in article json output by @brunoamaral in #289
New Contributors
- @codeZenon made their first contribution in #276
Full Changelog: v12...v13
The Ichabod Crane edition
Gregory lost its head, just in time for Halloween, the website code was shot dead.
With the AI engine now standalone, this can be the setting stone for many opportunities ahead.
On fresh install you get an AI and an API, letting you build the frontend you wish. But there's more than this.
We already had the concept of sources from where we fetch the papers. Think of PubMed, the same source can bring us several journals from several publishers. Gregory now saves this information as strings in the database. (We may extend this in the future)
But there's more.
Gregory is an open book, but not all papers are open access. We are now using Unpaywall to tag which articles are free access and which are restricted.
You can get a list of open access papers through the api: https://api.gregory-ms.com/articles/open/
Or you can get it from an rss feed: https://api.gregory-ms.com/feed/articles/open/
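For the curious, tagging a paper with Unpaywall can be sketched in a few lines. The v2 lookup URL and the is_oa field come from Unpaywall's public API; the function names and the "open" / "restricted" labels are assumptions for this example and not necessarily what Gregory stores.

```python
from urllib.parse import quote, urlencode


def unpaywall_url(doi, email):
    """Build the Unpaywall v2 lookup URL for a DOI."""
    query = urlencode({"email": email})
    return f"https://api.unpaywall.org/v2/{quote(doi)}?{query}"


def access_label(record):
    """Map a parsed Unpaywall JSON record to a hypothetical access label."""
    return "open" if record.get("is_oa") else "restricted"


print(unpaywall_url("10.1000/xyz123", "me@example.com"))
# https://api.unpaywall.org/v2/10.1000/xyz123?email=me%40example.com
print(access_label({"is_oa": True}))  # open
```

Fetching the URL and parsing the JSON is left out on purpose, so the sketch runs without network access.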
The rss feeds were also missing the item pubDate field, we fixed that.
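RSS readers expect pubDate in the RFC 822 style date format. A minimal sketch of producing one with the standard library, assuming the date comes from a timezone-aware datetime (which article field feeds it is not stated here, and the function name is made up):

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def rss_pub_date(moment):
    """Format an aware datetime the way an RSS pubDate element expects."""
    return format_datetime(moment)


print(rss_pub_date(datetime(2022, 10, 31, 12, 0, tzinfo=timezone.utc)))
# Mon, 31 Oct 2022 12:00:00 +0000
```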
A lot of the information comes from crossref.org, and we made that code a lot cleaner and less scattered.
If you need the full details, here they are.
What's Changed
- Check if key exists by @brunoamaral in #220
- Docs by @brunoamaral in #222
- Remove deprecated field from the database sent to twitter by @brunoamaral in #225
- Admins should have the option to create new subjects by @brunoamaral in #227
- Update documentation by @brunoamaral in #229
- Wrong link in the rss feed for clinical trials by @brunoamaral in #231
- API should list all relevant articles by @brunoamaral in #233
- Add information of availability for science papers in the articles table by @brunoamaral in #239
- Adds the option to fetch publication name from the DOI number by @brunoamaral in #240
- Add journal title to database by @brunoamaral in #241
- Use one script to fetch all the data we need from crossref by @brunoamaral in #244
- Add item pubDate to rss feeds by @brunoamaral in #250
- Sort by discovery date by @brunoamaral in #251
- Add api endpoint and rss feed to list open access articles by @brunoamaral in #253
- Website should be detached from the rest of the software by @brunoamaral in #246
Full Changelog: v11...v12
Still there? Here's the important bit. With the website detached from the AI we are opening the door to get much more done with Gregory and gaining flexibility in the areas or fields where we can apply this method. Some thoughts include getting the takeaways from articles, finding biomedical entities in the abstracts, and asking Gregory to answer specific questions within a subject.
Imagine asking, "what are the current disease modifying therapies available for MS?", and getting back a list of them all, sorted by category and date. It's a long shot, but we'll keep working on this ... 🙂
Thank you, and have a great Halloween!
The Flavio Amiel Edition
Take a seat, grab your favorite brew, there's a lot in this release for you.
1. SEO and Content
This release is a thank you to Flavio Amiel, he took some time to look at the website and offer some suggestions to improve the SEO and content.
This release implements some of those suggestions in content and in the url of articles listed. The previous URL was domain.com/articles/<article_id>, the new URL is domain.com/articles/<article_id>/<slug>, and the slug is taken from the noun_phrases found for each article title, to keep it a bit more relevant for search engines.
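Here is a minimal sketch of how a slug could be derived from a list of noun phrases. The function names and the exact rules are assumptions; the 250 character limit is borrowed from the changelog entry about truncating urls, and applying it to the slug alone is a guess.

```python
import re

MAX_LENGTH = 250  # the changelog mentions truncating urls to 250 characters


def slug_from_noun_phrases(noun_phrases, max_length=MAX_LENGTH):
    """Join noun phrases into a lowercase, dash-separated slug."""
    words = " ".join(noun_phrases).lower()
    slug = re.sub(r"[^a-z0-9]+", "-", words).strip("-")
    return slug[:max_length].rstrip("-")


def article_path(article_id, noun_phrases):
    """Build the new style path: /articles/<article_id>/<slug>."""
    return f"/articles/{article_id}/{slug_from_noun_phrases(noun_phrases)}"


print(article_path(42, ["multiple sclerosis", "cognitive rehabilitation"]))
# /articles/42/multiple-sclerosis-cognitive-rehabilitation
```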
Content was also reviewed to include search keywords when possible.
Flavio was important not just because he took the time to look at our project but also because he took a fresh look at a part of it that was ... overlooked.
If you're facing the same issues, you can reach him on twitter or schedule a call.
2. Setting up Gregory
We like to automate the boring stuff and make things easier for everyone.
If you're installing Gregory for the first time, run setup.py
and it will take care of 80% of the work in setting up the database and the containers. This little sentence took quite some time but it's worth it to help people get their research up and running faster.
3. Filter feeds and API endpoints
Gregory has the concept of 'subject'. In this case, Multiple Sclerosis is the only subject configured. A Subject is a group of Sources and their respective articles. There are also categories that can be created. A category is a group of articles whose title matches at least one keyword in list for that category. Categories can include articles across subjects.
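The category rule can be pictured as a simple keyword check. A minimal sketch, assuming a case-insensitive substring match (the actual matching rules are not described here, and the function name is made up):

```python
def matches_category(title, keywords):
    """True when the title contains at least one of the category keywords."""
    lowered = title.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


mobility = ["gait", "walking", "mobility"]
print(matches_category("Walking speed in people with MS", mobility))  # True
print(matches_category("Vitamin D and relapse rate", mobility))       # False
```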
The one thing it didn't have was a way to filter by subject and category. So we added those options to the API and RSS Feeds in the format articles/category/<category>/ and articles/subject/<subject>/ where <category> and <subject> are the lowercase names with spaces replaced by dashes.
RSS Feeds
- Latest articles by subject, /feed/articles/subject/<subject>/, for example https://api.gregory-ms.com/feed/articles/subject/multiple-sclerosis/
- Latest articles by category, /feed/articles/category/<category>/, for example https://api.gregory-ms.com/feed/articles/category/mobility/
API endpoints
- /articles/subject/<subject>/, for example https://api.gregory-ms.com/articles/subject/multiple-sclerosis/
- /articles/category/<category>/, for example https://api.gregory-ms.com/articles/category/mobility/
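The naming rule for these paths (lowercase, spaces replaced by dashes) can be sketched as follows. The helper names are made up for this example; only the path formats come from the lists above.

```python
def name_to_slug(name):
    """Lowercase a subject or category name and replace spaces with dashes."""
    return "-".join(name.lower().split())


def api_path(kind, name):
    """Build /articles/subject/<subject>/ or /articles/category/<category>/."""
    if kind not in ("subject", "category"):
        raise ValueError("kind must be 'subject' or 'category'")
    return f"/articles/{kind}/{name_to_slug(name)}/"


def feed_path(kind, name):
    """The RSS feeds use the same path with a /feed prefix."""
    return "/feed" + api_path(kind, name)


print(api_path("subject", "Multiple Sclerosis"))
# /articles/subject/multiple-sclerosis/
print(feed_path("category", "Mobility"))
# /feed/articles/category/mobility/
```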
4. Science paper, news, trials
What's in the news? Before this release, Gregory could only understand science papers and clinical trials. We now have the option to include news articles without getting them mixed up with the other articles. You'll have to edit your current sources to make sure they have 'science paper' as the value of source for.
We're doing this to help follow the full process of scientific discovery, from publishing a hypothesis, to running clinical trials, to making it known to the people outside the scientific community.
5. Ignore SSL, if you must
In the past we had some issues reading RSS feeds whose web server didn't have the SSL certificate configured properly. The workaround we were using wasn't ideal because it turned off SSL verification for every request. This is now fixed, and each Source can be configured independently to bypass the certificate check if you really must.
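The difference between turning verification off everywhere and doing it per source can be sketched with Python's ssl module. The ignore_ssl flag and the function name are hypothetical stand-ins for whatever each Source actually stores.

```python
import ssl


def ssl_context_for(ignore_ssl):
    """Return a verifying context unless this one source opts out."""
    context = ssl.create_default_context()
    if ignore_ssl:
        # check_hostname has to be disabled before verify_mode can be relaxed
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


strict = ssl_context_for(ignore_ssl=False)
relaxed = ssl_context_for(ignore_ssl=True)
print(strict.verify_mode == ssl.CERT_REQUIRED)  # True
print(relaxed.verify_mode == ssl.CERT_NONE)     # True
```

Each request gets its own context, so a source with a broken certificate doesn't weaken the checks for every other source.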
What's Changed
- 197 add noun phrases to url keeping redirect from old format by @brunoamaral in #198
- truncate url to 250 maximum of characters by @brunoamaral in #199
- sync with main by @brunoamaral in #202
- 179 error on first install because there are no subscribers by @brunoamaral in #204
- 192 articles should be of type paper and news by @brunoamaral in #200
- send 1 email per admin, save status if at least one send = success by @brunoamaral in #203
- 191 create an rss feed and api endpoint per category and subject by @brunoamaral in #206
- remove hardcoded site domain by @brunoamaral in #207
- 175 error running setuppy for django by @brunoamaral in #208
- Add documentation about env file by @brunoamaral in #209
- fix db host on setup.py by @brunoamaral in #210
- Add documentation about env file and build script by @brunoamaral in #211
- set env variables after user configures them by @brunoamaral in #214
- remove relation to sitesettings and django.contrib.sites by @brunoamaral in #215
- run django commands from setup script by @brunoamaral in #216
- 132 implement a better approach to ssl problems in feedreader indexer by @brunoamaral in #217
Full Changelog: v10.7...v11
The Rock edition
Content is alive and should evolve, live, and thrive.
Because I don't want anything too set in stone, you can edit the email title and footer in the new custom settings.
Careful on the upgrade! Pull your changes, then run the migrations:
sudo docker exec -it admin /bin/sh
./manage.py makemigrations && ./manage.py migrate
Visit the backoffice and edit the new settings.
This release also includes an example configuration for nginx.
Careful with your flows! We are also moving away from running a custom version of the @node-red container.
Run sudo docker-compose pull && sudo docker-compose up -d
to make the change. You may need to install node-red packages you had installed previously.
What's Changed
- resolve #187 docker container breaks with new env variables by @brunoamaral in #188
- improve the README file by @brunoamaral in #189
- use oficial node-red image by @brunoamaral in #194
- example configuration for nginx + instructions by @brunoamaral in #195
- 190 the email template contains hardcoded information by @brunoamaral in #196
Full Changelog: v10.6.11...v10.7
The "Mise en place" edition
You got it, this is for the setup.py script to get you up to speed without missing a beat.
Not going to be verbose because the changelog should be right on the nose.
Send me a note to [email protected] or comment with any questions.
What's Changed
- make sure hugo_path is a string by @brunoamaral in #165
- check if hugo_path is set or try to find where it's installed by @brunoamaral in #166
- content review by @brunoamaral in #167
- setup.py creates a .env file if needed by @brunoamaral in #168
- Clarify setup instructions when running setup.py, and try running sudo to launch the containers by @brunoamaral in #169
- Exclude author attribute for Trials by @brunoamaral in #170
- include django configuration steps in setup.py by @brunoamaral in #172
- clean up and make sure we use gunicorn in production by @brunoamaral in #173
- make sure we configure the right db host by @brunoamaral in #178
- create the metabase database in postgres (you should reload the container once finished) by @brunoamaral in #181
- get domain from env variables by @brunoamaral in #182
Full Changelog: v10.6...v10.6.5
v10.6
Just some polishes and fixes to issues and other near misses.
What's Changed
- fix #151 by @brunoamaral in #159
- add server requirement by @brunoamaral in #160
- Organize npm files and update documentation by @brunoamaral in #163
- get path of hugo command to run build by @brunoamaral in #153
Full Changelog: v10.5.4...v10.6
The happy birthday edition
Not much happening, but I got some wind under my wing, took a deep breath and looked at what was left.
@dippas took some weight off my shoulders by fixing a few bugs on the frontend. Meanwhile, today I looked at all the information Gregory likes to send. The emails and the rss feeds are now listed in the readme file.
Also took a look at the install instructions, because I don't like to be vile, added more info, clarified.
And while we are at it, there were some amazing donations, from at least three nations. We have enough budget to keep the site running for the next 12 months.
I don't like stunts, so I'm adding that information to the next annual review. Transparency is a goal that I'm more than happy to pursue.
That's it, thank you for reading up to this bit.
What's Changed
- Allow users to subscribe on their own with a form on the frontend by @brunoamaral in #141
- Loads the articles from json and adds to database by @brunoamaral in #143
- New indexer for sagepub by @brunoamaral in #144
- Get published date from crossreforg by @brunoamaral in #146
- Fix missing autoprefixer dependency by @dippas in #147
- Fix mobile navigation - class nav-open not compiling by @dippas in #148
- Fix footer in authors page by @dippas in #149
- Allow users to subscribe on their own with a form on the frontend by @brunoamaral in #150
- Added more information on features and install procedure by @brunoamaral in #156
- Update readme.md by @brunoamaral in #157
New Contributors
Full Changelog: v10...v10.5.4
The Frankenstein edition
Gregory is made of several bits and we're trying to make sure everything fits.
So we moved the website files into their own directory, hugo
and updated the build script to match. Some directories were renamed to be more descriptive, and others deleted.
Things will break and you should be careful not to lose your database. Otherwise, this makes way for a system that is easier to understand and evolve.
Also, it seems that this release includes a bug: the mobile menu isn't working, and I will look into it sometime next week (#137).
What's Changed
- Use psycopg2 to fetch data from PG during the build process by @brunoamaral in #125
- fix build error when trial title contains ' character by @brunoamaral in #126
- 115 list relevant results in the last 30 days in the doctors page by @brunoamaral in #129
- add categories to clinical trials by @brunoamaral in #131
- 134 move site to its own directory by @brunoamaral in #136
Full Changelog: v9.4...v10
The Rosie Jetson edition
This is mostly some house cleaning.
What's Changed
- fix listing of articles for physical therapists by @brunoamaral in #112
- categories now use "terms" as a way to tag articles by @brunoamaral in #116
- fixes excessive listing of articles in the weekly digest by @brunoamaral in #118
- creates a new RSS feed to post articles and clinical trials on twitter by @brunoamaral in #119
Full Changelog: v9...v9.1
Breaking the flow
I know, I know, we got to keep the flow.
What you don't want to miss for this release is the new subscription lists for alerts and the django-cron implementation to keep the database up to date and complete.
Subscriptions
You can now create lists to notify people of new clinical trials and to send admin digests. The following notifications are included using django-cron:
- subscriptions.admin_summary
- subscriptions.weekly_summary
- subscriptions.trials_notification
Db maintenance
Previously, we had node-red flows to update authors and make sure articles were properly categorized. That is now done with django-cron as well using the following tasks:
- db_maintenance.get_authors
- db_maintenance.rebuild_categories
Same goes for the prediction of relevant articles and calculation of noun phrases:
- gregory.noun_phrases
- gregory.predict
Node-red was also fetching some rss feeds using a python script that read from the database. You guessed it, it's now a django-cron task:
- gregory.feedreadertask
Building the system
The latest developments have slowly made the system easier to install with docker containers. Right now you should be up and running by setting the correct .env variables and running docker-compose up -d.
What's broken
Training the Machine Learning models inside the container is not working; it seems to run out of memory. The workaround is to build locally and place the files in the ml_models
directory.
What's Changed
- add ko-fi link by @brunoamaral in #99
- Manage subscriptions through django's admin by @brunoamaral in #100
- fix to add migrations by @brunoamaral in #101
- new template for the notification of new trials by @brunoamaral in #102
- run weekly summary from django by @brunoamaral in #103
- adds dbMaintenance tasks and moves ML and AI components into django by @brunoamaral in #105
- Fix predictor by @brunoamaral in #107
- Make Dockerfile use requirements.txt by @brunoamaral in #109
Full Changelog: v8.5...v9