Commit

Update documentation for Redis DB and Readme
OnToNothing committed Oct 28, 2024
1 parent df214d3 commit fd2bffd
Showing 2 changed files with 54 additions and 14 deletions.
4 changes: 1 addition & 3 deletions README.md
@@ -31,17 +31,15 @@ With the API limiting that is in place, it would take us months to download all


## Getting Started

If you are interested in becoming a developer, see `docs/developers.md`.

To run Mirrulations, you need Python 3.9 or greater ([MacOSX](https://docs.python-guide.org/starting/install3/osx/) or [Windows](https://docs.python-guide.org/starting/install3/win/)) on your machine, as well as [Redis](https://redis.io/) if you are running a server.

You will also need a valid API key from Regulations.gov to participate. To apply for a key, you must simply [contact the Regulations Help Desk]([email protected]) and provide your name, email address, organization, and intended use of the API. If you are not with any organizations, just say so in your message. They will email you with a key once they've verified you and activated the key.
You will also need a valid API key from Regulations.gov to participate. To apply for a key, complete the [API key request form](https://open.gsa.gov/api/regulationsgov/), providing your name, email address, organization, and intended use of the API. After review, the key will be sent to you by email.

To download the actual project, you will need to go to our [GitHub page](https://github.com/MoravianUniversity/mirrulations) and [clone](https://help.github.com/articles/cloning-a-repository/) the project to your computer.



### Disclaimers
--------
"Regulations.gov and the Federal government cannot verify and are not responsible for the accuracy or authenticity of the data or analyses derived from the data after the data has been retrieved from Regulations.gov."
64 changes: 53 additions & 11 deletions docs/database.md
@@ -2,23 +2,53 @@

## Database Format

We use [Redis](https://redis.io/) to store jobs as well as key values that must
be remembered.
We use [Redis](https://redis.io/) to store jobs as well as the counters, timestamps, and other key-value data the system tracks.

## Database Structure

The Redis database is structured with the following keys:

- `regulations_total_comments`
- `num_dockets_done`
- `num_documents_done`
- `num_attachments_done`
- `last_job_id`
- `jobs_in_progress`
- `num_pdf_attachments_done`
- `num_jobs_documents_waiting`
- `num_jobs_comments_waiting`
- `dockets_last_timestamp`
- `invalid_jobs`
- `regulations_total_dockets`
- `client_jobs`
- `num_extractions_done`
- `regulations_total_documents`
- `mirrulations_bucket_size`
- `num_comments_done`
- `documents_last_timestamp`
- `num_jobs_dockets_waiting`
- `comments_last_timestamp`
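As a minimal sketch, assuming the Python `redis` (redis-py) client and that the `num_*_done` and `regulations_total_*` keys hold integer values, a status script could read several of these counters in a single round trip with `MGET`. `read_counters` and `COUNTER_KEYS` are hypothetical names, not part of Mirrulations.

```python
"""Sketch: read a snapshot of progress counters from the Redis keys listed above.

Assumption: these keys hold integers (stored by Redis as strings).
read_counters and COUNTER_KEYS are hypothetical names for illustration.
"""

# A subset of the keys above that appear to hold running totals.
COUNTER_KEYS = [
    "num_dockets_done",
    "num_documents_done",
    "num_comments_done",
    "num_attachments_done",
    "num_pdf_attachments_done",
    "num_extractions_done",
    "regulations_total_dockets",
    "regulations_total_documents",
    "regulations_total_comments",
]


def read_counters(client, keys=COUNTER_KEYS):
    """Fetch all counters in one MGET call.

    Redis returns None for keys that do not exist yet; report those as 0.
    int() accepts both str and bytes, so decode_responses may be on or off.
    """
    values = client.mget(keys)
    return {key: int(value) if value is not None else 0
            for key, value in zip(keys, values)}


if __name__ == "__main__":
    import redis  # redis-py; requires a running Redis server

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)
    for key, value in read_counters(r).items():
        print(f"{key}: {value}")
```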


## Job Management

The Redis database has three "queues", with the names:

`jobs_waiting_queue`, `jobs_in_progress`, and `jobs_done`.

`jobs_waiting_queue` is a list, while `jobs_in_progress` and `jobs_done` are hashes.
Together they hold jobs at each stage: waiting for a client, being processed, and completed.
The keys serve the following functions:

- `jobs_waiting_queue`: A list holding JSON strings, one per job.
- `jobs_in_progress`: A hash storing jobs currently being processed.
  Its keys are integers (the job IDs), each mapped to an integer value to be processed.
- `jobs_done`: A hash storing completed jobs.
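The movement of a job through these three structures can be sketched as below. This is not the Mirrulations implementation: pushing on the right and popping from the left, the hypothetical `job_id` field inside each job's JSON, and storing the job's JSON string as the hash value are all assumptions made for illustration. `enqueue_job`, `start_next_job`, and `finish_job` are hypothetical helpers that take a redis-py client.

```python
"""Sketch: move a job from jobs_waiting_queue to jobs_in_progress to jobs_done.

Assumptions (not confirmed by this document): FIFO order via RPUSH/LPOP,
a hypothetical "job_id" field in each job's JSON, and the job JSON string
used as the value in both hashes.
"""
import json


def enqueue_job(client, job):
    """Append a job (a dict) to the waiting list as a JSON string."""
    client.rpush("jobs_waiting_queue", json.dumps(job))


def start_next_job(client):
    """Pop the oldest waiting job and record it in jobs_in_progress.

    Returns the job as a dict, or None if the waiting list is empty.
    """
    raw = client.lpop("jobs_waiting_queue")
    if raw is None:
        return None
    job = json.loads(raw)
    client.hset("jobs_in_progress", job["job_id"], raw)
    return job


def finish_job(client, job_id):
    """Move a job's entry from jobs_in_progress to jobs_done."""
    raw = client.hget("jobs_in_progress", job_id)
    if raw is None:
        raise KeyError(f"job {job_id} is not in progress")
    client.hdel("jobs_in_progress", job_id)
    client.hset("jobs_done", job_id, raw)
```

The two writes in `finish_job` are separate commands; grouping them in a redis-py `pipeline()` (which wraps commands in MULTI/EXEC by default) would keep a crash between them from losing the job.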

Additionally, the database has an integer value storing the number of clients:
`total_num_client_ids`.
The keys `client_jobs` and `total_num_client_ids` are used for storing client information:

- `client_jobs`: A hash mapping job IDs to client IDs.
- `total_num_client_ids`: An integer value storing the number of clients.
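A sketch of this client bookkeeping, assuming (not confirmed here) that a new client's ID is obtained by atomically incrementing `total_num_client_ids`, so the counter always equals the number of registered clients. `register_client`, `assign_job_to_client`, and `client_for_job` are hypothetical names; `client` is a redis-py `Redis` instance.

```python
"""Sketch: track which client is working on which job.

Assumption: client IDs come from INCR on total_num_client_ids.
The function names are hypothetical.
"""


def register_client(client):
    """Return a fresh client ID.

    INCR runs atomically on the Redis server, so concurrent registrations
    never receive the same ID.
    """
    return client.incr("total_num_client_ids")


def assign_job_to_client(client, job_id, client_id):
    """Record in client_jobs that client_id is processing job_id."""
    client.hset("client_jobs", job_id, client_id)


def client_for_job(client, job_id):
    """Return the client ID working on job_id, or None if unassigned."""
    value = client.hget("client_jobs", job_id)
    return int(value) if value is not None else None
```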

## Redis Format
## `jobs_waiting_queue`
@@ -54,7 +84,19 @@ timestamp seen when querying regulations.gov.
The `last_job_id` variable is used by the work generator to ensure it generates
unique ids for each job.
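One common way to get this uniqueness is Redis's atomic `INCR` command. The sketch below assumes the work generator uses it; the job fields (`job_type`, `url`) and the function names are hypothetical.

```python
"""Sketch: mint unique job IDs from the last_job_id key.

Assumption: the work generator uses INCR; the job fields are hypothetical.
"""
import json


def next_job_id(client):
    """Atomically increment last_job_id and return the new value.

    If the key does not exist yet, Redis treats it as 0, so the first ID is 1.
    """
    return client.incr("last_job_id")


def make_job_json(client, job_type, url):
    """Build a job JSON string stamped with a fresh, unique job ID."""
    job = {"job_id": next_job_id(client), "job_type": job_type, "url": url}
    return json.dumps(job)
```

Because `INCR` executes on the server in a single step, two generators calling `next_job_id` at the same time still receive different values; reading the key, adding one in Python, and writing it back would not guarantee that.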

## Client IDs

The `last_client_id` variable is used by the work server to ensure that it
generates unique client IDs.

## Job Statistics Keys

- `DOCKETS_DONE`: Tracks the number of completed dockets.
- `DOCUMENTS_DONE`: Tracks the number of completed documents.
- `COMMENTS_DONE`: Tracks the number of completed comments.
- `ATTACHMENTS_DONE`: Tracks the number of completed attachments.
- `PDF_ATTACHMENTS_DONE`: Tracks the number of completed PDF attachments.
- `EXTRACTIONS_DONE`: Tracks the number of completed extractions.
- `MIRRULATION_BUCKET_SIZE`: Stores the size of the Mirrulations bucket.
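A sketch of updating these counters when a job finishes. It assumes the uppercase names are Python constants whose values are the lowercase keys listed under Database Structure (for example, `DOCKETS_DONE` holding `"num_dockets_done"`); that mapping, the job-type strings, and `record_job_done` are hypothetical.

```python
"""Sketch: bump the job statistics counters when work completes.

Assumption: each constant holds the matching lowercase Redis key from the
Database Structure list. The job-type strings are hypothetical.
"""

DOCKETS_DONE = "num_dockets_done"
DOCUMENTS_DONE = "num_documents_done"
COMMENTS_DONE = "num_comments_done"
ATTACHMENTS_DONE = "num_attachments_done"
PDF_ATTACHMENTS_DONE = "num_pdf_attachments_done"

# Hypothetical mapping from a job's type to the counter it advances.
COUNTER_FOR_JOB_TYPE = {
    "dockets": DOCKETS_DONE,
    "documents": DOCUMENTS_DONE,
    "comments": COMMENTS_DONE,
}


def record_job_done(client, job_type, attachments=0, pdf_attachments=0):
    """Increment the counters for one finished job and its attachments.

    The increments are queued on a redis-py pipeline, which wraps them in
    MULTI/EXEC by default, so they are applied together.
    """
    pipe = client.pipeline()
    pipe.incr(COUNTER_FOR_JOB_TYPE[job_type])
    if attachments:
        pipe.incr(ATTACHMENTS_DONE, attachments)
    if pdf_attachments:
        pipe.incr(PDF_ATTACHMENTS_DONE, pdf_attachments)
    pipe.execute()
```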
