Data retrieval - recipes from Nutritionfacts #218

Open
3 of 11 tasks
tubamos opened this issue Jun 25, 2024 · 0 comments
Assignees
Labels
data pipeline Items that are related to the scrapers of the data pipeline sprint-09 Items assigned to sprint 09


tubamos commented Jun 25, 2024

Domain

app frontend, app backend

Description

Develop a web scraper that extracts recipe content from the Nutritionfacts.org website. The scraper will use the existing orchestrator infrastructure for task scheduling, execution, and monitoring. It will collect recipe names, ingredients, instructions, and nutritional information, and store the data in JSON format in a MongoDB database hosted on Google Cloud servers. The extracted data will be used to build context for custom agents specialized in nutrition advice.
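
To make the target data shape concrete, here is a minimal sketch of what one stored recipe could look like. The class name `RecipeRecord`, its field names, and the sample values are all assumptions for illustration; the issue only fixes the four kinds of content (name, ingredients, instructions, nutritional information) and the JSON format.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class RecipeRecord:
    """Hypothetical shape of one scraped recipe; all field names are assumptions."""
    name: str
    source_url: str
    ingredients: list[str] = field(default_factory=list)
    instructions: list[str] = field(default_factory=list)
    nutrition: dict[str, str] = field(default_factory=dict)

    def to_document(self) -> dict:
        """Return a plain, JSON-serializable dict for storage."""
        return asdict(self)


# Placeholder values only; not real data from the site.
record = RecipeRecord(
    name="Example Recipe",
    source_url="https://example.org/recipe/example",
    ingredients=["1 cup example ingredient"],
    instructions=["Example step."],
    nutrition={"calories": "100 kcal"},
)
print(json.dumps(record.to_document(), indent=2))
```

A dict of this form could then be handed to a MongoDB driver (for example pymongo's `insert_one`), though the actual collection layout is not specified in this issue.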

User Story

  • As a common-user,
  • I want access to nutritional information from reliable sources provided by my customised agent
  • so that I can make informed lifestyle decisions.

Acceptance Criteria

  • The scraper is able to access and parse Nutritionfacts.org.
  • Recipe names, ingredients, instructions, and nutritional information are extracted.
  • Data is stored in JSON format in the database.
  • The scraper uses the orchestrator for task scheduling, execution, and monitoring.
  • Error handling and logging are implemented for scraper failures.
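
The parsing and error-handling criteria above could be sketched as follows, using only the Python standard library. This assumes the recipe pages embed schema.org `Recipe` data as JSON-LD, which is common on recipe sites but has not been verified for Nutritionfacts.org. The logger name, the class `JsonLdCollector`, and the function `extract_recipe` are hypothetical, and the hook into the orchestrator is left out because its interface is not described here.

```python
import json
import logging
from html.parser import HTMLParser

logger = logging.getLogger("nutritionfacts_scraper")  # hypothetical logger name


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_recipe(html: str, url: str):
    """Return a recipe dict, or None (with a logged reason) if extraction fails."""
    collector = JsonLdCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            # A malformed block should not abort the whole page.
            logger.warning("Skipping malformed JSON-LD on %s: %s", url, exc)
            continue
        if isinstance(data, dict) and data.get("@type") == "Recipe":
            return {
                "name": data.get("name"),
                "ingredients": data.get("recipeIngredient", []),
                "instructions": data.get("recipeInstructions", []),
                "nutrition": data.get("nutrition", {}),
                "source_url": url,
            }
    logger.error("No Recipe markup found on %s", url)
    return None
```

Returning `None` and logging the reason, rather than raising, lets a scheduled run continue past individual bad pages while still leaving a trace for monitoring. If the site does not expose JSON-LD, the extraction step would need site-specific selectors instead.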

Definition of Done

  • The feature has been fully implemented.
  • The feature has been manually tested and works as expected without critical bugs.
  • The feature code is documented with clear explanations of its functionality and usage.
  • The feature code has been reviewed and approved by at least one team member.
  • The feature branches have been merged into the main branch and closed.
  • The feature's utility, function, and usage have been documented in the respective project wiki on GitHub.
@tubamos tubamos converted this from a draft issue Jun 25, 2024
@tubamos tubamos added the data pipeline Items that are related to the scrapers of the data pipeline label Jun 25, 2024
@tubamos tubamos added this to the Part A: Data acquisition milestone Jun 25, 2024
@tubamos tubamos moved this from Sprint Backlog to Awaiting Review in amos2024ss06-feature-board Jun 25, 2024
@tubamos tubamos added the sprint-09 Items assigned to sprint 09 label Jun 25, 2024
@tubamos tubamos moved this from Awaiting Review to In Progress in amos2024ss06-feature-board Jun 26, 2024
Projects
Status: In Progress
Development

No branches or pull requests

2 participants