New algorithm logic #33

Open
harshita-srivastava-yral opened this issue Jul 15, 2024 · 29 comments

@harshita-srivastava-yral

harshita-srivastava-yral commented Jul 15, 2024

  • We have aligned on the new algorithm logic and presented it to the audience.
  • Next step: set up the job architecture and deployment pipeline.
@harshita-srivastava-yral
Author

  • GA user attributes are not flowing into BigQuery yet: there is a 1-day lag before data starts flowing.

@harshita-srivastava-yral
Author

Issues identified:

  • Discrepancy: user attributes are flowing in as "NULL".
  • User IDs from before and after login are not getting merged.

Reference doc: https://support.google.com/analytics/answer/9268042?sjid=12265376955839615970-AP

@harshita-srivastava-yral
Author

Implementation discussion is planned today

@harshita-srivastava-yral
Author

  • Phase 2 issue to be created for scraping hashtags from YouTube and adding them to the recommendation logic for the user. Understand whether this can be done in the current phase and whether hashtags can be pushed to BigQuery.
  • Connect with Komal and Ravi to create an issue around the endpoint changes required to achieve the above task.

@Natasha-GB

  • Batch jobs and tables have been set
  • Explore third party options
  • Google's VectorDB option to be looked at
  • Set up monitoring on DS services

@harshita-srivastava-yral
Author

  • We will have to go with the third party.
  • The Google form for BigQuery has been filled in. Awaiting a response for support on Google Cloud.
  • Next steps:
  1. Await a response from Google to get it done in BigQuery.
  2. Another provider: spend time pursuing that front and see if all operations are available there. Set up the similarity search script and see if the algorithm returns a similar batch of videos.

@harshita-srivastava-yral
Author

harshita-srivastava-yral commented Jul 25, 2024

Next step:

  1. Whenever we get new data, we push it to Upstash.
  2. Set up a sanity check for the vectorDB sync.
  3. Nudge on the Google channel.
  4. Check if the required functionalities are in the database: https://cloud.google.com/alloydb/ai?hl=en (this is a transactional database, which is not available in BigQuery).
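
A sanity check for the vectorDB sync could start as a plain ID comparison between the source table and the vector index. The sketch below is only an illustration under assumptions: `sync_report` is a hypothetical helper, and it expects the two ID lists to have been fetched already (it does not call BigQuery or Upstash itself).

```python
def sync_report(source_ids, index_ids):
    """Compare IDs from the source table with IDs present in the vector index.

    Both arguments are plain iterables of ID strings (assumed to be fetched
    elsewhere); nothing here talks to a real database.
    """
    source, index = set(source_ids), set(index_ids)
    return {
        # Rows that exist in the source but were never pushed to the index.
        "missing_in_index": sorted(source - index),
        # Vectors left in the index for rows that no longer exist in the source.
        "stale_in_index": sorted(index - source),
        "in_sync": source == index,
    }


report = sync_report(["v1", "v2", "v3"], ["v2", "v3", "v4"])
print(report)
```

A count-only comparison would be cheaper, but it would miss the case where one row is missing and another is stale at the same time.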

@Natasha-GB

  • Can we get realtime location for users?
  • Successful video play logic on canisters

@jay-dhanwant-yral

jay-dhanwant-yral commented Jul 26, 2024

ML server setup

Logic (2 days)

  • Set up Upstash server
  • Sync BigQuery to Upstash
  • Heuristic checks for sync (can be picked later)
  • Enable API authentication
  • Finalising the logic

Deployment (1 day)

  • Add GitHub secret
  • Deploy to fly.io

Testing and Benchmarking (1-2 days post integration)
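
As a rough illustration of the sync step in the plan above, rows can be pushed to the vector store in fixed-size batches. This is a sketch under assumptions, not the actual DS code: `batched` and `sync_rows` are hypothetical helpers, and `upsert` is an injected callable standing in for whatever client call the vector store provides.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# One row to sync: (video id, embedding vector). The shape is an assumption.
Row = Tuple[str, List[float]]


def batched(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield lists of at most `size` rows, preserving order."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def sync_rows(rows: Iterable[Row],
              upsert: Callable[[List[Row]], None],
              batch_size: int = 100) -> int:
    """Send rows to the store batch by batch and return how many were sent."""
    total = 0
    for batch in batched(rows, batch_size):
        upsert(batch)  # hypothetical: replace with the real client call
        total += len(batch)
    return total


sent: List[List[Row]] = []
count = sync_rows([(str(i), [0.0]) for i in range(5)], sent.append, batch_size=2)
print(count, [len(b) for b in sent])
```

Passing `upsert` in as a parameter keeps the batching logic testable locally without credentials.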

@Natasha-GB

  • Need to set up Upstash server for prod
  • E2E for Staging to be verified on Jay's end

@harshita-srivastava-yral
Author

Next Step:

  • Working on the server setup today; expected to be completed in the first half of the day.
  • Test it with Komal, then we can close it from the DS end.

@jay-dhanwant-yral

jay-dhanwant-yral commented Jul 31, 2024

ML server setup

  • Set up Upstash server & initialise
  • Sync BigQuery to Upstash
  • Heuristic checks for sync (can be picked later)
  • Enable API authentication
  • Complete the DS codebase & local testing (2 days)
  • Change & test the new logic once canister data is available
  • code review
  • Adhoc changes
  • Deployment (1 day)

Testing and Benchmarking (1-2 days) ~ to be picked after the tech integration is done from the backend and frontend

@harshita-srivastava-yral
Author

  • After the canister update, Jay will need another 2 days, plus 1 additional day for testing.

@Natasha-GB

  • Git repo initiation today

@harshita-srivastava-yral
Author

  • ML server is working E2E and repo is shared.
  • Start fetching likes
  • Create a logic for sampling
    Monday Task:
    Deployment to Fly

@Natasha-GB

  • Final logic pending only

@harshita-srivastava-yral
Author

  • Need 1-2 days to complete the ML side of things
  • Rust side of things has been closed.

@Natasha-GB

Natasha-GB commented Aug 7, 2024

  • Server setup done
  • Issue: videos uploaded by the importer are not getting reflected in storage
  • Popularity integration (2-3 hours)
  • Rest sorted

@harshita-srivastava-yral
Author

  • Wrapped up the ML feed server and analysed its behaviour.
  • Found some issues and noted them in Notion (DS issues to be taken up in the next phase).
  • Popularity needs to be toned down, as NSFW video tagging is creating issues.
  • This is not happening in the main feed, as it is driven by freshness.
  • Staging link shared where we can see the new logic; however, fed content is not getting tagged in the feed yet.
  • Initial caching is needed, as the current load time is high for the personalised feed.

@Natasha-GB

Natasha-GB commented Aug 9, 2024

  • Freshness is fetched
  • Not using popularity due to NSFW tagging
  • Build a clustering mechanism to identify NSFW (people need to help us tag so that we can identify them correctly), use open source models to generate annotations
  • If NSFW toggle is on > show them initial training data
  • If toggle is off > for cold-start > have videos under guardrails (guardrail - how? warm annotations)
  • Cannot judge about user's preference in the first go, however, lots of NSFW complaints raised
  • HON Game data to still be incorporated into the model
  • Any negative response should be considered as a strong signal to not display NSFW videos
  • Put FE on hotornot.wtf for data gathering
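
One way to read the toggle and cold-start bullets above is as a filter applied before ranking. The sketch below is an interpretation, not the agreed logic: the `Video` fields, the `eligible_videos` function and its flags are all hypothetical names, and the toggle-on branch simply skips filtering.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Video:
    video_id: str
    nsfw: bool          # hypothetical tag coming from annotations
    guardrail_ok: bool  # hypothetical flag: cleared for cold-start users


def eligible_videos(videos: List[Video],
                    nsfw_toggle_on: bool,
                    is_cold_start: bool,
                    has_negative_signal: bool) -> List[Video]:
    """Return the videos a user may be shown, before any ranking."""
    # Any negative response is treated as a strong signal to hide NSFW videos,
    # even when the toggle is on.
    if has_negative_signal or not nsfw_toggle_on:
        pool = [v for v in videos if not v.nsfw]
    else:
        pool = list(videos)
    # Toggle off and cold start: keep only videos under guardrails.
    if is_cold_start and not nsfw_toggle_on:
        pool = [v for v in pool if v.guardrail_ok]
    return pool


catalog = [Video("a", False, True), Video("b", True, False), Video("c", False, False)]
print([v.video_id for v in eligible_videos(catalog, False, True, False)])
```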

Next Steps >

  • Upstash to pgvector to be explored
  • Behaviour to be confirmed
  • Instance sizing

@Natasha-GB

Natasha-GB commented Aug 12, 2024

  • Test vector db offerings
  • Finalise route for AI feed
  • Option 1: Go as is (deployed to new URL - hotornot.wtf - use for internal testing/organic audience) - recommended for now
  • Option 2: Hard Signals from user (Reflective and Predictive: blurring out videos > See Anyway; prevents exposure to all)
  • Option 3: NSFW tags and take a business decision to not let these videos flow

@Natasha-GB

  • VectorDB: BQ pre-filtering and querying solving our criteria (okay to go forward with)
  • Refactoring
  • Freshness along with popularity and relevance to be incorporated
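
For discussion, combining freshness with popularity and relevance could be as simple as a weighted sum with a time-decayed freshness term. Everything in this sketch is an assumption: the function names, the 24-hour half-life and the weights are placeholders, and relevance and popularity are assumed to be pre-scaled to the range 0 to 1.

```python
def freshness(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Exponential decay: 1.0 for a brand-new video, 0.5 after one half-life."""
    return 0.5 ** (age_hours / half_life_hours)


def feed_score(relevance: float,
               popularity: float,
               age_hours: float,
               weights=(0.6, 0.2, 0.2)) -> float:
    """Weighted sum of relevance, popularity and freshness (placeholder weights)."""
    w_rel, w_pop, w_fresh = weights
    return (w_rel * relevance
            + w_pop * popularity
            + w_fresh * freshness(age_hours))


# A relevant but older video versus a fresh video with lower relevance.
print(feed_score(relevance=0.9, popularity=0.3, age_hours=72))
print(feed_score(relevance=0.6, popularity=0.3, age_hours=1))
```

Keeping the popularity weight as a separate parameter makes it easy to tone popularity down while NSFW tagging is unresolved.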

@Natasha-GB

  • Metadata in BQ flowing correctly
  • Integration done
  • Plan to ship today

@Natasha-GB

  • v0: freshness added to serve more HON games and newer videos seeded
  • Testing done from Jay's end, signals make sense (tested locally)
  • Deployment to staging to be done today

@harshita-srivastava-yral
Author

  • Feed is not behaving as expected. Need to spend time checking what is not working.
  • Connect with Devansh to understand the next step on feed side

@Natasha-GB

  • Testing in local is in progress to figure out ongoing issues (Komal & Jay working together on this)
  • Test containers (only off-chain pending and to be set up)

@harshita-srivastava-yral
Author

  • Let's start building the ML feed phase 2
  • Reports on feed to be part of the logic implementation of phase 2
  • Prioritise it over other tasks.
  • If there is any video not part of listing but is proposed by model, figure out a way to ensure it should not break and become part of the list

@harshita-srivastava-yral
Author

  • Create an issue around removing duplicate videos from the feed
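
For that issue, a minimal order-preserving de-duplication could look like the sketch below. It assumes the feed is a list of video ID strings; `dedupe_feed` is a hypothetical name, not an existing function in the codebase.

```python
from typing import Iterable, List


def dedupe_feed(video_ids: Iterable[str]) -> List[str]:
    """Drop repeated video IDs, keeping the first occurrence and the ranking order."""
    seen = set()
    feed = []
    for vid in video_ids:
        if vid in seen:
            continue  # already placed earlier in the feed
        seen.add(vid)
        feed.append(vid)
    return feed


print(dedupe_feed(["a", "b", "a", "c", "b"]))  # ['a', 'b', 'c']
```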

@harshita-srivastava-yral
Author

This has been closed from @jay-dhanwant's end.


3 participants