Wk source zips #6

Open
wants to merge 16 commits into
base: dev
Conversation

wkelly17
Contributor

@PurpleGuitar, @danparisd - This is branched off dev before the merge, so it looks like 5 commits, but it is really just one:
9d38750.
This PR adds a view for the source_zips Mark requested, if they are actually what he wants. Replicated here with some notes:

image

I'm not sure whether this is exactly what you want, Craig. I can easily adjust it to add the full language name or anything else that Orature or another consumer needs.
Some notes:

  1. This assumes that the branch in WACS is master. The archive/master.zip path is a WACS convention.
  2. This assumes, as is true right now, that all of these projects are in WACS (these zips come from the WACS API).
  3. This filters content by domain (scripture only) as well as by the meta properties of biel/primary.
  4. I was under the impression that we only want to expose projects for which an entire project has rendered successfully, hence the count of unique rendered book slugs must be > 26 (i.e. 27 = NT; more than that and we assume it's probably an OT + NT).
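
A minimal sketch of notes 1 and 4, to make the assumptions concrete. The row shape, field names, and function names here are hypothetical and are not the view added in this PR; only the archive/master.zip suffix and the "> 26 unique book slugs" threshold come from the notes above.

```typescript
// Hypothetical row shape: one entry per successfully rendered book.
interface RenderedBook {
  repoUrl: string; // URL of the repo in WACS (assumed field)
  ietfCode: string;
  bookSlug: string;
}

interface SourceZip {
  ietfCode: string;
  zipUrl: string;
  bookCount: number;
}

// Note 4: more than 26 unique book slugs, i.e. at least 27 (a full NT).
const MIN_UNIQUE_BOOKS = 27;

// Note 1: the master branch archive lives at archive/master.zip.
function toZipUrl(repoUrl: string): string {
  return `${repoUrl.replace(/\/+$/, "")}/archive/master.zip`;
}

function listSourceZips(rows: RenderedBook[]): SourceZip[] {
  // Collect the unique book slugs seen for each repo.
  const byRepo = new Map<string, { ietfCode: string; slugs: Set<string> }>();
  for (const row of rows) {
    let entry = byRepo.get(row.repoUrl);
    if (!entry) {
      entry = { ietfCode: row.ietfCode, slugs: new Set<string>() };
      byRepo.set(row.repoUrl, entry);
    }
    entry.slugs.add(row.bookSlug.toLowerCase());
  }

  // Keep only repos that pass the completeness threshold.
  const result: SourceZip[] = [];
  byRepo.forEach((entry, repoUrl) => {
    if (entry.slugs.size >= MIN_UNIQUE_BOOKS) {
      result.push({
        ietfCode: entry.ietfCode,
        zipUrl: toZipUrl(repoUrl),
        bookCount: entry.slugs.size,
      });
    }
  });
  return result;
}
```

If the default branch ever stops being master, only `toZipUrl` would need to change in a sketch like this.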

Here is an example result:
image

If this isn't what you had in mind, we can certainly pivot, but this is at least what I understand that Mark/Orature was asking for.

cloudflare-workers-and-pages bot commented Apr 3, 2024

Deploying languageapi with  Cloudflare Pages  Cloudflare Pages

Latest commit: 1bb22ab
Status: ✅  Deploy successful!
Preview URL: https://61fe2e14.languageapi.pages.dev
Branch Preview URL: https://wk-source-zips.languageapi.pages.dev

Contributor Author

@wkelly17 wkelly17 left a comment

@PurpleGuitar @danparisd - I probably should have put this in a separate PR rather than attaching it to source zips, but since both will need reworking if we adjust the architecture, I figured it's fine to tack it on here.

Contributor Author

@wkelly17 wkelly17 Apr 3, 2024

@PurpleGuitar @danparisd - this is an initial proposal for tracking localization of Bible book names and of abbreviations like "tw" -> translation words. As seen, I figured the most logical unique index (and in this case I just made it the composite primary key) is the ietf already in the database plus a "key" for the string.
The one question I have is whether we want to add something like a "domain" to this table as well. That is, the domain of Bible books would be "bible_book", and the domain of "tn" would be something like resource types. There aren't currently conflicts between things like Gen and tn, and I'm not aware of any potential conflicts since the Bible book slugs are set in English. But thoughts?
I forgot to add the lines. Those are here: https://github.com/WycliffeAssociates/languageapi/pull/6/files/afea3a4b1556f3db8b0f4ccadd3d3547a845c8eb#diff-ea2dd11638ccb90cd55880ec76d010447b0b2737763ba9563fac7455956d9096R321-R338
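
To illustrate the composite key idea, here is a small in-memory sketch. It is not the table definition from this PR; the type, class, and field names are made up for illustration, and the optional `domain` field only models the open question above.

```typescript
// Hypothetical row shape for the localization table.
interface LocalizationRow {
  ietfCode: string; // language tag already stored in the database
  key: string;      // e.g. a book slug such as "gen", or "tn"
  value: string;    // the localized string
  domain?: string;  // open question: e.g. "bible_book" or "resource_type"
}

// Join the two key parts with a separator that is assumed never to
// appear in an IETF tag or a string key.
function compositeKey(ietfCode: string, key: string): string {
  return `${ietfCode}\u0000${key}`;
}

class LocalizationTable {
  private rows = new Map<string, LocalizationRow>();

  // Insert or overwrite: (ietfCode, key) behaves like a primary key.
  upsert(row: LocalizationRow): void {
    this.rows.set(compositeKey(row.ietfCode, row.key), row);
  }

  get(ietfCode: string, key: string): LocalizationRow | undefined {
    return this.rows.get(compositeKey(ietfCode, key));
  }

  count(): number {
    return this.rows.size;
  }
}
```

With this key, a book slug and a resource abbreviation only collide if they share the same string in the same language. Adding `domain` to the key would remove even that possibility, at the cost of a wider key.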

Comment on lines 63 to 101
// const query = sql.raw(`SELECT book_name, book_slug, ietf_code, id
// FROM (SELECT book_name, book_slug, l.ietf_code, c.id,
// ROW_NUMBER() OVER (PARTITION BY l.ietf_code, book_slug ORDER BY book_slug) AS rn
// FROM scriptural_rendering_metadata AS srm
// JOIN rendering AS r ON r.id = srm.rendering_id
// JOIN content AS c ON r.content_id = c.id
// JOIN language AS l ON l.ietf_code = c.language_id
// JOIN git_repo AS gr ON c.git_id = gr.id
// WHERE gr.username ILIKE '%wa-catalog%'
// AND c.domain = 'scripture'
// AND book_slug IS NOT NULL
// ) AS subquery
// WHERE rn = 1
// ORDER BY ietf_code, book_slug;`);
Contributor Author

@PurpleGuitar @danparisd - I've done it with the ORM instead of raw SQL, but I imagine the commented-out raw SQL might be more readable. In short, this query has a hardcoded dependency on the git repo username being wa-catalog. We only need one localized version for each slug (e.g. Gen, Exo), hence the select of rn = 1 from each partition.

The result looks like this:
image
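
For readers who don't use window functions often, the "rn = 1 per partition" step can be sketched in plain TypeScript. This is an illustration only, not the ORM query in this PR; the row type and function name are hypothetical. Because the SQL orders each partition by the same column it partitions on, which row gets rn = 1 is effectively arbitrary; this sketch keeps the first row in input order.

```typescript
// Hypothetical shape mirroring the selected columns of the query.
interface BookNameRow {
  bookName: string;
  bookSlug: string;
  ietfCode: string;
  id: string;
}

function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Keep one row per (ietfCode, bookSlug) partition, then sort like
// ORDER BY ietf_code, book_slug.
function firstPerLanguageAndSlug(rows: BookNameRow[]): BookNameRow[] {
  const seen = new Set<string>();
  const picked: BookNameRow[] = [];
  for (const row of rows) {
    const partition = `${row.ietfCode}|${row.bookSlug}`;
    if (seen.has(partition)) continue; // already have a row for this partition
    seen.add(partition);
    picked.push(row);
  }
  return picked.sort(
    (a, b) =>
      compareStrings(a.ietfCode, b.ietfCode) ||
      compareStrings(a.bookSlug, b.bookSlug),
  );
}
```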

Comment on lines 148 to 192
app.timer("manageLocalizationTable", {
  schedule: "0 0 0 * * *",
  handler: populateLocalization,
  useMonitor: false,
});
Contributor Author

We could run this on something other than a cron, but short of setting up something more complicated, such as event-driven inserts based on the metadata table, this is a really straightforward way to do it, and this data surely isn't going to be that time critical, I'd imagine.
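
As a reading aid for the schedule string: assuming it uses the six-field NCRONTAB layout of Azure Functions timer triggers (second, minute, hour, day, month, day of week), "0 0 0 * * *" fires once a day at midnight. The helper below is a hypothetical sketch that only labels the fields; it does not validate or evaluate the expression.

```typescript
// Field order assumed from the six-field NCRONTAB format.
const FIELD_NAMES = [
  "second",
  "minute",
  "hour",
  "dayOfMonth",
  "month",
  "dayOfWeek",
] as const;

type FieldName = (typeof FIELD_NAMES)[number];

// Split a six-field schedule string into named fields.
function describeSchedule(schedule: string): Record<FieldName, string> {
  const parts = schedule.trim().split(/\s+/);
  if (parts.length !== FIELD_NAMES.length) {
    throw new Error(`expected ${FIELD_NAMES.length} fields, got ${parts.length}`);
  }
  const described = {} as Record<FieldName, string>;
  FIELD_NAMES.forEach((name, index) => {
    described[name] = parts[index] as string;
  });
  return described;
}

// second = 0, minute = 0, hour = 0, every day, every month, every weekday.
const nightly = describeSchedule("0 0 0 * * *");
```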

Comment on lines 1 to 7
const en = {
  tw: "Translation Words",
  tn: "Translation Notes",
};
export type keysType = keyof typeof en;
export default {dict: en, ietf: "en"};
Contributor Author

I know you mentioned doing this in Crowdin, and we could most likely do it via their API. For now, though, I've put these into TS files. We probably need to decide the scope of what to translate. Moreover, there's some junky data currently in the content resource types that needs cleaning up, where the resource type is clearly not something we would consider a resource type. Probably worth having a discussion on.
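
One way dictionaries shaped like the file above could be consumed is a lookup that falls back to English when a language has no entry for a key. This is a sketch under assumptions: the `Dictionary` type, the `localize` function, and the Spanish entry are illustrative and are not part of this PR.

```typescript
const en = {
  tw: "Translation Words",
  tn: "Translation Notes",
};
type KeysType = keyof typeof en;

// Other languages may cover only some of the English keys.
interface Dictionary {
  dict: Partial<Record<KeysType, string>>;
  ietf: string;
}

// Illustrative partial dictionary (not from the PR).
const es: Dictionary = { dict: { tn: "Notas de traducción" }, ietf: "es" };

// Look up a key for a language, falling back to the English string.
function localize(
  key: KeysType,
  ietf: string,
  dictionaries: Dictionary[],
): string {
  const match = dictionaries.find((d) => d.ietf === ietf);
  return match?.dict[key] ?? en[key];
}

const translated = localize("tn", "es", [es]); // found in the es dictionary
const fallback = localize("tw", "es", [es]);   // missing in es, uses English
```

Typing the keys from the English file means a typo in a key is a compile error, which may help while the scope of strings is still being decided.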

@PurpleGuitar

Re: the source.zip list, I think this is a great first pass. If I read it right, it should return a JSON document containing all the source zips with some metadata. Looks good to me. 👍

Contributor

@danparisd danparisd left a comment

LGTM
