If no other training data, don't add any keyterms #580
base: main
Codecov Report
All modified and coverable lines are covered by tests ✅
@@            Coverage Diff             @@
##             main     #580      +/-   ##
==========================================
- Coverage   62.97%   62.92%   -0.06%
==========================================
  Files         279      279
  Lines       13984    13989       +5
  Branches     1814     1817       +3
==========================================
- Hits         8807     8803       -4
- Misses       4551     4560       +9
  Partials      626      626
Reviewed 3 of 4 files at r1, all commit messages.
Reviewable status: 3 of 4 files reviewed, 2 unresolved discussions (waiting on @ddaspit and @johnml1135)
src/Machine/test/Serval.Machine.Shared.Tests/Services/PreprocessBuildJobTests.cs
line 747 at r1 (raw file):
Assert.That(
    pretranslations[2]!["translation"]!.ToString(),
    Is.EqualTo("Source one, chapter twelve, verse one.")
);
Is there a way to 'save' this test? Or are we happy to just leave it in the commit history? I think we'll ultimately need a test like this once we enable some kind of filtering and, as you can see, it's a fairly long test.
src/ServiceToolkit/src/SIL.ServiceToolkit/Services/ParallelCorpusPreprocessingService.cs
line 89 at r1 (raw file):
}
if (useKeyTerms)
Why not just add && parallelTrainingDataPresent to this condition?
My only other concern is that I think the EITL team needs to 'buy in' before we do this. My impression is that they already feel undermined by how the KBT changes were made and by how unstable the resulting behavior in Serval/SF has been. Even a quick email or Slack message with an asking-not-telling tone would do it. I'm not sure whether that has already been done.
Reviewable status: 3 of 4 files reviewed, 2 unresolved discussions (waiting on @ddaspit and @johnml1135)
I agree. We need buy-in from the EITL team before we proceed.
Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @johnml1135)
Previously, Enkidu93 (Eli C. Lowry) wrote…
We could "ignore" or otherwise skip the test. That would be roughly equivalent to saving it in the commit history, just more visible. I am fine either way. Also, if we actually implement this, we will likely want to change it to "filter on the pretranslations" rather than on the training data.
Previously, Enkidu93 (Eli C. Lowry) wrote…
Because we won't know whether there is any parallel training data at all until every corpus has been processed. The simplest solution I could think of was to collect the key terms as we go and, if there was data, add them once all the corpora have been looped through. Otherwise, we would have to loop through the corpora twice.
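A minimal sketch of that single-pass deferral, assuming a heavily simplified model of the preprocessing loop. Only useKeyTerms and parallelTrainingDataPresent come from the PR; the TrainRow and Corpus records, the KeyTermDeferral class, and its Preprocess method are hypothetical stand-ins, not the actual SIL.ServiceToolkit types. It shows why the check has to sit after the loop rather than on the if (useKeyTerms) line inside it: key terms from an early corpus must survive if a later corpus turns out to contain parallel data.

```csharp
using System.Collections.Generic;

// Hypothetical, simplified row type (not the real SIL.ServiceToolkit type).
public record TrainRow(string Source, string Target);

// Hypothetical corpus shape: parallel training rows plus key-term pairs.
public record Corpus(IReadOnlyList<TrainRow> ParallelRows, IReadOnlyList<TrainRow> KeyTerms);

public static class KeyTermDeferral
{
    // Single pass over the corpora: parallel rows are emitted immediately,
    // while key-term rows are buffered until we know whether any corpus
    // contributed parallel training data.
    public static List<TrainRow> Preprocess(IEnumerable<Corpus> corpora, bool useKeyTerms)
    {
        var output = new List<TrainRow>();
        var pendingKeyTerms = new List<TrainRow>();
        bool parallelTrainingDataPresent = false;

        foreach (Corpus corpus in corpora)
        {
            foreach (TrainRow row in corpus.ParallelRows)
            {
                output.Add(row);
                parallelTrainingDataPresent = true;
            }

            // Checking parallelTrainingDataPresent here would wrongly drop
            // key terms from corpora that precede the first parallel data.
            if (useKeyTerms)
                pendingKeyTerms.AddRange(corpus.KeyTerms);
        }

        // Only now, after every corpus has been seen, is the flag final.
        if (useKeyTerms && parallelTrainingDataPresent)
            output.AddRange(pendingKeyTerms);

        return output;
    }
}
```

Buffering trades holding the key terms in memory until the end of the loop for avoiding the second pass over the corpora.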
Request for input made.
In the meeting that you had to leave, @johnml1135, I got the go-ahead from the EITL team for this PR.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @johnml1135)
Resolves #569