
TODOs

08-07-2022

  • Run baseline with more CRAC things
  • Why is TRIM not working?
  • Effects of scheduling: try with different schedules
  • Run a grid of scheduling experiments - wire it up

18-07-2022

  • Talking about speaker_ids:

  • Embeddings may be domain-specific.

  • So it's possible that in some domains we have concatenated this, and not in other domains.

  • Which creates a mismatch in slow_antecedent_scorer (Ctrl+F this in the model).

  • So we need to create faux speaker IDs for EVERYTHING, and a flag to either include them or leave them out entirely.

    • One approach would be to add an (n+1)-th embedding and use it in cases where a domain has no speakers (see the sketch after this list).
    • Alternatively, just put [0, 0, 0, ...] everywhere.
    • THIS NEEDS TO BE DONE TO MAKE THE MODEL WORK WITH SPEAKERS.
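
A minimal sketch of the (n+1)-th fallback embedding idea, assuming a PyTorch model; the class and attribute names are hypothetical, not the repo's actual code:

```python
from typing import Optional

import torch
import torch.nn as nn

class SpeakerEmbedding(nn.Module):
    """Speaker embedding with a reserved slot for speaker-less domains."""

    def __init__(self, n_speakers: int, dim: int):
        super().__init__()
        # index n_speakers is the faux "no speaker" ID
        self.emb = nn.Embedding(n_speakers + 1, dim)
        self.no_speaker_id = n_speakers

    def forward(self, speaker_ids: Optional[torch.Tensor], n_tokens: int) -> torch.Tensor:
        if speaker_ids is None:
            # domain without speakers: every token gets the faux ID
            speaker_ids = torch.full((n_tokens,), self.no_speaker_id, dtype=torch.long)
        return self.emb(speaker_ids)
```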

20-07-2022

So that seems to be done. Now there are two bugs that we need to fix. Urgently.

  1. There's something very wrong with persuasion.
    1. First step is to try re-running the baseline on the current commit without speakers, using the --ignore-speakers flag.
    2. Then re-run the baseline associated with the right commit and see if that does something. You'd better hope it does.
    3. Then it's a matter of opening diffs, or the preproc, to see what went wrong. Basically I have no idea.
  2. OntoNotes full does not even fit in memory. Not even the train set. We really do need to move to the HOI way of doing things (streaming sketch below). Shall we just do that from here on?
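
A minimal sketch of the streaming direction, assuming PyTorch and one-document-per-line JSONL shards on disk; the class and file layout are assumptions, not the actual HOI pipeline:

```python
import json
from pathlib import Path

from torch.utils.data import IterableDataset

class StreamingCorefDataset(IterableDataset):
    """Yields one document at a time so the full train split never sits in RAM."""

    def __init__(self, data_dir: str):
        self.files = sorted(Path(data_dir).glob("*.jsonl"))

    def __iter__(self):
        for path in self.files:
            with path.open() as f:
                for line in f:
                    yield json.loads(line)  # one document per line
```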

22-07-2022

So after three days, the bug is fixed. And what a bug fix, wow, fuck me sideways. Anyway. We'll be running a few more experiments, all on persuasion (or light), trying to maximize the numbers we're getting. Specifically:

  1. cr-pr-ccper-cr-pr-on-lrc-lspru
  2. full onto along with persuasion (on the 32G GPU I just spotted)
  3. switchboard baselines
  4. throw in an NER dataset.
  5. NER onto vs Coref onto

Other TODOs

  • Do loss scales (defined in the task triple) change anything? (See the sketch below.)
  • Coref num speakers variable... is it needed? It shouldn't be.
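
For reference, a minimal sketch of how per-task loss scales would combine losses, assuming the triple is (task, dataset, scale); the format and names here are assumptions:

```python
# Hypothetical task triples: (task_name, dataset_name, loss_scale).
tasks = [("coref", "ontonotes", 1.0), ("ner", "ontonotes", 0.5)]

def combine_losses(losses: dict) -> float:
    """Weight each task's loss by its scale before summing."""
    return sum(scale * losses[task] for task, _, scale in tasks)

# combine_losses({"coref": 2.3, "ner": 0.9}) == 2.3 * 1.0 + 0.9 * 0.5
```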

01-08-2022

  • Implement the domain adversarial learning thing (gradient-reversal sketch after this list)
  • Change encoder to something autoregressive
  • Analyse predictions of the model - look at which samples are predicted. Can we curtail spans by POS tags?
  • Feed spans from the data-iter part instead of from within the model
  • Integrate the Universal Anaphora Scorer
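
The standard recipe for this is a gradient reversal layer (Ganin & Lempitsky's DANN); a minimal PyTorch sketch, with the discriminator shape as an assumption:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips (and scales) gradients backward."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class DomainDiscriminator(nn.Module):
    """Predicts the source domain from encoder states; the reversed
    gradient pushes the encoder toward domain-invariant features."""

    def __init__(self, hidden: int, n_domains: int, lam: float = 1.0):
        super().__init__()
        self.lam = lam
        self.clf = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_domains))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.clf(GradReverse.apply(h, self.lam))
```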

03-08-2022

  • Make different chunks of the MTL model (module sketch after this list)
  • Encoder chunk (that extends Transformers, sure)
  • NER chunk
  • Coref chunk
  • Integration logic remains in MTL, but modules do their job well.

  • Compute baseline results
  • Implement shared dense layers (a 2-layer dense net with the same or half dimensionality)
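
A minimal sketch of that decomposition plus the shared dense layers; all module names here are hypothetical:

```python
import torch.nn as nn

class SharedDense(nn.Module):
    """2-layer dense net; output dim is the input dim or half of it."""

    def __init__(self, in_dim: int, halve: bool = False):
        super().__init__()
        out_dim = in_dim // 2 if halve else in_dim
        self.net = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU(),
                                 nn.Linear(out_dim, out_dim))

    def forward(self, x):
        return self.net(x)

class MTLModel(nn.Module):
    """Integration logic lives here; each chunk does its own job."""

    def __init__(self, encoder, ner_head, coref_head, hidden: int):
        super().__init__()
        self.encoder = encoder  # e.g. wraps a Transformers model
        self.shared = SharedDense(hidden)
        self.ner = ner_head
        self.coref = coref_head

    def forward(self, batch, task: str):
        h = self.shared(self.encoder(batch))
        return self.ner(h) if task == "ner" else self.coref(h)
```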

16-08-2022

  • Figure out the deal with CoNLL12
  • Implement Shared Pruner (integrated in there)

Refactoring for multiple NER tags per span (multi-label sketch below)

  • Figure out how loss scales work, and simplify them
  • Recheck if the tokenizer is working as intended (do we need special chars or not)
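
One common way to allow multiple tags per span is to swap softmax cross-entropy for per-tag sigmoids; a minimal sketch, with sizes and names as assumptions rather than the repo's actual scorer:

```python
import torch
import torch.nn as nn

n_tags, span_dim = 18, 768  # illustrative sizes
scorer = nn.Linear(span_dim, n_tags)

def ner_loss(span_reprs: torch.Tensor, gold: torch.Tensor) -> torch.Tensor:
    """gold is an (n_spans, n_tags) multi-hot matrix, so each span can carry several tags."""
    logits = scorer(span_reprs)
    return nn.functional.binary_cross_entropy_with_logits(logits, gold.float())
```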

21-08-2022

  • Fix the MEMORY LEAK
  • Add a POS tagging task (which is token-level, for a change. Yay ^^)
  • Can we make the sampling stuff resample every epoch? (See the sketch after this list.)
  • Figure out how sampling ratios work
  • Simulate the loss thing (all entries get a 1)
  • DWIE baselines
  • Check if ON works better with SpanBERT base or large
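
A minimal sketch of per-epoch resampling driven by ratios; the function and its arguments are hypothetical:

```python
import random

def make_epoch_schedule(ratios: dict, steps: int) -> list:
    """Re-draw, at the start of every epoch, which task each step trains on."""
    tasks = list(ratios)
    return random.choices(tasks, weights=[ratios[t] for t in tasks], k=steps)

# e.g. make_epoch_schedule({"coref": 1.0, "ner": 0.5}, steps=4000)
```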

23-08-2022

  • Figure out what's going wrong with NER eval (accuracy stuff)

Leftovers

  • Implement Shared Pruner (integrated in there)
  • Recheck if tokenizer is working as intended (do we need special chars or not)
  • Make setup.sh conditional and verbose
  • Should we make a new repo for preproc data management?: NO!

General

  • Find a way to easily run sweeps

ACE

  • Figure out NER metrics and clamp them down tight.
  • Get ACE Sweep Started
  • Find a place to note down important experiment results

Ontonotes

So I'm still short of the baseline. Time to fork off and try to get baseline OntoNotes working well. First step: get the current best result out. Sweep.

  • Run a bert-base-uncased sweep
  • Mimic parameter init and see if performance rises or falls (sketch below)
  • Okay, it's time to write an OAR management class/thing
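
"Mimic parameter init" presumably means matching the upstream coref code's initialization; a sketch assuming BERT-style normal init (std 0.02) on the task heads, which may not be what the reference code does:

```python
import torch.nn as nn

def bert_style_init(module: nn.Module) -> None:
    """Apply BERT-like init to linear layers (an assumption, see note above)."""
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# model.apply(bert_style_init)
```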

21-09-2022

  • [!] Redo all NER runs (we are working with the shared pruner!)
  • Add a flag to disassociate NER from the pruner stuff (flag sketch below).
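
A sketch of what that flag could look like; the flag name and wiring are hypothetical, not an existing CLI option:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--ner-unshared-pruner", action="store_true",
                    help="score NER spans independently of the shared pruner")
args = parser.parse_args()

# Inside the model, pick the span source based on the flag:
# spans = ner_pruner(doc) if args.ner_unshared_pruner else shared_pruner(doc)
```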