Added results
dvs23 committed Jul 5, 2024
1 parent 180b463 commit 6ee1d3d
Showing 62 changed files with 1,393,667 additions and 0 deletions.

Large diffs are not rendered by default.

150 changes: 150 additions & 0 deletions results/GPT/finetuned-gpt-3.5-turbo/datasets/test_prompt1.jsonl

150 changes: 150 additions & 0 deletions results/GPT/finetuned-gpt-3.5-turbo/datasets/test_prompt2.jsonl

150 changes: 150 additions & 0 deletions results/GPT/finetuned-gpt-3.5-turbo/datasets/test_prompt3.jsonl

150 changes: 150 additions & 0 deletions results/GPT/finetuned-gpt-3.5-turbo/datasets/test_prompt4.jsonl

150 changes: 150 additions & 0 deletions results/GPT/finetuned-gpt-3.5-turbo/datasets/test_prompt5.jsonl

367 changes: 367 additions & 0 deletions results/GPT/finetuned-gpt-3.5-turbo/datasets/train_prompt1.jsonl

367 changes: 367 additions & 0 deletions results/GPT/finetuned-gpt-3.5-turbo/datasets/train_prompt2.jsonl

367 changes: 367 additions & 0 deletions results/GPT/finetuned-gpt-3.5-turbo/datasets/train_prompt3.jsonl

367 changes: 367 additions & 0 deletions results/GPT/finetuned-gpt-3.5-turbo/datasets/train_prompt4.jsonl

367 changes: 367 additions & 0 deletions results/GPT/finetuned-gpt-3.5-turbo/datasets/train_prompt5.jsonl

41 changes: 41 additions & 0 deletions results/GPT/finetuned-gpt-3.5-turbo/datasets/valid_prompt1.jsonl

41 changes: 41 additions & 0 deletions results/GPT/finetuned-gpt-3.5-turbo/datasets/valid_prompt2.jsonl

41 changes: 41 additions & 0 deletions results/GPT/finetuned-gpt-3.5-turbo/datasets/valid_prompt3.jsonl

41 changes: 41 additions & 0 deletions results/GPT/finetuned-gpt-3.5-turbo/datasets/valid_prompt4.jsonl

41 changes: 41 additions & 0 deletions results/GPT/finetuned-gpt-3.5-turbo/datasets/valid_prompt5.jsonl

5,394 changes: 5,394 additions & 0 deletions results/GPT/gpt-3.5-turbo/QALD9_gpt-3.5-turbo_0-shot_test.csv

6,466 changes: 6,466 additions & 0 deletions results/GPT/gpt-3.5-turbo/QALD9_gpt-3.5-turbo_0-shot_test_lexicon.csv

22,618 changes: 22,618 additions & 0 deletions results/GPT/gpt-3.5-turbo/QALD9_gpt-3.5-turbo_0-shot_train.csv

15,691 changes: 15,691 additions & 0 deletions results/GPT/gpt-3.5-turbo/QALD9_gpt-3.5-turbo_0-shot_train_lexicon.csv

4,238 changes: 4,238 additions & 0 deletions results/GPT/gpt-4/QALD9_gpt-4_0-shot_test.csv

3,887 changes: 3,887 additions & 0 deletions results/GPT/gpt-4/QALD9_gpt-4_0-shot_test_lexicon.csv

11,135 changes: 11,135 additions & 0 deletions results/GPT/gpt-4/QALD9_gpt-4_0-shot_train.csv

10,391 changes: 10,391 additions & 0 deletions results/GPT/gpt-4/QALD9_gpt-4_0-shot_train_lexicon.csv

Binary file not shown.
630,065 changes: 630,065 additions & 0 deletions results/NeoDUDES/29376s-all-train/chosen-queries-per-strategy-train.csv

90 changes: 90 additions & 0 deletions results/NeoDUDES/29376s-all-train/strategy-total-eval-train.csv
@@ -0,0 +1,90 @@
"Strategy","Micro F1","Micro TP","Micro FP","Micro FN","Micro EM","Micro Precision","Micro Recall","Macro F1","Macro Precision","Macro Recall"
"BestScoreEval","0,685346733668342",8524,1311,6516,145,"0,86670055922725","0,566755319148936","0,382261006228723","0,386424967640807","0,398045708951882"
"LLMMostWinsEval_0_0.0","0,165987448039775",4073,29963,10967,122,"0,119667410976613","0,270811170212766","0,325366508401639","0,328172222839548","0,36202121564206"
"LLMMostWinsEval_0_0.1","0,218986891437016",4001,17500,11039,122,"0,186084368168922","0,266023936170213","0,323823166310113","0,326144560330641","0,354991512881033"
"LLMMostWinsEval_0_0.25","0,247499374843711",3959,12993,11081,125,"0,233541764983483","0,263231382978723","0,333966836519941","0,337384623820406","0,359861197380347"
"LLMMostWinsEval_0_0.5","0,268510610340976",4087,11315,10953,124,"0,265355148681989","0,271742021276596","0,331210828438958","0,335429923605021","0,359512798133642"
"LLMMostWinsEval_0_0.75","0,265229836672841",3938,10717,11102,122,"0,268713749573524","0,261835106382979","0,325026508017797","0,329885022481599","0,347510770029919"
"LLMMostWinsEval_0_0.9","0,303387334315169",3914,6848,11126,119,"0,363687047017283","0,260239361702128","0,318770109177905","0,324267778568707","0,339822779008595"
"LLMAccumEval_0_sigmoid","0,257885327544375",4039,12245,11001,125,"0,248034880864652","0,268550531914894","0,332820374070796","0,335712542382242","0,36008566427148"
"LLMAccumEval_0_logits","0,257827710574192",4039,12252,11001,126,"0,247928303971518","0,268550531914894","0,334805936656314","0,337920691021767","0,355147392666542"
"LLMMostWinsEval_1_0.0","0,269374448142362",3966,10440,11074,114,"0,275301957517701","0,263696808510638","0,304687093449226","0,307679940118861","0,327462667194623"
"LLMMostWinsEval_1_0.1","0,291322388598232",4037,8638,11003,119,"0,318500986193294","0,268417553191489","0,317447407571822","0,319299124802778","0,339876933312592"
"LLMMostWinsEval_1_0.25","0,28925679065434",4036,8830,11004,121,"0,31369501010415","0,268351063829787","0,322120870260364","0,324016960248425","0,344687330652808"
"LLMMostWinsEval_1_0.5","0,339468126373863",3938,4223,11102,120,"0,482538904546012","0,261835106382979","0,317261951311722","0,320322319072941","0,328682122682754"
"LLMMostWinsEval_1_0.75","0,342266766744669",3876,3733,11164,120,"0,509396766986463","0,257712765957447","0,315530045630369","0,317853183270472","0,327348428891639"
"LLMMostWinsEval_1_0.9","0,357677202169441",3825,2523,11215,116,"0,602551984877127","0,254321808510638","0,305013968486206","0,307615088868952","0,312533614076824"
"LLMAccumEval_1_sigmoid","0,299503193754436",4009,7722,11031,121,"0,341744096837439","0,26655585106383","0,321200921938363","0,324640441192482","0,337613558826996"
"LLMAccumEval_1_logits","0,266984347595883",4034,11145,11006,125,"0,265761907899071","0,268218085106383","0,329528734409134","0,330293169615012","0,351546041215222"
"LLMMostWinsEval_2_0.0","0,306108229465046",3954,6840,11086,112,"0,366314619232907","0,262898936170213","0,299874868826942","0,303042749677626","0,320329608209712"
"LLMMostWinsEval_2_0.1","0,353083217815141",3948,3375,11092,116,"0,539123310118804","0,2625","0,309335327670813","0,313528313327153","0,317913947099763"
"LLMMostWinsEval_2_0.25","0,363931989222336",3917,2569,11123,116,"0,603916127042862","0,260438829787234","0,311203509785908","0,314869705833205","0,322852218704701"
"LLMMostWinsEval_2_0.5","0,371384045972533",3813,1681,11227,116,"0,694029850746269","0,253523936170213","0,306634125136808","0,308790394198337","0,31495702514221"
"LLMMostWinsEval_2_0.75","0,648366949118829",8149,1948,6891,121,"0,807071407348717","0,541821808510638","0,317027675952437","0,318099400820005","0,330737335366965"
"LLMMostWinsEval_2_0.9","0,639331397630072",8147,2299,6893,120,"0,779915757227647","0,541688829787234","0,314064712989474","0,31535591659504","0,325799063762027"
"LLMAccumEval_2_sigmoid","0,369242518256885",3868,2043,11172,119,"0,654373202503807","0,25718085106383","0,314228936392752","0,317063666862507","0,327554190208511"
"LLMAccumEval_2_logits","0,367775831873905",3885,2202,11155,123,"0,638245441103992","0,258311170212766","0,322158531063252","0,322503269420265","0,336401926834026"
"LLMMostWinsEval_3_0.0","0,228370221327968",4086,16658,10954,122,"0,196972618588508","0,271675531914894","0,323401584977872","0,32653221444582","0,349376247441536"
"LLMMostWinsEval_3_0.1","0,291823358762222",4074,8807,10966,120,"0,316279791941619","0,270877659574468","0,316476029664387","0,319118783486247","0,339705465548532"
"LLMMostWinsEval_3_0.25","0,304744046502906",4063,7562,10977,119,"0,349505376344086","0,270146276595745","0,317715712238655","0,322228311243486","0,339376247441536"
"LLMMostWinsEval_3_0.5","0,277269624573379",4062,10198,10978,117,"0,284852734922861","0,270079787234043","0,314760501664397","0,319763057730015","0,338553202174046"
"LLMMostWinsEval_3_0.75","0,291625759063307",4010,8451,11030,115,"0,321804028569136","0,266622340425532","0,308914770659272","0,31321929699141","0,325904900054845"
"LLMMostWinsEval_3_0.9","0,305992122672173",4001,7110,11039,113,"0,360093600936009","0,266023936170213","0,302433075547684","0,306009593595317","0,321021498134406"
"LLMAccumEval_3_sigmoid","0,300025973062224",4043,7868,10997,124,"0,339434136512467","0,268816489361702","0,329111252622202","0,33344918362634","0,343617123993416"
"LLMAccumEval_3_logits","0,297844210217048",4055,8134,10985,126,"0,332677003855936","0,269614361702128","0,335914737203465","0,340594137437961","0,345861792904752"
"LLMMostWinsEval_4_0.0","0,206872193349816",4100,20498,10940,124,"0,166680217903895","0,272606382978723","0,323097006850843","0,325380345941872","0,355807847391655"
"LLMMostWinsEval_4_0.1","0,223053197587139",4086,17511,10954,122,"0,18919294346437","0,271675531914894","0,319730003483839","0,321638655533515","0,350869575786716"
"LLMMostWinsEval_4_0.25","0,231832925510727",4074,16032,10966,117,"0,202626081766637","0,270877659574468","0,313702132607726","0,318521359400096","0,342940906376565"
"LLMMostWinsEval_4_0.5","0,497806358555202",8283,9955,6757,118,"0,45416164053076","0,550731382978723","0,317247552039026","0,322133082947989","0,341022337976516"
"LLMMostWinsEval_4_0.75","0,274906266738083",4106,10726,10934,114,"0,276833872707659","0,273005319148936","0,308844213428417","0,312646434871274","0,330843171659783"
"LLMMostWinsEval_4_0.9","0,276927182360986",4151,10788,10889,111,"0,277863310797242","0,275997340425532","0,300229909737861","0,304487625230983","0,321085336901948"
"LLMAccumEval_4_sigmoid","0,275332408566499",4069,10448,10971,122,"0,280292071364607","0,270545212765957","0,322091659866183","0,32537539476799","0,342618546980132"
"LLMAccumEval_4_logits","0,304804520116795",4019,7312,11021,123,"0,354690671608861","0,267220744680851","0,32591015521297","0,328912275553136","0,343769415362639"
"LLMMostWinsEval_5_0.0","0,13302034428795",4165,43417,10875,109,"0,0875331007523854","0,276928191489362","0,293210684019692","0,296326312043283","0,336797203180132"
"LLMMostWinsEval_5_0.1","0,13496952353236",4174,42637,10866,116,"0,0891670761145885","0,277526595744681","0,312772785838803","0,316042199208401","0,351387551103813"
"LLMMostWinsEval_5_0.25","0,148221309629014",4277,38394,10763,126,"0,100232007686719","0,284375","0,337838416994284","0,340930545662373","0,383486316535912"
"LLMMostWinsEval_5_0.5","0,155265994666868",4105,33732,10935,123,"0,108491688030235","0,272938829787234","0,33099311487794","0,335856959567217","0,371824183593176"
"LLMMostWinsEval_5_0.75","0,400145260864302",8264,18001,6776,123,"0,314639253759756","0,549468085106383","0,328973773456225","0,334425920027708","0,35947850458083"
"LLMMostWinsEval_5_0.9","0,408298508938948",8256,17145,6784,122,"0,325026573756939","0,548936170212766","0,326279783169642","0,331839022082941","0,352997023099349"
"LLMAccumEval_5_sigmoid","0,146232724030317",4264,39014,10776,125,"0,0985258098803087","0,283510638297872","0,331301612135124","0,334722192116731","0,375809548859144"
"LLMAccumEval_5_logits","0,145165813643694",4141,37871,10899,126,"0,0985670760735028","0,275332446808511","0,335143745778739","0,338284341351048","0,375258527027519"
"LLMMostWinsEval_6_0.0","0,446437124155181",8389,14153,6651,118,"0,372149764883329","0,557779255319149","0,313419393258965","0,316878055699837","0,339484116253109"
"LLMMostWinsEval_6_0.1","0,428923667886527",8263,15226,6777,124,"0,351781685044063","0,549401595744681","0,32930522352104","0,33260662548245","0,354298931067923"
"LLMMostWinsEval_6_0.25","0,470849122807018",8387,12198,6653,126,"0,407432596550887","0,557646276595745","0,334346770734864","0,336724006638819","0,360554075100845"
"LLMMostWinsEval_6_0.5","0,536437978258728",8167,7242,6873,121,"0,530014926341748","0,543018617021277","0,323020961982397","0,328042851505633","0,337557294150518"
"LLMMostWinsEval_6_0.75","0,55372293490803",8158,6268,6882,122,"0,565506723970609","0,542420212765957","0,323178818427884","0,328842769806269","0,331850846962589"
"LLMMostWinsEval_6_0.9","0,321969383533306",3891,5239,11149,123,"0,426177437020811","0,258710106382979","0,325179669854022","0,33101261642056","0,331805122225506"
"LLMAccumEval_6_sigmoid","0,277846864451945",3910,9195,11130,127,"0,298359404807325","0,259973404255319","0,332773845681641","0,336480875714266","0,346890128668537"
"LLMAccumEval_6_logits","0,243147711769813",3921,13291,11119,125,"0,227806181733674","0,260704787234043","0,329481664611682","0,332777172010562","0,344420992866068"
"LLMMostWinsEval_7_0.0","0,195127748068925",4105,22930,10935,120,"0,151840207138894","0,272938829787234","0,316551166567279","0,318544832157396","0,345432488868148"
"LLMMostWinsEval_7_0.1","0,191071304589322",4128,24041,10912,121,"0,146544073272037","0,274468085106383","0,322530081865761","0,328110487302143","0,352810758213688"
"LLMMostWinsEval_7_0.25","0,195761808926246",4134,23061,10906,120,"0,152013237727523","0,274867021276596","0,322826067238915","0,328865313350088","0,35781761692425"
"LLMMostWinsEval_7_0.5","0,281172656604037",4033,9614,11007,119,"0,29552282552942","0,268151595744681","0,319313324952228","0,323683852191212","0,341849447225739"
"LLMMostWinsEval_7_0.75","0,574704607425016",8220,5346,6820,122,"0,605926581158779","0,546542553191489","0,326415673860188","0,330929376615098","0,341849447225739"
"LLMMostWinsEval_7_0.9","0,636946138014058",8201,2510,6839,117,"0,765661469517319","0,545279255319149","0,314002966428323","0,318503731272786","0,326988907673842"
"LLMAccumEval_7_sigmoid","0,214099760491513",4112,19260,10928,125,"0,175937018654801","0,273404255319149","0,334247035903934","0,340387682890189","0,360012404304223"
"LLMAccumEval_7_logits","0,189169343735659",4122,24418,10918,126,"0,144428871758935","0,27406914893617","0,336359371431275","0,34266007809572","0,359806642987351"
"LLMMostWinsEval_8_0.0","0,287996498522814",3948,8429,11092,120,"0,318978750908944","0,2625","0,314361363369775","0,315237578458957","0,334512026660806"
"LLMMostWinsEval_8_0.1","0,369517183019134",3930,2301,11110,119,"0,630717380837747","0,261303191489362","0,311336179096951","0,312424684445747","0,322120622911377"
"LLMMostWinsEval_8_0.25","0,377608288962441",3936,1871,11104,116,"0,677802651971758","0,261702127659574","0,307138648232754","0,309955548643278","0,317182351306439"
"LLMMostWinsEval_8_0.5","0,372866730584851",3889,1931,11151,113,"0,668213058419244","0,258577127659574","0,302080457940161","0,304577920437353","0,311352447328387"
"LLMMostWinsEval_8_0.75","0,379676267788156",3882,1527,11158,114,"0,71769273433167","0,25811170212766","0,304250804480713","0,308790394198337","0,311846274488881"
"LLMMostWinsEval_8_0.9","0,377763553104535",3836,1433,11204,114,"0,728031884608085","0,255053191489362","0,304250804480713","0,308790394198337","0,311846274488881"
"LLMAccumEval_8_sigmoid","0,380050751512785",3894,1558,11146,118,"0,714233308877476","0,258909574468085","0,313657036109166","0,317432369506979","0,321434751855136"
"LLMAccumEval_8_logits","0,373907703934628",3958,2173,11082,123,"0,645571684880117","0,263164893617021","0,32383367500202","0,32489758292569","0,337689895888058"
"LLMMostWinsEval_9_0.0","0,247169181263471",4071,13830,10969,116,"0,22741746271158","0,270678191489362","0,308575570436826","0,310968551484603","0,339239073230288"
"LLMMostWinsEval_9_0.1","0,242467043314501",4120,14824,10920,121,"0,217483108108108","0,273936170212766","0,323030504951753","0,325549633626759","0,351228099293388"
"LLMMostWinsEval_9_0.25","0,261035313001605",3903,10961,11137,123,"0,26258073196986","0,259507978723404","0,325243783110219","0,329070924302849","0,339258254369996"
"LLMMostWinsEval_9_0.5","0,311341909595185",3957,6422,11083,122,"0,381250602177474","0,263098404255319","0,324991206818755","0,328801088246155","0,342605305124454"
"LLMMostWinsEval_9_0.75","0,363027399171978",3902,2555,11138,117,"0,604305404986836","0,259441489361702","0,311234010674093","0,316625449734542","0,321974303752712"
"LLMMostWinsEval_9_0.9","0,370141135649646",3947,2340,11093,117,"0,627803403849213","0,262433510638298","0,313450554008261","0,318915518714359","0,325815181667664"
"LLMAccumEval_9_sigmoid","0,278222972023056",3958,9454,11082,124,"0,295108857739338","0,263164893617021","0,32849093064749","0,331975741046793","0,347715044493453"
"LLMAccumEval_9_logits","0,231895921648882",3966,15199,11074,125,"0,206939733889903","0,263696808510638","0,330454191170628","0,333116931867554","0,3469948798844"
"LLMMostWinsEval_None_0.0","0,207525075098009",4076,20166,10964,123,"0,168137942413992","0,271010638297872","0,32633730257165","0,328954524296163","0,35715901337245"
"LLMMostWinsEval_None_0.1","0,229073114565343",3979,15721,11061,126,"0,201979695431472","0,264561170212766","0,332514354819784","0,335552182553647","0,359628149174919"
"LLMMostWinsEval_None_0.25","0,268995633187773",4004,10726,11036,125,"0,271826205023761","0,266223404255319","0,330866493237126","0,33366913642388","0,352877782989525"
"LLMMostWinsEval_None_0.5","0,304445544931622",3996,7215,11044,124,"0,356435643564356","0,265691489361702","0,328928681922684","0,332342453186096","0,34617183461691"
"LLMMostWinsEval_None_0.75","0,309600221878838",3907,6292,11133,125,"0,383076772232572","0,259773936170213","0,330346170965607","0,334096285353902","0,342776772888515"
"LLMMostWinsEval_None_0.9","0,305769908731247",3903,6586,11137,122,"0,372104109066641","0,259507978723404","0,322938763558199","0,326688877946494","0,335369365481107"
"LLMAccumEval_None_sigmoid","0,283222237980428",3994,9170,11046,123,"0,3034032209055","0,265558510638298","0,325533737707351","0,329225237391926","0,341233563011972"
"LLMAccumEval_None_logits","0,311622170055306",3916,6177,11124,126,"0,387991677400178","0,260372340425532","0,331585637945814","0,33400550830234","0,343721404388702"
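
The rows of strategy-total-eval-train.csv store floating-point values with a decimal comma inside quoted fields, while the TP/FP/FN counts are plain integers. The sketch below is not part of the commit; it is a minimal illustration, assuming the standard definitions of micro precision, recall, and F1, of how a row can be parsed and checked against its own counts. The helper names (`parse_decimal_comma`, `micro_metrics`, `load_rows`) are hypothetical, and the sample is the header plus the `BestScoreEval` row copied from the file above.

```python
import csv
import io

# Header and first data row, copied verbatim from strategy-total-eval-train.csv.
SAMPLE = (
    '"Strategy","Micro F1","Micro TP","Micro FP","Micro FN","Micro EM",'
    '"Micro Precision","Micro Recall","Macro F1","Macro Precision","Macro Recall"\n'
    '"BestScoreEval","0,685346733668342",8524,1311,6516,145,'
    '"0,86670055922725","0,566755319148936","0,382261006228723",'
    '"0,386424967640807","0,398045708951882"\n'
)


def parse_decimal_comma(text):
    """Convert a decimal-comma string such as '0,685' to a float."""
    return float(text.replace(",", "."))


def micro_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from pooled counts (standard definitions)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return precision, recall, f1


def load_rows(handle):
    """Read the CSV and return one dict per strategy with numeric fields parsed."""
    rows = []
    for raw in csv.DictReader(handle):
        rows.append({
            "strategy": raw["Strategy"],
            "tp": int(raw["Micro TP"]),
            "fp": int(raw["Micro FP"]),
            "fn": int(raw["Micro FN"]),
            "micro_f1": parse_decimal_comma(raw["Micro F1"]),
            "micro_precision": parse_decimal_comma(raw["Micro Precision"]),
            "micro_recall": parse_decimal_comma(raw["Micro Recall"]),
        })
    return rows


rows = load_rows(io.StringIO(SAMPLE))
best = rows[0]
precision, recall, f1 = micro_metrics(best["tp"], best["fp"], best["fn"])
# Compare the recomputed values with the ones stored in the row.
print(best["strategy"], abs(f1 - best["micro_f1"]) < 1e-9)
```

To run the same check over the whole file, the `io.StringIO(SAMPLE)` argument can be swapped for an open file handle; whether every row is consistent with these definitions is something the check would have to establish, not something assumed here.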
Binary file not shown.