Hello, I just learned about SetFit and want to use it for my ABSA use case. I have a dataset of 50,000 rows, with at most 511 tokens per row. When I use ABSATrainer on this dataset, I encounter this error:
File "/home/azhar/miniforge3/envs/preskripsi/lib/python3.10/site-packages/setfit/trainer.py", line 502, in get_dataloader
data_sampler = ContrastiveDataset(
File "/home/azhar/miniforge3/envs/preskripsi/lib/python3.10/site-packages/setfit/sampler.py", line 68, in __init__
self.generate_pairs()
File "/home/azhar/miniforge3/envs/preskripsi/lib/python3.10/site-packages/setfit/sampler.py", line 90, in generate_pairs
for (_text, _label), (text, label) in shuffle_combinations(self.sentence_labels):
File "/home/azhar/miniforge3/envs/preskripsi/lib/python3.10/site-packages/setfit/sampler.py", line 29, in shuffle_combinations
idxs = np.stack(np.triu_indices(n, k), axis=-1)
File "/home/azhar/miniforge3/envs/preskripsi/lib/python3.10/site-packages/numpy/lib/twodim_base.py", line 1113, in triu_indices
tri_ = ~tri(n, m, k=k - 1, dtype=bool)
File "/home/azhar/miniforge3/envs/preskripsi/lib/python3.10/site-packages/numpy/lib/twodim_base.py", line 414, in tri
m = greater_equal.outer(arange(N, dtype=_min_int(0, N)),
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 46.6 GiB for an array with shape (223709, 223709) and data type bool
How can I solve this error? Is it because I have too many rows? I saw another example in a GitHub issue that uses 200 rows. I tried 200 rows too, but got exactly the same error.
I don't really understand how SetFit works, so I don't know what to change to solve the error. Could you also explain a bit how it works? I saw "Contrastive" in the training code, and the ~tri looks like a triangular matrix used for masking. Why does masking require such a huge matrix?
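For reference, the size in the error message is consistent with a dense 223709 x 223709 boolean array at one byte per element, which is what np.triu_indices builds internally before it returns the index pairs. Below is a minimal sketch of that arithmetic, together with one possible way to draw random index pairs without the dense mask. The helper names triu_mask_bytes and sample_pairs are hypothetical and not part of SetFit or NumPy, and the sampling approach is only an illustration, not how SetFit generates its pairs.

```python
import numpy as np


def triu_mask_bytes(n: int) -> int:
    """Bytes needed for a dense n x n boolean array (1 byte per element)."""
    return n * n * np.dtype(bool).itemsize


def sample_pairs(n: int, num_pairs: int, seed: int = 0) -> np.ndarray:
    """Draw random index pairs (i, j) with i < j without an n x n array.

    Hypothetical helper: the same pair may be drawn more than once.
    """
    rng = np.random.default_rng(seed)
    i = rng.integers(0, n, size=num_pairs)
    # Draw j from the n - 1 remaining indices, then shift past i so j != i.
    j = rng.integers(0, n - 1, size=num_pairs)
    j = np.where(j >= i, j + 1, j)
    # Order each pair so the smaller index comes first.
    return np.stack([np.minimum(i, j), np.maximum(i, j)], axis=-1)


n = 223_709  # number of sentences reported in the error message
print(f"{triu_mask_bytes(n) / 2**30:.1f} GiB")  # 46.6 GiB

pairs = sample_pairs(n, num_pairs=1_000)
print(pairs.shape)  # (1000, 2)
```

The memory for the mask grows with the square of the number of sentences, while the sampled version only needs memory proportional to the number of pairs requested.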