Chai-1 is limited to 2048 tokens (token = canonical AA or atom), and the main reason is high memory consumption.
We have received several requests to support larger crop sizes, but supporting them requires significant engineering investment, though some scenarios are much simpler than others.
If you're critically bound by this limitation, please leave the following information:
your use case (specifically, why it demands a larger crop size), the maximum crop size, and approximately how many inference runs you need (1? 10? 10000?)
your current single-node hardware setup: GPU model, GPU memory, number of GPUs in the node, and the interconnect between GPUs within the node.
An example: 1. a fake subunit designed to integrate into viral capsids, ~4000 AAs, comparing 10-100 designs; 2. 8xA100 80GB + NVLink (plot twist: you likely don't need to model the full capsid to handle such a scenario, and a partial model would be far faster, cheaper, and simpler).
In my case, I am currently working with over 500 sequences, each exceeding 2048 amino acids. Most are close to this limit, but a few reach 5,000-7,000 amino acids. Of course, this is just my specific use case, and there will always be others with different requirements.
Support for a token limit closer to that of AF3 would be greatly appreciated.
I would suggest approaching this issue differently. Consider transitioning to H100 cards (and, in the future, others such as Blackwell). Conduct engineering tests to determine the maximum token limit that Chai-1 can handle effectively.
Additionally, it might be helpful to include memory consumption guidelines in the README file. For example:
A100 40GB: maximum token count ~1500
A100 80GB: maximum token count ~2048
... and so on.
A quick workaround could also be advising users to sort sequences in their batch by size and run until GPU memory becomes insufficient.
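A minimal sketch of that workaround, assuming per-target FASTA files and the `run_inference` entry point from `chai_lab.chai1` (check your installed chai-lab version for the exact signature; the length-based sorting and OOM handling are the point here, not the call itself):

```python
# Sketch of the "sort by size, run until OOM" workaround suggested above.
# Assumptions (not from this thread): one FASTA file per target, and
# chai_lab.chai1.run_inference(fasta_file=..., output_dir=...) as the
# inference entry point -- verify against your installed chai-lab version.
from pathlib import Path

import torch
from chai_lab.chai1 import run_inference


def fasta_length(fasta_path: Path) -> int:
    """Rough token-count proxy: total residues across all chains in the file."""
    total = 0
    for line in fasta_path.read_text().splitlines():
        if not line.startswith(">"):
            total += len(line.strip())
    return total


def run_sorted_until_oom(fasta_dir: Path, out_root: Path) -> None:
    # Smallest targets first, so cheap jobs finish before memory runs out.
    jobs = sorted(fasta_dir.glob("*.fasta"), key=fasta_length)
    for fasta in jobs:
        try:
            run_inference(
                fasta_file=fasta,
                output_dir=out_root / fasta.stem,
            )
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()
            print(
                f"OOM at {fasta.name} ({fasta_length(fasta)} residues); "
                "larger targets in this batch will also fail -- stopping."
            )
            break


if __name__ == "__main__":
    run_sorted_until_oom(Path("inputs"), Path("outputs"))
```

The targets that do complete also give an empirical per-GPU token ceiling, which is exactly the kind of number the README guideline proposed above could document.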
Use case: Modeling entire hemichannels or gap junctions, and small ligands bound to them
Hardware: 4xA100 (80 GB) node
Number of Residues: ~2600 for a hemichannel + ligands, ~5200 for a gap junction + ligands
Ultimately, being able to model the open state vs. the closed state, potentially using some contact constraints, would be VERY useful to me, and I will need to do this with many kinds of gap junctions in a medium-throughput workflow (maybe 100-500 predictions per connexin type). Anything that passes the simpler connexin-ligand validation step then goes into the hemichannel-6x-ligand validation step or the gap-junction-6x-ligand validation step.