Commit

add llava resampler
ZhangYuanhan-AI committed Jan 3, 2024
1 parent 8788e67 commit d0ffb4e
Showing 19 changed files with 3,886 additions and 41 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -166,4 +166,4 @@ checkpoints/
*.txt
pipeline/serve/deploy/otterhd_endpoint.py
pipeline/benchmarks/models/llava_model.py
-eval_results/
+# eval_results/
36 changes: 36 additions & 0 deletions eval_results/eval_results_llava
@@ -0,0 +1,36 @@
================================================================================
EVALUATION REPORT
================================================================================


MODEL INFO: {'name': 'llava_model', 'model_path': '/mnt/petrelfs/zhangyuanhan/LLaVA/checkpoints/llava-v1.5-7b'}
--------------------------------------------------------------------------------
[2023-12-21 10:25:49,407] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Imported class: <class 'pipeline.benchmarks.models.llava_model.LLaVA_Model'>
Imported class: <class 'pipeline.benchmarks.datasets.mme.MMEDataset'>

DATASET: MMEDataset
--------------------
=========== Cognition ===========
total score: 250.0
code_reasoning score: 55.0
numerical_calculation score: 35.0
text_translation score: 50.0
commonsense_reasoning score: 110.0
=========== Perception ===========
total score: 1484.87775110044
artwork score: 125.75
celebrity score: 129.41176470588235
count score: 153.33333333333334
color score: 165.0
position score: 118.33333333333334
OCR score: 132.5
landmark score: 160.0
scene score: 157.25
existence score: 195.0
posters score: 148.29931972789115

--------------------------------------------------------------------------------
Total Datasets Evaluated: 1

================================================================================
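For reference, the Cognition and Perception totals in the MME report above equal the sums of the per-subtask scores. A minimal Python sketch that re-derives them from the LLaVA numbers (score values are copied from the log; only the summation is illustrative):

# Re-derive the MME totals above by summing the per-subtask scores (values copied from the report).
cognition = {
    "code_reasoning": 55.0,
    "numerical_calculation": 35.0,
    "text_translation": 50.0,
    "commonsense_reasoning": 110.0,
}
perception = {
    "artwork": 125.75,
    "celebrity": 129.41176470588235,
    "count": 153.33333333333334,
    "color": 165.0,
    "position": 118.33333333333334,
    "OCR": 132.5,
    "landmark": 160.0,
    "scene": 157.25,
    "existence": 195.0,
    "posters": 148.29931972789115,
}
print("Cognition total:", sum(cognition.values()))    # 250.0
print("Perception total:", sum(perception.values()))  # 1484.87775110044

The same relation holds for the Otter report below.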
197 changes: 197 additions & 0 deletions eval_results/eval_results_otter
@@ -0,0 +1,197 @@
================================================================================
EVALUATION REPORT
================================================================================


MODEL INFO: {'name': 'otter_image', 'model_path': '/mnt/petrelfs/zhangyuanhan/Otter/checkpoints/otter_llava_sft_nonconv_nogroup/epoch_1/'}
--------------------------------------------------------------------------------
[2023-12-23 08:32:08,024] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Imported class: <class 'pipeline.benchmarks.models.otter_image.OtterImage'>
The current model version is configured for Otter-Image with max_num_frames set to None.
Parameter: lang_encoder.model.embed_tokens.weight, Size: 131.084288 M
Parameter: lang_encoder.model.layers.3.gated_cross_attn_layer.attn_gate, Size: 0.000001 M
Parameter: lang_encoder.model.layers.3.gated_cross_attn_layer.ff_gate, Size: 0.000001 M
Parameter: lang_encoder.model.layers.3.gated_cross_attn_layer.attn.norm.weight, Size: 0.004096 M
Parameter: lang_encoder.model.layers.3.gated_cross_attn_layer.attn.norm.bias, Size: 0.004096 M
Parameter: lang_encoder.model.layers.3.gated_cross_attn_layer.attn.to_q.weight, Size: 2.097152 M
Parameter: lang_encoder.model.layers.3.gated_cross_attn_layer.attn.to_kv.weight, Size: 1.048576 M
Parameter: lang_encoder.model.layers.3.gated_cross_attn_layer.attn.to_out.weight, Size: 2.097152 M
Parameter: lang_encoder.model.layers.3.gated_cross_attn_layer.feed_forward.0.weight, Size: 0.004096 M
Parameter: lang_encoder.model.layers.3.gated_cross_attn_layer.feed_forward.0.bias, Size: 0.004096 M
Parameter: lang_encoder.model.layers.3.gated_cross_attn_layer.feed_forward.1.weight, Size: 67.108864 M
Parameter: lang_encoder.model.layers.3.gated_cross_attn_layer.feed_forward.3.weight, Size: 67.108864 M
Parameter: lang_encoder.model.layers.7.gated_cross_attn_layer.attn_gate, Size: 0.000001 M
Parameter: lang_encoder.model.layers.7.gated_cross_attn_layer.ff_gate, Size: 0.000001 M
Parameter: lang_encoder.model.layers.7.gated_cross_attn_layer.attn.norm.weight, Size: 0.004096 M
Parameter: lang_encoder.model.layers.7.gated_cross_attn_layer.attn.norm.bias, Size: 0.004096 M
Parameter: lang_encoder.model.layers.7.gated_cross_attn_layer.attn.to_q.weight, Size: 2.097152 M
Parameter: lang_encoder.model.layers.7.gated_cross_attn_layer.attn.to_kv.weight, Size: 1.048576 M
Parameter: lang_encoder.model.layers.7.gated_cross_attn_layer.attn.to_out.weight, Size: 2.097152 M
Parameter: lang_encoder.model.layers.7.gated_cross_attn_layer.feed_forward.0.weight, Size: 0.004096 M
Parameter: lang_encoder.model.layers.7.gated_cross_attn_layer.feed_forward.0.bias, Size: 0.004096 M
Parameter: lang_encoder.model.layers.7.gated_cross_attn_layer.feed_forward.1.weight, Size: 67.108864 M
Parameter: lang_encoder.model.layers.7.gated_cross_attn_layer.feed_forward.3.weight, Size: 67.108864 M
Parameter: lang_encoder.model.layers.11.gated_cross_attn_layer.attn_gate, Size: 0.000001 M
Parameter: lang_encoder.model.layers.11.gated_cross_attn_layer.ff_gate, Size: 0.000001 M
Parameter: lang_encoder.model.layers.11.gated_cross_attn_layer.attn.norm.weight, Size: 0.004096 M
Parameter: lang_encoder.model.layers.11.gated_cross_attn_layer.attn.norm.bias, Size: 0.004096 M
Parameter: lang_encoder.model.layers.11.gated_cross_attn_layer.attn.to_q.weight, Size: 2.097152 M
Parameter: lang_encoder.model.layers.11.gated_cross_attn_layer.attn.to_kv.weight, Size: 1.048576 M
Parameter: lang_encoder.model.layers.11.gated_cross_attn_layer.attn.to_out.weight, Size: 2.097152 M
Parameter: lang_encoder.model.layers.11.gated_cross_attn_layer.feed_forward.0.weight, Size: 0.004096 M
Parameter: lang_encoder.model.layers.11.gated_cross_attn_layer.feed_forward.0.bias, Size: 0.004096 M
Parameter: lang_encoder.model.layers.11.gated_cross_attn_layer.feed_forward.1.weight, Size: 67.108864 M
Parameter: lang_encoder.model.layers.11.gated_cross_attn_layer.feed_forward.3.weight, Size: 67.108864 M
Parameter: lang_encoder.model.layers.15.gated_cross_attn_layer.attn_gate, Size: 0.000001 M
Parameter: lang_encoder.model.layers.15.gated_cross_attn_layer.ff_gate, Size: 0.000001 M
Parameter: lang_encoder.model.layers.15.gated_cross_attn_layer.attn.norm.weight, Size: 0.004096 M
Parameter: lang_encoder.model.layers.15.gated_cross_attn_layer.attn.norm.bias, Size: 0.004096 M
Parameter: lang_encoder.model.layers.15.gated_cross_attn_layer.attn.to_q.weight, Size: 2.097152 M
Parameter: lang_encoder.model.layers.15.gated_cross_attn_layer.attn.to_kv.weight, Size: 1.048576 M
Parameter: lang_encoder.model.layers.15.gated_cross_attn_layer.attn.to_out.weight, Size: 2.097152 M
Parameter: lang_encoder.model.layers.15.gated_cross_attn_layer.feed_forward.0.weight, Size: 0.004096 M
Parameter: lang_encoder.model.layers.15.gated_cross_attn_layer.feed_forward.0.bias, Size: 0.004096 M
Parameter: lang_encoder.model.layers.15.gated_cross_attn_layer.feed_forward.1.weight, Size: 67.108864 M
Parameter: lang_encoder.model.layers.15.gated_cross_attn_layer.feed_forward.3.weight, Size: 67.108864 M
Parameter: lang_encoder.model.layers.19.gated_cross_attn_layer.attn_gate, Size: 0.000001 M
Parameter: lang_encoder.model.layers.19.gated_cross_attn_layer.ff_gate, Size: 0.000001 M
Parameter: lang_encoder.model.layers.19.gated_cross_attn_layer.attn.norm.weight, Size: 0.004096 M
Parameter: lang_encoder.model.layers.19.gated_cross_attn_layer.attn.norm.bias, Size: 0.004096 M
Parameter: lang_encoder.model.layers.19.gated_cross_attn_layer.attn.to_q.weight, Size: 2.097152 M
Parameter: lang_encoder.model.layers.19.gated_cross_attn_layer.attn.to_kv.weight, Size: 1.048576 M
Parameter: lang_encoder.model.layers.19.gated_cross_attn_layer.attn.to_out.weight, Size: 2.097152 M
Parameter: lang_encoder.model.layers.19.gated_cross_attn_layer.feed_forward.0.weight, Size: 0.004096 M
Parameter: lang_encoder.model.layers.19.gated_cross_attn_layer.feed_forward.0.bias, Size: 0.004096 M
Parameter: lang_encoder.model.layers.19.gated_cross_attn_layer.feed_forward.1.weight, Size: 67.108864 M
Parameter: lang_encoder.model.layers.19.gated_cross_attn_layer.feed_forward.3.weight, Size: 67.108864 M
Parameter: lang_encoder.model.layers.23.gated_cross_attn_layer.attn_gate, Size: 0.000001 M
Parameter: lang_encoder.model.layers.23.gated_cross_attn_layer.ff_gate, Size: 0.000001 M
Parameter: lang_encoder.model.layers.23.gated_cross_attn_layer.attn.norm.weight, Size: 0.004096 M
Parameter: lang_encoder.model.layers.23.gated_cross_attn_layer.attn.norm.bias, Size: 0.004096 M
Parameter: lang_encoder.model.layers.23.gated_cross_attn_layer.attn.to_q.weight, Size: 2.097152 M
Parameter: lang_encoder.model.layers.23.gated_cross_attn_layer.attn.to_kv.weight, Size: 1.048576 M
Parameter: lang_encoder.model.layers.23.gated_cross_attn_layer.attn.to_out.weight, Size: 2.097152 M
Parameter: lang_encoder.model.layers.23.gated_cross_attn_layer.feed_forward.0.weight, Size: 0.004096 M
Parameter: lang_encoder.model.layers.23.gated_cross_attn_layer.feed_forward.0.bias, Size: 0.004096 M
Parameter: lang_encoder.model.layers.23.gated_cross_attn_layer.feed_forward.1.weight, Size: 67.108864 M
Parameter: lang_encoder.model.layers.23.gated_cross_attn_layer.feed_forward.3.weight, Size: 67.108864 M
Parameter: lang_encoder.model.layers.27.gated_cross_attn_layer.attn_gate, Size: 0.000001 M
Parameter: lang_encoder.model.layers.27.gated_cross_attn_layer.ff_gate, Size: 0.000001 M
Parameter: lang_encoder.model.layers.27.gated_cross_attn_layer.attn.norm.weight, Size: 0.004096 M
Parameter: lang_encoder.model.layers.27.gated_cross_attn_layer.attn.norm.bias, Size: 0.004096 M
Parameter: lang_encoder.model.layers.27.gated_cross_attn_layer.attn.to_q.weight, Size: 2.097152 M
Parameter: lang_encoder.model.layers.27.gated_cross_attn_layer.attn.to_kv.weight, Size: 1.048576 M
Parameter: lang_encoder.model.layers.27.gated_cross_attn_layer.attn.to_out.weight, Size: 2.097152 M
Parameter: lang_encoder.model.layers.27.gated_cross_attn_layer.feed_forward.0.weight, Size: 0.004096 M
Parameter: lang_encoder.model.layers.27.gated_cross_attn_layer.feed_forward.0.bias, Size: 0.004096 M
Parameter: lang_encoder.model.layers.27.gated_cross_attn_layer.feed_forward.1.weight, Size: 67.108864 M
Parameter: lang_encoder.model.layers.27.gated_cross_attn_layer.feed_forward.3.weight, Size: 67.108864 M
Parameter: lang_encoder.model.layers.31.gated_cross_attn_layer.attn_gate, Size: 0.000001 M
Parameter: lang_encoder.model.layers.31.gated_cross_attn_layer.ff_gate, Size: 0.000001 M
Parameter: lang_encoder.model.layers.31.gated_cross_attn_layer.attn.norm.weight, Size: 0.004096 M
Parameter: lang_encoder.model.layers.31.gated_cross_attn_layer.attn.norm.bias, Size: 0.004096 M
Parameter: lang_encoder.model.layers.31.gated_cross_attn_layer.attn.to_q.weight, Size: 2.097152 M
Parameter: lang_encoder.model.layers.31.gated_cross_attn_layer.attn.to_kv.weight, Size: 1.048576 M
Parameter: lang_encoder.model.layers.31.gated_cross_attn_layer.attn.to_out.weight, Size: 2.097152 M
Parameter: lang_encoder.model.layers.31.gated_cross_attn_layer.feed_forward.0.weight, Size: 0.004096 M
Parameter: lang_encoder.model.layers.31.gated_cross_attn_layer.feed_forward.0.bias, Size: 0.004096 M
Parameter: lang_encoder.model.layers.31.gated_cross_attn_layer.feed_forward.1.weight, Size: 67.108864 M
Parameter: lang_encoder.model.layers.31.gated_cross_attn_layer.feed_forward.3.weight, Size: 67.108864 M
Parameter: lang_encoder.lm_head.weight, Size: 131.084288 M
Parameter: perceiver.latents, Size: 0.065536 M
Parameter: perceiver.layers.0.norm_media.weight, Size: 0.001024 M
Parameter: perceiver.layers.0.norm_media.bias, Size: 0.001024 M
Parameter: perceiver.layers.0.norm_latents.weight, Size: 0.001024 M
Parameter: perceiver.layers.0.norm_latents.bias, Size: 0.001024 M
Parameter: perceiver.layers.0.to_q.weight, Size: 0.524288 M
Parameter: perceiver.layers.0.to_kv.weight, Size: 1.048576 M
Parameter: perceiver.layers.0.to_out.weight, Size: 0.524288 M
Parameter: perceiver.layers.0.feed_forward.0.weight, Size: 0.001024 M
Parameter: perceiver.layers.0.feed_forward.0.bias, Size: 0.001024 M
Parameter: perceiver.layers.0.feed_forward.1.weight, Size: 4.194304 M
Parameter: perceiver.layers.0.feed_forward.3.weight, Size: 4.194304 M
Parameter: perceiver.layers.1.norm_media.weight, Size: 0.001024 M
Parameter: perceiver.layers.1.norm_media.bias, Size: 0.001024 M
Parameter: perceiver.layers.1.norm_latents.weight, Size: 0.001024 M
Parameter: perceiver.layers.1.norm_latents.bias, Size: 0.001024 M
Parameter: perceiver.layers.1.to_q.weight, Size: 0.524288 M
Parameter: perceiver.layers.1.to_kv.weight, Size: 1.048576 M
Parameter: perceiver.layers.1.to_out.weight, Size: 0.524288 M
Parameter: perceiver.layers.1.feed_forward.0.weight, Size: 0.001024 M
Parameter: perceiver.layers.1.feed_forward.0.bias, Size: 0.001024 M
Parameter: perceiver.layers.1.feed_forward.1.weight, Size: 4.194304 M
Parameter: perceiver.layers.1.feed_forward.3.weight, Size: 4.194304 M
Parameter: perceiver.layers.2.norm_media.weight, Size: 0.001024 M
Parameter: perceiver.layers.2.norm_media.bias, Size: 0.001024 M
Parameter: perceiver.layers.2.norm_latents.weight, Size: 0.001024 M
Parameter: perceiver.layers.2.norm_latents.bias, Size: 0.001024 M
Parameter: perceiver.layers.2.to_q.weight, Size: 0.524288 M
Parameter: perceiver.layers.2.to_kv.weight, Size: 1.048576 M
Parameter: perceiver.layers.2.to_out.weight, Size: 0.524288 M
Parameter: perceiver.layers.2.feed_forward.0.weight, Size: 0.001024 M
Parameter: perceiver.layers.2.feed_forward.0.bias, Size: 0.001024 M
Parameter: perceiver.layers.2.feed_forward.1.weight, Size: 4.194304 M
Parameter: perceiver.layers.2.feed_forward.3.weight, Size: 4.194304 M
Parameter: perceiver.layers.3.norm_media.weight, Size: 0.001024 M
Parameter: perceiver.layers.3.norm_media.bias, Size: 0.001024 M
Parameter: perceiver.layers.3.norm_latents.weight, Size: 0.001024 M
Parameter: perceiver.layers.3.norm_latents.bias, Size: 0.001024 M
Parameter: perceiver.layers.3.to_q.weight, Size: 0.524288 M
Parameter: perceiver.layers.3.to_kv.weight, Size: 1.048576 M
Parameter: perceiver.layers.3.to_out.weight, Size: 0.524288 M
Parameter: perceiver.layers.3.feed_forward.0.weight, Size: 0.001024 M
Parameter: perceiver.layers.3.feed_forward.0.bias, Size: 0.001024 M
Parameter: perceiver.layers.3.feed_forward.1.weight, Size: 4.194304 M
Parameter: perceiver.layers.3.feed_forward.3.weight, Size: 4.194304 M
Parameter: perceiver.layers.4.norm_media.weight, Size: 0.001024 M
Parameter: perceiver.layers.4.norm_media.bias, Size: 0.001024 M
Parameter: perceiver.layers.4.norm_latents.weight, Size: 0.001024 M
Parameter: perceiver.layers.4.norm_latents.bias, Size: 0.001024 M
Parameter: perceiver.layers.4.to_q.weight, Size: 0.524288 M
Parameter: perceiver.layers.4.to_kv.weight, Size: 1.048576 M
Parameter: perceiver.layers.4.to_out.weight, Size: 0.524288 M
Parameter: perceiver.layers.4.feed_forward.0.weight, Size: 0.001024 M
Parameter: perceiver.layers.4.feed_forward.0.bias, Size: 0.001024 M
Parameter: perceiver.layers.4.feed_forward.1.weight, Size: 4.194304 M
Parameter: perceiver.layers.4.feed_forward.3.weight, Size: 4.194304 M
Parameter: perceiver.layers.5.norm_media.weight, Size: 0.001024 M
Parameter: perceiver.layers.5.norm_media.bias, Size: 0.001024 M
Parameter: perceiver.layers.5.norm_latents.weight, Size: 0.001024 M
Parameter: perceiver.layers.5.norm_latents.bias, Size: 0.001024 M
Parameter: perceiver.layers.5.to_q.weight, Size: 0.524288 M
Parameter: perceiver.layers.5.to_kv.weight, Size: 1.048576 M
Parameter: perceiver.layers.5.to_out.weight, Size: 0.524288 M
Parameter: perceiver.layers.5.feed_forward.0.weight, Size: 0.001024 M
Parameter: perceiver.layers.5.feed_forward.0.bias, Size: 0.001024 M
Parameter: perceiver.layers.5.feed_forward.1.weight, Size: 4.194304 M
Parameter: perceiver.layers.5.feed_forward.3.weight, Size: 4.194304 M
Parameter: perceiver.norm.weight, Size: 0.001024 M
Parameter: perceiver.norm.bias, Size: 0.001024 M
Total Trainable param: 1.441004 B
Imported class: <class 'pipeline.benchmarks.datasets.mme.MMEDataset'>

DATASET: MMEDataset
--------------------
=========== Cognition ===========
total score: 295.3571428571429
code_reasoning score: 50.0
numerical_calculation score: 80.0
text_translation score: 72.5
commonsense_reasoning score: 92.85714285714286
=========== Perception ===========
total score: 902.483993597439
artwork score: 58.0
celebrity score: 67.05882352941177
count score: 121.66666666666666
color score: 55.00000000000001
position score: 50.0
OCR score: 50.0
landmark score: 119.25
scene score: 155.25
existence score: 163.33333333333334
posters score: 62.925170068027214

--------------------------------------------------------------------------------
Total Datasets Evaluated: 1

================================================================================
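The per-parameter lines in eval_results_otter above report each trainable tensor's element count in millions and the grand total in billions. A minimal sketch of how such a dump could be produced for any PyTorch module (the helper name print_trainable_params is hypothetical and not necessarily the code the Otter pipeline uses):

import torch.nn as nn

def print_trainable_params(model: nn.Module) -> None:
    # Hypothetical helper: mirror the "Parameter: <name>, Size: <numel/1e6> M" log format above,
    # counting only parameters with requires_grad=True.
    total = 0
    for name, param in model.named_parameters():
        if param.requires_grad:
            total += param.numel()
            print(f"Parameter: {name}, Size: {param.numel() / 1e6:.6f} M")
    print(f"Total Trainable param: {total / 1e9:.6f} B")

Summing the per-parameter sizes listed in the log indeed gives about 1.441 B, matching the reported "Total Trainable param: 1.441004 B".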