Reasons for the poor effect of Speculative Sampling #198

JoeNan1 · 2024-08-26T02:07:28Z

I tested speculative sampling with llama2-7b and llama2-70b on an A800, but the speedup was close to zero, and negative in most cases.

llama2-7b base 103.25 tokens/s
llama2-7b Speculative Sampling 104.52 tokens/s
llama2-70b base 14.55 tokens/s
llama2-70b Speculative Sampling 13.41 tokens/s
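
For reference, the relative change implied by these four numbers can be worked out with a few lines of Python. This is only arithmetic on the throughputs reported above; the names `measurements` and `relative_change` are made up for this sketch.

```python
# Throughputs in tokens/s, copied from the numbers reported above:
# (baseline, speculative sampling)
measurements = {
    "llama2-7b": (103.25, 104.52),
    "llama2-70b": (14.55, 13.41),
}


def relative_change(base: float, spec: float) -> float:
    """Percentage change in throughput relative to the baseline."""
    return (spec - base) / base * 100.0


for model, (base, spec) in measurements.items():
    print(f"{model}: {relative_change(base, spec):+.1f}%")
```

That works out to roughly +1.2% for llama2-7b and -7.8% for llama2-70b.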

yanboliang · 2024-09-16T03:59:15Z

Can you print out aggregate_metrics['accept_counts'] and check whether it makes sense? accept_counts records how many token predictions from the draft model were accepted by the verifier model. If that number is too low, you won't get much of a performance boost from speculative sampling.
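
A minimal sketch of how the acceptance statistics could be summarized, assuming accept_counts is a list with one histogram per generation run, where entry k of each histogram counts the verification steps in which exactly k draft tokens were accepted. That layout is an assumption, so check it against your own script. The function name `summarize_accept_counts` and the example numbers are hypothetical, not taken from this thread.

```python
def summarize_accept_counts(accept_counts):
    """Return (acceptance_probs, mean_accepted) from per-run histograms.

    Assumed layout: accept_counts[run][k] is the number of verification
    steps in that run where exactly k draft tokens were accepted.
    """
    # Sum the histograms of all runs position by position.
    totals = [sum(column) for column in zip(*accept_counts)]
    steps = sum(totals)
    if steps == 0:
        raise ValueError("accept_counts contains no verification steps")

    # Fraction of steps that accepted exactly k draft tokens.
    acceptance_probs = [count / steps for count in totals]
    # Average number of draft tokens accepted per verification step.
    mean_accepted = sum(k * count for k, count in enumerate(totals)) / steps
    return acceptance_probs, mean_accepted


# Made-up example: two runs, up to 5 speculated tokens per step.
example = [[6, 2, 1, 1, 0, 0], [7, 2, 1, 0, 0, 0]]
probs, mean_accepted = summarize_accept_counts(example)
print("Acceptance probs:", probs)
print("Mean accepted:", mean_accepted)
```

If most of the mass sits at index 0, as in this made-up example, the verifier is rejecting nearly every draft token, and the draft model's forward passes are mostly overhead.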
