[Bug] (suggested fix) mmrazor.models.algorithms.quantization.mm_architecture.MMArchitectureQuant.sync_qparams() fails if there are modules present in other modes but not in forward mode='tensor' #634
After trying to deploy the quantized model, I realized the suggested fix might be unnecessary and could cause further issues. I then tried:

```bash
python /tools/train.py \
    ${qat_topdown_cgf} \
    --cgf-options \
    model.architecture.test_cfg.flip_test=False \
    --work-dir /path/here/
```

But the model still fails to sync without my patch.
I realized that the following change to sync_qparams() gets past the failing lookups by skipping entries that have no counterpart in the source mode:

```diff
@@ -121,7 +121,7 @@ class MMArchitectureQuant(BaseAlgorithm):
         in some subtle ways, so we need to sync them here.
         """
 
-        def traverse(module, prefix):
+        def traverse(module, prefix, mode, src_mode):
             for name, child in module._modules.items():
                 if module is None:
                     continue
@@ -129,7 +129,13 @@ class MMArchitectureQuant(BaseAlgorithm):
                 if isinstance(child, FakeQuantizeBase):
                     for name, param in child.named_parameters():
                         param_name = f'{child_name}.{name}'
-                        src_param = src_state_dict[param_name]
+                        src_param = src_state_dict.get(param_name)
+                        if '_dup' in param_name and src_param is None:
+                            param_name = '.'.join([section.split('_dup')[0] for section in param_name.split('.')])
+                            src_param = src_state_dict.get(param_name)
+                        if src_param is None:
+                            print(f"{param_name} in mode: '{mode}' but not found in source mode: '{src_mode}', skipping sync.")
+                            continue
                         if src_param.shape == param.shape:
                             param.data.copy_(src_param)
                         else:
@@ -140,20 +146,26 @@ class MMArchitectureQuant(BaseAlgorithm):
                             param.data.copy_(src_param)
                     for name, buffer in child.named_buffers():
                         buffer_name = f'{child_name}.{name}'
-                        src_buffer = src_state_dict[buffer_name]
+                        src_buffer = src_state_dict.get(buffer_name)
+                        if '_dup' in buffer_name and src_buffer is None:
+                            buffer_name = '.'.join([section.split('_dup')[0] for section in buffer_name.split('.')])
+                            src_buffer = src_state_dict.get(buffer_name)
+                        if src_buffer is None:
+                            print(f"{buffer_name} in mode: '{mode}' but not found in source mode: '{src_mode}', skipping sync.")
+                            continue
                         if src_buffer.shape == buffer.shape:
                             buffer.data.copy_(src_buffer)
                         else:
                             buffer.resize_(src_buffer.shape)
                             buffer.data.copy_(src_buffer)
                 else:
-                    traverse(child, f'{child_name}.')
+                    traverse(child, f'{child_name}.', mode, src_mode)
 
         src_state_dict = self.qmodels[src_mode].state_dict()
         for mode in self.forward_modes:
             if mode == src_mode:
                 continue
-            traverse(self.qmodels[mode], '')
+            traverse(self.qmodels[mode], '', mode, src_mode)
 
     def _get_rewriter_context_in_mmdeploy(self, deploy_cfg):
         """Get rewriter context in mmdeploy according to the deploy related
```
After some fixing, the solution to this issue is to refactor the model so that FX tracing is possible in all modes up until the wrapped methods that differ in each mode, as long as the only difference in tracing comes after those wrapped methods.
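For context, mmrazor's quantization configs let the custom tracer skip methods that differ between modes; a rough sketch of what such a refactor relies on (the mmpose method paths below are placeholders, not a verified working config):

```python
# Sketch only: keep mode-specific head logic out of the traced graphs by
# listing it in skipped_methods, so 'tensor', 'loss' and 'predict' trace the
# same graph up to those wrapped calls. Method paths are placeholders.
quantizer = dict(
    type='mmrazor.OpenVINOQuantizer',
    global_qconfig=global_qconfig,  # defined elsewhere in the config
    tracer=dict(
        type='mmrazor.CustomTracer',
        skipped_methods=[
            'mmpose.models.heads.HeatmapHead.predict',  # placeholder path
            'mmpose.models.heads.HeatmapHead.loss',     # placeholder path
        ]))
```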
Describe the bug
In models where there are modules that exist only in mode 'predict' or in 'loss' but not in 'tensor', the code below fails with a KeyError while looking names up in the state dict of the tensor-mode model, for example when one mode's model has duplicate modules but the other's doesn't.

mmrazor.models.algorithms.quantization.mm_architecture.MMArchitectureQuant.sync_qparams() #L124-L148
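To make the failure concrete, a minimal sketch with made-up module names (not taken from the actual model):

```python
# Made-up names: the predict-mode graph has a '_dup' copy and a loose leaf
# node that the tensor-mode graph never creates, so a plain dict lookup fails.
tensor_state_dict = {
    'backbone.conv1.activation_post_process_0.scale': 1.0,
}
predict_keys = [
    'backbone.conv1.activation_post_process_0.scale',
    'backbone.conv1.activation_post_process_0_dup1.scale',  # duplicate from the flip test
    'activation_post_process_7.scale',                       # loose leaf, predict only
]
for key in predict_keys:
    value = tensor_state_dict[key]  # raises KeyError on the last two keys
```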
Additional Context
I have been trying to quantize mmpose.TopdownPoseEstimator, applying the fixes for torch 2.0.0 incompatibility suggested in mmrazor #632, a fix for nn.Parameters inside TopdownPoseEstimator not being traced in mmrazor #633, and a fix for the untraceable mmpose.TopdownPoseEstimator methods in mmpose #3012.
Because of a `flip` input inversion test being added to the predict forward graph, there are not only duplicate modules but also duplicate loose (leaf) numbered `activation_post_process_xyz` modules that make the syncing fail.

Reproduces the error - code sample
I cannot currently provide the configuration, but the executing code is this:
Reproduces the problem - error message
And while patching that:
Post related information - suggested fix
*EDIT: while this fix allows syncing of nodes that aren't present in other modes, it causes a failure in model deployment further down the line.*
For duplicate modules I figure one can copy the state_dict element with a non-suffixed name, but I don't have a suggestion for the non-existent modules yet. For the activation post-processing leaf nodes, I can ignore most of the copying since a lot of it is reset in MMArchitectureQuant.__init__().

mmrazor/models/algorithms/quantization/mm_architecture.py
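To illustrate the idea of ignoring the loose leaf nodes, a hypothetical filter (not mmrazor code) could look like this:

```python
import re

# Hypothetical helper: top-level 'activation_post_process_<N>' entries have no
# parent-module prefix, so there is nothing to copy from in the source mode and
# they could be skipped during sync (much of that state is reset in
# MMArchitectureQuant.__init__() anyway, per the note above).
LOOSE_LEAF = re.compile(r'^activation_post_process_\d+\.')

def is_loose_leaf(param_name: str) -> bool:
    return bool(LOOSE_LEAF.match(param_name))

assert is_loose_leaf('activation_post_process_7.scale')
assert not is_loose_leaf('backbone.conv1.activation_post_process_0.scale')
```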