Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a capabilities module #1820

Merged

Conversation

yelhamer
Copy link
Collaborator

This PR moves the capability extraction logic from `main.py` to a new "capabilities" module.

@williballenthin
Copy link
Collaborator

williballenthin commented Oct 19, 2023

I like the idea of splitting out this logic into its own module, because:

  1. main.py is way too big, and,
  2. this logic is re-used by other scripts/tools, so it makes sense to provide a stable module path

I'd propose that this new module be capa.capabilities (not capa.features.capabilities). Thoughts @mr-tz @mike-hunhoff @yelhamer?

@yelhamer please also update the scripts that use these routines:

scripts/profile-time.py
117:            capa.main.find_capabilities(rules, extractor, disable_progress=True)

scripts/capa_as_library.py
35:def find_subrule_matches(doc: rd.ResultDocument) -> Set[str]:
178:    capabilities, counts = capa.main.find_capabilities(rules, extractor, disable_progress=True)

scripts/lint.py
369:    capabilities, _ = capa.main.find_capabilities(ctx.rules, extractor, disable_progress=True)

scripts/show-capabilities-by-function.py
189:    capabilities, counts = capa.main.find_capabilities(rules, extractor)

scripts/bulk-process.py
139:    capabilities, counts = capa.main.find_capabilities(rules, extractor, disable_progress=True)

also plugins/integrations:

capa/ghidra/capa_ghidra.py
76:    capabilities, counts = capa.main.find_capabilities(rules, extractor, False)
126:    capabilities, counts = capa.main.find_capabilities(rules, extractor, True)

capa/ida/plugin/form.py
771:                    capabilities, counts = capa.main.find_capabilities(

@williballenthin williballenthin added enhancement New feature or request breaking-change introduces a breaking change that should be released in a major version labels Oct 19, 2023
@williballenthin
Copy link
Collaborator

williballenthin commented Oct 19, 2023

Furthermore, does it make sense to break out any other logic from main? Here's whats left:

timing
set_vivisect_log_level
has_rule_with_namespace
is_internal_rule
is_file_limitation_rule
has_file_limitation
is_supported_format
is_supported_arch
get_arch
is_supported_os
get_os
get_meta_str
is_running_standalone
get_default_root
get_default_signatures
get_workspace
get_extractor
get_file_extractors
is_nursery_rule_path
collect_rule_file_paths
on_load_rule_default
get_rules
get_signatures
get_sample_analysis
collect_metadata
compute_dynamic_layout
compute_static_layout
compute_layout
install_common_args
handle_common_args
main
ida_main
ghidra_main

I think these things belong in main:

timing
set_vivisect_log_level
is_running_standalone
install_common_args
handle_common_args
main
ida_main
ghidra_main

proposal 1: move to existing capa.rules:

has_rule_with_namespace
is_internal_rule
is_file_limitation_rule
get_meta_str
is_nursery_rule_path
collect_rule_file_paths
on_load_rule_default
get_rules

proposal 2: move to capa.capabilities:

has_file_limitation

proposal 3: move to new module capa.loader:

has_file_limitation
is_supported_format
is_supported_arch
get_arch
is_supported_os
get_os
is_running_standalone
get_default_root
get_default_signatures
get_workspace
get_extractor
get_file_extractors
get_signatures
get_sample_analysis
collect_metadata
compute_dynamic_layout
compute_static_layout
compute_layout

Please brainstorm if these changes make intuitive sense to you. If so, then we can update this PR with the changes, paying particular attention to how the scripts change - we'd want these to become simpler/more readable.

@mr-tz
Copy link
Collaborator

mr-tz commented Oct 19, 2023

I'd propose that this new module be capa.capabilities (not capa.features.capabilities). Thoughts @mr-tz @mike-hunhoff @yelhamer?

agreed

I think we should use this as a start to a larger refactor significantly reducing the size of main and making capa much easier to use as a library (and within our own code), see for example #1813 as well as many code duplication in our scripts and plugins.
Let's wait on that big refactor till all the current big tasks and PRs have been addressed though.

Copy link
Collaborator

@williballenthin williballenthin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great, thanks @yelhamer!

yelhamer and others added 3 commits October 20, 2023 09:27
@williballenthin williballenthin mentioned this pull request Oct 20, 2023
18 tasks
@williballenthin
Copy link
Collaborator

@mr-tz i'd like to have your LGTM here too before we merge

@mr-tz
Copy link
Collaborator

mr-tz commented Oct 20, 2023

reviewing now!

capa/rules/__init__.py Outdated Show resolved Hide resolved
capa/main.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@mr-tz mr-tz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great changes, just a few minor issues I've noticed

@yelhamer yelhamer requested a review from mr-tz October 26, 2023 17:02
Copy link
Collaborator

@mr-tz mr-tz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@yelhamer yelhamer merged commit 0097822 into mandiant:dynamic-feature-extraction Oct 27, 2023
16 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change introduces a breaking change that should be released in a major version enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants