Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compose Metadata Analyzer: Refactor to support V2 and V1 repositories #4470

Merged
merged 6 commits into from
Dec 29, 2024

Conversation

valentijnscholten
Copy link
Contributor

@valentijnscholten valentijnscholten commented Dec 18, 2024

Description

Use Composer Repository V2 url for metadata as v1 is deprecated. The https://repo.packagist.org and its private counterparts at packagist.com are becoming read only to clients that request V1 metdata.

This PR adds support for V2 for all repositories. It also keeps and improves V1 support as lots of third party repositories are still in V1 mode. The V1 support in DT was very minimal and this PR ads more constructs such as packages included in the main packages.json file and the user of available-packages and available-package-patterns to only query for applicable packages.

I analyzed a number of real world projects and looked at all the composer repositories and how those were handled by Composer. The implementation is based on the docs found in https://github.com/composer/composer. The docs are a bit ambiguous / unclear sometimes, so I also looked at the source code. I confirmed the behaviour by looking at real worl requests/responses sent by the Composer CLI.

Test cases are added.

Addressed Issue

Fixes #2337

Additional Details

The metadata being returned for V2 has the same format as V1, except that the releases for a package are returned as an Array instead of a JSONObject (minified is what Composer calls this).

One thing that changed is how non-existing (or no longer existing) packages are being handled:

Checklist

  • I have read and understand the contributing guidelines
  • This PR fixes a defect, and I have provided tests to verify that the fix is effective
  • This PR implements an enhancement, and I have provided tests to verify that it works as intended
  • This PR introduces changes to the database model, and I have added corresponding update logic
  • This PR introduces new or alters existing behavior, and I have updated the documentation accordingly

Copy link

codacy-production bot commented Dec 18, 2024

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
+0.00% (target: -1.00%) 100.00% (target: 70.00%)
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (ee5cbce) 22927 18194 79.36%
Head commit (eb80260) 22927 (+0) 18194 (+0) 79.36% (+0.00%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#4470) 9 9 100.00%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

Codacy stopped sending the deprecated coverage status on June 5th, 2024. Learn more

@valentijnscholten valentijnscholten marked this pull request as ready for review December 18, 2024 10:31
@valentijnscholten
Copy link
Contributor Author

The current code only works with repositories that use the default url pattern for package metadata:

    /**
     * @see <a href="https://packagist.org/apidoc#get-package-data">Packagist's API doc for "Getting package data - Using the Composer v2 metadata"</a>
     * Example: https://repo.packagist.org/p2/monolog/monolog.json
     */
    private static final String PACKAGE_META_DATA_URL = "/p2/%s/%s.json";

More extensive changes are needed to support repositories that use a different url pattern, or provide the package metadata inline. Those changes should go into a different PR so we can keep this PR here simple. Hopefully that helpts to get this PR merged/released before Composer V1 stops providing new metadata (February 1st 2025)

@nscuro
Copy link
Member

nscuro commented Dec 20, 2024

Hopefully that helpts to get this PR merged/released before Composer V1 stops providing new metadata (February 1st 2025)

So this really could (should!) go out in the 4.12.3 bugfix release then. Even if we get 4.13 out before Feb, not everyone will upgrade non-bugfix releases immediately. So better play it safe and get it out sooner.

@nscuro nscuro added enhancement New feature or request backport/4.12.3 PRs to be backported to version 4.12.3 labels Dec 20, 2024
@valentijnscholten
Copy link
Contributor Author

Even better. Didn't know if you'd want it in a bugfix release, but it's not too big of a change.

@valentijnscholten
Copy link
Contributor Author

valentijnscholten commented Dec 20, 2024

Looks like some third party repositories still expose their (meta)data in V1 format. Example https://packages.shopware.com.
And there's no field indicating which version. So we'll need to support both versions in DT for now by just trying both.

@valentijnscholten valentijnscholten marked this pull request as draft December 20, 2024 21:26
@valentijnscholten
Copy link
Contributor Author

valentijnscholten commented Dec 21, 2024

After a good nights sleep I realized we should just look at https://github.com/composer/composer/blob/main/src/Composer/Repository/ComposerRepository.php and mimic what is done there. The approach in #4435 may work for the short term, but it will result in lots of failing (404) requests for V2 meatada to repositories that are still V1.

@valentijnscholten
Copy link
Contributor Author

valentijnscholten commented Dec 21, 2024

Turns out that do it right, we need to retrieve the packages.json for the repository and check for the metadata-url. If present, it's V2. Otherwise it's V1. The approach in #4435 may work for the short term, but I decided to implement the "correct" solution here by mimicking what Composer itself does. We need that anyway if we want to do Vulnerability Analysis as well from these repositories.

Some talking points:

  • I added a simple in memory cache to cache the packages.json response. This may need something persistent/shared when ported to Hyades.
  • The ComposerMetaAnalyzer doesn't have enough information to make decent cache key. Any thoughts on providing more fields to the Analyzers besides just the Url. We could provide the ID as well, which should be unique? Or can we just provide the full reposiroy entity? For now the code works with the repositories I have seen, but I had to add a protected method to clean the cache before the start of each unit test as these are all running against localhost. Alternatively I can use a path in the repo url for each test. Or a username. But all that feels dirty :-)
  • Some of the mock files are huge. These are real-world examples. But we could trim them a bit and still have decent test quality.

Valentijn

Copy link

codacy-production bot commented Dec 21, 2024

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
+0.09% (target: -1.00%) 80.59% (target: 70.00%)
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (ee5cbce) 22927 18194 79.36%
Head commit (a6fe0b7) 23013 (+86) 18283 (+89) 79.45% (+0.09%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#4470) 170 137 80.59%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

Codacy stopped sending the deprecated coverage status on June 5th, 2024. Learn more

@valentijnscholten valentijnscholten marked this pull request as ready for review December 23, 2024 09:59
@valentijnscholten valentijnscholten changed the title Compose Metadata Analyzer: Use v2 URL Compose Metadata Analyzer: Refactor to support V2 and V1 Dec 23, 2024
@valentijnscholten valentijnscholten changed the title Compose Metadata Analyzer: Refactor to support V2 and V1 Compose Metadata Analyzer: Refactor to support V2 and V1 repositories Dec 23, 2024
@nscuro
Copy link
Member

nscuro commented Dec 23, 2024

Any thoughts on providing more fields to the Analyzers besides just the Url. We could provide the ID as well, which should be unique? Or can we just provide the full reposiroy entity?

I don't see a problem in providing more info, like the repository type and identifier. Don't think we should pass the entire Repository object though.

I had to add a protected method to clean the cache before the start of each unit test as these are all running against localhost.

Added a comment about that, it should be possible to avoid this conflict in tests by instructing WireMock to use a dynamic port.

Some of the mock files are huge. [...] But we could trim them a bit and still have decent test quality.

That would be great.

@valentijnscholten
Copy link
Contributor Author

valentijnscholten commented Dec 23, 2024

I don't see a problem in providing more info, like the repository type and identifier. Don't think we should pass the entire Repository object though.

I suggest we replace the existing setRepositoryUrl and setUsernameAndPasssword with a new setup(...) method where we provide all parameters and setup everything.

@valentijnscholten
Copy link
Contributor Author

Some of the mock files are huge. [...] But we could trim them a bit and still have decent test quality.

That would be great.

Trimmed them down.

@@ -211,6 +213,7 @@ private void analyze(final QueryManager qm, final Component component, final IMe

if (StringUtils.trimToNull(model.getLatestVersion()) != null) {
// Resolution from repository was successful. Update meta model
//FIXME What happens if multiple repositories return a metamodel result with different lastPublishedTimestamps?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The repository loop is broken out of upon encounter of the first matching repository, via the break in line 243.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's what made me wonder. I noticed some components are published in multiple repositories. For example phpunit. I came across a third part repo that published an older version of phpunit. If DT ends up using that release as latest, it might draw the wrong conclusion. Might be safer to analyze all repositories and take the global latest release. But that could be wrong as well if some repo publish a release with some changes/fixes with a higher version number. Not sure how DT handles the suffixes like p1, p2, redhat1, etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no bullet-proof way to determine what the "correct" repository or version is, at least none I am aware of. Hence the current strategy that will work most of the time, is to pick the first match. The strategy assumes that authoritative repositories (i.e. Maven Central for Maven) are checked first, which ATM is ensured via the resolution order.

Not sure how DT handles the suffixes like p1, p2, redhat1, etc.

It's not possible to handle such things generically. #2826 might help, but only as long as whoever adds these suffixes abides to the ecosystem's de-facto standard notation.

@nscuro nscuro added this to the 4.13 milestone Dec 28, 2024
Copy link
Member

@nscuro nscuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@nscuro nscuro merged commit 626493b into DependencyTrack:master Dec 29, 2024
10 checks passed
@valentijnscholten valentijnscholten deleted the composer-metadata-v2 branch January 1, 2025 17:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/4.12.3 PRs to be backported to version 4.12.3 enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Switch to Packagist api v2 for Composer package metadata
2 participants