Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: update content_type_matching logic for binary payloads #429

Merged
merged 7 commits into from
May 30, 2024

Conversation

YOU54F
Copy link
Member

@YOU54F YOU54F commented May 28, 2024

fixes #171

Content Type - Detection Mechanisms for Binary Content

pact_matching now performs the following for matching Binary content-types

  1. Determines expected Content-Type header requested by user in test
  2. Read content buffer with infer library, and guess Content-Type based on magic bytes
  3. If unsuccessful
    1. Read content buffer with tree_magic_mini library, and guess Content-Type based on shared-mime-info DB
      1. MagicDB is not shipped with pact_matching, due to GPL restrictions, users can add manually
        1. Linux Alpine - apk add shared-mime-info
        2. MacOS brew install shared-mime-info
          1. arm64 MacOS requires tree_magic_mini fork
        3. Linux - Debianapt-get install -y shared-mime-info
  4. If either result returns text/plain, then manually read bytes using detect_content_type_from_bytes function in pact_models
  5. If all of 2, 3, or 4 fails, then throw error, otherwise return Ok

certainly better ways to implement this, but wanted to make sure I had a full round trip in all the consuming languages, and could remove the content-type hacks there.

The plus side is we have consistent behaviour in all of our test examples, on all supported platforms and architectures (win/lin/mac arm64 and amd64) without the use of tree_magic and the existence of shared-mime-info (which is great for container users, and great for avoiding GPL woes)

Rust libraries used:

Questions

  1. Do we need tree_magic_mini, I don't seem to hit that check in the wild
  2. We probably want some vendored format examples that are harder to detect
  3. Provide user full opt-out of content-type matching, in case of things going weird?

Notes to merger

  1. will need updating in pact_mock_server (use of pact_matching) which is then used in this repo, so I believe we will need to build/release pact_matching to crates.io, update pact_mock_server, release, and then back to pact_ref to do the works there

Tested in

@rholshausen
Copy link
Contributor

Looks like infer will handle the most common types, but it has all the checks build in in Rust code, not config, so it is not extensible and will not be able to detect new types unless the crate is updated. So having tree_magic as a fallback is probably a good idea.

@YOU54F YOU54F marked this pull request as ready for review May 28, 2024 23:59
@YOU54F YOU54F marked this pull request as draft May 29, 2024 00:19
YOU54F added 2 commits May 29, 2024 03:31
- use updated tree_magic with merged fix for apple silicon
@YOU54F YOU54F marked this pull request as ready for review May 29, 2024 19:53
@YOU54F YOU54F requested review from adamrodger, mefellows, tienvx and JP-Ellis and removed request for mefellows May 29, 2024 19:53
@YOU54F
Copy link
Member Author

YOU54F commented May 29, 2024

👋🏾 Tagged a few of you for review!

Always room for improvement on what is proposed but this gives us

  • consistent cross platform and cross arch content-type detection with the new crate for the existing raised issues
  • falls back to the existing behaviour we have today, with additional improvements to also add support for arm64 macos in the tree_magic_mini crate for shared-mime-info db lookups
  • improved debug logging, so we are aware of which content type lookups are being performed.

The compatibility suite has been updated to test cross platform (windows x86_64 + macos x86_64/arm64 ) and skipped tests for windows have been been re-instated 👍🏾

Happy to make any amendments, as I know this might not be the best solution, especially with respect to my coding style, or the overall (existing) approach to matching in this function, but please be reminded this is a massive improvement over our existing confusing behaviour, and we can always progress forwards.

@@ -30,7 +30,7 @@ itertools = "0.12.1"
lazy_static = "1.4.0"
maplit = "1.0.2"
pact_matching = { version = "~1.2.2", path = "../pact_matching", default-features = false }
pact_mock_server = { version = "~1.2.6", default-features = false }
pact_mock_server = { version = "=1.2.8", default-features = false, git = "https://github.com/you54f/pact-core-mock-server.git", branch = "main" }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please update to released pact_mock_server, when pact_matching is released, and pact_mock_server updated to reference released crate

@@ -31,7 +31,8 @@ maplit = "1.0.2"
multipart = { version = "0.18.0", default-features = false, features = ["client", "mock"] }
onig = { version = "6.4.0", default-features = false }
pact_matching = { version = "~1.2.2", path = "../pact_matching" }
pact_mock_server = { version = "~1.2.6" }
pact_mock_server = { version = "=1.2.8", git = "https://github.com/you54f/pact-core-mock-server.git", branch = "main" }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please update to released pact_mock_server, when pact_matching is released, and pact_mock_server updated to reference released crate

@@ -31,7 +31,8 @@ maplit = "1.0.2"
multipart = { version = "0.18.0", default-features = false, features = ["client", "mock"] }
onig = { version = "6.4.0", default-features = false }
pact_matching = { version = "~1.2.2", path = "../pact_matching" }
pact_mock_server = { version = "~1.2.6" }
pact_mock_server = { version = "=1.2.8", git = "https://github.com/you54f/pact-core-mock-server.git", branch = "main" }
# pact_mock_server = { version = "=1.2.8", path = "../../../pact-core-mock-server/pact_mock_server" }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

path useful for dev guidelines if users are working across both repos locally.

.
├── pact-core-mock-server
└── pact-reference

@rholshausen rholshausen merged commit dece679 into pact-foundation:master May 30, 2024
32 checks passed
@rholshausen
Copy link
Contributor

Just a note, if you want to override a dependency, the easiest way is to use the [patch.crates-io] section in the workspace Cargo.toml

Copy link
Contributor

@tienvx tienvx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I wonder: since this PR is merged, when can we revert custom changes related to the pact_mock_server?

1. MagicDB is not shipped with pact_matching, due to GPL restrictions, users can add manually
1. Linux Alpine - `apk add shared-mime-info`
2. MacOS `brew install shared-mime-info`
1. `arm64` MacOS requires `tree_magic_mini` [fork](https://github.com/you54f/tree_magic)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe a new version has been released v3.1.5, and this fork is no longer needed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it has been yes! i need to update that comment to reflect, we are consuming 3.1.5 in here now

@rholshausen
Copy link
Contributor

@rholshausen
Copy link
Contributor

Two tests are failing on Windows: https://github.com/pact-foundation/pact-reference/actions/runs/9295821959/job/25583357274

All good. Mock server is using the old pact matching. Updated it and ran the tests on Windows, all pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Matching binary payloads does not work consistently across OS/Platform
3 participants