Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using Stax2 (Woodstox) instead of plexus-xml (Xpp) to process XML #1174

Merged
merged 1 commit into from
Nov 5, 2024

Conversation

andrzejj0
Copy link
Contributor

@andrzejj0 andrzejj0 commented Nov 4, 2024

Second stage of convergence with Maven 4 chages: this time getting rid of Plexus XML for document parsing.

Using Stax2 (Woodstox) instead of plexus-xml (Xpp) to process XMLuments; reimplemented the mutable reader using an XMLStreamReader2, added some cleanup and more documentation.

  • thanks to XMLStreamReader2 providing both the start and the end of the current token in its LocationInfo, it was not necessary to cache the next token like the ModifiedXMLEventReader did, which saves us a lot of complexity and makes the design more compatible with the delegate;
  • most importantly, the semantics is in line with the interface: hasNext() does not proceed the cursor to the next event (like was the case with the previous implementation), next() does

It's not possbible to switch entirely to Stax2 because of Modello. But aside from that, most stuff has been replaced. Thanks to using a Stax parser instead of an event-based parser, we can also see some performance improvements (see integration test times for example).

Added some polish, cleanup, documentation (especially to MutableXMLStreamReader as it's the central piece of this whole plugin).

It's quite a big change. Please review. Should we push this to 2.18.0 or postpone it a bit?

…uments; reimplemented the mutable reader using an XMLStreamReader2, added some cleanup and more documentation.
@andrzejj0 andrzejj0 added this to the 2.18.0 milestone Nov 4, 2024
@andrzejj0 andrzejj0 self-assigned this Nov 4, 2024
Copy link
Member

@slawekjaranowski slawekjaranowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, we can try ... every ITs look ok

@andrzejj0 andrzejj0 merged commit a899699 into mojohaus:master Nov 5, 2024
26 checks passed
@andrzejj0 andrzejj0 deleted the xml branch November 5, 2024 06:59
@slawekjaranowski
Copy link
Member

@andrzejj0 we can try with stax-reader,writer

https://codehaus-plexus.github.io/modello/modello-maven-plugin/index.html

@andrzejj0
Copy link
Contributor Author

@slawekjaranowski I tried. It almost worked. However, there's one problem. It seems that the reader generated by modello-stax is not forgiving when it comes to XSD validation. It is only able to parse the namespace matching the namespace pattern defined in the .mdo file.

With the recent changes, we have the namespace of https://www.mojohaus.org/VERSIONS/RULE/${version}. However, we still have older unit tests and integration tests using the older namespaces, like http://mojo.codehaus.org/versions-maven-plugin/rule/2.0.0.

Making this plugin strict is an easy choice, but shortsighted. This will create a problem with backwards compatibility where people blindly updating the plugin or using the latest version will suddenly have a broken CI/CD pipeline because they've still been using the old namespace somewhere in their rules file and it will not work anymore.

So, ideally, it would be great to have a lenient parser, like we apparently do in case of Xpp3.

Should I file a bug/feature with Modello (I could even attempt to implement it)?

@andrzejj0
Copy link
Contributor Author

andrzejj0 commented Nov 6, 2024

It seems that strict namespace checking is built into modello-stax:

// legacy hack for pomVersion == 3
                if ( "3".equals( modelVersion ) )
                {
                    modelVersion = "3.0.0";
                }
                if ( !"2.1.0".equals( modelVersion ) )
                {
                    throw new XMLStreamException( "Document model version of '" + modelVersion + "' doesn't match reader version of '2.1.0'", xmlStreamReader.getLocation() );
                }

This actually requires that we're only getting version 2.1.0 and nothing else. Obviously this will break things for us. Xpp3 doesn't have this.

Next to introducing a feature to ignore namespace or just the version, maybe we should actually just create different model versions whenever we introduce a new namespace, and maintain as many namespace versions as we agree to support. That will surely make our code bigger and more difficult to maintain, but that's what you get when you introduce new namespaces, I guess.

Until this is sorted I think that we should just keep Xpp3 for the time being.

@andrzejj0
Copy link
Contributor Author

Alternatively, if we want to simply not use plexus-xml, we could use dom4j, but it requires dom4j:dom4j which currently has 2 open CVEs.

@slawekjaranowski
Copy link
Member

we can try to fix modello-stax should be work similar as modello with plexus

@andrzejj0
Copy link
Contributor Author

andrzejj0 commented Nov 8, 2024

That's one option. Another option is to use modello-dom4j. Then we also don't use Plexus and we're not after performance here anyway.

Unfortunately, dom4j:dom4j currently has 2 open CVEs. That held me from going for it.

@andrzejj0
Copy link
Contributor Author

I briefly looked at the code of the Modello plugin. It looks like the version check is only added if block is present in the model definition file.

I agree that the Modello plugin needs to behave in a consistent manner across parser types, so perhaps there needs to come a new argument there to choose whether this namespace check is needed.

@slawekjaranowski
Copy link
Member

we can start a discussion about it at Modello issues https://github.com/codehaus-plexus/modello/issues

@andrzejj0
Copy link
Contributor Author

andrzejj0 commented Nov 9, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants