Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split linters in separate classes #17081

Merged
merged 34 commits into from
Feb 16, 2024

Conversation

bernt-matthias
Copy link
Contributor

@bernt-matthias bernt-matthias commented Nov 24, 2023

So far mainly a call for discussion. Would be nice to have this discussed first before I continue with the other linters
(discuss we could transform only stdio, a few more, or all linters within this PR).

Currently, a linter is a function prefixed by lint_ (most of them are contained in tool_util.linters which are used by default). These linters are at the moment quite monolithic, e.g. there is a single linter for the input tag and we have
only a limited number of states (check, info, warn, error).

This led to a lot of problems and frustration (see) since

  • It has never been defined what qualifies as an error/warning and contributors had their own different definitions.
  • It has not been possible to disable single checks, but only complete linters (which is not useful due to their monolithic nature)

So far this change does the following

  • Create a new Linter class which makes it easier to decide what a linter is in the loop over the modules.
  • The main method of this class is the classmethod lint
  • We could have a fix method that may be used to fix some problems (discuss: maybe not in this PR?).
  • when I started I added a code property (inspired by pep8/flake8 codes) which we could add to the LintMessages that are generated by the lint methods. But then I realised that skipping of linters simply works by the linter name (i.e. class name) .. maybe this is sufficient (discuss: advantage of codes is that we definitely have short codes for the planemo CLI, alphanumeric codes might become longer but this is closer to what we have at the moment). Alternatively we could just call the classes with codes, e.g. S001.

The following already works: planemo lint tools/filters/cutWrapper.xml --skip StdIOAbsence

  • removes the attribute check for regex and exit_code tags and subtag check for stdio since those are covered already by the xsd linter.

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

try:
re.compile(match)
except Exception as e:
lint_ctx.error(f"Match '{match}' is no valid regular expression: {str(e)}", node=child)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to have a code per message, unless one linter only emits one possible message ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can try to do this. With the current state I would need to split StdIOAbsence only, or? I hoped to come away with something like at most one message per linter (which would be fine for profile versions as in StdIOAbsence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it's important to note that we currently filter for linter function/class names (here) and not linter message codes. I would say that do not need the codes. But we can either name the filter classes using a code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One message per linter sounds fine, but also like a lot of work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One message per linter sounds fine, but also like a lot of work.

The good thing is that we have good unit test coverage - which now pays off.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you augment the lint_ctx.function to instead pass codes we can use those codes to determine what is classified as an error / what can be ignored. This seems like less work (we'd have to run all relatively coarse linters, but I think that's ok ?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent idea, but I think we are fixed to the signature of the linter function, i.e 1st parameter ToolSource and 2nd parameter LintContext. The reason is that there are some linters that live in the planemo sources. One option would be to move them here (not sure if this would add extra requirements) - which might be good anyway.

But there might be another way, due to this hack we can filter the messages before adding them to the final message list (and also if without this hack we could filter the list of messages after each linter application).

I'm a bit worried how we can ensure that our codes are not duplicated.

One message per linter sounds fine, but also like a lot of work.

Main question for me is what is the "best" solution ..

we'd have to run all relatively coarse linters, but I think that's ok ?

I also think this would be OK. Runtime should not be a concern.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is that there are some linters that live in the planemo sources

I think when I first implemented the XSD validation the external dependency on xmllint (https://github.com/galaxyproject/planemo/blob/master/planemo/xml/validation.py#L14C20-L14C27) was the issue. lxml seemed less mature at the time than xmllint. I guess we use lxml a lot more aggressively now and we have been generally pleased with the XSD validation it is doing and we now have a Galaxy dependency on lxml. I think I would be fine dropping the xmllint path through planemo if it would simplify things and let the XSD validation move into Galaxy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment @jmchilton. Will try to move it here, but probably in a separate PR.

I'm still experimenting if separate classes are the way to go. It seems difficult to avoid code duplication, but it will become clearer how we end up in the different linter messages.

@bernt-matthias bernt-matthias force-pushed the topic/linter-overhaul branch 6 times, most recently from 1780d95 to 3305cef Compare November 26, 2023 13:57
@bernt-matthias bernt-matthias force-pushed the topic/linter-overhaul branch 15 times, most recently from bb17d76 to aa85c64 Compare December 7, 2023 16:24
@bernt-matthias
Copy link
Contributor Author

bernt-matthias commented Dec 7, 2023

Ping @mvdbeek and @jmchilton I'm quite happy with the current state. I have moved the lxml-based xsd validation from planemo to here. I thought it would be a good idea to run the xsd linter with each of the unit tests for the linters (mainly to remove redundant linters) which uncovered a few minor problems (e.g. in the xsd).

For me, the TODOs would be

  • decide if we want to store the linter type with each message and if we want shorter (eg numeric) linter names
    • we could also have something like aliases which would also allow to continue to use current skip lists, e.g. "inputs"
  • run the current and previous state of the linters on the IUC repo (maybe also devteam)
  • adapt planemo
    • remove the lxml based linter (should we keep the possibility to lint with xmllint?)
    • maybe allow to specify a skip list from file
    • maybe new sub command to list all available linters + doc

Regarding lxml I was also wondering if we should remove https://github.com/galaxyproject/galaxy/blob/e7a168000e627dc8579ad749174bd5eabbe15873/lib/galaxy/util/__init__.py#L68C9-L68C9

@bernt-matthias bernt-matthias marked this pull request as ready for review December 7, 2023 16:39
bernt-matthias added a commit to galaxyproject/tools-iuc that referenced this pull request Dec 18, 2023
* fix all XML schema errors

discovered here galaxyproject/galaxy#17081

* more linter fixes

* fix value

* add missing macros file for deprecated tool

creates a slightly annoying Traceback in the CI setup job

* ampvis: fix URLs

* ampvis2: more URL fixes

* ampvis2 heatmap fix duplicated output label

* Apply suggestions from code review

Co-authored-by: Marius van den Beek <[email protected]>

---------

Co-authored-by: Marius van den Beek <[email protected]>
@mvdbeek mvdbeek modified the milestones: 23.2, 24.0 Dec 19, 2023
bernt-matthias added a commit to bernt-matthias/planemo that referenced this pull request Jan 16, 2024
return lint_context


def lint_xml_with(lint_context, tool_xml, extra_modules=None) -> LintContext:
extra_modules = extra_modules or []
tool_source = get_tool_source(xml_tree=tool_xml)
return lint_tool_source_with(lint_context, tool_source, extra_modules=extra_modules)


def list_linters(extra_modules: Optional[List[str]] = None) -> List[str]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's actually a rare case where you can use a Metaclass:

In [1]: linters = {}
   ...:
   ...: class LinterMeta(type):
   ...:     def __new__(cls, clsname, bases, attrs):
   ...:         newclass = super(LinterMeta, cls).__new__(cls, clsname, bases, attrs)
   ...:         linters[clsname] = newclass
   ...:         return newclass
   ...:
   ...:
   ...: class MyLinter(metaclass=LinterMeta):
   ...:     pass
   ...:
   ...: linters
Out[1]: {'MyLinter': __main__.MyLinter}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing me to metaclasses. Never took the time to learn about them.

Still I hope that 3a98a3a is sufficient?

using `__subclasses__`
@mvdbeek
Copy link
Member

mvdbeek commented Feb 14, 2024

Any chance you could resolve the conflicts ?

@mvdbeek
Copy link
Member

mvdbeek commented Feb 16, 2024

Thanks a lot @bernt-matthias, this is really important work!

@mvdbeek mvdbeek merged commit 74a1b02 into galaxyproject:dev Feb 16, 2024
53 checks passed
Copy link

This PR was merged without a "kind/" label, please correct.

@bernt-matthias bernt-matthias deleted the topic/linter-overhaul branch February 17, 2024 11:43
bernt-matthias added a commit to bernt-matthias/planemo that referenced this pull request May 2, 2024
bernt-matthias added a commit to bernt-matthias/planemo that referenced this pull request May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants