
Include only implementation reports submitted by maintainers or with their consent #132

Closed
csarven opened this issue Apr 14, 2022 · 18 comments


@csarven
Contributor

csarven commented Apr 14, 2022

https://github.com/solid/test-suite/blob/main/README.md#table appears to include a (high-level) report about implementations.

Can the table (and information elsewhere) only list implementations for which the project's maintainers submitted a report, or for which they gave their consent (ideally on public record)?

FWIW, this was the approach taken for the LDN tests: reports were either submitted by the project's maintainer or with the maintainer's consent. It allows submissions to be made and shared in good faith.

See also solid/specification#383

@michielbdejong
Collaborator

michielbdejong commented Apr 20, 2022

That was proposed in the past, but in practice it doesn't work. The main reason is that some implementers go through the effort of properly testing their server, while others don't, so that approach would lead to incomplete and incorrect data.

The only situation where we will not test a server implementation is if that server self-reports as "not (yet) Solid compliant". We don't test teapots. :)

The test suite is primarily a service to pod hosters and to Solid app developers; our role as a service to pod server implementers is only secondary. We owe it to our readers to give a complete and honest report of the objective data, not just echo what each implementer self-advertises in their own documentation. There have been repeated cases where implementations advertised themselves as "use this as a Solid server" but in reality turned out not to be spec compliant. So a self-reporting system like you propose would not work.

Thanks for the suggestion nonetheless!

@csarven
Contributor Author

csarven commented Apr 20, 2022

some implementers go through the effort of properly testing their server, but some don't

What/Who? Why do they end up testing improperly?

The only situation where we will not test a server implementation is if that server self-reports as "not (yet) Solid compliant"

Is this a well-known and agreed practice? Where do they report not to be compliant? Why is it opt-out instead of opt-in?


Claims about an implementation's conformance can surely be made by multiple parties; what's authoritative and trustworthy is another matter.

When I suggest that the maintainers of a project report their test results, it is not merely a claim that "our server is spec compliant". It is very much about tests running in a (semi-)automated or manual mode - whatever is appropriate for a particular test - and generating a structured report that is reproducible in the context of the inputs provided.

but in reality, it turned out not to be spec compliant

Specs need to authoritatively reference tests and reports (hence, issue 383). Only then can we say that the results/reports generated by a test suite are valid. For specs that have a status other than "final" (1.0 or REC or whatever), the results are questionable - unless, of course, the maintainer states that a particular version of an implementation conforms to a particular version of a spec when tested with a particular version of the test software. If that (and possibly other) information is not taken into account for testing or communicated in the reports, I would always question the data/claims/reports - it is a major pothole for all sorts of issues: data quality, concept drift, comparability, communication...
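To make that concrete, here is a minimal sketch (JavaScript, with purely illustrative names, versions and fields; this is not an existing report format used by the test suite) of the kind of context a submitted report would need to carry to stay reproducible and comparable:

    // Illustrative only: field names, versions and identifiers are assumptions,
    // not a schema that the test suite defines today.
    const report = {
      implementation: { name: 'example-solid-server', version: '1.2.3' },
      specification:  { name: 'Solid Protocol', status: 'draft', version: '0.9.0' },
      testSuite:      { name: 'web-access-control-tests', version: '7.0.0' },
      submittedBy:    'maintainer',  // or 'third-party', so the claimant is on record
      results: [
        { test: 'read-public-resource', outcome: 'passed' },
        { test: 'append-only-access',   outcome: 'failed', note: 'disputed' }
      ]
    };

    // Publishing the report together with its inputs (server build, configuration,
    // test-suite commit) lets anyone re-run the same combination and check the claim.
    console.log(JSON.stringify(report, null, 2));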

Having said that, there is room for both maintainer-submitted reports and third-party reporting and/or verification (of maintainers' reports). To get to reliable third-party reporting, I'd argue that the bar for the quality of tests and reporting is very high, especially if the communication is intended to carry a high degree of authority on the matter. That itself requires multiple layers of checks: people writing the test software, people writing the tests, authors/editors verifying the accuracy and necessary precision of the tests, and a significant sample of reports that can themselves be verified. This is important because the community and the world at large will interpret what's authoritative for different reasons.

At the end of the day, specs need to help implementers and within good reason there shouldn't be doubt about the validity of the tests or the results. Hence, my concern about who makes the claim - after all, as you know, anyone can say anything about anything.

@michielbdejong
Collaborator

Yeah, that's why it has to be a collaboration between testers and implementers, so that we put all our knowledge and tools together, adding up to the best information we can compile as a community. And I think in most cases that's going pretty well.

We always ping the implementer in question, of course, and in case of doubt or disagreement we mark the test as 'disputed'. That ensures that the information we publish is at least as good as the information that each implementer would publish by themselves.

But if you feel there is room for improvement then maybe we want to call a panel meeting about this topic?

@michielbdejong
Collaborator

tests running in a (semi-)automated or manual mode

Yes please! Absolutely. And then we can also link those results, of course. NSS, PSS and Solid-Nextcloud are already doing this, and I think CSS and Inrupt-ESS are starting to run a subset.

What I'm pushing back on is the word "only" in this issue's title. We're not going to hold back information unless there are technical grounds to do so. We should of course publish the best information we have, annotate it with our best knowledge of the context (e.g. whether a given part of the spec is a draft, optional, etc.), and invite everybody to add their knowledge and skills in an open way.

@michielbdejong
Collaborator

This is something we will decide in the test suite panel.

@michielbdejong
Collaborator

Focusing on technical details is not just important because of our community's code of conduct, but also because of the issue we're discussing here:

What should be the process of publishing test reports? I think there are three main cases:

  • If implementers self-report results from CI and/or from manual testing, I think we can usually publish those in good faith, unless there is evidence of tampering.

  • If the test suite panel reports test results and the implementer has specific technical comments, then we discuss these, and the results are not published until the discussion leads to a conclusion.

  • If the test suite panel reports test results and the implementer makes broad negative statements about the panel's work, without going into any technical detail, what should happen? I think that's the trickiest one of the three.

@michielbdejong
Collaborator

Thanks for going into some more detail. IMHO the level of detail is often what makes the difference between a destructive comment and a constructive one.

The majority of implementations that pass the table are not compatible with existing apps.

-> created #133 about this.

Multiple implementations that do not pass the table are fully compatible with existing apps.

-> I personally would say that's not a reason to lower the bar, but if you feel we should change our reporting based on that then we'll listen, take that on as a suggestion, and discuss it with the panel. I created #134 about this.

In particular, CSS has implemented hundreds of commits implementing crucial app and spec compatibility behavior. Virtually none of these affect test score.

-> that's interesting, we might be able to dig through those to find pointers for new tests we can add. I created #135 about this.

@michielbdejong
Collaborator

@csarven The Doodle for the meeting about this issue is almost complete; you are the last person missing. Can you please fill in your availability? Thanks!

@kjetilk
Contributor

kjetilk commented Apr 25, 2022

Let me take a middle ground here.

First, note that no specification is at 1.0 yet. I think that is important because, while it hasn't been uncommon to publish results of conformance testing without the maintainer's approval (it can be thought of as a service to the community and as a statement of fact, and I think it is arguably a good practice), that has only happened after 1.0. In the current situation, what constitutes conformant behaviour is still somewhat in a state of flux.

Then, we also need to ensure that we have conformance tests that are thoroughly reviewed, that we have metadata to explain what each test refers to and to show that it is provably legitimized by the spec, and that there is metadata to show that some tests are not yet approved. Finally, the test harness should reflect that.
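As a rough illustration of that kind of metadata (a sketch only; the identifiers, spec anchor and status values are hypothetical, not something either test suite defines today):

    // Hypothetical per-test metadata: it links the test to the spec text that
    // legitimizes it and records whether the test itself has been reviewed.
    const testMetadata = {
      id: 'patch-document-must-contain-patch-resource',
      specReference: 'https://example.org/solid-protocol#n3-patch', // placeholder anchor
      requirementLevel: 'MUST',
      reviewStatus: 'proposed' // e.g. 'proposed' | 'approved' | 'disputed'
    };

    // A harness could then count only approved tests towards a score and report
    // the rest separately.
    const countsTowardsScore = testMetadata.reviewStatus === 'approved';
    console.log(countsTowardsScore ? 'counted in score' : 'reported separately');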

Before we have all that, we can't be sufficiently confident that a test failure is the fault of the server under test rather than of the test or the testing infrastructure, and thus it is inappropriate to publish results without the maintainer's consent. Once we have all that, my opinion is that it is quite appropriate to do so.

I would like to integrate test writing into the spec review process. I have wanted to do that for years: have text and test move together, and make it part of the Definition of Done of a feature. We haven't had the resources to do that, though. I suspect it would speed up the overall ecosystem if we had, since we'd be more confident about the impact of a certain feature.

I'd rather not argue much more over this subject, but I do feel that all test suites should be held to the same standard, at least if they are to be in the solid org.

@michielbdejong
Collaborator

Thanks! @csarven We're waiting for you to fill in the doodle so we can discuss this issue in the panel.

@edwardsph
Contributor

@kjetilk Thanks - I agree completely.

  • The specs are not sufficiently ready - we have a mix of early releases and drafts.
  • The test suites are not sufficiently ready - neither of them is yet made up of fully reviewed and accepted tests.
  • The implementations are having to work in this fluid environment and balance multiple requirements:
    • support developers who are trying to build applications that can work on any server.
    • create implementations that help us understand where the spec is good and where it needs more work.
    • create implementations that have commercial value and can be trusted sufficiently in those environments.
    • develop things which are likely to appear in future versions of the specs but are required now for reasons above.

At present I do not believe we have any server implementation that passes all MUST/MUST-NOT requirement tests in both test suites. I would argue that we cannot say with confidence whether this is because the server is wrong or the test suite is. We certainly need to investigate any failures and record what is discovered in each case. If we published results from all test suites, what would that really tell people? It would only undermine confidence in the specs, servers and test suites. It is just too early to publish results without causing harm.

@michielbdejong
Collaborator

Thanks for your input, all!
We'll discuss it in the panel.

@bourgeoa
Member

For more than 2 months, at least 21 out of 93 tests (23% of tests) for WAC have mandated that the server accept invalid RDF

@RubenVerborgh I'm sorry to discover that the RDF is invalid.
I need some more information on the RDF error to be able to correct it in the tests and/or on the NSS server.

@edwardsph
Contributor

I'm not sure the RDF used in the tests is actually invalid. This is because the fragments used in the tests don't actually contain any triples. The code

    "@prefix solid: <http://www.w3.org/ns/solid/terms#>." +
    "#patch a solid:InsertDeletePatch;" +
    "  solid:inserts { <#patch> <#to> <#create> .}.",

generates

@prefix solid: <http://www.w3.org/ns/solid/terms#>.#patch a solid:InsertDeletePatch;  solid:inserts { <#patch> <#to> <#create> .}.

So that line represents a prefix declaration followed by a comment from the second # onwards.

The tests all succeed at submitting an empty patch. This may confirm the access control aspects of the mechanism but only on the assumption that the server hasn't gone "Oh - you sent me nothing - ok that's fine, you can do that".

If the patch actually has an effect and inserts a triple then that would be a serious RDF parser error but I don't think that is proven as the tests don't actually confirm anything changed.
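For comparison, here is a guess at what the fragment was meant to express (a sketch only, not a change that has been made anywhere): each statement needs its own line, and the subject needs to be written as <#patch>, since a bare #patch at the start of a line would still be a comment.

    // Presumed intent: explicit newlines, and <#patch> as an IRI subject rather
    // than the start of a comment.
    const patchBody =
      '@prefix solid: <http://www.w3.org/ns/solid/terms#>.\n' +
      '<#patch> a solid:InsertDeletePatch;\n' +
      '  solid:inserts { <#patch> <#to> <#create> . }.\n';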

@edwardsph
Contributor

edwardsph commented May 3, 2022

I should add that solid:InsertDeletePatch appears in the spec but is missing from the vocab at http://www.w3.org/ns/solid/terms#: solid/vocab#66

@edwardsph
Contributor

On reading the spec more, I would expect these tests to fail as the spec says

A patch document MUST contain one or more patch resources.

I would therefore expect all these tests to get a 422 response from a server, unless I have misread something.
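If that reading is right, a check along these lines (a hypothetical sketch; the URL and headers are placeholders, and authentication is omitted) would be expected to come back with 422 rather than a 2xx:

    // Hypothetical check: send the prefix-plus-comment body (which yields no
    // patch resource) and expect the server to reject it as unprocessable.
    const res = await fetch('https://server.example/container/resource.ttl', {
      method: 'PATCH',
      headers: { 'Content-Type': 'text/n3' },
      body: '@prefix solid: <http://www.w3.org/ns/solid/terms#>.\n' +
            '#patch a solid:InsertDeletePatch;  solid:inserts { <#patch> <#to> <#create> .}.\n'
    });
    console.log(res.status); // 422 expected if a patch document must contain at least one patch resource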

@michielbdejong
Collaborator

Thanks for digging into it @edwardsph, I created solid-contrib/web-access-control-tests#47 about it.

@michielbdejong
Collaborator

Discussion continued in solid/test-suite-panel#5
