
Include only implementation reports submitted by maintainers or with their consent #132

Closed
csarven opened this issue Apr 14, 2022 · 18 comments


@csarven
Contributor

csarven commented Apr 14, 2022

https://github.com/solid/test-suite/blob/main/README.md#table appears to include a (high-level) report about implementations.

Can the table (and information elsewhere) only list implementations for which the project's maintainers submitted a report, or for which they gave their consent (ideally on public record)?

FWIW, this was the approach taken for the LDN tests: reports were either submitted by the project's maintainer or with the maintainer's consent. It allows submissions to be made and shared in good faith.

See also solid/specification#383

@michielbdejong
Collaborator

michielbdejong commented Apr 20, 2022

That was proposed in the past, but in practice it doesn't work. The main reason is that some implementers go through the effort of properly testing their server, while others don't, so that approach would lead to incomplete and incorrect data.

The only situation where we will not test a server implementation is if that server self-reports as "not (yet) Solid compliant". We don't test teapots. :)

The test suite is primarily a service to pod hosters and to Solid app developers; our role as a service to pod server implementers is only secondary. We owe it to our readers to give a complete and honest report of the objective data, not just echo what each implementer self-advertises in their own documentation. There have been repeated cases where implementations advertised themselves as "use this as a Solid server" but in reality turned out not to be spec compliant. So a self-reporting system like you propose would not work.

Thanks for the suggestion nonetheless!

@csarven
Contributor Author

csarven commented Apr 20, 2022

some implementers go through the effort of properly testing their server, but some don't

What/Who? Why do they end up testing improperly?

The only situation where we will not test a server implementation is if that server self-reports as "not (yet) Solid compliant"

Is this a well-known and agreed practice? Where do they report not to be compliant? Why is it opt-out instead of opt-in?


Claims about an implementation's conformance can surely be made by multiple parties; what's authoritative and trustworthy is another matter.

When I suggest that the maintainers of a project report their test results, it is not merely a claim that "our server is spec compliant". It is very much about tests running in a (semi-)automated or manual mode - whatever is appropriate for a particular test - and generating a structured report that is reproducible in the context of the inputs provided.

but in reality, it turned out not to be spec compliant

Specs need to authoritatively reference tests and reports (hence, issue 383). Only then can we say that the results/reports generated by a test suite are valid. For specs that have a status other than "final" (1.0 or REC or whatever), the results are questionable - unless, of course, the maintainer states that a particular version of an implementation conforms to a particular version of a spec when tested with a particular version of the test software. If that (and possibly other) information is not taken into account for testing or communicated in the reports, I would always question the data/claims/reports - it is a major pothole for all sorts of issues: data quality, concept drift, comparability, communication...
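To make that concrete, here is a minimal sketch (JavaScript, with purely illustrative names, versions and fields; this is not an existing report format used by the test suite) of the kind of context a submitted report would need to carry to stay reproducible and comparable:

    // Illustrative only: field names, versions and identifiers are assumptions,
    // not a schema that the test suite defines today.
    const report = {
      implementation: { name: 'example-solid-server', version: '1.2.3' },
      specification:  { name: 'Solid Protocol', status: 'draft', version: '0.9.0' },
      testSuite:      { name: 'web-access-control-tests', version: '7.0.0' },
      submittedBy:    'maintainer',  // or 'third-party', so the claimant is on record
      results: [
        { test: 'read-public-resource', outcome: 'passed' },
        { test: 'append-only-access',   outcome: 'failed', note: 'disputed' }
      ]
    };

    // Publishing the report together with its inputs (server build, configuration,
    // test-suite commit) lets anyone re-run the same combination and check the claim.
    console.log(JSON.stringify(report, null, 2));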

Having said that, there is room for both maintainer-submitted reports and third-party reporting and/or verification (of maintainers' reports). To get to reliable third-party reporting, I'd argue that the bar for the quality of tests and reporting is very high, especially if the communication is intended to carry a high degree of authority on the matter. That itself requires multiple layers of checks: people writing the test software, people writing the tests, authors/editors verifying the accuracy and necessary precision of the tests, and a significant sample of reports that can themselves be verified. This is important because the community and the world at large will interpret what's authoritative for different reasons.

At the end of the day, specs need to help implementers and within good reason there shouldn't be doubt about the validity of the tests or the results. Hence, my concern about who makes the claim - after all, as you know, anyone can say anything about anything.

@michielbdejong
Collaborator

Yeah, that's why it has to be a collaboration between testers and implementers, so that we put all our knowledge and tools together, adding up to the best information we can compile as a community. And I think in most cases that's going pretty well.

We always ping the implementer in question, of course, and in case of doubt or disagreement we mark the test as 'disputed'. That ensures that the information we publish is at least as good as the information that each implementer would publish by themselves.

But if you feel there is room for improvement then maybe we want to call a panel meeting about this topic?

@michielbdejong
Collaborator

tests running in a (semi-)automated or manual mode

Yes please! Absolutely. And then we can also link those results, of course. NSS, PSS and Solid-Nextcloud are already doing this, and I think CSS and Inrupt-ESS are starting to run a subset.

What I'm pushing back on is the word "only" in this issue's title. We're not going to hold back information unless there are technical grounds to do so. We should of course publish the best information we have, annotate it with our best knowledge of the context (e.g. whether a given part of the spec is a draft, optional, etc.), and invite everybody to add their knowledge and skills in an open way.

@michielbdejong
Collaborator

This is something we will decide in the test suite panel.

@michielbdejong
Collaborator

Focusing on technical details is not just important because of our community's code of conduct, but also because of the issue we're discussing here:

What should be the process of publishing test reports? I think there are three main cases:

  • If implementers self-report results from CI and/or from manual testing, I think we can usually publish those in good faith, unless there is evidence of tampering.

  • If the test suite panel reports test results and the implementer has specific technical comments, then we discuss these, and the results are not published until the discussion leads to a conclusion.

  • If the test suite panel reports test results and the implementer makes broad negative statements about the panel's work, without going into any technical detail, what should happen? I think that's the trickiest one of the three.

@michielbdejong
Collaborator

Thanks for going into some more detail. IMHO the level of detail is often what makes the difference between a destructive comment and a constructive one.

The majority of implementations that pass the table are not compatible with existing apps.

-> created #133 about this.

Multiple implementations that do not pass the table are fully compatible with existing apps.

-> I personally would say that's not a reason to lower the bar, but if you feel we should change our reporting based on that then we'll listen, take that on as a suggestion, and discuss it with the panel. I created #134 about this.

In particular, CSS has implemented hundreds of commits implementing crucial app and spec compatibility behavior. Virtually none of these affect test score.

-> that's interesting, we might be able to dig through those to find pointers for new tests we can add. I created #135 about this.

@michielbdejong
Collaborator

@csarven The Doodle for the meeting about this issue is almost complete; you are the last person missing. Can you please fill in your availability? Thanks!

@kjetilk
Contributor

kjetilk commented Apr 25, 2022

Let me take a middle ground here.

First, note that no specification is at 1.0 yet. I think that is important because, while it hasn't been uncommon to publish results of conformance testing without the maintainer's approval (it can be thought of as a service to the community and as a statement of fact, and I think it is arguably a good practice), that has only happened after 1.0. In the current situation, what constitutes conformant behaviour is still somewhat in a state of flux.

Then, we also need to ensure that we have conformance tests that are thoroughly reviewed, that we have metadata to explain what each test refers to and to show that it is provably legitimized by the spec, and that there is metadata to show that some tests are not yet approved. Finally, the test harness should reflect that.
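As a rough illustration of that kind of metadata (a sketch only; the identifiers, spec anchor and status values are hypothetical, not something either test suite defines today):

    // Hypothetical per-test metadata: it links the test to the spec text that
    // legitimizes it and records whether the test itself has been reviewed.
    const testMetadata = {
      id: 'patch-document-must-contain-patch-resource',
      specReference: 'https://example.org/solid-protocol#n3-patch', // placeholder anchor
      requirementLevel: 'MUST',
      reviewStatus: 'proposed' // e.g. 'proposed' | 'approved' | 'disputed'
    };

    // A harness could then count only approved tests towards a score and report
    // the rest separately.
    const countsTowardsScore = testMetadata.reviewStatus === 'approved';
    console.log(countsTowardsScore ? 'counted in score' : 'reported separately');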

Before we have all that, we can't be sufficiently confident that a test failure is the fault of the server under test rather than of the test or the testing infrastructure, and thus it is inappropriate to publish results without the maintainer's consent. Once we have all that, my opinion is that it is quite appropriate to do so.

I would like to integrate test writing into the spec review process. I have wanted to do that for years: have text and test move together, and make it part of the Definition of Done of a feature. We haven't had the resources to do that, though. I suspect it would speed up the overall ecosystem if we had, since we'd be more confident about the impact of a certain feature.

I'd rather not argue much more over this subject, but I do feel that all test suites should be held to the same standard, at least if they are to be in the solid org.

@michielbdejong
Collaborator

Thanks! @csarven We're waiting for you to fill in the doodle so we can discuss this issue in the panel.

@edwardsph
Contributor

@kjetilk Thanks - I agree completely.

  • The specs are not sufficiently ready - we have a mix of early releases and drafts.
  • The test suites are not sufficiently ready - neither of them is yet made up of fully reviewed and accepted tests.
  • The implementations are having to work in this fluid environment and balance multiple requirements:
    • support developers who are trying to build applications that can work on any server.
    • create implementations that help us understand where the spec is good and where it needs more work.
    • create implementations that have commercial value and can be trusted sufficiently in those environments.
    • develop things which are likely to appear in future versions of the specs but are required now for reasons above.

At present I do not believe we have any server implementation that passes all MUST/MUST-NOT requirement tests in both test suites. I would argue that we cannot say with confidence whether this is because the server is wrong or the test suite is. We certainly need to investigate any failures and record what is discovered in each case. If we published results from all test suites, what would that really tell people? It would only undermine confidence in the specs, servers and test suites. It is just too early to publish results without causing harm.

@michielbdejong
Collaborator

Thanks for your input, all!
We'll discuss it in the panel.

@bourgeoa
Member

For more than 2 months, at least 21 out of 93 tests (23% of tests) for WAC have mandated that the server accept invalid RDF

@RubenVerborgh I'm sorry to discover that the RDF is invalid.
I need some more information on the RDF error to be able to correct it in the tests and/or on the NSS server.

@edwardsph
Contributor

I'm not sure the RDF used in the tests is actually invalid. This is because the fragments used in the tests don't actually contain any triples. The code

    "@prefix solid: <http://www.w3.org/ns/solid/terms#>." +
    "#patch a solid:InsertDeletePatch;" +
    "  solid:inserts { <#patch> <#to> <#create> .}.",

generates

@prefix solid: <http://www.w3.org/ns/solid/terms#>.#patch a solid:InsertDeletePatch;  solid:inserts { <#patch> <#to> <#create> .}.

So that line represents a prefix declaration followed by a comment from the second # onwards.

The tests all succeed at submitting an empty patch. This may confirm the access control aspects of the mechanism but only on the assumption that the server hasn't gone "Oh - you sent me nothing - ok that's fine, you can do that".

If the patch actually has an effect and inserts a triple then that would be a serious RDF parser error but I don't think that is proven as the tests don't actually confirm anything changed.
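For comparison, here is a guess at what the fragment was meant to express (a sketch only, not a change that has been made anywhere): each statement needs its own line, and the subject needs to be written as <#patch>, since a bare #patch at the start of a line would still be a comment.

    // Presumed intent: explicit newlines, and <#patch> as an IRI subject rather
    // than the start of a comment.
    const patchBody =
      '@prefix solid: <http://www.w3.org/ns/solid/terms#>.\n' +
      '<#patch> a solid:InsertDeletePatch;\n' +
      '  solid:inserts { <#patch> <#to> <#create> . }.\n';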

@edwardsph
Contributor

edwardsph commented May 3, 2022

I should add that solid:InsertDeletePatch appears in the spec but is missing from the vocab at http://www.w3.org/ns/solid/terms#: solid/vocab#66

@edwardsph
Contributor

On reading the spec more, I would expect these tests to fail as the spec says

A patch document MUST contain one or more patch resources.

I would therefore expect all these tests to get a 422 response from a server, unless I have misread something.
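If that reading is right, a check along these lines (a hypothetical sketch; the URL and headers are placeholders, and authentication is omitted) would be expected to come back with 422 rather than a 2xx:

    // Hypothetical check: send the prefix-plus-comment body (which yields no
    // patch resource) and expect the server to reject it as unprocessable.
    const res = await fetch('https://server.example/container/resource.ttl', {
      method: 'PATCH',
      headers: { 'Content-Type': 'text/n3' },
      body: '@prefix solid: <http://www.w3.org/ns/solid/terms#>.\n' +
            '#patch a solid:InsertDeletePatch;  solid:inserts { <#patch> <#to> <#create> .}.\n'
    });
    console.log(res.status); // 422 expected if a patch document must contain at least one patch resource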

@michielbdejong
Collaborator

Thanks for digging into it @edwardsph, I created solid-contrib/web-access-control-tests#47 about it.

@michielbdejong
Collaborator

Discussion continued in solid/test-suite-panel#5
