Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid attribute defaults being added #9

Open
PStellmann opened this issue Feb 15, 2016 · 13 comments
Open

Avoid attribute defaults being added #9

PStellmann opened this issue Feb 15, 2016 · 13 comments

Comments

@PStellmann
Copy link
Contributor

When using xsl:copy-of within sch:add or sch:replace the attribute defaults will be added as well. This happens for instance on the new sample merge-dita-sl I just commited.

Maybe there could be a special behavior of sch:copy-of that it will copy the original code, not the one with resolved attribute defaults!?

@octavianN
Copy link
Contributor

We cannot add an "sqf:copy-of" that will ignore the default attributes because the processing is done in XSL, and we cannot determine if an attribute is default or not in XSL.
I propose to have an "sqf:copy-of" that will specify also the attributes to be ignored. Maybe something like this:

<sqf:copy-of exceptAttrs="class domain"/>

And on the processing part we can generate a copy template that will ignore the specified attributes. Something like this:

<!-- Template used to copy the current node -->
<xsl:template match="node() | @*" mode="copyExceptClass">
    <xsl:copy copy-namespaces="no">
        <xsl:apply-templates select="node() | @*" mode="copyExceptClass"/>
    </xsl:copy>
</xsl:template>
<!-- Template used to skip the @class attribute from being copied -->
<xsl:template match="@class" mode="copyExceptClass"/>

@nkutsche
Copy link
Contributor

Hi there,

I don't think it is impossible to implement this, but I see much problems - so it won't be easy.
Maybe we should make an optional advice in the specification, so implementation could implement it in "XML save mode" (without default attributes, resolved entities, etc.) or in "XSLT mode"?

@octavianN: I like your idea, but don't think that's enough. Because you would also delete all attributes with a specified value. We need a structure, which excludes attributes with a specified value. For instance like this:

<sqf:copy-of except="@class[.='x'] | @domain[.='y']"/>

But I think it is (almost) impossible to implement both - the sqf:copy-of in "XML save mode" and the except attribute. So both optional?

Or another attribute like this:

<sqf:copy-of xsm="true"/>

The rule would be:

  1. The same sqf:copy-of element must not have an xsm and an except attribute.
  2. The support of the xsm attribute is optional for the implementations

@PStellmann
Copy link
Contributor Author

I think this feature is pretty important and I see it as a strength that SQF won't modify any content not effected by a fix. It surely should be possible to use the same concept to copy a part of it.

I don't have sufficient insight into the implementation but I could imagine the following solutions:

  1. Use a custom extension instruction or function to process the sqf:copy-of that accesses the original xml.
  2. Pass original xml as parameter to XSLT and wrap the sqf:copy-of in a for-each to set the context to the original code (something like select="*[contains(@Class, ' topic/title ')]" won't work then, which would be OK, i think).
  3. Pass original xml as parameter to XSLT. For all the nodes passed to sqf:copy-of generate the xpath from the root and apply toe same xpath to the original xml to get the same node without resolved entities and attribute default. (I did this in my transformation by loading the document twice - once with attribute defaults and again without)
  4. Replace the select attribute by something else that will allow you to implement it. I think the most important use-cases are: The complete node, all attributes and all child nodes.

@octavianN
Copy link
Contributor

In most cases you know the default attributes from your fix. So an "except" attribute will solve most of the problems. It can specify also the attributes values as in your example.

For the "XML save mode" I see two solutions (but I do not like them):

  1. Ignore the default attributes, but you will not be able to match them. For example in a DITA document you cannot create a rule like <sch:rule context="*[contains(@Class, ' topic/title ')]">, because the @Class attribute is in general a default attribute.
  2. Do not ignore the default attributes and just set the "sqf:copy-of" in a mode that will ignore them. In this case you will need to determine somehow the default attributes when "sqf:copy-of" is ecxecuted, but it will be more processing and you can have performance problems.

@PStellmann
Copy link
Contributor Author

I just committed an example/demonstration/experiment how you can copy content withouth attribute defaults and still use these attribute defaults for selection: https://github.com/schematron-quickfix/sqf/tree/master/samples/SqfUtil/sqfu-copy-of-test

Thus, it would be possible to have a work-around based on XSLT. I'd still prefer if such a behavior could be built into the sqf:copy-of since I'm sure it could be integrated there even easier.

I agree that this will be more processing but in my environment I can't imagine a situation where this will play a role - the quick fixes are always triggered by humans and it doesn't matter if he waits for 10 or 100ms.

@georgebina
Copy link
Contributor

I agree that it is technically possible to implement this - but I guess if you want to take into account all possible edge cases, then this implementation becomes difficult - I am referring here to cases when you use external entities, XInclude either entire files or only fragments using different XPointer shemes, cases when the DTD defaults a namespace declaration and thus you select content in the XHTML namespace if default attributes are enabled but then you copy content in no namespace if you remove the attribute defaults, etc.
I am therefore not sure if we want to but such a high entry barrier for implementations - the problem again is not to support easy cases, but the edge cases - when the same results may be achieved just by excluding a few attributes that the quick-fix developer should already be aware that they are defined as defaults.
Maybe if we decide to actually add this support we can think of making it optional, but then it may be difficult for users to know when they can use it or not...

Best Regards,
George

@PStellmann
Copy link
Contributor Author

I agree that this XSLT way is more a workaround than a solution - although it will most likely work for my personal use cases.

On the other hand, while it is easy to implement excuding some attributes it makes the user interface pretty uncomfortable and in my environment there are several edge cases that could hardly be handled (e.g. haveing the same attribute with different default for different elements).

However, keeping content unmodified appears to be a problem that you have already solved anyway: When you apply a quick fix e.g. by deleting a node the rest of the file won't be touched. So when you managed to identify a node to precisely remove it why is it so much more difficult to copy or keep it?

Maybe the problem is that a select allows much more than a match attribute. Thus, an idea might be to get back to the sqf:keep element but with a match attribute that can only select content within the current context.

@georgebina
Copy link
Contributor

I am not against this functionality but considering the implementation difficulties for edge cases, adding this feature to the language should be supported by a number of concrete use cases where just specifying some attributes to be excluded is not enough. From my experience DITA is probably one of the very few languages that is a little abusing the default attributes, but DITA cases are easily handled by specifying '@Class' and '@domains' to be excluded when copying content.

@PStellmann
Copy link
Contributor Author

Note that @Class and @donains are not the only attributes with default values. Another sample is @xml:space for the codeblock element. And there are few more (I just did a find-in-files in the schema). But more important (at least to me) are those attributes added or set to defaults by specialization. A simple sample would be to have @Audience been set to "internal" by default for a specialized section element that is meant to contain confidential data.

I've just committed a simple sample "handle-dita-attributes" with 4 different quick fixes to play around.

I'm planning to add additional samples:

  • for DocBook using xInclude - which also should not be resolved.
  • with some specialized DITA elements using more attribute defaults.

@nkutsche
Copy link
Contributor

nkutsche commented Mar 9, 2016

Hi all,

I'm sorry, but I'm a bit confused, what we are talking about. At the beginning I think there was four possible concepts for this issue:

  1. "ignore default attributes": sqf:copy ignores all default attributes (initial issue by @PStellmann)
  2. "exceptAttrs attribute": add an exceptAttrs attribute to sqf:copy to ignore all listed attributes (counterproposal of @octavianN)
  3. "XML save mode": sqf:copy takes the orignal code - including all (non significant) whitespace, non-attributes, entities, CDATA sections etc.
  4. "except attribute": add an except attribute to sqf:copy, which contains an XSLT pattern to exclude specific nodes from copying (both introduced by me)

Could you please specify which of those concepts you are prefering?

To the discussion:

@georgebina:

I agree that it is technically possible to implement this

Do you mean "ignore default attributes" or "XML save mode"?
After you listed all edge cases, I want to relativise my statement:

I don't think it is impossible to implement this, but I see much problems

I was referring to "XML save mode". But I don't think there is a technical solution to copy nodes with all entities from one document to another (connected by XInclude or external entities).

To the use cases:
For the XML save mode I see many use cases and at least the publishers will be very happy about the keeping of the entity references:

  • The following actions would be possible without resolving all entities
    • unwrap nodes
    • wrap nodes
    • move nodes
    • duplicate nodes
  • By copying of large elements (sections for instance) you may have a huge number of possible default attributes.

But for the "ignore default attributes only" concept: yeah agree with @georgebina - there is not enough use cases. That's solvable with an except attribute.

@PStellmann:

However, keeping content unmodified appears to be a problem that you have already solved anyway

  • At least in my implementation, this is not true. I don't resolve all that non-DOM-stuff, because I don't touch them. I just replace (delete) the parts of the documents, which are subject of the QuickFixes. The new or copied content will be generated by XSLT. Because, if I want to copy nodes, there is much more to respect, as @goergebina already described.

So my proposal is:

  1. Add an except attribute, because it is easy to implement and could be used for more use cases, than just avoid default attributes.
  2. We should investigate the XML save mode concept, if there is more traps, we should skip this or we could restrict this to the common cases.

Maybe if we decide to actually add this support we can think of making it optional, but then it may be difficult for users to know when they can use it or not...

I don't see a big problem in that. Because it is just a feature request. The implementation, which does not support the XML save mode, is still able to have an almost equal solution and wouldn't create corrupt files.

Best Regards
Nico

@PStellmann
Copy link
Contributor Author

Personally I don't expect me using the except attribute. But I agree that it can be handy and it will be easy to implement.

But I'm very interested in the "XML safe mode". I will explain a litle more how I think that it could be implemented:

  • Identifying the character sequence in the original document the represents a node has already been solved for replacing or deleting it.
  • Now to copy the node without any resolving or modification just copy the character sequence to the output with <xsl:value-of select="$characterSequence" disable-output-escaping="yes"/>.

There are probably still edge cases that result in broken xml but it would surly handle all of my own use-cases.

@octavianN
Copy link
Contributor

For this issue "Avoid attribute defaults being added", I prefer the except attribute solution.
But I like the idea to keep the entity references and XInclude not expanded when you make a copy and the new sqf:copy element can be a solution.

Therefore we can define a mode for the sqf:copy-of element, maybe @unparsed-mode="true", that will copy without expanding the entities and XInclude and without copying the default attributes.

@PStellmann
Copy link
Contributor Author

I've just uploaded another sample with various attribute defaults. It's probably a very special case in global scope - but it's the main use-case for me personally. Just so you can understand why this is such an issue for me:
https://github.com/schematron-quickfix/sqf/blob/master/samples/outsource-dita-topics/oursource-dita-topics-dita-semia.dita

However, I like the suggestion from Octavio to activate the unparsed mode by an attribute - just like @xml:space.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants