Support filling text fields & rendering corresponding annots accordingly #35
Comments
Let me clarify a little further.
from the PAdES specification. I understand that the usual flow is to sign the I would like to abuse the specification's lack of detail regarding the Thanks! |
Hi Frederico, thank you for your interest in this project! Let me first address your question about inserting text. Text processing in PDF is complicated, and there's no simple way to "just add some strings", unfortunately. Adding text to a PDF file involves making many choices, managing (potentially several) font resources, handling glyph positioning etc. PyHanko has the facilities to do (most) of that, but those APIs are pretty low-level and require some knowledge of the PDF spec to use. After all, pyHanko is not a general-purpose PDF manipulation library. But if you tell me what kind of thing you want to typeset, I might be able to help you along. As for the problem of signing an update only: you're right that ByteRange can theoretically span any range of bytes. But that's arguably a design error in the specification, and all decent validators will reject signatures where the ByteRange doesn't conform to expectations. In fact, in ISO 32000-2, messing with ByteRanges is explicitly banned in PAdES signatures, and discouraged in general. What's the problem that you're trying to solve here, if I may ask? There might be a more conventional way to accomplish what you want :) |
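There are no hard constraints for the typesetting. Currently, I am using reportlab's drawString with a single font for the entire thing, namely setFont("Times-Roman", 12).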
Could you please point me to where in the ISO that prohibition is? I only have access to a Portuguese translation of ISO 32000-1 at the moment, but should get my hands on a copy of ISO 32000-2 shortly.
We have a large (few megabytes) single-page template PDF file, which is digitally signed with pyHanko. Then, we would like to fill the blanks in this template and digitally sign the added phrases. We hope to store the byte-range with the strings and signature bytes for millions of documents and re-create the final PDF file (template with signature + byte-range with strings and signature bytes regarding the byte-range with strings) on-demand. Our hope with this approach is to store a few bytes or Kbytes (byte-range + signature) instead of a few Mbytes (signed template + signed byte-range) per document. |
Hi Frederico
(Disclaimer: I’m on mobile, so apologies for the curtness and/or formatting)
The relevant language in ISO 32000-2:2020 is in clause 12.8.3.4.2 and in
clause 12.8.1 (below NOTE 1). The PAdES provision is not part of ISO
32000-1 for sure, but maybe the PAdES spec has similar language (don’t
remember off the top of my head). Anyway, allowing partial ByteRanges has
security implications, so most validators don’t allow you to get away with
that, regardless of whether the specification would theoretically permit it
or not.
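As a rough illustration of what "conforming to expectations" tends to mean in practice, here is a minimal Python sketch of the basic shape check a validator might apply to a ByteRange. The function name and the exact set of conditions are a simplification made up for this example; they are not pyHanko's API or the wording of the standard. A real validator also checks that the gap lines up exactly with the signature's Contents entry.

```python
from typing import Sequence


def byte_range_covers_revision(byte_range: Sequence[int], revision_len: int) -> bool:
    """Check that a ByteRange signs a whole revision except for a single gap.

    byte_range is [start1, len1, start2, len2] as stored in the signature
    dictionary. revision_len is the length in bytes of the revision that the
    signature belongs to (the whole file for the most recent signature).
    """
    if len(byte_range) != 4:
        return False
    start1, len1, start2, len2 = byte_range
    if min(start1, len1, start2, len2) < 0:
        return False
    return (
        start1 == 0                          # signed data starts at byte 0
        and len1 > 0
        and start2 > len1                    # exactly one gap (for Contents)
        and start2 + len2 == revision_len    # signed data runs to the end
    )
```

A ByteRange that only spans the bytes of an update, as proposed above, fails the first condition, which is why such signatures are typically rejected.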
As for your actual use case: there are two things you can do:
- Sign the filled template using an incremental update in the usual way,
but discard the “common” part when writing to disk. When retrieving the
file later, you can simply concatenate the streams. This works because the
PDF update mechanism simply appends to the end of the base file, and it
won’t/shouldn’t break the signature.
- Given the above, do you still need the template portion to be signed
separately? Because if so, template filling operations using direct page
content modification will almost certainly invalidate the first signature.
Not for cryptographic reasons, but because good validators don’t allow just
any incremental update. This can be circumvented by turning the template
into a form, and making sure the first signature permits form filling
operations. There’s no generic form filling API in pyHanko, but it’s not
appreciably more complex than adding text content (since I don’t have a
high-level API for that either). In some ways, it’s maybe even simpler…
Let me know what you think.
On Wed, 15 Sep 2021 at 23:59, Frederico Schardong wrote:
There are no hard constraints for the typesetting. Currently, I am using reportlab's drawString with a single font for the entire thing, namely setFont("Times-Roman", 12).
|
I see.
I haven't thought about this. Would it just be a straightforward binary concatenation? Nonetheless, if we disregard the partial
My beef with forms is the aesthetics of the form fields. Could the look of filled form fields be changed so that they look like ordinary strings on the PDF? Moreover, how does a signature over form data work? From your explanation, I understand that only whatever is filled in is signed, i.e., not the entire PDF document. If I got it right, is there a considerable size difference between template filling using direct page content modification vs. form filling? Since our goal is to reduce storage, perhaps we can live with ugly form-filled PDFs if they require just a few bytes of storage. |
Yes. Incremental updates work by simply appending the updated objects + an updated xref table to the end of the base file.
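To make the storage trick concrete, here is a minimal Python sketch of splitting a signed file into the shared template and the per-document update, and gluing them back together later. The function names and the SHA-256 integrity check are assumptions for illustration only; nothing here is part of pyHanko.

```python
import hashlib


def extract_update(template: bytes, signed: bytes) -> bytes:
    """Return only the bytes that the incremental update appended."""
    if not signed.startswith(template):
        raise ValueError("signed file is not an incremental update of the template")
    return signed[len(template):]


def rebuild(template: bytes, update: bytes, expected_sha256: str) -> bytes:
    """Concatenate template and stored update, then verify the result."""
    full = template + update
    if hashlib.sha256(full).hexdigest() != expected_sha256:
        raise ValueError("rebuilt file does not match the stored digest")
    return full
```

The `startswith` check matters: if whatever produced the filled document rewrote the file instead of appending to it, the prefix no longer matches and the scheme does not apply. Storing a digest of the full file next to the update is optional, but it catches the case where the template on disk has changed since the update was written.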
I'm not 100% sure to what degree this is implementation dependent, but if the last signature disables form field editing, I think it should be fine. Reason being that form field widgets (like pretty much all annotations) have an appearance stream (mandatory in PDF 2.0!) that can contain arbitrary graphics/text, etc. Typically, a viewer would only need to regenerate a form field appearance when editing form fields, so while the viewer might highlight the form field content in some way, you can choose how you want to render the actual field contents.
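For a sense of what such an appearance stream looks like for a text field, here is a minimal Python sketch that builds the content stream bytes. The helper names, the font resource name `Helv`, and the default size and offsets are assumptions chosen for this example; the font also has to be declared in the appearance's resource dictionary, which is not shown, and only ASCII text is handled.

```python
def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_field_appearance(
    text: str,
    font_name: str = "Helv",   # hypothetical font resource name
    font_size: float = 12.0,
    x: float = 2.0,
    y: float = 4.0,
) -> bytes:
    """Build the content stream for a single-line text field appearance."""
    ops = [
        "/Tx BMC",                          # marked content for variable text
        "q",
        "BT",
        f"/{font_name} {font_size:g} Tf",   # select font and size
        "0 g",                              # black fill colour
        f"{x:g} {y:g} Td",                  # position inside the widget box
        f"({escape_pdf_string(text)}) Tj",  # show the text
        "ET",
        "Q",
        "EMC",
    ]
    # Raises UnicodeEncodeError for non-ASCII text; proper font and encoding
    # handling is exactly the hard part mentioned earlier in this thread.
    return "\n".join(ops).encode("ascii")
```

Since the stream is ordinary PDF graphics, nothing forces it to draw a border or background, so the filled value can look like plain text on the page.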
No. In a typical situation, the PDF processor that fills in the form updates the form field values using a regular incremental update (or a full save, depends on context) to override the actual form field objects. The digest for the new signature is computed over the entire file. Again, signing only parts of a PDF isn't really a thing, everything is done with incremental updates. Policing what is and isn't allowed in an incremental update to a signed document is the job of the signature validator. In fact, pyHanko implements a version of that too: see here.
Not really. It's on the order of a few KBs at most, with the form-based ones taking up a tiny bit more space because they need to deal with form field objects as well as the new rendered text. Anyway, you can use direct text editing to fill out your template in an incremental update if and only if the template itself isn't signed. Direct page content modification (even in an incremental update!) will cause validators (including Acrobat and pyHanko itself) to invalidate all earlier signatures during incremental update validation. Form field updates are typically permitted by default (unless the signature on the base file explicitly disallows them). Complicating matters further: there's no clear standard (yet) defining what is and is not allowed in a post-signing incremental update, but page content edits are rejected by pretty much all validators that do incremental update diffs. So, long story short:
In either case: both methods would use PDF incremental updates, so the trick of not writing the "common" template part to disk when creating the final signature works just fine either way. |
Dear Matthias, Thank you! I will have to use forms then. Since PyHanko does not support form filling, do you have any recommendations for a Python library for that? Actually, anything that runs on Linux and I could call on shell would do the job. |
That's a good question... I believe there are a couple of Python libs out there capable of doing form filling, but the problem is in finding one that supports writing incremental updates (because otherwise our little scheme would fall apart). Outside the Python space, you certainly have options (iText, PDFBox, ...), but I'm not aware of any "batteries included" solution in Python. Now, since signature fields are a special kind of form field, pyHanko does handle form fields at some basic level. Odds are that you can recycle some of that to handle text fields as well, but you'd still have to implement quite a few things yourself. Regardless, this is probably a good starting point. Sorry that I don't have a better answer :( Anyway, I'll add text field filling to my backlog. It's arguably a little out of scope, but it's probably not super hard to implement given what's there already (at least, I think I know what to do). I can't promise when I'll get to it, though :) |
Sorry for the late reply.
That would be amazing! I will be the first one to test it :-) |
Alright. It's on my TODO list, and I've reworded/reclassified this issue accordingly. Thanks! |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions! |
Nope, not stale, I still plan to work on this (actually we probably don't need the stale bot anymore now that we have a discussion forum... I should do something about that at some point). |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions! |
Hey,
Hopefully this is still in your backlog :-)
On Sat, 13 Aug 2022 at 12:24, stale[bot] wrote:
Closed #35 as completed.
--
Sincerely,
Frederico Schardong
|
It is, it's just that there's always something more urgent to work on ;). (Next time the stale bot complains, feel free to leave a comment to show you're still interested. That's basically the only reason why I have it active on enhancement requests. I should probably increase the timeouts, though...) |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions! |
I have a document with two parts on one page. The first part is signed when the car leaves the park: template + fuel level + odometer value. The second part is signed when the car arrives back: first part + new values. Could I do versioning like #35 (comment)? If I can't, I will put the "new values" inside the signature widget as a hack)) |
Hi @MatthiasValvekens, first of all, marvelous work on the pyHanko module! It has been a godsend for our project, which relies heavily on digital signatures. I am also interested if anything like a system described in the comment #35 We are trying to create a system where multiple signers can make some incremental changes to a PDF, like adding text or putting check marks, and then sign the file with their digital signature afterwards, via incremental updates. Overall similar to what DocuSign does, but as a part of our project and integrated with other functionality. We do have an option to just make all of the changes and put our own digital signature at the end, but we are still experimenting and exploring the possibilities of giving each signer his own digital signature. As I understand it, this can be done with PDF forms, but we are not using built-in PDF forms in favor of our own forms module that we use in the other parts of the project, so first we are trying to do this by updating the pages of the PDF directly, using pyHanko and PyPDF2.
I understand that this is not the intended use case for this module, but judging by how well Adobe handles cases 3 and, to some extent, 2, it "feels" like there might be a way to achieve this correctly. Maybe there is now a more straightforward way to achieve something similar to this? Or maybe @alex-eri has had some success in achieving the desired outcome? Edit: I understand how this works a bit better now after some experimentation and reading up: stamps are added as Form XObjects, and the border around the stamp can be removed. The only thing that remains is that the XObject overlays the whole page, changes the cursor, and prevents the text behind it from being selected. Is this the inherent behavior of Form XObjects or can this be changed somehow? |
Hi @unloder, Good points, let me clarify some things.
This is expected, and will trip any validator that checks incremental changes. The reason why is boring, but instructive: PDF's graphics model is a page description language at heart, and the link between PDF graphics operators and what a human sees is not always 100% clear. In the general case, it is practically impossible to distinguish between "additions" and any other type of change to direct page content. Clearly, nobody wants to accept arbitrary changes to page content, so because of the nature of the PDF graphics model, they end up having to reject all page content changes. As you note, using annotations (in particular, form fields) partially circumvents this. That's because annotations live "outside" the page content, so it's easier to treat them separately (this is not without its own set of risks, though: see https://pdf-insecurity.org/#attacks-on-pdf-certification-may-2021; and also https://itextpdf.com/blog/itext-news-technical-notes/attacks-pdf-certification-and-what-you-can-do-about-them). Signers can (to a degree) influence this behaviour by setting a DocMDP level. While we can discuss the merits of allowing/disallowing this kind of thing in a validator until the cows come home, it's a fact of life that this part of the PDF spec is extremely vague, and there's no reason to believe that that will change in the near future. Basically everyone in the industry agrees that the current situation is crappy, but there's no real consensus on how to fix things.... :) On the generating side of things: pyHanko currently only supports stamping directly on the page (which breaks prior signatures) or generating signature appearances. It's in principle straightforward to allow it to fill in text fields as well, but there are some sharp edges there (relating to font handling and the way PDF deals with "variable text"). This makes it a tough feature to land, since I really don't have the bandwidth anymore to do systematic tests with different viewers etc. 
to ensure that they all consume pyHanko's files as intended. Nonetheless, I put some experimental code on this branch: https://github.com/MatthiasValvekens/pyHanko/tree/feature/basic-form-filling. Feel free to give that Extending this to support things like checkboxes is possible but will involve more work (because checkboxes are a bit special in PDF forms). In principle, supporting annotations is also possible, but that will take significantly more time to land due to the sheer number of possible combinations. To this point:
This is not a property of form XObjects, no. Most of the observations you make are actually more about annotations vs. page content. If you include a form XObject directly on the page, it will behave (modulo transformations & resource handling) as if you'd directly injected the graphics stream into the page. Hope that helps. |
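Regarding the DocMDP level mentioned above, the following Python sketch models the three levels and which kinds of later change each one leaves room for. The names `DocMDPLevel`, `MIN_LEVEL` and `is_permitted` are made up for this illustration and are not pyHanko's API. Treating page content edits as never permitted reflects how validators behave according to this thread, not an explicit rule quoted from the specification.

```python
from enum import IntEnum
from typing import Dict, Optional


class DocMDPLevel(IntEnum):
    NO_CHANGES = 1   # any change invalidates the signature
    FILL_FORMS = 2   # form filling and signing are allowed
    ANNOTATE = 3     # level 2 plus annotation changes


# Minimum level at which each kind of change is tolerated; None means never.
MIN_LEVEL: Dict[str, Optional[DocMDPLevel]] = {
    "form_fill": DocMDPLevel.FILL_FORMS,
    "sign": DocMDPLevel.FILL_FORMS,
    "annotate": DocMDPLevel.ANNOTATE,
    "page_content": None,
}


def is_permitted(level: DocMDPLevel, change: str) -> bool:
    """Return True if a later incremental update may contain this change."""
    required = MIN_LEVEL[change]
    return required is not None and level >= required
```

Under this model, a workflow with several signers who each fill in fields and then sign needs every earlier signature to allow at least the form filling level, and the filled-in values have to live in form fields rather than in the page content.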
Hi @MatthiasValvekens, |
Hi Matthias,
I would like to kindly ask for your guidance once again. I have a similar objective as reported in #6. Let me explain.
I have a PDF whose entire content is signed with pyHanko. Now, I would like to: (i) add a few strings to the signed pdf by x/y coordinates; and (ii) sign only whatever was added after the first signature.
Regarding (i), after looking at the documentation and source code, there doesn't seem to be a readily available class/method for writing strings to the pdf. If that is indeed the case, what would be the straightforward way of doing so? What comes to mind is to mimic what is done in this method.
Regarding (ii) I am not sure where to start :-)