GoBD compliance #16
What I have tried so far is to update the provided S3 storage backend to meet the 2.1 architecture (papermerge/s3#1). This allows me to connect to an S3 bucket, and papermerge-core uploads V1 into that bucket, but no further versions are stored in the S3 bucket, e.g. V2 (with OCR).
I also enabled TLS/HTTPS and needed to adapt papermerge.js to connect via secure WebSocket (wss).
Thank you very much for taking your time to open this issue and the pull requests. I merged both of your merge requests. Thank you again for your contributions! Regarding GoBD compliance: a data retention feature will be implemented. However, it won't be part of papermerge-core; it will be provided as a plugin (Django app). There are two main reasons why data retention will be implemented as a plugin:
Papermerge-Core will contain only features wanted by all users (the vast majority of users); everything else will be provided as plugins. I will keep this ticket open for reference and GoBD links.
That sounds good! But I don't know if it is worth the effort to code a storage layer which is able to provide the data retention layer as required by the GoBD. I think putting the data retention into a separate plugin, with a frontend (to set the retention time) and a backend (to prohibit deletion of files under retention), is the way to go. But actually performing the task of archiving and storing the data under retention, i.e. preventing data loss, ensuring data integrity, preventing admin/root access and modifications, and keeping track of ANY modifications over the whole storage period, is far too complex for a simple plugin and should be done by the filesystem/storage layer. I think you have already developed part of the solution with the storage class under core.lib.storage (by the way, I like the concept). As far as I understand the storage class, it is the place to handle all file IO. So a retention plugin could provide an additional storage class, for example one backed by MinIO S3 with retention enabled. A folder with retention enabled would then use that storage class while other folders use the default one; alternatively, the default storage class could be set to a retention-enabled one. Is that the way to go? I just had a quick look at the source code from this perspective and found some other places where file IO is performed. Is there any plan to move this file IO into the storage class, or is there any blocker to doing so?
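To illustrate the plugin idea, here is a minimal sketch of what a retention-aware storage wrapper could look like. This is not the actual papermerge API: the method names (`upload`, `delete`) and the in-memory `DictBackend` are assumptions made for the example, and, as argued above, an application-level check like this cannot replace enforcement at the storage layer itself.

```python
import time


class RetentionStorage:
    """Sketch of a retention-aware storage wrapper (hypothetical API).

    Wraps any backend exposing `upload`/`delete` and refuses to delete
    a file before its retention period has expired.
    """

    def __init__(self, backend, retention_seconds):
        self._backend = backend
        self._retention = retention_seconds
        self._stored_at = {}  # path -> upload timestamp

    def upload(self, path, data):
        self._backend.upload(path, data)
        self._stored_at[path] = time.time()

    def delete(self, path):
        stored = self._stored_at.get(path)
        if stored is not None and time.time() - stored < self._retention:
            # File is still under retention: deletion is prohibited,
            # even for privileged callers of this Python API.
            raise PermissionError(
                f"{path} is under retention; deletion is prohibited"
            )
        self._backend.delete(path)
        self._stored_at.pop(path, None)


class DictBackend:
    """Toy in-memory backend, used only to demonstrate the wrapper."""

    def __init__(self):
        self.files = {}

    def upload(self, path, data):
        self.files[path] = data

    def delete(self, path):
        del self.files[path]
```

Note that a root user could still bypass this wrapper by touching the underlying filesystem directly, which is exactly why the comment above argues for WORM enforcement in the storage backend (e.g. MinIO retention) rather than in Python code alone.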
The goal of this issue is to provide a manual and a working copy of papermerge that is GoBD compliant.
The basic installation of papermerge already meets a lot of the requirements of the German GoBD.
The Bitkom e.V. published a guide, i.e. a checklist, for verifying whether a software and its surrounding process can be GoBD compliant.
In my humble opinion there is currently one major blocker, namely 2.2.1 c) requirement 17: the required delete lock (Keine Löschmöglichkeit vor dem Ende der Aufbewahrungsfrist, i.e. no possibility of deletion before the end of the retention period) also covers privileged access like root/admin. One way to implement such a mechanism is MinIO Retention, which turns your S3 bucket into a WORM (Write Once Read Many) storage backend. Cohasset Associates, Inc. has already assessed MinIO for deploying such an S3 storage in a SEC 17a-4(f), FINRA 4511(c), and CFTC 1.31(c)-(d) compliant manner. We can consider GoBD and SEC 17a-4(f), FINRA 4511(c), and CFTC 1.31(c)-(d) comparable, as both deal with the storage of tax-relevant data on digital devices.
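As a rough sketch of how MinIO retention could be set up for this purpose (the alias, endpoint, credentials, and bucket name below are placeholders, not part of any papermerge setup):

```shell
# Register the MinIO deployment with the mc client
# (endpoint and credentials are placeholders).
mc alias set myminio https://minio.example.com ACCESS_KEY SECRET_KEY

# Object locking can only be enabled at bucket creation time.
mc mb --with-lock myminio/papermerge-archive

# Default COMPLIANCE-mode retention: object versions cannot be deleted
# or overwritten before expiry, not even by root/admin. 10 years
# matches the GoBD retention period discussed above.
mc retention set --default COMPLIANCE "10y" myminio/papermerge-archive
```

COMPLIANCE mode, as opposed to GOVERNANCE mode, is the relevant choice here because it cannot be lifted by privileged users, which is exactly what requirement 17 demands.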
From version 2.1 onward, papermerge is moving towards a Kubernetes-ready architecture and uses RWX file storage for its data.
The RWX storage is also used to share data between the app and the worker nodes. This development, however, moved papermerge a bit further away from the S3 backend.
A further goal of this issue is to adapt papermerge to use an S3 WORM storage backend, storing only relevant data. This includes any intermediate steps in the processing from the original data to the processed data, but nothing more, as this data is kept for at least 10 years on that WORM drive.
To achieve this goal, we need to adapt not only the core but also other parts of the papermerge project. We should link all adaptations made for GoBD compliance to this issue, so that we can track the development.