This repository has been archived by the owner on Jun 21, 2024. It is now read-only.

improve export format #118

Open
1 of 5 tasks
felixboehm opened this issue Oct 14, 2019 · 2 comments

@felixboehm
Contributor

felixboehm commented Oct 14, 2019

Milestone: Improvements on "export format" https://github.com/owncloud/data_exporter/milestone/2

Open issues

Spec

Simplified Structure

.
└── einstein
    ├── files
    │   ├── Documents
    │   │   └── Example.odt
    │   ├── Photos
    │   │   ├── Paris.jpg
    │   │   ├── San\ Francisco.jpg
    │   │   └── Squirrel.jpg
    │   └── ownCloud\ Manual.pdf
    ├── files_trashbin
    │   └── todo @butonic
    ├── files_versions
    │   └── todo @butonic
    ├── files.jsonl
    ├── shares.jsonl
    └── user.json
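To make the target layout concrete, here is a minimal Python sketch that checks a per-user export directory against the simplified structure. The function name `missing_entries` and the fixed entry list are hypothetical; because files_trashbin and files_versions are still marked "todo", the list is an assumption about the final layout, not the spec.

```python
from pathlib import Path

# Top-level entries of a per-user export, taken from the "Simplified Structure"
# tree. files_trashbin and files_versions are still "todo" there, so this
# mapping is an assumption, not a finished spec.
EXPECTED_ENTRIES = {
    "files": "dir",
    "files_trashbin": "dir",
    "files_versions": "dir",
    "files.jsonl": "file",
    "shares.jsonl": "file",
    "user.json": "file",
}


def missing_entries(user_dir: Path) -> list[str]:
    """Return expected entries that are absent (or of the wrong kind), sorted."""
    missing = []
    for name, kind in EXPECTED_ENTRIES.items():
        entry = user_dir / name
        ok = entry.is_dir() if kind == "dir" else entry.is_file()
        if not ok:
            missing.append(name)
    return sorted(missing)
```

Keeping each data type in its own top-level entry is what lets a check like this stay a flat lookup instead of walking an ownCloud-shaped home folder.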

After development

@felixboehm felixboehm self-assigned this Oct 14, 2019
@felixboehm felixboehm pinned this issue Oct 14, 2019
IljaN added a commit that referenced this issue Nov 1, 2019
Before this PR, all data types (files, versions, trashbin...)
were stored inside a single files directory which closely followed ownCloud's
home-folder layout. This allowed for fast iteration but also coupled
the format to ownCloud, which in turn introduced some design quirks
(/files/files/) in files.jsonl (#111).

All data-type-specific folders are now stored in the root directory of
the export, which allows simpler mapping from metadata (#118).
This is also reflected in the architecture: the exporter traverses down from
the specific directories instead of from the home.

A special "root folder" was introduced in files.jsonl to further decouple things
from ownCloud and to be able to carry the ETag for the whole tree. Instead of
"/files", "/" is now the root in the export.

A Path class has been added to reduce path-merging boilerplate.
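The commit's actual Path class lives in the exporter's PHP code and its API is not shown here. As an illustration only, this hypothetical Python `ExportPath` helper shows the idea: joining export-relative segments under the new "/" root without doubled slashes, plus a hypothetical `from_legacy` function that maps an old "/files/..." entry onto that root.

```python
class ExportPath:
    """Hypothetical helper for export-relative paths rooted at "/"."""

    def __init__(self, *segments: str) -> None:
        # Strip slashes from each segment so joins never produce "//";
        # empty segments (e.g. "/") are dropped.
        parts = [s.strip("/") for s in segments if s.strip("/")]
        self._path = "/" + "/".join(parts)

    def join(self, *segments: str) -> "ExportPath":
        return ExportPath(self._path, *segments)

    def __str__(self) -> str:
        return self._path


def from_legacy(legacy: str) -> ExportPath:
    """Map an old-style '/files/...' path onto the new '/' root (hypothetical)."""
    trimmed = legacy.strip("/")
    if trimmed == "files":
        return ExportPath()
    if trimmed.startswith("files/"):
        trimmed = trimmed[len("files/"):]
    return ExportPath(trimmed)
```

For example, `str(ExportPath("/").join("Documents", "Example.odt"))` yields "/Documents/Example.odt".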
IljaN added a commit that referenced this issue Nov 7, 2019
IljaN added a commit that referenced this issue Nov 9, 2019
@IljaN
Member

IljaN commented Nov 10, 2019

Any pointers on how to model versions in the export in a platform-independent manner (oCIS)? @butonic @felixboehm

Possible avenues:

1. Add versions array to each entry in files.jsonl

This could bloat files.jsonl considerably, even though each version entry only carries the ETag, owner and timestamp.

{
  "type": "file",
  "path": "/Rop/versioned.txt",
  "eTag": "3a1f4a6ab721bd13ae9abe79088d5a69",
  "permissions": 27,
  "mtime": 1573372163,
  "versions": {
    "1573372157": {
      "etag": "83cbf4a6423c1bf846650f50c987b135",
      "owner": "admin",
      "timestamp": 1573372157
    },
    "1573372158": {
      "etag": "befa0fe4cb4f672d9db9ca532059069d",
      "owner": "admin",
      "timestamp": 1573372158
    },
    "1573372159": {
      "etag": "f6c61a371083277bb3fe5583444da1f7",
      "owner": "admin",
      "timestamp": 1573372159
    },
    "1573372161": {
      "etag": "258067b818ff1633cec4fe6b244e4319",
      "owner": "admin",
      "timestamp": 1573372161
    },
    "1573372163": {
      "etag": "ba5d239ac8b84cb092ff5a0bd1ea9f3a",
      "owner": "admin", 
      "timestamp": 1573372163
    }
  }
}
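JSON Lines expects one object per line, so in the real files.jsonl the entry above would sit on a single line. The minimal Python sketch below (the name `versions_by_time` is hypothetical) reads such a line and returns its versions oldest first. It also illustrates the bloat concern: any consumer of files.jsonl has to parse the embedded version data even when it only needs the file entry.

```python
import json


def versions_by_time(jsonl_line: str) -> list[tuple[int, str]]:
    """Return (timestamp, etag) pairs for one files.jsonl entry, oldest first.

    Assumes option 1: a "versions" object keyed by version id, each value
    carrying "etag", "owner" and "timestamp". No versions -> empty list.
    """
    entry = json.loads(jsonl_line)
    versions = entry.get("versions", {})
    return sorted((v["timestamp"], v["etag"]) for v in versions.values())
```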

The storage path and some other fields are omitted because I assume they are an implementation detail: the target system should know by itself where to put its versions. Please correct me if this assumption is wrong, but following this train of thought we could simplify even further:

{
  "type": "file",
  "path": "/Rop/versioned.txt",
  "eTag": "3a1f4a6ab721bd13ae9abe79088d5a69",
  "permissions": 27,
  "mtime": 1573372163,
  "versions": [
    "1573372157",
    "1573372158",
    "1573372159",
    "1573372161"
  ]
}
 

Because the files_versions directory mirrors the user directory, with every file suffixed with .v$VERSION, the importer can recreate everything from the version string and the path of the file.
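A minimal Python sketch of that reconstruction, assuming paths are relative to the user's export directory and the ".v$VERSION" naming described above. Both function names are hypothetical.

```python
import posixpath


def version_storage_path(file_path: str, version: str) -> str:
    """Location of one version inside the user's export directory.

    Assumes files_versions mirrors the file tree and each version is stored
    as <original name>.v<version>.
    """
    relative = file_path.lstrip("/")  # "/Rop/versioned.txt" -> "Rop/versioned.txt"
    return posixpath.join("files_versions", f"{relative}.v{version}")


def version_paths(entry: dict) -> list[str]:
    """Expand the simplified "versions" list of a files.jsonl entry into paths."""
    return [version_storage_path(entry["path"], v) for v in entry.get("versions", [])]
```

This only works because the naming convention is fixed; a target system that stores versions differently would need its own mapping, which is the limitation raised next.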

The downside of this approach is that the ETag, mtime, etc. of each version are lost (relevant for future clients with version sync!). It also won't work for systems that organize versions differently.

2. Don't link files to versions (like in ownCloud)

  • Create a separate versions.jsonl that contains the filecache entry of each version file plus some additional fields with version metadata.

  • Import version files separately in a second pass, maybe even online after the maintenance window is over?
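A rough Python sketch of the two-pass idea, assuming versions.jsonl records carry at least a "path" field matching files.jsonl. The record fields, the `import_user` name and the two callbacks are hypothetical; the thread does not define them.

```python
import json
from typing import Callable, Iterable


def import_user(
    files_lines: Iterable[str],
    versions_lines: Iterable[str],
    import_file: Callable[[dict], None],
    import_version: Callable[[dict], None],
) -> None:
    """Two-pass import: all files from files.jsonl, then versions.jsonl."""
    imported = set()
    # Pass 1: files.jsonl only; no version data has to be read here.
    for line in files_lines:
        if line.strip():
            entry = json.loads(line)
            import_file(entry)
            imported.add(entry["path"])
    # Pass 2: versions.jsonl. A real importer might run this later (e.g.
    # online), which would require persisting the set of imported paths.
    for line in versions_lines:
        if line.strip():
            record = json.loads(line)
            # Skip versions whose file was not imported in pass 1.
            if record["path"] in imported:
                import_version(record)
```

Keeping versions out of files.jsonl means pass 1 never touches version metadata, which is the main argument for a separate file.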

@butonic Will this even be possible in oCIS, i.e. iterating over the file metadata and modifying it retroactively, knowing only the path and maybe the storage?

Any thoughts?

IljaN added a commit that referenced this issue Nov 12, 2019
@butonic butonic self-assigned this Nov 13, 2019
@butonic
Member

butonic commented Nov 14, 2019

I'd go with a separate file. Then an import does not need to read all version information, and AFAICT we will need to bypass the CS3 API to add versions anyway. Same for the trashbin.

@butonic butonic assigned IljaN and unassigned butonic Nov 19, 2019
@mmattel mmattel unpinned this issue Oct 28, 2020

3 participants