Merge pull request #2 from appsembler/jazzar/working-commands
Add core functionality management commands
iamjazzar authored Aug 4, 2021
2 parents 82c3f03 + 665d6fd commit 88b205f
Showing 27 changed files with 2,250 additions and 20 deletions.
32 changes: 32 additions & 0 deletions .github/workflows/python-publish.yml
@@ -0,0 +1,32 @@
# This workflow will upload a Python Package using Twine when a release is created
# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries

name: Build and upload Python package

on:
  release:
    types: [published]
  pull_request:

jobs:
  deploy:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v2
    - name: Set up Python
      uses: actions/setup-python@v2
      with:
        python-version: '3.x'
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install build
    - name: Build package
      run: python -m build
    - name: 'Publish package'
      uses: pypa/gh-action-pypi-publish@27b31702a0e7fc50959f5ad993c78deac1bdfc29
      with:
        user: __token__
        password: ${{ secrets.PYPI_API_TOKEN }}
      if: ${{ startsWith(github.ref, 'refs/tags') }}
@@ -35,9 +35,11 @@ jobs:
     - name: Install dependencies
       run: |
         python -m pip install --upgrade pip
-        pip install flake8 tox
+        pip install flake8 tox semantic-version
     - name: Display Python version
       run: python -c "import sys; print(sys.version)"
+    - name: Test library version
+      run: python -c "from gestore import __version__; from semantic_version import Version; Version(__version__)"
     - name: Lint with flake8
       run: flake8 gestore demoapp --statistics
     - name: Test with tox
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,3 +1,6 @@
# Library folders
exports/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
192 changes: 190 additions & 2 deletions README.md
@@ -1,2 +1,190 @@
# gestore
Django object management
<p align="center">
<a href="https://www.appsembler.com/">
<img width="500" alt="Gestore Django object manager" src="https://user-images.githubusercontent.com/11036472/123709664-47cc6f80-d822-11eb-9a97-4f87ba3ca64d.png">
</a>
</p>
<p align="center">
<br>
<br>
"gestore" means "manager" in Italian.
<br>
<br>
<a href="https://github.com/appsembler/gestore/issues">Report bug</a>
·
<a href="mailto:[email protected]">Report security issues</a>
·
<a href="https://www.appsembler.com/">Appsembler</a>
·
<a href="https://www.appsembler.com/blog/">Blog</a>
</p>


## Gestore

A set of tools that will help you to:
- Export individual objects from the DB.
- Import exported objects back.
- Delete an object from the database, along with all other objects related to it.

## Table of Contents
1. [Why using this tool](#why-using-this-tool)
1. [Gestore vs Django dumpdata and loaddata](#gestore-vs-django-dumpdata-and-loaddata)
1. [Get started](#get-started)
1. [How does it work](#how-does-it-work)
1. [Export functionality](#export-functionality)
1. [Import functionality](#import-functionality)
1. [Delete functionality](#delete-functionality)
1. [Demo app](#demo-app)
1. [Releasing](#releasing)
1. [Challenges](#challenges)

> **Note**
>
> Object Import and Delete are not production-ready yet. Use with caution.
## Why using this tool
This idea came out of Appsembler Multi-Cluster UI/UX Workflows. This tool is handy for supporting multiple clusters.
Other reasons why robust Export/Import/Delete functionality would benefit your app:
- Frees your site from data you are not using. You can export such data to a file system and import it later.
- Decreases the overhead for data tools.
- Removes old data to keep your costs down and improve the performance of aggregation functions (e.g., data pipelines).
- Deletes obsolete objects as customers churn.
- Makes data export possible, which is beneficial for GDPR reasons.
- Serves customers who want their data now for DR (disaster recovery) reasons, not because they're churning.
- Lets you create a separate cluster for data that already exists on a current one.
- Lowers the risk of objects (e.g., trial users) being able to crack your site isolation and access data from paying customers.

### Gestore vs Django dumpdata and loaddata
While the functionality might seem the same, these Gestore commands are entirely different from Django commands.

You can use Django's `dumpdata` command to back up (export) your models or your whole database, and the `loaddata` command to import these objects back.

Gestore's `exportobjects` command, on the other hand, exports only the data across the database that is related to a given object. This ensures that you can import the exported objects successfully later.

### Example
Let's assume you want to export a specific `Books` object:

<img width="965" alt="Screen Shot 2021-06-28 at 11 16 15 AM" src="https://user-images.githubusercontent.com/11036472/123684405-43905a00-d802-11eb-862b-abc0392b4bf6.png">

In Django, you have to export all `Books` objects using `dumpdata` and load them back later using `loaddata`. This method is not practical in two situations:
- You only want to export one object, not the whole table.
- Importing that object back might cause problems because the related `Authors` object does not exist in the export file.

Gestore helps you overcome these issues altogether. When you provide it with your object ID, Gestore scans all objects related to it, so the object works as expected when you import it back.

## Get started

Start by installing the package with pip:
```shell
pip install gestore
```

To be able to access the management commands, add `gestore` to your installed apps:

```python
INSTALLED_APPS = [
    ...
    'gestore',
]
```

Now your project should be ready to use gestore to manage objects.

## How does it work
This tool uses breadth-first search (BFS) to explore all objects in the DB that are reachable from the given one. In the following GIF, assume you want to export object number 3; gestore will fetch its data and then process all the objects it's connected to.
![Breadth first search animation](https://media.giphy.com/media/v6P6CSXDAthrRA4ZHi/giphy.gif)
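
The traversal can be sketched in plain Python. This is only an illustration, not gestore's actual implementation: the `collect_related` function and the `relations` mapping are hypothetical stand-ins for the foreign-key, many-to-many, and one-to-one lookups that would be made against the database.

```python
from collections import deque


def collect_related(start, relations):
    """Return every object reachable from `start`, in BFS order.

    `relations` is a hypothetical mapping from an object key to the keys
    of the objects it links to; it stands in for real database lookups.
    """
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in relations.get(current, []):
            if neighbour not in seen:  # skip objects already scheduled
                seen.add(neighbour)
                visited.append(neighbour)
                queue.append(neighbour)
    return visited


# Toy graph: object 3 links to 1 and 4; 4 links to 5 and back to 3.
relations = {3: [1, 4], 4: [5, 3], 1: [], 5: []}
order = collect_related(3, relations)
```

The `seen` set is what keeps circular relations (here, 4 pointing back to 3) from being processed twice.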

### Export functionality
Once triggered, this command exports all data related to an object. For every object being processed, we get its data, including the keys of linked objects (Foreign, ManyToMany, OneToOne), and continue until we hit a base model that's not connected to any other model (leaf node).

We use a BFS technique to scrape data element by element from the database until we reach a node without any relations. For each processed object, we store its data and its children's data.

> The output of `exportobjects` can be used as input for `importobjects`.
#### Command Usage

```shell
python manage.py exportobjects [-d] [-o OUTPUT] objects
```
`objects` is a list of objects to be exported. Each of these arguments must match the following syntax: `<app_id>.<Model>.<object_id>`

##### Example
```shell
python manage.py exportobjects auth.User.10 demoapp.Book.4 -o /path/to/exp.json
```
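
The `<app_id>.<Model>.<object_id>` syntax can be split into its parts as sketched below. The `parse_object_spec` helper is hypothetical and only illustrates the format; how gestore parses its arguments internally is not shown here.

```python
def parse_object_spec(spec):
    """Split '<app_id>.<Model>.<object_id>' into its three parts."""
    parts = spec.split('.')
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            'Expected <app_id>.<Model>.<object_id>, got {!r}'.format(spec)
        )
    app_id, model_name, object_id = parts
    # The ID is kept as a string; converting it is left to the caller.
    return app_id, model_name, object_id


parsed = parse_object_spec('auth.User.10')
```

In a Django project, the model class could then be looked up with `django.apps.apps.get_model(app_id, model_name)`; whether gestore does it this way is an assumption.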

#### Command Arguments

- `objects` is the main argument of `exportobjects`. Its format is described above.
- `--debug` is an optional flag. Use it to prevent any file writing. It is helpful for seeing the JSON output of the data before writing it to your system.
- `--output` is an optional argument that takes the path where you want to store the export file.
- `--bucket` If provided, the objects are exported to a GCP bucket at the path provided above (or the auto-generated one). This requires the bucket settings to be configured.
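
As a sketch of the bucket configuration, the demo project in this repository reads the following settings from environment variables in `config/settings.py`. Which of them are strictly required by the commands is not stated here, so treat this as an example rather than a complete reference.

```python
import os

# Bucket upload settings, read from the environment.
GESTORE_BUCKET_NAME = os.environ.get('GESTORE_BUCKET_NAME')
GESTORE_PROJECT_NAME = os.environ.get('GESTORE_PROJECT_NAME')
# Falls back to the standard Google credentials variable when unset.
GESTORE_CREDENTIALS = os.environ.get(
    'GESTORE_CREDENTIALS',
    os.environ.get('GOOGLE_APPLICATION_CREDENTIALS'),
)
```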


### Import functionality

Importing objects leverages Django's `django.core.serializers.python.Deserializer`. When you load a JSON-formatted object into a model, Django checks the target table for that object ID and then determines whether to perform an update or an insert on that table.

#### Command Usage

```shell
python manage.py importobjects [-d] [-o] path
```

##### Example
```shell
python manage.py importobjects /path/to/exp.json
```

#### Command Arguments

- `path` is the main argument of `importobjects`. It should point to an export file on your local system.
- `--debug` performs a dry run. It will not commit or save any changes to the DB.
- `--override` DANGEROUS. In case of a conflict, this will override objects in the DB with the ones being imported.
- `--bucket` If provided, the objects are imported from the given path in a GCP bucket. This requires the bucket settings to be configured.

#### Main issues here
Let's say I have two objects with the same ID. Both of these objects might have the same schema or might be completely different. How can we perform a safe import without sacrificing the current data and without duplicating all objects?
In other words, we have primary key collisions on import and need a strategy to prevent these collisions.

As this app is still under development, we currently offer two ways to solve this:
- **Manual editing**: We'll collect all conflicts before committing changes, then we notify the developer about them. The developer will go to the export file, check these objects, compare them with the ones in the database, and modify the import file with the desired values. Once satisfied, they can use the import command again.
- **Force replacement**: Using the `--override` flag allows the command to replace all conflicting objects in the DB with the ones being imported. This is a very DANGEROUS approach and should never be considered in a production environment.
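
The manual-editing flow depends on finding every conflict before anything is written. Below is a minimal sketch of such a check, assuming export records shaped like Django's serialized objects (`model`, `pk`, `fields`). The `find_conflicts` helper and the in-memory `existing` mapping are hypothetical stand-ins for real database lookups, and the `title` field is made up for the example.

```python
def find_conflicts(records, existing):
    """Return the records whose (model, pk) already exists in `existing`
    with different field values."""
    conflicts = []
    for record in records:
        key = (record['model'], record['pk'])
        if key in existing and existing[key] != record['fields']:
            conflicts.append(record)
    return conflicts


# `existing` plays the role of the current database content.
existing = {('demoapp.book', 4): {'title': 'Old title'}}
records = [
    {'model': 'demoapp.book', 'pk': 4, 'fields': {'title': 'New title'}},
    {'model': 'demoapp.book', 'pk': 5, 'fields': {'title': 'Another'}},
]
conflicts = find_conflicts(records, existing)
```

Only the record with `pk` 4 is reported, because it collides with a stored object that has different data; the record with `pk` 5 is new and can be inserted safely.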


#### Ways we are looking into:
- Using **UUID**s in our system: It's the industry-standard solution for making database IDs unique in distributed systems.
- **Changing conflicting objects IDs**: This is a good solution to avoid all conflicts. We set an offset value (or auto increment) and add it to the ID of the new object being inserted in the database. Instead of `ID=1` we end up with `ID=9001`. This approach works well when conflicts have already been resolved, but it might create duplicate data when they have not.
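
The offset idea can be sketched as follows. This is an illustration under stated assumptions, not a planned implementation: the `offset_ids` helper, the `fk_fields` mapping, and the `author` field are hypothetical, and the sketch assumes integer primary keys and that every referenced object is part of the same export, so that all of them are shifted together.

```python
def offset_ids(records, offset, fk_fields):
    """Return copies of `records` with primary keys and the listed
    foreign-key fields shifted by `offset`."""
    shifted = []
    for record in records:
        fields = dict(record['fields'])  # copy, so the input stays intact
        for name in fk_fields.get(record['model'], []):
            if fields.get(name) is not None:
                fields[name] += offset
        shifted.append({
            'model': record['model'],
            'pk': record['pk'] + offset,
            'fields': fields,
        })
    return shifted


records = [
    {'model': 'demoapp.author', 'pk': 2, 'fields': {'name': 'A'}},
    {'model': 'demoapp.book', 'pk': 1, 'fields': {'title': 'T', 'author': 2}},
]
shifted = offset_ids(records, 9000, {'demoapp.book': ['author']})
```

Foreign keys have to be shifted together with the primary keys they point to; otherwise the imported book would reference whichever author already holds the old ID in the target database.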


### Delete functionality
#### Not implemented yet.

### Demo app
This app is created for the sole purpose of testing and visualizing the manager commands in action. No other functionality is expected out of this app.

## Releasing
We publish new releases using GitHub Actions. Follow these steps to publish a new release:
- Create a PR to bump the version and get it merged. The version is stored in the [gestore/__init__.py](https://github.com/appsembler/gestore/blob/master/gestore/__init__.py) file, and both the commands and the [setup.py](https://github.com/appsembler/gestore/blob/master/setup.py) file read it from there.
- Once the PR is merged, make a new release out of master using the [Draft New Release button](https://docs.github.com/en/github/administering-a-repository/releasing-projects-on-github/managing-releases-in-a-repository).
  - Mark dev releases as Pre-release so it's clear on GitHub and PyPI.
- In a minute or so, the release will be published to [PyPI](https://pypi.org/project/gestore/).

### Debugging failed releases
- Go to the GitHub actions tab and select [Build and upload Python package](https://github.com/appsembler/gestore/actions/workflows/python-publish.yml).
- Click on it to see build logs.

### Dev releases
Until we feel this is production-ready, we will only push releases that contain `dev` in their version.

## Challenges
- **Platform state**: When exporting data from your project, it's assumed that importing it back will take place in the same project with the same data structures. If you upgrade a library whose models you use, and these models have changed (fields removed, added, or changed in type), you will face some problems.
- **Object conflicts**
  - Some data, like _usernames_, are unique cluster-wide; if we're importing such data from another cluster, some could be duplicated or overridden.
  - Some exported objects might have the same ID as a different object in the database. This tool will flag these objects for you so you know what to change and what to override.
- **Using Buckets**: At the moment, we only support GCP Cloud Storage, and we use `gsutil` to perform this operation for us. I know this sounds stupid, but it was our only option, since `google-cloud-storage` doesn't support Python 3.5, which is something we have to support at the moment.
## Reporting Security Issues
Please do not report security issues in public. Please email us
at [email protected].
9 changes: 9 additions & 0 deletions config/settings.py
@@ -126,3 +126,12 @@
# https://docs.djangoproject.com/en/3.2/ref/settings/#default-auto-field

DEFAULT_AUTO_FIELD = 'django.db.models.BigAutoField'


# Bucket upload settings
GESTORE_BUCKET_NAME = os.environ.get('GESTORE_BUCKET_NAME')
GESTORE_PROJECT_NAME = os.environ.get('GESTORE_PROJECT_NAME')
GESTORE_CREDENTIALS = os.environ.get(
    "GESTORE_CREDENTIALS",
    os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
)
2 changes: 1 addition & 1 deletion config/urls.py
@@ -21,7 +21,7 @@


 urlpatterns = [
-    path('', RedirectView.as_view(url='/catalog/', permanent=True)),
+    path('', RedirectView.as_view(url='/catalog/')),
     path('accounts/', include('django.contrib.auth.urls')),
     path('catalog/', include('demoapp.urls')),

3 changes: 1 addition & 2 deletions gestore/__init__.py
@@ -1,8 +1,7 @@
 """
 gestore initialization module
 """
-from semantic_version import Version

 # Increase this version by 1 after every backward-incompatible
 # change in the exported data format
-__version__ = str(Version('0.1.0-dev0'))
+__version__ = '0.1.0-dev3'
24 changes: 24 additions & 0 deletions gestore/encoders.py
@@ -0,0 +1,24 @@
from django.core.serializers.json import DjangoJSONEncoder
from django.db.models.fields.files import ImageFieldFile


class GestoreEncoder(DjangoJSONEncoder):
    """
    A custom encoder that allows us to serialize unserializable fields
    like `ImageFieldFile` and `Country` objects.
    For each field you are trying to encode, make sure the return value is
    appropriate to be imported back again.
    """
    def default(self, o, *args, **kwargs):
        if isinstance(o, ImageFieldFile):
            return o.name

        try:
            from django_countries.fields import Country
            if isinstance(o, Country):
                return o.code
        except ImportError:
            pass

        return super(GestoreEncoder, self).default(o)