Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Match drush sql-sanitize? #1

Open
anotherjames opened this issue Jan 9, 2019 · 31 comments
Open

Match drush sql-sanitize? #1

anotherjames opened this issue Jan 9, 2019 · 31 comments
Labels
enhancement New feature or request

Comments

@anotherjames
Copy link
Collaborator

This looks like a handy project, thank you! Much easier to deal with than directly setting up expressions etc to pass to the mysqldump replacement.

There are some more common sanitizations done by drush sql-sanitize, such as clearing up fields on users, contact form submissions and even webform.

Is it possible for this project to include those? It would mean dynamically picking up what fields / modules exist, but I would guess that's achievable as there's a connection to the source database, so all those things can be investigated.

@robiningelbrecht
Copy link
Owner

The YML file allows us to configure what tables/columns should be sanitized and how this should be done. So I think you can just edit the config file, no?

@anotherjames
Copy link
Collaborator Author

That can be a pretty hefty manual process though, whereas I think it could be done dynamically. e.g. Pick out all text fields on users, form submissions, and sanitize them with some sensible default.

Many of those fields might want manually customising on a given project to use a particular anonymization method, but it would be wonderful if this module could be just used on any project to at least anonymize as much as is possible, maybe?

This is probably something I could put a PR together for when I have some time. I mostly wanted to discuss is first to see if you'd be up for something like that.

@robiningelbrecht
Copy link
Owner

Including webform submissions into the default config is a good idea!

I'm not so keen on including clearing custom user fields automatically/dynamically. That's the whole purpose of the yml config file :). For example, if you have 10 custom user fields that have to be cleared, you just have to add 10 config options to the yml file... Don't think this is a hefty process?

If you push a PR for clearing webform submission, can you make it configurable in the YML file?

@anotherjames
Copy link
Collaborator Author

Automatic sanitisation is most useful when it's not just the 10 user fields, but also those on form submissions, commerce orders, etc etc... and across multiple projects, they can all build up quickly. Rather than having to customise for each and every project, it would be wonderful to be able to just install a module like this on each of them (which is particularly easy if projects are started from a template, like I often would) and have sanitization taken care of for me.

Thinking about it, maybe a better approach would be to have a separate admin form that 'suggests' sensible sanitization defaults (each initially disabled, but perhaps with a 'select all' tickbox to enable them all easily), that when submitted, would update the settings config with all those fields.

Or even a hook that allows other modules to monkey with the settings just after they're loaded from config. So for example, I could have a second module that does add all user fields dynamically - that could solve my problem without having to add too much to your initial module?

@robiningelbrecht
Copy link
Owner

I like the idea of providing a form which checks available modules (commerce, webform,...) and allows them to be checked/unchecked.

In addition we could dispatch an alter event to allow devs to change the config on runtime?

@anotherjames
Copy link
Collaborator Author

Yes, all agreed :-) I'll keep this in mind, with any luck, I'll be able to contribute something this year, but don't let that stop you or anyone else implementing these things!

@robiningelbrecht
Copy link
Owner

I'll add the dispatch event. This isn't development heavy ;)

@robiningelbrecht robiningelbrecht added the enhancement New feature or request label Jan 9, 2019
@robiningelbrecht
Copy link
Owner

#2. Still have to test it though :)

@robiningelbrecht
Copy link
Owner

@anotherjames any idea how you would implement the UI? Should we make the current config also editable through the UI? Or only a simple form which allows us to check/uncheck modules. The rest of the config should be changed/added in the YML file?

@anotherjames
Copy link
Collaborator Author

The https://www.drupal.org/project/gdpr project's dump sub-module has a UI for selecting anonymization plugins for columns on tables, which might be a start for a UI for configuring the replacements & formatters. IIRC, it defaults to certain sections being shown which are expected to want configuring, e.g. user & contact/webform fields.

GDPR dump admin UI

(It's not necessarily the best UI, but is a start - I got involved in improvements getting it to this point, so I hope it's true that it's better than it was once!)

I feel like it would be ideal for the entire config settings to be configurable via the UI, but that doesn't have to be case. You could even hide the majority of the complexity under a 'wizard' / basic template settings page (with a simple checkbox for each module / set of potential things to sanitize?), with the rest under an 'advanced' page (which might look like that gdpr-dump UI), if wanted.

I'm not aware enough of what the 'gdpr_expressions' and 'drivers' parts of the settings are to be able to comment on those though.

@robiningelbrecht
Copy link
Owner

I think this is a start, but I wouldn't add all DB tables on initial page load.
I think it is better to provide a "Add table" ajax callback button, no?

@anotherjames
Copy link
Collaborator Author

Oh - a subtle thing in that screenshot - spot the 'Empty this table?' checkbox!

Is it possible to avoid exporting any data for a whole table with this project / machbarmacher/gdpr-dump ? I would use that a lot for tables containing personal data that I don't really even need in an anonymized state. I realise those can be specified with the table list parameters passed to drush sql-dump, but ideally I'd love those to be held in the settings too (and also then configurable in any UI).

@anotherjames
Copy link
Collaborator Author

I think this is a start, but I wouldn't add all DB tables on initial page load.
I think it is better to provide a "Add table" ajax callback button, no?

Yes, definitely :-)

@robiningelbrecht
Copy link
Owner

Oh - a subtle thing in that screenshot - spot the 'Empty this table?' checkbox!

Is it possible to avoid exporting any data for a whole table with this project / machbarmacher/gdpr-dump ? I would use that a lot for tables containing personal data that I don't really even need in an anonymized state. I realise those can be specified with the table list parameters passed to drush sql-dump, but ideally I'd love those to be held in the settings too (and also then configurable in any UI).

I don't think it is possible with machbarmacher/gdpr-dump. But we could achieve this with the --tables-list option provided by the drush command:

  • Add all tables to the --tables-list option
  • When "Empty table" is checked, don't add table

@robiningelbrecht
Copy link
Owner

Oh btw, my English can be a little bit rough, so if you spot any mistakes in the README, feel free to adjust them ;)

@anotherjames
Copy link
Collaborator Author

Yes, that would be great :-)

I've just read back through the tickets on the GDPR project for potential improvements to that UI that we could implement here - pagination is one, for if users really have added lots of tables even with the 'add another' button. I now remember that I did actually implement that button in the gdpr_dump UI too - see https://www.drupal.org/project/gdpr/issues/2973031 for screenshot & details.

@anotherjames
Copy link
Collaborator Author

Oh btw, my English can be a little bit rough, so if you spot any mistakes in the README, feel free to adjust them ;)

Sure :-) Your English seems fine enough to me so far, no problems yet!

@robiningelbrecht
Copy link
Owner

robiningelbrecht commented Jan 10, 2019

I'm not aware enough of what the 'gdpr_expressions' and 'drivers' parts of the settings are to be able to comment on those though.

  • GDPR expressions: https://github.com/machbarmacher/gdpr-dump#use-with-drush. So you can use a combo of expressions and replacements. (Expressions just replace a value of a column by a value of another column)
  • Drivers: Every driver has a different dump command, and the config file stores what command to use for dumping an sql file. This won't be configurable through UI

I personnaly don't think "expressions" will be used a lot.... so maybe we can just leave them out?

@anotherjames
Copy link
Collaborator Author

Thanks for the explanation.

I personnaly don't think "expressions" will be used a lot.... so maybe we can just leave them out?

Fine by me.

Should drivers really be part of config at all, or should they just be set from a static map in the classes code?

@robiningelbrecht
Copy link
Owner

I just made a class "GdprDumperEnums" to define an array of Faker options. I can put the driver config into that class?

@robiningelbrecht
Copy link
Owner

Also, wouldn't it be better, UI wise, to use vertical tabs per table instead of fieldsets (eg screenshot)?

@anotherjames
Copy link
Collaborator Author

Yes, all sounds good to me!

@robiningelbrecht
Copy link
Owner

image

I would propose this approach... :)

@anotherjames
Copy link
Collaborator Author

Very nice, yes! Could the table description be included in the tab, probably above the table? That should be retrievable. It would give a little more context, I think.

@robiningelbrecht
Copy link
Owner

Yes, I already retrieve the table comment, but i want to add it as a vertical tab "summary" like so:
image
But this has to be done with JS. I have marked it as a TODO in my code. Will do this afterwards

@anotherjames
Copy link
Collaborator Author

Ah OK I see that could be nice. But I was thinking that showing it above the table helps give some context to the user whilst they are looking at the table. The name of the table is already off to the left in the tab, so potentially nowhere near the table.
If there was a long list of tabs/tables, then the name/description may not even be in the viewport.
Maybe it's worth even duplicating the table name and/or description to show above the table?

@anotherjames
Copy link
Collaborator Author

Just posting this here since it's not explicitly linked: #3

@robiningelbrecht
Copy link
Owner

Ended up with this. What do you think?
image

@robiningelbrecht
Copy link
Owner

Any idea how I can make the --structure-tables-listoption work with the drush sql-dump command?
At this moment in time machbarmacher/gdpr-dump doesn't support exporting table-structure only (machbarmacher/gdpr-dump#32), but according to the drush documentation, I could achieve this with the option stated above:

A comma-separated list of tables to include for structure, but not data

@anotherjames
Copy link
Collaborator Author

I think https://github.com/drush-ops/drush/blob/db02bfbca1fc44b4e9d94596583f5e784d52afe0/src/Sql/SqlMysql.php#L170 is how drush sql-dump implements it, at least for mysqldump?

@anotherjames
Copy link
Collaborator Author

Oh sorry I see your GdprSql* classes all extend those classes anyway. From just looking at the code, I think maybe the option would already be respected, if --structure-tables-list was actually passed when running drush sql-dump-gdpr (I could be wrong). So I think GdprSqlBase::create() (or somewhere between there and calling dump()) needs to fetch the config and merge in the list of tables to the options?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants