Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split two lists/iterables of objects "wanted_state", "existing_state" in 3 lists "create", "update", "delete" #567

Open
LLyaudet opened this issue Jun 1, 2023 · 5 comments

Comments

@LLyaudet
Copy link
Contributor

LLyaudet commented Jun 1, 2023

Hello,
I'm trying to find the best way to modify the database when I want to copy some object parametering from one site to one site in Django.
Let's assume I have a model

class MyModel:
    def __init__(self, some_id, name, site_id):
        self.id = some_id
        self.name = name
        self.site_ids
        # ... some other interesting fields I want to copy

The ids and site_ids are distinct between the two sites.
However, I would like to pair objects by name and do only the database modifications that are required.
What is the best way to do this using toolz API, please ?
I will detail some solutions I have found later.
Best regards,
Laurent Lyaudet

@LLyaudet
Copy link
Contributor Author

LLyaudet commented Jun 1, 2023

Some solution is for example:

def get_modification_lists(objects_list1: Iterable, objects_list2: Iterable, pairing_field: str):
    one_only_list = []
    both_list = []
    two_only_list = []
    objects_dict1 = {getattr(some_object, pairing_field): some_object for some_object in objects_list1}
    objects_dict2 = {getattr(some_object, pairing_field): some_object for some_object in objects_list2}
    keys = set(objects_dict1.keys()) | set(objects_dict2.keys())
    for some_key in keys:
         if objects_dict1.get(some_key) is None:
             two_only_list.append(objects_dict2[some_key])
         elif objects_dict2.get(some_key) is None:
             one_only_list.append(objects_dict1[some_key])
         else:
             both_list.append((objects_dict1[some_key], objects_dict2[some_key]))
    return one_only_list, both_list, two_only_list

I did not test this code. I just coded it directly in the issue comment.
But you get the idea :)

@LLyaudet
Copy link
Contributor Author

LLyaudet commented Jun 1, 2023

Until now I did some adhoc code every time I had to do this, but clearly it can be factorized.
I would be very happy if you find time to improve my function above,
and even more happy if you would consider to add this kind of function in toolz :)

@LLyaudet
Copy link
Contributor Author

LLyaudet commented Jun 1, 2023

To complete the example, you can do:

wanted_state_objects = MyModel.objects.filter(site_id=1)
existing_state_objects = MyModel.objects.filter(site_id=2)

to_create_objects, to_update_objects, to_delete_objects = get_modification_lists(
    wanted_state_objects,
    existing_state_objects,
    "name",
)

for some_object in to_create_objects:
    some_object.site_id = 2
    some_object.id = None
MyModel.objects.bulk_create(to_create_objects)

MyModel.objects.bulk_delete(to_delete_objects)

true_updates = []
for wanted_object, existing_object in to_update_objects:
    # The code in this loop should go in a method of the model
    is_modified = False
    if  existing_object.some_parameter_interesting_field1 != wanted_object.some_parameter_interesting_field1:
        existing_object.some_parameter_interesting_field1 = wanted_object.some_parameter_interesting_field1
        is_modified = True
    if existing_object.some_parameter_interesting_field2 != wanted_object.some_parameter_interesting_field2:
        existing_object.some_parameter_interesting_field2 = wanted_object.some_parameter_interesting_field2
        is_modified = True
    # ...
    if is_modified:
        true_updates.append(existing_object)

# You can enhance this with updated_fields
MyModel.objects.bulk_update(true_updates)

This way you can truly minimize database load :)

@groutr
Copy link
Contributor

groutr commented Jul 10, 2023

This looks a lot like different types of joins. pytoolz has a join method (https://toolz.readthedocs.io/en/latest/api.html#toolz.itertoolz.join)

If I understand your description, doing a full outer join and inspecting the result will tell you which operation should be performed.

from toolz import join, identity
a = list(range(10))
b = list(range(5, 15))

for el1, el2 in join(identity, a, identity, b, left_default=None, right_default=None):
    if el1 is None:
        # These are elements that appear in b, but not in a
    elif el2 is None:
        # These are elements that appear in a, but not in b
    else:
        # These are elements that appear in both a and b

You would want to replace the identity function with attrgetter to get the appropriate attribute off each object for comparison.

@LLyaudet
Copy link
Contributor Author

Thank you very much for your answer @groutr .
I checked the doc and corrected a typo: #568
It is interesting to see it as a full outer join :)
Nevertheless, it doesn't factorize the code enough since you want the three lists for bulk database modification.
With full outer join, you replace:

def get_modification_lists(objects_list1: Iterable, objects_list2: Iterable, pairing_field: str):
    one_only_list = []
    both_list = []
    two_only_list = []
    objects_dict1 = {getattr(some_object, pairing_field): some_object for some_object in objects_list1}
    objects_dict2 = {getattr(some_object, pairing_field): some_object for some_object in objects_list2}
    keys = set(objects_dict1.keys()) | set(objects_dict2.keys())
    for some_key in keys:
         if objects_dict1.get(some_key) is None:
             two_only_list.append(objects_dict2[some_key])
         elif objects_dict2.get(some_key) is None:
             one_only_list.append(objects_dict1[some_key])
         else:
             both_list.append((objects_dict1[some_key], objects_dict2[some_key]))
    return one_only_list, both_list, two_only_list

with

def get_modification_lists(objects_list1: Iterable, objects_list2: Iterable, pairing_field: str):
   one_only_list = []
   both_list = []
   two_only_list = []
   for object1, object2 in join(
       lambda x: getattr(x, pairing_field),
       objects_list1,
       lambda x: getattr(x, pairing_field),
       objects_list2,
       left_default=None,
       right_default=None,
   ):
        if object1 is None:
            two_only_list.append(object2)
        elif object2 is None:
            one_only_list.append(object1)
        else:
            both_list.append((object1, object2))
   return one_only_list, both_list, two_only_list

But I think join is the right track for naming a new function.
both_list corresponds to inner join.
one_only_list corresponds to left outer join "minus" inner join.
two_only_list corresponds to right outer join "minus" inner join.
Something like

LEFT_OUTER_JOIN_MINUS_INNER_JOIN = (1,)
INNER_JOIN = (2,)
RIGHT_OUTER_JOIN_MINUS_INNER_JOIN = (3,)
LEFT_OUTER_JOIN = (1,2)
RIGHT_OUTER_JOIN = (2,3)
FULL_OUTER_JOIN = (1,2,3)
ANTI_JOIN = (1, 3)

def join_synthesis(
    leftkey,
    leftseq,
    rightkey,
    rightseq,
    joins_seq,
):
    ....

And my use case could be:

one_only_list, both_list, two_only_list = join_synthesis(
    lambda x: getattr(x, pairing_field),
    objects_list1,
    lambda x: getattr(x, pairing_field),
    objects_list2,
    joins_seq=(LEFT_OUTER_JOIN_MINUS_INNER_JOIN, INNER_JOIN, RIGHT_OUTER_JOIN_MINUS_INNER_JOIN),
)

It would be nice :) Thanks :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants