-
Notifications
You must be signed in to change notification settings - Fork 263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split two lists/iterables of objects "wanted_state", "existing_state" in 3 lists "create", "update", "delete" #567
Comments
Some solution is for example: def get_modification_lists(objects_list1: Iterable, objects_list2: Iterable, pairing_field: str):
one_only_list = []
both_list = []
two_only_list = []
objects_dict1 = {getattr(some_object, pairing_field): some_object for some_object in objects_list1}
objects_dict2 = {getattr(some_object, pairing_field): some_object for some_object in objects_list2}
keys = set(objects_dict1.keys()) | set(objects_dict2.keys())
for some_key in keys:
if objects_dict1.get(some_key) is None:
two_only_list.append(objects_dict2[some_key])
elif objects_dict2.get(some_key) is None:
one_only_list.append(objects_dict1[some_key])
else:
both_list.append((objects_dict1[some_key], objects_dict2[some_key]))
return one_only_list, both_list, two_only_list I did not test this code. I just coded it directly in the issue comment. |
Until now I did some adhoc code every time I had to do this, but clearly it can be factorized. |
To complete the example, you can do: wanted_state_objects = MyModel.objects.filter(site_id=1)
existing_state_objects = MyModel.objects.filter(site_id=2)
to_create_objects, to_update_objects, to_delete_objects = get_modification_lists(
wanted_state_objects,
existing_state_objects,
"name",
)
for some_object in to_create_objects:
some_object.site_id = 2
some_object.id = None
MyModel.objects.bulk_create(to_create_objects)
MyModel.objects.bulk_delete(to_delete_objects)
true_updates = []
for wanted_object, existing_object in to_update_objects:
# The code in this loop should go in a method of the model
is_modified = False
if existing_object.some_parameter_interesting_field1 != wanted_object.some_parameter_interesting_field1:
existing_object.some_parameter_interesting_field1 = wanted_object.some_parameter_interesting_field1
is_modified = True
if existing_object.some_parameter_interesting_field2 != wanted_object.some_parameter_interesting_field2:
existing_object.some_parameter_interesting_field2 = wanted_object.some_parameter_interesting_field2
is_modified = True
# ...
if is_modified:
true_updates.append(existing_object)
# You can enhance this with updated_fields
MyModel.objects.bulk_update(true_updates) This way you can truly minimize database load :) |
This looks a lot like different types of joins. pytoolz has a join method (https://toolz.readthedocs.io/en/latest/api.html#toolz.itertoolz.join) If I understand your description, doing a full outer join and inspecting the result will tell you which operation should be performed. from toolz import join, identity
a = list(range(10))
b = list(range(5, 15))
for el1, el2 in join(identity, a, identity, b, left_default=None, right_default=None):
if el1 is None:
# These are elements that appear in b, but not in a
elif el2 is None:
# These are elements that appear in a, but not in b
else:
# These are elements that appear in both a and b You would want to replace the identity function with attrgetter to get the appropriate attribute off each object for comparison. |
Thank you very much for your answer @groutr . def get_modification_lists(objects_list1: Iterable, objects_list2: Iterable, pairing_field: str):
one_only_list = []
both_list = []
two_only_list = []
objects_dict1 = {getattr(some_object, pairing_field): some_object for some_object in objects_list1}
objects_dict2 = {getattr(some_object, pairing_field): some_object for some_object in objects_list2}
keys = set(objects_dict1.keys()) | set(objects_dict2.keys())
for some_key in keys:
if objects_dict1.get(some_key) is None:
two_only_list.append(objects_dict2[some_key])
elif objects_dict2.get(some_key) is None:
one_only_list.append(objects_dict1[some_key])
else:
both_list.append((objects_dict1[some_key], objects_dict2[some_key]))
return one_only_list, both_list, two_only_list with def get_modification_lists(objects_list1: Iterable, objects_list2: Iterable, pairing_field: str):
one_only_list = []
both_list = []
two_only_list = []
for object1, object2 in join(
lambda x: getattr(x, pairing_field),
objects_list1,
lambda x: getattr(x, pairing_field),
objects_list2,
left_default=None,
right_default=None,
):
if object1 is None:
two_only_list.append(object2)
elif object2 is None:
one_only_list.append(object1)
else:
both_list.append((object1, object2))
return one_only_list, both_list, two_only_list But I think join is the right track for naming a new function. LEFT_OUTER_JOIN_MINUS_INNER_JOIN = (1,)
INNER_JOIN = (2,)
RIGHT_OUTER_JOIN_MINUS_INNER_JOIN = (3,)
LEFT_OUTER_JOIN = (1,2)
RIGHT_OUTER_JOIN = (2,3)
FULL_OUTER_JOIN = (1,2,3)
ANTI_JOIN = (1, 3)
def join_synthesis(
leftkey,
leftseq,
rightkey,
rightseq,
joins_seq,
):
.... And my use case could be: one_only_list, both_list, two_only_list = join_synthesis(
lambda x: getattr(x, pairing_field),
objects_list1,
lambda x: getattr(x, pairing_field),
objects_list2,
joins_seq=(LEFT_OUTER_JOIN_MINUS_INNER_JOIN, INNER_JOIN, RIGHT_OUTER_JOIN_MINUS_INNER_JOIN),
) It would be nice :) Thanks :) |
Hello,
I'm trying to find the best way to modify the database when I want to copy some object parametering from one site to one site in Django.
Let's assume I have a model
The ids and site_ids are distinct between the two sites.
However, I would like to pair objects by name and do only the database modifications that are required.
What is the best way to do this using toolz API, please ?
I will detail some solutions I have found later.
Best regards,
Laurent Lyaudet
The text was updated successfully, but these errors were encountered: