Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Set data types #12

Open
dankochetov opened this issue Aug 7, 2020 · 10 comments
Open

Support for Set data types #12

dankochetov opened this issue Aug 7, 2020 · 10 comments

Comments

@dankochetov
Copy link

Are you planning to support Set data types (SS, NS, BS)? If there is no current progress in that direction, I'm willing to work on this.

@breath103
Copy link
Contributor

Hi, so this contains 2 different requests i think -

  1. conversion between NN <-> Set, SS <-> Set, BS <-> Set
  2. supporting DynamoDB operations like ["ADD"] to Set.

so for 1, we haven't talked about it yet but we think it completely makes sense -
actual implementation will be something like

@attribute()
public numberSet: NumberSet;

(because of reflect-metadata limit we have no choice to use NumberSet like custom class instead of clean Set)

for 2, we don't really have plan as of now, since that consumes same write capacity anyway - though it does have some beneficial usage.

for 1, I'd love to get PR and review - @dankochetov thanks.

@benhutchins
Copy link

@breath103 Within Dyngoose, we dropped the use of reflect-metadata in favor of more explicit specifications, and there you can use sets and maps. Although it doesn't optimize updates to perform ADD operations. @dankochetov if you're still looking for this support, you can see how Dyngoose implemented it if you want to consider porting it back to Dynamorm.

@breath103
Copy link
Contributor

I'm not sure you understand what i meant by limit of reflect-metadata
this is issue of Typescript metadata system itself. on reflect-metadata there is no concept of template (generic).

so NN should be typed as Set<number> and SS as Set<string> but there is no way to pass this information to Dynamorm as metadata. you can check this simply by checking compiled js file.

@Attribute("score")
public score: Set<number>;

compiled as

__decorate([
    dynamorm_1.Decorator.Attribute({ name: "score" }),
    __metadata("design:type", Set)
], Model.prototype, "score", void 0);

in short, we just need to create

class NumberSet extends Set<number> {}
class StringSet extends Set<number> {}

and handle those on ORM query layer. this is completely possible, just need work;

@benhutchins
Copy link

benhutchins commented Nov 4, 2020

@breath103 You could also change this by creating more explicit decorators. You could maintain the existing support you have for @Attribute(…) but you could similarly create:

@Attribute.NumberSet()
public score: number[]

This would then allow you to avoid using reflect-metadata and you'd know what the value format should be.

That is how Dyngoose implemented this, while it maintains @Attribute({…}) support it isn't mentioned anywhere in the examples or docs because every attribute type has explicit options that can impact how the data is formatted and dealt with.

An example string.metadata.ts

@Dyngoose.Attribute.String({ trim: true })
public automaticallyTrimmedString: string

@Dyngoose.Attribute.Date({ nowOnCreate: true })
createdAt: Date

This could be added to Dynamorm without losing the reflect-metadata usage. See more examples of how it can work on Dyngoose Attributes doc.

@breath103
Copy link
Contributor

oh you mean not using reflect-metadata at all. i mean that's technically much easier way - it's just trading off developer's convenience of manually mapping types - this was design decision on system.
i'm not sure why would you want to use SS or NS if it doesn't support ADD / REMOVE and not mapped as "Set" javascript class? that's up to original questioner.
anyway idea of implementing this feature on dynamorm is completely valid. we just haven't invested time on that

@benhutchins
Copy link

benhutchins commented Nov 4, 2020

@breath103 With sets, you can query using the contains query operator to determine if a set or list contains the specified value. That querying support can be extremely useful.

I actually only see the the ADD/REMOVE utilities as useful if the set if massive. If it was a massive set, then you'd want to avoid loading the set most of the time and you might not know what the list contains unless you perform a query using the contains operator. If the set is not so large you need to avoid loading it, then it likely isn't too large to send as a complete value when calling UpdateItem.

Except, Dynamorm currently does not use UpdateItem, it always uses PutItem which replaces the entire item rather than saving only the updated values. See Table.save which just calls Writer.put.

If you're going to attempt to optimize performance of sets, it might first be worth optimizing every single update operation of Dynamorm. Dyngoose does this by determining the best operation given the state and changes of the instance, see Table.forceSave.

Although, the limitations in Dynamorm is why I ended up forking dynamo-types into Dyngoose. I've ended up adding a lot of functionality to Dyngoose that did not exist in dynamo-types. I'd eagerly merge the two projects back together if possible.

@benhutchins
Copy link

Just another thought. I think developers would find it more convenient to change the decorator than to utilize a special set class. But a special set class for NumberSet could have some utilities on it for .add and .remove, however, if it was a native JavaScript array it might be easier to consume and manipulate; but then you could perform a diff during the save to determine which items are removed and which are added to build your ADD and REMOVE operations.

@breath103
Copy link
Contributor

breath103 commented Nov 5, 2020

Well those are are very often talked issue:

  1. why not partial update?
    a. DynamoDB from it's design consume exactly same amount write capacity regardless it's updateItem or putItem
    b. implementing dirty check and updating delta only is actually very complicated. that is one of the reason why proper orm like https://github.com/typeorm/typeorm just become very massive.

so in summary, it doesn't really have material performance (or dynamodb capacity) impact unless record is crazy big.
but size of single record of DynamoDB has limit (400kb), and updating that record (either partial / full) triggers 100 write capacity - which is reason why i think that's very anti pattern regardless of ORM implementation.
so it's very risky feature to create and most of time doesn't have difference anyway - which is reason why we design this as current

  1. You can still use partial update for certain use cases
    Conditional updates? balmbees/dynamo-types#30 there are some valid use case such as conditional update. which is why we implemented this.

  2. Contains query
    again, this doesn't matter when you do Query or Scan. since unless it's HashKey or SortKey DynamoDB consume exactly same read capacity anyway.
    only case this actually would have meaningful impact is on Conditional update as i mentioned above. (unless record is very huge, which is again, should be anti pattern anyway)

  3. Developer convenience

@Decorator.Attribute("set")
public setAttribute: NumberSet;
@Decorator.NumberSetAttribute("set")
public setAttribute: number[];

// or
@Decorator.NumberSetAttribute("set")
public setAttribute: Set<number>;

I don't think second one is better not only because of Decorator dependent definition, but also giving out implicit option for Array and Set. also, this is design decision. you can create it on Dynagoose if you want

@benhutchins
Copy link

benhutchins commented Nov 5, 2020

  1. TypeORM supports many databases, so it was bound to become massive. Unfortunately though, write capacity units isn't really the reason why UpdateItem is beneficial. Besides the better performance you get from reduced data transmission, performing a PutItem replaces the entire document every time you save. When using projections (which Dynmorm doesn't really allow for) or indexes that are not ProjectionType of ALL (such as INCLUDE which only contains specific data), Dynamorm would wipes out the unloaded data and loses attributes of documents.

When I was evaluating dynamo-types, that limitation was probably the single largest reason I couldn't convert to using dynamo-types. We have some massive tables and cloning the entire table into an index would be wasteful, by loading data from that index with dynamo-types and then saving would corrupt our tables.

But all of dynamorm is designed around the concept that GSIs will have ProjectionType of ALL and it provides no utility to specify the ProjectionExpression when loading records, so it assumes it will always load the entire record. So when saving, in that case, it would be fine to use PutItem if you had the entire record.

Unfortunately, at scale, there are too many times when loading the entire record is wasteful. The ability to perform a partial update to documents, without losing attributes that were not loaded, is necessary in those situations.

  1. Conditional saving is not the same as UpdateItem, both of which are important, and both of which I implemented in Dyngoose. I was involved in that PR over at dynamo-types and pointed out then that Dyngoose had already implemented conditional saving, although with a very different approach to typing the conditions.

  2. But again, it's not just about the DynamoDB capacity utilized, it's about how much data you are loading. If an application needs to query for records that match specific filters, they only want to retrieve those records for processing. In that case, they find it acceptable to use the read capacity units from DynamoDB as long as it doesn't wastefully load records they will simply ignore from their application. You can still avoid wasted resources by asking DynamoDB to do the filtering and using the contains operator is there for that purpose.

  3. No problem. At this point I think merging the two projects wouldn't likely be feasible. There were a lot of things that lead to the need to create Dyngoose. Still, feel free to pull in some of the improvements if you find them helpful and can port them. Some of the features so far include development database seeding, CloudFormation template resources creation, the ability to add filters to queries, powerful call-chain querying, typed creation of new instances, support for transactions and batch operations across multiple tables, a native JavaScript array output from queries, support for Dates, support for JSON objects, support for Binary attributes, support for BigInts, support for String, Number, and Binary Sets, support for table migration, support for default attribute values, support for encrypted tables, support for pay per request and for provisioned throughput tables, support for dynamodb streams… Hopefully some of the work can be helpful to dynamorm in the future.

@breath103
Copy link
Contributor

@benhutchins

Yes indeed this assumes ProjectionType=All, since this is "ORM". technically in order to make sense the consistency of ORM, it's required to create "separated" model if the data schema is different. otherwise developers are always exposed to use none-existing attribute, since there is no explicit and safe way of knowing which attributes exists or not.

Again this is simply system design trade offs. I mean i'm literally AWS certified Serverless Hero. me and team didn't implement those cause we decided that it's better tradeoff to not have those. both on productivity and performance wise.

I'm not sure about your intention here precisely because of that. if you don't agree with the view of this repo and maintainers, you can just fork this repository and create your own. not to even mention since your point of view on designing ORM is completely opposite from us thus i've never had intention to merge with Dynagoose anyway.

Instead you just copy-and-pasted a lot of codes from this repository, which is against ISC license.
I'll let you slide with that, but i'll not tolerate anymore pointless discussion here. if you want to just advertise your repository without having productive suggestion to repository, keep that to yourself. i'm blocking you from here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants