
Introduce @flatten annotation #642

Open
wants to merge 16 commits into base: main
Conversation

nox213 (Contributor) commented Dec 1, 2024

closes #623

This PR introduces the @flatten annotation.

Usage

The @flatten annotation can only be applied to:

case classes: Flatten fields of a nested case class into the parent structure.

case class A(i: Int, @flatten b: B)
case class B(msg: String)
implicit val rwB: ReadWriter[B] = macroRW
implicit val rwA: ReadWriter[A] = macroRW
write(A(1, B("Hello"))) // {"i":1, "msg": "Hello"}

Iterable: Flatten key-value pairs of an Iterable[(String, _)] into the parent structure.

case class A(i: Int, @flatten map: Map[String, String])
implicit val rw: ReadWriter[A] = macroRW
val map = Map("a" -> "1", "b" -> "2")
write(A(1, map)) // {"i":1, "a":"1", "b": "2"}

Nested flattening allows you to apply the @flatten annotation recursively to fields within nested case classes.

case class Outer(msg: String, @flatten inner: Inner)
case class Inner(@flatten inner2: Inner2)
case class Inner2(i: Int)

implicit val rwInner2: ReadWriter[Inner2] = macroRW
implicit val rwInner: ReadWriter[Inner] = macroRW
implicit val rwOuter: ReadWriter[Outer] = macroRW

write(Outer("abc", Inner(Inner2(7)))) // {"msg": "abc", "i": 7}

The Reader also recognizes the @flatten annotation.

case class A(i: Int, @flatten b: B)
case class B(msg: String)
implicit val rwB: ReadWriter[B] = macroRW
implicit val rwA: ReadWriter[A] = macroRW
read[A]("""{"i": 1, "msg": "Hello"}""") // The top-level field "msg": "Hello" is correctly mapped to the field in B.

For collections, during deserialization, every key-value pair in the JSON that does not map directly to a specific field of the case class is stored in the flattened collection.

case class A(i: Int, @flatten map: Map[String, String])
implicit val rw: ReadWriter[A] = macroRW
read[A]("""{"i":1, "a": "1", "b": "2"}""") // Output: A(1, Map("a" -> "1", "b" -> "2"))

If there are no keys in the JSON that can be stored in the collection, it is treated as an empty collection.

read[A]("""{"i":1}""")
// Output: A(1, Map.empty)

If a key’s value in the JSON cannot be converted to the Map’s value type (e.g., String), the deserialization fails.

read[A]("""{"i":1, "a":{"name":"foo"}}""")
// Error: Failed to deserialize because the value for "a" is not a String, as required by Map[String, String].

Limitations

  1. Flattening two or more collections into the same level is not supported. This feels awkward to support because, when deriving a Reader, it becomes unclear which collection the leftover data should be stored in.
  2. Type parameters do not seem to be properly resolved in the following scenario:
case class Param[T](@flatten t: T)
object Param {
  implicit def rw[T: RW]: RW[Param[T]] = upickle.default.macroRW // compile error when this def is called to derive an instance
  implicit val rwSome: RW[Param[SomeClass]] = upickle.default.macroRW // works
}
  3. When using the @flatten annotation on an Iterable, the key type must be String.

Implementation Strategy

Writer Derivation

From my understanding, deriving a Writer for a case class involves implementing a dispatch function that iterates through each field of the case class. In the existing implementation, the Writer is generated by processing the information of each field in the type T being derived.

this.writeSnippetMappedName[R, Int](ctx, upickle.default.objectAttributeKeyWriteMap("i"), implicitly[upickle.default.Writer[Int]], v.i);
this.writeSnippetMappedName[R, upickle.Nested](ctx, upickle.default.objectAttributeKeyWriteMap("n"), implicitly[upickle.default.Writer[upickle.Nested]], v.n);
ctx.visitEnd(-1)

When deriving a Writer, the above snippet shows how the visit function is invoked for each field, passing the corresponding field’s Writer as an argument. If the field is a Map or a case class and needs to be flattened, additional processing is required:

  1. For case classes, instead of delegating to the Writer of the nested case class, the visit function should be called directly for each field in the nested case class. For example:
this.writeSnippetMappedName[R, upickle.Int](ctx, upickle.default.objectAttributeKeyWriteMap("integerValue"), implicitly[upickle.default.Writer[upickle.Int]], v.n.integerValue);
  2. For Map, iterate through all key-value pairs in the Map, calling visit for each pair:
mapField.foreach { case (key, value) =>
  this.writeSnippetMappedName[R, $valueType](
    ctx,
    key.toString,
    implicitly[${c.prefix}.Writer[$valueType]],
    value
  )
}
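The effect of these two rules can be illustrated with a small hand-rolled sketch in plain Scala (not the generated macro code; all names here are hypothetical): fields of a flattened nested case class and entries of a flattened Map are emitted into the same JSON object as the parent's own fields.

```scala
// Hypothetical stand-in for what the generated Writer effectively produces
// when both a nested case class and a Map are flattened into the parent.
object FlattenWriteSketch {
  case class Inner(msg: String)
  case class Outer(i: Int, inner: Inner, extra: Map[String, String])

  private def quote(s: String): String = "\"" + s + "\""

  // Instead of emitting {"inner":{...}} and {"extra":{...}}, visit the
  // nested case class's fields and the map's entries at the top level.
  def writeFlattened(v: Outer): String = {
    val fields =
      List(quote("i") + ":" + v.i, quote("msg") + ":" + quote(v.inner.msg)) ++
        v.extra.map { case (k, value) => quote(k) + ":" + quote(value) }
    fields.mkString("{", ",", "}")
  }
}
```

For example, `FlattenWriteSketch.writeFlattened(Outer(1, Inner("Hello"), Map("a" -> "1")))` yields `{"i":1,"msg":"Hello","a":"1"}`.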

Reader Derivation

Deriving a Reader is more complex, especially with support for Map, as it introduces several edge cases. A Reader is essentially a Visitor for JSON, as I understand it. The process involves three main steps:

  1. Mapping read keys to indices.
  2. Storing the read values in variables corresponding to the indices.
  3. Constructing the case class instance from the read values after traversal.

To support flattening, additional steps were required in these stages:

  1. Mapping Keys to Indices:
    Flattened fields must be accounted for in the key-to-index mapping. For example:
case class A(@flatten b: B)
case class B(i: Int, d: Double)

// Without flattening
key match {
  "b" => 0
  _ => -1
}

// With flattening
key match {
  "i" => 0
  "d" => 1
  _ => -1
}
  2. Allocate storage for flattened fields, similar to step 1, but extended to account for flattened structures.
  3. Modify the class construction logic to recursively handle flattened fields.

Special Case for Maps:

  • Since Map must capture all key-value pairs not corresponding to a specific field, I implemented logic to handle this:
    • If a key’s index is -1 and the type T being derived has a @flatten annotation on a Map field, the key-value pair is stored in a ListBuffer within the Reader.
    • These pairs are used during class construction to populate the Map.
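A minimal sketch of this catch-all behavior, written as plain Scala over pre-tokenized key-value pairs rather than the actual generated Reader (names are assumptions):

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for the generated Reader logic: keys that map to a
// declared field fill that field's slot; keys whose index is -1 are buffered
// in a ListBuffer and used to build the flattened Map at construction time.
object FlattenReadSketch {
  case class A(i: Int, rest: Map[String, String])

  def readFlattened(pairs: List[(String, String)]): A = {
    var i = 0
    val leftover = ListBuffer.empty[(String, String)]
    pairs.foreach {
      case ("i", v) => i = v.toInt // key "i" maps to a declared field
      case kv       => leftover += kv // index -1: captured for the Map
    }
    A(i, leftover.toMap)
  }
}
```

With no leftover keys, the buffer stays empty and the flattened field becomes an empty Map, matching the semantics described above.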

lihaoyi (Member) commented Dec 2, 2024

@nox213 this looks great. Could you update the PR description with a summary of the implemented semantics, with examples, as well as a summary of the implementation strategy? That would make the PR much easier to review and merge

nox213 commented Dec 2, 2024

Sure, I’ll tag you when I’m ready.

nox213 commented Dec 3, 2024

@lihaoyi I polished the code a little and updated the PR.

lihaoyi commented Dec 5, 2024

@nox213 at a high level, would it be possible to implement this without the case class Field refactor? If we could preserve the existing method signatures mostly in place, we could add forwarders and maintain binary compatibility, which would allow us to release this without a breaking major version

lihaoyi commented Dec 5, 2024

I guess more generally I'd like to see whether we can land this without breaking binary compatibility. Breaking bincompat and releasing Mill 5.0.0 is an option, but it would be better if we didn't need to do that

lihaoyi commented Dec 5, 2024

I think this could work:

  • currentKey/storeToMap variables can be made binary compatible by moving them from the upstream trait into the various downstream impls

  • The macro changes could be made binary compatible by leaving the existing macros in place but deprecated, and just making a Macros2 class containing all the new implementations

lihaoyi commented Dec 5, 2024

For the Scala 2 vs Scala 3 approach, could we use the Scala 3 approach in Scala 2 as well? It's best to keep the two implementations mostly consistent, so future changes can be done more easily across both versions

lihaoyi commented Dec 5, 2024

case class Param[T](@flatten t: T)
object Param {
  implicit def rw[T]: RW[Param[T]] = upickle.default.macroRW // compile error when this def is called to derive an instance
  implicit val rwSome: RW[Param[SomeClass]] = upickle.default.macroRW // works
}

Can this be fixed by making it implicit def rw[T: RW]?

lihaoyi commented Dec 5, 2024

When using the @flatten annotation on a Map, the type of key must be String. (Allowing user-defined case classes as keys introduces a high risk of key conflicts, so I’ve restricted it to String.) Also, only immutable.Map is supported. This makes derivation of Reader simpler.

@flatten is only allowed on immutable.Map. This makes derivation of the Reader simpler.

How hard would it be to generalize this to other collection types and key types?

  • One use case for other collection types would be to support Seq[(String, T)] or LinkedHashMap[String, T] for cases where people want to preserve ordering
  • uPickle already supports non-String map keys via https://com-lihaoyi.github.io/upickle/#JSONDictionaryFormats, so it would be nice if we could be consistent here as well when flattening out maps

lihaoyi commented Dec 5, 2024

Left some high level comments. It's a pretty invasive change so I expect it will take some time to discuss the changes, first at a high-level and then down to the lines of code being changed, just to set expectations. But overall the approach looks solid

nox213 commented Dec 5, 2024

@lihaoyi Thanks for your review.

Can this be fixed by making it implicit def rw[T: RW]?

I'm not sure whether it's possible. I will investigate more on it.

How hard would it be to generalize this to other collection types and key types?

I believe this is doable. For other collection types, I could store the type of the flattened collection and then use that type to construct it in the Reader. For keys, I didn't do it because I didn't think that was a common use case. I will try this.

For the Scala 2 vs Scala 3 approach, could we use the Scala 3 approach in Scala 2 as well? It's best to keep the two implementations mostly consistent, so future changes can be done more easily across both versions

Sure, I will use the Scala 3 approach in Scala 2 (no case class Field).

nox213 commented Dec 5, 2024

The challenging part seems to be bincompat. Would the best approach be to implement @flatten without changing the existing function signatures then? And if changing the function signatures is unavoidable, the next best option would be to keep the existing functions (marking them as deprecated) and implement separate macros (e.g., a new Macro class in Scala 2). In this case, would it make sense to introduce a conditional branch in specific functions to decide whether to call the deprecated macro or the new one? For example, determining this based on the presence of the @flatten annotation. Did I understand this correctly?

One thing I’m curious about: as I understand it, macro code is only called internally, so I assumed it would be safe in terms of bincompat. However, can even functions that are not called externally impact bincompat?

lihaoyi commented Dec 5, 2024

For bincompat, we could leave the existing Macros file completely untouched and just make a new copy Macros2 that we use going forward. No need to conditionally branch on the presence of @flatten, the new code works great. We just need to leave the old code in place for bincompat in case anyone downstream is using it (probably nobody, but good to be careful in preserving bincompat anyway)

lihaoyi commented Dec 5, 2024

I believe this is doable. For other collection types, I could store the type of the flattened collection and then use that type to construct it in the Reader. For keys, I didn't do it because I didn't think that was a common use case. I will try this.

I think what you would need to do is take the type of the collection, resolve the Reader for it, and use that Reader instance at runtime in order to construct the collection from the individual key-value pairs

For keys, I think you should be able to do the same thing: take the type, resolve the Reader, and use the Reader to construct the key objects from the JSON keys and then pass those to the collection Reader
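The suggestion above could look roughly like this sketch: resolve a codec for the key type and use it to turn the JSON object's String keys back into typed keys before handing the pairs to the collection builder. `KeyCodec` here is a made-up stand-in, not a uPickle API.

```scala
// Hypothetical sketch of resolving a per-key-type codec and using it to
// rebuild typed keys from the String keys of a flattened JSON object.
object KeyCodecSketch {
  trait KeyCodec[K] {
    def read(s: String): K
    def write(k: K): String
  }

  implicit val intKey: KeyCodec[Int] = new KeyCodec[Int] {
    def read(s: String): Int = s.toInt
    def write(k: Int): String = k.toString
  }

  // Convert each String key back to the typed key, then build the map.
  def readMap[K](raw: Map[String, String])(implicit kc: KeyCodec[K]): Map[K, String] =
    raw.map { case (k, v) => kc.read(k) -> v }
}
```

In real uPickle the codec role would be played by the resolved Reader/Writer instances for the key type, mirroring how non-String map keys are already handled in the JSON dictionary formats.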

nox213 commented Dec 6, 2024

Thanks for your explanation and tips.

To summarize the direction I need to take for the changes:

  1. For Scala 2, keep the existing Macros.scala unchanged and create a new Macros2.scala to apply the modifications (for bincompat), and the new code will use Macros2.scala.
  2. Move the definitions of currentKey and storeToMap from the upstream trait to the downstream implementation (for bincompat).
  3. Update the implementation to support flattening for collection types other than immutable.Map.
  4. Allow Map to have key types other than String.
  5. For the Scala 2 macro implementation, aim to follow a design as similar as possible to Scala 3, without using case class Field.
  • In the existing Scala 2 implementation, all the information necessary for derivation (e.g., symbols, types, mappedName, etc.) is generated upfront and passed to wrapCaseN for construction. That's why I introduced case class Field, so that I can pass all the information to wrapCaseN. If I were to adopt an approach similar to Scala 3, wrapCaseN would instead start with just the type being derived and then call a function to fetch the field information for that type.

Is this the right direction to proceed?

Lastly, for Macros2.scala, would it be better to apply private or package-private as much as possible, where those restrictions can be applied?

lihaoyi commented Dec 6, 2024

@nox213 yes that's right. Making the new macros package private would indeed help, though some of the supporting runtime APIs may need to remain public so we won't be able to privatize everything

nox213 commented Dec 6, 2024

Okay, I will work on it.

@nox213 nox213 force-pushed the feature/flatten branch 4 times, most recently from d79bf49 to f33336d Compare December 7, 2024 13:06
nox213 commented Dec 9, 2024

@lihaoyi
I have some questions regarding bincompat of Scala3.

  1. I modified the types of some fields (construct, visitor0) in CaseClassReader3. Do you have any ideas for keeping bincompat? Is it okay to make a copy of CaseClassReader3, like CaseClassReader3V2, and use it instead?
  2. In macros, is it okay to maintain bincompat by just keeping old methods and making them deprecated?

lihaoyi commented Dec 9, 2024

@lihaoyi I have some questions regarding bincompat of Scala3.

  1. I modified types of some fields(construct, visitor0) in CaseClassReader3. Do you have some ideas to keep bincompat? Is it okay to make a copy of CaseClassReader3 like CaseClassReader3V2 and use it instead?

Making a copy should work

  1. In macros, is it okay to maintain bincompat by just keeping old methods and making them deprecated?

Yes leaving the old methods deprecated should work as well

nox213 commented Dec 10, 2024

@lihaoyi
77a882e
This is the change that moves storeToMap and currentKey to the downstream trait. Since storeToMap must be used in visitValue, I had to override that function in the downstream trait. What do you think of making a copy of ObjectContext.scala instead, so that we don't need to define visitValue separately for Scala 2 & 3?

lihaoyi commented Dec 10, 2024

@nox213 I don't quite understand what you are asking

nox213 commented Dec 10, 2024

If I define storeToMap not inside BaseCaseObjectContext but inside the trait generated by the macro code, I have to override visitValue from the macro-generated trait as well.

lihaoyi commented Dec 10, 2024

defining it separately for Scala 2 and Scala 3 is fine I think

nox213 commented Dec 10, 2024

Okay, I will keep it as it is

nox213 commented Dec 13, 2024

@lihaoyi
I made the Scala 2 macros work with Iterable[(Key, Value)] (removing the restrictions to Map and String keys). But I ran into one problem when working on Scala 3: the stricter quote system does not allow me to use the same approach as in Scala 2.

First, for deriving a Writer for a collection with a non-String key, I did the following in Scala 2.

val writeKey =
  if (keyType =:= typeOf[String]) q"key"
  else q"upickle.default.write(key)(implicitly[${c.prefix}.Writer[$keyType]])"
q"""
  $select.foreach { case (key, value) =>
    this.writeSnippetMappedName[R, $valueType](
      ctx,
      $writeKey,
      implicitly[${c.prefix}.Writer[$valueType]],
      value
    )
  }
"""

Basically, I summoned a Writer[Key] and called upickle.default.write with that Writer to convert the key to a String.
But this doesn't work in Scala 3, since Scala 3 quotes do not allow me to generate such code. If I understood correctly, Scala 3 requires that all references inside a quote be valid and resolvable in the scope of the quote itself. Do you have any ideas for converting the key to a String without calling write?

'{
  ${select.asExprOf[Iterable[(keyTpe, valueTpe)]]}.foreach { (k, v) =>
    ${self}.writeSnippetMappedName[R, valueTpe](
      ${ctx},
      upickle.default.write(k)(summonInline[keyWriterTpe]), // this doesn't compile
      summonInline[writerTpe],
      v,
    )
  }
}

lihaoyi commented Dec 16, 2024

@nox213 I'm not actually sure on the Scala 3 side. @jodersky do you have any ideas how this can be done with Scala 3 metaprogramming?

nox213 commented Dec 16, 2024

What I tried was this:

val writeRef = Ref(Symbol.requiredMethod("upickle.default.write"))
(keyTpe0.asType, valueTpe0.asType, keyWriterTpe0.asType, valueWriterTpe0.asType) match {
  case ('[keyTpe], '[valueTpe], '[keyWriterTpe], '[valueWriterTpe]) =>
    val typeAppliedWrite = TypeApply(writeRef, List(TypeTree.of[keyTpe]))
    val snippet =
      if (keyTpe0 =:= TypeRepr.of[String])
        ... // omitted
      else
        '{
          ${select.asExprOf[Iterable[(keyTpe, valueTpe)]]}.foreach { (k, v) =>
            ${self}.writeSnippetMappedName[R, valueTpe](
              ${ctx},
              ${Apply(
                  Apply(
                    typeAppliedWrite,
                    List('{k}.asTerm, '{-1}.asTerm, '{false}.asTerm, '{false}.asTerm)
                  ),
                  List('{summonInline[keyWriterTpe]}.asTerm)
                ).asExprOf[String]},
              summonInline[valueWriterTpe],
              v,
            )
          }
        }

I used requiredMethod to access the method symbol out of scope, and I think I managed to generate code that calls it, but the error was:

[error]     |               Found:    (upickle.Flatten.KeyClass.rw :
[error]     |                 upickle.default.ReadWriter[upickle.Flatten.KeyClass])
[error]     |               Required: Api.this.Writer[upickle.Flatten.KeyClass]

Those two types are treated as different types because their paths differ. I got stuck at this point.

nox213 commented Dec 18, 2024

@lihaoyi

case class Param[T](@flatten t: T)
object Param {
  implicit def rw[T]: RW[Param[T]] = upickle.default.macroRW // compile error when this def is called to derive an instance
  implicit val rwSome: RW[Param[SomeClass]] = upickle.default.macroRW // works
}

Can this be fixed by making it implicit def rw[T: RW]?

I misunderstood your comment. I tested with your fix but it still doesn't work.

  object FlattenTestWithType {
    implicit def rw[T: RW]: RW[FlattenTestWithType[T]] = upickle.default.macroRW // compile error
  }

I’ve pushed the changes to support collections other than Map (iterable flattening is now possible) so far. a07839d

jodersky (Member) commented
I just saw your ping now. Let me know if there's still something I can help with

nox213 commented Dec 20, 2024

@jodersky Thanks, could you check these comments (comment, comment)?
Basically, I tried to make the @flatten annotation work for collections that have non-String keys. To support that, I need to generate code that calls functions defined in upickle.default, which lie outside the scope available in macros.scala.

nox213 commented Dec 26, 2024

@lihaoyi

supporting collections with non-string key

Would you consider merging this PR without the feature for now? The PR is already quite large, and if we come up with a good solution later, we could address it in a separate follow-up.

lihaoyi commented Dec 26, 2024

Sure! Would be happy to merge it first

Give me some time to properly review it. I'll still probably need @jodersky's help to look through some of the Scala 3 macro stuff, and I'll give it a pass as well


Successfully merging this pull request may close these issues.

Provide Rust serde-like flatten behavior for flattening/sharing/capturing fields (500USD Bounty)
3 participants