Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema generates conflicting names for enum variants/across crates #92

Open
jshaw-jump opened this issue Mar 30, 2022 · 10 comments
Open

Comments

@jshaw-jump
Copy link

jshaw-jump commented Mar 30, 2022

Original issue (conflicting names of enum variants with outer structs)

The following code fails to compile because the schema name generation for enum variants can produce name clashes.

#[derive(BorshSchema)]
enum Foo {
    Bar(FooBar),
}

#[derive(BorshSchema)]
struct FooBar {}
error[E0275]: overflow evaluating the requirement `<Foo as borsh::BorshSchema>::add_definitions_recursively::FooBar: borsh::BorshSchema`
 --> code.rs:1:10
  |
2 | #[derive(BorshSchema)]
  |          ^^^^^^^^^^^
  |
  = help: see issue #48214
  = note: this error originates in the derive macro `borsh::BorshSchema` (in Nightly builds, run with -Z macro-backtrace for more info)

I believe the error is a result of using a simple concatenation to generate the name.

let full_variant_name_str = format!("{}{}", name_str, variant_name_str);


Widened issue (conflicting names for types globally across multiple crates)

As per research by @dzmitry-lahoda, BorshSchema is problematic to use when multiple third-party crates are involved,
which can introduce conflicting names and only be panicing with conflicts when tried to be used elsewhere.

use crate_a::A as AA;
use crate_b::A as AB;

#[derive(BorshSchema)]
struct B {
    x: AA,
    y: AB,
}
@frol
Copy link
Collaborator

frol commented Mar 31, 2022

Good catch! Thanks for reporting it, but I believe borsh schema issues won't be addressed at the moment as it is not used in production and was just an experiment without a clear specification

@austinabell
Copy link
Contributor

cc @itegulov @miraclx we might want to consider this issue now that we are using schema for ABI

@dzmitry-lahoda
Copy link
Contributor

what is possible outline of fix for this? I have clash for same named struct in different parts of my big account struct.

@dzmitry-lahoda
Copy link
Contributor

dzmitry-lahoda commented May 2, 2024

I see next:

  • BorshSchema is used with solana contracts, even some in spl
  • BorshSchema is used in near codebase in several types
  • NearSchema with borsh attribute generates BorshSchema impl. as per code and documentation NearSchema does BorshSchema as default.

So there are some users, may be a lot if search on GH.

cc @frol

@dzmitry-lahoda
Copy link
Contributor

dzmitry-lahoda commented May 2, 2024

Did quick tests on schemars

TLDR

For same named items schemars generates second declaration/definition under +1 suffixed name.

Long:

See #/definitions/A2

thread 'engine::tests::borsh_diff::serde_name_conflict' panicked at engine/src/engine/tests/borsh_diff.rs:68:5:
RootSchema { meta_schema: Some("http://json-schema.org/draft-07/schema#"), schema: SchemaObject { metadata: Some(Metadata { id: None, title: Some("D"), description: None, default: None, deprecated: false, read_only: false, write_only: false, examples: [] }), instance_type: Some(Single(Object)), format: None, enum_values: None, const_value: None, subschemas: None, number: None, string: None, array: None, object: Some(ObjectValidation { max_properties: None, min_properties: None, required: {"b", "c"}, properties: {"b": Object(SchemaObject { metadata: None, instance_type: None, format: None, enum_values: None, const_value: None, subschemas: None, number: None, string: None, array: None, object: None, reference: Some("#/definitions/B"), extensions: {} }), "c": Object(SchemaObject { metadata: None, instance_type: None, format: None, enum_values: None, const_value: None, subschemas: None, number: None, string: None, array: None, object: None, reference: Some("#/definitions/C"), extensions: {} })}, pattern_properties: {}, additional_properties: None, property_names: None }), reference: None, extensions: {} }, definitions: {"A": Object(SchemaObject { metadata: None, instance_type: Some(Single(Object)), format: None, enum_values: None, const_value: None, subschemas: None, number: None, string: None, array: None, object: Some(ObjectValidation { max_properties: None, min_properties: None, required: {"a"}, properties: {"a": Object(SchemaObject { metadata: None, instance_type: Some(Single(Integer)), format: Some("uint8"), enum_values: None, const_value: None, subschemas: None, number: Some(NumberValidation { multiple_of: None, maximum: None, exclusive_maximum: None, minimum: Some(0.0), exclusive_minimum: None }), string: None, array: None, object: None, reference: None, extensions: {} })}, pattern_properties: {}, additional_properties: None, property_names: None }), reference: None, extensions: {} }), "A2": Object(SchemaObject { metadata: None, instance_type: Some(Single(Object)), format: None, enum_values: None, const_value: None, subschemas: None, number: None, string: None, array: None, object: Some(ObjectValidation { max_properties: None, min_properties: None, required: {"a"}, properties: {"a": Object(SchemaObject { metadata: None, instance_type: Some(Single(Integer)), format: Some("uint8"), enum_values: None, const_value: None, subschemas: None, number: Some(NumberValidation { multiple_of: None, maximum: None, exclusive_maximum: None, minimum: Some(0.0), exclusive_minimum: None }), string: None, array: None, object: None, reference: None, extensions: {} })}, pattern_properties: {}, additional_properties: None, property_names: None }), reference: None, extensions: {} }), "B": Object(SchemaObject { metadata: None, instance_type: Some(Single(Object)), format: None, enum_values: None, const_value: None, subschemas: None, number: None, string: None, array: None, object: Some(ObjectValidation { max_properties: None, min_properties: None, required: {"a"}, properties: {"a": Object(SchemaObject { metadata: None, instance_type: None, format: None, enum_values: None, const_value: None, subschemas: None, number: None, string: None, array: None, object: None, reference: Some("#/definitions/A"), extensions: {} })}, pattern_properties: {}, additional_properties: None, property_names: None }), reference: None, extensions: {} }), "C": Object(SchemaObject { metadata: None, instance_type: Some(Single(Object)), format: None, enum_values: None, const_value: None, subschemas: None, number: None, string: None, array: None, object: Some(ObjectValidation { max_properties: None, min_properties: None, required: {"a"}, properties: {"a": Object(SchemaObject { metadata: None, instance_type: None, format: None, enum_values: None, const_value: None, subschemas: None, number: None, string: None, array: None, object: None, reference: Some("#/definitions/A2"), extensions: {} })}, pattern_properties: {}, additional_properties: None, property_names: None }), reference: None, extensions: {} })} }

mod a {
    use borsh::{de, BorshDeserialize, BorshSchema, BorshSerialize};

    #[derive(Default, Debug, BorshSerialize, BorshSchema, arbitrary::Arbitrary, schemars::JsonSchema, serde::Serialize)]
    pub  struct A {
        a: u8,
    }
    
    #[derive(Default, Debug, BorshSerialize, BorshSchema, arbitrary::Arbitrary, schemars::JsonSchema, serde::Serialize)]
    pub struct  B {
        a: A,
    }
}

mod b {
    use borsh::{de, BorshDeserialize, BorshSchema, BorshSerialize};

    #[derive(Default, Debug, BorshSerialize, BorshSchema, arbitrary::Arbitrary, schemars::JsonSchema, serde::Serialize)]
    pub  struct A {
        a: u8,
    }
    
    #[derive(Default, Debug, BorshSerialize, BorshSchema, arbitrary::Arbitrary, schemars::JsonSchema, serde::Serialize)]
    pub struct  C {
        a: A,
    }
}

use a::B;
use b::C;
#[derive(Default, Debug, BorshSerialize, BorshSchema, arbitrary::Arbitrary, schemars::JsonSchema, serde::Serialize)]
struct D {
    b: B,
    c: C,
}

#[test]
fn serde_name_conflict() {
    let schema = schemars::gen::SchemaGenerator::default()
    .into_root_schema_for::<D>();
    panic!("{:?}", schema);
}

@dzmitry-lahoda
Copy link
Contributor

dzmitry-lahoda commented May 2, 2024

@frol @itegulov @miraclx @austinabell

would PR which does exact same bump of number suffix in declaration accepted as fix as in schemars?

solution

So for type A and A in definitions, second one will become A2

possible improvement

  • improvement can be if definitions are 100% same, do not bump, but use existing one (not conflict panic)
  • store in schema name as it was given as is, and reference name with bumped id

alternative

Prefix definition with parent struct, so if A and A in B and C, name of second one will be B_A, which will require need to propagate parent info (hopefully it is already).

In case A and A in same struct one will have alias or mod prefix, so use alias or mod prefix.

Alternative seems complicated.

Not sure if one can get full Rust compile time info about modules in proc macro.

@frol
Copy link
Collaborator

frol commented Jun 4, 2024

I don’t see a universal solution to avoid naming collisions when names are autogenerated. The most flexible solution is to introduce macro attribute to control/override the naming.

@dzmitry-lahoda Would you like to contribute the solution to allow naming customization?

@dj8yfo
Copy link
Collaborator

dj8yfo commented Jun 5, 2024

@dzmitry-lahoda there's some solution for simple cases without deep nesting of conflicting types

diff --git a/borsh-derive/src/internals/attributes/field/schema.rs b/borsh-derive/src/internals/attributes/field/schema.rs
index 01353bf1..af632f93 100644
--- a/borsh-derive/src/internals/attributes/field/schema.rs
+++ b/borsh-derive/src/internals/attributes/field/schema.rs
@@ -24,31 +24,31 @@ pub static SCHEMA_FIELD_PARSE_MAP: Lazy<BTreeMap<Symbol, Box<ParseFn>>> = Lazy::
     // assigning closure `let f = |args| {...};` and boxing closure `let f: Box<ParseFn> = Box::new(f);`
     // on 2 separate lines doesn't work
     let f_params: Box<ParseFn> = Box::new(|attr_name, meta_item_name, meta| {
         parse_lit_into_vec::<ParameterOverride>(attr_name, meta_item_name, meta)
             .map(Variants::Params)
     });
 
     let f_with_funcs: Box<ParseFn> = Box::new(|_attr_name, _meta_item_name, meta| {
         let map_result = meta_get_by_symbol_keys(WITH_FUNCS, meta, &WITH_FUNCS_FIELD_PARSE_MAP)?;
         let with_funcs: WithFuncs = map_result.into();
-        if (with_funcs.declaration.is_some() && with_funcs.definitions.is_none())
-            || (with_funcs.declaration.is_none() && with_funcs.definitions.is_some())
-        {
-            return Err(syn::Error::new_spanned(
-                &meta.path,
-                format!(
-                    "both `{}` and `{}` have to be specified at the same time",
-                    DECLARATION.1, DEFINITIONS.1,
-                ),
-            ));
-        }
+        // if (with_funcs.declaration.is_some() && with_funcs.definitions.is_none())
+        //     || (with_funcs.declaration.is_none() && with_funcs.definitions.is_some())
+        // {
+        //     return Err(syn::Error::new_spanned(
+        //         &meta.path,
+        //         format!(
+        //             "both `{}` and `{}` have to be specified at the same time",
+        //             DECLARATION.1, DEFINITIONS.1,
+        //         ),
+        //     ));
+        // }
         Ok(Variants::WithFuncs(with_funcs))
     });
     m.insert(PARAMS, f_params);
     m.insert(WITH_FUNCS, f_with_funcs);
     m
 });
 
 /**
 Struct describes an entry like `order_param => override_type`,  e.g. `K => <K as TraitName>::Associated`
 */
diff --git a/borsh/Cargo.toml b/borsh/Cargo.toml
index e0a9465f..b208a7a6 100644
--- a/borsh/Cargo.toml
+++ b/borsh/Cargo.toml
@@ -38,19 +38,19 @@ bytes = { version = "1", optional = true }
 bson = { version = "2", optional = true }
 
 [dev-dependencies]
 insta = "1.29.0"
 
 [package.metadata.docs.rs]
 features = ["derive", "unstable__schema", "rc"]
 targets = ["x86_64-unknown-linux-gnu"]
 
 [features]
-default = ["std"]
+default = ["std", "unstable__schema"]
 derive = ["borsh-derive"]
 unstable__schema = ["derive", "borsh-derive/schema"]
 std = []
 # Opt into impls for Rc<T> and Arc<T>. Serializing and deserializing these types
 # does not preserve identity and may result in multiple copies of the same data.
 # Be sure that this is what you want before enabling this feature.
 rc = []
 de_strict_order = []
diff --git a/borsh/tests/schema/test_schema_with_third_party.rs b/borsh/tests/schema/test_schema_with_third_party.rs
index 30562f89..ea068e85 100644
--- a/borsh/tests/schema/test_schema_with_third_party.rs
+++ b/borsh/tests/schema/test_schema_with_third_party.rs
@@ -1,24 +1,25 @@
 use crate::common_macro::schema_imports::*;
 
 // use alloc::collections::BTreeMap;
 
 #[allow(unused)]
+#[derive(BorshSchema)]
 struct ThirdParty<K, V>(BTreeMap<K, V>);
 #[allow(unused)]
 mod third_party_impl {
     use crate::common_macro::schema_imports::*;
 
     pub(super) fn declaration<K: borsh::BorshSchema, V: borsh::BorshSchema>(
     ) -> borsh::schema::Declaration {
         let params = vec![<K>::declaration(), <V>::declaration()];
-        format!(r#"{}<{}>"#, "ThirdParty", params.join(", "))
+        format!(r#"{}<{}>"#, "ThirdPartyOneHasNoControlOfRenamed", params.join(", "))
     }
 
     pub(super) fn add_definitions_recursively<K: borsh::BorshSchema, V: borsh::BorshSchema>(
         definitions: &mut BTreeMap<borsh::schema::Declaration, borsh::schema::Definition>,
     ) {
         let fields = borsh::schema::Fields::UnnamedFields(vec![
             <BTreeMap<K, V> as borsh::BorshSchema>::declaration(),
         ]);
         let definition = borsh::schema::Definition::Struct { fields };
         let no_recursion_flag = definitions.get(&declaration::<K, V>()).is_none();
@@ -58,24 +59,24 @@ enum C<K, V> {
 pub fn struct_overriden() {
     assert_eq!(
         "A<u64, String>".to_string(),
         <A<u64, String>>::declaration()
     );
     let mut defs = Default::default();
     <A<u64, String>>::add_definitions_recursively(&mut defs);
     assert_eq!(
         schema_map! {
             "A<u64, String>" => Definition::Struct { fields: Fields::NamedFields(vec![
-                ("x".to_string(), "ThirdParty<u64, String>".to_string()),
+                ("x".to_string(), "ThirdPartyOneHasNoControlOfRenamed<u64, String>".to_string()),
                 ("y".to_string(), "u64".to_string())]
             )},
-            "ThirdParty<u64, String>" => Definition::Struct { fields: Fields::UnnamedFields(vec![
+            "ThirdPartyOneHasNoControlOfRenamed<u64, String>" => Definition::Struct { fields: Fields::UnnamedFields(vec![
                 "BTreeMap<u64, String>".to_string(),
             ]) },
             "BTreeMap<u64, String>"=> Definition::Sequence {
                 length_width: Definition::DEFAULT_LENGTH_WIDTH,
                 length_range: Definition::DEFAULT_LENGTH_RANGE,
                 elements: "(u64, String)".to_string(),
             },
             "(u64, String)" => Definition::Tuple { elements: vec!["u64".to_string(), "String".to_string()]},
             "u64" => Definition::Primitive(8),
             "String" => Definition::Sequence {
@@ -101,23 +102,23 @@ pub fn enum_overriden() {
         schema_map! {
             "C<u64, String>" => Definition::Enum {
                 tag_width: 1,
                 variants: vec![
                     (0, "C3".to_string(), "CC3".to_string()),
                     (1, "C4".to_string(), "CC4<u64, String>".to_string())
                 ]
             },
             "CC3" => Definition::Struct { fields: Fields::UnnamedFields(vec!["u64".to_string(), "u64".to_string()]) },
             "CC4<u64, String>" => Definition::Struct { fields: Fields::UnnamedFields(vec![
-                "u64".to_string(), "ThirdParty<u64, String>".to_string()
+                "u64".to_string(), "ThirdPartyOneHasNoControlOfRenamed<u64, String>".to_string()
             ]) },
-            "ThirdParty<u64, String>" => Definition::Struct { fields: Fields::UnnamedFields(vec![
+            "ThirdPartyOneHasNoControlOfRenamed<u64, String>" => Definition::Struct { fields: Fields::UnnamedFields(vec![
                 "BTreeMap<u64, String>".to_string(),
             ]) },
             "BTreeMap<u64, String>"=> Definition::Sequence {
                 length_width: Definition::DEFAULT_LENGTH_WIDTH,
                 length_range: Definition::DEFAULT_LENGTH_RANGE,
                 elements: "(u64, String)".to_string(),
             },
             "(u64, String)" => Definition::Tuple { elements: vec!["u64".to_string(), "String".to_string()]},
             "u64" => Definition::Primitive(8),
             "String" => Definition::Sequence {

this diff is vs https://github.com/near/borsh-rs/tree/f16cd07e3c982539352aa43f65abf3607461a7bc

cd borsh && cargo test  --features unstable__schema 'schema::test_schema_with_third_party'

it appears that doing a robust solution for arbitrary levels of nesting will require considerable breakage

@dj8yfo dj8yfo changed the title Schema generates conflicting names for enum variants Schema generates conflicting names for enum variants/across crates Jun 5, 2024
@dzmitry-lahoda
Copy link
Contributor

dzmitry-lahoda commented Jun 6, 2024

after some discussion and above comments, it seems that if i am mistaken,

  1. fixing conflicts in crates are I control is easy - rename or alias, or custom attributed. So i am not interested in fixing this as attribute (UPDATE: because there are workarounds, so error can be made more descriptive, like showing crate+mod path of conflicts? UPDATE2: and because it is not blocker for schema adoption, while inability to fix without forking 3rd parties is).
  2. fixing 3rd party crates could be solved by prefix, I am ok with that, but seems leaking details is not good. why I am good? because in the end everything is just some unique string which hints me how to debug thing, if crate+mod migrated/moved, i am ok with default behavior schema changes and TS consumer update as needed (even if change is doing nothing in the end). which leads to solution allow root object to hook and rename any 3rd party crates declaration during registration, so when assembly from root to children happens, my hook on root can receive child register and rename as needed. UPDATE: hook to receive declaration+definition+crate+mod of each registration.

For me good is hook, but ideal is crate+module prefix and hook working together (controlled by attributes). So without hook does not seem there is final solution?

@dj8yfo
Copy link
Collaborator

dj8yfo commented Jul 1, 2024

generates second declaration/definition under +1 suffixed name

@dzmitry-lahoda yeah, well, more-or-less robust automatic renames instead of conflict detection should be possible, if one introduces schema_id,

        impl schemars::JsonSchema for A {
            fn schema_name() -> std::string::String {
                "A".to_owned()
            }
            fn schema_id() -> std::borrow::Cow<'static, str> {
                std::borrow::Cow::Borrowed("example_crate::a::A")
            }

Someone willing to replace somebody's else's type schema intentionally will still be able to do so,
but a reasonable default of std::module_path! + type_name in derives will be a great way to do a unique id for types,
which wish to conform to being nice to others.
Schemars doesn't attempt to compare the definitions of types semantically, which might prove to be anywhere from very hard to impossible with recursive types being present.
Self-assigned schema_id would be the full source of truth for comparing types, breaking borsh's schema trait in a similar way by replacing a btreemap<declaration, definition> with a Generator, having an additional btreemap<schema_id, RenamedDeclaration> field.


a more robust solution would require compiler support for type_id of arbitrary types

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants