Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added encapsulating double quotes to comply with DOT language #1176

Closed
wants to merge 3 commits into from

Conversation

tabedzki
Copy link
Contributor

@tabedzki tabedzki commented Sep 5, 2024

Adds double quotes " needed to close #1175 and closes #1169 and closes #1100 and closes #1072 closes #1065

Here is a truncated DOT string generated from this commit:

strict digraph  {
"ephys_element.QualityMetrics" [primary_key="{'insertion_number', 'paramset_idx', 'precluster_param_steps_id', 'recording_id', 'curation_id'}", distinguished=False, node_type=<class 'datajoint.user_tables.Imported'>, shape=ellipse, fontsize=12, fontcolor="#00007FA0", fontname=arial, fixedsize=false, width=0.48, height=0.48, label="ephys_element.QualityMetrics", color="#00007F40", style=filled];
}

Example working ERD after this fix
example_erd

@dimitri-yatsenko or @ethho
Would either of you be able to get this merged fairly quickly?

Versions:
networkx 3.4rc0.dev0
pydot 3.0.1

The latest dev version of networkx is required since there is a patch that removes a : check that arises if you use the latest version on pip. The expected pip release for this patch is September 27, 2024.

The error that occurs if you do not use the github version of the package networkx:

ValueError: Node names and attributes should not contain ":" unless they are quoted with "". For example the string 'attribute:data1' should be written as '"attribute:data1"'. Please refer https://github.com/pydot/pydot/issues/258

@tabedzki
Copy link
Contributor Author

tabedzki commented Sep 6, 2024

The new version of networkx requires Python 3.10 as the minimum version so if we want to use the newer/working versions of networkx and pydot, we should reconsider bumping the minimum supported Python version to 3.10 as discussed by ethho here and include a package floor for compatibility reasons in requirements.txt.

@ethho
Copy link
Contributor

ethho commented Sep 6, 2024

Thanks for the great work, @tabedzki!

Could you fix the tests/test_erd.py pytests? You can reproduce by opening the repo in a Dev Container or CodeSpaces; see the "Full error log" below.

close #1175 and closes #1169 and closes #1100 and closes #1072 closes #1065

Would love if we could reproduce as many of these issues as possible in pytests. These issues highlight lack of regression coverage for dj.Diagram. I won't say additional tests are blockers for merging this PR, but nice to have.

Full error log from pytest

vscode ➜ /workspaces/datajoint-python (tabedzki/master) $ pytest -sv tests/test_erd.py 
========================================================= test session starts ==========================================================
platform linux -- Python 3.11.4, pytest-8.2.2, pluggy-1.5.0 -- /usr/local/py-utils/venvs/pytest/bin/python
cachedir: .pytest_cache
rootdir: /workspaces/datajoint-python
plugins: anyio-4.4.0, Faker-28.1.0, cov-5.0.0
collected 7 items                                                                                                                      

tests/test_erd.py::test_decorator [2024-09-06 19:32:00,521][INFO]: Connecting [email protected]:3306
[2024-09-06 19:32:00,529][INFO]: Connected [email protected]:3306
[2024-09-06 19:32:00,678][INFO]: Connecting [email protected]:3306
[2024-09-06 19:32:00,683][INFO]: Connected [email protected]:3306
PASSED
tests/test_erd.py::test_dependencies PASSED
tests/test_erd.py::test_erd FAILED
tests/test_erd.py::test_erd_algebra PASSED
tests/test_erd.py::test_repr_svg FAILED
tests/test_erd.py::test_make_image FAILED
tests/test_erd.py::test_part_table_parsing FAILED

=============================================================== FAILURES ===============================================================
_______________________________________________________________ test_erd _______________________________________________________________

schema_simp = Schema `djtest_relational`


    def test_erd(schema_simp):
        assert dj.diagram.diagram_active, "Failed to import networkx and pydot"
        erd = dj.ERD(schema_simp, context=LOCALS_SIMPLE)
        graph = erd._make_graph()
>       assert set(cls.__name__ for cls in (A, B, D, E, L)).issubset(graph.nodes())
E       assert False
E        +  where False = <built-in method issubset of set object at 0x7f4b5aa27840>(NodeView(('"A"', '"ArgmaxTest"', '"DataA"', '"DataB"', '"IJ"', '"JI"', '"L"', '"OutfitLaunch"', '"OutfitLaunch.OutfitPiece"', '"TTestUpdate"', '"Website"', '"B"', '"B.C"', '"D"', '"E"', '"E.F"', '"E.G"', '"E.H"', '"E.M"', '"G"', '"F"', '"Profile"', '"Profile.Website"', '"ReservedWord"')))
E        +    where <built-in method issubset of set object at 0x7f4b5aa27840> = {'A', 'B', 'D', 'E', 'L'}.issubset
E        +      where {'A', 'B', 'D', 'E', 'L'} = set(<generator object test_erd.<locals>.<genexpr> at 0x7f4b59555490>)
E        +    and   NodeView(('"A"', '"ArgmaxTest"', '"DataA"', '"DataB"', '"IJ"', '"JI"', '"L"', '"OutfitLaunch"', '"OutfitLaunch.OutfitPiece"', '"TTestUpdate"', '"Website"', '"B"', '"B.C"', '"D"', '"E"', '"E.F"', '"E.G"', '"E.H"', '"E.M"', '"G"', '"F"', '"Profile"', '"Profile.Website"', '"ReservedWord"')) = NodeView(('"A"', '"ArgmaxTest"', '"DataA"', '"DataB"', '"IJ"', '"JI"', '"L"', '"OutfitLaunch"', '"OutfitLaunch.OutfitPiece"', '"TTestUpdate"', '"Website"', '"B"', '"B.C"', '"D"', '"E"', '"E.F"', '"E.G"', '"E.H"', '"E.M"', '"G"', '"F"', '"Profile"', '"Profile.Website"', '"ReservedWord"'))()
E        +      where NodeView(('"A"', '"ArgmaxTest"', '"DataA"', '"DataB"', '"IJ"', '"JI"', '"L"', '"OutfitLaunch"', '"OutfitLaunch.OutfitPiece"', '"TTestUpdate"', '"Website"', '"B"', '"B.C"', '"D"', '"E"', '"E.F"', '"E.G"', '"E.H"', '"E.M"', '"G"', '"F"', '"Profile"', '"Profile.Website"', '"ReservedWord"')) = <networkx.classes.digraph.DiGraph object at 0x7f4b592aa990>.nodes

tests/test_erd.py:31: AssertionError
____________________________________________________________ test_repr_svg _____________________________________________________________

schema_adv = Schema `djtest_advanced`


    def test_repr_svg(schema_adv):
        erd = dj.ERD(schema_adv, context=dict())
>       svg = erd._repr_svg_()

tests/test_erd.py:53: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
datajoint/diagram.py:440: in _repr_svg_
    return self.make_svg()._repr_svg_()
datajoint/diagram.py:428: in make_svg
    return SVG(self.make_dot().create_svg())
datajoint/diagram.py:371: in make_dot
    dot = nx.drawing.nx_pydot.to_pydot(graph)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

N = <networkx.classes.digraph.DiGraph object at 0x7f4b580f7b10>

    def to_pydot(N):
        """Returns a pydot graph from a NetworkX graph N.
    
        Parameters
        ----------
        N : NetworkX graph
          A graph created with NetworkX
    
        Examples
        --------
        >>> K5 = nx.complete_graph(5)
        >>> P = nx.nx_pydot.to_pydot(K5)
    
        Notes
        -----
    
        """
        import pydot
    
        # set Graphviz graph type
        if N.is_directed():
            graph_type = "digraph"
        else:
            graph_type = "graph"
        strict = nx.number_of_selfloops(N) == 0 and not N.is_multigraph()
    
        name = N.name
        graph_defaults = N.graph.get("graph", {})
        if name == "":
            P = pydot.Dot("", graph_type=graph_type, strict=strict, **graph_defaults)
        else:
            P = pydot.Dot(
                f'"{name}"', graph_type=graph_type, strict=strict, **graph_defaults
            )
        try:
            P.set_node_defaults(**N.graph["node"])
        except KeyError:
            pass
        try:
            P.set_edge_defaults(**N.graph["edge"])
        except KeyError:
            pass
    
        for n, nodedata in N.nodes(data=True):
            str_nodedata = {str(k): str(v) for k, v in nodedata.items()}
            # Explicitly catch nodes with ":" in node names or nodedata.
            n = str(n)
            raise_error = _check_colon_quotes(n) or (
                any(
                    (_check_colon_quotes(k) or _check_colon_quotes(v))
                    for k, v in str_nodedata.items()
                )
            )
            if raise_error:
                raise ValueError(
                    f'Node names and attributes should not contain ":" unless they are quoted with "".\
                    For example the string \'attribute:data1\' should be written as \'"attribute:data1"\'.\
                    Please refer https://github.com/pydot/pydot/issues/258'
                )
            p = pydot.Node(n, **str_nodedata)
            P.add_node(p)
    
        if N.is_multigraph():
            for u, v, key, edgedata in N.edges(data=True, keys=True):
                str_edgedata = {str(k): str(v) for k, v in edgedata.items() if k != "key"}
                u, v = str(u), str(v)
                raise_error = (
                    _check_colon_quotes(u)
                    or _check_colon_quotes(v)
                    or (
                        any(
                            (_check_colon_quotes(k) or _check_colon_quotes(val))
                            for k, val in str_edgedata.items()
                        )
                    )
                )
                if raise_error:
                    raise ValueError(
                        f'Node names and attributes should not contain ":" unless they are quoted with "".\
                        For example the string \'attribute:data1\' should be written as \'"attribute:data1"\'.\
                        Please refer https://github.com/pydot/pydot/issues/258'
                    )
                edge = pydot.Edge(u, v, key=str(key), **str_edgedata)
                P.add_edge(edge)
    
        else:
            for u, v, edgedata in N.edges(data=True):
                str_edgedata = {str(k): str(v) for k, v in edgedata.items()}
                u, v = str(u), str(v)
                raise_error = (
                    _check_colon_quotes(u)
                    or _check_colon_quotes(v)
                    or (
                        any(
                            (_check_colon_quotes(k) or _check_colon_quotes(val))
                            for k, val in str_edgedata.items()
                        )
                    )
                )
                if raise_error:
>                   raise ValueError(
                        f'Node names and attributes should not contain ":" unless they are quoted with "".\
                        For example the string \'attribute:data1\' should be written as \'"attribute:data1"\'.\
                        Please refer https://github.com/pydot/pydot/issues/258'
                    )
E                   ValueError: Node names and attributes should not contain ":" unless they are quoted with "".                    For example the string 'attribute:data1' should be written as '"attribute:data1"'.                    Please refer https://github.com/pydot/pydot/issues/258

/usr/local/lib/python3.11/site-packages/networkx/drawing/nx_pydot.py:282: ValueError
___________________________________________________________ test_make_image ____________________________________________________________

schema_simp = Schema `djtest_relational`


    def test_make_image(schema_simp):
        erd = dj.ERD(schema_simp, context=dict())
>       img = erd.make_image()

tests/test_erd.py:59: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
datajoint/diagram.py:435: in make_image
    return plt.imread(self.make_png())
datajoint/diagram.py:431: in make_png
    return io.BytesIO(self.make_dot().create_png())
datajoint/diagram.py:371: in make_dot
    dot = nx.drawing.nx_pydot.to_pydot(graph)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

N = <networkx.classes.digraph.DiGraph object at 0x7f4b57fb0a50>

    def to_pydot(N):
        """Returns a pydot graph from a NetworkX graph N.
    
        Parameters
        ----------
        N : NetworkX graph
          A graph created with NetworkX
    
        Examples
        --------
        >>> K5 = nx.complete_graph(5)
        >>> P = nx.nx_pydot.to_pydot(K5)
    
        Notes
        -----
    
        """
        import pydot
    
        # set Graphviz graph type
        if N.is_directed():
            graph_type = "digraph"
        else:
            graph_type = "graph"
        strict = nx.number_of_selfloops(N) == 0 and not N.is_multigraph()
    
        name = N.name
        graph_defaults = N.graph.get("graph", {})
        if name == "":
            P = pydot.Dot("", graph_type=graph_type, strict=strict, **graph_defaults)
        else:
            P = pydot.Dot(
                f'"{name}"', graph_type=graph_type, strict=strict, **graph_defaults
            )
        try:
            P.set_node_defaults(**N.graph["node"])
        except KeyError:
            pass
        try:
            P.set_edge_defaults(**N.graph["edge"])
        except KeyError:
            pass
    
        for n, nodedata in N.nodes(data=True):
            str_nodedata = {str(k): str(v) for k, v in nodedata.items()}
            # Explicitly catch nodes with ":" in node names or nodedata.
            n = str(n)
            raise_error = _check_colon_quotes(n) or (
                any(
                    (_check_colon_quotes(k) or _check_colon_quotes(v))
                    for k, v in str_nodedata.items()
                )
            )
            if raise_error:
                raise ValueError(
                    f'Node names and attributes should not contain ":" unless they are quoted with "".\
                    For example the string \'attribute:data1\' should be written as \'"attribute:data1"\'.\
                    Please refer https://github.com/pydot/pydot/issues/258'
                )
            p = pydot.Node(n, **str_nodedata)
            P.add_node(p)
    
        if N.is_multigraph():
            for u, v, key, edgedata in N.edges(data=True, keys=True):
                str_edgedata = {str(k): str(v) for k, v in edgedata.items() if k != "key"}
                u, v = str(u), str(v)
                raise_error = (
                    _check_colon_quotes(u)
                    or _check_colon_quotes(v)
                    or (
                        any(
                            (_check_colon_quotes(k) or _check_colon_quotes(val))
                            for k, val in str_edgedata.items()
                        )
                    )
                )
                if raise_error:
                    raise ValueError(
                        f'Node names and attributes should not contain ":" unless they are quoted with "".\
                        For example the string \'attribute:data1\' should be written as \'"attribute:data1"\'.\
                        Please refer https://github.com/pydot/pydot/issues/258'
                    )
                edge = pydot.Edge(u, v, key=str(key), **str_edgedata)
                P.add_edge(edge)
    
        else:
            for u, v, edgedata in N.edges(data=True):
                str_edgedata = {str(k): str(v) for k, v in edgedata.items()}
                u, v = str(u), str(v)
                raise_error = (
                    _check_colon_quotes(u)
                    or _check_colon_quotes(v)
                    or (
                        any(
                            (_check_colon_quotes(k) or _check_colon_quotes(val))
                            for k, val in str_edgedata.items()
                        )
                    )
                )
                if raise_error:
>                   raise ValueError(
                        f'Node names and attributes should not contain ":" unless they are quoted with "".\
                        For example the string \'attribute:data1\' should be written as \'"attribute:data1"\'.\
                        Please refer https://github.com/pydot/pydot/issues/258'
                    )
E                   ValueError: Node names and attributes should not contain ":" unless they are quoted with "".                    For example the string 'attribute:data1' should be written as '"attribute:data1"'.                    Please refer https://github.com/pydot/pydot/issues/258

/usr/local/lib/python3.11/site-packages/networkx/drawing/nx_pydot.py:282: ValueError
_______________________________________________________ test_part_table_parsing ________________________________________________________

schema_simp = Schema `djtest_relational`


    def test_part_table_parsing(schema_simp):
        # https://github.com/datajoint/datajoint-python/issues/882
        erd = dj.Di(schema_simp, context=LOCALS_SIMPLE)
        graph = erd._make_graph()
>       assert "OutfitLaunch" in graph.nodes()
E       assert 'OutfitLaunch' in NodeView(('"A"', '"ArgmaxTest"', '"DataA"', '"DataB"', '"IJ"', '"JI"', '"L"', '"OutfitLaunch"', '"OutfitLaunch.OutfitPiece"', '"TTestUpdate"', '"Website"', '"B"', '"B.C"', '"D"', '"E"', '"E.F"', '"E.G"', '"E.H"', '"E.M"', '"G"', '"F"', '"Profile"', '"Profile.Website"', '"ReservedWord"'))
E        +  where NodeView(('"A"', '"ArgmaxTest"', '"DataA"', '"DataB"', '"IJ"', '"JI"', '"L"', '"OutfitLaunch"', '"OutfitLaunch.OutfitPiece"', '"TTestUpdate"', '"Website"', '"B"', '"B.C"', '"D"', '"E"', '"E.F"', '"E.G"', '"E.H"', '"E.M"', '"G"', '"F"', '"Profile"', '"Profile.Website"', '"ReservedWord"')) = NodeView(('"A"', '"ArgmaxTest"', '"DataA"', '"DataB"', '"IJ"', '"JI"', '"L"', '"OutfitLaunch"', '"OutfitLaunch.OutfitPiece"', '"TTestUpdate"', '"Website"', '"B"', '"B.C"', '"D"', '"E"', '"E.F"', '"E.G"', '"E.H"', '"E.M"', '"G"', '"F"', '"Profile"', '"Profile.Website"', '"ReservedWord"'))()
E        +    where NodeView(('"A"', '"ArgmaxTest"', '"DataA"', '"DataB"', '"IJ"', '"JI"', '"L"', '"OutfitLaunch"', '"OutfitLaunch.OutfitPiece"', '"TTestUpdate"', '"Website"', '"B"', '"B.C"', '"D"', '"E"', '"E.F"', '"E.G"', '"E.H"', '"E.M"', '"G"', '"F"', '"Profile"', '"Profile.Website"', '"ReservedWord"')) = <networkx.classes.digraph.DiGraph object at 0x7f4b57ec2910>.nodes

tests/test_erd.py:67: AssertionError
======================================================= short test summary info ========================================================
FAILED tests/test_erd.py::test_erd - assert False
FAILED tests/test_erd.py::test_repr_svg - ValueError: Node names and attributes should not contain ":" unless they are quoted with "".                    For example the str...
FAILED tests/test_erd.py::test_make_image - ValueError: Node names and attributes should not contain ":" unless they are quoted with "".                    For example the str...
FAILED tests/test_erd.py::test_part_table_parsing - assert 'OutfitLaunch' in NodeView(('"A"', '"ArgmaxTest"', '"DataA"', '"DataB"', '"IJ"', '"JI"', '"L"', '"OutfitLaunch"', '"OutfitLa...
===================================================== 4 failed, 3 passed in 31.91s =====================================================

@tabedzki
Copy link
Contributor Author

tabedzki commented Sep 6, 2024

@ethho do the dev containers/tests use the development version of networkx? Those error messages for test_repr_svg and test_make_image indicate to me that they are not using the patched version of networkx that is currently only available on github and not through pypa. In order to use the latest, patched version, we need Python 3.10 so the dev test for (3.8, 5.7) will fail this test because of that.

  • For test_erd.py the fake objects A,B,C, etc will somehow need their names to be encapsulated in " and that will make it pass the test.
    • Can you do this one?
  • For test_part_table_parsing, assert '"OutfitLaunch"' in graph.nodes() should do the trick. (commit pushed)

ethho added a commit that referenced this pull request Sep 6, 2024
Before running `to_pydot`, we encapsulate node names
and attr_map in double quotes to avoid syntax errors
when the node names contain special characters.
Implements the workarounds described in
#1176 (comment)
and
pydot/pydot#258 (comment)
@ethho
Copy link
Contributor

ethho commented Sep 6, 2024

Those error messages for test_repr_svg and test_make_image indicate to me that they are not using the patched version of networkx

The root cause is an instance of this error. While this would be fixed by depending on the pydot pre-release, we can easily fix in dj.Diagram by also encapsulating edge attributes, as done here.

@tabedzki tabedzki closed this Sep 7, 2024
@ethho
Copy link
Contributor

ethho commented Sep 8, 2024

We chose to implement #1177 as an alternative. Continuing discussion there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment