
[core][experimental] Raise an exception if a leaf node is found during compilation #47757

Merged
9 commits merged into ray-project:master on Oct 29, 2024


kevin85421
Member

kevin85421 commented Sep 20, 2024

Why are these changes needed?

Leaf nodes are nodes that are not output nodes and have no downstream nodes. If a leaf node raises an exception, it will not be propagated to the driver. Therefore, this PR raises an exception if a leaf node is found during compilation.
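The compile-time check can be sketched in plain Python. This is not Ray's implementation: the DAG is modeled here as a hypothetical list of node names plus (upstream, downstream) edge pairs, and `find_leaf_nodes` / `check_no_leaf_nodes` are illustrative names only.

```python
from typing import Hashable, Iterable, List, Tuple


def find_leaf_nodes(
    nodes: Iterable[Hashable],
    edges: Iterable[Tuple[Hashable, Hashable]],
    output_nodes: Iterable[Hashable],
) -> List[Hashable]:
    """Return nodes that have no downstream nodes and are not output nodes."""
    # Any node that appears as the upstream end of an edge has a consumer.
    has_downstream = {upstream for upstream, _ in edges}
    outputs = set(output_nodes)
    return [n for n in nodes if n not in has_downstream and n not in outputs]


def check_no_leaf_nodes(nodes, edges, output_nodes) -> None:
    """Fail at compile time instead of silently losing leaf-node errors."""
    leaves = find_leaf_nodes(nodes, edges, output_nodes)
    if leaves:
        raise ValueError(
            f"Found {len(leaves)} leaf node(s) that are neither output nodes "
            f"nor consumed by any downstream node: {leaves}"
        )


# "c" runs but nothing consumes its result and it is not an output.
nodes = ["input", "a", "b", "c"]
edges = [("input", "a"), ("input", "b"), ("input", "c")]
print(find_leaf_nodes(nodes, edges, output_nodes=["a", "b"]))  # ['c']
```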

Another solution: implicitly add leaf nodes to MultiOutputNode

Currently, execute can return multiple CompiledDAGRefs. The UX we want to provide is to implicitly add leaf nodes to the MultiOutputNode without returning their references. For example, consider a MultiOutputNode that contains 3 DAG nodes (2 normal DAG nodes + 1 leaf node):

x, y = compiled_dag.execute(input_vals)  # We don't return the ref for the leaf node.

However, the ref for the leaf node would be garbage-collected inside execute, and CompiledDAGRef's __del__ calls get() if get() was never called. This would make execute a synchronous operation instead of an asynchronous one, which is not acceptable.
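To see why dropping the leaf ref turns execute into a blocking call, here is a toy model. It assumes, as described above, a ref whose destructor calls get() when get() was never called. `HypotheticalRef` and `execute` are illustrative stand-ins, not Ray's `CompiledDAGRef` or `CompiledDAG.execute`, and the tasks are simulated with `threading.Timer`.

```python
import threading
import time


class HypotheticalRef:
    """Toy stand-in for CompiledDAGRef; not Ray's actual class."""

    def __init__(self, done: threading.Event):
        self._done = done
        self._get_called = False

    def get(self):
        self._get_called = True
        self._done.wait()  # Block until the simulated task finishes.

    def __del__(self):
        # Assumed behavior from the PR description: if get() was never
        # called, the destructor calls it, which blocks the caller.
        if not self._get_called:
            self.get()


def execute(task_seconds: float):
    """Launch three simulated tasks but return refs for only two of them."""
    refs = []
    for _ in range(3):
        done = threading.Event()
        threading.Timer(task_seconds, done.set).start()
        refs.append(HypotheticalRef(done))
    x, y, _leaf = refs
    # In CPython, the leaf ref's last reference disappears when this frame
    # is torn down, so its __del__ (and the blocking get()) runs before the
    # caller regains control.
    return x, y


start = time.monotonic()
x, y = execute(task_seconds=0.3)
print(f"execute() returned after {time.monotonic() - start:.2f}s")
x.get()
y.get()
```

The measured time is roughly the leaf task's duration rather than near zero: the caller pays for a result it never asked for, which is exactly the synchronous behavior the PR wants to avoid.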

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

kevin85421 changed the title from "[core][experimental] Propagate leaf node errors to users" to "[WIP][core][experimental] Propagate leaf node errors to users" Sep 20, 2024
kevin85421 changed the title from "[WIP][core][experimental] Propagate leaf node errors to users" to "[core][experimental] Propagate leaf node errors to users" Sep 25, 2024
kevin85421 changed the title from "[core][experimental] Propagate leaf node errors to users" to "[core][experimental] Raise an exception if a leaf node is found during compilation" Sep 25, 2024
kevin85421 marked this pull request as ready for review September 25, 2024 20:22
rkooo567
Contributor

Another solution: implicitly add leaf node to MultiOutputNode

Can you create an issue for this? Also, can you share the error message with me?

Contributor

rkooo567 left a comment

Nit: a suggestion for improving the error message further.

"Compiled DAG doesn't support leaf nodes that don't have "
"downstream nodes and are not output nodes. There are "
f"{len(leaf_nodes)} leaf nodes in the DAG. Please add them to "
f"the MultiOutputNode. These nodes are: {leaf_nodes}"
Contributor

Can you improve the error message to show how to solve this error step by step? For example, if a leaf node is w.f.bind(), it could say something like "add the output of w.f.bind() to MultiOutputNode".

I recommend that you trigger the error yourself and try to fix it using only this error message. I think it is not trivial if you assume you are not a developer.
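The step-by-step guidance requested here could look roughly like the helper below. The function name `leaf_node_error_message` and the exact wording are hypothetical, not the message that was merged; the point is to name every offending node and spell out the concrete change a non-developer should make.

```python
from typing import List


def leaf_node_error_message(leaf_nodes: List[str]) -> str:
    """Build an actionable, step-by-step error message (illustrative wording)."""
    lines = [
        "Compiled DAG doesn't support leaf nodes, i.e., nodes that don't have "
        "downstream nodes and are not output nodes. "
        f"Found {len(leaf_nodes)} leaf node(s):",
    ]
    # List every offending node so the user knows exactly what to change.
    lines += [f"  - {node}" for node in leaf_nodes]
    lines += [
        "To fix this:",
        "  1. Keep a variable for the output of each node listed above, "
        "e.g. leaf = w.f.bind(inp).",
        "  2. Add that variable to the MultiOutputNode, "
        "e.g. MultiOutputNode([x, y, leaf]).",
        "  3. Unpack the extra ref returned by execute(), "
        "e.g. x, y, leaf_ref = compiled_dag.execute(input_vals).",
    ]
    return "\n".join(lines)


print(leaf_node_error_message(["w.f.bind()"]))
```

Listing nodes one per line keeps the message readable when several leaf nodes are found, and the numbered steps map directly onto the code the user has to edit.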

Member Author

updated cc9c823

kevin85421
Member Author

Can you create an issue for this? Also, can you share the error message with me?

@rkooo567 This was the original implementation of this PR, but we decided to raise an exception for leaf nodes after our discussions. Do we still need to open an issue to track it? What error message are you referring to?

Contributor

ruisearch42 left a comment

Thinking a bit more about the problem: does it make a real difference if we disallow leaf nodes and force users to add them to MultiOutputNode manually?

Using the same example as mentioned, consider a MultiOutputNode that contains 3 DAG nodes (2 normal DAG nodes + 1 leaf node).

Previously, we had:
x, y = compiled_dag.execute(input_vals)
and the third return value is silently garbage-collected, which calls get() underneath.

If the user changes this to
x, y, z = compiled_dag.execute(input_vals)

and never uses z, z is also discarded and the same GC and get() happen. Is there a difference here?

kevin85421 and others added 6 commits October 11, 2024 04:21
Signed-off-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
Co-authored-by: Rui Qiao <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
kevin85421
Member Author

gentle ping - @ruisearch42 @rkooo567

@rkooo567: I also opened another issue: #47977.

ruisearch42
Contributor

Generally looks good, but after #47689, will implicitly adding leaf nodes to MultiOutputNode work? And do we still need this change?

kevin85421
Member Author

but after #47689, will implicitly adding leaf nodes to MultiOutputNode work? And do we still need this change?

I briefly skimmed through #47689. If I understand correctly, it still calls get() in __del__ if the graph has not been torn down, which would still make execute a synchronous operation.
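The condition described in this comment, that the destructor falls back to get() only while the graph is still alive, can be modeled as below. `ToyDag`, `ToyRef`, and the `torn_down` flag are hypothetical and this is not the code from #47689; it only sketches the behavior as summarized here. Since a graph is normally not torn down during execute, an unconsumed leaf ref would still trigger the blocking get().

```python
class ToyDag:
    """Hypothetical stand-in for a compiled graph's lifecycle state."""

    def __init__(self):
        self.torn_down = False
        self.destructor_gets = 0  # How many times __del__ fell back to get().

    def teardown(self):
        self.torn_down = True


class ToyRef:
    """Hypothetical ref whose destructor only calls get() while the DAG is alive."""

    def __init__(self, dag: ToyDag):
        self._dag = dag
        self._get_called = False

    def get(self):
        self._get_called = True
        # A real ref would block here until the result is available.

    def __del__(self):
        if not self._get_called and not self._dag.torn_down:
            self._dag.destructor_gets += 1
            self.get()


dag = ToyDag()
ref = ToyRef(dag)
del ref                     # Graph still alive: __del__ falls back to get().
print(dag.destructor_gets)  # 1

dag.teardown()
ref = ToyRef(dag)
del ref                     # Graph torn down: no get() in __del__.
print(dag.destructor_gets)  # 1
```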

Contributor

rkooo567 left a comment

awesome error message!

kevin85421 added the "go" label (add ONLY when ready to merge, run all tests) Oct 28, 2024
rkooo567 merged commit 0935c36 into ray-project:master Oct 29, 2024
6 checks passed
Jay-ju pushed a commit to Jay-ju/ray that referenced this pull request Nov 5, 2024
JP-sDEV pushed a commit to JP-sDEV/ray that referenced this pull request Nov 14, 2024
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this pull request Nov 15, 2024
Labels
go add ONLY when ready to merge, run all tests
3 participants