forked from ray-project/ray
[Data] support batch_format for Sort and Aggregate (ray-project#48287)
## Why are these changes needed?

When calling `xxx.map_groups(..., batch_format="...")`, we may invoke the sort function and create empty blocks, which still use pyarrow by default. When another sort call is invoked on top of that, we hit `AttributeError: 'DataFrame' object has no attribute 'num_rows'`, because the block type is taken from the first block only, while the blocks may in fact have different types. See ray-project#46748 for more details.

## Related issue number

Closes ray-project#46748

## Checks

- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [x] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Xingyu Long <[email protected]>
Co-authored-by: Scott Lee <[email protected]>
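The failure mode described in the commit message can be illustrated without Ray. The sketch below is only an analogy: `ArrowLikeBlock`, `PandasLikeBlock`, and both counting helpers are hypothetical stand-ins, not Ray, pyarrow, or pandas types. It shows why choosing the access pattern from the first block alone breaks once a later block has a different type.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArrowLikeBlock:
    """Hypothetical stand-in for a pyarrow-style block exposing `num_rows`."""
    rows: List[dict] = field(default_factory=list)

    @property
    def num_rows(self) -> int:
        return len(self.rows)


@dataclass
class PandasLikeBlock:
    """Hypothetical stand-in for a pandas-style block: has len(), no `num_rows`."""
    rows: List[dict] = field(default_factory=list)

    def __len__(self) -> int:
        return len(self.rows)


def count_rows_assuming_first_type(blocks) -> int:
    """Choose the row-count logic from the first block only (the fragile part)."""
    if isinstance(blocks[0], ArrowLikeBlock):
        return sum(b.num_rows for b in blocks)  # breaks on a pandas-style block
    return sum(len(b) for b in blocks)


def count_rows_per_block(blocks) -> int:
    """Dispatch on each block's own type, so mixed inputs are safe."""
    return sum(b.num_rows if isinstance(b, ArrowLikeBlock) else len(b) for b in blocks)


# An empty arrow-style block (as an empty sort output might be) followed by
# a pandas-style block, as produced when batch_format="pandas" is in effect.
mixed = [ArrowLikeBlock(), PandasLikeBlock([{"x": 1}, {"x": 2}])]

try:
    count_rows_assuming_first_type(mixed)
    first_type_error = None
except AttributeError as exc:
    first_type_error = exc

safe_count = count_rows_per_block(mixed)
```

Per-block dispatch is just one way to make this toy robust. The commit takes a different route: it propagates `batch_format` to the sort and aggregate operators so that the blocks they create match the upstream format in the first place.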
1 parent 5788c4b, commit 3f195b4
Showing 9 changed files with 147 additions and 9 deletions.
42 changes: 42 additions & 0 deletions
python/ray/data/_internal/logical/rules/inherit_batch_format.py
@@ -0,0 +1,42 @@
from collections import deque
from typing import Iterable

from ray.data._internal.logical.interfaces import LogicalOperator, LogicalPlan, Rule
from ray.data._internal.logical.operators.all_to_all_operator import AbstractAllToAll
from ray.data._internal.logical.operators.map_operator import MapBatches


class InheritBatchFormatRule(Rule):
    """For AbstractAllToAll based operator, apply this rule
    to inherit batch_format from upstream operator by traversing
    the entire DAG."""

    def apply(self, plan: LogicalPlan) -> LogicalPlan:
        optimized_dag: LogicalOperator = self._apply(plan.dag)
        new_plan = LogicalPlan(dag=optimized_dag, context=plan.context)
        return new_plan

    def _apply(self, op: LogicalOperator):
        # Post-order traversal.
        nodes: Iterable[LogicalOperator] = deque()
        for node in op.post_order_iter():
            nodes.appendleft(node)

        while len(nodes) > 0:
            current_op = nodes.pop()

            if isinstance(current_op, AbstractAllToAll):
                # traversal up the DAG until we find MapBatches with batch_format
                # or we reach to source op and do nothing
                upstream_op = current_op.input_dependencies[0]
                while upstream_op.input_dependencies:
                    if (
                        isinstance(upstream_op, MapBatches)
                        and upstream_op._batch_format
                    ):
                        current_op._batch_format = upstream_op._batch_format
                        break
                    upstream_op = upstream_op.input_dependencies[0]

        # just return the default op
        return op
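The traversal in `InheritBatchFormatRule._apply` can be tried out on a toy DAG without importing Ray. The following is a minimal sketch under stated assumptions: `Op`, `AllToAll`, the local `MapBatches`, and `inherit_batch_format` are hypothetical stand-ins that mimic only the attributes the rule touches (`input_dependencies`, `_batch_format`, `post_order_iter`). They are not Ray's classes.

```python
from collections import deque
from typing import Deque, Iterator, List, Optional


class Op:
    """Hypothetical minimal operator: a name, inputs, and an optional batch format."""

    def __init__(self, name: str, inputs: Optional[List["Op"]] = None,
                 batch_format: Optional[str] = None):
        self.name = name
        self.input_dependencies: List["Op"] = inputs or []
        self._batch_format = batch_format

    def post_order_iter(self) -> Iterator["Op"]:
        # Yield every upstream operator before the operator itself.
        for dep in self.input_dependencies:
            yield from dep.post_order_iter()
        yield self


class MapBatches(Op):
    """Toy stand-in for a map operator that may carry a batch_format."""


class AllToAll(Op):
    """Toy stand-in for sort/aggregate-style operators."""


def inherit_batch_format(dag: Op) -> Op:
    """Mirror the rule's loop: each AllToAll op copies the nearest upstream format."""
    nodes: Deque[Op] = deque()
    for node in dag.post_order_iter():
        nodes.appendleft(node)

    while nodes:
        current = nodes.pop()  # pops in post-order, so sources come first
        if isinstance(current, AllToAll):
            upstream = current.input_dependencies[0]
            # The walk ends at the source, i.e. an operator without inputs.
            while upstream.input_dependencies:
                if isinstance(upstream, MapBatches) and upstream._batch_format:
                    current._batch_format = upstream._batch_format
                    break
                upstream = upstream.input_dependencies[0]
    return dag


# Read -> MapBatches(batch_format="pandas") -> Sort -> Sort
read = Op("Read")
mapped = MapBatches("MapBatches", [read], batch_format="pandas")
first_sort = AllToAll("Sort", [mapped])
second_sort = AllToAll("Sort", [first_sort])

inherit_batch_format(second_sort)
```

In this toy, both sort operators end up with `_batch_format == "pandas"`: the second sort walks past the first sort until it reaches the map operator. An all-to-all operator fed directly by the source keeps its default, because the inner loop stops once it reaches an operator with no inputs.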