Skip to content
This repository has been archived by the owner on Mar 3, 2023. It is now read-only.

Native Python Build Rules - Phase 1 #3768

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

surahman
Copy link
Member

@surahman surahman commented Jan 28, 2022

Sequential update of build rules towards native Python.

Each change should pass completely on the CI before moving to the next. Build container is Ubuntu 20.04. Resolves #3751.

Protocol Buffers:

We need to generate the Python protobuf and use Bazel's native rules for Java and CC. Java and CC examples are here. A tutorial on py_proto_library.

Protobuf dependency diagram

Building Heron's Protobuf in isolation in this repository.

I am also researching whether we can return a Starlark providers when generating the protobufs in genproto.bzl to get around this error:

...does not have mandatory providers: 'py' or 'PyInfo'...


Work Log:

  • WORKSPACE: added pip install entries. Removed py_repositories.
  • requirements.txt: located in tools/python/.
  • newgenproto.bzl: removed.
  • heron/proto/BUILD: updated is required to proceed → https://github.com/surahman/Heron-Protobuf.
  • heron/common/src/python/BUILD: colorlog==2.6.1
  • examples/src/python/BUILD: no external library dependencies.
  • heron/common/tests/python/pex_loader/testdata/src/BUILD: no external library dependencies.
  • heron/tools/cli/src/python/BUILD: PyYAML==5.4.1, requests==2.27.1, netifaces==0.10.6
  • heron/common/src/python/BUILD: colorlog==2.6.1

Added pip_install entry.
There are no references to this file in any of the build scripts. A full build with the complete test battery, without this file, yielded no issues.
Removed redundant <py_repositories>. Build passing locally.
Updated. No Python requirements.
@surahman surahman self-assigned this Jan 29, 2022
@nicknezis
Copy link
Contributor

I think this is still in draft and the checklist not complete, so not sure if this is approvable yet. Just wanted to make sure it wasn't merged in by accident. But I guess the Draft status should help with that.

I'd like to better understand the refactoring of the protobuf files to this location: https://github.com/surahman/Heron-Protobuf

@surahman Will this remove the need for the protobuf files to live in this repo? Or was this just a temporary home to figure out the build tooling?

@surahman
Copy link
Member Author

surahman commented Feb 13, 2022

I think this is still in draft and the checklist not complete, so not sure if this is approvable yet. Just wanted to make sure it wasn't merged in by accident. But I guess the Draft status should help with that.

Yes, this is still a WIP. There is a long way to go to completion and the protobuf is currently blocking progress.

I'd like to better understand the refactoring of the protobuf files to this location: https://github.com/surahman/Heron-Protobuf

@surahman Will this remove the need for the protobuf files to live in this repo? Or was this just a temporary home to figure out the build tooling?

It is to isolate the development of the protobuf build script and directory structures. It has sped up and greatly declutter the protobuf build script development. The directory structures and build scripts would need to be transplanted into the current Heron build scripts.

The issues to consider are whether the import/includes and directory structure within the protobuf libraries these scripts assemble will match what Heron requires. This could become very complicated if it does not.

Edit:

For clarity, everything outside of the files in this directory is extraneous and can be classified as build artifacts. The only relevant files are in this tree and will need to be integrated into Heron:

└── proto
    ├── api
    │   ├── BUILD.bazel
    │   └── topology.proto
    ├── ckptmgr
    │   ├── BUILD.bazel
    │   └── ckptmgr.proto
    ├── scheduler
    │   ├── BUILD.bazel
    │   └── scheduler.proto
    ├── stmgr
    │   ├── BUILD.bazel
    │   └── stmgr.proto
    ├── system
    │   ├── BUILD.bazel
    │   ├── common.proto
    │   ├── execution_state.proto
    │   ├── metrics.proto
    │   ├── packing_plan.proto
    │   ├── physical_plan.proto
    │   ├── stats.proto
    │   └── tuple.proto
    ├── testing
    │   ├── BUILD.bazel
    │   └── networktests.proto
    └── manager
        ├── BUILD.bazel
        └── tmanager.proto

The protobuf is central to all communication between processes in Heron and if it is broken things will fail, so great care and scrutiny are required.

@thinker0
Copy link
Member

@surahman
Copy link
Member Author

I don't think the changes in #3797 will affect the Protobuf IDLs that I have put together. This PR is blocked until more people can review and confirm the Protobuf IDLs are correct.

@nicknezis
Copy link
Contributor

Can we close this Merge Request?

@surahman
Copy link
Member Author

surahman commented May 2, 2022

We could close this PR but it is very close to getting over the bottleneck for the IDL and header/import generation for the Protobufs. This would mean we can use requirements.txt for Python dependencies. Once the IDL hurdle is cleared it will be very easy to update the remaining build script dependencies on PEX.

@surahman
Copy link
Member Author

I have verified that the generated C++, Java, and Python source files match the expected extensions.

C++:

proto_hdr = src[:-6] + ".pb.h"
proto_src = src[:-6] + ".pb.cc"

https://github.com/surahman/Heron-Protobuf/blob/23a47cfab36cbed6fe1ca93ba0a1e5b09420409d/proto/api/BUILD.bazel#L24-L25

Java:

srcjar = ctx.actions.declare_file("%s.srcjar" % ctx.attr.name)
java_srcs = srcjar.path + ".srcs"

https://github.com/surahman/Heron-Protobuf/blob/23a47cfab36cbed6fe1ca93ba0a1e5b09420409d/proto/api/BUILD.bazel#L41

Python:

proto_src = src[:-6] + "_pb2.py"
proto_srcgen_rule = name + "_py_src"

https://github.com/surahman/Heron-Protobuf/blob/23a47cfab36cbed6fe1ca93ba0a1e5b09420409d/proto/api/BUILD.bazel#L49

The next step is to generate the protocol buffer C++, Java, and Python files from the latest Heron build and the IDL on my repo. There needs to be verification that the file names and directory structure match the expected.

Subsequently, the new IDL and build scripts can be transplanted into this branch. There will need to be some build script configurations to kick off the builds for the C++, Java, and Python IDL source files. There might also be a potentially rejigging of the file structure required to match that which is expected by Heron.

Is there anyone proficient with Bazel who can assist in getting this PR across the finish line?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants