Changes to make v1.29.4 build on SmartOS.
siepkes committed Jul 10, 2024
1 parent 8eef22b commit d21796f
Showing 31 changed files with 595 additions and 61 deletions.
25 changes: 23 additions & 2 deletions .bazelrc
@@ -17,8 +17,9 @@ build --color=yes
build --jobs=HOST_CPUS-1
build --workspace_status_command="bash bazel/get_workspace_status"
build --incompatible_strict_action_env
-build --java_runtime_version=remotejdk_11
-build --tool_java_runtime_version=remotejdk_11
+# On illumos we need to use the OpenJDK version installed locally.
+#build --java_runtime_version=remotejdk_11
+#build --tool_java_runtime_version=remotejdk_11
build --platform_mappings=bazel/platform_mappings
# silence absl logspam.
build --copt=-DABSL_MIN_LOG_LEVEL=4
@@ -548,3 +549,23 @@ common:debug --config=debug-tests
try-import %workspace%/clang.bazelrc
try-import %workspace%/user.bazelrc
try-import %workspace%/local_tsan.bazelrc

# Illumos
build:illumos --cxxopt=-std=c++17
build:illumos --define hot_restart=disabled
build:illumos --define wasm=disabled
# Disables gperftools. I (@siepkes) couldn't get 'libtcmalloc_and_profiler.a' to be built.
# Not because it doesn't build on illumos but because I couldn't get Bazel -> foreign_cc
# -> autoconf to build the thing. And then Bazel complains about not finding it.
# When building the "vanilla" gperftools repo the 'libtcmalloc_and_profiler.a' artifact
# is built just fine.
build:illumos --define tcmalloc=disabled
# The DataDog extension pulls in Google Abseil, which enables `-Werror` (i.e. treat warnings as errors).
# When using GCC 12 on illumos this fails to build.
build:illumos --@envoy//source/extensions/tracers/datadog:enabled=false

build:illumos --java_runtime_version=local_jdk
build:illumos --java_language_version=11
build:illumos --tool_java_runtime_version=local_jdk
build:illumos --tool_java_language_version=11
build:illumos --extra_toolchains=//:repository_default_toolchain_definition
19 changes: 19 additions & 0 deletions BUILD
@@ -81,3 +81,22 @@ package_group(
"//mobile/...",
],
)

load(
"@bazel_tools//tools/jdk:default_java_toolchain.bzl",
"default_java_toolchain", "DEFAULT_TOOLCHAIN_CONFIGURATION", "BASE_JDK9_JVM_OPTS", "DEFAULT_JAVACOPTS"
)

# This config is activated on illumos. We use it to force Bazel to use the local (pkgsrc) JDK. Otherwise
# Bazel tries to use '@bazel_tools//tools/jdk:remote_jdk11', which does not work because the 'rules_java'
# project has no downloadable remote JDK configured for illumos.
default_java_toolchain(
name = "repository_default_toolchain",
configuration = DEFAULT_TOOLCHAIN_CONFIGURATION, # One of predefined configurations
# Other parameters are from java_toolchain rule:
java_runtime = "@local_jdk//:jdk", # JDK to use for compilation and toolchain's tools execution
jvm_opts = BASE_JDK9_JVM_OPTS, # Additional JDK options
javacopts = DEFAULT_JAVACOPTS, # Additional javac options
source_version = "11",
target_version = "11",
)
198 changes: 198 additions & 0 deletions README.md
@@ -1,3 +1,201 @@
# Envoy SmartOS / illumos / Solaris port

**(Scroll down for original Envoy readme.md)**

This repo contains a SmartOS / illumos port of Envoy. It will probably also work on Solaris, though likely with modifications, since we assume the use of pkgsrc.

To build this Envoy port you need Bazel. Bazel does not natively support SmartOS / illumos / Solaris, so a port of Bazel is required. See the [bazel-smartos](https://github.com/siepkes/bazel-smartos) repo for a SmartOS port. My intention is to properly upstream the port, but that takes some work... Feel free to reach out to me if you encounter any issues.

## Running Envoy on SmartOS

**WARNING: When running Envoy, set the environment variable `EVENT_NOEVPORT=yes`.**

Envoy uses libevent, which on illumos uses event ports (the native non-blocking I/O implementation). For some reason, when using event ports libevent starts making a massive number of syscalls (as many as the CPU limits allow). Therefore we disable the event ports implementation in libevent for now.

```
$ export EVENT_NOEVPORT=yes
$ ./envoy-static --disable-hot-restart -c ./config.yaml
```
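
To make the variable hard to forget, the launch can be wrapped in a small shell function. This is only a sketch: the function name `start_envoy` and the `ENVOY_BIN` / `ENVOY_CONFIG` variables are invented for this example, while the flags are the ones used above.

```shell
# Hypothetical wrapper; ENVOY_BIN and ENVOY_CONFIG are names invented for this sketch.
ENVOY_BIN="${ENVOY_BIN:-./envoy-static}"
ENVOY_CONFIG="${ENVOY_CONFIG:-./config.yaml}"

start_envoy() {
  # Disable libevent's event ports backend before Envoy starts.
  export EVENT_NOEVPORT=yes
  # Extra arguments are passed through to the Envoy binary unchanged.
  "$ENVOY_BIN" --disable-hot-restart -c "$ENVOY_CONFIG" "$@"
}
```

Calling `start_envoy` then starts Envoy with event ports disabled, and any additional arguments are handed to the binary as-is.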

## Building Envoy on Triton / SmartOS

Whether you are on Triton or on SmartOS, you need to create a native (`joyent` brand) container for the build. The steps below were performed on a container running the `base-64` image version `23.4.0` (`8adac45a-aca7-11ee-b53e-00151714048c`).

The following things are good to know:

* As stated in the [bazel-smartos](https://github.com/siepkes/bazel-smartos) repo the Bazel and Envoy binaries depend on the specific GCC version used due to hardcoded versions in some paths.
* For this guide a container with 32 GB RAM and 64 GB swap was used in order to build using `export NUM_CPUS=2`. If you run into memory problems, you can experiment with lowering `NUM_CPUS` or changing the amount of RAM and swap of the container. When the linker runs out of memory I've seen the following errors (some cryptic), which all meant "not enough memory" in my case:
  * `ld: fatal: mmap anon failed: Resource temporarily unavailable`
  * `collect2: error: ld returned 1 exit status` can mean "out of memory" if there is no other error in the output.
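
When a build fails, a saved build log can be scanned for these two messages. The function below is a rough heuristic sketch, not part of this repo; the name `log_suggests_oom` is invented here, and a match on the `collect2` line only hints at memory exhaustion when the log contains no other error.

```shell
# Hypothetical helper: succeeds when the build log on stdin contains one of the
# linker messages that, in the author's experience, meant "not enough memory".
log_suggests_oom() {
  grep -E -q 'ld: fatal: mmap anon failed|collect2: error: ld returned 1 exit status'
}
```

For example, `log_suggests_oom < build.log && echo "possibly out of memory"` prints the hint only when one of the messages is present (`build.log` is a placeholder file name).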

Install required build packages:

```
# pkgin -y install go119 ninja-build gcc12 git-base zip unzip openjdk11 libtool cmake automake autoconf gmake python311 py311-expat
```

Bazel will try to build the extensions that use Python (for example the Kafka filter) for every Python version that is installed. This means the required Python modules, such as the expat module, must be installed for every installed Python version (for example `py311-expat` for Python 3.11). When bumping the Triton image version, verify the Python version of the packages. Beware that having other versions of Python installed in your build VM might complicate the build process. The same goes for Go: having a more recent version or multiple versions of Go installed can lead to build issues.
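
Before starting a build it can help to check which installed interpreters lack the expat module. The following is a minimal sketch; the function name `missing_pyexpat` is invented here, and the interpreter names you pass (for example `python3.11`) are assumptions about what is installed in your container.

```shell
# Hypothetical helper: print every interpreter from the argument list
# that cannot import the pyexpat module.
missing_pyexpat() {
  for py in "$@"; do
    if ! "$py" -c 'import pyexpat' >/dev/null 2>&1; then
      echo "$py"
    fi
  done
}
```

Running `missing_pyexpat python3.11` prints nothing when the module is available; any interpreter it prints needs its matching expat package installed first.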

Prepare the build environment:
```
$ git clone https://github.com/siepkes/envoy-smartos.git
$ cd envoy-smartos
$ git checkout smartos-v1.29.2
```

Build Envoy:
```
$ export LANG=en_US.UTF-8
$ export NUM_CPUS=2 # Needed to prevent a CPU detection algorithm from seeing all CPUs in the hypervisor and spawning too many threads.
$ export JAVA_HOME="/opt/local/java/openjdk11"
$ bazel build -c opt --jobs=4 \
--config=illumos \
--package_path %workspace%:/root/envoy-smartos/ \
//source/exe:envoy-static
```
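
Since the build depends on `NUM_CPUS` and `JAVA_HOME` being set, a small check before invoking Bazel can catch a forgotten export. This is only a sketch; the function name `preflight` is invented for this example and is not part of the repo.

```shell
# Hypothetical pre-build check: report each missing prerequisite on stderr
# and return 1 if any is missing, 0 otherwise.
preflight() {
  rc=0
  if [ -z "${NUM_CPUS:-}" ]; then
    echo "NUM_CPUS is not set" >&2
    rc=1
  fi
  if [ ! -d "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME does not point at a directory" >&2
    rc=1
  fi
  return "$rc"
}
```

It can be chained in front of the build, as in `preflight && bazel build ...`, so that Bazel only starts when both variables look sane.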

To troubleshoot build issues Bazel can be made more talkative by adding the following flags:

```
--sandbox_debug --verbose_failures --toolchain_resolution_debug --subcommands
```

A successful build produces a statically linked Envoy binary at `./bazel-bin/source/exe/envoy-static`.

The binary includes debug symbols, which you can strip to bring down its size substantially. Beware that this will make the backtrace library unusable (i.e. stack traces become hard to read):

```
$ strip --strip-debug ./bazel-bin/source/exe/envoy-static
```

## Known issues / TODOs / Remarks

Below is a list of known issues with this port. They are mostly still open because they concern functionality I didn't need right away and that stood in the way of a successful build. I'm obviously open to any PR / help anyone can offer though!

### Make WebAssembly runtime work

We currently disable WASM in `.bazelrc` when building, because the V8 WASM runtime currently doesn't build on illumos. Envoy can be configured to use a different WASM runtime, but for now WASM is just disabled.

Additionally, building the WASM extensions with GCC does not work and leads to errors such as the one below. Apparently these issues are not present when using clang instead of GCC (see [Envoy issue 14788](https://github.com/envoyproxy/envoy/issues/14788)).

```
external/com_google_absl/absl/time/internal/cctz/include/cctz/civil_time_detail.h: In function 'constexpr int absl::time_internal::cctz::detail::impl::days_per_month(absl::time_internal::cctz::year_t, absl::time_internal::cctz::detail::month_t)':
external/com_google_absl/absl/time/internal/cctz/include/cctz/civil_time_detail.h:104:28: warning: array subscript has type 'char' [-Wchar-subscripts]
return k_days_per_month[m] + (m == 2 && is_leap_year(y));
^
external/com_google_cel_cpp/eval/eval/ternary_step.cc: In function 'absl::StatusOr<std::unique_ptr<google::api::expr::runtime::ExpressionStep> > google::api::expr::runtime::CreateTernaryStep(int64_t)':
external/com_google_cel_cpp/eval/eval/ternary_step.cc:75:10: error: could not convert 'step' from 'std::unique_ptr<google::api::expr::runtime::ExpressionStep>' to 'absl::StatusOr<std::unique_ptr<google::api::expr::runtime::ExpressionStep> >'
return step;
^~~~
```

### Make event ports work

Currently we disable event ports by using the environment variable `EVENT_NOEVPORT=yes`. When using event ports Envoy (or more likely libevent) starts making a massive number of syscalls. I'm guessing this is because some (event) loop in libevent is going haywire. We probably need to look at how libevent is configured in `libevent_scheduler.cc`.

### Final binary requires GCC package

Due to the way the linking is currently configured the final Envoy binary requires the GCC package to be installed in the container:

```
$ ldd bazel-bin/source/exe/envoy-static
librt.so.1 => /lib/64/librt.so.1
libdl.so.1 => /lib/64/libdl.so.1
libpthread.so.1 => /lib/64/libpthread.so.1
libm.so.2 => /lib/64/libm.so.2
libstdc++.so.6 => /opt/local/gcc7//lib/amd64/libstdc++.so.6
libxnet.so.1 => /lib/64/libxnet.so.1
libsocket.so.1 => /lib/64/libsocket.so.1
libnsl.so.1 => /lib/64/libnsl.so.1
libgcc_s.so.1 => /opt/local/gcc7//lib/amd64/libgcc_s.so.1
libc.so.1 => /lib/64/libc.so.1
libmp.so.2 => /lib/64/libmp.so.2
libmd.so.1 => /lib/64/libmd.so.1
```
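
To spot such dependencies quickly, the `ldd` output can be filtered for libraries that resolve outside the system library directories. This is a sketch under the assumption that `/lib` and `/usr/lib` are the only system locations of interest; the function name `non_system_libs` is invented here.

```shell
# Hypothetical filter: read `ldd` output on stdin and print the name and path
# of every library that resolves outside /lib and /usr/lib.
non_system_libs() {
  awk '$2 == "=>" && $3 !~ /^\/(usr\/)?lib\// { print $1 " " $3 }'
}
```

Piping the output above through it, as in `ldd bazel-bin/source/exe/envoy-static | non_system_libs`, would leave only the `libstdc++.so.6` and `libgcc_s.so.1` lines that point into the GCC package.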

### Get entire test suite to run

Headline covers it.

```
$ export JAVA_HOME="/opt/local/java/openjdk11"
$ bazel test --host_javabase=@local_jdk//:jdk //test/...
```

### Hot restart support disabled

Currently we pass `--define hot_restart=disabled` via `.bazelrc` when building to disable hot restart (i.e. restarting Envoy without client connections being closed). Hot restart is disabled because it didn't work without modifications and I didn't have a need for it.

### DataDog tracing extension disabled

During the build the DataDog tracing extension is disabled via `.bazelrc`. This is because the DataDog extension pulls in its own Google Abseil (not our patched version), which enables `-Werror` (i.e. treat warnings as errors). When using GCC 12 on illumos this fails to build:

```
/opt/local/gcc12/lib/gcc/x86_64-sun-solaris2.11/12.2.0/../../../../include/c++/12.2.0/bits/new_allocator.h:158:33: error: 'void operator delete(void*, size_t)' called on a pointer to an unallocated object '1' [-Werror=free-nonheap-object]
158 | _GLIBCXX_OPERATOR_DELETE(_GLIBCXX_SIZED_DEALLOC(__p, __n));
| ^
cc1plus: all warnings being treated as errors
```

This could be remedied by forking DataDog's [dd-trace-cpp](https://github.com/DataDog/dd-trace-cpp/) library and ensuring this flag isn't set, but since I (@siepkes) have no need for it I disabled building the extension.

### py37-expat package requirement

The `py37-expat` package must be installed, otherwise the build dies with the output below. I (@siepkes) think this might actually be a bug in upstream, since requiring a manual package install is not really what Bazel is about.

```
[INFO 08:56:26.568 src/main/cpp/rc_file.cc:131] Skipped optional import of /root/envoy-smartos/local_tsan.bazelrc, the specified rc file either does not exist or is not readable.
[INFO 08:56:26.568 src/main/cpp/rc_file.cc:56] Parsing the RcFile /dev/null
[INFO 08:56:26.569 src/main/cpp/blaze.cc:1623] Debug logging requested, sending all client log statements to stderr
[INFO 08:56:26.570 src/main/cpp/blaze.cc:1506] Acquired the client lock, waited 0 milliseconds
[INFO 08:56:26.577 src/main/cpp/blaze.cc:1694] Trying to connect to server (timeout: 30 secs)...
[INFO 08:56:26.590 src/main/cpp/blaze_util_illumos.cc:126] PID: 658256 (/root/envoy-smartos).
[INFO 08:56:26.590 src/main/cpp/blaze.cc:1261] Connected (server pid=658256).
[INFO 08:56:26.590 src/main/cpp/blaze.cc:1971] Releasing client lock, let the server manage concurrent requests.
INFO: Repository config_validation_pip3 instantiated at:
no stack (--record_rule_instantiation_callstack not enabled)
Repository rule pip_import defined at:
/root/.cache/bazel/_bazel_root/7558a64af10a6eb79f74e70211660103/external/rules_python/python/pip.bzl:51:29: in <toplevel>
ERROR: An error occurred during the fetch of repository 'config_validation_pip3':
pip_import failed: (Traceback (most recent call last):
File "/opt/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/opt/local/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/root/.cache/bazel/_bazel_root/7558a64af10a6eb79f74e70211660103/external/rules_python/tools/piptool.par/__main__.py", line 26, in <module>
File "/root/.cache/bazel/_bazel_root/7558a64af10a6eb79f74e70211660103/external/rules_python/tools/piptool.par/piptool_deps_pypi__setuptools_44_0_0/pkg_resources/__init__.py", line 35, in <module>
File "/opt/local/lib/python3.7/plistlib.py", line 65, in <module>
from xml.parsers.expat import ParserCreate
File "/opt/local/lib/python3.7/xml/parsers/expat.py", line 4, in <module>
from pyexpat import *
ModuleNotFoundError: No module named 'pyexpat'
)
ERROR: no such package '@config_validation_pip3//': pip_import failed: (Traceback (most recent call last):
File "/opt/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/opt/local/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/root/.cache/bazel/_bazel_root/7558a64af10a6eb79f74e70211660103/external/rules_python/tools/piptool.par/__main__.py", line 26, in <module>
File "/root/.cache/bazel/_bazel_root/7558a64af10a6eb79f74e70211660103/external/rules_python/tools/piptool.par/piptool_deps_pypi__setuptools_44_0_0/pkg_resources/__init__.py", line 35, in <module>
File "/opt/local/lib/python3.7/plistlib.py", line 65, in <module>
from xml.parsers.expat import ParserCreate
File "/opt/local/lib/python3.7/xml/parsers/expat.py", line 4, in <module>
from pyexpat import *
ModuleNotFoundError: No module named 'pyexpat'
)
INFO: Elapsed time: 10.114s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
Fetching @headersplit_pip3; fetching 9s
Fetching @configs_pip3; fetching 9s
Fetching @kafka_pip3; fetching 9s
Fetching @thrift_pip3; fetching 9s
Fetching @protodoc_pip3; fetching 9s
```

# Original Envoy Readme

![Envoy Logo](https://github.com/envoyproxy/artwork/blob/main/PNG/Envoy_Logo_Final_PANTONE.png)

[Cloud-native high-performance edge/middle/service proxy](https://www.envoyproxy.io/)
8 changes: 4 additions & 4 deletions api/bazel/repository_locations.bzl
@@ -155,10 +155,10 @@ REPOSITORY_LOCATIONS_SPEC = dict(
project_name = "envoy_toolshed",
project_desc = "Tooling, libraries, runners and checkers for Envoy proxy's CI",
project_url = "https://github.com/envoyproxy/toolshed",
-version = "0.1.3",
-sha256 = "ee6d0b08ae3d9659f5fc34d752578af195147b153f8ca68eb4f8530aceb764d9",
-strip_prefix = "toolshed-bazel-v{version}/bazel",
-urls = ["https://github.com/envoyproxy/toolshed/archive/bazel-v{version}.tar.gz"],
+version = "ad6ef2576db35e8d4e9deec5fb229b0ebb120f0b",
+sha256 = "6dba7c5a5fafdbdf2caa06984872e6b32826356b1356392b33b4ab9b2c82f9d0",
+strip_prefix = "toolshed-{version}/bazel",
+urls = ["https://github.com/siepkes/toolshed/archive/{version}.tar.gz"],
use_category = ["build"],
release_date = "2024-04-16",
cpe = "N/A",
6 changes: 6 additions & 0 deletions bazel/BUILD
@@ -582,6 +582,11 @@ config_setting(
values = {"cpu": "x64_windows"},
)

config_setting(
name = "illumos",
values = {"cpu": "x86_64"},
)

# Configuration settings to make doing selects for Apple vs non-Apple platforms
# easier. More details: https://docs.bazel.build/versions/master/configurable-attributes.html#config_settingaliasing
config_setting(
@@ -757,6 +762,7 @@ selects.config_setting_group(
":ios_x86_64",
":linux_x86_64",
":windows_x86_64",
":illumos",
],
)

7 changes: 5 additions & 2 deletions bazel/dependency_imports.bzl
@@ -18,15 +18,18 @@ load("@com_google_cel_cpp//bazel:deps.bzl", "parser_deps")
load("@com_github_chrusty_protoc_gen_jsonschema//:deps.bzl", protoc_gen_jsonschema_go_dependencies = "go_dependencies")

# go version for rules_go
-GO_VERSION = "1.20"
+GO_VERSION = "1.19"

JQ_VERSION = "1.7"
YQ_VERSION = "4.24.4"

def envoy_dependency_imports(go_version = GO_VERSION, jq_version = JQ_VERSION, yq_version = YQ_VERSION):
rules_foreign_cc_dependencies()
go_rules_dependencies()
-go_register_toolchains(go_version)
+# Using 'host' makes Bazel use the Go installation on our host. This
+# is needed because 'io_bazel_rules_go' tries to download a Go
+# installation, but there is none to download for illumos / Solaris.
+go_register_toolchains(go_version = "host")
envoy_download_go_sdks(go_version)
gazelle_dependencies(go_sdk = "go_sdk")
apple_rules_dependencies()