Difficulties getting CGW to run #85

Open
ncalad opened this issue Sep 24, 2024 · 16 comments

Comments

ncalad commented Sep 24, 2024

We have zookeeper and kafka running but when we try to run the CGW application, the container terminates and the logs show ...

./run_cgw.sh openlan-cgw-img:3d46cc3 ucentral-cgw-container

docker logs 010d7e028b78
[2024-09-23T22:08:40Z INFO ucentral_cgw] Starting CGW application, rev tag:
[2024-09-23T22:08:40Z INFO ucentral_cgw] (1048576, 1048576)
[2024-09-23T22:08:40Z INFO ucentral_cgw] (1048576, 1048576)
%3|1727129320.768|FAIL|rdkafka#producer-1| [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Connect to ipv6#[::1]:9092 failed: Connection refused (after 0ms in state CONNECT)
[2024-09-23T22:08:40Z ERROR ucentral_cgw::cgw_remote_discovery] Can't create CGW Remote Discovery client: Redis client create failed (Connection(ConnectionFailed))
[2024-09-23T22:08:40Z ERROR ucentral_cgw::cgw_connection_server] Can't create CGW Connection server: Remote Discovery create failed: RemoteDiscovery("Redis client create failed")
thread 'main' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.38.0/src/runtime/blocking/shutdown.rs:51:21:
Cannot drop a runtime in a context where blocking is not allowed. This happens when a runtime is dropped from within an asynchronous context.
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

Contributor

Cahb commented Sep 24, 2024

Hey @ncalad,
It fails because it can't connect to Kafka and Redis: the environment variables are still set to their default values.
Some other service requirements are also not met.

Let's do the following:
I've prepared a set of changes that should eliminate all of these issues and allow CGW to run all-in-one. Basically, it starts all the necessary containers with proper configs and so on. We can treat this as a 'default' CGW environment, and it could be a starting point for anyone who wants to try CGW out of the box.

Here's the branch: https://github.com/Telecominfraproject/openlan-cgw/tree/feat/all_in_one_make

Simply pull the branch and run make (or make all); that should do all the necessary work under the hood.
If you run into any difficulties, please write back: this has only been dev-tested by me, and we need someone else to try it.
If it works, we'll merge it into main directly.

P.S. This change also means you can drop the ZooKeeper and Kafka instances you created previously.

Contributor

Cahb commented Sep 24, 2024

A few more things:
Prints like

err Invalid symbol 45, offset 0. 

and

%3|1727172666.463|FAIL|rdkafka#producer-1| [thrd:docker-broker-1:9092/bootstrap]: docker-broker-1:9092/bootstrap: Connect to ipv4#172.23.0.3:9092 failed: Connection refused (after 0ms in state CONNECT)
%3|1727172666.464|FAIL|CGW0#consumer-2| [thrd:docker-broker-1:9092/bootstrap]: docker-broker-1:9092/bootstrap: Connect to ipv4#172.23.0.3:9092 failed: Connection refused (after 0ms in state CONNECT)

are safe; we will resolve them in the future.
The first one is harmless, and the print itself should be removed completely.
The second one indicates that the lazy connection attempt failed, but it will retry and should then work; running make run once more also fixes the issue.
It fails because we run the containers (Kafka/Redis/PGSQL) from docker-compose, while CGW is spawned outside the compose file. Hence, we need a proper way to synchronize their startup/ready states.
We will address this in the future.
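
Until that synchronization exists, one possible stopgap is to poll the broker's health status before starting CGW. The sketch below is only an illustration and is not part of the branch: `wait_for` and `broker_healthy` are hypothetical helper names, and it assumes the broker container is named docker-broker-1 and defines a Docker healthcheck.

```shell
#!/bin/sh
# Hypothetical helper (not part of the CGW repo): retry a readiness check
# until it succeeds or the attempts run out.
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
    wf_attempts=$1
    wf_delay=$2
    shift 2
    wf_i=1
    while [ "$wf_i" -le "$wf_attempts" ]; do
        # Run the check quietly; success ends the wait immediately.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Pause between attempts, but not after the final one.
        if [ "$wf_i" -lt "$wf_attempts" ]; then
            sleep "$wf_delay"
        fi
        wf_i=$((wf_i + 1))
    done
    return 1
}

# Assumption: the broker container is called docker-broker-1 and has a
# healthcheck, so docker reports .State.Health.Status for it.
broker_healthy() {
    [ "$(docker inspect -f '{{.State.Health.Status}}' docker-broker-1)" = "healthy" ]
}

# Example (commented out so that sourcing this file has no side effects):
# wait_for 30 2 broker_healthy && make run
```

The retry loop is kept separate from the check so the same helper could wrap a Redis or PostgreSQL probe as well.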

Author

ncalad commented Sep 24, 2024

We were able to get the cgw container to run. Can we browse to it or perform some other test to see if everything is correct?

root@docker-desktop:/# netstat -anp | grep LISTEN
tcp 0 0 0.0.0.0:6379 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:5432 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:36541 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN 1/ucentral-cgw
tcp 0 0 0.0.0.0:50051 0.0.0.0:* LISTEN 1/ucentral-cgw
tcp 0 0 0.0.0.0:9092 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:9094 0.0.0.0:* LISTEN -
tcp6 0 0 :::111 :::* LISTEN -
tcp6 0 0 :::60981 :::* LISTEN -
unix 2 [ ACC ] STREAM LISTENING 32710 - /run/containerd/s/d75d140b9730bc4fa15482715a5a93195fe591fcacddecbcdf15c0a6e7a138a3
unix 2 [ ACC ] STREAM LISTENING 33546 - /run/containerd/s/15cd467bbd6f6e4e2a254505a853d13738f7a65efef49deaae85e55a9973ce03
unix 2 [ ACC ] STREAM LISTENING 16982 - /var/run/docker.sock
unix 2 [ ACC ] STREAM LISTENING 18662 - /run/rpcbind.sock
unix 2 [ ACC ] STREAM LISTENING 17542 - /run/grpcfuse.mount.sock
unix 2 [ ACC ] STREAM LISTENING 41010 - /run/containerd/s/786ea0a701c2f25b1bbf02b51ee5c4f2251b6be9f2d3501d5bf0fd256f86fab5
unix 2 [ ACC ] STREAM LISTENING 17681 - /run/containerd/containerd.sock.ttrpc
unix 2 [ ACC ] STREAM LISTENING 17682 - /run/containerd/containerd.sock
unix 2 [ ACC ] STREAM LISTENING 17729 - /var/run/docker/metrics.sock
unix 2 [ ACC ] STREAM LISTENING 18143 - /var/run/docker/libnetwork/7fd7ea8ad1ff.sock
unix 2 [ ACC ] STREAM LISTENING 43337 - /run/containerd/s/8495dff9ed02e247954d88072e3ea812b3c0943748b4eafeea4ea058a5b2e0ae
unix 2 [ ACC ] STREAM LISTENING 17452 - /run/guest-services/wsl2-expose-ports.sock
unix 2 [ ACC ] STREAM LISTENING 16915 - /run/guest-services/debug-shell.sock
unix 2 [ ACC ] STREAM LISTENING 16921 - /run/guest-services/diagnosticd.sock
unix 2 [ ACC ] STREAM LISTENING 16976 - /run/guest-services/docker.proxy.sock
unix 2 [ ACC ] STREAM LISTENING 16981 - /run/guest-services/docker-api-proxy-control.sock
unix 2 [ ACC ] STREAM LISTENING 16983 - /run/guest-services/docker.sock
unix 2 [ ACC ] STREAM LISTENING 16984 - /run/guest-services/lifecycle-server.sock
unix 2 [ ACC ] STREAM LISTENING 17544 - /run/guest-services/filesystem-event.sock
unix 2 [ ACC ] STREAM LISTENING 17546 - /run/guest-services/filesystem-test.sock
root@docker-desktop:/# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 1 18:25 pts/0 00:00:05 ucentral-cgw
root 30 0 0 18:26 pts/1 00:00:00 /bin/sh
root 36 0 0 18:26 pts/2 00:00:00 bash
root 258 36 0 18:32 pts/2 00:00:00 ps -ef

Author

ncalad commented Sep 24, 2024

We noticed a problem with the cert file ...

%3|1727202338.058|FAIL|CGW0#consumer-2| [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Connect to ipv6#[::1]:9092 failed: Connection refused (after 59ms in state CONNECT)
[2024-09-24T18:25:38Z INFO ucentral_cgw::cgw_db_accessor] Connection to SQL DB has been established!
[2024-09-24T18:25:38Z INFO ucentral_cgw::cgw_remote_discovery] Connection to REDIS DB has been established!
[2024-09-24T18:25:38Z INFO ucentral_cgw::cgw_remote_server] Starting GRPC server id 0 - listening at 0.0.0.0:50051
[2024-09-24T18:25:38Z ERROR ucentral_cgw::cgw_tls] Failed to open TLS certificate file: /etc/cgw/certs/cas.pem. Error: No such file or directory (os error 2)
[2024-09-24T18:25:38Z ERROR ucentral_cgw] Failed to create TLS acceptor. Error: Failed to open TLS certificate file: /etc/cgw/certs/cas.pem. Error: No such file or directory (os error 2)
%3|1727202340.073|FAIL|CGW0#consumer-2| [thrd:GroupCoordinator]: GroupCoordinator: docker-broker-1:9092: Failed to resolve 'docker-broker-1:9092': Name or service not known (after 69ms in state CONNECT)
%3|1727202340.140|FAIL|CGW0#consumer-2| [thrd:GroupCoordinator]: GroupCoordinator: docker-broker-1:9092: Failed to resolve 'docker-broker-1:9092': Name or service not known (after 66ms in state CONNECT, 1 identical error(s) suppressed)
%3|1727202374.111|FAIL|CGW0#consumer-2| [thrd:GroupCoordinator]: GroupCoordinator: docker-broker-1:9092: Failed to resolve 'docker-broker-1:9092': Name or service not known (after 69ms in state CONNECT, 9 identical error(s) suppressed)
%3|1727202411.933|FAIL|CGW0#consumer-2| [thrd:GroupCoordinator]: GroupCoordinator: docker-broker-1:9092: Failed to resolve 'docker-broker-1:9092': Name or service not known (after 68ms in state CONNECT, 4 identical error(s) suppressed)
%3|1727202451.493|FAIL|CGW0#consumer-2| [thrd:GroupCoordinator]: GroupCoordinator: docker-broker-1:9092: Failed to resolve 'docker-broker-1:9092': Name or service not known (after 64ms in state CONNECT, 4 identical error(s) suppressed)
%3|1727202490.124|FAIL|CGW0#consumer-2| [thrd:GroupCoordinator]: GroupCoordinator: docker-broker-1:9092: Failed to resolve 'docker-broker-1:9092': Name or service not known (after 65ms in state CONNECT, 4 identical error(s) suppressed)

Contributor

Cahb commented Sep 24, 2024

@ncalad could you please post the output you get from running make?
Especially where this part starts:

Starting CGW...
CGW LOG LEVEL                     : debug
CGW ID                            : 0
CGW GROUPS CAPACITY/THRESHOLD     : 1000:50
...

Author

ncalad commented Sep 24, 2024

Here you go ...
What's Next?
View summary of image vulnerabilities and recommendations → docker scout quickview
Docker build done
Starting CGW...
CGW LOG LEVEL : debug
CGW ID : 0
CGW GROUPS CAPACITY/THRESHOLD : 1000:50
CGW GROUP INFRAS CAPACITY : 2000
CGW WSS THREAD NUM : 4
CGW WSS IP/PORT : 0.0.0.0:15002
CGW WSS CAS : cas.pem
CGW WSS CERT : cert.pem
CGW WSS KEY : key.pem
CGW GRPC PUBLIC HOST/PORT : openlan_cgw:50051
CGW GRPC LISTENING IP/PORT : 0.0.0.0:50051
CGW KAFKA HOST/PORT : docker-broker-1:9092
CGW KAFKA TOPIC : CnC:CnC_Res
CGW DB NAME : cgw
CGW DB HOST/PORT : docker-postgresql-1:5432
CGW DB TLS : no
CGW REDIS HOST/PORT : docker-redis-1:6379
CGW REDIS TLS : no
CGW METRICS PORT : 8080
CGW CERTS PATH : /home/oguser/OpenLAN/openlan-cgw/utils/cert_generator/certs/server
CGW ALLOW CERT MISMATCH : no
CGW NB INFRA CERTS PATH : /home/oguser/OpenLAN/openlan-cgw/utils/cert_generator/certs/server
CGW NB INFRA TLS : no
CGW UCENTRAL AP DATAMODEL URI : https://raw.githubusercontent.com/Telecominfraproject/wlan-ucentral-schema/main/ucentral.schema.json
CGW UCENTRAL SWITCH DATAMODEL URI : https://raw.githubusercontent.com/Telecominfraproject/ols-ucentral-schema/main/ucentral.schema.json
2247ff21a29b44788148d0bc451fb4fd46a7b14d7cf48610ba9b6327ea837da3
docker: Error response from daemon: Ports are not available: exposing port TCP 0.0.0.0:15002 -> 0.0.0.0:0: listen tcp 0.0.0.0:15002: bind: address already in use.
make: *** [Makefile:66: run] Error 125

Contributor

Cahb commented Sep 24, 2024

Are there any certificates in /home/oguser/OpenLAN/openlan-cgw/utils/cert_generator/certs/server?
Could you please post the output of

ls /home/oguser/OpenLAN/openlan-cgw/utils/cert_generator/certs/server

Also, a few more things: could you please check that your all_in_one_make branch is up to date? I've updated it a few times using force-push; it shouldn't affect anything much, but still.

Also, make should stop the existing container as well; I'm not sure why make failed with that last error.

Could you please also post the output of the following command:

docker ps

Author

ncalad commented Sep 24, 2024

A few minutes ago, we ran the generate_certs script and that directory now contains ...

ls /home/oguser/OpenLAN/openlan-cgw/utils/cert_generator/certs/server
cas.pem cert.pem gw.crt gw.key key.pem

Contributor

Cahb commented Sep 24, 2024

@ncalad please also post the output of the docker ps command; it seems that either some of the containers are not running or they reside in different networks.

Author

ncalad commented Sep 24, 2024

docker ps
CONTAINER ID   IMAGE                     COMMAND                  CREATED        STATUS                    PORTS                                            NAMES
3081725ba7f8   bitnami/kafka:latest      "/opt/bitnami/script…"   4 hours ago    Up 58 minutes (healthy)   0.0.0.0:9092->9092/tcp, 0.0.0.0:9094->9094/tcp   docker-broker-1
6983b702416c   bitnami/redis:latest      "/opt/bitnami/script…"   4 hours ago    Up 59 minutes             0.0.0.0:6379->6379/tcp                           docker-redis-1
0416b8d69f41   postgres:latest           "docker-entrypoint.s…"   4 hours ago    Up 59 minutes             0.0.0.0:5432->5432/tcp                           docker-postgresql-1
010d7e028b78   openlan-cgw-img:3d46cc3   "ucentral-cgw"           21 hours ago   Up 57 minutes                                                              ucentral-cgw-container

Contributor

Cahb commented Sep 24, 2024

Okay,
@ncalad Would you also please post output of the following:
docker inspect docker-broker-1 -f "{{json .NetworkSettings.Networks }}"
docker inspect openlan_cgw -f "{{json .NetworkSettings.Networks }}"
docker inspect -f '{{ .Mounts }}' openlan_cgw
docker inspect -f "{{ .Config.Env }}" openlan_cgw

Author

ncalad commented Sep 24, 2024

docker inspect docker-broker-1 -f "{{json .NetworkSettings.Networks }}"
docker inspect openlan_cgw -f "{{json .NetworkSettings.Networks }}"
docker inspect -f '{{ .Mounts }}' openlan_cgw
docker inspect -f "{{ .Config.Env }}" openlan_cgw
{"docker_cgw_network":{"IPAMConfig":null,"Links":null,"Aliases":["docker-broker-1","broker","3081725ba7f8"],"MacAddress":"02:42:ac:15:00:04","DriverOpts":null,"NetworkID":"fca38b8dbb60c4f4a9044909410689d4e8c0d1120edf1ad5e9876a78d4adeb1b","EndpointID":"50c7d1171fc7375836a3d56633ddafef49926c536e100bc117671454daf3fdef","Gateway":"172.21.0.1","IPAddress":"172.21.0.4","IPPrefixLen":16,"IPv6Gateway":"","GlobalIPv6Address":"","GlobalIPv6PrefixLen":0,"DNSNames":null}}
{"docker_cgw_network":{"IPAMConfig":null,"Links":null,"Aliases":null,"MacAddress":"","DriverOpts":null,"NetworkID":"","EndpointID":"","Gateway":"","IPAddress":"","IPPrefixLen":0,"IPv6Gateway":"","GlobalIPv6Address":"","GlobalIPv6PrefixLen":0,"DNSNames":null}}
[{bind /home/oguser/OpenLAN/openlan-cgw/utils/cert_generator/certs/server /etc/cgw/certs true rprivate} {bind /home/oguser/OpenLAN/openlan-cgw/utils/cert_generator/certs/server /etc/cgw/nb_infra/certs true rprivate}]
[CGW_REDIS_TLS=no CGW_METRICS_PORT=8080 CGW_ID=0 CGW_WSS_CERT=cert.pem CGW_GRPC_PUBLIC_HOST=openlan_cgw CGW_DB_NAME=cgw CGW_ALLOW_CERT_MISMATCH=no CGW_FEATURE_TOPOMAP_ENABLE CGW_UCENTRAL_SWITCH_DATAMODEL_URI=https://raw.githubusercontent.com/Telecominfraproject/ols-ucentral-schema/main/ucentral.schema.json CGW_KAFKA_PRODUCE_TOPIC=CnC_Res CGW_WSS_IP=0.0.0.0 CGW_GRPC_PUBLIC_PORT=50051 CGW_KAFKA_HOST=docker-broker-1 CGW_KAFKA_CONSUME_TOPIC=CnC CGW_DB_USERNAME=cgw CGW_GROUPS_CAPACITY=1000 CGW_GROUPS_THRESHOLD=50 CGW_GRPC_LISTENING_IP=0.0.0.0 CGW_NB_INFRA_TLS=no CGW_UCENTRAL_AP_DATAMODEL_URI=https://raw.githubusercontent.com/Telecominfraproject/wlan-ucentral-schema/main/ucentral.schema.json CGW_GROUP_INFRAS_CAPACITY=2000 DEFAULT_WSS_THREAD_NUM=4 CGW_WSS_KEY=key.pem CGW_KAFKA_PORT=9092 CGW_DB_HOST=docker-postgresql-1 CGW_REDIS_PORT=6379 CGW_WSS_PORT=15002 CGW_WSS_CAS=cas.pem CGW_GRPC_LISTENING_PORT=50051 CGW_DB_TLS=no CGW_REDIS_HOST=docker-redis-1 CGW_LOG_LEVEL=debug CGW_DB_PORT=5432 CGW_DB_PASSWORD=123 PATH=/usr/local/cargo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin RUSTUP_HOME=/usr/local/rustup CARGO_HOME=/usr/local/cargo RUST_VERSION=1.77.0 CGW_CONTAINER_BUILD_REV= CGW_CONTAINER_BUILD_BRANCH= CGW_CONTAINER_BUILD_TIME=0]

Contributor

Cahb commented Sep 24, 2024

@ncalad
The Docker engine failed to attach the CGW container to the network properly; this is the reason for the broker connection errors and similar failures.

First of all, could you please run the following command:

docker exec -it openlan_cgw sh -c "ls /etc/cgw/certs"

Just to make sure that at least the volume got mounted.

Also, please post your docker / compose versions:

docker --version
docker compose version

Then try running make stop and then make again to see if that helps.
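
As a rough way to check for this condition before and after the restart, the sketch below looks for a non-empty IPv4 address in the JSON that docker inspect prints for .NetworkSettings.Networks. `has_network_endpoint` is a hypothetical helper name, and the pattern assumes the compact JSON layout seen in the outputs posted above.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the JSON printed by
#   docker inspect NAME -f '{{json .NetworkSettings.Networks}}'
# contains a non-empty IPv4 address, i.e. the container has a live endpoint.
has_network_endpoint() {
    printf '%s' "$1" | grep -Eq '"IPAddress":"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+"'
}

# Example (commented out so that sourcing this file has no side effects):
# nets=$(docker inspect openlan_cgw -f '{{json .NetworkSettings.Networks}}')
# has_network_endpoint "$nets" || echo "openlan_cgw has no endpoint; try make stop, then make"
```

With the two outputs posted earlier, the check would pass for docker-broker-1 (IPAddress 172.21.0.4) and fail for openlan_cgw (empty IPAddress).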

Author

ncalad commented Sep 24, 2024

docker exec -it openlan_cgw sh -c "ls /etc/cgw/certs"
cas.pem cert.pem gw.crt gw.key key.pem

docker --version
Docker version 27.3.1, build ce12230
oguser@oguser-virtual-machine:~/OpenLAN/openlan-cgw/utils/cert_generator/certs/server$ docker compose version
Docker Compose version v2.19.0

[2024-09-24T19:43:47Z ERROR ucentral_cgw::cgw_tls] Failed to open TLS private key file: /etc/cgw/certs/key.pem. Error: Permission denied (os error 13)
[2024-09-24T19:43:47Z ERROR ucentral_cgw] Failed to create TLS acceptor. Error: Failed to open TLS private key file: /etc/cgw/certs/key.pem. Error: Permission denied (os error 13)
[2024-09-24T19:43:49Z DEBUG ucentral_cgw::cgw_nb_api_listener] pre_rebalance callback, assigned partition(s): 0 1
[2024-09-24T19:43:49Z DEBUG ucentral_cgw::cgw_nb_api_listener] post_rebalance callback, assigned partition(s): 0 1
oguser@oguser-virtual-machine:~/OpenLAN/openlan-cgw/utils/cert_generator/certs/server$ docker exec -it openlan_cgw sh -c "ls /etc/cgw/certs"
cas.pem cert.pem gw.crt gw.key key.pem
oguser@oguser-virtual-machine:~/OpenLAN/openlan-cgw/utils/cert_generator/certs/server$ docker exec -it openlan_cgw sh -c "ls /etc/cgw/certs"
cas.pem cert.pem gw.crt gw.key key.pem
oguser@oguser-virtual-machine:~/OpenLAN/openlan-cgw/utils/cert_generator/certs/server$ docker exec -it openlan_cgw bash
root@dc97c92e43d0:/# cd /etc
root@dc97c92e43d0:/etc# cd cgw/
root@dc97c92e43d0:/etc/cgw# cd certs/
root@dc97c92e43d0:/etc/cgw/certs# ls
cas.pem cert.pem gw.crt gw.key key.pem
root@dc97c92e43d0:/etc/cgw/certs# ls -alt
total 28
drwxr-xr-x 4 root root 4096 Sep 24 19:43 ..
-rw-rw-r-- 1 root root 3631 Sep 24 19:06 gw.crt
-rw------- 1 root root 3272 Sep 24 19:06 gw.key
drwxrwxr-x 2 root root 4096 Sep 24 16:03 .
-rw------- 1 nobody nogroup 3272 Sep 24 16:03 key.pem
-rw-r--r-- 1 nobody nogroup 3631 Sep 24 16:03 cert.pem
-rw-r--r-- 1 nobody nogroup 1757 Sep 24 16:03 cas.pem

root can't open key.pem

Contributor

Cahb commented Sep 24, 2024

Okay, so the first note is that we have never tried this setup on a VM. It shouldn't make any difference, but still, FYI.

Second, I think the restart helped?
I can tell it connected to the broker because of the following prints:
[2024-09-24T19:43:49Z DEBUG ucentral_cgw::cgw_nb_api_listener] pre_rebalance callback, assigned partition(s): 0 1
[2024-09-24T19:43:49Z DEBUG ucentral_cgw::cgw_nb_api_listener] post_rebalance callback, assigned partition(s): 0 1

You can try running make stop and then changing the owner of the files, e.g.

chown root:root /home/oguser/OpenLAN/openlan-cgw/utils/cert_generator/certs/server/key.pem
chown root:root /home/oguser/OpenLAN/openlan-cgw/utils/cert_generator/certs/server/cert.pem
chown root:root /home/oguser/OpenLAN/openlan-cgw/utils/cert_generator/certs/server/cas.pem

NOTE: you have to run this from the host OS, not from the container.
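
To confirm that the change took effect, the mode and ownership can be listed on the host before and after. This is a minimal sketch, assuming GNU stat (the -c option) is available; `show_cert_perms` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print "MODE OWNER:GROUP PATH" for each file given.
# Relies on GNU stat's -c format option (an assumption about the host).
show_cert_perms() {
    for f in "$@"; do
        stat -c '%a %U:%G %n' "$f"
    done
}

# Example (commented out so that sourcing this file has no side effects):
# show_cert_perms /home/oguser/OpenLAN/openlan-cgw/utils/cert_generator/certs/server/*.pem
```

In the listing above, key.pem is -rw------- (mode 600), so only its owner can read it, which is why the owner matters for that file in particular.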

Also, one last thing: for the VM you're using, is the host OS Windows or Linux?
E.g. are you using VirtualBox or similar on a Windows machine, by any chance?

Contributor

Cahb commented Sep 25, 2024

@ncalad did you have a chance to look into the steps I posted?
Also, just out of curiosity: which company do you work for? I was thinking I could grab your Slack ID so we can invite you to our OpenWiFi / OpenLAN Slack channels.
That way we could debug this issue faster, chatting in real time, and you could ask OpenLAN / OpenWiFi questions there directly.
