bug(config): some parameters are incorrect #20
Comments
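NTP_SERVER is reported as an unbound variable; the only workaround is to run NTP_SERVER=xxx ./install.sh |
Known issue. I fixed it earlier but forgot to rebuild the installation package. I have just rebuilt it; please download it again and retry: https://github.com/k8sli/kubeplay/releases/tag/v0.1.0-alpha.3 |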
TASK [cluster/bootstrap-os : Configure offline resources repository on apt package manager] ************************
changed: [node1]
changed: [node2]
Sunday 05 September 2021 15:10:28 +0000 (0:00:00.591) 0:00:05.231 ******
Sunday 05 September 2021 15:10:28 +0000 (0:00:00.046) 0:00:05.278 ******
TASK [cluster/bootstrap-os : Update apt repository cache] **********************************************************
fatal: [node2]: FAILED! => changed=false
msg: 'Failed to update apt cache: E:The method driver /usr/lib/apt/methods/192.168.100.25 could not be found., W:Is the package apt-transport-192.168.100.25 installed?, E:Failed to fetch 192.168.100.25://8080/ubuntu/amd64/bionic/InRelease , E:Some index files failed to download. They have been ignored, or old ones used instead.'
fatal: [node1]: FAILED! => changed=false
msg: 'Failed to update apt cache: E:The method driver /usr/lib/apt/methods/192.168.100.25 could not be found., W:Is the package apt-transport-192.168.100.25 installed?, E:Failed to fetch 192.168.100.25://8080/ubuntu/amd64/bionic/InRelease , E:Some index files failed to download. They have been ignored, or old ones used instead.'
NO MORE HOSTS LEFT *************************************************************************************************
PLAY RECAP *********************************************************************************************************
node1 : ok=9 changed=3 unreachable=0 failed=1 skipped=17 rescued=0 ignored=0
node2 : ok=9 changed=3 unreachable=0 failed=1 skipped=23 rescued=0 ignored=0
Sunday 05 September 2021 15:11:00 +0000 (0:00:31.961) 0:00:37.240 ******
===============================================================================
cluster/bootstrap-os : Update apt repository cache --------------------------------------------------------- 31.96s
Gather minimal facts ---------------------------------------------------------------------------------------- 1.09s
download : download | Download files / images --------------------------------------------------------------- 0.86s
cluster/bootstrap-os : Configure offline resources repository on apt package manager ------------------------ 0.59s
Gather necessary facts (hardware) --------------------------------------------------------------------------- 0.54s
Gather necessary facts (network) ---------------------------------------------------------------------------- 0.40s
cluster/bootstrap-os : Backup system default package manager repo file -------------------------------------- 0.32s
cluster/bootstrap-os : Create remote_tmp for it is used by another module ----------------------------------- 0.28s
cluster/bootstrap-os : gather os specific variables --------------------------------------------------------- 0.13s
cluster/bootstrap-os : include_tasks ------------------------------------------------------------------------ 0.06s
kubespray-defaults : Gather ansible_default_ipv4 from all hosts --------------------------------------------- 0.05s
container-engine/nerdctl : nerdctl | Copy nerdctl binary from download dir ---------------------------------- 0.05s
download : download | Get kubeadm binary and list of required images ---------------------------------------- 0.05s
download : prep_download | Set image pull/info command for containerd and crio on localhost ----------------- 0.05s
cluster/bootstrap-os : Configure offline resources repository on yum package manager ------------------------ 0.05s
kubespray-defaults : Configure defaults --------------------------------------------------------------------- 0.05s
download : prep_download | Create staging directory on remote node ------------------------------------------ 0.05s
download : prep_download | Set image pull/info command for containerd and crio ------------------------------ 0.05s
container-engine/crictl : install crictĺ -------------------------------------------------------------------- 0.05s
container-engine/nerdctl : nerdctl | Download nerdctl ------------------------------------------------------- 0.04s
###### 01-cluster-bootstrap-os installation failed ###### |
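For reference, the apt message above ("The method driver /usr/lib/apt/methods/192.168.100.25 could not be found") is most likely what apt prints when a sources entry has no scheme: the text before the first colon is then taken as the transport method name. A minimal sketch of normalizing such a value before it is written into a sources list follows. ensure_scheme is a hypothetical helper, not part of kubeplay, and defaulting to http:// is an assumption.

```shell
# ensure_scheme VALUE: print VALUE unchanged if it already has a scheme,
# otherwise prepend http:// (assumed default for the offline repo).
ensure_scheme() {
  case "$1" in
    *://*) printf '%s\n' "$1" ;;
    *)     printf 'http://%s\n' "$1" ;;
  esac
}

# Example with the address from this thread.
ensure_scheme "192.168.100.25:8080"
```

With the value normalized this way, apt would be asked for http://192.168.100.25:8080/... instead of treating the IP address as a method. |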
Run this in the root directory of the installation package: |
root@fredvb:~/kubeplay# grep 'offline_resources_url' config/kubespray/env.yml |
When run repeatedly, it intermittently aborts with the error on the last line:
INFO[0000] Creating container nginx
INFO[0000] Creating container registry
✔ The registry container is running.
✔ The nginx container is running.
✖ Error: the http://192.168.100.25:8080/certs/rootCA.crt website is not running, and the status code is 000! |
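A status code of 000 is what curl reports through %{http_code} when it receives no HTTP response at all, which would fit a check that runs before nginx has finished starting. Below is a rough sketch of polling the URL a few times instead of checking once. retry and http_ok are hypothetical helpers, and that the installer uses curl for this check is an assumption.

```shell
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds or ATTEMPTS are used up.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# http_ok URL: succeed only when URL answers with HTTP 200.
http_ok() {
  [ "$(curl -s -o /dev/null --max-time 3 -w '%{http_code}' "$1")" = "200" ]
}

# Usage (not run here):
#   retry 10 2 http_ok http://192.168.100.25:8080/certs/rootCA.crt
```

The attempt count and delay are arbitrary; the point is only that a container reported as running may need a moment before it accepts connections. |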
Please post your config.yaml configuration file. |
This one occurs every time:
✔ Updated the apt list file
E: Failed to fetch file:/root/kubeplay/resources/nginx/ubuntu/amd64/bionic/Packages File not found - /root/kubeplay/resources/nginx/ubuntu/amd64/bionic/Packages (2: No such file or directory)
E: Some index files failed to download. They have been ignored, or old ones used instead. |
root@fredvb:~/kubeplay# cat config.yaml
compose:
  # Compose bootstrap node ip, default is local internal ip
  internal_ip: 192.168.100.25
  # Nginx http server bind port for download files and packages
  nginx_http_port: 8080
  # Registry domain for CRI runtime download images
  registry_domain: kube.registry.local
kubespray:
  # Kubernetes version by default, only support v1.20.6
  kube_version: v1.21.4
  # For deploy HA cluster you must configure a external apiserver access ip
  external_apiserver_access_ip: 192.168.100.5
  # Set network plugin to calico with vxlan mode by default
  kube_network_plugin: calico
  #Container runtime, only support containerd if offline deploy
  container_manager: containerd
  # Now only support host if use containerd as CRI runtime
  etcd_deployment_type: host
  # Settings for etcd event server
  etcd_events_cluster_setup: true
  etcd_events_cluster_enabled: true
# Cluster nodes inventory info
inventory:
  all:
    vars:
      ansible_port: 22
      ansible_user: root
      ansible_ssh_pass: q1w2e3r4
      # ansible_ssh_private_key_file: /kubespray/config/id_rsa
    hosts:
      node1:
        ansible_host: 192.168.100.4
      node2:
        ansible_host: 192.168.100.5
    children:
      kube_control_plane:
        hosts:
          node2:
      kube_node:
        hosts:
          node1:
      etcd:
        hosts:
          node2:
      k8s_cluster:
        children:
          kube_control_plane:
          kube_node:
      gpu:
        hosts: {}
      calico_rr:
        hosts: {}
### Default parameters ###
## This filed not need config, will auto update,
## if no special requirement, do not modify these parameters.
default:
  # NTP server ip address or domain, default is internal_ip
  ntp_server:
    - 192.168.100.25
  # Registry ip address, default is internal_ip
  registry_ip: 192.168.100.25
  # Offline resource url for download files, default is internal_ip:nginx_http_port
  offline_resources_url: 192.168.100.25:8080
  # Use nginx and registry provide all offline resources
  offline_resources_enabled: true
  # Image repo in registry
  image_repository: library
  # Kubespray container image for deploy user cluster or scale
  kubespray_image: "kube.registry.local/library/kubespray:v2.16.0-154-geb42915a"
  # Auto generate self-signed certificate for registry domain
  generate_domain_crt: true
  # For nodes pull image, use 443 as default
  registry_https_port: 443
  # For push image to this registry, use 5000 as default, and only bind at 127.0.0.1
  registry_push_port: 5000
  # Set false to disable download all container images on all nodes
  download_container: false |
|
I reverted the default section, and it still ends with the following error:
TASK [cluster/bootstrap-os : Configure offline resources repository on apt package manager] ************************
changed: [node1]
changed: [node2]
Sunday 05 September 2021 16:25:26 +0000 (0:00:00.613) 0:00:05.384 ******
Sunday 05 September 2021 16:25:26 +0000 (0:00:00.046) 0:00:05.431 ******
TASK [cluster/bootstrap-os : Update apt repository cache] **********************************************************
fatal: [node2]: FAILED! => changed=false
msg: 'Failed to update apt cache: unknown reason'
fatal: [node1]: FAILED! => changed=false
msg: 'Failed to update apt cache: unknown reason'
NO MORE HOSTS LEFT *************************************************************************************************
PLAY RECAP *********************************************************************************************************
node1 : ok=9 changed=2 unreachable=0 failed=1 skipped=17 rescued=0 ignored=0
node2 : ok=9 changed=2 unreachable=0 failed=1 skipped=23 rescued=0 ignored=0
Sunday 05 September 2021 16:28:29 +0000 (0:03:03.812) 0:03:09.243 ******
===============================================================================
cluster/bootstrap-os : Update apt repository cache -------------------------------------------------------- 183.81s
Gather minimal facts ---------------------------------------------------------------------------------------- 1.11s
download : download | Download files / images --------------------------------------------------------------- 0.87s
cluster/bootstrap-os : Configure offline resources repository on apt package manager ------------------------ 0.61s
Gather necessary facts (hardware) --------------------------------------------------------------------------- 0.54s
Gather necessary facts (network) ---------------------------------------------------------------------------- 0.41s
cluster/bootstrap-os : Backup system default package manager repo file -------------------------------------- 0.27s
cluster/bootstrap-os : Create remote_tmp for it is used by another module ----------------------------------- 0.26s
download : prep_download | Create local cache for files and images on control node -------------------------- 0.13s
kubespray-defaults : Populates no_proxy to all hosts -------------------------------------------------------- 0.10s
cluster/bootstrap-os : gather os specific variables --------------------------------------------------------- 0.08s
cluster/bootstrap-os : include_tasks ------------------------------------------------------------------------ 0.06s
kubespray-defaults : Gather ansible_default_ipv4 from all hosts --------------------------------------------- 0.06s
download : prep_download | Set image pull/info command for containerd and crio on localhost ----------------- 0.05s
container-engine/crictl : install crictĺ -------------------------------------------------------------------- 0.05s
download : prep_download | Set image pull/info command for docker on localhost ------------------------------ 0.05s
download : prep_download | Check that local user is in group or can become root ----------------------------- 0.05s
download : prep_download | Set a few facts ------------------------------------------------------------------ 0.05s
kubespray-defaults : Configure defaults --------------------------------------------------------------------- 0.05s
download : prep_download | Set image pull/info command for docker ------------------------------------------- 0.05s
✖ ###### 01-cluster-bootstrap-os installation failed ######
root@fredvb:~/kubeplay# |
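You may have downloaded the wrong installation package. The OS is Ubuntu 18.04; is the package you downloaded also for 18.04? |
Both are 18.04. It looks like iptables is not set up correctly: after nerdctl starts the containers, iptables does not allow ports 8080/443. |
After I manually added iptables -A FORWARD -p tcp --dport 8080 -j ACCEPT, the 'Failed to update apt cache: unknown reason' error went away. |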
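A small sketch of applying that rule idempotently, so repeated runs do not stack duplicate entries in the FORWARD chain. It uses iptables -C to test for the rule before appending it. allow_forward_port and the IPTABLES override are hypothetical names for illustration, not something kubeplay provides.

```shell
# Command to invoke; overridable so the function can be dry-run.
IPTABLES=${IPTABLES:-iptables}

# allow_forward_port PORT: append an ACCEPT rule for TCP PORT to the
# FORWARD chain unless an identical rule is already present.
allow_forward_port() {
  "$IPTABLES" -C FORWARD -p tcp --dport "$1" -j ACCEPT 2>/dev/null ||
    "$IPTABLES" -A FORWARD -p tcp --dport "$1" -j ACCEPT
}

# Usage (as root, not run here):
#   allow_forward_port 8080
#   allow_forward_port 443
```

Rules added this way do not survive a reboot unless they are saved by whatever mechanism the host uses for persisting iptables state. |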
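Run ls to check whether this directory exists. This error occurs when the downloaded package version does not match the OS 🤔. |
That directory does not exist; there is only one gz file and two directories:
root@fredvb:~/kubeplay/resources/nginx/ubuntu/amd64/bionic# ls
archive.ubuntu.com download.docker.com Packages.gz
My installation package is kubeplay-v0.1.0-alpha.3-ubuntu-bionic-amd64.tar.gz |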
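About this local repo: I recall one of your documents covering this. When building directly FROM nginx:1.9.1, the two COPY --from [bionic|focal] /ubuntu /usr/share/nginx/html lines are wrong; changing them to COPY --from [bionic|focal] /ubuntu /usr/share/nginx/html/ubuntu worked for me. For the case above, the path seems to be different again. That document also mentions that type=tar can produce a tarball for import, but the entrypoint is lost on import, so the built-in nginx does not start. To fix that, add the --change 'CMD /usr/sbin/nginx -g "daemon off;"' option when importing. |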
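To make the import step concrete, here is a sketch of restoring the start command while importing a root filesystem tarball. docker import accepts -c/--change with Dockerfile instructions such as CMD. The wrapper name import_with_cmd, the DOCKER override, and the tarball and image names in the usage line are all placeholders, not taken from the kubeplay docs.

```shell
# import_with_cmd TARBALL IMAGE: import a rootfs tarball and set the
# nginx start command that a plain import would otherwise drop.
import_with_cmd() {
  "${DOCKER:-docker}" import \
    --change 'CMD /usr/sbin/nginx -g "daemon off;"' \
    "$1" "$2"
}

# Usage (not run here; names are placeholders):
#   import_with_cmd nginx-rootfs.tar offline-nginx:local
```

Setting DOCKER=echo prints the command instead of running it, which is handy for checking the quoting of the CMD string. |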
Found two more failure points:
fatal: [node1]: FAILED! => changed=false
msg: |-
Failed to reload sysctl: net.ipv4.ip_forward = 1
net.ipv4.ip_local_reserved_ports = 30000-32767
sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-iptables: No such file or directory
sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-ip6tables: No such file or directory
changed: [node2] |
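I test on virtual machines created from each Linux distribution's cloud-init images. For systems that have been modified or have conflicting packages installed, a successful installation cannot be guaranteed.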
|
modprobe br_netfilter fixed this problem. |
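Since modprobe only loads the module until the next reboot, here is a sketch of making it persistent on systemd-based hosts by listing it under /etc/modules-load.d. persist_module is a hypothetical helper; the optional second argument exists only so the function can be tried against a scratch directory.

```shell
# persist_module MODULE [DIR]: write MODULE into DIR/MODULE.conf so that
# systemd-modules-load loads it at boot. DIR defaults to /etc/modules-load.d.
persist_module() {
  module=$1
  dir=${2:-/etc/modules-load.d}
  printf '%s\n' "$module" > "$dir/$module.conf"
}

# Usage (as root, not run here):
#   modprobe br_netfilter && persist_module br_netfilter
```

Once br_netfilter is loaded, the /proc/sys/net/bridge/bridge-nf-call-iptables entries from the sysctl error exist and the reload can succeed. |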
root@node2:~# ll /etc/apt/sources.list.d/offline-resources.list*
-rw-r--r-- 1 root root 66 Sep 6 15:18 /etc/apt/sources.list.d/offline-resources.list
-rw-r--r-- 1 root root 66 Sep 6 14:51 /etc/apt/sources.list.d/offline-resources.list.bak
root@node2:~# apt update
Err:1 http://192.168.100.25:8080/ubuntu/amd64 bionic/ InRelease
Could not connect to 192.168.100.25:8080 (192.168.100.25). - connect (111: Connection refused)
Reading package lists... Done
Building dependency tree
Reading state information... Done
All packages are up to date.
W: Failed to fetch http://192.168.100.25:8080/ubuntu/amd64/bionic/InRelease Could not connect to 192.168.100.25:8080 (192.168.100.25). - connect (111: Connection refused)
W: Some index files failed to download. They have been ignored, or old ones used instead. |
root@fredvb:~/kubeplay/resources/nginx/ubuntu/amd64/bionic# tree -L 2
.
├── archive.ubuntu.com
│ └── ubuntu
├── download.docker.com
│ └── linux
└── Packages.gz
4 directories, 1 file |
It finally succeeded once, after I removed cgroup v2 and rebooted:
===============================================================================
kubernetes-apps/ansible : Kubernetes Apps | Lay Down CoreDNS templates --------------------------------------------------------------------------- 4.58s
kubernetes-apps/ansible : Kubernetes Apps | Start Resources -------------------------------------------------------------------------------------- 4.52s
download : download | Download files / images ---------------------------------------------------------------------------------------------------- 0.81s
Gather minimal facts ----------------------------------------------------------------------------------------------------------------------------- 0.65s
Gather necessary facts (hardware) ---------------------------------------------------------------------------------------------------------------- 0.60s
kubernetes-apps/ansible : Kubernetes Apps | Wait for kube-apiserver ------------------------------------------------------------------------------ 0.53s
Gather necessary facts (network) ----------------------------------------------------------------------------------------------------------------- 0.42s
kubernetes-apps/ansible : Kubernetes Apps | Delete kubeadm CoreDNS ------------------------------------------------------------------------------- 0.35s
kubernetes-apps/ansible : Kubernetes Apps | Register coredns deployment annotation `createdby` --------------------------------------------------- 0.31s
kubernetes-apps/ansible : Kubernetes Apps | Delete kubeadm Kube-DNS service ---------------------------------------------------------------------- 0.24s
kubernetes-apps/ansible : Kubernetes Apps | Lay Down nodelocaldns Template ----------------------------------------------------------------------- 0.19s
kubernetes-apps/metallb : Kubernetes Apps | Install and configure MetalLB ------------------------------------------------------------------------ 0.18s
kubernetes-apps/metallb : Kubernetes Apps | Set apparmor_enabled --------------------------------------------------------------------------------- 0.14s
kubespray-defaults : Set no_proxy to all assigned cluster IPs and hostnames ---------------------------------------------------------------------- 0.14s
kubernetes-apps/external_cloud_controller/openstack : External OpenStack Cloud Controller | Generate Manifests ----------------------------------- 0.13s
kubernetes-apps/container_engine_accelerator/nvidia_gpu : Container Engine Acceleration Nvidia GPU | Create manifests for nvidia accelerators ---- 0.11s
kubernetes-apps/csi_driver/cinder : Cinder CSI Driver | Write cacert file ------------------------------------------------------------------------ 0.10s
kubespray-defaults : Gather ansible_default_ipv4 from all hosts ---------------------------------------------------------------------------------- 0.10s
download : prep_download | On localhost, check if passwordless root is possible ------------------------------------------------------------------ 0.10s
kubernetes-apps/ansible : Kubernetes Apps | Lay Down Secondary CoreDNS Template ------------------------------------------------------------------ 0.09s
✔ ###### 05-cluster-apps successfully installed ######
✔ ###### kubernetes cluster successfully installed ###### |
These are the things I still have to fix manually for now:
#!/bin/bash
# one shot
# iptables -A FORWARD -p tcp -m tcp --dport 443 -j ACCEPT
# iptables -A FORWARD -p tcp -m tcp --dport 8080 -j ACCEPT
# for i in nodes; do ssh $i modprobe br_netfilter; done
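for h in x99u d9020 fredvb; do
  ssh "$h" 'rm -rf /etc/apt/sources.list.d/offline-resources.list*'
done
It is odd that no iptables accept rules are added for ports 8080 and 443 of the two containers started by nerdctl. |
This will be fixed later; these leftover files will be cleaned up on removal. |
config/compose/certs/ was supposed to contain two files, but they ended up as directories, so nginx fails to load the certificates on startup.
Also, registry:5000 in nginx.conf does not seem to be replaced with the IP automatically; fixing it manually works.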
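A sketch of what such a manual fix could look like, assuming the intended result is the same port on the bootstrap node's IP (the thread does not show the exact replacement). replace_registry_host is a hypothetical helper and the nginx.conf path in the usage line is a placeholder.

```shell
# replace_registry_host FILE IP: rewrite every "registry:5000" in FILE
# to "IP:5000", keeping the file's permissions by writing back in place.
replace_registry_host() {
  file=$1
  ip=$2
  tmp=$(mktemp) || return 1
  if sed "s/registry:5000/$ip:5000/g" "$file" > "$tmp"; then
    cat "$tmp" > "$file"
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}

# Usage (not run here; the path is a placeholder):
#   replace_registry_host path/to/nginx.conf 192.168.100.25
```

The pattern matches any host name ending in "registry" followed by :5000, so check the file afterwards if it contains other registry entries.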