Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update feature/perf from master #6111

Merged
merged 50 commits into from
Nov 12, 2024
Merged

Update feature/perf from master #6111

merged 50 commits into from
Nov 12, 2024

Conversation

edwintorok
Copy link
Contributor

No description provided.

psafont and others added 30 commits October 16, 2024 13:54
Parallel atoms do quite a bit of unnecessary actions even when they are empty.
They are also not needed when running a single task.

They also show as spans in the traces. Removing them makes the traces shorter
and easier to read.

Co-authored-by: Edwin Török <[email protected]>
Signed-off-by: Pau Ruiz Safont <[email protected]>
Operations on different devices should be independent and therefore can be
parallelized. This both means parallelizing operations on different device
types and on devices for the same type.

An atom to serialize action has been introduced because the operation regarding
a single device must be kept serialized.

Signed-off-by: Pau Ruiz Safont <[email protected]>
Instead of sleeping and hoping for the best, wait for the background job to
finish.

Signed-off-by: Andrii Sultanov <[email protected]>
GCC correctly reports "pointer targets differ in signedness".

Signed-off-by: Ross Lagerwall <[email protected]>
Newer systemd warns that the "syslog" StandardOutput/StandardError type
is deprecated and automatically uses the journal instead. Fix this by
removing the explicit setting of StandardOutput/StandardError. Instead,
the service will use the default values configured in systemd
(DefaultStandardOutput/DefaultStandardError) which for a XenServer
system will result in the output going to rsyslog.

Signed-off-by: Ross Lagerwall <[email protected]>
services configuration

Systemd services has good support for the services depends and
orders in the Unit file, that is the place the restart order
should be stated.
However, the command `systemd stop foo bar ...` will override
the order in the Unit file. As the number of the services grow
up, it is really hard to manage the order in the systemd command

In order to resolve the issue, `toolstack.target` is created to
group and manage the toolstack services.
- toolstack.target: `Wants: foo.service` will start foo.service
when `systemctl start toolstack.target`
- foo.service: `PartOf: toolstack.target` will restart/stop
foo.service when `systemctl stop/restart toolstack.target`
Note: Above two does not have to match, eg. if we do not want to
start a service during `systemctl start toolstack.target`, we can
remove it from the first list.

- Following xenopsd services are no longer valid, just got removed
  * xenopsd
  * xenopsd-xenlight
  * xenopsd-simulator
  * xenopsd-libvirt

Signed-off-by: Lin Liu <[email protected]>
2 recent optimizations have changed the Uuidx module to open
  /dev/urandom once on startup, instead of every time a value was
  requested.

However 'networkd_db' runs in the installer environment, inside a chroot
where /dev/urandom is not available.

Open /dev/urandom on first use instead.

Simplify the code and use a single implementation for both fast and
secure urandom generation:
* use a mutex to protect accesses to global urandom state
* use an input channel, rather than a Unix file descriptor, this allows
  us to read many bytes in one go, and then generate multiple random
  numbers without having to make syscalls that often

(syscalls are slow in this case because they require releasing the
runtime mutex, which gives another thread the opportunity to run for
50ms).

Fixes: a0176da ("CP-49135: open /dev/urandom just once")
Fixes: a2d9fbe ("IH-577 Implement v7 UUID generation")
Fixes: 6635a00 ("CP-49136: Introduce PRNG for generating non-secret UUIDs")

This is slightly slower than before, but still fast enough:
```
│  uuidx creation/Uuidx.make            │             0.0004 mjw/run│            16.0001 mnw/run│            105.8801 ns/run│
│  uuidx creation/Uuidx.make_uuid_urnd  │             0.0004 mjw/run│            16.0001 mnw/run│            105.1474 ns/run│
```

Previously this used to take ~88ns, so in fact the difference is barely
noticable.

Also remove the feature flag: the previous change was feature flagged
too, but broke master anyway (I wouldn't have though anything *doesn't*
have /dev/urandom available, and didn't feature flag that part, because
in general it is not possible to feature flag startup code without races)

Signed-off-by: Edwin Török <[email protected]>
initscripts family are legacy and want to be removed

`service iptables save` call
/usr/libexec/initscripts/legacy-actions/iptables/save, which call
`exec /usr/libexec/iptables/iptables.init save`, to save iptables
rules and remove initscripts, we call following directly
`/usr/libexec/iptables/iptables.init save`

`service` command are also updated to `systemctl`

Signed-off-by: Lin Liu <[email protected]>
… reduce number of syscalls (#6085)

This is an alternative to
#6077

2 recent optimizations have changed the Uuidx module to open
  /dev/urandom once on startup, instead of every time a value was
  requested.

However 'networkd_db' runs in the installer environment, inside a chroot
where /dev/urandom is not available.

Open /dev/urandom on first use instead.

Simplify the code and use a single implementation for both fast and
secure urandom generation:
* use a mutex to protect accesses to global urandom state
* use an input channel, rather than a Unix file descriptor, this allows
us to read many bytes in one go, and then generate multiple random
numbers without having to make syscalls that often

(syscalls are slow in this case because they require releasing the
runtime mutex, which gives another thread the opportunity to run for
50ms).

Fixes: a0176da ("CP-49135: open /dev/urandom just once")
Fixes: a2d9fbe ("IH-577 Implement v7 UUID generation")
Fixes: 6635a00 ("CP-49136: Introduce PRNG for generating non-secret
UUIDs")

This is slightly slower than before, but still fast enough:
```
│  uuidx creation/Uuidx.make            │             0.0004 mjw/run│            16.0001 mnw/run│            105.8801 ns/run│
│  uuidx creation/Uuidx.make_uuid_urnd  │             0.0004 mjw/run│            16.0001 mnw/run│            105.1474 ns/run│
```

Previously this used to take ~88ns, so in fact the difference is barely
noticable.

Also remove the feature flag: the previous change was feature flagged
too, but broke master anyway (I wouldn't have though anything *doesn't*
have /dev/urandom available, and didn't feature flag that part, because
in general it is not possible to feature flag startup code without
races).

`networkd_db` now doesn't try to open this anymore:
```
strace -e openat -e open networkd_db
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/librt.so.1", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/var/lib/xcp/networkd.db", O_RDONLY) = 3
+++ exited with 0 +++
```
Removes the remnants of an escaping mechanism once used in xapi. It
appears that the format "<<xxx<value<xxx<" was an early escaping format.

Also removes the effort at line tracking from the lexer actions. Since
the .mli for the generated lexer was only present to have this line
variable, it is deleted.

Signed-off-by: Colin James <[email protected]>
Removes the remnants of an escaping mechanism once used in xapi.

It appears that the format `<<xxx<value<xxx<` (where `xxx` acted as
delimiting tags) was an early escaping format. The construction of these
weird strings appears to have been dropped a long time ago (2006).
… domids instead (#6068)

Follow-up to the fix we had to rush, dropping xenctrl entirely.

Tested with a Windows VM, the domain creation is picked up and network
metrics are present for it, with correct UUIDs determined.
Draft because I need to get Yizhou's agreement on this
Operations on different devices should be independent and therefore can
be
parallelized. This both means parallelizing operations on different
device
types and on devices for the same type.

An atom to serialize action has been introduced because the operation
regarding
a single device must be kept serialized.

Also removes some unneeded parallel atoms, which cause some overhead as
well as polluting the traces

This makes VM_starts to run about ~1 second faster on some tests where
VMs have 5 VIFs. I'm testing it further, but it doesn't look like it's
breaking any tests. I want to push it through some bootstorms and
measure improvements

Please do suggest some further parallelization if you can find it!

BVT+BST are all green
)

Instead of sleeping and hoping for the best, wait for the background job
to finish.
The new version (5.3.0) jemalloc caused a significant increase of
memory usage compared to the version 3.6.0.

Signed-off-by: Stephen Cheng <[email protected]>
Prior to version 4.12, OCaml's standard threads library (systhreads) had
no builtin concept of a semaphore, so one was implemented in
Xapi_stdext_threads.

We replace all usages of this with Semaphore.Counting from the standard
library and remove the implementation from Xapi_stdext_threads.

Technically, the interface provided by the previous semaphore is more
general: it permits arbitrary adjustments to the semaphore's counter,
allowing for a "weighted" style of locking. However, this is only used
in one place (with a weight value of 1, which is the same
decrement/increment value as normal).

Signed-off-by: Colin James <[email protected]>
Prior to version 4.12, OCaml's standard threads library (`systhreads`)
had no builtin concept of a semaphore, so one was implemented in
`Xapi_stdext_threads`.

We replace all usages of this with `Semaphore.Counting` from the
standard library and remove the implementation from
`Xapi_stdext_threads`.

Technically, the interface provided by the previous semaphore is more
general: it permits arbitrary adjustments to the semaphore's counter,
allowing for a "weighted" style of locking. However, this is only used
in one place (with a weight value of 1, which is the same
decrement/increment value as normal).

---

All green on BVT+BST (`207043`) apart from 2 known issues. The number of
users of the semaphore module in the codebase is quite low, which is why
I think it's fine to drop the implementation entirely.
For example "1.2.3a" will be divided to [Int 1; Int 2; Str 3a]
and "1.2.3" is divided to [Int 1; Int 2; Int 3]. It leads to
"1.2.3" > "1.2.3a" which is incorrect.
After fix, "1.2.3a" will be divided to [Int 1; Int 2; Int 3;
Str a], then we can get the right compare result.

Signed-off-by: Changlei Li <[email protected]>
- varstored-guard Before: xenopsd.servcie->xenopsd-xc.service
- Remove unnecessary xenopsd services
  * xenopsd
  * xenopsd-xenlight
  * xenopsd-simulator
  * xenopsd-libvirt
- Reorder services to stop/start
  * forkexecd: does not depends on others, move to last
  * xenopsd-xc After xcp-rrdd. so it needs after xcp-rrdd-*

Note: the for loop reverse the TO_RESTART order
This service is not enabled and started by default, but used on-demand
whenever it is needed for clustering. The current Wants= option in the
toolstack.target is causing xapi-clusterd.service to be started by
xe-toolstack-restart even if it is not enabled.

The fix is to replace Wants= in toolstack.target with WantedBy= in
xapi-clusterd.service, as the latter only installs the dependency when
enabling the service.

Signed-off-by: Rob Hoes <[email protected]>
initscripts family are legacy and want to be removed

`service iptables save` call
/usr/libexec/initscripts/legacy-actions/iptables/save, which call `exec
/usr/libexec/iptables/iptables.init save`, to save iptables rules and
remove initscripts, we call following directly
`/usr/libexec/iptables/iptables.init save`

`service` command are also updated to `systemctl`
This service is not enabled and started by default, but used on-demand
whenever it is needed for clustering. The current Wants= option in the
toolstack.target is causing xapi-clusterd.service to be started by
xe-toolstack-restart even if it is not enabled.

The fix is to replace Wants= in toolstack.target with WantedBy= in
xapi-clusterd.service, as the latter only installs the dependency when
enabling the service.
A previous commit changed the socket location to /run/pvsproxy but this
is problematic because the pvsproxy daemon runs as a deprivileged user
and cannot create the socket.

Instead, update the path to a location that the daemon has permission to
create. Add a fallback to the original path to cope with older pvsproxy
daemons. This fallback can be removed in the future.

Co-developed-by: Pau Ruiz Safont <[email protected]>
Signed-off-by: Ross Lagerwall <[email protected]>
A previous commit changed the socket location to /run/pvsproxy but this
is problematic because the pvsproxy daemon runs as a deprivileged user
and cannot create the socket.

Instead, update the path to a location that the daemon has permission to
create. Add a fallback to the original path to cope with older pvsproxy
daemons. This fallback can be removed in the future.
Tilde `~` used in RPM version stands for pre-release. So version
with `~` is earlier than the same version without `~`.
For example:
1.2.3~beta < 1.2.3
1.xs8 > 1.xs8~2_1

Signed-off-by: Changlei Li <[email protected]>
`systemctl list-dependencies --plain --no-pager` list uncessary
xapi-clusterd service when xapi-clusterd-shutdown is started.

Here instead of checking the status of all dependencies, we only
check the status of previous enabled dependencies. This also
complies with the behavior before toolstack.target

Signed-off-by: Lin Liu <[email protected]>
`systemctl list-dependencies --plain --no-pager` list uncessary
xapi-clusterd service when xapi-clusterd-shutdown is started.

Here instead of checking the status of all dependencies, we only check
the status of previous acitve dependencies. This also complies with the
behavior before toolstack.target
minglumlu and others added 20 commits November 5, 2024 02:42
1. Fix version segment division error.
2. Support tilde in RPM version/release comparison.
When a VDI.pool_migrate starts at a pool member, a connection between the
coordinator and that host remains open for the duration of the migration. This
connection is completely idle. If the migration lasts for more than 12 hours,
stunnel closes the connection due to inactivity, which cancels the migration.

To avoid this use an internal API that uses short-running connection whenever
possible to avoid interrupting the migration.

Signed-off-by: Pau Ruiz Safont <[email protected]>
This allows to reduce most of the indentation in check_operation_error, which
returns searches for a single error and returns it.

Signed-off-by: Pau Ruiz Safont <[email protected]>
There's no seeming reason these were missing, and they need to be added to be
able to map the VDI operations to SR ones for better error messages

Signed-off-by: Pau Ruiz Safont <[email protected]>
…6102)

When a VDI.pool_migrate starts at a pool member, a connection between
the
coordinator and that host remains open for the duration of the
migration. This
connection is completely idle. If it's open for 12 hours, stunnel closes
the
connection due to inactivity, which cancels the migration.

To avoid this use an internal API that uses short-running connection
whenever
possible to avoid interrupting the migration.

Also change nested if-else flow in vdi handling to use Result with let
binds, and add missing CDI operations to storage operations.

I've manually verified the fix by changing the stunnel timeout in the
coordinator of a pool to 5 seconds and migrating a vdi used by a VM in
another host:
```
# xe vdi-pool-migrate uuid=fdb8783f-8cdb-44d3-9327-992f047e6262 sr-uuid=c879ffc9-bb14-fec5-3d95-8959db9337eb
270c3dc8-27bc-4ef2-982a-a6142f0ffce1
```

And after reverting the patch:
```
# xe vdi-pool-migrate uuid=270c3dc8-27bc-4ef2-982a-a6142f0ffce1 sr-uuid=aade9dc6-ff31-3204-4187-37f82ad06c77
Cannot forward messages because the server cannot be contacted. The server may be switched off or there may be network connectivity problems.
host: 0a3a45a0-8754-4570-bfde-6ef6843ccda1 (xs2)
```

The changes are best reviewed one by one, and while ignoring whitespace,
if using github.
CI check "Run OCaml tests" on github failed occasionally. The
cause is test_systemd timeout. Add sleep 1 between server start
and client to fix it.

Signed-off-by: Changlei Li <[email protected]>
CI check "Run OCaml tests" on github failed occasionally. The cause is
test_systemd timeout. Add sleep 1 between server start and client to fix
it.
Fix a warning reported by newer systemd as well as a build warning I
noticed when building with a newer GCC.
The license field is not always a valid date, sometimes it contains a special
value to signify there's no expiry date.

Signed-off-by: Pau Ruiz Safont <[email protected]>
Internalize the concept of expiration dates, including "never", and use Date to
manage all the dates, instead of using unix time

Signed-off-by: Pau Ruiz Safont <[email protected]>
This now matches the concept of xenserver's licensing daemon, which changed it
in the last year.

Signed-off-by: Pau Ruiz Safont <[email protected]>
Instead use the Date, Ptime and Ptime.Span

Signed-off-by: Pau Ruiz Safont <[email protected]>
The license field is not always a valid date, sometimes it contains a
special
value to signify there's no expiry date.
The new version (5.3.0) jemalloc caused a significant increase of memory
usage compared to the version 3.6.0.
Adjust the parameters to fix the memory usage issue.

The modification has two parts, the other PR is in the spec repo.

- dom0mem test:
-- xenopsd-xc

![image](https://github.com/user-attachments/assets/853212ce-de13-49ed-9890-97887c9836c1)
-- qemu

![image](https://github.com/user-attachments/assets/aa28b3d4-7948-472f-bcf1-62c866768299)
-- xapi (the config is not changed for xapi service)

![image](https://github.com/user-attachments/assets/02e924a6-22f4-48f4-a821-601cf205ffc7)
`host_pending_features` represents the features that are available on
some of the hosts in the pool, but not all of them. Note the way this
field is initialised in the `SM.create` code means that it will only
contain new features that appear during upgrade. This means a feature
that is added into `SM.features` during creation will stay there even if
it is not available on all hosts. But we should not end up in this
situation in the first place.

Also change the meaning of `Sm.features` to be pool-wide.

Signed-off-by: Vincent Liu <[email protected]>
NEW Sm features that are found during an upgrde will now only be
available when they are available on all of the hosts. Add logic to
keep track of features that are only availabe on some of the hosts in
the pool, and declare them in `Sm.feature` only when all of the hosts
have declared this.

Also move `Storage_access.on_xapi_start` to `dbsync_slave` as this needs
to be run on all hosts for each sm to get a chance to say what features
they have.

Signed-off-by: Vincent Liu <[email protected]>
Implement a new `assert_sm_features_compatiable` in pre_join_checks so
that if the joining host does not have compatible sm features, it will
be denied entry into the pool.

Signed-off-by: Vincent Liu <[email protected]>
There are two parts to this PR:
1. only declare a feature as a public sm feature when it is advertised
by all SMs in the pool
2. reject a host from joining the pool if it does not have a compatible
set of SM features with the current pool

Now I did not create a new class just for SM features as I feel having a
decidated class for something that is supposed to be quite a temporary
state (as this is only supposed to happen during an upgrade) is an
overkill. But I am open to suggestions.
Introduces an interface file for the Rbac module within xapi in order to
document the intent of each of its functions.

Signed-off-by: Colin James <[email protected]>
Introduces an interface file for the Rbac module within xapi in order to
document the intent of each of its functions.

---

Cherry picked from another branch concerning `server.ml` generation; I
want to reduce the surface area of these modules so we can be clear
about what they are doing.
@edwintorok edwintorok merged commit 8c3438d into feature/perf Nov 12, 2024
132 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.