Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync from master to feature/easier-pool-join #6170

Conversation

gangj
Copy link
Contributor

@gangj gangj commented Dec 11, 2024

No description provided.

danilo-delbusso and others added 30 commits October 1, 2024 11:25
Parallel atoms do quite a bit of unnecessary actions even when they are empty.
They are also not needed when running a single task.

They also show as spans in the traces. Removing them makes the traces shorter
and easier to read.

Co-authored-by: Edwin Török <[email protected]>
Signed-off-by: Pau Ruiz Safont <[email protected]>
Operations on different devices should be independent and therefore can be
parallelized. This both means parallelizing operations on different device
types and on devices for the same type.

An atom to serialize action has been introduced because the operation regarding
a single device must be kept serialized.

Signed-off-by: Pau Ruiz Safont <[email protected]>
Instead of sleeping and hoping for the best, wait for the background job to
finish.

Signed-off-by: Andrii Sultanov <[email protected]>
GCC correctly reports "pointer targets differ in signedness".

Signed-off-by: Ross Lagerwall <[email protected]>
Newer systemd warns that the "syslog" StandardOutput/StandardError type
is deprecated and automatically uses the journal instead. Fix this by
removing the explicit setting of StandardOutput/StandardError. Instead,
the service will use the default values configured in systemd
(DefaultStandardOutput/DefaultStandardError) which for a XenServer
system will result in the output going to rsyslog.

Signed-off-by: Ross Lagerwall <[email protected]>
services configuration

Systemd services has good support for the services depends and
orders in the Unit file, that is the place the restart order
should be stated.
However, the command `systemd stop foo bar ...` will override
the order in the Unit file. As the number of the services grow
up, it is really hard to manage the order in the systemd command

In order to resolve the issue, `toolstack.target` is created to
group and manage the toolstack services.
- toolstack.target: `Wants: foo.service` will start foo.service
when `systemctl start toolstack.target`
- foo.service: `PartOf: toolstack.target` will restart/stop
foo.service when `systemctl stop/restart toolstack.target`
Note: Above two does not have to match, eg. if we do not want to
start a service during `systemctl start toolstack.target`, we can
remove it from the first list.

- Following xenopsd services are no longer valid, just got removed
  * xenopsd
  * xenopsd-xenlight
  * xenopsd-simulator
  * xenopsd-libvirt

Signed-off-by: Lin Liu <[email protected]>
This enables extending the type without causing performance issues,
and should reduce the work for the garbage collector.

Signed-off-by: Edwin Török <[email protected]>
Instead of getting one timestamp for all collectors, get them closer to the
actual measurement time.

Signed-off-by: Andrii Sultanov <[email protected]>
Instead of timestamps taken at read time in xcp-rrdd, propagate timestamps
taken by plugins (or collectors in xcp-rrdd itself) and use these when
updating values.

Also process datasources without flattening them all into one list.

This allows to process datasources with the same timestamp (provided by the
plugin or dom0 collector in xcp_rrdd) at once.

Co-authored-by: Edwin Török <[email protected]>
Signed-off-by: Edwin Török <[email protected]>
Signed-off-by: Andrii Sultanov <[email protected]>
RRDs can have different owners, and new ones can show up without new domids
being involved (SRs, VMs are also owner types).

Signed-off-by: Andrii Sultanov <[email protected]>
ds_update* functions previously relied on being called for the entirety of
datasources of a certain RRD. Instead, operate on chunks of datasources
provided by the same plugin. These chunks are not contiguous (more detailed
explanation in the comments in the code), so indices to rrd_dss and the
associated structures need to be carried with each datasource.

Also turns the value*transform tuple into a type for more explicit accesses.

Signed-off-by: Andrii Sultanov <[email protected]>
Reason:
API errors in go sdk are generated from Api_errors.errors.
Error is filled in Api_errors.errors when defined using
add_error function. Error too_many_groups is not defined
using add_error function.
Fix:
Define too_many_groups using add_error function.

Signed-off-by: Changlei Li <[email protected]>
)

Reason:
API errors in go sdk are generated from Api_errors.errors. Error is
filled in Api_errors.errors when defined using add_error function. Error
too_many_groups is not defined using add_error function.
Fix:
Define too_many_groups using add_error function.
These become warnings in ocaml 5.0+

Signed-off-by: Pau Ruiz Safont <[email protected]>
There is no remote invocation of this function, so should be safe.

Signed-off-by: Vincent Liu <[email protected]>
Rather than copy and copy', rename them into `copy_into_sr` and
`copy_into_vdi`, which is what they actually do.

Signed-off-by: Vincent Liu <[email protected]>
Also make the logging information more verbose.

Signed-off-by: Vincent Liu <[email protected]>
- Introduce new `MigrateLocal` and `MigrateRemote` modules which contains
  the main implementations of the migration logic.
- Move the actual exposed SMAPIv2 functions in one place towards the end
  of the file, rather than spreading across the entire file.

These refactoring should all be statically verifiable by the compiler.

Signed-off-by: Vincent Liu <[email protected]>
Previously a task was constructed based on the log and tracing of a dbg
of the type Debug_info.t, and then later on a dbg is constructed based
on the previously constructued task. Instead of that, just convert the
first dbg into a string and thread it through the call.

Signed-off-by: Vincent Liu <[email protected]>
2 recent optimizations have changed the Uuidx module to open
  /dev/urandom once on startup, instead of every time a value was
  requested.

However 'networkd_db' runs in the installer environment, inside a chroot
where /dev/urandom is not available.

Open /dev/urandom on first use instead.

Simplify the code and use a single implementation for both fast and
secure urandom generation:
* use a mutex to protect accesses to global urandom state
* use an input channel, rather than a Unix file descriptor, this allows
  us to read many bytes in one go, and then generate multiple random
  numbers without having to make syscalls that often

(syscalls are slow in this case because they require releasing the
runtime mutex, which gives another thread the opportunity to run for
50ms).

Fixes: a0176da ("CP-49135: open /dev/urandom just once")
Fixes: a2d9fbe ("IH-577 Implement v7 UUID generation")
Fixes: 6635a00 ("CP-49136: Introduce PRNG for generating non-secret UUIDs")

This is slightly slower than before, but still fast enough:
```
│  uuidx creation/Uuidx.make            │             0.0004 mjw/run│            16.0001 mnw/run│            105.8801 ns/run│
│  uuidx creation/Uuidx.make_uuid_urnd  │             0.0004 mjw/run│            16.0001 mnw/run│            105.1474 ns/run│
```

Previously this used to take ~88ns, so in fact the difference is barely
noticable.

Also remove the feature flag: the previous change was feature flagged
too, but broke master anyway (I wouldn't have though anything *doesn't*
have /dev/urandom available, and didn't feature flag that part, because
in general it is not possible to feature flag startup code without races)

Signed-off-by: Edwin Török <[email protected]>
initscripts family are legacy and want to be removed

`service iptables save` call
/usr/libexec/initscripts/legacy-actions/iptables/save, which call
`exec /usr/libexec/iptables/iptables.init save`, to save iptables
rules and remove initscripts, we call following directly
`/usr/libexec/iptables/iptables.init save`

`service` command are also updated to `systemctl`

Signed-off-by: Lin Liu <[email protected]>
last-genius and others added 25 commits December 4, 2024 10:25
`xcp-rrdd` used to ignore timestamps of measurement times provided by
the RRDD plugins, and always calculated change in all values against a
single timestamp captured at the time of reading on the server. This can
potentially trigger a case where the time that passed between
measurements is not equal to the time that passed between readings (or
other measurements), and values would be calculated incorrectly. This
can be triggered on a host with 100% CPU load, where `cpu0-P0` derived
metric, for example, would often be > 1.0.

This PR reworks `xcp-rrdd` and the plugin infrastructure to calculate
values based on the actual length of time passed since the last value
for this particular metric was recorded. This required extensive
architectural changes, explained in detail in commits themselves - *so
best reviewed by commit*.

I've tested this over the weekend, and haven't registered any spikes in
metrics on a fully-loaded host. Will run BST+BVT as well, since RRDD is
at the center of a lot of other things and is quite fragile.
…#6134)

### NUMA docs: Fix typos and simplify some parts

Hi, in this PR for
https://xapi-project.github.io/new-docs/toolstack/features/NUMA/index.html,
I'd like to:
- Add more sections to improve the document structure
- Fix a few typos
- Simplify some parts
Signed-off-by: Rob Hoes <[email protected]>
This appears to be entirely unused.
Recent changes to Http_client were made to log errors and backtraces in
case of unusual errors, in order to better diagnose problems.
Unfortunately, this also led to benign exceptions being logged, which
caused people to suspect problems where the exception was actually
"expected". In particular the following error started appearing in the
logs many times:

    Raised Http_client.Parse_error("Expected initial header []")

The case here is that the client has sent an HTTP request, but the
server disconnects before sending a response.

It turns out that this happens commonly when an external connection is
received by xapi and handed over to another process, such as xcp-rrdd.
The connection's file descriptor is passed to the other process, and the
original HTTP request is forwarded over a local socket. The other process
continues to handle the request and sends responses to the forwarded
socket, but never to the local socket, where xapi is waiting for the
response.

This is a quick change that makes the caller of the receive function
aware that no response was sent, without logging scary errors, to fix
the immediate problem. In addition, we should enhance the handover
process to actually send responses, although funcionally this is not an
issue.

Signed-off-by: Rob Hoes <[email protected]>
As per the new design, we are no longer telling user that their
cluster stack version is out of date, since the corosync upgrade will be
part of the compulsory actions enforced by XC upgrade wizard and the
host installer.

Signed-off-by: Vincent Liu <[email protected]>
Recent changes to Http_client were made to log errors and backtraces in
case of unusual errors, in order to better diagnose problems.
Unfortunately, this also led to benign exceptions being logged, which
caused people to suspect problems where the exception was actually
"expected". In particular the following error started appearing in the
logs many times:

    Raised Http_client.Parse_error("Expected initial header []")

The case here is that the client has sent an HTTP request, but the
server disconnects before sending a response.

It turns out that this happens commonly when an external connection is
received by xapi and handed over to another process, such as xcp-rrdd.
The connection's file descriptor is passed to the other process, and the
original HTTP request is forwarded over a local socket. The other
process continues to handle the request and sends responses to the
forwarded socket, but never to the local socket, where xapi is waiting
for the response.

This is a quick change that makes the caller of the receive function
aware that no response was sent, without logging scary errors, to fix
the immediate problem. In addition, we should enhance the handover
process to actually send responses, although funcionally this is not an
issue.
Old implementation had different issues.
Just use mutexes and conditions using C stubs.
Current Ocaml Condition module does not have support for timeout waiting
for conditions, so use C stubs.
This implementation does not require opening a pipe.
This reduces the possibilities of errors calling "wait" to zero.
Mostly of the times it does not require kernel calls.

Signed-off-by: Frediano Ziglio <[email protected]>
Default empty allowed_operations on a cloned VBD means that XenCenter
does not display the DVD option in the console tab for VMs cloned from
templates, for example.

Follow the practice in xapi_vbd, and update_allowed_operations immediately
after Db.VBD.create.

Signed-off-by: Andrii Sultanov <[email protected]>
Previously a `CLUSTER_STACK_OUT_OF_DATE` is rasied when the user is
running corosync 2 as the cluster stack, while corosync3 is available on
the system. This is to nag people to upgrade their cluster stack to
corosync 3, on XS 9.

As per the new design, we are no longer telling user that their cluster
stack version is out of date, since the corosync upgrade will be part of
the compulsory actions enforced by XC upgrade wizard and the host
installer.
… during sync_updates (xapi-project#6137)

The new version of yum-utils introduced to xenserver enhances the
--download-metadata functionality to align with **dnf reposync** (refer
to: https://dnf-plugins-core.readthedocs.io/en/latest/reposync.html) by
fully downloading repository metadata.
Consequently, _create_pool_repository_ is no longer required.
The PR introduces a check to verify whether the **repodata** already
exists before proceeding to create repository metadata.
…6159)

Default empty `allowed_operations` on a cloned VBD means that XenCenter
does not display the DVD option in the console tab for VMs cloned from
templates, for example.

Follow the practice in `xapi_vbd`, and `update_allowed_operations`
immediately after `Db.VBD.create`.

---

I've tested this fix and XC displays the ISO selection on the cloned VM
now, and allowed operations on the cloned VBDs are correct as well.
In some environments the time ranges checked are too strict causing test to
fail.
Previously the maximum error accepted was 10 ms, increase to 50 ms.
Also increase timeouts to reduce error/value ratio.

Signed-off-by: Frediano Ziglio <[email protected]>
Old implementation had different issues.
Just use mutexes and conditions using C stubs.
Current Ocaml Condition module does not have support for timeout waiting
for conditions, so use C stubs.
This implementation does not require opening a pipe. This reduces the
possibilities of errors calling "wait" to zero. Mostly of the times it
does not require kernel calls.
When a host starts, the systemd service xapi-wait-init-complete waiting
on the creation of the xapi init cookie file may fail on timeout for a
matter of seconds. This patch adds 1 minute (300 seconds total) to the
timeout passed to the script xapi-wait-init-complete.
In some environments the time ranges checked are too strict causing test
to fail.
Previously the maximum error accepted was 10 ms, increase to 50 ms.
Also increase timeouts to reduce error/value ratio.
Some plugins may not store the client-set metadata, and return a static value
when replying to the update. This would override the values that a client
used when the SR was created, or set afterwards, which is unexpected.

Now name_label and name_description fields returned by the plugins are ignored
on update.

Current set_name_label and set_name_description rely on the update mechanism to
work. Instead add database call at the end of the methods to ensure both xapi
and the SR backend are synchronized, even when the latter fails to update the
values.

Signed-off-by: Pau Ruiz Safont <[email protected]>
Though the majority of completions were already using set_completions and the
like to add completion suggestions, there were two leftovers still needlessly
changing COMPREPLY themselves. This caused bugs, as in the case of

xe vm-import filename=<TAB>

autocompleting all of the filenames into the prompt instead of presenting
the choice. Let only these functions operate on COMPREPLY directly.

Signed-off-by: Andrii Sultanov <[email protected]>
**xe-cli completion: Use grep -E instead of egrep**

Otherwise newer packages in XS9 issue
"egrep: warning: egrep is obsolescent; using grep -E"
warnings all over the place

---

**xe-cli completion: Hide COMPREPLY manipulation behind functions**

Though the majority of completions were already using set_completions
and the
like to add completion suggestions, there were two leftovers still
needlessly
changing COMPREPLY themselves. This caused bugs, as in the case of

`xe vm-import filename=<TAB>`

autocompleting all of the filenames into the prompt instead of
presenting
the choice. Let only these functions operate on COMPREPLY directly.
…roject#6165)

Some plugins may not store the client-set metadata, and return a static
value
when replying to the update. This would override the values that a
client
used when the SR was created, or set afterwards, which is unexpected.

Now name_label and name_description fields returned by the plugins are
ignored
on update.

Current set_name_label and set_name_description rely on the update
mechanism to
work. Instead, add database call at the end of the methods to ensure
both xapi
and the SR backend are synchronized, even when the latter fails to
update the
values.

Tested on GFS2 tests (JR 4175192), as well as ring3 bvt + bst (209177),
and storage validation tests (SR 209180)
@gangj
Copy link
Contributor Author

gangj commented Dec 11, 2024

[gangj@xenrt1015818055 scm]$ git show
commit 97706ae (HEAD -> private/gangj/easier-pool-join.5-24.11.11-mergeMaster, my_gh/private/gangj/easier-pool-join-mergeMaster)
Merge: deb25d2 d8baca7
Author: Gang Ji [email protected]
Date: Wed Dec 11 10:29:35 2024 +0800

Merge branch 'master' into feature/easier-pool-join

diff --cc ocaml/idl/schematest.ml
index d83bf34775,2c4a87453b..8f87550cc0
--- a/ocaml/idl/schematest.ml
+++ b/ocaml/idl/schematest.ml
@@@ -3,7 -3,7 +3,7 @@@ let hash x = Digest.string x |> Digest.
(* BEWARE: if this changes, check that schema has been bumped accordingly in
ocaml/idl/datamodel_common.ml, usually schema_minor_vsn *)

- let last_known_schema_hash = "aba698bd66b04e0145f07130e6db9cad"
-let last_known_schema_hash = "18df8c33434e3df1982e11ec55d1f3f8"
++let last_known_schema_hash = "ffceac5e586329de3267b9bb958524a7"

let current_schema_hash : string =
let open Datamodel_types in

@gangj gangj merged commit 724ff9c into xapi-project:feature/easier-pool-join Dec 11, 2024
15 checks passed
@gangj gangj deleted the private/gangj/easier-pool-join-mergeMaster branch December 11, 2024 05:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.