- Bug fixes:
- Fix handling of unresponsive zeroconf-discovered Discovery Controllers. Sometimes a timeout could last twice as long as normal.
- Set default value of legacy "[Global] persistent-connections=false"
- Add `ControllerTerminator` entity to deal with potential (rare) cases where Connect/Disconnect operations could be performed in reverse order.
- Add more unit tests
- Increase code coverage
- Improve name resolution algorithm
- Set udev event priority to high (for faster handling)
- Bug fixes:
  - Immediately remove existing connection to Discovery Controllers (DC) discovered through zeroconf (mDNS) when added to `exclude=` in `stafd.conf`. Previously, adding DCs to `exclude=` would only take effect on new connections and would not apply to existing connections.
  - When handling "key=value" pairs in the TXT field from Avahi, "keys" need to be case insensitive.
  - Strip spaces from Discovery Log Page Entries (DLPE). Some DCs may append extra spaces to DLPEs (e.g. IP addresses with trailing spaces). The kernel driver does not expect extra spaces and therefore they need to be removed.
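The last two fixes above can be illustrated with a minimal sketch. The function names `parse_txt_record` and `strip_dlpe`, and the sample keys and values, are hypothetical and chosen only for illustration; they are not taken from the `nvme-stas` source.

```python
def parse_txt_record(pairs):
    """Parse TXT "key=value" strings into a dict keyed by lower-cased key.

    Lower-casing the key makes later lookups case insensitive.
    """
    txt = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            continue  # ignore entries that are not key=value pairs
        txt[key.strip().lower()] = value.strip()
    return txt


def strip_dlpe(dlpe):
    """Return a copy of a DLPE dict with whitespace stripped from string fields."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in dlpe.items()}
```

With this approach, a TXT entry written as `NQN=...` and one written as `nqn=...` end up under the same key, and a `traddr` reported as `"192.168.56.1   "` is cleaned before being used.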
- In `stafd.conf` and `stacd.conf`, added new configuration parameters to provide parity with `nvme-cli`: `nr-io-queues`, `nr-write-queues`, `nr-poll-queues`, `queue-size`, `reconnect-delay`, `ctrl-loss-tmo`, `duplicate-connect`, `disable-sqflow`
- Changes to `stafd.conf`:
  - Move `persistent-connections` from the `[Global]` section to a new section named `[Discovery controller connection management]`. `persistent-connections` will still be recognized from the `[Global]` section, but will be deprecated over time.
  - Add new configuration parameter `zeroconf-connections-persistence` to section `[Discovery controller connection management]`. This parameter makes it possible to age out Discovery Controllers discovered through zeroconf (mDNS) when they are no longer reachable and should be purged from the configuration.
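The fallback from the new section to the legacy `[Global]` location could look roughly like the sketch below. It uses Python's standard `configparser`; the helper name `persistent_connections` is hypothetical, and the lookup order (new section first, then legacy) is an assumption about how such a fallback would be arranged, not a description of the actual `stafd` code.

```python
import configparser

NEW_SECTION = "Discovery controller connection management"


def persistent_connections(cfg, default=False):
    """Look up persistent-connections, preferring the new section.

    Falls back to the legacy [Global] location, then to `default`.
    """
    if cfg.has_option(NEW_SECTION, "persistent-connections"):
        return cfg.getboolean(NEW_SECTION, "persistent-connections")
    # Legacy location, still recognized but deprecated
    return cfg.getboolean("Global", "persistent-connections", fallback=default)
```

A configuration that only sets the legacy option keeps working, while a value in the new section takes precedence when both are present.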
- Added more configuration validation to identify invalid Sections and Options in configuration files (`stafd.conf` and `stacd.conf`).
- Improve dependencies in meson build environment so that missing subprojects won't prevent distros from packaging `nvme-stas` (i.e. needed when invoking meson with the `--wrap-mode=nodownload` option).
- Improve Read-The-Docs documentation format.
Because of incompatibilities between 1.1.6 and 1.2 (ref. `sticky-connections`), it was decided to skip release 1.2 and have a 2.0 release instead. Release 2.0 contains everything listed in 1.2 (below) plus the following:
- Add support for PLEO - Port-Local Entries Only, see TP8010.
  - Add new configuration parameter to `stafd.conf`: `pleo=[enabled|disabled]`
  - This requires `libnvme` 1.2 or later, although nvme-stas can still operate with 1.1 (but PLEO will not be supported).
- Although `blacklist=` is deprecated, keep supporting it for a while.
- Target `udev-rule=` at TCP connections only.
- Read-the-docs will now build directly from source (instead of using a possibly stale copy)
- More unit tests were added
- Refactored the code that handles pyudev events in an effort to fix spurious lost events.
- In `stacd.conf`, add a new configuration section, `[I/O controller connection management]`.
  - This is to replace `sticky-connections` by `disconnect-scope` and `disconnect-trtypes`, which is needed so that hosts can better react to Fabric Zoning changes at the CDC.
  - Add `connect-attempts-on-ncc` to control how stacd will react to the NCC bit (Not Connected to CDC).
- When the host's symbolic name is changed in `sys.conf`, allow re-issuing the DIM command (register with DC) on a `reload` signal (`systemctl reload stafd`).

- Replace `blacklist=` by `exclude=` in `stafd.conf` and `stacd.conf`. Warning: this may create an incompatibility for people that were using `blacklist=`. They will need to manually migrate their configuration files.

- Change `TID.__eq__()` and `TID.__ne__()` to recognize a TID object even when the `host-iface` is not set. This is to fix system audits where `nvme-stas` would not recognize connections made by `nvme-cli`.

  The TID object, or Transport ID, contains all the parameters needed to establish a connection with a controller, e.g. (`trtype`, `traddr`, `trsvcid`, `nqn`, `host-traddr`, and `host-iface`). `nvme-stas` can scan the `sysfs` (`/sys/class/nvme/`) to find existing NVMe connections. It relies on the `address` and other attributes for that. For example, the attribute `/sys/class/nvme/nvme0/address` may contain something like: `traddr=192.168.56.1,trsvcid=8009,host_iface=enp0s8`.

  `nvme-stas` always specifies the `host-iface` when making connections, but `nvme-cli` typically does not. Instead, `nvme-cli` relies on the routing table to select the interface. This creates a discrepancy between the `address` attribute of connections made by `nvme-cli` and those made by `nvme-stas` (i.e. `host_iface=` is missing for `nvme-cli` connections). As a result, `nvme-stas` is not able to recognize connections made by `nvme-cli`. Two solutions have been proposed to work around this problem:

  - First, a short term solution changes `TID.__eq__()` and `TID.__ne__()` so that the `host-iface` has a lesser weight when comparing two TIDs. This way, the TID of a connection created by `nvme-cli` can be compared to the TID of a connection made with `nvme-stas` and still result in a match. The downside to this approach is that a connection made with `nvme-cli` that is going over the wrong interface (e.g. bad routing table entry) will now be accepted by `nvme-stas` as a valid connection.
  - Second, a long term solution that involves a change to the kernel NVMe driver will make it possible to determine the host interface for any NVMe connection, even those made without specifying the `host-iface` parameter. The kernel driver will now expose the source address of all NVMe connections through the `sysfs`. This will be identified by the key=value pair "`src-addr=[ip-address]`" in the `address` attribute. From the source address one can infer the actual host interface. This will solve the shortcomings of the "short term" solution discussed above. Unfortunately, it may take several months before this kernel addition is available in a stock Distribution OS, so the short term solution will need to suffice for now.
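The short term solution can be sketched as follows. This is a simplified, hypothetical `TID` class written for illustration only. It interprets "lesser weight" as "ignore `host-iface` unless both sides specify it", which is an assumption; the real `nvme-stas` implementation may weigh the field differently.

```python
class TID:
    """Simplified Transport ID used only to illustrate the comparison."""

    def __init__(self, trtype, traddr, trsvcid, nqn, host_traddr="", host_iface=""):
        self.trtype = trtype
        self.traddr = traddr
        self.trsvcid = trsvcid
        self.nqn = nqn
        self.host_traddr = host_traddr
        self.host_iface = host_iface

    def _key(self):
        # Fields that must always match; host_iface is deliberately left out
        return (self.trtype, self.traddr, self.trsvcid, self.nqn, self.host_traddr)

    def __eq__(self, other):
        if not isinstance(other, TID):
            return NotImplemented
        if self._key() != other._key():
            return False
        # host_iface only takes part in the comparison when both sides set it
        if self.host_iface and other.host_iface:
            return self.host_iface == other.host_iface
        return True

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Hash on the mandatory fields so equal objects share a hash
        return hash(self._key())
```

Note that an equality defined this way is not transitive: a TID without `host-iface` matches two TIDs that differ only in `host-iface`, even though those two do not match each other. That is the same trade-off as the downside described above.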
- Fix issues with I/O controller connection audits
- Eliminate pcie devices from list of I/O controller connections to audit
- Add soaking timer to work around race condition between kernel and user-space applications on "add" uevents. When the kernel adds a new nvme device (e.g. `/dev/nvme7`) and sends an "add" uevent to notify user-space applications, the attributes associated with that device (e.g. `/sys/class/nvme/nvme7/cntrltype`) may not be fully initialized, which can lead `stacd` to dismiss a device that should get audited.
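The race described above can be illustrated with a small sketch. It is not the soaking timer used by `stacd`: instead of a timer in an event loop, it uses a plain blocking poll, which is a simplification chosen to keep the example self-contained. The function name `read_attr_when_ready` and its default timings are hypothetical.

```python
import time
from pathlib import Path


def read_attr_when_ready(path, timeout=1.0, interval=0.05):
    """Poll a sysfs-style attribute until it has content or the timeout expires.

    Returns the stripped attribute value, or None if it never became available.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = Path(path).read_text().strip()
        except OSError:
            value = ""  # attribute not created yet or not readable
        if value:
            return value
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

The idea in both cases is the same: give the kernel a short grace period after the "add" uevent before deciding that a device lacks the attributes needed for the audit.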
- Make `sticky-connections=enabled` the default (see `stacd.conf`)
- Fix issues introduced in 1.1.3 when enabling Fibre Channel (FC) support.
- Eliminate pcie devices from discovery log pages. When enabling FC, pcie was accidentally enabled as well.
- Fix I/O controller scan and detect algorithm. Again, while adding support for FC, the I/O scan & detect algorithm was modified, but we accidentally made it detect Discovery Controllers as well as I/O controllers.
- Fix issues for Fibre Channel (FC) support.
- Add TESTING.md
stacd: Add I/O controller connection audits. Audits are enabled when the configuration parameter "`sticky-connections`" is disabled.
stafd: Preserve and reload last known configuration on restarts. This is for warm restarts of the `stafd` daemon. This does not apply to system reboots (cold restarts). This is needed to avoid deleting I/O controller (IOC) connections by mistake when restarting `stafd`. It prevents momentarily losing previously acquired Discovery Log Page Entries (DLPE). Since `stacd` relies on acquired DLPEs to determine which connections should be created or deleted, it's important that the list of DLPEs survives a `stafd` restart. Eventually, after `stafd` has restarted and reconnected to all Discovery Controllers (DC), the list will get refreshed and the DLPE cache will get updated. As the cache gets updated, `stacd` will be able to determine which connections should remain and which ones should get deleted.
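A minimal sketch of preserving a DLPE list across a warm restart is shown below. The JSON format, the function names `save_dlpe_cache` and `load_dlpe_cache`, and the file location are all assumptions made for illustration; the source does not say how `stafd` stores its last known configuration.

```python
import json
import os
import tempfile


def save_dlpe_cache(path, dlpes):
    """Write the DLPE list to a temporary file, then rename it into place.

    The rename keeps a restart from ever seeing a half-written cache.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(dlpes, f)
    os.replace(tmp, path)


def load_dlpe_cache(path):
    """Return the cached DLPE list, or an empty list if no usable cache exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except (OSError, ValueError):
        return []
```

On a warm restart the daemon would load the cache first and serve it until the Discovery Controllers have been contacted again and the list is refreshed.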
`stafd`/`stacd`: Fixed crash caused by `stafd`/`stacd` calling the wrong callback function during the normal disconnect of a controller. There are two callback functions that can be called after a controller is disconnected, but one of them must only be called on a final disconnect just before the process (`stafd` or `stacd`) exits. The wrong callback was being called on a normal disconnect, which led the process to think it was shutting down.
stacd: Bug fix. Check that self._cfg_soak_tmr is not None before dereferencing it.
Make `sticky-connections=disabled` the default (see `stacd.conf`)
- Add `udev-rule` configuration parameter to `stacd.conf`.
- Add `sticky-connections` configuration parameter to `stacd.conf`.
- Add coverage testing (`make coverage`)
- Add `make uninstall`
- To `README.md`, add mDNS troubleshooting section.
- Install staslib as pure python package instead of arch-specific.
- First public release following TP8009 / TP8010 ratification and publication.
- Initial release