Hello Martin,

On the transport side, one example would be Ethernet Pause frames. InfiniBand also has ECN (Explicit Congestion Notification), and I'm sure there are others for other transport mechanisms.
https://cva.stanford.edu/classes/ee382c/research/infiniband_cong.pdf

I don't know how many of these are currently being used in other parts of the stack, but I think that if a modular approach can be created, with a common set of configuration options at the multipathd level, you would do the entire Linux and storage sysadmin ecosystem a huge favour.

I would need to dive into the stacks or seek help from the maintainers of these code-bases to know more of what the options are.

Cheers
Erwin


On Tue, 2021-12-07 at 09:11 +0000, Martin Wilck wrote:
On Tue, 2021-12-07 at 09:19 +1000, Erwin van Londen wrote:
Hello Martin, Muneendra.

As I kicked this discussion off at the beginning of the year, and
seeing that Muneendra and the Broadcom people have come up with the
first iteration, I can only applaud the efforts. On behalf of all
storage and Linux administrators I would say "Thank you".

As for your remark, Martin, my view would be to try to create a
modular approach that the transport layer drivers can hook into to
inform multipathd of any event. The module in multipathd would then
decide based on configured characteristics what the actions should
be. (Take it offline, suspend for X amount of time, introduce X us
delay etc...) That way when more transport methods are used these can
then dynamically be linked into the configuration without having any
impact on other parts of the transport stack. I can imagine that
InfiniBand, Ethernet, SAS and others utilise different transport
characteristics and as such may need to inform the attached hosts of
one or more events. On FC this is FPIN but a similar module may be
written for other transports.

Interesting idea. Are you aware of a technology for non-FC transports
that could take the role of FPIN? I have to admit I'm not, but that
doesn't mean they don't exist or won't exist in the future.

In the first place we'd need to "hook in" an event listener. Like with
Muneendra's patch, we're adding a new class of events that we're
listening to. The events would then be collected and processed by a
separate worker thread (which unlike the listener would take the
multipath lock), setting path states to marginal or back to normal.

I don't think we want to add plug-ins that spawn their own independent
threads, though. That sounds very difficult to handle properly, and we
already have more than enough complexity.

If we want to modularize this, we need a *generic* event listener
thread. A module would basically provide an fd for that thread to poll
on, and a callback to be called when an event occurs. This idea appeals
to me a lot, in particular because we already have an event listener
(the uevent listener thread) which is sitting idle most of the time.
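
To make the fd-plus-callback idea concrete, here is a minimal sketch. It
is written in Python purely for brevity (multipathd itself is C), and
every name in it (EventListener, register, poll_once, the two lists) is
hypothetical, not existing multipath-tools code. Pipes stand in for the
sockets a real module would hand over.

```python
import os
import selectors


class EventListener:
    """A single listener loop; each module contributes an fd and a callback."""

    def __init__(self):
        self._sel = selectors.DefaultSelector()

    def register(self, fd, callback):
        # callback(fd) is invoked whenever fd becomes readable.
        self._sel.register(fd, selectors.EVENT_READ, callback)

    def poll_once(self, timeout=0):
        """Dispatch every event that is ready now; return how many fired."""
        fired = 0
        for key, _mask in self._sel.select(timeout):
            key.data(key.fd)  # key.data holds the module's callback
            fired += 1
        return fired

    def close(self):
        self._sel.close()


# Two hypothetical modules. Each one only appends to its own list and
# never touches path state, so the listener needs no multipath lock.
uevent_list, fpin_list = [], []
uev_r, uev_w = os.pipe()
fpin_r, fpin_w = os.pipe()

listener = EventListener()
listener.register(uev_r, lambda fd: uevent_list.append(os.read(fd, 4096)))
listener.register(fpin_r, lambda fd: fpin_list.append(os.read(fd, 4096)))

# Simulate one FPIN notification arriving; no uevent is pending.
os.write(fpin_w, b"fpin: link integrity")
fired = listener.poll_once(timeout=1)

listener.close()
for fd in (uev_r, uev_w, fpin_r, fpin_w):
    os.close(fd)
```

After the single poll_once() call, only the FPIN callback has run, so
fpin_list holds one entry and uevent_list is still empty.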

So Muneendra, instead of creating a new receiver thread, you would
extend the existing uevent listener to handle the FPIN events as well.
The thread would now add uevents to the uevent list and FPIN events to
the FPIN events list.

Next, we'd also need a generic event consumer, with callbacks for
different types of marginal state handlers. Perhaps this could even be
the uevent trigger thread? The uevent trigger has more work to do than
the uevent listener. But any handler thread that wants to modify path
state would need to take the lock anyway, effectively serializing all
operations. So I guess we might as well use both uevent threads for
"transport event notification" reception and processing, respectively.

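A rough sketch of such a consumer, again in Python for brevity and with
purely hypothetical names (path_state, handlers, consume_pending). The
point it illustrates is that handlers are looked up by event type and
that every state change happens under the one lock, which serializes
them. In a real daemon this function would run in the consumer thread.

```python
import queue
import threading

path_state = {"sda": "normal", "sdb": "normal"}  # hypothetical path table
path_lock = threading.Lock()                     # stands in for the multipath lock
events = queue.Queue()                           # filled by the listener


def fpin_handler(event):
    # Hypothetical marginal-state handler for FPIN-style events.
    path_state[event["path"]] = "marginal" if event["degraded"] else "normal"


# One callback per event type; another transport would add its own entry.
handlers = {"fpin": fpin_handler}


def consume_pending():
    """Drain the queue, taking the lock around each path-state change."""
    handled = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return handled
        handler = handlers.get(event["type"])
        if handler is None:
            continue  # no module registered for this event type
        with path_lock:
            handler(event)
        handled += 1


events.put({"type": "fpin", "path": "sda", "degraded": True})
events.put({"type": "fpin", "path": "sdb", "degraded": True})
events.put({"type": "fpin", "path": "sdb", "degraded": False})
events.put({"type": "unknown", "path": "sda"})
handled = consume_pending()
```

Here sda ends up marginal, sdb goes marginal and back to normal, and the
event of unknown type is dropped without touching any path.
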
We also need to think about whether the currently existing marginal
path handler could fit into this framework. Not so well probably,
because it's not event driven and hooks into check_path(). OTOH,
future mechanisms might hook into check_path(), too, so we'd
need a generic callback there?

Moreover, the existing marginal paths handler has two different modes
of operation, the "classical" one that disables reinstate, and the
more modern one that uses marginal pathgroups. I am wondering whether
we need the first mode in the long run. In particular if we want to
generalize this feature, we may want to get rid of the "classical"
mode altogether. I'm not aware of any distinct advantages of that
algorithm compared to marginal path groups.

@Ben, Muneendra, what do you think?

One word of caution here: we must be careful not to over-engineer.
As long as no other mechanism like FPIN for other transports is
conceivable, generalizing the concept makes only so much sense.
Therefore we shouldn't hold back the FPIN patches until we have
conceived of a generic mechanism, which may take a lot of time to
develop. If another mechanism becomes available, we could still try to
generalize the concept, if we keep the current additions clean and
well-separated from the core multipathd code.

However I am really thrilled by the prospect of generalizing event
handling and reusing the uevent threads for FPIN. That would reduce
complexity a lot, which is a good thing IMO.

@Ben, Muneendra, again, your opinions?

Best
Martin