* [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
From: Erwin van Londen @ 2021-03-23  7:52 UTC (permalink / raw)
  To: dm-devel



Hello All,

Just wondering if there were any plans to incorporate FPIN
congestion/latency notifications in dm-multipath to disperse IO over
non-affected paths.

-- 
Kind regards,

Erwin van Londen
http://erwinvanlonden.net
PGP key: http://erwinvanlonden.net/pgp-key-id/





* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
From: Benjamin Block @ 2021-03-25 16:07 UTC (permalink / raw)
  To: Erwin van Londen; +Cc: Muneendra, dm-devel

On Tue, Mar 23, 2021 at 05:52:33PM +1000, Erwin van Londen wrote:
> Hello All,
> 
> Just wondering if there were any plans to incorporate FPIN
> congestion/latency notifications in dm-multipath to disperse IO over
> non-affected paths.
> 

For what it's worth, general support in the kernel for a new path state
in response to existing FPIN notifications was added earlier this year:
https://lore.kernel.org/linux-scsi/1609969748-17684-1-git-send-email-muneendra.kumar@broadcom.com/T/

But this only adds a new port state, and support for it in one
particular driver (lpfc). I'm not aware of any other driver supporting
this new state yet, but I might have missed it. Also, the port state is
not set in the kernel; it has to be set by something external, unlike
with RSCNs, where we set the state in the kernel.

What it does, once a path is set to the 'Marginal' state, is avoid
retrying commands on the same shaky path after they have already
failed there once.
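
For illustration, the new state is driven through the rport's sysfs
'port_state' attribute. A minimal sketch -- the rport instance name
rport-0:0-1 is made up, pick a real one from
/sys/class/fc_remote_ports/, and this only works with the series above
applied, which makes the attribute writable:

/* set-marginal.c: flag one FC remote port as 'Marginal' */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *attr =
		"/sys/class/fc_remote_ports/rport-0:0-1/port_state";
	int fd = open(attr, O_WRONLY);

	if (fd < 0) {
		perror(attr);
		return 1;
	}
	/* as far as I can see, writing "Online" flips it back */
	if (write(fd, "Marginal", strlen("Marginal")) < 0)
		perror("write");
	close(fd);
	return 0;
}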

As far as dm-multipath is concerned, I asked about that as well when
this patch series was developed:
https://lore.kernel.org/linux-scsi/20201002162633.GA8365@t480-pf1aa2c2/
Hannes answered in the thread:
https://lore.kernel.org/linux-scsi/ca995d96-608b-39b9-8ded-4a6dd7598660@suse.de/

Not sure what happened since; I haven't seen anything on the mpath
topic yet.


-- 
Best Regards, Benjamin Block  / Linux on IBM Z Kernel Development / IBM Systems
IBM Deutschland Research & Development GmbH    /    https://www.ibm.com/privacy
Vorsitz. AufsR.: Gregor Pillen         /        Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
  2021-03-25 16:07 ` Benjamin Block
@ 2021-03-26 11:15   ` Muneendra Kumar M
  2021-03-31  0:22     ` Erwin van Londen
  0 siblings, 1 reply; 17+ messages in thread
From: Muneendra Kumar M @ 2021-03-26 11:15 UTC (permalink / raw)
  To: Benjamin Block, Erwin van Londen; +Cc: dm-devel



Hi Benjamin,
My replies are below.


On Tue, Mar 23, 2021 at 05:52:33PM +1000, Erwin van Londen wrote:
> > Hello All,
> >
> > Just wondering if there were any plans to incorporate FPIN
> > congestion/latency notifications in dm-multipath to disperse IO over
> > non-affected paths.

> For what it's worth, general support in the kernel for a new path
> state in response to existing FPIN notifications was added earlier
> this year:
> https://lore.kernel.org/linux-scsi/1609969748-17684-1-git-send-email-muneendra.kumar@broadcom.com/T/

> But this only adds a new port state, and support for it in one
> particular driver (lpfc). I'm not aware of any other driver
> supporting this new state yet, but I might have missed it. Also, the
> port state is not set in the kernel; it has to be set by something
> external, unlike with RSCNs, where we set the state in the kernel.

We had a discussion with Marvell, and they are adding the support in
their (qla2xxx) driver.

> What it does, once a path is set to the 'Marginal' state, is avoid
> retrying commands on the same shaky path after they have already
> failed there once.

Yes.

> As far as dm-multipath is concerned, I asked about that as well when
> this patch series was developed:
> https://lore.kernel.org/linux-scsi/20201002162633.GA8365@t480-pf1aa2c2/
> Hannes answered in the thread:
> https://lore.kernel.org/linux-scsi/ca995d96-608b-39b9-8ded-4a6dd7598660@suse.de/

> Not sure what happened since; I haven't seen anything on the mpath
> topic yet.

As Hannes mentioned in his reply, we have an external daemon called
fctxpd which acts on FPIN-LI events, moves the affected paths into the
marginal path group, and sets the port state to marginal.
This daemon is part of EPEL 8. The sources with our changes are here:
https://github.com/brocade/bsn-fc-txptd

The code was reviewed by Benjamin Marzinski from Red Hat.

Note: the latest release, which adds the support for setting the port
state to marginal, will be available in EPEL 8 in about a week.

As we now have all the support in the kernel for FPIN registration and
notifications, and also for setting the port_state to marginal, we had
an initial discussion with Hannes about adding native FPIN
congestion/latency support to dm-multipath/multipathd. I will take the
initiative, start the discussion with Benjamin Marzinski, and get this
work done with the help of Hannes.




Regards,
Muneendra.



* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
  2021-03-26 11:15   ` Muneendra Kumar M
@ 2021-03-31  0:22     ` Erwin van Londen
  2021-03-31  7:25       ` Hannes Reinecke
  0 siblings, 1 reply; 17+ messages in thread
From: Erwin van Londen @ 2021-03-31  0:22 UTC (permalink / raw)
  To: Muneendra Kumar M, Benjamin Block; +Cc: dm-devel



Hello Muneendra, Benjamin,

The FPIN notifications as developed cover a whole plethora of
conditions and do not merely flag paths as being in a marginal state.
The MPIO layer could use the various triggers, like congestion and
latency, and not just a marginal state as the decisive point. If a path
is somewhat congested, the number of IOs dispersed over it could simply
be reduced by a flexible margin, depending on how often and which FPINs
are actually received. If, for instance, an FPIN is received saying
that an upstream port is throwing physical errors, you may exclude that
path entirely from IO queueing. If it is a latency-related problem
where credit shortages come into play, you may just need to queue very
small IOs to it; the SCSI CDB will tell you the size of the IO.
Congestion notifications could be used to add an artificial delay,
reducing the workload on the affected paths and scheduling it on
others.

I'm not really sure what the possibilities are from a dm-multipath
viewpoint, but I feel that if the OS options are not properly aligned
with what the FC protocol and the HBA drivers can provide, we may miss
a good opportunity to optimize the dispersion of IOs and improve
overall performance.

Regards,
Erwin

On Fri, 2021-03-26 at 16:45 +0530, Muneendra Kumar M wrote:
> [...]


* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
  2021-03-31  0:22     ` Erwin van Londen
@ 2021-03-31  7:25       ` Hannes Reinecke
  2021-03-31  8:12         ` Erwin van Londen
  2021-03-31  9:57         ` Martin Wilck
  0 siblings, 2 replies; 17+ messages in thread
From: Hannes Reinecke @ 2021-03-31  7:25 UTC (permalink / raw)
  To: Erwin van Londen, Muneendra Kumar M, Benjamin Block
  Cc: dm-devel, Martin Wilck

Hi Erwin,

On 3/31/21 2:22 AM, Erwin van Londen wrote:
> Hello Muneendra, Benjamin,
> 
> The FPIN notifications as developed cover a whole plethora of
> conditions and do not merely flag paths as being in a marginal state.
> The MPIO layer could use the various triggers, like congestion and
> latency, and not just a marginal state as the decisive point. If a
> path is somewhat congested, the number of IOs dispersed over it could
> simply be reduced by a flexible margin, depending on how often and
> which FPINs are actually received. If, for instance, an FPIN is
> received saying that an upstream port is throwing physical errors,
> you may exclude that path entirely from IO queueing. If it is a
> latency-related problem where credit shortages come into play, you
> may just need to queue very small IOs to it; the SCSI CDB will tell
> you the size of the IO. Congestion notifications could be used to add
> an artificial delay, reducing the workload on the affected paths and
> scheduling it on others.
> 
As correctly noted, FPINs come with a variety of options.
And I'm not certain we can handle everything correctly; a degraded path
is simple, but for congestion there is only _so_ much we can do.
The typical cause of congestion is, say, a 32G host port talking to a
16G (or even 8G) target port _and_ a 32G target port.

So the host cannot 'tune down' its link to 8G; doing so would impact
performance on the 32G target port.
(And we would suffer reverse congestion whenever that target port sends
frames.)

And throttling things on the SCSI layer only helps _so_ much, as the
real congestion is due to the speed with which the frames are sequenced
onto the wire, which is not something we in the OS can control.

From another POV this is arguably a fabric mis-design; so it _could_ be
alleviated by separating out the ports with lower speeds into their own
zone (or even onto a separate SAN); that would trivially make the
congestion go away.

But for that the admin first has to be _alerted_, and this really is my
primary goal: having FPINs show up in the message log, to alert the
admin that his fabric is not performing well.

A second step will be to massage FPINs into dm-multipath and have them
influence the path priority or path status. But how this could best be
integrated is currently under discussion.

> I'm not really sure what the possibilities are from a dm-multipath
> viewpoint, but I feel that if the OS options are not properly aligned
> with what the FC protocol and the HBA drivers can provide, we may
> miss a good opportunity to optimize the dispersion of IOs and improve
> overall performance.
> 
Looking at the size of the commands is one possibility, but at this
time that presumes too much about how we _think_ FPINs will be
generated. I'd rather do some more tests to figure out under which
circumstances we can expect which types of FPINs, and then start
looking for ways to integrate them.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer



* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
From: Erwin van Londen @ 2021-03-31  8:12 UTC (permalink / raw)
  To: Hannes Reinecke, Muneendra Kumar M, Benjamin Block
  Cc: dm-devel, Wilck, Martin



Hello Hannes,

Thanks for responding.

On Wed, 2021-03-31 at 09:25 +0200, Hannes Reinecke wrote:
> Hi Erwin,
> 
> On 3/31/21 2:22 AM, Erwin van Londen wrote:
> > [...]
> > 
> As correctly noted, FPINs come with a variety of options.
> And I'm not certain we can handle everything correctly; a degraded
> path is simple, but for congestion there is only _so_ much we can do.
> The typical cause of congestion is, say, a 32G host port talking to a
> 16G (or even 8G) target port _and_ a 32G target port.

Congestion can also be caused by a change in workload characteristics
where, for example, read and write workloads start interfering. The
funnel principle would not apply in that case.

> So the host cannot 'tune down' its link to 8G; doing so would impact
> performance on the 32G target port.
> (And we would suffer reverse congestion whenever that target port
> sends frames.)
> 
> And throttling things on the SCSI layer only helps _so_ much, as the
> real congestion is due to the speed with which the frames are
> sequenced onto the wire, which is not something we in the OS can
> control.

If you can interleave IOs with an artificial delay, depending on the
type and frequency of the FPINs that arrive, you would be able to
prevent latency buildup in the SAN.

> From another POV this is arguably a fabric mis-design; so it _could_
> be alleviated by separating out the ports with lower speeds into
> their own zone (or even onto a separate SAN); that would trivially
> make the congestion go away.

The entire FPIN concept was designed to provide clients with the option
to respond and react to changing behaviour in SANs. A mis-design is
often not really the cause; ongoing changes and continuous provisioning
contribute the most.

> But for that the admin first has to be _alerted_, and this really is
> my primary goal: having FPINs show up in the message log, to alert
> the admin that his fabric is not performing well.

I think the FC drivers already have facilities to do that, or they will
have them shortly. dm-multipath is not really required to handle the
notifications itself, but it would be useful if actions were taken
based on FPINs.

> A second step will be to massage FPINs into dm-multipath and have
> them influence the path priority or path status. But how this could
> best be integrated is currently under discussion.
OK
> 
> > I'm not really sure what the possibilities are from a dm-multipath
> > viewpoint, but I feel that if the OS options are not properly
> > aligned with what the FC protocol and the HBA drivers can provide,
> > we may miss a good opportunity to optimize the dispersion of IOs
> > and improve overall performance.
> 
> Looking at the size of the commands is one possibility, but at this
> time that presumes too much about how we _think_ FPINs will be
> generated. I'd rather do some more tests to figure out under which
> circumstances we can expect which types of FPINs, and then start
> looking for ways to integrate them.

The FC protocol only describes the framework, not the values that need
to be adhered to. Those depend on the end devices and their
capabilities.
> 
> Cheers,
> 
> Hannes


* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
From: Martin Wilck @ 2021-03-31  9:57 UTC (permalink / raw)
  To: erwin, muneendra.kumar, bblock, hare; +Cc: dm-devel

On Wed, 2021-03-31 at 09:25 +0200, Hannes Reinecke wrote:
> [...]
> 
> A second step will be to massage FPINs into dm-multipath and have
> them influence the path priority or path status. But how this could
> best be integrated is currently under discussion.

If there was any discussion, I haven't been involved :-) 

I haven't looked into FPIN much so far. I'm rather sceptical about its
usefulness for dm-multipath. Being a property of FC-2, FPIN works at
least two layers below dm-multipath, and dm-multipath is agnostic to
protocol and transport properties by design. User-space multipathd can
cross these layers and tune dm-multipath based on lower-level
properties, but such actions have rather large latencies.

As you know, dm-multipath has 3 switches for routing IO via different
paths:

 1. priority groups,
 2. path status (good / failed),
 3. the path selector algorithm.

1) and 2) are controlled by user space, and have high latency.

The current "marginal" concept in multipathd watches paths for repeated
failures, and configures the kernel to avoid using paths that are
considered marginal, using methods 1) and 2). This is a very-high-
latency algorithm that changes state on the time scale of minutes.
There is no concept of "delaying" or "pausing" IO on paths on a short
time scale.

The only low-latency mechanism is 3). But it works on the block level;
no existing selector looks at transport-level properties.

That said, I can quite well imagine a feedback mechanism based on
throttling or delays applied in the FC drivers. For example, if a
remote port was throttled by the driver in response to FPIN messages,
its bandwidth would decrease, and a path selector like "service-time"
would automatically assign less IO to such paths. This wouldn't need
any changes in dm-multipath or multipath-tools; it would work entirely
on the FC level.
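
(For reference, my reading of drivers/md/dm-service-time.c is that the
selector estimates, per path,

    service_time = (in_flight_size + incoming_bytes) / relative_throughput

and picks the path with the smallest value. So anything that inflates a
path's in-flight size, or lowers its configured relative_throughput,
automatically steers IO away from that path.)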

Talking about improving the current "marginal" algorithm in multipathd,
and knowing that it's slow, FPIN might provide additional data
that would be good to have. Currently, multipathd only has 2 inputs,
"good<->bad" state transitions based either on kernel I/O errors or
path checker results, and failure statistics from multipathd's internal
"io_err_stat" thread, which only reads sector 0. This could obviously
be improved, but there may actually be lower-hanging fruit than
evaluating FPIN notifications (for example, I've pondered utilizing the
kernel's blktrace functionality to detect unusually long IO latencies
or bandwidth drops).

Talking about FPIN, is it planned to notify user space about such
fabric events, and if yes, how?

Thanks,
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix Imendörffer




* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
From: Muneendra Kumar M @ 2021-03-31 10:48 UTC (permalink / raw)
  To: Martin Wilck, erwin, bblock, hare; +Cc: dm-devel



Hi Martin,
Below are my replies.


> If there was any discussion, I haven't been involved :-)
> 
> I haven't looked into FPIN much so far. I'm rather sceptical about
> its usefulness for dm-multipath. Being a property of FC-2, FPIN works
> at least two layers below dm-multipath, and dm-multipath is agnostic
> to protocol and transport properties by design. User-space multipathd
> can cross these layers and tune dm-multipath based on lower-level
> properties, but such actions have rather large latencies.
> 
> As you know, dm-multipath has 3 switches for routing IO via different
> paths:
> 
>  1. priority groups,
>  2. path status (good / failed),
>  3. the path selector algorithm.
> 
> 1) and 2) are controlled by user space, and have high latency.
> 
> The current "marginal" concept in multipathd watches paths for
> repeated failures, and configures the kernel to avoid using paths
> that are considered marginal, using methods 1) and 2). This is a
> very-high-latency algorithm that changes state on the time scale of
> minutes. There is no concept of "delaying" or "pausing" IO on paths
> on a short time scale.
> 
> The only low-latency mechanism is 3). But it works on the block
> level; no existing selector looks at transport-level properties.
> 
> That said, I can quite well imagine a feedback mechanism based on
> throttling or delays applied in the FC drivers. For example, if a
> remote port was throttled by the driver in response to FPIN messages,
> its bandwidth would decrease, and a path selector like "service-time"
> would automatically assign less IO to such paths. This wouldn't need
> any changes in dm-multipath or multipath-tools; it would work
> entirely on the FC level.

[Muneendra] Agreed.

> Talking about improving the current "marginal" algorithm in
> multipathd, and knowing that it's slow, FPIN might provide additional
> data that would be good to have. Currently, multipathd only has 2
> inputs, "good<->bad" state transitions based either on kernel I/O
> errors or path checker results, and failure statistics from
> multipathd's internal "io_err_stat" thread, which only reads sector
> 0. This could obviously be improved, but there may actually be
> lower-hanging fruit than evaluating FPIN notifications (for example,
> I've pondered utilizing the kernel's blktrace functionality to detect
> unusually long IO latencies or bandwidth drops).
> 
> Talking about FPIN, is it planned to notify user space about such
> fabric events, and if yes, how?

[Muneendra] Yes. FC drivers, on receiving FC FPIN ELSes, call a SCSI
transport routine with the FPIN payload. The transport pushes this out
as an "event" via netlink. An app bound to the local address used by
the SCSI transport can receive the event and parse it.
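
For illustration, a minimal sketch of such a listener, using the uapi
headers as I read them (FCH_EVT_LINK_FPIN lives in the non-exported
scsi/scsi_transport_fc.h, so its value is mirrored here; please
double-check the constants before relying on this):

#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <scsi/scsi_netlink.h>          /* SCSI_NL_GRP_FC_EVENTS */
#include <scsi/scsi_netlink_fc.h>       /* struct fc_nl_event */

#define FCH_EVT_LINK_FPIN 0x501 /* mirrored from scsi_transport_fc.h */

int main(void)
{
	struct sockaddr_nl sa = {
		.nl_family = AF_NETLINK,
		.nl_groups = SCSI_NL_GRP_FC_EVENTS,
	};
	char buf[8192];
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_SCSITRANSPORT);

	if (fd < 0 || bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		perror("netlink");
		return 1;
	}
	for (;;) {
		ssize_t len = recv(fd, buf, sizeof(buf), 0);
		struct nlmsghdr *nlh;

		if (len < 0)
			break;
		for (nlh = (struct nlmsghdr *)buf; NLMSG_OK(nlh, len);
		     nlh = NLMSG_NEXT(nlh, len)) {
			struct fc_nl_event *ev = NLMSG_DATA(nlh);

			/* the FPIN ELS payload follows the event header */
			if (ev->event_code == FCH_EVT_LINK_FPIN)
				printf("FPIN on host%u, %u payload bytes\n",
				       ev->host_no, ev->event_datalen);
		}
	}
	close(fd);
	return 0;
}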

Benjamin has added a marginal path group (multipath marginal
pathgroups) to dm-multipath:
https://patchwork.kernel.org/project/dm-devel/cover/1564763622-31752-1-git-send-email-bmarzins@redhat.com/

One of the intentions of Benjamin's patch (support for marginal paths)
is to support the FPIN events we receive from the fabric.
On receiving an FPIN-LI, our intention is to place all the affected
paths into the marginal path group.

Below are the 4 types of descriptors returned in an FPIN (a sketch of
walking these descriptors follows the list):

- Link Integrity (LI): some error on a link that affected frames; this
  is the main one for a "flaky path".
- Delivery Notification (DN): something explicitly knew about a dropped
  frame and is reporting it. Usually, something like a CRC error says
  you can't trust the frame header, so it's an LI error. But if you
  have a valid frame and drop it anyway, such as on a fabric edge timer
  (don't queue it for more than 250-600 ms), then it becomes a DN type.
  Could be a flaky path, but not necessarily.
- Congestion (CN): the fabric is saying it's congested sending to
  "your" port. Meaning, if a host receives it, the fabric is saying it
  has more frames for the host than the host is pulling in, so they
  are backing up the fabric. What should happen is that the load
  generated by the host is lowered, but it's across all targets, and
  not all targets are necessarily in the mpio path list.
- Peer Congestion (PCN): this goes along with CN, in that the fabric is
  now telling the other devices in the zone that send traffic to the
  congested port that that port is backing up. The idea is that these
  peers send less load to the congested port. It shouldn't really tie
  into mpio; some of the current thinking is that targets could see
  this and reduce their transmission rate to a host down to the link
  speed of the host.
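
For illustration, these arrive as TLV descriptors in the FPIN ELS; a
rough sketch of walking them, with the tag values as I read them in
include/uapi/scsi/fc/fc_els.h (please re-check the values there):

#include <stdint.h>
#include <stdio.h>
#include <arpa/inet.h>  /* ntohl(); FC wire format is big-endian */

#define ELS_DTAG_LNK_INTEGRITY  0x00020001      /* LI */
#define ELS_DTAG_DELIVERY       0x00020002      /* DN */
#define ELS_DTAG_PEER_CONGEST   0x00020003      /* PCN */
#define ELS_DTAG_CONGESTION     0x00020004      /* CN */

struct fc_tlv_desc {
	uint32_t desc_tag;      /* big-endian on the wire */
	uint32_t desc_len;      /* length of desc_value, in bytes */
	uint8_t  desc_value[];
};

static void walk_fpin_descs(const uint8_t *descs, uint32_t total_len)
{
	while (total_len >= sizeof(struct fc_tlv_desc)) {
		const struct fc_tlv_desc *tlv = (const void *)descs;
		uint32_t tag = ntohl(tlv->desc_tag);
		uint32_t adv = sizeof(*tlv) + ntohl(tlv->desc_len);

		switch (tag) {
		case ELS_DTAG_LNK_INTEGRITY:
			printf("FPIN-LI: candidate for 'marginal'\n");
			break;
		case ELS_DTAG_CONGESTION:
			printf("FPIN-CN: reduce host load\n");
			break;
		case ELS_DTAG_DELIVERY:
		case ELS_DTAG_PEER_CONGEST:
		default:
			printf("FPIN tag 0x%08x\n", tag);
			break;
		}
		if (adv > total_len)
			break;  /* malformed/truncated descriptor */
		descs += adv;
		total_len -= adv;
	}
}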

On receiving congestion notifications, our intention is to slow down
the workload from the host gradually, until the host stops receiving
the congestion notifications.
We still need to work out how we can achieve this decrease in workload
with the help of dm-multipath.

As Hannes mentioned in his earlier mail, our primary goal is that the
admin first be _alerted_: having FPINs show up in the message log
alerts the admin that his fabric is not performing well.


Regards,
Muneendra.



* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
From: Martin Wilck @ 2021-03-31 11:45 UTC (permalink / raw)
  To: erwin, muneendra.kumar, bblock, hare; +Cc: dm-devel

On Wed, 2021-03-31 at 16:18 +0530, Muneendra Kumar M wrote:
> 
> > Talking about FPIN, is it planned to notify user space about such
> > fabric events, and if yes, how?
> 
> [Muneendra] Yes. FC drivers, on receiving FC FPIN ELSes, call a SCSI
> transport routine with the FPIN payload. The transport pushes this
> out as an "event" via netlink. An app bound to the local address used
> by the SCSI transport can receive the event and parse it.
> 
> Benjamin has added a marginal path group (multipath marginal
> pathgroups) to dm-multipath:
> https://patchwork.kernel.org/project/dm-devel/cover/1564763622-31752-1-git-send-email-bmarzins@redhat.com/
> 
> One of the intentions of Benjamin's patch (support for marginal
> paths) is to support the FPIN events we receive from the fabric.
> On receiving an FPIN-LI, our intention is to place all the affected
> paths into the marginal path group.
> 
> 

I'm aware of Ben's work, but I hadn't realized it had anything to do
with FPIN. As of today, multipathd doesn't listen on the
NETLINK_SCSITRANSPORT socket. Does any user space tool do this?
Google didn't show me anything.

Regards,
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix Imendörffer




* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
From: Hannes Reinecke @ 2021-03-31 11:53 UTC (permalink / raw)
  To: Martin Wilck, erwin, muneendra.kumar, bblock; +Cc: dm-devel

On 3/31/21 1:45 PM, Martin Wilck wrote:
> On Wed, 2021-03-31 at 16:18 +0530, Muneendra Kumar M wrote:
>> [...]
> 
> I'm aware of Ben's work, but I hadn't realized it had anything to do
> with FPIN. As of today, multipathd doesn't listen on the
> NETLINK_SCSITRANSPORT socket. Does any user space tool do this?
> Google didn't show me anything.
> 
I did, once, but that was years ago.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer



* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
From: Muneendra Kumar M @ 2021-03-31 11:57 UTC (permalink / raw)
  To: Hannes Reinecke, Martin Wilck, erwin, bblock; +Cc: dm-devel



Hi Martin,

>> I'm aware of Ben's work, but I hadn't realized it had anything to do
>> with FPIN. As of today, multipathd doesn't listen on the
>> NETLINK_SCSITRANSPORT socket. Does any user space tool do this?
>> Google didn't show me anything.
> 
> I did, once, but that was years ago.

We have a user-space daemon (Broadcom's Fibre Channel Transport Daemon)
called fctxpd (the one Benjamin was talking about in his patch) which
acts on FPIN-LI events by listening on the NETLINK_SCSITRANSPORT
socket, and it moves paths into the marginal path group on receiving
FPIN events. This daemon is part of EPEL 8.
The sources with our changes are here:
https://github.com/brocade/bsn-fc-txptd

Regards,
Muneendra.



* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
From: Martin Wilck @ 2021-03-31 12:41 UTC (permalink / raw)
  To: erwin, muneendra.kumar, bblock, hare; +Cc: dm-devel

On Wed, 2021-03-31 at 17:27 +0530, Muneendra Kumar M wrote:
> Hi Martin,
> 
> > > I'm aware of Ben's work, but I hadn't realized it had anything to
> > > do with FPIN. As of today, multipathd doesn't listen on the
> > > NETLINK_SCSITRANSPORT socket. Does any user space tool do this?
> > > Google didn't show me anything.
> > 
> > I did, once, but that was years ago.
> 
> We have a user-space daemon (Broadcom's Fibre Channel Transport
> Daemon) called fctxpd (the one Benjamin was talking about in his
> patch) which acts on FPIN-LI events by listening on the
> NETLINK_SCSITRANSPORT socket, and it moves paths into the marginal
> path group on receiving FPIN events. This daemon is part of EPEL 8.
> The sources with our changes are here:
> https://github.com/brocade/bsn-fc-txptd
> 
> Regards,
> Muneendra.

I see, and this daemon uses multipathd's "set marginal" command to make
multipathd act on it. I can see now that Ben talked about "Broadcom's
Fiber Channel Transport Daemon" back then, but he didn't go into
details, and I either overlooked it entirely or forgot about it.
I recall wondering whether the "set marginal" command had any use other
than manual testing.

I wonder if we could / should incorporate this functionality into
multipathd itself.

But anyway, it seems that this part of the FPIN mechanism works
already.

Thanks
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix Imendörffer




* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
From: Erwin van Londen @ 2021-04-01  2:48 UTC (permalink / raw)
  To: Muneendra Kumar M, Martin Wilck, bblock, hare; +Cc: dm-devel



Hello Muneendra,

On Wed, 2021-03-31 at 16:18 +0530, Muneendra Kumar M wrote:
> Hi Martin,
> Below are my replies.
> 
> > [...]
> > 
> > That said, I can quite well imagine a feedback mechanism based on
> > throttling or delays applied in the FC drivers. For example, if a
> > remote port was throttled by the driver in response to FPIN
> > messages, its bandwidth would decrease, and a path selector like
> > "service-time" would automatically assign less IO to such paths.
> > This wouldn't need any changes in dm-multipath or multipath-tools;
> > it would work entirely on the FC level.
> 
> [Muneendra] Agreed.
I think the only way the FC drivers can respond to this is by delaying
the R_RDY primitives, resulting in fewer credits being available for
the remote side to use. That only works at the link layer, not
fabric-wide. It cannot change the link speed at all, as that would
bounce the port, resulting in all sorts of state changes. That being
said, this is already the existing behaviour and not really tied to
FPINs. The goal of the FPIN method was to provide a more proactive
mechanism and inform the OS layer of fabric issues, so it could act on
them by adjusting the IO profile.
> 
> > [...]
> > 
> > Talking about FPIN, is it planned to notify user space about such
> > fabric events, and if yes, how?
> 
> [Muneendra] Yes. FC drivers, on receiving FC FPIN ELSes, call a SCSI
> transport routine with the FPIN payload. The transport pushes this
> out as an "event" via netlink. An app bound to the local address used
> by the SCSI transport can receive the event and parse it.
> 
> Benjamin has added a marginal path group (multipath marginal
> pathgroups) to dm-multipath:
> https://patchwork.kernel.org/project/dm-devel/cover/1564763622-31752-1-git-send-email-bmarzins@redhat.com/
> 
> One of the intentions of Benjamin's patch (support for marginal
> paths) is to support the FPIN events we receive from the fabric.
> On receiving an FPIN-LI, our intention is to place all the affected
> paths into the marginal path group.
I think this should all be done in kernel space, as we're talking
sub-millisecond timings here when it comes to FPINs and the expected
reaction time. I may be wrong, but I'll leave that up to you.
> 
> Below are the 4 types of descriptors returned in an FPIN:
> [...]
> 
> On receiving congestion notifications, our intention is to slow down
> the workload from the host gradually, until the host stops receiving
> the congestion notifications.
> We still need to work out how we can achieve this decrease in
> workload with the help of dm-multipath.
Would it be possible to piggyback on the service-time path selector
here, where it pertains to latency?

Another thing is that at some stage the IO queueing decision needs to
take the various FPIN descriptors into account. A delivery notification
due to remote slow-drain behaviour is very different from ISL
congestion or any physical issue.
> 
> As Hannes mentioned in his earlier mail, our primary goal is that the
> admin first be _alerted_: having FPINs show up in the message log
> alerts the admin that his fabric is not performing well.
> 
This is a bit of a reactive approach and should be a secondary
objective. Having been in storage/FC support for 20 years, I know that
most admins are not really responsive to this, and taking action based
on event entries takes a very, very long time. From an operations
perspective, any sort of manual action should be avoided as much as
possible.
> 
> Regards,
> Muneendra.
> 


* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
From: Martin Wilck @ 2021-04-01 10:16 UTC (permalink / raw)
  To: erwin, muneendra.kumar, bblock, hare; +Cc: dm-devel

On Thu, 2021-04-01 at 12:48 +1000, Erwin van Londen wrote:
> > 
> > Benjamin has added a marginal path group (multipath marginal
> > pathgroups) to dm-multipath:
> > https://patchwork.kernel.org/project/dm-devel/cover/1564763622-31752-1-git-send-email-bmarzins@redhat.com/
> > 
> > One of the intentions of Benjamin's patch (support for marginal
> > paths) is to support the FPIN events we receive from the fabric.
> > On receiving an FPIN-LI, our intention is to place all the affected
> > paths into the marginal path group.
> I think this should all be done in kernel space, as we're talking
> sub-millisecond timings here when it comes to FPINs and the expected
> reaction time. I may be wrong, but I'll leave that up to you.

Sub-ms latency is impossible with this setup (kernel -> Broadcom FC
daemon -> multipathd -> kernel). It's only suitable for "fatal" FPINs
that would suggest taking a path offline on the time scale of minutes.
I suppose that would work well for LI FPINs, but not for the other
types.

> > 
> > On receiving congestion notifications, our intention is to slow
> > down the workload from the host gradually, until the host stops
> > receiving the congestion notifications.
> > We still need to work out how we can achieve this decrease in
> > workload with the help of dm-multipath.
> Would it be possible to piggyback on the service-time path selector
> here, where it pertains to latency?

Not on service-time itself, but someone could write a new path selector
algorithm. IMO we'd still have the problem that this would be seen as a
layering violation. In the long run dm-mpath may need to add transport-
specific callbacks. But for a proof-of-concept, a selector algorithm
with layering violations would be ok, I believe.
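
To give an idea of the scale of such a PoC: a skeletal selector,
modeled loosely on drivers/md/dm-service-time.c against the in-kernel
dm-path-selector API. The name and the FPIN "penalty" hook are made up,
locking is omitted for brevity, and the signatures should be re-checked
against dm-path-selector.h:

#include <linux/module.h>
#include <linux/slab.h>
#include "dm-path-selector.h"

struct fpin_ps {
	struct list_head paths;
};

struct fpin_path_info {
	struct list_head list;
	struct dm_path *path;
	atomic_t penalty;	/* would be bumped by transport feedback */
};

static int fpin_create(struct path_selector *ps, unsigned argc, char **argv)
{
	struct fpin_ps *s = kzalloc(sizeof(*s), GFP_KERNEL);

	if (!s)
		return -ENOMEM;
	INIT_LIST_HEAD(&s->paths);
	ps->context = s;
	return 0;
}

static void fpin_destroy(struct path_selector *ps)
{
	struct fpin_ps *s = ps->context;
	struct fpin_path_info *pi, *tmp;

	list_for_each_entry_safe(pi, tmp, &s->paths, list)
		kfree(pi);
	kfree(s);
}

static int fpin_add_path(struct path_selector *ps, struct dm_path *path,
			 int argc, char **argv, char **error)
{
	struct fpin_ps *s = ps->context;
	struct fpin_path_info *pi = kzalloc(sizeof(*pi), GFP_KERNEL);

	if (!pi) {
		*error = "fpin-penalty: out of memory";
		return -ENOMEM;
	}
	pi->path = path;
	path->pscontext = pi;
	list_add_tail(&pi->list, &s->paths);
	return 0;
}

static void fpin_fail_path(struct path_selector *ps, struct dm_path *p)
{
	/* a real selector would move p onto a failed list here */
}

static int fpin_reinstate_path(struct path_selector *ps, struct dm_path *p)
{
	return 0;
}

static struct dm_path *fpin_select_path(struct path_selector *ps,
					size_t nr_bytes)
{
	struct fpin_ps *s = ps->context;
	struct fpin_path_info *pi, *best = NULL;

	/* pick the path with the lowest FPIN-derived penalty; this is
	 * the spot where transport state would leak into the block
	 * level -- the layering violation discussed above */
	list_for_each_entry(pi, &s->paths, list)
		if (!best ||
		    atomic_read(&pi->penalty) < atomic_read(&best->penalty))
			best = pi;
	return best ? best->path : NULL;
}

static struct path_selector_type fpin_ps_type = {
	.name		= "fpin-penalty",
	.module		= THIS_MODULE,
	.create		= fpin_create,
	.destroy	= fpin_destroy,
	.add_path	= fpin_add_path,
	.fail_path	= fpin_fail_path,
	.reinstate_path	= fpin_reinstate_path,
	.select_path	= fpin_select_path,
};

static int __init fpin_ps_init(void)
{
	return dm_register_path_selector(&fpin_ps_type);
}

static void __exit fpin_ps_exit(void)
{
	dm_unregister_path_selector(&fpin_ps_type);
}

module_init(fpin_ps_init);
module_exit(fpin_ps_exit);
MODULE_LICENSE("GPL");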

Regards
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix Imendörffer




* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
From: Erwin van Londen @ 2021-04-01 22:04 UTC (permalink / raw)
  To: Martin Wilck, muneendra.kumar, bblock, hare; +Cc: dm-devel



Hello Martin,

On Thu, 2021-04-01 at 10:16 +0000, Martin Wilck wrote:
> On Thu, 2021-04-01 at 12:48 +1000, Erwin van Londen wrote:
> > > 
> > > Benjamin has added a marginal_path group(multipath marginal
> > > pathgroups) in
> > > the dm-multipath.
> > > https://patchwork.kernel.org/project/dm-devel/cover/1564763622-31752-1-git
> > > -send-email-bmarzins@redhat.com/
> > > 
> > > One of the intention of the Benjamin's patch (support for maginal
> > > path) is
> > > to support for the FPIN events we receive from fabric.
> > > On receiving the fpin-li our intention was to  place all the
> > > paths
> > > that
> > > are affected into the marginal path group.
> > I think this should all be done in kernel space as we're talking
> > sub-
> > millisecond timings here when it comes to fpins and the reaction
> > time
> > expected. I may be wrong but I'll leave that up to you.
> 
> Sub-ms latency is impossible with this setup  (kernel -> broadcom FC
> daemon -> multipathd -> kernel). It's only suitable for "fatal" FPINs
> that would suggest taking a path offline on the time scale of
> minutes.
> I suppose that would work well for LN FPINs, but not for the other
> types.
I agree. I was hoping the FC drivers would be able to play a role in
this and provide a direct hook into the FPIN notifications, in such a
way that user-space daemons would not be required and multipath would
be able to play a direct role here.
When it comes to latency in a SAN, we are indeed talking about sub-ms
before other parts of the fabric are impacted, with an immediate effect
on multiple initiators and targets due to the shared nature of the
beast.
> 
> > > On receiving congestion notifications, our intention is to slow
> > > down the workload from the host gradually, until the host stops
> > > receiving the congestion notifications.
> > > We still need to work out how we can achieve this decrease in
> > > workload with the help of dm-multipath.
> > Would it be possible to piggyback on the service-time path selector
> > here, where it pertains to latency?
> 
> Not on service-time itself, but someone could write a new path
> selector algorithm. IMO we'd still have the problem that this would
> be seen as a layering violation. In the long run dm-mpath may need to
> add transport-specific callbacks. But for a proof-of-concept, a
> selector algorithm with layering violations would be ok, I believe.
Is that an offer of volunteering?? :-)
> 
> Regards
> Martin
> 


* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
From: Muneendra Kumar M @ 2021-04-05  5:30 UTC (permalink / raw)
  To: Erwin van Londen, Martin Wilck, bblock, hare; +Cc: dm-devel



Hi Erwin,

Below are my replies.



On Thu, 2021-04-01 at 10:16 +0000, Martin Wilck wrote:

On Thu, 2021-04-01 at 12:48 +1000, Erwin van Londen wrote:



Benjamin has added a marginal_path group(multipath marginal

pathgroups) in

the dm-multipath.

https://patchwork.kernel.org/project/dm-devel/cover/1564763622-31752-1-git

-send-email-bmarzins@redhat.com/



One of the intention of the Benjamin's patch (support for maginal

path) is

to support for the FPIN events we receive from fabric.

On receiving the fpin-li our intention was to  place all the paths

that

are affected into the marginal path group.

I think this should all be done in kernel space as we're talking sub-

millisecond timings here when it comes to fpins and the reaction time

expected. I may be wrong but I'll leave that up to you.



Sub-ms latency is impossible with this setup  (kernel -> broadcom FC

daemon -> multipathd -> kernel). It's only suitable for "fatal" FPINs

that would suggest taking a path offline on the time scale of minutes.

I suppose that would work well for LN FPINs, but not for the other

types.

>> I agree. I was hoping the FC drivers would be able to play a role
>> in this and provide a direct hook into the FPIN notifications in
>> such a way that userspace daemons would not be required and
>> multipath would be able to play a direct role here.
>> When it comes to latency in a SAN we're indeed talking about
>> sub-ms: because of the shared nature of the beast, congestion in
>> one part of the fabric has an immediate effect on multiple
>> initiators and targets.

> > > On receiving the congestion notifications our intention is to
> > > slow down the workload gradually from the host until it stops
> > > receiving the congestion notifications.
> > > We still need to validate how we can achieve this gradual
> > > decrease of the workload with the help of dm-multipath.
> > Would it be possible to piggyback on the service time path
> > selector in this when it pertains to latency?
>
> Not on service-time itself, but someone could write a new path
> selector algorithm. IMO we'd still have the problem that this would
> be seen as a layering violation. In the long run dm-mpath may need
> to add transport-specific callbacks. But for a proof-of-concept, a
> selector algorithm with layering violations would be ok, I believe.

>> Is that an offer of volunteering?? :-)

[Muneendra] To address all the issues we are planning to come up with
a new dm-path selector algorithm which should address the above
concerns, where the FC drivers will hook directly into the FPIN
notifications in such a way that userspace daemons would not be
required and multipath would be able to play a direct role here.
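
To make "direct hook" concrete: today lpfc hands the raw FPIN ELS
payload to the FC transport class via fc_host_fpin_rcv(), and the
transport class only forwards it to userspace as an fc netlink event.
A kernel-internal distribution point could be as small as a notifier
chain that a path selector subscribes to. A sketch; nothing below
exists today:

#include <linux/notifier.h>
#include <scsi/scsi_host.h>

static ATOMIC_NOTIFIER_HEAD(fpin_notifier_list);

/* Hypothetical event cooked up by fc_host_fpin_rcv(). */
struct fpin_event {
	struct Scsi_Host *shost;	/* receiving host */
	u32 fpin_len;			/* length of the ELS frame */
	char *fpin_buf;			/* raw FPIN ELS payload */
};

/* Would be called next to the existing netlink post. */
void fpin_notify_listeners(struct fpin_event *ev)
{
	atomic_notifier_call_chain(&fpin_notifier_list, 0, ev);
}

/* A path selector registers its callback at module init time. */
int fpin_register_listener(struct notifier_block *nb)
{
	return atomic_notifier_chain_register(&fpin_notifier_list, nb);
}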

Will come up with more details regarding the new dm-path selector algorithm
for FPIN notifications.
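
For anyone who wants to experiment before that lands: the marginal
path machinery from Benjamin's series is driven from multipath.conf,
roughly like this (see multipath.conf(5) for the exact semantics; the
values here are made up):

	defaults {
		marginal_pathgroups yes
		marginal_path_double_failed_time 60
		marginal_path_err_sample_time 120
		marginal_path_err_rate_threshold 10
		marginal_path_err_recheck_gap_time 300
	}

With the lpfc port_state series applied, whatever consumes the FPINs
in userspace can then flag an affected path through the remote port's
sysfs node, for example (the rport id is just a placeholder):

	echo "Marginal" > /sys/class/fc_remote_ports/rport-2:0-3/port_state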



Regards,

Muneendra.


[-- Attachment #1.1.1.2: Type: text/html, Size: 8831 bytes --]

[-- Attachment #1.2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4220 bytes --]

[-- Attachment #2: Type: text/plain, Size: 97 bytes --]

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
  2021-04-05  5:30                   ` Muneendra Kumar M
@ 2021-04-05  5:55                     ` Erwin van Londen
  0 siblings, 0 replies; 17+ messages in thread
From: Erwin van Londen @ 2021-04-05  5:55 UTC (permalink / raw)
  To: Muneendra Kumar M, Martin Wilck, bblock, hare; +Cc: dm-devel


[-- Attachment #1.1.1.1: Type: text/plain, Size: 4243 bytes --]

Hello Muneendra,

On Mon, 2021-04-05 at 11:00 +0530, Muneendra Kumar M wrote:
> Hi Erwin,
> Below are my replies.
>  
> On Thu, 2021-04-01 at 10:16 +0000, Martin Wilck wrote:
> > On Thu, 2021-04-01 at 12:48 +1000, Erwin van Londen wrote:
> > > >
> > > > Benjamin has added a marginal_path group (multipath marginal
> > > > pathgroups) in dm-multipath:
> > > > https://patchwork.kernel.org/project/dm-devel/cover/1564763622-31752-1-git-send-email-bmarzins@redhat.com/
> > > >
> > > > One of the intentions of Benjamin's patch (support for marginal
> > > > paths) is to support the FPIN events we receive from the fabric.
> > > > On receiving the FPIN-LI our intention was to place all the
> > > > affected paths into the marginal path group.
> > > I think this should all be done in kernel space as we're talking
> > > sub-millisecond timings here when it comes to FPINs and the
> > > reaction time expected. I may be wrong but I'll leave that up to
> > > you.
> >  
> > Sub-ms latency is impossible with this setup (kernel -> broadcom FC
> > daemon -> multipathd -> kernel). It's only suitable for "fatal"
> > FPINs that would suggest taking a path offline on the time scale of
> > minutes. I suppose that would work well for LI FPINs, but not for
> > the other types.
> 
> >> I agree. I was hoping the FC drivers would be able to play a role
> >> in this and provide a direct hook into the FPIN notifications in
> >> such a way that userspace daemons would not be required and
> >> multipath would be able to play a direct role here.
> >> When it comes to latency in a SAN we're indeed talking about
> >> sub-ms: because of the shared nature of the beast, congestion in
> >> one part of the fabric has an immediate effect on multiple
> >> initiators and targets.
> >  
> > > >  
> > > > On receiving the congestion notifications our intention is to
> > > > slow down the workload gradually from the host until it stops
> > > > receiving the congestion notifications.
> > > > We still need to validate how we can achieve this gradual
> > > > decrease of the workload with the help of dm-multipath.
> > > Would it be possible to piggyback on the service time path
> > > selector in this when it pertains to latency?
> >
> > Not on service-time itself, but someone could write a new path
> > selector algorithm. IMO we'd still have the problem that this would
> > be seen as a layering violation. In the long run dm-mpath may need
> > to add transport-specific callbacks. But for a proof-of-concept, a
> > selector algorithm with layering violations would be ok, I believe.
> 
> >> Is that an offer of volunteering?? :-)
>
> [Muneendra] To address all the issues we are planning to come up with
> a new dm-path selector algorithm which should address the above
> concerns, where the FC drivers will hook directly into the FPIN
> notifications in such a way that userspace daemons would not be
> required and multipath would be able to play a direct role here.
> Will come up with more details regarding the new dm-path selector
> algorithm for FPIN notifications.

That is awesome. Thank you very much. If you need any input or feedback
then please let me know.
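
For the congestion FPINs I'd picture something AIMD-like on a per-path
in-flight budget: cut hard when a notification arrives, then creep
back up while the link stays quiet. Purely illustrative; none of these
names exist anywhere:

#include <linux/minmax.h>

#define FPIN_QD_MIN	1U
#define FPIN_QD_MAX	128U

struct path_throttle {
	unsigned int queue_depth;	/* in-flight budget for one path */
};

/* Multiplicative decrease on every congestion FPIN for this path. */
static void throttle_on_congestion(struct path_throttle *t)
{
	t->queue_depth = max(t->queue_depth / 2, FPIN_QD_MIN);
}

/* Additive increase, e.g. once per second without a notification. */
static void throttle_recover(struct path_throttle *t)
{
	if (t->queue_depth < FPIN_QD_MAX)
		t->queue_depth++;
}

Where that budget gets enforced is the open question; the selector's
start_io/end_io accounting looks like the natural place, so it could
fit nicely into your new algorithm.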
>  
> Regards,
> Muneendra.
>  

[-- Attachment #1.1.1.2: Type: text/html, Size: 9192 bytes --]

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 97 bytes --]

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-04-06 11:42 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-23  7:52 [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications Erwin van Londen
2021-03-25 16:07 ` Benjamin Block
2021-03-26 11:15   ` Muneendra Kumar M
2021-03-31  0:22     ` Erwin van Londen
2021-03-31  7:25       ` Hannes Reinecke
2021-03-31  8:12         ` Erwin van Londen
2021-03-31  9:57         ` Martin Wilck
2021-03-31 10:48           ` Muneendra Kumar M
2021-03-31 11:45             ` Martin Wilck
2021-03-31 11:53               ` Hannes Reinecke
2021-03-31 11:57                 ` Muneendra Kumar M
2021-03-31 12:41                   ` Martin Wilck
2021-04-01  2:48             ` Erwin van Londen
2021-04-01 10:16               ` Martin Wilck
2021-04-01 22:04                 ` Erwin van Londen
2021-04-05  5:30                   ` Muneendra Kumar M
2021-04-05  5:55                     ` Erwin van Londen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.