* [LSF/MM TOPIC] Reducing the SRP initiator failover time
@ 2013-02-01 13:43 Bart Van Assche
From: Bart Van Assche @ 2013-02-01 13:43 UTC
To: lsf-pc, linux-scsi, linux-rdma, David Dillow
It is known that it takes about two to three minutes before the upstream
SRP initiator fails over from a failed path to a working path. This is
not only considered longer than acceptable but is also longer than other
Linux SCSI initiators (e.g. iSCSI and FC). Progress so far with
improving the SRP initiator's failover behavior has been slow. This is
because the discussion about candidate patches occurred at two different
levels: not only the patches themselves were discussed but also the
approach that should be followed. That last aspect is easier to discuss
in a meeting than over a mailing list. Hence the proposal to discuss SRP
initiator failover behavior during the LSF/MM summit. The topics that
need further discussion are:
* If a path fails, should the entire SCSI host be removed, or should
the SCSI host be preserved and only the SCSI devices associated with
that host be removed?
* Which software component should test the state of a path and
reconnect to an SRP target if a path is restored? Should that be
done by the user space process srp_daemon or by the SRP initiator
kernel module?
* How should the SRP initiator behave after a path failure has been
detected? Should the behavior be similar to the FC initiator with
its fast_io_fail_tmo and dev_loss_tmo parameters?
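For comparison, the FC transport class exposes these two timeouts per
remote port in sysfs; if SRP grew a similar interface, tuning could look
like this (the fc_remote_ports path is a typical FC setup, while the
srp_remote_ports path is purely a hypothetical sketch of what an SRP
equivalent might look like):

```shell
# FC transport (existing interface): fail outstanding I/O after 5 s so
# that dm-multipath can switch paths, but keep the SCSI devices around
# for 60 s in case the path comes back.
echo 5  > /sys/class/fc_remote_ports/rport-2:0-0/fast_io_fail_tmo
echo 60 > /sys/class/fc_remote_ports/rport-2:0-0/dev_loss_tmo

# Hypothetical SRP equivalent (assumed sysfs layout, illustration only):
echo 5  > /sys/class/srp_remote_ports/port-1:1/fast_io_fail_tmo
echo 60 > /sys/class/srp_remote_ports/port-1:1/dev_loss_tmo
```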
Dave, if this topic gets accepted, I really hope you will be able to
attend the LSF/MM summit.
Bart.
* Re: [LSF/MM TOPIC] Reducing the SRP initiator failover time
From: Sebastian Riemer @ 2013-02-04 12:13 UTC
To: Bart Van Assche; +Cc: linux-rdma, David Dillow, Dongsu Park
Hi Bart,
thanks for approaching this! We're not the best mainline developers, so
I guess we won't be there. But we run the big SRP setups, and our
sysadmins really don't like reconnecting SRP hosts manually and then
laboriously re-adding their devices to the related dm-multipath devices.
Think about > 200 SRP devices per server (already filtered by initiator
groups). We also consider the srptools unmaintained, unreliable and
slow; it is possible that the srptools commands never return. Therefore,
we send the SRP connection strings directly to the initiator within our
mapping jobs.
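For context, "sending the connection string directly" presumably means
writing to the ib_srp add_target interface, roughly like this (all
identifiers below are placeholder values, not a real target):

```shell
# Log in to an SRP target by handing the connection string straight to
# the ib_srp kernel module. id_ext, ioc_guid, dgid and service_id are
# placeholders; the HCA/port name srp-mlx4_0-1 is an example.
echo "id_ext=0x0002c90300a0b0c0,ioc_guid=0x0002c90300a0b0c0,\
dgid=fe800000000000000002c90300a0b0c1,pkey=ffff,\
service_id=0x0002c90300a0b0c0" \
  > /sys/class/infiniband_srp/srp-mlx4_0-1/add_target
```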
It would also be great not to develop a reconnect mechanism that hammers
the target like a DDoS attack, as open-iscsi does. Rebooting the whole
cluster to fix this isn't fun. It must be possible to configure
different reconnect intervals.
Btw.: with iSER we even had the case that the IPoIB layer reconnected
but the RDMA part didn't. It was so broken then that we couldn't
disconnect or reconnect anymore; the only option left was a hard reboot.
So now you know our point of view, and we already develop it that way
for ourselves. I'm looking forward to the outcome of the discussion. At
the current state it's difficult to persuade our bosses to publish what
we have so far.
On 01.02.2013 14:43, Bart Van Assche wrote:
> It is known that it takes about two to three minutes before the upstream
> SRP initiator fails over from a failed path to a working path. This is
> not only considered longer than acceptable but is also longer than other
> Linux SCSI initiators (e.g. iSCSI and FC). Progress so far with
> improving the SRP initiator's failover behavior has been slow. This is
> because the discussion about candidate patches occurred at two
> different levels: not only the patches themselves were discussed but
> also the approach that should
> be followed. That last aspect is easier to discuss in a meeting than
> over a mailing list. Hence the proposal to discuss SRP initiator
> failover behavior during the LSF/MM summit. The topics that need further
> discussion are:
> * If a path fails, remove the entire SCSI host or preserve the SCSI
> host and only remove the SCSI devices associated with that host ?
Preserve SCSI hosts and SCSI devices unless they are removed explicitly
by a disconnect request. Rescanning SCSI devices with "- - -", as
"iscsiadm -R" does for example, may reorder the device names (sda
becomes sdb, etc.).
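The rescan in question is the generic SCSI host scan; for example:

```shell
# Rescan all channels, targets and LUNs on SCSI host 0. Devices that
# are discovered in a new order get the next free sdX name, which is
# why sda can turn into sdb after a rescan.
echo "- - -" > /sys/class/scsi_host/host0/scan
```

Stable identifiers such as /dev/disk/by-id sidestep the renaming, but
setups that reference sdX names directly do not.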
> * Which software component should test the state of a path and should
> reconnect to an SRP target if a path is restored ? Should that be
> done by the user space process srp_daemon or by the SRP initiator
> kernel module ?
By the SRP kernel module. This is exactly the big advantage of SRP so
far: it is simple, and it is RDMA- and kernel-only.
> * How should the SRP initiator behave after a path failure has been
> detected ? Should the behavior be similar to the FC initiator with
> its fast_io_fail_tmo and dev_loss_tmo parameters ?
Fine for us, as long as these timeouts and the behavior are configurable
at all. For dm-multipath we need fast I/O failing, and the SRP initiator
should try to reconnect that path automatically.
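A minimal multipath.conf sketch of that combination (values are
illustrative, not a recommendation):

```
defaults {
    # push I/O errors up quickly so multipathd can fail over
    fast_io_fail_tmo  5
    # keep the transport devices for 60 s, giving the initiator a
    # window to reconnect the path before it is torn down
    dev_loss_tmo      60
    # retry for a bounded number of path-checker intervals while all
    # paths are down, then fail the I/O
    no_path_retry     12
}
```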
Cheers,
Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [LSF/MM TOPIC] Reducing the SRP initiator failover time
From: Vu Pham @ 2013-02-07 22:42 UTC
To: Bart Van Assche
Cc: lsf-pc, linux-scsi, linux-rdma, David Dillow, Oren Duer,
Sagi Grimberg
>
>
> It is known that it takes about two to three minutes before the
> upstream SRP initiator fails over from a failed path to a working
> path. This is not only considered longer than acceptable but is also
> longer than other Linux SCSI initiators (e.g. iSCSI and FC). Progress
> so far with improving the SRP initiator's failover behavior has been
> slow. This is because the discussion about candidate patches occurred
> at two different levels: not only the patches themselves were discussed
> but also
> the approach that should be followed. That last aspect is easier to
> discuss in a meeting than over a mailing list. Hence the proposal to
> discuss SRP initiator failover behavior during the LSF/MM summit. The
> topics that need further discussion are:
> * If a path fails, remove the entire SCSI host or preserve the SCSI
> host and only remove the SCSI devices associated with that host ?
> * Which software component should test the state of a path and should
> reconnect to an SRP target if a path is restored ? Should that be
> done by the user space process srp_daemon or by the SRP initiator
> kernel module ?
> * How should the SRP initiator behave after a path failure has been
> detected ? Should the behavior be similar to the FC initiator with
> its fast_io_fail_tmo and dev_loss_tmo parameters ?
>
> Dave, if this topic gets accepted, I really hope you will be able to
> attend the LSF/MM summit.
>
> Bart.
>
Hello Bart,
Thank you for taking the initiative.
Mellanox thinks that this should be discussed. We'd be happy to attend.
We also would like to discuss:
* How, and how fast, does SRP detect a path failure besides an RC error?
* The role of srp_daemon: how often should srp_daemon scan the fabric
for new/old targets, how to scale srp_daemon discovery, and traps.
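For reference, srp_daemon's scan interval is already tunable on the
command line (flags as we recall them from srptools; check the
srp_daemon man page, and the HCA/port names are examples):

```shell
# One-shot: scan port 1 of HCA mlx4_0 once and print the connection
# strings instead of logging in (-o run once, -c print connection
# command lines, -n use the new connection string format).
srp_daemon -o -c -n -i mlx4_0 -p 1

# Daemon mode: add discovered targets to the kernel initiator (-e) and
# rescan the fabric every 60 seconds (-R 60).
srp_daemon -e -i mlx4_0 -p 1 -R 60
```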
-vu
* Re: [LSF/MM TOPIC] Reducing the SRP initiator failover time
From: Sagi Grimberg @ 2013-02-08 9:24 UTC
To: Bart Van Assche
Cc: Vu Pham, lsf-pc, linux-scsi, linux-rdma, David Dillow,
Oren Duer
On 2/8/2013 12:42 AM, Vu Pham wrote:
>
>>
>>
>> It is known that it takes about two to three minutes before the
>> upstream SRP initiator fails over from a failed path to a working
>> path. This is not only considered longer than acceptable but is also
>> longer than other Linux SCSI initiators (e.g. iSCSI and FC). Progress
>> so far with improving the SRP initiator's failover behavior has been
>> slow. This is because the discussion about candidate patches occurred
>> at two different levels: not only the patches themselves were
>> discussed but also
>> the approach that should be followed. That last aspect is easier to
>> discuss in a meeting than over a mailing list. Hence the proposal to
>> discuss SRP initiator failover behavior during the LSF/MM summit. The
>> topics that need further discussion are:
>> * If a path fails, remove the entire SCSI host or preserve the SCSI
>> host and only remove the SCSI devices associated with that host ?
>> * Which software component should test the state of a path and should
>> reconnect to an SRP target if a path is restored ? Should that be
>> done by the user space process srp_daemon or by the SRP initiator
>> kernel module ?
>> * How should the SRP initiator behave after a path failure has been
>> detected ? Should the behavior be similar to the FC initiator with
>> its fast_io_fail_tmo and dev_loss_tmo parameters ?
>>
>> Dave, if this topic gets accepted, I really hope you will be able to
>> attend the LSF/MM summit.
>>
>> Bart.
>>
> Hello Bart,
>
> Thank you for taking the initiative.
> Mellanox thinks that this should be discussed. We'd be happy to attend.
>
> We also would like to discuss:
> * How and how fast does SRP detect a path failure besides RC error?
> * Role of srp_daemon, how often srp_daemon scan fabric for new/old
> targets, how-to scale srp_daemon discovery, traps.
>
> -vu
Hey Bart,
I agree with Vu that this issue should be discussed. We'd be happy to
attend.
--
Sagi
* Re: [LSF/MM TOPIC] Reducing the SRP initiator failover time
From: Sebastian Riemer @ 2013-02-08 11:38 UTC
To: Sagi Grimberg
Cc: Bart Van Assche, Vu Pham, lsf-pc, linux-scsi, linux-rdma,
David Dillow, Oren Duer
On 08.02.2013 10:24, Sagi Grimberg wrote:
> On 2/8/2013 12:42 AM, Vu Pham wrote:
>> Hello Bart,
>>
>> Thank you for taking the initiative.
>> Mellanox thinks that this should be discussed. We'd be happy to attend.
>>
>> We also would like to discuss:
>> * How and how fast does SRP detect a path failure besides RC error?
>> * Role of srp_daemon, how often srp_daemon scan fabric for new/old
>> targets, how-to scale srp_daemon discovery, traps.
>>
>> -vu
> Hey Bart,
>
> I agree with Vu that this issue should be discussed. We'd be happy to
> attend.
>
> --
> Sagi
Wow, thanks to Mellanox for spending resources on SRP as well! Last June
we came across a very different situation.
Cheers,
Sebastian and the ProfitBricks storage team