* Questions around multipath failover and no_path_retry
@ 2018-03-11 16:47 Karan Vohra
  2018-03-19 22:48 ` Martin Wilck
  0 siblings, 1 reply; 2+ messages in thread
From: Karan Vohra @ 2018-03-11 16:47 UTC (permalink / raw)
  To: dm-devel



Hi Folks,

Let us assume there are 2 paths within a path group, and dm-multipath is sending I/Os across them in round-robin fashion. Each of these paths is identified as a unique block device, such as /dev/sdb and /dev/sdc.

Let us say some I/Os are sent over the path /dev/sdb and either the requests time out or there is a failure on that path. What happens to those I/Os? Are they sent over the other path, /dev/sdc, or does dm-multipath wait for /dev/sdb to come back online and only send I/O to /dev/sdb?

One of the reasons we are concerned about this scenario is the following: say there is a write I/O W1 which is routed to /dev/sdb, and then there is a failure. A later write I/O W2 wrote to the same block via /dev/sdc. Now, if multipath resends W1 through /dev/sdc, W2 gets overwritten by W1. The expectation was that W2 happens after W1 and should overwrite W1, but the result is the opposite. Situations like these can cause data inconsistency and corruption.

We were thinking of setting no_path_retry to queue to make sure that the I/Os supposed to be going to path1 never make it to path2. But would that not cause unexpected behavior in the application layer? Say there are I/O requests R1, R2, R3 and so on: R1 goes to path1, R2 goes to path2, and so on. If path1 dies for some reason, then with no_path_retry set to queue, queueing will not stop until the path is fixed. Does that not mean that R1, R3, R5, ... will not make it to the block device until the path is fixed? Would that not cause failures if the issue persists for seconds? And what about the size of the queue: is there any danger of the queue getting overloaded?

Any pointers or references would be of great help.

Thanks!
Karan



* Re: Questions around multipath failover and no_path_retry
  2018-03-11 16:47 Questions around multipath failover and no_path_retry Karan Vohra
@ 2018-03-19 22:48 ` Martin Wilck
  0 siblings, 0 replies; 2+ messages in thread
From: Martin Wilck @ 2018-03-19 22:48 UTC (permalink / raw)
  To: Karan Vohra, dm-devel

On Sun, 2018-03-11 at 16:47 +0000, Karan Vohra wrote:
> Hi Folks,
> 
> Let us assume, there are 2 paths within the path group which dm-
> multipath is sending the I/Os in round-robin fashion. Each of these
> paths are identified as unique block device(s) such as /dev/sdb and
> /dev/sdc. 
> 
> Let us say some I/Os are sent over to the path /dev/sdb and either
> the requests time out or there is a failure on that path, what
> happens to those I/Os? Are they sent over to the other path -
> /dev/sdc or does dm-multipath waits for /dev/sdb to come back online
> and only sends I/O to /dev/sdb?

The I/O is sent to other paths (sdc in your example) when the lower
layer (e.g. SCSI) indicates path failure for sdb. That's the very point
of multipathing.

>  One of the reasons we are concerned about the above scenario is- let
> us say there is a write I/O W1 which is routed to /dev/sdb and then
> there is a failure. There was a write I/O W2 which wrote at the same
> block via /dev/sdc. Now if multipath sends W1 through /dev/sdc, W2
> gets overwritten by W1. The expectation was that W2 happens after W1
> and should overwrite W1 but the result is opposite. 

If you send two write IOs to the same sector at the same time, you
can't be sure which one arrives first. That's not specific to
multipath. If you want to guarantee ordering, you have to flush W1 
using e.g. fdatasync() before sending W2. The flush command won't
return before W1 is written to disk.
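
A minimal sketch of that pattern in C (the device path and offset below
are made up for illustration; use your own map name):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char w1[4096], w2[4096];
    /* Hypothetical multipath device node, for illustration only. */
    int fd = open("/dev/mapper/mpatha", O_WRONLY);

    if (fd < 0) {
        perror("open");
        return 1;
    }
    memset(w1, 'A', sizeof(w1));
    memset(w2, 'B', sizeof(w2));

    /* W1: first write to the block at offset 0 */
    if (pwrite(fd, w1, sizeof(w1), 0) < 0) {
        perror("pwrite W1");
        return 1;
    }
    /*
     * Flush W1 before issuing W2.  fdatasync() does not return until
     * W1 has reached the device (after any path retries), so a later
     * W2 can no longer be overtaken by a retried W1.
     */
    if (fdatasync(fd) < 0) {
        perror("fdatasync");
        return 1;
    }
    /* W2: overwrites the same block, now ordered after W1 */
    if (pwrite(fd, w2, sizeof(w2), 0) < 0) {
        perror("pwrite W2");
        return 1;
    }
    fdatasync(fd);
    close(fd);
    return 0;
}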

> Situations like these can cause data inconsistency and corruption. We
> were thinking of using no_path_retry configuration to be set to queue
> to make sure that the I/Os supposed to be going to path1 never make
> it to path2.

That won't work. As the name of the option suggests, "no_path_retry"
only affects the behavior if there's _no_ healthy path left.
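
For what it's worth, a sketch of how no_path_retry is typically set in
multipath.conf (the number is just an example, not a recommendation):

defaults {
    # After the LAST path has failed, keep queueing I/O for 12 more
    # path-checker intervals (12 * polling_interval seconds), then
    # fail outstanding and new I/O.
    # "queue" would queue forever, "fail" fails immediately.
    no_path_retry 12
}

It has no effect while at least one path in the map is still usable.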

>  But the question is that would not that cause unexpected behavior in
> application layer? Let us say there are I/O Requests R1, R2, R3 and
> so on.. R1 is going to Path1, R2 is going to Path2 and so on. If
> Path1 dies for some reason, with the setting of no_path_retry to
> queue, queueing will not stop until the path is fixed so does not
> that mean that R1, R3,R5 ... will not make it to block device until
> the path is fixed? Would it not cause failures if the issue persists
> for seconds? 

As I said, that's not how it works. R1->P1, R2->P2, ... only holds as
long as all paths are up (and no IO scheduler is active which might re-
order your I/O requests).

> What about the size of queue? Is there any danger of queue getting
> overloaded? Any pointers or references would be of great help.

Theoretically, the queue is only limited by memory size.
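
(If unbounded queueing worries you: a numeric no_path_retry value, as in
the sketch above, bounds how long I/O is queued before it is failed, and,
if I remember the interface correctly, queueing on a live map can also be
switched off by hand with "dmsetup message <map-name> 0 fail_if_no_path".)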

Martin

> 
> Thanks!
> Karan
> 

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

