From: Martin Wilck <martin.wilck@suse.com>
To: "emilne@redhat.com" <emilne@redhat.com>,
	"Ulrich.Windl@rz.uni-regensburg.de" 
	<Ulrich.Windl@rz.uni-regensburg.de>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>, "hch@lst.de" <hch@lst.de>,
	"dgilbert@interlog.com" <dgilbert@interlog.com>,
	"dm-devel@redhat.com" <dm-devel@redhat.com>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"jejb@linux.vnet.ibm.com" <jejb@linux.vnet.ibm.com>,
	"systemd-devel@lists.freedesktop.org" 
	<systemd-devel@lists.freedesktop.org>,
	"bmarzins@redhat.com" <bmarzins@redhat.com>
Subject: Re: [systemd-devel] RFC: one more time: SCSI device identification
Date: Wed, 28 Apr 2021 06:30:28 +0000	[thread overview]
Message-ID: <9248c6df5484a0f5fe4247a1867945ed3902341b.camel@suse.com> (raw)
In-Reply-To: <c8ede601244e1710dbf320c33c0f7853e249bbee.camel@redhat.com>

On Tue, 2021-04-27 at 16:41 -0400, Ewan D. Milne wrote:
> On Tue, 2021-04-27 at 20:33 +0000, Martin Wilck wrote:
> > On Tue, 2021-04-27 at 16:14 -0400, Ewan D. Milne wrote:
> > > 
> > > There's no way to do that, in principle.  Because there could be
> > > other I/Os in flight.  You might (somehow) avoid retrying an I/O
> > > that got a UA until you figured out if something changed, but other
> > > I/Os can already have been sent to the target, or issued before you
> > > get to look at the status.
> > 
> > Right. But in practice, a WWID change will hardly happen under full
> > IO load. The storage side will probably have to block IO while this
> > happens, at least for a short time period. So blocking and quiescing
> > the queue upon a UA might still work most of the time. Even if we
> > are already too late, the sooner we stop the queue, the better.
> > 
> > The current algorithm in multipath-tools needs to detect a path going
> > down and being reinstated. The interval during which a WWID change
> > goes unnoticed is one or more path-checker intervals, typically on
> > the order of 5-30 seconds. If we could shrink this to the sub-second
> > or even millisecond range by quickly blocking the queue in the
> > kernel, that would be a big step forward.
> 
> Yes, and in many situations this may help.  But in the general case
> we can't protect against a storage array misconfiguration,
> where something like this can happen.  So I worry about people
> believing the host software will protect them against a mistake,
> when we can't really do that.

I agree. I expressed a similar notion in the following thread about
multipathd's WWID change detection capabilities in the face of really
bad mistakes on the administrator's (or, for that matter, the storage
array's) part:
https://listman.redhat.com/archives/dm-devel/2021-February/msg00248.html
But others stressed that we should nonetheless try our best to avoid
customer data corruption (which I also agree with), and thus we settled
on the current algorithm, which at least suited the needs of the
affected user(s) in that specific case.
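To illustrate what that detection amounts to, here is a rough sketch,
not multipathd's actual code: re-read the Device Identification VPD
page (0x83) via SG_IO and compare the raw designator data against the
copy saved at path discovery. The SG_IO interface and the INQUIRY CDB
are real; the function names and the blunt memcmp() comparison are
invented for illustration, and error handling is trimmed:

    /*
     * Illustrative sketch only (not multipathd code). Re-reads VPD
     * page 0x83 and compares it with the designator data recorded
     * when the path was first set up.
     */
    #include <scsi/sg.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>

    #define VPD_LEN 252

    static int read_vpd83(int fd, uint8_t *buf, uint16_t len)
    {
            /* INQUIRY with EVPD=1, page code 0x83 */
            uint8_t cdb[6] = { 0x12, 0x01, 0x83, len >> 8, len & 0xff, 0 };
            uint8_t sense[32];
            struct sg_io_hdr io = { 0 };

            io.interface_id = 'S';
            io.dxfer_direction = SG_DXFER_FROM_DEV;
            io.cmd_len = sizeof(cdb);
            io.cmdp = cdb;
            io.dxferp = buf;
            io.dxfer_len = len;
            io.sbp = sense;
            io.mx_sb_len = sizeof(sense);
            io.timeout = 5000;      /* milliseconds */

            if (ioctl(fd, SG_IO, &io) < 0 || io.status != 0)
                    return -1;
            return 0;
    }

    /* 1 if the designators differ, 0 if unchanged, -1 on error */
    static int wwid_changed(int fd, const uint8_t *saved, size_t saved_len)
    {
            uint8_t cur[VPD_LEN];

            if (read_vpd83(fd, cur, sizeof(cur)) < 0)
                    return -1;
            return memcmp(cur, saved, saved_len) ? 1 : 0;
    }

The real implementation parses and prioritizes individual designators
rather than comparing the raw page, but the structural point stands:
a check like this runs only once per checker interval.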

Personally, I think that the current "5-30 s" window for WWID change
detection in multipathd is unsafe both theoretically and practically,
and may lure users into a false sense of safety. Therefore I'd
strongly welcome a kernel-side solution that might still not be
theoretically safe, but would cover most practical problem scenarios
much better than we currently do.
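To make the idea concrete, a kernel-side hook could look roughly like
the hypothetical sketch below. This is not mainline code: the function
name and its call site (somewhere in the sense-evaluation path, near
scsi_check_sense()) are made up. Only struct scsi_sense_hdr, the
UNIT_ATTENTION sense key, the ASC/ASCQ 3Fh/03h "INQUIRY DATA HAS
CHANGED" code, and blk_mq_quiesce_queue() are real:

    /*
     * Hypothetical sketch, not mainline code: on a Unit Attention
     * that signals changed identification data, quiesce the queue
     * immediately instead of waiting for multipathd's next checker
     * pass.
     */
    static void sdev_quiesce_on_wwid_ua(struct scsi_device *sdev,
                                        const struct scsi_sense_hdr *sshdr)
    {
            /* ASC 0x3f / ASCQ 0x03: INQUIRY DATA HAS CHANGED */
            if (sshdr->sense_key == UNIT_ATTENTION &&
                sshdr->asc == 0x3f && sshdr->ascq == 0x03) {
                    /*
                     * Stop dispatching further requests until user
                     * space has re-validated the WWID.
                     */
                    blk_mq_quiesce_queue(sdev->request_queue);
                    sdev_printk(KERN_WARNING, sdev,
                                "identification data changed, queue quiesced\n");
            }
    }

The hard part, of course, is the other half: how user space
(multipathd, udev) re-validates the WWID and triggers the unquiesce
(blk_mq_unquiesce_queue()). That interface would need real design work.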

Regards
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix Imendörffer


