All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin Wilck <martin.wilck@suse.com>
To: "emilne@redhat.com" <emilne@redhat.com>,
	"Ulrich.Windl@rz.uni-regensburg.de" 
	<Ulrich.Windl@rz.uni-regensburg.de>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>, "hch@lst.de" <hch@lst.de>,
	"dgilbert@interlog.com" <dgilbert@interlog.com>,
	"dm-devel@redhat.com" <dm-devel@redhat.com>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"jejb@linux.vnet.ibm.com" <jejb@linux.vnet.ibm.com>,
	"systemd-devel@lists.freedesktop.org" 
	<systemd-devel@lists.freedesktop.org>,
	"bmarzins@redhat.com" <bmarzins@redhat.com>
Subject: Re: RFC: one more time: SCSI device identification
Date: Tue, 27 Apr 2021 20:33:43 +0000	[thread overview]
Message-ID: <488ef3e7fa0cca4f0a0cb2e9307ddaa08385d3f7.camel@suse.com> (raw)
In-Reply-To: <65f66a5e03081dd3b470fa9aeff9a77dbc41743c.camel@redhat.com>

On Tue, 2021-04-27 at 16:14 -0400, Ewan D. Milne wrote:
> On Mon, 2021-04-26 at 13:16 +0000, Martin Wilck wrote:
> > On Mon, 2021-04-26 at 13:14 +0200, Ulrich Windl wrote:
> > > > > 
> > > > 
> > > > While we're at it, I'd like to mention another issue: WWID
> > > > changes.
> > > > 
> > > > This is a big problem for multipathd. The gist is that the
> > > > device
> > > > identification attributes in sysfs only change after rescanning
> > > > the
> > > > device. Thus if a user changes LUN assignments on a storage
> > > > system,
> > > > it can happen that a direct INQUIRY returns a different WWID as
> > > > in
> > > > sysfs, which is fatal. If we plan to rely more on sysfs for
> > > > device
> > > > identification in the future, the problem gets worse. 
> > > 
> > > I think many devices rely on the fact that they are identified by
> > > Vendor/model/serial_nr, because in most professional SAN storage
> > > systems you
> > > can pre-set the serial number to a custom value; so if you want a
> > > new
> > > disk
> > > (maybe a snapshot) to be compatible with the old one, just assign
> > > the
> > > same
> > > serial number. I guess that's the idea behind.
> > 
> > What you are saying sounds dangerous to me. If a snapshot has the
> > same
> > WWID as the device it's a snapshot of, it must not be exposed to
> > any
> > host(s) at the same time with its origin, otherwise the host may
> > happily combine it with the origin into one multipath map, and data
> > corruption will almost certainly result. 
> > 
> > My argument is about how the host is supposed to deal with a WWID
> > change if it happens. Here, "WWID change" means that a given
> > H:C:T:L
> > suddenly exposes different device designators than it used to,
> > while
> > this device is in use by a host. Here, too, data corruption is
> > imminent, and can happen in a blink of an eye. To avoid this,
> > several
> > things are needed:
> > 
> >  1) the host needs to get notified about the change (likely by an
> > UA
> > of
> > some sort)
> >  2) the kernel needs to react to the notification immediately, e.g.
> > by
> > blocking IO to the device,
> 
> There's no way to do that, in principle.  Because there could be
> other I/Os in flight.  You might (somehow) avoid retrying an I/O
> that got a UA until you figured out if something changed, but other
> I/Os can already have been sent to the target, or issued before you
> get to look at the status.

Right. But in practice, a WWID change will hardly happen under full IO
load. The storage side will probably have to block IO while this
happens, at least for a short time period. So blocking and quiescing
the queue upon an UA might still work, most of the time. Even if we
were too late already, the sooner we stop the queue, the better.

The current algorithm in multipath-tools needs to detect a path going
down and being reinstated. The time interval during which a WWID change
will go unnoticed is one or more path checker intervals, typically on
the order of 5-30 seconds. If we could decrease this interval to a sub-
second or even millisecond range by blocking the queue in the kernel
quickly, we'd have made a big step forward.

Regards
Martin


WARNING: multiple messages have this Message-ID (diff)
From: Martin Wilck <martin.wilck@suse.com>
To: "emilne@redhat.com" <emilne@redhat.com>,
	"Ulrich.Windl@rz.uni-regensburg.de"
	<Ulrich.Windl@rz.uni-regensburg.de>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>,
	"jejb@linux.vnet.ibm.com" <jejb@linux.vnet.ibm.com>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"dm-devel@redhat.com" <dm-devel@redhat.com>,
	"dgilbert@interlog.com" <dgilbert@interlog.com>,
	"systemd-devel@lists.freedesktop.org"
	<systemd-devel@lists.freedesktop.org>, "hch@lst.de" <hch@lst.de>
Subject: Re: [dm-devel] RFC: one more time: SCSI device identification
Date: Tue, 27 Apr 2021 20:33:43 +0000	[thread overview]
Message-ID: <488ef3e7fa0cca4f0a0cb2e9307ddaa08385d3f7.camel@suse.com> (raw)
In-Reply-To: <65f66a5e03081dd3b470fa9aeff9a77dbc41743c.camel@redhat.com>

On Tue, 2021-04-27 at 16:14 -0400, Ewan D. Milne wrote:
> On Mon, 2021-04-26 at 13:16 +0000, Martin Wilck wrote:
> > On Mon, 2021-04-26 at 13:14 +0200, Ulrich Windl wrote:
> > > > > 
> > > > 
> > > > While we're at it, I'd like to mention another issue: WWID
> > > > changes.
> > > > 
> > > > This is a big problem for multipathd. The gist is that the
> > > > device
> > > > identification attributes in sysfs only change after rescanning
> > > > the
> > > > device. Thus if a user changes LUN assignments on a storage
> > > > system,
> > > > it can happen that a direct INQUIRY returns a different WWID as
> > > > in
> > > > sysfs, which is fatal. If we plan to rely more on sysfs for
> > > > device
> > > > identification in the future, the problem gets worse. 
> > > 
> > > I think many devices rely on the fact that they are identified by
> > > Vendor/model/serial_nr, because in most professional SAN storage
> > > systems you
> > > can pre-set the serial number to a custom value; so if you want a
> > > new
> > > disk
> > > (maybe a snapshot) to be compatible with the old one, just assign
> > > the
> > > same
> > > serial number. I guess that's the idea behind.
> > 
> > What you are saying sounds dangerous to me. If a snapshot has the
> > same
> > WWID as the device it's a snapshot of, it must not be exposed to
> > any
> > host(s) at the same time with its origin, otherwise the host may
> > happily combine it with the origin into one multipath map, and data
> > corruption will almost certainly result. 
> > 
> > My argument is about how the host is supposed to deal with a WWID
> > change if it happens. Here, "WWID change" means that a given
> > H:C:T:L
> > suddenly exposes different device designators than it used to,
> > while
> > this device is in use by a host. Here, too, data corruption is
> > imminent, and can happen in a blink of an eye. To avoid this,
> > several
> > things are needed:
> > 
> >  1) the host needs to get notified about the change (likely by an
> > UA
> > of
> > some sort)
> >  2) the kernel needs to react to the notification immediately, e.g.
> > by
> > blocking IO to the device,
> 
> There's no way to do that, in principle.  Because there could be
> other I/Os in flight.  You might (somehow) avoid retrying an I/O
> that got a UA until you figured out if something changed, but other
> I/Os can already have been sent to the target, or issued before you
> get to look at the status.

Right. But in practice, a WWID change will hardly happen under full IO
load. The storage side will probably have to block IO while this
happens, at least for a short time period. So blocking and quiescing
the queue upon an UA might still work, most of the time. Even if we
were too late already, the sooner we stop the queue, the better.

The current algorithm in multipath-tools needs to detect a path going
down and being reinstated. The time interval during which a WWID change
will go unnoticed is one or more path checker intervals, typically on
the order of 5-30 seconds. If we could decrease this interval to a sub-
second or even millisecond range by blocking the queue in the kernel
quickly, we'd have made a big step forward.

Regards
Martin


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


  reply	other threads:[~2021-04-27 20:35 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-29  9:58 RFC: one more time: SCSI device identification Martin Wilck
2021-03-29  9:58 ` [dm-devel] " Martin Wilck
2021-04-06  4:47 ` Martin K. Petersen
2021-04-06  4:47   ` [dm-devel] " Martin K. Petersen
2021-04-16 23:28   ` Martin Wilck
2021-04-16 23:28     ` [dm-devel] " Martin Wilck
2021-04-22  2:46     ` Martin K. Petersen
2021-04-22  2:46       ` [dm-devel] " Martin K. Petersen
2021-04-22  9:07       ` Martin Wilck
2021-04-22  9:07         ` [dm-devel] " Martin Wilck
2021-04-22 16:14         ` Benjamin Marzinski
2021-04-22 16:14           ` [dm-devel] " Benjamin Marzinski
2021-04-23  1:40         ` Martin K. Petersen
2021-04-23  1:40           ` [dm-devel] " Martin K. Petersen
2021-04-23 10:28           ` Martin Wilck
2021-04-23 10:28             ` [dm-devel] " Martin Wilck
2021-04-26 11:14             ` Antw: [EXT] Re: [systemd-devel] " Ulrich Windl
2021-04-26 11:14               ` [dm-devel] " Ulrich Windl
2021-04-26 13:16               ` Martin Wilck
2021-04-26 13:16                 ` [dm-devel] " Martin Wilck
2021-04-27  3:48                 ` Erwin van Londen
2021-04-27  7:02                   ` Antw: [EXT] " Ulrich Windl
2021-04-27  7:02                     ` [dm-devel] Antw: [EXT] " Ulrich Windl
2021-04-27  8:10                   ` [dm-devel] " Martin Wilck
2021-04-27  8:10                     ` Martin Wilck
2021-04-27  8:21                     ` Hannes Reinecke
2021-04-27  8:21                       ` Hannes Reinecke
2021-04-27 10:52                       ` Antw: [EXT] " Ulrich Windl
2021-04-27 10:52                         ` [dm-devel] Antw: [EXT] " Ulrich Windl
2021-04-27 20:04                         ` Antw: [EXT] Re: [dm-devel] " Ewan D. Milne
2021-04-27 20:04                           ` [dm-devel] Antw: [EXT] " Ewan D. Milne
2021-05-04  7:32                         ` Antw: [EXT] Re: [dm-devel] " Hannes Reinecke
2021-05-04  7:32                           ` [dm-devel] Antw: [EXT] " Hannes Reinecke
2021-04-28  1:01                       ` [dm-devel] " Erwin van Londen
2021-04-28  6:34                         ` Martin Wilck
2021-04-28  6:34                           ` Martin Wilck
2021-04-29 14:47                           ` Erwin van Londen
2021-04-29 14:47                             ` Erwin van Londen
2021-04-27 20:14                 ` Ewan D. Milne
2021-04-27 20:14                   ` [dm-devel] " Ewan D. Milne
2021-04-27 20:33                   ` Martin Wilck [this message]
2021-04-27 20:33                     ` Martin Wilck
2021-04-27 20:41                     ` Ewan D. Milne
2021-04-27 20:41                       ` [dm-devel] " Ewan D. Milne
2021-04-28  0:09                       ` Erwin van Londen
2021-04-30 23:44                         ` Ewan D. Milne
2021-05-03  2:34                           ` Erwin van Londen
2021-05-03  2:34                             ` Erwin van Londen
2021-04-28  6:30                       ` [systemd-devel] " Martin Wilck
2021-04-28  6:30                         ` [dm-devel] " Martin Wilck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=488ef3e7fa0cca4f0a0cb2e9307ddaa08385d3f7.camel@suse.com \
    --to=martin.wilck@suse.com \
    --cc=Ulrich.Windl@rz.uni-regensburg.de \
    --cc=bmarzins@redhat.com \
    --cc=dgilbert@interlog.com \
    --cc=dm-devel@redhat.com \
    --cc=emilne@redhat.com \
    --cc=hare@suse.com \
    --cc=hch@lst.de \
    --cc=jejb@linux.vnet.ibm.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=systemd-devel@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.