On Mon, 2021-04-26 at 13:16 +0000, Martin Wilck wrote:
> On Mon, 2021-04-26 at 13:14 +0200, Ulrich Windl wrote:
> > >
> > > While we're at it, I'd like to mention another issue: WWID
> > > changes.
> > >
> > > This is a big problem for multipathd. The gist is that the device
> > > identification attributes in sysfs only change after rescanning
> > > the device. Thus if a user changes LUN assignments on a storage
> > > system, it can happen that a direct INQUIRY returns a different
> > > WWID as in sysfs, which is fatal. If we plan to rely more on
> > > sysfs for device identification in the future, the problem gets
> > > worse.
> >
> > I think many devices rely on the fact that they are identified by
> > Vendor/model/serial_nr, because in most professional SAN storage
> > systems you can pre-set the serial number to a custom value; so if
> > you want a new disk (maybe a snapshot) to be compatible with the
> > old one, just assign the same serial number. I guess that's the
> > idea behind.
>
> What you are saying sounds dangerous to me. If a snapshot has the
> same WWID as the device it's a snapshot of, it must not be exposed to
> any host(s) at the same time with its origin, otherwise the host may
> happily combine it with the origin into one multipath map, and data
> corruption will almost certainly result.
>
> My argument is about how the host is supposed to deal with a WWID
> change if it happens. Here, "WWID change" means that a given H:C:T:L
> suddenly exposes different device designators than it used to, while
> this device is in use by a host. Here, too, data corruption is
> imminent, and can happen in a blink of an eye. To avoid this, several
> things are needed:
>
>  1) the host needs to get notified about the change (likely by an UA
>     of some sort)
>  2) the kernel needs to react to the notification immediately, e.g.
>     by blocking IO to the device,
>  3) userspace tooling such as udev or multipathd need to figure out
>     how to deal with the situation cleanly, and eventually unblock
>     it.
>
> Wrt 1), we can only hope that it's the case. But 2) and 3) need work,
> afaics.
>

In my view the WWID should never change. If a snapshot is created, it
should obtain a new WWID.

An example from a Hitachi array:

Device Identification VPD page:
  Addressed logical unit:
    designator type: T10 vendor identification,  code set: ASCII
      vendor id: HITACHI
      vendor specific: 50403B050709
    designator type: NAA,  code set: Binary
      0x60060e80123b050050403b0500000709

The majority of the NAA WWID is tied to the storage subsystem and
identifies the vendor OUI, model, serial etc. The last 4 digits (0709
in this example) indicate the LDEV ID (sorry, mainframe heritage
here..). When a snapshot is taken, these 4 digits will change, as a new
LDEV ID is assigned to the snapshot.

This sort of behaviour should be consistent across all storage vendors
imho.

> Martin
>
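
To make the NAA layout above a bit more concrete, here is a minimal
Python sketch that splits the example WWID into its fields. It assumes
the standard NAA type 6 (IEEE Registered Extended) layout: 4-bit NAA,
24-bit IEEE company ID (OUI), 36-bit vendor-specific identifier and a
64-bit vendor-specific extension. Treating the low 16 bits as the LDEV
ID is only my reading of this one Hitachi example, not something the
standard defines, and the names parse_naa6(), ldev_id() and
differs_only_in_ldev() as well as the "snapshot" WWID are made up for
illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Naa6:
    naa: int         # 4 bits, must be 6 for this format
    oui: int         # 24-bit IEEE company ID
    vendor_id: int   # 36-bit vendor-specific identifier
    vendor_ext: int  # 64-bit vendor-specific identifier extension


def parse_naa6(wwid: str) -> Naa6:
    """Split a 128-bit NAA type 6 designator given as a hex string."""
    hexstr = wwid.strip().lower()
    if hexstr.startswith("0x"):
        hexstr = hexstr[2:]
    if len(hexstr) != 32:
        raise ValueError("expected 32 hex digits (128 bits)")
    value = int(hexstr, 16)
    naa = value >> 124
    if naa != 6:
        raise ValueError("not an NAA type 6 designator")
    return Naa6(
        naa=naa,
        oui=(value >> 100) & 0xFFFFFF,
        vendor_id=(value >> 64) & 0xFFFFFFFFF,
        vendor_ext=value & 0xFFFFFFFFFFFFFFFF,
    )


def ldev_id(d: Naa6) -> int:
    # Assumption from the example above: the last 4 hex digits are the LDEV ID.
    return d.vendor_ext & 0xFFFF


def differs_only_in_ldev(a: Naa6, b: Naa6) -> bool:
    # Same array-specific prefix, but a different LDEV ID (same assumption).
    same_prefix = (a.naa, a.oui, a.vendor_id, a.vendor_ext >> 16) == (
        b.naa, b.oui, b.vendor_id, b.vendor_ext >> 16)
    return same_prefix and ldev_id(a) != ldev_id(b)


if __name__ == "__main__":
    origin = parse_naa6("0x60060e80123b050050403b0500000709")
    # Hypothetical snapshot WWID, invented for this example only.
    snapshot = parse_naa6("0x60060e80123b050050403b050000070a")
    print(f"OUI {origin.oui:06x}, LDEV {ldev_id(origin):04x}")
    print("snapshot differs only in LDEV ID:",
          differs_only_in_ldev(origin, snapshot))
```

The point is only that, with a scheme like this, a snapshot and its
origin never share a WWID, so a host cannot combine them into one
multipath map by accident.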