On Tue, Jun 06, 2023 at 10:25:47AM +0200, Roger Pau Monné wrote:
> On Tue, May 30, 2023 at 04:31:13PM -0400, Demi Marie Obenour wrote:
> > This allows specifying a disk sequence number in XenStore. If it does
> > not match the disk sequence number of the underlying device, the device
> > will not be exported and a warning will be logged. Userspace can use
> > this to eliminate race conditions due to major/minor number reuse.
> > Old kernels do not support the new syntax, but a later patch will allow
> > userspace to discover that the new syntax is supported.
> >
> > Signed-off-by: Demi Marie Obenour
> > ---
> >  drivers/block/xen-blkback/xenbus.c | 112 +++++++++++++++++++++++------
> >  1 file changed, 89 insertions(+), 23 deletions(-)
> >
> > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> > index 4807af1d58059394d7a992335dabaf2bc3901721..9c3eb148fbd802c74e626c3d7bcd69dcb09bd921 100644
> > --- a/drivers/block/xen-blkback/xenbus.c
> > +++ b/drivers/block/xen-blkback/xenbus.c
> > @@ -24,6 +24,7 @@ struct backend_info {
> >  	struct xenbus_watch backend_watch;
> >  	unsigned major;
> >  	unsigned minor;
> > +	unsigned long long diskseq;
>
> Since diskseq is declared as u64 in gendisk, better use the same type
> here too?

simple_strtoull() returns an unsigned long long, and C permits
unsigned long long to be larger than 64 bits.

> >  	char *mode;
> >  };
> >
> > @@ -479,7 +480,7 @@ static void xen_vbd_free(struct xen_vbd *vbd)
> >
> >  static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
> >  			  unsigned major, unsigned minor, int readonly,
> > -			  int cdrom)
> > +			  bool cdrom, u64 diskseq)
> >  {
> >  	struct xen_vbd *vbd;
> >  	struct block_device *bdev;
> > @@ -507,6 +508,26 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
> >  		xen_vbd_free(vbd);
> >  		return -ENOENT;
> >  	}
> > +
> > +	if (diskseq) {
> > +		struct gendisk *disk = bdev->bd_disk;
>
> const.
>
> > +
> > +		if (unlikely(disk == NULL)) {
> > +			pr_err("%s: device %08x has no gendisk\n",
> > +			       __func__, vbd->pdevice);
> > +			xen_vbd_free(vbd);
> > +			return -EFAULT;
>
> ENODEV or ENOENT might be more accurate IMO.

I will drop it, as this turns out to be unreachable code.

> > +		}
> > +
> > +		if (unlikely(disk->diskseq != diskseq)) {
> > +			pr_warn("%s: device %08x has incorrect sequence "
> > +				"number 0x%llx (expected 0x%llx)\n",
>
> I prefer %#llx, and likely pr_err like above. Also I think it's now
> preferred to not split printed lines, so that `grep "has incorrect
> sequence number" ...` can find the instance.

Ah, so _that_ is why I got a warning from checkpatch!

> > +				__func__, vbd->pdevice, disk->diskseq, diskseq);
> > +			xen_vbd_free(vbd);
> > +			return -ENODEV;
> > +		}
> > +	}
> > +
> >  	vbd->size = vbd_sz(vbd);
> >
> >  	if (cdrom || disk_to_cdi(vbd->bdev->bd_disk))
> > @@ -707,6 +728,9 @@ static void backend_changed(struct xenbus_watch *watch,
> >  	int cdrom = 0;
> >  	unsigned long handle;
> >  	char *device_type;
> > +	char *diskseq_str = NULL;
>
> const, and I think there's no need to init to NULL.
>
> > +	int diskseq_len;
>
> unsigned int
>
> > +	unsigned long long diskseq;
>
> u64
>
> >
> >  	pr_debug("%s %p %d\n", __func__, dev, dev->otherend_id);
> >
> > @@ -725,10 +749,46 @@ static void backend_changed(struct xenbus_watch *watch,
> >  		return;
> >  	}
> >
> > -	if (be->major | be->minor) {
> > -		if (be->major != major || be->minor != minor)
> > -			pr_warn("changing physical device (from %x:%x to %x:%x) not supported.\n",
> > -				be->major, be->minor, major, minor);
> > +	diskseq_str = xenbus_read(XBT_NIL, dev->nodename, "diskseq", &diskseq_len);
> > +	if (IS_ERR(diskseq_str)) {
> > +		int err = PTR_ERR(diskseq_str);
> > +		diskseq_str = NULL;
> > +
> > +		/*
> > +		 * If this does not exist, it means legacy userspace that does not
> > +		 * support diskseq.
> > +		 */
> > +		if (unlikely(!XENBUS_EXIST_ERR(err))) {
> > +			xenbus_dev_fatal(dev, err, "reading diskseq");
> > +			return;
> > +		}
> > +		diskseq = 0;
> > +	} else if (diskseq_len <= 0) {
> > +		xenbus_dev_fatal(dev, -EFAULT, "diskseq must not be empty");
> > +		goto fail;
> > +	} else if (diskseq_len > 16) {
> > +		xenbus_dev_fatal(dev, -ERANGE, "diskseq too long: got %d but limit is 16",
> > +				 diskseq_len);
> > +		goto fail;
> > +	} else if (diskseq_str[0] == '0') {
> > +		xenbus_dev_fatal(dev, -ERANGE, "diskseq must not start with '0'");
> > +		goto fail;
> > +	} else {
> > +		char *diskseq_end;
> > +		diskseq = simple_strtoull(diskseq_str, &diskseq_end, 16);
> > +		if (diskseq_end != diskseq_str + diskseq_len) {
> > +			xenbus_dev_fatal(dev, -EINVAL, "invalid diskseq");
> > +			goto fail;
> > +		}
> > +		kfree(diskseq_str);
> > +		diskseq_str = NULL;
> > +	}
>
> Won't it be simpler to use xenbus_scanf() with %llx formatter?

xenbus_scanf() doesn’t check for overflow and accepts lots of junk it
really should not. Should this be fixed in xenbus_scanf()?

> Also, we might want to fetch "physical-device" and "diskseq" inside
> the same xenstore transaction.

Should the rest of the xenstore reads be included in the same
transaction?

> Also, you tie this logic to the "physical-device" watch, which
> strictly implies that the "diskseq" node must be written to xenstore
> before the "physical-device" node. This seems fragile, but I don't
> see much better option since the "diskseq" is optional.

What about including the diskseq in the "physical-device" node? Perhaps
use diskseq@major:minor syntax?

> The node and its behaviour should be documented in blkif.h.

Indeed so.

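To make the parsing rules concrete, here is a rough userspace sketch of
the checks the hunk above performs (non-empty, at most 16 hex digits, no
leading '0', nothing but hex digits). It is an illustration only, not
the patch and not kernel code: parse_diskseq() is a hypothetical name,
the digit loop is hand-rolled rather than calling simple_strtoull(), and
it returns -EINVAL for an empty string where the hunk uses -EFAULT.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): parse a diskseq string of
 * exactly `len` bytes.  Returns 0 and stores the value in *out on
 * success, or a negative errno value on failure.
 */
static int parse_diskseq(const char *s, size_t len, uint64_t *out)
{
	uint64_t val = 0;
	size_t i;

	if (len == 0)
		return -EINVAL;	/* assumption: the patch reports -EFAULT here */
	if (len > 16)
		return -ERANGE;	/* 17+ hex digits cannot fit in 64 bits */
	if (s[0] == '0')
		return -ERANGE;	/* rejects "0", leading zeros and "0x" */

	for (i = 0; i < len; i++) {
		unsigned int digit;
		char c = s[i];

		if (c >= '0' && c <= '9')
			digit = (unsigned int)(c - '0');
		else if (c >= 'a' && c <= 'f')
			digit = (unsigned int)(c - 'a') + 10;
		else if (c >= 'A' && c <= 'F')
			digit = (unsigned int)(c - 'A') + 10;
		else
			return -EINVAL;	/* sign, whitespace, NUL, other junk */

		/* Cannot overflow: at most 16 digits of 4 bits each. */
		val = (val << 4) | digit;
	}

	*out = val;
	return 0;
}
```

Because the length is bounded before any digit is consumed, overflow is
ruled out up front instead of being detected afterwards, which is the
property xenbus_scanf() with %llx does not give.
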
> > +	if (be->major | be->minor | be->diskseq) {
> > +		if (be->major != major || be->minor != minor || be->diskseq != diskseq)
> > +			pr_warn("changing physical device (from %x:%x:%llx to %x:%x:%llx)"
> > +				" not supported.\n",
> > +				be->major, be->minor, be->diskseq, major, minor, diskseq);
> >  		return;
>
> You are leaking diskseq_str here, and in all the error cases between
> here and up to the call to xen_vbd_create().

I will fix this by moving the diskseq reading code into its own
function.

--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab