On 18/12/2015 16:31, Ilya Dryomov wrote:
> On Fri, Dec 18, 2015 at 1:38 PM, Loic Dachary wrote:
>> Hi Ilya,
>>
>> It turns out that sgdisk 0.8.6 -i 2 /dev/vdb removes partitions and re-adds them on CentOS 7 with a 3.10.0-229.11.1.el7 kernel, in the same way partprobe does. It is used intensively by ceph-disk and inevitably leads to races where a device temporarily disappears. The same command (sgdisk 0.8.8) on Ubuntu 14.04 with a 3.13.0-62-generic kernel only generates two udev change events and does not remove / add partitions. The source code between sgdisk 0.8.6 and sgdisk 0.8.8 did not change in a significant way and the output of strace -e ioctl sgdisk -i 2 /dev/vdb is identical in both environments.
>>
>> ioctl(3, BLKGETSIZE, 20971520) = 0
>> ioctl(3, BLKGETSIZE64, 10737418240) = 0
>> ioctl(3, BLKSSZGET, 512) = 0
>> ioctl(3, BLKSSZGET, 512) = 0
>> ioctl(3, BLKSSZGET, 512) = 0
>> ioctl(3, BLKSSZGET, 512) = 0
>> ioctl(3, HDIO_GETGEO, {heads=16, sectors=63, cylinders=16383, start=0}) = 0
>> ioctl(3, HDIO_GETGEO, {heads=16, sectors=63, cylinders=16383, start=0}) = 0
>> ioctl(3, BLKGETSIZE, 20971520) = 0
>> ioctl(3, BLKGETSIZE64, 10737418240) = 0
>> ioctl(3, BLKSSZGET, 512) = 0
>> ioctl(3, BLKSSZGET, 512) = 0
>> ioctl(3, BLKGETSIZE, 20971520) = 0
>> ioctl(3, BLKGETSIZE64, 10737418240) = 0
>> ioctl(3, BLKSSZGET, 512) = 0
>> ioctl(3, BLKSSZGET, 512) = 0
>> ioctl(3, BLKSSZGET, 512) = 0
>> ioctl(3, BLKSSZGET, 512) = 0
>> ioctl(3, BLKSSZGET, 512) = 0
>> ioctl(3, BLKSSZGET, 512) = 0
>> ioctl(3, BLKSSZGET, 512) = 0
>> ioctl(3, BLKSSZGET, 512) = 0
>> ioctl(3, BLKSSZGET, 512) = 0
>> ioctl(3, BLKSSZGET, 512) = 0
>> ioctl(3, BLKSSZGET, 512) = 0
>> ioctl(3, BLKSSZGET, 512) = 0
>> ioctl(3, BLKSSZGET, 512) = 0
>>
>> This leads me to the conclusion that the difference is in how the kernel reacts to these ioctls.
>
> I'm pretty sure it's not the kernel versions that matter here, but
> systemd versions.
> Those are all get-property ioctls, and I don't think
> sgdisk -i does anything with the partition table.
>
> What it probably does though is it opens the disk for write for some
> reason. When it closes it, udevd (systemd-udevd process) picks that
> close up via inotify and issues the BLKRRPART ioctl, instructing the
> kernel to re-read the partition table. Technically, that's different
> from what partprobe does, but it still generates those udev events you
> are seeing in the monitor.
>
> AFAICT udevd started doing this in v214.

That explains everything indeed.

# strace -f -e open sgdisk -i 2 /dev/vdb
...
open("/dev/vdb", O_RDONLY) = 4
open("/dev/vdb", O_WRONLY|O_CREAT, 0644) = 4
open("/dev/vdb", O_RDONLY) = 4
Partition GUID code: 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 (Unknown)
Partition unique GUID: 7BBAA731-AA45-47B8-8661-B4FAA53C4162
First sector: 2048 (at 1024.0 KiB)
Last sector: 204800 (at 100.0 MiB)
Partition size: 202753 sectors (99.0 MiB)
Attribute flags: 0000000000000000
Partition name: 'ceph journal'

# strace -f -e open blkid /dev/vdb2
...
open("/etc/blkid.conf", O_RDONLY) = 4
open("/dev/.blkid.tab", O_RDONLY) = 4
open("/dev/vdb2", O_RDONLY) = 4
open("/sys/dev/block/253:18", O_RDONLY) = 5
open("/sys/block/vdb/dev", O_RDONLY) = 6
open("/dev/.blkid.tab-hVvwJi", O_RDWR|O_CREAT|O_EXCL, 0600) = 4

blkid does not open the device for write, hence the different behavior. Replacing sgdisk with blkid fixes the issue. Nice catch!

> Thanks,
>
> Ilya
>

--
Loïc Dachary, Artisan Logiciel Libre
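
PS: the manual comparison of the two strace outputs can be automated. Below is a minimal sketch, assuming Python, that scans `strace -e open` output for opens of a given device whose flags allow writing. That is the condition which, per Ilya's explanation, makes udevd issue BLKRRPART when the descriptor is closed. The function name `write_opens` and the regular expression are hypothetical helpers for illustration and are not part of ceph-disk; the sample traces are copied from the output quoted in this message.

```python
import re

# Matches strace lines such as: open("/dev/vdb", O_WRONLY|O_CREAT, 0644) = 4
OPEN_RE = re.compile(r'open\("(?P<path>[^"]+)", (?P<flags>[A-Z_|]+)')

# Access modes that let the caller write to the file descriptor.
WRITE_FLAGS = {"O_WRONLY", "O_RDWR"}


def write_opens(strace_output, device):
    """Return the flag string of every open() of `device` that allows writing."""
    hits = []
    for line in strace_output.splitlines():
        match = OPEN_RE.search(line)
        if match and match.group("path") == device:
            flags = set(match.group("flags").split("|"))
            if flags & WRITE_FLAGS:
                hits.append(match.group("flags"))
    return hits


# Traces copied from the strace runs in this message.
SGDISK_TRACE = '''open("/dev/vdb", O_RDONLY) = 4
open("/dev/vdb", O_WRONLY|O_CREAT, 0644) = 4
open("/dev/vdb", O_RDONLY) = 4'''

BLKID_TRACE = '''open("/etc/blkid.conf", O_RDONLY) = 4
open("/dev/.blkid.tab", O_RDONLY) = 4
open("/dev/vdb2", O_RDONLY) = 4
open("/sys/dev/block/253:18", O_RDONLY) = 5
open("/sys/block/vdb/dev", O_RDONLY) = 6
open("/dev/.blkid.tab-hVvwJi", O_RDWR|O_CREAT|O_EXCL, 0600) = 4'''

print(write_opens(SGDISK_TRACE, "/dev/vdb"))   # ['O_WRONLY|O_CREAT']
print(write_opens(BLKID_TRACE, "/dev/vdb2"))   # []
```

The blkid trace does contain one O_RDWR open, but it targets the cache file /dev/.blkid.tab-hVvwJi rather than the block device, so matching on the exact device path keeps it out of the result.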