linux-kernel.vger.kernel.org archive mirror
* Race condition in Kernel
@ 2021-03-24 12:37 Gulam Mohamed
  2021-03-25  0:37 ` Ming Lei
  2021-03-25  1:46 ` Ming Lei
  0 siblings, 2 replies; 6+ messages in thread
From: Gulam Mohamed @ 2021-03-24 12:37 UTC (permalink / raw)
  To: hch, ming.lei, linux-kernel, linux-block
  Cc: Junxiao Bi, Martin Petersen, axboe

Hi All,

We are seeing a stale device symlink left behind during iscsi logout when the parted command is run just before the logout. Here are the details:

As part of iscsi logout, the partitions and the disk are removed. The parted command, used here to list the partitions, opens the disk in RW mode, which results in systemd-udevd re-reading the partition table. That triggers a partition rescan, which deletes and re-adds the partitions. So both the iscsi logout processing and parted (through systemd-udevd) end up adding and deleting partitions. In our case, the following sequence of operations happened (the iscsi device is /dev/sdb with partition sdb1):
	
	1. sdb1 was removed by parted
	2. kworker, as part of iscsi logout, couldn't remove sdb1 as it was already removed by parted
	3. sdb1 was added by parted
	4. sdb was now removed as part of iscsi logout (the last step of device removal, after removing the partitions)

Since the symlink /sys/class/block/sdb1 points to /sys/devices/platform/hostx/sessionx/targetx:x:x:x/x:x:x:x/block/sdb/sdb1 and sdb has already been removed, the symlink /sys/class/block/sdb1 is left orphaned and stale. So this stale link is the result of a race in the kernel between systemd-udevd and the iscsi logout processing, as described above. We are able to reproduce this even with the latest upstream kernel.
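
A stale link of the kind described above can be spotted by looking for symlinks whose target no longer resolves. The following is a minimal sketch, not part of the original report; the function name find_stale_links is hypothetical, and the default directory is taken from the paths mentioned in this mail.

```shell
#!/bin/bash
# Hypothetical helper: print every dangling symlink directly inside a
# directory (default: /sys/class/block, as discussed in this thread).
find_stale_links() {
    local dir=${1:-/sys/class/block}
    local entry
    for entry in "$dir"/*; do
        # -L: the entry itself is a symlink; ! -e: its target does not resolve.
        if [ -L "$entry" ] && [ ! -e "$entry" ]; then
            printf 'stale link: %s -> %s\n' "$entry" "$(readlink "$entry")"
        fi
    done
}
```

Calling `find_stale_links` with no argument checks /sys/class/block; it prints nothing when every link resolves.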
	
We have come across a patch from Ming Lei that was written to "avoid to drop & re-add partitions if partitions aren't changed":
https://lore.kernel.org/linux-block/20210216084430.GA23694@lst.de/T/
	
This patch could resolve our stale-link problem, but it seems to be a workaround rather than an actual fix for the race. We are looking for help fixing this race in the kernel. Do you have any idea how to fix it?
	
Following is the script we are using to reproduce the issue:
	
#!/bin/bash
  
dir=/sys/class/block
iter_count=0
while [ $iter_count -lt 10000000 ]; do
    iscsiadm -m node -T iqn.2016-01.com.example:target1 -p 100.100.242.162:3260 -l

    poll_loop=0
    while [ ! -e /sys/class/block/sdb1 ]; do
        ls  -i -l /sys/class/block/sd* > /dev/null
        let poll_loop+=1
        if [ $poll_loop -gt 1000000 ]; then
            ls  -i -l /sys/class/block/sd* --color
            exit 1
        fi
    done

    ls -i -l /sys/class/block/sd* --color
    mount /dev/sdb1 /mnt
    dd of=/mnt/sdb1 if=/dev/sdb2 bs=1M count=100 &
    pid_01=$!
    wait $pid_01
    umount -l /mnt &
    pid_02=$!
    wait $pid_02

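    # parted opens the disk RW, so systemd-udevd rescans the partitions
    # while the logout below is removing them.
    parted /dev/sdb -s print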
    iscsiadm -m node -T iqn.2016-01.com.example:target1 -p 100.100.242.162:3260 -u &
    pid_1=$!

    iscsiadm -m node -T iqn.2016-01.com.example:target2 -p 100.100.242.162:3260 -l &
    pid_2=$!

    sleep 1
    ls -i -l /sys/class/block/sd* --color

    # Any entry in $dir that fails -e is a dangling (stale) symlink.
    for link in "$dir"/*; do
        if [ ! -e "$link" ]; then
            echo "broken link: $link"
            exit 1
        fi
    done

    parted /dev/sdb -s print
    iscsiadm -m node -T iqn.2016-01.com.example:target2 -p 100.100.242.162:3260 -u
    iter_count=$((iter_count + 1))
done


Regards,
Gulam Mohamed.


* Re: Race condition in Kernel
  2021-03-24 12:37 Race condition in Kernel Gulam Mohamed
@ 2021-03-25  0:37 ` Ming Lei
  2021-03-25  0:58   ` Junxiao Bi
  2021-03-25  1:46 ` Ming Lei
  1 sibling, 1 reply; 6+ messages in thread
From: Ming Lei @ 2021-03-25  0:37 UTC (permalink / raw)
  To: Gulam Mohamed
  Cc: hch, linux-kernel, linux-block, Junxiao Bi, Martin Petersen, axboe

On Wed, Mar 24, 2021 at 12:37:03PM +0000, Gulam Mohamed wrote:
> Hi All,
> 
> We are facing a stale link (of the device) issue during the iscsi-logout process if we use parted command just before the iscsi logout. Here are the details:
> 	 	 
> As part of iscsi logout, the partitions and the disk will be removed. The parted command, used to list the partitions, will open the disk in RW mode which results in systemd-udevd re-reading the partitions. This will trigger the rescan partitions which will also delete and re-add the partitions. So, both iscsi logout processing and the parted (through systemd-udevd) will be involved in add/delete of partitions. In our case, the following sequence of operations happened (the iscsi device is /dev/sdb with partition sdb1):
> 	
> 	1. sdb1 was removed by PARTED
> 	2. kworker, as part of iscsi logout, couldn't remove sdb1 as it was already removed by PARTED
> 	3. sdb1 was added by parted
> 	4. sdb was NOW removed as part of iscsi logout (the last part of the device removal after removing the partitions)
> 
> Since the symlink /sys/class/block/sdb1 points to /sys/class/devices/platform/hostx/sessionx/targetx:x:x:x/x:x:x:x/block/sdb/sdb1 and since sdb is already removed, the symlink /sys/class/block/sdb1 will be orphan and stale. So, this stale link is a result of the race condition in kernel between the systemd-udevd and iscsi-logout processing as described above. We are able to reproduce this even with latest upstream kernel.
> 	
> We have come across a patch from Ming Lei which was created for "avoid to drop & re-add partitions if partitions aren't changed":
> https://lore.kernel.org/linux-block/20210216084430.GA23694@lst.de/T/

BTW, there is a newer version of this patchset:

https://lore.kernel.org/linux-block/20210224081825.GA1339@lst.de/#r

> 	
> This patch could resolve our problem of stale link but it just seems to be a work-around and not the actual fix for the race. We were looking for help to fix this race in kernel. Do you have any idea how to fix this race condition?
>

IMO, that isn't a workaround: the kernel shouldn't drop partitions if the
partition table hasn't changed. But Christoph thought the current approach
has been in place since the beginning of the kernel, and he suggested
fixing systemd-udev instead.



Thanks, 
Ming



* Re: Race condition in Kernel
  2021-03-25  0:37 ` Ming Lei
@ 2021-03-25  0:58   ` Junxiao Bi
  0 siblings, 0 replies; 6+ messages in thread
From: Junxiao Bi @ 2021-03-25  0:58 UTC (permalink / raw)
  To: Ming Lei, Gulam Mohamed
  Cc: hch, linux-kernel, linux-block, Martin Petersen, axboe

On 3/24/21 5:37 PM, Ming Lei wrote:

> On Wed, Mar 24, 2021 at 12:37:03PM +0000, Gulam Mohamed wrote:
>> Hi All,
>>
>> We are facing a stale link (of the device) issue during the iscsi-logout process if we use parted command just before the iscsi logout. Here are the details:
>> 	 	
>> As part of iscsi logout, the partitions and the disk will be removed. The parted command, used to list the partitions, will open the disk in RW mode which results in systemd-udevd re-reading the partitions. This will trigger the rescan partitions which will also delete and re-add the partitions. So, both iscsi logout processing and the parted (through systemd-udevd) will be involved in add/delete of partitions. In our case, the following sequence of operations happened (the iscsi device is /dev/sdb with partition sdb1):
>> 	
>> 	1. sdb1 was removed by PARTED
>> 	2. kworker, as part of iscsi logout, couldn't remove sdb1 as it was already removed by PARTED
>> 	3. sdb1 was added by parted
>> 	4. sdb was NOW removed as part of iscsi logout (the last part of the device removal after removing the partitions)
>>
>> Since the symlink /sys/class/block/sdb1 points to /sys/class/devices/platform/hostx/sessionx/targetx:x:x:x/x:x:x:x/block/sdb/sdb1 and since sdb is already removed, the symlink /sys/class/block/sdb1 will be orphan and stale. So, this stale link is a result of the race condition in kernel between the systemd-udevd and iscsi-logout processing as described above. We are able to reproduce this even with latest upstream kernel.
>> 	
>> We have come across a patch from Ming Lei which was created for "avoid to drop & re-add partitions if partitions aren't changed":
>> https://lore.kernel.org/linux-block/20210216084430.GA23694@lst.de/T/
> BTW,  there is a newer version of this patchset:
>
> https://lore.kernel.org/linux-block/20210224081825.GA1339@lst.de/#r
>
>> 	
>> This patch could resolve our problem of stale link but it just seems to be a work-around and not the actual fix for the race. We were looking for help to fix this race in kernel. Do you have any idea how to fix this race condition?
>>
> IMO, that isn't a work-around, kernel shouldn't drop partitions if
> partition table isn't changed. But Christoph thought the current approach
> is taken since beginning of kernel, and he suggested to fix systemd-udev.

This is a real kernel bug. Whatever BLKRRPART does, it should not cause
this stale sysfs link. Once the issue happens, there is no way to remove
the stale link short of a reboot. The situation is even worse when a new
disk is logged in afterwards: it reuses the disk number of the old one, so
creating its symbolic link fails because the stale link is still there.

Thanks,

Junxiao.

>
>
>
> Thanks,
> Ming
>


* Re: Race condition in Kernel
  2021-03-24 12:37 Race condition in Kernel Gulam Mohamed
  2021-03-25  0:37 ` Ming Lei
@ 2021-03-25  1:46 ` Ming Lei
  2021-04-01 16:27   ` Gulam Mohamed
  1 sibling, 1 reply; 6+ messages in thread
From: Ming Lei @ 2021-03-25  1:46 UTC (permalink / raw)
  To: Gulam Mohamed
  Cc: hch, linux-kernel, linux-block, Junxiao Bi, Martin Petersen, axboe

On Wed, Mar 24, 2021 at 12:37:03PM +0000, Gulam Mohamed wrote:
> Hi All,
> 
> We are facing a stale link (of the device) issue during the iscsi-logout process if we use parted command just before the iscsi logout. Here are the details:
> 	 	 
> As part of iscsi logout, the partitions and the disk will be removed. The parted command, used to list the partitions, will open the disk in RW mode which results in systemd-udevd re-reading the partitions. This will trigger the rescan partitions which will also delete and re-add the partitions. So, both iscsi logout processing and the parted (through systemd-udevd) will be involved in add/delete of partitions. In our case, the following sequence of operations happened (the iscsi device is /dev/sdb with partition sdb1):
> 	
> 	1. sdb1 was removed by PARTED
> 	2. kworker, as part of iscsi logout, couldn't remove sdb1 as it was already removed by PARTED
> 	3. sdb1 was added by parted

After the kworker is started for logout, I guess all I/O is supposed to fail
from that point on, so I wonder why 'sdb1' is still added by parted (systemd-udev)?
ioctl(BLKRRPART) needs to read the partition table to add the partitions back; if
I/O is failed by the iscsi logout, I guess the issue can be avoided too?
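
For reference, the partition re-read discussed here can also be requested by hand. This is a minimal sketch, assuming util-linux's blockdev is installed: `blockdev --rereadpt` issues ioctl(BLKRRPART) on the given device. The wrapper name reread_partitions is hypothetical, and whether a manual call hits the same window as systemd-udevd is not something this thread confirms.

```shell
#!/bin/bash
# Hypothetical wrapper: ask the kernel to re-read the partition table
# of a block device, refusing anything that is not a block device.
reread_partitions() {
    local dev=$1
    if [ ! -b "$dev" ]; then
        echo "not a block device: $dev" >&2
        return 1
    fi
    # blockdev --rereadpt issues ioctl(BLKRRPART) on the device.
    blockdev --rereadpt "$dev"
}
```

For example, `reread_partitions /dev/sdb` (as root) would request the rescan on the disk used in the report.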

-- 
Ming



* RE: Race condition in Kernel
  2021-03-25  1:46 ` Ming Lei
@ 2021-04-01 16:27   ` Gulam Mohamed
  2021-04-02  2:38     ` Ming Lei
  0 siblings, 1 reply; 6+ messages in thread
From: Gulam Mohamed @ 2021-04-01 16:27 UTC (permalink / raw)
  To: Ming Lei
  Cc: hch, linux-kernel, linux-block, Junxiao Bi, Martin Petersen, axboe

Hi Ming,

Thanks for taking a look at this. Please see my inline comments in the mail below.

Regards,
Gulam Mohamed.

-----Original Message-----
From: Ming Lei <ming.lei@redhat.com> 
Sent: Thursday, March 25, 2021 7:16 AM
To: Gulam Mohamed <gulam.mohamed@oracle.com>
Cc: hch@infradead.org; linux-kernel@vger.kernel.org; linux-block@vger.kernel.org; Junxiao Bi <junxiao.bi@oracle.com>; Martin Petersen <martin.petersen@oracle.com>; axboe@kernel.dk
Subject: Re: Race condition in Kernel

On Wed, Mar 24, 2021 at 12:37:03PM +0000, Gulam Mohamed wrote:
> Hi All,
> 
> We are facing a stale link (of the device) issue during the iscsi-logout process if we use parted command just before the iscsi logout. Here are the details:
> 	 	 
> As part of iscsi logout, the partitions and the disk will be removed. The parted command, used to list the partitions, will open the disk in RW mode which results in systemd-udevd re-reading the partitions. This will trigger the rescan partitions which will also delete and re-add the partitions. So, both iscsi logout processing and the parted (through systemd-udevd) will be involved in add/delete of partitions. In our case, the following sequence of operations happened (the iscsi device is /dev/sdb with partition sdb1):
> 	
> 	1. sdb1 was removed by PARTED
> 	2. kworker, as part of iscsi logout, couldn't remove sdb1 as it was already removed by PARTED
> 	3. sdb1 was added by parted

After kworker is started for logout, I guess all IOs are supposed to be failed at that time, so just wondering why 'sdb1' is still added by parted(systemd-udev)? 
ioctl(BLKRRPART) needs to read partition table for adding back partitions, if IOs are failed by iscsi logout, I guess the issue can be avoided too?

[GULAM]: Yes, ioctl(BLKRRPART) reads the partition table in order to add the partitions back. I put a printk in the code just after the partition table is read, and noticed that the partition table was read before the iscsi-logout kworker started the logout processing.
The logs follow for your reference:

Apr  1 09:23:27 gms-iscsi-initiator-2 kernel: ORA:: Calling sysfs_delete_link() for dev: sdb3 command: systemd-udevd		<== sdb3 Removed by PARTED
Apr  1 09:23:27 gms-iscsi-initiator-2 kernel: ORA:: rescan_partitions() Read Complete to the disk: sdb command: systemd-udevd   <== Reading sdb completed, before iscsi-logout worker started
Apr  1 09:23:27 gms-iscsi-initiator-2 kernel: ORA:: Calling sysfs_delete_link() for dev: 3:0:0:0 command: kworker/u16:3
Apr  1 09:23:27 gms-iscsi-initiator-2 kernel: sdb: sdb1 sdb2 sdb3
Apr  1 09:23:27 gms-iscsi-initiator-2 kernel: ORA:: device: 'sdb3': device_add command: systemd-udevd		<== sdb3 Added by PARTED 
Apr  1 09:23:27 gms-iscsi-initiator-2 kernel: ORA:: Calling sysfs_delete_link() for dev: 8:16 command: kworker/u16:3
Apr  1 09:23:27 gms-iscsi-initiator-2 kernel: ORA:: Calling sysfs_delete_link() for dev: sdb command: kworker/u16:3	<== sdb Removed by iscsi 
Apr  1 09:23:27 gms-iscsi-initiator-2 kernel: scsi 3:0:0:0: alua: Detached
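
The interleaving in these logs is easier to follow when each debug line is reduced to "task: event". The following is a minimal sketch, assuming the "ORA::" prefix and the trailing "command:" field used by the debug printk lines above; the function name summarize_events is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read kernel log lines on stdin and print
# "<task>: <event>" for each "ORA::" debug line, in original order.
summarize_events() {
    # 1st expression: drop the hand-written "<== ..." annotations.
    # 2nd expression: keep the event text and move the task name to the front;
    #                 lines without "ORA:: ... command:" are not printed.
    sed -n -E \
        -e 's/[[:space:]]*<==.*$//' \
        -e 's/^.*ORA:: (.*) command: ([^[:space:]]+)[[:space:]]*$/\2: \1/p'
}
```

Feeding the log excerpt above through `summarize_events` lists the systemd-udevd and kworker events side by side, which makes the add of sdb3 between the two kworker deletions easy to see.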

--
Ming



* Re: Race condition in Kernel
  2021-04-01 16:27   ` Gulam Mohamed
@ 2021-04-02  2:38     ` Ming Lei
  0 siblings, 0 replies; 6+ messages in thread
From: Ming Lei @ 2021-04-02  2:38 UTC (permalink / raw)
  To: Gulam Mohamed
  Cc: hch, linux-kernel, linux-block, Junxiao Bi, Martin Petersen, axboe

On Thu, Apr 01, 2021 at 04:27:37PM +0000, Gulam Mohamed wrote:
> Hi Ming,
> 
>       Thanks for taking a look into this. Can you please see my inline comments in below mail?
> 
> Regards,
> Gulam Mohamed.
> 
> -----Original Message-----
> From: Ming Lei <ming.lei@redhat.com> 
> Sent: Thursday, March 25, 2021 7:16 AM
> To: Gulam Mohamed <gulam.mohamed@oracle.com>
> Cc: hch@infradead.org; linux-kernel@vger.kernel.org; linux-block@vger.kernel.org; Junxiao Bi <junxiao.bi@oracle.com>; Martin Petersen <martin.petersen@oracle.com>; axboe@kernel.dk
> Subject: Re: Race condition in Kernel
> 
> On Wed, Mar 24, 2021 at 12:37:03PM +0000, Gulam Mohamed wrote:
> > Hi All,
> > 
> > We are facing a stale link (of the device) issue during the iscsi-logout process if we use parted command just before the iscsi logout. Here are the details:
> > 	 	 
> > As part of iscsi logout, the partitions and the disk will be removed. The parted command, used to list the partitions, will open the disk in RW mode which results in systemd-udevd re-reading the partitions. This will trigger the rescan partitions which will also delete and re-add the partitions. So, both iscsi logout processing and the parted (through systemd-udevd) will be involved in add/delete of partitions. In our case, the following sequence of operations happened (the iscsi device is /dev/sdb with partition sdb1):
> > 	
> > 	1. sdb1 was removed by PARTED
> > 	2. kworker, as part of iscsi logout, couldn't remove sdb1 as it was already removed by PARTED
> > 	3. sdb1 was added by parted
> 
> After kworker is started for logout, I guess all IOs are supposed to be failed at that time, so just wondering why 'sdb1' is still added by parted(systemd-udev)? 
> ioctl(BLKRRPART) needs to read partition table for adding back partitions, if IOs are failed by iscsi logout, I guess the issue can be avoided too?
> 
> [GULAM]: Yes, the ioctl(BLKRRPART) reads the partition table for adding back the partitions. I kept a printk in the code just after the partition table is read. Noticed that the partition table was read before the iscsi-logout kworker started the logout processing.

OK, I think I understand your issue now: what you want is to disallow
adding partitions from step 1 onward. So can you remove the disk right at
the beginning of step 2, if that is possible? Then step 1 isn't needed any
more.

For your issue, my patch 'not drop partitions if partition table isn't
changed' can't fix it completely, since a new real partition may still
come from parted during the sequence.


Thanks,
Ming


