Missing lock during partition read

From: "Tkaczyk, Mariusz" <mariusz.tkaczyk@intel.com>
To: "linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>
Subject: Missing lock during partition read
Date: Thu, 5 Nov 2020 09:31:21 +0000
Message-ID: <SA0PR11MB4542ECA84F72506B39C3C9F1FFEE0@SA0PR11MB4542.namprd11.prod.outlook.com>

Hi all,

I found an issue related to a missing locking mechanism during the
partition detection process. I'm using an md RAID 5 array with IMSM
metadata. On top of the array I created a GPT with several partitions.

The problem is that a partition may stay read-only even after the parent
device (in this case the RAID array) becomes read-write. The issue doesn't
affect every partition: in my results some partitions are read-write and
the rest are not.

It is related to the RAID assembly process for arrays with external
metadata. The array first appears read-only and is later switched to
read-write mode. The array's read-only state is respected correctly, so if
partition detection starts at this stage, the partitions are created
read-only.

The mode switch is done from userspace by mdmon, which manages the array's
sysfs attribute "md/array_state"; the kernel switches the array to
read-write from this context. This is done by the set_disk_ro() function
(see array_state_store() in md.c).
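For illustration, the userspace side of the switch amounts to writing a
state string into that sysfs attribute. This is a minimal sketch under
stated assumptions, not mdmon's actual code (mdmon is written in C): the
helper names are hypothetical, and it assumes that writing "active" to a
read-only array is what triggers the read-write transition.

```python
from pathlib import Path

# Hypothetical helpers for illustration only; mdmon's real logic differs.
# They mimic the sysfs write that lands in array_state_store() in md.c.

def array_state_path(md_name: str, sysfs_root: str = "/sys/block") -> Path:
    """Path of the md/array_state attribute, e.g. /sys/block/md126/md/array_state."""
    return Path(sysfs_root) / md_name / "md" / "array_state"


def read_array_state(md_name: str, sysfs_root: str = "/sys/block") -> str:
    """Return the current state string, e.g. "readonly" or "active"."""
    return array_state_path(md_name, sysfs_root).read_text().strip()


def set_array_state(md_name: str, state: str, sysfs_root: str = "/sys/block") -> None:
    """Write a new state. Assumption: writing "active" to a read-only array
    makes the kernel switch it to read-write via set_disk_ro()."""
    array_state_path(md_name, sysfs_root).write_text(state + "\n")
```

Run as root, `set_array_state("md126", "active")` would request the
transition; the `sysfs_root` parameter exists only so the helpers can be
exercised against a scratch directory.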

So, as I wrote before, partition detection starts while the array is
read-only. I found that the issue occurs if mdmon changes the array state
in the background during this process. As a result, the state is changed
only on the partitions that have already been detected; the switch does
not wait for the remaining ones to appear. Udev reports the md device
change event (generated by the "md/array_state" update) between the
partition add events:

KERNEL[85844.484805] add /devices/virtual/block/md126/md126p1 (block)
KERNEL[85844.484853] change /devices/virtual/block/md126 (block)
KERNEL[85844.484912] add /devices/virtual/block/md126/md126p2 (block)

It ends with /dev/md126p2 left read-only. It can be fixed manually with
partprobe, but the system may drop to the emergency shell or the dracut
shell, depending on the configuration.
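One way to spot the symptom is to compare each partition's sysfs "ro"
attribute with the parent's. A minimal sketch, assuming the standard
layout (/sys/block/<disk>/ro and /sys/block/<disk>/<partition>/ro, where
each partition directory also has a "partition" attribute); the function
name is hypothetical.

```python
from pathlib import Path


def stale_ro_partitions(disk: str, sysfs_root: str = "/sys/block") -> list[str]:
    """Return the partitions of `disk` whose ro flag differs from the parent's."""
    disk_dir = Path(sysfs_root) / disk
    parent_ro = (disk_dir / "ro").read_text().strip()
    stale = []
    for part_dir in sorted(disk_dir.iterdir()):
        # Only partition subdirectories carry a "partition" attribute;
        # is_file() returns False for plain files and other directories.
        if not (part_dir / "partition").is_file():
            continue
        if (part_dir / "ro").read_text().strip() != parent_ro:
            stale.append(part_dir.name)
    return stale
```

After the array has gone read-write, a non-empty result such as
`["md126p2"]` would indicate the stale state described above.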

My understanding is that these two actions aren't synchronized, so a race
occurs. To prevent it, the shared resources should be locked. It looks
like an md problem, since it cannot be reproduced on standalone drives.
What are your thoughts?
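To make the suspected race concrete, here is a toy Python model, not
kernel code. It assumes the partition-add path reads the parent's
read-only flag and only later makes the partition visible to the ro
switch, which updates only partitions already visible. `ToyDisk`,
`assemble`, and the shared lock are hypothetical names; holding one lock
across both the read-and-publish step and the switch-and-propagate step is
the kind of locking suggested above.

```python
import contextlib
import threading
import time


class ToyDisk:
    """Toy model: partitions copy the parent's ro flag when added, and the
    ro switch only updates partitions that already exist."""

    def __init__(self, ro: bool, locked: bool = True) -> None:
        self.ro = ro
        self.partitions: dict[str, bool] = {}
        self.locked = locked
        # Hypothetical lock shared by partition scanning and the ro switch.
        self._lock = threading.Lock()

    def _guard(self):
        return self._lock if self.locked else contextlib.nullcontext()

    def add_partition(self, name: str, window: float = 0.0) -> None:
        with self._guard():
            inherited = self.ro                # read the parent's current flag
            time.sleep(window)                 # window a concurrent switch can hit
            self.partitions[name] = inherited  # publish the new partition

    def set_ro(self, ro: bool) -> None:
        with self._guard():
            self.ro = ro
            for name in list(self.partitions):  # only already-published partitions
                self.partitions[name] = ro


def assemble(locked: bool, parts: int = 4, window: float = 0.01) -> ToyDisk:
    """Scan partitions in one thread while the main thread (playing mdmon)
    switches the array to read-write partway through the scan."""
    disk = ToyDisk(ro=True, locked=locked)

    def scan() -> None:
        for i in range(1, parts + 1):
            disk.add_partition(f"p{i}", window)

    scanner = threading.Thread(target=scan)
    scanner.start()
    time.sleep(window * 1.5)  # let the scan get going
    disk.set_ro(False)        # array becomes read-write
    scanner.join()
    return disk
```

With `locked=True` every partition ends up read-write however the threads
interleave. With `locked=False`, a partition whose flag was read before the
switch but published after it typically keeps the stale read-only flag,
like /dev/md126p2 above. Where such a lock should live in md or the block
layer is not something this toy model answers.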

TIA,
Mariusz