All of lore.kernel.org
 help / color / mirror / Atom feed
* [md PATCH 0/2] Resend raid10-NULL-deref fix
@ 2018-02-02 22:19 NeilBrown
  2018-02-02 22:19 ` [md PATCH 2/2] md: only allow remove_and_add_spares when no sync_thread running NeilBrown
  2018-02-02 22:19 ` [md PATCH 1/2] md: document lifetime of internal rdev pointer NeilBrown
  0 siblings, 2 replies; 4+ messages in thread
From: NeilBrown @ 2018-02-02 22:19 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid, yuyufen, colyli

Hi Shaohua,
 you asked me to resend this (second) patch with improved
 comments back in November, and it completely slipped my mind - sorry.
 I decided the comments would fit better in a separate patch,
 so following are a patch to add comments, and then the bugfix
 patch marked for -stable.

 You asked:

   This will bypass hotadd too, is it what we want?

 No, it doesn't.  In the only cases where the patch makes a
 difference, a non-NULL rdev will have been passed as "this",
 so hot-add will be skipped anyway.

 This patch includes a bugfix as reported by yuyufen
 
Thanks,
NeilBrown

---

NeilBrown (2):
      md: document lifetime of internal rdev pointer.
      md: only allow remove_and_add_spares when no sync_thread running.


 drivers/md/md.c     |    4 ++++
 drivers/md/raid1.h  |   12 ++++++++++++
 drivers/md/raid10.h |   13 +++++++++++++
 drivers/md/raid5.c  |    4 ++--
 drivers/md/raid5.h  |   12 ++++++++++++
 5 files changed, 43 insertions(+), 2 deletions(-)

--
Signature


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [md PATCH 1/2] md: document lifetime of internal rdev pointer.
  2018-02-02 22:19 [md PATCH 0/2] Resend raid10-NULL-deref fix NeilBrown
  2018-02-02 22:19 ` [md PATCH 2/2] md: only allow remove_and_add_spares when no sync_thread running NeilBrown
@ 2018-02-02 22:19 ` NeilBrown
  1 sibling, 0 replies; 4+ messages in thread
From: NeilBrown @ 2018-02-02 22:19 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid, yuyufen, colyli

The rdev pointer kept in the local 'config' for each for
raid1, raid10, raid4/5/6 has non-obvious lifetime rules.
Sometimes RCU is needed, sometimes a lock, something nothing.

Add documentation to explain this.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/raid1.h  |   12 ++++++++++++
 drivers/md/raid10.h |   13 +++++++++++++
 drivers/md/raid5.h  |   12 ++++++++++++
 3 files changed, 37 insertions(+)

diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
index c7294e7557e0..eb84bc68e2fd 100644
--- a/drivers/md/raid1.h
+++ b/drivers/md/raid1.h
@@ -26,6 +26,18 @@
 #define BARRIER_BUCKETS_NR_BITS		(PAGE_SHIFT - ilog2(sizeof(atomic_t)))
 #define BARRIER_BUCKETS_NR		(1<<BARRIER_BUCKETS_NR_BITS)
 
+/* Note: raid1_info.rdev can be set to NULL asynchronously by raid1_remove_disk.
+ * There are three safe ways to access raid1_info.rdev.
+ * 1/ when holding mddev->reconfig_mutex
+ * 2/ when resync/recovery is known to be happening - i.e. in code that is
+ *    called as part of performing resync/recovery.
+ * 3/ while holding rcu_read_lock(), use rcu_dereference to get the pointer
+ *    and if it is non-NULL, increment rdev->nr_pending before dropping the
+ *    RCU lock.
+ * When .rdev is set to NULL, the nr_pending count checked again and if it has
+ * been incremented, the pointer is put back in .rdev.
+ */
+
 struct raid1_info {
 	struct md_rdev	*rdev;
 	sector_t	head_position;
diff --git a/drivers/md/raid10.h b/drivers/md/raid10.h
index db2ac22ac1b4..e2e8840de9bf 100644
--- a/drivers/md/raid10.h
+++ b/drivers/md/raid10.h
@@ -2,6 +2,19 @@
 #ifndef _RAID10_H
 #define _RAID10_H
 
+/* Note: raid10_info.rdev can be set to NULL asynchronously by
+ * raid10_remove_disk.
+ * There are three safe ways to access raid10_info.rdev.
+ * 1/ when holding mddev->reconfig_mutex
+ * 2/ when resync/recovery/reshape is known to be happening - i.e. in code
+ *    that is called as part of performing resync/recovery/reshape.
+ * 3/ while holding rcu_read_lock(), use rcu_dereference to get the pointer
+ *    and if it is non-NULL, increment rdev->nr_pending before dropping the
+ *    RCU lock.
+ * When .rdev is set to NULL, the nr_pending count checked again and if it has
+ * been incremented, the pointer is put back in .rdev.
+ */
+
 struct raid10_info {
 	struct md_rdev	*rdev, *replacement;
 	sector_t	head_position;
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index 2e6123825095..3f8da26032ac 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -450,6 +450,18 @@ enum {
  * HANDLE gets cleared if stripe_handle leaves nothing locked.
  */
 
+/* Note: disk_info.rdev can be set to NULL asynchronously by raid5_remove_disk.
+ * There are three safe ways to access disk_info.rdev.
+ * 1/ when holding mddev->reconfig_mutex
+ * 2/ when resync/recovery/reshape is known to be happening - i.e. in code that
+ *    is called as part of performing resync/recovery/reshape.
+ * 3/ while holding rcu_read_lock(), use rcu_dereference to get the pointer
+ *    and if it is non-NULL, increment rdev->nr_pending before dropping the RCU
+ *    lock.
+ * When .rdev is set to NULL, the nr_pending count checked again and if
+ * it has been incremented, the pointer is put back in .rdev.
+ */
+
 struct disk_info {
 	struct md_rdev	*rdev, *replacement;
 	struct page	*extra_page; /* extra page to use in prexor */



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [md PATCH 2/2] md: only allow remove_and_add_spares when no sync_thread running.
  2018-02-02 22:19 [md PATCH 0/2] Resend raid10-NULL-deref fix NeilBrown
@ 2018-02-02 22:19 ` NeilBrown
  2018-02-06 14:50   ` Artur Paszkiewicz
  2018-02-02 22:19 ` [md PATCH 1/2] md: document lifetime of internal rdev pointer NeilBrown
  1 sibling, 1 reply; 4+ messages in thread
From: NeilBrown @ 2018-02-02 22:19 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid, yuyufen, colyli

The locking protocols in md assume that a device will
never be removed from an array during resync/recovery/reshape.
When that isn't happening, rcu or reconfig_mutex is needed
to protect an rdev pointer while taking a refcount.  When
it is happening, that protection isn't needed.

Unfortunately there are cases were remove_and_add_spares() is
called when recovery might be happening: is state_store(),
slot_store() and hot_remove_disk().
In each case, this is just an optimization, to try to expedite
removal from the personality so the device can be removed from
the array.  If resync etc is happening, we just have to wait
for md_check_recover to find a suitable time to call
remove_and_add_spares().

This optimization and not essential so it doesn't
matter if it fails.
So change remove_and_add_spares() to abort early if
resync/recovery/reshape is happening, unless it is called
from md_check_recovery() as part of a newly started recovery.
The parameter "this" is only NULL when called from
md_check_recovery() so when it is NULL, there is no need to abort.

As this can result in a NULL dereference, the fix is suitable
for -stable.

cc: yuyufen <yuyufen@huawei.com>
Cc: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Fixes: 8430e7e0af9a ("md: disconnect device from personality before trying to remove it.")
Cc: stable@ver.kernel.org (v4.8+)
Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/md/md.c    |    4 ++++
 drivers/md/raid5.c |    4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 4e4dee0ec2de..926542fbc892 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8554,6 +8554,10 @@ static int remove_and_add_spares(struct mddev *mddev,
 	int removed = 0;
 	bool remove_some = false;
 
+	if (this && test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
+		/* Mustn't remove devices when resync thread is running */
+		return 0;
+
 	rdev_for_each(rdev, mddev) {
 		if ((this == NULL || rdev == this) &&
 		    rdev->raid_disk >= 0 &&
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 98ce4272ace9..3fa97dad3837 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4448,12 +4448,12 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
 		else if (is_bad) {
 			/* also not in-sync */
 			if (!test_bit(WriteErrorSeen, &rdev->flags) &&
-			    test_bit(R5_UPTODATE, &dev->flags)) {
+			    (test_bit(R5_UPTODATE, &dev->flags) || test_bit(R5_OVERWRITE, &dev->flags))) {
 				/* treat as in-sync, but with a read error
 				 * which we can now try to correct
 				 */
 				set_bit(R5_Insync, &dev->flags);
-				set_bit(R5_ReadError, &dev->flags);
+				//set_bit(R5_ReadError, &dev->flags);
 			}
 		} else if (test_bit(In_sync, &rdev->flags))
 			set_bit(R5_Insync, &dev->flags);



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [md PATCH 2/2] md: only allow remove_and_add_spares when no sync_thread running.
  2018-02-02 22:19 ` [md PATCH 2/2] md: only allow remove_and_add_spares when no sync_thread running NeilBrown
@ 2018-02-06 14:50   ` Artur Paszkiewicz
  0 siblings, 0 replies; 4+ messages in thread
From: Artur Paszkiewicz @ 2018-02-06 14:50 UTC (permalink / raw)
  To: NeilBrown, Shaohua Li; +Cc: linux-raid, yuyufen, colyli

On 02/02/2018 11:19 PM, NeilBrown wrote:
> The locking protocols in md assume that a device will
> never be removed from an array during resync/recovery/reshape.
> When that isn't happening, rcu or reconfig_mutex is needed
> to protect an rdev pointer while taking a refcount.  When
> it is happening, that protection isn't needed.
> 
> Unfortunately there are cases were remove_and_add_spares() is
> called when recovery might be happening: is state_store(),
> slot_store() and hot_remove_disk().
> In each case, this is just an optimization, to try to expedite
> removal from the personality so the device can be removed from
> the array.  If resync etc is happening, we just have to wait
> for md_check_recover to find a suitable time to call
> remove_and_add_spares().
> 
> This optimization and not essential so it doesn't
> matter if it fails.
> So change remove_and_add_spares() to abort early if
> resync/recovery/reshape is happening, unless it is called
> from md_check_recovery() as part of a newly started recovery.
> The parameter "this" is only NULL when called from
> md_check_recovery() so when it is NULL, there is no need to abort.
> 
> As this can result in a NULL dereference, the fix is suitable
> for -stable.
> 
> cc: yuyufen <yuyufen@huawei.com>
> Cc: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
> Fixes: 8430e7e0af9a ("md: disconnect device from personality before trying to remove it.")
> Cc: stable@ver.kernel.org (v4.8+)
> Signed-off-by: NeilBrown <neilb@suse.com>

I can confirm that this patch fixes a NULL pointer dereference issue for me.

Tested-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-02-06 14:50 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-02 22:19 [md PATCH 0/2] Resend raid10-NULL-deref fix NeilBrown
2018-02-02 22:19 ` [md PATCH 2/2] md: only allow remove_and_add_spares when no sync_thread running NeilBrown
2018-02-06 14:50   ` Artur Paszkiewicz
2018-02-02 22:19 ` [md PATCH 1/2] md: document lifetime of internal rdev pointer NeilBrown

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.