All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V3 0/4] block: fix partition use-after-free and optimization
@ 2020-05-08  8:17 Ming Lei
  2020-05-08  8:17 ` [PATCH V3 1/4] block: fix use-after-free on cached last_lookup partition Ming Lei
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Ming Lei @ 2020-05-08  8:17 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, Yufen Yu, Christoph Hellwig, Hou Tao

Hi,

The 1st patch fixes one use-after-free on cached last_lookup partition.

The other 3 patches optimizes partition uses in IO path.

V3:
	- add reviewed-by tag
	- centralize partno check in the helper(4/4)

V2:
	- add comment, use part_to_disk() to retrieve disk instead of
	adding one field to hd_struct
	- don't put part in blk_account_io_merge


Ming Lei (4):
  block: fix use-after-free on cached last_lookup partition
  block: only define 'nr_sects_seq' in hd_part for 32bit SMP
  block: re-organize fields of 'struct hd_part'
  block: don't hold part0's refcount in IO path

 block/blk-core.c        | 12 ------------
 block/blk.h             | 13 ++++++-------
 block/genhd.c           | 17 +++++++++++++----
 block/partitions/core.c | 14 ++++++++++++--
 include/linux/genhd.h   | 24 +++++++++++++++++-------
 5 files changed, 48 insertions(+), 32 deletions(-)

Cc: Yufen Yu <yuyufen@huawei.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hou Tao <houtao1@huawei.com>
-- 
2.25.2


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH V3 1/4] block: fix use-after-free on cached last_lookup partition
  2020-05-08  8:17 [PATCH V3 0/4] block: fix partition use-after-free and optimization Ming Lei
@ 2020-05-08  8:17 ` Ming Lei
  2020-05-08  8:17 ` [PATCH V3 2/4] block: only define 'nr_sects_seq' in hd_part for 32bit SMP Ming Lei
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Ming Lei @ 2020-05-08  8:17 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, Yufen Yu, Christoph Hellwig, Hou Tao

delete_partition() clears the cached last_lookup partition. However the
.last_lookup cache may be overwritten by one IO path after it is cleared
from delete_partition(). Then another IO path may use the cached deleting
partition after hd_struct_free() is called, then use-after-free is triggered
on the cached partition.

Fixes the issue by the following approach:

1) always get the partition's refcount via hd_struct_try_get() before
setting .last_lookup

2) move clearing .last_lookup from delete_partition() to hd_struct_free()
which is the release handle of the partition's percpu-refcount, so that no
IO path can cache deleteing partition via .last_lookup.

It is one candidate approach of Yufen's patch[1] which adds overhead
in fast path by indirect lookup which may introduce one extra cacheline
in IO path. Also this patch relies on percpu-refcount's protection, and
it is easier to understand and verify.

[1] https://lore.kernel.org/linux-block/20200109013551.GB9655@ming.t460p/T/#t

Reported-by: Yufen Yu <yuyufen@huawei.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hou Tao <houtao1@huawei.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c        | 12 ------------
 block/genhd.c           | 15 ++++++++++++---
 block/partitions/core.c | 12 +++++++++++-
 3 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index ec50d7e6be21..826a8980997d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1363,18 +1363,6 @@ void blk_account_io_start(struct request *rq, bool new_io)
 		part_stat_inc(part, merges[rw]);
 	} else {
 		part = disk_map_sector_rcu(rq->rq_disk, blk_rq_pos(rq));
-		if (!hd_struct_try_get(part)) {
-			/*
-			 * The partition is already being removed,
-			 * the request will be accounted on the disk only
-			 *
-			 * We take a reference on disk->part0 although that
-			 * partition will never be deleted, so we can treat
-			 * it as any other partition.
-			 */
-			part = &rq->rq_disk->part0;
-			hd_struct_get(part);
-		}
 		part_inc_in_flight(rq->q, part, rw);
 		rq->part = part;
 	}
diff --git a/block/genhd.c b/block/genhd.c
index c05d509877fa..ec57d5d7a64d 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -344,11 +344,12 @@ static inline int sector_in_part(struct hd_struct *part, sector_t sector)
  * primarily used for stats accounting.
  *
  * CONTEXT:
- * RCU read locked.  The returned partition pointer is valid only
- * while preemption is disabled.
+ * RCU read locked.  The returned partition pointer is always valid
+ * because its refcount is grabbed.
  *
  * RETURNS:
  * Found partition on success, part0 is returned if no partition matches
+ * or the matched partition is being deleted.
  */
 struct hd_struct *disk_map_sector_rcu(struct gendisk *disk, sector_t sector)
 {
@@ -359,17 +360,25 @@ struct hd_struct *disk_map_sector_rcu(struct gendisk *disk, sector_t sector)
 	ptbl = rcu_dereference(disk->part_tbl);
 
 	part = rcu_dereference(ptbl->last_lookup);
-	if (part && sector_in_part(part, sector))
+	if (part && sector_in_part(part, sector) && hd_struct_try_get(part))
 		return part;
 
 	for (i = 1; i < ptbl->len; i++) {
 		part = rcu_dereference(ptbl->part[i]);
 
 		if (part && sector_in_part(part, sector)) {
+			/*
+			 * only live partition can be cached for lookup,
+			 * so use-after-free on cached & deleting partition
+			 * can be avoided
+			 */
+			if (!hd_struct_try_get(part))
+				break;
 			rcu_assign_pointer(ptbl->last_lookup, part);
 			return part;
 		}
 	}
+	hd_struct_get(&disk->part0);
 	return &disk->part0;
 }
 
diff --git a/block/partitions/core.c b/block/partitions/core.c
index c085bf85509b..f4000dac23ef 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -288,6 +288,12 @@ static void hd_struct_free_work(struct work_struct *work)
 static void hd_struct_free(struct percpu_ref *ref)
 {
 	struct hd_struct *part = container_of(ref, struct hd_struct, ref);
+	struct gendisk *disk = part_to_disk(part);
+	struct disk_part_tbl *ptbl =
+		rcu_dereference_protected(disk->part_tbl, 1);
+
+	rcu_assign_pointer(ptbl->last_lookup, NULL);
+	put_device(disk_to_dev(disk));
 
 	INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work);
 	queue_rcu_work(system_wq, &part->rcu_work);
@@ -309,8 +315,12 @@ void delete_partition(struct gendisk *disk, struct hd_struct *part)
 	struct disk_part_tbl *ptbl =
 		rcu_dereference_protected(disk->part_tbl, 1);
 
+	/*
+	 * ->part_tbl is referenced in this part's release handler, so
+	 *  we have to hold the disk device
+	 */
+	get_device(disk_to_dev(part_to_disk(part)));
 	rcu_assign_pointer(ptbl->part[part->partno], NULL);
-	rcu_assign_pointer(ptbl->last_lookup, NULL);
 	kobject_put(part->holder_dir);
 	device_del(part_to_dev(part));
 
-- 
2.25.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH V3 2/4] block: only define 'nr_sects_seq' in hd_part for 32bit SMP
  2020-05-08  8:17 [PATCH V3 0/4] block: fix partition use-after-free and optimization Ming Lei
  2020-05-08  8:17 ` [PATCH V3 1/4] block: fix use-after-free on cached last_lookup partition Ming Lei
@ 2020-05-08  8:17 ` Ming Lei
  2020-05-08  8:17 ` [PATCH V3 3/4] block: re-organize fields of 'struct hd_part' Ming Lei
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Ming Lei @ 2020-05-08  8:17 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, Yufen Yu, Christoph Hellwig, Hou Tao

The seqcount of 'nr_sects_seq' is only needed in case of 32bit SMP,
so define it just for 32bit SMP.

Cc: Yufen Yu <yuyufen@huawei.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hou Tao <houtao1@huawei.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/genhd.c           | 2 +-
 block/partitions/core.c | 2 +-
 include/linux/genhd.h   | 9 +++++++++
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index ec57d5d7a64d..bf8cbb033d64 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1729,7 +1729,7 @@ struct gendisk *__alloc_disk_node(int minors, int node_id)
 		 * TODO: Ideally set_capacity() and get_capacity() should be
 		 * converted to make use of bd_mutex and sequence counters.
 		 */
-		seqcount_init(&disk->part0.nr_sects_seq);
+		hd_sects_seq_init(&disk->part0);
 		if (hd_ref_init(&disk->part0)) {
 			hd_free_part(&disk->part0);
 			kfree(disk);
diff --git a/block/partitions/core.c b/block/partitions/core.c
index f4000dac23ef..ec81986b358e 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -392,7 +392,7 @@ static struct hd_struct *add_partition(struct gendisk *disk, int partno,
 		goto out_free;
 	}
 
-	seqcount_init(&p->nr_sects_seq);
+	hd_sects_seq_init(p);
 	pdev = part_to_dev(p);
 
 	p->start_sect = start;
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index f9c226f9546a..b4744035ae58 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -68,7 +68,9 @@ struct hd_struct {
 	 * can be non-atomic on 32bit machines with 64bit sector_t.
 	 */
 	sector_t nr_sects;
+#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
 	seqcount_t nr_sects_seq;
+#endif
 	sector_t alignment_offset;
 	unsigned int discard_alignment;
 	struct device __dev;
@@ -274,6 +276,13 @@ static inline void disk_put_part(struct hd_struct *part)
 		put_device(part_to_dev(part));
 }
 
+static inline void hd_sects_seq_init(struct hd_struct *p)
+{
+#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
+	seqcount_init(&p->nr_sects_seq);
+#endif
+}
+
 /*
  * Smarter partition iterator without context limits.
  */
-- 
2.25.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH V3 3/4] block: re-organize fields of 'struct hd_part'
  2020-05-08  8:17 [PATCH V3 0/4] block: fix partition use-after-free and optimization Ming Lei
  2020-05-08  8:17 ` [PATCH V3 1/4] block: fix use-after-free on cached last_lookup partition Ming Lei
  2020-05-08  8:17 ` [PATCH V3 2/4] block: only define 'nr_sects_seq' in hd_part for 32bit SMP Ming Lei
@ 2020-05-08  8:17 ` Ming Lei
  2020-05-08  8:17 ` [PATCH V3 4/4] block: don't hold part0's refcount in IO path Ming Lei
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Ming Lei @ 2020-05-08  8:17 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, Yufen Yu, Christoph Hellwig, Hou Tao

Put all fields accessed in IO path together at the beginning
of the struct, so that all can be fetched in single cacheline.

Cc: Yufen Yu <yuyufen@huawei.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hou Tao <houtao1@huawei.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 include/linux/genhd.h | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index b4744035ae58..a9384449465a 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -71,6 +71,14 @@ struct hd_struct {
 #if BITS_PER_LONG==32 && defined(CONFIG_SMP)
 	seqcount_t nr_sects_seq;
 #endif
+	unsigned long stamp;
+#ifdef	CONFIG_SMP
+	struct disk_stats __percpu *dkstats;
+#else
+	struct disk_stats dkstats;
+#endif
+	struct percpu_ref ref;
+
 	sector_t alignment_offset;
 	unsigned int discard_alignment;
 	struct device __dev;
@@ -80,13 +88,6 @@ struct hd_struct {
 #ifdef CONFIG_FAIL_MAKE_REQUEST
 	int make_it_fail;
 #endif
-	unsigned long stamp;
-#ifdef	CONFIG_SMP
-	struct disk_stats __percpu *dkstats;
-#else
-	struct disk_stats dkstats;
-#endif
-	struct percpu_ref ref;
 	struct rcu_work rcu_work;
 };
 
-- 
2.25.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH V3 4/4] block: don't hold part0's refcount in IO path
  2020-05-08  8:17 [PATCH V3 0/4] block: fix partition use-after-free and optimization Ming Lei
                   ` (2 preceding siblings ...)
  2020-05-08  8:17 ` [PATCH V3 3/4] block: re-organize fields of 'struct hd_part' Ming Lei
@ 2020-05-08  8:17 ` Ming Lei
  2020-05-12  2:44 ` [PATCH V3 0/4] block: fix partition use-after-free and optimization Ming Lei
  2020-05-13  2:32 ` Jens Axboe
  5 siblings, 0 replies; 7+ messages in thread
From: Ming Lei @ 2020-05-08  8:17 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Ming Lei, Yufen Yu, Christoph Hellwig, Hou Tao

gendisk can't be gone when there is IO activity, so not hold
part0's refcount in IO path.

Cc: Yufen Yu <yuyufen@huawei.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hou Tao <houtao1@huawei.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk.h   | 13 ++++++-------
 block/genhd.c |  4 ++--
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/block/blk.h b/block/blk.h
index 133fb0b99759..8efd1ca5c975 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -376,19 +376,18 @@ int bdev_resize_partition(struct block_device *bdev, int partno,
 int disk_expand_part_tbl(struct gendisk *disk, int target);
 int hd_ref_init(struct hd_struct *part);
 
-static inline void hd_struct_get(struct hd_struct *part)
-{
-	percpu_ref_get(&part->ref);
-}
-
+/* no need to get/put refcount of part0 */
 static inline int hd_struct_try_get(struct hd_struct *part)
 {
-	return percpu_ref_tryget_live(&part->ref);
+	if (part->partno)
+		return percpu_ref_tryget_live(&part->ref);
+	return 1;
 }
 
 static inline void hd_struct_put(struct hd_struct *part)
 {
-	percpu_ref_put(&part->ref);
+	if (part->partno)
+		percpu_ref_put(&part->ref);
 }
 
 static inline void hd_free_part(struct hd_struct *part)
diff --git a/block/genhd.c b/block/genhd.c
index bf8cbb033d64..d97b95d1a2fd 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -345,7 +345,8 @@ static inline int sector_in_part(struct hd_struct *part, sector_t sector)
  *
  * CONTEXT:
  * RCU read locked.  The returned partition pointer is always valid
- * because its refcount is grabbed.
+ * because its refcount is grabbed except for part0, which lifetime
+ * is same with the disk.
  *
  * RETURNS:
  * Found partition on success, part0 is returned if no partition matches
@@ -378,7 +379,6 @@ struct hd_struct *disk_map_sector_rcu(struct gendisk *disk, sector_t sector)
 			return part;
 		}
 	}
-	hd_struct_get(&disk->part0);
 	return &disk->part0;
 }
 
-- 
2.25.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH V3 0/4] block: fix partition use-after-free and optimization
  2020-05-08  8:17 [PATCH V3 0/4] block: fix partition use-after-free and optimization Ming Lei
                   ` (3 preceding siblings ...)
  2020-05-08  8:17 ` [PATCH V3 4/4] block: don't hold part0's refcount in IO path Ming Lei
@ 2020-05-12  2:44 ` Ming Lei
  2020-05-13  2:32 ` Jens Axboe
  5 siblings, 0 replies; 7+ messages in thread
From: Ming Lei @ 2020-05-12  2:44 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Yufen Yu, Christoph Hellwig, Hou Tao

On Fri, May 08, 2020 at 04:17:54PM +0800, Ming Lei wrote:
> Hi,
> 
> The 1st patch fixes one use-after-free on cached last_lookup partition.
> 
> The other 3 patches optimizes partition uses in IO path.
> 
> V3:
> 	- add reviewed-by tag
> 	- centralize partno check in the helper(4/4)
> 
> V2:
> 	- add comment, use part_to_disk() to retrieve disk instead of
> 	adding one field to hd_struct
> 	- don't put part in blk_account_io_merge
> 
> 
> Ming Lei (4):
>   block: fix use-after-free on cached last_lookup partition
>   block: only define 'nr_sects_seq' in hd_part for 32bit SMP
>   block: re-organize fields of 'struct hd_part'
>   block: don't hold part0's refcount in IO path
> 
>  block/blk-core.c        | 12 ------------
>  block/blk.h             | 13 ++++++-------
>  block/genhd.c           | 17 +++++++++++++----
>  block/partitions/core.c | 14 ++++++++++++--
>  include/linux/genhd.h   | 24 +++++++++++++++++-------
>  5 files changed, 48 insertions(+), 32 deletions(-)
> 
> Cc: Yufen Yu <yuyufen@huawei.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Hou Tao <houtao1@huawei.com>

Hello Jens,

Ping...

thanks,
Ming


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V3 0/4] block: fix partition use-after-free and optimization
  2020-05-08  8:17 [PATCH V3 0/4] block: fix partition use-after-free and optimization Ming Lei
                   ` (4 preceding siblings ...)
  2020-05-12  2:44 ` [PATCH V3 0/4] block: fix partition use-after-free and optimization Ming Lei
@ 2020-05-13  2:32 ` Jens Axboe
  5 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2020-05-13  2:32 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block, Yufen Yu, Christoph Hellwig, Hou Tao

On 5/8/20 2:17 AM, Ming Lei wrote:
> Hi,
> 
> The 1st patch fixes one use-after-free on cached last_lookup partition.
> 
> The other 3 patches optimizes partition uses in IO path.
> 
> V3:
> 	- add reviewed-by tag
> 	- centralize partno check in the helper(4/4)
> 
> V2:
> 	- add comment, use part_to_disk() to retrieve disk instead of
> 	adding one field to hd_struct
> 	- don't put part in blk_account_io_merge

Applied for 5.8, thanks.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-05-13  2:32 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-08  8:17 [PATCH V3 0/4] block: fix partition use-after-free and optimization Ming Lei
2020-05-08  8:17 ` [PATCH V3 1/4] block: fix use-after-free on cached last_lookup partition Ming Lei
2020-05-08  8:17 ` [PATCH V3 2/4] block: only define 'nr_sects_seq' in hd_part for 32bit SMP Ming Lei
2020-05-08  8:17 ` [PATCH V3 3/4] block: re-organize fields of 'struct hd_part' Ming Lei
2020-05-08  8:17 ` [PATCH V3 4/4] block: don't hold part0's refcount in IO path Ming Lei
2020-05-12  2:44 ` [PATCH V3 0/4] block: fix partition use-after-free and optimization Ming Lei
2020-05-13  2:32 ` Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.