* [PATCH 0/4] block: fix partition use-after-free and optimization @ 2020-05-07 8:52 Ming Lei 2020-05-07 8:52 ` [PATCH 1/4] block: fix use-after-free on cached last_lookup partition Ming Lei ` (3 more replies) 0 siblings, 4 replies; 15+ messages in thread From: Ming Lei @ 2020-05-07 8:52 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-block, Ming Lei, Yufen Yu, Christoph Hellwig, Hou Tao Hi, The 1st patch fixes one use-after-free on cached last_lookup partition. The other 3 patches optimizes partition uses in IO path. Ming Lei (4): block: fix use-after-free on cached last_lookup partition block: only define 'nr_sects_seq' in hd_part for 32bit SMP block: re-organize fields of 'struct hd_part' block: don't hold part0's refcount in IO path block/blk-core.c | 15 ++------------- block/genhd.c | 7 +++++-- block/partitions/core.c | 14 ++++++++++++-- include/linux/genhd.h | 25 ++++++++++++++++++------- 4 files changed, 37 insertions(+), 24 deletions(-) Cc: Yufen Yu <yuyufen@huawei.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hou Tao <houtao1@huawei.com> -- 2.25.2 ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/4] block: fix use-after-free on cached last_lookup partition 2020-05-07 8:52 [PATCH 0/4] block: fix partition use-after-free and optimization Ming Lei @ 2020-05-07 8:52 ` Ming Lei 2020-05-07 14:16 ` Christoph Hellwig 2020-05-07 8:52 ` [PATCH 2/4] block: only define 'nr_sects_seq' in hd_part for 32bit SMP Ming Lei ` (2 subsequent siblings) 3 siblings, 1 reply; 15+ messages in thread From: Ming Lei @ 2020-05-07 8:52 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-block, Ming Lei, Yufen Yu, Christoph Hellwig, Hou Tao delete_partition() clears the cached last_lookup partition. However the .last_lookup cache may be overwritten by one IO path after it is cleared from delete_partition(). Then another IO path may use the cached deleting partition after hd_struct_free() is called, then use-after-free is triggered on the cached partition. Fixes the issue by the following approach: 1) always get the partition's refcount via hd_struct_try_get() before setting .last_lookup 2) move clearing .last_lookup from delete_partition() to hd_struct_free() which is the release handle of the partition's percpu-refcount, so that no IO path can overwrite .last_lookup after it is cleared in hd_struct_free(). It is one candidate approach of Yufen's patch[1] which adds overhead in fast path by indirect lookup which may introduce one extra cacheline in IO path. Also this patch relies on percpu-refcount's protection, and it is easier to understand and verify. [1] https://lore.kernel.org/linux-block/20200109013551.GB9655@ming.t460p/T/#t Reported-by: Yufen Yu <yuyufen@huawei.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hou Tao <houtao1@huawei.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> --- block/blk-core.c | 12 ------------ block/genhd.c | 6 +++++- block/partitions/core.c | 12 +++++++++++- include/linux/genhd.h | 1 + 4 files changed, 17 insertions(+), 14 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index ec50d7e6be21..826a8980997d 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1363,18 +1363,6 @@ void blk_account_io_start(struct request *rq, bool new_io) part_stat_inc(part, merges[rw]); } else { part = disk_map_sector_rcu(rq->rq_disk, blk_rq_pos(rq)); - if (!hd_struct_try_get(part)) { - /* - * The partition is already being removed, - * the request will be accounted on the disk only - * - * We take a reference on disk->part0 although that - * partition will never be deleted, so we can treat - * it as any other partition. - */ - part = &rq->rq_disk->part0; - hd_struct_get(part); - } part_inc_in_flight(rq->q, part, rw); rq->part = part; } diff --git a/block/genhd.c b/block/genhd.c index c05d509877fa..8b33f0e54356 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -359,17 +359,21 @@ struct hd_struct *disk_map_sector_rcu(struct gendisk *disk, sector_t sector) ptbl = rcu_dereference(disk->part_tbl); part = rcu_dereference(ptbl->last_lookup); - if (part && sector_in_part(part, sector)) + if (part && sector_in_part(part, sector) && hd_struct_try_get(part)) return part; for (i = 1; i < ptbl->len; i++) { part = rcu_dereference(ptbl->part[i]); if (part && sector_in_part(part, sector)) { + if (!hd_struct_try_get(part)) + goto exit; rcu_assign_pointer(ptbl->last_lookup, part); return part; } } + exit: + hd_struct_get(&disk->part0); return &disk->part0; } diff --git a/block/partitions/core.c b/block/partitions/core.c index c085bf85509b..b6cbc9b98426 100644 --- a/block/partitions/core.c +++ b/block/partitions/core.c @@ -288,6 +288,11 @@ static void hd_struct_free_work(struct work_struct *work) static void hd_struct_free(struct percpu_ref *ref) { struct hd_struct *part = container_of(ref, struct hd_struct, ref); + struct disk_part_tbl *ptbl = + rcu_dereference_protected(part->disk->part_tbl, 1); + + rcu_assign_pointer(ptbl->last_lookup, NULL); + put_device(disk_to_dev(part->disk)); INIT_RCU_WORK(&part->rcu_work, hd_struct_free_work); queue_rcu_work(system_wq, &part->rcu_work); @@ -309,8 +314,12 @@ void delete_partition(struct gendisk *disk, struct hd_struct *part) struct disk_part_tbl *ptbl = rcu_dereference_protected(disk->part_tbl, 1); + /* + * make sure disk won't be gone because ->part_tbl is referenced in + * this part's release handler + */ + get_device(disk_to_dev(disk)); rcu_assign_pointer(ptbl->part[part->partno], NULL); - rcu_assign_pointer(ptbl->last_lookup, NULL); kobject_put(part->holder_dir); device_del(part_to_dev(part)); @@ -393,6 +402,7 @@ static struct hd_struct *add_partition(struct gendisk *disk, int partno, p->nr_sects = len; p->partno = partno; p->policy = get_disk_ro(disk); + p->disk = disk; if (info) { struct partition_meta_info *pinfo; diff --git a/include/linux/genhd.h b/include/linux/genhd.h index f9c226f9546a..c28d1d9bfa72 100644 --- a/include/linux/genhd.h +++ b/include/linux/genhd.h @@ -85,6 +85,7 @@ struct hd_struct { struct disk_stats dkstats; #endif struct percpu_ref ref; + struct gendisk *disk; struct rcu_work rcu_work; }; -- 2.25.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 1/4] block: fix use-after-free on cached last_lookup partition 2020-05-07 8:52 ` [PATCH 1/4] block: fix use-after-free on cached last_lookup partition Ming Lei @ 2020-05-07 14:16 ` Christoph Hellwig 2020-05-08 1:54 ` Ming Lei 0 siblings, 1 reply; 15+ messages in thread From: Christoph Hellwig @ 2020-05-07 14:16 UTC (permalink / raw) To: Ming Lei; +Cc: Jens Axboe, linux-block, Yufen Yu, Christoph Hellwig, Hou Tao On Thu, May 07, 2020 at 04:52:36PM +0800, Ming Lei wrote: > for (i = 1; i < ptbl->len; i++) { > part = rcu_dereference(ptbl->part[i]); > > if (part && sector_in_part(part, sector)) { > + if (!hd_struct_try_get(part)) > + goto exit; I think this needs the comment that was in blk_account_io_start to explain the logic. Also no need for the goto, this can just be a break. > - rcu_assign_pointer(ptbl->last_lookup, NULL); > kobject_put(part->holder_dir); > device_del(part_to_dev(part)); > > @@ -393,6 +402,7 @@ static struct hd_struct *add_partition(struct gendisk *disk, int partno, > p->nr_sects = len; > p->partno = partno; > p->policy = get_disk_ro(disk); > + p->disk = disk; Do we really need this pointer? Can't we use part_to_disk in the free path for some reason? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/4] block: fix use-after-free on cached last_lookup partition 2020-05-07 14:16 ` Christoph Hellwig @ 2020-05-08 1:54 ` Ming Lei 0 siblings, 0 replies; 15+ messages in thread From: Ming Lei @ 2020-05-08 1:54 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, Yufen Yu, Hou Tao On Thu, May 07, 2020 at 07:16:26AM -0700, Christoph Hellwig wrote: > On Thu, May 07, 2020 at 04:52:36PM +0800, Ming Lei wrote: > > for (i = 1; i < ptbl->len; i++) { > > part = rcu_dereference(ptbl->part[i]); > > > > if (part && sector_in_part(part, sector)) { > > + if (!hd_struct_try_get(part)) > > + goto exit; > > I think this needs the comment that was in blk_account_io_start to > explain the logic. Also no need for the goto, this can just be a break. OK, not did it because I thought the logic is straightforward. > > > - rcu_assign_pointer(ptbl->last_lookup, NULL); > > kobject_put(part->holder_dir); > > device_del(part_to_dev(part)); > > > > @@ -393,6 +402,7 @@ static struct hd_struct *add_partition(struct gendisk *disk, int partno, > > p->nr_sects = len; > > p->partno = partno; > > p->policy = get_disk_ro(disk); > > + p->disk = disk; > > Do we really need this pointer? Can't we use part_to_disk in the > free path for some reason? Looks I did miss this function, :-) Thanks, Ming ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/4] block: only define 'nr_sects_seq' in hd_part for 32bit SMP 2020-05-07 8:52 [PATCH 0/4] block: fix partition use-after-free and optimization Ming Lei 2020-05-07 8:52 ` [PATCH 1/4] block: fix use-after-free on cached last_lookup partition Ming Lei @ 2020-05-07 8:52 ` Ming Lei 2020-05-07 14:16 ` Christoph Hellwig 2020-05-07 8:52 ` [PATCH 3/4] block: re-organize fields of 'struct hd_part' Ming Lei 2020-05-07 8:52 ` [PATCH 4/4] block: don't hold part0's refcount in IO path Ming Lei 3 siblings, 1 reply; 15+ messages in thread From: Ming Lei @ 2020-05-07 8:52 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-block, Ming Lei, Yufen Yu, Christoph Hellwig, Hou Tao The seqcount of 'nr_sects_seq' is only needed in case of 32bit SMP, so define it just for 32bit SMP. Cc: Yufen Yu <yuyufen@huawei.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hou Tao <houtao1@huawei.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> --- block/genhd.c | 2 +- block/partitions/core.c | 2 +- include/linux/genhd.h | 9 +++++++++ 3 files changed, 11 insertions(+), 2 deletions(-) diff --git a/block/genhd.c b/block/genhd.c index 8b33f0e54356..c0288b89a7ad 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -1724,7 +1724,7 @@ struct gendisk *__alloc_disk_node(int minors, int node_id) * TODO: Ideally set_capacity() and get_capacity() should be * converted to make use of bd_mutex and sequence counters. */ - seqcount_init(&disk->part0.nr_sects_seq); + hd_sects_seq_init(&disk->part0); if (hd_ref_init(&disk->part0)) { hd_free_part(&disk->part0); kfree(disk); diff --git a/block/partitions/core.c b/block/partitions/core.c index b6cbc9b98426..efccb1be4c5c 100644 --- a/block/partitions/core.c +++ b/block/partitions/core.c @@ -391,7 +391,7 @@ static struct hd_struct *add_partition(struct gendisk *disk, int partno, goto out_free; } - seqcount_init(&p->nr_sects_seq); + hd_sects_seq_init(p); pdev = part_to_dev(p); p->start_sect = start; diff --git a/include/linux/genhd.h b/include/linux/genhd.h index c28d1d9bfa72..2732120751e8 100644 --- a/include/linux/genhd.h +++ b/include/linux/genhd.h @@ -68,7 +68,9 @@ struct hd_struct { * can be non-atomic on 32bit machines with 64bit sector_t. */ sector_t nr_sects; +#if BITS_PER_LONG==32 && defined(CONFIG_SMP) seqcount_t nr_sects_seq; +#endif sector_t alignment_offset; unsigned int discard_alignment; struct device __dev; @@ -275,6 +277,13 @@ static inline void disk_put_part(struct hd_struct *part) put_device(part_to_dev(part)); } +static inline void hd_sects_seq_init(struct hd_struct *p) +{ +#if BITS_PER_LONG==32 && defined(CONFIG_SMP) + seqcount_init(&p->nr_sects_seq); +#endif +} + /* * Smarter partition iterator without context limits. */ -- 2.25.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 2/4] block: only define 'nr_sects_seq' in hd_part for 32bit SMP 2020-05-07 8:52 ` [PATCH 2/4] block: only define 'nr_sects_seq' in hd_part for 32bit SMP Ming Lei @ 2020-05-07 14:16 ` Christoph Hellwig 0 siblings, 0 replies; 15+ messages in thread From: Christoph Hellwig @ 2020-05-07 14:16 UTC (permalink / raw) To: Ming Lei; +Cc: Jens Axboe, linux-block, Yufen Yu, Christoph Hellwig, Hou Tao On Thu, May 07, 2020 at 04:52:37PM +0800, Ming Lei wrote: > The seqcount of 'nr_sects_seq' is only needed in case of 32bit SMP, > so define it just for 32bit SMP. Looks good, Reviewed-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 3/4] block: re-organize fields of 'struct hd_part' 2020-05-07 8:52 [PATCH 0/4] block: fix partition use-after-free and optimization Ming Lei 2020-05-07 8:52 ` [PATCH 1/4] block: fix use-after-free on cached last_lookup partition Ming Lei 2020-05-07 8:52 ` [PATCH 2/4] block: only define 'nr_sects_seq' in hd_part for 32bit SMP Ming Lei @ 2020-05-07 8:52 ` Ming Lei 2020-05-07 14:16 ` Christoph Hellwig 2020-05-07 8:52 ` [PATCH 4/4] block: don't hold part0's refcount in IO path Ming Lei 3 siblings, 1 reply; 15+ messages in thread From: Ming Lei @ 2020-05-07 8:52 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-block, Ming Lei, Yufen Yu, Christoph Hellwig, Hou Tao Put all fields accessed in IO path together at the beginning of the struct, so that all can be fetched in single cacheline. Cc: Yufen Yu <yuyufen@huawei.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hou Tao <houtao1@huawei.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> --- include/linux/genhd.h | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/include/linux/genhd.h b/include/linux/genhd.h index 2732120751e8..e0fab519ce37 100644 --- a/include/linux/genhd.h +++ b/include/linux/genhd.h @@ -71,6 +71,14 @@ struct hd_struct { #if BITS_PER_LONG==32 && defined(CONFIG_SMP) seqcount_t nr_sects_seq; #endif + unsigned long stamp; +#ifdef CONFIG_SMP + struct disk_stats __percpu *dkstats; +#else + struct disk_stats dkstats; +#endif + struct percpu_ref ref; + sector_t alignment_offset; unsigned int discard_alignment; struct device __dev; @@ -80,13 +88,6 @@ struct hd_struct { #ifdef CONFIG_FAIL_MAKE_REQUEST int make_it_fail; #endif - unsigned long stamp; -#ifdef CONFIG_SMP - struct disk_stats __percpu *dkstats; -#else - struct disk_stats dkstats; -#endif - struct percpu_ref ref; struct gendisk *disk; struct rcu_work rcu_work; }; -- 2.25.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 3/4] block: re-organize fields of 'struct hd_part' 2020-05-07 8:52 ` [PATCH 3/4] block: re-organize fields of 'struct hd_part' Ming Lei @ 2020-05-07 14:16 ` Christoph Hellwig 0 siblings, 0 replies; 15+ messages in thread From: Christoph Hellwig @ 2020-05-07 14:16 UTC (permalink / raw) To: Ming Lei; +Cc: Jens Axboe, linux-block, Yufen Yu, Christoph Hellwig, Hou Tao On Thu, May 07, 2020 at 04:52:38PM +0800, Ming Lei wrote: > Put all fields accessed in IO path together at the beginning > of the struct, so that all can be fetched in single cacheline. Looks good, Reviewed-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 4/4] block: don't hold part0's refcount in IO path 2020-05-07 8:52 [PATCH 0/4] block: fix partition use-after-free and optimization Ming Lei ` (2 preceding siblings ...) 2020-05-07 8:52 ` [PATCH 3/4] block: re-organize fields of 'struct hd_part' Ming Lei @ 2020-05-07 8:52 ` Ming Lei 2020-05-07 14:20 ` Christoph Hellwig 3 siblings, 1 reply; 15+ messages in thread From: Ming Lei @ 2020-05-07 8:52 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-block, Ming Lei, Yufen Yu, Christoph Hellwig, Hou Tao gendisk can't be gone when there is IO activity, so not hold part0's refcount in IO path. Cc: Yufen Yu <yuyufen@huawei.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hou Tao <houtao1@huawei.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> --- block/blk-core.c | 3 ++- block/genhd.c | 1 - 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 826a8980997d..56cc61354671 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1343,7 +1343,8 @@ void blk_account_io_done(struct request *req, u64 now) part_stat_add(part, nsecs[sgrp], now - req->start_time_ns); part_dec_in_flight(req->q, part, rq_data_dir(req)); - hd_struct_put(part); + if (part->partno) + hd_struct_put(part); part_stat_unlock(); } } diff --git a/block/genhd.c b/block/genhd.c index c0288b89a7ad..af8641351510 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -373,7 +373,6 @@ struct hd_struct *disk_map_sector_rcu(struct gendisk *disk, sector_t sector) } } exit: - hd_struct_get(&disk->part0); return &disk->part0; } -- 2.25.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 4/4] block: don't hold part0's refcount in IO path 2020-05-07 8:52 ` [PATCH 4/4] block: don't hold part0's refcount in IO path Ming Lei @ 2020-05-07 14:20 ` Christoph Hellwig 2020-05-08 1:59 ` Ming Lei 0 siblings, 1 reply; 15+ messages in thread From: Christoph Hellwig @ 2020-05-07 14:20 UTC (permalink / raw) To: Ming Lei; +Cc: Jens Axboe, linux-block, Yufen Yu, Christoph Hellwig, Hou Tao On Thu, May 07, 2020 at 04:52:39PM +0800, Ming Lei wrote: > gendisk can't be gone when there is IO activity, so not hold > part0's refcount in IO path. > > Cc: Yufen Yu <yuyufen@huawei.com> > Cc: Christoph Hellwig <hch@infradead.org> > Cc: Hou Tao <houtao1@huawei.com> > Signed-off-by: Ming Lei <ming.lei@redhat.com> > --- > block/blk-core.c | 3 ++- > block/genhd.c | 1 - > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/block/blk-core.c b/block/blk-core.c > index 826a8980997d..56cc61354671 100644 > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -1343,7 +1343,8 @@ void blk_account_io_done(struct request *req, u64 now) > part_stat_add(part, nsecs[sgrp], now - req->start_time_ns); > part_dec_in_flight(req->q, part, rq_data_dir(req)); > > - hd_struct_put(part); > + if (part->partno) > + hd_struct_put(part); > part_stat_unlock(); Doesn't blk_account_io_merge needs the check as well? Maybe it should go into hd_struct_put and the other helpers to be centralized? Otherwise this looks fine to me. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4/4] block: don't hold part0's refcount in IO path 2020-05-07 14:20 ` Christoph Hellwig @ 2020-05-08 1:59 ` Ming Lei 0 siblings, 0 replies; 15+ messages in thread From: Ming Lei @ 2020-05-08 1:59 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, Yufen Yu, Hou Tao On Thu, May 07, 2020 at 07:20:50AM -0700, Christoph Hellwig wrote: > On Thu, May 07, 2020 at 04:52:39PM +0800, Ming Lei wrote: > > gendisk can't be gone when there is IO activity, so not hold > > part0's refcount in IO path. > > > > Cc: Yufen Yu <yuyufen@huawei.com> > > Cc: Christoph Hellwig <hch@infradead.org> > > Cc: Hou Tao <houtao1@huawei.com> > > Signed-off-by: Ming Lei <ming.lei@redhat.com> > > --- > > block/blk-core.c | 3 ++- > > block/genhd.c | 1 - > > 2 files changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/block/blk-core.c b/block/blk-core.c > > index 826a8980997d..56cc61354671 100644 > > --- a/block/blk-core.c > > +++ b/block/blk-core.c > > @@ -1343,7 +1343,8 @@ void blk_account_io_done(struct request *req, u64 now) > > part_stat_add(part, nsecs[sgrp], now - req->start_time_ns); > > part_dec_in_flight(req->q, part, rq_data_dir(req)); > > > > - hd_struct_put(part); > > + if (part->partno) > > + hd_struct_put(part); > > part_stat_unlock(); > > Doesn't blk_account_io_merge needs the check as well? You are right, blk_account_io_merge needs it too. > > Maybe it should go into hd_struct_put and the other helpers to be > centralized? I think that is doable. Thanks, Ming ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 0/4] block: fix partition use-after-free and optimization @ 2020-01-09 6:21 Ming Lei 2020-01-09 6:21 ` [PATCH 1/4] block: fix use-after-free on cached last_lookup partition Ming Lei 0 siblings, 1 reply; 15+ messages in thread From: Ming Lei @ 2020-01-09 6:21 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-block, Ming Lei, Yufen Yu, Christoph Hellwig, Hou Tao Hi, The 1st patch fixes one use-after-free on cached last_lookup partition. The other 3 patches optimizes partition uses in IO path. Ming Lei (4): block: fix use-after-free on cached last_lookup partition block: only define 'nr_sects_seq' in hd_part for 32bit SMP block: re-organize fields of 'struct hd_part' block: don't hold part0's refcount in IO path block/blk-core.c | 15 ++------------- block/genhd.c | 7 +++++-- block/partition-generic.c | 12 ++++++++++-- include/linux/genhd.h | 25 ++++++++++++++++++------- 4 files changed, 35 insertions(+), 24 deletions(-) Cc: Yufen Yu <yuyufen@huawei.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hou Tao <houtao1@huawei.com> -- 2.20.1 ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/4] block: fix use-after-free on cached last_lookup partition 2020-01-09 6:21 [PATCH 0/4] block: fix partition use-after-free and optimization Ming Lei @ 2020-01-09 6:21 ` Ming Lei 2020-02-21 3:05 ` Yufen Yu 2020-02-21 4:03 ` Bart Van Assche 0 siblings, 2 replies; 15+ messages in thread From: Ming Lei @ 2020-01-09 6:21 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-block, Ming Lei, Yufen Yu, Christoph Hellwig, Hou Tao delete_partition() clears the cached last_lookup partition. However the .last_lookup cache may be overwritten by one IO path after it is cleared from delete_partition(). Then another IO path may use the cached deleting partition after __delete_partition() is called, then use-after-free is triggered on the cached partition. Fixes the issue by the following approach: 1) always get the partition's refcount via hd_struct_try_get() before setting .last_lookup 2) move clearing .last_lookup from delete_partition() to __delete_partition() which is release handle of the partition's percpu-refcount, so that no IO path can overwrite .last_lookup after it is cleared in __delete_partition(). It is one candidate approach of Yufen's patch[1] which adds overhead in fast path by indirect lookup which may introduce one extra cacheline in IO path. Also this patch relies on percpu-refcount's protection, and it is easier to understand and verify. [1] https://lore.kernel.org/linux-block/20200109013551.GB9655@ming.t460p/T/#t Reported-by: Yufen Yu <yuyufen@huawei.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hou Tao <houtao1@huawei.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> --- block/blk-core.c | 12 ------------ block/genhd.c | 6 +++++- block/partition-generic.c | 10 +++++++++- include/linux/genhd.h | 1 + 4 files changed, 15 insertions(+), 14 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 089e890ab208..79599f5fd5b7 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1365,18 +1365,6 @@ void blk_account_io_start(struct request *rq, bool new_io) part_stat_inc(part, merges[rw]); } else { part = disk_map_sector_rcu(rq->rq_disk, blk_rq_pos(rq)); - if (!hd_struct_try_get(part)) { - /* - * The partition is already being removed, - * the request will be accounted on the disk only - * - * We take a reference on disk->part0 although that - * partition will never be deleted, so we can treat - * it as any other partition. - */ - part = &rq->rq_disk->part0; - hd_struct_get(part); - } part_inc_in_flight(rq->q, part, rw); rq->part = part; } diff --git a/block/genhd.c b/block/genhd.c index ff6268970ddc..6029c94510f0 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -286,17 +286,21 @@ struct hd_struct *disk_map_sector_rcu(struct gendisk *disk, sector_t sector) ptbl = rcu_dereference(disk->part_tbl); part = rcu_dereference(ptbl->last_lookup); - if (part && sector_in_part(part, sector)) + if (part && sector_in_part(part, sector) && hd_struct_try_get(part)) return part; for (i = 1; i < ptbl->len; i++) { part = rcu_dereference(ptbl->part[i]); if (part && sector_in_part(part, sector)) { + if (!hd_struct_try_get(part)) + goto exit; rcu_assign_pointer(ptbl->last_lookup, part); return part; } } + exit: + hd_struct_get(&disk->part0); return &disk->part0; } EXPORT_SYMBOL_GPL(disk_map_sector_rcu); diff --git a/block/partition-generic.c b/block/partition-generic.c index 1d20c9cf213f..1739f750dbf2 100644 --- a/block/partition-generic.c +++ b/block/partition-generic.c @@ -262,6 +262,12 @@ static void delete_partition_work_fn(struct work_struct *work) void __delete_partition(struct percpu_ref *ref) { struct hd_struct *part = container_of(ref, struct hd_struct, ref); + struct disk_part_tbl *ptbl = + rcu_dereference_protected(part->disk->part_tbl, 1); + + rcu_assign_pointer(ptbl->last_lookup, NULL); + put_device(disk_to_dev(part->disk)); + INIT_RCU_WORK(&part->rcu_work, delete_partition_work_fn); queue_rcu_work(system_wq, &part->rcu_work); } @@ -283,8 +289,9 @@ void delete_partition(struct gendisk *disk, int partno) if (!part) return; + get_device(disk_to_dev(disk)); rcu_assign_pointer(ptbl->part[partno], NULL); - rcu_assign_pointer(ptbl->last_lookup, NULL); + kobject_put(part->holder_dir); device_del(part_to_dev(part)); @@ -349,6 +356,7 @@ struct hd_struct *add_partition(struct gendisk *disk, int partno, p->nr_sects = len; p->partno = partno; p->policy = get_disk_ro(disk); + p->disk = disk; if (info) { struct partition_meta_info *pinfo = alloc_part_info(disk); diff --git a/include/linux/genhd.h b/include/linux/genhd.h index 8bb63027e4d6..1b09cfe00aa3 100644 --- a/include/linux/genhd.h +++ b/include/linux/genhd.h @@ -130,6 +130,7 @@ struct hd_struct { struct disk_stats dkstats; #endif struct percpu_ref ref; + struct gendisk *disk; struct rcu_work rcu_work; }; -- 2.20.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 1/4] block: fix use-after-free on cached last_lookup partition 2020-01-09 6:21 ` [PATCH 1/4] block: fix use-after-free on cached last_lookup partition Ming Lei @ 2020-02-21 3:05 ` Yufen Yu 2020-02-21 4:03 ` Bart Van Assche 1 sibling, 0 replies; 15+ messages in thread From: Yufen Yu @ 2020-02-21 3:05 UTC (permalink / raw) To: Ming Lei, Jens Axboe; +Cc: linux-block, Christoph Hellwig, Hou Tao Hi Guys, Did this patch have been forgotten? So, ping... Thanks, Yufen On 2020/1/9 14:21, Ming Lei wrote: > delete_partition() clears the cached last_lookup partition. However > the .last_lookup cache may be overwritten by one IO path after > it is cleared from delete_partition(). Then another IO path may > use the cached deleting partition after __delete_partition() is > called, then use-after-free is triggered on the cached partition. > > Fixes the issue by the following approach: > > 1) always get the partition's refcount via hd_struct_try_get() before > setting .last_lookup > > 2) move clearing .last_lookup from delete_partition() to > __delete_partition() which is release handle of the partition's > percpu-refcount, so that no IO path can overwrite .last_lookup after it > is cleared in __delete_partition(). > > It is one candidate approach of Yufen's patch[1] which adds overhead > in fast path by indirect lookup which may introduce one extra cacheline > in IO path. Also this patch relies on percpu-refcount's protection, and > it is easier to understand and verify. > > [1] https://lore.kernel.org/linux-block/20200109013551.GB9655@ming.t460p/T/#t > > Reported-by: Yufen Yu <yuyufen@huawei.com> > Cc: Christoph Hellwig <hch@infradead.org> > Cc: Hou Tao <houtao1@huawei.com> > Signed-off-by: Ming Lei <ming.lei@redhat.com> > --- > block/blk-core.c | 12 ------------ > block/genhd.c | 6 +++++- > block/partition-generic.c | 10 +++++++++- > include/linux/genhd.h | 1 + > 4 files changed, 15 insertions(+), 14 deletions(-) > > diff --git a/block/blk-core.c b/block/blk-core.c > index 089e890ab208..79599f5fd5b7 100644 > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -1365,18 +1365,6 @@ void blk_account_io_start(struct request *rq, bool new_io) > part_stat_inc(part, merges[rw]); > } else { > part = disk_map_sector_rcu(rq->rq_disk, blk_rq_pos(rq)); > - if (!hd_struct_try_get(part)) { > - /* > - * The partition is already being removed, > - * the request will be accounted on the disk only > - * > - * We take a reference on disk->part0 although that > - * partition will never be deleted, so we can treat > - * it as any other partition. > - */ > - part = &rq->rq_disk->part0; > - hd_struct_get(part); > - } > part_inc_in_flight(rq->q, part, rw); > rq->part = part; > } > diff --git a/block/genhd.c b/block/genhd.c > index ff6268970ddc..6029c94510f0 100644 > --- a/block/genhd.c > +++ b/block/genhd.c > @@ -286,17 +286,21 @@ struct hd_struct *disk_map_sector_rcu(struct gendisk *disk, sector_t sector) > ptbl = rcu_dereference(disk->part_tbl); > > part = rcu_dereference(ptbl->last_lookup); > - if (part && sector_in_part(part, sector)) > + if (part && sector_in_part(part, sector) && hd_struct_try_get(part)) > return part; > > for (i = 1; i < ptbl->len; i++) { > part = rcu_dereference(ptbl->part[i]); > > if (part && sector_in_part(part, sector)) { > + if (!hd_struct_try_get(part)) > + goto exit; > rcu_assign_pointer(ptbl->last_lookup, part); > return part; > } > } > + exit: > + hd_struct_get(&disk->part0); > return &disk->part0; > } > EXPORT_SYMBOL_GPL(disk_map_sector_rcu); > diff --git a/block/partition-generic.c b/block/partition-generic.c > index 1d20c9cf213f..1739f750dbf2 100644 > --- a/block/partition-generic.c > +++ b/block/partition-generic.c > @@ -262,6 +262,12 @@ static void delete_partition_work_fn(struct work_struct *work) > void __delete_partition(struct percpu_ref *ref) > { > struct hd_struct *part = container_of(ref, struct hd_struct, ref); > + struct disk_part_tbl *ptbl = > + rcu_dereference_protected(part->disk->part_tbl, 1); > + > + rcu_assign_pointer(ptbl->last_lookup, NULL); > + put_device(disk_to_dev(part->disk)); > + > INIT_RCU_WORK(&part->rcu_work, delete_partition_work_fn); > queue_rcu_work(system_wq, &part->rcu_work); > } > @@ -283,8 +289,9 @@ void delete_partition(struct gendisk *disk, int partno) > if (!part) > return; > > + get_device(disk_to_dev(disk)); > rcu_assign_pointer(ptbl->part[partno], NULL); > - rcu_assign_pointer(ptbl->last_lookup, NULL); > + > kobject_put(part->holder_dir); > device_del(part_to_dev(part)); > > @@ -349,6 +356,7 @@ struct hd_struct *add_partition(struct gendisk *disk, int partno, > p->nr_sects = len; > p->partno = partno; > p->policy = get_disk_ro(disk); > + p->disk = disk; > > if (info) { > struct partition_meta_info *pinfo = alloc_part_info(disk); > diff --git a/include/linux/genhd.h b/include/linux/genhd.h > index 8bb63027e4d6..1b09cfe00aa3 100644 > --- a/include/linux/genhd.h > +++ b/include/linux/genhd.h > @@ -130,6 +130,7 @@ struct hd_struct { > struct disk_stats dkstats; > #endif > struct percpu_ref ref; > + struct gendisk *disk; > struct rcu_work rcu_work; > }; > > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/4] block: fix use-after-free on cached last_lookup partition 2020-01-09 6:21 ` [PATCH 1/4] block: fix use-after-free on cached last_lookup partition Ming Lei 2020-02-21 3:05 ` Yufen Yu @ 2020-02-21 4:03 ` Bart Van Assche 2020-02-21 7:47 ` Ming Lei 1 sibling, 1 reply; 15+ messages in thread From: Bart Van Assche @ 2020-02-21 4:03 UTC (permalink / raw) To: Ming Lei, Jens Axboe; +Cc: linux-block, Yufen Yu, Christoph Hellwig, Hou Tao On 2020-01-08 22:21, Ming Lei wrote: > delete_partition() clears the cached last_lookup partition. However > the .last_lookup cache may be overwritten by one IO path after > it is cleared from delete_partition(). Then another IO path may > use the cached deleting partition after __delete_partition() is > called, then use-after-free is triggered on the cached partition. > > Fixes the issue by the following approach: > > 1) always get the partition's refcount via hd_struct_try_get() before > setting .last_lookup > > 2) move clearing .last_lookup from delete_partition() to > __delete_partition() which is release handle of the partition's > percpu-refcount, so that no IO path can overwrite .last_lookup after it > is cleared in __delete_partition(). > > It is one candidate approach of Yufen's patch[1] which adds overhead > in fast path by indirect lookup which may introduce one extra cacheline > in IO path. Also this patch relies on percpu-refcount's protection, and > it is easier to understand and verify. > > [1] https://lore.kernel.org/linux-block/20200109013551.GB9655@ming.t460p/T/#t Hi Ming, disk_map_sector_rcu() is called from the I/O path only and hence with q->q_usage_counter > 0. Has it been considered to freeze disk->queue from delete_partition() before deleting a partition and unfreezing disk->queue after partition deletion has finished? Would that approach allow to eliminate partition reference counting and thereby improve the performance of the hot path? Thanks, Bart. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/4] block: fix use-after-free on cached last_lookup partition 2020-02-21 4:03 ` Bart Van Assche @ 2020-02-21 7:47 ` Ming Lei 0 siblings, 0 replies; 15+ messages in thread From: Ming Lei @ 2020-02-21 7:47 UTC (permalink / raw) To: Bart Van Assche Cc: Jens Axboe, linux-block, Yufen Yu, Christoph Hellwig, Hou Tao On Thu, Feb 20, 2020 at 08:03:52PM -0800, Bart Van Assche wrote: > On 2020-01-08 22:21, Ming Lei wrote: > > delete_partition() clears the cached last_lookup partition. However > > the .last_lookup cache may be overwritten by one IO path after > > it is cleared from delete_partition(). Then another IO path may > > use the cached deleting partition after __delete_partition() is > > called, then use-after-free is triggered on the cached partition. > > > > Fixes the issue by the following approach: > > > > 1) always get the partition's refcount via hd_struct_try_get() before > > setting .last_lookup > > > > 2) move clearing .last_lookup from delete_partition() to > > __delete_partition() which is release handle of the partition's > > percpu-refcount, so that no IO path can overwrite .last_lookup after it > > is cleared in __delete_partition(). > > > > It is one candidate approach of Yufen's patch[1] which adds overhead > > in fast path by indirect lookup which may introduce one extra cacheline > > in IO path. Also this patch relies on percpu-refcount's protection, and > > it is easier to understand and verify. > > > > [1] https://lore.kernel.org/linux-block/20200109013551.GB9655@ming.t460p/T/#t > > Hi Ming, > > disk_map_sector_rcu() is called from the I/O path only and hence with > q->q_usage_counter > 0. Has it been considered to freeze disk->queue > from delete_partition() before deleting a partition and unfreezing > disk->queue after partition deletion has finished? Would that approach > allow to eliminate partition reference counting and thereby improve the > performance of the hot path? Hi Bart, I did consider that approach, but this way may cause performance regression, given deleting any partition drops IO performance a lot on other un-related partitions. Thanks, Ming ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2020-05-08 1:59 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-05-07 8:52 [PATCH 0/4] block: fix partition use-after-free and optimization Ming Lei 2020-05-07 8:52 ` [PATCH 1/4] block: fix use-after-free on cached last_lookup partition Ming Lei 2020-05-07 14:16 ` Christoph Hellwig 2020-05-08 1:54 ` Ming Lei 2020-05-07 8:52 ` [PATCH 2/4] block: only define 'nr_sects_seq' in hd_part for 32bit SMP Ming Lei 2020-05-07 14:16 ` Christoph Hellwig 2020-05-07 8:52 ` [PATCH 3/4] block: re-organize fields of 'struct hd_part' Ming Lei 2020-05-07 14:16 ` Christoph Hellwig 2020-05-07 8:52 ` [PATCH 4/4] block: don't hold part0's refcount in IO path Ming Lei 2020-05-07 14:20 ` Christoph Hellwig 2020-05-08 1:59 ` Ming Lei -- strict thread matches above, loose matches on Subject: below -- 2020-01-09 6:21 [PATCH 0/4] block: fix partition use-after-free and optimization Ming Lei 2020-01-09 6:21 ` [PATCH 1/4] block: fix use-after-free on cached last_lookup partition Ming Lei 2020-02-21 3:05 ` Yufen Yu 2020-02-21 4:03 ` Bart Van Assche 2020-02-21 7:47 ` Ming Lei
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).