* [PATCH] fs: System memory leak when running HTX with T10 DIF enabled @ 2017-06-28 16:32 wenxiong 2017-06-28 17:10 ` Keith Busch 2017-06-28 17:42 ` Jens Axboe 0 siblings, 2 replies; 14+ messages in thread From: wenxiong @ 2017-06-28 16:32 UTC (permalink / raw) To: linux-kernel; +Cc: keith.busch, axboe, bjking, wenxiong From: Wen Xiong <wenxiong@linux.vnet.ibm.com> With nvme devive + T10 enabled, On a system it has 256GB and started logging /proc/meminfo & /proc/slabinfo for every minute and in an hour it increased by 15968128 kB or ~15+GB.. Approximately 256 MB / minute leaking. /proc/meminfo | grep SUnreclaim... SUnreclaim: 6752128 kB SUnreclaim: 6874880 kB SUnreclaim: 7238080 kB .... SUnreclaim: 22307264 kB SUnreclaim: 22485888 kB SUnreclaim: 22720256 kB When testcases with T10 enabled call into __blkdev_direct_IO_simple, code doesn't free memory allocated by bio_integrity_alloc. The patch fixes the issue. HTX has been run with +60 hours without failure. failing stack: [36587.216329] [c000002ff60874a0] [c000000000bcac68] dump_stack+0xb0/0xf0 (unreliable) [36587.216349] [c000002ff60874e0] [c000000000bc8c94] panic+0x140/0x308 [36587.216407] [c000002ff6087570] [c000000000282530] out_of_memory+0x4e0/0x650 [36587.216465] [c000002ff6087610] [c00000000028a154] __alloc_pages_nodemask+0xf34/0x10b0 [36587.216534] [c000002ff6087810] [c00000000030b800] alloc_pages_current+0xc0/0x1d0 [36587.216603] [c000002ff6087870] [c00000000031907c] new_slab+0x46c/0x7d0 [36587.216661] [c000002ff6087950] [c00000000031bf00] ___slab_alloc+0x570/0x670 [36587.216718] [c000002ff6087a70] [c00000000031c05c] __slab_alloc+0x5c/0x90 [36587.216776] [c000002ff6087ad0] [c00000000031c1f4] kmem_cache_alloc+0x164/0x300 [36587.216845] [c000002ff6087b20] [c0000000002de120] mmap_region+0x3e0/0x6e0 [36587.216903] [c000002ff6087c00] [c0000000002de7cc] do_mmap+0x3ac/0x480 [36587.216960] [c000002ff6087c80] [c0000000002af244] vm_mmap_pgoff+0x114/0x160 [36587.217018] [c000002ff6087d60] [c0000000002db6b0] SyS_mmap_pgoff+0x230/0x300 [36587.217087] [c000002ff6087de0] [c000000000014eac] sys_mmap+0x8c/0xd0 [36587.217145] [c000002ff6087e30] [c00000000000b184] system_call+0x38/0xe0 [36587.217481] ---[ end Kernel panic - not syncing: Out of memory and no killable processes... [36587.217481] Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com> Reviewed by: Brian King <brking@linux.vnet.ibm.com> Tested by: Murali Iyer <mniyer@us.ibm.com> --- fs/block_dev.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/fs/block_dev.c b/fs/block_dev.c index 519599d..e871444 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -264,6 +264,10 @@ static void blkdev_bio_end_io_simple(struct bio *bio) if (unlikely(bio.bi_error)) return bio.bi_error; + + if (bio_integrity(&bio)) + bio_integrity_free(&bio); + return ret; } -- 1.7.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled 2017-06-28 16:32 [PATCH] fs: System memory leak when running HTX with T10 DIF enabled wenxiong @ 2017-06-28 17:10 ` Keith Busch 2017-06-28 17:51 ` Jens Axboe 2017-06-28 18:31 ` Christoph Hellwig 2017-06-28 17:42 ` Jens Axboe 1 sibling, 2 replies; 14+ messages in thread From: Keith Busch @ 2017-06-28 17:10 UTC (permalink / raw) To: wenxiong; +Cc: linux-kernel, axboe, bjking, hch On Wed, Jun 28, 2017 at 11:32:51AM -0500, wenxiong@linux.vnet.ibm.com wrote: > diff --git a/fs/block_dev.c b/fs/block_dev.c > index 519599d..e871444 100644 > --- a/fs/block_dev.c > +++ b/fs/block_dev.c > @@ -264,6 +264,10 @@ static void blkdev_bio_end_io_simple(struct bio *bio) > > if (unlikely(bio.bi_error)) > return bio.bi_error; > + > + if (bio_integrity(&bio)) > + bio_integrity_free(&bio); > + > return ret; > } We don't want to leak the integrity payload in case of bi_error either. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled 2017-06-28 17:10 ` Keith Busch @ 2017-06-28 17:51 ` Jens Axboe 2017-06-28 18:31 ` Christoph Hellwig 1 sibling, 0 replies; 14+ messages in thread From: Jens Axboe @ 2017-06-28 17:51 UTC (permalink / raw) To: Keith Busch, wenxiong; +Cc: linux-kernel, bjking, hch On 06/28/2017 11:10 AM, Keith Busch wrote: > On Wed, Jun 28, 2017 at 11:32:51AM -0500, wenxiong@linux.vnet.ibm.com wrote: >> diff --git a/fs/block_dev.c b/fs/block_dev.c >> index 519599d..e871444 100644 >> --- a/fs/block_dev.c >> +++ b/fs/block_dev.c >> @@ -264,6 +264,10 @@ static void blkdev_bio_end_io_simple(struct bio *bio) >> >> if (unlikely(bio.bi_error)) >> return bio.bi_error; >> + >> + if (bio_integrity(&bio)) >> + bio_integrity_free(&bio); >> + >> return ret; >> } > > We don't want to leak the integrity payload in case of bi_error either. Yep, see my follow up email on this. -- Jens Axboe ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled 2017-06-28 17:10 ` Keith Busch 2017-06-28 17:51 ` Jens Axboe @ 2017-06-28 18:31 ` Christoph Hellwig 2017-06-28 18:34 ` Jens Axboe 1 sibling, 1 reply; 14+ messages in thread From: Christoph Hellwig @ 2017-06-28 18:31 UTC (permalink / raw) To: Keith Busch; +Cc: wenxiong, linux-kernel, axboe, bjking, hch On Wed, Jun 28, 2017 at 01:10:31PM -0400, Keith Busch wrote: > On Wed, Jun 28, 2017 at 11:32:51AM -0500, wenxiong@linux.vnet.ibm.com wrote: > > diff --git a/fs/block_dev.c b/fs/block_dev.c > > index 519599d..e871444 100644 > > --- a/fs/block_dev.c > > +++ b/fs/block_dev.c > > @@ -264,6 +264,10 @@ static void blkdev_bio_end_io_simple(struct bio *bio) > > > > if (unlikely(bio.bi_error)) > > return bio.bi_error; > > + > > + if (bio_integrity(&bio)) > > + bio_integrity_free(&bio); > > + > > return ret; > > } > > We don't want to leak the integrity payload in case of bi_error either. And we should just call __bio_free. Which btw every user of bio_init probably needs. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled 2017-06-28 18:31 ` Christoph Hellwig @ 2017-06-28 18:34 ` Jens Axboe 2017-06-28 18:38 ` Christoph Hellwig 0 siblings, 1 reply; 14+ messages in thread From: Jens Axboe @ 2017-06-28 18:34 UTC (permalink / raw) To: Christoph Hellwig, Keith Busch; +Cc: wenxiong, linux-kernel, bjking On 06/28/2017 12:31 PM, Christoph Hellwig wrote: > On Wed, Jun 28, 2017 at 01:10:31PM -0400, Keith Busch wrote: >> On Wed, Jun 28, 2017 at 11:32:51AM -0500, wenxiong@linux.vnet.ibm.com wrote: >>> diff --git a/fs/block_dev.c b/fs/block_dev.c >>> index 519599d..e871444 100644 >>> --- a/fs/block_dev.c >>> +++ b/fs/block_dev.c >>> @@ -264,6 +264,10 @@ static void blkdev_bio_end_io_simple(struct bio *bio) >>> >>> if (unlikely(bio.bi_error)) >>> return bio.bi_error; >>> + >>> + if (bio_integrity(&bio)) >>> + bio_integrity_free(&bio); >>> + >>> return ret; >>> } >> >> We don't want to leak the integrity payload in case of bi_error either. > > And we should just call __bio_free. Which btw every user of bio_init > probably needs. That's what I sent out. Here it is again. We should get this into 4.12, so would be great with a review or two. diff --git a/block/bio.c b/block/bio.c index 888e7801c638..bdc6e6541278 100644 --- a/block/bio.c +++ b/block/bio.c @@ -240,7 +240,7 @@ struct bio_vec *bvec_alloc(gfp_t gfp_mask, int nr, unsigned long *idx, return bvl; } -static void __bio_free(struct bio *bio) +void __bio_free(struct bio *bio) { bio_disassociate_task(bio); diff --git a/fs/block_dev.c b/fs/block_dev.c index 519599dddd36..d929f886ee7b 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -263,7 +263,10 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter, kfree(vecs); if (unlikely(bio.bi_error)) - return bio.bi_error; + ret = bio.bi_error; + + __bio_free(&bio); + return ret; } diff --git a/include/linux/bio.h b/include/linux/bio.h index d1b04b0e99cf..4a29dc3db8cb 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -380,6 +380,7 @@ extern mempool_t *biovec_create_pool(int pool_entries); extern struct bio *bio_alloc_bioset(gfp_t, unsigned int, struct bio_set *); extern void bio_put(struct bio *); +extern void __bio_free(struct bio *); extern void __bio_clone_fast(struct bio *, struct bio *); extern struct bio *bio_clone_fast(struct bio *, gfp_t, struct bio_set *); -- Jens Axboe ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled 2017-06-28 18:34 ` Jens Axboe @ 2017-06-28 18:38 ` Christoph Hellwig 2017-06-28 18:44 ` Jens Axboe 0 siblings, 1 reply; 14+ messages in thread From: Christoph Hellwig @ 2017-06-28 18:38 UTC (permalink / raw) To: Jens Axboe; +Cc: Christoph Hellwig, Keith Busch, wenxiong, linux-kernel, bjking On Wed, Jun 28, 2017 at 12:34:15PM -0600, Jens Axboe wrote: > That's what I sent out. Where? Didn't see that anywhere.. > Here it is again. We should get this into 4.12, > so would be great with a review or two. Can we rename __bio_free to bio_uninit and add a comment to bio_init that it must be paried with bio_uninit? Except for that this looks fine, although there are a lot more callers that should get this treatment.. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled 2017-06-28 18:38 ` Christoph Hellwig @ 2017-06-28 18:44 ` Jens Axboe 2017-06-28 18:52 ` Christoph Hellwig 0 siblings, 1 reply; 14+ messages in thread From: Jens Axboe @ 2017-06-28 18:44 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Keith Busch, wenxiong, linux-kernel, bjking On 06/28/2017 12:38 PM, Christoph Hellwig wrote: > On Wed, Jun 28, 2017 at 12:34:15PM -0600, Jens Axboe wrote: >> That's what I sent out. > > Where? Didn't see that anywhere.. Looks like you weren't CC'ed on the original thread. About an hour ago. >> Here it is again. We should get this into 4.12, >> so would be great with a review or two. > > Can we rename __bio_free to bio_uninit and add a comment to bio_init > that it must be paried with bio_uninit? Let's keep it small for 4.12. We can do a cleanup on top of this for 4.13. > Except for that this looks fine, although there are a lot more callers > that should get this treatment.. Should only be an issue for on-stack bio's, since we don't go through the put/free path. Did a quick grep, looks like this is one of 3. One is floppy, which probably neither has DIF or uses blk-throttle. Then there's one in dm-bufio, didn't look too closely at that. Last one is this one. -- Jens Axboe ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled 2017-06-28 18:44 ` Jens Axboe @ 2017-06-28 18:52 ` Christoph Hellwig 2017-06-28 18:57 ` Jens Axboe 0 siblings, 1 reply; 14+ messages in thread From: Christoph Hellwig @ 2017-06-28 18:52 UTC (permalink / raw) To: Jens Axboe; +Cc: Christoph Hellwig, Keith Busch, wenxiong, linux-kernel, bjking On Wed, Jun 28, 2017 at 12:44:00PM -0600, Jens Axboe wrote: > On 06/28/2017 12:38 PM, Christoph Hellwig wrote: > > On Wed, Jun 28, 2017 at 12:34:15PM -0600, Jens Axboe wrote: > >> That's what I sent out. > > > > Where? Didn't see that anywhere.. > > Looks like you weren't CC'ed on the original thread. About an hour ago. > > >> Here it is again. We should get this into 4.12, > >> so would be great with a review or two. > > > > Can we rename __bio_free to bio_uninit and add a comment to bio_init > > that it must be paried with bio_uninit? > > Let's keep it small for 4.12. We can do a cleanup on top of this for > 4.13. The rename is two additional lines for the patch, it's not going to make a difference.. > > Except for that this looks fine, although there are a lot more callers > > that should get this treatment.. > > Should only be an issue for on-stack bio's, since we don't go through > the put/free path. Did a quick grep, looks like this is one of 3. One > is floppy, which probably neither has DIF or uses blk-throttle. Then > there's one in dm-bufio, didn't look too closely at that. Last one is > this one. Well, it's really all callers but bio_alloc_bioset itself that will need this handling, as only bios that come from bio_alloc_bioset will be freed through bio_free. Most of them probably don't support DIF, but they'll also miss the bio_disassociate_task call this way, and will leak I/O context and css references if block cgroup support is enabled. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled 2017-06-28 18:52 ` Christoph Hellwig @ 2017-06-28 18:57 ` Jens Axboe 2017-06-28 21:10 ` Christoph Hellwig 2017-06-28 21:11 ` Shaohua Li 0 siblings, 2 replies; 14+ messages in thread From: Jens Axboe @ 2017-06-28 18:57 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Keith Busch, wenxiong, linux-kernel, bjking On 06/28/2017 12:52 PM, Christoph Hellwig wrote: > On Wed, Jun 28, 2017 at 12:44:00PM -0600, Jens Axboe wrote: >> On 06/28/2017 12:38 PM, Christoph Hellwig wrote: >>> On Wed, Jun 28, 2017 at 12:34:15PM -0600, Jens Axboe wrote: >>>> That's what I sent out. >>> >>> Where? Didn't see that anywhere.. >> >> Looks like you weren't CC'ed on the original thread. About an hour ago. >> >>>> Here it is again. We should get this into 4.12, >>>> so would be great with a review or two. >>> >>> Can we rename __bio_free to bio_uninit and add a comment to bio_init >>> that it must be paried with bio_uninit? >> >> Let's keep it small for 4.12. We can do a cleanup on top of this for >> 4.13. > > The rename is two additional lines for the patch, it's not going to > make a difference.. Sure, why not... >>> Except for that this looks fine, although there are a lot more callers >>> that should get this treatment.. >> >> Should only be an issue for on-stack bio's, since we don't go through >> the put/free path. Did a quick grep, looks like this is one of 3. One >> is floppy, which probably neither has DIF or uses blk-throttle. Then >> there's one in dm-bufio, didn't look too closely at that. Last one is >> this one. > > Well, it's really all callers but bio_alloc_bioset itself that > will need this handling, as only bios that come from bio_alloc_bioset > will be freed through bio_free. Most of them probably don't > support DIF, but they'll also miss the bio_disassociate_task call > this way, and will leak I/O context and css references if block > cgroup support is enabled. I guess local allocs too will be affected. I'm baffled we've had this issue for so long, and nobody's seen it. I knew DIF was never used (basically), but blk-throttle must have some users at least. diff --git a/block/bio.c b/block/bio.c index 888e7801c638..f877d5e6d17f 100644 --- a/block/bio.c +++ b/block/bio.c @@ -240,7 +240,7 @@ struct bio_vec *bvec_alloc(gfp_t gfp_mask, int nr, unsigned long *idx, return bvl; } -static void __bio_free(struct bio *bio) +void bio_uninit(struct bio *bio) { bio_disassociate_task(bio); @@ -253,7 +253,7 @@ static void bio_free(struct bio *bio) struct bio_set *bs = bio->bi_pool; void *p; - __bio_free(bio); + bio_uninit(bio); if (bs) { bvec_free(bs->bvec_pool, bio->bi_io_vec, BVEC_POOL_IDX(bio)); @@ -271,6 +271,11 @@ static void bio_free(struct bio *bio) } } +/* + * Users of this function have their own bio allocation. Subsequently, + * they must remember to pair any call to bio_init() with bio_uninit() + * when IO has completed, or when the bio is released. + */ void bio_init(struct bio *bio, struct bio_vec *table, unsigned short max_vecs) { @@ -297,7 +302,7 @@ void bio_reset(struct bio *bio) { unsigned long flags = bio->bi_flags & (~0UL << BIO_RESET_BITS); - __bio_free(bio); + bio_uninit(bio); memset(bio, 0, BIO_RESET_BYTES); bio->bi_flags = flags; diff --git a/fs/block_dev.c b/fs/block_dev.c index 519599dddd36..0a7404ef9335 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -263,7 +263,10 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter, kfree(vecs); if (unlikely(bio.bi_error)) - return bio.bi_error; + ret = bio.bi_error; + + bio_uninit(&bio); + return ret; } diff --git a/include/linux/bio.h b/include/linux/bio.h index d1b04b0e99cf..a7e29fa0981f 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -426,6 +426,7 @@ extern void bio_advance(struct bio *, unsigned); extern void bio_init(struct bio *bio, struct bio_vec *table, unsigned short max_vecs); +extern void bio_uninit(struct bio *); extern void bio_reset(struct bio *); void bio_chain(struct bio *, struct bio *); -- Jens Axboe ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled 2017-06-28 18:57 ` Jens Axboe @ 2017-06-28 21:10 ` Christoph Hellwig 2017-06-28 21:11 ` Shaohua Li 1 sibling, 0 replies; 14+ messages in thread From: Christoph Hellwig @ 2017-06-28 21:10 UTC (permalink / raw) To: Jens Axboe; +Cc: Christoph Hellwig, Keith Busch, wenxiong, linux-kernel, bjking Looks fine, Reviewed-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled 2017-06-28 18:57 ` Jens Axboe 2017-06-28 21:10 ` Christoph Hellwig @ 2017-06-28 21:11 ` Shaohua Li 2017-06-28 21:25 ` Christoph Hellwig 1 sibling, 1 reply; 14+ messages in thread From: Shaohua Li @ 2017-06-28 21:11 UTC (permalink / raw) To: Jens Axboe; +Cc: Christoph Hellwig, Keith Busch, wenxiong, linux-kernel, bjking On Wed, Jun 28, 2017 at 12:57:50PM -0600, Jens Axboe wrote: > On 06/28/2017 12:52 PM, Christoph Hellwig wrote: > > On Wed, Jun 28, 2017 at 12:44:00PM -0600, Jens Axboe wrote: > >> On 06/28/2017 12:38 PM, Christoph Hellwig wrote: > >>> On Wed, Jun 28, 2017 at 12:34:15PM -0600, Jens Axboe wrote: > >>>> That's what I sent out. > >>> > >>> Where? Didn't see that anywhere.. > >> > >> Looks like you weren't CC'ed on the original thread. About an hour ago. > >> > >>>> Here it is again. We should get this into 4.12, > >>>> so would be great with a review or two. > >>> > >>> Can we rename __bio_free to bio_uninit and add a comment to bio_init > >>> that it must be paried with bio_uninit? > >> > >> Let's keep it small for 4.12. We can do a cleanup on top of this for > >> 4.13. > > > > The rename is two additional lines for the patch, it's not going to > > make a difference.. > > Sure, why not... > > >>> Except for that this looks fine, although there are a lot more callers > >>> that should get this treatment.. > >> > >> Should only be an issue for on-stack bio's, since we don't go through > >> the put/free path. Did a quick grep, looks like this is one of 3. One > >> is floppy, which probably neither has DIF or uses blk-throttle. Then > >> there's one in dm-bufio, didn't look too closely at that. Last one is > >> this one. > > > > Well, it's really all callers but bio_alloc_bioset itself that > > will need this handling, as only bios that come from bio_alloc_bioset > > will be freed through bio_free. Most of them probably don't > > support DIF, but they'll also miss the bio_disassociate_task call > > this way, and will leak I/O context and css references if block > > cgroup support is enabled. > > I guess local allocs too will be affected. > > I'm baffled we've had this issue for so long, and nobody's seen it. > I knew DIF was never used (basically), but blk-throttle must have some > users at least. I fixed the same issue in the blktrace patches. It did the free in bio_endio. > > > diff --git a/block/bio.c b/block/bio.c > index 888e7801c638..f877d5e6d17f 100644 > --- a/block/bio.c > +++ b/block/bio.c > @@ -240,7 +240,7 @@ struct bio_vec *bvec_alloc(gfp_t gfp_mask, int nr, unsigned long *idx, > return bvl; > } > > -static void __bio_free(struct bio *bio) > +void bio_uninit(struct bio *bio) > { > bio_disassociate_task(bio); > > @@ -253,7 +253,7 @@ static void bio_free(struct bio *bio) > struct bio_set *bs = bio->bi_pool; > void *p; > > - __bio_free(bio); > + bio_uninit(bio); > > if (bs) { > bvec_free(bs->bvec_pool, bio->bi_io_vec, BVEC_POOL_IDX(bio)); > @@ -271,6 +271,11 @@ static void bio_free(struct bio *bio) > } > } > > +/* > + * Users of this function have their own bio allocation. Subsequently, > + * they must remember to pair any call to bio_init() with bio_uninit() > + * when IO has completed, or when the bio is released. > + */ > void bio_init(struct bio *bio, struct bio_vec *table, > unsigned short max_vecs) > { > @@ -297,7 +302,7 @@ void bio_reset(struct bio *bio) > { > unsigned long flags = bio->bi_flags & (~0UL << BIO_RESET_BITS); > > - __bio_free(bio); > + bio_uninit(bio); > > memset(bio, 0, BIO_RESET_BYTES); > bio->bi_flags = flags; > diff --git a/fs/block_dev.c b/fs/block_dev.c > index 519599dddd36..0a7404ef9335 100644 > --- a/fs/block_dev.c > +++ b/fs/block_dev.c > @@ -263,7 +263,10 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter, > kfree(vecs); > > if (unlikely(bio.bi_error)) > - return bio.bi_error; > + ret = bio.bi_error; > + > + bio_uninit(&bio); > + > return ret; > } > > diff --git a/include/linux/bio.h b/include/linux/bio.h > index d1b04b0e99cf..a7e29fa0981f 100644 > --- a/include/linux/bio.h > +++ b/include/linux/bio.h > @@ -426,6 +426,7 @@ extern void bio_advance(struct bio *, unsigned); > > extern void bio_init(struct bio *bio, struct bio_vec *table, > unsigned short max_vecs); > +extern void bio_uninit(struct bio *); > extern void bio_reset(struct bio *); > void bio_chain(struct bio *, struct bio *); > > > -- > Jens Axboe > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled 2017-06-28 21:11 ` Shaohua Li @ 2017-06-28 21:25 ` Christoph Hellwig 0 siblings, 0 replies; 14+ messages in thread From: Christoph Hellwig @ 2017-06-28 21:25 UTC (permalink / raw) To: Shaohua Li Cc: Jens Axboe, Christoph Hellwig, Keith Busch, wenxiong, linux-kernel, bjking On Wed, Jun 28, 2017 at 02:11:31PM -0700, Shaohua Li wrote: > I fixed the same issue in the blktrace patches. It did the free in bio_endio. Moving all the frees to bio_endio does indeed seem sensible, as it avoids the additional calls for the bio_init()ed bios. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled 2017-06-28 16:32 [PATCH] fs: System memory leak when running HTX with T10 DIF enabled wenxiong 2017-06-28 17:10 ` Keith Busch @ 2017-06-28 17:42 ` Jens Axboe 2017-06-28 17:45 ` Jens Axboe 1 sibling, 1 reply; 14+ messages in thread From: Jens Axboe @ 2017-06-28 17:42 UTC (permalink / raw) To: wenxiong, linux-kernel; +Cc: keith.busch, bjking On 06/28/2017 10:32 AM, wenxiong@linux.vnet.ibm.com wrote: > From: Wen Xiong <wenxiong@linux.vnet.ibm.com> > > With nvme devive + T10 enabled, On a system it has 256GB and started > logging /proc/meminfo & /proc/slabinfo for every minute and in an hour > it increased by 15968128 kB or ~15+GB.. Approximately 256 MB / minute > leaking. > > /proc/meminfo | grep SUnreclaim... > > SUnreclaim: 6752128 kB > SUnreclaim: 6874880 kB > SUnreclaim: 7238080 kB > .... > SUnreclaim: 22307264 kB > SUnreclaim: 22485888 kB > SUnreclaim: 22720256 kB > > When testcases with T10 enabled call into __blkdev_direct_IO_simple, > code doesn't free memory allocated by bio_integrity_alloc. The patch > fixes the issue. HTX has been run with +60 hours without failure. > > failing stack: > > [36587.216329] [c000002ff60874a0] [c000000000bcac68] > dump_stack+0xb0/0xf0 (unreliable) > [36587.216349] [c000002ff60874e0] [c000000000bc8c94] panic+0x140/0x308 > [36587.216407] [c000002ff6087570] [c000000000282530] > out_of_memory+0x4e0/0x650 > [36587.216465] [c000002ff6087610] [c00000000028a154] > __alloc_pages_nodemask+0xf34/0x10b0 > [36587.216534] [c000002ff6087810] [c00000000030b800] > alloc_pages_current+0xc0/0x1d0 > [36587.216603] [c000002ff6087870] [c00000000031907c] > new_slab+0x46c/0x7d0 > [36587.216661] [c000002ff6087950] [c00000000031bf00] > ___slab_alloc+0x570/0x670 > [36587.216718] [c000002ff6087a70] [c00000000031c05c] > __slab_alloc+0x5c/0x90 > [36587.216776] [c000002ff6087ad0] [c00000000031c1f4] > kmem_cache_alloc+0x164/0x300 > [36587.216845] [c000002ff6087b20] [c0000000002de120] > mmap_region+0x3e0/0x6e0 > [36587.216903] [c000002ff6087c00] [c0000000002de7cc] do_mmap+0x3ac/0x480 > [36587.216960] [c000002ff6087c80] [c0000000002af244] > vm_mmap_pgoff+0x114/0x160 > [36587.217018] [c000002ff6087d60] [c0000000002db6b0] > SyS_mmap_pgoff+0x230/0x300 > [36587.217087] [c000002ff6087de0] [c000000000014eac] sys_mmap+0x8c/0xd0 > [36587.217145] [c000002ff6087e30] [c00000000000b184] > system_call+0x38/0xe0 > [36587.217481] ---[ end Kernel panic - not syncing: Out of memory and no > killable processes... > [36587.217481] > > Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com> > Reviewed by: Brian King <brking@linux.vnet.ibm.com> > Tested by: Murali Iyer <mniyer@us.ibm.com> > --- > fs/block_dev.c | 4 ++++ > 1 files changed, 4 insertions(+), 0 deletions(-) > > diff --git a/fs/block_dev.c b/fs/block_dev.c > index 519599d..e871444 100644 > --- a/fs/block_dev.c > +++ b/fs/block_dev.c > @@ -264,6 +264,10 @@ static void blkdev_bio_end_io_simple(struct bio *bio) > > if (unlikely(bio.bi_error)) > return bio.bi_error; > + > + if (bio_integrity(&bio)) > + bio_integrity_free(&bio); > + > return ret; > } If this went through blk-throttle, then we'd also need to drop the task association. Should this just use __bio_free(&bio)? -- Jens Axboe ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled 2017-06-28 17:42 ` Jens Axboe @ 2017-06-28 17:45 ` Jens Axboe 0 siblings, 0 replies; 14+ messages in thread From: Jens Axboe @ 2017-06-28 17:45 UTC (permalink / raw) To: wenxiong, linux-kernel; +Cc: keith.busch, bjking On 06/28/2017 11:42 AM, Jens Axboe wrote: > On 06/28/2017 10:32 AM, wenxiong@linux.vnet.ibm.com wrote: >> From: Wen Xiong <wenxiong@linux.vnet.ibm.com> >> >> With nvme devive + T10 enabled, On a system it has 256GB and started >> logging /proc/meminfo & /proc/slabinfo for every minute and in an hour >> it increased by 15968128 kB or ~15+GB.. Approximately 256 MB / minute >> leaking. >> >> /proc/meminfo | grep SUnreclaim... >> >> SUnreclaim: 6752128 kB >> SUnreclaim: 6874880 kB >> SUnreclaim: 7238080 kB >> .... >> SUnreclaim: 22307264 kB >> SUnreclaim: 22485888 kB >> SUnreclaim: 22720256 kB >> >> When testcases with T10 enabled call into __blkdev_direct_IO_simple, >> code doesn't free memory allocated by bio_integrity_alloc. The patch >> fixes the issue. HTX has been run with +60 hours without failure. >> >> failing stack: >> >> [36587.216329] [c000002ff60874a0] [c000000000bcac68] >> dump_stack+0xb0/0xf0 (unreliable) >> [36587.216349] [c000002ff60874e0] [c000000000bc8c94] panic+0x140/0x308 >> [36587.216407] [c000002ff6087570] [c000000000282530] >> out_of_memory+0x4e0/0x650 >> [36587.216465] [c000002ff6087610] [c00000000028a154] >> __alloc_pages_nodemask+0xf34/0x10b0 >> [36587.216534] [c000002ff6087810] [c00000000030b800] >> alloc_pages_current+0xc0/0x1d0 >> [36587.216603] [c000002ff6087870] [c00000000031907c] >> new_slab+0x46c/0x7d0 >> [36587.216661] [c000002ff6087950] [c00000000031bf00] >> ___slab_alloc+0x570/0x670 >> [36587.216718] [c000002ff6087a70] [c00000000031c05c] >> __slab_alloc+0x5c/0x90 >> [36587.216776] [c000002ff6087ad0] [c00000000031c1f4] >> kmem_cache_alloc+0x164/0x300 >> [36587.216845] [c000002ff6087b20] [c0000000002de120] >> mmap_region+0x3e0/0x6e0 >> [36587.216903] [c000002ff6087c00] [c0000000002de7cc] do_mmap+0x3ac/0x480 >> [36587.216960] [c000002ff6087c80] [c0000000002af244] >> vm_mmap_pgoff+0x114/0x160 >> [36587.217018] [c000002ff6087d60] [c0000000002db6b0] >> SyS_mmap_pgoff+0x230/0x300 >> [36587.217087] [c000002ff6087de0] [c000000000014eac] sys_mmap+0x8c/0xd0 >> [36587.217145] [c000002ff6087e30] [c00000000000b184] >> system_call+0x38/0xe0 >> [36587.217481] ---[ end Kernel panic - not syncing: Out of memory and no >> killable processes... >> [36587.217481] >> >> Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com> >> Reviewed by: Brian King <brking@linux.vnet.ibm.com> >> Tested by: Murali Iyer <mniyer@us.ibm.com> >> --- >> fs/block_dev.c | 4 ++++ >> 1 files changed, 4 insertions(+), 0 deletions(-) >> >> diff --git a/fs/block_dev.c b/fs/block_dev.c >> index 519599d..e871444 100644 >> --- a/fs/block_dev.c >> +++ b/fs/block_dev.c >> @@ -264,6 +264,10 @@ static void blkdev_bio_end_io_simple(struct bio *bio) >> >> if (unlikely(bio.bi_error)) >> return bio.bi_error; >> + >> + if (bio_integrity(&bio)) >> + bio_integrity_free(&bio); >> + >> return ret; >> } > > If this went through blk-throttle, then we'd also need to drop > the task association. Should this just use __bio_free(&bio)? Something like the below. Also fixes the case where we had an error, your patch is still leaking for that case. diff --git a/block/bio.c b/block/bio.c index 888e7801c638..bdc6e6541278 100644 --- a/block/bio.c +++ b/block/bio.c @@ -240,7 +240,7 @@ struct bio_vec *bvec_alloc(gfp_t gfp_mask, int nr, unsigned long *idx, return bvl; } -static void __bio_free(struct bio *bio) +void __bio_free(struct bio *bio) { bio_disassociate_task(bio); diff --git a/fs/block_dev.c b/fs/block_dev.c index 519599dddd36..d929f886ee7b 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -263,7 +263,10 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter, kfree(vecs); if (unlikely(bio.bi_error)) - return bio.bi_error; + ret = bio.bi_error; + + __bio_free(&bio); + return ret; } diff --git a/include/linux/bio.h b/include/linux/bio.h index d1b04b0e99cf..4a29dc3db8cb 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -380,6 +380,7 @@ extern mempool_t *biovec_create_pool(int pool_entries); extern struct bio *bio_alloc_bioset(gfp_t, unsigned int, struct bio_set *); extern void bio_put(struct bio *); +extern void __bio_free(struct bio *); extern void __bio_clone_fast(struct bio *, struct bio *); extern struct bio *bio_clone_fast(struct bio *, gfp_t, struct bio_set *); -- Jens Axboe ^ permalink raw reply related [flat|nested] 14+ messages in thread
end of thread, other threads:[~2017-06-28 21:25 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-06-28 16:32 [PATCH] fs: System memory leak when running HTX with T10 DIF enabled wenxiong 2017-06-28 17:10 ` Keith Busch 2017-06-28 17:51 ` Jens Axboe 2017-06-28 18:31 ` Christoph Hellwig 2017-06-28 18:34 ` Jens Axboe 2017-06-28 18:38 ` Christoph Hellwig 2017-06-28 18:44 ` Jens Axboe 2017-06-28 18:52 ` Christoph Hellwig 2017-06-28 18:57 ` Jens Axboe 2017-06-28 21:10 ` Christoph Hellwig 2017-06-28 21:11 ` Shaohua Li 2017-06-28 21:25 ` Christoph Hellwig 2017-06-28 17:42 ` Jens Axboe 2017-06-28 17:45 ` Jens Axboe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).