linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] fs: System memory leak when running HTX with T10 DIF enabled
@ 2017-06-28 16:32 wenxiong
  2017-06-28 17:10 ` Keith Busch
  2017-06-28 17:42 ` Jens Axboe
  0 siblings, 2 replies; 14+ messages in thread
From: wenxiong @ 2017-06-28 16:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: keith.busch, axboe, bjking, wenxiong

From: Wen Xiong <wenxiong@linux.vnet.ibm.com>

With nvme devive + T10 enabled, On a system it has 256GB and started
logging /proc/meminfo & /proc/slabinfo for every minute and in an hour
it increased by 15968128 kB or ~15+GB.. Approximately 256 MB / minute
leaking.

/proc/meminfo | grep SUnreclaim...

SUnreclaim:      6752128 kB
SUnreclaim:      6874880 kB
SUnreclaim:      7238080 kB
....
SUnreclaim:     22307264 kB
SUnreclaim:     22485888 kB
SUnreclaim:     22720256 kB

When testcases with T10 enabled call into __blkdev_direct_IO_simple,
code doesn't free memory allocated by bio_integrity_alloc. The patch
fixes the issue. HTX has been run with +60 hours without failure.

failing stack:

[36587.216329] [c000002ff60874a0] [c000000000bcac68]
dump_stack+0xb0/0xf0 (unreliable)
[36587.216349] [c000002ff60874e0] [c000000000bc8c94] panic+0x140/0x308
[36587.216407] [c000002ff6087570] [c000000000282530]
out_of_memory+0x4e0/0x650
[36587.216465] [c000002ff6087610] [c00000000028a154]
__alloc_pages_nodemask+0xf34/0x10b0
[36587.216534] [c000002ff6087810] [c00000000030b800]
alloc_pages_current+0xc0/0x1d0
[36587.216603] [c000002ff6087870] [c00000000031907c]
new_slab+0x46c/0x7d0
[36587.216661] [c000002ff6087950] [c00000000031bf00]
___slab_alloc+0x570/0x670
[36587.216718] [c000002ff6087a70] [c00000000031c05c]
__slab_alloc+0x5c/0x90
[36587.216776] [c000002ff6087ad0] [c00000000031c1f4]
kmem_cache_alloc+0x164/0x300
[36587.216845] [c000002ff6087b20] [c0000000002de120]
mmap_region+0x3e0/0x6e0
[36587.216903] [c000002ff6087c00] [c0000000002de7cc] do_mmap+0x3ac/0x480
[36587.216960] [c000002ff6087c80] [c0000000002af244]
vm_mmap_pgoff+0x114/0x160
[36587.217018] [c000002ff6087d60] [c0000000002db6b0]
SyS_mmap_pgoff+0x230/0x300
[36587.217087] [c000002ff6087de0] [c000000000014eac] sys_mmap+0x8c/0xd0
[36587.217145] [c000002ff6087e30] [c00000000000b184]
system_call+0x38/0xe0
[36587.217481] ---[ end Kernel panic - not syncing: Out of memory and no
killable processes...
[36587.217481]

Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Reviewed by:  Brian King <brking@linux.vnet.ibm.com>
Tested by: Murali Iyer <mniyer@us.ibm.com>
---
 fs/block_dev.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 519599d..e871444 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -264,6 +264,10 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
 
 	if (unlikely(bio.bi_error))
 		return bio.bi_error;
+
+	if (bio_integrity(&bio))
+		bio_integrity_free(&bio);
+
 	return ret;
 }
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled
  2017-06-28 16:32 [PATCH] fs: System memory leak when running HTX with T10 DIF enabled wenxiong
@ 2017-06-28 17:10 ` Keith Busch
  2017-06-28 17:51   ` Jens Axboe
  2017-06-28 18:31   ` Christoph Hellwig
  2017-06-28 17:42 ` Jens Axboe
  1 sibling, 2 replies; 14+ messages in thread
From: Keith Busch @ 2017-06-28 17:10 UTC (permalink / raw)
  To: wenxiong; +Cc: linux-kernel, axboe, bjking, hch

On Wed, Jun 28, 2017 at 11:32:51AM -0500, wenxiong@linux.vnet.ibm.com wrote:
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 519599d..e871444 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -264,6 +264,10 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
>  
>  	if (unlikely(bio.bi_error))
>  		return bio.bi_error;
> +
> +	if (bio_integrity(&bio))
> +		bio_integrity_free(&bio);
> +
>  	return ret;
>  }

We don't want to leak the integrity payload in case of bi_error either.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled
  2017-06-28 16:32 [PATCH] fs: System memory leak when running HTX with T10 DIF enabled wenxiong
  2017-06-28 17:10 ` Keith Busch
@ 2017-06-28 17:42 ` Jens Axboe
  2017-06-28 17:45   ` Jens Axboe
  1 sibling, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2017-06-28 17:42 UTC (permalink / raw)
  To: wenxiong, linux-kernel; +Cc: keith.busch, bjking

On 06/28/2017 10:32 AM, wenxiong@linux.vnet.ibm.com wrote:
> From: Wen Xiong <wenxiong@linux.vnet.ibm.com>
> 
> With nvme devive + T10 enabled, On a system it has 256GB and started
> logging /proc/meminfo & /proc/slabinfo for every minute and in an hour
> it increased by 15968128 kB or ~15+GB.. Approximately 256 MB / minute
> leaking.
> 
> /proc/meminfo | grep SUnreclaim...
> 
> SUnreclaim:      6752128 kB
> SUnreclaim:      6874880 kB
> SUnreclaim:      7238080 kB
> ....
> SUnreclaim:     22307264 kB
> SUnreclaim:     22485888 kB
> SUnreclaim:     22720256 kB
> 
> When testcases with T10 enabled call into __blkdev_direct_IO_simple,
> code doesn't free memory allocated by bio_integrity_alloc. The patch
> fixes the issue. HTX has been run with +60 hours without failure.
> 
> failing stack:
> 
> [36587.216329] [c000002ff60874a0] [c000000000bcac68]
> dump_stack+0xb0/0xf0 (unreliable)
> [36587.216349] [c000002ff60874e0] [c000000000bc8c94] panic+0x140/0x308
> [36587.216407] [c000002ff6087570] [c000000000282530]
> out_of_memory+0x4e0/0x650
> [36587.216465] [c000002ff6087610] [c00000000028a154]
> __alloc_pages_nodemask+0xf34/0x10b0
> [36587.216534] [c000002ff6087810] [c00000000030b800]
> alloc_pages_current+0xc0/0x1d0
> [36587.216603] [c000002ff6087870] [c00000000031907c]
> new_slab+0x46c/0x7d0
> [36587.216661] [c000002ff6087950] [c00000000031bf00]
> ___slab_alloc+0x570/0x670
> [36587.216718] [c000002ff6087a70] [c00000000031c05c]
> __slab_alloc+0x5c/0x90
> [36587.216776] [c000002ff6087ad0] [c00000000031c1f4]
> kmem_cache_alloc+0x164/0x300
> [36587.216845] [c000002ff6087b20] [c0000000002de120]
> mmap_region+0x3e0/0x6e0
> [36587.216903] [c000002ff6087c00] [c0000000002de7cc] do_mmap+0x3ac/0x480
> [36587.216960] [c000002ff6087c80] [c0000000002af244]
> vm_mmap_pgoff+0x114/0x160
> [36587.217018] [c000002ff6087d60] [c0000000002db6b0]
> SyS_mmap_pgoff+0x230/0x300
> [36587.217087] [c000002ff6087de0] [c000000000014eac] sys_mmap+0x8c/0xd0
> [36587.217145] [c000002ff6087e30] [c00000000000b184]
> system_call+0x38/0xe0
> [36587.217481] ---[ end Kernel panic - not syncing: Out of memory and no
> killable processes...
> [36587.217481]
> 
> Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
> Reviewed by:  Brian King <brking@linux.vnet.ibm.com>
> Tested by: Murali Iyer <mniyer@us.ibm.com>
> ---
>  fs/block_dev.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 519599d..e871444 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -264,6 +264,10 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
>  
>  	if (unlikely(bio.bi_error))
>  		return bio.bi_error;
> +
> +	if (bio_integrity(&bio))
> +		bio_integrity_free(&bio);
> +
>  	return ret;
>  }

If this went through blk-throttle, then we'd also need to drop
the task association. Should this just use __bio_free(&bio)?

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled
  2017-06-28 17:42 ` Jens Axboe
@ 2017-06-28 17:45   ` Jens Axboe
  0 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2017-06-28 17:45 UTC (permalink / raw)
  To: wenxiong, linux-kernel; +Cc: keith.busch, bjking

On 06/28/2017 11:42 AM, Jens Axboe wrote:
> On 06/28/2017 10:32 AM, wenxiong@linux.vnet.ibm.com wrote:
>> From: Wen Xiong <wenxiong@linux.vnet.ibm.com>
>>
>> With nvme devive + T10 enabled, On a system it has 256GB and started
>> logging /proc/meminfo & /proc/slabinfo for every minute and in an hour
>> it increased by 15968128 kB or ~15+GB.. Approximately 256 MB / minute
>> leaking.
>>
>> /proc/meminfo | grep SUnreclaim...
>>
>> SUnreclaim:      6752128 kB
>> SUnreclaim:      6874880 kB
>> SUnreclaim:      7238080 kB
>> ....
>> SUnreclaim:     22307264 kB
>> SUnreclaim:     22485888 kB
>> SUnreclaim:     22720256 kB
>>
>> When testcases with T10 enabled call into __blkdev_direct_IO_simple,
>> code doesn't free memory allocated by bio_integrity_alloc. The patch
>> fixes the issue. HTX has been run with +60 hours without failure.
>>
>> failing stack:
>>
>> [36587.216329] [c000002ff60874a0] [c000000000bcac68]
>> dump_stack+0xb0/0xf0 (unreliable)
>> [36587.216349] [c000002ff60874e0] [c000000000bc8c94] panic+0x140/0x308
>> [36587.216407] [c000002ff6087570] [c000000000282530]
>> out_of_memory+0x4e0/0x650
>> [36587.216465] [c000002ff6087610] [c00000000028a154]
>> __alloc_pages_nodemask+0xf34/0x10b0
>> [36587.216534] [c000002ff6087810] [c00000000030b800]
>> alloc_pages_current+0xc0/0x1d0
>> [36587.216603] [c000002ff6087870] [c00000000031907c]
>> new_slab+0x46c/0x7d0
>> [36587.216661] [c000002ff6087950] [c00000000031bf00]
>> ___slab_alloc+0x570/0x670
>> [36587.216718] [c000002ff6087a70] [c00000000031c05c]
>> __slab_alloc+0x5c/0x90
>> [36587.216776] [c000002ff6087ad0] [c00000000031c1f4]
>> kmem_cache_alloc+0x164/0x300
>> [36587.216845] [c000002ff6087b20] [c0000000002de120]
>> mmap_region+0x3e0/0x6e0
>> [36587.216903] [c000002ff6087c00] [c0000000002de7cc] do_mmap+0x3ac/0x480
>> [36587.216960] [c000002ff6087c80] [c0000000002af244]
>> vm_mmap_pgoff+0x114/0x160
>> [36587.217018] [c000002ff6087d60] [c0000000002db6b0]
>> SyS_mmap_pgoff+0x230/0x300
>> [36587.217087] [c000002ff6087de0] [c000000000014eac] sys_mmap+0x8c/0xd0
>> [36587.217145] [c000002ff6087e30] [c00000000000b184]
>> system_call+0x38/0xe0
>> [36587.217481] ---[ end Kernel panic - not syncing: Out of memory and no
>> killable processes...
>> [36587.217481]
>>
>> Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
>> Reviewed by:  Brian King <brking@linux.vnet.ibm.com>
>> Tested by: Murali Iyer <mniyer@us.ibm.com>
>> ---
>>  fs/block_dev.c |    4 ++++
>>  1 files changed, 4 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 519599d..e871444 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -264,6 +264,10 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
>>  
>>  	if (unlikely(bio.bi_error))
>>  		return bio.bi_error;
>> +
>> +	if (bio_integrity(&bio))
>> +		bio_integrity_free(&bio);
>> +
>>  	return ret;
>>  }
> 
> If this went through blk-throttle, then we'd also need to drop
> the task association. Should this just use __bio_free(&bio)?

Something like the below. Also fixes the case where we had an
error, your patch is still leaking for that case.


diff --git a/block/bio.c b/block/bio.c
index 888e7801c638..bdc6e6541278 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -240,7 +240,7 @@ struct bio_vec *bvec_alloc(gfp_t gfp_mask, int nr, unsigned long *idx,
 	return bvl;
 }
 
-static void __bio_free(struct bio *bio)
+void __bio_free(struct bio *bio)
 {
 	bio_disassociate_task(bio);
 
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 519599dddd36..d929f886ee7b 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -263,7 +263,10 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 		kfree(vecs);
 
 	if (unlikely(bio.bi_error))
-		return bio.bi_error;
+		ret = bio.bi_error;
+
+	__bio_free(&bio);
+
 	return ret;
 }
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d1b04b0e99cf..4a29dc3db8cb 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -380,6 +380,7 @@ extern mempool_t *biovec_create_pool(int pool_entries);
 
 extern struct bio *bio_alloc_bioset(gfp_t, unsigned int, struct bio_set *);
 extern void bio_put(struct bio *);
+extern void __bio_free(struct bio *);
 
 extern void __bio_clone_fast(struct bio *, struct bio *);
 extern struct bio *bio_clone_fast(struct bio *, gfp_t, struct bio_set *);

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled
  2017-06-28 17:10 ` Keith Busch
@ 2017-06-28 17:51   ` Jens Axboe
  2017-06-28 18:31   ` Christoph Hellwig
  1 sibling, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2017-06-28 17:51 UTC (permalink / raw)
  To: Keith Busch, wenxiong; +Cc: linux-kernel, bjking, hch

On 06/28/2017 11:10 AM, Keith Busch wrote:
> On Wed, Jun 28, 2017 at 11:32:51AM -0500, wenxiong@linux.vnet.ibm.com wrote:
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 519599d..e871444 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -264,6 +264,10 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
>>  
>>  	if (unlikely(bio.bi_error))
>>  		return bio.bi_error;
>> +
>> +	if (bio_integrity(&bio))
>> +		bio_integrity_free(&bio);
>> +
>>  	return ret;
>>  }
> 
> We don't want to leak the integrity payload in case of bi_error either.

Yep, see my follow up email on this.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled
  2017-06-28 17:10 ` Keith Busch
  2017-06-28 17:51   ` Jens Axboe
@ 2017-06-28 18:31   ` Christoph Hellwig
  2017-06-28 18:34     ` Jens Axboe
  1 sibling, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2017-06-28 18:31 UTC (permalink / raw)
  To: Keith Busch; +Cc: wenxiong, linux-kernel, axboe, bjking, hch

On Wed, Jun 28, 2017 at 01:10:31PM -0400, Keith Busch wrote:
> On Wed, Jun 28, 2017 at 11:32:51AM -0500, wenxiong@linux.vnet.ibm.com wrote:
> > diff --git a/fs/block_dev.c b/fs/block_dev.c
> > index 519599d..e871444 100644
> > --- a/fs/block_dev.c
> > +++ b/fs/block_dev.c
> > @@ -264,6 +264,10 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
> >  
> >  	if (unlikely(bio.bi_error))
> >  		return bio.bi_error;
> > +
> > +	if (bio_integrity(&bio))
> > +		bio_integrity_free(&bio);
> > +
> >  	return ret;
> >  }
> 
> We don't want to leak the integrity payload in case of bi_error either.

And we should just call __bio_free.  Which btw every user of bio_init
probably needs.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled
  2017-06-28 18:31   ` Christoph Hellwig
@ 2017-06-28 18:34     ` Jens Axboe
  2017-06-28 18:38       ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2017-06-28 18:34 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch; +Cc: wenxiong, linux-kernel, bjking

On 06/28/2017 12:31 PM, Christoph Hellwig wrote:
> On Wed, Jun 28, 2017 at 01:10:31PM -0400, Keith Busch wrote:
>> On Wed, Jun 28, 2017 at 11:32:51AM -0500, wenxiong@linux.vnet.ibm.com wrote:
>>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>>> index 519599d..e871444 100644
>>> --- a/fs/block_dev.c
>>> +++ b/fs/block_dev.c
>>> @@ -264,6 +264,10 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
>>>  
>>>  	if (unlikely(bio.bi_error))
>>>  		return bio.bi_error;
>>> +
>>> +	if (bio_integrity(&bio))
>>> +		bio_integrity_free(&bio);
>>> +
>>>  	return ret;
>>>  }
>>
>> We don't want to leak the integrity payload in case of bi_error either.
> 
> And we should just call __bio_free.  Which btw every user of bio_init
> probably needs.

That's what I sent out. Here it is again. We should get this into 4.12,
so would be great with a review or two.


diff --git a/block/bio.c b/block/bio.c
index 888e7801c638..bdc6e6541278 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -240,7 +240,7 @@ struct bio_vec *bvec_alloc(gfp_t gfp_mask, int nr, unsigned long *idx,
 	return bvl;
 }
 
-static void __bio_free(struct bio *bio)
+void __bio_free(struct bio *bio)
 {
 	bio_disassociate_task(bio);
 
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 519599dddd36..d929f886ee7b 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -263,7 +263,10 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 		kfree(vecs);
 
 	if (unlikely(bio.bi_error))
-		return bio.bi_error;
+		ret = bio.bi_error;
+
+	__bio_free(&bio);
+
 	return ret;
 }
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d1b04b0e99cf..4a29dc3db8cb 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -380,6 +380,7 @@ extern mempool_t *biovec_create_pool(int pool_entries);
 
 extern struct bio *bio_alloc_bioset(gfp_t, unsigned int, struct bio_set *);
 extern void bio_put(struct bio *);
+extern void __bio_free(struct bio *);
 
 extern void __bio_clone_fast(struct bio *, struct bio *);
 extern struct bio *bio_clone_fast(struct bio *, gfp_t, struct bio_set *);

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled
  2017-06-28 18:34     ` Jens Axboe
@ 2017-06-28 18:38       ` Christoph Hellwig
  2017-06-28 18:44         ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2017-06-28 18:38 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, Keith Busch, wenxiong, linux-kernel, bjking

On Wed, Jun 28, 2017 at 12:34:15PM -0600, Jens Axboe wrote:
> That's what I sent out.

Where?  Didn't see that anywhere..

> Here it is again. We should get this into 4.12,
> so would be great with a review or two.

Can we rename __bio_free to bio_uninit and add a comment to bio_init
that it must be paried with bio_uninit?

Except for that this looks fine, although there are a lot more callers
that should get this treatment..

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled
  2017-06-28 18:38       ` Christoph Hellwig
@ 2017-06-28 18:44         ` Jens Axboe
  2017-06-28 18:52           ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2017-06-28 18:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Keith Busch, wenxiong, linux-kernel, bjking

On 06/28/2017 12:38 PM, Christoph Hellwig wrote:
> On Wed, Jun 28, 2017 at 12:34:15PM -0600, Jens Axboe wrote:
>> That's what I sent out.
> 
> Where?  Didn't see that anywhere..

Looks like you weren't CC'ed on the original thread. About an hour ago.

>> Here it is again. We should get this into 4.12,
>> so would be great with a review or two.
> 
> Can we rename __bio_free to bio_uninit and add a comment to bio_init
> that it must be paried with bio_uninit?

Let's keep it small for 4.12. We can do a cleanup on top of this for
4.13.

> Except for that this looks fine, although there are a lot more callers
> that should get this treatment..

Should only be an issue for on-stack bio's, since we don't go through
the put/free path. Did a quick grep, looks like this is one of 3. One
is floppy, which probably neither has DIF or uses blk-throttle. Then
there's one in dm-bufio, didn't look too closely at that. Last one is
this one.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled
  2017-06-28 18:44         ` Jens Axboe
@ 2017-06-28 18:52           ` Christoph Hellwig
  2017-06-28 18:57             ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2017-06-28 18:52 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, Keith Busch, wenxiong, linux-kernel, bjking

On Wed, Jun 28, 2017 at 12:44:00PM -0600, Jens Axboe wrote:
> On 06/28/2017 12:38 PM, Christoph Hellwig wrote:
> > On Wed, Jun 28, 2017 at 12:34:15PM -0600, Jens Axboe wrote:
> >> That's what I sent out.
> > 
> > Where?  Didn't see that anywhere..
> 
> Looks like you weren't CC'ed on the original thread. About an hour ago.
> 
> >> Here it is again. We should get this into 4.12,
> >> so would be great with a review or two.
> > 
> > Can we rename __bio_free to bio_uninit and add a comment to bio_init
> > that it must be paried with bio_uninit?
> 
> Let's keep it small for 4.12. We can do a cleanup on top of this for
> 4.13.

The rename is two additional lines for the patch, it's not going to
make a difference..

> > Except for that this looks fine, although there are a lot more callers
> > that should get this treatment..
> 
> Should only be an issue for on-stack bio's, since we don't go through
> the put/free path. Did a quick grep, looks like this is one of 3. One
> is floppy, which probably neither has DIF or uses blk-throttle. Then
> there's one in dm-bufio, didn't look too closely at that. Last one is
> this one.

Well, it's really all callers but bio_alloc_bioset itself that
will need this handling, as only bios that come from bio_alloc_bioset
will be freed through bio_free.  Most of them probably don't
support DIF, but they'll also miss the bio_disassociate_task call
this way, and will leak I/O context and css references if block
cgroup support is enabled.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled
  2017-06-28 18:52           ` Christoph Hellwig
@ 2017-06-28 18:57             ` Jens Axboe
  2017-06-28 21:10               ` Christoph Hellwig
  2017-06-28 21:11               ` Shaohua Li
  0 siblings, 2 replies; 14+ messages in thread
From: Jens Axboe @ 2017-06-28 18:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Keith Busch, wenxiong, linux-kernel, bjking

On 06/28/2017 12:52 PM, Christoph Hellwig wrote:
> On Wed, Jun 28, 2017 at 12:44:00PM -0600, Jens Axboe wrote:
>> On 06/28/2017 12:38 PM, Christoph Hellwig wrote:
>>> On Wed, Jun 28, 2017 at 12:34:15PM -0600, Jens Axboe wrote:
>>>> That's what I sent out.
>>>
>>> Where?  Didn't see that anywhere..
>>
>> Looks like you weren't CC'ed on the original thread. About an hour ago.
>>
>>>> Here it is again. We should get this into 4.12,
>>>> so would be great with a review or two.
>>>
>>> Can we rename __bio_free to bio_uninit and add a comment to bio_init
>>> that it must be paried with bio_uninit?
>>
>> Let's keep it small for 4.12. We can do a cleanup on top of this for
>> 4.13.
> 
> The rename is two additional lines for the patch, it's not going to
> make a difference..

Sure, why not...

>>> Except for that this looks fine, although there are a lot more callers
>>> that should get this treatment..
>>
>> Should only be an issue for on-stack bio's, since we don't go through
>> the put/free path. Did a quick grep, looks like this is one of 3. One
>> is floppy, which probably neither has DIF or uses blk-throttle. Then
>> there's one in dm-bufio, didn't look too closely at that. Last one is
>> this one.
> 
> Well, it's really all callers but bio_alloc_bioset itself that
> will need this handling, as only bios that come from bio_alloc_bioset
> will be freed through bio_free.  Most of them probably don't
> support DIF, but they'll also miss the bio_disassociate_task call
> this way, and will leak I/O context and css references if block
> cgroup support is enabled.

I guess local allocs too will be affected.

I'm baffled we've had this issue for so long, and nobody's seen it.
I knew DIF was never used (basically), but blk-throttle must have some
users at least.



diff --git a/block/bio.c b/block/bio.c
index 888e7801c638..f877d5e6d17f 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -240,7 +240,7 @@ struct bio_vec *bvec_alloc(gfp_t gfp_mask, int nr, unsigned long *idx,
 	return bvl;
 }
 
-static void __bio_free(struct bio *bio)
+void bio_uninit(struct bio *bio)
 {
 	bio_disassociate_task(bio);
 
@@ -253,7 +253,7 @@ static void bio_free(struct bio *bio)
 	struct bio_set *bs = bio->bi_pool;
 	void *p;
 
-	__bio_free(bio);
+	bio_uninit(bio);
 
 	if (bs) {
 		bvec_free(bs->bvec_pool, bio->bi_io_vec, BVEC_POOL_IDX(bio));
@@ -271,6 +271,11 @@ static void bio_free(struct bio *bio)
 	}
 }
 
+/*
+ * Users of this function have their own bio allocation. Subsequently,
+ * they must remember to pair any call to bio_init() with bio_uninit()
+ * when IO has completed, or when the bio is released.
+ */
 void bio_init(struct bio *bio, struct bio_vec *table,
 	      unsigned short max_vecs)
 {
@@ -297,7 +302,7 @@ void bio_reset(struct bio *bio)
 {
 	unsigned long flags = bio->bi_flags & (~0UL << BIO_RESET_BITS);
 
-	__bio_free(bio);
+	bio_uninit(bio);
 
 	memset(bio, 0, BIO_RESET_BYTES);
 	bio->bi_flags = flags;
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 519599dddd36..0a7404ef9335 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -263,7 +263,10 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 		kfree(vecs);
 
 	if (unlikely(bio.bi_error))
-		return bio.bi_error;
+		ret = bio.bi_error;
+
+	bio_uninit(&bio);
+
 	return ret;
 }
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d1b04b0e99cf..a7e29fa0981f 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -426,6 +426,7 @@ extern void bio_advance(struct bio *, unsigned);
 
 extern void bio_init(struct bio *bio, struct bio_vec *table,
 		     unsigned short max_vecs);
+extern void bio_uninit(struct bio *);
 extern void bio_reset(struct bio *);
 void bio_chain(struct bio *, struct bio *);
 

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled
  2017-06-28 18:57             ` Jens Axboe
@ 2017-06-28 21:10               ` Christoph Hellwig
  2017-06-28 21:11               ` Shaohua Li
  1 sibling, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2017-06-28 21:10 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, Keith Busch, wenxiong, linux-kernel, bjking

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled
  2017-06-28 18:57             ` Jens Axboe
  2017-06-28 21:10               ` Christoph Hellwig
@ 2017-06-28 21:11               ` Shaohua Li
  2017-06-28 21:25                 ` Christoph Hellwig
  1 sibling, 1 reply; 14+ messages in thread
From: Shaohua Li @ 2017-06-28 21:11 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, Keith Busch, wenxiong, linux-kernel, bjking

On Wed, Jun 28, 2017 at 12:57:50PM -0600, Jens Axboe wrote:
> On 06/28/2017 12:52 PM, Christoph Hellwig wrote:
> > On Wed, Jun 28, 2017 at 12:44:00PM -0600, Jens Axboe wrote:
> >> On 06/28/2017 12:38 PM, Christoph Hellwig wrote:
> >>> On Wed, Jun 28, 2017 at 12:34:15PM -0600, Jens Axboe wrote:
> >>>> That's what I sent out.
> >>>
> >>> Where?  Didn't see that anywhere..
> >>
> >> Looks like you weren't CC'ed on the original thread. About an hour ago.
> >>
> >>>> Here it is again. We should get this into 4.12,
> >>>> so would be great with a review or two.
> >>>
> >>> Can we rename __bio_free to bio_uninit and add a comment to bio_init
> >>> that it must be paried with bio_uninit?
> >>
> >> Let's keep it small for 4.12. We can do a cleanup on top of this for
> >> 4.13.
> > 
> > The rename is two additional lines for the patch, it's not going to
> > make a difference..
> 
> Sure, why not...
> 
> >>> Except for that this looks fine, although there are a lot more callers
> >>> that should get this treatment..
> >>
> >> Should only be an issue for on-stack bio's, since we don't go through
> >> the put/free path. Did a quick grep, looks like this is one of 3. One
> >> is floppy, which probably neither has DIF or uses blk-throttle. Then
> >> there's one in dm-bufio, didn't look too closely at that. Last one is
> >> this one.
> > 
> > Well, it's really all callers but bio_alloc_bioset itself that
> > will need this handling, as only bios that come from bio_alloc_bioset
> > will be freed through bio_free.  Most of them probably don't
> > support DIF, but they'll also miss the bio_disassociate_task call
> > this way, and will leak I/O context and css references if block
> > cgroup support is enabled.
> 
> I guess local allocs too will be affected.
> 
> I'm baffled we've had this issue for so long, and nobody's seen it.
> I knew DIF was never used (basically), but blk-throttle must have some
> users at least.

I fixed the same issue in the blktrace patches. It did the free in bio_endio.

 
> 
> 
> diff --git a/block/bio.c b/block/bio.c
> index 888e7801c638..f877d5e6d17f 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -240,7 +240,7 @@ struct bio_vec *bvec_alloc(gfp_t gfp_mask, int nr, unsigned long *idx,
>  	return bvl;
>  }
>  
> -static void __bio_free(struct bio *bio)
> +void bio_uninit(struct bio *bio)
>  {
>  	bio_disassociate_task(bio);
>  
> @@ -253,7 +253,7 @@ static void bio_free(struct bio *bio)
>  	struct bio_set *bs = bio->bi_pool;
>  	void *p;
>  
> -	__bio_free(bio);
> +	bio_uninit(bio);
>  
>  	if (bs) {
>  		bvec_free(bs->bvec_pool, bio->bi_io_vec, BVEC_POOL_IDX(bio));
> @@ -271,6 +271,11 @@ static void bio_free(struct bio *bio)
>  	}
>  }
>  
> +/*
> + * Users of this function have their own bio allocation. Subsequently,
> + * they must remember to pair any call to bio_init() with bio_uninit()
> + * when IO has completed, or when the bio is released.
> + */
>  void bio_init(struct bio *bio, struct bio_vec *table,
>  	      unsigned short max_vecs)
>  {
> @@ -297,7 +302,7 @@ void bio_reset(struct bio *bio)
>  {
>  	unsigned long flags = bio->bi_flags & (~0UL << BIO_RESET_BITS);
>  
> -	__bio_free(bio);
> +	bio_uninit(bio);
>  
>  	memset(bio, 0, BIO_RESET_BYTES);
>  	bio->bi_flags = flags;
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 519599dddd36..0a7404ef9335 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -263,7 +263,10 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
>  		kfree(vecs);
>  
>  	if (unlikely(bio.bi_error))
> -		return bio.bi_error;
> +		ret = bio.bi_error;
> +
> +	bio_uninit(&bio);
> +
>  	return ret;
>  }
>  
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index d1b04b0e99cf..a7e29fa0981f 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -426,6 +426,7 @@ extern void bio_advance(struct bio *, unsigned);
>  
>  extern void bio_init(struct bio *bio, struct bio_vec *table,
>  		     unsigned short max_vecs);
> +extern void bio_uninit(struct bio *);
>  extern void bio_reset(struct bio *);
>  void bio_chain(struct bio *, struct bio *);
>  
> 
> -- 
> Jens Axboe
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled
  2017-06-28 21:11               ` Shaohua Li
@ 2017-06-28 21:25                 ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2017-06-28 21:25 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Jens Axboe, Christoph Hellwig, Keith Busch, wenxiong,
	linux-kernel, bjking

On Wed, Jun 28, 2017 at 02:11:31PM -0700, Shaohua Li wrote:
> I fixed the same issue in the blktrace patches. It did the free in bio_endio.

Moving all the frees to bio_endio does indeed seem sensible,
as it avoids the additional calls for the bio_init()ed bios.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-06-28 21:25 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-28 16:32 [PATCH] fs: System memory leak when running HTX with T10 DIF enabled wenxiong
2017-06-28 17:10 ` Keith Busch
2017-06-28 17:51   ` Jens Axboe
2017-06-28 18:31   ` Christoph Hellwig
2017-06-28 18:34     ` Jens Axboe
2017-06-28 18:38       ` Christoph Hellwig
2017-06-28 18:44         ` Jens Axboe
2017-06-28 18:52           ` Christoph Hellwig
2017-06-28 18:57             ` Jens Axboe
2017-06-28 21:10               ` Christoph Hellwig
2017-06-28 21:11               ` Shaohua Li
2017-06-28 21:25                 ` Christoph Hellwig
2017-06-28 17:42 ` Jens Axboe
2017-06-28 17:45   ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).