[PATCH] zram: set physical queue limits to avoid array out of bounds accesses

linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] zram: set physical queue limits to avoid array out of bounds accesses
@ 2017-03-06 10:23 Johannes Thumshirn
  2017-03-06 10:25 ` Hannes Reinecke
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Johannes Thumshirn @ 2017-03-06 10:23 UTC (permalink / raw)
  To: Jens Axboe, Minchan Kim, Nitin Gupta
  Cc: Christoph Hellwig, Sergey Senozhatsky, Hannes Reinecke, yizhan,
	Linux Block Layer Mailinglist, Linux Kernel Mailinglist,
	Johannes Thumshirn

zram can handle at most SECTORS_PER_PAGE sectors in a bio's bvec. When using
the NVMe over Fabrics loopback target which potentially sends a huge bulk of
pages attached to the bio's bvec this results in a kernel panic because of
array out of bounds accesses in zram_decompress_page().

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/block/zram/zram_drv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index e27d89a..dceb5ed 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1189,6 +1189,8 @@ static int zram_add(void)
 	blk_queue_io_min(zram->disk->queue, PAGE_SIZE);
 	blk_queue_io_opt(zram->disk->queue, PAGE_SIZE);
 	zram->disk->queue->limits.discard_granularity = PAGE_SIZE;
+	zram->disk->queue->limits.max_sectors = SECTORS_PER_PAGE;
+	zram->disk->queue->limits.chunk_sectors = 0;
 	blk_queue_max_discard_sectors(zram->disk->queue, UINT_MAX);
 	/*
 	 * zram_bio_discard() will clear all logical blocks if logical block
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] zram: set physical queue limits to avoid array out of bounds accesses
  2017-03-06 10:23 [PATCH] zram: set physical queue limits to avoid array out of bounds accesses Johannes Thumshirn
@ 2017-03-06 10:25 ` Hannes Reinecke
  2017-03-06 10:45 ` Sergey Senozhatsky
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2017-03-06 10:25 UTC (permalink / raw)
  To: Johannes Thumshirn, Jens Axboe, Minchan Kim, Nitin Gupta
  Cc: Christoph Hellwig, Sergey Senozhatsky, yizhan,
	Linux Block Layer Mailinglist, Linux Kernel Mailinglist

On 03/06/2017 11:23 AM, Johannes Thumshirn wrote:
> zram can handle at most SECTORS_PER_PAGE sectors in a bio's bvec. When using
> the NVMe over Fabrics loopback target which potentially sends a huge bulk of
> pages attached to the bio's bvec this results in a kernel panic because of
> array out of bounds accesses in zram_decompress_page().
> 
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  drivers/block/zram/zram_drv.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nï¿½rnberg
GF: F. Imendï¿½rffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nï¿½rnberg)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] zram: set physical queue limits to avoid array out of bounds accesses
  2017-03-06 10:23 [PATCH] zram: set physical queue limits to avoid array out of bounds accesses Johannes Thumshirn
  2017-03-06 10:25 ` Hannes Reinecke
@ 2017-03-06 10:45 ` Sergey Senozhatsky
  2017-03-06 15:21 ` Jens Axboe
  2017-03-07  5:22 ` Minchan Kim
  3 siblings, 0 replies; 20+ messages in thread
From: Sergey Senozhatsky @ 2017-03-06 10:45 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Andrew Morton, Jens Axboe, Minchan Kim, Nitin Gupta,
	Christoph Hellwig, Sergey Senozhatsky, Hannes Reinecke, yizhan,
	Linux Block Layer Mailinglist, Linux Kernel Mailinglist

On (03/06/17 11:23), Johannes Thumshirn wrote:
> zram can handle at most SECTORS_PER_PAGE sectors in a bio's bvec. When using
> the NVMe over Fabrics loopback target which potentially sends a huge bulk of
> pages attached to the bio's bvec this results in a kernel panic because of
> array out of bounds accesses in zram_decompress_page().
> 
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>

Cc Andrew
Link: lkml.kernel.org/r/20170306102335.9180-1-jthumshirn@suse.de


Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

thanks!

	-ss

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] zram: set physical queue limits to avoid array out of bounds accesses
  2017-03-06 10:23 [PATCH] zram: set physical queue limits to avoid array out of bounds accesses Johannes Thumshirn
  2017-03-06 10:25 ` Hannes Reinecke
  2017-03-06 10:45 ` Sergey Senozhatsky
@ 2017-03-06 15:21 ` Jens Axboe
  2017-03-06 20:18   ` Andrew Morton
  2017-03-07  5:22 ` Minchan Kim
  3 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2017-03-06 15:21 UTC (permalink / raw)
  To: Johannes Thumshirn, Minchan Kim, Nitin Gupta
  Cc: Christoph Hellwig, Sergey Senozhatsky, Hannes Reinecke, yizhan,
	Linux Block Layer Mailinglist, Linux Kernel Mailinglist

On 03/06/2017 03:23 AM, Johannes Thumshirn wrote:
> zram can handle at most SECTORS_PER_PAGE sectors in a bio's bvec. When using
> the NVMe over Fabrics loopback target which potentially sends a huge bulk of
> pages attached to the bio's bvec this results in a kernel panic because of
> array out of bounds accesses in zram_decompress_page().

Applied, thanks.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] zram: set physical queue limits to avoid array out of bounds accesses
  2017-03-06 15:21 ` Jens Axboe
@ 2017-03-06 20:18   ` Andrew Morton
  2017-03-06 20:19     ` Jens Axboe
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2017-03-06 20:18 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Johannes Thumshirn, Minchan Kim, Nitin Gupta, Christoph Hellwig,
	Sergey Senozhatsky, Hannes Reinecke, yizhan,
	Linux Block Layer Mailinglist, Linux Kernel Mailinglist

On Mon, 6 Mar 2017 08:21:11 -0700 Jens Axboe <axboe@fb.com> wrote:

> On 03/06/2017 03:23 AM, Johannes Thumshirn wrote:
> > zram can handle at most SECTORS_PER_PAGE sectors in a bio's bvec. When using
> > the NVMe over Fabrics loopback target which potentially sends a huge bulk of
> > pages attached to the bio's bvec this results in a kernel panic because of
> > array out of bounds accesses in zram_decompress_page().
> 
> Applied, thanks.

With an added cc:stable, hopefully?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] zram: set physical queue limits to avoid array out of bounds accesses
  2017-03-06 20:18   ` Andrew Morton
@ 2017-03-06 20:19     ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2017-03-06 20:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Thumshirn, Minchan Kim, Nitin Gupta, Christoph Hellwig,
	Sergey Senozhatsky, Hannes Reinecke, yizhan,
	Linux Block Layer Mailinglist, Linux Kernel Mailinglist

On 03/06/2017 01:18 PM, Andrew Morton wrote:
> On Mon, 6 Mar 2017 08:21:11 -0700 Jens Axboe <axboe@fb.com> wrote:
> 
>> On 03/06/2017 03:23 AM, Johannes Thumshirn wrote:
>>> zram can handle at most SECTORS_PER_PAGE sectors in a bio's bvec. When using
>>> the NVMe over Fabrics loopback target which potentially sends a huge bulk of
>>> pages attached to the bio's bvec this results in a kernel panic because of
>>> array out of bounds accesses in zram_decompress_page().
>>
>> Applied, thanks.
> 
> With an added cc:stable, hopefully?

I didn't. But I can email it to stable@ when it lands in Linus's tree.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] zram: set physical queue limits to avoid array out of bounds accesses
  2017-03-06 10:23 [PATCH] zram: set physical queue limits to avoid array out of bounds accesses Johannes Thumshirn
                   ` (2 preceding siblings ...)
  2017-03-06 15:21 ` Jens Axboe
@ 2017-03-07  5:22 ` Minchan Kim
  2017-03-07  7:00   ` Hannes Reinecke
  3 siblings, 1 reply; 20+ messages in thread
From: Minchan Kim @ 2017-03-07  5:22 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Jens Axboe, Nitin Gupta, Christoph Hellwig, Sergey Senozhatsky,
	Hannes Reinecke, yizhan, Linux Block Layer Mailinglist,
	Linux Kernel Mailinglist

Hello Johannes,

On Mon, Mar 06, 2017 at 11:23:35AM +0100, Johannes Thumshirn wrote:
> zram can handle at most SECTORS_PER_PAGE sectors in a bio's bvec. When using
> the NVMe over Fabrics loopback target which potentially sends a huge bulk of
> pages attached to the bio's bvec this results in a kernel panic because of
> array out of bounds accesses in zram_decompress_page().

First of all, thanks for the report and fix up!
Unfortunately, I'm not familiar with that interface of block layer.

It seems this is a material for stable so I want to understand it clear.
Could you say more specific things to educate me?

What scenario/When/How it is problem?  It will help for me to understand!

Thanks.

> 
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  drivers/block/zram/zram_drv.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index e27d89a..dceb5ed 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -1189,6 +1189,8 @@ static int zram_add(void)
>  	blk_queue_io_min(zram->disk->queue, PAGE_SIZE);
>  	blk_queue_io_opt(zram->disk->queue, PAGE_SIZE);
>  	zram->disk->queue->limits.discard_granularity = PAGE_SIZE;
> +	zram->disk->queue->limits.max_sectors = SECTORS_PER_PAGE;
> +	zram->disk->queue->limits.chunk_sectors = 0;
>  	blk_queue_max_discard_sectors(zram->disk->queue, UINT_MAX);
>  	/*
>  	 * zram_bio_discard() will clear all logical blocks if logical block
> -- 
> 1.8.5.6
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] zram: set physical queue limits to avoid array out of bounds accesses
  2017-03-07  5:22 ` Minchan Kim
@ 2017-03-07  7:00   ` Hannes Reinecke
  2017-03-07  7:23     ` Minchan Kim
  0 siblings, 1 reply; 20+ messages in thread
From: Hannes Reinecke @ 2017-03-07  7:00 UTC (permalink / raw)
  To: Minchan Kim, Johannes Thumshirn
  Cc: Jens Axboe, Nitin Gupta, Christoph Hellwig, Sergey Senozhatsky,
	yizhan, Linux Block Layer Mailinglist, Linux Kernel Mailinglist

On 03/07/2017 06:22 AM, Minchan Kim wrote:
> Hello Johannes,
> 
> On Mon, Mar 06, 2017 at 11:23:35AM +0100, Johannes Thumshirn wrote:
>> zram can handle at most SECTORS_PER_PAGE sectors in a bio's bvec. When using
>> the NVMe over Fabrics loopback target which potentially sends a huge bulk of
>> pages attached to the bio's bvec this results in a kernel panic because of
>> array out of bounds accesses in zram_decompress_page().
> 
> First of all, thanks for the report and fix up!
> Unfortunately, I'm not familiar with that interface of block layer.
> 
> It seems this is a material for stable so I want to understand it clear.
> Could you say more specific things to educate me?
> 
> What scenario/When/How it is problem?  It will help for me to understand!
> 
The problem is that zram as it currently stands can only handle bios
where each bvec contains a single page (or, to be precise, a chunk of
data with a length of a page).

This is not an automatic guarantee from the block layer (who is free to
send us bios with arbitrary-sized bvecs), so we need to set the queue
limits to ensure that.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nï¿½rnberg
GF: F. Imendï¿½rffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nï¿½rnberg)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] zram: set physical queue limits to avoid array out of bounds accesses
  2017-03-07  7:00   ` Hannes Reinecke
@ 2017-03-07  7:23     ` Minchan Kim
  2017-03-07  7:48       ` Hannes Reinecke
  0 siblings, 1 reply; 20+ messages in thread
From: Minchan Kim @ 2017-03-07  7:23 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Johannes Thumshirn, Jens Axboe, Nitin Gupta, Christoph Hellwig,
	Sergey Senozhatsky, yizhan, Linux Block Layer Mailinglist,
	Linux Kernel Mailinglist

Hi Hannes,

On Tue, Mar 7, 2017 at 4:00 PM, Hannes Reinecke <hare@suse.de> wrote:
> On 03/07/2017 06:22 AM, Minchan Kim wrote:
>> Hello Johannes,
>>
>> On Mon, Mar 06, 2017 at 11:23:35AM +0100, Johannes Thumshirn wrote:
>>> zram can handle at most SECTORS_PER_PAGE sectors in a bio's bvec. When using
>>> the NVMe over Fabrics loopback target which potentially sends a huge bulk of
>>> pages attached to the bio's bvec this results in a kernel panic because of
>>> array out of bounds accesses in zram_decompress_page().
>>
>> First of all, thanks for the report and fix up!
>> Unfortunately, I'm not familiar with that interface of block layer.
>>
>> It seems this is a material for stable so I want to understand it clear.
>> Could you say more specific things to educate me?
>>
>> What scenario/When/How it is problem?  It will help for me to understand!
>>

Thanks for the quick response!

> The problem is that zram as it currently stands can only handle bios
> where each bvec contains a single page (or, to be precise, a chunk of
> data with a length of a page).

Right.

>
> This is not an automatic guarantee from the block layer (who is free to
> send us bios with arbitrary-sized bvecs), so we need to set the queue
> limits to ensure that.

What does it mean "bios with arbitrary-sized bvecs"?
What kinds of scenario is it used/useful?

And how can we solve it by setting queue limit?

Sorry for the many questions due to limited knowledge.

Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] zram: set physical queue limits to avoid array out of bounds accesses
  2017-03-07  7:23     ` Minchan Kim
@ 2017-03-07  7:48       ` Hannes Reinecke
  2017-03-07  8:55         ` Minchan Kim
  0 siblings, 1 reply; 20+ messages in thread
From: Hannes Reinecke @ 2017-03-07  7:48 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Johannes Thumshirn, Jens Axboe, Nitin Gupta, Christoph Hellwig,
	Sergey Senozhatsky, yizhan, Linux Block Layer Mailinglist,
	Linux Kernel Mailinglist

On 03/07/2017 08:23 AM, Minchan Kim wrote:
> Hi Hannes,
> 
> On Tue, Mar 7, 2017 at 4:00 PM, Hannes Reinecke <hare@suse.de> wrote:
>> On 03/07/2017 06:22 AM, Minchan Kim wrote:
>>> Hello Johannes,
>>>
>>> On Mon, Mar 06, 2017 at 11:23:35AM +0100, Johannes Thumshirn wrote:
>>>> zram can handle at most SECTORS_PER_PAGE sectors in a bio's bvec. When using
>>>> the NVMe over Fabrics loopback target which potentially sends a huge bulk of
>>>> pages attached to the bio's bvec this results in a kernel panic because of
>>>> array out of bounds accesses in zram_decompress_page().
>>>
>>> First of all, thanks for the report and fix up!
>>> Unfortunately, I'm not familiar with that interface of block layer.
>>>
>>> It seems this is a material for stable so I want to understand it clear.
>>> Could you say more specific things to educate me?
>>>
>>> What scenario/When/How it is problem?  It will help for me to understand!
>>>
> 
> Thanks for the quick response!
> 
>> The problem is that zram as it currently stands can only handle bios
>> where each bvec contains a single page (or, to be precise, a chunk of
>> data with a length of a page).
> 
> Right.
> 
>>
>> This is not an automatic guarantee from the block layer (who is free to
>> send us bios with arbitrary-sized bvecs), so we need to set the queue
>> limits to ensure that.
> 
> What does it mean "bios with arbitrary-sized bvecs"?
> What kinds of scenario is it used/useful?
> 
Each bio contains a list of bvecs, each of which points to a specific
memory area:

struct bio_vec {
	struct page	*bv_page;
	unsigned int	bv_len;
	unsigned int	bv_offset;
};

The trick now is that while 'bv_page' does point to a page, the memory
area pointed to might in fact be contiguous (if several pages are
adjacent). Hence we might be getting a bio_vec where bv_len is _larger_
than a page.

Hence the check for 'is_partial_io' in zram_drv.c (which just does a
test 'if bv_len != PAGE_SIZE) is in fact wrong, as it would trigger for
partial I/O (ie if the overall length of the bio_vec is _smaller_ than a
page), but also for multipage bvecs (where the length of the bio_vec is
_larger_ than a page).

So rather than fixing the bio scanning loop in zram it's easier to set
the queue limits correctly so that 'is_partial_io' does the correct
thing and the overall logic in zram doesn't need to be altered.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] zram: set physical queue limits to avoid array out of bounds accesses
  2017-03-07  7:48       ` Hannes Reinecke
@ 2017-03-07  8:55         ` Minchan Kim
  2017-03-07  9:51           ` Johannes Thumshirn
  0 siblings, 1 reply; 20+ messages in thread
From: Minchan Kim @ 2017-03-07  8:55 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Johannes Thumshirn, Jens Axboe, Nitin Gupta, Christoph Hellwig,
	Sergey Senozhatsky, yizhan, Linux Block Layer Mailinglist,
	Linux Kernel Mailinglist

On Tue, Mar 07, 2017 at 08:48:06AM +0100, Hannes Reinecke wrote:
> On 03/07/2017 08:23 AM, Minchan Kim wrote:
> > Hi Hannes,
> > 
> > On Tue, Mar 7, 2017 at 4:00 PM, Hannes Reinecke <hare@suse.de> wrote:
> >> On 03/07/2017 06:22 AM, Minchan Kim wrote:
> >>> Hello Johannes,
> >>>
> >>> On Mon, Mar 06, 2017 at 11:23:35AM +0100, Johannes Thumshirn wrote:
> >>>> zram can handle at most SECTORS_PER_PAGE sectors in a bio's bvec. When using
> >>>> the NVMe over Fabrics loopback target which potentially sends a huge bulk of
> >>>> pages attached to the bio's bvec this results in a kernel panic because of
> >>>> array out of bounds accesses in zram_decompress_page().
> >>>
> >>> First of all, thanks for the report and fix up!
> >>> Unfortunately, I'm not familiar with that interface of block layer.
> >>>
> >>> It seems this is a material for stable so I want to understand it clear.
> >>> Could you say more specific things to educate me?
> >>>
> >>> What scenario/When/How it is problem?  It will help for me to understand!
> >>>
> > 
> > Thanks for the quick response!
> > 
> >> The problem is that zram as it currently stands can only handle bios
> >> where each bvec contains a single page (or, to be precise, a chunk of
> >> data with a length of a page).
> > 
> > Right.
> > 
> >>
> >> This is not an automatic guarantee from the block layer (who is free to
> >> send us bios with arbitrary-sized bvecs), so we need to set the queue
> >> limits to ensure that.
> > 
> > What does it mean "bios with arbitrary-sized bvecs"?
> > What kinds of scenario is it used/useful?
> > 
> Each bio contains a list of bvecs, each of which points to a specific
> memory area:
> 
> struct bio_vec {
> 	struct page	*bv_page;
> 	unsigned int	bv_len;
> 	unsigned int	bv_offset;
> };
> 
> The trick now is that while 'bv_page' does point to a page, the memory
> area pointed to might in fact be contiguous (if several pages are
> adjacent). Hence we might be getting a bio_vec where bv_len is _larger_
> than a page.

Thanks for detail, Hannes!

If I understand it correctly, it seems to be related to bid_add_page
with high-order page. Right?

If so, I really wonder why I don't see such problem because several
places have used it and I expected some of them might do IO with
contiguous pages intentionally or by chance. Hmm,

IIUC, it's not a nvme specific problme but general problem which
can trigger normal FSes if they uses contiguos pages?

> 
> Hence the check for 'is_partial_io' in zram_drv.c (which just does a
> test 'if bv_len != PAGE_SIZE) is in fact wrong, as it would trigger for
> partial I/O (ie if the overall length of the bio_vec is _smaller_ than a
> page), but also for multipage bvecs (where the length of the bio_vec is
> _larger_ than a page).

Right. I need to look into that. Thanks for the pointing out!

> 
> So rather than fixing the bio scanning loop in zram it's easier to set
> the queue limits correctly so that 'is_partial_io' does the correct
> thing and the overall logic in zram doesn't need to be altered.


Isn't that approach require new bio allocation through blk_queue_split?
Maybe, it wouldn't make severe regression in zram-FS workload but need
to test.

Is there any ways to trigger the problem without real nvme device?
It would really help to test/measure zram.

Anyway, to me, it's really subtle at this moment so I doubt it should
be stable material. :(

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] zram: set physical queue limits to avoid array out of bounds accesses
  2017-03-07  8:55         ` Minchan Kim
@ 2017-03-07  9:51           ` Johannes Thumshirn
  2017-03-08  5:11             ` Minchan Kim
  0 siblings, 1 reply; 20+ messages in thread
From: Johannes Thumshirn @ 2017-03-07  9:51 UTC (permalink / raw)
  To: Minchan Kim, Hannes Reinecke
  Cc: Jens Axboe, Nitin Gupta, Christoph Hellwig, Sergey Senozhatsky,
	yizhan, Linux Block Layer Mailinglist, Linux Kernel Mailinglist

On 03/07/2017 09:55 AM, Minchan Kim wrote:
> On Tue, Mar 07, 2017 at 08:48:06AM +0100, Hannes Reinecke wrote:
>> On 03/07/2017 08:23 AM, Minchan Kim wrote:
>>> Hi Hannes,
>>>
>>> On Tue, Mar 7, 2017 at 4:00 PM, Hannes Reinecke <hare@suse.de> wrote:
>>>> On 03/07/2017 06:22 AM, Minchan Kim wrote:
>>>>> Hello Johannes,
>>>>>
>>>>> On Mon, Mar 06, 2017 at 11:23:35AM +0100, Johannes Thumshirn wrote:
>>>>>> zram can handle at most SECTORS_PER_PAGE sectors in a bio's bvec. When using
>>>>>> the NVMe over Fabrics loopback target which potentially sends a huge bulk of
>>>>>> pages attached to the bio's bvec this results in a kernel panic because of
>>>>>> array out of bounds accesses in zram_decompress_page().
>>>>>
>>>>> First of all, thanks for the report and fix up!
>>>>> Unfortunately, I'm not familiar with that interface of block layer.
>>>>>
>>>>> It seems this is a material for stable so I want to understand it clear.
>>>>> Could you say more specific things to educate me?
>>>>>
>>>>> What scenario/When/How it is problem?  It will help for me to understand!
>>>>>
>>>
>>> Thanks for the quick response!
>>>
>>>> The problem is that zram as it currently stands can only handle bios
>>>> where each bvec contains a single page (or, to be precise, a chunk of
>>>> data with a length of a page).
>>>
>>> Right.
>>>
>>>>
>>>> This is not an automatic guarantee from the block layer (who is free to
>>>> send us bios with arbitrary-sized bvecs), so we need to set the queue
>>>> limits to ensure that.
>>>
>>> What does it mean "bios with arbitrary-sized bvecs"?
>>> What kinds of scenario is it used/useful?
>>>
>> Each bio contains a list of bvecs, each of which points to a specific
>> memory area:
>>
>> struct bio_vec {
>> 	struct page	*bv_page;
>> 	unsigned int	bv_len;
>> 	unsigned int	bv_offset;
>> };
>>
>> The trick now is that while 'bv_page' does point to a page, the memory
>> area pointed to might in fact be contiguous (if several pages are
>> adjacent). Hence we might be getting a bio_vec where bv_len is _larger_
>> than a page.
> 
> Thanks for detail, Hannes!
> 
> If I understand it correctly, it seems to be related to bid_add_page
> with high-order page. Right?
> 
> If so, I really wonder why I don't see such problem because several
> places have used it and I expected some of them might do IO with
> contiguous pages intentionally or by chance. Hmm,
> 
> IIUC, it's not a nvme specific problme but general problem which
> can trigger normal FSes if they uses contiguos pages?
> 

I'm not a FS expert, but a quick grep shows that non of the file-systems
does the

for_each_sg()
	while(bio_add_page()))

trick NVMe does.

>>
>> Hence the check for 'is_partial_io' in zram_drv.c (which just does a
>> test 'if bv_len != PAGE_SIZE) is in fact wrong, as it would trigger for
>> partial I/O (ie if the overall length of the bio_vec is _smaller_ than a
>> page), but also for multipage bvecs (where the length of the bio_vec is
>> _larger_ than a page).
> 
> Right. I need to look into that. Thanks for the pointing out!
> 
>>
>> So rather than fixing the bio scanning loop in zram it's easier to set
>> the queue limits correctly so that 'is_partial_io' does the correct
>> thing and the overall logic in zram doesn't need to be altered.
> 
> 
> Isn't that approach require new bio allocation through blk_queue_split?
> Maybe, it wouldn't make severe regression in zram-FS workload but need
> to test.

Yes, but blk_queue_split() needs information how big the bvecs can be,
hence the patch.

For details have a look into blk_bio_segment_split() in block/blk-merge.c

It get's the max_sectors from blk_max_size_offset() which is
q->limits.max_sectors when q->limits.chunk_sectors isn't set and
then loops over the bio's bvecs to check when to split the bio and then
calls bio_split() when appropriate.

> 
> Is there any ways to trigger the problem without real nvme device?
> It would really help to test/measure zram.

It isn't a /real/ device but the fabrics loopback target. If you want a
fast reproducible test-case, have a look at:

https://github.com/ddiss/rapido/
the cut_nvme_local.sh script set's up the correct VM for this test. Then
a simple mkfs.xfs /dev/nvme0n1 will oops.

> 
> Anyway, to me, it's really subtle at this moment so I doubt it should
> be stable material. :(

I'm not quite sure, it's at least 4.11 material. See above.

Thanks,
	Johannes


-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nï¿½rnberg
GF: Felix Imendï¿½rffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nï¿½rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] zram: set physical queue limits to avoid array out of bounds accesses
  2017-03-07  9:51           ` Johannes Thumshirn
@ 2017-03-08  5:11             ` Minchan Kim
  2017-03-08  7:58               ` Johannes Thumshirn
  0 siblings, 1 reply; 20+ messages in thread
From: Minchan Kim @ 2017-03-08  5:11 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Hannes Reinecke, Jens Axboe, Nitin Gupta, Christoph Hellwig,
	Sergey Senozhatsky, yizhan, Linux Block Layer Mailinglist,
	Linux Kernel Mailinglist

Hi Johannes,

On Tue, Mar 07, 2017 at 10:51:45AM +0100, Johannes Thumshirn wrote:
> On 03/07/2017 09:55 AM, Minchan Kim wrote:
> > On Tue, Mar 07, 2017 at 08:48:06AM +0100, Hannes Reinecke wrote:
> >> On 03/07/2017 08:23 AM, Minchan Kim wrote:
> >>> Hi Hannes,
> >>>
> >>> On Tue, Mar 7, 2017 at 4:00 PM, Hannes Reinecke <hare@suse.de> wrote:
> >>>> On 03/07/2017 06:22 AM, Minchan Kim wrote:
> >>>>> Hello Johannes,
> >>>>>
> >>>>> On Mon, Mar 06, 2017 at 11:23:35AM +0100, Johannes Thumshirn wrote:
> >>>>>> zram can handle at most SECTORS_PER_PAGE sectors in a bio's bvec. When using
> >>>>>> the NVMe over Fabrics loopback target which potentially sends a huge bulk of
> >>>>>> pages attached to the bio's bvec this results in a kernel panic because of
> >>>>>> array out of bounds accesses in zram_decompress_page().
> >>>>>
> >>>>> First of all, thanks for the report and fix up!
> >>>>> Unfortunately, I'm not familiar with that interface of block layer.
> >>>>>
> >>>>> It seems this is a material for stable so I want to understand it clear.
> >>>>> Could you say more specific things to educate me?
> >>>>>
> >>>>> What scenario/When/How it is problem?  It will help for me to understand!
> >>>>>
> >>>
> >>> Thanks for the quick response!
> >>>
> >>>> The problem is that zram as it currently stands can only handle bios
> >>>> where each bvec contains a single page (or, to be precise, a chunk of
> >>>> data with a length of a page).
> >>>
> >>> Right.
> >>>
> >>>>
> >>>> This is not an automatic guarantee from the block layer (who is free to
> >>>> send us bios with arbitrary-sized bvecs), so we need to set the queue
> >>>> limits to ensure that.
> >>>
> >>> What does it mean "bios with arbitrary-sized bvecs"?
> >>> What kinds of scenario is it used/useful?
> >>>
> >> Each bio contains a list of bvecs, each of which points to a specific
> >> memory area:
> >>
> >> struct bio_vec {
> >> 	struct page	*bv_page;
> >> 	unsigned int	bv_len;
> >> 	unsigned int	bv_offset;
> >> };
> >>
> >> The trick now is that while 'bv_page' does point to a page, the memory
> >> area pointed to might in fact be contiguous (if several pages are
> >> adjacent). Hence we might be getting a bio_vec where bv_len is _larger_
> >> than a page.
> > 
> > Thanks for detail, Hannes!
> > 
> > If I understand it correctly, it seems to be related to bid_add_page
> > with high-order page. Right?
> > 
> > If so, I really wonder why I don't see such problem because several
> > places have used it and I expected some of them might do IO with
> > contiguous pages intentionally or by chance. Hmm,
> > 
> > IIUC, it's not a nvme specific problme but general problem which
> > can trigger normal FSes if they uses contiguos pages?
> > 
> 
> I'm not a FS expert, but a quick grep shows that non of the file-systems
> does the
> 
> for_each_sg()
> 	while(bio_add_page()))
> 
> trick NVMe does.

Aah, I see.

> 
> >>
> >> Hence the check for 'is_partial_io' in zram_drv.c (which just does a
> >> test 'if bv_len != PAGE_SIZE) is in fact wrong, as it would trigger for
> >> partial I/O (ie if the overall length of the bio_vec is _smaller_ than a
> >> page), but also for multipage bvecs (where the length of the bio_vec is
> >> _larger_ than a page).
> > 
> > Right. I need to look into that. Thanks for the pointing out!
> > 
> >>
> >> So rather than fixing the bio scanning loop in zram it's easier to set
> >> the queue limits correctly so that 'is_partial_io' does the correct
> >> thing and the overall logic in zram doesn't need to be altered.
> > 
> > 
> > Isn't that approach require new bio allocation through blk_queue_split?
> > Maybe, it wouldn't make severe regression in zram-FS workload but need
> > to test.
> 
> Yes, but blk_queue_split() needs information how big the bvecs can be,
> hence the patch.
> 
> For details have a look into blk_bio_segment_split() in block/blk-merge.c
> 
> It get's the max_sectors from blk_max_size_offset() which is
> q->limits.max_sectors when q->limits.chunk_sectors isn't set and
> then loops over the bio's bvecs to check when to split the bio and then
> calls bio_split() when appropriate.

Yeb so it causes split bio which means new bio allocations which was
not needed before.

> 
> > 
> > Is there any ways to trigger the problem without real nvme device?
> > It would really help to test/measure zram.
> 
> It isn't a /real/ device but the fabrics loopback target. If you want a
> fast reproducible test-case, have a look at:
> 
> https://github.com/ddiss/rapido/
> the cut_nvme_local.sh script set's up the correct VM for this test. Then
> a simple mkfs.xfs /dev/nvme0n1 will oops.

Thanks! I will look into that.

And could you test this patch? It avoids split bio so no need new bio
allocations and makes zram code simple.

>From f778d7564d5cd772f25bb181329362c29548a257 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Wed, 8 Mar 2017 13:35:29 +0900
Subject: [PATCH] fix

Not-yet-Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 40 ++++++++++++++--------------------------
 1 file changed, 14 insertions(+), 26 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index bcb03bacdded..516c3bd97a28 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -147,8 +147,7 @@ static inline bool valid_io_request(struct zram *zram,
 
 static void update_position(u32 *index, int *offset, struct bio_vec *bvec)
 {
-	if (*offset + bvec->bv_len >= PAGE_SIZE)
-		(*index)++;
+	*index  += (*offset + bvec->bv_len) / PAGE_SIZE;
 	*offset = (*offset + bvec->bv_len) % PAGE_SIZE;
 }
 
@@ -886,7 +885,7 @@ static void __zram_make_request(struct zram *zram, struct bio *bio)
 {
 	int offset;
 	u32 index;
-	struct bio_vec bvec;
+	struct bio_vec bvec, bv;
 	struct bvec_iter iter;
 
 	index = bio->bi_iter.bi_sector >> SECTORS_PER_PAGE_SHIFT;
@@ -900,34 +899,23 @@ static void __zram_make_request(struct zram *zram, struct bio *bio)
 	}
 
 	bio_for_each_segment(bvec, bio, iter) {
-		int max_transfer_size = PAGE_SIZE - offset;
+		int remained_size = bvec.bv_len;
+		int transfer_size;
 
-		if (bvec.bv_len > max_transfer_size) {
-			/*
-			 * zram_bvec_rw() can only make operation on a single
-			 * zram page. Split the bio vector.
-			 */
-			struct bio_vec bv;
-
-			bv.bv_page = bvec.bv_page;
-			bv.bv_len = max_transfer_size;
-			bv.bv_offset = bvec.bv_offset;
+		bv.bv_page = bvec.bv_page;
+		bv.bv_offset = bvec.bv_offset;
+		do {
+			transfer_size = min_t(int, PAGE_SIZE, remained_size);
+			bv.bv_len = transfer_size;
 
 			if (zram_bvec_rw(zram, &bv, index, offset,
-					 op_is_write(bio_op(bio))) < 0)
-				goto out;
-
-			bv.bv_len = bvec.bv_len - max_transfer_size;
-			bv.bv_offset += max_transfer_size;
-			if (zram_bvec_rw(zram, &bv, index + 1, 0,
-					 op_is_write(bio_op(bio))) < 0)
-				goto out;
-		} else
-			if (zram_bvec_rw(zram, &bvec, index, offset,
-					 op_is_write(bio_op(bio))) < 0)
+					op_is_write(bio_op(bio))) < 0)
 				goto out;
 
-		update_position(&index, &offset, &bvec);
+			bv.bv_offset += transfer_size;
+			update_position(&index, &offset, &bv);
+			remained_size -= transfer_size;
+		} while (remained_size);
 	}
 
 	bio_endio(bio);
-- 
2.7.4


> 
> > 
> > Anyway, to me, it's really subtle at this moment so I doubt it should
> > be stable material. :(
> 
> I'm not quite sure, it's at least 4.11 material. See above.

Absolutely agree that it should be 4.11 material but I don't want to
backport it to the stable because it would make regression due to
split bio works.

Anyway, if my patch I attached works for you, I will resend this
with modified descriptions includes more detail.

Thanks for the help!!

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] zram: set physical queue limits to avoid array out of bounds accesses
  2017-03-08  5:11             ` Minchan Kim
@ 2017-03-08  7:58               ` Johannes Thumshirn
  2017-03-09  5:28                 ` Minchan Kim
  0 siblings, 1 reply; 20+ messages in thread
From: Johannes Thumshirn @ 2017-03-08  7:58 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Hannes Reinecke, Jens Axboe, Nitin Gupta, Christoph Hellwig,
	Sergey Senozhatsky, yizhan, Linux Block Layer Mailinglist,
	Linux Kernel Mailinglist

On 03/08/2017 06:11 AM, Minchan Kim wrote:
> And could you test this patch? It avoids split bio so no need new bio
> allocations and makes zram code simple.
> 
> From f778d7564d5cd772f25bb181329362c29548a257 Mon Sep 17 00:00:00 2001
> From: Minchan Kim <minchan@kernel.org>
> Date: Wed, 8 Mar 2017 13:35:29 +0900
> Subject: [PATCH] fix
> 
> Not-yet-Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---

[...]

Yup, this works here.

I did a mkfs.xfs /dev/nvme0n1
dd if=/dev/urandom of=/test.bin bs=1M count=128
sha256sum test.bin
mount /dev/nvme0n1 /dir
mv test.bin /dir/
sha256sum /dir/test.bin

No panics and sha256sum of the 128MB test file still matches

Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

Now that you removed the one page limit in zram_bvec_rw() you can also
add this hunk to remove the queue splitting:

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 85f4df8..27b168f6 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -868,8 +868,6 @@ static blk_qc_t zram_make_request(struct
request_queue *queue, struct bio *bio)
 {
        struct zram *zram = queue->queuedata;

-       blk_queue_split(queue, &bio, queue->bio_split);
-
        if (!valid_io_request(zram, bio->bi_iter.bi_sector,
                                        bio->bi_iter.bi_size)) {
                atomic64_inc(&zram->stats.invalid_io);

Byte,
	Johannes

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nï¿½rnberg
GF: Felix Imendï¿½rffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nï¿½rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] zram: set physical queue limits to avoid array out of bounds accesses
  2017-03-08  7:58               ` Johannes Thumshirn
@ 2017-03-09  5:28                 ` Minchan Kim
  2017-03-30 15:08                   ` Minchan Kim
  0 siblings, 1 reply; 20+ messages in thread
From: Minchan Kim @ 2017-03-09  5:28 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Hannes Reinecke, Jens Axboe, Nitin Gupta, Christoph Hellwig,
	Sergey Senozhatsky, yizhan, Linux Block Layer Mailinglist,
	Linux Kernel Mailinglist, Andrew Morton

On Wed, Mar 08, 2017 at 08:58:02AM +0100, Johannes Thumshirn wrote:
> On 03/08/2017 06:11 AM, Minchan Kim wrote:
> > And could you test this patch? It avoids split bio so no need new bio
> > allocations and makes zram code simple.
> > 
> > From f778d7564d5cd772f25bb181329362c29548a257 Mon Sep 17 00:00:00 2001
> > From: Minchan Kim <minchan@kernel.org>
> > Date: Wed, 8 Mar 2017 13:35:29 +0900
> > Subject: [PATCH] fix
> > 
> > Not-yet-Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> 
> [...]
> 
> Yup, this works here.
> 
> I did a mkfs.xfs /dev/nvme0n1
> dd if=/dev/urandom of=/test.bin bs=1M count=128
> sha256sum test.bin
> mount /dev/nvme0n1 /dir
> mv test.bin /dir/
> sha256sum /dir/test.bin
> 
> No panics and sha256sum of the 128MB test file still matches
> 
> Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

Thanks a lot, Johannes and Hannes!!

> 
> Now that you removed the one page limit in zram_bvec_rw() you can also
> add this hunk to remove the queue splitting:

Right. I added what you suggested with detailed description.

> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 85f4df8..27b168f6 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -868,8 +868,6 @@ static blk_qc_t zram_make_request(struct
> request_queue *queue, struct bio *bio)
>  {
>         struct zram *zram = queue->queuedata;
> 
> -       blk_queue_split(queue, &bio, queue->bio_split);
> -
>         if (!valid_io_request(zram, bio->bi_iter.bi_sector,
>                                         bio->bi_iter.bi_size)) {
>                 atomic64_inc(&zram->stats.invalid_io);
> 
> Byte,
> 	Johannes
> 

Jens, Could you replace the one merged with this? And I don't want
to add stable mark in this patch because I feel it need enough
testing in 64K page system I don't have. ;(

>From bb73e75ab0e21016f60858fd61e7dc6a6813e359 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Thu, 9 Mar 2017 14:00:40 +0900
Subject: [PATCH] zram: handle multiple pages attached bio's bvec

Johannes Thumshirn reported system goes the panic when using NVMe over
Fabrics loopback target with zram.

The reason is zram expects each bvec in bio contains a single page
but nvme can attach a huge bulk of pages attached to the bio's bvec
so that zram's index arithmetic could be wrong so that out-of-bound
access makes panic.

This patch solves the problem via removing the limit(a bvec should
contains a only single page).

Cc: Hannes Reinecke <hare@suse.com>
Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
I don't add stable mark intentionally because I think it's rather risky
without enough testing on 64K page system(ie, partial IO part).

Thanks for the help, Johannes and Hannes!!

 drivers/block/zram/zram_drv.c | 37 ++++++++++---------------------------
 1 file changed, 10 insertions(+), 27 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 01944419b1f3..fefdf260503a 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -137,8 +137,7 @@ static inline bool valid_io_request(struct zram *zram,
 
 static void update_position(u32 *index, int *offset, struct bio_vec *bvec)
 {
-	if (*offset + bvec->bv_len >= PAGE_SIZE)
-		(*index)++;
+	*index  += (*offset + bvec->bv_len) / PAGE_SIZE;
 	*offset = (*offset + bvec->bv_len) % PAGE_SIZE;
 }
 
@@ -838,34 +837,20 @@ static void __zram_make_request(struct zram *zram, struct bio *bio)
 	}
 
 	bio_for_each_segment(bvec, bio, iter) {
-		int max_transfer_size = PAGE_SIZE - offset;
-
-		if (bvec.bv_len > max_transfer_size) {
-			/*
-			 * zram_bvec_rw() can only make operation on a single
-			 * zram page. Split the bio vector.
-			 */
-			struct bio_vec bv;
-
-			bv.bv_page = bvec.bv_page;
-			bv.bv_len = max_transfer_size;
-			bv.bv_offset = bvec.bv_offset;
+		struct bio_vec bv = bvec;
+		unsigned int remained = bvec.bv_len;
 
+		do {
+			bv.bv_len = min_t(unsigned int, PAGE_SIZE, remained);
 			if (zram_bvec_rw(zram, &bv, index, offset,
-					 op_is_write(bio_op(bio))) < 0)
+					op_is_write(bio_op(bio))) < 0)
 				goto out;
 
-			bv.bv_len = bvec.bv_len - max_transfer_size;
-			bv.bv_offset += max_transfer_size;
-			if (zram_bvec_rw(zram, &bv, index + 1, 0,
-					 op_is_write(bio_op(bio))) < 0)
-				goto out;
-		} else
-			if (zram_bvec_rw(zram, &bvec, index, offset,
-					 op_is_write(bio_op(bio))) < 0)
-				goto out;
+			bv.bv_offset += bv.bv_len;
+			remained -= bv.bv_len;
 
-		update_position(&index, &offset, &bvec);
+			update_position(&index, &offset, &bv);
+		} while (remained);
 	}
 
 	bio_endio(bio);
@@ -882,8 +867,6 @@ static blk_qc_t zram_make_request(struct request_queue *queue, struct bio *bio)
 {
 	struct zram *zram = queue->queuedata;
 
-	blk_queue_split(queue, &bio, queue->bio_split);
-
 	if (!valid_io_request(zram, bio->bi_iter.bi_sector,
 					bio->bi_iter.bi_size)) {
 		atomic64_inc(&zram->stats.invalid_io);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] zram: set physical queue limits to avoid array out of bounds accesses
  2017-03-09  5:28                 ` Minchan Kim
@ 2017-03-30 15:08                   ` Minchan Kim
  2017-03-30 15:35                     ` Jens Axboe
  0 siblings, 1 reply; 20+ messages in thread
From: Minchan Kim @ 2017-03-30 15:08 UTC (permalink / raw)
  To: Johannes Thumshirn, Jens Axboe
  Cc: Hannes Reinecke, Nitin Gupta, Christoph Hellwig,
	Sergey Senozhatsky, yizhan, Linux Block Layer Mailinglist,
	Linux Kernel Mailinglist, Andrew Morton

Hi Jens,

It seems you miss this.
Could you handle this?

Thanks.

On Thu, Mar 9, 2017 at 2:28 PM, Minchan Kim <minchan@kernel.org> wrote:

< snip>

> Jens, Could you replace the one merged with this? And I don't want
> to add stable mark in this patch because I feel it need enough
> testing in 64K page system I don't have. ;(
>
> From bb73e75ab0e21016f60858fd61e7dc6a6813e359 Mon Sep 17 00:00:00 2001
> From: Minchan Kim <minchan@kernel.org>
> Date: Thu, 9 Mar 2017 14:00:40 +0900
> Subject: [PATCH] zram: handle multiple pages attached bio's bvec
>
> Johannes Thumshirn reported system goes the panic when using NVMe over
> Fabrics loopback target with zram.
>
> The reason is zram expects each bvec in bio contains a single page
> but nvme can attach a huge bulk of pages attached to the bio's bvec
> so that zram's index arithmetic could be wrong so that out-of-bound
> access makes panic.
>
> This patch solves the problem via removing the limit(a bvec should
> contains a only single page).
>
> Cc: Hannes Reinecke <hare@suse.com>
> Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
> Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
> I don't add stable mark intentionally because I think it's rather risky
> without enough testing on 64K page system(ie, partial IO part).
>
> Thanks for the help, Johannes and Hannes!!
>
>  drivers/block/zram/zram_drv.c | 37 ++++++++++---------------------------
>  1 file changed, 10 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 01944419b1f3..fefdf260503a 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -137,8 +137,7 @@ static inline bool valid_io_request(struct zram *zram,
>
>  static void update_position(u32 *index, int *offset, struct bio_vec *bvec)
>  {
> -       if (*offset + bvec->bv_len >= PAGE_SIZE)
> -               (*index)++;
> +       *index  += (*offset + bvec->bv_len) / PAGE_SIZE;
>         *offset = (*offset + bvec->bv_len) % PAGE_SIZE;
>  }
>
> @@ -838,34 +837,20 @@ static void __zram_make_request(struct zram *zram, struct bio *bio)
>         }
>
>         bio_for_each_segment(bvec, bio, iter) {
> -               int max_transfer_size = PAGE_SIZE - offset;
> -
> -               if (bvec.bv_len > max_transfer_size) {
> -                       /*
> -                        * zram_bvec_rw() can only make operation on a single
> -                        * zram page. Split the bio vector.
> -                        */
> -                       struct bio_vec bv;
> -
> -                       bv.bv_page = bvec.bv_page;
> -                       bv.bv_len = max_transfer_size;
> -                       bv.bv_offset = bvec.bv_offset;
> +               struct bio_vec bv = bvec;
> +               unsigned int remained = bvec.bv_len;
>
> +               do {
> +                       bv.bv_len = min_t(unsigned int, PAGE_SIZE, remained);
>                         if (zram_bvec_rw(zram, &bv, index, offset,
> -                                        op_is_write(bio_op(bio))) < 0)
> +                                       op_is_write(bio_op(bio))) < 0)
>                                 goto out;
>
> -                       bv.bv_len = bvec.bv_len - max_transfer_size;
> -                       bv.bv_offset += max_transfer_size;
> -                       if (zram_bvec_rw(zram, &bv, index + 1, 0,
> -                                        op_is_write(bio_op(bio))) < 0)
> -                               goto out;
> -               } else
> -                       if (zram_bvec_rw(zram, &bvec, index, offset,
> -                                        op_is_write(bio_op(bio))) < 0)
> -                               goto out;
> +                       bv.bv_offset += bv.bv_len;
> +                       remained -= bv.bv_len;
>
> -               update_position(&index, &offset, &bvec);
> +                       update_position(&index, &offset, &bv);
> +               } while (remained);
>         }
>
>         bio_endio(bio);
> @@ -882,8 +867,6 @@ static blk_qc_t zram_make_request(struct request_queue *queue, struct bio *bio)
>  {
>         struct zram *zram = queue->queuedata;
>
> -       blk_queue_split(queue, &bio, queue->bio_split);
> -
>         if (!valid_io_request(zram, bio->bi_iter.bi_sector,
>                                         bio->bi_iter.bi_size)) {
>                 atomic64_inc(&zram->stats.invalid_io);
> --
> 2.7.4
>
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] zram: set physical queue limits to avoid array out of bounds accesses
  2017-03-30 15:08                   ` Minchan Kim
@ 2017-03-30 15:35                     ` Jens Axboe
  2017-03-30 23:45                       ` Minchan Kim
  0 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2017-03-30 15:35 UTC (permalink / raw)
  To: Minchan Kim, Johannes Thumshirn
  Cc: Hannes Reinecke, Nitin Gupta, Christoph Hellwig,
	Sergey Senozhatsky, yizhan, Linux Block Layer Mailinglist,
	Linux Kernel Mailinglist, Andrew Morton

On 03/30/2017 09:08 AM, Minchan Kim wrote:
> Hi Jens,
> 
> It seems you miss this.
> Could you handle this?

I can, but I'm a little confused. The comment talks about replacing
the one I merged with this one, I can't do that. I'm assuming you
are talking about this commit:

commit 0bc315381fe9ed9fb91db8b0e82171b645ac008f
Author: Johannes Thumshirn <jthumshirn@suse.de>
Date:   Mon Mar 6 11:23:35 2017 +0100

    zram: set physical queue limits to avoid array out of bounds accesses

which is in mainline. The patch still applies, though.

Do we really REALLY need this for 4.11, or can we queue for 4.12 and
mark it stable?

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] zram: set physical queue limits to avoid array out of bounds accesses
  2017-03-30 15:35                     ` Jens Axboe
@ 2017-03-30 23:45                       ` Minchan Kim
  2017-03-31  1:38                         ` Jens Axboe
  0 siblings, 1 reply; 20+ messages in thread
From: Minchan Kim @ 2017-03-30 23:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Johannes Thumshirn, Hannes Reinecke, Nitin Gupta,
	Christoph Hellwig, Sergey Senozhatsky, yizhan,
	Linux Block Layer Mailinglist, Linux Kernel Mailinglist,
	Andrew Morton

On Thu, Mar 30, 2017 at 09:35:56AM -0600, Jens Axboe wrote:
> On 03/30/2017 09:08 AM, Minchan Kim wrote:
> > Hi Jens,
> > 
> > It seems you miss this.
> > Could you handle this?
> 
> I can, but I'm a little confused. The comment talks about replacing
> the one I merged with this one, I can't do that. I'm assuming you
> are talking about this commit:

Right.

> 
> commit 0bc315381fe9ed9fb91db8b0e82171b645ac008f
> Author: Johannes Thumshirn <jthumshirn@suse.de>
> Date:   Mon Mar 6 11:23:35 2017 +0100
> 
>     zram: set physical queue limits to avoid array out of bounds accesses
> 
> which is in mainline. The patch still applies, though.

You mean it's already in mainline so you cannot replace but can revert.
Right?
If so, please revert it and merge this one.

> 
> Do we really REALLY need this for 4.11, or can we queue for 4.12 and
> mark it stable?

Not urgent because one in mainline fixes the problem so I'm okay
with 4.12 but I don't want mark it as -stable.

Thanks!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] zram: set physical queue limits to avoid array out of bounds accesses
  2017-03-30 23:45                       ` Minchan Kim
@ 2017-03-31  1:38                         ` Jens Axboe
  2017-04-03  5:11                           ` Minchan Kim
  0 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2017-03-31  1:38 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Johannes Thumshirn, Hannes Reinecke, Nitin Gupta,
	Christoph Hellwig, Sergey Senozhatsky, yizhan,
	Linux Block Layer Mailinglist, Linux Kernel Mailinglist,
	Andrew Morton

On 03/30/2017 05:45 PM, Minchan Kim wrote:
> On Thu, Mar 30, 2017 at 09:35:56AM -0600, Jens Axboe wrote:
>> On 03/30/2017 09:08 AM, Minchan Kim wrote:
>>> Hi Jens,
>>>
>>> It seems you miss this.
>>> Could you handle this?
>>
>> I can, but I'm a little confused. The comment talks about replacing
>> the one I merged with this one, I can't do that. I'm assuming you
>> are talking about this commit:
> 
> Right.
> 
>>
>> commit 0bc315381fe9ed9fb91db8b0e82171b645ac008f
>> Author: Johannes Thumshirn <jthumshirn@suse.de>
>> Date:   Mon Mar 6 11:23:35 2017 +0100
>>
>>     zram: set physical queue limits to avoid array out of bounds accesses
>>
>> which is in mainline. The patch still applies, though.
> 
> You mean it's already in mainline so you cannot replace but can revert.
> Right?
> If so, please revert it and merge this one.

Let's please fold it into the other patch. That's cleaner and it makes
logical sense.

>> Do we really REALLY need this for 4.11, or can we queue for 4.12 and
>> mark it stable?
> 
> Not urgent because one in mainline fixes the problem so I'm okay
> with 4.12 but I don't want mark it as -stable.

OK good, please resend with the two-line revert in your current
patch, and I'll get it queued up for 4.12.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] zram: set physical queue limits to avoid array out of bounds accesses
  2017-03-31  1:38                         ` Jens Axboe
@ 2017-04-03  5:11                           ` Minchan Kim
  0 siblings, 0 replies; 20+ messages in thread
From: Minchan Kim @ 2017-04-03  5:11 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Johannes Thumshirn, Hannes Reinecke, Nitin Gupta,
	Christoph Hellwig, Sergey Senozhatsky, yizhan,
	Linux Block Layer Mailinglist, Linux Kernel Mailinglist,
	Andrew Morton

Hi Jens,

On Thu, Mar 30, 2017 at 07:38:26PM -0600, Jens Axboe wrote:
> On 03/30/2017 05:45 PM, Minchan Kim wrote:
> > On Thu, Mar 30, 2017 at 09:35:56AM -0600, Jens Axboe wrote:
> >> On 03/30/2017 09:08 AM, Minchan Kim wrote:
> >>> Hi Jens,
> >>>
> >>> It seems you miss this.
> >>> Could you handle this?
> >>
> >> I can, but I'm a little confused. The comment talks about replacing
> >> the one I merged with this one, I can't do that. I'm assuming you
> >> are talking about this commit:
> > 
> > Right.
> > 
> >>
> >> commit 0bc315381fe9ed9fb91db8b0e82171b645ac008f
> >> Author: Johannes Thumshirn <jthumshirn@suse.de>
> >> Date:   Mon Mar 6 11:23:35 2017 +0100
> >>
> >>     zram: set physical queue limits to avoid array out of bounds accesses
> >>
> >> which is in mainline. The patch still applies, though.
> > 
> > You mean it's already in mainline so you cannot replace but can revert.
> > Right?
> > If so, please revert it and merge this one.
> 
> Let's please fold it into the other patch. That's cleaner and it makes
> logical sense.

Understood.

> 
> >> Do we really REALLY need this for 4.11, or can we queue for 4.12 and
> >> mark it stable?
> > 
> > Not urgent because one in mainline fixes the problem so I'm okay
> > with 4.12 but I don't want mark it as -stable.
> 
> OK good, please resend with the two-line revert in your current
> patch, and I'll get it queued up for 4.12.

Yeb. If so, now that I think about it, it would be better to handle
it via Andrew's tree because Andrew have been handled zram's patches
and I have several pending patches based on it.
So, I will send new patchset with it to Andrew.

Thanks!

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2017-04-03  5:11 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-06 10:23 [PATCH] zram: set physical queue limits to avoid array out of bounds accesses Johannes Thumshirn
2017-03-06 10:25 ` Hannes Reinecke
2017-03-06 10:45 ` Sergey Senozhatsky
2017-03-06 15:21 ` Jens Axboe
2017-03-06 20:18   ` Andrew Morton
2017-03-06 20:19     ` Jens Axboe
2017-03-07  5:22 ` Minchan Kim
2017-03-07  7:00   ` Hannes Reinecke
2017-03-07  7:23     ` Minchan Kim
2017-03-07  7:48       ` Hannes Reinecke
2017-03-07  8:55         ` Minchan Kim
2017-03-07  9:51           ` Johannes Thumshirn
2017-03-08  5:11             ` Minchan Kim
2017-03-08  7:58               ` Johannes Thumshirn
2017-03-09  5:28                 ` Minchan Kim
2017-03-30 15:08                   ` Minchan Kim
2017-03-30 15:35                     ` Jens Axboe
2017-03-30 23:45                       ` Minchan Kim
2017-03-31  1:38                         ` Jens Axboe
2017-04-03  5:11                           ` Minchan Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).