linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/3] zram: fix operator precedence to get offset
@ 2017-04-13  0:17 Minchan Kim
  2017-04-13  0:17 ` [PATCH 2/3] zram: do not use copy_page with non-page aligned address Minchan Kim
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Minchan Kim @ 2017-04-13  0:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Sergey Senozhatsky, kernel-team, Minchan Kim, stable

In zram_rw_page, the logic to compute the offset is wrong due to operator
precedence (i.e., "<<" binds tighter than "&"). With the wrong offset,
zram can corrupt the user's data. This patch fixes it.

Fixes: 8c7f01025 ("zram: implement rw_page operation of zram")
Cc: stable@vger.kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 9e2199060040..83c38a123242 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -930,7 +930,7 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
 	}
 
 	index = sector >> SECTORS_PER_PAGE_SHIFT;
-	offset = sector & (SECTORS_PER_PAGE - 1) << SECTOR_SHIFT;
+	offset = (sector & (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;
 
 	bv.bv_page = page;
 	bv.bv_len = PAGE_SIZE;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/3] zram: do not use copy_page with non-page aligned address
  2017-04-13  0:17 [PATCH 1/3] zram: fix operator precedence to get offset Minchan Kim
@ 2017-04-13  0:17 ` Minchan Kim
  2017-04-14  5:41   ` Sergey Senozhatsky
  2017-04-17  1:48   ` Sergey Senozhatsky
  2017-04-13  0:17 ` [PATCH 3/3] zsmalloc: expand class bit Minchan Kim
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 17+ messages in thread
From: Minchan Kim @ 2017-04-13  0:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Sergey Senozhatsky, kernel-team, Minchan Kim, stable

copy_page() is a memcpy optimized for page-aligned addresses.
If it is used with a non-page-aligned address, it can corrupt memory,
which means system corruption. With zram, this can happen with:

1. 64K architecture
2. partial IO
3. slub debug

Partial IO needs to allocate a page, and zram allocates it via kmalloc.
With slub debug, kmalloc(PAGE_SIZE) doesn't return a page-size-aligned
address. And finally, copy_page(mem, cmem) corrupts memory.

So, this patch changes it to memcpy.

Actually, we don't need to change the zram_bvec_write part because zsmalloc
returns a page-aligned address in the case of the PAGE_SIZE class, but it's
not good to rely on the internals of zsmalloc.

Note:
When this patch is merged to stable, clear_page should be fixed, too.
Unfortunately, recent zram removed it with the "same page merge" feature,
so it's hard to backport this patch to the -stable tree.

I will handle it when I receive the mail from the stable tree maintainer
asking to merge this patch for backport.

Fixes: 42e99bd ("zram: optimize memory operations with clear_page()/copy_page()")
Cc: stable@vger.kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 83c38a123242..11cc8767af99 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -525,7 +525,7 @@ static int zram_decompress_page(struct zram *zram, char *mem, u32 index)
 
 	cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_RO);
 	if (size == PAGE_SIZE) {
-		copy_page(mem, cmem);
+		memcpy(mem, cmem, PAGE_SIZE);
 	} else {
 		struct zcomp_strm *zstrm = zcomp_stream_get(zram->comp);
 
@@ -719,7 +719,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 
 	if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
 		src = kmap_atomic(page);
-		copy_page(cmem, src);
+		memcpy(cmem, src, PAGE_SIZE);
 		kunmap_atomic(src);
 	} else {
 		memcpy(cmem, src, clen);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/3] zsmalloc: expand class bit
  2017-04-13  0:17 [PATCH 1/3] zram: fix operator precedence to get offset Minchan Kim
  2017-04-13  0:17 ` [PATCH 2/3] zram: do not use copy_page with non-page aligned address Minchan Kim
@ 2017-04-13  0:17 ` Minchan Kim
  2017-04-14  5:07 ` [PATCH 1/3] zram: fix operator precedence to get offset Sergey Senozhatsky
  2017-04-17  1:21 ` Sergey Senozhatsky
  3 siblings, 0 replies; 17+ messages in thread
From: Minchan Kim @ 2017-04-13  0:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Sergey Senozhatsky, kernel-team, Minchan Kim,
	stable, linux-mm

On a 64K page system, zsmalloc has 257 classes, so 8 class bits are
not enough. With that, it corrupts the system when zsmalloc stores
65536-byte data (i.e., class index 256), so this patch increases the
class bits as a simple fix suitable for stable backport.
We should clean up this mess soon.

index	size
0	32
1	288
..
..
204	52256
256	65536

Cc: linux-mm@kvack.org
Fixes: 3783689a1 ("zsmalloc: introduce zspage structure")
Cc: stable@vger.kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index b7b1fb6c8c21..9feadf4fc3d5 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -275,7 +275,7 @@ struct zs_pool {
 struct zspage {
 	struct {
 		unsigned int fullness:FULLNESS_BITS;
-		unsigned int class:CLASS_BITS;
+		unsigned int class:CLASS_BITS + 1;
 		unsigned int isolated:ISOLATED_BITS;
 		unsigned int magic:MAGIC_VAL_BITS;
 	};
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] zram: fix operator precedence to get offset
  2017-04-13  0:17 [PATCH 1/3] zram: fix operator precedence to get offset Minchan Kim
  2017-04-13  0:17 ` [PATCH 2/3] zram: do not use copy_page with non-page aligned address Minchan Kim
  2017-04-13  0:17 ` [PATCH 3/3] zsmalloc: expand class bit Minchan Kim
@ 2017-04-14  5:07 ` Sergey Senozhatsky
  2017-04-14 15:33   ` Minchan Kim
  2017-04-17  1:21 ` Sergey Senozhatsky
  3 siblings, 1 reply; 17+ messages in thread
From: Sergey Senozhatsky @ 2017-04-14  5:07 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, Sergey Senozhatsky, kernel-team, stable

Hello,

On (04/13/17 09:17), Minchan Kim wrote:
[..]
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 9e2199060040..83c38a123242 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -930,7 +930,7 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
>  	}
>  
>  	index = sector >> SECTORS_PER_PAGE_SHIFT;
> -	offset = sector & (SECTORS_PER_PAGE - 1) << SECTOR_SHIFT;
> +	offset = (sector & (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;

sorry, can it actually produce different results?

	-ss

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] zram: do not use copy_page with non-page aligned address
  2017-04-13  0:17 ` [PATCH 2/3] zram: do not use copy_page with non-page aligned address Minchan Kim
@ 2017-04-14  5:41   ` Sergey Senozhatsky
  2017-04-14 15:40     ` Minchan Kim
  2017-04-17  1:48   ` Sergey Senozhatsky
  1 sibling, 1 reply; 17+ messages in thread
From: Sergey Senozhatsky @ 2017-04-14  5:41 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, Sergey Senozhatsky, kernel-team, stable

Hello,

On (04/13/17 09:17), Minchan Kim wrote:
> copy_page() is a memcpy optimized for page-aligned addresses.
> If it is used with a non-page-aligned address, it can corrupt memory,
> which means system corruption. With zram, this can happen with:
> 
> 1. 64K architecture
> 2. partial IO
> 3. slub debug
> 
> Partial IO needs to allocate a page, and zram allocates it via kmalloc.
> With slub debug, kmalloc(PAGE_SIZE) doesn't return a page-size-aligned
> address. And finally, copy_page(mem, cmem) corrupts memory.

which would be the case for many other copy_page() calls in the kernel.
right? if so - should the fix be in copy_page() then?

	-ss

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] zram: fix operator precedence to get offset
  2017-04-14  5:07 ` [PATCH 1/3] zram: fix operator precedence to get offset Sergey Senozhatsky
@ 2017-04-14 15:33   ` Minchan Kim
  2017-04-17  1:21     ` Sergey Senozhatsky
  0 siblings, 1 reply; 17+ messages in thread
From: Minchan Kim @ 2017-04-14 15:33 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, linux-kernel, Sergey Senozhatsky, kernel-team, stable

Hi Sergey,

On Fri, Apr 14, 2017 at 02:07:47PM +0900, Sergey Senozhatsky wrote:
> Hello,
> 
> On (04/13/17 09:17), Minchan Kim wrote:
> [..]
> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> > index 9e2199060040..83c38a123242 100644
> > --- a/drivers/block/zram/zram_drv.c
> > +++ b/drivers/block/zram/zram_drv.c
> > @@ -930,7 +930,7 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
> >  	}
> >  
> >  	index = sector >> SECTORS_PER_PAGE_SHIFT;
> > -	offset = sector & (SECTORS_PER_PAGE - 1) << SECTOR_SHIFT;
> > +	offset = (sector & (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;
> 
> sorry, can it actually produce different results?

I got your point. Actually, the offset was wrong, but rw_page is called
with PAGE_SIZE IO while that offset is relevant only to partial IO
(non-PAGE_SIZE IO). IOW, although the offset is wrong, it is never used
in those functions.

To find a subtle corruption on ppc64, I added some debug code to catch
wrong buffer overflows and found this along with other bugs, but I
didn't prove whether the specific case is valid or not. Good catch, Sergey!

However, it should be *fixed* to prevent confusion in the future, but
surely there is no need for it to go to stable. I will reply to Greg to
prevent it from being merged to *stable* when he sends the review request.

And next week I will send another fix which *maybe* removes the code to
get the offset in zram_rw_page.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] zram: do not use copy_page with non-page aligned address
  2017-04-14  5:41   ` Sergey Senozhatsky
@ 2017-04-14 15:40     ` Minchan Kim
  0 siblings, 0 replies; 17+ messages in thread
From: Minchan Kim @ 2017-04-14 15:40 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, linux-kernel, Sergey Senozhatsky, kernel-team, stable

On Fri, Apr 14, 2017 at 02:41:05PM +0900, Sergey Senozhatsky wrote:
> Hello,
> 
> On (04/13/17 09:17), Minchan Kim wrote:
> > copy_page() is a memcpy optimized for page-aligned addresses.
> > If it is used with a non-page-aligned address, it can corrupt memory,
> > which means system corruption. With zram, this can happen with:
> > 
> > 1. 64K architecture
> > 2. partial IO
> > 3. slub debug
> > 
> > Partial IO needs to allocate a page, and zram allocates it via kmalloc.
> > With slub debug, kmalloc(PAGE_SIZE) doesn't return a page-size-aligned
> > address. And finally, copy_page(mem, cmem) corrupts memory.
> 
> which would be the case for many other copy_page() calls in the kernel.
> right? if so - should the fix be in copy_page() then?

I thought about it but was not sure it's a good idea, for several reasons
(but I don't want to discuss them in this thread).

Anyway, it's stable stuff, so I don't want to bloat the patch.
If you believe that direction is right and valuable, you could be
a volunteer. :)

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] zram: fix operator precedence to get offset
  2017-04-14 15:33   ` Minchan Kim
@ 2017-04-17  1:21     ` Sergey Senozhatsky
  2017-04-17  1:54       ` Sergey Senozhatsky
  0 siblings, 1 reply; 17+ messages in thread
From: Sergey Senozhatsky @ 2017-04-17  1:21 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel,
	Sergey Senozhatsky, kernel-team, stable

Hello,

On (04/15/17 00:33), Minchan Kim wrote:
> On Fri, Apr 14, 2017 at 02:07:47PM +0900, Sergey Senozhatsky wrote:
> > On (04/13/17 09:17), Minchan Kim wrote:
> > [..]
> > > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> > > index 9e2199060040..83c38a123242 100644
> > > --- a/drivers/block/zram/zram_drv.c
> > > +++ b/drivers/block/zram/zram_drv.c
> > > @@ -930,7 +930,7 @@ static int zram_rw_page(struct block_device *bdev, sector_t sector,
> > >  	}
> > >  
> > >  	index = sector >> SECTORS_PER_PAGE_SHIFT;
> > > -	offset = sector & (SECTORS_PER_PAGE - 1) << SECTOR_SHIFT;
> > > +	offset = (sector & (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;
> > 
> > sorry, can it actually produce different results?
> 
> I got your point. Actually, the offset was wrong, but rw_page is called
> with PAGE_SIZE IO while that offset is relevant only to partial IO
> (non-PAGE_SIZE IO). IOW, although the offset is wrong, it is never used
> in those functions.
> 
> To find a subtle corruption on ppc64, I added some debug code to catch
> wrong buffer overflows and found this along with other bugs, but I
> didn't prove whether the specific case is valid or not. Good catch, Sergey!
> 
> However, it should be *fixed* to prevent confusion in the future, but
> surely there is no need for it to go to stable. I will reply to Greg to
> prevent it from being merged to *stable* when he sends the review request.

cool. thanks!

> And next week I will send another fix which *maybe* removes the code to
> get the offset in zram_rw_page.

sounds interesting!

	-ss

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] zram: fix operator precedence to get offset
  2017-04-13  0:17 [PATCH 1/3] zram: fix operator precedence to get offset Minchan Kim
                   ` (2 preceding siblings ...)
  2017-04-14  5:07 ` [PATCH 1/3] zram: fix operator precedence to get offset Sergey Senozhatsky
@ 2017-04-17  1:21 ` Sergey Senozhatsky
  3 siblings, 0 replies; 17+ messages in thread
From: Sergey Senozhatsky @ 2017-04-17  1:21 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, Sergey Senozhatsky, kernel-team, stable

On (04/13/17 09:17), Minchan Kim wrote:
> Date: Thu, 13 Apr 2017 09:17:00 +0900
> From: Minchan Kim <minchan@kernel.org>
> To: Andrew Morton <akpm@linux-foundation.org>
> CC: linux-kernel@vger.kernel.org, Sergey Senozhatsky
>  <sergey.senozhatsky@gmail.com>, kernel-team@lge.com, Minchan Kim
>  <minchan@kernel.org>, stable@vger.kernel.org
> Subject: [PATCH 1/3] zram: fix operator precedence to get offset
> X-Mailer: git-send-email 2.7.4
> 
> In zram_rw_page, the logic to compute the offset is wrong due to operator
> precedence (i.e., "<<" binds tighter than "&"). With the wrong offset,
> zram can corrupt the user's data. This patch fixes it.
> 
> Fixes: 8c7f01025 ("zram: implement rw_page operation of zram")
> Cc: stable@vger.kernel.org
> Signed-off-by: Minchan Kim <minchan@kernel.org>

Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

	-ss

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] zram: do not use copy_page with non-page aligned address
  2017-04-13  0:17 ` [PATCH 2/3] zram: do not use copy_page with non-page aligned address Minchan Kim
  2017-04-14  5:41   ` Sergey Senozhatsky
@ 2017-04-17  1:48   ` Sergey Senozhatsky
  1 sibling, 0 replies; 17+ messages in thread
From: Sergey Senozhatsky @ 2017-04-17  1:48 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, Sergey Senozhatsky, kernel-team, stable

On (04/13/17 09:17), Minchan Kim wrote:
> copy_page() is a memcpy optimized for page-aligned addresses.
> If it is used with a non-page-aligned address, it can corrupt memory,
> which means system corruption. With zram, this can happen with:
> 
> 1. 64K architecture
> 2. partial IO
> 3. slub debug
> 
> Partial IO needs to allocate a page, and zram allocates it via kmalloc.
> With slub debug, kmalloc(PAGE_SIZE) doesn't return a page-size-aligned
> address. And finally, copy_page(mem, cmem) corrupts memory.
> 
> So, this patch changes it to memcpy.
> 
> Actually, we don't need to change the zram_bvec_write part because zsmalloc
> returns a page-aligned address in the case of the PAGE_SIZE class, but it's
> not good to rely on the internals of zsmalloc.
> 
> Note:
> When this patch is merged to stable, clear_page should be fixed, too.
> Unfortunately, recent zram removed it with the "same page merge" feature,
> so it's hard to backport this patch to the -stable tree.
> 
> I will handle it when I receive the mail from the stable tree maintainer
> asking to merge this patch for backport.
> 
> Fixes: 42e99bd ("zram: optimize memory operations with clear_page()/copy_page()")
> Cc: stable@vger.kernel.org
> Signed-off-by: Minchan Kim <minchan@kernel.org>

Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

	-ss

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] zram: fix operator precedence to get offset
  2017-04-17  1:21     ` Sergey Senozhatsky
@ 2017-04-17  1:54       ` Sergey Senozhatsky
  2017-04-17  2:14         ` Minchan Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Sergey Senozhatsky @ 2017-04-17  1:54 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, kernel-team, stable,
	Sergey Senozhatsky, Sergey Senozhatsky

On (04/17/17 10:21), Sergey Senozhatsky wrote:
> > However, it should be *fixed* to prevent confusion in future

or may be something like below? can save us some cycles.

remove this calculation

-       offset = sector & (SECTORS_PER_PAGE - 1) << SECTOR_SHIFT;


and pass 0 to zram_bvec_rw()

-       err = zram_bvec_rw(zram, &bv, index, offset, is_write);
+       err = zram_bvec_rw(zram, &bv, index, 0, is_write);


	-ss

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] zram: fix operator precedence to get offset
  2017-04-17  1:54       ` Sergey Senozhatsky
@ 2017-04-17  2:14         ` Minchan Kim
  2017-04-17 10:50           ` Sergey Senozhatsky
  0 siblings, 1 reply; 17+ messages in thread
From: Minchan Kim @ 2017-04-17  2:14 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, linux-kernel, kernel-team, stable, Sergey Senozhatsky

Hi Sergey,

On Mon, Apr 17, 2017 at 10:54:29AM +0900, Sergey Senozhatsky wrote:
> On (04/17/17 10:21), Sergey Senozhatsky wrote:
> > > However, it should be *fixed* to prevent confusion in future
> 
> or may be something like below? can save us some cycles.
> 
> remove this calculation
> 
> -       offset = sector & (SECTORS_PER_PAGE - 1) << SECTOR_SHIFT;
> 
> 
> and pass 0 to zram_bvec_rw()
> 
> -       err = zram_bvec_rw(zram, &bv, index, offset, is_write);
> +       err = zram_bvec_rw(zram, &bv, index, 0, is_write);

That was one I wrote but have thought it more.

Because I suspect an fs can submit page-size IO at a non-PAGE_SIZE-aligned
sector? For example, it could submit a PAGE_SIZE read request from sector 9.
Is that possible? I don't know.

As well, can an FS format zram starting from sector 1, not sector 0? IOW,
can't it use a non-page-aligned sector as the starting sector?
Can we do that via fdisk?

Anyway, if one of the scenarios I mentioned is possible, zram_rw_page will
be broken.

If it's hard to check all the scenarios at this moment, it would be
better not to remove it and to add WARN_ON(offset) there instead.

While I am writing this, I found this.

/**
 * bdev_read_page() - Start reading a page from a block device
 * @bdev: The device to read the page from
 * @sector: The offset on the device to read the page to (need not be aligned)
 * @page: The page to read
 *

Hmm,, need investigation but no time.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] zram: fix operator precedence to get offset
  2017-04-17  2:14         ` Minchan Kim
@ 2017-04-17 10:50           ` Sergey Senozhatsky
  2017-04-17 10:53             ` Sergey Senozhatsky
  2017-04-17 23:53             ` Minchan Kim
  0 siblings, 2 replies; 17+ messages in thread
From: Sergey Senozhatsky @ 2017-04-17 10:50 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel, kernel-team,
	stable, Sergey Senozhatsky

Hello Minchan,

On (04/17/17 11:14), Minchan Kim wrote:
> On Mon, Apr 17, 2017 at 10:54:29AM +0900, Sergey Senozhatsky wrote:
> > On (04/17/17 10:21), Sergey Senozhatsky wrote:
> > > > However, it should be *fixed* to prevent confusion in future
> > 
> > or may be something like below? can save us some cycles.
> > 
> > remove this calculation
> > 
> > -       offset = sector & (SECTORS_PER_PAGE - 1) << SECTOR_SHIFT;
> > 
> > 
> > and pass 0 to zram_bvec_rw()
> > 
> > -       err = zram_bvec_rw(zram, &bv, index, offset, is_write);
> > +       err = zram_bvec_rw(zram, &bv, index, 0, is_write);
> 
> That was one I wrote but have thought it more.
> 
> Because I suspect an fs can submit page-size IO at a non-PAGE_SIZE-aligned
> sector? For example, it could submit a PAGE_SIZE read request from sector 9.
> Is that possible? I don't know.
> 
> As well, can an FS format zram starting from sector 1, not sector 0? IOW,
> can't it use a non-page-aligned sector as the starting sector?
> Can we do that via fdisk?
> 
> Anyway, if one of the scenarios I mentioned is possible, zram_rw_page will
> be broken.
> 
> If it's hard to check all the scenarios at this moment, it would be
> better not to remove it and to add WARN_ON(offset) there instead.
> 
> While I am writing this, I found this.
> 
> /**
>  * bdev_read_page() - Start reading a page from a block device
>  * @bdev: The device to read the page from
>  * @sector: The offset on the device to read the page to (need not be aligned)
>  * @page: The page to read
>  *
> 
> Hmm,, need investigation but no time.

good questions.

as far as I can see, we never use 'offset' which we pass to zram_bvec_rw()
from zram_rw_page(). `offset' makes a lot of sense for partial IO, but in
zram_bvec_rw() we always do "bv.bv_len = PAGE_SIZE".

so what we have is

for READ

zram_rw_page()
	bv.bv_len = PAGE_SIZE
	zram_bvec_rw(zram, &bv, index, offset, is_write);
		zram_bvec_read()
			if (is_partial_io(bvec))		// always false
				memcpy(user_mem + bvec->bv_offset,
					uncmem + offset,
					bvec->bv_len);


for WRITE

zram_rw_page()
	bv.bv_len = PAGE_SIZE
	zram_bvec_rw(zram, &bv, index, offset, is_write);
		zram_bvec_write()
			if (is_partial_io(bvec))		// always false
				memcpy(uncmem + offset,
					user_mem + bvec->bv_offset,
					bvec->bv_len);


and our is_partial_io() looks at ->bv_len:

		bvec->bv_len != PAGE_SIZE;

which we set to PAGE_SIZE.

so in the existing scheme of things, we never care about 'sector'
passed from zram_rw_page(). and this has worked for us for quite
some time. my call would be -- let's drop zram_rw_page() `sector'
calculation.

	-ss

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] zram: fix operator precedence to get offset
  2017-04-17 10:50           ` Sergey Senozhatsky
@ 2017-04-17 10:53             ` Sergey Senozhatsky
  2017-04-17 23:53             ` Minchan Kim
  1 sibling, 0 replies; 17+ messages in thread
From: Sergey Senozhatsky @ 2017-04-17 10:53 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Minchan Kim, Andrew Morton, linux-kernel, kernel-team, stable,
	Sergey Senozhatsky

On (04/17/17 19:50), Sergey Senozhatsky wrote:
[..]
> so in the existing scheme of things, we never care about 'sector'
> passed from zram_rw_page(). and this has worked for us for quite
> some time. my call would be -- let's drop zram_rw_page() `sector'
> calculation.

d'oh... s/sector/offset/g


	-ss

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] zram: fix operator precedence to get offset
  2017-04-17 10:50           ` Sergey Senozhatsky
  2017-04-17 10:53             ` Sergey Senozhatsky
@ 2017-04-17 23:53             ` Minchan Kim
  2017-04-18  1:53               ` Sergey Senozhatsky
  1 sibling, 1 reply; 17+ messages in thread
From: Minchan Kim @ 2017-04-17 23:53 UTC (permalink / raw)
  To: Sergey Senozhatsky, willy
  Cc: Andrew Morton, linux-kernel, kernel-team, stable, Sergey Senozhatsky

Hi Sergey,

On Mon, Apr 17, 2017 at 07:50:16PM +0900, Sergey Senozhatsky wrote:
> Hello Minchan,
> 
> On (04/17/17 11:14), Minchan Kim wrote:
> > On Mon, Apr 17, 2017 at 10:54:29AM +0900, Sergey Senozhatsky wrote:
> > > On (04/17/17 10:21), Sergey Senozhatsky wrote:
> > > > > However, it should be *fixed* to prevent confusion in future
> > > 
> > > or may be something like below? can save us some cycles.
> > > 
> > > remove this calculation
> > > 
> > > -       offset = sector & (SECTORS_PER_PAGE - 1) << SECTOR_SHIFT;
> > > 
> > > 
> > > and pass 0 to zram_bvec_rw()
> > > 
> > > -       err = zram_bvec_rw(zram, &bv, index, offset, is_write);
> > > +       err = zram_bvec_rw(zram, &bv, index, 0, is_write);
> > 
> > That was one I wrote but have thought it more.
> > 
> > Because I suspect an fs can submit page-size IO at a non-PAGE_SIZE-aligned
> > sector? For example, it could submit a PAGE_SIZE read request from sector 9.
> > Is that possible? I don't know.
> > 
> > As well, can an FS format zram starting from sector 1, not sector 0? IOW,
> > can't it use a non-page-aligned sector as the starting sector?
> > Can we do that via fdisk?
> > 
> > Anyway, if one of the scenarios I mentioned is possible, zram_rw_page will
> > be broken.
> > 
> > If it's hard to check all the scenarios at this moment, it would be
> > better not to remove it and to add WARN_ON(offset) there instead.
> > 
> > While I am writing this, I found this.
> > 
> > /**
> >  * bdev_read_page() - Start reading a page from a block device
> >  * @bdev: The device to read the page from
> >  * @sector: The offset on the device to read the page to (need not be aligned)
> >  * @page: The page to read
> >  *
> > 
> > Hmm,, need investigation but no time.
> 
> good questions.
> 
> as far as I can see, we never use 'offset' which we pass to zram_bvec_rw()
> from zram_rw_page(). `offset' makes a lot of sense for partial IO, but in
> zram_bvec_rw() we always do "bv.bv_len = PAGE_SIZE".
> 
> so what we have is
> 
> for READ
> 
> zram_rw_page()
> 	bv.bv_len = PAGE_SIZE
> 	zram_bvec_rw(zram, &bv, index, offset, is_write);
> 		zram_bvec_read()
> 			if (is_partial_io(bvec))		// always false
> 				memcpy(user_mem + bvec->bv_offset,
> 					uncmem + offset,
> 					bvec->bv_len);
> 
> 
> for WRITE
> 
> zram_rw_page()
> 	bv.bv_len = PAGE_SIZE
> 	zram_bvec_rw(zram, &bv, index, offset, is_write);
> 		zram_bvec_write()
> 			if (is_partial_io(bvec))		// always false
> 				memcpy(uncmem + offset,
> 					user_mem + bvec->bv_offset,
> 					bvec->bv_len);
> 
> 
> and our is_partial_io() looks at ->bv_len:
> 
> 		bvec->bv_len != PAGE_SIZE;
> 
> which we set to PAGE_SIZE.
> 
> so in the existing scheme of things, we never care about 'sector'
> passed from zram_rw_page(). and this has worked for us for quite
> some time. my call would be -- let's drop zram_rw_page() `sector'
> calculation.

I can do that, but before that I want to confirm. Ccing Matthew.
Summary for Matthew:

I see following comment about the sector from bdev_read_page.

/**
 * bdev_read_page() - Start reading a page from a block device
 * @bdev: The device to read the page from
 * @sector: The offset on the device to read the page to (need not be aligned)
 * @page: The page to read
 *

Does it mean that the sector need not be PAGE_SIZE aligned?

For example, with 512-byte sectors on a 4K page system (4K = 8 sectors),

        bdev_read_page(bdev, 9, page);

is that possible for a driver declared as below?

        blk_queue_physical_block_size(zram->disk->queue, PAGE_SIZE);
        blk_queue_logical_block_size(zram->disk->queue,
                                        ZRAM_LOGICAL_BLOCK_SIZE);

ZRAM_LOGICAL_BLOCK_SIZE is 4K regardless of 4K/64K page architecture.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] zram: fix operator precedence to get offset
  2017-04-17 23:53             ` Minchan Kim
@ 2017-04-18  1:53               ` Sergey Senozhatsky
  2017-04-18  2:47                 ` Minchan Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Sergey Senozhatsky @ 2017-04-18  1:53 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, willy, Andrew Morton, linux-kernel,
	kernel-team, stable, Sergey Senozhatsky

Hello,

On (04/18/17 08:53), Minchan Kim wrote:
> On Mon, Apr 17, 2017 at 07:50:16PM +0900, Sergey Senozhatsky wrote:
> > Hello Minchan,
> > 
> > On (04/17/17 11:14), Minchan Kim wrote:
> > > On Mon, Apr 17, 2017 at 10:54:29AM +0900, Sergey Senozhatsky wrote:
> > > > On (04/17/17 10:21), Sergey Senozhatsky wrote:
> > > > > > However, it should be *fixed* to prevent confusion in future
> > > > 
> > > > or may be something like below? can save us some cycles.
> > > > 
> > > > remove this calculation
> > > > 
> > > > -       offset = sector & (SECTORS_PER_PAGE - 1) << SECTOR_SHIFT;
> > > > 
> > > > 
> > > > and pass 0 to zram_bvec_rw()
> > > > 
> > > > -       err = zram_bvec_rw(zram, &bv, index, offset, is_write);
> > > > +       err = zram_bvec_rw(zram, &bv, index, 0, is_write);
> > > 
> > > That was one I wrote but have thought it more.
> > > 
> > > Because I suspect an fs can submit page-size IO at a non-PAGE_SIZE-aligned
> > > sector? For example, it could submit a PAGE_SIZE read request from sector 9.
> > > Is that possible? I don't know.
> > > 
> > > As well, can an FS format zram starting from sector 1, not sector 0? IOW,
> > > can't it use a non-page-aligned sector as the starting sector?
> > > Can we do that via fdisk?
> > > 
> > > Anyway, if one of the scenarios I mentioned is possible, zram_rw_page will
> > > be broken.
> > > 
> > > If it's hard to check all the scenarios at this moment, it would be
> > > better not to remove it and to add WARN_ON(offset) there instead.
> > > 
> > > While I am writing this, I found this.
> > > 
> > > /**
> > >  * bdev_read_page() - Start reading a page from a block device
> > >  * @bdev: The device to read the page from
> > >  * @sector: The offset on the device to read the page to (need not be aligned)
> > >  * @page: The page to read
> > >  *
> > > 
> > > Hmm,, need investigation but no time.
> > 
> > good questions.
> > 
> > as far as I can see, we never use 'offset' which we pass to zram_bvec_rw()
> > from zram_rw_page(). `offset' makes a lot of sense for partial IO, but in
> > zram_bvec_rw() we always do "bv.bv_len = PAGE_SIZE".
> > 
> > so what we have is
> > 
> > for READ
> > 
> > zram_rw_page()
> > 	bv.bv_len = PAGE_SIZE
> > 	zram_bvec_rw(zram, &bv, index, offset, is_write);
> > 		zram_bvec_read()
> > 			if (is_partial_io(bvec))		// always false
> > 				memcpy(user_mem + bvec->bv_offset,
> > 					uncmem + offset,
> > 					bvec->bv_len);
> > 
> > 
> > for WRITE
> > 
> > zram_rw_page()
> > 	bv.bv_len = PAGE_SIZE
> > 	zram_bvec_rw(zram, &bv, index, offset, is_write);
> > 		zram_bvec_write()
> > 			if (is_partial_io(bvec))		// always false
> > 				memcpy(uncmem + offset,
> > 					user_mem + bvec->bv_offset,
> > 					bvec->bv_len);
> > 
> > 
> > and our is_partial_io() looks at ->bv_len:
> > 
> > 		bvec->bv_len != PAGE_SIZE;
> > 
> > which we set to PAGE_SIZE.
> > 
> > so in the existing scheme of things, we never care about 'sector'
> > passed from zram_rw_page(). and this has worked for us for quite
> > some time. my call would be -- let's drop zram_rw_page() `sector'
> > calculation.
> 
> I can do that, but before that I want to confirm. Ccing Matthew.
> Summary for Matthew:
> 
> I see following comment about the sector from bdev_read_page.
> 
> /**
>  * bdev_read_page() - Start reading a page from a block device
>  * @bdev: The device to read the page from
>  * @sector: The offset on the device to read the page to (need not be aligned)
>  * @page: The page to read
>  *
> 
> Does it mean that the sector need not be PAGE_SIZE aligned?
> 
> For example, with 512-byte sectors on a 4K page system (4K = 8 sectors),
> 
>         bdev_read_page(bdev, 9, page);

do you mean a sector that spans two pages? sectors are pow of 2 in size
and pages are pow of 2 in size, so page_size is `K * sector_size', isn't
it?

fs/mpage.c

static struct bio *
do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
                sector_t *last_block_in_bio, struct buffer_head *map_bh,
                unsigned long *first_logical_block, get_block_t get_block,
                gfp_t gfp)
{
        const unsigned blkbits = inode->i_blkbits;
        const unsigned blocks_per_page = PAGE_SIZE >> blkbits;
        const unsigned blocksize = 1 << blkbits;
        sector_t block_in_file;
        sector_t last_block;
        sector_t last_block_in_file;
        sector_t blocks[MAX_BUF_PER_PAGE];
	...
        block_in_file = (sector_t)page->index << (PAGE_SHIFT - blkbits);
        last_block = block_in_file + nr_pages * blocks_per_page;
        last_block_in_file = (i_size_read(inode) + blocksize - 1) >> blkbits;
        if (last_block > last_block_in_file)
                last_block = last_block_in_file;

or did I misunderstand your question?

	-ss

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] zram: fix operator precedence to get offset
  2017-04-18  1:53               ` Sergey Senozhatsky
@ 2017-04-18  2:47                 ` Minchan Kim
  0 siblings, 0 replies; 17+ messages in thread
From: Minchan Kim @ 2017-04-18  2:47 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: willy, Andrew Morton, linux-kernel, kernel-team, stable,
	Sergey Senozhatsky

On Tue, Apr 18, 2017 at 10:53:10AM +0900, Sergey Senozhatsky wrote:
> Hello,
> 
> On (04/18/17 08:53), Minchan Kim wrote:
> > On Mon, Apr 17, 2017 at 07:50:16PM +0900, Sergey Senozhatsky wrote:
> > > Hello Minchan,
> > > 
> > > On (04/17/17 11:14), Minchan Kim wrote:
> > > > On Mon, Apr 17, 2017 at 10:54:29AM +0900, Sergey Senozhatsky wrote:
> > > > > On (04/17/17 10:21), Sergey Senozhatsky wrote:
> > > > > > > However, it should be *fixed* to prevent confusion in future
> > > > > 
> > > > > or may be something like below? can save us some cycles.
> > > > > 
> > > > > remove this calculation
> > > > > 
> > > > > -       offset = sector & (SECTORS_PER_PAGE - 1) << SECTOR_SHIFT;
> > > > > 
> > > > > 
> > > > > and pass 0 to zram_bvec_rw()
> > > > > 
> > > > > -       err = zram_bvec_rw(zram, &bv, index, offset, is_write);
> > > > > +       err = zram_bvec_rw(zram, &bv, index, 0, is_write);
> > > > 
> > > > That was one I wrote but have thought it more.
> > > > 
> > > > Because I suspect a fs can submit page-size IO at a sector that is not
> > > > PAGE_SIZE-aligned? For example, it can submit a PAGE_SIZE read request
> > > > starting at sector 9. Is it possible? I don't know.
> > > > 
> > > > As well, can a FS format zram starting from sector 1, not sector 0? IOW,
> > > > can't it use a non-page-aligned starting sector? Can we do that via
> > > > fdisk?
> > > > 
> > > > Anyway, If one of scenario I mentioned is possible, zram_rw_page will
> > > > be broken.
> > > > 
> > > > If it's hard to check all of scenario in this moment, it would be
> > > > better to not remove it and then add WARN_ON(offset) in there.
> > > > 
> > > > While I am writing this, I found this.
> > > > 
> > > > /**
> > > >  * bdev_read_page() - Start reading a page from a block device
> > > >  * @bdev: The device to read the page from
> > > >  * @sector: The offset on the device to read the page to (need not be aligned)
> > > >  * @page: The page to read
> > > >  *
> > > > 
> > > > Hmm,, need investigation but no time.
> > > 
> > > good questions.
> > > 
> > > as far as I can see, we never use 'offset' which we pass to zram_bvec_rw()
> > > from zram_rw_page(). `offset' makes a lot of sense for partial IO, but in
> > > zram_bvec_rw() we always do "bv.bv_len = PAGE_SIZE".
> > > 
> > > so what we have is
> > > 
> > > for READ
> > > 
> > > zram_rw_page()
> > > 	bv.bv_len = PAGE_SIZE
> > > 	zram_bvec_rw(zram, &bv, index, offset, is_write);
> > > 		zram_bvec_read()
> > > 			if (is_partial_io(bvec))		// always false
> > > 				memcpy(user_mem + bvec->bv_offset,
> > > 					uncmem + offset,
> > > 					bvec->bv_len);
> > > 
> > > 
> > > for WRITE
> > > 
> > > zram_rw_page()
> > > 	bv.bv_len = PAGE_SIZE
> > > 	zram_bvec_rw(zram, &bv, index, offset, is_write);
> > > 		zram_bvec_write()
> > > 			if (is_partial_io(bvec))		// always false
> > > 				memcpy(uncmem + offset,
> > > 					user_mem + bvec->bv_offset,
> > > 					bvec->bv_len);
> > > 
> > > 
> > > and our is_partial_io() looks at ->bv_len:
> > > 
> > > 		bvec->bv_len != PAGE_SIZE;
> > > 
> > > which we set to PAGE_SIZE.
> > > 
> > > so in the existing scheme of things, we never care about 'sector'
> > > passed from zram_rw_page(). and this has worked for us for quite
> > > some time. my call would be -- let's drop zram_rw_page() `sector'
> > > calculation.
> > 
> > I can do but before that, I want to confirm. Ccing Matthew,
> > Summary for Matthew,
> > 
> > I see following comment about the sector from bdev_read_page.
> > 
> > /**
> >  * bdev_read_page() - Start reading a page from a block device
> >  * @bdev: The device to read the page from
> >  * @sector: The offset on the device to read the page to (need not be aligned)
> >  * @page: The page to read
> >  *
> > 
> > Does it mean that the sector need not be PAGE_SIZE-aligned?
> > 
> > For example, 512byte sector, 4K page system, 4K = 8 sector
> > 
> >         bdev_read_page(bdev, 9, page);
> 
> do you mean a sector that spans two pages? sectors are powers of 2 in size
> and pages are powers of 2 in size, so page_size is `K * sector_size', isn't
> it?
> 
> fs/mpage.c
> 
> static struct bio *
> do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
>                 sector_t *last_block_in_bio, struct buffer_head *map_bh,
>                 unsigned long *first_logical_block, get_block_t get_block,
>                 gfp_t gfp)
> {
>         const unsigned blkbits = inode->i_blkbits;
>         const unsigned blocks_per_page = PAGE_SIZE >> blkbits;
>         const unsigned blocksize = 1 << blkbits;
>         sector_t block_in_file;
>         sector_t last_block;
>         sector_t last_block_in_file;
>         sector_t blocks[MAX_BUF_PER_PAGE];
> 	...
>         block_in_file = (sector_t)page->index << (PAGE_SHIFT - blkbits);
>         last_block = block_in_file + nr_pages * blocks_per_page;
>         last_block_in_file = (i_size_read(inode) + blocksize - 1) >> blkbits;
>         if (last_block > last_block_in_file)
>                 last_block = last_block_in_file;
> 
> or did I misunderstand your question?

I meant:

If bdev_read_page asks for 4K (8 sectors) starting at sector 9 (if that is
possible), zram should handle it with two separate IO requests, like below.

zram_rw_page:

index = sector >> SECTORS_PER_PAGE_SHIFT;
offset = (sector & (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;

bvec.bv_len = PAGE_SIZE - offset;
bvec.bv_offset = 0;

zram_bvec_rw(zram, &bv, index, offset, is_write);

bvec.bv_len = offset;
bvec.bv_offset = PAGE_SIZE - offset;

zram_bvec_rw(zram, &bv, index + 1, 0, is_write);

end of thread, other threads:[~2017-04-18  2:48 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-13  0:17 [PATCH 1/3] zram: fix operator precedence to get offset Minchan Kim
2017-04-13  0:17 ` [PATCH 2/3] zram: do not use copy_page with non-page alinged address Minchan Kim
2017-04-14  5:41   ` Sergey Senozhatsky
2017-04-14 15:40     ` Minchan Kim
2017-04-17  1:48   ` Sergey Senozhatsky
2017-04-13  0:17 ` [PATCH 3/3] zsmalloc: expand class bit Minchan Kim
2017-04-14  5:07 ` [PATCH 1/3] zram: fix operator precedence to get offset Sergey Senozhatsky
2017-04-14 15:33   ` Minchan Kim
2017-04-17  1:21     ` Sergey Senozhatsky
2017-04-17  1:54       ` Sergey Senozhatsky
2017-04-17  2:14         ` Minchan Kim
2017-04-17 10:50           ` Sergey Senozhatsky
2017-04-17 10:53             ` Sergey Senozhatsky
2017-04-17 23:53             ` Minchan Kim
2017-04-18  1:53               ` Sergey Senozhatsky
2017-04-18  2:47                 ` Minchan Kim
2017-04-17  1:21 ` Sergey Senozhatsky
