linux-kernel.vger.kernel.org archive mirror
From: Minchan Kim <minchan@kernel.org>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: <willy@infradead.org>, Andrew Morton <akpm@linux-foundation.org>,
	<linux-kernel@vger.kernel.org>, <kernel-team@lge.com>,
	<stable@vger.kernel.org>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Subject: Re: [PATCH 1/3] zram: fix operator precedence to get offset
Date: Tue, 18 Apr 2017 11:47:53 +0900
Message-ID: <20170418024753.GA10648@bbox>
In-Reply-To: <20170418015310.GA558@jagdpanzerIV.localdomain>

On Tue, Apr 18, 2017 at 10:53:10AM +0900, Sergey Senozhatsky wrote:
> Hello,
> 
> On (04/18/17 08:53), Minchan Kim wrote:
> > On Mon, Apr 17, 2017 at 07:50:16PM +0900, Sergey Senozhatsky wrote:
> > > Hello Minchan,
> > > 
> > > On (04/17/17 11:14), Minchan Kim wrote:
> > > > On Mon, Apr 17, 2017 at 10:54:29AM +0900, Sergey Senozhatsky wrote:
> > > > > On (04/17/17 10:21), Sergey Senozhatsky wrote:
> > > > > > > However, it should be *fixed* to prevent confusion in the future
> > > > > 
> > > > > or maybe something like below? it can save us some cycles.
> > > > > 
> > > > > remove this calculation
> > > > > 
> > > > > -       offset = sector & (SECTORS_PER_PAGE - 1) << SECTOR_SHIFT;
> > > > > 
> > > > > 
> > > > > and pass 0 to zram_bvec_rw()
> > > > > 
> > > > > -       err = zram_bvec_rw(zram, &bv, index, offset, is_write);
> > > > > +       err = zram_bvec_rw(zram, &bv, index, 0, is_write);
> > > > 
> > > > That was what I wrote at first, but I have thought about it more.
> > > > 
> > > > I suspect an fs can submit a page-size IO starting at a sector that is
> > > > not PAGE_SIZE-aligned. For example, it could submit a PAGE_SIZE read
> > > > request starting at sector 9. Is that possible? I don't know.
> > > > 
> > > > Also, could an FS be laid out on zram starting at sector 1 rather than
> > > > sector 0? IOW, couldn't it use a non-page-aligned starting sector?
> > > > Can we do that via fdisk?
> > > > 
> > > > Anyway, if either scenario I mentioned is possible, zram_rw_page will
> > > > be broken.
> > > > 
> > > > If it's hard to check every scenario at the moment, it would be
> > > > better not to remove it and to add WARN_ON(offset) there.
> > > > 
> > > > While writing this, I found the following:
> > > > 
> > > > /**
> > > >  * bdev_read_page() - Start reading a page from a block device
> > > >  * @bdev: The device to read the page from
> > > >  * @sector: The offset on the device to read the page to (need not be aligned)
> > > >  * @page: The page to read
> > > >  *
> > > > 
> > > > Hmm, this needs investigation, but I have no time.
> > > 
> > > good questions.
> > > 
> > > as far as I can see, we never use 'offset' which we pass to zram_bvec_rw()
> > > from zram_rw_page(). `offset' makes a lot of sense for partial IO, but in
> > > zram_bvec_rw() we always do "bv.bv_len = PAGE_SIZE".
> > > 
> > > so what we have is
> > > 
> > > for READ
> > > 
> > > zram_rw_page()
> > > 	bv.bv_len = PAGE_SIZE
> > > 	zram_bvec_rw(zram, &bv, index, offset, is_write);
> > > 		zram_bvec_read()
> > > 			if (is_partial_io(bvec))		// always false
> > > 				memcpy(user_mem + bvec->bv_offset,
> > > 					uncmem + offset,
> > > 					bvec->bv_len);
> > > 
> > > 
> > > for WRITE
> > > 
> > > zram_rw_page()
> > > 	bv.bv_len = PAGE_SIZE
> > > 	zram_bvec_rw(zram, &bv, index, offset, is_write);
> > > 		zram_bvec_write()
> > > 			if (is_partial_io(bvec))		// always false
> > > 				memcpy(uncmem + offset,
> > > 					user_mem + bvec->bv_offset,
> > > 					bvec->bv_len);
> > > 
> > > 
> > > and our is_partial_io() looks at ->bv_len:
> > > 
> > > 		bvec->bv_len != PAGE_SIZE;
> > > 
> > > which we set to PAGE_SIZE.
> > > 
> > > so in the existing scheme of things, we never care about 'sector'
> > > passed from zram_rw_page(). and this has worked for us for quite
> > > some time. my call would be -- let's drop zram_rw_page() `sector'
> > > calculation.
> > 
> > I can do that, but first I want to confirm something. Ccing Matthew.
> > Summary for Matthew:
> > 
> > I see the following comment about the sector argument of bdev_read_page():
> > 
> > /**
> >  * bdev_read_page() - Start reading a page from a block device
> >  * @bdev: The device to read the page from
> >  * @sector: The offset on the device to read the page to (need not be aligned)
> >  * @page: The page to read
> >  *
> > 
> > Does it mean that sector need not be PAGE_SIZE-aligned?
> > 
> > For example, with 512-byte sectors on a 4K-page system (4K = 8 sectors):
> > 
> >         bdev_read_page(bdev, 9, page);
> 
> do you mean a sector that spans two pages? sectors are a power of 2 in
> size and pages are a power of 2 in size, so page_size is
> `K * sector_size', isn't it?
> 
> fs/mpage.c
> 
> static struct bio *
> do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
>                 sector_t *last_block_in_bio, struct buffer_head *map_bh,
>                 unsigned long *first_logical_block, get_block_t get_block,
>                 gfp_t gfp)
> {
>         const unsigned blkbits = inode->i_blkbits;
>         const unsigned blocks_per_page = PAGE_SIZE >> blkbits;
>         const unsigned blocksize = 1 << blkbits;
>         sector_t block_in_file;
>         sector_t last_block;
>         sector_t last_block_in_file;
>         sector_t blocks[MAX_BUF_PER_PAGE];
> 	...
>         block_in_file = (sector_t)page->index << (PAGE_SHIFT - blkbits);
>         last_block = block_in_file + nr_pages * blocks_per_page;
>         last_block_in_file = (i_size_read(inode) + blocksize - 1) >> blkbits;
>         if (last_block > last_block_in_file)
>                 last_block = last_block_in_file;
> 
> or did I misunderstand your question?

I meant:

If bdev_read_page() asks for 4K (8 sectors) starting at sector 9 (if that
is possible), zram should handle it as two separate IO requests, like below.

zram_rw_page:

index = sector >> SECTORS_PER_PAGE_SHIFT;
offset = (sector & (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;

/* first part: tail of zram page 'index' */
bv.bv_len = PAGE_SIZE - offset;
bv.bv_offset = 0;

zram_bvec_rw(zram, &bv, index, offset, is_write);

/* second part: head of zram page 'index + 1' */
bv.bv_len = offset;
bv.bv_offset = PAGE_SIZE - offset;

zram_bvec_rw(zram, &bv, index + 1, 0, is_write);


Thread overview: 17+ messages
2017-04-13  0:17 [PATCH 1/3] zram: fix operator precedence to get offset Minchan Kim
2017-04-13  0:17 ` [PATCH 2/3] zram: do not use copy_page with non-page alinged address Minchan Kim
2017-04-14  5:41   ` Sergey Senozhatsky
2017-04-14 15:40     ` Minchan Kim
2017-04-17  1:48   ` Sergey Senozhatsky
2017-04-13  0:17 ` [PATCH 3/3] zsmalloc: expand class bit Minchan Kim
2017-04-14  5:07 ` [PATCH 1/3] zram: fix operator precedence to get offset Sergey Senozhatsky
2017-04-14 15:33   ` Minchan Kim
2017-04-17  1:21     ` Sergey Senozhatsky
2017-04-17  1:54       ` Sergey Senozhatsky
2017-04-17  2:14         ` Minchan Kim
2017-04-17 10:50           ` Sergey Senozhatsky
2017-04-17 10:53             ` Sergey Senozhatsky
2017-04-17 23:53             ` Minchan Kim
2017-04-18  1:53               ` Sergey Senozhatsky
2017-04-18  2:47                 ` Minchan Kim [this message]
2017-04-17  1:21 ` Sergey Senozhatsky
