* [PATCH 00/60] block: support multipage bvec
@ 2016-10-29  8:07 ` Ming Lei
  0 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:07 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Al Viro, Andrew Morton,
	Bart Van Assche, open list:GFS2 FILE SYSTEM, Coly Li,
	Dan Williams, open list:DEVICE-MAPPER (LVM),
	open list:DRBD DRIVER, Eric Wheeler, Guoqing Jiang,
	Hannes Reinecke, Hannes Reinecke, Jiri Kosina, Joe Perches,
	Johannes Berg, Johannes Thumshirn, Keith Busch, Kent Overstreet,
	Kent Overstreet, open list:BCACHE (BLOCK LAYER CACHE),
	open list:BTRFS FILE SYSTEM, open list:EXT4 FILE SYSTEM,
	open list:F2FS FILE SYSTEM, open list:MEMORY MANAGEMENT,
	open list:NVM EXPRESS TARGET DRIVER, open list:SUSPEND TO RAM,
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	open list:TARGET SUBSYSTEM, open list:XFS FILESYSTEM,
	open list:LogFS, Michal Hocko, Mike Christie, Mike Snitzer,
	Minchan Kim, Minfei Huang, open list:OSD LIBRARY and FILESYSTEM,
	Petr Mladek, Rasmus Villemoes, Takashi Iwai,
	open list:TARGET SUBSYSTEM, Toshi Kani, Yijing Wang, Zheng Liu,
	Zheng Liu

Hi,

This patchset brings multipage bvecs into the block layer. A basic
xfstests run (-a auto) over virtio-blk/virtio-scsi found no
regression, so it should be good enough to show the approach now,
and any comments are welcome!

1) what is multipage bvec?

A multipage bvec means that one 'struct bio_vec' can hold multiple
physically contiguous pages, instead of the single page the Linux
kernel has used for a long time.

2) why is multipage bvec introduced?

Kent proposed the idea[1] first. 

As systems' RAM has become much bigger than before, and huge pages,
transparent huge pages and memory compaction are widely used, it is
now fairly common to see physically contiguous pages inside the
fs/block stack. On the other hand, from the block layer's point of
view it isn't necessary to store the intermediate pages in a bvec;
it is enough to just store the physically contiguous 'segment'.

Also, huge pages are being brought to filesystems[2], and doing IO a
hugepage at a time[3] requires that one bio can transfer at least
one huge page. It turns out that simply raising BIO_MAX_PAGES isn't
flexible enough[3]; multipage bvecs fit this case very well.

With multipage bvec:

- bio size can be increased, which should improve some
high-bandwidth IO cases in theory[4].

- Inside the block layer, both bio splitting and sg mapping can
become more efficient, by traversing each physically contiguous
'segment' instead of each page.

- There is the future possibility of reducing the memory footprint
of bvec usage.

3) how is multipage bvec implemented in this patchset?

The first 22 patches clean up direct access to the bvec table and
add comments on some special cases. With this done, most cases turn
out to be safe for multipage bvecs; only fs/buffer, pktcdvd, dm-io,
MD and btrfs still need to be dealt with.

Since a bit more work is needed to clean up pktcdvd, MD and btrfs,
this patchset introduces QUEUE_FLAG_NO_MP for them, so these
components still see/use singlepage bvecs. Once the cleanup is done,
the flag can be removed.

The 2nd part (patches 23 ~ 60) implements multipage bvecs in the
block layer:

- All the tricks are put into the bvec/bio/rq iterators; as long as
drivers and filesystems use these standard iterators, they work with
multipage bvecs unchanged.

- bio_for_each_segment_all() changes
This helper passes a pointer to each bvec directly to its user, so
it has to be changed. Two new helpers (bio_for_each_segment_all_rd()
and bio_for_each_segment_all_wt()) are introduced.

- bio_clone() changes
By default bio_clone() still clones the new bio the multipage-bvec
way. A single-page version of bio_clone() is also introduced for
special cases in which the cloned bio may only use singlepage bvecs
(bio bounce, ...).

These patches can be found in the following git tree:

	https://github.com/ming1/linux/tree/mp-bvec-0.3-v4.9

Thanks to Christoph for looking at the early version and providing
very good suggestions, such as introducing bio_init_with_vec_table()
and removing other unnecessary helpers.

TODO:
	- cleanup direct access to bvec table for MD & btrfs


[1], http://marc.info/?l=linux-kernel&m=141680246629547&w=2
[2], http://lwn.net/Articles/700781/
[3], http://marc.info/?t=147735447100001&r=1&w=2
[4], http://marc.info/?l=linux-mm&m=147745525801433&w=2


Ming Lei (60):
  block: bio: introduce bio_init_with_vec_table()
  block drivers: convert to bio_init_with_vec_table()
  block: drbd: remove impossible failure handling
  block: floppy: use bio_add_page()
  target: avoid to access .bi_vcnt directly
  bcache: debug: avoid to access .bi_io_vec directly
  dm: crypt: use bio_add_page()
  dm: use bvec iterator helpers to implement .get_page and .next_page
  dm: dm.c: replace 'bio->bi_vcnt == 1' with !bio_multiple_segments
  fs: logfs: convert to bio_add_page() in sync_request()
  fs: logfs: use bio_add_page() in __bdev_writeseg()
  fs: logfs: use bio_add_page() in do_erase()
  fs: logfs: remove unnecesary check
  block: drbd: comment on direct access bvec table
  block: loop: comment on direct access to bvec table
  block: pktcdvd: comment on direct access to bvec table
  kernel/power/swap.c: comment on direct access to bvec table
  mm: page_io.c: comment on direct access to bvec table
  fs/buffer: comment on direct access to bvec table
  f2fs: f2fs_read_end_io: comment on direct access to bvec table
  bcache: comment on direct access to bvec table
  block: comment on bio_alloc_pages()
  block: introduce flag QUEUE_FLAG_NO_MP
  md: set NO_MP for request queue of md
  block: pktcdvd: set NO_MP for pktcdvd request queue
  btrfs: set NO_MP for request queues behind BTRFS
  block: introduce BIO_SP_MAX_SECTORS
  block: introduce QUEUE_FLAG_SPLIT_MP
  dm: limit the max bio size as BIO_SP_MAX_SECTORS << SECTOR_SHIFT
  bcache: set flag of QUEUE_FLAG_SPLIT_MP
  block: introduce multipage/single page bvec helpers
  block: implement sp version of bvec iterator helpers
  block: introduce bio_for_each_segment_mp()
  block: introduce bio_clone_sp()
  bvec_iter: introduce BVEC_ITER_ALL_INIT
  block: bounce: avoid direct access to bvec from bio->bi_io_vec
  block: bounce: don't access bio->bi_io_vec in copy_to_high_bio_irq
  block: bounce: convert multipage bvecs into singlepage
  bcache: debug: switch to bio_clone_sp()
  blk-merge: compute bio->bi_seg_front_size efficiently
  block: blk-merge: try to make front segments in full size
  block: use bio_for_each_segment_mp() to compute segments count
  block: use bio_for_each_segment_mp() to map sg
  block: introduce bvec_for_each_sp_bvec()
  block: bio: introduce bio_for_each_segment_all_rd() and its write pair
  block: deal with dirtying pages for multipage bvec
  block: convert to bio_for_each_segment_all_rd()
  fs/mpage: convert to bio_for_each_segment_all_rd()
  fs/direct-io: convert to bio_for_each_segment_all_rd()
  ext4: convert to bio_for_each_segment_all_rd()
  xfs: convert to bio_for_each_segment_all_rd()
  logfs: convert to bio_for_each_segment_all_rd()
  gfs2: convert to bio_for_each_segment_all_rd()
  f2fs: convert to bio_for_each_segment_all_rd()
  exofs: convert to bio_for_each_segment_all_rd()
  fs: crypto: convert to bio_for_each_segment_all_rd()
  bcache: convert to bio_for_each_segment_all_rd()
  dm-crypt: convert to bio_for_each_segment_all_rd()
  fs/buffer.c: use bvec iterator to truncate the bio
  block: enable multipage bvecs

 block/bio.c                        | 104 ++++++++++++++----
 block/blk-merge.c                  | 216 +++++++++++++++++++++++++++++--------
 block/bounce.c                     |  80 ++++++++++----
 drivers/block/drbd/drbd_bitmap.c   |   1 +
 drivers/block/drbd/drbd_receiver.c |  14 +--
 drivers/block/floppy.c             |  10 +-
 drivers/block/loop.c               |   5 +
 drivers/block/pktcdvd.c            |   8 ++
 drivers/md/bcache/btree.c          |   4 +-
 drivers/md/bcache/debug.c          |  19 +++-
 drivers/md/bcache/io.c             |   4 +-
 drivers/md/bcache/journal.c        |   4 +-
 drivers/md/bcache/movinggc.c       |   7 +-
 drivers/md/bcache/super.c          |  25 +++--
 drivers/md/bcache/util.c           |   7 ++
 drivers/md/bcache/writeback.c      |   6 +-
 drivers/md/dm-bufio.c              |   4 +-
 drivers/md/dm-crypt.c              |  11 +-
 drivers/md/dm-io.c                 |  34 ++++--
 drivers/md/dm-rq.c                 |   3 +-
 drivers/md/dm.c                    |  11 +-
 drivers/md/md.c                    |  12 +++
 drivers/md/raid5.c                 |   9 +-
 drivers/nvme/target/io-cmd.c       |   4 +-
 drivers/target/target_core_pscsi.c |   8 +-
 fs/btrfs/volumes.c                 |   3 +
 fs/buffer.c                        |  24 +++--
 fs/crypto/crypto.c                 |   3 +-
 fs/direct-io.c                     |   4 +-
 fs/exofs/ore.c                     |   3 +-
 fs/exofs/ore_raid.c                |   3 +-
 fs/ext4/page-io.c                  |   3 +-
 fs/ext4/readpage.c                 |   3 +-
 fs/f2fs/data.c                     |  13 ++-
 fs/gfs2/lops.c                     |   3 +-
 fs/gfs2/meta_io.c                  |   3 +-
 fs/logfs/dev_bdev.c                | 110 +++++++------------
 fs/mpage.c                         |   3 +-
 fs/xfs/xfs_aops.c                  |   3 +-
 include/linux/bio.h                | 108 +++++++++++++++++--
 include/linux/blk_types.h          |   6 ++
 include/linux/blkdev.h             |   4 +
 include/linux/bvec.h               | 123 +++++++++++++++++++--
 kernel/power/swap.c                |   2 +
 mm/page_io.c                       |   1 +
 45 files changed, 759 insertions(+), 276 deletions(-)

-- 
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>

^ permalink raw reply	[flat|nested] 148+ messages in thread

* [PATCH 00/60] block: support multipage bvec
@ 2016-10-29  8:07 ` Ming Lei
  0 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:07 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Al Viro, Andrew Morton,
	Bart Van Assche, open list:GFS2 FILE SYSTEM, Coly Li,
	Dan Williams, open list:DEVICE-MAPPER  (LVM),
	open list:DRBD DRIVER, Eric Wheeler, Guoqing Jiang,
	Hannes Reinecke, Hannes Reinecke, Jiri Kosina, Joe Perches,
	Johannes Berg, Johannes Thumshirn, Keith Busch, Kent Overstreet,
	Kent Overstreet, open list:BCACHE (BLOCK LAYER CACHE),
	open list:BTRFS FILE SYSTEM, open list:EXT4 FILE SYSTEM,
	open list:F2FS FILE SYSTEM, open list:MEMORY MANAGEMENT,
	open list:NVM EXPRESS TARGET DRIVER, open list:SUSPEND TO RAM,
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	open list:TARGET SUBSYSTEM, open list:XFS FILESYSTEM,
	open list:LogFS, Michal Hocko, Mike Christie, Mike Snitzer,
	Minchan Kim, Minfei Huang, open list:OSD LIBRARY and FILESYSTEM,
	Petr Mladek, Rasmus Villemoes, Takashi Iwai,
	open list:TARGET SUBSYSTEM, Toshi Kani, Yijing Wang, Zheng Liu,
	Zheng Liu

Hi,

This patchset brings multipage bvec into block layer. Basic
xfstests(-a auto) over virtio-blk/virtio-scsi have been run
and no regression is found, so it should be good enough
to show the approach now, and any comments are welcome!

1) what is multipage bvec?

Multipage bvecs means that one 'struct bio_bvec' can hold
multiple pages which are physically contiguous instead
of one single page used in linux kernel for long time.

2) why is multipage bvec introduced?

Kent proposed the idea[1] first. 

As system's RAM becomes much bigger than before, and 
at the same time huge page, transparent huge page and
memory compaction are widely used, it is a bit easy now
to see physically contiguous pages inside fs/block stack.
On the other hand, from block layer's view, it isn't
necessary to store intermediate pages into bvec, and
it is enough to just store the physicallly contiguous
'segment'.

Also huge pages are being brought to filesystem[2], we
can do IO a hugepage a time[3], requires that one bio can
transfer at least one huge page one time. Turns out it isn't
flexiable to change BIO_MAX_PAGES simply[3]. Multipage bvec
can fit in this case very well.

With multipage bvec:

- bio size can be increased and it should improve some
high-bandwidth IO case in theory[4].

- Inside block layer, both bio splitting and sg map can
become more efficient than before by just traversing the
physically contiguous 'segment' instead of each page.

- there is possibility in future to improve memory footprint
of bvecs usage. 

3) how is multipage bvec implemented in this patchset?

The 1st 22 patches cleanup on direct access to bvec table,
and comments on some special cases. With this approach,
most of cases are found as safe for multipage bvec,
only fs/buffer, pktcdvd, dm-io, MD and btrfs need to deal
with.

Given a little more work is involved to cleanup pktcdvd,
MD and btrfs, this patchset introduces QUEUE_FLAG_NO_MP for
them, and these components can still see/use singlepage bvec.
In the future, once the cleanup is done, the flag can be killed.

The 2nd part(23 ~ 60) implements multipage bvec in block:

- put all tricks into bvec/bio/rq iterators, and as far as
drivers and fs use these standard iterators, they are happy
with multipage bvec

- bio_for_each_segment_all() changes
this helper pass pointer of each bvec directly to user, and
it has to be changed. Two new helpers(bio_for_each_segment_all_rd()
and bio_for_each_segment_all_wt()) are introduced. 

- bio_clone() changes
At default bio_clone still clones one new bio in multipage bvec
way. Also single page version of bio_clone() is introduced
for some special cases, such as only single page bvec is used
for the new cloned bio(bio bounce, ...)

These patches can be found in the following git tree:

	https://github.com/ming1/linux/tree/mp-bvec-0.3-v4.9

Thanks Christoph for looking at the early version and providing
very good suggestions, such as: introduce bio_init_with_vec_table(),
remove another unnecessary helpers for cleanup and so on.

TODO:
	- cleanup direct access to bvec table for MD & btrfs


[1], http://marc.info/?l=linux-kernel&m=141680246629547&w=2
[2], http://lwn.net/Articles/700781/
[3], http://marc.info/?t=147735447100001&r=1&w=2
[4], http://marc.info/?l=linux-mm&m=147745525801433&w=2


Ming Lei (60):
  block: bio: introduce bio_init_with_vec_table()
  block drivers: convert to bio_init_with_vec_table()
  block: drbd: remove impossible failure handling
  block: floppy: use bio_add_page()
  target: avoid to access .bi_vcnt directly
  bcache: debug: avoid to access .bi_io_vec directly
  dm: crypt: use bio_add_page()
  dm: use bvec iterator helpers to implement .get_page and .next_page
  dm: dm.c: replace 'bio->bi_vcnt == 1' with !bio_multiple_segments
  fs: logfs: convert to bio_add_page() in sync_request()
  fs: logfs: use bio_add_page() in __bdev_writeseg()
  fs: logfs: use bio_add_page() in do_erase()
  fs: logfs: remove unnecesary check
  block: drbd: comment on direct access bvec table
  block: loop: comment on direct access to bvec table
  block: pktcdvd: comment on direct access to bvec table
  kernel/power/swap.c: comment on direct access to bvec table
  mm: page_io.c: comment on direct access to bvec table
  fs/buffer: comment on direct access to bvec table
  f2fs: f2fs_read_end_io: comment on direct access to bvec table
  bcache: comment on direct access to bvec table
  block: comment on bio_alloc_pages()
  block: introduce flag QUEUE_FLAG_NO_MP
  md: set NO_MP for request queue of md
  block: pktcdvd: set NO_MP for pktcdvd request queue
  btrfs: set NO_MP for request queues behind BTRFS
  block: introduce BIO_SP_MAX_SECTORS
  block: introduce QUEUE_FLAG_SPLIT_MP
  dm: limit the max bio size as BIO_SP_MAX_SECTORS << SECTOR_SHIFT
  bcache: set flag of QUEUE_FLAG_SPLIT_MP
  block: introduce multipage/single page bvec helpers
  block: implement sp version of bvec iterator helpers
  block: introduce bio_for_each_segment_mp()
  block: introduce bio_clone_sp()
  bvec_iter: introduce BVEC_ITER_ALL_INIT
  block: bounce: avoid direct access to bvec from bio->bi_io_vec
  block: bounce: don't access bio->bi_io_vec in copy_to_high_bio_irq
  block: bounce: convert multipage bvecs into singlepage
  bcache: debug: switch to bio_clone_sp()
  blk-merge: compute bio->bi_seg_front_size efficiently
  block: blk-merge: try to make front segments in full size
  block: use bio_for_each_segment_mp() to compute segments count
  block: use bio_for_each_segment_mp() to map sg
  block: introduce bvec_for_each_sp_bvec()
  block: bio: introduce bio_for_each_segment_all_rd() and its write pair
  block: deal with dirtying pages for multipage bvec
  block: convert to bio_for_each_segment_all_rd()
  fs/mpage: convert to bio_for_each_segment_all_rd()
  fs/direct-io: convert to bio_for_each_segment_all_rd()
  ext4: convert to bio_for_each_segment_all_rd()
  xfs: convert to bio_for_each_segment_all_rd()
  logfs: convert to bio_for_each_segment_all_rd()
  gfs2: convert to bio_for_each_segment_all_rd()
  f2fs: convert to bio_for_each_segment_all_rd()
  exofs: convert to bio_for_each_segment_all_rd()
  fs: crypto: convert to bio_for_each_segment_all_rd()
  bcache: convert to bio_for_each_segment_all_rd()
  dm-crypt: convert to bio_for_each_segment_all_rd()
  fs/buffer.c: use bvec iterator to truncate the bio
  block: enable multipage bvecs

 block/bio.c                        | 104 ++++++++++++++----
 block/blk-merge.c                  | 216 +++++++++++++++++++++++++++++--------
 block/bounce.c                     |  80 ++++++++++----
 drivers/block/drbd/drbd_bitmap.c   |   1 +
 drivers/block/drbd/drbd_receiver.c |  14 +--
 drivers/block/floppy.c             |  10 +-
 drivers/block/loop.c               |   5 +
 drivers/block/pktcdvd.c            |   8 ++
 drivers/md/bcache/btree.c          |   4 +-
 drivers/md/bcache/debug.c          |  19 +++-
 drivers/md/bcache/io.c             |   4 +-
 drivers/md/bcache/journal.c        |   4 +-
 drivers/md/bcache/movinggc.c       |   7 +-
 drivers/md/bcache/super.c          |  25 +++--
 drivers/md/bcache/util.c           |   7 ++
 drivers/md/bcache/writeback.c      |   6 +-
 drivers/md/dm-bufio.c              |   4 +-
 drivers/md/dm-crypt.c              |  11 +-
 drivers/md/dm-io.c                 |  34 ++++--
 drivers/md/dm-rq.c                 |   3 +-
 drivers/md/dm.c                    |  11 +-
 drivers/md/md.c                    |  12 +++
 drivers/md/raid5.c                 |   9 +-
 drivers/nvme/target/io-cmd.c       |   4 +-
 drivers/target/target_core_pscsi.c |   8 +-
 fs/btrfs/volumes.c                 |   3 +
 fs/buffer.c                        |  24 +++--
 fs/crypto/crypto.c                 |   3 +-
 fs/direct-io.c                     |   4 +-
 fs/exofs/ore.c                     |   3 +-
 fs/exofs/ore_raid.c                |   3 +-
 fs/ext4/page-io.c                  |   3 +-
 fs/ext4/readpage.c                 |   3 +-
 fs/f2fs/data.c                     |  13 ++-
 fs/gfs2/lops.c                     |   3 +-
 fs/gfs2/meta_io.c                  |   3 +-
 fs/logfs/dev_bdev.c                | 110 +++++++------------
 fs/mpage.c                         |   3 +-
 fs/xfs/xfs_aops.c                  |   3 +-
 include/linux/bio.h                | 108 +++++++++++++++++--
 include/linux/blk_types.h          |   6 ++
 include/linux/blkdev.h             |   4 +
 include/linux/bvec.h               | 123 +++++++++++++++++++--
 kernel/power/swap.c                |   2 +
 mm/page_io.c                       |   1 +
 45 files changed, 759 insertions(+), 276 deletions(-)

-- 
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 148+ messages in thread

* [PATCH 00/60] block: support multipage bvec
@ 2016-10-29  8:07 ` Ming Lei
  0 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:07 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Al Viro, Andrew Morton,
	Bart Van Assche, open list:GFS2 FILE SYSTEM, Coly Li,
	Dan Williams, open list:DEVICE-MAPPER  (LVM),
	open list:DRBD DRIVER, Eric Wheeler, Guoqing Jiang,
	Hannes Reinecke, Hannes Reinecke, Jiri Kosina, Joe Perches,
	Johannes Berg, Johannes Thumshirn, Keith Busch, Kent

Hi,

This patchset brings multipage bvec into block layer. Basic
xfstests(-a auto) over virtio-blk/virtio-scsi have been run
and no regression is found, so it should be good enough
to show the approach now, and any comments are welcome!

1) what is multipage bvec?

Multipage bvecs means that one 'struct bio_bvec' can hold
multiple pages which are physically contiguous instead
of one single page used in linux kernel for long time.

2) why is multipage bvec introduced?

Kent proposed the idea[1] first. 

As system's RAM becomes much bigger than before, and 
at the same time huge page, transparent huge page and
memory compaction are widely used, it is a bit easy now
to see physically contiguous pages inside fs/block stack.
On the other hand, from block layer's view, it isn't
necessary to store intermediate pages into bvec, and
it is enough to just store the physicallly contiguous
'segment'.

Also huge pages are being brought to filesystem[2], we
can do IO a hugepage a time[3], requires that one bio can
transfer at least one huge page one time. Turns out it isn't
flexiable to change BIO_MAX_PAGES simply[3]. Multipage bvec
can fit in this case very well.

With multipage bvec:

- bio size can be increased and it should improve some
high-bandwidth IO case in theory[4].

- Inside block layer, both bio splitting and sg map can
become more efficient than before by just traversing the
physically contiguous 'segment' instead of each page.

- there is possibility in future to improve memory footprint
of bvecs usage. 

3) how is multipage bvec implemented in this patchset?

The 1st 22 patches cleanup on direct access to bvec table,
and comments on some special cases. With this approach,
most of cases are found as safe for multipage bvec,
only fs/buffer, pktcdvd, dm-io, MD and btrfs need to deal
with.

Given a little more work is involved to cleanup pktcdvd,
MD and btrfs, this patchset introduces QUEUE_FLAG_NO_MP for
them, and these components can still see/use singlepage bvec.
In the future, once the cleanup is done, the flag can be killed.

The 2nd part(23 ~ 60) implements multipage bvec in block:

- put all tricks into bvec/bio/rq iterators, and as far as
drivers and fs use these standard iterators, they are happy
with multipage bvec

- bio_for_each_segment_all() changes
this helper pass pointer of each bvec directly to user, and
it has to be changed. Two new helpers(bio_for_each_segment_all_rd()
and bio_for_each_segment_all_wt()) are introduced. 

- bio_clone() changes
At default bio_clone still clones one new bio in multipage bvec
way. Also single page version of bio_clone() is introduced
for some special cases, such as only single page bvec is used
for the new cloned bio(bio bounce, ...)

These patches can be found in the following git tree:

	https://github.com/ming1/linux/tree/mp-bvec-0.3-v4.9

Thanks Christoph for looking at the early version and providing
very good suggestions, such as: introduce bio_init_with_vec_table(),
remove another unnecessary helpers for cleanup and so on.

TODO:
	- cleanup direct access to bvec table for MD & btrfs


[1], http://marc.info/?l=linux-kernel&m=141680246629547&w=2
[2], http://lwn.net/Articles/700781/
[3], http://marc.info/?t=147735447100001&r=1&w=2
[4], http://marc.info/?l=linux-mm&m=147745525801433&w=2


Ming Lei (60):
  block: bio: introduce bio_init_with_vec_table()
  block drivers: convert to bio_init_with_vec_table()
  block: drbd: remove impossible failure handling
  block: floppy: use bio_add_page()
  target: avoid to access .bi_vcnt directly
  bcache: debug: avoid to access .bi_io_vec directly
  dm: crypt: use bio_add_page()
  dm: use bvec iterator helpers to implement .get_page and .next_page
  dm: dm.c: replace 'bio->bi_vcnt == 1' with !bio_multiple_segments
  fs: logfs: convert to bio_add_page() in sync_request()
  fs: logfs: use bio_add_page() in __bdev_writeseg()
  fs: logfs: use bio_add_page() in do_erase()
  fs: logfs: remove unnecesary check
  block: drbd: comment on direct access bvec table
  block: loop: comment on direct access to bvec table
  block: pktcdvd: comment on direct access to bvec table
  kernel/power/swap.c: comment on direct access to bvec table
  mm: page_io.c: comment on direct access to bvec table
  fs/buffer: comment on direct access to bvec table
  f2fs: f2fs_read_end_io: comment on direct access to bvec table
  bcache: comment on direct access to bvec table
  block: comment on bio_alloc_pages()
  block: introduce flag QUEUE_FLAG_NO_MP
  md: set NO_MP for request queue of md
  block: pktcdvd: set NO_MP for pktcdvd request queue
  btrfs: set NO_MP for request queues behind BTRFS
  block: introduce BIO_SP_MAX_SECTORS
  block: introduce QUEUE_FLAG_SPLIT_MP
  dm: limit the max bio size as BIO_SP_MAX_SECTORS << SECTOR_SHIFT
  bcache: set flag of QUEUE_FLAG_SPLIT_MP
  block: introduce multipage/single page bvec helpers
  block: implement sp version of bvec iterator helpers
  block: introduce bio_for_each_segment_mp()
  block: introduce bio_clone_sp()
  bvec_iter: introduce BVEC_ITER_ALL_INIT
  block: bounce: avoid direct access to bvec from bio->bi_io_vec
  block: bounce: don't access bio->bi_io_vec in copy_to_high_bio_irq
  block: bounce: convert multipage bvecs into singlepage
  bcache: debug: switch to bio_clone_sp()
  blk-merge: compute bio->bi_seg_front_size efficiently
  block: blk-merge: try to make front segments in full size
  block: use bio_for_each_segment_mp() to compute segments count
  block: use bio_for_each_segment_mp() to map sg
  block: introduce bvec_for_each_sp_bvec()
  block: bio: introduce bio_for_each_segment_all_rd() and its write pair
  block: deal with dirtying pages for multipage bvec
  block: convert to bio_for_each_segment_all_rd()
  fs/mpage: convert to bio_for_each_segment_all_rd()
  fs/direct-io: convert to bio_for_each_segment_all_rd()
  ext4: convert to bio_for_each_segment_all_rd()
  xfs: convert to bio_for_each_segment_all_rd()
  logfs: convert to bio_for_each_segment_all_rd()
  gfs2: convert to bio_for_each_segment_all_rd()
  f2fs: convert to bio_for_each_segment_all_rd()
  exofs: convert to bio_for_each_segment_all_rd()
  fs: crypto: convert to bio_for_each_segment_all_rd()
  bcache: convert to bio_for_each_segment_all_rd()
  dm-crypt: convert to bio_for_each_segment_all_rd()
  fs/buffer.c: use bvec iterator to truncate the bio
  block: enable multipage bvecs

 block/bio.c                        | 104 ++++++++++++++----
 block/blk-merge.c                  | 216 +++++++++++++++++++++++++++++--------
 block/bounce.c                     |  80 ++++++++++----
 drivers/block/drbd/drbd_bitmap.c   |   1 +
 drivers/block/drbd/drbd_receiver.c |  14 +--
 drivers/block/floppy.c             |  10 +-
 drivers/block/loop.c               |   5 +
 drivers/block/pktcdvd.c            |   8 ++
 drivers/md/bcache/btree.c          |   4 +-
 drivers/md/bcache/debug.c          |  19 +++-
 drivers/md/bcache/io.c             |   4 +-
 drivers/md/bcache/journal.c        |   4 +-
 drivers/md/bcache/movinggc.c       |   7 +-
 drivers/md/bcache/super.c          |  25 +++--
 drivers/md/bcache/util.c           |   7 ++
 drivers/md/bcache/writeback.c      |   6 +-
 drivers/md/dm-bufio.c              |   4 +-
 drivers/md/dm-crypt.c              |  11 +-
 drivers/md/dm-io.c                 |  34 ++++--
 drivers/md/dm-rq.c                 |   3 +-
 drivers/md/dm.c                    |  11 +-
 drivers/md/md.c                    |  12 +++
 drivers/md/raid5.c                 |   9 +-
 drivers/nvme/target/io-cmd.c       |   4 +-
 drivers/target/target_core_pscsi.c |   8 +-
 fs/btrfs/volumes.c                 |   3 +
 fs/buffer.c                        |  24 +++--
 fs/crypto/crypto.c                 |   3 +-
 fs/direct-io.c                     |   4 +-
 fs/exofs/ore.c                     |   3 +-
 fs/exofs/ore_raid.c                |   3 +-
 fs/ext4/page-io.c                  |   3 +-
 fs/ext4/readpage.c                 |   3 +-
 fs/f2fs/data.c                     |  13 ++-
 fs/gfs2/lops.c                     |   3 +-
 fs/gfs2/meta_io.c                  |   3 +-
 fs/logfs/dev_bdev.c                | 110 +++++++------------
 fs/mpage.c                         |   3 +-
 fs/xfs/xfs_aops.c                  |   3 +-
 include/linux/bio.h                | 108 +++++++++++++++++--
 include/linux/blk_types.h          |   6 ++
 include/linux/blkdev.h             |   4 +
 include/linux/bvec.h               | 123 +++++++++++++++++++--
 kernel/power/swap.c                |   2 +
 mm/page_io.c                       |   1 +
 45 files changed, 759 insertions(+), 276 deletions(-)

-- 
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 148+ messages in thread

* [PATCH 00/60] block: support multipage bvec
@ 2016-10-29  8:07 ` Ming Lei
  0 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:07 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Al Viro, Andrew Morton,
	Bart Van Assche, open list:GFS2 FILE SYSTEM, Coly Li,
	Dan Williams, open list:DEVICE-MAPPER  LVM,
	open list:DRBD DRIVER, Eric Wheeler, Guoqing Jiang,
	Hannes Reinecke, Hannes Reinecke, Jiri Kosina, Joe Perches,
	Johannes Berg, Johannes Thumshirn, Keith Busch, Kent Overstreet,
	Kent Overstreet, open list:BCACHE BLOCK LAYER CACHE,
	open list:BTRFS FILE SYSTEM, open list:EXT4 FILE SYSTEM,
	open list:F2FS FILE SYSTEM, open list:MEMORY MANAGEMENT,
	open list:NVM EXPRESS TARGET DRIVER, open list:SUSPEND TO RAM,
	open list:SOFTWARE RAID Multiple Disks SUPPORT,
	open list:TARGET SUBSYSTEM, open list:XFS FILESYSTEM,
	open list:LogFS, Michal Hocko, Mike Christie, Mike Snitzer,
	Minchan Kim, Minfei Huang, open list:OSD LIBRARY and FILESYSTEM,
	Petr Mladek, Rasmus Villemoes, Takashi Iwai,
	open list:TARGET SUBSYSTEM, Toshi Kani, Yijing Wang, Zheng Liu,
	Zheng Liu

Hi,

This patchset brings multipage bvec into block layer. Basic
xfstests(-a auto) over virtio-blk/virtio-scsi have been run
and no regression is found, so it should be good enough
to show the approach now, and any comments are welcome!

1) what is multipage bvec?

Multipage bvecs means that one 'struct bio_bvec' can hold
multiple pages which are physically contiguous instead
of one single page used in linux kernel for long time.

2) why is multipage bvec introduced?

Kent proposed the idea[1] first. 

As system's RAM becomes much bigger than before, and 
at the same time huge page, transparent huge page and
memory compaction are widely used, it is a bit easy now
to see physically contiguous pages inside fs/block stack.
On the other hand, from block layer's view, it isn't
necessary to store intermediate pages into bvec, and
it is enough to just store the physicallly contiguous
'segment'.

Also huge pages are being brought to filesystem[2], we
can do IO a hugepage a time[3], requires that one bio can
transfer at least one huge page one time. Turns out it isn't
flexiable to change BIO_MAX_PAGES simply[3]. Multipage bvec
can fit in this case very well.

With multipage bvec:

- bio size can be increased and it should improve some
high-bandwidth IO case in theory[4].

- Inside block layer, both bio splitting and sg map can
become more efficient than before by just traversing the
physically contiguous 'segment' instead of each page.

- there is possibility in future to improve memory footprint
of bvecs usage. 

3) how is multipage bvec implemented in this patchset?

The first 22 patches clean up direct accesses to the bvec table
and add comments on some special cases. With this approach, most
cases turn out to be safe for multipage bvecs; only fs/buffer,
pktcdvd, dm-io, MD and btrfs still need to be dealt with.

Since a bit more work is needed to clean up pktcdvd, MD and
btrfs, this patchset introduces QUEUE_FLAG_NO_MP for them, so
these components can still see/use singlepage bvecs. Once the
cleanup is done, the flag can be removed.

The second part (patches 23 ~ 60) implements multipage bvecs in
the block layer:

- put all the tricks into the bvec/bio/rq iterators; as long as
drivers and filesystems use these standard iterators, they work
with multipage bvecs

- bio_for_each_segment_all() changes
This helper passes a pointer to each bvec directly to its user,
so it has to be changed. Two new helpers,
bio_for_each_segment_all_rd() and bio_for_each_segment_all_wt(),
are introduced.

- bio_clone() changes
By default, bio_clone() still clones a new bio in the multipage
bvec way. A singlepage version of bio_clone() is also introduced
for some special cases where only singlepage bvecs may be used
for the newly cloned bio (bio bounce, ...)

These patches can be found in the following git tree:

	https://github.com/ming1/linux/tree/mp-bvec-0.3-v4.9

Thanks to Christoph for looking at the early version and
providing very good suggestions, such as introducing
bio_init_with_vec_table() and removing other unnecessary helpers.

TODO:
	- cleanup direct access to bvec table for MD & btrfs


[1], http://marc.info/?l=linux-kernel&m=141680246629547&w=2
[2], http://lwn.net/Articles/700781/
[3], http://marc.info/?t=147735447100001&r=1&w=2
[4], http://marc.info/?l=linux-mm&m=147745525801433&w=2


Ming Lei (60):
  block: bio: introduce bio_init_with_vec_table()
  block drivers: convert to bio_init_with_vec_table()
  block: drbd: remove impossible failure handling
  block: floppy: use bio_add_page()
  target: avoid to access .bi_vcnt directly
  bcache: debug: avoid to access .bi_io_vec directly
  dm: crypt: use bio_add_page()
  dm: use bvec iterator helpers to implement .get_page and .next_page
  dm: dm.c: replace 'bio->bi_vcnt == 1' with !bio_multiple_segments
  fs: logfs: convert to bio_add_page() in sync_request()
  fs: logfs: use bio_add_page() in __bdev_writeseg()
  fs: logfs: use bio_add_page() in do_erase()
  fs: logfs: remove unnecesary check
  block: drbd: comment on direct access bvec table
  block: loop: comment on direct access to bvec table
  block: pktcdvd: comment on direct access to bvec table
  kernel/power/swap.c: comment on direct access to bvec table
  mm: page_io.c: comment on direct access to bvec table
  fs/buffer: comment on direct access to bvec table
  f2fs: f2fs_read_end_io: comment on direct access to bvec table
  bcache: comment on direct access to bvec table
  block: comment on bio_alloc_pages()
  block: introduce flag QUEUE_FLAG_NO_MP
  md: set NO_MP for request queue of md
  block: pktcdvd: set NO_MP for pktcdvd request queue
  btrfs: set NO_MP for request queues behind BTRFS
  block: introduce BIO_SP_MAX_SECTORS
  block: introduce QUEUE_FLAG_SPLIT_MP
  dm: limit the max bio size as BIO_SP_MAX_SECTORS << SECTOR_SHIFT
  bcache: set flag of QUEUE_FLAG_SPLIT_MP
  block: introduce multipage/single page bvec helpers
  block: implement sp version of bvec iterator helpers
  block: introduce bio_for_each_segment_mp()
  block: introduce bio_clone_sp()
  bvec_iter: introduce BVEC_ITER_ALL_INIT
  block: bounce: avoid direct access to bvec from bio->bi_io_vec
  block: bounce: don't access bio->bi_io_vec in copy_to_high_bio_irq
  block: bounce: convert multipage bvecs into singlepage
  bcache: debug: switch to bio_clone_sp()
  blk-merge: compute bio->bi_seg_front_size efficiently
  block: blk-merge: try to make front segments in full size
  block: use bio_for_each_segment_mp() to compute segments count
  block: use bio_for_each_segment_mp() to map sg
  block: introduce bvec_for_each_sp_bvec()
  block: bio: introduce bio_for_each_segment_all_rd() and its write pair
  block: deal with dirtying pages for multipage bvec
  block: convert to bio_for_each_segment_all_rd()
  fs/mpage: convert to bio_for_each_segment_all_rd()
  fs/direct-io: convert to bio_for_each_segment_all_rd()
  ext4: convert to bio_for_each_segment_all_rd()
  xfs: convert to bio_for_each_segment_all_rd()
  logfs: convert to bio_for_each_segment_all_rd()
  gfs2: convert to bio_for_each_segment_all_rd()
  f2fs: convert to bio_for_each_segment_all_rd()
  exofs: convert to bio_for_each_segment_all_rd()
  fs: crypto: convert to bio_for_each_segment_all_rd()
  bcache: convert to bio_for_each_segment_all_rd()
  dm-crypt: convert to bio_for_each_segment_all_rd()
  fs/buffer.c: use bvec iterator to truncate the bio
  block: enable multipage bvecs

 block/bio.c                        | 104 ++++++++++++++----
 block/blk-merge.c                  | 216 +++++++++++++++++++++++++++++--------
 block/bounce.c                     |  80 ++++++++++----
 drivers/block/drbd/drbd_bitmap.c   |   1 +
 drivers/block/drbd/drbd_receiver.c |  14 +--
 drivers/block/floppy.c             |  10 +-
 drivers/block/loop.c               |   5 +
 drivers/block/pktcdvd.c            |   8 ++
 drivers/md/bcache/btree.c          |   4 +-
 drivers/md/bcache/debug.c          |  19 +++-
 drivers/md/bcache/io.c             |   4 +-
 drivers/md/bcache/journal.c        |   4 +-
 drivers/md/bcache/movinggc.c       |   7 +-
 drivers/md/bcache/super.c          |  25 +++--
 drivers/md/bcache/util.c           |   7 ++
 drivers/md/bcache/writeback.c      |   6 +-
 drivers/md/dm-bufio.c              |   4 +-
 drivers/md/dm-crypt.c              |  11 +-
 drivers/md/dm-io.c                 |  34 ++++--
 drivers/md/dm-rq.c                 |   3 +-
 drivers/md/dm.c                    |  11 +-
 drivers/md/md.c                    |  12 +++
 drivers/md/raid5.c                 |   9 +-
 drivers/nvme/target/io-cmd.c       |   4 +-
 drivers/target/target_core_pscsi.c |   8 +-
 fs/btrfs/volumes.c                 |   3 +
 fs/buffer.c                        |  24 +++--
 fs/crypto/crypto.c                 |   3 +-
 fs/direct-io.c                     |   4 +-
 fs/exofs/ore.c                     |   3 +-
 fs/exofs/ore_raid.c                |   3 +-
 fs/ext4/page-io.c                  |   3 +-
 fs/ext4/readpage.c                 |   3 +-
 fs/f2fs/data.c                     |  13 ++-
 fs/gfs2/lops.c                     |   3 +-
 fs/gfs2/meta_io.c                  |   3 +-
 fs/logfs/dev_bdev.c                | 110 +++++++------------
 fs/mpage.c                         |   3 +-
 fs/xfs/xfs_aops.c                  |   3 +-
 include/linux/bio.h                | 108 +++++++++++++++++--
 include/linux/blk_types.h          |   6 ++
 include/linux/blkdev.h             |   4 +
 include/linux/bvec.h               | 123 +++++++++++++++++++--
 kernel/power/swap.c                |   2 +
 mm/page_io.c                       |   1 +
 45 files changed, 759 insertions(+), 276 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 148+ messages in thread

* [PATCH 01/60] block: bio: introduce bio_init_with_vec_table()
  2016-10-29  8:07 ` Ming Lei
                   ` (3 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  2016-10-29 15:21   ` Christoph Hellwig
  -1 siblings, 1 reply; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Mike Christie, Hannes Reinecke,
	Keith Busch, Mike Snitzer

Some drivers use an external bvec table, so introduce this
helper for that case. Accessing bio->bi_io_vec this way is
always safe for such users.

After converting to this helper, it becomes a bit easier to
audit the remaining direct accesses to bio->bi_io_vec, which
helps prepare for the following multipage bvec support.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/linux/bio.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 97cb48f03dc7..8634bd24984c 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -664,6 +664,16 @@ static inline void bio_inc_remaining(struct bio *bio)
 	atomic_inc(&bio->__bi_remaining);
 }
 
+static inline void bio_init_with_vec_table(struct bio *bio,
+					   struct bio_vec *table,
+					   unsigned max_vecs)
+{
+	bio_init(bio);
+	bio->bi_io_vec = table;
+	bio->bi_max_vecs = max_vecs;
+}
+
+
 /*
  * bio_set is used to allow other portions of the IO system to
  * allocate their own private memory pools for bio and iovec structures.
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 02/60] block drivers: convert to bio_init_with_vec_table()
  2016-10-29  8:07 ` Ming Lei
@ 2016-10-29  8:08   ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Jiri Kosina, Kent Overstreet,
	Shaohua Li, Alasdair Kergon, Mike Snitzer,
	maintainer:DEVICE-MAPPER LVM, Christoph Hellwig, Sagi Grimberg,
	Joern Engel, Prasad Joshi, Mike Christie, Hannes Reinecke,
	Rasmus Villemoes, Johannes Thumshirn, Guoqing Jiang

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/block/floppy.c        |  3 +--
 drivers/md/bcache/io.c        |  4 +---
 drivers/md/bcache/journal.c   |  4 +---
 drivers/md/bcache/movinggc.c  |  7 +++----
 drivers/md/bcache/super.c     | 13 ++++---------
 drivers/md/bcache/writeback.c |  6 +++---
 drivers/md/dm-bufio.c         |  4 +---
 drivers/md/raid5.c            |  9 ++-------
 drivers/nvme/target/io-cmd.c  |  4 +---
 fs/logfs/dev_bdev.c           |  4 +---
 10 files changed, 18 insertions(+), 40 deletions(-)

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index e3d8e4ced4a2..cdc916a95137 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -3806,8 +3806,7 @@ static int __floppy_read_block_0(struct block_device *bdev, int drive)
 
 	cbdata.drive = drive;
 
-	bio_init(&bio);
-	bio.bi_io_vec = &bio_vec;
+	bio_init_with_vec_table(&bio, &bio_vec, 1);
 	bio_vec.bv_page = page;
 	bio_vec.bv_len = size;
 	bio_vec.bv_offset = 0;
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index e97b0acf7b8d..af9489087cd3 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -24,9 +24,7 @@ struct bio *bch_bbio_alloc(struct cache_set *c)
 	struct bbio *b = mempool_alloc(c->bio_meta, GFP_NOIO);
 	struct bio *bio = &b->bio;
 
-	bio_init(bio);
-	bio->bi_max_vecs	 = bucket_pages(c);
-	bio->bi_io_vec		 = bio->bi_inline_vecs;
+	bio_init_with_vec_table(bio, bio->bi_inline_vecs, bucket_pages(c));
 
 	return bio;
 }
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 6925023e12d4..b966f28d1b98 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -448,13 +448,11 @@ static void do_journal_discard(struct cache *ca)
 
 		atomic_set(&ja->discard_in_flight, DISCARD_IN_FLIGHT);
 
-		bio_init(bio);
+		bio_init_with_vec_table(bio, bio->bi_inline_vecs, 1);
 		bio_set_op_attrs(bio, REQ_OP_DISCARD, 0);
 		bio->bi_iter.bi_sector	= bucket_to_sector(ca->set,
 						ca->sb.d[ja->discard_idx]);
 		bio->bi_bdev		= ca->bdev;
-		bio->bi_max_vecs	= 1;
-		bio->bi_io_vec		= bio->bi_inline_vecs;
 		bio->bi_iter.bi_size	= bucket_bytes(ca);
 		bio->bi_end_io		= journal_discard_endio;
 
diff --git a/drivers/md/bcache/movinggc.c b/drivers/md/bcache/movinggc.c
index 5c4bddecfaf0..9d7991f69030 100644
--- a/drivers/md/bcache/movinggc.c
+++ b/drivers/md/bcache/movinggc.c
@@ -77,15 +77,14 @@ static void moving_init(struct moving_io *io)
 {
 	struct bio *bio = &io->bio.bio;
 
-	bio_init(bio);
+	bio_init_with_vec_table(bio, bio->bi_inline_vecs,
+				DIV_ROUND_UP(KEY_SIZE(&io->w->key),
+					     PAGE_SECTORS));
 	bio_get(bio);
 	bio_set_prio(bio, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0));
 
 	bio->bi_iter.bi_size	= KEY_SIZE(&io->w->key) << 9;
-	bio->bi_max_vecs	= DIV_ROUND_UP(KEY_SIZE(&io->w->key),
-					       PAGE_SECTORS);
 	bio->bi_private		= &io->cl;
-	bio->bi_io_vec		= bio->bi_inline_vecs;
 	bch_bio_map(bio, NULL);
 }
 
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 849ad441cd76..d8a6d807b498 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1152,9 +1152,7 @@ static void register_bdev(struct cache_sb *sb, struct page *sb_page,
 	dc->bdev = bdev;
 	dc->bdev->bd_holder = dc;
 
-	bio_init(&dc->sb_bio);
-	dc->sb_bio.bi_max_vecs	= 1;
-	dc->sb_bio.bi_io_vec	= dc->sb_bio.bi_inline_vecs;
+	bio_init_with_vec_table(&dc->sb_bio, dc->sb_bio.bi_inline_vecs, 1);
 	dc->sb_bio.bi_io_vec[0].bv_page = sb_page;
 	get_page(sb_page);
 
@@ -1814,9 +1812,8 @@ static int cache_alloc(struct cache *ca)
 	__module_get(THIS_MODULE);
 	kobject_init(&ca->kobj, &bch_cache_ktype);
 
-	bio_init(&ca->journal.bio);
-	ca->journal.bio.bi_max_vecs = 8;
-	ca->journal.bio.bi_io_vec = ca->journal.bio.bi_inline_vecs;
+	bio_init_with_vec_table(&ca->journal.bio,
+				ca->journal.bio.bi_inline_vecs, 8);
 
 	free = roundup_pow_of_two(ca->sb.nbuckets) >> 10;
 
@@ -1852,9 +1849,7 @@ static int register_cache(struct cache_sb *sb, struct page *sb_page,
 	ca->bdev = bdev;
 	ca->bdev->bd_holder = ca;
 
-	bio_init(&ca->sb_bio);
-	ca->sb_bio.bi_max_vecs	= 1;
-	ca->sb_bio.bi_io_vec	= ca->sb_bio.bi_inline_vecs;
+	bio_init_with_vec_table(&ca->sb_bio, ca->sb_bio.bi_inline_vecs, 1);
 	ca->sb_bio.bi_io_vec[0].bv_page = sb_page;
 	get_page(sb_page);
 
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index e51644e503a5..b2568cef8c86 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -106,14 +106,14 @@ static void dirty_init(struct keybuf_key *w)
 	struct dirty_io *io = w->private;
 	struct bio *bio = &io->bio;
 
-	bio_init(bio);
+	bio_init_with_vec_table(bio, bio->bi_inline_vecs,
+				DIV_ROUND_UP(KEY_SIZE(&w->key),
+					     PAGE_SECTORS));
 	if (!io->dc->writeback_percent)
 		bio_set_prio(bio, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0));
 
 	bio->bi_iter.bi_size	= KEY_SIZE(&w->key) << 9;
-	bio->bi_max_vecs	= DIV_ROUND_UP(KEY_SIZE(&w->key), PAGE_SECTORS);
 	bio->bi_private		= w;
-	bio->bi_io_vec		= bio->bi_inline_vecs;
 	bch_bio_map(bio, NULL);
 }
 
diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index 125aedc3875f..5b13e7e7c8aa 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -611,9 +611,7 @@ static void use_inline_bio(struct dm_buffer *b, int rw, sector_t block,
 	char *ptr;
 	int len;
 
-	bio_init(&b->bio);
-	b->bio.bi_io_vec = b->bio_vec;
-	b->bio.bi_max_vecs = DM_BUFIO_INLINE_VECS;
+	bio_init_with_vec_table(&b->bio, b->bio_vec, DM_BUFIO_INLINE_VECS);
 	b->bio.bi_iter.bi_sector = block << b->c->sectors_per_block_bits;
 	b->bio.bi_bdev = b->c->bdev;
 	b->bio.bi_end_io = inline_endio;
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 92ac251e91e6..eae7b4cf34d4 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2004,13 +2004,8 @@ static struct stripe_head *alloc_stripe(struct kmem_cache *sc, gfp_t gfp,
 		for (i = 0; i < disks; i++) {
 			struct r5dev *dev = &sh->dev[i];
 
-			bio_init(&dev->req);
-			dev->req.bi_io_vec = &dev->vec;
-			dev->req.bi_max_vecs = 1;
-
-			bio_init(&dev->rreq);
-			dev->rreq.bi_io_vec = &dev->rvec;
-			dev->rreq.bi_max_vecs = 1;
+			bio_init_with_vec_table(&dev->req, &dev->vec, 1);
+			bio_init_with_vec_table(&dev->rreq, &dev->rvec, 1);
 		}
 	}
 	return sh;
diff --git a/drivers/nvme/target/io-cmd.c b/drivers/nvme/target/io-cmd.c
index 4a96c2049b7b..6a32b0b68b1e 100644
--- a/drivers/nvme/target/io-cmd.c
+++ b/drivers/nvme/target/io-cmd.c
@@ -37,9 +37,7 @@ static void nvmet_inline_bio_init(struct nvmet_req *req)
 {
 	struct bio *bio = &req->inline_bio;
 
-	bio_init(bio);
-	bio->bi_max_vecs = NVMET_MAX_INLINE_BIOVEC;
-	bio->bi_io_vec = req->inline_bvec;
+	bio_init_with_vec_table(bio, req->inline_bvec, NVMET_MAX_INLINE_BIOVEC);
 }
 
 static void nvmet_execute_rw(struct nvmet_req *req)
diff --git a/fs/logfs/dev_bdev.c b/fs/logfs/dev_bdev.c
index a8329cc47dec..2bf53b0ffe83 100644
--- a/fs/logfs/dev_bdev.c
+++ b/fs/logfs/dev_bdev.c
@@ -19,9 +19,7 @@ static int sync_request(struct page *page, struct block_device *bdev, int op)
 	struct bio bio;
 	struct bio_vec bio_vec;
 
-	bio_init(&bio);
-	bio.bi_max_vecs = 1;
-	bio.bi_io_vec = &bio_vec;
+	bio_init_with_vec_table(&bio, &bio_vec, 1);
 	bio_vec.bv_page = page;
 	bio_vec.bv_len = PAGE_SIZE;
 	bio_vec.bv_offset = 0;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 03/60] block: drbd: remove impossible failure handling
  2016-10-29  8:07 ` Ming Lei
                   ` (5 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  2016-10-31 15:25   ` Christoph Hellwig
  -1 siblings, 1 reply; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Philipp Reisner, Lars Ellenberg,
	open list:DRBD DRIVER

For a non-cloned bio, bio_add_page() only fails when the io vec
table is full, and in that case bio->bi_vcnt can never be zero.

So remove this impossible failure handling.

Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/block/drbd/drbd_receiver.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index 942384f34e22..c537e3bd09eb 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1648,20 +1648,8 @@ int drbd_submit_peer_request(struct drbd_device *device,
 
 	page_chain_for_each(page) {
 		unsigned len = min_t(unsigned, data_size, PAGE_SIZE);
-		if (!bio_add_page(bio, page, len, 0)) {
-			/* A single page must always be possible!
-			 * But in case it fails anyways,
-			 * we deal with it, and complain (below). */
-			if (bio->bi_vcnt == 0) {
-				drbd_err(device,
-					"bio_add_page failed for len=%u, "
-					"bi_vcnt=0 (bi_sector=%llu)\n",
-					len, (uint64_t)bio->bi_iter.bi_sector);
-				err = -ENOSPC;
-				goto fail;
-			}
+		if (!bio_add_page(bio, page, len, 0))
 			goto next_bio;
-		}
 		data_size -= len;
 		sector += len >> 9;
 		--nr_pages;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 04/60] block: floppy: use bio_add_page()
  2016-10-29  8:07 ` Ming Lei
                   ` (6 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  2016-10-31 15:26   ` Christoph Hellwig
  2016-11-10 19:35   ` Christoph Hellwig
  -1 siblings, 2 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Jiri Kosina, Mike Christie,
	Hannes Reinecke, Dan Williams

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/block/floppy.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index cdc916a95137..999099d9509d 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -3807,11 +3807,6 @@ static int __floppy_read_block_0(struct block_device *bdev, int drive)
 	cbdata.drive = drive;
 
 	bio_init_with_vec_table(&bio, &bio_vec, 1);
-	bio_vec.bv_page = page;
-	bio_vec.bv_len = size;
-	bio_vec.bv_offset = 0;
-	bio.bi_vcnt = 1;
-	bio.bi_iter.bi_size = size;
 	bio.bi_bdev = bdev;
 	bio.bi_iter.bi_sector = 0;
 	bio.bi_flags |= (1 << BIO_QUIET);
@@ -3819,6 +3814,8 @@ static int __floppy_read_block_0(struct block_device *bdev, int drive)
 	bio.bi_end_io = floppy_rb0_cb;
 	bio_set_op_attrs(&bio, REQ_OP_READ, 0);
 
+	bio_add_page(&bio, page, size, 0);
+
 	submit_bio(&bio);
 	process_fd_request();
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 05/60] target: avoid to access .bi_vcnt directly
  2016-10-29  8:07 ` Ming Lei
@ 2016-10-29  8:08   ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Nicholas A. Bellinger,
	open list:TARGET SUBSYSTEM, open list:TARGET SUBSYSTEM

When the bio is full, bio_add_pc_page() returns zero, so use
that return value to detect a full bio.

Also replace the access to .bi_vcnt in pr_debug() with bio_segments().

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/target/target_core_pscsi.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
index 9125d9358dea..04d7aa7390d0 100644
--- a/drivers/target/target_core_pscsi.c
+++ b/drivers/target/target_core_pscsi.c
@@ -935,13 +935,9 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
 
 			rc = bio_add_pc_page(pdv->pdv_sd->request_queue,
 					bio, page, bytes, off);
-			if (rc != bytes)
-				goto fail;
-
 			pr_debug("PSCSI: bio->bi_vcnt: %d nr_vecs: %d\n",
-				bio->bi_vcnt, nr_vecs);
-
-			if (bio->bi_vcnt > nr_vecs) {
+				bio_segments(bio), nr_vecs);
+			if (rc != bytes) {
 				pr_debug("PSCSI: Reached bio->bi_vcnt max:"
 					" %d i: %d bio: %p, allocating another"
 					" bio\n", bio->bi_vcnt, i, bio);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 06/60] bcache: debug: avoid to access .bi_io_vec directly
  2016-10-29  8:07 ` Ming Lei
@ 2016-10-29  8:08   ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Kent Overstreet, Shaohua Li,
	Mike Christie, Hannes Reinecke, Guoqing Jiang,
	open list:BCACHE BLOCK LAYER CACHE,
	open list:SOFTWARE RAID Multiple Disks SUPPORT

Use the standard bvec iterator to walk the cloned bio instead of
indexing its .bi_io_vec table directly.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/bcache/debug.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 333a1e5f6ae6..430f3050663c 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -107,8 +107,8 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
 {
 	char name[BDEVNAME_SIZE];
 	struct bio *check;
-	struct bio_vec bv;
-	struct bvec_iter iter;
+	struct bio_vec bv, cbv;
+	struct bvec_iter iter, citer = { 0 };
 
 	check = bio_clone(bio, GFP_NOIO);
 	if (!check)
@@ -120,9 +120,13 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
 
 	submit_bio_wait(check);
 
+	citer.bi_size = UINT_MAX;
 	bio_for_each_segment(bv, bio, iter) {
 		void *p1 = kmap_atomic(bv.bv_page);
-		void *p2 = page_address(check->bi_io_vec[iter.bi_idx].bv_page);
+		void *p2;
+
+		cbv = bio_iter_iovec(check, citer);
+		p2 = page_address(cbv.bv_page);
 
 		cache_set_err_on(memcmp(p1 + bv.bv_offset,
 					p2 + bv.bv_offset,
@@ -133,6 +137,7 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
 				 (uint64_t) bio->bi_iter.bi_sector);
 
 		kunmap_atomic(p1);
+		bio_advance_iter(check, &citer, bv.bv_len);
 	}
 
 	bio_free_pages(check);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 07/60] dm: crypt: use bio_add_page()
  2016-10-29  8:07 ` Ming Lei
@ 2016-10-29  8:08   ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Alasdair Kergon, Mike Snitzer,
	maintainer:DEVICE-MAPPER LVM, Shaohua Li,
	open list:SOFTWARE RAID Multiple Disks SUPPORT

We have a standard interface for adding a page to a bio, so use
it instead of open-coding the bvec table manipulation.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/dm-crypt.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index a2768835d394..4999c7497f95 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -994,7 +994,6 @@ static struct bio *crypt_alloc_buffer(struct dm_crypt_io *io, unsigned size)
 	gfp_t gfp_mask = GFP_NOWAIT | __GFP_HIGHMEM;
 	unsigned i, len, remaining_size;
 	struct page *page;
-	struct bio_vec *bvec;
 
 retry:
 	if (unlikely(gfp_mask & __GFP_DIRECT_RECLAIM))
@@ -1019,12 +1018,7 @@ static struct bio *crypt_alloc_buffer(struct dm_crypt_io *io, unsigned size)
 
 		len = (remaining_size > PAGE_SIZE) ? PAGE_SIZE : remaining_size;
 
-		bvec = &clone->bi_io_vec[clone->bi_vcnt++];
-		bvec->bv_page = page;
-		bvec->bv_len = len;
-		bvec->bv_offset = 0;
-
-		clone->bi_iter.bi_size += len;
+		bio_add_page(clone, page, len, 0);
 
 		remaining_size -= len;
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 08/60] dm: use bvec iterator helpers to implement .get_page and .next_page
  2016-10-29  8:07 ` Ming Lei
@ 2016-10-29  8:08   ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Alasdair Kergon, Mike Snitzer,
	maintainer:DEVICE-MAPPER LVM, Shaohua Li,
	open list:SOFTWARE RAID Multiple Disks SUPPORT

Firstly, we have mature bvec/bio iterator helpers for iterating
over each page in a bio, so there is no need to reinvent the wheel.

Secondly, the coming multipage bvec support requires this patch.

Also add comments about the direct access to the bvec table.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/dm-io.c | 34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 0bf1a12e35fe..2ef573c220fc 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -162,7 +162,10 @@ struct dpages {
 			 struct page **p, unsigned long *len, unsigned *offset);
 	void (*next_page)(struct dpages *dp);
 
-	unsigned context_u;
+	union {
+		unsigned context_u;
+		struct bvec_iter context_bi;
+	};
 	void *context_ptr;
 
 	void *vma_invalidate_address;
@@ -204,25 +207,36 @@ static void list_dp_init(struct dpages *dp, struct page_list *pl, unsigned offse
 static void bio_get_page(struct dpages *dp, struct page **p,
 			 unsigned long *len, unsigned *offset)
 {
-	struct bio_vec *bvec = dp->context_ptr;
-	*p = bvec->bv_page;
-	*len = bvec->bv_len - dp->context_u;
-	*offset = bvec->bv_offset + dp->context_u;
+	struct bio_vec bv = bvec_iter_bvec((struct bio_vec *)dp->context_ptr,
+			dp->context_bi);
+
+	*p = bv.bv_page;
+	*len = bv.bv_len;
+	*offset = bv.bv_offset;
+
+	/* avoid to figure out it in bio_next_page() again */
+	dp->context_bi.bi_sector = (sector_t)bv.bv_len;
 }
 
 static void bio_next_page(struct dpages *dp)
 {
-	struct bio_vec *bvec = dp->context_ptr;
-	dp->context_ptr = bvec + 1;
-	dp->context_u = 0;
+	unsigned int len = (unsigned int)dp->context_bi.bi_sector;
+
+	bvec_iter_advance((struct bio_vec *)dp->context_ptr,
+			&dp->context_bi, len);
 }
 
 static void bio_dp_init(struct dpages *dp, struct bio *bio)
 {
 	dp->get_page = bio_get_page;
 	dp->next_page = bio_next_page;
-	dp->context_ptr = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
-	dp->context_u = bio->bi_iter.bi_bvec_done;
+
+	/*
+	 * We just use bvec iterator to retrieve pages, so it is ok to
+	 * access the bvec table directly here
+	 */
+	dp->context_ptr = bio->bi_io_vec;
+	dp->context_bi = bio->bi_iter;
 }
 
 /*
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 09/60] dm: dm.c: replace 'bio->bi_vcnt == 1' with !bio_multiple_segments
  2016-10-29  8:07 ` Ming Lei
@ 2016-10-29  8:08   ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Alasdair Kergon, Mike Snitzer,
	maintainer:DEVICE-MAPPER LVM, Shaohua Li,
	open list:SOFTWARE RAID Multiple Disks SUPPORT

Avoid accessing .bi_vcnt directly, because it may no longer be what
the driver expects once multipage bvecs are supported.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/dm-rq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 1d0d2adc050a..8534cbf8ce35 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -819,7 +819,8 @@ static void dm_old_request_fn(struct request_queue *q)
 			pos = blk_rq_pos(rq);
 
 		if ((dm_old_request_peeked_before_merge_deadline(md) &&
-		     md_in_flight(md) && rq->bio && rq->bio->bi_vcnt == 1 &&
+		     md_in_flight(md) && rq->bio &&
+		     !bio_multiple_segments(rq->bio) &&
 		     md->last_rq_pos == pos && md->last_rq_rw == rq_data_dir(rq)) ||
 		    (ti->type->busy && ti->type->busy(ti))) {
 			blk_delay_queue(q, 10);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 10/60] fs: logfs: convert to bio_add_page() in sync_request()
  2016-10-29  8:07 ` Ming Lei
                   ` (12 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Joern Engel, Prasad Joshi,
	open list:LogFS

bio_add_page() is the standard and preferred way to do this.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 fs/logfs/dev_bdev.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/logfs/dev_bdev.c b/fs/logfs/dev_bdev.c
index 2bf53b0ffe83..696dcdd65fdd 100644
--- a/fs/logfs/dev_bdev.c
+++ b/fs/logfs/dev_bdev.c
@@ -20,15 +20,12 @@ static int sync_request(struct page *page, struct block_device *bdev, int op)
 	struct bio_vec bio_vec;
 
 	bio_init_with_vec_table(&bio, &bio_vec, 1);
-	bio_vec.bv_page = page;
-	bio_vec.bv_len = PAGE_SIZE;
-	bio_vec.bv_offset = 0;
-	bio.bi_vcnt = 1;
 	bio.bi_bdev = bdev;
 	bio.bi_iter.bi_sector = page->index * (PAGE_SIZE >> 9);
-	bio.bi_iter.bi_size = PAGE_SIZE;
 	bio_set_op_attrs(&bio, op, 0);
 
+	bio_add_page(&bio, page, PAGE_SIZE, 0);
+
 	return submit_bio_wait(&bio);
 }
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 11/60] fs: logfs: use bio_add_page() in __bdev_writeseg()
  2016-10-29  8:07 ` Ming Lei
                   ` (13 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  2016-10-31 15:29   ` Christoph Hellwig
  -1 siblings, 1 reply; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Joern Engel, Prasad Joshi,
	open list:LogFS

This patch also simplifies the code a bit.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 fs/logfs/dev_bdev.c | 51 ++++++++++++++++++++-------------------------------
 1 file changed, 20 insertions(+), 31 deletions(-)

diff --git a/fs/logfs/dev_bdev.c b/fs/logfs/dev_bdev.c
index 696dcdd65fdd..79be4cb0dfd8 100644
--- a/fs/logfs/dev_bdev.c
+++ b/fs/logfs/dev_bdev.c
@@ -72,56 +72,45 @@ static int __bdev_writeseg(struct super_block *sb, u64 ofs, pgoff_t index,
 {
 	struct logfs_super *super = logfs_super(sb);
 	struct address_space *mapping = super->s_mapping_inode->i_mapping;
-	struct bio *bio;
+	struct bio *bio = NULL;
 	struct page *page;
 	unsigned int max_pages;
-	int i;
+	int i, ret;
 
 	max_pages = min_t(size_t, nr_pages, BIO_MAX_PAGES);
 
-	bio = bio_alloc(GFP_NOFS, max_pages);
-	BUG_ON(!bio);
-
 	for (i = 0; i < nr_pages; i++) {
-		if (i >= max_pages) {
-			/* Block layer cannot split bios :( */
-			bio->bi_vcnt = i;
-			bio->bi_iter.bi_size = i * PAGE_SIZE;
+		if (!bio) {
+			bio = bio_alloc(GFP_NOFS, max_pages);
+			BUG_ON(!bio);
+
 			bio->bi_bdev = super->s_bdev;
 			bio->bi_iter.bi_sector = ofs >> 9;
 			bio->bi_private = sb;
 			bio->bi_end_io = writeseg_end_io;
 			bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
-			atomic_inc(&super->s_pending_writes);
-			submit_bio(bio);
-
-			ofs += i * PAGE_SIZE;
-			index += i;
-			nr_pages -= i;
-			i = 0;
-
-			bio = bio_alloc(GFP_NOFS, max_pages);
-			BUG_ON(!bio);
 		}
 		page = find_lock_page(mapping, index + i);
 		BUG_ON(!page);
-		bio->bi_io_vec[i].bv_page = page;
-		bio->bi_io_vec[i].bv_len = PAGE_SIZE;
-		bio->bi_io_vec[i].bv_offset = 0;
+		ret = bio_add_page(bio, page, PAGE_SIZE, 0);
 
 		BUG_ON(PageWriteback(page));
 		set_page_writeback(page);
 		unlock_page(page);
+
+		if (!ret) {
+			/* Block layer cannot split bios :( */
+			ofs += bio->bi_iter.bi_size;
+			atomic_inc(&super->s_pending_writes);
+			submit_bio(bio);
+			bio = NULL;
+		}
+	}
+
+	if (bio) {
+		atomic_inc(&super->s_pending_writes);
+		submit_bio(bio);
 	}
-	bio->bi_vcnt = nr_pages;
-	bio->bi_iter.bi_size = nr_pages * PAGE_SIZE;
-	bio->bi_bdev = super->s_bdev;
-	bio->bi_iter.bi_sector = ofs >> 9;
-	bio->bi_private = sb;
-	bio->bi_end_io = writeseg_end_io;
-	bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
-	atomic_inc(&super->s_pending_writes);
-	submit_bio(bio);
 	return 0;
 }
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 12/60] fs: logfs: use bio_add_page() in do_erase()
  2016-10-29  8:07 ` Ming Lei
                   ` (14 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  2016-10-31 15:29   ` Christoph Hellwig
  -1 siblings, 1 reply; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Joern Engel, Prasad Joshi,
	open list:LogFS

The code also gets simplified a bit.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 fs/logfs/dev_bdev.c | 44 +++++++++++++++-----------------------------
 1 file changed, 15 insertions(+), 29 deletions(-)

diff --git a/fs/logfs/dev_bdev.c b/fs/logfs/dev_bdev.c
index 79be4cb0dfd8..ff5e3e31bca3 100644
--- a/fs/logfs/dev_bdev.c
+++ b/fs/logfs/dev_bdev.c
@@ -154,49 +154,35 @@ static int do_erase(struct super_block *sb, u64 ofs, pgoff_t index,
 		size_t nr_pages)
 {
 	struct logfs_super *super = logfs_super(sb);
-	struct bio *bio;
+	struct bio *bio = NULL;
 	unsigned int max_pages;
-	int i;
+	int i, ret;
 
 	max_pages = min_t(size_t, nr_pages, BIO_MAX_PAGES);
 
-	bio = bio_alloc(GFP_NOFS, max_pages);
-	BUG_ON(!bio);
-
 	for (i = 0; i < nr_pages; i++) {
-		if (i >= max_pages) {
-			/* Block layer cannot split bios :( */
-			bio->bi_vcnt = i;
-			bio->bi_iter.bi_size = i * PAGE_SIZE;
+		if (!bio) {
+			bio = bio_alloc(GFP_NOFS, max_pages);
+			BUG_ON(!bio);
+
 			bio->bi_bdev = super->s_bdev;
 			bio->bi_iter.bi_sector = ofs >> 9;
 			bio->bi_private = sb;
 			bio->bi_end_io = erase_end_io;
 			bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
+		}
+		ret = bio_add_page(bio, super->s_erase_page, PAGE_SIZE, 0);
+		if (!ret) {
+			/* Block layer cannot split bios :( */
+			ofs += bio->bi_iter.bi_size;
 			atomic_inc(&super->s_pending_writes);
 			submit_bio(bio);
-
-			ofs += i * PAGE_SIZE;
-			index += i;
-			nr_pages -= i;
-			i = 0;
-
-			bio = bio_alloc(GFP_NOFS, max_pages);
-			BUG_ON(!bio);
 		}
-		bio->bi_io_vec[i].bv_page = super->s_erase_page;
-		bio->bi_io_vec[i].bv_len = PAGE_SIZE;
-		bio->bi_io_vec[i].bv_offset = 0;
 	}
-	bio->bi_vcnt = nr_pages;
-	bio->bi_iter.bi_size = nr_pages * PAGE_SIZE;
-	bio->bi_bdev = super->s_bdev;
-	bio->bi_iter.bi_sector = ofs >> 9;
-	bio->bi_private = sb;
-	bio->bi_end_io = erase_end_io;
-	bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
-	atomic_inc(&super->s_pending_writes);
-	submit_bio(bio);
+	if (bio) {
+		atomic_inc(&super->s_pending_writes);
+		submit_bio(bio);
+	}
 	return 0;
 }
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 13/60] fs: logfs: remove unnecessary check
  2016-10-29  8:07 ` Ming Lei
                   ` (15 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  2016-10-31 15:29   ` Christoph Hellwig
  -1 siblings, 1 reply; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Joern Engel, Prasad Joshi,
	open list:LogFS

The check on bio->bi_vcnt doesn't make sense in erase_end_io().

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 fs/logfs/dev_bdev.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/logfs/dev_bdev.c b/fs/logfs/dev_bdev.c
index ff5e3e31bca3..f05a02ff43e6 100644
--- a/fs/logfs/dev_bdev.c
+++ b/fs/logfs/dev_bdev.c
@@ -144,7 +144,6 @@ static void erase_end_io(struct bio *bio)
 	struct logfs_super *super = logfs_super(sb); 
 
 	BUG_ON(bio->bi_error); /* FIXME: Retry io or write elsewhere */ 
-	BUG_ON(bio->bi_vcnt == 0); 
 	bio_put(bio); 
 	if (atomic_dec_and_test(&super->s_pending_writes))
 		wake_up(&wq); 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 14/60] block: drbd: comment on direct access bvec table
  2016-10-29  8:07 ` Ming Lei
                   ` (16 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Philipp Reisner, Lars Ellenberg,
	open list:DRBD DRIVER

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/block/drbd/drbd_bitmap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/block/drbd/drbd_bitmap.c b/drivers/block/drbd/drbd_bitmap.c
index ab62b81c2ca7..ce9506da30ad 100644
--- a/drivers/block/drbd/drbd_bitmap.c
+++ b/drivers/block/drbd/drbd_bitmap.c
@@ -953,6 +953,7 @@ static void drbd_bm_endio(struct bio *bio)
 	struct drbd_bm_aio_ctx *ctx = bio->bi_private;
 	struct drbd_device *device = ctx->device;
 	struct drbd_bitmap *b = device->bitmap;
+	/* single page bio, safe for multipage bvec */
 	unsigned int idx = bm_page_to_idx(bio->bi_io_vec[0].bv_page);
 
 	if ((ctx->flags & BM_AIO_COPY_PAGES) == 0 &&
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 15/60] block: loop: comment on direct access to bvec table
  2016-10-29  8:07 ` Ming Lei
                   ` (17 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  2016-10-31 15:31   ` Christoph Hellwig
  -1 siblings, 1 reply; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Hannes Reinecke, Mike Christie,
	Minfei Huang, Petr Mladek

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/block/loop.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index fa1b7a90ba11..55ce4226590d 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -485,6 +485,11 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 	/* nomerge for loop request queue */
 	WARN_ON(cmd->rq->bio != cmd->rq->biotail);
 
+	/*
+	 * For multipage bvec support, it is safe to pass the bvec
+	 * table to iov iterator, because iov iter still uses bvec
+	 * iter helpers to traverse bvec.
+	 */
 	bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
 	iov_iter_bvec(&iter, ITER_BVEC | rw, bvec,
 		      bio_segments(bio), blk_rq_bytes(cmd->rq));
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 16/60] block: pktcdvd: comment on direct access to bvec table
  2016-10-29  8:07 ` Ming Lei
                   ` (18 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  2016-10-31 15:33   ` Christoph Hellwig
  -1 siblings, 1 reply; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Jiri Kosina

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/block/pktcdvd.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 90fa4ac149db..817d2cc17d01 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -1298,6 +1298,11 @@ static int pkt_handle_queue(struct pktcdvd_device *pd)
 static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
 {
 	int f;
+
+	/*
+	 * This usage needs to be fixed to support multipage bvecs,
+	 * because the table can be changed in pkt_make_local_copy().
+	 */
 	struct bio_vec *bvec = pkt->w_bio->bi_io_vec;
 
 	bio_reset(pkt->w_bio);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 17/60] kernel/power/swap.c: comment on direct access to bvec table
  2016-10-29  8:07 ` Ming Lei
@ 2016-10-29  8:08   ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Rafael J. Wysocki, Pavel Machek,
	Len Brown, open list:HIBERNATION (aka Software Suspend,
	aka swsusp)

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 kernel/power/swap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index a3b1e617bcdc..8bc13a4461bc 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -238,6 +238,8 @@ static void hib_init_batch(struct hib_bio_batch *hb)
 static void hib_end_io(struct bio *bio)
 {
 	struct hib_bio_batch *hb = bio->bi_private;
+
+	/* single page bio, safe for multipage bvec */
 	struct page *page = bio->bi_io_vec[0].bv_page;
 
 	if (bio->bi_error) {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 18/60] mm: page_io.c: comment on direct access to bvec table
  2016-10-29  8:07 ` Ming Lei
  (?)
@ 2016-10-29  8:08   ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Andrew Morton, Michal Hocko,
	Minchan Kim, Mike Christie, Santosh Shilimkar, Joe Perches,
	open list:MEMORY MANAGEMENT

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 mm/page_io.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/page_io.c b/mm/page_io.c
index a2651f58c86a..b0c0069ec1f4 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -43,6 +43,7 @@ static struct bio *get_swap_bio(gfp_t gfp_flags,
 
 void end_swap_bio_write(struct bio *bio)
 {
+	/* single page bio, safe for multipage bvec */
 	struct page *page = bio->bi_io_vec[0].bv_page;
 
 	if (bio->bi_error) {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 19/60] fs/buffer: comment on direct access to bvec table
  2016-10-29  8:07 ` Ming Lei
                   ` (21 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  2016-10-31 15:35   ` Christoph Hellwig
  -1 siblings, 1 reply; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Alexander Viro

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 fs/buffer.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index b205a629001d..81c3793948b4 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3018,8 +3018,13 @@ static void end_bio_bh_io_sync(struct bio *bio)
 void guard_bio_eod(int op, struct bio *bio)
 {
 	sector_t maxsector;
-	struct bio_vec *bvec = &bio->bi_io_vec[bio->bi_vcnt - 1];
 	unsigned truncated_bytes;
+	/*
+	 * It is safe to truncate the last bvec in the following way
+	 * even after multipage bvec is supported, but the parameters
+	 * passed to zero_user() need to be fixed.
+	 */
+	struct bio_vec *bvec = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
 	maxsector = i_size_read(bio->bi_bdev->bd_inode) >> 9;
 	if (!maxsector)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 20/60] f2fs: f2fs_read_end_io: comment on direct access to bvec table
  2016-10-29  8:07 ` Ming Lei
@ 2016-10-29  8:08   ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Jaegeuk Kim, Chao Yu,
	open list:F2FS FILE SYSTEM

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 fs/f2fs/data.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 9ae194fd2fdb..24f6f6977d37 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -35,6 +35,10 @@ static void f2fs_read_end_io(struct bio *bio)
 	int i;
 
 #ifdef CONFIG_F2FS_FAULT_INJECTION
+	/*
+	 * It is still safe to retrieve the 1st page of the bio
+	 * in this way after multipage bvec is supported.
+	 */
 	if (time_to_inject(F2FS_P_SB(bio->bi_io_vec->bv_page), FAULT_IO))
 		bio->bi_error = -EIO;
 #endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 21/60] bcache: comment on direct access to bvec table
  2016-10-29  8:07 ` Ming Lei
@ 2016-10-29  8:08   ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Kent Overstreet, Shaohua Li,
	Mike Christie, Hannes Reinecke, Guoqing Jiang, Jiri Kosina,
	Zheng Liu, Eric Wheeler, Yijing Wang, Coly Li, Al Viro,
	open list:BCACHE BLOCK LAYER CACHE,
	open list:SOFTWARE RAID Multiple Disks SUPPORT

All of these direct accesses look safe after multipage bvec is supported.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/bcache/btree.c | 1 +
 drivers/md/bcache/super.c | 6 ++++++
 drivers/md/bcache/util.c  | 7 +++++++
 3 files changed, 14 insertions(+)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 81d3db40cd7b..b419bc91ba32 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -428,6 +428,7 @@ static void do_btree_node_write(struct btree *b)
 
 		continue_at(cl, btree_node_write_done, NULL);
 	} else {
+		/* No harm for multipage bvec since the bio is newly allocated */
 		b->bio->bi_vcnt = 0;
 		bch_bio_map(b->bio, i);
 
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index d8a6d807b498..52876fcf2b36 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -207,6 +207,7 @@ static void write_bdev_super_endio(struct bio *bio)
 
 static void __write_super(struct cache_sb *sb, struct bio *bio)
 {
+	/* single page bio, safe for multipage bvec */
 	struct cache_sb *out = page_address(bio->bi_io_vec[0].bv_page);
 	unsigned i;
 
@@ -1153,6 +1154,8 @@ static void register_bdev(struct cache_sb *sb, struct page *sb_page,
 	dc->bdev->bd_holder = dc;
 
 	bio_init_with_vec_table(&dc->sb_bio, dc->sb_bio.bi_inline_vecs, 1);
+
+	/* single page bio, safe for multipage bvec */
 	dc->sb_bio.bi_io_vec[0].bv_page = sb_page;
 	get_page(sb_page);
 
@@ -1794,6 +1797,7 @@ void bch_cache_release(struct kobject *kobj)
 	for (i = 0; i < RESERVE_NR; i++)
 		free_fifo(&ca->free[i]);
 
+	/* single page bio, safe for multipage bvec */
 	if (ca->sb_bio.bi_inline_vecs[0].bv_page)
 		put_page(ca->sb_bio.bi_io_vec[0].bv_page);
 
@@ -1850,6 +1854,8 @@ static int register_cache(struct cache_sb *sb, struct page *sb_page,
 	ca->bdev->bd_holder = ca;
 
 	bio_init_with_vec_table(&ca->sb_bio, ca->sb_bio.bi_inline_vecs, 1);
+
+	/* single page bio, safe for multipage bvec */
 	ca->sb_bio.bi_io_vec[0].bv_page = sb_page;
 	get_page(sb_page);
 
diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
index dde6172f3f10..5cc0b49a65fb 100644
--- a/drivers/md/bcache/util.c
+++ b/drivers/md/bcache/util.c
@@ -222,6 +222,13 @@ uint64_t bch_next_delay(struct bch_ratelimit *d, uint64_t done)
 		: 0;
 }
 
+/*
+ * Generally it isn't good to access .bi_io_vec and .bi_vcnt
+ * directly; the preferred way is bio_add_page().  But in
+ * this case, bch_bio_map() assumes that the bvec table
+ * is empty, so it is safe to access .bi_vcnt & .bi_io_vec
+ * in this way even after multipage bvec is supported.
+ */
 void bch_bio_map(struct bio *bio, void *base)
 {
 	size_t size = bio->bi_iter.bi_size;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 22/60] block: comment on bio_alloc_pages()
  2016-10-29  8:07 ` Ming Lei
@ 2016-10-29  8:08   ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Jens Axboe, Kent Overstreet,
	Shaohua Li, Mike Christie, Guoqing Jiang, Hannes Reinecke,
	open list:BCACHE BLOCK LAYER CACHE,
	open list:SOFTWARE RAID Multiple Disks SUPPORT

This patch adds a comment on the usage of bio_alloc_pages(),
and also comments on one special case in bch_data_verify().

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/bio.c               | 4 +++-
 drivers/md/bcache/debug.c | 6 ++++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/block/bio.c b/block/bio.c
index db85c5753a76..a49d1d89a85c 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -907,7 +907,9 @@ EXPORT_SYMBOL(bio_advance);
  * @bio: bio to allocate pages for
  * @gfp_mask: flags for allocation
  *
- * Allocates pages up to @bio->bi_vcnt.
+ * Allocates pages up to @bio->bi_vcnt, and this function should only
+ * be called on a newly initialized bio, which means no page has been
+ * added to the bio via bio_add_page() yet.
  *
  * Returns 0 on success, -ENOMEM on failure. On failure, any allocated pages are
  * freed.
diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 430f3050663c..71a9f05918eb 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -110,6 +110,12 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
 	struct bio_vec bv, cbv;
 	struct bvec_iter iter, citer = { 0 };
 
+	/*
+	 * Once multipage bvec is supported, bio_clone()
+	 * has to make sure the page count in this bio can be
+	 * held in the cloned bio, because each single page
+	 * needs to be assigned to one bvec of the new bio.
+	 */
 	check = bio_clone(bio, GFP_NOIO);
 	if (!check)
 		return;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 23/60] block: introduce flag QUEUE_FLAG_NO_MP
  2016-10-29  8:07 ` Ming Lei
                   ` (25 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  2016-10-29 15:29   ` Christoph Hellwig
  -1 siblings, 1 reply; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Mike Christie, Hannes Reinecke,
	Dan Williams, Toshi Kani

It is a bit difficult for MD (especially raid1 and raid10) to support
multipage bvecs, so introduce this flag to keep multipage bvecs
disabled. MD can then still accept singlepage-bvec bios only, and once
the direct accesses to the bvec table in MD and other fs/drivers are
cleaned up, the flag can be removed. BTRFS has a similar issue.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/linux/blkdev.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c47c358ba052..e4dd25361bd6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -505,6 +505,7 @@ struct request_queue {
 #define QUEUE_FLAG_FUA	       24	/* device supports FUA writes */
 #define QUEUE_FLAG_FLUSH_NQ    25	/* flush not queueuable */
 #define QUEUE_FLAG_DAX         26	/* device supports DAX */
+#define QUEUE_FLAG_NO_MP       27	/* multipage bvecs isn't ready */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_STACKABLE)	|	\
@@ -595,6 +596,7 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
 #define blk_queue_secure_erase(q) \
 	(test_bit(QUEUE_FLAG_SECERASE, &(q)->queue_flags))
 #define blk_queue_dax(q)	test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
+#define blk_queue_no_mp(q)	test_bit(QUEUE_FLAG_NO_MP, &(q)->queue_flags)
 
 #define blk_noretry_request(rq) \
 	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 24/60] md: set NO_MP for request queue of md
  2016-10-29  8:07 ` Ming Lei
@ 2016-10-29  8:08   ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Shaohua Li,
	open list:SOFTWARE RAID Multiple Disks SUPPORT

MD isn't ready for multipage bvecs, so mark it as
NO_MP.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/md.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index eac84d8ff724..f8d98098dff8 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5128,6 +5128,16 @@ static void md_safemode_timeout(unsigned long data)
 
 static int start_dirty_degraded;
 
+/*
+ * MD isn't ready for multipage bvecs yet, so set the flag
+ * to make sure MD only sees singlepage-bvec bios
+ */
+static inline void md_set_no_mp(struct mddev *mddev)
+{
+	if (mddev->queue)
+		set_bit(QUEUE_FLAG_NO_MP, &mddev->queue->queue_flags);
+}
+
 int md_run(struct mddev *mddev)
 {
 	int err;
@@ -5353,6 +5363,8 @@ int md_run(struct mddev *mddev)
 	if (mddev->flags & MD_UPDATE_SB_FLAGS)
 		md_update_sb(mddev, 0);
 
+	md_set_no_mp(mddev);
+
 	md_new_event(mddev);
 	sysfs_notify_dirent_safe(mddev->sysfs_state);
 	sysfs_notify_dirent_safe(mddev->sysfs_action);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 25/60] block: pktcdvd: set NO_MP for pktcdvd request queue
  2016-10-29  8:07 ` Ming Lei
                   ` (27 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Jiri Kosina

At least pkt_start_write() operates on the bvec table directly, so
pktcdvd isn't ready for multipage bvecs yet; mark the queue with the
flag now.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/block/pktcdvd.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 817d2cc17d01..403c93b46ea3 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2518,6 +2518,9 @@ static void pkt_init_queue(struct pktcdvd_device *pd)
 	blk_queue_logical_block_size(q, CD_FRAMESIZE);
 	blk_queue_max_hw_sectors(q, PACKET_MAX_SECTORS);
 	q->queuedata = pd;
+
+	/* not ready for multipage bvec yet */
+	set_bit(QUEUE_FLAG_NO_MP, &q->queue_flags);
 }
 
 static int pkt_seq_show(struct seq_file *m, void *p)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 26/60] btrfs: set NO_MP for request queues behind BTRFS
  2016-10-29  8:07 ` Ming Lei
                   ` (28 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  2016-10-31 15:36   ` Christoph Hellwig
  -1 siblings, 1 reply; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Chris Mason, Josef Bacik,
	David Sterba, open list:BTRFS FILE SYSTEM

There are lots of direct accesses to .bi_vcnt & .bi_io_vec of bios
in BTRFS, so it isn't ready to support multipage bvecs; set NO_MP
on these request queues.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 fs/btrfs/volumes.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 71a60cc01451..2e7237a3b84d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1011,6 +1011,9 @@ static int __btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 		if (blk_queue_discard(q))
 			device->can_discard = 1;
 
+		/* BTRFS isn't ready to support multipage bvecs */
+		set_bit(QUEUE_FLAG_NO_MP, &q->queue_flags);
+
 		device->bdev = bdev;
 		device->in_fs_metadata = 0;
 		device->mode = flags;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 27/60] block: introduce BIO_SP_MAX_SECTORS
  2016-10-29  8:07 ` Ming Lei
                   ` (29 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Mike Christie, Hannes Reinecke,
	Keith Busch, Mike Snitzer

This macro is needed when a bio based on multipage bvecs is
converted to one based on singlepage bvecs; for example, bio bounce
requires singlepage bvecs.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/linux/bio.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 8634bd24984c..fa71f6a57f81 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -40,6 +40,9 @@
 
 #define BIO_MAX_PAGES		256
 
+/* Max sectors of bio with singlepage bvec */
+#define BIO_SP_MAX_SECTORS     (BIO_MAX_PAGES * (PAGE_SIZE >> 9))
+
 #define bio_prio(bio)			(bio)->bi_ioprio
 #define bio_set_prio(bio, prio)		((bio)->bi_ioprio = prio)
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 28/60] block: introduce QUEUE_FLAG_SPLIT_MP
  2016-10-29  8:07 ` Ming Lei
                   ` (30 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  2016-10-31 15:39   ` Christoph Hellwig
  -1 siblings, 1 reply; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Jens Axboe, Hannes Reinecke,
	Mike Christie, Dan Williams, Toshi Kani

Some drivers (such as dm) are capable of dealing with multipage
bvecs, but the incoming bio may be too big: for example, a new
singlepage-bvec bio can't be cloned from it, or a singlepage-bvec
bio of the same size can't be allocated.

At least dm-crypt, log-writes and bcache have this kind of issue.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/blk-merge.c      | 4 ++++
 include/linux/blkdev.h | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 2642e5fc8b69..266c94d1d82f 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -79,6 +79,10 @@ static inline unsigned get_max_io_size(struct request_queue *q,
 	/* aligned to logical block size */
 	sectors &= ~(mask >> 9);
 
+	/* some queues can't handle a bigger bio even if they are ready for mp bvecs */
+	if (blk_queue_split_mp(q) && sectors > BIO_SP_MAX_SECTORS)
+		sectors = BIO_SP_MAX_SECTORS;
+
 	return sectors;
 }
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e4dd25361bd6..7cee0179c9e6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -506,6 +506,7 @@ struct request_queue {
 #define QUEUE_FLAG_FLUSH_NQ    25	/* flush not queueuable */
 #define QUEUE_FLAG_DAX         26	/* device supports DAX */
 #define QUEUE_FLAG_NO_MP       27	/* multipage bvecs isn't ready */
+#define QUEUE_FLAG_SPLIT_MP    28	/* split MP bvecs if too big */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_STACKABLE)	|	\
@@ -597,6 +598,7 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
 	(test_bit(QUEUE_FLAG_SECERASE, &(q)->queue_flags))
 #define blk_queue_dax(q)	test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
 #define blk_queue_no_mp(q)	test_bit(QUEUE_FLAG_NO_MP, &(q)->queue_flags)
+#define blk_queue_split_mp(q)	test_bit(QUEUE_FLAG_SPLIT_MP, &(q)->queue_flags)
 
 #define blk_noretry_request(rq) \
 	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 29/60] dm: limit the max bio size as BIO_SP_MAX_SECTORS << SECTOR_SHIFT
  2016-10-29  8:07 ` Ming Lei
@ 2016-10-29  8:08   ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Alasdair Kergon, Mike Snitzer,
	maintainer:DEVICE-MAPPER LVM, Shaohua Li,
	open list:SOFTWARE RAID Multiple Disks SUPPORT

For BIO-based DM, some targets, such as the crypt and log-writes
targets, aren't ready to deal with an incoming bio bigger than
1 Mbyte.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/dm.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index ef7bf1dd6900..ce454c6c1a4e 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -899,7 +899,16 @@ int dm_set_target_max_io_len(struct dm_target *ti, sector_t len)
 		return -EINVAL;
 	}
 
-	ti->max_io_len = (uint32_t) len;
+	/*
+	 * A BIO based queue uses its own splitting. When multipage bvecs
+	 * are switched on, the incoming bio may be too big to be
+	 * handled by some targets, such as crypt and log-writes.
+	 *
+	 * When these targets are ready for the big bio, we can remove
+	 * the limit.
+	 */
+	ti->max_io_len = min_t(uint32_t, len,
+			       BIO_SP_MAX_SECTORS << SECTOR_SHIFT);
 
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 30/60] bcache: set flag of QUEUE_FLAG_SPLIT_MP
  2016-10-29  8:07 ` Ming Lei
@ 2016-10-29  8:08   ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Kent Overstreet, Shaohua Li,
	Eric Wheeler, Coly Li, Yijing Wang, Zheng Liu, Mike Christie,
	open list:BCACHE BLOCK LAYER CACHE,
	open list:SOFTWARE RAID Multiple Disks SUPPORT

It isn't safe to let bcache deal with a bio bigger than 1 Mbyte
built from multipage bvecs (see bch_data_verify() for example),
so set this flag so that the size of an incoming bio won't exceed
BIO_SP_MAX_SECTORS sectors.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/bcache/super.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 52876fcf2b36..fca023a1a026 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -821,6 +821,12 @@ static int bcache_device_init(struct bcache_device *d, unsigned block_size,
 
 	blk_queue_write_cache(q, true, true);
 
+	/*
+	 * Once bcache has been audited and is ready to deal with big
+	 * incoming bios with multipage bvecs, we can remove this flag.
+	 */
+	set_bit(QUEUE_FLAG_SPLIT_MP,	&d->disk->queue->queue_flags);
+
 	return 0;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 31/60] block: introduce multipage/single page bvec helpers
  2016-10-29  8:07 ` Ming Lei
                   ` (33 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Johannes Berg

This patch introduces helpers which are suffixed with _mp
and _sp for the multipage bvec/segment support.

The helpers with the _mp suffix are the interfaces for treating
one bvec/segment as a real multipage one; for example, .bv_len
is the total length of the multipage segment.

The helpers with the _sp suffix are interfaces supporting the
current bvec iterator, which drivers, filesystems, dm, etc. treat
as singlepage-only. These _sp helpers build singlepage bvecs on
the fly, so users of the bio/bvec iterator keep working and
needn't change even though we store multipage bvecs.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/linux/bvec.h | 57 +++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 54 insertions(+), 3 deletions(-)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 89b65b82d98f..da984fa171bc 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -24,6 +24,42 @@
 #include <linux/bug.h>
 
 /*
+ * What are multipage bvecs (segments)?
+ *
+ * - a bvec stored in bio->bi_io_vec is always a multipage-style vector
+ *
+ * - a bvec (struct bio_vec) represents one physically contiguous I/O
+ *   buffer; now the buffer may include more than one page since
+ *   multipage (mp) bvecs are supported, and all the pages represented
+ *   by one bvec are physically contiguous. Before mp support, at most
+ *   one page could be included in one bvec; we call that a
+ *   singlepage (sp) bvec.
+ *
+ * - .bv_page of the bvec represents the 1st page in the mp segment
+ *
+ * - .bv_offset of the bvec represents the offset of the buffer in the bvec
+ *
+ * The effect on the current drivers/filesystems/dm/bcache/...:
+ *
+ * - almost everyone assumes that one bvec includes only a single
+ *   page, so we keep the sp interfaces unchanged; for example,
+ *   bio_for_each_segment() still returns bvecs with a single page
+ *
+ * - bio_for_each_segment_all() will be changed to return singlepage
+ *   bvecs too
+ *
+ * - during iteration, the iterator variable (struct bvec_iter) is
+ *   always updated in multipage bvec style, which means
+ *   bvec_iter_advance() is kept unchanged
+ *
+ * - returned (copied) singlepage bvecs are generated in flight by
+ *   bvec helpers
+ *
+ * - in case some components (such as iov_iter) need to support mp
+ *   segments, we introduce new helpers (suffixed with _mp) for them.
+ */
+
+/*
  * was unsigned short, but we might as well be ready for > 64kB I/O pages
  */
 struct bio_vec {
@@ -49,16 +85,31 @@ struct bvec_iter {
  */
 #define __bvec_iter_bvec(bvec, iter)	(&(bvec)[(iter).bi_idx])
 
-#define bvec_iter_page(bvec, iter)				\
+#define bvec_iter_page_mp(bvec, iter)				\
 	(__bvec_iter_bvec((bvec), (iter))->bv_page)
 
-#define bvec_iter_len(bvec, iter)				\
+#define bvec_iter_len_mp(bvec, iter)				\
 	min((iter).bi_size,					\
 	    __bvec_iter_bvec((bvec), (iter))->bv_len - (iter).bi_bvec_done)
 
-#define bvec_iter_offset(bvec, iter)				\
+#define bvec_iter_offset_mp(bvec, iter)				\
 	(__bvec_iter_bvec((bvec), (iter))->bv_offset + (iter).bi_bvec_done)
 
+/*
+ * <page, offset, length> of an sp segment.
+ *
+ * These helpers will be implemented for building sp bvecs in flight.
+ *
+ */
+#define bvec_iter_offset_sp(bvec, iter)	bvec_iter_offset_mp((bvec), (iter))
+#define bvec_iter_len_sp(bvec, iter)	bvec_iter_len_mp((bvec), (iter))
+#define bvec_iter_page_sp(bvec, iter)	bvec_iter_page_mp((bvec), (iter))
+
+/* current interfaces support sp style at default */
+#define bvec_iter_page(bvec, iter)	bvec_iter_page_sp((bvec), (iter))
+#define bvec_iter_len(bvec, iter)	bvec_iter_len_sp((bvec), (iter))
+#define bvec_iter_offset(bvec, iter)	bvec_iter_offset_sp((bvec), (iter))
+
 #define bvec_iter_bvec(bvec, iter)				\
 ((struct bio_vec) {						\
 	.bv_page	= bvec_iter_page((bvec), (iter)),	\
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 32/60] block: implement sp version of bvec iterator helpers
  2016-10-29  8:07 ` Ming Lei
                   ` (34 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  2016-10-29 11:06   ` kbuild test robot
  -1 siblings, 1 reply; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Johannes Berg

This patch implements the singlepage version of the following
3 helpers:
	- bvec_iter_offset_sp()
	- bvec_iter_len_sp()
	- bvec_iter_page_sp()

This allows one multipage bvec to be split into singlepage
bvecs, keeping users of the current bvec iterator happy.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/linux/bvec.h | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index da984fa171bc..12c53a0eee52 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -22,6 +22,7 @@
 
 #include <linux/kernel.h>
 #include <linux/bug.h>
+#include <linux/mm.h>
 
 /*
  * What are multipage bvecs (segments)?
@@ -95,15 +96,25 @@ struct bvec_iter {
 #define bvec_iter_offset_mp(bvec, iter)				\
 	(__bvec_iter_bvec((bvec), (iter))->bv_offset + (iter).bi_bvec_done)
 
+#define bvec_iter_page_idx_mp(bvec, iter)			\
+	(bvec_iter_offset_mp((bvec), (iter)) / PAGE_SIZE)
+
 /*
  * <page, offset, length> of an sp segment.
  *
  * These helpers will be implemented for building sp bvecs in flight.
  *
  */
-#define bvec_iter_offset_sp(bvec, iter)	bvec_iter_offset_mp((bvec), (iter))
-#define bvec_iter_len_sp(bvec, iter)	bvec_iter_len_mp((bvec), (iter))
-#define bvec_iter_page_sp(bvec, iter)	bvec_iter_page_mp((bvec), (iter))
+#define bvec_iter_offset_sp(bvec, iter)					\
+	(bvec_iter_offset_mp((bvec), (iter)) % PAGE_SIZE)
+
+#define bvec_iter_len_sp(bvec, iter)					\
+	min_t(unsigned, bvec_iter_len_mp((bvec), (iter)),		\
+	    (PAGE_SIZE - (bvec_iter_offset_sp((bvec), (iter)))))
+
+#define bvec_iter_page_sp(bvec, iter)					\
+	nth_page(bvec_iter_page_mp((bvec), (iter)),			\
+		 bvec_iter_page_idx_mp((bvec), (iter)))
 
 /* current interfaces support sp style at default */
 #define bvec_iter_page(bvec, iter)	bvec_iter_page_sp((bvec), (iter))
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 33/60] block: introduce bio_for_each_segment_mp()
  2016-10-29  8:07 ` Ming Lei
                   ` (35 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Mike Christie, Hannes Reinecke,
	Keith Busch, Mike Snitzer, Johannes Berg

This helper is used to iterate over multipage bvecs and is
required by bio_clone().

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/linux/bio.h  | 38 +++++++++++++++++++++++++++++++++-----
 include/linux/bvec.h | 37 ++++++++++++++++++++++++++++++++-----
 2 files changed, 65 insertions(+), 10 deletions(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index fa71f6a57f81..17852ba0e40f 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -65,6 +65,9 @@
 #define bio_sectors(bio)	((bio)->bi_iter.bi_size >> 9)
 #define bio_end_sector(bio)	((bio)->bi_iter.bi_sector + bio_sectors((bio)))
 
+#define bio_iter_iovec_mp(bio, iter)				\
+	bvec_iter_bvec_mp((bio)->bi_io_vec, (iter))
+
 /*
  * Check whether this bio carries any data or not. A NULL bio is allowed.
  */
@@ -167,15 +170,31 @@ static inline void *bio_data(struct bio *bio)
 #define bio_for_each_segment_all(bvl, bio, i)				\
 	for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
 
-static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
-				    unsigned bytes)
+static inline void __bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
+				      unsigned bytes, bool mp)
 {
 	iter->bi_sector += bytes >> 9;
 
-	if (bio_no_advance_iter(bio))
+	if (bio_no_advance_iter(bio)) {
 		iter->bi_size -= bytes;
-	else
-		bvec_iter_advance(bio->bi_io_vec, iter, bytes);
+	} else {
+		if (!mp)
+			bvec_iter_advance(bio->bi_io_vec, iter, bytes);
+		else
+			bvec_iter_advance_mp(bio->bi_io_vec, iter, bytes);
+	}
+}
+
+static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
+				    unsigned bytes)
+{
+	__bio_advance_iter(bio, iter, bytes, false);
+}
+
+static inline void bio_advance_iter_mp(struct bio *bio, struct bvec_iter *iter,
+				       unsigned bytes)
+{
+	__bio_advance_iter(bio, iter, bytes, true);
 }
 
 #define __bio_for_each_segment(bvl, bio, iter, start)			\
@@ -187,6 +206,15 @@ static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
 #define bio_for_each_segment(bvl, bio, iter)				\
 	__bio_for_each_segment(bvl, bio, iter, (bio)->bi_iter)
 
+#define __bio_for_each_segment_mp(bvl, bio, iter, start)		\
+	for (iter = (start);						\
+	     (iter).bi_size &&						\
+		((bvl = bio_iter_iovec_mp((bio), (iter))), 1);		\
+	     bio_advance_iter_mp((bio), &(iter), (bvl).bv_len))
+
+#define bio_for_each_segment_mp(bvl, bio, iter)				\
+	__bio_for_each_segment_mp(bvl, bio, iter, (bio)->bi_iter)
+
 #define bio_iter_last(bvec, iter) ((iter).bi_size == (bvec).bv_len)
 
 static inline unsigned bio_segments(struct bio *bio)
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 12c53a0eee52..9df9e582bd3f 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -128,16 +128,29 @@ struct bvec_iter {
 	.bv_offset	= bvec_iter_offset((bvec), (iter)),	\
 })
 
-static inline void bvec_iter_advance(const struct bio_vec *bv,
-				     struct bvec_iter *iter,
-				     unsigned bytes)
+#define bvec_iter_bvec_mp(bvec, iter)				\
+((struct bio_vec) {						\
+	.bv_page	= bvec_iter_page_mp((bvec), (iter)),	\
+	.bv_len		= bvec_iter_len_mp((bvec), (iter)),	\
+	.bv_offset	= bvec_iter_offset_mp((bvec), (iter)),	\
+})
+
+static inline void __bvec_iter_advance(const struct bio_vec *bv,
+				       struct bvec_iter *iter,
+				       unsigned bytes, bool mp)
 {
 	WARN_ONCE(bytes > iter->bi_size,
 		  "Attempted to advance past end of bvec iter\n");
 
 	while (bytes) {
-		unsigned iter_len = bvec_iter_len(bv, *iter);
-		unsigned len = min(bytes, iter_len);
+		unsigned len;
+
+		if (mp)
+			len = bvec_iter_len_mp(bv, *iter);
+		else
+			len = bvec_iter_len_sp(bv, *iter);
+
+		len = min(bytes, len);
 
 		bytes -= len;
 		iter->bi_size -= len;
@@ -150,6 +163,20 @@ static inline void bvec_iter_advance(const struct bio_vec *bv,
 	}
 }
 
+static inline void bvec_iter_advance(const struct bio_vec *bv,
+				     struct bvec_iter *iter,
+				     unsigned bytes)
+{
+	__bvec_iter_advance(bv, iter, bytes, false);
+}
+
+static inline void bvec_iter_advance_mp(const struct bio_vec *bv,
+					struct bvec_iter *iter,
+					unsigned bytes)
+{
+	__bvec_iter_advance(bv, iter, bytes, true);
+}
+
 #define for_each_bvec(bvl, bio_vec, iter, start)			\
 	for (iter = (start);						\
 	     (iter).bi_size &&						\
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 34/60] block: introduce bio_clone_sp()
  2016-10-29  8:07 ` Ming Lei
                   ` (36 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Jens Axboe, Mike Christie,
	Hannes Reinecke, Keith Busch, Mike Snitzer

First, bio_clone() and bio_clone_bioset() are changed to clone
mp bvecs, because our iterator helpers are capable of splitting
mp bvecs into sp bvecs.

But sometimes we still need a cloned bio with singlepage bvecs;
for example, in bio bounce and bcache (bch_data_verify()), the
bvecs of the cloned bio need to be updated.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/bio.c         | 27 +++++++++++++++++++++------
 include/linux/bio.h | 42 ++++++++++++++++++++++++++++++++++++++----
 2 files changed, 59 insertions(+), 10 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index a49d1d89a85c..a9bf01784f37 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -626,16 +626,22 @@ EXPORT_SYMBOL(bio_clone_fast);
  * 	@bio_src: bio to clone
  *	@gfp_mask: allocation priority
  *	@bs: bio_set to allocate from
+ *	@sp_bvecs: whether to clone into singlepage bvecs
  *
  *	Clone bio. Caller will own the returned bio, but not the actual data it
  *	points to. Reference count of returned bio will be one.
+ *
+ *	If @sp_bvecs is true, the caller must make sure the number of
+ *	singlepage bvecs is less than the maximum bvec count.
+ *
  */
-struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
-			     struct bio_set *bs)
+struct bio *__bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
+			       struct bio_set *bs, bool sp_bvecs)
 {
 	struct bvec_iter iter;
 	struct bio_vec bv;
 	struct bio *bio;
+	unsigned segs;
 
 	/*
 	 * Pre immutable biovecs, __bio_clone() used to just do a memcpy from
@@ -659,7 +665,12 @@ struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
 	 *    __bio_clone_fast() anyways.
 	 */
 
-	bio = bio_alloc_bioset(gfp_mask, bio_segments(bio_src), bs);
+	if (sp_bvecs)
+		segs = bio_segments(bio_src);
+	else
+		segs = bio_segments_mp(bio_src);
+
+	bio = bio_alloc_bioset(gfp_mask, segs, bs);
 	if (!bio)
 		return NULL;
 	bio->bi_bdev		= bio_src->bi_bdev;
@@ -675,8 +686,12 @@ struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
 		bio->bi_io_vec[bio->bi_vcnt++] = bio_src->bi_io_vec[0];
 		break;
 	default:
-		bio_for_each_segment(bv, bio_src, iter)
-			bio->bi_io_vec[bio->bi_vcnt++] = bv;
+		if (sp_bvecs)
+			bio_for_each_segment(bv, bio_src, iter)
+				bio->bi_io_vec[bio->bi_vcnt++] = bv;
+		else
+			bio_for_each_segment_mp(bv, bio_src, iter)
+				bio->bi_io_vec[bio->bi_vcnt++] = bv;
 		break;
 	}
 
@@ -694,7 +709,7 @@ struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
 
 	return bio;
 }
-EXPORT_SYMBOL(bio_clone_bioset);
+EXPORT_SYMBOL(__bio_clone_bioset);
 
 /**
  *	bio_add_pc_page	-	attempt to add page to bio
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 17852ba0e40f..ec1c0f2aaa19 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -217,7 +217,7 @@ static inline void bio_advance_iter_mp(struct bio *bio, struct bvec_iter *iter,
 
 #define bio_iter_last(bvec, iter) ((iter).bi_size == (bvec).bv_len)
 
-static inline unsigned bio_segments(struct bio *bio)
+static inline unsigned __bio_segments(struct bio *bio, bool mp)
 {
 	unsigned segs = 0;
 	struct bio_vec bv;
@@ -237,12 +237,26 @@ static inline unsigned bio_segments(struct bio *bio)
 	if (bio_op(bio) == REQ_OP_WRITE_SAME)
 		return 1;
 
-	bio_for_each_segment(bv, bio, iter)
-		segs++;
+	if (!mp)
+		bio_for_each_segment(bv, bio, iter)
+			segs++;
+	else
+		bio_for_each_segment_mp(bv, bio, iter)
+			segs++;
 
 	return segs;
 }
 
+static inline unsigned bio_segments(struct bio *bio)
+{
+	return __bio_segments(bio, false);
+}
+
+static inline unsigned bio_segments_mp(struct bio *bio)
+{
+	return __bio_segments(bio, true);
+}
+
 /*
  * get a reference to a bio, so it won't disappear. the intended use is
  * something like:
@@ -415,10 +429,24 @@ extern void bio_put(struct bio *);
 
 extern void __bio_clone_fast(struct bio *, struct bio *);
 extern struct bio *bio_clone_fast(struct bio *, gfp_t, struct bio_set *);
-extern struct bio *bio_clone_bioset(struct bio *, gfp_t, struct bio_set *bs);
+extern struct bio *__bio_clone_bioset(struct bio *, gfp_t,
+				      struct bio_set *bs, bool);
 
 extern struct bio_set *fs_bio_set;
 
+/* by default we clone a bio with multipage bvecs */
+static inline struct bio *bio_clone_bioset(struct bio *bio, gfp_t gfp,
+					   struct bio_set *bs)
+{
+	return __bio_clone_bioset(bio, gfp, bs, false);
+}
+
+static inline struct bio *bio_clone_bioset_sp(struct bio *bio, gfp_t gfp,
+					      struct bio_set *bs)
+{
+	return __bio_clone_bioset(bio, gfp, bs, true);
+}
+
 static inline struct bio *bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs)
 {
 	return bio_alloc_bioset(gfp_mask, nr_iovecs, fs_bio_set);
@@ -429,6 +457,12 @@ static inline struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
 	return bio_clone_bioset(bio, gfp_mask, fs_bio_set);
 }
 
+/* Sometimes we have to clone a bio with singlepage bvecs */
+static inline struct bio *bio_clone_sp(struct bio *bio, gfp_t gfp_mask)
+{
+	return __bio_clone_bioset(bio, gfp_mask, fs_bio_set, true);
+}
+
 static inline struct bio *bio_kmalloc(gfp_t gfp_mask, unsigned int nr_iovecs)
 {
 	return bio_alloc_bioset(gfp_mask, nr_iovecs, NULL);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 35/60] bvec_iter: introduce BVEC_ITER_ALL_INIT
  2016-10-29  8:07 ` Ming Lei
                   ` (37 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Johannes Berg

Introduce BVEC_ITER_ALL_INIT for iterating one bio
from start to end.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/linux/bvec.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 9df9e582bd3f..e12ce6bd63d7 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -183,4 +183,12 @@ static inline void bvec_iter_advance_mp(const struct bio_vec *bv,
 		((bvl = bvec_iter_bvec((bio_vec), (iter))), 1);	\
 	     bvec_iter_advance((bio_vec), &(iter), (bvl).bv_len))
 
+#define BVEC_ITER_ALL_INIT (struct bvec_iter)			\
+{									\
+	.bi_sector	= 0,						\
+	.bi_size	= UINT_MAX,					\
+	.bi_idx		= 0,						\
+	.bi_bvec_done	= 0,						\
+}
+
 #endif /* __LINUX_BVEC_ITER_H */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 36/60] block: bounce: avoid direct access to bvec from bio->bi_io_vec
  2016-10-29  8:07 ` Ming Lei
                   ` (38 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Jens Axboe

We will support multipage bvecs in the future, so switch to the
iterator for getting the bv_page of each bvec from the original bio.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/bounce.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/bounce.c b/block/bounce.c
index 1cb5dd3a5da1..babd3f224ca0 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -126,21 +126,23 @@ static void copy_to_high_bio_irq(struct bio *to, struct bio *from)
 static void bounce_end_io(struct bio *bio, mempool_t *pool)
 {
 	struct bio *bio_orig = bio->bi_private;
-	struct bio_vec *bvec, *org_vec;
+	struct bio_vec *bvec, orig_vec;
 	int i;
-	int start = bio_orig->bi_iter.bi_idx;
+	struct bvec_iter orig_iter = bio_orig->bi_iter;
 
 	/*
 	 * free up bounce indirect pages used
 	 */
 	bio_for_each_segment_all(bvec, bio, i) {
-		org_vec = bio_orig->bi_io_vec + i + start;
 
-		if (bvec->bv_page == org_vec->bv_page)
-			continue;
+		orig_vec = bio_iter_iovec(bio_orig, orig_iter);
+		if (bvec->bv_page == orig_vec.bv_page)
+			goto next;
 
 		dec_zone_page_state(bvec->bv_page, NR_BOUNCE);
 		mempool_free(bvec->bv_page, pool);
+ next:
+		bio_advance_iter(bio_orig, &orig_iter, orig_vec.bv_len);
 	}
 
 	bio_orig->bi_error = bio->bi_error;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 37/60] block: bounce: don't access bio->bi_io_vec in copy_to_high_bio_irq
  2016-10-29  8:07 ` Ming Lei
                   ` (39 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Jens Axboe

Since we need to support multipage bvecs, don't access
bio->bi_io_vec directly in copy_to_high_bio_irq(); use the
standard iterator instead.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/bounce.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/block/bounce.c b/block/bounce.c
index babd3f224ca0..a42f7b98b7e6 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -102,24 +102,30 @@ int init_emergency_isa_pool(void)
 static void copy_to_high_bio_irq(struct bio *to, struct bio *from)
 {
 	unsigned char *vfrom;
-	struct bio_vec tovec, *fromvec = from->bi_io_vec;
+	struct bio_vec tovec, fromvec;
 	struct bvec_iter iter;
+	/*
+	 * @from bio is created by bounce, so we can iterate from
+	 * start and can't trust @from->bi_iter because it might be
+	 * changed by splitting.
+	 */
+	struct bvec_iter from_iter = BVEC_ITER_ALL_INIT;
 
 	bio_for_each_segment(tovec, to, iter) {
-		if (tovec.bv_page != fromvec->bv_page) {
+		fromvec = bio_iter_iovec(from, from_iter);
+		if (tovec.bv_page != fromvec.bv_page) {
 			/*
 			 * fromvec->bv_offset and fromvec->bv_len might have
 			 * been modified by the block layer, so use the original
 			 * copy, bounce_copy_vec already uses tovec->bv_len
 			 */
-			vfrom = page_address(fromvec->bv_page) +
+			vfrom = page_address(fromvec.bv_page) +
 				tovec.bv_offset;
 
 			bounce_copy_vec(&tovec, vfrom);
 			flush_dcache_page(tovec.bv_page);
 		}
-
-		fromvec++;
+		bio_advance_iter(from, &from_iter, tovec.bv_len);
 	}
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 38/60] block: bounce: convert multipage bvecs into singlepage
  2016-10-29  8:07 ` Ming Lei
                   ` (40 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Jens Axboe

This patch tries to split the incoming multipage-bvec bio so that
each split bio fits into a bio with singlepage bvecs.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/bounce.c | 46 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 38 insertions(+), 8 deletions(-)

diff --git a/block/bounce.c b/block/bounce.c
index a42f7b98b7e6..da240d1de809 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -187,22 +187,38 @@ static void bounce_end_io_read_isa(struct bio *bio)
 	__bounce_end_io_read(bio, isa_page_pool);
 }
 
+static inline bool need_split(struct request_queue *q, struct bio *bio)
+{
+	return bio_sectors(bio) > BIO_SP_MAX_SECTORS;
+}
+
+static inline bool need_bounce(struct request_queue *q, struct bio *bio)
+{
+	struct bvec_iter iter;
+	struct bio_vec bv;
+
+	bio_for_each_segment_mp(bv, bio, iter) {
+		unsigned nr = (bv.bv_offset + bv.bv_len - 1) >>
+			PAGE_SHIFT;
+
+		if (page_to_pfn(bv.bv_page) + nr > queue_bounce_pfn(q))
+			return true;
+	}
+	return false;
+}
+
 static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
 			       mempool_t *pool)
 {
 	struct bio *bio;
 	int rw = bio_data_dir(*bio_orig);
-	struct bio_vec *to, from;
-	struct bvec_iter iter;
+	struct bio_vec *to;
 	unsigned i;
 
-	bio_for_each_segment(from, *bio_orig, iter)
-		if (page_to_pfn(from.bv_page) > queue_bounce_pfn(q))
-			goto bounce;
+	if (!need_bounce(q, *bio_orig))
+		return;
 
-	return;
-bounce:
-	bio = bio_clone_bioset(*bio_orig, GFP_NOIO, fs_bio_set);
+	bio = bio_clone_bioset_sp(*bio_orig, GFP_NOIO, fs_bio_set);
 
 	bio_for_each_segment_all(to, bio, i) {
 		struct page *page = to->bv_page;
@@ -267,9 +283,23 @@ void blk_queue_bounce(struct request_queue *q, struct bio **bio_orig)
 		pool = isa_page_pool;
 	}
 
+	if (!need_bounce(q, *bio_orig))
+		return;
+
 	/*
 	 * slow path
+	 *
+	 * REQ_PC bio won't reach splitting because multipage bvecs
+	 * isn't enabled for REQ_PC.
 	 */
+	if (need_split(q, *bio_orig)) {
+		struct bio *split = bio_split(*bio_orig,
+					      BIO_SP_MAX_SECTORS,
+					      GFP_NOIO, q->bio_split);
+		bio_chain(split, *bio_orig);
+		generic_make_request(*bio_orig);
+		*bio_orig = split;
+	}
 	__blk_queue_bounce(q, bio_orig, pool);
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 39/60] bcache: debug: switch to bio_clone_sp()
  2016-10-29  8:07 ` Ming Lei
@ 2016-10-29  8:08   ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Kent Overstreet, Shaohua Li,
	Mike Christie, Hannes Reinecke, Guoqing Jiang,
	open list:BCACHE BLOCK LAYER CACHE,
	open list:SOFTWARE RAID Multiple Disks SUPPORT

The cloned bio has to be based on singlepage bvecs, so use
bio_clone_sp(). The allocated bvec table is big enough to hold
the bvecs because QUEUE_FLAG_SPLIT_MP is set for bcache.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/bcache/debug.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 71a9f05918eb..0735015b0842 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -111,12 +111,10 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
 	struct bvec_iter iter, citer = { 0 };
 
 	/*
-	 * Once multipage bvec is supported, the bio_clone()
-	 * has to make sure page count in this bio can be held
-	 * in the new cloned bio because each single page need
-	 * to assign to each bvec of the new bio.
+	 * QUEUE_FLAG_SPLIT_MP can make the cloned singlepage
+	 * bvecs to be held in the allocated bvec table.
 	 */
-	check = bio_clone(bio, GFP_NOIO);
+	check = bio_clone_sp(bio, GFP_NOIO);
 	if (!check)
 		return;
 	bio_set_op_attrs(check, REQ_OP_READ, READ_SYNC);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 40/60] blk-merge: compute bio->bi_seg_front_size efficiently
  2016-10-29  8:07 ` Ming Lei
                   ` (42 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Jens Axboe

It is enough to check and compute bio->bi_seg_front_size just
after the 1st segment is found, but the current code checks that
for each bvec, which is inefficient.

This patch follows the approach of __blk_recalc_rq_segments()
for computing bio->bi_seg_front_size; it is more efficient, and
the code becomes more readable too.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/blk-merge.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 266c94d1d82f..465d9c65cb41 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -157,22 +157,21 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 			bvprvp = &bvprv;
 			sectors += bv.bv_len >> 9;
 
-			if (nsegs == 1 && seg_size > front_seg_size)
-				front_seg_size = seg_size;
 			continue;
 		}
 new_segment:
 		if (nsegs == queue_max_segments(q))
 			goto split;
 
+		if (nsegs == 1 && seg_size > front_seg_size)
+			front_seg_size = seg_size;
+
 		nsegs++;
 		bvprv = bv;
 		bvprvp = &bvprv;
 		seg_size = bv.bv_len;
 		sectors += bv.bv_len >> 9;
 
-		if (nsegs == 1 && seg_size > front_seg_size)
-			front_seg_size = seg_size;
 	}
 
 	do_split = false;
@@ -185,6 +184,8 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 			bio = new;
 	}
 
+	if (nsegs == 1 && seg_size > front_seg_size)
+		front_seg_size = seg_size;
 	bio->bi_seg_front_size = front_seg_size;
 	if (seg_size > bio->bi_seg_back_size)
 		bio->bi_seg_back_size = seg_size;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 41/60] block: blk-merge: try to make front segments in full size
  2016-10-29  8:07 ` Ming Lei
                   ` (43 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Jens Axboe

When merging one bvec into a segment, if the bvec is too big
to merge, the current policy is to move the whole bvec into
another new segment.

This patch changes the policy to try to maximize the size of
front segments; that is, in the above situation, part of the
bvec is merged into the current segment, and the remainder is
put into the next segment.

This patch prepares for supporting multipage bvecs, because
this case can then be quite common and we should try to make
front segments full size.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/blk-merge.c | 44 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 39 insertions(+), 5 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 465d9c65cb41..a6457e70dafc 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -99,6 +99,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 	struct bio *new = NULL;
 	const unsigned max_sectors = get_max_io_size(q, bio);
 	unsigned bvecs = 0;
+	unsigned advance;
 
 	bio_for_each_segment(bv, bio, iter) {
 		/*
@@ -129,6 +130,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 		if (bvprvp && bvec_gap_to_prev(q, bvprvp, bv.bv_offset))
 			goto split;
 
+		advance = 0;
 		if (sectors + (bv.bv_len >> 9) > max_sectors) {
 			/*
 			 * Consider this a new segment if we're splitting in
@@ -145,12 +147,24 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 		}
 
 		if (bvprvp && blk_queue_cluster(q)) {
-			if (seg_size + bv.bv_len > queue_max_segment_size(q))
-				goto new_segment;
 			if (!BIOVEC_PHYS_MERGEABLE(bvprvp, &bv))
 				goto new_segment;
 			if (!BIOVEC_SEG_BOUNDARY(q, bvprvp, &bv))
 				goto new_segment;
+			if (seg_size + bv.bv_len > queue_max_segment_size(q)) {
+				advance = queue_max_segment_size(q) - seg_size;
+
+				if (advance > 0) {
+					seg_size += advance;
+					sectors += advance >> 9;
+					bv.bv_len -= advance;
+					bv.bv_offset += advance;
+				} else {
+					advance = 0;
+				}
+
+				goto new_segment;
+			}
 
 			seg_size += bv.bv_len;
 			bvprv = bv;
@@ -172,6 +186,9 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 		seg_size = bv.bv_len;
 		sectors += bv.bv_len >> 9;
 
+		/* restore the bvec for iterator */
+		bv.bv_len += advance;
+		bv.bv_offset -= advance;
 	}
 
 	do_split = false;
@@ -371,16 +388,29 @@ __blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
 {
 
 	int nbytes = bvec->bv_len;
+	int advance = 0;
 
 	if (*sg && *cluster) {
-		if ((*sg)->length + nbytes > queue_max_segment_size(q))
-			goto new_segment;
-
 		if (!BIOVEC_PHYS_MERGEABLE(bvprv, bvec))
 			goto new_segment;
 		if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bvec))
 			goto new_segment;
 
+		/* try best to merge part of the bvec into previous seg */
+		if ((*sg)->length + nbytes > queue_max_segment_size(q)) {
+			advance = queue_max_segment_size(q) - (*sg)->length;
+			if (advance <= 0) {
+				advance = 0;
+				goto new_segment;
+			}
+
+			(*sg)->length += advance;
+
+			bvec->bv_offset += advance;
+			bvec->bv_len -= advance;
+			goto new_segment;
+		}
+
 		(*sg)->length += nbytes;
 	} else {
 new_segment:
@@ -403,6 +433,10 @@ __blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
 
 		sg_set_page(*sg, bvec->bv_page, nbytes, bvec->bv_offset);
 		(*nsegs)++;
+
+		/* for making iterator happy */
+		bvec->bv_offset -= advance;
+		bvec->bv_len += advance;
 	}
 	*bvprv = *bvec;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 42/60] block: use bio_for_each_segment_mp() to compute segments count
  2016-10-29  8:07 ` Ming Lei
                   ` (44 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Jens Axboe

First, it is more efficient to use bio_for_each_segment_mp()
in both blk_bio_segment_split() and __blk_recalc_rq_segments()
to compute how many segments there are in the bio.

Second, once bio_for_each_segment_mp() is used, a bvec may
need to be split because it can be longer than the max segment
size, so we have to split one bvec into several segments.

Third, while splitting an mp bvec into segments, the max
segment number may be reached; then the bio needs to be split.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/blk-merge.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 75 insertions(+), 14 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index a6457e70dafc..9142f1fc914b 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -86,6 +86,61 @@ static inline unsigned get_max_io_size(struct request_queue *q,
 	return sectors;
 }
 
+static bool bvec_split_segs(struct request_queue *q, struct bio_vec *bv,
+		unsigned *nsegs, unsigned *last_seg_size,
+		unsigned *front_seg_size, unsigned *sectors)
+{
+	bool need_split = false;
+	unsigned old_nsegs = *nsegs;
+	unsigned new_nsegs, seg_size;
+
+	WARN_ON(old_nsegs == queue_max_segments(q));
+	WARN_ON(bv->bv_len == 0);
+
+	/*
+	 * Multipage bvec is too big to hold in one segment,
+	 * so the current bvec has to be splitted as multiple
+	 * segments.
+	 *
+	 * @seg_size is segment size of last segment in this bvec
+	 * @new_nsegs is segment count of this bvec
+	 */
+	seg_size = bv->bv_len % queue_max_segment_size(q);
+	new_nsegs = bv->bv_len / queue_max_segment_size(q);
+	if (!seg_size)
+		seg_size = queue_max_segment_size(q);
+	else
+		new_nsegs += 1;
+
+	/* need splitting if max segs is reached */
+	if (old_nsegs + new_nsegs > queue_max_segments(q)) {
+		new_nsegs = queue_max_segments(q) - old_nsegs;
+
+		/* split the bvec */
+		if (bv->bv_len > queue_max_segment_size(q))
+			seg_size = queue_max_segment_size(q);
+		need_split = true;
+	}
+
+	/* update front segment size */
+	if (!old_nsegs) {
+		unsigned first_seg_size = seg_size;
+		if (new_nsegs > 1)
+			first_seg_size = queue_max_segment_size(q);
+		if (*front_seg_size < first_seg_size)
+			*front_seg_size = first_seg_size;
+	}
+
+	*last_seg_size = seg_size;
+	*nsegs += new_nsegs;
+
+	if (sectors)
+		*sectors += ((new_nsegs - 1) *
+				queue_max_segment_size(q) + seg_size) >> 9;
+
+	return need_split;
+}
+
 static struct bio *blk_bio_segment_split(struct request_queue *q,
 					 struct bio *bio,
 					 struct bio_set *bs,
@@ -101,7 +156,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 	unsigned bvecs = 0;
 	unsigned advance;
 
-	bio_for_each_segment(bv, bio, iter) {
+	bio_for_each_segment_mp(bv, bio, iter) {
 		/*
 		 * With arbitrary bio size, the incoming bio may be very
 		 * big. We have to split the bio into small bios so that
@@ -138,8 +193,12 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 			 */
 			if (nsegs < queue_max_segments(q) &&
 			    sectors < max_sectors) {
-				nsegs++;
-				sectors = max_sectors;
+				/* split in the middle of bvec */
+				bv.bv_len = (max_sectors - sectors) << 9;
+				bvec_split_segs(q, &bv, &nsegs,
+						&seg_size,
+						&front_seg_size,
+						&sectors);
 			}
 			if (sectors)
 				goto split;
@@ -180,11 +239,12 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 		if (nsegs == 1 && seg_size > front_seg_size)
 			front_seg_size = seg_size;
 
-		nsegs++;
 		bvprv = bv;
 		bvprvp = &bvprv;
-		seg_size = bv.bv_len;
-		sectors += bv.bv_len >> 9;
+
+		if (bvec_split_segs(q, &bv, &nsegs, &seg_size,
+					&front_seg_size, &sectors))
+			goto split;
 
 		/* restore the bvec for iterator */
 		bv.bv_len += advance;
@@ -253,6 +313,7 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 	struct bio_vec bv, bvprv = { NULL };
 	int cluster, prev = 0;
 	unsigned int seg_size, nr_phys_segs;
+	unsigned front_seg_size = bio->bi_seg_front_size;
 	struct bio *fbio, *bbio;
 	struct bvec_iter iter;
 
@@ -274,7 +335,7 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 	seg_size = 0;
 	nr_phys_segs = 0;
 	for_each_bio(bio) {
-		bio_for_each_segment(bv, bio, iter) {
+		bio_for_each_segment_mp(bv, bio, iter) {
 			/*
 			 * If SG merging is disabled, each bio vector is
 			 * a segment
@@ -296,20 +357,20 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 				continue;
 			}
 new_segment:
-			if (nr_phys_segs == 1 && seg_size >
-			    fbio->bi_seg_front_size)
-				fbio->bi_seg_front_size = seg_size;
+			if (nr_phys_segs == 1 && seg_size > front_seg_size)
+				front_seg_size = seg_size;
 
-			nr_phys_segs++;
 			bvprv = bv;
 			prev = 1;
-			seg_size = bv.bv_len;
+			bvec_split_segs(q, &bv, &nr_phys_segs, &seg_size,
+					&front_seg_size, NULL);
 		}
 		bbio = bio;
 	}
 
-	if (nr_phys_segs == 1 && seg_size > fbio->bi_seg_front_size)
-		fbio->bi_seg_front_size = seg_size;
+	if (nr_phys_segs == 1 && seg_size > front_seg_size)
+		front_seg_size = seg_size;
+	fbio->bi_seg_front_size = front_seg_size;
 	if (seg_size > bbio->bi_seg_back_size)
 		bbio->bi_seg_back_size = seg_size;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 43/60] block: use bio_for_each_segment_mp() to map sg
  2016-10-29  8:07 ` Ming Lei
                   ` (45 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Jens Axboe

It is more efficient to use bio_for_each_segment_mp() for
mapping sg; meanwhile we have to consider splitting the
multipage bvec, as done in blk_bio_segment_split().

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/blk-merge.c | 72 +++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 52 insertions(+), 20 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 9142f1fc914b..e3b8cbc8b675 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -442,6 +442,56 @@ static int blk_phys_contig_segment(struct request_queue *q, struct bio *bio,
 	return 0;
 }
 
+static inline struct scatterlist *blk_next_sg(struct scatterlist **sg,
+		struct scatterlist *sglist)
+{
+	if (!*sg)
+		return sglist;
+	else {
+		/*
+		 * If the driver previously mapped a shorter
+		 * list, we could see a termination bit
+		 * prematurely unless it fully inits the sg
+		 * table on each mapping. We KNOW that there
+		 * must be more entries here or the driver
+		 * would be buggy, so force clear the
+		 * termination bit to avoid doing a full
+		 * sg_init_table() in drivers for each command.
+		 */
+		sg_unmark_end(*sg);
+		return sg_next(*sg);
+	}
+}
+
+static inline unsigned
+blk_bvec_map_sg(struct request_queue *q, struct bio_vec *bvec,
+		struct scatterlist *sglist, struct scatterlist **sg)
+{
+	unsigned nbytes = bvec->bv_len;
+	unsigned nsegs = 0, total = 0;
+
+	while (nbytes > 0) {
+		unsigned seg_size;
+		struct page *pg;
+		unsigned offset, idx;
+
+		*sg = blk_next_sg(sg, sglist);
+
+		seg_size = min(nbytes, queue_max_segment_size(q));
+		offset = (total + bvec->bv_offset) % PAGE_SIZE;
+		idx = (total + bvec->bv_offset) / PAGE_SIZE;
+		pg = nth_page(bvec->bv_page, idx);
+
+		sg_set_page(*sg, pg, seg_size, offset);
+
+		total += seg_size;
+		nbytes -= seg_size;
+		nsegs++;
+	}
+
+	return nsegs;
+}
+
 static inline void
 __blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
 		     struct scatterlist *sglist, struct bio_vec *bvprv,
@@ -475,25 +525,7 @@ __blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
 		(*sg)->length += nbytes;
 	} else {
 new_segment:
-		if (!*sg)
-			*sg = sglist;
-		else {
-			/*
-			 * If the driver previously mapped a shorter
-			 * list, we could see a termination bit
-			 * prematurely unless it fully inits the sg
-			 * table on each mapping. We KNOW that there
-			 * must be more entries here or the driver
-			 * would be buggy, so force clear the
-			 * termination bit to avoid doing a full
-			 * sg_init_table() in drivers for each command.
-			 */
-			sg_unmark_end(*sg);
-			*sg = sg_next(*sg);
-		}
-
-		sg_set_page(*sg, bvec->bv_page, nbytes, bvec->bv_offset);
-		(*nsegs)++;
+		(*nsegs) += blk_bvec_map_sg(q, bvec, sglist, sg);
 
 		/* for making iterator happy */
 		bvec->bv_offset -= advance;
@@ -536,7 +568,7 @@ static int __blk_bios_map_sg(struct request_queue *q, struct bio *bio,
 	}
 
 	for_each_bio(bio)
-		bio_for_each_segment(bvec, bio, iter)
+		bio_for_each_segment_mp(bvec, bio, iter)
 			__blk_segment_map_sg(q, &bvec, sglist, &bvprv, sg,
 					     &nsegs, &cluster);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 44/60] block: introduce bvec_for_each_sp_bvec()
  2016-10-29  8:07 ` Ming Lei
                   ` (46 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Johannes Berg

This helper can be used to iterate over each singlepage bvec
of one multipage bvec.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/linux/bvec.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index e12ce6bd63d7..510d1d2d79f1 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -191,4 +191,14 @@ static inline void bvec_iter_advance_mp(const struct bio_vec *bv,
 	.bi_bvec_done	= 0,						\
 }
 
+#define __bvec_for_each_sp_bvec(sp_bvl, mp_bvec, iter, start)		\
+	for (iter = start,						\
+	     (iter).bi_size = (mp_bvec)->bv_len;			\
+	     (iter).bi_size &&						\
+		((sp_bvl = bvec_iter_bvec((mp_bvec), (iter))), 1);	\
+	     bvec_iter_advance((mp_bvec), &(iter), (sp_bvl).bv_len))
+
+#define bvec_for_each_sp_bvec(sp_bvl, mp_bvec, iter)			\
+	__bvec_for_each_sp_bvec(sp_bvl, mp_bvec, iter, BVEC_ITER_ALL_INIT)
+
 #endif /* __LINUX_BVEC_ITER_H */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 45/60] block: bio: introduce bio_for_each_segment_all_rd() and its write pair
  2016-10-29  8:07 ` Ming Lei
                   ` (47 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  2016-10-31 13:59   ` Theodore Ts'o
  -1 siblings, 1 reply; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Mike Christie, Hannes Reinecke,
	Keith Busch, Mike Snitzer, Johannes Thumshirn, Bart Van Assche

This patch introduces bio_for_each_segment_all_rd() and
bio_for_each_segment_all_wt().

bio_for_each_segment_all_rd() replaces
bio_for_each_segment_all() where the bvec from bio->bi_io_vec
is accessed read-only.

bio_for_each_segment_all_wt() replaces
bio_for_each_segment_all() where the bvec from bio->bi_io_vec
needs to be updated.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/linux/bio.h       | 15 +++++++++++++++
 include/linux/blk_types.h |  6 ++++++
 2 files changed, 21 insertions(+)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index ec1c0f2aaa19..f8a025ffaa9c 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -215,6 +215,21 @@ static inline void bio_advance_iter_mp(struct bio *bio, struct bvec_iter *iter,
 #define bio_for_each_segment_mp(bvl, bio, iter)				\
 	__bio_for_each_segment_mp(bvl, bio, iter, (bio)->bi_iter)
 
+/* the bio has to be singlepage bvecs based */
+#define bio_for_each_segment_all_wt(bvl, bio, i)                       \
+	bio_for_each_segment_all((bvl), (bio), (i))
+
+/*
+ * This helper returns singlepage bvec to caller for readonly
+ * purpose, and the caller can _not_ change the bvec stored in
+ * bio->bi_io_vec[] via this helper.
+ */
+#define bio_for_each_segment_all_rd(bvl, bio, i, bi)			\
+	for ((bi).iter = BVEC_ITER_ALL_INIT, i = 0, bvl = &(bi).bv;	\
+	     (bi).iter.bi_idx < (bio)->bi_vcnt &&			\
+		(((bi).bv = bio_iter_iovec((bio), (bi).iter)), 1);	\
+	     bio_advance_iter((bio), &(bi).iter, (bi).bv.bv_len), i++)
+
 #define bio_iter_last(bvec, iter) ((iter).bi_size == (bvec).bv_len)
 
 static inline unsigned __bio_segments(struct bio *bio, bool mp)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index cd395ecec99d..b4a202e98016 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -108,6 +108,12 @@ struct bio {
 
 #define BIO_RESET_BYTES		offsetof(struct bio, bi_max_vecs)
 
+/* this iter is only for implementing bio_for_each_segment_rd() */
+struct bvec_iter_all {
+	struct bvec_iter	iter;
+	struct bio_vec		bv;      /* in-flight singlepage bvec */
+};
+
 /*
  * bio flags
  */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 46/60] block: deal with dirtying pages for multipage bvec
  2016-10-29  8:07 ` Ming Lei
                   ` (48 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  2016-10-31 15:40   ` Christoph Hellwig
  -1 siblings, 1 reply; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Jens Axboe

In bio_check_pages_dirty(), bvec->bv_page is used as a flag
marking whether the page has been dirtied and released; if
not, it will be dirtied in the deferred workqueue.

With multipage bvecs we can't do that any more, so change the
logic to check all pages in one mp bvec, and release all of
those pages only if all are dirtied; otherwise dirty them all
in the deferred workqueue.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/bio.c | 43 ++++++++++++++++++++++++++++++++++++-------
 1 file changed, 36 insertions(+), 7 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index a9bf01784f37..8e5af6e8bba3 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1597,8 +1597,9 @@ void bio_set_pages_dirty(struct bio *bio)
 {
 	struct bio_vec *bvec;
 	int i;
+	struct bvec_iter_all bia;
 
-	bio_for_each_segment_all(bvec, bio, i) {
+	bio_for_each_segment_all_rd(bvec, bio, i, bia) {
 		struct page *page = bvec->bv_page;
 
 		if (page && !PageCompound(page))
@@ -1606,16 +1607,26 @@ void bio_set_pages_dirty(struct bio *bio)
 	}
 }
 
+static inline void release_mp_bvec_pages(struct bio_vec *bvec)
+{
+	struct bio_vec bv;
+	struct bvec_iter iter;
+
+	bvec_for_each_sp_bvec(bv, bvec, iter)
+		put_page(bv.bv_page);
+}
+
 static void bio_release_pages(struct bio *bio)
 {
 	struct bio_vec *bvec;
 	int i;
 
+	/* iterate each mp bvec */
 	bio_for_each_segment_all(bvec, bio, i) {
 		struct page *page = bvec->bv_page;
 
 		if (page)
-			put_page(page);
+			release_mp_bvec_pages(bvec);
 	}
 }
 
@@ -1659,20 +1670,38 @@ static void bio_dirty_fn(struct work_struct *work)
 	}
 }
 
+static inline void check_mp_bvec_pages(struct bio_vec *bvec,
+		int *nr_dirty, int *nr_pages)
+{
+	struct bio_vec bv;
+	struct bvec_iter iter;
+
+	bvec_for_each_sp_bvec(bv, bvec, iter) {
+		struct page *page = bv.bv_page;
+
+		if (PageDirty(page) || PageCompound(page))
+			(*nr_dirty)++;
+		(*nr_pages)++;
+	}
+}
+
 void bio_check_pages_dirty(struct bio *bio)
 {
 	struct bio_vec *bvec;
 	int nr_clean_pages = 0;
 	int i;
 
-	bio_for_each_segment_all(bvec, bio, i) {
-		struct page *page = bvec->bv_page;
+	bio_for_each_segment_all_wt(bvec, bio, i) {
+		int nr_dirty = 0, nr_pages = 0;
+
+		check_mp_bvec_pages(bvec, &nr_dirty, &nr_pages);
 
-		if (PageDirty(page) || PageCompound(page)) {
-			put_page(page);
+		/* release all pages in the mp bvec if all are dirtied */
+		if (nr_dirty == nr_pages) {
+			release_mp_bvec_pages(bvec);
 			bvec->bv_page = NULL;
 		} else {
-			nr_clean_pages++;
+			nr_clean_pages += nr_pages;
 		}
 	}
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 47/60] block: convert to bio_for_each_segment_all_rd()
  2016-10-29  8:07 ` Ming Lei
                   ` (49 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Jens Axboe

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/bio.c    | 17 +++++++++++------
 block/bounce.c |  6 ++++--
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 8e5af6e8bba3..c9cf0a81cca3 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -934,7 +934,7 @@ int bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
 	int i;
 	struct bio_vec *bv;
 
-	bio_for_each_segment_all(bv, bio, i) {
+	bio_for_each_segment_all_wt(bv, bio, i) {
 		bv->bv_page = alloc_page(gfp_mask);
 		if (!bv->bv_page) {
 			while (--bv >= bio->bi_io_vec)
@@ -1035,8 +1035,9 @@ static int bio_copy_from_iter(struct bio *bio, struct iov_iter iter)
 {
 	int i;
 	struct bio_vec *bvec;
+	struct bvec_iter_all bia;
 
-	bio_for_each_segment_all(bvec, bio, i) {
+	bio_for_each_segment_all_rd(bvec, bio, i, bia) {
 		ssize_t ret;
 
 		ret = copy_page_from_iter(bvec->bv_page,
@@ -1066,8 +1067,9 @@ static int bio_copy_to_iter(struct bio *bio, struct iov_iter iter)
 {
 	int i;
 	struct bio_vec *bvec;
+	struct bvec_iter_all bia;
 
-	bio_for_each_segment_all(bvec, bio, i) {
+	bio_for_each_segment_all_rd(bvec, bio, i, bia) {
 		ssize_t ret;
 
 		ret = copy_page_to_iter(bvec->bv_page,
@@ -1089,8 +1091,9 @@ void bio_free_pages(struct bio *bio)
 {
 	struct bio_vec *bvec;
 	int i;
+	struct bvec_iter_all bia;
 
-	bio_for_each_segment_all(bvec, bio, i)
+	bio_for_each_segment_all_rd(bvec, bio, i, bia)
 		__free_page(bvec->bv_page);
 }
 EXPORT_SYMBOL(bio_free_pages);
@@ -1390,11 +1393,12 @@ static void __bio_unmap_user(struct bio *bio)
 {
 	struct bio_vec *bvec;
 	int i;
+	struct bvec_iter_all bia;
 
 	/*
 	 * make sure we dirty pages we wrote to
 	 */
-	bio_for_each_segment_all(bvec, bio, i) {
+	bio_for_each_segment_all_rd(bvec, bio, i, bia) {
 		if (bio_data_dir(bio) == READ)
 			set_page_dirty_lock(bvec->bv_page);
 
@@ -1486,8 +1490,9 @@ static void bio_copy_kern_endio_read(struct bio *bio)
 	char *p = bio->bi_private;
 	struct bio_vec *bvec;
 	int i;
+	struct bvec_iter_all bia;
 
-	bio_for_each_segment_all(bvec, bio, i) {
+	bio_for_each_segment_all_rd(bvec, bio, i, bia) {
 		memcpy(p, page_address(bvec->bv_page), bvec->bv_len);
 		p += bvec->bv_len;
 	}
diff --git a/block/bounce.c b/block/bounce.c
index da240d1de809..5459127188c1 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -135,11 +135,12 @@ static void bounce_end_io(struct bio *bio, mempool_t *pool)
 	struct bio_vec *bvec, orig_vec;
 	int i;
 	struct bvec_iter orig_iter = bio_orig->bi_iter;
+	struct bvec_iter_all bia;
 
 	/*
 	 * free up bounce indirect pages used
 	 */
-	bio_for_each_segment_all(bvec, bio, i) {
+	bio_for_each_segment_all_rd(bvec, bio, i, bia) {
 
 		orig_vec = bio_iter_iovec(bio_orig, orig_iter);
 		if (bvec->bv_page == orig_vec.bv_page)
@@ -214,13 +215,14 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
 	int rw = bio_data_dir(*bio_orig);
 	struct bio_vec *to;
 	unsigned i;
+	struct bvec_iter_all bia;
 
 	if (!need_bounce(q, *bio_orig))
 		return;
 
 	bio = bio_clone_bioset_sp(*bio_orig, GFP_NOIO, fs_bio_set);
 
-	bio_for_each_segment_all(to, bio, i) {
+	bio_for_each_segment_all_rd(to, bio, i, bia) {
 		struct page *page = to->bv_page;
 
 		if (page_to_pfn(page) <= queue_bounce_pfn(q))
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 48/60] fs/mpage: convert to bio_for_each_segment_all_rd()
  2016-10-29  8:07 ` Ming Lei
                   ` (50 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Alexander Viro

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 fs/mpage.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/mpage.c b/fs/mpage.c
index d2413af0823a..2c906e82dd49 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -46,9 +46,10 @@
 static void mpage_end_io(struct bio *bio)
 {
 	struct bio_vec *bv;
+	struct bvec_iter_all bia;
 	int i;
 
-	bio_for_each_segment_all(bv, bio, i) {
+	bio_for_each_segment_all_rd(bv, bio, i, bia) {
 		struct page *page = bv->bv_page;
 		page_endio(page, op_is_write(bio_op(bio)), bio->bi_error);
 	}
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 49/60] fs/direct-io: convert to bio_for_each_segment_all_rd()
  2016-10-29  8:07 ` Ming Lei
                   ` (51 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Alexander Viro

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 fs/direct-io.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index fb9aa16a7727..cfad1ac8fa53 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -487,7 +487,9 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio)
 		err = bio->bi_error;
 		bio_check_pages_dirty(bio);	/* transfers ownership */
 	} else {
-		bio_for_each_segment_all(bvec, bio, i) {
+		struct bvec_iter_all bia;
+
+		bio_for_each_segment_all_rd(bvec, bio, i, bia) {
 			struct page *page = bvec->bv_page;
 
 			if (dio->op == REQ_OP_READ && !PageCompound(page) &&
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 50/60] ext4: convert to bio_for_each_segment_all_rd()
  2016-10-29  8:07 ` Ming Lei
                   ` (52 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Theodore Ts'o, Andreas Dilger,
	open list:EXT4 FILE SYSTEM

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 fs/ext4/page-io.c  | 3 ++-
 fs/ext4/readpage.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 0094923e5ebf..abde26af55e7 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -63,8 +63,9 @@ static void ext4_finish_bio(struct bio *bio)
 {
 	int i;
 	struct bio_vec *bvec;
+	struct bvec_iter_all bia;
 
-	bio_for_each_segment_all(bvec, bio, i) {
+	bio_for_each_segment_all_rd(bvec, bio, i, bia) {
 		struct page *page = bvec->bv_page;
 #ifdef CONFIG_EXT4_FS_ENCRYPTION
 		struct page *data_page = NULL;
diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index a81b829d56de..b30444fd9333 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -71,6 +71,7 @@ static void mpage_end_io(struct bio *bio)
 {
 	struct bio_vec *bv;
 	int i;
+	struct bvec_iter_all bia;
 
 	if (ext4_bio_encrypted(bio)) {
 		if (bio->bi_error) {
@@ -80,7 +81,7 @@ static void mpage_end_io(struct bio *bio)
 			return;
 		}
 	}
-	bio_for_each_segment_all(bv, bio, i) {
+	bio_for_each_segment_all_rd(bv, bio, i, bia) {
 		struct page *page = bv->bv_page;
 
 		if (!bio->bi_error) {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 51/60] xfs: convert to bio_for_each_segment_all_rd()
  2016-10-29  8:07 ` Ming Lei
@ 2016-10-29  8:08   ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Dave Chinner,
	supporter:XFS FILESYSTEM

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 fs/xfs/xfs_aops.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 3e57a56cf829..974b0a516f1d 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -144,6 +144,7 @@ xfs_destroy_ioend(
 	for (bio = &ioend->io_inline_bio; bio; bio = next) {
 		struct bio_vec	*bvec;
 		int		i;
+		struct bvec_iter_all bia;
 
 		/*
 		 * For the last bio, bi_private points to the ioend, so we
@@ -155,7 +156,7 @@ xfs_destroy_ioend(
 			next = bio->bi_private;
 
 		/* walk each page on bio, ending page IO on them */
-		bio_for_each_segment_all(bvec, bio, i)
+		bio_for_each_segment_all_rd(bvec, bio, i, bia)
 			xfs_finish_page_writeback(inode, bvec, error);
 
 		bio_put(bio);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread


* [PATCH 52/60] logfs: convert to bio_for_each_segment_all_rd()
  2016-10-29  8:07 ` Ming Lei
                   ` (54 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Joern Engel, Prasad Joshi,
	open list:LogFS

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 fs/logfs/dev_bdev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/logfs/dev_bdev.c b/fs/logfs/dev_bdev.c
index f05a02ff43e6..b81bd2154253 100644
--- a/fs/logfs/dev_bdev.c
+++ b/fs/logfs/dev_bdev.c
@@ -55,10 +55,11 @@ static void writeseg_end_io(struct bio *bio)
 	int i;
 	struct super_block *sb = bio->bi_private;
 	struct logfs_super *super = logfs_super(sb);
+	struct bvec_iter_all bia;
 
 	BUG_ON(bio->bi_error); /* FIXME: Retry io or write elsewhere */
 
-	bio_for_each_segment_all(bvec, bio, i) {
+	bio_for_each_segment_all_rd(bvec, bio, i, bia) {
 		end_page_writeback(bvec->bv_page);
 		put_page(bvec->bv_page);
 	}
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 53/60] gfs2: convert to bio_for_each_segment_all_rd()
  2016-10-29  8:07 ` Ming Lei
@ 2016-10-29  8:08   ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Steven Whitehouse, Bob Peterson,
	open list:GFS2 FILE SYSTEM

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 fs/gfs2/lops.c    | 3 ++-
 fs/gfs2/meta_io.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 49d5a1b61b06..f03a52e06ce5 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -208,13 +208,14 @@ static void gfs2_end_log_write(struct bio *bio)
 	struct bio_vec *bvec;
 	struct page *page;
 	int i;
+	struct bvec_iter_all bia;
 
 	if (bio->bi_error) {
 		sdp->sd_log_error = bio->bi_error;
 		fs_err(sdp, "Error %d writing to log\n", bio->bi_error);
 	}
 
-	bio_for_each_segment_all(bvec, bio, i) {
+	bio_for_each_segment_all_rd(bvec, bio, i, bia) {
 		page = bvec->bv_page;
 		if (page_has_buffers(page))
 			gfs2_end_log_write_bh(sdp, bvec, bio->bi_error);
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index 373639a59782..3ab7a8609009 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -191,8 +191,9 @@ static void gfs2_meta_read_endio(struct bio *bio)
 {
 	struct bio_vec *bvec;
 	int i;
+	struct bvec_iter_all bia;
 
-	bio_for_each_segment_all(bvec, bio, i) {
+	bio_for_each_segment_all_rd(bvec, bio, i, bia) {
 		struct page *page = bvec->bv_page;
 		struct buffer_head *bh = page_buffers(page);
 		unsigned int len = bvec->bv_len;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread


* [PATCH 54/60] f2fs: convert to bio_for_each_segment_all_rd()
  2016-10-29  8:07 ` Ming Lei
@ 2016-10-29  8:08   ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Jaegeuk Kim, Chao Yu,
	open list:F2FS FILE SYSTEM

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 fs/f2fs/data.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 24f6f6977d37..04b1a5caf2d6 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -33,6 +33,7 @@ static void f2fs_read_end_io(struct bio *bio)
 {
 	struct bio_vec *bvec;
 	int i;
+	struct bvec_iter_all bia;
 
 #ifdef CONFIG_F2FS_FAULT_INJECTION
 	/*
@@ -52,7 +53,7 @@ static void f2fs_read_end_io(struct bio *bio)
 		}
 	}
 
-	bio_for_each_segment_all(bvec, bio, i) {
+	bio_for_each_segment_all_rd(bvec, bio, i, bia) {
 		struct page *page = bvec->bv_page;
 
 		if (!bio->bi_error) {
@@ -72,8 +73,9 @@ static void f2fs_write_end_io(struct bio *bio)
 	struct f2fs_sb_info *sbi = bio->bi_private;
 	struct bio_vec *bvec;
 	int i;
+	struct bvec_iter_all bia;
 
-	bio_for_each_segment_all(bvec, bio, i) {
+	bio_for_each_segment_all_rd(bvec, bio, i, bia) {
 		struct page *page = bvec->bv_page;
 
 		fscrypt_pullback_bio_page(&page, true);
@@ -145,6 +147,7 @@ static bool __has_merged_page(struct f2fs_bio_info *io, struct inode *inode,
 	struct bio_vec *bvec;
 	struct page *target;
 	int i;
+	struct bvec_iter_all bia;
 
 	if (!io->bio)
 		return false;
@@ -152,7 +155,7 @@ static bool __has_merged_page(struct f2fs_bio_info *io, struct inode *inode,
 	if (!inode && !page && !ino)
 		return true;
 
-	bio_for_each_segment_all(bvec, io->bio, i) {
+	bio_for_each_segment_all_rd(bvec, io->bio, i, bia) {
 
 		if (bvec->bv_page->mapping)
 			target = bvec->bv_page;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread


* [PATCH 55/60] exofs: convert to bio_for_each_segment_all_rd()
  2016-10-29  8:07 ` Ming Lei
                   ` (57 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Boaz Harrosh, Benny Halevy,
	open list:OSD LIBRARY and FILESYSTEM

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 fs/exofs/ore.c      | 3 ++-
 fs/exofs/ore_raid.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/exofs/ore.c b/fs/exofs/ore.c
index 8bb72807e70d..826696f3bdf2 100644
--- a/fs/exofs/ore.c
+++ b/fs/exofs/ore.c
@@ -406,8 +406,9 @@ static void _clear_bio(struct bio *bio)
 {
 	struct bio_vec *bv;
 	unsigned i;
+	struct bvec_iter_all bia;
 
-	bio_for_each_segment_all(bv, bio, i) {
+	bio_for_each_segment_all_rd(bv, bio, i, bia) {
 		unsigned this_count = bv->bv_len;
 
 		if (likely(PAGE_SIZE == this_count))
diff --git a/fs/exofs/ore_raid.c b/fs/exofs/ore_raid.c
index 27cbdb697649..a083d2f3f10d 100644
--- a/fs/exofs/ore_raid.c
+++ b/fs/exofs/ore_raid.c
@@ -429,6 +429,7 @@ static void _mark_read4write_pages_uptodate(struct ore_io_state *ios, int ret)
 {
 	struct bio_vec *bv;
 	unsigned i, d;
+	struct bvec_iter_all bia;
 
 	/* loop on all devices all pages */
 	for (d = 0; d < ios->numdevs; d++) {
@@ -437,7 +438,7 @@ static void _mark_read4write_pages_uptodate(struct ore_io_state *ios, int ret)
 		if (!bio)
 			continue;
 
-		bio_for_each_segment_all(bv, bio, i) {
+		bio_for_each_segment_all_rd(bv, bio, i, bia) {
 			struct page *page = bv->bv_page;
 
 			SetPageUptodate(page);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 56/60] fs: crypto: convert to bio_for_each_segment_all_rd()
  2016-10-29  8:07 ` Ming Lei
                   ` (58 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Theodore Y. Ts'o, Jaegeuk Kim

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 fs/crypto/crypto.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c
index 98f87fe8f186..ed007ebbd1ab 100644
--- a/fs/crypto/crypto.c
+++ b/fs/crypto/crypto.c
@@ -411,8 +411,9 @@ static void completion_pages(struct work_struct *work)
 	struct bio *bio = ctx->r.bio;
 	struct bio_vec *bv;
 	int i;
+	struct bvec_iter_all bia;
 
-	bio_for_each_segment_all(bv, bio, i) {
+	bio_for_each_segment_all_rd(bv, bio, i, bia) {
 		struct page *page = bv->bv_page;
 		int ret = fscrypt_decrypt_page(page);
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 57/60] bcache: convert to bio_for_each_segment_all_rd()
  2016-10-29  8:07 ` Ming Lei
@ 2016-10-29  8:08   ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Kent Overstreet, Shaohua Li,
	Hannes Reinecke, Jiri Kosina, Mike Christie, Guoqing Jiang,
	Zheng Liu, open list:BCACHE BLOCK LAYER CACHE,
	open list:SOFTWARE RAID Multiple Disks SUPPORT

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/bcache/btree.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index b419bc91ba32..89abada6a091 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -419,8 +419,9 @@ static void do_btree_node_write(struct btree *b)
 		int j;
 		struct bio_vec *bv;
 		void *base = (void *) ((unsigned long) i & ~(PAGE_SIZE - 1));
+		struct bvec_iter_all bia;
 
-		bio_for_each_segment_all(bv, b->bio, j)
+		bio_for_each_segment_all_rd(bv, b->bio, j, bia)
 			memcpy(page_address(bv->bv_page),
 			       base + j * PAGE_SIZE, PAGE_SIZE);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 148+ messages in thread


* [PATCH 58/60] dm-crypt: convert to bio_for_each_segment_all_rd()
  2016-10-29  8:07 ` Ming Lei
@ 2016-10-29  8:08   ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Alasdair Kergon, Mike Snitzer,
	maintainer:DEVICE-MAPPER LVM, Shaohua Li,
	open list:SOFTWARE RAID Multiple Disks SUPPORT

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/md/dm-crypt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 4999c7497f95..ed0f54e51638 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1034,8 +1034,9 @@ static void crypt_free_buffer_pages(struct crypt_config *cc, struct bio *clone)
 {
 	unsigned int i;
 	struct bio_vec *bv;
+	struct bvec_iter_all bia;
 
-	bio_for_each_segment_all(bv, clone, i) {
+	bio_for_each_segment_all_rd(bv, clone, i, bia) {
 		BUG_ON(!bv->bv_page);
 		mempool_free(bv->bv_page, cc->page_pool);
 		bv->bv_page = NULL;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread


* [PATCH 59/60] fs/buffer.c: use bvec iterator to truncate the bio
  2016-10-29  8:07 ` Ming Lei
                   ` (61 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Alexander Viro

This prepares for multipage bvecs: the truncated tail of the last bvec may
span more than one page, so the read-side zeroing has to walk it page by page.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 fs/buffer.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 81c3793948b4..293e081a4b5f 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3021,8 +3021,7 @@ void guard_bio_eod(int op, struct bio *bio)
 	unsigned truncated_bytes;
 	/*
 	 * It is safe to truncate the last bvec in the following way
-	 * even though multipage bvec is supported, but we need to
-	 * fix the parameters passed to zero_user().
+	 * even though multipage bvec is supported.
 	 */
 	struct bio_vec *bvec = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
@@ -3045,15 +3044,21 @@ void guard_bio_eod(int op, struct bio *bio)
 	/* Uhhuh. We've got a bio that straddles the device size! */
 	truncated_bytes = bio->bi_iter.bi_size - (maxsector << 9);
 
-	/* Truncate the bio.. */
-	bio->bi_iter.bi_size -= truncated_bytes;
-	bvec->bv_len -= truncated_bytes;
-
 	/* ..and clear the end of the buffer for reads */
 	if (op == REQ_OP_READ) {
-		zero_user(bvec->bv_page, bvec->bv_offset + bvec->bv_len,
-				truncated_bytes);
+		struct bvec_iter start = BVEC_ITER_ALL_INIT;
+		struct bvec_iter iter;
+		struct bio_vec bv;
+
+		start.bi_bvec_done = bvec->bv_len - truncated_bytes;
+
+		__bvec_for_each_sp_bvec(bv, bvec, iter, start)
+			zero_user(bv.bv_page, bv.bv_offset, bv.bv_len);
 	}
+
+	/* Truncate the bio.. */
+	bio->bi_iter.bi_size -= truncated_bytes;
+	bvec->bv_len -= truncated_bytes;
 }
 
 static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 60/60] block: enable multipage bvecs
  2016-10-29  8:07 ` Ming Lei
                   ` (62 preceding siblings ...)
  (?)
@ 2016-10-29  8:08 ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29  8:08 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel
  Cc: linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Ming Lei, Jens Axboe

This patch pulls the trigger for multipage bvecs.

Any request queue which doesn't set QUEUE_FLAG_NO_MP
is expected to be able to handle multipage bvecs.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 block/bio.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index c9cf0a81cca3..b73777dc59c3 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -838,6 +838,11 @@ int bio_add_page(struct bio *bio, struct page *page,
 	 * a consecutive offset.  Optimize this special case.
 	 */
 	if (bio->bi_vcnt > 0) {
+		struct request_queue *q = NULL;
+
+		if (bio->bi_bdev)
+			q = bdev_get_queue(bio->bi_bdev);
+
 		bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
 		if (page == bv->bv_page &&
@@ -845,6 +850,14 @@ int bio_add_page(struct bio *bio, struct page *page,
 			bv->bv_len += len;
 			goto done;
 		}
+
+		/* disable multipage bvec too if cluster isn't enabled */
+		if (q && !blk_queue_no_mp(q) && blk_queue_cluster(q) &&
+		    (bvec_to_phys(bv) + bv->bv_len ==
+		     page_to_phys(page) + offset)) {
+			bv->bv_len += len;
+			goto done;
+		}
 	}
 
 	if (bio->bi_vcnt >= bio->bi_max_vecs)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* Re: [PATCH 32/60] block: implement sp version of bvec iterator helpers
  2016-10-29  8:08 ` [PATCH 32/60] block: implement sp version of bvec iterator helpers Ming Lei
@ 2016-10-29 11:06   ` kbuild test robot
  2016-12-17 11:38       ` Ming Lei
  0 siblings, 1 reply; 148+ messages in thread
From: kbuild test robot @ 2016-10-29 11:06 UTC (permalink / raw)
  To: Ming Lei
  Cc: kbuild-all, Jens Axboe, linux-kernel, linux-block, linux-fsdevel,
	Christoph Hellwig, Kirill A . Shutemov, Ming Lei, Johannes Berg

[-- Attachment #1: Type: text/plain, Size: 25881 bytes --]

Hi Ming,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.9-rc2 next-20161028]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Ming-Lei/block-support-multipage-bvec/20161029-163910
config: sparc-defconfig (attached as .config)
compiler: sparc-linux-gcc (GCC) 6.2.0
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sparc 

All error/warnings (new ones prefixed by >>):

   In file included from arch/sparc/include/asm/oplib.h:6:0,
                    from arch/sparc/include/asm/pgtable_32.h:21,
                    from arch/sparc/include/asm/pgtable.h:6,
                    from include/linux/mm.h:68,
                    from include/linux/bvec.h:25,
                    from include/linux/blk_types.h:9,
                    from include/linux/fs.h:31,
                    from include/linux/proc_fs.h:8,
                    from arch/sparc/include/asm/prom.h:22,
                    from include/linux/of.h:232,
                    from arch/sparc/include/asm/openprom.h:14,
                    from arch/sparc/include/asm/device.h:9,
                    from include/linux/device.h:30,
                    from include/linux/node.h:17,
                    from include/linux/cpu.h:16,
                    from include/linux/stop_machine.h:4,
                    from kernel/sched/sched.h:10,
                    from kernel/sched/loadavg.c:11:
>> arch/sparc/include/asm/oplib_32.h:105:39: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
    int prom_startcpu(int cpunode, struct linux_prom_registers *context_table,
                                          ^~~~~~~~~~~~~~~~~~~~
   arch/sparc/include/asm/oplib_32.h:168:36: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
    void prom_apply_obio_ranges(struct linux_prom_registers *obioregs, int nregs);
                                       ^~~~~~~~~~~~~~~~~~~~
   arch/sparc/include/asm/oplib_32.h:172:18: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
              struct linux_prom_registers *sbusregs, int nregs);
                     ^~~~~~~~~~~~~~~~~~~~
--
   In file included from arch/sparc/include/asm/oplib.h:6:0,
                    from arch/sparc/include/asm/pgtable_32.h:21,
                    from arch/sparc/include/asm/pgtable.h:6,
                    from include/linux/mm.h:68,
                    from include/linux/bvec.h:25,
                    from include/linux/blk_types.h:9,
                    from include/linux/fs.h:31,
                    from include/linux/proc_fs.h:8,
                    from arch/sparc/include/asm/prom.h:22,
                    from include/linux/of.h:232,
                    from arch/sparc/include/asm/openprom.h:14,
                    from arch/sparc/prom/mp.c:12:
>> arch/sparc/include/asm/oplib_32.h:105:39: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
    int prom_startcpu(int cpunode, struct linux_prom_registers *context_table,
                                          ^~~~~~~~~~~~~~~~~~~~
   arch/sparc/include/asm/oplib_32.h:168:36: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
    void prom_apply_obio_ranges(struct linux_prom_registers *obioregs, int nregs);
                                       ^~~~~~~~~~~~~~~~~~~~
   arch/sparc/include/asm/oplib_32.h:172:18: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
              struct linux_prom_registers *sbusregs, int nregs);
                     ^~~~~~~~~~~~~~~~~~~~
>> arch/sparc/prom/mp.c:23:1: error: conflicting types for 'prom_startcpu'
    prom_startcpu(int cpunode, struct linux_prom_registers *ctable_reg, int ctx, char *pc)
    ^~~~~~~~~~~~~
   In file included from arch/sparc/include/asm/oplib.h:6:0,
                    from arch/sparc/include/asm/pgtable_32.h:21,
                    from arch/sparc/include/asm/pgtable.h:6,
                    from include/linux/mm.h:68,
                    from include/linux/bvec.h:25,
                    from include/linux/blk_types.h:9,
                    from include/linux/fs.h:31,
                    from include/linux/proc_fs.h:8,
                    from arch/sparc/include/asm/prom.h:22,
                    from include/linux/of.h:232,
                    from arch/sparc/include/asm/openprom.h:14,
                    from arch/sparc/prom/mp.c:12:
   arch/sparc/include/asm/oplib_32.h:105:5: note: previous declaration of 'prom_startcpu' was here
    int prom_startcpu(int cpunode, struct linux_prom_registers *context_table,
        ^~~~~~~~~~~~~
--
   In file included from arch/sparc/include/asm/oplib.h:6:0,
                    from arch/sparc/include/asm/pgtable_32.h:21,
                    from arch/sparc/include/asm/pgtable.h:6,
                    from include/linux/mm.h:68,
                    from include/linux/bvec.h:25,
                    from include/linux/blk_types.h:9,
                    from include/linux/fs.h:31,
                    from include/linux/proc_fs.h:8,
                    from arch/sparc/include/asm/prom.h:22,
                    from include/linux/of.h:232,
                    from arch/sparc/include/asm/openprom.h:14,
                    from arch/sparc/prom/ranges.c:11:
>> arch/sparc/include/asm/oplib_32.h:105:39: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
    int prom_startcpu(int cpunode, struct linux_prom_registers *context_table,
                                          ^~~~~~~~~~~~~~~~~~~~
   arch/sparc/include/asm/oplib_32.h:168:36: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
    void prom_apply_obio_ranges(struct linux_prom_registers *obioregs, int nregs);
                                       ^~~~~~~~~~~~~~~~~~~~
   arch/sparc/include/asm/oplib_32.h:172:18: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
              struct linux_prom_registers *sbusregs, int nregs);
                     ^~~~~~~~~~~~~~~~~~~~
>> arch/sparc/prom/ranges.c:57:6: error: conflicting types for 'prom_apply_obio_ranges'
    void prom_apply_obio_ranges(struct linux_prom_registers *regs, int nregs)
         ^~~~~~~~~~~~~~~~~~~~~~
   In file included from arch/sparc/include/asm/oplib.h:6:0,
                    from arch/sparc/include/asm/pgtable_32.h:21,
                    from arch/sparc/include/asm/pgtable.h:6,
                    from include/linux/mm.h:68,
                    from include/linux/bvec.h:25,
                    from include/linux/blk_types.h:9,
                    from include/linux/fs.h:31,
                    from include/linux/proc_fs.h:8,
                    from arch/sparc/include/asm/prom.h:22,
                    from include/linux/of.h:232,
                    from arch/sparc/include/asm/openprom.h:14,
                    from arch/sparc/prom/ranges.c:11:
   arch/sparc/include/asm/oplib_32.h:168:6: note: previous declaration of 'prom_apply_obio_ranges' was here
    void prom_apply_obio_ranges(struct linux_prom_registers *obioregs, int nregs);
         ^~~~~~~~~~~~~~~~~~~~~~
   In file included from include/linux/linkage.h:6:0,
                    from include/linux/kernel.h:6,
                    from include/linux/list.h:8,
                    from include/linux/module.h:9,
                    from arch/sparc/prom/ranges.c:9:
   arch/sparc/prom/ranges.c:62:15: error: conflicting types for 'prom_apply_obio_ranges'
    EXPORT_SYMBOL(prom_apply_obio_ranges);
                  ^
   include/linux/export.h:58:21: note: in definition of macro '___EXPORT_SYMBOL'
     extern typeof(sym) sym;      \
                        ^~~
>> arch/sparc/prom/ranges.c:62:1: note: in expansion of macro 'EXPORT_SYMBOL'
    EXPORT_SYMBOL(prom_apply_obio_ranges);
    ^~~~~~~~~~~~~
   In file included from arch/sparc/include/asm/oplib.h:6:0,
                    from arch/sparc/include/asm/pgtable_32.h:21,
                    from arch/sparc/include/asm/pgtable.h:6,
                    from include/linux/mm.h:68,
                    from include/linux/bvec.h:25,
                    from include/linux/blk_types.h:9,
                    from include/linux/fs.h:31,
                    from include/linux/proc_fs.h:8,
                    from arch/sparc/include/asm/prom.h:22,
                    from include/linux/of.h:232,
                    from arch/sparc/include/asm/openprom.h:14,
                    from arch/sparc/prom/ranges.c:11:
   arch/sparc/include/asm/oplib_32.h:168:6: note: previous declaration of 'prom_apply_obio_ranges' was here
    void prom_apply_obio_ranges(struct linux_prom_registers *obioregs, int nregs);
         ^~~~~~~~~~~~~~~~~~~~~~
>> arch/sparc/prom/ranges.c:87:6: error: conflicting types for 'prom_apply_generic_ranges'
    void prom_apply_generic_ranges(phandle node, phandle parent,
         ^~~~~~~~~~~~~~~~~~~~~~~~~
   In file included from arch/sparc/include/asm/oplib.h:6:0,
                    from arch/sparc/include/asm/pgtable_32.h:21,
                    from arch/sparc/include/asm/pgtable.h:6,
                    from include/linux/mm.h:68,
                    from include/linux/bvec.h:25,
                    from include/linux/blk_types.h:9,
                    from include/linux/fs.h:31,
                    from include/linux/proc_fs.h:8,
                    from arch/sparc/include/asm/prom.h:22,
                    from include/linux/of.h:232,
                    from arch/sparc/include/asm/openprom.h:14,
                    from arch/sparc/prom/ranges.c:11:
   arch/sparc/include/asm/oplib_32.h:171:6: note: previous declaration of 'prom_apply_generic_ranges' was here
    void prom_apply_generic_ranges(phandle node, phandle parent,
         ^~~~~~~~~~~~~~~~~~~~~~~~~
--
   In file included from include/linux/bvec.h:25:0,
                    from include/linux/blk_types.h:9,
                    from include/linux/fs.h:31,
                    from include/linux/proc_fs.h:8,
                    from arch/sparc/include/asm/prom.h:22,
                    from include/linux/of.h:232,
                    from arch/sparc/include/asm/openprom.h:14,
                    from arch/sparc/include/asm/oplib_32.h:11,
                    from arch/sparc/include/asm/oplib.h:6,
                    from arch/sparc/include/asm/pgtable_32.h:21,
                    from arch/sparc/include/asm/pgtable.h:6,
                    from mm/init-mm.c:9:
   include/linux/mm.h: In function 'is_vmalloc_addr':
>> include/linux/mm.h:486:17: error: 'VMALLOC_START' undeclared (first use in this function)
     return addr >= VMALLOC_START && addr < VMALLOC_END;
                    ^~~~~~~~~~~~~
   include/linux/mm.h:486:17: note: each undeclared identifier is reported only once for each function it appears in
>> include/linux/mm.h:486:41: error: 'VMALLOC_END' undeclared (first use in this function)
     return addr >= VMALLOC_START && addr < VMALLOC_END;
                                            ^~~~~~~~~~~
   include/linux/mm.h: In function 'maybe_mkwrite':
>> include/linux/mm.h:624:9: error: implicit declaration of function 'pte_mkwrite' [-Werror=implicit-function-declaration]
      pte = pte_mkwrite(pte);
            ^~~~~~~~~~~
   In file included from include/linux/bvec.h:25:0,
                    from include/linux/blk_types.h:9,
                    from include/linux/fs.h:31,
                    from include/linux/proc_fs.h:8,
                    from arch/sparc/include/asm/prom.h:22,
                    from include/linux/of.h:232,
                    from arch/sparc/include/asm/openprom.h:14,
                    from arch/sparc/include/asm/oplib_32.h:11,
                    from arch/sparc/include/asm/oplib.h:6,
                    from arch/sparc/include/asm/pgtable_32.h:21,
                    from arch/sparc/include/asm/pgtable.h:6,
                    from mm/init-mm.c:9:
   include/linux/mm.h: In function 'pgtable_init':
>> include/linux/mm.h:1674:2: error: implicit declaration of function 'pgtable_cache_init' [-Werror=implicit-function-declaration]
     pgtable_cache_init();
     ^~~~~~~~~~~~~~~~~~
   In file included from arch/sparc/include/asm/pgtable.h:6:0,
                    from mm/init-mm.c:9:
   arch/sparc/include/asm/pgtable_32.h: At top level:
>> arch/sparc/include/asm/pgtable_32.h:245:21: error: conflicting types for 'pte_mkwrite'
    static inline pte_t pte_mkwrite(pte_t pte)
                        ^~~~~~~~~~~
   In file included from include/linux/bvec.h:25:0,
                    from include/linux/blk_types.h:9,
                    from include/linux/fs.h:31,
                    from include/linux/proc_fs.h:8,
                    from arch/sparc/include/asm/prom.h:22,
                    from include/linux/of.h:232,
                    from arch/sparc/include/asm/openprom.h:14,
                    from arch/sparc/include/asm/oplib_32.h:11,
                    from arch/sparc/include/asm/oplib.h:6,
                    from arch/sparc/include/asm/pgtable_32.h:21,
                    from arch/sparc/include/asm/pgtable.h:6,
                    from mm/init-mm.c:9:
   include/linux/mm.h:624:9: note: previous implicit declaration of 'pte_mkwrite' was here
      pte = pte_mkwrite(pte);
            ^~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/VMALLOC_START +486 include/linux/mm.h

0738c4bb8 Paul Mundt             2008-03-12  480   */
bb00a789e Yaowei Bai             2016-05-19  481  static inline bool is_vmalloc_addr(const void *x)
9e2779fa2 Christoph Lameter      2008-02-04  482  {
0738c4bb8 Paul Mundt             2008-03-12  483  #ifdef CONFIG_MMU
9e2779fa2 Christoph Lameter      2008-02-04  484  	unsigned long addr = (unsigned long)x;
9e2779fa2 Christoph Lameter      2008-02-04  485  
9e2779fa2 Christoph Lameter      2008-02-04 @486  	return addr >= VMALLOC_START && addr < VMALLOC_END;
0738c4bb8 Paul Mundt             2008-03-12  487  #else
bb00a789e Yaowei Bai             2016-05-19  488  	return false;
8ca3ed87d David Howells          2008-02-23  489  #endif
0738c4bb8 Paul Mundt             2008-03-12  490  }
81ac3ad90 KAMEZAWA Hiroyuki      2009-09-22  491  #ifdef CONFIG_MMU
81ac3ad90 KAMEZAWA Hiroyuki      2009-09-22  492  extern int is_vmalloc_or_module_addr(const void *x);
81ac3ad90 KAMEZAWA Hiroyuki      2009-09-22  493  #else
934831d06 David Howells          2009-09-24  494  static inline int is_vmalloc_or_module_addr(const void *x)
81ac3ad90 KAMEZAWA Hiroyuki      2009-09-22  495  {
81ac3ad90 KAMEZAWA Hiroyuki      2009-09-22  496  	return 0;
81ac3ad90 KAMEZAWA Hiroyuki      2009-09-22  497  }
81ac3ad90 KAMEZAWA Hiroyuki      2009-09-22  498  #endif
9e2779fa2 Christoph Lameter      2008-02-04  499  
39f1f78d5 Al Viro                2014-05-06  500  extern void kvfree(const void *addr);
39f1f78d5 Al Viro                2014-05-06  501  
53f9263ba Kirill A. Shutemov     2016-01-15  502  static inline atomic_t *compound_mapcount_ptr(struct page *page)
53f9263ba Kirill A. Shutemov     2016-01-15  503  {
53f9263ba Kirill A. Shutemov     2016-01-15  504  	return &page[1].compound_mapcount;
53f9263ba Kirill A. Shutemov     2016-01-15  505  }
53f9263ba Kirill A. Shutemov     2016-01-15  506  
53f9263ba Kirill A. Shutemov     2016-01-15  507  static inline int compound_mapcount(struct page *page)
53f9263ba Kirill A. Shutemov     2016-01-15  508  {
5f527c2b3 Andrea Arcangeli       2016-05-20  509  	VM_BUG_ON_PAGE(!PageCompound(page), page);
53f9263ba Kirill A. Shutemov     2016-01-15  510  	page = compound_head(page);
53f9263ba Kirill A. Shutemov     2016-01-15  511  	return atomic_read(compound_mapcount_ptr(page)) + 1;
53f9263ba Kirill A. Shutemov     2016-01-15  512  }
53f9263ba Kirill A. Shutemov     2016-01-15  513  
ccaafd7fd Joonsoo Kim            2015-02-10  514  /*
70b50f94f Andrea Arcangeli       2011-11-02  515   * The atomic page->_mapcount, starts from -1: so that transitions
70b50f94f Andrea Arcangeli       2011-11-02  516   * both from it and to it can be tracked, using atomic_inc_and_test
70b50f94f Andrea Arcangeli       2011-11-02  517   * and atomic_add_negative(-1).
70b50f94f Andrea Arcangeli       2011-11-02  518   */
22b751c3d Mel Gorman             2013-02-22  519  static inline void page_mapcount_reset(struct page *page)
70b50f94f Andrea Arcangeli       2011-11-02  520  {
70b50f94f Andrea Arcangeli       2011-11-02  521  	atomic_set(&(page)->_mapcount, -1);
70b50f94f Andrea Arcangeli       2011-11-02  522  }
70b50f94f Andrea Arcangeli       2011-11-02  523  
b20ce5e03 Kirill A. Shutemov     2016-01-15  524  int __page_mapcount(struct page *page);
b20ce5e03 Kirill A. Shutemov     2016-01-15  525  
70b50f94f Andrea Arcangeli       2011-11-02  526  static inline int page_mapcount(struct page *page)
70b50f94f Andrea Arcangeli       2011-11-02  527  {
1d148e218 Wang, Yalin            2015-02-11  528  	VM_BUG_ON_PAGE(PageSlab(page), page);
53f9263ba Kirill A. Shutemov     2016-01-15  529  
b20ce5e03 Kirill A. Shutemov     2016-01-15  530  	if (unlikely(PageCompound(page)))
b20ce5e03 Kirill A. Shutemov     2016-01-15  531  		return __page_mapcount(page);
b20ce5e03 Kirill A. Shutemov     2016-01-15  532  	return atomic_read(&page->_mapcount) + 1;
53f9263ba Kirill A. Shutemov     2016-01-15  533  }
b20ce5e03 Kirill A. Shutemov     2016-01-15  534  
b20ce5e03 Kirill A. Shutemov     2016-01-15  535  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
b20ce5e03 Kirill A. Shutemov     2016-01-15  536  int total_mapcount(struct page *page);
6d0a07edd Andrea Arcangeli       2016-05-12  537  int page_trans_huge_mapcount(struct page *page, int *total_mapcount);
b20ce5e03 Kirill A. Shutemov     2016-01-15  538  #else
b20ce5e03 Kirill A. Shutemov     2016-01-15  539  static inline int total_mapcount(struct page *page)
b20ce5e03 Kirill A. Shutemov     2016-01-15  540  {
b20ce5e03 Kirill A. Shutemov     2016-01-15  541  	return page_mapcount(page);
70b50f94f Andrea Arcangeli       2011-11-02  542  }
6d0a07edd Andrea Arcangeli       2016-05-12  543  static inline int page_trans_huge_mapcount(struct page *page,
6d0a07edd Andrea Arcangeli       2016-05-12  544  					   int *total_mapcount)
6d0a07edd Andrea Arcangeli       2016-05-12  545  {
6d0a07edd Andrea Arcangeli       2016-05-12  546  	int mapcount = page_mapcount(page);
6d0a07edd Andrea Arcangeli       2016-05-12  547  	if (total_mapcount)
6d0a07edd Andrea Arcangeli       2016-05-12  548  		*total_mapcount = mapcount;
6d0a07edd Andrea Arcangeli       2016-05-12  549  	return mapcount;
6d0a07edd Andrea Arcangeli       2016-05-12  550  }
b20ce5e03 Kirill A. Shutemov     2016-01-15  551  #endif
70b50f94f Andrea Arcangeli       2011-11-02  552  
b49af68ff Christoph Lameter      2007-05-06  553  static inline struct page *virt_to_head_page(const void *x)
b49af68ff Christoph Lameter      2007-05-06  554  {
b49af68ff Christoph Lameter      2007-05-06  555  	struct page *page = virt_to_page(x);
ccaafd7fd Joonsoo Kim            2015-02-10  556  
1d798ca3f Kirill A. Shutemov     2015-11-06  557  	return compound_head(page);
b49af68ff Christoph Lameter      2007-05-06  558  }
b49af68ff Christoph Lameter      2007-05-06  559  
ddc58f27f Kirill A. Shutemov     2016-01-15  560  void __put_page(struct page *page);
ddc58f27f Kirill A. Shutemov     2016-01-15  561  
1d7ea7324 Alexander Zarochentsev 2006-08-13  562  void put_pages_list(struct list_head *pages);
^1da177e4 Linus Torvalds         2005-04-16  563  
8dfcc9ba2 Nick Piggin            2006-03-22  564  void split_page(struct page *page, unsigned int order);
8dfcc9ba2 Nick Piggin            2006-03-22  565  
^1da177e4 Linus Torvalds         2005-04-16  566  /*
33f2ef89f Andy Whitcroft         2006-12-06  567   * Compound pages have a destructor function.  Provide a
33f2ef89f Andy Whitcroft         2006-12-06  568   * prototype for that function and accessor functions.
f1e61557f Kirill A. Shutemov     2015-11-06  569   * These are _only_ valid on the head of a compound page.
33f2ef89f Andy Whitcroft         2006-12-06  570   */
f1e61557f Kirill A. Shutemov     2015-11-06  571  typedef void compound_page_dtor(struct page *);
f1e61557f Kirill A. Shutemov     2015-11-06  572  
f1e61557f Kirill A. Shutemov     2015-11-06  573  /* Keep the enum in sync with compound_page_dtors array in mm/page_alloc.c */
f1e61557f Kirill A. Shutemov     2015-11-06  574  enum compound_dtor_id {
f1e61557f Kirill A. Shutemov     2015-11-06  575  	NULL_COMPOUND_DTOR,
f1e61557f Kirill A. Shutemov     2015-11-06  576  	COMPOUND_PAGE_DTOR,
f1e61557f Kirill A. Shutemov     2015-11-06  577  #ifdef CONFIG_HUGETLB_PAGE
f1e61557f Kirill A. Shutemov     2015-11-06  578  	HUGETLB_PAGE_DTOR,
f1e61557f Kirill A. Shutemov     2015-11-06  579  #endif
9a982250f Kirill A. Shutemov     2016-01-15  580  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
9a982250f Kirill A. Shutemov     2016-01-15  581  	TRANSHUGE_PAGE_DTOR,
9a982250f Kirill A. Shutemov     2016-01-15  582  #endif
f1e61557f Kirill A. Shutemov     2015-11-06  583  	NR_COMPOUND_DTORS,
f1e61557f Kirill A. Shutemov     2015-11-06  584  };
f1e61557f Kirill A. Shutemov     2015-11-06  585  extern compound_page_dtor * const compound_page_dtors[];
33f2ef89f Andy Whitcroft         2006-12-06  586  
33f2ef89f Andy Whitcroft         2006-12-06  587  static inline void set_compound_page_dtor(struct page *page,
f1e61557f Kirill A. Shutemov     2015-11-06  588  		enum compound_dtor_id compound_dtor)
33f2ef89f Andy Whitcroft         2006-12-06  589  {
f1e61557f Kirill A. Shutemov     2015-11-06  590  	VM_BUG_ON_PAGE(compound_dtor >= NR_COMPOUND_DTORS, page);
f1e61557f Kirill A. Shutemov     2015-11-06  591  	page[1].compound_dtor = compound_dtor;
33f2ef89f Andy Whitcroft         2006-12-06  592  }
33f2ef89f Andy Whitcroft         2006-12-06  593  
33f2ef89f Andy Whitcroft         2006-12-06  594  static inline compound_page_dtor *get_compound_page_dtor(struct page *page)
33f2ef89f Andy Whitcroft         2006-12-06  595  {
f1e61557f Kirill A. Shutemov     2015-11-06  596  	VM_BUG_ON_PAGE(page[1].compound_dtor >= NR_COMPOUND_DTORS, page);
f1e61557f Kirill A. Shutemov     2015-11-06  597  	return compound_page_dtors[page[1].compound_dtor];
33f2ef89f Andy Whitcroft         2006-12-06  598  }
33f2ef89f Andy Whitcroft         2006-12-06  599  
d00181b96 Kirill A. Shutemov     2015-11-06  600  static inline unsigned int compound_order(struct page *page)
d85f33855 Christoph Lameter      2007-05-06  601  {
6d7779538 Christoph Lameter      2007-05-06  602  	if (!PageHead(page))
d85f33855 Christoph Lameter      2007-05-06  603  		return 0;
e4b294c2d Kirill A. Shutemov     2015-02-11  604  	return page[1].compound_order;
d85f33855 Christoph Lameter      2007-05-06  605  }
d85f33855 Christoph Lameter      2007-05-06  606  
f1e61557f Kirill A. Shutemov     2015-11-06  607  static inline void set_compound_order(struct page *page, unsigned int order)
d85f33855 Christoph Lameter      2007-05-06  608  {
e4b294c2d Kirill A. Shutemov     2015-02-11  609  	page[1].compound_order = order;
d85f33855 Christoph Lameter      2007-05-06  610  }
d85f33855 Christoph Lameter      2007-05-06  611  
9a982250f Kirill A. Shutemov     2016-01-15  612  void free_compound_page(struct page *page);
9a982250f Kirill A. Shutemov     2016-01-15  613  
3dece370e Michal Simek           2011-01-21  614  #ifdef CONFIG_MMU
33f2ef89f Andy Whitcroft         2006-12-06  615  /*
14fd403f2 Andrea Arcangeli       2011-01-13  616   * Do pte_mkwrite, but only if the vma says VM_WRITE.  We do this when
14fd403f2 Andrea Arcangeli       2011-01-13  617   * servicing faults for write access.  In the normal case, do always want
14fd403f2 Andrea Arcangeli       2011-01-13  618   * pte_mkwrite.  But get_user_pages can cause write faults for mappings
14fd403f2 Andrea Arcangeli       2011-01-13  619   * that do not have writing enabled, when used by access_process_vm.
14fd403f2 Andrea Arcangeli       2011-01-13  620   */
14fd403f2 Andrea Arcangeli       2011-01-13  621  static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
14fd403f2 Andrea Arcangeli       2011-01-13  622  {
14fd403f2 Andrea Arcangeli       2011-01-13  623  	if (likely(vma->vm_flags & VM_WRITE))
14fd403f2 Andrea Arcangeli       2011-01-13 @624  		pte = pte_mkwrite(pte);
14fd403f2 Andrea Arcangeli       2011-01-13  625  	return pte;
14fd403f2 Andrea Arcangeli       2011-01-13  626  }
8c6e50b02 Kirill A. Shutemov     2014-04-07  627  

:::::: The code at line 486 was first introduced by commit
:::::: 9e2779fa281cfda13ac060753d674bbcaa23367e is_vmalloc_addr(): Check if an address is within the vmalloc boundaries

:::::: TO: Christoph Lameter <clameter@sgi.com>
:::::: CC: Linus Torvalds <torvalds@woody.linux-foundation.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 11554 bytes --]

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 01/60] block: bio: introduce bio_init_with_vec_table()
  2016-10-29  8:08 ` [PATCH 01/60] block: bio: introduce bio_init_with_vec_table() Ming Lei
@ 2016-10-29 15:21   ` Christoph Hellwig
  0 siblings, 0 replies; 148+ messages in thread
From: Christoph Hellwig @ 2016-10-29 15:21 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, linux-fsdevel,
	Christoph Hellwig, Kirill A . Shutemov, Mike Christie,
	Hannes Reinecke, Keith Busch, Mike Snitzer

Just add the two arguments to bio_init instead of adding a second
function with a way too long name.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 23/60] block: introduce flag QUEUE_FLAG_NO_MP
  2016-10-29  8:08 ` [PATCH 23/60] block: introduce flag QUEUE_FLAG_NO_MP Ming Lei
@ 2016-10-29 15:29   ` Christoph Hellwig
  2016-10-29 22:20       ` Ming Lei
  0 siblings, 1 reply; 148+ messages in thread
From: Christoph Hellwig @ 2016-10-29 15:29 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, linux-fsdevel,
	Christoph Hellwig, Kirill A . Shutemov, Mike Christie,
	Hannes Reinecke, Dan Williams, Toshi Kani, shli, linux-raid

On Sat, Oct 29, 2016 at 04:08:22PM +0800, Ming Lei wrote:
> MD(especially raid1 and raid10) is a bit difficult to support
> multipage bvec, so introduce this flag for not enabling multipage
> bvec, then MD can still accept singlepage bvec only, and once
> direct access to bvec table in MD and other fs/drivers are cleanuped,
> the flag can be removed. BTRFS has the similar issue too.

There is really no good reason for that.  The RAID1 and 10 code really
just needs some love to use the bio cloning infrastructure, bio
iterators and generally recent bio apis.  btrfs just needs a tiny little
bit of help and I'll send patches soon.

Having two different code paths is just asking for trouble in the long
run.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 23/60] block: introduce flag QUEUE_FLAG_NO_MP
  2016-10-29 15:29   ` Christoph Hellwig
@ 2016-10-29 22:20       ` Ming Lei
  0 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-29 22:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Linux Kernel Mailing List, linux-block,
	Linux FS Devel, Kirill A . Shutemov, Mike Christie,
	Hannes Reinecke, Dan Williams, Toshi Kani, Shaohua Li,
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT

On Sat, Oct 29, 2016 at 11:29 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Sat, Oct 29, 2016 at 04:08:22PM +0800, Ming Lei wrote:
>> MD(especially raid1 and raid10) is a bit difficult to support
>> multipage bvec, so introduce this flag for not enabling multipage
>> bvec, then MD can still accept singlepage bvec only, and once
>> direct access to bvec table in MD and other fs/drivers are cleanuped,
>> the flag can be removed. BTRFS has the similar issue too.
>
> There is really no good reason for that.  The RAID1 and 10 code really
> just needs some love to use the bio cloning infrastructure, bio
> iterators and generally recent bio apis.  btrfs just needs a tiny little
> bit of help and I'll send patches soon.

That is very nice of you to do this cleanup, cool!

I guess it will still need a bit of time, and I hope that won't block
the whole patchset, :-)

[linux-2.6-next]$git grep -n -E "bi_io_vec|bi_vcnt" ./fs/btrfs/ | wc -l
45

[linux-2.6-next]$git grep -n -E "bi_io_vec|bi_vcnt" ./drivers/md/ |
grep raid | wc -l
54

>
> Having two different code path is just asking for trouble in the long
> run.

Definitely, that flag is introduced just as a short-term solution.

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 148+ messages in thread


* Re: [PATCH 45/60] block: bio: introduce bio_for_each_segment_all_rd() and its write pair
  2016-10-29  8:08 ` [PATCH 45/60] block: bio: introduce bio_for_each_segment_all_rd() and its write pair Ming Lei
@ 2016-10-31 13:59   ` Theodore Ts'o
  2016-10-31 15:11     ` Christoph Hellwig
  2016-10-31 22:46     ` Ming Lei
  0 siblings, 2 replies; 148+ messages in thread
From: Theodore Ts'o @ 2016-10-31 13:59 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, linux-fsdevel,
	Christoph Hellwig, Kirill A . Shutemov, Mike Christie,
	Hannes Reinecke, Keith Busch, Mike Snitzer, Johannes Thumshirn,
	Bart Van Assche

On Sat, Oct 29, 2016 at 04:08:44PM +0800, Ming Lei wrote:
> This patches introduce bio_for_each_segment_all_rd() and
> bio_for_each_segment_all_wt().
> 
> bio_for_each_segment_all_rd() is for replacing
> bio_for_each_segment_all() in case the bvec from bio->bi_io_vec
> is accessed as readonly.
> 
> bio_for_each_segment_all_wt() is for replacing
> bio_for_each_segment_all() in case the bvec from bio->bi_io_vec
> need to be updated.

What are _rd and _wt supposed to stand for?  And speaking more
generally, could you write up some more detailed notes about all of
the various new functions that have been added, when they should be
used, and some kind of roadmap for how things are supposed to work
beyond the very high-level description in the introduction in your
patch series?  Ideally it would go into the Documentation directory,
so that after this patch set gets applied, people will be able to
refer to it to understand how things are supposed to work.

Thanks!!

						- Ted

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 45/60] block: bio: introduce bio_for_each_segment_all_rd() and its write pair
  2016-10-31 13:59   ` Theodore Ts'o
@ 2016-10-31 15:11     ` Christoph Hellwig
  2016-10-31 22:50       ` Ming Lei
  2016-11-02  3:01       ` Kent Overstreet
  2016-10-31 22:46     ` Ming Lei
  1 sibling, 2 replies; 148+ messages in thread
From: Christoph Hellwig @ 2016-10-31 15:11 UTC (permalink / raw)
  To: Theodore Ts'o, Ming Lei, Jens Axboe, linux-kernel,
	linux-block, linux-fsdevel, Christoph Hellwig,
	Kirill A . Shutemov, Mike Christie, Hannes Reinecke, Keith Busch,
	Mike Snitzer, Johannes Thumshirn, Bart Van Assche

On Mon, Oct 31, 2016 at 09:59:43AM -0400, Theodore Ts'o wrote:
> What is _rd and _wt supposed to stand for?

I think it's read and write, but I think the naming is highly
unfortunate.  I started dabbling around with the patches a bit,
and to keep my sanity I started renaming them to _pages and _bvec,
which reflects the real semantics - the _rd or _pages variant gives
you a synthetic single-page bvec for each page, and the other one
gives you the full bvec.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 00/60] block: support multipage bvec
  2016-10-29  8:07 ` Ming Lei
@ 2016-10-31 15:25   ` Christoph Hellwig
  -1 siblings, 0 replies; 148+ messages in thread
From: Christoph Hellwig @ 2016-10-31 15:25 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, linux-fsdevel,
	Christoph Hellwig, Kirill A . Shutemov, Al Viro, Andrew Morton,
	Bart Van Assche, open list:GFS2 FILE SYSTEM, Coly Li,
	Dan Williams, open list:DEVICE-MAPPER  (LVM),
	open list:DRBD DRIVER, Eric Wheeler, Guoqing Jiang,
	Hannes Reinecke, Hannes Reinecke, Jiri Kosina, Joe Perches,
	Johannes Berg, Johannes Thumshirn, Keith Busch

Hi Ming,

can you send a first patch just doing the obvious cleanups like
converting to bio_add_page and replacing direct poking into the
bio with the proper accessors?  That should help reducing the
actual series to a sane size, and it should also help to cut
down the Cc list.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 00/60] block: support multipage bvec
@ 2016-10-31 15:25   ` Christoph Hellwig
  0 siblings, 0 replies; 148+ messages in thread
From: Christoph Hellwig @ 2016-10-31 15:25 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, linux-fsdevel,
	Christoph Hellwig, Kirill A . Shutemov, Al Viro, Andrew Morton,
	Bart Van Assche, open list:GFS2 FILE SYSTEM, Coly Li,
	Dan Williams, open list:DEVICE-MAPPER  (LVM),
	open list:DRBD DRIVER, Eric Wheeler, Guoqing Jiang,
	Hannes Reinecke, Hannes Reinecke, Jiri Kosina, Joe Perches,
	Johannes Berg, Johannes Thumshirn, Keith Busch, Kent Overstreet,
	Kent Overstreet, open list:BCACHE (BLOCK LAYER CACHE),
	open list:BTRFS FILE SYSTEM, open list:EXT4 FILE SYSTEM,
	open list:F2FS FILE SYSTEM, open list:MEMORY MANAGEMENT,
	open list:NVM EXPRESS TARGET DRIVER, open list:SUSPEND TO RAM,
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	open list:TARGET SUBSYSTEM, open list:XFS FILESYSTEM,
	open list:LogFS, Michal Hocko, Mike Christie, Mike Snitzer,
	Minchan Kim, Minfei Huang, open list:OSD LIBRARY and FILESYSTEM,
	Petr Mladek, Rasmus Villemoes, Takashi Iwai,
	open list:TARGET SUBSYSTEM, Toshi Kani, Yijing Wang, Zheng Liu,
	Zheng Liu

Hi Ming,

can you send a first patch just doing the obvious cleanups like
converting to bio_add_page and replacing direct poking into the
bio with the proper accessors?  That should help reducing the
actual series to a sane size, and it should also help to cut
down the Cc list.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 00/60] block: support multipage bvec
@ 2016-10-31 15:25   ` Christoph Hellwig
  0 siblings, 0 replies; 148+ messages in thread
From: Christoph Hellwig @ 2016-10-31 15:25 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, linux-fsdevel,
	Christoph Hellwig, Kirill A . Shutemov, Al Viro, Andrew Morton,
	Bart Van Assche, open list:GFS2 FILE SYSTEM, Coly Li,
	Dan Williams, open list:DEVICE-MAPPER  (LVM),
	open list:DRBD DRIVER, Eric Wheeler, Guoqing Jiang,
	Hannes Reinecke, Hannes Reinecke, Jiri Kosina, Joe Perches,
	Johannes Berg, Johannes Thumshirn, Keith Busch

Hi Ming,

can you send a first patch just doing the obvious cleanups like
converting to bio_add_page and replacing direct poking into the
bio with the proper accessors?  That should help reduce the
actual series to a sane size, and it should also help to cut
down the Cc list.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 03/60] block: drbd: remove impossible failure handling
  2016-10-29  8:08 ` [PATCH 03/60] block: drbd: remove impossible failure handling Ming Lei
@ 2016-10-31 15:25   ` Christoph Hellwig
  0 siblings, 0 replies; 148+ messages in thread
From: Christoph Hellwig @ 2016-10-31 15:25 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, linux-fsdevel,
	Christoph Hellwig, Kirill A . Shutemov, Philipp Reisner,
	Lars Ellenberg, open list:DRBD DRIVER

On Sat, Oct 29, 2016 at 04:08:02PM +0800, Ming Lei wrote:
> For a non-cloned bio, bio_add_page() only returns failure when
> the io vec table is full, but in that case, bio->bi_vcnt can't
> be zero at all.
> 
> So remove the impossible failure handling.
> 
> Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com>
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 04/60] block: floppy: use bio_add_page()
  2016-10-29  8:08 ` [PATCH 04/60] block: floppy: use bio_add_page() Ming Lei
@ 2016-10-31 15:26   ` Christoph Hellwig
  2016-10-31 22:54     ` Ming Lei
  2016-11-10 19:35   ` Christoph Hellwig
  1 sibling, 1 reply; 148+ messages in thread
From: Christoph Hellwig @ 2016-10-31 15:26 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, linux-fsdevel,
	Christoph Hellwig, Kirill A . Shutemov, Jiri Kosina,
	Mike Christie, Hannes Reinecke, Dan Williams

Why not keep the bio_add_page in the same spot as direct assignments
were before?

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 05/60] target: avoid to access .bi_vcnt directly
  2016-10-29  8:08   ` Ming Lei
  (?)
@ 2016-10-31 15:26   ` Christoph Hellwig
  -1 siblings, 0 replies; 148+ messages in thread
From: Christoph Hellwig @ 2016-10-31 15:26 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, linux-fsdevel,
	Christoph Hellwig, Kirill A . Shutemov, Nicholas A. Bellinger,
	open list:TARGET SUBSYSTEM, open list:TARGET SUBSYSTEM

On Sat, Oct 29, 2016 at 04:08:04PM +0800, Ming Lei wrote:
> When the bio is full, bio_add_pc_page() returns zero,
> so use that return value to detect a full bio.
> 
> Also replace access to .bi_vcnt for pr_debug() with bio_segments().
> 
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 09/60] dm: dm.c: replace 'bio->bi_vcnt == 1' with !bio_multiple_segments
  2016-10-29  8:08   ` Ming Lei
@ 2016-10-31 15:29     ` Christoph Hellwig
  -1 siblings, 0 replies; 148+ messages in thread
From: Christoph Hellwig @ 2016-10-31 15:29 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, linux-fsdevel,
	Christoph Hellwig, Kirill A . Shutemov, Alasdair Kergon,
	Mike Snitzer, maintainer:DEVICE-MAPPER (LVM),
	Shaohua Li, open list:SOFTWARE RAID (Multiple Disks) SUPPORT

On Sat, Oct 29, 2016 at 04:08:08PM +0800, Ming Lei wrote:
> Avoid accessing .bi_vcnt directly, because it may no longer be
> what the driver expects after multipage bvec support.
> 
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>

It would be really nice to have a comment in the code explaining
why it's even checking for multiple segments.


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 11/60] fs: logfs: use bio_add_page() in __bdev_writeseg()
  2016-10-29  8:08 ` [PATCH 11/60] fs: logfs: use bio_add_page() in __bdev_writeseg() Ming Lei
@ 2016-10-31 15:29   ` Christoph Hellwig
  0 siblings, 0 replies; 148+ messages in thread
From: Christoph Hellwig @ 2016-10-31 15:29 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, linux-fsdevel,
	Christoph Hellwig, Kirill A . Shutemov, Joern Engel,
	Prasad Joshi, open list:LogFS

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 12/60] fs: logfs: use bio_add_page() in do_erase()
  2016-10-29  8:08 ` [PATCH 12/60] fs: logfs: use bio_add_page() in do_erase() Ming Lei
@ 2016-10-31 15:29   ` Christoph Hellwig
  0 siblings, 0 replies; 148+ messages in thread
From: Christoph Hellwig @ 2016-10-31 15:29 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, linux-fsdevel,
	Christoph Hellwig, Kirill A . Shutemov, Joern Engel,
	Prasad Joshi, open list:LogFS

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 13/60] fs: logfs: remove unnecesary check
  2016-10-29  8:08 ` [PATCH 13/60] fs: logfs: remove unnecesary check Ming Lei
@ 2016-10-31 15:29   ` Christoph Hellwig
  0 siblings, 0 replies; 148+ messages in thread
From: Christoph Hellwig @ 2016-10-31 15:29 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, linux-fsdevel,
	Christoph Hellwig, Kirill A . Shutemov, Joern Engel,
	Prasad Joshi, open list:LogFS

On Sat, Oct 29, 2016 at 04:08:12PM +0800, Ming Lei wrote:
> The check on bio->bi_vcnt doesn't make sense in erase_end_io().

Agreed,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 15/60] block: loop: comment on direct access to bvec table
  2016-10-29  8:08 ` [PATCH 15/60] block: loop: comment on direct access to " Ming Lei
@ 2016-10-31 15:31   ` Christoph Hellwig
  2016-10-31 23:08     ` Ming Lei
  0 siblings, 1 reply; 148+ messages in thread
From: Christoph Hellwig @ 2016-10-31 15:31 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, linux-fsdevel,
	Christoph Hellwig, Kirill A . Shutemov, Hannes Reinecke,
	Mike Christie, Minfei Huang, Petr Mladek

Btw, the lib/iov_iter.c code that iterates over bvecs currently
expects single-page segments.  Is the loop code fine with that?
Even if it is, I think we'd be much better off if it becomes
multipage-segment aware.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 16/60] block: pktcdvd: comment on direct access to bvec table
  2016-10-29  8:08 ` [PATCH 16/60] block: pktcdvd: " Ming Lei
@ 2016-10-31 15:33   ` Christoph Hellwig
  2016-10-31 23:08     ` Ming Lei
  0 siblings, 1 reply; 148+ messages in thread
From: Christoph Hellwig @ 2016-10-31 15:33 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, linux-fsdevel,
	Christoph Hellwig, Kirill A . Shutemov, Jiri Kosina

Please pick up my "pktcdvd: don't scribble over the bvec array"
patch instead of the pktcdvd patches in this series.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 19/60] fs/buffer: comment on direct access to bvec table
  2016-10-29  8:08 ` [PATCH 19/60] fs/buffer: " Ming Lei
@ 2016-10-31 15:35   ` Christoph Hellwig
  2016-10-31 23:12     ` Ming Lei
  0 siblings, 1 reply; 148+ messages in thread
From: Christoph Hellwig @ 2016-10-31 15:35 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, linux-fsdevel,
	Christoph Hellwig, Kirill A . Shutemov, Alexander Viro,
	Kent Overstreet


I think we'll just need a version zero_fill_bio with a length argument
and let that handle all the bvec access.  I have vague memories that
Kent posted one a while ago, Ccing him.

On Sat, Oct 29, 2016 at 04:08:18PM +0800, Ming Lei wrote:
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> ---
>  fs/buffer.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index b205a629001d..81c3793948b4 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -3018,8 +3018,13 @@ static void end_bio_bh_io_sync(struct bio *bio)
>  void guard_bio_eod(int op, struct bio *bio)
>  {
>  	sector_t maxsector;
> -	struct bio_vec *bvec = &bio->bi_io_vec[bio->bi_vcnt - 1];
>  	unsigned truncated_bytes;
> +	/*
> +	 * It is safe to truncate the last bvec in the following way
> +	 * even though multipage bvec is supported, but we need to
> +	 * fix the parameters passed to zero_user().
> +	 */
> +	struct bio_vec *bvec = &bio->bi_io_vec[bio->bi_vcnt - 1];
>  
>  	maxsector = i_size_read(bio->bi_bdev->bd_inode) >> 9;
>  	if (!maxsector)
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
---end quoted text---

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 26/60] btrfs: set NO_MP for request queues behind BTRFS
  2016-10-29  8:08 ` [PATCH 26/60] btrfs: set NO_MP for request queues behind BTRFS Ming Lei
@ 2016-10-31 15:36   ` Christoph Hellwig
  2016-10-31 17:58     ` Chris Mason
  0 siblings, 1 reply; 148+ messages in thread
From: Christoph Hellwig @ 2016-10-31 15:36 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, linux-fsdevel,
	Christoph Hellwig, Kirill A . Shutemov, Chris Mason, Josef Bacik,
	David Sterba, open list:BTRFS FILE SYSTEM

On Sat, Oct 29, 2016 at 04:08:25PM +0800, Ming Lei wrote:
> There are lots of direct accesses to .bi_vcnt & .bi_io_vec
> of the bio, and BTRFS isn't ready to support multipage bvecs,
> so set NO_MP for these request queues.

For one, an I/O submitter has absolutely no business changing
queue flags - if we need to stick to this limitation it simply needs
a version of bio_add_page that doesn't create multi-page bvecs.

Second I don't think making it multipage bvec aware is all that hard,
and we should aim for doing the proper thing.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 28/60] block: introduce QUEUE_FLAG_SPLIT_MP
  2016-10-29  8:08 ` [PATCH 28/60] block: introduce QUEUE_FLAG_SPLIT_MP Ming Lei
@ 2016-10-31 15:39   ` Christoph Hellwig
  2016-10-31 23:56     ` Ming Lei
  2016-11-02  3:08     ` Kent Overstreet
  0 siblings, 2 replies; 148+ messages in thread
From: Christoph Hellwig @ 2016-10-31 15:39 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, linux-fsdevel,
	Christoph Hellwig, Kirill A . Shutemov, Jens Axboe,
	Hannes Reinecke, Mike Christie, Dan Williams, Toshi Kani,
	Kent Overstreet

On Sat, Oct 29, 2016 at 04:08:27PM +0800, Ming Lei wrote:
> Some drivers (such as dm) should be capable of dealing with multipage
> bvecs, but the incoming bio may be too big: for example, a new
> singlepage-bvec bio can't be cloned from it, or can't be allocated
> with singlepage bvecs of the same total size.
> 
> At least crypt dm, log writes and bcache have this kind of issue.

We already have the segment_size limitation for request based drivers.
I'd rather extent it to bio drivers if really needed.

But then again we should look into not having this limitation.  E.g.
for bcache I'd be really surprised if it's that limited, given that
Kent came up with this whole multipage bvec scheme.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 46/60] block: deal with dirtying pages for multipage bvec
  2016-10-29  8:08 ` [PATCH 46/60] block: deal with dirtying pages for multipage bvec Ming Lei
@ 2016-10-31 15:40   ` Christoph Hellwig
  2016-11-01  0:19     ` Ming Lei
  0 siblings, 1 reply; 148+ messages in thread
From: Christoph Hellwig @ 2016-10-31 15:40 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, linux-fsdevel,
	Christoph Hellwig, Kirill A . Shutemov, Jens Axboe

On Sat, Oct 29, 2016 at 04:08:45PM +0800, Ming Lei wrote:
> In bio_check_pages_dirty(), bvec->bv_page is used as a flag
> marking whether the page has been dirtied & released; if not,
> it will be dirtied in a deferred workqueue.
> 
> With multipage bvec, we can't do that any more, so change the
> logic to check all pages in one mp bvec, and only release these
> pages if all are dirtied; otherwise dirty them all in the
> deferred workqueue.

Just defer the whole bio to the workqueue if we need to redirty any,
that avoids having all these complex iteratations.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 26/60] btrfs: set NO_MP for request queues behind BTRFS
  2016-10-31 15:36   ` Christoph Hellwig
@ 2016-10-31 17:58     ` Chris Mason
  2016-10-31 18:00       ` Christoph Hellwig
  0 siblings, 1 reply; 148+ messages in thread
From: Chris Mason @ 2016-10-31 17:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ming Lei, Jens Axboe, linux-kernel, linux-block, linux-fsdevel,
	Kirill A . Shutemov, Josef Bacik, David Sterba,
	open list:BTRFS FILE SYSTEM

On Mon, Oct 31, 2016 at 08:36:44AM -0700, Christoph Hellwig wrote:
>On Sat, Oct 29, 2016 at 04:08:25PM +0800, Ming Lei wrote:
>> There are lots of direct accesses to .bi_vcnt & .bi_io_vec
>> of the bio, and BTRFS isn't ready to support multipage bvecs,
>> so set NO_MP for these request queues.
>
>For one, an I/O submitter has absolutely no business changing
>queue flags - if we need to stick to this limitation it simply needs
>a version of bio_add_page that doesn't create multi-page bvecs.
>
>Second I don't think making it multipage bvec aware is all that hard,
>and we should aim for doing the proper thing.

Yeah, I'd rather make us less special.  The direct access was a short 
term fix to adjust to the new bio interfaces, we should clean it up.

-chris


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 26/60] btrfs: set NO_MP for request queues behind BTRFS
  2016-10-31 17:58     ` Chris Mason
@ 2016-10-31 18:00       ` Christoph Hellwig
  0 siblings, 0 replies; 148+ messages in thread
From: Christoph Hellwig @ 2016-10-31 18:00 UTC (permalink / raw)
  To: Chris Mason, Christoph Hellwig, Ming Lei, Jens Axboe,
	linux-kernel, linux-block, linux-fsdevel, Kirill A . Shutemov,
	Josef Bacik, David Sterba, open list:BTRFS FILE SYSTEM

On Mon, Oct 31, 2016 at 11:58:29AM -0600, Chris Mason wrote:
> Yeah, I'd rather make us less special.  The direct access was a short term
> fix to adjust to the new bio interfaces, we should clean it up.

I've got patches for a few areas in progress, I'll send them your way
once I've finished testing.  There will be a few more areas left where
I'll need a little help, though.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 45/60] block: bio: introduce bio_for_each_segment_all_rd() and its write pair
  2016-10-31 13:59   ` Theodore Ts'o
  2016-10-31 15:11     ` Christoph Hellwig
@ 2016-10-31 22:46     ` Ming Lei
  2016-10-31 23:51       ` Ming Lei
  1 sibling, 1 reply; 148+ messages in thread
From: Ming Lei @ 2016-10-31 22:46 UTC (permalink / raw)
  To: Theodore Ts'o, Ming Lei, Jens Axboe,
	Linux Kernel Mailing List, linux-block, Linux FS Devel,
	Christoph Hellwig, Kirill A . Shutemov, Mike Christie,
	Hannes Reinecke, Keith Busch, Mike Snitzer, Johannes Thumshirn,
	Bart Van Assche

On Mon, Oct 31, 2016 at 9:59 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Sat, Oct 29, 2016 at 04:08:44PM +0800, Ming Lei wrote:
>> This patches introduce bio_for_each_segment_all_rd() and
>> bio_for_each_segment_all_wt().
>>
>> bio_for_each_segment_all_rd() is for replacing
>> bio_for_each_segment_all() in case the bvec from bio->bi_io_vec
>> is accessed as readonly.
>>
>> bio_for_each_segment_all_wt() is for replacing
>> bio_for_each_segment_all() in case the bvec from bio->bi_io_vec
>> need to be updated.
>
> What is _rd and _wt supposed to stand for?  And speaking more

As Christoph replied, _rd means read: the bvec pointed to by the
iterator variable (a bvec pointer) is read-only, and the bvec table
can't be written to through it any more. Maybe
bio_for_each_segment_all_ro is better?

On the other hand, _wt means write: the bvec pointed to by the
iterator variable can be written to. Maybe we can use the
original bio_for_each_segment_all() for it?

> generally, could you write up some more detailed notes about all of
> the various new functions they have been added, when they should be
> used, and some kind of roadmap about how things are supposed to work
> beyond the very high-level description in the introduction in your
> patch series?  Ideally it would go into the Documentation directory,
> so that after this patch set gets applied, people will be able to
> refer to it to understand how things are supposed to work.

In the next post, I will add comment on the two helpers, thanks for
your review.

Thanks,
Ming

>
> Thanks!!
>
>                                                 - Ted



-- 
Ming Lei

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 45/60] block: bio: introduce bio_for_each_segment_all_rd() and its write pair
  2016-10-31 15:11     ` Christoph Hellwig
@ 2016-10-31 22:50       ` Ming Lei
  2016-11-02  3:01       ` Kent Overstreet
  1 sibling, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-31 22:50 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Theodore Ts'o, Jens Axboe, Linux Kernel Mailing List,
	linux-block, Linux FS Devel, Kirill A . Shutemov, Mike Christie,
	Hannes Reinecke, Keith Busch, Mike Snitzer, Johannes Thumshirn,
	Bart Van Assche

On Mon, Oct 31, 2016 at 11:11 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Mon, Oct 31, 2016 at 09:59:43AM -0400, Theodore Ts'o wrote:
>> What is _rd and _wt supposed to stand for?
>
> I think it's read and write, but I think the naming is highly
> unfortunate.  I started dabbling around with the patches a bit,
> and to keep my sanity I started renaming them to _pages and _bvec,
> which is the real semantics - the _rd or _pages gives you a synthetic
> bvec for each page, and the other one gives you the full bvec.

_pages & _bvec look better, but are still a little confusing and
don't reflect the real purpose from the user's view, since both
point to real bvecs. Could we just rename _rd to
bio_for_each_segment_all_ro(), meaning the pointed-to bvec is
read-only, and not introduce _wt?


Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 00/60] block: support multipage bvec
  2016-10-31 15:25   ` Christoph Hellwig
  (?)
@ 2016-10-31 22:52     ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-31 22:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Linux Kernel Mailing List, linux-block,
	Linux FS Devel, Kirill A . Shutemov, Al Viro, Andrew Morton,
	Bart Van Assche, open list:GFS2 FILE SYSTEM, Coly Li,
	Dan Williams, open list:DEVICE-MAPPER (LVM),
	open list:DRBD DRIVER, Eric Wheeler, Guoqing Jiang,
	Hannes Reinecke, Hannes Reinecke, Jiri Kosina, Joe Perches,
	Johannes Berg, Johannes Thumshirn, Kei

On Mon, Oct 31, 2016 at 11:25 PM, Christoph Hellwig <hch@infradead.org> wrote:
> Hi Ming,
>
> can you send a first patch just doing the obvious cleanups like
> converting to bio_add_page and replacing direct poking into the
> bio with the proper accessors?  That should help reducing the

OK, that is just the 1st part of the patchset.

> actual series to a sane size, and it should also help to cut
> down the Cc list.
>



Thanks,
Ming Lei


^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 04/60] block: floppy: use bio_add_page()
  2016-10-31 15:26   ` Christoph Hellwig
@ 2016-10-31 22:54     ` Ming Lei
  0 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-31 22:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Linux Kernel Mailing List, linux-block,
	Linux FS Devel, Kirill A . Shutemov, Jiri Kosina, Mike Christie,
	Hannes Reinecke, Dan Williams

On Mon, Oct 31, 2016 at 11:26 PM, Christoph Hellwig <hch@infradead.org> wrote:
> Why not keep the bio_add_page in the same spot as direct assignments
> were before?

I just wanted to put the page addition after setting bi_bdev.

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 09/60] dm: dm.c: replace 'bio->bi_vcnt == 1' with !bio_multiple_segments
  2016-10-31 15:29     ` Christoph Hellwig
@ 2016-10-31 22:59       ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-31 22:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Linux Kernel Mailing List, linux-block,
	Linux FS Devel, Kirill A . Shutemov, Alasdair Kergon,
	Mike Snitzer, maintainer:DEVICE-MAPPER (LVM),
	Shaohua Li, open list:SOFTWARE RAID (Multiple Disks) SUPPORT

On Mon, Oct 31, 2016 at 11:29 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Sat, Oct 29, 2016 at 04:08:08PM +0800, Ming Lei wrote:
>> Avoid accessing .bi_vcnt directly, because it may no longer be
>> what the driver expects after multipage bvec support.
>>
>> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
>
> It would be really nice to have a comment in the code explaining
> why it's even checking for multiple segments.
>

OK, I will add a comment about using !bio_multiple_segments(rq->bio)
to replace 'rq->bio->bi_vcnt == 1'.


Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 15/60] block: loop: comment on direct access to bvec table
  2016-10-31 15:31   ` Christoph Hellwig
@ 2016-10-31 23:08     ` Ming Lei
  0 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-31 23:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Linux Kernel Mailing List, linux-block,
	Linux FS Devel, Kirill A . Shutemov, Hannes Reinecke,
	Mike Christie, Minfei Huang, Petr Mladek

On Mon, Oct 31, 2016 at 11:31 PM, Christoph Hellwig <hch@infradead.org> wrote:
> Btw, the lib/iov_iter.c code that iterates over bvec currently
> expects single-page segments.  Is the loop code fine with that?

lib/iov_iter.c has already switched to the bvec iterator in the
mp-bvec preparation patchset, so everything will be fine once
multipage bvec is enabled.

Another multipage bvec benefit for lib/iov_iter.c (dio) is that we
can return all the pages of one segment at once, instead of one
page at a time (for example in iov_iter_get_pages()), but that can
be a follow-up optimization.

> Even if it is I think we'd be much better off if it becomes multipage
> segment aware.

This patch is for auditing the possible effects of multipage bvec,
so it looks like we should expose as many of the direct accesses
to the bvec table as possible.

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 16/60] block: pktcdvd: comment on direct access to bvec table
  2016-10-31 15:33   ` Christoph Hellwig
@ 2016-10-31 23:08     ` Ming Lei
  0 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-31 23:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Linux Kernel Mailing List, linux-block,
	Linux FS Devel, Kirill A . Shutemov, Jiri Kosina

On Mon, Oct 31, 2016 at 11:33 PM, Christoph Hellwig <hch@infradead.org> wrote:
> Please pick up my "pktcdvd: don't scribble over the bvec array"
> patch instead of the pktcdvd patches in this series.

OK.

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 19/60] fs/buffer: comment on direct access to bvec table
  2016-10-31 15:35   ` Christoph Hellwig
@ 2016-10-31 23:12     ` Ming Lei
  0 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-31 23:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Linux Kernel Mailing List, linux-block,
	Linux FS Devel, Kirill A . Shutemov, Alexander Viro,
	Kent Overstreet

On Mon, Oct 31, 2016 at 11:35 PM, Christoph Hellwig <hch@infradead.org> wrote:
>
> I think we'll just need a version zero_fill_bio with a length argument
> and let that handle all the bvec access.  I have vague memories that
> Kent posted one a while ago, Ccing him.

I will try to find zero_fill_bio() in Kent's tree later.

BTW, patch 59 switches to using the bvec iterator to do that too.

>
> On Sat, Oct 29, 2016 at 04:08:18PM +0800, Ming Lei wrote:
>> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
>> ---
>>  fs/buffer.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/buffer.c b/fs/buffer.c
>> index b205a629001d..81c3793948b4 100644
>> --- a/fs/buffer.c
>> +++ b/fs/buffer.c
>> @@ -3018,8 +3018,13 @@ static void end_bio_bh_io_sync(struct bio *bio)
>>  void guard_bio_eod(int op, struct bio *bio)
>>  {
>>       sector_t maxsector;
>> -     struct bio_vec *bvec = &bio->bi_io_vec[bio->bi_vcnt - 1];
>>       unsigned truncated_bytes;
>> +     /*
>> +      * It is safe to truncate the last bvec in the following way
>> +      * even though multipage bvec is supported, but we need to
>> +      * fix the parameters passed to zero_user().
>> +      */
>> +     struct bio_vec *bvec = &bio->bi_io_vec[bio->bi_vcnt - 1];
>>
>>       maxsector = i_size_read(bio->bi_bdev->bd_inode) >> 9;
>>       if (!maxsector)
>> --
>> 2.7.4
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-block" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> ---end quoted text---


thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 45/60] block: bio: introduce bio_for_each_segment_all_rd() and its write pair
  2016-10-31 22:46     ` Ming Lei
@ 2016-10-31 23:51       ` Ming Lei
  2016-11-01 14:17         ` Theodore Ts'o
  0 siblings, 1 reply; 148+ messages in thread
From: Ming Lei @ 2016-10-31 23:51 UTC (permalink / raw)
  To: Theodore Ts'o, Ming Lei, Jens Axboe,
	Linux Kernel Mailing List, linux-block, Linux FS Devel,
	Christoph Hellwig, Kirill A . Shutemov, Mike Christie,
	Hannes Reinecke, Keith Busch, Mike Snitzer, Johannes Thumshirn,
	Bart Van Assche

On Tue, Nov 1, 2016 at 6:46 AM, Ming Lei <tom.leiming@gmail.com> wrote:
> On Mon, Oct 31, 2016 at 9:59 PM, Theodore Ts'o <tytso@mit.edu> wrote:
>> On Sat, Oct 29, 2016 at 04:08:44PM +0800, Ming Lei wrote:
>>> This patches introduce bio_for_each_segment_all_rd() and
>>> bio_for_each_segment_all_wt().
>>>
>>> bio_for_each_segment_all_rd() is for replacing
>>> bio_for_each_segment_all() in case the bvec from bio->bi_io_vec
>>> is accessed as readonly.
>>>
>>> bio_for_each_segment_all_wt() is for replacing
>>> bio_for_each_segment_all() in case the bvec from bio->bi_io_vec
>>> need to be updated.
>>
>> What is _rd and _wt supposed to stand for?  And speaking more
>
> As Christoph replied, _rd means read, which said the bvec pointed by
> the iterator variable(bvec pointer) is read-only, and the bvec table
> can't be written into via this usage any more. Maybe
> bio_for_each_segment_all_ro is better?

Sorry for forgetting to mention one important point:

- after multipage bvec is introduced, the iterated bvec pointer
still points to a single-page bvec, which is generated in-flight
and is actually read-only. That is the motivation behind the
introduction of bio_for_each_segment_all_rd().

So maybe bio_for_each_page_all_ro() is better?

>
> On the other hand, _wr meands write, which said the bvec pointed by
> the iterator variable(bvec pointer) can be written to. Maybe we can use
> original bio_for_each_segment_all() for it?

For _wt(), we can still keep it as bio_for_each_segment(), which also
reflects that the iterated bvec now points to one whole segment if
we name _rd as bio_for_each_page_all_ro().


Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 28/60] block: introduce QUEUE_FLAG_SPLIT_MP
  2016-10-31 15:39   ` Christoph Hellwig
@ 2016-10-31 23:56     ` Ming Lei
  2016-11-02  3:08     ` Kent Overstreet
  1 sibling, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-10-31 23:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Linux Kernel Mailing List, linux-block,
	Linux FS Devel, Kirill A . Shutemov, Jens Axboe, Hannes Reinecke,
	Mike Christie, Dan Williams, Toshi Kani, Kent Overstreet

On Mon, Oct 31, 2016 at 11:39 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Sat, Oct 29, 2016 at 04:08:27PM +0800, Ming Lei wrote:
>> Some drivers(such as dm) should be capable of dealing with multipage
>> bvec, but the incoming bio may be too big, such as, a new singlepage bvec
>> bio can't be cloned from the bio, or can't be allocated to singlepage
>> bvec with same size.
>>
>> At least crypt dm, log writes and bcache have this kind of issue.
>
> We already have the segment_size limitation for request based drivers.
> I'd rather extent it to bio drivers if really needed.

Yeah, I just found that dm actually doesn't need the flag, and it has
its own way of limiting bio size. For bcache, there is only one place
which needs the flag, so we can use the max sectors limit to address it,
or use multiple bios to read & check one by one.

>
> But then again we should look into not having this limitation.  E.g.
> for bcache I'd be really surprised if it's that limited, given that
> Kent came up with this whole multipage bvec scheme.

As far as I can tell, the only place which needs the flag is bch_data_verify().


Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 46/60] block: deal with dirtying pages for multipage bvec
  2016-10-31 15:40   ` Christoph Hellwig
@ 2016-11-01  0:19     ` Ming Lei
  0 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-11-01  0:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Linux Kernel Mailing List, linux-block,
	Linux FS Devel, Kirill A . Shutemov, Jens Axboe

On Mon, Oct 31, 2016 at 11:40 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Sat, Oct 29, 2016 at 04:08:45PM +0800, Ming Lei wrote:
>> In bio_check_pages_dirty(), bvec->bv_page is used as flag
>> for marking if the page has been dirtied & released, and if
>> no, it will be dirtied in deferred workqueue.
>>
>> With multipage bvec, we can't do that any more, so change
>> the logic into checking all pages in one mp bvec, and only
>> release all these pages if all are dirtied, otherwise dirty
>> them all in deferred wrokqueue.
>
> Just defer the whole bio to the workqueue if we need to redirty any,
> that avoids having all these complex iteratations.

For dio READ, the pages are always dirtied before submission, so there
should be little possibility that pages in the bio become non-dirty by
completion; it would hurt performance if each direct-read bio were
deferred to the block wq.


thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 45/60] block: bio: introduce bio_for_each_segment_all_rd() and its write pair
  2016-10-31 23:51       ` Ming Lei
@ 2016-11-01 14:17         ` Theodore Ts'o
  2016-11-02  1:58           ` Ming Lei
  0 siblings, 1 reply; 148+ messages in thread
From: Theodore Ts'o @ 2016-11-01 14:17 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Linux Kernel Mailing List, linux-block,
	Linux FS Devel, Christoph Hellwig, Kirill A . Shutemov,
	Mike Christie, Hannes Reinecke, Keith Busch, Mike Snitzer,
	Johannes Thumshirn, Bart Van Assche

On Tue, Nov 01, 2016 at 07:51:27AM +0800, Ming Lei wrote:
> Sorry for forgetting to mention one important point:
> 
> - after multipage bvec is introduced, the iterated bvec pointer
> still points to singlge page bvec, which is generated in-flight
> and is readonly actually. That is the motivation about the introduction
> of bio_for_each_segment_all_rd().
> 
> So maybe bio_for_each_page_all_ro() is better?
> 
> For _wt(), we still can keep it as bio_for_each_segment(), which also
> reflects that now the iterated bvec points to one whole segment if
> we name _rd as bio_for_each_page_all_ro().

I'm agnostic as to what the right names are --- my big concern is
that there is an explosion of bio_for_each_page_* functions, and that
there isn't good documentation about (a) when to use each of these
functions, and (b) why.  I was going through the patch series, and it
was hard for me to figure out why, even though I was looking through
all of the patches.  Once all of the patches are merged in, I am
concerned this is going to be a massive trapdoor that will snare a
large number of unwitting developers.

As far as my preference, from an abstract perspective: if one version
(the read-write variant, I presume) is always safe, while the other (the
read-only variant) is faster but only works under restricted
circumstances, then name the safe version so it is the "default", give
the more dangerous one a name that makes it a bit more obvious what
you have to do in order to use it safely, and then very clearly
document both in the sources and in the Documentation directory: what
the issues are and what you have to do in order to use the faster
version.

Cheers,

					- Ted
					

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 45/60] block: bio: introduce bio_for_each_segment_all_rd() and its write pair
  2016-11-01 14:17         ` Theodore Ts'o
@ 2016-11-02  1:58           ` Ming Lei
  0 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-11-02  1:58 UTC (permalink / raw)
  To: Theodore Ts'o, Ming Lei, Jens Axboe,
	Linux Kernel Mailing List, linux-block, Linux FS Devel,
	Christoph Hellwig, Kirill A . Shutemov, Mike Christie,
	Hannes Reinecke, Keith Busch, Mike Snitzer, Johannes Thumshirn,
	Bart Van Assche, Kent Overstreet

On Tue, Nov 1, 2016 at 10:17 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Tue, Nov 01, 2016 at 07:51:27AM +0800, Ming Lei wrote:
>> Sorry for forgetting to mention one important point:
>>
>> - after multipage bvec is introduced, the iterated bvec pointer
>> still points to singlge page bvec, which is generated in-flight
>> and is readonly actually. That is the motivation about the introduction
>> of bio_for_each_segment_all_rd().
>>
>> So maybe bio_for_each_page_all_ro() is better?
>>
>> For _wt(), we still can keep it as bio_for_each_segment(), which also
>> reflects that now the iterated bvec points to one whole segment if
>> we name _rd as bio_for_each_page_all_ro().
>
> I'm agnostic as to what the right names are --- my big concern is
> there is an explosion of bio_for_each_page_* functions, and that there

There aren't many users of bio_for_each_segment_all(), see:

        [ming@linux-2.6]$git grep -n bio_for_each_segment_all ./fs/ | wc -l
        23

so I guess there is no excuse not to switch after this patchset.

From an API point of view, bio_for_each_segment_all() is ugly and
exposes the bvec table to users, and the main reason we keep it
is that it can avoid one bvec copy per loop. It can be replaced
easily by bio_for_each_segment().

> isn't good documentation about (a) when to use each of these
> functions, and (b) why.  I was goinig through the patch series, and it
> was hard for me to figure out why, and I was looking through all of
> the patches.  Once all of the patches are merged in, I am concerned
> this is going to be massive trapdoor that will snare a large number of
> unwitting developers.

I understand your concern, and let me explain the whole story a bit:

1) in current linus tree, we have the following two bio iterator helpers,
for which we still don't provide any document:

       bio_for_each_segment(bvl, bio, iter)
       bio_for_each_segment_all(bvl, bio, i)

- the former is used to traverse each 'segment' in the bio range
described by the 'iter' (just like [start, size]); the latter is used
to traverse each 'segment' in the whole bio, so there isn't an 'iter'
passed in.

- in the former helper, typeof('bvl') is 'struct bvec', and the 'segment'
is copied to 'bvl'; in the latter helper, typeof('bvl') is 'struct bvec *', and
it just points to one bvec directly in the table(bio->bi_io_vec) one by one.

- we can use the former helper to implement the latter easily and provide
a more friendly interface, and the main reason we keep it is that _all can
avoid bvec copy in each loop, so it might be a bit efficient.

- even though 'segment' is used in the helpers' names, each 'bvl' in
them just describes one single page, so actually they should have been
named as the following:

         bio_for_each_page(bvl, bio, iter)
         bio_for_each_page_all(bvl, bio, i)

2) this patchset introduces multipage bvec, which will store one
real segment in each 'bvec' of the table(bio->bi_io_vec), and one
segment may include more than one page

- bio_for_each_segment() is kept as current interface to retrieve
one page in each 'bvl', that is just for making current users happy,
and it will be replaced with bio_for_each_page() finally, which
should be a follow-up work of this patchset

- the story of the introduction of bio_for_each_segment_all_rd(bvl, bio, i):
we can't simply make 'bvl' point to each bvec in the table directly
any more, because now each bvec in the table stores one real segment
instead of one page. So in this patchset the _rd() is implemented with
bio_for_each_segment(), and we can't change/write to the bvec in the
table any more via the 'bvl' pointer through this helper.

>
> As far as my preference, from an abstract perspective, if one version
> (the read-write variant, I presume) is always safe, while one (the
> read-only variant) is faster, if you can work under restricted
> circumstances, naming the safe version so it is the "default", and
> more dangerous one with the name that makes it a bit more obvious what
> you have to do in order to use it safely, and then very clearly
> document both in sources, and in the Documentation directory, what the
> issues are and what you have to do in order to use the faster version.

I will add detailed documents about these helpers in next version:

    - bio_for_each_segment()
    - bio_for_each_segment_all()
    - bio_for_each_page_all_ro()(renamed from bio_for_each_segment_all_rd())

Thanks,
Ming

>
> Cheers,
>
>                                         - Ted
>

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 45/60] block: bio: introduce bio_for_each_segment_all_rd() and its write pair
  2016-10-31 15:11     ` Christoph Hellwig
  2016-10-31 22:50       ` Ming Lei
@ 2016-11-02  3:01       ` Kent Overstreet
  1 sibling, 0 replies; 148+ messages in thread
From: Kent Overstreet @ 2016-11-02  3:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Theodore Ts'o, Ming Lei, Jens Axboe, linux-kernel,
	linux-block, linux-fsdevel, Kirill A . Shutemov, Mike Christie,
	Hannes Reinecke, Keith Busch, Mike Snitzer, Johannes Thumshirn,
	Bart Van Assche

On Mon, Oct 31, 2016 at 08:11:23AM -0700, Christoph Hellwig wrote:
> On Mon, Oct 31, 2016 at 09:59:43AM -0400, Theodore Ts'o wrote:
> > What is _rd and _wt supposed to stand for?
> 
> I think it's read and write, but I think the naming is highly
> unfortunate.  I started dabbling around with the patches a bit,
> and to keep my sanity a started reaming it to _pages and _bvec
> which is the real semantics - the _rd or _pages gives you a synthetic
> bvec for each page, and the other one gives you the full bvec.

My original naming was bio_for_each_segment() and bio_for_each_page().

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 28/60] block: introduce QUEUE_FLAG_SPLIT_MP
  2016-10-31 15:39   ` Christoph Hellwig
  2016-10-31 23:56     ` Ming Lei
@ 2016-11-02  3:08     ` Kent Overstreet
  2016-11-03 10:38       ` Ming Lei
  1 sibling, 1 reply; 148+ messages in thread
From: Kent Overstreet @ 2016-11-02  3:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ming Lei, Jens Axboe, linux-kernel, linux-block, linux-fsdevel,
	Kirill A . Shutemov, Jens Axboe, Hannes Reinecke, Mike Christie,
	Dan Williams, Toshi Kani

On Mon, Oct 31, 2016 at 08:39:15AM -0700, Christoph Hellwig wrote:
> On Sat, Oct 29, 2016 at 04:08:27PM +0800, Ming Lei wrote:
> > Some drivers(such as dm) should be capable of dealing with multipage
> > bvec, but the incoming bio may be too big, such as, a new singlepage bvec
> > bio can't be cloned from the bio, or can't be allocated to singlepage
> > bvec with same size.
> > 
> > At least crypt dm, log writes and bcache have this kind of issue.
> 
> We already have the segment_size limitation for request based drivers.
> I'd rather extent it to bio drivers if really needed.
> 
> But then again we should look into not having this limitation.  E.g.
> for bcache I'd be really surprised if it's that limited, given that
> Kent came up with this whole multipage bvec scheme.

AFAIK the only issue is with drivers that may have to bounce bios - pages that
were contiguous in the original bio won't necessarily be contiguous in the
bounced bio, thus bouncing might require more than BIO_MAX_SEGMENTS bvecs.

I don't know what Ming's referring to by "singlepage bvec bios".

Anyways, bouncing comes up in multiple places so we probably need to come up
with a generic solution for that. Other than that, there shouldn't be any issues
or limitations - if you're not bouncing, there's no need to clone the bvecs.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 09/60] dm: dm.c: replace 'bio->bi_vcnt == 1' with !bio_multiple_segments
  2016-10-31 15:29     ` Christoph Hellwig
@ 2016-11-02  3:09       ` Kent Overstreet
  -1 siblings, 0 replies; 148+ messages in thread
From: Kent Overstreet @ 2016-11-02  3:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ming Lei, Jens Axboe, linux-kernel, linux-block, linux-fsdevel,
	Kirill A . Shutemov, Alasdair Kergon, Mike Snitzer,
	maintainer:DEVICE-MAPPER (LVM),
	Shaohua Li, open list:SOFTWARE RAID (Multiple Disks) SUPPORT

On Mon, Oct 31, 2016 at 08:29:01AM -0700, Christoph Hellwig wrote:
> On Sat, Oct 29, 2016 at 04:08:08PM +0800, Ming Lei wrote:
> > Avoid to access .bi_vcnt directly, because it may be not what
> > the driver expected any more after supporting multipage bvec.
> > 
> > Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> 
> It would be really nice to have a comment in the code why it's
> even checking for multiple segments.

Or ideally refactor the code to not care about multiple segments at all.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 09/60] dm: dm.c: replace 'bio->bi_vcnt == 1' with !bio_multiple_segments
  2016-11-02  3:09       ` Kent Overstreet
@ 2016-11-02  7:56         ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-11-02  7:56 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christoph Hellwig, Jens Axboe, Linux Kernel Mailing List,
	linux-block, Linux FS Devel, Kirill A . Shutemov,
	Alasdair Kergon, Mike Snitzer, maintainer:DEVICE-MAPPER (LVM),
	Shaohua Li, open list:SOFTWARE RAID (Multiple Disks) SUPPORT

On Wed, Nov 2, 2016 at 11:09 AM, Kent Overstreet
<kent.overstreet@gmail.com> wrote:
> On Mon, Oct 31, 2016 at 08:29:01AM -0700, Christoph Hellwig wrote:
>> On Sat, Oct 29, 2016 at 04:08:08PM +0800, Ming Lei wrote:
>> > Avoid to access .bi_vcnt directly, because it may be not what
>> > the driver expected any more after supporting multipage bvec.
>> >
>> > Signed-off-by: Ming Lei <tom.leiming@gmail.com>
>>
>> It would be really nice to have a comment in the code why it's
>> even checking for multiple segments.
>
> Or ideally refactor the code to not care about multiple segments at all.

The check on 'bio->bi_vcnt == 1' was introduced in commit de3ec86dff160 (dm:
don't start current request if it would've merged with the previous), which
fixed a performance issue.[1]

It looks like the idea of the patch is to delay dispatching the rq if it
would've merged with the previous request and the rq is small (single bvec).
I guess the motivation is that the delay increases the chance of merging.

But why does the code check 'bio->bi_vcnt == 1'? Once the bio is
submitted, .bi_vcnt isn't changed any more, and merging doesn't change
it either. So should the check have been on blk_rq_bytes(rq)?

Mike, please correct me if my understanding is wrong.


[1] https://www.redhat.com/archives/dm-devel/2015-March/msg00014.html


thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 09/60] dm: dm.c: replace 'bio->bi_vcnt == 1' with !bio_multiple_segments
  2016-11-02  7:56         ` Ming Lei
  (?)
@ 2016-11-02 14:24           ` Mike Snitzer
  -1 siblings, 0 replies; 148+ messages in thread
From: Mike Snitzer @ 2016-11-02 14:24 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	Linux Kernel Mailing List, Christoph Hellwig,
	maintainer:DEVICE-MAPPER (LVM),
	linux-block, Alasdair Kergon, Linux FS Devel, Shaohua Li,
	Kent Overstreet, Kirill A . Shutemov

On Wed, Nov 02 2016 at  3:56am -0400,
Ming Lei <tom.leiming@gmail.com> wrote:

> On Wed, Nov 2, 2016 at 11:09 AM, Kent Overstreet
> <kent.overstreet@gmail.com> wrote:
> > On Mon, Oct 31, 2016 at 08:29:01AM -0700, Christoph Hellwig wrote:
> >> On Sat, Oct 29, 2016 at 04:08:08PM +0800, Ming Lei wrote:
> >> > Avoid to access .bi_vcnt directly, because it may be not what
> >> > the driver expected any more after supporting multipage bvec.
> >> >
> >> > Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> >>
> >> It would be really nice to have a comment in the code why it's
> >> even checking for multiple segments.
> >
> > Or ideally refactor the code to not care about multiple segments at all.
> 
> The check on 'bio->bi_vcnt == 1' is introduced in commit de3ec86dff160(dm:
> don't start current request if it would've merged with the previous), which
> fixed one performance issue.[1]
> 
> Looks the idea of the patch is to delay dispatching the rq if it
> would've merged with previous request and the rq is small(single bvec).
> I guess the motivation is to try to increase chance of merging with the delay.
> 
> But why does the code check on 'bio->bi_vcnt == 1'? Once the bio is
> submitted, .bi_vcnt isn't changed any more and merging doesn't change
> it too. So should the check have been on blk_rq_bytes(rq)?
> 
> Mike, please correct me if my understanding is wrong.
> 
> 
> [1] https://www.redhat.com/archives/dm-devel/2015-March/msg00014.html

The patch was labored over for quite a while and is based on suggestions I
got from Jens when discussing a very problematic aspect of old
.request_fn request-based DM performance for a multi-threaded (64
threads) sequential IO benchmark (vdbench IIRC).  The issue was reported
by NetApp.

The patch in question fixed the lack of merging that was seen with this
interleaved sequential IO benchmark.  The lack of merging was made worse
if a DM multipath device had more underlying paths (e.g. 4 instead of 2).

As for your question about using blk_rq_bytes(rq) vs 'bio->bi_vcnt == 1':
not sure how that would be a suitable replacement.  It has been a
while since I've delved into these block core merge details of the old
.request_fn path, but _please_ don't change the logic of this code simply
because it is proving problematic for your current patchset's
cleanliness.

Mike

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 09/60] dm: dm.c: replace 'bio->bi_vcnt == 1' with !bio_multiple_segments
  2016-11-02 14:24           ` Mike Snitzer
@ 2016-11-02 23:47             ` Ming Lei
  -1 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-11-02 23:47 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Kent Overstreet, Christoph Hellwig, Jens Axboe,
	Linux Kernel Mailing List, linux-block, Linux FS Devel,
	Kirill A . Shutemov, Alasdair Kergon,
	maintainer:DEVICE-MAPPER (LVM),
	Shaohua Li, open list:SOFTWARE RAID (Multiple Disks) SUPPORT

* Re: [PATCH 09/60] dm: dm.c: replace 'bio->bi_vcnt == 1' with !bio_multiple_segments
@ 2016-11-02 23:47             ` Ming Lei
  0 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-11-02 23:47 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Kent Overstreet, Christoph Hellwig, Jens Axboe,
	Linux Kernel Mailing List, linux-block, Linux FS Devel,
	Kirill A . Shutemov, Alasdair Kergon,
	maintainer:DEVICE-MAPPER (LVM),
	Shaohua Li, open list:SOFTWARE RAID (Multiple Disks) SUPPORT

On Wed, Nov 2, 2016 at 10:24 PM, Mike Snitzer <snitzer@redhat.com> wrote:
> On Wed, Nov 02 2016 at  3:56am -0400,
> Ming Lei <tom.leiming@gmail.com> wrote:
>
>> On Wed, Nov 2, 2016 at 11:09 AM, Kent Overstreet
>> <kent.overstreet@gmail.com> wrote:
>> > On Mon, Oct 31, 2016 at 08:29:01AM -0700, Christoph Hellwig wrote:
>> >> On Sat, Oct 29, 2016 at 04:08:08PM +0800, Ming Lei wrote:
>> >> > Avoid to access .bi_vcnt directly, because it may be not what
>> >> > the driver expected any more after supporting multipage bvec.
>> >> >
>> >> > Signed-off-by: Ming Lei <tom.leiming@gmail.com>
>> >>
>> >> It would be really nice to have a comment in the code why it's
>> >> even checking for multiple segments.
>> >
>> > Or ideally refactor the code to not care about multiple segments at all.
>>
>> The check on 'bio->bi_vcnt == 1' is introduced in commit de3ec86dff160(dm:
>> don't start current request if it would've merged with the previous), which
>> fixed one performance issue.[1]
>>
>> Looks the idea of the patch is to delay dispatching the rq if it
>> would've merged with previous request and the rq is small(single bvec).
>> I guess the motivation is to try to increase chance of merging with the delay.
>>
>> But why does the code check on 'bio->bi_vcnt == 1'? Once the bio is
>> submitted, .bi_vcnt isn't changed any more and merging doesn't change
>> it too. So should the check have been on blk_rq_bytes(rq)?
>>
>> Mike, please correct me if my understanding is wrong.
>>
>>
>> [1] https://www.redhat.com/archives/dm-devel/2015-March/msg00014.html
>
> The patch was labored over for quite a while and is based on suggestions I
> got from Jens when discussing a very problematic aspect of old
> .request_fn request-based DM performance for a multi-threaded (64
> threads) sequential IO benchmark (vdbench IIRC).  The issue was reported
> by NetApp.
>
> The patch in question fixed the lack of merging that was seen with this
> interleaved sequential IO benchmark.  The lack of merging was made worse
> if a DM multipath device had more underlying paths (e.g. 4 instead of 2).
>
> As for your question, about using blk_rq_bytes(rq) vs 'bio->bi_vcnt == 1'
> .. not sure how that would be a suitable replacement.  But it has been a
> while since I've delved into these block core merge details of old

Just last year, which doesn't seem that long, :-)

> .request_fn but _please_ don't change the logic of this code simply

As I explained before, .bi_vcnt is changed neither after the bio is
submitted nor during merging, so I think the check is wrong. Could you
explain your original motivation for checking 'bio->bi_vcnt == 1'?

> because it is proving itself to be problematic for your current
> patchset's cleanliness.

Could you explain what the cleanliness problem is?

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 28/60] block: introduce QUEUE_FLAG_SPLIT_MP
  2016-11-02  3:08     ` Kent Overstreet
@ 2016-11-03 10:38       ` Ming Lei
  2016-11-03 11:20         ` Kent Overstreet
  0 siblings, 1 reply; 148+ messages in thread
From: Ming Lei @ 2016-11-03 10:38 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christoph Hellwig, Jens Axboe, Linux Kernel Mailing List,
	linux-block, Linux FS Devel, Kirill A . Shutemov, Jens Axboe,
	Hannes Reinecke, Mike Christie, Dan Williams, Toshi Kani

On Wed, Nov 2, 2016 at 11:08 AM, Kent Overstreet
<kent.overstreet@gmail.com> wrote:
> On Mon, Oct 31, 2016 at 08:39:15AM -0700, Christoph Hellwig wrote:
>> On Sat, Oct 29, 2016 at 04:08:27PM +0800, Ming Lei wrote:
>> > Some drivers(such as dm) should be capable of dealing with multipage
>> > bvec, but the incoming bio may be too big, such as, a new singlepage bvec
>> > bio can't be cloned from the bio, or can't be allocated to singlepage
>> > bvec with same size.
>> >
>> > At least crypt dm, log writes and bcache have this kind of issue.
>>
>> We already have the segment_size limitation for request based drivers.
>> I'd rather extent it to bio drivers if really needed.
>>
>> But then again we should look into not having this limitation.  E.g.
>> for bcache I'd be really surprised if it's that limited, given that
>> Kent came up with this whole multipage bvec scheme.
>
> AFAIK the only issue is with drivers that may have to bounce bios - pages that
> were contiguous in the original bio won't necessarily be contiguous in the
> bounced bio, thus bouncing might require more than BIO_MAX_SEGMENTS bvecs.
>
> I don't know what Ming's referring to by "singlepage bvec bios".
>
> Anyways, bouncing comes up in multiple places so we probably need to come up
> with a generic solution for that. Other than that, there shouldn't be any issues
> or limitations - if you're not bouncing, there's no need to clone the bvecs.

AFAIK, the only special case is bch_data_verify() in
drivers/md/bcache/debug.c; the other bio_clone() users don't access the
io vec table directly, so the default multipage bvec copy is fine.

I will remove the flag and try to fix bch_data_verify() by using
multiple bios; I remember I cooked up a patch to do that a long time
ago, :-)


Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 28/60] block: introduce QUEUE_FLAG_SPLIT_MP
  2016-11-03 10:38       ` Ming Lei
@ 2016-11-03 11:20         ` Kent Overstreet
  2016-11-03 11:26           ` Ming Lei
  0 siblings, 1 reply; 148+ messages in thread
From: Kent Overstreet @ 2016-11-03 11:20 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, Linux Kernel Mailing List,
	linux-block, Linux FS Devel, Kirill A . Shutemov, Jens Axboe,
	Hannes Reinecke, Mike Christie, Dan Williams, Toshi Kani

On Thu, Nov 03, 2016 at 06:38:57PM +0800, Ming Lei wrote:
> On Wed, Nov 2, 2016 at 11:08 AM, Kent Overstreet
> <kent.overstreet@gmail.com> wrote:
> > On Mon, Oct 31, 2016 at 08:39:15AM -0700, Christoph Hellwig wrote:
> >> On Sat, Oct 29, 2016 at 04:08:27PM +0800, Ming Lei wrote:
> >> > Some drivers(such as dm) should be capable of dealing with multipage
> >> > bvec, but the incoming bio may be too big, such as, a new singlepage bvec
> >> > bio can't be cloned from the bio, or can't be allocated to singlepage
> >> > bvec with same size.
> >> >
> >> > At least crypt dm, log writes and bcache have this kind of issue.
> >>
> >> We already have the segment_size limitation for request based drivers.
> >> I'd rather extent it to bio drivers if really needed.
> >>
> >> But then again we should look into not having this limitation.  E.g.
> >> for bcache I'd be really surprised if it's that limited, given that
> >> Kent came up with this whole multipage bvec scheme.
> >
> > AFAIK the only issue is with drivers that may have to bounce bios - pages that
> > were contiguous in the original bio won't necessarily be contiguous in the
> > bounced bio, thus bouncing might require more than BIO_MAX_SEGMENTS bvecs.
> >
> > I don't know what Ming's referring to by "singlepage bvec bios".
> >
> > Anyways, bouncing comes up in multiple places so we probably need to come up
> > with a generic solution for that. Other than that, there shouldn't be any issues
> > or limitations - if you're not bouncing, there's no need to clone the bvecs.
> 
> AFAIK, the only special case is bch_data_verify(): drivers/md/bcache/debug.c,
> for other bio_clone(), no direct access to io vec table, so default
> multipage bvec
> copy is fine.
> 
> I will remove the flag and try to fix bch_data_verify() by using multiple bio,
> and I remembered I cooked patch to do that long time ago, :-)

You can #ifdef out the bch_data_verify() code, it's debug code that hasn't been
used in ages.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 28/60] block: introduce QUEUE_FLAG_SPLIT_MP
  2016-11-03 11:20         ` Kent Overstreet
@ 2016-11-03 11:26           ` Ming Lei
  2016-11-03 11:30             ` Kent Overstreet
  0 siblings, 1 reply; 148+ messages in thread
From: Ming Lei @ 2016-11-03 11:26 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christoph Hellwig, Jens Axboe, Linux Kernel Mailing List,
	linux-block, Linux FS Devel, Kirill A . Shutemov, Jens Axboe,
	Hannes Reinecke, Mike Christie, Dan Williams, Toshi Kani

On Thu, Nov 3, 2016 at 7:20 PM, Kent Overstreet
<kent.overstreet@gmail.com> wrote:
> On Thu, Nov 03, 2016 at 06:38:57PM +0800, Ming Lei wrote:
>> On Wed, Nov 2, 2016 at 11:08 AM, Kent Overstreet
>> <kent.overstreet@gmail.com> wrote:
>> > On Mon, Oct 31, 2016 at 08:39:15AM -0700, Christoph Hellwig wrote:
>> >> On Sat, Oct 29, 2016 at 04:08:27PM +0800, Ming Lei wrote:
>> >> > Some drivers(such as dm) should be capable of dealing with multipage
>> >> > bvec, but the incoming bio may be too big, such as, a new singlepage bvec
>> >> > bio can't be cloned from the bio, or can't be allocated to singlepage
>> >> > bvec with same size.
>> >> >
>> >> > At least crypt dm, log writes and bcache have this kind of issue.
>> >>
>> >> We already have the segment_size limitation for request based drivers.
>> >> I'd rather extent it to bio drivers if really needed.
>> >>
>> >> But then again we should look into not having this limitation.  E.g.
>> >> for bcache I'd be really surprised if it's that limited, given that
>> >> Kent came up with this whole multipage bvec scheme.
>> >
>> > AFAIK the only issue is with drivers that may have to bounce bios - pages that
>> > were contiguous in the original bio won't necessarily be contiguous in the
>> > bounced bio, thus bouncing might require more than BIO_MAX_SEGMENTS bvecs.
>> >
>> > I don't know what Ming's referring to by "singlepage bvec bios".
>> >
>> > Anyways, bouncing comes up in multiple places so we probably need to come up
>> > with a generic solution for that. Other than that, there shouldn't be any issues
>> > or limitations - if you're not bouncing, there's no need to clone the bvecs.
>>
>> AFAIK, the only special case is bch_data_verify(): drivers/md/bcache/debug.c,
>> for other bio_clone(), no direct access to io vec table, so default
>> multipage bvec
>> copy is fine.
>>
>> I will remove the flag and try to fix bch_data_verify() by using multiple bio,
>> and I remembered I cooked patch to do that long time ago, :-)
>
> You can #ifdef out the bch_data_verify() code, it's debug code that hasn't been
> used in ages.

Though you haven't tested it in ages, it was still working in my last test, :-)

But someone may still enable it for debugging, and I don't want to make him/her sad.

-- 
Ming Lei

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 28/60] block: introduce QUEUE_FLAG_SPLIT_MP
  2016-11-03 11:26           ` Ming Lei
@ 2016-11-03 11:30             ` Kent Overstreet
  0 siblings, 0 replies; 148+ messages in thread
From: Kent Overstreet @ 2016-11-03 11:30 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, Linux Kernel Mailing List,
	linux-block, Linux FS Devel, Kirill A . Shutemov, Jens Axboe,
	Hannes Reinecke, Mike Christie, Dan Williams, Toshi Kani

On Thu, Nov 03, 2016 at 07:26:52PM +0800, Ming Lei wrote:
> On Thu, Nov 3, 2016 at 7:20 PM, Kent Overstreet
> <kent.overstreet@gmail.com> wrote:
> > On Thu, Nov 03, 2016 at 06:38:57PM +0800, Ming Lei wrote:
> >> On Wed, Nov 2, 2016 at 11:08 AM, Kent Overstreet
> >> <kent.overstreet@gmail.com> wrote:
> >> > On Mon, Oct 31, 2016 at 08:39:15AM -0700, Christoph Hellwig wrote:
> >> >> On Sat, Oct 29, 2016 at 04:08:27PM +0800, Ming Lei wrote:
> >> >> > Some drivers(such as dm) should be capable of dealing with multipage
> >> >> > bvec, but the incoming bio may be too big, such as, a new singlepage bvec
> >> >> > bio can't be cloned from the bio, or can't be allocated to singlepage
> >> >> > bvec with same size.
> >> >> >
> >> >> > At least crypt dm, log writes and bcache have this kind of issue.
> >> >>
> >> >> We already have the segment_size limitation for request based drivers.
> >> >> I'd rather extent it to bio drivers if really needed.
> >> >>
> >> >> But then again we should look into not having this limitation.  E.g.
> >> >> for bcache I'd be really surprised if it's that limited, given that
> >> >> Kent came up with this whole multipage bvec scheme.
> >> >
> >> > AFAIK the only issue is with drivers that may have to bounce bios - pages that
> >> > were contiguous in the original bio won't necessarily be contiguous in the
> >> > bounced bio, thus bouncing might require more than BIO_MAX_SEGMENTS bvecs.
> >> >
> >> > I don't know what Ming's referring to by "singlepage bvec bios".
> >> >
> >> > Anyways, bouncing comes up in multiple places so we probably need to come up
> >> > with a generic solution for that. Other than that, there shouldn't be any issues
> >> > or limitations - if you're not bouncing, there's no need to clone the bvecs.
> >>
> >> AFAIK, the only special case is bch_data_verify(): drivers/md/bcache/debug.c,
> >> for other bio_clone(), no direct access to io vec table, so default
> >> multipage bvec
> >> copy is fine.
> >>
> >> I will remove the flag and try to fix bch_data_verify() by using multiple bio,
> >> and I remembered I cooked patch to do that long time ago, :-)
> >
> > You can #ifdef out the bch_data_verify() code, it's debug code that hasn't been
> > used in ages.
> 
> Though you haven't tested it in ages, it was still working in my last test, :-)
> 
> But someone may still enable it for debugging, and I don't want to make him/her sad.

Up to you :)

It's not useful for anything but debugging though, so I wouldn't worry about
impacting end users.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 04/60] block: floppy: use bio_add_page()
  2016-10-29  8:08 ` [PATCH 04/60] block: floppy: use bio_add_page() Ming Lei
  2016-10-31 15:26   ` Christoph Hellwig
@ 2016-11-10 19:35   ` Christoph Hellwig
  2016-11-11  8:39     ` Ming Lei
  1 sibling, 1 reply; 148+ messages in thread
From: Christoph Hellwig @ 2016-11-10 19:35 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block

Hi Ming,

any chance you could send out a series with the various bio_add_page
soon-ish?  I'd really like to get all the good prep work in for
this merge window, so that we can look at the real multipage-bvec
work for the next one.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 04/60] block: floppy: use bio_add_page()
  2016-11-10 19:35   ` Christoph Hellwig
@ 2016-11-11  8:39     ` Ming Lei
  0 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-11-11  8:39 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block

On Fri, Nov 11, 2016 at 3:35 AM, Christoph Hellwig <hch@infradead.org> wrote:
> Hi Ming,
>
> any chance you could send out a series with the various bio_add_page
> soon-ish?  I'd really like to get all the good prep work in for

No problem, I will post v1 out later.

thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 32/60] block: implement sp version of bvec iterator helpers
  2016-10-29 11:06   ` kbuild test robot
@ 2016-12-17 11:38       ` Ming Lei
  0 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-12-17 11:38 UTC (permalink / raw)
  To: kbuild test robot, sparclinux, David S. Miller
  Cc: kbuild-all, Jens Axboe, Linux Kernel Mailing List, linux-block,
	Linux FS Devel, Christoph Hellwig, Kirill A . Shutemov,
	Johannes Berg

Hi Guys,

On Sat, Oct 29, 2016 at 7:06 PM, kbuild test robot <lkp@intel.com> wrote:
> Hi Ming,

Thanks for the report!

>
> [auto build test ERROR on linus/master]
> [also build test ERROR on v4.9-rc2 next-20161028]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> [Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
> [Check https://git-scm.com/docs/git-format-patch for more information]
>
> url:    https://github.com/0day-ci/linux/commits/Ming-Lei/block-support-multipage-bvec/20161029-163910
> config: sparc-defconfig (attached as .config)
> compiler: sparc-linux-gcc (GCC) 6.2.0
> reproduce:
>         wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         make.cross ARCH=sparc
>
> All error/warnings (new ones prefixed by >>):
>
>    In file included from arch/sparc/include/asm/oplib.h:6:0,
>                     from arch/sparc/include/asm/pgtable_32.h:21,
>                     from arch/sparc/include/asm/pgtable.h:6,
>                     from include/linux/mm.h:68,


This issue seems to be caused by something in the sparc arch code; this
patch only adds '#include <linux/mm.h>' to 'include/linux/bvec.h' in
order to use nth_page().

So I'm Cc'ing the sparc list.

Thanks,
Ming

>                     from include/linux/bvec.h:25,
>                     from include/linux/blk_types.h:9,
>                     from include/linux/fs.h:31,
>                     from include/linux/proc_fs.h:8,
>                     from arch/sparc/include/asm/prom.h:22,
>                     from include/linux/of.h:232,
>                     from arch/sparc/include/asm/openprom.h:14,
>                     from arch/sparc/include/asm/device.h:9,
>                     from include/linux/device.h:30,
>                     from include/linux/node.h:17,
>                     from include/linux/cpu.h:16,
>                     from include/linux/stop_machine.h:4,
>                     from kernel/sched/sched.h:10,
>                     from kernel/sched/loadavg.c:11:
>>> arch/sparc/include/asm/oplib_32.h:105:39: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
>     int prom_startcpu(int cpunode, struct linux_prom_registers *context_table,
>                                           ^~~~~~~~~~~~~~~~~~~~
>    arch/sparc/include/asm/oplib_32.h:168:36: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
>     void prom_apply_obio_ranges(struct linux_prom_registers *obioregs, int nregs);
>                                        ^~~~~~~~~~~~~~~~~~~~
>    arch/sparc/include/asm/oplib_32.h:172:18: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
>               struct linux_prom_registers *sbusregs, int nregs);
>                      ^~~~~~~~~~~~~~~~~~~~
> --
>    In file included from arch/sparc/include/asm/oplib.h:6:0,
>                     from arch/sparc/include/asm/pgtable_32.h:21,
>                     from arch/sparc/include/asm/pgtable.h:6,
>                     from include/linux/mm.h:68,
>                     from include/linux/bvec.h:25,
>                     from include/linux/blk_types.h:9,
>                     from include/linux/fs.h:31,
>                     from include/linux/proc_fs.h:8,
>                     from arch/sparc/include/asm/prom.h:22,
>                     from include/linux/of.h:232,
>                     from arch/sparc/include/asm/openprom.h:14,
>                     from arch/sparc/prom/mp.c:12:
>>> arch/sparc/include/asm/oplib_32.h:105:39: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
>     int prom_startcpu(int cpunode, struct linux_prom_registers *context_table,
>                                           ^~~~~~~~~~~~~~~~~~~~
>    arch/sparc/include/asm/oplib_32.h:168:36: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
>     void prom_apply_obio_ranges(struct linux_prom_registers *obioregs, int nregs);
>                                        ^~~~~~~~~~~~~~~~~~~~
>    arch/sparc/include/asm/oplib_32.h:172:18: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
>               struct linux_prom_registers *sbusregs, int nregs);
>                      ^~~~~~~~~~~~~~~~~~~~
>>> arch/sparc/prom/mp.c:23:1: error: conflicting types for 'prom_startcpu'
>     prom_startcpu(int cpunode, struct linux_prom_registers *ctable_reg, int ctx, char *pc)
>     ^~~~~~~~~~~~~
>    In file included from arch/sparc/include/asm/oplib.h:6:0,
>                     from arch/sparc/include/asm/pgtable_32.h:21,
>                     from arch/sparc/include/asm/pgtable.h:6,
>                     from include/linux/mm.h:68,
>                     from include/linux/bvec.h:25,
>                     from include/linux/blk_types.h:9,
>                     from include/linux/fs.h:31,
>                     from include/linux/proc_fs.h:8,
>                     from arch/sparc/include/asm/prom.h:22,
>                     from include/linux/of.h:232,
>                     from arch/sparc/include/asm/openprom.h:14,
>                     from arch/sparc/prom/mp.c:12:
>    arch/sparc/include/asm/oplib_32.h:105:5: note: previous declaration of 'prom_startcpu' was here
>     int prom_startcpu(int cpunode, struct linux_prom_registers *context_table,
>         ^~~~~~~~~~~~~
> --
>    In file included from arch/sparc/include/asm/oplib.h:6:0,
>                     from arch/sparc/include/asm/pgtable_32.h:21,
>                     from arch/sparc/include/asm/pgtable.h:6,
>                     from include/linux/mm.h:68,
>                     from include/linux/bvec.h:25,
>                     from include/linux/blk_types.h:9,
>                     from include/linux/fs.h:31,
>                     from include/linux/proc_fs.h:8,
>                     from arch/sparc/include/asm/prom.h:22,
>                     from include/linux/of.h:232,
>                     from arch/sparc/include/asm/openprom.h:14,
>                     from arch/sparc/prom/ranges.c:11:
>>> arch/sparc/include/asm/oplib_32.h:105:39: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
>     int prom_startcpu(int cpunode, struct linux_prom_registers *context_table,
>                                           ^~~~~~~~~~~~~~~~~~~~
>    arch/sparc/include/asm/oplib_32.h:168:36: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
>     void prom_apply_obio_ranges(struct linux_prom_registers *obioregs, int nregs);
>                                        ^~~~~~~~~~~~~~~~~~~~
>    arch/sparc/include/asm/oplib_32.h:172:18: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
>               struct linux_prom_registers *sbusregs, int nregs);
>                      ^~~~~~~~~~~~~~~~~~~~
>>> arch/sparc/prom/ranges.c:57:6: error: conflicting types for 'prom_apply_obio_ranges'
>     void prom_apply_obio_ranges(struct linux_prom_registers *regs, int nregs)
>          ^~~~~~~~~~~~~~~~~~~~~~
>    In file included from arch/sparc/include/asm/oplib.h:6:0,
>                     from arch/sparc/include/asm/pgtable_32.h:21,
>                     from arch/sparc/include/asm/pgtable.h:6,
>                     from include/linux/mm.h:68,
>                     from include/linux/bvec.h:25,
>                     from include/linux/blk_types.h:9,
>                     from include/linux/fs.h:31,
>                     from include/linux/proc_fs.h:8,
>                     from arch/sparc/include/asm/prom.h:22,
>                     from include/linux/of.h:232,
>                     from arch/sparc/include/asm/openprom.h:14,
>                     from arch/sparc/prom/ranges.c:11:
>    arch/sparc/include/asm/oplib_32.h:168:6: note: previous declaration of 'prom_apply_obio_ranges' was here
>     void prom_apply_obio_ranges(struct linux_prom_registers *obioregs, int nregs);
>          ^~~~~~~~~~~~~~~~~~~~~~
>    In file included from include/linux/linkage.h:6:0,
>                     from include/linux/kernel.h:6,
>                     from include/linux/list.h:8,
>                     from include/linux/module.h:9,
>                     from arch/sparc/prom/ranges.c:9:
>    arch/sparc/prom/ranges.c:62:15: error: conflicting types for 'prom_apply_obio_ranges'
>     EXPORT_SYMBOL(prom_apply_obio_ranges);
>                   ^
>    include/linux/export.h:58:21: note: in definition of macro '___EXPORT_SYMBOL'
>      extern typeof(sym) sym;      \
>                         ^~~
>>> arch/sparc/prom/ranges.c:62:1: note: in expansion of macro 'EXPORT_SYMBOL'
>     EXPORT_SYMBOL(prom_apply_obio_ranges);
>     ^~~~~~~~~~~~~
>    In file included from arch/sparc/include/asm/oplib.h:6:0,
>                     from arch/sparc/include/asm/pgtable_32.h:21,
>                     from arch/sparc/include/asm/pgtable.h:6,
>                     from include/linux/mm.h:68,
>                     from include/linux/bvec.h:25,
>                     from include/linux/blk_types.h:9,
>                     from include/linux/fs.h:31,
>                     from include/linux/proc_fs.h:8,
>                     from arch/sparc/include/asm/prom.h:22,
>                     from include/linux/of.h:232,
>                     from arch/sparc/include/asm/openprom.h:14,
>                     from arch/sparc/prom/ranges.c:11:
>    arch/sparc/include/asm/oplib_32.h:168:6: note: previous declaration of 'prom_apply_obio_ranges' was here
>     void prom_apply_obio_ranges(struct linux_prom_registers *obioregs, int nregs);
>          ^~~~~~~~~~~~~~~~~~~~~~
>>> arch/sparc/prom/ranges.c:87:6: error: conflicting types for 'prom_apply_generic_ranges'
>     void prom_apply_generic_ranges(phandle node, phandle parent,
>          ^~~~~~~~~~~~~~~~~~~~~~~~~
>    In file included from arch/sparc/include/asm/oplib.h:6:0,
>                     from arch/sparc/include/asm/pgtable_32.h:21,
>                     from arch/sparc/include/asm/pgtable.h:6,
>                     from include/linux/mm.h:68,
>                     from include/linux/bvec.h:25,
>                     from include/linux/blk_types.h:9,
>                     from include/linux/fs.h:31,
>                     from include/linux/proc_fs.h:8,
>                     from arch/sparc/include/asm/prom.h:22,
>                     from include/linux/of.h:232,
>                     from arch/sparc/include/asm/openprom.h:14,
>                     from arch/sparc/prom/ranges.c:11:
>    arch/sparc/include/asm/oplib_32.h:171:6: note: previous declaration of 'prom_apply_generic_ranges' was here
>     void prom_apply_generic_ranges(phandle node, phandle parent,
>          ^~~~~~~~~~~~~~~~~~~~~~~~~
> --
>    In file included from include/linux/bvec.h:25:0,
>                     from include/linux/blk_types.h:9,
>                     from include/linux/fs.h:31,
>                     from include/linux/proc_fs.h:8,
>                     from arch/sparc/include/asm/prom.h:22,
>                     from include/linux/of.h:232,
>                     from arch/sparc/include/asm/openprom.h:14,
>                     from arch/sparc/include/asm/oplib_32.h:11,
>                     from arch/sparc/include/asm/oplib.h:6,
>                     from arch/sparc/include/asm/pgtable_32.h:21,
>                     from arch/sparc/include/asm/pgtable.h:6,
>                     from mm/init-mm.c:9:
>    include/linux/mm.h: In function 'is_vmalloc_addr':
>>> include/linux/mm.h:486:17: error: 'VMALLOC_START' undeclared (first use in this function)
>      return addr >= VMALLOC_START && addr < VMALLOC_END;
>                     ^~~~~~~~~~~~~
>    include/linux/mm.h:486:17: note: each undeclared identifier is reported only once for each function it appears in
>>> include/linux/mm.h:486:41: error: 'VMALLOC_END' undeclared (first use in this function)
>      return addr >= VMALLOC_START && addr < VMALLOC_END;
>                                             ^~~~~~~~~~~
>    include/linux/mm.h: In function 'maybe_mkwrite':
>>> include/linux/mm.h:624:9: error: implicit declaration of function 'pte_mkwrite' [-Werror=implicit-function-declaration]
>       pte = pte_mkwrite(pte);
>             ^~~~~~~~~~~
>    In file included from include/linux/bvec.h:25:0,
>                     from include/linux/blk_types.h:9,
>                     from include/linux/fs.h:31,
>                     from include/linux/proc_fs.h:8,
>                     from arch/sparc/include/asm/prom.h:22,
>                     from include/linux/of.h:232,
>                     from arch/sparc/include/asm/openprom.h:14,
>                     from arch/sparc/include/asm/oplib_32.h:11,
>                     from arch/sparc/include/asm/oplib.h:6,
>                     from arch/sparc/include/asm/pgtable_32.h:21,
>                     from arch/sparc/include/asm/pgtable.h:6,
>                     from mm/init-mm.c:9:
>    include/linux/mm.h: In function 'pgtable_init':
>>> include/linux/mm.h:1674:2: error: implicit declaration of function 'pgtable_cache_init' [-Werror=implicit-function-declaration]
>      pgtable_cache_init();
>      ^~~~~~~~~~~~~~~~~~
>    In file included from arch/sparc/include/asm/pgtable.h:6:0,
>                     from mm/init-mm.c:9:
>    arch/sparc/include/asm/pgtable_32.h: At top level:
>>> arch/sparc/include/asm/pgtable_32.h:245:21: error: conflicting types for 'pte_mkwrite'
>     static inline pte_t pte_mkwrite(pte_t pte)
>                         ^~~~~~~~~~~
>    In file included from include/linux/bvec.h:25:0,
>                     from include/linux/blk_types.h:9,
>                     from include/linux/fs.h:31,
>                     from include/linux/proc_fs.h:8,
>                     from arch/sparc/include/asm/prom.h:22,
>                     from include/linux/of.h:232,
>                     from arch/sparc/include/asm/openprom.h:14,
>                     from arch/sparc/include/asm/oplib_32.h:11,
>                     from arch/sparc/include/asm/oplib.h:6,
>                     from arch/sparc/include/asm/pgtable_32.h:21,
>                     from arch/sparc/include/asm/pgtable.h:6,
>                     from mm/init-mm.c:9:
>    include/linux/mm.h:624:9: note: previous implicit declaration of 'pte_mkwrite' was here
>       pte = pte_mkwrite(pte);
>             ^~~~~~~~~~~
>    cc1: some warnings being treated as errors
>
> vim +/VMALLOC_START +486 include/linux/mm.h
>
> 0738c4bb8 Paul Mundt             2008-03-12  480   */
> bb00a789e Yaowei Bai             2016-05-19  481  static inline bool is_vmalloc_addr(const void *x)
> 9e2779fa2 Christoph Lameter      2008-02-04  482  {
> 0738c4bb8 Paul Mundt             2008-03-12  483  #ifdef CONFIG_MMU
> 9e2779fa2 Christoph Lameter      2008-02-04  484        unsigned long addr = (unsigned long)x;
> 9e2779fa2 Christoph Lameter      2008-02-04  485
> 9e2779fa2 Christoph Lameter      2008-02-04 @486        return addr >= VMALLOC_START && addr < VMALLOC_END;
> 0738c4bb8 Paul Mundt             2008-03-12  487  #else
> bb00a789e Yaowei Bai             2016-05-19  488        return false;
> 8ca3ed87d David Howells          2008-02-23  489  #endif
> 0738c4bb8 Paul Mundt             2008-03-12  490  }
> 81ac3ad90 KAMEZAWA Hiroyuki      2009-09-22  491  #ifdef CONFIG_MMU
> 81ac3ad90 KAMEZAWA Hiroyuki      2009-09-22  492  extern int is_vmalloc_or_module_addr(const void *x);
> 81ac3ad90 KAMEZAWA Hiroyuki      2009-09-22  493  #else
> 934831d06 David Howells          2009-09-24  494  static inline int is_vmalloc_or_module_addr(const void *x)
> 81ac3ad90 KAMEZAWA Hiroyuki      2009-09-22  495  {
> 81ac3ad90 KAMEZAWA Hiroyuki      2009-09-22  496        return 0;
> 81ac3ad90 KAMEZAWA Hiroyuki      2009-09-22  497  }
> 81ac3ad90 KAMEZAWA Hiroyuki      2009-09-22  498  #endif
> 9e2779fa2 Christoph Lameter      2008-02-04  499
> 39f1f78d5 Al Viro                2014-05-06  500  extern void kvfree(const void *addr);
> 39f1f78d5 Al Viro                2014-05-06  501
> 53f9263ba Kirill A. Shutemov     2016-01-15  502  static inline atomic_t *compound_mapcount_ptr(struct page *page)
> 53f9263ba Kirill A. Shutemov     2016-01-15  503  {
> 53f9263ba Kirill A. Shutemov     2016-01-15  504        return &page[1].compound_mapcount;
> 53f9263ba Kirill A. Shutemov     2016-01-15  505  }
> 53f9263ba Kirill A. Shutemov     2016-01-15  506
> 53f9263ba Kirill A. Shutemov     2016-01-15  507  static inline int compound_mapcount(struct page *page)
> 53f9263ba Kirill A. Shutemov     2016-01-15  508  {
> 5f527c2b3 Andrea Arcangeli       2016-05-20  509        VM_BUG_ON_PAGE(!PageCompound(page), page);
> 53f9263ba Kirill A. Shutemov     2016-01-15  510        page = compound_head(page);
> 53f9263ba Kirill A. Shutemov     2016-01-15  511        return atomic_read(compound_mapcount_ptr(page)) + 1;
> 53f9263ba Kirill A. Shutemov     2016-01-15  512  }
> 53f9263ba Kirill A. Shutemov     2016-01-15  513
> ccaafd7fd Joonsoo Kim            2015-02-10  514  /*
> 70b50f94f Andrea Arcangeli       2011-11-02  515   * The atomic page->_mapcount, starts from -1: so that transitions
> 70b50f94f Andrea Arcangeli       2011-11-02  516   * both from it and to it can be tracked, using atomic_inc_and_test
> 70b50f94f Andrea Arcangeli       2011-11-02  517   * and atomic_add_negative(-1).
> 70b50f94f Andrea Arcangeli       2011-11-02  518   */
> 22b751c3d Mel Gorman             2013-02-22  519  static inline void page_mapcount_reset(struct page *page)
> 70b50f94f Andrea Arcangeli       2011-11-02  520  {
> 70b50f94f Andrea Arcangeli       2011-11-02  521        atomic_set(&(page)->_mapcount, -1);
> 70b50f94f Andrea Arcangeli       2011-11-02  522  }
> 70b50f94f Andrea Arcangeli       2011-11-02  523
> b20ce5e03 Kirill A. Shutemov     2016-01-15  524  int __page_mapcount(struct page *page);
> b20ce5e03 Kirill A. Shutemov     2016-01-15  525
> 70b50f94f Andrea Arcangeli       2011-11-02  526  static inline int page_mapcount(struct page *page)
> 70b50f94f Andrea Arcangeli       2011-11-02  527  {
> 1d148e218 Wang, Yalin            2015-02-11  528        VM_BUG_ON_PAGE(PageSlab(page), page);
> 53f9263ba Kirill A. Shutemov     2016-01-15  529
> b20ce5e03 Kirill A. Shutemov     2016-01-15  530        if (unlikely(PageCompound(page)))
> b20ce5e03 Kirill A. Shutemov     2016-01-15  531                return __page_mapcount(page);
> b20ce5e03 Kirill A. Shutemov     2016-01-15  532        return atomic_read(&page->_mapcount) + 1;
> 53f9263ba Kirill A. Shutemov     2016-01-15  533  }
> b20ce5e03 Kirill A. Shutemov     2016-01-15  534
> b20ce5e03 Kirill A. Shutemov     2016-01-15  535  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> b20ce5e03 Kirill A. Shutemov     2016-01-15  536  int total_mapcount(struct page *page);
> 6d0a07edd Andrea Arcangeli       2016-05-12  537  int page_trans_huge_mapcount(struct page *page, int *total_mapcount);
> b20ce5e03 Kirill A. Shutemov     2016-01-15  538  #else
> b20ce5e03 Kirill A. Shutemov     2016-01-15  539  static inline int total_mapcount(struct page *page)
> b20ce5e03 Kirill A. Shutemov     2016-01-15  540  {
> b20ce5e03 Kirill A. Shutemov     2016-01-15  541        return page_mapcount(page);
> 70b50f94f Andrea Arcangeli       2011-11-02  542  }
> 6d0a07edd Andrea Arcangeli       2016-05-12  543  static inline int page_trans_huge_mapcount(struct page *page,
> 6d0a07edd Andrea Arcangeli       2016-05-12  544                                           int *total_mapcount)
> 6d0a07edd Andrea Arcangeli       2016-05-12  545  {
> 6d0a07edd Andrea Arcangeli       2016-05-12  546        int mapcount = page_mapcount(page);
> 6d0a07edd Andrea Arcangeli       2016-05-12  547        if (total_mapcount)
> 6d0a07edd Andrea Arcangeli       2016-05-12  548                *total_mapcount = mapcount;
> 6d0a07edd Andrea Arcangeli       2016-05-12  549        return mapcount;
> 6d0a07edd Andrea Arcangeli       2016-05-12  550  }
> b20ce5e03 Kirill A. Shutemov     2016-01-15  551  #endif
> 70b50f94f Andrea Arcangeli       2011-11-02  552
> b49af68ff Christoph Lameter      2007-05-06  553  static inline struct page *virt_to_head_page(const void *x)
> b49af68ff Christoph Lameter      2007-05-06  554  {
> b49af68ff Christoph Lameter      2007-05-06  555        struct page *page = virt_to_page(x);
> ccaafd7fd Joonsoo Kim            2015-02-10  556
> 1d798ca3f Kirill A. Shutemov     2015-11-06  557        return compound_head(page);
> b49af68ff Christoph Lameter      2007-05-06  558  }
> b49af68ff Christoph Lameter      2007-05-06  559
> ddc58f27f Kirill A. Shutemov     2016-01-15  560  void __put_page(struct page *page);
> ddc58f27f Kirill A. Shutemov     2016-01-15  561
> 1d7ea7324 Alexander Zarochentsev 2006-08-13  562  void put_pages_list(struct list_head *pages);
> ^1da177e4 Linus Torvalds         2005-04-16  563
> 8dfcc9ba2 Nick Piggin            2006-03-22  564  void split_page(struct page *page, unsigned int order);
> 8dfcc9ba2 Nick Piggin            2006-03-22  565
> ^1da177e4 Linus Torvalds         2005-04-16  566  /*
> 33f2ef89f Andy Whitcroft         2006-12-06  567   * Compound pages have a destructor function.  Provide a
> 33f2ef89f Andy Whitcroft         2006-12-06  568   * prototype for that function and accessor functions.
> f1e61557f Kirill A. Shutemov     2015-11-06  569   * These are _only_ valid on the head of a compound page.
> 33f2ef89f Andy Whitcroft         2006-12-06  570   */
> f1e61557f Kirill A. Shutemov     2015-11-06  571  typedef void compound_page_dtor(struct page *);
> f1e61557f Kirill A. Shutemov     2015-11-06  572
> f1e61557f Kirill A. Shutemov     2015-11-06  573  /* Keep the enum in sync with compound_page_dtors array in mm/page_alloc.c */
> f1e61557f Kirill A. Shutemov     2015-11-06  574  enum compound_dtor_id {
> f1e61557f Kirill A. Shutemov     2015-11-06  575        NULL_COMPOUND_DTOR,
> f1e61557f Kirill A. Shutemov     2015-11-06  576        COMPOUND_PAGE_DTOR,
> f1e61557f Kirill A. Shutemov     2015-11-06  577  #ifdef CONFIG_HUGETLB_PAGE
> f1e61557f Kirill A. Shutemov     2015-11-06  578        HUGETLB_PAGE_DTOR,
> f1e61557f Kirill A. Shutemov     2015-11-06  579  #endif
> 9a982250f Kirill A. Shutemov     2016-01-15  580  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> 9a982250f Kirill A. Shutemov     2016-01-15  581        TRANSHUGE_PAGE_DTOR,
> 9a982250f Kirill A. Shutemov     2016-01-15  582  #endif
> f1e61557f Kirill A. Shutemov     2015-11-06  583        NR_COMPOUND_DTORS,
> f1e61557f Kirill A. Shutemov     2015-11-06  584  };
> f1e61557f Kirill A. Shutemov     2015-11-06  585  extern compound_page_dtor * const compound_page_dtors[];
> 33f2ef89f Andy Whitcroft         2006-12-06  586
> 33f2ef89f Andy Whitcroft         2006-12-06  587  static inline void set_compound_page_dtor(struct page *page,
> f1e61557f Kirill A. Shutemov     2015-11-06  588                enum compound_dtor_id compound_dtor)
> 33f2ef89f Andy Whitcroft         2006-12-06  589  {
> f1e61557f Kirill A. Shutemov     2015-11-06  590        VM_BUG_ON_PAGE(compound_dtor >= NR_COMPOUND_DTORS, page);
> f1e61557f Kirill A. Shutemov     2015-11-06  591        page[1].compound_dtor = compound_dtor;
> 33f2ef89f Andy Whitcroft         2006-12-06  592  }
> 33f2ef89f Andy Whitcroft         2006-12-06  593
> 33f2ef89f Andy Whitcroft         2006-12-06  594  static inline compound_page_dtor *get_compound_page_dtor(struct page *page)
> 33f2ef89f Andy Whitcroft         2006-12-06  595  {
> f1e61557f Kirill A. Shutemov     2015-11-06  596        VM_BUG_ON_PAGE(page[1].compound_dtor >= NR_COMPOUND_DTORS, page);
> f1e61557f Kirill A. Shutemov     2015-11-06  597        return compound_page_dtors[page[1].compound_dtor];
> 33f2ef89f Andy Whitcroft         2006-12-06  598  }
> 33f2ef89f Andy Whitcroft         2006-12-06  599
> d00181b96 Kirill A. Shutemov     2015-11-06  600  static inline unsigned int compound_order(struct page *page)
> d85f33855 Christoph Lameter      2007-05-06  601  {
> 6d7779538 Christoph Lameter      2007-05-06  602        if (!PageHead(page))
> d85f33855 Christoph Lameter      2007-05-06  603                return 0;
> e4b294c2d Kirill A. Shutemov     2015-02-11  604        return page[1].compound_order;
> d85f33855 Christoph Lameter      2007-05-06  605  }
> d85f33855 Christoph Lameter      2007-05-06  606
> f1e61557f Kirill A. Shutemov     2015-11-06  607  static inline void set_compound_order(struct page *page, unsigned int order)
> d85f33855 Christoph Lameter      2007-05-06  608  {
> e4b294c2d Kirill A. Shutemov     2015-02-11  609        page[1].compound_order = order;
> d85f33855 Christoph Lameter      2007-05-06  610  }
> d85f33855 Christoph Lameter      2007-05-06  611
> 9a982250f Kirill A. Shutemov     2016-01-15  612  void free_compound_page(struct page *page);
> 9a982250f Kirill A. Shutemov     2016-01-15  613
> 3dece370e Michal Simek           2011-01-21  614  #ifdef CONFIG_MMU
> 33f2ef89f Andy Whitcroft         2006-12-06  615  /*
> 14fd403f2 Andrea Arcangeli       2011-01-13  616   * Do pte_mkwrite, but only if the vma says VM_WRITE.  We do this when
> 14fd403f2 Andrea Arcangeli       2011-01-13  617   * servicing faults for write access.  In the normal case, do always want
> 14fd403f2 Andrea Arcangeli       2011-01-13  618   * pte_mkwrite.  But get_user_pages can cause write faults for mappings
> 14fd403f2 Andrea Arcangeli       2011-01-13  619   * that do not have writing enabled, when used by access_process_vm.
> 14fd403f2 Andrea Arcangeli       2011-01-13  620   */
> 14fd403f2 Andrea Arcangeli       2011-01-13  621  static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
> 14fd403f2 Andrea Arcangeli       2011-01-13  622  {
> 14fd403f2 Andrea Arcangeli       2011-01-13  623        if (likely(vma->vm_flags & VM_WRITE))
> 14fd403f2 Andrea Arcangeli       2011-01-13 @624                pte = pte_mkwrite(pte);
> 14fd403f2 Andrea Arcangeli       2011-01-13  625        return pte;
> 14fd403f2 Andrea Arcangeli       2011-01-13  626  }
> 8c6e50b02 Kirill A. Shutemov     2014-04-07  627
>
> :::::: The code at line 486 was first introduced by commit
> :::::: 9e2779fa281cfda13ac060753d674bbcaa23367e is_vmalloc_addr(): Check if an address is within the vmalloc boundaries
>
> :::::: TO: Christoph Lameter <clameter@sgi.com>
> :::::: CC: Linus Torvalds <torvalds@woody.linux-foundation.org>
>
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



-- 
Ming Lei

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 32/60] block: implement sp version of bvec iterator helpers
@ 2016-12-17 11:38       ` Ming Lei
  0 siblings, 0 replies; 148+ messages in thread
From: Ming Lei @ 2016-12-17 11:38 UTC (permalink / raw)
  To: kbuild test robot, sparclinux, David S. Miller
  Cc: kbuild-all, Jens Axboe, Linux Kernel Mailing List, linux-block,
	Linux FS Devel, Christoph Hellwig, Kirill A . Shutemov,
	Johannes Berg

Hi Guys,

On Sat, Oct 29, 2016 at 7:06 PM, kbuild test robot <lkp@intel.com> wrote:
> Hi Ming,

Thanks for the report!

>
> [auto build test ERROR on linus/master]
> [also build test ERROR on v4.9-rc2 next-20161028]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> [Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
> [Check https://git-scm.com/docs/git-format-patch for more information]
>
> url:    https://github.com/0day-ci/linux/commits/Ming-Lei/block-support-multipage-bvec/20161029-163910
> config: sparc-defconfig (attached as .config)
> compiler: sparc-linux-gcc (GCC) 6.2.0
> reproduce:
>         wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         make.cross ARCH=sparc
>
> All error/warnings (new ones prefixed by >>):
>
>    In file included from arch/sparc/include/asm/oplib.h:6:0,
>                     from arch/sparc/include/asm/pgtable_32.h:21,
>                     from arch/sparc/include/asm/pgtable.h:6,
>                     from include/linux/mm.h:68,


This issue looks like it is caused by a header-ordering problem somewhere in the
sparc arch code; this patch only adds '#include <linux/mm.h>' to
'include/linux/bvec.h' so that nth_page() can be used.

So I am Cc'ing the sparc list.

Thanks,
Ming

>                     from include/linux/bvec.h:25,
>                     from include/linux/blk_types.h:9,
>                     from include/linux/fs.h:31,
>                     from include/linux/proc_fs.h:8,
>                     from arch/sparc/include/asm/prom.h:22,
>                     from include/linux/of.h:232,
>                     from arch/sparc/include/asm/openprom.h:14,
>                     from arch/sparc/include/asm/device.h:9,
>                     from include/linux/device.h:30,
>                     from include/linux/node.h:17,
>                     from include/linux/cpu.h:16,
>                     from include/linux/stop_machine.h:4,
>                     from kernel/sched/sched.h:10,
>                     from kernel/sched/loadavg.c:11:
>>> arch/sparc/include/asm/oplib_32.h:105:39: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
>     int prom_startcpu(int cpunode, struct linux_prom_registers *context_table,
>                                           ^~~~~~~~~~~~~~~~~~~~
>    arch/sparc/include/asm/oplib_32.h:168:36: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
>     void prom_apply_obio_ranges(struct linux_prom_registers *obioregs, int nregs);
>                                        ^~~~~~~~~~~~~~~~~~~~
>    arch/sparc/include/asm/oplib_32.h:172:18: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
>               struct linux_prom_registers *sbusregs, int nregs);
>                      ^~~~~~~~~~~~~~~~~~~~
> --
>    In file included from arch/sparc/include/asm/oplib.h:6:0,
>                     from arch/sparc/include/asm/pgtable_32.h:21,
>                     from arch/sparc/include/asm/pgtable.h:6,
>                     from include/linux/mm.h:68,
>                     from include/linux/bvec.h:25,
>                     from include/linux/blk_types.h:9,
>                     from include/linux/fs.h:31,
>                     from include/linux/proc_fs.h:8,
>                     from arch/sparc/include/asm/prom.h:22,
>                     from include/linux/of.h:232,
>                     from arch/sparc/include/asm/openprom.h:14,
>                     from arch/sparc/prom/mp.c:12:
>>> arch/sparc/include/asm/oplib_32.h:105:39: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
>     int prom_startcpu(int cpunode, struct linux_prom_registers *context_table,
>                                           ^~~~~~~~~~~~~~~~~~~~
>    arch/sparc/include/asm/oplib_32.h:168:36: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
>     void prom_apply_obio_ranges(struct linux_prom_registers *obioregs, int nregs);
>                                        ^~~~~~~~~~~~~~~~~~~~
>    arch/sparc/include/asm/oplib_32.h:172:18: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
>               struct linux_prom_registers *sbusregs, int nregs);
>                      ^~~~~~~~~~~~~~~~~~~~
>>> arch/sparc/prom/mp.c:23:1: error: conflicting types for 'prom_startcpu'
>     prom_startcpu(int cpunode, struct linux_prom_registers *ctable_reg, int ctx, char *pc)
>     ^~~~~~~~~~~~~
>    In file included from arch/sparc/include/asm/oplib.h:6:0,
>                     from arch/sparc/include/asm/pgtable_32.h:21,
>                     from arch/sparc/include/asm/pgtable.h:6,
>                     from include/linux/mm.h:68,
>                     from include/linux/bvec.h:25,
>                     from include/linux/blk_types.h:9,
>                     from include/linux/fs.h:31,
>                     from include/linux/proc_fs.h:8,
>                     from arch/sparc/include/asm/prom.h:22,
>                     from include/linux/of.h:232,
>                     from arch/sparc/include/asm/openprom.h:14,
>                     from arch/sparc/prom/mp.c:12:
>    arch/sparc/include/asm/oplib_32.h:105:5: note: previous declaration of 'prom_startcpu' was here
>     int prom_startcpu(int cpunode, struct linux_prom_registers *context_table,
>         ^~~~~~~~~~~~~
> --
>    In file included from arch/sparc/include/asm/oplib.h:6:0,
>                     from arch/sparc/include/asm/pgtable_32.h:21,
>                     from arch/sparc/include/asm/pgtable.h:6,
>                     from include/linux/mm.h:68,
>                     from include/linux/bvec.h:25,
>                     from include/linux/blk_types.h:9,
>                     from include/linux/fs.h:31,
>                     from include/linux/proc_fs.h:8,
>                     from arch/sparc/include/asm/prom.h:22,
>                     from include/linux/of.h:232,
>                     from arch/sparc/include/asm/openprom.h:14,
>                     from arch/sparc/prom/ranges.c:11:
>>> arch/sparc/include/asm/oplib_32.h:105:39: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
>     int prom_startcpu(int cpunode, struct linux_prom_registers *context_table,
>                                           ^~~~~~~~~~~~~~~~~~~~
>    arch/sparc/include/asm/oplib_32.h:168:36: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
>     void prom_apply_obio_ranges(struct linux_prom_registers *obioregs, int nregs);
>                                        ^~~~~~~~~~~~~~~~~~~~
>    arch/sparc/include/asm/oplib_32.h:172:18: warning: 'struct linux_prom_registers' declared inside parameter list will not be visible outside of this definition or declaration
>               struct linux_prom_registers *sbusregs, int nregs);
>                      ^~~~~~~~~~~~~~~~~~~~
>>> arch/sparc/prom/ranges.c:57:6: error: conflicting types for 'prom_apply_obio_ranges'
>     void prom_apply_obio_ranges(struct linux_prom_registers *regs, int nregs)
>          ^~~~~~~~~~~~~~~~~~~~~~
>    In file included from arch/sparc/include/asm/oplib.h:6:0,
>                     from arch/sparc/include/asm/pgtable_32.h:21,
>                     from arch/sparc/include/asm/pgtable.h:6,
>                     from include/linux/mm.h:68,
>                     from include/linux/bvec.h:25,
>                     from include/linux/blk_types.h:9,
>                     from include/linux/fs.h:31,
>                     from include/linux/proc_fs.h:8,
>                     from arch/sparc/include/asm/prom.h:22,
>                     from include/linux/of.h:232,
>                     from arch/sparc/include/asm/openprom.h:14,
>                     from arch/sparc/prom/ranges.c:11:
>    arch/sparc/include/asm/oplib_32.h:168:6: note: previous declaration of 'prom_apply_obio_ranges' was here
>     void prom_apply_obio_ranges(struct linux_prom_registers *obioregs, int nregs);
>          ^~~~~~~~~~~~~~~~~~~~~~
>    In file included from include/linux/linkage.h:6:0,
>                     from include/linux/kernel.h:6,
>                     from include/linux/list.h:8,
>                     from include/linux/module.h:9,
>                     from arch/sparc/prom/ranges.c:9:
>    arch/sparc/prom/ranges.c:62:15: error: conflicting types for 'prom_apply_obio_ranges'
>     EXPORT_SYMBOL(prom_apply_obio_ranges);
>                   ^
>    include/linux/export.h:58:21: note: in definition of macro '___EXPORT_SYMBOL'
>      extern typeof(sym) sym;      \
>                         ^~~
>>> arch/sparc/prom/ranges.c:62:1: note: in expansion of macro 'EXPORT_SYMBOL'
>     EXPORT_SYMBOL(prom_apply_obio_ranges);
>     ^~~~~~~~~~~~~
>    In file included from arch/sparc/include/asm/oplib.h:6:0,
>                     from arch/sparc/include/asm/pgtable_32.h:21,
>                     from arch/sparc/include/asm/pgtable.h:6,
>                     from include/linux/mm.h:68,
>                     from include/linux/bvec.h:25,
>                     from include/linux/blk_types.h:9,
>                     from include/linux/fs.h:31,
>                     from include/linux/proc_fs.h:8,
>                     from arch/sparc/include/asm/prom.h:22,
>                     from include/linux/of.h:232,
>                     from arch/sparc/include/asm/openprom.h:14,
>                     from arch/sparc/prom/ranges.c:11:
>    arch/sparc/include/asm/oplib_32.h:168:6: note: previous declaration of 'prom_apply_obio_ranges' was here
>     void prom_apply_obio_ranges(struct linux_prom_registers *obioregs, int nregs);
>          ^~~~~~~~~~~~~~~~~~~~~~
>>> arch/sparc/prom/ranges.c:87:6: error: conflicting types for 'prom_apply_generic_ranges'
>     void prom_apply_generic_ranges(phandle node, phandle parent,
>          ^~~~~~~~~~~~~~~~~~~~~~~~~
>    In file included from arch/sparc/include/asm/oplib.h:6:0,
>                     from arch/sparc/include/asm/pgtable_32.h:21,
>                     from arch/sparc/include/asm/pgtable.h:6,
>                     from include/linux/mm.h:68,
>                     from include/linux/bvec.h:25,
>                     from include/linux/blk_types.h:9,
>                     from include/linux/fs.h:31,
>                     from include/linux/proc_fs.h:8,
>                     from arch/sparc/include/asm/prom.h:22,
>                     from include/linux/of.h:232,
>                     from arch/sparc/include/asm/openprom.h:14,
>                     from arch/sparc/prom/ranges.c:11:
>    arch/sparc/include/asm/oplib_32.h:171:6: note: previous declaration of 'prom_apply_generic_ranges' was here
>     void prom_apply_generic_ranges(phandle node, phandle parent,
>          ^~~~~~~~~~~~~~~~~~~~~~~~~
> --
>    In file included from include/linux/bvec.h:25:0,
>                     from include/linux/blk_types.h:9,
>                     from include/linux/fs.h:31,
>                     from include/linux/proc_fs.h:8,
>                     from arch/sparc/include/asm/prom.h:22,
>                     from include/linux/of.h:232,
>                     from arch/sparc/include/asm/openprom.h:14,
>                     from arch/sparc/include/asm/oplib_32.h:11,
>                     from arch/sparc/include/asm/oplib.h:6,
>                     from arch/sparc/include/asm/pgtable_32.h:21,
>                     from arch/sparc/include/asm/pgtable.h:6,
>                     from mm/init-mm.c:9:
>    include/linux/mm.h: In function 'is_vmalloc_addr':
>>> include/linux/mm.h:486:17: error: 'VMALLOC_START' undeclared (first use in this function)
>      return addr >= VMALLOC_START && addr < VMALLOC_END;
>                     ^~~~~~~~~~~~~
>    include/linux/mm.h:486:17: note: each undeclared identifier is reported only once for each function it appears in
>>> include/linux/mm.h:486:41: error: 'VMALLOC_END' undeclared (first use in this function)
>      return addr >= VMALLOC_START && addr < VMALLOC_END;
>                                             ^~~~~~~~~~~
>    include/linux/mm.h: In function 'maybe_mkwrite':
>>> include/linux/mm.h:624:9: error: implicit declaration of function 'pte_mkwrite' [-Werror=implicit-function-declaration]
>       pte = pte_mkwrite(pte);
>             ^~~~~~~~~~~
>    In file included from include/linux/bvec.h:25:0,
>                     from include/linux/blk_types.h:9,
>                     from include/linux/fs.h:31,
>                     from include/linux/proc_fs.h:8,
>                     from arch/sparc/include/asm/prom.h:22,
>                     from include/linux/of.h:232,
>                     from arch/sparc/include/asm/openprom.h:14,
>                     from arch/sparc/include/asm/oplib_32.h:11,
>                     from arch/sparc/include/asm/oplib.h:6,
>                     from arch/sparc/include/asm/pgtable_32.h:21,
>                     from arch/sparc/include/asm/pgtable.h:6,
>                     from mm/init-mm.c:9:
>    include/linux/mm.h: In function 'pgtable_init':
>>> include/linux/mm.h:1674:2: error: implicit declaration of function 'pgtable_cache_init' [-Werror=implicit-function-declaration]
>      pgtable_cache_init();
>      ^~~~~~~~~~~~~~~~~~
>    In file included from arch/sparc/include/asm/pgtable.h:6:0,
>                     from mm/init-mm.c:9:
>    arch/sparc/include/asm/pgtable_32.h: At top level:
>>> arch/sparc/include/asm/pgtable_32.h:245:21: error: conflicting types for 'pte_mkwrite'
>     static inline pte_t pte_mkwrite(pte_t pte)
>                         ^~~~~~~~~~~
>    In file included from include/linux/bvec.h:25:0,
>                     from include/linux/blk_types.h:9,
>                     from include/linux/fs.h:31,
>                     from include/linux/proc_fs.h:8,
>                     from arch/sparc/include/asm/prom.h:22,
>                     from include/linux/of.h:232,
>                     from arch/sparc/include/asm/openprom.h:14,
>                     from arch/sparc/include/asm/oplib_32.h:11,
>                     from arch/sparc/include/asm/oplib.h:6,
>                     from arch/sparc/include/asm/pgtable_32.h:21,
>                     from arch/sparc/include/asm/pgtable.h:6,
>                     from mm/init-mm.c:9:
>    include/linux/mm.h:624:9: note: previous implicit declaration of 'pte_mkwrite' was here
>       pte = pte_mkwrite(pte);
>             ^~~~~~~~~~~
>    cc1: some warnings being treated as errors
>
> vim +/VMALLOC_START +486 include/linux/mm.h
>
> 0738c4bb8 Paul Mundt             2008-03-12  480   */
> bb00a789e Yaowei Bai             2016-05-19  481  static inline bool is_vmalloc_addr(const void *x)
> 9e2779fa2 Christoph Lameter      2008-02-04  482  {
> 0738c4bb8 Paul Mundt             2008-03-12  483  #ifdef CONFIG_MMU
> 9e2779fa2 Christoph Lameter      2008-02-04  484        unsigned long addr = (unsigned long)x;
> 9e2779fa2 Christoph Lameter      2008-02-04  485
> 9e2779fa2 Christoph Lameter      2008-02-04 @486        return addr >= VMALLOC_START && addr < VMALLOC_END;
> 0738c4bb8 Paul Mundt             2008-03-12  487  #else
> bb00a789e Yaowei Bai             2016-05-19  488        return false;
> 8ca3ed87d David Howells          2008-02-23  489  #endif
> 0738c4bb8 Paul Mundt             2008-03-12  490  }
> 81ac3ad90 KAMEZAWA Hiroyuki      2009-09-22  491  #ifdef CONFIG_MMU
> 81ac3ad90 KAMEZAWA Hiroyuki      2009-09-22  492  extern int is_vmalloc_or_module_addr(const void *x);
> 81ac3ad90 KAMEZAWA Hiroyuki      2009-09-22  493  #else
> 934831d06 David Howells          2009-09-24  494  static inline int is_vmalloc_or_module_addr(const void *x)
> 81ac3ad90 KAMEZAWA Hiroyuki      2009-09-22  495  {
> 81ac3ad90 KAMEZAWA Hiroyuki      2009-09-22  496        return 0;
> 81ac3ad90 KAMEZAWA Hiroyuki      2009-09-22  497  }
> 81ac3ad90 KAMEZAWA Hiroyuki      2009-09-22  498  #endif
> 9e2779fa2 Christoph Lameter      2008-02-04  499
> 39f1f78d5 Al Viro                2014-05-06  500  extern void kvfree(const void *addr);
> 39f1f78d5 Al Viro                2014-05-06  501
> 53f9263ba Kirill A. Shutemov     2016-01-15  502  static inline atomic_t *compound_mapcount_ptr(struct page *page)
> 53f9263ba Kirill A. Shutemov     2016-01-15  503  {
> 53f9263ba Kirill A. Shutemov     2016-01-15  504        return &page[1].compound_mapcount;
> 53f9263ba Kirill A. Shutemov     2016-01-15  505  }
> 53f9263ba Kirill A. Shutemov     2016-01-15  506
> 53f9263ba Kirill A. Shutemov     2016-01-15  507  static inline int compound_mapcount(struct page *page)
> 53f9263ba Kirill A. Shutemov     2016-01-15  508  {
> 5f527c2b3 Andrea Arcangeli       2016-05-20  509        VM_BUG_ON_PAGE(!PageCompound(page), page);
> 53f9263ba Kirill A. Shutemov     2016-01-15  510        page = compound_head(page);
> 53f9263ba Kirill A. Shutemov     2016-01-15  511        return atomic_read(compound_mapcount_ptr(page)) + 1;
> 53f9263ba Kirill A. Shutemov     2016-01-15  512  }
> 53f9263ba Kirill A. Shutemov     2016-01-15  513
> ccaafd7fd Joonsoo Kim            2015-02-10  514  /*
> 70b50f94f Andrea Arcangeli       2011-11-02  515   * The atomic page->_mapcount, starts from -1: so that transitions
> 70b50f94f Andrea Arcangeli       2011-11-02  516   * both from it and to it can be tracked, using atomic_inc_and_test
> 70b50f94f Andrea Arcangeli       2011-11-02  517   * and atomic_add_negative(-1).
> 70b50f94f Andrea Arcangeli       2011-11-02  518   */
> 22b751c3d Mel Gorman             2013-02-22  519  static inline void page_mapcount_reset(struct page *page)
> 70b50f94f Andrea Arcangeli       2011-11-02  520  {
> 70b50f94f Andrea Arcangeli       2011-11-02  521        atomic_set(&(page)->_mapcount, -1);
> 70b50f94f Andrea Arcangeli       2011-11-02  522  }
> 70b50f94f Andrea Arcangeli       2011-11-02  523
> b20ce5e03 Kirill A. Shutemov     2016-01-15  524  int __page_mapcount(struct page *page);
> b20ce5e03 Kirill A. Shutemov     2016-01-15  525
> 70b50f94f Andrea Arcangeli       2011-11-02  526  static inline int page_mapcount(struct page *page)
> 70b50f94f Andrea Arcangeli       2011-11-02  527  {
> 1d148e218 Wang, Yalin            2015-02-11  528        VM_BUG_ON_PAGE(PageSlab(page), page);
> 53f9263ba Kirill A. Shutemov     2016-01-15  529
> b20ce5e03 Kirill A. Shutemov     2016-01-15  530        if (unlikely(PageCompound(page)))
> b20ce5e03 Kirill A. Shutemov     2016-01-15  531                return __page_mapcount(page);
> b20ce5e03 Kirill A. Shutemov     2016-01-15  532        return atomic_read(&page->_mapcount) + 1;
> 53f9263ba Kirill A. Shutemov     2016-01-15  533  }
> b20ce5e03 Kirill A. Shutemov     2016-01-15  534
> b20ce5e03 Kirill A. Shutemov     2016-01-15  535  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> b20ce5e03 Kirill A. Shutemov     2016-01-15  536  int total_mapcount(struct page *page);
> 6d0a07edd Andrea Arcangeli       2016-05-12  537  int page_trans_huge_mapcount(struct page *page, int *total_mapcount);
> b20ce5e03 Kirill A. Shutemov     2016-01-15  538  #else
> b20ce5e03 Kirill A. Shutemov     2016-01-15  539  static inline int total_mapcount(struct page *page)
> b20ce5e03 Kirill A. Shutemov     2016-01-15  540  {
> b20ce5e03 Kirill A. Shutemov     2016-01-15  541        return page_mapcount(page);
> 70b50f94f Andrea Arcangeli       2011-11-02  542  }
> 6d0a07edd Andrea Arcangeli       2016-05-12  543  static inline int page_trans_huge_mapcount(struct page *page,
> 6d0a07edd Andrea Arcangeli       2016-05-12  544                                           int *total_mapcount)
> 6d0a07edd Andrea Arcangeli       2016-05-12  545  {
> 6d0a07edd Andrea Arcangeli       2016-05-12  546        int mapcount = page_mapcount(page);
> 6d0a07edd Andrea Arcangeli       2016-05-12  547        if (total_mapcount)
> 6d0a07edd Andrea Arcangeli       2016-05-12  548                *total_mapcount = mapcount;
> 6d0a07edd Andrea Arcangeli       2016-05-12  549        return mapcount;
> 6d0a07edd Andrea Arcangeli       2016-05-12  550  }
> b20ce5e03 Kirill A. Shutemov     2016-01-15  551  #endif
> 70b50f94f Andrea Arcangeli       2011-11-02  552
> b49af68ff Christoph Lameter      2007-05-06  553  static inline struct page *virt_to_head_page(const void *x)
> b49af68ff Christoph Lameter      2007-05-06  554  {
> b49af68ff Christoph Lameter      2007-05-06  555        struct page *page = virt_to_page(x);
> ccaafd7fd Joonsoo Kim            2015-02-10  556
> 1d798ca3f Kirill A. Shutemov     2015-11-06  557        return compound_head(page);
> b49af68ff Christoph Lameter      2007-05-06  558  }
> b49af68ff Christoph Lameter      2007-05-06  559
> ddc58f27f Kirill A. Shutemov     2016-01-15  560  void __put_page(struct page *page);
> ddc58f27f Kirill A. Shutemov     2016-01-15  561
> 1d7ea7324 Alexander Zarochentsev 2006-08-13  562  void put_pages_list(struct list_head *pages);
> ^1da177e4 Linus Torvalds         2005-04-16  563
> 8dfcc9ba2 Nick Piggin            2006-03-22  564  void split_page(struct page *page, unsigned int order);
> 8dfcc9ba2 Nick Piggin            2006-03-22  565
> ^1da177e4 Linus Torvalds         2005-04-16  566  /*
> 33f2ef89f Andy Whitcroft         2006-12-06  567   * Compound pages have a destructor function.  Provide a
> 33f2ef89f Andy Whitcroft         2006-12-06  568   * prototype for that function and accessor functions.
> f1e61557f Kirill A. Shutemov     2015-11-06  569   * These are _only_ valid on the head of a compound page.
> 33f2ef89f Andy Whitcroft         2006-12-06  570   */
> f1e61557f Kirill A. Shutemov     2015-11-06  571  typedef void compound_page_dtor(struct page *);
> f1e61557f Kirill A. Shutemov     2015-11-06  572
> f1e61557f Kirill A. Shutemov     2015-11-06  573  /* Keep the enum in sync with compound_page_dtors array in mm/page_alloc.c */
> f1e61557f Kirill A. Shutemov     2015-11-06  574  enum compound_dtor_id {
> f1e61557f Kirill A. Shutemov     2015-11-06  575        NULL_COMPOUND_DTOR,
> f1e61557f Kirill A. Shutemov     2015-11-06  576        COMPOUND_PAGE_DTOR,
> f1e61557f Kirill A. Shutemov     2015-11-06  577  #ifdef CONFIG_HUGETLB_PAGE
> f1e61557f Kirill A. Shutemov     2015-11-06  578        HUGETLB_PAGE_DTOR,
> f1e61557f Kirill A. Shutemov     2015-11-06  579  #endif
> 9a982250f Kirill A. Shutemov     2016-01-15  580  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> 9a982250f Kirill A. Shutemov     2016-01-15  581        TRANSHUGE_PAGE_DTOR,
> 9a982250f Kirill A. Shutemov     2016-01-15  582  #endif
> f1e61557f Kirill A. Shutemov     2015-11-06  583        NR_COMPOUND_DTORS,
> f1e61557f Kirill A. Shutemov     2015-11-06  584  };
> f1e61557f Kirill A. Shutemov     2015-11-06  585  extern compound_page_dtor * const compound_page_dtors[];
> 33f2ef89f Andy Whitcroft         2006-12-06  586
> 33f2ef89f Andy Whitcroft         2006-12-06  587  static inline void set_compound_page_dtor(struct page *page,
> f1e61557f Kirill A. Shutemov     2015-11-06  588                enum compound_dtor_id compound_dtor)
> 33f2ef89f Andy Whitcroft         2006-12-06  589  {
> f1e61557f Kirill A. Shutemov     2015-11-06  590        VM_BUG_ON_PAGE(compound_dtor >= NR_COMPOUND_DTORS, page);
> f1e61557f Kirill A. Shutemov     2015-11-06  591        page[1].compound_dtor = compound_dtor;
> 33f2ef89f Andy Whitcroft         2006-12-06  592  }
> 33f2ef89f Andy Whitcroft         2006-12-06  593
> 33f2ef89f Andy Whitcroft         2006-12-06  594  static inline compound_page_dtor *get_compound_page_dtor(struct page *page)
> 33f2ef89f Andy Whitcroft         2006-12-06  595  {
> f1e61557f Kirill A. Shutemov     2015-11-06  596        VM_BUG_ON_PAGE(page[1].compound_dtor >= NR_COMPOUND_DTORS, page);
> f1e61557f Kirill A. Shutemov     2015-11-06  597        return compound_page_dtors[page[1].compound_dtor];
> 33f2ef89f Andy Whitcroft         2006-12-06  598  }
> 33f2ef89f Andy Whitcroft         2006-12-06  599
> d00181b96 Kirill A. Shutemov     2015-11-06  600  static inline unsigned int compound_order(struct page *page)
> d85f33855 Christoph Lameter      2007-05-06  601  {
> 6d7779538 Christoph Lameter      2007-05-06  602        if (!PageHead(page))
> d85f33855 Christoph Lameter      2007-05-06  603                return 0;
> e4b294c2d Kirill A. Shutemov     2015-02-11  604        return page[1].compound_order;
> d85f33855 Christoph Lameter      2007-05-06  605  }
> d85f33855 Christoph Lameter      2007-05-06  606
> f1e61557f Kirill A. Shutemov     2015-11-06  607  static inline void set_compound_order(struct page *page, unsigned int order)
> d85f33855 Christoph Lameter      2007-05-06  608  {
> e4b294c2d Kirill A. Shutemov     2015-02-11  609        page[1].compound_order = order;
> d85f33855 Christoph Lameter      2007-05-06  610  }
> d85f33855 Christoph Lameter      2007-05-06  611
> 9a982250f Kirill A. Shutemov     2016-01-15  612  void free_compound_page(struct page *page);
> 9a982250f Kirill A. Shutemov     2016-01-15  613
> 3dece370e Michal Simek           2011-01-21  614  #ifdef CONFIG_MMU
> 33f2ef89f Andy Whitcroft         2006-12-06  615  /*
> 14fd403f2 Andrea Arcangeli       2011-01-13  616   * Do pte_mkwrite, but only if the vma says VM_WRITE.  We do this when
> 14fd403f2 Andrea Arcangeli       2011-01-13  617   * servicing faults for write access.  In the normal case, do always want
> 14fd403f2 Andrea Arcangeli       2011-01-13  618   * pte_mkwrite.  But get_user_pages can cause write faults for mappings
> 14fd403f2 Andrea Arcangeli       2011-01-13  619   * that do not have writing enabled, when used by access_process_vm.
> 14fd403f2 Andrea Arcangeli       2011-01-13  620   */
> 14fd403f2 Andrea Arcangeli       2011-01-13  621  static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
> 14fd403f2 Andrea Arcangeli       2011-01-13  622  {
> 14fd403f2 Andrea Arcangeli       2011-01-13  623        if (likely(vma->vm_flags & VM_WRITE))
> 14fd403f2 Andrea Arcangeli       2011-01-13 @624                pte = pte_mkwrite(pte);
> 14fd403f2 Andrea Arcangeli       2011-01-13  625        return pte;
> 14fd403f2 Andrea Arcangeli       2011-01-13  626  }
> 8c6e50b02 Kirill A. Shutemov     2014-04-07  627
>
> :::::: The code at line 486 was first introduced by commit
> :::::: 9e2779fa281cfda13ac060753d674bbcaa23367e is_vmalloc_addr(): Check if an address is within the vmalloc boundaries
>
> :::::: TO: Christoph Lameter <clameter@sgi.com>
> :::::: CC: Linus Torvalds <torvalds@woody.linux-foundation.org>
>
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



-- 
Ming Lei


end of thread, other threads:[~2016-12-17 11:38 UTC | newest]

Thread overview: 148+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-29  8:07 [PATCH 00/60] block: support multipage bvec Ming Lei
2016-10-29  8:07 ` [Cluster-devel] " Ming Lei
2016-10-29  8:07 ` Ming Lei
2016-10-29  8:07 ` Ming Lei
2016-10-29  8:07 ` Ming Lei
2016-10-29  8:08 ` [PATCH 01/60] block: bio: introduce bio_init_with_vec_table() Ming Lei
2016-10-29 15:21   ` Christoph Hellwig
2016-10-29  8:08 ` [PATCH 02/60] block drivers: convert to bio_init_with_vec_table() Ming Lei
2016-10-29  8:08   ` Ming Lei
2016-10-29  8:08 ` [PATCH 03/60] block: drbd: remove impossible failure handling Ming Lei
2016-10-31 15:25   ` Christoph Hellwig
2016-10-29  8:08 ` [PATCH 04/60] block: floppy: use bio_add_page() Ming Lei
2016-10-31 15:26   ` Christoph Hellwig
2016-10-31 22:54     ` Ming Lei
2016-11-10 19:35   ` Christoph Hellwig
2016-11-11  8:39     ` Ming Lei
2016-10-29  8:08 ` [PATCH 05/60] target: avoid to access .bi_vcnt directly Ming Lei
2016-10-29  8:08   ` Ming Lei
2016-10-31 15:26   ` Christoph Hellwig
2016-10-29  8:08 ` [PATCH 06/60] bcache: debug: avoid to access .bi_io_vec directly Ming Lei
2016-10-29  8:08   ` Ming Lei
2016-10-29  8:08 ` [PATCH 07/60] dm: crypt: use bio_add_page() Ming Lei
2016-10-29  8:08   ` Ming Lei
2016-10-29  8:08 ` [PATCH 08/60] dm: use bvec iterator helpers to implement .get_page and .next_page Ming Lei
2016-10-29  8:08   ` Ming Lei
2016-10-29  8:08 ` [PATCH 09/60] dm: dm.c: replace 'bio->bi_vcnt == 1' with !bio_multiple_segments Ming Lei
2016-10-29  8:08   ` Ming Lei
2016-10-31 15:29   ` Christoph Hellwig
2016-10-31 15:29     ` Christoph Hellwig
2016-10-31 22:59     ` Ming Lei
2016-10-31 22:59       ` Ming Lei
2016-11-02  3:09     ` Kent Overstreet
2016-11-02  3:09       ` Kent Overstreet
2016-11-02  7:56       ` Ming Lei
2016-11-02  7:56         ` Ming Lei
2016-11-02 14:24         ` Mike Snitzer
2016-11-02 14:24           ` Mike Snitzer
2016-11-02 14:24           ` Mike Snitzer
2016-11-02 23:47           ` Ming Lei
2016-11-02 23:47             ` Ming Lei
2016-10-29  8:08 ` [PATCH 10/60] fs: logfs: convert to bio_add_page() in sync_request() Ming Lei
2016-10-29  8:08 ` [PATCH 11/60] fs: logfs: use bio_add_page() in __bdev_writeseg() Ming Lei
2016-10-31 15:29   ` Christoph Hellwig
2016-10-29  8:08 ` [PATCH 12/60] fs: logfs: use bio_add_page() in do_erase() Ming Lei
2016-10-31 15:29   ` Christoph Hellwig
2016-10-29  8:08 ` [PATCH 13/60] fs: logfs: remove unnecesary check Ming Lei
2016-10-31 15:29   ` Christoph Hellwig
2016-10-29  8:08 ` [PATCH 14/60] block: drbd: comment on direct access bvec table Ming Lei
2016-10-29  8:08 ` [PATCH 15/60] block: loop: comment on direct access to " Ming Lei
2016-10-31 15:31   ` Christoph Hellwig
2016-10-31 23:08     ` Ming Lei
2016-10-29  8:08 ` [PATCH 16/60] block: pktcdvd: " Ming Lei
2016-10-31 15:33   ` Christoph Hellwig
2016-10-31 23:08     ` Ming Lei
2016-10-29  8:08 ` [PATCH 17/60] kernel/power/swap.c: " Ming Lei
2016-10-29  8:08   ` Ming Lei
2016-10-29  8:08 ` [PATCH 18/60] mm: page_io.c: " Ming Lei
2016-10-29  8:08   ` Ming Lei
2016-10-29  8:08   ` Ming Lei
2016-10-29  8:08 ` [PATCH 19/60] fs/buffer: " Ming Lei
2016-10-31 15:35   ` Christoph Hellwig
2016-10-31 23:12     ` Ming Lei
2016-10-29  8:08 ` [PATCH 20/60] f2fs: f2fs_read_end_io: " Ming Lei
2016-10-29  8:08   ` Ming Lei
2016-10-29  8:08 ` [PATCH 21/60] bcache: " Ming Lei
2016-10-29  8:08   ` Ming Lei
2016-10-29  8:08 ` [PATCH 22/60] block: comment on bio_alloc_pages() Ming Lei
2016-10-29  8:08   ` Ming Lei
2016-10-29  8:08 ` [PATCH 23/60] block: introduce flag QUEUE_FLAG_NO_MP Ming Lei
2016-10-29 15:29   ` Christoph Hellwig
2016-10-29 22:20     ` Ming Lei
2016-10-29 22:20       ` Ming Lei
2016-10-29  8:08 ` [PATCH 24/60] md: set NO_MP for request queue of md Ming Lei
2016-10-29  8:08   ` Ming Lei
2016-10-29  8:08 ` [PATCH 25/60] block: pktcdvd: set NO_MP for pktcdvd request queue Ming Lei
2016-10-29  8:08 ` [PATCH 26/60] btrfs: set NO_MP for request queues behind BTRFS Ming Lei
2016-10-31 15:36   ` Christoph Hellwig
2016-10-31 17:58     ` Chris Mason
2016-10-31 18:00       ` Christoph Hellwig
2016-10-29  8:08 ` [PATCH 27/60] block: introduce BIO_SP_MAX_SECTORS Ming Lei
2016-10-29  8:08 ` [PATCH 28/60] block: introduce QUEUE_FLAG_SPLIT_MP Ming Lei
2016-10-31 15:39   ` Christoph Hellwig
2016-10-31 23:56     ` Ming Lei
2016-11-02  3:08     ` Kent Overstreet
2016-11-03 10:38       ` Ming Lei
2016-11-03 11:20         ` Kent Overstreet
2016-11-03 11:26           ` Ming Lei
2016-11-03 11:30             ` Kent Overstreet
2016-10-29  8:08 ` [PATCH 29/60] dm: limit the max bio size as BIO_SP_MAX_SECTORS << SECTOR_SHIFT Ming Lei
2016-10-29  8:08   ` Ming Lei
2016-10-29  8:08 ` [PATCH 30/60] bcache: set flag of QUEUE_FLAG_SPLIT_MP Ming Lei
2016-10-29  8:08   ` Ming Lei
2016-10-29  8:08 ` [PATCH 31/60] block: introduce multipage/single page bvec helpers Ming Lei
2016-10-29  8:08 ` [PATCH 32/60] block: implement sp version of bvec iterator helpers Ming Lei
2016-10-29 11:06   ` kbuild test robot
2016-12-17 11:38     ` Ming Lei
2016-12-17 11:38       ` Ming Lei
2016-10-29  8:08 ` [PATCH 33/60] block: introduce bio_for_each_segment_mp() Ming Lei
2016-10-29  8:08 ` [PATCH 34/60] block: introduce bio_clone_sp() Ming Lei
2016-10-29  8:08 ` [PATCH 35/60] bvec_iter: introduce BVEC_ITER_ALL_INIT Ming Lei
2016-10-29  8:08 ` [PATCH 36/60] block: bounce: avoid direct access to bvec from bio->bi_io_vec Ming Lei
2016-10-29  8:08 ` [PATCH 37/60] block: bounce: don't access bio->bi_io_vec in copy_to_high_bio_irq Ming Lei
2016-10-29  8:08 ` [PATCH 38/60] block: bounce: convert multipage bvecs into singlepage Ming Lei
2016-10-29  8:08 ` [PATCH 39/60] bcache: debug: switch to bio_clone_sp() Ming Lei
2016-10-29  8:08   ` Ming Lei
2016-10-29  8:08 ` [PATCH 40/60] blk-merge: compute bio->bi_seg_front_size efficiently Ming Lei
2016-10-29  8:08 ` [PATCH 41/60] block: blk-merge: try to make front segments in full size Ming Lei
2016-10-29  8:08 ` [PATCH 42/60] block: use bio_for_each_segment_mp() to compute segments count Ming Lei
2016-10-29  8:08 ` [PATCH 43/60] block: use bio_for_each_segment_mp() to map sg Ming Lei
2016-10-29  8:08 ` [PATCH 44/60] block: introduce bvec_for_each_sp_bvec() Ming Lei
2016-10-29  8:08 ` [PATCH 45/60] block: bio: introduce bio_for_each_segment_all_rd() and its write pair Ming Lei
2016-10-31 13:59   ` Theodore Ts'o
2016-10-31 15:11     ` Christoph Hellwig
2016-10-31 22:50       ` Ming Lei
2016-11-02  3:01       ` Kent Overstreet
2016-10-31 22:46     ` Ming Lei
2016-10-31 23:51       ` Ming Lei
2016-11-01 14:17         ` Theodore Ts'o
2016-11-02  1:58           ` Ming Lei
2016-10-29  8:08 ` [PATCH 46/60] block: deal with dirtying pages for multipage bvec Ming Lei
2016-10-31 15:40   ` Christoph Hellwig
2016-11-01  0:19     ` Ming Lei
2016-10-29  8:08 ` [PATCH 47/60] block: convert to bio_for_each_segment_all_rd() Ming Lei
2016-10-29  8:08 ` [PATCH 48/60] fs/mpage: " Ming Lei
2016-10-29  8:08 ` [PATCH 49/60] fs/direct-io: " Ming Lei
2016-10-29  8:08 ` [PATCH 50/60] ext4: " Ming Lei
2016-10-29  8:08 ` [PATCH 51/60] xfs: " Ming Lei
2016-10-29  8:08   ` Ming Lei
2016-10-29  8:08 ` [PATCH 52/60] logfs: " Ming Lei
2016-10-29  8:08 ` [PATCH 53/60] gfs2: " Ming Lei
2016-10-29  8:08   ` [Cluster-devel] " Ming Lei
2016-10-29  8:08 ` [PATCH 54/60] f2fs: " Ming Lei
2016-10-29  8:08   ` Ming Lei
2016-10-29  8:08 ` [PATCH 55/60] exofs: " Ming Lei
2016-10-29  8:08 ` [PATCH 56/60] fs: crypto: " Ming Lei
2016-10-29  8:08 ` [PATCH 57/60] bcache: " Ming Lei
2016-10-29  8:08   ` Ming Lei
2016-10-29  8:08 ` [PATCH 58/60] dm-crypt: " Ming Lei
2016-10-29  8:08   ` Ming Lei
2016-10-29  8:08 ` [PATCH 59/60] fs/buffer.c: use bvec iterator to truncate the bio Ming Lei
2016-10-29  8:08 ` [PATCH 60/60] block: enable multipage bvecs Ming Lei
2016-10-31 15:25 ` [PATCH 00/60] block: support multipage bvec Christoph Hellwig
2016-10-31 15:25   ` [Cluster-devel] " Christoph Hellwig
2016-10-31 15:25   ` Christoph Hellwig
2016-10-31 15:25   ` Christoph Hellwig
2016-10-31 22:52   ` Ming Lei
2016-10-31 22:52     ` [Cluster-devel] " Ming Lei
2016-10-31 22:52     ` Ming Lei
