All of lore.kernel.org
 help / color / mirror / Atom feed
* [GIT PULL] Block pull request for- 4.11-rc1
@ 2017-02-20  0:10 Jens Axboe
  2017-02-20  1:09   ` Bart Van Assche
  2017-02-21 19:11 ` Linus Torvalds
  0 siblings, 2 replies; 43+ messages in thread
From: Jens Axboe @ 2017-02-20  0:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-block, linux-kernel

Hi Linus,

This is the collected pull request for 4.11 for the block core and
drivers. It's really two different branches:

	for-4.11/block-signed
	for-4.11/next-signed

for-4.11/next exists because some of Christoph's patch series were based
on patches that were added after for-4.11/block was forked off, which
would have caused needless merge pain. So for-4.11/next was forked off
v4.10-rc5, with for-4.11/block pulled in. Feel free to pull each of
these in succession instead of this pre-merged branch. Note that if you
do opt for pulling the two branches, for-4.11/block will throw a trivial
merge conflict in block/blk-mq.c, where you need to delete a function.
for-4.11/next merges cleanly, HOWEVER, a commit that was added in
mainline since v4.10-rc5 (f2e767bb5d6e) adds a line of code that is no
longer valid with the changes in for-4.11/next. Hence, if you do pull in
separately, you'll want to --no-commit and edit:

	drivers/scsi/mpt3sas/mpt3sas_scsih.c

to fix up that one line, like I did in the for-4.11/linus-merge branch.


With that said, this pull request contains:

- blk-mq scheduling framework from me and Omar, with a port of the
  deadline scheduler for this framework. A port of BFQ from Paolo
  is in the works, and should be ready for 4.12.

- Various fixups and improvements to the above scheduling framework
  from Omar, Paolo, Bart, me, others.

- Cleanup of the exported sysfs blk-mq data into debugfs, from
  Omar. This allows us to export more information that helps
  debug hangs or performance issues, without cluttering or
  abusing the sysfs API.

- Fixes for the sbitmap code, the scalable bitmap code that was
  migrated from blk-mq, from Omar.

- Removal of the BLOCK_PC support in struct request, and refactoring of
  carrying SCSI payloads in the block layer. This cleans up the code
  nicely, and enables us to kill the SCSI specific parts of struct
  request, shrinking it down nicely. From Christoph mainly, with help
  from Hannes.

- Support for ranged discard requests and discard merging, also from
  Christoph.

- Support for OPAL in the block layer, and for NVMe as well. Mainly
  from Scott Bauer, with fixes/updates from various others folks.

- Error code fixup for gdrom from Christophe.

- cciss pci irq allocation cleanup from Christoph.

- Making the cdrom device operations read only, from Kees Cook.

- Fixes for duplicate bdi registrations and bdi/queue life time
  problems from Jan and Dan.

- Set of fixes and updates for lightnvm, from Matias and Javier.

- A few fixes for nbd from Josef, using idr to name devices and
  a workqueue deadlock fix on receive. Also marks Josef as the
  current maintainer of nbd.

- Fix from Josef, overwriting queue settings when the number of
  hardware queues is updated for a blk-mq device.

- NVMe fix from Keith, ensuring that we don't repeatedly mark
  and IO aborted, if we didn't end up aborting it.

- SG gap merging fix from Ming Lei for block.

- Loop fix also from Ming, fixing a race and crash between setting
  loop status and IO.

- Two block race fixes from Tahsin, fixing request list iteration
  and fixing a race between device registration and udev device add
  notifiations.

- Double free fix from cgroup writeback, from Tejun.

- Another double free fix in blkcg, from Hou Tao.

- Partition overflow fix for EFI from Alden Tondettar.

Please pull! Either this pre-merged branch:


  git://git.kernel.dk/linux-block.git for-4.11/linus-merge-signed


or


  git://git.kernel.dk/linux-block.git for-4.11/block-signed
  git://git.kernel.dk/linux-block.git for-4.11/next-signed


in that order. All branches are signed tags.


----------------------------------------------------------------
Alden Tondettar (1):
      partitions/efi: Fix integer overflow in GPT size calculation

Alexander Potapenko (1):
      block: Initialize cfqq->ioprio_class in cfq_get_queue()

Bart Van Assche (5):
      blk-mq-debugfs: Add missing __acquires() / __releases() annotations
      blk-mq-debug: Avoid that sparse complains about req_flags_t usage
      blk-mq-debug: Make show() operations interruptible
      blk-mq-debug: Introduce debugfs_create_files()
      block: Update comments that refer to __bio_map_user() and bio_map_user()

Christoph Hellwig (39):
      block: add a op_is_flush helper
      md: cleanup bio op / flags handling in raid1_write_request
      block: fix elevator init check
      block: simplify blk_init_allocated_queue
      block: allow specifying size for extra command data
      block: cleanup tracing
      dm: remove incomplete BLOCK_PC support
      dm: always defer request allocation to the owner of the request_queue
      scsi: remove gfp_flags member in scsi_host_cmd_pool
      scsi: respect unchecked_isa_dma for blk-mq
      scsi: remove scsi_cmd_dma_pool
      scsi: remove __scsi_alloc_queue
      scsi: allocate scsi_cmnd structures as part of struct request
      block/bsg: move queue creation into bsg_setup_queue
      block: split scsi_request out of struct request
      block: don't assign cmd_flags in __blk_rq_prep_clone
      nvme/scsi: don't rely on BLK_MAX_CDB
      skd: implement trivial scsi ioctls directly
      block: make scsi_request and scsi ioctl support optional
      virtio_blk: remove struct request backpointer from virtblk_req
      virtio_blk: make SCSI passthrough support configurable
      scm_blk: remove unneeded REQ_TYPE_FS check
      ѕd: remove pointless REQ_TYPE_FS check
      mmc: remove pointless request type check in mmc_prep_request
      ms_block: remove pointless prep_fn
      mspro_block: remove pointless prep_fn
      nbd: remove REQ_TYPE_DRV_PRIV leftovers
      nbd: move request validity checking into nbd_send_cmd
      block: introduce blk_rq_is_passthrough
      ide: don't abuse cmd_type
      block: fold cmd_type into the REQ_OP_ space
      dm: don't allow ioctls to targets that don't map to whole devices
      block: move req_set_nomerge to blk.h
      block: enumify ELEVATOR_*_MERGE
      block: optionally merge discontiguous discard bios into a single request
      nvme: support ranged discard requests
      cciss: switch to pci_irq_alloc_vectors
      block/sed-opal: tone down not supported warnings
      block/sed-opal: allocate struct opal_dev dynamically

Christophe JAILLET (1):
      gdrom: Add missing error code

Dan Williams (1):
      scsi, block: fix duplicate bdi name registration crashes

Hannes Reinecke (3):
      scsi_dh_rdac: switch to scsi_execute_req_flags()
      scsi_dh_emc: switch to scsi_execute_req_flags()
      scsi_dh_hp_sw: switch to scsi_execute_req_flags()

Hou Tao (1):
      blkcg: fix double free of new_blkg in blkcg_init_queue

Jan Kara (5):
      block: Unhash block device inodes on gendisk destruction
      block: Use pointer to backing_dev_info from request_queue
      block: Dynamically allocate and refcount backing_dev_info
      block: Make blk_get_backing_dev_info() safe without open bdev
      block: Get rid of blk_get_backing_dev_info()

Javier González (3):
      lightnvm: Add CRC read error
      lightnvm: free properly on target creation error
      lightnvm: allow targets to use sysfs

Jens Axboe (43):
      blk-mq: make mq_ops a const pointer
      block: move existing elevator ops to union
      block: move rq_ioc() to blk.h
      blk-mq: un-export blk_mq_free_hctx_request()
      blk-mq: export some helpers we need to the scheduling framework
      blk-mq-tag: cleanup the normal/reserved tag allocation
      blk-mq: abstract out helpers for allocating/freeing tag maps
      blk-mq: add support for carrying internal tag information in blk_qc_t
      blk-mq: split tag ->rqs[] into two
      blk-mq-sched: add framework for MQ capable IO schedulers
      mq-deadline: add blk-mq adaptation of the deadline IO scheduler
      blk-mq-sched: allow setting of default IO scheduler
      blk-cgroup: ensure that we clear the stop bit on quiesced queues
      blk-cgroup: don't quiesce the queue on policy activate/deactivate
      elevator: fix unnecessary put of elevator in failure case
      blk-mq-tag: remove redundant check for 'data->hctx' being non-NULL
      blk-mq: stop hardware queue in blk_mq_delay_queue()
      blk-mq: allow resize of scheduler requests
      blk-mq: only apply active queue tag throttling for driver tags
      blk-mq: don't lose flags passed in to blk_mq_alloc_request()
      blk-mq-sched: check for successful allocation before assigning tag
      blk-mq: improve scheduler queue sync/async running
      blk-mq: fix potential race in queue restart and driver tag allocation
      blk-mq: release driver tag on a requeue event
      blk-mq-sched: fix starvation for multiple hardware queues and shared tags
      blk-mq-sched: change ->dispatch_requests() to ->dispatch_request()
      blk-mq-sched: add flush insertion into blk_mq_sched_insert_request()
      block: cleanup remaining manual checks for PREFLUSH|FUA
      Merge branch 'for-4.11/block' into for-4.11/rq-refactor
      nvme: fix compilation of scsi component
      block: move internal_tag to same cache line as tag
      blk-mq: don't fail allocating driver tag for stopped hw queue
      zram_drv: update for backing dev info changes
      blk-merge: return the merged request
      block: free merged request in the caller
      elevator: fix loading wrong elevator type for blk-mq devices
      blk-mq: have blk_mq_dispatch_rq_list() return if we queued IO or not
      blk-mq-sched: don't add flushes to the head of requeue queue
      blk-mq: don't special case flush inserts for blk-mq-sched
      blk-mq-sched: ask scheduler for work, if we failed dispatching leftovers
      block: don't defer flushes on blk-mq + scheduling
      Merge branch 'for-4.11/block' into for-4.11/linus-merge
      Merge branch 'for-4.11/next' into for-4.11/linus-merge

Josef Bacik (4):
      MAINTAINERS: Update maintainer entry for NBD
      nbd: use our own workqueue for recv threads
      nbd: use an idr to keep track of nbd devices
      block: set make_request_fn manually in blk_mq_update_nr_hw_queues

Kees Cook (1):
      cdrom: Make device operations read-only

Keith Busch (1):
      nvme/pci: Don't mark IOD as aborted if abort wasn't sent

Markus Elfring (2):
      blk-throttle: Adjust two function calls together with a variable assignment
      cfq-iosched: Adjust one function call together with a variable assignment

Matias Bjørling (11):
      lightnvm: merge gennvm with core
      lightnvm: collapse nvm_erase_ppa and nvm_erase_blk
      lightnvm: remove nvm_submit_ppa* functions
      lightnvm: remove nvm_get_bb_tbl and nvm_set_bb_tbl
      lightnvm: make nvm_map_* return void
      lightnvm: cleanup nvm transformation functions
      lightnvm: reduce number of nvm_id groups to one
      lightnvm: add ioctls for vector I/Os
      lightnvm: use end_io callback instead of instance
      lightnvm: fix off-by-one error on target initialization
      lightnvm: set default lun range when no luns are specified

Ming Lei (2):
      block: relax check on sg gap
      block/loop: fix race between I/O and set_status

Omar Sandoval (22):
      sbitmap: use smp_mb__after_atomic() in sbq_wake_up()
      sbitmap: fix wakeup hang after sbq resize
      blk-mq: create debugfs directory tree
      blk-mq: add hctx->{state,flags} to debugfs
      blk-mq: move hctx->dispatch and ctx->rq_list from sysfs to debugfs
      blk-mq: add extra request information to debugfs
      sbitmap: add helpers for dumping to a seq_file
      blk-mq: export software queue pending map to debugfs
      blk-mq: move tags and sched_tags info from sysfs to debugfs
      blk-mq: add tags and sched_tags bitmaps to debugfs
      blk-mq: move hctx io_poll, stats, and dispatched from sysfs to debugfs
      blk-mq: move hctx and ctx counters from sysfs to debugfs
      blk-mq: fix debugfs compilation issues
      debugfs: add debugfs_lookup()
      block: fix debugfs config conditional in struct request_queue
      blktrace: make do_blk_trace_setup() static
      block: use same block debugfs directory for blk-mq and blktrace
      blk-mq: move debugfs_remove() of disk dir to blk_release_queue()
      blktrace: use existing disk debugfs directory
      blk-mq-sched: bypass the scheduler for flushes entirely
      blk-mq-sched: (un)register elevator when (un)registering queue
      blk-mq-sched: don't hold queue_lock when calling exit_icq

Paolo Valente (1):
      blk-mq: pass bio to blk_mq_sched_get_rq_priv

Scott Bauer (8):
      Include: Uapi: Add user ABI for Sed/Opal
      block: Add Sed-opal library
      nvme: Add Support for Opal: Unlock from S3 & Opal Allocation/Ioctls
      Fix SED-OPAL UAPI structs to prevent 32/64 bit size differences.
      uapi: sed-opal fix IOW for activate lsp to use correct struct
      Move stack parameters for sed_ioctl to prevent oversized stack with CONFIG_KASAN
      Maintainers: Modify SED list from nvme to block
      nvme: Check for Security send/recv support before issuing commands.

Tahsin Erdogan (2):
      block: queue lock must be acquired when iterating over rls
      block: do not allow updates through sysfs until registration completes

Tejun Heo (1):
      block: fix double-free in the failure path of cgwb_bdi_init()

Vlastimil Babka (1):
      floppy: replace wrong kmalloc(GFP_USER) with GFP_KERNEL

 Documentation/cdrom/cdrom-standard.tex      |    9 +-
 MAINTAINERS                                 |   15 +-
 block/Kconfig                               |   24 +
 block/Kconfig.iosched                       |   50 +
 block/Makefile                              |   10 +-
 block/bio.c                                 |   16 +-
 block/blk-cgroup.c                          |   32 +-
 block/blk-core.c                            |  355 ++--
 block/blk-exec.c                            |   22 +-
 block/blk-flush.c                           |   26 +-
 block/blk-integrity.c                       |    4 +-
 block/blk-ioc.c                             |   34 +-
 block/blk-map.c                             |   13 +-
 block/blk-merge.c                           |   62 +-
 block/blk-mq-debugfs.c                      |  772 +++++++++
 block/blk-mq-sched.c                        |  515 ++++++
 block/blk-mq-sched.h                        |  143 ++
 block/blk-mq-sysfs.c                        |  235 +--
 block/blk-mq-tag.c                          |  190 +-
 block/blk-mq-tag.h                          |   10 +-
 block/blk-mq.c                              |  590 ++++---
 block/blk-mq.h                              |   72 +
 block/blk-settings.c                        |   22 +-
 block/blk-sysfs.c                           |   68 +-
 block/blk-tag.c                             |    1 +
 block/blk-throttle.c                        |    6 +-
 block/blk-wbt.c                             |    8 +-
 block/blk.h                                 |   47 +-
 block/bsg-lib.c                             |   49 +-
 block/bsg.c                                 |   64 +-
 block/cfq-iosched.c                         |   14 +-
 block/compat_ioctl.c                        |    7 +-
 block/deadline-iosched.c                    |   14 +-
 block/elevator.c                            |  267 ++-
 block/genhd.c                               |   25 +-
 block/ioctl.c                               |    7 +-
 block/mq-deadline.c                         |  556 ++++++
 block/noop-iosched.c                        |    2 +-
 block/opal_proto.h                          |  452 +++++
 block/partitions/efi.c                      |   17 +-
 block/scsi_ioctl.c                          |   83 +-
 block/sed-opal.c                            | 2488 +++++++++++++++++++++++++++
 drivers/ata/libata-scsi.c                   |    4 +-
 drivers/block/Kconfig                       |   13 +
 drivers/block/aoe/aoeblk.c                  |    4 +-
 drivers/block/cciss.c                       |  131 +-
 drivers/block/cciss.h                       |    6 +-
 drivers/block/drbd/drbd_main.c              |    6 +-
 drivers/block/drbd/drbd_nl.c                |   12 +-
 drivers/block/drbd/drbd_proc.c              |    2 +-
 drivers/block/drbd/drbd_req.c               |    2 +-
 drivers/block/floppy.c                      |    6 +-
 drivers/block/hd.c                          |   45 +-
 drivers/block/loop.c                        |   17 +-
 drivers/block/mg_disk.c                     |   31 +-
 drivers/block/nbd.c                         |  258 +--
 drivers/block/null_blk.c                    |   10 +-
 drivers/block/osdblk.c                      |    6 -
 drivers/block/paride/Kconfig                |    1 +
 drivers/block/paride/pcd.c                  |    2 +-
 drivers/block/paride/pd.c                   |   15 +-
 drivers/block/pktcdvd.c                     |   12 +-
 drivers/block/ps3disk.c                     |   15 +-
 drivers/block/rbd.c                         |   24 +-
 drivers/block/skd_main.c                    |   15 +-
 drivers/block/sx8.c                         |    4 +-
 drivers/block/virtio_blk.c                  |  205 ++-
 drivers/block/xen-blkfront.c                |    2 +-
 drivers/block/xsysace.c                     |    2 +-
 drivers/block/zram/zram_drv.c               |    2 +-
 drivers/cdrom/cdrom.c                       |   92 +-
 drivers/cdrom/gdrom.c                       |   41 +-
 drivers/ide/Kconfig                         |    1 +
 drivers/ide/ide-atapi.c                     |   78 +-
 drivers/ide/ide-cd.c                        |  192 ++-
 drivers/ide/ide-cd_ioctl.c                  |    5 +-
 drivers/ide/ide-cd_verbose.c                |    6 +-
 drivers/ide/ide-devsets.c                   |   13 +-
 drivers/ide/ide-disk.c                      |   12 +-
 drivers/ide/ide-eh.c                        |    8 +-
 drivers/ide/ide-floppy.c                    |   37 +-
 drivers/ide/ide-io.c                        |   13 +-
 drivers/ide/ide-ioctls.c                    |   14 +-
 drivers/ide/ide-park.c                      |   20 +-
 drivers/ide/ide-pm.c                        |   20 +-
 drivers/ide/ide-probe.c                     |   36 +-
 drivers/ide/ide-tape.c                      |   41 +-
 drivers/ide/ide-taskfile.c                  |    8 +-
 drivers/ide/sis5513.c                       |    2 +-
 drivers/lightnvm/Kconfig                    |    9 -
 drivers/lightnvm/Makefile                   |    3 +-
 drivers/lightnvm/core.c                     | 1027 ++++++-----
 drivers/lightnvm/gennvm.c                   |  657 -------
 drivers/lightnvm/gennvm.h                   |   62 -
 drivers/lightnvm/rrpc.c                     |    7 +-
 drivers/lightnvm/rrpc.h                     |    3 -
 drivers/lightnvm/sysblk.c                   |  733 --------
 drivers/md/bcache/request.c                 |   12 +-
 drivers/md/bcache/super.c                   |    8 +-
 drivers/md/dm-cache-target.c                |   15 +-
 drivers/md/dm-core.h                        |    1 -
 drivers/md/dm-era-target.c                  |    2 +-
 drivers/md/dm-mpath.c                       |  132 +-
 drivers/md/dm-rq.c                          |  268 +--
 drivers/md/dm-rq.h                          |    2 +-
 drivers/md/dm-table.c                       |    2 +-
 drivers/md/dm-target.c                      |    7 -
 drivers/md/dm-thin.c                        |   15 +-
 drivers/md/dm.c                             |   49 +-
 drivers/md/dm.h                             |    3 +-
 drivers/md/linear.c                         |    2 +-
 drivers/md/md.c                             |    6 +-
 drivers/md/multipath.c                      |    2 +-
 drivers/md/raid0.c                          |    6 +-
 drivers/md/raid1.c                          |   11 +-
 drivers/md/raid10.c                         |   10 +-
 drivers/md/raid5.c                          |   12 +-
 drivers/memstick/core/ms_block.c            |   11 -
 drivers/memstick/core/mspro_block.c         |   13 -
 drivers/message/fusion/mptsas.c             |    8 +-
 drivers/mmc/core/queue.c                    |    9 -
 drivers/mtd/mtd_blkdevs.c                   |   13 +-
 drivers/mtd/ubi/block.c                     |   15 +-
 drivers/nvme/host/core.c                    |   86 +-
 drivers/nvme/host/fc.c                      |    2 +-
 drivers/nvme/host/lightnvm.c                |  315 +++-
 drivers/nvme/host/nvme.h                    |   13 +
 drivers/nvme/host/pci.c                     |   19 +-
 drivers/nvme/host/rdma.c                    |    6 +-
 drivers/nvme/host/scsi.c                    |    7 +-
 drivers/nvme/target/loop.c                  |    2 +-
 drivers/s390/block/scm_blk.c                |    7 -
 drivers/scsi/Kconfig                        |    1 +
 drivers/scsi/device_handler/scsi_dh_emc.c   |  247 +--
 drivers/scsi/device_handler/scsi_dh_hp_sw.c |  222 +--
 drivers/scsi/device_handler/scsi_dh_rdac.c  |  174 +-
 drivers/scsi/hosts.c                        |   24 +-
 drivers/scsi/hpsa.c                         |    4 +-
 drivers/scsi/libfc/fc_lport.c               |    2 +-
 drivers/scsi/libsas/sas_expander.c          |    8 +-
 drivers/scsi/libsas/sas_host_smp.c          |   38 +-
 drivers/scsi/mpt3sas/mpt3sas_scsih.c        |    2 +-
 drivers/scsi/mpt3sas/mpt3sas_transport.c    |    8 +-
 drivers/scsi/osd/osd_initiator.c            |   22 +-
 drivers/scsi/osst.c                         |   18 +-
 drivers/scsi/qla2xxx/qla_bsg.c              |    2 +-
 drivers/scsi/qla2xxx/qla_isr.c              |    6 +-
 drivers/scsi/qla2xxx/qla_mr.c               |    2 +-
 drivers/scsi/scsi.c                         |  354 +---
 drivers/scsi/scsi_error.c                   |   43 +-
 drivers/scsi/scsi_lib.c                     |  264 ++-
 drivers/scsi/scsi_priv.h                    |    5 +-
 drivers/scsi/scsi_transport_fc.c            |   34 +-
 drivers/scsi/scsi_transport_iscsi.c         |   14 +-
 drivers/scsi/scsi_transport_sas.c           |    5 +
 drivers/scsi/sd.c                           |   48 +-
 drivers/scsi/sg.c                           |   33 +-
 drivers/scsi/smartpqi/smartpqi_init.c       |    2 +-
 drivers/scsi/sr.c                           |   11 +-
 drivers/scsi/st.c                           |   28 +-
 drivers/scsi/sun3_scsi.c                    |    2 +-
 drivers/target/Kconfig                      |    1 +
 drivers/target/target_core_pscsi.c          |   14 +-
 fs/block_dev.c                              |   22 +
 fs/btrfs/disk-io.c                          |    2 +-
 fs/btrfs/volumes.c                          |    2 +-
 fs/debugfs/inode.c                          |   36 +
 fs/gfs2/ops_fstype.c                        |    2 +-
 fs/nfsd/Kconfig                             |    1 +
 fs/nfsd/blocklayout.c                       |   19 +-
 fs/nilfs2/super.c                           |    2 +-
 fs/super.c                                  |    2 +-
 fs/xfs/xfs_buf.c                            |    3 +-
 fs/xfs/xfs_buf.h                            |    1 -
 include/linux/backing-dev-defs.h            |    2 +
 include/linux/backing-dev.h                 |   12 +-
 include/linux/blk-mq.h                      |    9 +-
 include/linux/blk_types.h                   |   38 +-
 include/linux/blkdev.h                      |  124 +-
 include/linux/blktrace_api.h                |   18 +-
 include/linux/bsg-lib.h                     |    5 +-
 include/linux/cdrom.h                       |    5 +-
 include/linux/debugfs.h                     |    8 +
 include/linux/device-mapper.h               |    3 -
 include/linux/elevator.h                    |   63 +-
 include/linux/fs.h                          |    2 +
 include/linux/genhd.h                       |    8 +
 include/linux/ide.h                         |   58 +-
 include/linux/lightnvm.h                    |  138 +-
 include/linux/nvme.h                        |    3 +
 include/linux/sbitmap.h                     |   30 +
 include/linux/sed-opal.h                    |   70 +
 include/scsi/scsi_cmnd.h                    |    4 +-
 include/scsi/scsi_host.h                    |    5 -
 include/scsi/scsi_request.h                 |   30 +
 include/scsi/scsi_transport.h               |    2 +
 include/trace/events/block.h                |   27 +-
 include/uapi/linux/lightnvm.h               |   50 +
 include/uapi/linux/sed-opal.h               |  119 ++
 kernel/trace/blktrace.c                     |   78 +-
 lib/sbitmap.c                               |  139 +-
 mm/backing-dev.c                            |   43 +-
 mm/page-writeback.c                         |    4 +-
 203 files changed, 9729 insertions(+), 5577 deletions(-)
 create mode 100644 block/blk-mq-debugfs.c
 create mode 100644 block/blk-mq-sched.c
 create mode 100644 block/blk-mq-sched.h
 create mode 100644 block/mq-deadline.c
 create mode 100644 block/opal_proto.h
 create mode 100644 block/sed-opal.c
 delete mode 100644 drivers/lightnvm/gennvm.c
 delete mode 100644 drivers/lightnvm/gennvm.h
 delete mode 100644 drivers/lightnvm/sysblk.c
 create mode 100644 include/linux/sed-opal.h
 create mode 100644 include/scsi/scsi_request.h
 create mode 100644 include/uapi/linux/sed-opal.h

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-20  0:10 [GIT PULL] Block pull request for- 4.11-rc1 Jens Axboe
@ 2017-02-20  1:09   ` Bart Van Assche
  2017-02-21 19:11 ` Linus Torvalds
  1 sibling, 0 replies; 43+ messages in thread
From: Bart Van Assche @ 2017-02-20  1:09 UTC (permalink / raw)
  To: Jens Axboe, Linus Torvalds
  Cc: linux-block, linux-kernel, Christoph Hellwig, Mike Snitzer

On 02/19/2017 04:11 PM, Jens Axboe wrote:=0A=
> - Removal of the BLOCK_PC support in struct request, and refactoring of=
=0A=
>   carrying SCSI payloads in the block layer. This cleans up the code=0A=
>   nicely, and enables us to kill the SCSI specific parts of struct=0A=
>   request, shrinking it down nicely. From Christoph mainly, with help=0A=
>   from Hannes.=0A=
=0A=
Hello Jens, Christoph and Mike,=0A=
=0A=
Is anyone working on a fix for the regression introduced by the BLOCK_PC re=
moval=0A=
changes (general protection fault) that I had reported three weeks ago? See=
 also=0A=
https://www.spinics.net/lists/raid/msg55494.html=0A=
=0A=
Thanks,=0A=
=0A=
Bart.=

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [GIT PULL] Block pull request for- 4.11-rc1
@ 2017-02-20  1:09   ` Bart Van Assche
  0 siblings, 0 replies; 43+ messages in thread
From: Bart Van Assche @ 2017-02-20  1:09 UTC (permalink / raw)
  To: Jens Axboe, Linus Torvalds
  Cc: linux-block, linux-kernel, Christoph Hellwig, Mike Snitzer

On 02/19/2017 04:11 PM, Jens Axboe wrote:
> - Removal of the BLOCK_PC support in struct request, and refactoring of
>   carrying SCSI payloads in the block layer. This cleans up the code
>   nicely, and enables us to kill the SCSI specific parts of struct
>   request, shrinking it down nicely. From Christoph mainly, with help
>   from Hannes.

Hello Jens, Christoph and Mike,

Is anyone working on a fix for the regression introduced by the BLOCK_PC removal
changes (general protection fault) that I had reported three weeks ago? See also
https://www.spinics.net/lists/raid/msg55494.html

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-20  1:09   ` Bart Van Assche
  (?)
@ 2017-02-20  1:15   ` Jens Axboe
  2017-02-20  2:12       ` James Bottomley
  2017-02-20  7:35     ` Christoph Hellwig
  -1 siblings, 2 replies; 43+ messages in thread
From: Jens Axboe @ 2017-02-20  1:15 UTC (permalink / raw)
  To: Bart Van Assche, Linus Torvalds
  Cc: linux-block, linux-kernel, Christoph Hellwig, Mike Snitzer

On 02/19/2017 06:09 PM, Bart Van Assche wrote:
> On 02/19/2017 04:11 PM, Jens Axboe wrote:
>> - Removal of the BLOCK_PC support in struct request, and refactoring of
>>   carrying SCSI payloads in the block layer. This cleans up the code
>>   nicely, and enables us to kill the SCSI specific parts of struct
>>   request, shrinking it down nicely. From Christoph mainly, with help
>>   from Hannes.
> 
> Hello Jens, Christoph and Mike,
> 
> Is anyone working on a fix for the regression introduced by the BLOCK_PC removal
> changes (general protection fault) that I had reported three weeks ago? See also
> https://www.spinics.net/lists/raid/msg55494.html

I don't think that's a regression in this series, it just triggers more easily
with this series. The BLOCK_PC removal fixes aren't touching device life times
at all.

That said, we will look into this again, of course. Christoph, any idea?

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-20  1:15   ` Jens Axboe
@ 2017-02-20  2:12       ` James Bottomley
  2017-02-20  7:35     ` Christoph Hellwig
  1 sibling, 0 replies; 43+ messages in thread
From: James Bottomley @ 2017-02-20  2:12 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche, Linus Torvalds
  Cc: linux-block, linux-kernel, Christoph Hellwig, Mike Snitzer

On Sun, 2017-02-19 at 18:15 -0700, Jens Axboe wrote:
> On 02/19/2017 06:09 PM, Bart Van Assche wrote:
> > On 02/19/2017 04:11 PM, Jens Axboe wrote:
> > > - Removal of the BLOCK_PC support in struct request, and
> > > refactoring of
> > >   carrying SCSI payloads in the block layer. This cleans up the
> > > code
> > >   nicely, and enables us to kill the SCSI specific parts of
> > > struct
> > >   request, shrinking it down nicely. From Christoph mainly, with
> > > help
> > >   from Hannes.
> > 
> > Hello Jens, Christoph and Mike,
> > 
> > Is anyone working on a fix for the regression introduced by the 
> > BLOCK_PC removal changes (general protection fault) that I had 
> > reported three weeks ago? See also
> > https://www.spinics.net/lists/raid/msg55494.html
> 
> I don't think that's a regression in this series, it just triggers 
> more easily with this series. The BLOCK_PC removal fixes aren't 
> touching device life times at all.
> 
> That said, we will look into this again, of course. Christoph, any
> idea?

We could do with tracing the bdi removal failure issue fingered both by
the block xfstests and Omar.  It's something to do with this set of
commits

> - Fixes for duplicate bdi registrations and bdi/queue life time
>   problems from Jan and Dan.

But no-one has actually root caused it yet.

James

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
@ 2017-02-20  2:12       ` James Bottomley
  0 siblings, 0 replies; 43+ messages in thread
From: James Bottomley @ 2017-02-20  2:12 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche, Linus Torvalds
  Cc: linux-block, linux-kernel, Christoph Hellwig, Mike Snitzer

On Sun, 2017-02-19 at 18:15 -0700, Jens Axboe wrote:
> On 02/19/2017 06:09 PM, Bart Van Assche wrote:
> > On 02/19/2017 04:11 PM, Jens Axboe wrote:
> > > - Removal of the BLOCK_PC support in struct request, and
> > > refactoring of
> > >   carrying SCSI payloads in the block layer. This cleans up the
> > > code
> > >   nicely, and enables us to kill the SCSI specific parts of
> > > struct
> > >   request, shrinking it down nicely. From Christoph mainly, with
> > > help
> > >   from Hannes.
> > 
> > Hello Jens, Christoph and Mike,
> > 
> > Is anyone working on a fix for the regression introduced by the 
> > BLOCK_PC removal changes (general protection fault) that I had 
> > reported three weeks ago? See also
> > https://www.spinics.net/lists/raid/msg55494.html
> 
> I don't think that's a regression in this series, it just triggers 
> more easily with this series. The BLOCK_PC removal fixes aren't 
> touching device life times at all.
> 
> That said, we will look into this again, of course. Christoph, any
> idea?

We could do with tracing the bdi removal failure issue fingered both by
the block xfstests and Omar.  It's something to do with this set of
commits

> - Fixes for duplicate bdi registrations and bdi/queue life time
>   problems from Jan and Dan.

But no-one has actually root caused it yet.

James

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-20  2:12       ` James Bottomley
  (?)
@ 2017-02-20  2:59       ` Jens Axboe
  2017-02-20  3:02         ` Jens Axboe
  -1 siblings, 1 reply; 43+ messages in thread
From: Jens Axboe @ 2017-02-20  2:59 UTC (permalink / raw)
  To: James Bottomley, Bart Van Assche, Linus Torvalds
  Cc: linux-block, linux-kernel, Christoph Hellwig, Mike Snitzer

On 02/19/2017 07:12 PM, James Bottomley wrote:
> On Sun, 2017-02-19 at 18:15 -0700, Jens Axboe wrote:
>> On 02/19/2017 06:09 PM, Bart Van Assche wrote:
>>> On 02/19/2017 04:11 PM, Jens Axboe wrote:
>>>> - Removal of the BLOCK_PC support in struct request, and
>>>> refactoring of
>>>>   carrying SCSI payloads in the block layer. This cleans up the
>>>> code
>>>>   nicely, and enables us to kill the SCSI specific parts of
>>>> struct
>>>>   request, shrinking it down nicely. From Christoph mainly, with
>>>> help
>>>>   from Hannes.
>>>
>>> Hello Jens, Christoph and Mike,
>>>
>>> Is anyone working on a fix for the regression introduced by the 
>>> BLOCK_PC removal changes (general protection fault) that I had 
>>> reported three weeks ago? See also
>>> https://www.spinics.net/lists/raid/msg55494.html
>>
>> I don't think that's a regression in this series, it just triggers 
>> more easily with this series. The BLOCK_PC removal fixes aren't 
>> touching device life times at all.
>>
>> That said, we will look into this again, of course. Christoph, any
>> idea?
> 
> We could do with tracing the bdi removal failure issue fingered both by
> the block xfstests and Omar.  It's something to do with this set of
> commits
> 
>> - Fixes for duplicate bdi registrations and bdi/queue life time
>>   problems from Jan and Dan.
> 
> But no-one has actually root caused it yet.

The above set from Jan and Dan fixed one set of issues around this, and
the reproducer test case was happy as well. But we've recently found
that there are still corner cases that aren't happy, Omar reported that
separately on Friday. So there will be a followup set for that,
hopefully intersecting with the issue that Bart reported.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-20  2:59       ` Jens Axboe
@ 2017-02-20  3:02         ` Jens Axboe
  0 siblings, 0 replies; 43+ messages in thread
From: Jens Axboe @ 2017-02-20  3:02 UTC (permalink / raw)
  To: James Bottomley, Bart Van Assche, Linus Torvalds
  Cc: linux-block, linux-kernel, Christoph Hellwig, Mike Snitzer

On 02/19/2017 07:59 PM, Jens Axboe wrote:
> On 02/19/2017 07:12 PM, James Bottomley wrote:
>> On Sun, 2017-02-19 at 18:15 -0700, Jens Axboe wrote:
>>> On 02/19/2017 06:09 PM, Bart Van Assche wrote:
>>>> On 02/19/2017 04:11 PM, Jens Axboe wrote:
>>>>> - Removal of the BLOCK_PC support in struct request, and
>>>>> refactoring of
>>>>>   carrying SCSI payloads in the block layer. This cleans up the
>>>>> code
>>>>>   nicely, and enables us to kill the SCSI specific parts of
>>>>> struct
>>>>>   request, shrinking it down nicely. From Christoph mainly, with
>>>>> help
>>>>>   from Hannes.
>>>>
>>>> Hello Jens, Christoph and Mike,
>>>>
>>>> Is anyone working on a fix for the regression introduced by the 
>>>> BLOCK_PC removal changes (general protection fault) that I had 
>>>> reported three weeks ago? See also
>>>> https://www.spinics.net/lists/raid/msg55494.html
>>>
>>> I don't think that's a regression in this series, it just triggers 
>>> more easily with this series. The BLOCK_PC removal fixes aren't 
>>> touching device life times at all.
>>>
>>> That said, we will look into this again, of course. Christoph, any
>>> idea?
>>
>> We could do with tracing the bdi removal failure issue fingered both by
>> the block xfstests and Omar.  It's something to do with this set of
>> commits
>>
>>> - Fixes for duplicate bdi registrations and bdi/queue life time
>>>   problems from Jan and Dan.
>>
>> But no-one has actually root caused it yet.
> 
> The above set from Jan and Dan fixed one set of issues around this, and
> the reproducer test case was happy as well. But we've recently found
> that there are still corner cases that aren't happy, Omar reported that
> separately on Friday. So there will be a followup set for that,
> hopefully intersecting with the issue that Bart reported.

Forgot to mention, that these cases exist in 4.0 and 4.6 as well, so
neither are a new problem with this series. The fixes from Jan and
Dan didn't close all of them.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-20  1:15   ` Jens Axboe
  2017-02-20  2:12       ` James Bottomley
@ 2017-02-20  7:35     ` Christoph Hellwig
  2017-02-20 16:16         ` Bart Van Assche
  1 sibling, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2017-02-20  7:35 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Bart Van Assche, Linus Torvalds, linux-block, linux-kernel,
	Christoph Hellwig, Mike Snitzer

On Sun, Feb 19, 2017 at 06:15:41PM -0700, Jens Axboe wrote:
> I don't think that's a regression in this series, it just triggers more easily
> with this series. The BLOCK_PC removal fixes aren't touching device life times
> at all.

Yes.

> That said, we will look into this again, of course. Christoph, any idea?

No idea really - this seems so far away from the code touched, and there
are no obvious signs for a memory scamble from another object touched
that I think if it really bisects down to that issue it must be a timing
issue.

But reading Bart's message again:  Did you actually bisect it down
to the is commit?  Or just test the whole tree?  Between the 4.10-rc5
merge and all the block tree there might a few more likely suspects
like the scsi bdi lifetime fixes that James mentioned.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-20  7:35     ` Christoph Hellwig
@ 2017-02-20 16:16         ` Bart Van Assche
  0 siblings, 0 replies; 43+ messages in thread
From: Bart Van Assche @ 2017-02-20 16:16 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Linus Torvalds, linux-block, linux-kernel, Mike Snitzer

On 02/19/2017 11:35 PM, Christoph Hellwig wrote:=0A=
> On Sun, Feb 19, 2017 at 06:15:41PM -0700, Jens Axboe wrote:=0A=
>> That said, we will look into this again, of course. Christoph, any idea?=
=0A=
> =0A=
> No idea really - this seems so far away from the code touched, and there=
=0A=
> are no obvious signs for a memory scamble from another object touched=0A=
> that I think if it really bisects down to that issue it must be a timing=
=0A=
> issue.=0A=
> =0A=
> But reading Bart's message again:  Did you actually bisect it down=0A=
> to the is commit?  Or just test the whole tree?  Between the 4.10-rc5=0A=
> merge and all the block tree there might a few more likely suspects=0A=
> like the scsi bdi lifetime fixes that James mentioned.=0A=
=0A=
Hello Christoph,=0A=
=0A=
As far as I know Jens does not rebase his trees so we can use the commit=0A=
date to check which patch went in when. From the first of Jan's bdi patches=
:=0A=
=0A=
CommitDate: Thu Feb 2 08:18:41 2017 -0700=0A=
=0A=
So the bdi patches went in several days after I reported the general protec=
tion=0A=
fault issue.=0A=
=0A=
In an e-mail of January 30th I wrote the following: "Running the srp-test=
=0A=
software against kernel 4.9.6 and kernel 4.10-rc5 went fine.  With your=0A=
for-4.11/block branch (commit 400f73b23f457a) however I just ran into=0A=
the following warning: [ ... ]" That means that I did not hit the crash wit=
h=0A=
Jens' for-4.11/block branch but only with the for-next branch. The patches=
=0A=
on Jens' for-next branch after that commit that were applied before I ran=
=0A=
my test are:=0A=
=0A=
$ PAGER=3D git log --format=3Doneline 400f73b23f457a..fb045ca25cc7 block dr=
ivers/md/dm{,-mpath,-table}.[ch]=0A=
fb045ca25cc7b6d46368ab8221774489c2a81648 block: don't assign cmd_flags in _=
_blk_rq_prep_clone=0A=
82ed4db499b8598f16f8871261bff088d6b0597f block: split scsi_request out of s=
truct request=0A=
8ae94eb65be9425af4d57a4f4cfebfdf03081e93 block/bsg: move queue creation int=
o bsg_setup_queue=0A=
eb8db831be80692bf4bda3dfc55001daf64ec299 dm: always defer request allocatio=
n to the owner of the request_queue=0A=
6d247d7f71d1fa4b66a5f4da7b1daa21510d529b block: allow specifying size for e=
xtra command data=0A=
5ea708d15a928f7a479987704203616d3274c03b block: simplify blk_init_allocated=
_queue=0A=
e6f7f93d58de74700f83dd0547dd4306248a093d block: fix elevator init check=0A=
f924ba70c1b12706c6679d793202e8f4c125f7ae Merge branch 'for-4.11/block' into=
 for-4.11/rq-refactor=0A=
88a7503376f4f3bf303c809d1a389739e1205614 blk-mq: Remove unused variable=0A=
bef13315e990fd3d3fb4c39013aefd53f06c3657 block: don't try to discard from _=
_blkdev_issue_zeroout=0A=
f99e86485cc32cd16e5cc97f9bb0474f28608d84 block: Rename blk_queue_zone_size =
and bdev_zone_size=0A=
=0A=
Do you see any patch in the above list that does not belong to the "split=
=0A=
scsi passthrough fields out of struct request" series and that could have=
=0A=
caused the reported behavior change?=0A=
=0A=
Thanks,=0A=
=0A=
Bart.=

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [GIT PULL] Block pull request for- 4.11-rc1
@ 2017-02-20 16:16         ` Bart Van Assche
  0 siblings, 0 replies; 43+ messages in thread
From: Bart Van Assche @ 2017-02-20 16:16 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Linus Torvalds, linux-block, linux-kernel, Mike Snitzer

On 02/19/2017 11:35 PM, Christoph Hellwig wrote:
> On Sun, Feb 19, 2017 at 06:15:41PM -0700, Jens Axboe wrote:
>> That said, we will look into this again, of course. Christoph, any idea?
> 
> No idea really - this seems so far away from the code touched, and there
> are no obvious signs for a memory scamble from another object touched
> that I think if it really bisects down to that issue it must be a timing
> issue.
> 
> But reading Bart's message again:  Did you actually bisect it down
> to the is commit?  Or just test the whole tree?  Between the 4.10-rc5
> merge and all the block tree there might a few more likely suspects
> like the scsi bdi lifetime fixes that James mentioned.

Hello Christoph,

As far as I know Jens does not rebase his trees so we can use the commit
date to check which patch went in when. From the first of Jan's bdi patches:

CommitDate: Thu Feb 2 08:18:41 2017 -0700

So the bdi patches went in several days after I reported the general protection
fault issue.

In an e-mail of January 30th I wrote the following: "Running the srp-test
software against kernel 4.9.6 and kernel 4.10-rc5 went fine.  With your
for-4.11/block branch (commit 400f73b23f457a) however I just ran into
the following warning: [ ... ]" That means that I did not hit the crash with
Jens' for-4.11/block branch but only with the for-next branch. The patches
on Jens' for-next branch after that commit that were applied before I ran
my test are:

$ PAGER= git log --format=oneline 400f73b23f457a..fb045ca25cc7 block drivers/md/dm{,-mpath,-table}.[ch]
fb045ca25cc7b6d46368ab8221774489c2a81648 block: don't assign cmd_flags in __blk_rq_prep_clone
82ed4db499b8598f16f8871261bff088d6b0597f block: split scsi_request out of struct request
8ae94eb65be9425af4d57a4f4cfebfdf03081e93 block/bsg: move queue creation into bsg_setup_queue
eb8db831be80692bf4bda3dfc55001daf64ec299 dm: always defer request allocation to the owner of the request_queue
6d247d7f71d1fa4b66a5f4da7b1daa21510d529b block: allow specifying size for extra command data
5ea708d15a928f7a479987704203616d3274c03b block: simplify blk_init_allocated_queue
e6f7f93d58de74700f83dd0547dd4306248a093d block: fix elevator init check
f924ba70c1b12706c6679d793202e8f4c125f7ae Merge branch 'for-4.11/block' into for-4.11/rq-refactor
88a7503376f4f3bf303c809d1a389739e1205614 blk-mq: Remove unused variable
bef13315e990fd3d3fb4c39013aefd53f06c3657 block: don't try to discard from __blkdev_issue_zeroout
f99e86485cc32cd16e5cc97f9bb0474f28608d84 block: Rename blk_queue_zone_size and bdev_zone_size

Do you see any patch in the above list that does not belong to the "split
scsi passthrough fields out of struct request" series and that could have
caused the reported behavior change?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-20 16:16         ` Bart Van Assche
  (?)
@ 2017-02-20 16:32         ` Jens Axboe
  2017-02-21  1:18             ` Bart Van Assche
  2017-02-24 17:39             ` Bart Van Assche
  -1 siblings, 2 replies; 43+ messages in thread
From: Jens Axboe @ 2017-02-20 16:32 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig
  Cc: Linus Torvalds, linux-block, linux-kernel, Mike Snitzer

On 02/20/2017 09:16 AM, Bart Van Assche wrote:
> On 02/19/2017 11:35 PM, Christoph Hellwig wrote:
>> On Sun, Feb 19, 2017 at 06:15:41PM -0700, Jens Axboe wrote:
>>> That said, we will look into this again, of course. Christoph, any idea?
>>
>> No idea really - this seems so far away from the code touched, and there
>> are no obvious signs for a memory scamble from another object touched
>> that I think if it really bisects down to that issue it must be a timing
>> issue.
>>
>> But reading Bart's message again:  Did you actually bisect it down
>> to the is commit?  Or just test the whole tree?  Between the 4.10-rc5
>> merge and all the block tree there might a few more likely suspects
>> like the scsi bdi lifetime fixes that James mentioned.
> 
> Hello Christoph,
> 
> As far as I know Jens does not rebase his trees so we can use the commit
> date to check which patch went in when. From the first of Jan's bdi patches:
> 
> CommitDate: Thu Feb 2 08:18:41 2017 -0700
> 
> So the bdi patches went in several days after I reported the general protection
> fault issue.
> 
> In an e-mail of January 30th I wrote the following: "Running the srp-test
> software against kernel 4.9.6 and kernel 4.10-rc5 went fine.  With your
> for-4.11/block branch (commit 400f73b23f457a) however I just ran into
> the following warning: [ ... ]" That means that I did not hit the crash with
> Jens' for-4.11/block branch but only with the for-next branch. The patches
> on Jens' for-next branch after that commit that were applied before I ran
> my test are:
> 
> $ PAGER= git log --format=oneline 400f73b23f457a..fb045ca25cc7 block drivers/md/dm{,-mpath,-table}.[ch]
> fb045ca25cc7b6d46368ab8221774489c2a81648 block: don't assign cmd_flags in __blk_rq_prep_clone
> 82ed4db499b8598f16f8871261bff088d6b0597f block: split scsi_request out of struct request
> 8ae94eb65be9425af4d57a4f4cfebfdf03081e93 block/bsg: move queue creation into bsg_setup_queue
> eb8db831be80692bf4bda3dfc55001daf64ec299 dm: always defer request allocation to the owner of the request_queue
> 6d247d7f71d1fa4b66a5f4da7b1daa21510d529b block: allow specifying size for extra command data
> 5ea708d15a928f7a479987704203616d3274c03b block: simplify blk_init_allocated_queue
> e6f7f93d58de74700f83dd0547dd4306248a093d block: fix elevator init check
> f924ba70c1b12706c6679d793202e8f4c125f7ae Merge branch 'for-4.11/block' into for-4.11/rq-refactor
> 88a7503376f4f3bf303c809d1a389739e1205614 blk-mq: Remove unused variable
> bef13315e990fd3d3fb4c39013aefd53f06c3657 block: don't try to discard from __blkdev_issue_zeroout
> f99e86485cc32cd16e5cc97f9bb0474f28608d84 block: Rename blk_queue_zone_size and bdev_zone_size
> 
> Do you see any patch in the above list that does not belong to the "split
> scsi passthrough fields out of struct request" series and that could have
> caused the reported behavior change?

Bart, since you are the only one that can reproduce this, can you just bisect
your way through that series?

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-20 16:32         ` Jens Axboe
@ 2017-02-21  1:18             ` Bart Van Assche
  2017-02-24 17:39             ` Bart Van Assche
  1 sibling, 0 replies; 43+ messages in thread
From: Bart Van Assche @ 2017-02-21  1:18 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig
  Cc: Linus Torvalds, linux-block, linux-kernel, Mike Snitzer

On 02/20/2017 08:32 AM, Jens Axboe wrote:=0A=
> Bart, since you are the only one that can reproduce this, can you just bi=
sect=0A=
> your way through that series?=0A=
=0A=
Hello Jens,=0A=
=0A=
I will do that as soon as I'm back in the office (later this week).=0A=
=0A=
Bart.=0A=
=0A=
=0A=

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
@ 2017-02-21  1:18             ` Bart Van Assche
  0 siblings, 0 replies; 43+ messages in thread
From: Bart Van Assche @ 2017-02-21  1:18 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig
  Cc: Linus Torvalds, linux-block, linux-kernel, Mike Snitzer

On 02/20/2017 08:32 AM, Jens Axboe wrote:
> Bart, since you are the only one that can reproduce this, can you just bisect
> your way through that series?

Hello Jens,

I will do that as soon as I'm back in the office (later this week).

Bart.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-20  0:10 [GIT PULL] Block pull request for- 4.11-rc1 Jens Axboe
  2017-02-20  1:09   ` Bart Van Assche
@ 2017-02-21 19:11 ` Linus Torvalds
  2017-02-21 19:34   ` Jens Axboe
  2017-02-21 23:02   ` Linus Torvalds
  1 sibling, 2 replies; 43+ messages in thread
From: Linus Torvalds @ 2017-02-21 19:11 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-kernel

On Sun, Feb 19, 2017 at 4:10 PM, Jens Axboe <axboe@kernel.dk> wrote:
>
> Please pull! Either this pre-merged branch:
>
>   git://git.kernel.dk/linux-block.git for-4.11/linus-merge-signed
>
> or
>
>   git://git.kernel.dk/linux-block.git for-4.11/block-signed
>   git://git.kernel.dk/linux-block.git for-4.11/next-signed

So normally I'd merge them separately, but since you didn't actually
give me explanations for what the two branches were (ie "block-signed
does X, next-signed does Y") I didn't feel like I could write a sane
merge message for the two branches - so I took the pre-merged one.

Which does bring me to my next issue: *your* merge messages suck too.
They don't actually talk about what you are merging and why.

A merge is a commit, and needs to have a message, unless it's really
really obvious (and they seldom are - merges are generally a lot less
obvious than most non-merge commits). So just saying

    Merge branch 'for-4.11/block' into for-4.11/linus-merge

    Signed-off-by: Jens Axboe <axboe@fb.com>

is simply not an acceptable merge message. What are you merging, and why?

Please. We've been very good at having good commit messages in the
kernel. Merges need good commit messages too!

                       Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-21 19:11 ` Linus Torvalds
@ 2017-02-21 19:34   ` Jens Axboe
  2017-02-21 23:02   ` Linus Torvalds
  1 sibling, 0 replies; 43+ messages in thread
From: Jens Axboe @ 2017-02-21 19:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-block, linux-kernel

On 02/21/2017 12:11 PM, Linus Torvalds wrote:
> On Sun, Feb 19, 2017 at 4:10 PM, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> Please pull! Either this pre-merged branch:
>>
>>   git://git.kernel.dk/linux-block.git for-4.11/linus-merge-signed
>>
>> or
>>
>>   git://git.kernel.dk/linux-block.git for-4.11/block-signed
>>   git://git.kernel.dk/linux-block.git for-4.11/next-signed
> 
> So normally I'd merge them separately, but since you didn't actually
> give me explanations for what the two branches were (ie "block-signed
> does X, next-signed does Y") I didn't feel like I could write a sane
> merge message for the two branches - so I took the pre-merged one.
> 
> Which does bring me to my next issue: *your* merge messages suck too.
> They don't actually talk about what you are merging and why.
> 
> A merge is a commit, and needs to have a message, unless it's really
> really obvious (and they seldom are - merges are generally a lot less
> obvious than most non-merge commits). So just saying
> 
>     Merge branch 'for-4.11/block' into for-4.11/linus-merge
> 
>     Signed-off-by: Jens Axboe <axboe@fb.com>
> 
> is simply not an acceptable merge message. What are you merging, and why?
> 
> Please. We've been very good at having good commit messages in the
> kernel. Merges need good commit messages too!

You are right, and honestly I don't think I've ever done merge commit
messages for my own merges, I only do it when I merge other peoples
branches. I'll improve on it.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-21 19:11 ` Linus Torvalds
  2017-02-21 19:34   ` Jens Axboe
@ 2017-02-21 23:02   ` Linus Torvalds
  2017-02-21 23:15     ` Jens Axboe
  1 sibling, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2017-02-21 23:02 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-kernel

Hmm. The new config options are incomprehensible and their help
messages don't actually help.

So can you fill in the blanks on what

  Default single-queue blk-mq I/O scheduler
  Default multi-queue blk-mq I/O scheduler

config options mean, and why they default to none?

The config phase of the kernel is one of the worst parts of the whole
project, and adding these kinds of random and incomprehensible config
options does *not* help.

                  Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-21 23:02   ` Linus Torvalds
@ 2017-02-21 23:15     ` Jens Axboe
  2017-02-21 23:23       ` Linus Torvalds
  0 siblings, 1 reply; 43+ messages in thread
From: Jens Axboe @ 2017-02-21 23:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-block, linux-kernel

On 02/21/2017 04:02 PM, Linus Torvalds wrote:
> Hmm. The new config options are incomprehensible and their help
> messages don't actually help.
> 
> So can you fill in the blanks on what
> 
>   Default single-queue blk-mq I/O scheduler
>   Default multi-queue blk-mq I/O scheduler
> 
> config options mean, and why they default to none?
> 
> The config phase of the kernel is one of the worst parts of the whole
> project, and adding these kinds of random and incomprehensible config
> options does *not* help.

I'll try and see if I can come up with some better sounding/reading
explanations.

But under a device managed by blk-mq, that device exposes a number of
hardware queues. For older style devices, that number is typically 1
(single queue). This is true for most SCSI devices that are run by
scsi-mq, which is often hosting rotational storage. Faster devices, like
nvme, exposes a lot more hardware queues (multi-queue). Hence the
distinction between having a scheduler attached for single-queue
devices, and for multi-queue devices. For rotational devices, we'll want
to default to something like mq-deadline, and I actually thought that
was the default already. It should be (see below). For multi-queue
devices, we'll want to initially default to "none", and then later
attach a properly multiqueue scheduler, when we have it (it's still in
development).

"none" just means that we don't have a scheduler attached.

In essence, we want to default to having a sane IO scheduler attached
depending on device class. For single-queue devices, that's deadline for
now. For multi-queue, we'll want to wait until we have something that
scales really well. It's not that easy to present this information in a
user grokkable fashion, since most people would not know the difference
between the two.

I'll send the below as a real patch, and also ponder how we can improve
the Kconfig text.

diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
index 0715ce93daef..f6144c5d7c70 100644
--- a/block/Kconfig.iosched
+++ b/block/Kconfig.iosched
@@ -75,7 +75,7 @@ config MQ_IOSCHED_NONE
 
 choice
 	prompt "Default single-queue blk-mq I/O scheduler"
-	default DEFAULT_SQ_NONE
+	default DEFAULT_SQ_DEADLINE if MQ_IOSCHED_DEADLINE=y
 	help
 	  Select the I/O scheduler which will be used by default for blk-mq
 	  managed block devices with a single queue.

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-21 23:15     ` Jens Axboe
@ 2017-02-21 23:23       ` Linus Torvalds
  2017-02-22 18:14         ` Jens Axboe
  0 siblings, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2017-02-21 23:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-kernel

On Tue, Feb 21, 2017 at 3:15 PM, Jens Axboe <axboe@kernel.dk> wrote:
>
> But under a device managed by blk-mq, that device exposes a number of
> hardware queues. For older style devices, that number is typically 1
> (single queue).

... but why would this ever be different from the normal IO scheduler?

IOW, what makes single-queue mq scheduling so special that

 (a) it needs its own config option

 (b) it is different from just the regular IO scheduler in the first place?

So the whole thing stinks. The fact that it then has an
incomprehensible config option seems to be just gravy on top of the
crap.

> "none" just means that we don't have a scheduler attached.

.. which makes no sense to me in the first place.

People used to try to convince us that doing IO schedulers was a
mistake, because modern disk hardware did a better job than we could
do in software.

Those people were full of crap. The regular IO scheduler used to have
a "NONE" option too. Maybe it even still has one, but only insane
people actually use it.

Why is the MQ stuff magically so different that NONE would make sense at all>?

And equally importantly: why do we _ask_ people these issues? Is this
some kind of sick "cover your ass" thing, where you can say "well, I
asked about it", when inevitably the choice ends up being the wrong
one?

We have too damn many Kconfig options as-is, I'm trying to push back
on them. These two options seem fundamentally broken and stupid.

The "we have no good idea, so let's add a Kconfig option" seems like a
broken excuse for these things existing.

So why ask this question in the first place?

Is there any possible reason why "NONE" is a good option at all? And
if it is the _only_ option (because no other better choice exists), it
damn well shouldn't be a kconfig option!

             Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-21 23:23       ` Linus Torvalds
@ 2017-02-22 18:14         ` Jens Axboe
  2017-02-22 18:26           ` Linus Torvalds
  0 siblings, 1 reply; 43+ messages in thread
From: Jens Axboe @ 2017-02-22 18:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-block, linux-kernel

On 02/21/2017 04:23 PM, Linus Torvalds wrote:
> On Tue, Feb 21, 2017 at 3:15 PM, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> But under a device managed by blk-mq, that device exposes a number of
>> hardware queues. For older style devices, that number is typically 1
>> (single queue).
> 
> ... but why would this ever be different from the normal IO scheduler?

Because we have a different set of schedulers for blk-mq, different
than the legacy path. mq-deadline is a basic port that will work
fine with rotational storage, but it's not going to be a good choice
for NVMe because of scalability issues.

We'll have BFQ on the blk-mq side, catering to the needs of those
folks that currently rely on the richer feature set that CFQ supports.

We've continually been working towards getting rid of the legacy
IO path, and its set of schedulers. So if it's any consolation,
those options will go away in the future.

> IOW, what makes single-queue mq scheduling so special that
> 
>  (a) it needs its own config option
> 
>  (b) it is different from just the regular IO scheduler in the first place?
> 
> So the whole thing stinks. The fact that it then has an
> incomprehensible config option seems to be just gravy on top of the
> crap.

What do you mean by "the regular IO scheduler"? These are different
schedulers.

As explained above, single-queue mq devices generally DO want mq-deadline.
multi-queue mq devices, we don't have a good choice for them right now,
so we retain the current behavior (that we've had since blk-mq was
introduced in 3.13) of NOT doing any IO scheduling for them. If you
do want scheduling for them, set the option, or configure udev to
make the right choice for you.

I agree the wording isn't great, and we can improve that. But I do
think that the current choices make sense.

>> "none" just means that we don't have a scheduler attached.
> 
> .. which makes no sense to me in the first place.
> 
> People used to try to convince us that doing IO schedulers was a
> mistake, because modern disk hardware did a better job than we could
> do in software.
> 
> Those people were full of crap. The regular IO scheduler used to have
> a "NONE" option too. Maybe it even still has one, but only insane
> people actually use it.
> 
> Why is the MQ stuff magically so different that NONE would make sense at all>?

I was never one of those people, and I've always been a strong advocate
for imposing scheduling to keep devices in check. The regular IO scheduler
pool includes "noop", which is probably the one you are thinking of. That
one is a bit different than the new "none" option for blk-mq, in that it
does do insertion sorts and it does do merges. "none" does some merging,
but only where it happens to make sense. There's no insertion sorting.

> And equally importantly: why do we _ask_ people these issues? Is this
> some kind of sick "cover your ass" thing, where you can say "well, I
> asked about it", when inevitably the choice ends up being the wrong
> one?
> 
> We have too damn many Kconfig options as-is, I'm trying to push back
> on them. These two options seem fundamentally broken and stupid.
> 
> The "we have no good idea, so let's add a Kconfig option" seems like a
> broken excuse for these things existing.
> 
> So why ask this question in the first place?
> 
> Is there any possible reason why "NONE" is a good option at all? And
> if it is the _only_ option (because no other better choice exists), it
> damn well shouldn't be a kconfig option!

I'm all for NOT asking questions, and not providing tunables. That's
generally how I do write code. See the blk-wbt stuff, for instance, that
basically just has one tunable that's set sanely by default, and we
figure out the rest.

I don't want to regress performance of blk-mq devices by attaching
mq-deadline to them. When we do have a sane scheduler choice, we'll
make that the default. And yes, maybe we can remove the Kconfig option
at that point.

For single queue devices, we could kill the option. But we're expecting
bfq-mq for 4.12, and we'll want to have the option at that point unless
you want to rely solely on runtime setting of the scheduler through
udev or by the sysadmin.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-22 18:14         ` Jens Axboe
@ 2017-02-22 18:26           ` Linus Torvalds
  2017-02-22 18:41             ` Jens Axboe
  2017-02-22 18:42             ` Linus Torvalds
  0 siblings, 2 replies; 43+ messages in thread
From: Linus Torvalds @ 2017-02-22 18:26 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-kernel

On Wed, Feb 22, 2017 at 10:14 AM, Jens Axboe <axboe@kernel.dk> wrote:
>
> What do you mean by "the regular IO scheduler"? These are different
> schedulers.

Not to the user they aren't.

If the user already answered once about the IO schedulers, we damn
well shouldn't ask again abotu another small implementaiton detail.

How hard is this to understand? You're asking users stupid things.

It's not just about the wording. It's a fundamental issue.  These
questions are about internal implementation details. They make no
sense to a user. They don't even make sense to a kernel developer, for
chrissake!

Don't make the kconfig mess worse. This "we can't make good defaults
in the kernel, so ask users about random things that they cannot
possibly answer" model is not an acceptable model.

If the new schedulers aren't better than NOOP, they shouldn't exist.
And if you want people to be able to test, they should be dynamic.

And dammit, IF YOU DON'T EVEN KNOW, WHY THE HELL ARE YOU ASKING THE POOR USER?

It's really that simple.

             Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-22 18:26           ` Linus Torvalds
@ 2017-02-22 18:41             ` Jens Axboe
  2017-02-22 18:45               ` Linus Torvalds
  2017-02-22 18:42             ` Linus Torvalds
  1 sibling, 1 reply; 43+ messages in thread
From: Jens Axboe @ 2017-02-22 18:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-block, linux-kernel

On 02/22/2017 11:26 AM, Linus Torvalds wrote:
> On Wed, Feb 22, 2017 at 10:14 AM, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> What do you mean by "the regular IO scheduler"? These are different
>> schedulers.
> 
> Not to the user they aren't.
> 
> If the user already answered once about the IO schedulers, we damn
> well shouldn't ask again abotu another small implementaiton detail.
> 
> How hard is this to understand? You're asking users stupid things.

The fact is that we have two different sets, until we can yank
the old ones. So I can't just ask one question, since the sets
aren't identical.

This IS confusing to the user, and it's an artifact of the situation
that we have where we are phasing out the old IO path and switching
to blk-mq. I don't want to user to know about blk-mq, I just want
it to be what everything runs on. But until that happens, and it is
happening, we are going to be stuck with that situation.

We have this exposed in other places, too. Like for dm, and for
SCSI. Not a perfect situation, but something that WILL go away
eventually.

> It's not just about the wording. It's a fundamental issue.  These
> questions are about internal implementation details. They make no
> sense to a user. They don't even make sense to a kernel developer, for
> chrissake!
> 
> Don't make the kconfig mess worse. This "we can't make good defaults
> in the kernel, so ask users about random things that they cannot
> possibly answer" model is not an acceptable model.

There are good defaults! mq single-queue should default to mq-deadline,
and mq multi-queue should default to "none" for now. If you feel that
strongly about it (and I'm guessing you do, judging by the speed
typing and generally annoyed demeanor), then by all means, let's kill
the config entries and I'll just hardwire the defaults. The config
entries were implemented similarly to the old schedulers, and each
scheduler is selectable individually. I'd greatly prefer just
improving the wording so it makes more sense.

> If the new schedulers aren't better than NOOP, they shouldn't exist.
> And if you want people to be able to test, they should be dynamic.

They are dynamic! You can build them as modules, you can switch at
runtime. Just like we have always been able to. I can't make it more
dynamic than that. We're reusing the same internal infrastructure for
that, AND the user visible ABI for checking what is available, and
setting a new one.

> And dammit, IF YOU DON'T EVEN KNOW, WHY THE HELL ARE YOU ASKING THE POOR USER?

BECAUSE IT'S POLICY! Fact of that matter is, if I just default to what
we had before, it'd all be running with none. In a few years time, if
I'm lucky, someone will have shipped udev rules setting this appropriately.
If I ask the question, we'll get testing NOW. People will run with
the default set.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-22 18:26           ` Linus Torvalds
  2017-02-22 18:41             ` Jens Axboe
@ 2017-02-22 18:42             ` Linus Torvalds
  2017-02-22 18:44               ` Jens Axboe
  1 sibling, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2017-02-22 18:42 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-kernel

On Wed, Feb 22, 2017 at 10:26 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> And dammit, IF YOU DON'T EVEN KNOW, WHY THE HELL ARE YOU ASKING THE POOR USER?

Basically, I'm pushing back on config options that I can't personally
even sanely answer.

If it's a config option about "do I have a particular piece of
hardware", it makes sense. But these new ones were just complete
garbage.

The whole "default IO scheduler" thing is a disease. We should stop
making up these shit schedulers and then say "we don't know which one
works best for you".

All it does is encourage developers to make shortcuts and create crap
that isn't generically useful, and then blame the user and say "well,
you should have picked a different scheduler" when they say "this does
not work well for me".

We have had too many of those kinds of broken choices.  And when the
new Kconfig options get so confusing and so esoteric that I go "Hmm, I
have no idea if my hardware does a single queue or not", I put my foot
down.

When the IO scheduler questions were about a generic IO scheduler for
everything, I can kind of understand them. I think it was still a
mistake (for the reasons outline above), but at least it was a
comprehensible question to ask.

But when it gets to "what should I do about a single-queue version of
a MQ scheduler", the question is no longer even remotely sensible. The
question should simply NOT EXIST. There is no possible valid reason to
ask that kind of crap.

               Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-22 18:42             ` Linus Torvalds
@ 2017-02-22 18:44               ` Jens Axboe
  2017-02-22 21:50                 ` Markus Trippelsdorf
  0 siblings, 1 reply; 43+ messages in thread
From: Jens Axboe @ 2017-02-22 18:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-block, linux-kernel

On 02/22/2017 11:42 AM, Linus Torvalds wrote:
> On Wed, Feb 22, 2017 at 10:26 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> And dammit, IF YOU DON'T EVEN KNOW, WHY THE HELL ARE YOU ASKING THE POOR USER?
> 
> Basically, I'm pushing back on config options that I can't personally
> even sanely answer.

I got that much, and I don't disagree on that part.

> If it's a config option about "do I have a particular piece of
> hardware", it makes sense. But these new ones were just complete
> garbage.
> 
> The whole "default IO scheduler" thing is a disease. We should stop
> making up these shit schedulers and then say "we don't know which one
> works best for you".
> 
> All it does is encourage developers to make shortcuts and create crap
> that isn't generically useful, and then blame the user and say "well,
> you should have picked a different scheduler" when they say "this does
> not work well for me".
> 
> We have had too many of those kinds of broken choices.  And when the
> new Kconfig options get so confusing and so esoteric that I go "Hmm, I
> have no idea if my hardware does a single queue or not", I put my foot
> down.
> 
> When the IO scheduler questions were about a generic IO scheduler for
> everything, I can kind of understand them. I think it was still a
> mistake (for the reasons outline above), but at least it was a
> comprehensible question to ask.
> 
> But when it gets to "what should I do about a single-queue version of
> a MQ scheduler", the question is no longer even remotely sensible. The
> question should simply NOT EXIST. There is no possible valid reason to
> ask that kind of crap.

OK, so here's what I'll do:

1) We'll kill the default scheduler choices. sq blk-mq will default to
   mq-deadline, mq blk-mq will default to "none" (at least for now, until
   the new scheduler is done).
2) The individual schedulers will be y/m/n selectable, just like any
   other driver.

I hope that works for everyone.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-22 18:41             ` Jens Axboe
@ 2017-02-22 18:45               ` Linus Torvalds
  2017-02-22 18:52                 ` Jens Axboe
  0 siblings, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2017-02-22 18:45 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-kernel

On Wed, Feb 22, 2017 at 10:41 AM, Jens Axboe <axboe@kernel.dk> wrote:
>
> The fact is that we have two different sets, until we can yank
> the old ones. So I can't just ask one question, since the sets
> aren't identical.

Bullshit.

I'm, saying: rip out the question ENTIRELY. For *both* cases.

If you cannot yourself give a good answer, then there's no f*cking way
any user can give a good answer. So asking the question is totally and
utterly pointless.

All it means is that different people will try different (in random
ways) configurations, and the end result is random crap.

So get rid of those questions. Pick a default, and live with it. And
if people complain about performance, fix the performance issue.

It's that simple.

                Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-22 18:45               ` Linus Torvalds
@ 2017-02-22 18:52                 ` Jens Axboe
  2017-02-22 18:56                   ` Linus Torvalds
  0 siblings, 1 reply; 43+ messages in thread
From: Jens Axboe @ 2017-02-22 18:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-block, linux-kernel

On 02/22/2017 11:45 AM, Linus Torvalds wrote:
> On Wed, Feb 22, 2017 at 10:41 AM, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> The fact is that we have two different sets, until we can yank
>> the old ones. So I can't just ask one question, since the sets
>> aren't identical.
> 
> Bullshit.
> 
> I'm, saying: rip out the question ENTIRELY. For *both* cases.
> 
> If you cannot yourself give a good answer, then there's no f*cking way
> any user can give a good answer. So asking the question is totally and
> utterly pointless.
> 
> All it means is that different people will try different (in random
> ways) configurations, and the end result is random crap.
> 
> So get rid of those questions. Pick a default, and live with it. And
> if people complain about performance, fix the performance issue.
> 
> It's that simple.

No, it's not that simple at all. Fact is, some optimizations make sense
for some workloads, and some do not. CFQ works great for some cases, and
it works poorly for others, even if we try to make heuristics that enable
it to work well for all cases. Some optimizations are costly, that's
fine on certain types of hardware or maybe that's a trade off you want
to make. Or we end up with tons of settings for a single driver, that
does not reduce the configuration matrix at all.

By that logic, why do we have ANY config options outside of what drivers
to build? What should I set HZ at? RCU options? Let's default to ext4,
and kill off xfs? Or btrfs? slab/slob/slub/whatever?

Yes, that's taking the argument a bit more to the extreme, but it's the
same damn thing.

I'm fine with getting rid of the default selections, but we're NOT
going to be able to have just one scheduler for everything. We can
make sane defaults based on the hardware type.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-22 18:52                 ` Jens Axboe
@ 2017-02-22 18:56                   ` Linus Torvalds
  2017-02-22 18:58                     ` Jens Axboe
  0 siblings, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2017-02-22 18:56 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-kernel

On Wed, Feb 22, 2017 at 10:52 AM, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> It's that simple.
>
> No, it's not that simple at all. Fact is, some optimizations make sense
> for some workloads, and some do not.

Are you even listening?

I'm saying no user can ever give a sane answer to your question. The
question is insane and wrong.

I already said you can have a dynamic configuration (and maybe even an
automatic heuristic - like saying that a ramdisk gets NOOP by default,
real hardware does not).

But asking a user at kernel config time for a default is insane. If
*you* cannot answer it, then the user sure as hell cannot.

Other configuration questions have problems too, but at least the
question about "should I support ext4" is something a user (or distro)
can sanely answer. So your comparisons are pure bullshit.

                     Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-22 18:56                   ` Linus Torvalds
@ 2017-02-22 18:58                     ` Jens Axboe
  2017-02-22 19:04                       ` Linus Torvalds
  0 siblings, 1 reply; 43+ messages in thread
From: Jens Axboe @ 2017-02-22 18:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-block, linux-kernel

On 02/22/2017 11:56 AM, Linus Torvalds wrote:
> On Wed, Feb 22, 2017 at 10:52 AM, Jens Axboe <axboe@kernel.dk> wrote:
>>>
>>> It's that simple.
>>
>> No, it's not that simple at all. Fact is, some optimizations make sense
>> for some workloads, and some do not.
> 
> Are you even listening?
> 
> I'm saying no user can ever give a sane answer to your question. The
> question is insane and wrong.
> 
> I already said you can have a dynamic configuration (and maybe even an
> automatic heuristic - like saying that a ramdisk gets NOOP by default,
> real hardware does not).
> 
> But asking a user at kernel config time for a default is insane. If
> *you* cannot answer it, then the user sure as hell cannot.
> 
> Other configuration questions have problems too, but at least the
> question about "should I support ext4" is something a user (or distro)
> can sanely answer. So your comparisons are pure bullshit.

As per the previous email, this was my proposed solution:

OK, so here's what I'll do:

1) We'll kill the default scheduler choices. sq blk-mq will default to
   mq-deadline, mq blk-mq will default to "none" (at least for now, until
   the new scheduler is done).
2) The individual schedulers will be y/m/n selectable, just like any
   other driver.

Any further settings on that can be done at runtime, through sysfs.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-22 18:58                     ` Jens Axboe
@ 2017-02-22 19:04                       ` Linus Torvalds
  2017-02-22 21:29                         ` Jens Axboe
  0 siblings, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2017-02-22 19:04 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-kernel

On Wed, Feb 22, 2017 at 10:58 AM, Jens Axboe <axboe@kernel.dk> wrote:
> On 02/22/2017 11:56 AM, Linus Torvalds wrote:
>
> OK, so here's what I'll do:
>
> 1) We'll kill the default scheduler choices. sq blk-mq will default to
>    mq-deadline, mq blk-mq will default to "none" (at least for now, until
>    the new scheduler is done).
> 2) The individual schedulers will be y/m/n selectable, just like any
>    other driver.

Yes. That makes sense as options. I can (or, perhaps even more
importantly, a distro can) answer those kinds of questions.

                   Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-22 19:04                       ` Linus Torvalds
@ 2017-02-22 21:29                         ` Jens Axboe
  0 siblings, 0 replies; 43+ messages in thread
From: Jens Axboe @ 2017-02-22 21:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-block, linux-kernel

On 02/22/2017 12:04 PM, Linus Torvalds wrote:
> On Wed, Feb 22, 2017 at 10:58 AM, Jens Axboe <axboe@kernel.dk> wrote:
>> On 02/22/2017 11:56 AM, Linus Torvalds wrote:
>>
>> OK, so here's what I'll do:
>>
>> 1) We'll kill the default scheduler choices. sq blk-mq will default to
>>    mq-deadline, mq blk-mq will default to "none" (at least for now, until
>>    the new scheduler is done).
>> 2) The individual schedulers will be y/m/n selectable, just like any
>>    other driver.
> 
> Yes. That makes sense as options. I can (or, perhaps even more
> importantly, a distro can) answer those kinds of questions.

Someone misspelled pacman:

parman (PARMAN) [N/m/y] (NEW) ?

There is no help available for this option.

Or I think it's pacman, because I have no idea what else it could be. I'm
going to say N.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-22 18:44               ` Jens Axboe
@ 2017-02-22 21:50                 ` Markus Trippelsdorf
  2017-02-22 21:55                   ` Jens Axboe
  2017-02-23  0:16                   ` Linus Torvalds
  0 siblings, 2 replies; 43+ messages in thread
From: Markus Trippelsdorf @ 2017-02-22 21:50 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linus Torvalds, linux-block, linux-kernel

On 2017.02.22 at 11:44 -0700, Jens Axboe wrote:
> On 02/22/2017 11:42 AM, Linus Torvalds wrote:
> > On Wed, Feb 22, 2017 at 10:26 AM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> >>
> >> And dammit, IF YOU DON'T EVEN KNOW, WHY THE HELL ARE YOU ASKING THE POOR USER?
> > 
> > Basically, I'm pushing back on config options that I can't personally
> > even sanely answer.
> 
> I got that much, and I don't disagree on that part.
> 
> > If it's a config option about "do I have a particular piece of
> > hardware", it makes sense. But these new ones were just complete
> > garbage.
> > 
> > The whole "default IO scheduler" thing is a disease. We should stop
> > making up these shit schedulers and then say "we don't know which one
> > works best for you".
> > 
> > All it does is encourage developers to make shortcuts and create crap
> > that isn't generically useful, and then blame the user and say "well,
> > you should have picked a different scheduler" when they say "this does
> > not work well for me".
> > 
> > We have had too many of those kinds of broken choices.  And when the
> > new Kconfig options get so confusing and so esoteric that I go "Hmm, I
> > have no idea if my hardware does a single queue or not", I put my foot
> > down.
> > 
> > When the IO scheduler questions were about a generic IO scheduler for
> > everything, I can kind of understand them. I think it was still a
> > mistake (for the reasons outline above), but at least it was a
> > comprehensible question to ask.
> > 
> > But when it gets to "what should I do about a single-queue version of
> > a MQ scheduler", the question is no longer even remotely sensible. The
> > question should simply NOT EXIST. There is no possible valid reason to
> > ask that kind of crap.
> 
> OK, so here's what I'll do:
> 
> 1) We'll kill the default scheduler choices. sq blk-mq will default to
>    mq-deadline, mq blk-mq will default to "none" (at least for now, until
>    the new scheduler is done).

But what about e.g. SATA SSDs? Wouldn't they be better off without any
scheduler? 
So perhaps setting "none" for queue/rotational==0 and mq-deadline for
spinning drives automatically in the sq blk-mq case?

-- 
Markus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-22 21:50                 ` Markus Trippelsdorf
@ 2017-02-22 21:55                   ` Jens Axboe
  2017-02-23  0:16                   ` Linus Torvalds
  1 sibling, 0 replies; 43+ messages in thread
From: Jens Axboe @ 2017-02-22 21:55 UTC (permalink / raw)
  To: Markus Trippelsdorf; +Cc: Linus Torvalds, linux-block, linux-kernel

On 02/22/2017 02:50 PM, Markus Trippelsdorf wrote:
> On 2017.02.22 at 11:44 -0700, Jens Axboe wrote:
>> On 02/22/2017 11:42 AM, Linus Torvalds wrote:
>>> On Wed, Feb 22, 2017 at 10:26 AM, Linus Torvalds
>>> <torvalds@linux-foundation.org> wrote:
>>>>
>>>> And dammit, IF YOU DON'T EVEN KNOW, WHY THE HELL ARE YOU ASKING THE POOR USER?
>>>
>>> Basically, I'm pushing back on config options that I can't personally
>>> even sanely answer.
>>
>> I got that much, and I don't disagree on that part.
>>
>>> If it's a config option about "do I have a particular piece of
>>> hardware", it makes sense. But these new ones were just complete
>>> garbage.
>>>
>>> The whole "default IO scheduler" thing is a disease. We should stop
>>> making up these shit schedulers and then say "we don't know which one
>>> works best for you".
>>>
>>> All it does is encourage developers to make shortcuts and create crap
>>> that isn't generically useful, and then blame the user and say "well,
>>> you should have picked a different scheduler" when they say "this does
>>> not work well for me".
>>>
>>> We have had too many of those kinds of broken choices.  And when the
>>> new Kconfig options get so confusing and so esoteric that I go "Hmm, I
>>> have no idea if my hardware does a single queue or not", I put my foot
>>> down.
>>>
>>> When the IO scheduler questions were about a generic IO scheduler for
>>> everything, I can kind of understand them. I think it was still a
>>> mistake (for the reasons outline above), but at least it was a
>>> comprehensible question to ask.
>>>
>>> But when it gets to "what should I do about a single-queue version of
>>> a MQ scheduler", the question is no longer even remotely sensible. The
>>> question should simply NOT EXIST. There is no possible valid reason to
>>> ask that kind of crap.
>>
>> OK, so here's what I'll do:
>>
>> 1) We'll kill the default scheduler choices. sq blk-mq will default to
>>    mq-deadline, mq blk-mq will default to "none" (at least for now, until
>>    the new scheduler is done).
> 
> But what about e.g. SATA SSDs? Wouldn't they be better off without any
> scheduler? 

Marginal. If they are single queue, using a basic scheduler like
deadline isn't going to be a significant amount of overhead. In some
cases they are going to be better off, due to better merging. In the
worst case, overhead is slightly higher. Net result is positive, I'd
say.

> So perhaps setting "none" for queue/rotational==0 and mq-deadline for
> spinning drives automatically in the sq blk-mq case?

You can do that through a udev rule. The kernel doesn't know if the
device is rotational or not when we set up the scheduler. So we'd either
have to add code to do that, or simply just do it with a udev rule. I'd
prefer the latter.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-22 21:50                 ` Markus Trippelsdorf
  2017-02-22 21:55                   ` Jens Axboe
@ 2017-02-23  0:16                   ` Linus Torvalds
  1 sibling, 0 replies; 43+ messages in thread
From: Linus Torvalds @ 2017-02-23  0:16 UTC (permalink / raw)
  To: Markus Trippelsdorf; +Cc: Jens Axboe, linux-block, linux-kernel

On Wed, Feb 22, 2017 at 1:50 PM, Markus Trippelsdorf
<markus@trippelsdorf.de> wrote:
>
> But what about e.g. SATA SSDs? Wouldn't they be better off without any
> scheduler?
> So perhaps setting "none" for queue/rotational==0 and mq-deadline for
> spinning drives automatically in the sq blk-mq case?

Jens already said that the merging advantage can outweigh the costs,
but he didn't actually talk much about it.

The scheduler advantage can outweigh the costs of running a scheduler
by an absolutely _huge_ amount.

An SSD isn't zero-cost, and each command tends to have some fixed
overhead on the controller, and pretty much all SSD's heavily prefer
fewer large request over lots of tiny ones.

There are also fairness/latency issues that tend to very heavily favor
having an actual scheduler, ie reads want to be scheduled before
writes on an SSD (within reason) in order to make latency better.

Ten years ago, there were lots of people who argued that you don't
want to do do scheduling for SSD's, because SSD's were so fast that
you only added overhead. Nobody really believes that fairytale any
more.

So you might have particular loads that look better with noop, but
they will be rare and far between. Try it, by all means, and if it
works for you, set it in your udev rules.

The main place where a noop scheduler currently might make sense is
likely for a ramdisk, but quite frankly, since the main real usecase
for a ram-disk tends to be to make it easy to profile and find the
bottlenecks for performance analysis (for emulating future "infinitely
fast" media), even that isn't true - using noop there defeats the
whole purpose.

              Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-20 16:32         ` Jens Axboe
@ 2017-02-24 17:39             ` Bart Van Assche
  2017-02-24 17:39             ` Bart Van Assche
  1 sibling, 0 replies; 43+ messages in thread
From: Bart Van Assche @ 2017-02-24 17:39 UTC (permalink / raw)
  To: hch, axboe; +Cc: torvalds, linux-kernel, linux-block, snitzer

On Mon, 2017-02-20 at 09:32 -0700, Jens Axboe wrote:
> On 02/20/2017 09:16 AM, Bart Van Assche wrote:
> > On 02/19/2017 11:35 PM, Christoph Hellwig wrote:
> > > On Sun, Feb 19, 2017 at 06:15:41PM -0700, Jens Axboe wrote:
> > > > That said, we will look into this again, of course. Christoph, any =
idea?
> > >=20
> > > No idea really - this seems so far away from the code touched, and th=
ere
> > > are no obvious signs for a memory scamble from another object touched
> > > that I think if it really bisects down to that issue it must be a tim=
ing
> > > issue.
> > >=20
> > > But reading Bart's message again:  Did you actually bisect it down
> > > to the is commit?  Or just test the whole tree?  Between the 4.10-rc5
> > > merge and all the block tree there might a few more likely suspects
> > > like the scsi bdi lifetime fixes that James mentioned.
> >=20
> > Hello Christoph,
> >=20
> > As far as I know Jens does not rebase his trees so we can use the commi=
t
> > date to check which patch went in when. From the first of Jan's bdi pat=
ches:
> >=20
> > CommitDate: Thu Feb 2 08:18:41 2017 -0700
> >=20
> > So the bdi patches went in several days after I reported the general pr=
otection
> > fault issue.
> >=20
> > In an e-mail of January 30th I wrote the following: "Running the srp-te=
st
> > software against kernel 4.9.6 and kernel 4.10-rc5 went fine.  With your
> > for-4.11/block branch (commit 400f73b23f457a) however I just ran into
> > the following warning: [ ... ]" That means that I did not hit the crash=
 with
> > Jens' for-4.11/block branch but only with the for-next branch. The patc=
hes
> > on Jens' for-next branch after that commit that were applied before I r=
an
> > my test are:
> >=20
> > $ PAGER=3D git log --format=3Doneline 400f73b23f457a..fb045ca25cc7 bloc=
k drivers/md/dm{,-mpath,-table}.[ch]
> > fb045ca25cc7b6d46368ab8221774489c2a81648 block: don't assign cmd_flags =
in __blk_rq_prep_clone
> > 82ed4db499b8598f16f8871261bff088d6b0597f block: split scsi_request out =
of struct request
> > 8ae94eb65be9425af4d57a4f4cfebfdf03081e93 block/bsg: move queue creation=
 into bsg_setup_queue
> > eb8db831be80692bf4bda3dfc55001daf64ec299 dm: always defer request alloc=
ation to the owner of the request_queue
> > 6d247d7f71d1fa4b66a5f4da7b1daa21510d529b block: allow specifying size f=
or extra command data
> > 5ea708d15a928f7a479987704203616d3274c03b block: simplify blk_init_alloc=
ated_queue
> > e6f7f93d58de74700f83dd0547dd4306248a093d block: fix elevator init check
> > f924ba70c1b12706c6679d793202e8f4c125f7ae Merge branch 'for-4.11/block' =
into for-4.11/rq-refactor
> > 88a7503376f4f3bf303c809d1a389739e1205614 blk-mq: Remove unused variable
> > bef13315e990fd3d3fb4c39013aefd53f06c3657 block: don't try to discard fr=
om __blkdev_issue_zeroout
> > f99e86485cc32cd16e5cc97f9bb0474f28608d84 block: Rename blk_queue_zone_s=
ize and bdev_zone_size
> >=20
> > Do you see any patch in the above list that does not belong to the "spl=
it
> > scsi passthrough fields out of struct request" series and that could ha=
ve
> > caused the reported behavior change?
>=20
> Bart, since you are the only one that can reproduce this, can you just bi=
sect
> your way through that series?

Hello Jens,

Since Christoph also has access to IB hardware I will leave it to Christoph
to do the bisect. Anyway, I just reproduced this crash with Linus' current
tree (commit f1ef09fde17f) by running=A0srp-test/run_tests -r 10 -t 02-sq-o=
n-mq
(see also=A0https://github.com/bvanassche/srp-test):

[ 1629.920553] general protection fault: 0000 [#1] SMP
[ 1629.921193] CPU: 6 PID: 46 Comm: ksoftirqd/6 Tainted: G          I     4=
.10.0-dbg+ #1
[ 1629.921289] RIP: 0010:rq_completed+0x12/0x90 [dm_mod]
[ 1629.921316] RSP: 0018:ffffc90001bdbda8 EFLAGS: 00010246
[ 1629.921344] RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6b6b RCX: 00000000000=
00000
[ 1629.921372] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 6b6b6b6b6b6=
b6b6b
[ 1629.921401] RBP: ffffc90001bdbdc0 R08: ffff8803a3858d48 R09: 00000000000=
00000
[ 1629.921429] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000=
00000
[ 1629.921458] R13: 0000000000000000 R14: ffffffff81c05120 R15: 00000000000=
00004
[ 1629.921489] FS:  0000000000000000(0000) GS:ffff88046ef80000(0000) knlGS:=
0000000000000000
[ 1629.921520] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1629.921547] CR2: 00007fb6324486b8 CR3: 0000000001c0f000 CR4: 00000000001=
406e0
[ 1629.921576] Call Trace:
[ 1629.921605]  dm_softirq_done+0xe6/0x1e0 [dm_mod]
[ 1629.921637]  blk_done_softirq+0x88/0xa0
[ 1629.921663]  __do_softirq+0xba/0x4c0
[ 1629.921744]  run_ksoftirqd+0x1a/0x50
[ 1629.921769]  smpboot_thread_fn+0x123/0x1e0
[ 1629.921797]  kthread+0x107/0x140
[ 1629.921944]  ret_from_fork+0x2e/0x40
[ 1629.921972] Code: ff ff 31 f6 48 89 c7 e8 ed 96 2f e1 5d c3 90 66 2e 0f =
1f 84 00 00 00 00 00 55 48 63 f6 48 89 e5 41 55 41 89 d5 41 54 53 48 89 fb =
<4c> 8b a7 70 02 00 00 f0 ff 8c b7 38 03 00 00 e8 3a 43 ff ff 85=20
[ 1629.922093] RIP: rq_completed+0x12/0x90 [dm_mod] RSP: ffffc90001bdbda8

$ gdb drivers/md/dm-mod.ko
(gdb) list *(rq_completed+0x12) =A0=A0=A0
0xdf62 is in rq_completed (drivers/md/dm-rq.c:187).
182 =A0=A0=A0=A0=A0* the md may be freed in dm_put() at the end of this fun=
ction.
183 =A0=A0=A0=A0=A0* Or do dm_get() before calling this function and dm_put=
() later.
184 =A0=A0=A0=A0=A0*/
185 =A0=A0=A0=A0static void rq_completed(struct mapped_device *md, int rw, =
bool run_queue)
186 =A0=A0=A0=A0{
187 =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0struct request_queue *q =3D md->que=
ue;
188 =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0unsigned long flags;
189
190 =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0atomic_dec(&md->pending[rw]);
191
(gdb) disas rq_completed =A0
Dump of assembler code for function rq_completed:
 =A0=A00x000000000000df50 <+0>: =A0=A0=A0=A0push =A0=A0%rbp
 =A0=A00x000000000000df51 <+1>: =A0=A0=A0=A0movslq %esi,%rsi
 =A0=A00x000000000000df54 <+4>: =A0=A0=A0=A0mov =A0=A0=A0%rsp,%rbp
 =A0=A00x000000000000df57 <+7>: =A0=A0=A0=A0push =A0=A0%r13
 =A0=A00x000000000000df59 <+9>: =A0=A0=A0=A0mov =A0=A0=A0%edx,%r13d
 =A0=A00x000000000000df5c <+12>: =A0=A0=A0push =A0=A0%r12
 =A0=A00x000000000000df5e <+14>: =A0=A0=A0push =A0=A0%rbx
 =A0=A00x000000000000df5f <+15>: =A0=A0=A0mov =A0=A0=A0%rdi,%rbx
 =A0=A00x000000000000df62 <+18>: =A0=A0=A0mov =A0=A0=A00x270(%rdi),%r12
[ ... ]

So the crash is caused by an attempt to dereference address 0x6b6b6b6b6b6b6=
b6b
at offset 0x270. I think this means the crash is caused by a use-after-free=
.

Bart.=

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
@ 2017-02-24 17:39             ` Bart Van Assche
  0 siblings, 0 replies; 43+ messages in thread
From: Bart Van Assche @ 2017-02-24 17:39 UTC (permalink / raw)
  To: hch, axboe; +Cc: torvalds, linux-kernel, linux-block, snitzer

On Mon, 2017-02-20 at 09:32 -0700, Jens Axboe wrote:
> On 02/20/2017 09:16 AM, Bart Van Assche wrote:
> > On 02/19/2017 11:35 PM, Christoph Hellwig wrote:
> > > On Sun, Feb 19, 2017 at 06:15:41PM -0700, Jens Axboe wrote:
> > > > That said, we will look into this again, of course. Christoph, any idea?
> > > 
> > > No idea really - this seems so far away from the code touched, and there
> > > are no obvious signs for a memory scamble from another object touched
> > > that I think if it really bisects down to that issue it must be a timing
> > > issue.
> > > 
> > > But reading Bart's message again:  Did you actually bisect it down
> > > to the is commit?  Or just test the whole tree?  Between the 4.10-rc5
> > > merge and all the block tree there might a few more likely suspects
> > > like the scsi bdi lifetime fixes that James mentioned.
> > 
> > Hello Christoph,
> > 
> > As far as I know Jens does not rebase his trees so we can use the commit
> > date to check which patch went in when. From the first of Jan's bdi patches:
> > 
> > CommitDate: Thu Feb 2 08:18:41 2017 -0700
> > 
> > So the bdi patches went in several days after I reported the general protection
> > fault issue.
> > 
> > In an e-mail of January 30th I wrote the following: "Running the srp-test
> > software against kernel 4.9.6 and kernel 4.10-rc5 went fine.  With your
> > for-4.11/block branch (commit 400f73b23f457a) however I just ran into
> > the following warning: [ ... ]" That means that I did not hit the crash with
> > Jens' for-4.11/block branch but only with the for-next branch. The patches
> > on Jens' for-next branch after that commit that were applied before I ran
> > my test are:
> > 
> > $ PAGER= git log --format=oneline 400f73b23f457a..fb045ca25cc7 block drivers/md/dm{,-mpath,-table}.[ch]
> > fb045ca25cc7b6d46368ab8221774489c2a81648 block: don't assign cmd_flags in __blk_rq_prep_clone
> > 82ed4db499b8598f16f8871261bff088d6b0597f block: split scsi_request out of struct request
> > 8ae94eb65be9425af4d57a4f4cfebfdf03081e93 block/bsg: move queue creation into bsg_setup_queue
> > eb8db831be80692bf4bda3dfc55001daf64ec299 dm: always defer request allocation to the owner of the request_queue
> > 6d247d7f71d1fa4b66a5f4da7b1daa21510d529b block: allow specifying size for extra command data
> > 5ea708d15a928f7a479987704203616d3274c03b block: simplify blk_init_allocated_queue
> > e6f7f93d58de74700f83dd0547dd4306248a093d block: fix elevator init check
> > f924ba70c1b12706c6679d793202e8f4c125f7ae Merge branch 'for-4.11/block' into for-4.11/rq-refactor
> > 88a7503376f4f3bf303c809d1a389739e1205614 blk-mq: Remove unused variable
> > bef13315e990fd3d3fb4c39013aefd53f06c3657 block: don't try to discard from __blkdev_issue_zeroout
> > f99e86485cc32cd16e5cc97f9bb0474f28608d84 block: Rename blk_queue_zone_size and bdev_zone_size
> > 
> > Do you see any patch in the above list that does not belong to the "split
> > scsi passthrough fields out of struct request" series and that could have
> > caused the reported behavior change?
> 
> Bart, since you are the only one that can reproduce this, can you just bisect
> your way through that series?

Hello Jens,

Since Christoph also has access to IB hardware I will leave it to Christoph
to do the bisect. Anyway, I just reproduced this crash with Linus' current
tree (commit f1ef09fde17f) by running srp-test/run_tests -r 10 -t 02-sq-on-mq
(see also https://github.com/bvanassche/srp-test):

[ 1629.920553] general protection fault: 0000 [#1] SMP
[ 1629.921193] CPU: 6 PID: 46 Comm: ksoftirqd/6 Tainted: G          I     4.10.0-dbg+ #1
[ 1629.921289] RIP: 0010:rq_completed+0x12/0x90 [dm_mod]
[ 1629.921316] RSP: 0018:ffffc90001bdbda8 EFLAGS: 00010246
[ 1629.921344] RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
[ 1629.921372] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 6b6b6b6b6b6b6b6b
[ 1629.921401] RBP: ffffc90001bdbdc0 R08: ffff8803a3858d48 R09: 0000000000000000
[ 1629.921429] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 1629.921458] R13: 0000000000000000 R14: ffffffff81c05120 R15: 0000000000000004
[ 1629.921489] FS:  0000000000000000(0000) GS:ffff88046ef80000(0000) knlGS:0000000000000000
[ 1629.921520] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1629.921547] CR2: 00007fb6324486b8 CR3: 0000000001c0f000 CR4: 00000000001406e0
[ 1629.921576] Call Trace:
[ 1629.921605]  dm_softirq_done+0xe6/0x1e0 [dm_mod]
[ 1629.921637]  blk_done_softirq+0x88/0xa0
[ 1629.921663]  __do_softirq+0xba/0x4c0
[ 1629.921744]  run_ksoftirqd+0x1a/0x50
[ 1629.921769]  smpboot_thread_fn+0x123/0x1e0
[ 1629.921797]  kthread+0x107/0x140
[ 1629.921944]  ret_from_fork+0x2e/0x40
[ 1629.921972] Code: ff ff 31 f6 48 89 c7 e8 ed 96 2f e1 5d c3 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 63 f6 48 89 e5 41 55 41 89 d5 41 54 53 48 89 fb <4c> 8b a7 70 02 00 00 f0 ff 8c b7 38 03 00 00 e8 3a 43 ff ff 85 
[ 1629.922093] RIP: rq_completed+0x12/0x90 [dm_mod] RSP: ffffc90001bdbda8

$ gdb drivers/md/dm-mod.ko
(gdb) list *(rq_completed+0x12)    
0xdf62 is in rq_completed (drivers/md/dm-rq.c:187).
182      * the md may be freed in dm_put() at the end of this function.
183      * Or do dm_get() before calling this function and dm_put() later.
184      */
185     static void rq_completed(struct mapped_device *md, int rw, bool run_queue)
186     {
187             struct request_queue *q = md->queue;
188             unsigned long flags;
189
190             atomic_dec(&md->pending[rw]);
191
(gdb) disas rq_completed  
Dump of assembler code for function rq_completed:
   0x000000000000df50 <+0>:     push   %rbp
   0x000000000000df51 <+1>:     movslq %esi,%rsi
   0x000000000000df54 <+4>:     mov    %rsp,%rbp
   0x000000000000df57 <+7>:     push   %r13
   0x000000000000df59 <+9>:     mov    %edx,%r13d
   0x000000000000df5c <+12>:    push   %r12
   0x000000000000df5e <+14>:    push   %rbx
   0x000000000000df5f <+15>:    mov    %rdi,%rbx
   0x000000000000df62 <+18>:    mov    0x270(%rdi),%r12
[ ... ]

So the crash is caused by an attempt to dereference address 0x6b6b6b6b6b6b6b6b
at offset 0x270. I think this means the crash is caused by a use-after-free.

Bart.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-24 17:39             ` Bart Van Assche
  (?)
@ 2017-02-24 17:51             ` Jens Axboe
  -1 siblings, 0 replies; 43+ messages in thread
From: Jens Axboe @ 2017-02-24 17:51 UTC (permalink / raw)
  To: Bart Van Assche, hch; +Cc: torvalds, linux-kernel, linux-block, snitzer

On 02/24/2017 10:39 AM, Bart Van Assche wrote:
> On Mon, 2017-02-20 at 09:32 -0700, Jens Axboe wrote:
>> On 02/20/2017 09:16 AM, Bart Van Assche wrote:
>>> On 02/19/2017 11:35 PM, Christoph Hellwig wrote:
>>>> On Sun, Feb 19, 2017 at 06:15:41PM -0700, Jens Axboe wrote:
>>>>> That said, we will look into this again, of course. Christoph, any idea?
>>>>
>>>> No idea really - this seems so far away from the code touched, and there
>>>> are no obvious signs for a memory scamble from another object touched
>>>> that I think if it really bisects down to that issue it must be a timing
>>>> issue.
>>>>
>>>> But reading Bart's message again:  Did you actually bisect it down
>>>> to the is commit?  Or just test the whole tree?  Between the 4.10-rc5
>>>> merge and all the block tree there might a few more likely suspects
>>>> like the scsi bdi lifetime fixes that James mentioned.
>>>
>>> Hello Christoph,
>>>
>>> As far as I know Jens does not rebase his trees so we can use the commit
>>> date to check which patch went in when. From the first of Jan's bdi patches:
>>>
>>> CommitDate: Thu Feb 2 08:18:41 2017 -0700
>>>
>>> So the bdi patches went in several days after I reported the general protection
>>> fault issue.
>>>
>>> In an e-mail of January 30th I wrote the following: "Running the srp-test
>>> software against kernel 4.9.6 and kernel 4.10-rc5 went fine.  With your
>>> for-4.11/block branch (commit 400f73b23f457a) however I just ran into
>>> the following warning: [ ... ]" That means that I did not hit the crash with
>>> Jens' for-4.11/block branch but only with the for-next branch. The patches
>>> on Jens' for-next branch after that commit that were applied before I ran
>>> my test are:
>>>
>>> $ PAGER= git log --format=oneline 400f73b23f457a..fb045ca25cc7 block drivers/md/dm{,-mpath,-table}.[ch]
>>> fb045ca25cc7b6d46368ab8221774489c2a81648 block: don't assign cmd_flags in __blk_rq_prep_clone
>>> 82ed4db499b8598f16f8871261bff088d6b0597f block: split scsi_request out of struct request
>>> 8ae94eb65be9425af4d57a4f4cfebfdf03081e93 block/bsg: move queue creation into bsg_setup_queue
>>> eb8db831be80692bf4bda3dfc55001daf64ec299 dm: always defer request allocation to the owner of the request_queue
>>> 6d247d7f71d1fa4b66a5f4da7b1daa21510d529b block: allow specifying size for extra command data
>>> 5ea708d15a928f7a479987704203616d3274c03b block: simplify blk_init_allocated_queue
>>> e6f7f93d58de74700f83dd0547dd4306248a093d block: fix elevator init check
>>> f924ba70c1b12706c6679d793202e8f4c125f7ae Merge branch 'for-4.11/block' into for-4.11/rq-refactor
>>> 88a7503376f4f3bf303c809d1a389739e1205614 blk-mq: Remove unused variable
>>> bef13315e990fd3d3fb4c39013aefd53f06c3657 block: don't try to discard from __blkdev_issue_zeroout
>>> f99e86485cc32cd16e5cc97f9bb0474f28608d84 block: Rename blk_queue_zone_size and bdev_zone_size
>>>
>>> Do you see any patch in the above list that does not belong to the "split
>>> scsi passthrough fields out of struct request" series and that could have
>>> caused the reported behavior change?
>>
>> Bart, since you are the only one that can reproduce this, can you just bisect
>> your way through that series?
> 
> Hello Jens,
> 
> Since Christoph also has access to IB hardware I will leave it to Christoph
> to do the bisect. Anyway, I just reproduced this crash with Linus' current
> tree (commit f1ef09fde17f) by running srp-test/run_tests -r 10 -t 02-sq-on-mq
> (see also https://github.com/bvanassche/srp-test):
> 
> [ 1629.920553] general protection fault: 0000 [#1] SMP
> [ 1629.921193] CPU: 6 PID: 46 Comm: ksoftirqd/6 Tainted: G          I     4.10.0-dbg+ #1
> [ 1629.921289] RIP: 0010:rq_completed+0x12/0x90 [dm_mod]
> [ 1629.921316] RSP: 0018:ffffc90001bdbda8 EFLAGS: 00010246
> [ 1629.921344] RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
> [ 1629.921372] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 6b6b6b6b6b6b6b6b
> [ 1629.921401] RBP: ffffc90001bdbdc0 R08: ffff8803a3858d48 R09: 0000000000000000
> [ 1629.921429] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [ 1629.921458] R13: 0000000000000000 R14: ffffffff81c05120 R15: 0000000000000004
> [ 1629.921489] FS:  0000000000000000(0000) GS:ffff88046ef80000(0000) knlGS:0000000000000000
> [ 1629.921520] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1629.921547] CR2: 00007fb6324486b8 CR3: 0000000001c0f000 CR4: 00000000001406e0
> [ 1629.921576] Call Trace:
> [ 1629.921605]  dm_softirq_done+0xe6/0x1e0 [dm_mod]
> [ 1629.921637]  blk_done_softirq+0x88/0xa0
> [ 1629.921663]  __do_softirq+0xba/0x4c0
> [ 1629.921744]  run_ksoftirqd+0x1a/0x50
> [ 1629.921769]  smpboot_thread_fn+0x123/0x1e0
> [ 1629.921797]  kthread+0x107/0x140
> [ 1629.921944]  ret_from_fork+0x2e/0x40
> [ 1629.921972] Code: ff ff 31 f6 48 89 c7 e8 ed 96 2f e1 5d c3 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 63 f6 48 89 e5 41 55 41 89 d5 41 54 53 48 89 fb <4c> 8b a7 70 02 00 00 f0 ff 8c b7 38 03 00 00 e8 3a 43 ff ff 85 
> [ 1629.922093] RIP: rq_completed+0x12/0x90 [dm_mod] RSP: ffffc90001bdbda8
> 
> $ gdb drivers/md/dm-mod.ko
> (gdb) list *(rq_completed+0x12)    
> 0xdf62 is in rq_completed (drivers/md/dm-rq.c:187).
> 182      * the md may be freed in dm_put() at the end of this function.
> 183      * Or do dm_get() before calling this function and dm_put() later.
> 184      */
> 185     static void rq_completed(struct mapped_device *md, int rw, bool run_queue)
> 186     {
> 187             struct request_queue *q = md->queue;
> 188             unsigned long flags;
> 189
> 190             atomic_dec(&md->pending[rw]);
> 191
> (gdb) disas rq_completed  
> Dump of assembler code for function rq_completed:
>    0x000000000000df50 <+0>:     push   %rbp
>    0x000000000000df51 <+1>:     movslq %esi,%rsi
>    0x000000000000df54 <+4>:     mov    %rsp,%rbp
>    0x000000000000df57 <+7>:     push   %r13
>    0x000000000000df59 <+9>:     mov    %edx,%r13d
>    0x000000000000df5c <+12>:    push   %r12
>    0x000000000000df5e <+14>:    push   %rbx
>    0x000000000000df5f <+15>:    mov    %rdi,%rbx
>    0x000000000000df62 <+18>:    mov    0x270(%rdi),%r12
> [ ... ]
> 
> So the crash is caused by an attempt to dereference address 0x6b6b6b6b6b6b6b6b
> at offset 0x270. I think this means the crash is caused by a use-after-free.

Christoph, if you have the setup to reproduce this now, can you bisect and
see wtf is going on here?

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-24 17:39             ` Bart Van Assche
  (?)
  (?)
@ 2017-02-24 19:43             ` Linus Torvalds
  2017-02-24 20:00               ` Jens Axboe
  -1 siblings, 1 reply; 43+ messages in thread
From: Linus Torvalds @ 2017-02-24 19:43 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: hch, axboe, linux-kernel, linux-block, snitzer

On Fri, Feb 24, 2017 at 9:39 AM, Bart Van Assche
<Bart.VanAssche@sandisk.com> wrote:
>
> So the crash is caused by an attempt to dereference address 0x6b6b6b6b6b6b6b6b
> at offset 0x270. I think this means the crash is caused by a use-after-free.

Yeah, that's POISON_FREE, and that might explain why you see crashes
that others don't - you obviously have SLAB poisoning enabled. Jens
may not have.

%rdi is "struct mapped_device *md", which came from dm_softirq_done() doing

        struct dm_rq_target_io *tio = tio_from_request(rq);
        struct request *clone = tio->clone;
        int rw;

        if (!clone) {
                rq_end_stats(tio->md, rq);
                rw = rq_data_dir(rq);
                if (!rq->q->mq_ops)
                        blk_end_request_all(rq, tio->error);
                else
                        blk_mq_end_request(rq, tio->error);
                rq_completed(tio->md, rw, false);
                return;
        }

so it's the 'tio' pointer that has been free'd. But it's worth noting
that we did apparently successfully dereference "tio" earlier in that
dm_softirq_done() *without* getting the poison value, so what I think
might be going on is that the 'tio' thing gets free'd when the code
does the blk_end_request_all()/blk_mq_end_request() call.

Which makes sense - that ends the lifetime of the request, which in
turn also ends the lifetime of the "tio_from_request()", no?

So the fix may be as simple as just doing

        if (!clone) {
                struct mapped_device *md = tio->md;

                rq_end_stats(md, rq);
                ...
                rq_completed(md, rw, false);
                return;
        }

because the 'mapped_device' pointer hopefully is still valid, it's
just 'tio' that has been freed.

Jens? Bart? Christoph? Somebody who knows this code should
double-check my thinking above. I don't actually know the tio
lifetimes, I'm just going by looking at how earlier accesses seemed to
be fine (eg that "tio->clone" got us NULL, not a POISON_FREE pointer,
for example).

               Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-24 19:43             ` Linus Torvalds
@ 2017-02-24 20:00               ` Jens Axboe
  2017-02-24 20:22                 ` Jens Axboe
  0 siblings, 1 reply; 43+ messages in thread
From: Jens Axboe @ 2017-02-24 20:00 UTC (permalink / raw)
  To: Linus Torvalds, Bart Van Assche; +Cc: hch, linux-kernel, linux-block, snitzer

On 02/24/2017 12:43 PM, Linus Torvalds wrote:
> On Fri, Feb 24, 2017 at 9:39 AM, Bart Van Assche
> <Bart.VanAssche@sandisk.com> wrote:
>>
>> So the crash is caused by an attempt to dereference address 0x6b6b6b6b6b6b6b6b
>> at offset 0x270. I think this means the crash is caused by a use-after-free.
> 
> Yeah, that's POISON_FREE, and that might explain why you see crashes
> that others don't - you obviously have SLAB poisoning enabled. Jens
> may not have.
> 
> %rdi is "struct mapped_device *md", which came from dm_softirq_done() doing
> 
>         struct dm_rq_target_io *tio = tio_from_request(rq);
>         struct request *clone = tio->clone;
>         int rw;
> 
>         if (!clone) {
>                 rq_end_stats(tio->md, rq);
>                 rw = rq_data_dir(rq);
>                 if (!rq->q->mq_ops)
>                         blk_end_request_all(rq, tio->error);
>                 else
>                         blk_mq_end_request(rq, tio->error);
>                 rq_completed(tio->md, rw, false);
>                 return;
>         }
> 
> so it's the 'tio' pointer that has been free'd. But it's worth noting
> that we did apparently successfully dereference "tio" earlier in that
> dm_softirq_done() *without* getting the poison value, so what I think
> might be going on is that the 'tio' thing gets free'd when the code
> does the blk_end_request_all()/blk_mq_end_request() call.
> 
> Which makes sense - that ends the lifetime of the request, which in
> turn also ends the lifetime of the "tio_from_request()", no?
> 
> So the fix may be as simple as just doing
> 
>         if (!clone) {
>                 struct mapped_device *md = tio->md;
> 
>                 rq_end_stats(md, rq);
>                 ...
>                 rq_completed(md, rw, false);
>                 return;
>         }
> 
> because the 'mapped_device' pointer hopefully is still valid, it's
> just 'tio' that has been freed.
> 
> Jens? Bart? Christoph? Somebody who knows this code should
> double-check my thinking above. I don't actually know the tio
> lifetimes, I'm just going by looking at how earlier accesses seemed to
> be fine (eg that "tio->clone" got us NULL, not a POISON_FREE pointer,
> for example).

I think that is spot on. With the request changes for CDBs, for non
blk-mq, we know also carry the payload after the request. But since
blk-mq never frees the request, the above use-after-free with poison
will only happen for !mq. Caching 'md' and avoiding a dereference of
'tio' after calling blk_end_request_all() will likely fix it.

Bart, can you test that?

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-24 20:00               ` Jens Axboe
@ 2017-02-24 20:22                 ` Jens Axboe
  2017-02-24 21:15                     ` Bart Van Assche
  2017-02-25 18:17                   ` hch
  0 siblings, 2 replies; 43+ messages in thread
From: Jens Axboe @ 2017-02-24 20:22 UTC (permalink / raw)
  To: Linus Torvalds, Bart Van Assche; +Cc: hch, linux-kernel, linux-block, snitzer

On 02/24/2017 01:00 PM, Jens Axboe wrote:
> On 02/24/2017 12:43 PM, Linus Torvalds wrote:
>> On Fri, Feb 24, 2017 at 9:39 AM, Bart Van Assche
>> <Bart.VanAssche@sandisk.com> wrote:
>>>
>>> So the crash is caused by an attempt to dereference address 0x6b6b6b6b6b6b6b6b
>>> at offset 0x270. I think this means the crash is caused by a use-after-free.
>>
>> Yeah, that's POISON_FREE, and that might explain why you see crashes
>> that others don't - you obviously have SLAB poisoning enabled. Jens
>> may not have.
>>
>> %rdi is "struct mapped_device *md", which came from dm_softirq_done() doing
>>
>>         struct dm_rq_target_io *tio = tio_from_request(rq);
>>         struct request *clone = tio->clone;
>>         int rw;
>>
>>         if (!clone) {
>>                 rq_end_stats(tio->md, rq);
>>                 rw = rq_data_dir(rq);
>>                 if (!rq->q->mq_ops)
>>                         blk_end_request_all(rq, tio->error);
>>                 else
>>                         blk_mq_end_request(rq, tio->error);
>>                 rq_completed(tio->md, rw, false);
>>                 return;
>>         }
>>
>> so it's the 'tio' pointer that has been free'd. But it's worth noting
>> that we did apparently successfully dereference "tio" earlier in that
>> dm_softirq_done() *without* getting the poison value, so what I think
>> might be going on is that the 'tio' thing gets free'd when the code
>> does the blk_end_request_all()/blk_mq_end_request() call.
>>
>> Which makes sense - that ends the lifetime of the request, which in
>> turn also ends the lifetime of the "tio_from_request()", no?
>>
>> So the fix may be as simple as just doing
>>
>>         if (!clone) {
>>                 struct mapped_device *md = tio->md;
>>
>>                 rq_end_stats(md, rq);
>>                 ...
>>                 rq_completed(md, rw, false);
>>                 return;
>>         }
>>
>> because the 'mapped_device' pointer hopefully is still valid, it's
>> just 'tio' that has been freed.
>>
>> Jens? Bart? Christoph? Somebody who knows this code should
>> double-check my thinking above. I don't actually know the tio
>> lifetimes, I'm just going by looking at how earlier accesses seemed to
>> be fine (eg that "tio->clone" got us NULL, not a POISON_FREE pointer,
>> for example).
> 
> I think that is spot on. With the request changes for CDBs, for non
> blk-mq, we know also carry the payload after the request. But since
> blk-mq never frees the request, the above use-after-free with poison
> will only happen for !mq. Caching 'md' and avoiding a dereference of
> 'tio' after calling blk_end_request_all() will likely fix it.
> 
> Bart, can you test that?

Bart, I pushed a fix here:

http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=61febef40bfe8ab68259d8545257686e8a0d91d1

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-24 20:22                 ` Jens Axboe
@ 2017-02-24 21:15                     ` Bart Van Assche
  2017-02-25 18:17                   ` hch
  1 sibling, 0 replies; 43+ messages in thread
From: Bart Van Assche @ 2017-02-24 21:15 UTC (permalink / raw)
  To: torvalds, axboe; +Cc: hch, linux-kernel, linux-block, snitzer

On Fri, 2017-02-24 at 13:22 -0700, Jens Axboe wrote:
> Bart, I pushed a fix here:
>=20
> http://git.kernel.dk/cgit/linux-block/commit/?h=3Dfor-linus&id=3D61febef4=
0bfe8ab68259d8545257686e8a0d91d1

Hello Jens,

The same test passes against the kernel I obtained by merging your for-linu=
s
branch with the same version of Linus' master branch I mentioned in a previ=
ous
e-mail. Feel free to add my Tested-by to the patch "dm-rq: don't dereferenc=
e
request payload after ending request".

Thanks,

Bart.=

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
@ 2017-02-24 21:15                     ` Bart Van Assche
  0 siblings, 0 replies; 43+ messages in thread
From: Bart Van Assche @ 2017-02-24 21:15 UTC (permalink / raw)
  To: torvalds, axboe; +Cc: hch, linux-kernel, linux-block, snitzer

On Fri, 2017-02-24 at 13:22 -0700, Jens Axboe wrote:
> Bart, I pushed a fix here:
> 
> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=61febef40bfe8ab68259d8545257686e8a0d91d1

Hello Jens,

The same test passes against the kernel I obtained by merging your for-linus
branch with the same version of Linus' master branch I mentioned in a previous
e-mail. Feel free to add my Tested-by to the patch "dm-rq: don't dereference
request payload after ending request".

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-24 20:22                 ` Jens Axboe
  2017-02-24 21:15                     ` Bart Van Assche
@ 2017-02-25 18:17                   ` hch
  2017-02-25 18:22                     ` Jens Axboe
  1 sibling, 1 reply; 43+ messages in thread
From: hch @ 2017-02-25 18:17 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Linus Torvalds, Bart Van Assche, hch, linux-kernel, linux-block, snitzer

On Fri, Feb 24, 2017 at 01:22:42PM -0700, Jens Axboe wrote:
> Bart, I pushed a fix here:
> 
> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=61febef40bfe8ab68259d8545257686e8a0d91d1

Yeah, this looks fine to me.  It was broken on blk-mq before, but
basically impossible to hit.  I wonder if we should have a debug mode
where we set requests to a known pattern after they are freed to catch
these sort of bugs.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [GIT PULL] Block pull request for- 4.11-rc1
  2017-02-25 18:17                   ` hch
@ 2017-02-25 18:22                     ` Jens Axboe
  0 siblings, 0 replies; 43+ messages in thread
From: Jens Axboe @ 2017-02-25 18:22 UTC (permalink / raw)
  To: hch; +Cc: Linus Torvalds, Bart Van Assche, linux-kernel, linux-block, snitzer

On 02/25/2017 11:17 AM, hch@lst.de wrote:
> On Fri, Feb 24, 2017 at 01:22:42PM -0700, Jens Axboe wrote:
>> Bart, I pushed a fix here:
>>
>> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=61febef40bfe8ab68259d8545257686e8a0d91d1
> 
> Yeah, this looks fine to me.  It was broken on blk-mq before, but
> basically impossible to hit.  I wonder if we should have a debug mode
> where we set requests to a known pattern after they are freed to catch
> these sort of bugs.

tio/md would probably never change for the same set of requests, so
it worked by happy accident before. Some of the blk-mq state remains
valid after "freeing" a request, but we could selectively fill and
definitely fill the entire PDU area with free poison.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2017-02-25 18:22 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-20  0:10 [GIT PULL] Block pull request for- 4.11-rc1 Jens Axboe
2017-02-20  1:09 ` Bart Van Assche
2017-02-20  1:09   ` Bart Van Assche
2017-02-20  1:15   ` Jens Axboe
2017-02-20  2:12     ` James Bottomley
2017-02-20  2:12       ` James Bottomley
2017-02-20  2:59       ` Jens Axboe
2017-02-20  3:02         ` Jens Axboe
2017-02-20  7:35     ` Christoph Hellwig
2017-02-20 16:16       ` Bart Van Assche
2017-02-20 16:16         ` Bart Van Assche
2017-02-20 16:32         ` Jens Axboe
2017-02-21  1:18           ` Bart Van Assche
2017-02-21  1:18             ` Bart Van Assche
2017-02-24 17:39           ` Bart Van Assche
2017-02-24 17:39             ` Bart Van Assche
2017-02-24 17:51             ` Jens Axboe
2017-02-24 19:43             ` Linus Torvalds
2017-02-24 20:00               ` Jens Axboe
2017-02-24 20:22                 ` Jens Axboe
2017-02-24 21:15                   ` Bart Van Assche
2017-02-24 21:15                     ` Bart Van Assche
2017-02-25 18:17                   ` hch
2017-02-25 18:22                     ` Jens Axboe
2017-02-21 19:11 ` Linus Torvalds
2017-02-21 19:34   ` Jens Axboe
2017-02-21 23:02   ` Linus Torvalds
2017-02-21 23:15     ` Jens Axboe
2017-02-21 23:23       ` Linus Torvalds
2017-02-22 18:14         ` Jens Axboe
2017-02-22 18:26           ` Linus Torvalds
2017-02-22 18:41             ` Jens Axboe
2017-02-22 18:45               ` Linus Torvalds
2017-02-22 18:52                 ` Jens Axboe
2017-02-22 18:56                   ` Linus Torvalds
2017-02-22 18:58                     ` Jens Axboe
2017-02-22 19:04                       ` Linus Torvalds
2017-02-22 21:29                         ` Jens Axboe
2017-02-22 18:42             ` Linus Torvalds
2017-02-22 18:44               ` Jens Axboe
2017-02-22 21:50                 ` Markus Trippelsdorf
2017-02-22 21:55                   ` Jens Axboe
2017-02-23  0:16                   ` Linus Torvalds

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.