* WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3
@ 2018-01-30 15:41 ` Jens Axboe
  0 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2018-01-30 15:41 UTC (permalink / raw)
  To: linux-block; +Cc: Christoph Hellwig, Ming Lei, linux-nvme

Hi,

I just hit this on 4.15+ on the laptop, it's running Linus' git
as of yesterday, right after the block tree merge:

commit 0a4b6e2f80aad46fb55a5cf7b1664c0aef030ee0
Merge: 9697e9da8429 796baeeef85a
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon Jan 29 11:51:49 2018 -0800

    Merge branch 'for-4.16/block' of git://git.kernel.dk/linux-block

It's hitting this case:

        if (WARN_ON_ONCE(n != segments)) {                                      

in nvme, indicating that rq->nr_phys_segments does not equal the
number of bios in the request. Anyone seen this? I'll take a closer
look as time permits, full trace below.


 WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3/0x430
 Modules linked in: rfcomm fuse ctr ccm bnep arc4 binfmt_misc snd_hda_codec_hdmi nls_iso8859_1 nls_cp437 vfat snd_hda_codec_conexant fat snd_hda_codec_generic iwlmvm snd_hda_intel snd_hda_codec snd_hwdep mac80211 snd_hda_core snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq x86_pkg_temp_thermal intel_powerclamp kvm_intel uvcvideo iwlwifi btusb snd_seq_device videobuf2_vmalloc btintel videobuf2_memops kvm snd_timer videobuf2_v4l2 bluetooth irqbypass videobuf2_core aesni_intel aes_x86_64 crypto_simd cryptd snd glue_helper videodev cfg80211 ecdh_generic soundcore hid_generic usbhid hid i915 psmouse e1000e ptp pps_core xhci_pci xhci_hcd intel_gtt
 CPU: 2 PID: 207 Comm: jbd2/nvme0n1p7- Tainted: G     U           4.15.0+ #176
 Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET59W (1.33 ) 12/19/2017
 RIP: 0010:nvme_setup_cmd+0x3d3/0x430
 RSP: 0018:ffff880423e9f838 EFLAGS: 00010217
 RAX: 0000000000000000 RBX: ffff880423e9f8c8 RCX: 0000000000010000
 RDX: ffff88022b200010 RSI: 0000000000000002 RDI: 00000000327f0000
 RBP: ffff880421251400 R08: ffff88022b200000 R09: 0000000000000009
 R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000ffff
 R13: ffff88042341e280 R14: 000000000000ffff R15: ffff880421251440
 FS:  0000000000000000(0000) GS:ffff880441500000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000055b684795030 CR3: 0000000002e09006 CR4: 00000000001606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  nvme_queue_rq+0x40/0xa00
  ? __sbitmap_queue_get+0x24/0x90
  ? blk_mq_get_tag+0xa3/0x250
  ? wait_woken+0x80/0x80
  ? blk_mq_get_driver_tag+0x97/0xf0
  blk_mq_dispatch_rq_list+0x7b/0x4a0
  ? deadline_remove_request+0x49/0xb0
  blk_mq_do_dispatch_sched+0x4f/0xc0
  blk_mq_sched_dispatch_requests+0x106/0x170
  __blk_mq_run_hw_queue+0x53/0xa0
  __blk_mq_delay_run_hw_queue+0x83/0xa0
  blk_mq_run_hw_queue+0x6c/0xd0
  blk_mq_sched_insert_request+0x96/0x140
  __blk_mq_try_issue_directly+0x3d/0x190
  blk_mq_try_issue_directly+0x30/0x70
  blk_mq_make_request+0x1a4/0x6a0
  generic_make_request+0xfd/0x2f0
  ? submit_bio+0x5c/0x110
  submit_bio+0x5c/0x110
  ? __blkdev_issue_discard+0x152/0x200
  submit_bio_wait+0x43/0x60
  ext4_process_freed_data+0x1cd/0x440
  ? account_page_dirtied+0xe2/0x1a0
  ext4_journal_commit_callback+0x4a/0xc0
  jbd2_journal_commit_transaction+0x17e2/0x19e0
  ? kjournald2+0xb0/0x250
  kjournald2+0xb0/0x250
  ? wait_woken+0x80/0x80
  ? commit_timeout+0x10/0x10
  kthread+0x111/0x130
  ? kthread_create_worker_on_cpu+0x50/0x50
  ? do_group_exit+0x3a/0xa0
  ret_from_fork+0x1f/0x30
 Code: 73 89 c1 83 ce 10 c1 e1 10 09 ca 83 f8 04 0f 87 0f ff ff ff 8b 4d 20 48 8b 7d 00 c1 e9 09 48 01 8c c7 00 08 00 00 e9 f8 fe ff ff <0f> ff 4c 89 c7 41 bc 0a 00 00 00 e8 0d 78 d6 ff e9 a1 fc ff ff 
 ---[ end trace 50d361cc444506c8 ]---
 print_req_error: I/O error, dev nvme0n1, sector 847167488

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3
  2018-01-30 15:41 ` Jens Axboe
@ 2018-01-30 15:57   ` Jens Axboe
  -1 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2018-01-30 15:57 UTC (permalink / raw)
  To: linux-block; +Cc: Christoph Hellwig, Ming Lei, linux-nvme

On 1/30/18 8:41 AM, Jens Axboe wrote:
> Hi,
> 
> I just hit this on 4.15+ on the laptop, it's running Linus' git
> as of yesterday, right after the block tree merge:
> 
> commit 0a4b6e2f80aad46fb55a5cf7b1664c0aef030ee0
> Merge: 9697e9da8429 796baeeef85a
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date:   Mon Jan 29 11:51:49 2018 -0800
> 
>     Merge branch 'for-4.16/block' of git://git.kernel.dk/linux-block
> 
> It's hitting this case:
> 
>         if (WARN_ON_ONCE(n != segments)) {                                      
> 
> in nvme, indicating that rq->nr_phys_segments does not equal the
> number of bios in the request. Anyone seen this? I'll take a closer
> look as time permits, full trace below.
> 
> 
>  WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3/0x430
>  Modules linked in: rfcomm fuse ctr ccm bnep arc4 binfmt_misc snd_hda_codec_hdmi nls_iso8859_1 nls_cp437 vfat snd_hda_codec_conexant fat snd_hda_codec_generic iwlmvm snd_hda_intel snd_hda_codec snd_hwdep mac80211 snd_hda_core snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq x86_pkg_temp_thermal intel_powerclamp kvm_intel uvcvideo iwlwifi btusb snd_seq_device videobuf2_vmalloc btintel videobuf2_memops kvm snd_timer videobuf2_v4l2 bluetooth irqbypass videobuf2_core aesni_intel aes_x86_64 crypto_simd cryptd snd glue_helper videodev cfg80211 ecdh_generic soundcore hid_generic usbhid hid i915 psmouse e1000e ptp pps_core xhci_pci xhci_hcd intel_gtt
>  CPU: 2 PID: 207 Comm: jbd2/nvme0n1p7- Tainted: G     U           4.15.0+ #176
>  Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET59W (1.33 ) 12/19/2017
>  RIP: 0010:nvme_setup_cmd+0x3d3/0x430
>  RSP: 0018:ffff880423e9f838 EFLAGS: 00010217
>  RAX: 0000000000000000 RBX: ffff880423e9f8c8 RCX: 0000000000010000
>  RDX: ffff88022b200010 RSI: 0000000000000002 RDI: 00000000327f0000
>  RBP: ffff880421251400 R08: ffff88022b200000 R09: 0000000000000009
>  R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000ffff
>  R13: ffff88042341e280 R14: 000000000000ffff R15: ffff880421251440
>  FS:  0000000000000000(0000) GS:ffff880441500000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 000055b684795030 CR3: 0000000002e09006 CR4: 00000000001606e0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  Call Trace:
>   nvme_queue_rq+0x40/0xa00
>   ? __sbitmap_queue_get+0x24/0x90
>   ? blk_mq_get_tag+0xa3/0x250
>   ? wait_woken+0x80/0x80
>   ? blk_mq_get_driver_tag+0x97/0xf0
>   blk_mq_dispatch_rq_list+0x7b/0x4a0
>   ? deadline_remove_request+0x49/0xb0
>   blk_mq_do_dispatch_sched+0x4f/0xc0
>   blk_mq_sched_dispatch_requests+0x106/0x170
>   __blk_mq_run_hw_queue+0x53/0xa0
>   __blk_mq_delay_run_hw_queue+0x83/0xa0
>   blk_mq_run_hw_queue+0x6c/0xd0
>   blk_mq_sched_insert_request+0x96/0x140
>   __blk_mq_try_issue_directly+0x3d/0x190
>   blk_mq_try_issue_directly+0x30/0x70
>   blk_mq_make_request+0x1a4/0x6a0
>   generic_make_request+0xfd/0x2f0
>   ? submit_bio+0x5c/0x110
>   submit_bio+0x5c/0x110
>   ? __blkdev_issue_discard+0x152/0x200
>   submit_bio_wait+0x43/0x60
>   ext4_process_freed_data+0x1cd/0x440
>   ? account_page_dirtied+0xe2/0x1a0
>   ext4_journal_commit_callback+0x4a/0xc0
>   jbd2_journal_commit_transaction+0x17e2/0x19e0
>   ? kjournald2+0xb0/0x250
>   kjournald2+0xb0/0x250
>   ? wait_woken+0x80/0x80
>   ? commit_timeout+0x10/0x10
>   kthread+0x111/0x130
>   ? kthread_create_worker_on_cpu+0x50/0x50
>   ? do_group_exit+0x3a/0xa0
>   ret_from_fork+0x1f/0x30
>  Code: 73 89 c1 83 ce 10 c1 e1 10 09 ca 83 f8 04 0f 87 0f ff ff ff 8b 4d 20 48 8b 7d 00 c1 e9 09 48 01 8c c7 00 08 00 00 e9 f8 fe ff ff <0f> ff 4c 89 c7 41 bc 0a 00 00 00 e8 0d 78 d6 ff e9 a1 fc ff ff 
>  ---[ end trace 50d361cc444506c8 ]---
>  print_req_error: I/O error, dev nvme0n1, sector 847167488

Looking at the disassembly, 'n' is 2 and 'segments' is 0xffff.
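
For reference, that check lives in nvme_setup_discard(), which nvme_setup_cmd()
calls for REQ_OP_DISCARD. A rough sketch of the flow (paraphrased from the
4.15-era driver, not a verbatim quote of core.c):

static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
		struct nvme_command *cmnd)
{
	/* 'segments' comes from blk_rq_nr_discard_segments(), i.e.
	 * max(rq->nr_phys_segments, 1) */
	unsigned short segments = blk_rq_nr_discard_segments(req), n = 0;
	struct nvme_dsm_range *range;
	struct bio *bio;

	/* one DSM range entry is allocated per expected segment */
	range = kmalloc_array(segments, sizeof(*range), GFP_ATOMIC);
	if (!range)
		return BLK_STS_RESOURCE;

	/* ... and one entry is filled in per bio in the request */
	__rq_for_each_bio(bio, req) {
		/* fill range[n] (slba/nlb) from bio->bi_iter */
		n++;
	}

	/* the warning fires when bio count and segment count disagree */
	if (WARN_ON_ONCE(n != segments)) {
		kfree(range);
		return BLK_STS_IOERR;
	}
	...
}

So 'n' is the number of bios walked and 'segments' is derived from
rq->nr_phys_segments, which is why a bogus 0xffff there trips the warning and
turns the discard into an I/O error.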

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3
  2018-01-30 15:57   ` Jens Axboe
@ 2018-01-30 20:30     ` Keith Busch
  -1 siblings, 0 replies; 42+ messages in thread
From: Keith Busch @ 2018-01-30 20:30 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Ming Lei, linux-nvme

On Tue, Jan 30, 2018 at 08:57:49AM -0700, Jens Axboe wrote:
> 
> Looking at the disassembly, 'n' is 2 and 'segments' is 0xffff.

Is this still a problem if you don't use an IO scheduler? With deadline,
I'm not finding any path to bio_attempt_discard_merge, which is where
nr_phys_segments is supposed to get set to 2. Not sure how it could
become 0xffff, though.
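
For reference, bio_attempt_discard_merge() is where a discard bio gets chained
onto an existing discard request and the segment count bumped. A sketch of the
4.15-era helper (paraphrased from memory, treat it as illustrative rather than
a verbatim quote):

bool bio_attempt_discard_merge(struct request_queue *q, struct request *req,
		struct bio *bio)
{
	unsigned short segments = blk_rq_nr_discard_segments(req);

	if (segments >= queue_max_discard_segments(q))
		goto no_merge;
	if (blk_rq_sectors(req) + bio_sectors(bio) >
	    blk_rq_get_max_sectors(req, blk_rq_pos(req)))
		goto no_merge;

	/* chain the bio onto the request... */
	req->biotail->bi_next = bio;
	req->biotail = bio;
	req->__data_len += bio->bi_iter.bi_size;

	/* ...and this is where a two-bio discard ends up with 2 segments */
	req->nr_phys_segments = segments + 1;
	return true;

no_merge:
	req_set_nomerge(q, req);
	return false;
}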

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3
  2018-01-30 20:30     ` Keith Busch
@ 2018-01-30 20:32       ` Jens Axboe
  -1 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2018-01-30 20:32 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-block, Christoph Hellwig, Ming Lei, linux-nvme

On 1/30/18 1:30 PM, Keith Busch wrote:
> On Tue, Jan 30, 2018 at 08:57:49AM -0700, Jens Axboe wrote:
>>
>> Looking at the disassembly, 'n' is 2 and 'segments' is 0xffff.
> 
> Is this still a problem if you don't use an IO scheduler? With deadline,
> I'm not finding any path to bio_attempt_discard_merge, which is where
> nr_phys_segments is supposed to get set to 2. Not sure how it could
> become 0xffff, though.

blk_mq_make_request() -> blk_mq_sched_bio_merge() -> __blk_mq_sched_bio_merge()
	-> blk_mq_attempt_merge() -> bio_attempt_discard_merge()

Doesn't matter what IO scheduler you use.

I don't know if it triggers without a scheduler. I've been running this code
continually on the laptop (always do), and haven't seen it before today.


-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3
  2018-01-30 20:32       ` Jens Axboe
@ 2018-01-30 20:49         ` Keith Busch
  -1 siblings, 0 replies; 42+ messages in thread
From: Keith Busch @ 2018-01-30 20:49 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Ming Lei, linux-nvme

On Tue, Jan 30, 2018 at 01:32:25PM -0700, Jens Axboe wrote:
> On 1/30/18 1:30 PM, Keith Busch wrote:
> > On Tue, Jan 30, 2018 at 08:57:49AM -0700, Jens Axboe wrote:
> >>
> >> Looking at the disassembly, 'n' is 2 and 'segments' is 0xffff.
> > 
> > Is this still a problem if you don't use an IO scheduler? With deadline,
> > I'm not finding any path to bio_attempt_discard_merge, which is where
> > nr_phys_segments is supposed to get set to 2. Not sure how it could
> > become 0xffff, though.
> 
> blk_mq_make_request() -> blk_mq_sched_bio_merge() -> __blk_mq_sched_bio_merge()
> 	-> blk_mq_attempt_merge() -> bio_attempt_discard_merge()

That's the call chain only if you don't have an elevator_queue, right? With
deadline, it looks like it goes through this path (ftrace confirms):

  __blk_mq_sched_bio_merge() -> dd_bio_merge() -> blk_mq_sched_try_merge()

Which doesn't have a case for ELEVATOR_DISCARD_MERGE.
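
For context, the 4.15-era blk_mq_sched_try_merge() only knows about back and
front merges, roughly (paraphrased, not a verbatim quote):

bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
			    struct request **merged_request)
{
	struct request *rq;

	switch (elv_merge(q, &rq, bio)) {
	case ELEVATOR_BACK_MERGE:
		if (!blk_mq_sched_allow_merge(q, rq, bio))
			return false;
		if (!bio_attempt_back_merge(q, rq, bio))
			return false;
		*merged_request = attempt_back_merge(q, rq);
		if (!*merged_request)
			elv_merged_request(q, rq, ELEVATOR_BACK_MERGE);
		return true;
	case ELEVATOR_FRONT_MERGE:
		if (!blk_mq_sched_allow_merge(q, rq, bio))
			return false;
		if (!bio_attempt_front_merge(q, rq, bio))
			return false;
		*merged_request = attempt_front_merge(q, rq);
		if (!*merged_request)
			elv_merged_request(q, rq, ELEVATOR_FRONT_MERGE);
		return true;
	default:
		/* no ELEVATOR_DISCARD_MERGE case, so discards don't get merged here */
		return false;
	}
}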

Relevant function_graph:

  46)               |                    blk_mq_make_request() {
  46)   0.133 us    |                      blk_queue_bounce();
  46)   0.370 us    |                      blk_queue_split();
  46)   0.314 us    |                      bio_integrity_prep();
  46)   0.081 us    |                      blk_attempt_plug_merge();
  46)               |                      __blk_mq_sched_bio_merge() {
  46)               |                        dd_bio_merge() {
  46)   0.792 us    |                          _raw_spin_lock();
  46)               |                          blk_mq_sched_try_merge() {

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3
  2018-01-30 20:49         ` Keith Busch
@ 2018-01-30 20:55           ` Jens Axboe
  -1 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2018-01-30 20:55 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-block, Christoph Hellwig, Ming Lei, linux-nvme

On 1/30/18 1:49 PM, Keith Busch wrote:
> On Tue, Jan 30, 2018 at 01:32:25PM -0700, Jens Axboe wrote:
>> On 1/30/18 1:30 PM, Keith Busch wrote:
>>> On Tue, Jan 30, 2018 at 08:57:49AM -0700, Jens Axboe wrote:
>>>>
>>>> Looking at the disassembly, 'n' is 2 and 'segments' is 0xffff.
>>>
>>> Is this still a problem if you don't use an IO scheduler? With deadline,
>>> I'm not finding any path to bio_attempt_discard_merge, which is where
>>> nr_phys_segments is supposed to get set to 2. Not sure how it could
>>> become 0xffff, though.
>>
>> blk_mq_make_request() -> blk_mq_sched_bio_merge() -> __blk_mq_sched_bio_merge()
>> 	-> blk_mq_attempt_merge() -> bio_attempt_discard_merge()
> 
> That's the call chain only if you don't have an elevator_queue, right? With
> deadline, it looks like it goes through this path (ftrace confirms):
> 
>   __blk_mq_sched_bio_merge() -> dd_bio_merge() -> blk_mq_sched_try_merge()
> 
> Which doesn't have a case for ELEVATOR_DISCARD_MERGE.
> 
> Relevant function_graph:
> 
>   46)               |                    blk_mq_make_request() {
>   46)   0.133 us    |                      blk_queue_bounce();
>   46)   0.370 us    |                      blk_queue_split();
>   46)   0.314 us    |                      bio_integrity_prep();
>   46)   0.081 us    |                      blk_attempt_plug_merge();
>   46)               |                      __blk_mq_sched_bio_merge() {
>   46)               |                        dd_bio_merge() {
>   46)   0.792 us    |                          _raw_spin_lock();
>   46)               |                          blk_mq_sched_try_merge() {

Yeah I guess you are right, it can't happen for mq-deadline since the
generic sched path doesn't support it.

I'll see if I can make it happen again, but I'm not too hopeful. And if
I can't, it's hard to compare with "none" to see if it makes a difference
or not.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3
  2018-01-30 15:57   ` Jens Axboe
@ 2018-01-31  4:25     ` jianchao.wang
  -1 siblings, 0 replies; 42+ messages in thread
From: jianchao.wang @ 2018-01-31  4:25 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Christoph Hellwig, Ming Lei, linux-nvme

Hi Jens

On 01/30/2018 11:57 PM, Jens Axboe wrote:
> On 1/30/18 8:41 AM, Jens Axboe wrote:
>> Hi,
>>
>> I just hit this on 4.15+ on the laptop, it's running Linus' git
>> as of yesterday, right after the block tree merge:
>>
>> commit 0a4b6e2f80aad46fb55a5cf7b1664c0aef030ee0
>> Merge: 9697e9da8429 796baeeef85a
>> Author: Linus Torvalds <torvalds@linux-foundation.org>
>> Date:   Mon Jan 29 11:51:49 2018 -0800
>>
>>     Merge branch 'for-4.16/block' of git://git.kernel.dk/linux-block
>>
>> It's hitting this case:
>>
>>         if (WARN_ON_ONCE(n != segments)) {                                      
>>
>> in nvme, indicating that rq->nr_phys_segments does not equal the
>> number of bios in the request. Anyone seen this? I'll take a closer
>> look as time permits, full trace below.
>>
>>
>>  WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3/0x430
>>  Modules linked in: rfcomm fuse ctr ccm bnep arc4 binfmt_misc snd_hda_codec_hdmi nls_iso8859_1 nls_cp437 vfat snd_hda_codec_conexant fat snd_hda_codec_generic iwlmvm snd_hda_intel snd_hda_codec snd_hwdep mac80211 snd_hda_core snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq x86_pkg_temp_thermal intel_powerclamp kvm_intel uvcvideo iwlwifi btusb snd_seq_device videobuf2_vmalloc btintel videobuf2_memops kvm snd_timer videobuf2_v4l2 bluetooth irqbypass videobuf2_core aesni_intel aes_x86_64 crypto_simd cryptd snd glue_helper videodev cfg80211 ecdh_generic soundcore hid_generic usbhid hid i915 psmouse e1000e ptp pps_core xhci_pci xhci_hcd intel_gtt
>>  CPU: 2 PID: 207 Comm: jbd2/nvme0n1p7- Tainted: G     U           4.15.0+ #176
>>  Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET59W (1.33 ) 12/19/2017
>>  RIP: 0010:nvme_setup_cmd+0x3d3/0x430
>>  RSP: 0018:ffff880423e9f838 EFLAGS: 00010217
>>  RAX: 0000000000000000 RBX: ffff880423e9f8c8 RCX: 0000000000010000
>>  RDX: ffff88022b200010 RSI: 0000000000000002 RDI: 00000000327f0000
>>  RBP: ffff880421251400 R08: ffff88022b200000 R09: 0000000000000009
>>  R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000ffff
>>  R13: ffff88042341e280 R14: 000000000000ffff R15: ffff880421251440
>>  FS:  0000000000000000(0000) GS:ffff880441500000(0000) knlGS:0000000000000000
>>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>  CR2: 000055b684795030 CR3: 0000000002e09006 CR4: 00000000001606e0
>>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>  Call Trace:
>>   nvme_queue_rq+0x40/0xa00
>>   ? __sbitmap_queue_get+0x24/0x90
>>   ? blk_mq_get_tag+0xa3/0x250
>>   ? wait_woken+0x80/0x80
>>   ? blk_mq_get_driver_tag+0x97/0xf0
>>   blk_mq_dispatch_rq_list+0x7b/0x4a0
>>   ? deadline_remove_request+0x49/0xb0
>>   blk_mq_do_dispatch_sched+0x4f/0xc0
>>   blk_mq_sched_dispatch_requests+0x106/0x170
>>   __blk_mq_run_hw_queue+0x53/0xa0
>>   __blk_mq_delay_run_hw_queue+0x83/0xa0
>>   blk_mq_run_hw_queue+0x6c/0xd0
>>   blk_mq_sched_insert_request+0x96/0x140
>>   __blk_mq_try_issue_directly+0x3d/0x190
>>   blk_mq_try_issue_directly+0x30/0x70
>>   blk_mq_make_request+0x1a4/0x6a0
>>   generic_make_request+0xfd/0x2f0
>>   ? submit_bio+0x5c/0x110
>>   submit_bio+0x5c/0x110
>>   ? __blkdev_issue_discard+0x152/0x200
>>   submit_bio_wait+0x43/0x60
>>   ext4_process_freed_data+0x1cd/0x440
>>   ? account_page_dirtied+0xe2/0x1a0
>>   ext4_journal_commit_callback+0x4a/0xc0
>>   jbd2_journal_commit_transaction+0x17e2/0x19e0
>>   ? kjournald2+0xb0/0x250
>>   kjournald2+0xb0/0x250
>>   ? wait_woken+0x80/0x80
>>   ? commit_timeout+0x10/0x10
>>   kthread+0x111/0x130
>>   ? kthread_create_worker_on_cpu+0x50/0x50
>>   ? do_group_exit+0x3a/0xa0
>>   ret_from_fork+0x1f/0x30
>>  Code: 73 89 c1 83 ce 10 c1 e1 10 09 ca 83 f8 04 0f 87 0f ff ff ff 8b 4d 20 48 8b 7d 00 c1 e9 09 48 01 8c c7 00 08 00 00 e9 f8 fe ff ff <0f> ff 4c 89 c7 41 bc 0a 00 00 00 e8 0d 78 d6 ff e9 a1 fc ff ff 
>>  ---[ end trace 50d361cc444506c8 ]---
>>  print_req_error: I/O error, dev nvme0n1, sector 847167488
> 
> Looking at the disassembly, 'n' is 2 and 'segments' is 0xffff.
> 

Let's consider the following case:

blk_mq_bio_to_request()
  -> blk_init_request_from_bio()
    -> blk_rq_bio_prep()
    ----
    if (bio_has_data(bio))
        rq->nr_phys_segments = bio_phys_segments(q, bio);
    ----
static inline bool bio_has_data(struct bio *bio)
{
    if (bio &&
        bio->bi_iter.bi_size &&
        bio_op(bio) != REQ_OP_DISCARD &&   //-----> HERE !
        bio_op(bio) != REQ_OP_SECURE_ERASE &&
        bio_op(bio) != REQ_OP_WRITE_ZEROES)
        return true;

    return false;
}
For the DISCARD req, the nr_phys_segments is 0.

dd_insert_request()
  -> blk_mq_sched_try_insert_merge()
    -> elv_attempt_insert_merge()
      -> blk_attempt_req_merge()
        -> attempt_merge()
          -> ll_merge_requests_fn()
----
    total_phys_segments = req->nr_phys_segments + next->nr_phys_segments;
    // total_phys_segments will be 0
    if (blk_phys_contig_segment(q, req->biotail, next->bio)) {
        if (req->nr_phys_segments == 1)
            req->bio->bi_seg_front_size = seg_size;
        if (next->nr_phys_segments == 1)
            next->biotail->bi_seg_back_size = seg_size;
        // total_phys_segments is int, it will be -1;
        total_phys_segments--;
    }

    //total_phys_segments is -1, so it is false here
    if (total_phys_segments > queue_max_segments(q))
        return 0;

    if (blk_integrity_merge_rq(q, req, next) == false)
        return 0;

    /* Merge is OK... */
    // -1 is set to req->nr_phys_segments which is an unsigned short
    // finally, we get a 0xffff
    req->nr_phys_segments = total_phys_segments;
----
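
A standalone userspace sketch of the arithmetic above, just to make the
wraparound concrete (plain C, not kernel code):

#include <stdio.h>

int main(void)
{
	/* two DISCARD requests being merged, each with nr_phys_segments == 0 */
	unsigned short req_segs = 0, next_segs = 0;

	int total_phys_segments = req_segs + next_segs;	/* 0 */
	total_phys_segments--;				/* "contiguous" branch taken: -1 */

	/* assigning the int back to the unsigned short field wraps around */
	unsigned short nr_phys_segments = total_phys_segments;
	printf("nr_phys_segments = 0x%x\n", nr_phys_segments);	/* prints 0xffff */
	return 0;
}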

Thanks
Jianchao

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3
  2018-01-31  4:25     ` jianchao.wang
@ 2018-01-31 15:29       ` Jens Axboe
  -1 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2018-01-31 15:29 UTC (permalink / raw)
  To: jianchao.wang, linux-block; +Cc: Christoph Hellwig, Ming Lei, linux-nvme

On 1/30/18 9:25 PM, jianchao.wang wrote:
> Hi Jens
> 
> On 01/30/2018 11:57 PM, Jens Axboe wrote:
>> On 1/30/18 8:41 AM, Jens Axboe wrote:
>>> Hi,
>>>
>>> I just hit this on 4.15+ on the laptop, it's running Linus' git
>>> as of yesterday, right after the block tree merge:
>>>
>>> commit 0a4b6e2f80aad46fb55a5cf7b1664c0aef030ee0
>>> Merge: 9697e9da8429 796baeeef85a
>>> Author: Linus Torvalds <torvalds@linux-foundation.org>
>>> Date:   Mon Jan 29 11:51:49 2018 -0800
>>>
>>>     Merge branch 'for-4.16/block' of git://git.kernel.dk/linux-block
>>>
>>> It's hitting this case:
>>>
>>>         if (WARN_ON_ONCE(n != segments)) {                                      
>>>
>>> in nvme, indicating that rq->nr_phys_segments does not equal the
>>> number of bios in the request. Anyone seen this? I'll take a closer
>>> look as time permits, full trace below.
>>>
>>>
>>>  WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3/0x430
>>>  Modules linked in: rfcomm fuse ctr ccm bnep arc4 binfmt_misc snd_hda_codec_hdmi nls_iso8859_1 nls_cp437 vfat snd_hda_codec_conexant fat snd_hda_codec_generic iwlmvm snd_hda_intel snd_hda_codec snd_hwdep mac80211 snd_hda_core snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq x86_pkg_temp_thermal intel_powerclamp kvm_intel uvcvideo iwlwifi btusb snd_seq_device videobuf2_vmalloc btintel videobuf2_memops kvm snd_timer videobuf2_v4l2 bluetooth irqbypass videobuf2_core aesni_intel aes_x86_64 crypto_simd cryptd snd glue_helper videodev cfg80211 ecdh_generic soundcore hid_generic usbhid hid i915 psmouse e1000e ptp pps_core xhci_pci xhci_hcd intel_gtt
>>>  CPU: 2 PID: 207 Comm: jbd2/nvme0n1p7- Tainted: G     U           4.15.0+ #176
>>>  Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET59W (1.33 ) 12/19/2017
>>>  RIP: 0010:nvme_setup_cmd+0x3d3/0x430
>>>  RSP: 0018:ffff880423e9f838 EFLAGS: 00010217
>>>  RAX: 0000000000000000 RBX: ffff880423e9f8c8 RCX: 0000000000010000
>>>  RDX: ffff88022b200010 RSI: 0000000000000002 RDI: 00000000327f0000
>>>  RBP: ffff880421251400 R08: ffff88022b200000 R09: 0000000000000009
>>>  R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000ffff
>>>  R13: ffff88042341e280 R14: 000000000000ffff R15: ffff880421251440
>>>  FS:  0000000000000000(0000) GS:ffff880441500000(0000) knlGS:0000000000000000
>>>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>  CR2: 000055b684795030 CR3: 0000000002e09006 CR4: 00000000001606e0
>>>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>  Call Trace:
>>>   nvme_queue_rq+0x40/0xa00
>>>   ? __sbitmap_queue_get+0x24/0x90
>>>   ? blk_mq_get_tag+0xa3/0x250
>>>   ? wait_woken+0x80/0x80
>>>   ? blk_mq_get_driver_tag+0x97/0xf0
>>>   blk_mq_dispatch_rq_list+0x7b/0x4a0
>>>   ? deadline_remove_request+0x49/0xb0
>>>   blk_mq_do_dispatch_sched+0x4f/0xc0
>>>   blk_mq_sched_dispatch_requests+0x106/0x170
>>>   __blk_mq_run_hw_queue+0x53/0xa0
>>>   __blk_mq_delay_run_hw_queue+0x83/0xa0
>>>   blk_mq_run_hw_queue+0x6c/0xd0
>>>   blk_mq_sched_insert_request+0x96/0x140
>>>   __blk_mq_try_issue_directly+0x3d/0x190
>>>   blk_mq_try_issue_directly+0x30/0x70
>>>   blk_mq_make_request+0x1a4/0x6a0
>>>   generic_make_request+0xfd/0x2f0
>>>   ? submit_bio+0x5c/0x110
>>>   submit_bio+0x5c/0x110
>>>   ? __blkdev_issue_discard+0x152/0x200
>>>   submit_bio_wait+0x43/0x60
>>>   ext4_process_freed_data+0x1cd/0x440
>>>   ? account_page_dirtied+0xe2/0x1a0
>>>   ext4_journal_commit_callback+0x4a/0xc0
>>>   jbd2_journal_commit_transaction+0x17e2/0x19e0
>>>   ? kjournald2+0xb0/0x250
>>>   kjournald2+0xb0/0x250
>>>   ? wait_woken+0x80/0x80
>>>   ? commit_timeout+0x10/0x10
>>>   kthread+0x111/0x130
>>>   ? kthread_create_worker_on_cpu+0x50/0x50
>>>   ? do_group_exit+0x3a/0xa0
>>>   ret_from_fork+0x1f/0x30
>>>  Code: 73 89 c1 83 ce 10 c1 e1 10 09 ca 83 f8 04 0f 87 0f ff ff ff 8b 4d 20 48 8b 7d 00 c1 e9 09 48 01 8c c7 00 08 00 00 e9 f8 fe ff ff <0f> ff 4c 89 c7 41 bc 0a 00 00 00 e8 0d 78 d6 ff e9 a1 fc ff ff 
>>>  ---[ end trace 50d361cc444506c8 ]---
>>>  print_req_error: I/O error, dev nvme0n1, sector 847167488
>>
>> Looking at the disassembly, 'n' is 2 and 'segments' is 0xffff.
>>
> 
> Let's consider the following case:
> 
> blk_mq_bio_to_request()
>   -> blk_init_request_from_bio()
>     -> blk_rq_bio_prep()
>     ----
>     if (bio_has_data(bio))
>         rq->nr_phys_segments = bio_phys_segments(q, bio);
>     ----
> static inline bool bio_has_data(struct bio *bio)
> {
>     if (bio &&
>         bio->bi_iter.bi_size &&
>         bio_op(bio) != REQ_OP_DISCARD &&   //-----> HERE !
>         bio_op(bio) != REQ_OP_SECURE_ERASE &&
>         bio_op(bio) != REQ_OP_WRITE_ZEROES)
>         return true;
> 
>     return false;
> }
> For the DISCARD req, the nr_phys_segments is 0.
> 
> dd_insert_request()
>   -> blk_mq_sched_try_insert_merge()
>     -> elv_attempt_insert_merge()
>       -> blk_attempt_req_merge()
>         -> attempt_merge()
>           -> ll_merge_requests_fn()
> ----
>     total_phys_segments = req->nr_phys_segments + next->nr_phys_segments;
>     // total_phys_segments will be 0
>     if (blk_phys_contig_segment(q, req->biotail, next->bio)) {
>         if (req->nr_phys_segments == 1)
>             req->bio->bi_seg_front_size = seg_size;
>         if (next->nr_phys_segments == 1)
>             next->biotail->bi_seg_back_size = seg_size;
>         // total_phys_segments is int, it will be -1;
>         total_phys_segments--;
>     }
> 
>     //total_phys_segments is -1, so it is false here
>     if (total_phys_segments > queue_max_segments(q))
>         return 0;
> 
>     if (blk_integrity_merge_rq(q, req, next) == false)
>         return 0;
> 
>     /* Merge is OK... */
>     // -1 is set to req->nr_phys_segments which is an unsigned short
>     // finally, we get a 0xffff
>     req->nr_phys_segments = total_phys_segments;

That's a great analysis, and it looks correct. The problem is that
blk_phys_contig_segment() pretends that the segments are contig
if none of them have data, which is correct from the point of view
of not needing a new segment to be mergeable. But it fails pretty
miserably for this case.

How about something like the below?


diff --git a/block/blk-merge.c b/block/blk-merge.c
index 8452fc7164cc..cee102fb060e 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -574,8 +574,13 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
 	    blk_rq_get_max_sectors(req, blk_rq_pos(req)))
 		return 0;
 
+	/*
+	 * For DISCARDs, the segment count isn't interesting since
+	 * the requests have no data attached.
+	 */
 	total_phys_segments = req->nr_phys_segments + next->nr_phys_segments;
-	if (blk_phys_contig_segment(q, req->biotail, next->bio)) {
+	if (total_phys_segments &&
+	    blk_phys_contig_segment(q, req->biotail, next->bio)) {
 		if (req->nr_phys_segments == 1)
 			req->bio->bi_seg_front_size = seg_size;
 		if (next->nr_phys_segments == 1)

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3
  2018-01-31 15:29       ` Jens Axboe
@ 2018-01-31 23:33         ` Keith Busch
  -1 siblings, 0 replies; 42+ messages in thread
From: Keith Busch @ 2018-01-31 23:33 UTC (permalink / raw)
  To: Jens Axboe
  Cc: jianchao.wang, linux-block, Christoph Hellwig, Ming Lei, linux-nvme

On Wed, Jan 31, 2018 at 08:29:37AM -0700, Jens Axboe wrote:
> 
> How about something like the below?
> 
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 8452fc7164cc..cee102fb060e 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -574,8 +574,13 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>  	    blk_rq_get_max_sectors(req, blk_rq_pos(req)))
>  		return 0;
>  
> +	/*
> +	 * For DISCARDs, the segment count isn't interesting since
> +	 * the requests have no data attached.
> +	 */
>  	total_phys_segments = req->nr_phys_segments + next->nr_phys_segments;
> -	if (blk_phys_contig_segment(q, req->biotail, next->bio)) {
> +	if (total_phys_segments &&
> +	    blk_phys_contig_segment(q, req->biotail, next->bio)) {
>  		if (req->nr_phys_segments == 1)
>  			req->bio->bi_seg_front_size = seg_size;
>  		if (next->nr_phys_segments == 1)

That'll keep it from going to 0xffff, but you'll still hit the warning and
IO error. Even worse, this will corrupt memory: blk_rq_nr_discard_segments
will return 1, and since you really had 2 segments, the nvme driver will
overrun its array.
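
For context, the discard setup in the nvme driver looks roughly like the
below (paraphrased from drivers/nvme/host/core.c, with error handling and
the cattr field trimmed): the range array is sized from
blk_rq_nr_discard_segments(), while the fill loop walks every bio in the
request, so an undercounted segment value means writing past the
allocation before the n != segments warning ever fires.

	unsigned short segments = blk_rq_nr_discard_segments(req), n = 0;
	struct nvme_dsm_range *range;
	struct bio *bio;

	range = kmalloc_array(segments, sizeof(*range), GFP_ATOMIC);
	if (!range)
		return BLK_STS_RESOURCE;

	__rq_for_each_bio(bio, req) {
		/* with segments == 1 but two bios, n runs past the array */
		range[n].slba = cpu_to_le64(nvme_block_nr(ns, bio->bi_iter.bi_sector));
		range[n].nlb = cpu_to_le32(bio->bi_iter.bi_size >> ns->lba_shift);
		n++;
	}

	if (WARN_ON_ONCE(n != segments))
		return BLK_STS_IOERR;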

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3
  2018-01-31 15:29       ` Jens Axboe
@ 2018-02-01  3:03         ` jianchao.wang
  -1 siblings, 0 replies; 42+ messages in thread
From: jianchao.wang @ 2018-02-01  3:03 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Christoph Hellwig, linux-nvme, Ming Lei

Hi Jens


On 01/31/2018 11:29 PM, Jens Axboe wrote:
> How about something like the below?
> 
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 8452fc7164cc..cee102fb060e 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -574,8 +574,13 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>  	    blk_rq_get_max_sectors(req, blk_rq_pos(req)))
>  		return 0;
>  
> +	/*
> +	 * For DISCARDs, the segment count isn't interesting since
> +	 * the requests have no data attached.
> +	 */
>  	total_phys_segments = req->nr_phys_segments + next->nr_phys_segments;
> -	if (blk_phys_contig_segment(q, req->biotail, next->bio)) {
> +	if (total_phys_segments &&
> +	    blk_phys_contig_segment(q, req->biotail, next->bio)) {
>  		if (req->nr_phys_segments == 1)
>  			req->bio->bi_seg_front_size = seg_size;
>  		if (next->nr_phys_segments == 1)

This patch will keep nr_phys_segments from being set to 0xffff,
but the merged req will have two bios and zero nr_phys_segments.

We have to align with the DISCARD merging strategy.

Please refer to:
/*
 * Number of discard segments (or ranges) the driver needs to fill in.
 * Each discard bio merged into a request is counted as one segment.
 */
static inline unsigned short blk_rq_nr_discard_segments(struct request *rq)
{
       return max_t(unsigned short, rq->nr_phys_segments, 1);
}
bool bio_attempt_discard_merge(struct request_queue *q, struct request *req,
		struct bio *bio)
{
	unsigned short segments = blk_rq_nr_discard_segments(req);

	if (segments >= queue_max_discard_segments(q))
		goto no_merge;
	if (blk_rq_sectors(req) + bio_sectors(bio) >
	    blk_rq_get_max_sectors(req, blk_rq_pos(req)))
		goto no_merge;

	req->biotail->bi_next = bio;
	req->biotail = bio;
	req->__data_len += bio->bi_iter.bi_size;
	req->ioprio = ioprio_best(req->ioprio, bio_prio(bio));
	req->nr_phys_segments = segments + 1;

	blk_account_io_start(req, false);
	return true;
no_merge:
	req_set_nomerge(q, req);
	return false;
}

blk_rq_nr_discard_segments will get a wrong value finally.

Maybe we could change blk_rq_nr_discard_segments to iterate over and count the bios in a
request to determine the DISCARD request's segment count, and drop the nr_phys_segments
updates in the DISCARD merging path, on top of your patch here.
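
Roughly like this (just a sketch of the idea, assuming __rq_for_each_bio
is usable from this header):

static inline unsigned short blk_rq_nr_discard_segments(struct request *rq)
{
	struct bio *bio;
	unsigned short count = 0;

	/* one discard range per bio in the request */
	__rq_for_each_bio(bio, rq)
		count++;

	return count;
}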

Thanks
Jianchao

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3
  2018-01-31 23:33         ` Keith Busch
@ 2018-02-01  3:03           ` Jens Axboe
  -1 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2018-02-01  3:03 UTC (permalink / raw)
  To: Keith Busch
  Cc: jianchao.wang, linux-block, Christoph Hellwig, Ming Lei, linux-nvme

On 1/31/18 4:33 PM, Keith Busch wrote:
> On Wed, Jan 31, 2018 at 08:29:37AM -0700, Jens Axboe wrote:
>>
>> How about something like the below?
>>
>>
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index 8452fc7164cc..cee102fb060e 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -574,8 +574,13 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>>  	    blk_rq_get_max_sectors(req, blk_rq_pos(req)))
>>  		return 0;
>>  
>> +	/*
>> +	 * For DISCARDs, the segment count isn't interesting since
>> +	 * the requests have no data attached.
>> +	 */
>>  	total_phys_segments = req->nr_phys_segments + next->nr_phys_segments;
>> -	if (blk_phys_contig_segment(q, req->biotail, next->bio)) {
>> +	if (total_phys_segments &&
>> +	    blk_phys_contig_segment(q, req->biotail, next->bio)) {
>>  		if (req->nr_phys_segments == 1)
>>  			req->bio->bi_seg_front_size = seg_size;
>>  		if (next->nr_phys_segments == 1)
> 
> That'll keep it from going to 0xffff, but you'll still hit the warning and
> IO error. Even worse, this will corrupt memory: blk_rq_nr_discard_segments
> will return 1, and since you really had 2 segments, the nvme driver will
> overrun its array.

Yeah you are right, that patch was shit. How about the below? We only
need to worry about segment size and number of segments if we are
carrying data. req->biotail and next->bio must be the same type, so
should be safe.
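
For reference, bio_has_data() looks roughly like this (paraphrased from
include/linux/bio.h, so treat the exact op list as approximate). Discard
style bios have a non-zero bi_size but no data attached, so it returns
false for them, which is what makes the check below do the right thing:

static inline bool bio_has_data(struct bio *bio)
{
	if (bio &&
	    bio->bi_iter.bi_size &&
	    bio_op(bio) != REQ_OP_DISCARD &&
	    bio_op(bio) != REQ_OP_SECURE_ERASE &&
	    bio_op(bio) != REQ_OP_WRITE_ZEROES)
		return true;

	return false;
}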


diff --git a/block/blk-merge.c b/block/blk-merge.c
index 8452fc7164cc..cf9adc4c64b5 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -553,9 +553,7 @@ static bool req_no_special_merge(struct request *req)
 static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
 				struct request *next)
 {
-	int total_phys_segments;
-	unsigned int seg_size =
-		req->biotail->bi_seg_back_size + next->bio->bi_seg_front_size;
+	int total_phys_segments = 0;
 
 	/*
 	 * First check if the either of the requests are re-queued
@@ -574,17 +572,27 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
 	    blk_rq_get_max_sectors(req, blk_rq_pos(req)))
 		return 0;
 
-	total_phys_segments = req->nr_phys_segments + next->nr_phys_segments;
-	if (blk_phys_contig_segment(q, req->biotail, next->bio)) {
-		if (req->nr_phys_segments == 1)
-			req->bio->bi_seg_front_size = seg_size;
-		if (next->nr_phys_segments == 1)
-			next->biotail->bi_seg_back_size = seg_size;
-		total_phys_segments--;
-	}
+	/*
+	 * If the requests aren't carrying any data payloads, we don't need
+	 * to look at the segment count
+	 */
+	if (bio_has_data(next->bio)) {
+		total_phys_segments = req->nr_phys_segments +
+					next->nr_phys_segments;
+		if (blk_phys_contig_segment(q, req->biotail, next->bio)) {
+			unsigned int seg_size = req->biotail->bi_seg_back_size +
+						next->bio->bi_seg_front_size;
+
+			if (req->nr_phys_segments == 1)
+				req->bio->bi_seg_front_size = seg_size;
+			if (next->nr_phys_segments == 1)
+				next->biotail->bi_seg_back_size = seg_size;
+			total_phys_segments--;
+		}
 
-	if (total_phys_segments > queue_max_segments(q))
-		return 0;
+		if (total_phys_segments > queue_max_segments(q))
+			return 0;
+	}
 
 	if (blk_integrity_merge_rq(q, req, next) == false)
 		return 0;

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3
  2018-02-01  3:03         ` jianchao.wang
@ 2018-02-01  3:07           ` Jens Axboe
  -1 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2018-02-01  3:07 UTC (permalink / raw)
  To: jianchao.wang, linux-block; +Cc: Christoph Hellwig, linux-nvme, Ming Lei

On 1/31/18 8:03 PM, jianchao.wang wrote:
> Hi Jens
> 
> 
> On 01/31/2018 11:29 PM, Jens Axboe wrote:
>> How about something like the below?
>>
>>
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index 8452fc7164cc..cee102fb060e 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -574,8 +574,13 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>>  	    blk_rq_get_max_sectors(req, blk_rq_pos(req)))
>>  		return 0;
>>  
>> +	/*
>> +	 * For DISCARDs, the segment count isn't interesting since
>> +	 * the requests have no data attached.
>> +	 */
>>  	total_phys_segments = req->nr_phys_segments + next->nr_phys_segments;
>> -	if (blk_phys_contig_segment(q, req->biotail, next->bio)) {
>> +	if (total_phys_segments &&
>> +	    blk_phys_contig_segment(q, req->biotail, next->bio)) {
>>  		if (req->nr_phys_segments == 1)
>>  			req->bio->bi_seg_front_size = seg_size;
>>  		if (next->nr_phys_segments == 1)
> 
> This patch will avoid the nr_phys_segments to be set to 0xffff,
> but the merged req will have two bios but zero nr_phys_segments.
> 
> We have to align with the DISCARD merging strategy.
> 
> Please refer to:
> /*
>  * Number of discard segments (or ranges) the driver needs to fill in.
>  * Each discard bio merged into a request is counted as one segment.
>  */
> static inline unsigned short blk_rq_nr_discard_segments(struct request *rq)
> {
>        return max_t(unsigned short, rq->nr_phys_segments, 1);
> }
> bool bio_attempt_discard_merge(struct request_queue *q, struct request *req,
> 		struct bio *bio)
> {
> 	unsigned short segments = blk_rq_nr_discard_segments(req);
> 
> 	if (segments >= queue_max_discard_segments(q))
> 		goto no_merge;
> 	if (blk_rq_sectors(req) + bio_sectors(bio) >
> 	    blk_rq_get_max_sectors(req, blk_rq_pos(req)))
> 		goto no_merge;
> 
> 	req->biotail->bi_next = bio;
> 	req->biotail = bio;
> 	req->__data_len += bio->bi_iter.bi_size;
> 	req->ioprio = ioprio_best(req->ioprio, bio_prio(bio));
> 	req->nr_phys_segments = segments + 1;
> 
> 	blk_account_io_start(req, false);
> 	return true;
> no_merge:
> 	req_set_nomerge(q, req);
> 	return false;
> }
> 
> blk_rq_nr_discard_segments will get a wrong value finally.
> 
> Maybe we could change blk_rq_nr_discard_segments to iterate and count the bios in one request
> to decide the DISCARD request nr_phy_segment. And discard the nr_phys_segments operations in
> the DISCARD merging path, plus your patch here.

Yeah I agree, and my last patch missed that we do care about segments for
discards. Below should be better...

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 8452fc7164cc..055057bd727f 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -553,9 +553,8 @@ static bool req_no_special_merge(struct request *req)
 static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
 				struct request *next)
 {
-	int total_phys_segments;
-	unsigned int seg_size =
-		req->biotail->bi_seg_back_size + next->bio->bi_seg_front_size;
+	int total_phys_segments = req->nr_phys_segments +
+					next->nr_phys_segments;
 
 	/*
 	 * First check if the either of the requests are re-queued
@@ -574,8 +573,15 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
 	    blk_rq_get_max_sectors(req, blk_rq_pos(req)))
 		return 0;
 
-	total_phys_segments = req->nr_phys_segments + next->nr_phys_segments;
-	if (blk_phys_contig_segment(q, req->biotail, next->bio)) {
+	/*
+	 * If the requests aren't carrying any data payloads, we don't need
+	 * to look at the segment count
+	 */
+	if (bio_has_data(next->bio) &&
+	    blk_phys_contig_segment(q, req->biotail, next->bio)) {
+		unsigned int seg_size = req->biotail->bi_seg_back_size +
+						next->bio->bi_seg_front_size;
+
 		if (req->nr_phys_segments == 1)
 			req->bio->bi_seg_front_size = seg_size;
 		if (next->nr_phys_segments == 1)
@@ -584,7 +590,7 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
 	}
 
 	if (total_phys_segments > queue_max_segments(q))
-		return 0;
+			return 0;
 
 	if (blk_integrity_merge_rq(q, req, next) == false)
 		return 0;

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3
  2018-02-01  3:07           ` Jens Axboe
@ 2018-02-01  3:33             ` jianchao.wang
  -1 siblings, 0 replies; 42+ messages in thread
From: jianchao.wang @ 2018-02-01  3:33 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Christoph Hellwig, linux-nvme, Ming Lei

Sorry, Jens, I don't think I got the point.
Am I missing anything?

On 02/01/2018 11:07 AM, Jens Axboe wrote:
> Yeah I agree, and my last patch missed that we do care about segments for
> discards. Below should be better...
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 8452fc7164cc..055057bd727f 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -553,9 +553,8 @@ static bool req_no_special_merge(struct request *req)
>  static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>  				struct request *next)
>  {
> -	int total_phys_segments;
> -	unsigned int seg_size =
> -		req->biotail->bi_seg_back_size + next->bio->bi_seg_front_size;
> +	int total_phys_segments = req->nr_phys_segments +
> +					next->nr_phys_segments;

For DISCARD reqs, the total_phys_segments is still zero here.

>  
>  	/*
>  	 * First check if the either of the requests are re-queued
> @@ -574,8 +573,15 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>  	    blk_rq_get_max_sectors(req, blk_rq_pos(req)))
>  		return 0;
>  
> -	total_phys_segments = req->nr_phys_segments + next->nr_phys_segments;
> -	if (blk_phys_contig_segment(q, req->biotail, next->bio)) {
> +	/*
> +	 * If the requests aren't carrying any data payloads, we don't need
> +	 * to look at the segment count
> +	 */
> +	if (bio_has_data(next->bio) &&
> +	    blk_phys_contig_segment(q, req->biotail, next->bio)) {
> +		unsigned int seg_size = req->biotail->bi_seg_back_size +
> +						next->bio->bi_seg_front_size;

Yes, total_phys_segments will not be decreased.

> +
>  		if (req->nr_phys_segments == 1)
>  			req->bio->bi_seg_front_size = seg_size;
>  		if (next->nr_phys_segments == 1)
> @@ -584,7 +590,7 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>  	}
>  
>  	if (total_phys_segments > queue_max_segments(q))
> -		return 0;
> +			return 0;
>  
>  	if (blk_integrity_merge_rq(q, req, next) == false)
>  		return 0;

But in the end, the merged DISCARD req's nr_phys_segments is still zero.

blk_rq_nr_discard_segments will return 1, but this req has two bios.
blk_rq_nr_discard_segments's comment says:
-- 
Each discard bio merged into a request is counted as one segment
--

Maybe the patch below should go on top of yours.

diff --git a/block/blk-core.c b/block/blk-core.c
index a2005a4..b444fb7 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1763,7 +1763,6 @@ bool bio_attempt_discard_merge(struct request_queue *q, struct request *req,
        req->biotail = bio;
        req->__data_len += bio->bi_iter.bi_size;
        req->ioprio = ioprio_best(req->ioprio, bio_prio(bio));
-       req->nr_phys_segments = segments + 1;
 
        blk_account_io_start(req, false);
        return true;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 4f3df80..1af2138 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1312,7 +1312,13 @@ static inline unsigned short blk_rq_nr_phys_segments(struct request *rq)
  */
 static inline unsigned short blk_rq_nr_discard_segments(struct request *rq)
 {
-       return max_t(unsigned short, rq->nr_phys_segments, 1);
+       struct bio *bio;
+       unsigned short count = 0;
+
+       __rq_for_each_bio(bio, rq)
+               count++;
+
+       return count;
 }


Thanks
Jianchao

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3
  2018-02-01  3:33             ` jianchao.wang
@ 2018-02-01  3:35               ` Jens Axboe
  -1 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2018-02-01  3:35 UTC (permalink / raw)
  To: jianchao.wang, linux-block; +Cc: Christoph Hellwig, linux-nvme, Ming Lei

On 1/31/18 8:33 PM, jianchao.wang wrote:
> Sorry, Jens, I think I didn't get the point.
> Do I miss anything ?
> 
> On 02/01/2018 11:07 AM, Jens Axboe wrote:
>> Yeah I agree, and my last patch missed that we do care about segments for
>> discards. Below should be better...
>>
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index 8452fc7164cc..055057bd727f 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -553,9 +553,8 @@ static bool req_no_special_merge(struct request *req)
>>  static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>>  				struct request *next)
>>  {
>> -	int total_phys_segments;
>> -	unsigned int seg_size =
>> -		req->biotail->bi_seg_back_size + next->bio->bi_seg_front_size;
>> +	int total_phys_segments = req->nr_phys_segments +
>> +					next->nr_phys_segments;
> 
> For DISCARD reqs, the total_phys_segments is still zero here.

This seems broken; it should count the ranges in the discard request.

>> @@ -574,8 +573,15 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>>  	    blk_rq_get_max_sectors(req, blk_rq_pos(req)))
>>  		return 0;
>>  
>> -	total_phys_segments = req->nr_phys_segments + next->nr_phys_segments;
>> -	if (blk_phys_contig_segment(q, req->biotail, next->bio)) {
>> +	/*
>> +	 * If the requests aren't carrying any data payloads, we don't need
>> +	 * to look at the segment count
>> +	 */
>> +	if (bio_has_data(next->bio) &&
>> +	    blk_phys_contig_segment(q, req->biotail, next->bio)) {
>> +		unsigned int seg_size = req->biotail->bi_seg_back_size +
>> +						next->bio->bi_seg_front_size;
> 
> Yes, total_phys_segments will not be decreased.
> 
>> +
>>  		if (req->nr_phys_segments == 1)
>>  			req->bio->bi_seg_front_size = seg_size;
>>  		if (next->nr_phys_segments == 1)
>> @@ -584,7 +590,7 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>>  	}
>>  
>>  	if (total_phys_segments > queue_max_segments(q))
>> -		return 0;
>> +			return 0;
>>  
>>  	if (blk_integrity_merge_rq(q, req, next) == false)
>>  		return 0;
> 
> But finally, the merged DISCARD req's nr_phys_segment is still zero.
> 
> blk_rq_nr_discard_segments will return 1 but this req has two bios.
> blk_rq_nr_discard_segments 's comment says

They should have the range count. The discard merge stuff is a bit of
a hack; it would be nice to get that improved.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3
  2018-02-01  3:07           ` Jens Axboe
@ 2018-02-01  4:56             ` Keith Busch
  -1 siblings, 0 replies; 42+ messages in thread
From: Keith Busch @ 2018-02-01  4:56 UTC (permalink / raw)
  To: Jens Axboe
  Cc: jianchao.wang, linux-block, Christoph Hellwig, linux-nvme, Ming Lei

On Wed, Jan 31, 2018 at 08:07:41PM -0700, Jens Axboe wrote:
>  	if (total_phys_segments > queue_max_segments(q))
> -		return 0;
> +			return 0;

This perhaps unintended change happens to point out another problem:
queue_max_segments is the wrong limit for discards, which require
queue_max_discard_segments.

It might be easier to merge discard requests as a special case, similar to
how merging a discard bio into a request is handled (untested).
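
As a quick sanity check on a given device, the two limits can be compared
from sysfs (attribute names assumed here; max_discard_segments may not be
exposed on older kernels):

  cat /sys/block/nvme2n1/queue/max_segments
  cat /sys/block/nvme2n1/queue/max_discard_segments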

---
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 8452fc7164cc..01671e1373ff 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -550,6 +550,28 @@ static bool req_no_special_merge(struct request *req)
 	return !q->mq_ops && req->special;
 }
 
+static bool req_attempt_discard_merge(struct request_queue *q, struct request *req,
+		struct request *next)
+{
+	unsigned short segments = blk_rq_nr_discard_segments(req);
+
+	if (segments >= queue_max_discard_segments(q))
+		goto no_merge;
+	if (blk_rq_sectors(req) + bio_sectors(next->bio) >
+	    blk_rq_get_max_sectors(req, blk_rq_pos(req)))
+		goto no_merge;
+
+	req->biotail->bi_next = next->bio;
+	req->biotail = next->biotail;
+	req->nr_phys_segments = segments + blk_rq_nr_discard_segments(next);
+	next->bio = NULL;
+	return true;
+
+no_merge:
+	req_set_nomerge(q, req);
+	return false;
+}
+
 static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
 				struct request *next)
 {
@@ -679,6 +701,15 @@ static struct request *attempt_merge(struct request_queue *q,
 	if (req->write_hint != next->write_hint)
 		return NULL;
 
+	/*
+	 * Discards are ... special.
+	 */
+	if (req_op(req) == REQ_OP_DISCARD) {
+		if (req_attempt_discard_merge(q, req, next))
+			return next;
+		return NULL;
+	}
+
 	/*
 	 * If we are allowed to merge, then append bio list
 	 * from next to rq and release next. merge_requests_fn
--

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3
  2018-02-01  4:56             ` Keith Busch
@ 2018-02-01 15:26               ` Jens Axboe
  -1 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2018-02-01 15:26 UTC (permalink / raw)
  To: Keith Busch
  Cc: jianchao.wang, linux-block, Christoph Hellwig, linux-nvme, Ming Lei

On 1/31/18 9:56 PM, Keith Busch wrote:
> On Wed, Jan 31, 2018 at 08:07:41PM -0700, Jens Axboe wrote:
>>  	if (total_phys_segments > queue_max_segments(q))
>> -		return 0;
>> +			return 0;
> 
> This perhaps unintended change happens to point out another problem:
> queue_max_segments is the wrong limit for discards, which require
> queue_max_discard_segments.
> 
> It might be easier to merge discard requests special, like how merging
> a discard bio is handled (untested).

Yeah, agreed, we should just split it up completely instead of
special-casing it in the read/write path.


> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 8452fc7164cc..01671e1373ff 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -550,6 +550,28 @@ static bool req_no_special_merge(struct request *req)
>  	return !q->mq_ops && req->special;
>  }
>  
> +static bool req_attempt_discard_merge(struct request_queue *q, struct request *req,
> +		struct request *next)
> +{
> +	unsigned short segments = blk_rq_nr_discard_segments(req);
> +
> +	if (segments >= queue_max_discard_segments(q))
> +		goto no_merge;
> +	if (blk_rq_sectors(req) + bio_sectors(next->bio) >
> +	    blk_rq_get_max_sectors(req, blk_rq_pos(req)))
> +		goto no_merge;
> +
> +	req->biotail->bi_next = next->bio;
> +	req->biotail = next->biotail;
> +	req->nr_phys_segments = segments + blk_rq_nr_discard_segments(next);
> +	next->bio = NULL;
> +	return true;
> +
> +no_merge:
> +	req_set_nomerge(q, req);
> +	return false;
> +}
> +
>  static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>  				struct request *next)
>  {
> @@ -679,6 +701,15 @@ static struct request *attempt_merge(struct request_queue *q,
>  	if (req->write_hint != next->write_hint)
>  		return NULL;
>  
> +	/*
> +	 * Discards are ... special.
> +	 */
> +	if (req_op(req) == REQ_OP_DISCARD) {
> +		if (req_attempt_discard_merge(q, req, next))
> +			return next;
> +		return NULL;
> +	}
> +
>  	/*
>  	 * If we are allowed to merge, then append bio list
>  	 * from next to rq and release next. merge_requests_fn

This looks fine to me; the bio-to-request merge path already looks correct.
Care to send a properly formatted patch?

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3
  2018-02-01 15:26               ` Jens Axboe
@ 2018-02-01 17:58                 ` Jens Axboe
  -1 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2018-02-01 17:58 UTC (permalink / raw)
  To: Keith Busch
  Cc: jianchao.wang, linux-block, Christoph Hellwig, linux-nvme, Ming Lei

On 2/1/18 8:26 AM, Jens Axboe wrote:
> On 1/31/18 9:56 PM, Keith Busch wrote:
>> On Wed, Jan 31, 2018 at 08:07:41PM -0700, Jens Axboe wrote:
>>>  	if (total_phys_segments > queue_max_segments(q))
>>> -		return 0;
>>> +			return 0;
>>
>> This perhaps unintended change happens to point out another problem:
>> queue_max_segments is the wrong limit for discards, which require
>> queue_max_discard_segments.
>>
>> It might be easier to merge discard requests special, like how merging
>> a discard bio is handled (untested).
> 
> Yeah agreed, we should just split it up completely instead instead of
> special casing it in the read/write path.
> 
> 
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index 8452fc7164cc..01671e1373ff 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -550,6 +550,28 @@ static bool req_no_special_merge(struct request *req)
>>  	return !q->mq_ops && req->special;
>>  }
>>  
>> +static bool req_attempt_discard_merge(struct request_queue *q, struct request *req,
>> +		struct request *next)
>> +{
>> +	unsigned short segments = blk_rq_nr_discard_segments(req);
>> +
>> +	if (segments >= queue_max_discard_segments(q))
>> +		goto no_merge;
>> +	if (blk_rq_sectors(req) + bio_sectors(next->bio) >
>> +	    blk_rq_get_max_sectors(req, blk_rq_pos(req)))
>> +		goto no_merge;
>> +
>> +	req->biotail->bi_next = next->bio;
>> +	req->biotail = next->biotail;
>> +	req->nr_phys_segments = segments + blk_rq_nr_discard_segments(next);
>> +	next->bio = NULL;
>> +	return true;
>> +
>> +no_merge:
>> +	req_set_nomerge(q, req);
>> +	return false;
>> +}
>> +
>>  static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>>  				struct request *next)
>>  {
>> @@ -679,6 +701,15 @@ static struct request *attempt_merge(struct request_queue *q,
>>  	if (req->write_hint != next->write_hint)
>>  		return NULL;
>>  
>> +	/*
>> +	 * Discards are ... special.
>> +	 */
>> +	if (req_op(req) == REQ_OP_DISCARD) {
>> +		if (req_attempt_discard_merge(q, req, next))
>> +			return next;
>> +		return NULL;
>> +	}
>> +
>>  	/*
>>  	 * If we are allowed to merge, then append bio list
>>  	 * from next to rq and release next. merge_requests_fn
> 
> This looks fine to me, the bio-to-request merge path already looks correct.
> Care to send a properly formatted patch?

I was able to reproduce on a test box, pretty trivially in fact:

# echo mq-deadline > /sys/block/nvme2n1/queue/scheduler
# mkfs.ext4 /dev/nvme2n1
# mount /dev/nvme2n1 /data -o discard
# dd if=/dev/zero of=/data/10g bs=1M count=10k
# sync
# rm /data/10g
# sync <- triggered

Your patch still doesn't work, though mainly because we init the segments
to 0 when setting up a discard. The below works for me, and cleans up
the merge path a bit, since your patch was missing various adjustments
on both the merged and freed request.


diff --git a/block/blk-core.c b/block/blk-core.c
index a2005a485335..e4561c95fc23 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -3282,6 +3282,8 @@ void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
 {
 	if (bio_has_data(bio))
 		rq->nr_phys_segments = bio_phys_segments(q, bio);
+	else if (bio_op(bio) == REQ_OP_DISCARD)
+		rq->nr_phys_segments = 1;
 
 	rq->__data_len = bio->bi_iter.bi_size;
 	rq->bio = rq->biotail = bio;
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 8452fc7164cc..782940c65d8a 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -550,6 +550,24 @@ static bool req_no_special_merge(struct request *req)
 	return !q->mq_ops && req->special;
 }
 
+static bool req_attempt_discard_merge(struct request_queue *q, struct request *req,
+		struct request *next)
+{
+	unsigned short segments = blk_rq_nr_discard_segments(req);
+
+	if (segments >= queue_max_discard_segments(q))
+		goto no_merge;
+	if (blk_rq_sectors(req) + bio_sectors(next->bio) >
+	    blk_rq_get_max_sectors(req, blk_rq_pos(req)))
+		goto no_merge;
+
+	req->nr_phys_segments = segments + blk_rq_nr_discard_segments(next);
+	return true;
+no_merge:
+	req_set_nomerge(q, req);
+	return false;
+}
+
 static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
 				struct request *next)
 {
@@ -683,9 +701,13 @@ static struct request *attempt_merge(struct request_queue *q,
 	 * If we are allowed to merge, then append bio list
 	 * from next to rq and release next. merge_requests_fn
 	 * will have updated segment counts, update sector
-	 * counts here.
+	 * counts here. Handle DISCARDs separately, as they
+	 * have separate settings.
 	 */
-	if (!ll_merge_requests_fn(q, req, next))
+	if (req_op(req) == REQ_OP_DISCARD) {
+		if (!req_attempt_discard_merge(q, req, next))
+			return NULL;
+	} else if (!ll_merge_requests_fn(q, req, next))
 		return NULL;
 
 	/*
@@ -715,7 +737,8 @@ static struct request *attempt_merge(struct request_queue *q,
 
 	req->__data_len += blk_rq_bytes(next);
 
-	elv_merge_requests(q, req, next);
+	if (req_op(req) != REQ_OP_DISCARD)
+		elv_merge_requests(q, req, next);
 
 	/*
 	 * 'next' is going away, so update stats accordingly

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3
  2018-02-01 15:26               ` Jens Axboe
@ 2018-02-01 18:01                 ` Keith Busch
  -1 siblings, 0 replies; 42+ messages in thread
From: Keith Busch @ 2018-02-01 18:01 UTC (permalink / raw)
  To: Jens Axboe
  Cc: jianchao.wang, linux-block, Christoph Hellwig, linux-nvme, Ming Lei

On Thu, Feb 01, 2018 at 08:26:11AM -0700, Jens Axboe wrote:
> On 1/31/18 9:56 PM, Keith Busch wrote:
> 
> > diff --git a/block/blk-merge.c b/block/blk-merge.c
> > index 8452fc7164cc..01671e1373ff 100644
> > --- a/block/blk-merge.c
> > +++ b/block/blk-merge.c
> > @@ -550,6 +550,28 @@ static bool req_no_special_merge(struct request *req)
> >  	return !q->mq_ops && req->special;
> >  }
> >  
> > +static bool req_attempt_discard_merge(struct request_queue *q, struct request *req,
> > +		struct request *next)
> > +{
> > +	unsigned short segments = blk_rq_nr_discard_segments(req);
> > +
> > +	if (segments >= queue_max_discard_segments(q))
> > +		goto no_merge;
> > +	if (blk_rq_sectors(req) + bio_sectors(next->bio) >
> > +	    blk_rq_get_max_sectors(req, blk_rq_pos(req)))
> > +		goto no_merge;
> > +
> > +	req->biotail->bi_next = next->bio;
> > +	req->biotail = next->biotail;
> > +	req->nr_phys_segments = segments + blk_rq_nr_discard_segments(next);
> > +	next->bio = NULL;
> > +	return true;
> > +
> > +no_merge:
> > +	req_set_nomerge(q, req);
> > +	return false;
> > +}
> > +
> >  static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
> >  				struct request *next)
> >  {
> > @@ -679,6 +701,15 @@ static struct request *attempt_merge(struct request_queue *q,
> >  	if (req->write_hint != next->write_hint)
> >  		return NULL;
> >  
> > +	/*
> > +	 * Discards are ... special.
> > +	 */
> > +	if (req_op(req) == REQ_OP_DISCARD) {
> > +		if (req_attempt_discard_merge(q, req, next))
> > +			return next;
> > +		return NULL;
> > +	}
> > +
> >  	/*
> >  	 * If we are allowed to merge, then append bio list
> >  	 * from next to rq and release next. merge_requests_fn
> 
> This looks fine to me, the bio-to-request merge path already looks correct.
> Care to send a properly formatted patch?

Sending the patch now. It's a little different from the above so that it
doesn't need to duplicate some of the merging accounting.

Full disclosure, I have not found a way to trigger this merge. I'm just
running 'fio' with trim, randtrim, and trimwrite on a device with
mq-deadline for the past hour, and haven't seen a merge happen yet.
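
One invocation along those lines (my guess at the flags, not the actual
command that was run; rw= also accepts trim and trimwrite):

# fio --name=trim --filename=/dev/nvme2n1 --rw=randtrim --bs=1M --runtime=300 --time_based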

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3
  2018-02-01 17:58                 ` Jens Axboe
@ 2018-02-01 18:12                   ` Keith Busch
  -1 siblings, 0 replies; 42+ messages in thread
From: Keith Busch @ 2018-02-01 18:12 UTC (permalink / raw)
  To: Jens Axboe
  Cc: jianchao.wang, linux-block, Christoph Hellwig, linux-nvme, Ming Lei

On Thu, Feb 01, 2018 at 10:58:23AM -0700, Jens Axboe wrote:
> I was able to reproduce on a test box, pretty trivially in fact:
> 
> # echo mq-deadline > /sys/block/nvme2n1/queue/scheduler
> # mkfs.ext4 /dev/nvme2n1
> # mount /dev/nvme2n1 /data -o discard
> # dd if=/dev/zero of=/data/10g bs=1M count=10k
> # sync
> # rm /data/10g
> # sync <- triggered

Nice! Thanks, this recipe works for me too.
 
> Your patch still doesn't work, but mainly because we init the segments
> to 0 when setting up a discard. The below works for me, and cleans up
> the merge path a bit, since your patch was missing various adjustments
> on both the merged and freed request.

Yep, your update is very similar to my real patch, but I'm missing one
thing (elv_merge_requests). If you're already testing successfully with
your patch, I don't mind if you want to move forward with yours.
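
For context, a rough sketch (from memory, not verbatim 4.15 source) of the
bookkeeping elv_merge_requests() does when 'next' is folded into 'rq':

void elv_merge_requests(struct request_queue *q, struct request *rq,
			struct request *next)
{
	struct elevator_queue *e = q->elevator;

	/* let the I/O scheduler account for 'next' merging into 'rq' */
	if (e->uses_mq && e->type->ops.mq.requests_merged)
		e->type->ops.mq.requests_merged(q, rq, next);

	/* keep the back-merge hash and last_merge hint pointing at 'rq' */
	elv_rqhash_reposition(q, rq);
	q->last_merge = rq;
}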

 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index a2005a485335..e4561c95fc23 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -3282,6 +3282,8 @@ void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
>  {
>  	if (bio_has_data(bio))
>  		rq->nr_phys_segments = bio_phys_segments(q, bio);
> +	else if (bio_op(bio) == REQ_OP_DISCARD)
> +		rq->nr_phys_segments = 1;
>  
>  	rq->__data_len = bio->bi_iter.bi_size;
>  	rq->bio = rq->biotail = bio;
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 8452fc7164cc..782940c65d8a 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -550,6 +550,24 @@ static bool req_no_special_merge(struct request *req)
>  	return !q->mq_ops && req->special;
>  }
>  
> +static bool req_attempt_discard_merge(struct request_queue *q, struct request *req,
> +		struct request *next)
> +{
> +	unsigned short segments = blk_rq_nr_discard_segments(req);
> +
> +	if (segments >= queue_max_discard_segments(q))
> +		goto no_merge;
> +	if (blk_rq_sectors(req) + bio_sectors(next->bio) >
> +	    blk_rq_get_max_sectors(req, blk_rq_pos(req)))
> +		goto no_merge;
> +
> +	req->nr_phys_segments = segments + blk_rq_nr_discard_segments(next);
> +	return true;
> +no_merge:
> +	req_set_nomerge(q, req);
> +	return false;
> +}
> +
>  static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>  				struct request *next)
>  {
> @@ -683,9 +701,13 @@ static struct request *attempt_merge(struct request_queue *q,
>  	 * If we are allowed to merge, then append bio list
>  	 * from next to rq and release next. merge_requests_fn
>  	 * will have updated segment counts, update sector
> -	 * counts here.
> +	 * counts here. Handle DISCARDs separately, as they
> +	 * have separate settings.
>  	 */
> -	if (!ll_merge_requests_fn(q, req, next))
> +	if (req_op(req) == REQ_OP_DISCARD) {
> +		if (!req_attempt_discard_merge(q, req, next))
> +			return NULL;
> +	} else if (!ll_merge_requests_fn(q, req, next))
>  		return NULL;
>  
>  	/*
> @@ -715,7 +737,8 @@ static struct request *attempt_merge(struct request_queue *q,
>  
>  	req->__data_len += blk_rq_bytes(next);
>  
> -	elv_merge_requests(q, req, next);
> +	if (req_op(req) != REQ_OP_DISCARD)
> +		elv_merge_requests(q, req, next);
>  
>  	/*
>  	 * 'next' is going away, so update stats accordingly
> 
> -- 
> Jens Axboe
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3
  2018-02-01 17:58                 ` Jens Axboe
@ 2018-02-01 19:52                   ` Keith Busch
  -1 siblings, 0 replies; 42+ messages in thread
From: Keith Busch @ 2018-02-01 19:52 UTC (permalink / raw)
  To: Jens Axboe
  Cc: jianchao.wang, linux-block, Christoph Hellwig, linux-nvme, Ming Lei

On Thu, Feb 01, 2018 at 10:58:23AM -0700, Jens Axboe wrote:
> I was able to reproduce on a test box, pretty trivially in fact:
> 
> # echo mq-deadline > /sys/block/nvme2n1/queue/scheduler
> # mkfs.ext4 /dev/nvme2n1
> # mount /dev/nvme2n1 /data -o discard
> # dd if=/dev/zero of=/data/10g bs=1M count=10k
> # sync
> # rm /data/10g
> # sync <- triggered
> 
> Your patch still doesn't work, but mainly because we init the segments
> to 0 when setting up a discard. The below works for me, and cleans up
> the merge path a bit, since your patch was missing various adjustments
> on both the merged and freed request.

I'm still finding cases not accounted for even with your patch. I had to
use the following on top of that, and this pattern looks like it needs to
be repeated for all schedulers:

---
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 55c0a745b427..25c14c58385c 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -259,6 +259,8 @@ bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
 		if (!*merged_request)
 			elv_merged_request(q, rq, ELEVATOR_FRONT_MERGE);
 		return true;
+	case ELEVATOR_DISCARD_MERGE:
+		return bio_attempt_discard_merge(q, rq, bio);
 	default:
 		return false;
 	}
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index c56f211c8440..a0f5752b6858 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -451,7 +451,7 @@ static int dd_request_merge(struct request_queue *q, struct request **rq,
 
 		if (elv_bio_merge_ok(__rq, bio)) {
 			*rq = __rq;
-			return ELEVATOR_FRONT_MERGE;
+			return blk_try_merge(__rq, bio);
 		}
 	}
 
--
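
For reference, the bio-to-request side this hooks up lives in blk-core.c as
bio_attempt_discard_merge(); roughly (paraphrased, not verbatim):

bool bio_attempt_discard_merge(struct request_queue *q, struct request *req,
		struct bio *bio)
{
	unsigned short segments = blk_rq_nr_discard_segments(req);

	if (segments >= queue_max_discard_segments(q))
		goto no_merge;
	if (blk_rq_sectors(req) + bio_sectors(bio) >
	    blk_rq_get_max_sectors(req, blk_rq_pos(req)))
		goto no_merge;

	/* chain the bio onto the request and count it as one more range */
	req->biotail->bi_next = bio;
	req->biotail = bio;
	req->__data_len += bio->bi_iter.bi_size;
	req->nr_phys_segments = segments + 1;

	blk_account_io_start(req, false);
	return true;
no_merge:
	req_set_nomerge(q, req);
	return false;
}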

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3
  2018-02-01 19:52                   ` Keith Busch
@ 2018-02-01 20:55                     ` Jens Axboe
  -1 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2018-02-01 20:55 UTC (permalink / raw)
  To: Keith Busch
  Cc: jianchao.wang, linux-block, Christoph Hellwig, linux-nvme, Ming Lei

On 2/1/18 12:52 PM, Keith Busch wrote:
> On Thu, Feb 01, 2018 at 10:58:23AM -0700, Jens Axboe wrote:
>> I was able to reproduce on a test box, pretty trivially in fact:
>>
>> # echo mq-deadline > /sys/block/nvme2n1/queue/scheduler
>> # mkfs.ext4 /dev/nvme2n1
>> # mount /dev/nvme2n1 /data -o discard
>> # dd if=/dev/zero of=/data/10g bs=1M count=10k
>> # sync
>> # rm /data/10g
>> # sync <- triggered
>>
>> Your patch still doesn't work, but mainly because we init the segments
>> to 0 when setting up a discard. The below works for me, and cleans up
>> the merge path a bit, since your patch was missing various adjustments
>> on both the merged and freed request.
> 
> I'm still finding cases not accounted for even with your patch. I had to
> use the following on top of that, and this pattern looks like it needs to
> be repeated for all schedulers:
> 
> ---
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 55c0a745b427..25c14c58385c 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -259,6 +259,8 @@ bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
>  		if (!*merged_request)
>  			elv_merged_request(q, rq, ELEVATOR_FRONT_MERGE);
>  		return true;
> +	case ELEVATOR_DISCARD_MERGE:
> +		return bio_attempt_discard_merge(q, rq, bio);

Yes, we need this to enable the bio-to-request merging.

> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index c56f211c8440..a0f5752b6858 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -451,7 +451,7 @@ static int dd_request_merge(struct request_queue *q, struct request **rq,
>  
>  		if (elv_bio_merge_ok(__rq, bio)) {
>  			*rq = __rq;
> -			return ELEVATOR_FRONT_MERGE;
> +			return blk_try_merge(__rq, bio);

This isn't needed. We already know it's a front merge at this point,
and it can only be a normal read/write request.
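
For context, dd_request_merge() looks the request up by the bio's end sector,
so a hit is by construction a front merge; roughly, paraphrased from the 4.15
source:

static int dd_request_merge(struct request_queue *q, struct request **rq,
			    struct bio *bio)
{
	struct deadline_data *dd = q->elevator->elevator_data;
	sector_t sector = bio_end_sector(bio);
	struct request *__rq;

	if (!dd->front_merges)
		return ELEVATOR_NO_MERGE;

	/* find a request that starts exactly where this bio ends */
	__rq = elv_rb_find(&dd->sort_list[bio_data_dir(bio)], sector);
	if (__rq) {
		BUG_ON(sector != blk_rq_pos(__rq));

		if (elv_bio_merge_ok(__rq, bio)) {
			*rq = __rq;
			return ELEVATOR_FRONT_MERGE;
		}
	}

	return ELEVATOR_NO_MERGE;
}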

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2018-02-01 20:55 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-30 15:41 WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3 Jens Axboe
2018-01-30 15:41 ` Jens Axboe
2018-01-30 15:57 ` Jens Axboe
2018-01-30 15:57   ` Jens Axboe
2018-01-30 20:30   ` Keith Busch
2018-01-30 20:30     ` Keith Busch
2018-01-30 20:32     ` Jens Axboe
2018-01-30 20:32       ` Jens Axboe
2018-01-30 20:49       ` Keith Busch
2018-01-30 20:49         ` Keith Busch
2018-01-30 20:55         ` Jens Axboe
2018-01-30 20:55           ` Jens Axboe
2018-01-31  4:25   ` jianchao.wang
2018-01-31  4:25     ` jianchao.wang
2018-01-31 15:29     ` Jens Axboe
2018-01-31 15:29       ` Jens Axboe
2018-01-31 23:33       ` Keith Busch
2018-01-31 23:33         ` Keith Busch
2018-02-01  3:03         ` Jens Axboe
2018-02-01  3:03           ` Jens Axboe
2018-02-01  3:03       ` jianchao.wang
2018-02-01  3:03         ` jianchao.wang
2018-02-01  3:07         ` Jens Axboe
2018-02-01  3:07           ` Jens Axboe
2018-02-01  3:33           ` jianchao.wang
2018-02-01  3:33             ` jianchao.wang
2018-02-01  3:35             ` Jens Axboe
2018-02-01  3:35               ` Jens Axboe
2018-02-01  4:56           ` Keith Busch
2018-02-01  4:56             ` Keith Busch
2018-02-01 15:26             ` Jens Axboe
2018-02-01 15:26               ` Jens Axboe
2018-02-01 17:58               ` Jens Axboe
2018-02-01 17:58                 ` Jens Axboe
2018-02-01 18:12                 ` Keith Busch
2018-02-01 18:12                   ` Keith Busch
2018-02-01 19:52                 ` Keith Busch
2018-02-01 19:52                   ` Keith Busch
2018-02-01 20:55                   ` Jens Axboe
2018-02-01 20:55                     ` Jens Axboe
2018-02-01 18:01               ` Keith Busch
2018-02-01 18:01                 ` Keith Busch
