* New warning in nvme_setup_discard
From: Oleksandr Natalenko
Date: 2021-07-15 13:56 UTC
To: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Ming Lei, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Greg Kroah-Hartman
Cc: Keith Busch

Hello.

After a massive v5.13.2 update I encountered this:

```
[19231.556665] ------------[ cut here ]------------
[19231.556674] WARNING: CPU: 20 PID: 502 at drivers/nvme/host/core.c:850 nvme_setup_discard+0x188/0x1f0
...
[19231.556776] CPU: 20 PID: 502 Comm: kworker/20:1H Not tainted 5.13.2 #1
[19231.556780] Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS 3601 05/26/2021
[19231.556784] Workqueue: kblockd blk_mq_run_work_fn
[19231.556789] RIP: 0010:nvme_setup_discard+0x188/0x1f0
[19231.556794] Code: 49 8b 44 24 10 4c 8b 90 40 0b 00 00 4c 2b 15 8f 09 d8 00 49 c1 fa 06 49 c1 e2 0c 4c 03 15 90 09 d8 00 4d 89 d0 e9 b9 fe ff ff <0f> 0b b8 00 00 00 80 49 01 c2 72 52 48 c7 c0 00 00 00 80 48 2b 05
[19231.556798] RSP: 0018:ffffaed2416efc00 EFLAGS: 00010287
[19231.556802] RAX: ffff8e67fb580000 RBX: ffff8e640bbe5240 RCX: 0000000000000020
[19231.556805] RDX: ffff8e67fb580000 RSI: 000000000000001f RDI: 0000000000000000
[19231.556808] RBP: ffff8e640bbe5388 R08: ffff8e677b580000 R09: 0000000008abb800
[19231.556811] R10: ffff8e677b580000 R11: 0000000000000400 R12: ffff8e6405999c00
[19231.556814] R13: 000000000000001f R14: ffff8e6405693000 R15: ffff8e640bbaf800
[19231.556816] FS:  0000000000000000(0000) GS:ffff8e6b0ef00000(0000) knlGS:0000000000000000
[19231.556819] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19231.556822] CR2: ffff888c76ece000 CR3: 000000047a184000 CR4: 0000000000350ee0
[19231.556825] Call Trace:
[19231.556830]  nvme_setup_cmd+0x2d0/0x670
[19231.556834]  nvme_queue_rq+0x79/0xc90
[19231.556837]  ? __sbitmap_get_word+0x30/0x80
[19231.556842]  ? sbitmap_get+0x85/0x180
[19231.556846]  blk_mq_dispatch_rq_list+0x15c/0x810
[19231.556851]  ? list_sort+0x21d/0x2f0
[19231.556856]  __blk_mq_do_dispatch_sched+0x196/0x320
[19231.556860]  __blk_mq_sched_dispatch_requests+0x14d/0x190
[19231.556864]  blk_mq_sched_dispatch_requests+0x2f/0x60
[19231.556867]  blk_mq_run_work_fn+0x43/0xc0
[19231.556871]  process_one_work+0x24e/0x430
[19231.556876]  worker_thread+0x54/0x4d0
[19231.556880]  ? process_one_work+0x430/0x430
[19231.556883]  kthread+0x1b3/0x1e0
[19231.556886]  ? __kthread_init_worker+0x50/0x50
[19231.556889]  ret_from_fork+0x22/0x30
[19231.556895] ---[ end trace d9abdf019a56b4c7 ]---
[19231.556906] blk_update_request: I/O error, dev nvme1n1, sector 632935424 op 0x3:(DISCARD) flags 0x0 phys_seg 31 prio class 0
```

or, in code:

```
850         if (WARN_ON_ONCE(n != segments)) {
851                 if (virt_to_page(range) == ns->ctrl->discard_page)
852                         clear_bit_unlock(0, &ns->ctrl->discard_page_busy);
853                 else
854                         kfree(range);
855                 return BLK_STS_IOERR;
856         }
```

The BFQ scheduler is in use.

Something similar was already fixed by commit a958937ff166fc60d1c3a721036f6ff41bfa2821, but I do not have a multipath device here; it is just 2 NVMe SSDs in a soft RAID10 with LUKS and LVM.

Any idea what this might mean? v5.13.2 brought some commits into the stable tree that are, as I still suspect, causing unreproducible panics [1] [2]. Previously, I dropped that extra stuff from my kernel build and had no issues. This time I also do not have any extra commits in the block layer, only those that are in v5.13.2.

Thanks.

[1] https://lore.kernel.org/linux-block/3533087.dJKXTdksHR@spock/
[2] https://lore.kernel.org/linux-block/2957867.CS06ZTPI5V@spock/

--
Oleksandr Natalenko (post-factum)
* Re: New warning in nvme_setup_discard
From: Greg Kroah-Hartman
Date: 2021-07-15 14:19 UTC
To: Oleksandr Natalenko
Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Ming Lei, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Keith Busch

On Thu, Jul 15, 2021 at 03:56:38PM +0200, Oleksandr Natalenko wrote:
> Hello.
>
> After a massive v5.13.2 update I encountered this:
>
> [...]

Can you run 'git bisect' to find the offending patch?

thanks,

greg k-h
* Re: New warning in nvme_setup_discard
From: Oleksandr Natalenko
Date: 2021-07-15 14:21 UTC
To: Greg Kroah-Hartman
Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Ming Lei, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Keith Busch

Hello.

On Thursday, July 15, 2021, 16:19:31 CEST, Greg Kroah-Hartman wrote:
> Can you run 'git bisect' to find the offending patch?

Of course, as soon as I reproduce this reliably.

Thanks.

--
Oleksandr Natalenko (post-factum)
* Re: New warning in nvme_setup_discard
From: Laurence Oberman
Date: 2021-07-15 21:37 UTC
To: Greg Kroah-Hartman, Oleksandr Natalenko
Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Ming Lei, Paolo Valente, Jan Kara, Sasha Levin, Keith Busch

On Thu, 2021-07-15 at 16:19 +0200, Greg Kroah-Hartman wrote:
> On Thu, Jul 15, 2021 at 03:56:38PM +0200, Oleksandr Natalenko wrote:
> > [...]
>
> Can you run 'git bisect' to find the offending patch?

Hello

```
[root@ml150 ~]# uname -a
Linux ml150 5.14.0-rc1+ #1 SMP Thu Jul 15 16:41:08 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux

[root@ml150 ~]# nvme list
Node             SN                   Model                Namespace  Usage                  Format       FW Rev
---------------- -------------------- -------------------- ---------- ---------------------- ------------ --------
/dev/nvme0n1     CVCQ536300C9400AGN   INTEL SSDPEDMW400G4  1          400.09 GB / 400.09 GB  512 B + 0 B  8EV10135
/dev/nvme1n1     CVFT7383000W400BGN   INTEL SSDPEDMD400G4  1          400.09 GB / 400.09 GB  512 B + 0 B  8DV10171
```

FWIW, I built 5.14 and I have 2 NVMe devices, and I am not seeing this even when using them to build the kernel on.

Regards
Laurence Oberman
* Re: New warning in nvme_setup_discard
From: Oleksandr Natalenko
Date: 2021-07-16 5:50 UTC
To: Greg Kroah-Hartman, Laurence Oberman
Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Ming Lei, Paolo Valente, Jan Kara, Sasha Levin, Keith Busch

Hello.

On Thursday, July 15, 2021, 23:37:21 CEST, Laurence Oberman wrote:
> [...]
>
> FWIW, I built 5.14 and I have 2 NVMe devices, and I am not seeing this
> even when using them to build the kernel on.

Thanks for trying. What about v5.13.2? Also, are you using BFQ? Is device mapper in use?

--
Oleksandr Natalenko (post-factum)
* Re: New warning in nvme_setup_discard
From: Ming Lei
Date: 2021-07-16 2:16 UTC
To: Oleksandr Natalenko
Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Greg Kroah-Hartman, Keith Busch

On Thu, Jul 15, 2021 at 03:56:38PM +0200, Oleksandr Natalenko wrote:
> Hello.
>
> After a massive v5.13.2 update I encountered this:
>
> [...]
>
> or, in code:
>
> ```
> 850         if (WARN_ON_ONCE(n != segments)) {
> 851                 if (virt_to_page(range) == ns->ctrl->discard_page)
> 852                         clear_bit_unlock(0, &ns->ctrl->discard_page_busy);
> 853                 else
> 854                         kfree(range);
> 855                 return BLK_STS_IOERR;
> 856         }
> ```

What are 'n' and 'segments'?

thanks,
Ming
* Re: New warning in nvme_setup_discard 2021-07-16 2:16 ` Ming Lei @ 2021-07-16 5:53 ` Oleksandr Natalenko -1 siblings, 0 replies; 52+ messages in thread From: Oleksandr Natalenko @ 2021-07-16 5:53 UTC (permalink / raw) To: Ming Lei Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Greg Kroah-Hartman, Keith Busch Hello. On pátek 16. července 2021 4:16:18 CEST Ming Lei wrote: > > 850 if (WARN_ON_ONCE(n != segments)) { > > What is 'n' and 'segments'? So, I used this change to check that: ``` diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 66973bb56305..0a64f2f77daf 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -847,7 +847,8 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req, n++; } - if (WARN_ON_ONCE(n != segments)) { + if (unlikely(n != segments)) { + pr_warn("nvme_setup_discard: n != segments (%u != %u)\n", n, segments); if (virt_to_page(range) == ns->ctrl->discard_page) clear_bit_unlock(0, &ns->ctrl->discard_page_busy); else ``` and the result is: ``` nvme_setup_discard: n != segments (3 != 2) nvme_setup_discard: n != segments (12 != 11) ``` (those are from different boots) Thanks. -- Oleksandr Natalenko (post-factum) ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: New warning in nvme_setup_discard 2021-07-16 5:53 ` Oleksandr Natalenko @ 2021-07-16 9:33 ` Ming Lei -1 siblings, 0 replies; 52+ messages in thread From: Ming Lei @ 2021-07-16 9:33 UTC (permalink / raw) To: Oleksandr Natalenko Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Greg Kroah-Hartman, Keith Busch On Fri, Jul 16, 2021 at 07:53:28AM +0200, Oleksandr Natalenko wrote: > Hello. > > On pátek 16. července 2021 4:16:18 CEST Ming Lei wrote: > > > 850 if (WARN_ON_ONCE(n != segments)) { > > > > What is 'n' and 'segments'? > > So, I used this change to check that: > > ``` > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > index 66973bb56305..0a64f2f77daf 100644 > --- a/drivers/nvme/host/core.c > +++ b/drivers/nvme/host/core.c > @@ -847,7 +847,8 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, > struct request *req, > n++; > } > > - if (WARN_ON_ONCE(n != segments)) { > + if (unlikely(n != segments)) { > + pr_warn("nvme_setup_discard: n != segments (%u != %u)\n", n, > segments); > if (virt_to_page(range) == ns->ctrl->discard_page) > clear_bit_unlock(0, &ns->ctrl->discard_page_busy); > else > > ``` > > and the result is: > > ``` > nvme_setup_discard: n != segments (3 != 2) > nvme_setup_discard: n != segments (12 != 11) > ``` Can you test the following patch? 
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c index 727955918563..673a634eadd9 100644 --- a/block/bfq-iosched.c +++ b/block/bfq-iosched.c @@ -2361,6 +2361,9 @@ static int bfq_request_merge(struct request_queue *q, struct request **req, __rq = bfq_find_rq_fmerge(bfqd, bio, q); if (__rq && elv_bio_merge_ok(__rq, bio)) { *req = __rq; + + if (blk_discard_mergable(__rq)) + return ELEVATOR_DISCARD_MERGE; return ELEVATOR_FRONT_MERGE; } diff --git a/block/blk-merge.c b/block/blk-merge.c index a11b3b53717e..f8707ff7e2fc 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -705,22 +705,6 @@ static void blk_account_io_merge_request(struct request *req) } } -/* - * Two cases of handling DISCARD merge: - * If max_discard_segments > 1, the driver takes every bio - * as a range and send them to controller together. The ranges - * needn't to be contiguous. - * Otherwise, the bios/requests will be handled as same as - * others which should be contiguous. - */ -static inline bool blk_discard_mergable(struct request *req) -{ - if (req_op(req) == REQ_OP_DISCARD && - queue_max_discard_segments(req->q) > 1) - return true; - return false; -} - static enum elv_merge blk_try_req_merge(struct request *req, struct request *next) { diff --git a/block/elevator.c b/block/elevator.c index 52ada14cfe45..a5fe2615ec0f 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -336,6 +336,9 @@ enum elv_merge elv_merge(struct request_queue *q, struct request **req, __rq = elv_rqhash_find(q, bio->bi_iter.bi_sector); if (__rq && elv_bio_merge_ok(__rq, bio)) { *req = __rq; + + if (blk_discard_mergable(__rq)) + return ELEVATOR_DISCARD_MERGE; return ELEVATOR_BACK_MERGE; } diff --git a/block/mq-deadline-main.c b/block/mq-deadline-main.c index 6f612e6dc82b..294be0c0db65 100644 --- a/block/mq-deadline-main.c +++ b/block/mq-deadline-main.c @@ -677,6 +677,8 @@ static int dd_request_merge(struct request_queue *q, struct request **rq, if (elv_bio_merge_ok(__rq, bio)) { *rq = __rq; + if 
(blk_discard_mergable(__rq)) + return ELEVATOR_DISCARD_MERGE; return ELEVATOR_FRONT_MERGE; } } diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 3177181c4326..87f00292fd7a 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1521,6 +1521,22 @@ static inline int queue_limit_discard_alignment(struct queue_limits *lim, sector return offset << SECTOR_SHIFT; } +/* + * Two cases of handling DISCARD merge: + * If max_discard_segments > 1, the driver takes every bio + * as a range and send them to controller together. The ranges + * needn't to be contiguous. + * Otherwise, the bios/requests will be handled as same as + * others which should be contiguous. + */ +static inline bool blk_discard_mergable(struct request *req) +{ + if (req_op(req) == REQ_OP_DISCARD && + queue_max_discard_segments(req->q) > 1) + return true; + return false; +} + static inline int bdev_discard_alignment(struct block_device *bdev) { struct request_queue *q = bdev_get_queue(bdev); Thanks, Ming ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: New warning in nvme_setup_discard 2021-07-16 9:33 ` Ming Lei @ 2021-07-16 10:03 ` Oleksandr Natalenko -1 siblings, 0 replies; 52+ messages in thread From: Oleksandr Natalenko @ 2021-07-16 10:03 UTC (permalink / raw) To: Ming Lei Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Greg Kroah-Hartman, Keith Busch Hello. On pátek 16. července 2021 11:33:05 CEST Ming Lei wrote: > Can you test the following patch? Sure, building it at the moment, and will give it a try. Also please see my comments and questions below. > > diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c > index 727955918563..673a634eadd9 100644 > --- a/block/bfq-iosched.c > +++ b/block/bfq-iosched.c > @@ -2361,6 +2361,9 @@ static int bfq_request_merge(struct request_queue *q, > struct request **req, __rq = bfq_find_rq_fmerge(bfqd, bio, q); > if (__rq && elv_bio_merge_ok(__rq, bio)) { > *req = __rq; > + > + if (blk_discard_mergable(__rq)) > + return ELEVATOR_DISCARD_MERGE; > return ELEVATOR_FRONT_MERGE; > } > > diff --git a/block/blk-merge.c b/block/blk-merge.c > index a11b3b53717e..f8707ff7e2fc 100644 > --- a/block/blk-merge.c > +++ b/block/blk-merge.c > @@ -705,22 +705,6 @@ static void blk_account_io_merge_request(struct request > *req) } > } > > -/* > - * Two cases of handling DISCARD merge: > - * If max_discard_segments > 1, the driver takes every bio > - * as a range and send them to controller together. The ranges > - * needn't to be contiguous. > - * Otherwise, the bios/requests will be handled as same as > - * others which should be contiguous. 
> - */ > -static inline bool blk_discard_mergable(struct request *req) > -{ > - if (req_op(req) == REQ_OP_DISCARD && > - queue_max_discard_segments(req->q) > 1) > - return true; > - return false; > -} > - > static enum elv_merge blk_try_req_merge(struct request *req, > struct request *next) > { > diff --git a/block/elevator.c b/block/elevator.c > index 52ada14cfe45..a5fe2615ec0f 100644 > --- a/block/elevator.c > +++ b/block/elevator.c > @@ -336,6 +336,9 @@ enum elv_merge elv_merge(struct request_queue *q, struct > request **req, __rq = elv_rqhash_find(q, bio->bi_iter.bi_sector); > if (__rq && elv_bio_merge_ok(__rq, bio)) { > *req = __rq; > + > + if (blk_discard_mergable(__rq)) > + return ELEVATOR_DISCARD_MERGE; > return ELEVATOR_BACK_MERGE; > } > > diff --git a/block/mq-deadline-main.c b/block/mq-deadline-main.c > index 6f612e6dc82b..294be0c0db65 100644 > --- a/block/mq-deadline-main.c > +++ b/block/mq-deadline-main.c I had to adjust this against v5.13 because there's no mq-deadline-main.c, only mq-deadline.c (due to Bart series, I assume). I hope this is fine as the patch applies cleanly. > @@ -677,6 +677,8 @@ static int dd_request_merge(struct request_queue *q, > struct request **rq, > > if (elv_bio_merge_ok(__rq, bio)) { > *rq = __rq; > + if (blk_discard_mergable(__rq)) > + return ELEVATOR_DISCARD_MERGE; > return ELEVATOR_FRONT_MERGE; > } > } > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > index 3177181c4326..87f00292fd7a 100644 > --- a/include/linux/blkdev.h > +++ b/include/linux/blkdev.h > @@ -1521,6 +1521,22 @@ static inline int > queue_limit_discard_alignment(struct queue_limits *lim, sector return > offset << SECTOR_SHIFT; > } > > +/* > + * Two cases of handling DISCARD merge: > + * If max_discard_segments > 1, the driver takes every bio > + * as a range and send them to controller together. The ranges > + * needn't to be contiguous. > + * Otherwise, the bios/requests will be handled as same as > + * others which should be contiguous. 
> + */ > +static inline bool blk_discard_mergable(struct request *req) > +{ > + if (req_op(req) == REQ_OP_DISCARD && > + queue_max_discard_segments(req->q) > 1) > + return true; > + return false; > +} > + > static inline int bdev_discard_alignment(struct block_device *bdev) > { > struct request_queue *q = bdev_get_queue(bdev); Do I understand correctly that this will be something like: Fixes: 2705dfb209 ("block: fix discard request merge") ? Because as the bisection progresses, I've bumped into this commit only. Without it the issue is not reproducible, at least so far. Thanks! -- Oleksandr Natalenko (post-factum) ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: New warning in nvme_setup_discard 2021-07-16 10:03 ` Oleksandr Natalenko @ 2021-07-16 10:41 ` Ming Lei -1 siblings, 0 replies; 52+ messages in thread From: Ming Lei @ 2021-07-16 10:41 UTC (permalink / raw) To: Oleksandr Natalenko Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Greg Kroah-Hartman, Keith Busch On Fri, Jul 16, 2021 at 12:03:43PM +0200, Oleksandr Natalenko wrote: > Hello. > > On pátek 16. července 2021 11:33:05 CEST Ming Lei wrote: > > Can you test the following patch? > > Sure, building it at the moment, and will give it a try. Also please see my > comments and questions below. > > > > > diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c > > index 727955918563..673a634eadd9 100644 > > --- a/block/bfq-iosched.c > > +++ b/block/bfq-iosched.c > > @@ -2361,6 +2361,9 @@ static int bfq_request_merge(struct request_queue *q, > > struct request **req, __rq = bfq_find_rq_fmerge(bfqd, bio, q); > > if (__rq && elv_bio_merge_ok(__rq, bio)) { > > *req = __rq; > > + > > + if (blk_discard_mergable(__rq)) > > + return ELEVATOR_DISCARD_MERGE; > > return ELEVATOR_FRONT_MERGE; > > } > > > > diff --git a/block/blk-merge.c b/block/blk-merge.c > > index a11b3b53717e..f8707ff7e2fc 100644 > > --- a/block/blk-merge.c > > +++ b/block/blk-merge.c > > @@ -705,22 +705,6 @@ static void blk_account_io_merge_request(struct request > > *req) } > > } > > > > -/* > > - * Two cases of handling DISCARD merge: > > - * If max_discard_segments > 1, the driver takes every bio > > - * as a range and send them to controller together. The ranges > > - * needn't to be contiguous. > > - * Otherwise, the bios/requests will be handled as same as > > - * others which should be contiguous. 
> > - */ > > -static inline bool blk_discard_mergable(struct request *req) > > -{ > > - if (req_op(req) == REQ_OP_DISCARD && > > - queue_max_discard_segments(req->q) > 1) > > - return true; > > - return false; > > -} > > - > > static enum elv_merge blk_try_req_merge(struct request *req, > > struct request *next) > > { > > diff --git a/block/elevator.c b/block/elevator.c > > index 52ada14cfe45..a5fe2615ec0f 100644 > > --- a/block/elevator.c > > +++ b/block/elevator.c > > @@ -336,6 +336,9 @@ enum elv_merge elv_merge(struct request_queue *q, struct > > request **req, __rq = elv_rqhash_find(q, bio->bi_iter.bi_sector); > > if (__rq && elv_bio_merge_ok(__rq, bio)) { > > *req = __rq; > > + > > + if (blk_discard_mergable(__rq)) > > + return ELEVATOR_DISCARD_MERGE; > > return ELEVATOR_BACK_MERGE; > > } > > > > diff --git a/block/mq-deadline-main.c b/block/mq-deadline-main.c > > index 6f612e6dc82b..294be0c0db65 100644 > > --- a/block/mq-deadline-main.c > > +++ b/block/mq-deadline-main.c > > I had to adjust this against v5.13 because there's no mq-deadline-main.c, only > mq-deadline.c (due to Bart series, I assume). I hope this is fine as the patch > applies cleanly. > > > @@ -677,6 +677,8 @@ static int dd_request_merge(struct request_queue *q, > > struct request **rq, > > > > if (elv_bio_merge_ok(__rq, bio)) { > > *rq = __rq; > > + if (blk_discard_mergable(__rq)) > > + return ELEVATOR_DISCARD_MERGE; > > return ELEVATOR_FRONT_MERGE; > > } > > } > > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > > index 3177181c4326..87f00292fd7a 100644 > > --- a/include/linux/blkdev.h > > +++ b/include/linux/blkdev.h > > @@ -1521,6 +1521,22 @@ static inline int > > queue_limit_discard_alignment(struct queue_limits *lim, sector return > > offset << SECTOR_SHIFT; > > } > > > > +/* > > + * Two cases of handling DISCARD merge: > > + * If max_discard_segments > 1, the driver takes every bio > > + * as a range and send them to controller together. 
The ranges > > + * needn't to be contiguous. > > + * Otherwise, the bios/requests will be handled as same as > > + * others which should be contiguous. > > + */ > > +static inline bool blk_discard_mergable(struct request *req) > > +{ > > + if (req_op(req) == REQ_OP_DISCARD && > > + queue_max_discard_segments(req->q) > 1) > > + return true; > > + return false; > > +} > > + > > static inline int bdev_discard_alignment(struct block_device *bdev) > > { > > struct request_queue *q = bdev_get_queue(bdev); > > Do I understand correctly that this will be something like: > > Fixes: 2705dfb209 ("block: fix discard request merge") > > ? > > Because as the bisection progresses, I've bumped into this commit only. > Without it the issue is not reproducible, at least so far. It could be. So can you just test v5.14-rc1? Thanks, Ming ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: New warning in nvme_setup_discard 2021-07-16 10:41 ` Ming Lei @ 2021-07-16 12:56 ` Oleksandr Natalenko -1 siblings, 0 replies; 52+ messages in thread From: Oleksandr Natalenko @ 2021-07-16 12:56 UTC (permalink / raw) To: Ming Lei Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Greg Kroah-Hartman, Keith Busch On pátek 16. července 2021 12:41:52 CEST Ming Lei wrote: > > Do I understand correctly that this will be something like: > > > > Fixes: 2705dfb209 ("block: fix discard request merge") > > > > ? > > > > Because as the bisection progresses, I've bumped into this commit only. > > Without it the issue is not reproducible, at least so far. > > It could be. > > So can you just test v5.14-rc1? Doing it right now, but I've got another issue. Why BFQ is not listed here: ``` /sys/class/block/nvme0n1/queue/scheduler:[mq-deadline] kyber none /sys/class/block/nvme1n1/queue/scheduler:[mq-deadline] kyber none ``` ? It is a built-in, FWIW: ``` $ modinfo bfq name: bfq filename: (builtin) description: MQ Budget Fair Queueing I/O Scheduler license: GPL file: block/bfq author: Paolo Valente alias: bfq-iosched ``` So far the issue is not reproducible with your patch + 5.13.2 as well as 5.14-rc1 (but I don't have BFQ either with v5.14-rc1). -- Oleksandr Natalenko (post-factum) ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: New warning in nvme_setup_discard 2021-07-16 12:56 ` Oleksandr Natalenko @ 2021-07-17 9:35 ` Ming Lei -1 siblings, 0 replies; 52+ messages in thread From: Ming Lei @ 2021-07-17 9:35 UTC (permalink / raw) To: Oleksandr Natalenko Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Greg Kroah-Hartman, Keith Busch On Fri, Jul 16, 2021 at 02:56:22PM +0200, Oleksandr Natalenko wrote: > On pátek 16. července 2021 12:41:52 CEST Ming Lei wrote: > > > Do I understand correctly that this will be something like: > > > > > > Fixes: 2705dfb209 ("block: fix discard request merge") > > > > > > ? > > > > > > Because as the bisection progresses, I've bumped into this commit only. > > > Without it the issue is not reproducible, at least so far. > > > > It could be. > > > > So can you just test v5.14-rc1? > > Doing it right now, but I've got another issue. Why BFQ is not listed here: > > ``` > /sys/class/block/nvme0n1/queue/scheduler:[mq-deadline] kyber none > /sys/class/block/nvme1n1/queue/scheduler:[mq-deadline] kyber none > ``` Maybe you need to check if the build is OK, I can't reproduce it in my VM, and BFQ is still builtin: [root@ktest-01 ~]# uname -a Linux ktest-01 5.14.0-rc1+ #52 SMP Fri Jul 16 18:56:36 CST 2021 x86_64 x86_64 x86_64 GNU/Linux [root@ktest-01 ~]# cat /sys/block/nvme0n1/queue/scheduler [none] mq-deadline kyber bfq > > ? > > It is a built-in, FWIW: > > ``` > $ modinfo bfq > name: bfq > filename: (builtin) > description: MQ Budget Fair Queueing I/O Scheduler > license: GPL > file: block/bfq > author: Paolo Valente > alias: bfq-iosched > ``` > > So far the issue is not reproducible with your patch + 5.13.2 as well as 5.14- > rc1 (but I don't have BFQ either with v5.14-rc1). You have to verify it with BFQ applied, :-) Thanks, Ming ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: New warning in nvme_setup_discard 2021-07-17 9:35 ` Ming Lei @ 2021-07-17 12:11 ` Oleksandr Natalenko -1 siblings, 0 replies; 52+ messages in thread From: Oleksandr Natalenko @ 2021-07-17 12:11 UTC (permalink / raw) To: Ming Lei Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Greg Kroah-Hartman, Keith Busch On sobota 17. července 2021 11:35:32 CEST Ming Lei wrote: > Maybe you need to check if the build is OK, I can't reproduce it in my > VM, and BFQ is still builtin: > > [root@ktest-01 ~]# uname -a > Linux ktest-01 5.14.0-rc1+ #52 SMP Fri Jul 16 18:56:36 CST 2021 x86_64 > x86_64 x86_64 GNU/Linux [root@ktest-01 ~]# cat > /sys/block/nvme0n1/queue/scheduler > [none] mq-deadline kyber bfq I don't think this is an issue with the build… BTW, with `initcall_debug`: ``` [ 0.902555] calling bfq_init+0x0/0x8b @ 1 [ 0.903448] initcall bfq_init+0x0/0x8b returned -28 after 507 usecs ``` -ENOSPC? Why? Also re-tested with the latest git tip, same result :(. > > So far the issue is not reproducible with your patch + 5.13.2 as well as > > 5.14- rc1 (but I don't have BFQ either with v5.14-rc1). > > You have to verify it with BFQ applied, :-) Understandable… BTW, I'm still running v5.13.2 with your patch applied and do not see the issue. -- Oleksandr Natalenko (post-factum) ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: New warning in nvme_setup_discard 2021-07-17 12:11 ` Oleksandr Natalenko @ 2021-07-17 12:19 ` Oleksandr Natalenko -1 siblings, 0 replies; 52+ messages in thread From: Oleksandr Natalenko @ 2021-07-17 12:19 UTC (permalink / raw) To: Ming Lei Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Greg Kroah-Hartman, Keith Busch On sobota 17. července 2021 14:11:05 CEST Oleksandr Natalenko wrote: > On sobota 17. července 2021 11:35:32 CEST Ming Lei wrote: > > Maybe you need to check if the build is OK, I can't reproduce it in my > > VM, and BFQ is still builtin: > > > > [root@ktest-01 ~]# uname -a > > Linux ktest-01 5.14.0-rc1+ #52 SMP Fri Jul 16 18:56:36 CST 2021 x86_64 > > x86_64 x86_64 GNU/Linux [root@ktest-01 ~]# cat > > /sys/block/nvme0n1/queue/scheduler > > [none] mq-deadline kyber bfq > > I don't think this is an issue with the build… BTW, with `initcall_debug`: > > ``` > [ 0.902555] calling bfq_init+0x0/0x8b @ 1 > [ 0.903448] initcall bfq_init+0x0/0x8b returned -28 after 507 usecs > ``` > > -ENOSPC? Why? Also re-tested with the latest git tip, same result :(. OK, one extra pr_info, and I see this: ``` [ 0.871180] blkcg_policy_register: BLKCG_MAX_POLS too small [ 0.871612] blkcg_policy_register: -28 ``` What does it mean please :)? The value seems to be hard-coded: ``` include/linux/blkdev.h 60:#define BLKCG_MAX_POLS 5 ``` -- Oleksandr Natalenko (post-factum) ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: New warning in nvme_setup_discard 2021-07-17 12:19 ` Oleksandr Natalenko @ 2021-07-17 12:35 ` Oleksandr Natalenko -1 siblings, 0 replies; 52+ messages in thread From: Oleksandr Natalenko @ 2021-07-17 12:35 UTC (permalink / raw) To: Ming Lei Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Greg Kroah-Hartman, Keith Busch On sobota 17. července 2021 14:19:59 CEST Oleksandr Natalenko wrote: > On sobota 17. července 2021 14:11:05 CEST Oleksandr Natalenko wrote: > > On sobota 17. července 2021 11:35:32 CEST Ming Lei wrote: > > > Maybe you need to check if the build is OK, I can't reproduce it in my > > > VM, and BFQ is still builtin: > > > > > > [root@ktest-01 ~]# uname -a > > > Linux ktest-01 5.14.0-rc1+ #52 SMP Fri Jul 16 18:56:36 CST 2021 x86_64 > > > x86_64 x86_64 GNU/Linux [root@ktest-01 ~]# cat > > > /sys/block/nvme0n1/queue/scheduler > > > [none] mq-deadline kyber bfq > > > > I don't think this is an issue with the build… BTW, with `initcall_debug`: > > > > ``` > > [ 0.902555] calling bfq_init+0x0/0x8b @ 1 > > [ 0.903448] initcall bfq_init+0x0/0x8b returned -28 after 507 usecs > > ``` > > > > -ENOSPC? Why? Also re-tested with the latest git tip, same result :(. > > OK, one extra pr_info, and I see this: > > ``` > [ 0.871180] blkcg_policy_register: BLKCG_MAX_POLS too small > [ 0.871612] blkcg_policy_register: -28 > ``` > > What does it mean please :)? The value seems to be hard-coded: > > ``` > include/linux/blkdev.h > 60:#define BLKCG_MAX_POLS 5 > ``` OK, after increasing this to 6 I've got my BFQ back. Please see [1]. [1] https://lore.kernel.org/linux-block/20210717123328.945810-1-oleksandr@natalenko.name/ -- Oleksandr Natalenko (post-factum) ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: New warning in nvme_setup_discard 2021-07-17 12:35 ` Oleksandr Natalenko @ 2021-07-19 1:40 ` Ming Lei -1 siblings, 0 replies; 52+ messages in thread From: Ming Lei @ 2021-07-19 1:40 UTC (permalink / raw) To: Oleksandr Natalenko Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Greg Kroah-Hartman, Keith Busch On Sat, Jul 17, 2021 at 02:35:14PM +0200, Oleksandr Natalenko wrote: > On sobota 17. července 2021 14:19:59 CEST Oleksandr Natalenko wrote: > > On sobota 17. července 2021 14:11:05 CEST Oleksandr Natalenko wrote: > > > On sobota 17. července 2021 11:35:32 CEST Ming Lei wrote: > > > > Maybe you need to check if the build is OK, I can't reproduce it in my > > > > VM, and BFQ is still builtin: > > > > > > > > [root@ktest-01 ~]# uname -a > > > > Linux ktest-01 5.14.0-rc1+ #52 SMP Fri Jul 16 18:56:36 CST 2021 x86_64 > > > > x86_64 x86_64 GNU/Linux [root@ktest-01 ~]# cat > > > > /sys/block/nvme0n1/queue/scheduler > > > > [none] mq-deadline kyber bfq > > > > > > I don't think this is an issue with the build… BTW, with `initcall_debug`: > > > > > > ``` > > > [ 0.902555] calling bfq_init+0x0/0x8b @ 1 > > > [ 0.903448] initcall bfq_init+0x0/0x8b returned -28 after 507 usecs > > > ``` > > > > > > -ENOSPC? Why? Also re-tested with the latest git tip, same result :(. > > > > OK, one extra pr_info, and I see this: > > > > ``` > > [ 0.871180] blkcg_policy_register: BLKCG_MAX_POLS too small > > [ 0.871612] blkcg_policy_register: -28 > > ``` > > > > What does it mean please :)? The value seems to be hard-coded: > > > > ``` > > include/linux/blkdev.h > > 60:#define BLKCG_MAX_POLS 5 > > ``` > > OK, after increasing this to 6 I've got my BFQ back. Please see [1]. > > [1] https://lore.kernel.org/linux-block/20210717123328.945810-1-oleksandr@natalenko.name/ OK, after you fixed the issue in blkcg_policy_register(), can you reproduce the discard issue on v5.14-rc1 with BFQ applied? 
If yes, can you test the patch I posted previously? Thanks, Ming ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: New warning in nvme_setup_discard 2021-07-19 1:40 ` Ming Lei @ 2021-07-19 6:27 ` Oleksandr Natalenko -1 siblings, 0 replies; 52+ messages in thread From: Oleksandr Natalenko @ 2021-07-19 6:27 UTC (permalink / raw) To: Ming Lei Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Greg Kroah-Hartman, Keith Busch Hello. On pondělí 19. července 2021 3:40:40 CEST Ming Lei wrote: > On Sat, Jul 17, 2021 at 02:35:14PM +0200, Oleksandr Natalenko wrote: > > On sobota 17. července 2021 14:19:59 CEST Oleksandr Natalenko wrote: > > > On sobota 17. července 2021 14:11:05 CEST Oleksandr Natalenko wrote: > > > > On sobota 17. července 2021 11:35:32 CEST Ming Lei wrote: > > > > > Maybe you need to check if the build is OK, I can't reproduce it in > > > > > my > > > > > VM, and BFQ is still builtin: > > > > > > > > > > [root@ktest-01 ~]# uname -a > > > > > Linux ktest-01 5.14.0-rc1+ #52 SMP Fri Jul 16 18:56:36 CST 2021 > > > > > x86_64 > > > > > x86_64 x86_64 GNU/Linux [root@ktest-01 ~]# cat > > > > > /sys/block/nvme0n1/queue/scheduler > > > > > [none] mq-deadline kyber bfq > > > > > > > > I don't think this is an issue with the build… BTW, with > > > > `initcall_debug`: > > > > > > > > ``` > > > > [ 0.902555] calling bfq_init+0x0/0x8b @ 1 > > > > [ 0.903448] initcall bfq_init+0x0/0x8b returned -28 after 507 usecs > > > > ``` > > > > > > > > -ENOSPC? Why? Also re-tested with the latest git tip, same result :(. > > > > > > OK, one extra pr_info, and I see this: > > > > > > ``` > > > [ 0.871180] blkcg_policy_register: BLKCG_MAX_POLS too small > > > [ 0.871612] blkcg_policy_register: -28 > > > ``` > > > > > > What does it mean please :)? The value seems to be hard-coded: > > > > > > ``` > > > include/linux/blkdev.h > > > 60:#define BLKCG_MAX_POLS 5 > > > ``` > > > > OK, after increasing this to 6 I've got my BFQ back. Please see [1]. 
> > > > [1] > > https://lore.kernel.org/linux-block/20210717123328.945810-1-oleksandr@natalenko.name/ > OK, after you fixed the issue in blkcg_policy_register(), can you > reproduce the discard issue on v5.14-rc1 with BFQ applied? If yes, > can you test the patch I posted previously? Yes, the issue is reproducible with both v5.13.2 and v5.14-rc1. I haven't managed to reproduce it with v5.13.2+your patch. Now I will build v5.14-rc2+your patch and test further. Thanks. -- Oleksandr Natalenko (post-factum) ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: New warning in nvme_setup_discard 2021-07-19 6:27 ` Oleksandr Natalenko @ 2021-07-20 9:05 ` Oleksandr Natalenko -1 siblings, 0 replies; 52+ messages in thread From: Oleksandr Natalenko @ 2021-07-20 9:05 UTC (permalink / raw) To: Ming Lei Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Greg Kroah-Hartman, Keith Busch Hello, Ming. On pondělí 19. července 2021 8:27:29 CEST Oleksandr Natalenko wrote: > On pondělí 19. července 2021 3:40:40 CEST Ming Lei wrote: > > On Sat, Jul 17, 2021 at 02:35:14PM +0200, Oleksandr Natalenko wrote: > > > On sobota 17. července 2021 14:19:59 CEST Oleksandr Natalenko wrote: > > > > On sobota 17. července 2021 14:11:05 CEST Oleksandr Natalenko wrote: > > > > > On sobota 17. července 2021 11:35:32 CEST Ming Lei wrote: > > > > > > Maybe you need to check if the build is OK, I can't reproduce it > > > > > > in > > > > > > my > > > > > > VM, and BFQ is still builtin: > > > > > > > > > > > > [root@ktest-01 ~]# uname -a > > > > > > Linux ktest-01 5.14.0-rc1+ #52 SMP Fri Jul 16 18:56:36 CST 2021 > > > > > > x86_64 > > > > > > x86_64 x86_64 GNU/Linux [root@ktest-01 ~]# cat > > > > > > /sys/block/nvme0n1/queue/scheduler > > > > > > [none] mq-deadline kyber bfq > > > > > > > > > > I don't think this is an issue with the build… BTW, with > > > > > `initcall_debug`: > > > > > > > > > > ``` > > > > > [ 0.902555] calling bfq_init+0x0/0x8b @ 1 > > > > > [ 0.903448] initcall bfq_init+0x0/0x8b returned -28 after 507 > > > > > usecs > > > > > ``` > > > > > > > > > > -ENOSPC? Why? Also re-tested with the latest git tip, same result > > > > > :(. > > > > > > > > OK, one extra pr_info, and I see this: > > > > > > > > ``` > > > > [ 0.871180] blkcg_policy_register: BLKCG_MAX_POLS too small > > > > [ 0.871612] blkcg_policy_register: -28 > > > > ``` > > > > > > > > What does it mean please :)? 
The value seems to be hard-coded: > > > > > > > > ``` > > > > include/linux/blkdev.h > > > > 60:#define BLKCG_MAX_POLS 5 > > > > ``` > > > > > > OK, after increasing this to 6 I've got my BFQ back. Please see [1]. > > > > > > [1] > > > https://lore.kernel.org/linux-block/20210717123328.945810-1-oleksandr@na > > > t > > > alenko.name/ > > > > OK, after you fixed the issue in blkcg_policy_register(), can you > > reproduce the discard issue on v5.14-rc1 with BFQ applied? If yes, > > can you test the patch I posted previously? > > Yes, the issue is reproducible with both v5.13.2 and v5.14-rc1. I haven't > managed to reproduce it with v5.13.2+your patch. Now I will build v5.14- > rc2+your patch and test further. I'm still hammering v5.14-rc2 + your patch, and I cannot reproduce the issue. Given I do not have a reliable reproducer (I'm just firing up the kernel build, and the issue pops up eventually, sooner or later, but usually within a couple of first tries), for how long I should hammer it for your fix to be considered proven? Thanks. -- Oleksandr Natalenko (post-factum) ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: New warning in nvme_setup_discard @ 2021-07-20 9:05 ` Oleksandr Natalenko 0 siblings, 0 replies; 52+ messages in thread From: Oleksandr Natalenko @ 2021-07-20 9:05 UTC (permalink / raw) To: Ming Lei Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Greg Kroah-Hartman, Keith Busch Hello, Ming. On pondělí 19. července 2021 8:27:29 CEST Oleksandr Natalenko wrote: > On pondělí 19. července 2021 3:40:40 CEST Ming Lei wrote: > > On Sat, Jul 17, 2021 at 02:35:14PM +0200, Oleksandr Natalenko wrote: > > > On sobota 17. července 2021 14:19:59 CEST Oleksandr Natalenko wrote: > > > > On sobota 17. července 2021 14:11:05 CEST Oleksandr Natalenko wrote: > > > > > On sobota 17. července 2021 11:35:32 CEST Ming Lei wrote: > > > > > > Maybe you need to check if the build is OK, I can't reproduce it > > > > > > in > > > > > > my > > > > > > VM, and BFQ is still builtin: > > > > > > > > > > > > [root@ktest-01 ~]# uname -a > > > > > > Linux ktest-01 5.14.0-rc1+ #52 SMP Fri Jul 16 18:56:36 CST 2021 > > > > > > x86_64 > > > > > > x86_64 x86_64 GNU/Linux [root@ktest-01 ~]# cat > > > > > > /sys/block/nvme0n1/queue/scheduler > > > > > > [none] mq-deadline kyber bfq > > > > > > > > > > I don't think this is an issue with the build… BTW, with > > > > > `initcall_debug`: > > > > > > > > > > ``` > > > > > [ 0.902555] calling bfq_init+0x0/0x8b @ 1 > > > > > [ 0.903448] initcall bfq_init+0x0/0x8b returned -28 after 507 > > > > > usecs > > > > > ``` > > > > > > > > > > -ENOSPC? Why? Also re-tested with the latest git tip, same result > > > > > :(. > > > > > > > > OK, one extra pr_info, and I see this: > > > > > > > > ``` > > > > [ 0.871180] blkcg_policy_register: BLKCG_MAX_POLS too small > > > > [ 0.871612] blkcg_policy_register: -28 > > > > ``` > > > > > > > > What does it mean please :)? 
The value seems to be hard-coded: > > > > > > > > ``` > > > > include/linux/blkdev.h > > > > 60:#define BLKCG_MAX_POLS 5 > > > > ``` > > > > > > OK, after increasing this to 6 I've got my BFQ back. Please see [1]. > > > > > > [1] > > > https://lore.kernel.org/linux-block/20210717123328.945810-1-oleksandr@na > > > t > > > alenko.name/ > > > > OK, after you fixed the issue in blkcg_policy_register(), can you > > reproduce the discard issue on v5.14-rc1 with BFQ applied? If yes, > > can you test the patch I posted previously? > > Yes, the issue is reproducible with both v5.13.2 and v5.14-rc1. I haven't > managed to reproduce it with v5.13.2+your patch. Now I will build v5.14- > rc2+your patch and test further. I'm still hammering v5.14-rc2 + your patch, and I cannot reproduce the issue. Given I do not have a reliable reproducer (I'm just firing up the kernel build, and the issue pops up eventually, sooner or later, but usually within a couple of first tries), for how long I should hammer it for your fix to be considered proven? Thanks. -- Oleksandr Natalenko (post-factum) _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 52+ messages in thread
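The `-28` (`-ENOSPC`) from `bfq_init` above comes from `blkcg_policy_register()` running out of policy slots: the kernel keeps registered blkcg policies in a fixed-size array of `BLKCG_MAX_POLS` entries, so a sixth policy (BFQ, on a config that already builds five) finds no free slot. A minimal userspace model of that mechanism (not the kernel code itself; names and structure are simplified stand-ins):

```c
#include <errno.h>
#include <stddef.h>

#define BLKCG_MAX_POLS 5	/* the hard-coded limit from include/linux/blkdev.h */

/* One slot per registered policy, NULL meaning free. */
static const void *blkcg_policy[BLKCG_MAX_POLS];

/* Simplified model of blkcg_policy_register(): scan for a free slot,
 * claim it, or fail with -ENOSPC (-28) when the array is full -- the
 * same errno the bfq_init initcall returned above. */
int model_blkcg_policy_register(const void *pol)
{
	int i;

	for (i = 0; i < BLKCG_MAX_POLS; i++) {
		if (!blkcg_policy[i]) {
			blkcg_policy[i] = pol;
			return 0;
		}
	}
	return -ENOSPC;
}
```

This is why bumping the define to 6 "got BFQ back": the sixth registration then finds a slot instead of bailing out.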
* Re: New warning in nvme_setup_discard 2021-07-20 9:05 ` Oleksandr Natalenko @ 2021-07-21 8:00 ` Ming Lei -1 siblings, 0 replies; 52+ messages in thread From: Ming Lei @ 2021-07-21 8:00 UTC (permalink / raw) To: Oleksandr Natalenko Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Greg Kroah-Hartman, Keith Busch On Tue, Jul 20, 2021 at 11:05:29AM +0200, Oleksandr Natalenko wrote: > Hello, Ming. > > On pondělí 19. července 2021 8:27:29 CEST Oleksandr Natalenko wrote: > > On pondělí 19. července 2021 3:40:40 CEST Ming Lei wrote: > > > On Sat, Jul 17, 2021 at 02:35:14PM +0200, Oleksandr Natalenko wrote: > > > > On sobota 17. července 2021 14:19:59 CEST Oleksandr Natalenko wrote: > > > > > On sobota 17. července 2021 14:11:05 CEST Oleksandr Natalenko wrote: > > > > > > On sobota 17. července 2021 11:35:32 CEST Ming Lei wrote: > > > > > > > Maybe you need to check if the build is OK, I can't reproduce it > > > > > > > in > > > > > > > my > > > > > > > VM, and BFQ is still builtin: > > > > > > > > > > > > > > [root@ktest-01 ~]# uname -a > > > > > > > Linux ktest-01 5.14.0-rc1+ #52 SMP Fri Jul 16 18:56:36 CST 2021 > > > > > > > x86_64 > > > > > > > x86_64 x86_64 GNU/Linux [root@ktest-01 ~]# cat > > > > > > > /sys/block/nvme0n1/queue/scheduler > > > > > > > [none] mq-deadline kyber bfq > > > > > > > > > > > > I don't think this is an issue with the build… BTW, with > > > > > > `initcall_debug`: > > > > > > > > > > > > ``` > > > > > > [ 0.902555] calling bfq_init+0x0/0x8b @ 1 > > > > > > [ 0.903448] initcall bfq_init+0x0/0x8b returned -28 after 507 > > > > > > usecs > > > > > > ``` > > > > > > > > > > > > -ENOSPC? Why? Also re-tested with the latest git tip, same result > > > > > > :(. 
> > > > > > > > > > OK, one extra pr_info, and I see this: > > > > > > > > > > ``` > > > > > [ 0.871180] blkcg_policy_register: BLKCG_MAX_POLS too small > > > > > [ 0.871612] blkcg_policy_register: -28 > > > > > ``` > > > > > > > > > > What does it mean please :)? The value seems to be hard-coded: > > > > > > > > > > ``` > > > > > include/linux/blkdev.h > > > > > 60:#define BLKCG_MAX_POLS 5 > > > > > ``` > > > > > > > > OK, after increasing this to 6 I've got my BFQ back. Please see [1]. > > > > > > > > [1] > > > > https://lore.kernel.org/linux-block/20210717123328.945810-1-oleksandr@na > > > > t > > > > alenko.name/ > > > > > > OK, after you fixed the issue in blkcg_policy_register(), can you > > > reproduce the discard issue on v5.14-rc1 with BFQ applied? If yes, > > > can you test the patch I posted previously? > > > > Yes, the issue is reproducible with both v5.13.2 and v5.14-rc1. I haven't > > managed to reproduce it with v5.13.2+your patch. Now I will build v5.14- > > rc2+your patch and test further. > > I'm still hammering v5.14-rc2 + your patch, and I cannot reproduce the issue. > Given I do not have a reliable reproducer (I'm just firing up the kernel build, > and the issue pops up eventually, sooner or later, but usually within a couple > of first tries), for how long I should hammer it for your fix to be considered > proven? You mentioned that the issue is reproducible with v5.14-rc, which means it can always be reproduced within some limited time (call it A). If the issue can no longer be reproduced after applying the patch for a much longer time B (B >> A), we can consider it fixed by the patch. For example, if A is one hour, we can set B to 5*A or bigger to serve as the long enough time. Thanks, Ming ^ permalink raw reply [flat|nested] 52+ messages in thread
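The soak-time rule of thumb above can be sketched as a helper (the function name and the minimum factor of 5 are our framing of Ming's suggestion, not anything from the kernel):

```c
/* If the bug reproduces within `repro_minutes` on an unpatched kernel,
 * soak-test the patched kernel for at least 5x that long before calling
 * it fixed; larger factors give more confidence. */
unsigned int soak_minutes(unsigned int repro_minutes, unsigned int factor)
{
	if (factor < 5)
		factor = 5;	/* enforce B >= 5*A */
	return repro_minutes * factor;
}
```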
* Re: New warning in nvme_setup_discard 2021-07-21 8:00 ` Ming Lei @ 2021-07-27 15:12 ` Oleksandr Natalenko -1 siblings, 0 replies; 52+ messages in thread From: Oleksandr Natalenko @ 2021-07-27 15:12 UTC (permalink / raw) To: Ming Lei Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Greg Kroah-Hartman, Keith Busch Hello. On středa 21. července 2021 10:00:56 CEST Ming Lei wrote: > > I'm still hammering v5.14-rc2 + your patch, and I cannot reproduce the > > issue. Given I do not have a reliable reproducer (I'm just firing up the > > kernel build, and the issue pops up eventually, sooner or later, but > > usually within a couple of first tries), for how long I should hammer it > > for your fix to be considered proven? > > You mentioned that the issue is reproducible with v5.14-rc, that means > it can be always reproduced in limited time(suppose it is A). If the issue > can't be reproduced any more after applying the patch in long enough time > B(B >> A), we can think it is fixed by the patch. > > For example, if A is one hour, we can set B as 5*A or bigger to simulate > the long enough time. With your patch the issue is still not reproducible for me. Hence, from my side the patch looks to be verified. Thanks. -- Oleksandr Natalenko (post-factum) ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: New warning in nvme_setup_discard 2021-07-27 15:12 ` Oleksandr Natalenko @ 2021-07-27 15:58 ` Ming Lei -1 siblings, 0 replies; 52+ messages in thread From: Ming Lei @ 2021-07-27 15:58 UTC (permalink / raw) To: Oleksandr Natalenko Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Greg Kroah-Hartman, Keith Busch On Tue, Jul 27, 2021 at 05:12:03PM +0200, Oleksandr Natalenko wrote: > Hello. > > On středa 21. července 2021 10:00:56 CEST Ming Lei wrote: > > > I'm still hammering v5.14-rc2 + your patch, and I cannot reproduce the > > > issue. Given I do not have a reliable reproducer (I'm just firing up the > > > kernel build, and the issue pops up eventually, sooner or later, but > > > usually within a couple of first tries), for how long I should hammer it > > > for your fix to be considered proven? > > > > You mentioned that the issue is reproducible with v5.14-rc, that means > > it can be always reproduced in limited time(suppose it is A). If the issue > > can't be reproduced any more after applying the patch in long enough time > > B(B >> A), we can think it is fixed by the patch. > > > > For example, if A is one hour, we can set B as 5*A or bigger to simulate > > the long enough time. > > With your patch the issue is still not reproducible for me. Hence, from my > side the patch looks to be verified. OK. BTW, can you test the following patch? which is another approach on the same issue with other benefits. From c853e7ed05a75f631da5b7952b9a989983437819 Mon Sep 17 00:00:00 2001 From: Ming Lei <ming.lei@redhat.com> Date: Mon, 7 Jun 2021 16:03:51 +0800 Subject: [PATCH 2/2] block: support bio merge for multi-range discard So far multi-range discard treats each bio as one segment(range) of single discard request. This way becomes not efficient if lots of small sized discard bios are submitted, and one example is raid456. 
Support bio merge for multi-range discard for improving lots of small sized discard bios. Turns out it is easy to support it: 1) always try to merge bio first 2) run into multi-range discard only if bio merge can't be done 3) add rq_for_each_discard_range() for retrieving each range(segment) of discard request Reported-by: Wang Shanker <shankerwangmiao@gmail.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> --- block/blk-merge.c | 12 ++++----- drivers/block/virtio_blk.c | 9 ++++--- drivers/nvme/host/core.c | 8 +++--- include/linux/blkdev.h | 51 ++++++++++++++++++++++++++++++++++++++ 4 files changed, 66 insertions(+), 14 deletions(-) diff --git a/block/blk-merge.c b/block/blk-merge.c index bcdff1879c34..65210e9a8efa 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -724,10 +724,10 @@ static inline bool blk_discard_mergable(struct request *req) static enum elv_merge blk_try_req_merge(struct request *req, struct request *next) { - if (blk_discard_mergable(req)) - return ELEVATOR_DISCARD_MERGE; - else if (blk_rq_pos(req) + blk_rq_sectors(req) == blk_rq_pos(next)) + if (blk_rq_pos(req) + blk_rq_sectors(req) == blk_rq_pos(next)) return ELEVATOR_BACK_MERGE; + else if (blk_discard_mergable(req)) + return ELEVATOR_DISCARD_MERGE; return ELEVATOR_NO_MERGE; } @@ -908,12 +908,12 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio) enum elv_merge blk_try_merge(struct request *rq, struct bio *bio) { - if (blk_discard_mergable(rq)) - return ELEVATOR_DISCARD_MERGE; - else if (blk_rq_pos(rq) + blk_rq_sectors(rq) == bio->bi_iter.bi_sector) + if (blk_rq_pos(rq) + blk_rq_sectors(rq) == bio->bi_iter.bi_sector) return ELEVATOR_BACK_MERGE; else if (blk_rq_pos(rq) - bio_sectors(bio) == bio->bi_iter.bi_sector) return ELEVATOR_FRONT_MERGE; + else if (blk_discard_mergable(rq)) + return ELEVATOR_DISCARD_MERGE; return ELEVATOR_NO_MERGE; } diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index b9fa3ef5b57c..970cb0d8acaa 100644 --- a/drivers/block/virtio_blk.c 
+++ b/drivers/block/virtio_blk.c @@ -116,7 +116,6 @@ static int virtblk_setup_discard_write_zeroes(struct request *req, bool unmap) unsigned short segments = blk_rq_nr_discard_segments(req); unsigned short n = 0; struct virtio_blk_discard_write_zeroes *range; - struct bio *bio; u32 flags = 0; if (unmap) @@ -138,9 +137,11 @@ static int virtblk_setup_discard_write_zeroes(struct request *req, bool unmap) range[0].sector = cpu_to_le64(blk_rq_pos(req)); n = 1; } else { - __rq_for_each_bio(bio, req) { - u64 sector = bio->bi_iter.bi_sector; - u32 num_sectors = bio->bi_iter.bi_size >> SECTOR_SHIFT; + struct req_discard_range r; + + rq_for_each_discard_range(r, req) { + u64 sector = r.sector; + u32 num_sectors = r.size >> SECTOR_SHIFT; range[n].flags = cpu_to_le32(flags); range[n].num_sectors = cpu_to_le32(num_sectors); diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 24bcae88587a..4b0a39360ce9 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -813,7 +813,7 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req, { unsigned short segments = blk_rq_nr_discard_segments(req), n = 0; struct nvme_dsm_range *range; - struct bio *bio; + struct req_discard_range r; /* * Some devices do not consider the DSM 'Number of Ranges' field when @@ -835,9 +835,9 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req, range = page_address(ns->ctrl->discard_page); } - __rq_for_each_bio(bio, req) { - u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector); - u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift; + rq_for_each_discard_range(r, req) { + u64 slba = nvme_sect_to_lba(ns, r.sector); + u32 nlb = r.size >> ns->lba_shift; if (n < segments) { range[n].cattr = cpu_to_le32(0); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index d66d0da72529..bd9d22269a7b 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1007,6 +1007,57 @@ static inline unsigned int 
blk_rq_stats_sectors(const struct request *rq) return rq->stats_sectors; } +struct req_discard_range { + sector_t sector; + unsigned int size; + + /* + * internal field: driver don't use it, and it always points to + * next bio to be processed + */ + struct bio *__bio; +}; + +static inline void req_init_discard_range_iter(const struct request *rq, + struct req_discard_range *range) +{ + range->__bio = rq->bio; +} + +/* return true if @range stores one valid discard range */ +static inline bool req_get_discard_range(struct req_discard_range *range) +{ + struct bio *bio; + + if (!range->__bio) + return false; + + bio = range->__bio; + range->sector = bio->bi_iter.bi_sector; + range->size = bio->bi_iter.bi_size; + range->__bio = bio->bi_next; + + while (range->__bio) { + struct bio *bio = range->__bio; + + if (range->sector + (range->size >> SECTOR_SHIFT) != + bio->bi_iter.bi_sector) + break; + + /* + * ->size won't overflow because req->__data_len is defined + * as 'unsigned int' + */ + range->size += bio->bi_iter.bi_size; + range->__bio = bio->bi_next; + } + return true; +} + +#define rq_for_each_discard_range(range, rq) \ + for (req_init_discard_range_iter((rq), &range); \ + req_get_discard_range(&range);) + #ifdef CONFIG_BLK_DEV_ZONED /* Helper to convert BLK_ZONE_ZONE_XXX to its string format XXX */ -- 2.31.1 Thanks, Ming ^ permalink raw reply related [flat|nested] 52+ messages in thread
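The core of the `rq_for_each_discard_range()` iterator in the patch above is the coalescing step in `req_get_discard_range()`: keep absorbing the next bio as long as it starts exactly where the current range ends. A self-contained userspace sketch of that walk (struct and function names here are simplified stand-ins, not the kernel's):

```c
#include <stddef.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors, as in the kernel */

/* Stand-in for the bits of struct bio the iterator reads. */
struct fake_bio {
	unsigned long long sector;	/* bi_iter.bi_sector */
	unsigned int size;		/* bi_iter.bi_size, in bytes */
	struct fake_bio *next;		/* bi_next */
};

/* Walk the chain and coalesce physically contiguous bios into ranges,
 * mirroring req_get_discard_range(); records (sector, size) pairs and
 * returns how many ranges came out. */
int collect_discard_ranges(struct fake_bio *bio,
			   unsigned long long *sectors, unsigned int *sizes)
{
	int n = 0;

	while (bio) {
		unsigned long long sector = bio->sector;
		unsigned int size = bio->size;

		/* absorb bios that start exactly where the range ends */
		while (bio->next &&
		       sector + (size >> SECTOR_SHIFT) == bio->next->sector) {
			bio = bio->next;
			size += bio->size;
		}
		sectors[n] = sector;
		sizes[n] = size;
		n++;
		bio = bio->next;
	}
	return n;
}
```

With this, three queued bios at sectors 0, 8, and 100 (the first two contiguous) come out as two discard ranges rather than three, which is exactly the efficiency win the commit message claims for workloads like raid456 that submit many small discards.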
* Re: New warning in nvme_setup_discard 2021-07-27 15:58 ` Ming Lei @ 2021-07-28 13:44 ` Oleksandr Natalenko -1 siblings, 0 replies; 52+ messages in thread From: Oleksandr Natalenko @ 2021-07-28 13:44 UTC (permalink / raw) To: Ming Lei Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Greg Kroah-Hartman, Keith Busch Hello. On úterý 27. července 2021 17:58:39 CEST Ming Lei wrote: > BTW, can you test the following patch? which is another approach on the same > issue with other benefits. > > From c853e7ed05a75f631da5b7952b9a989983437819 Mon Sep 17 00:00:00 2001 > From: Ming Lei <ming.lei@redhat.com> > Date: Mon, 7 Jun 2021 16:03:51 +0800 > Subject: [PATCH 2/2] block: support bio merge for multi-range discard > > So far multi-range discard treats each bio as one segment(range) of single > discard request. This way becomes not efficient if lots of small sized > discard bios are submitted, and one example is raid456. > > Support bio merge for multi-range discard for improving lots of small > sized discard bios. 
> > Turns out it is easy to support it: > > 1) always try to merge bio first > > 2) run into multi-range discard only if bio merge can't be done > > 3) add rq_for_each_discard_range() for retrieving each range(segment) > of discard request > > Reported-by: Wang Shanker <shankerwangmiao@gmail.com> > Signed-off-by: Ming Lei <ming.lei@redhat.com> > --- > block/blk-merge.c | 12 ++++----- > drivers/block/virtio_blk.c | 9 ++++--- > drivers/nvme/host/core.c | 8 +++--- > include/linux/blkdev.h | 51 ++++++++++++++++++++++++++++++++++++++ > 4 files changed, 66 insertions(+), 14 deletions(-) > > diff --git a/block/blk-merge.c b/block/blk-merge.c > index bcdff1879c34..65210e9a8efa 100644 > --- a/block/blk-merge.c > +++ b/block/blk-merge.c > @@ -724,10 +724,10 @@ static inline bool blk_discard_mergable(struct request > *req) static enum elv_merge blk_try_req_merge(struct request *req, > struct request *next) > { > - if (blk_discard_mergable(req)) > - return ELEVATOR_DISCARD_MERGE; > - else if (blk_rq_pos(req) + blk_rq_sectors(req) == blk_rq_pos(next)) > + if (blk_rq_pos(req) + blk_rq_sectors(req) == blk_rq_pos(next)) > return ELEVATOR_BACK_MERGE; > + else if (blk_discard_mergable(req)) > + return ELEVATOR_DISCARD_MERGE; > > return ELEVATOR_NO_MERGE; > } > @@ -908,12 +908,12 @@ bool blk_rq_merge_ok(struct request *rq, struct bio > *bio) > > enum elv_merge blk_try_merge(struct request *rq, struct bio *bio) > { > - if (blk_discard_mergable(rq)) > - return ELEVATOR_DISCARD_MERGE; > - else if (blk_rq_pos(rq) + blk_rq_sectors(rq) == bio->bi_iter.bi_sector) > + if (blk_rq_pos(rq) + blk_rq_sectors(rq) == bio->bi_iter.bi_sector) > return ELEVATOR_BACK_MERGE; > else if (blk_rq_pos(rq) - bio_sectors(bio) == bio->bi_iter.bi_sector) > return ELEVATOR_FRONT_MERGE; > + else if (blk_discard_mergable(rq)) > + return ELEVATOR_DISCARD_MERGE; > return ELEVATOR_NO_MERGE; > } > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c > index b9fa3ef5b57c..970cb0d8acaa 100644 > --- 
* Re: New warning in nvme_setup_discard @ 2021-07-28 13:44 ` Oleksandr Natalenko 0 siblings, 0 replies; 52+ messages in thread From: Oleksandr Natalenko @ 2021-07-28 13:44 UTC (permalink / raw) To: Ming Lei Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Greg Kroah-Hartman, Keith Busch Hello. On Tuesday, 27 July 2021 at 17:58:39 CEST, Ming Lei wrote: > BTW, can you test the following patch? which is another approach on the same > issue with other benefits. > > From c853e7ed05a75f631da5b7952b9a989983437819 Mon Sep 17 00:00:00 2001 > From: Ming Lei <ming.lei@redhat.com> > Date: Mon, 7 Jun 2021 16:03:51 +0800 > Subject: [PATCH 2/2] block: support bio merge for multi-range discard > > So far multi-range discard treats each bio as one segment(range) of single > discard request. This way becomes not efficient if lots of small sized > discard bios are submitted, and one example is raid456. > > Support bio merge for multi-range discard for improving lots of small > sized discard bios. 
> > Turns out it is easy to support it: > > 1) always try to merge bio first > > 2) run into multi-range discard only if bio merge can't be done > > 3) add rq_for_each_discard_range() for retrieving each range(segment) > of discard request > > Reported-by: Wang Shanker <shankerwangmiao@gmail.com> > Signed-off-by: Ming Lei <ming.lei@redhat.com> > --- > block/blk-merge.c | 12 ++++----- > drivers/block/virtio_blk.c | 9 ++++--- > drivers/nvme/host/core.c | 8 +++--- > include/linux/blkdev.h | 51 ++++++++++++++++++++++++++++++++++++++ > 4 files changed, 66 insertions(+), 14 deletions(-) > > diff --git a/block/blk-merge.c b/block/blk-merge.c > index bcdff1879c34..65210e9a8efa 100644 > --- a/block/blk-merge.c > +++ b/block/blk-merge.c > @@ -724,10 +724,10 @@ static inline bool blk_discard_mergable(struct request > *req) static enum elv_merge blk_try_req_merge(struct request *req, > struct request *next) > { > - if (blk_discard_mergable(req)) > - return ELEVATOR_DISCARD_MERGE; > - else if (blk_rq_pos(req) + blk_rq_sectors(req) == blk_rq_pos(next)) > + if (blk_rq_pos(req) + blk_rq_sectors(req) == blk_rq_pos(next)) > return ELEVATOR_BACK_MERGE; > + else if (blk_discard_mergable(req)) > + return ELEVATOR_DISCARD_MERGE; > > return ELEVATOR_NO_MERGE; > } > @@ -908,12 +908,12 @@ bool blk_rq_merge_ok(struct request *rq, struct bio > *bio) > > enum elv_merge blk_try_merge(struct request *rq, struct bio *bio) > { > - if (blk_discard_mergable(rq)) > - return ELEVATOR_DISCARD_MERGE; > - else if (blk_rq_pos(rq) + blk_rq_sectors(rq) == bio->bi_iter.bi_sector) > + if (blk_rq_pos(rq) + blk_rq_sectors(rq) == bio->bi_iter.bi_sector) > return ELEVATOR_BACK_MERGE; > else if (blk_rq_pos(rq) - bio_sectors(bio) == bio->bi_iter.bi_sector) > return ELEVATOR_FRONT_MERGE; > + else if (blk_discard_mergable(rq)) > + return ELEVATOR_DISCARD_MERGE; > return ELEVATOR_NO_MERGE; > } > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c > index b9fa3ef5b57c..970cb0d8acaa 100644 > --- 
a/drivers/block/virtio_blk.c > +++ b/drivers/block/virtio_blk.c > @@ -116,7 +116,6 @@ static int virtblk_setup_discard_write_zeroes(struct > request *req, bool unmap) unsigned short segments = > blk_rq_nr_discard_segments(req); > unsigned short n = 0; > struct virtio_blk_discard_write_zeroes *range; > - struct bio *bio; > u32 flags = 0; > > if (unmap) > @@ -138,9 +137,11 @@ static int virtblk_setup_discard_write_zeroes(struct > request *req, bool unmap) range[0].sector = cpu_to_le64(blk_rq_pos(req)); > n = 1; > } else { > - __rq_for_each_bio(bio, req) { > - u64 sector = bio->bi_iter.bi_sector; > - u32 num_sectors = bio->bi_iter.bi_size >> SECTOR_SHIFT; > + struct req_discard_range r; > + > + rq_for_each_discard_range(r, req) { > + u64 sector = r.sector; > + u32 num_sectors = r.size >> SECTOR_SHIFT; > > range[n].flags = cpu_to_le32(flags); > range[n].num_sectors = cpu_to_le32(num_sectors); > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > index 24bcae88587a..4b0a39360ce9 100644 > --- a/drivers/nvme/host/core.c > +++ b/drivers/nvme/host/core.c > @@ -813,7 +813,7 @@ static blk_status_t nvme_setup_discard(struct nvme_ns > *ns, struct request *req, { > unsigned short segments = blk_rq_nr_discard_segments(req), n = 0; > struct nvme_dsm_range *range; > - struct bio *bio; > + struct req_discard_range r; > > /* > * Some devices do not consider the DSM 'Number of Ranges' field when > @@ -835,9 +835,9 @@ static blk_status_t nvme_setup_discard(struct nvme_ns > *ns, struct request *req, range = page_address(ns->ctrl->discard_page); > } > > - __rq_for_each_bio(bio, req) { > - u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector); > - u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift; > + rq_for_each_discard_range(r, req) { > + u64 slba = nvme_sect_to_lba(ns, r.sector); > + u32 nlb = r.size >> ns->lba_shift; > > if (n < segments) { > range[n].cattr = cpu_to_le32(0); > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > index 
d66d0da72529..bd9d22269a7b 100644 > --- a/include/linux/blkdev.h > +++ b/include/linux/blkdev.h > @@ -1007,6 +1007,57 @@ static inline unsigned int blk_rq_stats_sectors(const > struct request *rq) return rq->stats_sectors; > } > > +struct req_discard_range { > + sector_t sector; > + unsigned int size; > + > + /* > + * internal field: driver don't use it, and it always points to > + * next bio to be processed > + */ > + struct bio *__bio; > +}; > + > +static inline void req_init_discard_range_iter(const struct request *rq, > + struct req_discard_range *range) > +{ > + range->__bio = rq->bio; > +} > + > +/* return true if @range stores one valid discard range */ > +static inline bool req_get_discard_range(struct req_discard_range *range) > +{ > + struct bio *bio; > + > + if (!range->__bio) > + return false; > + > + bio = range->__bio; > + range->sector = bio->bi_iter.bi_sector; > + range->size = bio->bi_iter.bi_size; > + range->__bio = bio->bi_next; > + > + while (range->__bio) { > + struct bio *bio = range->__bio; > + > + if (range->sector + (range->size >> SECTOR_SHIFT) != > + bio->bi_iter.bi_sector) > + break; > + > + /* > + * ->size won't overflow because req->__data_len is defined > + * as 'unsigned int' > + */ > + range->size += bio->bi_iter.bi_size; > + range->__bio = bio->bi_next; > + } > + return true; > +} > + > +#define rq_for_each_discard_range(range, rq) \ > + for (req_init_discard_range_iter((rq), &range); \ > + req_get_discard_range(&range);) > + > #ifdef CONFIG_BLK_DEV_ZONED > > /* Helper to convert BLK_ZONE_ZONE_XXX to its string format XXX */ Do I have to revert the previous one and apply this one? 
If so, with this one the issue is triggered pretty quick: ``` kernel: ------------[ cut here ]------------ kernel: WARNING: CPU: 20 PID: 490 at drivers/nvme/host/core.c:850 nvme_setup_discard+0x1b9/0x220 … kernel: CPU: 20 PID: 490 Comm: md0_raid10 Not tainted 5.13.0-pf4 #1 kernel: Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS 3601 05/26/2021 kernel: RIP: 0010:nvme_setup_discard+0x1b9/0x220 kernel: Code: 38 4c 8b 88 40 0b 00 00 4c 2b 0d f2 06 d8 00 49 c1 f9 06 49 c1 e1 0c 4c 03 0d f3 06 d8 00 4d 89 c8 48 85 d2 0f 85 9f fe ff ff <0f> 0b b8 00 00 00 80 4c 01 c8 72 52 48 c7 c2 00 00 00 80 48 2b 15 kernel: RSP: 0018:ffffa3a34152ba10 EFLAGS: 00010202 kernel: RAX: ffff8b78d80db0c0 RBX: 000000000000000f RCX: 0000000000000400 kernel: RDX: 0000000000000000 RSI: 00000000241b5c00 RDI: 000000000000000d kernel: RBP: ffff8b78cbd70380 R08: ffff8b78d80db000 R09: ffff8b78d80db000 kernel: R10: 00000000241b5c00 R11: 0000000000000000 R12: ffff8b78c5a4b800 kernel: R13: ffff8b78cbd704c8 R14: ffff8b78c5bd8000 R15: ffff8b78cabbf000 kernel: FS: 0000000000000000(0000) GS:ffff8b7fcef00000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 00007faaaf746020 CR3: 00000001e0342000 CR4: 0000000000350ee0 kernel: Call Trace: kernel: nvme_setup_cmd+0x2d0/0x670 kernel: nvme_queue_rq+0x79/0xc90 kernel: ? __sbitmap_get_word+0x30/0x80 kernel: ? sbitmap_get+0x85/0x180 kernel: blk_mq_dispatch_rq_list+0x15c/0x810 kernel: __blk_mq_do_dispatch_sched+0xca/0x320 kernel: ? ktime_get+0x38/0xa0 kernel: __blk_mq_sched_dispatch_requests+0x14d/0x190 kernel: blk_mq_sched_dispatch_requests+0x2f/0x60 kernel: __blk_mq_run_hw_queue+0x30/0xa0 kernel: __blk_mq_delay_run_hw_queue+0x142/0x170 kernel: blk_mq_sched_insert_requests+0x6d/0xf0 kernel: blk_mq_flush_plug_list+0x111/0x1c0 kernel: blk_finish_plug+0x21/0x30 kernel: raid10d+0x7c8/0x1960 [raid10] kernel: ? psi_task_switch+0xf2/0x330 kernel: ? __switch_to_asm+0x42/0x70 kernel: ? 
finish_task_switch.isra.0+0xaa/0x290 kernel: ? md_thread+0xc3/0x190 [md_mod] kernel: md_thread+0xc3/0x190 [md_mod] kernel: ? finish_wait+0x80/0x80 kernel: ? md_rdev_init+0xb0/0xb0 [md_mod] kernel: kthread+0x1b3/0x1e0 kernel: ? __kthread_init_worker+0x50/0x50 kernel: ret_from_fork+0x22/0x30 kernel: ---[ end trace dc148fcea235e799 ]--- kernel: blk_update_request: I/O error, dev nvme0n1, sector 605615104 op 0x3:(DISCARD) flags 0x0 phys_seg 15 prio class 0 kernel: blk_update_request: I/O error, dev nvme1n1, sector 118159360 op 0x3:(DISCARD) flags 0x0 phys_seg 15 prio class 0 kernel: blk_update_request: I/O error, dev nvme0n1, sector 118200320 op 0x3:(DISCARD) flags 0x0 phys_seg 50 prio class 0 kernel: blk_update_request: I/O error, dev nvme1n1, sector 118326272 op 0x3:(DISCARD) flags 0x0 phys_seg 165 prio class 0 ``` -- Oleksandr Natalenko (post-factum) ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: New warning in nvme_setup_discard 2021-07-28 13:44 ` Oleksandr Natalenko @ 2021-07-28 15:53 ` Ming Lei -1 siblings, 0 replies; 52+ messages in thread From: Ming Lei @ 2021-07-28 15:53 UTC (permalink / raw) To: Oleksandr Natalenko Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Greg Kroah-Hartman, Keith Busch On Wed, Jul 28, 2021 at 03:44:06PM +0200, Oleksandr Natalenko wrote: > Hello. > > On úterý 27. července 2021 17:58:39 CEST Ming Lei wrote: > > BTW, can you test the following patch? which is another approach on the same > > issue with other benefits. > > > > From c853e7ed05a75f631da5b7952b9a989983437819 Mon Sep 17 00:00:00 2001 > > From: Ming Lei <ming.lei@redhat.com> > > Date: Mon, 7 Jun 2021 16:03:51 +0800 > > Subject: [PATCH 2/2] block: support bio merge for multi-range discard > > > > So far multi-range discard treats each bio as one segment(range) of single > > discard request. This way becomes not efficient if lots of small sized > > discard bios are submitted, and one example is raid456. > > > > Support bio merge for multi-range discard for improving lots of small > > sized discard bios. 
> > > > Turns out it is easy to support it: > > > > 1) always try to merge bio first > > > > 2) run into multi-range discard only if bio merge can't be done > > > > 3) add rq_for_each_discard_range() for retrieving each range(segment) > > of discard request > > > > Reported-by: Wang Shanker <shankerwangmiao@gmail.com> > > Signed-off-by: Ming Lei <ming.lei@redhat.com> > > --- > > block/blk-merge.c | 12 ++++----- > > drivers/block/virtio_blk.c | 9 ++++--- > > drivers/nvme/host/core.c | 8 +++--- > > include/linux/blkdev.h | 51 ++++++++++++++++++++++++++++++++++++++ > > 4 files changed, 66 insertions(+), 14 deletions(-) > > > > diff --git a/block/blk-merge.c b/block/blk-merge.c > > index bcdff1879c34..65210e9a8efa 100644 > > --- a/block/blk-merge.c > > +++ b/block/blk-merge.c > > @@ -724,10 +724,10 @@ static inline bool blk_discard_mergable(struct request > > *req) static enum elv_merge blk_try_req_merge(struct request *req, > > struct request *next) > > { > > - if (blk_discard_mergable(req)) > > - return ELEVATOR_DISCARD_MERGE; > > - else if (blk_rq_pos(req) + blk_rq_sectors(req) == blk_rq_pos(next)) > > + if (blk_rq_pos(req) + blk_rq_sectors(req) == blk_rq_pos(next)) > > return ELEVATOR_BACK_MERGE; > > + else if (blk_discard_mergable(req)) > > + return ELEVATOR_DISCARD_MERGE; > > > > return ELEVATOR_NO_MERGE; > > } > > @@ -908,12 +908,12 @@ bool blk_rq_merge_ok(struct request *rq, struct bio > > *bio) > > > > enum elv_merge blk_try_merge(struct request *rq, struct bio *bio) > > { > > - if (blk_discard_mergable(rq)) > > - return ELEVATOR_DISCARD_MERGE; > > - else if (blk_rq_pos(rq) + blk_rq_sectors(rq) == bio->bi_iter.bi_sector) > > + if (blk_rq_pos(rq) + blk_rq_sectors(rq) == bio->bi_iter.bi_sector) > > return ELEVATOR_BACK_MERGE; > > else if (blk_rq_pos(rq) - bio_sectors(bio) == bio->bi_iter.bi_sector) > > return ELEVATOR_FRONT_MERGE; > > + else if (blk_discard_mergable(rq)) > > + return ELEVATOR_DISCARD_MERGE; > > return ELEVATOR_NO_MERGE; > > } > > > > diff --git 
a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c > > index b9fa3ef5b57c..970cb0d8acaa 100644 > > --- a/drivers/block/virtio_blk.c > > +++ b/drivers/block/virtio_blk.c > > @@ -116,7 +116,6 @@ static int virtblk_setup_discard_write_zeroes(struct > > request *req, bool unmap) unsigned short segments = > > blk_rq_nr_discard_segments(req); > > unsigned short n = 0; > > struct virtio_blk_discard_write_zeroes *range; > > - struct bio *bio; > > u32 flags = 0; > > > > if (unmap) > > @@ -138,9 +137,11 @@ static int virtblk_setup_discard_write_zeroes(struct > > request *req, bool unmap) range[0].sector = cpu_to_le64(blk_rq_pos(req)); > > n = 1; > > } else { > > - __rq_for_each_bio(bio, req) { > > - u64 sector = bio->bi_iter.bi_sector; > > - u32 num_sectors = bio->bi_iter.bi_size >> SECTOR_SHIFT; > > + struct req_discard_range r; > > + > > + rq_for_each_discard_range(r, req) { > > + u64 sector = r.sector; > > + u32 num_sectors = r.size >> SECTOR_SHIFT; > > > > range[n].flags = cpu_to_le32(flags); > > range[n].num_sectors = cpu_to_le32(num_sectors); > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > > index 24bcae88587a..4b0a39360ce9 100644 > > --- a/drivers/nvme/host/core.c > > +++ b/drivers/nvme/host/core.c > > @@ -813,7 +813,7 @@ static blk_status_t nvme_setup_discard(struct nvme_ns > > *ns, struct request *req, { > > unsigned short segments = blk_rq_nr_discard_segments(req), n = 0; > > struct nvme_dsm_range *range; > > - struct bio *bio; > > + struct req_discard_range r; > > > > /* > > * Some devices do not consider the DSM 'Number of Ranges' field when > > @@ -835,9 +835,9 @@ static blk_status_t nvme_setup_discard(struct nvme_ns > > *ns, struct request *req, range = page_address(ns->ctrl->discard_page); > > } > > > > - __rq_for_each_bio(bio, req) { > > - u64 slba = nvme_sect_to_lba(ns, bio->bi_iter.bi_sector); > > - u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift; > > + rq_for_each_discard_range(r, req) { > > + u64 slba = nvme_sect_to_lba(ns, 
r.sector); > > + u32 nlb = r.size >> ns->lba_shift; > > > > if (n < segments) { > > range[n].cattr = cpu_to_le32(0); > > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > > index d66d0da72529..bd9d22269a7b 100644 > > --- a/include/linux/blkdev.h > > +++ b/include/linux/blkdev.h > > @@ -1007,6 +1007,57 @@ static inline unsigned int blk_rq_stats_sectors(const > > struct request *rq) return rq->stats_sectors; > > } > > > > +struct req_discard_range { > > + sector_t sector; > > + unsigned int size; > > + > > + /* > > + * internal field: driver don't use it, and it always points to > > + * next bio to be processed > > + */ > > + struct bio *__bio; > > +}; > > + > > +static inline void req_init_discard_range_iter(const struct request *rq, > > + struct req_discard_range *range) > > +{ > > + range->__bio = rq->bio; > > +} > > + > > +/* return true if @range stores one valid discard range */ > > +static inline bool req_get_discard_range(struct req_discard_range *range) > > +{ > > + struct bio *bio; > > + > > + if (!range->__bio) > > + return false; > > + > > + bio = range->__bio; > > + range->sector = bio->bi_iter.bi_sector; > > + range->size = bio->bi_iter.bi_size; > > + range->__bio = bio->bi_next; > > + > > + while (range->__bio) { > > + struct bio *bio = range->__bio; > > + > > + if (range->sector + (range->size >> SECTOR_SHIFT) != > > + bio->bi_iter.bi_sector) > > + break; > > + > > + /* > > + * ->size won't overflow because req->__data_len is defined > > + * as 'unsigned int' > > + */ > > + range->size += bio->bi_iter.bi_size; > > + range->__bio = bio->bi_next; > > + } > > + return true; > > +} > > + > > +#define rq_for_each_discard_range(range, rq) \ > > + for (req_init_discard_range_iter((rq), &range); \ > > + req_get_discard_range(&range);) > > + > > #ifdef CONFIG_BLK_DEV_ZONED > > > > /* Helper to convert BLK_ZONE_ZONE_XXX to its string format XXX */ > > Do I have to revert the previous one and apply this one? 
If so, with this one the issue is triggered pretty quick: Yeah, the previous one needs to be reverted. > > ``` > kernel: ------------[ cut here ]------------ > kernel: WARNING: CPU: 20 PID: 490 at drivers/nvme/host/core.c:850 nvme_setup_discard+0x1b9/0x220 Can you collect debug log by applying the following patch against the last one? diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 8780e4aa9df2..fbd8a68c619b 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -828,6 +828,24 @@ static inline void nvme_setup_flush(struct nvme_ns *ns, cmnd->common.nsid = cpu_to_le32(ns->head->ns_id); } +static inline void blk_dump_rq(const struct request *req) +{ + struct bio *bio; + int i = 0; + + printk("dump req %p(f:%x, seg: %d)\n", req, req->cmd_flags, + req->nr_phys_segments); + + __rq_for_each_bio(bio, req) { + printk("%d-%p: %hx/%hx %llu %u\n", + i++, bio, + bio->bi_flags, bio->bi_opf, + (unsigned long long)bio->bi_iter.bi_sector, + bio->bi_iter.bi_size>>9); + } +} + + static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req, struct nvme_command *cmnd) { @@ -868,6 +886,8 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req, } if (WARN_ON_ONCE(n != segments)) { + printk("%s: ranges %u segments %u\n", __func__, n, segments); + blk_dump_rq(req); if (virt_to_page(range) == ns->ctrl->discard_page) clear_bit_unlock(0, &ns->ctrl->discard_page_busy); else Thanks, Ming ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: New warning in nvme_setup_discard 2021-07-28 15:53 ` Ming Lei @ 2021-07-28 16:38 ` Oleksandr Natalenko -1 siblings, 0 replies; 52+ messages in thread From: Oleksandr Natalenko @ 2021-07-28 16:38 UTC (permalink / raw) To: Ming Lei Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Greg Kroah-Hartman, Keith Busch Hello. On Wednesday, 28 July 2021 at 17:53:05 CEST, Ming Lei wrote: > Can you collect debug log by applying the following patch against the > last one? Yes, please see below. > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > index 8780e4aa9df2..fbd8a68c619b 100644 > --- a/drivers/nvme/host/core.c > +++ b/drivers/nvme/host/core.c > @@ -828,6 +828,24 @@ static inline void nvme_setup_flush(struct nvme_ns *ns, > cmnd->common.nsid = cpu_to_le32(ns->head->ns_id); > } > > +static inline void blk_dump_rq(const struct request *req) > +{ > + struct bio *bio; > + int i = 0; > + > + printk("dump req %p(f:%x, seg: %d)\n", req, req->cmd_flags, > + req->nr_phys_segments); > + > + __rq_for_each_bio(bio, req) { > + printk("%d-%p: %hx/%hx %llu %u\n", > + i++, bio, > + bio->bi_flags, bio->bi_opf, > + (unsigned long long)bio->bi_iter.bi_sector, > + bio->bi_iter.bi_size>>9); > + } > +} > + > + > static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request > *req, struct nvme_command *cmnd) > { > @@ -868,6 +886,8 @@ static blk_status_t nvme_setup_discard(struct nvme_ns > *ns, struct request *req, } > > if (WARN_ON_ONCE(n != segments)) { > + printk("%s: ranges %u segments %u\n", __func__, n, segments); > + blk_dump_rq(req); > if (virt_to_page(range) == ns->ctrl->discard_page) > clear_bit_unlock(0, &ns->ctrl->discard_page_busy); > else ``` WARNING: CPU: 17 PID: 821 at drivers/nvme/host/core.c:868 nvme_setup_discard+0x1c6/0x220 … CPU: 17 PID: 821 Comm: kworker/17:1H Not tainted 5.13.0-pf4 #1 Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS 
3601 05/26/2021 Workqueue: kblockd blk_mq_run_work_fn RIP: 0010:nvme_setup_discard+0x1c6/0x220 Code: 8b a0 40 0b 00 00 4c 2b 25 f7 ff d7 00 49 c1 fc 06 49 c1 e4 0c 4c 03 25 f8 ff d7 00 4c 89 e5 48 85 d2 0f 85 9b fe ff ff 31 d2 <0f> 0b 48 c7 c6 e0 a8 10 8b 41 0f b7 cd 48 c7 c7 af 09 40 8b e8 14 RSP: 0018:ffffafa884517bf0 EFLAGS: 00010297 RAX: ffff91602f5b20d0 RBX: ffff915e05743c00 RCX: 0000000000080000 RDX: 000000000000000f RSI: 0000000028445c00 RDI: 000000000000000f RBP: ffff91602f5b2000 R08: 000000000b366000 R09: ffff91602f5b2000 R10: 000000000000002d R11: fefefefefefefeff R12: ffff91602f5b2000 R13: 000000000000000e R14: ffff915e18c77000 R15: ffff915e18c77148 FS: 0000000000000000(0000) GS:ffff91650ee40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00001c90b36879a8 CR3: 000000010d18c000 CR4: 0000000000350ee0 Call Trace: nvme_setup_cmd+0x2e4/0x6a0 nvme_queue_rq+0x79/0xc90 blk_mq_dispatch_rq_list+0x15c/0x810 __blk_mq_do_dispatch_sched+0xca/0x320 __blk_mq_sched_dispatch_requests+0x14d/0x190 blk_mq_sched_dispatch_requests+0x2f/0x60 blk_mq_run_work_fn+0x43/0xc0 process_one_work+0x24e/0x430 worker_thread+0x54/0x4d0 kthread+0x1b3/0x1e0 ret_from_fork+0x22/0x30 ---[ end trace bd51917eae1d7201 ]--- nvme_setup_discard: ranges 15 segments 14 dump req 000000002c6a085b(f:3, seg: 14) 0-000000002c3297c7: f80/3 675773440 1024 1-0000000098edb2a8: b80/3 188319744 1024 2-00000000f58e3b18: b80/3 675775488 1024 3-00000000f6670c5a: b80/3 188129280 2048 4-00000000ea371a88: b80/3 675585024 2048 5-00000000e9cec043: b80/3 188140544 2048 6-000000006e1126e6: b80/3 675596288 2048 7-000000009466f937: b80/3 188327936 2048 8-000000003c9e2ccd: b80/3 675783680 2048 9-00000000ab322c68: b80/3 188329984 2048 10-00000000eb2b3fb6: b80/3 675785728 2048 11-00000000aa19f73d: b80/3 188098560 4096 12-00000000dd2f8682: b80/3 675554304 4096 13-0000000072f20a8c: b80/3 188112896 1024 14-00000000559ba401: b80/3 675568640 1024 blk_update_request: I/O error, dev nvme1n1, 
sector 675773440 op 0x3:(DISCARD) flags 0x0 phys_seg 14 prio class 0 nvme_setup_discard: ranges 45 segments 48 dump req 00000000f96fdd7c(f:3, seg: 48) 0-00000000180da7f3: f82/3 188316672 1024 1-00000000d2d0828e: b82/3 675774464 1024 2-000000000c4882eb: b82/3 645145600 1024 3-000000009d701d1e: b82/3 188318720 1024 4-000000000b9e8e53: b82/3 675776512 1024 5-00000000c73ca145: b82/3 157696000 1024 6-00000000e22bbe77: b82/3 188320768 1024 7-0000000054696401: b82/3 675777536 1024 8-0000000016bc968c: b82/3 188321792 1024 9-000000006e25f763: b82/3 675779584 1024 10-00000000340df89d: b82/3 157765632 1024 11-0000000046522f81: b82/3 188323840 1024 12-00000000fa9a368c: b82/3 675780608 1024 13-0000000093af3f87: b82/3 188128256 1024 14-000000008585ce43: b82/3 675587072 1024 15-0000000044f73721: b82/3 188132352 1024 16-000000006c9df2f2: b82/3 675589120 1024 17-00000000c6f4ba8b: b82/3 675594240 1024 18-000000001f5dbb5e: b82/3 188138496 1024 19-000000001de03037: b82/3 188139520 1024 20-00000000a307ccef: b82/3 645164032 1024 21-0000000018da5076: b82/3 675598336 1024 22-0000000033cdeaf5: b82/3 188143616 1024 23-00000000ecf3d07a: b82/3 675600384 1024 24-00000000e86b10f4: b82/3 188144640 1024 25-00000000ba439495: b82/3 157726720 1024 26-0000000027fcd176: b82/3 641614848 1024 27-0000000014182edf: b82/3 157728768 1024 28-00000000194e904f: b82/3 157731840 1024 29-000000002ea97c9e: b82/3 675787776 1024 30-000000004d5c6130: b82/3 188332032 1024 31-0000000058639801: b82/3 675788800 1024 32-000000003dd731c7: b82/3 675789824 1024 33-00000000f6f8a09e: b82/3 645190656 1024 34-000000002d4c6b21: b82/3 188334080 1024 35-000000007ea3ba79: b82/3 675790848 1024 36-00000000011db072: b82/3 188064768 1024 37-00000000e31f9da8: b82/3 188067840 1024 38-000000006baa465a: b82/3 188097536 1024 39-00000000a870893c: b82/3 675559424 1024 40-000000004105bc56: b82/3 188103680 1024 41-00000000e75709b5: b82/3 675560448 1024 42-00000000e08f7e0a: b82/3 675566592 1024 43-0000000020218f1b: b82/3 188110848 1024 
44-000000004bb5880a: b82/3 188111872 1024 45-00000000e6632009: b82/3 675569664 1024 46-0000000029e61303: b82/3 188114944 1024 47-000000006f66801c: b82/3 675571712 1024 blk_update_request: I/O error, dev nvme1n1, sector 188316672 op 0x3:(DISCARD) flags 0x0 phys_seg 48 prio class 0 nvme_setup_discard: ranges 73 segments 75 dump req 00000000b51b828c(f:3, seg: 75) 0-000000009c966abb: b82/3 675772416 1024 1-00000000d0b7b912: f80/3 675773440 1024 2-00000000c38e02dd: b80/3 188319744 1024 3-0000000043a53b40: b80/3 675775488 1024 4-000000009b5569c4: b80/3 188129280 2048 5-0000000066d31f07: b80/3 675585024 2048 6-000000006ff6d59f: b80/3 188140544 2048 7-00000000f39f9bdb: b80/3 675596288 2048 8-00000000929b3cff: b80/3 188327936 2048 9-0000000034c64365: b80/3 675783680 2048 10-000000004c508955: b80/3 188329984 2048 11-00000000fc9b9d0a: b80/3 675785728 2048 12-000000002a402a6c: b80/3 188098560 4096 13-0000000008074d32: b80/3 675554304 4096 14-0000000018d76730: b80/3 188112896 1024 15-000000003a12b456: b80/3 675568640 1024 16-00000000285092ed: b82/3 610962432 1024 17-000000005f714a72: b82/3 188342272 1024 18-00000000794c5e57: b82/3 611005440 1024 19-00000000630a4c01: b82/3 153951232 1024 20-00000000ffa4369f: b82/3 675798016 1024 21-00000000521f4eda: b82/3 188348416 1024 22-0000000041b3ca89: b82/3 17291264 1024 23-00000000e9c2e9f0: b82/3 17362944 1024 24-000000008f2954da: b82/3 599607296 1024 25-00000000805912ef: b82/3 611051520 1024 26-000000008533232e: b82/3 188336128 1024 27-00000000f9c0f9ae: b82/3 174541824 1024 28-00000000552ad7fc: b82/3 188318720 1024 29-000000009971d59c: b82/3 157689856 1024 30-0000000001ef1cc6: b82/3 675774464 1024 31-0000000037eddacb: b82/3 188320768 1024 32-00000000a2c8dd24: b82/3 645151744 1024 33-000000003e4d7ac7: b82/3 675776512 1024 34-000000000cb417af: b82/3 188321792 1024 35-0000000056c3e342: b82/3 675777536 1024 36-00000000ab22b292: b82/3 188323840 1024 37-0000000069b44eba: b82/3 645221376 1024 38-00000000a35eafb0: b82/3 675779584 1024 
39-00000000f71cf77d: b82/3 188324864 1024 40-0000000011ba1e4e: b82/3 675584000 1024 41-0000000012ec22d4: b82/3 188131328 1024 42-00000000acf89f36: b82/3 675588096 1024 43-000000007fdfa2fc: b82/3 188133376 1024 44-000000006ae8f6de: b82/3 188138496 1024 45-00000000c86eaa59: b82/3 675594240 1024 46-00000000cbeecf93: b82/3 675595264 1024 47-0000000002061627: b82/3 157708288 1024 48-00000000c6c0e9fb: b82/3 188142592 1024 49-00000000aca0a548: b82/3 675599360 1024 50-00000000cf220d06: b82/3 188144640 1024 51-000000007a9bdfd3: b82/3 675600384 1024 52-000000009c6129de: b82/3 645182464 1024 53-000000006744fe84: b82/3 154159104 1024 54-00000000217342e3: b82/3 645184512 1024 55-00000000bc5fea55: b82/3 645187584 1024 56-00000000eb6093bd: b82/3 188332032 1024 57-0000000056371b99: b82/3 675787776 1024 58-0000000080e53e1e: b82/3 188333056 1024 59-0000000020cc4aa0: b82/3 188334080 1024 60-00000000dfb62a3e: b82/3 157734912 1024 61-00000000ff41dcf5: b82/3 675789824 1024 62-0000000042c2746d: b82/3 188335104 1024 63-00000000192eebcb: b82/3 675520512 1024 64-000000008725b451: b82/3 675523584 1024 65-00000000893843dc: b82/3 675553280 1024 66-00000000eab256cb: b82/3 188103680 1024 67-0000000046ef0487: b82/3 675559424 1024 68-000000000228f430: b82/3 188104704 1024 69-00000000386e9a99: b82/3 188110848 1024 70-000000008ced777c: b82/3 675566592 1024 71-000000008a5f9e82: b82/3 675567616 1024 72-0000000065da8656: b82/3 188113920 1024 73-00000000c938ac22: b82/3 675570688 1024 74-00000000c67c0e32: b82/3 188115968 1024 75-00000000646e12f4: b82/3 675760128 1024 76-0000000054aa9d13: b82/3 188305408 1024 blk_update_request: I/O error, dev nvme0n1, sector 675772416 op 0x3:(DISCARD) flags 0x0 phys_seg 75 prio class 0 nvme_setup_discard: ranges 5 segments 6 dump req 00000000a157e475(f:3, seg: 6) 0-000000004fc95770: f82/3 189429760 1024 1-000000003d5f4d39: b82/3 189432832 1024 2-000000003e3a5606: b82/3 676889600 1024 3-00000000d5b2b48e: b82/3 189433856 1024 4-00000000101795a9: b82/3 189434880 1024 
5-000000001c3c352f: b82/3 184399872 1024 blk_update_request: I/O error, dev nvme1n1, sector 189429760 op 0x3:(DISCARD) flags 0x0 phys_seg 6 prio class 0 nvme_setup_discard: ranges 2 segments 3 dump req 00000000e17a5886(f:3, seg: 3) 0-000000007af7af24: b82/3 188352512 1024 1-000000003af3dbc9: f80/3 188353536 1024 2-00000000c4e5d7e7: b80/3 675809280 1024 3-00000000cb04663c: b82/3 675810304 1024 blk_update_request: I/O error, dev nvme1n1, sector 188352512 op 0x3:(DISCARD) flags 0x0 phys_seg 3 prio class 0 nvme_setup_discard: ranges 2 segments 3 dump req 00000000ad5525bd(f:3, seg: 3) 0-000000005cf09cca: b82/3 188348416 1024 1-0000000018cc2558: f80/3 188349440 1024 2-00000000c4c2a716: b80/3 675805184 1024 3-00000000a8908f6c: b82/3 675806208 1024 blk_update_request: I/O error, dev nvme1n1, sector 188348416 op 0x3:(DISCARD) flags 0x0 phys_seg 3 prio class 0 nvme_setup_discard: ranges 2 segments 3 dump req 00000000f96fdd7c(f:3, seg: 3) 0-000000009d4e946d: b82/3 189264896 1024 1-00000000d5eee140: f80/3 189265920 2048 2-00000000d84f9e26: b80/3 676721664 2048 3-0000000099245351: b82/3 676723712 1024 blk_update_request: I/O error, dev nvme1n1, sector 189264896 op 0x3:(DISCARD) flags 0x0 phys_seg 3 prio class 0 nvme_setup_discard: ranges 48 segments 49 dump req 0000000034977ea1(f:3, seg: 49) 0-00000000e4cae23d: f82/3 676881408 1024 1-0000000068dc7629: b82/3 676885504 1024 2-00000000f049ea5a: b82/3 676888576 1024 3-00000000c20d2ce8: b82/3 189433856 1024 4-000000004c139e91: b82/3 676889600 1024 5-0000000091100457: b82/3 676890624 1024 6-0000000076b82d71: b82/3 671855616 1024 7-000000003003717f: b82/3 144389120 1024 8-00000000f664d7bc: b82/3 188355584 1024 9-0000000004dffa3c: b82/3 675811328 1024 10-00000000333580c6: b82/3 188356608 1024 11-00000000b69a3602: b82/3 631820288 1024 12-00000000736b0c31: b80/3 188353536 1024 13-0000000007888c71: b80/3 675809280 1024 14-0000000003614d26: b82/3 675808256 1024 15-000000001a7ae177: b82/3 188354560 1024 16-000000008ab11bbd: b82/3 631758848 
1024 17-00000000998c6d3c: b82/3 675807232 1024 18-0000000056d675a4: b82/3 188352512 1024 19-00000000854a6b1e: b80/3 188349440 1024 20-000000000b49b05e: b80/3 675805184 1024 21-000000008acc3b01: b82/3 675804160 1024 22-00000000315cb886: b82/3 188350464 1024 23-0000000086fb9348: b82/3 175320064 1024 24-000000006233dfd4: b82/3 188591104 1024 25-000000008022c2e2: b82/3 676046848 1024 26-000000002388ff7a: b82/3 188592128 1024 27-0000000024e54dc9: b82/3 188590080 1024 28-00000000e1706a07: b82/3 670319616 1024 29-00000000f5308dfc: b82/3 189270016 1024 30-00000000c005ae85: b80/3 189265920 2048 31-00000000a98ed3a6: b80/3 676721664 2048 32-000000003e7e01e1: b82/3 676720640 1024 33-000000003e9b0c86: b82/3 189267968 1024 34-00000000802b8160: b82/3 676716544 1024 35-00000000adf4fab8: b82/3 189257728 1024 36-000000007136233a: b82/3 676713472 1024 37-0000000085a35fca: b82/3 189258752 1024 38-000000008c8ed35f: b82/3 676710400 1024 39-00000000567baa73: b82/3 189255680 1024 40-0000000041ff7157: b82/3 676711424 1024 41-00000000fb3f4cd8: b82/3 189249536 1024 42-000000002efa7248: b82/3 676705280 1024 43-00000000a53bc3cb: b82/3 189239296 1024 44-000000006e6456e9: b82/3 676691968 1024 45-00000000f0fa291b: b82/3 676690944 1024 46-000000008b814484: b82/3 189234176 1024 47-00000000f3fa7f80: b82/3 676686848 1024 48-00000000e34ea3ca: b82/3 676679680 1024 blk_update_request: I/O error, dev nvme0n1, sector 676881408 op 0x3:(DISCARD) flags 0x0 phys_seg 49 prio class 0 nvme_setup_discard: ranges 9 segments 10 dump req 0000000066d1c6c7(f:3, seg: 10) 0-000000006f9c1d2b: f82/3 95788032 1024 1-000000005592941a: b82/3 155664384 1024 2-00000000180b2bab: b80/3 186275840 2048 3-00000000485a4757: b80/3 673731584 2048 4-00000000fcc93dcb: b82/3 673733632 1024 5-0000000089e0acd8: b80/3 186273792 1024 6-0000000005896717: b80/3 673729536 1024 7-00000000f264b15c: b82/3 186272768 1024 8-000000009684f15b: b82/3 673730560 1024 9-000000003a2a57b4: b82/3 175249408 1024 blk_update_request: I/O error, dev nvme1n1, 
sector 95788032 op 0x3:(DISCARD) flags 0x0 phys_seg 10 prio class 0 ``` Thanks. -- Oleksandr Natalenko (post-factum) ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: New warning in nvme_setup_discard
  2021-07-28 16:38 ` Oleksandr Natalenko
@ 2021-07-29  3:33 ` Ming Lei
  -1 siblings, 0 replies; 52+ messages in thread
From: Ming Lei @ 2021-07-29 3:33 UTC (permalink / raw)
To: Oleksandr Natalenko
Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
  linux-nvme, David Jeffery, Laurence Oberman, Paolo Valente,
  Jan Kara, Sasha Levin, Greg Kroah-Hartman, Keith Busch

On Wed, Jul 28, 2021 at 06:38:36PM +0200, Oleksandr Natalenko wrote:
> Hello.
>
> On Wednesday 28 July 2021 17:53:05 CEST Ming Lei wrote:
> > Can you collect debug log by applying the following patch against the
> > last one?
>
> Yes, please see below.
>
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 8780e4aa9df2..fbd8a68c619b 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -828,6 +828,24 @@ static inline void nvme_setup_flush(struct nvme_ns *ns,
> >      cmnd->common.nsid = cpu_to_le32(ns->head->ns_id);
> >  }
> >
> > +static inline void blk_dump_rq(const struct request *req)
> > +{
> > +    struct bio *bio;
> > +    int i = 0;
> > +
> > +    printk("dump req %p(f:%x, seg: %d)\n", req, req->cmd_flags,
> > +            req->nr_phys_segments);
> > +
> > +    __rq_for_each_bio(bio, req) {
> > +        printk("%d-%p: %hx/%hx %llu %u\n",
> > +                i++, bio,
> > +                bio->bi_flags, bio->bi_opf,
> > +                (unsigned long long)bio->bi_iter.bi_sector,
> > +                bio->bi_iter.bi_size>>9);
> > +    }
> > +}
> > +
> > +
> >  static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
> >          struct nvme_command *cmnd)
> >  {
> > @@ -868,6 +886,8 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
> >      }
> >
> >      if (WARN_ON_ONCE(n != segments)) {
> > +        printk("%s: ranges %u segments %u\n", __func__, n, segments);
> > +        blk_dump_rq(req);
> >          if (virt_to_page(range) == ns->ctrl->discard_page)
> >              clear_bit_unlock(0, &ns->ctrl->discard_page_busy);
> >          else
>
> ```
> WARNING: CPU: 17 PID: 821 at drivers/nvme/host/core.c:868 nvme_setup_discard+0x1c6/0x220
> …
> CPU: 17 PID: 821 Comm: kworker/17:1H Not tainted 5.13.0-pf4 #1
> Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS 3601 05/26/2021
> Workqueue: kblockd blk_mq_run_work_fn
> RIP: 0010:nvme_setup_discard+0x1c6/0x220
> Code: 8b a0 40 0b 00 00 4c 2b 25 f7 ff d7 00 49 c1 fc 06 49 c1 e4 0c 4c 03 25 f8 ff d7 00 4c 89 e5 48 85 d2 0f 85 9b fe ff ff 31 d2 <0f> 0b 48 c7 c6 e0 a8 10 8b 41 0f b7 cd 48 c7 c7 af 09 40 8b e8 14
> RSP: 0018:ffffafa884517bf0 EFLAGS: 00010297
> RAX: ffff91602f5b20d0 RBX: ffff915e05743c00 RCX: 0000000000080000
> RDX: 000000000000000f RSI: 0000000028445c00 RDI: 000000000000000f
> RBP: ffff91602f5b2000 R08: 000000000b366000 R09: ffff91602f5b2000
> R10: 000000000000002d R11: fefefefefefefeff R12: ffff91602f5b2000
> R13: 000000000000000e R14: ffff915e18c77000 R15: ffff915e18c77148
> FS: 0000000000000000(0000) GS:ffff91650ee40000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00001c90b36879a8 CR3: 000000010d18c000 CR4: 0000000000350ee0
> Call Trace:
>  nvme_setup_cmd+0x2e4/0x6a0
>  nvme_queue_rq+0x79/0xc90
>  blk_mq_dispatch_rq_list+0x15c/0x810
>  __blk_mq_do_dispatch_sched+0xca/0x320
>  __blk_mq_sched_dispatch_requests+0x14d/0x190
>  blk_mq_sched_dispatch_requests+0x2f/0x60
>  blk_mq_run_work_fn+0x43/0xc0
>  process_one_work+0x24e/0x430
>  worker_thread+0x54/0x4d0
>  kthread+0x1b3/0x1e0
>  ret_from_fork+0x22/0x30
> ---[ end trace bd51917eae1d7201 ]---
> nvme_setup_discard: ranges 15 segments 14
> dump req 000000002c6a085b(f:3, seg: 14)
> 0-000000002c3297c7: f80/3 675773440 1024
> 1-0000000098edb2a8: b80/3 188319744 1024
> 2-00000000f58e3b18: b80/3 675775488 1024
> 3-00000000f6670c5a: b80/3 188129280 2048
> 4-00000000ea371a88: b80/3 675585024 2048
> 5-00000000e9cec043: b80/3 188140544 2048
> 6-000000006e1126e6: b80/3 675596288 2048
> 7-000000009466f937: b80/3 188327936 2048
> 8-000000003c9e2ccd: b80/3 675783680 2048
> 9-00000000ab322c68: b80/3 188329984 2048

188329984 = 188327936 + 2048

> 10-00000000eb2b3fb6: b80/3 675785728 2048

675785728 = 675783680 + 2048

Seems the adjacent bios are cut by the coming discard range. Looks like it
isn't mature to support mixed discard IO merge (traditional io merge vs.
multi-range merge), I will post the previous patch for fixing this issue.

> blk_update_request: I/O error, dev nvme1n1, sector 675773440 op 0x3:(DISCARD) flags 0x0 phys_seg 14 prio class 0
> nvme_setup_discard: ranges 45 segments 48

...

> nvme_setup_discard: ranges 73 segments 75

...

> nvme_setup_discard: ranges 5 segments 6

...

> nvme_setup_discard: ranges 2 segments 3

...

> nvme_setup_discard: ranges 2 segments 3

...

> nvme_setup_discard: ranges 2 segments 3

...

> nvme_setup_discard: ranges 48 segments 49

...

> nvme_setup_discard: ranges 9 segments 10

For the above (ranges < segments), that should be caused by
blk_recalc_rq_segments(), which can be solved by switching to
rq_for_each_discard_range() for discard io.

Thanks,
Ming

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: New warning in nvme_setup_discard @ 2021-07-29 3:33 ` Ming Lei 0 siblings, 0 replies; 52+ messages in thread From: Ming Lei @ 2021-07-29 3:33 UTC (permalink / raw) To: Oleksandr Natalenko Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Greg Kroah-Hartman, Keith Busch On Wed, Jul 28, 2021 at 06:38:36PM +0200, Oleksandr Natalenko wrote: > Hello. > > On středa 28. července 2021 17:53:05 CEST Ming Lei wrote: > > Can you collect debug log by applying the following patch against the > > last one? > > Yes, please see below. > > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > > index 8780e4aa9df2..fbd8a68c619b 100644 > > --- a/drivers/nvme/host/core.c > > +++ b/drivers/nvme/host/core.c > > @@ -828,6 +828,24 @@ static inline void nvme_setup_flush(struct nvme_ns *ns, > > cmnd->common.nsid = cpu_to_le32(ns->head->ns_id); > > } > > > > +static inline void blk_dump_rq(const struct request *req) > > +{ > > + struct bio *bio; > > + int i = 0; > > + > > + printk("dump req %p(f:%x, seg: %d)\n", req, req->cmd_flags, > > + req->nr_phys_segments); > > + > > + __rq_for_each_bio(bio, req) { > > + printk("%d-%p: %hx/%hx %llu %u\n", > > + i++, bio, > > + bio->bi_flags, bio->bi_opf, > > + (unsigned long long)bio->bi_iter.bi_sector, > > + bio->bi_iter.bi_size>>9); > > + } > > +} > > + > > + > > static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request > > *req, struct nvme_command *cmnd) > > { > > @@ -868,6 +886,8 @@ static blk_status_t nvme_setup_discard(struct nvme_ns > > *ns, struct request *req, } > > > > if (WARN_ON_ONCE(n != segments)) { > > + printk("%s: ranges %u segments %u\n", __func__, n, segments); > > + blk_dump_rq(req); > > if (virt_to_page(range) == ns->ctrl->discard_page) > > clear_bit_unlock(0, &ns->ctrl->discard_page_busy); > > else > > ``` > WARNING: CPU: 17 PID: 821 at drivers/nvme/host/core.c:868 nvme_setup_discard+0x1c6/0x220 > … 
> CPU: 17 PID: 821 Comm: kworker/17:1H Not tainted 5.13.0-pf4 #1 > Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS 3601 05/26/2021 > Workqueue: kblockd blk_mq_run_work_fn > RIP: 0010:nvme_setup_discard+0x1c6/0x220 > Code: 8b a0 40 0b 00 00 4c 2b 25 f7 ff d7 00 49 c1 fc 06 49 c1 e4 0c 4c 03 25 f8 ff d7 00 4c 89 e5 48 85 d2 0f 85 9b fe ff ff 31 d2 <0f> 0b 48 c7 c6 e0 a8 10 8b 41 0f b7 cd 48 c7 c7 af 09 40 8b e8 14 > RSP: 0018:ffffafa884517bf0 EFLAGS: 00010297 > RAX: ffff91602f5b20d0 RBX: ffff915e05743c00 RCX: 0000000000080000 > RDX: 000000000000000f RSI: 0000000028445c00 RDI: 000000000000000f > RBP: ffff91602f5b2000 R08: 000000000b366000 R09: ffff91602f5b2000 > R10: 000000000000002d R11: fefefefefefefeff R12: ffff91602f5b2000 > R13: 000000000000000e R14: ffff915e18c77000 R15: ffff915e18c77148 > FS: 0000000000000000(0000) GS:ffff91650ee40000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00001c90b36879a8 CR3: 000000010d18c000 CR4: 0000000000350ee0 > Call Trace: > nvme_setup_cmd+0x2e4/0x6a0 > nvme_queue_rq+0x79/0xc90 > blk_mq_dispatch_rq_list+0x15c/0x810 > __blk_mq_do_dispatch_sched+0xca/0x320 > __blk_mq_sched_dispatch_requests+0x14d/0x190 > blk_mq_sched_dispatch_requests+0x2f/0x60 > blk_mq_run_work_fn+0x43/0xc0 > process_one_work+0x24e/0x430 > worker_thread+0x54/0x4d0 > kthread+0x1b3/0x1e0 > ret_from_fork+0x22/0x30 > ---[ end trace bd51917eae1d7201 ]--- > nvme_setup_discard: ranges 15 segments 14 > dump req 000000002c6a085b(f:3, seg: 14) > 0-000000002c3297c7: f80/3 675773440 1024 > 1-0000000098edb2a8: b80/3 188319744 1024 > 2-00000000f58e3b18: b80/3 675775488 1024 > 3-00000000f6670c5a: b80/3 188129280 2048 > 4-00000000ea371a88: b80/3 675585024 2048 > 5-00000000e9cec043: b80/3 188140544 2048 > 6-000000006e1126e6: b80/3 675596288 2048 > 7-000000009466f937: b80/3 188327936 2048 > 8-000000003c9e2ccd: b80/3 675783680 2048 > 9-00000000ab322c68: b80/3 188329984 2048 188329984 = 188327936 + 2048 > 10-00000000eb2b3fb6: 
b80/3 675785728 2048

675785728 = 675783680 + 2048

It seems the adjacent bios are cut apart by a discard range coming from the other area.

It looks like it isn't mature to support mixed discard IO merge (traditional contiguous IO merge vs. multi-range discard merge); I will post the previous patch to fix this issue.

> blk_update_request: I/O error, dev nvme1n1, sector 675773440 op 0x3:(DISCARD) flags 0x0 phys_seg 14 prio class 0
> nvme_setup_discard: ranges 45 segments 48
...
> nvme_setup_discard: ranges 73 segments 75
...
> nvme_setup_discard: ranges 5 segments 6
...
> nvme_setup_discard: ranges 2 segments 3
...
> nvme_setup_discard: ranges 2 segments 3
...
> nvme_setup_discard: ranges 2 segments 3
...
> nvme_setup_discard: ranges 48 segments 49
...
> nvme_setup_discard: ranges 9 segments 10

For the cases above (ranges < segments), the mismatch should be caused by blk_recalc_rq_segments(), and it can be solved by switching to rq_for_each_discard_range() for discard IO.

Thanks,
Ming

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 52+ messages in thread
* Re: New warning in nvme_setup_discard 2021-07-29 3:33 ` Ming Lei @ 2021-07-29 9:29 ` Ming Lei -1 siblings, 0 replies; 52+ messages in thread From: Ming Lei @ 2021-07-29 9:29 UTC (permalink / raw) To: Oleksandr Natalenko Cc: linux-kernel, Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme, David Jeffery, Laurence Oberman, Paolo Valente, Jan Kara, Sasha Levin, Greg Kroah-Hartman, Keith Busch On Thu, Jul 29, 2021 at 11:33:33AM +0800, Ming Lei wrote: > On Wed, Jul 28, 2021 at 06:38:36PM +0200, Oleksandr Natalenko wrote: > > Hello. > > > > On středa 28. července 2021 17:53:05 CEST Ming Lei wrote: > > > Can you collect debug log by applying the following patch against the > > > last one? > > > > Yes, please see below. > > > > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > > > index 8780e4aa9df2..fbd8a68c619b 100644 > > > --- a/drivers/nvme/host/core.c > > > +++ b/drivers/nvme/host/core.c > > > @@ -828,6 +828,24 @@ static inline void nvme_setup_flush(struct nvme_ns *ns, > > > cmnd->common.nsid = cpu_to_le32(ns->head->ns_id); > > > } > > > > > > +static inline void blk_dump_rq(const struct request *req) > > > +{ > > > + struct bio *bio; > > > + int i = 0; > > > + > > > + printk("dump req %p(f:%x, seg: %d)\n", req, req->cmd_flags, > > > + req->nr_phys_segments); > > > + > > > + __rq_for_each_bio(bio, req) { > > > + printk("%d-%p: %hx/%hx %llu %u\n", > > > + i++, bio, > > > + bio->bi_flags, bio->bi_opf, > > > + (unsigned long long)bio->bi_iter.bi_sector, > > > + bio->bi_iter.bi_size>>9); > > > + } > > > +} > > > + > > > + > > > static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request > > > *req, struct nvme_command *cmnd) > > > { > > > @@ -868,6 +886,8 @@ static blk_status_t nvme_setup_discard(struct nvme_ns > > > *ns, struct request *req, } > > > > > > if (WARN_ON_ONCE(n != segments)) { > > > + printk("%s: ranges %u segments %u\n", __func__, n, segments); > > > + blk_dump_rq(req); > > > if (virt_to_page(range) == 
ns->ctrl->discard_page) > > > clear_bit_unlock(0, &ns->ctrl->discard_page_busy); > > > else > > > > ``` > > WARNING: CPU: 17 PID: 821 at drivers/nvme/host/core.c:868 nvme_setup_discard+0x1c6/0x220 > > … > > CPU: 17 PID: 821 Comm: kworker/17:1H Not tainted 5.13.0-pf4 #1 > > Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS 3601 05/26/2021 > > Workqueue: kblockd blk_mq_run_work_fn > > RIP: 0010:nvme_setup_discard+0x1c6/0x220 > > Code: 8b a0 40 0b 00 00 4c 2b 25 f7 ff d7 00 49 c1 fc 06 49 c1 e4 0c 4c 03 25 f8 ff d7 00 4c 89 e5 48 85 d2 0f 85 9b fe ff ff 31 d2 <0f> 0b 48 c7 c6 e0 a8 10 8b 41 0f b7 cd 48 c7 c7 af 09 40 8b e8 14 > > RSP: 0018:ffffafa884517bf0 EFLAGS: 00010297 > > RAX: ffff91602f5b20d0 RBX: ffff915e05743c00 RCX: 0000000000080000 > > RDX: 000000000000000f RSI: 0000000028445c00 RDI: 000000000000000f > > RBP: ffff91602f5b2000 R08: 000000000b366000 R09: ffff91602f5b2000 > > R10: 000000000000002d R11: fefefefefefefeff R12: ffff91602f5b2000 > > R13: 000000000000000e R14: ffff915e18c77000 R15: ffff915e18c77148 > > FS: 0000000000000000(0000) GS:ffff91650ee40000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 00001c90b36879a8 CR3: 000000010d18c000 CR4: 0000000000350ee0 > > Call Trace: > > nvme_setup_cmd+0x2e4/0x6a0 > > nvme_queue_rq+0x79/0xc90 > > blk_mq_dispatch_rq_list+0x15c/0x810 > > __blk_mq_do_dispatch_sched+0xca/0x320 > > __blk_mq_sched_dispatch_requests+0x14d/0x190 > > blk_mq_sched_dispatch_requests+0x2f/0x60 > > blk_mq_run_work_fn+0x43/0xc0 > > process_one_work+0x24e/0x430 > > worker_thread+0x54/0x4d0 > > kthread+0x1b3/0x1e0 > > ret_from_fork+0x22/0x30 > > ---[ end trace bd51917eae1d7201 ]--- > > nvme_setup_discard: ranges 15 segments 14 > > dump req 000000002c6a085b(f:3, seg: 14) > > 0-000000002c3297c7: f80/3 675773440 1024 > > 1-0000000098edb2a8: b80/3 188319744 1024 > > 2-00000000f58e3b18: b80/3 675775488 1024 > > 3-00000000f6670c5a: b80/3 188129280 2048 > > 4-00000000ea371a88: b80/3 675585024 2048 
> > 5-00000000e9cec043: b80/3 188140544 2048
> > 6-000000006e1126e6: b80/3 675596288 2048
> > 7-000000009466f937: b80/3 188327936 2048
> > 8-000000003c9e2ccd: b80/3 675783680 2048
> > 9-00000000ab322c68: b80/3 188329984 2048
>
> 188329984 = 188327936 + 2048
>
> > 10-00000000eb2b3fb6: b80/3 675785728 2048
>
> 675785728 = 675783680 + 2048
>
> It seems the adjacent bios are cut apart by a discard range coming from
> the other area.
>
> It looks like it isn't mature to support mixed discard IO merge
> (traditional contiguous IO merge vs. multi-range discard merge); I will
> post the previous patch to fix this issue.

The reason is the following lines of code:

#define rq_hash_key(rq)	(blk_rq_pos(rq) + blk_rq_sectors(rq))

	if (blk_rq_pos(rq) + blk_rq_sectors(rq) == bio->bi_iter.bi_sector)
	...

The current block merge code assumes that all bios in one request are physically contiguous, so we can't simply support mixed merge for multi-range discard requests.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 52+ messages in thread
end of thread, other threads:[~2021-07-29  9:30 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-15 13:56 New warning in nvme_setup_discard Oleksandr Natalenko
2021-07-15 14:19 ` Greg Kroah-Hartman
2021-07-15 14:21 ` Oleksandr Natalenko
2021-07-15 21:37 ` Laurence Oberman
2021-07-16  5:50 ` Oleksandr Natalenko
2021-07-16  2:16 ` Ming Lei
2021-07-16  5:53 ` Oleksandr Natalenko
2021-07-16  9:33 ` Ming Lei
2021-07-16 10:03 ` Oleksandr Natalenko
2021-07-16 10:41 ` Ming Lei
2021-07-16 12:56 ` Oleksandr Natalenko
2021-07-17  9:35 ` Ming Lei
2021-07-17 12:11 ` Oleksandr Natalenko
2021-07-17 12:19 ` Oleksandr Natalenko
2021-07-17 12:35 ` Oleksandr Natalenko
2021-07-19  1:40 ` Ming Lei
2021-07-19  6:27 ` Oleksandr Natalenko
2021-07-20  9:05 ` Oleksandr Natalenko
2021-07-21  8:00 ` Ming Lei
2021-07-27 15:12 ` Oleksandr Natalenko
2021-07-27 15:58 ` Ming Lei
2021-07-28 13:44 ` Oleksandr Natalenko
2021-07-28 15:53 ` Ming Lei
2021-07-28 16:38 ` Oleksandr Natalenko
2021-07-29  3:33 ` Ming Lei
2021-07-29  9:29 ` Ming Lei