linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [bisected] bfq regression on latest linux-block/for-next
       [not found] <841664844.1202812.1617239260115.JavaMail.zimbra@redhat.com>
@ 2021-04-01  1:27 ` Yi Zhang
  2021-04-01  1:55   ` Chaitanya Kulkarni
  2021-04-02 13:39   ` Paolo Valente
  0 siblings, 2 replies; 9+ messages in thread
From: Yi Zhang @ 2021-04-01  1:27 UTC (permalink / raw)
  To: linux-block; +Cc: paolo.valente

Hi
We reproduced this bfq regression[3] on ppc64le with blktests[2] on the latest linux-block/for-next branch, seems it was introduced with [1] from my bisecting, pls help check it. Let me know if you need any testing for it, thanks.

[1]
commit 430a67f9d6169a7b3e328bceb2ef9542e4153c7c (HEAD, refs/bisect/bad)
Author: Paolo Valente <paolo.valente@linaro.org>
Date:   Thu Mar 4 18:46:27 2021 +0100

    block, bfq: merge bursts of newly-created queues

[2] blktests: nvme_trtype=tcp ./check nvme/011

[3]
[  109.342525] run blktests nvme/011 at 2021-03-31 20:58:58
[  109.497429] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[  109.512868] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[  109.584932] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:9b73e8531afa4320963cc96571e0acb1.
[  109.585809] nvme nvme0: creating 128 I/O queues.
[  109.596570] nvme nvme0: mapped 128/0/0 default/read/poll queues.
[  109.654170] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420
[  155.366535] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
[  155.366568] nvmet: ctrl 1 fatal error occurred!
[  155.366608] nvme nvme0: starting error recovery
[  155.374861] block nvme0n1: no usable path - requeuing I/O
[  155.374891] block nvme0n1: no usable path - requeuing I/O
[  155.374911] block nvme0n1: no usable path - requeuing I/O
[  155.374923] block nvme0n1: no usable path - requeuing I/O
[  155.374934] block nvme0n1: no usable path - requeuing I/O
[  155.374954] block nvme0n1: no usable path - requeuing I/O
[  155.374973] block nvme0n1: no usable path - requeuing I/O
[  155.374984] block nvme0n1: no usable path - requeuing I/O
[  155.375004] block nvme0n1: no usable path - requeuing I/O
[  155.375024] block nvme0n1: no usable path - requeuing I/O
[  155.375674] nvme nvme0: Reconnecting in 10 seconds...
[  180.967462] nvmet: ctrl 2 keep-alive timer (15 seconds) expired!
[  180.967486] nvmet: ctrl 2 fatal error occurred!
[  193.427906] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  193.427934] rcu: 	40-...0: (1 GPs behind) idle=f3e/1/0x4000000000000000 softirq=535/535 fqs=2839 
[  193.427966] rcu: 	42-...0: (1 GPs behind) idle=792/1/0x4000000000000000 softirq=160/160 fqs=2839 
[  193.427995] 	(detected by 5, t=6002 jiffies, g=7961, q=7219)
[  193.428030] Sending NMI from CPU 5 to CPUs 40:
[  199.235530] CPU 40 didn't respond to backtrace IPI, inspecting paca.
[  199.235551] irq_soft_mask: 0x01 in_mce: 0 in_nmi: 0 current: 217 (kworker/40:0H)
[  199.235579] Back trace of paca->saved_r1 (0xc000000012973380) (possibly stale):
[  199.235594] Call Trace:
[  199.235601] [c000000012973380] [c0000000129733e0] 0xc0000000129733e0 (unreliable)
[  199.235633] [c000000012973420] [c000000000933008] bfq_allow_bio_merge+0x78/0x110
[  199.235673] [c000000012973460] [c0000000008d7470] elv_bio_merge_ok+0x90/0xb0
[  199.235711] [c000000012973490] [c0000000008d811c] elv_merge+0x6c/0x1b0
[  199.235747] [c0000000129734e0] [c0000000008ea1d8] blk_mq_sched_try_merge+0x48/0x250
[  199.235785] [c000000012973540] [c00000000092d29c] bfq_bio_merge+0xfc/0x230
[  199.235820] [c0000000129735c0] [c0000000008f9e24] __blk_mq_sched_bio_merge+0xa4/0x210
[  199.235848] [c000000012973620] [c0000000008f288c] blk_mq_submit_bio+0xec/0x710
[  199.235877] [c0000000129736c0] [c0000000008dfad4] submit_bio_noacct+0x534/0x680
[  199.235915] [c000000012973760] [c0000000005e0308] iomap_dio_submit_bio+0xd8/0x100
[  199.235953] [c000000012973790] [c0000000005e0ed4] iomap_dio_bio_actor+0x3c4/0x570
[  199.235991] [c000000012973830] [c0000000005db284] iomap_apply+0x1f4/0x3e0
[  199.236017] [c000000012973920] [c0000000005e06e4] __iomap_dio_rw+0x204/0x5b0
[  199.236054] [c0000000129739f0] [c0000000005e0ab0] iomap_dio_rw+0x20/0x80
[  199.236090] [c000000012973a10] [c00800000e4ea660] xfs_file_dio_write_aligned+0xb8/0x1b0 [xfs]
[  199.236230] [c000000012973a60] [c00800000e4eabfc] xfs_file_write_iter+0x104/0x190 [xfs]
[  199.236368] [c000000012973a90] [c008000010440724] nvmet_file_submit_bvec+0xfc/0x160 [nvmet]
[  199.236412] [c000000012973b10] [c008000010440b54] nvmet_file_execute_io+0x2cc/0x3b0 [nvmet]
[  199.236455] [c000000012973b90] [c00800000fa142d8] nvmet_tcp_io_work+0xce0/0xd10 [nvmet_tcp]
[  199.236493] [c000000012973c70] [c000000000179534] process_one_work+0x294/0x580
[  199.236533] [c000000012973d10] [c0000000001798c8] worker_thread+0xa8/0x650
[  199.236568] [c000000012973da0] [c000000000184180] kthread+0x190/0x1a0
[  199.236595] [c000000012973e10] [c00000000000d4ec] ret_from_kernel_thread+0x5c/0x70
[  199.236710] Sending NMI from CPU 5 to CPUs 42:
[  205.044364] CPU 42 didn't respond to backtrace IPI, inspecting paca.
[  205.044381] irq_soft_mask: 0x01 in_mce: 0 in_nmi: 0 current: 3004 (kworker/42:2H)
[  205.044408] Back trace of paca->saved_r1 (0xc00000085194b520) (possibly stale):
[  205.044423] Call Trace:
[  205.044440] [c00000085194b520] [c00000079ac7f700] 0xc00000079ac7f700 (unreliable)
[  205.044468] [c00000085194b540] [c00000000092d248] bfq_bio_merge+0xa8/0x230
[  205.044494] [c00000085194b5c0] [c0000000008f9e24] __blk_mq_sched_bio_merge+0xa4/0x210
[  205.044531] [c00000085194b620] [c0000000008f288c] blk_mq_submit_bio+0xec/0x710
[  205.044569] [c00000085194b6c0] [c0000000008dfad4] submit_bio_noacct+0x534/0x680
[  205.044607] [c00000085194b760] [c0000000005e0308] iomap_dio_submit_bio+0xd8/0x100
[  205.044645] [c00000085194b790] [c0000000005e0ed4] iomap_dio_bio_actor+0x3c4/0x570
[  205.044672] [c00000085194b830] [c0000000005db284] iomap_apply+0x1f4/0x3e0
[  205.044698] [c00000085194b920] [c0000000005e06e4] __iomap_dio_rw+0x204/0x5b0
[  205.044735] [c00000085194b9f0] [c0000000005e0ab0] iomap_dio_rw+0x20/0x80
[  205.044771] [c00000085194ba10] [c00800000e4ea660] xfs_file_dio_write_aligned+0xb8/0x1b0 [xfs]
[  205.044897] [c00000085194ba60] [c00800000e4eabfc] xfs_file_write_iter+0x104/0x190 [xfs]
[  205.045021] [c00000085194ba90] [c008000010440724] nvmet_file_submit_bvec+0xfc/0x160 [nvmet]
[  205.045064] [c00000085194bb10] [c008000010440b54] nvmet_file_execute_io+0x2cc/0x3b0 [nvmet]
[  205.045106] [c00000085194bb90] [c00800000fa142d8] nvmet_tcp_io_work+0xce0/0xd10 [nvmet_tcp]
[  205.045144] [c00000085194bc70] [c000000000179534] process_one_work+0x294/0x580
[  205.045181] [c00000085194bd10] [c0000000001798c8] worker_thread+0xa8/0x650
[  205.045216] [c00000085194bda0] [c000000000184180] kthread+0x190/0x1a0
[  205.045242] [c00000085194be10] [c00000000000d4ec] ret_from_kernel_thread+0x5c/0x70




Best Regards,
  Yi Zhang



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [bisected] bfq regression on latest linux-block/for-next
  2021-04-01  1:27 ` [bisected] bfq regression on latest linux-block/for-next Yi Zhang
@ 2021-04-01  1:55   ` Chaitanya Kulkarni
  2021-04-01  2:05     ` Yi Zhang
  2021-04-02 13:39   ` Paolo Valente
  1 sibling, 1 reply; 9+ messages in thread
From: Chaitanya Kulkarni @ 2021-04-01  1:55 UTC (permalink / raw)
  To: Yi Zhang; +Cc: linux-block, paolo.valente, linux-nvme

Yi,

On 3/31/21 18:28, Yi Zhang wrote:
> Hi
> We reproduced this bfq regression[3] on ppc64le with blktests[2] on the latest linux-block/for-next branch, seems it was introduced with [1] from my bisecting, pls help check it. Let me know if you need any testing for it, thanks.
>
> [1]
> commit 430a67f9d6169a7b3e328bceb2ef9542e4153c7c (HEAD, refs/bisect/bad)
> Author: Paolo Valente <paolo.valente@linaro.org>
> Date:   Thu Mar 4 18:46:27 2021 +0100
>
>     block, bfq: merge bursts of newly-created queues
>
> [2] blktests: nvme_trtype=tcp ./check nvme/011
>
>

+ linux-nvme

Please add linux-nvme mailing list from next time for blktests
nvmecategory to
get better response.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [bisected] bfq regression on latest linux-block/for-next
  2021-04-01  1:55   ` Chaitanya Kulkarni
@ 2021-04-01  2:05     ` Yi Zhang
  0 siblings, 0 replies; 9+ messages in thread
From: Yi Zhang @ 2021-04-01  2:05 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: linux-block, paolo.valente, linux-nvme



On 4/1/21 9:55 AM, Chaitanya Kulkarni wrote:
> Yi,
>
> On 3/31/21 18:28, Yi Zhang wrote:
>> Hi
>> We reproduced this bfq regression[3] on ppc64le with blktests[2] on the latest linux-block/for-next branch, seems it was introduced with [1] from my bisecting, pls help check it. Let me know if you need any testing for it, thanks.
>>
>> [1]
>> commit 430a67f9d6169a7b3e328bceb2ef9542e4153c7c (HEAD, refs/bisect/bad)
>> Author: Paolo Valente <paolo.valente@linaro.org>
>> Date:   Thu Mar 4 18:46:27 2021 +0100
>>
>>      block, bfq: merge bursts of newly-created queues
>>
>> [2] blktests: nvme_trtype=tcp ./check nvme/011
>>
>>
> + linux-nvme
>
> Please add linux-nvme mailing list from next time for blktests
> nvmecategory to
> get better response.
>
Thanks for reminder, here is another log from x86_64:
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/datawarehouse-public/2021/03/31/627938/build_x86_64_redhat%3A1135726/tests/9788884_x86_64_2_console.log

[18786.566141] run blktests nvme/011 at 2021-03-31 05:14:49
[18786.628130] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[18786.665543] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[18786.699230] nvmet: creating controller 1 for subsystem 
blktests-subsystem-1 for NQN 
nqn.2014-08.org.nvmexpress:uuid:efb6b8a520d74d6c8193b4cdfcf4ed27.
[18786.768196] nvme nvme0: creating 32 I/O queues.
[18786.794890] nvme nvme0: mapped 32/0/0 default/read/poll queues.
[18786.838099] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 
127.0.0.1:4420
[-- MARK -- Wed Mar 31 09:15:00 2021]
[18805.461676] ------------[ cut here ]------------
[18805.486313] kernel BUG at mm/slub.c:314!
[18805.506044] invalid opcode: 0000 [#1] SMP PTI
[18805.528104] CPU: 31 PID: 322 Comm: kworker/31:1H Tainted: G          
I       5.12.0-rc5 #1
[18805.570307] Hardware name: HP ProLiant DL380e Gen8, BIOS P73 08/20/2012
[18805.604735] Workqueue: nvmet_tcp_wq nvmet_tcp_io_work [nvmet_tcp]
[18805.635671] RIP: 0010:kmem_cache_free+0x40d/0x440
[18805.660905] Code: 00 00 00 e9 1c ff ff ff 4c 8b 0c 24 48 8b 4c 24 08 
48 89 da 48 89 ee 41 b8 01 00 00 00 4c 89 f7 e8 98 c5 ff ff e9 c0 fd ff 
ff <0f> 0b 48 c7 c6 78 c4 3c 90 4c 89 ff e8 a2 68 fa ff 0f 0b 48 8b 05
[18805.755735] RSP: 0018:ffffae10023eba68 EFLAGS: 00010046
[18805.783155] RAX: ffff909991c4e760 RBX: ffff909991c4e760 RCX: 
ffff909991c4e878
[18805.819016] RDX: 00000000000008c4 RSI: 0000000000000000 RDI: 
ffff909991c4e760
[18805.855858] RBP: fffff7dd49471300 R08: ffff90998688ba48 R09: 
ffffae10023ebabb
[18805.891750] R10: ffff909991c4e760 R11: 0000000000001000 R12: 
ffff909991c4e760
[18805.930721] R13: 0000000000000000 R14: ffff909844eb9300 R15: 
fffff7dd49471300
[18805.969903] FS:  0000000000000000(0000) GS:ffff909b777c0000(0000) 
knlGS:0000000000000000
[18806.011278] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[18806.041018] CR2: 000055883d2eb018 CR3: 0000000255b7c001 CR4: 
00000000000606e0
[18806.077143] Call Trace:
[18806.090748]  ? bfq_put_queue+0x156/0x280
[18806.110302]  bfq_put_queue+0x156/0x280
[18806.129147]  bfq_insert_requests+0x182/0x1610
[18806.153478]  ? iomap_apply+0xf1/0x2b0
[18806.171562]  blk_mq_sched_insert_requests+0x5d/0xe0
[18806.195976]  blk_mq_flush_plug_list+0xf0/0x170
[18806.219609]  blk_finish_plug+0x36/0x50
[18806.238186]  __iomap_dio_rw+0x3ff/0x4c0
[18806.257607]  iomap_dio_rw+0xa/0x30
[18806.275661]  xfs_file_dio_write_aligned+0x86/0x110 [xfs]
[18806.302402]  ? xfs_file_buffered_write+0x280/0x280 [xfs]
[18806.329803]  xfs_file_write_iter+0xc8/0x110 [xfs]
[18806.354426]  nvmet_file_submit_bvec+0xc3/0x100 [nvmet]
[18806.380201]  nvmet_file_execute_io+0x1f8/0x240 [nvmet]
[18806.407312]  nvmet_tcp_io_work+0xbd3/0xc15 [nvmet_tcp]
[18806.434505]  ? switch_mm_irqs_off+0x58/0x430
[18806.456485]  ? __switch_to_asm+0x42/0x70
[18806.476908]  ? __switch_to+0x7b/0x450
[18806.494909]  process_one_work+0x1ec/0x380
[18806.516534]  worker_thread+0x53/0x3e0
[18806.535986]  ? process_one_work+0x380/0x380
[18806.556696]  kthread+0x11b/0x140
[18806.573191]  ? __kthread_bind_mask+0x60/0x60
[18806.596048]  ret_from_fork+0x22/0x30
[18806.613788] Modules linked in: nvme_tcp nvme_fabrics nvmet_tcp nvmet 
nvme nvme_core md4 cmac nls_utf8 cifs libarc4 libdes loop 
rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache nfsd auth_rpcgss nfs_acl 
lockd grace nfs_ssc dm_log_writes dm_flakey rfkill sunrpc iTCO_wdt 
intel_rapl_msr intel_pmc_bxt iTCO_vendor_support intel_rapl_common 
sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm 
irqbypass rapl intel_cstate intel_uncore pcspkr ipmi_ssif lpc_ich 
acpi_ipmi hpilo ipmi_si hpwdt ipmi_devintf ioatdma igb ipmi_msghandler 
acpi_power_meter dca fuse zram ip_tables xfs crct10dif_pclmul 
crc32_pclmul mgag200 crc32c_intel drm_kms_helper ghash_clmulni_intel cec 
serio_raw drm i2c_algo_bit [last unloaded: nvmet]
[18806.937291] ---[ end trace e1409caee7927af8 ]---
>
>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [bisected] bfq regression on latest linux-block/for-next
  2021-04-01  1:27 ` [bisected] bfq regression on latest linux-block/for-next Yi Zhang
  2021-04-01  1:55   ` Chaitanya Kulkarni
@ 2021-04-02 13:39   ` Paolo Valente
  2021-04-07 15:15     ` Yi Zhang
  1 sibling, 1 reply; 9+ messages in thread
From: Paolo Valente @ 2021-04-02 13:39 UTC (permalink / raw)
  To: Yi Zhang; +Cc: linux-block



> Il giorno 1 apr 2021, alle ore 03:27, Yi Zhang <yi.zhang@redhat.com> ha scritto:
> 
> Hi

Hi

> We reproduced this bfq regression[3] on ppc64le with blktests[2] on the latest linux-block/for-next branch, seems it was introduced with [1] from my bisecting, pls help check it. Let me know if you need any testing for it, thanks.
> 

Thanks for reporting this bug and finding the candidate offending commit. Could you try this test with my dev kernel, which might provide more information? The kernel is here:
https://github.com/Algodev-github/bfq-mq

Alternatively, I could try to provide you with patches to instrument your kernel.

The first option may be quicker.

In both cases, having KASAN active could be rather helpful too.

Looking forward to your feedback,
Paolo

> [1]
> commit 430a67f9d6169a7b3e328bceb2ef9542e4153c7c (HEAD, refs/bisect/bad)
> Author: Paolo Valente <paolo.valente@linaro.org>
> Date:   Thu Mar 4 18:46:27 2021 +0100
> 
>    block, bfq: merge bursts of newly-created queues
> 
> [2] blktests: nvme_trtype=tcp ./check nvme/011
> 
> [3]
> [  109.342525] run blktests nvme/011 at 2021-03-31 20:58:58
> [  109.497429] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
> [  109.512868] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
> [  109.584932] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:9b73e8531afa4320963cc96571e0acb1.
> [  109.585809] nvme nvme0: creating 128 I/O queues.
> [  109.596570] nvme nvme0: mapped 128/0/0 default/read/poll queues.
> [  109.654170] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420
> [  155.366535] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
> [  155.366568] nvmet: ctrl 1 fatal error occurred!
> [  155.366608] nvme nvme0: starting error recovery
> [  155.374861] block nvme0n1: no usable path - requeuing I/O
> [  155.374891] block nvme0n1: no usable path - requeuing I/O
> [  155.374911] block nvme0n1: no usable path - requeuing I/O
> [  155.374923] block nvme0n1: no usable path - requeuing I/O
> [  155.374934] block nvme0n1: no usable path - requeuing I/O
> [  155.374954] block nvme0n1: no usable path - requeuing I/O
> [  155.374973] block nvme0n1: no usable path - requeuing I/O
> [  155.374984] block nvme0n1: no usable path - requeuing I/O
> [  155.375004] block nvme0n1: no usable path - requeuing I/O
> [  155.375024] block nvme0n1: no usable path - requeuing I/O
> [  155.375674] nvme nvme0: Reconnecting in 10 seconds...
> [  180.967462] nvmet: ctrl 2 keep-alive timer (15 seconds) expired!
> [  180.967486] nvmet: ctrl 2 fatal error occurred!
> [  193.427906] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [  193.427934] rcu: 	40-...0: (1 GPs behind) idle=f3e/1/0x4000000000000000 softirq=535/535 fqs=2839 
> [  193.427966] rcu: 	42-...0: (1 GPs behind) idle=792/1/0x4000000000000000 softirq=160/160 fqs=2839 
> [  193.427995] 	(detected by 5, t=6002 jiffies, g=7961, q=7219)
> [  193.428030] Sending NMI from CPU 5 to CPUs 40:
> [  199.235530] CPU 40 didn't respond to backtrace IPI, inspecting paca.
> [  199.235551] irq_soft_mask: 0x01 in_mce: 0 in_nmi: 0 current: 217 (kworker/40:0H)
> [  199.235579] Back trace of paca->saved_r1 (0xc000000012973380) (possibly stale):
> [  199.235594] Call Trace:
> [  199.235601] [c000000012973380] [c0000000129733e0] 0xc0000000129733e0 (unreliable)
> [  199.235633] [c000000012973420] [c000000000933008] bfq_allow_bio_merge+0x78/0x110
> [  199.235673] [c000000012973460] [c0000000008d7470] elv_bio_merge_ok+0x90/0xb0
> [  199.235711] [c000000012973490] [c0000000008d811c] elv_merge+0x6c/0x1b0
> [  199.235747] [c0000000129734e0] [c0000000008ea1d8] blk_mq_sched_try_merge+0x48/0x250
> [  199.235785] [c000000012973540] [c00000000092d29c] bfq_bio_merge+0xfc/0x230
> [  199.235820] [c0000000129735c0] [c0000000008f9e24] __blk_mq_sched_bio_merge+0xa4/0x210
> [  199.235848] [c000000012973620] [c0000000008f288c] blk_mq_submit_bio+0xec/0x710
> [  199.235877] [c0000000129736c0] [c0000000008dfad4] submit_bio_noacct+0x534/0x680
> [  199.235915] [c000000012973760] [c0000000005e0308] iomap_dio_submit_bio+0xd8/0x100
> [  199.235953] [c000000012973790] [c0000000005e0ed4] iomap_dio_bio_actor+0x3c4/0x570
> [  199.235991] [c000000012973830] [c0000000005db284] iomap_apply+0x1f4/0x3e0
> [  199.236017] [c000000012973920] [c0000000005e06e4] __iomap_dio_rw+0x204/0x5b0
> [  199.236054] [c0000000129739f0] [c0000000005e0ab0] iomap_dio_rw+0x20/0x80
> [  199.236090] [c000000012973a10] [c00800000e4ea660] xfs_file_dio_write_aligned+0xb8/0x1b0 [xfs]
> [  199.236230] [c000000012973a60] [c00800000e4eabfc] xfs_file_write_iter+0x104/0x190 [xfs]
> [  199.236368] [c000000012973a90] [c008000010440724] nvmet_file_submit_bvec+0xfc/0x160 [nvmet]
> [  199.236412] [c000000012973b10] [c008000010440b54] nvmet_file_execute_io+0x2cc/0x3b0 [nvmet]
> [  199.236455] [c000000012973b90] [c00800000fa142d8] nvmet_tcp_io_work+0xce0/0xd10 [nvmet_tcp]
> [  199.236493] [c000000012973c70] [c000000000179534] process_one_work+0x294/0x580
> [  199.236533] [c000000012973d10] [c0000000001798c8] worker_thread+0xa8/0x650
> [  199.236568] [c000000012973da0] [c000000000184180] kthread+0x190/0x1a0
> [  199.236595] [c000000012973e10] [c00000000000d4ec] ret_from_kernel_thread+0x5c/0x70
> [  199.236710] Sending NMI from CPU 5 to CPUs 42:
> [  205.044364] CPU 42 didn't respond to backtrace IPI, inspecting paca.
> [  205.044381] irq_soft_mask: 0x01 in_mce: 0 in_nmi: 0 current: 3004 (kworker/42:2H)
> [  205.044408] Back trace of paca->saved_r1 (0xc00000085194b520) (possibly stale):
> [  205.044423] Call Trace:
> [  205.044440] [c00000085194b520] [c00000079ac7f700] 0xc00000079ac7f700 (unreliable)
> [  205.044468] [c00000085194b540] [c00000000092d248] bfq_bio_merge+0xa8/0x230
> [  205.044494] [c00000085194b5c0] [c0000000008f9e24] __blk_mq_sched_bio_merge+0xa4/0x210
> [  205.044531] [c00000085194b620] [c0000000008f288c] blk_mq_submit_bio+0xec/0x710
> [  205.044569] [c00000085194b6c0] [c0000000008dfad4] submit_bio_noacct+0x534/0x680
> [  205.044607] [c00000085194b760] [c0000000005e0308] iomap_dio_submit_bio+0xd8/0x100
> [  205.044645] [c00000085194b790] [c0000000005e0ed4] iomap_dio_bio_actor+0x3c4/0x570
> [  205.044672] [c00000085194b830] [c0000000005db284] iomap_apply+0x1f4/0x3e0
> [  205.044698] [c00000085194b920] [c0000000005e06e4] __iomap_dio_rw+0x204/0x5b0
> [  205.044735] [c00000085194b9f0] [c0000000005e0ab0] iomap_dio_rw+0x20/0x80
> [  205.044771] [c00000085194ba10] [c00800000e4ea660] xfs_file_dio_write_aligned+0xb8/0x1b0 [xfs]
> [  205.044897] [c00000085194ba60] [c00800000e4eabfc] xfs_file_write_iter+0x104/0x190 [xfs]
> [  205.045021] [c00000085194ba90] [c008000010440724] nvmet_file_submit_bvec+0xfc/0x160 [nvmet]
> [  205.045064] [c00000085194bb10] [c008000010440b54] nvmet_file_execute_io+0x2cc/0x3b0 [nvmet]
> [  205.045106] [c00000085194bb90] [c00800000fa142d8] nvmet_tcp_io_work+0xce0/0xd10 [nvmet_tcp]
> [  205.045144] [c00000085194bc70] [c000000000179534] process_one_work+0x294/0x580
> [  205.045181] [c00000085194bd10] [c0000000001798c8] worker_thread+0xa8/0x650
> [  205.045216] [c00000085194bda0] [c000000000184180] kthread+0x190/0x1a0
> [  205.045242] [c00000085194be10] [c00000000000d4ec] ret_from_kernel_thread+0x5c/0x70
> 
> 
> 
> 
> Best Regards,
>  Yi Zhang
> 
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [bisected] bfq regression on latest linux-block/for-next
  2021-04-02 13:39   ` Paolo Valente
@ 2021-04-07 15:15     ` Yi Zhang
       [not found]       ` <CAHj4cs9+q-vH9qar+MTP-aECb2whT7O8J5OmR240yss1y=kWKw@mail.gmail.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Yi Zhang @ 2021-04-07 15:15 UTC (permalink / raw)
  To: Paolo Valente; +Cc: linux-block



On 4/2/21 9:39 PM, Paolo Valente wrote:
>
>> Il giorno 1 apr 2021, alle ore 03:27, Yi Zhang <yi.zhang@redhat.com> ha scritto:
>>
>> Hi
> Hi
>
>> We reproduced this bfq regression[3] on ppc64le with blktests[2] on the latest linux-block/for-next branch, seems it was introduced with [1] from my bisecting, pls help check it. Let me know if you need any testing for it, thanks.
>>
> Thanks for reporting this bug and finding the candidate offending commit. Could you try this test with my dev kernel, which might provide more information? The kernel is here:
> https://github.com/Algodev-github/bfq-mq
>
> Alternatively, I could try to provide you with patches to instrument your kernel.
HI Paolo
I tried your dev kernel, but with no luck to reproduce it, could you 
provide the debug patch based on latest linux-block/for-next?

> The first option may be quicker.
>
> In both cases, having KASAN active could be rather helpful too.
>
> Looking forward to your feedback,
> Paolo
>
>> [1]
>> commit 430a67f9d6169a7b3e328bceb2ef9542e4153c7c (HEAD, refs/bisect/bad)
>> Author: Paolo Valente <paolo.valente@linaro.org>
>> Date:   Thu Mar 4 18:46:27 2021 +0100
>>
>>     block, bfq: merge bursts of newly-created queues
>>
>> [2] blktests: nvme_trtype=tcp ./check nvme/011
>>
>> [3]
>> [  109.342525] run blktests nvme/011 at 2021-03-31 20:58:58
>> [  109.497429] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>> [  109.512868] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
>> [  109.584932] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:9b73e8531afa4320963cc96571e0acb1.
>> [  109.585809] nvme nvme0: creating 128 I/O queues.
>> [  109.596570] nvme nvme0: mapped 128/0/0 default/read/poll queues.
>> [  109.654170] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420
>> [  155.366535] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
>> [  155.366568] nvmet: ctrl 1 fatal error occurred!
>> [  155.366608] nvme nvme0: starting error recovery
>> [  155.374861] block nvme0n1: no usable path - requeuing I/O
>> [  155.374891] block nvme0n1: no usable path - requeuing I/O
>> [  155.374911] block nvme0n1: no usable path - requeuing I/O
>> [  155.374923] block nvme0n1: no usable path - requeuing I/O
>> [  155.374934] block nvme0n1: no usable path - requeuing I/O
>> [  155.374954] block nvme0n1: no usable path - requeuing I/O
>> [  155.374973] block nvme0n1: no usable path - requeuing I/O
>> [  155.374984] block nvme0n1: no usable path - requeuing I/O
>> [  155.375004] block nvme0n1: no usable path - requeuing I/O
>> [  155.375024] block nvme0n1: no usable path - requeuing I/O
>> [  155.375674] nvme nvme0: Reconnecting in 10 seconds...
>> [  180.967462] nvmet: ctrl 2 keep-alive timer (15 seconds) expired!
>> [  180.967486] nvmet: ctrl 2 fatal error occurred!
>> [  193.427906] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
>> [  193.427934] rcu: 	40-...0: (1 GPs behind) idle=f3e/1/0x4000000000000000 softirq=535/535 fqs=2839
>> [  193.427966] rcu: 	42-...0: (1 GPs behind) idle=792/1/0x4000000000000000 softirq=160/160 fqs=2839
>> [  193.427995] 	(detected by 5, t=6002 jiffies, g=7961, q=7219)
>> [  193.428030] Sending NMI from CPU 5 to CPUs 40:
>> [  199.235530] CPU 40 didn't respond to backtrace IPI, inspecting paca.
>> [  199.235551] irq_soft_mask: 0x01 in_mce: 0 in_nmi: 0 current: 217 (kworker/40:0H)
>> [  199.235579] Back trace of paca->saved_r1 (0xc000000012973380) (possibly stale):
>> [  199.235594] Call Trace:
>> [  199.235601] [c000000012973380] [c0000000129733e0] 0xc0000000129733e0 (unreliable)
>> [  199.235633] [c000000012973420] [c000000000933008] bfq_allow_bio_merge+0x78/0x110
>> [  199.235673] [c000000012973460] [c0000000008d7470] elv_bio_merge_ok+0x90/0xb0
>> [  199.235711] [c000000012973490] [c0000000008d811c] elv_merge+0x6c/0x1b0
>> [  199.235747] [c0000000129734e0] [c0000000008ea1d8] blk_mq_sched_try_merge+0x48/0x250
>> [  199.235785] [c000000012973540] [c00000000092d29c] bfq_bio_merge+0xfc/0x230
>> [  199.235820] [c0000000129735c0] [c0000000008f9e24] __blk_mq_sched_bio_merge+0xa4/0x210
>> [  199.235848] [c000000012973620] [c0000000008f288c] blk_mq_submit_bio+0xec/0x710
>> [  199.235877] [c0000000129736c0] [c0000000008dfad4] submit_bio_noacct+0x534/0x680
>> [  199.235915] [c000000012973760] [c0000000005e0308] iomap_dio_submit_bio+0xd8/0x100
>> [  199.235953] [c000000012973790] [c0000000005e0ed4] iomap_dio_bio_actor+0x3c4/0x570
>> [  199.235991] [c000000012973830] [c0000000005db284] iomap_apply+0x1f4/0x3e0
>> [  199.236017] [c000000012973920] [c0000000005e06e4] __iomap_dio_rw+0x204/0x5b0
>> [  199.236054] [c0000000129739f0] [c0000000005e0ab0] iomap_dio_rw+0x20/0x80
>> [  199.236090] [c000000012973a10] [c00800000e4ea660] xfs_file_dio_write_aligned+0xb8/0x1b0 [xfs]
>> [  199.236230] [c000000012973a60] [c00800000e4eabfc] xfs_file_write_iter+0x104/0x190 [xfs]
>> [  199.236368] [c000000012973a90] [c008000010440724] nvmet_file_submit_bvec+0xfc/0x160 [nvmet]
>> [  199.236412] [c000000012973b10] [c008000010440b54] nvmet_file_execute_io+0x2cc/0x3b0 [nvmet]
>> [  199.236455] [c000000012973b90] [c00800000fa142d8] nvmet_tcp_io_work+0xce0/0xd10 [nvmet_tcp]
>> [  199.236493] [c000000012973c70] [c000000000179534] process_one_work+0x294/0x580
>> [  199.236533] [c000000012973d10] [c0000000001798c8] worker_thread+0xa8/0x650
>> [  199.236568] [c000000012973da0] [c000000000184180] kthread+0x190/0x1a0
>> [  199.236595] [c000000012973e10] [c00000000000d4ec] ret_from_kernel_thread+0x5c/0x70
>> [  199.236710] Sending NMI from CPU 5 to CPUs 42:
>> [  205.044364] CPU 42 didn't respond to backtrace IPI, inspecting paca.
>> [  205.044381] irq_soft_mask: 0x01 in_mce: 0 in_nmi: 0 current: 3004 (kworker/42:2H)
>> [  205.044408] Back trace of paca->saved_r1 (0xc00000085194b520) (possibly stale):
>> [  205.044423] Call Trace:
>> [  205.044440] [c00000085194b520] [c00000079ac7f700] 0xc00000079ac7f700 (unreliable)
>> [  205.044468] [c00000085194b540] [c00000000092d248] bfq_bio_merge+0xa8/0x230
>> [  205.044494] [c00000085194b5c0] [c0000000008f9e24] __blk_mq_sched_bio_merge+0xa4/0x210
>> [  205.044531] [c00000085194b620] [c0000000008f288c] blk_mq_submit_bio+0xec/0x710
>> [  205.044569] [c00000085194b6c0] [c0000000008dfad4] submit_bio_noacct+0x534/0x680
>> [  205.044607] [c00000085194b760] [c0000000005e0308] iomap_dio_submit_bio+0xd8/0x100
>> [  205.044645] [c00000085194b790] [c0000000005e0ed4] iomap_dio_bio_actor+0x3c4/0x570
>> [  205.044672] [c00000085194b830] [c0000000005db284] iomap_apply+0x1f4/0x3e0
>> [  205.044698] [c00000085194b920] [c0000000005e06e4] __iomap_dio_rw+0x204/0x5b0
>> [  205.044735] [c00000085194b9f0] [c0000000005e0ab0] iomap_dio_rw+0x20/0x80
>> [  205.044771] [c00000085194ba10] [c00800000e4ea660] xfs_file_dio_write_aligned+0xb8/0x1b0 [xfs]
>> [  205.044897] [c00000085194ba60] [c00800000e4eabfc] xfs_file_write_iter+0x104/0x190 [xfs]
>> [  205.045021] [c00000085194ba90] [c008000010440724] nvmet_file_submit_bvec+0xfc/0x160 [nvmet]
>> [  205.045064] [c00000085194bb10] [c008000010440b54] nvmet_file_execute_io+0x2cc/0x3b0 [nvmet]
>> [  205.045106] [c00000085194bb90] [c00800000fa142d8] nvmet_tcp_io_work+0xce0/0xd10 [nvmet_tcp]
>> [  205.045144] [c00000085194bc70] [c000000000179534] process_one_work+0x294/0x580
>> [  205.045181] [c00000085194bd10] [c0000000001798c8] worker_thread+0xa8/0x650
>> [  205.045216] [c00000085194bda0] [c000000000184180] kthread+0x190/0x1a0
>> [  205.045242] [c00000085194be10] [c00000000000d4ec] ret_from_kernel_thread+0x5c/0x70
>>
>>
>>
>>
>> Best Regards,
>>   Yi Zhang
>>
>>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [bisected] bfq regression on latest linux-block/for-next
       [not found]       ` <CAHj4cs9+q-vH9qar+MTP-aECb2whT7O8J5OmR240yss1y=kWKw@mail.gmail.com>
@ 2021-04-20  7:33         ` Paolo Valente
  2021-05-03  9:37           ` Paolo Valente
  0 siblings, 1 reply; 9+ messages in thread
From: Paolo Valente @ 2021-04-20  7:33 UTC (permalink / raw)
  To: Yi Zhang; +Cc: linux-block, Xiong Zhou, Li Wang, Ming Lei



> Il giorno 20 apr 2021, alle ore 04:00, Yi Zhang <yi.zhang@redhat.com> ha scritto:
> 
> 
> On Wed, Apr 7, 2021 at 11:15 PM Yi Zhang <yi.zhang@redhat.com> wrote:
> 
> 
> On 4/2/21 9:39 PM, Paolo Valente wrote:
> >
> >> Il giorno 1 apr 2021, alle ore 03:27, Yi Zhang <yi.zhang@redhat.com> ha scritto:
> >>
> >> Hi
> > Hi
> >
> >> We reproduced this bfq regression[3] on ppc64le with blktests[2] on the latest linux-block/for-next branch, seems it was introduced with [1] from my bisecting, pls help check it. Let me know if you need any testing for it, thanks.
> >>
> > Thanks for reporting this bug and finding the candidate offending commit. Could you try this test with my dev kernel, which might provide more information? The kernel is here:
> > https://github.com/Algodev-github/bfq-mq
> >
> > Alternatively, I could try to provide you with patches to instrument your kernel.
> HI Paolo
> I tried your dev kernel, but with no luck to reproduce it, could you 
> provide the debug patch based on latest linux-block/for-next?
> 
> Hi Paolo
> This issue has been consistently reproduced with LTP/fstests/blktests on recent linux-block/for-next, do you have a chance to check it?

Hi Yi, all,
I've been working hard to port my code-instrumentation layer to the kernel in for-next. I seem I finished the porting yesterday. I tested it but the system crashed. I'm going to analyze the oops. Maybe this freeze is caused by mistakes in this layer, maybe the instrumentation is already detecting a bug. In the first case, I'll fix the mistakes and try the tests suggested in this thread.

Thanks,
Paolo

> 
> LTP
> [ 7223.279313] LTP: starting dio30 (diotest6 -b 65536 -n 100 -i 100 -o 1024000) 
> [ 7320.660837] ------------[ cut here ]------------ 
> [ 7320.660862] kernel BUG at mm/slub.c:314! 
> [ 7320.660872] Oops: Exception in kernel mode, sig: 5 [#1] 
> [ 7320.660882] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries 
> [ 7320.660894] Modules linked in: dummy veth minix binfmt_misc can_raw can nfsv3 nfs_acl nfs lockd grace nfs_ssc fscache rds tun brd overlay fuse exfat vfat fat loop n_gsm pps_ldisc ppp_synctty mkiss ax25 ppp_async ppp_generic slcan slip slhc snd_hrtimer snd_seq snd_seq_device sctp ip6_udp_tunnel udp_tunnel snd_timer snd soundcore authenc pcrypt crypto_user sha3_generic n_hdlc rfkill sunrpc ibmveth pseries_rng crct10dif_vpmsum drm drm_panel_orientation_quirks i2c_core zram ip_tables xfs ibmvscsi scsi_transport_srp vmx_crypto crc32c_vpmsum [last unloaded: ltp_insmod01] 
> [ 7320.661052] CPU: 7 PID: 145059 Comm: diotest6 Tainted: G           OE     5.12.0-rc7 #1 
> [ 7320.661067] NIP:  c00000000049b9b8 LR: c0000000004a161c CTR: c000000000935870 
> [ 7320.661079] REGS: c0000000a9ae2f60 TRAP: 0700   Tainted: G           OE      (5.12.0-rc7) 
> [ 7320.661091] MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 44008242  XER: 20040000 
> [ 7320.661118] CFAR: c0000000004a1618 IRQMASK: 1  
> [ 7320.661118] GPR00: c0000000004a161c c0000000a9ae3200 c000000001f10400 c000000007fdd500  
> [ 7320.661118] GPR04: c00c00000005a700 c0000000169c59b0 c0000000169c5ac8 0000000000000001  
> [ 7320.661118] GPR08: 0000000000000118 0000000000000001 c00c0000000586c0 0000000000000000  
> [ 7320.661118] GPR12: 0000000000008000 c00000001eca6280 0000000000000000 0000000000000000  
> [ 7320.661118] GPR16: 0000000002cba000 0000000000140000 0000000000000000 fffffffffffff77f  
> [ 7320.661118] GPR20: c0000007e4042780 0000000000000001 c0000000169c59b0 0000000000210d00  
> [ 7320.661118] GPR24: c0000000169c59b0 0000000000000001 0000000000000001 c000000007fdd500  
> [ 7320.661118] GPR28: c0000000169c59b0 000000008075002e c0000000169c59b0 c00c00000005a700  
> [ 7320.661254] NIP [c00000000049b9b8] __slab_free+0x98/0x600 
> [ 7320.661268] LR [c0000000004a161c] kmem_cache_free+0x25c/0x5e0 
> [ 7320.661281] Call Trace: 
> [ 7320.661286] [c0000000a9ae3200] [0000000000000005] 0x5 (unreliable) 
> [ 7320.661301] [c0000000a9ae32c0] [c0000000004a161c] kmem_cache_free+0x25c/0x5e0 
> [ 7320.661316] [c0000000a9ae3380] [c0000000009329f8] bfq_put_queue+0x1d8/0x2d0 
> [ 7320.661330] [c0000000a9ae3420] [c0000000009350ec] bfq_init_rq+0x72c/0xc00 
> [ 7320.661344] [c0000000a9ae3500] [c000000000935aa4] bfq_insert_requests+0x234/0x1600 
> [ 7320.661359] [c0000000a9ae3650] [c0000000008fb264] blk_mq_sched_insert_requests+0xb4/0x1f0 
> [ 7320.661374] [c0000000a9ae36a0] [c0000000008f32c8] blk_mq_flush_plug_list+0x138/0x1f0 
> [ 7320.661390] [c0000000a9ae3710] [c0000000008e1538] blk_flush_plug_list+0x48/0x64 
> [ 7320.661406] [c0000000a9ae3740] [c0000000008f3b1c] blk_mq_submit_bio+0x38c/0x710 
> [ 7320.661421] [c0000000a9ae37e0] [c0000000008e0a94] submit_bio_noacct+0x3d4/0x680 
> [ 7320.661436] [c0000000a9ae3880] [c0000000005e1048] iomap_dio_submit_bio+0xd8/0x100 
> [ 7320.661452] [c0000000a9ae38b0] [c0000000005e1b48] iomap_dio_bio_actor+0x2f8/0x570 
> [ 7320.661467] [c0000000a9ae3950] [c0000000005dbfc4] iomap_apply+0x1f4/0x3e0 
> [ 7320.661481] [c0000000a9ae3a40] [c0000000005e1424] __iomap_dio_rw+0x204/0x5b0 
> [ 7320.661495] [c0000000a9ae3b10] [c0000000005e17f0] iomap_dio_rw+0x20/0x80 
> [ 7320.661509] [c0000000a9ae3b30] [c008000001cb2c34] xfs_file_dio_read+0x11c/0x1a0 [xfs] 
> [ 7320.661694] [c0000000a9ae3b70] [c008000001cb2dd4] xfs_file_read_iter+0x11c/0x170 [xfs] 
> [ 7320.661876] [c0000000a9ae3bb0] [c0000000004f8900] do_iter_readv_writev+0x130/0x240 
> [ 7320.661893] [c0000000a9ae3c10] [c0000000004f935c] do_iter_read+0x14c/0x2b0 
> [ 7320.661906] [c0000000a9ae3c60] [c0000000004f9568] vfs_readv+0x78/0xb0 
> [ 7320.661918] [c0000000a9ae3d50] [c0000000004f9630] do_readv+0x90/0x1a0 
> [ 7320.661931] [c0000000a9ae3db0] [c00000000002da24] system_call_exception+0x104/0x270 
> [ 7320.661947] [c0000000a9ae3e10] [c00000000000d25c] system_call_common+0xec/0x278 
> [ 7320.661961] --- interrupt: c00 at 0x7fff8699e724 
> [ 7320.661970] NIP:  00007fff8699e724 LR: 00000000100051f0 CTR: 0000000000000000 
> [ 7320.661982] REGS: c0000000a9ae3e80 TRAP: 0c00   Tainted: G           OE      (5.12.0-rc7) 
> [ 7320.661994] MSR:  800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE>  CR: 24002444  XER: 00000000 
> [ 7320.662020] IRQMASK: 0  
> [ 7320.662020] GPR00: 0000000000000091 00007fffc2bb1e90 00007fff86a87100 0000000000000004  
> [ 7320.662020] GPR04: 000001001f590470 0000000000000014 0000000000000000 0000000000000000  
> [ 7320.662020] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
> [ 7320.662020] GPR12: 0000000000000000 00007fff86b7a990 0000000000000000 0000000000000000  
> [ 7320.662020] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
> [ 7320.662020] GPR20: 0000000000000000 0000000000000022 0000000000000000 0000000000000022  
> [ 7320.662020] GPR24: 0000000000000004 0000000000000003 000001001f590470 000001001f5905c0  
> [ 7320.662020] GPR28: 00000000100505c8 0000000000010000 0000000002b7a000 0000000000000014  
> [ 7320.662149] NIP [00007fff8699e724] 0x7fff8699e724 
> [ 7320.662158] LR [00000000100051f0] 0x100051f0 
> [ 7320.662167] --- interrupt: c00 
> [ 7320.662174] Instruction dump: 
> [ 7320.662182] fbc100b0 3ee00021 7f390734 62f70d00 3b400001 ebdf0020 811b0028 ebbf0028  
> [ 7320.662205] 7fc9e278 7cdc4214 7d290074 7929d182 <0b090000> 78c50022 54c7c03e e93b00b8  
> [ 7320.662229] ---[ end trace 817605fc936991f3 ]--- 
> [ 7320.666013] 
> 
>  fstests
> [ 1271.938817] run fstests generic/421 at 2021-04-16 16:03:05 
> [ 1272.963661] EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: acl,user_xattr. Quota mode: none. 
> [ 1272.969886] xfs_io (pid 192274) is setting deprecated v1 encryption policy; recommend upgrading to v2. 
> [ 1273.764719] EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: acl,user_xattr. Quota mode: none. 
> [ 1274.735175] Kernel attempted to read user page (30) - exploit attempt? (uid: 0) 
> [ 1274.735210] BUG: Kernel NULL pointer dereference on read at 0x00000030 
> [ 1274.735223] Faulting instruction address: 0xc00000000092cc78 
> [ 1274.735246] Oops: Kernel access of bad area, sig: 11 [#1] 
> [ 1274.735257] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV 
> [ 1274.735273] Modules linked in: dm_flakey rfkill i40iw ib_uverbs ib_core sunrpc joydev ses at24 enclosure i40e scsi_transport_sas regmap_i2c ofpart ipmi_powernv ipmi_devintf powernv_flash rtc_opal ipmi_msghandler crct10dif_vpmsum mtd opal_prd i2c_opal fuse zram ip_tables xfs ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm drm vmx_crypto crc32c_vpmsum i2c_core aacraid drm_panel_orientation_quirks 
> [ 1274.735391] CPU: 48 PID: 192676 Comm: xfs_io Not tainted 5.12.0-rc7 #1 
> [ 1274.735406] NIP:  c00000000092cc78 LR: c000000000933c90 CTR: c000000000026570 
> [ 1274.735430] REGS: c00000002e13f340 TRAP: 0300   Not tainted  (5.12.0-rc7) 
> [ 1274.735443] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 84024800  XER: 20040000 
> [ 1274.735478] CFAR: c00000000092cc58 DAR: 0000000000000030 DSISR: 40000000 IRQMASK: 1  
> [ 1274.735478] GPR00: c000000000933c90 c00000002e13f5e0 c000000001f10400 c000000050803000  
> [ 1274.735478] GPR04: c000000050806a60 0000000002045800 c00000002e13f688 0000000000000000  
> [ 1274.735478] GPR08: 0000000000000002 c000000069474ef8 0000000000000000 0000000100017cbc  
> [ 1274.735478] GPR12: 0000000000002000 c000001ffffbb800 0000000000000000 c0000008438c9ff8  
> [ 1274.735478] GPR16: c0000008438c9e80 0000000000010000 0000000000000000 0000000000080001  
> [ 1274.735478] GPR20: c00000002e13fa40 c000000042b04f60 0000000000060800 c000000069474ec0  
> [ 1274.735478] GPR24: c0000000425ed800 0000000000000000 c000000841f10d20 0000000002045800  
> [ 1274.735478] GPR28: c0000000508031f0 c000000069474ec0 0000000000000000 c000000050803000  
> [ 1274.735694] NIP [c00000000092cc78] bfq_rq_pos_tree_lookup+0x48/0x110 
> [ 1274.735729] LR [c000000000933c90] bfq_setup_cooperator+0x2f0/0x4f0 
> [ 1274.735763] Call Trace: 
> [ 1274.735779] [c00000002e13f5e0] [c000000000934ed4] bfq_init_rq+0x534/0xc00 (unreliable) 
> [ 1274.735819] [c00000002e13f650] [c000000000933c90] bfq_setup_cooperator+0x2f0/0x4f0 
> [ 1274.735845] [c00000002e13f6c0] [c000000000935bfc] bfq_insert_requests+0x3ac/0x1600 
> [ 1274.735872] [c00000002e13f810] [c0000000008fb244] blk_mq_sched_insert_requests+0xb4/0x1f0 
> [ 1274.735888] [c00000002e13f860] [c0000000008f32a8] blk_mq_flush_plug_list+0x138/0x1f0 
> [ 1274.735906] [c00000002e13f8d0] [c0000000008de4a8] blk_finish_plug+0x68/0x90 
> [ 1274.735922] [c00000002e13f900] [c0000000003d64e0] read_pages+0x1d0/0x4b0 
> [ 1274.735938] [c00000002e13f980] [c0000000003d6e8c] page_cache_ra_unbounded+0x21c/0x300 
> [ 1274.735954] [c00000002e13fa20] [c0000000003c778c] filemap_get_pages+0x11c/0x7d0 
> [ 1274.735981] [c00000002e13fae0] [c0000000003c7f30] filemap_read+0xf0/0x4a0 
> [ 1274.735997] [c00000002e13fc10] [c00000000063e3ac] ext4_file_read_iter+0x7c/0x270 
> [ 1274.736013] [c00000002e13fc60] [c0000000004f85c0] new_sync_read+0x120/0x190 
> [ 1274.736038] [c00000002e13fd00] [c0000000004fb5a0] vfs_read+0x1d0/0x240 
> [ 1274.736052] [c00000002e13fd50] [c0000000004fb6c4] sys_pread64+0xb4/0xe0 
> [ 1274.736067] [c00000002e13fdb0] [c00000000002da24] system_call_exception+0x104/0x270 
> [ 1274.736084] [c00000002e13fe10] [c00000000000d25c] system_call_common+0xec/0x278 
> [ 1274.736100] --- interrupt: c00 at 0x7fff8d1899c8 
> [ 1274.736111] NIP:  00007fff8d1899c8 LR: 00007fff8d1899a8 CTR: 0000000000000000 
> [ 1274.736125] REGS: c00000002e13fe80 TRAP: 0c00   Not tainted  (5.12.0-rc7) 
> [ 1274.736138] MSR:  900000000280f033 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 24022400  XER: 00000000 
> [ 1274.736178] IRQMASK: 0  
> [ 1274.736178] GPR00: 00000000000000b3 00007fffeb4b8360 00007fff8d1b7f00 0000000000000003  
> [ 1274.736178] GPR04: 000000013a220000 0000000000001000 0000000000200000 00007fff8d256d08  
> [ 1274.736178] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
> [ 1274.736178] GPR12: 0000000000000000 00007fff8d25e400 0000000000000000 0000000000000000  
> [ 1274.736178] GPR16: 0000000000000000 00007fff8d0601a8 0000000000000000 0000000000000000  
> [ 1274.736178] GPR20: 0000000000000000 0000000000000000 0000000000000003 0000000000000000  
> [ 1274.736178] GPR24: 0000000000200000 000000010aac3508 0000000000000000 0000000000200000  
> [ 1274.736178] GPR28: 0000000000000000 0000000000000003 0000000000001000 0000000000200000  
> [ 1274.736407] NIP [00007fff8d1899c8] 0x7fff8d1899c8 
> [ 1274.736509] LR [00007fff8d1899a8] 0x7fff8d1899a8 
> [ 1274.736560] --- interrupt: c00 
> [ 1274.736695] Instruction dump: 
> [ 1274.736767] f8010010 f821ff91 e9240000 2c290000 4082001c 480000d4 ebe90008 38890008  
> [ 1274.736885] 2c3f0000 4182002c 7fe9fb78 e9490028 <e94a0030> 7c255040 4181ffe0 4080005c  
> [ 1274.737028] ---[ end trace 25f5fc09dd3c38dd ]--- 
> 
> 
> blktests nvme-tcp/011
> 
> [10384.975371] run blktests nvme/011 at 2021-04-16 16:43:37 
> [10385.017040] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 
> [10385.028953] nvmet_tcp: enabling port 0 (
> 127.0.0.1:4420
> ) 
> [10385.186103] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:688053f17e1241f385cf0632cd1d173a. 
> [10385.200147] nvmet: unhandled identify cns 6 on qid 0 
> [10385.205293] nvme nvme0: creating 128 I/O queues. 
> [10385.224831] nvme nvme0: mapped 128/0/0 default/read/poll queues. 
> [10385.259232] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 
> 127.0.0.1:4420
>  
> [10387.700391] ------------[ cut here ]------------ 
> [10387.705007] kernel BUG at mm/slub.c:314! 
> [10387.708920] Internal error: Oops - BUG: 0 [#1] SMP 
> [10387.713701] Modules linked in: nvme_tcp nvme_fabrics nvmet_tcp nvmet nvme nvme_core loop dm_log_writes dm_flakey rpcrdma rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi ib_umad scsi_transport_iscsi rdma_cm iw_cm ib_ipoib ib_cm rfkill mlx5_ib ib_uverbs ib_core sunrpc coresight_etm4x i2c_smbus coresight_tmc coresight_replicator coresight_tpiu joydev mlx5_core mlxfw acpi_ipmi ipmi_ssif ipmi_devintf ipmi_msghandler coresight_funnel thunderx2_pmu coresight vfat fat fuse zram ip_tables xfs ast i2c_algo_bit drm_vram_helper crct10dif_ce ghash_ce drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm drm gpio_xlp i2c_xlp9xx uas usb_storage aes_neon_bs [last unloaded: nvmet] 
> [10387.777601] CPU: 13 PID: 0 Comm: swapper/13 Not tainted 5.12.0-rc7 #1 
> [10387.784030] Hardware name: HPE Apollo 70             /C01_APACHE_MB         , BIOS L50_5.13_1.16 07/29/2020 
> [10387.793756] pstate: 60400089 (nZCv daIf +PAN -UAO -TCO BTYPE=--) 
> [10387.799751] pc : set_freepointer+0x30/0x34 
> [10387.8038 39] lr : kmem_cf fff000886aea300  x24: ffff8000126 aea300 x20: fff00 000000000001 x113: 0000000000000068 x12: ffff8000110e84e8  
> [10387.859102] x11: ffff800012973d90 x10: 000000000000002f  
> [10387.864402] x9 : ffff8000106fd7ac x8 : 0000000000000017  
> [10387.869702] x7 : 0000000000000001 x6 : 000001eca7a44c15  
> [10387.875002] x5 : 00ffffffffffffff x4 : 0000000000000118  
> [10387.880302] x3 : ffff000886aea418 x2 : ffff000886aea300  
> [10387.885602] x1 : ffff000886aea300 x0 : ffff00080814f400  
> [10387.890903] Call trace: 
> [10387.893337]  set_freepointer+0x30/0x34 
> [10387.897075]  bfq_put_queue+0x15c/0x1e4 
> [10387.900815]  bfq_finish_requeue_request+0xdc/0x114 
> [10387.905594]  blk_mq_free_request+0x4c/0x1c0 
> [10387.909767]  __blk_mq_end_request+0x108/0x13c 
> [10387.914111]  scsi_end_request+0xcc/0x1f0 
> [10387.918023]  scsi_io_completion+0x70/0x180 
> [10387.922107]  scsi_finish_command+0x108/0x140 
> [10387.926367]  scsi_softirq_done+0x7c/0x190 
> [10387.930364]  blk_complete_reqs+0x5c/0x74 
> [10387.934277]  blk_done_softirq+0x2c/0x3c 
> [10387.938100]  __do_softirq+0x128/0x388 
> [10387.941752]  __irq_exit_rcu+0x16c/0x174 
> [10387.945578]  irq_exit+0x1c/0x30 
> [10387.948708]  __handle_domain_irq+0x8c/0xec 
> [10387.952794]  gic_handle_irq+0x5c/0xdc 
> [10387.956443]  el1_irq+0xb8/0x140 
> [10387.959573]  arch_cpu_idle+0x18/0x30 
> [10387.963138]  default_idle_call+0x4c/0x160 
> [10387.967137]  cpuidle_idle_call+0x14c/0x1a0 
> [10387.971224]  do_idle+0xb4/0x104 
> [10387.974354]  cpu_startup_entry+0x30/0x9c 
> [10387.978265]  secondary_start_kernel+0xf4/0x120 
> [10387.982699]  0x0 
> [10387.984530] Code: ca000042 ca030042 f8246822 d65f03c0 (d4210000)  
> [10387.990612] ---[ end trace fcc292b65af7c325 ]--- 
> [10387.995218] Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt 
> [10388.002601] SMP: stopping secondary CPUs 
> [10389.075031] SMP: failed to stop secondary CPUs 0-12,14-223 
> [10389.080513] Kernel Offset: disabled 
> [10389.083988] CPU features: 0x00046002,63000838 
> [10389.088333] Memory Limit: none 
> [10389.091454] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt ]--- 
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [bisected] bfq regression on latest linux-block/for-next
  2021-04-20  7:33         ` Paolo Valente
@ 2021-05-03  9:37           ` Paolo Valente
  2021-05-07  8:30             ` Yi Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Paolo Valente @ 2021-05-03  9:37 UTC (permalink / raw)
  To: Yi Zhang; +Cc: linux-block, Xiong Zhou, Li Wang, Ming Lei

[-- Attachment #1: Type: text/plain, Size: 2214 bytes --]



> Il giorno 20 apr 2021, alle ore 09:33, Paolo Valente <paolo.valente@linaro.org> ha scritto:
> 
> 
> 
>> Il giorno 20 apr 2021, alle ore 04:00, Yi Zhang <yi.zhang@redhat.com> ha scritto:
>> 
>> 
>> On Wed, Apr 7, 2021 at 11:15 PM Yi Zhang <yi.zhang@redhat.com> wrote:
>> 
>> 
>> On 4/2/21 9:39 PM, Paolo Valente wrote:
>>> 
>>>> Il giorno 1 apr 2021, alle ore 03:27, Yi Zhang <yi.zhang@redhat.com> ha scritto:
>>>> 
>>>> Hi
>>> Hi
>>> 
>>>> We reproduced this bfq regression[3] on ppc64le with blktests[2] on the latest linux-block/for-next branch, seems it was introduced with [1] from my bisecting, pls help check it. Let me know if you need any testing for it, thanks.
>>>> 
>>> Thanks for reporting this bug and finding the candidate offending commit. Could you try this test with my dev kernel, which might provide more information? The kernel is here:
>>> https://github.com/Algodev-github/bfq-mq
>>> 
>>> Alternatively, I could try to provide you with patches to instrument your kernel.
>> HI Paolo
>> I tried your dev kernel, but with no luck to reproduce it, could you 
>> provide the debug patch based on latest linux-block/for-next?
>> 
>> Hi Paolo
>> This issue has been consistently reproduced with LTP/fstests/blktests on recent linux-block/for-next, do you have a chance to check it?
> 
> Hi Yi, all,
> I've been working hard to port my code-instrumentation layer to the kernel in for-next. I seem I finished the porting yesterday. I tested it but the system crashed. I'm going to analyze the oops. Maybe this freeze is caused by mistakes in this layer, maybe the instrumentation is already detecting a bug. In the first case, I'll fix the mistakes and try the tests suggested in this thread.
> 

Hi Yi, all,
I seem to have made it.  I've attached a patch series, which applies
on top of for-next, as it was when you reported this failure (i.e., on
top of 816e1d1c2f7d Merge branch 'for-5.13/io_uring' into for-next).
If patches are to be applied on top of a different HEAD, and they
don't apply cleanly, I'll take care of rebasing them.

Of course I've tried your test myself, but with no failure at all.

Looking forward to your feedback,
Paolo


[-- Attachment #2: dev-patches-on-for-next.tgz --]
[-- Type: application/octet-stream, Size: 42405 bytes --]

[-- Attachment #3: Type: text/plain, Size: 15787 bytes --]





> Thanks,
> Paolo
> 
>> 
>> LTP
>> [ 7223.279313] LTP: starting dio30 (diotest6 -b 65536 -n 100 -i 100 -o 1024000) 
>> [ 7320.660837] ------------[ cut here ]------------ 
>> [ 7320.660862] kernel BUG at mm/slub.c:314! 
>> [ 7320.660872] Oops: Exception in kernel mode, sig: 5 [#1] 
>> [ 7320.660882] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries 
>> [ 7320.660894] Modules linked in: dummy veth minix binfmt_misc can_raw can nfsv3 nfs_acl nfs lockd grace nfs_ssc fscache rds tun brd overlay fuse exfat vfat fat loop n_gsm pps_ldisc ppp_synctty mkiss ax25 ppp_async ppp_generic slcan slip slhc snd_hrtimer snd_seq snd_seq_device sctp ip6_udp_tunnel udp_tunnel snd_timer snd soundcore authenc pcrypt crypto_user sha3_generic n_hdlc rfkill sunrpc ibmveth pseries_rng crct10dif_vpmsum drm drm_panel_orientation_quirks i2c_core zram ip_tables xfs ibmvscsi scsi_transport_srp vmx_crypto crc32c_vpmsum [last unloaded: ltp_insmod01] 
>> [ 7320.661052] CPU: 7 PID: 145059 Comm: diotest6 Tainted: G           OE     5.12.0-rc7 #1 
>> [ 7320.661067] NIP:  c00000000049b9b8 LR: c0000000004a161c CTR: c000000000935870 
>> [ 7320.661079] REGS: c0000000a9ae2f60 TRAP: 0700   Tainted: G           OE      (5.12.0-rc7) 
>> [ 7320.661091] MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 44008242  XER: 20040000 
>> [ 7320.661118] CFAR: c0000000004a1618 IRQMASK: 1  
>> [ 7320.661118] GPR00: c0000000004a161c c0000000a9ae3200 c000000001f10400 c000000007fdd500  
>> [ 7320.661118] GPR04: c00c00000005a700 c0000000169c59b0 c0000000169c5ac8 0000000000000001  
>> [ 7320.661118] GPR08: 0000000000000118 0000000000000001 c00c0000000586c0 0000000000000000  
>> [ 7320.661118] GPR12: 0000000000008000 c00000001eca6280 0000000000000000 0000000000000000  
>> [ 7320.661118] GPR16: 0000000002cba000 0000000000140000 0000000000000000 fffffffffffff77f  
>> [ 7320.661118] GPR20: c0000007e4042780 0000000000000001 c0000000169c59b0 0000000000210d00  
>> [ 7320.661118] GPR24: c0000000169c59b0 0000000000000001 0000000000000001 c000000007fdd500  
>> [ 7320.661118] GPR28: c0000000169c59b0 000000008075002e c0000000169c59b0 c00c00000005a700  
>> [ 7320.661254] NIP [c00000000049b9b8] __slab_free+0x98/0x600 
>> [ 7320.661268] LR [c0000000004a161c] kmem_cache_free+0x25c/0x5e0 
>> [ 7320.661281] Call Trace: 
>> [ 7320.661286] [c0000000a9ae3200] [0000000000000005] 0x5 (unreliable) 
>> [ 7320.661301] [c0000000a9ae32c0] [c0000000004a161c] kmem_cache_free+0x25c/0x5e0 
>> [ 7320.661316] [c0000000a9ae3380] [c0000000009329f8] bfq_put_queue+0x1d8/0x2d0 
>> [ 7320.661330] [c0000000a9ae3420] [c0000000009350ec] bfq_init_rq+0x72c/0xc00 
>> [ 7320.661344] [c0000000a9ae3500] [c000000000935aa4] bfq_insert_requests+0x234/0x1600 
>> [ 7320.661359] [c0000000a9ae3650] [c0000000008fb264] blk_mq_sched_insert_requests+0xb4/0x1f0 
>> [ 7320.661374] [c0000000a9ae36a0] [c0000000008f32c8] blk_mq_flush_plug_list+0x138/0x1f0 
>> [ 7320.661390] [c0000000a9ae3710] [c0000000008e1538] blk_flush_plug_list+0x48/0x64 
>> [ 7320.661406] [c0000000a9ae3740] [c0000000008f3b1c] blk_mq_submit_bio+0x38c/0x710 
>> [ 7320.661421] [c0000000a9ae37e0] [c0000000008e0a94] submit_bio_noacct+0x3d4/0x680 
>> [ 7320.661436] [c0000000a9ae3880] [c0000000005e1048] iomap_dio_submit_bio+0xd8/0x100 
>> [ 7320.661452] [c0000000a9ae38b0] [c0000000005e1b48] iomap_dio_bio_actor+0x2f8/0x570 
>> [ 7320.661467] [c0000000a9ae3950] [c0000000005dbfc4] iomap_apply+0x1f4/0x3e0 
>> [ 7320.661481] [c0000000a9ae3a40] [c0000000005e1424] __iomap_dio_rw+0x204/0x5b0 
>> [ 7320.661495] [c0000000a9ae3b10] [c0000000005e17f0] iomap_dio_rw+0x20/0x80 
>> [ 7320.661509] [c0000000a9ae3b30] [c008000001cb2c34] xfs_file_dio_read+0x11c/0x1a0 [xfs] 
>> [ 7320.661694] [c0000000a9ae3b70] [c008000001cb2dd4] xfs_file_read_iter+0x11c/0x170 [xfs] 
>> [ 7320.661876] [c0000000a9ae3bb0] [c0000000004f8900] do_iter_readv_writev+0x130/0x240 
>> [ 7320.661893] [c0000000a9ae3c10] [c0000000004f935c] do_iter_read+0x14c/0x2b0 
>> [ 7320.661906] [c0000000a9ae3c60] [c0000000004f9568] vfs_readv+0x78/0xb0 
>> [ 7320.661918] [c0000000a9ae3d50] [c0000000004f9630] do_readv+0x90/0x1a0 
>> [ 7320.661931] [c0000000a9ae3db0] [c00000000002da24] system_call_exception+0x104/0x270 
>> [ 7320.661947] [c0000000a9ae3e10] [c00000000000d25c] system_call_common+0xec/0x278 
>> [ 7320.661961] --- interrupt: c00 at 0x7fff8699e724 
>> [ 7320.661970] NIP:  00007fff8699e724 LR: 00000000100051f0 CTR: 0000000000000000 
>> [ 7320.661982] REGS: c0000000a9ae3e80 TRAP: 0c00   Tainted: G           OE      (5.12.0-rc7) 
>> [ 7320.661994] MSR:  800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE>  CR: 24002444  XER: 00000000 
>> [ 7320.662020] IRQMASK: 0  
>> [ 7320.662020] GPR00: 0000000000000091 00007fffc2bb1e90 00007fff86a87100 0000000000000004  
>> [ 7320.662020] GPR04: 000001001f590470 0000000000000014 0000000000000000 0000000000000000  
>> [ 7320.662020] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
>> [ 7320.662020] GPR12: 0000000000000000 00007fff86b7a990 0000000000000000 0000000000000000  
>> [ 7320.662020] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
>> [ 7320.662020] GPR20: 0000000000000000 0000000000000022 0000000000000000 0000000000000022  
>> [ 7320.662020] GPR24: 0000000000000004 0000000000000003 000001001f590470 000001001f5905c0  
>> [ 7320.662020] GPR28: 00000000100505c8 0000000000010000 0000000002b7a000 0000000000000014  
>> [ 7320.662149] NIP [00007fff8699e724] 0x7fff8699e724 
>> [ 7320.662158] LR [00000000100051f0] 0x100051f0 
>> [ 7320.662167] --- interrupt: c00 
>> [ 7320.662174] Instruction dump: 
>> [ 7320.662182] fbc100b0 3ee00021 7f390734 62f70d00 3b400001 ebdf0020 811b0028 ebbf0028  
>> [ 7320.662205] 7fc9e278 7cdc4214 7d290074 7929d182 <0b090000> 78c50022 54c7c03e e93b00b8 
>> [ 7320.662229] ---[ end trace 817605fc936991f3 ]--- 
>> [ 7320.666013] 
>> 
>> fstests
>> [ 1271.938817] run fstests generic/421 at 2021-04-16 16:03:05 
>> [ 1272.963661] EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: acl,user_xattr. Quota mode: none. 
>> [ 1272.969886] xfs_io (pid 192274) is setting deprecated v1 encryption policy; recommend upgrading to v2. 
>> [ 1273.764719] EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: acl,user_xattr. Quota mode: none. 
>> [ 1274.735175] Kernel attempted to read user page (30) - exploit attempt? (uid: 0) 
>> [ 1274.735210] BUG: Kernel NULL pointer dereference on read at 0x00000030 
>> [ 1274.735223] Faulting instruction address: 0xc00000000092cc78 
>> [ 1274.735246] Oops: Kernel access of bad area, sig: 11 [#1] 
>> [ 1274.735257] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV 
>> [ 1274.735273] Modules linked in: dm_flakey rfkill i40iw ib_uverbs ib_core sunrpc joydev ses at24 enclosure i40e scsi_transport_sas regmap_i2c ofpart ipmi_powernv ipmi_devintf powernv_flash rtc_opal ipmi_msghandler crct10dif_vpmsum mtd opal_prd i2c_opal fuse zram ip_tables xfs ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm drm vmx_crypto crc32c_vpmsum i2c_core aacraid drm_panel_orientation_quirks 
>> [ 1274.735391] CPU: 48 PID: 192676 Comm: xfs_io Not tainted 5.12.0-rc7 #1 
>> [ 1274.735406] NIP:  c00000000092cc78 LR: c000000000933c90 CTR: c000000000026570 
>> [ 1274.735430] REGS: c00000002e13f340 TRAP: 0300   Not tainted  (5.12.0-rc7) 
>> [ 1274.735443] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 84024800  XER: 20040000 
>> [ 1274.735478] CFAR: c00000000092cc58 DAR: 0000000000000030 DSISR: 40000000 IRQMASK: 1  
>> [ 1274.735478] GPR00: c000000000933c90 c00000002e13f5e0 c000000001f10400 c000000050803000  
>> [ 1274.735478] GPR04: c000000050806a60 0000000002045800 c00000002e13f688 0000000000000000  
>> [ 1274.735478] GPR08: 0000000000000002 c000000069474ef8 0000000000000000 0000000100017cbc  
>> [ 1274.735478] GPR12: 0000000000002000 c000001ffffbb800 0000000000000000 c0000008438c9ff8  
>> [ 1274.735478] GPR16: c0000008438c9e80 0000000000010000 0000000000000000 0000000000080001  
>> [ 1274.735478] GPR20: c00000002e13fa40 c000000042b04f60 0000000000060800 c000000069474ec0  
>> [ 1274.735478] GPR24: c0000000425ed800 0000000000000000 c000000841f10d20 0000000002045800  
>> [ 1274.735478] GPR28: c0000000508031f0 c000000069474ec0 0000000000000000 c000000050803000  
>> [ 1274.735694] NIP [c00000000092cc78] bfq_rq_pos_tree_lookup+0x48/0x110 
>> [ 1274.735729] LR [c000000000933c90] bfq_setup_cooperator+0x2f0/0x4f0 
>> [ 1274.735763] Call Trace: 
>> [ 1274.735779] [c00000002e13f5e0] [c000000000934ed4] bfq_init_rq+0x534/0xc00 (unreliable) 
>> [ 1274.735819] [c00000002e13f650] [c000000000933c90] bfq_setup_cooperator+0x2f0/0x4f0 
>> [ 1274.735845] [c00000002e13f6c0] [c000000000935bfc] bfq_insert_requests+0x3ac/0x1600 
>> [ 1274.735872] [c00000002e13f810] [c0000000008fb244] blk_mq_sched_insert_requests+0xb4/0x1f0 
>> [ 1274.735888] [c00000002e13f860] [c0000000008f32a8] blk_mq_flush_plug_list+0x138/0x1f0 
>> [ 1274.735906] [c00000002e13f8d0] [c0000000008de4a8] blk_finish_plug+0x68/0x90 
>> [ 1274.735922] [c00000002e13f900] [c0000000003d64e0] read_pages+0x1d0/0x4b0 
>> [ 1274.735938] [c00000002e13f980] [c0000000003d6e8c] page_cache_ra_unbounded+0x21c/0x300 
>> [ 1274.735954] [c00000002e13fa20] [c0000000003c778c] filemap_get_pages+0x11c/0x7d0 
>> [ 1274.735981] [c00000002e13fae0] [c0000000003c7f30] filemap_read+0xf0/0x4a0 
>> [ 1274.735997] [c00000002e13fc10] [c00000000063e3ac] ext4_file_read_iter+0x7c/0x270 
>> [ 1274.736013] [c00000002e13fc60] [c0000000004f85c0] new_sync_read+0x120/0x190 
>> [ 1274.736038] [c00000002e13fd00] [c0000000004fb5a0] vfs_read+0x1d0/0x240 
>> [ 1274.736052] [c00000002e13fd50] [c0000000004fb6c4] sys_pread64+0xb4/0xe0 
>> [ 1274.736067] [c00000002e13fdb0] [c00000000002da24] system_call_exception+0x104/0x270 
>> [ 1274.736084] [c00000002e13fe10] [c00000000000d25c] system_call_common+0xec/0x278 
>> [ 1274.736100] --- interrupt: c00 at 0x7fff8d1899c8 
>> [ 1274.736111] NIP:  00007fff8d1899c8 LR: 00007fff8d1899a8 CTR: 0000000000000000 
>> [ 1274.736125] REGS: c00000002e13fe80 TRAP: 0c00   Not tainted  (5.12.0-rc7) 
>> [ 1274.736138] MSR:  900000000280f033 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 24022400  XER: 00000000 
>> [ 1274.736178] IRQMASK: 0  
>> [ 1274.736178] GPR00: 00000000000000b3 00007fffeb4b8360 00007fff8d1b7f00 0000000000000003  
>> [ 1274.736178] GPR04: 000000013a220000 0000000000001000 0000000000200000 00007fff8d256d08  
>> [ 1274.736178] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
>> [ 1274.736178] GPR12: 0000000000000000 00007fff8d25e400 0000000000000000 0000000000000000  
>> [ 1274.736178] GPR16: 0000000000000000 00007fff8d0601a8 0000000000000000 0000000000000000  
>> [ 1274.736178] GPR20: 0000000000000000 0000000000000000 0000000000000003 0000000000000000  
>> [ 1274.736178] GPR24: 0000000000200000 000000010aac3508 0000000000000000 0000000000200000  
>> [ 1274.736178] GPR28: 0000000000000000 0000000000000003 0000000000001000 0000000000200000  
>> [ 1274.736407] NIP [00007fff8d1899c8] 0x7fff8d1899c8 
>> [ 1274.736509] LR [00007fff8d1899a8] 0x7fff8d1899a8 
>> [ 1274.736560] --- interrupt: c00 
>> [ 1274.736695] Instruction dump: 
>> [ 1274.736767] f8010010 f821ff91 e9240000 2c290000 4082001c 480000d4 ebe90008 38890008  
>> [ 1274.736885] 2c3f0000 4182002c 7fe9fb78 e9490028 <e94a0030> 7c255040 4181ffe0 4080005c 
>> [ 1274.737028] ---[ end trace 25f5fc09dd3c38dd ]--- 
>> 
>> 
>> blktests nvme-tcp/011
>> 
>> [10384.975371] run blktests nvme/011 at 2021-04-16 16:43:37 
>> [10385.017040] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 
>> [10385.028953] nvmet_tcp: enabling port 0 (
>> 127.0.0.1:4420
>> ) 
>> [10385.186103] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:688053f17e1241f385cf0632cd1d173a. 
>> [10385.200147] nvmet: unhandled identify cns 6 on qid 0 
>> [10385.205293] nvme nvme0: creating 128 I/O queues. 
>> [10385.224831] nvme nvme0: mapped 128/0/0 default/read/poll queues. 
>> [10385.259232] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 
>> 127.0.0.1:4420
>> 
>> [10387.700391] ------------[ cut here ]------------ 
>> [10387.705007] kernel BUG at mm/slub.c:314! 
>> [10387.708920] Internal error: Oops - BUG: 0 [#1] SMP 
>> [10387.713701] Modules linked in: nvme_tcp nvme_fabrics nvmet_tcp nvmet nvme nvme_core loop dm_log_writes dm_flakey rpcrdma rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi ib_umad scsi_transport_iscsi rdma_cm iw_cm ib_ipoib ib_cm rfkill mlx5_ib ib_uverbs ib_core sunrpc coresight_etm4x i2c_smbus coresight_tmc coresight_replicator coresight_tpiu joydev mlx5_core mlxfw acpi_ipmi ipmi_ssif ipmi_devintf ipmi_msghandler coresight_funnel thunderx2_pmu coresight vfat fat fuse zram ip_tables xfs ast i2c_algo_bit drm_vram_helper crct10dif_ce ghash_ce drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm drm gpio_xlp i2c_xlp9xx uas usb_storage aes_neon_bs [last unloaded: nvmet] 
>> [10387.777601] CPU: 13 PID: 0 Comm: swapper/13 Not tainted 5.12.0-rc7 #1 
>> [10387.784030] Hardware name: HPE Apollo 70             /C01_APACHE_MB         , BIOS L50_5.13_1.16 07/29/2020 
>> [10387.793756] pstate: 60400089 (nZCv daIf +PAN -UAO -TCO BTYPE=--) 
>> [10387.799751] pc : set_freepointer+0x30/0x34 
>> [10387.8038 39] lr : kmem_cf fff000886aea300  x24: ffff8000126 aea300 x20: fff00 000000000001 x113: 0000000000000068 x12: ffff8000110e84e8  
>> [10387.859102] x11: ffff800012973d90 x10: 000000000000002f  
>> [10387.864402] x9 : ffff8000106fd7ac x8 : 0000000000000017  
>> [10387.869702] x7 : 0000000000000001 x6 : 000001eca7a44c15  
>> [10387.875002] x5 : 00ffffffffffffff x4 : 0000000000000118  
>> [10387.880302] x3 : ffff000886aea418 x2 : ffff000886aea300  
>> [10387.885602] x1 : ffff000886aea300 x0 : ffff00080814f400  
>> [10387.890903] Call trace: 
>> [10387.893337]  set_freepointer+0x30/0x34 
>> [10387.897075]  bfq_put_queue+0x15c/0x1e4 
>> [10387.900815]  bfq_finish_requeue_request+0xdc/0x114 
>> [10387.905594]  blk_mq_free_request+0x4c/0x1c0 
>> [10387.909767]  __blk_mq_end_request+0x108/0x13c 
>> [10387.914111]  scsi_end_request+0xcc/0x1f0 
>> [10387.918023]  scsi_io_completion+0x70/0x180 
>> [10387.922107]  scsi_finish_command+0x108/0x140 
>> [10387.926367]  scsi_softirq_done+0x7c/0x190 
>> [10387.930364]  blk_complete_reqs+0x5c/0x74 
>> [10387.934277]  blk_done_softirq+0x2c/0x3c 
>> [10387.938100]  __do_softirq+0x128/0x388 
>> [10387.941752]  __irq_exit_rcu+0x16c/0x174 
>> [10387.945578]  irq_exit+0x1c/0x30 
>> [10387.948708]  __handle_domain_irq+0x8c/0xec 
>> [10387.952794]  gic_handle_irq+0x5c/0xdc 
>> [10387.956443]  el1_irq+0xb8/0x140 
>> [10387.959573]  arch_cpu_idle+0x18/0x30 
>> [10387.963138]  default_idle_call+0x4c/0x160 
>> [10387.967137]  cpuidle_idle_call+0x14c/0x1a0 
>> [10387.971224]  do_idle+0xb4/0x104 
>> [10387.974354]  cpu_startup_entry+0x30/0x9c 
>> [10387.978265]  secondary_start_kernel+0xf4/0x120 
>> [10387.982699]  0x0 
>> [10387.984530] Code: ca000042 ca030042 f8246822 d65f03c0 (d4210000)  
>> [10387.990612] ---[ end trace fcc292b65af7c325 ]--- 
>> [10387.995218] Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt 
>> [10388.002601] SMP: stopping secondary CPUs 
>> [10389.075031] SMP: failed to stop secondary CPUs 0-12,14-223 
>> [10389.080513] Kernel Offset: disabled 
>> [10389.083988] CPU features: 0x00046002,63000838 
>> [10389.088333] Memory Limit: none 
>> [10389.091454] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt ]--- 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [bisected] bfq regression on latest linux-block/for-next
  2021-05-03  9:37           ` Paolo Valente
@ 2021-05-07  8:30             ` Yi Zhang
  2021-05-11 17:10               ` Paolo Valente
  0 siblings, 1 reply; 9+ messages in thread
From: Yi Zhang @ 2021-05-07  8:30 UTC (permalink / raw)
  To: Paolo Valente; +Cc: linux-block, Xiong Zhou, Li Wang, Ming Lei, Bruno Goncalves

On Mon, May 3, 2021 at 5:35 PM Paolo Valente <paolo.valente@linaro.org> wrote:
>
>
>
> > Il giorno 20 apr 2021, alle ore 09:33, Paolo Valente <paolo.valente@linaro.org> ha scritto:
> >
> >
> >
> >> Il giorno 20 apr 2021, alle ore 04:00, Yi Zhang <yi.zhang@redhat.com> ha scritto:
> >>
> >>
> >> On Wed, Apr 7, 2021 at 11:15 PM Yi Zhang <yi.zhang@redhat.com> wrote:
> >>
> >>
> >> On 4/2/21 9:39 PM, Paolo Valente wrote:
> >>>
> >>>> Il giorno 1 apr 2021, alle ore 03:27, Yi Zhang <yi.zhang@redhat.com> ha scritto:
> >>>>
> >>>> Hi
> >>> Hi
> >>>
> >>>> We reproduced this bfq regression[3] on ppc64le with blktests[2] on the latest linux-block/for-next branch, seems it was introduced with [1] from my bisecting, pls help check it. Let me know if you need any testing for it, thanks.
> >>>>
> >>> Thanks for reporting this bug and finding the candidate offending commit. Could you try this test with my dev kernel, which might provide more information? The kernel is here:
> >>> https://github.com/Algodev-github/bfq-mq
> >>>
> >>> Alternatively, I could try to provide you with patches to instrument your kernel.
> >> HI Paolo
> >> I tried your dev kernel, but with no luck to reproduce it, could you
> >> provide the debug patch based on latest linux-block/for-next?
> >>
> >> Hi Paolo
> >> This issue has been consistently reproduced with LTP/fstests/blktests on recent linux-block/for-next, do you have a chance to check it?
> >
> > Hi Yi, all,
> > I've been working hard to port my code-instrumentation layer to the kernel in for-next. I seem I finished the porting yesterday. I tested it but the system crashed. I'm going to analyze the oops. Maybe this freeze is caused by mistakes in this layer, maybe the instrumentation is already detecting a bug. In the first case, I'll fix the mistakes and try the tests suggested in this thread.
> >
>
> Hi Yi, all,
> I seem to have made it.  I've attached a patch series, which applies
> on top of for-next, as it was when you reported this failure (i.e., on
> top of 816e1d1c2f7d Merge branch 'for-5.13/io_uring' into for-next).
> If patches are to be applied on top of a different HEAD, and they
> don't apply cleanly, I'll take care of rebasing them.
>
> Of course I've tried your test myself, but with no failure at all.
>
> Looking forward to your feedback,
> Paolo
>
Hi Paolo

With the patch series, blktests nvme-tcp nvme/011 passed on
linux-block/for-next, thanks.

Thanks
Yi


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [bisected] bfq regression on latest linux-block/for-next
  2021-05-07  8:30             ` Yi Zhang
@ 2021-05-11 17:10               ` Paolo Valente
  0 siblings, 0 replies; 9+ messages in thread
From: Paolo Valente @ 2021-05-11 17:10 UTC (permalink / raw)
  To: Yi Zhang; +Cc: linux-block, Xiong Zhou, Li Wang, Ming Lei, Bruno Goncalves



> Il giorno 7 mag 2021, alle ore 10:30, Yi Zhang <yi.zhang@redhat.com> ha scritto:
> 
> On Mon, May 3, 2021 at 5:35 PM Paolo Valente <paolo.valente@linaro.org> wrote:
>> 
>> 
>> 
>>> Il giorno 20 apr 2021, alle ore 09:33, Paolo Valente <paolo.valente@linaro.org> ha scritto:
>>> 
>>> 
>>> 
>>>> Il giorno 20 apr 2021, alle ore 04:00, Yi Zhang <yi.zhang@redhat.com> ha scritto:
>>>> 
>>>> 
>>>> On Wed, Apr 7, 2021 at 11:15 PM Yi Zhang <yi.zhang@redhat.com> wrote:
>>>> 
>>>> 
>>>> On 4/2/21 9:39 PM, Paolo Valente wrote:
>>>>> 
>>>>>> Il giorno 1 apr 2021, alle ore 03:27, Yi Zhang <yi.zhang@redhat.com> ha scritto:
>>>>>> 
>>>>>> Hi
>>>>> Hi
>>>>> 
>>>>>> We reproduced this bfq regression[3] on ppc64le with blktests[2] on the latest linux-block/for-next branch, seems it was introduced with [1] from my bisecting, pls help check it. Let me know if you need any testing for it, thanks.
>>>>>> 
>>>>> Thanks for reporting this bug and finding the candidate offending commit. Could you try this test with my dev kernel, which might provide more information? The kernel is here:
>>>>> https://github.com/Algodev-github/bfq-mq
>>>>> 
>>>>> Alternatively, I could try to provide you with patches to instrument your kernel.
>>>> HI Paolo
>>>> I tried your dev kernel, but with no luck to reproduce it, could you
>>>> provide the debug patch based on latest linux-block/for-next?
>>>> 
>>>> Hi Paolo
>>>> This issue has been consistently reproduced with LTP/fstests/blktests on recent linux-block/for-next, do you have a chance to check it?
>>> 
>>> Hi Yi, all,
>>> I've been working hard to port my code-instrumentation layer to the kernel in for-next. I seem I finished the porting yesterday. I tested it but the system crashed. I'm going to analyze the oops. Maybe this freeze is caused by mistakes in this layer, maybe the instrumentation is already detecting a bug. In the first case, I'll fix the mistakes and try the tests suggested in this thread.
>>> 
>> 
>> Hi Yi, all,
>> I seem to have made it.  I've attached a patch series, which applies
>> on top of for-next, as it was when you reported this failure (i.e., on
>> top of 816e1d1c2f7d Merge branch 'for-5.13/io_uring' into for-next).
>> If patches are to be applied on top of a different HEAD, and they
>> don't apply cleanly, I'll take care of rebasing them.
>> 
>> Of course I've tried your test myself, but with no failure at all.
>> 
>> Looking forward to your feedback,
>> Paolo
>> 
> Hi Paolo
> 
> With the patch series, blktests nvme-tcp nvme/011 passed on
> linux-block/for-next, thanks.
> 

Great!

That series contained only one fix.  I do hope that it has been that
fix to eliminate the failure.

So I'm about to propose such a fix as an individual patch for
mainline.  Please test it if you can.

Thanks,
Paolo

> Thanks
> Yi


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-05-11 17:08 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <841664844.1202812.1617239260115.JavaMail.zimbra@redhat.com>
2021-04-01  1:27 ` [bisected] bfq regression on latest linux-block/for-next Yi Zhang
2021-04-01  1:55   ` Chaitanya Kulkarni
2021-04-01  2:05     ` Yi Zhang
2021-04-02 13:39   ` Paolo Valente
2021-04-07 15:15     ` Yi Zhang
     [not found]       ` <CAHj4cs9+q-vH9qar+MTP-aECb2whT7O8J5OmR240yss1y=kWKw@mail.gmail.com>
2021-04-20  7:33         ` Paolo Valente
2021-05-03  9:37           ` Paolo Valente
2021-05-07  8:30             ` Yi Zhang
2021-05-11 17:10               ` Paolo Valente

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).