* dd hangs when reading large partitions
@ 2019-01-18 12:10 Marc Gonzalez
  2019-01-18 13:39 ` Ming Lei
  ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread

From: Marc Gonzalez @ 2019-01-18 12:10 UTC (permalink / raw)
To: fsdevel, linux-block
Cc: SCSI, Alexander Viro, Jan Kara, Christoph Hellwig, Joao Pinto,
    Jens Axboe, Fujita Tomonori, Paolo Valente

Hello,

I'm running into an issue which I don't know how to debug.
So I'm open to ideas and suggestions :-)

On my arm64 board, I have enabled Universal Flash Storage support.

I wanted to benchmark read performance, and noticed that the system
locks up when I read partitions larger than 3.5 GB, unless I tell
dd to use direct IO:

*** WITH O_DIRECT ***
# dd if=/dev/sda of=/dev/null bs=1M iflag=direct status=progress
57892929536 bytes (58 GB, 54 GiB) copied, 697.006 s, 83.1 MB/s
55256+0 records in
55256+0 records out
57940115456 bytes (58 GB, 54 GiB) copied, 697.575 s, 83.1 MB/s

*** WITHOUT O_DIRECT ***
# dd if=/dev/sda of=/dev/null bs=1M status=progress
3853516800 bytes (3.9 GB, 3.6 GiB) copied, 49.0002 s, 78.6 MB/s


rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu:    1-...0: (8242 ticks this GP) idle=106/1/0x4000000000000000 softirq=168/171 fqs=2626
rcu:    6-...0: (99 GPs behind) idle=ec2/1/0x4000000000000000 softirq=71/71 fqs=2626
rcu:    (detected by 7, t=5254 jiffies, g=-275, q=2)
Task dump for CPU 1:
kworker/1:1H    R  running task        0   675      2 0x0000002a
Workqueue: kblockd blk_mq_run_work_fn
Call trace:
 __switch_to+0x168/0x1d0
 0xffffffc0f6efbbc8
 blk_mq_run_work_fn+0x28/0x40
 process_one_work+0x208/0x470
 worker_thread+0x48/0x460
 kthread+0x128/0x130
 ret_from_fork+0x10/0x1c
Task dump for CPU 6:
kthreadd        R  running task        0     2      0 0x0000002a
Call trace:
 __switch_to+0x168/0x1d0
 0x5b36396f4e7d4000


rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu:    1-...0: (8242 ticks this GP) idle=106/1/0x4000000000000000 softirq=168/171 fqs=10500
rcu:    6-...0: (99 GPs behind) idle=ec2/1/0x4000000000000000 softirq=71/71 fqs=10500
rcu:    (detected by 7, t=21009 jiffies, g=-275, q=2)
Task dump for CPU 1:
kworker/1:1H    R  running task        0   675      2 0x0000002a
Workqueue: kblockd blk_mq_run_work_fn
Call trace:
 __switch_to+0x168/0x1d0
 0xffffffc0f6efbbc8
 blk_mq_run_work_fn+0x28/0x40
 process_one_work+0x208/0x470
 worker_thread+0x48/0x460
 kthread+0x128/0x130
 ret_from_fork+0x10/0x1c
Task dump for CPU 6:
kthreadd        R  running task        0     2      0 0x0000002a
Call trace:
 __switch_to+0x168/0x1d0
 0x5b36396f4e7d4000


The system always hangs around the 3.6 GiB mark, wherever I start from.

How can I debug this issue?

Regards.

^ permalink raw reply	[flat|nested] 34+ messages in thread
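The consistent ~3.6 GiB stall point is suggestive: if it matches the board's RAM size, the buffered run hangs right when the page cache fills and reclaim kicks in, whereas O_DIRECT bypasses the cache entirely. That hypothesis is easy to check from a second terminal while the buffered dd runs — a generic sketch, not from the thread:

```shell
# RAM size -- does it roughly match the ~3.6 GiB stall point?
grep '^MemTotal' /proc/meminfo

# Sample page-cache growth while the buffered dd runs elsewhere;
# Cached should climb toward MemTotal as dd reads the device.
for i in 1 2 3; do
    awk '/^(MemFree|Cached):/ {printf "%s %d MiB  ", $1, $2/1024}
         END {print ""}' /proc/meminfo
    sleep 1
done
```

If Cached plateaus near MemTotal exactly when dd stalls, the problem is triggered by memory pressure rather than by the read path itself.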
* Re: dd hangs when reading large partitions
  2019-01-18 12:10 dd hangs when reading large partitions Marc Gonzalez
@ 2019-01-18 13:39 ` Ming Lei
  2019-01-18 14:54   ` Marc Gonzalez
  2019-01-18 15:18 ` jianchao.wang
  2019-01-18 19:27 ` Douglas Gilbert
  2 siblings, 1 reply; 34+ messages in thread

From: Ming Lei @ 2019-01-18 13:39 UTC (permalink / raw)
To: Marc Gonzalez
Cc: fsdevel, linux-block, SCSI, Alexander Viro, Jan Kara,
    Christoph Hellwig, Joao Pinto, Jens Axboe, Fujita Tomonori,
    Paolo Valente

On Fri, Jan 18, 2019 at 8:11 PM Marc Gonzalez <marc.w.gonzalez@free.fr> wrote:
>
> Hello,
>
> I'm running into an issue which I don't know how to debug.
> So I'm open to ideas and suggestions :-)
>
> On my arm64 board, I have enabled Universal Flash Storage support.
>
> I wanted to benchmark read performance, and noticed that the system
> locks up when I read partitions larger than 3.5 GB, unless I tell
> dd to use direct IO:
>
> *** WITH O_DIRECT ***
> # dd if=/dev/sda of=/dev/null bs=1M iflag=direct status=progress
> 57892929536 bytes (58 GB, 54 GiB) copied, 697.006 s, 83.1 MB/s
> 55256+0 records in
> 55256+0 records out
> 57940115456 bytes (58 GB, 54 GiB) copied, 697.575 s, 83.1 MB/s
>
> *** WITHOUT O_DIRECT ***
> # dd if=/dev/sda of=/dev/null bs=1M status=progress
> 3853516800 bytes (3.9 GB, 3.6 GiB) copied, 49.0002 s, 78.6 MB/s
>
>
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu:    1-...0: (8242 ticks this GP) idle=106/1/0x4000000000000000 softirq=168/171 fqs=2626
> rcu:    6-...0: (99 GPs behind) idle=ec2/1/0x4000000000000000 softirq=71/71 fqs=2626
> rcu:    (detected by 7, t=5254 jiffies, g=-275, q=2)
> Task dump for CPU 1:
> kworker/1:1H    R  running task        0   675      2 0x0000002a
> Workqueue: kblockd blk_mq_run_work_fn
> Call trace:
>  __switch_to+0x168/0x1d0
>  0xffffffc0f6efbbc8
>  blk_mq_run_work_fn+0x28/0x40
>  process_one_work+0x208/0x470
>  worker_thread+0x48/0x460
>  kthread+0x128/0x130
>  ret_from_fork+0x10/0x1c
> Task dump for CPU 6:
> kthreadd        R  running task        0     2      0 0x0000002a
> Call trace:
>  __switch_to+0x168/0x1d0
>  0x5b36396f4e7d4000
>
>
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu:    1-...0: (8242 ticks this GP) idle=106/1/0x4000000000000000 softirq=168/171 fqs=10500
> rcu:    6-...0: (99 GPs behind) idle=ec2/1/0x4000000000000000 softirq=71/71 fqs=10500
> rcu:    (detected by 7, t=21009 jiffies, g=-275, q=2)
> Task dump for CPU 1:
> kworker/1:1H    R  running task        0   675      2 0x0000002a
> Workqueue: kblockd blk_mq_run_work_fn
> Call trace:
>  __switch_to+0x168/0x1d0
>  0xffffffc0f6efbbc8
>  blk_mq_run_work_fn+0x28/0x40
>  process_one_work+0x208/0x470
>  worker_thread+0x48/0x460
>  kthread+0x128/0x130
>  ret_from_fork+0x10/0x1c
> Task dump for CPU 6:
> kthreadd        R  running task        0     2      0 0x0000002a
> Call trace:
>  __switch_to+0x168/0x1d0
>  0x5b36396f4e7d4000
>
>
> The system always hangs around the 3.6 GiB mark, wherever I start from.
> How can I debug this issue?

Maybe you can try to collect debugfs log first via the following command?

(cd /sys/kernel/debug/block/sda && find . -type f -exec grep -aH . {} \;)

BTW, I have several questions on this report:

- what is the kernel version in your test?
- can you reproduce this issue on other disk (not UFS)?
- are there any tasks in 'D' state shown via 'ps -ax'? If yes, please
  dump their stack trace.

Thanks,
Ming Lei
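Ming's last question can be answered even on a busybox rootfs by walking /proc directly instead of relying on ps. A generic sketch (not from the thread) — it needs root, and /proc/<pid>/stack requires a kernel built with stack tracing support:

```shell
# Dump the kernel stack of every task in uninterruptible sleep ('D').
for d in /proc/[0-9]*; do
    # The task state is the first field after the parenthesised comm
    # in /proc/<pid>/stat; strip up to the closing ')' to parse it safely.
    state=$(sed 's/^.*) //' "$d/stat" 2>/dev/null | cut -d' ' -f1)
    if [ "$state" = "D" ]; then
        echo "=== PID ${d#/proc/} ==="
        cat "$d/stack" 2>/dev/null
    fi
done
```

With no D-state tasks the loop prints nothing, which is itself a data point: the hang would then be in running/spinning tasks (consistent with the RCU stall reports) rather than in tasks blocked on I/O.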
* Re: dd hangs when reading large partitions
  2019-01-18 13:39 ` Ming Lei
@ 2019-01-18 14:54   ` Marc Gonzalez
  0 siblings, 0 replies; 34+ messages in thread

From: Marc Gonzalez @ 2019-01-18 14:54 UTC (permalink / raw)
To: Ming Lei, fsdevel, linux-block
Cc: SCSI, Alexander Viro, Jan Kara, Christoph Hellwig, Joao Pinto,
    Jens Axboe, Fujita Tomonori, Paolo Valente

On 18/01/2019 14:39, Ming Lei wrote:

> On Fri, Jan 18, 2019 at 8:11 PM Marc Gonzalez wrote:
>
>> I'm running into an issue which I don't know how to debug.
>> So I'm open to ideas and suggestions :-)
>>
>> On my arm64 board, I have enabled Universal Flash Storage support.
>>
>> I wanted to benchmark read performance, and noticed that the system
>> locks up when I read partitions larger than 3.5 GB, unless I tell
>> dd to use direct IO:
>>
>> *** WITH O_DIRECT ***
>> # dd if=/dev/sda of=/dev/null bs=1M iflag=direct status=progress
>> 57892929536 bytes (58 GB, 54 GiB) copied, 697.006 s, 83.1 MB/s
>> 55256+0 records in
>> 55256+0 records out
>> 57940115456 bytes (58 GB, 54 GiB) copied, 697.575 s, 83.1 MB/s
>>
>> *** WITHOUT O_DIRECT ***
>> # dd if=/dev/sda of=/dev/null bs=1M status=progress
>> 3853516800 bytes (3.9 GB, 3.6 GiB) copied, 49.0002 s, 78.6 MB/s
>>
>>
>> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>> rcu:    1-...0: (8242 ticks this GP) idle=106/1/0x4000000000000000 softirq=168/171 fqs=2626
>> rcu:    6-...0: (99 GPs behind) idle=ec2/1/0x4000000000000000 softirq=71/71 fqs=2626
>> rcu:    (detected by 7, t=5254 jiffies, g=-275, q=2)
>> Task dump for CPU 1:
>> kworker/1:1H    R  running task        0   675      2 0x0000002a
>> Workqueue: kblockd blk_mq_run_work_fn
>> Call trace:
>>  __switch_to+0x168/0x1d0
>>  0xffffffc0f6efbbc8
>>  blk_mq_run_work_fn+0x28/0x40
>>  process_one_work+0x208/0x470
>>  worker_thread+0x48/0x460
>>  kthread+0x128/0x130
>>  ret_from_fork+0x10/0x1c
>> Task dump for CPU 6:
>> kthreadd        R  running task        0     2      0 0x0000002a
>> Call trace:
>>  __switch_to+0x168/0x1d0
>>  0x5b36396f4e7d4000
>>
>>
>> The system always hangs around the 3.6 GiB mark, wherever I start from.
>> How can I debug this issue?
>
> Maybe you can try to collect debugfs log first via the following command?
>
> (cd /sys/kernel/debug/block/sda && find . -type f -exec grep -aH . {} \;)

# find . -type f -exec grep -aH . {} \;
./hctx0/cpu7/completed:0 0
./hctx0/cpu7/merged:0
./hctx0/cpu7/dispatched:0 0
./hctx0/cpu6/completed:0 5
./hctx0/cpu6/merged:0
./hctx0/cpu6/dispatched:0 5
./hctx0/cpu5/completed:0 0
./hctx0/cpu5/merged:0
./hctx0/cpu5/dispatched:0 0
./hctx0/cpu4/completed:0 0
./hctx0/cpu4/merged:0
./hctx0/cpu4/dispatched:0 0
./hctx0/cpu3/completed:5 38
./hctx0/cpu3/merged:0
./hctx0/cpu3/dispatched:5 38
./hctx0/cpu2/completed:0 0
./hctx0/cpu2/merged:0
./hctx0/cpu2/dispatched:0 0
./hctx0/cpu1/completed:0 0
./hctx0/cpu1/merged:0
./hctx0/cpu1/dispatched:0 0
./hctx0/cpu0/completed:0 0
./hctx0/cpu0/merged:0
./hctx0/cpu0/dispatched:0 0
./hctx0/type:default
./hctx0/dispatch_busy:0
./hctx0/active:0
./hctx0/run:43
./hctx0/queued:48
./hctx0/dispatched:       0	0
./hctx0/dispatched:       1	43
./hctx0/dispatched:       2	0
./hctx0/dispatched:       4	0
./hctx0/dispatched:       8	0
./hctx0/dispatched:      16	0
./hctx0/dispatched:     32+	0
./hctx0/io_poll:considered=0
./hctx0/io_poll:invoked=0
./hctx0/io_poll:success=0
./hctx0/tags_bitmap:00000000: ff07 070f
./hctx0/tags:nr_tags=32
./hctx0/tags:nr_reserved_tags=0
./hctx0/tags:active_queues=0
./hctx0/tags:bitmap_tags:
./hctx0/tags:depth=32
./hctx0/tags:busy=0
./hctx0/tags:cleared=18
./hctx0/tags:bits_per_word=8
./hctx0/tags:map_nr=4
./hctx0/tags:alloc_hint={12, 7, 12, 24, 12, 27, 10, 18}
./hctx0/tags:wake_batch=4
./hctx0/tags:wake_index=0
./hctx0/tags:ws_active=0
./hctx0/tags:ws={
./hctx0/tags:	{.wait_cnt=4, .wait=inactive},
./hctx0/tags:	{.wait_cnt=4, .wait=inactive},
./hctx0/tags:	{.wait_cnt=4, .wait=inactive},
./hctx0/tags:	{.wait_cnt=4, .wait=inactive},
./hctx0/tags:	{.wait_cnt=4, .wait=inactive},
./hctx0/tags:	{.wait_cnt=4, .wait=inactive},
./hctx0/tags:	{.wait_cnt=4, .wait=inactive},
./hctx0/tags:	{.wait_cnt=4, .wait=inactive},
./hctx0/tags:}
./hctx0/tags:round_robin=0
./hctx0/tags:min_shallow_depth=4294967295
./hctx0/ctx_map:00000000: 00
./hctx0/flags:alloc_policy=FIFO SHOULD_MERGE|TAG_SHARED|SG_MERGE
./write_hints:hint0: 0
./write_hints:hint1: 0
./write_hints:hint2: 0
./write_hints:hint3: 0
./write_hints:hint4: 0
./state:SAME_COMP|NONROT|IO_STAT|DISCARD|INIT_DONE|WC|FUA|REGISTERED|SCSI_PASSTHROUGH
./pm_only:0
./poll_stat:read  (512 Bytes): samples=0
./poll_stat:write (512 Bytes): samples=0
./poll_stat:read  (1024 Bytes): samples=0
./poll_stat:write (1024 Bytes): samples=0
./poll_stat:read  (2048 Bytes): samples=0
./poll_stat:write (2048 Bytes): samples=0
./poll_stat:read  (4096 Bytes): samples=0
./poll_stat:write (4096 Bytes): samples=0
./poll_stat:read  (8192 Bytes): samples=0
./poll_stat:write (8192 Bytes): samples=0
./poll_stat:read  (16384 Bytes): samples=0
./poll_stat:write (16384 Bytes): samples=0
./poll_stat:read  (32768 Bytes): samples=0
./poll_stat:write (32768 Bytes): samples=0
./poll_stat:read  (65536 Bytes): samples=0
./poll_stat:write (65536 Bytes): samples=0

> BTW, I have several questions on this report:
>
> - what is the kernel version in your test?

v5.0-rc1 + a few patches to enable UFS

> - can you reproduce this issue on other disk (not UFS)?

At the moment, UFS is the only available storage option.
(USB is broken, SDHC requires some work.)

> - are there any tasks in 'D' state shown via 'ps -ax'? If yes, please
>   dump their stack trace.

(This rootfs provides busybox ps.)

# ps -o pid,user,stat,args
  PID USER     STAT COMMAND
    1 root     S    init
    2 root     SW   [kthreadd]
    3 root     IW<  [rcu_gp]
    4 root     IW<  [rcu_par_gp]
    5 root     IW   [kworker/0:0-eve]
    6 root     IW<  [kworker/0:0H-kb]
    8 root     IW<  [mm_percpu_wq]
    9 root     SW   [ksoftirqd/0]
   10 root     IW   [rcu_preempt]
   11 root     SW   [migration/0]
   12 root     IW   [kworker/0:1-eve]
   13 root     SW   [cpuhp/0]
   14 root     SW   [cpuhp/1]
   15 root     SW   [migration/1]
   16 root     SW   [ksoftirqd/1]
   17 root     IW   [kworker/1:0-eve]
   18 root     IW<  [kworker/1:0H-kb]
   19 root     SW   [cpuhp/2]
   20 root     SW   [migration/2]
   21 root     SW   [ksoftirqd/2]
   22 root     IW   [kworker/2:0-eve]
   23 root     IW<  [kworker/2:0H-kb]
   24 root     SW   [cpuhp/3]
   25 root     SW   [migration/3]
   26 root     SW   [ksoftirqd/3]
   27 root     IW   [kworker/3:0-eve]
   28 root     IW<  [kworker/3:0H-kb]
   29 root     SW   [cpuhp/4]
   30 root     SW   [migration/4]
   31 root     SW   [ksoftirqd/4]
   32 root     IW   [kworker/4:0-eve]
   33 root     IW<  [kworker/4:0H]
   34 root     SW   [cpuhp/5]
   35 root     SW   [migration/5]
   36 root     SW   [ksoftirqd/5]
   37 root     IW   [kworker/5:0-eve]
   38 root     IW<  [kworker/5:0H]
   39 root     SW   [cpuhp/6]
   40 root     SW   [migration/6]
   41 root     SW   [ksoftirqd/6]
   42 root     IW   [kworker/6:0-eve]
   43 root     IW<  [kworker/6:0H-kb]
   44 root     SW   [cpuhp/7]
   45 root     SW   [migration/7]
   46 root     SW   [ksoftirqd/7]
   47 root     IW   [kworker/7:0-eve]
   48 root     IW<  [kworker/7:0H]
   49 root     SW   [kdevtmpfs]
   52 root     SW   [rcu_tasks_kthre]
   53 root     IW   [kworker/3:1-mm_]
   64 root     IW   [kworker/4:1-mm_]
   65 root     IW   [kworker/5:1-eve]
   76 root     IW   [kworker/6:1-mm_]
   78 root     IW   [kworker/1:1-mm_]
   88 root     IW   [kworker/7:1-eve]
  101 root     IW   [kworker/2:1-mm_]
  198 root     SW   [khungtaskd]
  199 root     SW   [oom_reaper]
  200 root     IW<  [writeback]
  202 root     SW   [kcompactd0]
  204 root     IW<  [kblockd]
  223 root     IW<  [devfreq_wq]
  372 root     SW   [kswapd0]
  505 root     SW   [scsi_eh_0]
  506 root     IW<  [scsi_tmf_0]
  507 root     IW<  [ufs_clk_gating_]
  508 root     IW<  [ufs_clkscaling_]
  546 root     IW   [kworker/u16:5-e]
  555 root     IW   [kworker/u16:7-e]
  657 root     S    /sbin/klogd -n
  675 root     S    -sh
  677 root     IW<  [kworker/6:1H-kb]
  678 root     IW<  [kworker/3:1H]
  679 root     IW<  [kworker/1:1H]
  680 root     IW<  [kworker/2:1H]
  681 root     IW<  [kworker/0:1H]
  859 root     R    ps -o pid,user,stat,args

Do you see anything relevant?

Regards.
* Re: dd hangs when reading large partitions
  2019-01-18 12:10 dd hangs when reading large partitions Marc Gonzalez
  2019-01-18 13:39 ` Ming Lei
@ 2019-01-18 15:18 ` jianchao.wang
  2019-01-18 17:38   ` Marc Gonzalez
  2019-01-18 17:48   ` Jens Axboe
  2019-01-18 19:27 ` Douglas Gilbert
  2 siblings, 2 replies; 34+ messages in thread

From: jianchao.wang @ 2019-01-18 15:18 UTC (permalink / raw)
To: Marc Gonzalez, fsdevel, linux-block
Cc: SCSI, Alexander Viro, Jan Kara, Christoph Hellwig, Joao Pinto,
    Jens Axboe, Fujita Tomonori, Paolo Valente

Hello

On 1/18/19 8:10 PM, Marc Gonzalez wrote:
> Hello,
>
> I'm running into an issue which I don't know how to debug.
> So I'm open to ideas and suggestions :-)
>
> On my arm64 board, I have enabled Universal Flash Storage support.
>
> I wanted to benchmark read performance, and noticed that the system
> locks up when I read partitions larger than 3.5 GB, unless I tell
> dd to use direct IO:
>
> *** WITH O_DIRECT ***
> # dd if=/dev/sda of=/dev/null bs=1M iflag=direct status=progress
> 57892929536 bytes (58 GB, 54 GiB) copied, 697.006 s, 83.1 MB/s
> 55256+0 records in
> 55256+0 records out
> 57940115456 bytes (58 GB, 54 GiB) copied, 697.575 s, 83.1 MB/s
>
> *** WITHOUT O_DIRECT ***
> # dd if=/dev/sda of=/dev/null bs=1M status=progress
> 3853516800 bytes (3.9 GB, 3.6 GiB) copied, 49.0002 s, 78.6 MB/s
>
>
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu:    1-...0: (8242 ticks this GP) idle=106/1/0x4000000000000000 softirq=168/171 fqs=2626
> rcu:    6-...0: (99 GPs behind) idle=ec2/1/0x4000000000000000 softirq=71/71 fqs=2626
> rcu:    (detected by 7, t=5254 jiffies, g=-275, q=2)
> Task dump for CPU 1:
> kworker/1:1H    R  running task        0   675      2 0x0000002a
> Workqueue: kblockd blk_mq_run_work_fn
> Call trace:
>  __switch_to+0x168/0x1d0

It looks like the blk_mq_run_work_fn went to sleep with rcu lock (preempt), isn't it?
Can you share the symbol of the following address?

>  0xffffffc0f6efbbc8
>  blk_mq_run_work_fn+0x28/0x40
>  process_one_work+0x208/0x470
>  worker_thread+0x48/0x460
>  kthread+0x128/0x130
>  ret_from_fork+0x10/0x1c
> Task dump for CPU 6:
> kthreadd        R  running task        0     2      0 0x0000002a
> Call trace:
>  __switch_to+0x168/0x1d0
>  0x5b36396f4e7d4000
>
>
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu:    1-...0: (8242 ticks this GP) idle=106/1/0x4000000000000000 softirq=168/171 fqs=10500
> rcu:    6-...0: (99 GPs behind) idle=ec2/1/0x4000000000000000 softirq=71/71 fqs=10500
> rcu:    (detected by 7, t=21009 jiffies, g=-275, q=2)
> Task dump for CPU 1:
> kworker/1:1H    R  running task        0   675      2 0x0000002a
> Workqueue: kblockd blk_mq_run_work_fn
> Call trace:
>  __switch_to+0x168/0x1d0
>  0xffffffc0f6efbbc8
>  blk_mq_run_work_fn+0x28/0x40
>  process_one_work+0x208/0x470
>  worker_thread+0x48/0x460
>  kthread+0x128/0x130
>  ret_from_fork+0x10/0x1c
> Task dump for CPU 6:
> kthreadd        R  running task        0     2      0 0x0000002a
> Call trace:
>  __switch_to+0x168/0x1d0
>  0x5b36396f4e7d4000
>
>
> The system always hangs around the 3.6 GiB mark, wherever I start from.
> How can I debug this issue?
>
> Regards.

Thanks
Jianchao
* Re: dd hangs when reading large partitions
  2019-01-18 15:18 ` jianchao.wang
@ 2019-01-18 17:38   ` Marc Gonzalez
  2019-01-18 17:48   ` Jens Axboe
  1 sibling, 0 replies; 34+ messages in thread

From: Marc Gonzalez @ 2019-01-18 17:38 UTC (permalink / raw)
To: Jianchao Wang, fsdevel, linux-block
Cc: SCSI, Alexander Viro, Jan Kara, Christoph Hellwig, Joao Pinto,
    Jens Axboe, Fujita Tomonori, Paolo Valente

On 18/01/2019 16:18, jianchao.wang wrote:

> It looks like the blk_mq_run_work_fn went to sleep with rcu lock (preempt), isn't it?
> Can you share the symbol of the following address?

Note: if I boot with 'nosmp', then the system reboots without logging
anything. (Perhaps some kind of HW watchdog, or secure OS)

I no longer get the 'rcu_preempt detected stalls' message...

Regards.
* Re: dd hangs when reading large partitions
  2019-01-18 15:18 ` jianchao.wang
  2019-01-18 17:38   ` Marc Gonzalez
@ 2019-01-18 17:48   ` Jens Axboe
  2019-01-18 17:51     ` Bart Van Assche
  2019-01-19  9:56     ` Christoph Hellwig
  1 sibling, 2 replies; 34+ messages in thread

From: Jens Axboe @ 2019-01-18 17:48 UTC (permalink / raw)
To: jianchao.wang, Marc Gonzalez, fsdevel, linux-block
Cc: SCSI, Alexander Viro, Jan Kara, Christoph Hellwig, Joao Pinto,
    Fujita Tomonori, Paolo Valente

On 1/18/19 8:18 AM, jianchao.wang wrote:
> Hello
>
> On 1/18/19 8:10 PM, Marc Gonzalez wrote:
>> Hello,
>>
>> I'm running into an issue which I don't know how to debug.
>> So I'm open to ideas and suggestions :-)
>>
>> On my arm64 board, I have enabled Universal Flash Storage support.
>>
>> I wanted to benchmark read performance, and noticed that the system
>> locks up when I read partitions larger than 3.5 GB, unless I tell
>> dd to use direct IO:
>>
>> *** WITH O_DIRECT ***
>> # dd if=/dev/sda of=/dev/null bs=1M iflag=direct status=progress
>> 57892929536 bytes (58 GB, 54 GiB) copied, 697.006 s, 83.1 MB/s
>> 55256+0 records in
>> 55256+0 records out
>> 57940115456 bytes (58 GB, 54 GiB) copied, 697.575 s, 83.1 MB/s
>>
>> *** WITHOUT O_DIRECT ***
>> # dd if=/dev/sda of=/dev/null bs=1M status=progress
>> 3853516800 bytes (3.9 GB, 3.6 GiB) copied, 49.0002 s, 78.6 MB/s
>>
>>
>> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>> rcu:    1-...0: (8242 ticks this GP) idle=106/1/0x4000000000000000 softirq=168/171 fqs=2626
>> rcu:    6-...0: (99 GPs behind) idle=ec2/1/0x4000000000000000 softirq=71/71 fqs=2626
>> rcu:    (detected by 7, t=5254 jiffies, g=-275, q=2)
>> Task dump for CPU 1:
>> kworker/1:1H    R  running task        0   675      2 0x0000002a
>> Workqueue: kblockd blk_mq_run_work_fn
>> Call trace:
>>  __switch_to+0x168/0x1d0
>
> It looks like the blk_mq_run_work_fn went to sleep with rcu lock (preempt), isn't it?
> Can you share the symbol of the following address?

It's UFS that's totally buggy; if you look at its queuecommand, it does:

	if (!down_read_trylock(&hba->clk_scaling_lock))
		return SCSI_MLQUEUE_HOST_BUSY;

UFS either needs to get fixed up, or we'll want a way to do something like
the below.

Marc, can you test this?


diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index eaf329db3973..e28c3420a9d9 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -412,6 +412,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
 	shost->hostt = sht;
 	shost->this_id = sht->this_id;
 	shost->can_queue = sht->can_queue;
+	shost->queue_may_block = sht->queue_may_block;
 	shost->sg_tablesize = sht->sg_tablesize;
 	shost->sg_prot_tablesize = sht->sg_prot_tablesize;
 	shost->cmd_per_lun = sht->cmd_per_lun;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b13cc9288ba0..4e266af2871f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1902,6 +1902,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
 	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
 	shost->tag_set.flags |=
 		BLK_ALLOC_POLICY_TO_MQ_FLAG(shost->hostt->tag_alloc_policy);
+	if (shost->queue_may_block)
+		shost->tag_set.flags |= BLK_MQ_F_BLOCKING;
 	shost->tag_set.driver_data = shost;
 
 	return blk_mq_alloc_tag_set(&shost->tag_set);
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 9ba7671b84f8..9ab354e43630 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -6981,6 +6981,7 @@ static struct scsi_host_template ufshcd_driver_template = {
 	.sg_tablesize		= SG_ALL,
 	.cmd_per_lun		= UFSHCD_CMD_PER_LUN,
 	.can_queue		= UFSHCD_CAN_QUEUE,
+	.queue_may_block	= 1,
 	.max_host_blocked	= 1,
 	.track_queue_depth	= 1,
 	.sdev_groups		= ufshcd_driver_groups,
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 6ca954e9f752..30aa7b6c4342 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -339,6 +339,11 @@ struct scsi_host_template {
 	 */
 	int can_queue;
 
+	/*
+	 * If the ->queuecommand() ever blocks, this should be set
+	 */
+	int queue_may_block;
+
 	/*
 	 * In many instances, especially where disconnect / reconnect are
 	 * supported, our host also has an ID on the SCSI bus. If this is
@@ -584,6 +589,7 @@ struct Scsi_Host {
 
 	int this_id;
 	int can_queue;
+	int queue_may_block;
 	short cmd_per_lun;
 	short unsigned int sg_tablesize;
 	short unsigned int sg_prot_tablesize;

-- 
Jens Axboe
* Re: dd hangs when reading large partitions
  2019-01-18 17:48   ` Jens Axboe
@ 2019-01-18 17:51     ` Bart Van Assche
  2019-01-18 19:00       ` Jens Axboe
  1 sibling, 1 reply; 34+ messages in thread

From: Bart Van Assche @ 2019-01-18 17:51 UTC (permalink / raw)
To: Jens Axboe, jianchao.wang, Marc Gonzalez, fsdevel, linux-block
Cc: SCSI, Alexander Viro, Jan Kara, Christoph Hellwig, Joao Pinto,
    Fujita Tomonori, Paolo Valente

On Fri, 2019-01-18 at 10:48 -0700, Jens Axboe wrote:
> It's UFS that's totally buggy; if you look at its queuecommand, it does:
>
> 	if (!down_read_trylock(&hba->clk_scaling_lock))
> 		return SCSI_MLQUEUE_HOST_BUSY;
>
> UFS either needs to get fixed up, or we'll want a way to do something like
> the below.
>
> Marc, can you test this?
>
>
> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> index eaf329db3973..e28c3420a9d9 100644
> --- a/drivers/scsi/hosts.c
> +++ b/drivers/scsi/hosts.c
> @@ -412,6 +412,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
>  	shost->hostt = sht;
>  	shost->this_id = sht->this_id;
>  	shost->can_queue = sht->can_queue;
> +	shost->queue_may_block = sht->queue_may_block;
>  	shost->sg_tablesize = sht->sg_tablesize;
>  	shost->sg_prot_tablesize = sht->sg_prot_tablesize;
>  	shost->cmd_per_lun = sht->cmd_per_lun;
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index b13cc9288ba0..4e266af2871f 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1902,6 +1902,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
>  	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
>  	shost->tag_set.flags |=
>  		BLK_ALLOC_POLICY_TO_MQ_FLAG(shost->hostt->tag_alloc_policy);
> +	if (shost->queue_may_block)
> +		shost->tag_set.flags |= BLK_MQ_F_BLOCKING;
>  	shost->tag_set.driver_data = shost;
>
>  	return blk_mq_alloc_tag_set(&shost->tag_set);
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index 9ba7671b84f8..9ab354e43630 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -6981,6 +6981,7 @@ static struct scsi_host_template ufshcd_driver_template = {
>  	.sg_tablesize		= SG_ALL,
>  	.cmd_per_lun		= UFSHCD_CMD_PER_LUN,
>  	.can_queue		= UFSHCD_CAN_QUEUE,
> +	.queue_may_block	= 1,
>  	.max_host_blocked	= 1,
>  	.track_queue_depth	= 1,
>  	.sdev_groups		= ufshcd_driver_groups,
> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> index 6ca954e9f752..30aa7b6c4342 100644
> --- a/include/scsi/scsi_host.h
> +++ b/include/scsi/scsi_host.h
> @@ -339,6 +339,11 @@ struct scsi_host_template {
>  	 */
>  	int can_queue;
>
> +	/*
> +	 * If the ->queuecommand() ever blocks, this should be set
> +	 */
> +	int queue_may_block;
> +
>  	/*
>  	 * In many instances, especially where disconnect / reconnect are
>  	 * supported, our host also has an ID on the SCSI bus. If this is
> @@ -584,6 +589,7 @@ struct Scsi_Host {
>
>  	int this_id;
>  	int can_queue;
> +	int queue_may_block;
>  	short cmd_per_lun;
>  	short unsigned int sg_tablesize;
>  	short unsigned int sg_prot_tablesize;

Hi Jens,

Did you perhaps want to include a change from down_read_trylock() into
down_read() in the UFS queuecommand function in this patch?

Thanks,

Bart.
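For readers following along: the change Bart alludes to would look roughly like the hunk below — an untested editorial sketch against the v5.0-rc1 ufshcd_queuecommand(), not a patch posted in this thread. It is only safe in combination with the BLK_MQ_F_BLOCKING change above, since down_read() may sleep and ->queuecommand() is otherwise called from a context that must not block:

	--- a/drivers/scsi/ufs/ufshcd.c
	+++ b/drivers/scsi/ufs/ufshcd.c
	@@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
	-	if (!down_read_trylock(&hba->clk_scaling_lock))
	-		return SCSI_MLQUEUE_HOST_BUSY;
	+	down_read(&hba->clk_scaling_lock);

With the trylock variant, a dispatch that loses the race returns SCSI_MLQUEUE_HOST_BUSY and relies on a later rerun of the queue; with BLK_MQ_F_BLOCKING plus a plain down_read(), the dispatch simply waits for clock scaling to finish.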
* Re: dd hangs when reading large partitions
  2019-01-18 17:51     ` Bart Van Assche
@ 2019-01-18 19:00       ` Jens Axboe
  0 siblings, 0 replies; 34+ messages in thread

From: Jens Axboe @ 2019-01-18 19:00 UTC (permalink / raw)
To: Bart Van Assche, jianchao.wang, Marc Gonzalez, fsdevel, linux-block
Cc: SCSI, Alexander Viro, Jan Kara, Christoph Hellwig, Joao Pinto,
    Fujita Tomonori, Paolo Valente

On 1/18/19 10:51 AM, Bart Van Assche wrote:
> On Fri, 2019-01-18 at 10:48 -0700, Jens Axboe wrote:
>> It's UFS that's totally buggy; if you look at its queuecommand, it does:
>>
>> 	if (!down_read_trylock(&hba->clk_scaling_lock))
>> 		return SCSI_MLQUEUE_HOST_BUSY;
>>
>> UFS either needs to get fixed up, or we'll want a way to do something like
>> the below.
>>
>> Marc, can you test this?
>>
>>
>> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
>> index eaf329db3973..e28c3420a9d9 100644
>> --- a/drivers/scsi/hosts.c
>> +++ b/drivers/scsi/hosts.c
>> @@ -412,6 +412,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
>>  	shost->hostt = sht;
>>  	shost->this_id = sht->this_id;
>>  	shost->can_queue = sht->can_queue;
>> +	shost->queue_may_block = sht->queue_may_block;
>>  	shost->sg_tablesize = sht->sg_tablesize;
>>  	shost->sg_prot_tablesize = sht->sg_prot_tablesize;
>>  	shost->cmd_per_lun = sht->cmd_per_lun;
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index b13cc9288ba0..4e266af2871f 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -1902,6 +1902,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
>>  	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
>>  	shost->tag_set.flags |=
>>  		BLK_ALLOC_POLICY_TO_MQ_FLAG(shost->hostt->tag_alloc_policy);
>> +	if (shost->queue_may_block)
>> +		shost->tag_set.flags |= BLK_MQ_F_BLOCKING;
>>  	shost->tag_set.driver_data = shost;
>>
>>  	return blk_mq_alloc_tag_set(&shost->tag_set);
>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>> index 9ba7671b84f8..9ab354e43630 100644
>> --- a/drivers/scsi/ufs/ufshcd.c
>> +++ b/drivers/scsi/ufs/ufshcd.c
>> @@ -6981,6 +6981,7 @@ static struct scsi_host_template ufshcd_driver_template = {
>>  	.sg_tablesize		= SG_ALL,
>>  	.cmd_per_lun		= UFSHCD_CMD_PER_LUN,
>>  	.can_queue		= UFSHCD_CAN_QUEUE,
>> +	.queue_may_block	= 1,
>>  	.max_host_blocked	= 1,
>>  	.track_queue_depth	= 1,
>>  	.sdev_groups		= ufshcd_driver_groups,
>> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
>> index 6ca954e9f752..30aa7b6c4342 100644
>> --- a/include/scsi/scsi_host.h
>> +++ b/include/scsi/scsi_host.h
>> @@ -339,6 +339,11 @@ struct scsi_host_template {
>>  	 */
>>  	int can_queue;
>>
>> +	/*
>> +	 * If the ->queuecommand() ever blocks, this should be set
>> +	 */
>> +	int queue_may_block;
>> +
>>  	/*
>>  	 * In many instances, especially where disconnect / reconnect are
>>  	 * supported, our host also has an ID on the SCSI bus. If this is
>> @@ -584,6 +589,7 @@ struct Scsi_Host {
>>
>>  	int this_id;
>>  	int can_queue;
>> +	int queue_may_block;
>>  	short cmd_per_lun;
>>  	short unsigned int sg_tablesize;
>>  	short unsigned int sg_prot_tablesize;
>
> Hi Jens,
>
> Did you perhaps want to include a change from down_read_trylock() into
> down_read() in the UFS queuecommand function in this patch?

Yeah, that should be done as well. But honestly, looking at UFS, it's
in dire need of love from someone that has some experience in writing
a driver. The locking is absolutely miserable.

For one, the clk_scaling_lock needs to die. The clock scaling change
should coordinate with the requests, not try to lock out request
handling. It's totally an upside down approach.

-- 
Jens Axboe
* Re: dd hangs when reading large partitions
  2019-01-18 17:48   ` Jens Axboe
  2019-01-18 17:51     ` Bart Van Assche
@ 2019-01-19  9:56     ` Christoph Hellwig
  2019-01-19 14:37       ` Jens Axboe
  ` (2 more replies)
  1 sibling, 3 replies; 34+ messages in thread

From: Christoph Hellwig @ 2019-01-19  9:56 UTC (permalink / raw)
To: Jens Axboe
Cc: jianchao.wang, Marc Gonzalez, fsdevel, linux-block, SCSI,
    Alexander Viro, Jan Kara, Christoph Hellwig, Joao Pinto,
    Fujita Tomonori, Paolo Valente

On Fri, Jan 18, 2019 at 10:48:15AM -0700, Jens Axboe wrote:
> It's UFS that's totally buggy; if you look at its queuecommand, it does:
>
> 	if (!down_read_trylock(&hba->clk_scaling_lock))
> 		return SCSI_MLQUEUE_HOST_BUSY;
>
> UFS either needs to get fixed up, or we'll want a way to do something like
> the below.

I think the right answer is to just revert the offending patch instead
of papering over it in the SCSI code.
* Re: dd hangs when reading large partitions
  2019-01-19  9:56     ` Christoph Hellwig
@ 2019-01-19 14:37       ` Jens Axboe
  0 siblings, 0 replies; 34+ messages in thread

From: Jens Axboe @ 2019-01-19 14:37 UTC (permalink / raw)
To: Christoph Hellwig
Cc: jianchao.wang, Marc Gonzalez, fsdevel, linux-block, SCSI,
    Alexander Viro, Jan Kara, Joao Pinto, Fujita Tomonori,
    Paolo Valente

On 1/19/19 2:56 AM, Christoph Hellwig wrote:
> On Fri, Jan 18, 2019 at 10:48:15AM -0700, Jens Axboe wrote:
>> It's UFS that's totally buggy; if you look at its queuecommand, it does:
>>
>> 	if (!down_read_trylock(&hba->clk_scaling_lock))
>> 		return SCSI_MLQUEUE_HOST_BUSY;
>>
>> UFS either needs to get fixed up, or we'll want a way to do something like
>> the below.
>
> I think the right answer is to just revert the offending patch instead
> of papering over it in the SCSI code.

I fully agree.

-- 
Jens Axboe
* Re: dd hangs when reading large partitions
  2019-01-19  9:56     ` Christoph Hellwig
  2019-01-19 14:37       ` Jens Axboe
@ 2019-01-19 16:09       ` Bart Van Assche
  2019-01-21  8:33         ` Christoph Hellwig
  1 sibling, 1 reply; 34+ messages in thread

From: Bart Van Assche @ 2019-01-19 16:09 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: jianchao.wang, Marc Gonzalez, fsdevel, linux-block, SCSI,
    Alexander Viro, Jan Kara, Joao Pinto, Fujita Tomonori,
    Paolo Valente

On 1/19/19 1:56 AM, Christoph Hellwig wrote:
> On Fri, Jan 18, 2019 at 10:48:15AM -0700, Jens Axboe wrote:
>> It's UFS that's totally buggy; if you look at its queuecommand, it does:
>>
>> 	if (!down_read_trylock(&hba->clk_scaling_lock))
>> 		return SCSI_MLQUEUE_HOST_BUSY;
>>
>> UFS either needs to get fixed up, or we'll want a way to do something like
>> the below.
>
> I think the right answer is to just revert the offending patch instead
> of papering over it in the SCSI code.

Hi Christoph,

Which patch are you referring to?

Thanks,

Bart.
* Re: dd hangs when reading large partitions
  2019-01-19 16:09 ` Bart Van Assche
@ 2019-01-21  8:33   ` Christoph Hellwig
  0 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2019-01-21 8:33 UTC (permalink / raw)
To: Bart Van Assche
Cc: Christoph Hellwig, Jens Axboe, jianchao.wang, Marc Gonzalez,
    fsdevel, linux-block, SCSI, Joao Pinto, Fujita Tomonori, Paolo Valente

On Sat, Jan 19, 2019 at 08:09:18AM -0800, Bart Van Assche wrote:
> Which patch are you referring to?

a3cd5ec55f6 ("scsi: ufs: add load based scaling of UFS gear")

That being said it doesn't seem entirely trivial to revert due to later
additions, so this might require a little more work.

^ permalink raw reply	[flat|nested] 34+ messages in thread
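[Editor's note: Christoph's remark that the commit "doesn't seem entirely trivial to revert due to later additions" is the standard revert-conflict situation: later commits touched the same lines, so `git revert` cannot apply the reverse diff cleanly. The toy repository below reproduces that situation in miniature; every name in it is made up and nothing refers to the real UFS tree.]

```shell
# Toy repo: a commit is reverted after later work has modified the same
# lines, so the revert conflicts and needs manual resolution -- the same
# shape of problem as reverting a3cd5ec55f6 with ufshcd changes on top.
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email editor@example.com
git config user.name editor

echo base > f;              git add f; git commit -qm base
echo feature > f;           git commit -qam feature   # the commit to revert
echo feature-extended > f;  git commit -qam later     # later work on top

# Reverting "feature" must now merge against "later", so it conflicts:
git revert --no-edit HEAD~1 || echo "revert conflicts; resolve f by hand"
```

At this point `git status` reports an in-progress revert with `f` unmerged, and the resolution (deciding what "base minus feature" means in the presence of "later") is the part Christoph calls "a little more work".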
* Re: dd hangs when reading large partitions
  2019-01-19  9:56 ` Christoph Hellwig
  2019-01-19 14:37 ` Jens Axboe
  2019-01-19 16:09 ` Bart Van Assche
@ 2019-01-19 19:47 ` Marc Gonzalez
  2019-01-19 20:45   ` Marc Gonzalez
  ` (2 more replies)
  2 siblings, 3 replies; 34+ messages in thread
From: Marc Gonzalez @ 2019-01-19 19:47 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe, Jianchao Wang
Cc: fsdevel, linux-block, SCSI, Joao Pinto, Subhash Jadavani,
    Sayali Lokhande, Can Guo, Asutosh Das, Vijay Viswanath,
    Venkat Gopalakrishnan, Ritesh Harjani, Vivek Gautam, Vijay Viswanath,
    Sayali Lokhande, Jeffrey Hugo, Maya Erez, Evan Green,
    Matthias Kaehlcke, Douglas Anderson, Stephen Boyd, Tomas Winkler,
    Adrian Hunter, Alim Akhtar, Avri Altman, Bart Van Assche,
    Martin Petersen, Bjorn Andersson

On 19/01/2019 10:56, Christoph Hellwig wrote:
> On Jan 18, 2019 at 10:48:15AM -0700, Jens Axboe wrote:
>
>> It's UFS that totally buggy, if you look at its queuecommand, it does:
>>
>>	if (!down_read_trylock(&hba->clk_scaling_lock))
>>		return SCSI_MLQUEUE_HOST_BUSY;
>>
>> UFS either needs to get fixed up, or we'll want a way to do something like
>> the below.
>
> I think the right answer is to just revert the offending patch instead
> of papering over it in the SCSI code.

[ Adjusting recipients list ]

Full thread, for new recipients:
https://www.spinics.net/lists/linux-fsdevel/msg138601.html

Christoph, do you mean a3cd5ec55f6c7 ?

Regards.

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: dd hangs when reading large partitions
  2019-01-19 19:47 ` Marc Gonzalez
@ 2019-01-19 20:45   ` Marc Gonzalez
  2019-01-21  8:33   ` Christoph Hellwig
  2019-01-21 15:22   ` Marc Gonzalez
  2 siblings, 0 replies; 34+ messages in thread
From: Marc Gonzalez @ 2019-01-19 20:45 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe, Jianchao Wang
Cc: fsdevel, linux-block, SCSI, Joao Pinto, Subhash Jadavani,
    Sayali Lokhande, Can Guo, Asutosh Das, Vijay Viswanath,
    Venkat Gopalakrishnan, Ritesh Harjani, Vivek Gautam, Jeffrey Hugo,
    Maya Erez, Evan Green, Matthias Kaehlcke, Douglas Anderson,
    Stephen Boyd, Tomas Winkler, Adrian Hunter, Alim Akhtar,
    Avri Altman, Bart Van Assche, Martin Petersen, Bjorn Andersson

On 19/01/2019 20:47, Marc Gonzalez wrote:
> On 19/01/2019 10:56, Christoph Hellwig wrote:
>
>> On Jan 18, 2019 at 10:48:15AM -0700, Jens Axboe wrote:
>>
>>> It's UFS that totally buggy, if you look at its queuecommand, it does:
>>>
>>>	if (!down_read_trylock(&hba->clk_scaling_lock))
>>>		return SCSI_MLQUEUE_HOST_BUSY;
>>>
>>> UFS either needs to get fixed up, or we'll want a way to do something like
>>> the below.
>>
>> I think the right answer is to just revert the offending patch instead
>> of papering over it in the SCSI code.
>
> [ Adjusting recipients list ]
>
> Full thread, for new recipients:
> https://www.spinics.net/lists/linux-fsdevel/msg138601.html
>
> Christoph, do you mean a3cd5ec55f6c7 ?

And by the way, can someone explain these two oddities:

1) the system only hangs if we try to read more than 3.8 GB
   (starting point is irrelevant)

2) the system does not hang if we use iflag=direct

Regards.

^ permalink raw reply	[flat|nested] 34+ messages in thread
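[Editor's note: Marc's two oddities — buffered reads hang past a fixed size, O_DIRECT reads never hang — can be bracketed with a small comparison script. The sketch below defaults to a scratch file so it can run anywhere; on the board one would set `DEV=/dev/sda` and raise `SIZE_MB` above and below the ~3.8 GB threshold. The paths and sizes here are assumptions, not values from the thread.]

```shell
# Compare a buffered read against an O_DIRECT read of the same range.
# O_DIRECT bypasses the page cache, which is consistent with Marc's
# observation that only the cached path wedges once enough data is read.
DEV=${DEV:-/tmp/ufs-scratch.img}
SIZE_MB=${SIZE_MB:-64}

# Create a scratch image when no real device was given, so the script
# is self-contained.
[ -e "$DEV" ] || dd if=/dev/zero of="$DEV" bs=1M count="$SIZE_MB" 2>/dev/null

for flags in "" "iflag=direct"; do
    echo "--- dd bs=1M $flags ---"
    # Print only dd's final summary line for each variant.
    dd if="$DEV" of=/dev/null bs=1M count="$SIZE_MB" $flags 2>&1 | tail -n 1
done
```

Running this with increasing `SIZE_MB` values on the affected board would pin down whether the hang really tracks total bytes read (page-cache pressure) rather than any particular on-disk offset.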
* Re: dd hangs when reading large partitions
  2019-01-19 19:47 ` Marc Gonzalez
  2019-01-19 20:45   ` Marc Gonzalez
@ 2019-01-21  8:33   ` Christoph Hellwig
  2019-01-21 15:22   ` Marc Gonzalez
  2 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2019-01-21 8:33 UTC (permalink / raw)
To: Marc Gonzalez
Cc: Christoph Hellwig, Jens Axboe, Jianchao Wang, fsdevel, linux-block,
    SCSI, Joao Pinto, Subhash Jadavani, Sayali Lokhande, Can Guo,
    Asutosh Das, Vijay Viswanath, Venkat Gopalakrishnan, Ritesh Harjani,
    Vivek Gautam, Jeffrey Hugo, Maya Erez, Evan Green,
    Matthias Kaehlcke, Douglas Anderson, Stephen Boyd, Tomas Winkler,
    Adrian Hunter, Alim Akhtar, Avri Altman, Bart Van Assche,
    Martin Petersen, Bjorn Andersson

On Sat, Jan 19, 2019 at 08:47:13PM +0100, Marc Gonzalez wrote:
> On 19/01/2019 10:56, Christoph Hellwig wrote:
>
> > On Jan 18, 2019 at 10:48:15AM -0700, Jens Axboe wrote:
> >
> >> It's UFS that totally buggy, if you look at its queuecommand, it does:
> >>
> >>	if (!down_read_trylock(&hba->clk_scaling_lock))
> >>		return SCSI_MLQUEUE_HOST_BUSY;
> >>
> >> UFS either needs to get fixed up, or we'll want a way to do something like
> >> the below.
> >
> > I think the right answer is to just revert the offending patch instead
> > of papering over it in the SCSI code.
>
> [ Adjusting recipients list ]
>
> Full thread, for new recipients:
> https://www.spinics.net/lists/linux-fsdevel/msg138601.html
>
> Christoph, do you mean a3cd5ec55f6c7 ?

Yes.

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: dd hangs when reading large partitions
  2019-01-19 19:47 ` Marc Gonzalez
  2019-01-19 20:45   ` Marc Gonzalez
  2019-01-21  8:33   ` Christoph Hellwig
@ 2019-01-21 15:22   ` Marc Gonzalez
  2019-01-22  3:12     ` jianchao.wang
  2 siblings, 1 reply; 34+ messages in thread
From: Marc Gonzalez @ 2019-01-21 15:22 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe, Jianchao Wang
Cc: fsdevel, linux-block, SCSI, Joao Pinto, Subhash Jadavani,
    Sayali Lokhande, Can Guo, Asutosh Das, Vijay Viswanath,
    Venkat Gopalakrishnan, Ritesh Harjani, Vivek Gautam, Jeffrey Hugo,
    Maya Erez, Evan Green, Matthias Kaehlcke, Douglas Anderson,
    Stephen Boyd, Tomas Winkler, Adrian Hunter, Alim Akhtar,
    Avri Altman, Bart Van Assche, Martin Petersen, Bjorn Andersson

On 19/01/2019 20:47, Marc Gonzalez wrote:
> On 19/01/2019 10:56, Christoph Hellwig wrote:
>
>> On Jan 18, 2019 at 10:48:15AM -0700, Jens Axboe wrote:
>>
>>> It's UFS that totally buggy, if you look at its queuecommand, it does:
>>>
>>>	if (!down_read_trylock(&hba->clk_scaling_lock))
>>>		return SCSI_MLQUEUE_HOST_BUSY;
>>>
>>> UFS either needs to get fixed up, or we'll want a way to do something like
>>> the below.
>>
>> I think the right answer is to just revert the offending patch instead
>> of papering over it in the SCSI code.
>
> [ Adjusting recipients list ]
>
> Full thread, for new recipients:
> https://www.spinics.net/lists/linux-fsdevel/msg138601.html
>
> Christoph, do you mean a3cd5ec55f6c7 ?

Well, now we know for sure that the clk_scaling_lock is a red herring.
I applied the patch below, and still the system locked up:

# dd if=/dev/sde of=/dev/null bs=1M status=progress
3892314112 bytes (3.9 GB, 3.6 GiB) copied, 50.0042 s, 77.8 MB/s

I can't seem to get the RCU stall warning anymore. How can I get
to the bottom of this issue?

Regards.
diff --git a/drivers/scsi/ufs/ufs-qcom.c b/drivers/scsi/ufs/ufs-qcom.c index 3aeadb14aae1..5700662ff7ca 100644 --- a/drivers/scsi/ufs/ufs-qcom.c +++ b/drivers/scsi/ufs/ufs-qcom.c @@ -1085,7 +1085,6 @@ static void ufs_qcom_set_caps(struct ufs_hba *hba) struct ufs_qcom_host *host = ufshcd_get_variant(hba); hba->caps |= UFSHCD_CAP_CLK_GATING | UFSHCD_CAP_HIBERN8_WITH_CLK_GATING; - hba->caps |= UFSHCD_CAP_CLK_SCALING; hba->caps |= UFSHCD_CAP_AUTO_BKOPS_SUSPEND; if (host->hw_ver.major >= 0x2) { diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index d0d340210ccf..90be92766723 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -250,10 +250,6 @@ static int ufshcd_uic_hibern8_exit(struct ufs_hba *hba); static int ufshcd_uic_hibern8_enter(struct ufs_hba *hba); static inline void ufshcd_add_delay_before_dme_cmd(struct ufs_hba *hba); static int ufshcd_host_reset_and_restore(struct ufs_hba *hba); -static void ufshcd_resume_clkscaling(struct ufs_hba *hba); -static void ufshcd_suspend_clkscaling(struct ufs_hba *hba); -static void __ufshcd_suspend_clkscaling(struct ufs_hba *hba); -static int ufshcd_scale_clks(struct ufs_hba *hba, bool scale_up); static irqreturn_t ufshcd_intr(int irq, void *__hba); static int ufshcd_change_power_mode(struct ufs_hba *hba, struct ufs_pa_layer_attr *pwr_mode); @@ -892,575 +888,6 @@ static bool ufshcd_is_unipro_pa_params_tuning_req(struct ufs_hba *hba) return false; } -static int ufshcd_scale_clks(struct ufs_hba *hba, bool scale_up) -{ - int ret = 0; - struct ufs_clk_info *clki; - struct list_head *head = &hba->clk_list_head; - ktime_t start = ktime_get(); - bool clk_state_changed = false; - - if (list_empty(head)) - goto out; - - ret = ufshcd_vops_clk_scale_notify(hba, scale_up, PRE_CHANGE); - if (ret) - return ret; - - list_for_each_entry(clki, head, list) { - if (!IS_ERR_OR_NULL(clki->clk)) { - if (scale_up && clki->max_freq) { - if (clki->curr_freq == clki->max_freq) - continue; - - clk_state_changed = 
true; - ret = clk_set_rate(clki->clk, clki->max_freq); - if (ret) { - dev_err(hba->dev, "%s: %s clk set rate(%dHz) failed, %d\n", - __func__, clki->name, - clki->max_freq, ret); - break; - } - trace_ufshcd_clk_scaling(dev_name(hba->dev), - "scaled up", clki->name, - clki->curr_freq, - clki->max_freq); - - clki->curr_freq = clki->max_freq; - - } else if (!scale_up && clki->min_freq) { - if (clki->curr_freq == clki->min_freq) - continue; - - clk_state_changed = true; - ret = clk_set_rate(clki->clk, clki->min_freq); - if (ret) { - dev_err(hba->dev, "%s: %s clk set rate(%dHz) failed, %d\n", - __func__, clki->name, - clki->min_freq, ret); - break; - } - trace_ufshcd_clk_scaling(dev_name(hba->dev), - "scaled down", clki->name, - clki->curr_freq, - clki->min_freq); - clki->curr_freq = clki->min_freq; - } - } - dev_dbg(hba->dev, "%s: clk: %s, rate: %lu\n", __func__, - clki->name, clk_get_rate(clki->clk)); - } - - ret = ufshcd_vops_clk_scale_notify(hba, scale_up, POST_CHANGE); - -out: - if (clk_state_changed) - trace_ufshcd_profile_clk_scaling(dev_name(hba->dev), - (scale_up ? "up" : "down"), - ktime_to_us(ktime_sub(ktime_get(), start)), ret); - return ret; -} - -/** - * ufshcd_is_devfreq_scaling_required - check if scaling is required or not - * @hba: per adapter instance - * @scale_up: True if scaling up and false if scaling down - * - * Returns true if scaling is required, false otherwise. 
- */ -static bool ufshcd_is_devfreq_scaling_required(struct ufs_hba *hba, - bool scale_up) -{ - struct ufs_clk_info *clki; - struct list_head *head = &hba->clk_list_head; - - if (list_empty(head)) - return false; - - list_for_each_entry(clki, head, list) { - if (!IS_ERR_OR_NULL(clki->clk)) { - if (scale_up && clki->max_freq) { - if (clki->curr_freq == clki->max_freq) - continue; - return true; - } else if (!scale_up && clki->min_freq) { - if (clki->curr_freq == clki->min_freq) - continue; - return true; - } - } - } - - return false; -} - -static int ufshcd_wait_for_doorbell_clr(struct ufs_hba *hba, - u64 wait_timeout_us) -{ - unsigned long flags; - int ret = 0; - u32 tm_doorbell; - u32 tr_doorbell; - bool timeout = false, do_last_check = false; - ktime_t start; - - ufshcd_hold(hba, false); - spin_lock_irqsave(hba->host->host_lock, flags); - /* - * Wait for all the outstanding tasks/transfer requests. - * Verify by checking the doorbell registers are clear. - */ - start = ktime_get(); - do { - if (hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL) { - ret = -EBUSY; - goto out; - } - - tm_doorbell = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL); - tr_doorbell = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL); - if (!tm_doorbell && !tr_doorbell) { - timeout = false; - break; - } else if (do_last_check) { - break; - } - - spin_unlock_irqrestore(hba->host->host_lock, flags); - schedule(); - if (ktime_to_us(ktime_sub(ktime_get(), start)) > - wait_timeout_us) { - timeout = true; - /* - * We might have scheduled out for long time so make - * sure to check if doorbells are cleared by this time - * or not. 
- */ - do_last_check = true; - } - spin_lock_irqsave(hba->host->host_lock, flags); - } while (tm_doorbell || tr_doorbell); - - if (timeout) { - dev_err(hba->dev, - "%s: timedout waiting for doorbell to clear (tm=0x%x, tr=0x%x)\n", - __func__, tm_doorbell, tr_doorbell); - ret = -EBUSY; - } -out: - spin_unlock_irqrestore(hba->host->host_lock, flags); - ufshcd_release(hba); - return ret; -} - -/** - * ufshcd_scale_gear - scale up/down UFS gear - * @hba: per adapter instance - * @scale_up: True for scaling up gear and false for scaling down - * - * Returns 0 for success, - * Returns -EBUSY if scaling can't happen at this time - * Returns non-zero for any other errors - */ -static int ufshcd_scale_gear(struct ufs_hba *hba, bool scale_up) -{ - #define UFS_MIN_GEAR_TO_SCALE_DOWN UFS_HS_G1 - int ret = 0; - struct ufs_pa_layer_attr new_pwr_info; - - if (scale_up) { - memcpy(&new_pwr_info, &hba->clk_scaling.saved_pwr_info.info, - sizeof(struct ufs_pa_layer_attr)); - } else { - memcpy(&new_pwr_info, &hba->pwr_info, - sizeof(struct ufs_pa_layer_attr)); - - if (hba->pwr_info.gear_tx > UFS_MIN_GEAR_TO_SCALE_DOWN - || hba->pwr_info.gear_rx > UFS_MIN_GEAR_TO_SCALE_DOWN) { - /* save the current power mode */ - memcpy(&hba->clk_scaling.saved_pwr_info.info, - &hba->pwr_info, - sizeof(struct ufs_pa_layer_attr)); - - /* scale down gear */ - new_pwr_info.gear_tx = UFS_MIN_GEAR_TO_SCALE_DOWN; - new_pwr_info.gear_rx = UFS_MIN_GEAR_TO_SCALE_DOWN; - } - } - - /* check if the power mode needs to be changed or not? 
*/ - ret = ufshcd_change_power_mode(hba, &new_pwr_info); - - if (ret) - dev_err(hba->dev, "%s: failed err %d, old gear: (tx %d rx %d), new gear: (tx %d rx %d)", - __func__, ret, - hba->pwr_info.gear_tx, hba->pwr_info.gear_rx, - new_pwr_info.gear_tx, new_pwr_info.gear_rx); - - return ret; -} - -static int ufshcd_clock_scaling_prepare(struct ufs_hba *hba) -{ - #define DOORBELL_CLR_TOUT_US (1000 * 1000) /* 1 sec */ - int ret = 0; - /* - * make sure that there are no outstanding requests when - * clock scaling is in progress - */ - ufshcd_scsi_block_requests(hba); - down_write(&hba->clk_scaling_lock); - if (ufshcd_wait_for_doorbell_clr(hba, DOORBELL_CLR_TOUT_US)) { - ret = -EBUSY; - up_write(&hba->clk_scaling_lock); - ufshcd_scsi_unblock_requests(hba); - } - - return ret; -} - -static void ufshcd_clock_scaling_unprepare(struct ufs_hba *hba) -{ - up_write(&hba->clk_scaling_lock); - ufshcd_scsi_unblock_requests(hba); -} - -/** - * ufshcd_devfreq_scale - scale up/down UFS clocks and gear - * @hba: per adapter instance - * @scale_up: True for scaling up and false for scalin down - * - * Returns 0 for success, - * Returns -EBUSY if scaling can't happen at this time - * Returns non-zero for any other errors - */ -static int ufshcd_devfreq_scale(struct ufs_hba *hba, bool scale_up) -{ - int ret = 0; - - /* let's not get into low power until clock scaling is completed */ - ufshcd_hold(hba, false); - - ret = ufshcd_clock_scaling_prepare(hba); - if (ret) - return ret; - - /* scale down the gear before scaling down clocks */ - if (!scale_up) { - ret = ufshcd_scale_gear(hba, false); - if (ret) - goto out; - } - - ret = ufshcd_scale_clks(hba, scale_up); - if (ret) { - if (!scale_up) - ufshcd_scale_gear(hba, true); - goto out; - } - - /* scale up the gear after scaling up clocks */ - if (scale_up) { - ret = ufshcd_scale_gear(hba, true); - if (ret) { - ufshcd_scale_clks(hba, false); - goto out; - } - } - - ret = ufshcd_vops_clk_scale_notify(hba, scale_up, POST_CHANGE); - -out: - 
ufshcd_clock_scaling_unprepare(hba); - ufshcd_release(hba); - return ret; -} - -static void ufshcd_clk_scaling_suspend_work(struct work_struct *work) -{ - struct ufs_hba *hba = container_of(work, struct ufs_hba, - clk_scaling.suspend_work); - unsigned long irq_flags; - - spin_lock_irqsave(hba->host->host_lock, irq_flags); - if (hba->clk_scaling.active_reqs || hba->clk_scaling.is_suspended) { - spin_unlock_irqrestore(hba->host->host_lock, irq_flags); - return; - } - hba->clk_scaling.is_suspended = true; - spin_unlock_irqrestore(hba->host->host_lock, irq_flags); - - __ufshcd_suspend_clkscaling(hba); -} - -static void ufshcd_clk_scaling_resume_work(struct work_struct *work) -{ - struct ufs_hba *hba = container_of(work, struct ufs_hba, - clk_scaling.resume_work); - unsigned long irq_flags; - - spin_lock_irqsave(hba->host->host_lock, irq_flags); - if (!hba->clk_scaling.is_suspended) { - spin_unlock_irqrestore(hba->host->host_lock, irq_flags); - return; - } - hba->clk_scaling.is_suspended = false; - spin_unlock_irqrestore(hba->host->host_lock, irq_flags); - - devfreq_resume_device(hba->devfreq); -} - -static int ufshcd_devfreq_target(struct device *dev, - unsigned long *freq, u32 flags) -{ - int ret = 0; - struct ufs_hba *hba = dev_get_drvdata(dev); - ktime_t start; - bool scale_up, sched_clk_scaling_suspend_work = false; - struct list_head *clk_list = &hba->clk_list_head; - struct ufs_clk_info *clki; - unsigned long irq_flags; - - if (!ufshcd_is_clkscaling_supported(hba)) - return -EINVAL; - - spin_lock_irqsave(hba->host->host_lock, irq_flags); - if (ufshcd_eh_in_progress(hba)) { - spin_unlock_irqrestore(hba->host->host_lock, irq_flags); - return 0; - } - - if (!hba->clk_scaling.active_reqs) - sched_clk_scaling_suspend_work = true; - - if (list_empty(clk_list)) { - spin_unlock_irqrestore(hba->host->host_lock, irq_flags); - goto out; - } - - clki = list_first_entry(&hba->clk_list_head, struct ufs_clk_info, list); - scale_up = (*freq == clki->max_freq) ? 
true : false; - if (!ufshcd_is_devfreq_scaling_required(hba, scale_up)) { - spin_unlock_irqrestore(hba->host->host_lock, irq_flags); - ret = 0; - goto out; /* no state change required */ - } - spin_unlock_irqrestore(hba->host->host_lock, irq_flags); - - start = ktime_get(); - ret = ufshcd_devfreq_scale(hba, scale_up); - - trace_ufshcd_profile_clk_scaling(dev_name(hba->dev), - (scale_up ? "up" : "down"), - ktime_to_us(ktime_sub(ktime_get(), start)), ret); - -out: - if (sched_clk_scaling_suspend_work) - queue_work(hba->clk_scaling.workq, - &hba->clk_scaling.suspend_work); - - return ret; -} - - -static int ufshcd_devfreq_get_dev_status(struct device *dev, - struct devfreq_dev_status *stat) -{ - struct ufs_hba *hba = dev_get_drvdata(dev); - struct ufs_clk_scaling *scaling = &hba->clk_scaling; - unsigned long flags; - - if (!ufshcd_is_clkscaling_supported(hba)) - return -EINVAL; - - memset(stat, 0, sizeof(*stat)); - - spin_lock_irqsave(hba->host->host_lock, flags); - if (!scaling->window_start_t) - goto start_window; - - if (scaling->is_busy_started) - scaling->tot_busy_t += ktime_to_us(ktime_sub(ktime_get(), - scaling->busy_start_t)); - - stat->total_time = jiffies_to_usecs((long)jiffies - - (long)scaling->window_start_t); - stat->busy_time = scaling->tot_busy_t; -start_window: - scaling->window_start_t = jiffies; - scaling->tot_busy_t = 0; - - if (hba->outstanding_reqs) { - scaling->busy_start_t = ktime_get(); - scaling->is_busy_started = true; - } else { - scaling->busy_start_t = 0; - scaling->is_busy_started = false; - } - spin_unlock_irqrestore(hba->host->host_lock, flags); - return 0; -} - -static struct devfreq_dev_profile ufs_devfreq_profile = { - .polling_ms = 100, - .target = ufshcd_devfreq_target, - .get_dev_status = ufshcd_devfreq_get_dev_status, -}; - -static int ufshcd_devfreq_init(struct ufs_hba *hba) -{ - struct list_head *clk_list = &hba->clk_list_head; - struct ufs_clk_info *clki; - struct devfreq *devfreq; - int ret; - - /* Skip devfreq if we don't 
have any clocks in the list */ - if (list_empty(clk_list)) - return 0; - - clki = list_first_entry(clk_list, struct ufs_clk_info, list); - dev_pm_opp_add(hba->dev, clki->min_freq, 0); - dev_pm_opp_add(hba->dev, clki->max_freq, 0); - - devfreq = devfreq_add_device(hba->dev, - &ufs_devfreq_profile, - DEVFREQ_GOV_SIMPLE_ONDEMAND, - NULL); - if (IS_ERR(devfreq)) { - ret = PTR_ERR(devfreq); - dev_err(hba->dev, "Unable to register with devfreq %d\n", ret); - - dev_pm_opp_remove(hba->dev, clki->min_freq); - dev_pm_opp_remove(hba->dev, clki->max_freq); - return ret; - } - - hba->devfreq = devfreq; - - return 0; -} - -static void ufshcd_devfreq_remove(struct ufs_hba *hba) -{ - struct list_head *clk_list = &hba->clk_list_head; - struct ufs_clk_info *clki; - - if (!hba->devfreq) - return; - - devfreq_remove_device(hba->devfreq); - hba->devfreq = NULL; - - clki = list_first_entry(clk_list, struct ufs_clk_info, list); - dev_pm_opp_remove(hba->dev, clki->min_freq); - dev_pm_opp_remove(hba->dev, clki->max_freq); -} - -static void __ufshcd_suspend_clkscaling(struct ufs_hba *hba) -{ - unsigned long flags; - - devfreq_suspend_device(hba->devfreq); - spin_lock_irqsave(hba->host->host_lock, flags); - hba->clk_scaling.window_start_t = 0; - spin_unlock_irqrestore(hba->host->host_lock, flags); -} - -static void ufshcd_suspend_clkscaling(struct ufs_hba *hba) -{ - unsigned long flags; - bool suspend = false; - - if (!ufshcd_is_clkscaling_supported(hba)) - return; - - spin_lock_irqsave(hba->host->host_lock, flags); - if (!hba->clk_scaling.is_suspended) { - suspend = true; - hba->clk_scaling.is_suspended = true; - } - spin_unlock_irqrestore(hba->host->host_lock, flags); - - if (suspend) - __ufshcd_suspend_clkscaling(hba); -} - -static void ufshcd_resume_clkscaling(struct ufs_hba *hba) -{ - unsigned long flags; - bool resume = false; - - if (!ufshcd_is_clkscaling_supported(hba)) - return; - - spin_lock_irqsave(hba->host->host_lock, flags); - if (hba->clk_scaling.is_suspended) { - resume = 
true; - hba->clk_scaling.is_suspended = false; - } - spin_unlock_irqrestore(hba->host->host_lock, flags); - - if (resume) - devfreq_resume_device(hba->devfreq); -} - -static ssize_t ufshcd_clkscale_enable_show(struct device *dev, - struct device_attribute *attr, char *buf) -{ - struct ufs_hba *hba = dev_get_drvdata(dev); - - return snprintf(buf, PAGE_SIZE, "%d\n", hba->clk_scaling.is_allowed); -} - -static ssize_t ufshcd_clkscale_enable_store(struct device *dev, - struct device_attribute *attr, const char *buf, size_t count) -{ - struct ufs_hba *hba = dev_get_drvdata(dev); - u32 value; - int err; - - if (kstrtou32(buf, 0, &value)) - return -EINVAL; - - value = !!value; - if (value == hba->clk_scaling.is_allowed) - goto out; - - pm_runtime_get_sync(hba->dev); - ufshcd_hold(hba, false); - - cancel_work_sync(&hba->clk_scaling.suspend_work); - cancel_work_sync(&hba->clk_scaling.resume_work); - - hba->clk_scaling.is_allowed = value; - - if (value) { - ufshcd_resume_clkscaling(hba); - } else { - ufshcd_suspend_clkscaling(hba); - err = ufshcd_devfreq_scale(hba, true); - if (err) - dev_err(hba->dev, "%s: failed to scale clocks up %d\n", - __func__, err); - } - - ufshcd_release(hba); - pm_runtime_put_sync(hba->dev); -out: - return count; -} - -static void ufshcd_clkscaling_init_sysfs(struct ufs_hba *hba) -{ - hba->clk_scaling.enable_attr.show = ufshcd_clkscale_enable_show; - hba->clk_scaling.enable_attr.store = ufshcd_clkscale_enable_store; - sysfs_attr_init(&hba->clk_scaling.enable_attr.attr); - hba->clk_scaling.enable_attr.attr.name = "clkscale_enable"; - hba->clk_scaling.enable_attr.attr.mode = 0644; - if (device_create_file(hba->dev, &hba->clk_scaling.enable_attr)) - dev_err(hba->dev, "Failed to create sysfs for clkscale_enable\n"); -} - static void ufshcd_ungate_work(struct work_struct *work) { int ret; @@ -1742,34 +1169,6 @@ static ssize_t ufshcd_clkgate_enable_store(struct device *dev, return count; } -static void ufshcd_init_clk_scaling(struct ufs_hba *hba) -{ - 
char wq_name[sizeof("ufs_clkscaling_00")]; - - if (!ufshcd_is_clkscaling_supported(hba)) - return; - - INIT_WORK(&hba->clk_scaling.suspend_work, - ufshcd_clk_scaling_suspend_work); - INIT_WORK(&hba->clk_scaling.resume_work, - ufshcd_clk_scaling_resume_work); - - snprintf(wq_name, sizeof(wq_name), "ufs_clkscaling_%d", - hba->host->host_no); - hba->clk_scaling.workq = create_singlethread_workqueue(wq_name); - - ufshcd_clkscaling_init_sysfs(hba); -} - -static void ufshcd_exit_clk_scaling(struct ufs_hba *hba) -{ - if (!ufshcd_is_clkscaling_supported(hba)) - return; - - destroy_workqueue(hba->clk_scaling.workq); - ufshcd_devfreq_remove(hba); -} - static void ufshcd_init_clk_gating(struct ufs_hba *hba) { char wq_name[sizeof("ufs_clk_gating_00")]; @@ -1816,50 +1215,6 @@ static void ufshcd_exit_clk_gating(struct ufs_hba *hba) destroy_workqueue(hba->clk_gating.clk_gating_workq); } -/* Must be called with host lock acquired */ -static void ufshcd_clk_scaling_start_busy(struct ufs_hba *hba) -{ - bool queue_resume_work = false; - - if (!ufshcd_is_clkscaling_supported(hba)) - return; - - if (!hba->clk_scaling.active_reqs++) - queue_resume_work = true; - - if (!hba->clk_scaling.is_allowed || hba->pm_op_in_progress) - return; - - if (queue_resume_work) - queue_work(hba->clk_scaling.workq, - &hba->clk_scaling.resume_work); - - if (!hba->clk_scaling.window_start_t) { - hba->clk_scaling.window_start_t = jiffies; - hba->clk_scaling.tot_busy_t = 0; - hba->clk_scaling.is_busy_started = false; - } - - if (!hba->clk_scaling.is_busy_started) { - hba->clk_scaling.busy_start_t = ktime_get(); - hba->clk_scaling.is_busy_started = true; - } -} - -static void ufshcd_clk_scaling_update_busy(struct ufs_hba *hba) -{ - struct ufs_clk_scaling *scaling = &hba->clk_scaling; - - if (!ufshcd_is_clkscaling_supported(hba)) - return; - - if (!hba->outstanding_reqs && scaling->is_busy_started) { - scaling->tot_busy_t += ktime_to_us(ktime_sub(ktime_get(), - scaling->busy_start_t)); - scaling->busy_start_t = 
0; - scaling->is_busy_started = false; - } -} /** * ufshcd_send_command - Send SCSI or device management commands * @hba: per adapter instance @@ -1870,7 +1225,6 @@ void ufshcd_send_command(struct ufs_hba *hba, unsigned int task_tag) { hba->lrb[task_tag].issue_time_stamp = ktime_get(); hba->lrb[task_tag].compl_time_stamp = ktime_set(0, 0); - ufshcd_clk_scaling_start_busy(hba); __set_bit(task_tag, &hba->outstanding_reqs); ufshcd_writel(hba, 1 << task_tag, REG_UTP_TRANSFER_REQ_DOOR_BELL); /* Make sure that doorbell is committed immediately */ @@ -2398,9 +1752,6 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd) BUG(); } - if (!down_read_trylock(&hba->clk_scaling_lock)) - return SCSI_MLQUEUE_HOST_BUSY; - spin_lock_irqsave(hba->host->host_lock, flags); switch (hba->ufshcd_state) { case UFSHCD_STATE_OPERATIONAL: @@ -2480,7 +1831,6 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd) out_unlock: spin_unlock_irqrestore(hba->host->host_lock, flags); out: - up_read(&hba->clk_scaling_lock); return err; } @@ -2670,8 +2020,6 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba, struct completion wait; unsigned long flags; - down_read(&hba->clk_scaling_lock); - /* * Get free slot, sleep if slots are unavailable. 
* Even though we use wait_event() which sleeps indefinitely, @@ -2704,7 +2052,6 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba, out_put_tag: ufshcd_put_dev_cmd_tag(hba, tag); wake_up(&hba->dev_cmd.tag_wq); - up_read(&hba->clk_scaling_lock); return err; } @@ -4786,8 +4133,6 @@ static void __ufshcd_transfer_req_compl(struct ufs_hba *hba, complete(hba->dev_cmd.complete); } } - if (ufshcd_is_clkscaling_supported(hba)) - hba->clk_scaling.active_reqs--; lrbp->compl_time_stamp = ktime_get(); } @@ -4795,8 +4140,6 @@ static void __ufshcd_transfer_req_compl(struct ufs_hba *hba, /* clear corresponding bits of completed commands */ hba->outstanding_reqs ^= completed_reqs; - ufshcd_clk_scaling_update_busy(hba); - /* we might have free'd some tags above */ wake_up(&hba->dev_cmd.tag_wq); } @@ -5708,8 +5051,6 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba, unsigned long flags; u32 upiu_flags; - down_read(&hba->clk_scaling_lock); - wait_event(hba->dev_cmd.tag_wq, ufshcd_get_dev_cmd_tag(hba, &tag)); init_completion(&wait); @@ -5772,7 +5113,6 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba, ufshcd_put_dev_cmd_tag(hba, tag); wake_up(&hba->dev_cmd.tag_wq); - up_read(&hba->clk_scaling_lock); return err; } @@ -6110,9 +5450,6 @@ static int ufshcd_host_reset_and_restore(struct ufs_hba *hba) ufshcd_hba_stop(hba, false); spin_unlock_irqrestore(hba->host->host_lock, flags); - /* scale up clocks to max frequency before full reinitialization */ - ufshcd_scale_clks(hba, true); - err = ufshcd_hba_enable(hba); if (err) goto out; @@ -6871,20 +6208,6 @@ static int ufshcd_probe_hba(struct ufs_hba *hba) if (ufshcd_scsi_add_wlus(hba)) goto out; - /* Initialize devfreq after UFS device is detected */ - if (ufshcd_is_clkscaling_supported(hba)) { - memcpy(&hba->clk_scaling.saved_pwr_info.info, - &hba->pwr_info, - sizeof(struct ufs_pa_layer_attr)); - hba->clk_scaling.saved_pwr_info.is_valid = true; - if (!hba->devfreq) { - ret = ufshcd_devfreq_init(hba); - if (ret) - 
goto out; - } - hba->clk_scaling.is_allowed = true; - } - ufs_bsg_probe(hba); scsi_scan_host(hba->host); @@ -6901,7 +6224,6 @@ static int ufshcd_probe_hba(struct ufs_hba *hba) */ if (ret && !ufshcd_eh_in_progress(hba) && !hba->pm_op_in_progress) { pm_runtime_put_sync(hba->dev); - ufshcd_exit_clk_scaling(hba); ufshcd_hba_exit(hba); } @@ -7446,10 +6768,6 @@ static void ufshcd_hba_exit(struct ufs_hba *hba) if (hba->is_powered) { ufshcd_variant_hba_exit(hba); ufshcd_setup_vreg(hba, false); - ufshcd_suspend_clkscaling(hba); - if (ufshcd_is_clkscaling_supported(hba)) - if (hba->devfreq) - ufshcd_suspend_clkscaling(hba); ufshcd_setup_clocks(hba, false); ufshcd_setup_hba_vreg(hba, false); hba->is_powered = false; @@ -7725,12 +7043,6 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op) ufshcd_hold(hba, false); hba->clk_gating.is_suspended = true; - if (hba->clk_scaling.is_allowed) { - cancel_work_sync(&hba->clk_scaling.suspend_work); - cancel_work_sync(&hba->clk_scaling.resume_work); - ufshcd_suspend_clkscaling(hba); - } - if (req_dev_pwr_mode == UFS_ACTIVE_PWR_MODE && req_link_state == UIC_LINK_ACTIVE_STATE) { goto disable_clks; @@ -7806,8 +7118,6 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op) goto out; set_link_active: - if (hba->clk_scaling.is_allowed) - ufshcd_resume_clkscaling(hba); ufshcd_vreg_set_hpm(hba); if (ufshcd_is_link_hibern8(hba) && !ufshcd_uic_hibern8_exit(hba)) ufshcd_set_link_active(hba); @@ -7817,8 +7127,6 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op) if (!ufshcd_set_dev_pwr_mode(hba, UFS_ACTIVE_PWR_MODE)) ufshcd_disable_auto_bkops(hba); enable_gating: - if (hba->clk_scaling.is_allowed) - ufshcd_resume_clkscaling(hba); hba->clk_gating.is_suspended = false; ufshcd_release(hba); out: @@ -7901,9 +7209,6 @@ static int ufshcd_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op) hba->clk_gating.is_suspended = false; - if (hba->clk_scaling.is_allowed) - ufshcd_resume_clkscaling(hba); - /* 
Schedule clock gating in case of no access to UFS device yet */ ufshcd_release(hba); @@ -7920,8 +7225,6 @@ static int ufshcd_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op) ufshcd_vreg_set_lpm(hba); disable_irq_and_vops_clks: ufshcd_disable_irq(hba); - if (hba->clk_scaling.is_allowed) - ufshcd_suspend_clkscaling(hba); ufshcd_setup_clocks(hba, false); out: hba->pm_op_in_progress = 0; @@ -8125,10 +7428,7 @@ void ufshcd_remove(struct ufs_hba *hba) ufshcd_disable_intr(hba, hba->intr_mask); ufshcd_hba_stop(hba, true); - ufshcd_exit_clk_scaling(hba); ufshcd_exit_clk_gating(hba); - if (ufshcd_is_clkscaling_supported(hba)) - device_remove_file(hba->dev, &hba->clk_scaling.enable_attr); ufshcd_hba_exit(hba); } EXPORT_SYMBOL_GPL(ufshcd_remove); @@ -8284,15 +7584,11 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq) /* Initialize mutex for device management commands */ mutex_init(&hba->dev_cmd.lock); - init_rwsem(&hba->clk_scaling_lock); - /* Initialize device management tag acquire wait queue */ init_waitqueue_head(&hba->dev_cmd.tag_wq); ufshcd_init_clk_gating(hba); - ufshcd_init_clk_scaling(hba); - /* * In order to avoid any spurious interrupt immediately after * registering UFS controller interrupt handler, clear any pending UFS @@ -8368,7 +7664,6 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq) out_remove_scsi_host: scsi_remove_host(hba->host); exit_gating: - ufshcd_exit_clk_scaling(hba); ufshcd_exit_clk_gating(hba); out_disable: hba->is_irq_enabled = false; diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h index 69ba7445d2b3..11c7b070861c 100644 --- a/drivers/scsi/ufs/ufshcd.h +++ b/drivers/scsi/ufs/ufshcd.h @@ -700,7 +700,6 @@ struct ufs_hba { enum bkops_status urgent_bkops_lvl; bool is_urgent_bkops_lvl_checked; - struct rw_semaphore clk_scaling_lock; struct ufs_desc_size desc_size; atomic_t scsi_block_reqs_cnt; ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: dd hangs when reading large partitions
  2019-01-21 15:22 ` Marc Gonzalez
@ 2019-01-22  3:12   ` jianchao.wang
  2019-01-22 10:59     ` Marc Gonzalez
  0 siblings, 1 reply; 34+ messages in thread
From: jianchao.wang @ 2019-01-22 3:12 UTC (permalink / raw)
To: Marc Gonzalez, Christoph Hellwig, Jens Axboe
Cc: fsdevel, linux-block, SCSI, Joao Pinto, Subhash Jadavani,
    Sayali Lokhande, Can Guo, Asutosh Das, Vijay Viswanath,
    Venkat Gopalakrishnan, Ritesh Harjani, Vivek Gautam, Jeffrey Hugo,
    Maya Erez, Evan Green, Matthias Kaehlcke, Douglas Anderson,
    Stephen Boyd, Tomas Winkler, Adrian Hunter, Alim Akhtar,
    Avri Altman, Bart Van Assche, Martin Petersen, Bjorn Andersson

Hello

On 1/21/19 11:22 PM, Marc Gonzalez wrote:
> Well, now we know for sure that the clk_scaling_lock is a red herring.
> I applied the patch below, and still the system locked up:
>
> # dd if=/dev/sde of=/dev/null bs=1M status=progress
> 3892314112 bytes (3.9 GB, 3.6 GiB) copied, 50.0042 s, 77.8 MB/s
>
> I can't seem to get the RCU stall warning anymore. How can I get
> to the bottom of this issue?

Can you detail how the system 'locked up'? Does dd hang there? Is there
any hung-task warning log? Does it hang forever, or just for a
relatively long time?

And what is the status of dd when it hangs? Can you take some samples
of /proc/<dd's pid>/status and /proc/<dd's pid>/stack during the hang?

And also, would you please share the dmesg log and config?

Since it always fails during a buffered read at a fixed number of bytes,
what is the capacity of your system memory?

Thanks
Jianchao

^ permalink raw reply	[flat|nested] 34+ messages in thread
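[Editorial aside: jianchao's request to sample /proc/&lt;pid&gt;/status and /proc/&lt;pid&gt;/stack during the hang can be scripted from a second console while dd runs. A minimal sketch, assuming a Linux /proc filesystem; the fields chosen and the self-sampling demo loop are illustrative, and /proc/&lt;pid&gt;/stack usually requires root:]

```python
import os
import time

def sample_task(pid, fields=("State", "VmRSS")):
    """Collect selected fields from /proc/<pid>/status, plus the kernel
    stack if we are privileged enough to read it."""
    sample = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in fields:
                sample[key] = value.strip()
    try:
        # /proc/<pid>/stack is typically readable by root only.
        with open(f"/proc/{pid}/stack") as f:
            sample["stack"] = f.read()
    except OSError:
        sample["stack"] = None
    return sample

if __name__ == "__main__":
    # Sampling ourselves as a demo; point this at dd's PID instead.
    for _ in range(3):
        print(sample_task(os.getpid()))
        time.sleep(0.2)
```

A wedged dd would typically show `State: D (disk sleep)` in these samples, together with a kernel stack stuck in the block layer.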
* Re: dd hangs when reading large partitions
  2019-01-22  3:12 ` jianchao.wang
@ 2019-01-22 10:59   ` Marc Gonzalez
  2019-01-22 12:49     ` Marc Gonzalez
  ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Marc Gonzalez @ 2019-01-22 10:59 UTC (permalink / raw)
To: Jianchao Wang, Christoph Hellwig, Jens Axboe
Cc: fsdevel, linux-block, SCSI, Joao Pinto, Subhash Jadavani,
    Sayali Lokhande, Can Guo, Asutosh Das, Vijay Viswanath,
    Venkat Gopalakrishnan, Ritesh Harjani, Vivek Gautam, Jeffrey Hugo,
    Maya Erez, Evan Green, Matthias Kaehlcke, Douglas Anderson,
    Stephen Boyd, Tomas Winkler, Adrian Hunter, Alim Akhtar,
    Avri Altman, Bart Van Assche, Martin Petersen, Bjorn Andersson

On 22/01/2019 04:12, Jianchao Wang wrote:

> On 1/21/19 11:22 PM, Marc Gonzalez wrote:
>
>> Well, now we know for sure that the clk_scaling_lock is a red herring.
>> I applied the patch below, and still the system locked up:
>>
>> # dd if=/dev/sde of=/dev/null bs=1M status=progress
>> 3892314112 bytes (3.9 GB, 3.6 GiB) copied, 50.0042 s, 77.8 MB/s
>>
>> I can't seem to get the RCU stall warning anymore. How can I get
>> to the bottom of this issue?
>
> Can you detail how the system 'locked up'? Does dd hang there? Is there
> any hung-task warning log? Does it hang forever, or just for a
> relatively long time?

The system is an arm64 dev board (APQ8098 MEDIABOX) with 4GB RAM and 64 GB UFS.
USB, SDHC, PCIe, SATA, Ethernet are not functional yet (so much work ahead).
All I have is a single serial console.
When the shell hangs, I lose access to the system altogether.
SysRq is not implemented either. I am blind once the shell locks up.

The system has been frozen for 15 hours; I think that qualifies as 'forever' ;-)

> And what is the status of dd when it hangs? Can you take some samples
> of /proc/<dd's pid>/status and /proc/<dd's pid>/stack during the hang?

Sadly, I cannot access this information once the shell locks up.

However, the kernel did print many warnings overnight (see below).
> And also, would you please share the dmesg log and config?

See below.

> Since it always fails during a buffered read at a fixed number of bytes,
> what is the capacity of your system memory?

4GB RAM. And the system hangs after reading 3.8GB;
I think this is not a coincidence.
NB: swap is disabled (this might be relevant)

On a freshly booted system, I get:

# free
random: get_random_u64 called from copy_process.isra.9.part.10+0x2c8/0x1700 with crng_init=1
random: get_random_u64 called from arch_pick_mmap_layout+0xdc/0x100 with crng_init=1
random: get_random_u64 called from load_elf_binary+0x2b8/0x12b4 with crng_init=1
             total       used       free     shared    buffers     cached
Mem:       3948996      48916    3900080      17124          0      17124
-/+ buffers/cache:      31792    3917204
Swap:            0          0          0

# cat /proc/meminfo
random: get_random_u64 called from copy_process.isra.9.part.10+0x2c8/0x1700 with crng_init=1
random: get_random_u64 called from arch_pick_mmap_layout+0xdc/0x100 with crng_init=1
random: get_random_u64 called from load_elf_binary+0x2b8/0x12b4 with crng_init=1
MemTotal:        3948996 kB
MemFree:         3899824 kB
MemAvailable:    3880472 kB
Buffers:               0 kB
Cached:            17124 kB
SwapCached:            0 kB
Active:             5700 kB
Inactive:          11748 kB
Active(anon):       5700 kB
Inactive(anon):    11748 kB
Active(file):          0 kB
Inactive(file):        0 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:           432 kB
Mapped:             2244 kB
Shmem:             17124 kB
KReclaimable:       8444 kB
Slab:              18120 kB
SReclaimable:       8444 kB
SUnreclaim:         9676 kB
KernelStack:        1372 kB
PageTables:           44 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1974496 kB
Committed_AS:      19072 kB
VmallocTotal:   262930368 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
Percpu:              832 kB

BOOT LOG:

Booting Linux on physical CPU 0x0000000000 [0x51af8014]
Linux version 5.0.0-rc1 (mgonzalez@venus) (gcc version 7.3.1 20180425 [linaro-7.3-2018.05 revision d29120a424ecfbc167ef90065c0eeb7f91977701] (Linaro GCC 7.3-2018.05)) #54 SMP PREEMPT Mon Jan 21 19:06:50 CET 2019
Machine model: Qualcomm Technologies,
Inc. MSM8998 v1 MTP psci: probing for conduit method from DT. psci: PSCIv1.0 detected in firmware. psci: Using standard PSCI v0.2 function IDs psci: MIGRATE_INFO_TYPE not supported. psci: SMC Calling Convention v1.0 random: get_random_bytes called from start_kernel+0xb4/0x45c with crng_init=0 percpu: Embedded 23 pages/cpu @(____ptrval____) s53272 r8192 d32744 u94208 Detected VIPT I-cache on CPU0 CPU features: detected: Kernel page table isolation (KPTI) Built 1 zonelists, mobility grouping on. Total pages: 1012273 Kernel command line: loglevel=7 androidboot.bootdevice=1da4000.ufshc androidboot.serialno=53733c35 androidboot.baseband=apq mdss_mdp.panel=1:hdmi:16 Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes) Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes) software IO TLB: mapped [mem 0xfbfff000-0xfffff000] (64MB) Memory: 3943364K/4114176K available (3646K kernel code, 478K rwdata, 1296K rodata, 5632K init, 12850K bss, 170812K reserved, 0K cma-reserved) SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1 ftrace: allocating 12677 entries in 50 pages Running RCU self tests rcu: Preemptible hierarchical RCU implementation. rcu: RCU lockdep checking is enabled. rcu: RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=8. Tasks RCU enabled. rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies. rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8 NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0 GICv3: Distributor has no Range Selector support GICv3: no VLPI support, no direct LPI support GICv3: CPU0: found redistributor 0 region 0:0x0000000017b00000 ITS: No ITS available, not enabling LPIs arch_timer: cp15 and mmio timer(s) running at 19.20MHz (virt/virt). 
clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x46d987e47, max_idle_ns: 440795202767 ns sched_clock: 56 bits at 19MHz, resolution 52ns, wraps every 4398046511078ns Console: colour dummy device 80x25 printk: console [tty0] enabled Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar ... MAX_LOCKDEP_SUBCLASSES: 8 ... MAX_LOCK_DEPTH: 48 ... MAX_LOCKDEP_KEYS: 8191 ... CLASSHASH_SIZE: 4096 ... MAX_LOCKDEP_ENTRIES: 32768 ... MAX_LOCKDEP_CHAINS: 65536 ... CHAINHASH_SIZE: 32768 memory used by lock dependency info: 7263 kB per task-struct memory footprint: 1920 bytes Calibrating delay loop (skipped), value calculated using timer frequency.. 38.40 BogoMIPS (lpj=76800) pid_max: default: 32768 minimum: 301 Mount-cache hash table entries: 8192 (order: 4, 65536 bytes) Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes) random: get_random_u64 called from copy_process.isra.9.part.10+0x2c8/0x1700 with crng_init=0 ASID allocator initialised with 32768 entries rcu: Hierarchical SRCU implementation. smp: Bringing up secondary CPUs ... Detected VIPT I-cache on CPU1 GICv3: CPU1: found redistributor 1 region 0:0x0000000017b20000 CPU1: Booted secondary processor 0x0000000001 [0x51af8014] Detected VIPT I-cache on CPU2 GICv3: CPU2: found redistributor 2 region 0:0x0000000017b40000 CPU2: Booted secondary processor 0x0000000002 [0x51af8014] Detected VIPT I-cache on CPU3 GICv3: CPU3: found redistributor 3 region 0:0x0000000017b60000 CPU3: Booted secondary processor 0x0000000003 [0x51af8014] Detected VIPT I-cache on CPU4 CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64MMFR0_EL1. Boot CPU: 0x00000000001122, CPU4: 0x00000000101122 CPU features: Unsupported CPU feature variation detected. GICv3: CPU4: found redistributor 100 region 0:0x0000000017b80000 CPU4: Booted secondary processor 0x0000000100 [0x51af8001] Detected VIPT I-cache on CPU5 CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64MMFR0_EL1. 
Boot CPU: 0x00000000001122, CPU5: 0x00000000101122 GICv3: CPU5: found redistributor 101 region 0:0x0000000017ba0000 CPU5: Booted secondary processor 0x0000000101 [0x51af8001] Detected VIPT I-cache on CPU6 CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64MMFR0_EL1. Boot CPU: 0x00000000001122, CPU6: 0x00000000101122 GICv3: CPU6: found redistributor 102 region 0:0x0000000017bc0000 CPU6: Booted secondary processor 0x0000000102 [0x51af8001] Detected VIPT I-cache on CPU7 CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64MMFR0_EL1. Boot CPU: 0x00000000001122, CPU7: 0x00000000101122 GICv3: CPU7: found redistributor 103 region 0:0x0000000017be0000 CPU7: Booted secondary processor 0x0000000103 [0x51af8001] smp: Brought up 1 node, 8 CPUs SMP: Total of 8 processors activated. CPU features: detected: GIC system register CPU interface CPU features: detected: 32-bit EL0 Support CPU features: detected: CRC32 instructions CPU: All CPU(s) started at EL1 alternatives: patching kernel code devtmpfs: initialized clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns futex hash table entries: 2048 (order: 6, 262144 bytes) pinctrl core: initialized pinctrl subsystem vdso: 2 pages (1 code @ (____ptrval____), 1 data @ (____ptrval____)) DMA: preallocated 256 KiB pool for atomic allocations SCSI subsystem initialized clocksource: Switched to clocksource arch_sys_counter s1: supplied by vph_pwr s2: supplied by vph_pwr s3: supplied by vph_pwr s3: Bringing 0uV into 1352000-1352000uV s4: supplied by vph_pwr s4: Bringing 0uV into 1800000-1800000uV s5: supplied by vph_pwr s5: Bringing 0uV into 1904000-1904000uV s6: supplied by vph_pwr s7: supplied by vph_pwr s7: Bringing 0uV into 900000-900000uV s8: supplied by vph_pwr s9: supplied by vph_pwr s10: supplied by vph_pwr s11: supplied by vph_pwr s12: supplied by vph_pwr s13: supplied by vph_pwr l1: supplied by s7 l1: Bringing 0uV into 880000-880000uV l2: supplied by s3 l2: Bringing 
0uV into 1200000-1200000uV l3: supplied by s7 l3: Bringing 0uV into 1000000-1000000uV l4: supplied by s7 l5: supplied by s7 l5: Bringing 0uV into 800000-800000uV l6: supplied by s5 l6: Bringing 0uV into 1808000-1808000uV l7: supplied by s5 l7: Bringing 0uV into 1800000-1800000uV l8: supplied by s3 l8: Bringing 0uV into 1200000-1200000uV l9: Bringing 0uV into 1808000-1808000uV l10: Bringing 0uV into 1808000-1808000uV l11: supplied by s7 l11: Bringing 0uV into 1000000-1000000uV l12: supplied by s5 l12: Bringing 0uV into 1800000-1800000uV l13: Bringing 0uV into 1808000-1808000uV l14: supplied by s5 l14: Bringing 0uV into 1880000-1880000uV l15: supplied by s5 l15: Bringing 0uV into 1800000-1800000uV l16: Bringing 0uV into 2704000-2704000uV l17: supplied by s3 l17: Bringing 0uV into 1304000-1304000uV l18: Bringing 0uV into 2704000-2704000uV l19: Bringing 0uV into 3008000-3008000uV l20: Bringing 0uV into 2960000-2960000uV l21: Bringing 0uV into 2960000-2960000uV l22: Bringing 0uV into 2864000-2864000uV l23: Bringing 0uV into 3312000-3312000uV l24: Bringing 0uV into 3088000-3088000uV l25: Bringing 0uV into 3104000-3104000uV l26: supplied by s3 l26: Bringing 0uV into 1200000-1200000uV l27: supplied by s7 l28: Bringing 0uV into 3008000-3008000uV lvs1: supplied by s4 lvs2: supplied by s4 bob: supplied by vph_pwr bob: Bringing 0uV into 3312000-3312000uV workingset: timestamp_bits=62 max_order=20 bucket_order=0 Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253) qcom-qmp-phy 1da7000.phy: Linked as a consumer to regulator.15 qcom-qmp-phy 1da7000.phy: Linked as a consumer to regulator.16 qcom-qmp-phy 1da7000.phy: Registered Qcom-QMP phy Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled msm_serial c1b0000.serial: msm_serial: detected port #0 msm_serial c1b0000.serial: uartclk = 1843200 c1b0000.serial: ttyMSM0 at MMIO 0xc1b0000 (irq = 15, base_baud = 115200) is a MSM msm_serial: console setup on port #0 printk: console [ttyMSM0] enabled msm_serial: driver 
initialized ufshcd-qcom 1da4000.ufshc: ufshcd_populate_vreg: Unable to find vdd-hba-supply regulator, assuming enabled l20: supplied by bob ufshcd-qcom 1da4000.ufshc: Linked as a consumer to regulator.34 ufshcd-qcom 1da4000.ufshc: Linked as a consumer to regulator.40 ufshcd-qcom 1da4000.ufshc: Linked as a consumer to regulator.5 ENTER ufshcd_runtime_suspend scsi host0: ufshcd random: get_random_bytes called from init_oops_id+0x40/0x50 with crng_init=0 random: get_random_bytes called from prandom_seed_full_state+0x60/0x110 with crng_init=0 ufshcd-qcom 1da4000.ufshc: ufshcd_print_pwr_info:[RX, TX]: gear=[1, 1], lane[1, 1], pwr[SLOWAUTO_MODE, SLOWAUTO_MODE], rate = 0 l9: supplied by bob l10: supplied by bob l13: supplied by bob l16: supplied by bob l18: supplied by bob l19: supplied by bob l21: supplied by bob l22: supplied by bob l23: supplied by bob l24: supplied by bob l25: supplied by bob l28: supplied by bob ufshcd-qcom 1da4000.ufshc: ufshcd_print_pwr_info:[RX, TX]: gear=[3, 3], lane[2, 2], pwr[FAST MODE, FAST MODE], rate = 2 scsi 0:0:0:49488: scsi_add_lun: correcting incorrect peripheral device type 0x0 for W-LUN 0x c150hN scsi 0:0:0:49488: Well-known LUN SAMSUNG KLUCG4J1EB-B0B1 0200 PQ: 0 ANSI: 6 scsi 0:0:0:49476: scsi_add_lun: correcting incorrect peripheral device type 0x0 for W-LUN 0x c144hN scsi 0:0:0:49476: Well-known LUN SAMSUNG KLUCG4J1EB-B0B1 0200 PQ: 0 ANSI: 6 scsi 0:0:0:49456: scsi_add_lun: correcting incorrect peripheral device type 0x0 for W-LUN 0x c130hN scsi 0:0:0:49456: Well-known LUN SAMSUNG KLUCG4J1EB-B0B1 0200 PQ: 0 ANSI: 6 scsi 0:0:0:0: Direct-Access SAMSUNG KLUCG4J1EB-B0B1 0200 PQ: 0 ANSI: 6 sd 0:0:0:0: Power-on or device reset occurred scsi 0:0:0:1: Direct-Access SAMSUNG KLUCG4J1EB-B0B1 0200 PQ: 0 ANSI: 6 sd 0:0:0:0: [sda] 14145536 4096-byte logical blocks: (57.9 GB/54.0 GiB) sd 0:0:0:1: Power-on or device reset occurred scsi 0:0:0:2: Direct-Access SAMSUNG KLUCG4J1EB-B0B1 0200 PQ: 0 ANSI: 6 sd 0:0:0:2: Power-on or device reset occurred sd 
0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA scsi 0:0:0:3: Direct-Access SAMSUNG KLUCG4J1EB-B0B1 0200 PQ: 0 ANSI: 6 sd 0:0:0:3: Power-on or device reset occurred sd 0:0:0:3: [sdd] 32768 4096-byte logical blocks: (134 MB/128 MiB) sd 0:0:0:1: [sdb] 1024 4096-byte logical blocks: (4.19 MB/4.00 MiB) sd 0:0:0:1: [sdb] Write Protect is off sd 0:0:0:1: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA sda: sda1 sda2 sda3 sda4 sda5 sda6 sda7 sd 0:0:0:2: [sdc] 1024 4096-byte logical blocks: (4.19 MB/4.00 MiB) sd 0:0:0:2: [sdc] Write Protect is off sd 0:0:0:2: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA random: fast init done sdb: sdb1 sd 0:0:0:3: [sdd] Write Protect is off scsi 0:0:0:4: Direct-Access SAMSUNG KLUCG4J1EB-B0B1 0200 PQ: 0 ANSI: 6 sdc: sdc1 sd 0:0:0:4: Power-on or device reset occurred sd 0:0:0:1: [sdb] Attached SCSI disk sd 0:0:0:4: [sde] 1048576 4096-byte logical blocks: (4.29 GB/4.00 GiB) sd 0:0:0:4: [sde] Write Protect is off sd 0:0:0:4: [sde] Write cache: enabled, read cache: enabled, supports DPO and FUA sd 0:0:0:3: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA sd 0:0:0:2: [sdc] Attached SCSI disk sde: sde1 sde2 sde3 sde4 sde5 sde6 sde7 sde8 sde9 sde10 sde11 sde12 sde13 sde14 sde15 sde16 sde17 sde18 sde19 sde20 sde21 sde22 sde23 sde24 sde25 sde26 sde27 sde28 sde29 sde30 sde31 sde32 sde33 sde34 sde35 sde36 sde37 sde38 scsi 0:0:0:5: Direct-Access SAMSUNG KLUCG4J1EB-B0B1 0200 PQ: 0 ANSI: 6 sdd: sdd1 sdd2 sd 0:0:0:0: [sda] Attached SCSI disk sd 0:0:0:3: [sdd] Attached SCSI disk sd 0:0:0:5: Power-on or device reset occurred sd 0:0:0:4: [sde] Attached SCSI disk sd 0:0:0:5: [sdf] 393216 4096-byte logical blocks: (1.61 GB/1.50 GiB) sd 0:0:0:5: [sdf] Write Protect is off sd 0:0:0:5: [sdf] Write cache: enabled, read cache: enabled, supports DPO and FUA sdf: sdf1 sdf2 sdf3 sd 0:0:0:5: [sdf] Attached SCSI disk Freeing unused 
kernel memory: 5632K
Checked W+X mappings: passed, no W+X pages found

DEFCONFIG

# CONFIG_SWAP is not set
CONFIG_NO_HZ_IDLE=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_PREEMPT=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=20
CONFIG_BLK_DEV_INITRD=y
CONFIG_ARCH_QCOM=y
# CONFIG_EFI is not set
# CONFIG_SUSPEND is not set
CONFIG_PM=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MQ_IOSCHED_DEADLINE is not set
# CONFIG_MQ_IOSCHED_KYBER is not set
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
# CONFIG_BLK_DEV is not set
CONFIG_SCSI=y
# CONFIG_SCSI_PROC_FS is not set
CONFIG_BLK_DEV_SD=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_SAS_LIBSAS=y
CONFIG_SCSI_UFSHCD=y
CONFIG_SCSI_UFSHCD_PLATFORM=y
CONFIG_SCSI_UFS_QCOM=y
CONFIG_SCSI_UFS_BSG=y
# CONFIG_INPUT_KEYBOARD is not set
# CONFIG_INPUT_MOUSE is not set
# CONFIG_SERIO_SERPORT is not set
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_LEGACY_PTY_COUNT=16
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_MSM=y
CONFIG_SERIAL_MSM_CONSOLE=y
CONFIG_SERIAL_DEV_BUS=y
# CONFIG_HW_RANDOM is not set
CONFIG_PINCTRL_MSM8998=y
CONFIG_GPIO_GENERIC_PLATFORM=y
# CONFIG_HWMON is not set
CONFIG_REGULATOR=y
CONFIG_REGULATOR_FIXED_VOLTAGE=y
CONFIG_REGULATOR_QCOM_SMD_RPM=y
# CONFIG_HID is not set
# CONFIG_USB_SUPPORT is not set
# CONFIG_COMMON_CLK_XGENE is not set
CONFIG_COMMON_CLK_QCOM=y
# CONFIG_QCOM_A53PLL is not set
# CONFIG_QCOM_CLK_APCS_MSM8916 is not set
CONFIG_QCOM_CLK_SMD_RPM=y
CONFIG_MSM_GCC_8998=y
CONFIG_HWSPINLOCK=y
CONFIG_HWSPINLOCK_QCOM=y
CONFIG_MAILBOX=y
CONFIG_QCOM_APCS_IPC=y
CONFIG_RPMSG_QCOM_GLINK_RPM=y
CONFIG_RPMSG_QCOM_GLINK_SMEM=y
CONFIG_RPMSG_QCOM_SMD=y
CONFIG_RPMSG_VIRTIO=y
CONFIG_QCOM_COMMAND_DB=y
CONFIG_QCOM_SMEM=y
CONFIG_QCOM_SMD_RPM=y
CONFIG_PHY_QCOM_QMP=y
CONFIG_TMPFS=y
CONFIG_NLS_ISO8859_1=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SHIRQ=y
CONFIG_SOFTLOCKUP_DETECTOR=y
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=5
CONFIG_WQ_WATCHDOG=y
CONFIG_SCHED_STACK_END_CHECK=y
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
CONFIG_WARN_ALL_UNSEEDED_RANDOM=y
CONFIG_DEBUG_LIST=y
CONFIG_DEBUG_PI_LIST=y
CONFIG_DEBUG_NOTIFIERS=y
CONFIG_RCU_CPU_STALL_TIMEOUT=3
CONFIG_FUNCTION_TRACER=y
CONFIG_DEBUG_WX=y

FULL SESSION LOG: https://pastebin.ubuntu.com/p/YSm82GxhNW/

^ permalink raw reply	[flat|nested] 34+ messages in thread
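[Editorial aside: the free and /proc/meminfo snapshots above can also be collected programmatically instead of eyeballed over the serial console. A minimal sketch of a /proc/meminfo parser; field names are as reported by the kernel, and the printed selection is illustrative:]

```python
def read_meminfo(path="/proc/meminfo"):
    """Parse /proc/meminfo into a dict; values are in kB as reported."""
    info = {}
    with open(path) as f:
        for line in f:
            key, _, rest = line.partition(":")
            parts = rest.split()
            if parts:
                info[key] = int(parts[0])
    return info

if __name__ == "__main__":
    info = read_meminfo()
    # Buffers is the field that grows without bound during the buffered dd run.
    for key in ("MemTotal", "MemFree", "Buffers", "Cached", "Dirty"):
        print(key, "=", info.get(key), "kB")
```

Polling this in a loop and logging the samples gives the same picture as the repeated `free` runs below, with less console noise.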
* Re: dd hangs when reading large partitions 2019-01-22 10:59 ` Marc Gonzalez @ 2019-01-22 12:49 ` Marc Gonzalez 2019-01-22 16:17 ` Marc Gonzalez 2019-01-23 3:10 ` jianchao.wang 2 siblings, 0 replies; 34+ messages in thread From: Marc Gonzalez @ 2019-01-22 12:49 UTC (permalink / raw) To: Jianchao Wang, Christoph Hellwig, Jens Axboe Cc: fsdevel, linux-block, SCSI, Joao Pinto, Subhash Jadavani, Sayali Lokhande, Can Guo, Asutosh Das, Vijay Viswanath, Venkat Gopalakrishnan, Ritesh Harjani, Vivek Gautam, Jeffrey Hugo, Maya Erez, Evan Green, Matthias Kaehlcke, Douglas Anderson, Stephen Boyd, Tomas Winkler, Adrian Hunter, Alim Akhtar, Avri Altman, Bart Van Assche, Martin Petersen, Bjorn Andersson On 22/01/2019 11:59, Marc Gonzalez wrote: > 4GB RAM. And the system hangs after reading 3.8GB > I think this is not a coincidence. > NB: swap is disabled (this might be relevant) > > On a freshly booted system, I get > > # free > random: get_random_u64 called from copy_process.isra.9.part.10+0x2c8/0x1700 with crng_init=1 > random: get_random_u64 called from arch_pick_mmap_layout+0xdc/0x100 with crng_init=1 > random: get_random_u64 called from load_elf_binary+0x2b8/0x12b4 with crng_init=1 > total used free shared buffers cached > Mem: 3948996 48916 3900080 17124 0 17124 > -/+ buffers/cache: 31792 3917204 > Swap: 0 0 0 When using iflag=direct, 'buffers' doesn't grow # dd if=/dev/sde of=/dev/null bs=1M iflag=direct & while true; do free; sleep 5; done total used free shared buffers cached Mem: 3948996 49060 3899936 17124 0 17124 -/+ buffers/cache: 31936 3917060 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 50084 3898912 17124 0 17124 -/+ buffers/cache: 32960 3916036 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 49832 3899164 17124 0 17124 -/+ buffers/cache: 32708 3916288 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 49832 3899164 17124 0 17124 -/+ buffers/cache: 32708 3916288 Swap: 0 0 0 total used free shared buffers cached Mem: 
3948996 49832 3899164 17124 0 17124 -/+ buffers/cache: 32708 3916288 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 49832 3899164 17124 0 17124 -/+ buffers/cache: 32708 3916288 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 49832 3899164 17124 0 17124 -/+ buffers/cache: 32708 3916288 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 50084 3898912 17124 0 17124 -/+ buffers/cache: 32960 3916036 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 49832 3899164 17124 0 17124 -/+ buffers/cache: 32708 3916288 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 50084 3898912 17124 0 17124 -/+ buffers/cache: 32960 3916036 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 50084 3898912 17124 0 17124 -/+ buffers/cache: 32960 3916036 Swap: 0 0 0 4096+0 records in 4096+0 records out 4294967296 bytes (4.3 GB, 4.0 GiB) copied, 53.3547 s, 80.5 MB/s total used free shared buffers cached Mem: 3948996 49188 3899808 17124 0 17124 -/+ buffers/cache: 32064 3916932 Swap: 0 0 0 But when using "normal" I/O, the kernel seems to consume all RAM, until the system locks up. 
# dd if=/dev/sde of=/dev/null bs=1M & while true; do free; sleep 5; done total used free shared buffers cached Mem: 3948996 49076 3899920 17124 0 17124 -/+ buffers/cache: 31952 3917044 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 239816 3709180 17124 190464 17004 -/+ buffers/cache: 32348 3916648 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 430760 3518236 17124 380928 16880 -/+ buffers/cache: 32952 3916044 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 733716 3215280 17124 685056 17124 -/+ buffers/cache: 31536 3917460 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 1037056 2911940 17124 987648 16936 -/+ buffers/cache: 32472 3916524 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 1335576 2613420 17124 1285632 17124 -/+ buffers/cache: 32820 3916176 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 1565188 2383808 17124 1514036 17172 -/+ buffers/cache: 33980 3915016 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 1753352 2195644 17124 1702400 17004 -/+ buffers/cache: 33948 3915048 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 1942784 2006212 17124 1891328 17152 -/+ buffers/cache: 34304 3914692 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 2226024 1722972 17124 2175088 17204 -/+ buffers/cache: 33732 3915264 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 2452092 1496904 17124 2400608 17136 -/+ buffers/cache: 34348 3914648 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 2679096 1269900 17124 2626400 17100 -/+ buffers/cache: 35596 3913400 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 2870828 1078168 17124 2816864 17004 -/+ buffers/cache: 36960 3912036 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 3144360 804636 17124 3089760 17044 -/+ buffers/cache: 37556 3911440 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 3388840 560156 17124 3333472 17088 -/+ 
buffers/cache: 38280 3910716 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 3577752 371244 17124 3521888 17040 -/+ buffers/cache: 38824 3910172 Swap: 0 0 0 total used free shared buffers cached Mem: 3948996 3754844 194152 17124 3699040 17056 -/+ buffers/cache: 38748 3910248 Swap: 0 0 0 rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: rcu: 3-...0: (0 ticks this GP) idle=be2/1/0x4000000000000000 softirq=275/275 fqs=68 rcu: 4-...0: (709 ticks this GP) idle=386/1/0x4000000000000000 softirq=248/250 fqs=68 rcu: (detected by 6, t=2018 jiffies, g=505, q=8) Task dump for CPU 3: sh R running task 0 671 1 0x00000002 Call trace: __switch_to+0x174/0x1e0 __kmalloc+0x37c/0x3d0 0x100000000 Task dump for CPU 4: dd R running task 0 707 671 0x00000002 Call trace: __switch_to+0x174/0x1e0 __cpu_online_mask+0x0/0x8 rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: rcu: 3-...0: (0 ticks this GP) idle=be2/1/0x4000000000000000 softirq=275/275 fqs=250 rcu: 4-...0: (709 ticks this GP) idle=386/1/0x4000000000000000 softirq=248/250 fqs=250 rcu: (detected by 6, t=8024 jiffies, g=505, q=8) Task dump for CPU 3: sh R running task 0 671 1 0x00000002 Call trace: __switch_to+0x174/0x1e0 __kmalloc+0x37c/0x3d0 0x100000000 Task dump for CPU 4: dd R running task 0 707 671 0x00000002 Call trace: __switch_to+0x174/0x1e0 __cpu_online_mask+0x0/0x8 Calling 'echo 3 > /proc/sys/vm/drop_caches' while dd is running does not seem to free any of the buffers. Could they be considered dirty? total used free shared buffers cached Mem: 3948996 3067664 881332 17124 3011584 17176 -/+ buffers/cache: 38904 3910092 Swap: 0 0 0 # echo 3 > /proc/sys/vm/drop_caches total used free shared buffers cached Mem: 3948996 3256876 692120 17124 3200512 17076 -/+ buffers/cache: 39288 3909708 Swap: 0 0 0 Calling 'sync' has no effect either. Regards. ^ permalink raw reply [flat|nested] 34+ messages in thread
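[Editorial aside: `drop_caches` skips dirty and in-use pages, which may be why the buffers above survive it. For pages belonging to a single file there is also a per-file, userspace alternative: posix_fadvise(2). A minimal sketch; the file path is hypothetical, and the call is only advisory, so dirty or still-referenced pages stay resident, much like with drop_caches above:]

```python
import os

def drop_file_cache(path):
    """Ask the kernel to drop this file's clean page-cache pages.

    POSIX_FADV_DONTNEED is advisory: dirty or still-mapped pages are
    kept, so this is a hint, not a guarantee.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0, len=0 means "from the start to the end of the file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
```

This only helps diagnose where the pages are pinned; it would not fix a block-layer hang like the one in this thread.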
* Re: dd hangs when reading large partitions
  2019-01-22 10:59     ` Marc Gonzalez
  2019-01-22 12:49       ` Marc Gonzalez
@ 2019-01-22 16:17       ` Marc Gonzalez
  2019-01-22 16:22         ` Greg Kroah-Hartman
  2019-01-23  3:10         ` jianchao.wang
  2 siblings, 1 reply; 34+ messages in thread
From: Marc Gonzalez @ 2019-01-22 16:17 UTC (permalink / raw)
To: Jianchao Wang, Christoph Hellwig, Jens Axboe, Greg Kroah-Hartman
Cc: fsdevel, linux-block, SCSI, Joao Pinto, Subhash Jadavani,
    Sayali Lokhande, Can Guo, Asutosh Das, Vijay Viswanath,
    Venkat Gopalakrishnan, Ritesh Harjani, Vivek Gautam, Jeffrey Hugo,
    Maya Erez, Evan Green, Matthias Kaehlcke, Douglas Anderson,
    Stephen Boyd, Tomas Winkler, Adrian Hunter, Alim Akhtar,
    Avri Altman, Bart Van Assche, Martin Petersen, Bjorn Andersson

+ Greg

On 22/01/2019 11:59, Marc Gonzalez wrote:

> FULL SESSION LOG: https://pastebin.ubuntu.com/p/YSm82GxhNW/

I finally got the hung_task detector to fire:
https://pastebin.ubuntu.com/p/XDDJjf7PTb/

Had a chat with GKH, who said that drivers/scsi/ufs is known to be not
100% functional, and requires some cherry-picking from downstream to
handle all corner cases.

I've sent a patch demoting the drivers/scsi/ufs status from "Supported"
to "Odd Fixes", in order to warn the next unwary fellow developers:
https://patchwork.kernel.org/patch/10773503/

Regards.

^ permalink raw reply	[flat|nested] 34+ messages in thread
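[Editorial aside: the hung_task detector that finally fired here is tunable at runtime; Marc's config sets CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=5, which is why it trips after only a few seconds. A minimal sketch of inspecting the knobs; the reads are guarded because these sysctl files exist only when CONFIG_DETECT_HUNG_TASK is enabled:]

```python
import os

HUNG_TASK_KNOBS = (
    "/proc/sys/kernel/hung_task_timeout_secs",
    "/proc/sys/kernel/hung_task_panic",
)

def read_knob(path):
    """Return the integer value of a sysctl knob, or None if it is absent
    (e.g. CONFIG_DETECT_HUNG_TASK is disabled on this kernel)."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return int(f.read().strip())

if __name__ == "__main__":
    for knob in HUNG_TASK_KNOBS:
        print(knob, "=", read_knob(knob))
```

Setting hung_task_panic=1 (as root) turns the warning into a panic, which can be handy when the console is the only debugging channel left.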
* Re: dd hangs when reading large partitions
  2019-01-22 16:17       ` Marc Gonzalez
@ 2019-01-22 16:22         ` Greg Kroah-Hartman
  2019-01-22 19:07           ` Evan Green
  0 siblings, 1 reply; 34+ messages in thread
From: Greg Kroah-Hartman @ 2019-01-22 16:22 UTC (permalink / raw)
To: Marc Gonzalez
Cc: Jianchao Wang, Christoph Hellwig, Jens Axboe, fsdevel, linux-block,
    SCSI, Joao Pinto, Subhash Jadavani, Sayali Lokhande, Can Guo,
    Asutosh Das, Vijay Viswanath, Venkat Gopalakrishnan, Ritesh Harjani,
    Vivek Gautam, Jeffrey Hugo, Maya Erez, Evan Green,
    Matthias Kaehlcke, Douglas Anderson, Stephen Boyd, Tomas Winkler,
    Adrian Hunter, Alim Akhtar, Avri Altman, Bart Van Assche,
    Martin Petersen, Bjorn Andersson

On Tue, Jan 22, 2019 at 05:17:56PM +0100, Marc Gonzalez wrote:
> + Greg
>
> On 22/01/2019 11:59, Marc Gonzalez wrote:
>
>> FULL SESSION LOG: https://pastebin.ubuntu.com/p/YSm82GxhNW/
>
> I finally got the hung_task detector to fire:
> https://pastebin.ubuntu.com/p/XDDJjf7PTb/
>
> Had a chat with GKH, who said that drivers/scsi/ufs is known to be not
> 100% functional, and requires some cherry-picking from downstream to
> handle all corner cases.

Note, we know it needs that for your controller, maybe not for others.
So you can't make that kind of generalization :(

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: dd hangs when reading large partitions
  2019-01-22 16:22         ` Greg Kroah-Hartman
@ 2019-01-22 19:07           ` Evan Green
  0 siblings, 0 replies; 34+ messages in thread
From: Evan Green @ 2019-01-22 19:07 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Marc Gonzalez, Jianchao Wang, Christoph Hellwig, Jens Axboe,
    fsdevel, linux-block, SCSI, Joao Pinto, Subhash Jadavani,
    Sayali Lokhande, Can Guo, Asutosh Das, Vijay Viswanath,
    Venkat Gopalakrishnan, Ritesh Harjani, Vivek Gautam, Jeffrey Hugo,
    Maya Erez, Matthias Kaehlcke, Douglas Anderson, Stephen Boyd,
    Tomas Winkler, Adrian Hunter, Alim Akhtar, Avri Altman,
    Bart Van Assche, Martin Petersen, Bjorn Andersson

FWIW, I am unable to repro this in our 4.19 kernel [1]. I put mem=4G on
the command line, and can dd till the cows come home. Seems like you're
leaking buffers somewhere?

localhost ~ # dd if=/dev/sda of=/dev/null bs=4M status=progress
12683575296 bytes (13 GB, 12 GiB) copied, 125.004 s, 101 MB/s^C
3048+0 records in
3047+0 records out
12780044288 bytes (13 GB, 12 GiB) copied, 126.015 s, 101 MB/s

-Evan

[1] https://chromium.googlesource.com/chromiumos/third_party/kernel/+/chromeos-4.19

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: dd hangs when reading large partitions 2019-01-22 10:59 ` Marc Gonzalez 2019-01-22 12:49 ` Marc Gonzalez 2019-01-22 16:17 ` Marc Gonzalez @ 2019-01-23 3:10 ` jianchao.wang 2019-02-06 16:16 ` Marc Gonzalez 2 siblings, 1 reply; 34+ messages in thread From: jianchao.wang @ 2019-01-23 3:10 UTC (permalink / raw) To: Marc Gonzalez, Christoph Hellwig, Jens Axboe Cc: fsdevel, linux-block, SCSI, Joao Pinto, Subhash Jadavani, Sayali Lokhande, Can Guo, Asutosh Das, Vijay Viswanath, Venkat Gopalakrishnan, Ritesh Harjani, Vivek Gautam, Jeffrey Hugo, Maya Erez, Evan Green, Matthias Kaehlcke, Douglas Anderson, Stephen Boyd, Tomas Winkler, Adrian Hunter, Alim Akhtar, Avri Altman, Bart Van Assche, Martin Petersen, Bjorn Andersson On 1/22/19 6:59 PM, Marc Gonzalez wrote: > On 22/01/2019 04:12, Jianchao Wang wrote: > >> On 1/21/19 11:22 PM, Marc Gonzalez wrote: >> >>> Well, now we know for sure that the clk_scaling_lock is a red herring. >>> I applied the patch below, and still the system locked up: >>> >>> # dd if=/dev/sde of=/dev/null bs=1M status=progress >>> 3892314112 bytes (3.9 GB, 3.6 GiB) copied, 50.0042 s, 77.8 MB/s >>> >>> I can't seem to get the RCU stall warning anymore. How can I get >>> to the bottom of this issue? >> Can you detail the system 'locked up' ? >> dd hangs there ? any hung task warning log ? >> hang forever or just hang for a relatively long time. > The system is an arm64 dev board (APQ8098 MEDIABOX) with 4GB RAM and 64 GB UFS. > USB, SDHC, PCIe, SATA, Ethernet are not functional yet (so much work ahead). > All I have is a single serial console. > When the shell hangs, I lose access to the system altogether. > SysRq is not implemented either. I am blind once the shell locks up. > The system has been frozen for 15 hours, I think that qualifies as 'forever' ;-) > >> And what is the status of the dd when it hangs ? >> Can you take some samples of the /proc/<dd's pid>/status and /proc/<dd's pid>/stack during the hang ? 
> Sadly, I cannot access this information once the shell locks up.
>
> However, the kernel did print many warnings overnight (see below).
>
>> And also, would you please share the dmesg log and config?
> See below.
>
>> Since it always fails after a fixed number of bytes with buffered reads,
>> what is the capacity of your system memory?
> 4GB RAM. And the system hangs after reading 3.8GB.
> I think this is not a coincidence.
> NB: swap is disabled (this might be relevant)

Looking through the log
https://pastebin.ubuntu.com/p/YSm82GxhNW/

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 6-...0: (13995 ticks this GP) idle=e16/1/0x4000000000000000 softirq=155/155 fqs=655
rcu: (detected by 4, t=576151 jiffies, g=-391, q=18)
Task dump for CPU 6:
dd R running task 0 677 671 0x00000002
Call trace:
__switch_to+0x174/0x1e0
ufshcd_queuecommand+0x84c/0x9a8

The task is in RUNNING state when it was scheduled out, so it should be a preemption (the path is under preemptible RCU).

I wonder why it was not scheduled back for so long that an RCU stall was triggered, and what was occupying the CPU all that time.

Would you please try to show all running tasks on all CPUs?

echo l > /proc/sysrq-trigger

In addition, since the RCU grace period didn't pass, a lot of other work could not move forward.

Thanks
Jianchao

^ permalink raw reply [flat|nested] 34+ messages in thread
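[Editor's note] The per-CPU task dumps quoted in this thread have a regular shape, and pulling out which task was running on which CPU when the stall fired can be scripted when working from a captured console log. A minimal sketch (the regexes are tailored to the dump format quoted above, not a general parser):

```python
import re

# "Task dump for CPU N:" introduces each per-CPU section of the stall report
CPU_RE = re.compile(r"Task dump for CPU (\d+):")
# The first line of a section names the task and its scheduler state, e.g.
# "dd              R  running task        0   677    671 0x00000002"
TASK_RE = re.compile(r"^\s*(\S+)\s+([A-Z]\S*)\s+running task", re.M)

def parse_task_dumps(log: str) -> dict:
    """Map CPU number -> (task name, state) from an RCU stall dump."""
    result = {}
    parts = CPU_RE.split(log)          # [prefix, cpu1, body1, cpu2, body2, ...]
    for cpu, body in zip(parts[1::2], parts[2::2]):
        m = TASK_RE.search(body)
        if m:
            result[int(cpu)] = (m.group(1), m.group(2))
    return result

# Sample taken from the dump quoted in this message
SAMPLE = """\
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: (detected by 4, t=576151 jiffies, g=-391, q=18)
Task dump for CPU 6:
dd              R  running task        0   677    671 0x00000002
Call trace:
 __switch_to+0x174/0x1e0
 ufshcd_queuecommand+0x84c/0x9a8
"""
```

Applied to the sample, this reports that CPU 6 was running `dd` in state `R`, which is exactly the observation Jianchao makes above.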
* Re: dd hangs when reading large partitions 2019-01-23 3:10 ` jianchao.wang @ 2019-02-06 16:16 ` Marc Gonzalez 2019-02-06 17:05 ` Marc Gonzalez 0 siblings, 1 reply; 34+ messages in thread
From: Marc Gonzalez @ 2019-02-06 16:16 UTC (permalink / raw)
To: Jianchao Wang, Christoph Hellwig, Jens Axboe
Cc: fsdevel, linux-block, SCSI, Joao Pinto, Jeffrey Hugo, Evan Green, Matthias Kaehlcke, Douglas Anderson, Stephen Boyd, Tomas Winkler, Adrian Hunter, Alim Akhtar, Avri Altman, Bart Van Assche, Martin Petersen, Bjorn Andersson

On 23/01/2019 04:10, jianchao.wang wrote:

> Looking through the log
> https://pastebin.ubuntu.com/p/YSm82GxhNW/
>
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu: 6-...0: (13995 ticks this GP) idle=e16/1/0x4000000000000000 softirq=155/155 fqs=655
> rcu: (detected by 4, t=576151 jiffies, g=-391, q=18)
> Task dump for CPU 6:
> dd R running task 0 677 671 0x00000002
> Call trace:
> __switch_to+0x174/0x1e0
> ufshcd_queuecommand+0x84c/0x9a8
>
> The task is in RUNNING state when it was scheduled out,
> so it should be a preemption (the path is under preemptible RCU).
>
> I wonder why it was not scheduled back for so long that an RCU stall was triggered,
> and what was occupying the CPU all that time.
>
> Would you please try to show all running tasks on all CPUs?
>
> echo l > /proc/sysrq-trigger
>
> In addition, since the RCU grace period didn't pass, a lot of other work could not move forward.
# echo w > /proc/sysrq-trigger
[ 517.619818] sysrq: Show Blocked State
[ 517.620542] task PC stack pid father

# echo l > /proc/sysrq-trigger
[ 530.692149] sysrq: Show backtrace of all active CPUs

# ps -o pid,user,stat,args
PID USER STAT COMMAND
1 root S init
2 root SW [kthreadd]
3 root IW< [rcu_gp]
4 root IW< [rcu_par_gp]
5 root IW [kworker/0:0-eve]
6 root IW< [kworker/0:0H]
7 root IW [kworker/u16:0-u]
8 root IW< [mm_percpu_wq]
9 root SW [ksoftirqd/0]
10 root IW [rcu_preempt]
11 root SW [migration/0]
12 root IW [kworker/0:1-rcu]
13 root SW [cpuhp/0]
14 root SW [cpuhp/1]
15 root SW [migration/1]
16 root SW [ksoftirqd/1]
17 root IW [kworker/1:0-eve]
18 root IW< [kworker/1:0H]
19 root SW [cpuhp/2]
20 root SW [migration/2]
21 root SW [ksoftirqd/2]
22 root IW [kworker/2:0-eve]
23 root IW< [kworker/2:0H]
24 root SW [cpuhp/3]
25 root SW [migration/3]
26 root SW [ksoftirqd/3]
27 root IW [kworker/3:0-eve]
28 root IW< [kworker/3:0H]
29 root SW [cpuhp/4]
30 root SW [migration/4]
31 root SW [ksoftirqd/4]
32 root IW [kworker/4:0-eve]
33 root IW< [kworker/4:0H-kb]
34 root SW [cpuhp/5]
35 root SW [migration/5]
36 root SW [ksoftirqd/5]
37 root IW [kworker/5:0-eve]
38 root IW< [kworker/5:0H]
39 root SW [cpuhp/6]
40 root SW [migration/6]
41 root SW [ksoftirqd/6]
42 root IW [kworker/6:0-eve]
43 root IW< [kworker/6:0H]
44 root SW [cpuhp/7]
45 root SW [migration/7]
46 root SW [ksoftirqd/7]
47 root IW [kworker/7:0-eve]
48 root IW< [kworker/7:0H]
49 root SW [kdevtmpfs]
50 root IW [kworker/u16:1-e]
52 root SW [rcu_tasks_kthre]
61 root IW [kworker/3:1-mm_]
68 root IW [kworker/5:1-eve]
76 root IW [kworker/6:1-eve]
77 root IW [kworker/7:1-mm_]
85 root IW [kworker/1:1-eve]
93 root IW [kworker/2:1-eve]
102 root IW [kworker/4:1-mm_]
202 root SW [khungtaskd]
203 root SW [oom_reaper]
204 root IW< [writeback]
206 root SW [kcompactd0]
208 root IW< [kblockd]
209 root IW< [devfreq_wq]
211 root IW [kworker/u16:2-e]
214 root IW [kworker/u16:3-e]
224 root IW [kworker/u16:4-e]
235 root IW [kworker/u16:5-e]
259 root IW [kworker/u16:6-e]
380 root SW [kswapd0]
509 root SW [scsi_eh_0]
510 root IW< [scsi_tmf_0]
511 root IW< [ufs_clk_gating_]
512 root IW< [ufs_clkscaling_]
551 root IW [kworker/u16:7-e]
635 root IW [kworker/u16:8-u]
656 root S /sbin/klogd -n
674 root S /bin/sh
675 root IW< [kworker/4:1H]
677 root R ps -o pid,user,stat,args

# dd if=/dev/sde of=/dev/null bs=1M & while true; do echo l > /proc/sysrq-trigger; sleep 1; done
https://pastebin.ubuntu.com/p/Sx4hXRbHby/

What information can I provide to help debug this issue?

Regards.

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: dd hangs when reading large partitions 2019-02-06 16:16 ` Marc Gonzalez @ 2019-02-06 17:05 ` Marc Gonzalez 2019-02-07 10:44 ` Marc Gonzalez 0 siblings, 1 reply; 34+ messages in thread
From: Marc Gonzalez @ 2019-02-06 17:05 UTC (permalink / raw)
To: Jianchao Wang, Christoph Hellwig, Jens Axboe
Cc: fsdevel, linux-block, SCSI, Joao Pinto, Jeffrey Hugo, Evan Green, Matthias Kaehlcke, Douglas Anderson, Stephen Boyd, Tomas Winkler, Adrian Hunter, Alim Akhtar, Avri Altman, Bart Van Assche, Martin Petersen, Bjorn Andersson

On 06/02/2019 17:16, Marc Gonzalez wrote:

> # dd if=/dev/sde of=/dev/null bs=1M & while true; do echo l > /proc/sysrq-trigger; sleep 1; done
> https://pastebin.ubuntu.com/p/Sx4hXRbHby/
>
> What other information can I provide to help debug this issue?

# dd if=/dev/sde of=/dev/null bs=1M & while true; do echo m > /proc/sysrq-trigger; echo; sleep 1; done
https://pastebin.ubuntu.com/p/tXTCK3Jypn/

Here are the first and last Mem-Info reports (identical fields removed).

First:

active_anon:1449 inactive_anon:3417 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
slab_reclaimable:1309 slab_unreclaimable:2174
mapped:465 shmem:4797 pagetables:14 bounce:0
free:979014 free_pcp:2068 free_cma:0
Node 0 active_anon:5960kB inactive_anon:13668kB active_file:0kB inactive_file:656kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2188kB dirty:0kB writeback:0kB shmem:19188kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
DMA32 free:1972788kB min:3956kB low:5924kB high:7892kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:2045952kB managed:1974524kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:1480kB local_pcp:1480kB free_cma:0kB
Normal free:1936864kB min:3992kB low:5976kB high:7960kB active_anon:6956kB inactive_anon:13668kB active_file:0kB inactive_file:5584kB unevictable:0kB writepending:0kB present:2068224kB managed:1986952kB mlocked:0kB kernel_stack:1456kB pagetables:76kB bounce:0kB free_pcp:6408kB local_pcp:612kB free_cma:0kB
DMA32: 1*4kB (M) 4*8kB (M) 3*16kB (M) 3*32kB (M) 4*64kB (M) 3*128kB (M) 5*256kB (UM) 5*512kB (UM) 6*1024kB (UM) 6*2048kB (UM) 476*4096kB (M) = 1972788kB
Normal: 2*4kB (E) 4*8kB (ME) 1*16kB (E) 2*32kB (ME) 2*64kB (ME) 2*128kB (UM) 0*256kB 3*512kB (UME) 1*1024kB (E) 1*2048kB (M) 471*4096kB (M) = 1934328kB
6974 total pagecache pages

Last:

active_anon:1762 inactive_anon:3403 isolated_anon:0
active_file:0 inactive_file:949699 isolated_file:0
slab_reclaimable:3465 slab_unreclaimable:2220
mapped:591 shmem:4798 pagetables:19 bounce:0
free:26601 free_pcp:2250 free_cma:0
Node 0 active_anon:7048kB inactive_anon:13612kB active_file:0kB inactive_file:3798796kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2364kB dirty:0kB writeback:0kB shmem:19192kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
DMA32 free:100488kB min:3956kB low:5924kB high:7892kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:1866424kB unevictable:0kB writepending:0kB present:2045952kB managed:1974524kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:2844kB local_pcp:1172kB free_cma:0kB
Normal free:5916kB min:3992kB low:5976kB high:7960kB active_anon:7048kB inactive_anon:13612kB active_file:0kB inactive_file:1932372kB unevictable:0kB writepending:0kB present:2068224kB managed:1986952kB mlocked:0kB kernel_stack:1472kB pagetables:76kB bounce:0kB free_pcp:6156kB local_pcp:516kB free_cma:0kB
DMA32: 2*4kB (UM) 2*8kB (M) 1*16kB (U) 1*32kB (U) 5*64kB (M) 6*128kB (UME) 6*256kB (UME) 5*512kB (ME) 7*1024kB (UME) 7*2048kB (UME) 18*4096kB (M) = 100488kB
Normal: 3*4kB (ME) 4*8kB (UE) 3*16kB (UME) 0*32kB 1*64kB (U) 1*128kB (M) 0*256kB 3*512kB (UME) 2*1024kB (UE) 1*2048kB (M) 0*4096kB = 5916kB
954552 total pagecache pages

'inactive_file' grows from 0 to 949699
'free' shrinks from 979014 to 26601 (probably 0 by the time the system hangs)
'total pagecache pages' grows from 6974 to 954552

Where are "pagecache pages" allocated? And freed?

Regards.

^ permalink raw reply [flat|nested] 34+ messages in thread
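[Editor's note] The top counters in these Mem-Info reports are in pages, not kB; on this board a page is 4 KiB, which ties them back to the per-node kB fields. A small sketch checking the deltas quoted above (949699 pages of inactive_file is exactly the inactive_file:3798796kB figure in the final Node 0 line):

```python
PAGE_KB = 4  # 4 KiB pages on this arm64 board, so 1 page = 4 kB

# Counter values (in pages) from the first and last Mem-Info snapshots above
first = {"inactive_file": 0,      "free": 979014, "pagecache": 6974}
last  = {"inactive_file": 949699, "free": 26601,  "pagecache": 954552}

def growth_kb(counter: str) -> int:
    """Growth of a Mem-Info counter between the two snapshots, in kB
    (negative means the counter shrank)."""
    return (last[counter] - first[counter]) * PAGE_KB
```

`growth_kb("inactive_file")` comes out to 3798796 kB, i.e. essentially all of the board's usable RAM ended up as clean file-backed page cache, which matches the amount by which `free` shrank.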
* Re: dd hangs when reading large partitions 2019-02-06 17:05 ` Marc Gonzalez @ 2019-02-07 10:44 ` Marc Gonzalez 2019-02-07 16:56 ` Marc Gonzalez 0 siblings, 1 reply; 34+ messages in thread
From: Marc Gonzalez @ 2019-02-07 10:44 UTC (permalink / raw)
To: linux-mm, linux-block
Cc: Jianchao Wang, Christoph Hellwig, Jens Axboe, fsdevel, SCSI, Joao Pinto, Jeffrey Hugo, Evan Green, Matthias Kaehlcke, Douglas Anderson, Stephen Boyd, Tomas Winkler, Adrian Hunter, Alim Akhtar, Avri Altman, Bart Van Assche, Martin Petersen, Bjorn Andersson, Ming Lei, Omar Sandoval, Roman Gushchin, Andrew Morton, Michal Hocko

+ linux-mm

Summarizing the issue for linux-mm readers:

If I read data from a storage device larger than my system's RAM, the system freezes once dd has read more data than available RAM.

# dd if=/dev/sde of=/dev/null bs=1M & while true; do echo m > /proc/sysrq-trigger; echo; echo; sleep 1; done
https://pastebin.ubuntu.com/p/HXzdqDZH4W/

A few seconds before the system hangs, Mem-Info shows:

[ 90.986784] Node 0 active_anon:7060kB inactive_anon:13644kB active_file:0kB inactive_file:3797500kB [...]

=> 3797500kB is basically all of RAM.

I tried to locate where "inactive_file" was being increased from, and saw two signatures:

[ 255.606019] __mod_node_page_state | __pagevec_lru_add_fn | pagevec_lru_move_fn | __lru_cache_add | lru_cache_add | add_to_page_cache_lru | mpage_readpages | blkdev_readpages | read_pages | __do_page_cache_readahead | ondemand_readahead | page_cache_sync_readahead

[ 255.637238] __mod_node_page_state | __pagevec_lru_add_fn | pagevec_lru_move_fn | __lru_cache_add | lru_cache_add | lru_cache_add_active_or_unevictable | __handle_mm_fault | handle_mm_fault | do_page_fault | do_translation_fault | do_mem_abort | el1_da

Are these expected?

NB: the system does not hang if I specify 'iflag=direct' to dd.

According to the RCU watchdog:

[ 108.466240] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 108.466420] rcu: 1-...0: (130 ticks this GP) idle=79e/1/0x4000000000000000 softirq=2393/2523 fqs=2626
[ 108.471436] rcu: (detected by 4, t=5252 jiffies, g=133, q=85)
[ 108.480605] Task dump for CPU 1:
[ 108.486483] kworker/1:1H R running task 0 680 2 0x0000002a
[ 108.489977] Workqueue: kblockd blk_mq_run_work_fn
[ 108.496908] Call trace:
[ 108.501513] __switch_to+0x174/0x1e0
[ 108.503757] blk_mq_run_work_fn+0x28/0x40
[ 108.507589] process_one_work+0x208/0x480
[ 108.511486] worker_thread+0x48/0x460
[ 108.515480] kthread+0x124/0x130
[ 108.519123] ret_from_fork+0x10/0x1c

Can anyone shed some light on what's going on?

Regards.

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: dd hangs when reading large partitions 2019-02-07 10:44 ` Marc Gonzalez @ 2019-02-07 16:56 ` Marc Gonzalez 2019-02-08 15:33 ` Marc Gonzalez 0 siblings, 1 reply; 34+ messages in thread
From: Marc Gonzalez @ 2019-02-07 16:56 UTC (permalink / raw)
To: linux-mm, linux-block
Cc: Jianchao Wang, Christoph Hellwig, Jens Axboe, fsdevel, SCSI, Joao Pinto, Jeffrey Hugo, Evan Green, Matthias Kaehlcke, Douglas Anderson, Stephen Boyd, Tomas Winkler, Adrian Hunter, Alim Akhtar, Avri Altman, Bart Van Assche, Martin Petersen, Bjorn Andersson, Ming Lei, Omar Sandoval, Roman Gushchin, Andrew Morton, Michal Hocko

On 07/02/2019 11:44, Marc Gonzalez wrote:

> + linux-mm
>
> Summarizing the issue for linux-mm readers:
>
> If I read data from a storage device larger than my system's RAM, the system freezes
> once dd has read more data than available RAM.
>
> # dd if=/dev/sde of=/dev/null bs=1M & while true; do echo m > /proc/sysrq-trigger; echo; echo; sleep 1; done
> https://pastebin.ubuntu.com/p/HXzdqDZH4W/
>
> A few seconds before the system hangs, Mem-Info shows:
>
> [ 90.986784] Node 0 active_anon:7060kB inactive_anon:13644kB active_file:0kB inactive_file:3797500kB [...]
>
> => 3797500kB is basically all of RAM.
>
> I tried to locate where "inactive_file" was being increased from, and saw two signatures:
>
> [ 255.606019] __mod_node_page_state | __pagevec_lru_add_fn | pagevec_lru_move_fn | __lru_cache_add | lru_cache_add | add_to_page_cache_lru | mpage_readpages | blkdev_readpages | read_pages | __do_page_cache_readahead | ondemand_readahead | page_cache_sync_readahead
>
> [ 255.637238] __mod_node_page_state | __pagevec_lru_add_fn | pagevec_lru_move_fn | __lru_cache_add | lru_cache_add | lru_cache_add_active_or_unevictable | __handle_mm_fault | handle_mm_fault | do_page_fault | do_translation_fault | do_mem_abort | el1_da
>
> Are these expected?
>
> NB: the system does not hang if I specify 'iflag=direct' to dd.
>
> According to the RCU watchdog:
>
> [ 108.466240] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 108.466420] rcu: 1-...0: (130 ticks this GP) idle=79e/1/0x4000000000000000 softirq=2393/2523 fqs=2626
> [ 108.471436] rcu: (detected by 4, t=5252 jiffies, g=133, q=85)
> [ 108.480605] Task dump for CPU 1:
> [ 108.486483] kworker/1:1H R running task 0 680 2 0x0000002a
> [ 108.489977] Workqueue: kblockd blk_mq_run_work_fn
> [ 108.496908] Call trace:
> [ 108.501513] __switch_to+0x174/0x1e0
> [ 108.503757] blk_mq_run_work_fn+0x28/0x40
> [ 108.507589] process_one_work+0x208/0x480
> [ 108.511486] worker_thread+0x48/0x460
> [ 108.515480] kthread+0x124/0x130
> [ 108.519123] ret_from_fork+0x10/0x1c
>
> Can anyone shed some light on what's going on?

Saw a slightly different report from another test run:
https://pastebin.ubuntu.com/p/jCywbKgRCq/

[ 340.689764] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 340.689992] rcu: 1-...0: (8548 ticks this GP) idle=c6e/1/0x4000000000000000 softirq=82/82 fqs=6
[ 340.694977] rcu: (detected by 5, t=5430 jiffies, g=-719, q=16)
[ 340.703803] Task dump for CPU 1:
[ 340.709507] dd R running task 0 675 673 0x00000002
[ 340.713018] Call trace:
[ 340.720059] __switch_to+0x174/0x1e0
[ 340.722192] 0xffffffc0f6dc9600

[ 352.689742] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 33s!
[ 352.689910] Showing busy workqueues and worker pools:
[ 352.696743] workqueue mm_percpu_wq: flags=0x8
[ 352.701753] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[ 352.706099] pending: vmstat_update

[ 384.693730] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 65s!
[ 384.693815] Showing busy workqueues and worker pools:
[ 384.700577] workqueue events: flags=0x0
[ 384.705699] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[ 384.709351] pending: vmstat_shepherd
[ 384.715587] workqueue mm_percpu_wq: flags=0x8
[ 384.719495] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[ 384.723754] pending: vmstat_update

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: dd hangs when reading large partitions 2019-02-07 16:56 ` Marc Gonzalez @ 2019-02-08 15:33 ` Marc Gonzalez 2019-02-08 15:49 ` Bart Van Assche 0 siblings, 1 reply; 34+ messages in thread
From: Marc Gonzalez @ 2019-02-08 15:33 UTC (permalink / raw)
To: linux-mm, linux-block
Cc: Jianchao Wang, Christoph Hellwig, Jens Axboe, fsdevel, SCSI, Joao Pinto, Jeffrey Hugo, Evan Green, Matthias Kaehlcke, Douglas Anderson, Stephen Boyd, Tomas Winkler, Adrian Hunter, Alim Akhtar, Avri Altman, Bart Van Assche, Martin Petersen, Bjorn Andersson, Ming Lei, Omar Sandoval, Roman Gushchin, Andrew Morton, Michal Hocko

On 07/02/2019 17:56, Marc Gonzalez wrote:

> Saw a slightly different report from another test run:
> https://pastebin.ubuntu.com/p/jCywbKgRCq/
>
> [ 340.689764] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 340.689992] rcu: 1-...0: (8548 ticks this GP) idle=c6e/1/0x4000000000000000 softirq=82/82 fqs=6
> [ 340.694977] rcu: (detected by 5, t=5430 jiffies, g=-719, q=16)
> [ 340.703803] Task dump for CPU 1:
> [ 340.709507] dd R running task 0 675 673 0x00000002
> [ 340.713018] Call trace:
> [ 340.720059] __switch_to+0x174/0x1e0
> [ 340.722192] 0xffffffc0f6dc9600
>
> [ 352.689742] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 33s!
> [ 352.689910] Showing busy workqueues and worker pools:
> [ 352.696743] workqueue mm_percpu_wq: flags=0x8
> [ 352.701753] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
> [ 352.706099] pending: vmstat_update
>
> [ 384.693730] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 65s!
> [ 384.693815] Showing busy workqueues and worker pools:
> [ 384.700577] workqueue events: flags=0x0
> [ 384.705699] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
> [ 384.709351] pending: vmstat_shepherd
> [ 384.715587] workqueue mm_percpu_wq: flags=0x8
> [ 384.719495] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
> [ 384.723754] pending: vmstat_update

Running 'dd if=/dev/sda of=/dev/null bs=40M status=progress' I got a slightly different splat:

[ 171.513944] INFO: task dd:674 blocked for more than 23 seconds.
[ 171.514131] Tainted: G S 5.0.0-rc5-next-20190206 #23
[ 171.518784] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 171.525728] dd D 0 674 672 0x00000000
[ 171.533525] Call trace:
[ 171.538926] __switch_to+0x174/0x1e0
[ 171.541237] __schedule+0x1e4/0x630
[ 171.545041] schedule+0x34/0x90
[ 171.548261] io_schedule+0x20/0x40
[ 171.551401] blk_mq_get_tag+0x178/0x320
[ 171.554852] blk_mq_get_request+0x13c/0x3e0
[ 171.558587] blk_mq_make_request+0xcc/0x640
[ 171.562763] generic_make_request+0x1d4/0x390
[ 171.566924] submit_bio+0x5c/0x1c0
[ 171.571447] mpage_readpages+0x178/0x1d0
[ 171.574730] blkdev_readpages+0x3c/0x50
[ 171.578831] read_pages+0x70/0x180
[ 171.582364] __do_page_cache_readahead+0x1cc/0x200
[ 171.585843] ondemand_readahead+0x148/0x310
[ 171.590613] page_cache_async_readahead+0xc0/0x100
[ 171.594719] generic_file_read_iter+0x54c/0x860
[ 171.599565] blkdev_read_iter+0x50/0x80
[ 171.603998] __vfs_read+0x134/0x190
[ 171.607800] vfs_read+0x94/0x130
[ 171.611273] ksys_read+0x6c/0xe0
[ 171.614745] __arm64_sys_read+0x24/0x30
[ 171.617974] el0_svc_handler+0xb8/0x140
[ 171.621509] el0_svc+0x8/0xc

For the record, I'll restate the problem: dd hangs when reading a partition larger than RAM, except when using iflag=direct or iflag=nocache.

# dd if=/dev/sde of=/dev/null bs=64M iflag=direct
64+0 records in
64+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 51.1532 s, 84.0 MB/s

# dd if=/dev/sde of=/dev/null bs=64M iflag=nocache
64+0 records in
64+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 60.6478 s, 70.8 MB/s

# dd if=/dev/sde of=/dev/null bs=64M count=56
56+0 records in
56+0 records out
3758096384 bytes (3.8 GB, 3.5 GiB) copied, 50.5897 s, 74.3 MB/s

# dd if=/dev/sde of=/dev/null bs=64M
/*** CONSOLE LOCKS UP ***/

I've been looking at the differences between iflag=direct and no-flag, using the following script to enable relevant(?) logs:

mount -t debugfs nodev /sys/kernel/debug/
cd /sys/kernel/debug/tracing/events
echo 1 > filemap/enable
echo 1 > pagemap/enable
echo 1 > vmscan/enable
echo 1 > kmem/mm_page_free/enable
echo 1 > kmem/mm_page_free_batched/enable
echo 1 > kmem/mm_page_alloc/enable
echo 1 > kmem/mm_page_alloc_zone_locked/enable
echo 1 > kmem/mm_page_pcpu_drain/enable
echo 1 > kmem/mm_page_alloc_extfrag/enable
echo 1 > kmem/kmalloc_node/enable
echo 1 > kmem/kmem_cache_alloc_node/enable
echo 1 > kmem/kmem_cache_alloc/enable
echo 1 > kmem/kmem_cache_free/enable

# dd if=/dev/sde of=/dev/null bs=64M count=1 iflag=direct
https://pastebin.ubuntu.com/p/YWp4pydM6V/ (114942 lines)

# dd if=/dev/sde of=/dev/null bs=64M count=1
https://pastebin.ubuntu.com/p/xpzgN5H3Hp/ (247439 lines)

Does anyone see what's going sideways in the no-flag case?

Regards.

^ permalink raw reply [flat|nested] 34+ messages in thread
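[Editor's note] For readers unfamiliar with the two flags that avoid the hang: iflag=direct opens the device with O_DIRECT, bypassing the page cache entirely (but requiring block-aligned buffers), while GNU dd's iflag=nocache reads through the page cache normally and then uses posix_fadvise(POSIX_FADV_DONTNEED) to ask the kernel to drop the pages it has consumed. A minimal sketch of the nocache pattern (illustrative only; it shows what the flag requests of the kernel, not the fix eventually found in this thread):

```python
import os
import tempfile

def read_dropping_cache(path: str, chunk: int = 1 << 20) -> int:
    """Read a file sequentially, advising the kernel after each chunk that
    its pages are no longer needed (the pattern behind dd's iflag=nocache)."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            buf = os.read(fd, chunk)
            if not buf:
                break
            # Advise-only: the kernel is free to drop the pages we just read,
            # so the page cache does not grow without bound during the scan.
            os.posix_fadvise(fd, total, len(buf), os.POSIX_FADV_DONTNEED)
            total += len(buf)
    finally:
        os.close(fd)
    return total
```

Either approach keeps inactive_file from swallowing all of RAM during a full-device scan, which is consistent with the observation above that only the plain buffered read hangs.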
* Re: dd hangs when reading large partitions 2019-02-08 15:33 ` Marc Gonzalez @ 2019-02-08 15:49 ` Bart Van Assche 2019-02-09 11:57 ` Marc Gonzalez 2019-02-11 16:36 ` Marc Gonzalez 0 siblings, 2 replies; 34+ messages in thread
From: Bart Van Assche @ 2019-02-08 15:49 UTC (permalink / raw)
To: Marc Gonzalez, linux-mm, linux-block
Cc: Jianchao Wang, Christoph Hellwig, Jens Axboe, fsdevel, SCSI, Joao Pinto, Jeffrey Hugo, Evan Green, Matthias Kaehlcke, Douglas Anderson, Stephen Boyd, Tomas Winkler, Adrian Hunter, Alim Akhtar, Avri Altman, Bart Van Assche, Martin Petersen, Bjorn Andersson, Ming Lei, Omar Sandoval, Roman Gushchin, Andrew Morton, Michal Hocko

On Fri, 2019-02-08 at 16:33 +0100, Marc Gonzalez wrote:
> Does anyone see what's going sideways in the no-flag case?

Hi Marc,

Does this problem only occur with block devices backed by the UFS driver or does this problem also occur with other block drivers?

Thanks,

Bart.

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: dd hangs when reading large partitions 2019-02-08 15:49 ` Bart Van Assche @ 2019-02-09 11:57 ` Marc Gonzalez 2019-02-11 16:36 ` Marc Gonzalez 1 sibling, 0 replies; 34+ messages in thread
From: Marc Gonzalez @ 2019-02-09 11:57 UTC (permalink / raw)
To: Bart Van Assche, linux-mm, linux-block
Cc: Jianchao Wang, Christoph Hellwig, Jens Axboe, fsdevel, SCSI, Joao Pinto, Jeffrey Hugo, Evan Green, Matthias Kaehlcke, Douglas Anderson, Stephen Boyd, Tomas Winkler, Adrian Hunter, Alim Akhtar, Avri Altman, Bart Van Assche, Martin Petersen, Bjorn Andersson, Ming Lei, Omar Sandoval, Roman Gushchin, Andrew Morton, Michal Hocko

On 08/02/2019 16:49, Bart Van Assche wrote:
> On Fri, 2019-02-08 at 16:33 +0100, Marc Gonzalez wrote:
>
>> Does anyone see what's going sideways in the no-flag case?
>
> Does this problem only occur with block devices backed by the UFS driver
> or does this problem also occur with other block drivers?

So far, I've only been able to test with UFS storage. The board has no PATA/SATA. SDHC is not supported yet. With Jeffrey's help, I was able to get a semi-functional USB3 stack running. I'll test USB3 mass storage on Monday.

FWIW, I removed most (all?) locks from the UFSHC driver, by dropping scaling and gating support. I could also drop runtime suspend, if someone thinks that could help, but I'm thinking the problem might be in the mm or block layers. (It doesn't look like a locking problem, but more like a memory-exhaustion problem.)

Regards.

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: dd hangs when reading large partitions 2019-02-08 15:49 ` Bart Van Assche 2019-02-09 11:57 ` Marc Gonzalez @ 2019-02-11 16:36 ` Marc Gonzalez 2019-02-11 17:27 ` Marc Gonzalez 1 sibling, 1 reply; 34+ messages in thread
From: Marc Gonzalez @ 2019-02-11 16:36 UTC (permalink / raw)
To: Bart Van Assche, linux-mm, linux-block
Cc: Jianchao Wang, Christoph Hellwig, Jens Axboe, fsdevel, SCSI, Jeffrey Hugo, Evan Green, Matthias Kaehlcke, Douglas Anderson, Stephen Boyd, Tomas Winkler, Adrian Hunter, Bart Van Assche, Martin Petersen, Bjorn Andersson, Ming Lei, Omar Sandoval, Roman Gushchin, Andrew Morton, Michal Hocko, James Bottomley

On 08/02/2019 16:49, Bart Van Assche wrote:

> Does this problem only occur with block devices backed by the UFS driver
> or does this problem also occur with other block drivers?

Yes, same issue with a USB3 mass storage device:

usb 2-1: new SuperSpeed Gen 1 USB device number 2 using xhci-hcd
usb 2-1: New USB device found, idVendor=05dc, idProduct=a838, bcdDevice=11.00
usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 2-1: Product: USB Flash Drive
usb 2-1: Manufacturer: Lexar
usb 2-1: SerialNumber: AAYW2W7I13BAR0JC
usb-storage 2-1:1.0: USB Mass Storage device detected
scsi host0: usb-storage 2-1:1.0
scsi 0:0:0:0: Direct-Access Lexar USB Flash Drive 1100 PQ: 0 ANSI: 6
sd 0:0:0:0: [sda] 62517248 512-byte logical blocks: (32.0 GB/29.8 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 43 00 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: sda1
sd 0:0:0:0: [sda] Attached SCSI removable disk

# dd if=/dev/sda of=/dev/null bs=1M status=progress
3879731200 bytes (3.9 GB, 3.6 GiB) copied, 56.0097 s, 69.3 MB/s

This definitively rules out drivers/scsi/ufs.
(Dropping UFS people)

So the problem could be in SCSI glue, or block, or mm?
How can I pinpoint the bug?

Problem statement and logs:
https://lore.kernel.org/linux-block/66419195-594c-aa83-c19d-f091ad3b296d@free.fr/

Regards.

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: dd hangs when reading large partitions 2019-02-11 16:36 ` Marc Gonzalez @ 2019-02-11 17:27 ` Marc Gonzalez 2019-02-12 15:26 ` [SOLVED] " Marc Gonzalez 0 siblings, 1 reply; 34+ messages in thread
From: Marc Gonzalez @ 2019-02-11 17:27 UTC (permalink / raw)
To: Bart Van Assche, linux-mm, linux-block
Cc: Jianchao Wang, Christoph Hellwig, Jens Axboe, fsdevel, SCSI, Jeffrey Hugo, Evan Green, Matthias Kaehlcke, Douglas Anderson, Stephen Boyd, Tomas Winkler, Adrian Hunter, Bart Van Assche, Martin Petersen, Bjorn Andersson, Ming Lei, Omar Sandoval, Roman Gushchin, Andrew Morton, Michal Hocko, James Bottomley

On 11/02/2019 17:36, Marc Gonzalez wrote:
> On 08/02/2019 16:49, Bart Van Assche wrote:
>
>> Does this problem only occur with block devices backed by the UFS driver
>> or does this problem also occur with other block drivers?
>
> Yes, same issue with a USB3 mass storage device:
>
> usb 2-1: new SuperSpeed Gen 1 USB device number 2 using xhci-hcd
> usb 2-1: New USB device found, idVendor=05dc, idProduct=a838, bcdDevice=11.00
> usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
> usb 2-1: Product: USB Flash Drive
> usb 2-1: Manufacturer: Lexar
> usb 2-1: SerialNumber: AAYW2W7I13BAR0JC
> usb-storage 2-1:1.0: USB Mass Storage device detected
> scsi host0: usb-storage 2-1:1.0
> scsi 0:0:0:0: Direct-Access Lexar USB Flash Drive 1100 PQ: 0 ANSI: 6
> sd 0:0:0:0: [sda] 62517248 512-byte logical blocks: (32.0 GB/29.8 GiB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 43 00 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sda: sda1
> sd 0:0:0:0: [sda] Attached SCSI removable disk
>
> # dd if=/dev/sda of=/dev/null bs=1M status=progress
> 3879731200 bytes (3.9 GB, 3.6 GiB) copied, 56.0097 s, 69.3 MB/s
>
> So the problem could be in SCSI glue, or block, or mm?

Unlikely. Someone else would have been affected...

A colleague pointed out that some memory areas are reserved downstream. Perhaps the FW goes haywire once the kernel touches reserved memory? I'm off to test that hypothesis.

Regards.

^ permalink raw reply [flat|nested] 34+ messages in thread
* [SOLVED] dd hangs when reading large partitions 2019-02-11 17:27 ` Marc Gonzalez @ 2019-02-12 15:26 ` Marc Gonzalez 0 siblings, 0 replies; 34+ messages in thread
From: Marc Gonzalez @ 2019-02-12 15:26 UTC (permalink / raw)
To: Bart Van Assche, linux-mm, linux-block
Cc: Jianchao Wang, Christoph Hellwig, Jens Axboe, fsdevel, SCSI, Jeffrey Hugo, Evan Green, Matthias Kaehlcke, Douglas Anderson, Stephen Boyd, Tomas Winkler, Adrian Hunter, Bart Van Assche, Martin Petersen, Bjorn Andersson, Ming Lei, Omar Sandoval, Roman Gushchin, Andrew Morton, Michal Hocko, James Bottomley

On 11/02/2019 18:27, Marc Gonzalez wrote:

> A colleague pointed out that some memory areas are reserved downstream.
> Perhaps the FW goes haywire once the kernel touches reserved memory?

Bingo! FW quirk.
https://patchwork.kernel.org/patch/10808173/

Once the reserved memory range is extended, I am finally able to read large partitions:

# dd if=/dev/sda of=/dev/null bs=1M
55256+0 records in
55256+0 records out
57940115456 bytes (58 GB, 54 GiB) copied, 786.165 s, 73.7 MB/s

Thanks to everyone who provided suggestions and guidance.

Regards.

^ permalink raw reply [flat|nested] 34+ messages in thread
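[Editor's note] As a quick sanity check on the numbers reported in this thread, dd's rate figure is just bytes divided by elapsed seconds, in decimal megabytes:

```python
def dd_rate_mb_s(nbytes: int, seconds: float) -> float:
    """Throughput in decimal MB/s, the unit dd reports."""
    return nbytes / seconds / 1e6

# Figures from the final successful buffered run (about 73.7 MB/s)
final_run = dd_rate_mb_s(57940115456, 786.165)
```

The same arithmetic reproduces the 83.1 MB/s figure from the O_DIRECT run in the opening message (57940115456 bytes over 697.575 s), so the buffered read after the fix runs about 11% slower than direct I/O on this device.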
* Re: dd hangs when reading large partitions 2019-01-18 12:10 dd hangs when reading large partitions Marc Gonzalez 2019-01-18 13:39 ` Ming Lei 2019-01-18 15:18 ` jianchao.wang @ 2019-01-18 19:27 ` Douglas Gilbert 2 siblings, 0 replies; 34+ messages in thread
From: Douglas Gilbert @ 2019-01-18 19:27 UTC (permalink / raw)
To: Marc Gonzalez, fsdevel, linux-block
Cc: SCSI, Alexander Viro, Jan Kara, Christoph Hellwig, Joao Pinto, Jens Axboe, Fujita Tomonori, Paolo Valente

On 2019-01-18 7:10 a.m., Marc Gonzalez wrote:
> Hello,
>
> I'm running into an issue which I don't know how to debug.
> So I'm open to ideas and suggestions :-)
>
> On my arm64 board, I have enabled Universal Flash Storage support.
>
> I wanted to benchmark read performance, and noticed that the system
> locks up when I read partitions larger than 3.5 GB, unless I tell
> dd to use direct IO:

Marc,

If you want to benchmark (or torture) UFS read performance, I have many dd variants. In the sg3_utils package there are sgp_dd (using POSIX threads for a multi-threaded copy or read) and sgm_dd (using mmap-ed IO). Plus there is the multi-platform ddpt in a package of the same name.

One major difference between my dd variants and the "standard" one is that I split the bs=BS option in two: one called bs=BS, where BS is the logical block size of the given device, the other called bpt=BPT (blocks per transfer), which is the number of logical blocks in each copy (or read) segment. So with your example below, bs=1M would become, for a LB size of 4096, 'bs=4096 bpt=16k'.

Also, sgp_dd and sgm_dd don't support status=progress (ddpt does), but you can always send 'kill -s USR1 <pid_of_dd>' from another (virtual) console that has root permissions. All my dd variants, and dd itself, will accept that signal gracefully, output a progress report, then continue.

Doug Gilbert

> *** WITH O_DIRECT ***
> # dd if=/dev/sda of=/dev/null bs=1M iflag=direct status=progress
> 57892929536 bytes (58 GB, 54 GiB) copied, 697.006 s, 83.1 MB/s
> 55256+0 records in
> 55256+0 records out
> 57940115456 bytes (58 GB, 54 GiB) copied, 697.575 s, 83.1 MB/s
>
> *** WITHOUT O_DIRECT ***
> # dd if=/dev/sda of=/dev/null bs=1M status=progress
> 3853516800 bytes (3.9 GB, 3.6 GiB) copied, 49.0002 s, 78.6 MB/s
>
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu: 1-...0: (8242 ticks this GP) idle=106/1/0x4000000000000000 softirq=168/171 fqs=2626
> rcu: 6-...0: (99 GPs behind) idle=ec2/1/0x4000000000000000 softirq=71/71 fqs=2626
> rcu: (detected by 7, t=5254 jiffies, g=-275, q=2)
> Task dump for CPU 1:
> kworker/1:1H R running task 0 675 2 0x0000002a
> Workqueue: kblockd blk_mq_run_work_fn
> Call trace:
> __switch_to+0x168/0x1d0
> 0xffffffc0f6efbbc8
> blk_mq_run_work_fn+0x28/0x40
> process_one_work+0x208/0x470
> worker_thread+0x48/0x460
> kthread+0x128/0x130
> ret_from_fork+0x10/0x1c
> Task dump for CPU 6:
> kthreadd R running task 0 2 0 0x0000002a
> Call trace:
> __switch_to+0x168/0x1d0
> 0x5b36396f4e7d4000
>
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu: 1-...0: (8242 ticks this GP) idle=106/1/0x4000000000000000 softirq=168/171 fqs=10500
> rcu: 6-...0: (99 GPs behind) idle=ec2/1/0x4000000000000000 softirq=71/71 fqs=10500
> rcu: (detected by 7, t=21009 jiffies, g=-275, q=2)
> Task dump for CPU 1:
> kworker/1:1H R running task 0 675 2 0x0000002a
> Workqueue: kblockd blk_mq_run_work_fn
> Call trace:
> __switch_to+0x168/0x1d0
> 0xffffffc0f6efbbc8
> blk_mq_run_work_fn+0x28/0x40
> process_one_work+0x208/0x470
> worker_thread+0x48/0x460
> kthread+0x128/0x130
> ret_from_fork+0x10/0x1c
> Task dump for CPU 6:
> kthreadd R running task 0 2 0 0x0000002a
> Call trace:
> __switch_to+0x168/0x1d0
> 0x5b36396f4e7d4000
>
> The system always hangs around the 3.6 GiB mark, wherever I start from.
> How can I debug this issue?
>
> Regards.

^ permalink raw reply [flat|nested] 34+ messages in thread
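[Editor's note] The bs/bpt split described above is just a factoring of the copy segment size into logical blocks. A minimal sketch of the conversion (illustrative only, not part of sg3_utils; note that with a 4096-byte logical block, 'bpt=16k' corresponds to a 64 MiB segment, while the bs=1M example would correspond to bpt=256):

```python
def split_bs(segment_bytes: int, lb_size: int) -> tuple:
    """Translate GNU dd's bs=SEGMENT into the sgp_dd/sgm_dd/ddpt style
    'bs=LB bpt=N', where N logical blocks form one copy segment."""
    if segment_bytes % lb_size:
        raise ValueError("segment must be a whole number of logical blocks")
    return lb_size, segment_bytes // lb_size
```

For example, split_bs(1 << 20, 4096) yields (4096, 256), i.e. 'bs=4096 bpt=256' for a 1 MiB segment.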
end of thread, other threads:[~2019-02-12 15:27 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-18 12:10 dd hangs when reading large partitions Marc Gonzalez
2019-01-18 13:39 ` Ming Lei
2019-01-18 14:54 ` Marc Gonzalez
2019-01-18 15:18 ` jianchao.wang
2019-01-18 17:38 ` Marc Gonzalez
2019-01-18 17:48 ` Jens Axboe
2019-01-18 17:51 ` Bart Van Assche
2019-01-18 19:00 ` Jens Axboe
2019-01-19 9:56 ` Christoph Hellwig
2019-01-19 14:37 ` Jens Axboe
2019-01-19 16:09 ` Bart Van Assche
2019-01-21 8:33 ` Christoph Hellwig
2019-01-19 19:47 ` Marc Gonzalez
2019-01-19 20:45 ` Marc Gonzalez
2019-01-21 8:33 ` Christoph Hellwig
2019-01-21 15:22 ` Marc Gonzalez
2019-01-22 3:12 ` jianchao.wang
2019-01-22 10:59 ` Marc Gonzalez
2019-01-22 12:49 ` Marc Gonzalez
2019-01-22 16:17 ` Marc Gonzalez
2019-01-22 16:22 ` Greg Kroah-Hartman
2019-01-22 19:07 ` Evan Green
2019-01-23 3:10 ` jianchao.wang
2019-02-06 16:16 ` Marc Gonzalez
2019-02-06 17:05 ` Marc Gonzalez
2019-02-07 10:44 ` Marc Gonzalez
2019-02-07 16:56 ` Marc Gonzalez
2019-02-08 15:33 ` Marc Gonzalez
2019-02-08 15:49 ` Bart Van Assche
2019-02-09 11:57 ` Marc Gonzalez
2019-02-11 16:36 ` Marc Gonzalez
2019-02-11 17:27 ` Marc Gonzalez
2019-02-12 15:26 ` [SOLVED] " Marc Gonzalez
2019-01-18 19:27 ` Douglas Gilbert