linux-block.vger.kernel.org archive mirror
* dd hangs when reading large partitions
@ 2019-01-18 12:10 Marc Gonzalez
  2019-01-18 13:39 ` Ming Lei
                   ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Marc Gonzalez @ 2019-01-18 12:10 UTC (permalink / raw)
  To: fsdevel, linux-block
  Cc: SCSI, Alexander Viro, Jan Kara, Christoph Hellwig, Joao Pinto,
	Jens Axboe, Fujita Tomonori, Paolo Valente

Hello,

I'm running into an issue which I don't know how to debug.
So I'm open to ideas and suggestions :-)

On my arm64 board, I have enabled Universal Flash Storage support.

I wanted to benchmark read performance, and noticed that the system
locks up when I read partitions larger than 3.5 GB, unless I tell
dd to use direct IO:

*** WITH O_DIRECT ***
# dd if=/dev/sda of=/dev/null bs=1M iflag=direct status=progress
57892929536 bytes (58 GB, 54 GiB) copied, 697.006 s, 83.1 MB/s 
55256+0 records in
55256+0 records out
57940115456 bytes (58 GB, 54 GiB) copied, 697.575 s, 83.1 MB/s

*** WITHOUT O_DIRECT ***
# dd if=/dev/sda of=/dev/null bs=1M status=progress
3853516800 bytes (3.9 GB, 3.6 GiB) copied, 49.0002 s, 78.6 MB/s


rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu:    1-...0: (8242 ticks this GP) idle=106/1/0x4000000000000000 softirq=168/171 fqs=2626 
rcu:    6-...0: (99 GPs behind) idle=ec2/1/0x4000000000000000 softirq=71/71 fqs=2626 
rcu:    (detected by 7, t=5254 jiffies, g=-275, q=2)
Task dump for CPU 1:
kworker/1:1H    R  running task        0   675      2 0x0000002a
Workqueue: kblockd blk_mq_run_work_fn
Call trace:
 __switch_to+0x168/0x1d0
 0xffffffc0f6efbbc8
 blk_mq_run_work_fn+0x28/0x40
 process_one_work+0x208/0x470
 worker_thread+0x48/0x460
 kthread+0x128/0x130
 ret_from_fork+0x10/0x1c
Task dump for CPU 6:
kthreadd        R  running task        0     2      0 0x0000002a
Call trace:
 __switch_to+0x168/0x1d0
 0x5b36396f4e7d4000


rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu:    1-...0: (8242 ticks this GP) idle=106/1/0x4000000000000000 softirq=168/171 fqs=10500 
rcu:    6-...0: (99 GPs behind) idle=ec2/1/0x4000000000000000 softirq=71/71 fqs=10500 
rcu:    (detected by 7, t=21009 jiffies, g=-275, q=2)
Task dump for CPU 1:
kworker/1:1H    R  running task        0   675      2 0x0000002a
Workqueue: kblockd blk_mq_run_work_fn
Call trace:
 __switch_to+0x168/0x1d0
 0xffffffc0f6efbbc8
 blk_mq_run_work_fn+0x28/0x40
 process_one_work+0x208/0x470
 worker_thread+0x48/0x460
 kthread+0x128/0x130
 ret_from_fork+0x10/0x1c
Task dump for CPU 6:
kthreadd        R  running task        0     2      0 0x0000002a
Call trace:
 __switch_to+0x168/0x1d0
 0x5b36396f4e7d4000


The system always hangs around the 3.6 GiB mark, wherever I start from.
How can I debug this issue?

Regards.


* Re: dd hangs when reading large partitions
  2019-01-18 12:10 dd hangs when reading large partitions Marc Gonzalez
@ 2019-01-18 13:39 ` Ming Lei
  2019-01-18 14:54   ` Marc Gonzalez
  2019-01-18 15:18 ` jianchao.wang
  2019-01-18 19:27 ` Douglas Gilbert
  2 siblings, 1 reply; 34+ messages in thread
From: Ming Lei @ 2019-01-18 13:39 UTC (permalink / raw)
  To: Marc Gonzalez
  Cc: fsdevel, linux-block, SCSI, Alexander Viro, Jan Kara,
	Christoph Hellwig, Joao Pinto, Jens Axboe, Fujita Tomonori,
	Paolo Valente

On Fri, Jan 18, 2019 at 8:11 PM Marc Gonzalez <marc.w.gonzalez@free.fr> wrote:
>
> Hello,
>
> I'm running into an issue which I don't know how to debug.
> So I'm open to ideas and suggestions :-)
>
> On my arm64 board, I have enabled Universal Flash Storage support.
>
> I wanted to benchmark read performance, and noticed that the system
> locks up when I read partitions larger than 3.5 GB, unless I tell
> dd to use direct IO:
>
> *** WITH O_DIRECT ***
> # dd if=/dev/sda of=/dev/null bs=1M iflag=direct status=progress
> 57892929536 bytes (58 GB, 54 GiB) copied, 697.006 s, 83.1 MB/s
> 55256+0 records in
> 55256+0 records out
> 57940115456 bytes (58 GB, 54 GiB) copied, 697.575 s, 83.1 MB/s
>
> *** WITHOUT O_DIRECT ***
> # dd if=/dev/sda of=/dev/null bs=1M status=progress
> 3853516800 bytes (3.9 GB, 3.6 GiB) copied, 49.0002 s, 78.6 MB/s
>
>
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu:    1-...0: (8242 ticks this GP) idle=106/1/0x4000000000000000 softirq=168/171 fqs=2626
> rcu:    6-...0: (99 GPs behind) idle=ec2/1/0x4000000000000000 softirq=71/71 fqs=2626
> rcu:    (detected by 7, t=5254 jiffies, g=-275, q=2)
> Task dump for CPU 1:
> kworker/1:1H    R  running task        0   675      2 0x0000002a
> Workqueue: kblockd blk_mq_run_work_fn
> Call trace:
>  __switch_to+0x168/0x1d0
>  0xffffffc0f6efbbc8
>  blk_mq_run_work_fn+0x28/0x40
>  process_one_work+0x208/0x470
>  worker_thread+0x48/0x460
>  kthread+0x128/0x130
>  ret_from_fork+0x10/0x1c
> Task dump for CPU 6:
> kthreadd        R  running task        0     2      0 0x0000002a
> Call trace:
>  __switch_to+0x168/0x1d0
>  0x5b36396f4e7d4000
>
>
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu:    1-...0: (8242 ticks this GP) idle=106/1/0x4000000000000000 softirq=168/171 fqs=10500
> rcu:    6-...0: (99 GPs behind) idle=ec2/1/0x4000000000000000 softirq=71/71 fqs=10500
> rcu:    (detected by 7, t=21009 jiffies, g=-275, q=2)
> Task dump for CPU 1:
> kworker/1:1H    R  running task        0   675      2 0x0000002a
> Workqueue: kblockd blk_mq_run_work_fn
> Call trace:
>  __switch_to+0x168/0x1d0
>  0xffffffc0f6efbbc8
>  blk_mq_run_work_fn+0x28/0x40
>  process_one_work+0x208/0x470
>  worker_thread+0x48/0x460
>  kthread+0x128/0x130
>  ret_from_fork+0x10/0x1c
> Task dump for CPU 6:
> kthreadd        R  running task        0     2      0 0x0000002a
> Call trace:
>  __switch_to+0x168/0x1d0
>  0x5b36396f4e7d4000
>
>
> The system always hangs around the 3.6 GiB mark, wherever I start from.
> How can I debug this issue?

Maybe you can try to collect debugfs log first via the following command?

(cd /sys/kernel/debug/block/sda && find . -type f -exec grep -aH . {} \;)
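
(The per-hctx entries such as 'tags', 'busy' and 'dispatch' are usually the
most telling: a request stuck in flight should show up there as busy tags or
a non-empty dispatch list.)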

BTW, I have several questions about this report:

- what is the kernel version in your test?
- can you reproduce this issue on another disk (not UFS)?
- are there any tasks in 'D' state shown via 'ps -ax'? If yes, please dump their
stack traces.
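
(If ps can't show kernel stacks, 'cat /proc/<pid>/stack' gives a single
task's kernel stack trace when CONFIG_STACKTRACE is enabled, and
'echo w > /proc/sysrq-trigger' dumps every task in uninterruptible sleep
to the kernel log.)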

Thanks,
Ming Lei


* Re: dd hangs when reading large partitions
  2019-01-18 13:39 ` Ming Lei
@ 2019-01-18 14:54   ` Marc Gonzalez
  0 siblings, 0 replies; 34+ messages in thread
From: Marc Gonzalez @ 2019-01-18 14:54 UTC (permalink / raw)
  To: Ming Lei, fsdevel, linux-block
  Cc: SCSI, Alexander Viro, Jan Kara, Christoph Hellwig, Joao Pinto,
	Jens Axboe, Fujita Tomonori, Paolo Valente

On 18/01/2019 14:39, Ming Lei wrote:

> On Fri, Jan 18, 2019 at 8:11 PM Marc Gonzalez wrote:
> 
>> I'm running into an issue which I don't know how to debug.
>> So I'm open to ideas and suggestions :-)
>>
>> On my arm64 board, I have enabled Universal Flash Storage support.
>>
>> I wanted to benchmark read performance, and noticed that the system
>> locks up when I read partitions larger than 3.5 GB, unless I tell
>> dd to use direct IO:
>>
>> *** WITH O_DIRECT ***
>> # dd if=/dev/sda of=/dev/null bs=1M iflag=direct status=progress
>> 57892929536 bytes (58 GB, 54 GiB) copied, 697.006 s, 83.1 MB/s
>> 55256+0 records in
>> 55256+0 records out
>> 57940115456 bytes (58 GB, 54 GiB) copied, 697.575 s, 83.1 MB/s
>>
>> *** WITHOUT O_DIRECT ***
>> # dd if=/dev/sda of=/dev/null bs=1M status=progress
>> 3853516800 bytes (3.9 GB, 3.6 GiB) copied, 49.0002 s, 78.6 MB/s
>>
>>
>> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>> rcu:    1-...0: (8242 ticks this GP) idle=106/1/0x4000000000000000 softirq=168/171 fqs=2626
>> rcu:    6-...0: (99 GPs behind) idle=ec2/1/0x4000000000000000 softirq=71/71 fqs=2626
>> rcu:    (detected by 7, t=5254 jiffies, g=-275, q=2)
>> Task dump for CPU 1:
>> kworker/1:1H    R  running task        0   675      2 0x0000002a
>> Workqueue: kblockd blk_mq_run_work_fn
>> Call trace:
>>  __switch_to+0x168/0x1d0
>>  0xffffffc0f6efbbc8
>>  blk_mq_run_work_fn+0x28/0x40
>>  process_one_work+0x208/0x470
>>  worker_thread+0x48/0x460
>>  kthread+0x128/0x130
>>  ret_from_fork+0x10/0x1c
>> Task dump for CPU 6:
>> kthreadd        R  running task        0     2      0 0x0000002a
>> Call trace:
>>  __switch_to+0x168/0x1d0
>>  0x5b36396f4e7d4000
>>
>>
>> The system always hangs around the 3.6 GiB mark, wherever I start from.
>> How can I debug this issue?
> 
> Maybe you can try to collect debugfs log first via the following command?
> 
> (cd /sys/kernel/debug/block/sda && find . -type f -exec grep -aH . {} \;)

# find . -type f -exec grep -aH . {} \;
./hctx0/cpu7/completed:0 0
./hctx0/cpu7/merged:0
./hctx0/cpu7/dispatched:0 0
./hctx0/cpu6/completed:0 5
./hctx0/cpu6/merged:0
./hctx0/cpu6/dispatched:0 5
./hctx0/cpu5/completed:0 0
./hctx0/cpu5/merged:0
./hctx0/cpu5/dispatched:0 0
./hctx0/cpu4/completed:0 0
./hctx0/cpu4/merged:0
./hctx0/cpu4/dispatched:0 0
./hctx0/cpu3/completed:5 38
./hctx0/cpu3/merged:0
./hctx0/cpu3/dispatched:5 38
./hctx0/cpu2/completed:0 0
./hctx0/cpu2/merged:0
./hctx0/cpu2/dispatched:0 0
./hctx0/cpu1/completed:0 0
./hctx0/cpu1/merged:0
./hctx0/cpu1/dispatched:0 0
./hctx0/cpu0/completed:0 0
./hctx0/cpu0/merged:0
./hctx0/cpu0/dispatched:0 0
./hctx0/type:default
./hctx0/dispatch_busy:0
./hctx0/active:0
./hctx0/run:43
./hctx0/queued:48
./hctx0/dispatched:       0     0
./hctx0/dispatched:       1     43
./hctx0/dispatched:       2     0
./hctx0/dispatched:       4     0
./hctx0/dispatched:       8     0
./hctx0/dispatched:      16     0
./hctx0/dispatched:      32+    0
./hctx0/io_poll:considered=0
./hctx0/io_poll:invoked=0
./hctx0/io_poll:success=0
./hctx0/tags_bitmap:00000000: ff07 070f
./hctx0/tags:nr_tags=32
./hctx0/tags:nr_reserved_tags=0
./hctx0/tags:active_queues=0
./hctx0/tags:bitmap_tags:
./hctx0/tags:depth=32
./hctx0/tags:busy=0
./hctx0/tags:cleared=18
./hctx0/tags:bits_per_word=8
./hctx0/tags:map_nr=4
./hctx0/tags:alloc_hint={12, 7, 12, 24, 12, 27, 10, 18}
./hctx0/tags:wake_batch=4
./hctx0/tags:wake_index=0
./hctx0/tags:ws_active=0
./hctx0/tags:ws={
./hctx0/tags:   {.wait_cnt=4, .wait=inactive},
./hctx0/tags:   {.wait_cnt=4, .wait=inactive},
./hctx0/tags:   {.wait_cnt=4, .wait=inactive},
./hctx0/tags:   {.wait_cnt=4, .wait=inactive},
./hctx0/tags:   {.wait_cnt=4, .wait=inactive},
./hctx0/tags:   {.wait_cnt=4, .wait=inactive},
./hctx0/tags:   {.wait_cnt=4, .wait=inactive},
./hctx0/tags:   {.wait_cnt=4, .wait=inactive},
./hctx0/tags:}
./hctx0/tags:round_robin=0
./hctx0/tags:min_shallow_depth=4294967295
./hctx0/ctx_map:00000000: 00
./hctx0/flags:alloc_policy=FIFO SHOULD_MERGE|TAG_SHARED|SG_MERGE
./write_hints:hint0: 0
./write_hints:hint1: 0
./write_hints:hint2: 0
./write_hints:hint3: 0
./write_hints:hint4: 0
./state:SAME_COMP|NONROT|IO_STAT|DISCARD|INIT_DONE|WC|FUA|REGISTERED|SCSI_PASSTHROUGH
./pm_only:0
./poll_stat:read  (512 Bytes): samples=0
./poll_stat:write (512 Bytes): samples=0
./poll_stat:read  (1024 Bytes): samples=0
./poll_stat:write (1024 Bytes): samples=0
./poll_stat:read  (2048 Bytes): samples=0
./poll_stat:write (2048 Bytes): samples=0
./poll_stat:read  (4096 Bytes): samples=0
./poll_stat:write (4096 Bytes): samples=0
./poll_stat:read  (8192 Bytes): samples=0
./poll_stat:write (8192 Bytes): samples=0
./poll_stat:read  (16384 Bytes): samples=0
./poll_stat:write (16384 Bytes): samples=0
./poll_stat:read  (32768 Bytes): samples=0
./poll_stat:write (32768 Bytes): samples=0
./poll_stat:read  (65536 Bytes): samples=0
./poll_stat:write (65536 Bytes): samples=0


> BTW,  I have several questions on this report:
> 
> - what is the kernel version in your test?

v5.0-rc1 + a few patches to enable UFS

> - can you reproduce this issue on another disk (not UFS)?

At the moment, UFS is the only available storage option.
(USB is broken, SDHC requires some work.)

> - are there any tasks in 'D' state shown via 'ps -ax'? If yes, please dump their
> stack traces.

(This rootfs provides busybox ps.)

# ps -o pid,user,stat,args
PID   USER     STAT COMMAND
    1 root     S    init
    2 root     SW   [kthreadd]
    3 root     IW<  [rcu_gp]
    4 root     IW<  [rcu_par_gp]
    5 root     IW   [kworker/0:0-eve]
    6 root     IW<  [kworker/0:0H-kb]
    8 root     IW<  [mm_percpu_wq]
    9 root     SW   [ksoftirqd/0]
   10 root     IW   [rcu_preempt]
   11 root     SW   [migration/0]
   12 root     IW   [kworker/0:1-eve]
   13 root     SW   [cpuhp/0]
   14 root     SW   [cpuhp/1]
   15 root     SW   [migration/1]
   16 root     SW   [ksoftirqd/1]
   17 root     IW   [kworker/1:0-eve]
   18 root     IW<  [kworker/1:0H-kb]
   19 root     SW   [cpuhp/2]
   20 root     SW   [migration/2]
   21 root     SW   [ksoftirqd/2]
   22 root     IW   [kworker/2:0-eve]
   23 root     IW<  [kworker/2:0H-kb]
   24 root     SW   [cpuhp/3]
   25 root     SW   [migration/3]
   26 root     SW   [ksoftirqd/3]
   27 root     IW   [kworker/3:0-eve]
   28 root     IW<  [kworker/3:0H-kb]
   29 root     SW   [cpuhp/4]
   30 root     SW   [migration/4]
   31 root     SW   [ksoftirqd/4]
   32 root     IW   [kworker/4:0-eve]
   33 root     IW<  [kworker/4:0H]
   34 root     SW   [cpuhp/5]
   35 root     SW   [migration/5]
   36 root     SW   [ksoftirqd/5]
   37 root     IW   [kworker/5:0-eve]
   38 root     IW<  [kworker/5:0H]
   39 root     SW   [cpuhp/6]
   40 root     SW   [migration/6]
   41 root     SW   [ksoftirqd/6]
   42 root     IW   [kworker/6:0-eve]
   43 root     IW<  [kworker/6:0H-kb]
   44 root     SW   [cpuhp/7]
   45 root     SW   [migration/7]
   46 root     SW   [ksoftirqd/7]
   47 root     IW   [kworker/7:0-eve]
   48 root     IW<  [kworker/7:0H]
   49 root     SW   [kdevtmpfs]
   52 root     SW   [rcu_tasks_kthre]
   53 root     IW   [kworker/3:1-mm_]
   64 root     IW   [kworker/4:1-mm_]
   65 root     IW   [kworker/5:1-eve]
   76 root     IW   [kworker/6:1-mm_]
   78 root     IW   [kworker/1:1-mm_]
   88 root     IW   [kworker/7:1-eve]
  101 root     IW   [kworker/2:1-mm_]
  198 root     SW   [khungtaskd]
  199 root     SW   [oom_reaper]
  200 root     IW<  [writeback]
  202 root     SW   [kcompactd0]
  204 root     IW<  [kblockd]
  223 root     IW<  [devfreq_wq]
  372 root     SW   [kswapd0]
  505 root     SW   [scsi_eh_0]
  506 root     IW<  [scsi_tmf_0]
  507 root     IW<  [ufs_clk_gating_]
  508 root     IW<  [ufs_clkscaling_]
  546 root     IW   [kworker/u16:5-e]
  555 root     IW   [kworker/u16:7-e]
  657 root     S    /sbin/klogd -n
  675 root     S    -sh
  677 root     IW<  [kworker/6:1H-kb]
  678 root     IW<  [kworker/3:1H]
  679 root     IW<  [kworker/1:1H]
  680 root     IW<  [kworker/2:1H]
  681 root     IW<  [kworker/0:1H]
  859 root     R    ps -o pid,user,stat,args


Do you see anything relevant?

Regards.


* Re: dd hangs when reading large partitions
  2019-01-18 12:10 dd hangs when reading large partitions Marc Gonzalez
  2019-01-18 13:39 ` Ming Lei
@ 2019-01-18 15:18 ` jianchao.wang
  2019-01-18 17:38   ` Marc Gonzalez
  2019-01-18 17:48   ` Jens Axboe
  2019-01-18 19:27 ` Douglas Gilbert
  2 siblings, 2 replies; 34+ messages in thread
From: jianchao.wang @ 2019-01-18 15:18 UTC (permalink / raw)
  To: Marc Gonzalez, fsdevel, linux-block
  Cc: SCSI, Alexander Viro, Jan Kara, Christoph Hellwig, Joao Pinto,
	Jens Axboe, Fujita Tomonori, Paolo Valente

Hello

On 1/18/19 8:10 PM, Marc Gonzalez wrote:
> Hello,
> 
> I'm running into an issue which I don't know how to debug.
> So I'm open to ideas and suggestions :-)
> 
> On my arm64 board, I have enabled Universal Flash Storage support.
> 
> I wanted to benchmark read performance, and noticed that the system
> locks up when I read partitions larger than 3.5 GB, unless I tell
> dd to use direct IO:
> 
> *** WITH O_DIRECT ***
> # dd if=/dev/sda of=/dev/null bs=1M iflag=direct status=progress
> 57892929536 bytes (58 GB, 54 GiB) copied, 697.006 s, 83.1 MB/s 
> 55256+0 records in
> 55256+0 records out
> 57940115456 bytes (58 GB, 54 GiB) copied, 697.575 s, 83.1 MB/s
> 
> *** WITHOUT O_DIRECT ***
> # dd if=/dev/sda of=/dev/null bs=1M status=progress
> 3853516800 bytes (3.9 GB, 3.6 GiB) copied, 49.0002 s, 78.6 MB/s
> 
> 
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu:    1-...0: (8242 ticks this GP) idle=106/1/0x4000000000000000 softirq=168/171 fqs=2626 
> rcu:    6-...0: (99 GPs behind) idle=ec2/1/0x4000000000000000 softirq=71/71 fqs=2626 
> rcu:    (detected by 7, t=5254 jiffies, g=-275, q=2)
> Task dump for CPU 1:
> kworker/1:1H    R  running task        0   675      2 0x0000002a
> Workqueue: kblockd blk_mq_run_work_fn
> Call trace:
>  __switch_to+0x168/0x1d0

It looks like blk_mq_run_work_fn went to sleep while holding the RCU (preempt) lock, doesn't it?
Can you share the symbol for the following address?

>  0xffffffc0f6efbbc8


>  blk_mq_run_work_fn+0x28/0x40
>  process_one_work+0x208/0x470
>  worker_thread+0x48/0x460
>  kthread+0x128/0x130
>  ret_from_fork+0x10/0x1c
> Task dump for CPU 6:
> kthreadd        R  running task        0     2      0 0x0000002a
> Call trace:
>  __switch_to+0x168/0x1d0
>  0x5b36396f4e7d4000
> 
> 
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu:    1-...0: (8242 ticks this GP) idle=106/1/0x4000000000000000 softirq=168/171 fqs=10500 
> rcu:    6-...0: (99 GPs behind) idle=ec2/1/0x4000000000000000 softirq=71/71 fqs=10500 
> rcu:    (detected by 7, t=21009 jiffies, g=-275, q=2)
> Task dump for CPU 1:
> kworker/1:1H    R  running task        0   675      2 0x0000002a
> Workqueue: kblockd blk_mq_run_work_fn
> Call trace:
>  __switch_to+0x168/0x1d0
>  0xffffffc0f6efbbc8
>  blk_mq_run_work_fn+0x28/0x40
>  process_one_work+0x208/0x470
>  worker_thread+0x48/0x460
>  kthread+0x128/0x130
>  ret_from_fork+0x10/0x1c
> Task dump for CPU 6:
> kthreadd        R  running task        0     2      0 0x0000002a
> Call trace:
>  __switch_to+0x168/0x1d0
>  0x5b36396f4e7d4000
> 
> 
> The system always hangs around the 3.6 GiB mark, wherever I start from.
> How can I debug this issue?
> 
> Regards.
> 
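
For what it's worth, a raw address like this can often be mapped to the
nearest symbol with gdb against the matching vmlinux:

gdb -batch -ex 'info symbol 0xffffffc0f6efbbc8' vmlinux

or by looking it up in /proc/kallsyms on the target.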

Thanks
Jianchao


* Re: dd hangs when reading large partitions
  2019-01-18 15:18 ` jianchao.wang
@ 2019-01-18 17:38   ` Marc Gonzalez
  2019-01-18 17:48   ` Jens Axboe
  1 sibling, 0 replies; 34+ messages in thread
From: Marc Gonzalez @ 2019-01-18 17:38 UTC (permalink / raw)
  To: Jianchao Wang, fsdevel, linux-block
  Cc: SCSI, Alexander Viro, Jan Kara, Christoph Hellwig, Joao Pinto,
	Jens Axboe, Fujita Tomonori, Paolo Valente

On 18/01/2019 16:18, jianchao.wang wrote:

> It looks like blk_mq_run_work_fn went to sleep while holding the RCU (preempt) lock, doesn't it?
> Can you share the symbol for the following address?

Note: if I boot with 'nosmp', then the system reboots without logging anything.
(Perhaps some kind of HW watchdog, or secure OS)

I no longer get the 'rcu_preempt detected stalls' message...

Regards.


* Re: dd hangs when reading large partitions
  2019-01-18 15:18 ` jianchao.wang
  2019-01-18 17:38   ` Marc Gonzalez
@ 2019-01-18 17:48   ` Jens Axboe
  2019-01-18 17:51     ` Bart Van Assche
  2019-01-19  9:56     ` Christoph Hellwig
  1 sibling, 2 replies; 34+ messages in thread
From: Jens Axboe @ 2019-01-18 17:48 UTC (permalink / raw)
  To: jianchao.wang, Marc Gonzalez, fsdevel, linux-block
  Cc: SCSI, Alexander Viro, Jan Kara, Christoph Hellwig, Joao Pinto,
	Fujita Tomonori, Paolo Valente

On 1/18/19 8:18 AM, jianchao.wang wrote:
> Hello
> 
> On 1/18/19 8:10 PM, Marc Gonzalez wrote:
>> Hello,
>>
>> I'm running into an issue which I don't know how to debug.
>> So I'm open to ideas and suggestions :-)
>>
>> On my arm64 board, I have enabled Universal Flash Storage support.
>>
>> I wanted to benchmark read performance, and noticed that the system
>> locks up when I read partitions larger than 3.5 GB, unless I tell
>> dd to use direct IO:
>>
>> *** WITH O_DIRECT ***
>> # dd if=/dev/sda of=/dev/null bs=1M iflag=direct status=progress
>> 57892929536 bytes (58 GB, 54 GiB) copied, 697.006 s, 83.1 MB/s 
>> 55256+0 records in
>> 55256+0 records out
>> 57940115456 bytes (58 GB, 54 GiB) copied, 697.575 s, 83.1 MB/s
>>
>> *** WITHOUT O_DIRECT ***
>> # dd if=/dev/sda of=/dev/null bs=1M status=progress
>> 3853516800 bytes (3.9 GB, 3.6 GiB) copied, 49.0002 s, 78.6 MB/s
>>
>>
>> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>> rcu:    1-...0: (8242 ticks this GP) idle=106/1/0x4000000000000000 softirq=168/171 fqs=2626 
>> rcu:    6-...0: (99 GPs behind) idle=ec2/1/0x4000000000000000 softirq=71/71 fqs=2626 
>> rcu:    (detected by 7, t=5254 jiffies, g=-275, q=2)
>> Task dump for CPU 1:
>> kworker/1:1H    R  running task        0   675      2 0x0000002a
>> Workqueue: kblockd blk_mq_run_work_fn
>> Call trace:
>>  __switch_to+0x168/0x1d0
> 
> It looks like blk_mq_run_work_fn went to sleep while holding the RCU (preempt) lock, doesn't it?
> Can you share the symbol for the following address?

It's UFS that's totally buggy; if you look at its queuecommand, it does:

        if (!down_read_trylock(&hba->clk_scaling_lock))                         
                return SCSI_MLQUEUE_HOST_BUSY;

UFS either needs to get fixed up, or we'll want a way to do something like
the below.

Marc, can you test this?


diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index eaf329db3973..e28c3420a9d9 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -412,6 +412,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
 	shost->hostt = sht;
 	shost->this_id = sht->this_id;
 	shost->can_queue = sht->can_queue;
+	shost->queue_may_block = sht->queue_may_block;
 	shost->sg_tablesize = sht->sg_tablesize;
 	shost->sg_prot_tablesize = sht->sg_prot_tablesize;
 	shost->cmd_per_lun = sht->cmd_per_lun;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b13cc9288ba0..4e266af2871f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1902,6 +1902,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
 	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
 	shost->tag_set.flags |=
 		BLK_ALLOC_POLICY_TO_MQ_FLAG(shost->hostt->tag_alloc_policy);
+	if (shost->queue_may_block)
+		shost->tag_set.flags |= BLK_MQ_F_BLOCKING;
 	shost->tag_set.driver_data = shost;
 
 	return blk_mq_alloc_tag_set(&shost->tag_set);
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 9ba7671b84f8..9ab354e43630 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -6981,6 +6981,7 @@ static struct scsi_host_template ufshcd_driver_template = {
 	.sg_tablesize		= SG_ALL,
 	.cmd_per_lun		= UFSHCD_CMD_PER_LUN,
 	.can_queue		= UFSHCD_CAN_QUEUE,
+	.queue_may_block	= 1,
 	.max_host_blocked	= 1,
 	.track_queue_depth	= 1,
 	.sdev_groups		= ufshcd_driver_groups,
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 6ca954e9f752..30aa7b6c4342 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -339,6 +339,11 @@ struct scsi_host_template {
 	 */
 	int can_queue;
 
+	/*
+	 * If the ->queuecommand() ever blocks, this should be set
+	 */
+	int queue_may_block;
+
 	/*
 	 * In many instances, especially where disconnect / reconnect are
 	 * supported, our host also has an ID on the SCSI bus.  If this is
@@ -584,6 +589,7 @@ struct Scsi_Host {
 
 	int this_id;
 	int can_queue;
+	int queue_may_block;
 	short cmd_per_lun;
 	short unsigned int sg_tablesize;
 	short unsigned int sg_prot_tablesize;
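
(For context: BLK_MQ_F_BLOCKING tells blk-mq that ->queue_rq may sleep, in
which case dispatch is protected by SRCU instead of plain RCU; that is what
would make a sleeping lock in the UFS queuecommand path legal.)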

-- 
Jens Axboe



* Re: dd hangs when reading large partitions
  2019-01-18 17:48   ` Jens Axboe
@ 2019-01-18 17:51     ` Bart Van Assche
  2019-01-18 19:00       ` Jens Axboe
  2019-01-19  9:56     ` Christoph Hellwig
  1 sibling, 1 reply; 34+ messages in thread
From: Bart Van Assche @ 2019-01-18 17:51 UTC (permalink / raw)
  To: Jens Axboe, jianchao.wang, Marc Gonzalez, fsdevel, linux-block
  Cc: SCSI, Alexander Viro, Jan Kara, Christoph Hellwig, Joao Pinto,
	Fujita Tomonori, Paolo Valente

On Fri, 2019-01-18 at 10:48 -0700, Jens Axboe wrote:
> It's UFS that's totally buggy; if you look at its queuecommand, it does:
> 
>         if (!down_read_trylock(&hba->clk_scaling_lock))                         
>                 return SCSI_MLQUEUE_HOST_BUSY;
> 
> UFS either needs to get fixed up, or we'll want a way to do something like
> the below.
> 
> Marc, can you test this?
> 
> 
> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> index eaf329db3973..e28c3420a9d9 100644
> --- a/drivers/scsi/hosts.c
> +++ b/drivers/scsi/hosts.c
> @@ -412,6 +412,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
>  	shost->hostt = sht;
>  	shost->this_id = sht->this_id;
>  	shost->can_queue = sht->can_queue;
> +	shost->queue_may_block = sht->queue_may_block;
>  	shost->sg_tablesize = sht->sg_tablesize;
>  	shost->sg_prot_tablesize = sht->sg_prot_tablesize;
>  	shost->cmd_per_lun = sht->cmd_per_lun;
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index b13cc9288ba0..4e266af2871f 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1902,6 +1902,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
>  	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
>  	shost->tag_set.flags |=
>  		BLK_ALLOC_POLICY_TO_MQ_FLAG(shost->hostt->tag_alloc_policy);
> +	if (shost->queue_may_block)
> +		shost->tag_set.flags |= BLK_MQ_F_BLOCKING;
>  	shost->tag_set.driver_data = shost;
>  
>  	return blk_mq_alloc_tag_set(&shost->tag_set);
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index 9ba7671b84f8..9ab354e43630 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -6981,6 +6981,7 @@ static struct scsi_host_template ufshcd_driver_template = {
>  	.sg_tablesize		= SG_ALL,
>  	.cmd_per_lun		= UFSHCD_CMD_PER_LUN,
>  	.can_queue		= UFSHCD_CAN_QUEUE,
> +	.queue_may_block	= 1,
>  	.max_host_blocked	= 1,
>  	.track_queue_depth	= 1,
>  	.sdev_groups		= ufshcd_driver_groups,
> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> index 6ca954e9f752..30aa7b6c4342 100644
> --- a/include/scsi/scsi_host.h
> +++ b/include/scsi/scsi_host.h
> @@ -339,6 +339,11 @@ struct scsi_host_template {
>  	 */
>  	int can_queue;
>  
> +	/*
> +	 * If the ->queuecommand() ever blocks, this should be set
> +	 */
> +	int queue_may_block;
> +
>  	/*
>  	 * In many instances, especially where disconnect / reconnect are
>  	 * supported, our host also has an ID on the SCSI bus.  If this is
> @@ -584,6 +589,7 @@ struct Scsi_Host {
>  
>  	int this_id;
>  	int can_queue;
> +	int queue_may_block;
>  	short cmd_per_lun;
>  	short unsigned int sg_tablesize;
>  	short unsigned int sg_prot_tablesize;

Hi Jens,

Did you perhaps want to include a change from down_read_trylock() into
down_read() in the UFS queuecommand function in this patch?

Thanks,

Bart.


* Re: dd hangs when reading large partitions
  2019-01-18 17:51     ` Bart Van Assche
@ 2019-01-18 19:00       ` Jens Axboe
  0 siblings, 0 replies; 34+ messages in thread
From: Jens Axboe @ 2019-01-18 19:00 UTC (permalink / raw)
  To: Bart Van Assche, jianchao.wang, Marc Gonzalez, fsdevel, linux-block
  Cc: SCSI, Alexander Viro, Jan Kara, Christoph Hellwig, Joao Pinto,
	Fujita Tomonori, Paolo Valente

On 1/18/19 10:51 AM, Bart Van Assche wrote:
> On Fri, 2019-01-18 at 10:48 -0700, Jens Axboe wrote:
>> It's UFS that's totally buggy; if you look at its queuecommand, it does:
>>
>>         if (!down_read_trylock(&hba->clk_scaling_lock))                         
>>                 return SCSI_MLQUEUE_HOST_BUSY;
>>
>> UFS either needs to get fixed up, or we'll want a way to do something like
>> the below.
>>
>> Marc, can you test this?
>>
>>
>> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
>> index eaf329db3973..e28c3420a9d9 100644
>> --- a/drivers/scsi/hosts.c
>> +++ b/drivers/scsi/hosts.c
>> @@ -412,6 +412,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
>>  	shost->hostt = sht;
>>  	shost->this_id = sht->this_id;
>>  	shost->can_queue = sht->can_queue;
>> +	shost->queue_may_block = sht->queue_may_block;
>>  	shost->sg_tablesize = sht->sg_tablesize;
>>  	shost->sg_prot_tablesize = sht->sg_prot_tablesize;
>>  	shost->cmd_per_lun = sht->cmd_per_lun;
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index b13cc9288ba0..4e266af2871f 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -1902,6 +1902,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
>>  	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
>>  	shost->tag_set.flags |=
>>  		BLK_ALLOC_POLICY_TO_MQ_FLAG(shost->hostt->tag_alloc_policy);
>> +	if (shost->queue_may_block)
>> +		shost->tag_set.flags |= BLK_MQ_F_BLOCKING;
>>  	shost->tag_set.driver_data = shost;
>>  
>>  	return blk_mq_alloc_tag_set(&shost->tag_set);
>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>> index 9ba7671b84f8..9ab354e43630 100644
>> --- a/drivers/scsi/ufs/ufshcd.c
>> +++ b/drivers/scsi/ufs/ufshcd.c
>> @@ -6981,6 +6981,7 @@ static struct scsi_host_template ufshcd_driver_template = {
>>  	.sg_tablesize		= SG_ALL,
>>  	.cmd_per_lun		= UFSHCD_CMD_PER_LUN,
>>  	.can_queue		= UFSHCD_CAN_QUEUE,
>> +	.queue_may_block	= 1,
>>  	.max_host_blocked	= 1,
>>  	.track_queue_depth	= 1,
>>  	.sdev_groups		= ufshcd_driver_groups,
>> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
>> index 6ca954e9f752..30aa7b6c4342 100644
>> --- a/include/scsi/scsi_host.h
>> +++ b/include/scsi/scsi_host.h
>> @@ -339,6 +339,11 @@ struct scsi_host_template {
>>  	 */
>>  	int can_queue;
>>  
>> +	/*
>> +	 * If the ->queuecommand() ever blocks, this should be set
>> +	 */
>> +	int queue_may_block;
>> +
>>  	/*
>>  	 * In many instances, especially where disconnect / reconnect are
>>  	 * supported, our host also has an ID on the SCSI bus.  If this is
>> @@ -584,6 +589,7 @@ struct Scsi_Host {
>>  
>>  	int this_id;
>>  	int can_queue;
>> +	int queue_may_block;
>>  	short cmd_per_lun;
>>  	short unsigned int sg_tablesize;
>>  	short unsigned int sg_prot_tablesize;
> 
> Hi Jens,
> 
> Did you perhaps want to include a change from down_read_trylock() into
> down_read() in the UFS queuecommand function in this patch?

Yeah, that should be done as well. But honestly, looking at UFS, it's in
dire need of love from someone who has some experience writing a
driver. The locking is absolutely miserable.

For one, the clk_scaling_lock needs to die. The clock scaling change
should coordinate with the requests, not try to lock out request
handling. It's totally an upside down approach.
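
For reference, the queuecommand side of Bart's suggestion would be roughly
the following on top of the patch above (a sketch, untested):

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
-	if (!down_read_trylock(&hba->clk_scaling_lock))
-		return SCSI_MLQUEUE_HOST_BUSY;
+	/* with BLK_MQ_F_BLOCKING set, sleeping here is legal */
+	down_read(&hba->clk_scaling_lock);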

-- 
Jens Axboe



* Re: dd hangs when reading large partitions
  2019-01-18 12:10 dd hangs when reading large partitions Marc Gonzalez
  2019-01-18 13:39 ` Ming Lei
  2019-01-18 15:18 ` jianchao.wang
@ 2019-01-18 19:27 ` Douglas Gilbert
  2 siblings, 0 replies; 34+ messages in thread
From: Douglas Gilbert @ 2019-01-18 19:27 UTC (permalink / raw)
  To: Marc Gonzalez, fsdevel, linux-block
  Cc: SCSI, Alexander Viro, Jan Kara, Christoph Hellwig, Joao Pinto,
	Jens Axboe, Fujita Tomonori, Paolo Valente

On 2019-01-18 7:10 a.m., Marc Gonzalez wrote:
> Hello,
> 
> I'm running into an issue which I don't know how to debug.
> So I'm open to ideas and suggestions :-)
> 
> On my arm64 board, I have enabled Universal Flash Storage support.
> 
> I wanted to benchmark read performance, and noticed that the system
> locks up when I read partitions larger than 3.5 GB, unless I tell
> dd to use direct IO:

Marc,
If you want to benchmark (or torture) UFS read performance I have
many dd variants. In the sg3_utils package there is sgp_dd (using
POSIX threads for a multi-threaded copy or read) and sgm_dd (using
mmap-ed IO). Plus there is a multi-platform ddpt in a package of
the same name.

One major difference between my dd variants and the "standard" one
is that I split the bs=BS option in two: one called bs=BS where BS
is the logical block size of the given device, the other is bpt=BPT
(blocks per transfer) which is the number of logical blocks in each
copy (or read) segment.

So with your example below, bs=1M would become, for a LB size of 4096,
'bs=4096 bpt=256' (256 * 4096 bytes = 1 MiB). Also sgp_dd and sgm_dd
don't support status=progress
(ddpt does) but you can always send 'kill -s USR1 <pid_of_dd>'
from another (virtual) console that has root permissions. All my
dd variants and dd itself will accept that signal gracefully and output
a progress report, then continue.
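
For instance, a multi-threaded read of the whole device along those lines
would look like this (thr= selects the number of threads; four picked
arbitrarily):

# sgp_dd if=/dev/sda of=/dev/null bs=4096 bpt=256 thr=4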

Doug Gilbert

> *** WITH O_DIRECT ***
> # dd if=/dev/sda of=/dev/null bs=1M iflag=direct status=progress
> 57892929536 bytes (58 GB, 54 GiB) copied, 697.006 s, 83.1 MB/s
> 55256+0 records in
> 55256+0 records out
> 57940115456 bytes (58 GB, 54 GiB) copied, 697.575 s, 83.1 MB/s
> 
> *** WITHOUT O_DIRECT ***
> # dd if=/dev/sda of=/dev/null bs=1M status=progress
> 3853516800 bytes (3.9 GB, 3.6 GiB) copied, 49.0002 s, 78.6 MB/s
> 
> 
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu:    1-...0: (8242 ticks this GP) idle=106/1/0x4000000000000000 softirq=168/171 fqs=2626
> rcu:    6-...0: (99 GPs behind) idle=ec2/1/0x4000000000000000 softirq=71/71 fqs=2626
> rcu:    (detected by 7, t=5254 jiffies, g=-275, q=2)
> Task dump for CPU 1:
> kworker/1:1H    R  running task        0   675      2 0x0000002a
> Workqueue: kblockd blk_mq_run_work_fn
> Call trace:
>   __switch_to+0x168/0x1d0
>   0xffffffc0f6efbbc8
>   blk_mq_run_work_fn+0x28/0x40
>   process_one_work+0x208/0x470
>   worker_thread+0x48/0x460
>   kthread+0x128/0x130
>   ret_from_fork+0x10/0x1c
> Task dump for CPU 6:
> kthreadd        R  running task        0     2      0 0x0000002a
> Call trace:
>   __switch_to+0x168/0x1d0
>   0x5b36396f4e7d4000
> 
> 
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu:    1-...0: (8242 ticks this GP) idle=106/1/0x4000000000000000 softirq=168/171 fqs=10500
> rcu:    6-...0: (99 GPs behind) idle=ec2/1/0x4000000000000000 softirq=71/71 fqs=10500
> rcu:    (detected by 7, t=21009 jiffies, g=-275, q=2)
> Task dump for CPU 1:
> kworker/1:1H    R  running task        0   675      2 0x0000002a
> Workqueue: kblockd blk_mq_run_work_fn
> Call trace:
>   __switch_to+0x168/0x1d0
>   0xffffffc0f6efbbc8
>   blk_mq_run_work_fn+0x28/0x40
>   process_one_work+0x208/0x470
>   worker_thread+0x48/0x460
>   kthread+0x128/0x130
>   ret_from_fork+0x10/0x1c
> Task dump for CPU 6:
> kthreadd        R  running task        0     2      0 0x0000002a
> Call trace:
>   __switch_to+0x168/0x1d0
>   0x5b36396f4e7d4000
> 
> 
> The system always hangs around the 3.6 GiB mark, wherever I start from.
> How can I debug this issue?
> 
> Regards.
> 



* Re: dd hangs when reading large partitions
  2019-01-18 17:48   ` Jens Axboe
  2019-01-18 17:51     ` Bart Van Assche
@ 2019-01-19  9:56     ` Christoph Hellwig
  2019-01-19 14:37       ` Jens Axboe
                         ` (2 more replies)
  1 sibling, 3 replies; 34+ messages in thread
From: Christoph Hellwig @ 2019-01-19  9:56 UTC (permalink / raw)
  To: Jens Axboe
  Cc: jianchao.wang, Marc Gonzalez, fsdevel, linux-block, SCSI,
	Alexander Viro, Jan Kara, Christoph Hellwig, Joao Pinto,
	Fujita Tomonori, Paolo Valente

On Fri, Jan 18, 2019 at 10:48:15AM -0700, Jens Axboe wrote:
> It's UFS that's totally buggy; if you look at its queuecommand, it does:
> 
>         if (!down_read_trylock(&hba->clk_scaling_lock))                         
>                 return SCSI_MLQUEUE_HOST_BUSY;
> 
> UFS either needs to get fixed up, or we'll want a way to do something like
> the below.

I think the right answer is to just revert the offending patch instead
of papering over it in the SCSI code.


* Re: dd hangs when reading large partitions
  2019-01-19  9:56     ` Christoph Hellwig
@ 2019-01-19 14:37       ` Jens Axboe
  2019-01-19 16:09       ` Bart Van Assche
  2019-01-19 19:47       ` Marc Gonzalez
  2 siblings, 0 replies; 34+ messages in thread
From: Jens Axboe @ 2019-01-19 14:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: jianchao.wang, Marc Gonzalez, fsdevel, linux-block, SCSI,
	Alexander Viro, Jan Kara, Joao Pinto, Fujita Tomonori,
	Paolo Valente

On 1/19/19 2:56 AM, Christoph Hellwig wrote:
> On Fri, Jan 18, 2019 at 10:48:15AM -0700, Jens Axboe wrote:
>> It's UFS that's totally buggy; if you look at its queuecommand, it does:
>>
>>         if (!down_read_trylock(&hba->clk_scaling_lock))                         
>>                 return SCSI_MLQUEUE_HOST_BUSY;
>>
>> UFS either needs to get fixed up, or we'll want a way to do something like
>> the below.
> 
> I think the right answer is to just revert the offending patch instead
> of papering over it in the SCSI code.

I fully agree.

-- 
Jens Axboe



* Re: dd hangs when reading large partitions
  2019-01-19  9:56     ` Christoph Hellwig
  2019-01-19 14:37       ` Jens Axboe
@ 2019-01-19 16:09       ` Bart Van Assche
  2019-01-21  8:33         ` Christoph Hellwig
  2019-01-19 19:47       ` Marc Gonzalez
  2 siblings, 1 reply; 34+ messages in thread
From: Bart Van Assche @ 2019-01-19 16:09 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: jianchao.wang, Marc Gonzalez, fsdevel, linux-block, SCSI,
	Alexander Viro, Jan Kara, Joao Pinto, Fujita Tomonori,
	Paolo Valente

On 1/19/19 1:56 AM, Christoph Hellwig wrote:
> On Fri, Jan 18, 2019 at 10:48:15AM -0700, Jens Axboe wrote:
>> It's UFS that's totally buggy; if you look at its queuecommand, it does:
>>
>>          if (!down_read_trylock(&hba->clk_scaling_lock))
>>                  return SCSI_MLQUEUE_HOST_BUSY;
>>
>> UFS either needs to get fixed up, or we'll want a way to do something like
>> the below.
> 
> I think the right answer is to just revert the offending patch instead
> of papering over it in the SCSI code.
> .

Hi Christoph,

Which patch are you referring to?

Thanks,

Bart.




* Re: dd hangs when reading large partitions
  2019-01-19  9:56     ` Christoph Hellwig
  2019-01-19 14:37       ` Jens Axboe
  2019-01-19 16:09       ` Bart Van Assche
@ 2019-01-19 19:47       ` Marc Gonzalez
  2019-01-19 20:45         ` Marc Gonzalez
                           ` (2 more replies)
  2 siblings, 3 replies; 34+ messages in thread
From: Marc Gonzalez @ 2019-01-19 19:47 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Jianchao Wang
  Cc: fsdevel, linux-block, SCSI, Joao Pinto, Subhash Jadavani,
	Sayali Lokhande, Can Guo, Asutosh Das, Vijay Viswanath,
	Venkat Gopalakrishnan, Ritesh Harjani, Vivek Gautam,
	Vijay Viswanath, Sayali Lokhande, Jeffrey Hugo, Maya Erez,
	Evan Green, Matthias Kaehlcke, Douglas Anderson, Stephen Boyd,
	Tomas Winkler, Adrian Hunter, Alim Akhtar, Avri Altman,
	Bart Van Assche, Martin Petersen, Bjorn Andersson

On 19/01/2019 10:56, Christoph Hellwig wrote:

> On Jan 18, 2019 at 10:48:15AM -0700, Jens Axboe wrote:
>
>> It's UFS that's totally buggy; if you look at its queuecommand, it does:
>>
>>         if (!down_read_trylock(&hba->clk_scaling_lock))                         
>>                 return SCSI_MLQUEUE_HOST_BUSY;
>>
>> UFS either needs to get fixed up, or we'll want a way to do something like
>> the below.
> 
> I think the right answer is to just revert the offending patch instead
> of papering over it in the SCSI code.

[ Adjusting recipients list ]

Full thread, for new recipients:
https://www.spinics.net/lists/linux-fsdevel/msg138601.html

Christoph, do you mean a3cd5ec55f6c7 ?

Regards.


* Re: dd hangs when reading large partitions
  2019-01-19 19:47       ` Marc Gonzalez
@ 2019-01-19 20:45         ` Marc Gonzalez
  2019-01-21  8:33         ` Christoph Hellwig
  2019-01-21 15:22         ` Marc Gonzalez
  2 siblings, 0 replies; 34+ messages in thread
From: Marc Gonzalez @ 2019-01-19 20:45 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Jianchao Wang
  Cc: fsdevel, linux-block, SCSI, Joao Pinto, Subhash Jadavani,
	Sayali Lokhande, Can Guo, Asutosh Das, Vijay Viswanath,
	Venkat Gopalakrishnan, Ritesh Harjani, Vivek Gautam,
	Jeffrey Hugo, Maya Erez, Evan Green, Matthias Kaehlcke,
	Douglas Anderson, Stephen Boyd, Tomas Winkler, Adrian Hunter,
	Alim Akhtar, Avri Altman, Bart Van Assche, Martin Petersen,
	Bjorn Andersson

On 19/01/2019 20:47, Marc Gonzalez wrote:

> On 19/01/2019 10:56, Christoph Hellwig wrote:
> 
>> On Jan 18, 2019 at 10:48:15AM -0700, Jens Axboe wrote:
>>
>>> It's UFS that's totally buggy; if you look at its queuecommand, it does:
>>>
>>>         if (!down_read_trylock(&hba->clk_scaling_lock))                         
>>>                 return SCSI_MLQUEUE_HOST_BUSY;
>>>
>>> UFS either needs to get fixed up, or we'll want a way to do something like
>>> the below.
>>
>> I think the right answer is to just revert the offending patch instead
>> of papering over it in the SCSI code.
> 
> [ Adjusting recipients list ]
> 
> Full thread, for new recipients:
> https://www.spinics.net/lists/linux-fsdevel/msg138601.html
> 
> Christoph, do you mean a3cd5ec55f6c7 ?

And by the way, can someone explain these two oddities:

1) the system only hangs if we try to read more than 3.8 GB
   (starting point is irrelevant)

2) the system does not hang if we use iflag=direct

Regards.


* Re: dd hangs when reading large partitions
  2019-01-19 16:09       ` Bart Van Assche
@ 2019-01-21  8:33         ` Christoph Hellwig
  0 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2019-01-21  8:33 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, jianchao.wang, Marc Gonzalez,
	fsdevel, linux-block, SCSI, Alexander Viro, Jan Kara, Joao Pinto,
	Fujita Tomonori, Paolo Valente

On Sat, Jan 19, 2019 at 08:09:18AM -0800, Bart Van Assche wrote:
> Which patch are you referring to?

a3cd5ec55f6 ("scsi: ufs: add load based scaling of UFS gear")

That being said, it doesn't seem entirely trivial to revert due to
later additions, so this might require a little more work.


* Re: dd hangs when reading large partitions
  2019-01-19 19:47       ` Marc Gonzalez
  2019-01-19 20:45         ` Marc Gonzalez
@ 2019-01-21  8:33         ` Christoph Hellwig
  2019-01-21 15:22         ` Marc Gonzalez
  2 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2019-01-21  8:33 UTC (permalink / raw)
  To: Marc Gonzalez
  Cc: Christoph Hellwig, Jens Axboe, Jianchao Wang, fsdevel,
	linux-block, SCSI, Joao Pinto, Subhash Jadavani, Sayali Lokhande,
	Can Guo, Asutosh Das, Vijay Viswanath, Venkat Gopalakrishnan,
	Ritesh Harjani, Vivek Gautam, Jeffrey Hugo, Maya Erez,
	Evan Green, Matthias Kaehlcke, Douglas Anderson, Stephen Boyd,
	Tomas Winkler, Adrian Hunter, Alim Akhtar, Avri Altman,
	Bart Van Assche, Martin Petersen, Bjorn Andersson

On Sat, Jan 19, 2019 at 08:47:13PM +0100, Marc Gonzalez wrote:
> On 19/01/2019 10:56, Christoph Hellwig wrote:
> 
> > On Jan 18, 2019 at 10:48:15AM -0700, Jens Axboe wrote:
> >
> >> It's UFS that's totally buggy; if you look at its queuecommand, it does:
> >>
> >>         if (!down_read_trylock(&hba->clk_scaling_lock))                         
> >>                 return SCSI_MLQUEUE_HOST_BUSY;
> >>
> >> UFS either needs to get fixed up, or we'll want a way to do something like
> >> the below.
> > 
> > I think the right answer is to just revert the offending patch instead
> > of papering over it in the SCSI code.
> 
> [ Adjusting recipients list ]
> 
> Full thread, for new recipients:
> https://www.spinics.net/lists/linux-fsdevel/msg138601.html
> 
> Christoph, do you mean a3cd5ec55f6c7 ?

Yes.


* Re: dd hangs when reading large partitions
  2019-01-19 19:47       ` Marc Gonzalez
  2019-01-19 20:45         ` Marc Gonzalez
  2019-01-21  8:33         ` Christoph Hellwig
@ 2019-01-21 15:22         ` Marc Gonzalez
  2019-01-22  3:12           ` jianchao.wang
  2 siblings, 1 reply; 34+ messages in thread
From: Marc Gonzalez @ 2019-01-21 15:22 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Jianchao Wang
  Cc: fsdevel, linux-block, SCSI, Joao Pinto, Subhash Jadavani,
	Sayali Lokhande, Can Guo, Asutosh Das, Vijay Viswanath,
	Venkat Gopalakrishnan, Ritesh Harjani, Vivek Gautam,
	Jeffrey Hugo, Maya Erez, Evan Green, Matthias Kaehlcke,
	Douglas Anderson, Stephen Boyd, Tomas Winkler, Adrian Hunter,
	Alim Akhtar, Avri Altman, Bart Van Assche, Martin Petersen,
	Bjorn Andersson

On 19/01/2019 20:47, Marc Gonzalez wrote:

> On 19/01/2019 10:56, Christoph Hellwig wrote:
> 
>> On Jan 18, 2019 at 10:48:15AM -0700, Jens Axboe wrote:
>>
>>> It's UFS that's totally buggy; if you look at its queuecommand, it does:
>>>
>>>         if (!down_read_trylock(&hba->clk_scaling_lock))                         
>>>                 return SCSI_MLQUEUE_HOST_BUSY;
>>>
>>> UFS either needs to get fixed up, or we'll want a way to do something like
>>> the below.
>>
>> I think the right answer is to just revert the offending patch instead
>> of papering over it in the SCSI code.
> 
> [ Adjusting recipients list ]
> 
> Full thread, for new recipients:
> https://www.spinics.net/lists/linux-fsdevel/msg138601.html
> 
> Christoph, do you mean a3cd5ec55f6c7 ?

Well, now we know for sure that the clk_scaling_lock is a red herring.
I applied the patch below, and the system still locked up:

# dd if=/dev/sde of=/dev/null bs=1M status=progress
3892314112 bytes (3.9 GB, 3.6 GiB) copied, 50.0042 s, 77.8 MB/s

I can't seem to get the RCU stall warning anymore. How can I get
to the bottom of this issue?

Regards.



diff --git a/drivers/scsi/ufs/ufs-qcom.c b/drivers/scsi/ufs/ufs-qcom.c
index 3aeadb14aae1..5700662ff7ca 100644
--- a/drivers/scsi/ufs/ufs-qcom.c
+++ b/drivers/scsi/ufs/ufs-qcom.c
@@ -1085,7 +1085,6 @@ static void ufs_qcom_set_caps(struct ufs_hba *hba)
 	struct ufs_qcom_host *host = ufshcd_get_variant(hba);
 
 	hba->caps |= UFSHCD_CAP_CLK_GATING | UFSHCD_CAP_HIBERN8_WITH_CLK_GATING;
-	hba->caps |= UFSHCD_CAP_CLK_SCALING;
 	hba->caps |= UFSHCD_CAP_AUTO_BKOPS_SUSPEND;
 
 	if (host->hw_ver.major >= 0x2) {
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index d0d340210ccf..90be92766723 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -250,10 +250,6 @@ static int ufshcd_uic_hibern8_exit(struct ufs_hba *hba);
 static int ufshcd_uic_hibern8_enter(struct ufs_hba *hba);
 static inline void ufshcd_add_delay_before_dme_cmd(struct ufs_hba *hba);
 static int ufshcd_host_reset_and_restore(struct ufs_hba *hba);
-static void ufshcd_resume_clkscaling(struct ufs_hba *hba);
-static void ufshcd_suspend_clkscaling(struct ufs_hba *hba);
-static void __ufshcd_suspend_clkscaling(struct ufs_hba *hba);
-static int ufshcd_scale_clks(struct ufs_hba *hba, bool scale_up);
 static irqreturn_t ufshcd_intr(int irq, void *__hba);
 static int ufshcd_change_power_mode(struct ufs_hba *hba,
 			     struct ufs_pa_layer_attr *pwr_mode);
@@ -892,575 +888,6 @@ static bool ufshcd_is_unipro_pa_params_tuning_req(struct ufs_hba *hba)
 		return false;
 }
 
-static int ufshcd_scale_clks(struct ufs_hba *hba, bool scale_up)
-{
-	int ret = 0;
-	struct ufs_clk_info *clki;
-	struct list_head *head = &hba->clk_list_head;
-	ktime_t start = ktime_get();
-	bool clk_state_changed = false;
-
-	if (list_empty(head))
-		goto out;
-
-	ret = ufshcd_vops_clk_scale_notify(hba, scale_up, PRE_CHANGE);
-	if (ret)
-		return ret;
-
-	list_for_each_entry(clki, head, list) {
-		if (!IS_ERR_OR_NULL(clki->clk)) {
-			if (scale_up && clki->max_freq) {
-				if (clki->curr_freq == clki->max_freq)
-					continue;
-
-				clk_state_changed = true;
-				ret = clk_set_rate(clki->clk, clki->max_freq);
-				if (ret) {
-					dev_err(hba->dev, "%s: %s clk set rate(%dHz) failed, %d\n",
-						__func__, clki->name,
-						clki->max_freq, ret);
-					break;
-				}
-				trace_ufshcd_clk_scaling(dev_name(hba->dev),
-						"scaled up", clki->name,
-						clki->curr_freq,
-						clki->max_freq);
-
-				clki->curr_freq = clki->max_freq;
-
-			} else if (!scale_up && clki->min_freq) {
-				if (clki->curr_freq == clki->min_freq)
-					continue;
-
-				clk_state_changed = true;
-				ret = clk_set_rate(clki->clk, clki->min_freq);
-				if (ret) {
-					dev_err(hba->dev, "%s: %s clk set rate(%dHz) failed, %d\n",
-						__func__, clki->name,
-						clki->min_freq, ret);
-					break;
-				}
-				trace_ufshcd_clk_scaling(dev_name(hba->dev),
-						"scaled down", clki->name,
-						clki->curr_freq,
-						clki->min_freq);
-				clki->curr_freq = clki->min_freq;
-			}
-		}
-		dev_dbg(hba->dev, "%s: clk: %s, rate: %lu\n", __func__,
-				clki->name, clk_get_rate(clki->clk));
-	}
-
-	ret = ufshcd_vops_clk_scale_notify(hba, scale_up, POST_CHANGE);
-
-out:
-	if (clk_state_changed)
-		trace_ufshcd_profile_clk_scaling(dev_name(hba->dev),
-			(scale_up ? "up" : "down"),
-			ktime_to_us(ktime_sub(ktime_get(), start)), ret);
-	return ret;
-}
-
-/**
- * ufshcd_is_devfreq_scaling_required - check if scaling is required or not
- * @hba: per adapter instance
- * @scale_up: True if scaling up and false if scaling down
- *
- * Returns true if scaling is required, false otherwise.
- */
-static bool ufshcd_is_devfreq_scaling_required(struct ufs_hba *hba,
-					       bool scale_up)
-{
-	struct ufs_clk_info *clki;
-	struct list_head *head = &hba->clk_list_head;
-
-	if (list_empty(head))
-		return false;
-
-	list_for_each_entry(clki, head, list) {
-		if (!IS_ERR_OR_NULL(clki->clk)) {
-			if (scale_up && clki->max_freq) {
-				if (clki->curr_freq == clki->max_freq)
-					continue;
-				return true;
-			} else if (!scale_up && clki->min_freq) {
-				if (clki->curr_freq == clki->min_freq)
-					continue;
-				return true;
-			}
-		}
-	}
-
-	return false;
-}
-
-static int ufshcd_wait_for_doorbell_clr(struct ufs_hba *hba,
-					u64 wait_timeout_us)
-{
-	unsigned long flags;
-	int ret = 0;
-	u32 tm_doorbell;
-	u32 tr_doorbell;
-	bool timeout = false, do_last_check = false;
-	ktime_t start;
-
-	ufshcd_hold(hba, false);
-	spin_lock_irqsave(hba->host->host_lock, flags);
-	/*
-	 * Wait for all the outstanding tasks/transfer requests.
-	 * Verify by checking the doorbell registers are clear.
-	 */
-	start = ktime_get();
-	do {
-		if (hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL) {
-			ret = -EBUSY;
-			goto out;
-		}
-
-		tm_doorbell = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL);
-		tr_doorbell = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL);
-		if (!tm_doorbell && !tr_doorbell) {
-			timeout = false;
-			break;
-		} else if (do_last_check) {
-			break;
-		}
-
-		spin_unlock_irqrestore(hba->host->host_lock, flags);
-		schedule();
-		if (ktime_to_us(ktime_sub(ktime_get(), start)) >
-		    wait_timeout_us) {
-			timeout = true;
-			/*
-			 * We might have scheduled out for long time so make
-			 * sure to check if doorbells are cleared by this time
-			 * or not.
-			 */
-			do_last_check = true;
-		}
-		spin_lock_irqsave(hba->host->host_lock, flags);
-	} while (tm_doorbell || tr_doorbell);
-
-	if (timeout) {
-		dev_err(hba->dev,
-			"%s: timedout waiting for doorbell to clear (tm=0x%x, tr=0x%x)\n",
-			__func__, tm_doorbell, tr_doorbell);
-		ret = -EBUSY;
-	}
-out:
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
-	ufshcd_release(hba);
-	return ret;
-}
-
-/**
- * ufshcd_scale_gear - scale up/down UFS gear
- * @hba: per adapter instance
- * @scale_up: True for scaling up gear and false for scaling down
- *
- * Returns 0 for success,
- * Returns -EBUSY if scaling can't happen at this time
- * Returns non-zero for any other errors
- */
-static int ufshcd_scale_gear(struct ufs_hba *hba, bool scale_up)
-{
-	#define UFS_MIN_GEAR_TO_SCALE_DOWN	UFS_HS_G1
-	int ret = 0;
-	struct ufs_pa_layer_attr new_pwr_info;
-
-	if (scale_up) {
-		memcpy(&new_pwr_info, &hba->clk_scaling.saved_pwr_info.info,
-		       sizeof(struct ufs_pa_layer_attr));
-	} else {
-		memcpy(&new_pwr_info, &hba->pwr_info,
-		       sizeof(struct ufs_pa_layer_attr));
-
-		if (hba->pwr_info.gear_tx > UFS_MIN_GEAR_TO_SCALE_DOWN
-		    || hba->pwr_info.gear_rx > UFS_MIN_GEAR_TO_SCALE_DOWN) {
-			/* save the current power mode */
-			memcpy(&hba->clk_scaling.saved_pwr_info.info,
-				&hba->pwr_info,
-				sizeof(struct ufs_pa_layer_attr));
-
-			/* scale down gear */
-			new_pwr_info.gear_tx = UFS_MIN_GEAR_TO_SCALE_DOWN;
-			new_pwr_info.gear_rx = UFS_MIN_GEAR_TO_SCALE_DOWN;
-		}
-	}
-
-	/* check if the power mode needs to be changed or not? */
-	ret = ufshcd_change_power_mode(hba, &new_pwr_info);
-
-	if (ret)
-		dev_err(hba->dev, "%s: failed err %d, old gear: (tx %d rx %d), new gear: (tx %d rx %d)",
-			__func__, ret,
-			hba->pwr_info.gear_tx, hba->pwr_info.gear_rx,
-			new_pwr_info.gear_tx, new_pwr_info.gear_rx);
-
-	return ret;
-}
-
-static int ufshcd_clock_scaling_prepare(struct ufs_hba *hba)
-{
-	#define DOORBELL_CLR_TOUT_US		(1000 * 1000) /* 1 sec */
-	int ret = 0;
-	/*
-	 * make sure that there are no outstanding requests when
-	 * clock scaling is in progress
-	 */
-	ufshcd_scsi_block_requests(hba);
-	down_write(&hba->clk_scaling_lock);
-	if (ufshcd_wait_for_doorbell_clr(hba, DOORBELL_CLR_TOUT_US)) {
-		ret = -EBUSY;
-		up_write(&hba->clk_scaling_lock);
-		ufshcd_scsi_unblock_requests(hba);
-	}
-
-	return ret;
-}
-
-static void ufshcd_clock_scaling_unprepare(struct ufs_hba *hba)
-{
-	up_write(&hba->clk_scaling_lock);
-	ufshcd_scsi_unblock_requests(hba);
-}
-
-/**
- * ufshcd_devfreq_scale - scale up/down UFS clocks and gear
- * @hba: per adapter instance
- * @scale_up: True for scaling up and false for scalin down
- *
- * Returns 0 for success,
- * Returns -EBUSY if scaling can't happen at this time
- * Returns non-zero for any other errors
- */
-static int ufshcd_devfreq_scale(struct ufs_hba *hba, bool scale_up)
-{
-	int ret = 0;
-
-	/* let's not get into low power until clock scaling is completed */
-	ufshcd_hold(hba, false);
-
-	ret = ufshcd_clock_scaling_prepare(hba);
-	if (ret)
-		return ret;
-
-	/* scale down the gear before scaling down clocks */
-	if (!scale_up) {
-		ret = ufshcd_scale_gear(hba, false);
-		if (ret)
-			goto out;
-	}
-
-	ret = ufshcd_scale_clks(hba, scale_up);
-	if (ret) {
-		if (!scale_up)
-			ufshcd_scale_gear(hba, true);
-		goto out;
-	}
-
-	/* scale up the gear after scaling up clocks */
-	if (scale_up) {
-		ret = ufshcd_scale_gear(hba, true);
-		if (ret) {
-			ufshcd_scale_clks(hba, false);
-			goto out;
-		}
-	}
-
-	ret = ufshcd_vops_clk_scale_notify(hba, scale_up, POST_CHANGE);
-
-out:
-	ufshcd_clock_scaling_unprepare(hba);
-	ufshcd_release(hba);
-	return ret;
-}
-
-static void ufshcd_clk_scaling_suspend_work(struct work_struct *work)
-{
-	struct ufs_hba *hba = container_of(work, struct ufs_hba,
-					   clk_scaling.suspend_work);
-	unsigned long irq_flags;
-
-	spin_lock_irqsave(hba->host->host_lock, irq_flags);
-	if (hba->clk_scaling.active_reqs || hba->clk_scaling.is_suspended) {
-		spin_unlock_irqrestore(hba->host->host_lock, irq_flags);
-		return;
-	}
-	hba->clk_scaling.is_suspended = true;
-	spin_unlock_irqrestore(hba->host->host_lock, irq_flags);
-
-	__ufshcd_suspend_clkscaling(hba);
-}
-
-static void ufshcd_clk_scaling_resume_work(struct work_struct *work)
-{
-	struct ufs_hba *hba = container_of(work, struct ufs_hba,
-					   clk_scaling.resume_work);
-	unsigned long irq_flags;
-
-	spin_lock_irqsave(hba->host->host_lock, irq_flags);
-	if (!hba->clk_scaling.is_suspended) {
-		spin_unlock_irqrestore(hba->host->host_lock, irq_flags);
-		return;
-	}
-	hba->clk_scaling.is_suspended = false;
-	spin_unlock_irqrestore(hba->host->host_lock, irq_flags);
-
-	devfreq_resume_device(hba->devfreq);
-}
-
-static int ufshcd_devfreq_target(struct device *dev,
-				unsigned long *freq, u32 flags)
-{
-	int ret = 0;
-	struct ufs_hba *hba = dev_get_drvdata(dev);
-	ktime_t start;
-	bool scale_up, sched_clk_scaling_suspend_work = false;
-	struct list_head *clk_list = &hba->clk_list_head;
-	struct ufs_clk_info *clki;
-	unsigned long irq_flags;
-
-	if (!ufshcd_is_clkscaling_supported(hba))
-		return -EINVAL;
-
-	spin_lock_irqsave(hba->host->host_lock, irq_flags);
-	if (ufshcd_eh_in_progress(hba)) {
-		spin_unlock_irqrestore(hba->host->host_lock, irq_flags);
-		return 0;
-	}
-
-	if (!hba->clk_scaling.active_reqs)
-		sched_clk_scaling_suspend_work = true;
-
-	if (list_empty(clk_list)) {
-		spin_unlock_irqrestore(hba->host->host_lock, irq_flags);
-		goto out;
-	}
-
-	clki = list_first_entry(&hba->clk_list_head, struct ufs_clk_info, list);
-	scale_up = (*freq == clki->max_freq) ? true : false;
-	if (!ufshcd_is_devfreq_scaling_required(hba, scale_up)) {
-		spin_unlock_irqrestore(hba->host->host_lock, irq_flags);
-		ret = 0;
-		goto out; /* no state change required */
-	}
-	spin_unlock_irqrestore(hba->host->host_lock, irq_flags);
-
-	start = ktime_get();
-	ret = ufshcd_devfreq_scale(hba, scale_up);
-
-	trace_ufshcd_profile_clk_scaling(dev_name(hba->dev),
-		(scale_up ? "up" : "down"),
-		ktime_to_us(ktime_sub(ktime_get(), start)), ret);
-
-out:
-	if (sched_clk_scaling_suspend_work)
-		queue_work(hba->clk_scaling.workq,
-			   &hba->clk_scaling.suspend_work);
-
-	return ret;
-}
-
-
-static int ufshcd_devfreq_get_dev_status(struct device *dev,
-		struct devfreq_dev_status *stat)
-{
-	struct ufs_hba *hba = dev_get_drvdata(dev);
-	struct ufs_clk_scaling *scaling = &hba->clk_scaling;
-	unsigned long flags;
-
-	if (!ufshcd_is_clkscaling_supported(hba))
-		return -EINVAL;
-
-	memset(stat, 0, sizeof(*stat));
-
-	spin_lock_irqsave(hba->host->host_lock, flags);
-	if (!scaling->window_start_t)
-		goto start_window;
-
-	if (scaling->is_busy_started)
-		scaling->tot_busy_t += ktime_to_us(ktime_sub(ktime_get(),
-					scaling->busy_start_t));
-
-	stat->total_time = jiffies_to_usecs((long)jiffies -
-				(long)scaling->window_start_t);
-	stat->busy_time = scaling->tot_busy_t;
-start_window:
-	scaling->window_start_t = jiffies;
-	scaling->tot_busy_t = 0;
-
-	if (hba->outstanding_reqs) {
-		scaling->busy_start_t = ktime_get();
-		scaling->is_busy_started = true;
-	} else {
-		scaling->busy_start_t = 0;
-		scaling->is_busy_started = false;
-	}
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
-	return 0;
-}
-
-static struct devfreq_dev_profile ufs_devfreq_profile = {
-	.polling_ms	= 100,
-	.target		= ufshcd_devfreq_target,
-	.get_dev_status	= ufshcd_devfreq_get_dev_status,
-};
-
-static int ufshcd_devfreq_init(struct ufs_hba *hba)
-{
-	struct list_head *clk_list = &hba->clk_list_head;
-	struct ufs_clk_info *clki;
-	struct devfreq *devfreq;
-	int ret;
-
-	/* Skip devfreq if we don't have any clocks in the list */
-	if (list_empty(clk_list))
-		return 0;
-
-	clki = list_first_entry(clk_list, struct ufs_clk_info, list);
-	dev_pm_opp_add(hba->dev, clki->min_freq, 0);
-	dev_pm_opp_add(hba->dev, clki->max_freq, 0);
-
-	devfreq = devfreq_add_device(hba->dev,
-			&ufs_devfreq_profile,
-			DEVFREQ_GOV_SIMPLE_ONDEMAND,
-			NULL);
-	if (IS_ERR(devfreq)) {
-		ret = PTR_ERR(devfreq);
-		dev_err(hba->dev, "Unable to register with devfreq %d\n", ret);
-
-		dev_pm_opp_remove(hba->dev, clki->min_freq);
-		dev_pm_opp_remove(hba->dev, clki->max_freq);
-		return ret;
-	}
-
-	hba->devfreq = devfreq;
-
-	return 0;
-}
-
-static void ufshcd_devfreq_remove(struct ufs_hba *hba)
-{
-	struct list_head *clk_list = &hba->clk_list_head;
-	struct ufs_clk_info *clki;
-
-	if (!hba->devfreq)
-		return;
-
-	devfreq_remove_device(hba->devfreq);
-	hba->devfreq = NULL;
-
-	clki = list_first_entry(clk_list, struct ufs_clk_info, list);
-	dev_pm_opp_remove(hba->dev, clki->min_freq);
-	dev_pm_opp_remove(hba->dev, clki->max_freq);
-}
-
-static void __ufshcd_suspend_clkscaling(struct ufs_hba *hba)
-{
-	unsigned long flags;
-
-	devfreq_suspend_device(hba->devfreq);
-	spin_lock_irqsave(hba->host->host_lock, flags);
-	hba->clk_scaling.window_start_t = 0;
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
-}
-
-static void ufshcd_suspend_clkscaling(struct ufs_hba *hba)
-{
-	unsigned long flags;
-	bool suspend = false;
-
-	if (!ufshcd_is_clkscaling_supported(hba))
-		return;
-
-	spin_lock_irqsave(hba->host->host_lock, flags);
-	if (!hba->clk_scaling.is_suspended) {
-		suspend = true;
-		hba->clk_scaling.is_suspended = true;
-	}
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
-
-	if (suspend)
-		__ufshcd_suspend_clkscaling(hba);
-}
-
-static void ufshcd_resume_clkscaling(struct ufs_hba *hba)
-{
-	unsigned long flags;
-	bool resume = false;
-
-	if (!ufshcd_is_clkscaling_supported(hba))
-		return;
-
-	spin_lock_irqsave(hba->host->host_lock, flags);
-	if (hba->clk_scaling.is_suspended) {
-		resume = true;
-		hba->clk_scaling.is_suspended = false;
-	}
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
-
-	if (resume)
-		devfreq_resume_device(hba->devfreq);
-}
-
-static ssize_t ufshcd_clkscale_enable_show(struct device *dev,
-		struct device_attribute *attr, char *buf)
-{
-	struct ufs_hba *hba = dev_get_drvdata(dev);
-
-	return snprintf(buf, PAGE_SIZE, "%d\n", hba->clk_scaling.is_allowed);
-}
-
-static ssize_t ufshcd_clkscale_enable_store(struct device *dev,
-		struct device_attribute *attr, const char *buf, size_t count)
-{
-	struct ufs_hba *hba = dev_get_drvdata(dev);
-	u32 value;
-	int err;
-
-	if (kstrtou32(buf, 0, &value))
-		return -EINVAL;
-
-	value = !!value;
-	if (value == hba->clk_scaling.is_allowed)
-		goto out;
-
-	pm_runtime_get_sync(hba->dev);
-	ufshcd_hold(hba, false);
-
-	cancel_work_sync(&hba->clk_scaling.suspend_work);
-	cancel_work_sync(&hba->clk_scaling.resume_work);
-
-	hba->clk_scaling.is_allowed = value;
-
-	if (value) {
-		ufshcd_resume_clkscaling(hba);
-	} else {
-		ufshcd_suspend_clkscaling(hba);
-		err = ufshcd_devfreq_scale(hba, true);
-		if (err)
-			dev_err(hba->dev, "%s: failed to scale clocks up %d\n",
-					__func__, err);
-	}
-
-	ufshcd_release(hba);
-	pm_runtime_put_sync(hba->dev);
-out:
-	return count;
-}
-
-static void ufshcd_clkscaling_init_sysfs(struct ufs_hba *hba)
-{
-	hba->clk_scaling.enable_attr.show = ufshcd_clkscale_enable_show;
-	hba->clk_scaling.enable_attr.store = ufshcd_clkscale_enable_store;
-	sysfs_attr_init(&hba->clk_scaling.enable_attr.attr);
-	hba->clk_scaling.enable_attr.attr.name = "clkscale_enable";
-	hba->clk_scaling.enable_attr.attr.mode = 0644;
-	if (device_create_file(hba->dev, &hba->clk_scaling.enable_attr))
-		dev_err(hba->dev, "Failed to create sysfs for clkscale_enable\n");
-}
-
 static void ufshcd_ungate_work(struct work_struct *work)
 {
 	int ret;
@@ -1742,34 +1169,6 @@ static ssize_t ufshcd_clkgate_enable_store(struct device *dev,
 	return count;
 }
 
-static void ufshcd_init_clk_scaling(struct ufs_hba *hba)
-{
-	char wq_name[sizeof("ufs_clkscaling_00")];
-
-	if (!ufshcd_is_clkscaling_supported(hba))
-		return;
-
-	INIT_WORK(&hba->clk_scaling.suspend_work,
-		  ufshcd_clk_scaling_suspend_work);
-	INIT_WORK(&hba->clk_scaling.resume_work,
-		  ufshcd_clk_scaling_resume_work);
-
-	snprintf(wq_name, sizeof(wq_name), "ufs_clkscaling_%d",
-		 hba->host->host_no);
-	hba->clk_scaling.workq = create_singlethread_workqueue(wq_name);
-
-	ufshcd_clkscaling_init_sysfs(hba);
-}
-
-static void ufshcd_exit_clk_scaling(struct ufs_hba *hba)
-{
-	if (!ufshcd_is_clkscaling_supported(hba))
-		return;
-
-	destroy_workqueue(hba->clk_scaling.workq);
-	ufshcd_devfreq_remove(hba);
-}
-
 static void ufshcd_init_clk_gating(struct ufs_hba *hba)
 {
 	char wq_name[sizeof("ufs_clk_gating_00")];
@@ -1816,50 +1215,6 @@ static void ufshcd_exit_clk_gating(struct ufs_hba *hba)
 	destroy_workqueue(hba->clk_gating.clk_gating_workq);
 }
 
-/* Must be called with host lock acquired */
-static void ufshcd_clk_scaling_start_busy(struct ufs_hba *hba)
-{
-	bool queue_resume_work = false;
-
-	if (!ufshcd_is_clkscaling_supported(hba))
-		return;
-
-	if (!hba->clk_scaling.active_reqs++)
-		queue_resume_work = true;
-
-	if (!hba->clk_scaling.is_allowed || hba->pm_op_in_progress)
-		return;
-
-	if (queue_resume_work)
-		queue_work(hba->clk_scaling.workq,
-			   &hba->clk_scaling.resume_work);
-
-	if (!hba->clk_scaling.window_start_t) {
-		hba->clk_scaling.window_start_t = jiffies;
-		hba->clk_scaling.tot_busy_t = 0;
-		hba->clk_scaling.is_busy_started = false;
-	}
-
-	if (!hba->clk_scaling.is_busy_started) {
-		hba->clk_scaling.busy_start_t = ktime_get();
-		hba->clk_scaling.is_busy_started = true;
-	}
-}
-
-static void ufshcd_clk_scaling_update_busy(struct ufs_hba *hba)
-{
-	struct ufs_clk_scaling *scaling = &hba->clk_scaling;
-
-	if (!ufshcd_is_clkscaling_supported(hba))
-		return;
-
-	if (!hba->outstanding_reqs && scaling->is_busy_started) {
-		scaling->tot_busy_t += ktime_to_us(ktime_sub(ktime_get(),
-					scaling->busy_start_t));
-		scaling->busy_start_t = 0;
-		scaling->is_busy_started = false;
-	}
-}
 /**
  * ufshcd_send_command - Send SCSI or device management commands
  * @hba: per adapter instance
@@ -1870,7 +1225,6 @@ void ufshcd_send_command(struct ufs_hba *hba, unsigned int task_tag)
 {
 	hba->lrb[task_tag].issue_time_stamp = ktime_get();
 	hba->lrb[task_tag].compl_time_stamp = ktime_set(0, 0);
-	ufshcd_clk_scaling_start_busy(hba);
 	__set_bit(task_tag, &hba->outstanding_reqs);
 	ufshcd_writel(hba, 1 << task_tag, REG_UTP_TRANSFER_REQ_DOOR_BELL);
 	/* Make sure that doorbell is committed immediately */
@@ -2398,9 +1752,6 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 		BUG();
 	}
 
-	if (!down_read_trylock(&hba->clk_scaling_lock))
-		return SCSI_MLQUEUE_HOST_BUSY;
-
 	spin_lock_irqsave(hba->host->host_lock, flags);
 	switch (hba->ufshcd_state) {
 	case UFSHCD_STATE_OPERATIONAL:
@@ -2480,7 +1831,6 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 out_unlock:
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
 out:
-	up_read(&hba->clk_scaling_lock);
 	return err;
 }
 
@@ -2670,8 +2020,6 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
 	struct completion wait;
 	unsigned long flags;
 
-	down_read(&hba->clk_scaling_lock);
-
 	/*
 	 * Get free slot, sleep if slots are unavailable.
 	 * Even though we use wait_event() which sleeps indefinitely,
@@ -2704,7 +2052,6 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
 out_put_tag:
 	ufshcd_put_dev_cmd_tag(hba, tag);
 	wake_up(&hba->dev_cmd.tag_wq);
-	up_read(&hba->clk_scaling_lock);
 	return err;
 }
 
@@ -4786,8 +4133,6 @@ static void __ufshcd_transfer_req_compl(struct ufs_hba *hba,
 				complete(hba->dev_cmd.complete);
 			}
 		}
-		if (ufshcd_is_clkscaling_supported(hba))
-			hba->clk_scaling.active_reqs--;
 
 		lrbp->compl_time_stamp = ktime_get();
 	}
@@ -4795,8 +4140,6 @@ static void __ufshcd_transfer_req_compl(struct ufs_hba *hba,
 	/* clear corresponding bits of completed commands */
 	hba->outstanding_reqs ^= completed_reqs;
 
-	ufshcd_clk_scaling_update_busy(hba);
-
 	/* we might have free'd some tags above */
 	wake_up(&hba->dev_cmd.tag_wq);
 }
@@ -5708,8 +5051,6 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba,
 	unsigned long flags;
 	u32 upiu_flags;
 
-	down_read(&hba->clk_scaling_lock);
-
 	wait_event(hba->dev_cmd.tag_wq, ufshcd_get_dev_cmd_tag(hba, &tag));
 
 	init_completion(&wait);
@@ -5772,7 +5113,6 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba,
 
 	ufshcd_put_dev_cmd_tag(hba, tag);
 	wake_up(&hba->dev_cmd.tag_wq);
-	up_read(&hba->clk_scaling_lock);
 	return err;
 }
 
@@ -6110,9 +5450,6 @@ static int ufshcd_host_reset_and_restore(struct ufs_hba *hba)
 	ufshcd_hba_stop(hba, false);
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
 
-	/* scale up clocks to max frequency before full reinitialization */
-	ufshcd_scale_clks(hba, true);
-
 	err = ufshcd_hba_enable(hba);
 	if (err)
 		goto out;
@@ -6871,20 +6208,6 @@ static int ufshcd_probe_hba(struct ufs_hba *hba)
 		if (ufshcd_scsi_add_wlus(hba))
 			goto out;
 
-		/* Initialize devfreq after UFS device is detected */
-		if (ufshcd_is_clkscaling_supported(hba)) {
-			memcpy(&hba->clk_scaling.saved_pwr_info.info,
-				&hba->pwr_info,
-				sizeof(struct ufs_pa_layer_attr));
-			hba->clk_scaling.saved_pwr_info.is_valid = true;
-			if (!hba->devfreq) {
-				ret = ufshcd_devfreq_init(hba);
-				if (ret)
-					goto out;
-			}
-			hba->clk_scaling.is_allowed = true;
-		}
-
 		ufs_bsg_probe(hba);
 
 		scsi_scan_host(hba->host);
@@ -6901,7 +6224,6 @@ static int ufshcd_probe_hba(struct ufs_hba *hba)
 	 */
 	if (ret && !ufshcd_eh_in_progress(hba) && !hba->pm_op_in_progress) {
 		pm_runtime_put_sync(hba->dev);
-		ufshcd_exit_clk_scaling(hba);
 		ufshcd_hba_exit(hba);
 	}
 
@@ -7446,10 +6768,6 @@ static void ufshcd_hba_exit(struct ufs_hba *hba)
 	if (hba->is_powered) {
 		ufshcd_variant_hba_exit(hba);
 		ufshcd_setup_vreg(hba, false);
-		ufshcd_suspend_clkscaling(hba);
-		if (ufshcd_is_clkscaling_supported(hba))
-			if (hba->devfreq)
-				ufshcd_suspend_clkscaling(hba);
 		ufshcd_setup_clocks(hba, false);
 		ufshcd_setup_hba_vreg(hba, false);
 		hba->is_powered = false;
@@ -7725,12 +7043,6 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
 	ufshcd_hold(hba, false);
 	hba->clk_gating.is_suspended = true;
 
-	if (hba->clk_scaling.is_allowed) {
-		cancel_work_sync(&hba->clk_scaling.suspend_work);
-		cancel_work_sync(&hba->clk_scaling.resume_work);
-		ufshcd_suspend_clkscaling(hba);
-	}
-
 	if (req_dev_pwr_mode == UFS_ACTIVE_PWR_MODE &&
 			req_link_state == UIC_LINK_ACTIVE_STATE) {
 		goto disable_clks;
@@ -7806,8 +7118,6 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
 	goto out;
 
 set_link_active:
-	if (hba->clk_scaling.is_allowed)
-		ufshcd_resume_clkscaling(hba);
 	ufshcd_vreg_set_hpm(hba);
 	if (ufshcd_is_link_hibern8(hba) && !ufshcd_uic_hibern8_exit(hba))
 		ufshcd_set_link_active(hba);
@@ -7817,8 +7127,6 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
 	if (!ufshcd_set_dev_pwr_mode(hba, UFS_ACTIVE_PWR_MODE))
 		ufshcd_disable_auto_bkops(hba);
 enable_gating:
-	if (hba->clk_scaling.is_allowed)
-		ufshcd_resume_clkscaling(hba);
 	hba->clk_gating.is_suspended = false;
 	ufshcd_release(hba);
 out:
@@ -7901,9 +7209,6 @@ static int ufshcd_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op)
 
 	hba->clk_gating.is_suspended = false;
 
-	if (hba->clk_scaling.is_allowed)
-		ufshcd_resume_clkscaling(hba);
-
 	/* Schedule clock gating in case of no access to UFS device yet */
 	ufshcd_release(hba);
 
@@ -7920,8 +7225,6 @@ static int ufshcd_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op)
 	ufshcd_vreg_set_lpm(hba);
 disable_irq_and_vops_clks:
 	ufshcd_disable_irq(hba);
-	if (hba->clk_scaling.is_allowed)
-		ufshcd_suspend_clkscaling(hba);
 	ufshcd_setup_clocks(hba, false);
 out:
 	hba->pm_op_in_progress = 0;
@@ -8125,10 +7428,7 @@ void ufshcd_remove(struct ufs_hba *hba)
 	ufshcd_disable_intr(hba, hba->intr_mask);
 	ufshcd_hba_stop(hba, true);
 
-	ufshcd_exit_clk_scaling(hba);
 	ufshcd_exit_clk_gating(hba);
-	if (ufshcd_is_clkscaling_supported(hba))
-		device_remove_file(hba->dev, &hba->clk_scaling.enable_attr);
 	ufshcd_hba_exit(hba);
 }
 EXPORT_SYMBOL_GPL(ufshcd_remove);
@@ -8284,15 +7584,11 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 	/* Initialize mutex for device management commands */
 	mutex_init(&hba->dev_cmd.lock);
 
-	init_rwsem(&hba->clk_scaling_lock);
-
 	/* Initialize device management tag acquire wait queue */
 	init_waitqueue_head(&hba->dev_cmd.tag_wq);
 
 	ufshcd_init_clk_gating(hba);
 
-	ufshcd_init_clk_scaling(hba);
-
 	/*
 	 * In order to avoid any spurious interrupt immediately after
 	 * registering UFS controller interrupt handler, clear any pending UFS
@@ -8368,7 +7664,6 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 out_remove_scsi_host:
 	scsi_remove_host(hba->host);
 exit_gating:
-	ufshcd_exit_clk_scaling(hba);
 	ufshcd_exit_clk_gating(hba);
 out_disable:
 	hba->is_irq_enabled = false;
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index 69ba7445d2b3..11c7b070861c 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -700,7 +700,6 @@ struct ufs_hba {
 	enum bkops_status urgent_bkops_lvl;
 	bool is_urgent_bkops_lvl_checked;
 
-	struct rw_semaphore clk_scaling_lock;
 	struct ufs_desc_size desc_size;
 	atomic_t scsi_block_reqs_cnt;
 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: dd hangs when reading large partitions
  2019-01-21 15:22         ` Marc Gonzalez
@ 2019-01-22  3:12           ` jianchao.wang
  2019-01-22 10:59             ` Marc Gonzalez
  0 siblings, 1 reply; 34+ messages in thread
From: jianchao.wang @ 2019-01-22  3:12 UTC (permalink / raw)
  To: Marc Gonzalez, Christoph Hellwig, Jens Axboe
  Cc: fsdevel, linux-block, SCSI, Joao Pinto, Subhash Jadavani,
	Sayali Lokhande, Can Guo, Asutosh Das, Vijay Viswanath,
	Venkat Gopalakrishnan, Ritesh Harjani, Vivek Gautam,
	Jeffrey Hugo, Maya Erez, Evan Green, Matthias Kaehlcke,
	Douglas Anderson, Stephen Boyd, Tomas Winkler, Adrian Hunter,
	Alim Akhtar, Avri Altman, Bart Van Assche, Martin Petersen,
	Bjorn Andersson

Hello

On 1/21/19 11:22 PM, Marc Gonzalez wrote:
> Well, now we know for sure that the clk_scaling_lock is a red herring.
> I applied the patch below, and still the system locked up:
> 
> # dd if=/dev/sde of=/dev/null bs=1M status=progress
> 3892314112 bytes (3.9 GB, 3.6 GiB) copied, 50.0042 s, 77.8 MB/s
> 
> I can't seem to get the RCU stall warning anymore. How can I get
> to the bottom of this issue?
> 

Can you give more detail on how the system 'locked up'?
Does dd hang there? Is there any hung-task warning in the log?

Does it hang forever, or just for a relatively long time?

And what is the status of dd when it hangs?
Can you take some samples of /proc/<dd's pid>/status and /proc/<dd's pid>/stack during the hang?
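
Something like this small helper could take the samples once a second
(a rough sketch, not tested; pass dd's pid as the only argument):

/* sample.c - rough sketch: dump a task's status and kernel stack once
 * a second. Run as root, e.g. ./sample $(pidof dd)
 */
#include <stdio.h>
#include <unistd.h>

static void dump(const char *pid, const char *name)
{
	char path[64], buf[4096];
	size_t n;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%s/%s", pid, name);
	f = fopen(path, "r");
	if (!f)
		return;
	while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
		fwrite(buf, 1, n, stdout);
	fclose(f);
}

int main(int argc, char **argv)
{
	if (argc < 2) {
		fprintf(stderr, "usage: %s <pid>\n", argv[0]);
		return 1;
	}
	for (;;) {
		dump(argv[1], "status");
		dump(argv[1], "stack");	/* /proc/<pid>/stack needs root */
		puts("--------");
		sleep(1);
	}
}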

And also would you please share the dmesg log and config?

Since it always fails after a fixed number of bytes with buffered reads,
what is the capacity of your system memory?


Thanks
Jianchao

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: dd hangs when reading large partitions
  2019-01-22  3:12           ` jianchao.wang
@ 2019-01-22 10:59             ` Marc Gonzalez
  2019-01-22 12:49               ` Marc Gonzalez
                                 ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Marc Gonzalez @ 2019-01-22 10:59 UTC (permalink / raw)
  To: Jianchao Wang, Christoph Hellwig, Jens Axboe
  Cc: fsdevel, linux-block, SCSI, Joao Pinto, Subhash Jadavani,
	Sayali Lokhande, Can Guo, Asutosh Das, Vijay Viswanath,
	Venkat Gopalakrishnan, Ritesh Harjani, Vivek Gautam,
	Jeffrey Hugo, Maya Erez, Evan Green, Matthias Kaehlcke,
	Douglas Anderson, Stephen Boyd, Tomas Winkler, Adrian Hunter,
	Alim Akhtar, Avri Altman, Bart Van Assche, Martin Petersen,
	Bjorn Andersson

On 22/01/2019 04:12, Jianchao Wang wrote:

> On 1/21/19 11:22 PM, Marc Gonzalez wrote:
> 
>> Well, now we know for sure that the clk_scaling_lock is a red herring.
>> I applied the patch below, and still the system locked up:
>>
>> # dd if=/dev/sde of=/dev/null bs=1M status=progress
>> 3892314112 bytes (3.9 GB, 3.6 GiB) copied, 50.0042 s, 77.8 MB/s
>>
>> I can't seem to get the RCU stall warning anymore. How can I get
>> to the bottom of this issue?
> 
> Can you give more detail on how the system 'locked up'?
> Does dd hang there? Is there any hung-task warning in the log?
> Does it hang forever, or just for a relatively long time?

The system is an arm64 dev board (APQ8098 MEDIABOX) with 4GB RAM and 64 GB UFS.
USB, SDHC, PCIe, SATA, Ethernet are not functional yet (so much work ahead).
All I have is a single serial console.
When the shell hangs, I lose access to the system altogether.
SysRq is not implemented either. I am blind once the shell locks up.
The system has been frozen for 15 hours, I think that qualifies as 'forever' ;-)

> And what is the status of dd when it hangs?
> Can you take some samples of /proc/<dd's pid>/status and /proc/<dd's pid>/stack during the hang?

Sadly, I cannot access this information once the shell locks up.

However, the kernel did print many warnings overnight (see below).

> And also would you please share the dmesg log and config?

See below.

> Since it always fails after a fixed number of bytes with buffered reads,
> what is the capacity of your system memory?

4GB RAM. And the system hangs after reading 3.8GB
I think this is not a coincidence.
NB: swap is disabled (this might be relevant)

On a freshly booted system, I get

# free
random: get_random_u64 called from copy_process.isra.9.part.10+0x2c8/0x1700 with crng_init=1
random: get_random_u64 called from arch_pick_mmap_layout+0xdc/0x100 with crng_init=1
random: get_random_u64 called from load_elf_binary+0x2b8/0x12b4 with crng_init=1
             total       used       free     shared    buffers     cached
Mem:       3948996      48916    3900080      17124          0      17124
-/+ buffers/cache:      31792    3917204
Swap:            0          0          0

# cat /proc/meminfo 
random: get_random_u64 called from copy_process.isra.9.part.10+0x2c8/0x1700 with crng_init=1
random: get_random_u64 called from arch_pick_mmap_layout+0xdc/0x100 with crng_init=1
random: get_random_u64 called from load_elf_binary+0x2b8/0x12b4 with crng_init=1
MemTotal:        3948996 kB
MemFree:         3899824 kB
MemAvailable:    3880472 kB
Buffers:               0 kB
Cached:            17124 kB
SwapCached:            0 kB
Active:             5700 kB
Inactive:          11748 kB
Active(anon):       5700 kB
Inactive(anon):    11748 kB
Active(file):          0 kB
Inactive(file):        0 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:           432 kB
Mapped:             2244 kB
Shmem:             17124 kB
KReclaimable:       8444 kB
Slab:              18120 kB
SReclaimable:       8444 kB
SUnreclaim:         9676 kB
KernelStack:        1372 kB
PageTables:           44 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1974496 kB
Committed_AS:      19072 kB
VmallocTotal:   262930368 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
Percpu:              832 kB


BOOT LOG:

Booting Linux on physical CPU 0x0000000000 [0x51af8014]
Linux version 5.0.0-rc1 (mgonzalez@venus) (gcc version 7.3.1 20180425 [linaro-7.3-2018.05 revision d29120a424ecfbc167ef90065c0eeb7f91977701] (Linaro GCC 7.3-2018.05)) #54 SMP PREEMPT Mon Jan 21 19:06:50 CET 2019
Machine model: Qualcomm Technologies, Inc. MSM8998 v1 MTP
psci: probing for conduit method from DT.
psci: PSCIv1.0 detected in firmware.
psci: Using standard PSCI v0.2 function IDs
psci: MIGRATE_INFO_TYPE not supported.
psci: SMC Calling Convention v1.0
random: get_random_bytes called from start_kernel+0xb4/0x45c with crng_init=0
percpu: Embedded 23 pages/cpu @(____ptrval____) s53272 r8192 d32744 u94208
Detected VIPT I-cache on CPU0
CPU features: detected: Kernel page table isolation (KPTI)
Built 1 zonelists, mobility grouping on.  Total pages: 1012273
Kernel command line: loglevel=7 androidboot.bootdevice=1da4000.ufshc androidboot.serialno=53733c35 androidboot.baseband=apq mdss_mdp.panel=1:hdmi:16
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
software IO TLB: mapped [mem 0xfbfff000-0xfffff000] (64MB)
Memory: 3943364K/4114176K available (3646K kernel code, 478K rwdata, 1296K rodata, 5632K init, 12850K bss, 170812K reserved, 0K cma-reserved)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
ftrace: allocating 12677 entries in 50 pages
Running RCU self tests
rcu: Preemptible hierarchical RCU implementation.
rcu:    RCU lockdep checking is enabled.
rcu:    RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=8.
        Tasks RCU enabled.
rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
GICv3: Distributor has no Range Selector support
GICv3: no VLPI support, no direct LPI support
GICv3: CPU0: found redistributor 0 region 0:0x0000000017b00000
ITS: No ITS available, not enabling LPIs
arch_timer: cp15 and mmio timer(s) running at 19.20MHz (virt/virt).
clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x46d987e47, max_idle_ns: 440795202767 ns
sched_clock: 56 bits at 19MHz, resolution 52ns, wraps every 4398046511078ns
Console: colour dummy device 80x25
printk: console [tty0] enabled
Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
... MAX_LOCKDEP_SUBCLASSES:  8
... MAX_LOCK_DEPTH:          48
... MAX_LOCKDEP_KEYS:        8191
... CLASSHASH_SIZE:          4096
... MAX_LOCKDEP_ENTRIES:     32768
... MAX_LOCKDEP_CHAINS:      65536
... CHAINHASH_SIZE:          32768
 memory used by lock dependency info: 7263 kB
 per task-struct memory footprint: 1920 bytes
Calibrating delay loop (skipped), value calculated using timer frequency.. 38.40 BogoMIPS (lpj=76800)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 8192 (order: 4, 65536 bytes)
Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes)
random: get_random_u64 called from copy_process.isra.9.part.10+0x2c8/0x1700 with crng_init=0
ASID allocator initialised with 32768 entries
rcu: Hierarchical SRCU implementation.
smp: Bringing up secondary CPUs ...
Detected VIPT I-cache on CPU1
GICv3: CPU1: found redistributor 1 region 0:0x0000000017b20000
CPU1: Booted secondary processor 0x0000000001 [0x51af8014]
Detected VIPT I-cache on CPU2
GICv3: CPU2: found redistributor 2 region 0:0x0000000017b40000
CPU2: Booted secondary processor 0x0000000002 [0x51af8014]
Detected VIPT I-cache on CPU3
GICv3: CPU3: found redistributor 3 region 0:0x0000000017b60000
CPU3: Booted secondary processor 0x0000000003 [0x51af8014]
Detected VIPT I-cache on CPU4
CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64MMFR0_EL1. Boot CPU: 0x00000000001122, CPU4: 0x00000000101122
CPU features: Unsupported CPU feature variation detected.
GICv3: CPU4: found redistributor 100 region 0:0x0000000017b80000
CPU4: Booted secondary processor 0x0000000100 [0x51af8001]
Detected VIPT I-cache on CPU5
CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64MMFR0_EL1. Boot CPU: 0x00000000001122, CPU5: 0x00000000101122
GICv3: CPU5: found redistributor 101 region 0:0x0000000017ba0000
CPU5: Booted secondary processor 0x0000000101 [0x51af8001]
Detected VIPT I-cache on CPU6
CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64MMFR0_EL1. Boot CPU: 0x00000000001122, CPU6: 0x00000000101122
GICv3: CPU6: found redistributor 102 region 0:0x0000000017bc0000
CPU6: Booted secondary processor 0x0000000102 [0x51af8001]
Detected VIPT I-cache on CPU7
CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64MMFR0_EL1. Boot CPU: 0x00000000001122, CPU7: 0x00000000101122
GICv3: CPU7: found redistributor 103 region 0:0x0000000017be0000
CPU7: Booted secondary processor 0x0000000103 [0x51af8001]
smp: Brought up 1 node, 8 CPUs
SMP: Total of 8 processors activated.
CPU features: detected: GIC system register CPU interface
CPU features: detected: 32-bit EL0 Support
CPU features: detected: CRC32 instructions
CPU: All CPU(s) started at EL1
alternatives: patching kernel code
devtmpfs: initialized
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
futex hash table entries: 2048 (order: 6, 262144 bytes)
pinctrl core: initialized pinctrl subsystem
vdso: 2 pages (1 code @ (____ptrval____), 1 data @ (____ptrval____))
DMA: preallocated 256 KiB pool for atomic allocations
SCSI subsystem initialized
clocksource: Switched to clocksource arch_sys_counter
s1: supplied by vph_pwr
s2: supplied by vph_pwr
s3: supplied by vph_pwr
s3: Bringing 0uV into 1352000-1352000uV
s4: supplied by vph_pwr
s4: Bringing 0uV into 1800000-1800000uV
s5: supplied by vph_pwr
s5: Bringing 0uV into 1904000-1904000uV
s6: supplied by vph_pwr
s7: supplied by vph_pwr
s7: Bringing 0uV into 900000-900000uV
s8: supplied by vph_pwr
s9: supplied by vph_pwr
s10: supplied by vph_pwr
s11: supplied by vph_pwr
s12: supplied by vph_pwr
s13: supplied by vph_pwr
l1: supplied by s7
l1: Bringing 0uV into 880000-880000uV
l2: supplied by s3
l2: Bringing 0uV into 1200000-1200000uV
l3: supplied by s7
l3: Bringing 0uV into 1000000-1000000uV
l4: supplied by s7
l5: supplied by s7
l5: Bringing 0uV into 800000-800000uV
l6: supplied by s5
l6: Bringing 0uV into 1808000-1808000uV
l7: supplied by s5
l7: Bringing 0uV into 1800000-1800000uV
l8: supplied by s3
l8: Bringing 0uV into 1200000-1200000uV
l9: Bringing 0uV into 1808000-1808000uV
l10: Bringing 0uV into 1808000-1808000uV
l11: supplied by s7
l11: Bringing 0uV into 1000000-1000000uV
l12: supplied by s5
l12: Bringing 0uV into 1800000-1800000uV
l13: Bringing 0uV into 1808000-1808000uV
l14: supplied by s5
l14: Bringing 0uV into 1880000-1880000uV
l15: supplied by s5
l15: Bringing 0uV into 1800000-1800000uV
l16: Bringing 0uV into 2704000-2704000uV
l17: supplied by s3
l17: Bringing 0uV into 1304000-1304000uV
l18: Bringing 0uV into 2704000-2704000uV
l19: Bringing 0uV into 3008000-3008000uV
l20: Bringing 0uV into 2960000-2960000uV
l21: Bringing 0uV into 2960000-2960000uV
l22: Bringing 0uV into 2864000-2864000uV
l23: Bringing 0uV into 3312000-3312000uV
l24: Bringing 0uV into 3088000-3088000uV
l25: Bringing 0uV into 3104000-3104000uV
l26: supplied by s3
l26: Bringing 0uV into 1200000-1200000uV
l27: supplied by s7
l28: Bringing 0uV into 3008000-3008000uV
lvs1: supplied by s4
lvs2: supplied by s4
bob: supplied by vph_pwr
bob: Bringing 0uV into 3312000-3312000uV
workingset: timestamp_bits=62 max_order=20 bucket_order=0
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
qcom-qmp-phy 1da7000.phy: Linked as a consumer to regulator.15
qcom-qmp-phy 1da7000.phy: Linked as a consumer to regulator.16
qcom-qmp-phy 1da7000.phy: Registered Qcom-QMP phy
Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
msm_serial c1b0000.serial: msm_serial: detected port #0
msm_serial c1b0000.serial: uartclk = 1843200
c1b0000.serial: ttyMSM0 at MMIO 0xc1b0000 (irq = 15, base_baud = 115200) is a MSM
msm_serial: console setup on port #0
printk: console [ttyMSM0] enabled
msm_serial: driver initialized
ufshcd-qcom 1da4000.ufshc: ufshcd_populate_vreg: Unable to find vdd-hba-supply regulator, assuming enabled
l20: supplied by bob
ufshcd-qcom 1da4000.ufshc: Linked as a consumer to regulator.34
ufshcd-qcom 1da4000.ufshc: Linked as a consumer to regulator.40
ufshcd-qcom 1da4000.ufshc: Linked as a consumer to regulator.5
ENTER ufshcd_runtime_suspend
scsi host0: ufshcd
random: get_random_bytes called from init_oops_id+0x40/0x50 with crng_init=0
random: get_random_bytes called from prandom_seed_full_state+0x60/0x110 with crng_init=0
ufshcd-qcom 1da4000.ufshc: ufshcd_print_pwr_info:[RX, TX]: gear=[1, 1], lane[1, 1], pwr[SLOWAUTO_MODE, SLOWAUTO_MODE], rate = 0
l9: supplied by bob
l10: supplied by bob
l13: supplied by bob
l16: supplied by bob
l18: supplied by bob
l19: supplied by bob
l21: supplied by bob
l22: supplied by bob
l23: supplied by bob
l24: supplied by bob
l25: supplied by bob
l28: supplied by bob
ufshcd-qcom 1da4000.ufshc: ufshcd_print_pwr_info:[RX, TX]: gear=[3, 3], lane[2, 2], pwr[FAST MODE, FAST MODE], rate = 2
scsi 0:0:0:49488: scsi_add_lun: correcting incorrect peripheral device type 0x0 for W-LUN 0x            c150hN
scsi 0:0:0:49488: Well-known LUN    SAMSUNG  KLUCG4J1EB-B0B1  0200 PQ: 0 ANSI: 6
scsi 0:0:0:49476: scsi_add_lun: correcting incorrect peripheral device type 0x0 for W-LUN 0x            c144hN
scsi 0:0:0:49476: Well-known LUN    SAMSUNG  KLUCG4J1EB-B0B1  0200 PQ: 0 ANSI: 6
scsi 0:0:0:49456: scsi_add_lun: correcting incorrect peripheral device type 0x0 for W-LUN 0x            c130hN
scsi 0:0:0:49456: Well-known LUN    SAMSUNG  KLUCG4J1EB-B0B1  0200 PQ: 0 ANSI: 6
scsi 0:0:0:0: Direct-Access     SAMSUNG  KLUCG4J1EB-B0B1  0200 PQ: 0 ANSI: 6
sd 0:0:0:0: Power-on or device reset occurred
scsi 0:0:0:1: Direct-Access     SAMSUNG  KLUCG4J1EB-B0B1  0200 PQ: 0 ANSI: 6
sd 0:0:0:0: [sda] 14145536 4096-byte logical blocks: (57.9 GB/54.0 GiB)
sd 0:0:0:1: Power-on or device reset occurred
scsi 0:0:0:2: Direct-Access     SAMSUNG  KLUCG4J1EB-B0B1  0200 PQ: 0 ANSI: 6
sd 0:0:0:2: Power-on or device reset occurred
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
scsi 0:0:0:3: Direct-Access     SAMSUNG  KLUCG4J1EB-B0B1  0200 PQ: 0 ANSI: 6
sd 0:0:0:3: Power-on or device reset occurred
sd 0:0:0:3: [sdd] 32768 4096-byte logical blocks: (134 MB/128 MiB)
sd 0:0:0:1: [sdb] 1024 4096-byte logical blocks: (4.19 MB/4.00 MiB)
sd 0:0:0:1: [sdb] Write Protect is off
sd 0:0:0:1: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
 sda: sda1 sda2 sda3 sda4 sda5 sda6 sda7
sd 0:0:0:2: [sdc] 1024 4096-byte logical blocks: (4.19 MB/4.00 MiB)
sd 0:0:0:2: [sdc] Write Protect is off
sd 0:0:0:2: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
random: fast init done
 sdb: sdb1
sd 0:0:0:3: [sdd] Write Protect is off
scsi 0:0:0:4: Direct-Access     SAMSUNG  KLUCG4J1EB-B0B1  0200 PQ: 0 ANSI: 6
 sdc: sdc1
sd 0:0:0:4: Power-on or device reset occurred
sd 0:0:0:1: [sdb] Attached SCSI disk
sd 0:0:0:4: [sde] 1048576 4096-byte logical blocks: (4.29 GB/4.00 GiB)
sd 0:0:0:4: [sde] Write Protect is off
sd 0:0:0:4: [sde] Write cache: enabled, read cache: enabled, supports DPO and FUA
sd 0:0:0:3: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
sd 0:0:0:2: [sdc] Attached SCSI disk
 sde: sde1 sde2 sde3 sde4 sde5 sde6 sde7 sde8 sde9 sde10 sde11 sde12 sde13 sde14 sde15 sde16 sde17 sde18 sde19 sde20 sde21 sde22 sde23 sde24 sde25 sde26 sde27 sde28 sde29 sde30 sde31 sde32 sde33 sde34 sde35 sde36 sde37 sde38
scsi 0:0:0:5: Direct-Access     SAMSUNG  KLUCG4J1EB-B0B1  0200 PQ: 0 ANSI: 6
 sdd: sdd1 sdd2
sd 0:0:0:0: [sda] Attached SCSI disk
sd 0:0:0:3: [sdd] Attached SCSI disk
sd 0:0:0:5: Power-on or device reset occurred
sd 0:0:0:4: [sde] Attached SCSI disk
sd 0:0:0:5: [sdf] 393216 4096-byte logical blocks: (1.61 GB/1.50 GiB)
sd 0:0:0:5: [sdf] Write Protect is off
sd 0:0:0:5: [sdf] Write cache: enabled, read cache: enabled, supports DPO and FUA
 sdf: sdf1 sdf2 sdf3
sd 0:0:0:5: [sdf] Attached SCSI disk
Freeing unused kernel memory: 5632K
Checked W+X mappings: passed, no W+X pages found


DEFCONFIG

# CONFIG_SWAP is not set
CONFIG_NO_HZ_IDLE=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_PREEMPT=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=20
CONFIG_BLK_DEV_INITRD=y
CONFIG_ARCH_QCOM=y
# CONFIG_EFI is not set
# CONFIG_SUSPEND is not set
CONFIG_PM=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MQ_IOSCHED_DEADLINE is not set
# CONFIG_MQ_IOSCHED_KYBER is not set
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
# CONFIG_BLK_DEV is not set
CONFIG_SCSI=y
# CONFIG_SCSI_PROC_FS is not set
CONFIG_BLK_DEV_SD=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_SAS_LIBSAS=y
CONFIG_SCSI_UFSHCD=y
CONFIG_SCSI_UFSHCD_PLATFORM=y
CONFIG_SCSI_UFS_QCOM=y
CONFIG_SCSI_UFS_BSG=y
# CONFIG_INPUT_KEYBOARD is not set
# CONFIG_INPUT_MOUSE is not set
# CONFIG_SERIO_SERPORT is not set
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_LEGACY_PTY_COUNT=16
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_MSM=y
CONFIG_SERIAL_MSM_CONSOLE=y
CONFIG_SERIAL_DEV_BUS=y
# CONFIG_HW_RANDOM is not set
CONFIG_PINCTRL_MSM8998=y
CONFIG_GPIO_GENERIC_PLATFORM=y
# CONFIG_HWMON is not set
CONFIG_REGULATOR=y
CONFIG_REGULATOR_FIXED_VOLTAGE=y
CONFIG_REGULATOR_QCOM_SMD_RPM=y
# CONFIG_HID is not set
# CONFIG_USB_SUPPORT is not set
# CONFIG_COMMON_CLK_XGENE is not set
CONFIG_COMMON_CLK_QCOM=y
# CONFIG_QCOM_A53PLL is not set
# CONFIG_QCOM_CLK_APCS_MSM8916 is not set
CONFIG_QCOM_CLK_SMD_RPM=y
CONFIG_MSM_GCC_8998=y
CONFIG_HWSPINLOCK=y
CONFIG_HWSPINLOCK_QCOM=y
CONFIG_MAILBOX=y
CONFIG_QCOM_APCS_IPC=y
CONFIG_RPMSG_QCOM_GLINK_RPM=y
CONFIG_RPMSG_QCOM_GLINK_SMEM=y
CONFIG_RPMSG_QCOM_SMD=y
CONFIG_RPMSG_VIRTIO=y
CONFIG_QCOM_COMMAND_DB=y
CONFIG_QCOM_SMEM=y
CONFIG_QCOM_SMD_RPM=y
CONFIG_PHY_QCOM_QMP=y
CONFIG_TMPFS=y
CONFIG_NLS_ISO8859_1=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SHIRQ=y
CONFIG_SOFTLOCKUP_DETECTOR=y
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=5
CONFIG_WQ_WATCHDOG=y
CONFIG_SCHED_STACK_END_CHECK=y
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
CONFIG_WARN_ALL_UNSEEDED_RANDOM=y
CONFIG_DEBUG_LIST=y
CONFIG_DEBUG_PI_LIST=y
CONFIG_DEBUG_NOTIFIERS=y
CONFIG_RCU_CPU_STALL_TIMEOUT=3
CONFIG_FUNCTION_TRACER=y
CONFIG_DEBUG_WX=y



FULL SESSION LOG:

https://pastebin.ubuntu.com/p/YSm82GxhNW/

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: dd hangs when reading large partitions
  2019-01-22 10:59             ` Marc Gonzalez
@ 2019-01-22 12:49               ` Marc Gonzalez
  2019-01-22 16:17               ` Marc Gonzalez
  2019-01-23  3:10               ` jianchao.wang
  2 siblings, 0 replies; 34+ messages in thread
From: Marc Gonzalez @ 2019-01-22 12:49 UTC (permalink / raw)
  To: Jianchao Wang, Christoph Hellwig, Jens Axboe
  Cc: fsdevel, linux-block, SCSI, Joao Pinto, Subhash Jadavani,
	Sayali Lokhande, Can Guo, Asutosh Das, Vijay Viswanath,
	Venkat Gopalakrishnan, Ritesh Harjani, Vivek Gautam,
	Jeffrey Hugo, Maya Erez, Evan Green, Matthias Kaehlcke,
	Douglas Anderson, Stephen Boyd, Tomas Winkler, Adrian Hunter,
	Alim Akhtar, Avri Altman, Bart Van Assche, Martin Petersen,
	Bjorn Andersson

On 22/01/2019 11:59, Marc Gonzalez wrote:

> 4GB RAM. And the system hangs after reading 3.8GB
> I think this is not a coincidence.
> NB: swap is disabled (this might be relevant)
> 
> On a freshly booted system, I get
> 
> # free
> random: get_random_u64 called from copy_process.isra.9.part.10+0x2c8/0x1700 with crng_init=1
> random: get_random_u64 called from arch_pick_mmap_layout+0xdc/0x100 with crng_init=1
> random: get_random_u64 called from load_elf_binary+0x2b8/0x12b4 with crng_init=1
>              total       used       free     shared    buffers     cached
> Mem:       3948996      48916    3900080      17124          0      17124
> -/+ buffers/cache:      31792    3917204
> Swap:            0          0          0

When using iflag=direct, 'buffers' doesn't grow

# dd if=/dev/sde of=/dev/null bs=1M iflag=direct & while true; do free; sleep 5; done
             total       used       free     shared    buffers     cached
Mem:       3948996      49060    3899936      17124          0      17124
-/+ buffers/cache:      31936    3917060
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996      50084    3898912      17124          0      17124
-/+ buffers/cache:      32960    3916036
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996      49832    3899164      17124          0      17124
-/+ buffers/cache:      32708    3916288
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996      49832    3899164      17124          0      17124
-/+ buffers/cache:      32708    3916288
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996      49832    3899164      17124          0      17124
-/+ buffers/cache:      32708    3916288
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996      49832    3899164      17124          0      17124
-/+ buffers/cache:      32708    3916288
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996      49832    3899164      17124          0      17124
-/+ buffers/cache:      32708    3916288
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996      50084    3898912      17124          0      17124
-/+ buffers/cache:      32960    3916036
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996      49832    3899164      17124          0      17124
-/+ buffers/cache:      32708    3916288
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996      50084    3898912      17124          0      17124
-/+ buffers/cache:      32960    3916036
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996      50084    3898912      17124          0      17124
-/+ buffers/cache:      32960    3916036
Swap:            0          0          0
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 53.3547 s, 80.5 MB/s
             total       used       free     shared    buffers     cached
Mem:       3948996      49188    3899808      17124          0      17124
-/+ buffers/cache:      32064    3916932
Swap:            0          0          0
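
That matches what iflag=direct asks for: reads bypass the page cache and
land straight in the user buffer. Roughly what dd is doing in that mode,
as a simplified sketch (the 4096-byte alignment is assumed from this
device's logical block size):

/* direct_read.c - simplified sketch of a direct-I/O read loop */
#define _GNU_SOURCE		/* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	const size_t bufsz = 1 << 20;	/* 1 MiB, like bs=1M */
	void *buf;
	ssize_t n;
	int fd;

	/* O_DIRECT requires buffer, offset and length to be aligned;
	 * 4096 bytes matches this device's logical block size. */
	if (posix_memalign(&buf, 4096, bufsz))
		return 1;

	fd = open("/dev/sde", O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	while ((n = read(fd, buf, bufsz)) > 0)
		;	/* data lands in our buffer, never in the page cache */

	close(fd);
	free(buf);
	return 0;
}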



But when using "normal" I/O, the kernel seems to consume all RAM, until
the system locks up.

# dd if=/dev/sde of=/dev/null bs=1M & while true; do free; sleep 5; done
             total       used       free     shared    buffers     cached
Mem:       3948996      49076    3899920      17124          0      17124
-/+ buffers/cache:      31952    3917044
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996     239816    3709180      17124     190464      17004
-/+ buffers/cache:      32348    3916648
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996     430760    3518236      17124     380928      16880
-/+ buffers/cache:      32952    3916044
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996     733716    3215280      17124     685056      17124
-/+ buffers/cache:      31536    3917460
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996    1037056    2911940      17124     987648      16936
-/+ buffers/cache:      32472    3916524
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996    1335576    2613420      17124    1285632      17124
-/+ buffers/cache:      32820    3916176
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996    1565188    2383808      17124    1514036      17172
-/+ buffers/cache:      33980    3915016
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996    1753352    2195644      17124    1702400      17004
-/+ buffers/cache:      33948    3915048
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996    1942784    2006212      17124    1891328      17152
-/+ buffers/cache:      34304    3914692
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996    2226024    1722972      17124    2175088      17204
-/+ buffers/cache:      33732    3915264
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996    2452092    1496904      17124    2400608      17136
-/+ buffers/cache:      34348    3914648
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996    2679096    1269900      17124    2626400      17100
-/+ buffers/cache:      35596    3913400
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996    2870828    1078168      17124    2816864      17004
-/+ buffers/cache:      36960    3912036
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996    3144360     804636      17124    3089760      17044
-/+ buffers/cache:      37556    3911440
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996    3388840     560156      17124    3333472      17088
-/+ buffers/cache:      38280    3910716
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996    3577752     371244      17124    3521888      17040
-/+ buffers/cache:      38824    3910172
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:       3948996    3754844     194152      17124    3699040      17056
-/+ buffers/cache:      38748    3910248
Swap:            0          0          0
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu:    3-...0: (0 ticks this GP) idle=be2/1/0x4000000000000000 softirq=275/275 fqs=68 
rcu:    4-...0: (709 ticks this GP) idle=386/1/0x4000000000000000 softirq=248/250 fqs=68 
rcu:    (detected by 6, t=2018 jiffies, g=505, q=8)
Task dump for CPU 3:
sh              R  running task        0   671      1 0x00000002
Call trace:
 __switch_to+0x174/0x1e0
 __kmalloc+0x37c/0x3d0
 0x100000000
Task dump for CPU 4:
dd              R  running task        0   707    671 0x00000002
Call trace:
 __switch_to+0x174/0x1e0
 __cpu_online_mask+0x0/0x8
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu:    3-...0: (0 ticks this GP) idle=be2/1/0x4000000000000000 softirq=275/275 fqs=250 
rcu:    4-...0: (709 ticks this GP) idle=386/1/0x4000000000000000 softirq=248/250 fqs=250 
rcu:    (detected by 6, t=8024 jiffies, g=505, q=8)
Task dump for CPU 3:
sh              R  running task        0   671      1 0x00000002
Call trace:
 __switch_to+0x174/0x1e0
 __kmalloc+0x37c/0x3d0
 0x100000000
Task dump for CPU 4:
dd              R  running task        0   707    671 0x00000002
Call trace:
 __switch_to+0x174/0x1e0
 __cpu_online_mask+0x0/0x8


Calling 'echo 3 > /proc/sys/vm/drop_caches' while dd is running does not seem
to free any of the buffers. Could they be considered dirty?

             total       used       free     shared    buffers     cached
Mem:       3948996    3067664     881332      17124    3011584      17176
-/+ buffers/cache:      38904    3910092
Swap:            0          0          0

# echo 3 > /proc/sys/vm/drop_caches

             total       used       free     shared    buffers     cached
Mem:       3948996    3256876     692120      17124    3200512      17076
-/+ buffers/cache:      39288    3909708
Swap:            0          0          0


Calling 'sync' has no effect either.
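
If it helps, I can also try dropping the cached pages for just this
device with posix_fadvise(POSIX_FADV_DONTNEED); a quick sketch
(untested on this board):

/* drop_dev_cache.c - quick sketch: ask the kernel to drop the clean
 * cached pages belonging to one block device.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	int fd, err;

	fd = open("/dev/sde", O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* len == 0 means "to the end of the file": drop every clean
	 * cached page of this device; dirty/pinned pages stay. */
	err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
	if (err)
		fprintf(stderr, "posix_fadvise: %s\n", strerror(err));
	close(fd);
	return 0;
}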

Regards.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: dd hangs when reading large partitions
  2019-01-22 10:59             ` Marc Gonzalez
  2019-01-22 12:49               ` Marc Gonzalez
@ 2019-01-22 16:17               ` Marc Gonzalez
  2019-01-22 16:22                 ` Greg Kroah-Hartman
  2019-01-23  3:10               ` jianchao.wang
  2 siblings, 1 reply; 34+ messages in thread
From: Marc Gonzalez @ 2019-01-22 16:17 UTC (permalink / raw)
  To: Jianchao Wang, Christoph Hellwig, Jens Axboe, Greg Kroah-Hartman
  Cc: fsdevel, linux-block, SCSI, Joao Pinto, Subhash Jadavani,
	Sayali Lokhande, Can Guo, Asutosh Das, Vijay Viswanath,
	Venkat Gopalakrishnan, Ritesh Harjani, Vivek Gautam,
	Jeffrey Hugo, Maya Erez, Evan Green, Matthias Kaehlcke,
	Douglas Anderson, Stephen Boyd, Tomas Winkler, Adrian Hunter,
	Alim Akhtar, Avri Altman, Bart Van Assche, Martin Petersen,
	Bjorn Andersson

+ Greg

On 22/01/2019 11:59, Marc Gonzalez wrote:

> FULL SESSION LOG: https://pastebin.ubuntu.com/p/YSm82GxhNW/

I finally got the hung_task detector to fire:
https://pastebin.ubuntu.com/p/XDDJjf7PTb/

Had a chat with GKH, who said that drivers/scsi/ufs is known to be not 100% functional,
and requires some cherry-picking from downstream to handle all corner cases.

I've sent a patch demoting drivers/scsi/ufs status from "Supported" to "Odd Fixes",
in order to warn the next unwary fellow developers:

https://patchwork.kernel.org/patch/10773503/

Regards.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: dd hangs when reading large partitions
  2019-01-22 16:17               ` Marc Gonzalez
@ 2019-01-22 16:22                 ` Greg Kroah-Hartman
  2019-01-22 19:07                   ` Evan Green
  0 siblings, 1 reply; 34+ messages in thread
From: Greg Kroah-Hartman @ 2019-01-22 16:22 UTC (permalink / raw)
  To: Marc Gonzalez
  Cc: Jianchao Wang, Christoph Hellwig, Jens Axboe, fsdevel,
	linux-block, SCSI, Joao Pinto, Subhash Jadavani, Sayali Lokhande,
	Can Guo, Asutosh Das, Vijay Viswanath, Venkat Gopalakrishnan,
	Ritesh Harjani, Vivek Gautam, Jeffrey Hugo, Maya Erez,
	Evan Green, Matthias Kaehlcke, Douglas Anderson, Stephen Boyd,
	Tomas Winkler, Adrian Hunter, Alim Akhtar, Avri Altman,
	Bart Van Assche, Martin Petersen, Bjorn Andersson

On Tue, Jan 22, 2019 at 05:17:56PM +0100, Marc Gonzalez wrote:
> + Greg
> 
> On 22/01/2019 11:59, Marc Gonzalez wrote:
> 
> > FULL SESSION LOG: https://pastebin.ubuntu.com/p/YSm82GxhNW/
> 
> I finally got the hung_task detector to fire:
> https://pastebin.ubuntu.com/p/XDDJjf7PTb/
> 
> Had a chat with GKH, who said that drivers/scsi/ufs is known to be not 100% functional,
> and requires some cherry-picking from downstream to handle all corner cases.

Note, we know it needs that for your controller, maybe not for others.
So you can't make that kind of generalization :(

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: dd hangs when reading large partitions
  2019-01-22 16:22                 ` Greg Kroah-Hartman
@ 2019-01-22 19:07                   ` Evan Green
  0 siblings, 0 replies; 34+ messages in thread
From: Evan Green @ 2019-01-22 19:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Marc Gonzalez, Jianchao Wang, Christoph Hellwig, Jens Axboe,
	fsdevel, linux-block, SCSI, Joao Pinto, Subhash Jadavani,
	Sayali Lokhande, Can Guo, Asutosh Das, Vijay Viswanath,
	Venkat Gopalakrishnan, Ritesh Harjani, Vivek Gautam,
	Jeffrey Hugo, Maya Erez, Matthias Kaehlcke, Douglas Anderson,
	Stephen Boyd, Tomas Winkler, Adrian Hunter, Alim Akhtar,
	Avri Altman, Bart Van Assche, Martin Petersen, Bjorn Andersson

FWIW, I am unable to repro this in our 4.19 kernel [1]. I put mem=4G
on the command line, and can dd till the cows come home. Seems like
you're leaking buffers somewhere?

localhost ~ # dd if=/dev/sda of=/dev/null bs=4M status=progress
12683575296 bytes (13 GB, 12 GiB) copied, 125.004 s, 101 MB/s^C
3048+0 records in
3047+0 records out
12780044288 bytes (13 GB, 12 GiB) copied, 126.015 s, 101 MB/s

-Evan
[1] https://chromium.googlesource.com/chromiumos/third_party/kernel/+/chromeos-4.19

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: dd hangs when reading large partitions
  2019-01-22 10:59             ` Marc Gonzalez
  2019-01-22 12:49               ` Marc Gonzalez
  2019-01-22 16:17               ` Marc Gonzalez
@ 2019-01-23  3:10               ` jianchao.wang
  2019-02-06 16:16                 ` Marc Gonzalez
  2 siblings, 1 reply; 34+ messages in thread
From: jianchao.wang @ 2019-01-23  3:10 UTC (permalink / raw)
  To: Marc Gonzalez, Christoph Hellwig, Jens Axboe
  Cc: fsdevel, linux-block, SCSI, Joao Pinto, Subhash Jadavani,
	Sayali Lokhande, Can Guo, Asutosh Das, Vijay Viswanath,
	Venkat Gopalakrishnan, Ritesh Harjani, Vivek Gautam,
	Jeffrey Hugo, Maya Erez, Evan Green, Matthias Kaehlcke,
	Douglas Anderson, Stephen Boyd, Tomas Winkler, Adrian Hunter,
	Alim Akhtar, Avri Altman, Bart Van Assche, Martin Petersen,
	Bjorn Andersson



On 1/22/19 6:59 PM, Marc Gonzalez wrote:
> On 22/01/2019 04:12, Jianchao Wang wrote:
> 
>> On 1/21/19 11:22 PM, Marc Gonzalez wrote:
>>
>>> Well, now we know for sure that the clk_scaling_lock is a red herring.
>>> I applied the patch below, and still the system locked up:
>>>
>>> # dd if=/dev/sde of=/dev/null bs=1M status=progress
>>> 3892314112 bytes (3.9 GB, 3.6 GiB) copied, 50.0042 s, 77.8 MB/s
>>>
>>> I can't seem to get the RCU stall warning anymore. How can I get
>>> to the bottom of this issue?
>> Can you give more detail on how the system 'locked up'?
>> Does dd hang there? Is there any hung-task warning in the log?
>> Does it hang forever, or just for a relatively long time?
> The system is an arm64 dev board (APQ8098 MEDIABOX) with 4GB RAM and 64 GB UFS.
> USB, SDHC, PCIe, SATA, Ethernet are not functional yet (so much work ahead).
> All I have is a single serial console.
> When the shell hangs, I lose access to the system altogether.
> SysRq is not implemented either. I am blind once the shell locks up.
> The system has been frozen for 15 hours, I think that qualifies as 'forever' ;-)
> 
>> And what is the status of dd when it hangs?
>> Can you take some samples of /proc/<dd's pid>/status and /proc/<dd's pid>/stack during the hang?
> Sadly, I cannot access this information once the shell locks up.
> 
> However, the kernel did print many warnings overnight (see below).
> 
>> And also would you please share the dmesg log and config?
> See below.
> 
>> Since it always fails after a fixed number of bytes with buffered reads,
>> what is the capacity of your system memory?
> 4GB RAM. And the system hangs after reading 3.8GB
> I think this is not a coincidence.
> NB: swap is disabled (this might be relevant)

Looking through the log
https://pastebin.ubuntu.com/p/YSm82GxhNW/

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu:    6-...0: (13995 ticks this GP) idle=e16/1/0x4000000000000000 softirq=155/155 fqs=655 
rcu:    (detected by 4, t=576151 jiffies, g=-391, q=18)
Task dump for CPU 6:
dd              R  running task        0   677    671 0x00000002
Call trace:
 __switch_to+0x174/0x1e0
 ufshcd_queuecommand+0x84c/0x9a8

The task was in RUNNING state when it was scheduled out,
so it must have been preempted (the path is under preemptible RCU).

I wonder why it was not scheduled back for so long that the RCU stall was triggered,
and who was occupying the CPU all that time.

Would you please try to show the running tasks on all CPUs?

echo l > /proc/sysrq-trigger

In addition, since the RCU grace period cannot complete, a lot of other work cannot
make forward progress.

Thanks
Jianchao

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: dd hangs when reading large partitions
  2019-01-23  3:10               ` jianchao.wang
@ 2019-02-06 16:16                 ` Marc Gonzalez
  2019-02-06 17:05                   ` Marc Gonzalez
  0 siblings, 1 reply; 34+ messages in thread
From: Marc Gonzalez @ 2019-02-06 16:16 UTC (permalink / raw)
  To: Jianchao Wang, Christoph Hellwig, Jens Axboe
  Cc: fsdevel, linux-block, SCSI, Joao Pinto, Jeffrey Hugo, Evan Green,
	Matthias Kaehlcke, Douglas Anderson, Stephen Boyd, Tomas Winkler,
	Adrian Hunter, Alim Akhtar, Avri Altman, Bart Van Assche,
	Martin Petersen, Bjorn Andersson

On 23/01/2019 04:10, jianchao.wang wrote:

> Looking through the log
> https://pastebin.ubuntu.com/p/YSm82GxhNW/
> 
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu:    6-...0: (13995 ticks this GP) idle=e16/1/0x4000000000000000 softirq=155/155 fqs=655 
> rcu:    (detected by 4, t=576151 jiffies, g=-391, q=18)
> Task dump for CPU 6:
> dd              R  running task        0   677    671 0x00000002
> Call trace:
>  __switch_to+0x174/0x1e0
>  ufshcd_queuecommand+0x84c/0x9a8
> 
> The task was in RUNNING state when it was scheduled out,
> so it must have been preempted (the path is under preemptible RCU).
> 
> I wonder why it was not scheduled back for so long that the RCU stall was triggered,
> and who was occupying the CPU all that time.
> 
> Would you please try to show the running tasks on all CPUs?
> 
> echo l > /proc/sysrq-trigger
> 
> In addition, since the RCU grace period cannot complete, a lot of other work cannot
> make forward progress.

# echo w > /proc/sysrq-trigger
[  517.619818] sysrq: Show Blocked State
[  517.620542]   task                        PC stack   pid father

# echo l > /proc/sysrq-trigger
[  530.692149] sysrq: Show backtrace of all active CPUs

# ps -o pid,user,stat,args
PID   USER     STAT COMMAND
    1 root     S    init
    2 root     SW   [kthreadd]
    3 root     IW<  [rcu_gp]
    4 root     IW<  [rcu_par_gp]
    5 root     IW   [kworker/0:0-eve]
    6 root     IW<  [kworker/0:0H]
    7 root     IW   [kworker/u16:0-u]
    8 root     IW<  [mm_percpu_wq]
    9 root     SW   [ksoftirqd/0]
   10 root     IW   [rcu_preempt]
   11 root     SW   [migration/0]
   12 root     IW   [kworker/0:1-rcu]
   13 root     SW   [cpuhp/0]
   14 root     SW   [cpuhp/1]
   15 root     SW   [migration/1]
   16 root     SW   [ksoftirqd/1]
   17 root     IW   [kworker/1:0-eve]
   18 root     IW<  [kworker/1:0H]
   19 root     SW   [cpuhp/2]
   20 root     SW   [migration/2]
   21 root     SW   [ksoftirqd/2]
   22 root     IW   [kworker/2:0-eve]
   23 root     IW<  [kworker/2:0H]
   24 root     SW   [cpuhp/3]
   25 root     SW   [migration/3]
   26 root     SW   [ksoftirqd/3]
   27 root     IW   [kworker/3:0-eve]
   28 root     IW<  [kworker/3:0H]
   29 root     SW   [cpuhp/4]
   30 root     SW   [migration/4]
   31 root     SW   [ksoftirqd/4]
   32 root     IW   [kworker/4:0-eve]
   33 root     IW<  [kworker/4:0H-kb]
   34 root     SW   [cpuhp/5]
   35 root     SW   [migration/5]
   36 root     SW   [ksoftirqd/5]
   37 root     IW   [kworker/5:0-eve]
   38 root     IW<  [kworker/5:0H]
   39 root     SW   [cpuhp/6]
   40 root     SW   [migration/6]
   41 root     SW   [ksoftirqd/6]
   42 root     IW   [kworker/6:0-eve]
   43 root     IW<  [kworker/6:0H]
   44 root     SW   [cpuhp/7]
   45 root     SW   [migration/7]
   46 root     SW   [ksoftirqd/7]
   47 root     IW   [kworker/7:0-eve]
   48 root     IW<  [kworker/7:0H]
   49 root     SW   [kdevtmpfs]
   50 root     IW   [kworker/u16:1-e]
   52 root     SW   [rcu_tasks_kthre]
   61 root     IW   [kworker/3:1-mm_]
   68 root     IW   [kworker/5:1-eve]
   76 root     IW   [kworker/6:1-eve]
   77 root     IW   [kworker/7:1-mm_]
   85 root     IW   [kworker/1:1-eve]
   93 root     IW   [kworker/2:1-eve]
  102 root     IW   [kworker/4:1-mm_]
  202 root     SW   [khungtaskd]
  203 root     SW   [oom_reaper]
  204 root     IW<  [writeback]
  206 root     SW   [kcompactd0]
  208 root     IW<  [kblockd]
  209 root     IW<  [devfreq_wq]
  211 root     IW   [kworker/u16:2-e]
  214 root     IW   [kworker/u16:3-e]
  224 root     IW   [kworker/u16:4-e]
  235 root     IW   [kworker/u16:5-e]
  259 root     IW   [kworker/u16:6-e]
  380 root     SW   [kswapd0]
  509 root     SW   [scsi_eh_0]
  510 root     IW<  [scsi_tmf_0]
  511 root     IW<  [ufs_clk_gating_]
  512 root     IW<  [ufs_clkscaling_]
  551 root     IW   [kworker/u16:7-e]
  635 root     IW   [kworker/u16:8-u]
  656 root     S    /sbin/klogd -n
  674 root     S    /bin/sh
  675 root     IW<  [kworker/4:1H]
  677 root     R    ps -o pid,user,stat,args


# dd if=/dev/sde of=/dev/null bs=1M & while true; do echo l > /proc/sysrq-trigger; sleep 1; done
https://pastebin.ubuntu.com/p/Sx4hXRbHby/

What information can I provide to help debug this issue?

Regards.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: dd hangs when reading large partitions
  2019-02-06 16:16                 ` Marc Gonzalez
@ 2019-02-06 17:05                   ` Marc Gonzalez
  2019-02-07 10:44                     ` Marc Gonzalez
  0 siblings, 1 reply; 34+ messages in thread
From: Marc Gonzalez @ 2019-02-06 17:05 UTC (permalink / raw)
  To: Jianchao Wang, Christoph Hellwig, Jens Axboe
  Cc: fsdevel, linux-block, SCSI, Joao Pinto, Jeffrey Hugo, Evan Green,
	Matthias Kaehlcke, Douglas Anderson, Stephen Boyd, Tomas Winkler,
	Adrian Hunter, Alim Akhtar, Avri Altman, Bart Van Assche,
	Martin Petersen, Bjorn Andersson

On 06/02/2019 17:16, Marc Gonzalez wrote:

> # dd if=/dev/sde of=/dev/null bs=1M & while true; do echo l > /proc/sysrq-trigger; sleep 1; done
> https://pastebin.ubuntu.com/p/Sx4hXRbHby/
> 
> What other information can I provide to help debug this issue?

# dd if=/dev/sde of=/dev/null bs=1M & while true; do echo m > /proc/sysrq-trigger; echo; sleep 1; done
https://pastebin.ubuntu.com/p/tXTCK3Jypn/

Here are the first and last Mem-Info reports:
(Identical fields removed)

active_anon:1449 inactive_anon:3417 isolated_anon:0
 active_file:0 inactive_file:0 isolated_file:0
 slab_reclaimable:1309 slab_unreclaimable:2174
 mapped:465 shmem:4797 pagetables:14 bounce:0
 free:979014 free_pcp:2068 free_cma:0
Node 0 active_anon:5960kB inactive_anon:13668kB active_file:0kB inactive_file:656kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2188kB dirty:0kB writeback:0kB shmem:19188kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
DMA32 free:1972788kB min:3956kB low:5924kB high:7892kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:2045952kB managed:1974524kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:1480kB local_pcp:1480kB free_cma:0kB
Normal free:1936864kB min:3992kB low:5976kB high:7960kB active_anon:6956kB inactive_anon:13668kB active_file:0kB inactive_file:5584kB unevictable:0kB writepending:0kB present:2068224kB managed:1986952kB mlocked:0kB kernel_stack:1456kB pagetables:76kB bounce:0kB free_pcp:6408kB local_pcp:612kB free_cma:0kB
DMA32: 1*4kB (M) 4*8kB (M) 3*16kB (M) 3*32kB (M) 4*64kB (M) 3*128kB (M) 5*256kB (UM) 5*512kB (UM) 6*1024kB (UM) 6*2048kB (UM) 476*4096kB (M) = 1972788kB
Normal: 2*4kB (E) 4*8kB (ME) 1*16kB (E) 2*32kB (ME) 2*64kB (ME) 2*128kB (UM) 0*256kB 3*512kB (UME) 1*1024kB (E) 1*2048kB (M) 471*4096kB (M) = 1934328kB
6974 total pagecache pages

active_anon:1762 inactive_anon:3403 isolated_anon:0
 active_file:0 inactive_file:949699 isolated_file:0
 slab_reclaimable:3465 slab_unreclaimable:2220
 mapped:591 shmem:4798 pagetables:19 bounce:0
 free:26601 free_pcp:2250 free_cma:0
Node 0 active_anon:7048kB inactive_anon:13612kB active_file:0kB inactive_file:3798796kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2364kB dirty:0kB writeback:0kB shmem:19192kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
DMA32 free:100488kB min:3956kB low:5924kB high:7892kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:1866424kB unevictable:0kB writepending:0kB present:2045952kB managed:1974524kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:2844kB local_pcp:1172kB free_cma:0kB
Normal free:5916kB min:3992kB low:5976kB high:7960kB active_anon:7048kB inactive_anon:13612kB active_file:0kB inactive_file:1932372kB unevictable:0kB writepending:0kB present:2068224kB managed:1986952kB mlocked:0kB kernel_stack:1472kB pagetables:76kB bounce:0kB free_pcp:6156kB local_pcp:516kB free_cma:0kB
DMA32: 2*4kB (UM) 2*8kB (M) 1*16kB (U) 1*32kB (U) 5*64kB (M) 6*128kB (UME) 6*256kB (UME) 5*512kB (ME) 7*1024kB (UME) 7*2048kB (UME) 18*4096kB (M) = 100488kB
Normal: 3*4kB (ME) 4*8kB (UE) 3*16kB (UME) 0*32kB 1*64kB (U) 1*128kB (M) 0*256kB 3*512kB (UME) 2*1024kB (UE) 1*2048kB (M) 0*4096kB = 5916kB
954552 total pagecache pages


'inactive_file' grows from 0 to 949699
'free' shrinks from 979014 to 26601 (probably 0 by the time the system hangs)
'total pagecache pages' grows from 6974 to 954552

Where are these "pagecache pages" allocated, and where are they freed?
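
For reference, a minimal sketch of watching those counters while the
copy runs (nr_free_pages and nr_inactive_file are the /proc/vmstat
names for the 'free' and 'inactive_file' fields above):

dd if=/dev/sde of=/dev/null bs=1M &
while true; do grep -E 'nr_free_pages|nr_inactive_file' /proc/vmstat; echo; sleep 1; done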

Regards.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: dd hangs when reading large partitions
  2019-02-06 17:05                   ` Marc Gonzalez
@ 2019-02-07 10:44                     ` Marc Gonzalez
  2019-02-07 16:56                       ` Marc Gonzalez
  0 siblings, 1 reply; 34+ messages in thread
From: Marc Gonzalez @ 2019-02-07 10:44 UTC (permalink / raw)
  To: linux-mm, linux-block
  Cc: Jianchao Wang, Christoph Hellwig, Jens Axboe, fsdevel, SCSI,
	Joao Pinto, Jeffrey Hugo, Evan Green, Matthias Kaehlcke,
	Douglas Anderson, Stephen Boyd, Tomas Winkler, Adrian Hunter,
	Alim Akhtar, Avri Altman, Bart Van Assche, Martin Petersen,
	Bjorn Andersson, Ming Lei, Omar Sandoval, Roman Gushchin,
	Andrew Morton, Michal Hocko

+ linux-mm

Summarizing the issue for linux-mm readers:

If I read data from a storage device larger than my system's RAM, the system freezes
once dd has read more data than available RAM.

# dd if=/dev/sde of=/dev/null bs=1M & while true; do echo m > /proc/sysrq-trigger; echo; echo; sleep 1; done
https://pastebin.ubuntu.com/p/HXzdqDZH4W/

A few seconds before the system hangs, Mem-Info shows:

[   90.986784] Node 0 active_anon:7060kB inactive_anon:13644kB active_file:0kB inactive_file:3797500kB [...]

=> 3797500kB is basically all of RAM.

I tried to locate the call sites that increase "inactive_file", and saw two signatures:

[  255.606019] __mod_node_page_state | __pagevec_lru_add_fn | pagevec_lru_move_fn | __lru_cache_add | lru_cache_add | add_to_page_cache_lru | mpage_readpages | blkdev_readpages | read_pages | __do_page_cache_readahead | ondemand_readahead | page_cache_sync_readahead

[  255.637238] __mod_node_page_state | __pagevec_lru_add_fn | pagevec_lru_move_fn | __lru_cache_add | lru_cache_add | lru_cache_add_active_or_unevictable | __handle_mm_fault | handle_mm_fault | do_page_fault | do_translation_fault | do_mem_abort | el1_da

Are these expected?

NB: the system does not hang if I specify 'iflag=direct' to dd.
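
(An experiment I have not run yet, sketched here: writing 1 to
/proc/sys/vm/drop_caches asks the kernel to free the clean page cache,
so dropping it periodically during the copy should show whether the
hang is tied to page-cache exhaustion:

dd if=/dev/sde of=/dev/null bs=1M &
while true; do echo 1 > /proc/sys/vm/drop_caches; sleep 5; done
)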

According to the RCU watchdog:

[  108.466240] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  108.466420] rcu:     1-...0: (130 ticks this GP) idle=79e/1/0x4000000000000000 softirq=2393/2523 fqs=2626 
[  108.471436] rcu:     (detected by 4, t=5252 jiffies, g=133, q=85)
[  108.480605] Task dump for CPU 1:
[  108.486483] kworker/1:1H    R  running task        0   680      2 0x0000002a
[  108.489977] Workqueue: kblockd blk_mq_run_work_fn
[  108.496908] Call trace:
[  108.501513]  __switch_to+0x174/0x1e0
[  108.503757]  blk_mq_run_work_fn+0x28/0x40
[  108.507589]  process_one_work+0x208/0x480
[  108.511486]  worker_thread+0x48/0x460
[  108.515480]  kthread+0x124/0x130
[  108.519123]  ret_from_fork+0x10/0x1c

Can anyone shed some light on what's going on?

Regards.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: dd hangs when reading large partitions
  2019-02-07 10:44                     ` Marc Gonzalez
@ 2019-02-07 16:56                       ` Marc Gonzalez
  2019-02-08 15:33                         ` Marc Gonzalez
  0 siblings, 1 reply; 34+ messages in thread
From: Marc Gonzalez @ 2019-02-07 16:56 UTC (permalink / raw)
  To: linux-mm, linux-block
  Cc: Jianchao Wang, Christoph Hellwig, Jens Axboe, fsdevel, SCSI,
	Joao Pinto, Jeffrey Hugo, Evan Green, Matthias Kaehlcke,
	Douglas Anderson, Stephen Boyd, Tomas Winkler, Adrian Hunter,
	Alim Akhtar, Avri Altman, Bart Van Assche, Martin Petersen,
	Bjorn Andersson, Ming Lei, Omar Sandoval, Roman Gushchin,
	Andrew Morton, Michal Hocko

On 07/02/2019 11:44, Marc Gonzalez wrote:

> + linux-mm
> 
> Summarizing the issue for linux-mm readers:
> 
> If I read data from a storage device larger than my system's RAM, the system freezes
> once dd has read more data than available RAM.
> 
> # dd if=/dev/sde of=/dev/null bs=1M & while true; do echo m > /proc/sysrq-trigger; echo; echo; sleep 1; done
> https://pastebin.ubuntu.com/p/HXzdqDZH4W/
> 
> A few seconds before the system hangs, Mem-Info shows:
> 
> [   90.986784] Node 0 active_anon:7060kB inactive_anon:13644kB active_file:0kB inactive_file:3797500kB [...]
> 
> => 3797500kB is basically all of RAM.
> 
> I tried to locate the call sites that increase "inactive_file", and saw two signatures:
> 
> [  255.606019] __mod_node_page_state | __pagevec_lru_add_fn | pagevec_lru_move_fn | __lru_cache_add | lru_cache_add | add_to_page_cache_lru | mpage_readpages | blkdev_readpages | read_pages | __do_page_cache_readahead | ondemand_readahead | page_cache_sync_readahead
> 
> [  255.637238] __mod_node_page_state | __pagevec_lru_add_fn | pagevec_lru_move_fn | __lru_cache_add | lru_cache_add | lru_cache_add_active_or_unevictable | __handle_mm_fault | handle_mm_fault | do_page_fault | do_translation_fault | do_mem_abort | el1_da
> 
> Are these expected?
> 
> NB: the system does not hang if I specify 'iflag=direct' to dd.
> 
> According to the RCU watchdog:
> 
> [  108.466240] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [  108.466420] rcu:     1-...0: (130 ticks this GP) idle=79e/1/0x4000000000000000 softirq=2393/2523 fqs=2626 
> [  108.471436] rcu:     (detected by 4, t=5252 jiffies, g=133, q=85)
> [  108.480605] Task dump for CPU 1:
> [  108.486483] kworker/1:1H    R  running task        0   680      2 0x0000002a
> [  108.489977] Workqueue: kblockd blk_mq_run_work_fn
> [  108.496908] Call trace:
> [  108.501513]  __switch_to+0x174/0x1e0
> [  108.503757]  blk_mq_run_work_fn+0x28/0x40
> [  108.507589]  process_one_work+0x208/0x480
> [  108.511486]  worker_thread+0x48/0x460
> [  108.515480]  kthread+0x124/0x130
> [  108.519123]  ret_from_fork+0x10/0x1c
> 
> Can anyone shed some light on what's going on?

Saw a slightly different report from another test run:
https://pastebin.ubuntu.com/p/jCywbKgRCq/

[  340.689764] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  340.689992] rcu:     1-...0: (8548 ticks this GP) idle=c6e/1/0x4000000000000000 softirq=82/82 fqs=6 
[  340.694977] rcu:     (detected by 5, t=5430 jiffies, g=-719, q=16)
[  340.703803] Task dump for CPU 1:
[  340.709507] dd              R  running task        0   675    673 0x00000002
[  340.713018] Call trace:
[  340.720059]  __switch_to+0x174/0x1e0
[  340.722192]  0xffffffc0f6dc9600

[  352.689742] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 33s!
[  352.689910] Showing busy workqueues and worker pools:
[  352.696743] workqueue mm_percpu_wq: flags=0x8
[  352.701753]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  352.706099]     pending: vmstat_update

[  384.693730] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 65s!
[  384.693815] Showing busy workqueues and worker pools:
[  384.700577] workqueue events: flags=0x0
[  384.705699]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[  384.709351]     pending: vmstat_shepherd
[  384.715587] workqueue mm_percpu_wq: flags=0x8
[  384.719495]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  384.723754]     pending: vmstat_update

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: dd hangs when reading large partitions
  2019-02-07 16:56                       ` Marc Gonzalez
@ 2019-02-08 15:33                         ` Marc Gonzalez
  2019-02-08 15:49                           ` Bart Van Assche
  0 siblings, 1 reply; 34+ messages in thread
From: Marc Gonzalez @ 2019-02-08 15:33 UTC (permalink / raw)
  To: linux-mm, linux-block
  Cc: Jianchao Wang, Christoph Hellwig, Jens Axboe, fsdevel, SCSI,
	Joao Pinto, Jeffrey Hugo, Evan Green, Matthias Kaehlcke,
	Douglas Anderson, Stephen Boyd, Tomas Winkler, Adrian Hunter,
	Alim Akhtar, Avri Altman, Bart Van Assche, Martin Petersen,
	Bjorn Andersson, Ming Lei, Omar Sandoval, Roman Gushchin,
	Andrew Morton, Michal Hocko

On 07/02/2019 17:56, Marc Gonzalez wrote:

> Saw a slightly different report from another test run:
> https://pastebin.ubuntu.com/p/jCywbKgRCq/
> 
> [  340.689764] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [  340.689992] rcu:     1-...0: (8548 ticks this GP) idle=c6e/1/0x4000000000000000 softirq=82/82 fqs=6
> [  340.694977] rcu:     (detected by 5, t=5430 jiffies, g=-719, q=16)
> [  340.703803] Task dump for CPU 1:
> [  340.709507] dd              R  running task        0   675    673 0x00000002
> [  340.713018] Call trace:
> [  340.720059]  __switch_to+0x174/0x1e0
> [  340.722192]  0xffffffc0f6dc9600
> 
> [  352.689742] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 33s!
> [  352.689910] Showing busy workqueues and worker pools:
> [  352.696743] workqueue mm_percpu_wq: flags=0x8
> [  352.701753]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
> [  352.706099]     pending: vmstat_update
> 
> [  384.693730] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 65s!
> [  384.693815] Showing busy workqueues and worker pools:
> [  384.700577] workqueue events: flags=0x0
> [  384.705699]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
> [  384.709351]     pending: vmstat_shepherd
> [  384.715587] workqueue mm_percpu_wq: flags=0x8
> [  384.719495]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
> [  384.723754]     pending: vmstat_update

Running 'dd if=/dev/sda of=/dev/null bs=40M status=progress'
I got a slightly different splat:

[  171.513944] INFO: task dd:674 blocked for more than 23 seconds.
[  171.514131]       Tainted: G S                5.0.0-rc5-next-20190206 #23
[  171.518784] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  171.525728] dd              D    0   674    672 0x00000000
[  171.533525] Call trace:
[  171.538926]  __switch_to+0x174/0x1e0
[  171.541237]  __schedule+0x1e4/0x630
[  171.545041]  schedule+0x34/0x90
[  171.548261]  io_schedule+0x20/0x40
[  171.551401]  blk_mq_get_tag+0x178/0x320
[  171.554852]  blk_mq_get_request+0x13c/0x3e0
[  171.558587]  blk_mq_make_request+0xcc/0x640
[  171.562763]  generic_make_request+0x1d4/0x390
[  171.566924]  submit_bio+0x5c/0x1c0
[  171.571447]  mpage_readpages+0x178/0x1d0
[  171.574730]  blkdev_readpages+0x3c/0x50
[  171.578831]  read_pages+0x70/0x180
[  171.582364]  __do_page_cache_readahead+0x1cc/0x200
[  171.585843]  ondemand_readahead+0x148/0x310
[  171.590613]  page_cache_async_readahead+0xc0/0x100
[  171.594719]  generic_file_read_iter+0x54c/0x860
[  171.599565]  blkdev_read_iter+0x50/0x80
[  171.603998]  __vfs_read+0x134/0x190
[  171.607800]  vfs_read+0x94/0x130
[  171.611273]  ksys_read+0x6c/0xe0
[  171.614745]  __arm64_sys_read+0x24/0x30
[  171.617974]  el0_svc_handler+0xb8/0x140
[  171.621509]  el0_svc+0x8/0xc


For the record, I'll restate the problem:

dd hangs when reading a partition larger than RAM, except when using
iflag=direct (read with O_DIRECT, bypassing the page cache) or
iflag=nocache (read through the cache, discarding cached pages as it goes)

# dd if=/dev/sde of=/dev/null bs=64M iflag=direct
64+0 records in
64+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 51.1532 s, 84.0 MB/s

# dd if=/dev/sde of=/dev/null bs=64M iflag=nocache
64+0 records in
64+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 60.6478 s, 70.8 MB/s

# dd if=/dev/sde of=/dev/null bs=64M count=56
56+0 records in
56+0 records out
3758096384 bytes (3.8 GB, 3.5 GiB) copied, 50.5897 s, 74.3 MB/s

# dd if=/dev/sde of=/dev/null bs=64M
/*** CONSOLE LOCKS UP ***/



I've been looking at the differences between iflag=direct and no-flag.
I used the following script to enable the (possibly) relevant tracepoints:

mount -t debugfs nodev /sys/kernel/debug/
cd /sys/kernel/debug/tracing/events
echo 1 > filemap/enable
echo 1 > pagemap/enable
echo 1 > vmscan/enable
echo 1 > kmem/mm_page_free/enable
echo 1 > kmem/mm_page_free_batched/enable
echo 1 > kmem/mm_page_alloc/enable
echo 1 > kmem/mm_page_alloc_zone_locked/enable
echo 1 > kmem/mm_page_pcpu_drain/enable
echo 1 > kmem/mm_page_alloc_extfrag/enable
echo 1 > kmem/kmalloc_node/enable
echo 1 > kmem/kmem_cache_alloc_node/enable
echo 1 > kmem/kmem_cache_alloc/enable
echo 1 > kmem/kmem_cache_free/enable
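
(The captured events can then be read back from the trace ring buffer
before it wraps, e.g.:

cat /sys/kernel/debug/tracing/trace
)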


# dd if=/dev/sde of=/dev/null bs=64M count=1 iflag=direct
https://pastebin.ubuntu.com/p/YWp4pydM6V/
(114942 lines)

# dd if=/dev/sde of=/dev/null bs=64M count=1
https://pastebin.ubuntu.com/p/xpzgN5H3Hp/
(247439 lines)


Does anyone see what's going sideways in the no-flag case?

Regards.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: dd hangs when reading large partitions
  2019-02-08 15:33                         ` Marc Gonzalez
@ 2019-02-08 15:49                           ` Bart Van Assche
  2019-02-09 11:57                             ` Marc Gonzalez
  2019-02-11 16:36                             ` Marc Gonzalez
  0 siblings, 2 replies; 34+ messages in thread
From: Bart Van Assche @ 2019-02-08 15:49 UTC (permalink / raw)
  To: Marc Gonzalez, linux-mm, linux-block
  Cc: Jianchao Wang, Christoph Hellwig, Jens Axboe, fsdevel, SCSI,
	Joao Pinto, Jeffrey Hugo, Evan Green, Matthias Kaehlcke,
	Douglas Anderson, Stephen Boyd, Tomas Winkler, Adrian Hunter,
	Alim Akhtar, Avri Altman, Bart Van Assche, Martin Petersen,
	Bjorn Andersson, Ming Lei, Omar Sandoval, Roman Gushchin,
	Andrew Morton, Michal Hocko

On Fri, 2019-02-08 at 16:33 +0100, Marc Gonzalez wrote:
> Does anyone see what's going sideways in the no-flag case?

Hi Marc,

Does this problem only occur with block devices backed by the UFS driver
or does this problem also occur with other block drivers?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: dd hangs when reading large partitions
  2019-02-08 15:49                           ` Bart Van Assche
@ 2019-02-09 11:57                             ` Marc Gonzalez
  2019-02-11 16:36                             ` Marc Gonzalez
  1 sibling, 0 replies; 34+ messages in thread
From: Marc Gonzalez @ 2019-02-09 11:57 UTC (permalink / raw)
  To: Bart Van Assche, linux-mm, linux-block
  Cc: Jianchao Wang, Christoph Hellwig, Jens Axboe, fsdevel, SCSI,
	Joao Pinto, Jeffrey Hugo, Evan Green, Matthias Kaehlcke,
	Douglas Anderson, Stephen Boyd, Tomas Winkler, Adrian Hunter,
	Alim Akhtar, Avri Altman, Bart Van Assche, Martin Petersen,
	Bjorn Andersson, Ming Lei, Omar Sandoval, Roman Gushchin,
	Andrew Morton, Michal Hocko

On 08/02/2019 16:49, Bart Van Assche wrote:

> On Fri, 2019-02-08 at 16:33 +0100, Marc Gonzalez wrote:
>
>> Does anyone see what's going sideways in the no-flag case?
> 
> Does this problem only occur with block devices backed by the UFS driver
> or does this problem also occur with other block drivers?

So far, I've only been able to test with UFS storage.
The board has no PATA/SATA. SDHC is not supported yet.

With Jeffrey's help, I was able to get a semi-functional
USB3 stack running. I'll test USB3 mass storage on Monday.

FWIW, I removed most (all?) locks from the UFSHC driver by dropping
clock scaling and clock gating support. I could also drop runtime
suspend if someone thinks that could help, but I suspect the problem
lies in the mm or block layers.

(It doesn't look like a locking problem; it looks more like memory
exhaustion.)

Regards.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: dd hangs when reading large partitions
  2019-02-08 15:49                           ` Bart Van Assche
  2019-02-09 11:57                             ` Marc Gonzalez
@ 2019-02-11 16:36                             ` Marc Gonzalez
  2019-02-11 17:27                               ` Marc Gonzalez
  1 sibling, 1 reply; 34+ messages in thread
From: Marc Gonzalez @ 2019-02-11 16:36 UTC (permalink / raw)
  To: Bart Van Assche, linux-mm, linux-block
  Cc: Jianchao Wang, Christoph Hellwig, Jens Axboe, fsdevel, SCSI,
	Jeffrey Hugo, Evan Green, Matthias Kaehlcke, Douglas Anderson,
	Stephen Boyd, Tomas Winkler, Adrian Hunter, Bart Van Assche,
	Martin Petersen, Bjorn Andersson, Ming Lei, Omar Sandoval,
	Roman Gushchin, Andrew Morton, Michal Hocko, James Bottomley

On 08/02/2019 16:49, Bart Van Assche wrote:

> Does this problem only occur with block devices backed by the UFS driver
> or does this problem also occur with other block drivers?

The problem is not specific to UFS; I see the same issue with a USB3 mass storage device:

usb 2-1: new SuperSpeed Gen 1 USB device number 2 using xhci-hcd
usb 2-1: New USB device found, idVendor=05dc, idProduct=a838, bcdDevice=11.00
usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 2-1: Product: USB Flash Drive
usb 2-1: Manufacturer: Lexar
usb 2-1: SerialNumber: AAYW2W7I13BAR0JC
usb-storage 2-1:1.0: USB Mass Storage device detected
scsi host0: usb-storage 2-1:1.0
scsi 0:0:0:0: Direct-Access     Lexar    USB Flash Drive  1100 PQ: 0 ANSI: 6
sd 0:0:0:0: [sda] 62517248 512-byte logical blocks: (32.0 GB/29.8 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 43 00 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1
sd 0:0:0:0: [sda] Attached SCSI removable disk

# dd if=/dev/sda of=/dev/null bs=1M status=progress
3879731200 bytes (3.9 GB, 3.6 GiB) copied, 56.0097 s, 69.3 MB/s

This definitively rules out drivers/scsi/ufs
(Dropping UFS people)

So the problem could be in SCSI glue, or block, or mm?

How can I pinpoint the bug?
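
(One generic experiment that might narrow it down, sketched here with
the standard mem= boot parameter: boot with less visible RAM, e.g. with

mem=2G

on the kernel command line, and check whether the hang point tracks the
new size. If dd then locks up around the 2 GiB mark, memory exhaustion
looks even more likely than a driver bug.)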

Problem statement and logs:
https://lore.kernel.org/linux-block/66419195-594c-aa83-c19d-f091ad3b296d@free.fr/

Regards.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: dd hangs when reading large partitions
  2019-02-11 16:36                             ` Marc Gonzalez
@ 2019-02-11 17:27                               ` Marc Gonzalez
  2019-02-12 15:26                                 ` [SOLVED] " Marc Gonzalez
  0 siblings, 1 reply; 34+ messages in thread
From: Marc Gonzalez @ 2019-02-11 17:27 UTC (permalink / raw)
  To: Bart Van Assche, linux-mm, linux-block
  Cc: Jianchao Wang, Christoph Hellwig, Jens Axboe, fsdevel, SCSI,
	Jeffrey Hugo, Evan Green, Matthias Kaehlcke, Douglas Anderson,
	Stephen Boyd, Tomas Winkler, Adrian Hunter, Bart Van Assche,
	Martin Petersen, Bjorn Andersson, Ming Lei, Omar Sandoval,
	Roman Gushchin, Andrew Morton, Michal Hocko, James Bottomley

On 11/02/2019 17:36, Marc Gonzalez wrote:

> On 08/02/2019 16:49, Bart Van Assche wrote:
> 
>> Does this problem only occur with block devices backed by the UFS driver
>> or does this problem also occur with other block drivers?
> 
> The problem is not specific to UFS; I see the same issue with a USB3 mass storage device:
> 
> usb 2-1: new SuperSpeed Gen 1 USB device number 2 using xhci-hcd
> usb 2-1: New USB device found, idVendor=05dc, idProduct=a838, bcdDevice=11.00
> usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
> usb 2-1: Product: USB Flash Drive
> usb 2-1: Manufacturer: Lexar
> usb 2-1: SerialNumber: AAYW2W7I13BAR0JC
> usb-storage 2-1:1.0: USB Mass Storage device detected
> scsi host0: usb-storage 2-1:1.0
> scsi 0:0:0:0: Direct-Access     Lexar    USB Flash Drive  1100 PQ: 0 ANSI: 6
> sd 0:0:0:0: [sda] 62517248 512-byte logical blocks: (32.0 GB/29.8 GiB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 43 00 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>  sda: sda1
> sd 0:0:0:0: [sda] Attached SCSI removable disk
> 
> # dd if=/dev/sda of=/dev/null bs=1M status=progress
> 3879731200 bytes (3.9 GB, 3.6 GiB) copied, 56.0097 s, 69.3 MB/s
> 
> So the problem could be in SCSI glue, or block, or mm?

Unlikely. Someone else would have been affected...

A colleague pointed out that some memory areas are reserved downstream.
Perhaps the FW goes haywire once the kernel touches reserved memory?

I'm off to test that hypothesis.
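
(A quick first check, sketched here; exact paths and node names vary by
platform: compare the kernel's view of the memory map with the
downstream reservation tables.

cat /proc/iomem
ls /proc/device-tree/reserved-memory/
)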

Regards.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [SOLVED] dd hangs when reading large partitions
  2019-02-11 17:27                               ` Marc Gonzalez
@ 2019-02-12 15:26                                 ` Marc Gonzalez
  0 siblings, 0 replies; 34+ messages in thread
From: Marc Gonzalez @ 2019-02-12 15:26 UTC (permalink / raw)
  To: Bart Van Assche, linux-mm, linux-block
  Cc: Jianchao Wang, Christoph Hellwig, Jens Axboe, fsdevel, SCSI,
	Jeffrey Hugo, Evan Green, Matthias Kaehlcke, Douglas Anderson,
	Stephen Boyd, Tomas Winkler, Adrian Hunter, Bart Van Assche,
	Martin Petersen, Bjorn Andersson, Ming Lei, Omar Sandoval,
	Roman Gushchin, Andrew Morton, Michal Hocko, James Bottomley

On 11/02/2019 18:27, Marc Gonzalez wrote:

> A colleague pointed out that some memory areas are reserved downstream.
> Perhaps the FW goes haywire once the kernel touches reserved memory?

Bingo! FW quirk.

https://patchwork.kernel.org/patch/10808173/

Once the reserved memory range is extended, I am finally able
to read large partitions:

# dd if=/dev/sda of=/dev/null bs=1M
55256+0 records in
55256+0 records out
57940115456 bytes (58 GB, 54 GiB) copied, 786.165 s, 73.7 MB/s

Thanks to everyone who provided suggestions and guidance.

Regards.

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread

Thread overview: 34+ messages
2019-01-18 12:10 dd hangs when reading large partitions Marc Gonzalez
2019-01-18 13:39 ` Ming Lei
2019-01-18 14:54   ` Marc Gonzalez
2019-01-18 15:18 ` jianchao.wang
2019-01-18 17:38   ` Marc Gonzalez
2019-01-18 17:48   ` Jens Axboe
2019-01-18 17:51     ` Bart Van Assche
2019-01-18 19:00       ` Jens Axboe
2019-01-19  9:56     ` Christoph Hellwig
2019-01-19 14:37       ` Jens Axboe
2019-01-19 16:09       ` Bart Van Assche
2019-01-21  8:33         ` Christoph Hellwig
2019-01-19 19:47       ` Marc Gonzalez
2019-01-19 20:45         ` Marc Gonzalez
2019-01-21  8:33         ` Christoph Hellwig
2019-01-21 15:22         ` Marc Gonzalez
2019-01-22  3:12           ` jianchao.wang
2019-01-22 10:59             ` Marc Gonzalez
2019-01-22 12:49               ` Marc Gonzalez
2019-01-22 16:17               ` Marc Gonzalez
2019-01-22 16:22                 ` Greg Kroah-Hartman
2019-01-22 19:07                   ` Evan Green
2019-01-23  3:10               ` jianchao.wang
2019-02-06 16:16                 ` Marc Gonzalez
2019-02-06 17:05                   ` Marc Gonzalez
2019-02-07 10:44                     ` Marc Gonzalez
2019-02-07 16:56                       ` Marc Gonzalez
2019-02-08 15:33                         ` Marc Gonzalez
2019-02-08 15:49                           ` Bart Van Assche
2019-02-09 11:57                             ` Marc Gonzalez
2019-02-11 16:36                             ` Marc Gonzalez
2019-02-11 17:27                               ` Marc Gonzalez
2019-02-12 15:26                                 ` [SOLVED] " Marc Gonzalez
2019-01-18 19:27 ` Douglas Gilbert
