* Host managed SMR drive issue
@ 2021-09-24 18:25 Sven Oehme
  2021-09-26 23:19 ` Damien Le Moal
  0 siblings, 1 reply; 23+ messages in thread
From: Sven Oehme @ 2021-09-24 18:25 UTC (permalink / raw)
  To: linux-btrfs

Hi,

I am running into issues while testing a host-managed SMR drive. When I
try to copy or move a file to the btrfs filesystem, it just hangs. I
tried multiple kernels (5.12, 5.13, and 5.14 all the way up to 5.14.6),
but the issue persists.

Here is the setup:

I am using btrfs-progs v5.14.1.
The device is a host-managed WDC 20TB SMR drive with firmware level C421.
It is connected via an HBA 9400-8i Tri-Mode Storage Adapter running the latest FW, 20.0.
I am using the /dev/sd device directly; there is no LVM, device mapper, or
anything else in between.
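
For reference, a minimal sketch of how such a filesystem is typically set up
and exercised (device name, mount point and file are placeholders and the
exact commands are not part of this report; recent btrfs-progs enables zoned
mode automatically on host-managed drives, or it can be requested with -O zoned):

cat /sys/block/sdX/queue/zoned     # should report "host-managed"
mkfs.btrfs -O zoned -f /dev/sdX    # create the filesystem directly on the zoned device
mount /dev/sdX /mnt/smr
mv /path/to/largefile /mnt/smr/    # copying or moving a file triggers the hang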

After a few seconds, sometimes minutes, the data rate to the drive drops to
zero, and one or two cores on my system show 100% I/O wait time but no
longer make any progress.

The process in question has the following stack:

[ 2168.589160] task:mv              state:D stack:    0 pid: 3814
ppid:  3679 flags:0x00004000
[ 2168.589162] Call Trace:
[ 2168.589163]  __schedule+0x2fa/0x910
[ 2168.589166]  schedule+0x4f/0xc0
[ 2168.589168]  schedule_timeout+0x8a/0x140
[ 2168.589171]  ? __bpf_trace_tick_stop+0x10/0x10
[ 2168.589173]  io_schedule_timeout+0x51/0x80
[ 2168.589176]  balance_dirty_pages+0x2fa/0xe30
[ 2168.589179]  ? __mod_lruvec_state+0x3a/0x50
[ 2168.589182]  balance_dirty_pages_ratelimited+0x2f9/0x3c0
[ 2168.589185]  btrfs_buffered_write+0x58e/0x7e0 [btrfs]
[ 2168.589226]  btrfs_file_write_iter+0x138/0x3e0 [btrfs]
[ 2168.589260]  ? ext4_file_read_iter+0x5b/0x180
[ 2168.589262]  new_sync_write+0x114/0x1a0
[ 2168.589265]  vfs_write+0x1c5/0x260
[ 2168.589267]  ksys_write+0x67/0xe0
[ 2168.589270]  __x64_sys_write+0x1a/0x20
[ 2168.589272]  do_syscall_64+0x40/0xb0
[ 2168.589275]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 2168.589277] RIP: 0033:0x7ffff7e91c27
[ 2168.589278] RSP: 002b:00007fffffffdc48 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[ 2168.589280] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007ffff7e91c27
[ 2168.589281] RDX: 0000000000020000 RSI: 00007ffff79bd000 RDI: 0000000000000004
[ 2168.589282] RBP: 00007ffff79bd000 R08: 0000000000000000 R09: 0000000000000000
[ 2168.589283] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000004
[ 2168.589284] R13: 0000000000000004 R14: 00007ffff79bd000 R15: 0000000000020000

and it shows up under runnable tasks:

[ 2168.593562] runnable tasks:
[ 2168.593562]  S            task   PID         tree-key  switches
prio     wait-time             sum-exec        sum-sleep
[ 2168.593563] -------------------------------------------------------------------------------------------------------------
[ 2168.593565]  S        cpuhp/13    92     88923.802487        19
120         0.000000         0.292061         0.000000 2 0 /
[ 2168.593571]  S  idle_inject/13    93       -11.997255         3
49         0.000000         0.005480         0.000000 2 0 /
[ 2168.593577]  S    migration/13    94       814.287531       551
0         0.000000      1015.550514         0.000000 2 0 /
[ 2168.593582]  S    ksoftirqd/13    95     88762.317130        44
120         0.000000         1.940252         0.000000 2 0 /
[ 2168.593588]  I    kworker/13:0    96        -9.031157         5
120         0.000000         0.017423         0.000000 2 0 /
[ 2168.593593]  I   kworker/13:0H    97      3570.961886         5
100         0.000000         0.034345         0.000000 2 0 /
[ 2168.593603]  I    kworker/13:1   400    101650.731913       578
120         0.000000        10.110898         0.000000 2 0 /
[ 2168.593611]  I   kworker/13:1H  1015    101649.600698        65
100         0.000000         1.443300         0.000000 2 0 /
[ 2168.593618]  S           loop3  1994     99133.655903        70
100         0.000000         1.137468         0.000000 2 0 /
[ 2168.593625]  S           snapd  3161        15.296181       166
120         0.000000        90.296991         0.000000 2 0
/system.slice/snapd.service
[ 2168.593631]  S           snapd  3198        10.047573        49
120         0.000000         5.646247         0.000000 2 0
/system.slice/snapd.service
[ 2168.593639]  S            java  2446       970.743682       301
120         0.000000       101.648659         0.000000 2 0
/system.slice/stor_tomcat.service
[ 2168.593645]  S C1 CompilerThre  2573      1033.157689      3636
120         0.000000       615.256247         0.000000 2 0
/system.slice/stor_tomcat.service
[ 2168.593654]  D              mv  3814      2263.816953    186734
120         0.000000     30087.917319         0.000000 2 0 /user.slice
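
For completeness, a dump like the one above can be captured through the
kernel's sysrq facility (a sketch, assuming sysrq is enabled and
CONFIG_SCHED_DEBUG is set; the exact method used here is not stated):

echo w > /proc/sysrq-trigger   # stacks of blocked (D state) tasks
echo t > /proc/sysrq-trigger   # all task states, including the runnable tasks table
dmesg | tail -n 300            # the dump ends up in the kernel log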

Any idea what is going on and how to fix this?

thx.


* Re: Host managed SMR drive issue
  2021-09-24 18:25 Host managed SMR drive issue Sven Oehme
@ 2021-09-26 23:19 ` Damien Le Moal
  2021-09-27 17:28   ` Sven Oehme
  0 siblings, 1 reply; 23+ messages in thread
From: Damien Le Moal @ 2021-09-26 23:19 UTC (permalink / raw)
  To: Sven Oehme, linux-btrfs, Naohiro Aota, Johannes Thumshirn

On 2021/09/25 3:25, Sven Oehme wrote:
> Hi,
> 
> i am running into issues with Host Managed SMR drive testing. when i
> try to copy or move a file to the btrfs filesystem it just hangs. i
> tried multiple 5.12,5.13 as well as 5.14 all the way to 5.14.6 but the
> issue still persists.
> 
> here is the setup :
> 
> I am using btrfs-progs-v5.14.1
> device is a  Host Managed WDC  20TB SMR drive with firmware level C421
> its connected via a HBA 9400-8i Tri-Mode Storage Adapter , latest 20.0 FW

Beware of the Broadcom FW rev 20. We found problems with it: very slow zone
command scheduling leading to command timeouts in some cases. FW 19 does not
seem to have this issue. But that is likely not the cause of the problem here.

Is there anything of interest in dmesg? Any I/O errors?
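
For example, something along these lines would surface I/O errors, timeouts
or resets for the drive (sdX is a placeholder for the actual device):

dmesg -T | grep -iE 'sdX|I/O error|timeout|reset'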

Naohiro, Johannes,

Any idea?



> I am using the /dev/sd device direct  , no lvm or device mapper or
> anything else in between
> 
> after a few seconds, sometimes minutes data rate to the drive drops to
> 0 and 1 or 2 cores on my system show 100% IO wait time, but no longer
> make any progress
> 
> the process in question has the following stack :
> 
> [ 2168.589160] task:mv              state:D stack:    0 pid: 3814
> ppid:  3679 flags:0x00004000
> [ 2168.589162] Call Trace:
> [ 2168.589163]  __schedule+0x2fa/0x910
> [ 2168.589166]  schedule+0x4f/0xc0
> [ 2168.589168]  schedule_timeout+0x8a/0x140
> [ 2168.589171]  ? __bpf_trace_tick_stop+0x10/0x10
> [ 2168.589173]  io_schedule_timeout+0x51/0x80
> [ 2168.589176]  balance_dirty_pages+0x2fa/0xe30
> [ 2168.589179]  ? __mod_lruvec_state+0x3a/0x50
> [ 2168.589182]  balance_dirty_pages_ratelimited+0x2f9/0x3c0
> [ 2168.589185]  btrfs_buffered_write+0x58e/0x7e0 [btrfs]
> [ 2168.589226]  btrfs_file_write_iter+0x138/0x3e0 [btrfs]
> [ 2168.589260]  ? ext4_file_read_iter+0x5b/0x180
> [ 2168.589262]  new_sync_write+0x114/0x1a0
> [ 2168.589265]  vfs_write+0x1c5/0x260
> [ 2168.589267]  ksys_write+0x67/0xe0
> [ 2168.589270]  __x64_sys_write+0x1a/0x20
> [ 2168.589272]  do_syscall_64+0x40/0xb0
> [ 2168.589275]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 2168.589277] RIP: 0033:0x7ffff7e91c27
> [ 2168.589278] RSP: 002b:00007fffffffdc48 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000001
> [ 2168.589280] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007ffff7e91c27
> [ 2168.589281] RDX: 0000000000020000 RSI: 00007ffff79bd000 RDI: 0000000000000004
> [ 2168.589282] RBP: 00007ffff79bd000 R08: 0000000000000000 R09: 0000000000000000
> [ 2168.589283] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000004
> [ 2168.589284] R13: 0000000000000004 R14: 00007ffff79bd000 R15: 0000000000020000
> 
> and shows up under runnable tasks :
> 
> [ 2168.593562] runnable tasks:
> [ 2168.593562]  S            task   PID         tree-key  switches
> prio     wait-time             sum-exec        sum-sleep
> [ 2168.593563] -------------------------------------------------------------------------------------------------------------
> [ 2168.593565]  S        cpuhp/13    92     88923.802487        19
> 120         0.000000         0.292061         0.000000 2 0 /
> [ 2168.593571]  S  idle_inject/13    93       -11.997255         3
> 49         0.000000         0.005480         0.000000 2 0 /
> [ 2168.593577]  S    migration/13    94       814.287531       551
> 0         0.000000      1015.550514         0.000000 2 0 /
> [ 2168.593582]  S    ksoftirqd/13    95     88762.317130        44
> 120         0.000000         1.940252         0.000000 2 0 /
> [ 2168.593588]  I    kworker/13:0    96        -9.031157         5
> 120         0.000000         0.017423         0.000000 2 0 /
> [ 2168.593593]  I   kworker/13:0H    97      3570.961886         5
> 100         0.000000         0.034345         0.000000 2 0 /
> [ 2168.593603]  I    kworker/13:1   400    101650.731913       578
> 120         0.000000        10.110898         0.000000 2 0 /
> [ 2168.593611]  I   kworker/13:1H  1015    101649.600698        65
> 100         0.000000         1.443300         0.000000 2 0 /
> [ 2168.593618]  S           loop3  1994     99133.655903        70
> 100         0.000000         1.137468         0.000000 2 0 /
> [ 2168.593625]  S           snapd  3161        15.296181       166
> 120         0.000000        90.296991         0.000000 2 0
> /system.slice/snapd.service
> [ 2168.593631]  S           snapd  3198        10.047573        49
> 120         0.000000         5.646247         0.000000 2 0
> /system.slice/snapd.service
> [ 2168.593639]  S            java  2446       970.743682       301
> 120         0.000000       101.648659         0.000000 2 0
> /system.slice/stor_tomcat.service
> [ 2168.593645]  S C1 CompilerThre  2573      1033.157689      3636
> 120         0.000000       615.256247         0.000000 2 0
> /system.slice/stor_tomcat.service
> [ 2168.593654]  D              mv  3814      2263.816953    186734
> 120         0.000000     30087.917319         0.000000 2 0 /user.slice
> 
> any idea what is going on and how to fix this ?



> 
> thx.
> 


-- 
Damien Le Moal
Western Digital Research


* Re: Host managed SMR drive issue
  2021-09-26 23:19 ` Damien Le Moal
@ 2021-09-27 17:28   ` Sven Oehme
  2021-09-27 22:36     ` Sven Oehme
  2021-09-27 22:38     ` Sven Oehme
  0 siblings, 2 replies; 23+ messages in thread
From: Sven Oehme @ 2021-09-27 17:28 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-btrfs, Naohiro Aota, Johannes Thumshirn

Hi,

I also have an Adaptec HBA 1100-4i at FW level 4.11 (the latest), which,
according to https://ask.adaptec.com/app/answers/detail/a_id/17472, is
supported, but I see the exact same hangs after a few minutes...

What I see in dmesg is:

[Mon Sep 27 11:20:03 2021] INFO: task kworker/u102:6:190092 blocked
for more than 120 seconds.
[Mon Sep 27 11:20:03 2021]       Not tainted 5.14-test #1
[Mon Sep 27 11:20:03 2021] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mon Sep 27 11:20:03 2021] task:kworker/u102:6  state:D stack:    0
pid:190092 ppid:     2 flags:0x00004000
[Mon Sep 27 11:20:03 2021] Workqueue: btrfs-worker-high
btrfs_work_helper [btrfs]
[Mon Sep 27 11:20:03 2021] Call Trace:
[Mon Sep 27 11:20:03 2021]  __schedule+0x2fa/0x910
[Mon Sep 27 11:20:03 2021]  schedule+0x4f/0xc0
[Mon Sep 27 11:20:03 2021]  io_schedule+0x46/0x70
[Mon Sep 27 11:20:03 2021]  blk_mq_get_tag+0x11b/0x270
[Mon Sep 27 11:20:03 2021]  ? wait_woken+0x80/0x80
[Mon Sep 27 11:20:03 2021]  __blk_mq_alloc_request+0xec/0x120
[Mon Sep 27 11:20:03 2021]  blk_mq_submit_bio+0x12f/0x580
[Mon Sep 27 11:20:03 2021]  submit_bio_noacct+0x42a/0x4f0
[Mon Sep 27 11:20:03 2021]  submit_bio+0x4f/0x1b0
[Mon Sep 27 11:20:03 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
[Mon Sep 27 11:20:03 2021]  run_one_async_done+0x3b/0x70 [btrfs]
[Mon Sep 27 11:20:03 2021]  btrfs_work_helper+0x132/0x2e0 [btrfs]
[Mon Sep 27 11:20:03 2021]  process_one_work+0x220/0x3c0
[Mon Sep 27 11:20:03 2021]  worker_thread+0x53/0x420
[Mon Sep 27 11:20:03 2021]  kthread+0x12f/0x150
[Mon Sep 27 11:20:03 2021]  ? process_one_work+0x3c0/0x3c0
[Mon Sep 27 11:20:03 2021]  ? __kthread_bind_mask+0x70/0x70
[Mon Sep 27 11:20:03 2021]  ret_from_fork+0x22/0x30

I will also downgrade the LSI adapter to FW 19, but I think this is
unrelated to the firmware, as I can see this with two completely different
HBAs and two completely different drives.

Sven


On Sun, Sep 26, 2021 at 5:19 PM Damien Le Moal
<damien.lemoal@opensource.wdc.com> wrote:
>
> On 2021/09/25 3:25, Sven Oehme wrote:
> > Hi,
> >
> > i am running into issues with Host Managed SMR drive testing. when i
> > try to copy or move a file to the btrfs filesystem it just hangs. i
> > tried multiple 5.12,5.13 as well as 5.14 all the way to 5.14.6 but the
> > issue still persists.
> >
> > here is the setup :
> >
> > I am using btrfs-progs-v5.14.1
> > device is a  Host Managed WDC  20TB SMR drive with firmware level C421
> > its connected via a HBA 9400-8i Tri-Mode Storage Adapter , latest 20.0 FW
>
> Beware of the Broadcom FW rev 20. We found problems with it: very slow zone
> command scheduling leading to command timeout is some cases. FW 19 does not seem
> to have this issue. But that is likely not the cause of the problem here.
>
> Is there anything of interest in dmesg ? Any IO errors ?
>
> Naohiro, Johannes,
>
> Any idea ?
>
>
>
> > I am using the /dev/sd device direct  , no lvm or device mapper or
> > anything else in between
> >
> > after a few seconds, sometimes minutes data rate to the drive drops to
> > 0 and 1 or 2 cores on my system show 100% IO wait time, but no longer
> > make any progress
> >
> > the process in question has the following stack :
> >
> > [ 2168.589160] task:mv              state:D stack:    0 pid: 3814
> > ppid:  3679 flags:0x00004000
> > [ 2168.589162] Call Trace:
> > [ 2168.589163]  __schedule+0x2fa/0x910
> > [ 2168.589166]  schedule+0x4f/0xc0
> > [ 2168.589168]  schedule_timeout+0x8a/0x140
> > [ 2168.589171]  ? __bpf_trace_tick_stop+0x10/0x10
> > [ 2168.589173]  io_schedule_timeout+0x51/0x80
> > [ 2168.589176]  balance_dirty_pages+0x2fa/0xe30
> > [ 2168.589179]  ? __mod_lruvec_state+0x3a/0x50
> > [ 2168.589182]  balance_dirty_pages_ratelimited+0x2f9/0x3c0
> > [ 2168.589185]  btrfs_buffered_write+0x58e/0x7e0 [btrfs]
> > [ 2168.589226]  btrfs_file_write_iter+0x138/0x3e0 [btrfs]
> > [ 2168.589260]  ? ext4_file_read_iter+0x5b/0x180
> > [ 2168.589262]  new_sync_write+0x114/0x1a0
> > [ 2168.589265]  vfs_write+0x1c5/0x260
> > [ 2168.589267]  ksys_write+0x67/0xe0
> > [ 2168.589270]  __x64_sys_write+0x1a/0x20
> > [ 2168.589272]  do_syscall_64+0x40/0xb0
> > [ 2168.589275]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 2168.589277] RIP: 0033:0x7ffff7e91c27
> > [ 2168.589278] RSP: 002b:00007fffffffdc48 EFLAGS: 00000246 ORIG_RAX:
> > 0000000000000001
> > [ 2168.589280] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007ffff7e91c27
> > [ 2168.589281] RDX: 0000000000020000 RSI: 00007ffff79bd000 RDI: 0000000000000004
> > [ 2168.589282] RBP: 00007ffff79bd000 R08: 0000000000000000 R09: 0000000000000000
> > [ 2168.589283] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000004
> > [ 2168.589284] R13: 0000000000000004 R14: 00007ffff79bd000 R15: 0000000000020000
> >
> > and shows up under runnable tasks :
> >
> > [ 2168.593562] runnable tasks:
> > [ 2168.593562]  S            task   PID         tree-key  switches
> > prio     wait-time             sum-exec        sum-sleep
> > [ 2168.593563] -------------------------------------------------------------------------------------------------------------
> > [ 2168.593565]  S        cpuhp/13    92     88923.802487        19
> > 120         0.000000         0.292061         0.000000 2 0 /
> > [ 2168.593571]  S  idle_inject/13    93       -11.997255         3
> > 49         0.000000         0.005480         0.000000 2 0 /
> > [ 2168.593577]  S    migration/13    94       814.287531       551
> > 0         0.000000      1015.550514         0.000000 2 0 /
> > [ 2168.593582]  S    ksoftirqd/13    95     88762.317130        44
> > 120         0.000000         1.940252         0.000000 2 0 /
> > [ 2168.593588]  I    kworker/13:0    96        -9.031157         5
> > 120         0.000000         0.017423         0.000000 2 0 /
> > [ 2168.593593]  I   kworker/13:0H    97      3570.961886         5
> > 100         0.000000         0.034345         0.000000 2 0 /
> > [ 2168.593603]  I    kworker/13:1   400    101650.731913       578
> > 120         0.000000        10.110898         0.000000 2 0 /
> > [ 2168.593611]  I   kworker/13:1H  1015    101649.600698        65
> > 100         0.000000         1.443300         0.000000 2 0 /
> > [ 2168.593618]  S           loop3  1994     99133.655903        70
> > 100         0.000000         1.137468         0.000000 2 0 /
> > [ 2168.593625]  S           snapd  3161        15.296181       166
> > 120         0.000000        90.296991         0.000000 2 0
> > /system.slice/snapd.service
> > [ 2168.593631]  S           snapd  3198        10.047573        49
> > 120         0.000000         5.646247         0.000000 2 0
> > /system.slice/snapd.service
> > [ 2168.593639]  S            java  2446       970.743682       301
> > 120         0.000000       101.648659         0.000000 2 0
> > /system.slice/stor_tomcat.service
> > [ 2168.593645]  S C1 CompilerThre  2573      1033.157689      3636
> > 120         0.000000       615.256247         0.000000 2 0
> > /system.slice/stor_tomcat.service
> > [ 2168.593654]  D              mv  3814      2263.816953    186734
> > 120         0.000000     30087.917319         0.000000 2 0 /user.slice
> >
> > any idea what is going on and how to fix this ?
>
>
>
> >
> > thx.
> >
>
>
> --
> Damien Le Moal
> Western Digital Research


* Re: Host managed SMR drive issue
  2021-09-27 17:28   ` Sven Oehme
@ 2021-09-27 22:36     ` Sven Oehme
  2021-09-27 22:38     ` Sven Oehme
  1 sibling, 0 replies; 23+ messages in thread
From: Sven Oehme @ 2021-09-27 22:36 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-btrfs, Naohiro Aota, Johannes Thumshirn

I tried to repeat the test with FW 19, same result:

[Mon Sep 27 15:41:22 2021] INFO: task btrfs-transacti:4206 blocked for
more than 604 seconds.
[Mon Sep 27 15:41:22 2021]       Not tainted 5.14-test #1
[Mon Sep 27 15:41:22 2021] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mon Sep 27 15:41:22 2021] task:btrfs-transacti state:D stack:    0
pid: 4206 ppid:     2 flags:0x00004000
[Mon Sep 27 15:41:22 2021] Call Trace:
[Mon Sep 27 15:41:22 2021]  __schedule+0x2fa/0x910
[Mon Sep 27 15:41:22 2021]  schedule+0x4f/0xc0
[Mon Sep 27 15:41:22 2021]  io_schedule+0x46/0x70
[Mon Sep 27 15:41:22 2021]  blk_mq_get_tag+0x11b/0x270
[Mon Sep 27 15:41:22 2021]  ? wait_woken+0x80/0x80
[Mon Sep 27 15:41:22 2021]  __blk_mq_alloc_request+0xec/0x120
[Mon Sep 27 15:41:22 2021]  blk_mq_submit_bio+0x12f/0x580
[Mon Sep 27 15:41:22 2021]  submit_bio_noacct+0x42a/0x4f0
[Mon Sep 27 15:41:22 2021]  submit_bio+0x4f/0x1b0
[Mon Sep 27 15:41:22 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
[Mon Sep 27 15:41:22 2021]  btrfs_submit_metadata_bio+0x10f/0x170 [btrfs]
[Mon Sep 27 15:41:22 2021]  submit_one_bio+0x67/0x80 [btrfs]
[Mon Sep 27 15:41:22 2021]  btree_write_cache_pages+0x6e8/0x770 [btrfs]
[Mon Sep 27 15:41:22 2021]  btree_writepages+0x5f/0x70 [btrfs]
[Mon Sep 27 15:41:22 2021]  do_writepages+0x38/0xc0
[Mon Sep 27 15:41:22 2021]  __filemap_fdatawrite_range+0xcc/0x110
[Mon Sep 27 15:41:22 2021]  filemap_fdatawrite_range+0x13/0x20
[Mon Sep 27 15:41:22 2021]  btrfs_write_marked_extents+0x66/0x140 [btrfs]
[Mon Sep 27 15:41:22 2021]  btrfs_write_and_wait_transaction+0x49/0xd0 [btrfs]
[Mon Sep 27 15:41:22 2021]  btrfs_commit_transaction+0x6b0/0xaa0 [btrfs]
[Mon Sep 27 15:41:22 2021]  ? start_transaction+0xcf/0x5a0 [btrfs]
[Mon Sep 27 15:41:22 2021]  transaction_kthread+0x138/0x1b0 [btrfs]
[Mon Sep 27 15:41:22 2021]  kthread+0x12f/0x150
[Mon Sep 27 15:41:22 2021]  ?
btrfs_cleanup_transaction.isra.0+0x560/0x560 [btrfs]
[Mon Sep 27 15:41:22 2021]  ? __kthread_bind_mask+0x70/0x70
[Mon Sep 27 15:41:22 2021]  ret_from_fork+0x22/0x30

If you tell me what information to collect, I am happy to do so.

Sven

On Mon, Sep 27, 2021 at 11:28 AM Sven Oehme <oehmes@gmail.com> wrote:
>
> Hi,
>
> I also have an  Adaptec HBA 1100-4i at FW level 4.11 (latest) , which
> according to https://ask.adaptec.com/app/answers/detail/a_id/17472 is
> supported but see the exact same hangs after a few minutes ...
>
> what i see in dmesg is :
>
> [Mon Sep 27 11:20:03 2021] INFO: task kworker/u102:6:190092 blocked
> for more than 120 seconds.
> [Mon Sep 27 11:20:03 2021]       Not tainted 5.14-test #1
> [Mon Sep 27 11:20:03 2021] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Mon Sep 27 11:20:03 2021] task:kworker/u102:6  state:D stack:    0
> pid:190092 ppid:     2 flags:0x00004000
> [Mon Sep 27 11:20:03 2021] Workqueue: btrfs-worker-high
> btrfs_work_helper [btrfs]
> [Mon Sep 27 11:20:03 2021] Call Trace:
> [Mon Sep 27 11:20:03 2021]  __schedule+0x2fa/0x910
> [Mon Sep 27 11:20:03 2021]  schedule+0x4f/0xc0
> [Mon Sep 27 11:20:03 2021]  io_schedule+0x46/0x70
> [Mon Sep 27 11:20:03 2021]  blk_mq_get_tag+0x11b/0x270
> [Mon Sep 27 11:20:03 2021]  ? wait_woken+0x80/0x80
> [Mon Sep 27 11:20:03 2021]  __blk_mq_alloc_request+0xec/0x120
> [Mon Sep 27 11:20:03 2021]  blk_mq_submit_bio+0x12f/0x580
> [Mon Sep 27 11:20:03 2021]  submit_bio_noacct+0x42a/0x4f0
> [Mon Sep 27 11:20:03 2021]  submit_bio+0x4f/0x1b0
> [Mon Sep 27 11:20:03 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
> [Mon Sep 27 11:20:03 2021]  run_one_async_done+0x3b/0x70 [btrfs]
> [Mon Sep 27 11:20:03 2021]  btrfs_work_helper+0x132/0x2e0 [btrfs]
> [Mon Sep 27 11:20:03 2021]  process_one_work+0x220/0x3c0
> [Mon Sep 27 11:20:03 2021]  worker_thread+0x53/0x420
> [Mon Sep 27 11:20:03 2021]  kthread+0x12f/0x150
> [Mon Sep 27 11:20:03 2021]  ? process_one_work+0x3c0/0x3c0
> [Mon Sep 27 11:20:03 2021]  ? __kthread_bind_mask+0x70/0x70
> [Mon Sep 27 11:20:03 2021]  ret_from_fork+0x22/0x30
>
> i will also downgrade the LSI adapter to FW 19, but i think this is
> unrelated to the FW as i can see this with 2 completely different
> HBA's and 2 completely different drives.
>
> Sven
>
> Sven
>
>
> On Sun, Sep 26, 2021 at 5:19 PM Damien Le Moal
> <damien.lemoal@opensource.wdc.com> wrote:
> >
> > On 2021/09/25 3:25, Sven Oehme wrote:
> > > Hi,
> > >
> > > i am running into issues with Host Managed SMR drive testing. when i
> > > try to copy or move a file to the btrfs filesystem it just hangs. i
> > > tried multiple 5.12,5.13 as well as 5.14 all the way to 5.14.6 but the
> > > issue still persists.
> > >
> > > here is the setup :
> > >
> > > I am using btrfs-progs-v5.14.1
> > > device is a  Host Managed WDC  20TB SMR drive with firmware level C421
> > > its connected via a HBA 9400-8i Tri-Mode Storage Adapter , latest 20.0 FW
> >
> > Beware of the Broadcom FW rev 20. We found problems with it: very slow zone
> > command scheduling leading to command timeout is some cases. FW 19 does not seem
> > to have this issue. But that is likely not the cause of the problem here.
> >
> > Is there anything of interest in dmesg ? Any IO errors ?
> >
> > Naohiro, Johannes,
> >
> > Any idea ?
> >
> >
> >
> > > I am using the /dev/sd device direct  , no lvm or device mapper or
> > > anything else in between
> > >
> > > after a few seconds, sometimes minutes data rate to the drive drops to
> > > 0 and 1 or 2 cores on my system show 100% IO wait time, but no longer
> > > make any progress
> > >
> > > the process in question has the following stack :
> > >
> > > [ 2168.589160] task:mv              state:D stack:    0 pid: 3814
> > > ppid:  3679 flags:0x00004000
> > > [ 2168.589162] Call Trace:
> > > [ 2168.589163]  __schedule+0x2fa/0x910
> > > [ 2168.589166]  schedule+0x4f/0xc0
> > > [ 2168.589168]  schedule_timeout+0x8a/0x140
> > > [ 2168.589171]  ? __bpf_trace_tick_stop+0x10/0x10
> > > [ 2168.589173]  io_schedule_timeout+0x51/0x80
> > > [ 2168.589176]  balance_dirty_pages+0x2fa/0xe30
> > > [ 2168.589179]  ? __mod_lruvec_state+0x3a/0x50
> > > [ 2168.589182]  balance_dirty_pages_ratelimited+0x2f9/0x3c0
> > > [ 2168.589185]  btrfs_buffered_write+0x58e/0x7e0 [btrfs]
> > > [ 2168.589226]  btrfs_file_write_iter+0x138/0x3e0 [btrfs]
> > > [ 2168.589260]  ? ext4_file_read_iter+0x5b/0x180
> > > [ 2168.589262]  new_sync_write+0x114/0x1a0
> > > [ 2168.589265]  vfs_write+0x1c5/0x260
> > > [ 2168.589267]  ksys_write+0x67/0xe0
> > > [ 2168.589270]  __x64_sys_write+0x1a/0x20
> > > [ 2168.589272]  do_syscall_64+0x40/0xb0
> > > [ 2168.589275]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > [ 2168.589277] RIP: 0033:0x7ffff7e91c27
> > > [ 2168.589278] RSP: 002b:00007fffffffdc48 EFLAGS: 00000246 ORIG_RAX:
> > > 0000000000000001
> > > [ 2168.589280] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007ffff7e91c27
> > > [ 2168.589281] RDX: 0000000000020000 RSI: 00007ffff79bd000 RDI: 0000000000000004
> > > [ 2168.589282] RBP: 00007ffff79bd000 R08: 0000000000000000 R09: 0000000000000000
> > > [ 2168.589283] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000004
> > > [ 2168.589284] R13: 0000000000000004 R14: 00007ffff79bd000 R15: 0000000000020000
> > >
> > > and shows up under runnable tasks :
> > >
> > > [ 2168.593562] runnable tasks:
> > > [ 2168.593562]  S            task   PID         tree-key  switches
> > > prio     wait-time             sum-exec        sum-sleep
> > > [ 2168.593563] -------------------------------------------------------------------------------------------------------------
> > > [ 2168.593565]  S        cpuhp/13    92     88923.802487        19
> > > 120         0.000000         0.292061         0.000000 2 0 /
> > > [ 2168.593571]  S  idle_inject/13    93       -11.997255         3
> > > 49         0.000000         0.005480         0.000000 2 0 /
> > > [ 2168.593577]  S    migration/13    94       814.287531       551
> > > 0         0.000000      1015.550514         0.000000 2 0 /
> > > [ 2168.593582]  S    ksoftirqd/13    95     88762.317130        44
> > > 120         0.000000         1.940252         0.000000 2 0 /
> > > [ 2168.593588]  I    kworker/13:0    96        -9.031157         5
> > > 120         0.000000         0.017423         0.000000 2 0 /
> > > [ 2168.593593]  I   kworker/13:0H    97      3570.961886         5
> > > 100         0.000000         0.034345         0.000000 2 0 /
> > > [ 2168.593603]  I    kworker/13:1   400    101650.731913       578
> > > 120         0.000000        10.110898         0.000000 2 0 /
> > > [ 2168.593611]  I   kworker/13:1H  1015    101649.600698        65
> > > 100         0.000000         1.443300         0.000000 2 0 /
> > > [ 2168.593618]  S           loop3  1994     99133.655903        70
> > > 100         0.000000         1.137468         0.000000 2 0 /
> > > [ 2168.593625]  S           snapd  3161        15.296181       166
> > > 120         0.000000        90.296991         0.000000 2 0
> > > /system.slice/snapd.service
> > > [ 2168.593631]  S           snapd  3198        10.047573        49
> > > 120         0.000000         5.646247         0.000000 2 0
> > > /system.slice/snapd.service
> > > [ 2168.593639]  S            java  2446       970.743682       301
> > > 120         0.000000       101.648659         0.000000 2 0
> > > /system.slice/stor_tomcat.service
> > > [ 2168.593645]  S C1 CompilerThre  2573      1033.157689      3636
> > > 120         0.000000       615.256247         0.000000 2 0
> > > /system.slice/stor_tomcat.service
> > > [ 2168.593654]  D              mv  3814      2263.816953    186734
> > > 120         0.000000     30087.917319         0.000000 2 0 /user.slice
> > >
> > > any idea what is going on and how to fix this ?
> >
> >
> >
> > >
> > > thx.
> > >
> >
> >
> > --
> > Damien Le Moal
> > Western Digital Research


* Re: Host managed SMR drive issue
  2021-09-27 17:28   ` Sven Oehme
  2021-09-27 22:36     ` Sven Oehme
@ 2021-09-27 22:38     ` Sven Oehme
  2021-09-27 22:48       ` Damien Le Moal
  1 sibling, 1 reply; 23+ messages in thread
From: Sven Oehme @ 2021-09-27 22:38 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-btrfs, Naohiro Aota, Johannes Thumshirn

I tried to repeat the test with FW 19, same result:

[Mon Sep 27 15:41:22 2021] INFO: task btrfs-transacti:4206 blocked for
more than 604 seconds.
[Mon Sep 27 15:41:22 2021]       Not tainted 5.14-test #1
[Mon Sep 27 15:41:22 2021] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mon Sep 27 15:41:22 2021] task:btrfs-transacti state:D stack:    0
pid: 4206 ppid:     2 flags:0x00004000
[Mon Sep 27 15:41:22 2021] Call Trace:
[Mon Sep 27 15:41:22 2021]  __schedule+0x2fa/0x910
[Mon Sep 27 15:41:22 2021]  schedule+0x4f/0xc0
[Mon Sep 27 15:41:22 2021]  io_schedule+0x46/0x70
[Mon Sep 27 15:41:22 2021]  blk_mq_get_tag+0x11b/0x270
[Mon Sep 27 15:41:22 2021]  ? wait_woken+0x80/0x80
[Mon Sep 27 15:41:22 2021]  __blk_mq_alloc_request+0xec/0x120
[Mon Sep 27 15:41:22 2021]  blk_mq_submit_bio+0x12f/0x580
[Mon Sep 27 15:41:22 2021]  submit_bio_noacct+0x42a/0x4f0
[Mon Sep 27 15:41:22 2021]  submit_bio+0x4f/0x1b0
[Mon Sep 27 15:41:22 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
[Mon Sep 27 15:41:22 2021]  btrfs_submit_metadata_bio+0x10f/0x170 [btrfs]
[Mon Sep 27 15:41:22 2021]  submit_one_bio+0x67/0x80 [btrfs]
[Mon Sep 27 15:41:22 2021]  btree_write_cache_pages+0x6e8/0x770 [btrfs]
[Mon Sep 27 15:41:22 2021]  btree_writepages+0x5f/0x70 [btrfs]
[Mon Sep 27 15:41:22 2021]  do_writepages+0x38/0xc0
[Mon Sep 27 15:41:22 2021]  __filemap_fdatawrite_range+0xcc/0x110
[Mon Sep 27 15:41:22 2021]  filemap_fdatawrite_range+0x13/0x20
[Mon Sep 27 15:41:22 2021]  btrfs_write_marked_extents+0x66/0x140 [btrfs]
[Mon Sep 27 15:41:22 2021]  btrfs_write_and_wait_transaction+0x49/0xd0 [btrfs]
[Mon Sep 27 15:41:22 2021]  btrfs_commit_transaction+0x6b0/0xaa0 [btrfs]
[Mon Sep 27 15:41:22 2021]  ? start_transaction+0xcf/0x5a0 [btrfs]
[Mon Sep 27 15:41:22 2021]  transaction_kthread+0x138/0x1b0 [btrfs]
[Mon Sep 27 15:41:22 2021]  kthread+0x12f/0x150
[Mon Sep 27 15:41:22 2021]  ?
btrfs_cleanup_transaction.isra.0+0x560/0x560 [btrfs]
[Mon Sep 27 15:41:22 2021]  ? __kthread_bind_mask+0x70/0x70
[Mon Sep 27 15:41:22 2021]  ret_from_fork+0x22/0x30

If you tell me what information to collect, I am happy to do so.

Sven


On Mon, Sep 27, 2021 at 11:28 AM Sven Oehme <oehmes@gmail.com> wrote:
>
> Hi,
>
> I also have an  Adaptec HBA 1100-4i at FW level 4.11 (latest) , which
> according to https://ask.adaptec.com/app/answers/detail/a_id/17472 is
> supported but see the exact same hangs after a few minutes ...
>
> what i see in dmesg is :
>
> [Mon Sep 27 11:20:03 2021] INFO: task kworker/u102:6:190092 blocked
> for more than 120 seconds.
> [Mon Sep 27 11:20:03 2021]       Not tainted 5.14-test #1
> [Mon Sep 27 11:20:03 2021] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Mon Sep 27 11:20:03 2021] task:kworker/u102:6  state:D stack:    0
> pid:190092 ppid:     2 flags:0x00004000
> [Mon Sep 27 11:20:03 2021] Workqueue: btrfs-worker-high
> btrfs_work_helper [btrfs]
> [Mon Sep 27 11:20:03 2021] Call Trace:
> [Mon Sep 27 11:20:03 2021]  __schedule+0x2fa/0x910
> [Mon Sep 27 11:20:03 2021]  schedule+0x4f/0xc0
> [Mon Sep 27 11:20:03 2021]  io_schedule+0x46/0x70
> [Mon Sep 27 11:20:03 2021]  blk_mq_get_tag+0x11b/0x270
> [Mon Sep 27 11:20:03 2021]  ? wait_woken+0x80/0x80
> [Mon Sep 27 11:20:03 2021]  __blk_mq_alloc_request+0xec/0x120
> [Mon Sep 27 11:20:03 2021]  blk_mq_submit_bio+0x12f/0x580
> [Mon Sep 27 11:20:03 2021]  submit_bio_noacct+0x42a/0x4f0
> [Mon Sep 27 11:20:03 2021]  submit_bio+0x4f/0x1b0
> [Mon Sep 27 11:20:03 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
> [Mon Sep 27 11:20:03 2021]  run_one_async_done+0x3b/0x70 [btrfs]
> [Mon Sep 27 11:20:03 2021]  btrfs_work_helper+0x132/0x2e0 [btrfs]
> [Mon Sep 27 11:20:03 2021]  process_one_work+0x220/0x3c0
> [Mon Sep 27 11:20:03 2021]  worker_thread+0x53/0x420
> [Mon Sep 27 11:20:03 2021]  kthread+0x12f/0x150
> [Mon Sep 27 11:20:03 2021]  ? process_one_work+0x3c0/0x3c0
> [Mon Sep 27 11:20:03 2021]  ? __kthread_bind_mask+0x70/0x70
> [Mon Sep 27 11:20:03 2021]  ret_from_fork+0x22/0x30
>
> i will also downgrade the LSI adapter to FW 19, but i think this is
> unrelated to the FW as i can see this with 2 completely different
> HBA's and 2 completely different drives.
>
> Sven
>
> Sven
>
>
> On Sun, Sep 26, 2021 at 5:19 PM Damien Le Moal
> <damien.lemoal@opensource.wdc.com> wrote:
> >
> > On 2021/09/25 3:25, Sven Oehme wrote:
> > > Hi,
> > >
> > > i am running into issues with Host Managed SMR drive testing. when i
> > > try to copy or move a file to the btrfs filesystem it just hangs. i
> > > tried multiple 5.12,5.13 as well as 5.14 all the way to 5.14.6 but the
> > > issue still persists.
> > >
> > > here is the setup :
> > >
> > > I am using btrfs-progs-v5.14.1
> > > device is a  Host Managed WDC  20TB SMR drive with firmware level C421
> > > its connected via a HBA 9400-8i Tri-Mode Storage Adapter , latest 20.0 FW
> >
> > Beware of the Broadcom FW rev 20. We found problems with it: very slow zone
> > command scheduling leading to command timeout is some cases. FW 19 does not seem
> > to have this issue. But that is likely not the cause of the problem here.
> >
> > Is there anything of interest in dmesg ? Any IO errors ?
> >
> > Naohiro, Johannes,
> >
> > Any idea ?
> >
> >
> >
> > > I am using the /dev/sd device direct  , no lvm or device mapper or
> > > anything else in between
> > >
> > > after a few seconds, sometimes minutes data rate to the drive drops to
> > > 0 and 1 or 2 cores on my system show 100% IO wait time, but no longer
> > > make any progress
> > >
> > > the process in question has the following stack :
> > >
> > > [ 2168.589160] task:mv              state:D stack:    0 pid: 3814
> > > ppid:  3679 flags:0x00004000
> > > [ 2168.589162] Call Trace:
> > > [ 2168.589163]  __schedule+0x2fa/0x910
> > > [ 2168.589166]  schedule+0x4f/0xc0
> > > [ 2168.589168]  schedule_timeout+0x8a/0x140
> > > [ 2168.589171]  ? __bpf_trace_tick_stop+0x10/0x10
> > > [ 2168.589173]  io_schedule_timeout+0x51/0x80
> > > [ 2168.589176]  balance_dirty_pages+0x2fa/0xe30
> > > [ 2168.589179]  ? __mod_lruvec_state+0x3a/0x50
> > > [ 2168.589182]  balance_dirty_pages_ratelimited+0x2f9/0x3c0
> > > [ 2168.589185]  btrfs_buffered_write+0x58e/0x7e0 [btrfs]
> > > [ 2168.589226]  btrfs_file_write_iter+0x138/0x3e0 [btrfs]
> > > [ 2168.589260]  ? ext4_file_read_iter+0x5b/0x180
> > > [ 2168.589262]  new_sync_write+0x114/0x1a0
> > > [ 2168.589265]  vfs_write+0x1c5/0x260
> > > [ 2168.589267]  ksys_write+0x67/0xe0
> > > [ 2168.589270]  __x64_sys_write+0x1a/0x20
> > > [ 2168.589272]  do_syscall_64+0x40/0xb0
> > > [ 2168.589275]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > [ 2168.589277] RIP: 0033:0x7ffff7e91c27
> > > [ 2168.589278] RSP: 002b:00007fffffffdc48 EFLAGS: 00000246 ORIG_RAX:
> > > 0000000000000001
> > > [ 2168.589280] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007ffff7e91c27
> > > [ 2168.589281] RDX: 0000000000020000 RSI: 00007ffff79bd000 RDI: 0000000000000004
> > > [ 2168.589282] RBP: 00007ffff79bd000 R08: 0000000000000000 R09: 0000000000000000
> > > [ 2168.589283] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000004
> > > [ 2168.589284] R13: 0000000000000004 R14: 00007ffff79bd000 R15: 0000000000020000
> > >
> > > and shows up under runnable tasks :
> > >
> > > [ 2168.593562] runnable tasks:
> > > [ 2168.593562]  S            task   PID         tree-key  switches
> > > prio     wait-time             sum-exec        sum-sleep
> > > [ 2168.593563] -------------------------------------------------------------------------------------------------------------
> > > [ 2168.593565]  S        cpuhp/13    92     88923.802487        19
> > > 120         0.000000         0.292061         0.000000 2 0 /
> > > [ 2168.593571]  S  idle_inject/13    93       -11.997255         3
> > > 49         0.000000         0.005480         0.000000 2 0 /
> > > [ 2168.593577]  S    migration/13    94       814.287531       551
> > > 0         0.000000      1015.550514         0.000000 2 0 /
> > > [ 2168.593582]  S    ksoftirqd/13    95     88762.317130        44
> > > 120         0.000000         1.940252         0.000000 2 0 /
> > > [ 2168.593588]  I    kworker/13:0    96        -9.031157         5
> > > 120         0.000000         0.017423         0.000000 2 0 /
> > > [ 2168.593593]  I   kworker/13:0H    97      3570.961886         5
> > > 100         0.000000         0.034345         0.000000 2 0 /
> > > [ 2168.593603]  I    kworker/13:1   400    101650.731913       578
> > > 120         0.000000        10.110898         0.000000 2 0 /
> > > [ 2168.593611]  I   kworker/13:1H  1015    101649.600698        65
> > > 100         0.000000         1.443300         0.000000 2 0 /
> > > [ 2168.593618]  S           loop3  1994     99133.655903        70
> > > 100         0.000000         1.137468         0.000000 2 0 /
> > > [ 2168.593625]  S           snapd  3161        15.296181       166
> > > 120         0.000000        90.296991         0.000000 2 0
> > > /system.slice/snapd.service
> > > [ 2168.593631]  S           snapd  3198        10.047573        49
> > > 120         0.000000         5.646247         0.000000 2 0
> > > /system.slice/snapd.service
> > > [ 2168.593639]  S            java  2446       970.743682       301
> > > 120         0.000000       101.648659         0.000000 2 0
> > > /system.slice/stor_tomcat.service
> > > [ 2168.593645]  S C1 CompilerThre  2573      1033.157689      3636
> > > 120         0.000000       615.256247         0.000000 2 0
> > > /system.slice/stor_tomcat.service
> > > [ 2168.593654]  D              mv  3814      2263.816953    186734
> > > 120         0.000000     30087.917319         0.000000 2 0 /user.slice
> > >
> > > any idea what is going on and how to fix this ?
> >
> >
> >
> > >
> > > thx.
> > >
> >
> >
> > --
> > Damien Le Moal
> > Western Digital Research


* Re: Host managed SMR drive issue
  2021-09-27 22:38     ` Sven Oehme
@ 2021-09-27 22:48       ` Damien Le Moal
  2021-09-27 22:56         ` Sven Oehme
  0 siblings, 1 reply; 23+ messages in thread
From: Damien Le Moal @ 2021-09-27 22:48 UTC (permalink / raw)
  To: Sven Oehme; +Cc: linux-btrfs, Naohiro Aota, Johannes Thumshirn

On 2021/09/28 7:38, Sven Oehme wrote:
> i tried to repeat the test with FW19, same result :

The problem is likely not rooted in the HBA FW version.
How do you reproduce the problem? Is it an fio script you are running?

> 
> [Mon Sep 27 15:41:22 2021] INFO: task btrfs-transacti:4206 blocked for
> more than 604 seconds.
> [Mon Sep 27 15:41:22 2021]       Not tainted 5.14-test #1
> [Mon Sep 27 15:41:22 2021] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Mon Sep 27 15:41:22 2021] task:btrfs-transacti state:D stack:    0
> pid: 4206 ppid:     2 flags:0x00004000
> [Mon Sep 27 15:41:22 2021] Call Trace:
> [Mon Sep 27 15:41:22 2021]  __schedule+0x2fa/0x910
> [Mon Sep 27 15:41:22 2021]  schedule+0x4f/0xc0
> [Mon Sep 27 15:41:22 2021]  io_schedule+0x46/0x70
> [Mon Sep 27 15:41:22 2021]  blk_mq_get_tag+0x11b/0x270
> [Mon Sep 27 15:41:22 2021]  ? wait_woken+0x80/0x80
> [Mon Sep 27 15:41:22 2021]  __blk_mq_alloc_request+0xec/0x120
> [Mon Sep 27 15:41:22 2021]  blk_mq_submit_bio+0x12f/0x580
> [Mon Sep 27 15:41:22 2021]  submit_bio_noacct+0x42a/0x4f0
> [Mon Sep 27 15:41:22 2021]  submit_bio+0x4f/0x1b0
> [Mon Sep 27 15:41:22 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
> [Mon Sep 27 15:41:22 2021]  btrfs_submit_metadata_bio+0x10f/0x170 [btrfs]
> [Mon Sep 27 15:41:22 2021]  submit_one_bio+0x67/0x80 [btrfs]
> [Mon Sep 27 15:41:22 2021]  btree_write_cache_pages+0x6e8/0x770 [btrfs]
> [Mon Sep 27 15:41:22 2021]  btree_writepages+0x5f/0x70 [btrfs]
> [Mon Sep 27 15:41:22 2021]  do_writepages+0x38/0xc0
> [Mon Sep 27 15:41:22 2021]  __filemap_fdatawrite_range+0xcc/0x110
> [Mon Sep 27 15:41:22 2021]  filemap_fdatawrite_range+0x13/0x20
> [Mon Sep 27 15:41:22 2021]  btrfs_write_marked_extents+0x66/0x140 [btrfs]
> [Mon Sep 27 15:41:22 2021]  btrfs_write_and_wait_transaction+0x49/0xd0 [btrfs]
> [Mon Sep 27 15:41:22 2021]  btrfs_commit_transaction+0x6b0/0xaa0 [btrfs]
> [Mon Sep 27 15:41:22 2021]  ? start_transaction+0xcf/0x5a0 [btrfs]
> [Mon Sep 27 15:41:22 2021]  transaction_kthread+0x138/0x1b0 [btrfs]
> [Mon Sep 27 15:41:22 2021]  kthread+0x12f/0x150
> [Mon Sep 27 15:41:22 2021]  ?
> btrfs_cleanup_transaction.isra.0+0x560/0x560 [btrfs]
> [Mon Sep 27 15:41:22 2021]  ? __kthread_bind_mask+0x70/0x70
> [Mon Sep 27 15:41:22 2021]  ret_from_fork+0x22/0x30
> 
> if you tell me what information to collect, I am happy to do so,.

The stack seems to point to a "hang" in the block layer, so the btrfs side waits forever.

When you get the hang, can you check the queued and dispatch counters in
/sys/kernel/debug/block/<your disk>/hctx0?

Once you have the numbers, try:

echo 1 > /sys/kernel/debug/block/<your disk>/hctx0/run

to see if the drive gets unstuck.
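
A minimal way to collect those counters across all hardware queues could look
like this (a sketch; "sdr" is the device name from the follow-up below,
substitute the actual disk):

for h in /sys/kernel/debug/block/sdr/hctx*; do
    echo "== $h"
    cat "$h/queued" "$h/dispatched" "$h/dispatch_busy"
done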

> 
> Sven
> 
> 
> On Mon, Sep 27, 2021 at 11:28 AM Sven Oehme <oehmes@gmail.com> wrote:
>>
>> Hi,
>>
>> I also have an  Adaptec HBA 1100-4i at FW level 4.11 (latest) , which
>> according to https://ask.adaptec.com/app/answers/detail/a_id/17472 is
>> supported but see the exact same hangs after a few minutes ...
>>
>> what i see in dmesg is :
>>
>> [Mon Sep 27 11:20:03 2021] INFO: task kworker/u102:6:190092 blocked
>> for more than 120 seconds.
>> [Mon Sep 27 11:20:03 2021]       Not tainted 5.14-test #1
>> [Mon Sep 27 11:20:03 2021] "echo 0 >
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [Mon Sep 27 11:20:03 2021] task:kworker/u102:6  state:D stack:    0
>> pid:190092 ppid:     2 flags:0x00004000
>> [Mon Sep 27 11:20:03 2021] Workqueue: btrfs-worker-high
>> btrfs_work_helper [btrfs]
>> [Mon Sep 27 11:20:03 2021] Call Trace:
>> [Mon Sep 27 11:20:03 2021]  __schedule+0x2fa/0x910
>> [Mon Sep 27 11:20:03 2021]  schedule+0x4f/0xc0
>> [Mon Sep 27 11:20:03 2021]  io_schedule+0x46/0x70
>> [Mon Sep 27 11:20:03 2021]  blk_mq_get_tag+0x11b/0x270
>> [Mon Sep 27 11:20:03 2021]  ? wait_woken+0x80/0x80
>> [Mon Sep 27 11:20:03 2021]  __blk_mq_alloc_request+0xec/0x120
>> [Mon Sep 27 11:20:03 2021]  blk_mq_submit_bio+0x12f/0x580
>> [Mon Sep 27 11:20:03 2021]  submit_bio_noacct+0x42a/0x4f0
>> [Mon Sep 27 11:20:03 2021]  submit_bio+0x4f/0x1b0
>> [Mon Sep 27 11:20:03 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
>> [Mon Sep 27 11:20:03 2021]  run_one_async_done+0x3b/0x70 [btrfs]
>> [Mon Sep 27 11:20:03 2021]  btrfs_work_helper+0x132/0x2e0 [btrfs]
>> [Mon Sep 27 11:20:03 2021]  process_one_work+0x220/0x3c0
>> [Mon Sep 27 11:20:03 2021]  worker_thread+0x53/0x420
>> [Mon Sep 27 11:20:03 2021]  kthread+0x12f/0x150
>> [Mon Sep 27 11:20:03 2021]  ? process_one_work+0x3c0/0x3c0
>> [Mon Sep 27 11:20:03 2021]  ? __kthread_bind_mask+0x70/0x70
>> [Mon Sep 27 11:20:03 2021]  ret_from_fork+0x22/0x30
>>
>> i will also downgrade the LSI adapter to FW 19, but i think this is
>> unrelated to the FW as i can see this with 2 completely different
>> HBA's and 2 completely different drives.
>>
>> Sven
>>
>> Sven
>>
>>
>> On Sun, Sep 26, 2021 at 5:19 PM Damien Le Moal
>> <damien.lemoal@opensource.wdc.com> wrote:
>>>
>>> On 2021/09/25 3:25, Sven Oehme wrote:
>>>> Hi,
>>>>
>>>> i am running into issues with Host Managed SMR drive testing. when i
>>>> try to copy or move a file to the btrfs filesystem it just hangs. i
>>>> tried multiple 5.12,5.13 as well as 5.14 all the way to 5.14.6 but the
>>>> issue still persists.
>>>>
>>>> here is the setup :
>>>>
>>>> I am using btrfs-progs-v5.14.1
>>>> device is a  Host Managed WDC  20TB SMR drive with firmware level C421
>>>> its connected via a HBA 9400-8i Tri-Mode Storage Adapter , latest 20.0 FW
>>>
>>> Beware of the Broadcom FW rev 20. We found problems with it: very slow zone
>>> command scheduling leading to command timeout is some cases. FW 19 does not seem
>>> to have this issue. But that is likely not the cause of the problem here.
>>>
>>> Is there anything of interest in dmesg ? Any IO errors ?
>>>
>>> Naohiro, Johannes,
>>>
>>> Any idea ?
>>>
>>>
>>>
>>>> I am using the /dev/sd device direct  , no lvm or device mapper or
>>>> anything else in between
>>>>
>>>> after a few seconds, sometimes minutes data rate to the drive drops to
>>>> 0 and 1 or 2 cores on my system show 100% IO wait time, but no longer
>>>> make any progress
>>>>
>>>> the process in question has the following stack :
>>>>
>>>> [ 2168.589160] task:mv              state:D stack:    0 pid: 3814
>>>> ppid:  3679 flags:0x00004000
>>>> [ 2168.589162] Call Trace:
>>>> [ 2168.589163]  __schedule+0x2fa/0x910
>>>> [ 2168.589166]  schedule+0x4f/0xc0
>>>> [ 2168.589168]  schedule_timeout+0x8a/0x140
>>>> [ 2168.589171]  ? __bpf_trace_tick_stop+0x10/0x10
>>>> [ 2168.589173]  io_schedule_timeout+0x51/0x80
>>>> [ 2168.589176]  balance_dirty_pages+0x2fa/0xe30
>>>> [ 2168.589179]  ? __mod_lruvec_state+0x3a/0x50
>>>> [ 2168.589182]  balance_dirty_pages_ratelimited+0x2f9/0x3c0
>>>> [ 2168.589185]  btrfs_buffered_write+0x58e/0x7e0 [btrfs]
>>>> [ 2168.589226]  btrfs_file_write_iter+0x138/0x3e0 [btrfs]
>>>> [ 2168.589260]  ? ext4_file_read_iter+0x5b/0x180
>>>> [ 2168.589262]  new_sync_write+0x114/0x1a0
>>>> [ 2168.589265]  vfs_write+0x1c5/0x260
>>>> [ 2168.589267]  ksys_write+0x67/0xe0
>>>> [ 2168.589270]  __x64_sys_write+0x1a/0x20
>>>> [ 2168.589272]  do_syscall_64+0x40/0xb0
>>>> [ 2168.589275]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>> [ 2168.589277] RIP: 0033:0x7ffff7e91c27
>>>> [ 2168.589278] RSP: 002b:00007fffffffdc48 EFLAGS: 00000246 ORIG_RAX:
>>>> 0000000000000001
>>>> [ 2168.589280] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007ffff7e91c27
>>>> [ 2168.589281] RDX: 0000000000020000 RSI: 00007ffff79bd000 RDI: 0000000000000004
>>>> [ 2168.589282] RBP: 00007ffff79bd000 R08: 0000000000000000 R09: 0000000000000000
>>>> [ 2168.589283] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000004
>>>> [ 2168.589284] R13: 0000000000000004 R14: 00007ffff79bd000 R15: 0000000000020000
>>>>
>>>> and shows up under runnable tasks :
>>>>
>>>> [ 2168.593562] runnable tasks:
>>>> [ 2168.593562]  S            task   PID         tree-key  switches
>>>> prio     wait-time             sum-exec        sum-sleep
>>>> [ 2168.593563] -------------------------------------------------------------------------------------------------------------
>>>> [ 2168.593565]  S        cpuhp/13    92     88923.802487        19
>>>> 120         0.000000         0.292061         0.000000 2 0 /
>>>> [ 2168.593571]  S  idle_inject/13    93       -11.997255         3
>>>> 49         0.000000         0.005480         0.000000 2 0 /
>>>> [ 2168.593577]  S    migration/13    94       814.287531       551
>>>> 0         0.000000      1015.550514         0.000000 2 0 /
>>>> [ 2168.593582]  S    ksoftirqd/13    95     88762.317130        44
>>>> 120         0.000000         1.940252         0.000000 2 0 /
>>>> [ 2168.593588]  I    kworker/13:0    96        -9.031157         5
>>>> 120         0.000000         0.017423         0.000000 2 0 /
>>>> [ 2168.593593]  I   kworker/13:0H    97      3570.961886         5
>>>> 100         0.000000         0.034345         0.000000 2 0 /
>>>> [ 2168.593603]  I    kworker/13:1   400    101650.731913       578
>>>> 120         0.000000        10.110898         0.000000 2 0 /
>>>> [ 2168.593611]  I   kworker/13:1H  1015    101649.600698        65
>>>> 100         0.000000         1.443300         0.000000 2 0 /
>>>> [ 2168.593618]  S           loop3  1994     99133.655903        70
>>>> 100         0.000000         1.137468         0.000000 2 0 /
>>>> [ 2168.593625]  S           snapd  3161        15.296181       166
>>>> 120         0.000000        90.296991         0.000000 2 0
>>>> /system.slice/snapd.service
>>>> [ 2168.593631]  S           snapd  3198        10.047573        49
>>>> 120         0.000000         5.646247         0.000000 2 0
>>>> /system.slice/snapd.service
>>>> [ 2168.593639]  S            java  2446       970.743682       301
>>>> 120         0.000000       101.648659         0.000000 2 0
>>>> /system.slice/stor_tomcat.service
>>>> [ 2168.593645]  S C1 CompilerThre  2573      1033.157689      3636
>>>> 120         0.000000       615.256247         0.000000 2 0
>>>> /system.slice/stor_tomcat.service
>>>> [ 2168.593654]  D              mv  3814      2263.816953    186734
>>>> 120         0.000000     30087.917319         0.000000 2 0 /user.slice
>>>>
>>>> any idea what is going on and how to fix this ?
>>>
>>>
>>>
>>>>
>>>> thx.
>>>>
>>>
>>>
>>> --
>>> Damien Le Moal
>>> Western Digital Research


-- 
Damien Le Moal
Western Digital Research


* Re: Host managed SMR drive issue
  2021-09-27 22:48       ` Damien Le Moal
@ 2021-09-27 22:56         ` Sven Oehme
  2021-09-27 23:01           ` Damien Le Moal
  0 siblings, 1 reply; 23+ messages in thread
From: Sven Oehme @ 2021-09-27 22:56 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-btrfs, Naohiro Aota, Johannes Thumshirn

Hi,

I tried the above:

root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatched
       0        89
       1        1462
       2        1
       4        574
       8        0
      16        0
      32+       0
root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatch
root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatched
       0        89
       1        1462
       2        1
       4        574
       8        0
      16        0
      32+       0
root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatch_busy
0

echo 1 > /sys/kernel/debug/block/sdr/hctx0/run

It doesn't make any progress; there is still no I/O to the drive.
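
Since the blocked stacks are all waiting in blk_mq_get_tag, the tag and
in-flight request state of the same queue could be dumped as an additional
data point (a sketch; these files live next to the counters shown above):

cat /sys/kernel/debug/block/sdr/hctx0/tags
cat /sys/kernel/debug/block/sdr/hctx0/sched_tags
cat /sys/kernel/debug/block/sdr/hctx0/busy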

Sven


On Mon, Sep 27, 2021 at 4:48 PM Damien Le Moal
<damien.lemoal@opensource.wdc.com> wrote:
>
> On 2021/09/28 7:38, Sven Oehme wrote:
> > i tried to repeat the test with FW19, same result :
>
> The problem is likely not rooted with the HBA fw version.
> How do you create the problem ? Is it an fio script you are running ?
>
> >
> > [Mon Sep 27 15:41:22 2021] INFO: task btrfs-transacti:4206 blocked for
> > more than 604 seconds.
> > [Mon Sep 27 15:41:22 2021]       Not tainted 5.14-test #1
> > [Mon Sep 27 15:41:22 2021] "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [Mon Sep 27 15:41:22 2021] task:btrfs-transacti state:D stack:    0
> > pid: 4206 ppid:     2 flags:0x00004000
> > [Mon Sep 27 15:41:22 2021] Call Trace:
> > [Mon Sep 27 15:41:22 2021]  __schedule+0x2fa/0x910
> > [Mon Sep 27 15:41:22 2021]  schedule+0x4f/0xc0
> > [Mon Sep 27 15:41:22 2021]  io_schedule+0x46/0x70
> > [Mon Sep 27 15:41:22 2021]  blk_mq_get_tag+0x11b/0x270
> > [Mon Sep 27 15:41:22 2021]  ? wait_woken+0x80/0x80
> > [Mon Sep 27 15:41:22 2021]  __blk_mq_alloc_request+0xec/0x120
> > [Mon Sep 27 15:41:22 2021]  blk_mq_submit_bio+0x12f/0x580
> > [Mon Sep 27 15:41:22 2021]  submit_bio_noacct+0x42a/0x4f0
> > [Mon Sep 27 15:41:22 2021]  submit_bio+0x4f/0x1b0
> > [Mon Sep 27 15:41:22 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
> > [Mon Sep 27 15:41:22 2021]  btrfs_submit_metadata_bio+0x10f/0x170 [btrfs]
> > [Mon Sep 27 15:41:22 2021]  submit_one_bio+0x67/0x80 [btrfs]
> > [Mon Sep 27 15:41:22 2021]  btree_write_cache_pages+0x6e8/0x770 [btrfs]
> > [Mon Sep 27 15:41:22 2021]  btree_writepages+0x5f/0x70 [btrfs]
> > [Mon Sep 27 15:41:22 2021]  do_writepages+0x38/0xc0
> > [Mon Sep 27 15:41:22 2021]  __filemap_fdatawrite_range+0xcc/0x110
> > [Mon Sep 27 15:41:22 2021]  filemap_fdatawrite_range+0x13/0x20
> > [Mon Sep 27 15:41:22 2021]  btrfs_write_marked_extents+0x66/0x140 [btrfs]
> > [Mon Sep 27 15:41:22 2021]  btrfs_write_and_wait_transaction+0x49/0xd0 [btrfs]
> > [Mon Sep 27 15:41:22 2021]  btrfs_commit_transaction+0x6b0/0xaa0 [btrfs]
> > [Mon Sep 27 15:41:22 2021]  ? start_transaction+0xcf/0x5a0 [btrfs]
> > [Mon Sep 27 15:41:22 2021]  transaction_kthread+0x138/0x1b0 [btrfs]
> > [Mon Sep 27 15:41:22 2021]  kthread+0x12f/0x150
> > [Mon Sep 27 15:41:22 2021]  ?
> > btrfs_cleanup_transaction.isra.0+0x560/0x560 [btrfs]
> > [Mon Sep 27 15:41:22 2021]  ? __kthread_bind_mask+0x70/0x70
> > [Mon Sep 27 15:41:22 2021]  ret_from_fork+0x22/0x30
> >
> > if you tell me what information to collect, I am happy to do so,.
>
> The stack seems to point to a "hang" in the block layer so btrfs side waits forever.
>
> When you get the hang, can you check the queued and dispatch counters in
> /sys/kernel/debug/block/<your disk>/hctx0 ?
>
> Once you have the numbers, try:
>
> echo 1 > /sys/kernel/debug/block/<your disk>/hctx0/run
>
> to see if the drive gets unstuck.
>
> >
> > Sven
> >
> >
> > On Mon, Sep 27, 2021 at 11:28 AM Sven Oehme <oehmes@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> I also have an  Adaptec HBA 1100-4i at FW level 4.11 (latest) , which
> >> according to https://ask.adaptec.com/app/answers/detail/a_id/17472 is
> >> supported but see the exact same hangs after a few minutes ...
> >>
> >> what i see in dmesg is :
> >>
> >> [Mon Sep 27 11:20:03 2021] INFO: task kworker/u102:6:190092 blocked
> >> for more than 120 seconds.
> >> [Mon Sep 27 11:20:03 2021]       Not tainted 5.14-test #1
> >> [Mon Sep 27 11:20:03 2021] "echo 0 >
> >> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> [Mon Sep 27 11:20:03 2021] task:kworker/u102:6  state:D stack:    0
> >> pid:190092 ppid:     2 flags:0x00004000
> >> [Mon Sep 27 11:20:03 2021] Workqueue: btrfs-worker-high
> >> btrfs_work_helper [btrfs]
> >> [Mon Sep 27 11:20:03 2021] Call Trace:
> >> [Mon Sep 27 11:20:03 2021]  __schedule+0x2fa/0x910
> >> [Mon Sep 27 11:20:03 2021]  schedule+0x4f/0xc0
> >> [Mon Sep 27 11:20:03 2021]  io_schedule+0x46/0x70
> >> [Mon Sep 27 11:20:03 2021]  blk_mq_get_tag+0x11b/0x270
> >> [Mon Sep 27 11:20:03 2021]  ? wait_woken+0x80/0x80
> >> [Mon Sep 27 11:20:03 2021]  __blk_mq_alloc_request+0xec/0x120
> >> [Mon Sep 27 11:20:03 2021]  blk_mq_submit_bio+0x12f/0x580
> >> [Mon Sep 27 11:20:03 2021]  submit_bio_noacct+0x42a/0x4f0
> >> [Mon Sep 27 11:20:03 2021]  submit_bio+0x4f/0x1b0
> >> [Mon Sep 27 11:20:03 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
> >> [Mon Sep 27 11:20:03 2021]  run_one_async_done+0x3b/0x70 [btrfs]
> >> [Mon Sep 27 11:20:03 2021]  btrfs_work_helper+0x132/0x2e0 [btrfs]
> >> [Mon Sep 27 11:20:03 2021]  process_one_work+0x220/0x3c0
> >> [Mon Sep 27 11:20:03 2021]  worker_thread+0x53/0x420
> >> [Mon Sep 27 11:20:03 2021]  kthread+0x12f/0x150
> >> [Mon Sep 27 11:20:03 2021]  ? process_one_work+0x3c0/0x3c0
> >> [Mon Sep 27 11:20:03 2021]  ? __kthread_bind_mask+0x70/0x70
> >> [Mon Sep 27 11:20:03 2021]  ret_from_fork+0x22/0x30
> >>
> >> i will also downgrade the LSI adapter to FW 19, but i think this is
> >> unrelated to the FW as i can see this with 2 completely different
> >> HBA's and 2 completely different drives.
> >>
> >> Sven
> >>
> >> Sven
> >>
> >>
> >> On Sun, Sep 26, 2021 at 5:19 PM Damien Le Moal
> >> <damien.lemoal@opensource.wdc.com> wrote:
> >>>
> >>> On 2021/09/25 3:25, Sven Oehme wrote:
> >>>> Hi,
> >>>>
> >>>> i am running into issues with Host Managed SMR drive testing. when i
> >>>> try to copy or move a file to the btrfs filesystem it just hangs. i
> >>>> tried multiple 5.12,5.13 as well as 5.14 all the way to 5.14.6 but the
> >>>> issue still persists.
> >>>>
> >>>> here is the setup :
> >>>>
> >>>> I am using btrfs-progs-v5.14.1
> >>>> device is a  Host Managed WDC  20TB SMR drive with firmware level C421
> >>>> its connected via a HBA 9400-8i Tri-Mode Storage Adapter , latest 20.0 FW
> >>>
> >>> Beware of the Broadcom FW rev 20. We found problems with it: very slow zone
> >>> command scheduling leading to command timeout is some cases. FW 19 does not seem
> >>> to have this issue. But that is likely not the cause of the problem here.
> >>>
> >>> Is there anything of interest in dmesg ? Any IO errors ?
> >>>
> >>> Naohiro, Johannes,
> >>>
> >>> Any idea ?
> >>>
> >>>
> >>>
> >>>> I am using the /dev/sd device direct  , no lvm or device mapper or
> >>>> anything else in between
> >>>>
> >>>> after a few seconds, sometimes minutes data rate to the drive drops to
> >>>> 0 and 1 or 2 cores on my system show 100% IO wait time, but no longer
> >>>> make any progress
> >>>>
> >>>> the process in question has the following stack :
> >>>>
> >>>> [ 2168.589160] task:mv              state:D stack:    0 pid: 3814
> >>>> ppid:  3679 flags:0x00004000
> >>>> [ 2168.589162] Call Trace:
> >>>> [ 2168.589163]  __schedule+0x2fa/0x910
> >>>> [ 2168.589166]  schedule+0x4f/0xc0
> >>>> [ 2168.589168]  schedule_timeout+0x8a/0x140
> >>>> [ 2168.589171]  ? __bpf_trace_tick_stop+0x10/0x10
> >>>> [ 2168.589173]  io_schedule_timeout+0x51/0x80
> >>>> [ 2168.589176]  balance_dirty_pages+0x2fa/0xe30
> >>>> [ 2168.589179]  ? __mod_lruvec_state+0x3a/0x50
> >>>> [ 2168.589182]  balance_dirty_pages_ratelimited+0x2f9/0x3c0
> >>>> [ 2168.589185]  btrfs_buffered_write+0x58e/0x7e0 [btrfs]
> >>>> [ 2168.589226]  btrfs_file_write_iter+0x138/0x3e0 [btrfs]
> >>>> [ 2168.589260]  ? ext4_file_read_iter+0x5b/0x180
> >>>> [ 2168.589262]  new_sync_write+0x114/0x1a0
> >>>> [ 2168.589265]  vfs_write+0x1c5/0x260
> >>>> [ 2168.589267]  ksys_write+0x67/0xe0
> >>>> [ 2168.589270]  __x64_sys_write+0x1a/0x20
> >>>> [ 2168.589272]  do_syscall_64+0x40/0xb0
> >>>> [ 2168.589275]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>>> [ 2168.589277] RIP: 0033:0x7ffff7e91c27
> >>>> [ 2168.589278] RSP: 002b:00007fffffffdc48 EFLAGS: 00000246 ORIG_RAX:
> >>>> 0000000000000001
> >>>> [ 2168.589280] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007ffff7e91c27
> >>>> [ 2168.589281] RDX: 0000000000020000 RSI: 00007ffff79bd000 RDI: 0000000000000004
> >>>> [ 2168.589282] RBP: 00007ffff79bd000 R08: 0000000000000000 R09: 0000000000000000
> >>>> [ 2168.589283] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000004
> >>>> [ 2168.589284] R13: 0000000000000004 R14: 00007ffff79bd000 R15: 0000000000020000
> >>>>
> >>>> and shows up under runnable tasks :
> >>>>
> >>>> [ 2168.593562] runnable tasks:
> >>>> [ 2168.593562]  S            task   PID         tree-key  switches
> >>>> prio     wait-time             sum-exec        sum-sleep
> >>>> [ 2168.593563] -------------------------------------------------------------------------------------------------------------
> >>>> [ 2168.593565]  S        cpuhp/13    92     88923.802487        19
> >>>> 120         0.000000         0.292061         0.000000 2 0 /
> >>>> [ 2168.593571]  S  idle_inject/13    93       -11.997255         3
> >>>> 49         0.000000         0.005480         0.000000 2 0 /
> >>>> [ 2168.593577]  S    migration/13    94       814.287531       551
> >>>> 0         0.000000      1015.550514         0.000000 2 0 /
> >>>> [ 2168.593582]  S    ksoftirqd/13    95     88762.317130        44
> >>>> 120         0.000000         1.940252         0.000000 2 0 /
> >>>> [ 2168.593588]  I    kworker/13:0    96        -9.031157         5
> >>>> 120         0.000000         0.017423         0.000000 2 0 /
> >>>> [ 2168.593593]  I   kworker/13:0H    97      3570.961886         5
> >>>> 100         0.000000         0.034345         0.000000 2 0 /
> >>>> [ 2168.593603]  I    kworker/13:1   400    101650.731913       578
> >>>> 120         0.000000        10.110898         0.000000 2 0 /
> >>>> [ 2168.593611]  I   kworker/13:1H  1015    101649.600698        65
> >>>> 100         0.000000         1.443300         0.000000 2 0 /
> >>>> [ 2168.593618]  S           loop3  1994     99133.655903        70
> >>>> 100         0.000000         1.137468         0.000000 2 0 /
> >>>> [ 2168.593625]  S           snapd  3161        15.296181       166
> >>>> 120         0.000000        90.296991         0.000000 2 0
> >>>> /system.slice/snapd.service
> >>>> [ 2168.593631]  S           snapd  3198        10.047573        49
> >>>> 120         0.000000         5.646247         0.000000 2 0
> >>>> /system.slice/snapd.service
> >>>> [ 2168.593639]  S            java  2446       970.743682       301
> >>>> 120         0.000000       101.648659         0.000000 2 0
> >>>> /system.slice/stor_tomcat.service
> >>>> [ 2168.593645]  S C1 CompilerThre  2573      1033.157689      3636
> >>>> 120         0.000000       615.256247         0.000000 2 0
> >>>> /system.slice/stor_tomcat.service
> >>>> [ 2168.593654]  D              mv  3814      2263.816953    186734
> >>>> 120         0.000000     30087.917319         0.000000 2 0 /user.slice
> >>>>
> >>>> any idea what is going on and how to fix this ?
> >>>
> >>>
> >>>
> >>>>
> >>>> thx.
> >>>>
> >>>
> >>>
> >>> --
> >>> Damien Le Moal
> >>> Western Digital Research
>
>
> --
> Damien Le Moal
> Western Digital Research

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Host managed SMR drive issue
  2021-09-27 22:56         ` Sven Oehme
@ 2021-09-27 23:01           ` Damien Le Moal
  2021-09-27 23:17             ` Sven Oehme
  0 siblings, 1 reply; 23+ messages in thread
From: Damien Le Moal @ 2021-09-27 23:01 UTC (permalink / raw)
  To: Sven Oehme; +Cc: linux-btrfs, Naohiro Aota, Johannes Thumshirn

On 2021/09/28 7:56, Sven Oehme wrote:
> Hi,
> 
> i tried above  :
> 
> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatched
>        0        89
>        1        1462
>        2        1
>        4        574
>        8        0
>       16        0
>       32+       0
> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatch
> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatched
>        0        89
>        1        1462
>        2        1
>        4        574
>        8        0
>       16        0
>       32+       0
> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatch_busy
> 0

Do you have the queued count too ? If there is a difference with dispatch, it
would mean that some IOs are stuck in the stack.
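
For example, something along these lines should show both sides at once (a
quick sketch assuming the stuck disk is still sdr as in your earlier output;
adjust the device name as needed):

cd /sys/kernel/debug/block/sdr/hctx0
cat queued            # requests queued to this hw queue
cat dispatched        # histogram of requests handed to the driver per queue run
cat tags              # driver tag allocation state (the hang is in blk_mq_get_tag)
cat sched_tags        # scheduler tags, if an I/O scheduler is attached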

> 
> echo 1 > /sys/kernel/debug/block/sdr/hctx0/run
> 
> doesn't make any progress, still no i/o to the drive
> 
> Sven
> 
> 
> On Mon, Sep 27, 2021 at 4:48 PM Damien Le Moal
> <damien.lemoal@opensource.wdc.com> wrote:
>>
>> On 2021/09/28 7:38, Sven Oehme wrote:
>>> i tried to repeat the test with FW19, same result :
>>
>> The problem is likely not rooted with the HBA fw version.
>> How do you create the problem ? Is it an fio script you are running ?
>>
>>>
>>> [Mon Sep 27 15:41:22 2021] INFO: task btrfs-transacti:4206 blocked for
>>> more than 604 seconds.
>>> [Mon Sep 27 15:41:22 2021]       Not tainted 5.14-test #1
>>> [Mon Sep 27 15:41:22 2021] "echo 0 >
>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [Mon Sep 27 15:41:22 2021] task:btrfs-transacti state:D stack:    0
>>> pid: 4206 ppid:     2 flags:0x00004000
>>> [Mon Sep 27 15:41:22 2021] Call Trace:
>>> [Mon Sep 27 15:41:22 2021]  __schedule+0x2fa/0x910
>>> [Mon Sep 27 15:41:22 2021]  schedule+0x4f/0xc0
>>> [Mon Sep 27 15:41:22 2021]  io_schedule+0x46/0x70
>>> [Mon Sep 27 15:41:22 2021]  blk_mq_get_tag+0x11b/0x270
>>> [Mon Sep 27 15:41:22 2021]  ? wait_woken+0x80/0x80
>>> [Mon Sep 27 15:41:22 2021]  __blk_mq_alloc_request+0xec/0x120
>>> [Mon Sep 27 15:41:22 2021]  blk_mq_submit_bio+0x12f/0x580
>>> [Mon Sep 27 15:41:22 2021]  submit_bio_noacct+0x42a/0x4f0
>>> [Mon Sep 27 15:41:22 2021]  submit_bio+0x4f/0x1b0
>>> [Mon Sep 27 15:41:22 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
>>> [Mon Sep 27 15:41:22 2021]  btrfs_submit_metadata_bio+0x10f/0x170 [btrfs]
>>> [Mon Sep 27 15:41:22 2021]  submit_one_bio+0x67/0x80 [btrfs]
>>> [Mon Sep 27 15:41:22 2021]  btree_write_cache_pages+0x6e8/0x770 [btrfs]
>>> [Mon Sep 27 15:41:22 2021]  btree_writepages+0x5f/0x70 [btrfs]
>>> [Mon Sep 27 15:41:22 2021]  do_writepages+0x38/0xc0
>>> [Mon Sep 27 15:41:22 2021]  __filemap_fdatawrite_range+0xcc/0x110
>>> [Mon Sep 27 15:41:22 2021]  filemap_fdatawrite_range+0x13/0x20
>>> [Mon Sep 27 15:41:22 2021]  btrfs_write_marked_extents+0x66/0x140 [btrfs]
>>> [Mon Sep 27 15:41:22 2021]  btrfs_write_and_wait_transaction+0x49/0xd0 [btrfs]
>>> [Mon Sep 27 15:41:22 2021]  btrfs_commit_transaction+0x6b0/0xaa0 [btrfs]
>>> [Mon Sep 27 15:41:22 2021]  ? start_transaction+0xcf/0x5a0 [btrfs]
>>> [Mon Sep 27 15:41:22 2021]  transaction_kthread+0x138/0x1b0 [btrfs]
>>> [Mon Sep 27 15:41:22 2021]  kthread+0x12f/0x150
>>> [Mon Sep 27 15:41:22 2021]  ?
>>> btrfs_cleanup_transaction.isra.0+0x560/0x560 [btrfs]
>>> [Mon Sep 27 15:41:22 2021]  ? __kthread_bind_mask+0x70/0x70
>>> [Mon Sep 27 15:41:22 2021]  ret_from_fork+0x22/0x30
>>>
> >>> if you tell me what information to collect, I am happy to do so.
>>
>> The stack seems to point to a "hang" in the block layer so btrfs side waits forever.
>>
>> When you get the hang, can you check the queued and dispatch counters in
>> /sys/kernel/debug/block/<your disk>/hctx0 ?
>>
>> Once you have the numbers, try:
>>
>> echo 1 > /sys/kernel/debug/block/<your disk>/hctx0/run
>>
>> to see if the drive gets unstuck.
>>
>>>
>>> Sven
>>>
>>>
>>> On Mon, Sep 27, 2021 at 11:28 AM Sven Oehme <oehmes@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I also have an  Adaptec HBA 1100-4i at FW level 4.11 (latest) , which
>>>> according to https://ask.adaptec.com/app/answers/detail/a_id/17472 is
>>>> supported but see the exact same hangs after a few minutes ...
>>>>
>>>> what i see in dmesg is :
>>>>
>>>> [Mon Sep 27 11:20:03 2021] INFO: task kworker/u102:6:190092 blocked
>>>> for more than 120 seconds.
>>>> [Mon Sep 27 11:20:03 2021]       Not tainted 5.14-test #1
>>>> [Mon Sep 27 11:20:03 2021] "echo 0 >
>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> [Mon Sep 27 11:20:03 2021] task:kworker/u102:6  state:D stack:    0
>>>> pid:190092 ppid:     2 flags:0x00004000
>>>> [Mon Sep 27 11:20:03 2021] Workqueue: btrfs-worker-high
>>>> btrfs_work_helper [btrfs]
>>>> [Mon Sep 27 11:20:03 2021] Call Trace:
>>>> [Mon Sep 27 11:20:03 2021]  __schedule+0x2fa/0x910
>>>> [Mon Sep 27 11:20:03 2021]  schedule+0x4f/0xc0
>>>> [Mon Sep 27 11:20:03 2021]  io_schedule+0x46/0x70
>>>> [Mon Sep 27 11:20:03 2021]  blk_mq_get_tag+0x11b/0x270
>>>> [Mon Sep 27 11:20:03 2021]  ? wait_woken+0x80/0x80
>>>> [Mon Sep 27 11:20:03 2021]  __blk_mq_alloc_request+0xec/0x120
>>>> [Mon Sep 27 11:20:03 2021]  blk_mq_submit_bio+0x12f/0x580
>>>> [Mon Sep 27 11:20:03 2021]  submit_bio_noacct+0x42a/0x4f0
>>>> [Mon Sep 27 11:20:03 2021]  submit_bio+0x4f/0x1b0
>>>> [Mon Sep 27 11:20:03 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
>>>> [Mon Sep 27 11:20:03 2021]  run_one_async_done+0x3b/0x70 [btrfs]
>>>> [Mon Sep 27 11:20:03 2021]  btrfs_work_helper+0x132/0x2e0 [btrfs]
>>>> [Mon Sep 27 11:20:03 2021]  process_one_work+0x220/0x3c0
>>>> [Mon Sep 27 11:20:03 2021]  worker_thread+0x53/0x420
>>>> [Mon Sep 27 11:20:03 2021]  kthread+0x12f/0x150
>>>> [Mon Sep 27 11:20:03 2021]  ? process_one_work+0x3c0/0x3c0
>>>> [Mon Sep 27 11:20:03 2021]  ? __kthread_bind_mask+0x70/0x70
>>>> [Mon Sep 27 11:20:03 2021]  ret_from_fork+0x22/0x30
>>>>
>>>> i will also downgrade the LSI adapter to FW 19, but i think this is
>>>> unrelated to the FW as i can see this with 2 completely different
>>>> HBA's and 2 completely different drives.
>>>>
>>>> Sven
>>>>
>>>> Sven
>>>>
>>>>
>>>> On Sun, Sep 26, 2021 at 5:19 PM Damien Le Moal
>>>> <damien.lemoal@opensource.wdc.com> wrote:
>>>>>
>>>>> On 2021/09/25 3:25, Sven Oehme wrote:
>>>>>> Hi,
>>>>>>
>>>>>> i am running into issues with Host Managed SMR drive testing. when i
>>>>>> try to copy or move a file to the btrfs filesystem it just hangs. i
>>>>>> tried multiple 5.12,5.13 as well as 5.14 all the way to 5.14.6 but the
>>>>>> issue still persists.
>>>>>>
>>>>>> here is the setup :
>>>>>>
>>>>>> I am using btrfs-progs-v5.14.1
>>>>>> device is a  Host Managed WDC  20TB SMR drive with firmware level C421
>>>>>> its connected via a HBA 9400-8i Tri-Mode Storage Adapter , latest 20.0 FW
>>>>>
>>>>> Beware of the Broadcom FW rev 20. We found problems with it: very slow zone
>>>>> command scheduling leading to command timeout in some cases. FW 19 does not seem
>>>>> to have this issue. But that is likely not the cause of the problem here.
>>>>>
>>>>> Is there anything of interest in dmesg ? Any IO errors ?
>>>>>
>>>>> Naohiro, Johannes,
>>>>>
>>>>> Any idea ?
>>>>>
>>>>>
>>>>>
>>>>>> I am using the /dev/sd device direct  , no lvm or device mapper or
>>>>>> anything else in between
>>>>>>
>>>>>> after a few seconds, sometimes minutes data rate to the drive drops to
>>>>>> 0 and 1 or 2 cores on my system show 100% IO wait time, but no longer
>>>>>> make any progress
>>>>>>
>>>>>> the process in question has the following stack :
>>>>>>
>>>>>> [ 2168.589160] task:mv              state:D stack:    0 pid: 3814
>>>>>> ppid:  3679 flags:0x00004000
>>>>>> [ 2168.589162] Call Trace:
>>>>>> [ 2168.589163]  __schedule+0x2fa/0x910
>>>>>> [ 2168.589166]  schedule+0x4f/0xc0
>>>>>> [ 2168.589168]  schedule_timeout+0x8a/0x140
>>>>>> [ 2168.589171]  ? __bpf_trace_tick_stop+0x10/0x10
>>>>>> [ 2168.589173]  io_schedule_timeout+0x51/0x80
>>>>>> [ 2168.589176]  balance_dirty_pages+0x2fa/0xe30
>>>>>> [ 2168.589179]  ? __mod_lruvec_state+0x3a/0x50
>>>>>> [ 2168.589182]  balance_dirty_pages_ratelimited+0x2f9/0x3c0
>>>>>> [ 2168.589185]  btrfs_buffered_write+0x58e/0x7e0 [btrfs]
>>>>>> [ 2168.589226]  btrfs_file_write_iter+0x138/0x3e0 [btrfs]
>>>>>> [ 2168.589260]  ? ext4_file_read_iter+0x5b/0x180
>>>>>> [ 2168.589262]  new_sync_write+0x114/0x1a0
>>>>>> [ 2168.589265]  vfs_write+0x1c5/0x260
>>>>>> [ 2168.589267]  ksys_write+0x67/0xe0
>>>>>> [ 2168.589270]  __x64_sys_write+0x1a/0x20
>>>>>> [ 2168.589272]  do_syscall_64+0x40/0xb0
>>>>>> [ 2168.589275]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>>>> [ 2168.589277] RIP: 0033:0x7ffff7e91c27
>>>>>> [ 2168.589278] RSP: 002b:00007fffffffdc48 EFLAGS: 00000246 ORIG_RAX:
>>>>>> 0000000000000001
>>>>>> [ 2168.589280] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007ffff7e91c27
>>>>>> [ 2168.589281] RDX: 0000000000020000 RSI: 00007ffff79bd000 RDI: 0000000000000004
>>>>>> [ 2168.589282] RBP: 00007ffff79bd000 R08: 0000000000000000 R09: 0000000000000000
>>>>>> [ 2168.589283] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000004
>>>>>> [ 2168.589284] R13: 0000000000000004 R14: 00007ffff79bd000 R15: 0000000000020000
>>>>>>
>>>>>> and shows up under runnable tasks :
>>>>>>
>>>>>> [ 2168.593562] runnable tasks:
>>>>>> [ 2168.593562]  S            task   PID         tree-key  switches
>>>>>> prio     wait-time             sum-exec        sum-sleep
>>>>>> [ 2168.593563] -------------------------------------------------------------------------------------------------------------
>>>>>> [ 2168.593565]  S        cpuhp/13    92     88923.802487        19
>>>>>> 120         0.000000         0.292061         0.000000 2 0 /
>>>>>> [ 2168.593571]  S  idle_inject/13    93       -11.997255         3
>>>>>> 49         0.000000         0.005480         0.000000 2 0 /
>>>>>> [ 2168.593577]  S    migration/13    94       814.287531       551
>>>>>> 0         0.000000      1015.550514         0.000000 2 0 /
>>>>>> [ 2168.593582]  S    ksoftirqd/13    95     88762.317130        44
>>>>>> 120         0.000000         1.940252         0.000000 2 0 /
>>>>>> [ 2168.593588]  I    kworker/13:0    96        -9.031157         5
>>>>>> 120         0.000000         0.017423         0.000000 2 0 /
>>>>>> [ 2168.593593]  I   kworker/13:0H    97      3570.961886         5
>>>>>> 100         0.000000         0.034345         0.000000 2 0 /
>>>>>> [ 2168.593603]  I    kworker/13:1   400    101650.731913       578
>>>>>> 120         0.000000        10.110898         0.000000 2 0 /
>>>>>> [ 2168.593611]  I   kworker/13:1H  1015    101649.600698        65
>>>>>> 100         0.000000         1.443300         0.000000 2 0 /
>>>>>> [ 2168.593618]  S           loop3  1994     99133.655903        70
>>>>>> 100         0.000000         1.137468         0.000000 2 0 /
>>>>>> [ 2168.593625]  S           snapd  3161        15.296181       166
>>>>>> 120         0.000000        90.296991         0.000000 2 0
>>>>>> /system.slice/snapd.service
>>>>>> [ 2168.593631]  S           snapd  3198        10.047573        49
>>>>>> 120         0.000000         5.646247         0.000000 2 0
>>>>>> /system.slice/snapd.service
>>>>>> [ 2168.593639]  S            java  2446       970.743682       301
>>>>>> 120         0.000000       101.648659         0.000000 2 0
>>>>>> /system.slice/stor_tomcat.service
>>>>>> [ 2168.593645]  S C1 CompilerThre  2573      1033.157689      3636
>>>>>> 120         0.000000       615.256247         0.000000 2 0
>>>>>> /system.slice/stor_tomcat.service
>>>>>> [ 2168.593654]  D              mv  3814      2263.816953    186734
>>>>>> 120         0.000000     30087.917319         0.000000 2 0 /user.slice
>>>>>>
>>>>>> any idea what is going on and how to fix this ?
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> thx.
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Damien Le Moal
>>>>> Western Digital Research
>>
>>
>> --
>> Damien Le Moal
>> Western Digital Research


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Host managed SMR drive issue
  2021-09-27 23:01           ` Damien Le Moal
@ 2021-09-27 23:17             ` Sven Oehme
  2021-09-27 23:31               ` Damien Le Moal
  0 siblings, 1 reply; 23+ messages in thread
From: Sven Oehme @ 2021-09-27 23:17 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-btrfs, Naohiro Aota, Johannes Thumshirn

yes i do :

root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/queued
3760

On Mon, Sep 27, 2021 at 5:01 PM Damien Le Moal
<damien.lemoal@opensource.wdc.com> wrote:
>
> On 2021/09/28 7:56, Sven Oehme wrote:
> > Hi,
> >
> > i tried above  :
> >
> > root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatched
> >        0        89
> >        1        1462
> >        2        1
> >        4        574
> >        8        0
> >       16        0
> >       32+       0
> > root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatch
> > root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatched
> >        0        89
> >        1        1462
> >        2        1
> >        4        574
> >        8        0
> >       16        0
> >       32+       0
> > root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatch_busy
> > 0
>
> Do you have the queued count too ? If there is a difference with dispatch, it
> would mean that some IOs are stuck in the stack.
>
> >
> > echo 1 > /sys/kernel/debug/block/sdr/hctx0/run
> >
> > doesn't make any progress, still no i/o to the drive
> >
> > Sven
> >
> >
> > On Mon, Sep 27, 2021 at 4:48 PM Damien Le Moal
> > <damien.lemoal@opensource.wdc.com> wrote:
> >>
> >> On 2021/09/28 7:38, Sven Oehme wrote:
> >>> i tried to repeat the test with FW19, same result :
> >>
> >> The problem is likely not rooted with the HBA fw version.
> >> How do you create the problem ? Is it an fio script you are running ?
> >>
> >>>
> >>> [Mon Sep 27 15:41:22 2021] INFO: task btrfs-transacti:4206 blocked for
> >>> more than 604 seconds.
> >>> [Mon Sep 27 15:41:22 2021]       Not tainted 5.14-test #1
> >>> [Mon Sep 27 15:41:22 2021] "echo 0 >
> >>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>> [Mon Sep 27 15:41:22 2021] task:btrfs-transacti state:D stack:    0
> >>> pid: 4206 ppid:     2 flags:0x00004000
> >>> [Mon Sep 27 15:41:22 2021] Call Trace:
> >>> [Mon Sep 27 15:41:22 2021]  __schedule+0x2fa/0x910
> >>> [Mon Sep 27 15:41:22 2021]  schedule+0x4f/0xc0
> >>> [Mon Sep 27 15:41:22 2021]  io_schedule+0x46/0x70
> >>> [Mon Sep 27 15:41:22 2021]  blk_mq_get_tag+0x11b/0x270
> >>> [Mon Sep 27 15:41:22 2021]  ? wait_woken+0x80/0x80
> >>> [Mon Sep 27 15:41:22 2021]  __blk_mq_alloc_request+0xec/0x120
> >>> [Mon Sep 27 15:41:22 2021]  blk_mq_submit_bio+0x12f/0x580
> >>> [Mon Sep 27 15:41:22 2021]  submit_bio_noacct+0x42a/0x4f0
> >>> [Mon Sep 27 15:41:22 2021]  submit_bio+0x4f/0x1b0
> >>> [Mon Sep 27 15:41:22 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
> >>> [Mon Sep 27 15:41:22 2021]  btrfs_submit_metadata_bio+0x10f/0x170 [btrfs]
> >>> [Mon Sep 27 15:41:22 2021]  submit_one_bio+0x67/0x80 [btrfs]
> >>> [Mon Sep 27 15:41:22 2021]  btree_write_cache_pages+0x6e8/0x770 [btrfs]
> >>> [Mon Sep 27 15:41:22 2021]  btree_writepages+0x5f/0x70 [btrfs]
> >>> [Mon Sep 27 15:41:22 2021]  do_writepages+0x38/0xc0
> >>> [Mon Sep 27 15:41:22 2021]  __filemap_fdatawrite_range+0xcc/0x110
> >>> [Mon Sep 27 15:41:22 2021]  filemap_fdatawrite_range+0x13/0x20
> >>> [Mon Sep 27 15:41:22 2021]  btrfs_write_marked_extents+0x66/0x140 [btrfs]
> >>> [Mon Sep 27 15:41:22 2021]  btrfs_write_and_wait_transaction+0x49/0xd0 [btrfs]
> >>> [Mon Sep 27 15:41:22 2021]  btrfs_commit_transaction+0x6b0/0xaa0 [btrfs]
> >>> [Mon Sep 27 15:41:22 2021]  ? start_transaction+0xcf/0x5a0 [btrfs]
> >>> [Mon Sep 27 15:41:22 2021]  transaction_kthread+0x138/0x1b0 [btrfs]
> >>> [Mon Sep 27 15:41:22 2021]  kthread+0x12f/0x150
> >>> [Mon Sep 27 15:41:22 2021]  ?
> >>> btrfs_cleanup_transaction.isra.0+0x560/0x560 [btrfs]
> >>> [Mon Sep 27 15:41:22 2021]  ? __kthread_bind_mask+0x70/0x70
> >>> [Mon Sep 27 15:41:22 2021]  ret_from_fork+0x22/0x30
> >>>
> >>> if you tell me what information to collect, I am happy to do so.
> >>
> >> The stack seems to point to a "hang" in the block layer so btrfs side waits forever.
> >>
> >> When you get the hang, can you check the queued and dispatch counters in
> >> /sys/kernel/debug/block/<your disk>/hctx0 ?
> >>
> >> Once you have the numbers, try:
> >>
> >> echo 1 > /sys/kernel/debug/block/<your disk>/hctx0/run
> >>
> >> to see if the drive gets unstuck.
> >>
> >>>
> >>> Sven
> >>>
> >>>
> >>> On Mon, Sep 27, 2021 at 11:28 AM Sven Oehme <oehmes@gmail.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I also have an  Adaptec HBA 1100-4i at FW level 4.11 (latest) , which
> >>>> according to https://ask.adaptec.com/app/answers/detail/a_id/17472 is
> >>>> supported but see the exact same hangs after a few minutes ...
> >>>>
> >>>> what i see in dmesg is :
> >>>>
> >>>> [Mon Sep 27 11:20:03 2021] INFO: task kworker/u102:6:190092 blocked
> >>>> for more than 120 seconds.
> >>>> [Mon Sep 27 11:20:03 2021]       Not tainted 5.14-test #1
> >>>> [Mon Sep 27 11:20:03 2021] "echo 0 >
> >>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>>> [Mon Sep 27 11:20:03 2021] task:kworker/u102:6  state:D stack:    0
> >>>> pid:190092 ppid:     2 flags:0x00004000
> >>>> [Mon Sep 27 11:20:03 2021] Workqueue: btrfs-worker-high
> >>>> btrfs_work_helper [btrfs]
> >>>> [Mon Sep 27 11:20:03 2021] Call Trace:
> >>>> [Mon Sep 27 11:20:03 2021]  __schedule+0x2fa/0x910
> >>>> [Mon Sep 27 11:20:03 2021]  schedule+0x4f/0xc0
> >>>> [Mon Sep 27 11:20:03 2021]  io_schedule+0x46/0x70
> >>>> [Mon Sep 27 11:20:03 2021]  blk_mq_get_tag+0x11b/0x270
> >>>> [Mon Sep 27 11:20:03 2021]  ? wait_woken+0x80/0x80
> >>>> [Mon Sep 27 11:20:03 2021]  __blk_mq_alloc_request+0xec/0x120
> >>>> [Mon Sep 27 11:20:03 2021]  blk_mq_submit_bio+0x12f/0x580
> >>>> [Mon Sep 27 11:20:03 2021]  submit_bio_noacct+0x42a/0x4f0
> >>>> [Mon Sep 27 11:20:03 2021]  submit_bio+0x4f/0x1b0
> >>>> [Mon Sep 27 11:20:03 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
> >>>> [Mon Sep 27 11:20:03 2021]  run_one_async_done+0x3b/0x70 [btrfs]
> >>>> [Mon Sep 27 11:20:03 2021]  btrfs_work_helper+0x132/0x2e0 [btrfs]
> >>>> [Mon Sep 27 11:20:03 2021]  process_one_work+0x220/0x3c0
> >>>> [Mon Sep 27 11:20:03 2021]  worker_thread+0x53/0x420
> >>>> [Mon Sep 27 11:20:03 2021]  kthread+0x12f/0x150
> >>>> [Mon Sep 27 11:20:03 2021]  ? process_one_work+0x3c0/0x3c0
> >>>> [Mon Sep 27 11:20:03 2021]  ? __kthread_bind_mask+0x70/0x70
> >>>> [Mon Sep 27 11:20:03 2021]  ret_from_fork+0x22/0x30
> >>>>
> >>>> i will also downgrade the LSI adapter to FW 19, but i think this is
> >>>> unrelated to the FW as i can see this with 2 completely different
> >>>> HBA's and 2 completely different drives.
> >>>>
> >>>> Sven
> >>>>
> >>>> Sven
> >>>>
> >>>>
> >>>> On Sun, Sep 26, 2021 at 5:19 PM Damien Le Moal
> >>>> <damien.lemoal@opensource.wdc.com> wrote:
> >>>>>
> >>>>> On 2021/09/25 3:25, Sven Oehme wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> i am running into issues with Host Managed SMR drive testing. when i
> >>>>>> try to copy or move a file to the btrfs filesystem it just hangs. i
> >>>>>> tried multiple 5.12,5.13 as well as 5.14 all the way to 5.14.6 but the
> >>>>>> issue still persists.
> >>>>>>
> >>>>>> here is the setup :
> >>>>>>
> >>>>>> I am using btrfs-progs-v5.14.1
> >>>>>> device is a  Host Managed WDC  20TB SMR drive with firmware level C421
> >>>>>> its connected via a HBA 9400-8i Tri-Mode Storage Adapter , latest 20.0 FW
> >>>>>
> >>>>> Beware of the Broadcom FW rev 20. We found problems with it: very slow zone
> >>>>> command scheduling leading to command timeout in some cases. FW 19 does not seem
> >>>>> to have this issue. But that is likely not the cause of the problem here.
> >>>>>
> >>>>> Is there anything of interest in dmesg ? Any IO errors ?
> >>>>>
> >>>>> Naohiro, Johannes,
> >>>>>
> >>>>> Any idea ?
> >>>>>
> >>>>>
> >>>>>
> >>>>>> I am using the /dev/sd device direct  , no lvm or device mapper or
> >>>>>> anything else in between
> >>>>>>
> >>>>>> after a few seconds, sometimes minutes data rate to the drive drops to
> >>>>>> 0 and 1 or 2 cores on my system show 100% IO wait time, but no longer
> >>>>>> make any progress
> >>>>>>
> >>>>>> the process in question has the following stack :
> >>>>>>
> >>>>>> [ 2168.589160] task:mv              state:D stack:    0 pid: 3814
> >>>>>> ppid:  3679 flags:0x00004000
> >>>>>> [ 2168.589162] Call Trace:
> >>>>>> [ 2168.589163]  __schedule+0x2fa/0x910
> >>>>>> [ 2168.589166]  schedule+0x4f/0xc0
> >>>>>> [ 2168.589168]  schedule_timeout+0x8a/0x140
> >>>>>> [ 2168.589171]  ? __bpf_trace_tick_stop+0x10/0x10
> >>>>>> [ 2168.589173]  io_schedule_timeout+0x51/0x80
> >>>>>> [ 2168.589176]  balance_dirty_pages+0x2fa/0xe30
> >>>>>> [ 2168.589179]  ? __mod_lruvec_state+0x3a/0x50
> >>>>>> [ 2168.589182]  balance_dirty_pages_ratelimited+0x2f9/0x3c0
> >>>>>> [ 2168.589185]  btrfs_buffered_write+0x58e/0x7e0 [btrfs]
> >>>>>> [ 2168.589226]  btrfs_file_write_iter+0x138/0x3e0 [btrfs]
> >>>>>> [ 2168.589260]  ? ext4_file_read_iter+0x5b/0x180
> >>>>>> [ 2168.589262]  new_sync_write+0x114/0x1a0
> >>>>>> [ 2168.589265]  vfs_write+0x1c5/0x260
> >>>>>> [ 2168.589267]  ksys_write+0x67/0xe0
> >>>>>> [ 2168.589270]  __x64_sys_write+0x1a/0x20
> >>>>>> [ 2168.589272]  do_syscall_64+0x40/0xb0
> >>>>>> [ 2168.589275]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>>>>> [ 2168.589277] RIP: 0033:0x7ffff7e91c27
> >>>>>> [ 2168.589278] RSP: 002b:00007fffffffdc48 EFLAGS: 00000246 ORIG_RAX:
> >>>>>> 0000000000000001
> >>>>>> [ 2168.589280] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007ffff7e91c27
> >>>>>> [ 2168.589281] RDX: 0000000000020000 RSI: 00007ffff79bd000 RDI: 0000000000000004
> >>>>>> [ 2168.589282] RBP: 00007ffff79bd000 R08: 0000000000000000 R09: 0000000000000000
> >>>>>> [ 2168.589283] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000004
> >>>>>> [ 2168.589284] R13: 0000000000000004 R14: 00007ffff79bd000 R15: 0000000000020000
> >>>>>>
> >>>>>> and shows up under runnable tasks :
> >>>>>>
> >>>>>> [ 2168.593562] runnable tasks:
> >>>>>> [ 2168.593562]  S            task   PID         tree-key  switches
> >>>>>> prio     wait-time             sum-exec        sum-sleep
> >>>>>> [ 2168.593563] -------------------------------------------------------------------------------------------------------------
> >>>>>> [ 2168.593565]  S        cpuhp/13    92     88923.802487        19
> >>>>>> 120         0.000000         0.292061         0.000000 2 0 /
> >>>>>> [ 2168.593571]  S  idle_inject/13    93       -11.997255         3
> >>>>>> 49         0.000000         0.005480         0.000000 2 0 /
> >>>>>> [ 2168.593577]  S    migration/13    94       814.287531       551
> >>>>>> 0         0.000000      1015.550514         0.000000 2 0 /
> >>>>>> [ 2168.593582]  S    ksoftirqd/13    95     88762.317130        44
> >>>>>> 120         0.000000         1.940252         0.000000 2 0 /
> >>>>>> [ 2168.593588]  I    kworker/13:0    96        -9.031157         5
> >>>>>> 120         0.000000         0.017423         0.000000 2 0 /
> >>>>>> [ 2168.593593]  I   kworker/13:0H    97      3570.961886         5
> >>>>>> 100         0.000000         0.034345         0.000000 2 0 /
> >>>>>> [ 2168.593603]  I    kworker/13:1   400    101650.731913       578
> >>>>>> 120         0.000000        10.110898         0.000000 2 0 /
> >>>>>> [ 2168.593611]  I   kworker/13:1H  1015    101649.600698        65
> >>>>>> 100         0.000000         1.443300         0.000000 2 0 /
> >>>>>> [ 2168.593618]  S           loop3  1994     99133.655903        70
> >>>>>> 100         0.000000         1.137468         0.000000 2 0 /
> >>>>>> [ 2168.593625]  S           snapd  3161        15.296181       166
> >>>>>> 120         0.000000        90.296991         0.000000 2 0
> >>>>>> /system.slice/snapd.service
> >>>>>> [ 2168.593631]  S           snapd  3198        10.047573        49
> >>>>>> 120         0.000000         5.646247         0.000000 2 0
> >>>>>> /system.slice/snapd.service
> >>>>>> [ 2168.593639]  S            java  2446       970.743682       301
> >>>>>> 120         0.000000       101.648659         0.000000 2 0
> >>>>>> /system.slice/stor_tomcat.service
> >>>>>> [ 2168.593645]  S C1 CompilerThre  2573      1033.157689      3636
> >>>>>> 120         0.000000       615.256247         0.000000 2 0
> >>>>>> /system.slice/stor_tomcat.service
> >>>>>> [ 2168.593654]  D              mv  3814      2263.816953    186734
> >>>>>> 120         0.000000     30087.917319         0.000000 2 0 /user.slice
> >>>>>>
> >>>>>> any idea what is going on and how to fix this ?
> >>>>>
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> thx.
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Damien Le Moal
> >>>>> Western Digital Research
> >>
> >>
> >> --
> >> Damien Le Moal
> >> Western Digital Research
>
>
> --
> Damien Le Moal
> Western Digital Research

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Host managed SMR drive issue
  2021-09-27 23:17             ` Sven Oehme
@ 2021-09-27 23:31               ` Damien Le Moal
  2021-09-27 23:34                 ` Sven Oehme
  0 siblings, 1 reply; 23+ messages in thread
From: Damien Le Moal @ 2021-09-27 23:31 UTC (permalink / raw)
  To: Sven Oehme; +Cc: linux-btrfs, Naohiro Aota, Johannes Thumshirn

On 2021/09/28 8:17, Sven Oehme wrote:
> yes i do :
> 
> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/queued
> 3760

Can you give more information on the workload ? How can we recreate this ?

> 
> On Mon, Sep 27, 2021 at 5:01 PM Damien Le Moal
> <damien.lemoal@opensource.wdc.com> wrote:
>>
>> On 2021/09/28 7:56, Sven Oehme wrote:
>>> Hi,
>>>
>>> i tried above  :
>>>
>>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatched
>>>        0        89
>>>        1        1462
>>>        2        1
>>>        4        574
>>>        8        0
>>>       16        0
>>>       32+       0
>>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatch
>>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatched
>>>        0        89
>>>        1        1462
>>>        2        1
>>>        4        574
>>>        8        0
>>>       16        0
>>>       32+       0
>>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatch_busy
>>> 0
>>
>> Do you have the queued count too ? If there is a difference with dispatch, it
>> would mean that some IOs are stuck in the stack.
>>
>>>
>>> echo 1 > /sys/kernel/debug/block/sdr/hctx0/run
>>>
>>> doesn't make any progress, still no i/o to the drive
>>>
>>> Sven
>>>
>>>
>>> On Mon, Sep 27, 2021 at 4:48 PM Damien Le Moal
>>> <damien.lemoal@opensource.wdc.com> wrote:
>>>>
>>>> On 2021/09/28 7:38, Sven Oehme wrote:
>>>>> i tried to repeat the test with FW19, same result :
>>>>
>>>> The problem is likely not rooted with the HBA fw version.
>>>> How do you create the problem ? Is it an fio script you are running ?
>>>>
>>>>>
>>>>> [Mon Sep 27 15:41:22 2021] INFO: task btrfs-transacti:4206 blocked for
>>>>> more than 604 seconds.
>>>>> [Mon Sep 27 15:41:22 2021]       Not tainted 5.14-test #1
>>>>> [Mon Sep 27 15:41:22 2021] "echo 0 >
>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> [Mon Sep 27 15:41:22 2021] task:btrfs-transacti state:D stack:    0
>>>>> pid: 4206 ppid:     2 flags:0x00004000
>>>>> [Mon Sep 27 15:41:22 2021] Call Trace:
>>>>> [Mon Sep 27 15:41:22 2021]  __schedule+0x2fa/0x910
>>>>> [Mon Sep 27 15:41:22 2021]  schedule+0x4f/0xc0
>>>>> [Mon Sep 27 15:41:22 2021]  io_schedule+0x46/0x70
>>>>> [Mon Sep 27 15:41:22 2021]  blk_mq_get_tag+0x11b/0x270
>>>>> [Mon Sep 27 15:41:22 2021]  ? wait_woken+0x80/0x80
>>>>> [Mon Sep 27 15:41:22 2021]  __blk_mq_alloc_request+0xec/0x120
>>>>> [Mon Sep 27 15:41:22 2021]  blk_mq_submit_bio+0x12f/0x580
>>>>> [Mon Sep 27 15:41:22 2021]  submit_bio_noacct+0x42a/0x4f0
>>>>> [Mon Sep 27 15:41:22 2021]  submit_bio+0x4f/0x1b0
>>>>> [Mon Sep 27 15:41:22 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
>>>>> [Mon Sep 27 15:41:22 2021]  btrfs_submit_metadata_bio+0x10f/0x170 [btrfs]
>>>>> [Mon Sep 27 15:41:22 2021]  submit_one_bio+0x67/0x80 [btrfs]
>>>>> [Mon Sep 27 15:41:22 2021]  btree_write_cache_pages+0x6e8/0x770 [btrfs]
>>>>> [Mon Sep 27 15:41:22 2021]  btree_writepages+0x5f/0x70 [btrfs]
>>>>> [Mon Sep 27 15:41:22 2021]  do_writepages+0x38/0xc0
>>>>> [Mon Sep 27 15:41:22 2021]  __filemap_fdatawrite_range+0xcc/0x110
>>>>> [Mon Sep 27 15:41:22 2021]  filemap_fdatawrite_range+0x13/0x20
>>>>> [Mon Sep 27 15:41:22 2021]  btrfs_write_marked_extents+0x66/0x140 [btrfs]
>>>>> [Mon Sep 27 15:41:22 2021]  btrfs_write_and_wait_transaction+0x49/0xd0 [btrfs]
>>>>> [Mon Sep 27 15:41:22 2021]  btrfs_commit_transaction+0x6b0/0xaa0 [btrfs]
>>>>> [Mon Sep 27 15:41:22 2021]  ? start_transaction+0xcf/0x5a0 [btrfs]
>>>>> [Mon Sep 27 15:41:22 2021]  transaction_kthread+0x138/0x1b0 [btrfs]
>>>>> [Mon Sep 27 15:41:22 2021]  kthread+0x12f/0x150
>>>>> [Mon Sep 27 15:41:22 2021]  ?
>>>>> btrfs_cleanup_transaction.isra.0+0x560/0x560 [btrfs]
>>>>> [Mon Sep 27 15:41:22 2021]  ? __kthread_bind_mask+0x70/0x70
>>>>> [Mon Sep 27 15:41:22 2021]  ret_from_fork+0x22/0x30
>>>>>
>>>>> if you tell me what information to collect, I am happy to do so.
>>>>
>>>> The stack seems to point to a "hang" in the block layer so btrfs side waits forever.
>>>>
>>>> When you get the hang, can you check the queued and dispatch counters in
>>>> /sys/kernel/debug/block/<your disk>/hctx0 ?
>>>>
>>>> Once you have the numbers, try:
>>>>
>>>> echo 1 > /sys/kernel/debug/block/<your disk>/hctx0/run
>>>>
>>>> to see if the drive gets unstuck.
>>>>
>>>>>
>>>>> Sven
>>>>>
>>>>>
>>>>> On Mon, Sep 27, 2021 at 11:28 AM Sven Oehme <oehmes@gmail.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I also have an  Adaptec HBA 1100-4i at FW level 4.11 (latest) , which
>>>>>> according to https://ask.adaptec.com/app/answers/detail/a_id/17472 is
>>>>>> supported but see the exact same hangs after a few minutes ...
>>>>>>
>>>>>> what i see in dmesg is :
>>>>>>
>>>>>> [Mon Sep 27 11:20:03 2021] INFO: task kworker/u102:6:190092 blocked
>>>>>> for more than 120 seconds.
>>>>>> [Mon Sep 27 11:20:03 2021]       Not tainted 5.14-test #1
>>>>>> [Mon Sep 27 11:20:03 2021] "echo 0 >
>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>> [Mon Sep 27 11:20:03 2021] task:kworker/u102:6  state:D stack:    0
>>>>>> pid:190092 ppid:     2 flags:0x00004000
>>>>>> [Mon Sep 27 11:20:03 2021] Workqueue: btrfs-worker-high
>>>>>> btrfs_work_helper [btrfs]
>>>>>> [Mon Sep 27 11:20:03 2021] Call Trace:
>>>>>> [Mon Sep 27 11:20:03 2021]  __schedule+0x2fa/0x910
>>>>>> [Mon Sep 27 11:20:03 2021]  schedule+0x4f/0xc0
>>>>>> [Mon Sep 27 11:20:03 2021]  io_schedule+0x46/0x70
>>>>>> [Mon Sep 27 11:20:03 2021]  blk_mq_get_tag+0x11b/0x270
>>>>>> [Mon Sep 27 11:20:03 2021]  ? wait_woken+0x80/0x80
>>>>>> [Mon Sep 27 11:20:03 2021]  __blk_mq_alloc_request+0xec/0x120
>>>>>> [Mon Sep 27 11:20:03 2021]  blk_mq_submit_bio+0x12f/0x580
>>>>>> [Mon Sep 27 11:20:03 2021]  submit_bio_noacct+0x42a/0x4f0
>>>>>> [Mon Sep 27 11:20:03 2021]  submit_bio+0x4f/0x1b0
>>>>>> [Mon Sep 27 11:20:03 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
>>>>>> [Mon Sep 27 11:20:03 2021]  run_one_async_done+0x3b/0x70 [btrfs]
>>>>>> [Mon Sep 27 11:20:03 2021]  btrfs_work_helper+0x132/0x2e0 [btrfs]
>>>>>> [Mon Sep 27 11:20:03 2021]  process_one_work+0x220/0x3c0
>>>>>> [Mon Sep 27 11:20:03 2021]  worker_thread+0x53/0x420
>>>>>> [Mon Sep 27 11:20:03 2021]  kthread+0x12f/0x150
>>>>>> [Mon Sep 27 11:20:03 2021]  ? process_one_work+0x3c0/0x3c0
>>>>>> [Mon Sep 27 11:20:03 2021]  ? __kthread_bind_mask+0x70/0x70
>>>>>> [Mon Sep 27 11:20:03 2021]  ret_from_fork+0x22/0x30
>>>>>>
>>>>>> i will also downgrade the LSI adapter to FW 19, but i think this is
>>>>>> unrelated to the FW as i can see this with 2 completely different
>>>>>> HBA's and 2 completely different drives.
>>>>>>
>>>>>> Sven
>>>>>>
>>>>>> Sven
>>>>>>
>>>>>>
>>>>>> On Sun, Sep 26, 2021 at 5:19 PM Damien Le Moal
>>>>>> <damien.lemoal@opensource.wdc.com> wrote:
>>>>>>>
>>>>>>> On 2021/09/25 3:25, Sven Oehme wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> i am running into issues with Host Managed SMR drive testing. when i
>>>>>>>> try to copy or move a file to the btrfs filesystem it just hangs. i
>>>>>>>> tried multiple 5.12,5.13 as well as 5.14 all the way to 5.14.6 but the
>>>>>>>> issue still persists.
>>>>>>>>
>>>>>>>> here is the setup :
>>>>>>>>
>>>>>>>> I am using btrfs-progs-v5.14.1
>>>>>>>> device is a  Host Managed WDC  20TB SMR drive with firmware level C421
>>>>>>>> its connected via a HBA 9400-8i Tri-Mode Storage Adapter , latest 20.0 FW
>>>>>>>
>>>>>>> Beware of the Broadcom FW rev 20. We found problems with it: very slow zone
>>>>>>> command scheduling leading to command timeout in some cases. FW 19 does not seem
>>>>>>> to have this issue. But that is likely not the cause of the problem here.
>>>>>>>
>>>>>>> Is there anything of interest in dmesg ? Any IO errors ?
>>>>>>>
>>>>>>> Naohiro, Johannes,
>>>>>>>
>>>>>>> Any idea ?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> I am using the /dev/sd device direct  , no lvm or device mapper or
>>>>>>>> anything else in between
>>>>>>>>
>>>>>>>> after a few seconds, sometimes minutes data rate to the drive drops to
>>>>>>>> 0 and 1 or 2 cores on my system show 100% IO wait time, but no longer
>>>>>>>> make any progress
>>>>>>>>
>>>>>>>> the process in question has the following stack :
>>>>>>>>
>>>>>>>> [ 2168.589160] task:mv              state:D stack:    0 pid: 3814
>>>>>>>> ppid:  3679 flags:0x00004000
>>>>>>>> [ 2168.589162] Call Trace:
>>>>>>>> [ 2168.589163]  __schedule+0x2fa/0x910
>>>>>>>> [ 2168.589166]  schedule+0x4f/0xc0
>>>>>>>> [ 2168.589168]  schedule_timeout+0x8a/0x140
>>>>>>>> [ 2168.589171]  ? __bpf_trace_tick_stop+0x10/0x10
>>>>>>>> [ 2168.589173]  io_schedule_timeout+0x51/0x80
>>>>>>>> [ 2168.589176]  balance_dirty_pages+0x2fa/0xe30
>>>>>>>> [ 2168.589179]  ? __mod_lruvec_state+0x3a/0x50
>>>>>>>> [ 2168.589182]  balance_dirty_pages_ratelimited+0x2f9/0x3c0
>>>>>>>> [ 2168.589185]  btrfs_buffered_write+0x58e/0x7e0 [btrfs]
>>>>>>>> [ 2168.589226]  btrfs_file_write_iter+0x138/0x3e0 [btrfs]
>>>>>>>> [ 2168.589260]  ? ext4_file_read_iter+0x5b/0x180
>>>>>>>> [ 2168.589262]  new_sync_write+0x114/0x1a0
>>>>>>>> [ 2168.589265]  vfs_write+0x1c5/0x260
>>>>>>>> [ 2168.589267]  ksys_write+0x67/0xe0
>>>>>>>> [ 2168.589270]  __x64_sys_write+0x1a/0x20
>>>>>>>> [ 2168.589272]  do_syscall_64+0x40/0xb0
>>>>>>>> [ 2168.589275]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>>>>>> [ 2168.589277] RIP: 0033:0x7ffff7e91c27
>>>>>>>> [ 2168.589278] RSP: 002b:00007fffffffdc48 EFLAGS: 00000246 ORIG_RAX:
>>>>>>>> 0000000000000001
>>>>>>>> [ 2168.589280] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007ffff7e91c27
>>>>>>>> [ 2168.589281] RDX: 0000000000020000 RSI: 00007ffff79bd000 RDI: 0000000000000004
>>>>>>>> [ 2168.589282] RBP: 00007ffff79bd000 R08: 0000000000000000 R09: 0000000000000000
>>>>>>>> [ 2168.589283] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000004
>>>>>>>> [ 2168.589284] R13: 0000000000000004 R14: 00007ffff79bd000 R15: 0000000000020000
>>>>>>>>
>>>>>>>> and shows up under runnable tasks :
>>>>>>>>
>>>>>>>> [ 2168.593562] runnable tasks:
>>>>>>>> [ 2168.593562]  S            task   PID         tree-key  switches
>>>>>>>> prio     wait-time             sum-exec        sum-sleep
>>>>>>>> [ 2168.593563] -------------------------------------------------------------------------------------------------------------
>>>>>>>> [ 2168.593565]  S        cpuhp/13    92     88923.802487        19
>>>>>>>> 120         0.000000         0.292061         0.000000 2 0 /
>>>>>>>> [ 2168.593571]  S  idle_inject/13    93       -11.997255         3
>>>>>>>> 49         0.000000         0.005480         0.000000 2 0 /
>>>>>>>> [ 2168.593577]  S    migration/13    94       814.287531       551
>>>>>>>> 0         0.000000      1015.550514         0.000000 2 0 /
>>>>>>>> [ 2168.593582]  S    ksoftirqd/13    95     88762.317130        44
>>>>>>>> 120         0.000000         1.940252         0.000000 2 0 /
>>>>>>>> [ 2168.593588]  I    kworker/13:0    96        -9.031157         5
>>>>>>>> 120         0.000000         0.017423         0.000000 2 0 /
>>>>>>>> [ 2168.593593]  I   kworker/13:0H    97      3570.961886         5
>>>>>>>> 100         0.000000         0.034345         0.000000 2 0 /
>>>>>>>> [ 2168.593603]  I    kworker/13:1   400    101650.731913       578
>>>>>>>> 120         0.000000        10.110898         0.000000 2 0 /
>>>>>>>> [ 2168.593611]  I   kworker/13:1H  1015    101649.600698        65
>>>>>>>> 100         0.000000         1.443300         0.000000 2 0 /
>>>>>>>> [ 2168.593618]  S           loop3  1994     99133.655903        70
>>>>>>>> 100         0.000000         1.137468         0.000000 2 0 /
>>>>>>>> [ 2168.593625]  S           snapd  3161        15.296181       166
>>>>>>>> 120         0.000000        90.296991         0.000000 2 0
>>>>>>>> /system.slice/snapd.service
>>>>>>>> [ 2168.593631]  S           snapd  3198        10.047573        49
>>>>>>>> 120         0.000000         5.646247         0.000000 2 0
>>>>>>>> /system.slice/snapd.service
>>>>>>>> [ 2168.593639]  S            java  2446       970.743682       301
>>>>>>>> 120         0.000000       101.648659         0.000000 2 0
>>>>>>>> /system.slice/stor_tomcat.service
>>>>>>>> [ 2168.593645]  S C1 CompilerThre  2573      1033.157689      3636
>>>>>>>> 120         0.000000       615.256247         0.000000 2 0
>>>>>>>> /system.slice/stor_tomcat.service
>>>>>>>> [ 2168.593654]  D              mv  3814      2263.816953    186734
>>>>>>>> 120         0.000000     30087.917319         0.000000 2 0 /user.slice
>>>>>>>>
>>>>>>>> any idea what is going on and how to fix this ?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> thx.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Damien Le Moal
>>>>>>> Western Digital Research
>>>>
>>>>
>>>> --
>>>> Damien Le Moal
>>>> Western Digital Research
>>
>>
>> --
>> Damien Le Moal
>> Western Digital Research


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Host managed SMR drive issue
  2021-09-27 23:31               ` Damien Le Moal
@ 2021-09-27 23:34                 ` Sven Oehme
  2021-09-27 23:36                   ` Sven Oehme
  2021-09-28  6:36                   ` Johannes Thumshirn
  0 siblings, 2 replies; 23+ messages in thread
From: Sven Oehme @ 2021-09-27 23:34 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-btrfs, Naohiro Aota, Johannes Thumshirn

the workload is as simple as one can imagine :

scp -rp /space/01/ /space/20/
where /space/01 is an ext4 filesystem with several >100 GB files in it,
and /space/20 is the btrfs destination

it's single threaded, nothing special
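
In case it helps with reproducing it, this is roughly how I drive the copy and
watch for the stall (just a sketch of my setup; sdr and the mount points are
specific to this box and will differ elsewhere):

scp -rp /space/01/ /space/20/ &
# poll drive activity and the blk-mq counters until the transfer stalls
while sleep 5; do
    date
    grep ' sdr ' /proc/diskstats
    cat /sys/kernel/debug/block/sdr/hctx0/queued
    cat /sys/kernel/debug/block/sdr/hctx0/dispatched
done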

Sven

On Mon, Sep 27, 2021 at 5:31 PM Damien Le Moal
<damien.lemoal@opensource.wdc.com> wrote:
>
> On 2021/09/28 8:17, Sven Oehme wrote:
> > yes i do :
> >
> > root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/queued
> > 3760
>
> Can you give more information on the workload ? How can we recreate this ?
>
> >
> > On Mon, Sep 27, 2021 at 5:01 PM Damien Le Moal
> > <damien.lemoal@opensource.wdc.com> wrote:
> >>
> >> On 2021/09/28 7:56, Sven Oehme wrote:
> >>> Hi,
> >>>
> >>> i tried above  :
> >>>
> >>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatched
> >>>        0        89
> >>>        1        1462
> >>>        2        1
> >>>        4        574
> >>>        8        0
> >>>       16        0
> >>>       32+       0
> >>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatch
> >>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatched
> >>>        0        89
> >>>        1        1462
> >>>        2        1
> >>>        4        574
> >>>        8        0
> >>>       16        0
> >>>       32+       0
> >>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatch_busy
> >>> 0
> >>
> >> Do you have the queued count too ? If there is a difference with dispatch, it
> >> would mean that some IOs are stuck in the stack.
> >>
> >>>
> >>> echo 1 > /sys/kernel/debug/block/sdr/hctx0/run
> >>>
> >>> doesn't make any progress, still no i/o to the drive
> >>>
> >>> Sven
> >>>
> >>>
> >>> On Mon, Sep 27, 2021 at 4:48 PM Damien Le Moal
> >>> <damien.lemoal@opensource.wdc.com> wrote:
> >>>>
> >>>> On 2021/09/28 7:38, Sven Oehme wrote:
> >>>>> i tried to repeat the test with FW19, same result :
> >>>>
> >>>> The problem is likely not rooted with the HBA fw version.
> >>>> How do you create the problem ? Is it an fio script you are running ?
> >>>>
> >>>>>
> >>>>> [Mon Sep 27 15:41:22 2021] INFO: task btrfs-transacti:4206 blocked for
> >>>>> more than 604 seconds.
> >>>>> [Mon Sep 27 15:41:22 2021]       Not tainted 5.14-test #1
> >>>>> [Mon Sep 27 15:41:22 2021] "echo 0 >
> >>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>>>> [Mon Sep 27 15:41:22 2021] task:btrfs-transacti state:D stack:    0
> >>>>> pid: 4206 ppid:     2 flags:0x00004000
> >>>>> [Mon Sep 27 15:41:22 2021] Call Trace:
> >>>>> [Mon Sep 27 15:41:22 2021]  __schedule+0x2fa/0x910
> >>>>> [Mon Sep 27 15:41:22 2021]  schedule+0x4f/0xc0
> >>>>> [Mon Sep 27 15:41:22 2021]  io_schedule+0x46/0x70
> >>>>> [Mon Sep 27 15:41:22 2021]  blk_mq_get_tag+0x11b/0x270
> >>>>> [Mon Sep 27 15:41:22 2021]  ? wait_woken+0x80/0x80
> >>>>> [Mon Sep 27 15:41:22 2021]  __blk_mq_alloc_request+0xec/0x120
> >>>>> [Mon Sep 27 15:41:22 2021]  blk_mq_submit_bio+0x12f/0x580
> >>>>> [Mon Sep 27 15:41:22 2021]  submit_bio_noacct+0x42a/0x4f0
> >>>>> [Mon Sep 27 15:41:22 2021]  submit_bio+0x4f/0x1b0
> >>>>> [Mon Sep 27 15:41:22 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
> >>>>> [Mon Sep 27 15:41:22 2021]  btrfs_submit_metadata_bio+0x10f/0x170 [btrfs]
> >>>>> [Mon Sep 27 15:41:22 2021]  submit_one_bio+0x67/0x80 [btrfs]
> >>>>> [Mon Sep 27 15:41:22 2021]  btree_write_cache_pages+0x6e8/0x770 [btrfs]
> >>>>> [Mon Sep 27 15:41:22 2021]  btree_writepages+0x5f/0x70 [btrfs]
> >>>>> [Mon Sep 27 15:41:22 2021]  do_writepages+0x38/0xc0
> >>>>> [Mon Sep 27 15:41:22 2021]  __filemap_fdatawrite_range+0xcc/0x110
> >>>>> [Mon Sep 27 15:41:22 2021]  filemap_fdatawrite_range+0x13/0x20
> >>>>> [Mon Sep 27 15:41:22 2021]  btrfs_write_marked_extents+0x66/0x140 [btrfs]
> >>>>> [Mon Sep 27 15:41:22 2021]  btrfs_write_and_wait_transaction+0x49/0xd0 [btrfs]
> >>>>> [Mon Sep 27 15:41:22 2021]  btrfs_commit_transaction+0x6b0/0xaa0 [btrfs]
> >>>>> [Mon Sep 27 15:41:22 2021]  ? start_transaction+0xcf/0x5a0 [btrfs]
> >>>>> [Mon Sep 27 15:41:22 2021]  transaction_kthread+0x138/0x1b0 [btrfs]
> >>>>> [Mon Sep 27 15:41:22 2021]  kthread+0x12f/0x150
> >>>>> [Mon Sep 27 15:41:22 2021]  ?
> >>>>> btrfs_cleanup_transaction.isra.0+0x560/0x560 [btrfs]
> >>>>> [Mon Sep 27 15:41:22 2021]  ? __kthread_bind_mask+0x70/0x70
> >>>>> [Mon Sep 27 15:41:22 2021]  ret_from_fork+0x22/0x30
> >>>>>
> >>>>> if you tell me what information to collect, I am happy to do so.
> >>>>
> >>>> The stack seems to point to a "hang" in the block layer so btrfs side waits forever.
> >>>>
> >>>> When you get the hang, can you check the queued and dispatch counters in
> >>>> /sys/kernel/debug/block/<your disk>/hctx0 ?
> >>>>
> >>>> Once you have the numbers, try:
> >>>>
> >>>> echo 1 > /sys/kernel/debug/block/<your disk>/hctx0/run
> >>>>
> >>>> to see if the drive gets unstuck.
> >>>>
> >>>>>
> >>>>> Sven
> >>>>>
> >>>>>
> >>>>> On Mon, Sep 27, 2021 at 11:28 AM Sven Oehme <oehmes@gmail.com> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I also have an  Adaptec HBA 1100-4i at FW level 4.11 (latest) , which
> >>>>>> according to https://ask.adaptec.com/app/answers/detail/a_id/17472 is
> >>>>>> supported but see the exact same hangs after a few minutes ...
> >>>>>>
> >>>>>> what i see in dmesg is :
> >>>>>>
> >>>>>> [Mon Sep 27 11:20:03 2021] INFO: task kworker/u102:6:190092 blocked
> >>>>>> for more than 120 seconds.
> >>>>>> [Mon Sep 27 11:20:03 2021]       Not tainted 5.14-test #1
> >>>>>> [Mon Sep 27 11:20:03 2021] "echo 0 >
> >>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>>>>> [Mon Sep 27 11:20:03 2021] task:kworker/u102:6  state:D stack:    0
> >>>>>> pid:190092 ppid:     2 flags:0x00004000
> >>>>>> [Mon Sep 27 11:20:03 2021] Workqueue: btrfs-worker-high
> >>>>>> btrfs_work_helper [btrfs]
> >>>>>> [Mon Sep 27 11:20:03 2021] Call Trace:
> >>>>>> [Mon Sep 27 11:20:03 2021]  __schedule+0x2fa/0x910
> >>>>>> [Mon Sep 27 11:20:03 2021]  schedule+0x4f/0xc0
> >>>>>> [Mon Sep 27 11:20:03 2021]  io_schedule+0x46/0x70
> >>>>>> [Mon Sep 27 11:20:03 2021]  blk_mq_get_tag+0x11b/0x270
> >>>>>> [Mon Sep 27 11:20:03 2021]  ? wait_woken+0x80/0x80
> >>>>>> [Mon Sep 27 11:20:03 2021]  __blk_mq_alloc_request+0xec/0x120
> >>>>>> [Mon Sep 27 11:20:03 2021]  blk_mq_submit_bio+0x12f/0x580
> >>>>>> [Mon Sep 27 11:20:03 2021]  submit_bio_noacct+0x42a/0x4f0
> >>>>>> [Mon Sep 27 11:20:03 2021]  submit_bio+0x4f/0x1b0
> >>>>>> [Mon Sep 27 11:20:03 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
> >>>>>> [Mon Sep 27 11:20:03 2021]  run_one_async_done+0x3b/0x70 [btrfs]
> >>>>>> [Mon Sep 27 11:20:03 2021]  btrfs_work_helper+0x132/0x2e0 [btrfs]
> >>>>>> [Mon Sep 27 11:20:03 2021]  process_one_work+0x220/0x3c0
> >>>>>> [Mon Sep 27 11:20:03 2021]  worker_thread+0x53/0x420
> >>>>>> [Mon Sep 27 11:20:03 2021]  kthread+0x12f/0x150
> >>>>>> [Mon Sep 27 11:20:03 2021]  ? process_one_work+0x3c0/0x3c0
> >>>>>> [Mon Sep 27 11:20:03 2021]  ? __kthread_bind_mask+0x70/0x70
> >>>>>> [Mon Sep 27 11:20:03 2021]  ret_from_fork+0x22/0x30
> >>>>>>
> >>>>>> i will also downgrade the LSI adapter to FW 19, but i think this is
> >>>>>> unrelated to the FW as i can see this with 2 completely different
> >>>>>> HBA's and 2 completely different drives.
> >>>>>>
> >>>>>> Sven
> >>>>>>
> >>>>>> Sven
> >>>>>>
> >>>>>>
> >>>>>> On Sun, Sep 26, 2021 at 5:19 PM Damien Le Moal
> >>>>>> <damien.lemoal@opensource.wdc.com> wrote:
> >>>>>>>
> >>>>>>> On 2021/09/25 3:25, Sven Oehme wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> i am running into issues with Host Managed SMR drive testing. when i
> >>>>>>>> try to copy or move a file to the btrfs filesystem it just hangs. i
> >>>>>>>> tried multiple 5.12,5.13 as well as 5.14 all the way to 5.14.6 but the
> >>>>>>>> issue still persists.
> >>>>>>>>
> >>>>>>>> here is the setup :
> >>>>>>>>
> >>>>>>>> I am using btrfs-progs-v5.14.1
> >>>>>>>> device is a  Host Managed WDC  20TB SMR drive with firmware level C421
> >>>>>>>> its connected via a HBA 9400-8i Tri-Mode Storage Adapter , latest 20.0 FW
> >>>>>>>
> >>>>>>> Beware of the Broadcom FW rev 20. We found problems with it: very slow zone
> >>>>>>> command scheduling leading to command timeout in some cases. FW 19 does not seem
> >>>>>>> to have this issue. But that is likely not the cause of the problem here.
> >>>>>>>
> >>>>>>> Is there anything of interest in dmesg ? Any IO errors ?
> >>>>>>>
> >>>>>>> Naohiro, Johannes,
> >>>>>>>
> >>>>>>> Any idea ?
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> I am using the /dev/sd device direct  , no lvm or device mapper or
> >>>>>>>> anything else in between
> >>>>>>>>
> >>>>>>>> after a few seconds, sometimes minutes data rate to the drive drops to
> >>>>>>>> 0 and 1 or 2 cores on my system show 100% IO wait time, but no longer
> >>>>>>>> make any progress
> >>>>>>>>
> >>>>>>>> the process in question has the following stack :
> >>>>>>>>
> >>>>>>>> [ 2168.589160] task:mv              state:D stack:    0 pid: 3814
> >>>>>>>> ppid:  3679 flags:0x00004000
> >>>>>>>> [ 2168.589162] Call Trace:
> >>>>>>>> [ 2168.589163]  __schedule+0x2fa/0x910
> >>>>>>>> [ 2168.589166]  schedule+0x4f/0xc0
> >>>>>>>> [ 2168.589168]  schedule_timeout+0x8a/0x140
> >>>>>>>> [ 2168.589171]  ? __bpf_trace_tick_stop+0x10/0x10
> >>>>>>>> [ 2168.589173]  io_schedule_timeout+0x51/0x80
> >>>>>>>> [ 2168.589176]  balance_dirty_pages+0x2fa/0xe30
> >>>>>>>> [ 2168.589179]  ? __mod_lruvec_state+0x3a/0x50
> >>>>>>>> [ 2168.589182]  balance_dirty_pages_ratelimited+0x2f9/0x3c0
> >>>>>>>> [ 2168.589185]  btrfs_buffered_write+0x58e/0x7e0 [btrfs]
> >>>>>>>> [ 2168.589226]  btrfs_file_write_iter+0x138/0x3e0 [btrfs]
> >>>>>>>> [ 2168.589260]  ? ext4_file_read_iter+0x5b/0x180
> >>>>>>>> [ 2168.589262]  new_sync_write+0x114/0x1a0
> >>>>>>>> [ 2168.589265]  vfs_write+0x1c5/0x260
> >>>>>>>> [ 2168.589267]  ksys_write+0x67/0xe0
> >>>>>>>> [ 2168.589270]  __x64_sys_write+0x1a/0x20
> >>>>>>>> [ 2168.589272]  do_syscall_64+0x40/0xb0
> >>>>>>>> [ 2168.589275]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>>>>>>> [ 2168.589277] RIP: 0033:0x7ffff7e91c27
> >>>>>>>> [ 2168.589278] RSP: 002b:00007fffffffdc48 EFLAGS: 00000246 ORIG_RAX:
> >>>>>>>> 0000000000000001
> >>>>>>>> [ 2168.589280] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007ffff7e91c27
> >>>>>>>> [ 2168.589281] RDX: 0000000000020000 RSI: 00007ffff79bd000 RDI: 0000000000000004
> >>>>>>>> [ 2168.589282] RBP: 00007ffff79bd000 R08: 0000000000000000 R09: 0000000000000000
> >>>>>>>> [ 2168.589283] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000004
> >>>>>>>> [ 2168.589284] R13: 0000000000000004 R14: 00007ffff79bd000 R15: 0000000000020000
> >>>>>>>>
> >>>>>>>> and shows up under runnable tasks :
> >>>>>>>>
> >>>>>>>> [ 2168.593562] runnable tasks:
> >>>>>>>> [ 2168.593562]  S            task   PID         tree-key  switches
> >>>>>>>> prio     wait-time             sum-exec        sum-sleep
> >>>>>>>> [ 2168.593563] -------------------------------------------------------------------------------------------------------------
> >>>>>>>> [ 2168.593565]  S        cpuhp/13    92     88923.802487        19
> >>>>>>>> 120         0.000000         0.292061         0.000000 2 0 /
> >>>>>>>> [ 2168.593571]  S  idle_inject/13    93       -11.997255         3
> >>>>>>>> 49         0.000000         0.005480         0.000000 2 0 /
> >>>>>>>> [ 2168.593577]  S    migration/13    94       814.287531       551
> >>>>>>>> 0         0.000000      1015.550514         0.000000 2 0 /
> >>>>>>>> [ 2168.593582]  S    ksoftirqd/13    95     88762.317130        44
> >>>>>>>> 120         0.000000         1.940252         0.000000 2 0 /
> >>>>>>>> [ 2168.593588]  I    kworker/13:0    96        -9.031157         5
> >>>>>>>> 120         0.000000         0.017423         0.000000 2 0 /
> >>>>>>>> [ 2168.593593]  I   kworker/13:0H    97      3570.961886         5
> >>>>>>>> 100         0.000000         0.034345         0.000000 2 0 /
> >>>>>>>> [ 2168.593603]  I    kworker/13:1   400    101650.731913       578
> >>>>>>>> 120         0.000000        10.110898         0.000000 2 0 /
> >>>>>>>> [ 2168.593611]  I   kworker/13:1H  1015    101649.600698        65
> >>>>>>>> 100         0.000000         1.443300         0.000000 2 0 /
> >>>>>>>> [ 2168.593618]  S           loop3  1994     99133.655903        70
> >>>>>>>> 100         0.000000         1.137468         0.000000 2 0 /
> >>>>>>>> [ 2168.593625]  S           snapd  3161        15.296181       166
> >>>>>>>> 120         0.000000        90.296991         0.000000 2 0
> >>>>>>>> /system.slice/snapd.service
> >>>>>>>> [ 2168.593631]  S           snapd  3198        10.047573        49
> >>>>>>>> 120         0.000000         5.646247         0.000000 2 0
> >>>>>>>> /system.slice/snapd.service
> >>>>>>>> [ 2168.593639]  S            java  2446       970.743682       301
> >>>>>>>> 120         0.000000       101.648659         0.000000 2 0
> >>>>>>>> /system.slice/stor_tomcat.service
> >>>>>>>> [ 2168.593645]  S C1 CompilerThre  2573      1033.157689      3636
> >>>>>>>> 120         0.000000       615.256247         0.000000 2 0
> >>>>>>>> /system.slice/stor_tomcat.service
> >>>>>>>> [ 2168.593654]  D              mv  3814      2263.816953    186734
> >>>>>>>> 120         0.000000     30087.917319         0.000000 2 0 /user.slice
> >>>>>>>>
> >>>>>>>> any idea what is going on and how to fix this ?
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> thx.
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Damien Le Moal
> >>>>>>> Western Digital Research
> >>>>
> >>>>
> >>>> --
> >>>> Damien Le Moal
> >>>> Western Digital Research
> >>
> >>
> >> --
> >> Damien Le Moal
> >> Western Digital Research
>
>
> --
> Damien Le Moal
> Western Digital Research

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Host managed SMR drive issue
  2021-09-27 23:34                 ` Sven Oehme
@ 2021-09-27 23:36                   ` Sven Oehme
  2021-09-27 23:38                     ` Damien Le Moal
  2021-09-28  7:10                     ` Naohiro Aota
  2021-09-28  6:36                   ` Johannes Thumshirn
  1 sibling, 2 replies; 23+ messages in thread
From: Sven Oehme @ 2021-09-27 23:36 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-btrfs, Naohiro Aota, Johannes Thumshirn

I should add that the copy process starts and runs for a while, but
then stalls. The last 2 times I started over from scratch it stalls at
around 200 GB, so ~ 1-2 files into the copy process.
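
If it helps, I can also grab the following the next time it stalls
(just a sketch of what I'd collect; assuming the drive is still /dev/sdr
and the mount point is /space/20 as before):

  iostat -x 1 /dev/sdr               # confirm the data rate really drops to 0
  btrfs filesystem usage /space/20   # how much data/metadata made it to disk
  blkzone report /dev/sdr            # zone conditions / write pointers at stall time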

Sven

On Mon, Sep 27, 2021 at 5:34 PM Sven Oehme <oehmes@gmail.com> wrote:
>
> the workload is as simple as one can imagine :
>
> scp -rp /space/01/ /space/20/
> where space/01 is a ext4 filesystem with several >100gb files in it,
> /space/20 is the btrfs destination
>
> its single threaded , nothing special
>
> Sven
>
> On Mon, Sep 27, 2021 at 5:31 PM Damien Le Moal
> <damien.lemoal@opensource.wdc.com> wrote:
> >
> > On 2021/09/28 8:17, Sven Oehme wrote:
> > > yes i do :
> > >
> > > root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/queued
> > > 3760
> >
> > Can you give more information on the workload ? How can we recreate this ?
> >
> > >
> > > On Mon, Sep 27, 2021 at 5:01 PM Damien Le Moal
> > > <damien.lemoal@opensource.wdc.com> wrote:
> > >>
> > >> On 2021/09/28 7:56, Sven Oehme wrote:
> > >>> Hi,
> > >>>
> > >>> i tried above  :
> > >>>
> > >>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatched
> > >>>        0        89
> > >>>        1        1462
> > >>>        2        1
> > >>>        4        574
> > >>>        8        0
> > >>>       16        0
> > >>>       32+       0
> > >>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatch
> > >>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatched
> > >>>        0        89
> > >>>        1        1462
> > >>>        2        1
> > >>>        4        574
> > >>>        8        0
> > >>>       16        0
> > >>>       32+       0
> > >>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatch_busy
> > >>> 0
> > >>
> > >> Do you have the queued count too ? If there is a difference with dispatch, it
> > >> would mean that some IOs are stuck in the stack.
> > >>
> > >>>
> > >>> echo 1 > /sys/kernel/debug/block/sdr/hctx0/run
> > >>>
> > >>> doesn't make any progress, still no i/o to the drive
> > >>>
> > >>> Sven
> > >>>
> > >>>
> > >>> On Mon, Sep 27, 2021 at 4:48 PM Damien Le Moal
> > >>> <damien.lemoal@opensource.wdc.com> wrote:
> > >>>>
> > >>>> On 2021/09/28 7:38, Sven Oehme wrote:
> > >>>>> i tried to repeat the test with FW19, same result :
> > >>>>
> > >>>> The problem is likely not rooted with the HBA fw version.
> > >>>> How do you create the problem ? Is it an fio script you are running ?
> > >>>>
> > >>>>>
> > >>>>> [Mon Sep 27 15:41:22 2021] INFO: task btrfs-transacti:4206 blocked for
> > >>>>> more than 604 seconds.
> > >>>>> [Mon Sep 27 15:41:22 2021]       Not tainted 5.14-test #1
> > >>>>> [Mon Sep 27 15:41:22 2021] "echo 0 >
> > >>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > >>>>> [Mon Sep 27 15:41:22 2021] task:btrfs-transacti state:D stack:    0
> > >>>>> pid: 4206 ppid:     2 flags:0x00004000
> > >>>>> [Mon Sep 27 15:41:22 2021] Call Trace:
> > >>>>> [Mon Sep 27 15:41:22 2021]  __schedule+0x2fa/0x910
> > >>>>> [Mon Sep 27 15:41:22 2021]  schedule+0x4f/0xc0
> > >>>>> [Mon Sep 27 15:41:22 2021]  io_schedule+0x46/0x70
> > >>>>> [Mon Sep 27 15:41:22 2021]  blk_mq_get_tag+0x11b/0x270
> > >>>>> [Mon Sep 27 15:41:22 2021]  ? wait_woken+0x80/0x80
> > >>>>> [Mon Sep 27 15:41:22 2021]  __blk_mq_alloc_request+0xec/0x120
> > >>>>> [Mon Sep 27 15:41:22 2021]  blk_mq_submit_bio+0x12f/0x580
> > >>>>> [Mon Sep 27 15:41:22 2021]  submit_bio_noacct+0x42a/0x4f0
> > >>>>> [Mon Sep 27 15:41:22 2021]  submit_bio+0x4f/0x1b0
> > >>>>> [Mon Sep 27 15:41:22 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
> > >>>>> [Mon Sep 27 15:41:22 2021]  btrfs_submit_metadata_bio+0x10f/0x170 [btrfs]
> > >>>>> [Mon Sep 27 15:41:22 2021]  submit_one_bio+0x67/0x80 [btrfs]
> > >>>>> [Mon Sep 27 15:41:22 2021]  btree_write_cache_pages+0x6e8/0x770 [btrfs]
> > >>>>> [Mon Sep 27 15:41:22 2021]  btree_writepages+0x5f/0x70 [btrfs]
> > >>>>> [Mon Sep 27 15:41:22 2021]  do_writepages+0x38/0xc0
> > >>>>> [Mon Sep 27 15:41:22 2021]  __filemap_fdatawrite_range+0xcc/0x110
> > >>>>> [Mon Sep 27 15:41:22 2021]  filemap_fdatawrite_range+0x13/0x20
> > >>>>> [Mon Sep 27 15:41:22 2021]  btrfs_write_marked_extents+0x66/0x140 [btrfs]
> > >>>>> [Mon Sep 27 15:41:22 2021]  btrfs_write_and_wait_transaction+0x49/0xd0 [btrfs]
> > >>>>> [Mon Sep 27 15:41:22 2021]  btrfs_commit_transaction+0x6b0/0xaa0 [btrfs]
> > >>>>> [Mon Sep 27 15:41:22 2021]  ? start_transaction+0xcf/0x5a0 [btrfs]
> > >>>>> [Mon Sep 27 15:41:22 2021]  transaction_kthread+0x138/0x1b0 [btrfs]
> > >>>>> [Mon Sep 27 15:41:22 2021]  kthread+0x12f/0x150
> > >>>>> [Mon Sep 27 15:41:22 2021]  ?
> > >>>>> btrfs_cleanup_transaction.isra.0+0x560/0x560 [btrfs]
> > >>>>> [Mon Sep 27 15:41:22 2021]  ? __kthread_bind_mask+0x70/0x70
> > >>>>> [Mon Sep 27 15:41:22 2021]  ret_from_fork+0x22/0x30
> > >>>>>
> > >>>>> if you tell me what information to collect, I am happy to do so,.
> > >>>>
> > >>>> The stack seems to point to a "hang" in the block layer so btrfs side waits forever.
> > >>>>
> > >>>> When you get the hang, can you check the queued and dispatch counters in
> > >>>> /sys/kernel/debug/block/<your disk>/hctx0 ?
> > >>>>
> > >>>> Once you have the numbers, try:
> > >>>>
> > >>>> echo 1 > /sys/kernel/debug/block/<your disk>/hctx0/run
> > >>>>
> > >>>> to see if the drive gets unstuck.
> > >>>>
> > >>>>>
> > >>>>> Sven
> > >>>>>
> > >>>>>
> > >>>>> On Mon, Sep 27, 2021 at 11:28 AM Sven Oehme <oehmes@gmail.com> wrote:
> > >>>>>>
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> I also have an  Adaptec HBA 1100-4i at FW level 4.11 (latest) , which
> > >>>>>> according to https://ask.adaptec.com/app/answers/detail/a_id/17472 is
> > >>>>>> supported but see the exact same hangs after a few minutes ...
> > >>>>>>
> > >>>>>> what i see in dmesg is :
> > >>>>>>
> > >>>>>> [Mon Sep 27 11:20:03 2021] INFO: task kworker/u102:6:190092 blocked
> > >>>>>> for more than 120 seconds.
> > >>>>>> [Mon Sep 27 11:20:03 2021]       Not tainted 5.14-test #1
> > >>>>>> [Mon Sep 27 11:20:03 2021] "echo 0 >
> > >>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > >>>>>> [Mon Sep 27 11:20:03 2021] task:kworker/u102:6  state:D stack:    0
> > >>>>>> pid:190092 ppid:     2 flags:0x00004000
> > >>>>>> [Mon Sep 27 11:20:03 2021] Workqueue: btrfs-worker-high
> > >>>>>> btrfs_work_helper [btrfs]
> > >>>>>> [Mon Sep 27 11:20:03 2021] Call Trace:
> > >>>>>> [Mon Sep 27 11:20:03 2021]  __schedule+0x2fa/0x910
> > >>>>>> [Mon Sep 27 11:20:03 2021]  schedule+0x4f/0xc0
> > >>>>>> [Mon Sep 27 11:20:03 2021]  io_schedule+0x46/0x70
> > >>>>>> [Mon Sep 27 11:20:03 2021]  blk_mq_get_tag+0x11b/0x270
> > >>>>>> [Mon Sep 27 11:20:03 2021]  ? wait_woken+0x80/0x80
> > >>>>>> [Mon Sep 27 11:20:03 2021]  __blk_mq_alloc_request+0xec/0x120
> > >>>>>> [Mon Sep 27 11:20:03 2021]  blk_mq_submit_bio+0x12f/0x580
> > >>>>>> [Mon Sep 27 11:20:03 2021]  submit_bio_noacct+0x42a/0x4f0
> > >>>>>> [Mon Sep 27 11:20:03 2021]  submit_bio+0x4f/0x1b0
> > >>>>>> [Mon Sep 27 11:20:03 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
> > >>>>>> [Mon Sep 27 11:20:03 2021]  run_one_async_done+0x3b/0x70 [btrfs]
> > >>>>>> [Mon Sep 27 11:20:03 2021]  btrfs_work_helper+0x132/0x2e0 [btrfs]
> > >>>>>> [Mon Sep 27 11:20:03 2021]  process_one_work+0x220/0x3c0
> > >>>>>> [Mon Sep 27 11:20:03 2021]  worker_thread+0x53/0x420
> > >>>>>> [Mon Sep 27 11:20:03 2021]  kthread+0x12f/0x150
> > >>>>>> [Mon Sep 27 11:20:03 2021]  ? process_one_work+0x3c0/0x3c0
> > >>>>>> [Mon Sep 27 11:20:03 2021]  ? __kthread_bind_mask+0x70/0x70
> > >>>>>> [Mon Sep 27 11:20:03 2021]  ret_from_fork+0x22/0x30
> > >>>>>>
> > >>>>>> i will also downgrade the LSI adapter to FW 19, but i think this is
> > >>>>>> unrelated to the FW as i can see this with 2 completely different
> > >>>>>> HBA's and 2 completely different drives.
> > >>>>>>
> > >>>>>> Sven
> > >>>>>>
> > >>>>>> Sven
> > >>>>>>
> > >>>>>>
> > >>>>>> On Sun, Sep 26, 2021 at 5:19 PM Damien Le Moal
> > >>>>>> <damien.lemoal@opensource.wdc.com> wrote:
> > >>>>>>>
> > >>>>>>> On 2021/09/25 3:25, Sven Oehme wrote:
> > >>>>>>>> Hi,
> > >>>>>>>>
> > >>>>>>>> i am running into issues with Host Managed SMR drive testing. when i
> > >>>>>>>> try to copy or move a file to the btrfs filesystem it just hangs. i
> > >>>>>>>> tried multiple 5.12,5.13 as well as 5.14 all the way to 5.14.6 but the
> > >>>>>>>> issue still persists.
> > >>>>>>>>
> > >>>>>>>> here is the setup :
> > >>>>>>>>
> > >>>>>>>> I am using btrfs-progs-v5.14.1
> > >>>>>>>> device is a  Host Managed WDC  20TB SMR drive with firmware level C421
> > >>>>>>>> its connected via a HBA 9400-8i Tri-Mode Storage Adapter , latest 20.0 FW
> > >>>>>>>
> > >>>>>>> Beware of the Broadcom FW rev 20. We found problems with it: very slow zone
> > >>>>>>> command scheduling leading to command timeout is some cases. FW 19 does not seem
> > >>>>>>> to have this issue. But that is likely not the cause of the problem here.
> > >>>>>>>
> > >>>>>>> Is there anything of interest in dmesg ? Any IO errors ?
> > >>>>>>>
> > >>>>>>> Naohiro, Johannes,
> > >>>>>>>
> > >>>>>>> Any idea ?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> I am using the /dev/sd device direct  , no lvm or device mapper or
> > >>>>>>>> anything else in between
> > >>>>>>>>
> > >>>>>>>> after a few seconds, sometimes minutes data rate to the drive drops to
> > >>>>>>>> 0 and 1 or 2 cores on my system show 100% IO wait time, but no longer
> > >>>>>>>> make any progress
> > >>>>>>>>
> > >>>>>>>> the process in question has the following stack :
> > >>>>>>>>
> > >>>>>>>> [ 2168.589160] task:mv              state:D stack:    0 pid: 3814
> > >>>>>>>> ppid:  3679 flags:0x00004000
> > >>>>>>>> [ 2168.589162] Call Trace:
> > >>>>>>>> [ 2168.589163]  __schedule+0x2fa/0x910
> > >>>>>>>> [ 2168.589166]  schedule+0x4f/0xc0
> > >>>>>>>> [ 2168.589168]  schedule_timeout+0x8a/0x140
> > >>>>>>>> [ 2168.589171]  ? __bpf_trace_tick_stop+0x10/0x10
> > >>>>>>>> [ 2168.589173]  io_schedule_timeout+0x51/0x80
> > >>>>>>>> [ 2168.589176]  balance_dirty_pages+0x2fa/0xe30
> > >>>>>>>> [ 2168.589179]  ? __mod_lruvec_state+0x3a/0x50
> > >>>>>>>> [ 2168.589182]  balance_dirty_pages_ratelimited+0x2f9/0x3c0
> > >>>>>>>> [ 2168.589185]  btrfs_buffered_write+0x58e/0x7e0 [btrfs]
> > >>>>>>>> [ 2168.589226]  btrfs_file_write_iter+0x138/0x3e0 [btrfs]
> > >>>>>>>> [ 2168.589260]  ? ext4_file_read_iter+0x5b/0x180
> > >>>>>>>> [ 2168.589262]  new_sync_write+0x114/0x1a0
> > >>>>>>>> [ 2168.589265]  vfs_write+0x1c5/0x260
> > >>>>>>>> [ 2168.589267]  ksys_write+0x67/0xe0
> > >>>>>>>> [ 2168.589270]  __x64_sys_write+0x1a/0x20
> > >>>>>>>> [ 2168.589272]  do_syscall_64+0x40/0xb0
> > >>>>>>>> [ 2168.589275]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > >>>>>>>> [ 2168.589277] RIP: 0033:0x7ffff7e91c27
> > >>>>>>>> [ 2168.589278] RSP: 002b:00007fffffffdc48 EFLAGS: 00000246 ORIG_RAX:
> > >>>>>>>> 0000000000000001
> > >>>>>>>> [ 2168.589280] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007ffff7e91c27
> > >>>>>>>> [ 2168.589281] RDX: 0000000000020000 RSI: 00007ffff79bd000 RDI: 0000000000000004
> > >>>>>>>> [ 2168.589282] RBP: 00007ffff79bd000 R08: 0000000000000000 R09: 0000000000000000
> > >>>>>>>> [ 2168.589283] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000004
> > >>>>>>>> [ 2168.589284] R13: 0000000000000004 R14: 00007ffff79bd000 R15: 0000000000020000
> > >>>>>>>>
> > >>>>>>>> and shows up under runnable tasks :
> > >>>>>>>>
> > >>>>>>>> [ 2168.593562] runnable tasks:
> > >>>>>>>> [ 2168.593562]  S            task   PID         tree-key  switches
> > >>>>>>>> prio     wait-time             sum-exec        sum-sleep
> > >>>>>>>> [ 2168.593563] -------------------------------------------------------------------------------------------------------------
> > >>>>>>>> [ 2168.593565]  S        cpuhp/13    92     88923.802487        19
> > >>>>>>>> 120         0.000000         0.292061         0.000000 2 0 /
> > >>>>>>>> [ 2168.593571]  S  idle_inject/13    93       -11.997255         3
> > >>>>>>>> 49         0.000000         0.005480         0.000000 2 0 /
> > >>>>>>>> [ 2168.593577]  S    migration/13    94       814.287531       551
> > >>>>>>>> 0         0.000000      1015.550514         0.000000 2 0 /
> > >>>>>>>> [ 2168.593582]  S    ksoftirqd/13    95     88762.317130        44
> > >>>>>>>> 120         0.000000         1.940252         0.000000 2 0 /
> > >>>>>>>> [ 2168.593588]  I    kworker/13:0    96        -9.031157         5
> > >>>>>>>> 120         0.000000         0.017423         0.000000 2 0 /
> > >>>>>>>> [ 2168.593593]  I   kworker/13:0H    97      3570.961886         5
> > >>>>>>>> 100         0.000000         0.034345         0.000000 2 0 /
> > >>>>>>>> [ 2168.593603]  I    kworker/13:1   400    101650.731913       578
> > >>>>>>>> 120         0.000000        10.110898         0.000000 2 0 /
> > >>>>>>>> [ 2168.593611]  I   kworker/13:1H  1015    101649.600698        65
> > >>>>>>>> 100         0.000000         1.443300         0.000000 2 0 /
> > >>>>>>>> [ 2168.593618]  S           loop3  1994     99133.655903        70
> > >>>>>>>> 100         0.000000         1.137468         0.000000 2 0 /
> > >>>>>>>> [ 2168.593625]  S           snapd  3161        15.296181       166
> > >>>>>>>> 120         0.000000        90.296991         0.000000 2 0
> > >>>>>>>> /system.slice/snapd.service
> > >>>>>>>> [ 2168.593631]  S           snapd  3198        10.047573        49
> > >>>>>>>> 120         0.000000         5.646247         0.000000 2 0
> > >>>>>>>> /system.slice/snapd.service
> > >>>>>>>> [ 2168.593639]  S            java  2446       970.743682       301
> > >>>>>>>> 120         0.000000       101.648659         0.000000 2 0
> > >>>>>>>> /system.slice/stor_tomcat.service
> > >>>>>>>> [ 2168.593645]  S C1 CompilerThre  2573      1033.157689      3636
> > >>>>>>>> 120         0.000000       615.256247         0.000000 2 0
> > >>>>>>>> /system.slice/stor_tomcat.service
> > >>>>>>>> [ 2168.593654]  D              mv  3814      2263.816953    186734
> > >>>>>>>> 120         0.000000     30087.917319         0.000000 2 0 /user.slice
> > >>>>>>>>
> > >>>>>>>> any idea what is going on and how to fix this ?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>>
> > >>>>>>>> thx.
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>> Damien Le Moal
> > >>>>>>> Western Digital Research
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Damien Le Moal
> > >>>> Western Digital Research
> > >>
> > >>
> > >> --
> > >> Damien Le Moal
> > >> Western Digital Research
> >
> >
> > --
> > Damien Le Moal
> > Western Digital Research

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Host managed SMR drive issue
  2021-09-27 23:36                   ` Sven Oehme
@ 2021-09-27 23:38                     ` Damien Le Moal
  2021-09-27 23:51                       ` Sven Oehme
  2021-09-28  7:10                     ` Naohiro Aota
  1 sibling, 1 reply; 23+ messages in thread
From: Damien Le Moal @ 2021-09-27 23:38 UTC (permalink / raw)
  To: Sven Oehme; +Cc: linux-btrfs, Naohiro Aota, Johannes Thumshirn

On 2021/09/28 8:36, Sven Oehme wrote:
> I should add that the copy process starts and runs for a while, but
> then stalls. The last 2 times I started over from scratch it stalls at
> around 200 GB, so ~ 1-2 files into the copy process.

OK. So it looks like the problem starts when btrfs starts using the sequential
write only zones. We will try to recreate this.
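
In the meantime, it would be good to confirm that the drive really has
moved on to the sequential write only zones when the hang happens. A
rough way to check (assuming the disk is still sdr):

  cat /sys/block/sdr/queue/zoned            # should report host-managed
  cat /sys/block/sdr/queue/max_open_zones   # drive limit on concurrently open zones
  blkzone report /dev/sdr                   # write pointers of the sequential zones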

> 
> Sven
> 
> On Mon, Sep 27, 2021 at 5:34 PM Sven Oehme <oehmes@gmail.com> wrote:
>>
>> the workload is as simple as one can imagine :
>>
>> scp -rp /space/01/ /space/20/
>> where space/01 is a ext4 filesystem with several >100gb files in it,
>> /space/20 is the btrfs destination
>>
>> its single threaded , nothing special
>>
>> Sven
>>
>> On Mon, Sep 27, 2021 at 5:31 PM Damien Le Moal
>> <damien.lemoal@opensource.wdc.com> wrote:
>>>
>>> On 2021/09/28 8:17, Sven Oehme wrote:
>>>> yes i do :
>>>>
>>>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/queued
>>>> 3760
>>>
>>> Can you give more information on the workload ? How can we recreate this ?
>>>
>>>>
>>>> On Mon, Sep 27, 2021 at 5:01 PM Damien Le Moal
>>>> <damien.lemoal@opensource.wdc.com> wrote:
>>>>>
>>>>> On 2021/09/28 7:56, Sven Oehme wrote:
>>>>>> Hi,
>>>>>>
>>>>>> i tried above  :
>>>>>>
>>>>>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatched
>>>>>>        0        89
>>>>>>        1        1462
>>>>>>        2        1
>>>>>>        4        574
>>>>>>        8        0
>>>>>>       16        0
>>>>>>       32+       0
>>>>>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatch
>>>>>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatched
>>>>>>        0        89
>>>>>>        1        1462
>>>>>>        2        1
>>>>>>        4        574
>>>>>>        8        0
>>>>>>       16        0
>>>>>>       32+       0
>>>>>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatch_busy
>>>>>> 0
>>>>>
>>>>> Do you have the queued count too ? If there is a difference with dispatch, it
>>>>> would mean that some IOs are stuck in the stack.
>>>>>
>>>>>>
>>>>>> echo 1 > /sys/kernel/debug/block/sdr/hctx0/run
>>>>>>
>>>>>> doesn't make any progress, still no i/o to the drive
>>>>>>
>>>>>> Sven
>>>>>>
>>>>>>
>>>>>> On Mon, Sep 27, 2021 at 4:48 PM Damien Le Moal
>>>>>> <damien.lemoal@opensource.wdc.com> wrote:
>>>>>>>
>>>>>>> On 2021/09/28 7:38, Sven Oehme wrote:
>>>>>>>> i tried to repeat the test with FW19, same result :
>>>>>>>
>>>>>>> The problem is likely not rooted with the HBA fw version.
>>>>>>> How do you create the problem ? Is it an fio script you are running ?
>>>>>>>
>>>>>>>>
>>>>>>>> [Mon Sep 27 15:41:22 2021] INFO: task btrfs-transacti:4206 blocked for
>>>>>>>> more than 604 seconds.
>>>>>>>> [Mon Sep 27 15:41:22 2021]       Not tainted 5.14-test #1
>>>>>>>> [Mon Sep 27 15:41:22 2021] "echo 0 >
>>>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>>> [Mon Sep 27 15:41:22 2021] task:btrfs-transacti state:D stack:    0
>>>>>>>> pid: 4206 ppid:     2 flags:0x00004000
>>>>>>>> [Mon Sep 27 15:41:22 2021] Call Trace:
>>>>>>>> [Mon Sep 27 15:41:22 2021]  __schedule+0x2fa/0x910
>>>>>>>> [Mon Sep 27 15:41:22 2021]  schedule+0x4f/0xc0
>>>>>>>> [Mon Sep 27 15:41:22 2021]  io_schedule+0x46/0x70
>>>>>>>> [Mon Sep 27 15:41:22 2021]  blk_mq_get_tag+0x11b/0x270
>>>>>>>> [Mon Sep 27 15:41:22 2021]  ? wait_woken+0x80/0x80
>>>>>>>> [Mon Sep 27 15:41:22 2021]  __blk_mq_alloc_request+0xec/0x120
>>>>>>>> [Mon Sep 27 15:41:22 2021]  blk_mq_submit_bio+0x12f/0x580
>>>>>>>> [Mon Sep 27 15:41:22 2021]  submit_bio_noacct+0x42a/0x4f0
>>>>>>>> [Mon Sep 27 15:41:22 2021]  submit_bio+0x4f/0x1b0
>>>>>>>> [Mon Sep 27 15:41:22 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
>>>>>>>> [Mon Sep 27 15:41:22 2021]  btrfs_submit_metadata_bio+0x10f/0x170 [btrfs]
>>>>>>>> [Mon Sep 27 15:41:22 2021]  submit_one_bio+0x67/0x80 [btrfs]
>>>>>>>> [Mon Sep 27 15:41:22 2021]  btree_write_cache_pages+0x6e8/0x770 [btrfs]
>>>>>>>> [Mon Sep 27 15:41:22 2021]  btree_writepages+0x5f/0x70 [btrfs]
>>>>>>>> [Mon Sep 27 15:41:22 2021]  do_writepages+0x38/0xc0
>>>>>>>> [Mon Sep 27 15:41:22 2021]  __filemap_fdatawrite_range+0xcc/0x110
>>>>>>>> [Mon Sep 27 15:41:22 2021]  filemap_fdatawrite_range+0x13/0x20
>>>>>>>> [Mon Sep 27 15:41:22 2021]  btrfs_write_marked_extents+0x66/0x140 [btrfs]
>>>>>>>> [Mon Sep 27 15:41:22 2021]  btrfs_write_and_wait_transaction+0x49/0xd0 [btrfs]
>>>>>>>> [Mon Sep 27 15:41:22 2021]  btrfs_commit_transaction+0x6b0/0xaa0 [btrfs]
>>>>>>>> [Mon Sep 27 15:41:22 2021]  ? start_transaction+0xcf/0x5a0 [btrfs]
>>>>>>>> [Mon Sep 27 15:41:22 2021]  transaction_kthread+0x138/0x1b0 [btrfs]
>>>>>>>> [Mon Sep 27 15:41:22 2021]  kthread+0x12f/0x150
>>>>>>>> [Mon Sep 27 15:41:22 2021]  ?
>>>>>>>> btrfs_cleanup_transaction.isra.0+0x560/0x560 [btrfs]
>>>>>>>> [Mon Sep 27 15:41:22 2021]  ? __kthread_bind_mask+0x70/0x70
>>>>>>>> [Mon Sep 27 15:41:22 2021]  ret_from_fork+0x22/0x30
>>>>>>>>
>>>>>>>> if you tell me what information to collect, I am happy to do so,.
>>>>>>>
>>>>>>> The stack seems to point to a "hang" in the block layer so btrfs side waits forever.
>>>>>>>
>>>>>>> When you get the hang, can you check the queued and dispatch counters in
>>>>>>> /sys/kernel/debug/block/<your disk>/hctx0 ?
>>>>>>>
>>>>>>> Once you have the numbers, try:
>>>>>>>
>>>>>>> echo 1 > /sys/kernel/debug/block/<your disk>/hctx0/run
>>>>>>>
>>>>>>> to see if the drive gets unstuck.
>>>>>>>
>>>>>>>>
>>>>>>>> Sven
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Sep 27, 2021 at 11:28 AM Sven Oehme <oehmes@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I also have an  Adaptec HBA 1100-4i at FW level 4.11 (latest) , which
>>>>>>>>> according to https://ask.adaptec.com/app/answers/detail/a_id/17472 is
>>>>>>>>> supported but see the exact same hangs after a few minutes ...
>>>>>>>>>
>>>>>>>>> what i see in dmesg is :
>>>>>>>>>
>>>>>>>>> [Mon Sep 27 11:20:03 2021] INFO: task kworker/u102:6:190092 blocked
>>>>>>>>> for more than 120 seconds.
>>>>>>>>> [Mon Sep 27 11:20:03 2021]       Not tainted 5.14-test #1
>>>>>>>>> [Mon Sep 27 11:20:03 2021] "echo 0 >
>>>>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>>>> [Mon Sep 27 11:20:03 2021] task:kworker/u102:6  state:D stack:    0
>>>>>>>>> pid:190092 ppid:     2 flags:0x00004000
>>>>>>>>> [Mon Sep 27 11:20:03 2021] Workqueue: btrfs-worker-high
>>>>>>>>> btrfs_work_helper [btrfs]
>>>>>>>>> [Mon Sep 27 11:20:03 2021] Call Trace:
>>>>>>>>> [Mon Sep 27 11:20:03 2021]  __schedule+0x2fa/0x910
>>>>>>>>> [Mon Sep 27 11:20:03 2021]  schedule+0x4f/0xc0
>>>>>>>>> [Mon Sep 27 11:20:03 2021]  io_schedule+0x46/0x70
>>>>>>>>> [Mon Sep 27 11:20:03 2021]  blk_mq_get_tag+0x11b/0x270
>>>>>>>>> [Mon Sep 27 11:20:03 2021]  ? wait_woken+0x80/0x80
>>>>>>>>> [Mon Sep 27 11:20:03 2021]  __blk_mq_alloc_request+0xec/0x120
>>>>>>>>> [Mon Sep 27 11:20:03 2021]  blk_mq_submit_bio+0x12f/0x580
>>>>>>>>> [Mon Sep 27 11:20:03 2021]  submit_bio_noacct+0x42a/0x4f0
>>>>>>>>> [Mon Sep 27 11:20:03 2021]  submit_bio+0x4f/0x1b0
>>>>>>>>> [Mon Sep 27 11:20:03 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
>>>>>>>>> [Mon Sep 27 11:20:03 2021]  run_one_async_done+0x3b/0x70 [btrfs]
>>>>>>>>> [Mon Sep 27 11:20:03 2021]  btrfs_work_helper+0x132/0x2e0 [btrfs]
>>>>>>>>> [Mon Sep 27 11:20:03 2021]  process_one_work+0x220/0x3c0
>>>>>>>>> [Mon Sep 27 11:20:03 2021]  worker_thread+0x53/0x420
>>>>>>>>> [Mon Sep 27 11:20:03 2021]  kthread+0x12f/0x150
>>>>>>>>> [Mon Sep 27 11:20:03 2021]  ? process_one_work+0x3c0/0x3c0
>>>>>>>>> [Mon Sep 27 11:20:03 2021]  ? __kthread_bind_mask+0x70/0x70
>>>>>>>>> [Mon Sep 27 11:20:03 2021]  ret_from_fork+0x22/0x30
>>>>>>>>>
>>>>>>>>> i will also downgrade the LSI adapter to FW 19, but i think this is
>>>>>>>>> unrelated to the FW as i can see this with 2 completely different
>>>>>>>>> HBA's and 2 completely different drives.
>>>>>>>>>
>>>>>>>>> Sven
>>>>>>>>>
>>>>>>>>> Sven
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Sep 26, 2021 at 5:19 PM Damien Le Moal
>>>>>>>>> <damien.lemoal@opensource.wdc.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On 2021/09/25 3:25, Sven Oehme wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> i am running into issues with Host Managed SMR drive testing. when i
>>>>>>>>>>> try to copy or move a file to the btrfs filesystem it just hangs. i
>>>>>>>>>>> tried multiple 5.12,5.13 as well as 5.14 all the way to 5.14.6 but the
>>>>>>>>>>> issue still persists.
>>>>>>>>>>>
>>>>>>>>>>> here is the setup :
>>>>>>>>>>>
>>>>>>>>>>> I am using btrfs-progs-v5.14.1
>>>>>>>>>>> device is a  Host Managed WDC  20TB SMR drive with firmware level C421
>>>>>>>>>>> its connected via a HBA 9400-8i Tri-Mode Storage Adapter , latest 20.0 FW
>>>>>>>>>>
>>>>>>>>>> Beware of the Broadcom FW rev 20. We found problems with it: very slow zone
>>>>>>>>>> command scheduling leading to command timeout is some cases. FW 19 does not seem
>>>>>>>>>> to have this issue. But that is likely not the cause of the problem here.
>>>>>>>>>>
>>>>>>>>>> Is there anything of interest in dmesg ? Any IO errors ?
>>>>>>>>>>
>>>>>>>>>> Naohiro, Johannes,
>>>>>>>>>>
>>>>>>>>>> Any idea ?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> I am using the /dev/sd device direct  , no lvm or device mapper or
>>>>>>>>>>> anything else in between
>>>>>>>>>>>
>>>>>>>>>>> after a few seconds, sometimes minutes data rate to the drive drops to
>>>>>>>>>>> 0 and 1 or 2 cores on my system show 100% IO wait time, but no longer
>>>>>>>>>>> make any progress
>>>>>>>>>>>
>>>>>>>>>>> the process in question has the following stack :
>>>>>>>>>>>
>>>>>>>>>>> [ 2168.589160] task:mv              state:D stack:    0 pid: 3814
>>>>>>>>>>> ppid:  3679 flags:0x00004000
>>>>>>>>>>> [ 2168.589162] Call Trace:
>>>>>>>>>>> [ 2168.589163]  __schedule+0x2fa/0x910
>>>>>>>>>>> [ 2168.589166]  schedule+0x4f/0xc0
>>>>>>>>>>> [ 2168.589168]  schedule_timeout+0x8a/0x140
>>>>>>>>>>> [ 2168.589171]  ? __bpf_trace_tick_stop+0x10/0x10
>>>>>>>>>>> [ 2168.589173]  io_schedule_timeout+0x51/0x80
>>>>>>>>>>> [ 2168.589176]  balance_dirty_pages+0x2fa/0xe30
>>>>>>>>>>> [ 2168.589179]  ? __mod_lruvec_state+0x3a/0x50
>>>>>>>>>>> [ 2168.589182]  balance_dirty_pages_ratelimited+0x2f9/0x3c0
>>>>>>>>>>> [ 2168.589185]  btrfs_buffered_write+0x58e/0x7e0 [btrfs]
>>>>>>>>>>> [ 2168.589226]  btrfs_file_write_iter+0x138/0x3e0 [btrfs]
>>>>>>>>>>> [ 2168.589260]  ? ext4_file_read_iter+0x5b/0x180
>>>>>>>>>>> [ 2168.589262]  new_sync_write+0x114/0x1a0
>>>>>>>>>>> [ 2168.589265]  vfs_write+0x1c5/0x260
>>>>>>>>>>> [ 2168.589267]  ksys_write+0x67/0xe0
>>>>>>>>>>> [ 2168.589270]  __x64_sys_write+0x1a/0x20
>>>>>>>>>>> [ 2168.589272]  do_syscall_64+0x40/0xb0
>>>>>>>>>>> [ 2168.589275]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>>>>>>>>> [ 2168.589277] RIP: 0033:0x7ffff7e91c27
>>>>>>>>>>> [ 2168.589278] RSP: 002b:00007fffffffdc48 EFLAGS: 00000246 ORIG_RAX:
>>>>>>>>>>> 0000000000000001
>>>>>>>>>>> [ 2168.589280] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007ffff7e91c27
>>>>>>>>>>> [ 2168.589281] RDX: 0000000000020000 RSI: 00007ffff79bd000 RDI: 0000000000000004
>>>>>>>>>>> [ 2168.589282] RBP: 00007ffff79bd000 R08: 0000000000000000 R09: 0000000000000000
>>>>>>>>>>> [ 2168.589283] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000004
>>>>>>>>>>> [ 2168.589284] R13: 0000000000000004 R14: 00007ffff79bd000 R15: 0000000000020000
>>>>>>>>>>>
>>>>>>>>>>> and shows up under runnable tasks :
>>>>>>>>>>>
>>>>>>>>>>> [ 2168.593562] runnable tasks:
>>>>>>>>>>> [ 2168.593562]  S            task   PID         tree-key  switches
>>>>>>>>>>> prio     wait-time             sum-exec        sum-sleep
>>>>>>>>>>> [ 2168.593563] -------------------------------------------------------------------------------------------------------------
>>>>>>>>>>> [ 2168.593565]  S        cpuhp/13    92     88923.802487        19
>>>>>>>>>>> 120         0.000000         0.292061         0.000000 2 0 /
>>>>>>>>>>> [ 2168.593571]  S  idle_inject/13    93       -11.997255         3
>>>>>>>>>>> 49         0.000000         0.005480         0.000000 2 0 /
>>>>>>>>>>> [ 2168.593577]  S    migration/13    94       814.287531       551
>>>>>>>>>>> 0         0.000000      1015.550514         0.000000 2 0 /
>>>>>>>>>>> [ 2168.593582]  S    ksoftirqd/13    95     88762.317130        44
>>>>>>>>>>> 120         0.000000         1.940252         0.000000 2 0 /
>>>>>>>>>>> [ 2168.593588]  I    kworker/13:0    96        -9.031157         5
>>>>>>>>>>> 120         0.000000         0.017423         0.000000 2 0 /
>>>>>>>>>>> [ 2168.593593]  I   kworker/13:0H    97      3570.961886         5
>>>>>>>>>>> 100         0.000000         0.034345         0.000000 2 0 /
>>>>>>>>>>> [ 2168.593603]  I    kworker/13:1   400    101650.731913       578
>>>>>>>>>>> 120         0.000000        10.110898         0.000000 2 0 /
>>>>>>>>>>> [ 2168.593611]  I   kworker/13:1H  1015    101649.600698        65
>>>>>>>>>>> 100         0.000000         1.443300         0.000000 2 0 /
>>>>>>>>>>> [ 2168.593618]  S           loop3  1994     99133.655903        70
>>>>>>>>>>> 100         0.000000         1.137468         0.000000 2 0 /
>>>>>>>>>>> [ 2168.593625]  S           snapd  3161        15.296181       166
>>>>>>>>>>> 120         0.000000        90.296991         0.000000 2 0
>>>>>>>>>>> /system.slice/snapd.service
>>>>>>>>>>> [ 2168.593631]  S           snapd  3198        10.047573        49
>>>>>>>>>>> 120         0.000000         5.646247         0.000000 2 0
>>>>>>>>>>> /system.slice/snapd.service
>>>>>>>>>>> [ 2168.593639]  S            java  2446       970.743682       301
>>>>>>>>>>> 120         0.000000       101.648659         0.000000 2 0
>>>>>>>>>>> /system.slice/stor_tomcat.service
>>>>>>>>>>> [ 2168.593645]  S C1 CompilerThre  2573      1033.157689      3636
>>>>>>>>>>> 120         0.000000       615.256247         0.000000 2 0
>>>>>>>>>>> /system.slice/stor_tomcat.service
>>>>>>>>>>> [ 2168.593654]  D              mv  3814      2263.816953    186734
>>>>>>>>>>> 120         0.000000     30087.917319         0.000000 2 0 /user.slice
>>>>>>>>>>>
>>>>>>>>>>> any idea what is going on and how to fix this ?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> thx.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Damien Le Moal
>>>>>>>>>> Western Digital Research
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Damien Le Moal
>>>>>>> Western Digital Research
>>>>>
>>>>>
>>>>> --
>>>>> Damien Le Moal
>>>>> Western Digital Research
>>>
>>>
>>> --
>>> Damien Le Moal
>>> Western Digital Research


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Host managed SMR drive issue
  2021-09-27 23:38                     ` Damien Le Moal
@ 2021-09-27 23:51                       ` Sven Oehme
  0 siblings, 0 replies; 23+ messages in thread
From: Sven Oehme @ 2021-09-27 23:51 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-btrfs, Naohiro Aota, Johannes Thumshirn

Please let me know if you need more info or have a patch for me to try.

thx. Sven

On Mon, Sep 27, 2021 at 5:38 PM Damien Le Moal
<damien.lemoal@opensource.wdc.com> wrote:
>
> On 2021/09/28 8:36, Sven Oehme wrote:
> > I should add that the copy process starts and runs for a while, but
> > then stalls. The last 2 times I started over from scratch it stalls at
> > around 200 GB, so ~ 1-2 files into the copy process.
>
> OK. So it looks like the problem starts when btrfs starts using the sequential
> write only zones. We will try to recreate this.
>
> >
> > Sven
> >
> > On Mon, Sep 27, 2021 at 5:34 PM Sven Oehme <oehmes@gmail.com> wrote:
> >>
> >> the workload is as simple as one can imagine :
> >>
> >> scp -rp /space/01/ /space/20/
> >> where space/01 is a ext4 filesystem with several >100gb files in it,
> >> /space/20 is the btrfs destination
> >>
> >> its single threaded , nothing special
> >>
> >> Sven
> >>
> >> On Mon, Sep 27, 2021 at 5:31 PM Damien Le Moal
> >> <damien.lemoal@opensource.wdc.com> wrote:
> >>>
> >>> On 2021/09/28 8:17, Sven Oehme wrote:
> >>>> yes i do :
> >>>>
> >>>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/queued
> >>>> 3760
> >>>
> >>> Can you give more information on the workload ? How can we recreate this ?
> >>>
> >>>>
> >>>> On Mon, Sep 27, 2021 at 5:01 PM Damien Le Moal
> >>>> <damien.lemoal@opensource.wdc.com> wrote:
> >>>>>
> >>>>> On 2021/09/28 7:56, Sven Oehme wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> i tried above  :
> >>>>>>
> >>>>>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatched
> >>>>>>        0        89
> >>>>>>        1        1462
> >>>>>>        2        1
> >>>>>>        4        574
> >>>>>>        8        0
> >>>>>>       16        0
> >>>>>>       32+       0
> >>>>>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatch
> >>>>>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatched
> >>>>>>        0        89
> >>>>>>        1        1462
> >>>>>>        2        1
> >>>>>>        4        574
> >>>>>>        8        0
> >>>>>>       16        0
> >>>>>>       32+       0
> >>>>>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatch_busy
> >>>>>> 0
> >>>>>
> >>>>> Do you have the queued count too ? If there is a difference with dispatch, it
> >>>>> would mean that some IOs are stuck in the stack.
> >>>>>
> >>>>>>
> >>>>>> echo 1 > /sys/kernel/debug/block/sdr/hctx0/run
> >>>>>>
> >>>>>> doesn't make any progress, still no i/o to the drive
> >>>>>>
> >>>>>> Sven
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Sep 27, 2021 at 4:48 PM Damien Le Moal
> >>>>>> <damien.lemoal@opensource.wdc.com> wrote:
> >>>>>>>
> >>>>>>> On 2021/09/28 7:38, Sven Oehme wrote:
> >>>>>>>> i tried to repeat the test with FW19, same result :
> >>>>>>>
> >>>>>>> The problem is likely not rooted with the HBA fw version.
> >>>>>>> How do you create the problem ? Is it an fio script you are running ?
> >>>>>>>
> >>>>>>>>
> >>>>>>>> [Mon Sep 27 15:41:22 2021] INFO: task btrfs-transacti:4206 blocked for
> >>>>>>>> more than 604 seconds.
> >>>>>>>> [Mon Sep 27 15:41:22 2021]       Not tainted 5.14-test #1
> >>>>>>>> [Mon Sep 27 15:41:22 2021] "echo 0 >
> >>>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>>>>>>> [Mon Sep 27 15:41:22 2021] task:btrfs-transacti state:D stack:    0
> >>>>>>>> pid: 4206 ppid:     2 flags:0x00004000
> >>>>>>>> [Mon Sep 27 15:41:22 2021] Call Trace:
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  __schedule+0x2fa/0x910
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  schedule+0x4f/0xc0
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  io_schedule+0x46/0x70
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  blk_mq_get_tag+0x11b/0x270
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  ? wait_woken+0x80/0x80
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  __blk_mq_alloc_request+0xec/0x120
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  blk_mq_submit_bio+0x12f/0x580
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  submit_bio_noacct+0x42a/0x4f0
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  submit_bio+0x4f/0x1b0
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  btrfs_submit_metadata_bio+0x10f/0x170 [btrfs]
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  submit_one_bio+0x67/0x80 [btrfs]
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  btree_write_cache_pages+0x6e8/0x770 [btrfs]
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  btree_writepages+0x5f/0x70 [btrfs]
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  do_writepages+0x38/0xc0
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  __filemap_fdatawrite_range+0xcc/0x110
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  filemap_fdatawrite_range+0x13/0x20
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  btrfs_write_marked_extents+0x66/0x140 [btrfs]
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  btrfs_write_and_wait_transaction+0x49/0xd0 [btrfs]
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  btrfs_commit_transaction+0x6b0/0xaa0 [btrfs]
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  ? start_transaction+0xcf/0x5a0 [btrfs]
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  transaction_kthread+0x138/0x1b0 [btrfs]
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  kthread+0x12f/0x150
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  ?
> >>>>>>>> btrfs_cleanup_transaction.isra.0+0x560/0x560 [btrfs]
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  ? __kthread_bind_mask+0x70/0x70
> >>>>>>>> [Mon Sep 27 15:41:22 2021]  ret_from_fork+0x22/0x30
> >>>>>>>>
> >>>>>>>> if you tell me what information to collect, I am happy to do so,.
> >>>>>>>
> >>>>>>> The stack seems to point to a "hang" in the block layer so btrfs side waits forever.
> >>>>>>>
> >>>>>>> When you get the hang, can you check the queued and dispatch counters in
> >>>>>>> /sys/kernel/debug/block/<your disk>/hctx0 ?
> >>>>>>>
> >>>>>>> Once you have the numbers, try:
> >>>>>>>
> >>>>>>> echo 1 > /sys/kernel/debug/block/<your disk>/hctx0/run
> >>>>>>>
> >>>>>>> to see if the drive gets unstuck.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Sven
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Mon, Sep 27, 2021 at 11:28 AM Sven Oehme <oehmes@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I also have an  Adaptec HBA 1100-4i at FW level 4.11 (latest) , which
> >>>>>>>>> according to https://ask.adaptec.com/app/answers/detail/a_id/17472 is
> >>>>>>>>> supported but see the exact same hangs after a few minutes ...
> >>>>>>>>>
> >>>>>>>>> what i see in dmesg is :
> >>>>>>>>>
> >>>>>>>>> [Mon Sep 27 11:20:03 2021] INFO: task kworker/u102:6:190092 blocked
> >>>>>>>>> for more than 120 seconds.
> >>>>>>>>> [Mon Sep 27 11:20:03 2021]       Not tainted 5.14-test #1
> >>>>>>>>> [Mon Sep 27 11:20:03 2021] "echo 0 >
> >>>>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>>>>>>>> [Mon Sep 27 11:20:03 2021] task:kworker/u102:6  state:D stack:    0
> >>>>>>>>> pid:190092 ppid:     2 flags:0x00004000
> >>>>>>>>> [Mon Sep 27 11:20:03 2021] Workqueue: btrfs-worker-high
> >>>>>>>>> btrfs_work_helper [btrfs]
> >>>>>>>>> [Mon Sep 27 11:20:03 2021] Call Trace:
> >>>>>>>>> [Mon Sep 27 11:20:03 2021]  __schedule+0x2fa/0x910
> >>>>>>>>> [Mon Sep 27 11:20:03 2021]  schedule+0x4f/0xc0
> >>>>>>>>> [Mon Sep 27 11:20:03 2021]  io_schedule+0x46/0x70
> >>>>>>>>> [Mon Sep 27 11:20:03 2021]  blk_mq_get_tag+0x11b/0x270
> >>>>>>>>> [Mon Sep 27 11:20:03 2021]  ? wait_woken+0x80/0x80
> >>>>>>>>> [Mon Sep 27 11:20:03 2021]  __blk_mq_alloc_request+0xec/0x120
> >>>>>>>>> [Mon Sep 27 11:20:03 2021]  blk_mq_submit_bio+0x12f/0x580
> >>>>>>>>> [Mon Sep 27 11:20:03 2021]  submit_bio_noacct+0x42a/0x4f0
> >>>>>>>>> [Mon Sep 27 11:20:03 2021]  submit_bio+0x4f/0x1b0
> >>>>>>>>> [Mon Sep 27 11:20:03 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
> >>>>>>>>> [Mon Sep 27 11:20:03 2021]  run_one_async_done+0x3b/0x70 [btrfs]
> >>>>>>>>> [Mon Sep 27 11:20:03 2021]  btrfs_work_helper+0x132/0x2e0 [btrfs]
> >>>>>>>>> [Mon Sep 27 11:20:03 2021]  process_one_work+0x220/0x3c0
> >>>>>>>>> [Mon Sep 27 11:20:03 2021]  worker_thread+0x53/0x420
> >>>>>>>>> [Mon Sep 27 11:20:03 2021]  kthread+0x12f/0x150
> >>>>>>>>> [Mon Sep 27 11:20:03 2021]  ? process_one_work+0x3c0/0x3c0
> >>>>>>>>> [Mon Sep 27 11:20:03 2021]  ? __kthread_bind_mask+0x70/0x70
> >>>>>>>>> [Mon Sep 27 11:20:03 2021]  ret_from_fork+0x22/0x30
> >>>>>>>>>
> >>>>>>>>> i will also downgrade the LSI adapter to FW 19, but i think this is
> >>>>>>>>> unrelated to the FW as i can see this with 2 completely different
> >>>>>>>>> HBA's and 2 completely different drives.
> >>>>>>>>>
> >>>>>>>>> Sven
> >>>>>>>>>
> >>>>>>>>> Sven
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Sun, Sep 26, 2021 at 5:19 PM Damien Le Moal
> >>>>>>>>> <damien.lemoal@opensource.wdc.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> On 2021/09/25 3:25, Sven Oehme wrote:
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> i am running into issues with Host Managed SMR drive testing. when i
> >>>>>>>>>>> try to copy or move a file to the btrfs filesystem it just hangs. i
> >>>>>>>>>>> tried multiple 5.12,5.13 as well as 5.14 all the way to 5.14.6 but the
> >>>>>>>>>>> issue still persists.
> >>>>>>>>>>>
> >>>>>>>>>>> here is the setup :
> >>>>>>>>>>>
> >>>>>>>>>>> I am using btrfs-progs-v5.14.1
> >>>>>>>>>>> device is a  Host Managed WDC  20TB SMR drive with firmware level C421
> >>>>>>>>>>> its connected via a HBA 9400-8i Tri-Mode Storage Adapter , latest 20.0 FW
> >>>>>>>>>>
> >>>>>>>>>> Beware of the Broadcom FW rev 20. We found problems with it: very slow zone
> >>>>>>>>>> command scheduling leading to command timeout is some cases. FW 19 does not seem
> >>>>>>>>>> to have this issue. But that is likely not the cause of the problem here.
> >>>>>>>>>>
> >>>>>>>>>> Is there anything of interest in dmesg ? Any IO errors ?
> >>>>>>>>>>
> >>>>>>>>>> Naohiro, Johannes,
> >>>>>>>>>>
> >>>>>>>>>> Any idea ?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> I am using the /dev/sd device direct  , no lvm or device mapper or
> >>>>>>>>>>> anything else in between
> >>>>>>>>>>>
> >>>>>>>>>>> after a few seconds, sometimes minutes data rate to the drive drops to
> >>>>>>>>>>> 0 and 1 or 2 cores on my system show 100% IO wait time, but no longer
> >>>>>>>>>>> make any progress
> >>>>>>>>>>>
> >>>>>>>>>>> the process in question has the following stack :
> >>>>>>>>>>>
> >>>>>>>>>>> [ 2168.589160] task:mv              state:D stack:    0 pid: 3814
> >>>>>>>>>>> ppid:  3679 flags:0x00004000
> >>>>>>>>>>> [ 2168.589162] Call Trace:
> >>>>>>>>>>> [ 2168.589163]  __schedule+0x2fa/0x910
> >>>>>>>>>>> [ 2168.589166]  schedule+0x4f/0xc0
> >>>>>>>>>>> [ 2168.589168]  schedule_timeout+0x8a/0x140
> >>>>>>>>>>> [ 2168.589171]  ? __bpf_trace_tick_stop+0x10/0x10
> >>>>>>>>>>> [ 2168.589173]  io_schedule_timeout+0x51/0x80
> >>>>>>>>>>> [ 2168.589176]  balance_dirty_pages+0x2fa/0xe30
> >>>>>>>>>>> [ 2168.589179]  ? __mod_lruvec_state+0x3a/0x50
> >>>>>>>>>>> [ 2168.589182]  balance_dirty_pages_ratelimited+0x2f9/0x3c0
> >>>>>>>>>>> [ 2168.589185]  btrfs_buffered_write+0x58e/0x7e0 [btrfs]
> >>>>>>>>>>> [ 2168.589226]  btrfs_file_write_iter+0x138/0x3e0 [btrfs]
> >>>>>>>>>>> [ 2168.589260]  ? ext4_file_read_iter+0x5b/0x180
> >>>>>>>>>>> [ 2168.589262]  new_sync_write+0x114/0x1a0
> >>>>>>>>>>> [ 2168.589265]  vfs_write+0x1c5/0x260
> >>>>>>>>>>> [ 2168.589267]  ksys_write+0x67/0xe0
> >>>>>>>>>>> [ 2168.589270]  __x64_sys_write+0x1a/0x20
> >>>>>>>>>>> [ 2168.589272]  do_syscall_64+0x40/0xb0
> >>>>>>>>>>> [ 2168.589275]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>>>>>>>>>> [ 2168.589277] RIP: 0033:0x7ffff7e91c27
> >>>>>>>>>>> [ 2168.589278] RSP: 002b:00007fffffffdc48 EFLAGS: 00000246 ORIG_RAX:
> >>>>>>>>>>> 0000000000000001
> >>>>>>>>>>> [ 2168.589280] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007ffff7e91c27
> >>>>>>>>>>> [ 2168.589281] RDX: 0000000000020000 RSI: 00007ffff79bd000 RDI: 0000000000000004
> >>>>>>>>>>> [ 2168.589282] RBP: 00007ffff79bd000 R08: 0000000000000000 R09: 0000000000000000
> >>>>>>>>>>> [ 2168.589283] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000004
> >>>>>>>>>>> [ 2168.589284] R13: 0000000000000004 R14: 00007ffff79bd000 R15: 0000000000020000
> >>>>>>>>>>>
> >>>>>>>>>>> and shows up under runnable tasks :
> >>>>>>>>>>>
> >>>>>>>>>>> [ 2168.593562] runnable tasks:
> >>>>>>>>>>> [ 2168.593562]  S            task   PID         tree-key  switches
> >>>>>>>>>>> prio     wait-time             sum-exec        sum-sleep
> >>>>>>>>>>> [ 2168.593563] -------------------------------------------------------------------------------------------------------------
> >>>>>>>>>>> [ 2168.593565]  S        cpuhp/13    92     88923.802487        19
> >>>>>>>>>>> 120         0.000000         0.292061         0.000000 2 0 /
> >>>>>>>>>>> [ 2168.593571]  S  idle_inject/13    93       -11.997255         3
> >>>>>>>>>>> 49         0.000000         0.005480         0.000000 2 0 /
> >>>>>>>>>>> [ 2168.593577]  S    migration/13    94       814.287531       551
> >>>>>>>>>>> 0         0.000000      1015.550514         0.000000 2 0 /
> >>>>>>>>>>> [ 2168.593582]  S    ksoftirqd/13    95     88762.317130        44
> >>>>>>>>>>> 120         0.000000         1.940252         0.000000 2 0 /
> >>>>>>>>>>> [ 2168.593588]  I    kworker/13:0    96        -9.031157         5
> >>>>>>>>>>> 120         0.000000         0.017423         0.000000 2 0 /
> >>>>>>>>>>> [ 2168.593593]  I   kworker/13:0H    97      3570.961886         5
> >>>>>>>>>>> 100         0.000000         0.034345         0.000000 2 0 /
> >>>>>>>>>>> [ 2168.593603]  I    kworker/13:1   400    101650.731913       578
> >>>>>>>>>>> 120         0.000000        10.110898         0.000000 2 0 /
> >>>>>>>>>>> [ 2168.593611]  I   kworker/13:1H  1015    101649.600698        65
> >>>>>>>>>>> 100         0.000000         1.443300         0.000000 2 0 /
> >>>>>>>>>>> [ 2168.593618]  S           loop3  1994     99133.655903        70
> >>>>>>>>>>> 100         0.000000         1.137468         0.000000 2 0 /
> >>>>>>>>>>> [ 2168.593625]  S           snapd  3161        15.296181       166
> >>>>>>>>>>> 120         0.000000        90.296991         0.000000 2 0
> >>>>>>>>>>> /system.slice/snapd.service
> >>>>>>>>>>> [ 2168.593631]  S           snapd  3198        10.047573        49
> >>>>>>>>>>> 120         0.000000         5.646247         0.000000 2 0
> >>>>>>>>>>> /system.slice/snapd.service
> >>>>>>>>>>> [ 2168.593639]  S            java  2446       970.743682       301
> >>>>>>>>>>> 120         0.000000       101.648659         0.000000 2 0
> >>>>>>>>>>> /system.slice/stor_tomcat.service
> >>>>>>>>>>> [ 2168.593645]  S C1 CompilerThre  2573      1033.157689      3636
> >>>>>>>>>>> 120         0.000000       615.256247         0.000000 2 0
> >>>>>>>>>>> /system.slice/stor_tomcat.service
> >>>>>>>>>>> [ 2168.593654]  D              mv  3814      2263.816953    186734
> >>>>>>>>>>> 120         0.000000     30087.917319         0.000000 2 0 /user.slice
> >>>>>>>>>>>
> >>>>>>>>>>> any idea what is going on and how to fix this ?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> thx.
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Damien Le Moal
> >>>>>>>>>> Western Digital Research
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Damien Le Moal
> >>>>>>> Western Digital Research
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Damien Le Moal
> >>>>> Western Digital Research
> >>>
> >>>
> >>> --
> >>> Damien Le Moal
> >>> Western Digital Research
>
>
> --
> Damien Le Moal
> Western Digital Research

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Host managed SMR drive issue
  2021-09-27 23:34                 ` Sven Oehme
  2021-09-27 23:36                   ` Sven Oehme
@ 2021-09-28  6:36                   ` Johannes Thumshirn
  2021-09-29 10:29                     ` Johannes Thumshirn
  1 sibling, 1 reply; 23+ messages in thread
From: Johannes Thumshirn @ 2021-09-28  6:36 UTC (permalink / raw)
  To: Sven Oehme, Damien Le Moal; +Cc: linux-btrfs, Naohiro Aota

On 28/09/2021 01:34, Sven Oehme wrote:
> the workload is as simple as one can imagine :
> 
> scp -rp /space/01/ /space/20/
> where space/01 is a ext4 filesystem with several >100gb files in it,
> /space/20 is the btrfs destination
> 
> its single threaded , nothing special

Thanks for the info, I'm trying to recreate the issue locally.
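
Something along these lines, I assume (the device name, paths and file
sizes below are just placeholders for my local setup, not taken from
your report):

  mkfs.btrfs -f /dev/sdX        # btrfs-progs 5.14 should select zoned mode on an HM-SMR drive
  mount /dev/sdX /mnt/smr
  for i in 1 2 3; do dd if=/dev/zero of=/src/f$i bs=1M count=120000; done  # a few >100GB files
  cp -p /src/f1 /src/f2 /src/f3 /mnt/smr/   # plain cp instead of scp, which shouldn't matter for the write path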

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Host managed SMR drive issue
  2021-09-27 23:36                   ` Sven Oehme
  2021-09-27 23:38                     ` Damien Le Moal
@ 2021-09-28  7:10                     ` Naohiro Aota
  2021-09-28 11:49                       ` Sven Oehme
  1 sibling, 1 reply; 23+ messages in thread
From: Naohiro Aota @ 2021-09-28  7:10 UTC (permalink / raw)
  To: Sven Oehme; +Cc: Damien Le Moal, linux-btrfs, Johannes Thumshirn

On Mon, Sep 27, 2021 at 05:36:55PM -0600, Sven Oehme wrote:
> I should add that the copy process starts and runs for a while, but
> then stalls. The last 2 times I started over from scratch it stalls at
> around 200 GB, so ~ 1-2 files into the copy process.

Thank you for the detailed info. I'm also working on reproducing the
issue.

BTW, how is the memory usage at the hung time? Can I have "free -m"
output?
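
Something like the following, captured while it is hung, would be ideal
(assuming the disk is still sdr as in your earlier mails):

  free -m
  grep -E 'Dirty|Writeback' /proc/meminfo
  cat /sys/kernel/debug/block/sdr/hctx0/queued
  cat /sys/kernel/debug/block/sdr/hctx0/dispatched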

> Sven
> 
> On Mon, Sep 27, 2021 at 5:34 PM Sven Oehme <oehmes@gmail.com> wrote:
> >
> > the workload is as simple as one can imagine :
> >
> > scp -rp /space/01/ /space/20/
> > where space/01 is a ext4 filesystem with several >100gb files in it,
> > /space/20 is the btrfs destination
> >
> > its single threaded , nothing special
> >
> > Sven
> >
> > On Mon, Sep 27, 2021 at 5:31 PM Damien Le Moal
> > <damien.lemoal@opensource.wdc.com> wrote:
> > >
> > > On 2021/09/28 8:17, Sven Oehme wrote:
> > > > yes i do :
> > > >
> > > > root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/queued
> > > > 3760
> > >
> > > Can you give more information on the workload ? How can we recreate this ?
> > >
> > > >
> > > > On Mon, Sep 27, 2021 at 5:01 PM Damien Le Moal
> > > > <damien.lemoal@opensource.wdc.com> wrote:
> > > >>
> > > >> On 2021/09/28 7:56, Sven Oehme wrote:
> > > >>> Hi,
> > > >>>
> > > >>> i tried above  :
> > > >>>
> > > >>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatched
> > > >>>        0        89
> > > >>>        1        1462
> > > >>>        2        1
> > > >>>        4        574
> > > >>>        8        0
> > > >>>       16        0
> > > >>>       32+       0
> > > >>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatch
> > > >>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatched
> > > >>>        0        89
> > > >>>        1        1462
> > > >>>        2        1
> > > >>>        4        574
> > > >>>        8        0
> > > >>>       16        0
> > > >>>       32+       0
> > > >>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatch_busy
> > > >>> 0
> > > >>
> > > >> Do you have the queued count too ? If there is a difference with dispatch, it
> > > >> would mean that some IOs are stuck in the stack.
> > > >>
> > > >>>
> > > >>> echo 1 > /sys/kernel/debug/block/sdr/hctx0/run
> > > >>>
> > > >>> doesn't make any progress, still no i/o to the drive
> > > >>>
> > > >>> Sven
> > > >>>
> > > >>>
> > > >>> On Mon, Sep 27, 2021 at 4:48 PM Damien Le Moal
> > > >>> <damien.lemoal@opensource.wdc.com> wrote:
> > > >>>>
> > > >>>> On 2021/09/28 7:38, Sven Oehme wrote:
> > > >>>>> i tried to repeat the test with FW19, same result :
> > > >>>>
> > > >>>> The problem is likely not rooted with the HBA fw version.
> > > >>>> How do you create the problem ? Is it an fio script you are running ?
> > > >>>>
> > > >>>>>
> > > >>>>> [Mon Sep 27 15:41:22 2021] INFO: task btrfs-transacti:4206 blocked for
> > > >>>>> more than 604 seconds.
> > > >>>>> [Mon Sep 27 15:41:22 2021]       Not tainted 5.14-test #1
> > > >>>>> [Mon Sep 27 15:41:22 2021] "echo 0 >
> > > >>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > >>>>> [Mon Sep 27 15:41:22 2021] task:btrfs-transacti state:D stack:    0
> > > >>>>> pid: 4206 ppid:     2 flags:0x00004000
> > > >>>>> [Mon Sep 27 15:41:22 2021] Call Trace:
> > > >>>>> [Mon Sep 27 15:41:22 2021]  __schedule+0x2fa/0x910
> > > >>>>> [Mon Sep 27 15:41:22 2021]  schedule+0x4f/0xc0
> > > >>>>> [Mon Sep 27 15:41:22 2021]  io_schedule+0x46/0x70
> > > >>>>> [Mon Sep 27 15:41:22 2021]  blk_mq_get_tag+0x11b/0x270
> > > >>>>> [Mon Sep 27 15:41:22 2021]  ? wait_woken+0x80/0x80
> > > >>>>> [Mon Sep 27 15:41:22 2021]  __blk_mq_alloc_request+0xec/0x120
> > > >>>>> [Mon Sep 27 15:41:22 2021]  blk_mq_submit_bio+0x12f/0x580
> > > >>>>> [Mon Sep 27 15:41:22 2021]  submit_bio_noacct+0x42a/0x4f0
> > > >>>>> [Mon Sep 27 15:41:22 2021]  submit_bio+0x4f/0x1b0
> > > >>>>> [Mon Sep 27 15:41:22 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
> > > >>>>> [Mon Sep 27 15:41:22 2021]  btrfs_submit_metadata_bio+0x10f/0x170 [btrfs]
> > > >>>>> [Mon Sep 27 15:41:22 2021]  submit_one_bio+0x67/0x80 [btrfs]
> > > >>>>> [Mon Sep 27 15:41:22 2021]  btree_write_cache_pages+0x6e8/0x770 [btrfs]
> > > >>>>> [Mon Sep 27 15:41:22 2021]  btree_writepages+0x5f/0x70 [btrfs]
> > > >>>>> [Mon Sep 27 15:41:22 2021]  do_writepages+0x38/0xc0
> > > >>>>> [Mon Sep 27 15:41:22 2021]  __filemap_fdatawrite_range+0xcc/0x110
> > > >>>>> [Mon Sep 27 15:41:22 2021]  filemap_fdatawrite_range+0x13/0x20
> > > >>>>> [Mon Sep 27 15:41:22 2021]  btrfs_write_marked_extents+0x66/0x140 [btrfs]
> > > >>>>> [Mon Sep 27 15:41:22 2021]  btrfs_write_and_wait_transaction+0x49/0xd0 [btrfs]
> > > >>>>> [Mon Sep 27 15:41:22 2021]  btrfs_commit_transaction+0x6b0/0xaa0 [btrfs]
> > > >>>>> [Mon Sep 27 15:41:22 2021]  ? start_transaction+0xcf/0x5a0 [btrfs]
> > > >>>>> [Mon Sep 27 15:41:22 2021]  transaction_kthread+0x138/0x1b0 [btrfs]
> > > >>>>> [Mon Sep 27 15:41:22 2021]  kthread+0x12f/0x150
> > > >>>>> [Mon Sep 27 15:41:22 2021]  ?
> > > >>>>> btrfs_cleanup_transaction.isra.0+0x560/0x560 [btrfs]
> > > >>>>> [Mon Sep 27 15:41:22 2021]  ? __kthread_bind_mask+0x70/0x70
> > > >>>>> [Mon Sep 27 15:41:22 2021]  ret_from_fork+0x22/0x30
> > > >>>>>
> > > >>>>> if you tell me what information to collect, I am happy to do so,.
> > > >>>>
> > > >>>> The stack seems to point to a "hang" in the block layer so btrfs side waits forever.
> > > >>>>
> > > >>>> When you get the hang, can you check the queued and dispatch counters in
> > > >>>> /sys/kernel/debug/block/<your disk>/hctx0 ?
> > > >>>>
> > > >>>> Once you have the numbers, try:
> > > >>>>
> > > >>>> echo 1 > /sys/kernel/debug/block/<your disk>/hctx0/run
> > > >>>>
> > > >>>> to see if the drive gets unstuck.
> > > >>>>
> > > >>>>>
> > > >>>>> Sven
> > > >>>>>
> > > >>>>>
> > > >>>>> On Mon, Sep 27, 2021 at 11:28 AM Sven Oehme <oehmes@gmail.com> wrote:
> > > >>>>>>
> > > >>>>>> Hi,
> > > >>>>>>
> > > >>>>>> I also have an  Adaptec HBA 1100-4i at FW level 4.11 (latest) , which
> > > >>>>>> according to https://ask.adaptec.com/app/answers/detail/a_id/17472 is
> > > >>>>>> supported but see the exact same hangs after a few minutes ...
> > > >>>>>>
> > > >>>>>> what i see in dmesg is :
> > > >>>>>>
> > > >>>>>> [Mon Sep 27 11:20:03 2021] INFO: task kworker/u102:6:190092 blocked
> > > >>>>>> for more than 120 seconds.
> > > >>>>>> [Mon Sep 27 11:20:03 2021]       Not tainted 5.14-test #1
> > > >>>>>> [Mon Sep 27 11:20:03 2021] "echo 0 >
> > > >>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > >>>>>> [Mon Sep 27 11:20:03 2021] task:kworker/u102:6  state:D stack:    0
> > > >>>>>> pid:190092 ppid:     2 flags:0x00004000
> > > >>>>>> [Mon Sep 27 11:20:03 2021] Workqueue: btrfs-worker-high
> > > >>>>>> btrfs_work_helper [btrfs]
> > > >>>>>> [Mon Sep 27 11:20:03 2021] Call Trace:
> > > >>>>>> [Mon Sep 27 11:20:03 2021]  __schedule+0x2fa/0x910
> > > >>>>>> [Mon Sep 27 11:20:03 2021]  schedule+0x4f/0xc0
> > > >>>>>> [Mon Sep 27 11:20:03 2021]  io_schedule+0x46/0x70
> > > >>>>>> [Mon Sep 27 11:20:03 2021]  blk_mq_get_tag+0x11b/0x270
> > > >>>>>> [Mon Sep 27 11:20:03 2021]  ? wait_woken+0x80/0x80
> > > >>>>>> [Mon Sep 27 11:20:03 2021]  __blk_mq_alloc_request+0xec/0x120
> > > >>>>>> [Mon Sep 27 11:20:03 2021]  blk_mq_submit_bio+0x12f/0x580
> > > >>>>>> [Mon Sep 27 11:20:03 2021]  submit_bio_noacct+0x42a/0x4f0
> > > >>>>>> [Mon Sep 27 11:20:03 2021]  submit_bio+0x4f/0x1b0
> > > >>>>>> [Mon Sep 27 11:20:03 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
> > > >>>>>> [Mon Sep 27 11:20:03 2021]  run_one_async_done+0x3b/0x70 [btrfs]
> > > >>>>>> [Mon Sep 27 11:20:03 2021]  btrfs_work_helper+0x132/0x2e0 [btrfs]
> > > >>>>>> [Mon Sep 27 11:20:03 2021]  process_one_work+0x220/0x3c0
> > > >>>>>> [Mon Sep 27 11:20:03 2021]  worker_thread+0x53/0x420
> > > >>>>>> [Mon Sep 27 11:20:03 2021]  kthread+0x12f/0x150
> > > >>>>>> [Mon Sep 27 11:20:03 2021]  ? process_one_work+0x3c0/0x3c0
> > > >>>>>> [Mon Sep 27 11:20:03 2021]  ? __kthread_bind_mask+0x70/0x70
> > > >>>>>> [Mon Sep 27 11:20:03 2021]  ret_from_fork+0x22/0x30
> > > >>>>>>
> > > >>>>>> i will also downgrade the LSI adapter to FW 19, but i think this is
> > > >>>>>> unrelated to the FW as i can see this with 2 completely different
> > > >>>>>> HBA's and 2 completely different drives.
> > > >>>>>>
> > > >>>>>> Sven
> > > >>>>>>
> > > >>>>>> Sven
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Sun, Sep 26, 2021 at 5:19 PM Damien Le Moal
> > > >>>>>> <damien.lemoal@opensource.wdc.com> wrote:
> > > >>>>>>>
> > > >>>>>>> On 2021/09/25 3:25, Sven Oehme wrote:
> > > >>>>>>>> Hi,
> > > >>>>>>>>
> > > >>>>>>>> i am running into issues with Host Managed SMR drive testing. when i
> > > >>>>>>>> try to copy or move a file to the btrfs filesystem it just hangs. i
> > > >>>>>>>> tried multiple 5.12,5.13 as well as 5.14 all the way to 5.14.6 but the
> > > >>>>>>>> issue still persists.
> > > >>>>>>>>
> > > >>>>>>>> here is the setup :
> > > >>>>>>>>
> > > >>>>>>>> I am using btrfs-progs-v5.14.1
> > > >>>>>>>> device is a  Host Managed WDC  20TB SMR drive with firmware level C421
> > > >>>>>>>> its connected via a HBA 9400-8i Tri-Mode Storage Adapter , latest 20.0 FW
> > > >>>>>>>
> > > >>>>>>> Beware of the Broadcom FW rev 20. We found problems with it: very slow zone
> > > >>>>>>> command scheduling leading to command timeout is some cases. FW 19 does not seem
> > > >>>>>>> to have this issue. But that is likely not the cause of the problem here.
> > > >>>>>>>
> > > >>>>>>> Is there anything of interest in dmesg ? Any IO errors ?
> > > >>>>>>>
> > > >>>>>>> Naohiro, Johannes,
> > > >>>>>>>
> > > >>>>>>> Any idea ?
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> I am using the /dev/sd device direct  , no lvm or device mapper or
> > > >>>>>>>> anything else in between
> > > >>>>>>>>
> > > >>>>>>>> after a few seconds, sometimes minutes data rate to the drive drops to
> > > >>>>>>>> 0 and 1 or 2 cores on my system show 100% IO wait time, but no longer
> > > >>>>>>>> make any progress
> > > >>>>>>>>
> > > >>>>>>>> the process in question has the following stack :
> > > >>>>>>>>
> > > >>>>>>>> [ 2168.589160] task:mv              state:D stack:    0 pid: 3814
> > > >>>>>>>> ppid:  3679 flags:0x00004000
> > > >>>>>>>> [ 2168.589162] Call Trace:
> > > >>>>>>>> [ 2168.589163]  __schedule+0x2fa/0x910
> > > >>>>>>>> [ 2168.589166]  schedule+0x4f/0xc0
> > > >>>>>>>> [ 2168.589168]  schedule_timeout+0x8a/0x140
> > > >>>>>>>> [ 2168.589171]  ? __bpf_trace_tick_stop+0x10/0x10
> > > >>>>>>>> [ 2168.589173]  io_schedule_timeout+0x51/0x80
> > > >>>>>>>> [ 2168.589176]  balance_dirty_pages+0x2fa/0xe30
> > > >>>>>>>> [ 2168.589179]  ? __mod_lruvec_state+0x3a/0x50
> > > >>>>>>>> [ 2168.589182]  balance_dirty_pages_ratelimited+0x2f9/0x3c0
> > > >>>>>>>> [ 2168.589185]  btrfs_buffered_write+0x58e/0x7e0 [btrfs]
> > > >>>>>>>> [ 2168.589226]  btrfs_file_write_iter+0x138/0x3e0 [btrfs]
> > > >>>>>>>> [ 2168.589260]  ? ext4_file_read_iter+0x5b/0x180
> > > >>>>>>>> [ 2168.589262]  new_sync_write+0x114/0x1a0
> > > >>>>>>>> [ 2168.589265]  vfs_write+0x1c5/0x260
> > > >>>>>>>> [ 2168.589267]  ksys_write+0x67/0xe0
> > > >>>>>>>> [ 2168.589270]  __x64_sys_write+0x1a/0x20
> > > >>>>>>>> [ 2168.589272]  do_syscall_64+0x40/0xb0
> > > >>>>>>>> [ 2168.589275]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > >>>>>>>> [ 2168.589277] RIP: 0033:0x7ffff7e91c27
> > > >>>>>>>> [ 2168.589278] RSP: 002b:00007fffffffdc48 EFLAGS: 00000246 ORIG_RAX:
> > > >>>>>>>> 0000000000000001
> > > >>>>>>>> [ 2168.589280] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007ffff7e91c27
> > > >>>>>>>> [ 2168.589281] RDX: 0000000000020000 RSI: 00007ffff79bd000 RDI: 0000000000000004
> > > >>>>>>>> [ 2168.589282] RBP: 00007ffff79bd000 R08: 0000000000000000 R09: 0000000000000000
> > > >>>>>>>> [ 2168.589283] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000004
> > > >>>>>>>> [ 2168.589284] R13: 0000000000000004 R14: 00007ffff79bd000 R15: 0000000000020000
> > > >>>>>>>>
> > > >>>>>>>> and shows up under runnable tasks :
> > > >>>>>>>>
> > > >>>>>>>> [ 2168.593562] runnable tasks:
> > > >>>>>>>> [ 2168.593562]  S            task   PID         tree-key  switches
> > > >>>>>>>> prio     wait-time             sum-exec        sum-sleep
> > > >>>>>>>> [ 2168.593563] -------------------------------------------------------------------------------------------------------------
> > > >>>>>>>> [ 2168.593565]  S        cpuhp/13    92     88923.802487        19
> > > >>>>>>>> 120         0.000000         0.292061         0.000000 2 0 /
> > > >>>>>>>> [ 2168.593571]  S  idle_inject/13    93       -11.997255         3
> > > >>>>>>>> 49         0.000000         0.005480         0.000000 2 0 /
> > > >>>>>>>> [ 2168.593577]  S    migration/13    94       814.287531       551
> > > >>>>>>>> 0         0.000000      1015.550514         0.000000 2 0 /
> > > >>>>>>>> [ 2168.593582]  S    ksoftirqd/13    95     88762.317130        44
> > > >>>>>>>> 120         0.000000         1.940252         0.000000 2 0 /
> > > >>>>>>>> [ 2168.593588]  I    kworker/13:0    96        -9.031157         5
> > > >>>>>>>> 120         0.000000         0.017423         0.000000 2 0 /
> > > >>>>>>>> [ 2168.593593]  I   kworker/13:0H    97      3570.961886         5
> > > >>>>>>>> 100         0.000000         0.034345         0.000000 2 0 /
> > > >>>>>>>> [ 2168.593603]  I    kworker/13:1   400    101650.731913       578
> > > >>>>>>>> 120         0.000000        10.110898         0.000000 2 0 /
> > > >>>>>>>> [ 2168.593611]  I   kworker/13:1H  1015    101649.600698        65
> > > >>>>>>>> 100         0.000000         1.443300         0.000000 2 0 /
> > > >>>>>>>> [ 2168.593618]  S           loop3  1994     99133.655903        70
> > > >>>>>>>> 100         0.000000         1.137468         0.000000 2 0 /
> > > >>>>>>>> [ 2168.593625]  S           snapd  3161        15.296181       166
> > > >>>>>>>> 120         0.000000        90.296991         0.000000 2 0
> > > >>>>>>>> /system.slice/snapd.service
> > > >>>>>>>> [ 2168.593631]  S           snapd  3198        10.047573        49
> > > >>>>>>>> 120         0.000000         5.646247         0.000000 2 0
> > > >>>>>>>> /system.slice/snapd.service
> > > >>>>>>>> [ 2168.593639]  S            java  2446       970.743682       301
> > > >>>>>>>> 120         0.000000       101.648659         0.000000 2 0
> > > >>>>>>>> /system.slice/stor_tomcat.service
> > > >>>>>>>> [ 2168.593645]  S C1 CompilerThre  2573      1033.157689      3636
> > > >>>>>>>> 120         0.000000       615.256247         0.000000 2 0
> > > >>>>>>>> /system.slice/stor_tomcat.service
> > > >>>>>>>> [ 2168.593654]  D              mv  3814      2263.816953    186734
> > > >>>>>>>> 120         0.000000     30087.917319         0.000000 2 0 /user.slice
> > > >>>>>>>>
> > > >>>>>>>> any idea what is going on and how to fix this ?
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> thx.
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> --
> > > >>>>>>> Damien Le Moal
> > > >>>>>>> Western Digital Research
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >>>> Damien Le Moal
> > > >>>> Western Digital Research
> > > >>
> > > >>
> > > >> --
> > > >> Damien Le Moal
> > > >> Western Digital Research
> > >
> > >
> > > --
> > > Damien Le Moal
> > > Western Digital Research

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Host managed SMR drive issue
  2021-09-28  7:10                     ` Naohiro Aota
@ 2021-09-28 11:49                       ` Sven Oehme
  2021-09-30  9:55                         ` Johannes Thumshirn
  0 siblings, 1 reply; 23+ messages in thread
From: Sven Oehme @ 2021-09-28 11:49 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: Damien Le Moal, linux-btrfs, Johannes Thumshirn

The host should have plenty of memory. It still hangs right now, and
here is what free reports:

root@01:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:         257790       12557       30211       76367      215021      166105
Swap:         40959         452       40507
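
A rough sketch for also capturing the dirty/writeback state of the page
cache while the copy is stalled (plain /proc reads; the writer was seen
blocked in balance_dirty_pages() earlier in this thread):

# snapshot the dirty/writeback page counters and the writeback thresholds
grep -E '^(Dirty|Writeback):' /proc/meminfo
cat /proc/sys/vm/dirty_background_ratio /proc/sys/vm/dirty_ratio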

On Tue, Sep 28, 2021 at 1:10 AM Naohiro Aota <Naohiro.Aota@wdc.com> wrote:
>
> On Mon, Sep 27, 2021 at 05:36:55PM -0600, Sven Oehme wrote:
> > I should add that the copy process starts and runs for a while, but
> > then stalls. The last 2 times I started over from scratch it stalls at
> > around 200 GB, so ~ 1-2 files into the copy process.
>
> Thank you for the detailed info. I'm also working on reproducing the
> issue.
>
> BTW, how is the memory usage at the time of the hang? Can I have
> "free -m" output?
>
> > Sven
> >
> > On Mon, Sep 27, 2021 at 5:34 PM Sven Oehme <oehmes@gmail.com> wrote:
> > >
> > > The workload is as simple as one can imagine:
> > >
> > > scp -rp /space/01/ /space/20/
> > > where /space/01 is an ext4 filesystem with several >100 GB files in it,
> > > and /space/20 is the btrfs destination.
> > >
> > > It's single-threaded, nothing special.
> > >
> > > Sven
> > >
> > > On Mon, Sep 27, 2021 at 5:31 PM Damien Le Moal
> > > <damien.lemoal@opensource.wdc.com> wrote:
> > > >
> > > > On 2021/09/28 8:17, Sven Oehme wrote:
> > > > > yes i do :
> > > > >
> > > > > root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/queued
> > > > > 3760
> > > >
> > > > Can you give more information on the workload ? How can we recreate this ?
> > > >
> > > > >
> > > > > On Mon, Sep 27, 2021 at 5:01 PM Damien Le Moal
> > > > > <damien.lemoal@opensource.wdc.com> wrote:
> > > > >>
> > > > >> On 2021/09/28 7:56, Sven Oehme wrote:
> > > > >>> Hi,
> > > > >>>
> > > > >>> i tried above  :
> > > > >>>
> > > > >>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatched
> > > > >>>        0        89
> > > > >>>        1        1462
> > > > >>>        2        1
> > > > >>>        4        574
> > > > >>>        8        0
> > > > >>>       16        0
> > > > >>>       32+       0
> > > > >>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatch
> > > > >>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatched
> > > > >>>        0        89
> > > > >>>        1        1462
> > > > >>>        2        1
> > > > >>>        4        574
> > > > >>>        8        0
> > > > >>>       16        0
> > > > >>>       32+       0
> > > > >>> root@01:~# cat  /sys/kernel/debug/block/sdr/hctx0/dispatch_busy
> > > > >>> 0
> > > > >>
> > > > >> Do you have the queued count too ? If there is a difference with dispatch, it
> > > > >> would mean that some IOs are stuck in the stack.
> > > > >>
> > > > >>>
> > > > >>> echo 1 > /sys/kernel/debug/block/sdr/hctx0/run
> > > > >>>
> > > > >>> doesn't make any progress, still no i/o to the drive
> > > > >>>
> > > > >>> Sven
> > > > >>>
> > > > >>>
> > > > >>> On Mon, Sep 27, 2021 at 4:48 PM Damien Le Moal
> > > > >>> <damien.lemoal@opensource.wdc.com> wrote:
> > > > >>>>
> > > > >>>> On 2021/09/28 7:38, Sven Oehme wrote:
> > > > >>>>> i tried to repeat the test with FW19, same result :
> > > > >>>>
> > > > >>>> The problem is likely not rooted with the HBA fw version.
> > > > >>>> How do you create the problem ? Is it an fio script you are running ?
> > > > >>>>
> > > > >>>>>
> > > > >>>>> [Mon Sep 27 15:41:22 2021] INFO: task btrfs-transacti:4206 blocked for
> > > > >>>>> more than 604 seconds.
> > > > >>>>> [Mon Sep 27 15:41:22 2021]       Not tainted 5.14-test #1
> > > > >>>>> [Mon Sep 27 15:41:22 2021] "echo 0 >
> > > > >>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > >>>>> [Mon Sep 27 15:41:22 2021] task:btrfs-transacti state:D stack:    0
> > > > >>>>> pid: 4206 ppid:     2 flags:0x00004000
> > > > >>>>> [Mon Sep 27 15:41:22 2021] Call Trace:
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  __schedule+0x2fa/0x910
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  schedule+0x4f/0xc0
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  io_schedule+0x46/0x70
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  blk_mq_get_tag+0x11b/0x270
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  ? wait_woken+0x80/0x80
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  __blk_mq_alloc_request+0xec/0x120
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  blk_mq_submit_bio+0x12f/0x580
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  submit_bio_noacct+0x42a/0x4f0
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  submit_bio+0x4f/0x1b0
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  btrfs_submit_metadata_bio+0x10f/0x170 [btrfs]
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  submit_one_bio+0x67/0x80 [btrfs]
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  btree_write_cache_pages+0x6e8/0x770 [btrfs]
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  btree_writepages+0x5f/0x70 [btrfs]
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  do_writepages+0x38/0xc0
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  __filemap_fdatawrite_range+0xcc/0x110
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  filemap_fdatawrite_range+0x13/0x20
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  btrfs_write_marked_extents+0x66/0x140 [btrfs]
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  btrfs_write_and_wait_transaction+0x49/0xd0 [btrfs]
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  btrfs_commit_transaction+0x6b0/0xaa0 [btrfs]
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  ? start_transaction+0xcf/0x5a0 [btrfs]
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  transaction_kthread+0x138/0x1b0 [btrfs]
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  kthread+0x12f/0x150
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  ?
> > > > >>>>> btrfs_cleanup_transaction.isra.0+0x560/0x560 [btrfs]
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  ? __kthread_bind_mask+0x70/0x70
> > > > >>>>> [Mon Sep 27 15:41:22 2021]  ret_from_fork+0x22/0x30
> > > > >>>>>
> > > > >>>>> If you tell me what information to collect, I am happy to do so.
> > > > >>>>
> > > > >>>> The stack seems to point to a "hang" in the block layer so btrfs side waits forever.
> > > > >>>>
> > > > >>>> When you get the hang, can you check the queued and dispatch counters in
> > > > >>>> /sys/kernel/debug/block/<your disk>/hctx0 ?
> > > > >>>>
> > > > >>>> Once you have the numbers, try:
> > > > >>>>
> > > > >>>> echo 1 > /sys/kernel/debug/block/<your disk>/hctx0/run
> > > > >>>>
> > > > >>>> to see if the drive gets unstuck.
> > > > >>>>
> > > > >>>>>
> > > > >>>>> Sven
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> On Mon, Sep 27, 2021 at 11:28 AM Sven Oehme <oehmes@gmail.com> wrote:
> > > > >>>>>>
> > > > >>>>>> Hi,
> > > > >>>>>>
> > > > >>>>>> I also have an  Adaptec HBA 1100-4i at FW level 4.11 (latest) , which
> > > > >>>>>> according to https://ask.adaptec.com/app/answers/detail/a_id/17472 is
> > > > >>>>>> supported but see the exact same hangs after a few minutes ...
> > > > >>>>>>
> > > > >>>>>> what i see in dmesg is :
> > > > >>>>>>
> > > > >>>>>> [Mon Sep 27 11:20:03 2021] INFO: task kworker/u102:6:190092 blocked
> > > > >>>>>> for more than 120 seconds.
> > > > >>>>>> [Mon Sep 27 11:20:03 2021]       Not tainted 5.14-test #1
> > > > >>>>>> [Mon Sep 27 11:20:03 2021] "echo 0 >
> > > > >>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > >>>>>> [Mon Sep 27 11:20:03 2021] task:kworker/u102:6  state:D stack:    0
> > > > >>>>>> pid:190092 ppid:     2 flags:0x00004000
> > > > >>>>>> [Mon Sep 27 11:20:03 2021] Workqueue: btrfs-worker-high
> > > > >>>>>> btrfs_work_helper [btrfs]
> > > > >>>>>> [Mon Sep 27 11:20:03 2021] Call Trace:
> > > > >>>>>> [Mon Sep 27 11:20:03 2021]  __schedule+0x2fa/0x910
> > > > >>>>>> [Mon Sep 27 11:20:03 2021]  schedule+0x4f/0xc0
> > > > >>>>>> [Mon Sep 27 11:20:03 2021]  io_schedule+0x46/0x70
> > > > >>>>>> [Mon Sep 27 11:20:03 2021]  blk_mq_get_tag+0x11b/0x270
> > > > >>>>>> [Mon Sep 27 11:20:03 2021]  ? wait_woken+0x80/0x80
> > > > >>>>>> [Mon Sep 27 11:20:03 2021]  __blk_mq_alloc_request+0xec/0x120
> > > > >>>>>> [Mon Sep 27 11:20:03 2021]  blk_mq_submit_bio+0x12f/0x580
> > > > >>>>>> [Mon Sep 27 11:20:03 2021]  submit_bio_noacct+0x42a/0x4f0
> > > > >>>>>> [Mon Sep 27 11:20:03 2021]  submit_bio+0x4f/0x1b0
> > > > >>>>>> [Mon Sep 27 11:20:03 2021]  btrfs_map_bio+0x1a3/0x4c0 [btrfs]
> > > > >>>>>> [Mon Sep 27 11:20:03 2021]  run_one_async_done+0x3b/0x70 [btrfs]
> > > > >>>>>> [Mon Sep 27 11:20:03 2021]  btrfs_work_helper+0x132/0x2e0 [btrfs]
> > > > >>>>>> [Mon Sep 27 11:20:03 2021]  process_one_work+0x220/0x3c0
> > > > >>>>>> [Mon Sep 27 11:20:03 2021]  worker_thread+0x53/0x420
> > > > >>>>>> [Mon Sep 27 11:20:03 2021]  kthread+0x12f/0x150
> > > > >>>>>> [Mon Sep 27 11:20:03 2021]  ? process_one_work+0x3c0/0x3c0
> > > > >>>>>> [Mon Sep 27 11:20:03 2021]  ? __kthread_bind_mask+0x70/0x70
> > > > >>>>>> [Mon Sep 27 11:20:03 2021]  ret_from_fork+0x22/0x30
> > > > >>>>>>
> > > > >>>>>> I will also downgrade the LSI adapter to FW 19, but I think this is
> > > > >>>>>> unrelated to the FW, as I can see this with 2 completely different
> > > > >>>>>> HBAs and 2 completely different drives.
> > > > >>>>>>
> > > > >>>>>> Sven
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> On Sun, Sep 26, 2021 at 5:19 PM Damien Le Moal
> > > > >>>>>> <damien.lemoal@opensource.wdc.com> wrote:
> > > > >>>>>>>
> > > > >>>>>>> On 2021/09/25 3:25, Sven Oehme wrote:
> > > > >>>>>>>> Hi,
> > > > >>>>>>>>
> > > > >>>>>>>> i am running into issues with Host Managed SMR drive testing. when i
> > > > >>>>>>>> try to copy or move a file to the btrfs filesystem it just hangs. i
> > > > >>>>>>>> tried multiple 5.12,5.13 as well as 5.14 all the way to 5.14.6 but the
> > > > >>>>>>>> issue still persists.
> > > > >>>>>>>>
> > > > >>>>>>>> here is the setup :
> > > > >>>>>>>>
> > > > >>>>>>>> I am using btrfs-progs-v5.14.1
> > > > >>>>>>>> device is a  Host Managed WDC  20TB SMR drive with firmware level C421
> > > > >>>>>>>> its connected via a HBA 9400-8i Tri-Mode Storage Adapter , latest 20.0 FW
> > > > >>>>>>>
> > > > >>>>>>> Beware of the Broadcom FW rev 20. We found problems with it: very slow zone
> > > > >>>>>>> command scheduling leading to command timeout is some cases. FW 19 does not seem
> > > > >>>>>>> to have this issue. But that is likely not the cause of the problem here.
> > > > >>>>>>>
> > > > >>>>>>> Is there anything of interest in dmesg ? Any IO errors ?
> > > > >>>>>>>
> > > > >>>>>>> Naohiro, Johannes,
> > > > >>>>>>>
> > > > >>>>>>> Any idea ?
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>> I am using the /dev/sd device direct  , no lvm or device mapper or
> > > > >>>>>>>> anything else in between
> > > > >>>>>>>>
> > > > >>>>>>>> after a few seconds, sometimes minutes data rate to the drive drops to
> > > > >>>>>>>> 0 and 1 or 2 cores on my system show 100% IO wait time, but no longer
> > > > >>>>>>>> make any progress
> > > > >>>>>>>>
> > > > >>>>>>>> the process in question has the following stack :
> > > > >>>>>>>>
> > > > >>>>>>>> [ 2168.589160] task:mv              state:D stack:    0 pid: 3814
> > > > >>>>>>>> ppid:  3679 flags:0x00004000
> > > > >>>>>>>> [ 2168.589162] Call Trace:
> > > > >>>>>>>> [ 2168.589163]  __schedule+0x2fa/0x910
> > > > >>>>>>>> [ 2168.589166]  schedule+0x4f/0xc0
> > > > >>>>>>>> [ 2168.589168]  schedule_timeout+0x8a/0x140
> > > > >>>>>>>> [ 2168.589171]  ? __bpf_trace_tick_stop+0x10/0x10
> > > > >>>>>>>> [ 2168.589173]  io_schedule_timeout+0x51/0x80
> > > > >>>>>>>> [ 2168.589176]  balance_dirty_pages+0x2fa/0xe30
> > > > >>>>>>>> [ 2168.589179]  ? __mod_lruvec_state+0x3a/0x50
> > > > >>>>>>>> [ 2168.589182]  balance_dirty_pages_ratelimited+0x2f9/0x3c0
> > > > >>>>>>>> [ 2168.589185]  btrfs_buffered_write+0x58e/0x7e0 [btrfs]
> > > > >>>>>>>> [ 2168.589226]  btrfs_file_write_iter+0x138/0x3e0 [btrfs]
> > > > >>>>>>>> [ 2168.589260]  ? ext4_file_read_iter+0x5b/0x180
> > > > >>>>>>>> [ 2168.589262]  new_sync_write+0x114/0x1a0
> > > > >>>>>>>> [ 2168.589265]  vfs_write+0x1c5/0x260
> > > > >>>>>>>> [ 2168.589267]  ksys_write+0x67/0xe0
> > > > >>>>>>>> [ 2168.589270]  __x64_sys_write+0x1a/0x20
> > > > >>>>>>>> [ 2168.589272]  do_syscall_64+0x40/0xb0
> > > > >>>>>>>> [ 2168.589275]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > > >>>>>>>> [ 2168.589277] RIP: 0033:0x7ffff7e91c27
> > > > >>>>>>>> [ 2168.589278] RSP: 002b:00007fffffffdc48 EFLAGS: 00000246 ORIG_RAX:
> > > > >>>>>>>> 0000000000000001
> > > > >>>>>>>> [ 2168.589280] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007ffff7e91c27
> > > > >>>>>>>> [ 2168.589281] RDX: 0000000000020000 RSI: 00007ffff79bd000 RDI: 0000000000000004
> > > > >>>>>>>> [ 2168.589282] RBP: 00007ffff79bd000 R08: 0000000000000000 R09: 0000000000000000
> > > > >>>>>>>> [ 2168.589283] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000004
> > > > >>>>>>>> [ 2168.589284] R13: 0000000000000004 R14: 00007ffff79bd000 R15: 0000000000020000
> > > > >>>>>>>>
> > > > >>>>>>>> and shows up under runnable tasks :
> > > > >>>>>>>>
> > > > >>>>>>>> [ 2168.593562] runnable tasks:
> > > > >>>>>>>> [ 2168.593562]  S            task   PID         tree-key  switches
> > > > >>>>>>>> prio     wait-time             sum-exec        sum-sleep
> > > > >>>>>>>> [ 2168.593563] -------------------------------------------------------------------------------------------------------------
> > > > >>>>>>>> [ 2168.593565]  S        cpuhp/13    92     88923.802487        19
> > > > >>>>>>>> 120         0.000000         0.292061         0.000000 2 0 /
> > > > >>>>>>>> [ 2168.593571]  S  idle_inject/13    93       -11.997255         3
> > > > >>>>>>>> 49         0.000000         0.005480         0.000000 2 0 /
> > > > >>>>>>>> [ 2168.593577]  S    migration/13    94       814.287531       551
> > > > >>>>>>>> 0         0.000000      1015.550514         0.000000 2 0 /
> > > > >>>>>>>> [ 2168.593582]  S    ksoftirqd/13    95     88762.317130        44
> > > > >>>>>>>> 120         0.000000         1.940252         0.000000 2 0 /
> > > > >>>>>>>> [ 2168.593588]  I    kworker/13:0    96        -9.031157         5
> > > > >>>>>>>> 120         0.000000         0.017423         0.000000 2 0 /
> > > > >>>>>>>> [ 2168.593593]  I   kworker/13:0H    97      3570.961886         5
> > > > >>>>>>>> 100         0.000000         0.034345         0.000000 2 0 /
> > > > >>>>>>>> [ 2168.593603]  I    kworker/13:1   400    101650.731913       578
> > > > >>>>>>>> 120         0.000000        10.110898         0.000000 2 0 /
> > > > >>>>>>>> [ 2168.593611]  I   kworker/13:1H  1015    101649.600698        65
> > > > >>>>>>>> 100         0.000000         1.443300         0.000000 2 0 /
> > > > >>>>>>>> [ 2168.593618]  S           loop3  1994     99133.655903        70
> > > > >>>>>>>> 100         0.000000         1.137468         0.000000 2 0 /
> > > > >>>>>>>> [ 2168.593625]  S           snapd  3161        15.296181       166
> > > > >>>>>>>> 120         0.000000        90.296991         0.000000 2 0
> > > > >>>>>>>> /system.slice/snapd.service
> > > > >>>>>>>> [ 2168.593631]  S           snapd  3198        10.047573        49
> > > > >>>>>>>> 120         0.000000         5.646247         0.000000 2 0
> > > > >>>>>>>> /system.slice/snapd.service
> > > > >>>>>>>> [ 2168.593639]  S            java  2446       970.743682       301
> > > > >>>>>>>> 120         0.000000       101.648659         0.000000 2 0
> > > > >>>>>>>> /system.slice/stor_tomcat.service
> > > > >>>>>>>> [ 2168.593645]  S C1 CompilerThre  2573      1033.157689      3636
> > > > >>>>>>>> 120         0.000000       615.256247         0.000000 2 0
> > > > >>>>>>>> /system.slice/stor_tomcat.service
> > > > >>>>>>>> [ 2168.593654]  D              mv  3814      2263.816953    186734
> > > > >>>>>>>> 120         0.000000     30087.917319         0.000000 2 0 /user.slice
> > > > >>>>>>>>
> > > > >>>>>>>> any idea what is going on and how to fix this ?
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> thx.
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> --
> > > > >>>>>>> Damien Le Moal
> > > > >>>>>>> Western Digital Research
> > > > >>>>
> > > > >>>>
> > > > >>>> --
> > > > >>>> Damien Le Moal
> > > > >>>> Western Digital Research
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Damien Le Moal
> > > > >> Western Digital Research
> > > >
> > > >
> > > > --
> > > > Damien Le Moal
> > > > Western Digital Research

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Host managed SMR drive issue
  2021-09-28  6:36                   ` Johannes Thumshirn
@ 2021-09-29 10:29                     ` Johannes Thumshirn
  0 siblings, 0 replies; 23+ messages in thread
From: Johannes Thumshirn @ 2021-09-29 10:29 UTC (permalink / raw)
  To: Sven Oehme, Damien Le Moal; +Cc: linux-btrfs, Naohiro Aota

On 28/09/2021 08:36, Johannes Thumshirn wrote:
> On 28/09/2021 01:34, Sven Oehme wrote:
>> The workload is as simple as one can imagine:
>>
>> scp -rp /space/01/ /space/20/
>> where /space/01 is an ext4 filesystem with several >100 GB files in it,
>> and /space/20 is the btrfs destination.
>>
>> It's single-threaded, nothing special.
> 
> Thanks for the info, I'm trying to recreate the issue locally.
> 

OK, unfortunately I'm not getting anywhere with my attempts to
reproduce the issue, but I have a hypothesis about what could be happening.

Can you do me a favor and try this:


echo 'r:myretprobe sd_zbc_prepare_zone_append $retval' >> /sys/kernel/debug/tracing/kprobe_events
echo 1 > /sys/kernel/debug/tracing/events/kprobes/myretprobe/enable
echo 1 > /sys/kernel/debug/tracing/tracing_on

and then re-run your copy process. Once the hang occurs, please dump
the trace buffer with
cat /sys/kernel/debug/tracing/trace > /tmp/trace.txt
so we can examine it.

I'm expecting we'll see a lot of 13s as the return value
(BLK_STS_ZONE_RESOURCE), which would mean the zone append emulation
in the SCSI stack can't lock the zone for writing and re-queues the
I/O to retry later. Which never really happens because...
I have no idea yet.
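
A quick way to summarize the captured return values could be a sketch
like the one below; it assumes the default kretprobe output format,
where $retval is printed as "arg1=" in hex, so a value of 13 would
show up as 0xd:

# count how often each return value of sd_zbc_prepare_zone_append was seen
grep myretprobe /sys/kernel/debug/tracing/trace | \
    grep -o 'arg1=0x[0-9a-f]*' | sort | uniq -c | sort -rn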

Fingers crossed my hypothesis is correct so we know where to 
start looking.

Thanks,
	Johannes

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Host managed SMR drive issue
  2021-09-28 11:49                       ` Sven Oehme
@ 2021-09-30  9:55                         ` Johannes Thumshirn
       [not found]                           ` <CALssuR1wcChWwLt1wVoxhf=ufWdKtBHa7FXn-m9mkJcpPGbfOg@mail.gmail.com>
  0 siblings, 1 reply; 23+ messages in thread
From: Johannes Thumshirn @ 2021-09-30  9:55 UTC (permalink / raw)
  To: Sven Oehme, Naohiro Aota; +Cc: Damien Le Moal, linux-btrfs

On 28/09/2021 13:49, Sven Oehme wrote:
> The host should have plenty of memory. It still hangs right now, and
> here is what free reports:
> 
> root@01:~$ free -m
>               total        used        free      shared  buff/cache   available
> Mem:         257790       12557       30211       76367      215021      166105
> Swap:         40959         452       40507

OK, Naohiro has managed to reproduce your problem, and while we were
dining we found that a) the scheduler tags are exhausted, b) the SCSI
zone append emulation has (two) invalid entries in its write pointer
offset cache, and c) we have seen blocked instances of ata_id.

Maybe (just maybe) ata_id is doing an ioctl on the drive which goes 
down the route:
sd_open()
`-> sd_revalidate_disk()
    `-> sd_zbc_revalidate_disk()
        `-> sd_zbc_revalidate_zones()
            `-> blk_revalidate_disk_zones()
                `-> sd_zbc_revalidate_zones_cb()

while IO is ongoing or completions are being processed. Both access
struct scsi_disk::zones_wp_offset, but sd_zbc_revalidate_zones_cb()
does so without holding struct scsi_disk::zones_wp_offset_lock.

This will then corrupt the zones_wp_offset array.

Can you check whether the following patch makes any difference for you?

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index ed06798983f8..e04f55dde70b 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -694,8 +694,11 @@ void sd_zbc_release_disk(struct scsi_disk *sdkp)
 static void sd_zbc_revalidate_zones_cb(struct gendisk *disk)
 {
        struct scsi_disk *sdkp = scsi_disk(disk);
+       unsigned long flags;
 
+       spin_lock_irqsave(&sdkp->zones_wp_offset_lock, flags);
        swap(sdkp->zones_wp_offset, sdkp->rev_wp_offset);
+       spin_unlock_irqrestore(&sdkp->zones_wp_offset_lock, flags);
 }
 
 int sd_zbc_revalidate_zones(struct scsi_disk *sdkp)
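
If the patch is applied, re-doing the earlier debugfs check should show
whether requests still pile up without being dispatched; a rough sketch,
reusing the sdr device name from the earlier mails:

# compare queued vs. dispatched requests for each hardware queue
grep . /sys/kernel/debug/block/sdr/hctx*/queued
grep . /sys/kernel/debug/block/sdr/hctx*/dispatched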

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Host managed SMR drive issue
       [not found]                                       ` <CH0PR14MB50764AC8659E9378C65D6DE8F4B19@CH0PR14MB5076.namprd14.prod.outlook.com>
@ 2021-10-11  6:32                                         ` Johannes Thumshirn
  2021-10-11 10:23                                           ` David Sterba
  0 siblings, 1 reply; 23+ messages in thread
From: Johannes Thumshirn @ 2021-10-11  6:32 UTC (permalink / raw)
  To: Sven Oehme, Naohiro Aota
  Cc: Damien Le Moal, linux-btrfs @ vger . kernel . org

[ +Cc linux-btrfs ]

On 07/10/2021 14:18, Sven Oehme wrote:
> Thu Oct  7 06:09:30 2021] ------------[ cut here ]------------
> 
> [Thu Oct  7 06:09:30 2021] BTRFS: Transaction aborted (error -22)
> 
> [Thu Oct  7 06:09:30 2021] WARNING: CPU: 8 PID: 73260 at fs/btrfs/ctree.c:491 
> __btrfs_cow_block+0x3d8/0x5a0 [btrfs]
> 
> [Thu Oct  7 06:09:30 2021] Modules linked in: vhost_vsock 
> vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock xt_conntrack 
> nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack 
> nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user nft_counter rpcsec_gss_krb5 nfsv4 nfs 
> fscache netfs xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp 
> llc cuse overlay nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc 
> scsi_dh_alua intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd 
> kvm rapl efi_pstore ipmi_ssif joydev input_leds ftdi_sio usbserial ccp k10temp 
> acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel nfsd 
> auth_rpcgss nfs_acl lockd grace msr sunrpc ip_tables x_tables autofs4 btrfs 
> blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy 
> async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 
> ses enclosure hid_lenovo hid_generic usbhid hid ast drm_vram_helper 
> drm_ttm_helper ttm crct10dif_pclmul drm_kms_helper
> 
> [Thu Oct  7 06:09:30 2021]  crc32_pclmul syscopyarea ghash_clmulni_intel 
> sysfillrect sysimgblt aesni_intel fb_sys_fops crypto_simd cec cryptd rc_core 
> nvme mpt3sas ahci drm ixgbe raid_class libahci igb smartpqi nvme_core xhci_pci 
> xfrm_algo scsi_transport_sas xhci_pci_renesas i2c_piix4 dca i2c_algo_bit mdio
> 
> [Thu Oct  7 06:09:30 2021] CPU: 8 PID: 73260 Comm: kworker/u98:0 Not tainted 
> 5.13.0-rc6-fix2 #3

Hi sorry for the late reply, I've been to a conference.

This is already fixed in current btrfs misc-next, but not sure if the fix is 
already in Linus' tree.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Host managed SMR drive issue
  2021-10-11  6:32                                         ` Johannes Thumshirn
@ 2021-10-11 10:23                                           ` David Sterba
  2021-10-11 10:28                                             ` Johannes Thumshirn
  0 siblings, 1 reply; 23+ messages in thread
From: David Sterba @ 2021-10-11 10:23 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Sven Oehme, Naohiro Aota, Damien Le Moal,
	linux-btrfs @ vger . kernel . org

On Mon, Oct 11, 2021 at 06:32:35AM +0000, Johannes Thumshirn wrote:
> [ +Cc linux-btrfs ]
> 
> On 07/10/2021 14:18, Sven Oehme wrote:
> > Thu Oct  7 06:09:30 2021] ------------[ cut here ]------------
> > 
> > [Thu Oct  7 06:09:30 2021] BTRFS: Transaction aborted (error -22)
> > 
> > [Thu Oct  7 06:09:30 2021] WARNING: CPU: 8 PID: 73260 at fs/btrfs/ctree.c:491 
> > __btrfs_cow_block+0x3d8/0x5a0 [btrfs]
> > 
> > [Thu Oct  7 06:09:30 2021] Modules linked in: vhost_vsock 
> > vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock xt_conntrack 
> > nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack 
> > nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user nft_counter rpcsec_gss_krb5 nfsv4 nfs 
> > fscache netfs xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp 
> > llc cuse overlay nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc 
> > scsi_dh_alua intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd 
> > kvm rapl efi_pstore ipmi_ssif joydev input_leds ftdi_sio usbserial ccp k10temp 
> > acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel nfsd 
> > auth_rpcgss nfs_acl lockd grace msr sunrpc ip_tables x_tables autofs4 btrfs 
> > blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy 
> > async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 
> > ses enclosure hid_lenovo hid_generic usbhid hid ast drm_vram_helper 
> > drm_ttm_helper ttm crct10dif_pclmul drm_kms_helper
> > 
> > [Thu Oct  7 06:09:30 2021]  crc32_pclmul syscopyarea ghash_clmulni_intel 
> > sysfillrect sysimgblt aesni_intel fb_sys_fops crypto_simd cec cryptd rc_core 
> > nvme mpt3sas ahci drm ixgbe raid_class libahci igb smartpqi nvme_core xhci_pci 
> > xfrm_algo scsi_transport_sas xhci_pci_renesas i2c_piix4 dca i2c_algo_bit mdio
> > 
> > [Thu Oct  7 06:09:30 2021] CPU: 8 PID: 73260 Comm: kworker/u98:0 Not tainted 
> > 5.13.0-rc6-fix2 #3
> 
> Hi sorry for the late reply, I've been to a conference.
> 
> This is already fixed in current btrfs misc-next, but not sure if the fix is 
> already in Linus' tree.

Which fix is that please?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Host managed SMR drive issue
  2021-10-11 10:23                                           ` David Sterba
@ 2021-10-11 10:28                                             ` Johannes Thumshirn
  2021-10-11 13:06                                               ` David Sterba
  0 siblings, 1 reply; 23+ messages in thread
From: Johannes Thumshirn @ 2021-10-11 10:28 UTC (permalink / raw)
  To: dsterba
  Cc: Sven Oehme, Naohiro Aota, Damien Le Moal,
	linux-btrfs @ vger . kernel . org

On 11/10/2021 12:23, David Sterba wrote:
> On Mon, Oct 11, 2021 at 06:32:35AM +0000, Johannes Thumshirn wrote:
>> [ +Cc linux-btrfs ]
>>
>> On 07/10/2021 14:18, Sven Oehme wrote:
>>> Thu Oct  7 06:09:30 2021] ------------[ cut here ]------------
>>>
>>> [Thu Oct  7 06:09:30 2021] BTRFS: Transaction aborted (error -22)
>>>
>>> [Thu Oct  7 06:09:30 2021] WARNING: CPU: 8 PID: 73260 at fs/btrfs/ctree.c:491 
>>> __btrfs_cow_block+0x3d8/0x5a0 [btrfs]
>>>
>>> [Thu Oct  7 06:09:30 2021] Modules linked in: vhost_vsock 
>>> vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock xt_conntrack 
>>> nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack 
>>> nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user nft_counter rpcsec_gss_krb5 nfsv4 nfs 
>>> fscache netfs xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp 
>>> llc cuse overlay nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc 
>>> scsi_dh_alua intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd 
>>> kvm rapl efi_pstore ipmi_ssif joydev input_leds ftdi_sio usbserial ccp k10temp 
>>> acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel nfsd 
>>> auth_rpcgss nfs_acl lockd grace msr sunrpc ip_tables x_tables autofs4 btrfs 
>>> blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy 
>>> async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 
>>> ses enclosure hid_lenovo hid_generic usbhid hid ast drm_vram_helper 
>>> drm_ttm_helper ttm crct10dif_pclmul drm_kms_helper
>>>
>>> [Thu Oct  7 06:09:30 2021]  crc32_pclmul syscopyarea ghash_clmulni_intel 
>>> sysfillrect sysimgblt aesni_intel fb_sys_fops crypto_simd cec cryptd rc_core 
>>> nvme mpt3sas ahci drm ixgbe raid_class libahci igb smartpqi nvme_core xhci_pci 
>>> xfrm_algo scsi_transport_sas xhci_pci_renesas i2c_piix4 dca i2c_algo_bit mdio
>>>
>>> [Thu Oct  7 06:09:30 2021] CPU: 8 PID: 73260 Comm: kworker/u98:0 Not tainted 
>>> 5.13.0-rc6-fix2 #3
>>
>> Hi sorry for the late reply, I've been to a conference.
>>
>> This is already fixed in current btrfs misc-next, but not sure if the fix is 
>> already in Linus' tree.
> 
> Which fix is that please?
> 

The series from 
c4153d4049f2 ("btrfs: introduce btrfs_is_data_reloc_root")
to 
98e467cfa9c2 ("btrfs: zoned: let the for_treelog test in the allocator stand out")

As far as I can see they're only in btrfs-misc-next as of now.

But I think the series is a bit too big in terms of code change to go into
5.15 this late in the cycle.
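
For reference, one way to check whether that series is present in a given
tree would be something like the sketch below (the tree path is only a
placeholder, and both commits must be reachable locally):

# list the series if it is part of the local tree
git -C ~/linux log --oneline c4153d4049f2^..98e467cfa9c2
# or test whether the last patch of the series is an ancestor of HEAD
git -C ~/linux merge-base --is-ancestor 98e467cfa9c2 HEAD && \
    echo "series present" || echo "series not present"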

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Host managed SMR drive issue
  2021-10-11 10:28                                             ` Johannes Thumshirn
@ 2021-10-11 13:06                                               ` David Sterba
  0 siblings, 0 replies; 23+ messages in thread
From: David Sterba @ 2021-10-11 13:06 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: dsterba, Sven Oehme, Naohiro Aota, Damien Le Moal,
	linux-btrfs @ vger . kernel . org

On Mon, Oct 11, 2021 at 10:28:38AM +0000, Johannes Thumshirn wrote:
> On 11/10/2021 12:23, David Sterba wrote:
> > On Mon, Oct 11, 2021 at 06:32:35AM +0000, Johannes Thumshirn wrote:
> >> [ +Cc linux-btrfs ]
> >>
> >> On 07/10/2021 14:18, Sven Oehme wrote:
> >>> Thu Oct  7 06:09:30 2021] ------------[ cut here ]------------
> >>>
> >>> [Thu Oct  7 06:09:30 2021] BTRFS: Transaction aborted (error -22)
> >>>
> >>> [Thu Oct  7 06:09:30 2021] WARNING: CPU: 8 PID: 73260 at fs/btrfs/ctree.c:491 
> >>> __btrfs_cow_block+0x3d8/0x5a0 [btrfs]
> >>>
> >>> [Thu Oct  7 06:09:30 2021] Modules linked in: vhost_vsock 
> >>> vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock xt_conntrack 
> >>> nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack 
> >>> nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user nft_counter rpcsec_gss_krb5 nfsv4 nfs 
> >>> fscache netfs xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp 
> >>> llc cuse overlay nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc 
> >>> scsi_dh_alua intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd 
> >>> kvm rapl efi_pstore ipmi_ssif joydev input_leds ftdi_sio usbserial ccp k10temp 
> >>> acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel nfsd 
> >>> auth_rpcgss nfs_acl lockd grace msr sunrpc ip_tables x_tables autofs4 btrfs 
> >>> blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy 
> >>> async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 
> >>> ses enclosure hid_lenovo hid_generic usbhid hid ast drm_vram_helper 
> >>> drm_ttm_helper ttm crct10dif_pclmul drm_kms_helper
> >>>
> >>> [Thu Oct  7 06:09:30 2021]  crc32_pclmul syscopyarea ghash_clmulni_intel 
> >>> sysfillrect sysimgblt aesni_intel fb_sys_fops crypto_simd cec cryptd rc_core 
> >>> nvme mpt3sas ahci drm ixgbe raid_class libahci igb smartpqi nvme_core xhci_pci 
> >>> xfrm_algo scsi_transport_sas xhci_pci_renesas i2c_piix4 dca i2c_algo_bit mdio
> >>>
> >>> [Thu Oct  7 06:09:30 2021] CPU: 8 PID: 73260 Comm: kworker/u98:0 Not tainted 
> >>> 5.13.0-rc6-fix2 #3
> >>
> >> Hi sorry for the late reply, I've been to a conference.
> >>
> >> This is already fixed in current btrfs misc-next, but not sure if the fix is 
> >> already in Linus' tree.
> > 
> > Which fix is that please?
> > 
> 
> The series from 
> c4153d4049f2 ("btrfs: introduce btrfs_is_data_reloc_root")
> to 
> 98e467cfa9c2 ("btrfs: zoned: let the for_treelog test in the allocator stand out")
> 
> As far as I can see they're only in btrfs-misc-next as of now.
> 
> But I think the series is a bit too big in terms of code change to go into
> 5.15 this late in the cycle.

The changes are more or less contained under the zoned mode (allocator
or if (zoned)), but the whole patchset size is larger than I'd consider
appropriate for rc5 time. I'll read the report again and give it another
thought.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2021-10-11 13:07 UTC | newest]

Thread overview: 23+ messages
2021-09-24 18:25 Host managed SMR drive issue Sven Oehme
2021-09-26 23:19 ` Damien Le Moal
2021-09-27 17:28   ` Sven Oehme
2021-09-27 22:36     ` Sven Oehme
2021-09-27 22:38     ` Sven Oehme
2021-09-27 22:48       ` Damien Le Moal
2021-09-27 22:56         ` Sven Oehme
2021-09-27 23:01           ` Damien Le Moal
2021-09-27 23:17             ` Sven Oehme
2021-09-27 23:31               ` Damien Le Moal
2021-09-27 23:34                 ` Sven Oehme
2021-09-27 23:36                   ` Sven Oehme
2021-09-27 23:38                     ` Damien Le Moal
2021-09-27 23:51                       ` Sven Oehme
2021-09-28  7:10                     ` Naohiro Aota
2021-09-28 11:49                       ` Sven Oehme
2021-09-30  9:55                         ` Johannes Thumshirn
     [not found]                           ` <CALssuR1wcChWwLt1wVoxhf=ufWdKtBHa7FXn-m9mkJcpPGbfOg@mail.gmail.com>
     [not found]                             ` <PH0PR04MB7416408BFBD494211A4D86B69BAA9@PH0PR04MB7416.namprd04.prod.outlook.com>
     [not found]                               ` <CALssuR1JZqaBLf_aCMRLm683cww66wc_p+hgCpCENMiQkVoSRg@mail.gmail.com>
     [not found]                                 ` <20211006154828.bqiwik2b6jaqxcck@naota-xeon>
     [not found]                                   ` <CALssuR3UeNmx0PnwUT8ZR0bOd9iAGjvgmv9u8yfHDnfpChKb2w@mail.gmail.com>
     [not found]                                     ` <20211007032239.iwrtygcavadvvb62@naota-xeon>
     [not found]                                       ` <CH0PR14MB50764AC8659E9378C65D6DE8F4B19@CH0PR14MB5076.namprd14.prod.outlook.com>
2021-10-11  6:32                                         ` Johannes Thumshirn
2021-10-11 10:23                                           ` David Sterba
2021-10-11 10:28                                             ` Johannes Thumshirn
2021-10-11 13:06                                               ` David Sterba
2021-09-28  6:36                   ` Johannes Thumshirn
2021-09-29 10:29                     ` Johannes Thumshirn
