* Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
@ 2022-02-19  5:02 ` Kyle Sanderson
  0 siblings, 0 replies; 49+ messages in thread
From: Kyle Sanderson @ 2022-02-19  5:02 UTC (permalink / raw)
  To: qat-linux, giovanni.cabiddu
  Cc: Linux-Kernal, linux-xfs, linux-crypto, dm-devel, Linus Torvalds

The A2SDi-8C-HLN4F has IQAT enabled by default. When xfs attempts to
use this device (through dm-crypt), the entire kernel thread stalls
forever. Multiple users have hit this over the years (through sporadic
reporting). I ended up trying ZFS, and encryption wasn't an issue there
at all, because I guess they don't use this device. Returning to sanity
(xfs), I was able to provision a dm-crypt volume on the disk with no
problem; however, running mkfs.xfs on the volume is what triggers the
cascading failure (each request kills a kthread). Disabling IQAT on the
south bridge results in a working system; however, this is not the
default configuration for the distribution of choice (Ubuntu 20.04.3
LTS), nor for the motherboard. I'm convinced this never worked properly,
based on the lack of popularity of kernel encryption (crypto) and the
embedded way SuperMicro has integrated this device in collaboration
with Intel, as it looks like the primary usage is through external
accelerator cards.

Kernels tried were from RHEL8 over a year ago, and this impacts the
entirety of the 5.4 series on Ubuntu.
Please CC me on replies as I'm not subscribed to all lists. CPU is C3758.

[  363.495058] INFO: task kworker/u16:0:8 blocked for more than 120 seconds.
[  363.495114]       Tainted: P           O      5.4.0-100-generic #113-Ubuntu
[  363.495155] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.495201] kworker/u16:0   D    0     8      2 0x80004000
[  363.495213] Workqueue: kcryptd/253:0 kcryptd_crypt [dm_crypt]
[  363.495214] Call Trace:
[  363.495223]  __schedule+0x2e3/0x740
[  363.495226]  schedule+0x42/0xb0
[  363.495228]  schedule_timeout+0x10e/0x160
[  363.495232]  ? skcipher_encrypt_ablkcipher+0x61/0x70
[  363.495233]  ? crypto_skcipher_encrypt+0x48/0x60
[  363.495236]  wait_for_completion+0xb1/0x120
[  363.495239]  ? wake_up_q+0x70/0x70
[  363.495242]  crypt_convert+0x144/0x1f0 [dm_crypt]
[  363.495245]  kcryptd_crypt+0x2b9/0x3b0 [dm_crypt]
[  363.495249]  process_one_work+0x1eb/0x3b0
[  363.495251]  worker_thread+0x4d/0x400
[  363.495254]  kthread+0x104/0x140
[  363.495256]  ? process_one_work+0x3b0/0x3b0
[  363.495257]  ? kthread_park+0x90/0x90
[  363.495260]  ret_from_fork+0x1f/0x40
[  363.495274] INFO: task kworker/u16:1:123 blocked for more than 120 seconds.
[  363.495317]       Tainted: P           O      5.4.0-100-generic #113-Ubuntu
[  363.495364] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.495410] kworker/u16:1   D    0   123      2 0x80004000
[  363.495415] Workqueue: kcryptd/253:0 kcryptd_crypt [dm_crypt]
[  363.495416] Call Trace:
[  363.495419]  __schedule+0x2e3/0x740
[  363.495422]  schedule+0x42/0xb0
[  363.495424]  schedule_timeout+0x10e/0x160
[  363.495426]  ? skcipher_encrypt_ablkcipher+0x61/0x70
[  363.495427]  ? crypto_skcipher_encrypt+0x48/0x60
[  363.495430]  wait_for_completion+0xb1/0x120
[  363.495431]  ? wake_up_q+0x70/0x70
[  363.495434]  crypt_convert+0x144/0x1f0 [dm_crypt]
[  363.495437]  kcryptd_crypt+0x2b9/0x3b0 [dm_crypt]
[  363.495441]  process_one_work+0x1eb/0x3b0
[  363.495443]  worker_thread+0x4d/0x400
[  363.495445]  kthread+0x104/0x140
[  363.495447]  ? process_one_work+0x3b0/0x3b0
[  363.495449]  ? kthread_park+0x90/0x90
[  363.495451]  ret_from_fork+0x1f/0x40
[  363.495457] INFO: task kworker/u16:2:153 blocked for more than 120 seconds.
[  363.495499]       Tainted: P           O      5.4.0-100-generic #113-Ubuntu
[  363.495539] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.495584] kworker/u16:2   D    0   153      2 0x80004000
[  363.495589] Workqueue: kcryptd/253:5 kcryptd_crypt [dm_crypt]
[  363.495590] Call Trace:
[  363.495593]  __schedule+0x2e3/0x740
[  363.495595]  schedule+0x42/0xb0
[  363.495597]  schedule_timeout+0x10e/0x160
[  363.495599]  ? skcipher_decrypt_ablkcipher+0x61/0x70
[  363.495601]  ? crypto_skcipher_decrypt+0x48/0x60
[  363.495603]  wait_for_completion+0xb1/0x120
[  363.495605]  ? wake_up_q+0x70/0x70
[  363.495608]  crypt_convert+0x144/0x1f0 [dm_crypt]
[  363.495611]  kcryptd_crypt+0xc6/0x3b0 [dm_crypt]
[  363.495613]  ? __switch_to+0x7f/0x480
[  363.495615]  ? switch_mm_irqs_off+0x19b/0x500
[  363.495618]  process_one_work+0x1eb/0x3b0
[  363.495621]  worker_thread+0x4d/0x400
[  363.495623]  kthread+0x104/0x140
[  363.495625]  ? process_one_work+0x3b0/0x3b0
[  363.495627]  ? kthread_park+0x90/0x90
[  363.495629]  ret_from_fork+0x1f/0x40
[  363.495636] INFO: task kworker/u16:5:279 blocked for more than 120 seconds.
[  363.495677]       Tainted: P           O      5.4.0-100-generic #113-Ubuntu
[  363.495717] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.495762] kworker/u16:5   D    0   279      2 0x80004000
[  363.495766] Workqueue: kcryptd/253:0 kcryptd_crypt [dm_crypt]
[  363.495767] Call Trace:
[  363.495771]  __schedule+0x2e3/0x740
[  363.495773]  schedule+0x42/0xb0
[  363.495775]  schedule_timeout+0x10e/0x160
[  363.495777]  ? skcipher_encrypt_ablkcipher+0x61/0x70
[  363.495778]  ? crypto_skcipher_encrypt+0x48/0x60
[  363.495781]  wait_for_completion+0xb1/0x120
[  363.495782]  ? wake_up_q+0x70/0x70
[  363.495785]  crypt_convert+0x144/0x1f0 [dm_crypt]
[  363.495788]  kcryptd_crypt+0x2b9/0x3b0 [dm_crypt]
[  363.495791]  process_one_work+0x1eb/0x3b0
[  363.495794]  worker_thread+0x4d/0x400
[  363.495796]  kthread+0x104/0x140
[  363.495798]  ? process_one_work+0x3b0/0x3b0
[  363.495800]  ? kthread_park+0x90/0x90
[  363.495802]  ret_from_fork+0x1f/0x40
[  363.495808] INFO: task kworker/u16:11:299 blocked for more than 120 seconds.
[  363.495849]       Tainted: P           O      5.4.0-100-generic #113-Ubuntu
[  363.495890] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.495935] kworker/u16:11  D    0   299      2 0x80004000
[  363.495939] Workqueue: kcryptd/253:0 kcryptd_crypt [dm_crypt]
[  363.495940] Call Trace:
[  363.495943]  __schedule+0x2e3/0x740
[  363.495946]  schedule+0x42/0xb0
[  363.495947]  schedule_timeout+0x10e/0x160
[  363.495949]  ? skcipher_encrypt_ablkcipher+0x61/0x70
[  363.495951]  ? crypto_skcipher_encrypt+0x48/0x60
[  363.495953]  wait_for_completion+0xb1/0x120
[  363.495955]  ? wake_up_q+0x70/0x70
[  363.495958]  crypt_convert+0x144/0x1f0 [dm_crypt]
[  363.495961]  kcryptd_crypt+0x2b9/0x3b0 [dm_crypt]
[  363.495964]  process_one_work+0x1eb/0x3b0
[  363.495966]  worker_thread+0x4d/0x400
[  363.495969]  kthread+0x104/0x140
[  363.495971]  ? process_one_work+0x3b0/0x3b0
[  363.495972]  ? kthread_park+0x90/0x90
[  363.495974]  ret_from_fork+0x1f/0x40
[  363.495977] INFO: task kworker/u16:12:300 blocked for more than 120 seconds.
[  363.496018]       Tainted: P           O      5.4.0-100-generic #113-Ubuntu
[  363.496058] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.496108] kworker/u16:12  D    0   300      2 0x80004000
[  363.496113] Workqueue: kcryptd/253:0 kcryptd_crypt [dm_crypt]
[  363.496114] Call Trace:
[  363.496117]  __schedule+0x2e3/0x740
[  363.496120]  schedule+0x42/0xb0
[  363.496121]  schedule_timeout+0x10e/0x160
[  363.496123]  ? skcipher_encrypt_ablkcipher+0x61/0x70
[  363.496125]  ? crypto_skcipher_encrypt+0x48/0x60
[  363.496127]  wait_for_completion+0xb1/0x120
[  363.496129]  ? wake_up_q+0x70/0x70
[  363.496132]  crypt_convert+0x144/0x1f0 [dm_crypt]
[  363.496134]  kcryptd_crypt+0x2b9/0x3b0 [dm_crypt]
[  363.496138]  process_one_work+0x1eb/0x3b0
[  363.496140]  worker_thread+0x4d/0x400
[  363.496142]  kthread+0x104/0x140
[  363.496144]  ? process_one_work+0x3b0/0x3b0
[  363.496146]  ? kthread_park+0x90/0x90
[  363.496148]  ret_from_fork+0x1f/0x40
[  363.496151] INFO: task kworker/u16:13:301 blocked for more than 120 seconds.
[  363.496193]       Tainted: P           O      5.4.0-100-generic #113-Ubuntu
[  363.496233] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.496278] kworker/u16:13  D    0   301      2 0x80004000
[  363.496282] Workqueue: kcryptd/253:0 kcryptd_crypt [dm_crypt]
[  363.496283] Call Trace:
[  363.496286]  __schedule+0x2e3/0x740
[  363.496289]  schedule+0x42/0xb0
[  363.496290]  schedule_timeout+0x10e/0x160
[  363.496292]  ? skcipher_encrypt_ablkcipher+0x61/0x70
[  363.496294]  ? crypto_skcipher_encrypt+0x48/0x60
[  363.496296]  wait_for_completion+0xb1/0x120
[  363.496298]  ? wake_up_q+0x70/0x70
[  363.496301]  crypt_convert+0x144/0x1f0 [dm_crypt]
[  363.496304]  kcryptd_crypt+0x2b9/0x3b0 [dm_crypt]
[  363.496307]  process_one_work+0x1eb/0x3b0
[  363.496310]  worker_thread+0x4d/0x400
[  363.496312]  kthread+0x104/0x140
[  363.496314]  ? process_one_work+0x3b0/0x3b0
[  363.496316]  ? kthread_park+0x90/0x90
[  363.496317]  ret_from_fork+0x1f/0x40
[  363.496320] INFO: task kworker/u16:14:302 blocked for more than 120 seconds.
[  363.496362]       Tainted: P           O      5.4.0-100-generic #113-Ubuntu
[  363.496402] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.496447] kworker/u16:14  D    0   302      2 0x80004000
[  363.496451] Workqueue: kcryptd/253:0 kcryptd_crypt [dm_crypt]
[  363.496452] Call Trace:
[  363.496455]  __schedule+0x2e3/0x740
[  363.496458]  schedule+0x42/0xb0
[  363.496459]  schedule_timeout+0x10e/0x160
[  363.496461]  ? skcipher_encrypt_ablkcipher+0x61/0x70
[  363.496463]  ? crypto_skcipher_encrypt+0x48/0x60
[  363.496465]  wait_for_completion+0xb1/0x120
[  363.496467]  ? wake_up_q+0x70/0x70
[  363.496470]  crypt_convert+0x144/0x1f0 [dm_crypt]
[  363.496473]  kcryptd_crypt+0x2b9/0x3b0 [dm_crypt]
[  363.496476]  process_one_work+0x1eb/0x3b0
[  363.496478]  worker_thread+0x4d/0x400
[  363.496481]  kthread+0x104/0x140
[  363.496483]  ? process_one_work+0x3b0/0x3b0
[  363.496484]  ? kthread_park+0x90/0x90
[  363.496486]  ret_from_fork+0x1f/0x40
[  363.496489] INFO: task kworker/u16:15:303 blocked for more than 120 seconds.
[  363.496531]       Tainted: P           O      5.4.0-100-generic #113-Ubuntu
[  363.496571] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.496616] kworker/u16:15  D    0   303      2 0x80004000
[  363.496620] Workqueue: kcryptd/253:0 kcryptd_crypt [dm_crypt]
[  363.496621] Call Trace:
[  363.496624]  __schedule+0x2e3/0x740
[  363.496627]  schedule+0x42/0xb0
[  363.496629]  schedule_timeout+0x10e/0x160
[  363.496630]  ? skcipher_encrypt_ablkcipher+0x61/0x70
[  363.496632]  ? crypto_skcipher_encrypt+0x48/0x60
[  363.496634]  wait_for_completion+0xb1/0x120
[  363.496636]  ? wake_up_q+0x70/0x70
[  363.496639]  crypt_convert+0x144/0x1f0 [dm_crypt]
[  363.496642]  kcryptd_crypt+0x2b9/0x3b0 [dm_crypt]
[  363.496645]  process_one_work+0x1eb/0x3b0
[  363.496647]  worker_thread+0x4d/0x400
[  363.496650]  kthread+0x104/0x140
[  363.496652]  ? process_one_work+0x3b0/0x3b0
[  363.496654]  ? kthread_park+0x90/0x90
[  363.496655]  ret_from_fork+0x1f/0x40
[  363.496713] INFO: task mergerfs:9760 blocked for more than 120 seconds.
[  363.496752]       Tainted: P           O      5.4.0-100-generic #113-Ubuntu
[  363.496793] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.496838] mergerfs        D    0  9760      1 0x00000000
[  363.496840] Call Trace:
[  363.496843]  __schedule+0x2e3/0x740
[  363.496846]  schedule+0x42/0xb0
[  363.496848]  schedule_timeout+0x10e/0x160
[  363.496851]  ? blk_finish_plug+0x26/0x40
[  363.496853]  wait_for_completion+0xb1/0x120
[  363.496855]  ? wake_up_q+0x70/0x70
[  363.496910]  ? __xfs_buf_submit+0x138/0x260 [xfs]
[  363.496950]  xfs_buf_iowait+0x26/0xe0 [xfs]
[  363.496990]  __xfs_buf_submit+0x138/0x260 [xfs]
[  363.497030]  _xfs_buf_read+0x27/0x30 [xfs]
[  363.497070]  xfs_buf_read_map+0x132/0x1d0 [xfs]
[  363.497073]  ? new_slab+0x4a/0x70
[  363.497117]  xfs_trans_read_buf_map+0xca/0x350 [xfs]
[  363.497155]  xfs_imap_to_bp+0x66/0xd0 [xfs]
[  363.497193]  xfs_iread+0x83/0x200 [xfs]
[  363.497234]  xfs_iget+0x214/0x9e0 [xfs]
[  363.497270]  ? xfs_da_compname+0x1d/0x30 [xfs]
[  363.497306]  ? xfs_dir2_sf_lookup+0xd0/0x200 [xfs]
[  363.497348]  xfs_lookup+0xe2/0x120 [xfs]
[  363.497390]  xfs_vn_lookup+0x72/0xb0 [xfs]
[  363.497393]  __lookup_slow+0x92/0x160
[  363.497395]  lookup_slow+0x3b/0x60
[  363.497397]  walk_component+0x1da/0x360
[  363.497399]  ? link_path_walk.part.0+0x2a2/0x550
[  363.497401]  path_lookupat.isra.0+0x80/0x230
[  363.497404]  filename_lookup+0xae/0x170
[  363.497407]  ? __check_object_size+0x13f/0x150
[  363.497409]  ? strncpy_from_user+0x4c/0x150
[  363.497412]  user_path_at_empty+0x3a/0x50
[  363.497414]  vfs_statx+0x7d/0xe0
[  363.497417]  __do_sys_newlstat+0x3e/0x80
[  363.497419]  ? vfs_read+0x12e/0x160
[  363.497420]  ? fput+0x13/0x20
[  363.497422]  ? ksys_read+0xce/0xe0
[  363.497424]  __x64_sys_newlstat+0x16/0x20
[  363.497427]  do_syscall_64+0x57/0x190
[  363.497429]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  363.497432] RIP: 0033:0x7f7f32c656ea
[  363.497438] Code: Bad RIP value.
[  363.497439] RSP: 002b:00007f7f31fea0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
[  363.497441] RAX: ffffffffffffffda RBX: 0000560953796248 RCX: 00007f7f32c656ea
[  363.497442] RDX: 00007f7f31fea110 RSI: 00007f7f31fea110 RDI: 00007f7f31fea100
[  363.497443] RBP: 00007f7f31fea120 R08: 0000000000000001 R09: 000000000000000a
[  363.497445] R10: 00007f7f14000b90 R11: 0000000000000246 R12: 00007f7f31fea220
[  363.497446] R13: 00007f7f14000b90 R14: 00007f7f31fea100 R15: 00007f7f31fea110
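
The hung-task reports above are repetitive, so it can help to condense them before reading individual traces. The following is a minimal sketch, not part of the original report, that groups such reports by task and workqueue. The function names (`summarize_hung_tasks`, `count_by_workqueue`) and the `SAMPLE` excerpt are illustrative assumptions; the regular expressions only target the line shapes visible in this log and may need adjusting for other kernels. Frames printed with a leading `?` are kept separately because the kernel marks them as unreliable.

```python
import re
from collections import Counter

# "INFO: task kworker/u16:0:8 blocked for more than 120 seconds."
TASK_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
# "Workqueue: kcryptd/253:0 kcryptd_crypt [dm_crypt]"
WQ_RE = re.compile(r"Workqueue: (\S+) (\S+)")
# "]  ? crypto_skcipher_encrypt+0x48/0x60" or "]  crypt_convert+0x144/0x1f0 [dm_crypt]"
FRAME_RE = re.compile(r"\]\s+(\? )?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Short excerpt in the same shape as the log above (for illustration only).
SAMPLE = """\
[  363.495058] INFO: task kworker/u16:0:8 blocked for more than 120 seconds.
[  363.495213] Workqueue: kcryptd/253:0 kcryptd_crypt [dm_crypt]
[  363.495214] Call Trace:
[  363.495232]  ? skcipher_encrypt_ablkcipher+0x61/0x70
[  363.495236]  wait_for_completion+0xb1/0x120
[  363.495242]  crypt_convert+0x144/0x1f0 [dm_crypt]
[  363.496713] INFO: task mergerfs:9760 blocked for more than 120 seconds.
[  363.496840] Call Trace:
[  363.496853]  wait_for_completion+0xb1/0x120
[  363.496950]  xfs_buf_iowait+0x26/0xe0 [xfs]
"""


def summarize_hung_tasks(log_text):
    """Return one dict per 'blocked for more than N seconds' report."""
    reports = []
    current = None
    for line in log_text.splitlines():
        m = TASK_RE.search(line)
        if m:
            current = {
                "task": m.group(1),
                "pid": int(m.group(2)),
                "workqueue": None,
                "frames": [],        # frames the unwinder considers reliable
                "maybe_frames": [],  # frames printed with a leading "?"
            }
            reports.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first report header
        m = WQ_RE.search(line)
        if m:
            current["workqueue"] = m.group(1)
            continue
        m = FRAME_RE.search(line)
        if m:
            key = "maybe_frames" if m.group(1) else "frames"
            current[key].append(m.group(2))
    return reports


def count_by_workqueue(reports):
    """Count reports per workqueue; tasks without one go under '(none)'."""
    return Counter(r["workqueue"] or "(none)" for r in reports)
```

Calling `count_by_workqueue(summarize_hung_tasks(text))` on a saved copy of the dmesg output gives a quick view of how many stalled workers belong to each kcryptd workqueue versus ordinary tasks such as mergerfs.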

* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-02-19  5:02 ` [dm-devel] " Kyle Sanderson
@ 2022-02-19 21:03   ` Dave Chinner
  -1 siblings, 0 replies; 49+ messages in thread
From: Dave Chinner @ 2022-02-19 21:03 UTC (permalink / raw)
  To: Kyle Sanderson
  Cc: qat-linux, giovanni.cabiddu, Linux-Kernal, linux-xfs,
	linux-crypto, dm-devel, Linus Torvalds

On Fri, Feb 18, 2022 at 09:02:28PM -0800, Kyle Sanderson wrote:
> A2SDi-8C-HLN4F has IQAT enabled by default, when this device is
> attempted to be used by xfs (through dm-crypt) the entire kernel
> thread stalls forever. Multiple users have hit this over the years
> (through sporadic reporting) - I ended up trying ZFS and encryption
> wasn't an issue there at all because I guess they don't use this
> device. Returning to sanity (xfs), I was able to provision a dm-crypt
> volume no problem on the disk, however when running mkfs.xfs on the
> volume is what triggers the cascading failure (each request kills a
> kthread).

Can you provide the full stack traces for these errors so we can see
exactly what this cascading failure looks like, please? In reality,
the stall messages some time after this are not interesting - it's
the first errors that cause the stall that need to be investigated.

A good idea would be to provide the full storage stack description
and hardware in use, as per:

https://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
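
As a rough illustration of gathering part of that information, here is a minimal sketch. It assumes the FAQ's list includes the kernel version, the CPU count, and the contents of /proc/meminfo, /proc/mounts and /proc/partitions; the function name `collect_report` and the `proc_root` parameter are hypothetical. It does not cover the rest of the storage stack (RAID/LVM layout, dm-crypt setup, xfs_info and dmesg output), which would still need to be collected by hand.

```python
import os
import platform
from pathlib import Path

# /proc files assumed to be among those the FAQ asks for.
PROC_FILES = ("meminfo", "mounts", "partitions")


def collect_report(proc_root="/proc"):
    """Gather basic system facts into a dict suitable for pasting into a report.

    proc_root is a parameter so the function can be pointed at a
    copy of /proc taken from another machine.
    """
    report = {
        "kernel": platform.release(),  # e.g. "5.4.0-100-generic"
        "cpus": os.cpu_count(),
    }
    for name in PROC_FILES:
        path = Path(proc_root) / name
        try:
            report[name] = path.read_text()
        except OSError:
            # File missing or unreadable: record that instead of failing.
            report[name] = "<unavailable>"
    return report
```

Printing each key and value of `collect_report()` produces a block of text that can be attached to a follow-up mail.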

> Disabling IQAT on the south bridge results in a working
> system, however this is not the default configuration for the
> distribution of choice (Ubuntu 20.04.3 LTS), nor the motherboard. I'm
> convinced this never worked properly based on the lack of popularity
> for kernel encryption (crypto), and the embedded nature that
> SuperMicro has integrated this device in collaboration with intel as
> it looks like the primary usage is through external accelerator cards.

This really sounds like broken hardware, not a kernel problem.

> Kernels tried were from RHEL8 over a year ago, and this impacts the
> entirety of the 5.4 series on Ubuntu.
> Please CC me on replies as I'm not subscribed to all lists. CPU is C3758.

[snip stalled kcryptd worker threads]

This implies a dmcrypt level problem - XFS can't make progress if
dmcrypt is not completing IOs.
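
dm-crypt obtains its ciphers through the kernel crypto API, which, when several implementations register the same algorithm name, generally prefers the one with the highest priority. One way to see which implementation would serve a given algorithm is to read /proc/crypto. The sketch below parses that file's "key : value" blocks; the function names are hypothetical, and the driver names, module names and priorities in `SAMPLE` are illustrative values, not output taken from the affected machine.

```python
# Illustrative excerpt in the /proc/crypto layout: blank-line separated
# blocks of "key : value" lines. The values are assumptions for the example.
SAMPLE = """\
name         : xts(aes)
driver       : qat_aes_xts
module       : intel_qat
priority     : 4001
type         : skcipher

name         : xts(aes)
driver       : xts-aes-aesni
module       : aesni_intel
priority     : 401
type         : skcipher

name         : sha256
driver       : sha256-generic
module       : kernel
priority     : 100
type         : shash
"""


def parse_proc_crypto(text):
    """Split /proc/crypto text into one dict per registered implementation."""
    entries = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:  # skip lines that are not "key : value"
                entry[key.strip()] = value.strip()
        if entry:
            entries.append(entry)
    return entries


def implementations(entries, name):
    """Implementations of algorithm `name`, highest priority first."""
    matches = [e for e in entries if e.get("name") == name]
    return sorted(matches, key=lambda e: int(e.get("priority", 0)), reverse=True)
```

On a live system, `implementations(parse_proc_crypto(open("/proc/crypto").read()), "xts(aes)")` lists the candidates for the XTS-AES algorithm name; if a QAT-backed driver sorts first, that is consistent with dm-crypt requests being routed to the accelerator.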

Where are the XFS corruption reports that the subject implies is
occurring?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-02-19 21:03   ` [dm-devel] " Dave Chinner
@ 2022-02-19 23:00     ` Kyle Sanderson
  -1 siblings, 0 replies; 49+ messages in thread
From: Kyle Sanderson @ 2022-02-19 23:00 UTC (permalink / raw)
  To: Dave Chinner
  Cc: qat-linux, giovanni.cabiddu, Linux-Kernal, linux-xfs,
	linux-crypto, dm-devel, Linus Torvalds, Greg KH,
	salvatore.benedetto, herbert, pablo.marcos.oltra

hi Dave,

> This really sounds like broken hardware, not a kernel problem.

It is indeed a hardware issue, specifically the intel qat crypto
driver that's in-tree - the hardware is fine (see below). The IQAT
errata documentation states that if a request is not submitted
properly it can stall the entire device. The remediation guidance from
2020 was "don't do that" and "don't allow unprivileged users access to
the device". The in-tree driver is not implemented properly either for
this SoC or board - I'm thinking it's related to QATE-7495.

https://01.org/sites/default/files/downloads//336211qatsoftwareforlinux-rn-hwversion1.7021.pdf

> This implies a dmcrypt level problem - XFS can't make progress if dmcrypt is not completing IOs.

That's the weird part about it. Some bio's are completing, others are
completely dropped, with some stalling forever. I had to use
xfs_repair to get the volumes operational again. I lost a good deal of
files and had to recover from backup after toggling the device back on
on a production system (silly, I know).

> Where are the XFS corruption reports that the subject implies is occurring?

I think you're right, it's dm-crypt that's broken here, with
ultimately the crypto driver causing this corruption. XFS being the
edge to the end-user is taking the brunt of it. There's reports going
back to late 2017 of significant issues with this mainlined stable
driver.

https://bugzilla.redhat.com/show_bug.cgi?id=1522962
https://serverfault.com/questions/1010108/luks-hangs-on-centos-running-on-atom-c3758-cpu
https://www.phoronix.com/forums/forum/software/distributions/1172231-fedora-33-s-enterprise-linux-next-effort-approved-testbed-for-raising-cpu-requirements-etc?p=1174560#post1174560

Any guidance would be appreciated.
Kyle.
On Sat, Feb 19, 2022 at 1:03 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Fri, Feb 18, 2022 at 09:02:28PM -0800, Kyle Sanderson wrote:
> > A2SDi-8C-HLN4F has IQAT enabled by default, when this device is
> > attempted to be used by xfs (through dm-crypt) the entire kernel
> > thread stalls forever. Multiple users have hit this over the years
> > (through sporadic reporting) - I ended up trying ZFS and encryption
> > wasn't an issue there at all because I guess they don't use this
> > device. Returning to sanity (xfs), I was able to provision a dm-crypt
> > volume no problem on the disk, however when running mkfs.xfs on the
> > volume is what triggers the cascading failure (each request kills a
> > kthread).
>
> Can you provide the full stack traces for these errors so we can see
> exactly what this cascading failure looks like, please? In reality,
> the stall messages some time after this are not interesting - it's
> the first errors that cause the stall that need to be investigated.
>
> A good idea would be to provide the full storage stack description
> and hardware in use, as per:
>
> https://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
>
> > Disabling IQAT on the south bridge results in a working
> > system, however this is not the default configuration for the
> > distribution of choice (Ubuntu 20.04.3 LTS), nor the motherboard. I'm
> > convinced this never worked properly based on the lack of popularity
> > for kernel encryption (crypto), and the embedded nature that
> > SuperMicro has integrated this device in collaboration with intel as
> > it looks like the primary usage is through external accelerator cards.
>
> This really sounds like broken hardware, not a kernel problem.
>
> > Kernels tried were from RHEL8 over a year ago, and this impacts the
> > entirety of the 5.4 series on Ubuntu.
> > Please CC me on replies as I'm not subscribed to all lists. CPU is C3758.
>
> [snip stalled kcryptd worker threads]
>
> This implies a dmcrypt level problem - XFS can't make progress if
> dmcrypt is not completing IOs.
>
> Where are the XFS corruption reports that the subject implies is
> occurring?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-02-19 23:00     ` [dm-devel] " Kyle Sanderson
@ 2022-02-21 11:47       ` Giovanni Cabiddu
  -1 siblings, 0 replies; 49+ messages in thread
From: Giovanni Cabiddu @ 2022-02-21 11:47 UTC (permalink / raw)
  To: Kyle Sanderson, herbert
  Cc: Dave Chinner, qat-linux, Linux-Kernal, linux-xfs, linux-crypto,
	dm-devel, Linus Torvalds, Greg KH

Hi Kyle,

The issue is that the implementations of aead and skcipher in the QAT
driver are not properly supporting requests with the
CRYPTO_TFM_REQ_MAY_BACKLOG flag set.
If the HW queue is full, the driver returns -EBUSY [1] but does not
enqueue the request as dm-crypt expects [2]. Dm-crypt ends up waiting
indefinitely for a completion to a request that was never submitted,
therefore the stall.
This is not related to QATE-7495 'An incorrectly formatted request to
QAT can hang the entire QAT endpoint' [3], which occurs when a malformed
request is sent to the device.
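
To illustrate the backlog contract described above, here is a minimal
userspace sketch in C. It is not the QAT driver or dm-crypt code: every
toy_* name, the ring depth and the backlog size are made up for
illustration. The only assumption taken from the kernel crypto API is
that a MAY_BACKLOG caller treats both -EINPROGRESS and -EBUSY as "a
completion will arrive later". The sketch counts how many requests never
complete when a driver returns -EBUSY without parking the request.

```c
#include <errno.h>
#include <stdbool.h>

#define TOY_RING_SLOTS    2   /* hypothetical hardware ring depth */
#define TOY_BACKLOG_SLOTS 8   /* hypothetical software backlog size */

struct toy_dev {
	int in_flight;        /* requests currently on the hardware ring */
	int backlogged;       /* requests parked in the software backlog */
	int completed;        /* completions delivered to the caller */
	bool honors_backlog;  /* false models the behaviour described above */
};

/*
 * Returns -EINPROGRESS when the request goes onto the ring and -EBUSY
 * when the ring is full. A driver that honours the backlog flag parks
 * the request before returning -EBUSY; the broken variant just drops it.
 * (A full backlog is not handled here; the sketch never fills it.)
 */
static int toy_submit(struct toy_dev *d, bool may_backlog)
{
	if (d->in_flight < TOY_RING_SLOTS) {
		d->in_flight++;
		return -EINPROGRESS;
	}
	if (may_backlog && d->honors_backlog &&
	    d->backlogged < TOY_BACKLOG_SLOTS)
		d->backlogged++;
	return -EBUSY;
}

/* Complete everything on the ring, refilling it from the backlog. */
static void toy_drain(struct toy_dev *d)
{
	while (d->in_flight > 0) {
		d->completed += d->in_flight;
		d->in_flight = 0;
		while (d->backlogged > 0 && d->in_flight < TOY_RING_SLOTS) {
			d->backlogged--;
			d->in_flight++;
		}
	}
}

/*
 * Submit n requests the way a MAY_BACKLOG caller would, then drain the
 * device. Returns the number of requests the caller is still waiting on.
 */
static int toy_lost_requests(bool honors_backlog, int n)
{
	struct toy_dev d = { .honors_backlog = honors_backlog };
	int expected = 0;

	for (int i = 0; i < n; i++) {
		int rc = toy_submit(&d, true);

		if (rc == -EINPROGRESS || rc == -EBUSY)
			expected++;   /* caller will wait for a completion */
	}
	toy_drain(&d);
	return expected - d.completed;
}
```

With five requests and a two-slot ring, toy_lost_requests(true, 5) is 0,
while toy_lost_requests(false, 5) is 3: three waiters that would block
forever, which is the shape of the kcryptd stall in the report.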

I'm working on a patch that resolves this problem. In the meantime, a
workaround is to blacklist the qat_c3xxx.ko driver.

Regarding avoiding this issue on stable kernels: the usage of QAT with
dm-crypt was already disabled in kernel 5.10 for a different issue
(the driver allocates memory in the datapath).
The following patches implement the change:
    7bcb2c99f8ed crypto: algapi - use common mechanism for inheriting flags
    2eb27c11937e crypto: algapi - add NEED_FALLBACK to INHERITED_FLAGS
    fbb6cda44190 crypto: algapi - introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY
    b8aa7dc5c753 crypto: drivers - set the flag CRYPTO_ALG_ALLOCATES_MEMORY
    cd74693870fb dm crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY
An option would be to send the patches above to stable, another is to wait
for a patch that fixes the problems in the QAT driver and send that to
stable.
@Herbert, what is the preferred approach here?
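
As a rough picture of how the patches listed above keep dm-crypt away
from such drivers, the following userspace sketch in C models a
flag/mask based algorithm lookup: an implementation is skipped when
(flags ^ type) & mask is non-zero, and the highest priority survivor
wins. The toy_* names, the driver names, the priorities and the flag
value are illustrative assumptions, not copied from the kernel sources.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative flag bit standing in for CRYPTO_ALG_ALLOCATES_MEMORY. */
#define TOY_ALG_ALLOCATES_MEMORY 0x00010000u

struct toy_alg {
	const char *driver;   /* hypothetical driver name */
	int priority;         /* higher priority is preferred */
	uint32_t flags;
};

/*
 * Pick the highest priority implementation whose flags agree with
 * 'type' on every bit selected by 'mask'. Returns NULL if none match.
 */
static const struct toy_alg *toy_pick(const struct toy_alg *algs, size_t n,
				      uint32_t type, uint32_t mask)
{
	const struct toy_alg *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if ((algs[i].flags ^ type) & mask)
			continue;   /* flag mismatch: not eligible */
		if (!best || algs[i].priority > best->priority)
			best = &algs[i];
	}
	return best;
}

/* Two made-up providers of the same cipher. */
static const struct toy_alg toy_xts[] = {
	{ "toy-xts-accel", 4001, TOY_ALG_ALLOCATES_MEMORY },
	{ "toy-xts-cpu",    401, 0 },
};
```

Calling toy_pick(toy_xts, 2, 0, 0) selects "toy-xts-accel", whereas
toy_pick(toy_xts, 2, 0, TOY_ALG_ALLOCATES_MEMORY) selects "toy-xts-cpu":
a caller that puts the flag in the mask and leaves it clear in the type
never gets an implementation that sets it.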

Thanks,

[1] https://elixir.bootlin.com/linux/latest/source/drivers/crypto/qat/qat_common/qat_algs.c#L1022
[2] https://elixir.bootlin.com/linux/latest/source/drivers/md/dm-crypt.c#L1584
[3] https://01.org/sites/default/files/downloads//336211qatsoftwareforlinux-rn-hwversion1.7021.pdf - page 25

-- 
Giovanni


On Sat, Feb 19, 2022 at 03:00:51PM -0800, Kyle Sanderson wrote:
> hi Dave,
> 
> > This really sounds like broken hardware, not a kernel problem.
> 
> It is indeed a hardware issue, specifically the intel qat crypto
> driver that's in-tree - the hardware is fine (see below). The IQAT
> errata documentation states that if a request is not submitted
> properly it can stall the entire device. The remediation guidance from
> 2020 was "don't do that" and "don't allow unprivileged users access to
> the device". The in-tree driver is not implemented properly either for
> this SoC or board - I'm thinking it's related to QATE-7495.
> 
> https://01.org/sites/default/files/downloads//336211qatsoftwareforlinux-rn-hwversion1.7021.pdf
> 
> > This implies a dmcrypt level problem - XFS can't make progress if dmcrypt is not completing IOs.
> 
> That's the weird part about it. Some bio's are completing, others are
> completely dropped, with some stalling forever. I had to use
> xfs_repair to get the volumes operational again. I lost a good deal of
> files and had to recover from backup after toggling the device back on
> on a production system (silly, I know).
> 
> > Where are the XFS corruption reports that the subject implies is occurring?
> 
> I think you're right, it's dm-crypt that's broken here, with
> ultimately the crypto driver causing this corruption. XFS being the
> edge to the end-user is taking the brunt of it. There's reports going
> back to late 2017 of significant issues with this mainlined stable
> driver.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1522962
> https://serverfault.com/questions/1010108/luks-hangs-on-centos-running-on-atom-c3758-cpu
> https://www.phoronix.com/forums/forum/software/distributions/1172231-fedora-33-s-enterprise-linux-next-effort-approved-testbed-for-raising-cpu-requirements-etc?p=1174560#post1174560
> 
> Any guidance would be appreciated.
> Kyle.
> On Sat, Feb 19, 2022 at 1:03 PM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Fri, Feb 18, 2022 at 09:02:28PM -0800, Kyle Sanderson wrote:
> > > A2SDi-8C-HLN4F has IQAT enabled by default, when this device is
> > > attempted to be used by xfs (through dm-crypt) the entire kernel
> > > thread stalls forever. Multiple users have hit this over the years
> > > (through sporadic reporting) - I ended up trying ZFS and encryption
> > > wasn't an issue there at all because I guess they don't use this
> > > device. Returning to sanity (xfs), I was able to provision a dm-crypt
> > > volume no problem on the disk, however when running mkfs.xfs on the
> > > volume is what triggers the cascading failure (each request kills a
> > > kthread).
> >
> > Can you provide the full stack traces for these errors so we can see
> > exactly what this cascading failure looks like, please? In reality,
> > the stall messages some time after this are not interesting - it's
> > the first errors that cause the stall that need to be investigated.
> >
> > A good idea would be to provide the full storage stack description
> > and hardware in use, as per:
> >
> > https://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
> >
> > > Disabling IQAT on the south bridge results in a working
> > > system, however this is not the default configuration for the
> > > distribution of choice (Ubuntu 20.04.3 LTS), nor the motherboard. I'm
> > > convinced this never worked properly based on the lack of popularity
> > > for kernel encryption (crypto), and the embedded nature that
> > > SuperMicro has integrated this device in collaboration with intel as
> > > it looks like the primary usage is through external accelerator cards.
> >
> > This really sounds like broken hardware, not a kernel problem.
> >
> > > Kernels tried were from RHEL8 over a year ago, and this impacts the
> > > entirety of the 5.4 series on Ubuntu.
> > > Please CC me on replies as I'm not subscribed to all lists. CPU is C3758.
> >
> > [snip stalled kcryptd worker threads]
> >
> > This implies a dmcrypt level problem - XFS can't make progress if
> > dmcrypt is not completing IOs.
> >
> > Where are the XFS corruption reports that the subject implies is
> > occurring?
> >
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-02-21 11:47       ` [dm-devel] " Giovanni Cabiddu
@ 2022-02-28  8:18         ` Kyle Sanderson
  -1 siblings, 0 replies; 49+ messages in thread
From: Kyle Sanderson @ 2022-02-28  8:18 UTC (permalink / raw)
  To: Giovanni Cabiddu
  Cc: herbert, Dave Chinner, qat-linux, Linux-Kernal, linux-xfs,
	linux-crypto, dm-devel, Linus Torvalds, Greg KH

> The issue is that the implementations of aead and skcipher in the QAT driver are not properly supporting requests with the CRYPTO_TFM_REQ_MAY_BACKLOG flag set.

Thanks Giovanni. Joel (from Intel) reached out to me out of band to
try and sell me further on QAT but wasn't able to follow-up on any
questions (like - how is the device actually used, how can I
personally help, etc).

> If the HW queue is full, the driver returns -EBUSY [1] but does not enqueue the request as dm-crypt expects [2]. Dm-crypt ends up waiting indefinitely for a completion to a request that was never submitted, therefore the stall.

Makes sense - this kernel driver has been destroying users for many
years. I'm disappointed that this critical bricking failure isn't
searchable for others.

> This is not related to QATE-7495 'An incorrectly formatted request to QAT can hang the entire QAT endpoint' [3], which occurs when a malformed request is sent to the device.

That's nice to hear that the device itself isn't dying, but it's been
completely destroying systems for years which itself is a DoS.

> I'm working on a patch that resolves this problem. In the meantime, a workaround is to blacklist the qat_c3xxx.ko driver.

I'm not writing this facetiously, but this driver has caused
incredible harm over the past 5+ years and seems to continue to do so.
As there's no patch proposed yet, I'm looking for the driver to be
completely removed from the tree as it's presently a pure marketing
campaign that's caused significant harm. If the marketing benefits
(like accelerated crypto + hashing) aren't there when the accelerated
instruction set was pulled from these integrated chips - the driver
continues to serve no purpose for consumers beyond damage. Disabling
the core I/O bits in December 2020 to make this barely work continues
to promote this as a side project as it was never resolved in the
driver.

If I can test patches, or assist with the removal of this present
in-tree malware I'm happy to help.

Kyle.


On Mon, Feb 21, 2022 at 3:48 AM Giovanni Cabiddu
<giovanni.cabiddu@intel.com> wrote:
>
> Hi Kyle,
>
> The issue is that the implementations of aead and skcipher in the QAT
> driver are not properly supporting requests with the
> CRYPTO_TFM_REQ_MAY_BACKLOG flag set.
> If the HW queue is full, the driver returns -EBUSY [1] but does not
> enqueue the request as dm-crypt expects [2]. Dm-crypt ends up waiting
> indefinitely for a completion to a request that was never submitted,
> therefore the stall.
> This is not related to QATE-7495 'An incorrectly formatted request to
> QAT can hang the entire QAT endpoint' [3], which occurs when a malformed
> request is sent to the device.
>
> I'm working on a patch that resolves this problem. In the meantime, a
> workaround is to blacklist the qat_c3xxx.ko driver.
>
> Regarding avoiding this issue on stable kernels: the usage of QAT with
> dm-crypt was already disabled in kernel 5.10 for a different issue
> (the driver allocates memory in the datapath).
> The following patches implement the change:
>     7bcb2c99f8ed crypto: algapi - use common mechanism for inheriting flags
>     2eb27c11937e crypto: algapi - add NEED_FALLBACK to INHERITED_FLAGS
>     fbb6cda44190 crypto: algapi - introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY
>     b8aa7dc5c753 crypto: drivers - set the flag CRYPTO_ALG_ALLOCATES_MEMORY
>     cd74693870fb dm crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY
> An option would be to send the patches above to stable, another is to wait
> for a patch that fixes the problems in the QAT driver and send that to
> stable.
> @Herbert, what is the preferred approach here?
>
> Thanks,
>
> [1] https://elixir.bootlin.com/linux/latest/source/drivers/crypto/qat/qat_common/qat_algs.c#L1022
> [2] https://elixir.bootlin.com/linux/latest/source/drivers/md/dm-crypt.c#L1584
> [3] https://01.org/sites/default/files/downloads//336211qatsoftwareforlinux-rn-hwversion1.7021.pdf - page 25
>
> --
> Giovanni
>
>
> On Sat, Feb 19, 2022 at 03:00:51PM -0800, Kyle Sanderson wrote:
> > hi Dave,
> >
> > > This really sounds like broken hardware, not a kernel problem.
> >
> > It is indeed a hardware issue, specifically the intel qat crypto
> > driver that's in-tree - the hardware is fine (see below). The IQAT
> > errata documentation states that if a request is not submitted
> > properly it can stall the entire device. The remediation guidance from
> > 2020 was "don't do that" and "don't allow unprivileged users access to
> > the device". The in-tree driver is not implemented properly either for
> > this SoC or board - I'm thinking it's related to QATE-7495.
> >
> > https://01.org/sites/default/files/downloads//336211qatsoftwareforlinux-rn-hwversion1.7021.pdf
> >
> > > This implies a dmcrypt level problem - XFS can't make progress if dmcrypt is not completing IOs.
> >
> > That's the weird part about it. Some bio's are completing, others are
> > completely dropped, with some stalling forever. I had to use
> > xfs_repair to get the volumes operational again. I lost a good deal of
> > files and had to recover from backup after toggling the device back on
> > on a production system (silly, I know).
> >
> > > Where are the XFS corruption reports that the subject implies is occurring?
> >
> > I think you're right, it's dm-crypt that's broken here, with
> > ultimately the crypto driver causing this corruption. XFS being the
> > edge to the end-user is taking the brunt of it. There's reports going
> > back to late 2017 of significant issues with this mainlined stable
> > driver.
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1522962
> > https://serverfault.com/questions/1010108/luks-hangs-on-centos-running-on-atom-c3758-cpu
> > https://www.phoronix.com/forums/forum/software/distributions/1172231-fedora-33-s-enterprise-linux-next-effort-approved-testbed-for-raising-cpu-requirements-etc?p=1174560#post1174560
> >
> > Any guidance would be appreciated.
> > Kyle.
> > On Sat, Feb 19, 2022 at 1:03 PM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Fri, Feb 18, 2022 at 09:02:28PM -0800, Kyle Sanderson wrote:
> > > > A2SDi-8C-HLN4F has IQAT enabled by default, when this device is
> > > > attempted to be used by xfs (through dm-crypt) the entire kernel
> > > > thread stalls forever. Multiple users have hit this over the years
> > > > (through sporadic reporting) - I ended up trying ZFS and encryption
> > > > wasn't an issue there at all because I guess they don't use this
> > > > device. Returning to sanity (xfs), I was able to provision a dm-crypt
> > > > volume no problem on the disk, however when running mkfs.xfs on the
> > > > volume is what triggers the cascading failure (each request kills a
> > > > kthread).
> > >
> > > Can you provide the full stack traces for these errors so we can see
> > > exactly what this cascading failure looks like, please? In reality,
> > > the stall messages some time after this are not interesting - it's
> > > the first errors that cause the stall that need to be investigated.
> > >
> > > A good idea would be to provide the full storage stack description
> > > and hardware in use, as per:
> > >
> > > https://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
> > >
> > > > Disabling IQAT on the south bridge results in a working
> > > > system, however this is not the default configuration for the
> > > > distribution of choice (Ubuntu 20.04.3 LTS), nor the motherboard. I'm
> > > > convinced this never worked properly based on the lack of popularity
> > > > for kernel encryption (crypto), and the embedded nature that
> > > > SuperMicro has integrated this device in collaboration with intel as
> > > > it looks like the primary usage is through external accelerator cards.
> > >
> > > This really sounds like broken hardware, not a kernel problem.
> > >
> > > > Kernels tried were from RHEL8 over a year ago, and this impacts the
> > > > entirety of the 5.4 series on Ubuntu.
> > > > Please CC me on replies as I'm not subscribed to all lists. CPU is C3758.
> > >
> > > [snip stalled kcryptd worker threads]
> > >
> > > This implies a dmcrypt level problem - XFS can't make progress if
> > > dmcrypt is not completing IOs.
> > >
> > > Where are the XFS corruption reports that the subject implies is
> > > occurring?
> > >
> > > Cheers,
> > >
> > > Dave.
> > > --
> > > Dave Chinner
> > > david@fromorbit.com

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [dm-devel] Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
@ 2022-02-28  8:18         ` Kyle Sanderson
  0 siblings, 0 replies; 49+ messages in thread
From: Kyle Sanderson @ 2022-02-28  8:18 UTC (permalink / raw)
  To: Giovanni Cabiddu
  Cc: herbert, Greg KH, Dave Chinner, Linux-Kernal, qat-linux,
	linux-xfs, dm-devel, linux-crypto, Linus Torvalds

> The issue is that the implementations of aead and skcipher in the QAT driver are not properly supporting requests with the CRYPTO_TFM_REQ_MAY_BACKLOG flag set.

Thanks Giovanni. Joel (from Intel) reached out to me out of band to
try and sell me further on QAT but wasn't able to follow up on any
questions (like - how is the device actually used, how can I
personally help, etc).

> If the HW queue is full, the driver returns -EBUSY [1] but does not enqueues the request as dm-crypt expects [2]. Dm-crypt ends up waiting indefinitely for a completion to a request that was never submitted, therefore the stall.

Makes sense - this kernel driver has been destroying users for many
years. I'm disappointed that this critical bricking failure isn't
searchable for others.

> This is not related to QATE-7495 'An incorrectly formatted request to QAT can hang the entire QAT endpoint' [3], which occurs when a malformed request is sent to the device.

That's nice to hear that the device itself isn't dying, but it's been
completely destroying systems for years which itself is a DoS.

> I'm working at patch that resolves this problem. In the meanwhile a workaround is to blacklist the qat_c3xxx.ko driver.

I'm not writing this facetiously, but this driver has caused
incredible harm over the past 5+ years and seems to continue to do so.
As there's no patch proposed yet, I'm looking for the driver to be
completely removed from the tree as it's presently a pure marketing
campaign that's caused significant harm. If the marketing benefits
(like accelerated crypto + hashing) aren't there when the accelerated
instruction set was pulled from these integrated chips - the driver
continues to serve no purpose for consumers beyond damage. Disabling
the core I/O bits in December 2020 to make this barely work continues
to promote this as a side project as it was never resolved in the
driver.

If I can test patches, or assist with the removal of this present
in-tree malware I'm happy to help.

Kyle.


On Mon, Feb 21, 2022 at 3:48 AM Giovanni Cabiddu
<giovanni.cabiddu@intel.com> wrote:
>
> Hi Kyle,
>
> The issue is that the implementations of aead and skcipher in the QAT
> driver are not properly supporting requests with the
> CRYPTO_TFM_REQ_MAY_BACKLOG flag set.
> If the HW queue is full, the driver returns -EBUSY [1] but does not
> enqueues the request as dm-crypt expects [2]. Dm-crypt ends up waiting
> indefinitely for a completion to a request that was never submitted,
> therefore the stall.
> This is not related to QATE-7495 'An incorrectly formatted request to
> QAT can hang the entire QAT endpoint' [3], which occurs when a malformed
> request is sent to the device.
>
> I'm working at patch that resolves this problem. In the meanwhile a
> workaround is to blacklist the qat_c3xxx.ko driver.
>
> Regarding avoiding this issue on stable kernels. The usage of QAT with
> dm-crypt was already disabled in kernel 5.10 for a different issue
> (the driver allocates memory in the datapath).
> The following patches implement the change:
>     7bcb2c99f8ed crypto: algapi - use common mechanism for inheriting flags
>     2eb27c11937e crypto: algapi - add NEED_FALLBACK to INHERITED_FLAGS
>     fbb6cda44190 crypto: algapi - introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY
>     b8aa7dc5c753 crypto: drivers - set the flag CRYPTO_ALG_ALLOCATES_MEMORY
>     cd74693870fb dm crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY
> An option would be to send the patches above to stable, another is to wait
> for a patch that fixes the problems in the QAT driver and send that to
> stable.
> @Herbert, what is the preferred approach here?
>
> Thanks,
>
> [1] https://elixir.bootlin.com/linux/latest/source/drivers/crypto/qat/qat_common/qat_algs.c#L1022
> [2] https://elixir.bootlin.com/linux/latest/source/drivers/md/dm-crypt.c#L1584
> [3] https://01.org/sites/default/files/downloads//336211qatsoftwareforlinux-rn-hwversion1.7021.pdf - page 25
>
> --
> Giovanni
>
>
> On Sat, Feb 19, 2022 at 03:00:51PM -0800, Kyle Sanderson wrote:
> > hi Dave,
> >
> > > This really sounds like broken hardware, not a kernel problem.
> >
> > It is indeed a hardware issue, specifically the intel qat crypto
> > driver that's in-tree - the hardware is fine (see below). The IQAT
> > eratta documentation states that if a request is not submitted
> > properly it can stall the entire device. The remediation guidance from
> > 2020 was "don't do that" and "don't allow unprivileged users access to
> > the device". The in-tree driver is not implemented properly either for
> > this SoC or board - I'm thinking it's related to QATE-7495.
> >
> > https://01.org/sites/default/files/downloads//336211qatsoftwareforlinux-rn-hwversion1.7021.pdf
> >
> > > This implies a dmcrypt level problem - XFS can't make progress is dmcrypt is not completing IOs.
> >
> > That's the weird part about it. Some bio's are completing, others are
> > completely dropped, with some stalling forever. I had to use
> > xfs_repair to get the volumes operational again. I lost a good deal of
> > files and had to recover from backup after toggling the device back on
> > on a production system (silly, I know).
> >
> > > Where are the XFS corruption reports that the subject implies is occurring?
> >
> > I think you're right, it's dm-crypt that's broken here, with
> > ultimately the crypto driver causing this corruption. XFS being the
> > edge to the end-user is taking the brunt of it. There's reports going
> > back to late 2017 of significant issues with this mainlined stable
> > driver.
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1522962
> > https://serverfault.com/questions/1010108/luks-hangs-on-centos-running-on-atom-c3758-cpu
> > https://www.phoronix.com/forums/forum/software/distributions/1172231-fedora-33-s-enterprise-linux-next-effort-approved-testbed-for-raising-cpu-requirements-etc?p=1174560#post1174560
> >
> > Any guidance would be appreciated.
> > Kyle.
> > On Sat, Feb 19, 2022 at 1:03 PM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Fri, Feb 18, 2022 at 09:02:28PM -0800, Kyle Sanderson wrote:
> > > > A2SDi-8C-HLN4F has IQAT enabled by default, when this device is
> > > > attempted to be used by xfs (through dm-crypt) the entire kernel
> > > > thread stalls forever. Multiple users have hit this over the years
> > > > (through sporadic reporting) - I ended up trying ZFS and encryption
> > > > wasn't an issue there at all because I guess they don't use this
> > > > device. Returning to sanity (xfs), I was able to provision a dm-crypt
> > > > volume no problem on the disk, however when running mkfs.xfs on the
> > > > volume is what triggers the cascading failure (each request kills a
> > > > kthread).
> > >
> > > Can you provide the full stack traces for these errors so we can see
> > > exactly what this cascading failure looks like, please? In reality,
> > > the stall messages some time after this are not interesting - it's
> > > the first errors that cause the stall that need to be investigated.
> > >
> > > A good idea would be to provide the full storage stack decription
> > > and hardware in use, as per:
> > >
> > > https://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
> > >
> > > > Disabling IQAT on the south bridge results in a working
> > > > system, however this is not the default configuration for the
> > > > distribution of choice (Ubuntu 20.04.3 LTS), nor the motherboard. I'm
> > > > convinced this never worked properly based on the lack of popularity
> > > > for kernel encryption (crypto), and the embedded nature that
> > > > SuperMicro has integrated this device in collaboration with intel as
> > > > it looks like the primary usage is through external accelerator cards.
> > >
> > > This really sounds like broken hardware, not a kernel problem.
> > >
> > > > Kernels tried were from RHEL8 over a year ago, and this impacts the
> > > > entirety of the 5.4 series on Ubuntu.
> > > > Please CC me on replies as I'm not subscribed to all lists. CPU is C3758.
> > >
> > > [snip stalled kcryptd worker threads]
> > >
> > > This implies a dmcrypt level problem - XFS can't make progress is
> > > dmcrypt is not completing IOs.
> > >
> > > Where are the XFS corruption reports that the subject implies is
> > > occurring?
> > >
> > > Cheers,
> > >
> > > Dave.
> > > --
> > > Dave Chinner
> > > david@fromorbit.com

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-02-28  8:18         ` [dm-devel] " Kyle Sanderson
@ 2022-02-28 19:25           ` Linus Torvalds
  -1 siblings, 0 replies; 49+ messages in thread
From: Linus Torvalds @ 2022-02-28 19:25 UTC (permalink / raw)
  To: Kyle Sanderson
  Cc: Giovanni Cabiddu, Herbert Xu, Dave Chinner, qat-linux,
	Linux-Kernal, linux-xfs, Linux Crypto Mailing List,
	device-mapper development, Greg KH

On Mon, Feb 28, 2022 at 12:18 AM Kyle Sanderson <kyle.leet@gmail.com> wrote:
>
> Makes sense - this kernel driver has been destroying users for many
> years. I'm disappointed that this critical bricking failure isn't
> searchable for others.

It does sound like we should just disable that driver entirely until
it is fixed.

Or at least the configuration that can cause problems, if there is
some particular sub-case. Although from a cursory glance and the
noises made in this thread, it looks like it's all of the 'qat_aeads'
cases (since that uses qat_alg_aead_enc() which can return -EAGAIN),
which effectively means all of the QAT stuff.

So presumably CRYPTO_DEV_QAT should just be marked as

        depends on BROKEN || COMPILE_TEST

or similar?

              Linus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-02-28 19:25           ` [dm-devel] " Linus Torvalds
@ 2022-02-28 20:39             ` Giovanni Cabiddu
  -1 siblings, 0 replies; 49+ messages in thread
From: Giovanni Cabiddu @ 2022-02-28 20:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kyle Sanderson, Herbert Xu, Dave Chinner, qat-linux,
	Linux-Kernal, linux-xfs, Linux Crypto Mailing List,
	device-mapper development, Greg KH

On Mon, Feb 28, 2022 at 11:25:49AM -0800, Linus Torvalds wrote:
> On Mon, Feb 28, 2022 at 12:18 AM Kyle Sanderson <kyle.leet@gmail.com> wrote:
> >
> > Makes sense - this kernel driver has been destroying users for many
> > years. I'm disappointed that this critical bricking failure isn't
> > searchable for others.
> 
> It does sound like we should just disable that driver entirely until
> it is fixed.
> 
> Or at least the configuration that can cause problems, if there is
> some particular sub-case.
The dm-crypt + QAT use case has been disabled since kernel 5.10 due to
a different issue.
Is it an option to port those patches to stable until I provide a fix for
the driver? I have already drafted a few alternatives for the fix and I am
aiming for a final set by the end of the week.

Thanks,

-- 
Giovanni

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-02-28 20:39             ` [dm-devel] " Giovanni Cabiddu
@ 2022-02-28 20:59               ` Greg KH
  -1 siblings, 0 replies; 49+ messages in thread
From: Greg KH @ 2022-02-28 20:59 UTC (permalink / raw)
  To: Giovanni Cabiddu
  Cc: Linus Torvalds, Kyle Sanderson, Herbert Xu, Dave Chinner,
	qat-linux, Linux-Kernal, linux-xfs, Linux Crypto Mailing List,
	device-mapper development

On Mon, Feb 28, 2022 at 08:39:11PM +0000, Giovanni Cabiddu wrote:
> On Mon, Feb 28, 2022 at 11:25:49AM -0800, Linus Torvalds wrote:
> > On Mon, Feb 28, 2022 at 12:18 AM Kyle Sanderson <kyle.leet@gmail.com> wrote:
> > >
> > > Makes sense - this kernel driver has been destroying users for many
> > > years. I'm disappointed that this critical bricking failure isn't
> > > searchable for others.
> > 
> > It does sound like we should just disable that driver entirely until
> > it is fixed.
> > 
> > Or at least the configuration that can cause problems, if there is
> > some particular sub-case.
> The dm-crypt + QAT use-case is already disabled since kernel 5.10 due to
> a different issue.
> Is it an option to port those patches to stable till I provide a fix for
> the driver? I drafted already few alternatives for the fix and I am aiming
> for a final set by end of week.

If the existing situation is broken, yes, those patches are fine for
stable releases.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [dm-devel] Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-02-28 19:25           ` [dm-devel] " Linus Torvalds
@ 2022-02-28 21:13             ` Milan Broz
  -1 siblings, 0 replies; 49+ messages in thread
From: Milan Broz @ 2022-02-28 21:13 UTC (permalink / raw)
  To: Linus Torvalds, Kyle Sanderson
  Cc: Giovanni Cabiddu, Herbert Xu, Greg KH, Dave Chinner,
	Linux-Kernal, qat-linux, linux-xfs, device-mapper development,
	Linux Crypto Mailing List

On 28/02/2022 20:25, Linus Torvalds wrote:
> On Mon, Feb 28, 2022 at 12:18 AM Kyle Sanderson <kyle.leet@gmail.com> wrote:
>>
>> Makes sense - this kernel driver has been destroying users for many
>> years. I'm disappointed that this critical bricking failure isn't
>> searchable for others.
> 
> It does sound like we should just disable that driver entirely until
> it is fixed.
> 
> Or at least the configuration that can cause problems, if there is
> some particular sub-case. Although from a cursory glance and the
> noises made in this thread, it looks like it's all of the 'qat_aeads'
> cases (since that uses qat_alg_aead_enc() which can return -EAGAIN),
> which effectively means that all of the QAT stuff.
> 
> So presumably CRYPTO_DEV_QAT should just be marked as
> 
>          depends on BROKEN || COMPILE_TEST
> 
> or similar?

Yes, please! Or at least disable it in stable for now.

Over the last few years, we have had several reports of problems with this driver
for cryptsetup/LUKS (dm-crypt with qat driver; here it is skcipher, not aead, though).

The problem with the misunderstanding of the crypto API queue has been known
to the authors for some time, at least since 2020;
see https://lore.kernel.org/dm-devel/20200601160418.171851200@debian-a64.vm/
and it is apparently not fixed yet.

Thank you,
Milan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-02-28 20:39             ` [dm-devel] " Giovanni Cabiddu
@ 2022-02-28 23:26               ` Herbert Xu
  -1 siblings, 0 replies; 49+ messages in thread
From: Herbert Xu @ 2022-02-28 23:26 UTC (permalink / raw)
  To: Giovanni Cabiddu
  Cc: Linus Torvalds, Kyle Sanderson, Dave Chinner, qat-linux,
	Linux-Kernal, linux-xfs, Linux Crypto Mailing List,
	device-mapper development, Greg KH

On Mon, Feb 28, 2022 at 08:39:11PM +0000, Giovanni Cabiddu wrote:
>
> The dm-crypt + QAT use-case is already disabled since kernel 5.10 due to
> a different issue.

Indeed, qat has been disabled for dm-crypt since

commit b8aa7dc5c7535f9abfca4bceb0ade9ee10cf5f54
Author: Mikulas Patocka <mpatocka@redhat.com>
Date:   Thu Jul 9 23:20:41 2020 -0700

    crypto: drivers - set the flag CRYPTO_ALG_ALLOCATES_MEMORY

So this should no longer be an issue with an up-to-date kernel.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-02-28 23:26               ` [dm-devel] " Herbert Xu
@ 2022-03-01  1:12                 ` Linus Torvalds
  -1 siblings, 0 replies; 49+ messages in thread
From: Linus Torvalds @ 2022-03-01  1:12 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Giovanni Cabiddu, Kyle Sanderson, Dave Chinner, qat-linux,
	Linux-Kernal, linux-xfs, Linux Crypto Mailing List,
	device-mapper development, Greg KH

On Mon, Feb 28, 2022 at 3:26 PM Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> Indeed, qat has been disabled for dm-crypt since
>
> commit b8aa7dc5c7535f9abfca4bceb0ade9ee10cf5f54
> Author: Mikulas Patocka <mpatocka@redhat.com>
> Date:   Thu Jul 9 23:20:41 2020 -0700
>
>     crypto: drivers - set the flag CRYPTO_ALG_ALLOCATES_MEMORY
>
> So this should no longer be an issue with an up-to-date kernel.

Ok, that commit message doesn't exactly make it clear that it also
fixes a major disk corruption issue.

It sounds like it was incidental and almost accidental that it fixed
that thing, and nobody realized it should perhaps be also moved to
stable.

Oh, except I think you *also* need commit cd74693870fb ("dm crypt:
don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY") that
actually reacts to that flag.

Which also wasn't marked for stable, and which is why 5.10 is ok, but
5.9 (which has that first commit, but not the second) is not ok.

Of course, maybe they got marked for stable separately and actually
have been back-ported, but it doesn't sound like that happened. I
didn't actually check.

                  Linus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-03-01  1:12                 ` [dm-devel] " Linus Torvalds
@ 2022-03-01  4:11                   ` Herbert Xu
  -1 siblings, 0 replies; 49+ messages in thread
From: Herbert Xu @ 2022-03-01  4:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Giovanni Cabiddu, Kyle Sanderson, Dave Chinner, qat-linux,
	Linux-Kernal, linux-xfs, Linux Crypto Mailing List,
	device-mapper development, Greg KH

On Mon, Feb 28, 2022 at 05:12:20PM -0800, Linus Torvalds wrote:
> 
> It sounds like it was incidental and almost accidental that it fixed
> that thing, and nobody realized it should perhaps be also moved to
> stable.

Yes, this was incidental.  The patch in question fixes an issue in
OOM situations where drivers that must allocate memory on each
request may lead to deadlock, so it's not really targeted at qat.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-03-01  4:11                   ` [dm-devel] " Herbert Xu
@ 2022-03-02 10:29                     ` Greg KH
  -1 siblings, 0 replies; 49+ messages in thread
From: Greg KH @ 2022-03-02 10:29 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Linus Torvalds, Giovanni Cabiddu, Kyle Sanderson, Dave Chinner,
	qat-linux, Linux-Kernal, linux-xfs, Linux Crypto Mailing List,
	device-mapper development

On Tue, Mar 01, 2022 at 04:11:13PM +1200, Herbert Xu wrote:
> On Mon, Feb 28, 2022 at 05:12:20PM -0800, Linus Torvalds wrote:
> > 
> > It sounds like it was incidental and almost accidental that it fixed
> > that thing, and nobody realized it should perhaps be also moved to
> > stable.
> 
> Yes this was incidental.  The patch in question fixes an issue in
> OOM situations where drivers that must allocate memory on each
> request may lead to dead-lock so it's not really targeted at qat.

Ok, so what commits should I backport to kernels older than 5.10 to
resolve this?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-03-02 10:29                     ` [dm-devel] " Greg KH
@ 2022-03-02 11:49                       ` Giovanni Cabiddu
  -1 siblings, 0 replies; 49+ messages in thread
From: Giovanni Cabiddu @ 2022-03-02 11:49 UTC (permalink / raw)
  To: Greg KH
  Cc: Herbert Xu, Linus Torvalds, Kyle Sanderson, Dave Chinner,
	qat-linux, Linux-Kernal, linux-xfs, Linux Crypto Mailing List,
	device-mapper development

Hi Greg,

On Wed, Mar 02, 2022 at 11:29:00AM +0100, Greg KH wrote:
> On Tue, Mar 01, 2022 at 04:11:13PM +1200, Herbert Xu wrote:
> > On Mon, Feb 28, 2022 at 05:12:20PM -0800, Linus Torvalds wrote:
> > > 
> > > It sounds like it was incidental and almost accidental that it fixed
> > > that thing, and nobody realized it should perhaps be also moved to
> > > stable.
> > 
> > Yes this was incidental.  The patch in question fixes an issue in
> > OOM situations where drivers that must allocate memory on each
> > request may lead to dead-lock so it's not really targeted at qat.
> 
> Ok, so what commits should I backport to kernels older than 5.10 to
> resolve this?
Is it possible to wait for a set that resolves the problem rather than
backporting the patches that disable the use-case?
I have a patchset that fixes the actual issue and we are doing an
internal review before submission to the mailing list.
I should be able to send a V1 out between today and tomorrow.

If not, then these are the patches that should be backported:
    7bcb2c99f8ed crypto: algapi - use common mechanism for inheriting flags
    2eb27c11937e crypto: algapi - add NEED_FALLBACK to INHERITED_FLAGS
    fbb6cda44190 crypto: algapi - introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY
    b8aa7dc5c753 crypto: drivers - set the flag CRYPTO_ALG_ALLOCATES_MEMORY
    cd74693870fb dm crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY
Herbert, correct me if I'm wrong here.

Thanks,

-- 
Giovanni

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-03-02 11:49                       ` [dm-devel] " Giovanni Cabiddu
@ 2022-03-02 14:56                         ` Greg KH
  -1 siblings, 0 replies; 49+ messages in thread
From: Greg KH @ 2022-03-02 14:56 UTC (permalink / raw)
  To: Giovanni Cabiddu
  Cc: Herbert Xu, Linus Torvalds, Kyle Sanderson, Dave Chinner,
	qat-linux, Linux-Kernal, linux-xfs, Linux Crypto Mailing List,
	device-mapper development

On Wed, Mar 02, 2022 at 11:49:16AM +0000, Giovanni Cabiddu wrote:
> Hi Greg,
> 
> On Wed, Mar 02, 2022 at 11:29:00AM +0100, Greg KH wrote:
> > On Tue, Mar 01, 2022 at 04:11:13PM +1200, Herbert Xu wrote:
> > > On Mon, Feb 28, 2022 at 05:12:20PM -0800, Linus Torvalds wrote:
> > > > 
> > > > It sounds like it was incidental and almost accidental that it fixed
> > > > that thing, and nobody realized it should perhaps be also moved to
> > > > stable.
> > > 
> > > Yes this was incidental.  The patch in question fixes an issue in
> > > OOM situations where drivers that must allocate memory on each
> > > request may lead to dead-lock so it's not really targeted at qat.
> > 
> > Ok, so what commits should I backport to kernels older than 5.10 to
> > resolve this?
> Is it possible to wait for a set that resolves the problem rather than
> backporting the patches that disable the use-case?

It's already disabled in newer kernels, so we should do so for older
ones to prevent problems and the delay in getting those potential fixes
merged some day in the future.

> I have a patchset that fixes the actual issue and we are doing an
> internal review before submission to the mailing list.
> I should be able to send a V1 out between today and tomorrow.
> 
> If not, then these are the patches that should be backported:
>     7bcb2c99f8ed crypto: algapi - use common mechanism for inheriting flags
>     2eb27c11937e crypto: algapi - add NEED_FALLBACK to INHERITED_FLAGS
>     fbb6cda44190 crypto: algapi - introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY
>     b8aa7dc5c753 crypto: drivers - set the flag CRYPTO_ALG_ALLOCATES_MEMORY
>     cd74693870fb dm crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY
> Herbert, correct me if I'm wrong here.

These need to be manually backported as they do not apply cleanly.  Can
you provide such a set?  Or should I just disable a specific driver here
instead which would be easier overall?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-03-02 14:56                         ` [dm-devel] " Greg KH
@ 2022-03-02 22:27                           ` Herbert Xu
  -1 siblings, 0 replies; 49+ messages in thread
From: Herbert Xu @ 2022-03-02 22:27 UTC (permalink / raw)
  To: Greg KH
  Cc: Giovanni Cabiddu, Linus Torvalds, Kyle Sanderson, Dave Chinner,
	qat-linux, Linux-Kernal, linux-xfs, Linux Crypto Mailing List,
	device-mapper development

On Wed, Mar 02, 2022 at 03:56:36PM +0100, Greg KH wrote:
>
> > If not, then these are the patches that should be backported:
> >     7bcb2c99f8ed crypto: algapi - use common mechanism for inheriting flags
> >     2eb27c11937e crypto: algapi - add NEED_FALLBACK to INHERITED_FLAGS
> >     fbb6cda44190 crypto: algapi - introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY
> >     b8aa7dc5c753 crypto: drivers - set the flag CRYPTO_ALG_ALLOCATES_MEMORY
> >     cd74693870fb dm crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY
> > Herbert, correct me if I'm wrong here.
> 
> These need to be manually backported as they do not apply cleanly.  Can
> you provide such a set?  Or should I just disable a specific driver here
> instead which would be easier overall?

I think the safest thing is to disable qat in stable (possibly only
when DM_CRYPT is enabled/modular).  The patches in question while
good may have too wide an effect for the stable kernel series.

Giovanni, could you send Greg a Kconfig patch to do that?

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-03-02 22:27                           ` [dm-devel] " Herbert Xu
@ 2022-03-02 22:42                             ` Giovanni Cabiddu
  -1 siblings, 0 replies; 49+ messages in thread
From: Giovanni Cabiddu @ 2022-03-02 22:42 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Greg KH, Linus Torvalds, Kyle Sanderson, Dave Chinner, qat-linux,
	Linux-Kernal, linux-xfs, Linux Crypto Mailing List,
	device-mapper development

On Thu, Mar 03, 2022 at 10:27:47AM +1200, Herbert Xu wrote:
> On Wed, Mar 02, 2022 at 03:56:36PM +0100, Greg KH wrote:
> >
> > > If not, then these are the patches that should be backported:
> > >     7bcb2c99f8ed crypto: algapi - use common mechanism for inheriting flags
> > >     2eb27c11937e crypto: algapi - add NEED_FALLBACK to INHERITED_FLAGS
> > >     fbb6cda44190 crypto: algapi - introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY
> > >     b8aa7dc5c753 crypto: drivers - set the flag CRYPTO_ALG_ALLOCATES_MEMORY
> > >     cd74693870fb dm crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY
> > > Herbert, correct me if I'm wrong here.
> > 
> > These need to be manually backported as they do not apply cleanly.  Can
> > you provide such a set?  Or should I just disable a specific driver here
> > instead which would be easier overall?
> 
> I think the safest thing is to disable qat in stable (possibly only
> when DM_CRYPT is enabled/modular).  The patches in question while
> good may have too wide an effect for the stable kernel series.
> 
> Giovanni, could you send Greg a Kconfig patch to do that?
I was thinking, as an alternative, to lower the cra_priority in the QAT
driver for the algorithms used by dm-crypt so they are not used by
default.
Is that a viable option?

Sure, I can provide a patch for either the cra_priority or the Kconfig
option for the stable kernels that don't have the patches above.

-- 
Giovanni

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-03-02 22:42                             ` [dm-devel] " Giovanni Cabiddu
@ 2022-03-02 22:45                               ` Herbert Xu
  -1 siblings, 0 replies; 49+ messages in thread
From: Herbert Xu @ 2022-03-02 22:45 UTC (permalink / raw)
  To: Giovanni Cabiddu
  Cc: Greg KH, Linus Torvalds, Kyle Sanderson, Dave Chinner, qat-linux,
	Linux-Kernal, linux-xfs, Linux Crypto Mailing List,
	device-mapper development

On Wed, Mar 02, 2022 at 10:42:20PM +0000, Giovanni Cabiddu wrote:
>
> I was thinking, as an alternative, to lower the cra_priority in the QAT
> driver for the algorithms used by dm-crypt so they are not used by
> default.
> Is that a viable option?

Yes I think that should work too.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-03-02 22:45                               ` [dm-devel] " Herbert Xu
@ 2022-03-03 13:49                                 ` Giovanni Cabiddu
  -1 siblings, 0 replies; 49+ messages in thread
From: Giovanni Cabiddu @ 2022-03-03 13:49 UTC (permalink / raw)
  To: Herbert Xu, Greg KH
  Cc: Linus Torvalds, Kyle Sanderson, Dave Chinner, qat-linux,
	Linux-Kernal, linux-xfs, Linux Crypto Mailing List,
	device-mapper development

On Thu, Mar 03, 2022 at 10:45:48AM +1200, Herbert Xu wrote:
> On Wed, Mar 02, 2022 at 10:42:20PM +0000, Giovanni Cabiddu wrote:
> >
> > I was thinking, as an alternative, to lower the cra_priority in the QAT
> > driver for the algorithms used by dm-crypt so they are not used by
> > default.
> > Is that a viable option?
> 
> Yes I think that should work too.
The patch below implements that solution and applies to linux-5.4.y.
If it is ok, I can send it to stable for all kernels <= 5.4 following
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#option-3

---8<---
From: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Date: Thu, 3 Mar 2022 11:54:07 +0000
Subject: [PATCH] crypto: qat - drop priority of algorithms
Organization: Intel Research and Development Ireland Ltd - Co. Reg. #308263 - Collinstown Industrial Park, Leixlip, County Kildare - Ireland

The implementations of aead and skcipher in the QAT driver do not
properly support requests with the CRYPTO_TFM_REQ_MAY_BACKLOG flag set.
If the HW queue is full, the driver returns -EBUSY but does not enqueue
the request.
This can result in applications like dm-crypt waiting indefinitely for a
completion of a request that was never submitted to the hardware.

To mitigate this problem, reduce the priority of all skcipher and aead
implementations in the QAT driver so they are not used by default.

This patch deviates from the original upstream solution, which prevents
dm-crypt from using drivers registered with the flag
CRYPTO_ALG_ALLOCATES_MEMORY, since a backport of that set to stable
kernels may have too wide an effect.

commit 7bcb2c99f8ed032cfb3f5596b4dccac6b1f501df upstream
commit 2eb27c11937ee9984c04b75d213a737291c5f58c upstream
commit fbb6cda44190d72aa5199d728797aabc6d2ed816 upstream
commit b8aa7dc5c7535f9abfca4bceb0ade9ee10cf5f54 upstream
commit cd74693870fb748d812867ba49af733d689a3604 upstream

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
---
 drivers/crypto/qat/qat_common/qat_algs.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/crypto/qat/qat_common/qat_algs.c b/drivers/crypto/qat/qat_common/qat_algs.c
index 6b8ad3d67481..a5c28a08fd8c 100644
--- a/drivers/crypto/qat/qat_common/qat_algs.c
+++ b/drivers/crypto/qat/qat_common/qat_algs.c
@@ -1274,7 +1274,7 @@ static struct aead_alg qat_aeads[] = { {
 	.base = {
 		.cra_name = "authenc(hmac(sha1),cbc(aes))",
 		.cra_driver_name = "qat_aes_cbc_hmac_sha1",
-		.cra_priority = 4001,
+		.cra_priority = 1,
 		.cra_flags = CRYPTO_ALG_ASYNC,
 		.cra_blocksize = AES_BLOCK_SIZE,
 		.cra_ctxsize = sizeof(struct qat_alg_aead_ctx),
@@ -1291,7 +1291,7 @@ static struct aead_alg qat_aeads[] = { {
 	.base = {
 		.cra_name = "authenc(hmac(sha256),cbc(aes))",
 		.cra_driver_name = "qat_aes_cbc_hmac_sha256",
-		.cra_priority = 4001,
+		.cra_priority = 1,
 		.cra_flags = CRYPTO_ALG_ASYNC,
 		.cra_blocksize = AES_BLOCK_SIZE,
 		.cra_ctxsize = sizeof(struct qat_alg_aead_ctx),
@@ -1308,7 +1308,7 @@ static struct aead_alg qat_aeads[] = { {
 	.base = {
 		.cra_name = "authenc(hmac(sha512),cbc(aes))",
 		.cra_driver_name = "qat_aes_cbc_hmac_sha512",
-		.cra_priority = 4001,
+		.cra_priority = 1,
 		.cra_flags = CRYPTO_ALG_ASYNC,
 		.cra_blocksize = AES_BLOCK_SIZE,
 		.cra_ctxsize = sizeof(struct qat_alg_aead_ctx),
@@ -1326,7 +1326,7 @@ static struct aead_alg qat_aeads[] = { {
 static struct crypto_alg qat_algs[] = { {
 	.cra_name = "cbc(aes)",
 	.cra_driver_name = "qat_aes_cbc",
-	.cra_priority = 4001,
+	.cra_priority = 1,
 	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
 	.cra_blocksize = AES_BLOCK_SIZE,
 	.cra_ctxsize = sizeof(struct qat_alg_ablkcipher_ctx),
@@ -1348,7 +1348,7 @@ static struct crypto_alg qat_algs[] = { {
 }, {
 	.cra_name = "ctr(aes)",
 	.cra_driver_name = "qat_aes_ctr",
-	.cra_priority = 4001,
+	.cra_priority = 1,
 	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
 	.cra_blocksize = 1,
 	.cra_ctxsize = sizeof(struct qat_alg_ablkcipher_ctx),
@@ -1370,7 +1370,7 @@ static struct crypto_alg qat_algs[] = { {
 }, {
 	.cra_name = "xts(aes)",
 	.cra_driver_name = "qat_aes_xts",
-	.cra_priority = 4001,
+	.cra_priority = 1,
 	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
 	.cra_blocksize = AES_BLOCK_SIZE,
 	.cra_ctxsize = sizeof(struct qat_alg_ablkcipher_ctx),

base-commit: 866ae42cf4788c8b18de6bda0a522362702861d7
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-03-03 13:49                                 ` [dm-devel] " Giovanni Cabiddu
@ 2022-03-03 19:21                                   ` Eric Biggers
  -1 siblings, 0 replies; 49+ messages in thread
From: Eric Biggers @ 2022-03-03 19:21 UTC (permalink / raw)
  To: Giovanni Cabiddu
  Cc: Herbert Xu, Greg KH, Linus Torvalds, Kyle Sanderson,
	Dave Chinner, qat-linux, Linux-Kernal, linux-xfs,
	Linux Crypto Mailing List, device-mapper development

On Thu, Mar 03, 2022 at 01:49:03PM +0000, Giovanni Cabiddu wrote:
> On Thu, Mar 03, 2022 at 10:45:48AM +1200, Herbert Xu wrote:
> > On Wed, Mar 02, 2022 at 10:42:20PM +0000, Giovanni Cabiddu wrote:
> > >
> > > I was thinking, as an alternative, to lower the cra_priority in the QAT
> > > driver for the algorithms used by dm-crypt so they are not used by
> > > default.
> > > Is that a viable option?
> > 
> > Yes I think that should work too.
> The patch below implements that solution and applies to linux-5.4.y.
> If it is ok, I can send it to stable for all kernels <= 5.4 following
> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#option-3
> 
> ---8<---
> From: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
> Date: Thu, 3 Mar 2022 11:54:07 +0000
> Subject: [PATCH] crypto: qat - drop priority of algorithms
> Organization: Intel Research and Development Ireland Ltd - Co. Reg. #308263 - Collinstown Industrial Park, Leixlip, County Kildare - Ireland
> 
> The implementations of aead and skcipher in the QAT driver do not
> properly support requests with the CRYPTO_TFM_REQ_MAY_BACKLOG flag set.
> If the HW queue is full, the driver returns -EBUSY but does not enqueue
> the request.
> This can result in applications like dm-crypt waiting indefinitely for a
> completion of a request that was never submitted to the hardware.
> 
> To mitigate this problem, reduce the priority of all skcipher and aead
> implementations in the QAT driver so they are not used by default.
> 
> This patch deviates from the original upstream solution, which prevents
> dm-crypt from using drivers registered with the flag
> CRYPTO_ALG_ALLOCATES_MEMORY, since a backport of that set to stable
> kernels may have too wide an effect.
> 
> commit 7bcb2c99f8ed032cfb3f5596b4dccac6b1f501df upstream
> commit 2eb27c11937ee9984c04b75d213a737291c5f58c upstream
> commit fbb6cda44190d72aa5199d728797aabc6d2ed816 upstream
> commit b8aa7dc5c7535f9abfca4bceb0ade9ee10cf5f54 upstream
> commit cd74693870fb748d812867ba49af733d689a3604 upstream
> 
> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
> ---
>  drivers/crypto/qat/qat_common/qat_algs.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/crypto/qat/qat_common/qat_algs.c b/drivers/crypto/qat/qat_common/qat_algs.c
> index 6b8ad3d67481..a5c28a08fd8c 100644
> --- a/drivers/crypto/qat/qat_common/qat_algs.c
> +++ b/drivers/crypto/qat/qat_common/qat_algs.c
> @@ -1274,7 +1274,7 @@ static struct aead_alg qat_aeads[] = { {
>  	.base = {
>  		.cra_name = "authenc(hmac(sha1),cbc(aes))",
>  		.cra_driver_name = "qat_aes_cbc_hmac_sha1",
> -		.cra_priority = 4001,
> +		.cra_priority = 1,
>  		.cra_flags = CRYPTO_ALG_ASYNC,
>  		.cra_blocksize = AES_BLOCK_SIZE,
>  		.cra_ctxsize = sizeof(struct qat_alg_aead_ctx),
> @@ -1291,7 +1291,7 @@ static struct aead_alg qat_aeads[] = { {
>  	.base = {
>  		.cra_name = "authenc(hmac(sha256),cbc(aes))",
>  		.cra_driver_name = "qat_aes_cbc_hmac_sha256",
> -		.cra_priority = 4001,
> +		.cra_priority = 1,
>  		.cra_flags = CRYPTO_ALG_ASYNC,
>  		.cra_blocksize = AES_BLOCK_SIZE,
>  		.cra_ctxsize = sizeof(struct qat_alg_aead_ctx),
> @@ -1308,7 +1308,7 @@ static struct aead_alg qat_aeads[] = { {
>  	.base = {
>  		.cra_name = "authenc(hmac(sha512),cbc(aes))",
>  		.cra_driver_name = "qat_aes_cbc_hmac_sha512",
> -		.cra_priority = 4001,
> +		.cra_priority = 1,
>  		.cra_flags = CRYPTO_ALG_ASYNC,
>  		.cra_blocksize = AES_BLOCK_SIZE,
>  		.cra_ctxsize = sizeof(struct qat_alg_aead_ctx),
> @@ -1326,7 +1326,7 @@ static struct aead_alg qat_aeads[] = { {
>  static struct crypto_alg qat_algs[] = { {
>  	.cra_name = "cbc(aes)",
>  	.cra_driver_name = "qat_aes_cbc",
> -	.cra_priority = 4001,
> +	.cra_priority = 1,
>  	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
>  	.cra_blocksize = AES_BLOCK_SIZE,
>  	.cra_ctxsize = sizeof(struct qat_alg_ablkcipher_ctx),
> @@ -1348,7 +1348,7 @@ static struct crypto_alg qat_algs[] = { {
>  }, {
>  	.cra_name = "ctr(aes)",
>  	.cra_driver_name = "qat_aes_ctr",
> -	.cra_priority = 4001,
> +	.cra_priority = 1,
>  	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
>  	.cra_blocksize = 1,
>  	.cra_ctxsize = sizeof(struct qat_alg_ablkcipher_ctx),
> @@ -1370,7 +1370,7 @@ static struct crypto_alg qat_algs[] = { {
>  }, {
>  	.cra_name = "xts(aes)",
>  	.cra_driver_name = "qat_aes_xts",
> -	.cra_priority = 4001,
> +	.cra_priority = 1,
>  	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
>  	.cra_blocksize = AES_BLOCK_SIZE,
>  	.cra_ctxsize = sizeof(struct qat_alg_ablkcipher_ctx),
> 
> base-commit: 866ae42cf4788c8b18de6bda0a522362702861d7
> -- 
> 2.35.1
> 
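
The failure mode described in the quoted commit message can be sketched
in user space. This is only a model under stated assumptions, not driver
code: struct model_queue, model_submit(), model_drain(), model_run() and
the ring depth HW_SLOTS are hypothetical names and values. It assumes the
usual crypto API convention that a MAY_BACKLOG caller treats -EBUSY as
"request kept, completion will follow". With keeps_backlog set to false
the model drops the request while still returning -EBUSY, which is the
behaviour the commit message attributes to the QAT driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define HW_SLOTS 2 /* hypothetical hardware ring depth */

struct model_queue {
	int in_hw;          /* requests currently owned by the "hardware" */
	int backlogged;     /* requests parked in software for later */
	int completed;      /* completions delivered to the caller */
	bool keeps_backlog; /* false models the reported behaviour */
};

/* Submit one request that has the MAY_BACKLOG flag set. */
static int model_submit(struct model_queue *q)
{
	if (q->in_hw < HW_SLOTS) {
		q->in_hw++;
		return -EINPROGRESS; /* accepted, completion will follow */
	}
	if (q->keeps_backlog)
		q->backlogged++; /* remembered for later resubmission */
	/* Either way the caller sees -EBUSY and waits for a completion. */
	return -EBUSY;
}

/* Let the hardware finish everything, refilling it from the backlog. */
static void model_drain(struct model_queue *q)
{
	while (q->in_hw > 0) {
		q->in_hw--;
		q->completed++;
		if (q->backlogged > 0) {
			q->backlogged--;
			q->in_hw++;
		}
	}
}

/* Submit nreq requests, drain, and report how many completions arrived. */
static int model_run(bool keeps_backlog, int nreq)
{
	struct model_queue q = { 0, 0, 0, keeps_backlog };
	int i;

	for (i = 0; i < nreq; i++)
		model_submit(&q);
	model_drain(&q);
	return q.completed;
}
```

In this model, five submissions against a two-slot ring yield five
completions when the backlog is kept and only two when it is not; the
three missing completions correspond to a caller such as dm-crypt
waiting forever.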

If these algorithms have critical bugs, which it appears they do, then IMO it
would be better to disable them (either stop registering them, or disable the
whole driver) than to leave them available with low cra_priority.  Low
cra_priority doesn't guarantee that they aren't used.

- Eric
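
To illustrate why a low cra_priority is not a guarantee: a lookup by
generic algorithm name picks the highest-priority implementation, but a
lookup by driver name selects that implementation regardless of its
priority. The following is a simplified user-space model of that rule,
not kernel code. struct alg, registry[] and find_alg() are hypothetical,
and "sw_aes_xts" with priority 100 is a made-up software driver; only
"qat_aes_xts" and the priority value 1 come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct crypto_alg (illustration only). */
struct alg {
	const char *name;        /* generic name, e.g. "xts(aes)" */
	const char *driver_name; /* implementation name, e.g. "qat_aes_xts" */
	int priority;
};

/* Hypothetical registry: a software driver and the deprioritized QAT one. */
static const struct alg registry[] = {
	{ "xts(aes)", "sw_aes_xts",  100 }, /* made-up software driver */
	{ "xts(aes)", "qat_aes_xts", 1   }, /* priority from the patch */
};

/*
 * Model of a lookup: an exact driver-name match wins immediately;
 * otherwise the highest-priority entry with a matching generic name wins.
 */
static const struct alg *find_alg(const char *wanted)
{
	const struct alg *best = NULL;
	size_t i;

	for (i = 0; i < sizeof(registry) / sizeof(registry[0]); i++) {
		const struct alg *a = &registry[i];

		if (strcmp(a->driver_name, wanted) == 0)
			return a;
		if (strcmp(a->name, wanted) == 0 &&
		    (best == NULL || a->priority > best->priority))
			best = a;
	}
	return best;
}
```

Asking this model for "xts(aes)" returns the software entry, but asking
for "qat_aes_xts" still returns the QAT entry, so any user that names
the driver explicitly would keep hitting the bug.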

* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-03-03 19:21                                   ` [dm-devel] " Eric Biggers
@ 2022-03-03 21:24                                     ` Giovanni Cabiddu
  -1 siblings, 0 replies; 49+ messages in thread
From: Giovanni Cabiddu @ 2022-03-03 21:24 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Herbert Xu, Greg KH, Linus Torvalds, Kyle Sanderson,
	Dave Chinner, qat-linux, Linux-Kernal, linux-xfs,
	Linux Crypto Mailing List, device-mapper development

On Thu, Mar 03, 2022 at 07:21:33PM +0000, Eric Biggers wrote:
> If these algorithms have critical bugs, which it appears they do, then IMO it
> would be better to disable them (either stop registering them, or disable the
> whole driver) than to leave them available with low cra_priority.  Low
> cra_priority doesn't guarantee that they aren't used.
Thanks for your feedback, Eric.

Here is a patch that disables the registration of the algorithms in the
QAT driver by setting, at config time, the number of HW queues (aka
instances) to zero.

---8<---
From: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Subject: [PATCH] crypto: qat - disable registration of algorithms
Organization: Intel Research and Development Ireland Ltd - Co. Reg. #308263 - Collinstown Industrial Park, Leixlip, County Kildare - Ireland

The implementations of aead and skcipher in the QAT driver do not
properly support requests with the CRYPTO_TFM_REQ_MAY_BACKLOG flag set.
If the HW queue is full, the driver returns -EBUSY but does not enqueue
the request.
This can result in applications like dm-crypt waiting indefinitely for a
completion of a request that was never submitted to the hardware.

To avoid this problem, disable the registration of all skcipher and aead
implementations in the QAT driver by setting the number of crypto
instances to 0 at configuration time.

This patch deviates from the original upstream solution, which prevents
dm-crypt from using drivers registered with the flag
CRYPTO_ALG_ALLOCATES_MEMORY, since a backport of that set to stable
kernels may have too wide an effect.

commit 7bcb2c99f8ed032cfb3f5596b4dccac6b1f501df upstream
commit 2eb27c11937ee9984c04b75d213a737291c5f58c upstream
commit fbb6cda44190d72aa5199d728797aabc6d2ed816 upstream
commit b8aa7dc5c7535f9abfca4bceb0ade9ee10cf5f54 upstream
commit cd74693870fb748d812867ba49af733d689a3604 upstream

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
---
 drivers/crypto/qat/qat_common/qat_crypto.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/crypto/qat/qat_common/qat_crypto.c b/drivers/crypto/qat/qat_common/qat_crypto.c
index 3852d31ce0a4..611d214d5198 100644
--- a/drivers/crypto/qat/qat_common/qat_crypto.c
+++ b/drivers/crypto/qat/qat_common/qat_crypto.c
@@ -159,9 +159,7 @@ struct qat_crypto_instance *qat_crypto_get_instance_node(int node)
  */
 int qat_crypto_dev_config(struct adf_accel_dev *accel_dev)
 {
-	int cpus = num_online_cpus();
-	int banks = GET_MAX_BANKS(accel_dev);
-	int instances = min(cpus, banks);
+	int instances = 0;
 	char key[ADF_CFG_MAX_KEY_LEN_IN_BYTES];
 	int i;
 	unsigned long val;

base-commit: 866ae42cf4788c8b18de6bda0a522362702861d7
-- 
2.35.1
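
The effect of the one-line change can be sketched as follows. This is a
user-space model under an explicit assumption: that algorithm
registration is skipped whenever the configured instance count is zero,
as the commit message states. instances_before(), instances_after() and
model_register_algs() are hypothetical helpers, not functions from the
driver, and the bank count used in the example is made up.

```c
#include <assert.h>

/* Instance count before the patch: one per online CPU, capped by banks. */
static int instances_before(int online_cpus, int max_banks)
{
	return online_cpus < max_banks ? online_cpus : max_banks;
}

/* Instance count after the patch: always zero, whatever the hardware has. */
static int instances_after(int online_cpus, int max_banks)
{
	(void)online_cpus;
	(void)max_banks;
	return 0;
}

/*
 * Assumed gating (not taken from the driver source): with no crypto
 * instances configured, none of the nalgs algorithms are registered.
 * Returns the number of algorithms that end up registered.
 */
static int model_register_algs(int instances, int nalgs)
{
	return instances > 0 ? nalgs : 0;
}
```

For an 8-CPU system with a hypothetical 16 banks, the old computation
gives 8 instances and all algorithms are registered; with the patch the
count is 0 and nothing is registered, so dm-crypt cannot select QAT by
either generic name or driver name.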

* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-03-03 21:24                                     ` [dm-devel] " Giovanni Cabiddu
@ 2022-03-03 21:44                                       ` Eric Biggers
  -1 siblings, 0 replies; 49+ messages in thread
From: Eric Biggers @ 2022-03-03 21:44 UTC (permalink / raw)
  To: Giovanni Cabiddu
  Cc: Herbert Xu, Greg KH, Linus Torvalds, Kyle Sanderson,
	Dave Chinner, qat-linux, Linux-Kernal, linux-xfs,
	Linux Crypto Mailing List, device-mapper development

On Thu, Mar 03, 2022 at 09:24:42PM +0000, Giovanni Cabiddu wrote:
> On Thu, Mar 03, 2022 at 07:21:33PM +0000, Eric Biggers wrote:
> > If these algorithms have critical bugs, which it appears they do, then IMO it
> > would be better to disable them (either stop registering them, or disable the
> > whole driver) than to leave them available with low cra_priority.  Low
> > cra_priority doesn't guarantee that they aren't used.
> Thanks for your feedback, Eric.
> 
> Here is a patch that disables the registration of the algorithms in the
> QAT driver by setting, at config time, the number of HW queues (aka
> instances) to zero.
> 
> ---8<---
> From: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
> Subject: [PATCH] crypto: qat - disable registration of algorithms
> Organization: Intel Research and Development Ireland Ltd - Co. Reg. #308263 - Collinstown Industrial Park, Leixlip, County Kildare - Ireland
> 
> The implementations of aead and skcipher in the QAT driver do not
> properly support requests with the CRYPTO_TFM_REQ_MAY_BACKLOG flag set.
> If the HW queue is full, the driver returns -EBUSY but does not enqueue
> the request.
> This can result in applications like dm-crypt waiting indefinitely for a
> completion of a request that was never submitted to the hardware.
> 
> To avoid this problem, disable the registration of all skcipher and aead
> implementations in the QAT driver by setting the number of crypto
> instances to 0 at configuration time.
> 
> This patch deviates from the original upstream solution, which prevents
> dm-crypt from using drivers registered with the flag
> CRYPTO_ALG_ALLOCATES_MEMORY, since a backport of that set to stable
> kernels may have too wide an effect.
> 
> commit 7bcb2c99f8ed032cfb3f5596b4dccac6b1f501df upstream
> commit 2eb27c11937ee9984c04b75d213a737291c5f58c upstream
> commit fbb6cda44190d72aa5199d728797aabc6d2ed816 upstream
> commit b8aa7dc5c7535f9abfca4bceb0ade9ee10cf5f54 upstream
> commit cd74693870fb748d812867ba49af733d689a3604 upstream
> 
> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
> ---
>  drivers/crypto/qat/qat_common/qat_crypto.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)

Sounds good; is there any reason not to apply this upstream too, though?
You could revert it later as part of the patch series that fixes the driver.

- Eric

* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-03-03 21:44                                       ` [dm-devel] " Eric Biggers
  (?)
@ 2022-03-04 17:50                                       ` Giovanni Cabiddu
  2022-03-16 21:38                                           ` [dm-devel] " Kyle Sanderson
  -1 siblings, 1 reply; 49+ messages in thread
From: Giovanni Cabiddu @ 2022-03-04 17:50 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Herbert Xu, Greg KH, Linus Torvalds, Kyle Sanderson,
	Dave Chinner, qat-linux, Linux-Kernal, linux-xfs,
	Linux Crypto Mailing List, device-mapper development

On Thu, Mar 03, 2022 at 09:44:53PM +0000, Eric Biggers wrote:
> On Thu, Mar 03, 2022 at 09:24:42PM +0000, Giovanni Cabiddu wrote:
> > On Thu, Mar 03, 2022 at 07:21:33PM +0000, Eric Biggers wrote:
> > > If these algorithms have critical bugs, which it appears they do, then IMO it
> > > would be better to disable them (either stop registering them, or disable the
> > > whole driver) than to leave them available with low cra_priority.  Low
> > > cra_priority doesn't guarantee that they aren't used.
> > Thanks for your feedback, Eric.
> > 
> > Here is a patch that disables the registration of the algorithms in the
> > QAT driver by setting, at config time, the number of HW queues (aka
> > instances) to zero.
> > 
> > ---8<---
> > From: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
> > Subject: [PATCH] crypto: qat - disable registration of algorithms
> > Organization: Intel Research and Development Ireland Ltd - Co. Reg. #308263 - Collinstown Industrial Park, Leixlip, County Kildare - Ireland
> > 
> > The implementations of aead and skcipher in the QAT driver do not
> > properly support requests with the CRYPTO_TFM_REQ_MAY_BACKLOG flag set.
> > If the HW queue is full, the driver returns -EBUSY but does not enqueue
> > the request.
> > This can result in applications like dm-crypt waiting indefinitely for a
> > completion of a request that was never submitted to the hardware.
> > 
> > To avoid this problem, disable the registration of all skcipher and aead
> > implementations in the QAT driver by setting the number of crypto
> > instances to 0 at configuration time.
> > 
> > This patch deviates from the original upstream solution, which prevents
> > dm-crypt from using drivers registered with the flag
> > CRYPTO_ALG_ALLOCATES_MEMORY, since a backport of that set to stable
> > kernels may have too wide an effect.
> > 
> > commit 7bcb2c99f8ed032cfb3f5596b4dccac6b1f501df upstream
> > commit 2eb27c11937ee9984c04b75d213a737291c5f58c upstream
> > commit fbb6cda44190d72aa5199d728797aabc6d2ed816 upstream
> > commit b8aa7dc5c7535f9abfca4bceb0ade9ee10cf5f54 upstream
> > commit cd74693870fb748d812867ba49af733d689a3604 upstream
> > 
> > Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
> > ---
> >  drivers/crypto/qat/qat_common/qat_crypto.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> Sounds good; is there any reason not to apply this upstream too, though?
> You could revert it later as part of the patch series that fixes the driver.
Makes sense. I'm going to send it upstream and Cc stable as documented
in https://www.kernel.org/doc/html/v4.10/process/stable-kernel-rules.html#option-1
I will then revert this change in the set that fixes the problem.

Thanks,

-- 
Giovanni

* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-03-04 17:50                                       ` Giovanni Cabiddu
@ 2022-03-16 21:38                                           ` Kyle Sanderson
  0 siblings, 0 replies; 49+ messages in thread
From: Kyle Sanderson @ 2022-03-16 21:38 UTC (permalink / raw)
  To: Giovanni Cabiddu
  Cc: Eric Biggers, Herbert Xu, Greg KH, Linus Torvalds, Dave Chinner,
	qat-linux, Linux-Kernal, linux-xfs, Linux Crypto Mailing List,
	device-mapper development

> Makes sense. I'm going to send it upstream and Cc stable as documented
> in https://www.kernel.org/doc/html/v4.10/process/stable-kernel-rules.html#option-1
> I will then revert this change in the set that fixes the problem.

Did this go anywhere? I'm still not seeing it in any of the stable trees.

Kyle.

On Fri, Mar 4, 2022 at 9:50 AM Giovanni Cabiddu
<giovanni.cabiddu@intel.com> wrote:
>
> On Thu, Mar 03, 2022 at 09:44:53PM +0000, Eric Biggers wrote:
> > On Thu, Mar 03, 2022 at 09:24:42PM +0000, Giovanni Cabiddu wrote:
> > > On Thu, Mar 03, 2022 at 07:21:33PM +0000, Eric Biggers wrote:
> > > > If these algorithms have critical bugs, which it appears they do, then IMO it
> > > > would be better to disable them (either stop registering them, or disable the
> > > > whole driver) than to leave them available with low cra_priority.  Low
> > > > cra_priority doesn't guarantee that they aren't used.
> > > Thanks for your feedback, Eric.
> > >
> > > Here is a patch that disables the registration of the algorithms in the
> > > QAT driver by setting, at config time, the number of HW queues (aka
> > > instances) to zero.
> > >
> > > ---8<---
> > > From: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
> > > Subject: [PATCH] crypto: qat - disable registration of algorithms
> > > Organization: Intel Research and Development Ireland Ltd - Co. Reg. #308263 - Collinstown Industrial Park, Leixlip, County Kildare - Ireland
> > >
> > > The implementations of aead and skcipher in the QAT driver do not
> > > properly support requests with the CRYPTO_TFM_REQ_MAY_BACKLOG flag set.
> > > If the HW queue is full, the driver returns -EBUSY but does not enqueue
> > > the request.
> > > This can result in applications like dm-crypt waiting indefinitely for a
> > > completion of a request that was never submitted to the hardware.
> > >
> > > To avoid this problem, disable the registration of all skcipher and aead
> > > implementations in the QAT driver by setting the number of crypto
> > > instances to 0 at configuration time.
> > >
> > > This patch deviates from the original upstream solution, which prevents
> > > dm-crypt from using drivers registered with the flag
> > > CRYPTO_ALG_ALLOCATES_MEMORY, since a backport of that set to stable
> > > kernels may have too wide an effect.
> > >
> > > commit 7bcb2c99f8ed032cfb3f5596b4dccac6b1f501df upstream
> > > commit 2eb27c11937ee9984c04b75d213a737291c5f58c upstream
> > > commit fbb6cda44190d72aa5199d728797aabc6d2ed816 upstream
> > > commit b8aa7dc5c7535f9abfca4bceb0ade9ee10cf5f54 upstream
> > > commit cd74693870fb748d812867ba49af733d689a3604 upstream
> > >
> > > Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
> > > ---
> > >  drivers/crypto/qat/qat_common/qat_crypto.c | 4 +---
> > >  1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > Sounds good; is there any reason not to apply this upstream too, though?
> > You could revert it later as part of the patch series that fixes the driver.
> Makes sense. I'm going to send it upstream and Cc stable as documented
> in https://www.kernel.org/doc/html/v4.10/process/stable-kernel-rules.html#option-1
> I will then revert this change in the set that fixes the problem.
>
> Thanks,
>
> --
> Giovanni

* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-03-16 21:38                                           ` [dm-devel] " Kyle Sanderson
@ 2022-03-16 22:13                                             ` Herbert Xu
  -1 siblings, 0 replies; 49+ messages in thread
From: Herbert Xu @ 2022-03-16 22:13 UTC (permalink / raw)
  To: Kyle Sanderson
  Cc: Giovanni Cabiddu, Eric Biggers, Greg KH, Linus Torvalds,
	Dave Chinner, qat-linux, Linux-Kernal, linux-xfs,
	Linux Crypto Mailing List, device-mapper development

On Wed, Mar 16, 2022 at 02:38:10PM -0700, Kyle Sanderson wrote:
> > Makes sense. I'm going to send it upstream and Cc stable as documented
> > in https://www.kernel.org/doc/html/v4.10/process/stable-kernel-rules.html#option-1
> > I will then revert this change in the set that fixes the problem.
> 
> Did this go anywhere? I'm still not seeing it in any of the stable trees.

It's in the cryptodev tree which should hit mainline when the merge
window opens.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


end of thread, other threads:[~2022-03-17  8:03 UTC | newest]

Thread overview: 49+ messages
2022-02-19  5:02 Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs Kyle Sanderson
2022-02-19  5:02 ` [dm-devel] " Kyle Sanderson
2022-02-19 21:03 ` Dave Chinner
2022-02-19 21:03   ` [dm-devel] " Dave Chinner
2022-02-19 23:00   ` Kyle Sanderson
2022-02-19 23:00     ` [dm-devel] " Kyle Sanderson
2022-02-21 11:47     ` Giovanni Cabiddu
2022-02-21 11:47       ` [dm-devel] " Giovanni Cabiddu
2022-02-28  8:18       ` Kyle Sanderson
2022-02-28  8:18         ` [dm-devel] " Kyle Sanderson
2022-02-28 19:25         ` Linus Torvalds
2022-02-28 19:25           ` [dm-devel] " Linus Torvalds
2022-02-28 20:39           ` Giovanni Cabiddu
2022-02-28 20:39             ` [dm-devel] " Giovanni Cabiddu
2022-02-28 20:59             ` Greg KH
2022-02-28 20:59               ` [dm-devel] " Greg KH
2022-02-28 23:26             ` Herbert Xu
2022-02-28 23:26               ` [dm-devel] " Herbert Xu
2022-03-01  1:12               ` Linus Torvalds
2022-03-01  1:12                 ` [dm-devel] " Linus Torvalds
2022-03-01  4:11                 ` Herbert Xu
2022-03-01  4:11                   ` [dm-devel] " Herbert Xu
2022-03-02 10:29                   ` Greg KH
2022-03-02 10:29                     ` [dm-devel] " Greg KH
2022-03-02 11:49                     ` Giovanni Cabiddu
2022-03-02 11:49                       ` [dm-devel] " Giovanni Cabiddu
2022-03-02 14:56                       ` Greg KH
2022-03-02 14:56                         ` [dm-devel] " Greg KH
2022-03-02 22:27                         ` Herbert Xu
2022-03-02 22:27                           ` [dm-devel] " Herbert Xu
2022-03-02 22:42                           ` Giovanni Cabiddu
2022-03-02 22:42                             ` [dm-devel] " Giovanni Cabiddu
2022-03-02 22:45                             ` Herbert Xu
2022-03-02 22:45                               ` [dm-devel] " Herbert Xu
2022-03-03 13:49                               ` Giovanni Cabiddu
2022-03-03 13:49                                 ` [dm-devel] " Giovanni Cabiddu
2022-03-03 19:21                                 ` Eric Biggers
2022-03-03 19:21                                   ` [dm-devel] " Eric Biggers
2022-03-03 21:24                                   ` Giovanni Cabiddu
2022-03-03 21:24                                     ` [dm-devel] " Giovanni Cabiddu
2022-03-03 21:44                                     ` Eric Biggers
2022-03-03 21:44                                       ` [dm-devel] " Eric Biggers
2022-03-04 17:50                                       ` Giovanni Cabiddu
2022-03-16 21:38                                         ` Kyle Sanderson
2022-03-16 21:38                                           ` [dm-devel] " Kyle Sanderson
2022-03-16 22:13                                           ` Herbert Xu
2022-03-16 22:13                                             ` [dm-devel] " Herbert Xu
2022-02-28 21:13           ` Milan Broz
2022-02-28 21:13             ` Milan Broz
