* Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
From: Kyle Sanderson @ 2022-02-19 5:02 UTC
To: qat-linux, giovanni.cabiddu
Cc: Linux-Kernal, linux-xfs, linux-crypto, dm-devel, Linus Torvalds

A2SDi-8C-HLN4F has IQAT enabled by default, when this device is
attempted to be used by xfs (through dm-crypt) the entire kernel
thread stalls forever. Multiple users have hit this over the years
(through sporadic reporting) - I ended up trying ZFS and encryption
wasn't an issue there at all because I guess they don't use this
device. Returning to sanity (xfs), I was able to provision a dm-crypt
volume no problem on the disk, however when running mkfs.xfs on the
volume is what triggers the cascading failure (each request kills a
kthread).

Disabling IQAT on the south bridge results in a working
system, however this is not the default configuration for the
distribution of choice (Ubuntu 20.04.3 LTS), nor the motherboard. I'm
convinced this never worked properly based on the lack of popularity
for kernel encryption (crypto), and the embedded nature that
SuperMicro has integrated this device in collaboration with Intel as
it looks like the primary usage is through external accelerator cards.

Kernels tried were from RHEL8 over a year ago, and this impacts the
entirety of the 5.4 series on Ubuntu.
Please CC me on replies as I'm not subscribed to all lists. CPU is C3758.

[ 363.495058] INFO: task kworker/u16:0:8 blocked for more than 120 seconds.
[ 363.495114] Tainted: P O 5.4.0-100-generic #113-Ubuntu
[ 363.495155] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.495201] kworker/u16:0 D 0 8 2 0x80004000
[ 363.495213] Workqueue: kcryptd/253:0 kcryptd_crypt [dm_crypt]
[ 363.495214] Call Trace:
[ 363.495223] __schedule+0x2e3/0x740
[ 363.495226] schedule+0x42/0xb0
[ 363.495228] schedule_timeout+0x10e/0x160
[ 363.495232] ? skcipher_encrypt_ablkcipher+0x61/0x70
[ 363.495233] ? crypto_skcipher_encrypt+0x48/0x60
[ 363.495236] wait_for_completion+0xb1/0x120
[ 363.495239] ? wake_up_q+0x70/0x70
[ 363.495242] crypt_convert+0x144/0x1f0 [dm_crypt]
[ 363.495245] kcryptd_crypt+0x2b9/0x3b0 [dm_crypt]
[ 363.495249] process_one_work+0x1eb/0x3b0
[ 363.495251] worker_thread+0x4d/0x400
[ 363.495254] kthread+0x104/0x140
[ 363.495256] ? process_one_work+0x3b0/0x3b0
[ 363.495257] ? kthread_park+0x90/0x90
[ 363.495260] ret_from_fork+0x1f/0x40
[ 363.495274] INFO: task kworker/u16:1:123 blocked for more than 120 seconds.
[ 363.495317] Tainted: P O 5.4.0-100-generic #113-Ubuntu
[ 363.495364] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.495410] kworker/u16:1 D 0 123 2 0x80004000
[ 363.495415] Workqueue: kcryptd/253:0 kcryptd_crypt [dm_crypt]
[ 363.495416] Call Trace:
[ 363.495419] __schedule+0x2e3/0x740
[ 363.495422] schedule+0x42/0xb0
[ 363.495424] schedule_timeout+0x10e/0x160
[ 363.495426] ? skcipher_encrypt_ablkcipher+0x61/0x70
[ 363.495427] ? crypto_skcipher_encrypt+0x48/0x60
[ 363.495430] wait_for_completion+0xb1/0x120
[ 363.495431] ? wake_up_q+0x70/0x70
[ 363.495434] crypt_convert+0x144/0x1f0 [dm_crypt]
[ 363.495437] kcryptd_crypt+0x2b9/0x3b0 [dm_crypt]
[ 363.495441] process_one_work+0x1eb/0x3b0
[ 363.495443] worker_thread+0x4d/0x400
[ 363.495445] kthread+0x104/0x140
[ 363.495447] ? process_one_work+0x3b0/0x3b0
[ 363.495449] ? kthread_park+0x90/0x90
[ 363.495451] ret_from_fork+0x1f/0x40
[ 363.495457] INFO: task kworker/u16:2:153 blocked for more than 120 seconds.
[ 363.495499] Tainted: P O 5.4.0-100-generic #113-Ubuntu
[ 363.495539] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.495584] kworker/u16:2 D 0 153 2 0x80004000
[ 363.495589] Workqueue: kcryptd/253:5 kcryptd_crypt [dm_crypt]
[ 363.495590] Call Trace:
[ 363.495593] __schedule+0x2e3/0x740
[ 363.495595] schedule+0x42/0xb0
[ 363.495597] schedule_timeout+0x10e/0x160
[ 363.495599] ? skcipher_decrypt_ablkcipher+0x61/0x70
[ 363.495601] ? crypto_skcipher_decrypt+0x48/0x60
[ 363.495603] wait_for_completion+0xb1/0x120
[ 363.495605] ? wake_up_q+0x70/0x70
[ 363.495608] crypt_convert+0x144/0x1f0 [dm_crypt]
[ 363.495611] kcryptd_crypt+0xc6/0x3b0 [dm_crypt]
[ 363.495613] ? __switch_to+0x7f/0x480
[ 363.495615] ? switch_mm_irqs_off+0x19b/0x500
[ 363.495618] process_one_work+0x1eb/0x3b0
[ 363.495621] worker_thread+0x4d/0x400
[ 363.495623] kthread+0x104/0x140
[ 363.495625] ? process_one_work+0x3b0/0x3b0
[ 363.495627] ? kthread_park+0x90/0x90
[ 363.495629] ret_from_fork+0x1f/0x40
[ 363.495636] INFO: task kworker/u16:5:279 blocked for more than 120 seconds.
[ 363.495677] Tainted: P O 5.4.0-100-generic #113-Ubuntu
[ 363.495717] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.495762] kworker/u16:5 D 0 279 2 0x80004000
[ 363.495766] Workqueue: kcryptd/253:0 kcryptd_crypt [dm_crypt]
[ 363.495767] Call Trace:
[ 363.495771] __schedule+0x2e3/0x740
[ 363.495773] schedule+0x42/0xb0
[ 363.495775] schedule_timeout+0x10e/0x160
[ 363.495777] ? skcipher_encrypt_ablkcipher+0x61/0x70
[ 363.495778] ? crypto_skcipher_encrypt+0x48/0x60
[ 363.495781] wait_for_completion+0xb1/0x120
[ 363.495782] ? wake_up_q+0x70/0x70
[ 363.495785] crypt_convert+0x144/0x1f0 [dm_crypt]
[ 363.495788] kcryptd_crypt+0x2b9/0x3b0 [dm_crypt]
[ 363.495791] process_one_work+0x1eb/0x3b0
[ 363.495794] worker_thread+0x4d/0x400
[ 363.495796] kthread+0x104/0x140
[ 363.495798] ? process_one_work+0x3b0/0x3b0
[ 363.495800] ? kthread_park+0x90/0x90
[ 363.495802] ret_from_fork+0x1f/0x40
[ 363.495808] INFO: task kworker/u16:11:299 blocked for more than 120 seconds.
[ 363.495849] Tainted: P O 5.4.0-100-generic #113-Ubuntu
[ 363.495890] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.495935] kworker/u16:11 D 0 299 2 0x80004000
[ 363.495939] Workqueue: kcryptd/253:0 kcryptd_crypt [dm_crypt]
[ 363.495940] Call Trace:
[ 363.495943] __schedule+0x2e3/0x740
[ 363.495946] schedule+0x42/0xb0
[ 363.495947] schedule_timeout+0x10e/0x160
[ 363.495949] ? skcipher_encrypt_ablkcipher+0x61/0x70
[ 363.495951] ? crypto_skcipher_encrypt+0x48/0x60
[ 363.495953] wait_for_completion+0xb1/0x120
[ 363.495955] ? wake_up_q+0x70/0x70
[ 363.495958] crypt_convert+0x144/0x1f0 [dm_crypt]
[ 363.495961] kcryptd_crypt+0x2b9/0x3b0 [dm_crypt]
[ 363.495964] process_one_work+0x1eb/0x3b0
[ 363.495966] worker_thread+0x4d/0x400
[ 363.495969] kthread+0x104/0x140
[ 363.495971] ? process_one_work+0x3b0/0x3b0
[ 363.495972] ? kthread_park+0x90/0x90
[ 363.495974] ret_from_fork+0x1f/0x40
[ 363.495977] INFO: task kworker/u16:12:300 blocked for more than 120 seconds.
[ 363.496018] Tainted: P O 5.4.0-100-generic #113-Ubuntu
[ 363.496058] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.496108] kworker/u16:12 D 0 300 2 0x80004000
[ 363.496113] Workqueue: kcryptd/253:0 kcryptd_crypt [dm_crypt]
[ 363.496114] Call Trace:
[ 363.496117] __schedule+0x2e3/0x740
[ 363.496120] schedule+0x42/0xb0
[ 363.496121] schedule_timeout+0x10e/0x160
[ 363.496123] ? skcipher_encrypt_ablkcipher+0x61/0x70
[ 363.496125] ? crypto_skcipher_encrypt+0x48/0x60
[ 363.496127] wait_for_completion+0xb1/0x120
[ 363.496129] ? wake_up_q+0x70/0x70
[ 363.496132] crypt_convert+0x144/0x1f0 [dm_crypt]
[ 363.496134] kcryptd_crypt+0x2b9/0x3b0 [dm_crypt]
[ 363.496138] process_one_work+0x1eb/0x3b0
[ 363.496140] worker_thread+0x4d/0x400
[ 363.496142] kthread+0x104/0x140
[ 363.496144] ? process_one_work+0x3b0/0x3b0
[ 363.496146] ? kthread_park+0x90/0x90
[ 363.496148] ret_from_fork+0x1f/0x40
[ 363.496151] INFO: task kworker/u16:13:301 blocked for more than 120 seconds.
[ 363.496193] Tainted: P O 5.4.0-100-generic #113-Ubuntu
[ 363.496233] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.496278] kworker/u16:13 D 0 301 2 0x80004000
[ 363.496282] Workqueue: kcryptd/253:0 kcryptd_crypt [dm_crypt]
[ 363.496283] Call Trace:
[ 363.496286] __schedule+0x2e3/0x740
[ 363.496289] schedule+0x42/0xb0
[ 363.496290] schedule_timeout+0x10e/0x160
[ 363.496292] ? skcipher_encrypt_ablkcipher+0x61/0x70
[ 363.496294] ? crypto_skcipher_encrypt+0x48/0x60
[ 363.496296] wait_for_completion+0xb1/0x120
[ 363.496298] ? wake_up_q+0x70/0x70
[ 363.496301] crypt_convert+0x144/0x1f0 [dm_crypt]
[ 363.496304] kcryptd_crypt+0x2b9/0x3b0 [dm_crypt]
[ 363.496307] process_one_work+0x1eb/0x3b0
[ 363.496310] worker_thread+0x4d/0x400
[ 363.496312] kthread+0x104/0x140
[ 363.496314] ? process_one_work+0x3b0/0x3b0
[ 363.496316] ? kthread_park+0x90/0x90
[ 363.496317] ret_from_fork+0x1f/0x40
[ 363.496320] INFO: task kworker/u16:14:302 blocked for more than 120 seconds.
[ 363.496362] Tainted: P O 5.4.0-100-generic #113-Ubuntu
[ 363.496402] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.496447] kworker/u16:14 D 0 302 2 0x80004000
[ 363.496451] Workqueue: kcryptd/253:0 kcryptd_crypt [dm_crypt]
[ 363.496452] Call Trace:
[ 363.496455] __schedule+0x2e3/0x740
[ 363.496458] schedule+0x42/0xb0
[ 363.496459] schedule_timeout+0x10e/0x160
[ 363.496461] ? skcipher_encrypt_ablkcipher+0x61/0x70
[ 363.496463] ? crypto_skcipher_encrypt+0x48/0x60
[ 363.496465] wait_for_completion+0xb1/0x120
[ 363.496467] ? wake_up_q+0x70/0x70
[ 363.496470] crypt_convert+0x144/0x1f0 [dm_crypt]
[ 363.496473] kcryptd_crypt+0x2b9/0x3b0 [dm_crypt]
[ 363.496476] process_one_work+0x1eb/0x3b0
[ 363.496478] worker_thread+0x4d/0x400
[ 363.496481] kthread+0x104/0x140
[ 363.496483] ? process_one_work+0x3b0/0x3b0
[ 363.496484] ? kthread_park+0x90/0x90
[ 363.496486] ret_from_fork+0x1f/0x40
[ 363.496489] INFO: task kworker/u16:15:303 blocked for more than 120 seconds.
[ 363.496531] Tainted: P O 5.4.0-100-generic #113-Ubuntu
[ 363.496571] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.496616] kworker/u16:15 D 0 303 2 0x80004000
[ 363.496620] Workqueue: kcryptd/253:0 kcryptd_crypt [dm_crypt]
[ 363.496621] Call Trace:
[ 363.496624] __schedule+0x2e3/0x740
[ 363.496627] schedule+0x42/0xb0
[ 363.496629] schedule_timeout+0x10e/0x160
[ 363.496630] ? skcipher_encrypt_ablkcipher+0x61/0x70
[ 363.496632] ? crypto_skcipher_encrypt+0x48/0x60
[ 363.496634] wait_for_completion+0xb1/0x120
[ 363.496636] ? wake_up_q+0x70/0x70
[ 363.496639] crypt_convert+0x144/0x1f0 [dm_crypt]
[ 363.496642] kcryptd_crypt+0x2b9/0x3b0 [dm_crypt]
[ 363.496645] process_one_work+0x1eb/0x3b0
[ 363.496647] worker_thread+0x4d/0x400
[ 363.496650] kthread+0x104/0x140
[ 363.496652] ? process_one_work+0x3b0/0x3b0
[ 363.496654] ? kthread_park+0x90/0x90
[ 363.496655] ret_from_fork+0x1f/0x40
[ 363.496713] INFO: task mergerfs:9760 blocked for more than 120 seconds.
[ 363.496752] Tainted: P O 5.4.0-100-generic #113-Ubuntu
[ 363.496793] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.496838] mergerfs D 0 9760 1 0x00000000
[ 363.496840] Call Trace:
[ 363.496843] __schedule+0x2e3/0x740
[ 363.496846] schedule+0x42/0xb0
[ 363.496848] schedule_timeout+0x10e/0x160
[ 363.496851] ? blk_finish_plug+0x26/0x40
[ 363.496853] wait_for_completion+0xb1/0x120
[ 363.496855] ? wake_up_q+0x70/0x70
[ 363.496910] ? __xfs_buf_submit+0x138/0x260 [xfs]
[ 363.496950] xfs_buf_iowait+0x26/0xe0 [xfs]
[ 363.496990] __xfs_buf_submit+0x138/0x260 [xfs]
[ 363.497030] _xfs_buf_read+0x27/0x30 [xfs]
[ 363.497070] xfs_buf_read_map+0x132/0x1d0 [xfs]
[ 363.497073] ? new_slab+0x4a/0x70
[ 363.497117] xfs_trans_read_buf_map+0xca/0x350 [xfs]
[ 363.497155] xfs_imap_to_bp+0x66/0xd0 [xfs]
[ 363.497193] xfs_iread+0x83/0x200 [xfs]
[ 363.497234] xfs_iget+0x214/0x9e0 [xfs]
[ 363.497270] ? xfs_da_compname+0x1d/0x30 [xfs]
[ 363.497306] ? xfs_dir2_sf_lookup+0xd0/0x200 [xfs]
[ 363.497348] xfs_lookup+0xe2/0x120 [xfs]
[ 363.497390] xfs_vn_lookup+0x72/0xb0 [xfs]
[ 363.497393] __lookup_slow+0x92/0x160
[ 363.497395] lookup_slow+0x3b/0x60
[ 363.497397] walk_component+0x1da/0x360
[ 363.497399] ? link_path_walk.part.0+0x2a2/0x550
[ 363.497401] path_lookupat.isra.0+0x80/0x230
[ 363.497404] filename_lookup+0xae/0x170
[ 363.497407] ? __check_object_size+0x13f/0x150
[ 363.497409] ? strncpy_from_user+0x4c/0x150
[ 363.497412] user_path_at_empty+0x3a/0x50
[ 363.497414] vfs_statx+0x7d/0xe0
[ 363.497417] __do_sys_newlstat+0x3e/0x80
[ 363.497419] ? vfs_read+0x12e/0x160
[ 363.497420] ? fput+0x13/0x20
[ 363.497422] ? ksys_read+0xce/0xe0
[ 363.497424] __x64_sys_newlstat+0x16/0x20
[ 363.497427] do_syscall_64+0x57/0x190
[ 363.497429] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 363.497432] RIP: 0033:0x7f7f32c656ea
[ 363.497438] Code: Bad RIP value.
[ 363.497439] RSP: 002b:00007f7f31fea0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
[ 363.497441] RAX: ffffffffffffffda RBX: 0000560953796248 RCX: 00007f7f32c656ea
[ 363.497442] RDX: 00007f7f31fea110 RSI: 00007f7f31fea110 RDI: 00007f7f31fea100
[ 363.497443] RBP: 00007f7f31fea120 R08: 0000000000000001 R09: 000000000000000a
[ 363.497445] R10: 00007f7f14000b90 R11: 0000000000000246 R12: 00007f7f31fea220
[ 363.497446] R13: 00007f7f14000b90 R14: 00007f7f31fea100 R15: 00007f7f31fea110

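The ten hung-task reports in the log above differ mainly in task name and workqueue, so they can be condensed mechanically. What follows is a minimal sketch that was not part of the original report: `summarize_hung_tasks` and `SAMPLE` are hypothetical names, and the regular expressions assume the report layout shown in this thread (kernel 5.4); other kernel versions may print these lines differently.

```python
import re
from collections import Counter

# Patterns assume the hung-task report layout seen in this thread (kernel 5.4).
HUNG_RE = re.compile(r"INFO: task (.+):(\d+) blocked for more than (\d+) seconds")
WQ_RE = re.compile(r"Workqueue: (\S+)")


def summarize_hung_tasks(log_text):
    """Return one dict per hung-task report: task name, pid, workqueue (or None)."""
    reports = []
    for line in log_text.splitlines():
        hung = HUNG_RE.search(line)
        if hung:
            reports.append(
                {"task": hung.group(1), "pid": int(hung.group(2)), "workqueue": None}
            )
            continue
        wq = WQ_RE.search(line)
        # Attach the first Workqueue line that follows a report to that report.
        if wq and reports and reports[-1]["workqueue"] is None:
            reports[-1]["workqueue"] = wq.group(1)
    return reports


# Lines excerpted from three of the reports in this thread.
SAMPLE = """\
[ 363.495058] INFO: task kworker/u16:0:8 blocked for more than 120 seconds.
[ 363.495213] Workqueue: kcryptd/253:0 kcryptd_crypt [dm_crypt]
[ 363.495457] INFO: task kworker/u16:2:153 blocked for more than 120 seconds.
[ 363.495589] Workqueue: kcryptd/253:5 kcryptd_crypt [dm_crypt]
[ 363.496713] INFO: task mergerfs:9760 blocked for more than 120 seconds.
[ 363.496840] Call Trace:
"""

reports = summarize_hung_tasks(SAMPLE)
per_queue = Counter(r["workqueue"] or "(no workqueue)" for r in reports)
for queue, count in sorted(per_queue.items()):
    print(f"{queue}: {count} blocked task(s)")
```

Run against the full log, a summary like this makes it easier to see that nearly all blocked workers sit on kcryptd workqueues, while mergerfs is blocked in XFS waiting on buffer I/O.
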
* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
From: Dave Chinner @ 2022-02-19 21:03 UTC
To: Kyle Sanderson
Cc: qat-linux, giovanni.cabiddu, Linux-Kernal, linux-xfs, linux-crypto, dm-devel, Linus Torvalds

On Fri, Feb 18, 2022 at 09:02:28PM -0800, Kyle Sanderson wrote:
> A2SDi-8C-HLN4F has IQAT enabled by default, when this device is
> attempted to be used by xfs (through dm-crypt) the entire kernel
> thread stalls forever. Multiple users have hit this over the years
> (through sporadic reporting) - I ended up trying ZFS and encryption
> wasn't an issue there at all because I guess they don't use this
> device. Returning to sanity (xfs), I was able to provision a dm-crypt
> volume no problem on the disk, however when running mkfs.xfs on the
> volume is what triggers the cascading failure (each request kills a
> kthread).

Can you provide the full stack traces for these errors so we can see
exactly what this cascading failure looks like, please? In reality,
the stall messages some time after this are not interesting - it's
the first errors that cause the stall that need to be investigated.

A good idea would be to provide the full storage stack description
and hardware in use, as per:

https://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

> Disabling IQAT on the south bridge results in a working
> system, however this is not the default configuration for the
> distribution of choice (Ubuntu 20.04.3 LTS), nor the motherboard. I'm
> convinced this never worked properly based on the lack of popularity
> for kernel encryption (crypto), and the embedded nature that
> SuperMicro has integrated this device in collaboration with Intel as
> it looks like the primary usage is through external accelerator cards.

This really sounds like broken hardware, not a kernel problem.

> Kernels tried were from RHEL8 over a year ago, and this impacts the
> entirety of the 5.4 series on Ubuntu.
> Please CC me on replies as I'm not subscribed to all lists. CPU is C3758.

[snip stalled kcryptd worker threads]

This implies a dmcrypt level problem - XFS can't make progress if
dmcrypt is not completing IOs.

Where are the XFS corruption reports that the subject implies is
occurring?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

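The request above for a full storage stack description can be answered partly by script. The following is a rough sketch, not something posted in the thread: it gathers a few of the items the XFS FAQ asks for (kernel version, xfsprogs version, block device layout, and three /proc files). The names `collect`, `format_report`, `COMMANDS` and `FILES` are hypothetical, the selection is an assumption about what is useful here, and the FAQ lists further items (dmesg output, RAID and LVM layout, `xfs_info` output) that are left out.

```python
import subprocess
from pathlib import Path

# A subset of the XFS FAQ checklist; extend it for the system at hand.
COMMANDS = {
    "kernel": ["uname", "-a"],
    "xfsprogs": ["xfs_repair", "-V"],
    "block devices": ["lsblk"],
}
FILES = ["/proc/meminfo", "/proc/mounts", "/proc/partitions"]


def run(argv):
    """Run a command and return its output, or a note if it cannot be run."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"<unavailable: {exc}>"
    return (done.stdout + done.stderr).strip()


def read(path):
    """Return the contents of a file, or a note if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError as exc:
        return f"<unavailable: {exc}>"


def collect(commands=COMMANDS, files=FILES):
    """Gather command outputs and file contents into one labelled dict."""
    report = {label: run(argv) for label, argv in commands.items()}
    report.update({path: read(path) for path in files})
    return report


def format_report(report):
    """Render the dict as plain text suitable for pasting into a reply."""
    return "\n\n".join(f"== {label} ==\n{body}" for label, body in report.items())
```

Calling `print(format_report(collect()))` produces a block of text for a follow-up mail. Missing tools or unreadable files are reported inline instead of aborting the run, so a partial report is still produced.
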
* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs 2022-02-19 21:03 ` [dm-devel] " Dave Chinner @ 2022-02-19 23:00 ` Kyle Sanderson -1 siblings, 0 replies; 49+ messages in thread From: Kyle Sanderson @ 2022-02-19 23:00 UTC (permalink / raw) To: Dave Chinner Cc: qat-linux, giovanni.cabiddu, Linux-Kernal, linux-xfs, linux-crypto, dm-devel, Linus Torvalds, Greg KH, salvatore.benedetto, herbert, pablo.marcos.oltra hi Dave, > This really sounds like broken hardware, not a kernel problem. It is indeed a hardware issue, specifically the intel qat crypto driver that's in-tree - the hardware is fine (see below). The IQAT eratta documentation states that if a request is not submitted properly it can stall the entire device. The remediation guidance from 2020 was "don't do that" and "don't allow unprivileged users access to the device". The in-tree driver is not implemented properly either for this SoC or board - I'm thinking it's related to QATE-7495. https://01.org/sites/default/files/downloads//336211qatsoftwareforlinux-rn-hwversion1.7021.pdf > This implies a dmcrypt level problem - XFS can't make progress is dmcrypt is not completing IOs. That's the weird part about it. Some bio's are completing, others are completely dropped, with some stalling forever. I had to use xfs_repair to get the volumes operational again. I lost a good deal of files and had to recover from backup after toggling the device back on on a production system (silly, I know). > Where are the XFS corruption reports that the subject implies is occurring? I think you're right, it's dm-crypt that's broken here, with ultimately the crypto driver causing this corruption. XFS being the edge to the end-user is taking the brunt of it. There's reports going back to late 2017 of significant issues with this mainlined stable driver. 

https://bugzilla.redhat.com/show_bug.cgi?id=1522962
https://serverfault.com/questions/1010108/luks-hangs-on-centos-running-on-atom-c3758-cpu
https://www.phoronix.com/forums/forum/software/distributions/1172231-fedora-33-s-enterprise-linux-next-effort-approved-testbed-for-raising-cpu-requirements-etc?p=1174560#post1174560

Any guidance would be appreciated.
Kyle.

On Sat, Feb 19, 2022 at 1:03 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Fri, Feb 18, 2022 at 09:02:28PM -0800, Kyle Sanderson wrote:
> > A2SDi-8C-HLN4F has IQAT enabled by default, when this device is
> > attempted to be used by xfs (through dm-crypt) the entire kernel
> > thread stalls forever. Multiple users have hit this over the years
> > (through sporadic reporting) - I ended up trying ZFS and encryption
> > wasn't an issue there at all because I guess they don't use this
> > device. Returning to sanity (xfs), I was able to provision a dm-crypt
> > volume no problem on the disk, however when running mkfs.xfs on the
> > volume is what triggers the cascading failure (each request kills a
> > kthread).
>
> Can you provide the full stack traces for these errors so we can see
> exactly what this cascading failure looks like, please? In reality,
> the stall messages some time after this are not interesting - it's
> the first errors that cause the stall that need to be investigated.
>
> A good idea would be to provide the full storage stack description
> and hardware in use, as per:
>
> https://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
>
> > Disabling IQAT on the south bridge results in a working
> > system, however this is not the default configuration for the
> > distribution of choice (Ubuntu 20.04.3 LTS), nor the motherboard. I'm
> > convinced this never worked properly based on the lack of popularity
> > for kernel encryption (crypto), and the embedded nature that
> > SuperMicro has integrated this device in collaboration with intel as
> > it looks like the primary usage is through external accelerator cards.
>
> This really sounds like broken hardware, not a kernel problem.
>
> > Kernels tried were from RHEL8 over a year ago, and this impacts the
> > entirety of the 5.4 series on Ubuntu.
> > Please CC me on replies as I'm not subscribed to all lists. CPU is C3758.
>
> [snip stalled kcryptd worker threads]
>
> This implies a dmcrypt level problem - XFS can't make progress if
> dmcrypt is not completing IOs.
>
> Where are the XFS corruption reports that the subject implies is
> occurring?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-02-19 23:00 ` [dm-devel] " Kyle Sanderson
@ 2022-02-21 11:47   ` Giovanni Cabiddu
  -1 siblings, 0 replies; 49+ messages in thread
From: Giovanni Cabiddu @ 2022-02-21 11:47 UTC (permalink / raw)
  To: Kyle Sanderson, herbert
  Cc: Dave Chinner, qat-linux, Linux-Kernal, linux-xfs, linux-crypto, dm-devel, Linus Torvalds, Greg KH

Hi Kyle,

The issue is that the implementations of aead and skcipher in the QAT
driver are not properly supporting requests with the
CRYPTO_TFM_REQ_MAY_BACKLOG flag set.
If the HW queue is full, the driver returns -EBUSY [1] but does not
enqueue the request as dm-crypt expects [2]. Dm-crypt ends up waiting
indefinitely for a completion to a request that was never submitted,
therefore the stall.
This is not related to QATE-7495 'An incorrectly formatted request to
QAT can hang the entire QAT endpoint' [3], which occurs when a malformed
request is sent to the device.

I'm working on a patch that resolves this problem. In the meantime, a
workaround is to blacklist the qat_c3xxx.ko driver.

Regarding avoiding this issue on stable kernels: the usage of QAT with
dm-crypt was already disabled in kernel 5.10 for a different issue
(the driver allocates memory in the datapath).
The following patches implement the change:
    7bcb2c99f8ed crypto: algapi - use common mechanism for inheriting flags
    2eb27c11937e crypto: algapi - add NEED_FALLBACK to INHERITED_FLAGS
    fbb6cda44190 crypto: algapi - introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY
    b8aa7dc5c753 crypto: drivers - set the flag CRYPTO_ALG_ALLOCATES_MEMORY
    cd74693870fb dm crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY
An option would be to send the patches above to stable, another is to wait
for a patch that fixes the problems in the QAT driver and send that to
stable.
@Herbert, what is the preferred approach here?

Thanks,

[1] https://elixir.bootlin.com/linux/latest/source/drivers/crypto/qat/qat_common/qat_algs.c#L1022
[2] https://elixir.bootlin.com/linux/latest/source/drivers/md/dm-crypt.c#L1584
[3] https://01.org/sites/default/files/downloads//336211qatsoftwareforlinux-rn-hwversion1.7021.pdf - page 25

--
Giovanni

On Sat, Feb 19, 2022 at 03:00:51PM -0800, Kyle Sanderson wrote:
> hi Dave,
>
> > This really sounds like broken hardware, not a kernel problem.
>
> It is indeed a hardware issue, specifically the intel qat crypto
> driver that's in-tree - the hardware is fine (see below). The IQAT
> errata documentation states that if a request is not submitted
> properly it can stall the entire device. The remediation guidance from
> 2020 was "don't do that" and "don't allow unprivileged users access to
> the device". The in-tree driver is not implemented properly either for
> this SoC or board - I'm thinking it's related to QATE-7495.
>
> https://01.org/sites/default/files/downloads//336211qatsoftwareforlinux-rn-hwversion1.7021.pdf
>
> > This implies a dmcrypt level problem - XFS can't make progress if dmcrypt is not completing IOs.
>
> That's the weird part about it. Some bios are completing, others are
> completely dropped, with some stalling forever. I had to use
> xfs_repair to get the volumes operational again. I lost a good deal of
> files and had to recover from backup after toggling the device back on
> on a production system (silly, I know).
>
> > Where are the XFS corruption reports that the subject implies is occurring?
>
> I think you're right, it's dm-crypt that's broken here, with
> ultimately the crypto driver causing this corruption. XFS being the
> edge to the end-user is taking the brunt of it. There are reports going
> back to late 2017 of significant issues with this mainlined stable
> driver.
> > https://bugzilla.redhat.com/show_bug.cgi?id=1522962 > https://serverfault.com/questions/1010108/luks-hangs-on-centos-running-on-atom-c3758-cpu > https://www.phoronix.com/forums/forum/software/distributions/1172231-fedora-33-s-enterprise-linux-next-effort-approved-testbed-for-raising-cpu-requirements-etc?p=1174560#post1174560 > > Any guidance would be appreciated. > Kyle. > On Sat, Feb 19, 2022 at 1:03 PM Dave Chinner <david@fromorbit.com> wrote: > > > > On Fri, Feb 18, 2022 at 09:02:28PM -0800, Kyle Sanderson wrote: > > > A2SDi-8C-HLN4F has IQAT enabled by default, when this device is > > > attempted to be used by xfs (through dm-crypt) the entire kernel > > > thread stalls forever. Multiple users have hit this over the years > > > (through sporadic reporting) - I ended up trying ZFS and encryption > > > wasn't an issue there at all because I guess they don't use this > > > device. Returning to sanity (xfs), I was able to provision a dm-crypt > > > volume no problem on the disk, however when running mkfs.xfs on the > > > volume is what triggers the cascading failure (each request kills a > > > kthread). > > > > Can you provide the full stack traces for these errors so we can see > > exactly what this cascading failure looks like, please? In reality, > > the stall messages some time after this are not interesting - it's > > the first errors that cause the stall that need to be investigated. > > > > A good idea would be to provide the full storage stack decription > > and hardware in use, as per: > > > > https://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F > > > > > Disabling IQAT on the south bridge results in a working > > > system, however this is not the default configuration for the > > > distribution of choice (Ubuntu 20.04.3 LTS), nor the motherboard. 
I'm > > > convinced this never worked properly based on the lack of popularity > > > for kernel encryption (crypto), and the embedded nature that > > > SuperMicro has integrated this device in collaboration with intel as > > > it looks like the primary usage is through external accelerator cards. > > > > This really sounds like broken hardware, not a kernel problem. > > > > > Kernels tried were from RHEL8 over a year ago, and this impacts the > > > entirety of the 5.4 series on Ubuntu. > > > Please CC me on replies as I'm not subscribed to all lists. CPU is C3758. > > > > [snip stalled kcryptd worker threads] > > > > This implies a dmcrypt level problem - XFS can't make progress is > > dmcrypt is not completing IOs. > > > > Where are the XFS corruption reports that the subject implies is > > occurring? > > > > Cheers, > > > > Dave. > > -- > > Dave Chinner > > david@fromorbit.com ^ permalink raw reply [flat|nested] 49+ messages in thread
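
[Editorial note] The backlog contract described in the message above can be illustrated with a toy model. This is only a sketch under stated assumptions, not kernel code and not the QAT driver's logic: the Engine class, its ring depth, and the honour_backlog switch are hypothetical names invented for the illustration. The only behaviour taken from the thread is that a full hardware queue yields -EBUSY, and that a caller using MAY_BACKLOG (such as dm-crypt) then waits for a completion it expects to arrive later.

```python
import errno
from collections import deque


class Request:
    """One crypto request; 'completed' is set when the engine finishes it."""

    def __init__(self, name, may_backlog=True):
        self.name = name
        self.may_backlog = may_backlog
        self.completed = False


class Engine:
    """Hypothetical engine with a fixed-depth ring (illustration only)."""

    def __init__(self, depth, honour_backlog):
        self.depth = depth
        self.honour_backlog = honour_backlog
        self.ring = deque()
        self.backlog = deque()

    def submit(self, req):
        if len(self.ring) < self.depth:
            self.ring.append(req)
            return -errno.EINPROGRESS
        if req.may_backlog and self.honour_backlog:
            # Correct behaviour: keep the request so it completes later.
            self.backlog.append(req)
        # Faulty behaviour (honour_backlog=False): the request is dropped,
        # yet the caller still sees -EBUSY and assumes it was accepted.
        return -errno.EBUSY

    def drain(self):
        """Complete every request the engine actually holds."""
        while self.ring:
            self.ring.popleft().completed = True
            if self.backlog:
                self.ring.append(self.backlog.popleft())


def run(honour_backlog, n=4, depth=2):
    """Return the names of requests whose waiter would never be woken."""
    eng = Engine(depth, honour_backlog)
    reqs = [Request(f"r{i}") for i in range(n)]
    waiting = []
    for r in reqs:
        rc = eng.submit(r)
        # Simplification: the caller waits for completion in both cases.
        if rc in (-errno.EINPROGRESS, -errno.EBUSY):
            waiting.append(r)
    eng.drain()
    return [r.name for r in waiting if not r.completed]


if __name__ == "__main__":
    print("stuck with backlog honoured:", run(True))
    print("stuck with backlog ignored: ", run(False))
```

With the backlog honoured, no request is left waiting; with it ignored, the two requests submitted while the ring was full (r2 and r3 in this run) never complete, which corresponds to the hung kcryptd workers in the traces at the top of the thread.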
* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-02-21 11:47 ` [dm-devel] " Giovanni Cabiddu
@ 2022-02-28  8:18   ` Kyle Sanderson
  -1 siblings, 0 replies; 49+ messages in thread
From: Kyle Sanderson @ 2022-02-28 8:18 UTC (permalink / raw)
  To: Giovanni Cabiddu
  Cc: herbert, Dave Chinner, qat-linux, Linux-Kernal, linux-xfs, linux-crypto, dm-devel, Linus Torvalds, Greg KH

> The issue is that the implementations of aead and skcipher in the QAT driver are not properly supporting requests with the CRYPTO_TFM_REQ_MAY_BACKLOG flag set.

Thanks Giovanni. Joel (from Intel) reached out to me out of band to
try and sell me further on QAT but wasn't able to follow up on any
questions (like - how is the device actually used, how can I
personally help, etc).

> If the HW queue is full, the driver returns -EBUSY [1] but does not enqueue the request as dm-crypt expects [2]. Dm-crypt ends up waiting indefinitely for a completion to a request that was never submitted, therefore the stall.

Makes sense - this kernel driver has been destroying users for many
years. I'm disappointed that this critical bricking failure isn't
searchable for others.

> This is not related to QATE-7495 'An incorrectly formatted request to QAT can hang the entire QAT endpoint' [3], which occurs when a malformed request is sent to the device.

It's nice to hear that the device itself isn't dying, but it's been
completely destroying systems for years, which is itself a DoS.

> I'm working on a patch that resolves this problem. In the meantime, a workaround is to blacklist the qat_c3xxx.ko driver.

I'm not writing this facetiously, but this driver has caused
incredible harm over the past 5+ years and seems to continue to do so.
As there's no patch proposed yet, I'm looking for the driver to be
completely removed from the tree as it's presently a pure marketing
campaign that's caused significant harm.
If the marketing benefits (like accelerated crypto + hashing) aren't
there when the accelerated instruction set was pulled from these
integrated chips - the driver continues to serve no purpose for
consumers beyond damage. Disabling the core I/O bits in December 2020
to make this barely work continues to promote this as a side project as
it was never resolved in the driver. If I can test patches, or assist
with the removal of this present in-tree malware I'm happy to help.

Kyle.

On Mon, Feb 21, 2022 at 3:48 AM Giovanni Cabiddu <giovanni.cabiddu@intel.com> wrote:
>
> Hi Kyle,
>
> The issue is that the implementations of aead and skcipher in the QAT
> driver are not properly supporting requests with the
> CRYPTO_TFM_REQ_MAY_BACKLOG flag set.
> If the HW queue is full, the driver returns -EBUSY [1] but does not
> enqueue the request as dm-crypt expects [2]. Dm-crypt ends up waiting
> indefinitely for a completion to a request that was never submitted,
> therefore the stall.
> This is not related to QATE-7495 'An incorrectly formatted request to
> QAT can hang the entire QAT endpoint' [3], which occurs when a malformed
> request is sent to the device.
>
> I'm working on a patch that resolves this problem. In the meantime, a
> workaround is to blacklist the qat_c3xxx.ko driver.
>
> Regarding avoiding this issue on stable kernels: the usage of QAT with
> dm-crypt was already disabled in kernel 5.10 for a different issue
> (the driver allocates memory in the datapath).
> The following patches implement the change: > 7bcb2c99f8ed crypto: algapi - use common mechanism for inheriting flags > 2eb27c11937e crypto: algapi - add NEED_FALLBACK to INHERITED_FLAGS > fbb6cda44190 crypto: algapi - introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY > b8aa7dc5c753 crypto: drivers - set the flag CRYPTO_ALG_ALLOCATES_MEMORY > cd74693870fb dm crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY > An option would be to send the patches above to stable, another is to wait > for a patch that fixes the problems in the QAT driver and send that to > stable. > @Herbert, what is the preferred approach here? > > Thanks, > > [1] https://elixir.bootlin.com/linux/latest/source/drivers/crypto/qat/qat_common/qat_algs.c#L1022 > [2] https://elixir.bootlin.com/linux/latest/source/drivers/md/dm-crypt.c#L1584 > [3] https://01.org/sites/default/files/downloads//336211qatsoftwareforlinux-rn-hwversion1.7021.pdf - page 25 > > -- > Giovanni > > > On Sat, Feb 19, 2022 at 03:00:51PM -0800, Kyle Sanderson wrote: > > hi Dave, > > > > > This really sounds like broken hardware, not a kernel problem. > > > > It is indeed a hardware issue, specifically the intel qat crypto > > driver that's in-tree - the hardware is fine (see below). The IQAT > > eratta documentation states that if a request is not submitted > > properly it can stall the entire device. The remediation guidance from > > 2020 was "don't do that" and "don't allow unprivileged users access to > > the device". The in-tree driver is not implemented properly either for > > this SoC or board - I'm thinking it's related to QATE-7495. > > > > https://01.org/sites/default/files/downloads//336211qatsoftwareforlinux-rn-hwversion1.7021.pdf > > > > > This implies a dmcrypt level problem - XFS can't make progress is dmcrypt is not completing IOs. > > > > That's the weird part about it. Some bio's are completing, others are > > completely dropped, with some stalling forever. 
I had to use > > xfs_repair to get the volumes operational again. I lost a good deal of > > files and had to recover from backup after toggling the device back on > > on a production system (silly, I know). > > > > > Where are the XFS corruption reports that the subject implies is occurring? > > > > I think you're right, it's dm-crypt that's broken here, with > > ultimately the crypto driver causing this corruption. XFS being the > > edge to the end-user is taking the brunt of it. There's reports going > > back to late 2017 of significant issues with this mainlined stable > > driver. > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1522962 > > https://serverfault.com/questions/1010108/luks-hangs-on-centos-running-on-atom-c3758-cpu > > https://www.phoronix.com/forums/forum/software/distributions/1172231-fedora-33-s-enterprise-linux-next-effort-approved-testbed-for-raising-cpu-requirements-etc?p=1174560#post1174560 > > > > Any guidance would be appreciated. > > Kyle. > > On Sat, Feb 19, 2022 at 1:03 PM Dave Chinner <david@fromorbit.com> wrote: > > > > > > On Fri, Feb 18, 2022 at 09:02:28PM -0800, Kyle Sanderson wrote: > > > > A2SDi-8C-HLN4F has IQAT enabled by default, when this device is > > > > attempted to be used by xfs (through dm-crypt) the entire kernel > > > > thread stalls forever. Multiple users have hit this over the years > > > > (through sporadic reporting) - I ended up trying ZFS and encryption > > > > wasn't an issue there at all because I guess they don't use this > > > > device. Returning to sanity (xfs), I was able to provision a dm-crypt > > > > volume no problem on the disk, however when running mkfs.xfs on the > > > > volume is what triggers the cascading failure (each request kills a > > > > kthread). > > > > > > Can you provide the full stack traces for these errors so we can see > > > exactly what this cascading failure looks like, please? 
In reality, > > > the stall messages some time after this are not interesting - it's > > > the first errors that cause the stall that need to be investigated. > > > > > > A good idea would be to provide the full storage stack decription > > > and hardware in use, as per: > > > > > > https://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F > > > > > > > Disabling IQAT on the south bridge results in a working > > > > system, however this is not the default configuration for the > > > > distribution of choice (Ubuntu 20.04.3 LTS), nor the motherboard. I'm > > > > convinced this never worked properly based on the lack of popularity > > > > for kernel encryption (crypto), and the embedded nature that > > > > SuperMicro has integrated this device in collaboration with intel as > > > > it looks like the primary usage is through external accelerator cards. > > > > > > This really sounds like broken hardware, not a kernel problem. > > > > > > > Kernels tried were from RHEL8 over a year ago, and this impacts the > > > > entirety of the 5.4 series on Ubuntu. > > > > Please CC me on replies as I'm not subscribed to all lists. CPU is C3758. > > > > > > [snip stalled kcryptd worker threads] > > > > > > This implies a dmcrypt level problem - XFS can't make progress is > > > dmcrypt is not completing IOs. > > > > > > Where are the XFS corruption reports that the subject implies is > > > occurring? > > > > > > Cheers, > > > > > > Dave. > > > -- > > > Dave Chinner > > > david@fromorbit.com ^ permalink raw reply [flat|nested] 49+ messages in thread
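
[Editorial note] For anyone applying the workaround quoted above (blacklisting the qat_c3xxx driver), the following is a minimal sketch of how one might check whether the module is currently loaded and what the corresponding modprobe.d line looks like. The helper names and the sample /proc/modules text are made up for the illustration; only the module name qat_c3xxx comes from the thread. Note that a `blacklist` line only stops the module from being auto-loaded by alias, so a module that is already loaded stays loaded until it is removed or the machine is rebooted.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names from /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def blacklist_line(module):
    """Build a modprobe.d 'blacklist' directive for the given module."""
    return f"blacklist {module}\n"


# Made-up sample in the format of /proc/modules (not captured from a real host).
SAMPLE = (
    "qat_c3xxx 20480 0 - Live 0x0000000000000000\n"
    "intel_qat 200704 1 qat_c3xxx, Live 0x0000000000000000\n"
    "dm_crypt 49152 1 - Live 0x0000000000000000\n"
)

if __name__ == "__main__":
    try:
        with open("/proc/modules") as fh:
            text = fh.read()
    except OSError:
        text = SAMPLE  # fall back to the sample when not on Linux
    if "qat_c3xxx" in loaded_modules(text):
        print("qat_c3xxx is loaded; a blacklist entry would read:")
        print(blacklist_line("qat_c3xxx"), end="")
    else:
        print("qat_c3xxx is not loaded")
```

The printed line would go into a file under /etc/modprobe.d/ (the file name is the administrator's choice); whether the initramfs also needs regenerating depends on the distribution.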
* Re: [dm-devel] Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs @ 2022-02-28 8:18 ` Kyle Sanderson 0 siblings, 0 replies; 49+ messages in thread From: Kyle Sanderson @ 2022-02-28 8:18 UTC (permalink / raw) To: Giovanni Cabiddu Cc: herbert, Greg KH, Dave Chinner, Linux-Kernal, qat-linux, linux-xfs, dm-devel, linux-crypto, Linus Torvalds > The issue is that the implementations of aead and skcipher in the QAT driver are not properly supporting requests with the CRYPTO_TFM_REQ_MAY_BACKLOG flag set. Thanks Giovanni. Joel (from Intel) reached out to me out of band to try and sell me further on QAT but wasn't able to follow-up on any questions (like - how is the device actually used, how can I personally help, etc). > If the HW queue is full, the driver returns -EBUSY [1] but does not enqueues the request as dm-crypt expects [2]. Dm-crypt ends up waiting indefinitely for a completion to a request that was never submitted, therefore the stall. Makes sense - this kernel driver has been destroying users for many years. I'm disappointed that this critical bricking failure isn't searchable for others. > This is not related to QATE-7495 'An incorrectly formatted request to QAT can hang the entire QAT endpoint' [3], which occurs when a malformed request is sent to the device. That's nice to hear that the device itself isn't dying, but it's been completely destroying systems for years which itself is a DoS. > I'm working at patch that resolves this problem. In the meanwhile a workaround is to blacklist the qat_c3xxx.ko driver. I'm not writing this facetiously, but this driver has caused incredible harm over the past 5+ years and seems to continue to do so. As there's no patch proposed yet, I'm looking for the driver to be completely removed from the tree as it's presently a pure marketing campaign that's caused significant harm. 
If the marketing benefits (like accelerated crypto + hashing) aren't there when the accelerated instruction set was pulled from these integrated chips - the driver continues to serve no purpose for consumers beyond damage. Disabling the core I/O bits in December 2020 to make this barely work continues to promote this as a side project as it was never resolved in the driver. If I can test patches, or assist with the removal of this present in-tree malware I'm happy to help. Kyle. On Mon, Feb 21, 2022 at 3:48 AM Giovanni Cabiddu <giovanni.cabiddu@intel.com> wrote: > > Hi Kyle, > > The issue is that the implementations of aead and skcipher in the QAT > driver are not properly supporting requests with the > CRYPTO_TFM_REQ_MAY_BACKLOG flag set. > If the HW queue is full, the driver returns -EBUSY [1] but does not > enqueues the request as dm-crypt expects [2]. Dm-crypt ends up waiting > indefinitely for a completion to a request that was never submitted, > therefore the stall. > This is not related to QATE-7495 'An incorrectly formatted request to > QAT can hang the entire QAT endpoint' [3], which occurs when a malformed > request is sent to the device. > > I'm working at patch that resolves this problem. In the meanwhile a > workaround is to blacklist the qat_c3xxx.ko driver. > > Regarding avoiding this issue on stable kernels. The usage of QAT with > dm-crypt was already disabled in kernel 5.10 for a different issue > (the driver allocates memory in the datapath). 
> The following patches implement the change: > 7bcb2c99f8ed crypto: algapi - use common mechanism for inheriting flags > 2eb27c11937e crypto: algapi - add NEED_FALLBACK to INHERITED_FLAGS > fbb6cda44190 crypto: algapi - introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY > b8aa7dc5c753 crypto: drivers - set the flag CRYPTO_ALG_ALLOCATES_MEMORY > cd74693870fb dm crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY > An option would be to send the patches above to stable, another is to wait > for a patch that fixes the problems in the QAT driver and send that to > stable. > @Herbert, what is the preferred approach here? > > Thanks, > > [1] https://elixir.bootlin.com/linux/latest/source/drivers/crypto/qat/qat_common/qat_algs.c#L1022 > [2] https://elixir.bootlin.com/linux/latest/source/drivers/md/dm-crypt.c#L1584 > [3] https://01.org/sites/default/files/downloads//336211qatsoftwareforlinux-rn-hwversion1.7021.pdf - page 25 > > -- > Giovanni > > > On Sat, Feb 19, 2022 at 03:00:51PM -0800, Kyle Sanderson wrote: > > hi Dave, > > > > > This really sounds like broken hardware, not a kernel problem. > > > > It is indeed a hardware issue, specifically the intel qat crypto > > driver that's in-tree - the hardware is fine (see below). The IQAT > > eratta documentation states that if a request is not submitted > > properly it can stall the entire device. The remediation guidance from > > 2020 was "don't do that" and "don't allow unprivileged users access to > > the device". The in-tree driver is not implemented properly either for > > this SoC or board - I'm thinking it's related to QATE-7495. > > > > https://01.org/sites/default/files/downloads//336211qatsoftwareforlinux-rn-hwversion1.7021.pdf > > > > > This implies a dmcrypt level problem - XFS can't make progress is dmcrypt is not completing IOs. > > > > That's the weird part about it. Some bio's are completing, others are > > completely dropped, with some stalling forever. 
I had to use > > xfs_repair to get the volumes operational again. I lost a good deal of > > files and had to recover from backup after toggling the device back on > > on a production system (silly, I know). > > > > > Where are the XFS corruption reports that the subject implies is occurring? > > > > I think you're right, it's dm-crypt that's broken here, with > > ultimately the crypto driver causing this corruption. XFS being the > > edge to the end-user is taking the brunt of it. There's reports going > > back to late 2017 of significant issues with this mainlined stable > > driver. > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1522962 > > https://serverfault.com/questions/1010108/luks-hangs-on-centos-running-on-atom-c3758-cpu > > https://www.phoronix.com/forums/forum/software/distributions/1172231-fedora-33-s-enterprise-linux-next-effort-approved-testbed-for-raising-cpu-requirements-etc?p=1174560#post1174560 > > > > Any guidance would be appreciated. > > Kyle. > > On Sat, Feb 19, 2022 at 1:03 PM Dave Chinner <david@fromorbit.com> wrote: > > > > > > On Fri, Feb 18, 2022 at 09:02:28PM -0800, Kyle Sanderson wrote: > > > > A2SDi-8C-HLN4F has IQAT enabled by default, when this device is > > > > attempted to be used by xfs (through dm-crypt) the entire kernel > > > > thread stalls forever. Multiple users have hit this over the years > > > > (through sporadic reporting) - I ended up trying ZFS and encryption > > > > wasn't an issue there at all because I guess they don't use this > > > > device. Returning to sanity (xfs), I was able to provision a dm-crypt > > > > volume no problem on the disk, however when running mkfs.xfs on the > > > > volume is what triggers the cascading failure (each request kills a > > > > kthread). > > > > > > Can you provide the full stack traces for these errors so we can see > > > exactly what this cascading failure looks like, please? 
In reality, > > > the stall messages some time after this are not interesting - it's > > > the first errors that cause the stall that need to be investigated. > > > > > > A good idea would be to provide the full storage stack decription > > > and hardware in use, as per: > > > > > > https://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F > > > > > > > Disabling IQAT on the south bridge results in a working > > > > system, however this is not the default configuration for the > > > > distribution of choice (Ubuntu 20.04.3 LTS), nor the motherboard. I'm > > > > convinced this never worked properly based on the lack of popularity > > > > for kernel encryption (crypto), and the embedded nature that > > > > SuperMicro has integrated this device in collaboration with intel as > > > > it looks like the primary usage is through external accelerator cards. > > > > > > This really sounds like broken hardware, not a kernel problem. > > > > > > > Kernels tried were from RHEL8 over a year ago, and this impacts the > > > > entirety of the 5.4 series on Ubuntu. > > > > Please CC me on replies as I'm not subscribed to all lists. CPU is C3758. > > > > > > [snip stalled kcryptd worker threads] > > > > > > This implies a dmcrypt level problem - XFS can't make progress is > > > dmcrypt is not completing IOs. > > > > > > Where are the XFS corruption reports that the subject implies is > > > occurring? > > > > > > Cheers, > > > > > > Dave. > > > -- > > > Dave Chinner > > > david@fromorbit.com -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
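A minimal user-space sketch of the stall Giovanni describes above, not the QAT driver code. It assumes a caller that, like dm-crypt with CRYPTO_TFM_REQ_MAY_BACKLOG set, treats -EBUSY as "accepted, a completion will follow". The names sim_queue, sim_submit, sim_complete_all, sim_lost_requests and the ring depth HW_QUEUE_DEPTH are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define HW_QUEUE_DEPTH 2 /* hypothetical hardware ring size */

struct sim_queue {
    int hw_used;          /* requests sitting in the hardware ring */
    int backlog_used;     /* requests parked in a software backlog */
    bool honours_backlog; /* false models the behaviour reported in this thread */
};

/*
 * Returns 0 when the request went to the ring, -EBUSY when the ring is full.
 * Under MAY_BACKLOG semantics -EBUSY must mean "queued for later"; the
 * broken variant returns -EBUSY without keeping the request anywhere.
 */
static int sim_submit(struct sim_queue *q)
{
    if (q->hw_used < HW_QUEUE_DEPTH) {
        q->hw_used++;
        return 0;
    }
    if (q->honours_backlog)
        q->backlog_used++;
    return -EBUSY;
}

/* Complete every request the queue still knows about; return the count. */
static int sim_complete_all(struct sim_queue *q)
{
    int done = q->hw_used + q->backlog_used;

    q->hw_used = 0;
    q->backlog_used = 0;
    return done;
}

/*
 * Submit n requests the way a MAY_BACKLOG caller would and return how many
 * of the accepted requests never receive a completion.
 */
static int sim_lost_requests(bool honours_backlog, int n)
{
    struct sim_queue q = { 0, 0, honours_backlog };
    int accepted = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        int ret = sim_submit(&q);

        /* The caller waits for a completion in both cases. */
        if (ret == 0 || ret == -EBUSY)
            accepted++;
    }
    return accepted - sim_complete_all(&q);
}
```

In this model every request that hits a full ring without a backlog is one the caller waits on forever, which is consistent with the hung kcryptd workers in the original report.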
* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs 2022-02-28 8:18 ` [dm-devel] " Kyle Sanderson @ 2022-02-28 19:25 ` Linus Torvalds -1 siblings, 0 replies; 49+ messages in thread From: Linus Torvalds @ 2022-02-28 19:25 UTC (permalink / raw) To: Kyle Sanderson Cc: Giovanni Cabiddu, Herbert Xu, Dave Chinner, qat-linux, Linux-Kernal, linux-xfs, Linux Crypto Mailing List, device-mapper development, Greg KH On Mon, Feb 28, 2022 at 12:18 AM Kyle Sanderson <kyle.leet@gmail.com> wrote: > > Makes sense - this kernel driver has been destroying users for many > years. I'm disappointed that this critical bricking failure isn't > searchable for others. It does sound like we should just disable that driver entirely until it is fixed. Or at least the configuration that can cause problems, if there is some particular sub-case. Although from a cursory glance and the noises made in this thread, it looks like it's all of the 'qat_aeads' cases (since that uses qat_alg_aead_enc() which can return -EAGAIN), which effectively means that all of the QAT stuff. So presumably CRYPTO_DEV_QAT should just be marked as depends on BROKEN || COMPILE_TEST or similar? Linus ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs 2022-02-28 19:25 ` [dm-devel] " Linus Torvalds @ 2022-02-28 20:39 ` Giovanni Cabiddu -1 siblings, 0 replies; 49+ messages in thread From: Giovanni Cabiddu @ 2022-02-28 20:39 UTC (permalink / raw) To: Linus Torvalds Cc: Kyle Sanderson, Herbert Xu, Dave Chinner, qat-linux, Linux-Kernal, linux-xfs, Linux Crypto Mailing List, device-mapper development, Greg KH On Mon, Feb 28, 2022 at 11:25:49AM -0800, Linus Torvalds wrote: > On Mon, Feb 28, 2022 at 12:18 AM Kyle Sanderson <kyle.leet@gmail.com> wrote: > > > > Makes sense - this kernel driver has been destroying users for many > > years. I'm disappointed that this critical bricking failure isn't > > searchable for others. > > It does sound like we should just disable that driver entirely until > it is fixed. > > Or at least the configuration that can cause problems, if there is > some particular sub-case. The dm-crypt + QAT use-case is already disabled since kernel 5.10 due to a different issue. Is it an option to port those patches to stable till I provide a fix for the driver? I drafted already few alternatives for the fix and I am aiming for a final set by end of week. Thanks, -- Giovanni ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs 2022-02-28 20:39 ` [dm-devel] " Giovanni Cabiddu @ 2022-02-28 20:59 ` Greg KH -1 siblings, 0 replies; 49+ messages in thread From: Greg KH @ 2022-02-28 20:59 UTC (permalink / raw) To: Giovanni Cabiddu Cc: Linus Torvalds, Kyle Sanderson, Herbert Xu, Dave Chinner, qat-linux, Linux-Kernal, linux-xfs, Linux Crypto Mailing List, device-mapper development On Mon, Feb 28, 2022 at 08:39:11PM +0000, Giovanni Cabiddu wrote: > On Mon, Feb 28, 2022 at 11:25:49AM -0800, Linus Torvalds wrote: > > On Mon, Feb 28, 2022 at 12:18 AM Kyle Sanderson <kyle.leet@gmail.com> wrote: > > > > > > Makes sense - this kernel driver has been destroying users for many > > > years. I'm disappointed that this critical bricking failure isn't > > > searchable for others. > > > > It does sound like we should just disable that driver entirely until > > it is fixed. > > > > Or at least the configuration that can cause problems, if there is > > some particular sub-case. > The dm-crypt + QAT use-case is already disabled since kernel 5.10 due to > a different issue. > Is it an option to port those patches to stable till I provide a fix for > the driver? I drafted already few alternatives for the fix and I am aiming > for a final set by end of week. If the existing situation is broken, yes, those patches are fine for stable releases. thanks, greg k-h ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs 2022-02-28 20:39 ` [dm-devel] " Giovanni Cabiddu @ 2022-02-28 23:26 ` Herbert Xu -1 siblings, 0 replies; 49+ messages in thread From: Herbert Xu @ 2022-02-28 23:26 UTC (permalink / raw) To: Giovanni Cabiddu Cc: Linus Torvalds, Kyle Sanderson, Dave Chinner, qat-linux, Linux-Kernal, linux-xfs, Linux Crypto Mailing List, device-mapper development, Greg KH On Mon, Feb 28, 2022 at 08:39:11PM +0000, Giovanni Cabiddu wrote: > > The dm-crypt + QAT use-case is already disabled since kernel 5.10 due to > a different issue. Indeed, qat has been disabled for dm-crypt since commit b8aa7dc5c7535f9abfca4bceb0ade9ee10cf5f54 Author: Mikulas Patocka <mpatocka@redhat.com> Date: Thu Jul 9 23:20:41 2020 -0700 crypto: drivers - set the flag CRYPTO_ALG_ALLOCATES_MEMORY So this should no longer be an issue with an up-to-date kernel. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs 2022-02-28 23:26 ` [dm-devel] " Herbert Xu @ 2022-03-01 1:12 ` Linus Torvalds -1 siblings, 0 replies; 49+ messages in thread From: Linus Torvalds @ 2022-03-01 1:12 UTC (permalink / raw) To: Herbert Xu Cc: Giovanni Cabiddu, Kyle Sanderson, Dave Chinner, qat-linux, Linux-Kernal, linux-xfs, Linux Crypto Mailing List, device-mapper development, Greg KH On Mon, Feb 28, 2022 at 3:26 PM Herbert Xu <herbert@gondor.apana.org.au> wrote: > > Indeed, qat has been disabled for dm-crypt since > > commit b8aa7dc5c7535f9abfca4bceb0ade9ee10cf5f54 > Author: Mikulas Patocka <mpatocka@redhat.com> > Date: Thu Jul 9 23:20:41 2020 -0700 > > crypto: drivers - set the flag CRYPTO_ALG_ALLOCATES_MEMORY > > So this should no longer be an issue with an up-to-date kernel. Ok, that commit message doesn't exactly make it clear that it also fixes a major disk corruption issue. It sounds like it was incidental and almost accidental that it fixed that thing, and nobody realized it should perhaps be also moved to stable. Oh, except I think you *also* need commit cd74693870fb ("dm crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY") that actually reacts to that flag. Which also wasn't marked for stable, and which is why 5.10 is ok, but 5.9 (which has that first commit, but not the second) is not ok. Of course, maybe they got marked for stable separately and actually have been back-ported, but it doesn't sound like that happened.. I didn't actually check. Linus ^ permalink raw reply [flat|nested] 49+ messages in thread
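A rough user-space model of why both commits are needed, under stated assumptions. One commit makes drivers advertise a flag; the other makes dm-crypt pass that flag as a mask when it asks for a cipher, so flagged drivers are skipped. The filter below follows the crypto API's usual "(flags ^ type) & mask" matching rule. The table entries, priorities, driver names and the SIM_ALG_ALLOCATES_MEMORY value are assumptions made for illustration, not copied from the kernel tree.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for CRYPTO_ALG_ALLOCATES_MEMORY; the numeric value is assumed. */
#define SIM_ALG_ALLOCATES_MEMORY 0x00010000u

struct sim_alg {
    const char *driver;  /* hypothetical driver name */
    unsigned int flags;  /* capability/behaviour flags */
    int priority;        /* higher wins, as with cra_priority */
};

/* Hypothetical registry: an accelerator that sets the flag, and a software fallback. */
static const struct sim_alg sim_table[] = {
    { "sim-accel-xts-aes", SIM_ALG_ALLOCATES_MEMORY, 4001 },
    { "sim-sw-xts-aes",    0,                        400  },
};

/*
 * Pick the highest-priority algorithm whose flags agree with 'type' on every
 * bit selected by 'mask'. Bits outside 'mask' are ignored.
 */
static const struct sim_alg *sim_lookup(const struct sim_alg *algs, size_t n,
                                        unsigned int type, unsigned int mask)
{
    const struct sim_alg *best = NULL;

    assert(algs != NULL);
    for (size_t i = 0; i < n; i++) {
        if ((algs[i].flags ^ type) & mask)
            continue; /* flagged differently from what the caller asked for */
        if (!best || algs[i].priority > best->priority)
            best = &algs[i];
    }
    return best;
}

/* Returns 1 if a caller using 'mask' (with type 0) ends up on the accelerator. */
static int sim_picks_accelerator(unsigned int mask)
{
    const struct sim_alg *alg =
        sim_lookup(sim_table, sizeof(sim_table) / sizeof(sim_table[0]), 0, mask);

    return alg && (alg->flags & SIM_ALG_ALLOCATES_MEMORY) ? 1 : 0;
}
```

With a mask of 0 the caller still lands on the accelerator even though the flag is set, which matches the 5.9 situation Linus describes (flag present, nobody reacting to it). Passing the flag as the mask selects the software fallback instead.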
* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs 2022-03-01 1:12 ` [dm-devel] " Linus Torvalds @ 2022-03-01 4:11 ` Herbert Xu -1 siblings, 0 replies; 49+ messages in thread From: Herbert Xu @ 2022-03-01 4:11 UTC (permalink / raw) To: Linus Torvalds Cc: Giovanni Cabiddu, Kyle Sanderson, Dave Chinner, qat-linux, Linux-Kernal, linux-xfs, Linux Crypto Mailing List, device-mapper development, Greg KH On Mon, Feb 28, 2022 at 05:12:20PM -0800, Linus Torvalds wrote: > > It sounds like it was incidental and almost accidental that it fixed > that thing, and nobody realized it should perhaps be also moved to > stable. Yes this was incidental. The patch in question fixes an issue in OOM situations where drivers that must allocate memory on each request may lead to dead-lock so it's not really targeted at qat. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs 2022-03-01 4:11 ` [dm-devel] " Herbert Xu @ 2022-03-02 10:29 ` Greg KH -1 siblings, 0 replies; 49+ messages in thread From: Greg KH @ 2022-03-02 10:29 UTC (permalink / raw) To: Herbert Xu Cc: Linus Torvalds, Giovanni Cabiddu, Kyle Sanderson, Dave Chinner, qat-linux, Linux-Kernal, linux-xfs, Linux Crypto Mailing List, device-mapper development On Tue, Mar 01, 2022 at 04:11:13PM +1200, Herbert Xu wrote: > On Mon, Feb 28, 2022 at 05:12:20PM -0800, Linus Torvalds wrote: > > > > It sounds like it was incidental and almost accidental that it fixed > > that thing, and nobody realized it should perhaps be also moved to > > stable. > > Yes this was incidental. The patch in question fixes an issue in > OOM situations where drivers that must allocate memory on each > request may lead to dead-lock so it's not really targeted at qat. Ok, so what commits should I backport to kernels older than 5.10 to resolve this? thanks, greg k-h ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs 2022-03-02 10:29 ` [dm-devel] " Greg KH @ 2022-03-02 11:49 ` Giovanni Cabiddu -1 siblings, 0 replies; 49+ messages in thread From: Giovanni Cabiddu @ 2022-03-02 11:49 UTC (permalink / raw) To: Greg KH Cc: Herbert Xu, Linus Torvalds, Kyle Sanderson, Dave Chinner, qat-linux, Linux-Kernal, linux-xfs, Linux Crypto Mailing List, device-mapper development Hi Greg, On Wed, Mar 02, 2022 at 11:29:00AM +0100, Greg KH wrote: > On Tue, Mar 01, 2022 at 04:11:13PM +1200, Herbert Xu wrote: > > On Mon, Feb 28, 2022 at 05:12:20PM -0800, Linus Torvalds wrote: > > > > > > It sounds like it was incidental and almost accidental that it fixed > > > that thing, and nobody realized it should perhaps be also moved to > > > stable. > > > > Yes this was incidental. The patch in question fixes an issue in > > OOM situations where drivers that must allocate memory on each > > request may lead to dead-lock so it's not really targeted at qat. > > Ok, so what commits should I backport to kernels older than 5.10 to > resolve this? Is it possible to wait for a set that resolves the problem rather than backporting the patches that disables the use-case? I have a patchset that fixes the actual issue and we are doing an internal review before submission to the mailing list. I should be able to send a V1 out between today and tomorrow. If not, then these are the patches that should be backported: 7bcb2c99f8ed crypto: algapi - use common mechanism for inheriting flags 2eb27c11937e crypto: algapi - add NEED_FALLBACK to INHERITED_FLAGS fbb6cda44190 crypto: algapi - introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY b8aa7dc5c753 crypto: drivers - set the flag CRYPTO_ALG_ALLOCATES_MEMORY cd74693870fb dm crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY Herbert, correct me if I'm wrong here. Thanks, -- Giovanni ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs 2022-03-02 11:49 ` [dm-devel] " Giovanni Cabiddu @ 2022-03-02 14:56 ` Greg KH -1 siblings, 0 replies; 49+ messages in thread From: Greg KH @ 2022-03-02 14:56 UTC (permalink / raw) To: Giovanni Cabiddu Cc: Herbert Xu, Linus Torvalds, Kyle Sanderson, Dave Chinner, qat-linux, Linux-Kernal, linux-xfs, Linux Crypto Mailing List, device-mapper development On Wed, Mar 02, 2022 at 11:49:16AM +0000, Giovanni Cabiddu wrote: > Hi Greg, > > On Wed, Mar 02, 2022 at 11:29:00AM +0100, Greg KH wrote: > > On Tue, Mar 01, 2022 at 04:11:13PM +1200, Herbert Xu wrote: > > > On Mon, Feb 28, 2022 at 05:12:20PM -0800, Linus Torvalds wrote: > > > > > > > > It sounds like it was incidental and almost accidental that it fixed > > > > that thing, and nobody realized it should perhaps be also moved to > > > > stable. > > > > > > Yes this was incidental. The patch in question fixes an issue in > > > OOM situations where drivers that must allocate memory on each > > > request may lead to dead-lock so it's not really targeted at qat. > > > > Ok, so what commits should I backport to kernels older than 5.10 to > > resolve this? > Is it possible to wait for a set that resolves the problem rather than > backporting the patches that disables the use-case? It's already disabled in newer kernels, so we should do so for older ones to prevent problems and the delay in getting those potential fixes merged some day in the future. > I have a patchset that fixes the actual issue and we are doing an > internal review before submission to the mailing list. > I should be able to send a V1 out between today and tomorrow. 
> > If not, then these are the patches that should be backported: > 7bcb2c99f8ed crypto: algapi - use common mechanism for inheriting flags > 2eb27c11937e crypto: algapi - add NEED_FALLBACK to INHERITED_FLAGS > fbb6cda44190 crypto: algapi - introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY > b8aa7dc5c753 crypto: drivers - set the flag CRYPTO_ALG_ALLOCATES_MEMORY > cd74693870fb dm crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY > Herbert, correct me if I'm wrong here. These need to be manually backported as they do not apply cleanly. Can you provide such a set? Or should I just disable a specific driver here instead which would be easier overall? thanks, greg k-h ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-03-02 14:56 ` [dm-devel] " Greg KH
@ 2022-03-02 22:27 ` Herbert Xu
  -1 siblings, 0 replies; 49+ messages in thread
From: Herbert Xu @ 2022-03-02 22:27 UTC (permalink / raw)
  To: Greg KH
  Cc: Giovanni Cabiddu, Linus Torvalds, Kyle Sanderson, Dave Chinner,
      qat-linux, Linux-Kernal, linux-xfs, Linux Crypto Mailing List,
      device-mapper development

On Wed, Mar 02, 2022 at 03:56:36PM +0100, Greg KH wrote:
>
> > If not, then these are the patches that should be backported:
> >     7bcb2c99f8ed crypto: algapi - use common mechanism for inheriting flags
> >     2eb27c11937e crypto: algapi - add NEED_FALLBACK to INHERITED_FLAGS
> >     fbb6cda44190 crypto: algapi - introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY
> >     b8aa7dc5c753 crypto: drivers - set the flag CRYPTO_ALG_ALLOCATES_MEMORY
> >     cd74693870fb dm crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY
> > Herbert, correct me if I'm wrong here.
>
> These need to be manually backported as they do not apply cleanly. Can
> you provide such a set? Or should I just disable a specific driver here
> instead which would be easier overall?

I think the safest thing is to disable qat in stable (possibly only
when DM_CRYPT is enabled/modular). The patches in question while
good may have too wide an effect for the stable kernel series.

Giovanni, could you send Greg a Kconfig patch to do that?

Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 49+ messages in thread
* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-03-02 22:27 ` [dm-devel] " Herbert Xu
@ 2022-03-02 22:42 ` Giovanni Cabiddu
  -1 siblings, 0 replies; 49+ messages in thread
From: Giovanni Cabiddu @ 2022-03-02 22:42 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Greg KH, Linus Torvalds, Kyle Sanderson, Dave Chinner, qat-linux,
      Linux-Kernal, linux-xfs, Linux Crypto Mailing List,
      device-mapper development

On Thu, Mar 03, 2022 at 10:27:47AM +1200, Herbert Xu wrote:
> On Wed, Mar 02, 2022 at 03:56:36PM +0100, Greg KH wrote:
> >
> > > If not, then these are the patches that should be backported:
> > >     7bcb2c99f8ed crypto: algapi - use common mechanism for inheriting flags
> > >     2eb27c11937e crypto: algapi - add NEED_FALLBACK to INHERITED_FLAGS
> > >     fbb6cda44190 crypto: algapi - introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY
> > >     b8aa7dc5c753 crypto: drivers - set the flag CRYPTO_ALG_ALLOCATES_MEMORY
> > >     cd74693870fb dm crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY
> > > Herbert, correct me if I'm wrong here.
> >
> > These need to be manually backported as they do not apply cleanly. Can
> > you provide such a set? Or should I just disable a specific driver here
> > instead which would be easier overall?
>
> I think the safest thing is to disable qat in stable (possibly only
> when DM_CRYPT is enabled/modular). The patches in question while
> good may have too wide an effect for the stable kernel series.
>
> Giovanni, could you send Greg a Kconfig patch to do that?
I was thinking, as an alternative, to lower the cra_priority in the QAT
driver for the algorithms used by dm-crypt so they are not used by
default.
Is that a viable option?

Sure, I can provide a patch for either the cra_priority or the
Kconfig option for the stable kernels that don't have the patches above.

--
Giovanni

^ permalink raw reply	[flat|nested] 49+ messages in thread
* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-03-02 22:42 ` [dm-devel] " Giovanni Cabiddu
@ 2022-03-02 22:45 ` Herbert Xu
  -1 siblings, 0 replies; 49+ messages in thread
From: Herbert Xu @ 2022-03-02 22:45 UTC (permalink / raw)
  To: Giovanni Cabiddu
  Cc: Greg KH, Linus Torvalds, Kyle Sanderson, Dave Chinner, qat-linux,
      Linux-Kernal, linux-xfs, Linux Crypto Mailing List,
      device-mapper development

On Wed, Mar 02, 2022 at 10:42:20PM +0000, Giovanni Cabiddu wrote:
>
> I was thinking, as an alternative, to lower the cra_priority in the QAT
> driver for the algorithms used by dm-crypt so they are not used by
> default.
> Is that a viable option?

Yes I think that should work too.

Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 49+ messages in thread
* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-03-02 22:45 ` [dm-devel] " Herbert Xu
@ 2022-03-03 13:49 ` Giovanni Cabiddu
  -1 siblings, 0 replies; 49+ messages in thread
From: Giovanni Cabiddu @ 2022-03-03 13:49 UTC (permalink / raw)
  To: Herbert Xu, Greg KH
  Cc: Linus Torvalds, Kyle Sanderson, Dave Chinner, qat-linux,
      Linux-Kernal, linux-xfs, Linux Crypto Mailing List,
      device-mapper development

On Thu, Mar 03, 2022 at 10:45:48AM +1200, Herbert Xu wrote:
> On Wed, Mar 02, 2022 at 10:42:20PM +0000, Giovanni Cabiddu wrote:
> >
> > I was thinking, as an alternative, to lower the cra_priority in the QAT
> > driver for the algorithms used by dm-crypt so they are not used by
> > default.
> > Is that a viable option?
>
> Yes I think that should work too.
The patch below implements that solution and applies to linux-5.4.y.
If it is ok, I can send it to stable for all kernels <= 5.4 following
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#option-3

---8<---
From: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Date: Thu, 3 Mar 2022 11:54:07 +0000
Subject: [PATCH] crypto: qat - drop priority of algorithms
Organization: Intel Research and Development Ireland Ltd - Co. Reg. #308263 - Collinstown Industrial Park, Leixlip, County Kildare - Ireland

The implementations of aead and skcipher in the QAT driver are not
properly supporting requests with the CRYPTO_TFM_REQ_MAY_BACKLOG flag set.
If the HW queue is full, the driver returns -EBUSY but does not enqueue
the request.
This can result in applications like dm-crypt waiting indefinitely for a
completion of a request that was never submitted to the hardware.

To mitigate this problem, reduce the priority of all skcipher and aead
implementations in the QAT driver so they are not used by default.

This patch deviates from the original upstream solution, that prevents
dm-crypt to use drivers registered with the flag
CRYPTO_ALG_ALLOCATES_MEMORY, since a backport of that set to stable
kernels may have a too wide effect.

    commit 7bcb2c99f8ed032cfb3f5596b4dccac6b1f501df upstream
    commit 2eb27c11937ee9984c04b75d213a737291c5f58c upstream
    commit fbb6cda44190d72aa5199d728797aabc6d2ed816 upstream
    commit b8aa7dc5c7535f9abfca4bceb0ade9ee10cf5f54 upstream
    commit cd74693870fb748d812867ba49af733d689a3604 upstream

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
---
 drivers/crypto/qat/qat_common/qat_algs.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/crypto/qat/qat_common/qat_algs.c b/drivers/crypto/qat/qat_common/qat_algs.c
index 6b8ad3d67481..a5c28a08fd8c 100644
--- a/drivers/crypto/qat/qat_common/qat_algs.c
+++ b/drivers/crypto/qat/qat_common/qat_algs.c
@@ -1274,7 +1274,7 @@ static struct aead_alg qat_aeads[] = { {
 	.base = {
 		.cra_name = "authenc(hmac(sha1),cbc(aes))",
 		.cra_driver_name = "qat_aes_cbc_hmac_sha1",
-		.cra_priority = 4001,
+		.cra_priority = 1,
 		.cra_flags = CRYPTO_ALG_ASYNC,
 		.cra_blocksize = AES_BLOCK_SIZE,
 		.cra_ctxsize = sizeof(struct qat_alg_aead_ctx),
@@ -1291,7 +1291,7 @@ static struct aead_alg qat_aeads[] = { {
 	.base = {
 		.cra_name = "authenc(hmac(sha256),cbc(aes))",
 		.cra_driver_name = "qat_aes_cbc_hmac_sha256",
-		.cra_priority = 4001,
+		.cra_priority = 1,
 		.cra_flags = CRYPTO_ALG_ASYNC,
 		.cra_blocksize = AES_BLOCK_SIZE,
 		.cra_ctxsize = sizeof(struct qat_alg_aead_ctx),
@@ -1308,7 +1308,7 @@ static struct aead_alg qat_aeads[] = { {
 	.base = {
 		.cra_name = "authenc(hmac(sha512),cbc(aes))",
 		.cra_driver_name = "qat_aes_cbc_hmac_sha512",
-		.cra_priority = 4001,
+		.cra_priority = 1,
 		.cra_flags = CRYPTO_ALG_ASYNC,
 		.cra_blocksize = AES_BLOCK_SIZE,
 		.cra_ctxsize = sizeof(struct qat_alg_aead_ctx),
@@ -1326,7 +1326,7 @@ static struct aead_alg qat_aeads[] = { {
 static struct crypto_alg qat_algs[] = { {
 	.cra_name = "cbc(aes)",
 	.cra_driver_name = "qat_aes_cbc",
-	.cra_priority = 4001,
+	.cra_priority = 1,
 	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
 	.cra_blocksize = AES_BLOCK_SIZE,
 	.cra_ctxsize = sizeof(struct qat_alg_ablkcipher_ctx),
@@ -1348,7 +1348,7 @@ static struct crypto_alg qat_algs[] = { {
 }, {
 	.cra_name = "ctr(aes)",
 	.cra_driver_name = "qat_aes_ctr",
-	.cra_priority = 4001,
+	.cra_priority = 1,
 	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
 	.cra_blocksize = 1,
 	.cra_ctxsize = sizeof(struct qat_alg_ablkcipher_ctx),
@@ -1370,7 +1370,7 @@ static struct crypto_alg qat_algs[] = { {
 }, {
 	.cra_name = "xts(aes)",
 	.cra_driver_name = "qat_aes_xts",
-	.cra_priority = 4001,
+	.cra_priority = 1,
 	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
 	.cra_blocksize = AES_BLOCK_SIZE,
 	.cra_ctxsize = sizeof(struct qat_alg_ablkcipher_ctx),

base-commit: 866ae42cf4788c8b18de6bda0a522362702861d7
--
2.35.1

^ permalink raw reply related	[flat|nested] 49+ messages in thread
* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-03-03 13:49 ` [dm-devel] " Giovanni Cabiddu
@ 2022-03-03 19:21 ` Eric Biggers
  -1 siblings, 0 replies; 49+ messages in thread
From: Eric Biggers @ 2022-03-03 19:21 UTC (permalink / raw)
  To: Giovanni Cabiddu
  Cc: Herbert Xu, Greg KH, Linus Torvalds, Kyle Sanderson, Dave Chinner,
      qat-linux, Linux-Kernal, linux-xfs, Linux Crypto Mailing List,
      device-mapper development

On Thu, Mar 03, 2022 at 01:49:03PM +0000, Giovanni Cabiddu wrote:
> On Thu, Mar 03, 2022 at 10:45:48AM +1200, Herbert Xu wrote:
> > On Wed, Mar 02, 2022 at 10:42:20PM +0000, Giovanni Cabiddu wrote:
> > >
> > > I was thinking, as an alternative, to lower the cra_priority in the QAT
> > > driver for the algorithms used by dm-crypt so they are not used by
> > > default.
> > > Is that a viable option?
> >
> > Yes I think that should work too.
> The patch below implements that solution and applies to linux-5.4.y.
> If it is ok, I can send it to stable for all kernels <= 5.4 following
> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#option-3
>
> ---8<---
> From: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
> Date: Thu, 3 Mar 2022 11:54:07 +0000
> Subject: [PATCH] crypto: qat - drop priority of algorithms
> Organization: Intel Research and Development Ireland Ltd - Co. Reg. #308263 - Collinstown Industrial Park, Leixlip, County Kildare - Ireland
>
> The implementations of aead and skcipher in the QAT driver are not
> properly supporting requests with the CRYPTO_TFM_REQ_MAY_BACKLOG flag set.
> If the HW queue is full, the driver returns -EBUSY but does not enqueue
> the request.
> This can result in applications like dm-crypt waiting indefinitely for a
> completion of a request that was never submitted to the hardware.
>
> To mitigate this problem, reduce the priority of all skcipher and aead
> implementations in the QAT driver so they are not used by default.
>
> This patch deviates from the original upstream solution, that prevents
> dm-crypt to use drivers registered with the flag
> CRYPTO_ALG_ALLOCATES_MEMORY, since a backport of that set to stable
> kernels may have a too wide effect.
>
>     commit 7bcb2c99f8ed032cfb3f5596b4dccac6b1f501df upstream
>     commit 2eb27c11937ee9984c04b75d213a737291c5f58c upstream
>     commit fbb6cda44190d72aa5199d728797aabc6d2ed816 upstream
>     commit b8aa7dc5c7535f9abfca4bceb0ade9ee10cf5f54 upstream
>     commit cd74693870fb748d812867ba49af733d689a3604 upstream
>
> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
> ---
>  drivers/crypto/qat/qat_common/qat_algs.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/crypto/qat/qat_common/qat_algs.c b/drivers/crypto/qat/qat_common/qat_algs.c
> index 6b8ad3d67481..a5c28a08fd8c 100644
> --- a/drivers/crypto/qat/qat_common/qat_algs.c
> +++ b/drivers/crypto/qat/qat_common/qat_algs.c
> @@ -1274,7 +1274,7 @@ static struct aead_alg qat_aeads[] = { {
>  	.base = {
>  		.cra_name = "authenc(hmac(sha1),cbc(aes))",
>  		.cra_driver_name = "qat_aes_cbc_hmac_sha1",
> -		.cra_priority = 4001,
> +		.cra_priority = 1,
>  		.cra_flags = CRYPTO_ALG_ASYNC,
>  		.cra_blocksize = AES_BLOCK_SIZE,
>  		.cra_ctxsize = sizeof(struct qat_alg_aead_ctx),
> @@ -1291,7 +1291,7 @@ static struct aead_alg qat_aeads[] = { {
>  	.base = {
>  		.cra_name = "authenc(hmac(sha256),cbc(aes))",
>  		.cra_driver_name = "qat_aes_cbc_hmac_sha256",
> -		.cra_priority = 4001,
> +		.cra_priority = 1,
>  		.cra_flags = CRYPTO_ALG_ASYNC,
>  		.cra_blocksize = AES_BLOCK_SIZE,
>  		.cra_ctxsize = sizeof(struct qat_alg_aead_ctx),
> @@ -1308,7 +1308,7 @@ static struct aead_alg qat_aeads[] = { {
>  	.base = {
>  		.cra_name = "authenc(hmac(sha512),cbc(aes))",
>  		.cra_driver_name = "qat_aes_cbc_hmac_sha512",
> -		.cra_priority = 4001,
> +		.cra_priority = 1,
>  		.cra_flags = CRYPTO_ALG_ASYNC,
>  		.cra_blocksize = AES_BLOCK_SIZE,
>  		.cra_ctxsize = sizeof(struct qat_alg_aead_ctx),
> @@ -1326,7 +1326,7 @@ static struct aead_alg qat_aeads[] = { {
>  static struct crypto_alg qat_algs[] = { {
>  	.cra_name = "cbc(aes)",
>  	.cra_driver_name = "qat_aes_cbc",
> -	.cra_priority = 4001,
> +	.cra_priority = 1,
>  	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
>  	.cra_blocksize = AES_BLOCK_SIZE,
>  	.cra_ctxsize = sizeof(struct qat_alg_ablkcipher_ctx),
> @@ -1348,7 +1348,7 @@ static struct crypto_alg qat_algs[] = { {
>  }, {
>  	.cra_name = "ctr(aes)",
>  	.cra_driver_name = "qat_aes_ctr",
> -	.cra_priority = 4001,
> +	.cra_priority = 1,
>  	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
>  	.cra_blocksize = 1,
>  	.cra_ctxsize = sizeof(struct qat_alg_ablkcipher_ctx),
> @@ -1370,7 +1370,7 @@ static struct crypto_alg qat_algs[] = { {
>  }, {
>  	.cra_name = "xts(aes)",
>  	.cra_driver_name = "qat_aes_xts",
> -	.cra_priority = 4001,
> +	.cra_priority = 1,
>  	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
>  	.cra_blocksize = AES_BLOCK_SIZE,
>  	.cra_ctxsize = sizeof(struct qat_alg_ablkcipher_ctx),
>
> base-commit: 866ae42cf4788c8b18de6bda0a522362702861d7
> --
> 2.35.1
>

If these algorithms have critical bugs, which it appears they do, then IMO it
would be better to disable them (either stop registering them, or disable the
whole driver) than to leave them available with low cra_priority. Low
cra_priority doesn't guarantee that they aren't used.

- Eric

^ permalink raw reply	[flat|nested] 49+ messages in thread
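
[Editorial sketch, not part of the thread.] Eric's point that a low
cra_priority does not guarantee non-use can be illustrated with a small
user-space model of priority-based selection. This is only a sketch under
stated assumptions: struct toy_alg, toy_algs and toy_lookup() are
hypothetical names, the "xts-aes-aesni" entry and its priority of 401 are
illustrative, and the lookup rule (an exact driver-name match is returned
regardless of priority, otherwise the highest-priority generic-name match)
is a simplification rather than the kernel's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a registered algorithm. */
struct toy_alg {
	const char *name;        /* generic name, e.g. "xts(aes)" */
	const char *driver_name; /* implementation-specific name */
	int priority;            /* plays the role of cra_priority */
};

/* Illustrative table: a software implementation plus the demoted QAT one. */
static const struct toy_alg toy_algs[] = {
	{ "xts(aes)", "xts-aes-aesni", 401 }, /* assumed value, for illustration */
	{ "xts(aes)", "qat_aes_xts",   1   }, /* priority after the proposed patch */
};

/*
 * Return the implementation a caller asking for `wanted` would get:
 * an exact driver-name match is returned immediately, whatever its
 * priority; otherwise the generic-name match with the highest priority.
 */
static const struct toy_alg *toy_lookup(const struct toy_alg *algs, size_t n,
					const char *wanted)
{
	const struct toy_alg *best = NULL;

	assert(wanted != NULL);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(algs[i].driver_name, wanted) == 0)
			return &algs[i];
		if (strcmp(algs[i].name, wanted) == 0 &&
		    (best == NULL || algs[i].priority > best->priority))
			best = &algs[i];
	}
	return best;
}
```

In this model, toy_lookup(toy_algs, 2, "xts(aes)") picks the software
entry, but asking for "qat_aes_xts" by driver name, or looking up
"xts(aes)" in a table that holds only the QAT entry, still returns the
low-priority implementation.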
* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-03-03 19:21 ` [dm-devel] " Eric Biggers
@ 2022-03-03 21:24 ` Giovanni Cabiddu
  -1 siblings, 0 replies; 49+ messages in thread
From: Giovanni Cabiddu @ 2022-03-03 21:24 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Herbert Xu, Greg KH, Linus Torvalds, Kyle Sanderson, Dave Chinner,
      qat-linux, Linux-Kernal, linux-xfs, Linux Crypto Mailing List,
      device-mapper development

On Thu, Mar 03, 2022 at 07:21:33PM +0000, Eric Biggers wrote:
> If these algorithms have critical bugs, which it appears they do, then IMO it
> would be better to disable them (either stop registering them, or disable the
> whole driver) than to leave them available with low cra_priority. Low
> cra_priority doesn't guarantee that they aren't used.
Thanks for your feedback Eric.

Here is a patch that disables the registration of the algorithms in the
QAT driver by setting, at config time, the number of HW queues (aka
instances) to zero.

---8<---
From: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Subject: [PATCH] crypto: qat - disable registration of algorithms
Organization: Intel Research and Development Ireland Ltd - Co. Reg. #308263 - Collinstown Industrial Park, Leixlip, County Kildare - Ireland

The implementations of aead and skcipher in the QAT driver do not
support properly requests with the CRYPTO_TFM_REQ_MAY_BACKLOG flag set.
If the HW queue is full, the driver returns -EBUSY but does not enqueue
the request.
This can result in applications like dm-crypt waiting indefinitely for a
completion of a request that was never submitted to the hardware.

To avoid this problem, disable the registration of all skcipher and aead
implementations in the QAT driver by setting the number of crypto
instances to 0 at configuration time.

This patch deviates from the original upstream solution, that prevents
dm-crypt to use drivers registered with the flag
CRYPTO_ALG_ALLOCATES_MEMORY, since a backport of that set to stable
kernels may have a too wide effect.

    commit 7bcb2c99f8ed032cfb3f5596b4dccac6b1f501df upstream
    commit 2eb27c11937ee9984c04b75d213a737291c5f58c upstream
    commit fbb6cda44190d72aa5199d728797aabc6d2ed816 upstream
    commit b8aa7dc5c7535f9abfca4bceb0ade9ee10cf5f54 upstream
    commit cd74693870fb748d812867ba49af733d689a3604 upstream

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
---
 drivers/crypto/qat/qat_common/qat_crypto.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/crypto/qat/qat_common/qat_crypto.c b/drivers/crypto/qat/qat_common/qat_crypto.c
index 3852d31ce0a4..611d214d5198 100644
--- a/drivers/crypto/qat/qat_common/qat_crypto.c
+++ b/drivers/crypto/qat/qat_common/qat_crypto.c
@@ -159,9 +159,7 @@ struct qat_crypto_instance *qat_crypto_get_instance_node(int node)
  */
 int qat_crypto_dev_config(struct adf_accel_dev *accel_dev)
 {
-	int cpus = num_online_cpus();
-	int banks = GET_MAX_BANKS(accel_dev);
-	int instances = min(cpus, banks);
+	int instances = 0;
 	char key[ADF_CFG_MAX_KEY_LEN_IN_BYTES];
 	int i;
 	unsigned long val;

base-commit: 866ae42cf4788c8b18de6bda0a522362702861d7
--
2.35.1

^ permalink raw reply related	[flat|nested] 49+ messages in thread
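
[Editorial sketch, not part of the thread.] The failure mode named in the
commit message above (a full queue answered with -EBUSY although the
request was never enqueued) can be modelled in user space. Everything here
is an assumption made for illustration: the toy_* names, the ring and
backlog depths, and the choice of -ENOSPC for the non-waiting case are
hypothetical and are not taken from the QAT driver or from dm-crypt. The
model only encodes the expectation that a caller who set the
"may backlog" flag and received -EBUSY will wait for a completion.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_RING_SLOTS    2  /* assumed depth of the toy hardware ring */
#define TOY_BACKLOG_SLOTS 8  /* assumed depth of the toy software backlog */
#define TOY_MAX_REQS      16 /* upper bound on requests per experiment */

struct toy_req {
	bool may_backlog; /* stands in for CRYPTO_TFM_REQ_MAY_BACKLOG */
	bool completed;   /* set once the toy "hardware" finishes the request */
};

struct toy_dev {
	struct toy_req *ring[TOY_RING_SLOTS];
	int ring_used;
	struct toy_req *backlog[TOY_BACKLOG_SLOTS];
	int backlog_used;
	bool honours_backlog; /* false models the behaviour described above */
};

/* Submit one request; the return value tells the caller what to expect. */
static int toy_submit(struct toy_dev *dev, struct toy_req *req)
{
	if (dev->ring_used < TOY_RING_SLOTS) {
		dev->ring[dev->ring_used++] = req;
		return -EINPROGRESS; /* accepted, completion will follow */
	}
	if (!req->may_backlog)
		return -ENOSPC; /* this model's choice: caller retries itself */
	if (dev->honours_backlog) {
		if (dev->backlog_used == TOY_BACKLOG_SLOTS)
			return -ENOSPC; /* toy limit only */
		dev->backlog[dev->backlog_used++] = req;
	}
	/* With honours_backlog == false the request was silently dropped,
	 * yet the caller is still told -EBUSY and will wait for it. */
	return -EBUSY;
}

/* Complete everything in the ring, refilling it from the backlog. */
static void toy_drain(struct toy_dev *dev)
{
	while (dev->ring_used > 0) {
		int moved = 0;

		for (int i = 0; i < dev->ring_used; i++)
			dev->ring[i]->completed = true;
		dev->ring_used = 0;

		while (moved < dev->backlog_used &&
		       dev->ring_used < TOY_RING_SLOTS)
			dev->ring[dev->ring_used++] = dev->backlog[moved++];
		for (int i = moved; i < dev->backlog_used; i++)
			dev->backlog[i - moved] = dev->backlog[i];
		dev->backlog_used -= moved;
	}
}

/* Count callers that would wait forever after submitting nreq requests. */
static int toy_count_stuck(bool honours_backlog, int nreq)
{
	struct toy_dev dev = { .honours_backlog = honours_backlog };
	struct toy_req reqs[TOY_MAX_REQS] = { 0 };
	bool waiting[TOY_MAX_REQS] = { false };
	int stuck = 0;

	assert(nreq >= 0 && nreq <= TOY_MAX_REQS);

	for (int i = 0; i < nreq; i++) {
		int ret;

		reqs[i].may_backlog = true;
		ret = toy_submit(&dev, &reqs[i]);
		/* The caller waits whenever it believes the request is queued. */
		waiting[i] = (ret == -EINPROGRESS || ret == -EBUSY);
	}
	toy_drain(&dev);

	for (int i = 0; i < nreq; i++)
		if (waiting[i] && !reqs[i].completed)
			stuck++;
	return stuck;
}
```

With five requests against the two-slot ring, toy_count_stuck(true, 5)
returns 0 while toy_count_stuck(false, 5) returns 3: the three requests
that overflowed the ring are never completed, which is the shape of the
hung kcryptd workers reported at the start of the thread.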
* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-03-03 21:24 ` [dm-devel] " Giovanni Cabiddu
@ 2022-03-03 21:44 ` Eric Biggers
  -1 siblings, 0 replies; 49+ messages in thread
From: Eric Biggers @ 2022-03-03 21:44 UTC (permalink / raw)
  To: Giovanni Cabiddu
  Cc: Herbert Xu, Greg KH, Linus Torvalds, Kyle Sanderson, Dave Chinner,
	qat-linux, Linux-Kernal, linux-xfs, Linux Crypto Mailing List,
	device-mapper development

On Thu, Mar 03, 2022 at 09:24:42PM +0000, Giovanni Cabiddu wrote:
> On Thu, Mar 03, 2022 at 07:21:33PM +0000, Eric Biggers wrote:
> > If these algorithms have critical bugs, which it appears they do, then IMO it
> > would be better to disable them (either stop registering them, or disable the
> > whole driver) than to leave them available with low cra_priority.  Low
> > cra_priority doesn't guarantee that they aren't used.
> Thanks for your feedback Eric.
>
> Here is a patch that disables the registration of the algorithms in the
> QAT driver by setting, a config time, the number of HW queues (aka
> instances) to zero.
>
> ---8<---
> From: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
> Subject: [PATCH] crypto: qat - disable registration of algorithms
> Organization: Intel Research and Development Ireland Ltd - Co. Reg. #308263 - Collinstown Industrial Park, Leixlip, County Kildare - Ireland
>
> The implementations of aead and skcipher in the QAT driver do not
> support properly requests with the CRYPTO_TFM_REQ_MAY_BACKLOG flag set.
> If the HW queue is full, the driver returns -EBUSY but does not enqueue
> the request.
> This can result in applications like dm-crypt waiting indefinitely for a
> completion of a request that was never submitted to the hardware.
>
> To avoid this problem, disable the registration of all skcipher and aead
> implementations in the QAT driver by setting the number of crypto
> instances to 0 at configuration time.
>
> This patch deviates from the original upstream solution, that prevents
> dm-crypt to use drivers registered with the flag
> CRYPTO_ALG_ALLOCATES_MEMORY, since a backport of that set to stable
> kernels may have a too wide effect.
>
> commit 7bcb2c99f8ed032cfb3f5596b4dccac6b1f501df upstream
> commit 2eb27c11937ee9984c04b75d213a737291c5f58c upstream
> commit fbb6cda44190d72aa5199d728797aabc6d2ed816 upstream
> commit b8aa7dc5c7535f9abfca4bceb0ade9ee10cf5f54 upstream
> commit cd74693870fb748d812867ba49af733d689a3604 upstream
>
> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
> ---
>  drivers/crypto/qat/qat_common/qat_crypto.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)

Sounds good; is there any reason not to apply this upstream too, though?
You could revert it later as part of the patch series that fixes the driver.

- Eric

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-03-03 21:44 ` [dm-devel] " Eric Biggers
  (?)
@ 2022-03-04 17:50 ` Giovanni Cabiddu
  2022-03-16 21:38   ` [dm-devel] " Kyle Sanderson
  -1 siblings, 1 reply; 49+ messages in thread
From: Giovanni Cabiddu @ 2022-03-04 17:50 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Herbert Xu, Greg KH, Linus Torvalds, Kyle Sanderson, Dave Chinner,
	qat-linux, Linux-Kernal, linux-xfs, Linux Crypto Mailing List,
	device-mapper development

On Thu, Mar 03, 2022 at 09:44:53PM +0000, Eric Biggers wrote:
> On Thu, Mar 03, 2022 at 09:24:42PM +0000, Giovanni Cabiddu wrote:
> > On Thu, Mar 03, 2022 at 07:21:33PM +0000, Eric Biggers wrote:
> > > If these algorithms have critical bugs, which it appears they do, then IMO it
> > > would be better to disable them (either stop registering them, or disable the
> > > whole driver) than to leave them available with low cra_priority.  Low
> > > cra_priority doesn't guarantee that they aren't used.
> > Thanks for your feedback Eric.
> >
> > Here is a patch that disables the registration of the algorithms in the
> > QAT driver by setting, a config time, the number of HW queues (aka
> > instances) to zero.
> >
> > ---8<---
> > From: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
> > Subject: [PATCH] crypto: qat - disable registration of algorithms
> > Organization: Intel Research and Development Ireland Ltd - Co. Reg. #308263 - Collinstown Industrial Park, Leixlip, County Kildare - Ireland
> >
> > The implementations of aead and skcipher in the QAT driver do not
> > support properly requests with the CRYPTO_TFM_REQ_MAY_BACKLOG flag set.
> > If the HW queue is full, the driver returns -EBUSY but does not enqueue
> > the request.
> > This can result in applications like dm-crypt waiting indefinitely for a
> > completion of a request that was never submitted to the hardware.
> >
> > To avoid this problem, disable the registration of all skcipher and aead
> > implementations in the QAT driver by setting the number of crypto
> > instances to 0 at configuration time.
> >
> > This patch deviates from the original upstream solution, that prevents
> > dm-crypt to use drivers registered with the flag
> > CRYPTO_ALG_ALLOCATES_MEMORY, since a backport of that set to stable
> > kernels may have a too wide effect.
> >
> > commit 7bcb2c99f8ed032cfb3f5596b4dccac6b1f501df upstream
> > commit 2eb27c11937ee9984c04b75d213a737291c5f58c upstream
> > commit fbb6cda44190d72aa5199d728797aabc6d2ed816 upstream
> > commit b8aa7dc5c7535f9abfca4bceb0ade9ee10cf5f54 upstream
> > commit cd74693870fb748d812867ba49af733d689a3604 upstream
> >
> > Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
> > ---
> >  drivers/crypto/qat/qat_common/qat_crypto.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
>
> Sounds good; is there any reason not to apply this upstream too, though?
> You could revert it later as part of the patch series that fixes the driver.
Makes sense. I'm going to send it upstream and Cc stable as documented
in https://www.kernel.org/doc/html/v4.10/process/stable-kernel-rules.html#option-1
I will then revert this change in the set that fixes the problem.

Thanks,

--
Giovanni

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-03-04 17:50 ` Giovanni Cabiddu
@ 2022-03-16 21:38 ` Kyle Sanderson
  0 siblings, 0 replies; 49+ messages in thread
From: Kyle Sanderson @ 2022-03-16 21:38 UTC (permalink / raw)
  To: Giovanni Cabiddu
  Cc: Eric Biggers, Herbert Xu, Greg KH, Linus Torvalds, Dave Chinner,
	qat-linux, Linux-Kernal, linux-xfs, Linux Crypto Mailing List,
	device-mapper development

> Makes sense. I'm going to send it upstream and Cc stable as documented
> in https://www.kernel.org/doc/html/v4.10/process/stable-kernel-rules.html#option-1
> I will then revert this change in the set that fixes the problem.

Did this go anywhere? I'm still not seeing it in any of the stable trees.

Kyle.

On Fri, Mar 4, 2022 at 9:50 AM Giovanni Cabiddu
<giovanni.cabiddu@intel.com> wrote:
>
> On Thu, Mar 03, 2022 at 09:44:53PM +0000, Eric Biggers wrote:
> > On Thu, Mar 03, 2022 at 09:24:42PM +0000, Giovanni Cabiddu wrote:
> > > On Thu, Mar 03, 2022 at 07:21:33PM +0000, Eric Biggers wrote:
> > > > If these algorithms have critical bugs, which it appears they do, then IMO it
> > > > would be better to disable them (either stop registering them, or disable the
> > > > whole driver) than to leave them available with low cra_priority.  Low
> > > > cra_priority doesn't guarantee that they aren't used.
> > > Thanks for your feedback Eric.
> > >
> > > Here is a patch that disables the registration of the algorithms in the
> > > QAT driver by setting, a config time, the number of HW queues (aka
> > > instances) to zero.
> > >
> > > ---8<---
> > > From: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
> > > Subject: [PATCH] crypto: qat - disable registration of algorithms
> > > Organization: Intel Research and Development Ireland Ltd - Co. Reg. #308263 - Collinstown Industrial Park, Leixlip, County Kildare - Ireland
> > >
> > > The implementations of aead and skcipher in the QAT driver do not
> > > support properly requests with the CRYPTO_TFM_REQ_MAY_BACKLOG flag set.
> > > If the HW queue is full, the driver returns -EBUSY but does not enqueue
> > > the request.
> > > This can result in applications like dm-crypt waiting indefinitely for a
> > > completion of a request that was never submitted to the hardware.
> > >
> > > To avoid this problem, disable the registration of all skcipher and aead
> > > implementations in the QAT driver by setting the number of crypto
> > > instances to 0 at configuration time.
> > >
> > > This patch deviates from the original upstream solution, that prevents
> > > dm-crypt to use drivers registered with the flag
> > > CRYPTO_ALG_ALLOCATES_MEMORY, since a backport of that set to stable
> > > kernels may have a too wide effect.
> > >
> > > commit 7bcb2c99f8ed032cfb3f5596b4dccac6b1f501df upstream
> > > commit 2eb27c11937ee9984c04b75d213a737291c5f58c upstream
> > > commit fbb6cda44190d72aa5199d728797aabc6d2ed816 upstream
> > > commit b8aa7dc5c7535f9abfca4bceb0ade9ee10cf5f54 upstream
> > > commit cd74693870fb748d812867ba49af733d689a3604 upstream
> > >
> > > Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
> > > ---
> > >  drivers/crypto/qat/qat_common/qat_crypto.c | 4 +---
> > >  1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > Sounds good; is there any reason not to apply this upstream too, though?
> > You could revert it later as part of the patch series that fixes the driver.
> Makes sense. I'm going to send it upstream and Cc stable as documented
> in https://www.kernel.org/doc/html/v4.10/process/stable-kernel-rules.html#option-1
> I will then revert this change in the set that fixes the problem.
>
> Thanks,
>
> --
> Giovanni

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-03-16 21:38 ` [dm-devel] " Kyle Sanderson
@ 2022-03-16 22:13 ` Herbert Xu
  -1 siblings, 0 replies; 49+ messages in thread
From: Herbert Xu @ 2022-03-16 22:13 UTC (permalink / raw)
  To: Kyle Sanderson
  Cc: Giovanni Cabiddu, Eric Biggers, Greg KH, Linus Torvalds, Dave Chinner,
	qat-linux, Linux-Kernal, linux-xfs, Linux Crypto Mailing List,
	device-mapper development

On Wed, Mar 16, 2022 at 02:38:10PM -0700, Kyle Sanderson wrote:
> > Makes sense. I'm going to send it upstream and Cc stable as documented
> > in https://www.kernel.org/doc/html/v4.10/process/stable-kernel-rules.html#option-1
> > I will then revert this change in the set that fixes the problem.
>
> Did this go anywhere? I'm still not seeing it in any of the stable trees.

It's in the cryptodev tree which should hit mainline when the merge
window opens.

Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [dm-devel] Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs
  2022-02-28 19:25 ` [dm-devel] " Linus Torvalds
@ 2022-02-28 21:13 ` Milan Broz
  -1 siblings, 0 replies; 49+ messages in thread
From: Milan Broz @ 2022-02-28 21:13 UTC (permalink / raw)
  To: Linus Torvalds, Kyle Sanderson
  Cc: Giovanni Cabiddu, Herbert Xu, Greg KH, Dave Chinner, Linux-Kernal,
	qat-linux, linux-xfs, device-mapper development,
	Linux Crypto Mailing List

On 28/02/2022 20:25, Linus Torvalds wrote:
> On Mon, Feb 28, 2022 at 12:18 AM Kyle Sanderson <kyle.leet@gmail.com> wrote:
>>
>> Makes sense - this kernel driver has been destroying users for many
>> years. I'm disappointed that this critical bricking failure isn't
>> searchable for others.
>
> It does sound like we should just disable that driver entirely until
> it is fixed.
>
> Or at least the configuration that can cause problems, if there is
> some particular sub-case. Although from a cursory glance and the
> noises made in this thread, it looks like it's all of the 'qat_aeads'
> cases (since that uses qat_alg_aead_enc() which can return -EAGAIN),
> which effectively means that all of the QAT stuff.
>
> So presumably CRYPTO_DEV_QAT should just be marked as
>
>        depends on BROKEN || COMPILE_TEST
>
> or similar?

Yes, please! Or at least disable it in stable for now.

Over the last few years, we have had several reports of problems with this
driver for cryptsetup/LUKS (dm-crypt with the qat driver; here it is
skcipher, not aead, though).

The problem with the misunderstanding of the crypto API queue
has been known to the authors for some time, at least since 2020; see
https://lore.kernel.org/dm-devel/20200601160418.171851200@debian-a64.vm/
and it is apparently not fixed yet.

Thank you,
Milan

^ permalink raw reply [flat|nested] 49+ messages in thread