Kernel lock during cryptsetup luksClose

From: Frédéric Martinsons @ 2023-02-23 15:03 UTC
  To: linux-rt-users

Subject: Kernel lock during cryptsetup luksClose

Hello,

I recently upgraded my system (Marvell Armada-3720) from Linux 4.19.255-rt113 to
Linux 4.19.271-rt120 and have run into a blocking issue with cryptsetup.

My sequence is roughly the following:
  - Boot an initramfs from PXE
  - Fetch an encrypted image from a local HTTP server
  - Fetch the encryption key for this image (the key is itself encrypted)
  - Decrypt the key via a TPM chip
  - Decrypt the image via cryptsetup luksOpen
  - Partition the storage and install the image on it
  - Clean up
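
The sequence above can be sketched as a shell script. This is only an
illustration: the URLs, the file names under /tmp and the TPM helper are
hypothetical placeholders, since the actual tools and paths are not given
here. Only the loop device /dev/loop0, the mapping name "image" and the
luksClose invocation come from the debug output in this message. Commands
are printed rather than executed unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch of the provisioning sequence. Dry-run by default: each command is
# printed instead of executed. Set RUN=1 to really run them (needs root).

IMAGE_URL="http://192.0.2.1/image.luks"    # hypothetical local HTTP server
KEY_URL="http://192.0.2.1/image.key.enc"   # hypothetical encrypted key
LOOP_DEV="/dev/loop0"                      # loop device seen in the logs
MAP_NAME="image"                           # mapping name seen in the logs

run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Placeholder: the TPM tooling is not specified, so this step is only named.
decrypt_key_with_tpm() {
    echo "# (decrypt $1 to $2 via the TPM; tool not specified)"
}

provision() {
    run wget -O /tmp/image.luks "$IMAGE_URL"
    run wget -O /tmp/image.key.enc "$KEY_URL"
    decrypt_key_with_tpm /tmp/image.key.enc /tmp/image.key
    run losetup "$LOOP_DEV" /tmp/image.luks
    run cryptsetup luksOpen "$LOOP_DEV" "$MAP_NAME" --key-file /tmp/image.key
    echo "# (partition the storage and copy /dev/mapper/$MAP_NAME onto it)"
    run cryptsetup --verbose --debug luksClose "$MAP_NAME"   # step that blocks
    run losetup -d "$LOOP_DEV"
}

provision
```

Keeping the dry-run wrapper makes it easy to print the exact command list
before running it on the target.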

The system blocks when I close the LUKS encrypted volume via
cryptsetup luksClose.
I managed to get more traces from cryptsetup luksClose by adding
--debug to it.
Below is the correct sequence in 4.19.255-rt113:

# cryptsetup 2.3.7 processing "cryptsetup --verbose --debug luksClose image"
# Running command close.
# Locking memory.
# Installing SIGINT/SIGTERM handler.
# Unblocking interruption on signal.
# Allocating crypt device context by device image.
# Initialising device-mapper backend library.
# dm version   [ opencount flush ]   [16384] (*1)
# dm versions   [ opencount flush ]   [16384] (*1)
# Detected dm-ioctl version 4.39.0.
# Detected dm-crypt version 1.18.1.
# Udev is not running. Not using udev synchronisation code.
# Device-mapper backend running with UDEV support disabled.
# dm status image  [ opencount noflush ]   [16384] (*1)
# Releasing device-mapper backend.
# Trying to open and read device /dev/loop0 with direct-io.
# Allocating context for crypt device /dev/loop0.
# Trying to open and read device /dev/loop0 with direct-io.
# Initialising device-mapper backend library.
# dm versions   [ opencount flush ]   [16384] (*1)
# dm table image  [ opencount flush securedata ]   [16384] (*1)
# Trying to open and read device /dev/loop0 with direct-io.
# dm versions   [ opencount flush ]   [16384] (*1)
# dm deps image  [ opencount flush ]   [16384] (*1)
# Crypto backend (OpenSSL 1.1.1s  1 Nov 2022) initialized in cryptsetup library version 2.3.7.
# Detected kernel Linux 4.19.255-rt113-sigfox aarch64.
# PBKDF pbkdf2-sha256, time_ms 2000 (iterations 0).
# Reading LUKS header of size 1024 from device /dev/loop0
# Key length 32, device size 179761 sectors, header size 2050 sectors.
# Deactivating volume image.
# dm versions   [ opencount flush ]   [16384] (*1)
# dm status image  [ opencount noflush ]   [16384] (*1)
# dm versions   [ opencount flush ]   [16384] (*1)
# dm table image  [ opencount flush securedata ]   [16384] (*1)
# Trying to open and read device /dev/loop0 with direct-io.
# dm versions   [ opencount flush ]   [16384] (*1)
# dm remove image  [ opencount flush retryremove ]   [16384] (*1)
# image: Stacking NODE_DEL
# image: Processing NODE_DEL
# Removed /dev/mapper/image
# Releasing crypt device /dev/loop0 context.
# Releasing device-mapper backend.
# Closing read only fd for /dev/loop0.
# Unlocking memory.
Command successful.

And with 4.19.271-rt120:
# cryptsetup 2.3.7 processing "cryptsetup --verbose --debug luksClose image"
# Running command close.
# Locking memory.
# Installing SIGINT/SIGTERM handler.
# Unblocking interruption on signal.
# Allocating crypt device context by device image.
# Initialising device-mapper backend library.
# dm version   [ opencount flush ]   [16384] (*1)
# dm versions   [ opencount flush ]   [16384] (*1)
# Detected dm-ioctl version 4.39.0.
# Detected dm-crypt version 1.18.1.
# Udev is not running. Not using udev synchronisation code.
# Device-mapper backend running with UDEV support disabled.
# dm status image  [ opencount noflush ]   [16384] (*1)
# Releasing device-mapper backend.
# Trying to open and read device /dev/loop0 with direct-io.
# Allocating context for crypt device /dev/loop0.
# Trying to open and read device /dev/loop0 with direct-io.
# Initialising device-mapper backend library.
# dm versions   [ opencount flush ]   [16384] (*1)
# dm table image  [ opencount flush securedata ]   [16384] (*1)
# Trying to open and read device /dev/loop0 with direct-io.
# dm versions   [ opencount flush ]   [16384] (*1)
# dm deps image  [ opencount flush ]   [16384] (*1)
# Crypto backend (OpenSSL 1.1.1s  1 Nov 2022) initialized in cryptsetup library version 2.3.7.
# Detected kernel Linux 4.19.271-rt120-sigfox aarch64.
# PBKDF pbkdf2-sha256, time_ms 2000 (iterations 0).
# Reading LUKS header of size 1024 from device /dev/loop0
# Key length 32, device size 179761 sectors, header size 2050 sectors.
# Deactivating volume image.
# dm versions   [ opencount flush ]   [16384] (*1)
# dm status image  [ opencount noflush ]   [16384] (*1)
# dm versions   [ opencount flush ]   [16384] (*1)
# dm table image  [ opencount flush securedata ]   [16384] (*1)
# Trying to open and read device /dev/loop0 with direct-io.
# dm versions   [ opencount flush ]   [16384] (*1)
# dm remove image  [ opencount flush retryremove ]   [16384] (*1)

After that line the system hangs completely.
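
To make the comparison of the two logs mechanical, a small helper can print
the first line of the working log that the blocked run never reached. This
is a minimal sketch, assuming both logs were saved to files (the file names
in the usage comment are hypothetical) and that they match line for line up
to the point where the second one stops:

```shell
#!/bin/sh
# Print the first line of the good log that the blocked run never printed.
# Assumes both logs come from the same command, so they match line for line
# (apart from version strings) up to the point where the bad log stops.
next_expected() {
    good_log=$1
    bad_log=$2
    reached=$(wc -l < "$bad_log")            # lines printed before the block
    sed -n "$((reached + 1))p" "$good_log"   # the line that should come next
}

# Usage, with hypothetical file names:
#   next_expected close-rt113.log close-rt120.log
```

For the two logs in this message, the line following "dm remove image" in the
working log is "# image: Stacking NODE_DEL", which suggests the process never
gets past the dm remove step on 4.19.271-rt120.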

I bisected and found that the problematic commit is:
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/?h=v4.19-rt&id=7fcfa8b616d360ab9ea1da94c2200f0708e0188b
introduced in 4.19.255-rt114. Reverting this commit on 4.19.271-rt120 makes
the blocking disappear.
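
For reference, the bisection and the revert could look roughly like the
sketch below. The tag names are an assumption (they follow the usual
linux-stable-rt naming but were not checked against the tree); the commit id
is the one from the link above. Commands are only printed unless RUN=1 is
set, and the build/boot/test step between bisect iterations is left as a
comment.

```shell
#!/bin/sh
# Sketch of the bisect and revert. Dry-run by default: commands are printed.
# Set RUN=1 inside a linux-stable-rt checkout to really run them.

GOOD="v4.19.255-rt113"   # assumed tag name of the last working kernel
BAD="v4.19.271-rt120"    # assumed tag name of the first blocking kernel
SUSPECT="7fcfa8b616d360ab9ea1da94c2200f0708e0188b"   # commit from the link

run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

bisect_and_revert() {
    run git bisect start
    run git bisect bad "$BAD"
    run git bisect good "$GOOD"
    # At each step: build, boot, run "cryptsetup luksClose image", then mark
    # the result with "git bisect good" or "git bisect bad" until git names
    # the first bad commit.
    run git bisect reset
    # Confirm the finding by reverting the suspect on top of the bad kernel.
    run git checkout "$BAD"
    run git revert --no-edit "$SUSPECT"
}

bisect_and_revert
```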

I set the kernel to maximum verbosity (loglevel=8 ignore_loglevel on the
kernel command line),
but I didn't see any kernel traces during this sequence (I get a lot of logs
before the initramfs is loaded, but as soon as it is loaded the kernel seems
to go quiet).
I'll post a follow-up if I manage to get more traces.
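
Because a misspelled parameter would not stop the boot, one quick sanity
check is to confirm that the running kernel really received both options.
This is a minimal sketch, assuming a POSIX shell is available in the
initramfs; the helper names has_param and check_cmdline are hypothetical.

```shell
#!/bin/sh
# Check that the verbosity options are present on the kernel command line.

# has_param PARAM CMDLINE: succeed if PARAM appears as a whole word.
has_param() {
    case " $2 " in
        *" $1 "*) return 0 ;;
    esac
    return 1
}

# check_cmdline CMDLINE: report each expected option as ok or missing.
check_cmdline() {
    for p in loglevel=8 ignore_loglevel; do
        if has_param "$p" "$1"; then
            echo "ok: $p"
        else
            echo "missing: $p"
        fi
    done
}

# On the target, inspect what the kernel was actually booted with.
if [ -r /proc/cmdline ]; then
    check_cmdline "$(cat /proc/cmdline)"
fi
```

Matching on whole words means a truncated or misspelled option is reported
as missing instead of being silently accepted.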

Does anyone know what could be happening?

I have a reproducible setup and can test several things if someone
tells me what to look at.

Thanks in advance for any insights you could give.
