* Re: performance and threads
2022-09-05 3:52 performance and threads Kristóf Csillag
@ 2022-09-05 6:34 ` Michael Kjörling
2022-09-05 15:09 ` Arno Wagner
2022-09-15 14:55 ` Milan Broz
2 siblings, 0 replies; 4+ messages in thread
From: Michael Kjörling @ 2022-09-05 6:34 UTC
To: cryptsetup
On 5 Sep 2022 05:52 +0200, from csillag.kristof@gmail.com (Kristóf Csillag):
> - Raw device read: 2,3 GB/s
> - RAID device read: 2,3 GB/s
> - Reading from the encrypted device: 1,6 GB/s
What does `cryptsetup benchmark` report for the relevant combination
of cipher and key size?
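A minimal sketch of how that number could be pulled out for comparison, assuming the benchmark prints one whitespace-separated row per cipher in the form "aes-xts 512b <enc> MiB/s <dec> MiB/s". The helper name bench_decrypt_rate and the 512-bit key size are assumptions for illustration; the actual key size is shown by "cryptsetup luksDump".

```shell
#!/bin/sh
# Print the decryption throughput (in MiB/s) for one cipher/key-size row of
# `cryptsetup benchmark` output read from stdin.
# Assumed row layout: <algorithm> <bits>b <enc> MiB/s <dec> MiB/s
bench_decrypt_rate() {
    cipher=$1     # algorithm as printed by the benchmark, e.g. aes-xts
    keybits=$2    # key size in bits, e.g. 512
    # Header lines start with '#', so they never match the cipher name.
    awk -v c="$cipher" -v k="${keybits}b" '$1 == c && $2 == k { print $5 }'
}

# Usage (needs cryptsetup; the benchmark runs in memory only, no storage IO):
#   cryptsetup benchmark --cipher aes-xts-plain64 --key-size 512 \
#       | bench_decrypt_rate aes-xts 512
```

If the reported rate is far above the 2,3 GB/s raw read speed, the cipher itself is unlikely to be the limiting factor.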
--
Michael Kjörling • https://michael.kjorling.se • michael@kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”
* Re: performance and threads
From: Arno Wagner @ 2022-09-05 15:09 UTC
To: Kristóf Csillag; +Cc: cryptsetup
Hi,
Encryption is not actually CPU-bound for AES, as modern CPUs
do AES in hardware. So it may appear that the CPU is idle, while
its AES hardware is not.
At these speeds, the penalty you see is probably just memory
bandwidth limitations stemming from the copy that has to be
made when encrypting or decrypting. I would say these speeds
look as expected.
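To put a number on the penalty, here is a small worked calculation using the figures quoted in this thread. The helper name slowdown_percent is made up for illustration, and the values are written with decimal points so that awk can parse them.

```shell
#!/bin/sh
# Relative read slowdown of the encrypted device versus the raw device, in percent.
# Arguments: raw throughput, encrypted throughput (same unit, decimal point).
slowdown_percent() {
    awk -v raw="$1" -v enc="$2" 'BEGIN { printf "%.0f\n", (raw - enc) / raw * 100 }'
}

# Figures from the original report: 2.3 GB/s raw, 1.6 GB/s through dm-crypt.
slowdown_percent 2.3 1.6
```

With these figures the encrypted read comes out roughly 30 percent slower than the raw read.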
Kind Regards,
Arno
P.S.: I just learned that signing with "Regards" basically means
"You are the worst!", so I am not using that anymore:
https://www.youtube.com/watch?v=owzPL9jaSU8&t=478s
On Mon, Sep 05, 2022 at 05:52:06 CEST, Kristóf Csillag wrote:
> Dear all,
>
> I would like to ask you if what I am seeing is normal, or it's some
> configuration problem.
>
> Short summary: reading my encrypted device is a lot slower than reading it
> raw, while the CPU is underutilized.
>
> Detailed version:
>
> I have a RAID1 device, consisting of two identical NVMe devices.
> On the top of the RAID device, I have LUKS encryption.
>
> These are the read speeds:
>
> - Raw device read: 2,3 GB/s
> - RAID device read: 2,3 GB/s
> - Reading from the encrypted device: 1,6 GB/s
>
> As you can see, there is a pretty serious performance penalty for the
> decryption.
> The cipher running is the default aes-xts-plain64 cipher.
> This is an AMD Ryzen 9 5900X 12-Core CPU, so I'm not sure what this is.
> What is even more interesting, is that the CPU doesn't seem to be all
> that busy during the reading. As far as I can tell, I only get 4
> threads of kworker / kcryptd, and their total system load is less than
> 100% (of 1 core.)
>
> So I'm getting the impression that even though decryption is a
> CPU-bound process, my CPU is still underutilized.
>
> Is this interpretation correct? If yes, is this to be expected, or am
> I doing something wrong? Can dm-crypt be configured to run the
> encryption on more CPU cores, with better performance?
>
> Thank you for your help:
>
> Kristof Csillag
>
> ps. I'm on kernel 5.19
--
Arno Wagner, Dr. sc. techn., Dipl. Inform., Email: arno@wagner.name
GnuPG: ID: CB5D9718 FP: 12D6 C03B 1B30 33BB 13CF B774 E35C 5FA1 CB5D 9718
----
A good decision is based on knowledge and not on numbers. -- Plato
If it's in the news, don't worry about it. The very definition of
"news" is "something that hardly ever happens." -- Bruce Schneier
* Re: performance and threads
From: Milan Broz @ 2022-09-15 14:55 UTC
To: Kristóf Csillag
Cc: cryptsetup, device-mapper development, Mikulas Patocka
Hi,
This is a slightly late reply, but as I struggled with NVMe performance
with LUKS on my recently updated notebook, I am adding some notes here.
They may apply to your configuration too, and as I sometimes see similar reports,
it is better to have them in one place (also cc'ing dm-devel, as Mikulas
helped me to debug some things).
On 05/09/2022 05:52, Kristóf Csillag wrote:
> Short summary: reading my encrypted device is a lot slower than reading it
> raw, while the CPU is underutilized.
>
> Detailed version:
>
> I have a RAID1 device, consisting of two identical NVMe devices.
> On the top of the RAID device, I have LUKS encryption.
>
> These are the read speeds:
>
> - Raw device read: 2,3 GB/s
> - RAID device read: 2,3 GB/s
> - Reading from the encrypted device: 1,6 GB/s
In general (this applies to both read and write performance issues):
- Check that the kernel really uses AES-NI acceleration:
  "lsmod | grep aes" should show aesni_intel in use.
- For NVMe, always use a 4k encryption sector (--sector-size 4096 for luksFormat).
  It should be autodetected, but many NVMe drives lie and report a 512B physical
  sector, so you need to override it.
  If you have an existing LUKS2 device, you can use reencryption to switch the
  sector size on the existing device (even online).
  It needs to reencrypt the whole drive, so it will take a long time.
  BE SURE no fs above uses a 512B block size, though! (Otherwise it cannot be activated later.)
  We try to check this before reencryption starts, but with a more complex setup, such as
  LVM above, reencryption can still run and leave the fs unusable.
  (XFS seems to use 512B blocks; ext4 seems to use 4k blocks even over a 512B-sector drive.)
- If the performance is still not optimal, you can try some performance flags.
  You can switch them on or off on an active dm-crypt device with the "cryptsetup refresh" command:
  cryptsetup refresh --perf-no_read_workqueue --perf-no_write_workqueue <active device>
  (and similar flags, see the --perf-* options)
  If it helps, use the --persistent option to store the flags in the LUKS2 metadata so they
  are used by default for that device. (See "man cryptsetup open".)
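The checks and flags in the list above can be sketched as a few shell helpers. This is only an illustration under stated assumptions: the function names, the device path /dev/md0 and the mapping name cryptroot are placeholders, the CPU flag check only understands the x86 "flags" line of /proc/cpuinfo, and commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: print the commands instead of running them as root.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Succeeds if cpuinfo-style text on stdin lists the "aes" CPU flag (x86 layout).
has_aes_flag() {
    grep -E '^flags' | grep -qw aes
}

# Switch an existing LUKS2 device to 4096-byte encryption sectors.
# This reencrypts the whole device; check the filesystem block size first.
reencrypt_4k() {
    run cryptsetup reencrypt --sector-size 4096 "$1"
}

# Bypass the dm-crypt read and write workqueues on an active mapping.
# Pass --persistent before the name to store the flags in LUKS2 metadata.
tune_workqueues() {
    run cryptsetup refresh --perf-no_read_workqueue --perf-no_write_workqueue "$@"
}

# Usage (placeholders, adjust to your setup):
#   has_aes_flag < /proc/cpuinfo && echo "CPU advertises AES instructions"
#   lsmod | grep aes
#   reencrypt_4k /dev/md0
#   tune_workqueues cryptroot
#   tune_workqueues --persistent cryptroot
```

Trying the flags without --persistent first keeps the change temporary, so it is gone again after the next activation if it does not help.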
For write issues, such as random freezing, also try to:
- enable discard (both for the LUKS device and the filesystem)
- and mainly, check your HW...
Some (cheaper) NVMe drives behave strangely. I had a 2TB NVMe that, after copying a lot
of data, kept the system freezing regularly for a few seconds.
The CPU was doing almost nothing; it spent its time waiting for NVMe IOs.
I tried several things (even patching dm/dm-crypt to limit IOs). It helped only slightly.
Finally I found that it was just a strange NVMe: after replacing it with a better one
I have no problems even under extreme load, with everything running over one huge dm-crypt device.
No idea why dm-crypt amplifies such behavior (I am sure it can be reproduced
without dm-crypt too, just with some specific IO load). I expect there could be some
internal compression of data, or some optimization that is just not efficient
with encrypted data that cannot be compressed.
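For the discard item in the list above, a minimal sketch could look like the following. The helper name enable_discard, the mapping name cryptroot and the mount point / are placeholders, and the commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: print the commands instead of running them as root.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Arguments: active dm-crypt mapping name, mount point of the filesystem on it.
enable_discard() {
    # Let discards pass through the mapping and remember that in LUKS2 metadata.
    run cryptsetup refresh --allow-discards --persistent "$1"
    # Trim the free space of the mounted filesystem once.
    run fstrim -v "$2"
}

# Usage (placeholders):
#   enable_discard cryptroot /
```

The filesystem side can alternatively be handled by a periodic trim job or a discard mount option, depending on the distribution.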
So the conclusion is that often the problem is not in the performance or
parallel processing of encryption in dm-crypt (that actually works quite nicely)
but in the NVMe drive itself.
Milan