* kernel_fpu_begin and spinlocks
@ 2018-06-15 13:00 Jason A. Donenfeld
From: Jason A. Donenfeld @ 2018-06-15 13:00 UTC (permalink / raw)
To: linux-rt-users
Hello,
In order to do fast crypto, people like to use vector instructions,
which make use of the FPU registers. Typically things resemble this
pattern:
kernel_fpu_begin();
encrypt();
kernel_fpu_end();
If there are multiple things to encrypt, one pattern is:
for (thing) {
        kernel_fpu_begin();
        encrypt(thing);
        kernel_fpu_end();
}
But it turns out this is slow, so instead it's better to:
kernel_fpu_begin();
for (thing)
        encrypt(thing);
kernel_fpu_end();
However, what if that inner loop takes some spinlocks?
kernel_fpu_begin();
for (thing) {
        encrypt(thing);
        spin_lock(queue_lock);
        add_to_queue(queue, thing);
        spin_unlock(queue_lock);
}
kernel_fpu_end();
On normal kernels, that's certainly okay. But on rt kernels, spinlocks
call schedule(), and kernel_fpu_begin() explicitly disables
preemption, so everything explodes.
I'm wondering if -rt has any ideas about not having a strict
preemption requirement for kernel_fpu_begin/end, so that the ordinary
pattern can work on -rt kernels without exploding.
Regards,
Jason
* Re: kernel_fpu_begin and spinlocks
From: Sebastian Andrzej Siewior @ 2018-06-15 16:09 UTC (permalink / raw)
To: Jason A. Donenfeld; +Cc: linux-rt-users
On 2018-06-15 15:00:51 [+0200], Jason A. Donenfeld wrote:
> Hello,
Hi,
> On normal kernels, that's certainly okay.
once the latencies get "big" it also affects PREEMPT.
> But on rt kernels, spinlocks
> call schedule(), and kernel_fpu_begin() explicitly disables
> preemption, so everything explodes.
>
> I'm wondering if -rt has any ideas about not having a strict
> preemption requirement for kernel_fpu_begin/end, so that the ordinary
> pattern can work on -rt kernels without exploding.
We can't do fast crypto on RT. We break those sections up:
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/x86-crypto-reduce-preempt-disabled-regions.patch?h=linux-4.16.y-rt-patches
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/crypto-limit-more-FPU-enabled-sections.patch?h=linux-4.16.y-rt-patches
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/crypto-Reduce-preempt-disabled-regions-more-algos.patch?h=linux-4.16.y-rt-patches
There are a few problems. One is that algorithms using SSE/AVX to
process page_size+ of data in "one go" hurt latency because of the
"long" preempt-disabled regions.
The other problem is nested locks like the ones you mentioned. "Simple"
locks like those protecting a list_head could be turned into raw locks.
But some of those implementations advance the scatter-gather list
within a kernel_fpu_begin()/preempt_disable() region. Depending on the
size of the list this is also bad latency-wise for PREEMPT (and not
only for RT). A further problem is that on RT we can't do
kmalloc()/kmap() within kernel_fpu_begin().
At some point I tried something like:
if (tif_need_resched_now()) {
        kernel_fpu_end();
        kernel_fpu_begin();
}
which died once the scatter-gather got into the game.
> Regards,
> Jason
Sebastian