* kernel_fpu_begin and spinlocks
From: Jason A. Donenfeld @ 2018-06-15 13:00 UTC
  To: linux-rt-users

Hello,

In order to do fast crypto, people like to use vector instructions,
which make use of the FPU registers. Typically things resemble this
pattern:

kernel_fpu_begin();
encrypt();
kernel_fpu_end();

If there are multiple things to encrypt, one pattern is:

for (thing) {
  kernel_fpu_begin();
  encrypt(thing);
  kernel_fpu_end();
}

But it turns out this is slow, so instead it's better to:

kernel_fpu_begin();
for (thing)
  encrypt(thing);
kernel_fpu_end();

However, what if that inner loop takes some spinlocks?

kernel_fpu_begin();
for (thing) {
  encrypt(thing);
  spin_lock(queue_lock);
  add_to_queue(queue, thing);
  spin_unlock(queue_lock);
}
kernel_fpu_end();

On normal kernels, that's certainly okay. But on rt kernels, spinlocks
call schedule(), and kernel_fpu_begin() explicitly disables
preemption, so everything explodes.
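
The preemption-disable is explicit in the implementation; on x86
(going by arch/x86/kernel/fpu/core.c around v4.16),
kernel_fpu_begin()/kernel_fpu_end() are essentially:

void kernel_fpu_begin(void)
{
  preempt_disable();
  __kernel_fpu_begin();
}

void kernel_fpu_end(void)
{
  __kernel_fpu_end();
  preempt_enable();
}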

I'm wondering if -rt has any ideas about not having a strict
preemption requirement for kernel_fpu_begin/end, so that the ordinary
pattern can work on -rt kernels without exploding.

Regards,
Jason

* Re: kernel_fpu_begin and spinlocks
From: Sebastian Andrzej Siewior @ 2018-06-15 16:09 UTC
  To: Jason A. Donenfeld; +Cc: linux-rt-users

On 2018-06-15 15:00:51 [+0200], Jason A. Donenfeld wrote:
> Hello,
Hi,

> On normal kernels, that's certainly okay. 

once the latencies get "big" it also affects PREEMPT.

> But on rt kernels, spinlocks
> call schedule(), and kernel_fpu_begin() explicitly disables
> preemption, so everything explodes.
> 
> I'm wondering if -rt has any ideas about not having a strict
> preemption requirement for kernel_fpu_begin/end, so that the ordinary
> pattern can work on -rt kernels without exploding.

We can't do fast crypto on RT. We break those sections up:
 https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/x86-crypto-reduce-preempt-disabled-regions.patch?h=linux-4.16.y-rt-patches
 https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/crypto-limit-more-FPU-enabled-sections.patch?h=linux-4.16.y-rt-patches
 https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/crypto-Reduce-preempt-disabled-regions-more-algos.patch?h=linux-4.16.y-rt-patches
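
In essence those patches go back to your per-thing pattern, i.e.
(a sketch, not the literal patch code):

	for (thing) {
		kernel_fpu_begin();
		encrypt(thing);
		/* preemption is possible again between chunks */
		kernel_fpu_end();
	}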

There are a few problems. One is that algorithms involving SSE/AVX
doing page_size+ operations in "one go" hurt latency due to the
"long" preempt-disabled regions.
The other is nested locks like the ones you mentioned. "Simple" locks
like those protecting a list_head could be turned into raw locks. But
then some of those implementations advance the scatter-gather list
within a kernel_fpu_begin()/preempt_disable() region. Depending on
the size of the list this is also bad latency-wise for PREEMPT (and
not only for RT), and on top of that, on RT we can't do
kmalloc()/kmap() within kernel_fpu_begin().
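
To illustrate the raw-lock idea (queue_thing() and struct thing are
made up for this sketch): a raw_spinlock_t keeps spinning on RT
instead of sleeping, so it is legal inside kernel_fpu_begin(), as
long as the critical section stays tiny:

	static DEFINE_RAW_SPINLOCK(queue_lock);
	static LIST_HEAD(queue);

	static void queue_thing(struct thing *t)
	{
		raw_spin_lock(&queue_lock);
		list_add_tail(&t->node, &queue);
		raw_spin_unlock(&queue_lock);
	}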

At some point I tried something like:

	if (tif_need_resched_now()) {
		kernel_fpu_end();
		kernel_fpu_begin();
	}

which died once the scatter-gather got into the game.
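
Spelled out in the loop from your mail, that attempt was roughly
(tif_need_resched_now() being an -rt specific helper):

	kernel_fpu_begin();
	for (thing) {
		encrypt(thing);
		/* open a preemption window if a resched is pending */
		if (tif_need_resched_now()) {
			kernel_fpu_end();
			kernel_fpu_begin();
		}
	}
	kernel_fpu_end();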

> Regards,
> Jason

Sebastian
