* SSE instructions for fast packet copy?
From: Tom Herbert @ 2017-05-05  5:50 UTC (permalink / raw)
  To: Linux Kernel Network Developers

Hi,

I am thinking about the possibility of using SSE in the kernel to
speed up the kernel memcpy, particularly for copies to userspace
memory, and maybe even using the string instructions (e.g. if we
supported regex in something like eBPF). AFAIK we don't use SSE in the
kernel because the xmm register state would need to be saved across
context switches. However, if we start busy-polling a CPU in the kernel
on network queues then there might not be any context switches to
worry about. In this model we'd want to enable SSE per CPU.

Has this ever been tried before? Is this at all feasible? :-) Is it
possible to enable SSE for the kernel on just one CPU? (I found that
CPUID reports SSE as supported, but I don't see how to enable it other
than compiling with -msse.)

Thanks,
Tom
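
For illustration, the kind of copy loop being asked about can be
sketched in userspace C with SSE2 intrinsics. This is only a sketch
under stated assumptions: sse_copy() is a hypothetical name, not a
kernel function, and it assumes an x86 compiler with SSE2 enabled
(the default on x86-64). Kernel code is normally built without SSE,
and in-kernel use of the xmm registers would additionally have to be
bracketed by kernel_fpu_begin()/kernel_fpu_end(), which this example
does not model.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_loadu_si128, _mm_storeu_si128 */
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy n bytes in 16-byte chunks through an xmm
 * register, using unaligned loads and stores. Any tail shorter than
 * 16 bytes is handed to memcpy().
 */
static void sse_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_storeu_si128((__m128i *)d, v);
        s += 16;
        d += 16;
        n -= 16;
    }
    memcpy(d, s, n);  /* remaining 0..15 bytes */
}
```

Whether such a loop beats the existing copy routines is exactly the
open question of this thread; the sketch makes no performance claim.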


* RE: SSE instructions for fast packet copy?
From: David Laight @ 2017-05-05 10:14 UTC (permalink / raw)
  To: 'Tom Herbert', Linux Kernel Network Developers

From: Tom Herbert
> Sent: 05 May 2017 06:51
> To: Linux Kernel Network Developers
> Subject: SSE instructions for fast packet copy?
> 
> Hi,
> 
> I am thinking about the possibility of using SSE in the kernel to
> speed up the kernel memcpy, particularly for copies to userspace
> memory, and maybe even using the string instructions (e.g. if we
> supported regex in something like eBPF). AFAIK we don't use SSE in the
> kernel because the xmm register state would need to be saved across
> context switches. However, if we start busy-polling a CPU in the kernel
> on network queues then there might not be any context switches to
> worry about. In this model we'd want to enable SSE per CPU.
> 
> Has this ever been tried before? Is this at all feasible? :-) Is it
> possible to enable SSE for the kernel on just one CPU? (I found that
> CPUID reports SSE as supported, but I don't see how to enable it other
> than compiling with -msse.)

Not even worth thinking about.
With recent Intel CPUs 'rep movsb' is optimised in hardware
(for cached memory) and will run as fast as any other copy.
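
A 'rep movsb' copy of the kind mentioned above can be sketched in
userspace C with GCC/Clang inline assembly. This is an illustration
only: rep_movsb_copy() is a hypothetical name, it assumes an x86-64
target, and it relies on the direction flag being clear, as the ABI
guarantees at function entry.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: copy n bytes with a single 'rep movsb'.
 * RDI = destination, RSI = source, RCX = byte count; all three are
 * modified by the instruction, hence the "+" constraints.
 */
static void *rep_movsb_copy(void *dst, const void *src, size_t n)
{
    void *ret = dst;

    assert(n == 0 || (dst != NULL && src != NULL));

    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
    return ret;
}
```

How fast this runs depends on the CPU generation; the sketch shows
the mechanism, not a benchmark.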

(There is a related fubar: memcpy_toio() is implemented as
memcpy(), which in turn becomes 'rep movsb', so it generates
repeated byte accesses to I/O memory.)
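
To make the byte-access concern concrete, here is a sketch of a copy
that issues one 32-bit store per word instead of one per byte. It is
a userspace illustration, not the kernel's implementation:
copy_to_io32() is a hypothetical name, the destination is ordinary
memory accessed through a volatile pointer to stand in for a device
window, and the length is assumed to be a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy n bytes (n must be a multiple of 4) to a
 * 32-bit-wide destination. The volatile qualifier keeps the compiler
 * from merging or splitting the stores, so each word is written with
 * exactly one 32-bit access.
 */
static void copy_to_io32(volatile uint32_t *dst, const void *src, size_t n)
{
    const unsigned char *s = src;

    assert(n % 4 == 0);

    for (size_t i = 0; i < n / 4; i++) {
        uint32_t w;

        memcpy(&w, s + 4 * i, sizeof(w));  /* source may be unaligned */
        dst[i] = w;                        /* single 32-bit store */
    }
}
```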

I'm pretty sure the FP registers are 'lazy saved'.
The CPU's SSE registers (the entire FP register set) might contain
live values for a process that is now running on a different CPU.
If that process executes an FP instruction it will fault, and an IPI
is issued to get the registers written to the process's FP save area,
from where they can be loaded.
Any use of the SSE registers would have to interact correctly
with that IPI code.

	David



* Re: SSE instructions for fast packet copy?
From: Benjamin Poirier @ 2017-05-08 19:46 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Linux Kernel Network Developers

On 2017/05/04 22:50, Tom Herbert wrote:
> Hi,
> 
> I am thinking about the possibility of using SSE in the kernel to
> speed up the kernel memcpy, particularly for copies to userspace
> memory, and maybe even using the string instructions (e.g. if we
> supported regex in something like eBPF). AFAIK we don't use SSE in the
> kernel because the xmm register state would need to be saved across
> context switches. However, if we start busy-polling a CPU in the kernel
> on network queues then there might not be any context switches to
> worry about. In this model we'd want to enable SSE per CPU.
> 
> Has this ever been tried before? Is this at all feasible? :-) Is it
> possible to enable SSE for the kernel on just one CPU? (I found that
> CPUID reports SSE as supported, but I don't see how to enable it other
> than compiling with -msse.)

This reminds me of what you tried in
	c6e1a0d12ca7 net: Allow no-cache copy from user on transmit
	(v3.0-rc1)
and that I reverted in
	cdb3f4a31b64 net: Do not enable tx-nocache-copy by default
	(v3.14-rc1)

Sure, it's not exactly the same thing...
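
For readers unfamiliar with the idea behind a "no-cache copy", the
general technique is a copy whose stores bypass the cache
(non-temporal stores). The following userspace sketch shows that
technique with SSE2 intrinsics. It is not the code from either commit
above: nocache_copy() is a hypothetical name, and the 16-byte
destination alignment is a requirement of the intrinsic chosen here.

```c
#include <assert.h>
#include <emmintrin.h>  /* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy n bytes using non-temporal 16-byte stores.
 * dst must be 16-byte aligned; the tail (0..15 bytes) uses memcpy().
 */
static void nocache_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(((uintptr_t)d & 15) == 0);

    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_stream_si128((__m128i *)d, v);  /* store that bypasses the cache */
        s += 16;
        d += 16;
        n -= 16;
    }
    _mm_sfence();     /* order the streaming stores before later stores */
    memcpy(d, s, n);  /* remaining bytes via a normal cached copy */
}
```

Non-temporal stores help only when the destination is not read again
soon on the same CPU, which is one reason such a copy is a
questionable default.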
