* Introduce FCLONE_SCRATCH skbs to reduce stack memory usage and NAPI jitter
@ 2011-10-27 19:53 Neil Horman
From: Neil Horman @ 2011-10-27 19:53 UTC (permalink / raw)
  To: netdev; +Cc: Neil Horman, David S. Miller


I had this idea a while ago while I was looking at the receive path for multicast
frames.  The top of the mcast receive path (in __udp4_lib_mcast_deliver) has a
loop in which we traverse a hash list linearly, looking for sockets that are
listening to a given multicast group.  For each matching socket we clone the skb
to enqueue it to the corresponding socket (a simplified sketch of that loop
follows the two problems below).  This creates two problems:

1) Application-driven jitter in the receive path
   As you add processes that listen to the same multicast group, you increase the
number of iterations you have to perform in this loop, which can lead to
increases in the amount of time you spend processing each frame in softirq
context, especially if you are memory constrained and the skb_clone operation
has to call all the way back into the buddy allocator for more RAM.  This can
lead to needlessly dropped frames as rx latency increases in the stack.

2) Increased memory usage
   As you increase the number of listeners to a multicast group, you directly
increase the number of times you clone an skb, putting increased memory
pressure on the system.
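
For reference, the loop in question looks roughly like the sketch below.  This
is simplified pseudo-kernel code, not the literal __udp4_lib_mcast_deliver()
implementation, and sock_matches_mcast_group() is a made-up stand-in for the
real socket-matching checks:

	/*
	 * Simplified sketch of the multicast delivery loop this series is
	 * aimed at.  The point is the O(listeners) skb_clone() calls, all
	 * performed in softirq context.
	 */
	static void mcast_deliver_sketch(struct sk_buff *skb,
					 struct udp_hslot *hslot)
	{
		struct sock *sk;
		struct hlist_nulls_node *node;
		struct sk_buff *clone;

		sk_nulls_for_each(sk, node, &hslot->head) {
			if (!sock_matches_mcast_group(sk, skb))
				continue;

			clone = skb_clone(skb, GFP_ATOMIC);	/* may dip into the allocator */
			if (clone)
				udp_queue_rcv_skb(sk, clone);
		}
	}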

While neither of these problems is a huge concern, I thought it would be nice if
we could mitigate the effects of increased application instances on performance
in this area.  As such I came up with this patch set.  I created a new skb
fclone type called FCLONE_SCRATCH.  When available, it commandeers the
internally fragmented space of an skb data buffer and uses that to allocate
additional skbs during the clone operation.  Since the skb->data area is
allocated with a kmalloc operation (and is therefore nominally a power of 2 in
size), and network interfaces typically have an MTU of around 1500 bytes, we
can usually reclaim several hundred bytes of space at the end of an skb (more
if the incoming packet is not a full MTU in size).  This space, being
exclusively accessible to the softirq doing the reclaim, can be accessed quickly
without the need for additional locking, potentially providing lower per-frame
jitter in NAPI context during a receive operation, as well as some memory
savings.
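
To make the source of that space concrete: the slack is roughly the difference
between what kmalloc actually handed back for skb->head and what the skb data
plus its skb_shared_info consume.  The helper below is purely illustrative and
is not the API the patches add:

	/*
	 * Illustrative only -- not the interface added by these patches.
	 * kmalloc rounds the data buffer up to its bucket size (typically a
	 * power of two), so the bytes past the skb_shared_info are otherwise
	 * wasted and can be carved up into scratch sk_buffs.
	 */
	static unsigned int skb_scratch_space(const struct sk_buff *skb)
	{
		unsigned int alloc = ksize(skb->head);
		unsigned int used  = (skb_end_pointer(skb) - skb->head) +
				     sizeof(struct skb_shared_info);

		return alloc > used ? alloc - used : 0;
	}

For a typical 1500-byte-MTU receive the head buffer lands in the next kmalloc
bucket up (often 2048 bytes), which is where the slack described above comes
from; the exact figure depends on driver padding and struct sizes.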

I'm still collecting stats on its performance, but I thought I would post now to
get some early reviews and feedback on it.

Thanks & Regards
Neil

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
 

