* [PATCH] net: add Documentation/networking/scaling.txt
@ 2011-08-01  6:56 Tom Herbert
  2011-08-01 18:41 ` Randy Dunlap
  2011-08-01 18:49 ` Rick Jones
  0 siblings, 2 replies; 6+ messages in thread
From: Tom Herbert @ 2011-08-01  6:56 UTC (permalink / raw)
  To: rdunlap, linux-doc, davem, netdev; +Cc: willemb


Describes RSS, RPS, RFS, accelerated RFS, and XPS.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 Documentation/networking/scaling.txt |  346 ++++++++++++++++++++++++++++++++++
 1 files changed, 346 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/networking/scaling.txt

diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt
new file mode 100644
index 0000000..aa51f0f
--- /dev/null
+++ b/Documentation/networking/scaling.txt
@@ -0,0 +1,346 @@
+Scaling in the Linux Networking Stack
+
+
+Introduction
+============ 
+
+This document describes a set of complementary techniques in the Linux
+networking stack to increase parallelism and improve performance (in
+throughput, latency, CPU utilization, etc.) for multi-processor systems.
+
+The following technologies are described:
+
+  RSS: Receive Side Scaling
+  RPS: Receive Packet Steering
+  RFS: Receive Flow Steering
+  Accelerated Receive Flow Steering
+  XPS: Transmit Packet Steering
+
+
+RSS: Receive Side Scaling
+=========================
+
+Contemporary NICs support multiple receive queues (multi-queue), which
+can be used to distribute packets amongst CPUs for processing. The NIC
+distributes packets by applying a filter to each packet to assign it to
+one of a small number of logical flows.  Packets for each flow are
+steered to a separate receive queue, which in turn can be processed by
+separate CPUs.  This mechanism is generally known as “Receive-side
+Scaling” (RSS).
+
+The filter used in RSS is typically a hash function over the network or
+transport layer headers-- for example, a 4-tuple hash over IP addresses
+and TCP ports of a packet. The most common hardware implementation of
+RSS uses a 128 entry indirection table where each entry stores a queue
+number. The receive queue for a packet is determined by masking out the
+low order seven bits of the computed hash for the packet (usually a
+Toeplitz hash), taking this number as a key into the indirection table
+and reading the corresponding value.
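+
+For illustration, this lookup can be modelled in a few lines of
+userspace C. This is only a sketch: the queue count, the table
+contents and the hash value are made-up examples, and a real NIC holds
+the indirection table and computes the Toeplitz hash in hardware.
+
+  #include <stdint.h>
+  #include <stdio.h>
+
+  static uint8_t indir_table[128];   /* 128-entry indirection table */
+
+  static unsigned int rss_queue_for_hash(uint32_t rss_hash)
+  {
+      /* Take the low-order seven bits of the hash as the table key. */
+      return indir_table[rss_hash & 0x7f];
+  }
+
+  int main(void)
+  {
+      unsigned int i, nr_queues = 8;
+
+      /* Default mapping: distribute the queues evenly in the table. */
+      for (i = 0; i < 128; i++)
+          indir_table[i] = i % nr_queues;
+
+      printf("hash 0x12345678 -> queue %u\n",
+             rss_queue_for_hash(0x12345678));
+      return 0;
+  }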
+
+Some advanced NICs allow steering packets to queues based on
+programmable filters. For example, webserver bound TCP port 80 packets
+can be directed to their own receive queue. Such “n-tuple” filters can
+be configured from ethtool (--config-ntuple).
+
+== RSS Configuration
+
+The driver for a multi-queue capable NIC typically provides a module
+parameter specifying the number of hardware queues to configure. In the
+bnx2x driver, for instance, this parameter is called num_queues. A
+typical RSS configuration would be to have one receive queue for each
+CPU if the device supports enough queues, or otherwise at least one for
+each cache domain at a particular cache level (L1, L2, etc.).
+
+The indirection table of an RSS device, which resolves a queue by masked
+hash, is usually programmed by the driver at initialization.  The
+default mapping is to distribute the queues evenly in the table, but the
+indirection table can be retrieved and modified at runtime using ethtool
+commands (--show-rxfh-indir and --set-rxfh-indir).  Modifying the
+indirection table could be done to to give different queues different
+relative weights. 
+
+== RSS IRQ Configuration
+
+Each receive queue has a separate IRQ associated with it. The NIC
+triggers this to notify a CPU when new packets arrive on the given
+queue. The signaling path for PCIe devices uses message signaled
+interrupts (MSI-X), that can route each interrupt to a particular CPU.
+The active mapping of queues to IRQs can be determined from
+/proc/interrupts. By default, all IRQs are routed to CPU0.  Because a
+non-negligible part of packet processing takes place in receive
+interrupt handling, it is advantageous to spread receive interrupts
+between CPUs. To manually adjust the IRQ affinity of each interrupt see
+Documentation/IRQ-affinity. On some systems, the irqbalance daemon is
+running and will try to dynamically optimize this setting.
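+
+As a sketch of what such a manual adjustment looks like, the program
+below pins one queue's interrupt to CPU 2 by writing a CPU bitmask to
+procfs. The IRQ number (42) is only a placeholder; the real number for
+a given queue should be taken from /proc/interrupts.
+
+  #include <stdio.h>
+
+  int main(void)
+  {
+      /* Placeholder IRQ number; look up the real one in /proc/interrupts. */
+      FILE *f = fopen("/proc/irq/42/smp_affinity", "w");
+
+      if (!f) {
+          perror("smp_affinity");
+          return 1;
+      }
+      /* Hex CPU bitmask: bit 2 set means route this IRQ to CPU 2 only. */
+      fprintf(f, "%x\n", 1u << 2);
+      fclose(f);
+      return 0;
+  }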
+
+
+RPS: Receive Packet Steering
+============================
+
+Receive Packet Steering (RPS) is logically a software implementation of
+RSS.  Being in software, it is necessarily called later in the datapath.
+Whereas RSS selects the queue and hence CPU that will run the hardware
+interrupt handler, RPS selects the CPU to perform protocol processing
+above the interrupt handler.  This is accomplished by placing the packet
+on the desired CPU’s backlog queue and waking up the CPU for processing.
+RPS has some advantages over RSS: 1) it can be used with any NIC, 2)
+software filters can easily be added to handle new protocols, 3) it does
+not increase hardware device interrupt rate (but does use IPIs).
+
+RPS is called during bottom half of the receive interrupt handler, when
+a driver sends a packet up the network stack with netif_rx() or
+netif_receive_skb(). These call the get_rps_cpu() function, which
+selects the queue that should process a packet.
+
+The first step in determining the target CPU for RPS is to calculate a
+flow hash over the packet’s addresses or ports (2-tuple or 4-tuple hash
+depending on the protocol). This serves as a consistent hash of the
+associated flow of the packet. The hash is either provided by hardware
+or will be computed in the stack. Capable hardware can pass the hash in
+the receive descriptor for the packet, this would usually be the same
+hash used for RSS (e.g. computed Toeplitz hash). The hash is saved in
+skb->rx_hash and can be used elsewhere in the stack as a hash of the
+packet’s flow.
+
+Each receive hardware qeueue has associated list of CPUs which can
+process packets received on the queue for RPS.  For each received
+packet, an index into the list is computed from the flow hash modulo the
+size of the list.  The indexed CPU is the target for processing the
+packet, and the packet is queued to the tail of that CPU’s backlog
+queue. At the end of the bottom half routine, inter-processor interrupts
+(IPIs) are sent to any CPUs for which packets have been queued to their
+backlog queue. The IPI wakes backlog processing on the remote CPU, and
+any queued packets are then processed up the networking stack. Note that
+the list of CPUs can be configured separately for each hardware receive
+queue.
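+
+The per-queue CPU selection can be illustrated with the short C sketch
+below. The CPU list and the hash value are example inputs only; the
+kernel keeps this state for each hardware receive queue and computes
+the flow hash as described above.
+
+  #include <stdint.h>
+  #include <stdio.h>
+
+  /* Example list of CPUs configured for one receive queue. */
+  static const int rps_cpu_list[] = { 0, 1, 2, 3 };
+  #define LIST_SIZE (sizeof(rps_cpu_list) / sizeof(rps_cpu_list[0]))
+
+  static int rps_target_cpu(uint32_t flow_hash)
+  {
+      /* Index = flow hash modulo the size of the CPU list. */
+      return rps_cpu_list[flow_hash % LIST_SIZE];
+  }
+
+  int main(void)
+  {
+      printf("flow hash 0xdeadbeef -> CPU %d\n",
+             rps_target_cpu(0xdeadbeefu));
+      return 0;
+  }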
+
+== RPS Configuration
+
+RPS requires a kernel compiled with the CONFIG_RPS flag (on by default
+for smp). Even when compiled in, it is disabled without any
+configuration. The list of CPUs to which RPS may forward traffic can be
+configured for each receive queue using the sysfs file entry:
+
+ /sys/class/net/<dev>/queues/rx-<n>/rps_cpus
+
+This file implements a bitmap of CPUs. RPS is disabled when it is zero
+(the default), in which case packets are processed on the interrupting
+CPU.  IRQ-affinity.txt explains how CPUs are assigned to the bitmap.
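+
+For example, the following sketch enables RPS on receive queue 0 of a
+device by writing a CPU bitmap to this file. The device name (eth0)
+and the CPU set (CPUs 0-3, mask 0xf) are arbitrary example values.
+
+  #include <stdio.h>
+
+  int main(void)
+  {
+      const char *path = "/sys/class/net/eth0/queues/rx-0/rps_cpus";
+      FILE *f = fopen(path, "w");
+
+      if (!f) {
+          perror(path);
+          return 1;
+      }
+      /* Hex bitmap: 0xf allows CPUs 0, 1, 2 and 3 to process packets. */
+      fprintf(f, "f\n");
+      fclose(f);
+      return 0;
+  }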
+
+For a single queue device, a typical RPS configuration would be to set
+the rps_cpus to the CPUs in the same cache domain of the interrupting
+CPU for a queue. If NUMA locality is not an issue, this could also be
+all CPUs in the system. At high interrupt rate, it might wise to exclude
+the interrupting CPU from the map since that already performs much work.
+
+For a multi-queue system, if RSS is configured so that a receive queue
+is mapped to each CPU, then RPS is probably redundant and unnecessary.
+If there are fewer queues than CPUs, then RPS might be beneficial if the
+rps_cpus for each queue are the ones that share the same cache domain as
+the interrupting CPU for the queue.
+
+RFS: Receive Flow Steering
+==========================
+
+While RPS steers packet solely based on hash, and thus generally
+provides good load distribution, it does not take into account
+application locality. This is accomplished by Receive Flow Steering
+(RFS). The goal of RFS is to increase datacache hitrate by steering
+kernel processing of packets to the CPU where the application thread
+consuming the packet is running. RFS relies on the same RPS mechanisms
+to enqueue packets onto the backlog of another CPU and to wake that CPU.
+
+In RFS, packets are not forwarded directly by the value of their hash,
+but the hash is used as index into a flow lookup table. This table maps
+flows to the CPUs where those flows are being processed. The flow hash
+(see RPS section above) is used to calculate the index into this table.
+The CPU recorded in each entry is the one which last processed the flow,
+and if there is not a valid CPU for an entry, then packets mapped to
+that entry are steered using plain RPS.
+
+To avoid out of order packets (ie. when scheduler moves a thread with
+outstanding receive packets on) there are two levels of flow tables used
+by RFS: rps_sock_flow_table and rps_dev_flow_table.
+
+rps_sock_flow_table is a global flow table. Each table value is a CPU index
+and is populated by recvmsg and sendmsg (specifically, inet_recvmsg(),
+inet_sendmsg(), inet_sendpage() and tcp_splice_read()). This table
+contains the *desired* CPUs for flows.
+
+rps_dev_flow_table is specific to each hardware receive queue of each
+device.  Each table value stores a CPU index and a counter. The CPU
+index represents the *current* CPU that is assigned to processing the
+matching flows.
+
+The counter records the length of this CPU's backlog when a packet in
+this flow was last enqueued.  Each backlog queue has a head counter that
+is incremented on dequeue. A tail counter is computed as head counter +
+queue length. In other words, the counter in rps_dev_flow_table[i]
+records the last element in flow i that has been enqueued onto the
+currently designated CPU for flow i (of course, entry i is actually
+selected by hash and multiple flows may hash to the same entry i). 
+
+And now the trick for avoiding out of order packets: when selecting the
+CPU for packet processing (from get_rps_cpu()) the rps_sock_flow table
+and the rps_dev_flow table of the queue that the packet was received on
+are compared.  If the desired CPU for the flow (found in the
+rps_sock_flow table) matches the current CPU (found in the rps_dev_flow
+table), the packet is enqueud onto that CPU’s backlog. If they differ,
+the current cpu is updated to match the desired CPU if one of the
+following is true:
+
+- The current CPU's queue head counter >= the recorded tail counter
+  value in rps_dev_flow[i]
+- The current CPU is unset (equal to NR_CPUS)
+- The current CPU is offline
+
+After this check, the packet is sent to the (possibly updated) current
+CPU.  These rules aim to ensure that a flow only moves to a new CPU when
+there are no packets outstanding on the old CPU, as the outstanding
+packets could arrive later than those about to be processed on the new
+CPU.
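+
+The decision can be summarized with the simplified C sketch below. The
+structure and names are invented for the illustration and are not the
+kernel's: desired_cpu stands for the rps_sock_flow_table entry, ent
+for the rps_dev_flow_table entry, and current_cpu_head for the head
+counter of the current CPU's backlog queue.
+
+  #include <stdbool.h>
+  #include <stdio.h>
+
+  #define NO_CPU (-1)
+
+  struct dev_flow {
+      int cpu;                 /* current CPU for this table entry */
+      unsigned int last_qtail; /* backlog tail recorded at last enqueue */
+  };
+
+  static bool cpu_is_online(int cpu)
+  {
+      return cpu >= 0;         /* placeholder for the real check */
+  }
+
+  static int rfs_select_cpu(int desired_cpu, const struct dev_flow *ent,
+                            unsigned int current_cpu_head)
+  {
+      if (desired_cpu == ent->cpu)
+          return ent->cpu;
+
+      /* Move the flow only when no packets of this flow can still be
+       * queued on the old CPU, or when there is no usable old CPU. */
+      if (ent->cpu == NO_CPU || !cpu_is_online(ent->cpu) ||
+          current_cpu_head >= ent->last_qtail)
+          return desired_cpu;
+
+      return ent->cpu;
+  }
+
+  int main(void)
+  {
+      struct dev_flow ent = { .cpu = 1, .last_qtail = 100 };
+
+      /* CPU 1's head counter (103) has passed the recorded tail (100),
+       * so the flow may safely move to the desired CPU 5. */
+      printf("next CPU: %d\n", rfs_select_cpu(5, &ent, 103));
+      return 0;
+  }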
+
+== RFS Configuration
+
+RFS is only available if the kernel flag CONFIG_RFS is enabled (on by
+default for smp). The functionality is disabled without any
+configuration. The number of entries in the global flow table is set
+through:
+
+ /proc/sys/net/core/rps_sock_flow_entries
+
+The number of entries in the per queue flow table are set through:
+
+ /sys/class/net/<dev>/queues/rx-<n>/rps_flow_cnt
+
+Both of these need to be set before RFS is enabled for a receive queue.
+Values for both of these are rounded up to the nearest power of two. The
+suggested flow count depends on the expected number active connections
+at any given time, which may be significantly less than the number of
+open connections. We have found that a value of 32768 for
+rps_sock_flow_entries works fairly well on a moderately loaded server.
+
+For a single queue device, the rps_flow_cnt value for the single queue
+would normally be configured to the same value as rps_sock_flow_entries.
+For a multi-queue device, the rps_flow_cnt for each queue might be
+configured as rps_sock_flow_entries / N, where N is the number of
+queues. So for instance, if rps_sock_flow_entries is set to 32768 and there
+are 16 configured receive queues, rps_flow_cnt for each queue might be
+configured as 2048.
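+
+The sketch below applies this sizing to a hypothetical device with 16
+receive queues. The device name, queue count and flow counts are
+example values, and error handling is kept minimal for brevity.
+
+  #include <stdio.h>
+
+  int main(void)
+  {
+      const unsigned int sock_flow_entries = 32768;
+      const unsigned int nr_rx_queues = 16;
+      unsigned int per_queue = sock_flow_entries / nr_rx_queues; /* 2048 */
+      char path[128];
+      unsigned int q;
+      FILE *f;
+
+      f = fopen("/proc/sys/net/core/rps_sock_flow_entries", "w");
+      if (f) {
+          fprintf(f, "%u\n", sock_flow_entries);
+          fclose(f);
+      }
+
+      for (q = 0; q < nr_rx_queues; q++) {
+          snprintf(path, sizeof(path),
+                   "/sys/class/net/eth0/queues/rx-%u/rps_flow_cnt", q);
+          f = fopen(path, "w");
+          if (!f)
+              continue;
+          fprintf(f, "%u\n", per_queue);
+          fclose(f);
+      }
+      return 0;
+  }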
+
+
+Accelerated RFS
+===============
+
+Accelerated RFS is to RFS what RSS is to RPS: a hardware-accelerated
+load balancing mechanism that uses soft state to steer flows based on
+where the thread consuming the packets of each flow is running.
+Accelerated RFS should perform better than RFS since packets are sent
+directly to a CPU local to the thread consuming the data. The target CPU
+will either be the same CPU where the application runs, or at least a
+CPU which is local to the application thread’s CPU in the cache
+hierarchy. 
+
+To enable accelerated RFS, the networking stack calls the
+ndo_rx_flow_steer driver function to communicate the desired hardware
+queue for packets matching a particular flow. The network stack
+automatically calls this function every time a flow entry in
+rps_dev_flow_table is updated. The driver in turn uses a device specific
+method to program the NIC to steer the packets.
+
+The hardware queue for a flow is derived from the CPU recorded in
+rps_dev_flow_table. The stack consults a CPU to hardware queue map which
+is maintained by the NIC driver. This is an autogenerated reverse map of
+the IRQ affinity table shown by /proc/interrupts. Drivers can use
+functions in the cpu_rmap (“CPU affinity reverse map”) kernel library
+to populate the map. For each CPU, the corresponding queue in the map is
+set to be one whose processing CPU is closest in cache locality.
+
+== Accelerated RFS Configuration
+
+Accelerated RFS is only available if the kernel is compiled with
+CONFIG_RFS_ACCEL and support is provided by the NIC device and driver.
+It also requires that ntuple filtering is enabled via ethtool. The map
+of CPU to queues is automatically deduced from the IRQ affinities
+configured for each receive queue by the driver, so no additional
+configuration should be necessary.
+
+XPS: Transmit Packet Steering
+=============================
+
+Transmit Packet Steering is a mechanism for intelligently selecting
+which transmit queue to use when transmitting a packet on a multi-queue
+device. To accomplish this, a mapping from CPU to hardware queue(s) is
+recorded. The goal of this mapping is usually to assign queues
+exclusively to a subset of CPUs, where the transmit completions for
+these queues are processed on a CPU within this set. This choice
+provides two benefits. First, contention on the device queue lock is
+significantly reduced since fewer CPUs contend for the same queue
+(contention can be eliminated completely if each CPU has its own
+transmit queue).  Secondly, cache miss rate on transmit completion is
+reduced, in particular for data cache lines that hold the sk_buff
+structures.
+
+XPS is configured per transmit queue by setting a bitmap of CPUs that
+may use that queue to transmit. The reverse mapping, from CPUs to
+transmit queues, is computed and maintained for each network device.
+When transmitting the first packet in a flow, the function
+get_xps_queue() is called to select a queue.  This function uses the ID
+of the running CPU as a key into the CPU to queue lookup table. If the
+ID matches a single queue, that is used for transmission.  If multiple
+queues match, one is selected by using the flow hash to compute an index
+into the set.
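+
+The selection logic can be modelled roughly as follows. The mapping
+used here (CPUs 0-1 may use queue 0, CPUs 2-3 may use queues 1 and 2)
+is an invented example configuration, not state derived from a real
+device.
+
+  #include <stdint.h>
+  #include <stdio.h>
+
+  struct queue_set {
+      const int *queues;
+      unsigned int count;
+  };
+
+  static const int set_a[] = { 0 };
+  static const int set_b[] = { 1, 2 };
+  static const struct queue_set cpu_to_queues[] = {
+      { set_a, 1 }, { set_a, 1 }, { set_b, 2 }, { set_b, 2 },
+  };
+
+  static int xps_select_queue(unsigned int cpu, uint32_t flow_hash)
+  {
+      const struct queue_set *set = &cpu_to_queues[cpu];
+
+      if (set->count == 1)      /* single match: use that queue */
+          return set->queues[0];
+      /* Multiple queues match: break the tie with the flow hash. */
+      return set->queues[flow_hash % set->count];
+  }
+
+  int main(void)
+  {
+      printf("CPU 3, hash 0x9 -> queue %d\n", xps_select_queue(3, 0x9));
+      return 0;
+  }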
+
+The queue chosen for transmitting a particular flow is saved in the
+corresponding socket structure for the flow (e.g. a TCP connection).
+This transmit queue is used for subsequent packets sent on the flow to
+prevent out of order (ooo) packets. The choice also amortizes the cost
+of calling get_xps_queues() over all packets in the connection. To avoid
+ooo packets, the queue for a flow can subsequently only be changed if
+skb->ooo_okay is set for a packet in the flow. This flag indicates that
+there are no outstanding packets in the flow, so the transmit queue can
+change without the risk of generating out of order packets. The
+transport layer is responsible for setting ooo_okay appropriately. TCP,
+for instance, sets the flag when all data for a connection has been
+acknowledged.
+
+
+== XPS Configuration
+
+XPS is only available if the kernel flag CONFIG_XPS is enabled (on by
+default for smp). The functionality is disabled without any
+configuration, in which case the transmit queue for a packet is
+selected by using a flow hash as an index into the set of all transmit
+queues for the device. To enable XPS, the bitmap of CPUs that may use a
+transmit queue is configured using the sysfs file entry:
+
+/sys/class/net/<dev>/queues/tx-<n>/xps_cpus
+
+XPS is disabled when it is zero (the default). IRQ-affinity.txt explains
+how CPUs are assigned to the bitmap. 
+
+For a network device with a single transmission queue, XPS configuration
+has no effect, since there is no choice in this case. In a multi-queue
+system, XPS is usually configured so that each CPU maps onto one queue.
+If there are as many queues as there are CPUs in the system, then each
+queue can also map onto one CPU, resulting in exclusive pairings that
+experience no contention. If there are fewer queues than CPUs, then the
+best CPUs to share a given queue are probably those that share the cache
+with the CPU that processes transmit completions for that queue
+(transmit interrupts).
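+
+As an example of the one-queue-per-CPU case, the sketch below writes a
+single-CPU bitmap into each queue's xps_cpus file. The device name and
+the queue/CPU count (4) are placeholders for illustration.
+
+  #include <stdio.h>
+
+  int main(void)
+  {
+      unsigned int nr_queues = 4, q;
+      char path[128];
+      FILE *f;
+
+      for (q = 0; q < nr_queues; q++) {
+          snprintf(path, sizeof(path),
+                   "/sys/class/net/eth0/queues/tx-%u/xps_cpus", q);
+          f = fopen(path, "w");
+          if (!f)
+              continue;
+          /* Hex bitmap with only CPU q set: queue q is paired with CPU q. */
+          fprintf(f, "%x\n", 1u << q);
+          fclose(f);
+      }
+      return 0;
+  }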
+
+
+Further Information
+===================
+RPS and RFS were introduced in kernel 2.6.35. XPS was incorporated into
+2.6.38. Original patches were submitted by Tom Herbert
+(therbert@google.com)
+
+
+Accelerated RFS was introduced in 2.6.35. Original patches were
+submitted by Ben Hutchings (bhutchings@solarflare.com)
+
+Authors:
+Tom Herbert (therbert@google.com)
+Willem de Bruijn (willemb@google.com)
+
-- 
1.7.3.1


* Re: [PATCH] net: add Documentation/networking/scaling.txt
  2011-08-01  6:56 [PATCH] net: add Documentation/networking/scaling.txt Tom Herbert
@ 2011-08-01 18:41 ` Randy Dunlap
  2011-08-01 18:49 ` Rick Jones
  1 sibling, 0 replies; 6+ messages in thread
From: Randy Dunlap @ 2011-08-01 18:41 UTC (permalink / raw)
  To: Tom Herbert; +Cc: linux-doc, davem, netdev, willemb

On Sun, 31 Jul 2011 23:56:26 -0700 (PDT) Tom Herbert wrote:

> Describes RSS, RPS, RFS, accelerated RFS, and XPS.
> 
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---
>  Documentation/networking/scaling.txt |  346 ++++++++++++++++++++++++++++++++++
>  1 files changed, 346 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/networking/scaling.txt
> 
> diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt
> new file mode 100644
> index 0000000..aa51f0f
> --- /dev/null
> +++ b/Documentation/networking/scaling.txt
> @@ -0,0 +1,346 @@
> +Scaling in the Linux Networking Stack
> +
> +
> +Introduction
> +============ 
> +
> +This document describes a set of complementary techniques in the Linux
> +networking stack to increase parallelism and improve performance (in
> +throughput, latency, CPU utilization, etc.) for multi-processor systems.
> +
> +The following technologies are described:
> +
> +  RSS: Receive Side Scaling
> +  RPS: Receive Packet Steering
> +  RFS: Receive Flow Steering
> +  Accelerated Receive Flow Steering
> +  XPS: Transmit Packet Steering
> +
> +
> +RSS: Receive Side Scaling
> +=========================
> +
> +Contemporary NICs support multiple receive queues (multi-queue), which
> +can be used to distribute packets amongst CPUs for processing. The NIC
> +distributes packets by applying a filter to each packet to assign it to
> +one of a small number of logical flows.  Packets for each flow are
> +steered to a separate receive queue, which in turn can be processed by
> +separate CPUs.  This mechanism is generally known as “Receive-side
> +Scaling” (RSS).
> +
> +The filter used in RSS is typically a hash function over the network or
> +transport layer headers-- for example, a 4-tuple hash over IP addresses
> +and TCP ports of a packet. The most common hardware implementation of
> +RSS uses a 128 entry indirection table where each entry stores a queue

              128-entry

> +number. The receive queue for a packet is determined by masking out the
> +low order seven bits of the computed hash for the packet (usually a
> +Toeplitz hash), taking this number as a key into the indirection table
> +and reading the corresponding value.
> +
> +Some advanced NICs allow steering packets to queues based on
> +programmable filters. For example, webserver bound TCP port 80 packets
> +can be directed to their own receive queue. Such “n-tuple” filters can
> +be configured from ethtool (--config-ntuple).
> +
> +== RSS Configuration
> +
> +The driver for a multi-queue capable NIC typically provides a module
> +parameter specifying the number of hardware queues to configure. In the
> +bnx2x driver, for instance, this parameter is called num_queues. A
> +typical RSS configuration would be to have one receive queue for each
> +CPU if the device supports enough queues, or otherwise at least one for
> +each cache domain at a particular cache level (L1, L2, etc.).
> +
> +The indirection table of an RSS device, which resolves a queue by masked
> +hash, is usually programmed by the driver at initialization.  The
> +default mapping is to distribute the queues evenly in the table, but the
> +indirection table can be retrieved and modified at runtime using ethtool
> +commands (--show-rxfh-indir and --set-rxfh-indir).  Modifying the
> +indirection table could be done to to give different queues different

                                   ^^drop one "to"

> +relative weights. 

Drop trailing whitespace above and anywhere else that it's found. (5 places)

I thought (long ago :) that multiple RX queues were for prioritizing traffic,
but there is nothing here about using multi-queues for priorities.
Is that (no longer) done?


> +
> +== RSS IRQ Configuration
> +
> +Each receive queue has a separate IRQ associated with it. The NIC
> +triggers this to notify a CPU when new packets arrive on the given
> +queue. The signaling path for PCIe devices uses message signaled
> +interrupts (MSI-X), that can route each interrupt to a particular CPU.
> +The active mapping of queues to IRQs can be determined from
> +/proc/interrupts. By default, all IRQs are routed to CPU0.  Because a
> +non-negligible part of packet processing takes place in receive
> +interrupt handling, it is advantageous to spread receive interrupts
> +between CPUs. To manually adjust the IRQ affinity of each interrupt see
> +Documentation/IRQ-affinity. On some systems, the irqbalance daemon is
> +running and will try to dynamically optimize this setting.

or (avoid a split infinitive):  will try to optimize this setting dynamically.

> +
> +
> +RPS: Receive Packet Steering
> +============================
> +
> +Receive Packet Steering (RPS) is logically a software implementation of
> +RSS.  Being in software, it is necessarily called later in the datapath.
> +Whereas RSS selects the queue and hence CPU that will run the hardware
> +interrupt handler, RPS selects the CPU to perform protocol processing
> +above the interrupt handler.  This is accomplished by placing the packet
> +on the desired CPU’s backlog queue and waking up the CPU for processing.
> +RPS has some advantages over RSS: 1) it can be used with any NIC, 2)
> +software filters can easily be added to handle new protocols, 3) it does
> +not increase hardware device interrupt rate (but does use IPIs).
> +
> +RPS is called during bottom half of the receive interrupt handler, when
> +a driver sends a packet up the network stack with netif_rx() or
> +netif_receive_skb(). These call the get_rps_cpu() function, which
> +selects the queue that should process a packet.
> +
> +The first step in determining the target CPU for RPS is to calculate a
> +flow hash over the packet’s addresses or ports (2-tuple or 4-tuple hash
> +depending on the protocol). This serves as a consistent hash of the
> +associated flow of the packet. The hash is either provided by hardware
> +or will be computed in the stack. Capable hardware can pass the hash in
> +the receive descriptor for the packet, this would usually be the same

                                  packet;

> +hash used for RSS (e.g. computed Toeplitz hash). The hash is saved in
> +skb->rx_hash and can be used elsewhere in the stack as a hash of the
> +packet’s flow.
> +
> +Each receive hardware qeueue has associated list of CPUs which can

                                has an associated list (?)

> +process packets received on the queue for RPS.  For each received
> +packet, an index into the list is computed from the flow hash modulo the
> +size of the list.  The indexed CPU is the target for processing the
> +packet, and the packet is queued to the tail of that CPU’s backlog
> +queue. At the end of the bottom half routine, inter-processor interrupts
> +(IPIs) are sent to any CPUs for which packets have been queued to their
> +backlog queue. The IPI wakes backlog processing on the remote CPU, and
> +any queued packets are then processed up the networking stack. Note that
> +the list of CPUs can be configured separately for each hardware receive
> +queue.
> +
> +== RPS Configuration
> +
> +RPS requires a kernel compiled with the CONFIG_RPS flag (on by default

s/flag/kconfig symbol/

> +for smp). Even when compiled in, it is disabled without any

   for SMP).

> +configuration. The list of CPUs to which RPS may forward traffic can be
> +configured for each receive queue using the sysfs file entry:
> +
> + /sys/class/net/<dev>/queues/rx-<n>/rps_cpus
> +
> +This file implements a bitmap of CPUs. RPS is disabled when it is zero
> +(the default), in which case packets are processed on the interrupting
> +CPU.  IRQ-affinity.txt explains how CPUs are assigned to the bitmap.
> +
> +For a single queue device, a typical RPS configuration would be to set
> +the rps_cpus to the CPUs in the same cache domain of the interrupting
> +CPU for a queue. If NUMA locality is not an issue, this could also be
> +all CPUs in the system. At high interrupt rate, it might wise to exclude

                                                   it might be wise

> +the interrupting CPU from the map since that already performs much work.
> +
> +For a multi-queue system, if RSS is configured so that a receive queue
> +is mapped to each CPU, then RPS is probably redundant and unnecessary.
> +If there are fewer queues than CPUs, then RPS might be beneficial if the
> +rps_cpus for each queue are the ones that share the same cache domain as
> +the interrupting CPU for the queue.
> +
> +RFS: Receive Flow Steering
> +==========================
> +
> +While RPS steers packet solely based on hash, and thus generally

             steers packets

> +provides good load distribution, it does not take into account
> +application locality. This is accomplished by Receive Flow Steering
> +(RFS). The goal of RFS is to increase datacache hitrate by steering
> +kernel processing of packets to the CPU where the application thread
> +consuming the packet is running. RFS relies on the same RPS mechanisms
> +to enqueue packets onto the backlog of another CPU and to wake that CPU.
> +
> +In RFS, packets are not forwarded directly by the value of their hash,
> +but the hash is used as index into a flow lookup table. This table maps
> +flows to the CPUs where those flows are being processed. The flow hash
> +(see RPS section above) is used to calculate the index into this table.
> +The CPU recorded in each entry is the one which last processed the flow,
> +and if there is not a valid CPU for an entry, then packets mapped to
> +that entry are steered using plain RPS.
> +
> +To avoid out of order packets (ie. when scheduler moves a thread with

                                 (i.e., when the scheduler moves a thread that

> +outstanding receive packets on) there are two levels of flow tables used

has outstanding receive packets),

> +by RFS: rps_sock_flow_table and rps_dev_flow_table.
> +
> +rps_sock_flow_table is a global flow table. Each table value is a CPU index
> +and is populated by recvmsg and sendmsg (specifically, inet_recvmsg(),
> +inet_sendmsg(), inet_sendpage() and tcp_splice_read()). This table
> +contains the *desired* CPUs for flows.
> +
> +rps_dev_flow_table is specific to each hardware receive queue of each
> +device.  Each table value stores a CPU index and a counter. The CPU
> +index represents the *current* CPU that is assigned to processing the
> +matching flows.
> +
> +The counter records the length of this CPU's backlog when a packet in
> +this flow was last enqueued.  Each backlog queue has a head counter that
> +is incremented on dequeue. A tail counter is computed as head counter +
> +queue length. In other words, the counter in rps_dev_flow_table[i]
> +records the last element in flow i that has been enqueued onto the
> +currently designated CPU for flow i (of course, entry i is actually
> +selected by hash and multiple flows may hash to the same entry i). 
> +
> +And now the trick for avoiding out of order packets: when selecting the
> +CPU for packet processing (from get_rps_cpu()) the rps_sock_flow table
> +and the rps_dev_flow table of the queue that the packet was received on
> +are compared.  If the desired CPU for the flow (found in the
> +rps_sock_flow table) matches the current CPU (found in the rps_dev_flow
> +table), the packet is enqueud onto that CPU’s backlog. If they differ,

                         enqueued

> +the current cpu is updated to match the desired CPU if one of the

s/cpu/CPU/ (globally as needed)

> +following is true:
> +
> +- The current CPU's queue head counter >= the recorded tail counter
> +  value in rps_dev_flow[i]
> +- The current CPU is unset (equal to NR_CPUS)
> +- The current CPU is offline
> +
> +After this check, the packet is sent to the (possibly updated) current
> +CPU.  These rules aim to ensure that a flow only moves to a new CPU when
> +there are no packets outstanding on the old CPU, as the outstanding
> +packets could arrive later than those about to be processed on the new
> +CPU.
> +
> +== RFS Configuration
> +
> +RFS is only available if the kernel flag CONFIG_RFS is enabled (on by

s/flag/kconfig symbol/

> +default for smp). The functionality is disabled without any

s/smp/SMP/

> +configuration. The number of entries in the global flow table is set
> +through:
> +
> + /proc/sys/net/core/rps_sock_flow_entries
> +
> +The number of entries in the per queue flow table are set through:

                                per-queue

> +
> + /sys/class/net/<dev>/queues/rx-<n>/rps_flow_cnt
> +
> +Both of these need to be set before RFS is enabled for a receive queue.
> +Values for both of these are rounded up to the nearest power of two. The
> +suggested flow count depends on the expected number active connections

                                                number of

> +at any given time, which may be significantly less than the number of
> +open connections. We have found that a value of 32768 for
> +rps_sock_flow_entries works fairly well on a moderately loaded server.
> +
> +For a single queue device, the rps_flow_cnt value for the single queue
> +would normally be configured to the same value as rps_sock_flow_entries.
> +For a multi-queue device, the rps_flow_cnt for each queue might be
> +configured as rps_sock_flow_entries / N, where N is the number of
> +queues. So for instance, if rps_sock_flow_entries is set to 32768 and there
> +are 16 configured receive queues, rps_flow_cnt for each queue might be
> +configured as 2048.
> +
> +
> +Accelerated RFS
> +===============
> +
> +Accelerated RFS is to RFS what RSS is to RPS: a hardware-accelerated
> +load balancing mechanism that uses soft state to steer flows based on
> +where the thread consuming the packets of each flow is running.
> +Accelerated RFS should perform better than RFS since packets are sent
> +directly to a CPU local to the thread consuming the data. The target CPU
> +will either be the same CPU where the application runs, or at least a
> +CPU which is local to the application thread’s CPU in the cache
> +hierarchy. 
> +
> +To enable accelerated RFS, the networking stack calls the
> +ndo_rx_flow_steer driver function to communicate the desired hardware
> +queue for packets matching a particular flow. The network stack
> +automatically calls this function every time a flow entry in
> +rps_dev_flow_table is updated. The driver in turn uses a device specific

                                                            device-specific

> +method to program the NIC to steer the packets.
> +
> +The hardware queue for a flow is derived from the CPU recorded in
> +rps_dev_flow_table. The stack consults a CPU to hardware queue map which

                                            CPU-to-hardware-queue map

> +is maintained by the NIC driver. This is an autogenerated reverse map of
> +the IRQ affinity table shown by /proc/interrupts. Drivers can use
> +functions in the cpu_rmap (“CPU affinity reverse map”) kernel library
> +to populate the map. For each CPU, the corresponding queue in the map is
> +set to be one whose processing CPU is closest in cache locality.
> +
> +== Accelerated RFS Configuration
> +
> +Accelerated RFS is only available if the kernel is compiled with
> +CONFIG_RFS_ACCEL and support is provided by the NIC device and driver.
> +It also requires that ntuple filtering is enabled via ethtool. The map
> +of CPU to queues is automatically deduced from the IRQ affinities
> +configured for each receive queue by the driver, so no additional
> +configuration should be necessary.
> +
> +XPS: Transmit Packet Steering
> +=============================
> +
> +Transmit Packet Steering is a mechanism for intelligently selecting
> +which transmit queue to use when transmitting a packet on a multi-queue
> +device. To accomplish this, a mapping from CPU to hardware queue(s) is
> +recorded. The goal of this mapping is usually to assign queues
> +exclusively to a subset of CPUs, where the transmit completions for
> +these queues are processed on a CPU within this set. This choice
> +provides two benefits. First, contention on the device queue lock is
> +significantly reduced since fewer CPUs contend for the same queue
> +(contention can be eliminated completely if each CPU has its own
> +transmit queue).  Secondly, cache miss rate on transmit completion is
> +reduced, in particular for data cache lines that hold the sk_buff
> +structures.
> +
> +XPS is configured per transmit queue by setting a bitmap of CPUs that
> +may use that queue to transmit. The reverse mapping, from CPUs to
> +transmit queues, is computed and maintained for each network device.
> +When transmitting the first packet in a flow, the function
> +get_xps_queue() is called to select a queue.  This function uses the ID
> +of the running CPU as a key into the CPU to queue lookup table. If the

                                        CPU-to-queue

> +ID matches a single queue, that is used for transmission.  If multiple
> +queues match, one is selected by using the flow hash to compute an index
> +into the set.
> +
> +The queue chosen for transmitting a particular flow is saved in the
> +corresponding socket structure for the flow (e.g. a TCP connection).
> +This transmit queue is used for subsequent packets sent on the flow to
> +prevent out of order (ooo) packets. The choice also amortizes the cost
> +of calling get_xps_queues() over all packets in the connection. To avoid
> +ooo packets, the queue for a flow can subsequently only be changed if
> +skb->ooo_okay is set for a packet in the flow. This flag indicates that
> +there are no outstanding packets in the flow, so the transmit queue can
> +change without the risk of generating out of order packets. The
> +transport layer is responsible for setting ooo_okay appropriately. TCP,
> +for instance, sets the flag when all data for a connection has been
> +acknowledged.
> +
> +
> +== XPS Configuration
> +
> +XPS is only available if the kernel flag CONFIG_XPS is enabled (on by

s/flag/kconfig symbol/

> +default for smp). The functionality is disabled without any

s/smp/SMP/

> +configuration, in which case the transmit queue for a packet is
> +selected by using a flow hash as an index into the set of all transmit
> +queues for the device. To enable XPS, the bitmap of CPUs that may use a
> +transmit queue is configured using the sysfs file entry:
> +
> +/sys/class/net/<dev>/queues/tx-<n>/xps_cpus
> +
> +XPS is disabled when it is zero (the default). IRQ-affinity.txt explains
> +how CPUs are assigned to the bitmap. 
> +
> +For a network device with a single transmission queue, XPS configuration
> +has no effect, since there is no choice in this case. In a multi-queue
> +system, XPS is usually configured so that each CPU maps onto one queue.
> +If there are as many queues as there are CPUs in the system, then each
> +queue can also map onto one CPU, resulting in exclusive pairings that
> +experience no contention. If there are fewer queues than CPUs, then the
> +best CPUs to share a given queue are probably those that share the cache
> +with the CPU that processes transmit completions for that queue
> +(transmit interrupts).
> +
> +
> +Further Information
> +===================
> +RPS and RFS were introduced in kernel 2.6.35. XPS was incorporated into
> +2.6.38. Original patches were submitted by Tom Herbert
> +(therbert@google.com)
> +
> +
> +Accelerated RFS was introduced in 2.6.35. Original patches were
> +submitted by Ben Hutchings (bhutchings@solarflare.com)
> +
> +Authors:
> +Tom Herbert (therbert@google.com)
> +Willem de Bruijn (willemb@google.com)
> +
> -- 


Very nice writeup.  Thanks.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


* Re: [PATCH] net: add Documentation/networking/scaling.txt
  2011-08-01  6:56 [PATCH] net: add Documentation/networking/scaling.txt Tom Herbert
  2011-08-01 18:41 ` Randy Dunlap
@ 2011-08-01 18:49 ` Rick Jones
  2011-08-16 19:53   ` Ben Hutchings
  1 sibling, 1 reply; 6+ messages in thread
From: Rick Jones @ 2011-08-01 18:49 UTC (permalink / raw)
  To: Tom Herbert; +Cc: rdunlap, linux-doc, davem, netdev, willemb

On 07/31/2011 11:56 PM, Tom Herbert wrote:
> Describes RSS, RPS, RFS, accelerated RFS, and XPS.
>
> Signed-off-by: Tom Herbert<therbert@google.com>
> ---
>   Documentation/networking/scaling.txt |  346 ++++++++++++++++++++++++++++++++++
>   1 files changed, 346 insertions(+), 0 deletions(-)
>   create mode 100644 Documentation/networking/scaling.txt
>
> diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt
> new file mode 100644
> index 0000000..aa51f0f
> --- /dev/null
> +++ b/Documentation/networking/scaling.txt
> @@ -0,0 +1,346 @@
> +Scaling in the Linux Networking Stack
> +
> +
> +Introduction
> +============
> +
> +This document describes a set of complementary techniques in the Linux
> +networking stack to increase parallelism and improve performance (in
> +throughput, latency, CPU utilization, etc.) for multi-processor systems.

Why not just leave-out the parenthetical lest some picky pedant find a 
specific example where either of those three are not improved?

> +
> +The following technologies are described:
> +
> +  RSS: Receive Side Scaling
> +  RPS: Receive Packet Steering
> +  RFS: Receive Flow Steering
> +  Accelerated Receive Flow Steering
> +  XPS: Transmit Packet Steering
> +
> +
> +RSS: Receive Side Scaling
> +=========================
> +
> +Contemporary NICs support multiple receive queues (multi-queue), which
> +can be used to distribute packets amongst CPUs for processing. The NIC
> +distributes packets by applying a filter to each packet to assign it to
> +one of a small number of logical flows.  Packets for each flow are
> +steered to a separate receive queue, which in turn can be processed by
> +separate CPUs.  This mechanism is generally known as “Receive-side
> +Scaling” (RSS).
> +
> +The filter used in RSS is typically a hash function over the network or
> +transport layer headers-- for example, a 4-tuple hash over IP addresses

Network *and* transport layer headers?  And/or?


> +== RSS IRQ Configuration
> +
> +Each receive queue has a separate IRQ associated with it. The NIC
> +triggers this to notify a CPU when new packets arrive on the given
> +queue. The signaling path for PCIe devices uses message signaled
> +interrupts (MSI-X), that can route each interrupt to a particular CPU.
> +The active mapping of queues to IRQs can be determined from
> +/proc/interrupts. By default, all IRQs are routed to CPU0.  Because a

Really?

> +non-negligible part of packet processing takes place in receive
> +interrupt handling, it is advantageous to spread receive interrupts
> +between CPUs. To manually adjust the IRQ affinity of each interrupt see
> +Documentation/IRQ-affinity. On some systems, the irqbalance daemon is
> +running and will try to dynamically optimize this setting.

I would probably make it explicit that the irqbalance daemon will undo 
one's manual changes:

"Some systems will be running an irqbalance daemon which will be trying 
to dynamically optimize IRQ assignments and will undo manual adjustments."

Whether one needs to go so far as to explicitly suggest that the 
irqbalance daemon should be disabled in such cases I'm not sure.


> +RPS: Receive Packet Steering
> +============================
> +
> +Receive Packet Steering (RPS) is logically a software implementation of
> ...
> +
> +Each receive hardware qeueue has associated list of CPUs which can

"queue has an associated" (spelling and grammar nits)

> +process packets received on the queue for RPS.  For each received
> +packet, an index into the list is computed from the flow hash modulo the
> +size of the list.  The indexed CPU is the target for processing the
> +packet, and the packet is queued to the tail of that CPU’s backlog
> +queue. At the end of the bottom half routine, inter-processor interrupts
> +(IPIs) are sent to any CPUs for which packets have been queued to their
> +backlog queue. The IPI wakes backlog processing on the remote CPU, and
> +any queued packets are then processed up the networking stack. Note that
> +the list of CPUs can be configured separately for each hardware receive
> +queue.
> +
> +== RPS Configuration
> +
> +RPS requires a kernel compiled with the CONFIG_RPS flag (on by default
> +for smp). Even when compiled in, it is disabled without any
> +configuration. The list of CPUs to which RPS may forward traffic can be
> +configured for each receive queue using the sysfs file entry:
> +
> + /sys/class/net/<dev>/queues/rx-<n>/rps_cpus
> +
> +This file implements a bitmap of CPUs. RPS is disabled when it is zero
> +(the default), in which case packets are processed on the interrupting
> +CPU.  IRQ-affinity.txt explains how CPUs are assigned to the bitmap.

Earlier in the writeup (snipped) it is presented as 
"Documentation/IRQ-affinity" and here as IRQ-affinity.txt, should that 
be "Documentation/IRQ-affinity.txt" in both cases?

> +For a single queue device, a typical RPS configuration would be to set
> +the rps_cpus to the CPUs in the same cache domain of the interrupting
> +CPU for a queue. If NUMA locality is not an issue, this could also be
> +all CPUs in the system. At high interrupt rate, it might wise to exclude
> +the interrupting CPU from the map since that already performs much work.
> +
> +For a multi-queue system, if RSS is configured so that a receive queue

Multiple hardware queue to help keep the "queues" separate in the mind of 
the reader?

> +is mapped to each CPU, then RPS is probably redundant and unnecessary.
> +If there are fewer queues than CPUs, then RPS might be beneficial if the

same.

> +rps_cpus for each queue are the ones that share the same cache domain as
> +the interrupting CPU for the queue.
> +
> +RFS: Receive Flow Steering
> +==========================
> +
> +While RPS steers packet solely based on hash, and thus generally
> +provides good load distribution, it does not take into account
> +application locality. This is accomplished by Receive Flow Steering

Should it also mention how an application thread of execution might be 
processing requests on multiple connections, which themselves might not 
normally hash to the same place?


> +== RFS Configuration
> +
> +RFS is only available if the kernel flag CONFIG_RFS is enabled (on by
> +default for smp). The functionality is disabled without any
> +configuration.

Perhaps just wordsmithing, but "This functionality remains disabled 
until explicitly configured." seems clearer.

> +== Accelerated RFS Configuration
> +
> +Accelerated RFS is only available if the kernel is compiled with
> +CONFIG_RFS_ACCEL and support is provided by the NIC device and driver.
> +It also requires that ntuple filtering is enabled via ethtool.

Requires that ntuple filtering be enabled?

> +XPS: Transmit Packet Steering
> +=============================
> +
> +Transmit Packet Steering is a mechanism for intelligently selecting
> +which transmit queue to use when transmitting a packet on a multi-queue
> +device.

Minor nit.  Up to this point a multi-queue device was only described as 
one with multiple receive queues.


> +Further Information
> +===================
> +RPS and RFS were introduced in kernel 2.6.35. XPS was incorporated into
> +2.6.38. Original patches were submitted by Tom Herbert
> +(therbert@google.com)
> +
> +
> +Accelerated RFS was introduced in 2.6.35. Original patches were
> +submitted by Ben Hutchings (bhutchings@solarflare.com)
> +
> +Authors:
> +Tom Herbert (therbert@google.com)
> +Willem de Bruijn (willemb@google.com)
> +

While there are tidbits and indications in the descriptions of each 
mechanism, a section with explicit description of when one would use the 
different mechanisms would be goodness.

rick jones


* Re: [PATCH] net: add Documentation/networking/scaling.txt
  2011-08-01 18:49 ` Rick Jones
@ 2011-08-16 19:53   ` Ben Hutchings
  2011-08-19 17:38     ` Rick Jones
  2011-08-19 19:50     ` Will de Bruijn
  0 siblings, 2 replies; 6+ messages in thread
From: Ben Hutchings @ 2011-08-16 19:53 UTC (permalink / raw)
  To: Rick Jones; +Cc: Tom Herbert, rdunlap, linux-doc, davem, netdev, willemb

On Mon, 2011-08-01 at 11:49 -0700, Rick Jones wrote:
> On 07/31/2011 11:56 PM, Tom Herbert wrote:
> > Describes RSS, RPS, RFS, accelerated RFS, and XPS.
> >
> > Signed-off-by: Tom Herbert<therbert@google.com>
> > ---
> >   Documentation/networking/scaling.txt |  346 ++++++++++++++++++++++++++++++++++
> >   1 files changed, 346 insertions(+), 0 deletions(-)
> >   create mode 100644 Documentation/networking/scaling.txt
> >
> > diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt
> > new file mode 100644
> > index 0000000..aa51f0f
> > --- /dev/null
> > +++ b/Documentation/networking/scaling.txt
> > @@ -0,0 +1,346 @@
> > +Scaling in the Linux Networking Stack
> > +
> > +
> > +Introduction
> > +============
> > +
> > +This document describes a set of complementary techniques in the Linux
> > +networking stack to increase parallelism and improve performance (in
> > +throughput, latency, CPU utilization, etc.) for multi-processor systems.
> 
> Why not just leave-out the parenthetical lest some picky pedant find a 
> specific example where either of those three are not improved?

As I'm sure you're aware, there is often a trade-off between throughput
and latency.  It might be useful to provide some guidelines for
optimising each of the above.

[...]
> > +== RSS IRQ Configuration
> > +
> > +Each receive queue has a separate IRQ associated with it. The NIC
> > +triggers this to notify a CPU when new packets arrive on the given
> > +queue. The signaling path for PCIe devices uses message signaled
> > +interrupts (MSI-X), that can route each interrupt to a particular CPU.
> > +The active mapping of queues to IRQs can be determined from
> > +/proc/interrupts. By default, all IRQs are routed to CPU0.  Because a
> 
> Really?

The default affinity for most IRQs is all-CPUs.  At least on x86, that
really means CPU 0 only, so far as I can see.

[...]
> > +== Accelerated RFS Configuration
> > +
> > +Accelerated RFS is only available if the kernel is compiled with
> > +CONFIG_RFS_ACCEL and support is provided by the NIC device and driver.
> > +It also requires that ntuple filtering is enabled via ethtool.
> 
> Requires that ntuple filtering be enabled?
[...]

As a matter of fact, n-tuple filtering is enabled by default where
available.  So it might actually make more sense to say that RFS
acceleration can be *disabled* by disabling n-tuple filtering using
ethtool.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* Re: [PATCH] net: add Documentation/networking/scaling.txt
  2011-08-16 19:53   ` Ben Hutchings
@ 2011-08-19 17:38     ` Rick Jones
  2011-08-19 19:50     ` Will de Bruijn
  1 sibling, 0 replies; 6+ messages in thread
From: Rick Jones @ 2011-08-19 17:38 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Tom Herbert, rdunlap, linux-doc, davem, netdev, willemb

On 08/16/2011 12:53 PM, Ben Hutchings wrote:
> On Mon, 2011-08-01 at 11:49 -0700, Rick Jones wrote:
>> On 07/31/2011 11:56 PM, Tom Herbert wrote:
>>> Describes RSS, RPS, RFS, accelerated RFS, and XPS.
>>>
>>> Signed-off-by: Tom Herbert<therbert@google.com>
>>> ---
>>>    Documentation/networking/scaling.txt |  346 ++++++++++++++++++++++++++++++++++
>>>    1 files changed, 346 insertions(+), 0 deletions(-)
>>>    create mode 100644 Documentation/networking/scaling.txt
>>>
>>> diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt
>>> new file mode 100644
>>> index 0000000..aa51f0f
>>> --- /dev/null
>>> +++ b/Documentation/networking/scaling.txt
>>> @@ -0,0 +1,346 @@
>>> +Scaling in the Linux Networking Stack
>>> +
>>> +
>>> +Introduction
>>> +============
>>> +
>>> +This document describes a set of complementary techniques in the Linux
>>> +networking stack to increase parallelism and improve performance (in
>>> +throughput, latency, CPU utilization, etc.) for multi-processor systems.
>>
>> Why not just leave-out the parenthetical lest some picky pedant find a
>> specific example where either of those three are not improved?
>
> As I'm sure you're aware, there is often a trade-off between throughput
> and latency.  It might be useful to provide some guidelines for
> optimising each of the above.

Yes.  I just worry about taking *too* many steps down the slippery slope 
and ending up ballooning into a broad systems tuning tutorial.

rick


* Re: [PATCH] net: add Documentation/networking/scaling.txt
  2011-08-16 19:53   ` Ben Hutchings
  2011-08-19 17:38     ` Rick Jones
@ 2011-08-19 19:50     ` Will de Bruijn
  1 sibling, 0 replies; 6+ messages in thread
From: Will de Bruijn @ 2011-08-19 19:50 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Rick Jones, Tom Herbert, rdunlap, linux-doc, davem, netdev

> As I'm sure you're aware, there is often a trade-off between throughput
> and latency.  It might be useful to provide some guidelines for
> optimising each of the above.

The suggested configuration section on RSS already gives some vague
heuristics. In general, I hesitate to write down best guesses, and I
lack the hard experimental data that would make such advice more
sound.

> The default affinity for most IRQs is all-CPUs.  At least on x86, that
> really means CPU 0 only, so far as I can see.

Yes, I believe so, but since the specified behavior is all-CPUs and
other architectures are free to implement this differently, I prefer
to leave the weaker statement. Also, even on x86 there is the
possibility of migration, so starting on CPU0 is not equivalent to
having the affinity set to that CPU.

>> Requires that ntuple filtering be enabled?
> [...]
>
> As a matter of fact, n-tuple filtering is enabled by default where
> available.  So it might actually make more sense to say that RFS
> acceleration can be *disabled* by disabling n-tuple filtering using
> ethtool.

I didn't know (or check, clearly). I'll clarify that in a next
revision. Since the text is still factually correct and I don't want
to spam the list with frequent minor changes, I'll batch them.

Thanks, Ben.

