* [RFC PATCH] net: Require socket to allow XPS to set queue mapping
@ 2016-08-25 19:23 Alexander Duyck
  2016-08-25 19:49 ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Alexander Duyck @ 2016-08-25 19:23 UTC
  To: netdev, rick.jones2

I have been seeing a number of cases where XPS leads to packets being
reordered in the transmit queues of the device drivers.  The main
situation where this seems to occur is when a VM is using a tap interface
to send packets to the network via a NIC that has XPS enabled.

A bit of looking into this revealed that the main issue is the scheduler
migrating the VM between CPUs; as this occurs, the traffic for a given
flow from the VM follows the migration and hops between Tx queues, leading
to packet reordering.

A workaround for this is to make certain all the VMs have RPS enabled on
the tap interfaces; however, this requires extra configuration on the host
for each VM created.
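For reference, that per-VM configuration amounts to one sysfs write per tap
receive queue; a sketch (the tap name and CPU mask here are illustrative):

  # allow receive processing for tap0 to be spread across CPUs 0-3
  echo f > /sys/class/net/tap0/queues/rx-0/rps_cpus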

A simpler approach is provided with this patch.  With it we disable XPS any
time a socket is not present for a given flow.  By doing this we can avoid
using XPS for any routing or bridging situations in which XPS is likely
more of a hindrance than a help.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
 net/core/dev.c |   28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index a75df86..9cea73b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3242,24 +3242,26 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
 
 static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb)
 {
+	int queue_index, new_index;
 	struct sock *sk = skb->sk;
-	int queue_index = sk_tx_queue_get(sk);
 
-	if (queue_index < 0 || skb->ooo_okay ||
-	    queue_index >= dev->real_num_tx_queues) {
-		int new_index = get_xps_queue(dev, skb);
-		if (new_index < 0)
-			new_index = skb_tx_hash(dev, skb);
+	if (!sk)
+		return skb_tx_hash(dev, skb);
 
-		if (queue_index != new_index && sk &&
-		    sk_fullsock(sk) &&
-		    rcu_access_pointer(sk->sk_dst_cache))
-			sk_tx_queue_set(sk, new_index);
+	queue_index = sk_tx_queue_get(sk);
+	if (queue_index >= 0 && !skb->ooo_okay &&
+	    queue_index < dev->real_num_tx_queues)
+		return queue_index;
 
-		queue_index = new_index;
-	}
+	new_index = get_xps_queue(dev, skb);
+	if (new_index < 0)
+		new_index = skb_tx_hash(dev, skb);
 
-	return queue_index;
+	if (queue_index != new_index && sk_fullsock(sk) &&
+	    rcu_access_pointer(sk->sk_dst_cache))
+		sk_tx_queue_set(sk, new_index);
+
+	return new_index;
 }
 
 struct netdev_queue *netdev_pick_tx(struct net_device *dev,


* Re: [RFC PATCH] net: Require socket to allow XPS to set queue mapping
  2016-08-25 19:23 [RFC PATCH] net: Require socket to allow XPS to set queue mapping Alexander Duyck
@ 2016-08-25 19:49 ` Eric Dumazet
  2016-08-25 20:32   ` Rick Jones
  2016-08-25 21:42   ` Tom Herbert
  0 siblings, 2 replies; 8+ messages in thread
From: Eric Dumazet @ 2016-08-25 19:49 UTC
  To: Alexander Duyck; +Cc: netdev, rick.jones2

On Thu, 2016-08-25 at 12:23 -0700, Alexander Duyck wrote:
> I have been seeing a number of cases where XPS leads to packets being
> reordered in the transmit queues of the device drivers.  The main
> situation where this seems to occur is when a VM is using a tap interface
> to send packets to the network via a NIC that has XPS enabled.
>
> A bit of looking into this revealed that the main issue is the scheduler
> migrating the VM between CPUs; as this occurs, the traffic for a given
> flow from the VM follows the migration and hops between Tx queues, leading
> to packet reordering.
>
> A workaround for this is to make certain all the VMs have RPS enabled on
> the tap interfaces; however, this requires extra configuration on the host
> for each VM created.
>
> A simpler approach is provided with this patch.  With it we disable XPS any
> time a socket is not present for a given flow.  By doing this we can avoid
> using XPS for any routing or bridging situations in which XPS is likely
> more of a hindrance than a help.

Yes, but this will destroy isolation for people properly doing VM cpu
pinning.

With this patch, DDoS traffic coming from a VM will hit all TX queues.


* Re: [RFC PATCH] net: Require socket to allow XPS to set queue mapping
  2016-08-25 19:49 ` Eric Dumazet
@ 2016-08-25 20:32   ` Rick Jones
  2016-08-25 21:08     ` Eric Dumazet
  2016-08-25 21:08     ` Alexander Duyck
  2016-08-25 21:42   ` Tom Herbert
  1 sibling, 2 replies; 8+ messages in thread
From: Rick Jones @ 2016-08-25 20:32 UTC
  To: Eric Dumazet, Alexander Duyck; +Cc: netdev

On 08/25/2016 12:49 PM, Eric Dumazet wrote:
> On Thu, 2016-08-25 at 12:23 -0700, Alexander Duyck wrote:
>> A simpler approach is provided with this patch.  With it we disable XPS any
>> time a socket is not present for a given flow.  By doing this we can avoid
>> using XPS for any routing or bridging situations in which XPS is likely
>> more of a hindrance than a help.
>
> Yes, but this will destroy isolation for people properly doing VM cpu
> pinning.

Why not simply stop enabling XPS by default?  Treat it like RPS and RFS
(unless I've missed a patch...).  The people who are already doing the
extra steps to pin VMs can enable XPS in that case.  It isn't clear that
one should always pin VMs - for example, if a (public) cloud needed to
oversubscribe the cores.
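As a sketch, that opt-in would use the existing per-queue sysfs knob
(the device name and masks here are illustrative):

  # pin eth0 tx-0 to CPU 0 and tx-1 to CPU 1; writing a mask of 0
  # disables XPS for that queue
  echo 1 > /sys/class/net/eth0/queues/tx-0/xps_cpus
  echo 2 > /sys/class/net/eth0/queues/tx-1/xps_cpus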

happy benchmarking,

rick jones


* Re: [RFC PATCH] net: Require socket to allow XPS to set queue mapping
  2016-08-25 20:32   ` Rick Jones
@ 2016-08-25 21:08     ` Eric Dumazet
  2016-08-25 21:28       ` Rick Jones
  2016-08-25 21:08     ` Alexander Duyck
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2016-08-25 21:08 UTC
  To: Rick Jones; +Cc: Alexander Duyck, netdev

On Thu, 2016-08-25 at 13:32 -0700, Rick Jones wrote:
> On 08/25/2016 12:49 PM, Eric Dumazet wrote:
> > On Thu, 2016-08-25 at 12:23 -0700, Alexander Duyck wrote:
> >> A simpler approach is provided with this patch.  With it we disable XPS any
> >> time a socket is not present for a given flow.  By doing this we can avoid
> >> using XPS for any routing or bridging situations in which XPS is likely
> >> more of a hindrance than a help.
> >
> > Yes, but this will destroy isolation for people properly doing VM cpu
> > pinning.
> 
> Why not simply stop enabling XPS by default?  Treat it like RPS and RFS
> (unless I've missed a patch...).  The people who are already doing the
> extra steps to pin VMs can enable XPS in that case.  It isn't clear that
> one should always pin VMs - for example, if a (public) cloud needed to
> oversubscribe the cores.
> 

When XPS was submitted, it was _not_ enabled by default, and it was not
'magic'.

Some NIC vendors decided it was a good thing; you should complain to
them ;)


* Re: [RFC PATCH] net: Require socket to allow XPS to set queue mapping
  2016-08-25 20:32   ` Rick Jones
  2016-08-25 21:08     ` Eric Dumazet
@ 2016-08-25 21:08     ` Alexander Duyck
  1 sibling, 0 replies; 8+ messages in thread
From: Alexander Duyck @ 2016-08-25 21:08 UTC
  To: Rick Jones; +Cc: Eric Dumazet, Alexander Duyck, Netdev

On Thu, Aug 25, 2016 at 1:32 PM, Rick Jones <rick.jones2@hpe.com> wrote:
> On 08/25/2016 12:49 PM, Eric Dumazet wrote:
>>
>> On Thu, 2016-08-25 at 12:23 -0700, Alexander Duyck wrote:
>>>
>>> A simpler approach is provided with this patch.  With it we disable XPS
>>> any time a socket is not present for a given flow.  By doing this we can
>>> avoid using XPS for any routing or bridging situations in which XPS is
>>> likely more of a hindrance than a help.
>>
>> Yes, but this will destroy isolation for people properly doing VM cpu
>> pinning.
>
> Why not simply stop enabling XPS by default?  Treat it like RPS and RFS
> (unless I've missed a patch...).  The people who are already doing the
> extra steps to pin VMs can enable XPS in that case.  It isn't clear that
> one should always pin VMs - for example, if a (public) cloud needed to
> oversubscribe the cores.
>
> happy benchmarking,
>
> rick jones

The big motivation for most of the drivers to have it is that XPS can
provide very good savings when you end up aligning the entire transmit
path by also controlling the memory allocations and IRQ affinity.  What
most of these devices try to do by default is isolate all of a given flow
to a single CPU so you can get as close to linear scaling as possible.
The problem is that with XPS disabled by default you take a pretty serious
performance hit for all of the non-virtualization cases, since you can
easily end up crossing CPUs or, worse yet, NUMA nodes when processing Tx
traffic.
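To sketch the alignment being described (the IRQ number and device name
are hypothetical), the idea is that a queue's Tx completion interrupt and
its XPS mask point at the same CPU:

  # steer queue 0's Tx completion interrupt (assume IRQ 44) to CPU 0
  echo 1 > /proc/irq/44/smp_affinity
  # and have XPS select tx-0 for traffic generated on CPU 0
  echo 1 > /sys/class/net/eth0/queues/tx-0/xps_cpus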

Really, if anything, I still think we need to do something to enforce
ordering of flows with regard to things that make use of netif_rx instead
of a NAPI processing path.  From what I can tell we will always end up
running into reordering issues as long as you can have a single thread
bouncing between CPUs and spraying packets onto the various backlogs.  It
is one of the reasons why I had mentioned possibly enabling RPS for these
types of interfaces; without it you will still get some reordering, it
just varies in how much.  Essentially all that happens when you disable
XPS is that you shorten the queues, but there are still multiple queues
in play.  You can still end up with packets getting reordered between the
various per-CPU backlogs.

- Alex


* Re: [RFC PATCH] net: Require socket to allow XPS to set queue mapping
  2016-08-25 21:08     ` Eric Dumazet
@ 2016-08-25 21:28       ` Rick Jones
  2016-09-21 15:07         ` Willem de Bruijn
  0 siblings, 1 reply; 8+ messages in thread
From: Rick Jones @ 2016-08-25 21:28 UTC
  To: Eric Dumazet; +Cc: Alexander Duyck, netdev

On 08/25/2016 02:08 PM, Eric Dumazet wrote:
> When XPS was submitted, it was _not_ enabled by default, and it was not
> 'magic'.
>
> Some NIC vendors decided it was a good thing; you should complain to
> them ;)

I kindasorta am with the emails I've been sending to netdev :)  And also 
hopefully precluding others going down that path.

happy benchmarking,

rick


* Re: [RFC PATCH] net: Require socket to allow XPS to set queue mapping
  2016-08-25 19:49 ` Eric Dumazet
  2016-08-25 20:32   ` Rick Jones
@ 2016-08-25 21:42   ` Tom Herbert
  1 sibling, 0 replies; 8+ messages in thread
From: Tom Herbert @ 2016-08-25 21:42 UTC
  To: Eric Dumazet; +Cc: Alexander Duyck, Linux Kernel Network Developers, Rick Jones

On Thu, Aug 25, 2016 at 12:49 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2016-08-25 at 12:23 -0700, Alexander Duyck wrote:
>> I have been seeing a number of cases where XPS leads to packets being
>> reordered in the transmit queues of the device drivers.  The main
>> situation where this seems to occur is when a VM is using a tap interface
>> to send packets to the network via a NIC that has XPS enabled.
>>
>> A bit of looking into this revealed that the main issue is the scheduler
>> migrating the VM between CPUs; as this occurs, the traffic for a given
>> flow from the VM follows the migration and hops between Tx queues, leading
>> to packet reordering.
>>
>> A workaround for this is to make certain all the VMs have RPS enabled on
>> the tap interfaces; however, this requires extra configuration on the host
>> for each VM created.
>>
>> A simpler approach is provided with this patch.  With it we disable XPS
>> any time a socket is not present for a given flow.  By doing this we can
>> avoid using XPS for any routing or bridging situations in which XPS is
>> likely more of a hindrance than a help.
>
> Yes, but this will destroy isolation for people properly doing VM cpu
> pinning.
>
> With this patch, DDoS traffic coming from a VM will hit all TX queues.
>
As I said in the other thread, the better solution is to start tracking
flows coming out of VMs.

Tom


* Re: [RFC PATCH] net: Require socket to allow XPS to set queue mapping
  2016-08-25 21:28       ` Rick Jones
@ 2016-09-21 15:07         ` Willem de Bruijn
  0 siblings, 0 replies; 8+ messages in thread
From: Willem de Bruijn @ 2016-09-21 15:07 UTC
  To: Rick Jones; +Cc: Eric Dumazet, Alexander Duyck, Network Development

On Thu, Aug 25, 2016 at 5:28 PM, Rick Jones <rick.jones2@hpe.com> wrote:
> On 08/25/2016 02:08 PM, Eric Dumazet wrote:
>>
>> When XPS was submitted, it was _not_ enabled by default, and it was not
>> 'magic'.
>>
>> Some NIC vendors decided it was a good thing; you should complain to
>> them ;)
>
>
> I kindasorta am with the emails I've been sending to netdev :)  And also
> hopefully precluding others going down that path.

I recently came across another effect of configuring devices at
ndo_open. The behavior of `ip link set dev ${dev} down && ip link set
dev ${dev} up` is no longer consistent across devices. Drivers that
call netif_set_xps_queue overwrite a user-supplied xps configuration;
others leave it in place.

This is easily demonstrated by adding this snippet to the loopback driver:

+static int loopback_dev_open(struct net_device *dev)
+{
+       cpumask_t mask;
+
+       cpumask_clear(&mask);
+       cpumask_set_cpu(0, &mask);
+
+       return netif_set_xps_queue(dev, &mask, 0);
+}
+
 static void loopback_dev_free(struct net_device *dev)
 {
        free_percpu(dev->lstats);
@@ -158,6 +168,7 @@ static void loopback_dev_free(struct net_device *dev)

 static const struct net_device_ops loopback_ops = {
        .ndo_init      = loopback_dev_init,
+       .ndo_open       = loopback_dev_open,


and running

fp=/sys/class/net/lo/queues/tx-0/xps_cpus

cat ${fp}
echo 8 > ${fp}
cat ${fp}
ip link set dev lo down
ip link set dev lo up
cat ${fp}

On a vanilla kernel, the output is {0, 8, 8}: the user-supplied xps
setting persists.
With the patch, it is {1, 8, 1}: the driver sets, and resets, xps.

A third patch that adds netif_set_xps_queue to loopback_dev_init gives
{1, 8, 8}. That is arguably the preferred behavior (sidestepping the
issue of whether device driver initialization is a good thing in
itself): init with a sane default, but do not override user
preference.

The fm10k and i40e drivers make the call conditional on
test_and_set_bit(__FM10K_TX_XPS_INIT_DONE), so they probably already
implement those semantics, in which case other drivers should probably
be updated to do the same. If all drivers are essentially trying to
set the same basic load-balancing policy, this could also be lifted
out of the drivers into __dev_open for consistency and code deduplication.

I'm pointing this out less to change this xps feature than as a subtle
effect to be aware of with any subsequent device driver policy patches.
