* [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs
From: Ben Serebrin @ 2017-02-07 18:15 UTC
  To: Christian Borntraeger, netdev, mst, jasowang, davem, willemb,
	venkateshs, jonolson, willemdebruijn.kernel, rick.jones2,
	jmattson, linux-s390, linux-arch
  Cc: Benjamin Serebrin

From: Benjamin Serebrin <serebrin@google.com>

If the number of virtio queue pairs is not equal to the
number of VCPUs, the virtio guest driver doesn't assign
any CPU affinity for the queue interrupts, nor any XPS
transmit queue mapping.  (In contrast, the driver does assign
both if the counts of VCPUs and queues are equal, which is a good
default behavior.)

Google Compute Engine currently provides 1 queue pair for
every VCPU, but limits that at a maximum of 32 queue pairs.

This code extends the driver's default interrupt affinity
and transmit affinity settings for the case where there
are mismatching queue and VCPU counts.  Userspace affinity
adjustment may always be needed to tune for a given workload.

Tested:

(on a 64-VCPU VM with debian 8, jessie-backports 4.9.2)

Without the fix we see all queues affinitized to all CPUs:

cd /proc/irq
for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list;  done
0-63
[...]
0-63

and we see all TX queues' xps_cpus affinitized to no cores:

cd /sys/class/net/eth0/queues
for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus; done
00000000,00000000
[...]
00000000,00000000

With the fix, we see each queue assigned to a single core,
and xps affinity set to 1 unique cpu per TX queue.

64 VCPU:

cd /proc/irq
for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list;  done

0-63
0
0
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
10
11
11
12
12
13
13
14
14
15
15
16
16
17
17
18
18
19
19
20
20
21
21
22
22
23
23
24
24
25
25
26
26
27
27
28
28
29
29
30
30
31
31
0-63
0-63
0-63
0-63

cd /sys/class/net/eth0/queues
for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus;  done

00000001,00000001
00000002,00000002
00000004,00000004
00000008,00000008
00000010,00000010
00000020,00000020
00000040,00000040
00000080,00000080
00000100,00000100
00000200,00000200
00000400,00000400
00000800,00000800
00001000,00001000
00002000,00002000
00004000,00004000
00008000,00008000
00010000,00010000
00020000,00020000
00040000,00040000
00080000,00080000
00100000,00100000
00200000,00200000
00400000,00400000
00800000,00800000
01000000,01000000
02000000,02000000
04000000,04000000
08000000,08000000
10000000,10000000
20000000,20000000
40000000,40000000
80000000,80000000

48 VCPU:

cd /proc/irq
for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list;  done
0-47
0
0
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
10
11
11
12
12
13
13
14
14
15
15
16
16
17
17
18
18
19
19
20
20
21
21
22
22
23
23
24
24
25
25
26
26
27
27
28
28
29
29
30
30
31
31
0-47
0-47
0-47
0-47

cd /sys/class/net/eth0/queues
for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus;  done

0001,00000001
0002,00000002
0004,00000004
0008,00000008
0010,00000010
0020,00000020
0040,00000040
0080,00000080
0100,00000100
0200,00000200
0400,00000400
0800,00000800
1000,00001000
2000,00002000
4000,00004000
8000,00008000
0000,00010000
0000,00020000
0000,00040000
0000,00080000
0000,00100000
0000,00200000
0000,00400000
0000,00800000
0000,01000000
0000,02000000
0000,04000000
0000,08000000
0000,10000000
0000,20000000
0000,40000000
0000,80000000

Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jim Mattson <jmattson@google.com>
Acked-by: Venkatesh Srinivas <venkateshs@google.com>

Signed-off-by: Ben Serebrin <serebrin@google.com>
---
 drivers/net/virtio_net.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 765c2d6358da..0dc3a102bfc4 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1502,20 +1502,44 @@ static void virtnet_set_affinity(struct virtnet_info *vi)
 	 * queue pairs, we let the queue pairs to be private to one cpu by
 	 * setting the affinity hint to eliminate the contention.
 	 */
-	if (vi->curr_queue_pairs == 1 ||
-	    vi->max_queue_pairs != num_online_cpus()) {
+	if (vi->curr_queue_pairs == 1) {
 		virtnet_clean_affinity(vi, -1);
 		return;
 	}
 
+	/* If there are more cpus than queues, then assign the queues'
+	 * interrupts to the first cpus until we run out.
+	 */
 	i = 0;
 	for_each_online_cpu(cpu) {
+		if (i == vi->max_queue_pairs)
+			break;
 		virtqueue_set_affinity(vi->rq[i].vq, cpu);
 		virtqueue_set_affinity(vi->sq[i].vq, cpu);
-		netif_set_xps_queue(vi->dev, cpumask_of(cpu), i);
 		i++;
 	}
 
+	/* Stripe the XPS affinities across the online CPUs.
+	 * Hyperthread pairs are typically assigned such that Linux's
+	 * CPU X and X + (numcpus / 2) are hyperthread twins, so we cause
+	 * hyperthread twins to share TX queues, in the case where there are
+	 * more cpus than queues.
+	 */
+	for (i = 0; i < vi->max_queue_pairs; i++) {
+		struct cpumask mask;
+		int skip = i;
+
+		cpumask_clear(&mask);
+		for_each_online_cpu(cpu) {
+			while (skip--)
+				cpu = cpumask_next(cpu, cpu_online_mask);
+			if (cpu < num_possible_cpus())
+				cpumask_set_cpu(cpu, &mask);
+			skip = vi->max_queue_pairs - 1;
+		}
+		netif_set_xps_queue(vi->dev, &mask, i);
+	}
+
 	vi->affinity_hint_set = true;
 }
 

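(For reference, here is a standalone C sketch, not part of the patch, that
reproduces the XPS striping arithmetic of the loop above; ncpus and nqueues
are just assumed example values.  With ncpus = 48 and nqueues = 32 it prints
cpu0 and cpu32 for tx-0, which corresponds to the 0001,00000001 mask in the
48-VCPU test output earlier in this message.)

/* xps_stripe.c: standalone sketch of the striping done by the patch.
 * Queue q collects CPUs q, q + nqueues, q + 2*nqueues, ... while they
 * are below ncpus; queues with index >= ncpus end up with no CPUs.
 */
#include <stdio.h>

int main(void)
{
	const int ncpus = 48, nqueues = 32;	/* assumed example sizes */
	int q, c;

	for (q = 0; q < nqueues; q++) {
		printf("tx-%d:", q);
		for (c = q; c < ncpus; c += nqueues)
			printf(" cpu%d", c);
		printf("\n");
	}
	return 0;
}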

* Re: [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs
From: David Miller @ 2017-02-08 18:37 UTC
  To: serebrin
  Cc: borntraeger, netdev, mst, jasowang, willemb, venkateshs,
	jonolson, willemdebruijn.kernel, rick.jones2, jmattson,
	linux-s390, linux-arch

From: Ben Serebrin <serebrin@google.com>
Date: Tue,  7 Feb 2017 10:15:06 -0800

> If the number of virtio queue pairs is not equal to the
> number of VCPUs, the virtio guest driver doesn't assign
> any CPU affinity for the queue interrupts, nor any XPS
> transmit queue mapping.  (In contrast, the driver does assign
> both if the counts of VCPUs and queues are equal, which is a good
> default behavior.)
 ...

Can I get some reviews, please?


* Re: [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs
From: Michael S. Tsirkin @ 2017-02-08 19:37 UTC
  To: Ben Serebrin
  Cc: Christian Borntraeger, netdev, jasowang, davem, willemb,
	venkateshs, jonolson, willemdebruijn.kernel, rick.jones2,
	jmattson, linux-s390, linux-arch

On Tue, Feb 07, 2017 at 10:15:06AM -0800, Ben Serebrin wrote:
> From: Benjamin Serebrin <serebrin@google.com>
> 
> If the number of virtio queue pairs is not equal to the
> number of VCPUs, the virtio guest driver doesn't assign
> any CPU affinity for the queue interrupts, nor any XPS
> transmit queue mapping.  (In contrast, the driver does assign
> both if the counts of VCPUs and queues are equal, which is a good
> default behavior.)
> 
> Google Compute Engine currently provides 1 queue pair for
> every VCPU, but limits that at a maximum of 32 queue pairs.
> 
> This code extends the driver's default interrupt affinity
> and transmit affinity settings for the case where there
> are mismatching queue and VCPU counts.  Userspace affinity
> adjustment may always be needed to tune for a given workload.

IIRC irqbalance will bail out and avoid touching affinity
if you set affinity from driver.  Breaking that's not nice.
Pls correct me if I'm wrong.

Generally, I wonder - we aren't the only device with a limited number of
queues.

> Tested:
>
> [... 64-VCPU and 48-VCPU test output trimmed; identical to the results
> shown in the patch message above ...]
>
> Acked-by: Willem de Bruijn <willemb@google.com>
> Acked-by: Jim Mattson <jmattson@google.com>
> Acked-by: Venkatesh Srinivas <venkateshs@google.com>
> 
> Signed-off-by: Ben Serebrin <serebrin@google.com>


What happens if you have more than 1 virtio net device?

> ---
>  drivers/net/virtio_net.c | 30 +++++++++++++++++++++++++++---
>  1 file changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 765c2d6358da..0dc3a102bfc4 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1502,20 +1502,44 @@ static void virtnet_set_affinity(struct virtnet_info *vi)
>  	 * queue pairs, we let the queue pairs to be private to one cpu by
>  	 * setting the affinity hint to eliminate the contention.
>  	 */
> -	if (vi->curr_queue_pairs == 1 ||
> -	    vi->max_queue_pairs != num_online_cpus()) {
> +	if (vi->curr_queue_pairs == 1) {
>  		virtnet_clean_affinity(vi, -1);
>  		return;
>  	}
>  
> +	/* If there are more cpus than queues, then assign the queues'
> +	 * interrupts to the first cpus until we run out.
> +	 */
>  	i = 0;
>  	for_each_online_cpu(cpu) {
> +		if (i == vi->max_queue_pairs)
> +			break;
>  		virtqueue_set_affinity(vi->rq[i].vq, cpu);
>  		virtqueue_set_affinity(vi->sq[i].vq, cpu);
> -		netif_set_xps_queue(vi->dev, cpumask_of(cpu), i);
>  		i++;
>  	}
>  
> +	/* Stripe the XPS affinities across the online CPUs.
> +	 * Hyperthread pairs are typically assigned such that Linux's
> +	 * CPU X and X + (numcpus / 2) are hyperthread twins, so we cause
> +	 * hyperthread twins to share TX queues, in the case where there are
> +	 * more cpus than queues.

Couldn't you add some kind of API so that we don't need to make
assumptions like this? E.g. "give me a new CPU core to use for an
interrupt"?
Would address multiple device thing too.


> +	 */
> +	for (i = 0; i < vi->max_queue_pairs; i++) {
> +		struct cpumask mask;
> +		int skip = i;
> +
> +		cpumask_clear(&mask);
> +		for_each_online_cpu(cpu) {
> +			while (skip--)
> +				cpu = cpumask_next(cpu, cpu_online_mask);
> +			if (cpu < num_possible_cpus())
> +				cpumask_set_cpu(cpu, &mask);
> +			skip = vi->max_queue_pairs - 1;
> +		}
> +		netif_set_xps_queue(vi->dev, &mask, i);
> +	}
> +

Doesn't look like this will handle the case of num cpus < num queues well.

>  	vi->affinity_hint_set = true;
>  }
>  


* Re: [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs
From: Benjamin Serebrin @ 2017-02-14 19:17 UTC
  To: Michael S. Tsirkin
  Cc: Christian Borntraeger, netdev, Jason Wang, David Miller,
	Willem de Bruijn, Venkatesh Srinivas, Jon Olson (Google Drive),
	willemdebruijn.kernel, Rick Jones, James Mattson, linux-s390,
	linux-arch

On Wed, Feb 8, 2017 at 11:37 AM, Michael S. Tsirkin <mst@redhat.com> wrote:

> IIRC irqbalance will bail out and avoid touching affinity
> if you set affinity from driver.  Breaking that's not nice.
> Pls correct me if I'm wrong.


I believe you're right that irqbalance will leave the affinity alone.

Irqbalance has had changes that may or may not be in the versions bundled with
various guests, and I don't have a definitive cross-correlation of irqbalance
version to guest version.  But in the existing code, the driver does
set affinity for #VCPUs==#queues, so that's been happening anyway.

The (original) intention of this patch was to extend the existing behavior
to the case where we limit queue counts, to avoid the surprising discontinuity
when #VCPU != #queues.

It's not obvious that it's wrong to cause irqbalance to leave these
queues alone:  Generally you want the interrupt to come to the core that
caused the work, to have cache locality and avoid lock contention.
Doing fancier things is outside the scope of this patch.

> Doesn't look like this will handle the case of num cpus < num queues well.

I believe it's correct.  The first #VCPUs queues will have one bit set in their
xps mask, and the remaining queues have no bits set.  That means each VCPU uses
its own assigned TX queue (and the TX interrupt comes back to that VCPU).
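(To make the num cpus < num queues case concrete, a hypothetical guest
with 4 online VCPUs and 8 queue pairs would, per the two loops in the
patch, end up with the layout below; the numbers are worked out by hand,
not taken from a real run.)

/* Hypothetical 4-VCPU, 8-queue-pair guest:
 *
 *   IRQ affinity hints: rq/sq 0..3 get CPUs 0..3; queues 4..7 get none.
 *   XPS masks:          tx-0..tx-3 map CPUs 0..3; tx-4..tx-7 stay empty,
 *                       so XPS steers no CPU to them.
 */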

Thanks again for the review!
Ben


* Re: [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs
From: Michael S. Tsirkin @ 2017-02-14 21:05 UTC
  To: Benjamin Serebrin
  Cc: Christian Borntraeger, netdev, Jason Wang, David Miller,
	Willem de Bruijn, Venkatesh Srinivas, Jon Olson (Google Drive),
	willemdebruijn.kernel, Rick Jones, James Mattson, linux-s390,
	linux-arch

On Tue, Feb 14, 2017 at 11:17:41AM -0800, Benjamin Serebrin wrote:
> On Wed, Feb 8, 2017 at 11:37 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> > IIRC irqbalance will bail out and avoid touching affinity
> > if you set affinity from driver.  Breaking that's not nice.
> > Pls correct me if I'm wrong.
> 
> 
> I believe you're right that irqbalance will leave the affinity alone.
> 
> Irqbalance has had changes that may or may not be in the versions bundled with
> various guests, and I don't have a definitive cross-correlation of irqbalance
> version to guest version.  But in the existing code, the driver does
> set affinity for #VCPUs==#queues, so that's been happening anyway.

Right - the idea being we load all CPUs equally so we don't
need help from irqbalance - hopefully packets will be spread
across queues in a balanced way.

When we have less queues the load isn't balanced so we
definitely need something fancier to take into account
the overall system load.


> The (original) intention of this patch was to extend the existing behavior
> to the case where we limit queue counts, to avoid the surprising discontinuity
> when #VCPU != #queues.
> 
> It's not obvious that it's wrong to cause irqbalance to leave these
> queues alone:  Generally you want the interrupt to come to the core that
> caused the work, to have cache locality and avoid lock contention.

But why the first N cpus? That's more or less the same as assigning them
at random. It might benefit your workload but it's not clear it isn't
breaking someone else's. You are right it's not obvious it's doing the
wrong thing but it's also not obvious it's doing the right one.

> Doing fancier things is outside the scope of this patch.
>
> > Doesn't look like this will handle the case of num cpus < num queues well.
> 
> I believe it's correct.  The first #VCPUs queues will have one bit set in their
> xps mask, and the remaining queues have no bits set.  That means each VCPU uses
> its own assigned TX queue (and the TX interrupt comes back to that VCPU).
> 
> Thanks again for the review!
> Ben


* Re: [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs
From: Michael S. Tsirkin @ 2017-02-15 15:23 UTC
  To: Benjamin Serebrin
  Cc: Christian Borntraeger, netdev, Jason Wang, David Miller,
	Willem de Bruijn, Venkatesh Srinivas, Jon Olson (Google Drive),
	willemdebruijn.kernel, Rick Jones, James Mattson, linux-s390,
	linux-arch

On Tue, Feb 14, 2017 at 11:17:41AM -0800, Benjamin Serebrin wrote:
> On Wed, Feb 8, 2017 at 11:37 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> > IIRC irqbalance will bail out and avoid touching affinity
> > if you set affinity from driver.  Breaking that's not nice.
> > Pls correct me if I'm wrong.
> 
> 
> I believe you're right that irqbalance will leave the affinity alone.
> 
> Irqbalance has had changes that may or may not be in the versions bundled with
> various guests, and I don't have a definitive cross-correlation of irqbalance
> version to guest version.  But in the existing code, the driver does
> set affinity for #VCPUs==#queues, so that's been happening anyway.

Right but only for the case where we are very sure we are doing the
right thing, so we don't need any help from irqbalance.

> The (original) intention of this patch was to extend the existing behavior
> to the case where we limit queue counts, to avoid the surprising discontinuity
> when #VCPU != #queues.
> 
> It's not obvious that it's wrong to cause irqbalance to leave these
> queues alone:  Generally you want the interrupt to come to the core that
> caused the work, to have cache locality and avoid lock contention.
> Doing fancier things is outside the scope of this patch.

Doing fancier things like trying to balance the load would be in scope
for irqbalance so I think you need to find a way to supply default
affinity without disabling irqbalance.

> > Doesn't look like this will handle the case of num cpus < num queues well.
> 
> I believe it's correct.  The first #VCPUs queues will have one bit set in their
> xps mask, and the remaining queues have no bits set.  That means each VCPU uses
> its own assigned TX queue (and the TX interrupt comes back to that VCPU).
>
> Thanks again for the review!
> Ben


* Re: [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs
From: Willem de Bruijn @ 2017-02-15 16:50 UTC
  To: Michael S. Tsirkin
  Cc: Benjamin Serebrin, Christian Borntraeger, Network Development,
	Jason Wang, David Miller, Willem de Bruijn, Venkatesh Srinivas,
	Jon Olson (Google Drive),
	Rick Jones, James Mattson, linux-s390, linux-arch

On Tue, Feb 14, 2017 at 1:05 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Feb 14, 2017 at 11:17:41AM -0800, Benjamin Serebrin wrote:
>> On Wed, Feb 8, 2017 at 11:37 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>
>> > IIRC irqbalance will bail out and avoid touching affinity
>> > if you set affinity from driver.  Breaking that's not nice.
>> > Pls correct me if I'm wrong.
>>
>>
>> I believe you're right that irqbalance will leave the affinity alone.
>>
>> Irqbalance has had changes that may or may not be in the versions bundled with
>> various guests, and I don't have a definitive cross-correlation of irqbalance
>> version to guest version.  But in the existing code, the driver does
>> set affinity for #VCPUs==#queues, so that's been happening anyway.
>
> Right - the idea being we load all CPUs equally so we don't
> need help from irqbalance - hopefully packets will be spread
> across queues in a balanced way.
>
> When we have less queues the load isn't balanced so we
> definitely need something fancier to take into account
> the overall system load.

For pure network load, assigning each txqueue IRQ exclusively
to one of the cores that generates traffic on that queue is the
optimal layout in terms of load spreading. Irqbalance does
not have the XPS information to make this optimal decision.

Overall system load affects this calculation both in the case of 1:1
mapping and in the case of uneven queue distribution. In both cases,
irqbalance
is hopefully smart enough to migrate other non-pinned IRQs to
cpus with lower overall load.

> But why the first N cpus? That's more or less the same as assigning them
> at random.

CPU selection is an interesting point. Spreading equally across numa
nodes would be preferable over first N. Aside from that, the first N
should work best to minimize the chance of hitting multiple
hyperthreads on the same core -- if all architectures lay out
hyperthreads in the same way as x86_64.
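(One way to check that layout assumption on a given guest is to read the
SMT topology the kernel already exports in sysfs; a minimal userspace
sketch, relying only on the standard thread_siblings_list files:)

#include <stdio.h>

/* Print each CPU's hyperthread siblings as reported by the kernel, so
 * the "CPU X and X + ncpus/2 are twins" assumption can be verified.
 */
int main(void)
{
	char path[128], buf[64];
	int cpu;

	for (cpu = 0; ; cpu++) {
		FILE *f;

		snprintf(path, sizeof(path),
			 "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list",
			 cpu);
		f = fopen(path, "r");
		if (!f)
			break;	/* no such CPU, stop */
		if (fgets(buf, sizeof(buf), f))
			printf("cpu%d: siblings %s", cpu, buf);
		fclose(f);
	}
	return 0;
}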


* Re: [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs
From: Michael S. Tsirkin @ 2017-02-15 17:42 UTC
  To: Willem de Bruijn
  Cc: Benjamin Serebrin, Christian Borntraeger, Network Development,
	Jason Wang, David Miller, Willem de Bruijn, Venkatesh Srinivas,
	Jon Olson (Google Drive),
	Rick Jones, James Mattson, linux-s390, linux-arch

On Wed, Feb 15, 2017 at 08:50:34AM -0800, Willem de Bruijn wrote:
> On Tue, Feb 14, 2017 at 1:05 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Tue, Feb 14, 2017 at 11:17:41AM -0800, Benjamin Serebrin wrote:
> >> On Wed, Feb 8, 2017 at 11:37 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>
> >> > IIRC irqbalance will bail out and avoid touching affinity
> >> > if you set affinity from driver.  Breaking that's not nice.
> >> > Pls correct me if I'm wrong.
> >>
> >>
> >> I believe you're right that irqbalance will leave the affinity alone.
> >>
> >> Irqbalance has had changes that may or may not be in the versions bundled with
> >> various guests, and I don't have a definitive cross-correlation of irqbalance
> >> version to guest version.  But in the existing code, the driver does
> >> set affinity for #VCPUs==#queues, so that's been happening anyway.
> >
> > Right - the idea being we load all CPUs equally so we don't
> > need help from irqbalance - hopefully packets will be spread
> > across queues in a balanced way.
> >
> > When we have less queues the load isn't balanced so we
> > definitely need something fancier to take into account
> > the overall system load.
> 
> For pure network load, assigning each txqueue IRQ exclusively
> to one of the cores that generates traffic on that queue is the
> optimal layout in terms of load spreading. Irqbalance does
> not have the XPS information to make this optimal decision.

Try to add hints for it?

> Overall system load affects this calculation both in the case of 1:1
> mapping and in the case of uneven queue distribution. In both cases,
> irqbalance
> is hopefully smart enough to migrate other non-pinned IRQs to
> cpus with lower overall load.

Not if everyone starts inserting hacks like this one in code.

> > But why the first N cpus? That's more or less the same as assigning them
> > at random.
> 
> CPU selection is an interesting point. Spreading equally across numa
> nodes would be preferable over first N. Aside from that, the first N
> should work best to minimize the chance of hitting multiple
> hyperthreads on the same core -- if all architectures lay out
> hyperthreads in the same way as x86_64.

That's another problem with this patch. If you care about hyperthreads
you want an API to probe for that.

-- 
MST
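(For what it's worth, the kernel does expose SMT topology to drivers via
topology_sibling_cpumask(); a driver-side sketch, not part of any posted
patch, that groups siblings explicitly instead of relying on the
X / X + ncpus/2 numbering could look like this:)

/* Sketch only: add a CPU and all of its hyperthread siblings to a
 * queue's mask using the in-kernel topology API, rather than assuming
 * a particular CPU numbering convention.
 */
#include <linux/cpumask.h>
#include <linux/topology.h>

static void queue_mask_add_cpu_and_siblings(struct cpumask *mask, int cpu)
{
	cpumask_or(mask, mask, topology_sibling_cpumask(cpu));
}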


* Re: [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs
From: Benjamin Serebrin @ 2017-02-15 18:27 UTC
  To: Michael S. Tsirkin
  Cc: Willem de Bruijn, Christian Borntraeger, Network Development,
	Jason Wang, David Miller, Willem de Bruijn, Venkatesh Srinivas,
	Jon Olson (Google Drive),
	Rick Jones, James Mattson, linux-s390, linux-arch

On Wed, Feb 15, 2017 at 9:42 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>
>
> > For pure network load, assigning each txqueue IRQ exclusively
> > to one of the cores that generates traffic on that queue is the
> > optimal layout in terms of load spreading. Irqbalance does
> > not have the XPS information to make this optimal decision.
>
> Try to add hints for it?
>
>
> > Overall system load affects this calculation both in the case of 1:1
> > mapping and in the case of uneven queue distribution. In both cases,
> > irqbalance
> > is hopefully smart enough to migrate other non-pinned IRQs to
> > cpus with lower overall load.
>
> Not if everyone starts inserting hacks like this one in code.


It seems to me that the default behavior is equally "random" - why would we want
XPS striped across the cores the way it's done today?

What we're trying to do here is avoid the surprise cliff that guests will
hit when queue count is limited to less than VCPU count.  That will happen
because we limit queue pair count to 32.  I'll happily push further
complexity to user mode.

If this won't fly, we can leave all of this behavior in user code.
Michael, would you prefer that I abandon this patch?

> That's another problem with this patch. If you care about hyperthreads
> you want an API to probe for that.


It's something of a happy accident that hyperthreads line up that way.
Keeping the topology knowledge out of the patch and into user space seems
cleaner, would you agree?

Thanks!
Ben


* Re: [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs
From: Michael S. Tsirkin @ 2017-02-15 19:17 UTC
  To: Benjamin Serebrin
  Cc: Willem de Bruijn, Christian Borntraeger, Network Development,
	Jason Wang, David Miller, Willem de Bruijn, Venkatesh Srinivas,
	Jon Olson (Google Drive),
	Rick Jones, James Mattson, linux-s390, linux-arch

On Wed, Feb 15, 2017 at 10:27:37AM -0800, Benjamin Serebrin wrote:
> On Wed, Feb 15, 2017 at 9:42 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> >
> > > For pure network load, assigning each txqueue IRQ exclusively
> > > to one of the cores that generates traffic on that queue is the
> > > optimal layout in terms of load spreading. Irqbalance does
> > > not have the XPS information to make this optimal decision.
> >
> > Try to add hints for it?
> >
> >
> > > Overall system load affects this calculation both in the case of 1:1
> > > mapping and in the case of uneven queue distribution. In both cases,
> > > irqbalance
> > > is hopefully smart enough to migrate other non-pinned IRQs to
> > > cpus with lower overall load.
> >
> > Not if everyone starts inserting hacks like this one in code.
> 
> 
> It seems to me that the default behavior is equally "random" - why would we want
> XPS striped across the cores the way it's done today?

Right. But userspace knows it's random at least. If kernel supplies
affinity e.g. the way your patch does, userspace ATM accepts this as a
gospel.

> What we're trying to do here is avoid the surprise cliff that guests will
> hit when queue count is limited to less than VCPU count.

And presumably the high VCPU count is there because people
actually run some tasks on all these extra VCPUs, so I'm not sure
I buy the "pure networking workload" argument.

> That will
> happen because
> we limit queue pair count to 32.  I'll happily push further complexity
> to user mode.


Why do you do this BTW? I note that this will interfere with e.g. XDP
which for better or worse wants its own set of TX queues.

> If this won't fly, we can leave all of this behavior in user code.
> Michael, would
> you prefer that I abandon this patch?

I think it's either that or find a way that does not interfere with what
existing userspace has been doing. That's likely to involve core changes
outside of virtio.

> > That's another problem with this patch. If you care about hyperthreads
> > you want an API to probe for that.
> 
> 
> It's something of a happy accident that hyperthreads line up that way.
> Keeping the topology knowledge out of the patch and into user space seems
> cleaner, would you agree?
> 
> Thanks!
> Ben

Well kernel does have topology knowledge already but I have no
issue with keeping the logic all in userspace.

-- 
MST


* Re: [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs
From: Benjamin Serebrin @ 2017-02-15 21:38 UTC
  To: Michael S. Tsirkin
  Cc: Willem de Bruijn, Christian Borntraeger, Network Development,
	Jason Wang, David Miller, Willem de Bruijn, Venkatesh Srinivas,
	Jon Olson (Google Drive),
	Rick Jones, James Mattson, linux-s390, linux-arch

On Wed, Feb 15, 2017 at 11:17 AM, Michael S. Tsirkin <mst@redhat.com> wrote:

> Right. But userspace knows it's random at least. If kernel supplies
> affinity e.g. the way your patch does, userspace ATM accepts this as a
> gospel.

The existing code supplies the same affinity gospels in the
#vcpu == #queue case today.  And the patch (unless it has a bug in it)
should not affect the #vcpu == #queue case's behavior.  I don't quite
understand what property we'd be changing with the patch.

Here's the same dump of smp_affinity_list, on a 16 VCPU machine with
unmodified kernel:

0
0
1
1
2
2
[..]
15
15

And xps_cpus
00000001
00000002
[...]
00008000

This patch causes the #vcpu != #queue case to follow the same pattern.


Thanks again!
Ben


* Re: [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs
From: Michael S. Tsirkin @ 2017-02-15 21:49 UTC
  To: Benjamin Serebrin
  Cc: Willem de Bruijn, Christian Borntraeger, Network Development,
	Jason Wang, David Miller, Willem de Bruijn, Venkatesh Srinivas,
	Jon Olson (Google Drive),
	Rick Jones, James Mattson, linux-s390, linux-arch

On Wed, Feb 15, 2017 at 01:38:48PM -0800, Benjamin Serebrin wrote:
> On Wed, Feb 15, 2017 at 11:17 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> > Right. But userspace knows it's random at least. If kernel supplies
> > affinity e.g. the way your patch does, userspace ATM accepts this as a
> > gospel.
> 
> The existing code supplies the same affinity gospels in the #vcpu ==
> #queue case today.
> And the patch (unless it has a bug in it) should not affect the #vcpu
> == #queue case's
> behavior.  I don't quite understand what property we'd be changing
> with the patch.
> 
> Here's the same dump of smp_affinity_list, on a 16 VCPU machine with
> unmodified kernel:
> 
> 0
> 0
> 1
> 1
> 2
> 2
> [..]
> 15
> 15
> 
> And xps_cpus
> 00000001
> 00000002
> [...]
> 00008000
> 
> This patch causes #vcpu != #queue case to follow the same pattern.
> 
> 
> Thanks again!
> Ben

The logic is simple really. With #VCPUs == #queues we can reasonably
assume this box is mostly doing networking so we can set affinity
the way we like. With VCPUs > queues clearly VM is doing more stuff
so we need a userspace policy to take that into account,
we don't know ourselves what is the right thing to do.

Arguably for #VCPUs == #queues we are not always doing the right thing
either but I see this as an argument to move more smarts
into core kernel not for adding more dumb heuristics in the driver.

-- 
MST


* Re: [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs
From: Benjamin Serebrin @ 2017-02-15 22:13 UTC
  To: Michael S. Tsirkin
  Cc: Willem de Bruijn, Christian Borntraeger, Network Development,
	Jason Wang, David Miller, Willem de Bruijn, Venkatesh Srinivas,
	Jon Olson (Google Drive),
	Rick Jones, James Mattson, linux-s390, linux-arch

On Wed, Feb 15, 2017 at 1:49 PM, Michael S. Tsirkin <mst@redhat.com> wrote:

> The logic is simple really. With #VCPUs == #queues we can reasonably
> assume this box is mostly doing networking so we can set affinity
> the way we like. With VCPUs > queues clearly VM is doing more stuff
> so we need a userspace policy to take that into account,
> we don't know ourselves what is the right thing to do.
>
> Arguably for #VCPUs == #queues we are not always doing the right thing
> either but I see this as an argument to move more smarts
> into core kernel not for adding more dumb heuristics in the driver.

Thanks for all the feedback, Michael.  We'll drop this patch and move
to user mode.

Thanks,
Ben


Thread overview: 13 messages
2017-02-07 18:15 [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs Ben Serebrin
2017-02-08 18:37 ` David Miller
2017-02-08 19:37 ` Michael S. Tsirkin
2017-02-14 19:17   ` Benjamin Serebrin
2017-02-14 21:05     ` Michael S. Tsirkin
2017-02-15 16:50       ` Willem de Bruijn
2017-02-15 17:42         ` Michael S. Tsirkin
2017-02-15 18:27           ` Benjamin Serebrin
2017-02-15 19:17             ` Michael S. Tsirkin
2017-02-15 21:38               ` Benjamin Serebrin
2017-02-15 21:49                 ` Michael S. Tsirkin
2017-02-15 22:13                   ` Benjamin Serebrin
2017-02-15 15:23     ` Michael S. Tsirkin
