linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] irq/affinity: Fix extra vecs calculation
@ 2017-04-13 17:28 Keith Busch
  2017-04-13 21:46 ` [tip:irq/urgent] " tip-bot for Keith Busch
  2017-04-19 16:20 ` Andrei Vagin
  0 siblings, 2 replies; 9+ messages in thread
From: Keith Busch @ 2017-04-13 17:28 UTC (permalink / raw)
  To: linux-kernel, Thomas Gleixner; +Cc: Xiaolong Ye, Keith Busch

This fixes a math error calculating the extra_vecs. The error assumed
only 1 cpu per vector, but the value needs to account for the actual
number of cpus per vector in order to get the correct remainder for
extra CPU assignment.

Fixes: 7bf8222b9bd0 ("irq/affinity: Fix CPU spread for unbalanced nodes")
Reported-by: Xiaolong Ye <xiaolong.ye@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 kernel/irq/affinity.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index dc52911..d052947 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -108,7 +108,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 		vecs_to_assign = min(vecs_per_node, ncpus);
 
 		/* Account for rounding errors */
-		extra_vecs = ncpus - vecs_to_assign;
+		extra_vecs = ncpus - vecs_to_assign * (ncpus / vecs_to_assign);
 
 		for (v = 0; curvec < last_affv && v < vecs_to_assign;
 		     curvec++, v++) {
-- 
2.7.2

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [tip:irq/urgent] irq/affinity: Fix extra vecs calculation
  2017-04-13 17:28 [PATCH] irq/affinity: Fix extra vecs calculation Keith Busch
@ 2017-04-13 21:46 ` tip-bot for Keith Busch
  2017-04-19 16:20 ` Andrei Vagin
  1 sibling, 0 replies; 9+ messages in thread
From: tip-bot for Keith Busch @ 2017-04-13 21:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: keith.busch, linux-kernel, xiaolong.ye, tglx, hpa, mingo

Commit-ID:  3412386b531244f24a27c79ee003506a52a00848
Gitweb:     http://git.kernel.org/tip/3412386b531244f24a27c79ee003506a52a00848
Author:     Keith Busch <keith.busch@intel.com>
AuthorDate: Thu, 13 Apr 2017 13:28:12 -0400
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 13 Apr 2017 23:41:00 +0200

irq/affinity: Fix extra vecs calculation

This fixes a math error calculating the extra_vecs. The error assumed
only 1 cpu per vector, but the value needs to account for the actual
number of cpus per vector in order to get the correct remainder for
extra CPU assignment.

Fixes: 7bf8222b9bd0 ("irq/affinity: Fix CPU spread for unbalanced nodes")
Reported-by: Xiaolong Ye <xiaolong.ye@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Link: http://lkml.kernel.org/r/1492104492-19943-1-git-send-email-keith.busch@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/irq/affinity.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index dc52911..d052947 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -108,7 +108,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 		vecs_to_assign = min(vecs_per_node, ncpus);
 
 		/* Account for rounding errors */
-		extra_vecs = ncpus - vecs_to_assign;
+		extra_vecs = ncpus - vecs_to_assign * (ncpus / vecs_to_assign);
 
 		for (v = 0; curvec < last_affv && v < vecs_to_assign;
 		     curvec++, v++) {

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: irq/affinity: Fix extra vecs calculation
  2017-04-13 17:28 [PATCH] irq/affinity: Fix extra vecs calculation Keith Busch
  2017-04-13 21:46 ` [tip:irq/urgent] " tip-bot for Keith Busch
@ 2017-04-19 16:20 ` Andrei Vagin
  2017-04-19 17:03   ` Keith Busch
  1 sibling, 1 reply; 9+ messages in thread
From: Andrei Vagin @ 2017-04-19 16:20 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-kernel, Thomas Gleixner, Xiaolong Ye

Hi,

Something is wrong with this patch. We run CRIU tests for upstream kernels.
And we found that a kernel with this patch can't be booted.

https://travis-ci.org/avagin/linux/builds/223557750

We don't have access to console logs and I can't reproduce this issue on
my nodes. I tired to revert this patch and everything works as expected.

https://travis-ci.org/avagin/linux/builds/223594172

Here is another report about this patch
https://lkml.org/lkml/2017/4/16/344

Thanks,
Andrei

On Thu, Apr 13, 2017 at 01:28:12PM -0400, Keith Busch wrote:
> This fixes a math error calculating the extra_vecs. The error assumed
> only 1 cpu per vector, but the value needs to account for the actual
> number of cpus per vector in order to get the correct remainder for
> extra CPU assignment.
> 
> Fixes: 7bf8222b9bd0 ("irq/affinity: Fix CPU spread for unbalanced nodes")
> Reported-by: Xiaolong Ye <xiaolong.ye@intel.com>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  kernel/irq/affinity.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> index dc52911..d052947 100644
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -108,7 +108,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  		vecs_to_assign = min(vecs_per_node, ncpus);
>  
>  		/* Account for rounding errors */
> -		extra_vecs = ncpus - vecs_to_assign;
> +		extra_vecs = ncpus - vecs_to_assign * (ncpus / vecs_to_assign);
>  
>  		for (v = 0; curvec < last_affv && v < vecs_to_assign;
>  		     curvec++, v++) {

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: irq/affinity: Fix extra vecs calculation
  2017-04-19 16:20 ` Andrei Vagin
@ 2017-04-19 17:03   ` Keith Busch
  2017-04-19 19:11     ` Andrei Vagin
  2017-04-19 19:53     ` Andrei Vagin
  0 siblings, 2 replies; 9+ messages in thread
From: Keith Busch @ 2017-04-19 17:03 UTC (permalink / raw)
  To: Andrei Vagin; +Cc: linux-kernel, Thomas Gleixner, Xiaolong Ye

On Wed, Apr 19, 2017 at 09:20:27AM -0700, Andrei Vagin wrote:
> Hi,
> 
> Something is wrong with this patch. We run CRIU tests for upstream kernels.
> And we found that a kernel with this patch can't be booted.
> 
> https://travis-ci.org/avagin/linux/builds/223557750
> 
> We don't have access to console logs and I can't reproduce this issue on
> my nodes. I tired to revert this patch and everything works as expected.
> 
> https://travis-ci.org/avagin/linux/builds/223594172
> 
> Here is another report about this patch
> https://lkml.org/lkml/2017/4/16/344

Yikes, okay, I've made a mistake somewhere. Sorry about that, I will
look into this ASAP.

If it's a divide by 0 as your last link indicates, that must mean there
are possible nodes, but have no CPUs, and those should be skipped. If
that's the case, the following should fix it, but I'm going to do some
more qemu testing with various CPU topologies to confirm.

---
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index d052947..80c45d0 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -105,6 +105,9 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 
 		/* Calculate the number of cpus per vector */
 		ncpus = cpumask_weight(nmsk);
+		if (!ncpus)
+			continue;
+
 		vecs_to_assign = min(vecs_per_node, ncpus);
 
 		/* Account for rounding errors */
--

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: irq/affinity: Fix extra vecs calculation
  2017-04-19 17:03   ` Keith Busch
@ 2017-04-19 19:11     ` Andrei Vagin
  2017-04-19 19:53     ` Andrei Vagin
  1 sibling, 0 replies; 9+ messages in thread
From: Andrei Vagin @ 2017-04-19 19:11 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-kernel, Thomas Gleixner, Xiaolong Ye

On Wed, Apr 19, 2017 at 01:03:59PM -0400, Keith Busch wrote:
> On Wed, Apr 19, 2017 at 09:20:27AM -0700, Andrei Vagin wrote:
> > Hi,
> > 
> > Something is wrong with this patch. We run CRIU tests for upstream kernels.
> > And we found that a kernel with this patch can't be booted.
> > 
> > https://travis-ci.org/avagin/linux/builds/223557750
> > 
> > We don't have access to console logs and I can't reproduce this issue on
> > my nodes. I tired to revert this patch and everything works as expected.
> > 
> > https://travis-ci.org/avagin/linux/builds/223594172
> > 
> > Here is another report about this patch
> > https://lkml.org/lkml/2017/4/16/344
> 
> Yikes, okay, I've made a mistake somewhere. Sorry about that, I will
> look into this ASAP.

Thank you

> 
> If it's a divide by 0 as your last link indicates, that must mean there
> are possible nodes, but have no CPUs, and those should be skipped. If
> that's the case, the following should fix it, but I'm going to do some
> more qemu testing with various CPU topologies to confirm.

This patch doesn't fix my problem
https://travis-ci.org/avagin/linux/builds/223674690

> 
> ---
> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> index d052947..80c45d0 100644
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -105,6 +105,9 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  
>  		/* Calculate the number of cpus per vector */
>  		ncpus = cpumask_weight(nmsk);
> +		if (!ncpus)
> +			continue;
> +
>  		vecs_to_assign = min(vecs_per_node, ncpus);
>  
>  		/* Account for rounding errors */
> --

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: irq/affinity: Fix extra vecs calculation
  2017-04-19 17:03   ` Keith Busch
  2017-04-19 19:11     ` Andrei Vagin
@ 2017-04-19 19:53     ` Andrei Vagin
  2017-04-19 21:53       ` Keith Busch
  1 sibling, 1 reply; 9+ messages in thread
From: Andrei Vagin @ 2017-04-19 19:53 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-kernel, Thomas Gleixner, Xiaolong Ye

On Wed, Apr 19, 2017 at 01:03:59PM -0400, Keith Busch wrote:
> On Wed, Apr 19, 2017 at 09:20:27AM -0700, Andrei Vagin wrote:
> > Hi,
> > 
> > Something is wrong with this patch. We run CRIU tests for upstream kernels.
> > And we found that a kernel with this patch can't be booted.
> > 
> > https://travis-ci.org/avagin/linux/builds/223557750
> > 
> > We don't have access to console logs and I can't reproduce this issue on
> > my nodes. I tired to revert this patch and everything works as expected.
> > 
> > https://travis-ci.org/avagin/linux/builds/223594172
> > 
> > Here is another report about this patch
> > https://lkml.org/lkml/2017/4/16/344
> 
> Yikes, okay, I've made a mistake somewhere. Sorry about that, I will
> look into this ASAP.
> 
> If it's a divide by 0 as your last link indicates, that must mean there
> are possible nodes, but have no CPUs, and those should be skipped. If
> that's the case, the following should fix it, but I'm going to do some
> more qemu testing with various CPU topologies to confirm.

I printed variables from my test host, I think this can help to
investigate the issue:

irq_create_affinity_masks:116: vecs_to_assign 0 ncpus 2 extra_vecs 2 vecs_per_node 0 affv 2 curvec 2 nodes 1

and here is a patch which I use to print them:

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index a073a6e..c43c85d 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -110,7 +110,10 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
                vecs_to_assign = min(vecs_per_node, ncpus);
 
                /* Account for rounding errors */
-               extra_vecs = ncpus - vecs_to_assign * (ncpus / vecs_to_assign);
+//             extra_vecs = ncpus - vecs_to_assign * (ncpus / vecs_to_assign);
+               extra_vecs = ncpus - vecs_to_assign;
+               printk("%s:%d: vecs_to_assign %d ncpus %d extra_vecs %d vecs_per_node %d affv %d curvec %d nodes %d\n",
+                       __func__, __LINE__, vecs_to_assign, ncpus, extra_vecs, vecs_per_node, affv, curvec, nodes);
 
                for (v = 0; curvec < last_affv && v < vecs_to_assign;
                     curvec++, v++) {


> 
> ---
> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> index d052947..80c45d0 100644
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -105,6 +105,9 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  
>  		/* Calculate the number of cpus per vector */
>  		ncpus = cpumask_weight(nmsk);
> +		if (!ncpus)
> +			continue;
> +
>  		vecs_to_assign = min(vecs_per_node, ncpus);
>  
>  		/* Account for rounding errors */
> --

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: irq/affinity: Fix extra vecs calculation
  2017-04-19 19:53     ` Andrei Vagin
@ 2017-04-19 21:53       ` Keith Busch
  2017-04-19 22:32         ` Andrei Vagin
  0 siblings, 1 reply; 9+ messages in thread
From: Keith Busch @ 2017-04-19 21:53 UTC (permalink / raw)
  To: Andrei Vagin; +Cc: linux-kernel, Thomas Gleixner, Xiaolong Ye

On Wed, Apr 19, 2017 at 12:53:44PM -0700, Andrei Vagin wrote:
> On Wed, Apr 19, 2017 at 01:03:59PM -0400, Keith Busch wrote:
> > If it's a divide by 0 as your last link indicates, that must mean there
> > are possible nodes, but have no CPUs, and those should be skipped. If
> > that's the case, the following should fix it, but I'm going to do some
> > more qemu testing with various CPU topologies to confirm.
> 
> I printed variables from my test host, I think this can help to
> investigate the issue:
> 
> irq_create_affinity_masks:116: vecs_to_assign 0 ncpus 2 extra_vecs 2 vecs_per_node 0 affv 2 curvec 2 nodes 1

That explains a lot. This setup wants 2 "pre_vectors", but I didn't
know that was even a thing. This should fix it:

---
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index d052947..eb8b689 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -98,13 +98,16 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 		int ncpus, v, vecs_to_assign, vecs_per_node;
 
 		/* Spread the vectors per node */
-		vecs_per_node = (affv - curvec) / nodes;
+		vecs_per_node = (affv - (curvec - affd->pre_vectors)) / nodes;
 
 		/* Get the cpus on this node which are in the mask */
 		cpumask_and(nmsk, cpu_online_mask, cpumask_of_node(n));
--

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: irq/affinity: Fix extra vecs calculation
  2017-04-19 21:53       ` Keith Busch
@ 2017-04-19 22:32         ` Andrei Vagin
  2017-04-19 22:45           ` Keith Busch
  0 siblings, 1 reply; 9+ messages in thread
From: Andrei Vagin @ 2017-04-19 22:32 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-kernel, Thomas Gleixner, Xiaolong Ye

On Wed, Apr 19, 2017 at 05:53:09PM -0400, Keith Busch wrote:
> On Wed, Apr 19, 2017 at 12:53:44PM -0700, Andrei Vagin wrote:
> > On Wed, Apr 19, 2017 at 01:03:59PM -0400, Keith Busch wrote:
> > > If it's a divide by 0 as your last link indicates, that must mean there
> > > are possible nodes, but have no CPUs, and those should be skipped. If
> > > that's the case, the following should fix it, but I'm going to do some
> > > more qemu testing with various CPU topologies to confirm.
> > 
> > I printed variables from my test host, I think this can help to
> > investigate the issue:
> > 
> > irq_create_affinity_masks:116: vecs_to_assign 0 ncpus 2 extra_vecs 2 vecs_per_node 0 affv 2 curvec 2 nodes 1
> 
> That explains a lot. This setup wants 2 "pre_vectors", but I didn't
> know that was even a thing. This should fix it:

This patch works for me.
> 
> ---
> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> index d052947..eb8b689 100644
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -98,13 +98,16 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  		int ncpus, v, vecs_to_assign, vecs_per_node;
>  
>  		/* Spread the vectors per node */
> -		vecs_per_node = (affv - curvec) / nodes;
> +		vecs_per_node = (affv - (curvec - affd->pre_vectors)) / nodes;
>  
>  		/* Get the cpus on this node which are in the mask */
>  		cpumask_and(nmsk, cpu_online_mask, cpumask_of_node(n));
> --

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: irq/affinity: Fix extra vecs calculation
  2017-04-19 22:32         ` Andrei Vagin
@ 2017-04-19 22:45           ` Keith Busch
  0 siblings, 0 replies; 9+ messages in thread
From: Keith Busch @ 2017-04-19 22:45 UTC (permalink / raw)
  To: Andrei Vagin; +Cc: linux-kernel, Thomas Gleixner, Xiaolong Ye

On Wed, Apr 19, 2017 at 03:32:06PM -0700, Andrei Vagin wrote:
> This patch works for me.

Awesome, thank you much for confirming, and again, sorry for the
breakage. I see virtio-scsi is one reliable way to have reproduced this,
so I'll incorporate that into tests before posting future kernel core
patches.

I'll send this fix with change log shortly.

> > ---
> > diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> > index d052947..eb8b689 100644
> > --- a/kernel/irq/affinity.c
> > +++ b/kernel/irq/affinity.c
> > @@ -98,13 +98,16 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
> >  		int ncpus, v, vecs_to_assign, vecs_per_node;
> >  
> >  		/* Spread the vectors per node */
> > -		vecs_per_node = (affv - curvec) / nodes;
> > +		vecs_per_node = (affv - (curvec - affd->pre_vectors)) / nodes;
> >  
> >  		/* Get the cpus on this node which are in the mask */
> >  		cpumask_and(nmsk, cpu_online_mask, cpumask_of_node(n));
> > --

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-04-19 22:36 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-13 17:28 [PATCH] irq/affinity: Fix extra vecs calculation Keith Busch
2017-04-13 21:46 ` [tip:irq/urgent] " tip-bot for Keith Busch
2017-04-19 16:20 ` Andrei Vagin
2017-04-19 17:03   ` Keith Busch
2017-04-19 19:11     ` Andrei Vagin
2017-04-19 19:53     ` Andrei Vagin
2017-04-19 21:53       ` Keith Busch
2017-04-19 22:32         ` Andrei Vagin
2017-04-19 22:45           ` Keith Busch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).