linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] genirq/affinity: two improvements for __irq_build_affinity_masks
@ 2019-08-09 10:23 Ming Lei
  2019-08-09 10:23 ` [PATCH 1/2] genirq/affinity: improve __irq_build_affinity_masks() Ming Lei
  2019-08-09 10:23 ` [PATCH 2/2] genirq/affinity: spread vectors on node according to nr_cpu ratio Ming Lei
  0 siblings, 2 replies; 6+ messages in thread
From: Ming Lei @ 2019-08-09 10:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Ming Lei, Christoph Hellwig, Keith Busch,
	linux-nvme, Jon Derrick

Hi,

The 1st patch makes __irq_build_affinity_masks() more reliable, for
example by ensuring that all nodes are covered in the spread.

The 2nd patch spreads vectors on each node according to the ratio of
that node's CPU count to the number of all remaining CPUs to be spread,
which makes the vector assignment fairer. It also fixes the warning
reported by Jon Derrick.
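
As a rough illustration (hypothetical topology, not taken from Jon's
report): with two nodes of 24 and 8 CPUs and 16 vectors, the ratio
based spread gives the first node 16 * 24 / 32 = 12 vectors and the
second node (16 - 12) * 8 / 8 = 4 vectors, i.e. two CPUs per vector on
both nodes.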

Please review & comment!

Ming Lei (2):
  genirq/affinity: improve __irq_build_affinity_masks()
  genirq/affinity: spread vectors on node according to nr_cpu ratio

 kernel/irq/affinity.c | 46 +++++++++++++++++++++++++++++++------------
 1 file changed, 33 insertions(+), 13 deletions(-)

Cc: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <kbusch@kernel.org>
Cc: linux-nvme@lists.infradead.org
Cc: Jon Derrick <jonathan.derrick@intel.com>

-- 
2.20.1



* [PATCH 1/2] genirq/affinity: improve __irq_build_affinity_masks()
  2019-08-09 10:23 [PATCH 0/2] genirq/affinity: two improvements for __irq_build_affinity_masks Ming Lei
@ 2019-08-09 10:23 ` Ming Lei
  2019-08-09 14:42   ` Keith Busch
  2019-08-09 10:23 ` [PATCH 2/2] genirq/affinity: spread vectors on node according to nr_cpu ratio Ming Lei
  1 sibling, 1 reply; 6+ messages in thread
From: Ming Lei @ 2019-08-09 10:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Ming Lei, Christoph Hellwig, Keith Busch,
	linux-nvme, Jon Derrick

One invariant of __irq_build_affinity_masks() is that all CPUs in the
specified masks (cpu_mask AND node_to_cpumask for each node) should be
covered during the spread. Even after the requested number of vectors
has been reached, we still need to spread vectors among the remaining
CPUs so that every CPU ends up in one of the generated masks. A similar
policy is already applied in the 'numvecs <= nodes' case.

So remove the following check inside the loop:

	if (done >= numvecs)
		break;

Meanwhile, assign at least one vector to each remaining node once
'numvecs' vectors have already been spread.

Also, if the specified cpumask for a NUMA node is empty, simply do not
spread vectors on that node.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <kbusch@kernel.org>
Cc: linux-nvme@lists.infradead.org
Cc: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 kernel/irq/affinity.c | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 6fef48033f96..bc3652a2c61b 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -129,21 +129,32 @@ static int __irq_build_affinity_masks(unsigned int startvec,
 	for_each_node_mask(n, nodemsk) {
 		unsigned int ncpus, v, vecs_to_assign, vecs_per_node;
 
-		/* Spread the vectors per node */
-		vecs_per_node = (numvecs - (curvec - firstvec)) / nodes;
-
 		/* Get the cpus on this node which are in the mask */
 		cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);
-
-		/* Calculate the number of cpus per vector */
 		ncpus = cpumask_weight(nmsk);
+		if (!ncpus)
+			continue;
+
+		/*
+		 * Calculate the number of cpus per vector
+		 *
+		 * Spread the vectors evenly per node. If the requested
+		 * vector number has been reached, simply allocate one
+		 * vector for each remaining node so that all nodes can
+		 * be covered
+		 */
+		if (numvecs > done)
+			vecs_per_node = max_t(unsigned,
+					(numvecs - done) / nodes, 1);
+		else
+			vecs_per_node = 1;
+
 		vecs_to_assign = min(vecs_per_node, ncpus);
 
 		/* Account for rounding errors */
 		extra_vecs = ncpus - vecs_to_assign * (ncpus / vecs_to_assign);
 
-		for (v = 0; curvec < last_affv && v < vecs_to_assign;
-		     curvec++, v++) {
+		for (v = 0; v < vecs_to_assign; v++) {
 			cpus_per_vec = ncpus / vecs_to_assign;
 
 			/* Account for extra vectors to compensate rounding errors */
@@ -153,16 +164,14 @@ static int __irq_build_affinity_masks(unsigned int startvec,
 			}
 			irq_spread_init_one(&masks[curvec].mask, nmsk,
 						cpus_per_vec);
+			if (++curvec >= last_affv)
+				curvec = firstvec;
 		}
 
 		done += v;
-		if (done >= numvecs)
-			break;
-		if (curvec >= last_affv)
-			curvec = firstvec;
 		--nodes;
 	}
-	return done;
+	return done < numvecs ? done : numvecs;
 }
 
 /*
-- 
2.20.1



* [PATCH 2/2] genirq/affinity: spread vectors on node according to nr_cpu ratio
  2019-08-09 10:23 [PATCH 0/2] genirq/affinity: two improvements for __irq_build_affinity_masks Ming Lei
  2019-08-09 10:23 ` [PATCH 1/2] genirq/affinity: improve __irq_build_affinity_masks() Ming Lei
@ 2019-08-09 10:23 ` Ming Lei
  1 sibling, 0 replies; 6+ messages in thread
From: Ming Lei @ 2019-08-09 10:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Ming Lei, Christoph Hellwig, Keith Busch,
	linux-nvme, Jon Derrick

Currently __irq_build_affinity_masks() spreads vectors evenly per node,
but not all vectors may get spread when the NUMA nodes have different
numbers of CPUs, in which case the following warning in
irq_build_affinity_masks() can be triggered:

	if (nr_present < numvecs)
		WARN_ON(nr_present + nr_others < numvecs);

Improve the spreading algorithm by assigning vectors to each node
according to the ratio of that node's nr_cpu to nr_remaining_cpus.

This also fixes the reported warning.
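
A hypothetical two-node example (CPU counts made up, all CPUs present,
nodes visited in this order): node A has 8 CPUs, node B has 2 CPUs, and
numvecs = 6. With the current even per-node spread, node A gets
(6 - 0) / 2 = 3 vectors and node B gets min((6 - 3) / 1, 2) = 2
vectors, so only 5 of the 6 vectors are assigned and the warning fires.
With the ratio based spread, node A gets 6 * 8 / 10 = 4 vectors and
node B gets (6 - 4) * 2 / 2 = 2, so all 6 vectors are assigned.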

Cc: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <kbusch@kernel.org>
Cc: linux-nvme@lists.infradead.org
Cc: Jon Derrick <jonathan.derrick@intel.com>
Reported-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 kernel/irq/affinity.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index bc3652a2c61b..76f3d1b27d00 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -106,6 +106,7 @@ static int __irq_build_affinity_masks(unsigned int startvec,
 	unsigned int last_affv = firstvec + numvecs;
 	unsigned int curvec = startvec;
 	nodemask_t nodemsk = NODE_MASK_NONE;
+	unsigned remaining_cpus = 0;
 
 	if (!cpumask_weight(cpu_mask))
 		return 0;
@@ -126,6 +127,11 @@ static int __irq_build_affinity_masks(unsigned int startvec,
 		return numvecs;
 	}
 
+	for_each_node_mask(n, nodemsk) {
+		cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);
+		remaining_cpus += cpumask_weight(nmsk);
+	}
+
 	for_each_node_mask(n, nodemsk) {
 		unsigned int ncpus, v, vecs_to_assign, vecs_per_node;
 
@@ -135,17 +141,22 @@ static int __irq_build_affinity_masks(unsigned int startvec,
 		if (!ncpus)
 			continue;
 
+		if (remaining_cpus == 0)
+			break;
+
 		/*
 		 * Calculate the number of cpus per vector
 		 *
-		 * Spread the vectors evenly per node. If the requested
-		 * vector number has been reached, simply allocate one
-		 * vector for each remaining node so that all nodes can
-		 * be covered
+		 * Spread the vectors among CPUs on this node according
+		 * to the ratio of 'ncpus' to 'remaining_cpus'. If the
+		 * requested vector number has been reached, simply
+		 * spread one vector for each remaining node so that all
+		 * nodes can be covered
 		 */
 		if (numvecs > done)
 			vecs_per_node = max_t(unsigned,
-					(numvecs - done) / nodes, 1);
+					(numvecs - done) * ncpus /
+					remaining_cpus, 1);
 		else
 			vecs_per_node = 1;
 
@@ -169,7 +180,7 @@ static int __irq_build_affinity_masks(unsigned int startvec,
 		}
 
 		done += v;
-		--nodes;
+		remaining_cpus -= ncpus;
 	}
 	return done < numvecs ? done : numvecs;
 }
-- 
2.20.1



* Re: [PATCH 1/2] genirq/affinity: improve __irq_build_affinity_masks()
  2019-08-09 10:23 ` [PATCH 1/2] genirq/affinity: improve __irq_build_affinity_masks() Ming Lei
@ 2019-08-09 14:42   ` Keith Busch
  2019-08-09 23:05     ` Ming Lei
  0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2019-08-09 14:42 UTC (permalink / raw)
  To: Ming Lei
  Cc: Thomas Gleixner, linux-kernel, Christoph Hellwig, linux-nvme,
	Jon Derrick

On Fri, Aug 09, 2019 at 06:23:09PM +0800, Ming Lei wrote:
> One invariant of __irq_build_affinity_masks() is that all CPUs in the
> specified masks (cpu_mask AND node_to_cpumask for each node) should be
> covered during the spread. Even after the requested number of vectors
> has been reached, we still need to spread vectors among the remaining
> CPUs so that every CPU ends up in one of the generated masks. A similar
> policy is already applied in the 'numvecs <= nodes' case.
> 
> So remove the following check inside the loop:
> 
> 	if (done >= numvecs)
> 		break;
> 
> Meanwhile, assign at least one vector to each remaining node once
> 'numvecs' vectors have already been spread.
>
> Also, if the specified cpumask for a NUMA node is empty, simply do not
> spread vectors on that node.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Keith Busch <kbusch@kernel.org>
> Cc: linux-nvme@lists.infradead.org
> Cc: Jon Derrick <jonathan.derrick@intel.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  kernel/irq/affinity.c | 33 +++++++++++++++++++++------------
>  1 file changed, 21 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> index 6fef48033f96..bc3652a2c61b 100644
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -129,21 +129,32 @@ static int __irq_build_affinity_masks(unsigned int startvec,
>  	for_each_node_mask(n, nodemsk) {
>  		unsigned int ncpus, v, vecs_to_assign, vecs_per_node;
>  
> -		/* Spread the vectors per node */
> -		vecs_per_node = (numvecs - (curvec - firstvec)) / nodes;
> -
>  		/* Get the cpus on this node which are in the mask */
>  		cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);
> -
> -		/* Calculate the number of cpus per vector */
>  		ncpus = cpumask_weight(nmsk);
> +		if (!ncpus)
> +			continue;

This shouldn't be possible, right? The nodemsk we're looping wouldn't
have had that node set if no CPUs intersect node_to_cpumask for that
node, so the resulting cpumask should always have a non-zero weight.

> @@ -153,16 +164,14 @@ static int __irq_build_affinity_masks(unsigned int startvec,
>  			}
>  			irq_spread_init_one(&masks[curvec].mask, nmsk,
>  						cpus_per_vec);
> +			if (++curvec >= last_affv)
> +				curvec = firstvec;

I'm not so sure about wrapping the vector to share it across nodes. We
have enough vectors in this path to ensure each compute node can have
a unique one, and it's much cheaper to share these within nodes than
across them.

>  		}
>  
>  		done += v;
> -		if (done >= numvecs)
> -			break;
> -		if (curvec >= last_affv)
> -			curvec = firstvec;
>  		--nodes;
>  	}
> -	return done;
> +	return done < numvecs ? done : numvecs;
>  }


* Re: [PATCH 1/2] genirq/affinity: improve __irq_build_affinity_masks()
  2019-08-09 14:42   ` Keith Busch
@ 2019-08-09 23:05     ` Ming Lei
  2019-08-12  5:06       ` Ming Lei
  0 siblings, 1 reply; 6+ messages in thread
From: Ming Lei @ 2019-08-09 23:05 UTC (permalink / raw)
  To: Keith Busch
  Cc: Ming Lei, Thomas Gleixner, Jon Derrick,
	Linux Kernel Mailing List, linux-nvme, Christoph Hellwig

On Fri, Aug 9, 2019 at 10:44 PM Keith Busch <kbusch@kernel.org> wrote:
>
> On Fri, Aug 09, 2019 at 06:23:09PM +0800, Ming Lei wrote:
> > One invariant of __irq_build_affinity_masks() is that all CPUs in the
> > specified masks (cpu_mask AND node_to_cpumask for each node) should be
> > covered during the spread. Even after the requested number of vectors
> > has been reached, we still need to spread vectors among the remaining
> > CPUs so that every CPU ends up in one of the generated masks. A similar
> > policy is already applied in the 'numvecs <= nodes' case.
> >
> > So remove the following check inside the loop:
> >
> >       if (done >= numvecs)
> >               break;
> >
> > Meanwhile, assign at least one vector to each remaining node once
> > 'numvecs' vectors have already been spread.
> >
> > Also, if the specified cpumask for a NUMA node is empty, simply do not
> > spread vectors on that node.
> >
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Keith Busch <kbusch@kernel.org>
> > Cc: linux-nvme@lists.infradead.org
> > Cc: Jon Derrick <jonathan.derrick@intel.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  kernel/irq/affinity.c | 33 +++++++++++++++++++++------------
> >  1 file changed, 21 insertions(+), 12 deletions(-)
> >
> > diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> > index 6fef48033f96..bc3652a2c61b 100644
> > --- a/kernel/irq/affinity.c
> > +++ b/kernel/irq/affinity.c
> > @@ -129,21 +129,32 @@ static int __irq_build_affinity_masks(unsigned int startvec,
> >       for_each_node_mask(n, nodemsk) {
> >               unsigned int ncpus, v, vecs_to_assign, vecs_per_node;
> >
> > -             /* Spread the vectors per node */
> > -             vecs_per_node = (numvecs - (curvec - firstvec)) / nodes;
> > -
> >               /* Get the cpus on this node which are in the mask */
> >               cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);
> > -
> > -             /* Calculate the number of cpus per vector */
> >               ncpus = cpumask_weight(nmsk);
> > +             if (!ncpus)
> > +                     continue;
>
> This shouldn't be possible, right? The nodemsk we're looping wouldn't
> have had that node set if no CPUs intersect node_to_cpumask for that
> node, so the resulting cpumask should always have a non-zero weight.

     cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);

It is often true; see the following cases (a rough sketch of the
caller's two passes follows the list, for context):

1) all CPUs in one node are not present

OR

2) all CPUs in one node are present
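
(For context, the caller runs the spread in two passes, roughly as in
the sketch below; this is paraphrased, so the exact argument order may
differ from the tree:)

	/* 1st pass: spread vectors over the present CPUs */
	nr_present = __irq_build_affinity_masks(curvec, numvecs, firstvec,
						node_to_cpumask,
						cpu_present_mask, nmsk, masks);

	/* 2nd pass: spread over CPUs which are possible but not present */
	cpumask_andnot(npresmsk, cpu_possible_mask, cpu_present_mask);
	nr_others = __irq_build_affinity_masks(curvec, numvecs, firstvec,
						node_to_cpumask, npresmsk,
						nmsk, masks);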

>
> > @@ -153,16 +164,14 @@ static int __irq_build_affinity_masks(unsigned int startvec,
> >                       }
> >                       irq_spread_init_one(&masks[curvec].mask, nmsk,
> >                                               cpus_per_vec);
> > +                     if (++curvec >= last_affv)
> > +                             curvec = firstvec;
>
> I'm not so sure about wrapping the vector to share it across nodes. We

The wrapping is always there; it is not added by this patch.

Most of the time it won't happen, since we spread vectors on the
remaining (un-assigned) nodes. It only happens when some nodes are
still left un-spread, and we have to make sure all nodes are covered.

The same policy is already applied on the 'numvecs <= nodes' branch too.

> have enough vectors in this path to ensure each compute node can have
> a unique one, and it's much cheaper to share these within nodes than
> across them.

The patch just moves the wrapping from outside the loop into the loop,
so all 'extra_vecs' can be covered, because extra_vecs is always
< 'vecs_to_assign'.
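
For illustration (made-up numbers): with ncpus = 5 and vecs_to_assign =
2, extra_vecs = 5 - 2 * (5 / 2) = 1, so the first vector gets 3 CPUs
and the second gets 2. Since extra_vecs is effectively
ncpus % vecs_to_assign, it is always < vecs_to_assign and is consumed
inside the node's own inner loop.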

What matters is that the following check is removed:

   if (done >= numvecs)
                break;

then all nodes can be covered.

Thanks,
Ming Lei


* Re: [PATCH 1/2] genirq/affinity: improve __irq_build_affinity_masks()
  2019-08-09 23:05     ` Ming Lei
@ 2019-08-12  5:06       ` Ming Lei
  0 siblings, 0 replies; 6+ messages in thread
From: Ming Lei @ 2019-08-12  5:06 UTC (permalink / raw)
  To: Keith Busch
  Cc: Ming Lei, Thomas Gleixner, Jon Derrick,
	Linux Kernel Mailing List, linux-nvme, Christoph Hellwig

On Sat, Aug 10, 2019 at 7:05 AM Ming Lei <tom.leiming@gmail.com> wrote:
>
> On Fri, Aug 9, 2019 at 10:44 PM Keith Busch <kbusch@kernel.org> wrote:
> >
> > On Fri, Aug 09, 2019 at 06:23:09PM +0800, Ming Lei wrote:
> > > One invariant of __irq_build_affinity_masks() is that all CPUs in the
> > > specified masks (cpu_mask AND node_to_cpumask for each node) should be
> > > covered during the spread. Even after the requested number of vectors
> > > has been reached, we still need to spread vectors among the remaining
> > > CPUs so that every CPU ends up in one of the generated masks. A similar
> > > policy is already applied in the 'numvecs <= nodes' case.
> > >
> > > So remove the following check inside the loop:
> > >
> > >       if (done >= numvecs)
> > >               break;
> > >
> > > Meanwhile, assign at least one vector to each remaining node once
> > > 'numvecs' vectors have already been spread.
> > >
> > > Also, if the specified cpumask for a NUMA node is empty, simply do not
> > > spread vectors on that node.
> > >
> > > Cc: Christoph Hellwig <hch@lst.de>
> > > Cc: Keith Busch <kbusch@kernel.org>
> > > Cc: linux-nvme@lists.infradead.org
> > > Cc: Jon Derrick <jonathan.derrick@intel.com>
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > ---
> > >  kernel/irq/affinity.c | 33 +++++++++++++++++++++------------
> > >  1 file changed, 21 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> > > index 6fef48033f96..bc3652a2c61b 100644
> > > --- a/kernel/irq/affinity.c
> > > +++ b/kernel/irq/affinity.c
> > > @@ -129,21 +129,32 @@ static int __irq_build_affinity_masks(unsigned int startvec,
> > >       for_each_node_mask(n, nodemsk) {
> > >               unsigned int ncpus, v, vecs_to_assign, vecs_per_node;
> > >
> > > -             /* Spread the vectors per node */
> > > -             vecs_per_node = (numvecs - (curvec - firstvec)) / nodes;
> > > -
> > >               /* Get the cpus on this node which are in the mask */
> > >               cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);
> > > -
> > > -             /* Calculate the number of cpus per vector */
> > >               ncpus = cpumask_weight(nmsk);
> > > +             if (!ncpus)
> > > +                     continue;
> >
> > This shouldn't be possible, right? The nodemsk we're looping wouldn't
> > have had that node set if no CPUs intersect node_to_cpumask for that
> > node, so the resulting cpumask should always have a non-zero weight.
>
>      cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);
>
> It is often true, see the following cases:
>
> 1) all CPUs in one node are not present
>
> OR
>
> 2) all CPUs in one node are present
>
> >
> > > @@ -153,16 +164,14 @@ static int __irq_build_affinity_masks(unsigned int startvec,
> > >                       }
> > >                       irq_spread_init_one(&masks[curvec].mask, nmsk,
> > >                                               cpus_per_vec);
> > > +                     if (++curvec >= last_affv)
> > > +                             curvec = firstvec;
> >
> > I'm not so sure about wrapping the vector to share it across nodes. We
>
> The wrapping is always there, not added by this patch.

We could avoid the wrapping completely given 'numvecs' > 'nodes'; it
can be done by sorting the nodes by their nr_cpus. I will do it in V2.
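
Not part of this series; just a rough sketch of the idea, with
hypothetical names ('nv' holds one entry per node in 'nodemsk',
'nr_nodes' is their count), to show the direction for V2:

	struct node_vectors {
		unsigned int id;
		unsigned int ncpus;
	};

	static int ncpus_cmp_func(const void *l, const void *r)
	{
		const struct node_vectors *ln = l, *rn = r;

		return ln->ncpus - rn->ncpus;
	}

	/*
	 * Fill one node_vectors entry per node in 'nodemsk' (id + ncpus),
	 * then sort ascending by ncpus so the smallest nodes get their
	 * vectors first; whatever a small node cannot use stays available
	 * for the bigger nodes, and no wrap back to 'firstvec' is needed.
	 */
	sort(nv, nr_nodes, sizeof(nv[0]), ncpus_cmp_func, NULL);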


Thanks,
Ming Lei


end of thread

Thread overview: 6+ messages
2019-08-09 10:23 [PATCH 0/2] genirq/affinity: two improvements for __irq_build_affinity_masks Ming Lei
2019-08-09 10:23 ` [PATCH 1/2] genirq/affinity: improve __irq_build_affinity_masks() Ming Lei
2019-08-09 14:42   ` Keith Busch
2019-08-09 23:05     ` Ming Lei
2019-08-12  5:06       ` Ming Lei
2019-08-09 10:23 ` [PATCH 2/2] genirq/affinity: spread vectors on node according to nr_cpu ratio Ming Lei
