* Re: [PATCH] net/mlx5: Fix mlx5_get_vector_affinity function
@ 2018-05-05 14:38 Guenter Roeck
  2018-05-06  7:33 ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: Guenter Roeck @ 2018-05-05 14:38 UTC
  To: Israel Rukshin
  Cc: Max Gurtovoy, Matan Barak, Doug Ledford, linux-rdma,
	linux-kernel, netdev, Thomas Gleixner

On Thu, Apr 12, 2018 at 09:49:11AM +0000, Israel Rukshin wrote:
> Adding the vector offset when calling to mlx5_vector2eqn() is wrong.
> This is because mlx5_vector2eqn() checks if EQ index is equal to vector number
> and the fact that the internal completion vectors that mlx5 allocates
> don't get an EQ index.
> 
> The second problem here is that using effective_affinity_mask gives the same
> CPU for different vectors.
> This leads to unmapped queues when calling it from blk_mq_rdma_map_queues().
> This doesn't happen when using affinity_hint mask.
> 
Except that affinity_hint is only defined if SMP is enabled. Without:

include/linux/mlx5/driver.h: In function ‘mlx5_get_vector_affinity_hint’:
include/linux/mlx5/driver.h:1299:13: error:
        ‘struct irq_desc’ has no member named ‘affinity_hint’

Note that this is the only use of affinity_hint outside kernel/irq.
Don't other drivers have similar problems ?

Guenter
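
For readers following along, the build failure above can be reproduced in miniature outside the kernel. The sketch below is a userspace mock, not kernel code: `struct mock_irq_desc`, `struct mock_cpumask`, `MOCK_CONFIG_SMP` and `mock_get_affinity_hint()` are hypothetical names standing in for the real structure and config option. It shows a member that only exists when the option is set, and an accessor that guards the access so both configurations build.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cpumask: one bit per CPU. */
struct mock_cpumask {
	unsigned long bits;
};

/* Hypothetical stand-in for struct irq_desc. */
struct mock_irq_desc {
	unsigned int irq;
#ifdef MOCK_CONFIG_SMP
	/* Only present in "SMP" builds, mirroring the reported error. */
	const struct mock_cpumask *affinity_hint;
#endif
};

/*
 * An unguarded "return desc->affinity_hint;" does not compile when
 * MOCK_CONFIG_SMP is undefined. Guarding the access keeps both
 * configurations building; the non-SMP build reports no hint.
 */
static const struct mock_cpumask *
mock_get_affinity_hint(const struct mock_irq_desc *desc)
{
	assert(desc != NULL);
#ifdef MOCK_CONFIG_SMP
	return desc->affinity_hint;
#else
	return NULL;
#endif
}
```

This only illustrates the compile problem. The follow-up in this thread argues that a driver should not reach into the descriptor at all, so a guard like this would not be an acceptable fix on its own.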

> Fixes: 2572cf57d75a ("mlx5: fix mlx5_get_vector_affinity to start from completion vector 0")
> Fixes: 05e0cc84e00c ("net/mlx5: Fix get vector affinity helper function")
> Signed-off-by: Israel Rukshin <israelr@mellanox.com>
> Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> ---
>  drivers/infiniband/hw/mlx5/main.c |  2 +-
>  include/linux/mlx5/driver.h       | 12 +++---------
>  2 files changed, 4 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> index daa919e5a442..241cf4ff9901 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -4757,7 +4757,7 @@ mlx5_ib_get_vector_affinity(struct ib_device *ibdev, int comp_vector)
>  {
>  	struct mlx5_ib_dev *dev = to_mdev(ibdev);
>  
> -	return mlx5_get_vector_affinity(dev->mdev, comp_vector);
> +	return mlx5_get_vector_affinity_hint(dev->mdev, comp_vector);
>  }
>  
>  /* The mlx5_ib_multiport_mutex should be held when calling this function */
> diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
> index 767d193c269a..2a156c5dfadd 100644
> --- a/include/linux/mlx5/driver.h
> +++ b/include/linux/mlx5/driver.h
> @@ -1284,25 +1284,19 @@ enum {
>  };
>  
>  static inline const struct cpumask *
> -mlx5_get_vector_affinity(struct mlx5_core_dev *dev, int vector)
> +mlx5_get_vector_affinity_hint(struct mlx5_core_dev *dev, int vector)
>  {
> -	const struct cpumask *mask;
>  	struct irq_desc *desc;
>  	unsigned int irq;
>  	int eqn;
>  	int err;
>  
> -	err = mlx5_vector2eqn(dev, MLX5_EQ_VEC_COMP_BASE + vector, &eqn, &irq);
> +	err = mlx5_vector2eqn(dev, vector, &eqn, &irq);
>  	if (err)
>  		return NULL;
>  
>  	desc = irq_to_desc(irq);
> -#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK
> -	mask = irq_data_get_effective_affinity_mask(&desc->irq_data);
> -#else
> -	mask = desc->irq_common_data.affinity;
> -#endif
> -	return mask;
> +	return desc->affinity_hint;
>  }
>  
>  #endif /* MLX5_DRIVER_H */
> -- 
> 2.7.4
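
The second problem described in the commit message (identical masks for different vectors leading to unmapped queues) can be modelled with a small userspace sketch. This is an assumption-laden simplification, not the actual blk_mq_rdma_map_queues() code: `mock_map_queues()`, `mock_count_unmapped_queues()`, `MOCK_NR_CPUS` and the plain bitmask representation are all hypothetical. The model assigns every CPU in a queue's mask to that queue, so a later queue with the same mask overwrites the earlier one.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_NR_CPUS   4
#define MOCK_UNMAPPED  (-1)

/*
 * Hypothetical model of a per-queue mapping loop: every CPU set in
 * masks[q] is assigned to queue q. Bit c of masks[q] means CPU c.
 */
static void mock_map_queues(const unsigned long *masks, int nr_queues,
			    int *cpu_to_queue)
{
	int cpu, q;

	assert(masks != NULL && cpu_to_queue != NULL);

	for (cpu = 0; cpu < MOCK_NR_CPUS; cpu++)
		cpu_to_queue[cpu] = MOCK_UNMAPPED;

	for (q = 0; q < nr_queues; q++)
		for (cpu = 0; cpu < MOCK_NR_CPUS; cpu++)
			if (masks[q] & (1UL << cpu))
				cpu_to_queue[cpu] = q;
}

/* Count the queues that ended up with no CPU mapped to them. */
static int mock_count_unmapped_queues(const int *cpu_to_queue, int nr_queues)
{
	int cpu, q, unmapped = 0;

	for (q = 0; q < nr_queues; q++) {
		int used = 0;

		for (cpu = 0; cpu < MOCK_NR_CPUS; cpu++)
			if (cpu_to_queue[cpu] == q)
				used = 1;
		if (!used)
			unmapped++;
	}
	return unmapped;
}
```

In this model, two vectors whose masks both contain only CPU 0 leave queue 0 without any CPU, because queue 1 overwrites the entry. With disjoint masks such as 0x3 and 0xc, every queue gets at least one CPU.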

* Re: [PATCH] net/mlx5: Fix mlx5_get_vector_affinity function
  2018-05-05 14:38 [PATCH] net/mlx5: Fix mlx5_get_vector_affinity function Guenter Roeck
@ 2018-05-06  7:33 ` Thomas Gleixner
  2018-05-06  7:43   ` Thomas Gleixner
  2018-05-09 22:19   ` Guenter Roeck
  0 siblings, 2 replies; 5+ messages in thread
From: Thomas Gleixner @ 2018-05-06  7:33 UTC
  To: Guenter Roeck
  Cc: Israel Rukshin, Max Gurtovoy, Matan Barak, Doug Ledford,
	linux-rdma, linux-kernel, netdev

On Sat, 5 May 2018, Guenter Roeck wrote:

> On Thu, Apr 12, 2018 at 09:49:11AM +0000, Israel Rukshin wrote:
> > Adding the vector offset when calling to mlx5_vector2eqn() is wrong.
> > This is because mlx5_vector2eqn() checks if EQ index is equal to vector number
> > and the fact that the internal completion vectors that mlx5 allocates
> > don't get an EQ index.
> > 
> > The second problem here is that using effective_affinity_mask gives the same
> > CPU for different vectors.
> > This leads to unmapped queues when calling it from blk_mq_rdma_map_queues().
> > This doesn't happen when using affinity_hint mask.
> > 
> Except that affinity_hint is only defined if SMP is enabled. Without:
> 
> include/linux/mlx5/driver.h: In function ‘mlx5_get_vector_affinity_hint’:
> include/linux/mlx5/driver.h:1299:13: error:
>         ‘struct irq_desc’ has no member named ‘affinity_hint’
> 
> Note that this is the only use of affinity_hint outside kernel/irq.
> Don't other drivers have similar problems ?

Aside of that.

> >  static inline const struct cpumask *
> > -mlx5_get_vector_affinity(struct mlx5_core_dev *dev, int vector)
> > +mlx5_get_vector_affinity_hint(struct mlx5_core_dev *dev, int vector)
> >  {
> > -	const struct cpumask *mask;
> >  	struct irq_desc *desc;
> >  	unsigned int irq;
> >  	int eqn;
> >  	int err;
> >  
> > -	err = mlx5_vector2eqn(dev, MLX5_EQ_VEC_COMP_BASE + vector, &eqn, &irq);
> > +	err = mlx5_vector2eqn(dev, vector, &eqn, &irq);
> >  	if (err)
> >  		return NULL;
> >  
> >  	desc = irq_to_desc(irq);
> > -#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK
> > -	mask = irq_data_get_effective_affinity_mask(&desc->irq_data);
> > -#else
> > -	mask = desc->irq_common_data.affinity;
> > -#endif
> > -	return mask;
> > +	return desc->affinity_hint;

NAK.

Nothing in regular device drivers is supposed to ever fiddle with struct
irq_desc. The existing code is already a violation of that rule and needs
to be fixed, but not in that way.

The logic here is completely screwed. affinity_hint is set by the driver,
so the driver already knows what it is. If the driver does not set it, then
the thing is NULL.

Thanks,

	tglx
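
The point that the driver already knows the hint it set can be sketched as follows. This is a userspace mock under stated assumptions, not the fix that was eventually applied and not any real mlx5 or IRQ core API: `struct mock_dev`, `mock_set_vector_hint()` and `mock_get_vector_hint()` are hypothetical. The idea is that the driver records the mask at the point where it would hand the hint to the IRQ core, and later answers queries from its own state instead of reading struct irq_desc.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_MAX_VECTORS 8

/* Hypothetical stand-in for a cpumask: one bit per CPU. */
struct mock_cpumask {
	unsigned long bits;
};

/* Hypothetical driver-private state: one slot per completion vector. */
struct mock_dev {
	int nr_vectors;
	struct mock_cpumask hint[MOCK_MAX_VECTORS];
	int hint_set[MOCK_MAX_VECTORS];
};

/*
 * Called where the driver would pass its hint to the IRQ core:
 * keep a private copy as well. Returns 0 on success, -1 on a bad vector.
 */
static int mock_set_vector_hint(struct mock_dev *dev, int vector,
				unsigned long bits)
{
	assert(dev != NULL);
	if (vector < 0 || vector >= dev->nr_vectors)
		return -1;
	dev->hint[vector].bits = bits;
	dev->hint_set[vector] = 1;
	return 0;
}

/* Answer from driver-owned state; NULL if no hint was ever set. */
static const struct mock_cpumask *
mock_get_vector_hint(const struct mock_dev *dev, int vector)
{
	assert(dev != NULL);
	if (vector < 0 || vector >= dev->nr_vectors || !dev->hint_set[vector])
		return NULL;
	return &dev->hint[vector];
}
```

A lookup for a vector that never had a hint set returns NULL, which matches the behaviour described above for a driver that does not set one.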


* Re: [PATCH] net/mlx5: Fix mlx5_get_vector_affinity function
  2018-05-06  7:33 ` Thomas Gleixner
@ 2018-05-06  7:43   ` Thomas Gleixner
  2018-05-09 22:19   ` Guenter Roeck
  1 sibling, 0 replies; 5+ messages in thread
From: Thomas Gleixner @ 2018-05-06  7:43 UTC
  To: Guenter Roeck
  Cc: Israel Rukshin, Max Gurtovoy, Matan Barak, Doug Ledford,
	linux-rdma, linux-kernel, netdev

On Sun, 6 May 2018, Thomas Gleixner wrote:
> On Sat, 5 May 2018, Guenter Roeck wrote:
> > > -#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK
> > > -	mask = irq_data_get_effective_affinity_mask(&desc->irq_data);
> > > -#else
> > > -	mask = desc->irq_common_data.affinity;
> > > -#endif
> > > -	return mask;
> > > +	return desc->affinity_hint;
> 
> NAK.
> 
> Nothing in regular device drivers is supposed to ever fiddle with struct
> irq_desc. The existing code is already a violation of that rule and needs
> to be fixed, but not in that way.
> 
> The logic here is completely screwed. affinity_hint is set by the driver,
> so the driver already knows what it is. If the driver does not set it, then
> the thing is NULL.

And this completely insane fiddling with irq_desc is in MLX4 as
well. Dammit, why can't people respect subsystem boundaries instead of
fiddling with everything just because they can? If there is something missing at the
core level then please talk to the maintainers instead of hacking utter
crap into your driver.

Yours grumpy

      tglx

* Re: [PATCH] net/mlx5: Fix mlx5_get_vector_affinity function
  2018-05-06  7:33 ` Thomas Gleixner
  2018-05-06  7:43   ` Thomas Gleixner
@ 2018-05-09 22:19   ` Guenter Roeck
  2018-05-09 23:19     ` Saeed Mahameed
  1 sibling, 1 reply; 5+ messages in thread
From: Guenter Roeck @ 2018-05-09 22:19 UTC
  To: Thomas Gleixner
  Cc: Israel Rukshin, Max Gurtovoy, Matan Barak, Doug Ledford,
	linux-rdma, linux-kernel, netdev

On Sun, May 06, 2018 at 09:33:26AM +0200, Thomas Gleixner wrote:
> On Sat, 5 May 2018, Guenter Roeck wrote:
> 
> > On Thu, Apr 12, 2018 at 09:49:11AM +0000, Israel Rukshin wrote:
> > > Adding the vector offset when calling to mlx5_vector2eqn() is wrong.
> > > This is because mlx5_vector2eqn() checks if EQ index is equal to vector number
> > > and the fact that the internal completion vectors that mlx5 allocates
> > > don't get an EQ index.
> > > 
> > > The second problem here is that using effective_affinity_mask gives the same
> > > CPU for different vectors.
> > > This leads to unmapped queues when calling it from blk_mq_rdma_map_queues().
> > > This doesn't happen when using affinity_hint mask.
> > > 
> > Except that affinity_hint is only defined if SMP is enabled. Without:
> > 
> > include/linux/mlx5/driver.h: In function ‘mlx5_get_vector_affinity_hint’:
> > include/linux/mlx5/driver.h:1299:13: error:
> >         ‘struct irq_desc’ has no member named ‘affinity_hint’
> > 
> > Note that this is the only use of affinity_hint outside kernel/irq.
> > Don't other drivers have similar problems ?
> 
> Aside of that.
> 
> > >  static inline const struct cpumask *
> > > -mlx5_get_vector_affinity(struct mlx5_core_dev *dev, int vector)
> > > +mlx5_get_vector_affinity_hint(struct mlx5_core_dev *dev, int vector)
> > >  {
> > > -	const struct cpumask *mask;
> > >  	struct irq_desc *desc;
> > >  	unsigned int irq;
> > >  	int eqn;
> > >  	int err;
> > >  
> > > -	err = mlx5_vector2eqn(dev, MLX5_EQ_VEC_COMP_BASE + vector, &eqn, &irq);
> > > +	err = mlx5_vector2eqn(dev, vector, &eqn, &irq);
> > >  	if (err)
> > >  		return NULL;
> > >  
> > >  	desc = irq_to_desc(irq);
> > > -#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK
> > > -	mask = irq_data_get_effective_affinity_mask(&desc->irq_data);
> > > -#else
> > > -	mask = desc->irq_common_data.affinity;
> > > -#endif
> > > -	return mask;
> > > +	return desc->affinity_hint;
> 
> NAK.
> 

The offending patch is upstream, breaking non-SMP test builds, and I have
not seen any feedback from the submitter. Any suggestion how to proceed ?

Guenter

> Nothing in regular device drivers is supposed to ever fiddle with struct
> irq_desc. The existing code is already a violation of that rule and needs
> to be fixed, but not in that way.
> 
> The logic here is completely screwed. affinity_hint is set by the driver,
> so the driver already knows what it is. If the driver does not set it, then
> the thing is NULL.
> 
> Thanks,
> 
> 	tglx
> 

* Re: [PATCH] net/mlx5: Fix mlx5_get_vector_affinity function
  2018-05-09 22:19   ` Guenter Roeck
@ 2018-05-09 23:19     ` Saeed Mahameed
  0 siblings, 0 replies; 5+ messages in thread
From: Saeed Mahameed @ 2018-05-09 23:19 UTC
  To: linux, tglx
  Cc: netdev, Max Gurtovoy, Israel Rukshin, linux-rdma, Matan Barak,
	dledford, linux-kernel

On Wed, 2018-05-09 at 15:19 -0700, Guenter Roeck wrote:
> On Sun, May 06, 2018 at 09:33:26AM +0200, Thomas Gleixner wrote:
> > On Sat, 5 May 2018, Guenter Roeck wrote:
> > 
> > > On Thu, Apr 12, 2018 at 09:49:11AM +0000, Israel Rukshin wrote:
> > > > Adding the vector offset when calling to mlx5_vector2eqn() is wrong.
> > > > This is because mlx5_vector2eqn() checks if EQ index is equal to vector number
> > > > and the fact that the internal completion vectors that mlx5 allocates
> > > > don't get an EQ index.
> > > > 
> > > > The second problem here is that using effective_affinity_mask gives the same
> > > > CPU for different vectors.
> > > > This leads to unmapped queues when calling it from blk_mq_rdma_map_queues().
> > > > This doesn't happen when using affinity_hint mask.
> > > > 
> > > Except that affinity_hint is only defined if SMP is enabled. Without:
> > > 
> > > include/linux/mlx5/driver.h: In function ‘mlx5_get_vector_affinity_hint’:
> > > include/linux/mlx5/driver.h:1299:13: error:
> > >         ‘struct irq_desc’ has no member named ‘affinity_hint’
> > > 
> > > Note that this is the only use of affinity_hint outside kernel/irq.
> > > Don't other drivers have similar problems ?
> > 
> > Aside of that.
> > 
> > > >  static inline const struct cpumask *
> > > > -mlx5_get_vector_affinity(struct mlx5_core_dev *dev, int vector)
> > > > +mlx5_get_vector_affinity_hint(struct mlx5_core_dev *dev, int vector)
> > > >  {
> > > > -	const struct cpumask *mask;
> > > >  	struct irq_desc *desc;
> > > >  	unsigned int irq;
> > > >  	int eqn;
> > > >  	int err;
> > > >  
> > > > -	err = mlx5_vector2eqn(dev, MLX5_EQ_VEC_COMP_BASE + vector, &eqn, &irq);
> > > > +	err = mlx5_vector2eqn(dev, vector, &eqn, &irq);
> > > >  	if (err)
> > > >  		return NULL;
> > > >  
> > > >  	desc = irq_to_desc(irq);
> > > > -#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK
> > > > -	mask = irq_data_get_effective_affinity_mask(&desc->irq_data);
> > > > -#else
> > > > -	mask = desc->irq_common_data.affinity;
> > > > -#endif
> > > > -	return mask;
> > > > +	return desc->affinity_hint;
> > 
> > NAK.
> > 
> 
> The offending patch is upstream, breaking non-SMP test builds, and I have
> not seen any feedback from the submitter. Any suggestion how to proceed ?
> 
> Guenter
> 
> > 
> 

Hi Guenter and Thomas,

Max and Israel are handling this internally to find a solution that
provides the needed functionality for rdma and addresses your comments.

Thanks,
Saeed.
