All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] net/mlx5: fix return value of start operation
@ 2018-01-18 13:00 Olivier Matz
  2018-01-18 13:00 ` [PATCH 2/2] net/mlx5: fix allocation when no memory on device NUMA node Olivier Matz
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Olivier Matz @ 2018-01-18 13:00 UTC (permalink / raw)
  To: dev, Adrien Mazarguil, Nelio Laranjeiro, Yongseok Koh; +Cc: stable

On error, mlx5_dev_start() does not return a negative value
as it is supposed to do. The consequence is that the application
(ex: testpmd) does not notice that the port is not started
and begins the rxtx on an uninitialized port, which crashes.

Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
Cc: stable@dpdk.org

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 drivers/net/mlx5/mlx5_trigger.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 1a20967a2..44f702daa 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -166,6 +166,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 		ERROR("%p: an error occurred while configuring control flows:"
 		      " %s",
 		      (void *)priv, strerror(err));
+		err = -err;
 		goto error;
 	}
 	err = priv_flow_start(priv, &priv->flows);
@@ -173,6 +174,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 		ERROR("%p: an error occurred while configuring flows:"
 		      " %s",
 		      (void *)priv, strerror(err));
+		err = -err;
 		goto error;
 	}
 	err = priv_rx_intr_vec_enable(priv);
@@ -196,7 +198,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	priv_rxq_stop(priv);
 	priv_flow_delete_drop_queue(priv);
 	priv_unlock(priv);
-	return -err;
+	return err;
 }
 
 /**
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/2] net/mlx5: fix allocation when no memory on device NUMA node
  2018-01-18 13:00 [PATCH 1/2] net/mlx5: fix return value of start operation Olivier Matz
@ 2018-01-18 13:00 ` Olivier Matz
  2018-01-18 16:19   ` Nélio Laranjeiro
  2018-01-18 16:04 ` [PATCH 1/2] net/mlx5: fix return value of start operation Nélio Laranjeiro
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Olivier Matz @ 2018-01-18 13:00 UTC (permalink / raw)
  To: dev, Adrien Mazarguil, Nelio Laranjeiro, Yongseok Koh; +Cc: stable

If there is no memory available on the same numa node than the
device, it is preferable to fallback on another socket instead
of failing.

Fixes: 1e3a39f72d5d ("net/mlx5: allocate verbs object into shared memory")
Cc: stable@dpdk.org

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 drivers/net/mlx5/mlx5.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 1c95f3520..312f3d5be 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -143,6 +143,10 @@ mlx5_alloc_verbs_buf(size_t size, void *data)
 	assert(data != NULL);
 	ret = rte_malloc_socket(__func__, size, alignment,
 				priv->dev->device->numa_node);
+	if (ret == NULL)
+		ret = rte_malloc_socket(__func__, size, alignment,
+					SOCKET_ID_ANY);
+
 	DEBUG("Extern alloc size: %lu, align: %lu: %p", size, alignment, ret);
 	return ret;
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] net/mlx5: fix return value of start operation
  2018-01-18 13:00 [PATCH 1/2] net/mlx5: fix return value of start operation Olivier Matz
  2018-01-18 13:00 ` [PATCH 2/2] net/mlx5: fix allocation when no memory on device NUMA node Olivier Matz
@ 2018-01-18 16:04 ` Nélio Laranjeiro
  2018-01-18 16:13   ` Olivier Matz
  2018-01-19 13:42 ` Olivier Matz
  2018-01-19 16:25 ` [PATCH v2 " Olivier Matz
  3 siblings, 1 reply; 19+ messages in thread
From: Nélio Laranjeiro @ 2018-01-18 16:04 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, Adrien Mazarguil, Yongseok Koh, stable

On Thu, Jan 18, 2018 at 02:00:42PM +0100, Olivier Matz wrote:
> On error, mlx5_dev_start() does not return a negative value
> as it is supposed to do. The consequence is that the application
> (ex: testpmd) does not notice that the port is not started
> and begins the rxtx on an uninitialized port, which crashes.
> 
> Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  drivers/net/mlx5/mlx5_trigger.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> index 1a20967a2..44f702daa 100644
> --- a/drivers/net/mlx5/mlx5_trigger.c
> +++ b/drivers/net/mlx5/mlx5_trigger.c
> @@ -166,6 +166,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
>  		ERROR("%p: an error occurred while configuring control flows:"
>  		      " %s",
>  		      (void *)priv, strerror(err));
> +		err = -err;
>  		goto error;
>  	}
>  	err = priv_flow_start(priv, &priv->flows);
> @@ -173,6 +174,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
>  		ERROR("%p: an error occurred while configuring flows:"
>  		      " %s",
>  		      (void *)priv, strerror(err));
> +		err = -err;
>  		goto error;
>  	}
>  	err = priv_rx_intr_vec_enable(priv);
> @@ -196,7 +198,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
>  	priv_rxq_stop(priv);
>  	priv_flow_delete_drop_queue(priv);
>  	priv_unlock(priv);
> -	return -err;
> +	return err;
>  }
>  
>  /**

err in the function is handled with positives errno's, adding only those
two and returning err will make the other positive.

Regards,

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] net/mlx5: fix return value of start operation
  2018-01-18 16:04 ` [PATCH 1/2] net/mlx5: fix return value of start operation Nélio Laranjeiro
@ 2018-01-18 16:13   ` Olivier Matz
  2018-01-19  6:28     ` Yongseok Koh
  2018-01-19  8:35     ` Nélio Laranjeiro
  0 siblings, 2 replies; 19+ messages in thread
From: Olivier Matz @ 2018-01-18 16:13 UTC (permalink / raw)
  To: Nélio Laranjeiro; +Cc: dev, Adrien Mazarguil, Yongseok Koh, stable

On Thu, Jan 18, 2018 at 05:04:27PM +0100, Nélio Laranjeiro wrote:
> On Thu, Jan 18, 2018 at 02:00:42PM +0100, Olivier Matz wrote:
> > On error, mlx5_dev_start() does not return a negative value
> > as it is supposed to do. The consequence is that the application
> > (ex: testpmd) does not notice that the port is not started
> > and begins the rxtx on an uninitialized port, which crashes.
> > 
> > Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > ---
> >  drivers/net/mlx5/mlx5_trigger.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> > index 1a20967a2..44f702daa 100644
> > --- a/drivers/net/mlx5/mlx5_trigger.c
> > +++ b/drivers/net/mlx5/mlx5_trigger.c
> > @@ -166,6 +166,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> >  		ERROR("%p: an error occurred while configuring control flows:"
> >  		      " %s",
> >  		      (void *)priv, strerror(err));
> > +		err = -err;
> >  		goto error;
> >  	}
> >  	err = priv_flow_start(priv, &priv->flows);
> > @@ -173,6 +174,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> >  		ERROR("%p: an error occurred while configuring flows:"
> >  		      " %s",
> >  		      (void *)priv, strerror(err));
> > +		err = -err;
> >  		goto error;
> >  	}
> >  	err = priv_rx_intr_vec_enable(priv);
> > @@ -196,7 +198,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> >  	priv_rxq_stop(priv);
> >  	priv_flow_delete_drop_queue(priv);
> >  	priv_unlock(priv);
> > -	return -err;
> > +	return err;
> >  }
> >  
> >  /**
> 
> err in the function is handled with positives errno's, adding only those
> two and returning err will make the other positive.

I tried to check the return value of all functions called by mlx5_dev_start()
(negative or positive). Do you see something wrong?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] net/mlx5: fix allocation when no memory on device NUMA node
  2018-01-18 13:00 ` [PATCH 2/2] net/mlx5: fix allocation when no memory on device NUMA node Olivier Matz
@ 2018-01-18 16:19   ` Nélio Laranjeiro
  0 siblings, 0 replies; 19+ messages in thread
From: Nélio Laranjeiro @ 2018-01-18 16:19 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, Adrien Mazarguil, Yongseok Koh, stable

On Thu, Jan 18, 2018 at 02:00:43PM +0100, Olivier Matz wrote:
> If there is no memory available on the same numa node than the
> device, it is preferable to fallback on another socket instead
> of failing.
> 
> Fixes: 1e3a39f72d5d ("net/mlx5: allocate verbs object into shared memory")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  drivers/net/mlx5/mlx5.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index 1c95f3520..312f3d5be 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -143,6 +143,10 @@ mlx5_alloc_verbs_buf(size_t size, void *data)
>  	assert(data != NULL);
>  	ret = rte_malloc_socket(__func__, size, alignment,
>  				priv->dev->device->numa_node);
> +	if (ret == NULL)
> +		ret = rte_malloc_socket(__func__, size, alignment,
> +					SOCKET_ID_ANY);
> +
>  	DEBUG("Extern alloc size: %lu, align: %lu: %p", size, alignment, ret);
>  	return ret;
>  }
> -- 
> 2.11.0
> 

This function is the finalisation of the creation of the queues and
contains the buffers to the CQ/WQ which must be on the correct socket
otherwise the performances will be limited.

Even if this function is only called on dev_start() it must reflect the
configuration requested from the application on the
rte_eth_{tx,rx}_queue_setup().

Regards,

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] net/mlx5: fix return value of start operation
  2018-01-18 16:13   ` Olivier Matz
@ 2018-01-19  6:28     ` Yongseok Koh
  2018-01-19  8:35     ` Nélio Laranjeiro
  1 sibling, 0 replies; 19+ messages in thread
From: Yongseok Koh @ 2018-01-19  6:28 UTC (permalink / raw)
  To: Olivier Matz; +Cc: Nélio Laranjeiro, dev, Adrien Mazarguil, stable

> On Jan 18, 2018, at 8:13 AM, Olivier Matz <olivier.matz@6wind.com> wrote:
> 
> On Thu, Jan 18, 2018 at 05:04:27PM +0100, Nélio Laranjeiro wrote:
>> On Thu, Jan 18, 2018 at 02:00:42PM +0100, Olivier Matz wrote:
>>> On error, mlx5_dev_start() does not return a negative value
>>> as it is supposed to do. The consequence is that the application
>>> (ex: testpmd) does not notice that the port is not started
>>> and begins the rxtx on an uninitialized port, which crashes.
>>> 
>>> Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
>>> Cc: stable@dpdk.org
>>> 
>>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
>>> ---
>>> drivers/net/mlx5/mlx5_trigger.c | 4 +++-
>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
>>> index 1a20967a2..44f702daa 100644
>>> --- a/drivers/net/mlx5/mlx5_trigger.c
>>> +++ b/drivers/net/mlx5/mlx5_trigger.c
>>> @@ -166,6 +166,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
>>> 		ERROR("%p: an error occurred while configuring control flows:"
>>> 		      " %s",
>>> 		      (void *)priv, strerror(err));
>>> +		err = -err;
>>> 		goto error;
>>> 	}
>>> 	err = priv_flow_start(priv, &priv->flows);
>>> @@ -173,6 +174,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
>>> 		ERROR("%p: an error occurred while configuring flows:"
>>> 		      " %s",
>>> 		      (void *)priv, strerror(err));
>>> +		err = -err;
>>> 		goto error;
>>> 	}
>>> 	err = priv_rx_intr_vec_enable(priv);
>>> @@ -196,7 +198,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
>>> 	priv_rxq_stop(priv);
>>> 	priv_flow_delete_drop_queue(priv);
>>> 	priv_unlock(priv);
>>> -	return -err;
>>> +	return err;
>>> }
>>> 
>>> /**
>> 
>> err in the function is handled with positives errno's, adding only those
>> two and returning err will make the other positive.
> 
> I tried to check the return value of all functions called by mlx5_dev_start()
> (negative or positive). Do you see something wrong?

Those two func calls have been moved recently. [1]
Please rebase it on top of dpdk-next-net-mlx/for-next-net

Then, the last change is okay to return negative values.

Nelio, we should make all the return values consistent someday, shouldn't we?

[1] http://dpdk.org/browse/next/dpdk-next-net-mlx/commit/?h=for-next-net&id=ed3d6afc9295bc16ab9ed2cad26af0c8cd9bd14e

Thanks,
Yongseok


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] net/mlx5: fix return value of start operation
  2018-01-18 16:13   ` Olivier Matz
  2018-01-19  6:28     ` Yongseok Koh
@ 2018-01-19  8:35     ` Nélio Laranjeiro
  2018-01-19  8:43       ` Olivier Matz
  1 sibling, 1 reply; 19+ messages in thread
From: Nélio Laranjeiro @ 2018-01-19  8:35 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, Adrien Mazarguil, Yongseok Koh, stable

On Thu, Jan 18, 2018 at 05:13:08PM +0100, Olivier Matz wrote:
> On Thu, Jan 18, 2018 at 05:04:27PM +0100, Nélio Laranjeiro wrote:
> > On Thu, Jan 18, 2018 at 02:00:42PM +0100, Olivier Matz wrote:
> > > On error, mlx5_dev_start() does not return a negative value
> > > as it is supposed to do. The consequence is that the application
> > > (ex: testpmd) does not notice that the port is not started
> > > and begins the rxtx on an uninitialized port, which crashes.
> > > 
> > > Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
> > > Cc: stable@dpdk.org
> > > 
> > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > ---
> > >  drivers/net/mlx5/mlx5_trigger.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> > > index 1a20967a2..44f702daa 100644
> > > --- a/drivers/net/mlx5/mlx5_trigger.c
> > > +++ b/drivers/net/mlx5/mlx5_trigger.c
> > > @@ -166,6 +166,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > >  		ERROR("%p: an error occurred while configuring control flows:"
> > >  		      " %s",
> > >  		      (void *)priv, strerror(err));
> > > +		err = -err;
> > >  		goto error;
> > >  	}
> > >  	err = priv_flow_start(priv, &priv->flows);
> > > @@ -173,6 +174,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > >  		ERROR("%p: an error occurred while configuring flows:"
> > >  		      " %s",
> > >  		      (void *)priv, strerror(err));
> > > +		err = -err;
> > >  		goto error;
> > >  	}
> > >  	err = priv_rx_intr_vec_enable(priv);
> > > @@ -196,7 +198,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > >  	priv_rxq_stop(priv);
> > >  	priv_flow_delete_drop_queue(priv);
> > >  	priv_unlock(priv);
> > > -	return -err;
> > > +	return err;
> > >  }
> > >  
> > >  /**
> > 
> > err in the function is handled with positives errno's, adding only those
> > two and returning err will make the other positive.
> 
> I tried to check the return value of all functions called by mlx5_dev_start()
> (negative or positive). Do you see something wrong?

What I mean is priv_flow_start() is returning a positive errno as all
other functions priv_*() that's the main reason for the final rteurn -err;

Internally MLX5 driver only works with positives errnos as lot of them
are retuning the values from ioctl directly.  Only the public ones are
returning negatives.

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] net/mlx5: fix return value of start operation
  2018-01-19  8:35     ` Nélio Laranjeiro
@ 2018-01-19  8:43       ` Olivier Matz
  2018-01-19 13:30         ` Olivier Matz
  0 siblings, 1 reply; 19+ messages in thread
From: Olivier Matz @ 2018-01-19  8:43 UTC (permalink / raw)
  To: Nélio Laranjeiro; +Cc: dev, Adrien Mazarguil, Yongseok Koh, stable

On Fri, Jan 19, 2018 at 09:35:01AM +0100, Nélio Laranjeiro wrote:
> On Thu, Jan 18, 2018 at 05:13:08PM +0100, Olivier Matz wrote:
> > On Thu, Jan 18, 2018 at 05:04:27PM +0100, Nélio Laranjeiro wrote:
> > > On Thu, Jan 18, 2018 at 02:00:42PM +0100, Olivier Matz wrote:
> > > > On error, mlx5_dev_start() does not return a negative value
> > > > as it is supposed to do. The consequence is that the application
> > > > (ex: testpmd) does not notice that the port is not started
> > > > and begins the rxtx on an uninitialized port, which crashes.
> > > > 
> > > > Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
> > > > Cc: stable@dpdk.org
> > > > 
> > > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > > ---
> > > >  drivers/net/mlx5/mlx5_trigger.c | 4 +++-
> > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> > > > index 1a20967a2..44f702daa 100644
> > > > --- a/drivers/net/mlx5/mlx5_trigger.c
> > > > +++ b/drivers/net/mlx5/mlx5_trigger.c
> > > > @@ -166,6 +166,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > >  		ERROR("%p: an error occurred while configuring control flows:"
> > > >  		      " %s",
> > > >  		      (void *)priv, strerror(err));
> > > > +		err = -err;
> > > >  		goto error;
> > > >  	}
> > > >  	err = priv_flow_start(priv, &priv->flows);
> > > > @@ -173,6 +174,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > >  		ERROR("%p: an error occurred while configuring flows:"
> > > >  		      " %s",
> > > >  		      (void *)priv, strerror(err));
> > > > +		err = -err;
> > > >  		goto error;
> > > >  	}
> > > >  	err = priv_rx_intr_vec_enable(priv);
> > > > @@ -196,7 +198,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > >  	priv_rxq_stop(priv);
> > > >  	priv_flow_delete_drop_queue(priv);
> > > >  	priv_unlock(priv);
> > > > -	return -err;
> > > > +	return err;
> > > >  }
> > > >  
> > > >  /**
> > > 
> > > err in the function is handled with positives errno's, adding only those
> > > two and returning err will make the other positive.
> > 
> > I tried to check the return value of all functions called by mlx5_dev_start()
> > (negative or positive). Do you see something wrong?
> 
> What I mean is priv_flow_start() is returning a positive errno as all
> other functions priv_*() that's the main reason for the final rteurn -err;
> 
> Internally MLX5 driver only works with positives errnos as lot of them
> are retuning the values from ioctl directly.  Only the public ones are
> returning negatives.

So, what should I modify in this patch for v2?
Do you agree that there is a bug regarding the return value of mlx5_dev_start()?

It can be reproduced easily by starting testpmd on a dual-socket machine,
use the core and memory from socket 0, and have the mlx device on socket 1.
Then start traffic forwarding, it will crash.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] net/mlx5: fix return value of start operation
  2018-01-19  8:43       ` Olivier Matz
@ 2018-01-19 13:30         ` Olivier Matz
  2018-01-19 13:43           ` Nélio Laranjeiro
  2018-01-19 14:18           ` Nélio Laranjeiro
  0 siblings, 2 replies; 19+ messages in thread
From: Olivier Matz @ 2018-01-19 13:30 UTC (permalink / raw)
  To: Nélio Laranjeiro; +Cc: dev, Adrien Mazarguil, Yongseok Koh, stable

On Fri, Jan 19, 2018 at 09:43:14AM +0100, Olivier Matz wrote:
> On Fri, Jan 19, 2018 at 09:35:01AM +0100, Nélio Laranjeiro wrote:
> > On Thu, Jan 18, 2018 at 05:13:08PM +0100, Olivier Matz wrote:
> > > On Thu, Jan 18, 2018 at 05:04:27PM +0100, Nélio Laranjeiro wrote:
> > > > On Thu, Jan 18, 2018 at 02:00:42PM +0100, Olivier Matz wrote:
> > > > > On error, mlx5_dev_start() does not return a negative value
> > > > > as it is supposed to do. The consequence is that the application
> > > > > (ex: testpmd) does not notice that the port is not started
> > > > > and begins the rxtx on an uninitialized port, which crashes.
> > > > > 
> > > > > Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
> > > > > Cc: stable@dpdk.org
> > > > > 
> > > > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > > > ---
> > > > >  drivers/net/mlx5/mlx5_trigger.c | 4 +++-
> > > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> > > > > index 1a20967a2..44f702daa 100644
> > > > > --- a/drivers/net/mlx5/mlx5_trigger.c
> > > > > +++ b/drivers/net/mlx5/mlx5_trigger.c
> > > > > @@ -166,6 +166,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > > >  		ERROR("%p: an error occurred while configuring control flows:"
> > > > >  		      " %s",
> > > > >  		      (void *)priv, strerror(err));
> > > > > +		err = -err;
> > > > >  		goto error;
> > > > >  	}
> > > > >  	err = priv_flow_start(priv, &priv->flows);
> > > > > @@ -173,6 +174,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > > >  		ERROR("%p: an error occurred while configuring flows:"
> > > > >  		      " %s",
> > > > >  		      (void *)priv, strerror(err));
> > > > > +		err = -err;
> > > > >  		goto error;
> > > > >  	}
> > > > >  	err = priv_rx_intr_vec_enable(priv);
> > > > > @@ -196,7 +198,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > > >  	priv_rxq_stop(priv);
> > > > >  	priv_flow_delete_drop_queue(priv);
> > > > >  	priv_unlock(priv);
> > > > > -	return -err;
> > > > > +	return err;
> > > > >  }
> > > > >  
> > > > >  /**
> > > > 
> > > > err in the function is handled with positives errno's, adding only those
> > > > two and returning err will make the other positive.
> > > 
> > > I tried to check the return value of all functions called by mlx5_dev_start()
> > > (negative or positive). Do you see something wrong?
> > 
> > What I mean is priv_flow_start() is returning a positive errno as all
> > other functions priv_*() that's the main reason for the final rteurn -err;

priv_txq_start(priv);
  -> negative on error
priv_rxq_start(priv);
  -> negative on error
priv_dev_traffic_enable(priv, dev);
  -> positive (+errno) on error
priv_flow_start(priv, &priv->flows);
  -> positive (+errno) on error
priv_rx_intr_vec_enable(priv);
  -> negative (-1 or -errno) on error


> > Internally MLX5 driver only works with positives errnos as lot of them
> > are retuning the values from ioctl directly.  Only the public ones are
> > returning negatives.
> 
> So, what should I modify in this patch for v2?
> Do you agree that there is a bug regarding the return value of mlx5_dev_start()?
> 
> It can be reproduced easily by starting testpmd on a dual-socket machine,
> use the core and memory from socket 0, and have the mlx device on socket 1.
> Then start traffic forwarding, it will crash.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] net/mlx5: fix return value of start operation
  2018-01-18 13:00 [PATCH 1/2] net/mlx5: fix return value of start operation Olivier Matz
  2018-01-18 13:00 ` [PATCH 2/2] net/mlx5: fix allocation when no memory on device NUMA node Olivier Matz
  2018-01-18 16:04 ` [PATCH 1/2] net/mlx5: fix return value of start operation Nélio Laranjeiro
@ 2018-01-19 13:42 ` Olivier Matz
  2018-01-19 16:25 ` [PATCH v2 " Olivier Matz
  3 siblings, 0 replies; 19+ messages in thread
From: Olivier Matz @ 2018-01-19 13:42 UTC (permalink / raw)
  To: dev, Adrien Mazarguil, Nelio Laranjeiro, Yongseok Koh; +Cc: stable

On Thu, Jan 18, 2018 at 02:00:42PM +0100, Olivier Matz wrote:
> On error, mlx5_dev_start() does not return a negative value
> as it is supposed to do. The consequence is that the application
> (ex: testpmd) does not notice that the port is not started
> and begins the rxtx on an uninitialized port, which crashes.
> 
> Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>

For reference, here is how to reproduce the problem.


The topology of the target:

   socket 0                 socket 1
  +---------------------+  +---------------------+
  |        c0        c1 |  |        c0        c1 |
  | +-------+ +-------+ |  | +-------+ +-------+ |
  | |  0| 16| |  1| 17| |  | |  8| 24| |  9| 25| |
  | +-------+ +-------+ |  | +-------+ +-------+ |
  |        c2        c3 |  |        c2        c3 |
  | +-------+ +-------+ |  | +-------+ +-------+ |
  | |  2| 18| |  3| 19| |  | | 10| 26| | 11| 27| |
  | +-------+ +-------+ |  | +-------+ +-------+ |
  |        c4        c5 |  |        c4        c5 |
  | +-------+ +-------+ |  | +-------+ +-------+ |
  | |  4| 20| |  5| 21| |  | | 12| 28| | 13| 29| |
  | +-------+ +-------+ |  | +-------+ +-------+ |
  |        c6        c7 |  |        c6        c7 |
  | +-------+ +-------+ |  | +-------+ +-------+ |
  | |  6| 22| |  7| 23| |  | | 14| 30| | 15| 31| |
  | +-------+ +-------+ |  | +-------+ +-------+ |
  +---------------------+  +---------------------+

The cx4 devices are on socket 1, but I use cores and memory from
socket 0. I know it is not optimal, but it should work.



root@dut-cx4:~# cd dpdk.org
root@dut-cx4:~/dpdk.org# make config T=x86_64-native-linuxapp-gcc
root@dut-cx4:~/dpdk.org# make -j32
root@dut-cx4:~/dpdk.org# mkdir -p /mnt/huge
root@dut-cx4:~/dpdk.org# mount -t hugetlbfs nodev /mnt/huge
root@dut-cx4:~/dpdk.org# echo 256 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
root@dut-cx4:~/dpdk.org# testpmd -l 0,2 -- --total-num-mbufs=16384 -i --port-topology=chained --no-numa
EAL: Detected 32 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: PCI device 0000:02:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:02:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:02:00.2 on NUMA socket 0
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:02:00.3 on NUMA socket 0
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:04:00.0 on NUMA socket 0
EAL:   probe driver: 14e4:16d7 net_bnxt
EAL: PCI device 0000:04:00.1 on NUMA socket 0
EAL:   probe driver: 14e4:16d7 net_bnxt
EAL: PCI device 0000:06:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1521 net_e1000_igb
EAL: PCI device 0000:83:00.0 on NUMA socket 1
EAL:   probe driver: 8086:10fb net_ixgbe
EAL: PCI device 0000:83:00.1 on NUMA socket 1
EAL:   probe driver: 8086:10fb net_ixgbe
EAL: PCI device 0000:86:00.0 on NUMA socket 1
EAL:   probe driver: 15b3:1013 net_mlx5
PMD: net_mlx5: PCI information matches, using device "mlx5_0" (SR-IOV: false)
PMD: net_mlx5: 1 port(s) detected
PMD: net_mlx5: MPS is disabled
PMD: net_mlx5: port 1 MAC address is e4:1d:2d:e7:0d:06
EAL: PCI device 0000:86:00.1 on NUMA socket 1
EAL:   probe driver: 15b3:1013 net_mlx5
PMD: net_mlx5: PCI information matches, using device "mlx5_1" (SR-IOV: false)
PMD: net_mlx5: 1 port(s) detected
PMD: net_mlx5: MPS is disabled
PMD: net_mlx5: port 1 MAC address is e4:1d:2d:e7:0d:07
Interactive-mode selected
USER1: create a new mbuf pool <mbuf_pool_socket_0>: n=16384, size=2176, socket=0
Configuring Port 0 (socket 0)
PMD: net_mlx5: 0x1459e40: TX queues number update: 0 -> 1
PMD: net_mlx5: 0x1459e40: RX queues number update: 0 -> 1
PMD: net_mlx5: cannot allocate CQ for drop queue
PMD: net_mlx5: 0x1459e40: Drop queue allocation failed: Unknown error -1
Port 0: E4:1D:2D:E7:0D:06
Configuring Port 1 (socket 0)
PMD: net_mlx5: 0x145dec0: TX queues number update: 0 -> 1
PMD: net_mlx5: 0x145dec0: RX queues number update: 0 -> 1
PMD: net_mlx5: cannot allocate CQ for drop queue
PMD: net_mlx5: 0x145dec0: Drop queue allocation failed: Unknown error -1
Port 1: E4:1D:2D:E7:0D:07
Checking link statuses...
Done
testpmd> start
Segmentation fault (core dumped)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] net/mlx5: fix return value of start operation
  2018-01-19 13:30         ` Olivier Matz
@ 2018-01-19 13:43           ` Nélio Laranjeiro
  2018-01-19 14:18           ` Nélio Laranjeiro
  1 sibling, 0 replies; 19+ messages in thread
From: Nélio Laranjeiro @ 2018-01-19 13:43 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, Adrien Mazarguil, Yongseok Koh, stable

On Fri, Jan 19, 2018 at 02:30:45PM +0100, Olivier Matz wrote:
> On Fri, Jan 19, 2018 at 09:43:14AM +0100, Olivier Matz wrote:
> > On Fri, Jan 19, 2018 at 09:35:01AM +0100, Nélio Laranjeiro wrote:
> > > On Thu, Jan 18, 2018 at 05:13:08PM +0100, Olivier Matz wrote:
> > > > On Thu, Jan 18, 2018 at 05:04:27PM +0100, Nélio Laranjeiro wrote:
> > > > > On Thu, Jan 18, 2018 at 02:00:42PM +0100, Olivier Matz wrote:
> > > > > > On error, mlx5_dev_start() does not return a negative value
> > > > > > as it is supposed to do. The consequence is that the application
> > > > > > (ex: testpmd) does not notice that the port is not started
> > > > > > and begins the rxtx on an uninitialized port, which crashes.
> > > > > > 
> > > > > > Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
> > > > > > Cc: stable@dpdk.org
> > > > > > 
> > > > > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > > > > ---
> > > > > >  drivers/net/mlx5/mlx5_trigger.c | 4 +++-
> > > > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> > > > > > index 1a20967a2..44f702daa 100644
> > > > > > --- a/drivers/net/mlx5/mlx5_trigger.c
> > > > > > +++ b/drivers/net/mlx5/mlx5_trigger.c
> > > > > > @@ -166,6 +166,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > > > >  		ERROR("%p: an error occurred while configuring control flows:"
> > > > > >  		      " %s",
> > > > > >  		      (void *)priv, strerror(err));
> > > > > > +		err = -err;
> > > > > >  		goto error;
> > > > > >  	}
> > > > > >  	err = priv_flow_start(priv, &priv->flows);
> > > > > > @@ -173,6 +174,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > > > >  		ERROR("%p: an error occurred while configuring flows:"
> > > > > >  		      " %s",
> > > > > >  		      (void *)priv, strerror(err));
> > > > > > +		err = -err;
> > > > > >  		goto error;
> > > > > >  	}
> > > > > >  	err = priv_rx_intr_vec_enable(priv);
> > > > > > @@ -196,7 +198,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > > > >  	priv_rxq_stop(priv);
> > > > > >  	priv_flow_delete_drop_queue(priv);
> > > > > >  	priv_unlock(priv);
> > > > > > -	return -err;
> > > > > > +	return err;
> > > > > >  }
> > > > > >  
> > > > > >  /**
> > > > > 
> > > > > err in the function is handled with positives errno's, adding only those
> > > > > two and returning err will make the other positive.
> > > > 
> > > > I tried to check the return value of all functions called by mlx5_dev_start()
> > > > (negative or positive). Do you see something wrong?
> > > 
> > > What I mean is priv_flow_start() is returning a positive errno as all
> > > other functions priv_*() that's the main reason for the final rteurn -err;
> 
> priv_txq_start(priv);
>   -> negative on error
> priv_rxq_start(priv);
>   -> negative on error
> priv_dev_traffic_enable(priv, dev);
>   -> positive (+errno) on error
> priv_flow_start(priv, &priv->flows);
>   -> positive (+errno) on error
> priv_rx_intr_vec_enable(priv);
>   -> negative (-1 or -errno) on error

What a mess, it should be all positives...
 
> > > Internally MLX5 driver only works with positives errnos as lot of them
> > > are retuning the values from ioctl directly.  Only the public ones are
> > > returning negatives.
> > 
> > So, what should I modify in this patch for v2?
> > Do you agree that there is a bug regarding the return value of mlx5_dev_start()?
> > 
> > It can be reproduced easily by starting testpmd on a dual-socket machine,
> > use the core and memory from socket 0, and have the mlx device on socket 1.
> > Then start traffic forwarding, it will crash.

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] net/mlx5: fix return value of start operation
  2018-01-19 13:30         ` Olivier Matz
  2018-01-19 13:43           ` Nélio Laranjeiro
@ 2018-01-19 14:18           ` Nélio Laranjeiro
  1 sibling, 0 replies; 19+ messages in thread
From: Nélio Laranjeiro @ 2018-01-19 14:18 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, Adrien Mazarguil, Yongseok Koh, stable

On Fri, Jan 19, 2018 at 02:30:45PM +0100, Olivier Matz wrote:
> On Fri, Jan 19, 2018 at 09:43:14AM +0100, Olivier Matz wrote:
> > On Fri, Jan 19, 2018 at 09:35:01AM +0100, Nélio Laranjeiro wrote:
> > > On Thu, Jan 18, 2018 at 05:13:08PM +0100, Olivier Matz wrote:
> > > > On Thu, Jan 18, 2018 at 05:04:27PM +0100, Nélio Laranjeiro wrote:
> > > > > On Thu, Jan 18, 2018 at 02:00:42PM +0100, Olivier Matz wrote:
> > > > > > On error, mlx5_dev_start() does not return a negative value
> > > > > > as it is supposed to do. The consequence is that the application
> > > > > > (ex: testpmd) does not notice that the port is not started
> > > > > > and begins the rxtx on an uninitialized port, which crashes.
> > > > > > 
> > > > > > Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
> > > > > > Cc: stable@dpdk.org
> > > > > > 
> > > > > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > > > > ---
> > > > > >  drivers/net/mlx5/mlx5_trigger.c | 4 +++-
> > > > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> > > > > > index 1a20967a2..44f702daa 100644
> > > > > > --- a/drivers/net/mlx5/mlx5_trigger.c
> > > > > > +++ b/drivers/net/mlx5/mlx5_trigger.c
> > > > > > @@ -166,6 +166,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > > > >  		ERROR("%p: an error occurred while configuring control flows:"
> > > > > >  		      " %s",
> > > > > >  		      (void *)priv, strerror(err));
> > > > > > +		err = -err;
> > > > > >  		goto error;
> > > > > >  	}
> > > > > >  	err = priv_flow_start(priv, &priv->flows);
> > > > > > @@ -173,6 +174,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > > > >  		ERROR("%p: an error occurred while configuring flows:"
> > > > > >  		      " %s",
> > > > > >  		      (void *)priv, strerror(err));
> > > > > > +		err = -err;
> > > > > >  		goto error;
> > > > > >  	}
> > > > > >  	err = priv_rx_intr_vec_enable(priv);
> > > > > > @@ -196,7 +198,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > > > >  	priv_rxq_stop(priv);
> > > > > >  	priv_flow_delete_drop_queue(priv);
> > > > > >  	priv_unlock(priv);
> > > > > > -	return -err;
> > > > > > +	return err;
> > > > > >  }
> > > > > >  
> > > > > >  /**
> > > > > 
> > > > > err in the function is handled with positives errno's, adding only those
> > > > > two and returning err will make the other positive.
> > > > 
> > > > I tried to check the return value of all functions called by mlx5_dev_start()
> > > > (negative or positive). Do you see something wrong?
> > > 
> > > What I mean is priv_flow_start() is returning a positive errno as all
> > > other functions priv_*() that's the main reason for the final rteurn -err;
> 
> priv_txq_start(priv);
>   -> negative on error
> priv_rxq_start(priv);
>   -> negative on error
> priv_dev_traffic_enable(priv, dev);
>   -> positive (+errno) on error
> priv_flow_start(priv, &priv->flows);
>   -> positive (+errno) on error
> priv_rx_intr_vec_enable(priv);
>   -> negative (-1 or -errno) on error

Indeed,

All this needs to be re-worked to only return negatives values, in the
mean time,

Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

> > > Internally MLX5 driver only works with positives errnos as lot of them
> > > are retuning the values from ioctl directly.  Only the public ones are
> > > returning negatives.
> > 
> > So, what should I modify in this patch for v2?
> > Do you agree that there is a bug regarding the return value of mlx5_dev_start()?
> > 
> > It can be reproduced easily by starting testpmd on a dual-socket machine,
> > use the core and memory from socket 0, and have the mlx device on socket 1.
> > Then start traffic forwarding, it will crash.

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 1/2] net/mlx5: fix return value of start operation
  2018-01-18 13:00 [PATCH 1/2] net/mlx5: fix return value of start operation Olivier Matz
                   ` (2 preceding siblings ...)
  2018-01-19 13:42 ` Olivier Matz
@ 2018-01-19 16:25 ` Olivier Matz
  2018-01-19 16:25   ` [PATCH v2 2/2] net/mlx5: fix allocation when no memory on device NUMA node Olivier Matz
  2018-01-22 12:33   ` [PATCH v3 1/2] net/mlx5: fix return value of start operation Olivier Matz
  3 siblings, 2 replies; 19+ messages in thread
From: Olivier Matz @ 2018-01-19 16:25 UTC (permalink / raw)
  To: dev, Adrien Mazarguil, Nelio Laranjeiro, Yongseok Koh; +Cc: stable

On error, mlx5_dev_start() does not return a negative value
as it is supposed to do. The consequence is that the application
(ex: testpmd) does not notice that the port is not started
and begins the rxtx on an uninitialized port, which crashes.

Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
Cc: stable@dpdk.org

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx5/mlx5_trigger.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 1a20967a2..44f702daa 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -166,6 +166,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 		ERROR("%p: an error occurred while configuring control flows:"
 		      " %s",
 		      (void *)priv, strerror(err));
+		err = -err;
 		goto error;
 	}
 	err = priv_flow_start(priv, &priv->flows);
@@ -173,6 +174,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 		ERROR("%p: an error occurred while configuring flows:"
 		      " %s",
 		      (void *)priv, strerror(err));
+		err = -err;
 		goto error;
 	}
 	err = priv_rx_intr_vec_enable(priv);
@@ -196,7 +198,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	priv_rxq_stop(priv);
 	priv_flow_delete_drop_queue(priv);
 	priv_unlock(priv);
-	return -err;
+	return err;
 }
 
 /**
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2 2/2] net/mlx5: fix allocation when no memory on device NUMA node
  2018-01-19 16:25 ` [PATCH v2 " Olivier Matz
@ 2018-01-19 16:25   ` Olivier Matz
  2018-01-21  6:58     ` [dpdk-stable] " Shahaf Shuler
  2018-01-22 12:33   ` [PATCH v3 1/2] net/mlx5: fix return value of start operation Olivier Matz
  1 sibling, 1 reply; 19+ messages in thread
From: Olivier Matz @ 2018-01-19 16:25 UTC (permalink / raw)
  To: dev, Adrien Mazarguil, Nelio Laranjeiro, Yongseok Koh; +Cc: stable

If there is no memory available on the same numa node than the
device, it is preferable to fallback on another socket instead
of failing.

Fixes: 1e3a39f72d5d ("net/mlx5: allocate verbs object into shared memory")
Cc: stable@dpdk.org

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---

This new version of the patch was provided by Nelio (thanks), I
validated it on my platform. I just did minimal changes to fix the
checkpatch issues in the comments of mlx5.h (/** instead of /*).

 drivers/net/mlx5/mlx5.c     | 14 ++++++++++++--
 drivers/net/mlx5/mlx5.h     | 20 ++++++++++++++++++++
 drivers/net/mlx5/mlx5_rxq.c |  4 ++++
 drivers/net/mlx5/mlx5_txq.c |  4 ++++
 4 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 1c95f3520..7a04ccf98 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -139,10 +139,20 @@ mlx5_alloc_verbs_buf(size_t size, void *data)
 	struct priv *priv = data;
 	void *ret;
 	size_t alignment = sysconf(_SC_PAGESIZE);
+	unsigned int socket = SOCKET_ID_ANY;
 
+	if (priv->verbs_alloc_ctx.type == MLX5_VERSB_ALLOC_TYPE_TX_QUEUE) {
+		const struct mlx5_txq_ctrl *ctrl = priv->verbs_alloc_ctx.obj;
+
+		socket = ctrl->socket;
+	} else if (priv->verbs_alloc_ctx.type ==
+		   MLX5_VERSB_ALLOC_TYPE_RX_QUEUE) {
+		const struct mlx5_rxq_ctrl *ctrl = priv->verbs_alloc_ctx.obj;
+
+		socket = ctrl->socket;
+	}
 	assert(data != NULL);
-	ret = rte_malloc_socket(__func__, size, alignment,
-				priv->dev->device->numa_node);
+	ret = rte_malloc_socket(__func__, size, alignment, socket);
 	DEBUG("Extern alloc size: %lu, align: %lu: %p", size, alignment, ret);
 	return ret;
 }
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index e740a4e77..abcae95b8 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -123,6 +123,24 @@ struct mlx5_dev_config {
 	int inline_max_packet_sz; /* Max packet size for inlining. */
 };
 
+/**
+ * Type of objet being allocated.
+ */
+enum mlx5_verbs_alloc_type {
+	MLX5_VERSB_ALLOC_TYPE_NONE,
+	MLX5_VERSB_ALLOC_TYPE_TX_QUEUE,
+	MLX5_VERSB_ALLOC_TYPE_RX_QUEUE,
+};
+
+/**
+ * Verbs allocator needs a context to know in the callback which kind of
+ * resources it is allocating.
+ */
+struct mlx5_verbs_alloc_ctx {
+	enum mlx5_verbs_alloc_type type; /* Kind of object being allocated. */
+	const void *obj; /* Pointer to the DPDK object. */
+};
+
 struct priv {
 	struct rte_eth_dev *dev; /* Ethernet device of master process. */
 	struct ibv_context *ctx; /* Verbs context. */
@@ -164,6 +182,8 @@ struct priv {
 	int primary_socket; /* Unix socket for primary process. */
 	struct rte_intr_handle intr_handle_socket; /* Interrupt handler. */
 	struct mlx5_dev_config config; /* Device configuration. */
+	struct mlx5_verbs_alloc_ctx verbs_alloc_ctx;
+	/* Context for Verbs allocator. */
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 950472754..a43a67526 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -655,6 +655,8 @@ mlx5_priv_rxq_ibv_new(struct priv *priv, uint16_t idx)
 
 	assert(rxq_data);
 	assert(!rxq_ctrl->ibv);
+	priv->verbs_alloc_ctx.type = MLX5_VERSB_ALLOC_TYPE_RX_QUEUE;
+	priv->verbs_alloc_ctx.obj = rxq_ctrl;
 	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
 				 rxq_ctrl->socket);
 	if (!tmpl) {
@@ -818,6 +820,7 @@ mlx5_priv_rxq_ibv_new(struct priv *priv, uint16_t idx)
 	DEBUG("%p: Verbs Rx queue %p: refcnt %d", (void *)priv,
 	      (void *)tmpl, rte_atomic32_read(&tmpl->refcnt));
 	LIST_INSERT_HEAD(&priv->rxqsibv, tmpl, next);
+	priv->verbs_alloc_ctx.type = MLX5_VERSB_ALLOC_TYPE_NONE;
 	return tmpl;
 error:
 	if (tmpl->wq)
@@ -828,6 +831,7 @@ mlx5_priv_rxq_ibv_new(struct priv *priv, uint16_t idx)
 		claim_zero(ibv_destroy_comp_channel(tmpl->channel));
 	if (tmpl->mr)
 		priv_mr_release(priv, tmpl->mr);
+	priv->verbs_alloc_ctx.type = MLX5_VERSB_ALLOC_TYPE_NONE;
 	return NULL;
 }
 
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 26db15a4f..b43cc9ed0 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -396,6 +396,8 @@ mlx5_priv_txq_ibv_new(struct priv *priv, uint16_t idx)
 	int ret = 0;
 
 	assert(txq_data);
+	priv->verbs_alloc_ctx.type = MLX5_VERSB_ALLOC_TYPE_TX_QUEUE;
+	priv->verbs_alloc_ctx.obj = txq_ctrl;
 	if (mlx5_getenv_int("MLX5_ENABLE_CQE_COMPRESSION")) {
 		ERROR("MLX5_ENABLE_CQE_COMPRESSION must never be set");
 		goto error;
@@ -526,12 +528,14 @@ mlx5_priv_txq_ibv_new(struct priv *priv, uint16_t idx)
 	DEBUG("%p: Verbs Tx queue %p: refcnt %d", (void *)priv,
 	      (void *)txq_ibv, rte_atomic32_read(&txq_ibv->refcnt));
 	LIST_INSERT_HEAD(&priv->txqsibv, txq_ibv, next);
+	priv->verbs_alloc_ctx.type = MLX5_VERSB_ALLOC_TYPE_NONE;
 	return txq_ibv;
 error:
 	if (tmpl.cq)
 		claim_zero(ibv_destroy_cq(tmpl.cq));
 	if (tmpl.qp)
 		claim_zero(ibv_destroy_qp(tmpl.qp));
+	priv->verbs_alloc_ctx.type = MLX5_VERSB_ALLOC_TYPE_NONE;
 	return NULL;
 }
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [dpdk-stable] [PATCH v2 2/2] net/mlx5: fix allocation when no memory on device NUMA node
  2018-01-19 16:25   ` [PATCH v2 2/2] net/mlx5: fix allocation when no memory on device NUMA node Olivier Matz
@ 2018-01-21  6:58     ` Shahaf Shuler
  2018-01-22  8:20       ` Olivier Matz
  0 siblings, 1 reply; 19+ messages in thread
From: Shahaf Shuler @ 2018-01-21  6:58 UTC (permalink / raw)
  To: Olivier Matz, dev, Adrien Mazarguil, Nélio Laranjeiro, Yongseok Koh
  Cc: stable

Friday, January 19, 2018 6:25 PM, Olivier Matz:
on the same numa node than the device, it is
> preferable to fallback on another socket instead of failing.
> 
> Fixes: 1e3a39f72d5d ("net/mlx5: allocate verbs object into shared memory")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
> ---
> 
> This new version of the patch was provided by Nelio (thanks), I validated it
> on my platform. I just did minimal changes to fix the checkpatch issues in the
> comments of mlx5.h (/** instead of /*).

Per my understanding the below patch is to select the socket on which to create the Verbs object based on the ethdev configuration rather than the PCI numa node.
While it introduce the infrastructure to do fallback to other socket id, it is not yet used. 
I think the commit log should be modified to better explain this patch.

> 
>  drivers/net/mlx5/mlx5.c     | 14 ++++++++++++--
>  drivers/net/mlx5/mlx5.h     | 20 ++++++++++++++++++++
>  drivers/net/mlx5/mlx5_rxq.c |  4 ++++
>  drivers/net/mlx5/mlx5_txq.c |  4 ++++
>  4 files changed, 40 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> 1c95f3520..7a04ccf98 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -139,10 +139,20 @@ mlx5_alloc_verbs_buf(size_t size, void *data)
>  	struct priv *priv = data;
>  	void *ret;
>  	size_t alignment = sysconf(_SC_PAGESIZE);
> +	unsigned int socket = SOCKET_ID_ANY;
> 
> +	if (priv->verbs_alloc_ctx.type ==
> MLX5_VERSB_ALLOC_TYPE_TX_QUEUE) {
> +		const struct mlx5_txq_ctrl *ctrl = priv->verbs_alloc_ctx.obj;
> +
> +		socket = ctrl->socket;
> +	} else if (priv->verbs_alloc_ctx.type ==
> +		   MLX5_VERSB_ALLOC_TYPE_RX_QUEUE) {
> +		const struct mlx5_rxq_ctrl *ctrl = priv->verbs_alloc_ctx.obj;
> +
> +		socket = ctrl->socket;
> +	}
>  	assert(data != NULL);
> -	ret = rte_malloc_socket(__func__, size, alignment,
> -				priv->dev->device->numa_node);
> +	ret = rte_malloc_socket(__func__, size, alignment, socket);
>  	DEBUG("Extern alloc size: %lu, align: %lu: %p", size, alignment, ret);
>  	return ret;
>  }
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> e740a4e77..abcae95b8 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -123,6 +123,24 @@ struct mlx5_dev_config {
>  	int inline_max_packet_sz; /* Max packet size for inlining. */  };
> 
> +/**
> + * Type of objet being allocated.
> + */
> +enum mlx5_verbs_alloc_type {
> +	MLX5_VERSB_ALLOC_TYPE_NONE,
> +	MLX5_VERSB_ALLOC_TYPE_TX_QUEUE,
> +	MLX5_VERSB_ALLOC_TYPE_RX_QUEUE,
> +};
> +
> +/**
> + * Verbs allocator needs a context to know in the callback which kind
> +of
> + * resources it is allocating.
> + */
> +struct mlx5_verbs_alloc_ctx {
> +	enum mlx5_verbs_alloc_type type; /* Kind of object being allocated.
> */
> +	const void *obj; /* Pointer to the DPDK object. */ };
> +
>  struct priv {
>  	struct rte_eth_dev *dev; /* Ethernet device of master process. */
>  	struct ibv_context *ctx; /* Verbs context. */ @@ -164,6 +182,8 @@
> struct priv {
>  	int primary_socket; /* Unix socket for primary process. */
>  	struct rte_intr_handle intr_handle_socket; /* Interrupt handler. */
>  	struct mlx5_dev_config config; /* Device configuration. */
> +	struct mlx5_verbs_alloc_ctx verbs_alloc_ctx;
> +	/* Context for Verbs allocator. */
>  };
> 
>  /**
> diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> index 950472754..a43a67526 100644
> --- a/drivers/net/mlx5/mlx5_rxq.c
> +++ b/drivers/net/mlx5/mlx5_rxq.c
> @@ -655,6 +655,8 @@ mlx5_priv_rxq_ibv_new(struct priv *priv, uint16_t
> idx)
> 
>  	assert(rxq_data);
>  	assert(!rxq_ctrl->ibv);
> +	priv->verbs_alloc_ctx.type = MLX5_VERSB_ALLOC_TYPE_RX_QUEUE;
> +	priv->verbs_alloc_ctx.obj = rxq_ctrl;
>  	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
>  				 rxq_ctrl->socket);
>  	if (!tmpl) {
> @@ -818,6 +820,7 @@ mlx5_priv_rxq_ibv_new(struct priv *priv, uint16_t
> idx)
>  	DEBUG("%p: Verbs Rx queue %p: refcnt %d", (void *)priv,
>  	      (void *)tmpl, rte_atomic32_read(&tmpl->refcnt));
>  	LIST_INSERT_HEAD(&priv->rxqsibv, tmpl, next);
> +	priv->verbs_alloc_ctx.type = MLX5_VERSB_ALLOC_TYPE_NONE;
>  	return tmpl;
>  error:
>  	if (tmpl->wq)
> @@ -828,6 +831,7 @@ mlx5_priv_rxq_ibv_new(struct priv *priv, uint16_t
> idx)
>  		claim_zero(ibv_destroy_comp_channel(tmpl->channel));
>  	if (tmpl->mr)
>  		priv_mr_release(priv, tmpl->mr);
> +	priv->verbs_alloc_ctx.type = MLX5_VERSB_ALLOC_TYPE_NONE;
>  	return NULL;
>  }
> 
> diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
> index 26db15a4f..b43cc9ed0 100644
> --- a/drivers/net/mlx5/mlx5_txq.c
> +++ b/drivers/net/mlx5/mlx5_txq.c
> @@ -396,6 +396,8 @@ mlx5_priv_txq_ibv_new(struct priv *priv, uint16_t
> idx)
>  	int ret = 0;
> 
>  	assert(txq_data);
> +	priv->verbs_alloc_ctx.type = MLX5_VERSB_ALLOC_TYPE_TX_QUEUE;
> +	priv->verbs_alloc_ctx.obj = txq_ctrl;
>  	if (mlx5_getenv_int("MLX5_ENABLE_CQE_COMPRESSION")) {
>  		ERROR("MLX5_ENABLE_CQE_COMPRESSION must never be
> set");
>  		goto error;
> @@ -526,12 +528,14 @@ mlx5_priv_txq_ibv_new(struct priv *priv, uint16_t
> idx)
>  	DEBUG("%p: Verbs Tx queue %p: refcnt %d", (void *)priv,
>  	      (void *)txq_ibv, rte_atomic32_read(&txq_ibv->refcnt));
>  	LIST_INSERT_HEAD(&priv->txqsibv, txq_ibv, next);
> +	priv->verbs_alloc_ctx.type = MLX5_VERSB_ALLOC_TYPE_NONE;
>  	return txq_ibv;
>  error:
>  	if (tmpl.cq)
>  		claim_zero(ibv_destroy_cq(tmpl.cq));
>  	if (tmpl.qp)
>  		claim_zero(ibv_destroy_qp(tmpl.qp));
> +	priv->verbs_alloc_ctx.type = MLX5_VERSB_ALLOC_TYPE_NONE;
>  	return NULL;
>  }
> 
> --
> 2.11.0

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-stable] [PATCH v2 2/2] net/mlx5: fix allocation when no memory on device NUMA node
  2018-01-21  6:58     ` [dpdk-stable] " Shahaf Shuler
@ 2018-01-22  8:20       ` Olivier Matz
  0 siblings, 0 replies; 19+ messages in thread
From: Olivier Matz @ 2018-01-22  8:20 UTC (permalink / raw)
  To: Shahaf Shuler
  Cc: dev, Adrien Mazarguil, Nélio Laranjeiro, Yongseok Koh, stable

Hi Shahaf,

On Sun, Jan 21, 2018 at 06:58:09AM +0000, Shahaf Shuler wrote:
> Friday, January 19, 2018 6:25 PM, Olivier Matz:
> on the same numa node than the device, it is
> > preferable to fallback on another socket instead of failing.
> > 
> > Fixes: 1e3a39f72d5d ("net/mlx5: allocate verbs object into shared memory")
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
> > ---
> > 
> > This new version of the patch was provided by Nelio (thanks), I validated it
> > on my platform. I just did minimal changes to fix the checkpatch issues in the
> > comments of mlx5.h (/** instead of /*).
> 
> Per my understanding the below patch is to select the socket on which to create the Verbs object based on the ethdev configuration rather than the PCI numa node.
> While it introduce the infrastructure to do fallback to other socket id, it is not yet used. 
> I think the commit log should be modified to better explain this patch.

That's right, the commit log should be updated, it's still the commit log
of the v1, which does not match.

> > 
> >  drivers/net/mlx5/mlx5.c     | 14 ++++++++++++--
> >  drivers/net/mlx5/mlx5.h     | 20 ++++++++++++++++++++
> >  drivers/net/mlx5/mlx5_rxq.c |  4 ++++
> >  drivers/net/mlx5/mlx5_txq.c |  4 ++++
> >  4 files changed, 40 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> > 1c95f3520..7a04ccf98 100644
> > --- a/drivers/net/mlx5/mlx5.c
> > +++ b/drivers/net/mlx5/mlx5.c
> > @@ -139,10 +139,20 @@ mlx5_alloc_verbs_buf(size_t size, void *data)
> >  	struct priv *priv = data;
> >  	void *ret;
> >  	size_t alignment = sysconf(_SC_PAGESIZE);
> > +	unsigned int socket = SOCKET_ID_ANY;
> > 
> > +	if (priv->verbs_alloc_ctx.type ==
> > MLX5_VERSB_ALLOC_TYPE_TX_QUEUE) {

It looks we should also replace VERSB by VERBS :D

I will send a v3.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v3 1/2] net/mlx5: fix return value of start operation
  2018-01-19 16:25 ` [PATCH v2 " Olivier Matz
  2018-01-19 16:25   ` [PATCH v2 2/2] net/mlx5: fix allocation when no memory on device NUMA node Olivier Matz
@ 2018-01-22 12:33   ` Olivier Matz
  2018-01-22 12:33     ` [PATCH v3 2/2] net/mlx5: fix allocation when no memory on device NUMA node Olivier Matz
  2018-01-22 20:27     ` [dpdk-stable] [PATCH v3 1/2] net/mlx5: fix return value of start operation Shahaf Shuler
  1 sibling, 2 replies; 19+ messages in thread
From: Olivier Matz @ 2018-01-22 12:33 UTC (permalink / raw)
  To: dev, Adrien Mazarguil, Nelio Laranjeiro, Yongseok Koh; +Cc: stable

On error, mlx5_dev_start() does not return a negative value
as it is supposed to do. The consequence is that the application
(ex: testpmd) does not notice that the port is not started
and begins the rxtx on an uninitialized port, which crashes.

Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
Cc: stable@dpdk.org

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---

v2->v3:
- rebase on top of head
  The commit is much smaller after
  c7bf62255edf ("net/mlx5: fix handling link status event")

For backport, prefer the v2.

 drivers/net/mlx5/mlx5_trigger.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 61fa2604f..827db2e7e 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -181,7 +181,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	priv_rxq_stop(priv);
 	priv_flow_delete_drop_queue(priv);
 	priv_unlock(priv);
-	return -err;
+	return err;
 }
 
 /**
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 2/2] net/mlx5: fix allocation when no memory on device NUMA node
  2018-01-22 12:33   ` [PATCH v3 1/2] net/mlx5: fix return value of start operation Olivier Matz
@ 2018-01-22 12:33     ` Olivier Matz
  2018-01-22 20:27     ` [dpdk-stable] [PATCH v3 1/2] net/mlx5: fix return value of start operation Shahaf Shuler
  1 sibling, 0 replies; 19+ messages in thread
From: Olivier Matz @ 2018-01-22 12:33 UTC (permalink / raw)
  To: dev, Adrien Mazarguil, Nelio Laranjeiro, Yongseok Koh; +Cc: stable

When no memory is available on the same numa node than the device, the
initialization of the device fails. However, the use case where the
cores and memory are on a different socket than the device is valid,
even if not optimal.

To fix this issue, this commit introduces an infrastructure to select
the socket on which to allocate the verbs objects based on the ethdev
configuration and the object type, rather than the PCI numa node.

Fixes: 1e3a39f72d5d ("net/mlx5: allocate verbs object into shared memory")
Cc: stable@dpdk.org

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---

v2->v3:
- fix commit log
- VERSB -> VERBS

v1->v2:
- update from Nelop to select socket on which to allocate based on
  config and object type

 drivers/net/mlx5/mlx5.c     | 14 ++++++++++++--
 drivers/net/mlx5/mlx5.h     | 20 ++++++++++++++++++++
 drivers/net/mlx5/mlx5_rxq.c |  4 ++++
 drivers/net/mlx5/mlx5_txq.c |  4 ++++
 4 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 9d1de366b..bc4b6bad0 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -139,10 +139,20 @@ mlx5_alloc_verbs_buf(size_t size, void *data)
 	struct priv *priv = data;
 	void *ret;
 	size_t alignment = sysconf(_SC_PAGESIZE);
+	unsigned int socket = SOCKET_ID_ANY;
 
+	if (priv->verbs_alloc_ctx.type == MLX5_VERBS_ALLOC_TYPE_TX_QUEUE) {
+		const struct mlx5_txq_ctrl *ctrl = priv->verbs_alloc_ctx.obj;
+
+		socket = ctrl->socket;
+	} else if (priv->verbs_alloc_ctx.type ==
+		   MLX5_VERBS_ALLOC_TYPE_RX_QUEUE) {
+		const struct mlx5_rxq_ctrl *ctrl = priv->verbs_alloc_ctx.obj;
+
+		socket = ctrl->socket;
+	}
 	assert(data != NULL);
-	ret = rte_malloc_socket(__func__, size, alignment,
-				priv->dev->device->numa_node);
+	ret = rte_malloc_socket(__func__, size, alignment, socket);
 	DEBUG("Extern alloc size: %lu, align: %lu: %p", size, alignment, ret);
 	return ret;
 }
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index eb0894fa5..a7ec607c3 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -123,6 +123,24 @@ struct mlx5_dev_config {
 	int inline_max_packet_sz; /* Max packet size for inlining. */
 };
 
+/**
+ * Type of objet being allocated.
+ */
+enum mlx5_verbs_alloc_type {
+	MLX5_VERBS_ALLOC_TYPE_NONE,
+	MLX5_VERBS_ALLOC_TYPE_TX_QUEUE,
+	MLX5_VERBS_ALLOC_TYPE_RX_QUEUE,
+};
+
+/**
+ * Verbs allocator needs a context to know in the callback which kind of
+ * resources it is allocating.
+ */
+struct mlx5_verbs_alloc_ctx {
+	enum mlx5_verbs_alloc_type type; /* Kind of object being allocated. */
+	const void *obj; /* Pointer to the DPDK object. */
+};
+
 struct priv {
 	struct rte_eth_dev *dev; /* Ethernet device of master process. */
 	struct ibv_context *ctx; /* Verbs context. */
@@ -164,6 +182,8 @@ struct priv {
 	int primary_socket; /* Unix socket for primary process. */
 	struct rte_intr_handle intr_handle_socket; /* Interrupt handler. */
 	struct mlx5_dev_config config; /* Device configuration. */
+	struct mlx5_verbs_alloc_ctx verbs_alloc_ctx;
+	/* Context for Verbs allocator. */
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 8e4fbbf2a..0274ccf31 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -655,6 +655,8 @@ mlx5_priv_rxq_ibv_new(struct priv *priv, uint16_t idx)
 
 	assert(rxq_data);
 	assert(!rxq_ctrl->ibv);
+	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_RX_QUEUE;
+	priv->verbs_alloc_ctx.obj = rxq_ctrl;
 	tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
 				 rxq_ctrl->socket);
 	if (!tmpl) {
@@ -818,6 +820,7 @@ mlx5_priv_rxq_ibv_new(struct priv *priv, uint16_t idx)
 	DEBUG("%p: Verbs Rx queue %p: refcnt %d", (void *)priv,
 	      (void *)tmpl, rte_atomic32_read(&tmpl->refcnt));
 	LIST_INSERT_HEAD(&priv->rxqsibv, tmpl, next);
+	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
 	return tmpl;
 error:
 	if (tmpl->wq)
@@ -828,6 +831,7 @@ mlx5_priv_rxq_ibv_new(struct priv *priv, uint16_t idx)
 		claim_zero(ibv_destroy_comp_channel(tmpl->channel));
 	if (tmpl->mr)
 		priv_mr_release(priv, tmpl->mr);
+	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
 	return NULL;
 }
 
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 49018303e..18321c3e3 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -396,6 +396,8 @@ mlx5_priv_txq_ibv_new(struct priv *priv, uint16_t idx)
 	int ret = 0;
 
 	assert(txq_data);
+	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_TX_QUEUE;
+	priv->verbs_alloc_ctx.obj = txq_ctrl;
 	if (mlx5_getenv_int("MLX5_ENABLE_CQE_COMPRESSION")) {
 		ERROR("MLX5_ENABLE_CQE_COMPRESSION must never be set");
 		goto error;
@@ -526,12 +528,14 @@ mlx5_priv_txq_ibv_new(struct priv *priv, uint16_t idx)
 	DEBUG("%p: Verbs Tx queue %p: refcnt %d", (void *)priv,
 	      (void *)txq_ibv, rte_atomic32_read(&txq_ibv->refcnt));
 	LIST_INSERT_HEAD(&priv->txqsibv, txq_ibv, next);
+	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
 	return txq_ibv;
 error:
 	if (tmpl.cq)
 		claim_zero(ibv_destroy_cq(tmpl.cq));
 	if (tmpl.qp)
 		claim_zero(ibv_destroy_qp(tmpl.qp));
+	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
 	return NULL;
 }
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [dpdk-stable] [PATCH v3 1/2] net/mlx5: fix return value of start operation
  2018-01-22 12:33   ` [PATCH v3 1/2] net/mlx5: fix return value of start operation Olivier Matz
  2018-01-22 12:33     ` [PATCH v3 2/2] net/mlx5: fix allocation when no memory on device NUMA node Olivier Matz
@ 2018-01-22 20:27     ` Shahaf Shuler
  1 sibling, 0 replies; 19+ messages in thread
From: Shahaf Shuler @ 2018-01-22 20:27 UTC (permalink / raw)
  To: Olivier Matz, dev, Adrien Mazarguil, Nélio Laranjeiro, Yongseok Koh
  Cc: stable

Monday, January 22, 2018 2:34 PM, Olivier Matz:
> On error, mlx5_dev_start() does not return a negative value as it is supposed
> to do. The consequence is that the application
> (ex: testpmd) does not notice that the port is not started and begins the rxtx
> on an uninitialized port, which crashes.
> 
> Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

Series applied to next-net-mlx, thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2018-01-22 20:27 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-18 13:00 [PATCH 1/2] net/mlx5: fix return value of start operation Olivier Matz
2018-01-18 13:00 ` [PATCH 2/2] net/mlx5: fix allocation when no memory on device NUMA node Olivier Matz
2018-01-18 16:19   ` Nélio Laranjeiro
2018-01-18 16:04 ` [PATCH 1/2] net/mlx5: fix return value of start operation Nélio Laranjeiro
2018-01-18 16:13   ` Olivier Matz
2018-01-19  6:28     ` Yongseok Koh
2018-01-19  8:35     ` Nélio Laranjeiro
2018-01-19  8:43       ` Olivier Matz
2018-01-19 13:30         ` Olivier Matz
2018-01-19 13:43           ` Nélio Laranjeiro
2018-01-19 14:18           ` Nélio Laranjeiro
2018-01-19 13:42 ` Olivier Matz
2018-01-19 16:25 ` [PATCH v2 " Olivier Matz
2018-01-19 16:25   ` [PATCH v2 2/2] net/mlx5: fix allocation when no memory on device NUMA node Olivier Matz
2018-01-21  6:58     ` [dpdk-stable] " Shahaf Shuler
2018-01-22  8:20       ` Olivier Matz
2018-01-22 12:33   ` [PATCH v3 1/2] net/mlx5: fix return value of start operation Olivier Matz
2018-01-22 12:33     ` [PATCH v3 2/2] net/mlx5: fix allocation when no memory on device NUMA node Olivier Matz
2018-01-22 20:27     ` [dpdk-stable] [PATCH v3 1/2] net/mlx5: fix return value of start operation Shahaf Shuler

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.