* Regression: Connect-X5 doesn't connect with NVME-of
From: Logan Gunthorpe @ 2018-02-01 17:56 UTC
  To: linux-rdma@vger.kernel.org
  Cc: Max Gurtovoy, Stephen Bates, Saeed Mahameed

Hello,

We've experienced a regression using NVMe-oF with two Connect-X5s.
With v4.15 and v4.14.16 we see the following dmesg output when trying
to connect to the target:

> [   43.732539] nvme nvme2: creating 16 I/O queues.
> [   44.072427] nvmet: adding queue 1 to ctrl 1.
> [   44.072553] nvmet: adding queue 2 to ctrl 1.
> [   44.072597] nvme nvme2: Connect command failed, error wo/DNR bit: -16402
> [   44.072609] nvme nvme2: failed to connect queue: 3 ret=-18
> [   44.075421] nvmet_rdma: freeing queue 2
> [   44.075792] nvmet_rdma: freeing queue 1
> [   44.264293] nvmet_rdma: freeing queue 3
> *snip*

(On v4.15 there are additional panics, likely due to some other
NVMe-oF error-handling bugs.)

And nvme connect returns:

> Failed to write to /dev/nvme-fabrics: Invalid cross-device link

The two adapters are identical, with the latest available firmware:

> 	transport:			InfiniBand (0)
> 	fw_ver:				16.21.2010
> 	vendor_id:			0x02c9
> 	vendor_part_id:			4119
> 	hw_ver:				0x0
> 	board_id:			MT_0000000010

We bisected and found the commit that broke our setup:

05e0cc84e00c net/mlx5: Fix get vector affinity helper function

Thanks,

Logan

* Re: Regression: Connect-X5 doesn't connect with NVME-of
From: Saeed Mahameed @ 2018-02-03  4:53 UTC
  To: Logan Gunthorpe, linux-rdma@vger.kernel.org
  Cc: Max Gurtovoy, Stephen Bates, Sagi Grimberg



On 02/01/2018 09:56 AM, Logan Gunthorpe wrote:
> Hello,
> 
> We've experienced a regression using NVMe-oF with two Connect-X5s. With v4.15 and v4.14.16 we see the following dmesg output when trying to connect to the target:
> 
>> [   43.732539] nvme nvme2: creating 16 I/O queues.
>> [   44.072427] nvmet: adding queue 1 to ctrl 1.
>> [   44.072553] nvmet: adding queue 2 to ctrl 1.
>> [   44.072597] nvme nvme2: Connect command failed, error wo/DNR bit: -16402
>> [   44.072609] nvme nvme2: failed to connect queue: 3 ret=-18
>> [   44.075421] nvmet_rdma: freeing queue 2
>> [   44.075792] nvmet_rdma: freeing queue 1
>> [   44.264293] nvmet_rdma: freeing queue 3
>> *snip*
> 
> (On v4.15 there are additional panics, likely due to some other NVMe-oF error-handling bugs.)
> 
> And nvme connect returns:
> 
>> Failed to write to /dev/nvme-fabrics: Invalid cross-device link
> 
> The two adapters are identical, with the latest available firmware:
> 
>>     transport:            InfiniBand (0)
>>     fw_ver:                16.21.2010
>>     vendor_id:            0x02c9
>>     vendor_part_id:            4119
>>     hw_ver:                0x0
>>     board_id:            MT_0000000010
> 
> We bisected and found the commit that broke our setup:
> 
> 05e0cc84e00c net/mlx5: Fix get vector affinity helper function

I doubt the issue is in this fix itself, but with this fix the automatic
affinity settings for NVMe over RDMA are enabled. Maybe a bug was hiding
there and we just stepped on it.
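
For context, a rough sketch of the plumbing this enables, paraphrasing
the 4.15-era nvme-rdma and blk-mq-rdma code (not a verbatim copy):

--
/* nvme-rdma asks blk-mq to spread its I/O queues according to the
 * device's completion vector affinity: */
static int nvme_rdma_map_queues(struct blk_mq_tag_set *set)
{
	struct nvme_rdma_ctrl *ctrl = set->driver_data;

	/* blk_mq_rdma_map_queues() queries ib_get_vector_affinity(),
	 * which mlx5 backs with the mlx5_get_vector_affinity() helper
	 * that the bisected commit changed */
	return blk_mq_rdma_map_queues(set, ctrl->device->dev, 0);
}
--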

Added Sagi, maybe he can help us spot the issue here.

Thanks,
saeed.

* Re: Regression: Connect-X5 doesn't connect with NVME-of
From: Max Gurtovoy @ 2018-02-03 22:46 UTC
  To: Saeed Mahameed, Logan Gunthorpe, linux-rdma@vger.kernel.org
  Cc: Stephen Bates, Sagi Grimberg

Hi Logan,

On 2/3/2018 6:53 AM, Saeed Mahameed wrote:
> 
> 
> On 02/01/2018 09:56 AM, Logan Gunthorpe wrote:
>> Hello,
>>
>> We've experienced a regression using NVMe-oF with two Connect-X5s. With v4.15 and v4.14.16 we see the following dmesg output when trying to connect to the target:

I would like to reproduce it in our lab, so please describe the
environment and topology you run (back-to-back/switch/loopback?).

>>
>>> [   43.732539] nvme nvme2: creating 16 I/O queues.
>>> [   44.072427] nvmet: adding queue 1 to ctrl 1.
>>> [   44.072553] nvmet: adding queue 2 to ctrl 1.
>>> [   44.072597] nvme nvme2: Connect command failed, error wo/DNR bit: -16402
>>> [   44.072609] nvme nvme2: failed to connect queue: 3 ret=-18
>>> [   44.075421] nvmet_rdma: freeing queue 2
>>> [   44.075792] nvmet_rdma: freeing queue 1
>>> [   44.264293] nvmet_rdma: freeing queue 3
>>> *snip*
>>
>> (On v4.15 there are additional panics, likely due to some other NVMe-oF error-handling bugs.)

I fixed the panic during the connect error flow by fixing the state
machine in the NVMe core.
It should be pushed to 4.16-rc and, I hope, to 4.15.x soon.

>>
>> And nvme connect returns:
>>
>>> Failed to write to /dev/nvme-fabrics: Invalid cross-device link
>>
>> The two adapters are identical, with the latest available firmware:
>>
>>>      transport:            InfiniBand (0)
>>>      fw_ver:                16.21.2010
>>>      vendor_id:            0x02c9
>>>      vendor_part_id:            4119
>>>      hw_ver:                0x0
>>>      board_id:            MT_0000000010
>>
>> We bisected and found the commit that broke our setup:
>>
>> 05e0cc84e00c net/mlx5: Fix get vector affinity helper function
> 
> I doubt the issue is in this fix itself, but with this fix the automatic
> affinity settings for NVMe over RDMA are enabled. Maybe a bug was hiding
> there and we just stepped on it.
> 
> Added Sagi, maybe he can help us spot the issue here.
> 
> Thanks,
> saeed.
> 

* Re: Regression: Connect-X5 doesn't connect with NVME-of
From: Sagi Grimberg @ 2018-02-04  9:57 UTC
  To: Saeed Mahameed, Logan Gunthorpe, linux-rdma@vger.kernel.org
  Cc: Max Gurtovoy, Stephen Bates, linux-nvme, Christoph Hellwig


>> Hello,

Hi Logan, thanks for reporting.

>> We've experienced a regression using NVMe-oF with two Connect-X5s. With v4.15 and v4.14.16 we see the following dmesg output when trying to connect to the target:
>>
>>> [   43.732539] nvme nvme2: creating 16 I/O queues.
>>> [   44.072427] nvmet: adding queue 1 to ctrl 1.
>>> [   44.072553] nvmet: adding queue 2 to ctrl 1.
>>> [   44.072597] nvme nvme2: Connect command failed, error wo/DNR bit: -16402
>>> [   44.072609] nvme nvme2: failed to connect queue: 3 ret=-18
>>> [   44.075421] nvmet_rdma: freeing queue 2
>>> [   44.075792] nvmet_rdma: freeing queue 1
>>> [   44.264293] nvmet_rdma: freeing queue 3
>>> *snip*
>>
>> (On v4.15 there are additional panics, likely due to some other NVMe-oF error-handling bugs.)
>>
>> And nvme connect returns:
>>
>>> Failed to write to /dev/nvme-fabrics: Invalid cross-device link
>>
>> The two adapters are identical, with the latest available firmware:
>>
>>>      transport:            InfiniBand (0)
>>>      fw_ver:                16.21.2010
>>>      vendor_id:            0x02c9
>>>      vendor_part_id:            4119
>>>      hw_ver:                0x0
>>>      board_id:            MT_0000000010
>>
>> We bisected and found the commit that broke our setup:
>>
>> 05e0cc84e00c net/mlx5: Fix get vector affinity helper function

I'm really bummed out about this... I seem to have missed it
in my review, and it apparently went in untested.

If we look at the patch, it clearly shows that the behavior changed:
mlx5_get_vector_affinity no longer adds the MLX5_EQ_VEC_COMP_BASE
offset as it did before.

The API assumes that completion vector 0 means the first _completion_
vector, i.e. it skips the private/internal mlx5 vectors created for
things like port async events, FW commands and page requests...

What happens is that the consumer asked for the affinity mask of
completion vector 0 but got the async event vector instead, and the
skew continued from there, leading to unmapped block queues.
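
To illustrate with a schematic sketch (not actual driver code; the
constant is real, the layout shown is simplified):

--
/*
 * mlx5 EQ vector layout (simplified):
 *
 *   0 .. MLX5_EQ_VEC_COMP_BASE - 1    internal EQs (async events,
 *                                     FW commands, page requests)
 *   MLX5_EQ_VEC_COMP_BASE + n         completion vector n
 *
 * A ULP passing completion vector n therefore needs it translated
 * to the absolute index MLX5_EQ_VEC_COMP_BASE + n before the IRQ
 * affinity lookup; without the offset, completion vector 0 resolves
 * to an internal vector and every subsequent mask is shifted.
 */
--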

So I think this should make the problem go away:
--
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index a0610427e168..b82c4ae92411 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1238,7 +1238,7 @@ mlx5_get_vector_affinity(struct mlx5_core_dev *dev, int vector)
         int eqn;
         int err;

-       err = mlx5_vector2eqn(dev, vector, &eqn, &irq);
+       err = mlx5_vector2eqn(dev, MLX5_EQ_VEC_COMP_BASE + vector, &eqn, &irq);
         if (err)
                 return NULL;
--

Can you verify that this fixes your problem?

Regardless, it looks like we also have a second bug here: we still
attempt to connect a queue which is unmapped, and we fail the whole
controller association when that connect fails. This could not happen
before, because PCI_IRQ_AFFINITY guaranteed us the CPU spread we
needed to avoid this case, but that's changed now.

We should either settle for fewer queues, fall back to the default
mq_map for the queues that are left unmapped, or at least continue
without these unmapped queues (I think the first option makes the
most sense).
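
As a strawman for the first option, a minimal sketch (the helper below
is hypothetical, not existing nvme-rdma code, and for simplicity it
treats a vector without an affinity mask as unmapped):

--
static unsigned int nvme_rdma_nr_mapped_queues(struct ib_device *ibdev,
					       unsigned int nr_wanted)
{
	unsigned int vec, nr_mapped = 0;

	for (vec = 0; vec < nr_wanted; vec++) {
		const struct cpumask *mask;

		/* no usable affinity mask => treat as unmapped */
		mask = ib_get_vector_affinity(ibdev, vec);
		if (!mask || cpumask_empty(mask))
			break;
		nr_mapped++;
	}

	/* always keep at least one I/O queue */
	return nr_mapped ?: 1;
}
--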

* Re: Regression: Connect-X5 doesn't connect with NVME-of
From: Max Gurtovoy @ 2018-02-05 11:23 UTC
  To: Sagi Grimberg, Saeed Mahameed, Logan Gunthorpe,
	linux-rdma@vger.kernel.org
  Cc: Stephen Bates, linux-nvme, Christoph Hellwig

Hi Sagi/Logan,

I've reproduced it with v4.14.1 (it doesn't happen on every connect).
Sagi's proposal below fixes the "Failed to write to
/dev/nvme-fabrics: Invalid cross-device link" issue.

Sagi, can you push it (with my Tested-by: Max Gurtovoy <maxg@mellanox.com>
and Reviewed-by: Max Gurtovoy <maxg@mellanox.com>), or shall I?

The crash after the connection failure is fixed by my patches for the
NVMe core state machine, which are under review on the list.


> 
> So I think this should make the problem go away:
> -- 
> diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
> index a0610427e168..b82c4ae92411 100644
> --- a/include/linux/mlx5/driver.h
> +++ b/include/linux/mlx5/driver.h
> @@ -1238,7 +1238,7 @@ mlx5_get_vector_affinity(struct mlx5_core_dev *dev, int vector)
>          int eqn;
>          int err;
> 
> -       err = mlx5_vector2eqn(dev, vector, &eqn, &irq);
> +       err = mlx5_vector2eqn(dev, MLX5_EQ_VEC_COMP_BASE + vector, &eqn, &irq);
>          if (err)
>                  return NULL;
> -- 
> 

* Re: Regression: Connect-X5 doesn't connect with NVME-of
From: Sagi Grimberg @ 2018-02-05 14:18 UTC
  To: Max Gurtovoy, Saeed Mahameed, Logan Gunthorpe,
	linux-rdma@vger.kernel.org
  Cc: Stephen Bates, linux-nvme, Christoph Hellwig


> Hi Sagi/Logan,
> 
> I've reproduced it with v4.14.1 (it doesn't happen on every connect).
> Sagi's proposal below fixes the "Failed to write to
> /dev/nvme-fabrics: Invalid cross-device link" issue.
> 
> Sagi, can you push it (with my Tested-by: Max Gurtovoy <maxg@mellanox.com>
> and Reviewed-by: Max Gurtovoy <maxg@mellanox.com>), or shall I?

Thanks, I'll send it.

* Re: Regression: Connect-X5 doesn't connect with NVME-of
From: Laurence Oberman @ 2018-02-05 15:44 UTC
  To: Sagi Grimberg, Max Gurtovoy, Saeed Mahameed, Logan Gunthorpe,
	linux-rdma@vger.kernel.org
  Cc: Stephen Bates, linux-nvme, Christoph Hellwig

On Mon, 2018-02-05 at 16:18 +0200, Sagi Grimberg wrote:
> > Hi Sagi/Logan,
> > 
> > I've reproduced it with v4.14.1 (it doesn't happen on every connect).
> > Sagi's proposal below fixes the "Failed to write to
> > /dev/nvme-fabrics: Invalid cross-device link" issue.
> > 
> > Sagi, can you push it (with my Tested-by: Max Gurtovoy <maxg@mellanox.com>
> > and Reviewed-by: Max Gurtovoy <maxg@mellanox.com>), or shall I?
> 
> Thanks, I'll send it.

This missed me because all my NVMe testing is still on CX3 (mlx4).
I will move the NVMe devices into the mlx5 setup so that next time I
will catch this sort of issue in testing.
Currently the mlx5 setup only tests iSER and SRP.

Thanks
Laurence 

* Re: Regression: Connect-X5 doesn't connect with NVME-of
From: Max Gurtovoy @ 2018-02-05 15:59 UTC
  To: Laurence Oberman, Sagi Grimberg, Saeed Mahameed, Logan Gunthorpe,
	linux-rdma@vger.kernel.org
  Cc: Stephen Bates, linux-nvme, Christoph Hellwig



On 2/5/2018 5:44 PM, Laurence Oberman wrote:
> On Mon, 2018-02-05 at 16:18 +0200, Sagi Grimberg wrote:
>>> Hi Sagi/Logan,
>>>
>>> I've reproduced it with v4.14.1 (it doesn't happen on every connect).
>>> Sagi's proposal below fixes the "Failed to write to
>>> /dev/nvme-fabrics: Invalid cross-device link" issue.
>>>
>>> Sagi, can you push it (with my Tested-by: Max Gurtovoy <maxg@mellanox.com>
>>> and Reviewed-by: Max Gurtovoy <maxg@mellanox.com>), or shall I?
>>
>> Thanks, I'll send it.
> 
> This missed me because all my NVMe testing is still on CX3 (mlx4).
> I will move the NVMe devices into the mlx5 setup so that next time I
> will catch this sort of issue in testing.
> Currently the mlx5 setup only tests iSER and SRP.

Thanks Laurence,
This is well appreciated :)

> 
> Thanks
> Laurence
> 

Cheers,
Max
