stable.vger.kernel.org archive mirror
* Backport missing mlx5 fixes after 50b2412b7e7
@ 2020-11-18 18:28 Timo Rothenpieler
  2020-11-20  6:18 ` Jack Wang
  2020-11-20  8:37 ` Greg KH
  0 siblings, 2 replies; 4+ messages in thread
From: Timo Rothenpieler @ 2020-11-18 18:28 UTC
  To: stable; +Cc: Eran Ben Elisha, Jack Wang, Saeed Mahameed, jgg

Hi,

After 50b2412b7e7862c5af0cbf4b10d93bc5c712d021 was backported to stable 
branches (I only tested 5.4), some serious issues started to arise.

According to linux-rdma, the following two patches, which need to go along 
with 50b2412b7e, are missing:

> 1. 1d5558b1f0de net/mlx5: poll cmd EQ in case of command timeout
> 2. 410bd754cd73 net/mlx5: Add retry mechanism to the command entry ... 

I managed to apply those mostly cleanly after also applying two 
dependencies.
So the complete list of needed commits for 5.4 is:

1. 3ed879965cc4 net/mlx5: Use async EQ setup cleanup helpers ...
2. 1d5558b1f0de net/mlx5: poll cmd EQ in case of command timeout
3. d43b7007dbd1 net/mlx5: Fix a race when moving command ...
4. 410bd754cd73 net/mlx5: Add retry mechanism to the command entry ...
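
The list above can be turned into a repeatable sequence. The following is only a minimal sketch, not the procedure actually used: it assumes a checkout of a 5.4 stable branch in a tree that already contains the mainline commits, and it only prints the `git cherry-pick -x` commands in list order. The names `BACKPORT_ORDER` and `cherry_pick_commands` are made up for illustration. Since the patches applied only "mostly cleanly", some conflicts would still have to be resolved by hand.

```python
# Commit IDs and their order come from the list above.
# BACKPORT_ORDER and cherry_pick_commands are hypothetical names.
BACKPORT_ORDER = [
    "3ed879965cc4",  # dependency: Use async EQ setup cleanup helpers ...
    "1d5558b1f0de",  # poll cmd EQ in case of command timeout
    "d43b7007dbd1",  # dependency: Fix a race when moving command ...
    "410bd754cd73",  # Add retry mechanism to the command entry ...
]


def cherry_pick_commands(commits):
    """Return one `git cherry-pick -x` argv per commit, in the given order."""
    # -x records the original commit ID in the backported commit message.
    return [["git", "cherry-pick", "-x", sha] for sha in commits]


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing is modified.
    for argv in cherry_pick_commands(BACKPORT_ORDER):
        print(" ".join(argv))
```

Printing rather than executing keeps the sketch harmless; each command can be run one at a time so that a conflict stops the sequence at a known commit.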

With those 4 commits applied, the issue is fixed.
For reference, this is the output I get with 5.4.77:

> Nov 17 01:12:58 store01 kernel: mlx5_ib: Mellanox Connect-IB Infiniband driver v5.0-0
> Nov 17 01:12:58 store01 kernel: mlx5_core 0000:01:00.0: cmd_work_handler:887:(pid 383): failed to allocate command entry
> Nov 17 01:12:58 store01 kernel: infiniband mlx5_0: reg_mr_callback:104:(pid 383): async reg mr failed. status -11
> Nov 17 01:12:58 store01 kernel: mlx5_core 0000:01:00.0: cmd_work_handler:887:(pid 383): failed to allocate command entry
> Nov 17 01:12:58 store01 kernel: mlx5_core 0000:01:00.0: mlx5e_create_mdev_resources:104:(pid 1): alloc td failed, -11
> Nov 17 01:12:58 store01 kernel: mlx5_0, 1: ipoib_intf_alloc failed -11 
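
For anyone checking their own logs for the same symptom, here is a rough sketch that pulls the function name, source line, pid and error value out of lines shaped like the mlx5_core ones above. The line format is inferred only from this output, and the names `LOG_RE`, `ERRNO_RE`, `ERRNO_NAMES` and `parse_line` are made up for illustration. The only errno value mapped is the one seen here: 11, which is EAGAIN on Linux.

```python
import re

# Format assumed from the log lines above: "<func>:<line>:(pid <pid>): <message>"
LOG_RE = re.compile(r"(?P<func>\w+):(?P<line>\d+):\(pid (?P<pid>\d+)\): (?P<msg>.*)$")
# A trailing negative number in the message is treated as a negated errno value.
ERRNO_RE = re.compile(r"-(\d+)\s*$")
# Only the value seen in this report; 11 is EAGAIN on Linux.
ERRNO_NAMES = {11: "EAGAIN"}


def parse_line(text):
    """Return a dict describing one log line, or None if the format differs."""
    m = LOG_RE.search(text)
    if m is None:
        return None
    rec = {
        "func": m["func"],
        "line": int(m["line"]),
        "pid": int(m["pid"]),
        "msg": m["msg"].strip(),
    }
    e = ERRNO_RE.search(m["msg"])
    # Unknown values are kept as the raw number string rather than guessed.
    rec["errno"] = ERRNO_NAMES.get(int(e.group(1)), e.group(1)) if e else None
    return rec


if __name__ == "__main__":
    sample = ("Nov 17 01:12:58 store01 kernel: mlx5_core 0000:01:00.0: "
              "mlx5e_create_mdev_resources:104:(pid 1): alloc td failed, -11")
    print(parse_line(sample))
```

Lines without the `func:line:(pid N):` part, such as the final ipoib_intf_alloc line, are not matched and yield None.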



* Re: Backport missing mlx5 fixes after 50b2412b7e7
  2020-11-18 18:28 Backport missing mlx5 fixes after 50b2412b7e7 Timo Rothenpieler
@ 2020-11-20  6:18 ` Jack Wang
  2020-11-22 17:48   ` Sasha Levin
  2020-11-20  8:37 ` Greg KH
  1 sibling, 1 reply; 4+ messages in thread
From: Jack Wang @ 2020-11-20  6:18 UTC
  To: Timo Rothenpieler, gregkh, sashal
  Cc: stable, Eran Ben Elisha, Saeed Mahameed, Jason Gunthorpe

Timo Rothenpieler <timo@rothenpieler.org> wrote on Wed, Nov 18, 2020 at 7:28 PM:
>
> Hi,
>
> After 50b2412b7e7862c5af0cbf4b10d93bc5c712d021 was backported to stable
> branches (I only tested 5.4), some serious issues started to arise.
>
> According to linux-rdma, the following two patches that need to go along
> with 50b2412b7e are missing:
>
> > 1. 1d5558b1f0de net/mlx5: poll cmd EQ in case of command timeout
> > 2. 410bd754cd73 net/mlx5: Add retry mechanism to the command entry ...
>
> I managed to apply those mostly cleanly after also applying two
> dependencies.
> So the complete list of needed commits for 5.4 is:
>
> 1. 3ed879965cc4 net/mlx5: Use async EQ setup cleanup helpers ...
> 2. 1d5558b1f0de net/mlx5: poll cmd EQ in case of command timeout
> 3. d43b7007dbd1 net/mlx5: Fix a race when moving command ...
> 4. 410bd754cd73 net/mlx5: Add retry mechanism to the command entry ...
>
> With those 4 commits applied, the issue is fixed.
> For reference, that's the output I get with 5.4.77:
>
> > Nov 17 01:12:58 store01 kernel: mlx5_ib: Mellanox Connect-IB Infiniband driver v5.0-0
> > Nov 17 01:12:58 store01 kernel: mlx5_core 0000:01:00.0: cmd_work_handler:887:(pid 383): failed to allocate command entry
> > Nov 17 01:12:58 store01 kernel: infiniband mlx5_0: reg_mr_callback:104:(pid 383): async reg mr failed. status -11
> > Nov 17 01:12:58 store01 kernel: mlx5_core 0000:01:00.0: cmd_work_handler:887:(pid 383): failed to allocate command entry
> > Nov 17 01:12:58 store01 kernel: mlx5_core 0000:01:00.0: mlx5e_create_mdev_resources:104:(pid 1): alloc td failed, -11
> > Nov 17 01:12:58 store01 kernel: mlx5_0, 1: ipoib_intf_alloc failed -11
>
+cc Greg & Sasha
Hi,

We hit the same problem on mlx5. I've tested the four commits mentioned
above and they work fine; please include them in a future 5.4 kernel.

Thanks!
Jack Wang


* Re: Backport missing mlx5 fixes after 50b2412b7e7
  2020-11-18 18:28 Backport missing mlx5 fixes after 50b2412b7e7 Timo Rothenpieler
  2020-11-20  6:18 ` Jack Wang
@ 2020-11-20  8:37 ` Greg KH
  1 sibling, 0 replies; 4+ messages in thread
From: Greg KH @ 2020-11-20  8:37 UTC
  To: Timo Rothenpieler; +Cc: stable, Eran Ben Elisha, Jack Wang, Saeed Mahameed, jgg

On Wed, Nov 18, 2020 at 07:28:30PM +0100, Timo Rothenpieler wrote:
> Hi,
> 
> After 50b2412b7e7862c5af0cbf4b10d93bc5c712d021 was backported to stable
> branches (I only tested 5.4), some serious issues started to arise.
> 
> According to linux-rdma, the following two patches that need to go along
> with 50b2412b7e are missing:
> 
> > 1. 1d5558b1f0de net/mlx5: poll cmd EQ in case of command timeout
> > 2. 410bd754cd73 net/mlx5: Add retry mechanism to the command entry ...
> 
> I managed to apply those mostly cleanly after also applying two
> dependencies.
> So the complete list of needed commits for 5.4 is:
> 
> 1. 3ed879965cc4 net/mlx5: Use async EQ setup cleanup helpers ...
> 2. 1d5558b1f0de net/mlx5: poll cmd EQ in case of command timeout
> 3. d43b7007dbd1 net/mlx5: Fix a race when moving command ...
> 4. 410bd754cd73 net/mlx5: Add retry mechanism to the command entry ...
> 
> With those 4 commits applied, the issue is fixed.
> For reference, that's the output I get with 5.4.77:

All now queued up, thanks.

greg k-h


* Re: Backport missing mlx5 fixes after 50b2412b7e7
  2020-11-20  6:18 ` Jack Wang
@ 2020-11-22 17:48   ` Sasha Levin
  0 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2020-11-22 17:48 UTC
  To: Jack Wang
  Cc: Timo Rothenpieler, gregkh, stable, Eran Ben Elisha,
	Saeed Mahameed, Jason Gunthorpe

On Fri, Nov 20, 2020 at 07:18:04AM +0100, Jack Wang wrote:
>Timo Rothenpieler <timo@rothenpieler.org> wrote on Wed, Nov 18, 2020 at 7:28 PM:
>>
>> Hi,
>>
>> After 50b2412b7e7862c5af0cbf4b10d93bc5c712d021 was backported to stable
>> branches (I only tested 5.4), some serious issues started to arise.
>>
>> According to linux-rdma, the following two patches that need to go along
>> with 50b2412b7e are missing:
>>
>> > 1. 1d5558b1f0de net/mlx5: poll cmd EQ in case of command timeout
>> > 2. 410bd754cd73 net/mlx5: Add retry mechanism to the command entry ...
>>
>> I managed to apply those mostly cleanly after also applying two
>> dependencies.
>> So the complete list of needed commits for 5.4 is:
>>
>> 1. 3ed879965cc4 net/mlx5: Use async EQ setup cleanup helpers ...
>> 2. 1d5558b1f0de net/mlx5: poll cmd EQ in case of command timeout
>> 3. d43b7007dbd1 net/mlx5: Fix a race when moving command ...
>> 4. 410bd754cd73 net/mlx5: Add retry mechanism to the command entry ...
>>
>> With those 4 commits applied, the issue is fixed.
>> For reference, that's the output I get with 5.4.77:
>>
>> > Nov 17 01:12:58 store01 kernel: mlx5_ib: Mellanox Connect-IB Infiniband driver v5.0-0
>> > Nov 17 01:12:58 store01 kernel: mlx5_core 0000:01:00.0: cmd_work_handler:887:(pid 383): failed to allocate command entry
>> > Nov 17 01:12:58 store01 kernel: infiniband mlx5_0: reg_mr_callback:104:(pid 383): async reg mr failed. status -11
>> > Nov 17 01:12:58 store01 kernel: mlx5_core 0000:01:00.0: cmd_work_handler:887:(pid 383): failed to allocate command entry
>> > Nov 17 01:12:58 store01 kernel: mlx5_core 0000:01:00.0: mlx5e_create_mdev_resources:104:(pid 1): alloc td failed, -11
>> > Nov 17 01:12:58 store01 kernel: mlx5_0, 1: ipoib_intf_alloc failed -11
>>
>+cc Greg & Sasha
>Hi,
>
>We hit the same problem on mlx5. I've tested the four commits mentioned
>above and they work fine; please include them in a future 5.4 kernel.

Looks like Greg picked them up, thanks!

-- 
Thanks,
Sasha
