* [PATCH for-rc] RDMA/hns: Fix the problem of mailbox being blocked in the reset scene
@ 2021-11-23  8:48 Wenpeng Liang
From: Wenpeng Liang @ 2021-11-23  8:48 UTC
  To: jgg, leon; +Cc: linux-rdma, linuxarm, liangwenpeng

From: Yangyang Li <liyangyang20@huawei.com>

is_reset indicates whether the hardware has started to reset. When
hns_roce_hw_v2_reset_notify_down() is called, the hardware has not yet
started to reset. If is_reset is set at this point, the driver intercepts
every mailbox operation issued by resource-destroy actions, so the destroy
commands never reach the hardware. The driver then cleans up the resources
while the hardware is still accessing them, and errors such as the
following appear:

[382663.191495] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
[382663.336320] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000350100000010
[382663.349860] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x000002088000003f
[382663.362217] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x00000000a50e0800
[382663.370690] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000000000000000
[382663.385557] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
[382663.487465] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000350100000010
[382663.534555] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x000002088000043e
[382663.546569] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x00000000a50a0800
[382663.554642] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000000000000000
[382663.565023] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
[382663.575860] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000350100000010
[382663.585248] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000020880000436
[382663.595860] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x00000000a50a0880
[382663.804870] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000000000000000
[382663.942132] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
[382663.962770] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000350100000010
[382664.100535] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x000002088000043a
[382664.178632] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x00000000a50e0840
[382664.218997] hns3 0000:35:00.0: INT status: CMDQ(0x0) HW errors(0x0) other(0x0)
[382664.223572] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000000000000000
[382664.257988] hns3 0000:35:00.0: received unknown or unhandled event of vector0
[382664.271027] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
[382664.546592] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000350100000010
[382664.555942] {34}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 7

is_reset is already set correctly in check_aedev_reset_status(), so remove
the premature assignment in hns_roce_hw_v2_reset_notify_down().

Fixes: 726be12f5ca0 ("RDMA/hns: Set reset flag when hw resetting")
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
---
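The failure mode described in the commit message can be modelled in a few
lines of C. This is a hypothetical, simplified sketch, not the hns driver
code: struct model_dev, model_post_mbox_destroy(), model_notify_down(),
model_check_reset_status() and model_teardown() are invented for
illustration, and the sketch assumes that an intercepted mailbox command is
reported to the caller as success. It counts how many contexts the hardware
still holds after the driver has finished destroying resources, before the
hardware reset has actually begun.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state touched by this patch. Field names echo
 * hr_dev->is_reset/active/dis_db, but this is NOT struct hns_roce_dev. */
struct model_dev {
	bool is_reset;    /* driver flag: "hardware has begun reset" */
	bool active;
	bool dis_db;
	bool hw_in_reset; /* ground truth: has the hardware started resetting? */
	int live_hw_ctx;  /* contexts the hardware still believes exist */
};

/* Hypothetical mailbox path: while is_reset is set the command is
 * intercepted and reported as success, so the destroy never reaches the
 * hardware (assumption made for this sketch). */
static int model_post_mbox_destroy(struct model_dev *dev)
{
	if (dev->is_reset)
		return 0;           /* intercepted: hardware context stays live */
	if (dev->live_hw_ctx > 0)
		dev->live_hw_ctx--; /* hardware releases the context */
	return 0;
}

/* Models reset_notify_down(); set_is_reset selects the pre-patch behaviour. */
static void model_notify_down(struct model_dev *dev, bool set_is_reset)
{
	if (set_is_reset)
		dev->is_reset = true; /* the assignment this patch removes */
	dev->active = false;
	dev->dis_db = true;
}

/* Stand-in for the later check that sets is_reset only once the hardware
 * has really entered reset. */
static void model_check_reset_status(struct model_dev *dev)
{
	if (dev->hw_in_reset)
		dev->is_reset = true;
}

/* Runs notify_down, then destroys n resources before the hardware reset
 * starts. Returns how many contexts the hardware still holds even though
 * the driver considers them freed. */
static int model_teardown(bool set_is_reset_early, int n)
{
	struct model_dev dev = { .live_hw_ctx = n };

	assert(n >= 0);
	model_notify_down(&dev, set_is_reset_early);
	model_check_reset_status(&dev); /* hardware not in reset yet: no-op */
	for (int i = 0; i < n; i++)
		model_post_mbox_destroy(&dev);
	return dev.live_hw_ctx;
}
```

With the early assignment, model_teardown(true, 3) leaves all 3 contexts
live in the modelled hardware after the driver has released them, which
corresponds to the SMMU faults in the log above; without it,
model_teardown(false, 3) returns 0.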
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index 9bfbaddd1763..ae14329c619c 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -6387,10 +6387,8 @@ static int hns_roce_hw_v2_reset_notify_down(struct hnae3_handle *handle)
 	if (!hr_dev)
 		return 0;
 
-	hr_dev->is_reset = true;
 	hr_dev->active = false;
 	hr_dev->dis_db = true;
-
 	hr_dev->state = HNS_ROCE_DEVICE_STATE_RST_DOWN;
 
 	return 0;
-- 
2.33.0



* Re: [PATCH for-rc] RDMA/hns: Fix the problem of mailbox being blocked in the reset scene
@ 2021-11-25 17:26 ` Jason Gunthorpe
From: Jason Gunthorpe @ 2021-11-25 17:26 UTC
  To: Wenpeng Liang; +Cc: leon, linux-rdma, linuxarm

On Tue, Nov 23, 2021 at 04:48:09PM +0800, Wenpeng Liang wrote:
> From: Yangyang Li <liyangyang20@huawei.com>
> 
> is_reset indicates whether the hardware has started to reset. When
> hns_roce_hw_v2_reset_notify_down() is called, the hardware has not yet
> started to reset. If is_reset is set at this point, the driver intercepts
> every mailbox operation issued by resource-destroy actions, so the destroy
> commands never reach the hardware. The driver then cleans up the resources
> while the hardware is still accessing them, and errors such as the
> following appear:
> 
> [382663.191495] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
> [382663.336320] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000350100000010
> [382663.349860] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x000002088000003f
> [382663.362217] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x00000000a50e0800
> [382663.370690] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000000000000000
> [382663.385557] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
> [382663.487465] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000350100000010
> [382663.534555] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x000002088000043e
> [382663.546569] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x00000000a50a0800
> [382663.554642] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000000000000000
> [382663.565023] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
> [382663.575860] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000350100000010
> [382663.585248] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000020880000436
> [382663.595860] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x00000000a50a0880
> [382663.804870] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000000000000000
> [382663.942132] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
> [382663.962770] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000350100000010
> [382664.100535] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x000002088000043a
> [382664.178632] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x00000000a50e0840
> [382664.218997] hns3 0000:35:00.0: INT status: CMDQ(0x0) HW errors(0x0) other(0x0)
> [382664.223572] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000000000000000
> [382664.257988] hns3 0000:35:00.0: received unknown or unhandled event of vector0
> [382664.271027] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
> [382664.546592] arm-smmu-v3 arm-smmu-v3.2.auto: 	0x0000350100000010
> [382664.555942] {34}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 7
> 
> is_reset is already set correctly in check_aedev_reset_status(), so remove
> the premature assignment in hns_roce_hw_v2_reset_notify_down().
> 
> Fixes: 726be12f5ca0 ("RDMA/hns: Set reset flag when hw resetting")
> Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
> ---
>  drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 2 --
>  1 file changed, 2 deletions(-)

Applied to for-rc, thanks

Jason
