From: Leon Romanovsky <leon@kernel.org>
To: Doug Ledford <dledford@redhat.com>, Jason Gunthorpe <jgg@mellanox.com>
Cc: Leon Romanovsky <leonro@mellanox.com>,
RDMA mailing list <linux-rdma@vger.kernel.org>,
Daniel Jurgens <danielj@mellanox.com>,
Erez Shitrit <erezsh@mellanox.com>,
Jason Gunthorpe <jgg@ziepe.ca>,
Maor Gottlieb <maorg@mellanox.com>,
Michael Guralnik <michaelgur@mellanox.com>,
Moni Shoua <monis@mellanox.com>,
Parav Pandit <parav@mellanox.com>,
Sean Hefty <sean.hefty@intel.com>,
Valentine Fatiev <valentinef@mellanox.com>,
Yishai Hadas <yishaih@mellanox.com>,
Yonatan Cohen <yonatanc@mellanox.com>,
Zhu Yanjun <yanjunz@mellanox.com>
Subject: [PATCH rdma-rc 7/9] RDMA/rxe: Fix soft lockup problem due to using tasklets in softirq
Date: Wed, 12 Feb 2020 09:26:33 +0200 [thread overview]
Message-ID: <20200212072635.682689-8-leon@kernel.org> (raw)
In-Reply-To: <20200212072635.682689-1-leon@kernel.org>
From: Zhu Yanjun <yanjunz@mellanox.com>
When running stress tests with RXE, call traces like the following often occur:
"
watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/2:0]
...
Call Trace:
<IRQ>
create_object+0x3f/0x3b0
kmem_cache_alloc_node_trace+0x129/0x2d0
__kmalloc_reserve.isra.52+0x2e/0x80
__alloc_skb+0x83/0x270
rxe_init_packet+0x99/0x150 [rdma_rxe]
rxe_requester+0x34e/0x11a0 [rdma_rxe]
rxe_do_task+0x85/0xf0 [rdma_rxe]
tasklet_action_common.isra.21+0xeb/0x100
__do_softirq+0xd0/0x298
irq_exit+0xc5/0xd0
smp_apic_timer_interrupt+0x68/0x120
apic_timer_interrupt+0xf/0x20
</IRQ>
...
"
The root cause is that a tasklet runs in softirq context. Here, the completer
tasklet handler schedules yet another tasklet, i.e. triggers another softirq,
and these softirq handlers usually run on the same CPU core, so that core stays
busy in softirq processing and hits the soft lockup bug. Avoid this by running
the requester task directly instead of scheduling it as a new tasklet.
Fixes: 8700e3e7c485 ("Soft RoCE driver")
CC: Yossef Efraim <yossefe@mellanox.com>
Signed-off-by: Zhu Yanjun <yanjunz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/sw/rxe/rxe_comp.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index 116cafc9afcf..4bc88708b355 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -329,7 +329,7 @@ static inline enum comp_state check_ack(struct rxe_qp *qp,
qp->comp.psn = pkt->psn;
if (qp->req.wait_psn) {
qp->req.wait_psn = 0;
- rxe_run_task(&qp->req.task, 1);
+ rxe_run_task(&qp->req.task, 0);
}
}
return COMPST_ERROR_RETRY;
@@ -463,7 +463,7 @@ static void do_complete(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
*/
if (qp->req.wait_fence) {
qp->req.wait_fence = 0;
- rxe_run_task(&qp->req.task, 1);
+ rxe_run_task(&qp->req.task, 0);
}
}
@@ -479,7 +479,7 @@ static inline enum comp_state complete_ack(struct rxe_qp *qp,
if (qp->req.need_rd_atomic) {
qp->comp.timeout_retry = 0;
qp->req.need_rd_atomic = 0;
- rxe_run_task(&qp->req.task, 1);
+ rxe_run_task(&qp->req.task, 0);
}
}
@@ -725,7 +725,7 @@ int rxe_completer(void *arg)
RXE_CNT_COMP_RETRY);
qp->req.need_retry = 1;
qp->comp.started_retry = 1;
- rxe_run_task(&qp->req.task, 1);
+ rxe_run_task(&qp->req.task, 0);
}
if (pkt) {
--
2.24.1
Thread overview: 25+ messages
2020-02-12 7:26 [PATCH rdma-rc 0/9] Fixes for v5.6 Leon Romanovsky
2020-02-12 7:26 ` [PATCH rdma-rc 1/9] RDMA/ucma: Mask QPN to be 24 bits according to IBTA Leon Romanovsky
2020-02-12 7:26 ` [PATCH rdma-rc 2/9] RDMA/core: Fix protection fault in get_pkey_idx_qp_list Leon Romanovsky
2020-02-12 8:01 ` Leon Romanovsky
2020-02-12 8:06 ` Leon Romanovsky
2020-02-12 7:26 ` [PATCH rdma-rc 3/9] Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow" Leon Romanovsky
2020-02-13 13:30 ` Jason Gunthorpe
2020-02-14 3:11 ` Parav Pandit
2020-02-14 14:08 ` Jason Gunthorpe
2020-02-14 14:48 ` Parav Pandit
2020-02-19 20:40 ` Jason Gunthorpe
2020-02-12 7:26 ` [PATCH rdma-rc 4/9] IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode Leon Romanovsky
2020-02-13 15:37 ` Jason Gunthorpe
2020-02-13 18:10 ` Leon Romanovsky
2020-02-13 18:26 ` Jason Gunthorpe
2020-02-13 18:36 ` Leon Romanovsky
2020-02-13 19:09 ` Jason Gunthorpe
2020-02-12 7:26 ` [PATCH rdma-rc 5/9] RDMA/core: Add missing list deletion on freeing event queue Leon Romanovsky
2020-02-12 7:26 ` [PATCH rdma-rc 6/9] IB/mlx5: Fix async events cleanup flows Leon Romanovsky
2020-02-12 7:26 ` Leon Romanovsky [this message]
2020-02-12 7:26 ` [PATCH rdma-rc 8/9] IB/umad: Fix kernel crash while unloading ib_umad Leon Romanovsky
2020-02-13 14:28 ` Jason Gunthorpe
2020-02-13 18:03 ` Leon Romanovsky
2020-02-12 7:26 ` [PATCH rdma-rc 9/9] RDMA/mlx5: Prevent overflow in mmap offset calculations Leon Romanovsky
2020-02-13 18:03 ` [PATCH rdma-rc 0/9] Fixes for v5.6 Jason Gunthorpe