linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Haris Iqbal <haris.iqbal@cloud.ionos.com>
To: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>, Leon Romanovsky <leon@kernel.org>,
	linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org,
	Jinpu Wang <jinpu.wang@cloud.ionos.com>
Subject: Re: Lock issue with 2a7cec53816 ("RDMA/cma: Fix locking for the RDMA_CM_CONNECT state")
Date: Mon, 26 Oct 2020 15:57:53 +0530	[thread overview]
Message-ID: <CAJpMwyhONwT6mYzOazjuWS01AVSd4bu=K33-LYk=ZygP021V4w@mail.gmail.com> (raw)
In-Reply-To: <3b1f7767-98e2-93e0-b718-16d1c5346140@cloud.ionos.com>

On Mon, Oct 26, 2020 at 1:50 PM Guoqing Jiang
<guoqing.jiang@cloud.ionos.com> wrote:
>
> Hi Jason & Leon,
>
>
> [Sorry to send again since the previous mail didn't hit the list]
>
> With the mentioned commit, I got the below calltraces.
>
> [ 1209.544359] INFO: task kworker/u8:3:615 blocked for more than 120
> seconds.
> [ 1209.545170]       Tainted: G           O      5.9.0-rc8+ #30
> [ 1209.545341] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 1209.545814] task:kworker/u8:3    state:D stack:    0 pid:  615
> ppid:     2 flags:0x00004000
> [ 1209.547601] Workqueue: rdma_cm cma_work_handler [rdma_cm]
> [ 1209.548645] Call Trace:
> [ 1209.552120]  __schedule+0x363/0x7b0
> [ 1209.552903]  schedule+0x44/0xb0
> [ 1209.553093]  schedule_preempt_disabled+0xe/0x10
> [ 1209.553253]  __mutex_lock.isra.10+0x26d/0x4e0
> [ 1209.553421]  __mutex_lock_slowpath+0x13/0x20
> [ 1209.553596]  ? __mutex_lock_slowpath+0x13/0x20
> [ 1209.553751]  mutex_lock+0x2f/0x40
> [ 1209.554045]  rdma_connect+0x52/0x770 [rdma_cm]
> [ 1209.554215]  ? vprintk_default+0x1f/0x30
> [ 1209.554359]  ? vprintk_func+0x62/0x100
> [ 1209.554532]  ? cma_cm_event_handler+0x2b/0xe0 [rdma_cm]
> [ 1209.554688]  ? printk+0x52/0x6e
> [ 1209.554852]  rtrs_clt_rdma_cm_handler+0x393/0x8b0 [rtrs_client]
> [ 1209.555056]  cma_cm_event_handler+0x2b/0xe0 [rdma_cm]
> [ 1209.561823]  ? cma_comp_exch+0x4e/0x60 [rdma_cm]
> [ 1209.561883]  ? cma_cm_event_handler+0x2b/0xe0 [rdma_cm]
> [ 1209.561940]  cma_work_handler+0x89/0xc0 [rdma_cm]
> [ 1209.561988]  process_one_work+0x20c/0x400
> [...]
> [ 1209.562323] INFO: task bash:630 blocked for more than 120 seconds.
> [ 1209.562368]       Tainted: G           O      5.9.0-rc8+ #30
> [ 1209.562404] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 1209.562456] task:bash            state:D stack:    0 pid:  630
> ppid:   618 flags:0x00004000
> [ 1209.562515] Call Trace:
> [ 1209.562559]  __schedule+0x363/0x7b0
> [ 1209.562603]  schedule+0x44/0xb0
> [ 1209.562642]  schedule_preempt_disabled+0xe/0x10
> [ 1209.562685]  __mutex_lock.isra.10+0x26d/0x4e0
> [ 1209.562727]  ? vprintk_func+0x62/0x100
> [ 1209.562765]  ? vprintk_default+0x1f/0x30
> [ 1209.562806]  __mutex_lock_slowpath+0x13/0x20
> [ 1209.562847]  ? __mutex_lock_slowpath+0x13/0x20
> [ 1209.562887]  mutex_lock+0x2f/0x40
> [ 1209.562936]  rdma_destroy_id+0x31/0x60 [rdma_cm]
> [ 1209.562983]  destroy_cm.isra.24+0x2d/0x50 [rtrs_client]
> [ 1209.563586]  init_sess+0x23b/0xac0 [rtrs_client]
> [ 1209.563691]  ? wait_woken+0x80/0x80
> [ 1209.563742]  rtrs_clt_open+0x27f/0x540 [rtrs_client]
> [ 1209.563791]  ? remap_devs+0x150/0x150 [rnbd_client]
> [ 1209.563867]  find_and_get_or_create_sess+0x4a1/0x5e0 [rnbd_client]
> [ 1209.563921]  ? remap_devs+0x150/0x150 [rnbd_client]
> [ 1209.563966]  ? atomic_notifier_call_chain+0x1a/0x20
> [ 1209.564011]  ? vt_console_print+0x203/0x3d0
> [ 1209.564051]  ? up+0x32/0x50
> [ 1209.564093]  ? __irq_work_queue_local+0x4f/0x60
> [ 1209.564132]  ? irq_work_queue+0x1a/0x30
> [ 1209.564169]  ? wake_up_klogd.part.28+0x34/0x40
> [ 1209.564208]  ? vprintk_emit+0x126/0x2b0
> [ 1209.564249]  ? vprintk_default+0x1f/0x30
> [ 1209.564292]  rnbd_clt_map_device+0x74/0x7b0 [rnbd_client]
> [ 1209.564341]  rnbd_clt_map_device_store+0x39e/0x6c0 [rnbd_client]
>
>  From my understanding, it is caused by the rdma_connect need to hold
> the handler_mutex
> again in the path.
>
> cma_ib_handler -> mutex_lock(&id_priv->handler_mutex)
> -> cma_cm_event_handler
>                      -> id_priv->id.event_handler = rtrs_clt_rdma_cm_handler
> -> rdma_connect
> -> mutex_lock(&id_priv->handler_mutex)

I think its coming from

cma_work_handler() -> cma_cm_event_handler() ->
id_priv->id.event_handler = rtrs_clt_rdma_cm_handler
rtrs_rdma_route_resolved
-> rdma_connect tries to get the same lock again
                                       -> mutex_lock(&id_priv->handler_mutex)

Since, cma_ib_handler() doesn't handle and forward the
RDMA_CM_EVENT_ROUTE_RESOLVED event.
And cma_work_handler() also holds the mutex before calling
cma_cm_event_handler()

It probably starts with,

After RDMA_CM_EVENT_ADDR_RESOLVED comes to rtrs_clt_rdma_cm_handler,
rdma_resolve_route() -> cma_resolve_ib_route() -> cma_query_ib_route() --> ...
(callback) cma_work_handler()

>
>
> And seems nvme rdma could have the same issue since rdma_connect is
> called by
> nvme_rdma_cm_handler -> nvme_rdma_route_resolved.
>
> Thanks,
> Guoqing



-- 

Regards
-Haris

      reply	other threads:[~2020-10-26 10:28 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-26  8:19 Lock issue with 2a7cec53816 ("RDMA/cma: Fix locking for the RDMA_CM_CONNECT state") Guoqing Jiang
2020-10-26 10:27 ` Haris Iqbal [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJpMwyhONwT6mYzOazjuWS01AVSd4bu=K33-LYk=ZygP021V4w@mail.gmail.com' \
    --to=haris.iqbal@cloud.ionos.com \
    --cc=guoqing.jiang@cloud.ionos.com \
    --cc=jgg@ziepe.ca \
    --cc=jinpu.wang@cloud.ionos.com \
    --cc=leon@kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-rdma@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).