From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
To: jgg@ziepe.ca, leon@kernel.org
Cc: linux-rdma@vger.kernel.org, linux-nvme@lists.infradead.org,
	Jinpu Wang <jinpu.wang@cloud.ionos.com>
Subject: Lock issue with 2a7cec53816 ("RDMA/cma: Fix locking for the RDMA_CM_CONNECT state")
Date: Mon, 26 Oct 2020 09:19:37 +0100	[thread overview]
Message-ID: <3b1f7767-98e2-93e0-b718-16d1c5346140@cloud.ionos.com> (raw)

Hi Jason & Leon,


[Sorry to send again since the previous mail didn't hit the list]

With the mentioned commit applied, I got the call traces below.

[ 1209.544359] INFO: task kworker/u8:3:615 blocked for more than 120 seconds.
[ 1209.545170]       Tainted: G           O      5.9.0-rc8+ #30
[ 1209.545341] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1209.545814] task:kworker/u8:3    state:D stack:    0 pid:  615 ppid:     2 flags:0x00004000
[ 1209.547601] Workqueue: rdma_cm cma_work_handler [rdma_cm]
[ 1209.548645] Call Trace:
[ 1209.552120]  __schedule+0x363/0x7b0
[ 1209.552903]  schedule+0x44/0xb0
[ 1209.553093]  schedule_preempt_disabled+0xe/0x10
[ 1209.553253]  __mutex_lock.isra.10+0x26d/0x4e0
[ 1209.553421]  __mutex_lock_slowpath+0x13/0x20
[ 1209.553596]  ? __mutex_lock_slowpath+0x13/0x20
[ 1209.553751]  mutex_lock+0x2f/0x40
[ 1209.554045]  rdma_connect+0x52/0x770 [rdma_cm]
[ 1209.554215]  ? vprintk_default+0x1f/0x30
[ 1209.554359]  ? vprintk_func+0x62/0x100
[ 1209.554532]  ? cma_cm_event_handler+0x2b/0xe0 [rdma_cm]
[ 1209.554688]  ? printk+0x52/0x6e
[ 1209.554852]  rtrs_clt_rdma_cm_handler+0x393/0x8b0 [rtrs_client]
[ 1209.555056]  cma_cm_event_handler+0x2b/0xe0 [rdma_cm]
[ 1209.561823]  ? cma_comp_exch+0x4e/0x60 [rdma_cm]
[ 1209.561883]  ? cma_cm_event_handler+0x2b/0xe0 [rdma_cm]
[ 1209.561940]  cma_work_handler+0x89/0xc0 [rdma_cm]
[ 1209.561988]  process_one_work+0x20c/0x400
[...]
[ 1209.562323] INFO: task bash:630 blocked for more than 120 seconds.
[ 1209.562368]       Tainted: G           O      5.9.0-rc8+ #30
[ 1209.562404] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1209.562456] task:bash            state:D stack:    0 pid:  630 ppid:   618 flags:0x00004000
[ 1209.562515] Call Trace:
[ 1209.562559]  __schedule+0x363/0x7b0
[ 1209.562603]  schedule+0x44/0xb0
[ 1209.562642]  schedule_preempt_disabled+0xe/0x10
[ 1209.562685]  __mutex_lock.isra.10+0x26d/0x4e0
[ 1209.562727]  ? vprintk_func+0x62/0x100
[ 1209.562765]  ? vprintk_default+0x1f/0x30
[ 1209.562806]  __mutex_lock_slowpath+0x13/0x20
[ 1209.562847]  ? __mutex_lock_slowpath+0x13/0x20
[ 1209.562887]  mutex_lock+0x2f/0x40
[ 1209.562936]  rdma_destroy_id+0x31/0x60 [rdma_cm]
[ 1209.562983]  destroy_cm.isra.24+0x2d/0x50 [rtrs_client]
[ 1209.563586]  init_sess+0x23b/0xac0 [rtrs_client]
[ 1209.563691]  ? wait_woken+0x80/0x80
[ 1209.563742]  rtrs_clt_open+0x27f/0x540 [rtrs_client]
[ 1209.563791]  ? remap_devs+0x150/0x150 [rnbd_client]
[ 1209.563867]  find_and_get_or_create_sess+0x4a1/0x5e0 [rnbd_client]
[ 1209.563921]  ? remap_devs+0x150/0x150 [rnbd_client]
[ 1209.563966]  ? atomic_notifier_call_chain+0x1a/0x20
[ 1209.564011]  ? vt_console_print+0x203/0x3d0
[ 1209.564051]  ? up+0x32/0x50
[ 1209.564093]  ? __irq_work_queue_local+0x4f/0x60
[ 1209.564132]  ? irq_work_queue+0x1a/0x30
[ 1209.564169]  ? wake_up_klogd.part.28+0x34/0x40
[ 1209.564208]  ? vprintk_emit+0x126/0x2b0
[ 1209.564249]  ? vprintk_default+0x1f/0x30
[ 1209.564292]  rnbd_clt_map_device+0x74/0x7b0 [rnbd_client]
[ 1209.564341]  rnbd_clt_map_device_store+0x39e/0x6c0 [rnbd_client]

From my understanding, it is caused by rdma_connect() needing to take the
handler_mutex again in this path:

cma_ib_handler
  -> mutex_lock(&id_priv->handler_mutex)
  -> cma_cm_event_handler
       -> id_priv->id.event_handler == rtrs_clt_rdma_cm_handler
            -> rdma_connect
                 -> mutex_lock(&id_priv->handler_mutex)


And it seems nvme-rdma could have the same issue, since rdma_connect is
called from nvme_rdma_cm_handler -> nvme_rdma_route_resolved.

Thanks,
Guoqing


