linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Chris Leech <cleech@redhat.com>, Christoph Hellwig <hch@lst.de>,
	Sasha Levin <sashal@kernel.org>,
	kbusch@kernel.org, axboe@fb.com, sagi@grimberg.me,
	linux-nvme@lists.infradead.org
Subject: [PATCH AUTOSEL 5.17 40/43] nvme-tcp: lockdep: annotate in-kernel sockets
Date: Mon, 28 Mar 2022 07:18:24 -0400	[thread overview]
Message-ID: <20220328111828.1554086-40-sashal@kernel.org> (raw)
In-Reply-To: <20220328111828.1554086-1-sashal@kernel.org>

From: Chris Leech <cleech@redhat.com>

[ Upstream commit 841aee4d75f18fdfb53935080b03de0c65e9b92c ]

Put NVMe/TCP sockets in their own class to avoid some lockdep warnings.
Sockets created by nvme-tcp are not exposed to user-space, and will not
trigger certain code paths that the general socket API exposes.

Lockdep complains about a circular dependency between the socket and
filesystem locks, because setsockopt can trigger a page fault with a
socket lock held, but nvme-tcp sends requests on the socket while file
system locks are held.

  ======================================================
  WARNING: possible circular locking dependency detected
  5.15.0-rc3 #1 Not tainted
  ------------------------------------------------------
  fio/1496 is trying to acquire lock:
  (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendpage+0x23/0x80

  but task is already holding lock:
  (&xfs_dir_ilock_class/5){+.+.}-{3:3}, at: xfs_ilock+0xcf/0x290 [xfs]

  which lock already depends on the new lock.

  other info that might help us debug this:

  chain exists of:
   sk_lock-AF_INET --> sb_internal --> &xfs_dir_ilock_class/5

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&xfs_dir_ilock_class/5);
                                lock(sb_internal);
                                lock(&xfs_dir_ilock_class/5);
   lock(sk_lock-AF_INET);

  *** DEADLOCK ***

  6 locks held by fio/1496:
   #0: (sb_writers#13){.+.+}-{0:0}, at: path_openat+0x9fc/0xa20
   #1: (&inode->i_sb->s_type->i_mutex_dir_key){++++}-{3:3}, at: path_openat+0x296/0xa20
   #2: (sb_internal){.+.+}-{0:0}, at: xfs_trans_alloc_icreate+0x41/0xd0 [xfs]
   #3: (&xfs_dir_ilock_class/5){+.+.}-{3:3}, at: xfs_ilock+0xcf/0x290 [xfs]
   #4: (hctx->srcu){....}-{0:0}, at: hctx_lock+0x51/0xd0
   #5: (&queue->send_mutex){+.+.}-{3:3}, at: nvme_tcp_queue_rq+0x33e/0x380 [nvme_tcp]

This annotation lets lockdep analyze nvme-tcp controlled sockets
independently of what the user-space sockets API does.

Link: https://lore.kernel.org/linux-nvme/CAHj4cs9MDYLJ+q+2_GXUK9HxFizv2pxUryUR0toX974M040z7g@mail.gmail.com/

Signed-off-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/nvme/host/tcp.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 65e00c64a588..d66e2de044e0 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -30,6 +30,44 @@ static int so_priority;
 module_param(so_priority, int, 0644);
 MODULE_PARM_DESC(so_priority, "nvme tcp socket optimize priority");
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+/* lockdep can detect a circular dependency of the form
+ *   sk_lock -> mmap_lock (page fault) -> fs locks -> sk_lock
+ * because dependencies are tracked for both nvme-tcp and user contexts. Using
+ * a separate class prevents lockdep from conflating nvme-tcp socket use with
+ * user-space socket API use.
+ */
+static struct lock_class_key nvme_tcp_sk_key[2];
+static struct lock_class_key nvme_tcp_slock_key[2];
+
+static void nvme_tcp_reclassify_socket(struct socket *sock)
+{
+	struct sock *sk = sock->sk;
+
+	if (WARN_ON_ONCE(!sock_allow_reclassification(sk)))
+		return;
+
+	switch (sk->sk_family) {
+	case AF_INET:
+		sock_lock_init_class_and_name(sk, "slock-AF_INET-NVME",
+					      &nvme_tcp_slock_key[0],
+					      "sk_lock-AF_INET-NVME",
+					      &nvme_tcp_sk_key[0]);
+		break;
+	case AF_INET6:
+		sock_lock_init_class_and_name(sk, "slock-AF_INET6-NVME",
+					      &nvme_tcp_slock_key[1],
+					      "sk_lock-AF_INET6-NVME",
+					      &nvme_tcp_sk_key[1]);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+	}
+}
+#else
+static void nvme_tcp_reclassify_socket(struct socket *sock) { }
+#endif
+
 enum nvme_tcp_send_state {
 	NVME_TCP_SEND_CMD_PDU = 0,
 	NVME_TCP_SEND_H2C_PDU,
@@ -1469,6 +1507,8 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl,
 		goto err_destroy_mutex;
 	}
 
+	nvme_tcp_reclassify_socket(queue->sock);
+
 	/* Single syn retry */
 	tcp_sock_set_syncnt(queue->sock->sk, 1);
 
-- 
2.34.1


  parent reply	other threads:[~2022-03-28 11:23 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 02/43] regulator: rpi-panel: Handle I2C errors/timing to the Atmel Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 03/43] crypto: hisilicon/qm - cleanup warning in qm_vf_read_qos Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 04/43] crypto: octeontx2 - CN10K CPT to RNM workaround Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 05/43] gcc-plugins/stackleak: Exactly match strings instead of prefixes Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 06/43] rcu: Kill rnp->ofl_seq and use only rcu_state.ofl_lock for exclusion Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 07/43] pinctrl: npcm: Fix broken references to chip->parent_device Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 08/43] rcu: Mark writes to the rcu_segcblist structure's ->flags field Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 09/43] block: throttle split bio in case of iops limit Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 10/43] memstick/mspro_block: fix handling of read-only devices Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 11/43] block/bfq_wf2q: correct weight to ioprio Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 12/43] crypto: xts - Add softdep on ecb Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 13/43] crypto: hisilicon/sec - not need to enable sm4 extra mode at HW V3 Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 14/43] block, bfq: don't move oom_bfqq Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 15/43] selinux: use correct type for context length Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction Sasha Levin
2022-03-28 18:08   ` Eric Biggers
2022-03-28 18:34     ` Michael Brooks
2022-03-29  5:31     ` Jason A. Donenfeld
2022-04-05 22:10       ` Jason A. Donenfeld
2022-03-29 15:38     ` Theodore Ts'o
2022-03-29 17:34       ` Michael Brooks
2022-03-29 18:28         ` Theodore Ts'o
     [not found]   ` <CAOnCY6RUN+CSwjsD6Vg-MDi7ERAj2kKLorMLGp1jE8dTZ+3cpQ@mail.gmail.com>
2022-03-28 19:33     ` Michael Brooks
2022-03-30 16:08   ` Michael Brooks
2022-03-30 16:49     ` David Laight
2022-03-30 17:10       ` Michael Brooks
2022-03-30 18:33         ` Michael Brooks
2022-03-30 19:01           ` Theodore Y. Ts'o
2022-03-30 19:08             ` Michael Brooks
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 17/43] random: remove batched entropy locking Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 18/43] random: absorb fast pool into input pool after fast load Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 19/43] powercap/dtpm_cpu: Reset per_cpu variable in the release function Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 20/43] random: round-robin registers as ulong, not u32 Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 21/43] arm64: module: remove (NOLOAD) from linker script Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 22/43] selinux: allow FIOCLEX and FIONCLEX with policy capability Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 23/43] loop: use sysfs_emit() in the sysfs xxx show() Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 24/43] Fix incorrect type in assignment of ipv6 port for audit Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 25/43] irqchip/qcom-pdc: Fix broken locking Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 26/43] irqchip/nvic: Release nvic_base upon failure Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 27/43] fs/binfmt_elf: Fix AT_PHDR for unusual ELF files Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 28/43] hwrng: cavium - fix NULL but dereferenced coccicheck error Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 29/43] signal, x86: Delay calling signals in atomic on RT enabled kernels Sasha Levin
2022-03-28 14:31   ` Eric W. Biederman
2022-03-28 16:35     ` Sebastian Andrzej Siewior
2022-03-31 16:59       ` Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 30/43] bfq: fix use-after-free in bfq_dispatch_request Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 31/43] ACPICA: Avoid walking the ACPI Namespace if it is not there Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 32/43] ACPI / x86: Add skip i2c clients quirk for Nextbook Ares 8 Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 33/43] ACPI / x86: Add skip i2c clients quirk for Lenovo Yoga Tablet 1050F/L Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 34/43] lib/raid6/test/Makefile: Use $(pound) instead of \# for Make 4.3 Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 35/43] Revert "Revert "block, bfq: honor already-setup queue merges"" Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 36/43] ACPI/APEI: Limit printable size of BERT table data Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 37/43] PM: core: keep irq flags in device_pm_check_callbacks() Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 38/43] parisc: Fix non-access data TLB cache flush faults Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 39/43] parisc: Fix handling off probe non-access faults Sasha Levin
2022-03-28 11:18 ` Sasha Levin [this message]
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 41/43] spi: tegra20: Use of_device_get_match_data() Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 42/43] Revert "ACPI: Pass the same capabilities to the _OSC regardless of the query flag" Sasha Levin
2022-07-07 21:30   ` Tom Crossland
2022-07-07 21:36     ` Limonciello, Mario
2022-07-08  9:22       ` Tom Crossland
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 43/43] spi: fsi: Implement a timeout for polling status Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220328111828.1554086-40-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=axboe@fb.com \
    --cc=cleech@redhat.com \
    --cc=hch@lst.de \
    --cc=kbusch@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=sagi@grimberg.me \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).