netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Benjamin Coddington <bcodding@redhat.com>
To: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org,
	"Philipp Reisner" <philipp.reisner@linbit.com>,
	"Lars Ellenberg" <lars.ellenberg@linbit.com>,
	"Christoph Böhmwalder" <christoph.boehmwalder@linbit.com>,
	"Jens Axboe" <axboe@kernel.dk>,
	"Josef Bacik" <josef@toxicpanda.com>,
	"Keith Busch" <kbusch@kernel.org>,
	"Christoph Hellwig" <hch@lst.de>,
	"Sagi Grimberg" <sagi@grimberg.me>,
	"Lee Duncan" <lduncan@suse.com>,
	"Chris Leech" <cleech@redhat.com>,
	"Mike Christie" <michael.christie@oracle.com>,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	"Valentina Manea" <valentina.manea.m@gmail.com>,
	"Shuah Khan" <shuah@kernel.org>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"David Howells" <dhowells@redhat.com>,
	"Marc Dionne" <marc.dionne@auristor.com>,
	"Steve French" <sfrench@samba.org>,
	"Christine Caulfield" <ccaulfie@redhat.com>,
	"David Teigland" <teigland@redhat.com>,
	"Mark Fasheh" <mark@fasheh.com>,
	"Joel Becker" <jlbec@evilplan.org>,
	"Joseph Qi" <joseph.qi@linux.alibaba.com>,
	"Eric Van Hensbergen" <ericvh@gmail.com>,
	"Latchesar Ionkov" <lucho@ionkov.net>,
	"Dominique Martinet" <asmadeus@codewreck.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Ilya Dryomov" <idryomov@gmail.com>,
	"Xiubo Li" <xiubli@redhat.com>,
	"Trond Myklebust" <trond.myklebust@hammerspace.com>,
	"Anna Schumaker" <anna@kernel.org>,
	"Chuck Lever" <chuck.lever@oracle.com>,
	"Jeff Layton" <jlayton@kernel.org>,
	drbd-dev@lists.linbit.com, linux-block@vger.kernel.org,
	nbd@other.debian.org, linux-nvme@lists.infradead.org,
	open-iscsi@googlegroups.com, linux-scsi@vger.kernel.org,
	linux-usb@vger.kernel.org, linux-afs@lists.infradead.org,
	linux-cifs@vger.kernel.org, samba-technical@lists.samba.org,
	cluster-devel@redhat.com, ocfs2-devel@oss.oracle.com,
	v9fs-developer@lists.sourceforge.net, ceph-devel@vger.kernel.org,
	linux-nfs@vger.kernel.org
Subject: [PATCH v1 2/3] Treewide: Stop corrupting socket's task_frag
Date: Mon, 21 Nov 2022 08:35:18 -0500	[thread overview]
Message-ID: <c2ec184226acd21a191ccc1aa46a1d7e43ca7104.1669036433.git.bcodding@redhat.com> (raw)
In-Reply-To: <cover.1669036433.git.bcodding@redhat.com>

Since moving to memalloc_nofs_save/restore, SUNRPC has stopped setting the
GFP_NOIO flag on sk_allocation which the networking system uses to decide
when it is safe to use current->task_frag.  The results of this are
unexpected corruption in task_frag when SUNRPC is involved in memory
reclaim.

The corruption can be seen in crashes, but the root cause is often
difficult to ascertain as a crashing machine's stack trace will have no
evidence of being near NFS or SUNRPC code.  I believe this problem to
be much more pervasive than reports to the community may indicate.

Fix this by having kernel users of sockets that may corrupt task_frag due
to reclaim set sk_use_task_frag = false.  Preemptively correcting this
situation for users that still set sk_allocation allows them to convert to
memalloc_nofs_save/restore without the same unexpected corruptions that are
sure to follow, unlikely to show up in testing, and difficult to bisect.

CC: Philipp Reisner <philipp.reisner@linbit.com>
CC: Lars Ellenberg <lars.ellenberg@linbit.com>
CC: "Christoph Böhmwalder" <christoph.boehmwalder@linbit.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Josef Bacik <josef@toxicpanda.com>
CC: Keith Busch <kbusch@kernel.org>
CC: Christoph Hellwig <hch@lst.de>
CC: Sagi Grimberg <sagi@grimberg.me>
CC: Lee Duncan <lduncan@suse.com>
CC: Chris Leech <cleech@redhat.com>
CC: Mike Christie <michael.christie@oracle.com>
CC: "James E.J. Bottomley" <jejb@linux.ibm.com>
CC: "Martin K. Petersen" <martin.petersen@oracle.com>
CC: Valentina Manea <valentina.manea.m@gmail.com>
CC: Shuah Khan <shuah@kernel.org>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: David Howells <dhowells@redhat.com>
CC: Marc Dionne <marc.dionne@auristor.com>
CC: Steve French <sfrench@samba.org>
CC: Christine Caulfield <ccaulfie@redhat.com>
CC: David Teigland <teigland@redhat.com>
CC: Mark Fasheh <mark@fasheh.com>
CC: Joel Becker <jlbec@evilplan.org>
CC: Joseph Qi <joseph.qi@linux.alibaba.com>
CC: Eric Van Hensbergen <ericvh@gmail.com>
CC: Latchesar Ionkov <lucho@ionkov.net>
CC: Dominique Martinet <asmadeus@codewreck.org>
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jakub Kicinski <kuba@kernel.org>
CC: Paolo Abeni <pabeni@redhat.com>
CC: Ilya Dryomov <idryomov@gmail.com>
CC: Xiubo Li <xiubli@redhat.com>
CC: Chuck Lever <chuck.lever@oracle.com>
CC: Jeff Layton <jlayton@kernel.org>
CC: Trond Myklebust <trond.myklebust@hammerspace.com>
CC: Anna Schumaker <anna@kernel.org>
CC: drbd-dev@lists.linbit.com
CC: linux-block@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: nbd@other.debian.org
CC: linux-nvme@lists.infradead.org
CC: open-iscsi@googlegroups.com
CC: linux-scsi@vger.kernel.org
CC: linux-usb@vger.kernel.org
CC: linux-afs@lists.infradead.org
CC: linux-cifs@vger.kernel.org
CC: samba-technical@lists.samba.org
CC: cluster-devel@redhat.com
CC: ocfs2-devel@oss.oracle.com
CC: v9fs-developer@lists.sourceforge.net
CC: netdev@vger.kernel.org
CC: ceph-devel@vger.kernel.org
CC: linux-nfs@vger.kernel.org

Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
---
 drivers/block/drbd/drbd_receiver.c | 3 +++
 drivers/block/nbd.c                | 1 +
 drivers/nvme/host/tcp.c            | 1 +
 drivers/scsi/iscsi_tcp.c           | 1 +
 drivers/usb/usbip/usbip_common.c   | 1 +
 fs/afs/rxrpc.c                     | 1 +
 fs/cifs/connect.c                  | 1 +
 fs/dlm/lowcomms.c                  | 2 ++
 fs/ocfs2/cluster/tcp.c             | 1 +
 net/9p/trans_fd.c                  | 1 +
 net/ceph/messenger.c               | 1 +
 net/sunrpc/xprtsock.c              | 3 +++
 12 files changed, 17 insertions(+)

diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index af4c7d65490b..09ad8d82c200 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1030,6 +1030,9 @@ static int conn_connect(struct drbd_connection *connection)
 	sock.socket->sk->sk_allocation = GFP_NOIO;
 	msock.socket->sk->sk_allocation = GFP_NOIO;
 
+	sock.socket->sk->sk_use_task_frag = false;
+	msock.socket->sk->sk_use_task_frag = false;
+
 	sock.socket->sk->sk_priority = TC_PRIO_INTERACTIVE_BULK;
 	msock.socket->sk->sk_priority = TC_PRIO_INTERACTIVE;
 
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 2a709daefbc4..815ee631ed30 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -514,6 +514,7 @@ static int sock_xmit(struct nbd_device *nbd, int index, int send,
 	noreclaim_flag = memalloc_noreclaim_save();
 	do {
 		sock->sk->sk_allocation = GFP_NOIO | __GFP_MEMALLOC;
+		sock->sk->sk_use_task_frag = false;
 		msg.msg_name = NULL;
 		msg.msg_namelen = 0;
 		msg.msg_control = NULL;
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index d5871fd6f769..e01d78858cb4 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1531,6 +1531,7 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl,
 	queue->sock->sk->sk_rcvtimeo = 10 * HZ;
 
 	queue->sock->sk->sk_allocation = GFP_ATOMIC;
+	queue->sock->sk->sk_use_task_frag = false;
 	nvme_tcp_set_queue_io_cpu(queue);
 	queue->request = NULL;
 	queue->data_remaining = 0;
diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 29b1bd755afe..733e540d0abf 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -733,6 +733,7 @@ iscsi_sw_tcp_conn_bind(struct iscsi_cls_session *cls_session,
 	sk->sk_reuse = SK_CAN_REUSE;
 	sk->sk_sndtimeo = 15 * HZ; /* FIXME: make it configurable */
 	sk->sk_allocation = GFP_ATOMIC;
+	sk->sk_use_task_frag = false;
 	sk_set_memalloc(sk);
 	sock_no_linger(sk);
 
diff --git a/drivers/usb/usbip/usbip_common.c b/drivers/usb/usbip/usbip_common.c
index 2ab99244bc31..76bfc6e43881 100644
--- a/drivers/usb/usbip/usbip_common.c
+++ b/drivers/usb/usbip/usbip_common.c
@@ -315,6 +315,7 @@ int usbip_recv(struct socket *sock, void *buf, int size)
 
 	do {
 		sock->sk->sk_allocation = GFP_NOIO;
+		sock->sk->sk_use_task_frag = false;
 
 		result = sock_recvmsg(sock, &msg, MSG_WAITALL);
 		if (result <= 0)
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index eccc3cd0cb70..ac75ad18db83 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -46,6 +46,7 @@ int afs_open_socket(struct afs_net *net)
 		goto error_1;
 
 	socket->sk->sk_allocation = GFP_NOFS;
+	socket->sk->sk_use_task_frag = false;
 
 	/* bind the callback manager's address to make this a server socket */
 	memset(&srx, 0, sizeof(srx));
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 7ae6f2c08153..c2b0d6f59f79 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -2935,6 +2935,7 @@ generic_ip_connect(struct TCP_Server_Info *server)
 		cifs_dbg(FYI, "Socket created\n");
 		server->ssocket = socket;
 		socket->sk->sk_allocation = GFP_NOFS;
+		socket->sk->sk_use_task_frag = false;
 		if (sfamily == AF_INET6)
 			cifs_reclassify_socket6(socket);
 		else
diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index a4e84e8d94c8..4cf29ac3c428 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -699,6 +699,7 @@ static void add_listen_sock(struct socket *sock, struct listen_connection *con)
 
 	sk->sk_user_data = con;
 	sk->sk_allocation = GFP_NOFS;
+	sk->sk_use_task_frag = false;
 	/* Install a data_ready callback */
 	sk->sk_data_ready = lowcomms_listen_data_ready;
 	release_sock(sk);
@@ -718,6 +719,7 @@ static void add_sock(struct socket *sock, struct connection *con)
 	sk->sk_write_space = lowcomms_write_space;
 	sk->sk_state_change = lowcomms_state_change;
 	sk->sk_allocation = GFP_NOFS;
+	sk->sk_use_task_frag = false;
 	sk->sk_error_report = lowcomms_error_report;
 	release_sock(sk);
 }
diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c
index f660c0dbdb63..3eaafa5e5ec4 100644
--- a/fs/ocfs2/cluster/tcp.c
+++ b/fs/ocfs2/cluster/tcp.c
@@ -1604,6 +1604,7 @@ static void o2net_start_connect(struct work_struct *work)
 	sc->sc_sock = sock; /* freed by sc_kref_release */
 
 	sock->sk->sk_allocation = GFP_ATOMIC;
+	sock->sk->sk_use_task_frag = false;
 
 	myaddr.sin_family = AF_INET;
 	myaddr.sin_addr.s_addr = mynode->nd_ipv4_address;
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index e758978b44be..96f803499323 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -851,6 +851,7 @@ static int p9_socket_open(struct p9_client *client, struct socket *csocket)
 		return -ENOMEM;
 
 	csocket->sk->sk_allocation = GFP_NOIO;
+	csocket->sk->sk_use_task_frag = false;
 	file = sock_alloc_file(csocket, 0, NULL);
 	if (IS_ERR(file)) {
 		pr_err("%s (%d): failed to map fd\n",
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index d3bb656308b4..cad8e0ca8432 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -446,6 +446,7 @@ int ceph_tcp_connect(struct ceph_connection *con)
 	if (ret)
 		return ret;
 	sock->sk->sk_allocation = GFP_NOFS;
+	sock->sk->sk_use_task_frag = false;
 
 #ifdef CONFIG_LOCKDEP
 	lockdep_set_class(&sock->sk->sk_lock, &socket_class);
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index e976007f4fd0..d3170b753dfc 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1882,6 +1882,7 @@ static int xs_local_finish_connecting(struct rpc_xprt *xprt,
 		sk->sk_write_space = xs_udp_write_space;
 		sk->sk_state_change = xs_local_state_change;
 		sk->sk_error_report = xs_error_report;
+		sk->sk_use_task_frag = false;
 
 		xprt_clear_connected(xprt);
 
@@ -2083,6 +2084,7 @@ static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
 		sk->sk_user_data = xprt;
 		sk->sk_data_ready = xs_data_ready;
 		sk->sk_write_space = xs_udp_write_space;
+		sk->sk_use_task_frag = false;
 
 		xprt_set_connected(xprt);
 
@@ -2250,6 +2252,7 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
 		sk->sk_state_change = xs_tcp_state_change;
 		sk->sk_write_space = xs_tcp_write_space;
 		sk->sk_error_report = xs_error_report;
+		sk->sk_use_task_frag = false;
 
 		/* socket options */
 		sock_reset_flag(sk, SOCK_LINGER);
-- 
2.31.1


  parent reply	other threads:[~2022-11-21 13:36 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-21 13:35 [PATCH v1 0/3] Stop corrupting socket's task_frag Benjamin Coddington
2022-11-21 13:35 ` [PATCH v1 1/3] net: Introduce sk_use_task_frag in struct sock Benjamin Coddington
2022-12-09 12:09   ` Paolo Abeni
2022-12-09 14:16     ` Eric Dumazet
2022-11-21 13:35 ` Benjamin Coddington [this message]
2022-11-29 14:02   ` [PATCH v1 2/3] Treewide: Stop corrupting socket's task_frag Christoph Hellwig
2022-11-29 16:47     ` Benjamin Coddington
2022-11-30 12:07       ` Guillaume Nault
2022-11-29 17:42     ` Jeff Layton
2022-11-30 11:48     ` Guillaume Nault
2022-12-09 12:37   ` Paolo Abeni
2022-12-09 16:11     ` Jakub Kicinski
2022-12-09 16:50       ` Paolo Abeni
2022-11-21 13:35 ` [PATCH v1 3/3] net: simplify sk_page_frag Benjamin Coddington
2022-12-09 16:42   ` Paolo Abeni
2022-12-09 16:44   ` Paolo Abeni
2022-12-09 16:44   ` Paolo Abeni
2022-11-21 13:56 ` [PATCH v1 2/3] Treewide: Stop corrupting socket's task_frag David Howells
2022-11-21 14:34   ` Benjamin Coddington
2022-11-21 21:40     ` Shuah Khan
2022-11-21 21:43       ` Shuah Khan
2022-11-21 22:01         ` Benjamin Coddington
2022-11-21 22:32           ` Shuah Khan
2022-11-22 14:23             ` Benjamin Coddington
2022-12-07 11:44 ` [PATCH v1 0/3] " Benjamin Coddington

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c2ec184226acd21a191ccc1aa46a1d7e43ca7104.1669036433.git.bcodding@redhat.com \
    --to=bcodding@redhat.com \
    --cc=anna@kernel.org \
    --cc=asmadeus@codewreck.org \
    --cc=axboe@kernel.dk \
    --cc=ccaulfie@redhat.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=christoph.boehmwalder@linbit.com \
    --cc=chuck.lever@oracle.com \
    --cc=cleech@redhat.com \
    --cc=cluster-devel@redhat.com \
    --cc=davem@davemloft.net \
    --cc=dhowells@redhat.com \
    --cc=drbd-dev@lists.linbit.com \
    --cc=edumazet@google.com \
    --cc=ericvh@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hch@lst.de \
    --cc=idryomov@gmail.com \
    --cc=jejb@linux.ibm.com \
    --cc=jlayton@kernel.org \
    --cc=jlbec@evilplan.org \
    --cc=josef@toxicpanda.com \
    --cc=joseph.qi@linux.alibaba.com \
    --cc=kbusch@kernel.org \
    --cc=kuba@kernel.org \
    --cc=lars.ellenberg@linbit.com \
    --cc=lduncan@suse.com \
    --cc=linux-afs@lists.infradead.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-cifs@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=lucho@ionkov.net \
    --cc=marc.dionne@auristor.com \
    --cc=mark@fasheh.com \
    --cc=martin.petersen@oracle.com \
    --cc=michael.christie@oracle.com \
    --cc=nbd@other.debian.org \
    --cc=netdev@vger.kernel.org \
    --cc=ocfs2-devel@oss.oracle.com \
    --cc=open-iscsi@googlegroups.com \
    --cc=pabeni@redhat.com \
    --cc=philipp.reisner@linbit.com \
    --cc=sagi@grimberg.me \
    --cc=samba-technical@lists.samba.org \
    --cc=sfrench@samba.org \
    --cc=shuah@kernel.org \
    --cc=teigland@redhat.com \
    --cc=trond.myklebust@hammerspace.com \
    --cc=v9fs-developer@lists.sourceforge.net \
    --cc=valentina.manea.m@gmail.com \
    --cc=xiubli@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).