linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: [PATCH 3.10 71/80] SUNRPC: Fix a data corruption issue when retransmitting RPC calls
Date: Tue, 26 Nov 2013 16:57:40 -0800	[thread overview]
Message-ID: <20131127005645.851997493@linuxfoundation.org> (raw)
In-Reply-To: <20131127005640.934155527@linuxfoundation.org>

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Trond Myklebust <Trond.Myklebust@netapp.com>

commit a6b31d18b02ff9d7915c5898c9b5ca41a798cd73 upstream.

The following scenario can cause silent data corruption when doing
NFS writes. It has mainly been observed when doing database writes
using O_DIRECT.

1) The RPC client uses sendpage() to do zero-copy of the page data.
2) Due to networking issues, the reply from the server is delayed,
   and so the RPC client times out.

3) The client issues a second sendpage of the page data as part of
   an RPC call retransmission.

4) The reply to the first transmission arrives from the server
   _before_ the client hardware has emptied the TCP socket send
   buffer.
5) After processing the reply, the RPC state machine rules that
   the call to be done, and triggers the completion callbacks.
6) The application notices the RPC call is done, and reuses the
   pages to store something else (e.g. a new write).

7) The client NIC drains the TCP socket send buffer. Since the
   page data has now changed, it reads a corrupted version of the
   initial RPC call, and puts it on the wire.

This patch fixes the problem in the following manner:

The ordering guarantees of TCP ensure that when the server sends a
reply, then we know that the _first_ transmission has completed. Using
zero-copy in that situation is therefore safe.
If a time out occurs, we then send the retransmission using sendmsg()
(i.e. no zero-copy), We then know that the socket contains a full copy of
the data, and so it will retransmit a faithful reproduction even if the
RPC call completes, and the application reuses the O_DIRECT buffer in
the meantime.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/sunrpc/xprtsock.c |   28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -391,8 +391,10 @@ static int xs_send_kvec(struct socket *s
 	return kernel_sendmsg(sock, &msg, NULL, 0, 0);
 }
 
-static int xs_send_pagedata(struct socket *sock, struct xdr_buf *xdr, unsigned int base, int more)
+static int xs_send_pagedata(struct socket *sock, struct xdr_buf *xdr, unsigned int base, int more, bool zerocopy)
 {
+	ssize_t (*do_sendpage)(struct socket *sock, struct page *page,
+			int offset, size_t size, int flags);
 	struct page **ppage;
 	unsigned int remainder;
 	int err, sent = 0;
@@ -401,6 +403,9 @@ static int xs_send_pagedata(struct socke
 	base += xdr->page_base;
 	ppage = xdr->pages + (base >> PAGE_SHIFT);
 	base &= ~PAGE_MASK;
+	do_sendpage = sock->ops->sendpage;
+	if (!zerocopy)
+		do_sendpage = sock_no_sendpage;
 	for(;;) {
 		unsigned int len = min_t(unsigned int, PAGE_SIZE - base, remainder);
 		int flags = XS_SENDMSG_FLAGS;
@@ -408,7 +413,7 @@ static int xs_send_pagedata(struct socke
 		remainder -= len;
 		if (remainder != 0 || more)
 			flags |= MSG_MORE;
-		err = sock->ops->sendpage(sock, *ppage, base, len, flags);
+		err = do_sendpage(sock, *ppage, base, len, flags);
 		if (remainder == 0 || err != len)
 			break;
 		sent += err;
@@ -429,9 +434,10 @@ static int xs_send_pagedata(struct socke
  * @addrlen: UDP only -- length of destination address
  * @xdr: buffer containing this request
  * @base: starting position in the buffer
+ * @zerocopy: true if it is safe to use sendpage()
  *
  */
-static int xs_sendpages(struct socket *sock, struct sockaddr *addr, int addrlen, struct xdr_buf *xdr, unsigned int base)
+static int xs_sendpages(struct socket *sock, struct sockaddr *addr, int addrlen, struct xdr_buf *xdr, unsigned int base, bool zerocopy)
 {
 	unsigned int remainder = xdr->len - base;
 	int err, sent = 0;
@@ -459,7 +465,7 @@ static int xs_sendpages(struct socket *s
 	if (base < xdr->page_len) {
 		unsigned int len = xdr->page_len - base;
 		remainder -= len;
-		err = xs_send_pagedata(sock, xdr, base, remainder != 0);
+		err = xs_send_pagedata(sock, xdr, base, remainder != 0, zerocopy);
 		if (remainder == 0 || err != len)
 			goto out;
 		sent += err;
@@ -562,7 +568,7 @@ static int xs_local_send_request(struct
 			req->rq_svec->iov_base, req->rq_svec->iov_len);
 
 	status = xs_sendpages(transport->sock, NULL, 0,
-						xdr, req->rq_bytes_sent);
+						xdr, req->rq_bytes_sent, true);
 	dprintk("RPC:       %s(%u) = %d\n",
 			__func__, xdr->len - req->rq_bytes_sent, status);
 	if (likely(status >= 0)) {
@@ -618,7 +624,7 @@ static int xs_udp_send_request(struct rp
 	status = xs_sendpages(transport->sock,
 			      xs_addr(xprt),
 			      xprt->addrlen, xdr,
-			      req->rq_bytes_sent);
+			      req->rq_bytes_sent, true);
 
 	dprintk("RPC:       xs_udp_send_request(%u) = %d\n",
 			xdr->len - req->rq_bytes_sent, status);
@@ -689,6 +695,7 @@ static int xs_tcp_send_request(struct rp
 	struct rpc_xprt *xprt = req->rq_xprt;
 	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
 	struct xdr_buf *xdr = &req->rq_snd_buf;
+	bool zerocopy = true;
 	int status;
 
 	xs_encode_stream_record_marker(&req->rq_snd_buf);
@@ -696,13 +703,20 @@ static int xs_tcp_send_request(struct rp
 	xs_pktdump("packet data:",
 				req->rq_svec->iov_base,
 				req->rq_svec->iov_len);
+	/* Don't use zero copy if this is a resend. If the RPC call
+	 * completes while the socket holds a reference to the pages,
+	 * then we may end up resending corrupted data.
+	 */
+	if (task->tk_flags & RPC_TASK_SENT)
+		zerocopy = false;
 
 	/* Continue transmitting the packet/record. We must be careful
 	 * to cope with writespace callbacks arriving _after_ we have
 	 * called sendmsg(). */
 	while (1) {
 		status = xs_sendpages(transport->sock,
-					NULL, 0, xdr, req->rq_bytes_sent);
+					NULL, 0, xdr, req->rq_bytes_sent,
+					zerocopy);
 
 		dprintk("RPC:       xs_tcp_send_request(%u) = %d\n",
 				xdr->len - req->rq_bytes_sent, status);



  parent reply	other threads:[~2013-11-27  1:39 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-27  0:56 [PATCH 3.10 00/80] 3.10.21-stable review Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 01/80] ACPICA: DeRefOf operator: Update to fully resolve FieldUnit and BufferField refs Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 02/80] libertas: potential oops in debugfs Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 03/80] aacraid: prevent invalid pointer dereference Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 04/80] ACPICA: Return error if DerefOf resolves to a null package element Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 05/80] ACPICA: Fix for a Store->ArgX when ArgX contains a reference to a field Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 06/80] USB: mos7840: fix tiocmget error handling Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 07/80] can: kvaser_usb: fix usb endpoints detection Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 08/80] crypto: ansi_cprng - Fix off by one error in non-block size request Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 09/80] crypto: s390 - Fix aes-cbc IV corruption Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 10/80] can: c_can: Fix RX message handling, handle lost message before EOB Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 11/80] ipc,shm: correct error return value in shmctl (SHM_UNLOCK) Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 12/80] ipc,shm: fix shm_file deletion races Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 13/80] drm/nv50-/disp: remove dcb_outp_match call, and related variables Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 14/80] drm/nva3-/disp: fix hda eld writing, needs to be padded Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 15/80] SUNRPC: dont map EKEYEXPIRED to EACCES in call_refreshresult Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 16/80] sched, idle: Fix the idle polling state logic Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 17/80] PCI: Allow PCIe Capability link-related register access for switches Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 18/80] PCI: Remove PCIe Capability version checks Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 19/80] PCI: Support PCIe Capability Slot registers only for ports with slots Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 20/80] perf/ftrace: Fix paranoid level for enabling function tracer Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 21/80] ACPI / EC: Ensure lock is acquired before accessing ec struct members Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 22/80] ACPI / video: Quirk initial backlight level 0 Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 23/80] ACPI / hotplug: Fix handle_root_bridge_removal() Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 24/80] ACPI / hotplug: Do not execute "insert in progress" _OST Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 25/80] rt2x00: fix a crash bug in the HT descriptor handling fix Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 26/80] rt2x00: check if device is still available on rt2x00mac_flush() Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 27/80] rt2x00: rt2800lib: fix VGC adjustment for RT5592 Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 28/80] rt2x00: fix HT TX descriptor settings regression Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 29/80] Revert "ima: policy for RAMFS" Greg Kroah-Hartman
2013-11-27  0:56 ` [PATCH 3.10 30/80] exec/ptrace: fix get_dumpable() incorrect tests Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 31/80] ALSA: 6fire: Fix probe of multiple cards Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 32/80] ALSA: compress: fix drain calls blocking other compress functions Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 33/80] ALSA: compress: fix drain calls blocking other compress functions (v6) Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 34/80] ALSA: msnd: Avoid duplicated driver name Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 35/80] ALSA: hda - Add support of ALC255 codecs Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 36/80] ALSA: hda - Enable SPDIF for Acer TravelMate 6293 Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 37/80] ALSA: hda - Make sure mute LEDs stay on during runtime suspend (Realtek) Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 38/80] ALSA: hda - Add support for CX20952 Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 39/80] ALSA: hda - Add pincfg fixup for ASUS W5A Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 40/80] ALSA: hda - Fix Line Out automute on Realtek multifunction jacks Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 41/80] ALSA: hda - Check keep_eapd_on before inv_eapd Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 42/80] ALSA: hda - Dont turn off EAPD for headphone on Lenovo N100 Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 43/80] ALSA: hda - Dont clear the power state at snd_hda_codec_reset() Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 44/80] ALSA: hda - Fix unbalanced runtime PM notification at resume Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 45/80] ALSA: hda - Fix the headphone jack detection on Sony VAIO TX Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 46/80] ALSA: hda - Add headset quirk for Dell Inspiron 3135 Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 47/80] ALSA: hda - Provide missing pin configs for VAIO with ALC260 Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 48/80] NFSv4: Fix a use-after-free situation in _nfs4_proc_getlk() Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 49/80] NFSv4: fix NULL dereference in open recover Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 50/80] NFSv4: dont fail on missing fattr " Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 51/80] NFSv4: dont reprocess cached open CLAIM_PREVIOUS Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 52/80] NFSv4: Fix state reference counting in _nfs4_opendata_reclaim_to_nfs4_state Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 53/80] nfsd: return better errors to exportfs Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 54/80] nfsd: split up nfsd_setattr Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 55/80] nfsd: make sure to balance get/put_write_access Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 56/80] x86/microcode/amd: Tone down printk(), dont treat a missing firmware file as an error Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 57/80] KVM: x86: fix emulation of "movzbl %bpl, %eax" Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 58/80] ftrace/x86: skip over the breakpoint for ftrace caller Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 59/80] KVM: IOMMU: hva align mapping page size Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 60/80] arm/arm64: KVM: Fix hyp mappings of vmalloc regions Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 61/80] hwmon: (lm90) Fix max6696 alarm handling Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 62/80] block: fix race between request completion and timeout handling Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 63/80] block: fix a probe argument to blk_register_region Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 64/80] block: properly stack underlying max_segment_size to DM device Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 65/80] powerpc/52xx: fix build breakage for MPC5200 LPBFIFO module Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 66/80] powerpc/vio: use strcpy in modalias_show Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 67/80] powerpc/powernv: Add PE to its own PELTV Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 68/80] powerpc: ppc64 address space capped at 32TB, mmap randomisation disabled Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 69/80] powerpc/signals: Mark VSX not saved with small contexts Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 70/80] slub: Handle NULL parameter in kmem_cache_flags Greg Kroah-Hartman
2013-11-27  0:57 ` Greg Kroah-Hartman [this message]
2013-11-27  0:57 ` [PATCH 3.10 72/80] mei: nfc: fix memory leak in error path Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 73/80] usb: hub: Clear Port Reset Change during init/resume Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 74/80] rt2800usb: slow down TX status polling Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 75/80] s390/vtime: correct idle time calculation Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 76/80] configfs: fix race between dentry put and lookup Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 77/80] cris: media platform drivers: fix build Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 78/80] dmi: add support for exact DMI matches in addition to substring matching Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 79/80] drm/i915: quirk away phantom LVDS on Intels D510MO mainboard Greg Kroah-Hartman
2013-11-27  0:57 ` [PATCH 3.10 80/80] drm/i915: No LVDS hardware on Intel D410PT and D425KT Greg Kroah-Hartman
2013-11-27 12:57 ` [PATCH 3.10 00/80] 3.10.21-stable review Guenter Roeck
2013-11-27 22:29 ` Shuah Khan
2013-11-28 10:55 ` Satoru Takeuchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131127005645.851997493@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=Trond.Myklebust@netapp.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).