linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: stable@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Dave Chinner <dchinner@redhat.com>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	Sasha Levin <sashal@kernel.org>,
	linux-fsdevel@vger.kernel.org
Subject: [PATCH AUTOSEL 4.19 54/68] iomap: readpages doesn't zero page tail beyond EOF
Date: Thu, 29 Nov 2018 00:55:45 -0500	[thread overview]
Message-ID: <20181129055559.159228-54-sashal@kernel.org> (raw)
In-Reply-To: <20181129055559.159228-1-sashal@kernel.org>

From: Dave Chinner <dchinner@redhat.com>

[ Upstream commit 8c110d43c6bca4b24dd13272a9d4e0ba6f2ec957 ]

When we read the EOF page of the file via readpages, we need
to zero the region beyond EOF that we either do not read or
should not contain data so that mmap does not expose stale data to
user applications.

However, iomap_adjust_read_range() fails to detect EOF correctly,
and so fsx on 1k block size filesystems fails very quickly with
mapreads exposing data beyond EOF. There are two problems here.

Firstly, when calculating the end block of the EOF byte, we have
to round the size by one to avoid a block aligned EOF from reporting
a block too large. i.e. a size of 1024 bytes is 1 block, which in
index terms is block 0. Therefore we have to calculate the end block
from (isize - 1), not isize.

The second bug is determining if the current page spans EOF, and so
whether we need split it into two half, one for the IO, and the
other for zeroing. Unfortunately, the code that checks whether
we should split the block doesn't actually check if we span EOF, it
just checks if the read spans the /offset in the page/ that EOF
sits on. So it splits every read into two if EOF is not page
aligned, regardless of whether we are reading the EOF block or not.

Hence we need to restrict the "does the read span EOF" check to
just the page that spans EOF, not every page we read.

This patch results in correct EOF detection through readpages:

xfs_vm_readpages:     dev 259:0 ino 0x43 nr_pages 24
xfs_iomap_found:      dev 259:0 ino 0x43 size 0x66c00 offset 0x4f000 count 98304 type hole startoff 0x13c startblock 1368 blockcount 0x4
iomap_readpage_actor: orig pos 323584 pos 323584, length 4096, poff 0 plen 4096, isize 420864
xfs_iomap_found:      dev 259:0 ino 0x43 size 0x66c00 offset 0x50000 count 94208 type hole startoff 0x140 startblock 1497 blockcount 0x5c
iomap_readpage_actor: orig pos 327680 pos 327680, length 94208, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 331776 pos 331776, length 90112, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 335872 pos 335872, length 86016, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 339968 pos 339968, length 81920, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 344064 pos 344064, length 77824, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 348160 pos 348160, length 73728, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 352256 pos 352256, length 69632, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 356352 pos 356352, length 65536, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 360448 pos 360448, length 61440, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 364544 pos 364544, length 57344, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 368640 pos 368640, length 53248, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 372736 pos 372736, length 49152, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 376832 pos 376832, length 45056, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 380928 pos 380928, length 40960, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 385024 pos 385024, length 36864, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 389120 pos 389120, length 32768, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 393216 pos 393216, length 28672, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 397312 pos 397312, length 24576, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 401408 pos 401408, length 20480, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 405504 pos 405504, length 16384, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 409600 pos 409600, length 12288, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 413696 pos 413696, length 8192, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 417792 pos 417792, length 4096, poff 0 plen 3072, isize 420864
iomap_readpage_actor: orig pos 420864 pos 420864, length 1024, poff 3072 plen 1024, isize 420864

As you can see, it now does full page reads until the last one which
is split correctly at the block aligned EOF, reading 3072 bytes and
zeroing the last 1024 bytes. The original version of the patch got
this right, but it got another case wrong.

The EOF detection crossing really needs to the the original length
as plen, while it starts at the end of the block, will be shortened
as up-to-date blocks are found on the page. This means "orig_pos +
plen" no longer points to the end of the page, and so will not
correctly detect EOF crossing. Hence we have to use the length
passed in to detect this partial page case:

xfs_filemap_fault:    dev 259:1 ino 0x43  write_fault 0
xfs_vm_readpage:      dev 259:1 ino 0x43 nr_pages 1
xfs_iomap_found:      dev 259:1 ino 0x43 size 0x2cc00 offset 0x2c000 count 4096 type hole startoff 0xb0 startblock 282 blockcount 0x4
iomap_readpage_actor: orig pos 180224 pos 181248, length 4096, poff 1024 plen 2048, isize 183296
xfs_iomap_found:      dev 259:1 ino 0x43 size 0x2cc00 offset 0x2cc00 count 1024 type hole startoff 0xb3 startblock 285 blockcount 0x1
iomap_readpage_actor: orig pos 183296 pos 183296, length 1024, poff 3072 plen 1024, isize 183296

Heere we see a trace where the first block on the EOF page is up to
date, hence poff = 1024 bytes. The offset into the page of EOF is
3072, so the range we want to read is 1024 - 3071, and the range we
want to zero is 3072 - 4095. You can see this is split correctly
now.

This fixes the stale data beyond EOF problem that fsx quickly
uncovers on 1k block size filesystems.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/iomap.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index e2e4a1154c90..cc080445b057 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -143,13 +143,14 @@ static void
 iomap_adjust_read_range(struct inode *inode, struct iomap_page *iop,
 		loff_t *pos, loff_t length, unsigned *offp, unsigned *lenp)
 {
+	loff_t orig_pos = *pos;
+	loff_t isize = i_size_read(inode);
 	unsigned block_bits = inode->i_blkbits;
 	unsigned block_size = (1 << block_bits);
 	unsigned poff = offset_in_page(*pos);
 	unsigned plen = min_t(loff_t, PAGE_SIZE - poff, length);
 	unsigned first = poff >> block_bits;
 	unsigned last = (poff + plen - 1) >> block_bits;
-	unsigned end = offset_in_page(i_size_read(inode)) >> block_bits;
 
 	/*
 	 * If the block size is smaller than the page size we need to check the
@@ -184,8 +185,12 @@ iomap_adjust_read_range(struct inode *inode, struct iomap_page *iop,
 	 * handle both halves separately so that we properly zero data in the
 	 * page cache for blocks that are entirely outside of i_size.
 	 */
-	if (first <= end && last > end)
-		plen -= (last - end) * block_size;
+	if (orig_pos <= isize && orig_pos + length > isize) {
+		unsigned end = offset_in_page(isize - 1) >> block_bits;
+
+		if (first <= end && last > end)
+			plen -= (last - end) * block_size;
+	}
 
 	*offp = poff;
 	*lenp = plen;
-- 
2.17.1


  parent reply	other threads:[~2018-11-29  6:00 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-29  5:54 [PATCH AUTOSEL 4.19 01/68] media: vicodec: lower minimum height to 360 Sasha Levin
2018-11-29  5:54 ` [PATCH AUTOSEL 4.19 02/68] media: cec: check for non-OK/NACK conditions while claiming a LA Sasha Levin
2018-11-29  5:54 ` [PATCH AUTOSEL 4.19 03/68] media: omap3isp: Unregister media device as first Sasha Levin
2018-11-29  5:54 ` [PATCH AUTOSEL 4.19 04/68] media: ipu3-cio2: Unregister device nodes first, then release resources Sasha Levin
2018-11-29  5:54 ` [PATCH AUTOSEL 4.19 05/68] iommu/vt-d: Fix NULL pointer dereference in prq_event_thread() Sasha Levin
2018-11-29  5:54 ` [PATCH AUTOSEL 4.19 06/68] brcmutil: really fix decoding channel info for 160 MHz bandwidth Sasha Levin
2018-11-29 11:49   ` Kalle Valo
2018-11-29 16:54     ` Sasha Levin
2018-11-29  5:54 ` [PATCH AUTOSEL 4.19 07/68] mt76: fix building without CONFIG_LEDS_CLASS Sasha Levin
2018-11-29  5:54 ` [PATCH AUTOSEL 4.19 08/68] iommu/ipmmu-vmsa: Fix crash on early domain free Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 09/68] scsi: ufs: Fix hynix ufs bug with quirk on hi36xx SoC Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 10/68] can: ucan: remove set but not used variable 'udev' Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 11/68] can: rcar_can: Fix erroneous registration Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 12/68] test_firmware: fix error return getting clobbered Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 13/68] HID: input: Ignore battery reported by Symbol DS4308 Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 14/68] batman-adv: Use explicit tvlv padding for ELP packets Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 15/68] batman-adv: Expand merged fragment buffer for full packet Sasha Levin
2018-11-29 10:00   ` Sergei Shtylyov
2018-11-29 10:04     ` Sergei Shtylyov
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 16/68] amd/iommu: Fix Guest Virtual APIC Log Tail Address Register Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 17/68] bnx2x: Assign unique DMAE channel number for FW DMAE transactions Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 18/68] qed: Fix PTT leak in qed_drain() Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 19/68] qed: Fix overriding offload_tc by protocols without APP TLV Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 20/68] qed: Fix rdma_info structure allocation Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 21/68] qed: Fix reading wrong value in loop condition Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 22/68] usb: dwc2: pci: Fix an error code in probe Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 23/68] Revert "usb: gadget: ffs: Fix BUG when userland exits with submitted AIO transfers" Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 24/68] s390/ism: clear dmbe_mask bit before SMC IRQ handling Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 25/68] nvme-fc: resolve io failures during connect Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 26/68] bnxt_en: Fix filling time in bnxt_fill_coredump_record() Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 27/68] drm/amdgpu: Add amdgpu "max bpc" connector property (v2) Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 28/68] drm/amd/display: Support " Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 29/68] net/mlx4_core: Zero out lkey field in SW2HW_MPT fw command Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 30/68] net/mlx4_core: Fix uninitialized variable compilation warning Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 31/68] net/mlx4: Fix UBSAN warning of signed integer overflow Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 32/68] drivers/net/ethernet/qlogic/qed/qed_rdma.h: fix typo Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 33/68] gpio: pxa: fix legacy non pinctrl aware builds again Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 34/68] gpio: mockup: fix indicated direction Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 35/68] tc-testing: tdc.py: ignore errors when decoding stdout/stderr Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 36/68] tc-testing: tdc.py: Guard against lack of returncode in executed command Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 37/68] mtd: rawnand: qcom: Namespace prefix some commands Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 38/68] cpufreq: ti-cpufreq: Only register platform_device when supported Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 39/68] exec: make de_thread() freezable Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 40/68] ALSA: hda/ca0132 - Add new ZxR quirk Sasha Levin
2018-11-29 14:51   ` Connor McAdams
2018-12-05 16:00     ` Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 41/68] Revert "HID: uhid: use strlcpy() instead of strncpy()" Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 42/68] HID: steam: remove input device when a hid client is running Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 43/68] HID: multitouch: Add pointstick support for Cirque Touchpad Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 44/68] mtd: spi-nor: Fix Cadence QSPI page fault kernel panic Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 45/68] net: ena: fix crash during failed resume from hibernation Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 46/68] NFSv4: Fix a NFSv4 state manager deadlock Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 47/68] qed: Fix bitmap_weight() check Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 48/68] qed: Fix QM getters to always return a valid pq Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 49/68] net/ibmnvic: Fix deadlock problem in reset Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 50/68] riscv: fix warning in arch/riscv/include/asm/module.h Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 51/68] iomap: FUA is wrong for DIO O_DSYNC writes into unwritten extents Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 52/68] iomap: sub-block dio needs to zeroout beyond EOF Sasha Levin
2018-11-29 12:19   ` Dave Chinner
2018-11-29 12:36     ` Amir Goldstein
2018-11-29 22:43       ` Dave Chinner
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 53/68] iomap: dio data corruption and spurious errors when pipes fill Sasha Levin
2018-11-29  5:55 ` Sasha Levin [this message]
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 55/68] net: faraday: ftmac100: remove netif_running(netdev) check before disabling interrupts Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 56/68] iommu/vt-d: Use memunmap to free memremap Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 57/68] NFSv4.2 copy do not allocate memory under the lock Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 58/68] flexfiles: use per-mirror specified stateid for IO Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 59/68] net/dim: Update DIM start sample after each DIM iteration Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 60/68] net: thunderx: set xdp_prog to NULL if bpf_prog_add fails Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 61/68] ibmvnic: Fix RX queue buffer cleanup Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 62/68] ibmvnic: Update driver queues after change in ring size support Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 63/68] virtio-net: disable guest csum during XDP set Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 64/68] virtio-net: fail XDP set if guest csum is negotiated Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 65/68] team: no need to do team_notify_peers or team_mcast_rejoin when disabling port Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 66/68] net: amd: add missing of_node_put() Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 67/68] net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue Sasha Levin
2018-11-29  5:55 ` [PATCH AUTOSEL 4.19 68/68] net: gemini: Fix copy/paste error Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181129055559.159228-54-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=darrick.wong@oracle.com \
    --cc=dchinner@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).