All of lore.kernel.org
 help / color / mirror / Atom feed
* lseek SEEK_HOLE / SEEK_DATA fixes and switch to iomap V3.2-hch
@ 2017-06-29 13:54 Christoph Hellwig
  2017-06-29 13:54 ` [PATCH 1/5] xfs: remove a whitespace-only line from xfs_fs_get_nextdqblk Christoph Hellwig
                   ` (5 more replies)
  0 siblings, 6 replies; 28+ messages in thread
From: Christoph Hellwig @ 2017-06-29 13:54 UTC (permalink / raw)
  To: Andreas Gruenbacher; +Cc: Jan Kara, linux-fsdevel, linux-xfs, linux-ext4

Here is my take on all the current seek work.  It has my XFS quota changes
(unmodified from last post), Andrea's page cache seek helper (unmodified
except that the export is dropped), and Andreas iomap SEEK_HOLE / DATA,
for which I've added a couple of cleanups and converted ext4 as well.

I've made sure both xfs and ext4 pass all the xfstests.

Changes since the previous posting:
 - trivial cleanups of the iomap helper

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/5] xfs: remove a whitespace-only line from xfs_fs_get_nextdqblk
  2017-06-29 13:54 lseek SEEK_HOLE / SEEK_DATA fixes and switch to iomap V3.2-hch Christoph Hellwig
@ 2017-06-29 13:54 ` Christoph Hellwig
  2017-07-01  2:45   ` Darrick J. Wong
  2017-06-29 13:54 ` [PATCH 2/5] vfs: Add page_cache_seek_hole_data helper Christoph Hellwig
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2017-06-29 13:54 UTC (permalink / raw)
  To: Andreas Gruenbacher; +Cc: Jan Kara, linux-fsdevel, linux-xfs, linux-ext4

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_quotaops.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/xfs/xfs_quotaops.c b/fs/xfs/xfs_quotaops.c
index f82d79a8c694..de9493253edf 100644
--- a/fs/xfs/xfs_quotaops.c
+++ b/fs/xfs/xfs_quotaops.c
@@ -269,7 +269,6 @@ xfs_fs_get_nextdqblk(
 	/* ID may be different, so convert back what we got */
 	*qid = make_kqid(current_user_ns(), qid->type, id);
 	return 0;
-	
 }
 
 STATIC int
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/5] vfs: Add page_cache_seek_hole_data helper
  2017-06-29 13:54 lseek SEEK_HOLE / SEEK_DATA fixes and switch to iomap V3.2-hch Christoph Hellwig
  2017-06-29 13:54 ` [PATCH 1/5] xfs: remove a whitespace-only line from xfs_fs_get_nextdqblk Christoph Hellwig
@ 2017-06-29 13:54 ` Christoph Hellwig
  2017-06-29 13:54 ` [PATCH 3/5] vfs: Add iomap_seek_hole and iomap_seek_data helpers Christoph Hellwig
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2017-06-29 13:54 UTC (permalink / raw)
  To: Andreas Gruenbacher; +Cc: Jan Kara, linux-fsdevel, linux-xfs, linux-ext4

From: Andreas Gruenbacher <agruenba@redhat.com>

Both ext4 and xfs implement seeking for the next hole or piece of data
in unwritten extents by scanning the page cache, and both versions share
the same bug when iterating the buffers of a page: the start offset into
the page isn't taken into account, so when a page fits more than two
filesystem blocks, things will go wrong.  For example, on a filesystem
with a block size of 1k, the following command will fail:

  xfs_io -f -c "falloc 0 4k" \
            -c "pwrite 1k 1k" \
            -c "pwrite 3k 1k" \
            -c "seek -a -r 0" foo

In this example, neither lseek(fd, 1024, SEEK_HOLE) nor lseek(fd, 2048,
SEEK_DATA) will return the correct result.

Introduce a generic vfs helper for seeking in the page cache that gets
this right.  The next commits will replace the filesystem specific
implementations.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
[hch: dropped the export]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/buffer.c                 | 124 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/buffer_head.h |   2 +
 2 files changed, 126 insertions(+)

diff --git a/fs/buffer.c b/fs/buffer.c
index 161be58c5cb0..b3674eb7c9c0 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3492,6 +3492,130 @@ int bh_submit_read(struct buffer_head *bh)
 }
 EXPORT_SYMBOL(bh_submit_read);
 
+/*
+ * Seek for SEEK_DATA / SEEK_HOLE within @page, starting at @lastoff.
+ *
+ * Returns the offset within the file on success, and -ENOENT otherwise.
+ */
+static loff_t
+page_seek_hole_data(struct page *page, loff_t lastoff, int whence)
+{
+	loff_t offset = page_offset(page);
+	struct buffer_head *bh, *head;
+	bool seek_data = whence == SEEK_DATA;
+
+	if (lastoff < offset)
+		lastoff = offset;
+
+	bh = head = page_buffers(page);
+	do {
+		offset += bh->b_size;
+		if (lastoff >= offset)
+			continue;
+
+		/*
+		 * Unwritten extents that have data in the page cache covering
+		 * them can be identified by the BH_Unwritten state flag.
+		 * Pages with multiple buffers might have a mix of holes, data
+		 * and unwritten extents - any buffer with valid data in it
+		 * should have BH_Uptodate flag set on it.
+		 */
+
+		if ((buffer_unwritten(bh) || buffer_uptodate(bh)) == seek_data)
+			return lastoff;
+
+		lastoff = offset;
+	} while ((bh = bh->b_this_page) != head);
+	return -ENOENT;
+}
+
+/*
+ * Seek for SEEK_DATA / SEEK_HOLE in the page cache.
+ *
+ * Within unwritten extents, the page cache determines which parts are holes
+ * and which are data: unwritten and uptodate buffer heads count as data;
+ * everything else counts as a hole.
+ *
+ * Returns the resulting offset on successs, and -ENOENT otherwise.
+ */
+loff_t
+page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length,
+			  int whence)
+{
+	pgoff_t index = offset >> PAGE_SHIFT;
+	pgoff_t end = DIV_ROUND_UP(offset + length, PAGE_SIZE);
+	loff_t lastoff = offset;
+	struct pagevec pvec;
+
+	if (length <= 0)
+		return -ENOENT;
+
+	pagevec_init(&pvec, 0);
+
+	do {
+		unsigned want, nr_pages, i;
+
+		want = min_t(unsigned, end - index, PAGEVEC_SIZE);
+		nr_pages = pagevec_lookup(&pvec, inode->i_mapping, index, want);
+		if (nr_pages == 0)
+			break;
+
+		for (i = 0; i < nr_pages; i++) {
+			struct page *page = pvec.pages[i];
+
+			/*
+			 * At this point, the page may be truncated or
+			 * invalidated (changing page->mapping to NULL), or
+			 * even swizzled back from swapper_space to tmpfs file
+			 * mapping.  However, page->index will not change
+			 * because we have a reference on the page.
+                         *
+			 * If current page offset is beyond where we've ended,
+			 * we've found a hole.
+                         */
+			if (whence == SEEK_HOLE &&
+			    lastoff < page_offset(page))
+				goto check_range;
+
+			/* Searching done if the page index is out of range. */
+			if (page->index >= end)
+				goto not_found;
+
+			lock_page(page);
+			if (likely(page->mapping == inode->i_mapping) &&
+			    page_has_buffers(page)) {
+				lastoff = page_seek_hole_data(page, lastoff, whence);
+				if (lastoff >= 0) {
+					unlock_page(page);
+					goto check_range;
+				}
+			}
+			unlock_page(page);
+			lastoff = page_offset(page) + PAGE_SIZE;
+		}
+
+		/* Searching done if fewer pages returned than wanted. */
+		if (nr_pages < want)
+			break;
+
+		index = pvec.pages[i - 1]->index + 1;
+		pagevec_release(&pvec);
+	} while (index < end);
+
+	/* When no page at lastoff and we are not done, we found a hole. */
+	if (whence != SEEK_HOLE)
+		goto not_found;
+
+check_range:
+	if (lastoff < offset + length)
+		goto out;
+not_found:
+	lastoff = -ENOENT;
+out:
+	pagevec_release(&pvec);
+	return lastoff;
+}
+
 void __init buffer_init(void)
 {
 	unsigned long nrpages;
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index bd029e52ef5e..ad4e024ce17e 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -201,6 +201,8 @@ void write_boundary_block(struct block_device *bdev,
 			sector_t bblock, unsigned blocksize);
 int bh_uptodate_or_lock(struct buffer_head *bh);
 int bh_submit_read(struct buffer_head *bh);
+loff_t page_cache_seek_hole_data(struct inode *inode, loff_t offset,
+				 loff_t length, int whence);
 
 extern int buffer_heads_over_limit;
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 3/5] vfs: Add iomap_seek_hole and iomap_seek_data helpers
  2017-06-29 13:54 lseek SEEK_HOLE / SEEK_DATA fixes and switch to iomap V3.2-hch Christoph Hellwig
  2017-06-29 13:54 ` [PATCH 1/5] xfs: remove a whitespace-only line from xfs_fs_get_nextdqblk Christoph Hellwig
  2017-06-29 13:54 ` [PATCH 2/5] vfs: Add page_cache_seek_hole_data helper Christoph Hellwig
@ 2017-06-29 13:54 ` Christoph Hellwig
  2017-06-29 13:54 ` [PATCH 4/5] xfs: Switch to iomap for SEEK_HOLE / SEEK_DATA Christoph Hellwig
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2017-06-29 13:54 UTC (permalink / raw)
  To: Andreas Gruenbacher; +Cc: Jan Kara, linux-fsdevel, linux-xfs, linux-ext4

From: Andreas Gruenbacher <agruenba@redhat.com>

Filesystems can use this for implementing lseek SEEK_HOLE / SEEK_DATA
support via iomap.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
[hch: split functions, coding style cleanups]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap.c            | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/iomap.h |  4 +++
 2 files changed, 98 insertions(+)

diff --git a/fs/iomap.c b/fs/iomap.c
index 4b10892967a5..432eed8f091f 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -584,6 +584,100 @@ int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fi,
 }
 EXPORT_SYMBOL_GPL(iomap_fiemap);
 
+static loff_t
+iomap_seek_hole_actor(struct inode *inode, loff_t offset, loff_t length,
+		      void *data, struct iomap *iomap)
+{
+	switch (iomap->type) {
+	case IOMAP_UNWRITTEN:
+		offset = page_cache_seek_hole_data(inode, offset, length,
+						   SEEK_HOLE);
+		if (offset < 0)
+			return length;
+		/* fall through */
+	case IOMAP_HOLE:
+		*(loff_t *)data = offset;
+		return 0;
+	default:
+		return length;
+	}
+}
+
+loff_t
+iomap_seek_hole(struct inode *inode, loff_t offset, const struct iomap_ops *ops)
+{
+	loff_t size = i_size_read(inode);
+	loff_t length = size - offset;
+	loff_t ret;
+
+	/* Nothing to be found beyond the end of the file. */
+	if (offset >= size)
+		return -ENXIO;
+
+	while (length > 0) {
+		ret = iomap_apply(inode, offset, length, IOMAP_REPORT, ops,
+				  &offset, iomap_seek_hole_actor);
+		if (ret < 0)
+			return ret;
+		if (ret == 0)
+			break;
+
+		offset += ret;
+		length -= ret;
+	}
+
+	return offset;
+}
+EXPORT_SYMBOL_GPL(iomap_seek_hole);
+
+static loff_t
+iomap_seek_data_actor(struct inode *inode, loff_t offset, loff_t length,
+		      void *data, struct iomap *iomap)
+{
+	switch (iomap->type) {
+	case IOMAP_HOLE:
+		return length;
+	case IOMAP_UNWRITTEN:
+		offset = page_cache_seek_hole_data(inode, offset, length,
+						   SEEK_DATA);
+		if (offset < 0)
+			return length;
+		/*FALLTHRU*/
+	default:
+		*(loff_t *)data = offset;
+		return 0;
+	}
+}
+
+loff_t
+iomap_seek_data(struct inode *inode, loff_t offset, const struct iomap_ops *ops)
+{
+	loff_t size = i_size_read(inode);
+	loff_t length = size - offset;
+	loff_t ret;
+
+	/* Nothing to be found beyond the end of the file. */
+	if (offset >= size)
+		return -ENXIO;
+
+	while (length > 0) {
+		ret = iomap_apply(inode, offset, length, IOMAP_REPORT, ops,
+				  &offset, iomap_seek_data_actor);
+		if (ret < 0)
+			return ret;
+		if (ret == 0)
+			break;
+
+		offset += ret;
+		length -= ret;
+	}
+
+	if (length <= 0)
+		return -ENXIO;
+	return offset;
+}
+EXPORT_SYMBOL_GPL(iomap_seek_data);
+
 /*
  * Private flags for iomap_dio, must not overlap with the public ones in
  * iomap.h:
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index f753e788da31..8a03f5dcd89b 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -83,6 +83,10 @@ int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
 int iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops);
 int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		loff_t start, loff_t len, const struct iomap_ops *ops);
+loff_t iomap_seek_hole(struct inode *inode, loff_t offset,
+		const struct iomap_ops *ops);
+loff_t iomap_seek_data(struct inode *inode, loff_t offset,
+		const struct iomap_ops *ops);
 
 /*
  * Flags for direct I/O ->end_io:
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 4/5] xfs: Switch to iomap for SEEK_HOLE / SEEK_DATA
  2017-06-29 13:54 lseek SEEK_HOLE / SEEK_DATA fixes and switch to iomap V3.2-hch Christoph Hellwig
                   ` (2 preceding siblings ...)
  2017-06-29 13:54 ` [PATCH 3/5] vfs: Add iomap_seek_hole and iomap_seek_data helpers Christoph Hellwig
@ 2017-06-29 13:54 ` Christoph Hellwig
  2017-06-29 13:54 ` [PATCH 5/5] ext4: " Christoph Hellwig
  2017-06-29 18:47 ` lseek SEEK_HOLE / SEEK_DATA fixes and switch to iomap V3.2-hch Darrick J. Wong
  5 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2017-06-29 13:54 UTC (permalink / raw)
  To: Andreas Gruenbacher; +Cc: Jan Kara, linux-fsdevel, linux-xfs, linux-ext4

Switch to the iomap_seek_hole and iomap_seek_data helpers for
implementing lseek SEEK_HOLE / SEEK_DATA, and remove all the
code that isn't needed any more.

Based on patches from Andreas Gruenbacher <agruenba@redhat.com>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_file.c  | 375 ++---------------------------------------------------
 fs/xfs/xfs_inode.h |   3 -
 2 files changed, 14 insertions(+), 364 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 36c129303fcf..280f07a73d57 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -953,378 +953,31 @@ xfs_file_readdir(
 	return xfs_readdir(NULL, ip, ctx, bufsize);
 }
 
-/*
- * This type is designed to indicate the type of offset we would like
- * to search from page cache for xfs_seek_hole_data().
- */
-enum {
-	HOLE_OFF = 0,
-	DATA_OFF,
-};
-
-/*
- * Lookup the desired type of offset from the given page.
- *
- * On success, return true and the offset argument will point to the
- * start of the region that was found.  Otherwise this function will
- * return false and keep the offset argument unchanged.
- */
-STATIC bool
-xfs_lookup_buffer_offset(
-	struct page		*page,
-	loff_t			*offset,
-	unsigned int		type)
-{
-	loff_t			lastoff = page_offset(page);
-	bool			found = false;
-	struct buffer_head	*bh, *head;
-
-	bh = head = page_buffers(page);
-	do {
-		/*
-		 * Unwritten extents that have data in the page
-		 * cache covering them can be identified by the
-		 * BH_Unwritten state flag.  Pages with multiple
-		 * buffers might have a mix of holes, data and
-		 * unwritten extents - any buffer with valid
-		 * data in it should have BH_Uptodate flag set
-		 * on it.
-		 */
-		if (buffer_unwritten(bh) ||
-		    buffer_uptodate(bh)) {
-			if (type == DATA_OFF)
-				found = true;
-		} else {
-			if (type == HOLE_OFF)
-				found = true;
-		}
-
-		if (found) {
-			*offset = lastoff;
-			break;
-		}
-		lastoff += bh->b_size;
-	} while ((bh = bh->b_this_page) != head);
-
-	return found;
-}
-
-/*
- * This routine is called to find out and return a data or hole offset
- * from the page cache for unwritten extents according to the desired
- * type for xfs_seek_hole_data().
- *
- * The argument offset is used to tell where we start to search from the
- * page cache.  Map is used to figure out the end points of the range to
- * lookup pages.
- *
- * Return true if the desired type of offset was found, and the argument
- * offset is filled with that address.  Otherwise, return false and keep
- * offset unchanged.
- */
-STATIC bool
-xfs_find_get_desired_pgoff(
-	struct inode		*inode,
-	struct xfs_bmbt_irec	*map,
-	unsigned int		type,
-	loff_t			*offset)
-{
-	struct xfs_inode	*ip = XFS_I(inode);
-	struct xfs_mount	*mp = ip->i_mount;
-	struct pagevec		pvec;
-	pgoff_t			index;
-	pgoff_t			end;
-	loff_t			endoff;
-	loff_t			startoff = *offset;
-	loff_t			lastoff = startoff;
-	bool			found = false;
-
-	pagevec_init(&pvec, 0);
-
-	index = startoff >> PAGE_SHIFT;
-	endoff = XFS_FSB_TO_B(mp, map->br_startoff + map->br_blockcount);
-	end = (endoff - 1) >> PAGE_SHIFT;
-	do {
-		int		want;
-		unsigned	nr_pages;
-		unsigned int	i;
-
-		want = min_t(pgoff_t, end - index, PAGEVEC_SIZE - 1) + 1;
-		nr_pages = pagevec_lookup(&pvec, inode->i_mapping, index,
-					  want);
-		if (nr_pages == 0)
-			break;
-
-		for (i = 0; i < nr_pages; i++) {
-			struct page	*page = pvec.pages[i];
-			loff_t		b_offset;
-
-			/*
-			 * At this point, the page may be truncated or
-			 * invalidated (changing page->mapping to NULL),
-			 * or even swizzled back from swapper_space to tmpfs
-			 * file mapping. However, page->index will not change
-			 * because we have a reference on the page.
-			 *
-			 * If current page offset is beyond where we've ended,
-			 * we've found a hole.
-			 */
-			if (type == HOLE_OFF && lastoff < endoff &&
-			    lastoff < page_offset(pvec.pages[i])) {
-				found = true;
-				*offset = lastoff;
-				goto out;
-			}
-			/* Searching done if the page index is out of range. */
-			if (page->index > end)
-				goto out;
-
-			lock_page(page);
-			/*
-			 * Page truncated or invalidated(page->mapping == NULL).
-			 * We can freely skip it and proceed to check the next
-			 * page.
-			 */
-			if (unlikely(page->mapping != inode->i_mapping)) {
-				unlock_page(page);
-				continue;
-			}
-
-			if (!page_has_buffers(page)) {
-				unlock_page(page);
-				continue;
-			}
-
-			found = xfs_lookup_buffer_offset(page, &b_offset, type);
-			if (found) {
-				/*
-				 * The found offset may be less than the start
-				 * point to search if this is the first time to
-				 * come here.
-				 */
-				*offset = max_t(loff_t, startoff, b_offset);
-				unlock_page(page);
-				goto out;
-			}
-
-			/*
-			 * We either searching data but nothing was found, or
-			 * searching hole but found a data buffer.  In either
-			 * case, probably the next page contains the desired
-			 * things, update the last offset to it so.
-			 */
-			lastoff = page_offset(page) + PAGE_SIZE;
-			unlock_page(page);
-		}
-
-		/*
-		 * The number of returned pages less than our desired, search
-		 * done.
-		 */
-		if (nr_pages < want)
-			break;
-
-		index = pvec.pages[i - 1]->index + 1;
-		pagevec_release(&pvec);
-	} while (index <= end);
-
-	/* No page at lastoff and we are not done - we found a hole. */
-	if (type == HOLE_OFF && lastoff < endoff) {
-		*offset = lastoff;
-		found = true;
-	}
-out:
-	pagevec_release(&pvec);
-	return found;
-}
-
-/*
- * caller must lock inode with xfs_ilock_data_map_shared,
- * can we craft an appropriate ASSERT?
- *
- * end is because the VFS-level lseek interface is defined such that any
- * offset past i_size shall return -ENXIO, but we use this for quota code
- * which does not maintain i_size, and we want to SEEK_DATA past i_size.
- */
-loff_t
-__xfs_seek_hole_data(
-	struct inode		*inode,
-	loff_t			start,
-	loff_t			end,
-	int			whence)
-{
-	struct xfs_inode	*ip = XFS_I(inode);
-	struct xfs_mount	*mp = ip->i_mount;
-	loff_t			uninitialized_var(offset);
-	xfs_fileoff_t		fsbno;
-	xfs_filblks_t		lastbno;
-	int			error;
-
-	if (start >= end) {
-		error = -ENXIO;
-		goto out_error;
-	}
-
-	/*
-	 * Try to read extents from the first block indicated
-	 * by fsbno to the end block of the file.
-	 */
-	fsbno = XFS_B_TO_FSBT(mp, start);
-	lastbno = XFS_B_TO_FSB(mp, end);
-
-	for (;;) {
-		struct xfs_bmbt_irec	map[2];
-		int			nmap = 2;
-		unsigned int		i;
-
-		error = xfs_bmapi_read(ip, fsbno, lastbno - fsbno, map, &nmap,
-				       XFS_BMAPI_ENTIRE);
-		if (error)
-			goto out_error;
-
-		/* No extents at given offset, must be beyond EOF */
-		if (nmap == 0) {
-			error = -ENXIO;
-			goto out_error;
-		}
-
-		for (i = 0; i < nmap; i++) {
-			offset = max_t(loff_t, start,
-				       XFS_FSB_TO_B(mp, map[i].br_startoff));
-
-			/* Landed in the hole we wanted? */
-			if (whence == SEEK_HOLE &&
-			    map[i].br_startblock == HOLESTARTBLOCK)
-				goto out;
-
-			/* Landed in the data extent we wanted? */
-			if (whence == SEEK_DATA &&
-			    (map[i].br_startblock == DELAYSTARTBLOCK ||
-			     (map[i].br_state == XFS_EXT_NORM &&
-			      !isnullstartblock(map[i].br_startblock))))
-				goto out;
-
-			/*
-			 * Landed in an unwritten extent, try to search
-			 * for hole or data from page cache.
-			 */
-			if (map[i].br_state == XFS_EXT_UNWRITTEN) {
-				if (xfs_find_get_desired_pgoff(inode, &map[i],
-				      whence == SEEK_HOLE ? HOLE_OFF : DATA_OFF,
-							&offset))
-					goto out;
-			}
-		}
-
-		/*
-		 * We only received one extent out of the two requested. This
-		 * means we've hit EOF and didn't find what we are looking for.
-		 */
-		if (nmap == 1) {
-			/*
-			 * If we were looking for a hole, set offset to
-			 * the end of the file (i.e., there is an implicit
-			 * hole at the end of any file).
-		 	 */
-			if (whence == SEEK_HOLE) {
-				offset = end;
-				break;
-			}
-			/*
-			 * If we were looking for data, it's nowhere to be found
-			 */
-			ASSERT(whence == SEEK_DATA);
-			error = -ENXIO;
-			goto out_error;
-		}
-
-		ASSERT(i > 1);
-
-		/*
-		 * Nothing was found, proceed to the next round of search
-		 * if the next reading offset is not at or beyond EOF.
-		 */
-		fsbno = map[i - 1].br_startoff + map[i - 1].br_blockcount;
-		start = XFS_FSB_TO_B(mp, fsbno);
-		if (start >= end) {
-			if (whence == SEEK_HOLE) {
-				offset = end;
-				break;
-			}
-			ASSERT(whence == SEEK_DATA);
-			error = -ENXIO;
-			goto out_error;
-		}
-	}
-
-out:
-	/*
-	 * If at this point we have found the hole we wanted, the returned
-	 * offset may be bigger than the file size as it may be aligned to
-	 * page boundary for unwritten extents.  We need to deal with this
-	 * situation in particular.
-	 */
-	if (whence == SEEK_HOLE)
-		offset = min_t(loff_t, offset, end);
-
-	return offset;
-
-out_error:
-	return error;
-}
-
-STATIC loff_t
-xfs_seek_hole_data(
-	struct file		*file,
-	loff_t			start,
-	int			whence)
-{
-	struct inode		*inode = file->f_mapping->host;
-	struct xfs_inode	*ip = XFS_I(inode);
-	struct xfs_mount	*mp = ip->i_mount;
-	uint			lock;
-	loff_t			offset, end;
-	int			error = 0;
-
-	if (XFS_FORCED_SHUTDOWN(mp))
-		return -EIO;
-
-	lock = xfs_ilock_data_map_shared(ip);
-
-	end = i_size_read(inode);
-	offset = __xfs_seek_hole_data(inode, start, end, whence);
-	if (offset < 0) {
-		error = offset;
-		goto out_unlock;
-	}
-
-	offset = vfs_setpos(file, offset, inode->i_sb->s_maxbytes);
-
-out_unlock:
-	xfs_iunlock(ip, lock);
-
-	if (error)
-		return error;
-	return offset;
-}
-
 STATIC loff_t
 xfs_file_llseek(
 	struct file	*file,
 	loff_t		offset,
 	int		whence)
 {
+	struct inode		*inode = file->f_mapping->host;
+
+	if (XFS_FORCED_SHUTDOWN(XFS_I(inode)->i_mount))
+		return -EIO;
+
 	switch (whence) {
-	case SEEK_END:
-	case SEEK_CUR:
-	case SEEK_SET:
+	default:
 		return generic_file_llseek(file, offset, whence);
 	case SEEK_HOLE:
+		offset = iomap_seek_hole(inode, offset, &xfs_iomap_ops);
+		break;
 	case SEEK_DATA:
-		return xfs_seek_hole_data(file, offset, whence);
-	default:
-		return -EINVAL;
+		offset = iomap_seek_data(inode, offset, &xfs_iomap_ops);
+		break;
 	}
+
+	if (offset < 0)
+		return offset;
+	return vfs_setpos(file, offset, inode->i_sb->s_maxbytes);
 }
 
 /*
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 677d0bfe1c2d..0ee453de239a 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -445,9 +445,6 @@ int	xfs_zero_eof(struct xfs_inode *ip, xfs_off_t offset,
 		     xfs_fsize_t isize, bool *did_zeroing);
 int	xfs_zero_range(struct xfs_inode *ip, xfs_off_t pos, xfs_off_t count,
 		bool *did_zero);
-loff_t	__xfs_seek_hole_data(struct inode *inode, loff_t start,
-			     loff_t eof, int whence);
-
 
 /* from xfs_iops.c */
 extern void xfs_setup_inode(struct xfs_inode *ip);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 5/5] ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA
  2017-06-29 13:54 lseek SEEK_HOLE / SEEK_DATA fixes and switch to iomap V3.2-hch Christoph Hellwig
                   ` (3 preceding siblings ...)
  2017-06-29 13:54 ` [PATCH 4/5] xfs: Switch to iomap for SEEK_HOLE / SEEK_DATA Christoph Hellwig
@ 2017-06-29 13:54 ` Christoph Hellwig
  2017-06-30 11:51   ` Andreas Gruenbacher
  2017-07-25 12:45   ` [PATCH 5/5] ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA Jan Kara
  2017-06-29 18:47 ` lseek SEEK_HOLE / SEEK_DATA fixes and switch to iomap V3.2-hch Darrick J. Wong
  5 siblings, 2 replies; 28+ messages in thread
From: Christoph Hellwig @ 2017-06-29 13:54 UTC (permalink / raw)
  To: Andreas Gruenbacher; +Cc: Jan Kara, linux-fsdevel, linux-xfs, linux-ext4

Switch to the iomap_seek_hole and iomap_seek_data helpers for
implementing lseek SEEK_HOLE / SEEK_DATA, and remove all the
code that isn't needed any more.

Note that with this patch ext4 will now always depend on the iomap
code instead of only when CONFIG_DAX is enabled, and it requires
adding a call into the extent status tree for iomap_begin as well
to properly deal with delalloc extents.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/ext4/Kconfig |   1 +
 fs/ext4/ext4.h  |   3 -
 fs/ext4/file.c  | 264 +++-----------------------------------------------------
 fs/ext4/inode.c |  96 ++++++---------------
 4 files changed, 36 insertions(+), 328 deletions(-)

diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig
index e38039fd96ff..73b850f5659c 100644
--- a/fs/ext4/Kconfig
+++ b/fs/ext4/Kconfig
@@ -37,6 +37,7 @@ config EXT4_FS
 	select CRC16
 	select CRYPTO
 	select CRYPTO_CRC32C
+	select FS_IOMAP
 	help
 	  This is the next generation of the ext3 filesystem.
 
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 32191548abed..eb0a1f221af3 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2490,9 +2490,6 @@ extern void ext4_da_update_reserve_space(struct inode *inode,
 					int used, int quota_claim);
 extern int ext4_issue_zeroout(struct inode *inode, ext4_lblk_t lblk,
 			      ext4_fsblk_t pblk, ext4_lblk_t len);
-extern int ext4_get_next_extent(struct inode *inode, ext4_lblk_t lblk,
-				unsigned int map_len,
-				struct extent_status *result);
 
 /* indirect.c */
 extern int ext4_ind_map_blocks(handle_t *handle, struct inode *inode,
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 02ce7e7bbdf5..02bbf2ce7517 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -20,6 +20,7 @@
 
 #include <linux/time.h>
 #include <linux/fs.h>
+#include <linux/iomap.h>
 #include <linux/mount.h>
 #include <linux/path.h>
 #include <linux/dax.h>
@@ -439,253 +440,6 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
 }
 
 /*
- * Here we use ext4_map_blocks() to get a block mapping for a extent-based
- * file rather than ext4_ext_walk_space() because we can introduce
- * SEEK_DATA/SEEK_HOLE for block-mapped and extent-mapped file at the same
- * function.  When extent status tree has been fully implemented, it will
- * track all extent status for a file and we can directly use it to
- * retrieve the offset for SEEK_DATA/SEEK_HOLE.
- */
-
-/*
- * When we retrieve the offset for SEEK_DATA/SEEK_HOLE, we would need to
- * lookup page cache to check whether or not there has some data between
- * [startoff, endoff] because, if this range contains an unwritten extent,
- * we determine this extent as a data or a hole according to whether the
- * page cache has data or not.
- */
-static int ext4_find_unwritten_pgoff(struct inode *inode,
-				     int whence,
-				     ext4_lblk_t end_blk,
-				     loff_t *offset)
-{
-	struct pagevec pvec;
-	unsigned int blkbits;
-	pgoff_t index;
-	pgoff_t end;
-	loff_t endoff;
-	loff_t startoff;
-	loff_t lastoff;
-	int found = 0;
-
-	blkbits = inode->i_sb->s_blocksize_bits;
-	startoff = *offset;
-	lastoff = startoff;
-	endoff = (loff_t)end_blk << blkbits;
-
-	index = startoff >> PAGE_SHIFT;
-	end = (endoff - 1) >> PAGE_SHIFT;
-
-	pagevec_init(&pvec, 0);
-	do {
-		int i, num;
-		unsigned long nr_pages;
-
-		num = min_t(pgoff_t, end - index, PAGEVEC_SIZE - 1) + 1;
-		nr_pages = pagevec_lookup(&pvec, inode->i_mapping, index,
-					  (pgoff_t)num);
-		if (nr_pages == 0)
-			break;
-
-		for (i = 0; i < nr_pages; i++) {
-			struct page *page = pvec.pages[i];
-			struct buffer_head *bh, *head;
-
-			/*
-			 * If current offset is smaller than the page offset,
-			 * there is a hole at this offset.
-			 */
-			if (whence == SEEK_HOLE && lastoff < endoff &&
-			    lastoff < page_offset(pvec.pages[i])) {
-				found = 1;
-				*offset = lastoff;
-				goto out;
-			}
-
-			if (page->index > end)
-				goto out;
-
-			lock_page(page);
-
-			if (unlikely(page->mapping != inode->i_mapping)) {
-				unlock_page(page);
-				continue;
-			}
-
-			if (!page_has_buffers(page)) {
-				unlock_page(page);
-				continue;
-			}
-
-			if (page_has_buffers(page)) {
-				lastoff = page_offset(page);
-				bh = head = page_buffers(page);
-				do {
-					if (buffer_uptodate(bh) ||
-					    buffer_unwritten(bh)) {
-						if (whence == SEEK_DATA)
-							found = 1;
-					} else {
-						if (whence == SEEK_HOLE)
-							found = 1;
-					}
-					if (found) {
-						*offset = max_t(loff_t,
-							startoff, lastoff);
-						unlock_page(page);
-						goto out;
-					}
-					lastoff += bh->b_size;
-					bh = bh->b_this_page;
-				} while (bh != head);
-			}
-
-			lastoff = page_offset(page) + PAGE_SIZE;
-			unlock_page(page);
-		}
-
-		/* The no. of pages is less than our desired, we are done. */
-		if (nr_pages < num)
-			break;
-
-		index = pvec.pages[i - 1]->index + 1;
-		pagevec_release(&pvec);
-	} while (index <= end);
-
-	if (whence == SEEK_HOLE && lastoff < endoff) {
-		found = 1;
-		*offset = lastoff;
-	}
-out:
-	pagevec_release(&pvec);
-	return found;
-}
-
-/*
- * ext4_seek_data() retrieves the offset for SEEK_DATA.
- */
-static loff_t ext4_seek_data(struct file *file, loff_t offset, loff_t maxsize)
-{
-	struct inode *inode = file->f_mapping->host;
-	struct extent_status es;
-	ext4_lblk_t start, last, end;
-	loff_t dataoff, isize;
-	int blkbits;
-	int ret;
-
-	inode_lock(inode);
-
-	isize = i_size_read(inode);
-	if (offset >= isize) {
-		inode_unlock(inode);
-		return -ENXIO;
-	}
-
-	blkbits = inode->i_sb->s_blocksize_bits;
-	start = offset >> blkbits;
-	last = start;
-	end = isize >> blkbits;
-	dataoff = offset;
-
-	do {
-		ret = ext4_get_next_extent(inode, last, end - last + 1, &es);
-		if (ret <= 0) {
-			/* No extent found -> no data */
-			if (ret == 0)
-				ret = -ENXIO;
-			inode_unlock(inode);
-			return ret;
-		}
-
-		last = es.es_lblk;
-		if (last != start)
-			dataoff = (loff_t)last << blkbits;
-		if (!ext4_es_is_unwritten(&es))
-			break;
-
-		/*
-		 * If there is a unwritten extent at this offset,
-		 * it will be as a data or a hole according to page
-		 * cache that has data or not.
-		 */
-		if (ext4_find_unwritten_pgoff(inode, SEEK_DATA,
-					      es.es_lblk + es.es_len, &dataoff))
-			break;
-		last += es.es_len;
-		dataoff = (loff_t)last << blkbits;
-		cond_resched();
-	} while (last <= end);
-
-	inode_unlock(inode);
-
-	if (dataoff > isize)
-		return -ENXIO;
-
-	return vfs_setpos(file, dataoff, maxsize);
-}
-
-/*
- * ext4_seek_hole() retrieves the offset for SEEK_HOLE.
- */
-static loff_t ext4_seek_hole(struct file *file, loff_t offset, loff_t maxsize)
-{
-	struct inode *inode = file->f_mapping->host;
-	struct extent_status es;
-	ext4_lblk_t start, last, end;
-	loff_t holeoff, isize;
-	int blkbits;
-	int ret;
-
-	inode_lock(inode);
-
-	isize = i_size_read(inode);
-	if (offset >= isize) {
-		inode_unlock(inode);
-		return -ENXIO;
-	}
-
-	blkbits = inode->i_sb->s_blocksize_bits;
-	start = offset >> blkbits;
-	last = start;
-	end = isize >> blkbits;
-	holeoff = offset;
-
-	do {
-		ret = ext4_get_next_extent(inode, last, end - last + 1, &es);
-		if (ret < 0) {
-			inode_unlock(inode);
-			return ret;
-		}
-		/* Found a hole? */
-		if (ret == 0 || es.es_lblk > last) {
-			if (last != start)
-				holeoff = (loff_t)last << blkbits;
-			break;
-		}
-		/*
-		 * If there is a unwritten extent at this offset,
-		 * it will be as a data or a hole according to page
-		 * cache that has data or not.
-		 */
-		if (ext4_es_is_unwritten(&es) &&
-		    ext4_find_unwritten_pgoff(inode, SEEK_HOLE,
-					      last + es.es_len, &holeoff))
-			break;
-
-		last += es.es_len;
-		holeoff = (loff_t)last << blkbits;
-		cond_resched();
-	} while (last <= end);
-
-	inode_unlock(inode);
-
-	if (holeoff > isize)
-		holeoff = isize;
-
-	return vfs_setpos(file, holeoff, maxsize);
-}
-
-/*
  * ext4_llseek() handles both block-mapped and extent-mapped maxbytes values
  * by calling generic_file_llseek_size() with the appropriate maxbytes
  * value for each.
@@ -701,18 +455,20 @@ loff_t ext4_llseek(struct file *file, loff_t offset, int whence)
 		maxbytes = inode->i_sb->s_maxbytes;
 
 	switch (whence) {
-	case SEEK_SET:
-	case SEEK_CUR:
-	case SEEK_END:
+	default:
 		return generic_file_llseek_size(file, offset, whence,
 						maxbytes, i_size_read(inode));
-	case SEEK_DATA:
-		return ext4_seek_data(file, offset, maxbytes);
 	case SEEK_HOLE:
-		return ext4_seek_hole(file, offset, maxbytes);
+		offset = iomap_seek_hole(inode, offset, &ext4_iomap_ops);
+		break;
+	case SEEK_DATA:
+		offset = iomap_seek_data(inode, offset, &ext4_iomap_ops);
+		break;
 	}
 
-	return -EINVAL;
+	if (offset < 0)
+		return offset;
+	return vfs_setpos(file, offset, maxbytes);
 }
 
 const struct file_operations ext4_file_operations = {
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5cf82d03968c..56a3b042b0ce 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3350,7 +3350,6 @@ static int ext4_releasepage(struct page *page, gfp_t wait)
 		return try_to_free_buffers(page);
 }
 
-#ifdef CONFIG_FS_DAX
 static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 			    unsigned flags, struct iomap *iomap)
 {
@@ -3359,6 +3358,7 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	unsigned long first_block = offset >> blkbits;
 	unsigned long last_block = (offset + length - 1) >> blkbits;
 	struct ext4_map_blocks map;
+	bool delalloc = false;
 	int ret;
 
 	if (WARN_ON_ONCE(ext4_has_inline_data(inode)))
@@ -3369,6 +3369,27 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 
 	if (!(flags & IOMAP_WRITE)) {
 		ret = ext4_map_blocks(NULL, inode, &map, 0);
+		if (!ret) {
+			struct extent_status es = {};
+
+			ext4_es_find_delayed_extent_range(inode, map.m_lblk,
+					map.m_lblk + map.m_len - 1, &es);
+			/* Is delalloc data before next block in extent tree? */
+			if (es.es_len && es.es_lblk < map.m_lblk + map.m_len) {
+				ext4_lblk_t offset = 0;
+
+				if (es.es_lblk < map.m_lblk)
+					offset = map.m_lblk - es.es_lblk;
+				map.m_lblk = es.es_lblk + offset;
+				map.m_pblk = ext4_es_pblock(&es) + offset;
+				map.m_len = es.es_len - offset;
+				if (ext4_es_status(&es) & EXTENT_STATUS_UNWRITTEN)
+					map.m_flags |= EXT4_MAP_UNWRITTEN;
+				if (ext4_es_status(&es) & EXTENT_STATUS_DELAYED)
+					delalloc = true;
+				ret = 1;
+			}
+		}
 	} else {
 		int dio_credits;
 		handle_t *handle;
@@ -3436,7 +3457,9 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 		iomap->blkno = IOMAP_NULL_BLOCK;
 		iomap->length = (u64)map.m_len << blkbits;
 	} else {
-		if (map.m_flags & EXT4_MAP_MAPPED) {
+		if (delalloc) {
+			iomap->type = IOMAP_DELALLOC;
+		} else if (map.m_flags & EXT4_MAP_MAPPED) {
 			iomap->type = IOMAP_MAPPED;
 		} else if (map.m_flags & EXT4_MAP_UNWRITTEN) {
 			iomap->type = IOMAP_UNWRITTEN;
@@ -3511,8 +3534,6 @@ const struct iomap_ops ext4_iomap_ops = {
 	.iomap_end		= ext4_iomap_end,
 };
 
-#endif
-
 static int ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
 			    ssize_t size, void *private)
 {
@@ -5994,70 +6015,3 @@ int ext4_filemap_fault(struct vm_fault *vmf)
 
 	return err;
 }
-
-/*
- * Find the first extent at or after @lblk in an inode that is not a hole.
- * Search for @map_len blocks at most. The extent is returned in @result.
- *
- * The function returns 1 if we found an extent. The function returns 0 in
- * case there is no extent at or after @lblk and in that case also sets
- * @result->es_len to 0. In case of error, the error code is returned.
- */
-int ext4_get_next_extent(struct inode *inode, ext4_lblk_t lblk,
-			 unsigned int map_len, struct extent_status *result)
-{
-	struct ext4_map_blocks map;
-	struct extent_status es = {};
-	int ret;
-
-	map.m_lblk = lblk;
-	map.m_len = map_len;
-
-	/*
-	 * For non-extent based files this loop may iterate several times since
-	 * we do not determine full hole size.
-	 */
-	while (map.m_len > 0) {
-		ret = ext4_map_blocks(NULL, inode, &map, 0);
-		if (ret < 0)
-			return ret;
-		/* There's extent covering m_lblk? Just return it. */
-		if (ret > 0) {
-			int status;
-
-			ext4_es_store_pblock(result, map.m_pblk);
-			result->es_lblk = map.m_lblk;
-			result->es_len = map.m_len;
-			if (map.m_flags & EXT4_MAP_UNWRITTEN)
-				status = EXTENT_STATUS_UNWRITTEN;
-			else
-				status = EXTENT_STATUS_WRITTEN;
-			ext4_es_store_status(result, status);
-			return 1;
-		}
-		ext4_es_find_delayed_extent_range(inode, map.m_lblk,
-						  map.m_lblk + map.m_len - 1,
-						  &es);
-		/* Is delalloc data before next block in extent tree? */
-		if (es.es_len && es.es_lblk < map.m_lblk + map.m_len) {
-			ext4_lblk_t offset = 0;
-
-			if (es.es_lblk < lblk)
-				offset = lblk - es.es_lblk;
-			result->es_lblk = es.es_lblk + offset;
-			ext4_es_store_pblock(result,
-					     ext4_es_pblock(&es) + offset);
-			result->es_len = es.es_len - offset;
-			ext4_es_store_status(result, ext4_es_status(&es));
-
-			return 1;
-		}
-		/* There's a hole at m_lblk, advance us after it */
-		map.m_lblk += map.m_len;
-		map_len -= map.m_len;
-		map.m_len = map_len;
-		cond_resched();
-	}
-	result->es_len = 0;
-	return 0;
-}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: lseek SEEK_HOLE / SEEK_DATA fixes and switch to iomap V3.2-hch
  2017-06-29 13:54 lseek SEEK_HOLE / SEEK_DATA fixes and switch to iomap V3.2-hch Christoph Hellwig
                   ` (4 preceding siblings ...)
  2017-06-29 13:54 ` [PATCH 5/5] ext4: " Christoph Hellwig
@ 2017-06-29 18:47 ` Darrick J. Wong
  2017-06-29 18:53   ` Christoph Hellwig
  5 siblings, 1 reply; 28+ messages in thread
From: Darrick J. Wong @ 2017-06-29 18:47 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Gruenbacher, Jan Kara, linux-fsdevel, linux-xfs, linux-ext4

On Thu, Jun 29, 2017 at 06:54:15AM -0700, Christoph Hellwig wrote:
> Here is my take on all the current seek work.  It has my XFS quota changes

I think the quota change patch fell out of this series, but I can grab
it from the previous iteration if it hasn't changed.

--D

> (unmodified from last post), Andrea's page cache seek helper (unmodified
> except that the export is dropped), and Andreas iomap SEEK_HOLE / DATA,
> for which I've added a couple of cleanups and converted ext4 as well.
> 
> I've made sure both xfs and ext4 pass all the xfstests.
> 
> Changes since the previous posting:
>  - trivial cleanups of the iomap helper
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: lseek SEEK_HOLE / SEEK_DATA fixes and switch to iomap V3.2-hch
  2017-06-29 18:47 ` lseek SEEK_HOLE / SEEK_DATA fixes and switch to iomap V3.2-hch Darrick J. Wong
@ 2017-06-29 18:53   ` Christoph Hellwig
  2017-07-03 15:11     ` Andreas Gruenbacher
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2017-06-29 18:53 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Andreas Gruenbacher, Jan Kara, linux-fsdevel,
	linux-xfs, linux-ext4

On Thu, Jun 29, 2017 at 11:47:51AM -0700, Darrick J. Wong wrote:
> On Thu, Jun 29, 2017 at 06:54:15AM -0700, Christoph Hellwig wrote:
> > Here is my take on all the current seek work.  It has my XFS quota changes
> 
> I think the quota change patch fell out of this series, but I can grab
> it from the previous iteration if it hasn't changed.

It hasn't, please do.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/5] ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA
  2017-06-29 13:54 ` [PATCH 5/5] ext4: " Christoph Hellwig
@ 2017-06-30 11:51   ` Andreas Gruenbacher
  2017-06-30 12:11     ` Andreas Gruenbacher
                       ` (5 more replies)
  2017-07-25 12:45   ` [PATCH 5/5] ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA Jan Kara
  1 sibling, 6 replies; 28+ messages in thread
From: Andreas Gruenbacher @ 2017-06-30 11:51 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jan Kara, linux-fsdevel, linux-xfs, linux-ext4

On Thu, Jun 29, 2017 at 3:54 PM, Christoph Hellwig <hch@lst.de> wrote:
> Switch to the iomap_seek_hole and iomap_seek_data helpers for
> implementing lseek SEEK_HOLE / SEEK_DATA, and remove all the
> code that isn't needed any more.
>
> Note that with this patch ext4 will now always depend on the iomap
> code instead of only when CONFIG_DAX is enabled, and it requires
> adding a call into the extent status tree for iomap_begin as well
> to properly deal with delalloc extents.

This breaks SEEK_HOLE / SEEK_DATA on filesystems with the inline_data feature.

Also, ext4 no longer calls inode_lock or inode_lock_shared; that needs
to be added back for consistency between reading i_size and walking
the file extents.

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/5] ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA
  2017-06-30 11:51   ` Andreas Gruenbacher
@ 2017-06-30 12:11     ` Andreas Gruenbacher
  2017-06-30 17:37     ` Christoph Hellwig
                       ` (4 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Andreas Gruenbacher @ 2017-06-30 12:11 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jan Kara, linux-fsdevel, linux-xfs, linux-ext4

On Fri, Jun 30, 2017 at 1:51 PM, Andreas Gruenbacher
<agruenba@redhat.com> wrote:
> On Thu, Jun 29, 2017 at 3:54 PM, Christoph Hellwig <hch@lst.de> wrote:
>> Switch to the iomap_seek_hole and iomap_seek_data helpers for
>> implementing lseek SEEK_HOLE / SEEK_DATA, and remove all the
>> code that isn't needed any more.
>>
>> Note that with this patch ext4 will now always depend on the iomap
>> code instead of only when CONFIG_DAX is enabled, and it requires
>> adding a call into the extent status tree for iomap_begin as well
>> to properly deal with delalloc extents.
>
> This breaks SEEK_HOLE / SEEK_DATA on filesystems with the inline_data feature.
>
> Also, ext4 no longer calls inode_lock or inode_lock_shared; that needs
> to be added back for consistency between reading i_size and walking
> the file extents.

Same on xfs, btw.

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/5] ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA
  2017-06-30 11:51   ` Andreas Gruenbacher
  2017-06-30 12:11     ` Andreas Gruenbacher
@ 2017-06-30 17:37     ` Christoph Hellwig
  2017-07-01  7:03       ` Darrick J. Wong
  2017-07-03 15:03       ` Andreas Gruenbacher
  2017-07-07 21:27     ` Andreas Gruenbacher
                       ` (3 subsequent siblings)
  5 siblings, 2 replies; 28+ messages in thread
From: Christoph Hellwig @ 2017-06-30 17:37 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, Jan Kara, linux-fsdevel, linux-xfs, linux-ext4

On Fri, Jun 30, 2017 at 01:51:10PM +0200, Andreas Gruenbacher wrote:
> Also, ext4 no longer calls inode_lock or inode_lock_shared; that needs
> to be added back for consistency between reading i_size and walking
> the file extents.

At least for XFS we never had such a consistency as we never took
the iolock (aka i_rwsem).

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/5] xfs: remove a whitespace-only line from xfs_fs_get_nextdqblk
  2017-06-29 13:54 ` [PATCH 1/5] xfs: remove a whitespace-only line from xfs_fs_get_nextdqblk Christoph Hellwig
@ 2017-07-01  2:45   ` Darrick J. Wong
  0 siblings, 0 replies; 28+ messages in thread
From: Darrick J. Wong @ 2017-07-01  2:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Gruenbacher, Jan Kara, linux-fsdevel, linux-xfs, linux-ext4

On Thu, Jun 29, 2017 at 06:54:16AM -0700, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

> ---
>  fs/xfs/xfs_quotaops.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_quotaops.c b/fs/xfs/xfs_quotaops.c
> index f82d79a8c694..de9493253edf 100644
> --- a/fs/xfs/xfs_quotaops.c
> +++ b/fs/xfs/xfs_quotaops.c
> @@ -269,7 +269,6 @@ xfs_fs_get_nextdqblk(
>  	/* ID may be different, so convert back what we got */
>  	*qid = make_kqid(current_user_ns(), qid->type, id);
>  	return 0;
> -	
>  }
>  
>  STATIC int
> -- 
> 2.11.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/5] ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA
  2017-06-30 17:37     ` Christoph Hellwig
@ 2017-07-01  7:03       ` Darrick J. Wong
  2017-07-02 15:24         ` Christoph Hellwig
  2017-07-03 15:03       ` Andreas Gruenbacher
  1 sibling, 1 reply; 28+ messages in thread
From: Darrick J. Wong @ 2017-07-01  7:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Gruenbacher, Jan Kara, linux-fsdevel, linux-xfs, linux-ext4

On Fri, Jun 30, 2017 at 07:37:40PM +0200, Christoph Hellwig wrote:
> On Fri, Jun 30, 2017 at 01:51:10PM +0200, Andreas Gruenbacher wrote:
> > Also, ext4 no longer calls inode_lock or inode_lock_shared; that needs
> > to be added back for consistency between reading i_size and walking
> > the file extents.
> 
> At least for XFS we never had such a consistency as we never took
> the iolock (aka i_rwsem).

Do we need it?  It seems strange to me that someone else could be
changing the file size while we're trying to use it to scan for holes or
data, but OTOH there are plenty of other places where the kernel
effectively requires that userspace either has synchronized writer
processess with the seeker process or is prepared to deal with the
inconsistent results.  If that's the intended behavior for the
SEEK_{HOLE,DATA} then I guess I can put my fears to rest about XFS /not/
taking the IOLOCK/i_rwsem.

The non-ext4 mechanical parts look ok to me (and test out ok), but I
want to be sure that we don't create a locking mess.  Dave complained in
an earlier thread about lockdep problems because the old code took the
ilock and then started locking pages; since we take the ILOCK
during ->iomap_begin, drop it, and only take page locks during
page_cache_seek_hole_data (which is called from the iomap actor) I think
that particular problem goes away.

(As for the ext4 patch that involves deeper surgery to the ext4 iomap
implementation so you'll have to take that up with Ted...)

--D

> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/5] ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA
  2017-07-01  7:03       ` Darrick J. Wong
@ 2017-07-02 15:24         ` Christoph Hellwig
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2017-07-02 15:24 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Andreas Gruenbacher, Jan Kara, linux-fsdevel,
	linux-xfs, linux-ext4

On Sat, Jul 01, 2017 at 12:03:06AM -0700, Darrick J. Wong wrote:
> On Fri, Jun 30, 2017 at 07:37:40PM +0200, Christoph Hellwig wrote:
> > On Fri, Jun 30, 2017 at 01:51:10PM +0200, Andreas Gruenbacher wrote:
> > > Also, ext4 no longer calls inode_lock or inode_lock_shared; that needs
> > > to be added back for consistency between reading i_size and walking
> > > the file extents.
> > 
> > At least for XFS we never had such a consistency as we never took
> > the iolock (aka i_rwsem).
> 
> Do we need it?

For XFS I'm pretty sure we don't.  The lseek is fundamentally safe
without it due to the ilock.

> The non-ext4 mechanical parts look ok to me (and test out ok), but I
> want to be sure that we don't create a locking mess.  Dave complained in
> an earlier thread about lockdep problems because the old code took the
> ilock and then started locking pages; since we take the ILOCK
> during ->iomap_begin, drop it, and only take page locks during
> page_cache_seek_hole_data (which is called from the iomap actor) I think
> that particular problem goes away.

The old code took the ilook in the seek hole/data helper, which had
two problems:  it double locked the ilock as we take it in iomap,
and it hols the ilock over the page cache calls.  None of which happen
with this code.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/5] ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA
  2017-06-30 17:37     ` Christoph Hellwig
  2017-07-01  7:03       ` Darrick J. Wong
@ 2017-07-03 15:03       ` Andreas Gruenbacher
  2017-07-03 16:21         ` Darrick J. Wong
  1 sibling, 1 reply; 28+ messages in thread
From: Andreas Gruenbacher @ 2017-07-03 15:03 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jan Kara, linux-fsdevel, linux-xfs, linux-ext4

On Fri, Jun 30, 2017 at 7:37 PM, Christoph Hellwig <hch@lst.de> wrote:
> On Fri, Jun 30, 2017 at 01:51:10PM +0200, Andreas Gruenbacher wrote:
>> Also, ext4 no longer calls inode_lock or inode_lock_shared; that needs
>> to be added back for consistency between reading i_size and walking
>> the file extents.
>
> At least for XFS we never had such a consistency as we never took
> the iolock (aka i_rwsem).

What else does this piece of code from mainline xfs_seek_hole_data()
do instead then?

        lock = xfs_ilock_data_map_shared(ip);

        end = i_size_read(inode);
        offset = __xfs_seek_hole_data(inode, start, end, whence);

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: lseek SEEK_HOLE / SEEK_DATA fixes and switch to iomap V3.2-hch
  2017-06-29 18:53   ` Christoph Hellwig
@ 2017-07-03 15:11     ` Andreas Gruenbacher
  0 siblings, 0 replies; 28+ messages in thread
From: Andreas Gruenbacher @ 2017-07-03 15:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Darrick J. Wong, Jan Kara, linux-fsdevel, linux-xfs, linux-ext4

On Thu, Jun 29, 2017 at 8:53 PM, Christoph Hellwig <hch@lst.de> wrote:
> On Thu, Jun 29, 2017 at 11:47:51AM -0700, Darrick J. Wong wrote:
>> On Thu, Jun 29, 2017 at 06:54:15AM -0700, Christoph Hellwig wrote:
>> > Here is my take on all the current seek work.  It has my XFS quota changes
>>
>> I think the quota change patch fell out of this series, but I can grab
>> it from the previous iteration if it hasn't changed.
>
> It hasn't, please do.

We haven't heard from the ext4 folks on this yet: is everyone happy
with the vfs changes in this series at least? Can those be merged
through the xfs tree? That would allow us to move ahead on the gfs2
side.

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/5] ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA
  2017-07-03 15:03       ` Andreas Gruenbacher
@ 2017-07-03 16:21         ` Darrick J. Wong
  2017-07-03 22:58           ` Dave Chinner
  0 siblings, 1 reply; 28+ messages in thread
From: Darrick J. Wong @ 2017-07-03 16:21 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, Jan Kara, linux-fsdevel, linux-xfs, linux-ext4

On Mon, Jul 03, 2017 at 05:03:54PM +0200, Andreas Gruenbacher wrote:
> On Fri, Jun 30, 2017 at 7:37 PM, Christoph Hellwig <hch@lst.de> wrote:
> > On Fri, Jun 30, 2017 at 01:51:10PM +0200, Andreas Gruenbacher wrote:
> >> Also, ext4 no longer calls inode_lock or inode_lock_shared; that needs
> >> to be added back for consistency between reading i_size and walking
> >> the file extents.
> >
> > At least for XFS we never had such a consistency as we never took
> > the iolock (aka i_rwsem).
> 
> What else does this piece of code from mainline xfs_seek_hole_data()
> do instead then?
> 
>         lock = xfs_ilock_data_map_shared(ip);

To avoid confusion, I'll start with ilock != iolock.

If I'm reading everything correctly, there are three levels of inode
locks that must be taken in XFS: First, the IOLOCK (aka i_rwsem) to
protect against concurrent IO when we need it, then the MMAPLOCK (istr
this was created to handle dax page faults, which don't take i_rwsem?),
and finally the ILOCK for inode core/extent map updates.  I think page
locks are supposed to happen before ILOCK.

xfs_ilock_data_map_shared() is a helper that takes the ILOCK in shared
or exclusive mode depending on whether or not all the inode metadata
have already been cached in memory.

The ILOCK must be held before reading or writing the extent map.

--D

> 
>         end = i_size_read(inode);
>         offset = __xfs_seek_hole_data(inode, start, end, whence);
> 
> Thanks,
> Andreas
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/5] ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA
  2017-07-03 16:21         ` Darrick J. Wong
@ 2017-07-03 22:58           ` Dave Chinner
  0 siblings, 0 replies; 28+ messages in thread
From: Dave Chinner @ 2017-07-03 22:58 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Andreas Gruenbacher, Christoph Hellwig, Jan Kara, linux-fsdevel,
	linux-xfs, linux-ext4

On Mon, Jul 03, 2017 at 09:21:45AM -0700, Darrick J. Wong wrote:
> On Mon, Jul 03, 2017 at 05:03:54PM +0200, Andreas Gruenbacher wrote:
> > On Fri, Jun 30, 2017 at 7:37 PM, Christoph Hellwig <hch@lst.de> wrote:
> > > On Fri, Jun 30, 2017 at 01:51:10PM +0200, Andreas Gruenbacher wrote:
> > >> Also, ext4 no longer calls inode_lock or inode_lock_shared; that needs
> > >> to be added back for consistency between reading i_size and walking
> > >> the file extents.
> > >
> > > At least for XFS we never had such a consistency as we never took
> > > the iolock (aka i_rwsem).
> > 
> > What else does this piece of code from mainline xfs_seek_hole_data()
> > do instead then?
> > 
> >         lock = xfs_ilock_data_map_shared(ip);
> 
> To avoid confusion, I'll start with ilock != iolock.
> 
> If I'm reading everything correctly, there are three levels of inode
> locks that must be taken in XFS: First, the IOLOCK (aka i_rwsem) to
> protect against concurrent IO when we need it, then the MMAPLOCK (istr
> this was created to handle dax page faults, which don't take i_rwsem?),
> and finally the ILOCK for inode core/extent map updates.  I think page
> locks are supposed to happen before ILOCK.

IOLOCK and MMAPLOCK are equivalent - there are two locks because we
can't take the IOLOCK in page fault context because page faults can
occur while we are holding the IOLOCK.

Locking order for IO path is:

IOLOCK -> page lock -> ILOCK

Locking order for page faults is:

mmap_sem -> MMAPLOCK -> page lock -> ILOCK

> xfs_ilock_data_map_shared() is a helper that takes the ILOCK in shared
> or exclusive mode depending on whether or not all the inode metadata
> have already been cached in memory.
> 
> The ILOCK must be held before reading or writing the extent map.

More simply:

	- IOLOCK/MMAPLOCK is for access control to file data
	- ILOCK is for access control to inode metadata

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/5] ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA
  2017-06-30 11:51   ` Andreas Gruenbacher
  2017-06-30 12:11     ` Andreas Gruenbacher
  2017-06-30 17:37     ` Christoph Hellwig
@ 2017-07-07 21:27     ` Andreas Gruenbacher
  2017-07-07 21:27     ` [PATCH v4 1/3] ext4: Add missing locking around iomap_seek_{hole,data} Andreas Gruenbacher
                       ` (2 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Andreas Gruenbacher @ 2017-07-07 21:27 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Gruenbacher, Jan Kara, linux-fsdevel, linux-xfs, linux-ext4

On Fri, Jun 30, 2017 at 1:51 PM, Andreas Gruenbacher <agruenba@redhat.com> wrote:
> On Thu, Jun 29, 2017 at 3:54 PM, Christoph Hellwig <hch@lst.de> wrote:
>> Switch to the iomap_seek_hole and iomap_seek_data helpers for
>> implementing lseek SEEK_HOLE / SEEK_DATA, and remove all the
>> code that isn't needed any more.
>>
>> Note that with this patch ext4 will now always depend on the iomap
>> code instead of only when CONFIG_DAX is enabled, and it requires
>> adding a call into the extent status tree for iomap_begin as well
>> to properly deal with delalloc extents.
>
> This breaks SEEK_HOLE / SEEK_DATA on filesystems with the inline_data feature.
>
> Also, ext4 no longer calls inode_lock or inode_lock_shared; that needs
> to be added back for consistency between reading i_size and walking
> the file extents.

Here are some possible fixes on top of this patch.

One problem with iomap is that is uses a sector number for the physical offset
instead of a byte offset which prevents iomap from being used for inline data
(ext4) / stuffed inodes (gfs2).  If we convert that into a byte offset,
iomap_fiemap and iomap_seek_{hole,data} will just work for those kinds of files.

Andreas

Andreas Gruenbacher (3):
  ext4: Add missing locking around iomap_seek_{hole,data}
  iomap: Switch from blkno to physical offset
  ext4: Add IOMAP_REPORT support for inline data

 fs/buffer.c           |  2 +-
 fs/dax.c              |  2 +-
 fs/ext2/inode.c       |  4 ++--
 fs/ext4/ext4.h        |  4 ++++
 fs/ext4/file.c        |  4 ++++
 fs/ext4/inline.c      | 33 +++++++++++++++++++++++++++++++++
 fs/ext4/inode.c       | 13 +++++++++----
 fs/iomap.c            | 14 ++++++++------
 fs/nfsd/blocklayout.c |  4 ++--
 fs/xfs/xfs_iomap.c    |  6 +++---
 include/linux/iomap.h |  5 +++--
 11 files changed, 70 insertions(+), 21 deletions(-)

-- 
2.7.5

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 1/3] ext4: Add missing locking around iomap_seek_{hole,data}
  2017-06-30 11:51   ` Andreas Gruenbacher
                       ` (2 preceding siblings ...)
  2017-07-07 21:27     ` Andreas Gruenbacher
@ 2017-07-07 21:27     ` Andreas Gruenbacher
  2017-07-12  9:17       ` Christoph Hellwig
  2017-07-07 21:28     ` [PATCH v4 2/3] iomap: Switch from blkno to physical offset Andreas Gruenbacher
  2017-07-07 21:28     ` [PATCH v4 3/3] ext4: Add IOMAP_REPORT support for inline data Andreas Gruenbacher
  5 siblings, 1 reply; 28+ messages in thread
From: Andreas Gruenbacher @ 2017-07-07 21:27 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Gruenbacher, Jan Kara, linux-fsdevel, linux-xfs, linux-ext4

Add back the inode locking that "ext4: Switch to iomap for SEEK_HOLE /
SEEK_DATA" blew away.  We can use inode_lock_shared instead of
inode_lock instead, though.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/ext4/file.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 02bbf2c..699a676 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -459,10 +459,14 @@ loff_t ext4_llseek(struct file *file, loff_t offset, int whence)
 		return generic_file_llseek_size(file, offset, whence,
 						maxbytes, i_size_read(inode));
 	case SEEK_HOLE:
+		inode_lock_shared(inode);
 		offset = iomap_seek_hole(inode, offset, &ext4_iomap_ops);
+		inode_unlock_shared(inode);
 		break;
 	case SEEK_DATA:
+		inode_lock_shared(inode);
 		offset = iomap_seek_data(inode, offset, &ext4_iomap_ops);
+		inode_unlock_shared(inode);
 		break;
 	}
 
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 2/3] iomap: Switch from blkno to physical offset
  2017-06-30 11:51   ` Andreas Gruenbacher
                       ` (3 preceding siblings ...)
  2017-07-07 21:27     ` [PATCH v4 1/3] ext4: Add missing locking around iomap_seek_{hole,data} Andreas Gruenbacher
@ 2017-07-07 21:28     ` Andreas Gruenbacher
  2017-07-12  9:20       ` Christoph Hellwig
  2017-07-07 21:28     ` [PATCH v4 3/3] ext4: Add IOMAP_REPORT support for inline data Andreas Gruenbacher
  5 siblings, 1 reply; 28+ messages in thread
From: Andreas Gruenbacher @ 2017-07-07 21:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Gruenbacher, Jan Kara, linux-fsdevel, linux-xfs, linux-ext4

Replace iomap->blkno, the sector number in 512-byte units, into
iomap->pyhsical, the physical offset in bytes.  For invalid physical
offsets, use the special value IOMAP_NULL_PHYSICAL instead of
IOMAP_NULL_BLOCK.  This allows to use iomap for mappings which are not
block aligned, line inline data on ext4.

Add a new IOMAP_F_INLINE flag to indicate that a mapping is in an area
that contains data as well as metadata.  In iomap_fiemap, map that flag
to FIEMAP_EXTENT_DATA_INLINE.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/buffer.c           |  2 +-
 fs/dax.c              |  2 +-
 fs/ext2/inode.c       |  4 ++--
 fs/ext4/inode.c       |  4 ++--
 fs/iomap.c            | 14 ++++++++------
 fs/nfsd/blocklayout.c |  4 ++--
 fs/xfs/xfs_iomap.c    |  6 +++---
 include/linux/iomap.h |  5 +++--
 8 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index b3674eb..e16f647 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1982,7 +1982,7 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
 	case IOMAP_MAPPED:
 		if (offset >= i_size_read(inode))
 			set_buffer_new(bh);
-		bh->b_blocknr = (iomap->blkno >> (inode->i_blkbits - 9)) +
+		bh->b_blocknr = (iomap->physical >> inode->i_blkbits) +
 				((offset - iomap->offset) >> inode->i_blkbits);
 		set_buffer_mapped(bh);
 		break;
diff --git a/fs/dax.c b/fs/dax.c
index 9187f3b..926caa5 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -985,7 +985,7 @@ EXPORT_SYMBOL_GPL(__dax_zero_page_range);
 
 static sector_t dax_iomap_sector(struct iomap *iomap, loff_t pos)
 {
-	return iomap->blkno + (((pos & PAGE_MASK) - iomap->offset) >> 9);
+	return (iomap->physical + (pos & PAGE_MASK) - iomap->offset) >> 9;
 }
 
 static loff_t
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 2dcbd56..8e9f109 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -823,11 +823,11 @@ static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 
 	if (ret == 0) {
 		iomap->type = IOMAP_HOLE;
-		iomap->blkno = IOMAP_NULL_BLOCK;
+		iomap->physical = IOMAP_NULL_PHYSICAL;
 		iomap->length = 1 << blkbits;
 	} else {
 		iomap->type = IOMAP_MAPPED;
-		iomap->blkno = (sector_t)bno << (blkbits - 9);
+		iomap->physical = (u64)bno << blkbits;
 		iomap->length = (u64)ret << blkbits;
 		iomap->flags |= IOMAP_F_MERGED;
 	}
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 56a3b04..82f3f7d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3454,7 +3454,7 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 
 	if (ret == 0) {
 		iomap->type = IOMAP_HOLE;
-		iomap->blkno = IOMAP_NULL_BLOCK;
+		iomap->physical = IOMAP_NULL_PHYSICAL;
 		iomap->length = (u64)map.m_len << blkbits;
 	} else {
 		if (delalloc) {
@@ -3467,7 +3467,7 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 			WARN_ON_ONCE(1);
 			return -EIO;
 		}
-		iomap->blkno = (sector_t)map.m_pblk << (blkbits - 9);
+		iomap->physical = (u64)map.m_pblk << blkbits;
 		iomap->length = (u64)map.m_len << blkbits;
 	}
 
diff --git a/fs/iomap.c b/fs/iomap.c
index 432eed8..fca7b4c 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -350,8 +350,8 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
 static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
 		struct iomap *iomap)
 {
-	sector_t sector = iomap->blkno +
-		(((pos & ~(PAGE_SIZE - 1)) - iomap->offset) >> 9);
+	sector_t sector = (iomap->physical +
+			   (pos & ~(PAGE_SIZE - 1)) - iomap->offset) >> 9;
 
 	return __dax_zero_page_range(iomap->bdev, iomap->dax_dev, sector,
 			offset, bytes);
@@ -510,11 +510,13 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
 		flags |= FIEMAP_EXTENT_MERGED;
 	if (iomap->flags & IOMAP_F_SHARED)
 		flags |= FIEMAP_EXTENT_SHARED;
+	if (iomap->flags & IOMAP_F_INLINE)
+		flags |= FIEMAP_EXTENT_DATA_INLINE;
 
 	return fiemap_fill_next_extent(fi, iomap->offset,
-			iomap->blkno != IOMAP_NULL_BLOCK ? iomap->blkno << 9: 0,
+			iomap->physical != IOMAP_NULL_PHYSICAL ?
+					   iomap->physical : 0,
 			iomap->length, flags);
-
 }
 
 static loff_t
@@ -807,7 +809,7 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
 	bio = bio_alloc(GFP_KERNEL, 1);
 	bio->bi_bdev = iomap->bdev;
 	bio->bi_iter.bi_sector =
-		iomap->blkno + ((pos - iomap->offset) >> 9);
+		(iomap->physical + pos - iomap->offset) >> 9;
 	bio->bi_private = dio;
 	bio->bi_end_io = iomap_dio_bio_end_io;
 
@@ -886,7 +888,7 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
 		bio = bio_alloc(GFP_KERNEL, nr_pages);
 		bio->bi_bdev = iomap->bdev;
 		bio->bi_iter.bi_sector =
-			iomap->blkno + ((pos - iomap->offset) >> 9);
+			(iomap->physical + pos - iomap->offset) >> 9;
 		bio->bi_private = dio;
 		bio->bi_end_io = iomap_dio_bio_end_io;
 
diff --git a/fs/nfsd/blocklayout.c b/fs/nfsd/blocklayout.c
index fb5213a..6027b45 100644
--- a/fs/nfsd/blocklayout.c
+++ b/fs/nfsd/blocklayout.c
@@ -65,7 +65,7 @@ nfsd4_block_proc_layoutget(struct inode *inode, const struct svc_fh *fhp,
 			bex->es = PNFS_BLOCK_READ_DATA;
 		else
 			bex->es = PNFS_BLOCK_READWRITE_DATA;
-		bex->soff = (iomap.blkno << 9);
+		bex->soff = iomap.physical;
 		break;
 	case IOMAP_UNWRITTEN:
 		if (seg->iomode & IOMODE_RW) {
@@ -78,7 +78,7 @@ nfsd4_block_proc_layoutget(struct inode *inode, const struct svc_fh *fhp,
 			}
 
 			bex->es = PNFS_BLOCK_INVALID_DATA;
-			bex->soff = (iomap.blkno << 9);
+			bex->soff = iomap.physical;
 			break;
 		}
 		/*FALLTHRU*/
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 94e5bdf..45b2561 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -54,13 +54,13 @@ xfs_bmbt_to_iomap(
 	struct xfs_mount	*mp = ip->i_mount;
 
 	if (imap->br_startblock == HOLESTARTBLOCK) {
-		iomap->blkno = IOMAP_NULL_BLOCK;
+		iomap->physical = IOMAP_NULL_PHYSICAL;
 		iomap->type = IOMAP_HOLE;
 	} else if (imap->br_startblock == DELAYSTARTBLOCK) {
-		iomap->blkno = IOMAP_NULL_BLOCK;
+		iomap->physical = IOMAP_NULL_PHYSICAL;
 		iomap->type = IOMAP_DELALLOC;
 	} else {
-		iomap->blkno = xfs_fsb_to_db(ip, imap->br_startblock);
+		iomap->physical = xfs_fsb_to_db(ip, imap->br_startblock) << 9;
 		if (imap->br_state == XFS_EXT_UNWRITTEN)
 			iomap->type = IOMAP_UNWRITTEN;
 		else
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 8a03f5d..ad49625 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -28,14 +28,15 @@ struct vm_fault;
  */
 #define IOMAP_F_MERGED	0x10	/* contains multiple blocks/extents */
 #define IOMAP_F_SHARED	0x20	/* block shared with another file */
+#define IOMAP_F_INLINE	0x40	/* data mixed with metadata */
 
 /*
  * Magic value for blkno:
  */
-#define IOMAP_NULL_BLOCK -1LL	/* blkno is not valid */
+#define IOMAP_NULL_PHYSICAL -1LL	/* physical offset is not valid */
 
 struct iomap {
-	sector_t		blkno;	/* 1st sector of mapping, 512b units */
+	u64			physical; /* physical offset of mapping, bytes */
 	loff_t			offset;	/* file offset of mapping, bytes */
 	u64			length;	/* length of mapping, bytes */
 	u16			type;	/* type of mapping */
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v4 3/3] ext4: Add IOMAP_REPORT support for inline data
  2017-06-30 11:51   ` Andreas Gruenbacher
                       ` (4 preceding siblings ...)
  2017-07-07 21:28     ` [PATCH v4 2/3] iomap: Switch from blkno to physical offset Andreas Gruenbacher
@ 2017-07-07 21:28     ` Andreas Gruenbacher
  2017-07-25 12:16       ` Jan Kara
  5 siblings, 1 reply; 28+ messages in thread
From: Andreas Gruenbacher @ 2017-07-07 21:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Gruenbacher, Jan Kara, linux-fsdevel, linux-xfs, linux-ext4

Fixes iomap_seek_{hole,data} for files with inline data (on filesystems
with the inline_data feature).  In addition, this will make converting
ext4 to use iomap_fiemap easier.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/ext4/ext4.h   |  4 ++++
 fs/ext4/inline.c | 33 +++++++++++++++++++++++++++++++++
 fs/ext4/inode.c  |  9 +++++++--
 3 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index eb0a1f2..3eae6ae 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3017,6 +3017,10 @@ extern struct buffer_head *ext4_get_first_inline_block(struct inode *inode,
 extern int ext4_inline_data_fiemap(struct inode *inode,
 				   struct fiemap_extent_info *fieinfo,
 				   int *has_inline, __u64 start, __u64 len);
+
+struct iomap;
+extern int ext4_inline_data_iomap(struct inode *inode, struct iomap *iomap);
+
 extern int ext4_try_to_evict_inline_data(handle_t *handle,
 					 struct inode *inode,
 					 int needed);
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 8d141c0..e9d0ab1 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -12,6 +12,7 @@
  * GNU General Public License for more details.
  */
 
+#include <linux/iomap.h>
 #include <linux/fiemap.h>
 
 #include "ext4_jbd2.h"
@@ -1827,6 +1828,38 @@ int ext4_destroy_inline_data(handle_t *handle, struct inode *inode)
 	return ret;
 }
 
+int ext4_inline_data_iomap(struct inode *inode, struct iomap *iomap)
+{
+	__u64 physical;
+	int error = -ENOENT;
+	struct ext4_iloc iloc;
+
+	down_read(&EXT4_I(inode)->xattr_sem);
+	if (!ext4_has_inline_data(inode))
+		goto out;
+
+	error = ext4_get_inode_loc(inode, &iloc);
+	if (error)
+		goto out;
+
+	physical = (__u64)iloc.bh->b_blocknr << inode->i_sb->s_blocksize_bits;
+	physical += (char *)ext4_raw_inode(&iloc) - iloc.bh->b_data;
+	physical += offsetof(struct ext4_inode, i_block);
+
+	brelse(iloc.bh);
+
+	iomap->physical = physical;
+	iomap->offset = 0;
+	iomap->length = min_t(loff_t, ext4_get_inline_size(inode),
+			      i_size_read(inode));
+	iomap->type = IOMAP_MAPPED;
+	iomap->flags = IOMAP_F_INLINE;
+
+out:
+	up_read(&EXT4_I(inode)->xattr_sem);
+	return error;
+}
+
 int ext4_inline_data_fiemap(struct inode *inode,
 			    struct fiemap_extent_info *fieinfo,
 			    int *has_inline, __u64 start, __u64 len)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 82f3f7d..e2b0a8a 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3361,8 +3361,13 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	bool delalloc = false;
 	int ret;
 
-	if (WARN_ON_ONCE(ext4_has_inline_data(inode)))
-		return -ERANGE;
+	if (ext4_has_inline_data(inode)) {
+		if (WARN_ON_ONCE(!(flags & IOMAP_REPORT)))
+			return -ERANGE;
+		if (!ext4_inline_data_iomap(inode, iomap) &&
+		    offset < iomap->length)
+			return 0;
+	}
 
 	map.m_lblk = first_block;
 	map.m_len = last_block - first_block + 1;
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 1/3] ext4: Add missing locking around iomap_seek_{hole,data}
  2017-07-07 21:27     ` [PATCH v4 1/3] ext4: Add missing locking around iomap_seek_{hole,data} Andreas Gruenbacher
@ 2017-07-12  9:17       ` Christoph Hellwig
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2017-07-12  9:17 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, Jan Kara, linux-fsdevel, linux-xfs, linux-ext4

On Fri, Jul 07, 2017 at 11:27:59PM +0200, Andreas Gruenbacher wrote:
> Add back the inode locking that "ext4: Switch to iomap for SEEK_HOLE /
> SEEK_DATA" blew away.  We can use inode_lock_shared instead of
> inode_lock instead, though.

should be folded into that patch.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 2/3] iomap: Switch from blkno to physical offset
  2017-07-07 21:28     ` [PATCH v4 2/3] iomap: Switch from blkno to physical offset Andreas Gruenbacher
@ 2017-07-12  9:20       ` Christoph Hellwig
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2017-07-12  9:20 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, Jan Kara, linux-fsdevel, linux-xfs, linux-ext4

Maybe diskaddr or just addr instead of physical which sounds a little
confusing?

Also how well was this tested?  With all the calculation I'm a little
worried about overflows, so a full xfstests run including sub-page
blocksize and DAX really is in order here.

> +#define IOMAP_F_INLINE	0x40	/* data mixed with metadata */

data inline in the inode?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 3/3] ext4: Add IOMAP_REPORT support for inline data
  2017-07-07 21:28     ` [PATCH v4 3/3] ext4: Add IOMAP_REPORT support for inline data Andreas Gruenbacher
@ 2017-07-25 12:16       ` Jan Kara
  2017-07-25 12:19         ` Andreas Gruenbacher
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2017-07-25 12:16 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, Jan Kara, linux-fsdevel, linux-xfs, linux-ext4

On Fri 07-07-17 23:28:01, Andreas Gruenbacher wrote:
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 82f3f7d..e2b0a8a 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3361,8 +3361,13 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>  	bool delalloc = false;
>  	int ret;
>  
> -	if (WARN_ON_ONCE(ext4_has_inline_data(inode)))
> -		return -ERANGE;
> +	if (ext4_has_inline_data(inode)) {
> +		if (WARN_ON_ONCE(!(flags & IOMAP_REPORT)))
> +			return -ERANGE;
> +		if (!ext4_inline_data_iomap(inode, iomap) &&
> +		    offset < iomap->length)

Hum, what's the thinking behind this "offset < iomap->length" check? If it
fails, we'd just fall through to the normal case which I'm not sure is
guaranteed to be safe? Shouldn't we return error instead?

								Honza

> +			return 0;
> +	}
>  
>  	map.m_lblk = first_block;
>  	map.m_len = last_block - first_block + 1;
> -- 
> 2.7.5
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 3/3] ext4: Add IOMAP_REPORT support for inline data
  2017-07-25 12:16       ` Jan Kara
@ 2017-07-25 12:19         ` Andreas Gruenbacher
  0 siblings, 0 replies; 28+ messages in thread
From: Andreas Gruenbacher @ 2017-07-25 12:19 UTC (permalink / raw)
  To: Jan Kara; +Cc: Christoph Hellwig, linux-fsdevel, linux-xfs, linux-ext4

On Tue, Jul 25, 2017 at 2:16 PM, Jan Kara <jack@suse.cz> wrote:
> On Fri 07-07-17 23:28:01, Andreas Gruenbacher wrote:
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 82f3f7d..e2b0a8a 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -3361,8 +3361,13 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>>       bool delalloc = false;
>>       int ret;
>>
>> -     if (WARN_ON_ONCE(ext4_has_inline_data(inode)))
>> -             return -ERANGE;
>> +     if (ext4_has_inline_data(inode)) {
>> +             if (WARN_ON_ONCE(!(flags & IOMAP_REPORT)))
>> +                     return -ERANGE;
>> +             if (!ext4_inline_data_iomap(inode, iomap) &&
>> +                 offset < iomap->length)
>
> Hum, what's the thinking behind this "offset < iomap->length" check? If it
> fails, we'd just fall through to the normal case which I'm not sure is
> guaranteed to be safe? Shouldn't we return error instead?

Indeed. I'll send out an updated patch queue including this and
several other changes shortly.

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/5] ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA
  2017-06-29 13:54 ` [PATCH 5/5] ext4: " Christoph Hellwig
  2017-06-30 11:51   ` Andreas Gruenbacher
@ 2017-07-25 12:45   ` Jan Kara
  2017-08-29 13:46     ` Andreas Gruenbacher
  1 sibling, 1 reply; 28+ messages in thread
From: Jan Kara @ 2017-07-25 12:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Gruenbacher, Jan Kara, linux-fsdevel, linux-xfs, linux-ext4

On Thu 29-06-17 06:54:20, Christoph Hellwig wrote:
> @@ -3359,6 +3358,7 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>  	unsigned long first_block = offset >> blkbits;
>  	unsigned long last_block = (offset + length - 1) >> blkbits;
>  	struct ext4_map_blocks map;
> +	bool delalloc = false;
>  	int ret;
>  
>  	if (WARN_ON_ONCE(ext4_has_inline_data(inode)))
> @@ -3369,6 +3369,27 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>  
>  	if (!(flags & IOMAP_WRITE)) {
>  		ret = ext4_map_blocks(NULL, inode, &map, 0);
> +		if (!ret) {
> +			struct extent_status es = {};
> +
> +			ext4_es_find_delayed_extent_range(inode, map.m_lblk,
> +					map.m_lblk + map.m_len - 1, &es);
> +			/* Is delalloc data before next block in extent tree? */
> +			if (es.es_len && es.es_lblk < map.m_lblk + map.m_len) {
> +				ext4_lblk_t offset = 0;
> +
> +				if (es.es_lblk < map.m_lblk)
> +					offset = map.m_lblk - es.es_lblk;
> +				map.m_lblk = es.es_lblk + offset;
> +				map.m_pblk = ext4_es_pblock(&es) + offset;

This is wrong for delalloc extent - adding offset to some magic block
value...

> +				map.m_len = es.es_len - offset;
> +				if (ext4_es_status(&es) & EXTENT_STATUS_UNWRITTEN)
> +					map.m_flags |= EXT4_MAP_UNWRITTEN;
> +				if (ext4_es_status(&es) & EXTENT_STATUS_DELAYED)
> +					delalloc = true;
> +				ret = 1;
> +			}
> +		}

Also the delalloc handling seems to be wrong that if we have file as:

HD

where H is hole, D is delalloc block, and iomap is called at offset 0, we
should be getting back "hole at 0" back but instead you will return
"delalloc at 1".

We deal with a similar situation in ext4_da_map_blocks() so it would be
good if we could reuse that one rather than reimplementing it here. But
factoring the necessary functionality out isn't that easy.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/5] ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA
  2017-07-25 12:45   ` [PATCH 5/5] ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA Jan Kara
@ 2017-08-29 13:46     ` Andreas Gruenbacher
  0 siblings, 0 replies; 28+ messages in thread
From: Andreas Gruenbacher @ 2017-08-29 13:46 UTC (permalink / raw)
  To: Jan Kara; +Cc: Christoph Hellwig, linux-fsdevel, linux-xfs, linux-ext4

Jan,

On Tue, Jul 25, 2017 at 2:45 PM, Jan Kara <jack@suse.cz> wrote:
> On Thu 29-06-17 06:54:20, Christoph Hellwig wrote:
>> @@ -3359,6 +3358,7 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>>       unsigned long first_block = offset >> blkbits;
>>       unsigned long last_block = (offset + length - 1) >> blkbits;
>>       struct ext4_map_blocks map;
>> +     bool delalloc = false;
>>       int ret;
>>
>>       if (WARN_ON_ONCE(ext4_has_inline_data(inode)))
>> @@ -3369,6 +3369,27 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>>
>>       if (!(flags & IOMAP_WRITE)) {
>>               ret = ext4_map_blocks(NULL, inode, &map, 0);
>> +             if (!ret) {
>> +                     struct extent_status es = {};
>> +
>> +                     ext4_es_find_delayed_extent_range(inode, map.m_lblk,
>> +                                     map.m_lblk + map.m_len - 1, &es);
>> +                     /* Is delalloc data before next block in extent tree? */
>> +                     if (es.es_len && es.es_lblk < map.m_lblk + map.m_len) {
>> +                             ext4_lblk_t offset = 0;
>> +
>> +                             if (es.es_lblk < map.m_lblk)
>> +                                     offset = map.m_lblk - es.es_lblk;
>> +                             map.m_lblk = es.es_lblk + offset;
>> +                             map.m_pblk = ext4_es_pblock(&es) + offset;
>
> This is wrong for delalloc extent - adding offset to some magic block
> value...

Right.

>> +                             map.m_len = es.es_len - offset;
>> +                             if (ext4_es_status(&es) & EXTENT_STATUS_UNWRITTEN)
>> +                                     map.m_flags |= EXT4_MAP_UNWRITTEN;
>> +                             if (ext4_es_status(&es) & EXTENT_STATUS_DELAYED)
>> +                                     delalloc = true;
>> +                             ret = 1;
>> +                     }
>> +             }
>
> Also the delalloc handling seems to be wrong that if we have file as:
>
> HD
>
> where H is hole, D is delalloc block, and iomap is called at offset 0, we
> should be getting back "hole at 0" back but instead you will return
> "delalloc at 1".
>
> We deal with a similar situation in ext4_da_map_blocks() so it would be
> good if we could reuse that one rather than reimplementing it here. But
> factoring the necessary functionality out isn't that easy.

I've tried to figure out where the issue might be, and I really don't see any.

It may be confusing that ext4_iomap_begin doesn't split up
IOMAP_UNWRITTEN maps into holes and data as ext4_find_unwritten_pgoff
did. This splitting is instead done generically in
iomap_seek_{hole,data} for IOMAP_UNWRITTEN maps using
page_cache_seek_hole_data.

I'll post an improved version of this patch queue shortly.

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2017-08-29 13:46 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-29 13:54 lseek SEEK_HOLE / SEEK_DATA fixes and switch to iomap V3.2-hch Christoph Hellwig
2017-06-29 13:54 ` [PATCH 1/5] xfs: remove a whitespace-only line from xfs_fs_get_nextdqblk Christoph Hellwig
2017-07-01  2:45   ` Darrick J. Wong
2017-06-29 13:54 ` [PATCH 2/5] vfs: Add page_cache_seek_hole_data helper Christoph Hellwig
2017-06-29 13:54 ` [PATCH 3/5] vfs: Add iomap_seek_hole and iomap_seek_data helpers Christoph Hellwig
2017-06-29 13:54 ` [PATCH 4/5] xfs: Switch to iomap for SEEK_HOLE / SEEK_DATA Christoph Hellwig
2017-06-29 13:54 ` [PATCH 5/5] ext4: " Christoph Hellwig
2017-06-30 11:51   ` Andreas Gruenbacher
2017-06-30 12:11     ` Andreas Gruenbacher
2017-06-30 17:37     ` Christoph Hellwig
2017-07-01  7:03       ` Darrick J. Wong
2017-07-02 15:24         ` Christoph Hellwig
2017-07-03 15:03       ` Andreas Gruenbacher
2017-07-03 16:21         ` Darrick J. Wong
2017-07-03 22:58           ` Dave Chinner
2017-07-07 21:27     ` Andreas Gruenbacher
2017-07-07 21:27     ` [PATCH v4 1/3] ext4: Add missing locking around iomap_seek_{hole,data} Andreas Gruenbacher
2017-07-12  9:17       ` Christoph Hellwig
2017-07-07 21:28     ` [PATCH v4 2/3] iomap: Switch from blkno to physical offset Andreas Gruenbacher
2017-07-12  9:20       ` Christoph Hellwig
2017-07-07 21:28     ` [PATCH v4 3/3] ext4: Add IOMAP_REPORT support for inline data Andreas Gruenbacher
2017-07-25 12:16       ` Jan Kara
2017-07-25 12:19         ` Andreas Gruenbacher
2017-07-25 12:45   ` [PATCH 5/5] ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA Jan Kara
2017-08-29 13:46     ` Andreas Gruenbacher
2017-06-29 18:47 ` lseek SEEK_HOLE / SEEK_DATA fixes and switch to iomap V3.2-hch Darrick J. Wong
2017-06-29 18:53   ` Christoph Hellwig
2017-07-03 15:11     ` Andreas Gruenbacher

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.