All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support
@ 2024-03-28 20:39 Stefan Hajnoczi
  2024-03-28 20:39 ` [RFC 1/9] " Stefan Hajnoczi
                   ` (10 more replies)
  0 siblings, 11 replies; 40+ messages in thread
From: Stefan Hajnoczi @ 2024-03-28 20:39 UTC (permalink / raw)
  To: linux-block
  Cc: linux-kernel, eblake, Alasdair Kergon, Mikulas Patocka, dm-devel,
	David Teigland, Mike Snitzer, Jens Axboe, Christoph Hellwig,
	Joe Thornber, Stefan Hajnoczi

cp(1) and backup tools use llseek(SEEK_HOLE/SEEK_DATA) to skip holes in files.
This can speed up the process by reducing the amount of data read and it
preserves sparseness when writing to the output file.

This patch series is an initial attempt at implementing
llseek(SEEK_HOLE/SEEK_DATA) for block devices. I'm looking for feedback on this
approach and suggestions for resolving the open issues.

In the block device world there are similar concepts to holes:
- SCSI has Logical Block Provisioning where the "mapped" state would be
  considered data and other states would be considered holes.
- NBD has NBD_CMD_BLOCK_STATUS for querying whether blocks are present.
- Linux loop block devices and dm-linear targets can pass through queries to
  the backing file.
- dm-thin targets can query metadata to find holes.
- ...and you may be able to think of more examples.

Therefore it is possible to offer this functionality in block drivers.

In my use case a QEMU process in userspace copies the contents of a dm-thin
target. QEMU already uses SEEK_HOLE but that doesn't work on dm-thin targets
without this patch series.

Others have also wished for block device support for SEEK_HOLE. Here is an open
issue from the BorgBackup project:
https://github.com/borgbackup/borg/issues/5609

With these patches userspace can identify holes in loop, dm-linear, and dm-thin
devices. This is done by adding a seek_hole_data() callback to struct
block_device_operations. When the callback is NULL the entire device is
considered data. Device-mapper is extended along the same lines so that targets
can provide seek_hole_data() callbacks.

I'm unfamiliar with much of this code and have probably missed locking
requirements. Since llseek() executes synchronously like ioctl() and is not an
asynchronous I/O request it's possible that my changes to the loop block driver
and dm-thin are broken (e.g. what if the loop device fd is changed during
llseek()?).

To run the tests:

  # make TARGETS=block_seek_hole -C tools/testing/selftests run_tests

The code is also available here:
https://gitlab.com/stefanha/linux/-/tree/block-seek-hole

Please take a look and let me know your thoughts. Thanks!

Stefan Hajnoczi (9):
  block: add llseek(SEEK_HOLE/SEEK_DATA) support
  loop: add llseek(SEEK_HOLE/SEEK_DATA) support
  selftests: block_seek_hole: add loop block driver tests
  dm: add llseek(SEEK_HOLE/SEEK_DATA) support
  selftests: block_seek_hole: add dm-zero test
  dm-linear: add llseek(SEEK_HOLE/SEEK_DATA) support
  selftests: block_seek_hole: add dm-linear test
  dm thin: add llseek(SEEK_HOLE/SEEK_DATA) support
  selftests: block_seek_hole: add dm-thin test

 tools/testing/selftests/Makefile              |   1 +
 .../selftests/block_seek_hole/Makefile        |  17 +++
 include/linux/blkdev.h                        |   7 ++
 include/linux/device-mapper.h                 |   5 +
 block/fops.c                                  |  43 ++++++-
 drivers/block/loop.c                          |  36 +++++-
 drivers/md/dm-linear.c                        |  25 ++++
 drivers/md/dm-thin.c                          |  77 ++++++++++++
 drivers/md/dm.c                               |  68 ++++++++++
 .../testing/selftests/block_seek_hole/config  |   3 +
 .../selftests/block_seek_hole/dm_thin.sh      |  80 ++++++++++++
 .../selftests/block_seek_hole/dm_zero.sh      |  31 +++++
 .../selftests/block_seek_hole/map_holes.py    |  37 ++++++
 .../testing/selftests/block_seek_hole/test.py | 117 ++++++++++++++++++
 14 files changed, 540 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/block_seek_hole/Makefile
 create mode 100644 tools/testing/selftests/block_seek_hole/config
 create mode 100755 tools/testing/selftests/block_seek_hole/dm_thin.sh
 create mode 100755 tools/testing/selftests/block_seek_hole/dm_zero.sh
 create mode 100755 tools/testing/selftests/block_seek_hole/map_holes.py
 create mode 100755 tools/testing/selftests/block_seek_hole/test.py

-- 
2.44.0


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [RFC 1/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-03-28 20:39 [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
@ 2024-03-28 20:39 ` Stefan Hajnoczi
  2024-03-28 23:50   ` Eric Blake
  2024-03-28 20:39 ` [RFC 2/9] loop: " Stefan Hajnoczi
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 40+ messages in thread
From: Stefan Hajnoczi @ 2024-03-28 20:39 UTC (permalink / raw)
  To: linux-block
  Cc: linux-kernel, eblake, Alasdair Kergon, Mikulas Patocka, dm-devel,
	David Teigland, Mike Snitzer, Jens Axboe, Christoph Hellwig,
	Joe Thornber, Stefan Hajnoczi

The SEEK_HOLE/SEEK_DATA interface is used by userspace applications to
detect sparseness. This makes copying and backup applications faster and
reduces space consumption because only ranges that do not contain data
can be skipped.

Handle SEEK_HOLE/SEEK_DATA for block devices. No block drivers implement
the new callback yet so the entire block device will appear to contain
data. Later patches will add support to drivers so this actually becomes
useful.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/linux/blkdev.h |  7 +++++++
 block/fops.c           | 43 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c3e8f7cf96be9..eecfbf9c27fc4 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -332,6 +332,9 @@ int blkdev_zone_mgmt(struct block_device *bdev, enum req_op op,
 int blk_revalidate_disk_zones(struct gendisk *disk,
 		void (*update_driver_data)(struct gendisk *disk));
 
+loff_t blkdev_seek_hole_data(struct block_device *bdev, loff_t offset,
+		int whence);
+
 /*
  * Independent access ranges: struct blk_independent_access_range describes
  * a range of contiguous sectors that can be accessed using device command
@@ -1432,6 +1435,10 @@ struct block_device_operations {
 	 * driver.
 	 */
 	int (*alternative_gpt_sector)(struct gendisk *disk, sector_t *sector);
+
+	/* Like llseek(SEEK_HOLE/SEEK_DATA). This callback may be NULL. */
+	loff_t (*seek_hole_data)(struct block_device *bdev, loff_t offset,
+			int whence);
 };
 
 #ifdef CONFIG_COMPAT
diff --git a/block/fops.c b/block/fops.c
index 679d9b752fe82..8ffbfec6b4c25 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -523,6 +523,43 @@ const struct address_space_operations def_blk_aops = {
 };
 #endif /* CONFIG_BUFFER_HEAD */
 
+/* Like llseek(SEEK_HOLE/SEEK_DATA) */
+loff_t blkdev_seek_hole_data(struct block_device *bdev, loff_t offset,
+		int whence)
+{
+	const struct block_device_operations *fops = bdev->bd_disk->fops;
+	loff_t size;
+
+	if (fops->seek_hole_data)
+		return fops->seek_hole_data(bdev, offset, whence);
+
+	size = bdev_nr_bytes(bdev);
+
+	switch (whence) {
+	case SEEK_DATA:
+		if ((unsigned long long)offset >= size)
+			return -ENXIO;
+		return offset;
+	case SEEK_HOLE:
+		if ((unsigned long long)offset >= size)
+			return -ENXIO;
+		return size;
+	default:
+		return -EINVAL;
+	}
+}
+
+static loff_t blkdev_llseek_hole_data(struct file *file, loff_t offset,
+		int whence)
+{
+	struct block_device *bdev = file_bdev(file);
+
+	offset = blkdev_seek_hole_data(bdev, offset, whence);
+	if (offset >= 0)
+		offset = vfs_setpos(file, offset, bdev_nr_bytes(bdev));
+	return offset;
+}
+
 /*
  * for a block special file file_inode(file)->i_size is zero
  * so we compute the size by hand (just as in block_read/write above)
@@ -533,7 +570,11 @@ static loff_t blkdev_llseek(struct file *file, loff_t offset, int whence)
 	loff_t retval;
 
 	inode_lock(bd_inode);
-	retval = fixed_size_llseek(file, offset, whence, i_size_read(bd_inode));
+	if (whence == SEEK_HOLE || whence == SEEK_DATA)
+		retval = blkdev_llseek_hole_data(file, offset, whence);
+	else
+		retval = fixed_size_llseek(file, offset, whence,
+					   i_size_read(bd_inode));
 	inode_unlock(bd_inode);
 	return retval;
 }
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC 2/9] loop: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-03-28 20:39 [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
  2024-03-28 20:39 ` [RFC 1/9] " Stefan Hajnoczi
@ 2024-03-28 20:39 ` Stefan Hajnoczi
  2024-03-29  0:00   ` Eric Blake
  2024-03-28 20:39 ` [RFC 3/9] selftests: block_seek_hole: add loop block driver tests Stefan Hajnoczi
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 40+ messages in thread
From: Stefan Hajnoczi @ 2024-03-28 20:39 UTC (permalink / raw)
  To: linux-block
  Cc: linux-kernel, eblake, Alasdair Kergon, Mikulas Patocka, dm-devel,
	David Teigland, Mike Snitzer, Jens Axboe, Christoph Hellwig,
	Joe Thornber, Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
Open issues:
- The file offset is updated on both the blkdev file and the backing
  file. Is there a way to avoid updating the backing file offset so the
  file opened by userspace is not affected?
- Should this run in the worker or use the cgroups?
---
 drivers/block/loop.c | 36 ++++++++++++++++++++++++++++++------
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 28a95fd366fea..6a89375de82e8 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -750,6 +750,29 @@ static void loop_sysfs_exit(struct loop_device *lo)
 				   &loop_attribute_group);
 }
 
+static loff_t lo_seek_hole_data(struct block_device *bdev, loff_t offset,
+		int whence)
+{
+	/* TODO need to activate cgroups or use worker? */
+	/* TODO locking? */
+	struct loop_device *lo = bdev->bd_disk->private_data;
+	struct file *file = lo->lo_backing_file;
+
+	if (lo->lo_offset > 0)
+		offset += lo->lo_offset; /* TODO underflow/overflow? */
+
+	/* TODO backing file offset is modified! */
+	offset = vfs_llseek(file, offset, whence);
+	if (offset < 0)
+		return offset;
+
+	if (lo->lo_offset > 0)
+		offset -= lo->lo_offset; /* TODO underflow/overflow? */
+	if (lo->lo_sizelimit > 0 && offset > lo->lo_sizelimit)
+		offset = lo->lo_sizelimit;
+	return offset;
+}
+
 static void loop_config_discard(struct loop_device *lo,
 		struct queue_limits *lim)
 {
@@ -1751,13 +1774,14 @@ static void lo_free_disk(struct gendisk *disk)
 }
 
 static const struct block_device_operations lo_fops = {
-	.owner =	THIS_MODULE,
-	.release =	lo_release,
-	.ioctl =	lo_ioctl,
+	.owner =		THIS_MODULE,
+	.release =		lo_release,
+	.ioctl =		lo_ioctl,
 #ifdef CONFIG_COMPAT
-	.compat_ioctl =	lo_compat_ioctl,
+	.compat_ioctl =		lo_compat_ioctl,
 #endif
-	.free_disk =	lo_free_disk,
+	.free_disk =		lo_free_disk,
+	.seek_hole_data =	lo_seek_hole_data,
 };
 
 /*
@@ -2140,7 +2164,7 @@ static int loop_control_remove(int idx)
 		pr_warn_once("deleting an unspecified loop device is not supported.\n");
 		return -EINVAL;
 	}
-		
+
 	/* Hide this loop device for serialization. */
 	ret = mutex_lock_killable(&loop_ctl_mutex);
 	if (ret)
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC 3/9] selftests: block_seek_hole: add loop block driver tests
  2024-03-28 20:39 [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
  2024-03-28 20:39 ` [RFC 1/9] " Stefan Hajnoczi
  2024-03-28 20:39 ` [RFC 2/9] loop: " Stefan Hajnoczi
@ 2024-03-28 20:39 ` Stefan Hajnoczi
  2024-03-29  0:11   ` Eric Blake
  2024-03-29 12:38   ` Eric Blake
  2024-03-28 20:39 ` [RFC 4/9] dm: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 40+ messages in thread
From: Stefan Hajnoczi @ 2024-03-28 20:39 UTC (permalink / raw)
  To: linux-block
  Cc: linux-kernel, eblake, Alasdair Kergon, Mikulas Patocka, dm-devel,
	David Teigland, Mike Snitzer, Jens Axboe, Christoph Hellwig,
	Joe Thornber, Stefan Hajnoczi

Run the tests with:

  $ make TARGETS=block_seek_hole -C tools/selftests run_tests

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/testing/selftests/Makefile              |   1 +
 .../selftests/block_seek_hole/Makefile        |  17 +++
 .../testing/selftests/block_seek_hole/config  |   1 +
 .../selftests/block_seek_hole/map_holes.py    |  37 +++++++
 .../testing/selftests/block_seek_hole/test.py | 103 ++++++++++++++++++
 5 files changed, 159 insertions(+)
 create mode 100644 tools/testing/selftests/block_seek_hole/Makefile
 create mode 100644 tools/testing/selftests/block_seek_hole/config
 create mode 100755 tools/testing/selftests/block_seek_hole/map_holes.py
 create mode 100755 tools/testing/selftests/block_seek_hole/test.py

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index e1504833654db..8a21d6031b940 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -2,6 +2,7 @@
 TARGETS += alsa
 TARGETS += amd-pstate
 TARGETS += arm64
+TARGETS += block_seek_hole
 TARGETS += bpf
 TARGETS += breakpoints
 TARGETS += cachestat
diff --git a/tools/testing/selftests/block_seek_hole/Makefile b/tools/testing/selftests/block_seek_hole/Makefile
new file mode 100644
index 0000000000000..3f4bbd52db29f
--- /dev/null
+++ b/tools/testing/selftests/block_seek_hole/Makefile
@@ -0,0 +1,17 @@
+# SPDX-License-Identifier: GPL-2.0-only
+PY3 = $(shell which python3 2>/dev/null)
+
+ifneq ($(PY3),)
+
+TEST_PROGS := test.py
+
+include ../lib.mk
+
+else
+
+all: no_py3_warning
+
+no_py3_warning:
+	@echo "Missing python3. This test will be skipped."
+
+endif
diff --git a/tools/testing/selftests/block_seek_hole/config b/tools/testing/selftests/block_seek_hole/config
new file mode 100644
index 0000000000000..72437e0c0fc1c
--- /dev/null
+++ b/tools/testing/selftests/block_seek_hole/config
@@ -0,0 +1 @@
+CONFIG_BLK_DEV_LOOP=m
diff --git a/tools/testing/selftests/block_seek_hole/map_holes.py b/tools/testing/selftests/block_seek_hole/map_holes.py
new file mode 100755
index 0000000000000..9477ec5d69d3a
--- /dev/null
+++ b/tools/testing/selftests/block_seek_hole/map_holes.py
@@ -0,0 +1,37 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# map_holes.py <filename>
+#
+# Print the holes and data ranges in a file.
+
+import errno
+import os
+import sys
+
+def map_holes(fd):
+    end = os.lseek(fd, 0, os.SEEK_END)
+    offset = 0
+
+    print('TYPE START END SIZE')
+
+    while offset < end:
+        contents = 'DATA'
+        new_offset = os.lseek(fd, offset, os.SEEK_HOLE)
+        if new_offset == offset:
+            contents = 'HOLE'
+            try:
+              new_offset = os.lseek(fd, offset, os.SEEK_DATA)
+            except OSError as err:
+                if err.errno == errno.ENXIO:
+                    new_offset = end
+                else:
+                    raise err
+            assert new_offset != offset
+        print(f'{contents} {offset} {new_offset} {new_offset - offset}')
+        offset = new_offset
+
+if __name__ == '__main__':
+    with open(sys.argv[1], 'rb') as f:
+        fd = f.fileno()
+        map_holes(fd)
diff --git a/tools/testing/selftests/block_seek_hole/test.py b/tools/testing/selftests/block_seek_hole/test.py
new file mode 100755
index 0000000000000..4f7c2d01ab3d3
--- /dev/null
+++ b/tools/testing/selftests/block_seek_hole/test.py
@@ -0,0 +1,103 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# test.py
+#
+# Test SEEK_HOLE/SEEK_DATA support in block drivers
+
+import os
+import subprocess
+import sys
+from contextlib import contextmanager
+
+KB = 1024
+MB = 1024 * KB
+
+def run(args):
+    try:
+        cmd = subprocess.run(args, check=True, capture_output=True)
+    except subprocess.CalledProcessError as e:
+        print(e)
+        print(e.stderr.decode('utf-8').strip())
+        sys.exit(1)
+    return cmd
+
+@contextmanager
+def test_file(layout_fn, prefix='test'):
+    '''A context manager that creates a test file and produces its path'''
+    path = f'{prefix}-{os.getpid()}'
+    with open(path, 'w+b') as f:
+        layout_fn(f)
+
+    try:
+        yield path
+    finally:
+        os.unlink(path)
+
+@contextmanager
+def loop_device(file_path):
+    '''A context manager that attaches a loop device for a given file and produces the path of the loop device'''
+    cmd = run(['losetup', '--show', '-f', file_path])
+    loop_path = os.fsdecode(cmd.stdout.strip())
+
+    try:
+        yield loop_path
+    finally:
+        run(['losetup', '-d', loop_path])
+
+def test(layout, dev_context_manager):
+    with test_file(layout) as file_path, dev_context_manager(file_path) as dev_path:
+        cmd = run(['./map_holes.py', file_path])
+        file_output = cmd.stdout.decode('utf-8').strip()
+
+        cmd = run(['./map_holes.py', dev_path])
+        dev_output = cmd.stdout.decode('utf-8').strip()
+
+        if file_output != dev_output:
+            print(f'FAIL {dev_context_manager.__name__} {layout.__name__}')
+            print('File output:')
+            print(file_output)
+            print('Does not match device output:')
+            print(dev_output)
+            sys.exit(1)
+
+def test_all(layouts, dev_context_managers):
+    for dev_context_manager in dev_context_managers:
+        for layout in layouts:
+            test(layout, dev_context_manager)
+
+# Different data layouts to test
+
+def data_at_beginning_and_end(f):
+    f.write(b'A' * 4 * KB)
+    f.seek(256 * MB)
+
+    f.write(b'B' * 64 * KB)
+
+    f.seek(1024 * MB - KB)
+    f.write(b'C' * KB)
+
+def holes_at_beginning_and_end(f):
+    f.seek(128 * MB)
+    f.write(b'A' * 4 * KB)
+
+    f.seek(512 * MB)
+    f.write(b'B' * 64 * KB)
+
+    f.truncate(1024 * MB)
+
+def no_holes(f):
+    # Just 1 MB so test file generation is quick
+    mb = b'A' * MB
+    f.write(mb)
+
+def empty_file(f):
+    f.truncate(1024 * MB)
+
+if __name__ == '__main__':
+    layouts = [data_at_beginning_and_end,
+               holes_at_beginning_and_end,
+               no_holes,
+               empty_file]
+    dev_context_managers = [loop_device]
+    test_all(layouts, dev_context_managers)
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC 4/9] dm: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-03-28 20:39 [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2024-03-28 20:39 ` [RFC 3/9] selftests: block_seek_hole: add loop block driver tests Stefan Hajnoczi
@ 2024-03-28 20:39 ` Stefan Hajnoczi
  2024-03-29  0:38   ` Eric Blake
  2024-03-28 20:39 ` [RFC 5/9] selftests: block_seek_hole: add dm-zero test Stefan Hajnoczi
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 40+ messages in thread
From: Stefan Hajnoczi @ 2024-03-28 20:39 UTC (permalink / raw)
  To: linux-block
  Cc: linux-kernel, eblake, Alasdair Kergon, Mikulas Patocka, dm-devel,
	David Teigland, Mike Snitzer, Jens Axboe, Christoph Hellwig,
	Joe Thornber, Stefan Hajnoczi

Delegate SEEK_HOLE/SEEK_DATA to device-mapper targets. The new
dm_seek_hole_data() callback allows target types to customize behavior.
The default implementation treats the target as all data with no holes.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/linux/device-mapper.h |  5 +++
 drivers/md/dm.c               | 68 +++++++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+)

diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 82b2195efaca7..e89ebaab6507a 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -161,6 +161,10 @@ typedef int (*dm_dax_zero_page_range_fn)(struct dm_target *ti, pgoff_t pgoff,
 typedef size_t (*dm_dax_recovery_write_fn)(struct dm_target *ti, pgoff_t pgoff,
 		void *addr, size_t bytes, struct iov_iter *i);
 
+/* Like llseek(SEEK_HOLE/SEEK_DATA) */
+typedef loff_t (*dm_seek_hole_data)(struct dm_target *ti, loff_t offset,
+		int whence);
+
 void dm_error(const char *message);
 
 struct dm_dev {
@@ -210,6 +214,7 @@ struct target_type {
 	dm_dax_direct_access_fn direct_access;
 	dm_dax_zero_page_range_fn dax_zero_page_range;
 	dm_dax_recovery_write_fn dax_recovery_write;
+	dm_seek_hole_data seek_hole_data;
 
 	/* For internal device-mapper use. */
 	struct list_head list;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 56aa2a8b9d715..3c921bdbd17fc 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -3167,6 +3167,72 @@ void dm_free_md_mempools(struct dm_md_mempools *pools)
 	kfree(pools);
 }
 
+/* Default implementation for targets that do not implement the callback */
+static loff_t dm_blk_seek_hole_data_default(loff_t offset, int whence,
+		loff_t size)
+{
+	switch (whence) {
+	case SEEK_DATA:
+		if ((unsigned long long)offset >= size)
+			return -ENXIO;
+		return offset;
+	case SEEK_HOLE:
+		if ((unsigned long long)offset >= size)
+			return -ENXIO;
+		return size;
+	default:
+		return -EINVAL;
+	}
+}
+
+static loff_t dm_blk_do_seek_hole_data(struct dm_table *table, loff_t offset,
+		int whence)
+{
+	struct dm_target *ti;
+	loff_t end;
+
+	/* Loop when the end of a target is reached */
+	do {
+		ti = dm_table_find_target(table, offset >> SECTOR_SHIFT);
+		if (!ti)
+			return whence == SEEK_DATA ? -ENXIO : offset;
+
+		end = (ti->begin + ti->len) << SECTOR_SHIFT;
+
+		if (ti->type->seek_hole_data)
+			offset = ti->type->seek_hole_data(ti, offset, whence);
+		else
+			offset = dm_blk_seek_hole_data_default(offset, whence, end);
+
+		if (whence == SEEK_DATA && offset == -ENXIO)
+			offset = end;
+	} while (offset == end);
+
+	return offset;
+}
+
+static loff_t dm_blk_seek_hole_data(struct block_device *bdev, loff_t offset,
+		int whence)
+{
+	struct mapped_device *md = bdev->bd_disk->private_data;
+	struct dm_table *table;
+	int srcu_idx;
+	loff_t ret;
+
+	if (dm_suspended_md(md))
+		return -EAGAIN;
+
+	table = dm_get_live_table(md, &srcu_idx);
+	if (!table)
+		return -EIO;
+
+	ret = dm_blk_do_seek_hole_data(table, offset, whence);
+
+	dm_put_live_table(md, srcu_idx);
+
+	return ret;
+}
+
 struct dm_pr {
 	u64	old_key;
 	u64	new_key;
@@ -3493,6 +3559,7 @@ static const struct block_device_operations dm_blk_dops = {
 	.getgeo = dm_blk_getgeo,
 	.report_zones = dm_blk_report_zones,
 	.pr_ops = &dm_pr_ops,
+	.seek_hole_data = dm_blk_seek_hole_data,
 	.owner = THIS_MODULE
 };
 
@@ -3502,6 +3569,7 @@ static const struct block_device_operations dm_rq_blk_dops = {
 	.ioctl = dm_blk_ioctl,
 	.getgeo = dm_blk_getgeo,
 	.pr_ops = &dm_pr_ops,
+	.seek_hole_data = dm_blk_seek_hole_data,
 	.owner = THIS_MODULE
 };
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC 5/9] selftests: block_seek_hole: add dm-zero test
  2024-03-28 20:39 [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2024-03-28 20:39 ` [RFC 4/9] dm: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
@ 2024-03-28 20:39 ` Stefan Hajnoczi
  2024-03-28 22:19   ` Eric Blake
  2024-03-28 20:39 ` [RFC 6/9] dm-linear: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 40+ messages in thread
From: Stefan Hajnoczi @ 2024-03-28 20:39 UTC (permalink / raw)
  To: linux-block
  Cc: linux-kernel, eblake, Alasdair Kergon, Mikulas Patocka, dm-devel,
	David Teigland, Mike Snitzer, Jens Axboe, Christoph Hellwig,
	Joe Thornber, Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 .../selftests/block_seek_hole/Makefile        |  2 +-
 .../testing/selftests/block_seek_hole/config  |  2 ++
 .../selftests/block_seek_hole/dm_zero.sh      | 31 +++++++++++++++++++
 3 files changed, 34 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/block_seek_hole/dm_zero.sh

diff --git a/tools/testing/selftests/block_seek_hole/Makefile b/tools/testing/selftests/block_seek_hole/Makefile
index 3f4bbd52db29f..1bd9e748b2acc 100644
--- a/tools/testing/selftests/block_seek_hole/Makefile
+++ b/tools/testing/selftests/block_seek_hole/Makefile
@@ -3,7 +3,7 @@ PY3 = $(shell which python3 2>/dev/null)
 
 ifneq ($(PY3),)
 
-TEST_PROGS := test.py
+TEST_PROGS := test.py dm_zero.sh
 
 include ../lib.mk
 
diff --git a/tools/testing/selftests/block_seek_hole/config b/tools/testing/selftests/block_seek_hole/config
index 72437e0c0fc1c..bfd59f1d98769 100644
--- a/tools/testing/selftests/block_seek_hole/config
+++ b/tools/testing/selftests/block_seek_hole/config
@@ -1 +1,3 @@
 CONFIG_BLK_DEV_LOOP=m
+CONFIG_BLK_DEV_DM=m
+CONFIG_DM_ZERO=m
diff --git a/tools/testing/selftests/block_seek_hole/dm_zero.sh b/tools/testing/selftests/block_seek_hole/dm_zero.sh
new file mode 100755
index 0000000000000..20836a566fcc8
--- /dev/null
+++ b/tools/testing/selftests/block_seek_hole/dm_zero.sh
@@ -0,0 +1,31 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# dm_zero.sh
+#
+# Test that dm-zero reports data because it does not have a custom
+# SEEK_HOLE/SEEK_DATA implementation.
+
+set -e
+
+dev_name=test-$$
+size=$((1024 * 1024 * 1024 / 512)) # 1 GB
+
+cleanup() {
+	dmsetup remove $dev_name
+}
+trap cleanup EXIT
+
+dmsetup create $dev_name --table "0 $size zero"
+
+output=$(./map_holes.py /dev/mapper/$dev_name)
+expected='TYPE START END SIZE
+DATA 0 1073741824 1073741824'
+
+if [ "$output" != "$expected" ]; then
+	echo 'FAIL expected:'
+	echo "$expected"
+	echo 'Does not match device output:'
+	echo "$output"
+	exit 1
+fi
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC 6/9] dm-linear: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-03-28 20:39 [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
                   ` (4 preceding siblings ...)
  2024-03-28 20:39 ` [RFC 5/9] selftests: block_seek_hole: add dm-zero test Stefan Hajnoczi
@ 2024-03-28 20:39 ` Stefan Hajnoczi
  2024-03-29  0:54   ` Eric Blake
  2024-03-31  7:35   ` kernel test robot
  2024-03-28 20:39 ` [RFC 7/9] selftests: block_seek_hole: add dm-linear test Stefan Hajnoczi
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 40+ messages in thread
From: Stefan Hajnoczi @ 2024-03-28 20:39 UTC (permalink / raw)
  To: linux-block
  Cc: linux-kernel, eblake, Alasdair Kergon, Mikulas Patocka, dm-devel,
	David Teigland, Mike Snitzer, Jens Axboe, Christoph Hellwig,
	Joe Thornber, Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 drivers/md/dm-linear.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 2d3e186ca87e3..9b6cdfa4f951d 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -147,6 +147,30 @@ static int linear_report_zones(struct dm_target *ti,
 #define linear_report_zones NULL
 #endif
 
+static loff_t linear_seek_hole_data(struct dm_target *ti, loff_t offset,
+		int whence)
+{
+	struct linear_c *lc = ti->private;
+	loff_t ti_begin = ti->begin << SECTOR_SHIFT;
+	loff_t ti_len = ti->len << SECTOR_SHIFT;
+	loff_t bdev_start = lc->start << SECTOR_SHIFT;
+	loff_t bdev_offset;
+
+	/* TODO underflow/overflow? */
+	bdev_offset = offset - ti_begin + bdev_start;
+
+	bdev_offset = blkdev_seek_hole_data(lc->dev->bdev, bdev_offset,
+					    whence);
+	if (bdev_offset < 0)
+		return bdev_offset;
+
+	offset = bdev_offset - bdev_start;
+	if (offset >= ti_len)
+		return whence == SEEK_DATA ? -ENXIO : ti_begin + ti_len;
+
+	return offset + ti_begin;
+}
+
 static int linear_iterate_devices(struct dm_target *ti,
 				  iterate_devices_callout_fn fn, void *data)
 {
@@ -212,6 +236,7 @@ static struct target_type linear_target = {
 	.direct_access = linear_dax_direct_access,
 	.dax_zero_page_range = linear_dax_zero_page_range,
 	.dax_recovery_write = linear_dax_recovery_write,
+	.seek_hole_data = linear_seek_hole_data,
 };
 
 int __init dm_linear_init(void)
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC 7/9] selftests: block_seek_hole: add dm-linear test
  2024-03-28 20:39 [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
                   ` (5 preceding siblings ...)
  2024-03-28 20:39 ` [RFC 6/9] dm-linear: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
@ 2024-03-28 20:39 ` Stefan Hajnoczi
  2024-03-29  0:59   ` Eric Blake
  2024-03-28 20:39 ` [RFC 8/9] dm thin: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 40+ messages in thread
From: Stefan Hajnoczi @ 2024-03-28 20:39 UTC (permalink / raw)
  To: linux-block
  Cc: linux-kernel, eblake, Alasdair Kergon, Mikulas Patocka, dm-devel,
	David Teigland, Mike Snitzer, Jens Axboe, Christoph Hellwig,
	Joe Thornber, Stefan Hajnoczi

The dm-linear linear target passes through SEEK_HOLE/SEEK_DATA. Extend
the test case to check that the same holes/data are reported as for the
underlying file.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tools/testing/selftests/block_seek_hole/test.py | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/block_seek_hole/test.py b/tools/testing/selftests/block_seek_hole/test.py
index 4f7c2d01ab3d3..6360b72aee338 100755
--- a/tools/testing/selftests/block_seek_hole/test.py
+++ b/tools/testing/selftests/block_seek_hole/test.py
@@ -45,6 +45,20 @@ def loop_device(file_path):
     finally:
         run(['losetup', '-d', loop_path])
 
+@contextmanager
+def dm_linear(file_path):
+    file_size = os.path.getsize(file_path)
+
+    with loop_device(file_path) as loop_path:
+        dm_name = f'test-{os.getpid()}'
+        run(['dmsetup', 'create', dm_name, '--table',
+             f'0 {file_size // 512} linear {loop_path} 0'])
+
+        try:
+            yield f'/dev/mapper/{dm_name}'
+        finally:
+            run(['dmsetup', 'remove', dm_name])
+
 def test(layout, dev_context_manager):
     with test_file(layout) as file_path, dev_context_manager(file_path) as dev_path:
         cmd = run(['./map_holes.py', file_path])
@@ -99,5 +113,5 @@ if __name__ == '__main__':
                holes_at_beginning_and_end,
                no_holes,
                empty_file]
-    dev_context_managers = [loop_device]
+    dev_context_managers = [loop_device, dm_linear]
     test_all(layouts, dev_context_managers)
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC 8/9] dm thin: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-03-28 20:39 [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
                   ` (6 preceding siblings ...)
  2024-03-28 20:39 ` [RFC 7/9] selftests: block_seek_hole: add dm-linear test Stefan Hajnoczi
@ 2024-03-28 20:39 ` Stefan Hajnoczi
  2024-03-29  1:31   ` Eric Blake
  2024-03-28 20:39 ` [RFC 9/9] selftests: block_seek_hole: add dm-thin test Stefan Hajnoczi
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 40+ messages in thread
From: Stefan Hajnoczi @ 2024-03-28 20:39 UTC (permalink / raw)
  To: linux-block
  Cc: linux-kernel, eblake, Alasdair Kergon, Mikulas Patocka, dm-devel,
	David Teigland, Mike Snitzer, Jens Axboe, Christoph Hellwig,
	Joe Thornber, Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
Open issues:
- Locking?
- thin_seek_hole_data() does not run as a bio or request. This patch
  assumes dm_thin_find_mapped_range() synchronously performs I/O if
  metadata needs to be loaded from disk. Is that a valid assumption?
---
 drivers/md/dm-thin.c | 77 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 4793ad2aa1f7e..3c5dc4f0fe8a3 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -4501,6 +4501,82 @@ static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
 	}
 }
 
+static dm_block_t loff_to_block(struct pool *pool, loff_t offset)
+{
+	sector_t offset_sectors = offset >> SECTOR_SHIFT;
+	dm_block_t ret;
+
+	if (block_size_is_power_of_two(pool))
+		ret = offset_sectors >> pool->sectors_per_block_shift;
+	else {
+		ret = offset_sectors;
+		(void) sector_div(ret, pool->sectors_per_block);
+	}
+
+	return ret;
+}
+
+static loff_t block_to_loff(struct pool *pool, dm_block_t block)
+{
+	return block_to_sectors(pool, block) << SECTOR_SHIFT;
+}
+
+static loff_t thin_seek_hole_data(struct dm_target *ti, loff_t offset,
+		int whence)
+{
+	struct thin_c *tc = ti->private;
+	struct dm_thin_device *td = tc->td;
+	struct pool *pool = tc->pool;
+	dm_block_t begin;
+	dm_block_t end;
+	dm_block_t mapped_begin;
+	dm_block_t mapped_end;
+	dm_block_t pool_begin;
+	bool maybe_shared;
+	int ret;
+
+	/* TODO locking? */
+
+	if (block_size_is_power_of_two(pool))
+		end = ti->len >> pool->sectors_per_block_shift;
+	else {
+		end = ti->len;
+		(void) sector_div(end, pool->sectors_per_block);
+	}
+
+	offset -= ti->begin << SECTOR_SHIFT;
+
+	while (true) {
+		begin = loff_to_block(pool, offset);
+		ret = dm_thin_find_mapped_range(td, begin, end,
+						&mapped_begin, &mapped_end,
+						&pool_begin, &maybe_shared);
+		if (ret == -ENODATA) {
+			if (whence == SEEK_DATA)
+				return -ENXIO;
+			break;
+		} else if (ret < 0) {
+			/* TODO handle EWOULDBLOCK? */
+			return -ENXIO;
+		}
+
+		/* SEEK_DATA finishes here... */
+		if (whence == SEEK_DATA) {
+			if (mapped_begin != begin)
+				offset = block_to_loff(pool, mapped_begin);
+			break;
+		}
+
+		/* ...while SEEK_HOLE may need to look further */
+		if (mapped_begin != begin)
+			break; /* offset is in a hole */
+
+		offset = block_to_loff(pool, mapped_end);
+	}
+
+	return offset + (ti->begin << SECTOR_SHIFT);
+}
+
 static struct target_type thin_target = {
 	.name = "thin",
 	.version = {1, 23, 0},
@@ -4515,6 +4591,7 @@ static struct target_type thin_target = {
 	.status = thin_status,
 	.iterate_devices = thin_iterate_devices,
 	.io_hints = thin_io_hints,
+	.seek_hole_data = thin_seek_hole_data,
 };
 
 /*----------------------------------------------------------------*/
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC 9/9] selftests: block_seek_hole: add dm-thin test
  2024-03-28 20:39 [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
                   ` (7 preceding siblings ...)
  2024-03-28 20:39 ` [RFC 8/9] dm thin: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
@ 2024-03-28 20:39 ` Stefan Hajnoczi
  2024-03-28 22:16 ` [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support Eric Blake
  2024-04-02 12:26 ` Christoph Hellwig
  10 siblings, 0 replies; 40+ messages in thread
From: Stefan Hajnoczi @ 2024-03-28 20:39 UTC (permalink / raw)
  To: linux-block
  Cc: linux-kernel, eblake, Alasdair Kergon, Mikulas Patocka, dm-devel,
	David Teigland, Mike Snitzer, Jens Axboe, Christoph Hellwig,
	Joe Thornber, Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 .../selftests/block_seek_hole/Makefile        |  2 +-
 .../selftests/block_seek_hole/dm_thin.sh      | 80 +++++++++++++++++++
 2 files changed, 81 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/block_seek_hole/dm_thin.sh

diff --git a/tools/testing/selftests/block_seek_hole/Makefile b/tools/testing/selftests/block_seek_hole/Makefile
index 1bd9e748b2acc..3b4ee1b1fb6e7 100644
--- a/tools/testing/selftests/block_seek_hole/Makefile
+++ b/tools/testing/selftests/block_seek_hole/Makefile
@@ -3,7 +3,7 @@ PY3 = $(shell which python3 2>/dev/null)
 
 ifneq ($(PY3),)
 
-TEST_PROGS := test.py dm_zero.sh
+TEST_PROGS := test.py dm_zero.sh dm_thin.sh
 
 include ../lib.mk
 
diff --git a/tools/testing/selftests/block_seek_hole/dm_thin.sh b/tools/testing/selftests/block_seek_hole/dm_thin.sh
new file mode 100755
index 0000000000000..a379b7c875f28
--- /dev/null
+++ b/tools/testing/selftests/block_seek_hole/dm_thin.sh
@@ -0,0 +1,80 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# dm_thin.sh
+#
+# Test that dm-thin supports SEEK_HOLE/SEEK_DATA.
+
+set -e
+
+# check <actual> <expected>
+# Check that the actual output matches the expected output.
+check() {
+	if [ "$1" != "$2" ]; then
+		echo 'FAIL expected:'
+		echo "$2"
+		echo 'Does not match device output:'
+		echo "$1"
+		exit 1
+	fi
+}
+
+cleanup() {
+	if [ -n "$thin_name" ]; then
+		dmsetup remove $thin_name
+	fi
+	if [ -n "$pool_name" ]; then
+		dmsetup remove $pool_name
+	fi
+	if [ -n "$metadata_path" ]; then
+		losetup --detach "$metadata_path"
+	fi
+	if [ -n "$data_path" ]; then
+		losetup --detach "$data_path"
+	fi
+	rm -f pool-metadata pool-data
+}
+trap cleanup EXIT
+
+rm -f pool-metadata pool-data
+truncate -s 256M pool-metadata
+truncate -s 1G pool-data
+
+size_sectors=$((1024 * 1024 * 1024 / 512)) # 1 GB
+metadata_path=$(losetup --show --find pool-metadata)
+data_path=$(losetup --show --find pool-data)
+pool_name=pool-$$
+thin_name=thin-$$
+
+dmsetup create $pool_name \
+	--table "0 $size_sectors thin-pool $metadata_path $data_path 128 $size_sectors"
+dmsetup message /dev/mapper/$pool_name 0 'create_thin 0'
+dmsetup create $thin_name --table "0 $size_sectors thin /dev/mapper/$pool_name 0"
+
+# Verify that the device is empty
+check "$(./map_holes.py /dev/mapper/$thin_name)" 'TYPE START END SIZE
+HOLE 0 1073741824 1073741824'
+
+# Write 4k at offset 128M but dm-thin will actually map an entire 64k block
+dd if=/dev/urandom of=/dev/mapper/$thin_name bs=4k count=1 seek=32768 status=none
+check "$(./map_holes.py /dev/mapper/$thin_name)" 'TYPE START END SIZE
+HOLE 0 134217728 134217728
+DATA 134217728 134283264 65536
+HOLE 134283264 1073741824 939458560'
+
+# Write at the beginning of the device
+dd if=/dev/urandom of=/dev/mapper/$thin_name bs=4k count=1 status=none
+check "$(./map_holes.py /dev/mapper/$thin_name)" 'TYPE START END SIZE
+DATA 0 65536 65536
+HOLE 65536 134217728 134152192
+DATA 134217728 134283264 65536
+HOLE 134283264 1073741824 939458560'
+
+# Write at the end of the device
+dd if=/dev/urandom of=/dev/mapper/$thin_name bs=4k count=1 seek=262143 status=none
+check "$(./map_holes.py /dev/mapper/$thin_name)" 'TYPE START END SIZE
+DATA 0 65536 65536
+HOLE 65536 134217728 134152192
+DATA 134217728 134283264 65536
+HOLE 134283264 1073676288 939393024
+DATA 1073676288 1073741824 65536'
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-03-28 20:39 [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
                   ` (8 preceding siblings ...)
  2024-03-28 20:39 ` [RFC 9/9] selftests: block_seek_hole: add dm-thin test Stefan Hajnoczi
@ 2024-03-28 22:16 ` Eric Blake
  2024-03-28 22:29   ` Eric Blake
  2024-03-28 23:09   ` Stefan Hajnoczi
  2024-04-02 12:26 ` Christoph Hellwig
  10 siblings, 2 replies; 40+ messages in thread
From: Eric Blake @ 2024-03-28 22:16 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-block, linux-kernel, Alasdair Kergon, Mikulas Patocka,
	dm-devel, David Teigland, Mike Snitzer, Jens Axboe,
	Christoph Hellwig, Joe Thornber

On Thu, Mar 28, 2024 at 04:39:01PM -0400, Stefan Hajnoczi wrote:
> cp(1) and backup tools use llseek(SEEK_HOLE/SEEK_DATA) to skip holes in files.

As a minor point of clarity (perhaps as much for my own records for
documenting research I've done over the years, and not necessarily
something you need to change in the commit message):

Userspace apps generally use lseek(2) from glibc or similar (perhaps
via its alias lseek64(), depending on whether userspace is using large
file offsets), rather than directly calling the _llseek() syscall.
But it all boils down to the same notion of seeking information about
various special offsets.

Also, in past history, coreutils cp(1) and dd(1) did experiment with
using FS_IOC_FIEMAP ioctls when SEEK_HOLE was not available, but it
proved to cause more problems than it solved, so that is not currently
in favor.  Yes, we could teach more and more block devices to expose
specific ioctls for querying sparseness boundaries, and then teach
userspace apps a list of ioctls to try; but as cp(1) already learned,
having one common interface is much easier than an ever-growing ioctl
ladder to be copied across every client that would benefit from
knowing where the unallocated portions are.

> This can speed up the process by reducing the amount of data read and it
> preserves sparseness when writing to the output file.
> 
> This patch series is an initial attempt at implementing
> llseek(SEEK_HOLE/SEEK_DATA) for block devices. I'm looking for feedback on this
> approach and suggestions for resolving the open issues.

One of your open issues was whether adjusting the offset of the block
device itself should also adjust the file offset of the underlying
file (at least in the case of loopback and dm-linear).  What would the
community think of adding two additional constants to the entire
family of *seek() functions, that have the effect of returning the
same offset as their SEEK_HOLE/SEEK_DATA counterparts but without
moving the file offset?

Explaining the idea by example, although I'm not stuck on these names:
suppose you have an fd visiting a file description of 2MiB in size,
with the first 1MiB being a hole and the second being data.

#define MiB (1024*1024)
lseek64(fd, MiB, SEEK_SET); // returns MiB, file offset changed to MiB
lseek64(fd, 0, SEEK_HOLE); // returns 0, file offset changed to 0
lseek64(fd, 0, SEEK_DATA); // returns MiB, file offset changed to MiB
lseek64(fd, 0, SEEK_PEEK_HOLE); // returns 0, but file offset left at MiB
lseek64(fd, 0, SEEK_SET); // returns 0, file offset changed to MiB
lseek64(fd, 0, SEEK_PEEK_DATA); // returns MiB, but file offset left at MiB

With semantics like that, it might be easier to implement just
SEEK_PEEK* in devices (don't worry about changing offsets, just about
reporting where the requested offset is), and then have a common layer
do the translation from llseek(...,offs,SEEK_HOLE) into a 2-step
llseek(...,llseek(...,offs,SEEK_PEEK_HOLE),SEEK_SET) if that makes life
easier under the hood.

> 
> In the block device world there are similar concepts to holes:
> - SCSI has Logical Block Provisioning where the "mapped" state would be
>   considered data and other states would be considered holes.

BIG caveat here: the SCSI spec does not necessarily guarantee that
unmapped regions read as all zeroes; compare the difference between
FALLOC_FL_ZERO_RANGE and FALLOC_FL_PUNCH_HOLE.  While lseek(SEEK_HOLE)
on a regular file guarantees that future read() in that hole will see
NUL bytes, I'm not sure whether we want to make that guarantee for
block devices.  This may be yet another case where we might want to
add new SEEK_* constants to the *seek() family of functions that lets
the caller indicate whether they want offsets that are guaranteed to
read as zero, vs. merely offsets that are not allocated but may or may
not read as zero.  Skipping unallocated portions, even when you don't
know if the contents reliably read as zero, is still a useful goal in
some userspace programs.

> - NBD has NBD_CMD_BLOCK_STATUS for querying whether blocks are present.

However, utilizing it in nbd.ko would require teaching the kernel to
handle structured or extended headers (right now, that is an extension
only supported in user-space implementations of the NBD protocol).  I
can see why you did not tackle that in this RFC series, even though
you mention it in the cover letter.

> - Linux loop block devices and dm-linear targets can pass through queries to
>   the backing file.
> - dm-thin targets can query metadata to find holes.
> - ...and you may be able to think of more examples.
> 
> Therefore it is possible to offer this functionality in block drivers.
> 
> In my use case a QEMU process in userspace copies the contents of a dm-thin
> target. QEMU already uses SEEK_HOLE but that doesn't work on dm-thin targets
> without this patch series.
> 
> Others have also wished for block device support for SEEK_HOLE. Here is an open
> issue from the BorgBackup project:
> https://github.com/borgbackup/borg/issues/5609
> 
> With these patches userspace can identify holes in loop, dm-linear, and dm-thin
> devices. This is done by adding a seek_hole_data() callback to struct
> block_device_operations. When the callback is NULL the entire device is
> considered data. Device-mapper is extended along the same lines so that targets
> can provide seek_hole_data() callbacks.
> 
> I'm unfamiliar with much of this code and have probably missed locking
> requirements. Since llseek() executes synchronously like ioctl() and is not an
> asynchronous I/O request it's possible that my changes to the loop block driver
> and dm-thin are broken (e.g. what if the loop device fd is changed during
> llseek()?).
> 
> To run the tests:
> 
>   # make TARGETS=block_seek_hole -C tools/testing/selftests run_tests
> 
> The code is also available here:
> https://gitlab.com/stefanha/linux/-/tree/block-seek-hole
> 
> Please take a look and let me know your thoughts. Thanks!
> 
> Stefan Hajnoczi (9):
>   block: add llseek(SEEK_HOLE/SEEK_DATA) support
>   loop: add llseek(SEEK_HOLE/SEEK_DATA) support
>   selftests: block_seek_hole: add loop block driver tests
>   dm: add llseek(SEEK_HOLE/SEEK_DATA) support
>   selftests: block_seek_hole: add dm-zero test
>   dm-linear: add llseek(SEEK_HOLE/SEEK_DATA) support
>   selftests: block_seek_hole: add dm-linear test
>   dm thin: add llseek(SEEK_HOLE/SEEK_DATA) support
>   selftests: block_seek_hole: add dm-thin test
> 
>  tools/testing/selftests/Makefile              |   1 +
>  .../selftests/block_seek_hole/Makefile        |  17 +++
>  include/linux/blkdev.h                        |   7 ++
>  include/linux/device-mapper.h                 |   5 +
>  block/fops.c                                  |  43 ++++++-
>  drivers/block/loop.c                          |  36 +++++-
>  drivers/md/dm-linear.c                        |  25 ++++
>  drivers/md/dm-thin.c                          |  77 ++++++++++++
>  drivers/md/dm.c                               |  68 ++++++++++
>  .../testing/selftests/block_seek_hole/config  |   3 +
>  .../selftests/block_seek_hole/dm_thin.sh      |  80 ++++++++++++
>  .../selftests/block_seek_hole/dm_zero.sh      |  31 +++++
>  .../selftests/block_seek_hole/map_holes.py    |  37 ++++++
>  .../testing/selftests/block_seek_hole/test.py | 117 ++++++++++++++++++
>  14 files changed, 540 insertions(+), 7 deletions(-)
>  create mode 100644 tools/testing/selftests/block_seek_hole/Makefile
>  create mode 100644 tools/testing/selftests/block_seek_hole/config
>  create mode 100755 tools/testing/selftests/block_seek_hole/dm_thin.sh
>  create mode 100755 tools/testing/selftests/block_seek_hole/dm_zero.sh
>  create mode 100755 tools/testing/selftests/block_seek_hole/map_holes.py
>  create mode 100755 tools/testing/selftests/block_seek_hole/test.py
> 
> -- 
> 2.44.0
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 5/9] selftests: block_seek_hole: add dm-zero test
  2024-03-28 20:39 ` [RFC 5/9] selftests: block_seek_hole: add dm-zero test Stefan Hajnoczi
@ 2024-03-28 22:19   ` Eric Blake
  2024-03-28 22:32     ` Stefan Hajnoczi
  0 siblings, 1 reply; 40+ messages in thread
From: Eric Blake @ 2024-03-28 22:19 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-block, linux-kernel, Alasdair Kergon, Mikulas Patocka,
	dm-devel, David Teigland, Mike Snitzer, Jens Axboe,
	Christoph Hellwig, Joe Thornber

On Thu, Mar 28, 2024 at 04:39:06PM -0400, Stefan Hajnoczi wrote:
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  .../selftests/block_seek_hole/Makefile        |  2 +-
>  .../testing/selftests/block_seek_hole/config  |  2 ++
>  .../selftests/block_seek_hole/dm_zero.sh      | 31 +++++++++++++++++++
>  3 files changed, 34 insertions(+), 1 deletion(-)
>  create mode 100755 tools/testing/selftests/block_seek_hole/dm_zero.sh
> 

> +++ b/tools/testing/selftests/block_seek_hole/dm_zero.sh
> @@ -0,0 +1,31 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +# dm_zero.sh
> +#
> +# Test that dm-zero reports data because it does not have a custom
> +# SEEK_HOLE/SEEK_DATA implementation.

Why not?  Wouldn't it make more sense to have dm-zero report the
entire device as a hole (that is, an in-range SEEK_HOLE always returns
the same offset, while an in-range SEEK_DATA returns the device size)?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-03-28 22:16 ` [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support Eric Blake
@ 2024-03-28 22:29   ` Eric Blake
  2024-03-28 23:09   ` Stefan Hajnoczi
  1 sibling, 0 replies; 40+ messages in thread
From: Eric Blake @ 2024-03-28 22:29 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-block, linux-kernel, Alasdair Kergon, Mikulas Patocka,
	dm-devel, David Teigland, Mike Snitzer, Jens Axboe,
	Christoph Hellwig, Joe Thornber

Replying to myself,

On Thu, Mar 28, 2024 at 05:17:18PM -0500, Eric Blake wrote:
> On Thu, Mar 28, 2024 at 04:39:01PM -0400, Stefan Hajnoczi wrote:
> > cp(1) and backup tools use llseek(SEEK_HOLE/SEEK_DATA) to skip holes in files.
> 
> > 
> > In the block device world there are similar concepts to holes:
> > - SCSI has Logical Block Provisioning where the "mapped" state would be
> >   considered data and other states would be considered holes.
> 
> BIG caveat here: the SCSI spec does not necessarily guarantee that
> unmapped regions read as all zeroes; compare the difference between
> FALLOC_FL_ZERO_RANGE and FALLOC_FL_PUNCH_HOLE.  While lseek(SEEK_HOLE)
> on a regular file guarantees that future read() in that hole will see
> NUL bytes, I'm not sure whether we want to make that guarantee for
> block devices.  This may be yet another case where we might want to
> add new SEEK_* constants to the *seek() family of functions that lets
> the caller indicate whether they want offsets that are guaranteed to
> read as zero, vs. merely offsets that are not allocated but may or may
> not read as zero.  Skipping unallocated portions, even when you don't
> know if the contents reliably read as zero, is still a useful goal in
> some userspace programs.
> 
> > - NBD has NBD_CMD_BLOCK_STATUS for querying whether blocks are present.

The upstream NBD spec[1] took the time to represent two bits of
information per extent, _because_ of the knowledge that not all SCSI
devices with TRIM support actually guarantee a read of zeroes after
trimming.  That is, NBD chose to convey both:

NBD_STATE_HOLE: 1<<0 if region is unallocated, 0 if region has not been trimmed
NBD_STATE_ZERO: 1<<1 if region reads as zeroes, 0 if region contents might be nonzero

it is always safe to describe an extent as value 0 (both bits clear),
whether or not lseek(SEEK_DATA) returns the same offset; meanwhile,
traditional lseek(SEEK_HOLE) on filesystems generally translates to a
status of 3 (both bits set), but as it is (sometimes) possible to
determine that allocated data still reads as zero, or that unallocated
data may not necessarily read as zero, it is also possible to
implement NBD servers that don't report both bits in parallel.

If we are going to enhance llseek(2) to expose more information about
underlying block devices, possibly by adding more SEEK_ constants for
use in the entire family of *seek() API, it may be worth thinking
about whether it is worth userspace being able to query this
additional distinction between unallocated vs reads-as-zero.

[1] https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md#baseallocation-metadata-context

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 5/9] selftests: block_seek_hole: add dm-zero test
  2024-03-28 22:19   ` Eric Blake
@ 2024-03-28 22:32     ` Stefan Hajnoczi
  0 siblings, 0 replies; 40+ messages in thread
From: Stefan Hajnoczi @ 2024-03-28 22:32 UTC (permalink / raw)
  To: Eric Blake
  Cc: linux-block, linux-kernel, Alasdair Kergon, Mikulas Patocka,
	dm-devel, David Teigland, Mike Snitzer, Jens Axboe,
	Christoph Hellwig, Joe Thornber

[-- Attachment #1: Type: text/plain, Size: 1365 bytes --]

On Thu, Mar 28, 2024 at 05:19:26PM -0500, Eric Blake wrote:
> On Thu, Mar 28, 2024 at 04:39:06PM -0400, Stefan Hajnoczi wrote:
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  .../selftests/block_seek_hole/Makefile        |  2 +-
> >  .../testing/selftests/block_seek_hole/config  |  2 ++
> >  .../selftests/block_seek_hole/dm_zero.sh      | 31 +++++++++++++++++++
> >  3 files changed, 34 insertions(+), 1 deletion(-)
> >  create mode 100755 tools/testing/selftests/block_seek_hole/dm_zero.sh
> > 
> 
> > +++ b/tools/testing/selftests/block_seek_hole/dm_zero.sh
> > @@ -0,0 +1,31 @@
> > +#!/bin/sh
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +#
> > +# dm_zero.sh
> > +#
> > +# Test that dm-zero reports data because it does not have a custom
> > +# SEEK_HOLE/SEEK_DATA implementation.
> 
> Why not?  Wouldn't it make more sense to have dm-zero report the
> entire device as a hole (that is, an in-range SEEK_HOLE always returns
> the same offset, while an in-range SEEK_DATA returns the device size)?

Yes, dm-zero could report a hole. I just added this test to verify that
targets that do not implement seek_hole_data() work and the dm-zero
target was convenient for testing.

Stefan

> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.
> Virtualization:  qemu.org | libguestfs.org
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-03-28 22:16 ` [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support Eric Blake
  2024-03-28 22:29   ` Eric Blake
@ 2024-03-28 23:09   ` Stefan Hajnoczi
  1 sibling, 0 replies; 40+ messages in thread
From: Stefan Hajnoczi @ 2024-03-28 23:09 UTC (permalink / raw)
  To: Eric Blake
  Cc: linux-block, linux-kernel, Alasdair Kergon, Mikulas Patocka,
	dm-devel, David Teigland, Mike Snitzer, Jens Axboe,
	Christoph Hellwig, Joe Thornber

[-- Attachment #1: Type: text/plain, Size: 2845 bytes --]

On Thu, Mar 28, 2024 at 05:16:54PM -0500, Eric Blake wrote:
> On Thu, Mar 28, 2024 at 04:39:01PM -0400, Stefan Hajnoczi wrote:
> > This can speed up the process by reducing the amount of data read and it
> > preserves sparseness when writing to the output file.
> > 
> > This patch series is an initial attempt at implementing
> > llseek(SEEK_HOLE/SEEK_DATA) for block devices. I'm looking for feedback on this
> > approach and suggestions for resolving the open issues.
> 
> One of your open issues was whether adjusting the offset of the block
> device itself should also adjust the file offset of the underlying
> file (at least in the case of loopback and dm-linear).  What would the

Only the loop block driver has this issue. The dm-linear driver uses
blkdev_seek_hole_data(), which does not update the file offset because
it operates on a struct block_device instead of a struct file.

> 
> > 
> > In the block device world there are similar concepts to holes:
> > - SCSI has Logical Block Provisioning where the "mapped" state would be
> >   considered data and other states would be considered holes.
> 
> BIG caveat here: the SCSI spec does not necessarily guarantee that
> unmapped regions read as all zeroes; compare the difference between
> FALLOC_FL_ZERO_RANGE and FALLOC_FL_PUNCH_HOLE.  While lseek(SEEK_HOLE)
> on a regular file guarantees that future read() in that hole will see
> NUL bytes, I'm not sure whether we want to make that guarantee for
> block devices.  This may be yet another case where we might want to
> add new SEEK_* constants to the *seek() family of functions that lets
> the caller indicate whether they want offsets that are guaranteed to
> read as zero, vs. merely offsets that are not allocated but may or may
> not read as zero.  Skipping unallocated portions, even when you don't
> know if the contents reliably read as zero, is still a useful goal in
> some userspace programs.

SCSI initiators can check the Logical Block Provisioning Read Zeroes
(LBPRZ) field to determine whether or not zeroes are guaranteed. The sd
driver would only rely on the device when LPBRZ indicates that zeroes
will be read. Otherwise the driver would report that the device is
filled with data.

> 
> > - NBD has NBD_CMD_BLOCK_STATUS for querying whether blocks are present.
> 
> However, utilizing it in nbd.ko would require teaching the kernel to
> handle structured or extended headers (right now, that is an extension
> only supported in user-space implementations of the NBD protocol).  I
> can see why you did not tackle that in this RFC series, even though
> you mention it in the cover letter.

Yes, I'm mostly interested in dm-thin. The loop block driver and
dm-linear are useful for testing so I modified them. I didn't try SCSI
or NBD.

Thanks,
Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 1/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-03-28 20:39 ` [RFC 1/9] " Stefan Hajnoczi
@ 2024-03-28 23:50   ` Eric Blake
  0 siblings, 0 replies; 40+ messages in thread
From: Eric Blake @ 2024-03-28 23:50 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-block, linux-kernel, Alasdair Kergon, Mikulas Patocka,
	dm-devel, David Teigland, Mike Snitzer, Jens Axboe,
	Christoph Hellwig, Joe Thornber

On Thu, Mar 28, 2024 at 04:39:02PM -0400, Stefan Hajnoczi wrote:
> The SEEK_HOLE/SEEK_DATA interface is used by userspace applications to
> detect sparseness. This makes copying and backup applications faster and
> reduces space consumption because only ranges that do not contain data
> can be skipped.

s/only //

> 
> Handle SEEK_HOLE/SEEK_DATA for block devices. No block drivers implement
> the new callback yet so the entire block device will appear to contain
> data. Later patches will add support to drivers so this actually becomes
> useful.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/linux/blkdev.h |  7 +++++++
>  block/fops.c           | 43 +++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 49 insertions(+), 1 deletion(-)
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 2/9] loop: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-03-28 20:39 ` [RFC 2/9] loop: " Stefan Hajnoczi
@ 2024-03-29  0:00   ` Eric Blake
  2024-03-29 12:54     ` Stefan Hajnoczi
  0 siblings, 1 reply; 40+ messages in thread
From: Eric Blake @ 2024-03-29  0:00 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-block, linux-kernel, Alasdair Kergon, Mikulas Patocka,
	dm-devel, David Teigland, Mike Snitzer, Jens Axboe,
	Christoph Hellwig, Joe Thornber

On Thu, Mar 28, 2024 at 04:39:03PM -0400, Stefan Hajnoczi wrote:
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> Open issues:
> - The file offset is updated on both the blkdev file and the backing
>   file. Is there a way to avoid updating the backing file offset so the
>   file opened by userspace is not affected?
> - Should this run in the worker or use the cgroups?
> ---
>  drivers/block/loop.c | 36 ++++++++++++++++++++++++++++++------
>  1 file changed, 30 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 28a95fd366fea..6a89375de82e8 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -750,6 +750,29 @@ static void loop_sysfs_exit(struct loop_device *lo)
>  				   &loop_attribute_group);
>  }
>  
> +static loff_t lo_seek_hole_data(struct block_device *bdev, loff_t offset,
> +		int whence)
> +{
> +	/* TODO need to activate cgroups or use worker? */
> +	/* TODO locking? */
> +	struct loop_device *lo = bdev->bd_disk->private_data;
> +	struct file *file = lo->lo_backing_file;
> +
> +	if (lo->lo_offset > 0)
> +		offset += lo->lo_offset; /* TODO underflow/overflow? */
> +
> +	/* TODO backing file offset is modified! */
> +	offset = vfs_llseek(file, offset, whence);

Not only did you modify the underlying offset...

> +	if (offset < 0)
> +		return offset;
> +
> +	if (lo->lo_offset > 0)
> +		offset -= lo->lo_offset; /* TODO underflow/overflow? */
> +	if (lo->lo_sizelimit > 0 && offset > lo->lo_sizelimit)
> +		offset = lo->lo_sizelimit;

...but if this code kicks in, you have clamped the return result to
EOF of the loop device while leaving the underlying offset beyond the
limit, which may mess up assumptions of other code expecting the loop
to always have an in-bounds offset for the underlying file (offhand, I
don't know if there is any such code; but since loop_ctl_fops.llseek =
noop_lseek, there may be code relying on an even-tighter restriction
that the offset of the underlying file never changes, not even within
bounds).

Furthermore, this is inconsistent with all other seek-beyond-end code
that fails with -ENXIO instead of returning size.

But for an RFC, the idea of being able to seek to HOLEs in a loop
device is awesome!

> @@ -2140,7 +2164,7 @@ static int loop_control_remove(int idx)
>  		pr_warn_once("deleting an unspecified loop device is not supported.\n");
>  		return -EINVAL;
>  	}
> -		
> +
>  	/* Hide this loop device for serialization. */
>  	ret = mutex_lock_killable(&loop_ctl_mutex);
>  	if (ret)

Unrelated whitespace change?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 3/9] selftests: block_seek_hole: add loop block driver tests
  2024-03-28 20:39 ` [RFC 3/9] selftests: block_seek_hole: add loop block driver tests Stefan Hajnoczi
@ 2024-03-29  0:11   ` Eric Blake
  2024-04-03 13:50     ` Stefan Hajnoczi
  2024-03-29 12:38   ` Eric Blake
  1 sibling, 1 reply; 40+ messages in thread
From: Eric Blake @ 2024-03-29  0:11 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-block, linux-kernel, Alasdair Kergon, Mikulas Patocka,
	dm-devel, David Teigland, Mike Snitzer, Jens Axboe,
	Christoph Hellwig, Joe Thornber

On Thu, Mar 28, 2024 at 04:39:04PM -0400, Stefan Hajnoczi wrote:
> Run the tests with:
> 
>   $ make TARGETS=block_seek_hole -C tools/selftests run_tests
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/testing/selftests/Makefile              |   1 +
>  .../selftests/block_seek_hole/Makefile        |  17 +++
>  .../testing/selftests/block_seek_hole/config  |   1 +
>  .../selftests/block_seek_hole/map_holes.py    |  37 +++++++
>  .../testing/selftests/block_seek_hole/test.py | 103 ++++++++++++++++++
>  5 files changed, 159 insertions(+)
>  create mode 100644 tools/testing/selftests/block_seek_hole/Makefile
>  create mode 100644 tools/testing/selftests/block_seek_hole/config
>  create mode 100755 tools/testing/selftests/block_seek_hole/map_holes.py
>  create mode 100755 tools/testing/selftests/block_seek_hole/test.py
> 
> +++ b/tools/testing/selftests/block_seek_hole/test.py

> +
> +# Different data layouts to test
> +
> +def data_at_beginning_and_end(f):
> +    f.write(b'A' * 4 * KB)
> +    f.seek(256 * MB)
> +
> +    f.write(b'B' * 64 * KB)
> +
> +    f.seek(1024 * MB - KB)
> +    f.write(b'C' * KB)
> +
> +def holes_at_beginning_and_end(f):
> +    f.seek(128 * MB)
> +    f.write(b'A' * 4 * KB)
> +
> +    f.seek(512 * MB)
> +    f.write(b'B' * 64 * KB)
> +
> +    f.truncate(1024 * MB)
> +
> +def no_holes(f):
> +    # Just 1 MB so test file generation is quick
> +    mb = b'A' * MB
> +    f.write(mb)
> +
> +def empty_file(f):
> +    f.truncate(1024 * MB)

Is it also worth attempting to test a (necessarily sparse!) file
larger than 2GiB to prove that we are 64-bit clean, even on a 32-bit
system where lseek is different than lseek64?  (I honestly have no
idea if python always uses 64-bit seek even on 32-bit systems,
although I would be surprised if it were not)

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 4/9] dm: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-03-28 20:39 ` [RFC 4/9] dm: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
@ 2024-03-29  0:38   ` Eric Blake
  2024-04-03 14:11     ` Stefan Hajnoczi
  0 siblings, 1 reply; 40+ messages in thread
From: Eric Blake @ 2024-03-29  0:38 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-block, linux-kernel, Alasdair Kergon, Mikulas Patocka,
	dm-devel, David Teigland, Mike Snitzer, Jens Axboe,
	Christoph Hellwig, Joe Thornber

On Thu, Mar 28, 2024 at 04:39:05PM -0400, Stefan Hajnoczi wrote:
> Delegate SEEK_HOLE/SEEK_DATA to device-mapper targets. The new
> dm_seek_hole_data() callback allows target types to customize behavior.
> The default implementation treats the target as all data with no holes.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/linux/device-mapper.h |  5 +++
>  drivers/md/dm.c               | 68 +++++++++++++++++++++++++++++++++++
>  2 files changed, 73 insertions(+)
> 

> +/* Default implementation for targets that do not implement the callback */
> +static loff_t dm_blk_seek_hole_data_default(loff_t offset, int whence,
> +		loff_t size)
> +{
> +	switch (whence) {
> +	case SEEK_DATA:
> +		if ((unsigned long long)offset >= size)
> +			return -ENXIO;
> +		return offset;
> +	case SEEK_HOLE:
> +		if ((unsigned long long)offset >= size)
> +			return -ENXIO;
> +		return size;

These fail with -ENXIO if offset == size (matching what we do on files)...

> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +static loff_t dm_blk_do_seek_hole_data(struct dm_table *table, loff_t offset,
> +		int whence)
> +{
> +	struct dm_target *ti;
> +	loff_t end;
> +
> +	/* Loop when the end of a target is reached */
> +	do {
> +		ti = dm_table_find_target(table, offset >> SECTOR_SHIFT);
> +		if (!ti)
> +			return whence == SEEK_DATA ? -ENXIO : offset;

...but this blindly returns offset for SEEK_HOLE, even when offset is
beyond the end of the dm.  I think you want 'return -ENXIO;'
unconditionally here.

> +
> +		end = (ti->begin + ti->len) << SECTOR_SHIFT;
> +
> +		if (ti->type->seek_hole_data)
> +			offset = ti->type->seek_hole_data(ti, offset, whence);

Are we guaranteed that ti->type->seek_hole_data will not return a
value exceeding end?  Or can dm be used to truncate the view of an
underlying device, and the underlying seek_hold_data can now return an
answer beyond where dm_table_find_target should look for the next part
of the dm's view?

In which case, should the blkdev_seek_hole_data callback be passed a
max size parameter everywhere, similar to how fixed_size_llseek does
things?

> +		else
> +			offset = dm_blk_seek_hole_data_default(offset, whence, end);
> +
> +		if (whence == SEEK_DATA && offset == -ENXIO)
> +			offset = end;

You have a bug here.  If I have a dm contructed of two underlying targets:

|A  |B  |

and A is all data, then whence == SEEK_HOLE will have offset = -ENXIO
at this point, and you fail to check whether B is also data.  That is,
you have silently treated the rest of the block device as data, which
is semantically not wrong (as that is always a safe fallback), but not
optimal.

I think the correct logic is s/whence == SEEK_DATA &&//.

> +	} while (offset == end);

I'm trying to make sure that we can never return the equivalent of
lseek(dm, 0, SEEK_END).  If you make my above suggested changes, we
will iterate through the do loop once more at EOF, and
dm_table_find_target() will then fail to match at which point we do
get the desired -ENXIO for both SEEK_HOLE and SEEK_DATA.

> +
> +	return offset;
> +}
> +

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 6/9] dm-linear: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-03-28 20:39 ` [RFC 6/9] dm-linear: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
@ 2024-03-29  0:54   ` Eric Blake
  2024-04-03 14:22     ` Stefan Hajnoczi
  2024-03-31  7:35   ` kernel test robot
  1 sibling, 1 reply; 40+ messages in thread
From: Eric Blake @ 2024-03-29  0:54 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-block, linux-kernel, Alasdair Kergon, Mikulas Patocka,
	dm-devel, David Teigland, Mike Snitzer, Jens Axboe,
	Christoph Hellwig, Joe Thornber

On Thu, Mar 28, 2024 at 04:39:07PM -0400, Stefan Hajnoczi wrote:
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  drivers/md/dm-linear.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
> index 2d3e186ca87e3..9b6cdfa4f951d 100644
> --- a/drivers/md/dm-linear.c
> +++ b/drivers/md/dm-linear.c
> @@ -147,6 +147,30 @@ static int linear_report_zones(struct dm_target *ti,
>  #define linear_report_zones NULL
>  #endif
>  
> +static loff_t linear_seek_hole_data(struct dm_target *ti, loff_t offset,
> +		int whence)
> +{
> +	struct linear_c *lc = ti->private;
> +	loff_t ti_begin = ti->begin << SECTOR_SHIFT;
> +	loff_t ti_len = ti->len << SECTOR_SHIFT;
> +	loff_t bdev_start = lc->start << SECTOR_SHIFT;
> +	loff_t bdev_offset;

Okay, given my questions in 4/9, it looks like your intent is that
each callback for dm_seek_hole_data will obey its own ti-> limits.

> +
> +	/* TODO underflow/overflow? */
> +	bdev_offset = offset - ti_begin + bdev_start;
> +
> +	bdev_offset = blkdev_seek_hole_data(lc->dev->bdev, bdev_offset,
> +					    whence);
> +	if (bdev_offset < 0)
> +		return bdev_offset;
> +
> +	offset = bdev_offset - bdev_start;
> +	if (offset >= ti_len)
> +		return whence == SEEK_DATA ? -ENXIO : ti_begin + ti_len;

However, this is inconsistent with dm_blk_seek_hole_data_default in
4/9; I think you want to unconditionally return -ENXIO here, and let
the caller figure out when to turn -ENXIO back into end to proceed
with the next ti in the list.

OR, you may want to document the semantics that dm_seek_hole_data
callbacks must NOT return -ENXIO, but always return ti_begin + ti_len
when the answer (either SEEK_HOLE or SEEK_END) did not lie within the
current ti - it is DIFFERENT than the semantics for
blkdev_seek_hole_data, but gets normalized back into the expected
-ENXIO answer when dm_blk_do_seek_hole_data finally advances past the
last ti.

At any rate, I know this is an RFC series, but it goes to show that
comments will be essential, whichever interface you decide for
callbacks to honor (both a guarantee that callbacks will only ever see
SEEK_HOLE/SEEK_DATA in bounds, because earlier points in the call
stack have filtered out out-of-bounds and SEEK_SET; and constraints on
what the return value(s) must be for the various callbacks, especially
if it is different from the eventual return value of the overall
llseek syscall)

> +
> +	return offset + ti_begin;
> +}
> +
>  static int linear_iterate_devices(struct dm_target *ti,
>  				  iterate_devices_callout_fn fn, void *data)
>  {
> @@ -212,6 +236,7 @@ static struct target_type linear_target = {
>  	.direct_access = linear_dax_direct_access,
>  	.dax_zero_page_range = linear_dax_zero_page_range,
>  	.dax_recovery_write = linear_dax_recovery_write,
> +	.seek_hole_data = linear_seek_hole_data,
>  };
>  
>  int __init dm_linear_init(void)
> -- 
> 2.44.0
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 7/9] selftests: block_seek_hole: add dm-linear test
  2024-03-28 20:39 ` [RFC 7/9] selftests: block_seek_hole: add dm-linear test Stefan Hajnoczi
@ 2024-03-29  0:59   ` Eric Blake
  2024-04-03 14:23     ` Stefan Hajnoczi
  0 siblings, 1 reply; 40+ messages in thread
From: Eric Blake @ 2024-03-29  0:59 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-block, linux-kernel, Alasdair Kergon, Mikulas Patocka,
	dm-devel, David Teigland, Mike Snitzer, Jens Axboe,
	Christoph Hellwig, Joe Thornber

On Thu, Mar 28, 2024 at 04:39:08PM -0400, Stefan Hajnoczi wrote:
> The dm-linear linear target passes through SEEK_HOLE/SEEK_DATA. Extend
> the test case to check that the same holes/data are reported as for the
> underlying file.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/testing/selftests/block_seek_hole/test.py | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/block_seek_hole/test.py b/tools/testing/selftests/block_seek_hole/test.py
> index 4f7c2d01ab3d3..6360b72aee338 100755
> --- a/tools/testing/selftests/block_seek_hole/test.py
> +++ b/tools/testing/selftests/block_seek_hole/test.py
> @@ -45,6 +45,20 @@ def loop_device(file_path):
>      finally:
>          run(['losetup', '-d', loop_path])
>  
> +@contextmanager
> +def dm_linear(file_path):
> +    file_size = os.path.getsize(file_path)
> +
> +    with loop_device(file_path) as loop_path:
> +        dm_name = f'test-{os.getpid()}'
> +        run(['dmsetup', 'create', dm_name, '--table',
> +             f'0 {file_size // 512} linear {loop_path} 0'])

Would it be worth tryiing to create the dm with two copies of
loop_path concatenated one after the other?  You'd have to do more
work on expected output (coalescing adjacent data or holes between the
tail of the first copy and the head of the second), but without that
in place, I worry that you are missing logic bugs for when there is
more than one table in the overall dm (as evidenced by my review in
4/9).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 8/9] dm thin: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-03-28 20:39 ` [RFC 8/9] dm thin: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
@ 2024-03-29  1:31   ` Eric Blake
  2024-04-03 15:03     ` Stefan Hajnoczi
  0 siblings, 1 reply; 40+ messages in thread
From: Eric Blake @ 2024-03-29  1:31 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-block, linux-kernel, Alasdair Kergon, Mikulas Patocka,
	dm-devel, David Teigland, Mike Snitzer, Jens Axboe,
	Christoph Hellwig, Joe Thornber

On Thu, Mar 28, 2024 at 04:39:09PM -0400, Stefan Hajnoczi wrote:
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> Open issues:
> - Locking?
> - thin_seek_hole_data() does not run as a bio or request. This patch
>   assumes dm_thin_find_mapped_range() synchronously performs I/O if
>   metadata needs to be loaded from disk. Is that a valid assumption?
> ---
>  drivers/md/dm-thin.c | 77 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 77 insertions(+)
> 
> diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> index 4793ad2aa1f7e..3c5dc4f0fe8a3 100644
> --- a/drivers/md/dm-thin.c
> +++ b/drivers/md/dm-thin.c
> @@ -4501,6 +4501,82 @@ static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
>  	}
>  }
>  
> +static dm_block_t loff_to_block(struct pool *pool, loff_t offset)
> +{
> +	sector_t offset_sectors = offset >> SECTOR_SHIFT;
> +	dm_block_t ret;
> +
> +	if (block_size_is_power_of_two(pool))
> +		ret = offset_sectors >> pool->sectors_per_block_shift;
> +	else {
> +		ret = offset_sectors;
> +		(void) sector_div(ret, pool->sectors_per_block);
> +	}
> +
> +	return ret;
> +}
> +
> +static loff_t block_to_loff(struct pool *pool, dm_block_t block)
> +{
> +	return block_to_sectors(pool, block) << SECTOR_SHIFT;
> +}
> +
> +static loff_t thin_seek_hole_data(struct dm_target *ti, loff_t offset,
> +		int whence)
> +{
> +	struct thin_c *tc = ti->private;
> +	struct dm_thin_device *td = tc->td;
> +	struct pool *pool = tc->pool;
> +	dm_block_t begin;
> +	dm_block_t end;
> +	dm_block_t mapped_begin;
> +	dm_block_t mapped_end;
> +	dm_block_t pool_begin;
> +	bool maybe_shared;
> +	int ret;
> +
> +	/* TODO locking? */
> +
> +	if (block_size_is_power_of_two(pool))
> +		end = ti->len >> pool->sectors_per_block_shift;
> +	else {
> +		end = ti->len;
> +		(void) sector_div(end, pool->sectors_per_block);
> +	}
> +
> +	offset -= ti->begin << SECTOR_SHIFT;
> +
> +	while (true) {
> +		begin = loff_to_block(pool, offset);
> +		ret = dm_thin_find_mapped_range(td, begin, end,
> +						&mapped_begin, &mapped_end,
> +						&pool_begin, &maybe_shared);
> +		if (ret == -ENODATA) {
> +			if (whence == SEEK_DATA)
> +				return -ENXIO;
> +			break;
> +		} else if (ret < 0) {
> +			/* TODO handle EWOULDBLOCK? */
> +			return -ENXIO;

This should probably be -EIO, not -ENXIO.

> +		}
> +
> +		/* SEEK_DATA finishes here... */
> +		if (whence == SEEK_DATA) {
> +			if (mapped_begin != begin)
> +				offset = block_to_loff(pool, mapped_begin);
> +			break;
> +		}
> +
> +		/* ...while SEEK_HOLE may need to look further */
> +		if (mapped_begin != begin)
> +			break; /* offset is in a hole */
> +
> +		offset = block_to_loff(pool, mapped_end);
> +	}
> +
> +	return offset + (ti->begin << SECTOR_SHIFT);

It's hard to follow, but I'm fairly certain that if whence ==
SEEK_HOLE, you end up returning ti->begin + ti->len instead of -ENXIO
if the range from begin to end is fully mapped; which is inconsistent
with the semantics you have in 4/9 (although in 6/9 I argue that
having all of the dm callbacks return ti->begin + ti->len instead of
-ENXIO might make logic easier for iterating through consecutive ti,
and then convert to -ENXIO only in the caller).

> +}
> +
>  static struct target_type thin_target = {
>  	.name = "thin",
>  	.version = {1, 23, 0},
> @@ -4515,6 +4591,7 @@ static struct target_type thin_target = {
>  	.status = thin_status,
>  	.iterate_devices = thin_iterate_devices,
>  	.io_hints = thin_io_hints,
> +	.seek_hole_data = thin_seek_hole_data,
>  };
>  
>  /*----------------------------------------------------------------*/
> -- 
> 2.44.0
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 3/9] selftests: block_seek_hole: add loop block driver tests
  2024-03-28 20:39 ` [RFC 3/9] selftests: block_seek_hole: add loop block driver tests Stefan Hajnoczi
  2024-03-29  0:11   ` Eric Blake
@ 2024-03-29 12:38   ` Eric Blake
  2024-04-03 13:51     ` Stefan Hajnoczi
  1 sibling, 1 reply; 40+ messages in thread
From: Eric Blake @ 2024-03-29 12:38 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-block, linux-kernel, Alasdair Kergon, Mikulas Patocka,
	dm-devel, David Teigland, Mike Snitzer, Jens Axboe,
	Christoph Hellwig, Joe Thornber

On Thu, Mar 28, 2024 at 04:39:04PM -0400, Stefan Hajnoczi wrote:
> Run the tests with:
> 
>   $ make TARGETS=block_seek_hole -C tools/selftests run_tests
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tools/testing/selftests/Makefile              |   1 +
>  .../selftests/block_seek_hole/Makefile        |  17 +++
>  .../testing/selftests/block_seek_hole/config  |   1 +
>  .../selftests/block_seek_hole/map_holes.py    |  37 +++++++
>  .../testing/selftests/block_seek_hole/test.py | 103 ++++++++++++++++++
>  5 files changed, 159 insertions(+)
>  create mode 100644 tools/testing/selftests/block_seek_hole/Makefile
>  create mode 100644 tools/testing/selftests/block_seek_hole/config
>  create mode 100755 tools/testing/selftests/block_seek_hole/map_holes.py
>  create mode 100755 tools/testing/selftests/block_seek_hole/test.py
> 

> +
> +def map_holes(fd):
> +    end = os.lseek(fd, 0, os.SEEK_END)
> +    offset = 0
> +
> +    print('TYPE START END SIZE')
> +
> +    while offset < end:
> +        contents = 'DATA'
> +        new_offset = os.lseek(fd, offset, os.SEEK_HOLE)
> +        if new_offset == offset:
> +            contents = 'HOLE'
> +            try:
> +              new_offset = os.lseek(fd, offset, os.SEEK_DATA)
> +            except OSError as err:
> +                if err.errno == errno.ENXIO:
> +                    new_offset = end
> +                else:
> +                    raise err
> +            assert new_offset != offset
> +        print(f'{contents} {offset} {new_offset} {new_offset - offset}')
> +        offset = new_offset

Over the years, I've seen various SEEK_HOLE implementation bugs where
things work great on the initial boundary, but fail when requested on
an offset not aligned to the start of the extent boundary.  It would
probably be worth enhancing the test to prove that:

if lseek(fd, offset, SEEK_HOLE) == offset:
  new_offset = lseek(fd, offset, SEEK_DATA)
  assert new_offset > offset
  assert lseek(fd, new_offset - 1, SEEK_HOLE) == new_offset - 1
else:
  assert lseek(fd, offset, SEEK_DATA) == offset
  new_offset = lseek(fd, offset, SEEK_HOLE)
  assert new_offset > offset
  assert lseek(fd, new_offset - 1, SEEK_DATA) == new_offset - 1

Among other things, this would prove that even though block devices
generally operate on a minimum granularity of a sector, lseek() still
gives byte-accurate results for a random offset that falls in the
middle of a sector, and doesn't accidentally round down reporting an
offset less than the value passed in to the request.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 2/9] loop: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-03-29  0:00   ` Eric Blake
@ 2024-03-29 12:54     ` Stefan Hajnoczi
  0 siblings, 0 replies; 40+ messages in thread
From: Stefan Hajnoczi @ 2024-03-29 12:54 UTC (permalink / raw)
  To: Eric Blake
  Cc: linux-block, linux-kernel, Alasdair Kergon, Mikulas Patocka,
	dm-devel, David Teigland, Mike Snitzer, Jens Axboe,
	Christoph Hellwig, Joe Thornber

[-- Attachment #1: Type: text/plain, Size: 3436 bytes --]

On Thu, Mar 28, 2024 at 07:00:26PM -0500, Eric Blake wrote:
> On Thu, Mar 28, 2024 at 04:39:03PM -0400, Stefan Hajnoczi wrote:
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > Open issues:
> > - The file offset is updated on both the blkdev file and the backing
> >   file. Is there a way to avoid updating the backing file offset so the
> >   file opened by userspace is not affected?
> > - Should this run in the worker or use the cgroups?
> > ---
> >  drivers/block/loop.c | 36 ++++++++++++++++++++++++++++++------
> >  1 file changed, 30 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> > index 28a95fd366fea..6a89375de82e8 100644
> > --- a/drivers/block/loop.c
> > +++ b/drivers/block/loop.c
> > @@ -750,6 +750,29 @@ static void loop_sysfs_exit(struct loop_device *lo)
> >  				   &loop_attribute_group);
> >  }
> >  
> > +static loff_t lo_seek_hole_data(struct block_device *bdev, loff_t offset,
> > +		int whence)
> > +{
> > +	/* TODO need to activate cgroups or use worker? */
> > +	/* TODO locking? */
> > +	struct loop_device *lo = bdev->bd_disk->private_data;
> > +	struct file *file = lo->lo_backing_file;
> > +
> > +	if (lo->lo_offset > 0)
> > +		offset += lo->lo_offset; /* TODO underflow/overflow? */
> > +
> > +	/* TODO backing file offset is modified! */
> > +	offset = vfs_llseek(file, offset, whence);
> 
> Not only did you modify the underlying offset...
> 
> > +	if (offset < 0)
> > +		return offset;
> > +
> > +	if (lo->lo_offset > 0)
> > +		offset -= lo->lo_offset; /* TODO underflow/overflow? */
> > +	if (lo->lo_sizelimit > 0 && offset > lo->lo_sizelimit)
> > +		offset = lo->lo_sizelimit;
> 
> ...but if this code kicks in, you have clamped the return result to
> EOF of the loop device while leaving the underlying offset beyond the
> limit, which may mess up assumptions of other code expecting the loop
> to always have an in-bounds offset for the underlying file (offhand, I
> don't know if there is any such code; but since loop_ctl_fops.llseek =
> noop_lseek, there may be code relying on an even-tighter restriction
> that the offset of the underlying file never changes, not even within
> bounds).

I don't think anything relies on the file offset. Requests coming from
the block device contain their own offset (which may have been based on
the block device file's offset, but not the backing file's offset).

> Furthermore, this is inconsistent with all other seek-beyond-end code
> that fails with -ENXIO instead of returning size.

You're right, in the SEEK_DATA case the return value should be -ENXIO.
The SEEK_HOLE case is correct with lo_sizelimit. There is also an
off-by-one error in the comparison.

It should be:

  if (lo->lo_sizelimit > 0 && offset >= lo->lo_sizelimit) {
  	if (whence == SEEK_DATA)
  		offset = -ENXIO;
  	else
  		offset = lo->lo_sizelimit;
  }

> But for an RFC, the idea of being able to seek to HOLEs in a loop
> device is awesome!
> 
> > @@ -2140,7 +2164,7 @@ static int loop_control_remove(int idx)
> >  		pr_warn_once("deleting an unspecified loop device is not supported.\n");
> >  		return -EINVAL;
> >  	}
> > -		
> > +
> >  	/* Hide this loop device for serialization. */
> >  	ret = mutex_lock_killable(&loop_ctl_mutex);
> >  	if (ret)
> 
> Unrelated whitespace change?

Yes, I'll drop it.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 6/9] dm-linear: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-03-28 20:39 ` [RFC 6/9] dm-linear: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
  2024-03-29  0:54   ` Eric Blake
@ 2024-03-31  7:35   ` kernel test robot
  2024-04-03 14:14     ` Stefan Hajnoczi
  1 sibling, 1 reply; 40+ messages in thread
From: kernel test robot @ 2024-03-31  7:35 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: oe-kbuild-all

Hi Stefan,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on shuah-kselftest/next]
[also build test ERROR on shuah-kselftest/fixes device-mapper-dm/for-next axboe-block/for-next linus/master v6.9-rc1 next-20240328]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Stefan-Hajnoczi/block-add-llseek-SEEK_HOLE-SEEK_DATA-support/20240329-044334
base:   https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git next
patch link:    https://lore.kernel.org/r/20240328203910.2370087-7-stefanha%40redhat.com
patch subject: [RFC 6/9] dm-linear: add llseek(SEEK_HOLE/SEEK_DATA) support
config: m68k-allmodconfig (https://download.01.org/0day-ci/archive/20240331/202403311535.lNG6ozhq-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240331/202403311535.lNG6ozhq-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202403311535.lNG6ozhq-lkp@intel.com/

All errors (new ones prefixed by >>, old ones prefixed by <<):

WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-gbphy.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-gpio.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-i2c.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-pwm.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-sdio.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-spi.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-uart.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-usb.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/platform/goldfish/goldfish_pipe.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/platform/chrome/cros_kunit_proto_test.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/mailbox/mtk-cmdq-mailbox.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/devfreq/governor_simpleondemand.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/devfreq/governor_performance.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/devfreq/governor_powersave.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/devfreq/governor_userspace.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/hwtracing/intel_th/intel_th_msu_sink.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvmem/nvmem-apple-efuses.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvmem/nvmem_brcm_nvram.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvmem/nvmem_u-boot-env.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/interconnect/imx/imx-interconnect.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/interconnect/imx/imx8mm-interconnect.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/interconnect/imx/imx8mq-interconnect.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/interconnect/imx/imx8mn-interconnect.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/interconnect/imx/imx8mp-interconnect.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/hte/hte-tegra194-test.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/vdpa/vdpa.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/parport/parport.o
WARNING: modpost: drivers/parport/parport_amiga: section mismatch in reference: amiga_parallel_driver+0x8 (section: .data) -> amiga_parallel_remove (section: .exit.text)
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/mtd/parsers/brcm_u-boot.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/mtd/parsers/tplink_safeloader.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/mtd/chips/cfi_util.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/mtd/chips/cfi_cmdset_0020.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/mtd/maps/map_funcs.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/spmi/hisi-spmi-controller.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/spmi/spmi-pmic-arb.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/uio/uio.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/uio/uio_pruss.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/pcmcia/pcmcia_rsrc.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/pcmcia/i82365.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/hwmon/corsair-cpro.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/hwmon/mr75203.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/vhost/vringh.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/greybus/greybus.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/greybus/gb-es2.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/rpmsg/rpmsg_char.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/iio/adc/ingenic-adc.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/iio/adc/xilinx-ams.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/iio/buffer/kfifo_buf.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/fsi/fsi-core.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/fsi/fsi-master-hub.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/fsi/fsi-master-aspeed.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/fsi/fsi-master-gpio.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/fsi/fsi-master-ast-cf.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/fsi/fsi-scom.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/siox/siox-bus-gpio.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/counter/ftm-quaddec.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/oss/dmasound/dmasound_core.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/oss/dmasound/dmasound_atari.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/oss/dmasound/dmasound_paula.o
WARNING: modpost: sound/oss/dmasound/dmasound_paula: section mismatch in reference: amiga_audio_driver+0x8 (section: .data) -> amiga_audio_remove (section: .exit.text)
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/core/snd-pcm-dmaengine.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/core/sound_kunit.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/drivers/snd-pcmtest.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/pci/hda/snd-hda-cirrus-scodec-test.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/soc-topology-test.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/codecs/snd-soc-ab8500-codec.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/codecs/snd-soc-sigmadsp.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/codecs/snd-soc-wm-adsp.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/fsl/imx-pcm-dma.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/mxs/snd-soc-mxs-pcm.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/qcom/snd-soc-qcom-common.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/qcom/snd-soc-qcom-sdw.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/qcom/qdsp6/snd-q6dsp-common.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/intel/snd-sof-intel-atom.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/intel/snd-sof-acpi-intel-byt.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/intel/snd-sof-acpi-intel-bdw.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/imx/snd-sof-imx8.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/imx/snd-sof-imx8m.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/imx/snd-sof-imx8ulp.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/imx/imx-common.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/mediatek/mtk-adsp-common.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/mediatek/mt8195/snd-sof-mt8195.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/mediatek/mt8186/snd-sof-mt8186.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/snd-sof-utils.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/snd-sof-acpi.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/snd-sof-of.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/xilinx/snd-soc-xlnx-i2s.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/xilinx/snd-soc-xlnx-formatter-pcm.o
WARNING: modpost: missing MODULE_DESCRIPTION() in sound/ac97_bus.o
WARNING: modpost: missing MODULE_DESCRIPTION() in samples/vfio-mdev/mtty.o
WARNING: modpost: missing MODULE_DESCRIPTION() in samples/vfio-mdev/mdpy.o
WARNING: modpost: missing MODULE_DESCRIPTION() in samples/vfio-mdev/mdpy-fb.o
WARNING: modpost: missing MODULE_DESCRIPTION() in samples/vfio-mdev/mbochs.o
WARNING: modpost: missing MODULE_DESCRIPTION() in samples/configfs/configfs_sample.o
WARNING: modpost: missing MODULE_DESCRIPTION() in samples/kfifo/bytestream-example.o
WARNING: modpost: missing MODULE_DESCRIPTION() in samples/kfifo/dma-example.o
WARNING: modpost: missing MODULE_DESCRIPTION() in samples/kfifo/inttype-example.o
WARNING: modpost: missing MODULE_DESCRIPTION() in samples/kfifo/record-example.o
WARNING: modpost: missing MODULE_DESCRIPTION() in samples/kobject/kobject-example.o
WARNING: modpost: missing MODULE_DESCRIPTION() in samples/kobject/kset-example.o
>> ERROR: modpost: "blkdev_seek_hole_data" [drivers/md/dm-mod.ko] undefined!

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-03-28 20:39 [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
                   ` (9 preceding siblings ...)
  2024-03-28 22:16 ` [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support Eric Blake
@ 2024-04-02 12:26 ` Christoph Hellwig
  2024-04-02 13:04   ` Stefan Hajnoczi
  2024-04-02 13:31   ` Eric Blake
  10 siblings, 2 replies; 40+ messages in thread
From: Christoph Hellwig @ 2024-04-02 12:26 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-block, linux-kernel, eblake, Alasdair Kergon,
	Mikulas Patocka, dm-devel, David Teigland, Mike Snitzer,
	Jens Axboe, Christoph Hellwig, Joe Thornber

On Thu, Mar 28, 2024 at 04:39:01PM -0400, Stefan Hajnoczi wrote:
> In the block device world there are similar concepts to holes:
> - SCSI has Logical Block Provisioning where the "mapped" state would be
>   considered data and other states would be considered holes.

But for SCSI (and ATA and NVMe) unmapped/delallocated/etc blocks do
not have to return zeroes.  They could also return some other
initialization pattern pattern.  So they are (unfortunately) not a 1:1
mapping to holes in sparse files.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-04-02 12:26 ` Christoph Hellwig
@ 2024-04-02 13:04   ` Stefan Hajnoczi
  2024-04-05  7:02     ` Christoph Hellwig
  2024-04-02 13:31   ` Eric Blake
  1 sibling, 1 reply; 40+ messages in thread
From: Stefan Hajnoczi @ 2024-04-02 13:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-block, linux-kernel, eblake, Alasdair Kergon,
	Mikulas Patocka, dm-devel, David Teigland, Mike Snitzer,
	Jens Axboe, Joe Thornber

[-- Attachment #1: Type: text/plain, Size: 901 bytes --]

On Tue, Apr 02, 2024 at 02:26:17PM +0200, Christoph Hellwig wrote:
> On Thu, Mar 28, 2024 at 04:39:01PM -0400, Stefan Hajnoczi wrote:
> > In the block device world there are similar concepts to holes:
> > - SCSI has Logical Block Provisioning where the "mapped" state would be
> >   considered data and other states would be considered holes.
> 
> But for SCSI (and ATA and NVMe) unmapped/delallocated/etc blocks do
> not have to return zeroes.  They could also return some other
> initialization pattern pattern.  So they are (unfortunately) not a 1:1
> mapping to holes in sparse files.

Hi Christoph,
There is a 1:1 mapping when when the Logical Block Provisioning Read
Zeroes (LBPRZ) field is set to xx1b in the Logical Block Provisioning
VPD page.

Otherwise SEEK_HOLE/SEEK_DATA has to treat the device as filled with
data because it doesn't know where the holes are.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-04-02 12:26 ` Christoph Hellwig
  2024-04-02 13:04   ` Stefan Hajnoczi
@ 2024-04-02 13:31   ` Eric Blake
  2024-04-05  7:02     ` Christoph Hellwig
  1 sibling, 1 reply; 40+ messages in thread
From: Eric Blake @ 2024-04-02 13:31 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Stefan Hajnoczi, linux-block, linux-kernel, Alasdair Kergon,
	Mikulas Patocka, dm-devel, David Teigland, Mike Snitzer,
	Jens Axboe, Joe Thornber

On Tue, Apr 02, 2024 at 02:26:17PM +0200, Christoph Hellwig wrote:
> On Thu, Mar 28, 2024 at 04:39:01PM -0400, Stefan Hajnoczi wrote:
> > In the block device world there are similar concepts to holes:
> > - SCSI has Logical Block Provisioning where the "mapped" state would be
> >   considered data and other states would be considered holes.
> 
> But for SCSI (and ATA and NVMe) unmapped/delallocated/etc blocks do
> not have to return zeroes.  They could also return some other
> initialization pattern pattern.  So they are (unfortunately) not a 1:1
> mapping to holes in sparse files.

Yes, and Stefan already answered that:

https://lore.kernel.org/dm-devel/e2lcp3n5gpf7zmlpyn4nj7wsr36sffn23z5bmzlsghu6oapi5u@sdkcbpimi5is/t/#m58146a45951ec086966497e179a2b2715692712d

>> SCSI initiators can check the Logical Block Provisioning Read Zeroes
>> (LBPRZ) field to determine whether or not zeroes are guaranteed. The sd
>> driver would only rely on the device when LPBRZ indicates that zeroes
>> will be read. Otherwise the driver would report that the device is
>> filled with data.

As well as my question on whether the community would be open to
introducing new SEEK_* constants to allow orthogonality between
searching for zeroes (known to read as zero, whether or not it was
allocated) vs. sparseness (known to be unallocated, whether or not it
reads as zero), where the existing SEEK_HOLE seeks for both properties
at once.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 3/9] selftests: block_seek_hole: add loop block driver tests
  2024-03-29  0:11   ` Eric Blake
@ 2024-04-03 13:50     ` Stefan Hajnoczi
  0 siblings, 0 replies; 40+ messages in thread
From: Stefan Hajnoczi @ 2024-04-03 13:50 UTC (permalink / raw)
  To: Eric Blake
  Cc: linux-block, linux-kernel, Alasdair Kergon, Mikulas Patocka,
	dm-devel, David Teigland, Mike Snitzer, Jens Axboe,
	Christoph Hellwig, Joe Thornber

[-- Attachment #1: Type: text/plain, Size: 2113 bytes --]

On Thu, Mar 28, 2024 at 07:11:30PM -0500, Eric Blake wrote:
> On Thu, Mar 28, 2024 at 04:39:04PM -0400, Stefan Hajnoczi wrote:
> > Run the tests with:
> > 
> >   $ make TARGETS=block_seek_hole -C tools/selftests run_tests
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  tools/testing/selftests/Makefile              |   1 +
> >  .../selftests/block_seek_hole/Makefile        |  17 +++
> >  .../testing/selftests/block_seek_hole/config  |   1 +
> >  .../selftests/block_seek_hole/map_holes.py    |  37 +++++++
> >  .../testing/selftests/block_seek_hole/test.py | 103 ++++++++++++++++++
> >  5 files changed, 159 insertions(+)
> >  create mode 100644 tools/testing/selftests/block_seek_hole/Makefile
> >  create mode 100644 tools/testing/selftests/block_seek_hole/config
> >  create mode 100755 tools/testing/selftests/block_seek_hole/map_holes.py
> >  create mode 100755 tools/testing/selftests/block_seek_hole/test.py
> > 
> > +++ b/tools/testing/selftests/block_seek_hole/test.py
> 
> > +
> > +# Different data layouts to test
> > +
> > +def data_at_beginning_and_end(f):
> > +    f.write(b'A' * 4 * KB)
> > +    f.seek(256 * MB)
> > +
> > +    f.write(b'B' * 64 * KB)
> > +
> > +    f.seek(1024 * MB - KB)
> > +    f.write(b'C' * KB)
> > +
> > +def holes_at_beginning_and_end(f):
> > +    f.seek(128 * MB)
> > +    f.write(b'A' * 4 * KB)
> > +
> > +    f.seek(512 * MB)
> > +    f.write(b'B' * 64 * KB)
> > +
> > +    f.truncate(1024 * MB)
> > +
> > +def no_holes(f):
> > +    # Just 1 MB so test file generation is quick
> > +    mb = b'A' * MB
> > +    f.write(mb)
> > +
> > +def empty_file(f):
> > +    f.truncate(1024 * MB)
> 
> Is it also worth attempting to test a (necessarily sparse!) file
> larger than 2GiB to prove that we are 64-bit clean, even on a 32-bit
> system where lseek is different than lseek64?  (I honestly have no
> idea if python always uses 64-bit seek even on 32-bit systems,
> although I would be surprised if it were not)

Yes, it wouldn't hurt to add a test case for that. I'll do that.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 3/9] selftests: block_seek_hole: add loop block driver tests
  2024-03-29 12:38   ` Eric Blake
@ 2024-04-03 13:51     ` Stefan Hajnoczi
  0 siblings, 0 replies; 40+ messages in thread
From: Stefan Hajnoczi @ 2024-04-03 13:51 UTC (permalink / raw)
  To: Eric Blake
  Cc: linux-block, linux-kernel, Alasdair Kergon, Mikulas Patocka,
	dm-devel, David Teigland, Mike Snitzer, Jens Axboe,
	Christoph Hellwig, Joe Thornber

[-- Attachment #1: Type: text/plain, Size: 2841 bytes --]

On Fri, Mar 29, 2024 at 07:38:17AM -0500, Eric Blake wrote:
> On Thu, Mar 28, 2024 at 04:39:04PM -0400, Stefan Hajnoczi wrote:
> > Run the tests with:
> > 
> >   $ make TARGETS=block_seek_hole -C tools/selftests run_tests
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  tools/testing/selftests/Makefile              |   1 +
> >  .../selftests/block_seek_hole/Makefile        |  17 +++
> >  .../testing/selftests/block_seek_hole/config  |   1 +
> >  .../selftests/block_seek_hole/map_holes.py    |  37 +++++++
> >  .../testing/selftests/block_seek_hole/test.py | 103 ++++++++++++++++++
> >  5 files changed, 159 insertions(+)
> >  create mode 100644 tools/testing/selftests/block_seek_hole/Makefile
> >  create mode 100644 tools/testing/selftests/block_seek_hole/config
> >  create mode 100755 tools/testing/selftests/block_seek_hole/map_holes.py
> >  create mode 100755 tools/testing/selftests/block_seek_hole/test.py
> > 
> 
> > +
> > +def map_holes(fd):
> > +    end = os.lseek(fd, 0, os.SEEK_END)
> > +    offset = 0
> > +
> > +    print('TYPE START END SIZE')
> > +
> > +    while offset < end:
> > +        contents = 'DATA'
> > +        new_offset = os.lseek(fd, offset, os.SEEK_HOLE)
> > +        if new_offset == offset:
> > +            contents = 'HOLE'
> > +            try:
> > +              new_offset = os.lseek(fd, offset, os.SEEK_DATA)
> > +            except OSError as err:
> > +                if err.errno == errno.ENXIO:
> > +                    new_offset = end
> > +                else:
> > +                    raise err
> > +            assert new_offset != offset
> > +        print(f'{contents} {offset} {new_offset} {new_offset - offset}')
> > +        offset = new_offset
> 
> Over the years, I've seen various SEEK_HOLE implementation bugs where
> things work great on the initial boundary, but fail when requested on
> an offset not aligned to the start of the extent boundary.  It would
> probably be worth enhancing the test to prove that:
> 
> if lseek(fd, offset, SEEK_HOLE) == offset:
>   new_offset = lseek(fd, offset, SEEK_DATA)
>   assert new_offset > offset
>   assert lseek(fd, new_offset - 1, SEEK_HOLE) == new_offset - 1
> else:
>   assert lseek(fd, offset, SEEK_DATA) == offset
>   new_offset = lseek(fd, offset, SEEK_HOLE)
>   assert new_offset > offset
>   assert lseek(fd, new_offset - 1, SEEK_DATA) == new_offset - 1
> 
> Among other things, this would prove that even though block devices
> generally operate on a minimum granularity of a sector, lseek() still
> gives byte-accurate results for a random offset that falls in the
> middle of a sector, and doesn't accidentally round down reporting an
> offset less than the value passed in to the request.

Sure. I'll add a test for that.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 4/9] dm: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-03-29  0:38   ` Eric Blake
@ 2024-04-03 14:11     ` Stefan Hajnoczi
  2024-04-03 17:02       ` Eric Blake
  0 siblings, 1 reply; 40+ messages in thread
From: Stefan Hajnoczi @ 2024-04-03 14:11 UTC (permalink / raw)
  To: Eric Blake
  Cc: linux-block, linux-kernel, Alasdair Kergon, Mikulas Patocka,
	dm-devel, David Teigland, Mike Snitzer, Jens Axboe,
	Christoph Hellwig, Joe Thornber

[-- Attachment #1: Type: text/plain, Size: 4524 bytes --]

On Thu, Mar 28, 2024 at 07:38:20PM -0500, Eric Blake wrote:
> On Thu, Mar 28, 2024 at 04:39:05PM -0400, Stefan Hajnoczi wrote:
> > Delegate SEEK_HOLE/SEEK_DATA to device-mapper targets. The new
> > dm_seek_hole_data() callback allows target types to customize behavior.
> > The default implementation treats the target as all data with no holes.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  include/linux/device-mapper.h |  5 +++
> >  drivers/md/dm.c               | 68 +++++++++++++++++++++++++++++++++++
> >  2 files changed, 73 insertions(+)
> > 
> 
> > +/* Default implementation for targets that do not implement the callback */
> > +static loff_t dm_blk_seek_hole_data_default(loff_t offset, int whence,
> > +		loff_t size)
> > +{
> > +	switch (whence) {
> > +	case SEEK_DATA:
> > +		if ((unsigned long long)offset >= size)
> > +			return -ENXIO;
> > +		return offset;
> > +	case SEEK_HOLE:
> > +		if ((unsigned long long)offset >= size)
> > +			return -ENXIO;
> > +		return size;
> 
> These fail with -ENXIO if offset == size (matching what we do on files)...
> 
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +}
> > +
> > +static loff_t dm_blk_do_seek_hole_data(struct dm_table *table, loff_t offset,
> > +		int whence)
> > +{
> > +	struct dm_target *ti;
> > +	loff_t end;
> > +
> > +	/* Loop when the end of a target is reached */
> > +	do {
> > +		ti = dm_table_find_target(table, offset >> SECTOR_SHIFT);
> > +		if (!ti)
> > +			return whence == SEEK_DATA ? -ENXIO : offset;
> 
> ...but this blindly returns offset for SEEK_HOLE, even when offset is
> beyond the end of the dm.  I think you want 'return -ENXIO;'
> unconditionally here.

If the initial offset is beyond the end of the table, then SEEK_HOLE
should return -ENXIO. I agree that the code doesn't handle this case.

However, returning offset here is correct when there is data at the end
with SEEK_HOLE.

I'll update the code to address the out-of-bounds offset case, perhaps
by checking the initial offset before entering the loop.

> 
> > +
> > +		end = (ti->begin + ti->len) << SECTOR_SHIFT;
> > +
> > +		if (ti->type->seek_hole_data)
> > +			offset = ti->type->seek_hole_data(ti, offset, whence);
> 
> Are we guaranteed that ti->type->seek_hole_data will not return a
> value exceeding end?  Or can dm be used to truncate the view of an
> underlying device, and the underlying seek_hold_data can now return an
> answer beyond where dm_table_find_target should look for the next part
> of the dm's view?

ti->type->seek_hole_data() must not return a value larger than
(ti->begin + ti->len) << SECTOR_SHIFT.

> 
> In which case, should the blkdev_seek_hole_data callback be passed a
> max size parameter everywhere, similar to how fixed_size_llseek does
> things?
> 
> > +		else
> > +			offset = dm_blk_seek_hole_data_default(offset, whence, end);
> > +
> > +		if (whence == SEEK_DATA && offset == -ENXIO)
> > +			offset = end;
> 
> You have a bug here.  If I have a dm contructed of two underlying targets:
> 
> |A  |B  |
> 
> and A is all data, then whence == SEEK_HOLE will have offset = -ENXIO
> at this point, and you fail to check whether B is also data.  That is,
> you have silently treated the rest of the block device as data, which
> is semantically not wrong (as that is always a safe fallback), but not
> optimal.
> 
> I think the correct logic is s/whence == SEEK_DATA &&//.

No, with whence == SEEK_HOLE and an initial offset in A, the new offset
will be (A->begin + A->end) << SECTOR_SHIFT. The loop will iterate and
continue seeking into B.

The if statement you commented on ensures that we also continue looping
with whence == SEEK_DATA, because that would otherwise prematurely end
with the new offset = -ENXIO.

> 
> > +	} while (offset == end);
> 
> I'm trying to make sure that we can never return the equivalent of
> lseek(dm, 0, SEEK_END).  If you make my above suggested changes, we
> will iterate through the do loop once more at EOF, and
> dm_table_find_target() will then fail to match at which point we do
> get the desired -ENXIO for both SEEK_HOLE and SEEK_DATA.

Wait, lseek() is supposed to return the equivalent of lseek(dm, 0,
SEEK_END) when whence == SEEK_HOLE and there is data at the end.

> 
> > +
> > +	return offset;
> > +}
> > +
> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.
> Virtualization:  qemu.org | libguestfs.org
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 6/9] dm-linear: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-03-31  7:35   ` kernel test robot
@ 2024-04-03 14:14     ` Stefan Hajnoczi
  0 siblings, 0 replies; 40+ messages in thread
From: Stefan Hajnoczi @ 2024-04-03 14:14 UTC (permalink / raw)
  To: kernel test robot; +Cc: oe-kbuild-all

[-- Attachment #1: Type: text/plain, Size: 10710 bytes --]

On Sun, Mar 31, 2024 at 03:35:48PM +0800, kernel test robot wrote:
> Hi Stefan,
> 
> [This is a private test report for your RFC patch.]
> kernel test robot noticed the following build errors:
> 
> [auto build test ERROR on shuah-kselftest/next]
> [also build test ERROR on shuah-kselftest/fixes device-mapper-dm/for-next axboe-block/for-next linus/master v6.9-rc1 next-20240328]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Stefan-Hajnoczi/block-add-llseek-SEEK_HOLE-SEEK_DATA-support/20240329-044334
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git next
> patch link:    https://lore.kernel.org/r/20240328203910.2370087-7-stefanha%40redhat.com
> patch subject: [RFC 6/9] dm-linear: add llseek(SEEK_HOLE/SEEK_DATA) support
> config: m68k-allmodconfig (https://download.01.org/0day-ci/archive/20240331/202403311535.lNG6ozhq-lkp@intel.com/config)
> compiler: m68k-linux-gcc (GCC) 13.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240331/202403311535.lNG6ozhq-lkp@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202403311535.lNG6ozhq-lkp@intel.com/
> 
> All errors (new ones prefixed by >>, old ones prefixed by <<):
> 
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-gbphy.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-gpio.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-i2c.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-pwm.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-sdio.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-spi.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-uart.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/staging/greybus/gb-usb.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/platform/goldfish/goldfish_pipe.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/platform/chrome/cros_kunit_proto_test.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/mailbox/mtk-cmdq-mailbox.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/devfreq/governor_simpleondemand.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/devfreq/governor_performance.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/devfreq/governor_powersave.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/devfreq/governor_userspace.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/hwtracing/intel_th/intel_th_msu_sink.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvmem/nvmem-apple-efuses.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvmem/nvmem_brcm_nvram.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvmem/nvmem_u-boot-env.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/interconnect/imx/imx-interconnect.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/interconnect/imx/imx8mm-interconnect.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/interconnect/imx/imx8mq-interconnect.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/interconnect/imx/imx8mn-interconnect.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/interconnect/imx/imx8mp-interconnect.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/hte/hte-tegra194-test.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/vdpa/vdpa.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/parport/parport.o
> WARNING: modpost: drivers/parport/parport_amiga: section mismatch in reference: amiga_parallel_driver+0x8 (section: .data) -> amiga_parallel_remove (section: .exit.text)
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/mtd/parsers/brcm_u-boot.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/mtd/parsers/tplink_safeloader.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/mtd/chips/cfi_util.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/mtd/chips/cfi_cmdset_0020.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/mtd/maps/map_funcs.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/spmi/hisi-spmi-controller.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/spmi/spmi-pmic-arb.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/uio/uio.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/uio/uio_pruss.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/pcmcia/pcmcia_rsrc.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/pcmcia/i82365.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/hwmon/corsair-cpro.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/hwmon/mr75203.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/vhost/vringh.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/greybus/greybus.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/greybus/gb-es2.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/rpmsg/rpmsg_char.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/iio/adc/ingenic-adc.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/iio/adc/xilinx-ams.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/iio/buffer/kfifo_buf.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/fsi/fsi-core.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/fsi/fsi-master-hub.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/fsi/fsi-master-aspeed.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/fsi/fsi-master-gpio.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/fsi/fsi-master-ast-cf.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/fsi/fsi-scom.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/siox/siox-bus-gpio.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/counter/ftm-quaddec.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/oss/dmasound/dmasound_core.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/oss/dmasound/dmasound_atari.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/oss/dmasound/dmasound_paula.o
> WARNING: modpost: sound/oss/dmasound/dmasound_paula: section mismatch in reference: amiga_audio_driver+0x8 (section: .data) -> amiga_audio_remove (section: .exit.text)
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/core/snd-pcm-dmaengine.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/core/sound_kunit.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/drivers/snd-pcmtest.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/pci/hda/snd-hda-cirrus-scodec-test.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/soc-topology-test.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/codecs/snd-soc-ab8500-codec.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/codecs/snd-soc-sigmadsp.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/codecs/snd-soc-wm-adsp.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/fsl/imx-pcm-dma.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/mxs/snd-soc-mxs-pcm.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/qcom/snd-soc-qcom-common.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/qcom/snd-soc-qcom-sdw.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/qcom/qdsp6/snd-q6dsp-common.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/intel/snd-sof-intel-atom.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/intel/snd-sof-acpi-intel-byt.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/intel/snd-sof-acpi-intel-bdw.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/imx/snd-sof-imx8.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/imx/snd-sof-imx8m.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/imx/snd-sof-imx8ulp.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/imx/imx-common.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/mediatek/mtk-adsp-common.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/mediatek/mt8195/snd-sof-mt8195.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/mediatek/mt8186/snd-sof-mt8186.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/snd-sof-utils.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/snd-sof-acpi.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/sof/snd-sof-of.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/xilinx/snd-soc-xlnx-i2s.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/soc/xilinx/snd-soc-xlnx-formatter-pcm.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in sound/ac97_bus.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in samples/vfio-mdev/mtty.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in samples/vfio-mdev/mdpy.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in samples/vfio-mdev/mdpy-fb.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in samples/vfio-mdev/mbochs.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in samples/configfs/configfs_sample.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in samples/kfifo/bytestream-example.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in samples/kfifo/dma-example.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in samples/kfifo/inttype-example.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in samples/kfifo/record-example.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in samples/kobject/kobject-example.o
> WARNING: modpost: missing MODULE_DESCRIPTION() in samples/kobject/kset-example.o
> >> ERROR: modpost: "blkdev_seek_hole_data" [drivers/md/dm-mod.ko] undefined!

I think EXPORT_SYMBOL() is missing. I'll add that in the next revision.

> 
> -- 
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 6/9] dm-linear: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-03-29  0:54   ` Eric Blake
@ 2024-04-03 14:22     ` Stefan Hajnoczi
  0 siblings, 0 replies; 40+ messages in thread
From: Stefan Hajnoczi @ 2024-04-03 14:22 UTC (permalink / raw)
  To: Eric Blake
  Cc: linux-block, linux-kernel, Alasdair Kergon, Mikulas Patocka,
	dm-devel, David Teigland, Mike Snitzer, Jens Axboe,
	Christoph Hellwig, Joe Thornber

[-- Attachment #1: Type: text/plain, Size: 3112 bytes --]

On Thu, Mar 28, 2024 at 07:54:21PM -0500, Eric Blake wrote:
> On Thu, Mar 28, 2024 at 04:39:07PM -0400, Stefan Hajnoczi wrote:
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  drivers/md/dm-linear.c | 25 +++++++++++++++++++++++++
> >  1 file changed, 25 insertions(+)
> > 
> > diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
> > index 2d3e186ca87e3..9b6cdfa4f951d 100644
> > --- a/drivers/md/dm-linear.c
> > +++ b/drivers/md/dm-linear.c
> > @@ -147,6 +147,30 @@ static int linear_report_zones(struct dm_target *ti,
> >  #define linear_report_zones NULL
> >  #endif
> >  
> > +static loff_t linear_seek_hole_data(struct dm_target *ti, loff_t offset,
> > +		int whence)
> > +{
> > +	struct linear_c *lc = ti->private;
> > +	loff_t ti_begin = ti->begin << SECTOR_SHIFT;
> > +	loff_t ti_len = ti->len << SECTOR_SHIFT;
> > +	loff_t bdev_start = lc->start << SECTOR_SHIFT;
> > +	loff_t bdev_offset;
> 
> Okay, given my questions in 4/9, it looks like your intent is that
> each callback for dm_seek_hole_data will obey its own ti-> limits.

Yes, exactly.

> 
> > +
> > +	/* TODO underflow/overflow? */
> > +	bdev_offset = offset - ti_begin + bdev_start;
> > +
> > +	bdev_offset = blkdev_seek_hole_data(lc->dev->bdev, bdev_offset,
> > +					    whence);
> > +	if (bdev_offset < 0)
> > +		return bdev_offset;
> > +
> > +	offset = bdev_offset - bdev_start;
> > +	if (offset >= ti_len)
> > +		return whence == SEEK_DATA ? -ENXIO : ti_begin + ti_len;
> 
> However, this is inconsistent with dm_blk_seek_hole_data_default in
> 4/9; I think you want to unconditionally return -ENXIO here, and let
> the caller figure out when to turn -ENXIO back into end to proceed
> with the next ti in the list.
> 
> OR, you may want to document the semantics that dm_seek_hole_data
> callbacks must NOT return -ENXIO, but always return ti_begin + ti_len
> when the answer (either SEEK_HOLE or SEEK_END) did not lie within the
> current ti - it is DIFFERENT than the semantics for
> blkdev_seek_hole_data, but gets normalized back into the expected
> -ENXIO answer when dm_blk_do_seek_hole_data finally advances past the
> last ti.
> 
> At any rate, I know this is an RFC series, but it goes to show that
> comments will be essential, whichever interface you decide for
> callbacks to honor (both a guarantee that callbacks will only ever see
> SEEK_HOLE/SEEK_DATA in bounds, because earlier points in the call
> stack have filtered out out-of-bounds and SEEK_SET; and constraints on
> what the return value(s) must be for the various callbacks, especially
> if it is different from the eventual return value of the overall
> llseek syscall)

It's easier to understand the code when lseek function has the same
semantics, so I'd rather not customize the semantics for certain lseek
functions.

I'll make sure that the device-mapper targets match the
dm_blk_seek_hole_data_default() behavior. To be honest, I relied on dm.c
always passing offset values that are within the target, but that in
itself is customizing the semantics :).

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 7/9] selftests: block_seek_hole: add dm-linear test
  2024-03-29  0:59   ` Eric Blake
@ 2024-04-03 14:23     ` Stefan Hajnoczi
  0 siblings, 0 replies; 40+ messages in thread
From: Stefan Hajnoczi @ 2024-04-03 14:23 UTC (permalink / raw)
  To: Eric Blake
  Cc: linux-block, linux-kernel, Alasdair Kergon, Mikulas Patocka,
	dm-devel, David Teigland, Mike Snitzer, Jens Axboe,
	Christoph Hellwig, Joe Thornber

[-- Attachment #1: Type: text/plain, Size: 1791 bytes --]

On Thu, Mar 28, 2024 at 07:59:14PM -0500, Eric Blake wrote:
> On Thu, Mar 28, 2024 at 04:39:08PM -0400, Stefan Hajnoczi wrote:
> > The dm-linear linear target passes through SEEK_HOLE/SEEK_DATA. Extend
> > the test case to check that the same holes/data are reported as for the
> > underlying file.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  tools/testing/selftests/block_seek_hole/test.py | 16 +++++++++++++++-
> >  1 file changed, 15 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/testing/selftests/block_seek_hole/test.py b/tools/testing/selftests/block_seek_hole/test.py
> > index 4f7c2d01ab3d3..6360b72aee338 100755
> > --- a/tools/testing/selftests/block_seek_hole/test.py
> > +++ b/tools/testing/selftests/block_seek_hole/test.py
> > @@ -45,6 +45,20 @@ def loop_device(file_path):
> >      finally:
> >          run(['losetup', '-d', loop_path])
> >  
> > +@contextmanager
> > +def dm_linear(file_path):
> > +    file_size = os.path.getsize(file_path)
> > +
> > +    with loop_device(file_path) as loop_path:
> > +        dm_name = f'test-{os.getpid()}'
> > +        run(['dmsetup', 'create', dm_name, '--table',
> > +             f'0 {file_size // 512} linear {loop_path} 0'])
> 
> Would it be worth tryiing to create the dm with two copies of
> loop_path concatenated one after the other?  You'd have to do more
> work on expected output (coalescing adjacent data or holes between the
> tail of the first copy and the head of the second), but without that
> in place, I worry that you are missing logic bugs for when there is
> more than one table in the overall dm (as evidenced by my review in
> 4/9).

Yes, I agree that more tests are needed to cover transitions between
adjacent targets.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 8/9] dm thin: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-03-29  1:31   ` Eric Blake
@ 2024-04-03 15:03     ` Stefan Hajnoczi
  0 siblings, 0 replies; 40+ messages in thread
From: Stefan Hajnoczi @ 2024-04-03 15:03 UTC (permalink / raw)
  To: Eric Blake
  Cc: linux-block, linux-kernel, Alasdair Kergon, Mikulas Patocka,
	dm-devel, David Teigland, Mike Snitzer, Jens Axboe,
	Christoph Hellwig, Joe Thornber

[-- Attachment #1: Type: text/plain, Size: 4107 bytes --]

On Thu, Mar 28, 2024 at 08:31:21PM -0500, Eric Blake wrote:
> On Thu, Mar 28, 2024 at 04:39:09PM -0400, Stefan Hajnoczi wrote:
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > Open issues:
> > - Locking?
> > - thin_seek_hole_data() does not run as a bio or request. This patch
> >   assumes dm_thin_find_mapped_range() synchronously performs I/O if
> >   metadata needs to be loaded from disk. Is that a valid assumption?
> > ---
> >  drivers/md/dm-thin.c | 77 ++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 77 insertions(+)
> > 
> > diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> > index 4793ad2aa1f7e..3c5dc4f0fe8a3 100644
> > --- a/drivers/md/dm-thin.c
> > +++ b/drivers/md/dm-thin.c
> > @@ -4501,6 +4501,82 @@ static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
> >  	}
> >  }
> >  
> > +static dm_block_t loff_to_block(struct pool *pool, loff_t offset)
> > +{
> > +	sector_t offset_sectors = offset >> SECTOR_SHIFT;
> > +	dm_block_t ret;
> > +
> > +	if (block_size_is_power_of_two(pool))
> > +		ret = offset_sectors >> pool->sectors_per_block_shift;
> > +	else {
> > +		ret = offset_sectors;
> > +		(void) sector_div(ret, pool->sectors_per_block);
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static loff_t block_to_loff(struct pool *pool, dm_block_t block)
> > +{
> > +	return block_to_sectors(pool, block) << SECTOR_SHIFT;
> > +}
> > +
> > +static loff_t thin_seek_hole_data(struct dm_target *ti, loff_t offset,
> > +		int whence)
> > +{
> > +	struct thin_c *tc = ti->private;
> > +	struct dm_thin_device *td = tc->td;
> > +	struct pool *pool = tc->pool;
> > +	dm_block_t begin;
> > +	dm_block_t end;
> > +	dm_block_t mapped_begin;
> > +	dm_block_t mapped_end;
> > +	dm_block_t pool_begin;
> > +	bool maybe_shared;
> > +	int ret;
> > +
> > +	/* TODO locking? */
> > +
> > +	if (block_size_is_power_of_two(pool))
> > +		end = ti->len >> pool->sectors_per_block_shift;
> > +	else {
> > +		end = ti->len;
> > +		(void) sector_div(end, pool->sectors_per_block);
> > +	}
> > +
> > +	offset -= ti->begin << SECTOR_SHIFT;
> > +
> > +	while (true) {
> > +		begin = loff_to_block(pool, offset);
> > +		ret = dm_thin_find_mapped_range(td, begin, end,
> > +						&mapped_begin, &mapped_end,
> > +						&pool_begin, &maybe_shared);
> > +		if (ret == -ENODATA) {
> > +			if (whence == SEEK_DATA)
> > +				return -ENXIO;
> > +			break;
> > +		} else if (ret < 0) {
> > +			/* TODO handle EWOULDBLOCK? */
> > +			return -ENXIO;
> 
> This should probably be -EIO, not -ENXIO.

Yes. XFS also returns -EIO, so I guess it's okay to do so.

I still need to get to the bottom of whether calling
dm_thin_find_mapped_range() is sane here and what to do when/if it
returns EWOULDBLOCK.

> > +		}
> > +
> > +		/* SEEK_DATA finishes here... */
> > +		if (whence == SEEK_DATA) {
> > +			if (mapped_begin != begin)
> > +				offset = block_to_loff(pool, mapped_begin);
> > +			break;
> > +		}
> > +
> > +		/* ...while SEEK_HOLE may need to look further */
> > +		if (mapped_begin != begin)
> > +			break; /* offset is in a hole */
> > +
> > +		offset = block_to_loff(pool, mapped_end);
> > +	}
> > +
> > +	return offset + (ti->begin << SECTOR_SHIFT);
> 
> It's hard to follow, but I'm fairly certain that if whence ==
> SEEK_HOLE, you end up returning ti->begin + ti->len instead of -ENXIO
> if the range from begin to end is fully mapped; which is inconsistent
> with the semantics you have in 4/9 (although in 6/9 I argue that
> having all of the dm callbacks return ti->begin + ti->len instead of
> -ENXIO might make logic easier for iterating through consecutive ti,
> and then convert to -ENXIO only in the caller).

Returning (ti->begin + ti->len) << SECTOR_SHIFT for SEEK_HOLE when there
is data at the end of the target is intentional. This matches the
semantics of lseek().

I agree there is adjustment necessary in dm.c, but I want to seek the
semantics of all lseek() functions identical to avoid confusion.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 4/9] dm: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-04-03 14:11     ` Stefan Hajnoczi
@ 2024-04-03 17:02       ` Eric Blake
  2024-04-03 17:58         ` Stefan Hajnoczi
  0 siblings, 1 reply; 40+ messages in thread
From: Eric Blake @ 2024-04-03 17:02 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-block, linux-kernel, Alasdair Kergon, Mikulas Patocka,
	dm-devel, David Teigland, Mike Snitzer, Jens Axboe,
	Christoph Hellwig, Joe Thornber

On Wed, Apr 03, 2024 at 10:11:47AM -0400, Stefan Hajnoczi wrote:
...
> > > +static loff_t dm_blk_do_seek_hole_data(struct dm_table *table, loff_t offset,
> > > +		int whence)
> > > +{
> > > +	struct dm_target *ti;
> > > +	loff_t end;
> > > +
> > > +	/* Loop when the end of a target is reached */
> > > +	do {
> > > +		ti = dm_table_find_target(table, offset >> SECTOR_SHIFT);
> > > +		if (!ti)
> > > +			return whence == SEEK_DATA ? -ENXIO : offset;
> > 
> > ...but this blindly returns offset for SEEK_HOLE, even when offset is
> > beyond the end of the dm.  I think you want 'return -ENXIO;'
> > unconditionally here.
> 
> If the initial offset is beyond the end of the table, then SEEK_HOLE
> should return -ENXIO. I agree that the code doesn't handle this case.
> 
> However, returning offset here is correct when there is data at the end
> with SEEK_HOLE.
> 
> I'll update the code to address the out-of-bounds offset case, perhaps
> by checking the initial offset before entering the loop.

You are correct that if we are on the second loop iteration of
SEEK_HOLE (because the first iteration saw all data), then we have
found the offset of the start of a hole and should return that offset,
not -ENXIO.  This may be a case where we just have to be careful on
whether the initial pass might have any corner cases different from
later times through the loop, and that we end the loop with correct
results for both SEEK_HOLE and SEEK_DATA.

> 
> > 
> > > +
> > > +		end = (ti->begin + ti->len) << SECTOR_SHIFT;
> > > +
> > > +		if (ti->type->seek_hole_data)
> > > +			offset = ti->type->seek_hole_data(ti, offset, whence);
> > 
> > Are we guaranteed that ti->type->seek_hole_data will not return a
> > value exceeding end?  Or can dm be used to truncate the view of an
> > underlying device, and the underlying seek_hold_data can now return an
> > answer beyond where dm_table_find_target should look for the next part
> > of the dm's view?
> 
> ti->type->seek_hole_data() must not return a value larger than
> (ti->begin + ti->len) << SECTOR_SHIFT.

Worth adding as documentation then.

> 
> > 
> > In which case, should the blkdev_seek_hole_data callback be passed a
> > max size parameter everywhere, similar to how fixed_size_llseek does
> > things?
> > 
> > > +		else
> > > +			offset = dm_blk_seek_hole_data_default(offset, whence, end);
> > > +
> > > +		if (whence == SEEK_DATA && offset == -ENXIO)
> > > +			offset = end;
> > 
> > You have a bug here.  If I have a dm contructed of two underlying targets:
> > 
> > |A  |B  |
> > 
> > and A is all data, then whence == SEEK_HOLE will have offset = -ENXIO
> > at this point, and you fail to check whether B is also data.  That is,
> > you have silently treated the rest of the block device as data, which
> > is semantically not wrong (as that is always a safe fallback), but not
> > optimal.
> > 
> > I think the correct logic is s/whence == SEEK_DATA &&//.
> 
> No, with whence == SEEK_HOLE and an initial offset in A, the new offset
> will be (A->begin + A->end) << SECTOR_SHIFT. The loop will iterate and
> continue seeking into B.
> 
> The if statement you commented on ensures that we also continue looping
> with whence == SEEK_DATA, because that would otherwise prematurely end
> with the new offset = -ENXIO.
> 
> > 
> > > +	} while (offset == end);
> > 
> > I'm trying to make sure that we can never return the equivalent of
> > lseek(dm, 0, SEEK_END).  If you make my above suggested changes, we
> > will iterate through the do loop once more at EOF, and
> > dm_table_find_target() will then fail to match at which point we do
> > get the desired -ENXIO for both SEEK_HOLE and SEEK_DATA.
> 
> Wait, lseek() is supposed to return the equivalent of lseek(dm, 0,
> SEEK_END) when whence == SEEK_HOLE and there is data at the end.

It was confusing enough for me to write my initial review, I apologize
if I'm making it harder for you.  Yes, we want to ensure that:

off1 = lseek(fd, -1, SEEK_END);
off2 = off1 + 1; // == lseek(fd, 0, SEEK_END)

if off1 belongs to a data extent:
  - lseek(fd, off1, SEEK_DATA) == off1
  - lseek(fd, off1, SEEK_HOLE) == off2
  - lseek(fd, off2, SEEK_DATA) == -ENXIO
  - lseek(fd, off2, SEEK_HOLE) == -ENXIO

if off1 belongs to a hole:
  - lseek(fd, off1, SEEK_DATA) == -ENXIO
  - lseek(fd, off1, SEEK_HOLE) == off1
  - lseek(fd, off2, SEEK_DATA) == -ENXIO
  - lseek(fd, off2, SEEK_HOLE) == -ENXIO

Anything in my wall of text from the earlier message inconsistent with
this table can be ignored; but at the same time, I was not able to
quickly convince myself that your code properly had those properties,
even after writing up the table.

Reiterating what I said elsewhere, it may be smarter to document that
for callbacks, it is wiser to require intermediate behavior that the
input value 'offset' is always between the half-open range
[ti->begin<<SECTOR_SHIFT, (ti->begin+ti->len)<<SECTOR_SHIFT), and on
success, the output must be in the fully-closed range [offset,
(ti->begin+ti->len)<<SECTOR_SHIFT], errors like -EIO are permitted but
-ENXIO should not be returned; and let the caller worry about
synthesizing -ENXIO from that (since the caller knows whether or not
there is a successor ti where adjacency concerns come into play).

That is, we can never pass in off2 (beyond the bounds of the table),
and when passing in off1, I think this interface may be easier to work
with in the intermediate layers, even though it differs from the
lseek() interface above.  For off1 in data:
  - dm_blk_do_seek_hole_data(dm, off1, SEEK_DATA) == off1
  - dm_blk_do_seek_hole_data(dm, off1, SEEK_HOLE) == off2
and for a hole:
  - dm_blk_do_seek_hole_data(dm, off1, SEEK_DATA) == off2
  - dm_blk_do_seek_hole_data(dm, off1, SEEK_HOLE) == off1

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 4/9] dm: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-04-03 17:02       ` Eric Blake
@ 2024-04-03 17:58         ` Stefan Hajnoczi
  2024-04-03 19:28           ` Eric Blake
  0 siblings, 1 reply; 40+ messages in thread
From: Stefan Hajnoczi @ 2024-04-03 17:58 UTC (permalink / raw)
  To: Eric Blake
  Cc: linux-block, linux-kernel, Alasdair Kergon, Mikulas Patocka,
	dm-devel, David Teigland, Mike Snitzer, Jens Axboe,
	Christoph Hellwig, Joe Thornber

[-- Attachment #1: Type: text/plain, Size: 6609 bytes --]

On Wed, Apr 03, 2024 at 12:02:19PM -0500, Eric Blake wrote:
> On Wed, Apr 03, 2024 at 10:11:47AM -0400, Stefan Hajnoczi wrote:
> ...
> > > > +static loff_t dm_blk_do_seek_hole_data(struct dm_table *table, loff_t offset,
> > > > +		int whence)
> > > > +{
> > > > +	struct dm_target *ti;
> > > > +	loff_t end;
> > > > +
> > > > +	/* Loop when the end of a target is reached */
> > > > +	do {
> > > > +		ti = dm_table_find_target(table, offset >> SECTOR_SHIFT);
> > > > +		if (!ti)
> > > > +			return whence == SEEK_DATA ? -ENXIO : offset;
> > > 
> > > ...but this blindly returns offset for SEEK_HOLE, even when offset is
> > > beyond the end of the dm.  I think you want 'return -ENXIO;'
> > > unconditionally here.
> > 
> > If the initial offset is beyond the end of the table, then SEEK_HOLE
> > should return -ENXIO. I agree that the code doesn't handle this case.
> > 
> > However, returning offset here is correct when there is data at the end
> > with SEEK_HOLE.
> > 
> > I'll update the code to address the out-of-bounds offset case, perhaps
> > by checking the initial offset before entering the loop.
> 
> You are correct that if we are on the second loop iteration of
> SEEK_HOLE (because the first iteration saw all data), then we have
> found the offset of the start of a hole and should return that offset,
> not -ENXIO.  This may be a case where we just have to be careful on
> whether the initial pass might have any corner cases different from
> later times through the loop, and that we end the loop with correct
> results for both SEEK_HOLE and SEEK_DATA.
> 
> > 
> > > 
> > > > +
> > > > +		end = (ti->begin + ti->len) << SECTOR_SHIFT;
> > > > +
> > > > +		if (ti->type->seek_hole_data)
> > > > +			offset = ti->type->seek_hole_data(ti, offset, whence);
> > > 
> > > Are we guaranteed that ti->type->seek_hole_data will not return a
> > > value exceeding end?  Or can dm be used to truncate the view of an
> > > underlying device, and the underlying seek_hold_data can now return an
> > > answer beyond where dm_table_find_target should look for the next part
> > > of the dm's view?
> > 
> > ti->type->seek_hole_data() must not return a value larger than
> > (ti->begin + ti->len) << SECTOR_SHIFT.
> 
> Worth adding as documentation then.
> 
> > 
> > > 
> > > In which case, should the blkdev_seek_hole_data callback be passed a
> > > max size parameter everywhere, similar to how fixed_size_llseek does
> > > things?
> > > 
> > > > +		else
> > > > +			offset = dm_blk_seek_hole_data_default(offset, whence, end);
> > > > +
> > > > +		if (whence == SEEK_DATA && offset == -ENXIO)
> > > > +			offset = end;
> > > 
> > > You have a bug here.  If I have a dm contructed of two underlying targets:
> > > 
> > > |A  |B  |
> > > 
> > > and A is all data, then whence == SEEK_HOLE will have offset = -ENXIO
> > > at this point, and you fail to check whether B is also data.  That is,
> > > you have silently treated the rest of the block device as data, which
> > > is semantically not wrong (as that is always a safe fallback), but not
> > > optimal.
> > > 
> > > I think the correct logic is s/whence == SEEK_DATA &&//.
> > 
> > No, with whence == SEEK_HOLE and an initial offset in A, the new offset
> > will be (A->begin + A->end) << SECTOR_SHIFT. The loop will iterate and
> > continue seeking into B.
> > 
> > The if statement you commented on ensures that we also continue looping
> > with whence == SEEK_DATA, because that would otherwise prematurely end
> > with the new offset = -ENXIO.
> > 
> > > 
> > > > +	} while (offset == end);
> > > 
> > > I'm trying to make sure that we can never return the equivalent of
> > > lseek(dm, 0, SEEK_END).  If you make my above suggested changes, we
> > > will iterate through the do loop once more at EOF, and
> > > dm_table_find_target() will then fail to match at which point we do
> > > get the desired -ENXIO for both SEEK_HOLE and SEEK_DATA.
> > 
> > Wait, lseek() is supposed to return the equivalent of lseek(dm, 0,
> > SEEK_END) when whence == SEEK_HOLE and there is data at the end.
> 
> It was confusing enough for me to write my initial review, I apologize
> if I'm making it harder for you.

No worries, if my code is hard to understand I can learn from your
feedback.

> Yes, we want to ensure that:
> 
> off1 = lseek(fd, -1, SEEK_END);
> off2 = off1 + 1; // == lseek(fd, 0, SEEK_END)
> 
> if off1 belongs to a data extent:
>   - lseek(fd, off1, SEEK_DATA) == off1
>   - lseek(fd, off1, SEEK_HOLE) == off2
>   - lseek(fd, off2, SEEK_DATA) == -ENXIO
>   - lseek(fd, off2, SEEK_HOLE) == -ENXIO

Agreed.

> if off1 belongs to a hole:
>   - lseek(fd, off1, SEEK_DATA) == -ENXIO
>   - lseek(fd, off1, SEEK_HOLE) == off1
>   - lseek(fd, off2, SEEK_DATA) == -ENXIO
>   - lseek(fd, off2, SEEK_HOLE) == -ENXIO

Agreed.

> 
> Anything in my wall of text from the earlier message inconsistent with
> this table can be ignored; but at the same time, I was not able to
> quickly convince myself that your code properly had those properties,
> even after writing up the table.
> 
> Reiterating what I said elsewhere, it may be smarter to document that
> for callbacks, it is wiser to require intermediate behavior that the
> input value 'offset' is always between the half-open range
> [ti->begin<<SECTOR_SHIFT, (ti->begin+ti->len)<<SECTOR_SHIFT), and on
> success, the output must be in the fully-closed range [offset,
> (ti->begin+ti->len)<<SECTOR_SHIFT], errors like -EIO are permitted but
> -ENXIO should not be returned; and let the caller worry about
> synthesizing -ENXIO from that (since the caller knows whether or not
> there is a successor ti where adjacency concerns come into play).
> 
> That is, we can never pass in off2 (beyond the bounds of the table),
> and when passing in off1, I think this interface may be easier to work
> with in the intermediate layers, even though it differs from the
> lseek() interface above.  For off1 in data:
>   - dm_blk_do_seek_hole_data(dm, off1, SEEK_DATA) == off1
>   - dm_blk_do_seek_hole_data(dm, off1, SEEK_HOLE) == off2
> and for a hole:
>   - dm_blk_do_seek_hole_data(dm, off1, SEEK_DATA) == off2
>   - dm_blk_do_seek_hole_data(dm, off1, SEEK_HOLE) == off1

I'll take a look again starting from block/fops.c, through dm.c, and
into dm-linear.c to see how to make things clearest. Although I would
like to have the same semantics for every seek function, maybe in the
end your suggestion will make the code clearer. Let's see.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 4/9] dm: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-04-03 17:58         ` Stefan Hajnoczi
@ 2024-04-03 19:28           ` Eric Blake
  0 siblings, 0 replies; 40+ messages in thread
From: Eric Blake @ 2024-04-03 19:28 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-block, linux-kernel, Alasdair Kergon, Mikulas Patocka,
	dm-devel, David Teigland, Mike Snitzer, Jens Axboe,
	Christoph Hellwig, Joe Thornber

On Wed, Apr 03, 2024 at 01:58:38PM -0400, Stefan Hajnoczi wrote:
> > Yes, we want to ensure that:
> > 
> > off1 = lseek(fd, -1, SEEK_END);
> > off2 = off1 + 1; // == lseek(fd, 0, SEEK_END)
> > 
> > if off1 belongs to a data extent:
> >   - lseek(fd, off1, SEEK_DATA) == off1
> >   - lseek(fd, off1, SEEK_HOLE) == off2
> >   - lseek(fd, off2, SEEK_DATA) == -ENXIO
> >   - lseek(fd, off2, SEEK_HOLE) == -ENXIO
> 
> Agreed.
> 
> > if off1 belongs to a hole:
> >   - lseek(fd, off1, SEEK_DATA) == -ENXIO
> >   - lseek(fd, off1, SEEK_HOLE) == off1
> >   - lseek(fd, off2, SEEK_DATA) == -ENXIO
> >   - lseek(fd, off2, SEEK_HOLE) == -ENXIO
> 
> Agreed.
> 
...
> > 
> > That is, we can never pass in off2 (beyond the bounds of the table),
> > and when passing in off1, I think this interface may be easier to work
> > with in the intermediate layers, even though it differs from the
> > lseek() interface above.  For off1 in data:
> >   - dm_blk_do_seek_hole_data(dm, off1, SEEK_DATA) == off1
> >   - dm_blk_do_seek_hole_data(dm, off1, SEEK_HOLE) == off2
> > and for a hole:
> >   - dm_blk_do_seek_hole_data(dm, off1, SEEK_DATA) == off2
> >   - dm_blk_do_seek_hole_data(dm, off1, SEEK_HOLE) == off1
>

In the caller, we already need to know both off1 and off2 (that is,
the offset we are asking about, AND the offset at which the underlying
component ends its map at, since that is where we are then in charge
of whether to concatenate that with the next component or give the
overall result).

If we already guarantee that we never call into a lower layer with
off2 (because it was beyond bounds), then the only difference between
the two semantics is whether SEEK_DATA in a final hole returns -ENXIO
or off2; and it looks like we can do a conversion from either style
into the other.

So designing the caller logic, it looks like I would want:

if off1 >= EOF return -ENXIO (out of bounds)

while off1 < EOF:

  determine off2 of current lower region
  at this point, we are guaranteed off1 < off2
  we also know that off2 < EOF unless we are on last lower region
  call result=lower_layer(off1, SEEK_X)
  it is a bug if result is non-negative but < off1, or if result > off2

  if result == off1, return result (we are already in the right HOLE
  or DATA)

  if result > off1 and < off2, return result (we had to advance to get
  to the right region, but the right region was within the lower
  layer, and we don't need to inspect further layers)

  Note - the above two paragraphs can be collapsed into one: if result
  < off2, return result (the current subregion gave us an adequate
  answer)

  if result == off2, continue to the next region with off1=result (in
  first semantics, this is only possible for SEEK_HOLE for a lower
  region ending in data; in the second semantics, it is possible for
  either SEEK_HOLE or SEEK_DATA for a lower region ending in the type
  opposite of the request)

  if result == -ENXIO, continue to the next region by using off1=off2
  (only possible in the first semantics for SEEK_DATA on a lower
  region ending in a hole)

  any other result is necessarily a negative errno like -EIO; return it

if the loop completes without an early return, we are out of lower
regions, and we should be left with off1 == EOF.  Because we advanced,
we know now to:
return whence == SEEK_HOLE ? off1 : -ENXIO

> I'll take a look again starting from block/fops.c, through dm.c, and
> into dm-linear.c to see how to make things clearest. Although I would
> like to have the same semantics for every seek function, maybe in the
> end your suggestion will make the code clearer. Let's see.

Keeping lseek semantics may require a couple more lines of code
duplicated at each layer, but less maintainer headaches in the long
run of converting between slightly different semantics depending on
which layer you are at.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-04-02 13:04   ` Stefan Hajnoczi
@ 2024-04-05  7:02     ` Christoph Hellwig
  0 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2024-04-05  7:02 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Christoph Hellwig, linux-block, linux-kernel, eblake,
	Alasdair Kergon, Mikulas Patocka, dm-devel, David Teigland,
	Mike Snitzer, Jens Axboe, Joe Thornber

On Tue, Apr 02, 2024 at 09:04:46AM -0400, Stefan Hajnoczi wrote:
> Hi Christoph,
> There is a 1:1 mapping when when the Logical Block Provisioning Read
> Zeroes (LBPRZ) field is set to xx1b in the Logical Block Provisioning
> VPD page.

Yes.  NVMe also has a similar field, but ATA does not.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support
  2024-04-02 13:31   ` Eric Blake
@ 2024-04-05  7:02     ` Christoph Hellwig
  0 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2024-04-05  7:02 UTC (permalink / raw)
  To: Eric Blake
  Cc: Christoph Hellwig, Stefan Hajnoczi, linux-block, linux-kernel,
	Alasdair Kergon, Mikulas Patocka, dm-devel, David Teigland,
	Mike Snitzer, Jens Axboe, Joe Thornber

On Tue, Apr 02, 2024 at 08:31:09AM -0500, Eric Blake wrote:
> As well as my question on whether the community would be open to
> introducing new SEEK_* constants to allow orthogonality between
> searching for zeroes (known to read as zero, whether or not it was
> allocated) vs. sparseness (known to be unallocated, whether or not it
> reads as zero), where the existing SEEK_HOLE seeks for both properties
> at once.

That seems like quite an effort.  Is is worth it?


^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2024-04-05  7:03 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-28 20:39 [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
2024-03-28 20:39 ` [RFC 1/9] " Stefan Hajnoczi
2024-03-28 23:50   ` Eric Blake
2024-03-28 20:39 ` [RFC 2/9] loop: " Stefan Hajnoczi
2024-03-29  0:00   ` Eric Blake
2024-03-29 12:54     ` Stefan Hajnoczi
2024-03-28 20:39 ` [RFC 3/9] selftests: block_seek_hole: add loop block driver tests Stefan Hajnoczi
2024-03-29  0:11   ` Eric Blake
2024-04-03 13:50     ` Stefan Hajnoczi
2024-03-29 12:38   ` Eric Blake
2024-04-03 13:51     ` Stefan Hajnoczi
2024-03-28 20:39 ` [RFC 4/9] dm: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
2024-03-29  0:38   ` Eric Blake
2024-04-03 14:11     ` Stefan Hajnoczi
2024-04-03 17:02       ` Eric Blake
2024-04-03 17:58         ` Stefan Hajnoczi
2024-04-03 19:28           ` Eric Blake
2024-03-28 20:39 ` [RFC 5/9] selftests: block_seek_hole: add dm-zero test Stefan Hajnoczi
2024-03-28 22:19   ` Eric Blake
2024-03-28 22:32     ` Stefan Hajnoczi
2024-03-28 20:39 ` [RFC 6/9] dm-linear: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
2024-03-29  0:54   ` Eric Blake
2024-04-03 14:22     ` Stefan Hajnoczi
2024-03-31  7:35   ` kernel test robot
2024-04-03 14:14     ` Stefan Hajnoczi
2024-03-28 20:39 ` [RFC 7/9] selftests: block_seek_hole: add dm-linear test Stefan Hajnoczi
2024-03-29  0:59   ` Eric Blake
2024-04-03 14:23     ` Stefan Hajnoczi
2024-03-28 20:39 ` [RFC 8/9] dm thin: add llseek(SEEK_HOLE/SEEK_DATA) support Stefan Hajnoczi
2024-03-29  1:31   ` Eric Blake
2024-04-03 15:03     ` Stefan Hajnoczi
2024-03-28 20:39 ` [RFC 9/9] selftests: block_seek_hole: add dm-thin test Stefan Hajnoczi
2024-03-28 22:16 ` [RFC 0/9] block: add llseek(SEEK_HOLE/SEEK_DATA) support Eric Blake
2024-03-28 22:29   ` Eric Blake
2024-03-28 23:09   ` Stefan Hajnoczi
2024-04-02 12:26 ` Christoph Hellwig
2024-04-02 13:04   ` Stefan Hajnoczi
2024-04-05  7:02     ` Christoph Hellwig
2024-04-02 13:31   ` Eric Blake
2024-04-05  7:02     ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.