All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe
@ 2023-06-29 15:54 David Howells
  2023-06-29 15:54 ` [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns David Howells
                   ` (6 more replies)
  0 siblings, 7 replies; 32+ messages in thread
From: David Howells @ 2023-06-29 15:54 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Matthew Wilcox, Dave Chinner, Matt Whitlock,
	Linus Torvalds, Jens Axboe, linux-fsdevel, linux-mm,
	linux-kernel

Due to the way splice() and vmsplice() currently splice active pages from
the pagecache or process VM into the intermediary pipe, changes to the data
in those pages can occur whilst they're held in the pipe by such as
write(), writing through a shared-writable mmap or using fallocate() to
mangle the file[1] change the data.

Matt Whitlock, Matthew Wilcox and Dave Chinner are of the opinion that data
in the pipe must not be seen to change and that if it does, this is a bug.
Apart from in one specific instance (vmsplice() with SPLICE_F_GIFT), the
manual pages agree with them.  I'm more inclined to adjust the
documentation since the behaviour we have has been that way since 2005, I
think.

These patches attempt to fix this by stealing a page if possible and
copying the data if not before splice() or vmsplice() adds it to the pipe.

Whilst this does allow the code to be somewhat simplified, it also results
in a loss of performance: stolen pages have to be reloaded in accessed
again; more data has to be copied.

Ideally, this should result in all pages in the pipe belonging solely to
the pipe and so they can be removed from the pipe and spliced into
pagecaches or process VM immediately with no further checking required.

Note that this conversion is incomplete.  It does not simplify fuse and
virtio_console and it does not clean up the splicing into pipes from
relayfs, watch_queue and sockets.

There's also a bug in the vmsplice() page stealing.  It mostly works but
after splicing a bunch of pages, it will oops somewhere in the interval
tree's macros.

I've pushed the patches here also:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=splice-fix-corruption

David

Link: https://lore.kernel.org/r/ec804f26-fa76-4fbe-9b1c-8fbbd829b735@mattwhitlock.name/ [1]

David Howells (4):
  splice: Fix corruption of spliced data after splice() returns
  splice: Make vmsplice() steal or copy
  splice: Remove some now-unused bits
  splice: Record some statistics

 fs/fuse/dev.c             |  37 -----
 fs/pipe.c                 |  12 --
 fs/splice.c               | 304 ++++++++++++++++++--------------------
 include/linux/pipe_fs_i.h |  14 --
 include/linux/splice.h    |   4 +-
 mm/filemap.c              |  98 +++++++++++-
 mm/internal.h             |   4 +-
 mm/shmem.c                |   8 +-
 8 files changed, 245 insertions(+), 236 deletions(-)


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns
  2023-06-29 15:54 [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe David Howells
@ 2023-06-29 15:54 ` David Howells
  2023-07-19 10:17   ` Miklos Szeredi
  2023-06-29 15:54 ` [RFC PATCH 2/4] splice: Make vmsplice() steal or copy David Howells
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 32+ messages in thread
From: David Howells @ 2023-06-29 15:54 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Matthew Wilcox, Dave Chinner, Matt Whitlock,
	Linus Torvalds, Jens Axboe, linux-fsdevel, linux-mm,
	linux-kernel, Christoph Hellwig, linux-fsdevel

Splicing data from, say, a file into a pipe currently leaves the source
pages in the pipe after splice() returns - but this means that those pages
can be subsequently modified by shared-writable mmap(), write(),
fallocate(), etc. before they're consumed.

Fix this by stealing the pages in splice() before they're added to the pipe
if no one else is using them or has them mapped and copying them otherwise.

Reported-by: Matt Whitlock <kernel@mattwhitlock.name>
Link: https://lore.kernel.org/r/ec804f26-fa76-4fbe-9b1c-8fbbd829b735@mattwhitlock.name/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Dave Chinner <david@fromorbit.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: linux-fsdevel@vger.kernel.org
---
 mm/filemap.c  | 92 ++++++++++++++++++++++++++++++++++++++++++++++++---
 mm/internal.h |  4 +--
 mm/shmem.c    |  8 +++--
 3 files changed, 95 insertions(+), 9 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 9e44a49bbd74..a002df515966 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2838,15 +2838,87 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 }
 EXPORT_SYMBOL(generic_file_read_iter);
 
+static inline void copy_folio_to_folio(struct folio *src, size_t src_offset,
+				       struct folio *dst, size_t dst_offset,
+				       size_t size)
+{
+	void *p, *q;
+
+	while (size > 0) {
+		size_t part = min3(PAGE_SIZE - src_offset % PAGE_SIZE,
+				   PAGE_SIZE - dst_offset % PAGE_SIZE,
+				   size);
+
+		p = kmap_local_folio(src, src_offset);
+		q = kmap_local_folio(dst, dst_offset);
+		memcpy(q, p, part);
+		kunmap_local(p);
+		kunmap_local(q);
+		src_offset += part;
+		dst_offset += part;
+		size -= part;
+	}
+}
+
 /*
- * Splice subpages from a folio into a pipe.
+ * Splice data from a folio into a pipe.  The folio is stolen if no one else is
+ * using it and copied otherwise.  We can't put the folio into the pipe still
+ * attached to the pagecache as that allows someone to modify it after the
+ * splice.
  */
-size_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
-			      struct folio *folio, loff_t fpos, size_t size)
+ssize_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
+			       struct folio *folio, loff_t fpos, size_t size)
 {
+	struct address_space *mapping;
+	struct folio *copy = NULL;
 	struct page *page;
+	unsigned int flags = 0;
+	ssize_t ret;
 	size_t spliced = 0, offset = offset_in_folio(folio, fpos);
 
+	folio_lock(folio);
+
+	mapping = folio_mapping(folio);
+	ret = -ENODATA;
+	if (!folio->mapping)
+		goto err_unlock; /* Truncated */
+	ret = -EIO;
+	if (!folio_test_uptodate(folio))
+		goto err_unlock;
+
+	/*
+	 * At least for ext2 with nobh option, we need to wait on writeback
+	 * completing on this folio, since we'll remove it from the pagecache.
+	 * Otherwise truncate wont wait on the folio, allowing the disk blocks
+	 * to be reused by someone else before we actually wrote our data to
+	 * them. fs corruption ensues.
+	 */
+	folio_wait_writeback(folio);
+
+	if (folio_has_private(folio) &&
+	    !filemap_release_folio(folio, GFP_KERNEL))
+		goto need_copy;
+
+	/* If we succeed in removing the mapping, set LRU flag and add it. */
+	if (remove_mapping(mapping, folio)) {
+		folio_unlock(folio);
+		flags = PIPE_BUF_FLAG_LRU;
+		goto add_to_pipe;
+	}
+
+need_copy:
+	folio_unlock(folio);
+
+	copy = folio_alloc(GFP_KERNEL, 0);
+	if (!copy)
+		return -ENOMEM;
+
+	size = min(size, PAGE_SIZE - offset % PAGE_SIZE);
+	copy_folio_to_folio(folio, offset, copy, 0, size);
+	folio = copy;
+	offset = 0;
+
+add_to_pipe:
 	page = folio_page(folio, offset / PAGE_SIZE);
 	size = min(size, folio_size(folio) - offset);
 	offset %= PAGE_SIZE;
@@ -2861,6 +2933,7 @@ size_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
 			.page	= page,
 			.offset	= offset,
 			.len	= part,
+			.flags	= flags,
 		};
 		folio_get(folio);
 		pipe->head++;
@@ -2869,7 +2942,13 @@ size_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
 		offset = 0;
 	}
 
+	if (copy)
+		folio_put(copy);
 	return spliced;
+
+err_unlock:
+	folio_unlock(folio);
+	return ret;
 }
 
 /**
@@ -2947,7 +3026,7 @@ ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
 
 		for (i = 0; i < folio_batch_count(&fbatch); i++) {
 			struct folio *folio = fbatch.folios[i];
-			size_t n;
+			ssize_t n;
 
 			if (folio_pos(folio) >= end_offset)
 				goto out;
@@ -2963,8 +3042,11 @@ ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
 
 			n = min_t(loff_t, len, isize - *ppos);
 			n = splice_folio_into_pipe(pipe, folio, *ppos, n);
-			if (!n)
+			if (n <= 0) {
+				if (n < 0)
+					error = n;
 				goto out;
+			}
 			len -= n;
 			total_spliced += n;
 			*ppos += n;
diff --git a/mm/internal.h b/mm/internal.h
index a7d9e980429a..ae395e0f31d5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -881,8 +881,8 @@ struct migration_target_control {
 /*
  * mm/filemap.c
  */
-size_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
-			      struct folio *folio, loff_t fpos, size_t size);
+ssize_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
+			       struct folio *folio, loff_t fpos, size_t size);
 
 /*
  * mm/vmalloc.c
diff --git a/mm/shmem.c b/mm/shmem.c
index 2f2e0e618072..969931b0f00e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2783,7 +2783,8 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
 	struct inode *inode = file_inode(in);
 	struct address_space *mapping = inode->i_mapping;
 	struct folio *folio = NULL;
-	size_t total_spliced = 0, used, npages, n, part;
+	ssize_t n;
+	size_t total_spliced = 0, used, npages, part;
 	loff_t isize;
 	int error = 0;
 
@@ -2844,8 +2845,11 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
 			n = splice_zeropage_into_pipe(pipe, *ppos, len);
 		}
 
-		if (!n)
+		if (n <= 0) {
+			if (n < 0)
+				error = n;
 			break;
+		}
 		len -= n;
 		total_spliced += n;
 		*ppos += n;


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 2/4] splice: Make vmsplice() steal or copy
  2023-06-29 15:54 [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe David Howells
  2023-06-29 15:54 ` [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns David Howells
@ 2023-06-29 15:54 ` David Howells
  2023-06-30 13:44   ` Simon Horman
  2023-06-30 15:29   ` David Howells
  2023-06-29 15:54 ` [RFC PATCH 3/4] splice: Remove some now-unused bits David Howells
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 32+ messages in thread
From: David Howells @ 2023-06-29 15:54 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Matthew Wilcox, Dave Chinner, Matt Whitlock,
	Linus Torvalds, Jens Axboe, linux-fsdevel, linux-mm,
	linux-kernel, Christoph Hellwig, linux-fsdevel

Make vmsplice()-to-pipe try to steal gifted data or else copy the source
data immediately before adding it to the pipe.  This prevents the data
added to the pipe from being modified by write(), by shared-writable mmap
and by fallocate().

[!] Note: I'm using unmap_mapping_folio() and remove_mapping() to steal a
    gifted page on behalf of vmsplice().  It works partly, but after a
    large batch of stealing, it will oops, but I can't tell why as it dies
    in the middle of a huge chunk of macro-generated interval tree code.

[!] Note: I'm only allowing theft of pages with refcount <= 4.  refcount == 3
    would actually seem to be the right thing (one for the caller, one for the
    pagecache and one for our page table), but sometimes a fourth ref is held
    transiently (possibly deferred put from page-in).

Reported-by:  Matt Whitlock <kernel@mattwhitlock.name>
Link: https://lore.kernel.org/r/ec804f26-fa76-4fbe-9b1c-8fbbd829b735@mattwhitlock.name/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Dave Chinner <david@fromorbit.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: linux-fsdevel@vger.kernel.org
---
 fs/splice.c | 123 +++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 113 insertions(+), 10 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 004eb1c4ce31..42af642c0ff8 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -37,6 +37,7 @@
 #include <linux/socket.h>
 #include <linux/sched/signal.h>
 
+#include "../mm/internal.h"
 #include "internal.h"
 
 /*
@@ -1382,14 +1383,117 @@ static long __do_splice(struct file *in, loff_t __user *off_in,
 	return ret;
 }
 
+static void copy_folio_to_folio(struct folio *src, size_t src_offset,
+				struct folio *dst, size_t dst_offset,
+				size_t size)
+{
+	void *p, *q;
+
+	while (size > 0) {
+		size_t part = min3(PAGE_SIZE - src_offset % PAGE_SIZE,
+				   PAGE_SIZE - dst_offset % PAGE_SIZE,
+				   size);
+
+		p = kmap_local_folio(src, src_offset);
+		q = kmap_local_folio(dst, dst_offset);
+		memcpy(q, p, part);
+		kunmap_local(p);
+		kunmap_local(q);
+		src_offset += part;
+		dst_offset += part;
+		size -= part;
+	}
+}
+
+static int splice_try_to_steal_page(struct pipe_inode_info *pipe,
+				    struct page *page, size_t offset,
+				    size_t size, unsigned int splice_flags)
+{
+	struct folio *folio = page_folio(page), *copy;
+	unsigned int flags = 0;
+	size_t fsize = folio_size(folio), spliced = 0;
+
+	if (!(splice_flags & SPLICE_F_GIFT) ||
+	    fsize != PAGE_SIZE || offset != 0 || size != fsize)
+		goto need_copy;
+
+	/*
+	 * For a folio to be stealable, the caller holds a ref, the mapping
+	 * holds a ref and the page tables hold a ref; it may or may not also
+	 * be on the LRU.  Anything else and someone else has access to it.
+	 */
+	if (folio_ref_count(folio) > 4 || folio_mapcount(folio) != 1 ||
+	    folio_maybe_dma_pinned(folio))
+		goto need_copy;
+
+	/* Try to steal. */
+	folio_lock(folio);
+
+	if (folio_ref_count(folio) > 4 || folio_mapcount(folio) != 1 ||
+	    folio_maybe_dma_pinned(folio))
+		goto need_copy_unlock;
+	if (!folio->mapping)
+		goto need_copy_unlock; /* vmsplice race? */
+
+	/*
+	 * Remove the folio from the process VM and then try to remove
+	 * it from the mapping.  It we can't remove it, we'll have to
+	 * copy it instead.
+	 */
+	unmap_mapping_folio(folio);
+	if (remove_mapping(folio->mapping, folio)) {
+		folio_clear_mappedtodisk(folio);
+		flags |= PIPE_BUF_FLAG_LRU;
+		goto add_to_pipe;
+	}
+
+need_copy_unlock:
+	folio_unlock(folio);
+need_copy:
+
+	copy = folio_alloc(GFP_KERNEL, 0);
+	if (!copy)
+		return -ENOMEM;
+
+	size = min(size, PAGE_SIZE - offset % PAGE_SIZE);
+	copy_folio_to_folio(folio, offset, copy, 0, size);
+	folio_mark_uptodate(copy);
+	folio_put(folio);
+	folio = copy;
+	offset = 0;
+
+add_to_pipe:
+	page = folio_page(folio, offset / PAGE_SIZE);
+	size = min(size, folio_size(folio) - offset);
+	offset %= PAGE_SIZE;
+
+	while (spliced < size &&
+	       !pipe_full(pipe->head, pipe->tail, pipe->max_usage)) {
+		struct pipe_buffer *buf = pipe_head_buf(pipe);
+		size_t part = min_t(size_t, PAGE_SIZE - offset, size - spliced);
+
+		*buf = (struct pipe_buffer) {
+			.ops	= &default_pipe_buf_ops,
+			.page	= page,
+			.offset	= offset,
+			.len	= part,
+			.flags	= flags,
+		};
+		folio_get(folio);
+		pipe->head++;
+		page++;
+		spliced += part;
+		offset = 0;
+	}
+
+	folio_put(folio);
+	return spliced;
+}
+
 static int iter_to_pipe(struct iov_iter *from,
 			struct pipe_inode_info *pipe,
 			unsigned flags)
 {
-	struct pipe_buffer buf = {
-		.ops = &user_page_pipe_buf_ops,
-		.flags = flags
-	};
 	size_t total = 0;
 	int ret = 0;
 
@@ -1407,12 +1511,11 @@ static int iter_to_pipe(struct iov_iter *from,
 
 		n = DIV_ROUND_UP(left + start, PAGE_SIZE);
 		for (i = 0; i < n; i++) {
-			int size = min_t(int, left, PAGE_SIZE - start);
+			size_t part = min_t(size_t, left,
+					    PAGE_SIZE - start % PAGE_SIZE);
 
-			buf.page = pages[i];
-			buf.offset = start;
-			buf.len = size;
-			ret = add_to_pipe(pipe, &buf);
+			ret = splice_try_to_steal_page(pipe, pages[i], start,
+						       part, flags);
 			if (unlikely(ret < 0)) {
 				iov_iter_revert(from, left);
 				// this one got dropped by add_to_pipe()
@@ -1421,7 +1524,7 @@ static int iter_to_pipe(struct iov_iter *from,
 				goto out;
 			}
 			total += ret;
-			left -= size;
+			left -= part;
 			start = 0;
 		}
 	}


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 3/4] splice: Remove some now-unused bits
  2023-06-29 15:54 [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe David Howells
  2023-06-29 15:54 ` [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns David Howells
  2023-06-29 15:54 ` [RFC PATCH 2/4] splice: Make vmsplice() steal or copy David Howells
@ 2023-06-29 15:54 ` David Howells
  2023-06-29 15:54 ` [RFC PATCH 4/4] splice: Record some statistics David Howells
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 32+ messages in thread
From: David Howells @ 2023-06-29 15:54 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Matthew Wilcox, Dave Chinner, Matt Whitlock,
	Linus Torvalds, Jens Axboe, linux-fsdevel, linux-mm,
	linux-kernel, Christoph Hellwig, linux-fsdevel

Remove some code that's no longer used as the ->confirm() op is no longer
used and pages spliced in from the pagecache and process VM are now
pre-stolen or copied.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Dave Chinner <david@fromorbit.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: linux-fsdevel@vger.kernel.org
---
 fs/fuse/dev.c             |  37 ---------
 fs/pipe.c                 |  12 ---
 fs/splice.c               | 155 +-------------------------------------
 include/linux/pipe_fs_i.h |  14 ----
 include/linux/splice.h    |   1 -
 mm/filemap.c              |   2 +-
 6 files changed, 3 insertions(+), 218 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 1a8f82f478cb..9718dce0f0d9 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -700,10 +700,6 @@ static int fuse_copy_fill(struct fuse_copy_state *cs)
 		struct pipe_buffer *buf = cs->pipebufs;
 
 		if (!cs->write) {
-			err = pipe_buf_confirm(cs->pipe, buf);
-			if (err)
-				return err;
-
 			BUG_ON(!cs->nr_segs);
 			cs->currbuf = buf;
 			cs->pg = buf->page;
@@ -766,26 +762,6 @@ static int fuse_copy_do(struct fuse_copy_state *cs, void **val, unsigned *size)
 	return ncpy;
 }
 
-static int fuse_check_folio(struct folio *folio)
-{
-	if (folio_mapped(folio) ||
-	    folio->mapping != NULL ||
-	    (folio->flags & PAGE_FLAGS_CHECK_AT_PREP &
-	     ~(1 << PG_locked |
-	       1 << PG_referenced |
-	       1 << PG_uptodate |
-	       1 << PG_lru |
-	       1 << PG_active |
-	       1 << PG_workingset |
-	       1 << PG_reclaim |
-	       1 << PG_waiters |
-	       LRU_GEN_MASK | LRU_REFS_MASK))) {
-		dump_page(&folio->page, "fuse: trying to steal weird page");
-		return 1;
-	}
-	return 0;
-}
-
 static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
 {
 	int err;
@@ -800,10 +776,6 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
 
 	fuse_copy_finish(cs);
 
-	err = pipe_buf_confirm(cs->pipe, buf);
-	if (err)
-		goto out_put_old;
-
 	BUG_ON(!cs->nr_segs);
 	cs->currbuf = buf;
 	cs->len = buf->len;
@@ -818,14 +790,6 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
 
 	newfolio = page_folio(buf->page);
 
-	if (!folio_test_uptodate(newfolio))
-		folio_mark_uptodate(newfolio);
-
-	folio_clear_mappedtodisk(newfolio);
-
-	if (fuse_check_folio(newfolio) != 0)
-		goto out_fallback_unlock;
-
 	/*
 	 * This is a new and locked page, it shouldn't be mapped or
 	 * have any special flags on it
@@ -2020,7 +1984,6 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe,
 				goto out_free;
 
 			*obuf = *ibuf;
-			obuf->flags &= ~PIPE_BUF_FLAG_GIFT;
 			obuf->len = rem;
 			ibuf->offset += obuf->len;
 			ibuf->len -= obuf->len;
diff --git a/fs/pipe.c b/fs/pipe.c
index 2d88f73f585a..d5c86eb20f29 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -286,7 +286,6 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
 			struct pipe_buffer *buf = &pipe->bufs[tail & mask];
 			size_t chars = buf->len;
 			size_t written;
-			int error;
 
 			if (chars > total_len) {
 				if (buf->flags & PIPE_BUF_FLAG_WHOLE) {
@@ -297,13 +296,6 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
 				chars = total_len;
 			}
 
-			error = pipe_buf_confirm(pipe, buf);
-			if (error) {
-				if (!ret)
-					ret = error;
-				break;
-			}
-
 			written = copy_page_to_iter(buf->page, buf->offset, chars, to);
 			if (unlikely(written < chars)) {
 				if (!ret)
@@ -462,10 +454,6 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
 
 		if ((buf->flags & PIPE_BUF_FLAG_CAN_MERGE) &&
 		    offset + chars <= PAGE_SIZE) {
-			ret = pipe_buf_confirm(pipe, buf);
-			if (ret)
-				goto out;
-
 			ret = copy_page_from_iter(buf->page, offset, chars, from);
 			if (unlikely(ret < chars)) {
 				ret = -EFAULT;
diff --git a/fs/splice.c b/fs/splice.c
index 42af642c0ff8..2b1f109a7d4f 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -56,129 +56,6 @@ static noinline void noinline pipe_clear_nowait(struct file *file)
 	} while (!try_cmpxchg(&file->f_mode, &fmode, fmode & ~FMODE_NOWAIT));
 }
 
-/*
- * Attempt to steal a page from a pipe buffer. This should perhaps go into
- * a vm helper function, it's already simplified quite a bit by the
- * addition of remove_mapping(). If success is returned, the caller may
- * attempt to reuse this page for another destination.
- */
-static bool page_cache_pipe_buf_try_steal(struct pipe_inode_info *pipe,
-		struct pipe_buffer *buf)
-{
-	struct folio *folio = page_folio(buf->page);
-	struct address_space *mapping;
-
-	folio_lock(folio);
-
-	mapping = folio_mapping(folio);
-	if (mapping) {
-		WARN_ON(!folio_test_uptodate(folio));
-
-		/*
-		 * At least for ext2 with nobh option, we need to wait on
-		 * writeback completing on this folio, since we'll remove it
-		 * from the pagecache.  Otherwise truncate wont wait on the
-		 * folio, allowing the disk blocks to be reused by someone else
-		 * before we actually wrote our data to them. fs corruption
-		 * ensues.
-		 */
-		folio_wait_writeback(folio);
-
-		if (folio_has_private(folio) &&
-		    !filemap_release_folio(folio, GFP_KERNEL))
-			goto out_unlock;
-
-		/*
-		 * If we succeeded in removing the mapping, set LRU flag
-		 * and return good.
-		 */
-		if (remove_mapping(mapping, folio)) {
-			buf->flags |= PIPE_BUF_FLAG_LRU;
-			return true;
-		}
-	}
-
-	/*
-	 * Raced with truncate or failed to remove folio from current
-	 * address space, unlock and return failure.
-	 */
-out_unlock:
-	folio_unlock(folio);
-	return false;
-}
-
-static void page_cache_pipe_buf_release(struct pipe_inode_info *pipe,
-					struct pipe_buffer *buf)
-{
-	put_page(buf->page);
-	buf->flags &= ~PIPE_BUF_FLAG_LRU;
-}
-
-/*
- * Check whether the contents of buf is OK to access. Since the content
- * is a page cache page, IO may be in flight.
- */
-static int page_cache_pipe_buf_confirm(struct pipe_inode_info *pipe,
-				       struct pipe_buffer *buf)
-{
-	struct page *page = buf->page;
-	int err;
-
-	if (!PageUptodate(page)) {
-		lock_page(page);
-
-		/*
-		 * Page got truncated/unhashed. This will cause a 0-byte
-		 * splice, if this is the first page.
-		 */
-		if (!page->mapping) {
-			err = -ENODATA;
-			goto error;
-		}
-
-		/*
-		 * Uh oh, read-error from disk.
-		 */
-		if (!PageUptodate(page)) {
-			err = -EIO;
-			goto error;
-		}
-
-		/*
-		 * Page is ok afterall, we are done.
-		 */
-		unlock_page(page);
-	}
-
-	return 0;
-error:
-	unlock_page(page);
-	return err;
-}
-
-const struct pipe_buf_operations page_cache_pipe_buf_ops = {
-	.confirm	= page_cache_pipe_buf_confirm,
-	.release	= page_cache_pipe_buf_release,
-	.try_steal	= page_cache_pipe_buf_try_steal,
-	.get		= generic_pipe_buf_get,
-};
-
-static bool user_page_pipe_buf_try_steal(struct pipe_inode_info *pipe,
-		struct pipe_buffer *buf)
-{
-	if (!(buf->flags & PIPE_BUF_FLAG_GIFT))
-		return false;
-
-	buf->flags |= PIPE_BUF_FLAG_LRU;
-	return generic_pipe_buf_try_steal(pipe, buf);
-}
-
-static const struct pipe_buf_operations user_page_pipe_buf_ops = {
-	.release	= page_cache_pipe_buf_release,
-	.try_steal	= user_page_pipe_buf_try_steal,
-	.get		= generic_pipe_buf_get,
-};
-
 static void wakeup_pipe_readers(struct pipe_inode_info *pipe)
 {
 	smp_mb();
@@ -460,13 +337,6 @@ static int splice_from_pipe_feed(struct pipe_inode_info *pipe, struct splice_des
 		if (sd->len > sd->total_len)
 			sd->len = sd->total_len;
 
-		ret = pipe_buf_confirm(pipe, buf);
-		if (unlikely(ret)) {
-			if (ret == -ENODATA)
-				ret = 0;
-			return ret;
-		}
-
 		ret = actor(pipe, buf, sd);
 		if (ret <= 0)
 			return ret;
@@ -723,13 +593,6 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 				continue;
 			this_len = min(this_len, left);
 
-			ret = pipe_buf_confirm(pipe, buf);
-			if (unlikely(ret)) {
-				if (ret == -ENODATA)
-					ret = 0;
-				goto done;
-			}
-
 			bvec_set_page(&array[n], buf->page, this_len,
 				      buf->offset);
 			left -= this_len;
@@ -764,7 +627,7 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 			}
 		}
 	}
-done:
+
 	kfree(array);
 	splice_from_pipe_end(pipe, &sd);
 
@@ -855,13 +718,6 @@ ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out,
 
 			seg = min_t(size_t, remain, buf->len);
 
-			ret = pipe_buf_confirm(pipe, buf);
-			if (unlikely(ret)) {
-				if (ret == -ENODATA)
-					ret = 0;
-				break;
-			}
-
 			bvec_set_page(&bvec[bc++], buf->page, seg, buf->offset);
 			remain -= seg;
 			if (remain == 0 || bc >= ARRAY_SIZE(bvec))
@@ -1450,7 +1306,6 @@ static int splice_try_to_steal_page(struct pipe_inode_info *pipe,
 need_copy_unlock:
 	folio_unlock(folio);
 need_copy:
-
 	copy = folio_alloc(GFP_KERNEL, 0);
 	if (!copy)
 		return -ENOMEM;
@@ -1578,10 +1433,6 @@ static long vmsplice_to_pipe(struct file *file, struct iov_iter *iter,
 {
 	struct pipe_inode_info *pipe;
 	long ret = 0;
-	unsigned buf_flag = 0;
-
-	if (flags & SPLICE_F_GIFT)
-		buf_flag = PIPE_BUF_FLAG_GIFT;
 
 	pipe = get_pipe_info(file, true);
 	if (!pipe)
@@ -1592,7 +1443,7 @@ static long vmsplice_to_pipe(struct file *file, struct iov_iter *iter,
 	pipe_lock(pipe);
 	ret = wait_for_space(pipe, flags);
 	if (!ret)
-		ret = iter_to_pipe(iter, pipe, buf_flag);
+		ret = iter_to_pipe(iter, pipe, flags);
 	pipe_unlock(pipe);
 	if (ret > 0)
 		wakeup_pipe_readers(pipe);
@@ -1876,7 +1727,6 @@ static int splice_pipe_to_pipe(struct pipe_inode_info *ipipe,
 			 * Don't inherit the gift and merge flags, we need to
 			 * prevent multiple steals of this page.
 			 */
-			obuf->flags &= ~PIPE_BUF_FLAG_GIFT;
 			obuf->flags &= ~PIPE_BUF_FLAG_CAN_MERGE;
 
 			obuf->len = len;
@@ -1968,7 +1818,6 @@ static int link_pipe(struct pipe_inode_info *ipipe,
 		 * Don't inherit the gift and merge flag, we need to prevent
 		 * multiple steals of this page.
 		 */
-		obuf->flags &= ~PIPE_BUF_FLAG_GIFT;
 		obuf->flags &= ~PIPE_BUF_FLAG_CAN_MERGE;
 
 		if (obuf->len > len)
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index 02e0086b10f6..9cfbefd7ba31 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -6,7 +6,6 @@
 
 #define PIPE_BUF_FLAG_LRU	0x01	/* page is on the LRU */
 #define PIPE_BUF_FLAG_ATOMIC	0x02	/* was atomically mapped */
-#define PIPE_BUF_FLAG_GIFT	0x04	/* page is a gift */
 #define PIPE_BUF_FLAG_PACKET	0x08	/* read() as a packet */
 #define PIPE_BUF_FLAG_CAN_MERGE	0x10	/* can merge buffers */
 #define PIPE_BUF_FLAG_WHOLE	0x20	/* read() must return entire buffer or error */
@@ -203,19 +202,6 @@ static inline void pipe_buf_release(struct pipe_inode_info *pipe,
 	ops->release(pipe, buf);
 }
 
-/**
- * pipe_buf_confirm - verify contents of the pipe buffer
- * @pipe:	the pipe that the buffer belongs to
- * @buf:	the buffer to confirm
- */
-static inline int pipe_buf_confirm(struct pipe_inode_info *pipe,
-				   struct pipe_buffer *buf)
-{
-	if (!buf->ops->confirm)
-		return 0;
-	return buf->ops->confirm(pipe, buf);
-}
-
 /**
  * pipe_buf_try_steal - attempt to take ownership of a pipe_buffer
  * @pipe:	the pipe that the buffer belongs to
diff --git a/include/linux/splice.h b/include/linux/splice.h
index 6c461573434d..3c5abbd49ff2 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -97,6 +97,5 @@ extern ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out,
 extern int splice_grow_spd(const struct pipe_inode_info *, struct splice_pipe_desc *);
 extern void splice_shrink_spd(struct splice_pipe_desc *);
 
-extern const struct pipe_buf_operations page_cache_pipe_buf_ops;
 extern const struct pipe_buf_operations default_pipe_buf_ops;
 #endif
diff --git a/mm/filemap.c b/mm/filemap.c
index a002df515966..dd144b0dab69 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2929,7 +2929,7 @@ ssize_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
 		size_t part = min_t(size_t, PAGE_SIZE - offset, size - spliced);
 
 		*buf = (struct pipe_buffer) {
-			.ops	= &page_cache_pipe_buf_ops,
+			.ops	= &default_pipe_buf_ops,
 			.page	= page,
 			.offset	= offset,
 			.len	= part,


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 4/4] splice: Record some statistics
  2023-06-29 15:54 [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe David Howells
                   ` (2 preceding siblings ...)
  2023-06-29 15:54 ` [RFC PATCH 3/4] splice: Remove some now-unused bits David Howells
@ 2023-06-29 15:54 ` David Howells
  2023-06-29 17:56 ` [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe Linus Torvalds
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 32+ messages in thread
From: David Howells @ 2023-06-29 15:54 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Matthew Wilcox, Dave Chinner, Matt Whitlock,
	Linus Torvalds, Jens Axboe, linux-fsdevel, linux-mm,
	linux-kernel, Christoph Hellwig, linux-fsdevel

Add a proc file to export some statistics for debugging purposes.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Dave Chinner <david@fromorbit.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: linux-fsdevel@vger.kernel.org
---
 fs/splice.c            | 28 ++++++++++++++++++++++++++++
 include/linux/splice.h |  3 +++
 mm/filemap.c           |  6 +++++-
 3 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/fs/splice.c b/fs/splice.c
index 2b1f109a7d4f..831973ea6b3f 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -36,10 +36,15 @@
 #include <linux/net.h>
 #include <linux/socket.h>
 #include <linux/sched/signal.h>
+#include <linux/proc_fs.h>
 
 #include "../mm/internal.h"
 #include "internal.h"
 
+atomic_t splice_stat_filemap_copied, splice_stat_filemap_moved;
+static atomic_t splice_stat_directly_copied;
+static atomic_t vmsplice_stat_copied, vmsplice_stat_stole;
+
 /*
  * Splice doesn't support FMODE_NOWAIT. Since pipes may set this flag to
  * indicate they support non-blocking reads or writes, we must clear it
@@ -276,6 +281,7 @@ ssize_t copy_splice_read(struct file *in, loff_t *ppos,
 		remain -= chunk;
 	}
 
+	atomic_add(keep, &splice_stat_directly_copied);
 	kfree(bv);
 	return ret;
 }
@@ -1299,6 +1305,7 @@ static int splice_try_to_steal_page(struct pipe_inode_info *pipe,
 	unmap_mapping_folio(folio);
 	if (remove_mapping(folio->mapping, folio)) {
 		folio_clear_mappedtodisk(folio);
+		atomic_inc(&vmsplice_stat_stole);
 		flags |= PIPE_BUF_FLAG_LRU;
 		goto add_to_pipe;
 	}
@@ -1316,6 +1323,7 @@ static int splice_try_to_steal_page(struct pipe_inode_info *pipe,
 	folio_put(folio);
 	folio = copy;
 	offset = 0;
+	atomic_inc(&vmsplice_stat_copied);
 
 add_to_pipe:
 	page = folio_page(folio, offset / PAGE_SIZE);
@@ -1905,3 +1913,23 @@ SYSCALL_DEFINE4(tee, int, fdin, int, fdout, size_t, len, unsigned int, flags)
 
 	return error;
 }
+
+static int splice_stats_show(struct seq_file *m, void *data)
+{
+	seq_printf(m, "filemap: copied=%u moved=%u\n",
+		   atomic_read(&splice_stat_filemap_copied),
+		   atomic_read(&splice_stat_filemap_moved));
+	seq_printf(m, "direct : copied=%u\n",
+		   atomic_read(&splice_stat_directly_copied));
+	seq_printf(m, "vmsplice: copied=%u stole=%u\n",
+		   atomic_read(&vmsplice_stat_copied),
+		   atomic_read(&vmsplice_stat_stole));
+	return 0;
+}
+
+static int splice_stats_init(void)
+{
+	proc_create_single("fs/splice", S_IFREG | 0444, NULL, splice_stats_show);
+	return 0;
+}
+late_initcall(splice_stats_init);
diff --git a/include/linux/splice.h b/include/linux/splice.h
index 3c5abbd49ff2..4f04dc338010 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -98,4 +98,7 @@ extern int splice_grow_spd(const struct pipe_inode_info *, struct splice_pipe_de
 extern void splice_shrink_spd(struct splice_pipe_desc *);
 
 extern const struct pipe_buf_operations default_pipe_buf_ops;
+
+extern atomic_t splice_stat_filemap_copied, splice_stat_filemap_moved;
+
 #endif
diff --git a/mm/filemap.c b/mm/filemap.c
index dd144b0dab69..38d38cc826fa 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2872,7 +2872,8 @@ ssize_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
 	struct address_space *mapping;
 	struct folio *copy = NULL;
 	struct page *page;
-	unsigned int flags = 0;
+	unsigned int flags = 0, count = 0;
+	atomic_t *stat = &splice_stat_filemap_copied;
 	ssize_t ret;
 	size_t spliced = 0, offset = offset_in_folio(folio, fpos);
 
@@ -2902,6 +2903,7 @@ ssize_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
 	/* If we succeed in removing the mapping, set LRU flag and add it. */
 	if (remove_mapping(mapping, folio)) {
 		folio_unlock(folio);
+		stat = &splice_stat_filemap_moved;
 		flags = PIPE_BUF_FLAG_LRU;
 		goto add_to_pipe;
 	}
@@ -2940,8 +2942,10 @@ ssize_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
 		page++;
 		spliced += part;
 		offset = 0;
+		count++;
 	}
 
+	atomic_add(count, stat);
 	if (copy)
 		folio_put(copy);
 	return spliced;


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe
  2023-06-29 15:54 [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe David Howells
                   ` (3 preceding siblings ...)
  2023-06-29 15:54 ` [RFC PATCH 4/4] splice: Record some statistics David Howells
@ 2023-06-29 17:56 ` Linus Torvalds
  2023-06-29 18:05   ` Matt Whitlock
  2023-06-29 18:16 ` Matt Whitlock
  2023-06-30  0:01 ` Jakub Kicinski
  6 siblings, 1 reply; 32+ messages in thread
From: Linus Torvalds @ 2023-06-29 17:56 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, Matthew Wilcox, Dave Chinner, Matt Whitlock, Jens Axboe,
	linux-fsdevel, linux-mm, linux-kernel

On Thu, 29 Jun 2023 at 08:55, David Howells <dhowells@redhat.com> wrote:
>
> Matt Whitlock, Matthew Wilcox and Dave Chinner are of the opinion that data
> in the pipe must not be seen to change and that if it does, this is a bug.

I'm not convinced.

The whole *point* of vmsplice (and splicing from a file) is the zero-copy.

If you don't want the zero-copy, then you should use just "write()".

So I disagree violently. This is not a bug unless you can point to
some other correctness issues.

The "stableness" of the data is literally the *only* difference
between vmsplice() and write().

> Whilst this does allow the code to be somewhat simplified, it also results
> in a loss of performance: stolen pages have to be reloaded in accessed
> again; more data has to be copied.

No. It literally results in a loss of THE WHOLE POINT of vmsplice().

                    Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe
  2023-06-29 17:56 ` [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe Linus Torvalds
@ 2023-06-29 18:05   ` Matt Whitlock
  2023-06-29 18:19     ` Linus Torvalds
  0 siblings, 1 reply; 32+ messages in thread
From: Matt Whitlock @ 2023-06-29 18:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Howells, netdev, Matthew Wilcox, Dave Chinner, Jens Axboe,
	linux-fsdevel, linux-mm, linux-kernel

On Thursday, 29 June 2023 13:56:04 EDT, Linus Torvalds wrote:
> On Thu, 29 Jun 2023 at 08:55, David Howells <dhowells@redhat.com> wrote:
>> 
>> Matt Whitlock, Matthew Wilcox and Dave Chinner are of the 
>> opinion that data
>> in the pipe must not be seen to change and that if it does, this is a bug.
>
> I'm not convinced.
>
> The whole *point* of vmsplice (and splicing from a file) is the zero-copy.
>
> If you don't want the zero-copy, then you should use just "write()".

If you want zero copies, then call splice() *with* SPLICE_F_MOVE.

If you want one copy (kernel-to-kernel), then call splice() *without* 
SPLICE_F_MOVE.

If you want two copies (kernel-to-user + user-to-kernel), call read() and 
write().

I don't know why SPLICE_F_MOVE is being ignored in this thread. Sure, maybe 
the way it has historically been implemented was only relevant when the 
input FD is a pipe, but that's not what the man page implies. You have the 
opportunity to make it actually do what it says on the tin.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe
  2023-06-29 15:54 [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe David Howells
                   ` (4 preceding siblings ...)
  2023-06-29 17:56 ` [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe Linus Torvalds
@ 2023-06-29 18:16 ` Matt Whitlock
  2023-06-30  0:01 ` Jakub Kicinski
  6 siblings, 0 replies; 32+ messages in thread
From: Matt Whitlock @ 2023-06-29 18:16 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, Matthew Wilcox, Dave Chinner, Linus Torvalds, Jens Axboe,
	linux-fsdevel, linux-mm, linux-kernel

On Thursday, 29 June 2023 11:54:29 EDT, David Howells wrote:
> Matt Whitlock, Matthew Wilcox and Dave Chinner are of the opinion that data
> in the pipe must not be seen to change and that if it does, this is a bug.
> Apart from in one specific instance (vmsplice() with SPLICE_F_GIFT), the
> manual pages agree with them.  I'm more inclined to adjust the
> documentation since the behaviour we have has been that way since 2005, I
> think.

Anecdotally, my use case had been working fine for years until I upgraded 
from 5.15.x to 6.1.x in February of this year. That's when my backups 
started being corrupted. I only noticed when I was trying to restore a lost 
file from backup earlier this week.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe
  2023-06-29 18:05   ` Matt Whitlock
@ 2023-06-29 18:19     ` Linus Torvalds
  2023-06-29 18:34       ` Matthew Wilcox
  2023-06-29 18:42       ` Linus Torvalds
  0 siblings, 2 replies; 32+ messages in thread
From: Linus Torvalds @ 2023-06-29 18:19 UTC (permalink / raw)
  To: Matt Whitlock
  Cc: David Howells, netdev, Matthew Wilcox, Dave Chinner, Jens Axboe,
	linux-fsdevel, linux-mm, linux-kernel

On Thu, 29 Jun 2023 at 11:05, Matt Whitlock <kernel@mattwhitlock.name> wrote:
>
> I don't know why SPLICE_F_MOVE is being ignored in this thread. Sure, maybe
> the way it has historically been implemented was only relevant when the
> input FD is a pipe, but that's not what the man page implies. You have the
> opportunity to make it actually do what it says on the tin.

First off, when documentation and reality disagree, it's the
documentation that is garbage.

Secondly, your point is literally moot, from what I can tell:

       SPLICE_F_MOVE
              Unused for vmsplice(); see splice(2).

that's the doc I see right now for "man vmsplice".

There's no "implies" there. There's an actual big honking clear
statement at the top of the man-page saying that what you claim is
simply not even remotely true.

Also, the reason SPLICE_F_MOVE is unused for vmsplice() is that
actually trying to move pages would involve having to *remove* them
from the VM source. And the TLB invalidation involved with that is
literally more expensive than the memory copy would be.

So no. SPLICE_F_MOVE isn't the answer.

Now, we also have SPLICE_F_GIFT. That's actually a more extreme case
of "not only should you taekm this page, you can actually try to
re-use the end result for your own nefarious purposes".

Now, I would actually not disagree with removing that part. It's
scary. But I think we don't really have any users (ok, fuse and some
random console driver?)

            Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe
  2023-06-29 18:19     ` Linus Torvalds
@ 2023-06-29 18:34       ` Matthew Wilcox
  2023-06-29 18:53         ` Linus Torvalds
  2023-06-30 16:50         ` David Howells
  2023-06-29 18:42       ` Linus Torvalds
  1 sibling, 2 replies; 32+ messages in thread
From: Matthew Wilcox @ 2023-06-29 18:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matt Whitlock, David Howells, netdev, Dave Chinner, Jens Axboe,
	linux-fsdevel, linux-mm, linux-kernel

On Thu, Jun 29, 2023 at 11:19:36AM -0700, Linus Torvalds wrote:
> On Thu, 29 Jun 2023 at 11:05, Matt Whitlock <kernel@mattwhitlock.name> wrote:
> >
> > I don't know why SPLICE_F_MOVE is being ignored in this thread. Sure, maybe
> > the way it has historically been implemented was only relevant when the
> > input FD is a pipe, but that's not what the man page implies. You have the
> > opportunity to make it actually do what it says on the tin.
> 
> First off, when documentation and reality disagree, it's the
> documentation that is garbage.
> 
> Secondly, your point is literally moot, from what I can tell:
> 
>        SPLICE_F_MOVE
>               Unused for vmsplice(); see splice(2).
> 
> that's the doc I see right now for "man vmsplice".
> 
> There's no "implies" there. There's an actual big honking clear
> statement at the top of the man-page saying that what you claim is
> simply not even remotely true.
> 
> Also, the reason SPLICE_F_MOVE is unused for vmsplice() is that
> actually trying to move pages would involve having to *remove* them
> from the VM source. And the TLB invalidation involved with that is
> literally more expensive than the memory copy would be.

I think David muddied the waters by talking about vmsplice().  The
problem encountered is with splice() from the page cache.  Reading
the documentation,

       splice()  moves  data  between two file descriptors without copying be‐
       tween kernel address space and user address space.  It transfers up  to
       len bytes of data from the file descriptor fd_in to the file descriptor
       fd_out, where one of the file descriptors must refer to a pipe.

The bug reported is actually with using FALLOC_FL_PUNCH_HOLE, but a
simpler problem is:

#define _GNU_SOURCE
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>

#define PAGE_SIZE 4096

int main(int argc, char **argv)
{
        int fd = open(argv[1], O_RDWR | O_CREAT, 0644);

        err = ftruncate(fd, PAGE_SIZE);
        pwrite(fd, "old", 3, 0);
        splice(fd, NULL, 1, NULL, PAGE_SIZE, 0);
        pwrite(fd, "new", 3, 0);

        return 0;
}

That outputs "new".  Should it?  If so, the manpage is really wrong.
It says the point of splice() is to remove the kernel-user-kernel copy,
and notes that zerocopy might be happening, but that's an optimisation
the user shouldn't notice.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe
  2023-06-29 18:19     ` Linus Torvalds
  2023-06-29 18:34       ` Matthew Wilcox
@ 2023-06-29 18:42       ` Linus Torvalds
  1 sibling, 0 replies; 32+ messages in thread
From: Linus Torvalds @ 2023-06-29 18:42 UTC (permalink / raw)
  To: Matt Whitlock
  Cc: David Howells, netdev, Matthew Wilcox, Dave Chinner, Jens Axboe,
	linux-fsdevel, linux-mm, linux-kernel

On Thu, 29 Jun 2023 at 11:19, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Now, we also have SPLICE_F_GIFT. [..]
>
> Now, I would actually not disagree with removing that part. It's
> scary. But I think we don't really have any users (ok, fuse and some
> random console driver?)

Side note: maybe I should clarify. I have grown to pretty much hate
splice() over the years, just because it's been a constant source of
sorrow in so many ways.

So I'd personally be perfectly ok with just making vmsplice() be
exactly the same as write, and turn all of vmsplice() into just "it's
a read() if the pipe is open for read, and a write if it's open for
writing".

IOW, effectively get rid of vmsplice() entirely, just leaving it as a
legacy name for an interface.

What I *absolutely* don't want to see is to make vmsplice() even more
complicated, and actively slower in the process. Unmapping it from the
source, removing it from the VM, is all just crazy talk.

If you want to be really crazy, I can tell you how to make for some
truly stupendously great benchmarks: make a plain "write()" system
call look up the physical page, check if it's COW'able, and if so,
mark it read-only in the source and steal the page. Now write() has
taken a snapshot of the source, and can use that page for the pipe
buffer as-is. It won't change, because if the user writes to it, the
user will just take a page fault and force a COW.

Then, to complete the thing, make 'read()' of a pipe able to just take
the page, and insert it into the destination VM (it's ok to make it
writable at that point).

You can get *wonderful* performance numbers from benchmarks with that.

I know, because I did exactly that long long ago. So long ago that I
think I had a i486 that had memory throughput measured in megabytes.
And my pipe throughput benchmark got gigabytes per second!

Of course, that benchmark relied entirely on the source of the write()
never actually writing to the page, and the reader never actually
bothering to touch the page. So it was gigabytes on a pretty bad
benchmark. But it was quite impressive.

I don't think those patches ever got posted publicly, because while
very impressive on benchmarks, it obviously was absolutely horrendous
in real life, because in real life the source of the pipe data would
(a) not usually be page-aligned anyway, and (b) even if it was and
triggered this wonderful case, it would then re-use the buffer and
take a COW fault, and now the overhead of faulting, allocating a new
page, copying said page, was obviously higher than just doing all that
in the pipe write() code without any faulting overhead.

But splice() (and vmsplice()) does conceptually come from that kind of
background.

It's just that it was never as lovely and as useful as it promised to
be. So I'd actually be more than happy to just say "let's decommission
splice entirely, just keeping the interfaces alive for backwards
compatibility"

                     Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe
  2023-06-29 18:34       ` Matthew Wilcox
@ 2023-06-29 18:53         ` Linus Torvalds
  2023-06-30 16:50         ` David Howells
  1 sibling, 0 replies; 32+ messages in thread
From: Linus Torvalds @ 2023-06-29 18:53 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Matt Whitlock, David Howells, netdev, Dave Chinner, Jens Axboe,
	linux-fsdevel, linux-mm, linux-kernel

On Thu, 29 Jun 2023 at 11:34, Matthew Wilcox <willy@infradead.org> wrote:
>
> I think David muddied the waters by talking about vmsplice().  The
> problem encountered is with splice() from the page cache.  Reading
> the documentation,
>
>        splice()  moves  data  between two file descriptors without copying be‐
>        tween kernel address space and user address space.  It transfers up  to
>        len bytes of data from the file descriptor fd_in to the file descriptor
>        fd_out, where one of the file descriptors must refer to a pipe.

Well, the original intent really always was that it's about zero-copy.

So I do think that the answer to your test-program is that yes, it
really traditionally *should* output "new".

A splice from a file acts like a scatter-gather mmap() in the kernel.
It's the original intent, and it's the whole reason it's noticeably
faster than doing a write.

Now, do I then agree that splice() has turned out to be a nasty morass
of problems?  Yes.

And I even agree that while I actually *think* that your test program
should output "new" (because that is the whole point of the exercise),
it also means that people who use splice() need to *understand* that,
and it's much too easy to get things wrong if you don't understand
that the whole point of splice is to act as a kind of ad-hoc in-kernel
mmap thing.

And to make matters worse, for mmap() we actually do have some
coherency helpers. For splice(), the page ref stays around.

It's kind of like GUP and page pinning - another area where we have
had lots of problems and lots of nasty semantics and complications
with other VM operations over the years.

So I really *really* don't want to complicate splice() even more to
give it some new semantics that it has never ever really had, because
people didn't understand it and used it wrong.

Quite the reverse. I'd be willing to *simplify* splice() by just
saying "it was all a mistake", and just turning it into wrappers
around read/write. But those patches would have to be radical
simplifications, not adding yet more crud on top of the pain that is
splice().

Because it will hurt performance. And I'm ok with that as long as it
comes with huge simplifications. What I'm *not* ok with is "I mis-used
splice, now I want splice to act differently, so let's make it even
more complicated".

               Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe
  2023-06-29 15:54 [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe David Howells
                   ` (5 preceding siblings ...)
  2023-06-29 18:16 ` Matt Whitlock
@ 2023-06-30  0:01 ` Jakub Kicinski
  6 siblings, 0 replies; 32+ messages in thread
From: Jakub Kicinski @ 2023-06-30  0:01 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, Matthew Wilcox, Dave Chinner, Matt Whitlock,
	Linus Torvalds, Jens Axboe, linux-fsdevel, linux-mm,
	linux-kernel

On Thu, 29 Jun 2023 16:54:29 +0100 David Howells wrote:
> I'm more inclined to adjust the documentation since the behaviour we
> have has been that way since 2005, I think.

+1 FWIW I think that networking always operated under the assumption
that the pages may change. In TLS we require explicit opt-in from users
that the pages they send will not get changed, if it could cause crypto
errors (TLS_TX_ZEROCOPY_RO).

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 2/4] splice: Make vmsplice() steal or copy
  2023-06-29 15:54 ` [RFC PATCH 2/4] splice: Make vmsplice() steal or copy David Howells
@ 2023-06-30 13:44   ` Simon Horman
  2023-06-30 15:29   ` David Howells
  1 sibling, 0 replies; 32+ messages in thread
From: Simon Horman @ 2023-06-30 13:44 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, Matthew Wilcox, Dave Chinner, Matt Whitlock,
	Linus Torvalds, Jens Axboe, linux-fsdevel, linux-mm,
	linux-kernel, Christoph Hellwig, linux-fsdevel

On Thu, Jun 29, 2023 at 04:54:31PM +0100, David Howells wrote:

...

>  static int iter_to_pipe(struct iov_iter *from,
>  			struct pipe_inode_info *pipe,
>  			unsigned flags)
>  {
> -	struct pipe_buffer buf = {
> -		.ops = &user_page_pipe_buf_ops,

Hi David,

perhaps this patchset will change somewhat based on discussion
elsewhere in this thread.

But, on a more mundane level, GCC reports that user_page_pipe_buf_ops is
(now) unused.  I guess this was the last user, and user_page_pipe_buf_ops
can be removed as part of this patch.

...

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 2/4] splice: Make vmsplice() steal or copy
  2023-06-29 15:54 ` [RFC PATCH 2/4] splice: Make vmsplice() steal or copy David Howells
  2023-06-30 13:44   ` Simon Horman
@ 2023-06-30 15:29   ` David Howells
  2023-06-30 17:32     ` Simon Horman
  1 sibling, 1 reply; 32+ messages in thread
From: David Howells @ 2023-06-30 15:29 UTC (permalink / raw)
  To: Simon Horman
  Cc: dhowells, netdev, Matthew Wilcox, Dave Chinner, Matt Whitlock,
	Linus Torvalds, Jens Axboe, linux-fsdevel, linux-mm,
	linux-kernel, Christoph Hellwig, linux-fsdevel

Simon Horman <simon.horman@corigine.com> wrote:

> But, on a more mundane level, GCC reports that user_page_pipe_buf_ops is
> (now) unused.  I guess this was the last user, and user_page_pipe_buf_ops
> can be removed as part of this patch.

See patch 3.

David


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe
  2023-06-29 18:34       ` Matthew Wilcox
  2023-06-29 18:53         ` Linus Torvalds
@ 2023-06-30 16:50         ` David Howells
  1 sibling, 0 replies; 32+ messages in thread
From: David Howells @ 2023-06-30 16:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: dhowells, Matthew Wilcox, Matt Whitlock, netdev, Dave Chinner,
	Jens Axboe, linux-fsdevel, linux-mm, linux-kernel

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> 
> Quite the reverse. I'd be willing to *simplify* splice() by just
> saying "it was all a mistake", and just turning it into wrappers
> around read/write. But those patches would have to be radical
> simplifications, not adding yet more crud on top of the pain that is
> splice().
> 
> Because it will hurt performance. And I'm ok with that as long as it
> comes with huge simplifications. What I'm *not* ok with is "I mis-used
> splice, now I want splice to act differently, so let's make it even
> more complicated".

If we want to go down the simplification route, then the patches I posted
might be a good start.

The idea I tried to work towards is that the pipe only ever contains private
pages in it that only the pipe has a ref on and that no one else can access
until they come out the other end again.  I got rid of the ->confirm() pipe
buf op and would like to kill off all of the others too.

I simplified splice() by:

 - Making sure any candidate pages are uptodate right up front.

 - Allowing automatic stealing of pages from the pagecache if no one else is
   using them.  This should avoid losing a chunk of the performance that
   splice is supposed to gain - but if you're serving pages repeatedly in a
   webserver with this, it's going to be a problem.

   Possibly this should be contingent on SPLICE_F_MOVE - though the manpage
   says "*from* the pipe" implying it's only usable on the output side.

 - Copying in every other circumstance.

I simplified vmsplice() by:

 - If SPLICE_F_GIFT is set, attempting to steal whole pages in the buffer up
   front if not in use by anyone else.

 - Copying in every other circumstance.

That said, there are still sources that I didn't touch yet that attempt to
insert pages into a pipe: relayfs (which does some accounting stuff based on
the final consumption of the pages it inserted), sockets (which don't allow
inserted pages to be stolen) and notifications (which don't want to allocate
at notification time - but I can deal with that).  And there's tee() (which
would need to copy the data).  And pipe-to-pipe splice (which could steal
whole pages, but would otherwise have to copy).


If you would prefer to go for utter simplification, we could make sendfile()
from a buffered file just call sendmsg() directly with MSG_SPLICE_PAGES set
and ignore the pipe entirely (I'm tempted to do this anyway) and then make
splice() to a pipe just do copy_splice_read() and vmsplice() to a pipe do
writev().

I wonder how much splice() is used compared to sendfile().


I would prefer to leave splice() and vmsplice() as they are now and adjust the
documentation.  As you say, they can be considered a type of zerocopy.

David


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 2/4] splice: Make vmsplice() steal or copy
  2023-06-30 15:29   ` David Howells
@ 2023-06-30 17:32     ` Simon Horman
  0 siblings, 0 replies; 32+ messages in thread
From: Simon Horman @ 2023-06-30 17:32 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, Matthew Wilcox, Dave Chinner, Matt Whitlock,
	Linus Torvalds, Jens Axboe, linux-fsdevel, linux-mm,
	linux-kernel, Christoph Hellwig, linux-fsdevel

On Fri, Jun 30, 2023 at 04:29:34PM +0100, David Howells wrote:
> Simon Horman <simon.horman@corigine.com> wrote:
> 
> > But, on a more mundane level, GCC reports that user_page_pipe_buf_ops is
> > (now) unused.  I guess this was the last user, and user_page_pipe_buf_ops
> > can be removed as part of this patch.
> 
> See patch 3.

Thanks, I do see that now.
But as thing stand, bisection is broken.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns
  2023-06-29 15:54 ` [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns David Howells
@ 2023-07-19 10:17   ` Miklos Szeredi
  2023-07-19 17:59     ` Matt Whitlock
  0 siblings, 1 reply; 32+ messages in thread
From: Miklos Szeredi @ 2023-07-19 10:17 UTC (permalink / raw)
  To: David Howells
  Cc: netdev, Matthew Wilcox, Dave Chinner, Matt Whitlock,
	Linus Torvalds, Jens Axboe, linux-fsdevel, linux-mm,
	linux-kernel, Christoph Hellwig, linux-fsdevel

On Thu, 29 Jun 2023 at 17:56, David Howells <dhowells@redhat.com> wrote:
>
> Splicing data from, say, a file into a pipe currently leaves the source
> pages in the pipe after splice() returns - but this means that those pages
> can be subsequently modified by shared-writable mmap(), write(),
> fallocate(), etc. before they're consumed.

What is this trying to fix?   The above behavior is well known, so
it's not likely to be a problem.

Besides, removing spliced pages from the cache is basically guaranteed
to result in a performance regression for any application using
splice.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns
  2023-07-19 10:17   ` Miklos Szeredi
@ 2023-07-19 17:59     ` Matt Whitlock
  2023-07-19 19:35       ` Miklos Szeredi
  0 siblings, 1 reply; 32+ messages in thread
From: Matt Whitlock @ 2023-07-19 17:59 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: David Howells, netdev, Matthew Wilcox, Dave Chinner,
	Linus Torvalds, Jens Axboe, linux-fsdevel, linux-mm,
	linux-kernel, Christoph Hellwig, linux-fsdevel

On Wednesday, 19 July 2023 06:17:51 EDT, Miklos Szeredi wrote:
> On Thu, 29 Jun 2023 at 17:56, David Howells <dhowells@redhat.com> wrote:
>> 
>> Splicing data from, say, a file into a pipe currently leaves the source
>> pages in the pipe after splice() returns - but this means that those pages
>> can be subsequently modified by shared-writable mmap(), write(),
>> fallocate(), etc. before they're consumed.
>
> What is this trying to fix?   The above behavior is well known, so
> it's not likely to be a problem.

Respectfully, it's not well-known, as it's not documented. If the splice(2) 
man page had mentioned that pages can be mutated after they're already 
ostensibly at rest in the output pipe buffer, then my nightly backups 
wouldn't have been incurring corruption silently for many months.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns
  2023-07-19 17:59     ` Matt Whitlock
@ 2023-07-19 19:35       ` Miklos Szeredi
  2023-07-19 19:44         ` Matthew Wilcox
  0 siblings, 1 reply; 32+ messages in thread
From: Miklos Szeredi @ 2023-07-19 19:35 UTC (permalink / raw)
  To: Matt Whitlock
  Cc: David Howells, netdev, Matthew Wilcox, Dave Chinner,
	Linus Torvalds, Jens Axboe, linux-fsdevel, linux-mm,
	linux-kernel, Christoph Hellwig, linux-fsdevel

On Wed, 19 Jul 2023 at 19:59, Matt Whitlock <kernel@mattwhitlock.name> wrote:
>
> On Wednesday, 19 July 2023 06:17:51 EDT, Miklos Szeredi wrote:
> > On Thu, 29 Jun 2023 at 17:56, David Howells <dhowells@redhat.com> wrote:
> >>
> >> Splicing data from, say, a file into a pipe currently leaves the source
> >> pages in the pipe after splice() returns - but this means that those pages
> >> can be subsequently modified by shared-writable mmap(), write(),
> >> fallocate(), etc. before they're consumed.
> >
> > What is this trying to fix?   The above behavior is well known, so
> > it's not likely to be a problem.
>
> Respectfully, it's not well-known, as it's not documented. If the splice(2)
> man page had mentioned that pages can be mutated after they're already
> ostensibly at rest in the output pipe buffer, then my nightly backups
> wouldn't have been incurring corruption silently for many months.

splice(2):

       Though we talk of copying, actual copies are generally avoided.
The kernel does this by implementing a pipe buffer as a set  of
refer‐
       ence-counted  pointers  to  pages  of kernel memory.  The
kernel creates "copies" of pages in a buffer by creating new pointers
(for the
       output buffer) referring to the pages, and increasing the
reference counts for the pages: only pointers are copied, not the
pages of the
       buffer.

While not explicitly stating that the contents of the pages can change
after being spliced, this can easily be inferred from the above
semantics.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns
  2023-07-19 19:35       ` Miklos Szeredi
@ 2023-07-19 19:44         ` Matthew Wilcox
  2023-07-19 19:56           ` Miklos Szeredi
                             ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Matthew Wilcox @ 2023-07-19 19:44 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Matt Whitlock, David Howells, netdev, Dave Chinner,
	Linus Torvalds, Jens Axboe, linux-fsdevel, linux-mm,
	linux-kernel, Christoph Hellwig, linux-fsdevel

On Wed, Jul 19, 2023 at 09:35:33PM +0200, Miklos Szeredi wrote:
> On Wed, 19 Jul 2023 at 19:59, Matt Whitlock <kernel@mattwhitlock.name> wrote:
> >
> > On Wednesday, 19 July 2023 06:17:51 EDT, Miklos Szeredi wrote:
> > > On Thu, 29 Jun 2023 at 17:56, David Howells <dhowells@redhat.com> wrote:
> > >>
> > >> Splicing data from, say, a file into a pipe currently leaves the source
> > >> pages in the pipe after splice() returns - but this means that those pages
> > >> can be subsequently modified by shared-writable mmap(), write(),
> > >> fallocate(), etc. before they're consumed.
> > >
> > > What is this trying to fix?   The above behavior is well known, so
> > > it's not likely to be a problem.
> >
> > Respectfully, it's not well-known, as it's not documented. If the splice(2)
> > man page had mentioned that pages can be mutated after they're already
> > ostensibly at rest in the output pipe buffer, then my nightly backups
> > wouldn't have been incurring corruption silently for many months.
> 
> splice(2):
> 
>        Though we talk of copying, actual copies are generally avoided.
> The kernel does this by implementing a pipe buffer as a set  of
> refer‐
>        ence-counted  pointers  to  pages  of kernel memory.  The
> kernel creates "copies" of pages in a buffer by creating new pointers
> (for the
>        output buffer) referring to the pages, and increasing the
> reference counts for the pages: only pointers are copied, not the
> pages of the
>        buffer.
> 
> While not explicitly stating that the contents of the pages can change
> after being spliced, this can easily be inferred from the above
> semantics.

So what's the API that provides the semantics of _copying_?  And, frankly,
this is a "you're holding it wrong" kind of argument.  It only makes
sense if you're read the implementation, which is at best level 2:

https://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html

and worst a level -5:

https://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns
  2023-07-19 19:44         ` Matthew Wilcox
@ 2023-07-19 19:56           ` Miklos Szeredi
  2023-07-19 20:04             ` Matthew Wilcox
  2023-07-19 20:16           ` Linus Torvalds
  2023-07-24  9:44           ` David Howells
  2 siblings, 1 reply; 32+ messages in thread
From: Miklos Szeredi @ 2023-07-19 19:56 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Matt Whitlock, David Howells, netdev, Dave Chinner,
	Linus Torvalds, Jens Axboe, linux-fsdevel, linux-mm,
	linux-kernel, Christoph Hellwig, linux-fsdevel

On Wed, 19 Jul 2023 at 21:44, Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Jul 19, 2023 at 09:35:33PM +0200, Miklos Szeredi wrote:
> > On Wed, 19 Jul 2023 at 19:59, Matt Whitlock <kernel@mattwhitlock.name> wrote:
> > >
> > > On Wednesday, 19 July 2023 06:17:51 EDT, Miklos Szeredi wrote:
> > > > On Thu, 29 Jun 2023 at 17:56, David Howells <dhowells@redhat.com> wrote:
> > > >>
> > > >> Splicing data from, say, a file into a pipe currently leaves the source
> > > >> pages in the pipe after splice() returns - but this means that those pages
> > > >> can be subsequently modified by shared-writable mmap(), write(),
> > > >> fallocate(), etc. before they're consumed.
> > > >
> > > > What is this trying to fix?   The above behavior is well known, so
> > > > it's not likely to be a problem.
> > >
> > > Respectfully, it's not well-known, as it's not documented. If the splice(2)
> > > man page had mentioned that pages can be mutated after they're already
> > > ostensibly at rest in the output pipe buffer, then my nightly backups
> > > wouldn't have been incurring corruption silently for many months.
> >
> > splice(2):
> >
> >        Though we talk of copying, actual copies are generally avoided.
> > The kernel does this by implementing a pipe buffer as a set  of
> > refer‐
> >        ence-counted  pointers  to  pages  of kernel memory.  The
> > kernel creates "copies" of pages in a buffer by creating new pointers
> > (for the
> >        output buffer) referring to the pages, and increasing the
> > reference counts for the pages: only pointers are copied, not the
> > pages of the
> >        buffer.
> >
> > While not explicitly stating that the contents of the pages can change
> > after being spliced, this can easily be inferred from the above
> > semantics.
>
> So what's the API that provides the semantics of _copying_?

What's your definition of copying?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns
  2023-07-19 19:56           ` Miklos Szeredi
@ 2023-07-19 20:04             ` Matthew Wilcox
  0 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2023-07-19 20:04 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Matt Whitlock, David Howells, netdev, Dave Chinner,
	Linus Torvalds, Jens Axboe, linux-fsdevel, linux-mm,
	linux-kernel, Christoph Hellwig, linux-fsdevel

On Wed, Jul 19, 2023 at 09:56:44PM +0200, Miklos Szeredi wrote:
> On Wed, 19 Jul 2023 at 21:44, Matthew Wilcox <willy@infradead.org> wrote:
> > So what's the API that provides the semantics of _copying_?
> 
> What's your definition of copying?

Future modifications to the pagecache do not affect the data after the
syscall has returned success.  Modifications to the pagecache while
the syscall is in progress may or may not affect the data received at
the destination.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns
  2023-07-19 19:44         ` Matthew Wilcox
  2023-07-19 19:56           ` Miklos Szeredi
@ 2023-07-19 20:16           ` Linus Torvalds
  2023-07-19 21:02             ` Matt Whitlock
  2023-07-24  9:44           ` David Howells
  2 siblings, 1 reply; 32+ messages in thread
From: Linus Torvalds @ 2023-07-19 20:16 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Miklos Szeredi, Matt Whitlock, David Howells, netdev,
	Dave Chinner, Jens Axboe, linux-fsdevel, linux-mm, linux-kernel,
	Christoph Hellwig, linux-fsdevel

On Wed, 19 Jul 2023 at 12:44, Matthew Wilcox <willy@infradead.org> wrote:
>
> So what's the API that provides the semantics of _copying_?

It's called "read()" and "write()".

Seriously.

The *ONLY* reason for splice() existing is for zero-copy. If you don't
want zero-copy (aka "copy by reference"), don't use splice.

Stop arguing against it. If you don't want zero-copy, you use read()
and write(). It really is that simple.

And no, we don't start some kind of crazy "versioned zero-copy with
COW". That's a fundamental mistake. It's a mistake that has been done
- several times - and made perhaps most famous by Hurd, that made that
a big thing.

And yes, this has been documented *forever*. It may not have been
documented on the first line, because IT WAS SO OBVIOUS. The whole
reason splice() is fast is because it avoids the actual copy, and does
a copy-by-reference.

That's still a copy. But a copy-by-reference is a special thing. If
you don't know what copy-by-reference is, or don't want it, don't use
splice().

I don't know how many different ways I can say the same thing.

IF YOU DON'T WANT ZERO-COPY, DON'T USE SPLICE.

IF YOU DON'T UNDERSTAND THE DIFFERENCE BETWEEN COPY-BY-VALUE AND
COPY-BY-REFERENCE, DON'T USE SPLICE.

IF YOU DON'T UNDERSTAND THE *POINT* OF SPLICE, DON'T USE SPLICE.

It's kind of a bit like pointers in C: if you don't understand
pointers but use them anyway, you're going to have a hard time. That's
not the fault of the pointers. Pointers are very very powerful. But if
you are used to languages that only do copy-by-value, you are going to
think they are bad things.

                  Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns
  2023-07-19 20:16           ` Linus Torvalds
@ 2023-07-19 21:02             ` Matt Whitlock
  2023-07-19 23:20               ` Linus Torvalds
  0 siblings, 1 reply; 32+ messages in thread
From: Matt Whitlock @ 2023-07-19 21:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthew Wilcox, Miklos Szeredi, David Howells, netdev,
	Dave Chinner, Jens Axboe, linux-fsdevel, linux-mm, linux-kernel,
	Christoph Hellwig, linux-fsdevel

On Wednesday, 19 July 2023 16:16:07 EDT, Linus Torvalds wrote:
> The *ONLY* reason for splice() existing is for zero-copy.

The very first sentence of splice(2) reads: "splice() moves data between 
two file descriptors without copying between kernel address space and user 
address space." Thus, it is not unreasonable to believe that the point of 
splice is to avoid copying between user-space and kernel-space.

If you use read() and write(), then you're making two copies. If you use 
splice(), then you're making one copy (or zero, but that's an optimization 
that should be invisible to the user).

> And no, we don't start some kind of crazy "versioned zero-copy with
> COW". That's a fundamental mistake.

Agreed. splice() should steal the reference if it can, copy the page data 
if it must. Note that, even in the slow case where the page data must be 
copied, this still gives a better-than-50% speedup over read()+write() 
since an entire copy (and one syscall) is elided.

> IF YOU DON'T UNDERSTAND THE *POINT* OF SPLICE, DON'T USE SPLICE.

Thanks for being so condescending. Your reputation is deserved.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns
  2023-07-19 21:02             ` Matt Whitlock
@ 2023-07-19 23:20               ` Linus Torvalds
  2023-07-19 23:41                 ` Matt Whitlock
  2023-07-19 23:48                 ` Linus Torvalds
  0 siblings, 2 replies; 32+ messages in thread
From: Linus Torvalds @ 2023-07-19 23:20 UTC (permalink / raw)
  To: Matt Whitlock
  Cc: Matthew Wilcox, Miklos Szeredi, David Howells, netdev,
	Dave Chinner, Jens Axboe, linux-fsdevel, linux-mm, linux-kernel,
	Christoph Hellwig, linux-fsdevel

On Wed, 19 Jul 2023 at 14:02, Matt Whitlock <kernel@mattwhitlock.name> wrote:
>
> On Wednesday, 19 July 2023 16:16:07 EDT, Linus Torvalds wrote:
> > The *ONLY* reason for splice() existing is for zero-copy.
>
> The very first sentence of splice(2) reads: "splice() moves data between
> two file descriptors without copying between kernel address space and user
> address space." Thus, it is not unreasonable to believe that the point of
> splice is to avoid copying between user-space and kernel-space.

I'm not at all opposed to clarifying the documentation.

But I *am* very much against changing existing semantics. People rely
on it. The networking layer knows about it. The whole design is all
around "copy by reference".

And changing existing semantics would not only slow things down, it
wouldn't even *fix* anything that got this wrong. They'd still be
broken on old kernels.

When documentation and reality collide, documentation loses. That's
how this works.

> If you use read() and write(), then you're making two copies. If you use
> splice(), then you're making one copy (or zero, but that's an optimization
> that should be invisible to the user).

No. It really isn't.

It is an optimization that is INHERENT IN THE INTERFACE and has been
there literally since day #1. It was *never* invisible. It was the
*point*.

You getting this use case wrong is not an excuse to change reality. It
is, at most, a good reason to clarify the documentation.

The "without copying between kernel address space and user address
space" is about the ability to not copy AT ALL, and yes, let's by all
means clarify that part.

Really. If you cannot understand this fact, and the fact that you
simply misunderstood how splice() worked, I can't really help you
about that.

I repeat: if you want a stable copy of some file data, you *have* to
copy the file data. There's no magic. There's no difference between
"copy to user space" or "copy in kernel space". So you had better just
use "read()".

If you want to avoid the copy, you use one of the interfaces that are
about references to the data. splice() is not the only such interface.
mmap() acts the same way (on input).

You really should see splice() into a pipe as a way to 'mmap the data
without allocating user space backing store".

Of course, splice() *out* of a pipe is different too. It's the same
system call, but "splice into pipe" and "splice out of pipe" are
actually very different animals.

So splicing into a pipe is kind of like a small temporary mmap without
the TLB flush or VM allocation overhead.

But splicing out of the pipe is more akin to "map this buffer into
your own buffers as long as you don't modify it", so it basically say
"you can take just a reference to this page" (complexity addition:
SPLICE_F_GIFT and buffer stealing).

And all of this is literally designed to be able to do zero-copy from
multiple sources to multiple destinations. Not "sendpage()", which
could only do file->network, but a more generic ability like having
data that is sourced from (say) a TV capture card, and is transferred
to the network or maybe to another accelerator for encoding.

That's why the "pipe" part exists. It's the buffer in between
arbitrary end points. It's the replacement for a user buffer. But it's
also literally designed to be all about copy-by-reference.

Really.

So stop arguing. You misused splice(), because you didn't understand
it, and you got burnt. You don't like that. I get it. But that doesn't
make splice() wrong. That only made your use of it buggy.

So splice() is for zero-copy. It expects that you either stabilized
the data somehow (maybe those files are never modified, or maybe you
have other synchronization) or that you simply don't care whether it's
stable, and if the file changes, maybe the data you send changed too.

If you want "one-copy", what you can do is:

 - mmap() the file data (zero copy, not stable yet)

 - use "write()" to write the data to the network. This will copy it
to the skbs before the write() call returns and that copy makes it
stable.

Alternatively, if you want to be more than a bit odd, you _can_ do the
zero-copy on the write side, by doing

 - read the file data (one copy, now it's stable)

 - vmsplice() to the kernel buffer (zero copy)

 - splice() to the network (zero copy at least for the good cases)

and now you just need to make sure that you don't re-use the user
buffer until the network data has actually been sent. Which makes this
second alternative much more complicated than the first one, and I'm
absolutely not recommending it, but I'm mentioning it as a
possibility.

Honestly, the read/vmsplice/splice model is kind of crazy, but there
might be real reasons to do it that odd way, and the buffer handling
in user space is manageable (you might, for example, just decide to
"munmap()" the buffer after sending it out).

For an example of "why would you ever do that", you might have content
conversion issues between the read/vmsplice, or need to generate a
checksum on the stable data or whatever. So it's a *valid* use of
splice(), but it's certainly a bit odd.

              Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns
  2023-07-19 23:20               ` Linus Torvalds
@ 2023-07-19 23:41                 ` Matt Whitlock
  2023-07-20  0:00                   ` Linus Torvalds
  2023-07-19 23:48                 ` Linus Torvalds
  1 sibling, 1 reply; 32+ messages in thread
From: Matt Whitlock @ 2023-07-19 23:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthew Wilcox, Miklos Szeredi, David Howells, netdev,
	Dave Chinner, Jens Axboe, linux-fsdevel, linux-mm, linux-kernel,
	Christoph Hellwig, linux-fsdevel

On Wednesday, 19 July 2023 19:20:32 EDT, Linus Torvalds wrote:
>> On Wednesday, 19 July 2023 16:16:07 EDT, Linus Torvalds wrote:
>>> The *ONLY* reason for splice() existing is for zero-copy.
>> 
>> The very first sentence of splice(2) reads: "splice() moves data between
>> two file descriptors without copying between kernel address space and user
>> address space." Thus, it is not unreasonable to believe that the point of
>> splice is to avoid copying between user-space and kernel-space.
>
> I'm not at all opposed to clarifying the documentation.

Then that is my request. This entire complaint/discussion/argument would 
have been avoided if splice(2) had contained a sentence like this one from 
sendfile(2):

"If out_fd refers to a socket or pipe with zero-copy support, callers must 
ensure the transferred portions of the file referred to by in_fd remain 
unmodified until the reader on the other end of out_fd has consumed the 
transferred data."

That is a clear warning of the perils of the implementation under the hood, 
and it could/should be copied, more or less verbatim, to splice(2).

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns
  2023-07-19 23:20               ` Linus Torvalds
  2023-07-19 23:41                 ` Matt Whitlock
@ 2023-07-19 23:48                 ` Linus Torvalds
  1 sibling, 0 replies; 32+ messages in thread
From: Linus Torvalds @ 2023-07-19 23:48 UTC (permalink / raw)
  To: Matt Whitlock
  Cc: Matthew Wilcox, Miklos Szeredi, David Howells, netdev,
	Dave Chinner, Jens Axboe, linux-fsdevel, linux-mm, linux-kernel,
	Christoph Hellwig, linux-fsdevel

On Wed, 19 Jul 2023 at 16:20, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> If you want "one-copy", what you can do is:
>
>  - mmap() the file data (zero copy, not stable yet)
>
>  - use "write()" to write the data to the network. This will copy it
> to the skbs before the write() call returns and that copy makes it
> stable.
>
> Alternatively, if you want to be more than a bit odd, you _can_ do the
> zero-copy on the write side, by doing
>
>  - read the file data (one copy, now it's stable)
>
>  - vmsplice() to the kernel buffer (zero copy)
>
>  - splice() to the network (zero copy at least for the good cases)

Actually, I guess technically there's a third way:

 - mmap the input (zero copy)

 - write() to a pipe (one copy)

 - splice() to the network (zero copy)

which doesn't seem to really have any sane use cases, but who knows...
It avoids the user buffer management of the vmsplice() model, and
while you cannot do anything to the data in user space *before* it is
stable (because it only becomes stable as it is copied to the pipe
buffers by the 'write()' system call), you could use "tee()" to
duplicate the now stable stream and perhaps log it or create a
checksum after-the-fact.

Another use-case would be if you want to send the *same* stable stream
to two different network connections, while still only having one
copy. You can't do that with plain splice() - because the data isn't
guaranteed to be stable, and the two network connections might see
different streams. You can't do that with the 'mmap and then
write-to-socket' approach, because the two writes not only copy twice,
they might copy different data.

And while you *can* do it with the "read+vmsplice()" approach, maybe
the "write to pipe() in order to avoid any user space buffer issues"
model is better. And "tee()" avoids the overhead of doing multiple
vmsplice() calls on the same buffer.

I dunno.

What I *am* trying to say is that "splice()" is actually kind of
designed for people to do these kinds of combinations. But very very
few people actually do it.

For example, the "tee()" system call exists, but it is crazy hard to
use, I'm not sure it has ever actually been used for anything real.

               Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns
  2023-07-19 23:41                 ` Matt Whitlock
@ 2023-07-20  0:00                   ` Linus Torvalds
  0 siblings, 0 replies; 32+ messages in thread
From: Linus Torvalds @ 2023-07-20  0:00 UTC (permalink / raw)
  To: Matt Whitlock
  Cc: Matthew Wilcox, Miklos Szeredi, David Howells, netdev,
	Dave Chinner, Jens Axboe, linux-fsdevel, linux-mm, linux-kernel,
	Christoph Hellwig, linux-fsdevel

On Wed, 19 Jul 2023 at 16:41, Matt Whitlock <kernel@mattwhitlock.name> wrote:
>
> Then that is my request. This entire complaint/discussion/argument would
> have been avoided if splice(2) had contained a sentence like this one from
> sendfile(2):
>
> "If out_fd refers to a socket or pipe with zero-copy support, callers must
> ensure the transferred portions of the file referred to by in_fd remain
> unmodified until the reader on the other end of out_fd has consumed the
> transferred data."
>
> That is a clear warning of the perils of the implementation under the hood,
> and it could/should be copied, more or less verbatim, to splice(2).

Ack. Internally in the kernel, the two really have always been more or
less of intermingled.

In fact, I think splice()/sendfile()/tee() could - and maybe should -
actually be a single man-page to make it clear that they are all
facets of the same thing.

The issues with TCP_CORK exist for splice too, for example, for
exactly the same reasons.

And while SPLICE_F_MORE exists, it only deals with multiple splice()
calls, it doesn't deal with the "I wrote a header before I even
started using splice()" case that is the one that is mentioned for
sendfile().

Or course, technically TCP_CORK exists for plain write() use as well,
but there the portable and historical fix is simply to use writev()
and send it all in one go.

So it's hopefully only when you use sendfile() and splice() that you
end up with "oh, but I have multiple different *kinds* of sources, and
I want to cork things until I've dealt with them all".

            Linus

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns
  2023-07-19 19:44         ` Matthew Wilcox
  2023-07-19 19:56           ` Miklos Szeredi
  2023-07-19 20:16           ` Linus Torvalds
@ 2023-07-24  9:44           ` David Howells
  2023-07-24 13:55             ` Miklos Szeredi
  2023-07-24 16:15             ` David Howells
  2 siblings, 2 replies; 32+ messages in thread
From: David Howells @ 2023-07-24  9:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: dhowells, Matthew Wilcox, Miklos Szeredi, Matt Whitlock, netdev,
	Dave Chinner, Jens Axboe, linux-fsdevel, linux-mm, linux-kernel,
	Christoph Hellwig, linux-fsdevel

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > So what's the API that provides the semantics of _copying_?
> 
> It's called "read()" and "write()".

What about copy_file_range()?  That seems to fall back to splicing if not
directly implemented by the filesystem.  It looks like the manpage for that
needs updating too - or should that actually copy?

David


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns
  2023-07-24  9:44           ` David Howells
@ 2023-07-24 13:55             ` Miklos Szeredi
  2023-07-24 16:15             ` David Howells
  1 sibling, 0 replies; 32+ messages in thread
From: Miklos Szeredi @ 2023-07-24 13:55 UTC (permalink / raw)
  To: David Howells
  Cc: Linus Torvalds, Matthew Wilcox, Matt Whitlock, netdev,
	Dave Chinner, Jens Axboe, linux-fsdevel, linux-mm, linux-kernel,
	Christoph Hellwig, linux-fsdevel

On Mon, 24 Jul 2023 at 11:44, David Howells <dhowells@redhat.com> wrote:
>
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> > > So what's the API that provides the semantics of _copying_?
> >
> > It's called "read()" and "write()".
>
> What about copy_file_range()?  That seems to fall back to splicing if not
> directly implemented by the filesystem.  It looks like the manpage for that
> needs updating too - or should that actually copy?

Both source and destination of copy_file_range() are regular files and
do_splice_direct() is basically equivalent to write(dest, mmap of
source), no refd buffers remain beyond the end of the syscall.  What
is it that should be updated in the manpage?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns
  2023-07-24  9:44           ` David Howells
  2023-07-24 13:55             ` Miklos Szeredi
@ 2023-07-24 16:15             ` David Howells
  1 sibling, 0 replies; 32+ messages in thread
From: David Howells @ 2023-07-24 16:15 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: dhowells, Linus Torvalds, Matthew Wilcox, Matt Whitlock, netdev,
	Dave Chinner, Jens Axboe, linux-fsdevel, linux-mm, linux-kernel,
	Christoph Hellwig, linux-fsdevel

Miklos Szeredi <miklos@szeredi.hu> wrote:

> Both source and destination of copy_file_range() are regular files and

Ah - the check is in generic_file_rw_checks().  Okay, nevermind.  (Though it
looks like it might allow this to be used with procfiles and suchlike, but
anyone who tries that had probably better know what they're doing).

David


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2023-07-24 16:16 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-29 15:54 [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe David Howells
2023-06-29 15:54 ` [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns David Howells
2023-07-19 10:17   ` Miklos Szeredi
2023-07-19 17:59     ` Matt Whitlock
2023-07-19 19:35       ` Miklos Szeredi
2023-07-19 19:44         ` Matthew Wilcox
2023-07-19 19:56           ` Miklos Szeredi
2023-07-19 20:04             ` Matthew Wilcox
2023-07-19 20:16           ` Linus Torvalds
2023-07-19 21:02             ` Matt Whitlock
2023-07-19 23:20               ` Linus Torvalds
2023-07-19 23:41                 ` Matt Whitlock
2023-07-20  0:00                   ` Linus Torvalds
2023-07-19 23:48                 ` Linus Torvalds
2023-07-24  9:44           ` David Howells
2023-07-24 13:55             ` Miklos Szeredi
2023-07-24 16:15             ` David Howells
2023-06-29 15:54 ` [RFC PATCH 2/4] splice: Make vmsplice() steal or copy David Howells
2023-06-30 13:44   ` Simon Horman
2023-06-30 15:29   ` David Howells
2023-06-30 17:32     ` Simon Horman
2023-06-29 15:54 ` [RFC PATCH 3/4] splice: Remove some now-unused bits David Howells
2023-06-29 15:54 ` [RFC PATCH 4/4] splice: Record some statistics David Howells
2023-06-29 17:56 ` [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe Linus Torvalds
2023-06-29 18:05   ` Matt Whitlock
2023-06-29 18:19     ` Linus Torvalds
2023-06-29 18:34       ` Matthew Wilcox
2023-06-29 18:53         ` Linus Torvalds
2023-06-30 16:50         ` David Howells
2023-06-29 18:42       ` Linus Torvalds
2023-06-29 18:16 ` Matt Whitlock
2023-06-30  0:01 ` Jakub Kicinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.