All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/2] iov_iter: Convert the iterator macros into inline funcs
@ 2023-08-16 12:07 David Howells
  2023-08-16 12:07 ` [PATCH v3 1/2] iov_iter: Convert iterate*() to " David Howells
  2023-08-16 12:07 ` [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc() David Howells
  0 siblings, 2 replies; 24+ messages in thread
From: David Howells @ 2023-08-16 12:07 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: David Howells, Jens Axboe, Christoph Hellwig, Christian Brauner,
	David Laight, Matthew Wilcox, Jeff Layton, linux-fsdevel,
	linux-block, linux-mm, linux-kernel

Hi Al, Linus,

Here are a couple of patches to try and clean up the iov_iter iteration
stuff.

The first patch converts the iov_iter iteration macros to always-inline
functions to make the code easier to follow.  It uses function pointers,
but they should get optimised away.  The priv2 argument should likewise get
optimised away if unused.

The second patch makes _copy_from_iter() and copy_page_from_iter_atomic()
handle the ->copy_mc flag earlier and not in the step function.  This flag
is only set by the coredump code and only with a BVEC iterator, so we can
have special out-of-line handling for this that uses iterate_bvec() rather
than iterate_and_advance() - thereby avoiding repeated checks on it in a
multi-element iterator.

Further changes I could make:

 (1) Add an 'ITER_OTHER' type and an ops table pointer and have
     iterate_and_advance2(), iov_iter_advance(), iov_iter_revert(),
     etc. jump through it if it sees ITER_OTHER type.  This would allow
     types for, say, scatterlist, bio list, skbuff to be added without
     further expanding the core.

 (2) Move the ITER_XARRAY type to being an ITER_OTHER type.  This would
     shrink the core iterators quite a lot and reduce the stack usage as
     the xarray walking stuff wouldn't be there.

 (3) Move the iterate_*() functions into a header file so that bespoke
     iterators can be created elsewhere.  For instance, rbd has an
     optimisation that requires it to scan to the buffer it is given to see
     if it is all zeros.  It would be nice if this could use
     iterate_and_advance() - but that's buried inside lib/iov_iter.c.

Anyway, the overall changes in compiled function size for these patches on
x86_64 look like:

	__copy_from_iter_mc                      new 0xd6
	__export_symbol_iov_iter_init            inc 0x3 -> 0x8 +0x5
	_copy_from_iter                          inc 0x36e -> 0x380 +0x12
	_copy_from_iter_flushcache               inc 0x359 -> 0x364 +0xb
	_copy_from_iter_nocache                  dcr 0x36a -> 0x33e -0x2c
	_copy_mc_to_iter                         inc 0x3a7 -> 0x3bc +0x15
	_copy_to_iter                            dcr 0x358 -> 0x34a -0xe
	copy_page_from_iter_atomic.part.0        inc 0x3cf -> 0x3d4 +0x5
	copy_page_to_iter_nofault.part.0         dcr 0x3f1 -> 0x3a9 -0x48
	copyin                                   del 0x30
	copyout                                  del 0x2d
	copyout_mc                               del 0x2b
	csum_and_copy_from_iter                  dcr 0x3e8 -> 0x3e5 -0x3
	csum_and_copy_to_iter                    dcr 0x46a -> 0x446 -0x24
	iov_iter_zero                            dcr 0x34f -> 0x338 -0x17
	memcpy_from_iter.isra.0                  del 0x1f

with __copy_from_iter_mc() being the out-of-line handling for ->copy_mc.

I've pushed the patches here also:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=iov-cleanup

David

Changes
=======
ver #3)
 - Use min_t(size_t,) not min() to avoid a warning on Hexagon.
 - Inline all the step functions.
 - Added a patch to better handle copy_mc.

ver #2)
 - Rebased on top of Willy's changes in linux-next.
 - Change the checksum argument to the iteration functions to be a general
   void* and use it to pass iter->copy_mc flag to memcpy_from_iter_mc() to
   avoid using a function pointer.
 - Arrange the end of the iterate_*() functions to look the same to give
   the optimiser the best chance.
 - Make iterate_and_advance() a wrapper around iterate_and_advance2().
 - Adjust iterate_and_advance2() to use if-else-if-else-if-else rather than
   switch(), to put ITER_BVEC before KVEC and to mark UBUF and IOVEC as
   likely().
 - Move "iter->count += progress" into iterate_and_advance2() from the
   iterate functions.
 - Mark a number of the iterator helpers with __always_inline.
 - Fix _copy_from_iter_flushcache() to use memcpy_from_iter_flushcache()
   not memcpy_from_iter().

Link: https://lore.kernel.org/r/3710261.1691764329@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/855.1692047347@warthog.procyon.org.uk/ # v2

David Howells (2):
  iov_iter: Convert iterate*() to inline funcs
  iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()

 lib/iov_iter.c | 627 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 386 insertions(+), 241 deletions(-)


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 1/2] iov_iter: Convert iterate*() to inline funcs
  2023-08-16 12:07 [PATCH v3 0/2] iov_iter: Convert the iterator macros into inline funcs David Howells
@ 2023-08-16 12:07 ` David Howells
  2023-08-16 12:07 ` [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc() David Howells
  1 sibling, 0 replies; 24+ messages in thread
From: David Howells @ 2023-08-16 12:07 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: David Howells, Jens Axboe, Christoph Hellwig, Christian Brauner,
	David Laight, Matthew Wilcox, Jeff Layton, linux-fsdevel,
	linux-block, linux-mm, linux-kernel, Christoph Hellwig

Convert the iov_iter iteration macros to inline functions to make the code
easier to follow.

The functions are marked __always_inline as we don't want to end up with
indirect calls in the code.  This, however, leaves dealing with ->copy_mc
in an awkard situation since the step function (memcpy_from_iter_mc())
needs to test the flag in the iterator, but isn't passed the iterator.
This will be dealt with in a follow-up patch.

The variable names in the per-type iterator functions have been harmonised
as much as possible and made clearer as to the variable purpose.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>,
cc: Christian Brauner <christian@brauner.io>,
cc: Matthew Wilcox <willy@infradead.org>,
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: David Laight <David.Laight@ACULAB.COM>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/3710261.1691764329@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/855.1692047347@warthog.procyon.org.uk/ # v2
---

Notes:
    Changes
    =======
    ver #3)
     - Use min_t(size_t,) not min() to avoid a warning on Hexagon.
     - Inline all the step functions.
    
    ver #2)
     - Rebased on top of Willy's changes in linux-next.
     - Change the checksum argument to the iteration functions to be a general
       void* and use it to pass iter->copy_mc flag to memcpy_from_iter_mc() to
       avoid using a function pointer.
     - Arrange the end of the iterate_*() functions to look the same to give
       the optimiser the best chance.
     - Make iterate_and_advance() a wrapper around iterate_and_advance2().
     - Adjust iterate_and_advance2() to use if-else-if-else-if-else rather than
       switch(), to put ITER_BVEC before KVEC and to mark UBUF and IOVEC as
       likely().
     - Move "iter->count += progress" into iterate_and_advance2() from the
       iterate functions.
     - Mark a number of the iterator helpers with __always_inline.
     - Fix _copy_from_iter_flushcache() to use memcpy_from_iter_flushcache()
       not memcpy_from_iter().

 lib/iov_iter.c | 612 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 371 insertions(+), 241 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 424737045b97..378da0cb3069 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -14,188 +14,264 @@
 #include <linux/scatterlist.h>
 #include <linux/instrumented.h>
 
-/* covers ubuf and kbuf alike */
-#define iterate_buf(i, n, base, len, off, __p, STEP) {		\
-	size_t __maybe_unused off = 0;				\
-	len = n;						\
-	base = __p + i->iov_offset;				\
-	len -= (STEP);						\
-	i->iov_offset += len;					\
-	n = len;						\
-}
-
-/* covers iovec and kvec alike */
-#define iterate_iovec(i, n, base, len, off, __p, STEP) {	\
-	size_t off = 0;						\
-	size_t skip = i->iov_offset;				\
-	do {							\
-		len = min(n, __p->iov_len - skip);		\
-		if (likely(len)) {				\
-			base = __p->iov_base + skip;		\
-			len -= (STEP);				\
-			off += len;				\
-			skip += len;				\
-			n -= len;				\
-			if (skip < __p->iov_len)		\
-				break;				\
-		}						\
-		__p++;						\
-		skip = 0;					\
-	} while (n);						\
-	i->iov_offset = skip;					\
-	n = off;						\
-}
-
-#define iterate_bvec(i, n, base, len, off, p, STEP) {		\
-	size_t off = 0;						\
-	unsigned skip = i->iov_offset;				\
-	while (n) {						\
-		unsigned offset = p->bv_offset + skip;		\
-		unsigned left;					\
-		void *kaddr = kmap_local_page(p->bv_page +	\
-					offset / PAGE_SIZE);	\
-		base = kaddr + offset % PAGE_SIZE;		\
-		len = min(min(n, (size_t)(p->bv_len - skip)),	\
-		     (size_t)(PAGE_SIZE - offset % PAGE_SIZE));	\
-		left = (STEP);					\
-		kunmap_local(kaddr);				\
-		len -= left;					\
-		off += len;					\
-		skip += len;					\
-		if (skip == p->bv_len) {			\
-			skip = 0;				\
-			p++;					\
-		}						\
-		n -= len;					\
-		if (left)					\
-			break;					\
-	}							\
-	i->iov_offset = skip;					\
-	n = off;						\
-}
-
-#define iterate_xarray(i, n, base, len, __off, STEP) {		\
-	__label__ __out;					\
-	size_t __off = 0;					\
-	struct folio *folio;					\
-	loff_t start = i->xarray_start + i->iov_offset;		\
-	pgoff_t index = start / PAGE_SIZE;			\
-	XA_STATE(xas, i->xarray, index);			\
-								\
-	len = PAGE_SIZE - offset_in_page(start);		\
-	rcu_read_lock();					\
-	xas_for_each(&xas, folio, ULONG_MAX) {			\
-		unsigned left;					\
-		size_t offset;					\
-		if (xas_retry(&xas, folio))			\
-			continue;				\
-		if (WARN_ON(xa_is_value(folio)))		\
-			break;					\
-		if (WARN_ON(folio_test_hugetlb(folio)))		\
-			break;					\
-		offset = offset_in_folio(folio, start + __off);	\
-		while (offset < folio_size(folio)) {		\
-			base = kmap_local_folio(folio, offset);	\
-			len = min(n, len);			\
-			left = (STEP);				\
-			kunmap_local(base);			\
-			len -= left;				\
-			__off += len;				\
-			n -= len;				\
-			if (left || n == 0)			\
-				goto __out;			\
-			offset += len;				\
-			len = PAGE_SIZE;			\
-		}						\
-	}							\
-__out:								\
-	rcu_read_unlock();					\
-	i->iov_offset += __off;					\
-	n = __off;						\
-}
-
-#define __iterate_and_advance(i, n, base, len, off, I, K) {	\
-	if (unlikely(i->count < n))				\
-		n = i->count;					\
-	if (likely(n)) {					\
-		if (likely(iter_is_ubuf(i))) {			\
-			void __user *base;			\
-			size_t len;				\
-			iterate_buf(i, n, base, len, off,	\
-						i->ubuf, (I)) 	\
-		} else if (likely(iter_is_iovec(i))) {		\
-			const struct iovec *iov = iter_iov(i);	\
-			void __user *base;			\
-			size_t len;				\
-			iterate_iovec(i, n, base, len, off,	\
-						iov, (I))	\
-			i->nr_segs -= iov - iter_iov(i);	\
-			i->__iov = iov;				\
-		} else if (iov_iter_is_bvec(i)) {		\
-			const struct bio_vec *bvec = i->bvec;	\
-			void *base;				\
-			size_t len;				\
-			iterate_bvec(i, n, base, len, off,	\
-						bvec, (K))	\
-			i->nr_segs -= bvec - i->bvec;		\
-			i->bvec = bvec;				\
-		} else if (iov_iter_is_kvec(i)) {		\
-			const struct kvec *kvec = i->kvec;	\
-			void *base;				\
-			size_t len;				\
-			iterate_iovec(i, n, base, len, off,	\
-						kvec, (K))	\
-			i->nr_segs -= kvec - i->kvec;		\
-			i->kvec = kvec;				\
-		} else if (iov_iter_is_xarray(i)) {		\
-			void *base;				\
-			size_t len;				\
-			iterate_xarray(i, n, base, len, off,	\
-							(K))	\
-		}						\
-		i->count -= n;					\
-	}							\
-}
-#define iterate_and_advance(i, n, base, len, off, I, K) \
-	__iterate_and_advance(i, n, base, len, off, I, ((void)(K),0))
-
-static int copyout(void __user *to, const void *from, size_t n)
+typedef size_t (*iov_step_f)(void *iter_base, size_t progress, size_t len,
+			     void *priv, void *priv2);
+typedef size_t (*iov_ustep_f)(void __user *iter_base, size_t progress, size_t len,
+			      void *priv, void *priv2);
+
+static __always_inline
+size_t iterate_ubuf(struct iov_iter *iter, size_t len, void *priv, void *priv2,
+		    iov_ustep_f step)
 {
-	if (should_fail_usercopy())
-		return n;
-	if (access_ok(to, n)) {
-		instrument_copy_to_user(to, from, n);
-		n = raw_copy_to_user(to, from, n);
+	void __user *base = iter->ubuf;
+	size_t progress = 0, remain;
+
+	remain = step(base + iter->iov_offset, 0, len, priv, priv2);
+	progress = len - remain;
+	iter->iov_offset += progress;
+	return progress;
+}
+
+static __always_inline
+size_t iterate_iovec(struct iov_iter *iter, size_t len, void *priv, void *priv2,
+		     iov_ustep_f step)
+{
+	const struct iovec *p = iter->__iov;
+	size_t progress = 0, skip = iter->iov_offset;
+
+	do {
+		size_t remain, consumed;
+		size_t part = min(len, p->iov_len - skip);
+
+		if (likely(part)) {
+			remain = step(p->iov_base + skip, progress, part, priv, priv2);
+			consumed = part - remain;
+			progress += consumed;
+			skip += consumed;
+			len -= consumed;
+			if (skip < p->iov_len)
+				break;
+		}
+		p++;
+		skip = 0;
+	} while (len);
+
+	iter->__iov = p;
+	iter->nr_segs -= p - iter->__iov;
+	iter->iov_offset = skip;
+	return progress;
+}
+
+static __always_inline
+size_t iterate_kvec(struct iov_iter *iter, size_t len, void *priv, void *priv2,
+		    iov_step_f step)
+{
+	const struct kvec *p = iter->kvec;
+	size_t progress = 0, skip = iter->iov_offset;
+
+	do {
+		size_t remain, consumed;
+		size_t part = min(len, p->iov_len - skip);
+
+		if (likely(part)) {
+			remain = step(p->iov_base + skip, progress, part, priv, priv2);
+			consumed = part - remain;
+			progress += consumed;
+			skip += consumed;
+			len -= consumed;
+			if (skip < p->iov_len)
+				break;
+		}
+		p++;
+		skip = 0;
+	} while (len);
+
+	iter->kvec = p;
+	iter->nr_segs -= p - iter->kvec;
+	iter->iov_offset = skip;
+	return progress;
+}
+
+static __always_inline
+size_t iterate_bvec(struct iov_iter *iter, size_t len, void *priv, void *priv2,
+		    iov_step_f step)
+{
+	const struct bio_vec *p = iter->bvec;
+	size_t progress = 0, skip = iter->iov_offset;
+
+	do {
+		size_t remain, consumed;
+		size_t offset = p->bv_offset + skip, part;
+		void *kaddr = kmap_local_page(p->bv_page + offset / PAGE_SIZE);
+
+		part = min3(len,
+			   (size_t)(p->bv_len - skip),
+			   (size_t)(PAGE_SIZE - offset % PAGE_SIZE));
+		remain = step(kaddr + offset % PAGE_SIZE, progress, part, priv, priv2);
+		kunmap_local(kaddr);
+		consumed = part - remain;
+		len -= consumed;
+		progress += consumed;
+		skip += consumed;
+		if (skip >= p->bv_len) {
+			skip = 0;
+			p++;
+		}
+		if (remain)
+			break;
+	} while (len);
+
+	iter->bvec = p;
+	iter->nr_segs -= p - iter->bvec;
+	iter->iov_offset = skip;
+	return progress;
+}
+
+static __always_inline
+size_t iterate_xarray(struct iov_iter *iter, size_t len, void *priv, void *priv2,
+		      iov_step_f step)
+{
+	struct folio *folio;
+	size_t progress = 0;
+	loff_t start = iter->xarray_start + iter->iov_offset;
+	pgoff_t index = start / PAGE_SIZE;
+	XA_STATE(xas, iter->xarray, index);
+
+	rcu_read_lock();
+	xas_for_each(&xas, folio, ULONG_MAX) {
+		size_t remain, consumed, offset, part, flen;
+
+		if (xas_retry(&xas, folio))
+			continue;
+		if (WARN_ON(xa_is_value(folio)))
+			break;
+		if (WARN_ON(folio_test_hugetlb(folio)))
+			break;
+
+		offset = offset_in_folio(folio, start);
+		flen = min(folio_size(folio) - offset, len);
+		start += flen;
+
+		while (flen) {
+			void *base = kmap_local_folio(folio, offset);
+
+			part = min_t(size_t, flen,
+				     PAGE_SIZE - offset_in_page(offset));
+			remain = step(base, progress, part, priv, priv2);
+			kunmap_local(base);
+
+			consumed = part - remain;
+			progress += consumed;
+			len -= consumed;
+
+			if (remain || len == 0)
+				goto out;
+			flen -= consumed;
+			offset += consumed;
+		}
 	}
-	return n;
+
+out:
+	rcu_read_unlock();
+	iter->iov_offset += progress;
+	return progress;
 }
 
-static int copyout_nofault(void __user *to, const void *from, size_t n)
+static __always_inline
+size_t iterate_and_advance2(struct iov_iter *iter, size_t len, void *priv,
+			    void *priv2, iov_ustep_f ustep, iov_step_f step)
 {
-	long res;
+	size_t progress;
 
+	if (unlikely(iter->count < len))
+		len = iter->count;
+	if (unlikely(!len))
+		return 0;
+
+	if (likely(iter_is_ubuf(iter)))
+		progress = iterate_ubuf(iter, len, priv, priv2, ustep);
+	else if (likely(iter_is_iovec(iter)))
+		progress = iterate_iovec(iter, len, priv, priv2, ustep);
+	else if (iov_iter_is_bvec(iter))
+		progress = iterate_bvec(iter, len, priv, priv2, step);
+	else if (iov_iter_is_kvec(iter))
+		progress = iterate_kvec(iter, len, priv, priv2, step);
+	else if (iov_iter_is_xarray(iter))
+		progress = iterate_xarray(iter, len, priv, priv2, step);
+	else
+		progress = len;
+	iter->count -= progress;
+	return progress;
+}
+
+static __always_inline
+size_t iterate_and_advance(struct iov_iter *iter, size_t len, void *priv,
+			   iov_ustep_f ustep, iov_step_f step)
+{
+	return iterate_and_advance2(iter, len, priv, NULL, ustep, step);
+}
+
+static __always_inline
+size_t copy_to_user_iter(void __user *iter_to, size_t progress,
+			 size_t len, void *from, void *priv2)
+{
 	if (should_fail_usercopy())
-		return n;
+		return len;
+	if (access_ok(iter_to, len)) {
+		from += progress;
+		instrument_copy_to_user(iter_to, from, len);
+		len = raw_copy_to_user(iter_to, from, len);
+	}
+	return len;
+}
 
-	res = copy_to_user_nofault(to, from, n);
+static __always_inline
+size_t copy_to_user_iter_nofault(void __user *iter_to, size_t progress,
+				 size_t len, void *from, void *priv2)
+{
+	ssize_t res;
 
-	return res < 0 ? n : res;
+	if (should_fail_usercopy())
+		return len;
+
+	from += progress;
+	res = copy_to_user_nofault(iter_to, from, len);
+	return res < 0 ? len : res;
 }
 
-static int copyin(void *to, const void __user *from, size_t n)
+static __always_inline
+size_t copy_from_user_iter(void __user *iter_from, size_t progress,
+			   size_t len, void *to, void *priv2)
 {
-	size_t res = n;
+	size_t res = len;
 
 	if (should_fail_usercopy())
-		return n;
-	if (access_ok(from, n)) {
-		instrument_copy_from_user_before(to, from, n);
-		res = raw_copy_from_user(to, from, n);
-		instrument_copy_from_user_after(to, from, n, res);
+		return len;
+	if (access_ok(iter_from, len)) {
+		to += progress;
+		instrument_copy_from_user_before(to, iter_from, len);
+		res = raw_copy_from_user(to, iter_from, len);
+		instrument_copy_from_user_after(to, iter_from, len, res);
 	}
 	return res;
 }
 
+static __always_inline
+size_t memcpy_to_iter(void *iter_to, size_t progress,
+		      size_t len, void *from, void *priv2)
+{
+	memcpy(iter_to, from + progress, len);
+	return 0;
+}
+
+static __always_inline
+size_t memcpy_from_iter(void *iter_from, size_t progress,
+			size_t len, void *to, void *priv2)
+{
+	memcpy(to + progress, iter_from, len);
+	return 0;
+}
+
 /*
  * fault_in_iov_iter_readable - fault in iov iterator for reading
  * @i: iterator
@@ -313,23 +389,29 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 		return 0;
 	if (user_backed_iter(i))
 		might_fault();
-	iterate_and_advance(i, bytes, base, len, off,
-		copyout(base, addr + off, len),
-		memcpy(base, addr + off, len)
-	)
-
-	return bytes;
+	return iterate_and_advance(i, bytes, (void *)addr,
+				   copy_to_user_iter, memcpy_to_iter);
 }
 EXPORT_SYMBOL(_copy_to_iter);
 
 #ifdef CONFIG_ARCH_HAS_COPY_MC
-static int copyout_mc(void __user *to, const void *from, size_t n)
-{
-	if (access_ok(to, n)) {
-		instrument_copy_to_user(to, from, n);
-		n = copy_mc_to_user((__force void *) to, from, n);
+static __always_inline
+size_t copy_to_user_iter_mc(void __user *iter_to, size_t progress,
+			    size_t len, void *from, void *priv2)
+{
+	if (access_ok(iter_to, len)) {
+		from += progress;
+		instrument_copy_to_user(iter_to, from, len);
+		len = copy_mc_to_user(iter_to, from, len);
 	}
-	return n;
+	return len;
+}
+
+static __always_inline
+size_t memcpy_to_iter_mc(void *iter_to, size_t progress,
+			 size_t len, void *from, void *priv2)
+{
+	return copy_mc_to_kernel(iter_to, from + progress, len);
 }
 
 /**
@@ -362,22 +444,20 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 		return 0;
 	if (user_backed_iter(i))
 		might_fault();
-	__iterate_and_advance(i, bytes, base, len, off,
-		copyout_mc(base, addr + off, len),
-		copy_mc_to_kernel(base, addr + off, len)
-	)
-
-	return bytes;
+	return iterate_and_advance(i, bytes, (void *)addr,
+				   copy_to_user_iter_mc, memcpy_to_iter_mc);
 }
 EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
 #endif /* CONFIG_ARCH_HAS_COPY_MC */
 
-static void *memcpy_from_iter(struct iov_iter *i, void *to, const void *from,
-				 size_t size)
+static size_t memcpy_from_iter_mc(void *iter_from, size_t progress,
+				  size_t len, void *to, void *priv2)
 {
-	if (iov_iter_is_copy_mc(i))
-		return (void *)copy_mc_to_kernel(to, from, size);
-	return memcpy(to, from, size);
+	struct iov_iter *iter = priv2;
+
+	if (iov_iter_is_copy_mc(iter))
+		return copy_mc_to_kernel(to + progress, iter_from, len);
+	return memcpy_from_iter(iter_from, progress, len, to, priv2);
 }
 
 size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
@@ -387,30 +467,46 @@ size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
 
 	if (user_backed_iter(i))
 		might_fault();
-	iterate_and_advance(i, bytes, base, len, off,
-		copyin(addr + off, base, len),
-		memcpy_from_iter(i, addr + off, base, len)
-	)
-
-	return bytes;
+	return iterate_and_advance2(i, bytes, addr, i,
+				    copy_from_user_iter,
+				    memcpy_from_iter_mc);
 }
 EXPORT_SYMBOL(_copy_from_iter);
 
+static __always_inline
+size_t copy_from_user_iter_nocache(void __user *iter_from, size_t progress,
+				   size_t len, void *to, void *priv2)
+{
+	return __copy_from_user_inatomic_nocache(to + progress, iter_from, len);
+}
+
 size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
 {
 	if (WARN_ON_ONCE(!i->data_source))
 		return 0;
 
-	iterate_and_advance(i, bytes, base, len, off,
-		__copy_from_user_inatomic_nocache(addr + off, base, len),
-		memcpy(addr + off, base, len)
-	)
-
-	return bytes;
+	return iterate_and_advance(i, bytes, addr,
+				   copy_from_user_iter_nocache,
+				   memcpy_from_iter);
 }
 EXPORT_SYMBOL(_copy_from_iter_nocache);
 
 #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+static __always_inline
+size_t copy_from_user_iter_flushcache(void __user *iter_from, size_t progress,
+				      size_t len, void *to, void *priv2)
+{
+	return __copy_from_user_flushcache(to + progress, iter_from, len);
+}
+
+static __always_inline
+size_t memcpy_from_iter_flushcache(void *iter_from, size_t progress,
+				   size_t len, void *to, void *priv2)
+{
+	memcpy_flushcache(to + progress, iter_from, len);
+	return 0;
+}
+
 /**
  * _copy_from_iter_flushcache - write destination through cpu cache
  * @addr: destination kernel address
@@ -432,12 +528,9 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
 	if (WARN_ON_ONCE(!i->data_source))
 		return 0;
 
-	iterate_and_advance(i, bytes, base, len, off,
-		__copy_from_user_flushcache(addr + off, base, len),
-		memcpy_flushcache(addr + off, base, len)
-	)
-
-	return bytes;
+	return iterate_and_advance(i, bytes, addr,
+				   copy_from_user_iter_flushcache,
+				   memcpy_from_iter_flushcache);
 }
 EXPORT_SYMBOL_GPL(_copy_from_iter_flushcache);
 #endif
@@ -509,10 +602,9 @@ size_t copy_page_to_iter_nofault(struct page *page, unsigned offset, size_t byte
 		void *kaddr = kmap_local_page(page);
 		size_t n = min(bytes, (size_t)PAGE_SIZE - offset);
 
-		iterate_and_advance(i, n, base, len, off,
-			copyout_nofault(base, kaddr + offset + off, len),
-			memcpy(base, kaddr + offset + off, len)
-		)
+		n = iterate_and_advance(i, bytes, kaddr,
+					copy_to_user_iter_nofault,
+					memcpy_to_iter);
 		kunmap_local(kaddr);
 		res += n;
 		bytes -= n;
@@ -555,14 +647,25 @@ size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
 }
 EXPORT_SYMBOL(copy_page_from_iter);
 
-size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
+static __always_inline
+size_t zero_to_user_iter(void __user *iter_to, size_t progress,
+			 size_t len, void *priv, void *priv2)
 {
-	iterate_and_advance(i, bytes, base, len, count,
-		clear_user(base, len),
-		memset(base, 0, len)
-	)
+	return clear_user(iter_to, len);
+}
 
-	return bytes;
+static __always_inline
+size_t zero_to_iter(void *iter_to, size_t progress,
+		    size_t len, void *priv, void *priv2)
+{
+	memset(iter_to, 0, len);
+	return 0;
+}
+
+size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
+{
+	return iterate_and_advance(i, bytes, NULL,
+				   zero_to_user_iter, zero_to_iter);
 }
 EXPORT_SYMBOL(iov_iter_zero);
 
@@ -587,10 +690,9 @@ size_t copy_page_from_iter_atomic(struct page *page, size_t offset,
 		}
 
 		p = kmap_atomic(page) + offset;
-		iterate_and_advance(i, n, base, len, off,
-			copyin(p + off, base, len),
-			memcpy_from_iter(i, p + off, base, len)
-		)
+		n = iterate_and_advance2(i, n, p, i,
+					 copy_from_user_iter,
+					 memcpy_from_iter_mc);
 		kunmap_atomic(p);
 		copied += n;
 		offset += n;
@@ -1181,32 +1283,64 @@ ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,
 }
 EXPORT_SYMBOL(iov_iter_get_pages_alloc2);
 
+static __always_inline
+size_t copy_from_user_iter_csum(void __user *iter_from, size_t progress,
+				size_t len, void *to, void *priv2)
+{
+	__wsum next, *csum = priv2;
+
+	next = csum_and_copy_from_user(iter_from, to + progress, len);
+	*csum = csum_block_add(*csum, next, progress);
+	return next ? 0 : len;
+}
+
+static __always_inline
+size_t memcpy_from_iter_csum(void *iter_from, size_t progress,
+			     size_t len, void *to, void *priv2)
+{
+	__wsum *csum = priv2;
+
+	*csum = csum_and_memcpy(to + progress, iter_from, len, *csum, progress);
+	return 0;
+}
+
 size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
 			       struct iov_iter *i)
 {
-	__wsum sum, next;
-	sum = *csum;
 	if (WARN_ON_ONCE(!i->data_source))
 		return 0;
-
-	iterate_and_advance(i, bytes, base, len, off, ({
-		next = csum_and_copy_from_user(base, addr + off, len);
-		sum = csum_block_add(sum, next, off);
-		next ? 0 : len;
-	}), ({
-		sum = csum_and_memcpy(addr + off, base, len, sum, off);
-	})
-	)
-	*csum = sum;
-	return bytes;
+	return iterate_and_advance2(i, bytes, addr, csum,
+				    copy_from_user_iter_csum,
+				    memcpy_from_iter_csum);
 }
 EXPORT_SYMBOL(csum_and_copy_from_iter);
 
+static __always_inline
+size_t copy_to_user_iter_csum(void __user *iter_to, size_t progress,
+			      size_t len, void *from, void *priv2)
+{
+	__wsum next, *csum = priv2;
+
+	next = csum_and_copy_to_user(from + progress, iter_to, len);
+	*csum = csum_block_add(*csum, next, progress);
+	return next ? 0 : len;
+}
+
+static __always_inline
+size_t memcpy_to_iter_csum(void *iter_to, size_t progress,
+			   size_t len, void *from, void *priv2)
+{
+	__wsum *csum = priv2;
+
+	*csum = csum_and_memcpy(iter_to, from + progress, len, *csum, progress);
+	return 0;
+}
+
 size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate,
 			     struct iov_iter *i)
 {
 	struct csum_state *csstate = _csstate;
-	__wsum sum, next;
+	__wsum sum;
 
 	if (WARN_ON_ONCE(i->data_source))
 		return 0;
@@ -1220,14 +1354,10 @@ size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate,
 	}
 
 	sum = csum_shift(csstate->csum, csstate->off);
-	iterate_and_advance(i, bytes, base, len, off, ({
-		next = csum_and_copy_to_user(addr + off, base, len);
-		sum = csum_block_add(sum, next, off);
-		next ? 0 : len;
-	}), ({
-		sum = csum_and_memcpy(base, addr + off, len, sum, off);
-	})
-	)
+	
+	bytes = iterate_and_advance2(i, bytes, (void *)addr, &sum,
+				     copy_to_user_iter_csum,
+				     memcpy_to_iter_csum);
 	csstate->csum = csum_shift(sum, csstate->off);
 	csstate->off += bytes;
 	return bytes;


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
  2023-08-16 12:07 [PATCH v3 0/2] iov_iter: Convert the iterator macros into inline funcs David Howells
  2023-08-16 12:07 ` [PATCH v3 1/2] iov_iter: Convert iterate*() to " David Howells
@ 2023-08-16 12:07 ` David Howells
  2023-08-16 12:28   ` David Laight
  2023-08-16 13:00   ` David Howells
  1 sibling, 2 replies; 24+ messages in thread
From: David Howells @ 2023-08-16 12:07 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: David Howells, Jens Axboe, Christoph Hellwig, Christian Brauner,
	David Laight, Matthew Wilcox, Jeff Layton, linux-fsdevel,
	linux-block, linux-mm, linux-kernel

iter->copy_mc is only used with a bvec iterator and only by
dump_emit_page() in fs/coredump.c so rather than handle this in
memcpy_from_iter_mc() where it is checked repeatedly by _copy_from_iter() and
copy_page_from_iter_atomic(),
---
 lib/iov_iter.c | 39 +++++++++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 378da0cb3069..33febccadd9d 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -450,14 +450,33 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
 #endif /* CONFIG_ARCH_HAS_COPY_MC */
 
-static size_t memcpy_from_iter_mc(void *iter_from, size_t progress,
-				  size_t len, void *to, void *priv2)
+static __always_inline
+size_t memcpy_from_iter_mc(void *iter_from, size_t progress,
+			   size_t len, void *to, void *priv2)
 {
-	struct iov_iter *iter = priv2;
+	return copy_mc_to_kernel(to + progress, iter_from, len);
+}
 
-	if (iov_iter_is_copy_mc(iter))
-		return copy_mc_to_kernel(to + progress, iter_from, len);
-	return memcpy_from_iter(iter_from, progress, len, to, priv2);
+static size_t __copy_from_iter_mc(void *addr, size_t bytes, struct iov_iter *i)
+{
+	size_t progress;
+
+	if (unlikely(i->count < bytes))
+		bytes = i->count;
+	if (unlikely(!bytes))
+		return 0;
+	progress = iterate_bvec(i, bytes, addr, NULL, memcpy_from_iter_mc);
+	i->count -= progress;
+	return progress;
+}
+
+static __always_inline
+size_t __copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
+{
+	if (unlikely(iov_iter_is_copy_mc(i)))
+		return __copy_from_iter_mc(addr, bytes, i);
+	return iterate_and_advance(i, bytes, addr,
+				   copy_from_user_iter, memcpy_from_iter);
 }
 
 size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
@@ -467,9 +486,7 @@ size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
 
 	if (user_backed_iter(i))
 		might_fault();
-	return iterate_and_advance2(i, bytes, addr, i,
-				    copy_from_user_iter,
-				    memcpy_from_iter_mc);
+	return __copy_from_iter(addr, bytes, i);
 }
 EXPORT_SYMBOL(_copy_from_iter);
 
@@ -690,9 +707,7 @@ size_t copy_page_from_iter_atomic(struct page *page, size_t offset,
 		}
 
 		p = kmap_atomic(page) + offset;
-		n = iterate_and_advance2(i, n, p, i,
-					 copy_from_user_iter,
-					 memcpy_from_iter_mc);
+		__copy_from_iter(p, n, i);
 		kunmap_atomic(p);
 		copied += n;
 		offset += n;


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* RE: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
  2023-08-16 12:07 ` [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc() David Howells
@ 2023-08-16 12:28   ` David Laight
  2023-08-16 13:00   ` David Howells
  1 sibling, 0 replies; 24+ messages in thread
From: David Laight @ 2023-08-16 12:28 UTC (permalink / raw)
  To: 'David Howells', Al Viro, Linus Torvalds
  Cc: Jens Axboe, Christoph Hellwig, Christian Brauner, Matthew Wilcox,
	Jeff Layton, linux-fsdevel, linux-block, linux-mm, linux-kernel

From: David Howells
> Sent: Wednesday, August 16, 2023 1:08 PM

Given:

> iter->copy_mc is only used with a bvec iterator and only by
> dump_emit_page() in fs/coredump.c so rather than handle this in
> memcpy_from_iter_mc() where it is checked repeatedly by _copy_from_iter() and
> copy_page_from_iter_atomic(),

Instead of adding the extra check to every copy:

> +static __always_inline
> +size_t __copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
> +{
> +	if (unlikely(iov_iter_is_copy_mc(i)))
> +		return __copy_from_iter_mc(addr, bytes, i);
> +	return iterate_and_advance(i, bytes, addr,
> +				   copy_from_user_iter, memcpy_from_iter);
>  }

Couldn't the relevant code directly call __copy_frtom_iter_mc() ?
Or a version then checked iov_is_copy_mc() and then fell
back to the standard function.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
  2023-08-16 12:07 ` [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc() David Howells
  2023-08-16 12:28   ` David Laight
@ 2023-08-16 13:00   ` David Howells
  2023-08-16 14:19     ` David Laight
  1 sibling, 1 reply; 24+ messages in thread
From: David Howells @ 2023-08-16 13:00 UTC (permalink / raw)
  To: David Laight
  Cc: dhowells, Al Viro, Linus Torvalds, Jens Axboe, Christoph Hellwig,
	Christian Brauner, Matthew Wilcox, Jeff Layton, linux-fsdevel,
	linux-block, linux-mm, linux-kernel

David Laight <David.Laight@ACULAB.COM> wrote:

> 
> Couldn't the relevant code directly call __copy_from_iter_mc() ?
> Or a version then checked iov_is_copy_mc() and then fell
> back to the standard function.

No, because the marked iterator is handed by the coredump code to
__kernel_write_iter() and thence on to who-knows-what driver - which will call
copy_from_iter() or some such.  $DRIVER shouldn't need to know about
->copy_mc.

One thing I do wonder about, though, is what should happen if they call, say,
csum_and_copy_from_iter()?  That doesn't have an _mc variant.  Or what if they
extract the pages and operate directly on those?

David


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
  2023-08-16 13:00   ` David Howells
@ 2023-08-16 14:19     ` David Laight
  2023-08-16 18:50       ` Linus Torvalds
                         ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: David Laight @ 2023-08-16 14:19 UTC (permalink / raw)
  To: 'David Howells'
  Cc: Al Viro, Linus Torvalds, Jens Axboe, Christoph Hellwig,
	Christian Brauner, Matthew Wilcox, Jeff Layton, linux-fsdevel,
	linux-block, linux-mm, linux-kernel

From: David Howells
> Sent: Wednesday, August 16, 2023 2:01 PM
> 
> David Laight <David.Laight@ACULAB.COM> wrote:
> 
> >
> > Couldn't the relevant code directly call __copy_from_iter_mc() ?
> > Or a version then checked iov_is_copy_mc() and then fell
> > back to the standard function.
> 
> No, because the marked iterator is handed by the coredump code to
> __kernel_write_iter() and thence on to who-knows-what driver - which will call
> copy_from_iter() or some such.  $DRIVER shouldn't need to know about
> ->copy_mc.

What about ITER_BVEC_MC ??

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
  2023-08-16 14:19     ` David Laight
@ 2023-08-16 18:50       ` Linus Torvalds
  2023-08-16 20:35       ` David Howells
  2023-08-18 11:39       ` David Howells
  2 siblings, 0 replies; 24+ messages in thread
From: Linus Torvalds @ 2023-08-16 18:50 UTC (permalink / raw)
  To: David Laight
  Cc: David Howells, Al Viro, Jens Axboe, Christoph Hellwig,
	Christian Brauner, Matthew Wilcox, Jeff Layton, linux-fsdevel,
	linux-block, linux-mm, linux-kernel

On Wed, 16 Aug 2023 at 14:19, David Laight <David.Laight@aculab.com> wrote:
>
> What about ITER_BVEC_MC ??

That probably would be the best option. Just make it a proper
ITER_xyz, instead of an odd sub-case for one ITER (but set up in such
a way that it looks like it might happen for other ITER_xyz cases).

                Linus

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
  2023-08-16 14:19     ` David Laight
  2023-08-16 18:50       ` Linus Torvalds
@ 2023-08-16 20:35       ` David Howells
  2023-08-17  4:18         ` Linus Torvalds
                           ` (2 more replies)
  2023-08-18 11:39       ` David Howells
  2 siblings, 3 replies; 24+ messages in thread
From: David Howells @ 2023-08-16 20:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: dhowells, David Laight, Al Viro, Jens Axboe, Christoph Hellwig,
	Christian Brauner, Matthew Wilcox, Jeff Layton, linux-fsdevel,
	linux-block, linux-mm, linux-kernel

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > What about ITER_BVEC_MC ??
> 
> That probably would be the best option. Just make it a proper
> ITER_xyz, instead of an odd sub-case for one ITER (but set up in such
> a way that it looks like it might happen for other ITER_xyz cases).

I'm not sure that buys us anything.  It would then require every call to
iov_iter_is_bvec()[*] to check for two values instead of one - including in
iterate_and_advance() - and *still* we'd have to have the special-casing in
_copy_from_iter() and copy_page_from_iter_atomic().

The issue is that ITER_xyz changes the iteration function - but we don't
actually want to do that; rather, we need to change the step function.

David

[*] There's a bunch of them outside of iov_iter.c.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
  2023-08-16 20:35       ` David Howells
@ 2023-08-17  4:18         ` Linus Torvalds
  2023-08-17  8:41           ` David Laight
  2023-08-18 11:42         ` David Howells
  2023-08-18 13:33         ` David Howells
  2 siblings, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2023-08-17  4:18 UTC (permalink / raw)
  To: David Howells
  Cc: David Laight, Al Viro, Jens Axboe, Christoph Hellwig,
	Christian Brauner, Matthew Wilcox, Jeff Layton, linux-fsdevel,
	linux-block, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 772 bytes --]

On Wed, 16 Aug 2023 at 22:35, David Howells <dhowells@redhat.com> wrote:
>
> I'm not sure that buys us anything.  It would then require every call to
> iov_iter_is_bvec()[*] to check for two values instead of one

Well, that part is trivially fixable, and we should do that anyway for
other reasons.

See the attached patch.

> The issue is that ITER_xyz changes the iteration function - but we don't
> actually want to do that; rather, we need to change the step function.

Yeah, that may be the fundamental issue. But making the ITER_xyz flags
be bit masks would help - partly exactly because it makes it so
trivial yo say "for this set of ITER_xyz, do this".

This patch only does that for the 'user_backed' thing, which was a similar case.

Hmm?

               Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 5325 bytes --]

 drivers/infiniband/hw/hfi1/file_ops.c    |  2 +-
 drivers/infiniband/hw/qib/qib_file_ops.c |  2 +-
 include/linux/uio.h                      | 36 +++++++++++++++-----------------
 lib/iov_iter.c                           |  1 -
 sound/core/pcm_native.c                  |  4 ++--
 5 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c
index a5ab22cedd41..788fc249234f 100644
--- a/drivers/infiniband/hw/hfi1/file_ops.c
+++ b/drivers/infiniband/hw/hfi1/file_ops.c
@@ -267,7 +267,7 @@ static ssize_t hfi1_write_iter(struct kiocb *kiocb, struct iov_iter *from)
 
 	if (!HFI1_CAP_IS_KSET(SDMA))
 		return -EINVAL;
-	if (!from->user_backed)
+	if (!user_backed_iter(from))
 		return -EINVAL;
 	idx = srcu_read_lock(&fd->pq_srcu);
 	pq = srcu_dereference(fd->pq, &fd->pq_srcu);
diff --git a/drivers/infiniband/hw/qib/qib_file_ops.c b/drivers/infiniband/hw/qib/qib_file_ops.c
index ef85bc8d9384..09a6d9121b3d 100644
--- a/drivers/infiniband/hw/qib/qib_file_ops.c
+++ b/drivers/infiniband/hw/qib/qib_file_ops.c
@@ -2244,7 +2244,7 @@ static ssize_t qib_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	struct qib_ctxtdata *rcd = ctxt_fp(iocb->ki_filp);
 	struct qib_user_sdma_queue *pq = fp->pq;
 
-	if (!from->user_backed || !from->nr_segs || !pq)
+	if (!user_backed_iter(from) || !from->nr_segs || !pq)
 		return -EINVAL;
 
 	return qib_user_sdma_writev(rcd, pq, iter_iov(from), from->nr_segs);
diff --git a/include/linux/uio.h b/include/linux/uio.h
index ff81e5ccaef2..230da97a42d5 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -21,12 +21,12 @@ struct kvec {
 
 enum iter_type {
 	/* iter types */
-	ITER_IOVEC,
-	ITER_KVEC,
-	ITER_BVEC,
-	ITER_XARRAY,
-	ITER_DISCARD,
-	ITER_UBUF,
+	ITER_IOVEC = 1,
+	ITER_UBUF = 2,
+	ITER_KVEC = 4,
+	ITER_BVEC = 8,
+	ITER_XARRAY = 16,
+	ITER_DISCARD = 32,
 };
 
 #define ITER_SOURCE	1	// == WRITE
@@ -39,11 +39,10 @@ struct iov_iter_state {
 };
 
 struct iov_iter {
-	u8 iter_type;
-	bool copy_mc;
-	bool nofault;
+	u8	iter_type:6,
+		copy_mc:1,
+		nofault:1;
 	bool data_source;
-	bool user_backed;
 	union {
 		size_t iov_offset;
 		int last_offset;
@@ -85,7 +84,7 @@ struct iov_iter {
 
 static inline const struct iovec *iter_iov(const struct iov_iter *iter)
 {
-	if (iter->iter_type == ITER_UBUF)
+	if (iter->iter_type & ITER_UBUF)
 		return (const struct iovec *) &iter->__ubuf_iovec;
 	return iter->__iov;
 }
@@ -108,32 +107,32 @@ static inline void iov_iter_save_state(struct iov_iter *iter,
 
 static inline bool iter_is_ubuf(const struct iov_iter *i)
 {
-	return iov_iter_type(i) == ITER_UBUF;
+	return iov_iter_type(i) & ITER_UBUF;
 }
 
 static inline bool iter_is_iovec(const struct iov_iter *i)
 {
-	return iov_iter_type(i) == ITER_IOVEC;
+	return iov_iter_type(i) & ITER_IOVEC;
 }
 
 static inline bool iov_iter_is_kvec(const struct iov_iter *i)
 {
-	return iov_iter_type(i) == ITER_KVEC;
+	return iov_iter_type(i) & ITER_KVEC;
 }
 
 static inline bool iov_iter_is_bvec(const struct iov_iter *i)
 {
-	return iov_iter_type(i) == ITER_BVEC;
+	return iov_iter_type(i) & ITER_BVEC;
 }
 
 static inline bool iov_iter_is_discard(const struct iov_iter *i)
 {
-	return iov_iter_type(i) == ITER_DISCARD;
+	return iov_iter_type(i) & ITER_DISCARD;
 }
 
 static inline bool iov_iter_is_xarray(const struct iov_iter *i)
 {
-	return iov_iter_type(i) == ITER_XARRAY;
+	return iov_iter_type(i) & ITER_XARRAY;
 }
 
 static inline unsigned char iov_iter_rw(const struct iov_iter *i)
@@ -143,7 +142,7 @@ static inline unsigned char iov_iter_rw(const struct iov_iter *i)
 
 static inline bool user_backed_iter(const struct iov_iter *i)
 {
-	return i->user_backed;
+	return i->iter_type & (ITER_IOVEC | ITER_UBUF);
 }
 
 /*
@@ -376,7 +375,6 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
 	*i = (struct iov_iter) {
 		.iter_type = ITER_UBUF,
 		.copy_mc = false,
-		.user_backed = true,
 		.data_source = direction,
 		.ubuf = buf,
 		.count = count,
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index e4dc809d1075..857e661d1554 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -290,7 +290,6 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
 		.iter_type = ITER_IOVEC,
 		.copy_mc = false,
 		.nofault = false,
-		.user_backed = true,
 		.data_source = direction,
 		.__iov = iov,
 		.nr_segs = nr_segs,
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index 95fc56e403b1..642dceeb80ee 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -3527,7 +3527,7 @@ static ssize_t snd_pcm_readv(struct kiocb *iocb, struct iov_iter *to)
 	if (runtime->state == SNDRV_PCM_STATE_OPEN ||
 	    runtime->state == SNDRV_PCM_STATE_DISCONNECTED)
 		return -EBADFD;
-	if (!to->user_backed)
+	if (!user_backed_iter(to))
 		return -EINVAL;
 	if (to->nr_segs > 1024 || to->nr_segs != runtime->channels)
 		return -EINVAL;
@@ -3567,7 +3567,7 @@ static ssize_t snd_pcm_writev(struct kiocb *iocb, struct iov_iter *from)
 	if (runtime->state == SNDRV_PCM_STATE_OPEN ||
 	    runtime->state == SNDRV_PCM_STATE_DISCONNECTED)
 		return -EBADFD;
-	if (!from->user_backed)
+	if (!user_backed_iter(from))
 		return -EINVAL;
 	if (from->nr_segs > 128 || from->nr_segs != runtime->channels ||
 	    !frame_aligned(runtime, iov->iov_len))

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* RE: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
  2023-08-17  4:18         ` Linus Torvalds
@ 2023-08-17  8:41           ` David Laight
  2023-08-17 14:38             ` Linus Torvalds
  2023-08-18 15:19             ` David Howells
  0 siblings, 2 replies; 24+ messages in thread
From: David Laight @ 2023-08-17  8:41 UTC (permalink / raw)
  To: 'Linus Torvalds', David Howells
  Cc: Al Viro, Jens Axboe, Christoph Hellwig, Christian Brauner,
	Matthew Wilcox, Jeff Layton, linux-fsdevel, linux-block,
	linux-mm, linux-kernel

> This patch only does that for the 'user_backed' thing, which was a similar case.

Yes, having two fields that have to be set to match is a recipe
for disaster.

Although I'm not sure the bit-fields really help.
There are 8 bytes at the start of the structure, might as well
use them :-)
OTOH the 'nofault' and 'copy_mc' flags could be put into much
higher bits of a 32bit value.

Alternatively put both/all the USER values first.
Then an ordered compare could be used.

If everything is actually inlined could the 'iter' be passed
through to the step() functions?
Although is might be better to pass a cached copy of iter->iter_type
(although that might just end up spilling to stack.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
  2023-08-17  8:41           ` David Laight
@ 2023-08-17 14:38             ` Linus Torvalds
  2023-08-17 15:16               ` David Laight
  2023-08-18 15:19             ` David Howells
  1 sibling, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2023-08-17 14:38 UTC (permalink / raw)
  To: David Laight
  Cc: David Howells, Al Viro, Jens Axboe, Christoph Hellwig,
	Christian Brauner, Matthew Wilcox, Jeff Layton, linux-fsdevel,
	linux-block, linux-mm, linux-kernel

On Thu, 17 Aug 2023 at 10:42, David Laight <David.Laight@aculab.com> wrote:
>
> Although I'm not sure the bit-fields really help.
> There are 8 bytes at the start of the structure, might as well
> use them :-)

Actuallyç I wrote the patch that way because it seems to improve code
generation.

The bitfields are generally all set together as just plain one-time
constants at initialization time, and gcc sees that it's a full byte
write. And the reason 'data_source' is not a bitfield is that it's not
a constant at iov_iter init time (it's an argument to all the init
functions), so having that one as a separate byte at init time is good
for code generation when you don't need to mask bits or anything like
that.

And once initialized, having things be dense and doing all the
compares with a bitwise 'and' instead of doing them as some value
compare again tends to generate good code.

Then being able to test multiple bits at the same time is just gravy
on top of that (ie that whole "remove user_backed, because it's easier
to just test the bit combination").

> OTOH the 'nofault' and 'copy_mc' flags could be put into much
> higher bits of a 32bit value.

Once you start doing that, you often get bigger constants in the code stream.

I didn't do any *extensive* testing of the code generation, but the
stuff I looked at looked good.

               Linus

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
  2023-08-17 14:38             ` Linus Torvalds
@ 2023-08-17 15:16               ` David Laight
  2023-08-17 15:31                 ` Linus Torvalds
  0 siblings, 1 reply; 24+ messages in thread
From: David Laight @ 2023-08-17 15:16 UTC (permalink / raw)
  To: 'Linus Torvalds'
  Cc: David Howells, Al Viro, Jens Axboe, Christoph Hellwig,
	Christian Brauner, Matthew Wilcox, Jeff Layton, linux-fsdevel,
	linux-block, linux-mm, linux-kernel

From: Linus Torvalds
> Sent: Thursday, August 17, 2023 3:38 PM
> 
> On Thu, 17 Aug 2023 at 10:42, David Laight <David.Laight@aculab.com> wrote:
> >
> > Although I'm not sure the bit-fields really help.
> > There are 8 bytes at the start of the structure, might as well
> > use them :-)
> 
> Actuallyç I wrote the patch that way because it seems to improve code
> generation.
> 
> The bitfields are generally all set together as just plain one-time
> constants at initialization time, and gcc sees that it's a full byte
> write.

I've just spent too long on godbolt (again) :-)
Fiddling with:

#define t1 unsigned char
struct b {
    t1 b1:7;
    t1 b2:1;
};

void ff(struct b *,int);

void ff1(void)
{
    struct b b = {.b1=3, .b2 = 1};
    ff(&b, sizeof b);
}

gcc for x86-64 make a pigs-breakfast when the bitfields are 'char'
and loads the constant from memory using pc-relative access.
Otherwise pretty must all variants (with or without the bitfield)
get initialised in a single write.
(Although gcc seems to insist on loading a 32bit constant into %eax.)

I can well imagine that keeping the constant below 32768 will help
on architectures that have to construct large constants.

> And the reason 'data_source' is not a bitfield is that it's not
> a constant at iov_iter init time (it's an argument to all the init
> functions), so having that one as a separate byte at init time is good
> for code generation when you don't need to mask bits or anything like
> that.
> 
> And once initialized, having things be dense and doing all the
> compares with a bitwise 'and' instead of doing them as some value
> compare again tends to generate good code.
> 
> Then being able to test multiple bits at the same time is just gravy
> on top of that (ie that whole "remove user_backed, because it's easier
> to just test the bit combination").

Indeed, they used to be bits but never got tested together.

> > OTOH the 'nofault' and 'copy_mc' flags could be put into much
> > higher bits of a 32bit value.
> 
> Once you start doing that, you often get bigger constants in the code stream.

I wasn't thinking of using 'really big' values :-)
Even 32768 can be an issue because some cpu sign extend all constants.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
  2023-08-17 15:16               ` David Laight
@ 2023-08-17 15:31                 ` Linus Torvalds
  2023-08-17 16:06                   ` David Laight
  0 siblings, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2023-08-17 15:31 UTC (permalink / raw)
  To: David Laight
  Cc: David Howells, Al Viro, Jens Axboe, Christian Brauner,
	Matthew Wilcox, Jeff Layton, linux-fsdevel, linux-block,
	linux-mm, linux-kernel

On Thu, 17 Aug 2023 at 17:16, David Laight <David.Laight@aculab.com> wrote:
>
> gcc for x86-64 make a pigs-breakfast when the bitfields are 'char'
> and loads the constant from memory using pc-relative access.

I think your godbolt tests must be with some other model than what the
kernel uses.

For example, for me, iov_iter_init generates

        testl   %esi, %esi      # direction test
        movb    $1, (%rdi)      # bitfields
        movq    $0, 8(%rdi)
        movq    %rdx, 16(%rdi)
        movq    %r8, 24(%rdi)
        movq    %rcx, 32(%rdi)
        setne   1(%rdi)            # set the direction byte

with my patch for me. Which is pretty much optimal.

*Without& the patch, I get

        movzwl  .LC1(%rip), %eax
        testl   %esi, %esi
        movb    $0, (%rdi)
        movb    $1, 4(%rdi)
        movw    %ax, 1(%rdi)
        movq    $0, 8(%rdi)
        movq    %rdx, 16(%rdi)
        movq    %r8, 24(%rdi)
        movq    %rcx, 32(%rdi)
        setne   3(%rdi)

which is that disgusting "move two bytes from memory", and makes
absolutely no sense as a way to "write 2 zero bytes":

.LC1:
        .byte   0
        .byte   0

I think that's some odd gcc bug, actually.


> Otherwise pretty must all variants (with or without the bitfield)
> get initialised in a single write.

So there may well be some odd gcc code generation issue that is
triggered by the fact that we use an initializer to set those things,
and we then have two bytes (with my patch) followed by a hole, or
three bytes (without it) followed by a hole.

But that bitfield thing improves things at least for me. I think the
example code you used for godbolt is actually something else than what
the kernel does.

               Linus

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
  2023-08-17 15:31                 ` Linus Torvalds
@ 2023-08-17 16:06                   ` David Laight
  0 siblings, 0 replies; 24+ messages in thread
From: David Laight @ 2023-08-17 16:06 UTC (permalink / raw)
  To: 'Linus Torvalds'
  Cc: David Howells, Al Viro, Jens Axboe, Christian Brauner,
	Matthew Wilcox, Jeff Layton, linux-fsdevel, linux-block,
	linux-mm, linux-kernel

From: Linus Torvalds
> Sent: Thursday, August 17, 2023 4:31 PM
...
>         movzwl  .LC1(%rip), %eax
>         testl   %esi, %esi
>         movb    $0, (%rdi)
>         movb    $1, 4(%rdi)
>         movw    %ax, 1(%rdi)
>         movq    $0, 8(%rdi)
>         movq    %rdx, 16(%rdi)
>         movq    %r8, 24(%rdi)
>         movq    %rcx, 32(%rdi)
>         setne   3(%rdi)
> 
> which is that disgusting "move two bytes from memory", and makes
> absolutely no sense as a way to "write 2 zero bytes":
> 
> .LC1:
>         .byte   0
>         .byte   0
> 
> I think that's some odd gcc bug, actually.

I get that with some code, but not others.
Seems to depend on random other stuff.
Happens for:
	struct { unsigned char x:7, y:1; };
but not if I add anything after if (that gets zeroed).
Which seems to be the opposite of what you see.

If I use explicit assignments (rather than an initialiser)
I still get merged writes (even if not a bitfield) but also
lose the memory access.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
  2023-08-16 14:19     ` David Laight
  2023-08-16 18:50       ` Linus Torvalds
  2023-08-16 20:35       ` David Howells
@ 2023-08-18 11:39       ` David Howells
  2 siblings, 0 replies; 24+ messages in thread
From: David Howells @ 2023-08-18 11:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: dhowells, David Laight, Al Viro, Jens Axboe, Christoph Hellwig,
	Christian Brauner, Matthew Wilcox, Jeff Layton, linux-fsdevel,
	linux-block, linux-mm, linux-kernel

Would it make sense to always check for MCE in _copy_from_iter() and always
return a short transfer if we encounter one?  It looks pretty cheap in terms
of code size as the exception table stuff handles it, so we don't need to do
anything in the normal path.

I guess this would change the handling of memory errors and DAX errors.

David
---
iov_iter: Always handle MCE in _copy_to_iter()

(incomplete)

---
 arch/x86/include/asm/mce.h |   22 ++++++++++++++++++++++
 fs/coredump.c              |    1 -
 include/linux/uio.h        |   16 ----------------
 lib/iov_iter.c             |   17 +++++------------
 4 files changed, 27 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 180b1cbfcc4e..ee3ff090360d 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -353,4 +353,26 @@ static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c)	{ return mce_am
 
 unsigned long copy_mc_fragile_handle_tail(char *to, char *from, unsigned len);
 
+static __always_inline __must_check
+unsigned long memcpy_mc(void *to, const void *from, unsigned long len)
+{
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+	/*
+	 * If CPU has FSRM feature, use 'rep movs'.
+	 * Otherwise, use rep_movs_alternative.
+	 */
+	asm volatile(
+		"1:\n\t"
+		ALTERNATIVE("rep movsb",
+			    "call rep_movs_alternative", ALT_NOT(X86_FEATURE_FSRM))
+		"2:\n"
+		_ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_DEFAULT_MCE_SAFE)
+		:"+c" (len), "+D" (to), "+S" (from), ASM_CALL_CONSTRAINT
+		: : "memory", "rax", "r8", "r9", "r10", "r11");
+#else
+	return memcpy(to, from, len);
+#endif
+	return len;
+}
+
 #endif /* _ASM_X86_MCE_H */
diff --git a/fs/coredump.c b/fs/coredump.c
index 9d235fa14ab9..ad54102a5e14 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -884,7 +884,6 @@ static int dump_emit_page(struct coredump_params *cprm, struct page *page)
 	pos = file->f_pos;
 	bvec_set_page(&bvec, page, PAGE_SIZE, 0);
 	iov_iter_bvec(&iter, ITER_SOURCE, &bvec, 1, PAGE_SIZE);
-	iov_iter_set_copy_mc(&iter);
 	n = __kernel_write_iter(cprm->file, &iter, &pos);
 	if (n != PAGE_SIZE)
 		return 0;
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 42bce38a8e87..73078ba297b7 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -40,7 +40,6 @@ struct iov_iter_state {
 
 struct iov_iter {
 	u8 iter_type;
-	bool copy_mc;
 	bool nofault;
 	bool data_source;
 	bool user_backed;
@@ -252,22 +251,8 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);
 
 #ifdef CONFIG_ARCH_HAS_COPY_MC
 size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
-static inline void iov_iter_set_copy_mc(struct iov_iter *i)
-{
-	i->copy_mc = true;
-}
-
-static inline bool iov_iter_is_copy_mc(const struct iov_iter *i)
-{
-	return i->copy_mc;
-}
 #else
 #define _copy_mc_to_iter _copy_to_iter
-static inline void iov_iter_set_copy_mc(struct iov_iter *i) { }
-static inline bool iov_iter_is_copy_mc(const struct iov_iter *i)
-{
-	return false;
-}
 #endif
 
 size_t iov_iter_zero(size_t bytes, struct iov_iter *);
@@ -382,7 +367,6 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
 	WARN_ON(direction & ~(READ | WRITE));
 	*i = (struct iov_iter) {
 		.iter_type = ITER_UBUF,
-		.copy_mc = false,
 		.user_backed = true,
 		.data_source = direction,
 		.ubuf = buf,
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index d282fd4d348f..887d9cb9be4e 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -14,6 +14,7 @@
 #include <linux/scatterlist.h>
 #include <linux/instrumented.h>
 #include <linux/iov_iter.h>
+#include <asm/mce.h>
 
 static __always_inline
 size_t copy_to_user_iter(void __user *iter_to, size_t progress,
@@ -168,7 +169,6 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
 	WARN_ON(direction & ~(READ | WRITE));
 	*i = (struct iov_iter) {
 		.iter_type = ITER_IOVEC,
-		.copy_mc = false,
 		.nofault = false,
 		.user_backed = true,
 		.data_source = direction,
@@ -254,14 +254,11 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
 #endif /* CONFIG_ARCH_HAS_COPY_MC */
 
-static size_t memcpy_from_iter_mc(void *iter_from, size_t progress,
-				  size_t len, void *to, void *priv2)
+static __always_inline
+size_t memcpy_from_iter_mc(void *iter_from, size_t progress,
+			   size_t len, void *to, void *priv2)
 {
-	struct iov_iter *iter = priv2;
-
-	if (iov_iter_is_copy_mc(iter))
-		return copy_mc_to_kernel(to + progress, iter_from, len);
-	return memcpy_from_iter(iter_from, progress, len, to, priv2);
+	return memcpy_mc(to + progress, iter_from, len);
 }
 
 size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
@@ -632,7 +629,6 @@ void iov_iter_kvec(struct iov_iter *i, unsigned int direction,
 	WARN_ON(direction & ~(READ | WRITE));
 	*i = (struct iov_iter){
 		.iter_type = ITER_KVEC,
-		.copy_mc = false,
 		.data_source = direction,
 		.kvec = kvec,
 		.nr_segs = nr_segs,
@@ -649,7 +645,6 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction,
 	WARN_ON(direction & ~(READ | WRITE));
 	*i = (struct iov_iter){
 		.iter_type = ITER_BVEC,
-		.copy_mc = false,
 		.data_source = direction,
 		.bvec = bvec,
 		.nr_segs = nr_segs,
@@ -678,7 +673,6 @@ void iov_iter_xarray(struct iov_iter *i, unsigned int direction,
 	BUG_ON(direction & ~1);
 	*i = (struct iov_iter) {
 		.iter_type = ITER_XARRAY,
-		.copy_mc = false,
 		.data_source = direction,
 		.xarray = xarray,
 		.xarray_start = start,
@@ -702,7 +696,6 @@ void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count)
 	BUG_ON(direction != READ);
 	*i = (struct iov_iter){
 		.iter_type = ITER_DISCARD,
-		.copy_mc = false,
 		.data_source = false,
 		.count = count,
 		.iov_offset = 0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
  2023-08-16 20:35       ` David Howells
  2023-08-17  4:18         ` Linus Torvalds
@ 2023-08-18 11:42         ` David Howells
  2023-08-18 12:16           ` David Laight
  2023-08-18 13:33         ` David Howells
  2 siblings, 1 reply; 24+ messages in thread
From: David Howells @ 2023-08-18 11:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: dhowells, David Laight, Al Viro, Jens Axboe, Christoph Hellwig,
	Christian Brauner, Matthew Wilcox, Jeff Layton, linux-fsdevel,
	linux-block, linux-mm, linux-kernel

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Well, that part is trivially fixable, and we should do that anyway for
> other reasons.
> ..
>  enum iter_type {
>  	/* iter types */
> -	ITER_IOVEC,
> -	ITER_KVEC,
> -	ITER_BVEC,
> -	ITER_XARRAY,
> -	ITER_DISCARD,
> -	ITER_UBUF,
> +	ITER_IOVEC = 1,
> +	ITER_UBUF = 2,
> +	ITER_KVEC = 4,
> +	ITER_BVEC = 8,
> +	ITER_XARRAY = 16,
> +	ITER_DISCARD = 32,
>  };

It used to be this way, but Al switched it:

	8cd54c1c848031a87820e58d772166ffdf8c08c0
	iov_iter: separate direction from flavour

David


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
  2023-08-18 11:42         ` David Howells
@ 2023-08-18 12:16           ` David Laight
  2023-08-18 12:26             ` Matthew Wilcox
  0 siblings, 1 reply; 24+ messages in thread
From: David Laight @ 2023-08-18 12:16 UTC (permalink / raw)
  To: 'David Howells', Linus Torvalds
  Cc: Al Viro, Jens Axboe, Christoph Hellwig, Christian Brauner,
	Matthew Wilcox, Jeff Layton, linux-fsdevel, linux-block,
	linux-mm, linux-kernel

From: David Howells
> Sent: Friday, August 18, 2023 12:43 PM
> 
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > Well, that part is trivially fixable, and we should do that anyway for
> > other reasons.
> > ..
> >  enum iter_type {
> >  	/* iter types */
> > -	ITER_IOVEC,
> > -	ITER_KVEC,
> > -	ITER_BVEC,
> > -	ITER_XARRAY,
> > -	ITER_DISCARD,
> > -	ITER_UBUF,
> > +	ITER_IOVEC = 1,
> > +	ITER_UBUF = 2,
> > +	ITER_KVEC = 4,
> > +	ITER_BVEC = 8,
> > +	ITER_XARRAY = 16,
> > +	ITER_DISCARD = 32,

That could be zero - no bits and default.

> >  };
> 
> It used to be this way, but Al switched it:
> 
> 	8cd54c1c848031a87820e58d772166ffdf8c08c0
> 	iov_iter: separate direction from flavour

Except it also had the direction flag inside the enum.
That caused its own piles of grief.

IIRC Linus had type:6 - that doesn't leave any headroom
for additional types (even though they shouldn't proliferate).
It may be best to avoid bits 15+ (in a bitfield) due to
issues with large constants and sign extension.
On x86 (I think) 'and immediate' and 'bit test' are the same
size for bit 0 to 7, BIT wins for higher bits.

gcc generates strange code for some initialisers (see yesterday's
thread) and you definitely mustn't leave unused bits in a bitfield.
Might be better is the fields are assigned later!

(I also saw clang carefully preserving %eax on stabck!)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
  2023-08-18 12:16           ` David Laight
@ 2023-08-18 12:26             ` Matthew Wilcox
  2023-08-18 12:41               ` David Laight
  0 siblings, 1 reply; 24+ messages in thread
From: Matthew Wilcox @ 2023-08-18 12:26 UTC (permalink / raw)
  To: David Laight
  Cc: 'David Howells',
	Linus Torvalds, Al Viro, Jens Axboe, Christoph Hellwig,
	Christian Brauner, Jeff Layton, linux-fsdevel, linux-block,
	linux-mm, linux-kernel

On Fri, Aug 18, 2023 at 12:16:23PM +0000, David Laight wrote:
> > > +	ITER_IOVEC = 1,
> > > +	ITER_UBUF = 2,
> > > +	ITER_KVEC = 4,
> > > +	ITER_BVEC = 8,
> > > +	ITER_XARRAY = 16,
> > > +	ITER_DISCARD = 32,
> 
> IIRC Linus had type:6 - that doesn't leave any headroom
> for additional types (even though they shouldn't proliferate).

I have proposed an ITER_KBUF in the past (it is to KVEC as UBUF is
to IOVEC).  I didn't care enough to keep pushing it, but it's clearly
a common idiom.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
  2023-08-18 12:26             ` Matthew Wilcox
@ 2023-08-18 12:41               ` David Laight
  0 siblings, 0 replies; 24+ messages in thread
From: David Laight @ 2023-08-18 12:41 UTC (permalink / raw)
  To: 'Matthew Wilcox'
  Cc: 'David Howells',
	Linus Torvalds, Al Viro, Jens Axboe, Christoph Hellwig,
	Christian Brauner, Jeff Layton, linux-fsdevel, linux-block,
	linux-mm, linux-kernel

From: Matthew Wilcox
> Sent: Friday, August 18, 2023 1:27 PM
> 
> On Fri, Aug 18, 2023 at 12:16:23PM +0000, David Laight wrote:
> > > > +	ITER_IOVEC = 1,
> > > > +	ITER_UBUF = 2,
> > > > +	ITER_KVEC = 4,
> > > > +	ITER_BVEC = 8,
> > > > +	ITER_XARRAY = 16,
> > > > +	ITER_DISCARD = 32,
> >
> > IIRC Linus had type:6 - that doesn't leave any headroom
> > for additional types (even though they shouldn't proliferate).
> 
> I have proposed an ITER_KBUF in the past (it is to KVEC as UBUF is
> to IOVEC).  I didn't care enough to keep pushing it, but it's clearly
> a common idiom.

Indeed, I didn't spot UBUF going in - I spot a lot of stuff.

I did wonder if you could optimise for a vector length of 1
(inside the KVEC conditional).
That would also pick up the cases where there only happens
to be a single buffer.

I also remember writing a patch that simplified import_iovec()
by combining the iov_iter with a struct iovec iovec[UIO_FASTIOV].
All got bogged down in io_uring which would need changing first.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
  2023-08-16 20:35       ` David Howells
  2023-08-17  4:18         ` Linus Torvalds
  2023-08-18 11:42         ` David Howells
@ 2023-08-18 13:33         ` David Howells
  2 siblings, 0 replies; 24+ messages in thread
From: David Howells @ 2023-08-18 13:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: dhowells, David Laight, Al Viro, Jens Axboe, Christoph Hellwig,
	Christian Brauner, Matthew Wilcox, Jeff Layton, linux-fsdevel,
	linux-block, linux-mm, linux-kernel

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> This patch only does that for the 'user_backed' thing, which was a similar
> case.

It makes some things a bit bigger, makes some a bit smaller:

__iov_iter_get_pages_alloc               dcr 0x331 -> 0x32a -0x7
_copy_from_iter                          dcr 0x36e -> 0x36a -0x4
_copy_from_iter_flushcache               inc 0x359 -> 0x36b +0x12
_copy_mc_to_iter                         dcr 0x3a7 -> 0x39b -0xc
_copy_to_iter                            inc 0x358 -> 0x359 +0x1
copy_page_to_iter_nofault.part.0         dcr 0x3f1 -> 0x3ef -0x2
csum_and_copy_from_iter                  dcr 0x3e8 -> 0x3e4 -0x4
csum_and_copy_to_iter                    inc 0x46a -> 0x46d +0x3
dup_iter                                 inc 0x34 -> 0x39 +0x5
fault_in_iov_iter_readable               inc 0x9b -> 0xa0 +0x5
fault_in_iov_iter_writeable              inc 0x9b -> 0xa0 +0x5
first_iovec_segment                      inc 0x4a -> 0x51 +0x7
import_single_range                      dcr 0x62 -> 0x40 -0x22
import_ubuf                              dcr 0x65 -> 0x43 -0x22
iov_iter_advance                         inc 0xd7 -> 0x103 +0x2c
iov_iter_alignment                       inc 0xe0 -> 0xe2 +0x2
iov_iter_extract_pages                   dcr 0x418 -> 0x416 -0x2
iov_iter_init                            dcr 0x31 -> 0x27 -0xa
iov_iter_is_aligned                      inc 0xf3 -> 0x108 +0x15
iov_iter_npages                          inc 0x119 -> 0x11a +0x1
iov_iter_revert                          inc 0x88 -> 0x99 +0x11
iov_iter_single_seg_count                inc 0x38 -> 0x3e +0x6
iov_iter_ubuf                            new 0x39
iov_iter_zero                            inc 0x34f -> 0x353 +0x4
iter_iov                                 new 0x17

Adding an extra patch to get rid of the bitfields and using a u8 for the type
and bools for the flags makes very little difference on top of the above:

__iov_iter_get_pages_alloc               inc 0x32a -> 0x32f +0x5
_copy_from_iter                          inc 0x36a -> 0x36d +0x3
copy_page_from_iter_atomic.part.0        inc 0x3cf -> 0x3d2 +0x3
csum_and_copy_to_iter                    dcr 0x46d -> 0x46a -0x3
iov_iter_advance                         dcr 0x103 -> 0xfd -0x6
iov_iter_extract_pages                   inc 0x416 -> 0x417 +0x1
iov_iter_init                            inc 0x27 -> 0x2d +0x6
iov_iter_revert                          dcr 0x99 -> 0x95 -0x4

For reference, I generated the stats with:

	nm build3/lib/iov_iter.o  | sort >a
	... change...
	nm build3/lib/iov_iter.o  | sort >b
	perl analyse.pl a b

where analyse.pl is attached.

David
---
#!/usr/bin/perl -w
use strict;

die "$0 <file_a> <file_b>" if ($#ARGV != 1);
my ($file_a, $file_b) = @ARGV;
die "$file_a: File not found\n" unless -r $file_a;
die "$file_b: File not found\n" unless -r $file_b;

my %a = ();
my %b = ();
my %c = ();

sub read_one($$$)
{
    my ($file, $list, $all) = @_;
    my $last = undef;

    open FD, "<$file" || die $file;
    while (<FD>) {
	if (/([0-9a-f][0-9a-f]+) [Tt] ([_a-zA-Z0-9.]*)/) {
	    my $addr = hex $1;
	    my $sym = $2;
	    #print $addr, " ", $sym, "\n";

	    my %obj = (
		sym	=> $sym,
		addr	=> $addr,
		size	=> 0
		);

	    $list->{$sym} = \%obj;
	    $all->{$sym} = 1;

	    if ($last) {
		$last->{size} = $addr - $last->{addr};
	    }

	    $last = \%obj;
	}
    }
    close(FD);
}

read_one($file_a, \%a, \%c);
read_one($file_b, \%b, \%c);

foreach my $sym (sort keys %c) {
    my $as = -1;
    my $bs = -1;

    $as = $a{$sym}->{size} if (exists($a{$sym}));
    $bs = $b{$sym}->{size} if (exists($b{$sym}));

    next if ($as == $bs);
    #next if ($sym =~ /__UNIQUE_ID/);

    if ($as == -1) {
	printf "%-40s new 0x%x\n", $sym, $bs;
    } elsif ($bs == -1) {
	printf "%-40s del 0x%x\n", $sym, $as;
    } elsif ($bs > $as) {
	printf "%-40s inc 0x%x -> 0x%x +0x%x\n", $sym, $as, $bs, $bs - $as;
    } else {
	printf "%-40s dcr 0x%x -> 0x%x -0x%x\n", $sym, $as, $bs, $as - $bs;
    }
}


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
  2023-08-17  8:41           ` David Laight
  2023-08-17 14:38             ` Linus Torvalds
@ 2023-08-18 15:19             ` David Howells
  2023-08-18 15:42               ` David Laight
  2023-08-18 16:48               ` David Howells
  1 sibling, 2 replies; 24+ messages in thread
From: David Howells @ 2023-08-18 15:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: dhowells, David Laight, Al Viro, Jens Axboe, Christoph Hellwig,
	Christian Brauner, Matthew Wilcox, Jeff Layton, linux-fsdevel,
	linux-block, linux-mm, linux-kernel

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > Although I'm not sure the bit-fields really help.
> > There are 8 bytes at the start of the structure, might as well
> > use them :-)
> 
> Actuallyç I wrote the patch that way because it seems to improve code
> generation.
> 
> The bitfields are generally all set together as just plain one-time
> constants at initialization time, and gcc sees that it's a full byte
> write. And the reason 'data_source' is not a bitfield is that it's not
> a constant at iov_iter init time (it's an argument to all the init
> functions), so having that one as a separate byte at init time is good
> for code generation when you don't need to mask bits or anything like
> that.
> 
> And once initialized, having things be dense and doing all the
> compares with a bitwise 'and' instead of doing them as some value
> compare again tends to generate good code.

Actually...  I said that switch(enum) seemed to generate suboptimal code...
However, if the enum is renumbered such that the constants are in the same
order as in the switch() it generates better code.

So we want this order:

	enum iter_type {
		ITER_UBUF,
		ITER_IOVEC,
		ITER_BVEC,
		ITER_KVEC,
		ITER_XARRAY,
		ITER_DISCARD,
	};

to match:

	switch (iov_iter_type(iter)) {
	case ITER_UBUF:
		progress = iterate_ubuf(iter, len, priv, priv2, ustep);
		break;
	case ITER_IOVEC:
		progress = iterate_iovec(iter, len, priv, priv2, ustep);
		break;
	case ITER_BVEC:
		progress = iterate_bvec(iter, len, priv, priv2, step);
		break;
	case ITER_KVEC:
		progress = iterate_kvec(iter, len, priv, priv2, step);
		break;
	case ITER_XARRAY:
		progress = iterate_xarray(iter, len, priv, priv2, step);
		break;
	case ITER_DISCARD:
	default:
		progress = len;
		break;
	}

then gcc should be able to generate a ternary tree, which it does here:

	<+77>:	mov    (%rdx),%al
	<+79>:	cmp    $0x2,%al
	<+81>:	je     0xffffffff81779bcc <_copy_from_iter+394>
	<+87>:	ja     0xffffffff81779aa9 <_copy_from_iter+103>

though it really split the number space in the wrong place.  It can then use
one CMP (or TEST) to split again:

	<+89>:	test   %al,%al
	<+91>:	mov    0x8(%rdx),%rdx
	<+95>:	jne    0xffffffff81779b48 <_copy_from_iter+262>
	<+101>:	jmp    0xffffffff81779b17 <_copy_from_iter+213>

It then should only require one CMP at this point, since AL can only be 4, 5
or 6+:

	<+103>:	cmp    $0x3,%al
	<+105>:	je     0xffffffff81779c6e <_copy_from_iter+556>
	<+111>:	cmp    $0x4,%al
	<+113>:	jne    0xffffffff81779dd2 <_copy_from_iter+912>

The end result being that it should have at most two CMP instructions for any
number in the range 0 to 5 - though in this case, it will have three for AL>3.

However, it doesn't do this with if-if-if with a number sequence or a bitmask,
but rather generates an chain of cmp-cmp-cmp or test-test-test, presumably
because it fails to spot the if-conditions are related.

I should log a gcc bug on this on the poor switch() behaviour.

Also, if we renumber the enum to put UBUF and IOVEC first, we can get rid of
iter->user_backed in favour of:

	static inline bool user_backed_iter(const struct iov_iter *i)
	{
		return iter_is_ubuf(i) || iter_is_iovec(i);
	}

which gcc just changes into something like a "CMP $1" and a "JA".


Comparing Linus's bit patch (+ is better) to renumbering the switch (- is
better):

__iov_iter_get_pages_alloc               inc 0x32a -> 0x331 +0x7
_copy_from_iter                          dcr 0x3c7 -> 0x3bf -0x8
_copy_from_iter_flushcache               inc 0x364 -> 0x370 +0xc
_copy_from_iter_nocache                  inc 0x33e -> 0x347 +0x9
_copy_mc_to_iter                         dcr 0x3bc -> 0x3b6 -0x6
_copy_to_iter                            inc 0x34a -> 0x34b +0x1
copy_page_from_iter_atomic.part.0        dcr 0x424 -> 0x41c -0x8
copy_page_to_iter_nofault.part.0         dcr 0x3a9 -> 0x3a5 -0x4
csum_and_copy_from_iter                  inc 0x3e5 -> 0x3e8 +0x3
csum_and_copy_to_iter                    dcr 0x449 -> 0x448 -0x1
dup_iter                                 inc 0x34 -> 0x37 +0x3
fault_in_iov_iter_readable               dcr 0x9c -> 0x9a -0x2
fault_in_iov_iter_writeable              dcr 0x9c -> 0x9a -0x2
iov_iter_advance                         dcr 0xde -> 0xd8 -0x6
iov_iter_alignment                       inc 0xe0 -> 0xe3 +0x3
iov_iter_extract_pages                   inc 0x416 -> 0x418 +0x2
iov_iter_gap_alignment                   dcr 0x69 -> 0x67 -0x2
iov_iter_init                            inc 0x27 -> 0x31 +0xa
iov_iter_is_aligned                      dcr 0x104 -> 0xf5 -0xf
iov_iter_npages                          inc 0x119 -> 0x11d +0x4
iov_iter_revert                          dcr 0x93 -> 0x86 -0xd
iov_iter_single_seg_count                dcr 0x41 -> 0x3c -0x5
iov_iter_ubuf                            inc 0x39 -> 0x3a +0x1
iov_iter_zero                            dcr 0x338 -> 0x32e -0xa
memcpy_from_iter_mc                      inc 0x2a -> 0x2b +0x1

I think there may be more savings to be made if I go and convert more of the
functions to using switch().

David


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
  2023-08-18 15:19             ` David Howells
@ 2023-08-18 15:42               ` David Laight
  2023-08-18 16:48               ` David Howells
  1 sibling, 0 replies; 24+ messages in thread
From: David Laight @ 2023-08-18 15:42 UTC (permalink / raw)
  To: 'David Howells', Linus Torvalds
  Cc: Al Viro, Jens Axboe, Christoph Hellwig, Christian Brauner,
	Matthew Wilcox, Jeff Layton, linux-fsdevel, linux-block,
	linux-mm, linux-kernel

From: David Howells
> Sent: Friday, August 18, 2023 4:20 PM
> 
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > > Although I'm not sure the bit-fields really help.
> > > There are 8 bytes at the start of the structure, might as well
> > > use them :-)
> >
> > Actuallyç I wrote the patch that way because it seems to improve code
> > generation.
> >
> > The bitfields are generally all set together as just plain one-time
> > constants at initialization time, and gcc sees that it's a full byte
> > write. And the reason 'data_source' is not a bitfield is that it's not
> > a constant at iov_iter init time (it's an argument to all the init
> > functions), so having that one as a separate byte at init time is good
> > for code generation when you don't need to mask bits or anything like
> > that.
> >
> > And once initialized, having things be dense and doing all the
> > compares with a bitwise 'and' instead of doing them as some value
> > compare again tends to generate good code.
> 
> Actually...  I said that switch(enum) seemed to generate suboptimal code...
> However, if the enum is renumbered such that the constants are in the same
> order as in the switch() it generates better code.

Hmmm.. the order of the switch labels really shouldn't matter.

The advantage of the if-chain is that you can optimise for
the most common case.

> So we want this order:
> 
> 	enum iter_type {
> 		ITER_UBUF,
> 		ITER_IOVEC,
> 		ITER_BVEC,
> 		ITER_KVEC,
> 		ITER_XARRAY,
> 		ITER_DISCARD,
> 	};

Will gcc actually code this version without pessimising it?

	if (likely(type <= ITER_IOVEC) {
		if (likely(type != ITER_IOVEC))
			iterate_ubuf();
		else
			iterate_iovec();
	} else if (likely(type) <= ITER_KVEC)) {
		if (type == ITER_KVEC)
			iterate_kvec();
		else
			iterate_bvec();
	} else if (type == ITER_XARRAY) {
		iterate_xarrar()
	} else {
		discard;
	}

But I bet you can't stop it replicating the compares.
(especially with the likely().

That has two mis-predicted (are they ever right!) branches in the
common user-copy versions and three in the common kernel ones.

In some architectures you might get the default 'fall through'
to the UBUF code if the branches aren't predictable.
But I believe current x86 cpu never do static prediction.
So you always lose :-)

...
> 	static inline bool user_backed_iter(const struct iov_iter *i)
> 	{
> 		return iter_is_ubuf(i) || iter_is_iovec(i);
> 	}
> 
> which gcc just changes into something like a "CMP $1" and a "JA".

That makes sense...

> Comparing Linus's bit patch (+ is better) to renumbering the switch (- is
> better):
> 
....
> iov_iter_init                            inc 0x27 -> 0x31 +0xa

Are you hitting the gcc bug that loads the constant from memory?

> I think there may be more savings to be made if I go and convert more of the
> functions to using switch().

Size isn't everything, the code needs to be optimised for the hot paths.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
  2023-08-18 15:19             ` David Howells
  2023-08-18 15:42               ` David Laight
@ 2023-08-18 16:48               ` David Howells
  2023-08-18 21:39                 ` David Laight
  1 sibling, 1 reply; 24+ messages in thread
From: David Howells @ 2023-08-18 16:48 UTC (permalink / raw)
  To: David Laight
  Cc: dhowells, Linus Torvalds, Al Viro, Jens Axboe, Christoph Hellwig,
	Christian Brauner, Matthew Wilcox, Jeff Layton, linux-fsdevel,
	linux-block, linux-mm, linux-kernel

David Laight <David.Laight@ACULAB.COM> wrote:

> > iov_iter_init                            inc 0x27 -> 0x31 +0xa
> 
> Are you hitting the gcc bug that loads the constant from memory?

I'm not sure what that looks like.  For your perusal, here's a disassembly of
the use-switch-on-enum variant:

   0xffffffff8177726c <+0>:     cmp    $0x1,%esi
   0xffffffff8177726f <+3>:     jbe    0xffffffff81777273 <iov_iter_init+7>
   0xffffffff81777271 <+5>:     ud2
   0xffffffff81777273 <+7>:     test   %esi,%esi
   0xffffffff81777275 <+9>:     movw   $0x1,(%rdi)
   0xffffffff8177727a <+14>:    setne  0x3(%rdi)
   0xffffffff8177727e <+18>:    xor    %eax,%eax
   0xffffffff81777280 <+20>:    movb   $0x0,0x2(%rdi)
   0xffffffff81777284 <+24>:    movb   $0x1,0x4(%rdi)
   0xffffffff81777288 <+28>:    mov    %rax,0x8(%rdi)
   0xffffffff8177728c <+32>:    mov    %rdx,0x10(%rdi)
   0xffffffff81777290 <+36>:    mov    %r8,0x18(%rdi)
   0xffffffff81777294 <+40>:    mov    %rcx,0x20(%rdi)
   0xffffffff81777298 <+44>:    jmp    0xffffffff81d728a0 <__x86_return_thunk>

versus the use-bitmap variant:

   0xffffffff81777311 <+0>:     cmp    $0x1,%esi
   0xffffffff81777314 <+3>:     jbe    0xffffffff81777318 <iov_iter_init+7>
   0xffffffff81777316 <+5>:     ud2
   0xffffffff81777318 <+7>:     test   %esi,%esi
   0xffffffff8177731a <+9>:     movb   $0x2,(%rdi)
   0xffffffff8177731d <+12>:    setne  0x1(%rdi)
   0xffffffff81777321 <+16>:    xor    %eax,%eax
   0xffffffff81777323 <+18>:    mov    %rdx,0x10(%rdi)
   0xffffffff81777327 <+22>:    mov    %rax,0x8(%rdi)
   0xffffffff8177732b <+26>:    mov    %r8,0x18(%rdi)
   0xffffffff8177732f <+30>:    mov    %rcx,0x20(%rdi)
   0xffffffff81777333 <+34>:    jmp    0xffffffff81d72960 <__x86_return_thunk>

It seems to be that the former is loading byte constants individually, whereas
Linus combined all those fields into a single byte and eliminated one of them.

David


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
  2023-08-18 16:48               ` David Howells
@ 2023-08-18 21:39                 ` David Laight
  0 siblings, 0 replies; 24+ messages in thread
From: David Laight @ 2023-08-18 21:39 UTC (permalink / raw)
  To: 'David Howells'
  Cc: Linus Torvalds, Al Viro, Jens Axboe, Christoph Hellwig,
	Christian Brauner, Matthew Wilcox, Jeff Layton, linux-fsdevel,
	linux-block, linux-mm, linux-kernel

From: David Howells
> Sent: Friday, August 18, 2023 5:49 PM
> 
> David Laight <David.Laight@ACULAB.COM> wrote:
> 
> > > iov_iter_init                            inc 0x27 -> 0x31 +0xa
> >
> > Are you hitting the gcc bug that loads the constant from memory?
> 
> I'm not sure what that looks like.  For your perusal, here's a disassembly of
> the use-switch-on-enum variant:
> 
>    0xffffffff8177726c <+0>:     cmp    $0x1,%esi
>    0xffffffff8177726f <+3>:     jbe    0xffffffff81777273 <iov_iter_init+7>
>    0xffffffff81777271 <+5>:     ud2
>    0xffffffff81777273 <+7>:     test   %esi,%esi
>    0xffffffff81777275 <+9>:     movw   $0x1,(%rdi)
>    0xffffffff8177727a <+14>:    setne  0x3(%rdi)
>    0xffffffff8177727e <+18>:    xor    %eax,%eax
>    0xffffffff81777280 <+20>:    movb   $0x0,0x2(%rdi)
>    0xffffffff81777284 <+24>:    movb   $0x1,0x4(%rdi)
>    0xffffffff81777288 <+28>:    mov    %rax,0x8(%rdi)
>    0xffffffff8177728c <+32>:    mov    %rdx,0x10(%rdi)
>    0xffffffff81777290 <+36>:    mov    %r8,0x18(%rdi)
>    0xffffffff81777294 <+40>:    mov    %rcx,0x20(%rdi)
>    0xffffffff81777298 <+44>:    jmp    0xffffffff81d728a0 <__x86_return_thunk>
> 
> versus the use-bitmap variant:
> 
>    0xffffffff81777311 <+0>:     cmp    $0x1,%esi
>    0xffffffff81777314 <+3>:     jbe    0xffffffff81777318 <iov_iter_init+7>
>    0xffffffff81777316 <+5>:     ud2
>    0xffffffff81777318 <+7>:     test   %esi,%esi
>    0xffffffff8177731a <+9>:     movb   $0x2,(%rdi)
>    0xffffffff8177731d <+12>:    setne  0x1(%rdi)
>    0xffffffff81777321 <+16>:    xor    %eax,%eax
>    0xffffffff81777323 <+18>:    mov    %rdx,0x10(%rdi)
>    0xffffffff81777327 <+22>:    mov    %rax,0x8(%rdi)
>    0xffffffff8177732b <+26>:    mov    %r8,0x18(%rdi)
>    0xffffffff8177732f <+30>:    mov    %rcx,0x20(%rdi)
>    0xffffffff81777333 <+34>:    jmp    0xffffffff81d72960 <__x86_return_thunk>
> 
> It seems to be that the former is loading byte constants individually, whereas
> Linus combined all those fields into a single byte and eliminated one of them.

I think you need to re-order the structure.
The top set writes to bytes 0..4 with:
>    0xffffffff81777275 <+9>:     movw   $0x1,(%rdi)
>    0xffffffff8177727a <+14>:    setne  0x3(%rdi)
>    0xffffffff81777280 <+20>:    movb   $0x0,0x2(%rdi)
>    0xffffffff81777284 <+24>:    movb   $0x1,0x4(%rdi)
Note that the 'setne' writes into the middle of the constants.

The lower writes bytes 0..1 with:
>    0xffffffff8177731a <+9>:     movb   $0x2,(%rdi)
>    0xffffffff8177731d <+12>:    setne  0x1(%rdi)

I think that if you move the 'conditional' value to offset 4
you'll get fewer writes.
Probably a 32bit load into %eax and then a write.

I don't think gcc likes generating 16bit immediates.
In some tests I did it loaded a 32bit value into %eax
and then wrote the low bits.
So the code is much the same (on x86) for 2 or 4 bytes
of constants.
I'm sure you can use the 'data-16' prefix with an immediate.

I'm not sure why you have two non-zero values when Linus
only had one though.

OTOH you don't want to be writing 3 bytes of constants.
Also gcc won't generate:
	movl $0xaabbccdd,%eax
	setne %al   // overwriting the dd
	movl %eax,(%rdi)
and I suspect the partial write (to %al) will be a stall.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2023-08-18 21:40 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-16 12:07 [PATCH v3 0/2] iov_iter: Convert the iterator macros into inline funcs David Howells
2023-08-16 12:07 ` [PATCH v3 1/2] iov_iter: Convert iterate*() to " David Howells
2023-08-16 12:07 ` [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc() David Howells
2023-08-16 12:28   ` David Laight
2023-08-16 13:00   ` David Howells
2023-08-16 14:19     ` David Laight
2023-08-16 18:50       ` Linus Torvalds
2023-08-16 20:35       ` David Howells
2023-08-17  4:18         ` Linus Torvalds
2023-08-17  8:41           ` David Laight
2023-08-17 14:38             ` Linus Torvalds
2023-08-17 15:16               ` David Laight
2023-08-17 15:31                 ` Linus Torvalds
2023-08-17 16:06                   ` David Laight
2023-08-18 15:19             ` David Howells
2023-08-18 15:42               ` David Laight
2023-08-18 16:48               ` David Howells
2023-08-18 21:39                 ` David Laight
2023-08-18 11:42         ` David Howells
2023-08-18 12:16           ` David Laight
2023-08-18 12:26             ` Matthew Wilcox
2023-08-18 12:41               ` David Laight
2023-08-18 13:33         ` David Howells
2023-08-18 11:39       ` David Howells

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.