linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V2 0/2] Lift memcpy_[to|from]_page to core
@ 2020-12-07 22:57 ira.weiny
  2020-12-07 22:57 ` [PATCH V2 1/2] mm/highmem: Remove deprecated kmap_atomic ira.weiny
  2020-12-07 22:57 ` [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core ira.weiny
  0 siblings, 2 replies; 24+ messages in thread
From: ira.weiny @ 2020-12-07 22:57 UTC (permalink / raw)
  To: Thomas Gleixner, Andrew Morton
  Cc: Ira Weiny, Joonas Lahtinen, Matthew Wilcox, Christoph Hellwig,
	Eric Biggers, Dan Williams, Al Viro, linux-kernel, linux-fsdevel

From: Ira Weiny <ira.weiny@intel.com>

These are based on tip/core/mm.  As I was looking at converting the calls to
kmap_local_page() I realized that there were a number of calls in highmem.h
which should also be converted.

So I've added a second prelim patch to convert those.

This is a V2 to get into 5.11 so that we can start to convert all the various
subsystems in 5.12.[1]

I'm sending to Andrew and Thomas but I'm expecting these to go through
tip/core/mm via Thomas if that is ok with Andrew.

[1] https://lore.kernel.org/lkml/20201204160504.GH1563847@iweiny-DESK2.sc.intel.com/

Ira Weiny (2):
  mm/highmem: Remove deprecated kmap_atomic
  mm/highmem: Lift memcpy_[to|from]_page to core

 include/linux/highmem.h | 28 +++++++++++++-------------
 include/linux/pagemap.h | 44 +++++++++++++++++++++++++++++++++++++++++
 lib/iov_iter.c          | 26 +++---------------------
 3 files changed, 61 insertions(+), 37 deletions(-)

-- 
2.28.0.rc0.12.gb6a658bd00c9


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH V2 1/2] mm/highmem: Remove deprecated kmap_atomic
  2020-12-07 22:57 [PATCH V2 0/2] Lift memcpy_[to|from]_page to core ira.weiny
@ 2020-12-07 22:57 ` ira.weiny
  2020-12-07 22:57 ` [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core ira.weiny
  1 sibling, 0 replies; 24+ messages in thread
From: ira.weiny @ 2020-12-07 22:57 UTC (permalink / raw)
  To: Thomas Gleixner, Andrew Morton
  Cc: Ira Weiny, Joonas Lahtinen, Matthew Wilcox, Christoph Hellwig,
	Eric Biggers, Dan Williams, Al Viro, linux-kernel, linux-fsdevel

From: Ira Weiny <ira.weiny@intel.com>

kmap_atomic() is being deprecated in favor of kmap_local_page().

Replace the uses of kmap_atomic() within the highmem code.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 include/linux/highmem.h | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index f597830f26b4..3bfe6a12d14d 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -146,9 +146,9 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
 #ifndef clear_user_highpage
 static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
 {
-	void *addr = kmap_atomic(page);
+	void *addr = kmap_local_page(page);
 	clear_user_page(addr, vaddr, page);
-	kunmap_atomic(addr);
+	kunmap_local(addr);
 }
 #endif
 
@@ -199,16 +199,16 @@ alloc_zeroed_user_highpage_movable(struct vm_area_struct *vma,
 
 static inline void clear_highpage(struct page *page)
 {
-	void *kaddr = kmap_atomic(page);
+	void *kaddr = kmap_local_page(page);
 	clear_page(kaddr);
-	kunmap_atomic(kaddr);
+	kunmap_local(kaddr);
 }
 
 static inline void zero_user_segments(struct page *page,
 	unsigned start1, unsigned end1,
 	unsigned start2, unsigned end2)
 {
-	void *kaddr = kmap_atomic(page);
+	void *kaddr = kmap_local_page(page);
 
 	BUG_ON(end1 > PAGE_SIZE || end2 > PAGE_SIZE);
 
@@ -218,7 +218,7 @@ static inline void zero_user_segments(struct page *page,
 	if (end2 > start2)
 		memset(kaddr + start2, 0, end2 - start2);
 
-	kunmap_atomic(kaddr);
+	kunmap_local(kaddr);
 	flush_dcache_page(page);
 }
 
@@ -241,11 +241,11 @@ static inline void copy_user_highpage(struct page *to, struct page *from,
 {
 	char *vfrom, *vto;
 
-	vfrom = kmap_atomic(from);
-	vto = kmap_atomic(to);
+	vfrom = kmap_local_page(from);
+	vto = kmap_local_page(to);
 	copy_user_page(vto, vfrom, vaddr, to);
-	kunmap_atomic(vto);
-	kunmap_atomic(vfrom);
+	kunmap_local(vto);
+	kunmap_local(vfrom);
 }
 
 #endif
@@ -256,11 +256,11 @@ static inline void copy_highpage(struct page *to, struct page *from)
 {
 	char *vfrom, *vto;
 
-	vfrom = kmap_atomic(from);
-	vto = kmap_atomic(to);
+	vfrom = kmap_local_page(from);
+	vto = kmap_local_page(to);
 	copy_page(vto, vfrom);
-	kunmap_atomic(vto);
-	kunmap_atomic(vfrom);
+	kunmap_local(vto);
+	kunmap_local(vfrom);
 }
 
 #endif
-- 
2.28.0.rc0.12.gb6a658bd00c9


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core
  2020-12-07 22:57 [PATCH V2 0/2] Lift memcpy_[to|from]_page to core ira.weiny
  2020-12-07 22:57 ` [PATCH V2 1/2] mm/highmem: Remove deprecated kmap_atomic ira.weiny
@ 2020-12-07 22:57 ` ira.weiny
  2020-12-07 23:26   ` Matthew Wilcox
  2020-12-08 12:23   ` Matthew Wilcox
  1 sibling, 2 replies; 24+ messages in thread
From: ira.weiny @ 2020-12-07 22:57 UTC (permalink / raw)
  To: Thomas Gleixner, Andrew Morton
  Cc: Ira Weiny, Dave Hansen, Matthew Wilcox, Christoph Hellwig,
	Dan Williams, Al Viro, Eric Biggers, Joonas Lahtinen,
	linux-kernel, linux-fsdevel

From: Ira Weiny <ira.weiny@intel.com>

Working through a conversion to a call such as kmap_thread() revealed
many places where the pattern kmap/memcpy/kunmap occurred.

Eric Biggers, Matthew Wilcox, Christoph Hellwig, Dan Williams, and Al
Viro all suggested putting this code into helper functions.  Al Viro
further pointed out that these functions already existed in the iov_iter
code.[1]

Placing these functions in 'highmem.h' is suboptimal especially with the
changes being proposed in the functionality of kmap.  From a caller
perspective including/using 'highmem.h' implies that the functions
defined in that header are only required when highmem is in use which is
increasingly not the case with modern processors.  Some headers like
mm.h or string.h seem ok but don't really portray the functionality
well.  'pagemap.h', on the other hand, makes sense and is already
included in many of the places we want to convert.

Another alternative would be to create a new header for the promoted
memcpy functions, but it masks the fact that these are designed to copy
to/from pages using the kernel direct mappings and complicates matters
with a new header.

Lift memcpy_to_page() and memcpy_from_page() to pagemap.h.

Remove memzero_page() in favor of zero_user() to zero a page.

Add a memcpy_page(), memmove_page, and memset_page() to cover more
kmap/mem*/kunmap. patterns.

Finally use kmap_local_page() in all the new calls.

[1] https://lore.kernel.org/lkml/20201013200149.GI3576660@ZenIV.linux.org.uk/
    https://lore.kernel.org/lkml/20201013112544.GA5249@infradead.org/

Cc: Dave Hansen <dave.hansen@intel.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes for V2:
	From Thomas Gleixner
		Change kmap_atomic() to kmap_local_page() after basing
		on tip/core/mm
	From Joonas Lahtinen
		Reverse offset/val in memset_page()
---
 include/linux/pagemap.h | 44 +++++++++++++++++++++++++++++++++++++++++
 lib/iov_iter.c          | 26 +++---------------------
 2 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index c77b7c31b2e4..9141e5b7b9df 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1028,4 +1028,48 @@ unsigned int i_blocks_per_page(struct inode *inode, struct page *page)
 {
 	return thp_size(page) >> inode->i_blkbits;
 }
+
+static inline void memcpy_page(struct page *dst_page, size_t dst_off,
+			       struct page *src_page, size_t src_off,
+			       size_t len)
+{
+	char *dst = kmap_local_page(dst_page);
+	char *src = kmap_local_page(src_page);
+	memcpy(dst + dst_off, src + src_off, len);
+	kunmap_local(src);
+	kunmap_local(dst);
+}
+
+static inline void memmove_page(struct page *dst_page, size_t dst_off,
+			       struct page *src_page, size_t src_off,
+			       size_t len)
+{
+	char *dst = kmap_local_page(dst_page);
+	char *src = kmap_local_page(src_page);
+	memmove(dst + dst_off, src + src_off, len);
+	kunmap_local(src);
+	kunmap_local(dst);
+}
+
+static inline void memcpy_from_page(char *to, struct page *page, size_t offset, size_t len)
+{
+	char *from = kmap_local_page(page);
+	memcpy(to, from + offset, len);
+	kunmap_local(from);
+}
+
+static inline void memcpy_to_page(struct page *page, size_t offset, const char *from, size_t len)
+{
+	char *to = kmap_local_page(page);
+	memcpy(to + offset, from, len);
+	kunmap_local(to);
+}
+
+static inline void memset_page(struct page *page, size_t offset, int val, size_t len)
+{
+	char *addr = kmap_local_page(page);
+	memset(addr + offset, val, len);
+	kunmap_local(addr);
+}
+
 #endif /* _LINUX_PAGEMAP_H */
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 1635111c5bd2..8ed1f846fcc3 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -5,6 +5,7 @@
 #include <linux/fault-inject-usercopy.h>
 #include <linux/uio.h>
 #include <linux/pagemap.h>
+#include <linux/highmem.h>
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 #include <linux/splice.h>
@@ -466,27 +467,6 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
 }
 EXPORT_SYMBOL(iov_iter_init);
 
-static void memcpy_from_page(char *to, struct page *page, size_t offset, size_t len)
-{
-	char *from = kmap_atomic(page);
-	memcpy(to, from + offset, len);
-	kunmap_atomic(from);
-}
-
-static void memcpy_to_page(struct page *page, size_t offset, const char *from, size_t len)
-{
-	char *to = kmap_atomic(page);
-	memcpy(to + offset, from, len);
-	kunmap_atomic(to);
-}
-
-static void memzero_page(struct page *page, size_t offset, size_t len)
-{
-	char *addr = kmap_atomic(page);
-	memset(addr + offset, 0, len);
-	kunmap_atomic(addr);
-}
-
 static inline bool allocated(struct pipe_buffer *buf)
 {
 	return buf->ops == &default_pipe_buf_ops;
@@ -964,7 +944,7 @@ static size_t pipe_zero(size_t bytes, struct iov_iter *i)
 
 	do {
 		size_t chunk = min_t(size_t, n, PAGE_SIZE - off);
-		memzero_page(pipe->bufs[i_head & p_mask].page, off, chunk);
+		zero_user(pipe->bufs[i_head & p_mask].page, off, chunk);
 		i->head = i_head;
 		i->iov_offset = off + chunk;
 		n -= chunk;
@@ -981,7 +961,7 @@ size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
 		return pipe_zero(bytes, i);
 	iterate_and_advance(i, bytes, v,
 		clear_user(v.iov_base, v.iov_len),
-		memzero_page(v.bv_page, v.bv_offset, v.bv_len),
+		zero_user(v.bv_page, v.bv_offset, v.bv_len),
 		memset(v.iov_base, 0, v.iov_len)
 	)
 
-- 
2.28.0.rc0.12.gb6a658bd00c9


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core
  2020-12-07 22:57 ` [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core ira.weiny
@ 2020-12-07 23:26   ` Matthew Wilcox
  2020-12-07 23:34     ` Dan Williams
  2020-12-08 12:23   ` Matthew Wilcox
  1 sibling, 1 reply; 24+ messages in thread
From: Matthew Wilcox @ 2020-12-07 23:26 UTC (permalink / raw)
  To: ira.weiny
  Cc: Thomas Gleixner, Andrew Morton, Dave Hansen, Christoph Hellwig,
	Dan Williams, Al Viro, Eric Biggers, Joonas Lahtinen,
	linux-kernel, linux-fsdevel

On Mon, Dec 07, 2020 at 02:57:03PM -0800, ira.weiny@intel.com wrote:
> +static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> +			       struct page *src_page, size_t src_off,
> +			       size_t len)
> +{
> +	char *dst = kmap_local_page(dst_page);
> +	char *src = kmap_local_page(src_page);

I appreciate you've only moved these, but please add:

	BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);

> +	memcpy(dst + dst_off, src + src_off, len);
> +	kunmap_local(src);
> +	kunmap_local(dst);
> +}
> +
> +static inline void memmove_page(struct page *dst_page, size_t dst_off,
> +			       struct page *src_page, size_t src_off,
> +			       size_t len)
> +{
> +	char *dst = kmap_local_page(dst_page);
> +	char *src = kmap_local_page(src_page);

	BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);

> +	memmove(dst + dst_off, src + src_off, len);
> +	kunmap_local(src);
> +	kunmap_local(dst);
> +}
> +
> +static inline void memcpy_from_page(char *to, struct page *page, size_t offset, size_t len)
> +{
> +	char *from = kmap_local_page(page);

	BUG_ON(offset + len > PAGE_SIZE);

> +	memcpy(to, from + offset, len);
> +	kunmap_local(from);
> +}
> +
> +static inline void memcpy_to_page(struct page *page, size_t offset, const char *from, size_t len)
> +{
> +	char *to = kmap_local_page(page);

	BUG_ON(offset + len > PAGE_SIZE);

> +	memcpy(to + offset, from, len);
> +	kunmap_local(to);
> +}
> +
> +static inline void memset_page(struct page *page, size_t offset, int val, size_t len)
> +{
> +	char *addr = kmap_local_page(page);

	BUG_ON(offset + len > PAGE_SIZE);

> +	memset(addr + offset, val, len);
> +	kunmap_local(addr);
> +}
> +
>  #endif /* _LINUX_PAGEMAP_H */

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core
  2020-12-07 23:26   ` Matthew Wilcox
@ 2020-12-07 23:34     ` Dan Williams
  2020-12-07 23:40       ` Matthew Wilcox
  0 siblings, 1 reply; 24+ messages in thread
From: Dan Williams @ 2020-12-07 23:34 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Weiny, Ira, Thomas Gleixner, Andrew Morton, Dave Hansen,
	Christoph Hellwig, Al Viro, Eric Biggers, Joonas Lahtinen,
	Linux Kernel Mailing List, linux-fsdevel

On Mon, Dec 7, 2020 at 3:27 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Dec 07, 2020 at 02:57:03PM -0800, ira.weiny@intel.com wrote:
> > +static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> > +                            struct page *src_page, size_t src_off,
> > +                            size_t len)
> > +{
> > +     char *dst = kmap_local_page(dst_page);
> > +     char *src = kmap_local_page(src_page);
>
> I appreciate you've only moved these, but please add:
>
>         BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);

I imagine it's not outside the realm of possibility that some driver
on CONFIG_HIGHMEM=n is violating this assumption and getting away with
it because kmap_atomic() of contiguous pages "just works (TM)".
Shouldn't this WARN rather than BUG so that the user can report the
buggy driver and not have a dead system?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core
  2020-12-07 23:34     ` Dan Williams
@ 2020-12-07 23:40       ` Matthew Wilcox
  2020-12-07 23:49         ` Dan Williams
  0 siblings, 1 reply; 24+ messages in thread
From: Matthew Wilcox @ 2020-12-07 23:40 UTC (permalink / raw)
  To: Dan Williams
  Cc: Weiny, Ira, Thomas Gleixner, Andrew Morton, Dave Hansen,
	Christoph Hellwig, Al Viro, Eric Biggers, Joonas Lahtinen,
	Linux Kernel Mailing List, linux-fsdevel

On Mon, Dec 07, 2020 at 03:34:44PM -0800, Dan Williams wrote:
> On Mon, Dec 7, 2020 at 3:27 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Mon, Dec 07, 2020 at 02:57:03PM -0800, ira.weiny@intel.com wrote:
> > > +static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> > > +                            struct page *src_page, size_t src_off,
> > > +                            size_t len)
> > > +{
> > > +     char *dst = kmap_local_page(dst_page);
> > > +     char *src = kmap_local_page(src_page);
> >
> > I appreciate you've only moved these, but please add:
> >
> >         BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);
> 
> I imagine it's not outside the realm of possibility that some driver
> on CONFIG_HIGHMEM=n is violating this assumption and getting away with
> it because kmap_atomic() of contiguous pages "just works (TM)".
> Shouldn't this WARN rather than BUG so that the user can report the
> buggy driver and not have a dead system?

As opposed to (on a HIGHMEM=y system) silently corrupting data that
is on the next page of memory?  I suppose ideally ...

	if (WARN_ON(dst_off + len > PAGE_SIZE))
		len = PAGE_SIZE - dst_off;
	if (WARN_ON(src_off + len > PAGE_SIZE))
		len = PAGE_SIZE - src_off;

and then we just truncate the data of the offending caller instead of
corrupting innocent data that happens to be adjacent.  Although that's
not ideal either ... I dunno, what's the least bad poison to drink here?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core
  2020-12-07 23:40       ` Matthew Wilcox
@ 2020-12-07 23:49         ` Dan Williams
  2020-12-08 21:32           ` Ira Weiny
  0 siblings, 1 reply; 24+ messages in thread
From: Dan Williams @ 2020-12-07 23:49 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Weiny, Ira, Thomas Gleixner, Andrew Morton, Dave Hansen,
	Christoph Hellwig, Al Viro, Eric Biggers, Joonas Lahtinen,
	Linux Kernel Mailing List, linux-fsdevel

On Mon, Dec 7, 2020 at 3:40 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Dec 07, 2020 at 03:34:44PM -0800, Dan Williams wrote:
> > On Mon, Dec 7, 2020 at 3:27 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Mon, Dec 07, 2020 at 02:57:03PM -0800, ira.weiny@intel.com wrote:
> > > > +static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> > > > +                            struct page *src_page, size_t src_off,
> > > > +                            size_t len)
> > > > +{
> > > > +     char *dst = kmap_local_page(dst_page);
> > > > +     char *src = kmap_local_page(src_page);
> > >
> > > I appreciate you've only moved these, but please add:
> > >
> > >         BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);
> >
> > I imagine it's not outside the realm of possibility that some driver
> > on CONFIG_HIGHMEM=n is violating this assumption and getting away with
> > it because kmap_atomic() of contiguous pages "just works (TM)".
> > Shouldn't this WARN rather than BUG so that the user can report the
> > buggy driver and not have a dead system?
>
> As opposed to (on a HIGHMEM=y system) silently corrupting data that
> is on the next page of memory?

Wouldn't it fault in HIGHMEM=y case? I guess not necessarily...

> I suppose ideally ...
>
>         if (WARN_ON(dst_off + len > PAGE_SIZE))
>                 len = PAGE_SIZE - dst_off;
>         if (WARN_ON(src_off + len > PAGE_SIZE))
>                 len = PAGE_SIZE - src_off;
>
> and then we just truncate the data of the offending caller instead of
> corrupting innocent data that happens to be adjacent.  Although that's
> not ideal either ... I dunno, what's the least bad poison to drink here?

Right, if the driver was relying on "corruption" for correct operation.

If corruption actual were happening in practice wouldn't there have
been screams by now? Again, not necessarily...

At least with just plain WARN the kernel will start screaming on the
user's behalf, and if it worked before it will keep working.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core
  2020-12-07 22:57 ` [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core ira.weiny
  2020-12-07 23:26   ` Matthew Wilcox
@ 2020-12-08 12:23   ` Matthew Wilcox
  2020-12-08 16:38     ` Ira Weiny
  1 sibling, 1 reply; 24+ messages in thread
From: Matthew Wilcox @ 2020-12-08 12:23 UTC (permalink / raw)
  To: ira.weiny
  Cc: Thomas Gleixner, Andrew Morton, Dave Hansen, Christoph Hellwig,
	Dan Williams, Al Viro, Eric Biggers, Joonas Lahtinen,
	linux-kernel, linux-fsdevel

On Mon, Dec 07, 2020 at 02:57:03PM -0800, ira.weiny@intel.com wrote:
> Placing these functions in 'highmem.h' is suboptimal especially with the
> changes being proposed in the functionality of kmap.  From a caller
> perspective including/using 'highmem.h' implies that the functions
> defined in that header are only required when highmem is in use which is
> increasingly not the case with modern processors.  Some headers like
> mm.h or string.h seem ok but don't really portray the functionality
> well.  'pagemap.h', on the other hand, makes sense and is already
> included in many of the places we want to convert.

pagemap.h is for the page cache.  It's not for "random page
functionality".  Yes, I know it's badly named.  No, I don't want to
rename it.  These helpers should go in highmem.h along with zero_user().

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core
  2020-12-08 12:23   ` Matthew Wilcox
@ 2020-12-08 16:38     ` Ira Weiny
  2020-12-08 16:40       ` Matthew Wilcox
  0 siblings, 1 reply; 24+ messages in thread
From: Ira Weiny @ 2020-12-08 16:38 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Thomas Gleixner, Andrew Morton, Dave Hansen, Christoph Hellwig,
	Dan Williams, Al Viro, Eric Biggers, Joonas Lahtinen,
	linux-kernel, linux-fsdevel

On Tue, Dec 08, 2020 at 12:23:16PM +0000, Matthew Wilcox wrote:
> On Mon, Dec 07, 2020 at 02:57:03PM -0800, ira.weiny@intel.com wrote:
> > Placing these functions in 'highmem.h' is suboptimal especially with the
> > changes being proposed in the functionality of kmap.  From a caller
> > perspective including/using 'highmem.h' implies that the functions
> > defined in that header are only required when highmem is in use which is
> > increasingly not the case with modern processors.  Some headers like
> > mm.h or string.h seem ok but don't really portray the functionality
> > well.  'pagemap.h', on the other hand, makes sense and is already
> > included in many of the places we want to convert.
> 
> pagemap.h is for the page cache.  It's not for "random page
> functionality".  Yes, I know it's badly named.  No, I don't want to
> rename it.  These helpers should go in highmem.h along with zero_user().

I could have sworn you suggested pagemap.h.  But I can't find the evidence on
lore.  :-/   hehehe...

In the end the code does not care.  I have a distaste for highmem.h because it
is no longer for 'high memory'.  And I think a number of driver writers who are
targeting 64bit platforms just don't care any more.  So as we head toward
memory not being mapped by the kernel for other reasons I think highmem needs
to be 'rebranded' if not renamed.

Ira

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core
  2020-12-08 16:38     ` Ira Weiny
@ 2020-12-08 16:40       ` Matthew Wilcox
  0 siblings, 0 replies; 24+ messages in thread
From: Matthew Wilcox @ 2020-12-08 16:40 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Thomas Gleixner, Andrew Morton, Dave Hansen, Christoph Hellwig,
	Dan Williams, Al Viro, Eric Biggers, Joonas Lahtinen,
	linux-kernel, linux-fsdevel

On Tue, Dec 08, 2020 at 08:38:14AM -0800, Ira Weiny wrote:
> On Tue, Dec 08, 2020 at 12:23:16PM +0000, Matthew Wilcox wrote:
> > On Mon, Dec 07, 2020 at 02:57:03PM -0800, ira.weiny@intel.com wrote:
> > > Placing these functions in 'highmem.h' is suboptimal especially with the
> > > changes being proposed in the functionality of kmap.  From a caller
> > > perspective including/using 'highmem.h' implies that the functions
> > > defined in that header are only required when highmem is in use which is
> > > increasingly not the case with modern processors.  Some headers like
> > > mm.h or string.h seem ok but don't really portray the functionality
> > > well.  'pagemap.h', on the other hand, makes sense and is already
> > > included in many of the places we want to convert.
> > 
> > pagemap.h is for the page cache.  It's not for "random page
> > functionality".  Yes, I know it's badly named.  No, I don't want to
> > rename it.  These helpers should go in highmem.h along with zero_user().
> 
> I could have sworn you suggested pagemap.h.  But I can't find the evidence on
> lore.  :-/   hehehe...
> 
> In the end the code does not care.  I have a distaste for highmem.h because it
> is no longer for 'high memory'.  And I think a number of driver writers who are
> targeting 64bit platforms just don't care any more.  So as we head toward
> memory not being mapped by the kernel for other reasons I think highmem needs
> to be 'rebranded' if not renamed.

Rename highmem.h to kmap.h?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core
  2020-12-07 23:49         ` Dan Williams
@ 2020-12-08 21:32           ` Ira Weiny
  2020-12-08 21:50             ` Matthew Wilcox
  2020-12-08 22:21             ` Dan Williams
  0 siblings, 2 replies; 24+ messages in thread
From: Ira Weiny @ 2020-12-08 21:32 UTC (permalink / raw)
  To: Dan Williams
  Cc: Matthew Wilcox, Thomas Gleixner, Andrew Morton, Dave Hansen,
	Christoph Hellwig, Al Viro, Eric Biggers, Joonas Lahtinen,
	Linux Kernel Mailing List, linux-fsdevel

On Mon, Dec 07, 2020 at 03:49:55PM -0800, Dan Williams wrote:
> On Mon, Dec 7, 2020 at 3:40 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Mon, Dec 07, 2020 at 03:34:44PM -0800, Dan Williams wrote:
> > > On Mon, Dec 7, 2020 at 3:27 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Mon, Dec 07, 2020 at 02:57:03PM -0800, ira.weiny@intel.com wrote:
> > > > > +static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> > > > > +                            struct page *src_page, size_t src_off,
> > > > > +                            size_t len)
> > > > > +{
> > > > > +     char *dst = kmap_local_page(dst_page);
> > > > > +     char *src = kmap_local_page(src_page);
> > > >
> > > > I appreciate you've only moved these, but please add:
> > > >
> > > >         BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);
> > >
> > > I imagine it's not outside the realm of possibility that some driver
> > > on CONFIG_HIGHMEM=n is violating this assumption and getting away with
> > > it because kmap_atomic() of contiguous pages "just works (TM)".
> > > Shouldn't this WARN rather than BUG so that the user can report the
> > > buggy driver and not have a dead system?
> >
> > As opposed to (on a HIGHMEM=y system) silently corrupting data that
> > is on the next page of memory?
> 
> Wouldn't it fault in HIGHMEM=y case? I guess not necessarily...
> 
> > I suppose ideally ...
> >
> >         if (WARN_ON(dst_off + len > PAGE_SIZE))
> >                 len = PAGE_SIZE - dst_off;
> >         if (WARN_ON(src_off + len > PAGE_SIZE))
> >                 len = PAGE_SIZE - src_off;
> >
> > and then we just truncate the data of the offending caller instead of
> > corrupting innocent data that happens to be adjacent.  Although that's
> > not ideal either ... I dunno, what's the least bad poison to drink here?
> 
> Right, if the driver was relying on "corruption" for correct operation.
> 
> If corruption actual were happening in practice wouldn't there have
> been screams by now? Again, not necessarily...
> 
> At least with just plain WARN the kernel will start screaming on the
> user's behalf, and if it worked before it will keep working.

So I decided to just sleep on this because I was recently told to not introduce
new WARN_ON's[1]

I don't think that truncating len is worth the effort.  The conversions being
done should all 'work'  At least corrupting users data in the same way as it
used to...  ;-)  I'm ok with adding the WARN_ON's and I have modified the patch
to do so while I work through the 0-day issues.  (not sure what is going on
there.)

However, are we ok with adding the WARN_ON's given what Greg KH told me?  This
is a bit more critical than the PKS API in that it could result in corrupt
data.

Ira

[1] https://lore.kernel.org/linux-doc/20201103065024.GC75930@kroah.com/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core
  2020-12-08 21:32           ` Ira Weiny
@ 2020-12-08 21:50             ` Matthew Wilcox
  2020-12-08 22:23               ` Dan Williams
  2020-12-08 22:21             ` Dan Williams
  1 sibling, 1 reply; 24+ messages in thread
From: Matthew Wilcox @ 2020-12-08 21:50 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dan Williams, Thomas Gleixner, Andrew Morton, Dave Hansen,
	Christoph Hellwig, Al Viro, Eric Biggers, Joonas Lahtinen,
	Linux Kernel Mailing List, linux-fsdevel

On Tue, Dec 08, 2020 at 01:32:55PM -0800, Ira Weiny wrote:
> On Mon, Dec 07, 2020 at 03:49:55PM -0800, Dan Williams wrote:
> > On Mon, Dec 7, 2020 at 3:40 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Mon, Dec 07, 2020 at 03:34:44PM -0800, Dan Williams wrote:
> > > > On Mon, Dec 7, 2020 at 3:27 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > >
> > > > > On Mon, Dec 07, 2020 at 02:57:03PM -0800, ira.weiny@intel.com wrote:
> > > > > > +static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> > > > > > +                            struct page *src_page, size_t src_off,
> > > > > > +                            size_t len)
> > > > > > +{
> > > > > > +     char *dst = kmap_local_page(dst_page);
> > > > > > +     char *src = kmap_local_page(src_page);
> > > > >
> > > > > I appreciate you've only moved these, but please add:
> > > > >
> > > > >         BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);
> > > >
> > > > I imagine it's not outside the realm of possibility that some driver
> > > > on CONFIG_HIGHMEM=n is violating this assumption and getting away with
> > > > it because kmap_atomic() of contiguous pages "just works (TM)".
> > > > Shouldn't this WARN rather than BUG so that the user can report the
> > > > buggy driver and not have a dead system?
> > >
> > > As opposed to (on a HIGHMEM=y system) silently corrupting data that
> > > is on the next page of memory?
> > 
> > Wouldn't it fault in HIGHMEM=y case? I guess not necessarily...
> > 
> > > I suppose ideally ...
> > >
> > >         if (WARN_ON(dst_off + len > PAGE_SIZE))
> > >                 len = PAGE_SIZE - dst_off;
> > >         if (WARN_ON(src_off + len > PAGE_SIZE))
> > >                 len = PAGE_SIZE - src_off;
> > >
> > > and then we just truncate the data of the offending caller instead of
> > > corrupting innocent data that happens to be adjacent.  Although that's
> > > not ideal either ... I dunno, what's the least bad poison to drink here?
> > 
> > Right, if the driver was relying on "corruption" for correct operation.
> > 
> > If corruption actual were happening in practice wouldn't there have
> > been screams by now? Again, not necessarily...
> > 
> > At least with just plain WARN the kernel will start screaming on the
> > user's behalf, and if it worked before it will keep working.
> 
> So I decided to just sleep on this because I was recently told to not introduce
> new WARN_ON's[1]
> 
> I don't think that truncating len is worth the effort.  The conversions being
> done should all 'work'  At least corrupting users data in the same way as it
> used to...  ;-)  I'm ok with adding the WARN_ON's and I have modified the patch
> to do so while I work through the 0-day issues.  (not sure what is going on
> there.)
> 
> However, are we ok with adding the WARN_ON's given what Greg KH told me?  This
> is a bit more critical than the PKS API in that it could result in corrupt
> data.

zero_user_segments contains:

        BUG_ON(end1 > page_size(page) || end2 > page_size(page));

These should be consistent.  I think we've demonstrated that there is
no good option here.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core
  2020-12-08 21:32           ` Ira Weiny
  2020-12-08 21:50             ` Matthew Wilcox
@ 2020-12-08 22:21             ` Dan Williams
  1 sibling, 0 replies; 24+ messages in thread
From: Dan Williams @ 2020-12-08 22:21 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Matthew Wilcox, Thomas Gleixner, Andrew Morton, Dave Hansen,
	Christoph Hellwig, Al Viro, Eric Biggers, Joonas Lahtinen,
	Linux Kernel Mailing List, linux-fsdevel

On Tue, Dec 8, 2020 at 1:33 PM Ira Weiny <ira.weiny@intel.com> wrote:
>
> On Mon, Dec 07, 2020 at 03:49:55PM -0800, Dan Williams wrote:
> > On Mon, Dec 7, 2020 at 3:40 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Mon, Dec 07, 2020 at 03:34:44PM -0800, Dan Williams wrote:
> > > > On Mon, Dec 7, 2020 at 3:27 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > >
> > > > > On Mon, Dec 07, 2020 at 02:57:03PM -0800, ira.weiny@intel.com wrote:
> > > > > > +static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> > > > > > +                            struct page *src_page, size_t src_off,
> > > > > > +                            size_t len)
> > > > > > +{
> > > > > > +     char *dst = kmap_local_page(dst_page);
> > > > > > +     char *src = kmap_local_page(src_page);
> > > > >
> > > > > I appreciate you've only moved these, but please add:
> > > > >
> > > > >         BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);
> > > >
> > > > I imagine it's not outside the realm of possibility that some driver
> > > > on CONFIG_HIGHMEM=n is violating this assumption and getting away with
> > > > it because kmap_atomic() of contiguous pages "just works (TM)".
> > > > Shouldn't this WARN rather than BUG so that the user can report the
> > > > buggy driver and not have a dead system?
> > >
> > > As opposed to (on a HIGHMEM=y system) silently corrupting data that
> > > is on the next page of memory?
> >
> > Wouldn't it fault in HIGHMEM=y case? I guess not necessarily...
> >
> > > I suppose ideally ...
> > >
> > >         if (WARN_ON(dst_off + len > PAGE_SIZE))
> > >                 len = PAGE_SIZE - dst_off;
> > >         if (WARN_ON(src_off + len > PAGE_SIZE))
> > >                 len = PAGE_SIZE - src_off;
> > >
> > > and then we just truncate the data of the offending caller instead of
> > > corrupting innocent data that happens to be adjacent.  Although that's
> > > not ideal either ... I dunno, what's the least bad poison to drink here?
> >
> > Right, if the driver was relying on "corruption" for correct operation.
> >
> > If corruption actual were happening in practice wouldn't there have
> > been screams by now? Again, not necessarily...
> >
> > At least with just plain WARN the kernel will start screaming on the
> > user's behalf, and if it worked before it will keep working.
>
> So I decided to just sleep on this because I was recently told to not introduce
> new WARN_ON's[1]
>
> I don't think that truncating len is worth the effort.  The conversions being
> done should all 'work'  At least corrupting users data in the same way as it
> used to...  ;-)  I'm ok with adding the WARN_ON's and I have modified the patch
> to do so while I work through the 0-day issues.  (not sure what is going on
> there.)
>
> However, are we ok with adding the WARN_ON's given what Greg KH told me?  This
> is a bit more critical than the PKS API in that it could result in corrupt
> data.

That situation was a bit different in my mind because the default
fallback stub path has typically never had WARN_ON even if it's never
supposed to be called.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core
  2020-12-08 21:50             ` Matthew Wilcox
@ 2020-12-08 22:23               ` Dan Williams
  2020-12-08 22:32                 ` Matthew Wilcox
  0 siblings, 1 reply; 24+ messages in thread
From: Dan Williams @ 2020-12-08 22:23 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Ira Weiny, Thomas Gleixner, Andrew Morton, Dave Hansen,
	Christoph Hellwig, Al Viro, Eric Biggers, Joonas Lahtinen,
	Linux Kernel Mailing List, linux-fsdevel

On Tue, Dec 8, 2020 at 1:51 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Dec 08, 2020 at 01:32:55PM -0800, Ira Weiny wrote:
> > On Mon, Dec 07, 2020 at 03:49:55PM -0800, Dan Williams wrote:
> > > On Mon, Dec 7, 2020 at 3:40 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Mon, Dec 07, 2020 at 03:34:44PM -0800, Dan Williams wrote:
> > > > > On Mon, Dec 7, 2020 at 3:27 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > >
> > > > > > On Mon, Dec 07, 2020 at 02:57:03PM -0800, ira.weiny@intel.com wrote:
> > > > > > > +static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> > > > > > > +                            struct page *src_page, size_t src_off,
> > > > > > > +                            size_t len)
> > > > > > > +{
> > > > > > > +     char *dst = kmap_local_page(dst_page);
> > > > > > > +     char *src = kmap_local_page(src_page);
> > > > > >
> > > > > > I appreciate you've only moved these, but please add:
> > > > > >
> > > > > >         BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);
> > > > >
> > > > > I imagine it's not outside the realm of possibility that some driver
> > > > > on CONFIG_HIGHMEM=n is violating this assumption and getting away with
> > > > > it because kmap_atomic() of contiguous pages "just works (TM)".
> > > > > Shouldn't this WARN rather than BUG so that the user can report the
> > > > > buggy driver and not have a dead system?
> > > >
> > > > As opposed to (on a HIGHMEM=y system) silently corrupting data that
> > > > is on the next page of memory?
> > >
> > > Wouldn't it fault in HIGHMEM=y case? I guess not necessarily...
> > >
> > > > I suppose ideally ...
> > > >
> > > >         if (WARN_ON(dst_off + len > PAGE_SIZE))
> > > >                 len = PAGE_SIZE - dst_off;
> > > >         if (WARN_ON(src_off + len > PAGE_SIZE))
> > > >                 len = PAGE_SIZE - src_off;
> > > >
> > > > and then we just truncate the data of the offending caller instead of
> > > > corrupting innocent data that happens to be adjacent.  Although that's
> > > > not ideal either ... I dunno, what's the least bad poison to drink here?
> > >
> > > Right, if the driver was relying on "corruption" for correct operation.
> > >
> > > If corruption actual were happening in practice wouldn't there have
> > > been screams by now? Again, not necessarily...
> > >
> > > At least with just plain WARN the kernel will start screaming on the
> > > user's behalf, and if it worked before it will keep working.
> >
> > So I decided to just sleep on this because I was recently told to not introduce
> > new WARN_ON's[1]
> >
> > I don't think that truncating len is worth the effort.  The conversions being
> > done should all 'work'  At least corrupting users data in the same way as it
> > used to...  ;-)  I'm ok with adding the WARN_ON's and I have modified the patch
> > to do so while I work through the 0-day issues.  (not sure what is going on
> > there.)
> >
> > However, are we ok with adding the WARN_ON's given what Greg KH told me?  This
> > is a bit more critical than the PKS API in that it could result in corrupt
> > data.
>
> zero_user_segments contains:
>
>         BUG_ON(end1 > page_size(page) || end2 > page_size(page));
>
> These should be consistent.  I think we've demonstrated that there is
> no good option here.

True, but these helpers are being deployed to many new locations where
they were not used before.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core
  2020-12-08 22:23               ` Dan Williams
@ 2020-12-08 22:32                 ` Matthew Wilcox
  2020-12-08 22:45                   ` Darrick J. Wong
  0 siblings, 1 reply; 24+ messages in thread
From: Matthew Wilcox @ 2020-12-08 22:32 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Thomas Gleixner, Andrew Morton, Dave Hansen,
	Christoph Hellwig, Al Viro, Eric Biggers, Joonas Lahtinen,
	Linux Kernel Mailing List, linux-fsdevel

On Tue, Dec 08, 2020 at 02:23:10PM -0800, Dan Williams wrote:
> On Tue, Dec 8, 2020 at 1:51 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Tue, Dec 08, 2020 at 01:32:55PM -0800, Ira Weiny wrote:
> > > On Mon, Dec 07, 2020 at 03:49:55PM -0800, Dan Williams wrote:
> > > > On Mon, Dec 7, 2020 at 3:40 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > >
> > > > > On Mon, Dec 07, 2020 at 03:34:44PM -0800, Dan Williams wrote:
> > > > > > On Mon, Dec 7, 2020 at 3:27 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > > >
> > > > > > > On Mon, Dec 07, 2020 at 02:57:03PM -0800, ira.weiny@intel.com wrote:
> > > > > > > > +static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> > > > > > > > +                            struct page *src_page, size_t src_off,
> > > > > > > > +                            size_t len)
> > > > > > > > +{
> > > > > > > > +     char *dst = kmap_local_page(dst_page);
> > > > > > > > +     char *src = kmap_local_page(src_page);
> > > > > > >
> > > > > > > I appreciate you've only moved these, but please add:
> > > > > > >
> > > > > > >         BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);
> > > > > >
> > > > > > I imagine it's not outside the realm of possibility that some driver
> > > > > > on CONFIG_HIGHMEM=n is violating this assumption and getting away with
> > > > > > it because kmap_atomic() of contiguous pages "just works (TM)".
> > > > > > Shouldn't this WARN rather than BUG so that the user can report the
> > > > > > buggy driver and not have a dead system?
> > > > >
> > > > > As opposed to (on a HIGHMEM=y system) silently corrupting data that
> > > > > is on the next page of memory?
> > > >
> > > > Wouldn't it fault in HIGHMEM=y case? I guess not necessarily...
> > > >
> > > > > I suppose ideally ...
> > > > >
> > > > >         if (WARN_ON(dst_off + len > PAGE_SIZE))
> > > > >                 len = PAGE_SIZE - dst_off;
> > > > >         if (WARN_ON(src_off + len > PAGE_SIZE))
> > > > >                 len = PAGE_SIZE - src_off;
> > > > >
> > > > > and then we just truncate the data of the offending caller instead of
> > > > > corrupting innocent data that happens to be adjacent.  Although that's
> > > > > not ideal either ... I dunno, what's the least bad poison to drink here?
> > > >
> > > > Right, if the driver was relying on "corruption" for correct operation.
> > > >
> > > > If corruption actual were happening in practice wouldn't there have
> > > > been screams by now? Again, not necessarily...
> > > >
> > > > At least with just plain WARN the kernel will start screaming on the
> > > > user's behalf, and if it worked before it will keep working.
> > >
> > > So I decided to just sleep on this because I was recently told to not introduce
> > > new WARN_ON's[1]
> > >
> > > I don't think that truncating len is worth the effort.  The conversions being
> > > done should all 'work'  At least corrupting users data in the same way as it
> > > used to...  ;-)  I'm ok with adding the WARN_ON's and I have modified the patch
> > > to do so while I work through the 0-day issues.  (not sure what is going on
> > > there.)
> > >
> > > However, are we ok with adding the WARN_ON's given what Greg KH told me?  This
> > > is a bit more critical than the PKS API in that it could result in corrupt
> > > data.
> >
> > zero_user_segments contains:
> >
> >         BUG_ON(end1 > page_size(page) || end2 > page_size(page));
> >
> > These should be consistent.  I think we've demonstrated that there is
> > no good option here.
> 
> True, but these helpers are being deployed to many new locations where
> they were not used before.

So what's your preferred poison?

1. Corrupt random data in whatever's been mapped into the next page (which
   is what the helpers currently do)
2. Copy less data than requested
3. Crash
4. Something else

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core
  2020-12-08 22:32                 ` Matthew Wilcox
@ 2020-12-08 22:45                   ` Darrick J. Wong
  2020-12-08 22:54                     ` Matthew Wilcox
  2020-12-08 23:40                     ` Dan Williams
  0 siblings, 2 replies; 24+ messages in thread
From: Darrick J. Wong @ 2020-12-08 22:45 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Dan Williams, Ira Weiny, Thomas Gleixner, Andrew Morton,
	Dave Hansen, Christoph Hellwig, Al Viro, Eric Biggers,
	Joonas Lahtinen, Linux Kernel Mailing List, linux-fsdevel

On Tue, Dec 08, 2020 at 10:32:34PM +0000, Matthew Wilcox wrote:
> On Tue, Dec 08, 2020 at 02:23:10PM -0800, Dan Williams wrote:
> > On Tue, Dec 8, 2020 at 1:51 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Tue, Dec 08, 2020 at 01:32:55PM -0800, Ira Weiny wrote:
> > > > On Mon, Dec 07, 2020 at 03:49:55PM -0800, Dan Williams wrote:
> > > > > On Mon, Dec 7, 2020 at 3:40 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > >
> > > > > > On Mon, Dec 07, 2020 at 03:34:44PM -0800, Dan Williams wrote:
> > > > > > > On Mon, Dec 7, 2020 at 3:27 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > > > >
> > > > > > > > On Mon, Dec 07, 2020 at 02:57:03PM -0800, ira.weiny@intel.com wrote:
> > > > > > > > > +static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> > > > > > > > > +                            struct page *src_page, size_t src_off,
> > > > > > > > > +                            size_t len)
> > > > > > > > > +{
> > > > > > > > > +     char *dst = kmap_local_page(dst_page);
> > > > > > > > > +     char *src = kmap_local_page(src_page);
> > > > > > > >
> > > > > > > > I appreciate you've only moved these, but please add:
> > > > > > > >
> > > > > > > >         BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);
> > > > > > >
> > > > > > > I imagine it's not outside the realm of possibility that some driver
> > > > > > > on CONFIG_HIGHMEM=n is violating this assumption and getting away with
> > > > > > > it because kmap_atomic() of contiguous pages "just works (TM)".
> > > > > > > Shouldn't this WARN rather than BUG so that the user can report the
> > > > > > > buggy driver and not have a dead system?
> > > > > >
> > > > > > As opposed to (on a HIGHMEM=y system) silently corrupting data that
> > > > > > is on the next page of memory?
> > > > >
> > > > > Wouldn't it fault in HIGHMEM=y case? I guess not necessarily...
> > > > >
> > > > > > I suppose ideally ...
> > > > > >
> > > > > >         if (WARN_ON(dst_off + len > PAGE_SIZE))
> > > > > >                 len = PAGE_SIZE - dst_off;
> > > > > >         if (WARN_ON(src_off + len > PAGE_SIZE))
> > > > > >                 len = PAGE_SIZE - src_off;
> > > > > >
> > > > > > and then we just truncate the data of the offending caller instead of
> > > > > > corrupting innocent data that happens to be adjacent.  Although that's
> > > > > > not ideal either ... I dunno, what's the least bad poison to drink here?
> > > > >
> > > > > Right, if the driver was relying on "corruption" for correct operation.
> > > > >
> > > > > If corruption actual were happening in practice wouldn't there have
> > > > > been screams by now? Again, not necessarily...
> > > > >
> > > > > At least with just plain WARN the kernel will start screaming on the
> > > > > user's behalf, and if it worked before it will keep working.
> > > >
> > > > So I decided to just sleep on this because I was recently told to not introduce
> > > > new WARN_ON's[1]
> > > >
> > > > I don't think that truncating len is worth the effort.  The conversions being
> > > > done should all 'work'  At least corrupting users data in the same way as it
> > > > used to...  ;-)  I'm ok with adding the WARN_ON's and I have modified the patch
> > > > to do so while I work through the 0-day issues.  (not sure what is going on
> > > > there.)
> > > >
> > > > However, are we ok with adding the WARN_ON's given what Greg KH told me?  This
> > > > is a bit more critical than the PKS API in that it could result in corrupt
> > > > data.
> > >
> > > zero_user_segments contains:
> > >
> > >         BUG_ON(end1 > page_size(page) || end2 > page_size(page));
> > >
> > > These should be consistent.  I think we've demonstrated that there is
> > > no good option here.
> > 
> > True, but these helpers are being deployed to many new locations where
> > they were not used before.
> 
> So what's your preferred poison?
> 
> 1. Corrupt random data in whatever's been mapped into the next page (which
>    is what the helpers currently do)

Please no.

> 2. Copy less data than requested

This sounds like the germination event for a research paper showing that
63% of callers never notice. ;)

> 3. Crash

Useful as a debug tool?

> 4. Something else

Return bytes copied like we do for writes that didn't quite work?

--D

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core
  2020-12-08 22:45                   ` Darrick J. Wong
@ 2020-12-08 22:54                     ` Matthew Wilcox
  2020-12-08 23:40                     ` Dan Williams
  1 sibling, 0 replies; 24+ messages in thread
From: Matthew Wilcox @ 2020-12-08 22:54 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Ira Weiny, Thomas Gleixner, Andrew Morton,
	Dave Hansen, Christoph Hellwig, Al Viro, Eric Biggers,
	Joonas Lahtinen, Linux Kernel Mailing List, linux-fsdevel

On Tue, Dec 08, 2020 at 02:45:55PM -0800, Darrick J. Wong wrote:
> On Tue, Dec 08, 2020 at 10:32:34PM +0000, Matthew Wilcox wrote:
> > On Tue, Dec 08, 2020 at 02:23:10PM -0800, Dan Williams wrote:
> > > On Tue, Dec 8, 2020 at 1:51 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Tue, Dec 08, 2020 at 01:32:55PM -0800, Ira Weiny wrote:
> > > > > On Mon, Dec 07, 2020 at 03:49:55PM -0800, Dan Williams wrote:
> > > > > > On Mon, Dec 7, 2020 at 3:40 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > > >
> > > > > > > On Mon, Dec 07, 2020 at 03:34:44PM -0800, Dan Williams wrote:
> > > > > > > > On Mon, Dec 7, 2020 at 3:27 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Dec 07, 2020 at 02:57:03PM -0800, ira.weiny@intel.com wrote:
> > > > > > > > > > +static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> > > > > > > > > > +                            struct page *src_page, size_t src_off,
> > > > > > > > > > +                            size_t len)
> > > > > > > > > > +{
> > > > > > > > > > +     char *dst = kmap_local_page(dst_page);
> > > > > > > > > > +     char *src = kmap_local_page(src_page);
> > > > > > > > >
> > > > > > > > > I appreciate you've only moved these, but please add:
> > > > > > > > >
> > > > > > > > >         BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);
> > > > > > > >
> > > > > > > > I imagine it's not outside the realm of possibility that some driver
> > > > > > > > on CONFIG_HIGHMEM=n is violating this assumption and getting away with
> > > > > > > > it because kmap_atomic() of contiguous pages "just works (TM)".
> > > > > > > > Shouldn't this WARN rather than BUG so that the user can report the
> > > > > > > > buggy driver and not have a dead system?
> > > > > > >
> > > > > > > As opposed to (on a HIGHMEM=y system) silently corrupting data that
> > > > > > > is on the next page of memory?
> > > > > >
> > > > > > Wouldn't it fault in HIGHMEM=y case? I guess not necessarily...
> > > > > >
> > > > > > > I suppose ideally ...
> > > > > > >
> > > > > > >         if (WARN_ON(dst_off + len > PAGE_SIZE))
> > > > > > >                 len = PAGE_SIZE - dst_off;
> > > > > > >         if (WARN_ON(src_off + len > PAGE_SIZE))
> > > > > > >                 len = PAGE_SIZE - src_off;
> > > > > > >
> > > > > > > and then we just truncate the data of the offending caller instead of
> > > > > > > corrupting innocent data that happens to be adjacent.  Although that's
> > > > > > > not ideal either ... I dunno, what's the least bad poison to drink here?
> > > > > >
> > > > > > Right, if the driver was relying on "corruption" for correct operation.
> > > > > >
> > > > > > If corruption actual were happening in practice wouldn't there have
> > > > > > been screams by now? Again, not necessarily...
> > > > > >
> > > > > > At least with just plain WARN the kernel will start screaming on the
> > > > > > user's behalf, and if it worked before it will keep working.
> > > > >
> > > > > So I decided to just sleep on this because I was recently told to not introduce
> > > > > new WARN_ON's[1]
> > > > >
> > > > > I don't think that truncating len is worth the effort.  The conversions being
> > > > > done should all 'work'  At least corrupting users data in the same way as it
> > > > > used to...  ;-)  I'm ok with adding the WARN_ON's and I have modified the patch
> > > > > to do so while I work through the 0-day issues.  (not sure what is going on
> > > > > there.)
> > > > >
> > > > > However, are we ok with adding the WARN_ON's given what Greg KH told me?  This
> > > > > is a bit more critical than the PKS API in that it could result in corrupt
> > > > > data.
> > > >
> > > > zero_user_segments contains:
> > > >
> > > >         BUG_ON(end1 > page_size(page) || end2 > page_size(page));
> > > >
> > > > These should be consistent.  I think we've demonstrated that there is
> > > > no good option here.
> > > 
> > > True, but these helpers are being deployed to many new locations where
> > > they were not used before.
> > 
> > So what's your preferred poison?
> > 
> > 1. Corrupt random data in whatever's been mapped into the next page (which
> >    is what the helpers currently do)
> 
> Please no.
> 
> > 2. Copy less data than requested
> 
> This sounds like the germination event for a research paper showing that
> 63% of callers never notice. ;)
> 
> > 3. Crash
> 
> Useful as a debug tool?
> 
> > 4. Something else
> 
> Return bytes copied like we do for writes that didn't quite work?

... to learn that 87% of callers never check the return value, 10%
of them do the wrong thing and the remainder have never been tested?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core
  2020-12-08 22:45                   ` Darrick J. Wong
  2020-12-08 22:54                     ` Matthew Wilcox
@ 2020-12-08 23:40                     ` Dan Williams
  2020-12-09  2:22                       ` Ira Weiny
  1 sibling, 1 reply; 24+ messages in thread
From: Dan Williams @ 2020-12-08 23:40 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Matthew Wilcox, Ira Weiny, Thomas Gleixner, Andrew Morton,
	Dave Hansen, Christoph Hellwig, Al Viro, Eric Biggers,
	Joonas Lahtinen, Linux Kernel Mailing List, linux-fsdevel

On Tue, Dec 8, 2020 at 2:49 PM Darrick J. Wong <darrick.wong@oracle.com> wrote:
[..]
> > So what's your preferred poison?
> >
> > 1. Corrupt random data in whatever's been mapped into the next page (which
> >    is what the helpers currently do)
>
> Please no.

My assertion is that the kernel can't know it's corruption, it can
only know that the driver is abusing the API. So over-copy and WARN
seems better than violently regress by crashing what might have been
working silently before.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core
  2020-12-08 23:40                     ` Dan Williams
@ 2020-12-09  2:22                       ` Ira Weiny
  2020-12-09  4:03                         ` Matthew Wilcox
  0 siblings, 1 reply; 24+ messages in thread
From: Ira Weiny @ 2020-12-09  2:22 UTC (permalink / raw)
  To: Dan Williams
  Cc: Darrick J. Wong, Matthew Wilcox, Thomas Gleixner, Andrew Morton,
	Dave Hansen, Christoph Hellwig, Al Viro, Eric Biggers,
	Joonas Lahtinen, Linux Kernel Mailing List, linux-fsdevel

On Tue, Dec 08, 2020 at 03:40:52PM -0800, Dan Williams wrote:
> On Tue, Dec 8, 2020 at 2:49 PM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> [..]
> > > So what's your preferred poison?
> > >
> > > 1. Corrupt random data in whatever's been mapped into the next page (which
> > >    is what the helpers currently do)
> >
> > Please no.
> 
> My assertion is that the kernel can't know it's corruption, it can
> only know that the driver is abusing the API. So over-copy and WARN
> seems better than violently regress by crashing what might have been
> working silently before.

Right now we have a mixed bag.  zero_user() [and it's variants, circa 2008]
does a BUG_ON.[0]  While the other ones do nothing; clear_highpage(),
clear_user_highpage(), copy_user_highpage(), and copy_highpage().

While continuing to audit the code I don't see any users who would violating
the API with a simple conversion of the code.  The calls which I have worked on
[which is many at this point] all have checks in place which are well aware of
page boundaries.

Therefore, I tend to agree with Dan that if anything is to be done it should be
a WARN_ON() which is only going to throw an error that something has probably
been wrong all along and should be fixed but continue running as before.

BUG_ON() is a very big hammer.  And I don't think that Linus is going to
appreciate a BUG_ON here.[1]  Callers of this API should be well aware that
they are operating on a page and that specifying parameters beyond the bounds
of a page are going to have bad consequences...

Furthermore, I'm still leery of adding the WARN_ON's because Greg KH says many
people will be converting them to BUG_ON's via panic-on-warn anyway.  But at
least that is their choice.

FWIW I think this is a 'bad BUG_ON' use because we are "checking something that
we know we might be getting wrong".[1]  And because, "BUG() is only good for
something that never happens and that we really have no other option for".[2]

IMO, These calls are like memcpy/memmove.  memcpy/memmove don't validate bounds
and developers have lived with those constructs for a long time.

Ira

[0] BTW, After writing this email, with various URL research, I think this
BUG_ON() is also probably wrong...

[1] 
	<quote>
	...
	It's [BUG_ON] not a "let's check that
	everybody did things right", it's a "this is a major design rule in
	this core code".
	...
	</quote>
		-- Linus (https://lkml.org/lkml/2016/10/4/337)

[2] https://yarchive.net/comp/linux/BUG.html


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core
  2020-12-09  2:22                       ` Ira Weiny
@ 2020-12-09  4:03                         ` Matthew Wilcox
  2020-12-09 19:47                           ` Dan Williams
  0 siblings, 1 reply; 24+ messages in thread
From: Matthew Wilcox @ 2020-12-09  4:03 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dan Williams, Darrick J. Wong, Thomas Gleixner, Andrew Morton,
	Dave Hansen, Christoph Hellwig, Al Viro, Eric Biggers,
	Joonas Lahtinen, Linux Kernel Mailing List, linux-fsdevel

On Tue, Dec 08, 2020 at 06:22:50PM -0800, Ira Weiny wrote:
> Right now we have a mixed bag.  zero_user() [and it's variants, circa 2008]
> does a BUG_ON.[0]  While the other ones do nothing; clear_highpage(),
> clear_user_highpage(), copy_user_highpage(), and copy_highpage().

Erm, those functions operate on the entire PAGE_SIZE.  There's nothing
for them to check.

> While continuing to audit the code I don't see any users who would violating
> the API with a simple conversion of the code.  The calls which I have worked on
> [which is many at this point] all have checks in place which are well aware of
> page boundaries.

Oh good, then this BUG_ON won't trigger.

> Therefore, I tend to agree with Dan that if anything is to be done it should be
> a WARN_ON() which is only going to throw an error that something has probably
> been wrong all along and should be fixed but continue running as before.

Silent data corruption is for ever.  Are you absolutely sure nobody has
done:

	page = alloc_pages(GFP_HIGHUSER_MOVABLE, 3);
	memcpy_to_page(page, PAGE_SIZE * 2, p, PAGE_SIZE * 2);

because that will work fine if the pages come from ZONE_NORMAL and fail
miserably if they came from ZONE_HIGHMEM.

> FWIW I think this is a 'bad BUG_ON' use because we are "checking something that
> we know we might be getting wrong".[1]  And because, "BUG() is only good for
> something that never happens and that we really have no other option for".[2]

BUG() is our only option here.  Both limiting how much we copy or
copying the requested amount result in data corruption or leaking
information to a process that isn't supposed to see it.

What Linus is railing against is the developers who say "Oh, I don't
know what to do here, I'll just BUG()".  That's not the case here.
We've thought about it.  We've discussed it.  There's NO GOOD OPTION.

Unless you want to do the moral equivalent of this:

http://git.infradead.org/users/willy/pagecache.git/commitdiff/d2417516bd8b3dd1db096a9b040b0264d8052339

I think that would look something like this ...

void memcpy_to_page(struct page *page, size_t offset, const char *from,
			size_t len)
{
	page += offset / PAGE_SIZE;
	offset %= PAGE_SIZE;

	while (len) {
		char *to = kmap_atomic(page);
		size_t bytes = min(len, PAGE_SIZE - offset);
		memcpy(to + offset, from, len);
		kunmap_atomic(to);
		len -= bytes;
		offset = 0;
		page++;
	}
}

Now 32-bit highmem will do the same thing as 64-bit for my example above,
just more slowly.  Untested, obviously.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core
  2020-12-09  4:03                         ` Matthew Wilcox
@ 2020-12-09 19:47                           ` Dan Williams
  2020-12-09 20:14                             ` Matthew Wilcox
  0 siblings, 1 reply; 24+ messages in thread
From: Dan Williams @ 2020-12-09 19:47 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Ira Weiny, Darrick J. Wong, Thomas Gleixner, Andrew Morton,
	Dave Hansen, Christoph Hellwig, Al Viro, Eric Biggers,
	Joonas Lahtinen, Linux Kernel Mailing List, linux-fsdevel

On Tue, Dec 8, 2020 at 8:03 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Dec 08, 2020 at 06:22:50PM -0800, Ira Weiny wrote:
> > Right now we have a mixed bag.  zero_user() [and it's variants, circa 2008]
> > does a BUG_ON.[0]  While the other ones do nothing; clear_highpage(),
> > clear_user_highpage(), copy_user_highpage(), and copy_highpage().
>
> Erm, those functions operate on the entire PAGE_SIZE.  There's nothing
> for them to check.
>
> > While continuing to audit the code I don't see any users who would violating
> > the API with a simple conversion of the code.  The calls which I have worked on
> > [which is many at this point] all have checks in place which are well aware of
> > page boundaries.
>
> Oh good, then this BUG_ON won't trigger.
>
> > Therefore, I tend to agree with Dan that if anything is to be done it should be
> > a WARN_ON() which is only going to throw an error that something has probably
> > been wrong all along and should be fixed but continue running as before.
>
> Silent data corruption is for ever.  Are you absolutely sure nobody has
> done:
>
>         page = alloc_pages(GFP_HIGHUSER_MOVABLE, 3);
>         memcpy_to_page(page, PAGE_SIZE * 2, p, PAGE_SIZE * 2);
>
> because that will work fine if the pages come from ZONE_NORMAL and fail
> miserably if they came from ZONE_HIGHMEM.

...and violently regress with the BUG_ON.

The question to me is: which is more likely that any bad usages have
been covered up by being limited to ZONE_NORMAL / 64-bit only, or that
silent data corruption has been occurring with no ill effects?

> > FWIW I think this is a 'bad BUG_ON' use because we are "checking something that
> > we know we might be getting wrong".[1]  And because, "BUG() is only good for
> > something that never happens and that we really have no other option for".[2]
>
> BUG() is our only option here.  Both limiting how much we copy or
> copying the requested amount result in data corruption or leaking
> information to a process that isn't supposed to see it.

At a minimum I think this should be debated in a follow on patch to
add assertion checking where there was none before. There is no
evidence of a page being overrun in the audit Ira performed.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core
  2020-12-09 19:47                           ` Dan Williams
@ 2020-12-09 20:14                             ` Matthew Wilcox
  2020-12-09 20:19                               ` Dan Williams
  2020-12-10  5:35                               ` Ira Weiny
  0 siblings, 2 replies; 24+ messages in thread
From: Matthew Wilcox @ 2020-12-09 20:14 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, Darrick J. Wong, Thomas Gleixner, Andrew Morton,
	Dave Hansen, Christoph Hellwig, Al Viro, Eric Biggers,
	Joonas Lahtinen, Linux Kernel Mailing List, linux-fsdevel

On Wed, Dec 09, 2020 at 11:47:56AM -0800, Dan Williams wrote:
> On Tue, Dec 8, 2020 at 8:03 PM Matthew Wilcox <willy@infradead.org> wrote:
> > On Tue, Dec 08, 2020 at 06:22:50PM -0800, Ira Weiny wrote:
> > > Therefore, I tend to agree with Dan that if anything is to be done it should be
> > > a WARN_ON() which is only going to throw an error that something has probably
> > > been wrong all along and should be fixed but continue running as before.
> >
> > Silent data corruption is for ever.  Are you absolutely sure nobody has
> > done:
> >
> >         page = alloc_pages(GFP_HIGHUSER_MOVABLE, 3);
> >         memcpy_to_page(page, PAGE_SIZE * 2, p, PAGE_SIZE * 2);
> >
> > because that will work fine if the pages come from ZONE_NORMAL and fail
> > miserably if they came from ZONE_HIGHMEM.
> 
> ...and violently regress with the BUG_ON.

... which is what we want, no?

> The question to me is: which is more likely that any bad usages have
> been covered up by being limited to ZONE_NORMAL / 64-bit only, or that
> silent data corruption has been occurring with no ill effects?

I wouldn't be at all surprised to learn that there is silent data
corruption on 32-bit systems with HIGHMEM.  Would you?  How much testing
do you do on 32-bit HIGHMEM systems?

Actually, I wouldn't be at all surprised if we can hit this problem today.
Look at this:

size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
{
        char *to = addr;
        if (unlikely(iov_iter_is_pipe(i))) {
                WARN_ON(1);
                return 0;
        }
        if (iter_is_iovec(i))
                might_fault();
        iterate_and_advance(i, bytes, v,
                copyin((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
                memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
                                 v.bv_offset, v.bv_len),
                memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
        )

        return bytes;
}
EXPORT_SYMBOL(_copy_from_iter);

There's a lot of macrology in there, so for those following along who
aren't familiar with the iov_iter code, if the iter is operating on a
bvec, then iterate_and_advance() will call memcpy_from_page(), passing
it the bv_page, bv_offset and bv_len stored in the bvec.  Since 2019,
Linux has supported multipage bvecs (commit 07173c3ec276).  So bv_len
absolutely *can* be > PAGE_SIZE.

Does this ever happen in practice?  I have no idea; I don't know whether
any multipage BIOs are currently handed to copy_from_iter().  But I
have no confidence in your audit if you didn't catch this one.

> > > FWIW I think this is a 'bad BUG_ON' use because we are "checking something that
> > > we know we might be getting wrong".[1]  And because, "BUG() is only good for
> > > something that never happens and that we really have no other option for".[2]
> >
> > BUG() is our only option here.  Both limiting how much we copy or
> > copying the requested amount result in data corruption or leaking
> > information to a process that isn't supposed to see it.
> 
> At a minimum I think this should be debated in a follow on patch to
> add assertion checking where there was none before. There is no
> evidence of a page being overrun in the audit Ira performed.

If we put in into a separate patch, someone will suggest backing out the
patch which tells us that there's a problem.  You know, like this guy ...
https://lore.kernel.org/linux-mm/CAPcyv4jNVroYmirzKw_=CsEixOEACdL3M1Wc4xjd_TFv3h+o8Q@mail.gmail.com/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core
  2020-12-09 20:14                             ` Matthew Wilcox
@ 2020-12-09 20:19                               ` Dan Williams
  2020-12-10  5:35                               ` Ira Weiny
  1 sibling, 0 replies; 24+ messages in thread
From: Dan Williams @ 2020-12-09 20:19 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Ira Weiny, Darrick J. Wong, Thomas Gleixner, Andrew Morton,
	Dave Hansen, Christoph Hellwig, Al Viro, Eric Biggers,
	Joonas Lahtinen, Linux Kernel Mailing List, linux-fsdevel

On Wed, Dec 9, 2020 at 12:14 PM Matthew Wilcox <willy@infradead.org> wrote:
[..]
> If we put in into a separate patch, someone will suggest backing out the
> patch which tells us that there's a problem.  You know, like this guy ...
> https://lore.kernel.org/linux-mm/CAPcyv4jNVroYmirzKw_=CsEixOEACdL3M1Wc4xjd_TFv3h+o8Q@mail.gmail.com/

...and that person, like in this case, will be told 'no' and go on to
find / fix the real problem.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core
  2020-12-09 20:14                             ` Matthew Wilcox
  2020-12-09 20:19                               ` Dan Williams
@ 2020-12-10  5:35                               ` Ira Weiny
  1 sibling, 0 replies; 24+ messages in thread
From: Ira Weiny @ 2020-12-10  5:35 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Dan Williams, Darrick J. Wong, Thomas Gleixner, Andrew Morton,
	Dave Hansen, Christoph Hellwig, Al Viro, Eric Biggers,
	Joonas Lahtinen, Linux Kernel Mailing List, linux-fsdevel

On Wed, Dec 09, 2020 at 08:14:15PM +0000, Matthew Wilcox wrote:
> On Wed, Dec 09, 2020 at 11:47:56AM -0800, Dan Williams wrote:
> > On Tue, Dec 8, 2020 at 8:03 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > On Tue, Dec 08, 2020 at 06:22:50PM -0800, Ira Weiny wrote:
> > > > Therefore, I tend to agree with Dan that if anything is to be done it should be
> > > > a WARN_ON() which is only going to throw an error that something has probably
> > > > been wrong all along and should be fixed but continue running as before.
> > >
> > > Silent data corruption is for ever.  Are you absolutely sure nobody has
> > > done:
> > >
> > >         page = alloc_pages(GFP_HIGHUSER_MOVABLE, 3);
> > >         memcpy_to_page(page, PAGE_SIZE * 2, p, PAGE_SIZE * 2);
> > >
> > > because that will work fine if the pages come from ZONE_NORMAL and fail
> > > miserably if they came from ZONE_HIGHMEM.
> > 
> > ...and violently regress with the BUG_ON.
> 
> ... which is what we want, no?
> 
> > The question to me is: which is more likely that any bad usages have
> > been covered up by being limited to ZONE_NORMAL / 64-bit only, or that
> > silent data corruption has been occurring with no ill effects?
> 
> I wouldn't be at all surprised to learn that there is silent data
> corruption on 32-bit systems with HIGHMEM.  Would you?  How much testing
> do you do on 32-bit HIGHMEM systems?
> 
> Actually, I wouldn't be at all surprised if we can hit this problem today.
> Look at this:
> 
> size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
> {
>         char *to = addr;
>         if (unlikely(iov_iter_is_pipe(i))) {
>                 WARN_ON(1);
>                 return 0;
>         }
>         if (iter_is_iovec(i))
>                 might_fault();
>         iterate_and_advance(i, bytes, v,
>                 copyin((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
>                 memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
>                                  v.bv_offset, v.bv_len),
>                 memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
>         )
> 
>         return bytes;
> }
> EXPORT_SYMBOL(_copy_from_iter);
> 
> There's a lot of macrology in there, so for those following along who
> aren't familiar with the iov_iter code, if the iter is operating on a
> bvec, then iterate_and_advance() will call memcpy_from_page(), passing
> it the bv_page, bv_offset and bv_len stored in the bvec.  Since 2019,
> Linux has supported multipage bvecs (commit 07173c3ec276).  So bv_len
> absolutely *can* be > PAGE_SIZE.
> 
> Does this ever happen in practice?  I have no idea; I don't know whether
> any multipage BIOs are currently handed to copy_from_iter().  But I
> have no confidence in your audit if you didn't catch this one.

Ah...  This call site has been there since 2014 and is not a new caller I have
been 'auditing'.[1]

> 
> > > > FWIW I think this is a 'bad BUG_ON' use because we are "checking something that
> > > > we know we might be getting wrong".[1]  And because, "BUG() is only good for
> > > > something that never happens and that we really have no other option for".[2]
> > >
> > > BUG() is our only option here.  Both limiting how much we copy or
> > > copying the requested amount result in data corruption or leaking
> > > information to a process that isn't supposed to see it.
> > 
> > At a minimum I think this should be debated in a follow on patch to
> > add assertion checking where there was none before. There is no
> > evidence of a page being overrun in the audit Ira performed.
> 
> If we put in into a separate patch, someone will suggest backing out the
> patch which tells us that there's a problem.  You know, like this guy ...
> https://lore.kernel.org/linux-mm/CAPcyv4jNVroYmirzKw_=CsEixOEACdL3M1Wc4xjd_TFv3h+o8Q@mail.gmail.com/

I'm not following this.  Regardless I've already added the BUG_ON's.

Ira

[1]
commit 0dbca9a4b5d69a7e4b8c1d55b98312fcd9aafcf7
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Thu Nov 27 14:26:43 2014 -0500

    iov_iter.c: convert copy_from_iter() to iterate_and_advance

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2020-12-10  5:35 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-07 22:57 [PATCH V2 0/2] Lift memcpy_[to|from]_page to core ira.weiny
2020-12-07 22:57 ` [PATCH V2 1/2] mm/highmem: Remove deprecated kmap_atomic ira.weiny
2020-12-07 22:57 ` [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core ira.weiny
2020-12-07 23:26   ` Matthew Wilcox
2020-12-07 23:34     ` Dan Williams
2020-12-07 23:40       ` Matthew Wilcox
2020-12-07 23:49         ` Dan Williams
2020-12-08 21:32           ` Ira Weiny
2020-12-08 21:50             ` Matthew Wilcox
2020-12-08 22:23               ` Dan Williams
2020-12-08 22:32                 ` Matthew Wilcox
2020-12-08 22:45                   ` Darrick J. Wong
2020-12-08 22:54                     ` Matthew Wilcox
2020-12-08 23:40                     ` Dan Williams
2020-12-09  2:22                       ` Ira Weiny
2020-12-09  4:03                         ` Matthew Wilcox
2020-12-09 19:47                           ` Dan Williams
2020-12-09 20:14                             ` Matthew Wilcox
2020-12-09 20:19                               ` Dan Williams
2020-12-10  5:35                               ` Ira Weiny
2020-12-08 22:21             ` Dan Williams
2020-12-08 12:23   ` Matthew Wilcox
2020-12-08 16:38     ` Ira Weiny
2020-12-08 16:40       ` Matthew Wilcox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).