* [PATCH v8 0/4] convert read_kcore(), vread() to use iterators
@ 2023-03-23 10:15 Lorenzo Stoakes
  2023-03-23 10:15 ` [PATCH v8 1/4] fs/proc/kcore: avoid bounce buffer for ktext data Lorenzo Stoakes
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Lorenzo Stoakes @ 2023-03-23 10:15 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linux-fsdevel, Andrew Morton
  Cc: Baoquan He, Uladzislau Rezki, Matthew Wilcox, David Hildenbrand,
	Liu Shixin, Jiri Olsa, Jens Axboe, Alexander Viro,
	Lorenzo Stoakes

While reviewing Baoquan's recent changes to permit vread() access to
vm_map_ram regions of vmalloc allocations, Willy pointed out [1] that it
would be nice to refactor vread() as a whole, since its only user is
read_kcore() and the existing form of vread() necessitates the use of a
bounce buffer.

This patch series does exactly that, as well as adjusting how we read the
kernel text section so that it, too, avoids the use of a bounce buffer.
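
In rough outline (an illustrative sketch, not the exact hunks), the series
replaces a double copy through a bounce buffer:

	vread(buf, (char *)start, tsz);		/* kernel -> bounce buffer */
	if (copy_to_user(buffer, buf, tsz))	/* bounce buffer -> user */
		ret = -EFAULT;

with a single copy directly into the user-backed iterator, retrying after
faulting in pages if necessary (see patch 4/4):

	if (vread_iter(iter, (char *)start, tsz) != tsz)
		/* fault in pages and retry */;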

This has been tested against the test case which motivated Baoquan's
changes in the first place [2], which continues to function correctly, as
do the vmalloc self-tests.

[1] https://lore.kernel.org/all/Y8WfDSRkc%2FOHP3oD@casper.infradead.org/
[2] https://lore.kernel.org/all/87ilk6gos2.fsf@oracle.com/T/#u

v8:
- Make zero_iter() static.

v7:
- Keep trying to fault in memory until the vmalloc read operation
  completes.
https://lore.kernel.org/all/cover.1679511146.git.lstoakes@gmail.com/

v6:
- Correct copy_page_to_iter_nofault() to handle -EFAULT case correctly.
https://lore.kernel.org/all/cover.1679496827.git.lstoakes@gmail.com/

v5:
- Do not rename fpos to ppos in read_kcore_iter() to avoid churn.
- Fix incorrect commit messages after prior revisions altered the approach.
- Replace copy_page_to_iter_atomic() with copy_page_to_iter_nofault() and
  adjust it to be able to handle compound pages. This uses
  copy_to_user_nofault(), which ensures page faults are disabled during the
  copy, something kmap_local_page() was not doing.
- Only try to fault in pages if we are unable to copy in the first place,
  and try only once to avoid any risk of spinning.
- Do not zero memory in aligned_vread_iter() if we couldn't copy it.
- Fix mistake in zeroing missing or unpopulated blocks in
  vmap_ram_vread_iter().
https://lore.kernel.org/linux-mm/cover.1679494218.git.lstoakes@gmail.com/

v4:
- Fixup mistake in email client which orphaned patch emails from the
  cover letter.
https://lore.kernel.org/all/cover.1679431886.git.lstoakes@gmail.com

v3:
- Revert introduction of mutex/rwsem in vmalloc
- Introduce copy_page_to_iter_atomic() iovec function
- Update vread_iter() and descendant functions to use only this
- Fault in user pages before calling vread_iter()
- Use const char* in vread_iter() and descendant functions
- Updated commit messages based on feedback
- Extend vread functions to always check how many bytes we could copy. If
  at any stage we are unable to copy/zero, abort and return the number of
  bytes we did copy.
https://lore.kernel.org/all/cover.1679354384.git.lstoakes@gmail.com/

v2:
- Fix ordering of vread_iter() parameters
- Fix nommu vread() -> vread_iter()
https://lore.kernel.org/all/cover.1679209395.git.lstoakes@gmail.com/

v1:
https://lore.kernel.org/all/cover.1679183626.git.lstoakes@gmail.com/

Lorenzo Stoakes (4):
  fs/proc/kcore: avoid bounce buffer for ktext data
  fs/proc/kcore: convert read_kcore() to read_kcore_iter()
  iov_iter: add copy_page_to_iter_nofault()
  mm: vmalloc: convert vread() to vread_iter()

 fs/proc/kcore.c         |  85 +++++++--------
 include/linux/uio.h     |   2 +
 include/linux/vmalloc.h |   3 +-
 lib/iov_iter.c          |  48 +++++++++
 mm/nommu.c              |  10 +-
 mm/vmalloc.c            | 234 +++++++++++++++++++++++++---------------
 6 files changed, 243 insertions(+), 139 deletions(-)

--
2.39.2


* [PATCH v8 1/4] fs/proc/kcore: avoid bounce buffer for ktext data
  2023-03-23 10:15 [PATCH v8 0/4] convert read_kcore(), vread() to use iterators Lorenzo Stoakes
@ 2023-03-23 10:15 ` Lorenzo Stoakes
  2023-05-31 11:58   ` Jiri Olsa
  2023-03-23 10:15 ` [PATCH v8 2/4] fs/proc/kcore: convert read_kcore() to read_kcore_iter() Lorenzo Stoakes
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Lorenzo Stoakes @ 2023-03-23 10:15 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linux-fsdevel, Andrew Morton
  Cc: Baoquan He, Uladzislau Rezki, Matthew Wilcox, David Hildenbrand,
	Liu Shixin, Jiri Olsa, Jens Axboe, Alexander Viro,
	Lorenzo Stoakes

Commit df04abfd181a ("fs/proc/kcore.c: Add bounce buffer for ktext data")
introduced the use of a bounce buffer to retrieve kernel text data for
/proc/kcore in order to avoid failures arising from hardened user copies
enabled by CONFIG_HARDENED_USERCOPY in check_kernel_text_object().

We can avoid doing this if, instead of copy_to_user(), we use _copy_to_user(),
which bypasses the hardening check. This is more efficient than using a
bounce buffer and simplifies the code.

We do so as part of an overall effort to eliminate bounce buffer usage in the
function, with an eye to converting it to an iterator read.
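
For context, the uaccess layering looks roughly like this (a simplified
sketch of include/linux/uaccess.h, not the verbatim code):

	static __always_inline unsigned long
	copy_to_user(void __user *to, const void *from, unsigned long n)
	{
		/* hardened usercopy check; rejects kernel text objects */
		if (check_copy_size(from, n, true))
			n = _copy_to_user(to, from, n); /* raw copy, no check */
		return n;
	}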

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
 fs/proc/kcore.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 71157ee35c1a..556f310d6aa4 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -541,19 +541,12 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		case KCORE_VMEMMAP:
 		case KCORE_TEXT:
 			/*
-			 * Using bounce buffer to bypass the
-			 * hardened user copy kernel text checks.
+			 * We use _copy_to_user() to bypass usermode hardening
+			 * which would otherwise prevent this operation.
 			 */
-			if (copy_from_kernel_nofault(buf, (void *)start, tsz)) {
-				if (clear_user(buffer, tsz)) {
-					ret = -EFAULT;
-					goto out;
-				}
-			} else {
-				if (copy_to_user(buffer, buf, tsz)) {
-					ret = -EFAULT;
-					goto out;
-				}
+			if (_copy_to_user(buffer, (char *)start, tsz)) {
+				ret = -EFAULT;
+				goto out;
 			}
 			break;
 		default:
-- 
2.39.2



* [PATCH v8 2/4] fs/proc/kcore: convert read_kcore() to read_kcore_iter()
  2023-03-23 10:15 [PATCH v8 0/4] convert read_kcore(), vread() to use iterators Lorenzo Stoakes
  2023-03-23 10:15 ` [PATCH v8 1/4] fs/proc/kcore: avoid bounce buffer for ktext data Lorenzo Stoakes
@ 2023-03-23 10:15 ` Lorenzo Stoakes
  2023-03-23 10:15 ` [PATCH v8 3/4] iov_iter: add copy_page_to_iter_nofault() Lorenzo Stoakes
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Lorenzo Stoakes @ 2023-03-23 10:15 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linux-fsdevel, Andrew Morton
  Cc: Baoquan He, Uladzislau Rezki, Matthew Wilcox, David Hildenbrand,
	Liu Shixin, Jiri Olsa, Jens Axboe, Alexander Viro,
	Lorenzo Stoakes

For the time being we still use a bounce buffer for vread(); however, in the
next patch we will convert this to interact directly with the iterator and
eliminate the bounce buffer altogether.
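
For reference, the iterator-based read pattern being adopted looks roughly
as follows (a minimal sketch; src stands in for a hypothetical kernel
buffer):

	static ssize_t sketch_read_iter(struct kiocb *iocb, struct iov_iter *iter)
	{
		size_t tsz = min_t(size_t, iov_iter_count(iter), PAGE_SIZE);

		/* copy_to_iter() returns bytes copied; a short copy is a fault */
		if (copy_to_iter(src, tsz, iter) != tsz)
			return -EFAULT;

		iocb->ki_pos += tsz;	/* the iterator tracks the user buffer */
		return tsz;
	}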

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
 fs/proc/kcore.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 556f310d6aa4..08b795fd80b4 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -24,7 +24,7 @@
 #include <linux/memblock.h>
 #include <linux/init.h>
 #include <linux/slab.h>
-#include <linux/uaccess.h>
+#include <linux/uio.h>
 #include <asm/io.h>
 #include <linux/list.h>
 #include <linux/ioport.h>
@@ -308,9 +308,12 @@ static void append_kcore_note(char *notes, size_t *i, const char *name,
 }
 
 static ssize_t
-read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
+read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter)
 {
+	struct file *file = iocb->ki_filp;
 	char *buf = file->private_data;
+	loff_t *fpos = &iocb->ki_pos;
+
 	size_t phdrs_offset, notes_offset, data_offset;
 	size_t page_offline_frozen = 1;
 	size_t phdrs_len, notes_len;
@@ -318,6 +321,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 	size_t tsz;
 	int nphdr;
 	unsigned long start;
+	size_t buflen = iov_iter_count(iter);
 	size_t orig_buflen = buflen;
 	int ret = 0;
 
@@ -356,12 +360,11 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		};
 
 		tsz = min_t(size_t, buflen, sizeof(struct elfhdr) - *fpos);
-		if (copy_to_user(buffer, (char *)&ehdr + *fpos, tsz)) {
+		if (copy_to_iter((char *)&ehdr + *fpos, tsz, iter) != tsz) {
 			ret = -EFAULT;
 			goto out;
 		}
 
-		buffer += tsz;
 		buflen -= tsz;
 		*fpos += tsz;
 	}
@@ -398,15 +401,14 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		}
 
 		tsz = min_t(size_t, buflen, phdrs_offset + phdrs_len - *fpos);
-		if (copy_to_user(buffer, (char *)phdrs + *fpos - phdrs_offset,
-				 tsz)) {
+		if (copy_to_iter((char *)phdrs + *fpos - phdrs_offset, tsz,
+				 iter) != tsz) {
 			kfree(phdrs);
 			ret = -EFAULT;
 			goto out;
 		}
 		kfree(phdrs);
 
-		buffer += tsz;
 		buflen -= tsz;
 		*fpos += tsz;
 	}
@@ -448,14 +450,13 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 				  min(vmcoreinfo_size, notes_len - i));
 
 		tsz = min_t(size_t, buflen, notes_offset + notes_len - *fpos);
-		if (copy_to_user(buffer, notes + *fpos - notes_offset, tsz)) {
+		if (copy_to_iter(notes + *fpos - notes_offset, tsz, iter) != tsz) {
 			kfree(notes);
 			ret = -EFAULT;
 			goto out;
 		}
 		kfree(notes);
 
-		buffer += tsz;
 		buflen -= tsz;
 		*fpos += tsz;
 	}
@@ -497,7 +498,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		}
 
 		if (!m) {
-			if (clear_user(buffer, tsz)) {
+			if (iov_iter_zero(tsz, iter) != tsz) {
 				ret = -EFAULT;
 				goto out;
 			}
@@ -508,14 +509,14 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		case KCORE_VMALLOC:
 			vread(buf, (char *)start, tsz);
 			/* we have to zero-fill user buffer even if no read */
-			if (copy_to_user(buffer, buf, tsz)) {
+			if (copy_to_iter(buf, tsz, iter) != tsz) {
 				ret = -EFAULT;
 				goto out;
 			}
 			break;
 		case KCORE_USER:
 			/* User page is handled prior to normal kernel page: */
-			if (copy_to_user(buffer, (char *)start, tsz)) {
+			if (copy_to_iter((char *)start, tsz, iter) != tsz) {
 				ret = -EFAULT;
 				goto out;
 			}
@@ -531,7 +532,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 			 */
 			if (!page || PageOffline(page) ||
 			    is_page_hwpoison(page) || !pfn_is_ram(pfn)) {
-				if (clear_user(buffer, tsz)) {
+				if (iov_iter_zero(tsz, iter) != tsz) {
 					ret = -EFAULT;
 					goto out;
 				}
@@ -541,17 +542,17 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		case KCORE_VMEMMAP:
 		case KCORE_TEXT:
 			/*
-			 * We use _copy_to_user() to bypass usermode hardening
+			 * We use _copy_to_iter() to bypass usermode hardening
 			 * which would otherwise prevent this operation.
 			 */
-			if (_copy_to_user(buffer, (char *)start, tsz)) {
+			if (_copy_to_iter((char *)start, tsz, iter) != tsz) {
 				ret = -EFAULT;
 				goto out;
 			}
 			break;
 		default:
 			pr_warn_once("Unhandled KCORE type: %d\n", m->type);
-			if (clear_user(buffer, tsz)) {
+			if (iov_iter_zero(tsz, iter) != tsz) {
 				ret = -EFAULT;
 				goto out;
 			}
@@ -559,7 +560,6 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 skip:
 		buflen -= tsz;
 		*fpos += tsz;
-		buffer += tsz;
 		start += tsz;
 		tsz = (buflen > PAGE_SIZE ? PAGE_SIZE : buflen);
 	}
@@ -603,7 +603,7 @@ static int release_kcore(struct inode *inode, struct file *file)
 }
 
 static const struct proc_ops kcore_proc_ops = {
-	.proc_read	= read_kcore,
+	.proc_read_iter	= read_kcore_iter,
 	.proc_open	= open_kcore,
 	.proc_release	= release_kcore,
 	.proc_lseek	= default_llseek,
-- 
2.39.2



* [PATCH v8 3/4] iov_iter: add copy_page_to_iter_nofault()
  2023-03-23 10:15 [PATCH v8 0/4] convert read_kcore(), vread() to use iterators Lorenzo Stoakes
  2023-03-23 10:15 ` [PATCH v8 1/4] fs/proc/kcore: avoid bounce buffer for ktext data Lorenzo Stoakes
  2023-03-23 10:15 ` [PATCH v8 2/4] fs/proc/kcore: convert read_kcore() to read_kcore_iter() Lorenzo Stoakes
@ 2023-03-23 10:15 ` Lorenzo Stoakes
  2023-03-23 10:15 ` [PATCH v8 4/4] mm: vmalloc: convert vread() to vread_iter() Lorenzo Stoakes
  2023-03-29  4:53 ` [PATCH v8 0/4] convert read_kcore(), vread() to use iterators Baoquan He
  4 siblings, 0 replies; 21+ messages in thread
From: Lorenzo Stoakes @ 2023-03-23 10:15 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linux-fsdevel, Andrew Morton
  Cc: Baoquan He, Uladzislau Rezki, Matthew Wilcox, David Hildenbrand,
	Liu Shixin, Jiri Olsa, Jens Axboe, Alexander Viro,
	Lorenzo Stoakes

Provide a means to copy a page to user space from an iterator, aborting if
a page fault would occur. This supports compound pages, but may be passed a
tail page with an offset extending further into the compound page, so we
cannot pass a folio.

This allows the function to be called from atomic context, where it will
_try_ to copy to user pages if they are faulted in, aborting if not.

The function does not use _copy_to_iter() in order to avoid specifying
might_fault(); this is similar to copy_page_from_iter_atomic().

This is being added so that an iterator-based form of vread() can be
implemented while holding spinlocks.
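
A hypothetical caller might use it along these lines (a sketch of the
intended pattern, not code from this series):

	size_t copied;

	spin_lock(&lock);	/* atomic context: page faults must not occur */
	copied = copy_page_to_iter_nofault(page, offset, len, iter);
	spin_unlock(&lock);

	/* on a short copy, fault the pages in outside the lock and retry */
	if (copied != len && fault_in_iov_iter_writeable(iter, len - copied))
		return -EFAULT;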

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
---
 include/linux/uio.h |  2 ++
 lib/iov_iter.c      | 48 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 27e3fd942960..29eb18bb6feb 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -173,6 +173,8 @@ static inline size_t copy_folio_to_iter(struct folio *folio, size_t offset,
 {
 	return copy_page_to_iter(&folio->page, offset, bytes, i);
 }
+size_t copy_page_to_iter_nofault(struct page *page, unsigned offset,
+				 size_t bytes, struct iov_iter *i);
 
 static __always_inline __must_check
 size_t copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 274014e4eafe..34dd6bdf2fba 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -172,6 +172,18 @@ static int copyout(void __user *to, const void *from, size_t n)
 	return n;
 }
 
+static int copyout_nofault(void __user *to, const void *from, size_t n)
+{
+	long res;
+
+	if (should_fail_usercopy())
+		return n;
+
+	res = copy_to_user_nofault(to, from, n);
+
+	return res < 0 ? n : res;
+}
+
 static int copyin(void *to, const void __user *from, size_t n)
 {
 	size_t res = n;
@@ -734,6 +746,42 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
 }
 EXPORT_SYMBOL(copy_page_to_iter);
 
+size_t copy_page_to_iter_nofault(struct page *page, unsigned offset, size_t bytes,
+				 struct iov_iter *i)
+{
+	size_t res = 0;
+
+	if (!page_copy_sane(page, offset, bytes))
+		return 0;
+	if (WARN_ON_ONCE(i->data_source))
+		return 0;
+	if (unlikely(iov_iter_is_pipe(i)))
+		return copy_page_to_iter_pipe(page, offset, bytes, i);
+	page += offset / PAGE_SIZE; // first subpage
+	offset %= PAGE_SIZE;
+	while (1) {
+		void *kaddr = kmap_local_page(page);
+		size_t n = min(bytes, (size_t)PAGE_SIZE - offset);
+
+		iterate_and_advance(i, n, base, len, off,
+			copyout_nofault(base, kaddr + offset + off, len),
+			memcpy(base, kaddr + offset + off, len)
+		)
+		kunmap_local(kaddr);
+		res += n;
+		bytes -= n;
+		if (!bytes || !n)
+			break;
+		offset += n;
+		if (offset == PAGE_SIZE) {
+			page++;
+			offset = 0;
+		}
+	}
+	return res;
+}
+EXPORT_SYMBOL(copy_page_to_iter_nofault);
+
 size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
 			 struct iov_iter *i)
 {
-- 
2.39.2



* [PATCH v8 4/4] mm: vmalloc: convert vread() to vread_iter()
  2023-03-23 10:15 [PATCH v8 0/4] convert read_kcore(), vread() to use iterators Lorenzo Stoakes
                   ` (2 preceding siblings ...)
  2023-03-23 10:15 ` [PATCH v8 3/4] iov_iter: add copy_page_to_iter_nofault() Lorenzo Stoakes
@ 2023-03-23 10:15 ` Lorenzo Stoakes
  2023-03-29  4:53 ` [PATCH v8 0/4] convert read_kcore(), vread() to use iterators Baoquan He
  4 siblings, 0 replies; 21+ messages in thread
From: Lorenzo Stoakes @ 2023-03-23 10:15 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linux-fsdevel, Andrew Morton
  Cc: Baoquan He, Uladzislau Rezki, Matthew Wilcox, David Hildenbrand,
	Liu Shixin, Jiri Olsa, Jens Axboe, Alexander Viro,
	Lorenzo Stoakes

Having previously laid the foundation for converting vread() to an iterator
function, pull the trigger and do so.

This patch attempts to provide minimal refactoring and to reflect the
existing logic as closely as possible; for example, we continue to zero
portions of memory not read, as before.

Overall, there should be no functional difference other than a performance
improvement in /proc/kcore access to vmalloc regions.

Now that we have eliminated the need for a bounce buffer in
read_kcore_iter(), we dispense with it, and try to write to user memory
optimistically but with faults disabled via copy_page_to_iter_nofault().
We already have preemption disabled by holding a spin lock. We continue
faulting in pages until the operation is complete.

Additionally, we must account for the fact that at any point a copy may
fail (most likely due to a fault not being able to occur); in that case we
exit, indicating fewer bytes retrieved than expected.
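
The accounting pattern used throughout the conversion is roughly as
follows (an illustrative sketch; do_copy_chunk() is a hypothetical
stand-in for the various copy/zero helpers):

	size_t remains = count;

	while (remains > 0) {
		size_t n = min_t(size_t, remains, PAGE_SIZE);
		size_t copied = do_copy_chunk(iter, addr, n);

		addr += copied;
		remains -= copied;

		if (copied != n)
			break;	/* couldn't copy/zero: report partial progress */
	}

	return count - remains;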

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
---
 fs/proc/kcore.c         |  44 ++++----
 include/linux/vmalloc.h |   3 +-
 mm/nommu.c              |  10 +-
 mm/vmalloc.c            | 234 +++++++++++++++++++++++++---------------
 4 files changed, 176 insertions(+), 115 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 08b795fd80b4..25b44b303b35 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -307,13 +307,9 @@ static void append_kcore_note(char *notes, size_t *i, const char *name,
 	*i = ALIGN(*i + descsz, 4);
 }
 
-static ssize_t
-read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter)
+static ssize_t read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter)
 {
-	struct file *file = iocb->ki_filp;
-	char *buf = file->private_data;
 	loff_t *fpos = &iocb->ki_pos;
-
 	size_t phdrs_offset, notes_offset, data_offset;
 	size_t page_offline_frozen = 1;
 	size_t phdrs_len, notes_len;
@@ -507,13 +503,30 @@ read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter)
 
 		switch (m->type) {
 		case KCORE_VMALLOC:
-			vread(buf, (char *)start, tsz);
-			/* we have to zero-fill user buffer even if no read */
-			if (copy_to_iter(buf, tsz, iter) != tsz) {
-				ret = -EFAULT;
-				goto out;
+		{
+			const char *src = (char *)start;
+			size_t read = 0, left = tsz;
+
+			/*
+			 * vmalloc uses spinlocks, so we optimistically try to
+			 * read memory. If this fails, fault pages in and try
+			 * again until we are done.
+			 */
+			while (true) {
+				read += vread_iter(iter, src, left);
+				if (read == tsz)
+					break;
+
+				src += read;
+				left -= read;
+
+				if (fault_in_iov_iter_writeable(iter, left)) {
+					ret = -EFAULT;
+					goto out;
+				}
 			}
 			break;
+		}
 		case KCORE_USER:
 			/* User page is handled prior to normal kernel page: */
 			if (copy_to_iter((char *)start, tsz, iter) != tsz) {
@@ -582,10 +595,6 @@ static int open_kcore(struct inode *inode, struct file *filp)
 	if (ret)
 		return ret;
 
-	filp->private_data = kmalloc(PAGE_SIZE, GFP_KERNEL);
-	if (!filp->private_data)
-		return -ENOMEM;
-
 	if (kcore_need_update)
 		kcore_update_ram();
 	if (i_size_read(inode) != proc_root_kcore->size) {
@@ -596,16 +605,9 @@ static int open_kcore(struct inode *inode, struct file *filp)
 	return 0;
 }
 
-static int release_kcore(struct inode *inode, struct file *file)
-{
-	kfree(file->private_data);
-	return 0;
-}
-
 static const struct proc_ops kcore_proc_ops = {
 	.proc_read_iter	= read_kcore_iter,
 	.proc_open	= open_kcore,
-	.proc_release	= release_kcore,
 	.proc_lseek	= default_llseek,
 };
 
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 69250efa03d1..461aa5637f65 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -9,6 +9,7 @@
 #include <asm/page.h>		/* pgprot_t */
 #include <linux/rbtree.h>
 #include <linux/overflow.h>
+#include <linux/uio.h>
 
 #include <asm/vmalloc.h>
 
@@ -251,7 +252,7 @@ static inline void set_vm_flush_reset_perms(void *addr)
 #endif
 
 /* for /proc/kcore */
-extern long vread(char *buf, char *addr, unsigned long count);
+extern long vread_iter(struct iov_iter *iter, const char *addr, size_t count);
 
 /*
  *	Internals.  Don't use..
diff --git a/mm/nommu.c b/mm/nommu.c
index 57ba243c6a37..f670d9979a26 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -36,6 +36,7 @@
 #include <linux/printk.h>
 
 #include <linux/uaccess.h>
+#include <linux/uio.h>
 #include <asm/tlb.h>
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
@@ -198,14 +199,13 @@ unsigned long vmalloc_to_pfn(const void *addr)
 }
 EXPORT_SYMBOL(vmalloc_to_pfn);
 
-long vread(char *buf, char *addr, unsigned long count)
+long vread_iter(struct iov_iter *iter, const char *addr, size_t count)
 {
 	/* Don't allow overflow */
-	if ((unsigned long) buf + count < count)
-		count = -(unsigned long) buf;
+	if ((unsigned long) addr + count < count)
+		count = -(unsigned long) addr;
 
-	memcpy(buf, addr, count);
-	return count;
+	return copy_to_iter(addr, count, iter);
 }
 
 /*
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 978194dc2bb8..2aaa9382605c 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -37,7 +37,6 @@
 #include <linux/rbtree_augmented.h>
 #include <linux/overflow.h>
 #include <linux/pgtable.h>
-#include <linux/uaccess.h>
 #include <linux/hugetlb.h>
 #include <linux/sched/mm.h>
 #include <asm/tlbflush.h>
@@ -3442,62 +3441,96 @@ void *vmalloc_32_user(unsigned long size)
 EXPORT_SYMBOL(vmalloc_32_user);
 
 /*
- * small helper routine , copy contents to buf from addr.
- * If the page is not present, fill zero.
+ * Atomically zero bytes in the iterator.
+ *
+ * Returns the number of zeroed bytes.
  */
+static size_t zero_iter(struct iov_iter *iter, size_t count)
+{
+	size_t remains = count;
+
+	while (remains > 0) {
+		size_t num, copied;
+
+		num = remains < PAGE_SIZE ? remains : PAGE_SIZE;
+		copied = copy_page_to_iter_nofault(ZERO_PAGE(0), 0, num, iter);
+		remains -= copied;
+
+		if (copied < num)
+			break;
+	}
 
-static int aligned_vread(char *buf, char *addr, unsigned long count)
+	return count - remains;
+}
+
+/*
+ * small helper routine, copy contents to iter from addr.
+ * If the page is not present, fill zero.
+ *
+ * Returns the number of copied bytes.
+ */
+static size_t aligned_vread_iter(struct iov_iter *iter,
+				 const char *addr, size_t count)
 {
-	struct page *p;
-	int copied = 0;
+	size_t remains = count;
+	struct page *page;
 
-	while (count) {
+	while (remains > 0) {
 		unsigned long offset, length;
+		size_t copied = 0;
 
 		offset = offset_in_page(addr);
 		length = PAGE_SIZE - offset;
-		if (length > count)
-			length = count;
-		p = vmalloc_to_page(addr);
+		if (length > remains)
+			length = remains;
+		page = vmalloc_to_page(addr);
 		/*
-		 * To do safe access to this _mapped_ area, we need
-		 * lock. But adding lock here means that we need to add
-		 * overhead of vmalloc()/vfree() calls for this _debug_
-		 * interface, rarely used. Instead of that, we'll use
-		 * kmap() and get small overhead in this access function.
+		 * To do safe access to this _mapped_ area, we need lock. But
+		 * adding lock here means that we need to add overhead of
+		 * vmalloc()/vfree() calls for this _debug_ interface, rarely
+		 * used. Instead of that, we'll use a local mapping via
+		 * copy_page_to_iter_nofault() and accept a small overhead in
+		 * this access function.
 		 */
-		if (p) {
-			/* We can expect USER0 is not used -- see vread() */
-			void *map = kmap_atomic(p);
-			memcpy(buf, map + offset, length);
-			kunmap_atomic(map);
-		} else
-			memset(buf, 0, length);
+		if (page)
+			copied = copy_page_to_iter_nofault(page, offset,
+							   length, iter);
+		else
+			copied = zero_iter(iter, length);
 
-		addr += length;
-		buf += length;
-		copied += length;
-		count -= length;
+		addr += copied;
+		remains -= copied;
+
+		if (copied != length)
+			break;
 	}
-	return copied;
+
+	return count - remains;
 }
 
-static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags)
+/*
+ * Read from a vm_map_ram region of memory.
+ *
+ * Returns the number of copied bytes.
+ */
+static size_t vmap_ram_vread_iter(struct iov_iter *iter, const char *addr,
+				  size_t count, unsigned long flags)
 {
 	char *start;
 	struct vmap_block *vb;
 	unsigned long offset;
-	unsigned int rs, re, n;
+	unsigned int rs, re;
+	size_t remains, n;
 
 	/*
 	 * If it's area created by vm_map_ram() interface directly, but
 	 * not further subdividing and delegating management to vmap_block,
 	 * handle it here.
 	 */
-	if (!(flags & VMAP_BLOCK)) {
-		aligned_vread(buf, addr, count);
-		return;
-	}
+	if (!(flags & VMAP_BLOCK))
+		return aligned_vread_iter(iter, addr, count);
+
+	remains = count;
 
 	/*
 	 * Area is split into regions and tracked with vmap_block, read out
@@ -3505,50 +3538,64 @@ static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags
 	 */
 	vb = xa_load(&vmap_blocks, addr_to_vb_idx((unsigned long)addr));
 	if (!vb)
-		goto finished;
+		goto finished_zero;
 
 	spin_lock(&vb->lock);
 	if (bitmap_empty(vb->used_map, VMAP_BBMAP_BITS)) {
 		spin_unlock(&vb->lock);
-		goto finished;
+		goto finished_zero;
 	}
+
 	for_each_set_bitrange(rs, re, vb->used_map, VMAP_BBMAP_BITS) {
-		if (!count)
-			break;
+		size_t copied;
+
+		if (remains == 0)
+			goto finished;
+
 		start = vmap_block_vaddr(vb->va->va_start, rs);
-		while (addr < start) {
-			if (count == 0)
-				goto unlock;
-			*buf = '\0';
-			buf++;
-			addr++;
-			count--;
+
+		if (addr < start) {
+			size_t to_zero = min_t(size_t, start - addr, remains);
+			size_t zeroed = zero_iter(iter, to_zero);
+
+			addr += zeroed;
+			remains -= zeroed;
+
+			if (remains == 0 || zeroed != to_zero)
+				goto finished;
 		}
+
 		/*it could start reading from the middle of used region*/
 		offset = offset_in_page(addr);
 		n = ((re - rs + 1) << PAGE_SHIFT) - offset;
-		if (n > count)
-			n = count;
-		aligned_vread(buf, start+offset, n);
+		if (n > remains)
+			n = remains;
+
+		copied = aligned_vread_iter(iter, start + offset, n);
 
-		buf += n;
-		addr += n;
-		count -= n;
+		addr += copied;
+		remains -= copied;
+
+		if (copied != n)
+			goto finished;
 	}
-unlock:
+
 	spin_unlock(&vb->lock);
 
-finished:
+finished_zero:
 	/* zero-fill the left dirty or free regions */
-	if (count)
-		memset(buf, 0, count);
+	return count - remains + zero_iter(iter, remains);
+finished:
+	/* We couldn't copy/zero everything */
+	spin_unlock(&vb->lock);
+	return count - remains;
 }
 
 /**
- * vread() - read vmalloc area in a safe way.
- * @buf:     buffer for reading data
- * @addr:    vm address.
- * @count:   number of bytes to be read.
+ * vread_iter() - read vmalloc area in a safe way to an iterator.
+ * @iter:         the iterator to which data should be written.
+ * @addr:         vm address.
+ * @count:        number of bytes to be read.
  *
  * This function checks that addr is a valid vmalloc'ed area, and
  * copy data from that area to a given buffer. If the given memory range
@@ -3568,13 +3615,12 @@ static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags
  * (same number as @count) or %0 if [addr...addr+count) doesn't
  * include any intersection with valid vmalloc area
  */
-long vread(char *buf, char *addr, unsigned long count)
+long vread_iter(struct iov_iter *iter, const char *addr, size_t count)
 {
 	struct vmap_area *va;
 	struct vm_struct *vm;
-	char *vaddr, *buf_start = buf;
-	unsigned long buflen = count;
-	unsigned long n, size, flags;
+	char *vaddr;
+	size_t n, size, flags, remains;
 
 	addr = kasan_reset_tag(addr);
 
@@ -3582,18 +3628,22 @@ long vread(char *buf, char *addr, unsigned long count)
 	if ((unsigned long) addr + count < count)
 		count = -(unsigned long) addr;
 
+	remains = count;
+
 	spin_lock(&vmap_area_lock);
 	va = find_vmap_area_exceed_addr((unsigned long)addr);
 	if (!va)
-		goto finished;
+		goto finished_zero;
 
 	/* no intersects with alive vmap_area */
-	if ((unsigned long)addr + count <= va->va_start)
-		goto finished;
+	if ((unsigned long)addr + remains <= va->va_start)
+		goto finished_zero;
 
 	list_for_each_entry_from(va, &vmap_area_list, list) {
-		if (!count)
-			break;
+		size_t copied;
+
+		if (remains == 0)
+			goto finished;
 
 		vm = va->vm;
 		flags = va->flags & VMAP_FLAGS_MASK;
@@ -3608,6 +3658,7 @@ long vread(char *buf, char *addr, unsigned long count)
 
 		if (vm && (vm->flags & VM_UNINITIALIZED))
 			continue;
+
 		/* Pair with smp_wmb() in clear_vm_uninitialized_flag() */
 		smp_rmb();
 
@@ -3616,38 +3667,45 @@ long vread(char *buf, char *addr, unsigned long count)
 
 		if (addr >= vaddr + size)
 			continue;
-		while (addr < vaddr) {
-			if (count == 0)
+
+		if (addr < vaddr) {
+			size_t to_zero = min_t(size_t, vaddr - addr, remains);
+			size_t zeroed = zero_iter(iter, to_zero);
+
+			addr += zeroed;
+			remains -= zeroed;
+
+			if (remains == 0 || zeroed != to_zero)
 				goto finished;
-			*buf = '\0';
-			buf++;
-			addr++;
-			count--;
 		}
+
 		n = vaddr + size - addr;
-		if (n > count)
-			n = count;
+		if (n > remains)
+			n = remains;
 
 		if (flags & VMAP_RAM)
-			vmap_ram_vread(buf, addr, n, flags);
+			copied = vmap_ram_vread_iter(iter, addr, n, flags);
 		else if (!(vm->flags & VM_IOREMAP))
-			aligned_vread(buf, addr, n);
+			copied = aligned_vread_iter(iter, addr, n);
 		else /* IOREMAP area is treated as memory hole */
-			memset(buf, 0, n);
-		buf += n;
-		addr += n;
-		count -= n;
+			copied = zero_iter(iter, n);
+
+		addr += copied;
+		remains -= copied;
+
+		if (copied != n)
+			goto finished;
 	}
-finished:
-	spin_unlock(&vmap_area_lock);
 
-	if (buf == buf_start)
-		return 0;
+finished_zero:
+	spin_unlock(&vmap_area_lock);
 	/* zero-fill memory holes */
-	if (buf != buf_start + buflen)
-		memset(buf, 0, buflen - (buf - buf_start));
+	return count - remains + zero_iter(iter, remains);
+finished:
+	/* Nothing remains, or we couldn't copy/zero everything. */
+	spin_unlock(&vmap_area_lock);
 
-	return buflen;
+	return count - remains;
 }
 
 /**
-- 
2.39.2



* Re: [PATCH v8 0/4] convert read_kcore(), vread() to use iterators
  2023-03-23 10:15 [PATCH v8 0/4] convert read_kcore(), vread() to use iterators Lorenzo Stoakes
                   ` (3 preceding siblings ...)
  2023-03-23 10:15 ` [PATCH v8 4/4] mm: vmalloc: convert vread() to vread_iter() Lorenzo Stoakes
@ 2023-03-29  4:53 ` Baoquan He
  4 siblings, 0 replies; 21+ messages in thread
From: Baoquan He @ 2023-03-29  4:53 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-mm, linux-kernel, linux-fsdevel, Andrew Morton,
	Uladzislau Rezki, Matthew Wilcox, David Hildenbrand, Liu Shixin,
	Jiri Olsa, Jens Axboe, Alexander Viro

On 03/23/23 at 10:15am, Lorenzo Stoakes wrote:
> While reviewing Baoquan's recent changes to permit vread() access to
> vm_map_ram regions of vmalloc allocations, Willy pointed out [1] that it
> would be nice to refactor vread() as a whole, since its only user is
> read_kcore() and the existing form of vread() necessitates the use of a
> bounce buffer.
> 
> This patch series does exactly that, as well as adjusting how we read the
> kernel text section so that it, too, avoids the use of a bounce buffer.
> 
> This has been tested against the test case which motivated Baoquan's
> changes in the first place [2], which continues to function correctly, as
> do the vmalloc self-tests.
> 
> [1] https://lore.kernel.org/all/Y8WfDSRkc%2FOHP3oD@casper.infradead.org/
> [2] https://lore.kernel.org/all/87ilk6gos2.fsf@oracle.com/T/#u

The whole series looks good to me.

Reviewed-by: Baoquan He <bhe@redhat.com>



* Re: [PATCH v8 1/4] fs/proc/kcore: avoid bounce buffer for ktext data
  2023-03-23 10:15 ` [PATCH v8 1/4] fs/proc/kcore: avoid bounce buffer for ktext data Lorenzo Stoakes
@ 2023-05-31 11:58   ` Jiri Olsa
  2023-07-21 13:48     ` Baoquan He
  2023-07-24  9:38     ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 2 replies; 21+ messages in thread
From: Jiri Olsa @ 2023-05-31 11:58 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-mm, linux-kernel, linux-fsdevel, Andrew Morton, Baoquan He,
	Uladzislau Rezki, Matthew Wilcox, David Hildenbrand, Liu Shixin,
	Jens Axboe, Alexander Viro

On Thu, Mar 23, 2023 at 10:15:16AM +0000, Lorenzo Stoakes wrote:
> Commit df04abfd181a ("fs/proc/kcore.c: Add bounce buffer for ktext data")
> introduced the use of a bounce buffer to retrieve kernel text data for
> /proc/kcore in order to avoid failures arising from hardened user copies
> enabled by CONFIG_HARDENED_USERCOPY in check_kernel_text_object().
> 
> We can avoid doing this if, instead of copy_to_user(), we use _copy_to_user(),
> which bypasses the hardening check. This is more efficient than using a
> bounce buffer and simplifies the code.
> 
> We do so as part of an overall effort to eliminate bounce buffer usage in the
> function, with an eye to converting it to an iterator read.
> 
> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
> Reviewed-by: David Hildenbrand <david@redhat.com>

hi,
sorry for late feedback, but looks like this one breaks reading
/proc/kcore with objdump for me:

  # cat /proc/kallsyms | grep ksys_read
  ffffffff8150ebc0 T ksys_read
  # objdump -d  --start-address=0xffffffff8150ebc0 --stop-address=0xffffffff8150ebd0 /proc/kcore 

  /proc/kcore:     file format elf64-x86-64

  objdump: Reading section load1 failed because: Bad address

reverting this makes it work again

thanks,
jirka

> ---
>  fs/proc/kcore.c | 17 +++++------------
>  1 file changed, 5 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
> index 71157ee35c1a..556f310d6aa4 100644
> --- a/fs/proc/kcore.c
> +++ b/fs/proc/kcore.c
> @@ -541,19 +541,12 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>  		case KCORE_VMEMMAP:
>  		case KCORE_TEXT:
>  			/*
> -			 * Using bounce buffer to bypass the
> -			 * hardened user copy kernel text checks.
> +			 * We use _copy_to_user() to bypass usermode hardening
> +			 * which would otherwise prevent this operation.
>  			 */
> -			if (copy_from_kernel_nofault(buf, (void *)start, tsz)) {
> -				if (clear_user(buffer, tsz)) {
> -					ret = -EFAULT;
> -					goto out;
> -				}
> -			} else {
> -				if (copy_to_user(buffer, buf, tsz)) {
> -					ret = -EFAULT;
> -					goto out;
> -				}
> +			if (_copy_to_user(buffer, (char *)start, tsz)) {
> +				ret = -EFAULT;
> +				goto out;
>  			}
>  			break;
>  		default:
> -- 
> 2.39.2
> 


* Re: [PATCH v8 1/4] fs/proc/kcore: avoid bounce buffer for ktext data
  2023-05-31 11:58   ` Jiri Olsa
@ 2023-07-21 13:48     ` Baoquan He
  2023-07-21 14:13       ` Jiri Olsa
  2023-07-24  6:23       ` David Hildenbrand
  2023-07-24  9:38     ` Linux regression tracking (Thorsten Leemhuis)
  1 sibling, 2 replies; 21+ messages in thread
From: Baoquan He @ 2023-07-21 13:48 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Lorenzo Stoakes, linux-mm, linux-kernel, linux-fsdevel,
	Andrew Morton, Uladzislau Rezki, Matthew Wilcox,
	David Hildenbrand, Liu Shixin, Jens Axboe, Alexander Viro

Hi Jiri,

On 05/31/23 at 01:58pm, Jiri Olsa wrote:
> On Thu, Mar 23, 2023 at 10:15:16AM +0000, Lorenzo Stoakes wrote:
> > Commit df04abfd181a ("fs/proc/kcore.c: Add bounce buffer for ktext data")
> > introduced the use of a bounce buffer to retrieve kernel text data for
> > /proc/kcore in order to avoid failures arising from hardened user copies
> > enabled by CONFIG_HARDENED_USERCOPY in check_kernel_text_object().
> > 
> > We can avoid doing this if, instead of copy_to_user(), we use _copy_to_user(),
> > which bypasses the hardening check. This is more efficient than using a
> > bounce buffer and simplifies the code.
> > 
> > We do so as part of an overall effort to eliminate bounce buffer usage in the
> > function, with an eye to converting it to an iterator read.
> > 
> > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
> > Reviewed-by: David Hildenbrand <david@redhat.com>
> 
> hi,
> sorry for late feedback, but looks like this one breaks reading
> /proc/kcore with objdump for me:
> 
>   # cat /proc/kallsyms | grep ksys_read
>   ffffffff8150ebc0 T ksys_read
>   # objdump -d  --start-address=0xffffffff8150ebc0 --stop-address=0xffffffff8150ebd0 /proc/kcore 
> 
>   /proc/kcore:     file format elf64-x86-64
> 
>   objdump: Reading section load1 failed because: Bad address
> 
> reverting this makes it work again

I met this too when I executed below command to trigger a kcore reading.
I wanted to do a simple testing during system running and got this.

  makedumpfile --mem-usage /proc/kcore

Later I tried your above objdump testing, it corrupted system too.

Is there any conclusion about this issue you reported? I could miss
things in the discussion or patch posting to fix this.

Thanks
Baoquan

> 
> 
> > ---
> >  fs/proc/kcore.c | 17 +++++------------
> >  1 file changed, 5 insertions(+), 12 deletions(-)
> > 
> > diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
> > index 71157ee35c1a..556f310d6aa4 100644
> > --- a/fs/proc/kcore.c
> > +++ b/fs/proc/kcore.c
> > @@ -541,19 +541,12 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
> >  		case KCORE_VMEMMAP:
> >  		case KCORE_TEXT:
> >  			/*
> > -			 * Using bounce buffer to bypass the
> > -			 * hardened user copy kernel text checks.
> > +			 * We use _copy_to_user() to bypass usermode hardening
> > +			 * which would otherwise prevent this operation.
> >  			 */
> > -			if (copy_from_kernel_nofault(buf, (void *)start, tsz)) {
> > -				if (clear_user(buffer, tsz)) {
> > -					ret = -EFAULT;
> > -					goto out;
> > -				}
> > -			} else {
> > -				if (copy_to_user(buffer, buf, tsz)) {
> > -					ret = -EFAULT;
> > -					goto out;
> > -				}
> > +			if (_copy_to_user(buffer, (char *)start, tsz)) {
> > +				ret = -EFAULT;
> > +				goto out;
> >  			}
> >  			break;
> >  		default:
> > -- 
> > 2.39.2
> > 
> 



* Re: [PATCH v8 1/4] fs/proc/kcore: avoid bounce buffer for ktext data
  2023-07-21 13:48     ` Baoquan He
@ 2023-07-21 14:13       ` Jiri Olsa
  2023-07-24  6:23       ` David Hildenbrand
  1 sibling, 0 replies; 21+ messages in thread
From: Jiri Olsa @ 2023-07-21 14:13 UTC (permalink / raw)
  To: Baoquan He, Lorenzo Stoakes
  Cc: Jiri Olsa, linux-mm, linux-kernel, linux-fsdevel, Andrew Morton,
	Uladzislau Rezki, Matthew Wilcox, David Hildenbrand, Liu Shixin,
	Jens Axboe, Alexander Viro

On Fri, Jul 21, 2023 at 09:48:37PM +0800, Baoquan He wrote:
> Hi Jiri,
> 
> On 05/31/23 at 01:58pm, Jiri Olsa wrote:
> > On Thu, Mar 23, 2023 at 10:15:16AM +0000, Lorenzo Stoakes wrote:
> > > Commit df04abfd181a ("fs/proc/kcore.c: Add bounce buffer for ktext data")
> > > introduced the use of a bounce buffer to retrieve kernel text data for
> > > /proc/kcore in order to avoid failures arising from hardened user copies
> > > enabled by CONFIG_HARDENED_USERCOPY in check_kernel_text_object().
> > > 
> > > We can avoid doing this if, instead of copy_to_user(), we use _copy_to_user(),
> > > which bypasses the hardening check. This is more efficient than using a
> > > bounce buffer and simplifies the code.
> > > 
> > > We do so as part of an overall effort to eliminate bounce buffer usage in the
> > > function, with an eye to converting it to an iterator read.
> > > 
> > > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
> > > Reviewed-by: David Hildenbrand <david@redhat.com>
> > 
> > hi,
> > sorry for late feedback, but looks like this one breaks reading
> > /proc/kcore with objdump for me:
> > 
> >   # cat /proc/kallsyms | grep ksys_read
> >   ffffffff8150ebc0 T ksys_read
> >   # objdump -d  --start-address=0xffffffff8150ebc0 --stop-address=0xffffffff8150ebd0 /proc/kcore 
> > 
> >   /proc/kcore:     file format elf64-x86-64
> > 
> >   objdump: Reading section load1 failed because: Bad address
> > 
> > reverting this makes it work again
> 
> I met this too when I executed below command to trigger a kcore reading.
> I wanted to do a simple testing during system running and got this.
> 
>   makedumpfile --mem-usage /proc/kcore
> 
> Later I tried your above objdump testing, it corrupted system too.
> 
> Is there any conclusion about this issue you reported? I could miss
> things in the discussion or patch posting to fix this.

hi,
thanks for your reply, I meant to ping on this again

AFAIK there was no answer yet... I managed to cleanly revert the patch when
I needed the functionality, then got sidetracked and forgot about this

I just re-tested and it's still failing for me, would be great to get it fixed

Lorenzo, any idea?

thanks,
jirka


> 
> Thanks
> Baoquan
> 
> > 
> > 
> > > ---
> > >  fs/proc/kcore.c | 17 +++++------------
> > >  1 file changed, 5 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
> > > index 71157ee35c1a..556f310d6aa4 100644
> > > --- a/fs/proc/kcore.c
> > > +++ b/fs/proc/kcore.c
> > > @@ -541,19 +541,12 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
> > >  		case KCORE_VMEMMAP:
> > >  		case KCORE_TEXT:
> > >  			/*
> > > -			 * Using bounce buffer to bypass the
> > > -			 * hardened user copy kernel text checks.
> > > +			 * We use _copy_to_user() to bypass usermode hardening
> > > +			 * which would otherwise prevent this operation.
> > >  			 */
> > > -			if (copy_from_kernel_nofault(buf, (void *)start, tsz)) {
> > > -				if (clear_user(buffer, tsz)) {
> > > -					ret = -EFAULT;
> > > -					goto out;
> > > -				}
> > > -			} else {
> > > -				if (copy_to_user(buffer, buf, tsz)) {
> > > -					ret = -EFAULT;
> > > -					goto out;
> > > -				}
> > > +			if (_copy_to_user(buffer, (char *)start, tsz)) {
> > > +				ret = -EFAULT;
> > > +				goto out;
> > >  			}
> > >  			break;
> > >  		default:
> > > -- 
> > > 2.39.2
> > > 
> > 
> 


* Re: [PATCH v8 1/4] fs/proc/kcore: avoid bounce buffer for ktext data
  2023-07-21 13:48     ` Baoquan He
  2023-07-21 14:13       ` Jiri Olsa
@ 2023-07-24  6:23       ` David Hildenbrand
  2023-07-24  8:08         ` Baoquan He
  2023-07-31 19:21         ` Lorenzo Stoakes
  1 sibling, 2 replies; 21+ messages in thread
From: David Hildenbrand @ 2023-07-24  6:23 UTC (permalink / raw)
  To: Baoquan He, Jiri Olsa
  Cc: Lorenzo Stoakes, linux-mm, linux-kernel, linux-fsdevel,
	Andrew Morton, Uladzislau Rezki, Matthew Wilcox, Liu Shixin,
	Jens Axboe, Alexander Viro

Hi,

> 
> I met this too when I executed below command to trigger a kcore reading.
> I wanted to do a simple testing during system running and got this.
> 
>    makedumpfile --mem-usage /proc/kcore
> 
> Later I tried your above objdump testing, it corrupted system too.
> 

What do you mean with "corrupted system too" --  did it not only fail to 
dump the system, but also actually harmed the system?

@Lorenzo do you plan on reproduce + fix, or should we consider reverting 
that change?

-- 
Cheers,

David / dhildenb



* Re: [PATCH v8 1/4] fs/proc/kcore: avoid bounce buffer for ktext data
  2023-07-24  6:23       ` David Hildenbrand
@ 2023-07-24  8:08         ` Baoquan He
  2023-07-24  8:18           ` Jiri Olsa
  2023-07-31 19:21         ` Lorenzo Stoakes
  1 sibling, 1 reply; 21+ messages in thread
From: Baoquan He @ 2023-07-24  8:08 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Jiri Olsa, Lorenzo Stoakes, linux-mm, linux-kernel,
	linux-fsdevel, Andrew Morton, Uladzislau Rezki, Matthew Wilcox,
	Liu Shixin, Jens Axboe, Alexander Viro

On 07/24/23 at 08:23am, David Hildenbrand wrote:
> Hi,
> 
> > 
> > I met this too when I executed below command to trigger a kcore reading.
> > I wanted to do a simple testing during system running and got this.
> > 
> >    makedumpfile --mem-usage /proc/kcore
> > 
> > Later I tried your above objdump testing, it corrupted system too.
> > 
> 
> What do you mean with "corrupted system too" --  did it not only fail to
> dump the system, but also actually harmed the system?

From my testing, reading kcore will cause system panic, then reboot. Not
sure if Jiri saw the same phenomenon.

> 
> @Lorenzo do you plan on reproduce + fix, or should we consider reverting
> that change?

When tested on an arm64 system, the reproduction is stable. I will have
a look too to see if I have some findings this week.



* Re: [PATCH v8 1/4] fs/proc/kcore: avoid bounce buffer for ktext data
  2023-07-24  8:08         ` Baoquan He
@ 2023-07-24  8:18           ` Jiri Olsa
  2023-07-24 14:33             ` Baoquan He
  0 siblings, 1 reply; 21+ messages in thread
From: Jiri Olsa @ 2023-07-24  8:18 UTC (permalink / raw)
  To: Baoquan He
  Cc: David Hildenbrand, Jiri Olsa, Lorenzo Stoakes, linux-mm,
	linux-kernel, linux-fsdevel, Andrew Morton, Uladzislau Rezki,
	Matthew Wilcox, Liu Shixin, Jens Axboe, Alexander Viro

On Mon, Jul 24, 2023 at 04:08:41PM +0800, Baoquan He wrote:
> On 07/24/23 at 08:23am, David Hildenbrand wrote:
> > Hi,
> > 
> > > 
> > > I met this too when I executed below command to trigger a kcore reading.
> > > I wanted to do a simple testing during system running and got this.
> > > 
> > >    makedumpfile --mem-usage /proc/kcore
> > > 
> > > Later I tried your above objdump testing, it corrupted system too.
> > > 
> > 
> > What do you mean with "corrupted system too" --  did it not only fail to
> > dump the system, but also actually harmed the system?
> 
> From my testing, reading kcore will cause system panic, then reboot. Not
> sure if Jiri saw the same phenomenon.

it did not crash for me, just the read error.
could you get console output from that?

jirka

> 
> > 
> > @Lorenzo do you plan on reproduce + fix, or should we consider reverting
> > that change?
> 
> When tested on an arm64 system, the reproduction is stable. I will have
> a look too to see if I have some findings this week.
> 


* Re: [PATCH v8 1/4] fs/proc/kcore: avoid bounce buffer for ktext data
  2023-05-31 11:58   ` Jiri Olsa
  2023-07-21 13:48     ` Baoquan He
@ 2023-07-24  9:38     ` Linux regression tracking (Thorsten Leemhuis)
  1 sibling, 0 replies; 21+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-07-24  9:38 UTC (permalink / raw)
  To: Jiri Olsa, Lorenzo Stoakes
  Cc: linux-mm, linux-kernel, linux-fsdevel, Andrew Morton, Baoquan He,
	Uladzislau Rezki, Matthew Wilcox, David Hildenbrand, Liu Shixin,
	Jens Axboe, Alexander Viro, Linux kernel regressions list

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few templates
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 31.05.23 13:58, Jiri Olsa wrote:
> On Thu, Mar 23, 2023 at 10:15:16AM +0000, Lorenzo Stoakes wrote:
>> Commit df04abfd181a ("fs/proc/kcore.c: Add bounce buffer for ktext data")
>> introduced the use of a bounce buffer to retrieve kernel text data for
>> /proc/kcore in order to avoid failures arising from hardened user copies
>> enabled by CONFIG_HARDENED_USERCOPY in check_kernel_text_object().
>> [...]
>> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
>> Reviewed-by: David Hildenbrand <david@redhat.com>
> 
> sorry for late feedback, but looks like this one breaks reading
> /proc/kcore with objdump for me:
> 
>   # cat /proc/kallsyms | grep ksys_read
>   ffffffff8150ebc0 T ksys_read
>   # objdump -d  --start-address=0xffffffff8150ebc0 --stop-address=0xffffffff8150ebd0 /proc/kcore 
> 
>   /proc/kcore:     file format elf64-x86-64
> 
>   objdump: Reading section load1 failed because: Bad address
> 
> reverting this makes it work again

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced 2e1c017077
#regzbot title mm / fs/proc/kcore: reading /proc/kcore with objdump broke
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.


* Re: [PATCH v8 1/4] fs/proc/kcore: avoid bounce buffer for ktext data
  2023-07-24  8:18           ` Jiri Olsa
@ 2023-07-24 14:33             ` Baoquan He
  0 siblings, 0 replies; 21+ messages in thread
From: Baoquan He @ 2023-07-24 14:33 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: David Hildenbrand, Lorenzo Stoakes, linux-mm, linux-kernel,
	linux-fsdevel, Andrew Morton, Uladzislau Rezki, Matthew Wilcox,
	Liu Shixin, Jens Axboe, Alexander Viro

On 07/24/23 at 10:18am, Jiri Olsa wrote:
> On Mon, Jul 24, 2023 at 04:08:41PM +0800, Baoquan He wrote:
> > On 07/24/23 at 08:23am, David Hildenbrand wrote:
> > > Hi,
> > > 
> > > > 
> > > > I met this too when I executed below command to trigger a kcore reading.
> > > > I wanted to do a simple testing during system running and got this.
> > > > 
> > > >    makedumpfile --mem-usage /proc/kcore
> > > > 
> > > > Later I tried your above objdump testing, it corrupted system too.
> > > > 
> > > 
> > > What do you mean with "corrupted system too" --  did it not only fail to
> > > dump the system, but also actually harmed the system?
> > 
> > From my testing, reading kcore will cause system panic, then reboot. Not
> > sure if Jiri saw the same phenomenon.
> 
> it did not crash for me, just the read error.
> could you get console output from that?

I got a new arm64 machine; on it, executing "makedumpfile --mem-usage
/proc/kcore" doesn't trigger a panic, but your objdump command does. The
call trace is pasted below. It's the same panic and call trace I saw on
my previous arm64 machine.

[13270.314323] Mem abort info:
[13270.317162]   ESR = 0x0000000096000007
[13270.320901]   EC = 0x25: DABT (current EL), IL = 32 bits
[13270.326217]   SET = 0, FnV = 0
[13270.329261]   EA = 0, S1PTW = 0
[13270.332390]   FSC = 0x07: level 3 translation fault
[13270.337270] Data abort info:
[13270.340139]   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
[13270.345626]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[13270.350666]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[13270.355981] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000400651d64000
[13270.362672] [ffffdc9cf3ea0000] pgd=1000401ffffff003, p4d=1000401ffffff003, pud=1000401fffffe003, pmd=1000401fffffd003, pte=0000000000000000
[13270.375367] Internal error: Oops: 0000000096000007 [#4] SMP
[13270.380934] Modules linked in: mlx5_ib ib_uverbs ib_core rfkill vfat fat joydev cdc_ether usbnet mii mlx5_core acpi_ipmi mlxfw ipmi_ssif psample tls ipmi_devintf pci_hyperv_intf arm_spe_pmu ipmi_msghandler arm_cmn arm_dmc620_pmu arm_dsu_pmu cppc_cpufreq acpi_tad fuse zram xfs crct10dif_ce polyval_ce polyval_generic ghash_ce uas sbsa_gwdt nvme nvme_core ast usb_storage nvme_common i2c_algo_bit xgene_hwmon
[13270.416751] CPU: 15 PID: 8803 Comm: objdump Tainted: G      D            6.5.0-rc3 #1
[13270.424570] Hardware name: WIWYNN Mt.Jade Server System B81.030Z1.0007/Mt.Jade Motherboard, BIOS 2.10.20220531 (SCP: 2.10.20220531) 2022/05/31
[13270.437337] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[13270.444289] pc : __arch_copy_to_user+0x180/0x240
[13270.448910] lr : _copy_to_iter+0x11c/0x5d0
[13270.453002] sp : ffff8000b15a37c0
[13270.456306] x29: ffff8000b15a37c0 x28: ffffdc9cf3ea0000 x27: ffffdc9cf6938158
[13270.463431] x26: ffff8000b15a3ba8 x25: 0000000000000690 x24: ffff8000b15a3b80
[13270.470556] x23: 00000000000038ac x22: ffffdc9cf3ea0000 x21: ffff8000b15a3b80
[13270.477682] x20: ffffdc9cf64fdf00 x19: 0000000000000400 x18: 0000000000000000
[13270.484806] x17: 0000000000000000 x16: 0000000000000000 x15: ffffdc9cf3ea0000
[13270.491931] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[13270.499056] x11: 0001000000000000 x10: ffffdc9cf64fdf00 x9 : 0000000000000690
[13270.506182] x8 : 000000007c000000 x7 : 0000fd007e000000 x6 : 000000000eee0b60
[13270.513306] x5 : 000000000eee0f60 x4 : 0000000000000000 x3 : 0000000000000400
[13270.520431] x2 : 0000000000000380 x1 : ffffdc9cf3ea0000 x0 : 000000000eee0b60
[13270.527556] Call trace:
[13270.529992]  __arch_copy_to_user+0x180/0x240
[13270.534250]  read_kcore_iter+0x718/0x878
[13270.538167]  proc_reg_read_iter+0x8c/0xe8
[13270.542168]  vfs_read+0x214/0x2c0
[13270.545478]  ksys_read+0x78/0x118
[13270.548782]  __arm64_sys_read+0x24/0x38
[13270.552608]  invoke_syscall+0x78/0x108
[13270.556351]  el0_svc_common.constprop.0+0x4c/0xf8
[13270.561044]  do_el0_svc+0x34/0x50
[13270.564347]  el0_svc+0x34/0x108
[13270.567482]  el0t_64_sync_handler+0x100/0x130
[13270.571829]  el0t_64_sync+0x194/0x198
[13270.575483] Code: d503201f d503201f d503201f d503201f (a8c12027) 
[13270.581567] ---[ end trace 0000000000000000 ]---



* Re: [PATCH v8 1/4] fs/proc/kcore: avoid bounce buffer for ktext data
  2023-07-24  6:23       ` David Hildenbrand
  2023-07-24  8:08         ` Baoquan He
@ 2023-07-31 19:21         ` Lorenzo Stoakes
  2023-07-31 19:24           ` David Hildenbrand
  1 sibling, 1 reply; 21+ messages in thread
From: Lorenzo Stoakes @ 2023-07-31 19:21 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Baoquan He, Jiri Olsa, linux-mm, linux-kernel, linux-fsdevel,
	Andrew Morton, Uladzislau Rezki, Matthew Wilcox, Liu Shixin,
	Jens Axboe, Alexander Viro

On Mon, Jul 24, 2023 at 08:23:55AM +0200, David Hildenbrand wrote:
> Hi,
>
> >
> > I met this too when I executed below command to trigger a kcore reading.
> > I wanted to do a simple testing during system running and got this.
> >
> >    makedumpfile --mem-usage /proc/kcore
> >
> > Later I tried your above objdump testing, it corrupted system too.
> >
>
> What do you mean with "corrupted system too" --  did it not only fail to
> dump the system, but also actually harmed the system?
>
> @Lorenzo do you plan on reproduce + fix, or should we consider reverting
> that change?
>
> --
> Cheers,
>
> David / dhildenb
>

Apologies, I missed this; I have been very busy lately, not least with the book :)

Concerning; I will take a look as I get a chance. I think the whole series
would have to be reverted, which would be... depressing... as other patches
in the series eliminate the bounce buffer altogether.


* Re: [PATCH v8 1/4] fs/proc/kcore: avoid bounce buffer for ktext data
  2023-07-31 19:21         ` Lorenzo Stoakes
@ 2023-07-31 19:24           ` David Hildenbrand
  2023-07-31 19:40             ` Lorenzo Stoakes
  0 siblings, 1 reply; 21+ messages in thread
From: David Hildenbrand @ 2023-07-31 19:24 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Baoquan He, Jiri Olsa, linux-mm, linux-kernel, linux-fsdevel,
	Andrew Morton, Uladzislau Rezki, Matthew Wilcox, Liu Shixin,
	Jens Axboe, Alexander Viro

On 31.07.23 21:21, Lorenzo Stoakes wrote:
> Apologies, I missed this; I have been very busy lately, not least with the book :)
>
> Concerning. I will take a look when I get a chance. I think the whole series
> would have to be reverted, which would be... depressing... as other patches
> in the series eliminate the bounce buffer altogether.

I spotted

https://lkml.kernel.org/r/069dd40aa71e634b414d07039d72467d051fb486.camel@gmx.de

today, maybe that's related.

-- 
Cheers,

David / dhildenb



* Re: [PATCH v8 1/4] fs/proc/kcore: avoid bounce buffer for ktext data
  2023-07-31 19:24           ` David Hildenbrand
@ 2023-07-31 19:40             ` Lorenzo Stoakes
  2023-07-31 20:34               ` Jiri Olsa
  0 siblings, 1 reply; 21+ messages in thread
From: Lorenzo Stoakes @ 2023-07-31 19:40 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Baoquan He, Jiri Olsa, linux-mm, linux-kernel, linux-fsdevel,
	Andrew Morton, Uladzislau Rezki, Matthew Wilcox, Liu Shixin,
	Jens Axboe, Alexander Viro

On Mon, Jul 31, 2023 at 09:24:50PM +0200, David Hildenbrand wrote:
> I spotted
>
> https://lkml.kernel.org/r/069dd40aa71e634b414d07039d72467d051fb486.camel@gmx.de
>
> today, maybe that's related.

I find that slightly confusing; they talk about just reverting the patch but
then also add a kern_addr_valid() check?

I'm also confused about people talking about just reverting the patch, as
4c91c07c93bb drops the bounce buffer altogether... presumably they mean
reverting both?

Clearly this is an arm64 thing (obviously). I have some arm64 hardware; let me
see if I can repro...

Baoquan, Jiri - are you reverting more than just the one commit? And does doing
this go from not working -> working? Or from not working (worst case oops) ->
error?



* Re: [PATCH v8 1/4] fs/proc/kcore: avoid bounce buffer for ktext data
  2023-07-31 19:40             ` Lorenzo Stoakes
@ 2023-07-31 20:34               ` Jiri Olsa
  2023-07-31 21:12                 ` Lorenzo Stoakes
  0 siblings, 1 reply; 21+ messages in thread
From: Jiri Olsa @ 2023-07-31 20:34 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: David Hildenbrand, Baoquan He, Jiri Olsa, linux-mm, linux-kernel,
	linux-fsdevel, Andrew Morton, Uladzislau Rezki, Matthew Wilcox,
	Liu Shixin, Jens Axboe, Alexander Viro

On Mon, Jul 31, 2023 at 08:40:24PM +0100, Lorenzo Stoakes wrote:
> > Clearly this is an arm64 thing (obviously). I have some arm64 hardware; let me
> > see if I can repro...

I see the issue on x86

> 
> Baoquan, Jiri - are you reverting more than just the one commit? And does doing
> this go from not working -> working? Or from not working (worst case oops) ->
> error?

yes, I have been reverting all 4 patches

I did a quick check and had to revert 2 more patches to get a clean revert

38b138abc355 Revert "fs/proc/kcore: avoid bounce buffer for ktext data"
e2c3b418d365 Revert "fs/proc/kcore: convert read_kcore() to read_kcore_iter()"
d8bc432cb314 Revert "iov_iter: add copy_page_to_iter_nofault()"
bf2c6799f68c Revert "iov_iter: Kill ITER_PIPE"
ccf4b2c5c5ce Revert "mm: vmalloc: convert vread() to vread_iter()"
de400d383a7e Revert "mm/vmalloc: replace the ternary conditional operator with min()"

jirka


* Re: [PATCH v8 1/4] fs/proc/kcore: avoid bounce buffer for ktext data
  2023-07-31 20:34               ` Jiri Olsa
@ 2023-07-31 21:12                 ` Lorenzo Stoakes
  2023-07-31 21:50                   ` Jiri Olsa
  0 siblings, 1 reply; 21+ messages in thread
From: Lorenzo Stoakes @ 2023-07-31 21:12 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: David Hildenbrand, Baoquan He, linux-mm, linux-kernel,
	linux-fsdevel, Andrew Morton, Uladzislau Rezki, Matthew Wilcox,
	Liu Shixin, Jens Axboe, Alexander Viro

On Mon, Jul 31, 2023 at 10:34:21PM +0200, Jiri Olsa wrote:
> > Clearly this is an arm64 thing (obviously). I have some arm64 hardware; let me
> > see if I can repro...
>
> I see the issue on x86

Ummmm what? I can't! What repro are you seeing on x86, exactly?

>
> >
> > Baoquan, Jiri - are you reverting more than just the one commit? And does doing
> > this go from not working -> working? Or from not working (worst case oops) ->
> > error?
>
> > yes, I have been reverting all 4 patches
>
> > I did a quick check and had to revert 2 more patches to get a clean revert
>
> 38b138abc355 Revert "fs/proc/kcore: avoid bounce buffer for ktext data"
> e2c3b418d365 Revert "fs/proc/kcore: convert read_kcore() to read_kcore_iter()"
> d8bc432cb314 Revert "iov_iter: add copy_page_to_iter_nofault()"
> bf2c6799f68c Revert "iov_iter: Kill ITER_PIPE"
> ccf4b2c5c5ce Revert "mm: vmalloc: convert vread() to vread_iter()"
> de400d383a7e Revert "mm/vmalloc: replace the ternary conditional operator with min()"
>
> jirka

That's quite a few more reverts and obviously not an acceptable solution here.

Looking at
https://lore.kernel.org/all/CAA5enKaUYehLZGL3abv4rsS7caoUG-pN9wF3R+qek-DGNZufbA@mail.gmail.com
(a parallel thread on this), it looks like the issue is that we are no longer
using a no-fault kernel copy in the KCORE_TEXT path, and arm64 doesn't map
everything in the text range.

The solution would be to reinstate the bounce buffer in this case (ugh). The
longer-term solution, I think, would be to create an iterator helper that does
no-fault copies from the kernel.

I will try to come up with a semi-revert that keeps the iterator stuff but
reinstates a hideous bounce buffer for the KCORE_TEXT bit, with a comment
explaining why...
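
To sketch the idea (illustrative only - the helper name is made up and error
handling is elided, so treat this as a rough shape rather than the actual
patch):

	/*
	 * Bounce ktext data through a kernel buffer using a no-fault
	 * copy, so unmapped holes in the text range surface as -EFAULT
	 * instead of oopsing, then do an ordinary (faultable) copy out
	 * to the user iterator.
	 */
	static int read_ktext_bounced(struct iov_iter *iter, const char *src,
				      size_t len, char *buf /* >= len bytes */)
	{
		if (copy_from_kernel_nofault(buf, src, len))
			return -EFAULT;

		if (copy_to_iter(buf, len, iter) != len)
			return -EFAULT;

		return 0;
	}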


* Re: [PATCH v8 1/4] fs/proc/kcore: avoid bounce buffer for ktext data
  2023-07-31 21:12                 ` Lorenzo Stoakes
@ 2023-07-31 21:50                   ` Jiri Olsa
  2023-07-31 21:58                     ` Lorenzo Stoakes
  0 siblings, 1 reply; 21+ messages in thread
From: Jiri Olsa @ 2023-07-31 21:50 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Jiri Olsa, David Hildenbrand, Baoquan He, linux-mm, linux-kernel,
	linux-fsdevel, Andrew Morton, Uladzislau Rezki, Matthew Wilcox,
	Liu Shixin, Jens Axboe, Alexander Viro

On Mon, Jul 31, 2023 at 10:12:08PM +0100, Lorenzo Stoakes wrote:
> > I see the issue on x86
> 
> Ummmm what? I can't! What repro are you seeing on x86, exactly?

# cat /proc/kallsyms | grep ksys_read
ffffffff8151e450 T ksys_read

# objdump -d  --start-address=0xffffffff8151e450 --stop-address=0xffffffff8151e460 /proc/kcore 

/proc/kcore:     file format elf64-x86-64

objdump: Reading section load1 failed because: Bad address
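
For reference, the "Bad address" objdump reports is strerror(EFAULT) bubbling
up from the underlying read on /proc/kcore. A minimal, hypothetical userspace
sketch of the same failing path (the offset is a placeholder - a real reader
must walk the ELF program headers to turn a kernel vaddr into a file offset,
which is what objdump does):

	#include <errno.h>
	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	int main(void)
	{
		char buf[64];
		off_t offset = 0;	/* placeholder: derive from ELF phdrs */
		int fd = open("/proc/kcore", O_RDONLY);

		if (fd < 0)
			return 1;
		/* With a real ktext offset, affected kernels fail here with EFAULT. */
		if (pread(fd, buf, sizeof(buf), offset) < 0)
			printf("read failed: %s\n", strerror(errno));
		close(fd);
		return 0;
	}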


jirka



* Re: [PATCH v8 1/4] fs/proc/kcore: avoid bounce buffer for ktext data
  2023-07-31 21:50                   ` Jiri Olsa
@ 2023-07-31 21:58                     ` Lorenzo Stoakes
  0 siblings, 0 replies; 21+ messages in thread
From: Lorenzo Stoakes @ 2023-07-31 21:58 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: David Hildenbrand, Baoquan He, linux-mm, linux-kernel,
	linux-fsdevel, Andrew Morton, Uladzislau Rezki, Matthew Wilcox,
	Liu Shixin, Jens Axboe, Alexander Viro

On Mon, Jul 31, 2023 at 11:50:22PM +0200, Jiri Olsa wrote:
> > Ummmm what? I can't! What repro are you seeing on x86, exactly?
>
> # cat /proc/kallsyms | grep ksys_read
> ffffffff8151e450 T ksys_read
>
> # objdump -d  --start-address=0xffffffff8151e450 --stop-address=0xffffffff8151e460 /proc/kcore
>
> /proc/kcore:     file format elf64-x86-64
>
> objdump: Reading section load1 failed because: Bad address

Locally I don't see this issue. How odd. The bug doesn't manifest as a 'bad
address' in the arm64 repros either. I wonder if this is something unrelated...

In any case, I have a candidate fix for the bug at
https://lore.kernel.org/all/20230731215021.70911-1-lstoakes@gmail.com/ which
should hopefully address the underlying issue with minimal change.


end of thread

Thread overview: 21+ messages
2023-03-23 10:15 [PATCH v8 0/4] convert read_kcore(), vread() to use iterators Lorenzo Stoakes
2023-03-23 10:15 ` [PATCH v8 1/4] fs/proc/kcore: avoid bounce buffer for ktext data Lorenzo Stoakes
2023-05-31 11:58   ` Jiri Olsa
2023-07-21 13:48     ` Baoquan He
2023-07-21 14:13       ` Jiri Olsa
2023-07-24  6:23       ` David Hildenbrand
2023-07-24  8:08         ` Baoquan He
2023-07-24  8:18           ` Jiri Olsa
2023-07-24 14:33             ` Baoquan He
2023-07-31 19:21         ` Lorenzo Stoakes
2023-07-31 19:24           ` David Hildenbrand
2023-07-31 19:40             ` Lorenzo Stoakes
2023-07-31 20:34               ` Jiri Olsa
2023-07-31 21:12                 ` Lorenzo Stoakes
2023-07-31 21:50                   ` Jiri Olsa
2023-07-31 21:58                     ` Lorenzo Stoakes
2023-07-24  9:38     ` Linux regression tracking (Thorsten Leemhuis)
2023-03-23 10:15 ` [PATCH v8 2/4] fs/proc/kcore: convert read_kcore() to read_kcore_iter() Lorenzo Stoakes
2023-03-23 10:15 ` [PATCH v8 3/4] iov_iter: add copy_page_to_iter_nofault() Lorenzo Stoakes
2023-03-23 10:15 ` [PATCH v8 4/4] mm: vmalloc: convert vread() to vread_iter() Lorenzo Stoakes
2023-03-29  4:53 ` [PATCH v8 0/4] convert read_kcore(), vread() to use iterators Baoquan He
