linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/9] /proc/kcore improvements
@ 2018-07-25 23:59 Omar Sandoval
  2018-07-25 23:59 ` [PATCH v4 1/9] proc/kcore: don't grab lock for kclist_add() Omar Sandoval
                   ` (8 more replies)
  0 siblings, 9 replies; 11+ messages in thread
From: Omar Sandoval @ 2018-07-25 23:59 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, Andrew Morton
  Cc: Alexey Dobriyan, Eric Biederman, James Morse, Bhupesh Sharma,
	kernel-team

From: Omar Sandoval <osandov@fb.com>

Hi,

This series makes a few improvements to /proc/kcore. It fixes a couple
of small issues in v3 but is otherwise the same. Patches 1, 2, and 3 are
prep patches. Patch 4 is a fix/cleanup. Patch 5 is another prep patch.
Patches 6 and 7 are optimizations to ->read(). Patch 8 makes it possible
to enable CRASH_CORE on any architecture, which is needed for patch 9.
Patch 9 adds vmcoreinfo to /proc/kcore.

Based on v4.18-rc6 + James' patch in the mm tree, and tested with crash
and readelf.

Thanks!

Changes from v3:

- Fixes a mixed up up_write() instead of up_read() in patch 5 reported
  by Tetsuo Handa
- Added patch 8 to fix a build failure reported by Stephen Rothwell

Changes from v2:

- Add __init to kclist_add() as per Andrew
- Got rid of conversion of kcore_need_update from int to atomic_t and
  just used xchg() instead of atomic_cmpxchg() (split out into a new
  patch instead of combining it with the rwlock -> rwsem conversion)
- Add comment about the increase in file size to patch 8

Changes from v1:

- Rebased onto v4.18-rc4 + James' patch
  (https://patchwork.kernel.org/patch/10519739/) in the mm tree
- Fix spurious sparse warning (see the report and response in
  https://patchwork.kernel.org/patch/10512431/)

Omar Sandoval (9):
  proc/kcore: don't grab lock for kclist_add()
  proc/kcore: don't grab lock for memory hotplug notifier
  proc/kcore: replace kclist_lock rwlock with rwsem
  proc/kcore: fix memory hotplug vs multiple opens race
  proc/kcore: hold lock during read
  proc/kcore: clean up ELF header generation
  proc/kcore: optimize multiple page reads
  crash_core: use VMCOREINFO_SYMBOL_ARRAY() for swapper_pg_dir
  proc/kcore: add vmcoreinfo note to /proc/kcore

 fs/proc/Kconfig            |   1 +
 fs/proc/kcore.c            | 534 +++++++++++++++++--------------------
 include/linux/crash_core.h |   2 +
 include/linux/kcore.h      |   2 +-
 kernel/crash_core.c        |   6 +-
 5 files changed, 252 insertions(+), 293 deletions(-)

-- 
2.18.0

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v4 1/9] proc/kcore: don't grab lock for kclist_add()
  2018-07-25 23:59 [PATCH v4 0/9] /proc/kcore improvements Omar Sandoval
@ 2018-07-25 23:59 ` Omar Sandoval
  2018-08-07  5:05   ` Bhupesh Sharma
  2018-07-25 23:59 ` [PATCH v4 2/9] proc/kcore: don't grab lock for memory hotplug notifier Omar Sandoval
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 11+ messages in thread
From: Omar Sandoval @ 2018-07-25 23:59 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, Andrew Morton
  Cc: Alexey Dobriyan, Eric Biederman, James Morse, Bhupesh Sharma,
	kernel-team

From: Omar Sandoval <osandov@fb.com>

kclist_add() is only called at init time, so there's no point in
grabbing any locks. We're also going to replace the rwlock with a rwsem,
which we don't want to try grabbing during early boot.

While we're here, mark kclist_add() with __init so that we'll get a
warning if it's called from non-init code.

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/proc/kcore.c       | 7 +++----
 include/linux/kcore.h | 2 +-
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 66c373230e60..b0b9a76f28d6 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -62,16 +62,15 @@ static LIST_HEAD(kclist_head);
 static DEFINE_RWLOCK(kclist_lock);
 static int kcore_need_update = 1;
 
-void
-kclist_add(struct kcore_list *new, void *addr, size_t size, int type)
+/* This doesn't grab kclist_lock, so it should only be used at init time. */
+void __init kclist_add(struct kcore_list *new, void *addr, size_t size,
+		       int type)
 {
 	new->addr = (unsigned long)addr;
 	new->size = size;
 	new->type = type;
 
-	write_lock(&kclist_lock);
 	list_add_tail(&new->list, &kclist_head);
-	write_unlock(&kclist_lock);
 }
 
 static size_t get_kcore_size(int *nphdr, size_t *elf_buflen)
diff --git a/include/linux/kcore.h b/include/linux/kcore.h
index 8de55e4b5ee9..c20f296438fb 100644
--- a/include/linux/kcore.h
+++ b/include/linux/kcore.h
@@ -35,7 +35,7 @@ struct vmcoredd_node {
 };
 
 #ifdef CONFIG_PROC_KCORE
-extern void kclist_add(struct kcore_list *, void *, size_t, int type);
+void __init kclist_add(struct kcore_list *, void *, size_t, int type);
 #else
 static inline
 void kclist_add(struct kcore_list *new, void *addr, size_t size, int type)
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 2/9] proc/kcore: don't grab lock for memory hotplug notifier
  2018-07-25 23:59 [PATCH v4 0/9] /proc/kcore improvements Omar Sandoval
  2018-07-25 23:59 ` [PATCH v4 1/9] proc/kcore: don't grab lock for kclist_add() Omar Sandoval
@ 2018-07-25 23:59 ` Omar Sandoval
  2018-07-25 23:59 ` [PATCH v4 3/9] proc/kcore: replace kclist_lock rwlock with rwsem Omar Sandoval
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Omar Sandoval @ 2018-07-25 23:59 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, Andrew Morton
  Cc: Alexey Dobriyan, Eric Biederman, James Morse, Bhupesh Sharma,
	kernel-team

From: Omar Sandoval <osandov@fb.com>

The memory hotplug notifier kcore_callback() only needs kclist_lock to
prevent races with __kcore_update_ram(), but we can easily eliminate
that race by using an atomic xchg() in __kcore_update_ram(). This is
preparation for converting kclist_lock to an rwsem.

Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/proc/kcore.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index b0b9a76f28d6..e83f15a4f66d 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -118,7 +118,7 @@ static void __kcore_update_ram(struct list_head *list)
 	LIST_HEAD(garbage);
 
 	write_lock(&kclist_lock);
-	if (kcore_need_update) {
+	if (xchg(&kcore_need_update, 0)) {
 		list_for_each_entry_safe(pos, tmp, &kclist_head, list) {
 			if (pos->type == KCORE_RAM
 				|| pos->type == KCORE_VMEMMAP)
@@ -127,7 +127,6 @@ static void __kcore_update_ram(struct list_head *list)
 		list_splice_tail(list, &kclist_head);
 	} else
 		list_splice(list, &garbage);
-	kcore_need_update = 0;
 	proc_root_kcore->size = get_kcore_size(&nphdr, &size);
 	write_unlock(&kclist_lock);
 
@@ -593,9 +592,8 @@ static int __meminit kcore_callback(struct notifier_block *self,
 	switch (action) {
 	case MEM_ONLINE:
 	case MEM_OFFLINE:
-		write_lock(&kclist_lock);
 		kcore_need_update = 1;
-		write_unlock(&kclist_lock);
+		break;
 	}
 	return NOTIFY_OK;
 }
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 3/9] proc/kcore: replace kclist_lock rwlock with rwsem
  2018-07-25 23:59 [PATCH v4 0/9] /proc/kcore improvements Omar Sandoval
  2018-07-25 23:59 ` [PATCH v4 1/9] proc/kcore: don't grab lock for kclist_add() Omar Sandoval
  2018-07-25 23:59 ` [PATCH v4 2/9] proc/kcore: don't grab lock for memory hotplug notifier Omar Sandoval
@ 2018-07-25 23:59 ` Omar Sandoval
  2018-07-25 23:59 ` [PATCH v4 4/9] proc/kcore: fix memory hotplug vs multiple opens race Omar Sandoval
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Omar Sandoval @ 2018-07-25 23:59 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, Andrew Morton
  Cc: Alexey Dobriyan, Eric Biederman, James Morse, Bhupesh Sharma,
	kernel-team

From: Omar Sandoval <osandov@fb.com>

Now we only need kclist_lock from user context and at fs init time, and
the following changes need to sleep while holding the kclist_lock.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/proc/kcore.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index e83f15a4f66d..ae43a97d511d 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -59,7 +59,7 @@ struct memelfnote
 };
 
 static LIST_HEAD(kclist_head);
-static DEFINE_RWLOCK(kclist_lock);
+static DECLARE_RWSEM(kclist_lock);
 static int kcore_need_update = 1;
 
 /* This doesn't grab kclist_lock, so it should only be used at init time. */
@@ -117,7 +117,7 @@ static void __kcore_update_ram(struct list_head *list)
 	struct kcore_list *tmp, *pos;
 	LIST_HEAD(garbage);
 
-	write_lock(&kclist_lock);
+	down_write(&kclist_lock);
 	if (xchg(&kcore_need_update, 0)) {
 		list_for_each_entry_safe(pos, tmp, &kclist_head, list) {
 			if (pos->type == KCORE_RAM
@@ -128,7 +128,7 @@ static void __kcore_update_ram(struct list_head *list)
 	} else
 		list_splice(list, &garbage);
 	proc_root_kcore->size = get_kcore_size(&nphdr, &size);
-	write_unlock(&kclist_lock);
+	up_write(&kclist_lock);
 
 	free_kclist_ents(&garbage);
 }
@@ -451,11 +451,11 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 	int nphdr;
 	unsigned long start;
 
-	read_lock(&kclist_lock);
+	down_read(&kclist_lock);
 	size = get_kcore_size(&nphdr, &elf_buflen);
 
 	if (buflen == 0 || *fpos >= size) {
-		read_unlock(&kclist_lock);
+		up_read(&kclist_lock);
 		return 0;
 	}
 
@@ -472,11 +472,11 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 			tsz = buflen;
 		elf_buf = kzalloc(elf_buflen, GFP_ATOMIC);
 		if (!elf_buf) {
-			read_unlock(&kclist_lock);
+			up_read(&kclist_lock);
 			return -ENOMEM;
 		}
 		elf_kcore_store_hdr(elf_buf, nphdr, elf_buflen);
-		read_unlock(&kclist_lock);
+		up_read(&kclist_lock);
 		if (copy_to_user(buffer, elf_buf + *fpos, tsz)) {
 			kfree(elf_buf);
 			return -EFAULT;
@@ -491,7 +491,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		if (buflen == 0)
 			return acc;
 	} else
-		read_unlock(&kclist_lock);
+		up_read(&kclist_lock);
 
 	/*
 	 * Check to see if our file offset matches with any of
@@ -504,12 +504,12 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 	while (buflen) {
 		struct kcore_list *m;
 
-		read_lock(&kclist_lock);
+		down_read(&kclist_lock);
 		list_for_each_entry(m, &kclist_head, list) {
 			if (start >= m->addr && start < (m->addr+m->size))
 				break;
 		}
-		read_unlock(&kclist_lock);
+		up_read(&kclist_lock);
 
 		if (&m->list == &kclist_head) {
 			if (clear_user(buffer, tsz))
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 4/9] proc/kcore: fix memory hotplug vs multiple opens race
  2018-07-25 23:59 [PATCH v4 0/9] /proc/kcore improvements Omar Sandoval
                   ` (2 preceding siblings ...)
  2018-07-25 23:59 ` [PATCH v4 3/9] proc/kcore: replace kclist_lock rwlock with rwsem Omar Sandoval
@ 2018-07-25 23:59 ` Omar Sandoval
  2018-07-25 23:59 ` [PATCH v4 5/9] proc/kcore: hold lock during read Omar Sandoval
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Omar Sandoval @ 2018-07-25 23:59 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, Andrew Morton
  Cc: Alexey Dobriyan, Eric Biederman, James Morse, Bhupesh Sharma,
	kernel-team

From: Omar Sandoval <osandov@fb.com>

There's a theoretical race condition that will cause /proc/kcore to miss
a memory hotplug event:

CPU0                              CPU1
// hotplug event 1
kcore_need_update = 1

open_kcore()                      open_kcore()
    kcore_update_ram()                kcore_update_ram()
        // Walk RAM                       // Walk RAM
        __kcore_update_ram()              __kcore_update_ram()
            kcore_need_update = 0

// hotplug event 2
kcore_need_update = 1
                                              kcore_need_update = 0

Note that CPU1 set up the RAM kcore entries with the state after hotplug
event 1 but cleared the flag for hotplug event 2. The RAM entries will
therefore be stale until there is another hotplug event.

This is an extremely unlikely sequence of events, but the fix makes the
synchronization saner, anyways: we serialize the entire update sequence,
which means that whoever clears the flag will always succeed in
replacing the kcore list.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/proc/kcore.c | 93 +++++++++++++++++++++++--------------------------
 1 file changed, 44 insertions(+), 49 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index ae43a97d511d..95aa988c5b5d 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -98,53 +98,15 @@ static size_t get_kcore_size(int *nphdr, size_t *elf_buflen)
 	return size + *elf_buflen;
 }
 
-static void free_kclist_ents(struct list_head *head)
-{
-	struct kcore_list *tmp, *pos;
-
-	list_for_each_entry_safe(pos, tmp, head, list) {
-		list_del(&pos->list);
-		kfree(pos);
-	}
-}
-/*
- * Replace all KCORE_RAM/KCORE_VMEMMAP information with passed list.
- */
-static void __kcore_update_ram(struct list_head *list)
-{
-	int nphdr;
-	size_t size;
-	struct kcore_list *tmp, *pos;
-	LIST_HEAD(garbage);
-
-	down_write(&kclist_lock);
-	if (xchg(&kcore_need_update, 0)) {
-		list_for_each_entry_safe(pos, tmp, &kclist_head, list) {
-			if (pos->type == KCORE_RAM
-				|| pos->type == KCORE_VMEMMAP)
-				list_move(&pos->list, &garbage);
-		}
-		list_splice_tail(list, &kclist_head);
-	} else
-		list_splice(list, &garbage);
-	proc_root_kcore->size = get_kcore_size(&nphdr, &size);
-	up_write(&kclist_lock);
-
-	free_kclist_ents(&garbage);
-}
-
-
 #ifdef CONFIG_HIGHMEM
 /*
  * If no highmem, we can assume [0...max_low_pfn) continuous range of memory
  * because memory hole is not as big as !HIGHMEM case.
  * (HIGHMEM is special because part of memory is _invisible_ from the kernel.)
  */
-static int kcore_update_ram(void)
+static int kcore_ram_list(struct list_head *head)
 {
-	LIST_HEAD(head);
 	struct kcore_list *ent;
-	int ret = 0;
 
 	ent = kmalloc(sizeof(*ent), GFP_KERNEL);
 	if (!ent)
@@ -152,9 +114,8 @@ static int kcore_update_ram(void)
 	ent->addr = (unsigned long)__va(0);
 	ent->size = max_low_pfn << PAGE_SHIFT;
 	ent->type = KCORE_RAM;
-	list_add(&ent->list, &head);
-	__kcore_update_ram(&head);
-	return ret;
+	list_add(&ent->list, head);
+	return 0;
 }
 
 #else /* !CONFIG_HIGHMEM */
@@ -253,11 +214,10 @@ kclist_add_private(unsigned long pfn, unsigned long nr_pages, void *arg)
 	return 1;
 }
 
-static int kcore_update_ram(void)
+static int kcore_ram_list(struct list_head *list)
 {
 	int nid, ret;
 	unsigned long end_pfn;
-	LIST_HEAD(head);
 
 	/* Not inialized....update now */
 	/* find out "max pfn" */
@@ -269,15 +229,50 @@ static int kcore_update_ram(void)
 			end_pfn = node_end;
 	}
 	/* scan 0 to max_pfn */
-	ret = walk_system_ram_range(0, end_pfn, &head, kclist_add_private);
-	if (ret) {
-		free_kclist_ents(&head);
+	ret = walk_system_ram_range(0, end_pfn, list, kclist_add_private);
+	if (ret)
 		return -ENOMEM;
+	return 0;
+}
+#endif /* CONFIG_HIGHMEM */
+
+static int kcore_update_ram(void)
+{
+	LIST_HEAD(list);
+	LIST_HEAD(garbage);
+	int nphdr;
+	size_t size;
+	struct kcore_list *tmp, *pos;
+	int ret = 0;
+
+	down_write(&kclist_lock);
+	if (!xchg(&kcore_need_update, 0))
+		goto out;
+
+	ret = kcore_ram_list(&list);
+	if (ret) {
+		/* Couldn't get the RAM list, try again next time. */
+		WRITE_ONCE(kcore_need_update, 1);
+		list_splice_tail(&list, &garbage);
+		goto out;
+	}
+
+	list_for_each_entry_safe(pos, tmp, &kclist_head, list) {
+		if (pos->type == KCORE_RAM || pos->type == KCORE_VMEMMAP)
+			list_move(&pos->list, &garbage);
+	}
+	list_splice_tail(&list, &kclist_head);
+
+	proc_root_kcore->size = get_kcore_size(&nphdr, &size);
+
+out:
+	up_write(&kclist_lock);
+	list_for_each_entry_safe(pos, tmp, &garbage, list) {
+		list_del(&pos->list);
+		kfree(pos);
 	}
-	__kcore_update_ram(&head);
 	return ret;
 }
-#endif /* CONFIG_HIGHMEM */
 
 /*****************************************************************************/
 /*
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 5/9] proc/kcore: hold lock during read
  2018-07-25 23:59 [PATCH v4 0/9] /proc/kcore improvements Omar Sandoval
                   ` (3 preceding siblings ...)
  2018-07-25 23:59 ` [PATCH v4 4/9] proc/kcore: fix memory hotplug vs multiple opens race Omar Sandoval
@ 2018-07-25 23:59 ` Omar Sandoval
  2018-07-25 23:59 ` [PATCH v4 6/9] proc/kcore: clean up ELF header generation Omar Sandoval
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Omar Sandoval @ 2018-07-25 23:59 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, Andrew Morton
  Cc: Alexey Dobriyan, Eric Biederman, James Morse, Bhupesh Sharma,
	kernel-team

From: Omar Sandoval <osandov@fb.com>

Now that we're using an rwsem, we can hold it during the entirety of
read_kcore() and have a common return path. This is preparation for the
next change.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/proc/kcore.c | 70 ++++++++++++++++++++++++++++---------------------
 1 file changed, 40 insertions(+), 30 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 95aa988c5b5d..dc34642bbdb7 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -440,19 +440,18 @@ static ssize_t
 read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 {
 	char *buf = file->private_data;
-	ssize_t acc = 0;
 	size_t size, tsz;
 	size_t elf_buflen;
 	int nphdr;
 	unsigned long start;
+	size_t orig_buflen = buflen;
+	int ret = 0;
 
 	down_read(&kclist_lock);
 	size = get_kcore_size(&nphdr, &elf_buflen);
 
-	if (buflen == 0 || *fpos >= size) {
-		up_read(&kclist_lock);
-		return 0;
-	}
+	if (buflen == 0 || *fpos >= size)
+		goto out;
 
 	/* trim buflen to not go beyond EOF */
 	if (buflen > size - *fpos)
@@ -465,28 +464,26 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		tsz = elf_buflen - *fpos;
 		if (buflen < tsz)
 			tsz = buflen;
-		elf_buf = kzalloc(elf_buflen, GFP_ATOMIC);
+		elf_buf = kzalloc(elf_buflen, GFP_KERNEL);
 		if (!elf_buf) {
-			up_read(&kclist_lock);
-			return -ENOMEM;
+			ret = -ENOMEM;
+			goto out;
 		}
 		elf_kcore_store_hdr(elf_buf, nphdr, elf_buflen);
-		up_read(&kclist_lock);
 		if (copy_to_user(buffer, elf_buf + *fpos, tsz)) {
 			kfree(elf_buf);
-			return -EFAULT;
+			ret = -EFAULT;
+			goto out;
 		}
 		kfree(elf_buf);
 		buflen -= tsz;
 		*fpos += tsz;
 		buffer += tsz;
-		acc += tsz;
 
 		/* leave now if filled buffer already */
 		if (buflen == 0)
-			return acc;
-	} else
-		up_read(&kclist_lock);
+			goto out;
+	}
 
 	/*
 	 * Check to see if our file offset matches with any of
@@ -499,25 +496,29 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 	while (buflen) {
 		struct kcore_list *m;
 
-		down_read(&kclist_lock);
 		list_for_each_entry(m, &kclist_head, list) {
 			if (start >= m->addr && start < (m->addr+m->size))
 				break;
 		}
-		up_read(&kclist_lock);
 
 		if (&m->list == &kclist_head) {
-			if (clear_user(buffer, tsz))
-				return -EFAULT;
+			if (clear_user(buffer, tsz)) {
+				ret = -EFAULT;
+				goto out;
+			}
 		} else if (m->type == KCORE_VMALLOC) {
 			vread(buf, (char *)start, tsz);
 			/* we have to zero-fill user buffer even if no read */
-			if (copy_to_user(buffer, buf, tsz))
-				return -EFAULT;
+			if (copy_to_user(buffer, buf, tsz)) {
+				ret = -EFAULT;
+				goto out;
+			}
 		} else if (m->type == KCORE_USER) {
 			/* User page is handled prior to normal kernel page: */
-			if (copy_to_user(buffer, (char *)start, tsz))
-				return -EFAULT;
+			if (copy_to_user(buffer, (char *)start, tsz)) {
+				ret = -EFAULT;
+				goto out;
+			}
 		} else {
 			if (kern_addr_valid(start)) {
 				/*
@@ -525,26 +526,35 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 				 * hardened user copy kernel text checks.
 				 */
 				if (probe_kernel_read(buf, (void *) start, tsz)) {
-					if (clear_user(buffer, tsz))
-						return -EFAULT;
+					if (clear_user(buffer, tsz)) {
+						ret = -EFAULT;
+						goto out;
+					}
 				} else {
-					if (copy_to_user(buffer, buf, tsz))
-						return -EFAULT;
+					if (copy_to_user(buffer, buf, tsz)) {
+						ret = -EFAULT;
+						goto out;
+					}
 				}
 			} else {
-				if (clear_user(buffer, tsz))
-					return -EFAULT;
+				if (clear_user(buffer, tsz)) {
+					ret = -EFAULT;
+					goto out;
+				}
 			}
 		}
 		buflen -= tsz;
 		*fpos += tsz;
 		buffer += tsz;
-		acc += tsz;
 		start += tsz;
 		tsz = (buflen > PAGE_SIZE ? PAGE_SIZE : buflen);
 	}
 
-	return acc;
+out:
+	up_read(&kclist_lock);
+	if (ret)
+		return ret;
+	return orig_buflen - buflen;
 }
 
 
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 6/9] proc/kcore: clean up ELF header generation
  2018-07-25 23:59 [PATCH v4 0/9] /proc/kcore improvements Omar Sandoval
                   ` (4 preceding siblings ...)
  2018-07-25 23:59 ` [PATCH v4 5/9] proc/kcore: hold lock during read Omar Sandoval
@ 2018-07-25 23:59 ` Omar Sandoval
  2018-07-25 23:59 ` [PATCH v4 7/9] proc/kcore: optimize multiple page reads Omar Sandoval
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Omar Sandoval @ 2018-07-25 23:59 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, Andrew Morton
  Cc: Alexey Dobriyan, Eric Biederman, James Morse, Bhupesh Sharma,
	kernel-team

From: Omar Sandoval <osandov@fb.com>

Currently, the ELF file header, program headers, and note segment are
allocated all at once, in some icky code dating back to 2.3. Programs
tend to read the file header, then the program headers, then the note
segment, all separately, so this is a waste of effort. It's cleaner and
more efficient to handle the three separately.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/proc/kcore.c | 350 +++++++++++++++++++-----------------------------
 1 file changed, 141 insertions(+), 209 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index dc34642bbdb7..808ef9afd084 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -49,15 +49,6 @@ static struct proc_dir_entry *proc_root_kcore;
 #define	kc_offset_to_vaddr(o) ((o) + PAGE_OFFSET)
 #endif
 
-/* An ELF note in memory */
-struct memelfnote
-{
-	const char *name;
-	int type;
-	unsigned int datasz;
-	void *data;
-};
-
 static LIST_HEAD(kclist_head);
 static DECLARE_RWSEM(kclist_lock);
 static int kcore_need_update = 1;
@@ -73,7 +64,8 @@ void __init kclist_add(struct kcore_list *new, void *addr, size_t size,
 	list_add_tail(&new->list, &kclist_head);
 }
 
-static size_t get_kcore_size(int *nphdr, size_t *elf_buflen)
+static size_t get_kcore_size(int *nphdr, size_t *phdrs_len, size_t *notes_len,
+			     size_t *data_offset)
 {
 	size_t try, size;
 	struct kcore_list *m;
@@ -87,15 +79,15 @@ static size_t get_kcore_size(int *nphdr, size_t *elf_buflen)
 			size = try;
 		*nphdr = *nphdr + 1;
 	}
-	*elf_buflen =	sizeof(struct elfhdr) + 
-			(*nphdr + 2)*sizeof(struct elf_phdr) + 
-			3 * ((sizeof(struct elf_note)) +
-			     roundup(sizeof(CORE_STR), 4)) +
-			roundup(sizeof(struct elf_prstatus), 4) +
-			roundup(sizeof(struct elf_prpsinfo), 4) +
-			roundup(arch_task_struct_size, 4);
-	*elf_buflen = PAGE_ALIGN(*elf_buflen);
-	return size + *elf_buflen;
+
+	*phdrs_len = *nphdr * sizeof(struct elf_phdr);
+	*notes_len = (3 * (sizeof(struct elf_note) + ALIGN(sizeof(CORE_STR), 4)) +
+		      ALIGN(sizeof(struct elf_prstatus), 4) +
+		      ALIGN(sizeof(struct elf_prpsinfo), 4) +
+		      ALIGN(arch_task_struct_size, 4));
+	*data_offset = PAGE_ALIGN(sizeof(struct elfhdr) + *phdrs_len +
+				  *notes_len);
+	return *data_offset + size;
 }
 
 #ifdef CONFIG_HIGHMEM
@@ -241,7 +233,7 @@ static int kcore_update_ram(void)
 	LIST_HEAD(list);
 	LIST_HEAD(garbage);
 	int nphdr;
-	size_t size;
+	size_t phdrs_len, notes_len, data_offset;
 	struct kcore_list *tmp, *pos;
 	int ret = 0;
 
@@ -263,7 +255,8 @@ static int kcore_update_ram(void)
 	}
 	list_splice_tail(&list, &kclist_head);
 
-	proc_root_kcore->size = get_kcore_size(&nphdr, &size);
+	proc_root_kcore->size = get_kcore_size(&nphdr, &phdrs_len, &notes_len,
+					       &data_offset);
 
 out:
 	up_write(&kclist_lock);
@@ -274,228 +267,168 @@ static int kcore_update_ram(void)
 	return ret;
 }
 
-/*****************************************************************************/
-/*
- * determine size of ELF note
- */
-static int notesize(struct memelfnote *en)
+static void append_kcore_note(char *notes, size_t *i, const char *name,
+			      unsigned int type, const void *desc,
+			      size_t descsz)
 {
-	int sz;
-
-	sz = sizeof(struct elf_note);
-	sz += roundup((strlen(en->name) + 1), 4);
-	sz += roundup(en->datasz, 4);
-
-	return sz;
-} /* end notesize() */
-
-/*****************************************************************************/
-/*
- * store a note in the header buffer
- */
-static char *storenote(struct memelfnote *men, char *bufp)
-{
-	struct elf_note en;
-
-#define DUMP_WRITE(addr,nr) do { memcpy(bufp,addr,nr); bufp += nr; } while(0)
-
-	en.n_namesz = strlen(men->name) + 1;
-	en.n_descsz = men->datasz;
-	en.n_type = men->type;
-
-	DUMP_WRITE(&en, sizeof(en));
-	DUMP_WRITE(men->name, en.n_namesz);
-
-	/* XXX - cast from long long to long to avoid need for libgcc.a */
-	bufp = (char*) roundup((unsigned long)bufp,4);
-	DUMP_WRITE(men->data, men->datasz);
-	bufp = (char*) roundup((unsigned long)bufp,4);
-
-#undef DUMP_WRITE
-
-	return bufp;
-} /* end storenote() */
-
-/*
- * store an ELF coredump header in the supplied buffer
- * nphdr is the number of elf_phdr to insert
- */
-static void elf_kcore_store_hdr(char *bufp, int nphdr, int dataoff)
-{
-	struct elf_prstatus prstatus;	/* NT_PRSTATUS */
-	struct elf_prpsinfo prpsinfo;	/* NT_PRPSINFO */
-	struct elf_phdr *nhdr, *phdr;
-	struct elfhdr *elf;
-	struct memelfnote notes[3];
-	off_t offset = 0;
-	struct kcore_list *m;
-
-	/* setup ELF header */
-	elf = (struct elfhdr *) bufp;
-	bufp += sizeof(struct elfhdr);
-	offset += sizeof(struct elfhdr);
-	memcpy(elf->e_ident, ELFMAG, SELFMAG);
-	elf->e_ident[EI_CLASS]	= ELF_CLASS;
-	elf->e_ident[EI_DATA]	= ELF_DATA;
-	elf->e_ident[EI_VERSION]= EV_CURRENT;
-	elf->e_ident[EI_OSABI] = ELF_OSABI;
-	memset(elf->e_ident+EI_PAD, 0, EI_NIDENT-EI_PAD);
-	elf->e_type	= ET_CORE;
-	elf->e_machine	= ELF_ARCH;
-	elf->e_version	= EV_CURRENT;
-	elf->e_entry	= 0;
-	elf->e_phoff	= sizeof(struct elfhdr);
-	elf->e_shoff	= 0;
-	elf->e_flags	= ELF_CORE_EFLAGS;
-	elf->e_ehsize	= sizeof(struct elfhdr);
-	elf->e_phentsize= sizeof(struct elf_phdr);
-	elf->e_phnum	= nphdr;
-	elf->e_shentsize= 0;
-	elf->e_shnum	= 0;
-	elf->e_shstrndx	= 0;
-
-	/* setup ELF PT_NOTE program header */
-	nhdr = (struct elf_phdr *) bufp;
-	bufp += sizeof(struct elf_phdr);
-	offset += sizeof(struct elf_phdr);
-	nhdr->p_type	= PT_NOTE;
-	nhdr->p_offset	= 0;
-	nhdr->p_vaddr	= 0;
-	nhdr->p_paddr	= 0;
-	nhdr->p_filesz	= 0;
-	nhdr->p_memsz	= 0;
-	nhdr->p_flags	= 0;
-	nhdr->p_align	= 0;
-
-	/* setup ELF PT_LOAD program header for every area */
-	list_for_each_entry(m, &kclist_head, list) {
-		phdr = (struct elf_phdr *) bufp;
-		bufp += sizeof(struct elf_phdr);
-		offset += sizeof(struct elf_phdr);
-
-		phdr->p_type	= PT_LOAD;
-		phdr->p_flags	= PF_R|PF_W|PF_X;
-		phdr->p_offset	= kc_vaddr_to_offset(m->addr) + dataoff;
-		phdr->p_vaddr	= (size_t)m->addr;
-		if (m->type == KCORE_RAM)
-			phdr->p_paddr	= __pa(m->addr);
-		else if (m->type == KCORE_TEXT)
-			phdr->p_paddr	= __pa_symbol(m->addr);
-		else
-			phdr->p_paddr	= (elf_addr_t)-1;
-		phdr->p_filesz	= phdr->p_memsz	= m->size;
-		phdr->p_align	= PAGE_SIZE;
-	}
-
-	/*
-	 * Set up the notes in similar form to SVR4 core dumps made
-	 * with info from their /proc.
-	 */
-	nhdr->p_offset	= offset;
-
-	/* set up the process status */
-	notes[0].name = CORE_STR;
-	notes[0].type = NT_PRSTATUS;
-	notes[0].datasz = sizeof(struct elf_prstatus);
-	notes[0].data = &prstatus;
-
-	memset(&prstatus, 0, sizeof(struct elf_prstatus));
-
-	nhdr->p_filesz	= notesize(&notes[0]);
-	bufp = storenote(&notes[0], bufp);
-
-	/* set up the process info */
-	notes[1].name	= CORE_STR;
-	notes[1].type	= NT_PRPSINFO;
-	notes[1].datasz	= sizeof(struct elf_prpsinfo);
-	notes[1].data	= &prpsinfo;
-
-	memset(&prpsinfo, 0, sizeof(struct elf_prpsinfo));
-	prpsinfo.pr_state	= 0;
-	prpsinfo.pr_sname	= 'R';
-	prpsinfo.pr_zomb	= 0;
-
-	strcpy(prpsinfo.pr_fname, "vmlinux");
-	strlcpy(prpsinfo.pr_psargs, saved_command_line, sizeof(prpsinfo.pr_psargs));
-
-	nhdr->p_filesz	+= notesize(&notes[1]);
-	bufp = storenote(&notes[1], bufp);
-
-	/* set up the task structure */
-	notes[2].name	= CORE_STR;
-	notes[2].type	= NT_TASKSTRUCT;
-	notes[2].datasz	= arch_task_struct_size;
-	notes[2].data	= current;
-
-	nhdr->p_filesz	+= notesize(&notes[2]);
-	bufp = storenote(&notes[2], bufp);
-
-} /* end elf_kcore_store_hdr() */
+	struct elf_note *note = (struct elf_note *)&notes[*i];
+
+	note->n_namesz = strlen(name) + 1;
+	note->n_descsz = descsz;
+	note->n_type = type;
+	*i += sizeof(*note);
+	memcpy(&notes[*i], name, note->n_namesz);
+	*i = ALIGN(*i + note->n_namesz, 4);
+	memcpy(&notes[*i], desc, descsz);
+	*i = ALIGN(*i + descsz, 4);
+}
 
-/*****************************************************************************/
-/*
- * read from the ELF header and then kernel memory
- */
 static ssize_t
 read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 {
 	char *buf = file->private_data;
-	size_t size, tsz;
-	size_t elf_buflen;
+	size_t phdrs_offset, notes_offset, data_offset;
+	size_t phdrs_len, notes_len;
+	struct kcore_list *m;
+	size_t tsz;
 	int nphdr;
 	unsigned long start;
 	size_t orig_buflen = buflen;
 	int ret = 0;
 
 	down_read(&kclist_lock);
-	size = get_kcore_size(&nphdr, &elf_buflen);
 
-	if (buflen == 0 || *fpos >= size)
-		goto out;
+	get_kcore_size(&nphdr, &phdrs_len, &notes_len, &data_offset);
+	phdrs_offset = sizeof(struct elfhdr);
+	notes_offset = phdrs_offset + phdrs_len;
+
+	/* ELF file header. */
+	if (buflen && *fpos < sizeof(struct elfhdr)) {
+		struct elfhdr ehdr = {
+			.e_ident = {
+				[EI_MAG0] = ELFMAG0,
+				[EI_MAG1] = ELFMAG1,
+				[EI_MAG2] = ELFMAG2,
+				[EI_MAG3] = ELFMAG3,
+				[EI_CLASS] = ELF_CLASS,
+				[EI_DATA] = ELF_DATA,
+				[EI_VERSION] = EV_CURRENT,
+				[EI_OSABI] = ELF_OSABI,
+			},
+			.e_type = ET_CORE,
+			.e_machine = ELF_ARCH,
+			.e_version = EV_CURRENT,
+			.e_phoff = sizeof(struct elfhdr),
+			.e_flags = ELF_CORE_EFLAGS,
+			.e_ehsize = sizeof(struct elfhdr),
+			.e_phentsize = sizeof(struct elf_phdr),
+			.e_phnum = nphdr,
+		};
+
+		tsz = min_t(size_t, buflen, sizeof(struct elfhdr) - *fpos);
+		if (copy_to_user(buffer, (char *)&ehdr + *fpos, tsz)) {
+			ret = -EFAULT;
+			goto out;
+		}
 
-	/* trim buflen to not go beyond EOF */
-	if (buflen > size - *fpos)
-		buflen = size - *fpos;
+		buffer += tsz;
+		buflen -= tsz;
+		*fpos += tsz;
+	}
 
-	/* construct an ELF core header if we'll need some of it */
-	if (*fpos < elf_buflen) {
-		char * elf_buf;
+	/* ELF program headers. */
+	if (buflen && *fpos < phdrs_offset + phdrs_len) {
+		struct elf_phdr *phdrs, *phdr;
 
-		tsz = elf_buflen - *fpos;
-		if (buflen < tsz)
-			tsz = buflen;
-		elf_buf = kzalloc(elf_buflen, GFP_KERNEL);
-		if (!elf_buf) {
+		phdrs = kzalloc(phdrs_len, GFP_KERNEL);
+		if (!phdrs) {
 			ret = -ENOMEM;
 			goto out;
 		}
-		elf_kcore_store_hdr(elf_buf, nphdr, elf_buflen);
-		if (copy_to_user(buffer, elf_buf + *fpos, tsz)) {
-			kfree(elf_buf);
+
+		phdrs[0].p_type = PT_NOTE;
+		phdrs[0].p_offset = notes_offset;
+		phdrs[0].p_filesz = notes_len;
+
+		phdr = &phdrs[1];
+		list_for_each_entry(m, &kclist_head, list) {
+			phdr->p_type = PT_LOAD;
+			phdr->p_flags = PF_R | PF_W | PF_X;
+			phdr->p_offset = kc_vaddr_to_offset(m->addr) + data_offset;
+			phdr->p_vaddr = (size_t)m->addr;
+			if (m->type == KCORE_RAM)
+				phdr->p_paddr = __pa(m->addr);
+			else if (m->type == KCORE_TEXT)
+				phdr->p_paddr = __pa_symbol(m->addr);
+			else
+				phdr->p_paddr = (elf_addr_t)-1;
+			phdr->p_filesz = phdr->p_memsz = m->size;
+			phdr->p_align = PAGE_SIZE;
+			phdr++;
+		}
+
+		tsz = min_t(size_t, buflen, phdrs_offset + phdrs_len - *fpos);
+		if (copy_to_user(buffer, (char *)phdrs + *fpos - phdrs_offset,
+				 tsz)) {
+			kfree(phdrs);
 			ret = -EFAULT;
 			goto out;
 		}
-		kfree(elf_buf);
+		kfree(phdrs);
+
+		buffer += tsz;
 		buflen -= tsz;
 		*fpos += tsz;
-		buffer += tsz;
+	}
+
+	/* ELF note segment. */
+	if (buflen && *fpos < notes_offset + notes_len) {
+		struct elf_prstatus prstatus = {};
+		struct elf_prpsinfo prpsinfo = {
+			.pr_sname = 'R',
+			.pr_fname = "vmlinux",
+		};
+		char *notes;
+		size_t i = 0;
+
+		strlcpy(prpsinfo.pr_psargs, saved_command_line,
+			sizeof(prpsinfo.pr_psargs));
+
+		notes = kzalloc(notes_len, GFP_KERNEL);
+		if (!notes) {
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		append_kcore_note(notes, &i, CORE_STR, NT_PRSTATUS, &prstatus,
+				  sizeof(prstatus));
+		append_kcore_note(notes, &i, CORE_STR, NT_PRPSINFO, &prpsinfo,
+				  sizeof(prpsinfo));
+		append_kcore_note(notes, &i, CORE_STR, NT_TASKSTRUCT, current,
+				  arch_task_struct_size);
 
-		/* leave now if filled buffer already */
-		if (buflen == 0)
+		tsz = min_t(size_t, buflen, notes_offset + notes_len - *fpos);
+		if (copy_to_user(buffer, notes + *fpos - notes_offset, tsz)) {
+			kfree(notes);
+			ret = -EFAULT;
 			goto out;
+		}
+		kfree(notes);
+
+		buffer += tsz;
+		buflen -= tsz;
+		*fpos += tsz;
 	}
 
 	/*
 	 * Check to see if our file offset matches with any of
 	 * the addresses in the elf_phdr on our list.
 	 */
-	start = kc_offset_to_vaddr(*fpos - elf_buflen);
+	start = kc_offset_to_vaddr(*fpos - data_offset);
 	if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen)
 		tsz = buflen;
-		
-	while (buflen) {
-		struct kcore_list *m;
 
+	while (buflen) {
 		list_for_each_entry(m, &kclist_head, list) {
 			if (start >= m->addr && start < (m->addr+m->size))
 				break;
@@ -557,7 +490,6 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 	return orig_buflen - buflen;
 }
 
-
 static int open_kcore(struct inode *inode, struct file *filp)
 {
 	if (!capable(CAP_SYS_RAWIO))
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 7/9] proc/kcore: optimize multiple page reads
  2018-07-25 23:59 [PATCH v4 0/9] /proc/kcore improvements Omar Sandoval
                   ` (5 preceding siblings ...)
  2018-07-25 23:59 ` [PATCH v4 6/9] proc/kcore: clean up ELF header generation Omar Sandoval
@ 2018-07-25 23:59 ` Omar Sandoval
  2018-07-25 23:59 ` [PATCH v4 8/9] crash_core: use VMCOREINFO_SYMBOL_ARRAY() for swapper_pg_dir Omar Sandoval
  2018-07-25 23:59 ` [PATCH v4 9/9] proc/kcore: add vmcoreinfo note to /proc/kcore Omar Sandoval
  8 siblings, 0 replies; 11+ messages in thread
From: Omar Sandoval @ 2018-07-25 23:59 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, Andrew Morton
  Cc: Alexey Dobriyan, Eric Biederman, James Morse, Bhupesh Sharma,
	kernel-team

From: Omar Sandoval <osandov@fb.com>

The current code does a full search of the segment list every time for
every page. This is wasteful, since it's almost certain that the next
page will be in the same segment. Instead, check if the previous segment
covers the current page before doing the list search.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/proc/kcore.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 808ef9afd084..758c14e46a44 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -428,10 +428,18 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 	if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen)
 		tsz = buflen;
 
+	m = NULL;
 	while (buflen) {
-		list_for_each_entry(m, &kclist_head, list) {
-			if (start >= m->addr && start < (m->addr+m->size))
-				break;
+		/*
+		 * If this is the first iteration or the address is not within
+		 * the previous entry, search for a matching entry.
+		 */
+		if (!m || start < m->addr || start >= m->addr + m->size) {
+			list_for_each_entry(m, &kclist_head, list) {
+				if (start >= m->addr &&
+				    start < m->addr + m->size)
+					break;
+			}
 		}
 
 		if (&m->list == &kclist_head) {
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 8/9] crash_core: use VMCOREINFO_SYMBOL_ARRAY() for swapper_pg_dir
  2018-07-25 23:59 [PATCH v4 0/9] /proc/kcore improvements Omar Sandoval
                   ` (6 preceding siblings ...)
  2018-07-25 23:59 ` [PATCH v4 7/9] proc/kcore: optimize multiple page reads Omar Sandoval
@ 2018-07-25 23:59 ` Omar Sandoval
  2018-07-25 23:59 ` [PATCH v4 9/9] proc/kcore: add vmcoreinfo note to /proc/kcore Omar Sandoval
  8 siblings, 0 replies; 11+ messages in thread
From: Omar Sandoval @ 2018-07-25 23:59 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, Andrew Morton
  Cc: Alexey Dobriyan, Eric Biederman, James Morse, Bhupesh Sharma,
	kernel-team

From: Omar Sandoval <osandov@fb.com>

This is preparation for allowing CRASH_CORE to be enabled for any
architecture.

swapper_pg_dir is always either an array or a macro expanding to NULL.
In the latter case, VMCOREINFO_SYMBOL() won't work, as it tries to take
the address of the given symbol:

	#define VMCOREINFO_SYMBOL(name) \
		vmcoreinfo_append_str("SYMBOL(%s)=%lx\n", #name, (unsigned long)&name)

Instead, use VMCOREINFO_SYMBOL_ARRAY(), which uses the value:

	#define VMCOREINFO_SYMBOL_ARRAY(name) \
		vmcoreinfo_append_str("SYMBOL(%s)=%lx\n", #name, (unsigned long)name)

This is the same thing for the array case but isn't an error for the
macro case.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 kernel/crash_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index b66aced5e8c2..e7b4025c7b24 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -401,7 +401,7 @@ static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_SYMBOL(init_uts_ns);
 	VMCOREINFO_SYMBOL(node_online_map);
 #ifdef CONFIG_MMU
-	VMCOREINFO_SYMBOL(swapper_pg_dir);
+	VMCOREINFO_SYMBOL_ARRAY(swapper_pg_dir);
 #endif
 	VMCOREINFO_SYMBOL(_stext);
 	VMCOREINFO_SYMBOL(vmap_area_list);
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 9/9] proc/kcore: add vmcoreinfo note to /proc/kcore
  2018-07-25 23:59 [PATCH v4 0/9] /proc/kcore improvements Omar Sandoval
                   ` (7 preceding siblings ...)
  2018-07-25 23:59 ` [PATCH v4 8/9] crash_core: use VMCOREINFO_SYMBOL_ARRAY() for swapper_pg_dir Omar Sandoval
@ 2018-07-25 23:59 ` Omar Sandoval
  8 siblings, 0 replies; 11+ messages in thread
From: Omar Sandoval @ 2018-07-25 23:59 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, Andrew Morton
  Cc: Alexey Dobriyan, Eric Biederman, James Morse, Bhupesh Sharma,
	kernel-team

From: Omar Sandoval <osandov@fb.com>

The vmcoreinfo information is useful for runtime debugging tools, not
just for crash dumps. A lot of this information can be determined by
other means, but this is much more convenient, and it only adds a page
at most to the file.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/proc/Kconfig            |  1 +
 fs/proc/kcore.c            | 18 ++++++++++++++++--
 include/linux/crash_core.h |  2 ++
 kernel/crash_core.c        |  4 ++--
 4 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
index 0eaeb41453f5..817c02b13b1d 100644
--- a/fs/proc/Kconfig
+++ b/fs/proc/Kconfig
@@ -31,6 +31,7 @@ config PROC_FS
 config PROC_KCORE
 	bool "/proc/kcore support" if !ARM
 	depends on PROC_FS && MMU
+	select CRASH_CORE
 	help
 	  Provides a virtual ELF core file of the live kernel.  This can
 	  be read with gdb and other ELF tools.  No modifications can be
diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 758c14e46a44..80464432dfe6 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -10,6 +10,7 @@
  *	Safe accesses to vmalloc/direct-mapped discontiguous areas, Kanoj Sarcar <kanoj@sgi.com>
  */
 
+#include <linux/crash_core.h>
 #include <linux/mm.h>
 #include <linux/proc_fs.h>
 #include <linux/kcore.h>
@@ -81,10 +82,13 @@ static size_t get_kcore_size(int *nphdr, size_t *phdrs_len, size_t *notes_len,
 	}
 
 	*phdrs_len = *nphdr * sizeof(struct elf_phdr);
-	*notes_len = (3 * (sizeof(struct elf_note) + ALIGN(sizeof(CORE_STR), 4)) +
+	*notes_len = (4 * sizeof(struct elf_note) +
+		      3 * ALIGN(sizeof(CORE_STR), 4) +
+		      VMCOREINFO_NOTE_NAME_BYTES +
 		      ALIGN(sizeof(struct elf_prstatus), 4) +
 		      ALIGN(sizeof(struct elf_prpsinfo), 4) +
-		      ALIGN(arch_task_struct_size, 4));
+		      ALIGN(arch_task_struct_size, 4) +
+		      ALIGN(vmcoreinfo_size, 4));
 	*data_offset = PAGE_ALIGN(sizeof(struct elfhdr) + *phdrs_len +
 				  *notes_len);
 	return *data_offset + size;
@@ -406,6 +410,16 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 				  sizeof(prpsinfo));
 		append_kcore_note(notes, &i, CORE_STR, NT_TASKSTRUCT, current,
 				  arch_task_struct_size);
+		/*
+		 * vmcoreinfo_size is mostly constant after init time, but it
+		 * can be changed by crash_save_vmcoreinfo(). Racing here with a
+		 * panic on another CPU before the machine goes down is insanely
+		 * unlikely, but it's better to not leave potential buffer
+		 * overflows lying around, regardless.
+		 */
+		append_kcore_note(notes, &i, VMCOREINFO_NOTE_NAME, 0,
+				  vmcoreinfo_data,
+				  min(vmcoreinfo_size, notes_len - i));
 
 		tsz = min_t(size_t, buflen, notes_offset + notes_len - *fpos);
 		if (copy_to_user(buffer, notes + *fpos - notes_offset, tsz)) {
diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index b511f6d24b42..525510a9f965 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -60,6 +60,8 @@ phys_addr_t paddr_vmcoreinfo_note(void);
 #define VMCOREINFO_CONFIG(name) \
 	vmcoreinfo_append_str("CONFIG_%s=y\n", #name)
 
+extern unsigned char *vmcoreinfo_data;
+extern size_t vmcoreinfo_size;
 extern u32 *vmcoreinfo_note;
 
 Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index e7b4025c7b24..a683bfb0530f 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -14,8 +14,8 @@
 #include <asm/sections.h>
 
 /* vmcoreinfo stuff */
-static unsigned char *vmcoreinfo_data;
-static size_t vmcoreinfo_size;
+unsigned char *vmcoreinfo_data;
+size_t vmcoreinfo_size;
 u32 *vmcoreinfo_note;
 
 /* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 1/9] proc/kcore: don't grab lock for kclist_add()
  2018-07-25 23:59 ` [PATCH v4 1/9] proc/kcore: don't grab lock for kclist_add() Omar Sandoval
@ 2018-08-07  5:05   ` Bhupesh Sharma
  0 siblings, 0 replies; 11+ messages in thread
From: Bhupesh Sharma @ 2018-08-07  5:05 UTC (permalink / raw)
  To: Omar Sandoval, linux-kernel, linux-fsdevel, Andrew Morton
  Cc: Alexey Dobriyan, Eric Biederman, James Morse, kernel-team

Hello Omar,

On 07/26/2018 05:29 AM, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> kclist_add() is only called at init time, so there's no point in
> grabbing any locks. We're also going to replace the rwlock with a rwsem,
> which we don't want to try grabbing during early boot.
> 
> While we're here, mark kclist_add() with __init so that we'll get a
> warning if it's called from non-init code.
> 
> Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>   fs/proc/kcore.c       | 7 +++----
>   include/linux/kcore.h | 2 +-
>   2 files changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
> index 66c373230e60..b0b9a76f28d6 100644
> --- a/fs/proc/kcore.c
> +++ b/fs/proc/kcore.c
> @@ -62,16 +62,15 @@ static LIST_HEAD(kclist_head);
>   static DEFINE_RWLOCK(kclist_lock);
>   static int kcore_need_update = 1;
>   
> -void
> -kclist_add(struct kcore_list *new, void *addr, size_t size, int type)
> +/* This doesn't grab kclist_lock, so it should only be used at init time. */
> +void __init kclist_add(struct kcore_list *new, void *addr, size_t size,
> +		       int type)
>   {
>   	new->addr = (unsigned long)addr;
>   	new->size = size;
>   	new->type = type;
>   
> -	write_lock(&kclist_lock);
>   	list_add_tail(&new->list, &kclist_head);
> -	write_unlock(&kclist_lock);
>   }
>   
>   static size_t get_kcore_size(int *nphdr, size_t *elf_buflen)
> diff --git a/include/linux/kcore.h b/include/linux/kcore.h
> index 8de55e4b5ee9..c20f296438fb 100644
> --- a/include/linux/kcore.h
> +++ b/include/linux/kcore.h
> @@ -35,7 +35,7 @@ struct vmcoredd_node {
>   };
>   
>   #ifdef CONFIG_PROC_KCORE
> -extern void kclist_add(struct kcore_list *, void *, size_t, int type);
> +void __init kclist_add(struct kcore_list *, void *, size_t, int type);
>   #else
>   static inline
>   void kclist_add(struct kcore_list *new, void *addr, size_t size, int type)
> 

I have been looking at a problem on arm64 platforms where we are trying 
to get access to PHYS_OFFSET symbol (which indicates the start of 
physical RAM) in user-space for determining the start of physical RAM in 
user-space utilities like 'kexec-tools' (please see [1] and [2] for 
details).

Now, I have a 'kexec-tools' implementation available which can read the 
PHYS_OFFSET from the VMCOREINFO inside '/proc/kcore', which I plan to 
publish soon on my github tree.

I also see that 'readelf' and 'crash' can read the VMCOREINFO from the 
'/proc/kcore' properly after this patch:

# readelf -a --wide /proc/kcore

Displaying notes found at file offset 0x00000778 with length 0x000019b4:
   Owner                 Data size	Description
   CORE                 0x00000188	NT_PRSTATUS (prstatus structure)	
   CORE                 0x00000088	NT_PRPSINFO (prpsinfo structure)	
   CORE                 0x00001040	NT_TASKSTRUCT (task structure)	
   VMCOREINFO           0x00000710	Unknown note type: (0x00000000)	
<..snip..>

# crash /root/git/linux/vmlinux vmcore -d31

Elf64_Nhdr:
                n_namesz: 11 ("VMCOREINFO")
                n_descsz: 1829
                  n_type: 0 (unused)
                          OSRELEASE=4.18.0-rc7+
                          PAGESIZE=65536
                          SYMBOL(init_uts_ns)=ffff5493078a5428
                          SYMBOL(node_online_map)=ffff54930789d1c8
                          SYMBOL(swapper_pg_dir)=ffff549308380000
                          SYMBOL(_stext)=ffff549306681000
                          SYMBOL(vmap_area_list)=ffff549307944ee0
                          SYMBOL(mem_section)=ffff92047fffe400
                          LENGTH(mem_section)=64
                          SIZE(mem_section)=16
                          OFFSET(mem_section.section_mem_map)=0
                          SIZE(page)=64
                          SIZE(pglist_data)=6656
                          SIZE(zone)=1728
                          SIZE(free_area)=88
                          SIZE(list_head)=16
                          SIZE(nodemask_t)=8
                          OFFSET(page.flags)=0
                          OFFSET(page._refcount)=52
                          OFFSET(page.mapping)=24
                          OFFSET(page.lru)=8
                          OFFSET(page._mapcount)=48
                          OFFSET(page.private)=40
                          OFFSET(page.compound_dtor)=16
                          OFFSET(page.compound_order)=17
                          OFFSET(page.compound_head)=8
                          OFFSET(pglist_data.node_zones)=0
                          OFFSET(pglist_data.nr_zones)=5984
                          OFFSET(pglist_data.node_start_pfn)=5992
                          OFFSET(pglist_data.node_spanned_pages)=6008
                          OFFSET(pglist_data.node_id)=6016
                          OFFSET(zone.free_area)=192
                          OFFSET(zone.vm_stat)=1536
                          OFFSET(zone.spanned_pages)=96
                          OFFSET(free_area.free_list)=0
                          OFFSET(list_head.next)=0
                          OFFSET(list_head.prev)=8
                          OFFSET(vmap_area.va_start)=0
                          OFFSET(vmap_area.list)=48
                          LENGTH(zone.free_area)=14
                          SYMBOL(log_buf)=ffff5493078ddc30
                          SYMBOL(log_buf_len)=ffff5493078ddc28
                          SYMBOL(log_first_idx)=ffff5493081054ac
                          SYMBOL(clear_idx)=ffff5493081054b8
                          SYMBOL(log_next_idx)=ffff5493081054a8
                          SIZE(printk_log)=16
                          OFFSET(printk_log.ts_nsec)=0
                          OFFSET(printk_log.len)=8
                          OFFSET(printk_log.text_len)=10
                          OFFSET(printk_log.dict_len)=12
                          LENGTH(free_area.free_list)=5
                          NUMBER(NR_FREE_PAGES)=0
                          NUMBER(PG_lru)=5
                          NUMBER(PG_private)=12
                          NUMBER(PG_swapcache)=9
                          NUMBER(PG_swapbacked)=18
                          NUMBER(PG_slab)=8
                          NUMBER(PG_hwpoison)=21
                          NUMBER(PG_head_mask)=32768
                          NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)=-129
                          NUMBER(HUGETLB_PAGE_DTOR)=2
                          NUMBER(VA_BITS)=48
                          NUMBER(kimage_voffset)=0xffff5492f5c00000
                          NUMBER(PHYS_OFFSET)=0xffffee1380000000
                          CRASHTIME=1532965574


So, for what it is worth:

Reviewed-by and Tested-by: Bhupesh Sharma <bhsharma@redhat.com>

Thanks,
Bhupesh

[1] https://www.spinics.net/lists/kexec/msg20842.html
[2] https://www.spinics.net/lists/kexec/msg20618.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-08-07  7:18 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-25 23:59 [PATCH v4 0/9] /proc/kcore improvements Omar Sandoval
2018-07-25 23:59 ` [PATCH v4 1/9] proc/kcore: don't grab lock for kclist_add() Omar Sandoval
2018-08-07  5:05   ` Bhupesh Sharma
2018-07-25 23:59 ` [PATCH v4 2/9] proc/kcore: don't grab lock for memory hotplug notifier Omar Sandoval
2018-07-25 23:59 ` [PATCH v4 3/9] proc/kcore: replace kclist_lock rwlock with rwsem Omar Sandoval
2018-07-25 23:59 ` [PATCH v4 4/9] proc/kcore: fix memory hotplug vs multiple opens race Omar Sandoval
2018-07-25 23:59 ` [PATCH v4 5/9] proc/kcore: hold lock during read Omar Sandoval
2018-07-25 23:59 ` [PATCH v4 6/9] proc/kcore: clean up ELF header generation Omar Sandoval
2018-07-25 23:59 ` [PATCH v4 7/9] proc/kcore: optimize multiple page reads Omar Sandoval
2018-07-25 23:59 ` [PATCH v4 8/9] crash_core: use VMCOREINFO_SYMBOL_ARRAY() for swapper_pg_dir Omar Sandoval
2018-07-25 23:59 ` [PATCH v4 9/9] proc/kcore: add vmcoreinfo note to /proc/kcore Omar Sandoval

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).