linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/7] /proc/kcore improvements
@ 2018-07-13  0:09 Omar Sandoval
  2018-07-13  0:09 ` [PATCH v2 1/7] proc/kcore: don't grab lock for kclist_add() Omar Sandoval
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Omar Sandoval @ 2018-07-13  0:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, Andrew Morton
  Cc: Alexey Dobriyan, Eric Biederman, James Morse, Bhupesh Sharma,
	kernel-team

From: Omar Sandoval <osandov@fb.com>

Hi,

This series makes a few improvements to /proc/kcore. Patches 1 and 2 are
prep patches. Patch 3 is a fix/cleanup. Patch 4 is another prep patch.
Patches 5 and 6 are optimizations to ->read(). Patch 7 adds vmcoreinfo
to /proc/kcore (apparently I'm not the only one who wants this, see
https://www.spinics.net/lists/arm-kernel/msg665103.html).

I tested that the crash utility still works with this applied, and
readelf is happy with it, as well.

Andrew, since this didn't get any traction on the fsdevel side, and
you're already carrying James' patch, could you take this through -mm?

Thanks!

Changes from v1:

- Rebased onto v4.18-rc4 + James' patch
  (https://patchwork.kernel.org/patch/10519739/) in the mm tree
- Fix spurious sparse warning (see the report and response in
  https://patchwork.kernel.org/patch/10512431/)

Omar Sandoval (7):
  proc/kcore: don't grab lock for kclist_add()
  proc/kcore: replace kclist_lock rwlock with rwsem
  proc/kcore: fix memory hotplug vs multiple opens race
  proc/kcore: hold lock during read
  proc/kcore: clean up ELF header generation
  proc/kcore: optimize multiple page reads
  proc/kcore: add vmcoreinfo note to /proc/kcore

 fs/proc/Kconfig            |   1 +
 fs/proc/kcore.c            | 536 +++++++++++++++++--------------------
 include/linux/crash_core.h |   2 +
 kernel/crash_core.c        |   4 +-
 4 files changed, 251 insertions(+), 292 deletions(-)

-- 
2.18.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 1/7] proc/kcore: don't grab lock for kclist_add()
  2018-07-13  0:09 [PATCH v2 0/7] /proc/kcore improvements Omar Sandoval
@ 2018-07-13  0:09 ` Omar Sandoval
  2018-07-18  2:35   ` Andrew Morton
  2018-07-13  0:09 ` [PATCH v2 2/7] proc/kcore: replace kclist_lock rwlock with rwsem Omar Sandoval
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Omar Sandoval @ 2018-07-13  0:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, Andrew Morton
  Cc: Alexey Dobriyan, Eric Biederman, James Morse, Bhupesh Sharma,
	kernel-team

From: Omar Sandoval <osandov@fb.com>

kclist_add() is only called at init time, so there's no point in
grabbing any locks. We're also going to replace the rwlock with a rwsem,
which we don't want to try grabbing during early boot.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/proc/kcore.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 66c373230e60..ddeeb3a5a015 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -62,6 +62,7 @@ static LIST_HEAD(kclist_head);
 static DEFINE_RWLOCK(kclist_lock);
 static int kcore_need_update = 1;
 
+/* This doesn't grab kclist_lock, so it should only be used at init time. */
 void
 kclist_add(struct kcore_list *new, void *addr, size_t size, int type)
 {
@@ -69,9 +70,7 @@ kclist_add(struct kcore_list *new, void *addr, size_t size, int type)
 	new->size = size;
 	new->type = type;
 
-	write_lock(&kclist_lock);
 	list_add_tail(&new->list, &kclist_head);
-	write_unlock(&kclist_lock);
 }
 
 static size_t get_kcore_size(int *nphdr, size_t *elf_buflen)
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 2/7] proc/kcore: replace kclist_lock rwlock with rwsem
  2018-07-13  0:09 [PATCH v2 0/7] /proc/kcore improvements Omar Sandoval
  2018-07-13  0:09 ` [PATCH v2 1/7] proc/kcore: don't grab lock for kclist_add() Omar Sandoval
@ 2018-07-13  0:09 ` Omar Sandoval
  2018-07-18  2:38   ` Andrew Morton
  2018-07-13  0:09 ` [PATCH v2 3/7] proc/kcore: fix memory hotplug vs multiple opens race Omar Sandoval
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Omar Sandoval @ 2018-07-13  0:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, Andrew Morton
  Cc: Alexey Dobriyan, Eric Biederman, James Morse, Bhupesh Sharma,
	kernel-team

From: Omar Sandoval <osandov@fb.com>

Now we only need kclist_lock from user context and at fs init time, and
the following changes need to sleep while holding the kclist_lock.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/proc/kcore.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index ddeeb3a5a015..def92fccb167 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -59,8 +59,8 @@ struct memelfnote
 };
 
 static LIST_HEAD(kclist_head);
-static DEFINE_RWLOCK(kclist_lock);
-static int kcore_need_update = 1;
+static DECLARE_RWSEM(kclist_lock);
+static atomic_t kcore_need_update = ATOMIC_INIT(1);
 
 /* This doesn't grab kclist_lock, so it should only be used at init time. */
 void
@@ -117,8 +117,8 @@ static void __kcore_update_ram(struct list_head *list)
 	struct kcore_list *tmp, *pos;
 	LIST_HEAD(garbage);
 
-	write_lock(&kclist_lock);
-	if (kcore_need_update) {
+	down_write(&kclist_lock);
+	if (atomic_cmpxchg(&kcore_need_update, 1, 0)) {
 		list_for_each_entry_safe(pos, tmp, &kclist_head, list) {
 			if (pos->type == KCORE_RAM
 				|| pos->type == KCORE_VMEMMAP)
@@ -127,9 +127,8 @@ static void __kcore_update_ram(struct list_head *list)
 		list_splice_tail(list, &kclist_head);
 	} else
 		list_splice(list, &garbage);
-	kcore_need_update = 0;
 	proc_root_kcore->size = get_kcore_size(&nphdr, &size);
-	write_unlock(&kclist_lock);
+	up_write(&kclist_lock);
 
 	free_kclist_ents(&garbage);
 }
@@ -452,11 +451,11 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 	int nphdr;
 	unsigned long start;
 
-	read_lock(&kclist_lock);
+	down_read(&kclist_lock);
 	size = get_kcore_size(&nphdr, &elf_buflen);
 
 	if (buflen == 0 || *fpos >= size) {
-		read_unlock(&kclist_lock);
+		up_read(&kclist_lock);
 		return 0;
 	}
 
@@ -473,11 +472,11 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 			tsz = buflen;
 		elf_buf = kzalloc(elf_buflen, GFP_ATOMIC);
 		if (!elf_buf) {
-			read_unlock(&kclist_lock);
+			up_read(&kclist_lock);
 			return -ENOMEM;
 		}
 		elf_kcore_store_hdr(elf_buf, nphdr, elf_buflen);
-		read_unlock(&kclist_lock);
+		up_read(&kclist_lock);
 		if (copy_to_user(buffer, elf_buf + *fpos, tsz)) {
 			kfree(elf_buf);
 			return -EFAULT;
@@ -492,7 +491,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		if (buflen == 0)
 			return acc;
 	} else
-		read_unlock(&kclist_lock);
+		up_read(&kclist_lock);
 
 	/*
 	 * Check to see if our file offset matches with any of
@@ -505,12 +504,12 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 	while (buflen) {
 		struct kcore_list *m;
 
-		read_lock(&kclist_lock);
+		down_read(&kclist_lock);
 		list_for_each_entry(m, &kclist_head, list) {
 			if (start >= m->addr && start < (m->addr+m->size))
 				break;
 		}
-		read_unlock(&kclist_lock);
+		up_read(&kclist_lock);
 
 		if (&m->list == &kclist_head) {
 			if (clear_user(buffer, tsz))
@@ -563,7 +562,7 @@ static int open_kcore(struct inode *inode, struct file *filp)
 	if (!filp->private_data)
 		return -ENOMEM;
 
-	if (kcore_need_update)
+	if (atomic_read(&kcore_need_update))
 		kcore_update_ram();
 	if (i_size_read(inode) != proc_root_kcore->size) {
 		inode_lock(inode);
@@ -593,9 +592,8 @@ static int __meminit kcore_callback(struct notifier_block *self,
 	switch (action) {
 	case MEM_ONLINE:
 	case MEM_OFFLINE:
-		write_lock(&kclist_lock);
-		kcore_need_update = 1;
-		write_unlock(&kclist_lock);
+		atomic_set(&kcore_need_update, 1);
+		break;
 	}
 	return NOTIFY_OK;
 }
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 3/7] proc/kcore: fix memory hotplug vs multiple opens race
  2018-07-13  0:09 [PATCH v2 0/7] /proc/kcore improvements Omar Sandoval
  2018-07-13  0:09 ` [PATCH v2 1/7] proc/kcore: don't grab lock for kclist_add() Omar Sandoval
  2018-07-13  0:09 ` [PATCH v2 2/7] proc/kcore: replace kclist_lock rwlock with rwsem Omar Sandoval
@ 2018-07-13  0:09 ` Omar Sandoval
  2018-07-13  0:09 ` [PATCH v2 4/7] proc/kcore: hold lock during read Omar Sandoval
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Omar Sandoval @ 2018-07-13  0:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, Andrew Morton
  Cc: Alexey Dobriyan, Eric Biederman, James Morse, Bhupesh Sharma,
	kernel-team

From: Omar Sandoval <osandov@fb.com>

There's a theoretical race condition that will cause /proc/kcore to miss
a memory hotplug event:

CPU0                              CPU1
// hotplug event 1
kcore_need_update = 1

open_kcore()                      open_kcore()
    kcore_update_ram()                kcore_update_ram()
        // Walk RAM                       // Walk RAM
        __kcore_update_ram()              __kcore_update_ram()
            kcore_need_update = 0

// hotplug event 2
kcore_need_update = 1
                                              kcore_need_update = 0

Note that CPU1 set up the RAM kcore entries with the state after hotplug
event 1 but cleared the flag for hotplug event 2. The RAM entries will
therefore be stale until there is another hotplug event.

This is an extremely unlikely sequence of events, but the fix makes the
synchronization saner, anyways: we serialize the entire update sequence,
which means that whoever clears the flag will always succeed in
replacing the kcore list.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/proc/kcore.c | 93 +++++++++++++++++++++++--------------------------
 1 file changed, 44 insertions(+), 49 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index def92fccb167..33667db6e370 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -98,53 +98,15 @@ static size_t get_kcore_size(int *nphdr, size_t *elf_buflen)
 	return size + *elf_buflen;
 }
 
-static void free_kclist_ents(struct list_head *head)
-{
-	struct kcore_list *tmp, *pos;
-
-	list_for_each_entry_safe(pos, tmp, head, list) {
-		list_del(&pos->list);
-		kfree(pos);
-	}
-}
-/*
- * Replace all KCORE_RAM/KCORE_VMEMMAP information with passed list.
- */
-static void __kcore_update_ram(struct list_head *list)
-{
-	int nphdr;
-	size_t size;
-	struct kcore_list *tmp, *pos;
-	LIST_HEAD(garbage);
-
-	down_write(&kclist_lock);
-	if (atomic_cmpxchg(&kcore_need_update, 1, 0)) {
-		list_for_each_entry_safe(pos, tmp, &kclist_head, list) {
-			if (pos->type == KCORE_RAM
-				|| pos->type == KCORE_VMEMMAP)
-				list_move(&pos->list, &garbage);
-		}
-		list_splice_tail(list, &kclist_head);
-	} else
-		list_splice(list, &garbage);
-	proc_root_kcore->size = get_kcore_size(&nphdr, &size);
-	up_write(&kclist_lock);
-
-	free_kclist_ents(&garbage);
-}
-
-
 #ifdef CONFIG_HIGHMEM
 /*
  * If no highmem, we can assume [0...max_low_pfn) continuous range of memory
  * because memory hole is not as big as !HIGHMEM case.
  * (HIGHMEM is special because part of memory is _invisible_ from the kernel.)
  */
-static int kcore_update_ram(void)
+static int kcore_ram_list(struct list_head *head)
 {
-	LIST_HEAD(head);
 	struct kcore_list *ent;
-	int ret = 0;
 
 	ent = kmalloc(sizeof(*ent), GFP_KERNEL);
 	if (!ent)
@@ -152,9 +114,8 @@ static int kcore_update_ram(void)
 	ent->addr = (unsigned long)__va(0);
 	ent->size = max_low_pfn << PAGE_SHIFT;
 	ent->type = KCORE_RAM;
-	list_add(&ent->list, &head);
-	__kcore_update_ram(&head);
-	return ret;
+	list_add(&ent->list, head);
+	return 0;
 }
 
 #else /* !CONFIG_HIGHMEM */
@@ -253,11 +214,10 @@ kclist_add_private(unsigned long pfn, unsigned long nr_pages, void *arg)
 	return 1;
 }
 
-static int kcore_update_ram(void)
+static int kcore_ram_list(struct list_head *list)
 {
 	int nid, ret;
 	unsigned long end_pfn;
-	LIST_HEAD(head);
 
 	/* Not inialized....update now */
 	/* find out "max pfn" */
@@ -269,15 +229,50 @@ static int kcore_update_ram(void)
 			end_pfn = node_end;
 	}
 	/* scan 0 to max_pfn */
-	ret = walk_system_ram_range(0, end_pfn, &head, kclist_add_private);
-	if (ret) {
-		free_kclist_ents(&head);
+	ret = walk_system_ram_range(0, end_pfn, list, kclist_add_private);
+	if (ret)
 		return -ENOMEM;
+	return 0;
+}
+#endif /* CONFIG_HIGHMEM */
+
+static int kcore_update_ram(void)
+{
+	LIST_HEAD(list);
+	LIST_HEAD(garbage);
+	int nphdr;
+	size_t size;
+	struct kcore_list *tmp, *pos;
+	int ret = 0;
+
+	down_write(&kclist_lock);
+	if (!atomic_cmpxchg(&kcore_need_update, 1, 0))
+		goto out;
+
+	ret = kcore_ram_list(&list);
+	if (ret) {
+		/* Couldn't get the RAM list, try again next time. */
+		atomic_set(&kcore_need_update, 1);
+		list_splice_tail(&list, &garbage);
+		goto out;
+	}
+
+	list_for_each_entry_safe(pos, tmp, &kclist_head, list) {
+		if (pos->type == KCORE_RAM || pos->type == KCORE_VMEMMAP)
+			list_move(&pos->list, &garbage);
+	}
+	list_splice_tail(&list, &kclist_head);
+
+	proc_root_kcore->size = get_kcore_size(&nphdr, &size);
+
+out:
+	up_write(&kclist_lock);
+	list_for_each_entry_safe(pos, tmp, &garbage, list) {
+		list_del(&pos->list);
+		kfree(pos);
 	}
-	__kcore_update_ram(&head);
 	return ret;
 }
-#endif /* CONFIG_HIGHMEM */
 
 /*****************************************************************************/
 /*
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 4/7] proc/kcore: hold lock during read
  2018-07-13  0:09 [PATCH v2 0/7] /proc/kcore improvements Omar Sandoval
                   ` (2 preceding siblings ...)
  2018-07-13  0:09 ` [PATCH v2 3/7] proc/kcore: fix memory hotplug vs multiple opens race Omar Sandoval
@ 2018-07-13  0:09 ` Omar Sandoval
  2018-07-13  0:09 ` [PATCH v2 5/7] proc/kcore: clean up ELF header generation Omar Sandoval
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Omar Sandoval @ 2018-07-13  0:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, Andrew Morton
  Cc: Alexey Dobriyan, Eric Biederman, James Morse, Bhupesh Sharma,
	kernel-team

From: Omar Sandoval <osandov@fb.com>

Now that we're using an rwsem, we can hold it during the entirety of
read_kcore() and have a common return path. This is preparation for the
next change.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/proc/kcore.c | 70 ++++++++++++++++++++++++++++---------------------
 1 file changed, 40 insertions(+), 30 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 33667db6e370..f1ae848c7bcc 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -440,19 +440,18 @@ static ssize_t
 read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 {
 	char *buf = file->private_data;
-	ssize_t acc = 0;
 	size_t size, tsz;
 	size_t elf_buflen;
 	int nphdr;
 	unsigned long start;
+	size_t orig_buflen = buflen;
+	int ret = 0;
 
 	down_read(&kclist_lock);
 	size = get_kcore_size(&nphdr, &elf_buflen);
 
-	if (buflen == 0 || *fpos >= size) {
-		up_read(&kclist_lock);
-		return 0;
-	}
+	if (buflen == 0 || *fpos >= size)
+		goto out;
 
 	/* trim buflen to not go beyond EOF */
 	if (buflen > size - *fpos)
@@ -465,28 +464,26 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		tsz = elf_buflen - *fpos;
 		if (buflen < tsz)
 			tsz = buflen;
-		elf_buf = kzalloc(elf_buflen, GFP_ATOMIC);
+		elf_buf = kzalloc(elf_buflen, GFP_KERNEL);
 		if (!elf_buf) {
-			up_read(&kclist_lock);
-			return -ENOMEM;
+			ret = -ENOMEM;
+			goto out;
 		}
 		elf_kcore_store_hdr(elf_buf, nphdr, elf_buflen);
-		up_read(&kclist_lock);
 		if (copy_to_user(buffer, elf_buf + *fpos, tsz)) {
 			kfree(elf_buf);
-			return -EFAULT;
+			ret = -EFAULT;
+			goto out;
 		}
 		kfree(elf_buf);
 		buflen -= tsz;
 		*fpos += tsz;
 		buffer += tsz;
-		acc += tsz;
 
 		/* leave now if filled buffer already */
 		if (buflen == 0)
-			return acc;
-	} else
-		up_read(&kclist_lock);
+			goto out;
+	}
 
 	/*
 	 * Check to see if our file offset matches with any of
@@ -499,25 +496,29 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 	while (buflen) {
 		struct kcore_list *m;
 
-		down_read(&kclist_lock);
 		list_for_each_entry(m, &kclist_head, list) {
 			if (start >= m->addr && start < (m->addr+m->size))
 				break;
 		}
-		up_read(&kclist_lock);
 
 		if (&m->list == &kclist_head) {
-			if (clear_user(buffer, tsz))
-				return -EFAULT;
+			if (clear_user(buffer, tsz)) {
+				ret = -EFAULT;
+				goto out;
+			}
 		} else if (m->type == KCORE_VMALLOC) {
 			vread(buf, (char *)start, tsz);
 			/* we have to zero-fill user buffer even if no read */
-			if (copy_to_user(buffer, buf, tsz))
-				return -EFAULT;
+			if (copy_to_user(buffer, buf, tsz)) {
+				ret = -EFAULT;
+				goto out;
+			}
 		} else if (m->type == KCORE_USER) {
 			/* User page is handled prior to normal kernel page: */
-			if (copy_to_user(buffer, (char *)start, tsz))
-				return -EFAULT;
+			if (copy_to_user(buffer, (char *)start, tsz)) {
+				ret = -EFAULT;
+				goto out;
+			}
 		} else {
 			if (kern_addr_valid(start)) {
 				/*
@@ -525,26 +526,35 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 				 * hardened user copy kernel text checks.
 				 */
 				if (probe_kernel_read(buf, (void *) start, tsz)) {
-					if (clear_user(buffer, tsz))
-						return -EFAULT;
+					if (clear_user(buffer, tsz)) {
+						ret = -EFAULT;
+						goto out;
+					}
 				} else {
-					if (copy_to_user(buffer, buf, tsz))
-						return -EFAULT;
+					if (copy_to_user(buffer, buf, tsz)) {
+						ret = -EFAULT;
+						goto out;
+					}
 				}
 			} else {
-				if (clear_user(buffer, tsz))
-					return -EFAULT;
+				if (clear_user(buffer, tsz)) {
+					ret = -EFAULT;
+					goto out;
+				}
 			}
 		}
 		buflen -= tsz;
 		*fpos += tsz;
 		buffer += tsz;
-		acc += tsz;
 		start += tsz;
 		tsz = (buflen > PAGE_SIZE ? PAGE_SIZE : buflen);
 	}
 
-	return acc;
+out:
+	up_write(&kclist_lock);
+	if (ret)
+		return ret;
+	return orig_buflen - buflen;
 }
 
 
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 5/7] proc/kcore: clean up ELF header generation
  2018-07-13  0:09 [PATCH v2 0/7] /proc/kcore improvements Omar Sandoval
                   ` (3 preceding siblings ...)
  2018-07-13  0:09 ` [PATCH v2 4/7] proc/kcore: hold lock during read Omar Sandoval
@ 2018-07-13  0:09 ` Omar Sandoval
  2018-07-13  0:09 ` [PATCH v2 6/7] proc/kcore: optimize multiple page reads Omar Sandoval
  2018-07-13  0:09 ` [PATCH v2 7/7] proc/kcore: add vmcoreinfo note to /proc/kcore Omar Sandoval
  6 siblings, 0 replies; 16+ messages in thread
From: Omar Sandoval @ 2018-07-13  0:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, Andrew Morton
  Cc: Alexey Dobriyan, Eric Biederman, James Morse, Bhupesh Sharma,
	kernel-team

From: Omar Sandoval <osandov@fb.com>

Currently, the ELF file header, program headers, and note segment are
allocated all at once, in some icky code dating back to 2.3. Programs
tend to read the file header, then the program headers, then the note
segment, all separately, so this is a waste of effort. It's cleaner and
more efficient to handle the three separately.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/proc/kcore.c | 350 +++++++++++++++++++-----------------------------
 1 file changed, 141 insertions(+), 209 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index f1ae848c7bcc..a7e730b40154 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -49,15 +49,6 @@ static struct proc_dir_entry *proc_root_kcore;
 #define	kc_offset_to_vaddr(o) ((o) + PAGE_OFFSET)
 #endif
 
-/* An ELF note in memory */
-struct memelfnote
-{
-	const char *name;
-	int type;
-	unsigned int datasz;
-	void *data;
-};
-
 static LIST_HEAD(kclist_head);
 static DECLARE_RWSEM(kclist_lock);
 static atomic_t kcore_need_update = ATOMIC_INIT(1);
@@ -73,7 +64,8 @@ kclist_add(struct kcore_list *new, void *addr, size_t size, int type)
 	list_add_tail(&new->list, &kclist_head);
 }
 
-static size_t get_kcore_size(int *nphdr, size_t *elf_buflen)
+static size_t get_kcore_size(int *nphdr, size_t *phdrs_len, size_t *notes_len,
+			     size_t *data_offset)
 {
 	size_t try, size;
 	struct kcore_list *m;
@@ -87,15 +79,15 @@ static size_t get_kcore_size(int *nphdr, size_t *elf_buflen)
 			size = try;
 		*nphdr = *nphdr + 1;
 	}
-	*elf_buflen =	sizeof(struct elfhdr) + 
-			(*nphdr + 2)*sizeof(struct elf_phdr) + 
-			3 * ((sizeof(struct elf_note)) +
-			     roundup(sizeof(CORE_STR), 4)) +
-			roundup(sizeof(struct elf_prstatus), 4) +
-			roundup(sizeof(struct elf_prpsinfo), 4) +
-			roundup(arch_task_struct_size, 4);
-	*elf_buflen = PAGE_ALIGN(*elf_buflen);
-	return size + *elf_buflen;
+
+	*phdrs_len = *nphdr * sizeof(struct elf_phdr);
+	*notes_len = (3 * (sizeof(struct elf_note) + ALIGN(sizeof(CORE_STR), 4)) +
+		      ALIGN(sizeof(struct elf_prstatus), 4) +
+		      ALIGN(sizeof(struct elf_prpsinfo), 4) +
+		      ALIGN(arch_task_struct_size, 4));
+	*data_offset = PAGE_ALIGN(sizeof(struct elfhdr) + *phdrs_len +
+				  *notes_len);
+	return *data_offset + size;
 }
 
 #ifdef CONFIG_HIGHMEM
@@ -241,7 +233,7 @@ static int kcore_update_ram(void)
 	LIST_HEAD(list);
 	LIST_HEAD(garbage);
 	int nphdr;
-	size_t size;
+	size_t phdrs_len, notes_len, data_offset;
 	struct kcore_list *tmp, *pos;
 	int ret = 0;
 
@@ -263,7 +255,8 @@ static int kcore_update_ram(void)
 	}
 	list_splice_tail(&list, &kclist_head);
 
-	proc_root_kcore->size = get_kcore_size(&nphdr, &size);
+	proc_root_kcore->size = get_kcore_size(&nphdr, &phdrs_len, &notes_len,
+					       &data_offset);
 
 out:
 	up_write(&kclist_lock);
@@ -274,228 +267,168 @@ static int kcore_update_ram(void)
 	return ret;
 }
 
-/*****************************************************************************/
-/*
- * determine size of ELF note
- */
-static int notesize(struct memelfnote *en)
+static void append_kcore_note(char *notes, size_t *i, const char *name,
+			      unsigned int type, const void *desc,
+			      size_t descsz)
 {
-	int sz;
-
-	sz = sizeof(struct elf_note);
-	sz += roundup((strlen(en->name) + 1), 4);
-	sz += roundup(en->datasz, 4);
-
-	return sz;
-} /* end notesize() */
-
-/*****************************************************************************/
-/*
- * store a note in the header buffer
- */
-static char *storenote(struct memelfnote *men, char *bufp)
-{
-	struct elf_note en;
-
-#define DUMP_WRITE(addr,nr) do { memcpy(bufp,addr,nr); bufp += nr; } while(0)
-
-	en.n_namesz = strlen(men->name) + 1;
-	en.n_descsz = men->datasz;
-	en.n_type = men->type;
-
-	DUMP_WRITE(&en, sizeof(en));
-	DUMP_WRITE(men->name, en.n_namesz);
-
-	/* XXX - cast from long long to long to avoid need for libgcc.a */
-	bufp = (char*) roundup((unsigned long)bufp,4);
-	DUMP_WRITE(men->data, men->datasz);
-	bufp = (char*) roundup((unsigned long)bufp,4);
-
-#undef DUMP_WRITE
-
-	return bufp;
-} /* end storenote() */
-
-/*
- * store an ELF coredump header in the supplied buffer
- * nphdr is the number of elf_phdr to insert
- */
-static void elf_kcore_store_hdr(char *bufp, int nphdr, int dataoff)
-{
-	struct elf_prstatus prstatus;	/* NT_PRSTATUS */
-	struct elf_prpsinfo prpsinfo;	/* NT_PRPSINFO */
-	struct elf_phdr *nhdr, *phdr;
-	struct elfhdr *elf;
-	struct memelfnote notes[3];
-	off_t offset = 0;
-	struct kcore_list *m;
-
-	/* setup ELF header */
-	elf = (struct elfhdr *) bufp;
-	bufp += sizeof(struct elfhdr);
-	offset += sizeof(struct elfhdr);
-	memcpy(elf->e_ident, ELFMAG, SELFMAG);
-	elf->e_ident[EI_CLASS]	= ELF_CLASS;
-	elf->e_ident[EI_DATA]	= ELF_DATA;
-	elf->e_ident[EI_VERSION]= EV_CURRENT;
-	elf->e_ident[EI_OSABI] = ELF_OSABI;
-	memset(elf->e_ident+EI_PAD, 0, EI_NIDENT-EI_PAD);
-	elf->e_type	= ET_CORE;
-	elf->e_machine	= ELF_ARCH;
-	elf->e_version	= EV_CURRENT;
-	elf->e_entry	= 0;
-	elf->e_phoff	= sizeof(struct elfhdr);
-	elf->e_shoff	= 0;
-	elf->e_flags	= ELF_CORE_EFLAGS;
-	elf->e_ehsize	= sizeof(struct elfhdr);
-	elf->e_phentsize= sizeof(struct elf_phdr);
-	elf->e_phnum	= nphdr;
-	elf->e_shentsize= 0;
-	elf->e_shnum	= 0;
-	elf->e_shstrndx	= 0;
-
-	/* setup ELF PT_NOTE program header */
-	nhdr = (struct elf_phdr *) bufp;
-	bufp += sizeof(struct elf_phdr);
-	offset += sizeof(struct elf_phdr);
-	nhdr->p_type	= PT_NOTE;
-	nhdr->p_offset	= 0;
-	nhdr->p_vaddr	= 0;
-	nhdr->p_paddr	= 0;
-	nhdr->p_filesz	= 0;
-	nhdr->p_memsz	= 0;
-	nhdr->p_flags	= 0;
-	nhdr->p_align	= 0;
-
-	/* setup ELF PT_LOAD program header for every area */
-	list_for_each_entry(m, &kclist_head, list) {
-		phdr = (struct elf_phdr *) bufp;
-		bufp += sizeof(struct elf_phdr);
-		offset += sizeof(struct elf_phdr);
-
-		phdr->p_type	= PT_LOAD;
-		phdr->p_flags	= PF_R|PF_W|PF_X;
-		phdr->p_offset	= kc_vaddr_to_offset(m->addr) + dataoff;
-		phdr->p_vaddr	= (size_t)m->addr;
-		if (m->type == KCORE_RAM)
-			phdr->p_paddr	= __pa(m->addr);
-		else if (m->type == KCORE_TEXT)
-			phdr->p_paddr	= __pa_symbol(m->addr);
-		else
-			phdr->p_paddr	= (elf_addr_t)-1;
-		phdr->p_filesz	= phdr->p_memsz	= m->size;
-		phdr->p_align	= PAGE_SIZE;
-	}
-
-	/*
-	 * Set up the notes in similar form to SVR4 core dumps made
-	 * with info from their /proc.
-	 */
-	nhdr->p_offset	= offset;
-
-	/* set up the process status */
-	notes[0].name = CORE_STR;
-	notes[0].type = NT_PRSTATUS;
-	notes[0].datasz = sizeof(struct elf_prstatus);
-	notes[0].data = &prstatus;
-
-	memset(&prstatus, 0, sizeof(struct elf_prstatus));
-
-	nhdr->p_filesz	= notesize(&notes[0]);
-	bufp = storenote(&notes[0], bufp);
-
-	/* set up the process info */
-	notes[1].name	= CORE_STR;
-	notes[1].type	= NT_PRPSINFO;
-	notes[1].datasz	= sizeof(struct elf_prpsinfo);
-	notes[1].data	= &prpsinfo;
-
-	memset(&prpsinfo, 0, sizeof(struct elf_prpsinfo));
-	prpsinfo.pr_state	= 0;
-	prpsinfo.pr_sname	= 'R';
-	prpsinfo.pr_zomb	= 0;
-
-	strcpy(prpsinfo.pr_fname, "vmlinux");
-	strlcpy(prpsinfo.pr_psargs, saved_command_line, sizeof(prpsinfo.pr_psargs));
-
-	nhdr->p_filesz	+= notesize(&notes[1]);
-	bufp = storenote(&notes[1], bufp);
-
-	/* set up the task structure */
-	notes[2].name	= CORE_STR;
-	notes[2].type	= NT_TASKSTRUCT;
-	notes[2].datasz	= arch_task_struct_size;
-	notes[2].data	= current;
-
-	nhdr->p_filesz	+= notesize(&notes[2]);
-	bufp = storenote(&notes[2], bufp);
-
-} /* end elf_kcore_store_hdr() */
+	struct elf_note *note = (struct elf_note *)&notes[*i];
+
+	note->n_namesz = strlen(name) + 1;
+	note->n_descsz = descsz;
+	note->n_type = type;
+	*i += sizeof(*note);
+	memcpy(&notes[*i], name, note->n_namesz);
+	*i = ALIGN(*i + note->n_namesz, 4);
+	memcpy(&notes[*i], desc, descsz);
+	*i = ALIGN(*i + descsz, 4);
+}
 
-/*****************************************************************************/
-/*
- * read from the ELF header and then kernel memory
- */
 static ssize_t
 read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 {
 	char *buf = file->private_data;
-	size_t size, tsz;
-	size_t elf_buflen;
+	size_t phdrs_offset, notes_offset, data_offset;
+	size_t phdrs_len, notes_len;
+	struct kcore_list *m;
+	size_t tsz;
 	int nphdr;
 	unsigned long start;
 	size_t orig_buflen = buflen;
 	int ret = 0;
 
 	down_read(&kclist_lock);
-	size = get_kcore_size(&nphdr, &elf_buflen);
 
-	if (buflen == 0 || *fpos >= size)
-		goto out;
+	get_kcore_size(&nphdr, &phdrs_len, &notes_len, &data_offset);
+	phdrs_offset = sizeof(struct elfhdr);
+	notes_offset = phdrs_offset + phdrs_len;
+
+	/* ELF file header. */
+	if (buflen && *fpos < sizeof(struct elfhdr)) {
+		struct elfhdr ehdr = {
+			.e_ident = {
+				[EI_MAG0] = ELFMAG0,
+				[EI_MAG1] = ELFMAG1,
+				[EI_MAG2] = ELFMAG2,
+				[EI_MAG3] = ELFMAG3,
+				[EI_CLASS] = ELF_CLASS,
+				[EI_DATA] = ELF_DATA,
+				[EI_VERSION] = EV_CURRENT,
+				[EI_OSABI] = ELF_OSABI,
+			},
+			.e_type = ET_CORE,
+			.e_machine = ELF_ARCH,
+			.e_version = EV_CURRENT,
+			.e_phoff = sizeof(struct elfhdr),
+			.e_flags = ELF_CORE_EFLAGS,
+			.e_ehsize = sizeof(struct elfhdr),
+			.e_phentsize = sizeof(struct elf_phdr),
+			.e_phnum = nphdr,
+		};
+
+		tsz = min_t(size_t, buflen, sizeof(struct elfhdr) - *fpos);
+		if (copy_to_user(buffer, (char *)&ehdr + *fpos, tsz)) {
+			ret = -EFAULT;
+			goto out;
+		}
 
-	/* trim buflen to not go beyond EOF */
-	if (buflen > size - *fpos)
-		buflen = size - *fpos;
+		buffer += tsz;
+		buflen -= tsz;
+		*fpos += tsz;
+	}
 
-	/* construct an ELF core header if we'll need some of it */
-	if (*fpos < elf_buflen) {
-		char * elf_buf;
+	/* ELF program headers. */
+	if (buflen && *fpos < phdrs_offset + phdrs_len) {
+		struct elf_phdr *phdrs, *phdr;
 
-		tsz = elf_buflen - *fpos;
-		if (buflen < tsz)
-			tsz = buflen;
-		elf_buf = kzalloc(elf_buflen, GFP_KERNEL);
-		if (!elf_buf) {
+		phdrs = kzalloc(phdrs_len, GFP_KERNEL);
+		if (!phdrs) {
 			ret = -ENOMEM;
 			goto out;
 		}
-		elf_kcore_store_hdr(elf_buf, nphdr, elf_buflen);
-		if (copy_to_user(buffer, elf_buf + *fpos, tsz)) {
-			kfree(elf_buf);
+
+		phdrs[0].p_type = PT_NOTE;
+		phdrs[0].p_offset = notes_offset;
+		phdrs[0].p_filesz = notes_len;
+
+		phdr = &phdrs[1];
+		list_for_each_entry(m, &kclist_head, list) {
+			phdr->p_type = PT_LOAD;
+			phdr->p_flags = PF_R | PF_W | PF_X;
+			phdr->p_offset = kc_vaddr_to_offset(m->addr) + data_offset;
+			phdr->p_vaddr = (size_t)m->addr;
+			if (m->type == KCORE_RAM)
+				phdr->p_paddr = __pa(m->addr);
+			else if (m->type == KCORE_TEXT)
+				phdr->p_paddr = __pa_symbol(m->addr);
+			else
+				phdr->p_paddr = (elf_addr_t)-1;
+			phdr->p_filesz = phdr->p_memsz = m->size;
+			phdr->p_align = PAGE_SIZE;
+			phdr++;
+		}
+
+		tsz = min_t(size_t, buflen, phdrs_offset + phdrs_len - *fpos);
+		if (copy_to_user(buffer, (char *)phdrs + *fpos - phdrs_offset,
+				 tsz)) {
+			kfree(phdrs);
 			ret = -EFAULT;
 			goto out;
 		}
-		kfree(elf_buf);
+		kfree(phdrs);
+
+		buffer += tsz;
 		buflen -= tsz;
 		*fpos += tsz;
-		buffer += tsz;
+	}
+
+	/* ELF note segment. */
+	if (buflen && *fpos < notes_offset + notes_len) {
+		struct elf_prstatus prstatus = {};
+		struct elf_prpsinfo prpsinfo = {
+			.pr_sname = 'R',
+			.pr_fname = "vmlinux",
+		};
+		char *notes;
+		size_t i = 0;
+
+		strlcpy(prpsinfo.pr_psargs, saved_command_line,
+			sizeof(prpsinfo.pr_psargs));
+
+		notes = kzalloc(notes_len, GFP_KERNEL);
+		if (!notes) {
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		append_kcore_note(notes, &i, CORE_STR, NT_PRSTATUS, &prstatus,
+				  sizeof(prstatus));
+		append_kcore_note(notes, &i, CORE_STR, NT_PRPSINFO, &prpsinfo,
+				  sizeof(prpsinfo));
+		append_kcore_note(notes, &i, CORE_STR, NT_TASKSTRUCT, current,
+				  arch_task_struct_size);
 
-		/* leave now if filled buffer already */
-		if (buflen == 0)
+		tsz = min_t(size_t, buflen, notes_offset + notes_len - *fpos);
+		if (copy_to_user(buffer, notes + *fpos - notes_offset, tsz)) {
+			kfree(notes);
+			ret = -EFAULT;
 			goto out;
+		}
+		kfree(notes);
+
+		buffer += tsz;
+		buflen -= tsz;
+		*fpos += tsz;
 	}
 
 	/*
 	 * Check to see if our file offset matches with any of
 	 * the addresses in the elf_phdr on our list.
 	 */
-	start = kc_offset_to_vaddr(*fpos - elf_buflen);
+	start = kc_offset_to_vaddr(*fpos - data_offset);
 	if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen)
 		tsz = buflen;
-		
-	while (buflen) {
-		struct kcore_list *m;
 
+	while (buflen) {
 		list_for_each_entry(m, &kclist_head, list) {
 			if (start >= m->addr && start < (m->addr+m->size))
 				break;
@@ -557,7 +490,6 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 	return orig_buflen - buflen;
 }
 
-
 static int open_kcore(struct inode *inode, struct file *filp)
 {
 	if (!capable(CAP_SYS_RAWIO))
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 6/7] proc/kcore: optimize multiple page reads
  2018-07-13  0:09 [PATCH v2 0/7] /proc/kcore improvements Omar Sandoval
                   ` (4 preceding siblings ...)
  2018-07-13  0:09 ` [PATCH v2 5/7] proc/kcore: clean up ELF header generation Omar Sandoval
@ 2018-07-13  0:09 ` Omar Sandoval
  2018-07-13  0:09 ` [PATCH v2 7/7] proc/kcore: add vmcoreinfo note to /proc/kcore Omar Sandoval
  6 siblings, 0 replies; 16+ messages in thread
From: Omar Sandoval @ 2018-07-13  0:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, Andrew Morton
  Cc: Alexey Dobriyan, Eric Biederman, James Morse, Bhupesh Sharma,
	kernel-team

From: Omar Sandoval <osandov@fb.com>

The current code does a full search of the segment list every time for
every page. This is wasteful, since it's almost certain that the next
page will be in the same segment. Instead, check if the previous segment
covers the current page before doing the list search.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/proc/kcore.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index a7e730b40154..d1b875afc359 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -428,10 +428,18 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 	if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen)
 		tsz = buflen;
 
+	m = NULL;
 	while (buflen) {
-		list_for_each_entry(m, &kclist_head, list) {
-			if (start >= m->addr && start < (m->addr+m->size))
-				break;
+		/*
+		 * If this is the first iteration or the address is not within
+		 * the previous entry, search for a matching entry.
+		 */
+		if (!m || start < m->addr || start >= m->addr + m->size) {
+			list_for_each_entry(m, &kclist_head, list) {
+				if (start >= m->addr &&
+				    start < m->addr + m->size)
+					break;
+			}
 		}
 
 		if (&m->list == &kclist_head) {
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 7/7] proc/kcore: add vmcoreinfo note to /proc/kcore
  2018-07-13  0:09 [PATCH v2 0/7] /proc/kcore improvements Omar Sandoval
                   ` (5 preceding siblings ...)
  2018-07-13  0:09 ` [PATCH v2 6/7] proc/kcore: optimize multiple page reads Omar Sandoval
@ 2018-07-13  0:09 ` Omar Sandoval
  2018-07-18  2:44   ` Andrew Morton
  6 siblings, 1 reply; 16+ messages in thread
From: Omar Sandoval @ 2018-07-13  0:09 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, Andrew Morton
  Cc: Alexey Dobriyan, Eric Biederman, James Morse, Bhupesh Sharma,
	kernel-team

From: Omar Sandoval <osandov@fb.com>

The vmcoreinfo information is useful for runtime debugging tools, not
just for crash dumps. A lot of this information can be determined by
other means, but this is much more convenient.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/proc/Kconfig            |  1 +
 fs/proc/kcore.c            | 18 ++++++++++++++++--
 include/linux/crash_core.h |  2 ++
 kernel/crash_core.c        |  4 ++--
 4 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
index 0eaeb41453f5..817c02b13b1d 100644
--- a/fs/proc/Kconfig
+++ b/fs/proc/Kconfig
@@ -31,6 +31,7 @@ config PROC_FS
 config PROC_KCORE
 	bool "/proc/kcore support" if !ARM
 	depends on PROC_FS && MMU
+	select CRASH_CORE
 	help
 	  Provides a virtual ELF core file of the live kernel.  This can
 	  be read with gdb and other ELF tools.  No modifications can be
diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index d1b875afc359..bef78923b387 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -10,6 +10,7 @@
  *	Safe accesses to vmalloc/direct-mapped discontiguous areas, Kanoj Sarcar <kanoj@sgi.com>
  */
 
+#include <linux/crash_core.h>
 #include <linux/mm.h>
 #include <linux/proc_fs.h>
 #include <linux/kcore.h>
@@ -81,10 +82,13 @@ static size_t get_kcore_size(int *nphdr, size_t *phdrs_len, size_t *notes_len,
 	}
 
 	*phdrs_len = *nphdr * sizeof(struct elf_phdr);
-	*notes_len = (3 * (sizeof(struct elf_note) + ALIGN(sizeof(CORE_STR), 4)) +
+	*notes_len = (4 * sizeof(struct elf_note) +
+		      3 * ALIGN(sizeof(CORE_STR), 4) +
+		      VMCOREINFO_NOTE_NAME_BYTES +
 		      ALIGN(sizeof(struct elf_prstatus), 4) +
 		      ALIGN(sizeof(struct elf_prpsinfo), 4) +
-		      ALIGN(arch_task_struct_size, 4));
+		      ALIGN(arch_task_struct_size, 4) +
+		      ALIGN(vmcoreinfo_size, 4));
 	*data_offset = PAGE_ALIGN(sizeof(struct elfhdr) + *phdrs_len +
 				  *notes_len);
 	return *data_offset + size;
@@ -406,6 +410,16 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 				  sizeof(prpsinfo));
 		append_kcore_note(notes, &i, CORE_STR, NT_TASKSTRUCT, current,
 				  arch_task_struct_size);
+		/*
+		 * vmcoreinfo_size is mostly constant after init time, but it
+		 * can be changed by crash_save_vmcoreinfo(). Racing here with a
+		 * panic on another CPU before the machine goes down is insanely
+		 * unlikely, but it's better to not leave potential buffer
+		 * overflows lying around, regardless.
+		 */
+		append_kcore_note(notes, &i, VMCOREINFO_NOTE_NAME, 0,
+				  vmcoreinfo_data,
+				  min(vmcoreinfo_size, notes_len - i));
 
 		tsz = min_t(size_t, buflen, notes_offset + notes_len - *fpos);
 		if (copy_to_user(buffer, notes + *fpos - notes_offset, tsz)) {
diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index b511f6d24b42..525510a9f965 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -60,6 +60,8 @@ phys_addr_t paddr_vmcoreinfo_note(void);
 #define VMCOREINFO_CONFIG(name) \
 	vmcoreinfo_append_str("CONFIG_%s=y\n", #name)
 
+extern unsigned char *vmcoreinfo_data;
+extern size_t vmcoreinfo_size;
 extern u32 *vmcoreinfo_note;
 
 Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index b66aced5e8c2..d02c58b94460 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -14,8 +14,8 @@
 #include <asm/sections.h>
 
 /* vmcoreinfo stuff */
-static unsigned char *vmcoreinfo_data;
-static size_t vmcoreinfo_size;
+unsigned char *vmcoreinfo_data;
+size_t vmcoreinfo_size;
 u32 *vmcoreinfo_note;
 
 /* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/7] proc/kcore: don't grab lock for kclist_add()
  2018-07-13  0:09 ` [PATCH v2 1/7] proc/kcore: don't grab lock for kclist_add() Omar Sandoval
@ 2018-07-18  2:35   ` Andrew Morton
  2018-07-18  3:19     ` Omar Sandoval
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2018-07-18  2:35 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: linux-kernel, linux-fsdevel, Alexey Dobriyan, Eric Biederman,
	James Morse, Bhupesh Sharma, kernel-team

On Thu, 12 Jul 2018 17:09:33 -0700 Omar Sandoval <osandov@osandov.com> wrote:

> From: Omar Sandoval <osandov@fb.com>
> 
> kclist_add() is only called at init time, so there's no point in
> grabbing any locks. We're also going to replace the rwlock with a rwsem,
> which we don't want to try grabbing during early boot.
> 
> ...
>
> --- a/fs/proc/kcore.c
> +++ b/fs/proc/kcore.c
> @@ -62,6 +62,7 @@ static LIST_HEAD(kclist_head);
>  static DEFINE_RWLOCK(kclist_lock);
>  static int kcore_need_update = 1;
>  
> +/* This doesn't grab kclist_lock, so it should only be used at init time. */
>  void
>  kclist_add(struct kcore_list *new, void *addr, size_t size, int type)
>  {
> @@ -69,9 +70,7 @@ kclist_add(struct kcore_list *new, void *addr, size_t size, int type)
>  	new->size = size;
>  	new->type = type;
>  
> -	write_lock(&kclist_lock);
>  	list_add_tail(&new->list, &kclist_head);
> -	write_unlock(&kclist_lock);
>  }

So we can mark kclist_add() as __init, yes?

That way we save a scrap of ram and if someone starts calling
kclist_add() from non-__init code we get a build-time warning.

--- a/fs/proc/kcore.c~proc-kcore-dont-grab-lock-for-kclist_add-fix
+++ a/fs/proc/kcore.c
@@ -63,7 +63,7 @@ static DEFINE_RWLOCK(kclist_lock);
 static int kcore_need_update = 1;
 
 /* This doesn't grab kclist_lock, so it should only be used at init time. */
-void
+void __init
 kclist_add(struct kcore_list *new, void *addr, size_t size, int type)
 {
 	new->addr = (unsigned long)addr;
--- a/include/linux/kcore.h~proc-kcore-dont-grab-lock-for-kclist_add-fix
+++ a/include/linux/kcore.h
@@ -35,7 +35,7 @@ struct vmcoredd_node {
 };
 
 #ifdef CONFIG_PROC_KCORE
-extern void kclist_add(struct kcore_list *, void *, size_t, int type);
+extern void __init kclist_add(struct kcore_list *, void *, size_t, int type);
 #else
 static inline
 void kclist_add(struct kcore_list *new, void *addr, size_t size, int type)



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/7] proc/kcore: replace kclist_lock rwlock with rwsem
  2018-07-13  0:09 ` [PATCH v2 2/7] proc/kcore: replace kclist_lock rwlock with rwsem Omar Sandoval
@ 2018-07-18  2:38   ` Andrew Morton
  2018-07-18  3:24     ` Omar Sandoval
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2018-07-18  2:38 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: linux-kernel, linux-fsdevel, Alexey Dobriyan, Eric Biederman,
	James Morse, Bhupesh Sharma, kernel-team

On Thu, 12 Jul 2018 17:09:34 -0700 Omar Sandoval <osandov@osandov.com> wrote:

> From: Omar Sandoval <osandov@fb.com>
> 
> Now we only need kclist_lock from user context and at fs init time, and
> the following changes need to sleep while holding the kclist_lock.
> 
> ...
>
> --- a/fs/proc/kcore.c
> +++ b/fs/proc/kcore.c
> @@ -59,8 +59,8 @@ struct memelfnote
>  };
>  
>  static LIST_HEAD(kclist_head);
> -static DEFINE_RWLOCK(kclist_lock);
> -static int kcore_need_update = 1;
> +static DECLARE_RWSEM(kclist_lock);
> +static atomic_t kcore_need_update = ATOMIC_INIT(1);

It's unclear why kcore_need_update was changed to atomic_t - it's still
updated under kclist_lock?

Maybe it's for a later patch.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 7/7] proc/kcore: add vmcoreinfo note to /proc/kcore
  2018-07-13  0:09 ` [PATCH v2 7/7] proc/kcore: add vmcoreinfo note to /proc/kcore Omar Sandoval
@ 2018-07-18  2:44   ` Andrew Morton
  2018-07-18  3:27     ` Omar Sandoval
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2018-07-18  2:44 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: linux-kernel, linux-fsdevel, Alexey Dobriyan, Eric Biederman,
	James Morse, Bhupesh Sharma, kernel-team

On Thu, 12 Jul 2018 17:09:39 -0700 Omar Sandoval <osandov@osandov.com> wrote:

> From: Omar Sandoval <osandov@fb.com>
> 
> The vmcoreinfo information is useful for runtime debugging tools, not
> just for crash dumps. A lot of this information can be determined by
> other means, but this is much more convenient.
> 

What effect does this have upon the overall size of the dump file?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/7] proc/kcore: don't grab lock for kclist_add()
  2018-07-18  2:35   ` Andrew Morton
@ 2018-07-18  3:19     ` Omar Sandoval
  0 siblings, 0 replies; 16+ messages in thread
From: Omar Sandoval @ 2018-07-18  3:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-fsdevel, Alexey Dobriyan, Eric Biederman,
	James Morse, Bhupesh Sharma, kernel-team

On Tue, Jul 17, 2018 at 07:35:04PM -0700, Andrew Morton wrote:
> On Thu, 12 Jul 2018 17:09:33 -0700 Omar Sandoval <osandov@osandov.com> wrote:
> 
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > kclist_add() is only called at init time, so there's no point in
> > grabbing any locks. We're also going to replace the rwlock with a rwsem,
> > which we don't want to try grabbing during early boot.
> > 
> > ...
> >
> > --- a/fs/proc/kcore.c
> > +++ b/fs/proc/kcore.c
> > @@ -62,6 +62,7 @@ static LIST_HEAD(kclist_head);
> >  static DEFINE_RWLOCK(kclist_lock);
> >  static int kcore_need_update = 1;
> >  
> > +/* This doesn't grab kclist_lock, so it should only be used at init time. */
> >  void
> >  kclist_add(struct kcore_list *new, void *addr, size_t size, int type)
> >  {
> > @@ -69,9 +70,7 @@ kclist_add(struct kcore_list *new, void *addr, size_t size, int type)
> >  	new->size = size;
> >  	new->type = type;
> >  
> > -	write_lock(&kclist_lock);
> >  	list_add_tail(&new->list, &kclist_head);
> > -	write_unlock(&kclist_lock);
> >  }
> 
> So we can mark kclist_add() as __init, yes?

Yes, thanks, I'll add that in v2.

> That way we save a scrap of ram and if someone starts calling
> kclist_add() from non-__init code we get a build-time warning.
> 
> --- a/fs/proc/kcore.c~proc-kcore-dont-grab-lock-for-kclist_add-fix
> +++ a/fs/proc/kcore.c
> @@ -63,7 +63,7 @@ static DEFINE_RWLOCK(kclist_lock);
>  static int kcore_need_update = 1;
>  
>  /* This doesn't grab kclist_lock, so it should only be used at init time. */
> -void
> +void __init
>  kclist_add(struct kcore_list *new, void *addr, size_t size, int type)
>  {
>  	new->addr = (unsigned long)addr;
> --- a/include/linux/kcore.h~proc-kcore-dont-grab-lock-for-kclist_add-fix
> +++ a/include/linux/kcore.h
> @@ -35,7 +35,7 @@ struct vmcoredd_node {
>  };
>  
>  #ifdef CONFIG_PROC_KCORE
> -extern void kclist_add(struct kcore_list *, void *, size_t, int type);
> +extern void __init kclist_add(struct kcore_list *, void *, size_t, int type);
>  #else
>  static inline
>  void kclist_add(struct kcore_list *new, void *addr, size_t size, int type)
> 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/7] proc/kcore: replace kclist_lock rwlock with rwsem
  2018-07-18  2:38   ` Andrew Morton
@ 2018-07-18  3:24     ` Omar Sandoval
  2018-07-18  3:27       ` Andrew Morton
  0 siblings, 1 reply; 16+ messages in thread
From: Omar Sandoval @ 2018-07-18  3:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-fsdevel, Alexey Dobriyan, Eric Biederman,
	James Morse, Bhupesh Sharma, kernel-team

On Tue, Jul 17, 2018 at 07:38:41PM -0700, Andrew Morton wrote:
> On Thu, 12 Jul 2018 17:09:34 -0700 Omar Sandoval <osandov@osandov.com> wrote:
> 
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > Now we only need kclist_lock from user context and at fs init time, and
> > the following changes need to sleep while holding the kclist_lock.
> > 
> > ...
> >
> > --- a/fs/proc/kcore.c
> > +++ b/fs/proc/kcore.c
> > @@ -59,8 +59,8 @@ struct memelfnote
> >  };
> >  
> >  static LIST_HEAD(kclist_head);
> > -static DEFINE_RWLOCK(kclist_lock);
> > -static int kcore_need_update = 1;
> > +static DECLARE_RWSEM(kclist_lock);
> > +static atomic_t kcore_need_update = ATOMIC_INIT(1);
> 
> It's unclear why kcore_need_update was changed to atomic_t - it's still
> updated under kclist_lock?

Not in the hotplug notifier (kcore_callback()) anymore, so I need the
atomic_cmpxchg() in __kcore_update_ram(). That could use a mention in
the commit message.

> Maybe it's for a later patch.
> 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 7/7] proc/kcore: add vmcoreinfo note to /proc/kcore
  2018-07-18  2:44   ` Andrew Morton
@ 2018-07-18  3:27     ` Omar Sandoval
  0 siblings, 0 replies; 16+ messages in thread
From: Omar Sandoval @ 2018-07-18  3:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-fsdevel, Alexey Dobriyan, Eric Biederman,
	James Morse, Bhupesh Sharma, kernel-team

On Tue, Jul 17, 2018 at 07:44:21PM -0700, Andrew Morton wrote:
> On Thu, 12 Jul 2018 17:09:39 -0700 Omar Sandoval <osandov@osandov.com> wrote:
> 
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > The vmcoreinfo information is useful for runtime debugging tools, not
> > just for crash dumps. A lot of this information can be determined by
> > other means, but this is much more convenient.
> > 
> 
> What effect does this have upon the overall size of the dump file?

About 2 kB on my machine, and no more than one page + the few bytes of
extra headers.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/7] proc/kcore: replace kclist_lock rwlock with rwsem
  2018-07-18  3:24     ` Omar Sandoval
@ 2018-07-18  3:27       ` Andrew Morton
  2018-07-18  3:36         ` Omar Sandoval
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2018-07-18  3:27 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: linux-kernel, linux-fsdevel, Alexey Dobriyan, Eric Biederman,
	James Morse, Bhupesh Sharma, kernel-team

On Tue, 17 Jul 2018 20:24:05 -0700 Omar Sandoval <osandov@osandov.com> wrote:

> > > --- a/fs/proc/kcore.c
> > > +++ b/fs/proc/kcore.c
> > > @@ -59,8 +59,8 @@ struct memelfnote
> > >  };
> > >  
> > >  static LIST_HEAD(kclist_head);
> > > -static DEFINE_RWLOCK(kclist_lock);
> > > -static int kcore_need_update = 1;
> > > +static DECLARE_RWSEM(kclist_lock);
> > > +static atomic_t kcore_need_update = ATOMIC_INIT(1);
> > 
> > It's unclear why kcore_need_update was changed to atomic_t - it's still
> > updated under kclist_lock?
> 
> Not in the hotplug notifier (kcore_callback()) anymore, so I need the
> atomic_cmpxchg() in __kcore_update_ram().

Well that's just

	kcore_need_update = 1;

and turning that into an atomic_set doesn't change anything(?).

It's not a harmful change of course, but a bit ... odd.

> That could use a mention in the commit message.

That never hurts ;)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/7] proc/kcore: replace kclist_lock rwlock with rwsem
  2018-07-18  3:27       ` Andrew Morton
@ 2018-07-18  3:36         ` Omar Sandoval
  0 siblings, 0 replies; 16+ messages in thread
From: Omar Sandoval @ 2018-07-18  3:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-fsdevel, Alexey Dobriyan, Eric Biederman,
	James Morse, Bhupesh Sharma, kernel-team

On Tue, Jul 17, 2018 at 08:27:53PM -0700, Andrew Morton wrote:
> On Tue, 17 Jul 2018 20:24:05 -0700 Omar Sandoval <osandov@osandov.com> wrote:
> 
> > > > --- a/fs/proc/kcore.c
> > > > +++ b/fs/proc/kcore.c
> > > > @@ -59,8 +59,8 @@ struct memelfnote
> > > >  };
> > > >  
> > > >  static LIST_HEAD(kclist_head);
> > > > -static DEFINE_RWLOCK(kclist_lock);
> > > > -static int kcore_need_update = 1;
> > > > +static DECLARE_RWSEM(kclist_lock);
> > > > +static atomic_t kcore_need_update = ATOMIC_INIT(1);
> > > 
> > > It's unclear why kcore_need_update was changed to atomic_t - it's still
> > > updated under kclist_lock?
> > 
> > Not in the hotplug notifier (kcore_callback()) anymore, so I need the
> > atomic_cmpxchg() in __kcore_update_ram().
> 
> Well that's just
> 
> 	kcore_need_update = 1;
> 
> and turning that into an atomic_set doesn't change anything(?).
> 
> It's not a harmful change of course, but a bit ... odd.

The change from read, ..., write to cmpxchg in __kcore_update_ram() is
the important part, not the change from = to atomic_set(). I needed to
change that because now kcore_need_update could potentially be set again
by the hotplug notifier while we're in __kcore_update_ram(). But I'll
just put all of this in the commit message in v3 :)

> > That could use a mention in the commit message.
> 
> That never hurts ;)

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2018-07-18  3:37 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-13  0:09 [PATCH v2 0/7] /proc/kcore improvements Omar Sandoval
2018-07-13  0:09 ` [PATCH v2 1/7] proc/kcore: don't grab lock for kclist_add() Omar Sandoval
2018-07-18  2:35   ` Andrew Morton
2018-07-18  3:19     ` Omar Sandoval
2018-07-13  0:09 ` [PATCH v2 2/7] proc/kcore: replace kclist_lock rwlock with rwsem Omar Sandoval
2018-07-18  2:38   ` Andrew Morton
2018-07-18  3:24     ` Omar Sandoval
2018-07-18  3:27       ` Andrew Morton
2018-07-18  3:36         ` Omar Sandoval
2018-07-13  0:09 ` [PATCH v2 3/7] proc/kcore: fix memory hotplug vs multiple opens race Omar Sandoval
2018-07-13  0:09 ` [PATCH v2 4/7] proc/kcore: hold lock during read Omar Sandoval
2018-07-13  0:09 ` [PATCH v2 5/7] proc/kcore: clean up ELF header generation Omar Sandoval
2018-07-13  0:09 ` [PATCH v2 6/7] proc/kcore: optimize multiple page reads Omar Sandoval
2018-07-13  0:09 ` [PATCH v2 7/7] proc/kcore: add vmcoreinfo note to /proc/kcore Omar Sandoval
2018-07-18  2:44   ` Andrew Morton
2018-07-18  3:27     ` Omar Sandoval

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).