From: Denys Vlasenko <vda.linux@gmail.com>
To: linux-kernel@vger.kernel.org,
	"Jonathan M. Foote" <jmfoote@cert.org>,
	"H. J. Lu" <hjl.tools@gmail.com>, Ingo Molnar <mingo@elte.hu>,
	"H. Peter Anvin" <hpa@zytor.com>, Andi Kleen <ak@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>,
	Denys Vlasenko <dvlasenk@redhat.com>,
	Jan Kratochvil <jan.kratochvil@redhat.com>
Subject: [PATCH] Extend core dump note section to contain file names of mapped files
Date: Wed, 11 Jul 2012 12:35:51 +0200
Message-ID: <CAK1hOcNWo2mV2nKmVEuzKzEs7pCveFCBFWrYSrMbj4Y7M8zQ6Q@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5086 bytes --]

Hi,

I am resending this patch after a while.
Jonathan, the developer of CERT Triage Tools, has expressed a need
for this information, so I am CCing him.

But before anyone looks at the attached patch, we need a ruling.

In the last review, it was proposed to generate this information
as ASCII text instead, a la /proc/PID/maps.

This is actually a good idea, but regrettably it comes a few
decades too late: the rest of the core file's auxiliary information
is traditionally encoded in binary structures.

Could someone with authority in this area please decide whether
we want to be unorthodox and use ASCII encoding for the whole thing,
or not?

If the decision is to use ASCII, I will need to rework the patch.

Otherwise, please take a look at the attached patch, which implements
creation of a new note in binary format, and let me know what you think of it.

The original patch description follows.

* * * * * * * * * * * * * * * * * * * *

While working on core dump analysis, it struck me how much
of a PITA is caused merely by the fact that the names of the loaded
binary and libraries are not known.

gdb retrieves the names of loaded libraries by examining the dynamic
loader's data stored in the core dump's data segments. It relies on
intimate knowledge of how and where the dynamic loader keeps the list
of loaded libraries (meaning that it will break if a non-standard
loader is used).

And, as Jan explained to me, it depends on knowing where
the linked list of libraries starts, which requires knowing which
binary was running. IIRC there is no easy and reasonably foolproof
way to determine the binary's name. (Looking at argv[0] on the stack
is not reasonably foolproof.)

Which is *ridiculous*. We *know* the list of mapped files
at core dump generation time.

I propose to save this information in the core dump, as a new note
in the note segment.

This note has the following format:

long count     // how many files are mapped
long page_size // units for file_ofs
array of [COUNT] elements of
   long start
   long end
   long file_ofs
followed by COUNT filenames in ASCII: "FILE1" NUL "FILE2" NUL...

The attached patch implements this.

Since the list of mapped files can be large (/proc/`pidof firefox`/maps
on my machine right now is 38k), I allocate the space for the note
via vmalloc and also enforce a sanity limit of 4 megabytes.
(Maybe we should make it smaller?)
Oleg suggested using a linked list of smaller structures instead of
a potentially large contiguous block, and I tried it,
but the resulting code was significantly uglier (for my taste).

The patch is run-tested.

For testing, I sent SIGABRT to a running /usr/bin/md5sum.

"readelf -aW core" shows the new note as:

Notes at offset 0x00000274 with length 0x00000990:
 Owner                 Data size       Description
 CORE                 0x00000090       NT_PRSTATUS (prstatus structure)
 CORE                 0x0000007c       NT_PRPSINFO (prpsinfo structure)
 CORE                 0x000000a0       NT_AUXV (auxiliary vector)
 CORE                 0x00000168       Unknown note type: (0x46494c45)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^new note^^^^^^^^^^
In hex format:
                                               05 00 00 00  |................|
00000460  68 01 00 00 45 4c 49 46  43 4f 52 45 00 00 00 00  |h...ELIFCORE....|
00000470  0b 00 00 00 00 10 00 00  00 80 17 00 00 f0 31 00  |..............1.|
00000480  00 00 00 00 00 f0 31 00  00 00 32 00 a7 01 00 00  |......1...2.....|
00000490  00 00 32 00 00 20 32 00  a7 01 00 00 00 20 32 00  |..2.. 2...... 2.|
000004a0  00 30 32 00 a9 01 00 00  00 50 69 00 00 60 6b 00  |.02......Pi..`k.|
000004b0  00 00 00 00 00 60 6b 00  00 70 6b 00 20 00 00 00  |.....`k..pk. ...|
000004c0  00 70 6b 00 00 80 6b 00  21 00 00 00 00 80 04 08  |.pk...k.!.......|
000004d0  00 00 05 08 00 00 00 00  00 00 05 08 00 10 05 08  |................|
000004e0  07 00 00 00 00 10 05 08  00 20 05 08 08 00 00 00  |......... ......|
000004f0  00 20 52 b7 00 20 72 b7  00 00 00 00 2f 6c 69 62  |. R.. r...../lib|
00000500  2f 6c 69 62 63 2d 32 2e  31 34 2e 39 30 2e 73 6f  |/libc-2.14.90.so|
00000510  00 2f 6c 69 62 2f 6c 69  62 63 2d 32 2e 31 34 2e  |./lib/libc-2.14.|
00000520  39 30 2e 73 6f 00 2f 6c  69 62 2f 6c 69 62 63 2d  |90.so./lib/libc-|
00000530  32 2e 31 34 2e 39 30 2e  73 6f 00 2f 6c 69 62 2f  |2.14.90.so./lib/|
00000540  6c 69 62 63 2d 32 2e 31  34 2e 39 30 2e 73 6f 00  |libc-2.14.90.so.|
00000550  2f 6c 69 62 2f 6c 64 2d  32 2e 31 34 2e 39 30 2e  |/lib/ld-2.14.90.|
00000560  73 6f 00 2f 6c 69 62 2f  6c 64 2d 32 2e 31 34 2e  |so./lib/ld-2.14.|
00000570  39 30 2e 73 6f 00 2f 6c  69 62 2f 6c 64 2d 32 2e  |90.so./lib/ld-2.|
00000580  31 34 2e 39 30 2e 73 6f  00 2f 75 73 72 2f 62 69  |14.90.so./usr/bi|
00000590  6e 2f 6d 64 35 73 75 6d  00 2f 75 73 72 2f 62 69  |n/md5sum./usr/bi|
000005a0  6e 2f 6d 64 35 73 75 6d  00 2f 75 73 72 2f 62 69  |n/md5sum./usr/bi|
000005b0  6e 2f 6d 64 35 73 75 6d  00 2f 75 73 72 2f 6c 69  |n/md5sum./usr/li|
000005c0  62 2f 6c 6f 63 61 6c 65  2f 6c 6f 63 61 6c 65 2d  |b/locale/locale-|
000005d0  61 72 63 68 69 76 65 00                           |archive.

-- 
vda

[-- Attachment #2: file_note.patch --]
[-- Type: application/octet-stream, Size: 3404 bytes --]

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 81878b7..b585ba1 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1358,6 +1358,73 @@ static void fill_auxv_note(struct memelfnote *note, struct mm_struct *mm)
 	fill_note(note, "CORE", NT_AUXV, i * sizeof(elf_addr_t), auxv);
 }
 
+#define MAX_FILE_NOTE_SIZE (4*1024*1024)
+
+static void fill_files_note(struct memelfnote *note)
+{
+	struct vm_area_struct *vma;
+	struct file *file;
+	unsigned count, word_count, size, remaining;
+	long *data;
+	long *start_end_ofs;
+	char *name;
+
+	count = 0;
+	for (vma = current->mm->mmap; vma != NULL; vma = vma->vm_next) {
+		file = vma->vm_file;
+		if (!file)
+			continue;
+		count++;
+		if (count >= MAX_FILE_NOTE_SIZE / 64) /* paranoia check */
+			goto err;
+	}
+
+	size = count * 64;
+	word_count = 2 + 3 * count;
+ alloc:
+	if (size >= MAX_FILE_NOTE_SIZE) /* paranoia check */
+		goto err;
+	size = (size + PAGE_SIZE - 1) & (-PAGE_SIZE);
+	data = vmalloc(size);
+	if (!data)
+		goto err;
+	start_end_ofs = data;
+	name = (void*)&start_end_ofs[word_count];
+	remaining = size - word_count * sizeof(long);
+
+	*start_end_ofs++ = count;
+	*start_end_ofs++ = PAGE_SIZE;
+	for (vma = current->mm->mmap; vma != NULL; vma = vma->vm_next) {
+		const char *filename;
+
+		file = vma->vm_file;
+		if (!file)
+			continue;
+		if (remaining == 0) {
+ try_new_size:
+			vfree(data);
+			size = size * 5 / 4;
+			goto alloc;
+		}
+		filename = d_path(&file->f_path, name, remaining);
+		if (IS_ERR(filename)) {
+			if (PTR_ERR(filename) == -ENAMETOOLONG)
+				goto try_new_size;
+			/* continue; -- WRONG, we must have COUNT elements */
+			filename = "";
+		}
+		while ((remaining--, *name++ = *filename++) != '\0')
+			continue;
+		*start_end_ofs++ = vma->vm_start;
+		*start_end_ofs++ = vma->vm_end;
+		*start_end_ofs++ = vma->vm_pgoff;
+	}
+
+	size = name - (char*)data;
+	fill_note(note, "CORE", NT_FILE, size, data);
+ err: ;
+}
+
 #ifdef CORE_DUMP_USE_REGSET
 #include <linux/regset.h>
 
@@ -1372,6 +1439,7 @@ struct elf_note_info {
 	struct elf_thread_core_info *thread;
 	struct memelfnote psinfo;
 	struct memelfnote auxv;
+	struct memelfnote files;
 	size_t size;
 	int thread_notes;
 };
@@ -1532,6 +1600,9 @@ static int fill_note_info(struct elfhdr *elf, int phdrs,
 	fill_auxv_note(&info->auxv, current->mm);
 	info->size += notesize(&info->auxv);
 
+	fill_files_note(&info->files);
+	info->size += notesize(&info->files);
+
 	return 1;
 }
 
@@ -1560,6 +1631,8 @@ static int write_note_info(struct elf_note_info *info,
 			return 0;
 		if (first && !writenote(&info->auxv, file, foffset))
 			return 0;
+		if (first && !writenote(&info->files, file, foffset))
+			return 0;
 
 		for (i = 1; i < info->thread_notes; ++i)
 			if (t->notes[i].data &&
@@ -1586,6 +1659,7 @@ static void free_note_info(struct elf_note_info *info)
 		kfree(t);
 	}
 	kfree(info->psinfo.data);
+	vfree(info->files.data);
 }
 
 #else
diff --git a/include/linux/elf.h b/include/linux/elf.h
index 999b4f5..5e6c08f 100644
--- a/include/linux/elf.h
+++ b/include/linux/elf.h
@@ -372,6 +372,7 @@ typedef struct elf64_shdr {
 #define NT_PRPSINFO	3
 #define NT_TASKSTRUCT	4
 #define NT_AUXV		6
+#define NT_FILE		0x46494c45
 #define NT_PRXFPREG     0x46e62b7f      /* copied from gdb5.1/include/elf/common.h */
 #define NT_PPC_VMX	0x100		/* PowerPC Altivec/VMX registers */
 #define NT_PPC_SPE	0x101		/* PowerPC SPE/EVR registers */
