linux-kernel.vger.kernel.org archive mirror
* [PATCH] i386 vsyscall DSO implementation
@ 2003-04-25  1:10 Roland McGrath
  2003-04-25  1:49 ` Jeff Garzik
  2003-04-26 17:15 ` Ulrich Drepper
  0 siblings, 2 replies; 10+ messages in thread
From: Roland McGrath @ 2003-04-25  1:10 UTC
  To: Linus Torvalds; +Cc: linux-kernel, Ulrich Drepper

[-- Attachment #1: message body text --]
[-- Type: text/plain, Size: 3308 bytes --]

[Please remember that I am not on the mailing list, so make sure any
replies are also addressed to me directly.]

This patch against 2.5.68 turns the i386 vsyscall page into an ELF DSO, as
we discussed earlier.  I usurped the AT_SYSINFO_EH_FRAME number and
replaced it with AT_SYSINFO_EHDR, which points at the beginning of the page
where the DSO's ELF header is found.  Though AT_SYSINFO is left in for
compatibility, it is now superfluous since the vsyscall entry point can be
got from the e_entry field via AT_SYSINFO_EHDR with no additional overhead.
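
As a rough user-space illustration (not part of the patch), the lookup
described above might be sketched as follows.  It assumes a glibc that
provides getauxval() in <sys/auxv.h>, which is newer than this kernel; the
names vsyscall_dso_base and vsyscall_dso_entry are made up for the example.

```c
#include <elf.h>
#include <link.h>	/* ElfW() */
#include <string.h>
#include <sys/auxv.h>	/* getauxval(); assumed available (newer glibc) */

/* Return the address of the vsyscall DSO's ELF header, or NULL if the
   kernel did not pass AT_SYSINFO_EHDR in the auxiliary vector.  */
static const void *vsyscall_dso_base(void)
{
	return (const void *) getauxval(AT_SYSINFO_EHDR);
}

/* Return e_entry from the ELF header at BASE, or 0 if BASE is NULL
   or does not start with the ELF magic number.  */
static unsigned long vsyscall_dso_entry(const void *base)
{
	const ElfW(Ehdr) *ehdr = base;

	if (ehdr == NULL || memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0)
		return 0;
	return (unsigned long) ehdr->e_entry;
}
```

A caller would pass the result of vsyscall_dso_base() to
vsyscall_dso_entry(); with this patch applied, the value returned should be
the same address that AT_SYSINFO reports.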

Two DSOs are built (an int $0x80 one and a sysenter one), using normal
assembly code and ld -shared with a special linker script.  Both images
(stripped ELF .so files) are embedded in __initdata space; sysenter_setup
copies one or the other whole image into the vsyscall page.  Each image is
a little under 2k (1884 and 1924) now, and could be trimmed a little bit
with some specialized ELF stripping that ld and strip don't do.  Adding
additional entry points should not have much additional overhead beyond the
code itself and the string size of new symbol names.

The code in the page is identical and still packed the same, but now at
0xffffe400.  The initializers for the page's contents are replaced with
assembly written like normal user-mode code, which is much easier to
maintain.  (Special care must still be taken by hand to make sure that the
entry points in both DSOs match up.)  Using a little additional ld
trickery, hand-written address constants for locations in the vsyscall page
are replaced with symbols that resolve the same at link-time but never need
hand-tweaking to match future changes to the vsyscall page code.  The
address 0xffffe000 is hard-wired in one place in the linker script used to
create the DSOs, and only there.

This patch also changes i386 core dumps to include the vsyscall DSO image
in the dump.  I believe this is important to do so that post-mortem
analysis of a core dump from a process that might have used PCs in the
vsyscall page does not depend on having the live kernel and hardware at the
time of analysis match the vsyscall page contents of the kernel that wrote
the core dump.  As well as dumping the memory, it copies other phdrs from
the DSO into the core file.  This is important to get the PT_GNU_EH_FRAME
information into the dump, where it serves the same purpose for post-mortem
analysis by the debugger that it serves in the live vsyscall DSO image for
cancelation handling.
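
To illustrate what a debugger would look for, here is a hedged sketch that
scans the program headers of an in-memory ELF image for a given type such
as PT_GNU_EH_FRAME.  find_phdr is a hypothetical helper, not something in
the patch, and it trusts e_phoff and e_phnum without any bounds checks.

```c
#include <elf.h>
#include <link.h>	/* ElfW() */
#include <stddef.h>
#include <string.h>

/* Find the first program header of type TYPE in the ELF image at IMAGE.
   Returns NULL if IMAGE is not ELF or has no such header.  */
static const ElfW(Phdr) *find_phdr(const void *image, ElfW(Word) type)
{
	const ElfW(Ehdr) *ehdr = image;
	const ElfW(Phdr) *phdr;
	unsigned int i;

	if (ehdr == NULL || memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0)
		return NULL;

	/* The program header table sits e_phoff bytes into the image.  */
	phdr = (const ElfW(Phdr) *) ((const char *) image + ehdr->e_phoff);
	for (i = 0; i < ehdr->e_phnum; ++i)
		if (phdr[i].p_type == type)
			return &phdr[i];
	return NULL;
}
```

The same walk applies whether the image is the live vsyscall page or a copy
recovered from a core file, which is the point of dumping the extra phdrs.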

I have tested the vsyscall DSO image with libc changes that use it (as well
as existing libc that needs AT_SYSINFO to work as it has before).
I have tested that core dumps are valid ELF and the new segments look correct.

Nothing yet in use needs the DSO to have symbol names, or an ELF soname, or
ELF symbol versions.  The selection of soname (in Makefile) and symbol
versions (in vsyscall.lds) are arbitrary and shouldn't be considered final
without some more thought.  Those can be changed or removed without
affecting anything in the kernel.  The exported symbol names __kernel_* are
also used inside the kernel now, but labels can be made in vsyscall-*.S for
kernel use irrespective of what names are chosen to export in the DSO
symbol table.

Please let me know if you like this or don't.


Thanks,
Roland



[-- Attachment #2: patch to linux-2.5.68 implementing i386 vsyscall DSO --]
[-- Type: text/plain, Size: 21872 bytes --]

--- stock-2.5.68/arch/i386/kernel/Makefile	Sat Apr 19 19:48:53 2003
+++ linux-2.5.68/arch/i386/kernel/Makefile	Wed Apr 23 21:03:25 2003
@@ -27,9 +27,29 @@ obj-$(CONFIG_SOFTWARE_SUSPEND)	+= suspen
 obj-$(CONFIG_X86_NUMAQ)		+= numaq.o
 obj-$(CONFIG_EDD)             	+= edd.o
 obj-$(CONFIG_MODULES)		+= module.o
-obj-y				+= sysenter.o
+obj-y				+= sysenter.o vsyscall.o
 obj-$(CONFIG_ACPI_SRAT) 	+= srat.o
 
 EXTRA_AFLAGS   := -traditional
 
 obj-$(CONFIG_SCx200)		+= scx200.o
+
+# vsyscall.o contains the vsyscall DSO images as __initdata.
+# We must build both images before we can assemble it.
+$(obj)/vsyscall.o: $(obj)/vsyscall-int80.so $(obj)/vsyscall-sysenter.so
+extra-y += $(foreach F,int80 sysenter,vsyscall-$F.o vsyscall-$F.so)
+
+# The DSO images are built using a special linker script.
+$(obj)/vsyscall-int80.so $(obj)/vsyscall-sysenter.so: \
+$(obj)/vsyscall-%.so: $(src)/vsyscall.lds $(obj)/vsyscall-%.o
+	$(CC) -nostdlib -shared -s -Wl,-soname=linux-vsyscall.so.1 \
+	      -o $@ -Wl,-T,$^
+
+# We also create a special relocatable object that should mirror the symbol
+# table and layout of the linked DSO.  With ld -R we can then refer to
+# these symbols in the kernel code rather than hand-coded addresses.
+extra-y += vsyscall-syms.o
+$(obj)/built-in.o: $(obj)/vsyscall-syms.o
+$(obj)/built-in.o: ld_flags += -R $(obj)/vsyscall-syms.o
+$(obj)/vsyscall-syms.o: $(src)/vsyscall.lds $(obj)/vsyscall-sysenter.o
+	$(CC) -nostdlib -r -o $@ -Wl,-T,$^
--- stock-2.5.68/arch/i386/kernel/entry.S	Sat Apr 19 19:48:56 2003
+++ linux-2.5.68/arch/i386/kernel/entry.S	Wed Apr 23 20:37:55 2003
@@ -230,8 +230,8 @@ need_resched:
 	jmp need_resched
 #endif
 
-/* Points to after the "sysenter" instruction in the vsyscall page */
-#define SYSENTER_RETURN 0xffffe010
+/* SYSENTER_RETURN points to after the "sysenter" instruction in
+   the vsyscall page.  See vsyscall-sysenter.S, which defines the symbol.  */
 
 	# sysenter call handler stub
 ENTRY(sysenter_entry)
--- stock-2.5.68/arch/i386/kernel/signal.c	Sat Apr 19 19:49:25 2003
+++ linux-2.5.68/arch/i386/kernel/signal.c	Wed Apr 23 20:35:43 2003
@@ -19,6 +19,7 @@
 #include <linux/stddef.h>
 #include <linux/personality.h>
 #include <linux/suspend.h>
+#include <linux/elf.h>
 #include <asm/ucontext.h>
 #include <asm/uaccess.h>
 #include <asm/i387.h>
@@ -347,6 +348,10 @@ get_sigframe(struct k_sigaction *ka, str
 	return (void __user *)((esp - frame_size) & -8ul);
 }
 
+/* These symbols are defined with the addresses in the vsyscall page.
+   See vsyscall-sigreturn.S.  */
+extern void __kernel_sigreturn, __kernel_rt_sigreturn;
+
 static void setup_frame(int sig, struct k_sigaction *ka,
 			sigset_t *set, struct pt_regs * regs)
 {
@@ -379,7 +384,7 @@ static void setup_frame(int sig, struct 
 	if (err)
 		goto give_sigsegv;
 
-	restorer = (void *) (fix_to_virt(FIX_VSYSCALL) + 32);
+	restorer = &__kernel_sigreturn;
 	if (ka->sa.sa_flags & SA_RESTORER)
 		restorer = ka->sa.sa_restorer;
 
@@ -462,7 +467,7 @@ static void setup_rt_frame(int sig, stru
 		goto give_sigsegv;
 
 	/* Set up to return from userspace.  */
-	restorer = (void *) (fix_to_virt(FIX_VSYSCALL) + 64);
+	restorer = &__kernel_rt_sigreturn;
 	if (ka->sa.sa_flags & SA_RESTORER)
 		restorer = ka->sa.sa_restorer;
 	err |= __put_user(restorer, &frame->pretcode);
--- stock-2.5.68/arch/i386/kernel/sysenter.c	Sat Apr 19 19:51:16 2003
+++ linux-2.5.68/arch/i386/kernel/sysenter.c	Wed Apr 23 02:16:02 2003
@@ -51,151 +51,30 @@ void enable_sep_cpu(void *info)
 	put_cpu();	
 }
 
+/*
+ * These symbols are defined by vsyscall.o to mark the bounds
+ * of the ELF DSO images included therein.
+ */
+extern const char vsyscall_int80_start, vsyscall_int80_end;
+extern const char vsyscall_sysenter_start, vsyscall_sysenter_end;
+
 static int __init sysenter_setup(void)
 {
-	static const char __initdata int80[] = {
-		0xcd, 0x80,		/* int $0x80 */
-		0xc3			/* ret */
-	};
-	/* Unwind information for the int80 code.  Keep track of
-	   where the return address is stored.  */
-	static const char __initdata int80_eh_frame[] = {
-	/* First the Common Information Entry (CIE):  */
-		0x14, 0x00, 0x00, 0x00,	/* Length of the CIE */
-		0x00, 0x00, 0x00, 0x00,	/* CIE Identifier Tag */
-		0x01,			/* CIE Version */
-		'z', 'R', 0x00,		/* CIE Augmentation */
-		0x01,			/* CIE Code Alignment Factor */
-		0x7c,			/* CIE Data Alignment Factor */
-		0x08,			/* CIE RA Column */
-		0x01,			/* Augmentation size */
-		0x1b,			/* FDE Encoding (pcrel sdata4) */
-		0x0c,			/* DW_CFA_def_cfa */
-		0x04,
-		0x04,
-		0x88,			/* DW_CFA_offset, column 0x8 */
-		0x01,
-		0x00,			/* padding */
-		0x00,
-	/* Now the FDE which contains the instructions for the frame.  */
-		0x0a, 0x00, 0x00, 0x00,	/* FDE Length */
-		0x1c, 0x00, 0x00, 0x00,	/* FDE CIE offset */
-	/* The PC-relative offset to the beginning of the code this
-	   FDE covers.  The computation below assumes that the offset
-	   can be represented in one byte.  Change if this is not true
-	   anymore.  The offset from the beginning of the .eh_frame
-	   is represented by EH_FRAME_OFFSET.  The word with the offset
-	   starts at byte 0x20 of the .eh_frame.  */
-		0x100 - (EH_FRAME_OFFSET + 0x20),
-		0xff, 0xff, 0xff,	/* FDE initial location */
-		3,			/* FDE address range */
-		0x00			/* Augmentation size */
-	/* The code does not change the stack pointer.  We need not
-	   record any operations.  */
-	};
-	static const char __initdata sysent[] = {
-		0x51,			/* push %ecx */
-		0x52,			/* push %edx */
-		0x55,			/* push %ebp */
-	/* 3: backjump target */
-		0x89, 0xe5,		/* movl %esp,%ebp */
-		0x0f, 0x34,		/* sysenter */
-
-	/* 7: align return point with nop's to make disassembly easier */
-		0x90, 0x90, 0x90, 0x90,
-		0x90, 0x90, 0x90,
-
-	/* 14: System call restart point is here! (SYSENTER_RETURN - 2) */
-		0xeb, 0xf3,		/* jmp to "movl %esp,%ebp" */
-	/* 16: System call normal return point is here! (SYSENTER_RETURN in entry.S) */
-		0x5d,			/* pop %ebp */
-		0x5a,			/* pop %edx */
-		0x59,			/* pop %ecx */
-		0xc3			/* ret */
-	};
-	/* Unwind information for the sysenter code.  Keep track of
-	   where the return address is stored.  */
-	static const char __initdata sysent_eh_frame[] = {
-	/* First the Common Information Entry (CIE):  */
-		0x14, 0x00, 0x00, 0x00,	/* Length of the CIE */
-		0x00, 0x00, 0x00, 0x00,	/* CIE Identifier Tag */
-		0x01,			/* CIE Version */
-		'z', 'R', 0x00,		/* CIE Augmentation */
-		0x01,			/* CIE Code Alignment Factor */
-		0x7c,			/* CIE Data Alignment Factor */
-		0x08,			/* CIE RA Column */
-		0x01,			/* Augmentation size */
-		0x1b,			/* FDE Encoding (pcrel sdata4) */
-		0x0c,			/* DW_CFA_def_cfa */
-		0x04,
-		0x04,
-		0x88,			/* DW_CFA_offset, column 0x8 */
-		0x01,
-		0x00,			/* padding */
-		0x00,
-	/* Now the FDE which contains the instructions for the frame.  */
-		0x22, 0x00, 0x00, 0x00,	/* FDE Length */
-		0x1c, 0x00, 0x00, 0x00,	/* FDE CIE offset */
-	/* The PC-relative offset to the beginning of the code this
-	   FDE covers.  The computation below assumes that the offset
-	   can be represented in one byte.  Change if this is not true
-	   anymore.  The offset from the beginning of the .eh_frame
-	   is represented by EH_FRAME_OFFSET.  The word with the offset
-	   starts at byte 0x20 of the .eh_frame.  */
-		0x100 - (EH_FRAME_OFFSET + 0x20),
-		0xff, 0xff, 0xff,	/* FDE initial location */
-		0x14, 0x00, 0x00, 0x00,	/* FDE address range */
-		0x00,			/* Augmentation size */
-	/* What follows are the instructions for the table generation.
-	   We have to record all changes of the stack pointer and
-	   callee-saved registers.  */
-		0x41,			/* DW_CFA_advance_loc+1, push %ecx */
-		0x0e,			/* DW_CFA_def_cfa_offset */
-		0x08,			/* RA at offset 8 now */
-		0x41,			/* DW_CFA_advance_loc+1, push %edx */
-		0x0e,			/* DW_CFA_def_cfa_offset */
-		0x0c,			/* RA at offset 12 now */
-		0x41,			/* DW_CFA_advance_loc+1, push %ebp */
-		0x0e,			/* DW_CFA_def_cfa_offset */
-		0x10,			/* RA at offset 16 now */
-		0x85, 0x04,		/* DW_CFA_offset %ebp -16 */
-	/* Finally the epilogue.  */
-		0x4e,			/* DW_CFA_advance_loc+14, pop %ebx */
-		0x0e,			/* DW_CFA_def_cfa_offset */
-		0x12,			/* RA at offset 12 now */
-		0xc5,			/* DW_CFA_restore %ebp */
-		0x41,			/* DW_CFA_advance_loc+1, pop %edx */
-		0x0e,			/* DW_CFA_def_cfa_offset */
-		0x08,			/* RA at offset 8 now */
-		0x41,			/* DW_CFA_advance_loc+1, pop %ecx */
-		0x0e,			/* DW_CFA_def_cfa_offset */
-		0x04			/* RA at offset 4 now */
-	};
-	static const char __initdata sigreturn[] = {
-	/* 32: sigreturn point */
-		0x58,				/* popl %eax */
-		0xb8, __NR_sigreturn, 0, 0, 0,	/* movl $__NR_sigreturn, %eax */
-		0xcd, 0x80,			/* int $0x80 */
-	};
-	static const char __initdata rt_sigreturn[] = {
-	/* 64: rt_sigreturn point */
-		0xb8, __NR_rt_sigreturn, 0, 0, 0,	/* movl $__NR_rt_sigreturn, %eax */
-		0xcd, 0x80,			/* int $0x80 */
-	};
 	unsigned long page = get_zeroed_page(GFP_ATOMIC);
 
 	__set_fixmap(FIX_VSYSCALL, __pa(page), PAGE_READONLY);
-	memcpy((void *) page, int80, sizeof(int80));
-	memcpy((void *)(page + 32), sigreturn, sizeof(sigreturn));
-	memcpy((void *)(page + 64), rt_sigreturn, sizeof(rt_sigreturn));
-	memcpy((void *)(page + EH_FRAME_OFFSET), int80_eh_frame,
-	       sizeof(int80_eh_frame));
-	if (!boot_cpu_has(X86_FEATURE_SEP))
+
+	if (!boot_cpu_has(X86_FEATURE_SEP)) {
+		memcpy((void *) page,
+		       &vsyscall_int80_start,
+		       &vsyscall_int80_end - &vsyscall_int80_start);
 		return 0;
+	}
+
+	memcpy((void *) page,
+	       &vsyscall_sysenter_start,
+	       &vsyscall_sysenter_end - &vsyscall_sysenter_start);
 
-	memcpy((void *) page, sysent, sizeof(sysent));
-	memcpy((void *)(page + EH_FRAME_OFFSET), sysent_eh_frame,
-	       sizeof(sysent_eh_frame));
 	on_each_cpu(enable_sep_cpu, NULL, 1, 1);
 	return 0;
 }
--- stock-2.5.68/arch/i386/kernel/vsyscall-int80.S	Wed Dec 31 16:00:00 1969
+++ linux-2.5.68/arch/i386/kernel/vsyscall-int80.S	Wed Apr 23 20:41:41 2003
@@ -0,0 +1,48 @@
+/*
+ * Code for the vsyscall page.  This version uses the old int $0x80 method.
+ */
+
+	.text
+	.globl __kernel_vsyscall
+	.type __kernel_vsyscall,@function
+__kernel_vsyscall:
+.LSTART_vsyscall:
+	int $0x80
+	ret
+.LEND_vsyscall:
+	.size __kernel_vsyscall,.-.LSTART_vsyscall
+	.previous
+
+	.section .eh_frame,"a",@progbits
+.LSTARTFRAMEDLSI:
+	.long .LENDCIEDLSI-.LSTARTCIEDLSI
+.LSTARTCIEDLSI:
+	.long 0			/* CIE ID */
+	.byte 1			/* Version number */
+	.string "zR"		/* NUL-terminated augmentation string */
+	.uleb128 1		/* Code alignment factor */
+	.sleb128 -4		/* Data alignment factor */
+	.byte 8			/* Return address register column */
+	.uleb128 1		/* Augmentation value length */
+	.byte 0x1b		/* DW_EH_PE_pcrel|DW_EH_PE_sdata4. */
+	.byte 0x0c		/* DW_CFA_def_cfa */
+	.uleb128 4
+	.uleb128 4
+	.byte 0x88		/* DW_CFA_offset, column 0x8 */
+	.uleb128 1
+	.align 4
+.LENDCIEDLSI:
+	.long .LENDFDEDLSI-.LSTARTFDEDLSI /* Length FDE */
+.LSTARTFDEDLSI:
+	.long .LSTARTFDEDLSI-.LSTARTFRAMEDLSI /* CIE pointer */
+	.long .LSTART_vsyscall-.	/* PC-relative start address */
+	.long .LEND_vsyscall-.LSTART_vsyscall
+	.uleb128 0
+	.align 4
+.LENDFDEDLSI:
+	.previous
+
+/*
+ * Get the common code for the sigreturn entry points.
+ */
+#include "vsyscall-sigreturn.S"
--- stock-2.5.68/arch/i386/kernel/vsyscall-sysenter.S	Wed Dec 31 16:00:00 1969
+++ linux-2.5.68/arch/i386/kernel/vsyscall-sysenter.S	Wed Apr 23 23:13:14 2003
@@ -0,0 +1,97 @@
+/*
+ * Code for the vsyscall page.  This version uses the sysenter instruction.
+ */
+
+	.text
+	.globl __kernel_vsyscall
+	.type __kernel_vsyscall,@function
+__kernel_vsyscall:
+.LSTART_vsyscall:
+	push %ecx
+.Lpush_ecx:
+	push %edx
+.Lpush_edx:
+	push %ebp
+.Lenter_kernel:
+	movl %esp,%ebp
+	sysenter
+
+	/* 7: align return point with nop's to make disassembly easier */
+	.space 7,0x90
+
+	/* 14: System call restart point is here! (SYSENTER_RETURN - 2) */
+	jmp .Lenter_kernel
+	/* 16: System call normal return point is here! */
+	.globl SYSENTER_RETURN	/* Symbol used by entry.S.  */
+SYSENTER_RETURN:
+	pop %ebp
+.Lpop_ebp:
+	pop %edx
+.Lpop_edx:
+	pop %ecx
+.Lpop_ecx:
+	ret
+.LEND_vsyscall:
+	.size __kernel_vsyscall,.-.LSTART_vsyscall
+	.previous
+
+	.section .eh_frame,"a",@progbits
+.LSTARTFRAMEDLSI:
+	.long .LENDCIEDLSI-.LSTARTCIEDLSI
+.LSTARTCIEDLSI:
+	.long 0			/* CIE ID */
+	.byte 1			/* Version number */
+	.string "zR"		/* NUL-terminated augmentation string */
+	.uleb128 1		/* Code alignment factor */
+	.sleb128 -4		/* Data alignment factor */
+	.byte 8			/* Return address register column */
+	.uleb128 1		/* Augmentation value length */
+	.byte 0x1b		/* DW_EH_PE_pcrel|DW_EH_PE_sdata4. */
+	.byte 0x0c		/* DW_CFA_def_cfa */
+	.uleb128 4
+	.uleb128 4
+	.byte 0x88		/* DW_CFA_offset, column 0x8 */
+	.uleb128 1
+	.align 4
+.LENDCIEDLSI:
+	.long .LENDFDEDLSI-.LSTARTFDEDLSI /* Length FDE */
+.LSTARTFDEDLSI:
+	.long .LSTARTFDEDLSI-.LSTARTFRAMEDLSI /* CIE pointer */
+	.long .LSTART_vsyscall-.	/* PC-relative start address */
+	.long .LEND_vsyscall-.LSTART_vsyscall
+	.uleb128 0
+	/* What follows are the instructions for the table generation.
+	   We have to record all changes of the stack pointer.  */
+	.byte 0x04		/* DW_CFA_advance_loc4 */
+	.long .Lpush_ecx-.LSTART_vsyscall
+	.byte 0x0e		/* DW_CFA_def_cfa_offset */
+	.byte 0x08		/* RA at offset 8 now */
+	.byte 0x04		/* DW_CFA_advance_loc4 */
+	.long .Lpush_edx-.Lpush_ecx
+	.byte 0x0e		/* DW_CFA_def_cfa_offset */
+	.byte 0x0c		/* RA at offset 12 now */
+	.byte 0x04		/* DW_CFA_advance_loc4 */
+	.long .Lenter_kernel-.Lpush_edx
+	.byte 0x0e		/* DW_CFA_def_cfa_offset */
+	.byte 0x10		/* RA at offset 16 now */
+	/* Finally the epilogue.  */
+	.byte 0x04		/* DW_CFA_advance_loc4 */
+	.long .Lpop_ebp-.Lenter_kernel
+	.byte 0x0e		/* DW_CFA_def_cfa_offset */
+	.byte 0x0c		/* RA at offset 12 now */
+	.byte 0x04		/* DW_CFA_advance_loc4 */
+	.long .Lpop_edx-.Lpop_ebp
+	.byte 0x0e		/* DW_CFA_def_cfa_offset */
+	.byte 0x08		/* RA at offset 8 now */
+	.byte 0x04		/* DW_CFA_advance_loc4 */
+	.long .Lpop_ecx-.Lpop_edx
+	.byte 0x0e		/* DW_CFA_def_cfa_offset */
+	.byte 0x04		/* RA at offset 4 now */
+	.align 4
+.LENDFDEDLSI:
+	.previous
+
+/*
+ * Get the common code for the sigreturn entry points.
+ */
+#include "vsyscall-sigreturn.S"
--- stock-2.5.68/arch/i386/kernel/vsyscall-sigreturn.S	Wed Dec 31 16:00:00 1969
+++ linux-2.5.68/arch/i386/kernel/vsyscall-sigreturn.S	Wed Apr 23 20:43:16 2003
@@ -0,0 +1,38 @@
+/*
+ * Common code for the sigreturn entry points on the vsyscall page.
+ * So far this code is the same for both int80 and sysenter versions.
+ * This file is #include'd by vsyscall-*.S to define them after the
+ * vsyscall entry point.  The addresses we get for these entry points
+ * by doing ".balign 32" must match in both versions of the page.
+ */
+
+#include <asm/unistd.h>
+
+
+/* XXX
+   Should these be named "_sigtramp" or something?
+*/
+
+	.text
+	.balign 32
+	.globl __kernel_sigreturn
+	.type __kernel_sigreturn,@function
+__kernel_sigreturn:
+.LSTART_sigreturn:
+	popl %eax		/* XXX does this mean it needs unwind info? */
+	movl $__NR_sigreturn, %eax
+	int $0x80
+.LEND_sigreturn:
+	.size __kernel_sigreturn,.-.LSTART_sigreturn
+
+	.text
+	.balign 32
+	.globl __kernel_rt_sigreturn
+	.type __kernel_rt_sigreturn,@function
+__kernel_rt_sigreturn:
+.LSTART_rt_sigreturn:
+	movl $__NR_rt_sigreturn, %eax
+	int $0x80
+.LEND_rt_sigreturn:
+	.size __kernel_rt_sigreturn,.-.LSTART_rt_sigreturn
+	.previous
--- stock-2.5.68/arch/i386/kernel/vsyscall.lds	Wed Dec 31 16:00:00 1969
+++ linux-2.5.68/arch/i386/kernel/vsyscall.lds	Wed Apr 23 20:59:12 2003
@@ -0,0 +1,67 @@
+/*
+ * Linker script for vsyscall DSO.  The vsyscall page is an ELF shared
+ * object prelinked to its virtual address, and with only one read-only
+ * segment (that fits in one page).  This script controls its layout.
+ */
+
+/* This must match <asm/fixmap.h>.  */
+VSYSCALL_BASE = 0xffffe000;
+
+SECTIONS
+{
+  . = VSYSCALL_BASE + SIZEOF_HEADERS;
+
+  .hash           : { *(.hash) }		:text
+  .dynsym         : { *(.dynsym) }
+  .dynstr         : { *(.dynstr) }
+  .gnu.version    : { *(.gnu.version) }
+  .gnu.version_d  : { *(.gnu.version_d) }
+  .gnu.version_r  : { *(.gnu.version_r) }
+
+  /* This linker script is used both with -r and with -shared.
+     For the layouts to match, we need to skip more than enough
+     space for the dynamic symbol table et al.  If this amount
+     is insufficient, ld -shared will barf.  Just increase it here.  */
+  . = VSYSCALL_BASE + 0x400;
+
+  .text           : { *(.text) }		:text =0x90909090
+
+  .eh_frame_hdr   : { *(.eh_frame_hdr) }	:text :eh_frame_hdr
+  .eh_frame       : { KEEP (*(.eh_frame)) }	:text
+  .dynamic        : { *(.dynamic) }		:text :dynamic
+  .useless        : {
+  	*(.got.plt) *(.got)
+	*(.data .data.* .gnu.linkonce.d.*)
+	*(.dynbss)
+	*(.bss .bss.* .gnu.linkonce.b.*)
+  }						:text
+}
+
+/*
+ * We must supply the ELF program headers explicitly to get just one
+ * PT_LOAD segment, and set the flags explicitly to make segments read-only.
+ */
+PHDRS
+{
+  text PT_LOAD FILEHDR PHDRS FLAGS(5); /* PF_R|PF_X */
+  dynamic PT_DYNAMIC FLAGS(4); /* PF_R */
+  eh_frame_hdr 0x6474e550; /* PT_GNU_EH_FRAME, but ld doesn't match the name */
+}
+
+/*
+ * This controls what symbols we export from the DSO.
+ */
+VERSION
+{
+  LINUX_2.5 {
+    global:
+    	__kernel_vsyscall;
+    	__kernel_sigreturn;
+    	__kernel_rt_sigreturn;
+
+    local: *;
+  };
+}
+
+/* The ELF entry point can be used to set the AT_SYSINFO value.  */
+ENTRY(__kernel_vsyscall);
--- stock-2.5.68/fs/binfmt_elf.c	Sat Apr 19 19:49:23 2003
+++ linux-2.5.68/fs/binfmt_elf.c	Wed Apr 23 12:54:07 2003
@@ -1260,6 +1260,9 @@ static int elf_core_dump(long signr, str
 	elf_core_copy_regs(&prstatus->pr_reg, regs);
 	
 	segs = current->mm->map_count;
+#ifdef ELF_CORE_EXTRA_PHDRS
+	segs += ELF_CORE_EXTRA_PHDRS;
+#endif
 
 	/* Set up header */
 	fill_elf_header(elf, segs+1);	/* including notes section */
@@ -1340,6 +1343,10 @@ static int elf_core_dump(long signr, str
 		DUMP_WRITE(&phdr, sizeof(phdr));
 	}
 
+#ifdef ELF_CORE_WRITE_EXTRA_PHDRS
+	ELF_CORE_WRITE_EXTRA_PHDRS;
+#endif
+
  	/* write out the notes section */
 	for (i = 0; i < numnote; i++)
 		if (!writenote(notes + i, file))
@@ -1385,6 +1392,10 @@ static int elf_core_dump(long signr, str
 		}
 	}
 
+#ifdef ELF_CORE_WRITE_EXTRA_DATA
+	ELF_CORE_WRITE_EXTRA_DATA;
+#endif
+
 	if ((off_t) file->f_pos != offset) {
 		/* Sanity check */
 		printk("elf_core_dump: file->f_pos (%ld) != offset (%ld)\n",
--- stock-2.5.68/include/linux/elf.h	Sat Apr 19 19:48:52 2003
+++ linux-2.5.68/include/linux/elf.h	Wed Apr 23 02:48:09 2003
@@ -29,8 +29,11 @@ typedef __s64	Elf64_Sxword;
 #define PT_NOTE    4
 #define PT_SHLIB   5
 #define PT_PHDR    6
+#define PT_LOOS	   0x60000000
+#define PT_HIOS	   0x6fffffff
 #define PT_LOPROC  0x70000000
 #define PT_HIPROC  0x7fffffff
+#define PT_GNU_EH_FRAME		0x6474e550
 #define PT_MIPS_REGINFO		0x70000000
 
 /* Flags in the e_flags field of the header */
--- stock-2.5.68/include/asm-i386/elf.h	Sat Apr 19 19:50:08 2003
+++ linux-2.5.68/include/asm-i386/elf.h	Thu Apr 24 17:15:55 2003
@@ -101,7 +101,7 @@ typedef struct user_fxsr_struct elf_fpxr
  * for more of them, start the x86-specific ones at 32.
  */
 #define AT_SYSINFO		32
-#define AT_SYSINFO_EH_FRAME	33
+#define AT_SYSINFO_EHDR		33
 
 #ifdef __KERNEL__
 #define SET_PERSONALITY(ex, ibcs2) set_personality((ibcs2)?PER_SVR4:PER_LINUX)
@@ -119,15 +119,55 @@ extern void dump_smp_unlazy_fpu(void);
 #define ELF_CORE_SYNC dump_smp_unlazy_fpu
 #endif
 
-/* Offset from the beginning of the page where the .eh_frame information
-   for the code in the vsyscall page starts.  */
-#define EH_FRAME_OFFSET 96
+#define VSYSCALL_BASE	(__fix_to_virt(FIX_VSYSCALL))
+#define VSYSCALL_EHDR	((const struct elfhdr *) VSYSCALL_BASE)
+#define VSYSCALL_ENTRY	((unsigned long) &__kernel_vsyscall)
+extern void __kernel_vsyscall;
 
 #define ARCH_DLINFO						\
 do {								\
-		NEW_AUX_ENT(AT_SYSINFO, 0xffffe000);		\
-		NEW_AUX_ENT(AT_SYSINFO_EH_FRAME,		\
-			    0xffffe000 + EH_FRAME_OFFSET);	\
+		NEW_AUX_ENT(AT_SYSINFO,	VSYSCALL_ENTRY);	\
+		NEW_AUX_ENT(AT_SYSINFO_EHDR, VSYSCALL_BASE);	\
+} while (0)
+
+/*
+ * These macros parameterize elf_core_dump in fs/binfmt_elf.c to write out
+ * extra segments containing the vsyscall DSO contents.  Dumping its
+ * contents makes post-mortem analysis fully interpretable later without
+ * matching up the same kernel and hardware config to see what PC values meant.
+ * Dumping its extra ELF program headers includes all the other information
+ * a debugger needs to easily find how the vsyscall DSO was being used.
+ */
+#define ELF_CORE_EXTRA_PHDRS		(VSYSCALL_EHDR->e_phnum)
+#define ELF_CORE_WRITE_EXTRA_PHDRS					      \
+do {									      \
+	const struct elf_phdr *const vsyscall_phdrs =			      \
+		(const struct elf_phdr *) (VSYSCALL_BASE		      \
+					   + VSYSCALL_EHDR->e_phoff);	      \
+	int i;								      \
+	for (i = 0; i < VSYSCALL_EHDR->e_phnum; ++i) {			      \
+		struct elf_phdr phdr = vsyscall_phdrs[i];		      \
+		if (phdr.p_type == PT_LOAD) {				      \
+			phdr.p_offset = offset;				      \
+			offset += phdr.p_filesz;			      \
+		}							      \
+		else							      \
+			phdr.p_offset += offset;			      \
+		phdr.p_paddr = 0; /* match other core phdrs */		      \
+		DUMP_WRITE(&phdr, sizeof(phdr));			      \
+	}								      \
+} while (0)
+#define ELF_CORE_WRITE_EXTRA_DATA					      \
+do {									      \
+	const struct elf_phdr *const vsyscall_phdrs =			      \
+		(const struct elf_phdr *) (VSYSCALL_BASE		      \
+					   + VSYSCALL_EHDR->e_phoff);	      \
+	int i;								      \
+	for (i = 0; i < VSYSCALL_EHDR->e_phnum; ++i) {			      \
+		if (vsyscall_phdrs[i].p_type == PT_LOAD)		      \
+			DUMP_WRITE((void *) vsyscall_phdrs[i].p_vaddr,	      \
+				   vsyscall_phdrs[i].p_filesz);		      \
+	}								      \
 } while (0)
 
 #endif
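
The offset bookkeeping in ELF_CORE_WRITE_EXTRA_PHDRS can be tried out in
user space with a sketch like the one below.  rebase_extra_phdrs is a
hypothetical stand-in for the macro that works on a plain array instead of
calling DUMP_WRITE.  One deliberate difference: it remembers where the
PT_LOAD image starts and rebases the other headers against that start, on
the assumption that their p_offset values are relative to the beginning of
the DSO image and that the PT_LOAD header comes first; the macro as posted
adds the running offset instead.

```c
#include <elf.h>
#include <stddef.h>

/* Rebase N program headers copied from a DSO image for use in a core
   file.  *OFFSET is the file offset at which the next PT_LOAD segment's
   bytes will be written; it is advanced past each PT_LOAD segment.  */
static void rebase_extra_phdrs(Elf32_Phdr *phdrs, size_t n, Elf32_Off *offset)
{
	Elf32_Off image_start = 0;	/* where the image lands in the core */
	size_t i;

	for (i = 0; i < n; ++i) {
		if (phdrs[i].p_type == PT_LOAD) {
			image_start = *offset;
			phdrs[i].p_offset = image_start;
			*offset += phdrs[i].p_filesz;
		} else {
			/* Assumed relative to the start of the image.  */
			phdrs[i].p_offset += image_start;
		}
		phdrs[i].p_paddr = 0;	/* match other core phdrs */
	}
}
```

With a PT_LOAD header of 1924 bytes followed by PT_DYNAMIC at a made-up
image offset of 0x500, and a starting file offset of 0x3000, the headers
come out at 0x3000 and 0x3500, and the running offset ends at 0x3000 + 1924.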


* Re: [PATCH] i386 vsyscall DSO implementation
  2003-04-25  1:10 [PATCH] i386 vsyscall DSO implementation Roland McGrath
@ 2003-04-25  1:49 ` Jeff Garzik
  2003-04-25  2:10   ` Roland McGrath
  2003-04-26 17:15 ` Ulrich Drepper
  1 sibling, 1 reply; 10+ messages in thread
From: Jeff Garzik @ 2003-04-25  1:49 UTC
  To: Roland McGrath; +Cc: Linus Torvalds, linux-kernel, Ulrich Drepper

> Two DSOs are built (an int $0x80 one and a sysenter one), using normal
> assembly code and ld -shared with a special linker script.  Both images
> (stripped ELF .so files) are embedded in __initdata space; sysenter_setup
> copies one or the other whole image into the vsyscall page.  Each image is
> a little under 2k (1884 and 1924) now, and could be trimmed a little bit
> with some specialized ELF stripping that ld and strip don't do.  Adding
> additional entry points should not have much additional overhead beyond the
> code itself and the string size of new symbol names.

We already embed a cpio archive into __initdata space.  What about 
putting the images in there, and either copying the data out of 
initramfs, or, directly referencing the pages that store each image?

	Jeff




* Re: [PATCH] i386 vsyscall DSO implementation
  2003-04-25  1:49 ` Jeff Garzik
@ 2003-04-25  2:10   ` Roland McGrath
  2003-04-25 16:21     ` David Mosberger
  0 siblings, 1 reply; 10+ messages in thread
From: Roland McGrath @ 2003-04-25  2:10 UTC
  To: Jeff Garzik; +Cc: Linus Torvalds, linux-kernel, Ulrich Drepper

> We already embed a cpio archive into __initdata space.  What about 
> putting the images in there, and either copying the data out of 
> initramfs, or, directly referencing the pages that store each image?

It doesn't matter to me, but I don't see the benefit to doing that.  It's
rather unlike what initramfs is used for now and would need a bunch of
extra code to accomplish something very simple.  

The DSO images are not stored page-aligned and padded in the kernel image,
so the pages can't be used directly.  Storing them that way would use more
space in the kernel image on disk, and then you'd want to free the initdata
page containing the unused one.


* Re: [PATCH] i386 vsyscall DSO implementation
  2003-04-25  2:10   ` Roland McGrath
@ 2003-04-25 16:21     ` David Mosberger
  2003-04-25 21:00       ` H. Peter Anvin
  2003-04-26 22:06       ` Roland McGrath
  0 siblings, 2 replies; 10+ messages in thread
From: David Mosberger @ 2003-04-25 16:21 UTC
  To: Roland McGrath; +Cc: Jeff Garzik, Linus Torvalds, linux-kernel, Ulrich Drepper

I like this.  Even better would be if all platforms could do the same.
I'm definitely interested in doing something similar for ia64 (the
getunwind() syscall was always just a stop-gap solution).

I assume that these kernel ELF images would then show up in
dl_iterate_phdr()?
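
One way to check this from user space is sketched below, assuming a glibc
that declares dl_iterate_phdr() in <link.h> when _GNU_SOURCE is defined.
Whether the vsyscall DSO is reported at all depends on the dynamic linker;
the sketch only answers whether a given address (for example the one passed
in AT_SYSINFO_EHDR) falls inside some object that is reported.  The names
addr_query, covers_addr and addr_in_reported_object are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for dl_iterate_phdr() */
#endif
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Callback state: the address to look for, and whether it was found.  */
struct addr_query {
	uintptr_t addr;
	int found;
};

/* dl_iterate_phdr() callback: stop at the first object with a PT_LOAD
   segment that covers the queried address.  */
static int covers_addr(struct dl_phdr_info *info, size_t size, void *data)
{
	struct addr_query *q = data;
	size_t i;

	(void) size;
	for (i = 0; i < info->dlpi_phnum; ++i) {
		const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
		uintptr_t start = info->dlpi_addr + ph->p_vaddr;

		if (ph->p_type == PT_LOAD
		    && q->addr >= start && q->addr - start < ph->p_memsz) {
			q->found = 1;
			return 1;	/* nonzero ends the iteration */
		}
	}
	return 0;
}

/* Return 1 if ADDR lies in some object the dynamic linker reports.  */
static int addr_in_reported_object(const void *addr)
{
	struct addr_query q = { (uintptr_t) addr, 0 };

	dl_iterate_phdr(covers_addr, &q);
	return q.found;
}
```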

To complete the picture, it would be nice if the kernel ELF images
were mappable files (either in /sysfs or /proc) and would show up in
/proc/PID/maps.  That way, a distributed application such as a remote
debugger could gain access to the kernel unwind tables on a remote
machine (assuming you have a remote filesystem).

	--david

>>>>> On Thu, 24 Apr 2003 19:10:50 -0700, Roland McGrath <roland@redhat.com> said:

  >> We already embed a cpio archive into __initdata space.  What
  >> about putting the images in there, and either copying the data
  >> out of initramfs, or, directly referencing the pages that store
  >> each image?

  Roland> It doesn't matter to me, but I don't see the benefit to
  Roland> doing that.  It's rather unlike what initramfs is used for
  Roland> now and would need a bunch of extra code to accomplish
  Roland> something very simple.

  Roland> The DSO images are not stored page-aligned and padded in the
  Roland> kernel image, so the pages can't be used directly.  Storing
  Roland> them that way would use more space in the kernel image on
  Roland> disk, and then you'd want to free initdata page containing
  Roland> the unused one.


* Re: [PATCH] i386 vsyscall DSO implementation
  2003-04-25 16:21     ` David Mosberger
@ 2003-04-25 21:00       ` H. Peter Anvin
  2003-04-25 21:17         ` David Mosberger
  2003-04-26 22:06       ` Roland McGrath
  1 sibling, 1 reply; 10+ messages in thread
From: H. Peter Anvin @ 2003-04-25 21:00 UTC
  To: linux-kernel

Followup to:  <16041.24730.267207.671647@napali.hpl.hp.com>
By author:    David Mosberger <davidm@napali.hpl.hp.com>
In newsgroup: linux.dev.kernel
>
> I like this.  Even better would be if all platforms could do the same.
> I'm definitely interested in doing something similar for ia64 (the
> getunwind() syscall was always just a stop-gap solution).
> 
> I assume that these kernel ELF images would then show up in
> dl_iterate_phdr()?
> 
> To complete the picture, it would be nice if the kernel ELF images
> were mappable files (either in /sysfs or /proc) and would show up in
> /proc/PID/maps.  That way, a distributed application such as a remote
> debugger could gain access to the kernel unwind tables on a remote
> machine (assuming you have a remote filesystem).
> 

How about /boot?

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64


* Re: [PATCH] i386 vsyscall DSO implementation
  2003-04-25 21:00       ` H. Peter Anvin
@ 2003-04-25 21:17         ` David Mosberger
  2003-04-25 21:20           ` H. Peter Anvin
  0 siblings, 1 reply; 10+ messages in thread
From: David Mosberger @ 2003-04-25 21:17 UTC
  To: H. Peter Anvin; +Cc: linux-kernel

>>>>> On 25 Apr 2003 14:00:53 -0700, "H. Peter Anvin" <hpa@zytor.com> said:

  hpa> Followup to: <16041.24730.267207.671647@napali.hpl.hp.com>
  hpa> By author: David Mosberger <davidm@napali.hpl.hp.com>
  hpa> In newsgroup: linux.dev.kernel

  >>  I like this.  Even better would be if all platforms could do the
  >> same.  I'm definitely interested in doing something similar for
  >> ia64 (the getunwind() syscall was always just a stop-gap
  >> solution).

  >> I assume that these kernel ELF images would then show up in
  >> dl_iterate_phdr()?

  >> To complete the picture, it would be nice if the kernel ELF
  >> images were mappable files (either in /sysfs or /proc) and would
  >> show up in /proc/PID/maps.  That way, a distributed application
  >> such as a remote debugger could gain access to the kernel unwind
  >> tables on a remote machine (assuming you have a remote
  >> filesystem).

  hpa> How about /boot?

You mean a regular file?  I'm not sure whether this could be made to
work.  The /proc/PID/maps entry (really: the vm_area for the kernel
ELF images) would have to be created by the kernel, at a time when no
real filesystem is available.  Also, since the kernel needs to store
the data in kernel-memory anyhow, I don't think there is much point in
storing it on disk as well.

	--david


* Re: [PATCH] i386 vsyscall DSO implementation
  2003-04-25 21:17         ` David Mosberger
@ 2003-04-25 21:20           ` H. Peter Anvin
  2003-04-25 21:50             ` David Mosberger
  0 siblings, 1 reply; 10+ messages in thread
From: H. Peter Anvin @ 2003-04-25 21:20 UTC (permalink / raw)
  To: davidm; +Cc: linux-kernel

David Mosberger wrote:
> 
>   >> To complete the picture, it would be nice if the kernel ELF
>   >> images were mappable files (either in /sysfs or /proc) and would
>   >> show up in /proc/PID/maps.  That way, a distributed application
>   >> such as a remote debugger could gain access to the kernel unwind
>   >> tables on a remote machine (assuming you have a remote
>   >> filesystem).
> 
>   hpa> How about /boot?
> 
> You mean a regular file?  I'm not sure whether this could be made to
> work.  The /proc/PID/maps entry (really: the vm_area for the kernel
> ELF images) would have to be created by the kernel, at a time when no
> real filesystem is available.  Also, since the kernel needs to store
> the data in kernel-memory anyhow, I don't think there is much point in
> storing it on disk as well.
> 

Perhaps I misunderstood the statement.  With "kernel ELF images" above,
I am now gathering you're talking about only the segments exported to
userspace (i.e. vsyscall code), not the kernel itself, which was my
original reading of that statement.

	-hpa



* Re: [PATCH] i386 vsyscall DSO implementation
  2003-04-25 21:20           ` H. Peter Anvin
@ 2003-04-25 21:50             ` David Mosberger
  0 siblings, 0 replies; 10+ messages in thread
From: David Mosberger @ 2003-04-25 21:50 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: davidm, linux-kernel

>>>>> On Fri, 25 Apr 2003 14:20:47 -0700, "H. Peter Anvin" <hpa@zytor.com> said:

  hpa> David Mosberger wrote:
  >> >> To complete the picture, it would be nice if the kernel ELF
  >> >> images were mappable files (either in /sysfs or /proc) and
  >> >> would show up in /proc/PID/maps.  That way, a distributed
  >> >> application such as a remote debugger could gain access to the
  >> >> kernel unwind tables on a remote machine (assuming you have a
  >> >> remote filesystem).

  hpa> How about /boot?
  >>  You mean a regular file?  I'm not sure whether this could be
  >> made to work.  The /proc/PID/maps entry (really: the vm_area for
  >> the kernel ELF images) would have to be created by the kernel, at
  >> a time when no real filesystem is available.  Also, since the
  >> kernel needs to store the data in kernel-memory anyhow, I don't
  >> think there is much point in storing it on disk as well.

  hpa> Perhaps I misunderstood the statement.  With "kernel ELF
  hpa> images" above, I am now gathering you're talking about only the
  hpa> segments exported to userspace (i.e. vsyscall code), not the
  hpa> kernel itself, which was my original reading of that statement.

Sort of.  I used the term "kernel ELF images" to refer to kernel code
that is shared with the user.  I thought that even on x86 this code is
pinned in memory, but perhaps I misunderstood.

Anyhow, it seems to me that using a special filesystem would be more
suitable, as otherwise you get into bootstrap problems etc.

	--david


* Re: [PATCH] i386 vsyscall DSO implementation
  2003-04-25  1:10 [PATCH] i386 vsyscall DSO implementation Roland McGrath
  2003-04-25  1:49 ` Jeff Garzik
@ 2003-04-26 17:15 ` Ulrich Drepper
  1 sibling, 0 replies; 10+ messages in thread
From: Ulrich Drepper @ 2003-04-26 17:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Roland McGrath, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

From what I've read here so far there were no objections.  The only
comments were to list the pages in /proc/*/maps and somehow make the DSO
available as a real file in the filesystem.

The first I think is reasonable.  But it is orthogonal to this patch.
It applies as well to the code currently in the kernel.  I'm pretty sure
we can arrange this to happen but it doesn't have to be in this patch.

As for the second, I do not think this is a good idea at all.  In theory
there could be more than one such DSO in use.  Without looking at the
actual process' address space it is not possible to determine which one
is used.  Roland also has IIRC a patch for ptrace() which allows it to
access the vsyscall page.  This is the method you'll have to apply.
Your remote debugger will in any case have to use ptrace(), so this is
no new requirement.

The fake kernel DSO will indeed be visible through the
dl_iterate_phdr() function.  This means programs have easy access to it.
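
As a rough illustration of what "visible through dl_iterate_phdr()" means
for a program, the sketch below walks every object the dynamic loader knows
about and prints its name, load address, and program header count.  The
helper names print_object and count_loaded_objects are hypothetical, and
whether the kernel DSO appears in the list depends on the glibc support
discussed in this thread being present.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stdio.h>

/* Callback invoked by dl_iterate_phdr() once per loaded object.
 * `data` points at an int counter owned by the caller. */
static int print_object(struct dl_phdr_info *info, size_t size, void *data)
{
    int *count = data;
    (void)size;
    assert(count != NULL);

    /* The main program (and possibly the kernel DSO) has an empty name. */
    const char *name = (info->dlpi_name != NULL && info->dlpi_name[0] != '\0')
                           ? info->dlpi_name
                           : "(unnamed)";
    printf("%-40s base=%#lx phdrs=%u\n", name,
           (unsigned long)info->dlpi_addr, (unsigned int)info->dlpi_phnum);
    ++*count;
    return 0; /* a nonzero return would stop the iteration */
}

/* Hypothetical helper: prints every loaded object, returns how many. */
int count_loaded_objects(void)
{
    int count = 0;
    dl_iterate_phdr(print_object, &count);
    return count;
}
```

Calling count_loaded_objects() from main() is enough to see the list; an
unwinder would inspect info->dlpi_phdr in the callback instead of printing.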


And a few more points on the DSO solution:

+ since the DSO is built just like an ordinary user-level DSO there is
  no problem with writing the code which goes into it in C.  There is
  no need for the special handling the current functions require.

+ the DSO method allows introducing new kernel interfaces which do
  not require new syscalls.  Well, somehow the kernel must be entered
  but how this happens is not visible to the user code.  This could
  mean using a syscall whose actual number changes with every
  release.

+ the mechanism can easily be transferred to other architectures.  In
  theory it could mean that using syscall numbers as the kernel
  interface can be abandoned.  Syscalls would be identified by name at
  runtime (which is, I think, what most people think is the right
  solution).  This has a slight runtime impact but it could be reduced
  to almost nil (maybe prelink is already capable of doing this).


Having said this, Linus, could you apply the patch if you have no
objections so that we can move on and add the remaining pieces?

- -- 
- --------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+qr652ijCOnn/RHQRAoZEAKCJ3D39tZubMFK+NBdoIHsixF3qhgCeMM/o
+IA3Hu+EMdNA+UYI4jlG6ys=
=ENqC
-----END PGP SIGNATURE-----



* Re: [PATCH] i386 vsyscall DSO implementation
  2003-04-25 16:21     ` David Mosberger
  2003-04-25 21:00       ` H. Peter Anvin
@ 2003-04-26 22:06       ` Roland McGrath
  1 sibling, 0 replies; 10+ messages in thread
From: Roland McGrath @ 2003-04-26 22:06 UTC (permalink / raw)
  To: davidm; +Cc: Jeff Garzik, Linus Torvalds, linux-kernel, Ulrich Drepper

> I like this.  Even better would be if all platforms could do the same.
> I'm definitely interested in doing something similar for ia64 (the
> getunwind() syscall was always just a stop-gap solution).

It is very straightforward to implement.  The arch/i386/kernel/Makefile
rules can be copied for other architectures, modified slightly if you don't
need two different .so's (or name them differently).  The syntax in
vsyscall.S using .incbin works in gas for any platform AFAIK.  You can also
use -iformat binary and tweaks to the kernel linker script to do the
equivalent without a wrapper assembly file.  The vsyscall-syms.o hacks are
only necessary if you want kernel code to refer to symbol names defined in
the DSO source.  The vsyscall.lds linker script will be slightly different
for each platform, but the tweaks should be trivial.  If you aren't making
vsyscall-syms.o then you don't need to hard-wire the .text offset, and the
entry point addresses will just be chosen by ld.  As Ulrich mentioned, you
can write the contents of the page however you would like to write it as
normal user code (C or assembly).

> I assume that these kernel ELF images would then show up in
> dl_iterate_phdr()?

Yes.  I have glibc changes for this that will go in when the kernel is ready.
(This is the primary immediate benefit of the scheme.  The immediate need
is for the unwind info, which with this and PT_GNU_EH_FRAME fits in neatly.)
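
A minimal sketch of how user code might locate the unwind info through
AT_SYSINFO_EHDR, under some loud assumptions: it uses getauxval(), which
appeared in glibc well after this thread (at the time one would walk the
auxiliary vector by hand), and the helper names vdso_ehdr and find_phdr are
hypothetical.  It only finds the PT_GNU_EH_FRAME program header; it does not
parse the unwind tables themselves.

```c
#include <assert.h>
#include <elf.h>
#include <link.h>     /* ElfW() */
#include <stddef.h>
#include <string.h>
#include <sys/auxv.h> /* getauxval() */

/* Returns the ELF header of the kernel DSO mapped into this process,
 * or NULL if the kernel supplied no AT_SYSINFO_EHDR entry. */
const ElfW(Ehdr) *vdso_ehdr(void)
{
    const ElfW(Ehdr) *ehdr = (const ElfW(Ehdr) *)getauxval(AT_SYSINFO_EHDR);
    if (ehdr == NULL || memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0)
        return NULL;
    return ehdr;
}

/* Returns the first program header of the given type, or NULL.
 * Assumes the image is laid out in memory as it is in the file, which
 * is how the patch copies it into the vsyscall page. */
const ElfW(Phdr) *find_phdr(const ElfW(Ehdr) *ehdr, ElfW(Word) type)
{
    assert(ehdr != NULL);
    if (ehdr->e_phentsize != sizeof(ElfW(Phdr)))
        return NULL; /* unexpected layout, give up rather than guess */

    const ElfW(Phdr) *phdr =
        (const ElfW(Phdr) *)((const char *)ehdr + ehdr->e_phoff);
    for (size_t i = 0; i < ehdr->e_phnum; i++) {
        if (phdr[i].p_type == type)
            return &phdr[i];
    }
    return NULL;
}
```

With these, find_phdr(vdso_ehdr(), PT_GNU_EH_FRAME) (after a NULL check on
the header) yields the segment whose p_vaddr locates the .eh_frame_hdr data.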

> To complete the picture, it would be nice if the kernel ELF images were
> mappable files (either in /sysfs or /proc) and would show up in
> /proc/PID/maps.  That way, a distributed application such as a remote
> debugger could gain access to the kernel unwind tables on a remote
> machine (assuming you have a remote filesystem).

The /proc file is obviously trivial to do and I've considered offering it.
A remote debugger is much better prepared to read it out of the inferior
process's memory than to find the right filesystem, and that seems like the
debugger implementation most likely to always find the right info.  (We
expect to be hacking gdb to do this soon.)  The reason I can see for
wanting a /proc/sys/vsyscall.so or such file is if you are linking a
program against the DSO for its soname and symbols.  But for that purpose
it isn't necessary, and I think not even desirable, to always use the
fresh DSO image from the live kernel.  You can extract the DSO image from
the running kernel with a trivial program, and store it in a normal file,
package and version it manually for kernel-devel packages, etc.  I'm not
opposed to having a /proc file, but I doubt there is any essential purpose
for which it's really what you want to use.
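
The "trivial program" could look roughly like the sketch below.  This is not
the program referred to above, only one guess at it: it assumes getauxval()
is available (a later glibc addition), computes the image size from the ELF
headers themselves, and uses the hypothetical helper names elf_image_size
and dump_vdso.

```c
#include <assert.h>
#include <elf.h>
#include <fcntl.h>
#include <link.h>     /* ElfW() */
#include <string.h>
#include <sys/auxv.h> /* getauxval() */
#include <sys/types.h>
#include <unistd.h>

/* Size in bytes of the ELF image starting at `base`, taken as the
 * furthest end of the program header table, any PT_LOAD segment, and
 * the section header table.  Returns 0 if `base` is not an ELF image. */
size_t elf_image_size(const void *base)
{
    assert(base != NULL);
    const ElfW(Ehdr) *ehdr = base;
    if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0)
        return 0;

    const ElfW(Phdr) *phdr =
        (const ElfW(Phdr) *)((const char *)base + ehdr->e_phoff);
    size_t end = ehdr->e_phoff + (size_t)ehdr->e_phnum * ehdr->e_phentsize;
    for (size_t i = 0; i < ehdr->e_phnum; i++) {
        size_t seg_end = phdr[i].p_offset + phdr[i].p_filesz;
        if (phdr[i].p_type == PT_LOAD && seg_end > end)
            end = seg_end;
    }
    size_t sh_end = ehdr->e_shoff + (size_t)ehdr->e_shnum * ehdr->e_shentsize;
    if (ehdr->e_shoff != 0 && sh_end > end)
        end = sh_end;
    return end;
}

/* Copies the kernel DSO of the calling process into the file `path`.
 * Returns the number of bytes written, or -1 on any failure. */
long dump_vdso(const char *path)
{
    assert(path != NULL);
    const char *base = (const char *)getauxval(AT_SYSINFO_EHDR);
    if (base == NULL)
        return -1;
    size_t size = elf_image_size(base);
    if (size == 0)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    size_t done = 0;
    while (done < size) {
        /* write(2) reports a bad source address as an error, not a crash */
        ssize_t n = write(fd, base + done, size - done);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    close(fd);
    return (long)done;
}
```

A call such as dump_vdso("vsyscall.so") would leave a file that can be
inspected with readelf or objdump and packaged like any other DSO.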

Having some note of the region in /proc/PID/maps is another question.  Even
if it's not said to refer to some named file, it would be nice to have the
address range indicated so as to preserve the pre-vsyscall property that
maps lines show all addresses that the particular process could in fact
access.  That can be implemented either by a hack or by having a normal vma
for the vsyscall page.  The latter would also eliminate the need for a hack
in ptrace to allow reading the page (I will post that patch again shortly).
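
To make the maps question concrete, here is a small sketch that scans a
maps-format file for a labelled region and reports its address range.  The
"[vdso]" label mentioned in the usage note is what later kernels print and
is an assumption here, not something this patch provides; find_mapping is a
hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Scans `maps_path` (in /proc/PID/maps format) for the first line that
 * contains `label` and stores that line's address range.
 * Returns 1 if found, 0 if not found, -1 if the file cannot be opened. */
int find_mapping(const char *maps_path, const char *label,
                 unsigned long *start, unsigned long *end)
{
    assert(maps_path != NULL && label != NULL);
    assert(start != NULL && end != NULL);

    FILE *f = fopen(maps_path, "r");
    if (f == NULL)
        return -1;

    char line[512];
    int found = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long lo, hi;
        /* Each line begins with "start-end" in hexadecimal. */
        if (strstr(line, label) != NULL &&
            sscanf(line, "%lx-%lx", &lo, &hi) == 2) {
            *start = lo;
            *end = hi;
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

For example, find_mapping("/proc/self/maps", "[vdso]", &lo, &hi) returns 1
on kernels that list the region, and 0 on a kernel where the page is
accessible but not shown, which is the gap described above.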


Thanks,
Roland


Thread overview: 10+ messages
2003-04-25  1:10 [PATCH] i386 vsyscall DSO implementation Roland McGrath
2003-04-25  1:49 ` Jeff Garzik
2003-04-25  2:10   ` Roland McGrath
2003-04-25 16:21     ` David Mosberger
2003-04-25 21:00       ` H. Peter Anvin
2003-04-25 21:17         ` David Mosberger
2003-04-25 21:20           ` H. Peter Anvin
2003-04-25 21:50             ` David Mosberger
2003-04-26 22:06       ` Roland McGrath
2003-04-26 17:15 ` Ulrich Drepper
