* [3.1 patch] x86: default to vsyscall=native
@ 2011-10-03  9:08 Adrian Bunk
  2011-10-03 13:04 ` Andrew Lutomirski
  2011-10-03 13:19 ` richard -rw- weinberger
  0 siblings, 2 replies; 50+ messages in thread
From: Adrian Bunk @ 2011-10-03  9:08 UTC (permalink / raw)
  To: Andy Lutomirski, H. Peter Anvin, Linus Torvalds, Thomas Gleixner,
	Ingo Molnar, x86, linux-kernel

After upgrading a kernel the existing userspace should just work
(assuming it did work before ;-) ), but when I upgraded my kernel
from 3.0.4 to 3.1.0-rc8 a UML instance didn't come up properly.

dmesg said:
  linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790
  linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790

Looking through the changelog I ended up at commit 3ae36655
("x86-64: Rework vsyscall emulation and add vsyscall= parameter").

Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default to 
vsyscall=native.

That sounds reasonable to me, and fixes the problem for me.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
---
 Documentation/kernel-parameters.txt |    7 ++++---
 arch/x86/kernel/vsyscall_64.c       |    2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 854ed5ca..d6e6724 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2706,10 +2706,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			functions are at fixed addresses, they make nice
 			targets for exploits that can control RIP.
 
-			emulate     [default] Vsyscalls turn into traps and are
-			            emulated reasonably safely.
+			emulate     Vsyscalls turn into traps and are emulated
+			            reasonably safely.
 
-			native      Vsyscalls are native syscall instructions.
+			native      [default] Vsyscalls are native syscall
+			            instructions.
 			            This is a little bit faster than trapping
 			            and makes a few dynamic recompilers work
 			            better than they would in emulation mode.
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 18ae83d..b56c65de 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -56,7 +56,7 @@ DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data) =
 	.lock = __SEQLOCK_UNLOCKED(__vsyscall_gtod_data.lock),
 };
 
-static enum { EMULATE, NATIVE, NONE } vsyscall_mode = EMULATE;
+static enum { EMULATE, NATIVE, NONE } vsyscall_mode = NATIVE;
 
 static int __init vsyscall_setup(char *str)
 {
-- 
1.7.6.3



* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-03  9:08 [3.1 patch] x86: default to vsyscall=native Adrian Bunk
@ 2011-10-03 13:04 ` Andrew Lutomirski
  2011-10-03 17:33   ` Adrian Bunk
  2011-10-03 13:19 ` richard -rw- weinberger
  1 sibling, 1 reply; 50+ messages in thread
From: Andrew Lutomirski @ 2011-10-03 13:04 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: H. Peter Anvin, Linus Torvalds, Thomas Gleixner, Ingo Molnar,
	x86, linux-kernel

On Mon, Oct 3, 2011 at 2:08 AM, Adrian Bunk <bunk@stusta.de> wrote:
> After upgrading a kernel the existing userspace should just work
> (assuming it did work before ;-) ), but when I upgraded my kernel
> from 3.0.4 to 3.1.0-rc8 a UML instance didn't come up properly.
>
> dmesg said:
>  linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790
>  linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790
>
> Looking through the changelog I ended up at commit 3ae36655
> ("x86-64: Rework vsyscall emulation and add vsyscall= parameter").
>
> Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default to
> vsyscall=native.
>
> That sounds reasonable to me, and fixes the problem for me.

At this point in the -rc cycle, this sounds fine.

That being said, I'd like to fix it for real for 3.2.  This particular
failure is suspicious -- the "vsyscall fault" message means that
sys_gettimeofday returned EFAULT, which means that the old (3.0 and
before) vgettimeofday should *also* have segfaulted.  We do have a bit
of a bug in that the new code doesn't report si_addr properly, but
that sounds unlikely as a culprit.  Did you try with the offending
commit reverted (i.e. fce8dc0)?  I bet that it also fails there.

What's the .config for your UML binary?  I'd like to see if I can
reproduce this.

--Andy



* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-03  9:08 [3.1 patch] x86: default to vsyscall=native Adrian Bunk
  2011-10-03 13:04 ` Andrew Lutomirski
@ 2011-10-03 13:19 ` richard -rw- weinberger
  2011-10-03 17:46   ` Adrian Bunk
  1 sibling, 1 reply; 50+ messages in thread
From: richard -rw- weinberger @ 2011-10-03 13:19 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Andy Lutomirski, H. Peter Anvin, Linus Torvalds, Thomas Gleixner,
	Ingo Molnar, x86, linux-kernel

Adrian,

On Mon, Oct 3, 2011 at 11:08 AM, Adrian Bunk <bunk@stusta.de> wrote:
> After upgrading a kernel the existing userspace should just work
> (assuming it did work before ;-) ), but when I upgraded my kernel
> from 3.0.4 to 3.1.0-rc8 a UML instance didn't come up properly.
>

Are only old UML kernels like 2.6.30.1 affected?
Anyway, it's time to upgrade my main machine to 3.1.0-rc8 to observe
some new UML issues. ;-)

-- 
Thanks,
//richard


* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-03 13:04 ` Andrew Lutomirski
@ 2011-10-03 17:33   ` Adrian Bunk
  2011-10-03 18:06     ` Andrew Lutomirski
  2011-10-05 22:13     ` Andrew Lutomirski
  0 siblings, 2 replies; 50+ messages in thread
From: Adrian Bunk @ 2011-10-03 17:33 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: H. Peter Anvin, Linus Torvalds, Thomas Gleixner, Ingo Molnar,
	x86, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2934 bytes --]

On Mon, Oct 03, 2011 at 06:04:53AM -0700, Andrew Lutomirski wrote:
> On Mon, Oct 3, 2011 at 2:08 AM, Adrian Bunk <bunk@stusta.de> wrote:
> > After upgrading a kernel the existing userspace should just work
> > (assuming it did work before ;-) ), but when I upgraded my kernel
> > from 3.0.4 to 3.1.0-rc8 a UML instance didn't come up properly.
> >
> > dmesg said:
> >  linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790
> >  linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790
> >
> > Looking through the changelog I ended up at commit 3ae36655
> > ("x86-64: Rework vsyscall emulation and add vsyscall= parameter").
> >
> > Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default to
> > vsyscall=native.
> >
> > That sounds reasonable to me, and fixes the problem for me.
> 
> At this point in the -rc cycle, this sounds fine.
> 
> That being said, I'd like to fix it for real for 3.2.  This particular
> failure is suspicious -- the "vsyscall fault" message means that
> sys_gettimeofday returned EFAULT, which means that the old (3.0 and
> before) vgettimeofday should *also* have segfaulted.

This 2.6.30.1 UML kernel binary from 2009 worked for me for all host 
kernels from 2.6.30 to 3.0, and with 3.1.0-rc8 and vsyscall=native
it also seems to run nicely.

Looking deeper into "a UML instance didn't come up properly",
the problem is that it comes up in a strange (readonly) state.

There are "Using makefile-style concurrent boot in runlevel S."
and "Using makefile-style concurrent boot in runlevel 2." in the
logs with a Debian userspace, but no output from the init scripts
in these broken bootups (normal messages are in non-broken bootups).

Perhaps the two messages I see in dmesg on the host are from the
processes running rcS and rc2 failing early?

In a working startup with a Debian userspace, I'm getting during rcS
 Setting the system clock.
 Cannot access the Hardware Clock via any known method.
 Use the --debug option to see the details of our search for an access method.
 Unable to set System Clock to: Mon Oct 3 17:01:35 UTC 2011 ... (warning).

> We do have a bit
> of a bug in that the new code doesn't report si_addr properly, but
> that sounds unlikely as a culprit.  Did you try with the offending
> commit reverted (i.e. fce8dc0)?  I bet that it also fails there.

fce8dc0 is "x86-64: Wire up getcpu syscall", is that really the one you 
want me to revert?

> What's the .config for your UML binary?  I'd like to see if I can
> reproduce this.

It's attached.

> --Andy

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


[-- Attachment #2: config-uml --]
[-- Type: text/plain, Size: 11035 bytes --]

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.30-rc4
# Thu Apr 30 22:55:45 2009
#
CONFIG_DEFCONFIG_LIST="arch/$ARCH/defconfig"
CONFIG_GENERIC_HARDIRQS=y
CONFIG_UML=y
CONFIG_MMU=y
CONFIG_NO_IOMEM=y
# CONFIG_TRACE_IRQFLAGS_SUPPORT is not set
CONFIG_LOCKDEP_SUPPORT=y
# CONFIG_STACKTRACE_SUPPORT is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_IRQ_RELEASE_METHOD=y
CONFIG_HZ=100

#
# UML-specific options
#

#
# Host processor type and features
#
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
CONFIG_MK8=y
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
# CONFIG_GENERIC_CPU is not set
CONFIG_X86_CPU=y
CONFIG_X86_L1_CACHE_BYTES=64
CONFIG_X86_INTERNODE_CACHE_BYTES=64
# CONFIG_X86_CMPXCHG is not set
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=3
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_UML_X86=y
CONFIG_64BIT=y
# CONFIG_X86_32 is not set
# CONFIG_RWSEM_XCHGADD_ALGORITHM is not set
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_3_LEVEL_PGTABLES=y
# CONFIG_ARCH_HAS_SC_SIGNALS is not set
# CONFIG_ARCH_REUSE_HOST_VSYSCALL_AREA is not set
CONFIG_SMP_BROKEN=y
CONFIG_GENERIC_HWEIGHT=y
# CONFIG_STATIC_LINK is not set
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_ZONE_DMA_FLAG=0
CONFIG_VIRT_TO_BUS=y
CONFIG_UNEVICTABLE_LRU=y
CONFIG_HAVE_MLOCK=y
CONFIG_HAVE_MLOCKED_PAGE_BIT=y
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_LD_SCRIPT_DYN=y
CONFIG_BINFMT_ELF=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
# CONFIG_HAVE_AOUT is not set
# CONFIG_BINFMT_MISC is not set
CONFIG_HOSTFS=y
# CONFIG_HPPFS is not set
CONFIG_MCONSOLE=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_KERNEL_STACK_ORDER=1

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=128
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
# CONFIG_SWAP is not set
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_AUDIT is not set

#
# RCU Subsystem
#
CONFIG_CLASSIC_RCU=y
# CONFIG_TREE_RCU is not set
# CONFIG_PREEMPT_RCU is not set
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_PREEMPT_RCU_TRACE is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=14
# CONFIG_GROUP_SCHED is not set
# CONFIG_CGROUPS is not set
CONFIG_SYSFS_DEPRECATED=y
CONFIG_SYSFS_DEPRECATED_V2=y
# CONFIG_RELAY is not set
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_NET_NS is not set
# CONFIG_BLK_DEV_INITRD is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_EXTRA_PASS=y
# CONFIG_STRIP_ASM_SYMS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_COMPAT_BRK=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
# CONFIG_PROFILING is not set
# CONFIG_MARKERS is not set
# CONFIG_SLOW_WORK is not set
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
# CONFIG_MODULES is not set
CONFIG_BLOCK=y
# CONFIG_BLK_DEV_BSG is not set
# CONFIG_BLK_DEV_INTEGRITY is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
# CONFIG_IOSCHED_AS is not set
# CONFIG_IOSCHED_DEADLINE is not set
# CONFIG_IOSCHED_CFQ is not set
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
# CONFIG_DEFAULT_CFQ is not set
CONFIG_DEFAULT_NOOP=y
CONFIG_DEFAULT_IOSCHED="noop"
# CONFIG_FREEZER is not set
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_UBD=y
# CONFIG_BLK_DEV_UBD_SYNC is not set
CONFIG_BLK_DEV_COW_COMMON=y
CONFIG_BLK_DEV_LOOP=y
# CONFIG_BLK_DEV_CRYPTOLOOP is not set
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_RAM is not set
# CONFIG_ATA_OVER_ETH is not set

#
# Character Devices
#
CONFIG_STDERR_CONSOLE=y
CONFIG_STDIO_CONSOLE=y
CONFIG_SSL=y
CONFIG_NULL_CHAN=y
CONFIG_PORT_CHAN=y
CONFIG_PTY_CHAN=y
CONFIG_TTY_CHAN=y
CONFIG_XTERM_CHAN=y
# CONFIG_NOCONFIG_CHAN is not set
CONFIG_CON_ZERO_CHAN="fd:0,fd:1"
CONFIG_CON_CHAN="xterm"
CONFIG_SSL_CHAN="pts"
CONFIG_UNIX98_PTYS=y
CONFIG_LEGACY_PTYS=y
# CONFIG_RAW_DRIVER is not set
CONFIG_LEGACY_PTY_COUNT=32
# CONFIG_WATCHDOG is not set
# CONFIG_UML_SOUND is not set
# CONFIG_SOUND is not set
# CONFIG_SOUND_OSS_CORE is not set
# CONFIG_HOSTAUDIO is not set
# CONFIG_HW_RANDOM is not set
CONFIG_UML_RANDOM=y
# CONFIG_MMAPPER is not set

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
CONFIG_FIRMWARE_IN_KERNEL=y
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_SYS_HYPERVISOR is not set
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
# CONFIG_NET_KEY is not set
CONFIG_INET=y
# CONFIG_IP_MULTICAST is not set
# CONFIG_IP_ADVANCED_ROUTER is not set
CONFIG_IP_FIB_HASH=y
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_ARPD is not set
# CONFIG_SYN_COOKIES is not set
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_XFRM_TUNNEL is not set
# CONFIG_INET_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
# CONFIG_INET_XFRM_MODE_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_BEET is not set
# CONFIG_INET_LRO is not set
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
# CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
# CONFIG_IPV6 is not set
# CONFIG_NETWORK_SECMARK is not set
# CONFIG_NETFILTER is not set
# CONFIG_IP_DCCP is not set
# CONFIG_IP_SCTP is not set
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
# CONFIG_BRIDGE is not set
# CONFIG_NET_DSA is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_PHONET is not set
# CONFIG_NET_SCHED is not set
# CONFIG_DCB is not set

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_HAMRADIO is not set
# CONFIG_CAN is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
# CONFIG_AF_RXRPC is not set
# CONFIG_WIRELESS is not set
# CONFIG_WIMAX is not set
# CONFIG_RFKILL is not set
# CONFIG_NET_9P is not set

#
# UML Network Devices
#
CONFIG_UML_NET=y
CONFIG_UML_NET_ETHERTAP=y
CONFIG_UML_NET_TUNTAP=y
# CONFIG_UML_NET_SLIP is not set
# CONFIG_UML_NET_DAEMON is not set
# CONFIG_UML_NET_VDE is not set
# CONFIG_UML_NET_MCAST is not set
# CONFIG_UML_NET_PCAP is not set
# CONFIG_UML_NET_SLIRP is not set
CONFIG_NETDEVICES=y
CONFIG_COMPAT_NET_DEV_OPS=y
CONFIG_DUMMY=y
# CONFIG_BONDING is not set
# CONFIG_MACVLAN is not set
# CONFIG_EQUALIZER is not set
CONFIG_TUN=y
# CONFIG_VETH is not set

#
# Wireless LAN
#
# CONFIG_WLAN_PRE80211 is not set
# CONFIG_WLAN_80211 is not set

#
# Enable WiMAX (Networking options) to see the WiMAX drivers
#
# CONFIG_WAN is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
# CONFIG_NETCONSOLE is not set
# CONFIG_NETPOLL is not set
# CONFIG_NET_POLL_CONTROLLER is not set
# CONFIG_CONNECTOR is not set

#
# File systems
#
CONFIG_EXT2_FS=y
# CONFIG_EXT2_FS_XATTR is not set
# CONFIG_EXT2_FS_XIP is not set
CONFIG_EXT3_FS=y
# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
# CONFIG_EXT3_FS_XATTR is not set
# CONFIG_EXT4_FS is not set
CONFIG_JBD=y
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
# CONFIG_FS_POSIX_ACL is not set
CONFIG_FILE_LOCKING=y
# CONFIG_XFS_FS is not set
# CONFIG_GFS2_FS is not set
# CONFIG_OCFS2_FS is not set
# CONFIG_BTRFS_FS is not set
# CONFIG_DNOTIFY is not set
CONFIG_INOTIFY=y
CONFIG_INOTIFY_USER=y
# CONFIG_QUOTA is not set
# CONFIG_AUTOFS_FS is not set
# CONFIG_AUTOFS4_FS is not set
# CONFIG_FUSE_FS is not set

#
# Caches
#
# CONFIG_FSCACHE is not set

#
# CD-ROM/DVD Filesystems
#
# CONFIG_ISO9660_FS is not set
# CONFIG_UDF_FS is not set

#
# DOS/FAT/NT Filesystems
#
# CONFIG_MSDOS_FS is not set
# CONFIG_VFAT_FS is not set
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
# CONFIG_PROC_KCORE is not set
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
# CONFIG_TMPFS_POSIX_ACL is not set
# CONFIG_HUGETLB_PAGE is not set
# CONFIG_CONFIGFS_FS is not set
# CONFIG_MISC_FILESYSTEMS is not set
# CONFIG_NETWORK_FILESYSTEMS is not set

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y
# CONFIG_NLS is not set
# CONFIG_DLM is not set

#
# Security options
#
# CONFIG_KEYS is not set
# CONFIG_SECURITY is not set
# CONFIG_SECURITYFS is not set
# CONFIG_SECURITY_FILE_CAPABILITIES is not set
# CONFIG_CRYPTO is not set
# CONFIG_BINARY_PRINTF is not set

#
# Library routines
#
CONFIG_BITREVERSE=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_GENERIC_FIND_LAST_BIT=y
# CONFIG_CRC_CCITT is not set
# CONFIG_CRC16 is not set
# CONFIG_CRC_T10DIF is not set
# CONFIG_CRC_ITU_T is not set
CONFIG_CRC32=y
# CONFIG_CRC7 is not set
# CONFIG_LIBCRC32C is not set
CONFIG_HAS_DMA=y
CONFIG_NLATTR=y

#
# SCSI device support
#
# CONFIG_RAID_ATTRS is not set
# CONFIG_SCSI is not set
# CONFIG_SCSI_DMA is not set
# CONFIG_SCSI_NETLINK is not set
# CONFIG_MD is not set
# CONFIG_NEW_LEDS is not set
# CONFIG_INPUT is not set

#
# Kernel hacking
#
# CONFIG_PRINTK_TIME is not set
# CONFIG_ENABLE_WARN_DEPRECATED is not set
# CONFIG_ENABLE_MUST_CHECK is not set
CONFIG_FRAME_WARN=1024
# CONFIG_UNUSED_SYMBOLS is not set
# CONFIG_DEBUG_FS is not set
# CONFIG_DEBUG_KERNEL is not set
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_MEMORY_INIT=y
# CONFIG_RCU_CPU_STALL_DETECTOR is not set
# CONFIG_SYSCTL_SYSCALL_CHECK is not set
# CONFIG_SAMPLES is not set
# CONFIG_DEBUG_STACK_USAGE is not set


* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-03 13:19 ` richard -rw- weinberger
@ 2011-10-03 17:46   ` Adrian Bunk
  0 siblings, 0 replies; 50+ messages in thread
From: Adrian Bunk @ 2011-10-03 17:46 UTC (permalink / raw)
  To: richard -rw- weinberger
  Cc: Andy Lutomirski, H. Peter Anvin, Linus Torvalds, Thomas Gleixner,
	Ingo Molnar, x86, linux-kernel

On Mon, Oct 03, 2011 at 03:19:59PM +0200, richard -rw- weinberger wrote:
> Adrian,
> 
> On Mon, Oct 3, 2011 at 11:08 AM, Adrian Bunk <bunk@stusta.de> wrote:
> > After upgrading a kernel the existing userspace should just work
> > (assuming it did work before ;-) ), but when I upgraded my kernel
> > from 3.0.4 to 3.1.0-rc8 a UML instance didn't come up properly.
> 
> Are only old UML kernels like 2.6.30.1 affected?

I don't know, that's my only running UML instance.

> Anyway, it's time to upgrade my main machine to 3.1.0-rc8 to observe
> some new UML issues. ;-)
> 
> -- 
> Thanks,
> //richard

cu
Adrian




* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-03 17:33   ` Adrian Bunk
@ 2011-10-03 18:06     ` Andrew Lutomirski
  2011-10-03 18:41       ` Adrian Bunk
  2011-10-05 22:13     ` Andrew Lutomirski
  1 sibling, 1 reply; 50+ messages in thread
From: Andrew Lutomirski @ 2011-10-03 18:06 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: H. Peter Anvin, Linus Torvalds, Thomas Gleixner, Ingo Molnar,
	x86, linux-kernel

On Mon, Oct 3, 2011 at 10:33 AM, Adrian Bunk <bunk@stusta.de> wrote:
> On Mon, Oct 03, 2011 at 06:04:53AM -0700, Andrew Lutomirski wrote:
>> On Mon, Oct 3, 2011 at 2:08 AM, Adrian Bunk <bunk@stusta.de> wrote:
>> > After upgrading a kernel the existing userspace should just work
>> > (assuming it did work before ;-) ), but when I upgraded my kernel
>> > from 3.0.4 to 3.1.0-rc8 a UML instance didn't come up properly.
>> >
>> > dmesg said:
>> >  linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790
>> >  linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790
>> >
>> > Looking through the changelog I ended up at commit 3ae36655
>> > ("x86-64: Rework vsyscall emulation and add vsyscall= parameter").
>> >
>> > Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default to
>> > vsyscall=native.
>> >
>> > That sounds reasonable to me, and fixes the problem for me.
>>
>> At this point in the -rc cycle, this sounds fine.
>>
>> That being said, I'd like to fix it for real for 3.2.  This particular
>> failure is suspicious -- the "vsyscall fault" message means that
>> sys_gettimeofday returned EFAULT, which means that the old (3.0 and
>> before) vgettimeofday should *also* have segfaulted.
>
> This 2.6.30.1 UML kernel binary from 2009 worked for me for all host
> kernels from 2.6.30 to 3.0, and with 3.1.0-rc8 and vsyscall=native
> it also seems to run nicely.
>
> Looking deeper into "a UML instance didn't come up properly",
> the problem is that it comes up in a strange (readonly) state.
>
> There are "Using makefile-style concurrent boot in runlevel S."
> and "Using makefile-style concurrent boot in runlevel 2." in the
> logs with a Debian userspace, but no output from the init scripts
> in these broken bootups (normal messages are in non-broken bootups).
>
> Perhaps the two messages I see in dmesg on the host are from the
> processes running rcS and rc2 failing early?
>
> In a working startup with a Debian userspace, I'm getting during rcS
>  Setting the system clock.
>  Cannot access the Hardware Clock via any known method.
>  Use the --debug option to see the details of our search for an access method.
>  Unable to set System Clock to: Mon Oct 3 17:01:35 UTC 2011 ... (warning).
>
>> We do have a bit
>> of a bug in that the new code doesn't report si_addr properly, but
>> that sounds unlikely as a culprit.  Did you try with the offending
>> commit reverted (i.e. fce8dc0)?  I bet that it also fails there.
>
> fce8dc0 is "x86-64: Wire up getcpu syscall", is that really the one you
> want me to revert?

No -- I actually meant to try running that revision or to try with the
vsyscall= patch reverted.

>
>> What's the .config for your UML binary?  I'd like to see if I can
>> reproduce this.
>
> It's attached.

I'll play around with it.

--Andy


* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-03 18:06     ` Andrew Lutomirski
@ 2011-10-03 18:41       ` Adrian Bunk
  0 siblings, 0 replies; 50+ messages in thread
From: Adrian Bunk @ 2011-10-03 18:41 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: H. Peter Anvin, Linus Torvalds, Thomas Gleixner, Ingo Molnar,
	x86, linux-kernel

On Mon, Oct 03, 2011 at 11:06:13AM -0700, Andrew Lutomirski wrote:
> On Mon, Oct 3, 2011 at 10:33 AM, Adrian Bunk <bunk@stusta.de> wrote:
> > On Mon, Oct 03, 2011 at 06:04:53AM -0700, Andrew Lutomirski wrote:
>...
> >> of a bug in that the new code doesn't report si_addr properly, but
> >> that sounds unlikely as a culprit.  Did you try with the offending
> >> commit reverted (i.e. fce8dc0)?  I bet that it also fails there.
> >
> > fce8dc0 is "x86-64: Wire up getcpu syscall", is that really the one you
> > want me to revert?
> 
> No -- I actually meant to try running that revision or to try with the
> vsyscall= patch reverted.

I now tried both, and your bet was right.

>...
> --Andy

cu
Adrian




* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-03 17:33   ` Adrian Bunk
  2011-10-03 18:06     ` Andrew Lutomirski
@ 2011-10-05 22:13     ` Andrew Lutomirski
  2011-10-05 22:22       ` richard -rw- weinberger
  2011-10-05 22:24       ` [3.1 patch] x86: default to vsyscall=native Adrian Bunk
  1 sibling, 2 replies; 50+ messages in thread
From: Andrew Lutomirski @ 2011-10-05 22:13 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: H. Peter Anvin, Linus Torvalds, Thomas Gleixner, Ingo Molnar,
	x86, linux-kernel

On Mon, Oct 3, 2011 at 10:33 AM, Adrian Bunk <bunk@stusta.de> wrote:
> On Mon, Oct 03, 2011 at 06:04:53AM -0700, Andrew Lutomirski wrote:
>> On Mon, Oct 3, 2011 at 2:08 AM, Adrian Bunk <bunk@stusta.de> wrote:
>> > After upgrading a kernel the existing userspace should just work
>> > (assuming it did work before ;-) ), but when I upgraded my kernel
>> > from 3.0.4 to 3.1.0-rc8 a UML instance didn't come up properly.
>> >
>> > dmesg said:
>> >  linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790
>> >  linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790
>> >
>> > Looking through the changelog I ended up at commit 3ae36655
>> > ("x86-64: Rework vsyscall emulation and add vsyscall= parameter").
>> >
>> > Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default to
>> > vsyscall=native.
>> >
>> > That sounds reasonable to me, and fixes the problem for me.
>>
>> At this point in the -rc cycle, this sounds fine.
>>
>> That being said, I'd like to fix it for real for 3.2.  This particular
>> failure is suspicious -- the "vsyscall fault" message means that
>> sys_gettimeofday returned EFAULT, which means that the old (3.0 and
>> before) vgettimeofday should *also* have segfaulted.
>
> This 2.6.30.1 UML kernel binary from 2009 worked for me for all host
> kernels from 2.6.30 to 3.0, and with 3.1.0-rc8 and vsyscall=native
> it also seems to run nicely.
>
> Looking deeper into "a UML instance didn't come up properly",
> the problem is that it comes up in a strange (readonly) state.
>
> There are "Using makefile-style concurrent boot in runlevel S."
> and "Using makefile-style concurrent boot in runlevel 2." in the
> logs with a Debian userspace, but no output from the init scripts
> in these broken bootups (normal messages are in non-broken bootups).
>
> Perhaps the two messages I see in dmesg on the host are from the
> processes running rcS and rc2 failing early?
>
> In a working startup with a Debian userspace, I'm getting during rcS
>  Setting the system clock.
>  Cannot access the Hardware Clock via any known method.
>  Use the --debug option to see the details of our search for an access method.
>  Unable to set System Clock to: Mon Oct 3 17:01:35 UTC 2011 ... (warning).
>
>> We do have a bit
>> of a bug in that the new code doesn't report si_addr properly, but
>> that sounds unlikely as a culprit.  Did you try with the offending
>> commit reverted (i.e. fce8dc0)?  I bet that it also fails there.
>
> fce8dc0 is "x86-64: Wire up getcpu syscall", is that really the one you
> want me to revert?
>
>> What's the .config for your UML binary?  I'd like to see if I can
>> reproduce this.
>
> It's attached.
>

I can't reproduce it.  What distro is running inside the UML instance?

--Andy


* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-05 22:13     ` Andrew Lutomirski
@ 2011-10-05 22:22       ` richard -rw- weinberger
  2011-10-05 22:30         ` Adrian Bunk
  2011-10-05 22:24       ` [3.1 patch] x86: default to vsyscall=native Adrian Bunk
  1 sibling, 1 reply; 50+ messages in thread
From: richard -rw- weinberger @ 2011-10-05 22:22 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Adrian Bunk, H. Peter Anvin, Linus Torvalds, Thomas Gleixner,
	Ingo Molnar, x86, linux-kernel

On Thu, Oct 6, 2011 at 12:13 AM, Andrew Lutomirski <luto@mit.edu> wrote:
> On Mon, Oct 3, 2011 at 10:33 AM, Adrian Bunk <bunk@stusta.de> wrote:
>> On Mon, Oct 03, 2011 at 06:04:53AM -0700, Andrew Lutomirski wrote:
>>> On Mon, Oct 3, 2011 at 2:08 AM, Adrian Bunk <bunk@stusta.de> wrote:
>>> > After upgrading a kernel the existing userspace should just work
>>> > (assuming it did work before ;-) ), but when I upgraded my kernel
>>> > from 3.0.4 to 3.1.0-rc8 a UML instance didn't come up properly.
>>> >
>>> > dmesg said:
>>> >  linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790
>>> >  linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790
>>> >
>>> > Looking throught the changelog I ended up at commit 3ae36655
>>> > ("x86-64: Rework vsyscall emulation and add vsyscall= parameter").
>>> >
>>> > Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default to
>>> > vsyscall=native.
>>> >
>>> > That sounds reasonable to me, and fixes the problem for me.
>>>
>>> At this point in the -rc cycle, this sounds fine.
>>>
>>> That being said, I'd like to fix it for real for 3.2.  This particular
>>> failure is suspicious -- the "vsyscall fault" message means that
>>> sys_gettimeofday returned EFAULT, which means that the old (3.0 and
>>> before) vgettimeofday should *also* have segfaulted.
>>
>> This 2.6.30.1 UML kernel binary from 2009 worked for me for all host
>> kernels from 2.6.30 to 3.0, and with 3.1.0-rc8 and vsyscall=native
>> it also seems to run nicely.
>>
>> Looking deeper into "a UML instance didn't come up properly",
>> the problem is that it comes up in a strange (readonly) state.
>>
>> There are "Using makefile-style concurrent boot in runlevel S."
>> and "Using makefile-style concurrent boot in runlevel 2." in the
>> logs with a Debian userspace, but no output from the init scripts
>> in these broken bootups (normal messages are in non-broken bootups).
>>
>> Perhaps the two the messages I see in dmesg on the host are from the
>> processes running rcS and rc2 failing early?
>>
>> In a working startup with a Debian userspace, I'm getting during rcS
>>  Setting the system clock.
>>  Cannot access the Hardware Clock via any known method.
>>  Use the --debug option to see the details of our search for an access method.
>>  Unable to set System Clock to: Mon Oct 3 17:01:35 UTC 2011 ... (warning).
>>
>>> We do have a bit
>>> of a bug in that the new code doesn't report si_addr properly, but
>>> that sounds unlikely as a culprit.  Did you try with the offending
>>> commit reverted (i.e. fce8dc0)?  I bet that it also fails there.
>>
>> fce8dc0 is "x86-64: Wire up getcpu syscall", is that really the one you
>> want me to revert?
>>
>>> What's the .config for your UML binary?  I'd like to see if I can
>>> reproduce this.
>>
>> It's attached.
>>
>
> I can't reproduce it.  What distro is running inside the UML instance?

Same here.
Adrian, is the UML kernel crashing before executing init?
We definitely need more information...

-- 
Thanks,
//richard

* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-05 22:13     ` Andrew Lutomirski
  2011-10-05 22:22       ` richard -rw- weinberger
@ 2011-10-05 22:24       ` Adrian Bunk
  1 sibling, 0 replies; 50+ messages in thread
From: Adrian Bunk @ 2011-10-05 22:24 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: H. Peter Anvin, Linus Torvalds, Thomas Gleixner, Ingo Molnar,
	x86, linux-kernel

On Wed, Oct 05, 2011 at 03:13:51PM -0700, Andrew Lutomirski wrote:
> On Mon, Oct 3, 2011 at 10:33 AM, Adrian Bunk <bunk@stusta.de> wrote:
> > On Mon, Oct 03, 2011 at 06:04:53AM -0700, Andrew Lutomirski wrote:
> >> On Mon, Oct 3, 2011 at 2:08 AM, Adrian Bunk <bunk@stusta.de> wrote:
> >> > After upgrading a kernel the existing userspace should just work
> >> > (assuming it did work before ;-) ), but when I upgraded my kernel
> >> > from 3.0.4 to 3.1.0-rc8 a UML instance didn't come up properly.
> >> >
> >> > dmesg said:
> >> >  linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790
> >> >  linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790
> >> >
> >> > Looking throught the changelog I ended up at commit 3ae36655
> >> > ("x86-64: Rework vsyscall emulation and add vsyscall= parameter").
> >> >
> >> > Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default to
> >> > vsyscall=native.
> >> >
> >> > That sounds reasonable to me, and fixes the problem for me.
> >>
> >> At this point in the -rc cycle, this sounds fine.
> >>
> >> That being said, I'd like to fix it for real for 3.2.  This particular
> >> failure is suspicious -- the "vsyscall fault" message means that
> >> sys_gettimeofday returned EFAULT, which means that the old (3.0 and
> >> before) vgettimeofday should *also* have segfaulted.
> >
> > This 2.6.30.1 UML kernel binary from 2009 worked for me for all host
> > kernels from 2.6.30 to 3.0, and with 3.1.0-rc8 and vsyscall=native
> > it also seems to run nicely.
> >
> > Looking deeper into "a UML instance didn't come up properly",
> > the problem is that it comes up in a strange (readonly) state.
> >
> > There are "Using makefile-style concurrent boot in runlevel S."
> > and "Using makefile-style concurrent boot in runlevel 2." in the
> > logs with a Debian userspace, but no output from the init scripts
> > in these broken bootups (normal messages are in non-broken bootups).
> >
> > Perhaps the two the messages I see in dmesg on the host are from the
> > processes running rcS and rc2 failing early?
> >
> > In a working startup with a Debian userspace, I'm getting during rcS
> >  Setting the system clock.
> >  Cannot access the Hardware Clock via any known method.
> >  Use the --debug option to see the details of our search for an access method.
> >  Unable to set System Clock to: Mon Oct 3 17:01:35 UTC 2011 ... (warning).
> >
> >> We do have a bit
> >> of a bug in that the new code doesn't report si_addr properly, but
> >> that sounds unlikely as a culprit.  Did you try with the offending
> >> commit reverted (i.e. fce8dc0)?  I bet that it also fails there.
> >
> > fce8dc0 is "x86-64: Wire up getcpu syscall", is that really the one you
> > want me to revert?
> >
> >> What's the .config for your UML binary?  I'd like to see if I can
> >> reproduce this.
> >
> > It's attached.
> >
> 
> I can't reproduce it.  What distro is running inside the UML instance?

Debian stable.

> --Andy

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-05 22:22       ` richard -rw- weinberger
@ 2011-10-05 22:30         ` Adrian Bunk
  2011-10-05 22:41           ` richard -rw- weinberger
  2011-10-05 22:46           ` Andrew Lutomirski
  0 siblings, 2 replies; 50+ messages in thread
From: Adrian Bunk @ 2011-10-05 22:30 UTC (permalink / raw)
  To: richard -rw- weinberger
  Cc: Andrew Lutomirski, H. Peter Anvin, Linus Torvalds,
	Thomas Gleixner, Ingo Molnar, x86, linux-kernel

On Thu, Oct 06, 2011 at 12:22:34AM +0200, richard -rw- weinberger wrote:
> On Thu, Oct 6, 2011 at 12:13 AM, Andrew Lutomirski <luto@mit.edu> wrote:
> > On Mon, Oct 3, 2011 at 10:33 AM, Adrian Bunk <bunk@stusta.de> wrote:
> >> On Mon, Oct 03, 2011 at 06:04:53AM -0700, Andrew Lutomirski wrote:
> >>> On Mon, Oct 3, 2011 at 2:08 AM, Adrian Bunk <bunk@stusta.de> wrote:
> >>> > After upgrading a kernel the existing userspace should just work
> >>> > (assuming it did work before ;-) ), but when I upgraded my kernel
> >>> > from 3.0.4 to 3.1.0-rc8 a UML instance didn't come up properly.
> >>> >
> >>> > dmesg said:
> >>> >  linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790
> >>> >  linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790
> >>> >
> >>> > Looking throught the changelog I ended up at commit 3ae36655
> >>> > ("x86-64: Rework vsyscall emulation and add vsyscall= parameter").
> >>> >
> >>> > Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default to
> >>> > vsyscall=native.
> >>> >
> >>> > That sounds reasonable to me, and fixes the problem for me.
> >>>
> >>> At this point in the -rc cycle, this sounds fine.
> >>>
> >>> That being said, I'd like to fix it for real for 3.2.  This particular
> >>> failure is suspicious -- the "vsyscall fault" message means that
> >>> sys_gettimeofday returned EFAULT, which means that the old (3.0 and
> >>> before) vgettimeofday should *also* have segfaulted.
> >>
> >> This 2.6.30.1 UML kernel binary from 2009 worked for me for all host
> >> kernels from 2.6.30 to 3.0, and with 3.1.0-rc8 and vsyscall=native
> >> it also seems to run nicely.
> >>
> >> Looking deeper into "a UML instance didn't come up properly",
> >> the problem is that it comes up in a strange (readonly) state.
> >>
> >> There are "Using makefile-style concurrent boot in runlevel S."
> >> and "Using makefile-style concurrent boot in runlevel 2." in the
> >> logs with a Debian userspace, but no output from the init scripts
> >> in these broken bootups (normal messages are in non-broken bootups).
> >>
> >> Perhaps the two the messages I see in dmesg on the host are from the
> >> processes running rcS and rc2 failing early?
> >>
> >> In a working startup with a Debian userspace, I'm getting during rcS
> >>  Setting the system clock.
> >>  Cannot access the Hardware Clock via any known method.
> >>  Use the --debug option to see the details of our search for an access method.
> >>  Unable to set System Clock to: Mon Oct 3 17:01:35 UTC 2011 ... (warning).
> >>
> >>> We do have a bit
> >>> of a bug in that the new code doesn't report si_addr properly, but
> >>> that sounds unlikely as a culprit.  Did you try with the offending
> >>> commit reverted (i.e. fce8dc0)?  I bet that it also fails there.
> >>
> >> fce8dc0 is "x86-64: Wire up getcpu syscall", is that really the one you
> >> want me to revert?
> >>
> >>> What's the .config for your UML binary?  I'd like to see if I can
> >>> reproduce this.
> >>
> >> It's attached.
> >>
> >
> > I can't reproduce it.  What distro is running inside the UML instance?
> 
> Same here.
> Adrian, is the UML kernel crashing before executing init?

As I wrote:
  Looking deeper into "a UML instance didn't come up properly",
  the problem is that it comes up in a strange (readonly) state.

The UML kernel is running happily without crashing, and as I wrote my
guess about my problems is:
  Perhaps the two the messages I see in dmesg on the host are from the
  processes running rcS and rc2 failing early?

> We definitely need more information...

I gave the information that was requested, plus my observations.

What more information exactly do you need from me?

> Thanks,
> //richard

cu
Adrian


* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-05 22:30         ` Adrian Bunk
@ 2011-10-05 22:41           ` richard -rw- weinberger
  2011-10-05 22:46           ` Andrew Lutomirski
  1 sibling, 0 replies; 50+ messages in thread
From: richard -rw- weinberger @ 2011-10-05 22:41 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Andrew Lutomirski, H. Peter Anvin, Linus Torvalds,
	Thomas Gleixner, Ingo Molnar, x86, linux-kernel

On Thu, Oct 6, 2011 at 12:30 AM, Adrian Bunk <bunk@stusta.de> wrote:
> On Thu, Oct 06, 2011 at 12:22:34AM +0200, richard -rw- weinberger wrote:
>> On Thu, Oct 6, 2011 at 12:13 AM, Andrew Lutomirski <luto@mit.edu> wrote:
>> > On Mon, Oct 3, 2011 at 10:33 AM, Adrian Bunk <bunk@stusta.de> wrote:
>> >> On Mon, Oct 03, 2011 at 06:04:53AM -0700, Andrew Lutomirski wrote:
>> >>> On Mon, Oct 3, 2011 at 2:08 AM, Adrian Bunk <bunk@stusta.de> wrote:
>> >>> > After upgrading a kernel the existing userspace should just work
>> >>> > (assuming it did work before ;-) ), but when I upgraded my kernel
>> >>> > from 3.0.4 to 3.1.0-rc8 a UML instance didn't come up properly.
>> >>> >
>> >>> > dmesg said:
>> >>> >  linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790
>> >>> >  linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790
>> >>> >
>> >>> > Looking throught the changelog I ended up at commit 3ae36655
>> >>> > ("x86-64: Rework vsyscall emulation and add vsyscall= parameter").
>> >>> >
>> >>> > Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default to
>> >>> > vsyscall=native.
>> >>> >
>> >>> > That sounds reasonable to me, and fixes the problem for me.
>> >>>
>> >>> At this point in the -rc cycle, this sounds fine.
>> >>>
>> >>> That being said, I'd like to fix it for real for 3.2.  This particular
>> >>> failure is suspicious -- the "vsyscall fault" message means that
>> >>> sys_gettimeofday returned EFAULT, which means that the old (3.0 and
>> >>> before) vgettimeofday should *also* have segfaulted.
>> >>
>> >> This 2.6.30.1 UML kernel binary from 2009 worked for me for all host
>> >> kernels from 2.6.30 to 3.0, and with 3.1.0-rc8 and vsyscall=native
>> >> it also seems to run nicely.
>> >>
>> >> Looking deeper into "a UML instance didn't come up properly",
>> >> the problem is that it comes up in a strange (readonly) state.
>> >>
>> >> There are "Using makefile-style concurrent boot in runlevel S."
>> >> and "Using makefile-style concurrent boot in runlevel 2." in the
>> >> logs with a Debian userspace, but no output from the init scripts
>> >> in these broken bootups (normal messages are in non-broken bootups).
>> >>
>> >> Perhaps the two the messages I see in dmesg on the host are from the
>> >> processes running rcS and rc2 failing early?
>> >>
>> >> In a working startup with a Debian userspace, I'm getting during rcS
>> >>  Setting the system clock.
>> >>  Cannot access the Hardware Clock via any known method.
>> >>  Use the --debug option to see the details of our search for an access method.
>> >>  Unable to set System Clock to: Mon Oct 3 17:01:35 UTC 2011 ... (warning).
>> >>
>> >>> We do have a bit
>> >>> of a bug in that the new code doesn't report si_addr properly, but
>> >>> that sounds unlikely as a culprit.  Did you try with the offending
>> >>> commit reverted (i.e. fce8dc0)?  I bet that it also fails there.
>> >>
>> >> fce8dc0 is "x86-64: Wire up getcpu syscall", is that really the one you
>> >> want me to revert?
>> >>
>> >>> What's the .config for your UML binary?  I'd like to see if I can
>> >>> reproduce this.
>> >>
>> >> It's attached.
>> >>
>> >
>> > I can't reproduce it.  What distro is running inside the UML instance?
>>
>> Same here.
>> Adrian, is the UML kernel crashing before executing init?
>
> As I wrote:
>  Looking deeper into "a UML instance didn't come up properly",
>  the problem is that it comes up in a strange (readonly) state.
>
> The UML kernel is running happily without crashing, and as I wrote my
> guess about my problems is:
>  Perhaps the two the messages I see in dmesg on the host are from the
>  processes running rcS and rc2 failing early?
>
>> We definitely need more information...
>
> I gave the information that was requested. plus my observations.
>

Whoops, I missed the mail containing that information, sorry.
Now I know where to look...

BTW: Can you please test 3.1-rcX as UML kernel? It contains
vDSO/vsyscall fixes...

-- 
Thanks,
//richard

* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-05 22:30         ` Adrian Bunk
  2011-10-05 22:41           ` richard -rw- weinberger
@ 2011-10-05 22:46           ` Andrew Lutomirski
  2011-10-05 23:36             ` Andrew Lutomirski
  1 sibling, 1 reply; 50+ messages in thread
From: Andrew Lutomirski @ 2011-10-05 22:46 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: richard -rw- weinberger, H. Peter Anvin, Linus Torvalds,
	Thomas Gleixner, Ingo Molnar, x86, linux-kernel

On Wed, Oct 5, 2011 at 3:30 PM, Adrian Bunk <bunk@stusta.de> wrote:
> On Thu, Oct 06, 2011 at 12:22:34AM +0200, richard -rw- weinberger wrote:
>> On Thu, Oct 6, 2011 at 12:13 AM, Andrew Lutomirski <luto@mit.edu> wrote:
>> > On Mon, Oct 3, 2011 at 10:33 AM, Adrian Bunk <bunk@stusta.de> wrote:
>> >> On Mon, Oct 03, 2011 at 06:04:53AM -0700, Andrew Lutomirski wrote:
>> >>> On Mon, Oct 3, 2011 at 2:08 AM, Adrian Bunk <bunk@stusta.de> wrote:
>> >>> > After upgrading a kernel the existing userspace should just work
>> >>> > (assuming it did work before ;-) ), but when I upgraded my kernel
>> >>> > from 3.0.4 to 3.1.0-rc8 a UML instance didn't come up properly.
>> >>> >
>> >>> > dmesg said:
>> >>> >  linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790
>> >>> >  linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790
>> >>> >
>> >>> > Looking throught the changelog I ended up at commit 3ae36655
>> >>> > ("x86-64: Rework vsyscall emulation and add vsyscall= parameter").
>> >>> >
>> >>> > Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default to
>> >>> > vsyscall=native.
>> >>> >
>> >>> > That sounds reasonable to me, and fixes the problem for me.
>> >>>
>> >>> At this point in the -rc cycle, this sounds fine.
>> >>>
>> >>> That being said, I'd like to fix it for real for 3.2.  This particular
>> >>> failure is suspicious -- the "vsyscall fault" message means that
>> >>> sys_gettimeofday returned EFAULT, which means that the old (3.0 and
>> >>> before) vgettimeofday should *also* have segfaulted.
>> >>
>> >> This 2.6.30.1 UML kernel binary from 2009 worked for me for all host
>> >> kernels from 2.6.30 to 3.0, and with 3.1.0-rc8 and vsyscall=native
>> >> it also seems to run nicely.
>> >>
>> >> Looking deeper into "a UML instance didn't come up properly",
>> >> the problem is that it comes up in a strange (readonly) state.
>> >>
>> >> There are "Using makefile-style concurrent boot in runlevel S."
>> >> and "Using makefile-style concurrent boot in runlevel 2." in the
>> >> logs with a Debian userspace, but no output from the init scripts
>> >> in these broken bootups (normal messages are in non-broken bootups).
>> >>
>> >> Perhaps the two the messages I see in dmesg on the host are from the
>> >> processes running rcS and rc2 failing early?
>> >>
>> >> In a working startup with a Debian userspace, I'm getting during rcS
>> >>  Setting the system clock.
>> >>  Cannot access the Hardware Clock via any known method.
>> >>  Use the --debug option to see the details of our search for an access method.
>> >>  Unable to set System Clock to: Mon Oct 3 17:01:35 UTC 2011 ... (warning).
>> >>
>> >>> We do have a bit
>> >>> of a bug in that the new code doesn't report si_addr properly, but
>> >>> that sounds unlikely as a culprit.  Did you try with the offending
>> >>> commit reverted (i.e. fce8dc0)?  I bet that it also fails there.
>> >>
>> >> fce8dc0 is "x86-64: Wire up getcpu syscall", is that really the one you
>> >> want me to revert?
>> >>
>> >>> What's the .config for your UML binary?  I'd like to see if I can
>> >>> reproduce this.
>> >>
>> >> It's attached.
>> >>
>> >
>> > I can't reproduce it.  What distro is running inside the UML instance?
>>
>> Same here.
>> Adrian, is the UML kernel crashing before executing init?
>
> As I wrote:
>  Looking deeper into "a UML instance didn't come up properly",
>  the problem is that it comes up in a strange (readonly) state.
>
> The UML kernel is running happily without crashing, and as I wrote my
> guess about my problems is:
>  Perhaps the two the messages I see in dmesg on the host are from the
>  processes running rcS and rc2 failing early?
>
>> We definitely need more information...
>
> I gave the information that was requested. plus my observations.
>
> What more information exactly do you need from me?

None :)  I just reproduced the problem with Debian Squeeze.  Lenny works fine.

--Andy

* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-05 22:46           ` Andrew Lutomirski
@ 2011-10-05 23:36             ` Andrew Lutomirski
  2011-10-06  3:06               ` Andrew Lutomirski
  0 siblings, 1 reply; 50+ messages in thread
From: Andrew Lutomirski @ 2011-10-05 23:36 UTC (permalink / raw)
  To: Adrian Bunk, richard -rw- weinberger
  Cc: H. Peter Anvin, Linus Torvalds, Thomas Gleixner, Ingo Molnar,
	x86, linux-kernel

On Wed, Oct 5, 2011 at 3:46 PM, Andrew Lutomirski <luto@mit.edu> wrote:
> On Wed, Oct 5, 2011 at 3:30 PM, Adrian Bunk <bunk@stusta.de> wrote:
>> On Thu, Oct 06, 2011 at 12:22:34AM +0200, richard -rw- weinberger wrote:
>>> On Thu, Oct 6, 2011 at 12:13 AM, Andrew Lutomirski <luto@mit.edu> wrote:
>>> > On Mon, Oct 3, 2011 at 10:33 AM, Adrian Bunk <bunk@stusta.de> wrote:
>>> >> On Mon, Oct 03, 2011 at 06:04:53AM -0700, Andrew Lutomirski wrote:
>>> >>> On Mon, Oct 3, 2011 at 2:08 AM, Adrian Bunk <bunk@stusta.de> wrote:
>>> >>> > After upgrading a kernel the existing userspace should just work
>>> >>> > (assuming it did work before ;-) ), but when I upgraded my kernel
>>> >>> > from 3.0.4 to 3.1.0-rc8 a UML instance didn't come up properly.
>>> >>> >
>>> >>> > dmesg said:
>>> >>> >  linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790
>>> >>> >  linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790
>>> >>> >
>>> >>> > Looking throught the changelog I ended up at commit 3ae36655
>>> >>> > ("x86-64: Rework vsyscall emulation and add vsyscall= parameter").
>>> >>> >
>>> >>> > Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default to
>>> >>> > vsyscall=native.
>>> >>> >
>>> >>> > That sounds reasonable to me, and fixes the problem for me.
>>> >>>
>>> >>> At this point in the -rc cycle, this sounds fine.
>>> >>>
>>> >>> That being said, I'd like to fix it for real for 3.2.  This particular
>>> >>> failure is suspicious -- the "vsyscall fault" message means that
>>> >>> sys_gettimeofday returned EFAULT, which means that the old (3.0 and
>>> >>> before) vgettimeofday should *also* have segfaulted.
>>> >>
>>> >> This 2.6.30.1 UML kernel binary from 2009 worked for me for all host
>>> >> kernels from 2.6.30 to 3.0, and with 3.1.0-rc8 and vsyscall=native
>>> >> it also seems to run nicely.
>>> >>
>>> >> Looking deeper into "a UML instance didn't come up properly",
>>> >> the problem is that it comes up in a strange (readonly) state.
>>> >>
>>> >> There are "Using makefile-style concurrent boot in runlevel S."
>>> >> and "Using makefile-style concurrent boot in runlevel 2." in the
>>> >> logs with a Debian userspace, but no output from the init scripts
>>> >> in these broken bootups (normal messages are in non-broken bootups).
>>> >>
>>> >> Perhaps the two the messages I see in dmesg on the host are from the
>>> >> processes running rcS and rc2 failing early?
>>> >>
>>> >> In a working startup with a Debian userspace, I'm getting during rcS
>>> >>  Setting the system clock.
>>> >>  Cannot access the Hardware Clock via any known method.
>>> >>  Use the --debug option to see the details of our search for an access method.
>>> >>  Unable to set System Clock to: Mon Oct 3 17:01:35 UTC 2011 ... (warning).
>>> >>
>>> >>> We do have a bit
>>> >>> of a bug in that the new code doesn't report si_addr properly, but
>>> >>> that sounds unlikely as a culprit.  Did you try with the offending
>>> >>> commit reverted (i.e. fce8dc0)?  I bet that it also fails there.
>>> >>
>>> >> fce8dc0 is "x86-64: Wire up getcpu syscall", is that really the one you
>>> >> want me to revert?
>>> >>
>>> >>> What's the .config for your UML binary?  I'd like to see if I can
>>> >>> reproduce this.
>>> >>
>>> >> It's attached.
>>> >>
>>> >
>>> > I can't reproduce it.  What distro is running inside the UML instance?
>>>
>>> Same here.
>>> Adrian, is the UML kernel crashing before executing init?
>>
>> As I wrote:
>>  Looking deeper into "a UML instance didn't come up properly",
>>  the problem is that it comes up in a strange (readonly) state.
>>
>> The UML kernel is running happily without crashing, and as I wrote my
>> guess about my problems is:
>>  Perhaps the two the messages I see in dmesg on the host are from the
>>  processes running rcS and rc2 failing early?
>>
>>> We definitely need more information...
>>
>> I gave the information that was requested. plus my observations.
>>
>> What more information exactly do you need from me?
>
> None :)  I just reproduced the problem with Debian Squeeze.  Lenny works fine.

This is strange.  The problem appears to be in startpar.  That same
exact Debian image works fine on KVM running 3.1-rc8 (with
vsyscall=emulate) and on 2.6.40 (i.e. Fedora 15's kernel).  If I set
print-fatal-signals=1 I don't see a fatal signal in startpar.

Richard, is it possible that UML 2.6.30.1 generates a bogus
vgettimeofday and recovers successfully on older kernels because the
resulting SIGSEGV had a valid sigcontext?  I can try hacking the
"vsyscall fault" path to generate full sigcontext and info.  This
seems rather unlikely, though.
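
For reference, the ip:ffffffffff600000 in the dmesg lines quoted above is
the first slot of the legacy x86-64 vsyscall page, i.e. the gettimeofday
entry.  Here is a minimal sketch of decoding such an address.  The helper
name vsyscall_nr() and the macro names are made up for illustration; the
1024-byte slot spacing and the slot order (gettimeofday, time, getcpu) are
the legacy ABI layout as far as I know:

```c
#include <stdint.h>

/* Assumed legacy layout: fixed page, one entry point every 1024 bytes. */
#define VSYSCALL_BASE		0xffffffffff600000ULL
#define VSYSCALL_SLOT_SIZE	1024ULL
#define VSYSCALL_NR_SLOTS	3ULL	/* 0 gettimeofday, 1 time, 2 getcpu */

/*
 * Hypothetical helper: return the vsyscall slot number whose entry
 * point is exactly ip, or -1 if ip is not a vsyscall entry point.
 */
int vsyscall_nr(uint64_t ip)
{
	uint64_t off;

	if (ip < VSYSCALL_BASE)
		return -1;

	off = ip - VSYSCALL_BASE;
	if (off % VSYSCALL_SLOT_SIZE != 0)
		return -1;	/* inside the page, but not at an entry */
	if (off / VSYSCALL_SLOT_SIZE >= VSYSCALL_NR_SLOTS)
		return -1;	/* past the last defined slot */

	return (int)(off / VSYSCALL_SLOT_SIZE);
}
```

With this, the ip value from the report decodes to slot 0, which matches
the guess that a vgettimeofday call is what faulted.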

--Andy

* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-05 23:36             ` Andrew Lutomirski
@ 2011-10-06  3:06               ` Andrew Lutomirski
  2011-10-06 12:12                 ` richard -rw- weinberger
  2011-10-06 15:37                 ` richard -rw- weinberger
  0 siblings, 2 replies; 50+ messages in thread
From: Andrew Lutomirski @ 2011-10-06  3:06 UTC (permalink / raw)
  To: Adrian Bunk, richard -rw- weinberger
  Cc: H. Peter Anvin, Linus Torvalds, Thomas Gleixner, Ingo Molnar,
	x86, linux-kernel

On Wed, Oct 5, 2011 at 4:36 PM, Andrew Lutomirski <luto@mit.edu> wrote:
> On Wed, Oct 5, 2011 at 3:46 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>> On Wed, Oct 5, 2011 at 3:30 PM, Adrian Bunk <bunk@stusta.de> wrote:
>>> On Thu, Oct 06, 2011 at 12:22:34AM +0200, richard -rw- weinberger wrote:
>>>> On Thu, Oct 6, 2011 at 12:13 AM, Andrew Lutomirski <luto@mit.edu> wrote:
>>>> > On Mon, Oct 3, 2011 at 10:33 AM, Adrian Bunk <bunk@stusta.de> wrote:
>>>> >> On Mon, Oct 03, 2011 at 06:04:53AM -0700, Andrew Lutomirski wrote:
>>>> >>> On Mon, Oct 3, 2011 at 2:08 AM, Adrian Bunk <bunk@stusta.de> wrote:
>>>> >>> > After upgrading a kernel the existing userspace should just work
>>>> >>> > (assuming it did work before ;-) ), but when I upgraded my kernel
>>>> >>> > from 3.0.4 to 3.1.0-rc8 a UML instance didn't come up properly.
>>>> >>> >
>>>> >>> > dmesg said:
>>>> >>> >  linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790
>>>> >>> >  linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790
>>>> >>> >
>>>> >>> > Looking throught the changelog I ended up at commit 3ae36655
>>>> >>> > ("x86-64: Rework vsyscall emulation and add vsyscall= parameter").
>>>> >>> >
>>>> >>> > Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default to
>>>> >>> > vsyscall=native.
>>>> >>> >
>>>> >>> > That sounds reasonable to me, and fixes the problem for me.
>>>> >>>
>>>> >>> At this point in the -rc cycle, this sounds fine.
>>>> >>>
>>>> >>> That being said, I'd like to fix it for real for 3.2.  This particular
>>>> >>> failure is suspicious -- the "vsyscall fault" message means that
>>>> >>> sys_gettimeofday returned EFAULT, which means that the old (3.0 and
>>>> >>> before) vgettimeofday should *also* have segfaulted.
>>>> >>
>>>> >> This 2.6.30.1 UML kernel binary from 2009 worked for me for all host
>>>> >> kernels from 2.6.30 to 3.0, and with 3.1.0-rc8 and vsyscall=native
>>>> >> it also seems to run nicely.
>>>> >>
>>>> >> Looking deeper into "a UML instance didn't come up properly",
>>>> >> the problem is that it comes up in a strange (readonly) state.
>>>> >>
>>>> >> There are "Using makefile-style concurrent boot in runlevel S."
>>>> >> and "Using makefile-style concurrent boot in runlevel 2." in the
>>>> >> logs with a Debian userspace, but no output from the init scripts
>>>> >> in these broken bootups (normal messages are in non-broken bootups).
>>>> >>
>>>> >> Perhaps the two the messages I see in dmesg on the host are from the
>>>> >> processes running rcS and rc2 failing early?
>>>> >>
>>>> >> In a working startup with a Debian userspace, I'm getting during rcS
>>>> >>  Setting the system clock.
>>>> >>  Cannot access the Hardware Clock via any known method.
>>>> >>  Use the --debug option to see the details of our search for an access method.
>>>> >>  Unable to set System Clock to: Mon Oct 3 17:01:35 UTC 2011 ... (warning).
>>>> >>
>>>> >>> We do have a bit
>>>> >>> of a bug in that the new code doesn't report si_addr properly, but
>>>> >>> that sounds unlikely as a culprit.  Did you try with the offending
>>>> >>> commit reverted (i.e. fce8dc0)?  I bet that it also fails there.
>>>> >>
>>>> >> fce8dc0 is "x86-64: Wire up getcpu syscall", is that really the one you
>>>> >> want me to revert?
>>>> >>
>>>> >>> What's the .config for your UML binary?  I'd like to see if I can
>>>> >>> reproduce this.
>>>> >>
>>>> >> It's attached.
>>>> >>
>>>> >
>>>> > I can't reproduce it.  What distro is running inside the UML instance?
>>>>
>>>> Same here.
>>>> Adrian, is the UML kernel crashing before executing init?
>>>
>>> As I wrote:
>>>  Looking deeper into "a UML instance didn't come up properly",
>>>  the problem is that it comes up in a strange (readonly) state.
>>>
>>> The UML kernel is running happily without crashing, and as I wrote my
>>> guess about my problems is:
>>>  Perhaps the two the messages I see in dmesg on the host are from the
>>>  processes running rcS and rc2 failing early?
>>>
>>>> We definitely need more information...
>>>
>>> I gave the information that was requested. plus my observations.
>>>
>>> What more information exactly do you need from me?
>>
>> None :)  I just reproduced the problem with Debian Squeeze.  Lenny works fine.
>
> This is strange.  The problem appears to be in startpar.  That same
> exact Debian image works fine on KVM running 3.1-rc8 (with
> vsyscall=emulate) and on 2.6.40 (i.e. Fedora 15's kernel).  If I set
> print-fatal-signals=1 I don't see a fatal signal in startpar.
>
> Richard, is it possible that UML 2.6.30.1 generates a bogus
> vgettimeofday and recovers successfully on older kernels because the
> resulting SIGSEGV had a valid sigcontext?  I can try hacking the
> "vsyscall fault" path to generate full sigcontext and info.  This
> seems rather unlikely, though.

I think that is the problem.  UML appears to lazily set up "page
tables" just like a real machine; it does this by handling SIGSEGV and
calling handle_mm_fault.  If cr2 isn't set right, though, it doesn't
know where the fault was and it can't handle it, so it just sends
SIGSEGV to userspace.
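
For illustration, here is a minimal userspace sketch of that
lazy-mapping pattern.  It is not UML code: it assumes x86-64 Linux with
glibc, and every demo_* name is hypothetical.  The handler learns the
fault address from si_addr (and from the cr2 slot of the sigcontext,
where available), maps the page, and returns so that the faulting store
restarts.  If the reported address is bogus, the mprotect() fails and
the sketch can only give up, which is roughly the position UML is in.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <ucontext.h>
#include <unistd.h>

static volatile uintptr_t demo_si_addr;	/* fault address from siginfo */
static volatile uintptr_t demo_cr2;	/* fault address from sigcontext */
static uintptr_t demo_page_size;

/* Hypothetical handler: "maps" the faulting page on demand. */
static void demo_lazy_fault(int sig, siginfo_t *si, void *ctx)
{
	ucontext_t *uc = ctx;
	uintptr_t addr = (uintptr_t)si->si_addr;

	(void)sig;
	demo_si_addr = addr;
#if defined(__x86_64__) && defined(REG_CR2)
	demo_cr2 = (uintptr_t)uc->uc_mcontext.gregs[REG_CR2];
#else
	(void)uc;
	demo_cr2 = addr;	/* no cr2 slot here; fall back to si_addr */
#endif
	/*
	 * With a bogus address (e.g. 0) this mprotect() fails and we
	 * cannot resolve the fault, so bail out.
	 */
	if (mprotect((void *)(addr & ~(demo_page_size - 1)), demo_page_size,
		     PROT_READ | PROT_WRITE) != 0)
		_exit(2);
}

/*
 * Touch one byte of a PROT_NONE page and let the handler map it.
 * Returns the offset of the reported fault within the page, or -1.
 */
long demo_lazy_touch(size_t offset)
{
	struct sigaction sa, old;
	char *page;
	long result = -1;

	demo_page_size = (uintptr_t)sysconf(_SC_PAGESIZE);
	if (offset >= demo_page_size)
		return -1;
	page = mmap(NULL, demo_page_size, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (page == MAP_FAILED)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = demo_lazy_fault;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, &old) == 0) {
		demo_si_addr = 0;
		demo_cr2 = 0;
		/* Faults once; the handler maps the page; the store restarts. */
		*(volatile char *)(page + offset) = 'x';
		sigaction(SIGSEGV, &old, NULL);
		if (page[offset] == 'x' && demo_si_addr == demo_cr2)
			result = (long)(demo_si_addr - (uintptr_t)page);
	}
	munmap(page, demo_page_size);
	return result;
}
```

Calling demo_lazy_touch(16) should report offset 16 when the kernel
delivers a correct fault address.  mprotect() is not formally
async-signal-safe, so treat this as a demonstration only.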

In 3.0 and earlier, we don't crash but we malfunction differently: UML
doesn't intercept the vsyscall at all and the guest sees the host's
time.  This should be fixed in a newer version of UML.

In vsyscall=native mode, we DTRT because UML handles the syscall itself.

I'll see how ugly the patch to get this all correct is.  It may not be
all that pretty because we won't be able to use sys_gettimeofday
anymore.

--Andy

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-06  3:06               ` Andrew Lutomirski
@ 2011-10-06 12:12                 ` richard -rw- weinberger
  2011-10-06 15:37                 ` richard -rw- weinberger
  1 sibling, 0 replies; 50+ messages in thread
From: richard -rw- weinberger @ 2011-10-06 12:12 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Adrian Bunk, H. Peter Anvin, Linus Torvalds, Thomas Gleixner,
	Ingo Molnar, x86, linux-kernel

On Thu, Oct 6, 2011 at 5:06 AM, Andrew Lutomirski <luto@mit.edu> wrote:
> On Wed, Oct 5, 2011 at 4:36 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>> On Wed, Oct 5, 2011 at 3:46 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>>> On Wed, Oct 5, 2011 at 3:30 PM, Adrian Bunk <bunk@stusta.de> wrote:
>>>> On Thu, Oct 06, 2011 at 12:22:34AM +0200, richard -rw- weinberger wrote:
>>>>> On Thu, Oct 6, 2011 at 12:13 AM, Andrew Lutomirski <luto@mit.edu> wrote:
>>>>> > On Mon, Oct 3, 2011 at 10:33 AM, Adrian Bunk <bunk@stusta.de> wrote:
>>>>> >> On Mon, Oct 03, 2011 at 06:04:53AM -0700, Andrew Lutomirski wrote:
>>>>> >>> On Mon, Oct 3, 2011 at 2:08 AM, Adrian Bunk <bunk@stusta.de> wrote:
>>>>> >>> > After upgrading a kernel the existing userspace should just work
>>>>> >>> > (assuming it did work before ;-) ), but when I upgraded my kernel
>>>>> >>> > from 3.0.4 to 3.1.0-rc8 a UML instance didn't come up properly.
>>>>> >>> >
>>>>> >>> > dmesg said:
>>>>> >>> >  linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790
>>>>> >>> >  linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790
>>>>> >>> >
>>>>> >>> > Looking through the changelog I ended up at commit 3ae36655
>>>>> >>> > ("x86-64: Rework vsyscall emulation and add vsyscall= parameter").
>>>>> >>> >
>>>>> >>> > Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default to
>>>>> >>> > vsyscall=native.
>>>>> >>> >
>>>>> >>> > That sounds reasonable to me, and fixes the problem for me.
>>>>> >>>
>>>>> >>> At this point in the -rc cycle, this sounds fine.
>>>>> >>>
>>>>> >>> That being said, I'd like to fix it for real for 3.2.  This particular
>>>>> >>> failure is suspicious -- the "vsyscall fault" message means that
>>>>> >>> sys_gettimeofday returned EFAULT, which means that the old (3.0 and
>>>>> >>> before) vgettimeofday should *also* have segfaulted.
>>>>> >>
>>>>> >> This 2.6.30.1 UML kernel binary from 2009 worked for me for all host
>>>>> >> kernels from 2.6.30 to 3.0, and with 3.1.0-rc8 and vsyscall=native
>>>>> >> it also seems to run nicely.
>>>>> >>
>>>>> >> Looking deeper into "a UML instance didn't come up properly",
>>>>> >> the problem is that it comes up in a strange (readonly) state.
>>>>> >>
>>>>> >> There are "Using makefile-style concurrent boot in runlevel S."
>>>>> >> and "Using makefile-style concurrent boot in runlevel 2." in the
>>>>> >> logs with a Debian userspace, but no output from the init scripts
>>>>> >> in these broken bootups (normal messages are in non-broken bootups).
>>>>> >>
>>>>> >> Perhaps the two messages I see in dmesg on the host are from the
>>>>> >> processes running rcS and rc2 failing early?
>>>>> >>
>>>>> >> In a working startup with a Debian userspace, I'm getting during rcS
>>>>> >>  Setting the system clock.
>>>>> >>  Cannot access the Hardware Clock via any known method.
>>>>> >>  Use the --debug option to see the details of our search for an access method.
>>>>> >>  Unable to set System Clock to: Mon Oct 3 17:01:35 UTC 2011 ... (warning).
>>>>> >>
>>>>> >>> We do have a bit
>>>>> >>> of a bug in that the new code doesn't report si_addr properly, but
>>>>> >>> that sounds unlikely as a culprit.  Did you try with the offending
>>>>> >>> commit reverted (i.e. fce8dc0)?  I bet that it also fails there.
>>>>> >>
>>>>> >> fce8dc0 is "x86-64: Wire up getcpu syscall", is that really the one you
>>>>> >> want me to revert?
>>>>> >>
>>>>> >>> What's the .config for your UML binary?  I'd like to see if I can
>>>>> >>> reproduce this.
>>>>> >>
>>>>> >> It's attached.
>>>>> >>
>>>>> >
>>>>> > I can't reproduce it.  What distro is running inside the UML instance?
>>>>>
>>>>> Same here.
>>>>> Adrian, is the UML kernel crashing before executing init?
>>>>
>>>> As I wrote:
>>>>  Looking deeper into "a UML instance didn't come up properly",
>>>>  the problem is that it comes up in a strange (readonly) state.
>>>>
>>>> The UML kernel is running happily without crashing, and as I wrote my
>>>> guess about my problems is:
>>>>  Perhaps the two messages I see in dmesg on the host are from the
>>>>  processes running rcS and rc2 failing early?
>>>>
>>>>> We definitely need more information...
>>>>
>>>> I gave the information that was requested, plus my observations.
>>>>
>>>> What more information exactly do you need from me?
>>>
>>> None :)  I just reproduced the problem with Debian Squeeze.  Lenny works fine.
>>
>> This is strange.  The problem appears to be in startpar.  That same
>> exact Debian image works fine on KVM running 3.1-rc8 (with
>> vsyscall=emulate) and on 2.6.40 (i.e. Fedora 15's kernel).  If I set
>> print-fatal-signals=1 I don't see a fatal signal in startpar.
>>
>> Richard, is it possible that UML 2.6.30.1 generates a bogus
>> vgettimeofday and recovers successfully on older kernels because the
>> resulting SIGSEGV had a valid sigcontext?  I can try hacking the
>> "vsyscall fault" path to generate full sigcontext and info.  This
>> seems rather unlikely, though.
>
> I think that is the problem.  UML appears to lazily set up "page
> tables" just like a real machine; it does this by handling SIGSEGV and
> calling handle_mm_fault.  If cr2 isn't set right, though, it doesn't
> know where the fault was and it can't handle it, so it just sends
> SIGSEGV to userspace.
>
> In 3.0 and earlier, we don't crash but we malfunction differently: UML
> doesn't intercept the vsyscall at all and the guest sees the host's
> time.  This should be fixed in a newer version of UML.

How can we intercept a vsyscall?
It's not trivial.

Starting with Linux 3.1 UML (x86_64) has a vDSO page which transforms
all vDSO calls
to real system calls which can be intercepted.
So, only statically linked binaries will use the host's vsyscall interface.

> In vsyscall=native mode, we DTRT because UML handles the syscall itself.
>
> I'll see how ugly the patch to get this all correct is.  It may not be
> all that pretty because we won't be able to use sys_gettimeofday
> anymore.
>

vsyscall=emulate would be okay for UML if the SEGV has a valid signal context.

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-06  3:06               ` Andrew Lutomirski
  2011-10-06 12:12                 ` richard -rw- weinberger
@ 2011-10-06 15:37                 ` richard -rw- weinberger
  2011-10-06 18:16                   ` Andrew Lutomirski
  1 sibling, 1 reply; 50+ messages in thread
From: richard -rw- weinberger @ 2011-10-06 15:37 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Adrian Bunk, H. Peter Anvin, Linus Torvalds, Thomas Gleixner,
	Ingo Molnar, x86, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 603 bytes --]

On Thu, Oct 6, 2011 at 5:06 AM, Andrew Lutomirski <luto@mit.edu> wrote:
> I'll see how ugly the patch to get this all correct is.  It may not be
> all that pretty because we won't be able to use sys_gettimeofday
> anymore.

BTW: The attached program triggers the issue.

on 3.1-rc8+:
# ./sig.dyn
faulting address: 0xdeadbeef
# ./sig.static
[   19.075106] sig.static[863] vsyscall fault (exploit attempt?)
ip:ffffffffff600000 cs:33 sp:7fff9e53d8c8 ax:ffffffffff600000 si:0
di:deadbeef
faulting address: 0x0

I guess UML is not the only user of this feature...

-- 
Thanks,
//richard

[-- Attachment #2: sig.c --]
[-- Type: text/x-csrc, Size: 454 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/time.h>

static void sighandler(int sig, siginfo_t *si, void *uc)
{
	printf("faulting address: 0x%lx\n", (unsigned long)si->si_addr);

	exit(1);
}

int main()
{
	struct sigaction sa;

	sa.sa_sigaction = (void *)sighandler;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_SIGINFO | SA_NODEFER;
	sigaction(SIGSEGV, &sa, NULL);

	gettimeofday((void *)0xdeadbeef, NULL);

	return 0;
}

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-06 15:37                 ` richard -rw- weinberger
@ 2011-10-06 18:16                   ` Andrew Lutomirski
  2011-10-06 18:34                     ` Linus Torvalds
  0 siblings, 1 reply; 50+ messages in thread
From: Andrew Lutomirski @ 2011-10-06 18:16 UTC (permalink / raw)
  To: richard -rw- weinberger
  Cc: Adrian Bunk, H. Peter Anvin, Linus Torvalds, Thomas Gleixner,
	Ingo Molnar, x86, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1029 bytes --]

On Thu, Oct 6, 2011 at 8:37 AM, richard -rw- weinberger
<richard.weinberger@gmail.com> wrote:
> On Thu, Oct 6, 2011 at 5:06 AM, Andrew Lutomirski <luto@mit.edu> wrote:
>> I'll see how ugly the patch to get this all correct is.  It may not be
>> all that pretty because we won't be able to use sys_gettimeofday
>> anymore.
>
> BTW: The attached program triggers the issue.
>
> on 3.1-rc8+:
> # ./sig.dyn
> faulting address: 0xdeadbeef
> # ./sig.static
> [   19.075106] sig.static[863] vsyscall fault (exploit attempt?)
> ip:ffffffffff600000 cs:33 sp:7fff9e53d8c8 ax:ffffffffff600000 si:0
> di:deadbeef
> faulting address: 0x0
>
> I guess UML is not the only user of this feature...

I assume you wrote this to detect the problem :)

Fixing it will be annoying because the attached fancier version needs
to work, too.  I could implement the whole mess in software, but it
might be nicer to arrange for uaccess errors to stash some information
somewhere (like in the thread_struct cr2 variable).

--Andy

[-- Attachment #2: sig.c --]
[-- Type: text/x-csrc, Size: 691 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/time.h>
#include <sys/mman.h>

static void sighandler(int sig, siginfo_t *si, void *uc)
{
	printf("faulting address: 0x%lx\n", (unsigned long)si->si_addr);

	exit(1);
}

int main()
{
	char *page = mmap(0, 8192, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	mprotect(page, 4096, PROT_READ | PROT_WRITE);

	struct sigaction sa;

	sa.sa_sigaction = (void *)sighandler;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_SIGINFO| SA_NODEFER;
	sigaction(SIGSEGV, &sa, NULL);

	void *access_addr = page + 4095;

	printf("Mapped page = %p; will access %p\n", page, access_addr);

	gettimeofday(access_addr, NULL);

	return 0;
}

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-06 18:16                   ` Andrew Lutomirski
@ 2011-10-06 18:34                     ` Linus Torvalds
  2011-10-07  0:48                       ` Andrew Lutomirski
  0 siblings, 1 reply; 50+ messages in thread
From: Linus Torvalds @ 2011-10-06 18:34 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: richard -rw- weinberger, Adrian Bunk, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, x86, linux-kernel

On Thu, Oct 6, 2011 at 11:16 AM, Andrew Lutomirski <luto@mit.edu> wrote:
>
> Fixing it will be annoying because the attached fancier version needs
> to work, too.  I could implement the whole mess in software, but it
> might be nicer to arrange for uaccess errors to stash some information
> somewhere (like in the thread_struct cr2 variable).

That should be easy enough to do. Just add it to the
"fixup_exception()" case in no_context().

                Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-06 18:34                     ` Linus Torvalds
@ 2011-10-07  0:48                       ` Andrew Lutomirski
  2011-10-10 11:19                         ` richard -rw- weinberger
  0 siblings, 1 reply; 50+ messages in thread
From: Andrew Lutomirski @ 2011-10-07  0:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: richard -rw- weinberger, Adrian Bunk, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, x86, linux-kernel

On Thu, Oct 6, 2011 at 11:34 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Oct 6, 2011 at 11:16 AM, Andrew Lutomirski <luto@mit.edu> wrote:
>>
>> Fixing it will be annoying because the attached fancier version needs
>> to work, too.  I could implement the whole mess in software, but it
>> might be nicer to arrange for uaccess errors to stash some information
>> somewhere (like in the thread_struct cr2 variable).
>
> That should be easy enough to do. Just add it to the
> "fixup_exception()" case in no_context().

This code is rather messy.  We stash the cr2, err, and trap fields of
sigcontext in thread_struct and we *never* reset them until the next
segfault.  So userspace sees stale garbage on every signal that isn't
a (genuine) segfault.  I can imagine this breaking UML in remarkably
bizarre ways even without vsyscall emulation because UML actually
seems to rely on that stuff to determine the source of a signal.

The nice fix would be to move this information into siginfo.  cr2
appears to be duplicated by si_addr.  trap_no is apparently redundant
except for SIGTRAP.  error_code is interesting.  Any objection to
using some padding bytes to move this into siginfo and remove the
fields (except for uaccess) from thread_struct?  Better ideas?

Without some kind of cleanup, I'm a bit worried about breakage if a
uaccess fault happens between something else setting the flags and a
signal getting delivered, resulting in corruption of the sigcontext,
unless I add more crud to thread_struct and waste memory for every
process.

--Andy

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-07  0:48                       ` Andrew Lutomirski
@ 2011-10-10 11:19                         ` richard -rw- weinberger
  2011-10-10 11:48                           ` Ingo Molnar
  0 siblings, 1 reply; 50+ messages in thread
From: richard -rw- weinberger @ 2011-10-10 11:19 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Linus Torvalds, Adrian Bunk, H. Peter Anvin, Thomas Gleixner,
	Ingo Molnar, x86, linux-kernel

On Fri, Oct 7, 2011 at 2:48 AM, Andrew Lutomirski <luto@mit.edu> wrote:
> On Thu, Oct 6, 2011 at 11:34 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Thu, Oct 6, 2011 at 11:16 AM, Andrew Lutomirski <luto@mit.edu> wrote:
>>>
>>> Fixing it will be annoying because the attached fancier version needs
>>> to work, too.  I could implement the whole mess in software, but it
>>> might be nicer to arrange for uaccess errors to stash some information
>>> somewhere (like in the thread_struct cr2 variable).
>>
>> That should be easy enough to do. Just add it to the
>> "fixup_exception()" case in no_context().
>
> This code is rather messy.  We stash the cr2, err, and trap fields of
> sigcontext in thread_struct and we *never* reset them until the next
> segfault.  So userspace sees stale garbage on every signal that isn't
> a (genuine) segfault.  I can imagine this breaking UML in remarkably
> bizarre ways even without vsyscall emulation because UML actually
> seems to rely on that stuff to determine the source of a signal.
>

From UML's point of view the current situation is odd.
UML will no longer run on top of a default 3.1 kernel.

Why is this odd?
One of the major reasons why people are still using UML is because you
can run it as non-privileged user on any x86 Linux host.
A user who has root privileges can set up and use KVM, which is much
nicer than UML...

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-10 11:19                         ` richard -rw- weinberger
@ 2011-10-10 11:48                           ` Ingo Molnar
  2011-10-10 15:31                             ` Andrew Lutomirski
  0 siblings, 1 reply; 50+ messages in thread
From: Ingo Molnar @ 2011-10-10 11:48 UTC (permalink / raw)
  To: richard -rw- weinberger
  Cc: Andrew Lutomirski, Linus Torvalds, Adrian Bunk, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, x86, linux-kernel


* richard -rw- weinberger <richard.weinberger@gmail.com> wrote:

> From UML's point of view the current situation is odd. UML will no 
> longer run on top of a default 3.1 kernel.

This needs to be fixed (perhaps worked around in UML if that's 
possible and if you agree with that) - or barring a real obvious fix 
needs to be reverted to the last-known-working state. We are in -rc9 
so nothing but really, really obvious patches can be applied.

> Why is this odd? One of the major reasons why people are still 
> using UML is because you can run it as non-privileged user on any 
> x86 Linux host. An user which has root privileges can setup and use 
> KVM which is much nicer than UML...

No, your complaint is entirely justified.

Andrew?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-10 11:48                           ` Ingo Molnar
@ 2011-10-10 15:31                             ` Andrew Lutomirski
  2011-10-11  6:22                               ` Ingo Molnar
  0 siblings, 1 reply; 50+ messages in thread
From: Andrew Lutomirski @ 2011-10-10 15:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: richard -rw- weinberger, Linus Torvalds, Adrian Bunk,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, x86, linux-kernel

On Mon, Oct 10, 2011 at 4:48 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * richard -rw- weinberger <richard.weinberger@gmail.com> wrote:
>
>> From UML's point of view the current situation is odd. UML will no
>> longer run on top of a default 3.1 kernel.
>
> This needs to be fixed (perhaps worked around in UML if that's
> possible and if you agree with that) - or barring a real obvious fix
> needs to be reverted to the last-known-working state. We are in -rc9
> so nothing but really, really obvious patches can be applied.
>
>> Why is this odd? One of the major reasons why people are still
>> using UML is because you can run it as non-privileged user on any
>> x86 Linux host. An user which has root privileges can setup and use
>> KVM which is much nicer than UML...
>
> No, your complaint is entirely justified.
>
> Andrew?

I think I know what the root cause is and I have most of a patch to
fix it.  It doesn't compile (yet), it's a little less trivial than I'd
like for something this late in the -rc cycle, and it adds 16 bytes to
thread_struct (ugh!).

I think I can make a follow-up patch that removes 32 bytes of
per-thread state to restore my karma, though, but that will definitely
not be 3.1 material.

The issue is that the existing trap_no, error_code, and cr2 fields are
used in ways that appear rather broken and extremely fragile to report
detailed exception info to user space when SIGSEGV, SIGBUS, and
SIGTRAP happen.  Touching them from the failed uaccess paths might
have unfortunate side effects like breaking vm86.  I suspect that
nothing other than UML and vm86 users care because they're only used
for the old sigcontext data and not for modern siginfo.  The tricky
case for vsyscall emulation is if gettimeofday is called with a buffer
that crosses a page boundary and the second page causes the fault.

I'll email something out in a day or two (maybe today).

--Andy

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-10 15:31                             ` Andrew Lutomirski
@ 2011-10-11  6:22                               ` Ingo Molnar
  2011-10-11 17:24                                 ` [RFC] fixing the UML failure root cause Andrew Lutomirski
  0 siblings, 1 reply; 50+ messages in thread
From: Ingo Molnar @ 2011-10-11  6:22 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: richard -rw- weinberger, Linus Torvalds, Adrian Bunk,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, x86, linux-kernel


* Andrew Lutomirski <luto@mit.edu> wrote:

> > Andrew?
> 
> I think I know what the root cause is and I have most of a patch to 
> fix it.  It doesn't compile (yet), it's a little less trivial than 
> I'd like for something this late in the -rc cycle, and it adds 16 
> bytes to thread_struct (ugh!).
> 
> I think I can make a follow-up patch that removes 32 bytes of 
> per-thread state to restore my karma, though, but that will 
> definitely not be 3.1 material.

Ok, i've queued up the vsyscall=native patch in tip:x86/urgent for 
now - we can re-try in v3.2 (perhaps) if a satisfactory solution is 
found.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [RFC] fixing the UML failure root cause
  2011-10-11  6:22                               ` Ingo Molnar
@ 2011-10-11 17:24                                 ` Andrew Lutomirski
  2011-10-13  6:19                                   ` Linus Torvalds
  2011-10-14 19:53                                   ` [RFC] fixing the UML failure root cause richard -rw- weinberger
  0 siblings, 2 replies; 50+ messages in thread
From: Andrew Lutomirski @ 2011-10-11 17:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: richard -rw- weinberger, Linus Torvalds, Adrian Bunk,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, x86, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1259 bytes --]

On 10/10/2011 11:22 PM, Ingo Molnar wrote:
> 
> * Andrew Lutomirski <luto@mit.edu> wrote:
> 
>>> Andrew?
>>
>> I think I know what the root cause is and I have most of a patch to 
>> fix it.  It doesn't compile (yet), it's a little less trivial than 
>> I'd like for something this late in the -rc cycle, and it adds 16 
>> bytes to thread_struct (ugh!).
>>
>> I think I can make a follow-up patch that removes 32 bytes of 
>> per-thread state to restore my karma, though, but that will 
>> definitely not be 3.1 material.
> 
> Ok, i've queued up the vsyscall=native patch in tip:x86/urgent for 
> now - we can re-try in v3.2 (perhaps) if a satisfactory solution is 
> found.

Getting full cause information for uaccess failure was messy enough that
I gave up.  There are a *lot* of uaccess failure paths to work through.

So here's a different approach.  It's not perfect: it always blames
SEGV_MAPERR instead of SEGV_ACCERR.  I implemented it for vgettimeofday
but not the other two vsyscalls.
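
For readers following the attached patch: it hard-codes error_code = 6
and trap_no = 14.  The sketch below decodes those numbers using the
architectural x86 page-fault error-code bits (bit 0: page present,
bit 1: write access, bit 2: user mode); 14 is the page-fault vector.
The demo_* names are hypothetical and are not kernel identifiers.

```c
#include <stdbool.h>

/* Low bits of the x86 page-fault error code. */
enum {
	DEMO_PF_PROT  = 1 << 0,	/* 1: protection fault, 0: page not present */
	DEMO_PF_WRITE = 1 << 1,	/* 1: write access, 0: read access */
	DEMO_PF_USER  = 1 << 2,	/* 1: fault raised in user mode */
};

#define DEMO_TRAP_PAGE_FAULT 14	/* x86 exception vector for #PF */

struct demo_pf_info {
	bool page_present;
	bool is_write;
	bool from_user;
};

/* Split an error code into the three flags the patch cares about. */
struct demo_pf_info demo_decode_pf(unsigned long error_code)
{
	struct demo_pf_info info;

	info.page_present = (error_code & DEMO_PF_PROT) != 0;
	info.is_write = (error_code & DEMO_PF_WRITE) != 0;
	info.from_user = (error_code & DEMO_PF_USER) != 0;
	return info;
}
```

So 6 reads as "user-mode write to a page that was not present", and the
value cannot tell a protection fault (which would be 7) apart from a
missing page, in line with the patch always reporting SEGV_MAPERR.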

What do you think of this approach?  If it seems good, I'll finish the
patch and submit it.

With this patch applied, UML appears to work, but it fills the log with
exploit attempt warnings.  Any ideas on what to do about that?

--Andy

> 
> Thanks,
> 
> 	Ingo


[-- Attachment #2: vsyscall_hack.patch --]
[-- Type: text/plain, Size: 3033 bytes --]

diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 18ae83d..c0bafec 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -139,6 +139,42 @@ static int addr_to_vsyscall_nr(unsigned long addr)
 	return nr;
 }
 
+/* Copy data to user space, forcing signals on failure. */
+static int copy_to_user_sig(unsigned long dest, const void *src, size_t len)
+{
+	/*
+	 * This may be the slowest memcpy ever written.  We don't really care.
+	 */
+	size_t i;
+	for (i = 0; i < len; i++) {
+		char __user *user_byte = (char __user *)(dest + i);
+		if (put_user(((char*)src)[i], user_byte) != 0) {
+			/* Report full siginfo and context */
+			struct task_struct *tsk = current;
+			siginfo_t info;
+			memset(&info, 0, sizeof(info));
+			info.si_signo = SIGSEGV;
+			/*
+			 * Could be SEGV_ACCERR -- we don't distinguish it
+			 * correctly.
+			 */
+			info.si_code = SEGV_MAPERR;
+			info.si_addr = user_byte;
+			/*
+			 * Write fault in user mode.  We don't distinguish
+			 * protection fault from no page found.
+			 */
+			tsk->thread.error_code = 6;
+			tsk->thread.cr2 = (unsigned long)user_byte;
+			tsk->thread.trap_no = 14;
+			force_sig_info(SIGSEGV, &info, tsk);
+			return -EFAULT;
+		}
+	}
+
+	return 0;
+}
+
 bool emulate_vsyscall(struct pt_regs *regs, unsigned long address)
 {
 	struct task_struct *tsk;
@@ -181,10 +217,19 @@ bool emulate_vsyscall(struct pt_regs *regs, unsigned long address)
 
 	switch (vsyscall_nr) {
 	case 0:
-		ret = sys_gettimeofday(
-			(struct timeval __user *)regs->di,
-			(struct timezone __user *)regs->si);
+	{
+		struct timeval tv;
+		do_gettimeofday(&tv);
+
+		if (regs->di && copy_to_user_sig(regs->di, &tv, sizeof(tv)))
+			goto warn_fault;
+		if (regs->si && copy_to_user_sig(regs->si, &sys_tz,
+		                                 sizeof(struct timezone)))
+			goto warn_fault;
+
+		ret = 0;
 		break;
+	}
 
 	case 1:
 		ret = sys_time((time_t __user *)regs->di);
@@ -197,19 +242,6 @@ bool emulate_vsyscall(struct pt_regs *regs, unsigned long address)
 		break;
 	}
 
-	if (ret == -EFAULT) {
-		/*
-		 * Bad news -- userspace fed a bad pointer to a vsyscall.
-		 *
-		 * With a real vsyscall, that would have caused SIGSEGV.
-		 * To make writing reliable exploits using the emulated
-		 * vsyscalls harder, generate SIGSEGV here as well.
-		 */
-		warn_bad_vsyscall(KERN_INFO, regs,
-				  "vsyscall fault (exploit attempt?)");
-		goto sigsegv;
-	}
-
 	regs->ax = ret;
 
 	/* Emulate a ret instruction. */
@@ -221,6 +253,19 @@ bool emulate_vsyscall(struct pt_regs *regs, unsigned long address)
 sigsegv:
 	force_sig(SIGSEGV, current);
 	return true;
+
+warn_fault:
+	/*
+	 * Bad news -- userspace fed a bad pointer to a vsyscall.
+	 *
+	 * With a real vsyscall, that would have caused SIGSEGV.
+	 * To make writing reliable exploits using the emulated
+	 * vsyscalls harder, generate SIGSEGV here as well.
+	 */
+
+	warn_bad_vsyscall(KERN_INFO, regs,
+	                  "vsyscall fault (exploit attempt?)");
+	return true;
 }
 
 /*

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [RFC] fixing the UML failure root cause
  2011-10-11 17:24                                 ` [RFC] fixing the UML failure root cause Andrew Lutomirski
@ 2011-10-13  6:19                                   ` Linus Torvalds
  2011-10-13  8:40                                     ` Andrew Lutomirski
  2011-10-14 19:53                                   ` [RFC] fixing the UML failure root cause richard -rw- weinberger
  1 sibling, 1 reply; 50+ messages in thread
From: Linus Torvalds @ 2011-10-13  6:19 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Ingo Molnar, richard -rw- weinberger, Adrian Bunk,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, x86, linux-kernel

On Wed, Oct 12, 2011 at 5:24 AM, Andrew Lutomirski <luto@mit.edu> wrote:
>
> So here's a different approach.  It's not perfect: it always blames
> SEGV_MAPERR instead of SEGV_ACCERR.  I implemented it for vgettimeofday
> but not the other two vsyscalls.
>
> What do you think of this approach?  If it seems good, I'll finish the
> patch and submit it.

I think the approach is valid, but you should *not* do this as some
kind of crazy byte-by-byte copy_to_user() emulation.

Do the "copy tz to user mode" as individual "put_user()" calls for
tv_sec/tv_usec/timezone. IOW, there are three words being written to
user mode, not "two memcpy's".

Other than that, there doesn't seem to be anything wrong.

            Linus

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] fixing the UML failure root cause
  2011-10-13  6:19                                   ` Linus Torvalds
@ 2011-10-13  8:40                                     ` Andrew Lutomirski
  2011-10-14  4:46                                       ` Linus Torvalds
  0 siblings, 1 reply; 50+ messages in thread
From: Andrew Lutomirski @ 2011-10-13  8:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, richard -rw- weinberger, Adrian Bunk,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, x86, linux-kernel

On Wed, Oct 12, 2011 at 11:19 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Oct 12, 2011 at 5:24 AM, Andrew Lutomirski <luto@mit.edu> wrote:
>>
>> So here's a different approach.  It's not perfect: it always blames
>> SEGV_MAPERR instead of SEGV_ACCERR.  I implemented it for vgettimeofday
>> but not the other two vsyscalls.
>>
>> What do you think of this approach?  If it seems good, I'll finish the
>> patch and submit it.
>
> I think the approach is valid, but you should *not* do this as some
> kind of crazy byte-by-byte copy_to_user() emulation.
>
> Do the "copy tz to user mode" as individual "put_user()" calls for
> tv_sec/tv_usec/timezone. IOW, there are three words being written to
> user mode, not "two memcpy's".

How does that work?  The tricky case is when one of those three words
spans a page boundary if the access to the first page is valid, but
the access to the second page is not.  When that happens, if we report
the fault as coming from the first page, then UML is likely to
think the fault was spurious and enter an infinite loop.

To handle that case, I'll need 4- and 8- byte versions of put_user_sig
(IIRC vgetcpu uses unsigneds) that check whether their destinations
span page boundaries and complain accordingly, which will end up as
more code than I have now.
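
A rough userspace sketch of the boundary check such helpers would need,
assuming 4096-byte pages.  The demo_* names are hypothetical; this is
not the put_user_sig code, only the address arithmetic behind it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE ((uintptr_t)4096)

/* True if [addr, addr + len) touches more than one page. */
bool demo_spans_page(uintptr_t addr, size_t len)
{
	if (len == 0)
		return false;
	return (addr / DEMO_PAGE_SIZE) != ((addr + len - 1) / DEMO_PAGE_SIZE);
}

/*
 * First byte of the page following the one that contains addr.  When a
 * write spans two pages and only the second is invalid, this is the
 * address that should be reported, not addr itself.
 */
uintptr_t demo_second_page_start(uintptr_t addr)
{
	return (addr | (DEMO_PAGE_SIZE - 1)) + 1;
}
```

With the test program from earlier in the thread (a buffer starting at
page + 4095), the write spans two pages and the address to report is
page + 4096, the first byte of the PROT_NONE page.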

--Andy

>
> Other than that, there doesn't seem to be anything wrong.
>
>            Linus
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] fixing the UML failure root cause
  2011-10-13  8:40                                     ` Andrew Lutomirski
@ 2011-10-14  4:46                                       ` Linus Torvalds
  2011-10-14  6:30                                         ` Andrew Lutomirski
  0 siblings, 1 reply; 50+ messages in thread
From: Linus Torvalds @ 2011-10-14  4:46 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Ingo Molnar, richard -rw- weinberger, Adrian Bunk,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, x86, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 857 bytes --]

On Thu, Oct 13, 2011 at 8:40 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>
> How does that work?  The tricky case is when one of those three words
> spans a page boundary if the access to the first page is valid, but
> the access to the second page is not.  When that happens, if we report
> the fault as coming from the first page, then UML is likely to
> think the fault was spurious and enter an infinite loop.

Hmm. Gaah, I just find that memcpy loop disgusting.

We already have that ugly "uaccess_error" crap in handle_exception(),
we might as well do something like the attached and just say "hey, now
you can catch the page fault information for a get_user/put_user
fault".

Isn't that much nicer?

You don't even have to check each word, you can just take the last
exception info from the thread-info.

              Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1062 bytes --]

 arch/x86/include/asm/thread_info.h |    2 ++
 arch/x86/mm/fault.c                |    6 +++++-
 2 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index a1fe5c127b52..e8d245febfae 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -41,6 +41,8 @@ struct thread_info {
 	__u8			supervisor_stack[0];
 #endif
 	int			uaccess_err;
+	int			uaccess_error_code;
+	unsigned long		uaccess_addr;
 };
 
 #define INIT_THREAD_INFO(tsk)			\
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 0d17c8c50acd..bbbee6e6a95b 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -628,8 +628,12 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 	int sig;
 
 	/* Are we prepared to handle this kernel fault? */
-	if (fixup_exception(regs))
+	if (fixup_exception(regs)) {
+		struct thread_info *ti = current_thread_info();
+		ti->uaccess_error_code = error_code;
+		ti->uaccess_addr = address;
 		return;
+	}
 
 	/*
 	 * 32-bit:

* Re: [RFC] fixing the UML failure root cause
  2011-10-14  4:46                                       ` Linus Torvalds
@ 2011-10-14  6:30                                         ` Andrew Lutomirski
  2011-10-14 20:10                                           ` Linus Torvalds
  0 siblings, 1 reply; 50+ messages in thread
From: Andrew Lutomirski @ 2011-10-14  6:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, richard -rw- weinberger, Adrian Bunk,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, x86, linux-kernel

On Thu, Oct 13, 2011 at 9:46 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Oct 13, 2011 at 8:40 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>>
>> How does that work?  The tricky case is when one of those three words
>> spans a page boundary and the access to the first page is valid but
>> the access to the second page is not.  When that happens, if we report
>> the fault as coming from the first page, then UML is likely to think
>> the fault was spurious and enter an infinite loop.
>
> Hmm. Gaah, I just find that memcpy loop disgusting.
>

Yeah, it's not pretty.

> We already have that ugly "uaccess_error" crap in handle_exception(),
> we might as well do something like the attached and just say "hey, now
> you can catch the page fault information for a get_user/put_user
> fault".
>
> Isn't that much nicer?

I actually tried this.  To really get it right, though, I also need to
either hook the access_ok failure paths (either every single one or
just the ones that matter for those three syscalls, which could be
fragile) or to check access_ok separately in the vsyscall emulation
code.  This also takes up 16 bytes of stack just to support a corner
case of a legacy code path.

Another idea is to have a flag that asks the fault handlers to call
force_sig_info for us.  That's just one bit of per-thread state.  Then
the vsyscall emulation code could check access_ok, force a signal if
access is not ok, then set the flag and do the syscall.  And maybe
some processes would want to opt in to that mode anyway -- arguably
EFAULT is a serious programmer error and should be dealt with more
harshly than other syscall misuses.
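
A minimal userspace sketch of the flag idea in the paragraph above: one bit
of per-thread state that the emulation code saves, forces on, and restores
around the call. Everything here is hypothetical (`struct thread_state`,
`sig_on_fault`, `do_user_copy`, `emulate_call`); a counter stands in for the
fault handler calling force_sig_info, and a boolean stands in for whether
the destination is mapped.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-thread state; not the kernel's thread_info. */
struct thread_state {
	bool sig_on_fault;    /* "raise a signal when a user copy faults" */
	int  signals_raised;  /* stand-in for force_sig_info() being called */
};

/* Models a user copy that faults when the destination is not mapped. */
static int do_user_copy(struct thread_state *ts, bool dst_is_mapped)
{
	if (dst_is_mapped)
		return 0;
	/* The "fault handler" only raises a signal if the flag asks for it. */
	if (ts->sig_on_fault)
		ts->signals_raised++;
	return -EFAULT;
}

/* Save the flag, force it on for the emulated call, then restore it. */
static int emulate_call(struct thread_state *ts, bool dst_is_mapped)
{
	bool prev = ts->sig_on_fault;
	int ret;

	ts->sig_on_fault = true;
	ret = do_user_copy(ts, dst_is_mapped);
	ts->sig_on_fault = prev;
	return ret;
}
```

Restoring the previous value rather than clearing the bit keeps the sketch
compatible with the opt-in mode mentioned above, where a process might
already have the flag set on its own.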

Admittedly, UML probably doesn't care about recovering from a
vgettimeofday pointed at kernel space...

--Andy

* Re: [RFC] fixing the UML failure root cause
  2011-10-11 17:24                                 ` [RFC] fixing the UML failure root cause Andrew Lutomirski
  2011-10-13  6:19                                   ` Linus Torvalds
@ 2011-10-14 19:53                                   ` richard -rw- weinberger
  2011-10-14 20:17                                     ` Andrew Lutomirski
  1 sibling, 1 reply; 50+ messages in thread
From: richard -rw- weinberger @ 2011-10-14 19:53 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Ingo Molnar, Linus Torvalds, Adrian Bunk, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, x86, linux-kernel

On Tue, Oct 11, 2011 at 7:24 PM, Andrew Lutomirski <luto@mit.edu> wrote:
> What do you think of this approach?  If it seems good, I'll finish the
> patch and submit it.
>
> With this patch applied, UML appears to work, but it fills the log with
> exploit attempt warnings.  Any ideas on what to do about that?
>

I can confirm that this patch works.
And I really like vsyscall=emulate because with that UML can trap vsyscalls. :-)

-- 
Thanks,
//richard

* Re: [RFC] fixing the UML failure root cause
  2011-10-14  6:30                                         ` Andrew Lutomirski
@ 2011-10-14 20:10                                           ` Linus Torvalds
  2011-10-21 21:01                                             ` [PATCH] x86-64: Set siginfo and context on vsyscall emulation faults Andy Lutomirski
  0 siblings, 1 reply; 50+ messages in thread
From: Linus Torvalds @ 2011-10-14 20:10 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Ingo Molnar, richard -rw- weinberger, Adrian Bunk,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, x86, linux-kernel

On Fri, Oct 14, 2011 at 6:30 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>
> Another idea is to have a flag that asks the fault handlers to call
> force_sig_info for us.  That's just one bit of per-thread state.  Then
> the vsyscall emulation code could check access_ok, force a signal if
> access is not ok, then set the flag and do the syscall.  And maybe
> some processes would want to opt in to that mode anyway -- arguably
> EFAULT is a serious programmer error and should be dealt with more
> harshly than other syscall misuses.

Ok, so I really like that approach. I could easily see some process
saying "I want a SIGSEGV in addition to the EFAULT that I always get".

And yes, it would fix the vsyscall emulation code which could just
save the thread flag, set it, do the accesses, and restore it to the
old value.

Please make it so,

          Linus

* Re: [RFC] fixing the UML failure root cause
  2011-10-14 19:53                                   ` [RFC] fixing the UML failure root cause richard -rw- weinberger
@ 2011-10-14 20:17                                     ` Andrew Lutomirski
  2011-10-14 20:23                                       ` richard -rw- weinberger
  2011-10-14 22:28                                       ` richard -rw- weinberger
  0 siblings, 2 replies; 50+ messages in thread
From: Andrew Lutomirski @ 2011-10-14 20:17 UTC (permalink / raw)
  To: richard -rw- weinberger
  Cc: Ingo Molnar, Linus Torvalds, Adrian Bunk, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, x86, linux-kernel

On Fri, Oct 14, 2011 at 12:53 PM, richard -rw- weinberger
<richard.weinberger@gmail.com> wrote:
> On Tue, Oct 11, 2011 at 7:24 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>> What do you think of this approach?  If it seems good, I'll finish the
>> patch and submit it.
>>
>> With this patch applied, UML appears to work, but it fills the log with
>> exploit attempt warnings.  Any ideas on what to do about that?
>>
>
> I can confirm that this patch works.
> And I really like vsyscall=emulate because with that UML can trap vsyscalls. :-)

Are you sure you don't mean vsyscall=native?  I suspect that UML can't
actually trap vsyscalls in emulate mode right now, although that ought
to be fixable.

--Andy

* Re: [RFC] fixing the UML failure root cause
  2011-10-14 20:17                                     ` Andrew Lutomirski
@ 2011-10-14 20:23                                       ` richard -rw- weinberger
  2011-10-14 20:31                                         ` Andrew Lutomirski
  2011-10-14 22:28                                       ` richard -rw- weinberger
  1 sibling, 1 reply; 50+ messages in thread
From: richard -rw- weinberger @ 2011-10-14 20:23 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Ingo Molnar, Linus Torvalds, Adrian Bunk, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, x86, linux-kernel

On Fri, Oct 14, 2011 at 10:17 PM, Andrew Lutomirski <luto@mit.edu> wrote:
> On Fri, Oct 14, 2011 at 12:53 PM, richard -rw- weinberger
> <richard.weinberger@gmail.com> wrote:
>> On Tue, Oct 11, 2011 at 7:24 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>>> What do you think of this approach?  If it seems good, I'll finish the
>>> patch and submit it.
>>>
>>> With this patch applied, UML appears to work, but it fills the log with
>>> exploit attempt warnings.  Any ideas on what to do about that?
>>>
>>
>> I can confirm that this patch works.
>> And I really like vsyscall=emulate because with that UML can trap vsyscalls. :-)
>
> Are you sure you don't mean vsyscall=native?  I suspect that UML can't
> actually trap vsyscalls in emulate mode right now, although that ought
> to be fixable.
>

Doesn't vsyscall_emu_64.S transform any vsyscall into a real syscall?
So UML can trap it.

-- 
Thanks,
//richard

* Re: [RFC] fixing the UML failure root cause
  2011-10-14 20:23                                       ` richard -rw- weinberger
@ 2011-10-14 20:31                                         ` Andrew Lutomirski
  2011-10-14 20:39                                           ` richard -rw- weinberger
  0 siblings, 1 reply; 50+ messages in thread
From: Andrew Lutomirski @ 2011-10-14 20:31 UTC (permalink / raw)
  To: richard -rw- weinberger
  Cc: Ingo Molnar, Linus Torvalds, Adrian Bunk, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, x86, linux-kernel

On Fri, Oct 14, 2011 at 1:23 PM, richard -rw- weinberger
<richard.weinberger@gmail.com> wrote:
> On Fri, Oct 14, 2011 at 10:17 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>> On Fri, Oct 14, 2011 at 12:53 PM, richard -rw- weinberger
>> <richard.weinberger@gmail.com> wrote:
>>> On Tue, Oct 11, 2011 at 7:24 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>>>> What do you think of this approach?  If it seems good, I'll finish the
>>>> patch and submit it.
>>>>
>>>> With this patch applied, UML appears to work, but it fills the log with
>>>> exploit attempt warnings.  Any ideas on what to do about that?
>>>>
>>>
>>> I can confirm that this patch works.
>>> And I really like vsyscall=emulate because with that UML can trap vsyscalls. :-)
>>
>> Are you sure you don't mean vsyscall=native?  I suspect that UML can't
>> actually trap vsyscalls in emulate mode right now, although that ought
>> to be fixable.
>>
>
> Doesn't vsyscall_emu_64.S transform any vsyscall into a real syscall?
> So UML can trap it.

Only if that code actually executes.  In vsyscall=emulate mode, the
page is not executable and a trap is taken instead.  It's not entirely
clear what the right thing to do is wrt ptrace users.
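
When the trap is taken, the faulting address has to be mapped back to the
vsyscall that was meant. A userspace sketch of that mapping follows. It
borrows the name `addr_to_vsyscall_nr` from the diff context elsewhere in
this thread, but the body is a reconstruction and should be read as an
assumption: it relies on the vsyscall page sitting at 0xffffffffff600000
(the ip in the dmesg output that started the thread) with entry points 1024
bytes apart, numbered 0 = gettimeofday, 1 = time, 2 = getcpu.

```c
#include <errno.h>
#include <stdint.h>

/* Fixed address of the legacy vsyscall page on x86-64. */
#define VSYSCALL_START 0xffffffffff600000ULL

/*
 * Map a faulting instruction pointer to a vsyscall number.
 * Returns -EINVAL for anything that is not one of the three entry points.
 */
static int addr_to_vsyscall_nr(uint64_t addr)
{
	int nr;

	/* Only bits 10 and 11 may differ from the page base. */
	if ((addr & ~0xC00ULL) != VSYSCALL_START)
		return -EINVAL;

	nr = (int)((addr & 0xC00ULL) >> 10);
	if (nr >= 3)
		return -EINVAL;

	return nr;
}
```

A jump into the middle of an entry, or to the unused fourth slot, is
rejected rather than rounded down to the nearest entry point.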

--Andy

* Re: [RFC] fixing the UML failure root cause
  2011-10-14 20:31                                         ` Andrew Lutomirski
@ 2011-10-14 20:39                                           ` richard -rw- weinberger
  0 siblings, 0 replies; 50+ messages in thread
From: richard -rw- weinberger @ 2011-10-14 20:39 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Ingo Molnar, Linus Torvalds, Adrian Bunk, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, x86, linux-kernel

On Fri, Oct 14, 2011 at 10:31 PM, Andrew Lutomirski <luto@mit.edu> wrote:
> On Fri, Oct 14, 2011 at 1:23 PM, richard -rw- weinberger
> <richard.weinberger@gmail.com> wrote:
>> On Fri, Oct 14, 2011 at 10:17 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>>> On Fri, Oct 14, 2011 at 12:53 PM, richard -rw- weinberger
>>> <richard.weinberger@gmail.com> wrote:
>>>> On Tue, Oct 11, 2011 at 7:24 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>>>>> What do you think of this approach?  If it seems good, I'll finish the
>>>>> patch and submit it.
>>>>>
>>>>> With this patch applied, UML appears to work, but it fills the log with
>>>>> exploit attempt warnings.  Any ideas on what to do about that?
>>>>>
>>>>
>>>> I can confirm that this patch works.
>>>> And I really like vsyscall=emulate because with that UML can trap vsyscalls. :-)
>>>
>>> Are you sure you don't mean vsyscall=native?  I suspect that UML can't
>>> actually trap vsyscalls in emulate mode right now, although that ought
>>> to be fixable.
>>>
>>
>> Doesn't vsyscall_emu_64.S transform any vsyscall into a real syscall?
>> So UML can trap it.
>
> Only if that code actually executes.  In vsyscall=emulate mode, the
> page is not executable and a trap is taken instead.  It's not entirely
> clear what the right thing to do is wrt ptrace users.

Okay.
I did some tests: in vsyscall=emulate mode a statically linked program
always reports the correct time.
On kernels older than 3.1 this was not the case; there the same program
always reports the host's time...

-- 
Thanks,
//richard

* Re: [RFC] fixing the UML failure root cause
  2011-10-14 20:17                                     ` Andrew Lutomirski
  2011-10-14 20:23                                       ` richard -rw- weinberger
@ 2011-10-14 22:28                                       ` richard -rw- weinberger
  2011-10-15 16:57                                         ` Ingo Molnar
  1 sibling, 1 reply; 50+ messages in thread
From: richard -rw- weinberger @ 2011-10-14 22:28 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Ingo Molnar, Linus Torvalds, Adrian Bunk, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, x86, linux-kernel

On Fri, Oct 14, 2011 at 10:17 PM, Andrew Lutomirski <luto@mit.edu> wrote:
> On Fri, Oct 14, 2011 at 12:53 PM, richard -rw- weinberger
> <richard.weinberger@gmail.com> wrote:
>> On Tue, Oct 11, 2011 at 7:24 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>>> What do you think of this approach?  If it seems good, I'll finish the
>>> patch and submit it.
>>>
>>> With this patch applied, UML appears to work, but it fills the log with
>>> exploit attempt warnings.  Any ideas on what to do about that?
>>>
>>
>> I can confirm that this patch works.
>> And I really like vsyscall=emulate because with that UML can trap vsyscalls. :-)
>
> Are you sure you don't mean vsyscall=native?  I suspect that UML can't
> actually trap vsyscalls in emulate mode right now, although that ought
> to be fixable.
>

§/%)"&!, you are so right!
I missed that vsyscall=native is the default setting now.
Sorry for the confusion.

-- 
Thanks,
//richard

* Re: [RFC] fixing the UML failure root cause
  2011-10-14 22:28                                       ` richard -rw- weinberger
@ 2011-10-15 16:57                                         ` Ingo Molnar
  0 siblings, 0 replies; 50+ messages in thread
From: Ingo Molnar @ 2011-10-15 16:57 UTC (permalink / raw)
  To: richard -rw- weinberger
  Cc: Andrew Lutomirski, Linus Torvalds, Adrian Bunk, H. Peter Anvin,
	Thomas Gleixner, Ingo Molnar, x86, linux-kernel



* richard -rw- weinberger <richard.weinberger@gmail.com> wrote:

> On Fri, Oct 14, 2011 at 10:17 PM, Andrew Lutomirski <luto@mit.edu> wrote:
> > On Fri, Oct 14, 2011 at 12:53 PM, richard -rw- weinberger
> > <richard.weinberger@gmail.com> wrote:
> >> On Tue, Oct 11, 2011 at 7:24 PM, Andrew Lutomirski <luto@mit.edu> wrote:
> >>> What do you think of this approach?  If it seems good, I'll finish the
> >>> patch and submit it.
> >>>
> >>> With this patch applied, UML appears to work, but it fills the log with
> >>> exploit attempt warnings.  Any ideas on what to do about that?
> >>>
> >>
> >> I can confirm that this patch works.
> >> And I really like vsyscall=emulate because with that UML can trap vsyscalls. :-)
> >
> > Are you sure you don't mean vsyscall=native?  I suspect that UML can't
> > actually trap vsyscalls in emulate mode right now, although that ought
> > to be fixable.
> >
> 
> §/%)"&!, you are so right!
> I missed that vsyscall=native is the default setting now.
> Sorry for the confusion.

Switching back to vsyscall=native was just a temporary ABI fix for v3.1 
- we'd like to switch to vsyscall=emulate again ASAP (possibly in 
v3.2), once Andrew is done with the patch and everyone is happy with 
it.

Thanks,

	Ingo

* [PATCH] x86-64: Set siginfo and context on vsyscall emulation faults
  2011-10-14 20:10                                           ` Linus Torvalds
@ 2011-10-21 21:01                                             ` Andy Lutomirski
  2011-10-22  4:46                                               ` Linus Torvalds
  0 siblings, 1 reply; 50+ messages in thread
From: Andy Lutomirski @ 2011-10-21 21:01 UTC (permalink / raw)
  To: Linus Torvalds, x86
  Cc: Ingo Molnar, richard -rw- weinberger, Adrian Bunk,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	Andy Lutomirski

To make this work, we teach the page fault handler how to send
signals on failed uaccess.  This only works for user addresses
(kernel addresses will never hit the page fault handler in the
first place), so we need to generate signals for those
separately.

This gets the tricky case right: if the user buffer spans
multiple pages and only the second page is invalid, we set
cr2 and si_addr correctly.  UML relies on this behavior to
"fault in" pages as needed.

We steal a bit from thread_info.uaccess_err to enable this.
Before this change, uaccess_err was a 32-bit boolean value.
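
A small userspace sketch of why shrinking uaccess_err to one bit also
changes how it is read back. Before, the field could hold -EFAULT itself
and be OR-ed straight into the error; a one-bit field can only say "failed
or not", so the reader has to translate it. The struct and function names
below (`flags_model`, `mark_uaccess_failed`, `catch_uaccess_error`) are
hypothetical, and the sketch uses `unsigned int` bitfields, unlike the
patch, to avoid implementation-defined behaviour in plain C.

```c
#include <errno.h>

/* Hypothetical cut-down model of the two one-bit fields. */
struct flags_model {
	unsigned int sig_on_uaccess_error:1;
	unsigned int uaccess_err:1;   /* a user access failed */
};

/* Models the exception fixup: record that a user access failed. */
static void mark_uaccess_failed(struct flags_model *f)
{
	f->uaccess_err = 1;
}

/* Models the catch step: turn the one-bit flag back into an errno value. */
static int catch_uaccess_error(const struct flags_model *f, int err)
{
	return err | (f->uaccess_err ? -EFAULT : 0);
}
```

OR-ing the raw bit into `err` would yield 1 rather than -EFAULT, which is
why the translation step is needed once the field is a single bit.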

This fixes issues with UML when vsyscall=emulate.

Reported-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---

I've tested this briefly on the UML image that used to blow it.  It seems
to work.  It also passes my little sigcontext test.

 arch/x86/include/asm/thread_info.h |    3 +-
 arch/x86/include/asm/uaccess.h     |    2 +-
 arch/x86/kernel/vsyscall_64.c      |   67 +++++++++++++++++++++++++++++++----
 arch/x86/mm/extable.c              |    2 +-
 arch/x86/mm/fault.c                |   22 ++++++++---
 5 files changed, 79 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index a1fe5c1..25ebd79 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -40,7 +40,8 @@ struct thread_info {
 						*/
 	__u8			supervisor_stack[0];
 #endif
-	int			uaccess_err;
+	int			sig_on_uaccess_error:1;
+	int			uaccess_err:1;	/* uaccess failed */
 };
 
 #define INIT_THREAD_INFO(tsk)			\
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 36361bf..8be5f54 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -462,7 +462,7 @@ struct __large_struct { unsigned long buf[100]; };
 	barrier();
 
 #define uaccess_catch(err)						\
-	(err) |= current_thread_info()->uaccess_err;			\
+	(err) |= (current_thread_info()->uaccess_err ? -EFAULT : 0);	\
 	current_thread_info()->uaccess_err = prev_err;			\
 } while (0)
 
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 18ae83d..c6dd0e6 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -139,11 +139,38 @@ static int addr_to_vsyscall_nr(unsigned long addr)
 	return nr;
 }
 
+static bool write_ok_or_segv(unsigned long ptr, size_t size)
+{
+	if (ptr == 0)
+		return true;
+
+	if (!access_ok(VERIFY_WRITE, (void __user *)ptr, size)) {
+		siginfo_t info;
+		struct thread_struct *thread = &current->thread;
+
+		thread->error_code	= 6;  /* user fault, no page, write */
+		thread->cr2		= ptr;
+		thread->trap_no		= 14;
+
+		memset(&info, 0, sizeof(info));
+		info.si_signo		= SIGSEGV;
+		info.si_errno		= 0;
+		info.si_code		= SEGV_MAPERR;
+		info.si_addr		= (void __user *)ptr;
+
+		force_sig_info(SIGSEGV, &info, current);
+		return false;
+	} else {
+		return true;
+	}
+}
+
 bool emulate_vsyscall(struct pt_regs *regs, unsigned long address)
 {
 	struct task_struct *tsk;
 	unsigned long caller;
 	int vsyscall_nr;
+	int prev_sig_on_uaccess_error;
 	long ret;
 
 	/*
@@ -179,35 +206,59 @@ bool emulate_vsyscall(struct pt_regs *regs, unsigned long address)
 	if (seccomp_mode(&tsk->seccomp))
 		do_exit(SIGKILL);
 
+	/*
+	 * With a real vsyscall, page faults cause SIGSEGV.  We want to
+	 * preserve that behavior to make writing exploits harder.
+	 */
+	prev_sig_on_uaccess_error = current_thread_info()->sig_on_uaccess_error;
+	current_thread_info()->sig_on_uaccess_error = 1;
+
+	ret = -EFAULT;
 	switch (vsyscall_nr) {
 	case 0:
+		if (!write_ok_or_segv(regs->di, sizeof(struct timeval)) ||
+		    !write_ok_or_segv(regs->si, sizeof(struct timezone)))
+			break;
+
 		ret = sys_gettimeofday(
 			(struct timeval __user *)regs->di,
 			(struct timezone __user *)regs->si);
 		break;
 
 	case 1:
+		if (!write_ok_or_segv(regs->di, sizeof(time_t)))
+			break;
+
 		ret = sys_time((time_t __user *)regs->di);
 		break;
 
 	case 2:
+		if (!write_ok_or_segv(regs->di, sizeof(unsigned)) ||
+		    !write_ok_or_segv(regs->si, sizeof(unsigned)))
+			break;
+
 		ret = sys_getcpu((unsigned __user *)regs->di,
 				 (unsigned __user *)regs->si,
 				 0);
 		break;
 	}
 
+	current_thread_info()->sig_on_uaccess_error = prev_sig_on_uaccess_error;
+
 	if (ret == -EFAULT) {
-		/*
-		 * Bad news -- userspace fed a bad pointer to a vsyscall.
-		 *
-		 * With a real vsyscall, that would have caused SIGSEGV.
-		 * To make writing reliable exploits using the emulated
-		 * vsyscalls harder, generate SIGSEGV here as well.
-		 */
+		/* Bad news -- userspace fed a bad pointer to a vsyscall. */
 		warn_bad_vsyscall(KERN_INFO, regs,
 				  "vsyscall fault (exploit attempt?)");
-		goto sigsegv;
+
+		/*
+		 * If we failed to generate a signal for any reason,
+		 * generate one here.  (This should be impossible.)
+		 */
+		if (WARN_ON_ONCE(!sigismember(&tsk->pending.signal, SIGBUS) &&
+				 !sigismember(&tsk->pending.signal, SIGSEGV)))
+			goto sigsegv;
+
+		return true;  /* Don't emulate the ret. */
 	}
 
 	regs->ax = ret;
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index d0474ad..1fb85db 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -25,7 +25,7 @@ int fixup_exception(struct pt_regs *regs)
 	if (fixup) {
 		/* If fixup is less than 16, it means uaccess error */
 		if (fixup->fixup < 16) {
-			current_thread_info()->uaccess_err = -EFAULT;
+			current_thread_info()->uaccess_err = 1;
 			regs->ip += fixup->fixup;
 			return 1;
 		}
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 0d17c8c..85bec26 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -620,7 +620,7 @@ pgtable_bad(struct pt_regs *regs, unsigned long error_code,
 
 static noinline void
 no_context(struct pt_regs *regs, unsigned long error_code,
-	   unsigned long address)
+	   unsigned long address, int signal, int si_code)
 {
 	struct task_struct *tsk = current;
 	unsigned long *stackend;
@@ -628,8 +628,17 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 	int sig;
 
 	/* Are we prepared to handle this kernel fault? */
-	if (fixup_exception(regs))
+	if (fixup_exception(regs)) {
+		if (current_thread_info()->sig_on_uaccess_error && signal) {
+			tsk->thread.trap_no = 14;
+			tsk->thread.error_code = error_code | PF_USER;
+			tsk->thread.cr2 = address;
+
+			/* XXX: hwpoison faults will set the wrong code. */
+			force_sig_info_fault(signal, si_code, address, tsk, 0);
+		}
 		return;
+	}
 
 	/*
 	 * 32-bit:
@@ -749,7 +758,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 	if (is_f00f_bug(regs, address))
 		return;
 
-	no_context(regs, error_code, address);
+	no_context(regs, error_code, address, SIGSEGV, si_code);
 }
 
 static noinline void
@@ -813,7 +822,7 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
 
 	/* Kernel mode? Handle exceptions or die: */
 	if (!(error_code & PF_USER)) {
-		no_context(regs, error_code, address);
+		no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
 		return;
 	}
 
@@ -848,7 +857,7 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
 		if (!(fault & VM_FAULT_RETRY))
 			up_read(&current->mm->mmap_sem);
 		if (!(error_code & PF_USER))
-			no_context(regs, error_code, address);
+			no_context(regs, error_code, address, 0, 0);
 		return 1;
 	}
 	if (!(fault & VM_FAULT_ERROR))
@@ -858,7 +867,8 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
 		/* Kernel mode? Handle exceptions or die: */
 		if (!(error_code & PF_USER)) {
 			up_read(&current->mm->mmap_sem);
-			no_context(regs, error_code, address);
+			no_context(regs, error_code, address,
+				   SIGSEGV, SEGV_MAPERR);
 			return 1;
 		}
 
-- 
1.7.6.4


* Re: [PATCH] x86-64: Set siginfo and context on vsyscall emulation faults
  2011-10-21 21:01                                             ` [PATCH] x86-64: Set siginfo and context on vsyscall emulation faults Andy Lutomirski
@ 2011-10-22  4:46                                               ` Linus Torvalds
  2011-10-22  9:07                                                 ` Andy Lutomirski
  0 siblings, 1 reply; 50+ messages in thread
From: Linus Torvalds @ 2011-10-22  4:46 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, Ingo Molnar, richard -rw- weinberger, Adrian Bunk,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel

On Sat, Oct 22, 2011 at 12:01 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> +static bool write_ok_or_segv(unsigned long ptr, size_t size)
> +{
> +       if (ptr == 0)
> +               return true;

Why is ptr==0 special? That makes no sense.

Also, this whole function makes the notion of setting the "sigsegv on
fault" flag much less interesting. It would be much better if
access_ok() (including the cases embedded in get_user/put_user/etc)
just did it right automatically for everything, rather than
special-casing it for just this.

I wonder if we could just make access_ok() use a trap instead of just
the regular compares (and then in the trap handler do the same logic
as in the page fault handler)? Sadly, the 'bounds' instruction doesn't
work for this (in 32-bit mode it does a *signed* compare, and in
64-bit mode it no longer exists), but something like that might.

That said, I think that your patch looks acceptable as a "let's fix
vsyscalls without doing the bigger change". But I really don't see why
ptr==0 would be special.

So I think your write_ok_or_segv() function should just be

   static bool write_ok_or_segv(unsigned long ptr, size_t size)
   {
      if (access_ok(ptr, size))
         return true;

       .. send signal ...

      return false;
   }

instead of that odd thing you have now.

           Linus

* Re: [PATCH] x86-64: Set siginfo and context on vsyscall emulation faults
  2011-10-22  4:46                                               ` Linus Torvalds
@ 2011-10-22  9:07                                                 ` Andy Lutomirski
  2011-11-08  0:33                                                   ` [PATCH 0/2] Fix and re-enable vsyscall=emulate Andy Lutomirski
  0 siblings, 1 reply; 50+ messages in thread
From: Andy Lutomirski @ 2011-10-22  9:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: x86, Ingo Molnar, richard -rw- weinberger, Adrian Bunk,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel

On Fri, Oct 21, 2011 at 9:46 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sat, Oct 22, 2011 at 12:01 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> +static bool write_ok_or_segv(unsigned long ptr, size_t size)
>> +{
>> +       if (ptr == 0)
>> +               return true;
>
> Why is ptr==0 special? That makes no sense.
>

Pure laziness.  Null pointers to the vsyscalls are valid and mean that
userspace doesn't care about the result.  I could have put the check
in the caller just as easily.

> Also, this whole function makes the notion of setting the "sigsegv on
> fault" flag much less interesting. It would be much better if
> access_ok() (including the cases embedded in get_user/put_user/etc)
> just did it right automatically for everything, rather than
> special-casing it for just this.
>

Agreed.  If I add an option to let userspace opt in to the
signal-sending behavior, I'd want to convince myself that all callers
of access_ok should be affected.

> I wonder if we could just make access_ok() use a trap instead of just
> the regular compares (and then in the trap handler do the same logic
> as in the page fault handler)? Sadly, the 'bounds' instruction doesn't
> work for this (in 32-bit mode it does a *signed* compare, and in
> 64-bit mode it no longer exists), but something like that might.
>

I suspect that bounds is considerably slower than a comparison anyway.

FWIW, there's a different optimization that could make a lot of this
code much nicer: using asm goto for the failure path in get_user, etc.
 The failure path is already a branch, and if gcc could be convinced
to generate sensible code for:

if (put_user(...)) goto out;
if (put_user(...)) goto out;
if (put_user(...)) goto out;
if (put_user(...)) goto out;

then the uaccess_err mechanism and a whole lot of bitwise ors could go away.

Sadly, gcc (at least 4.5 and 4.6) has weird limitations on the kind of
constraints allowed on asm goto that, IIRC, make get_user impossible
and put_user a little dicey.  (I could have that backwards.)

> That said, I think that your patch looks acceptable as a "let's fix
> vsyscalls without doing the bigger change". But I really don't see why
> ptr==0 would be special.
>
> So I think your write_ok_or_segv() function should just be
>
>   static bool write_ok_or_segv(unsigned long ptr, size_t size)
>   {
>      if (access_ok(ptr, size))
>         return true;
>
>       .. send signal ...
>
>      return false;
>   }
>
> instead of that odd thing you have now.

Or a comment to clarify it.  Alternatively I could ignore the issue
because access to 0 is okay in the access_ok sense anyway.
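
To illustrate that last point, here is a userspace sketch of a pure range
check in the spirit of access_ok. The name `range_ok` and the constant
`SKETCH_USER_LIMIT` are hypothetical; the limit is an assumed value for the
top of user space on x86-64, and the real macro is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed upper limit of user addresses for this sketch. */
#define SKETCH_USER_LIMIT 0x00007ffffffff000ULL

/*
 * Range check only: is [ptr, ptr + size) entirely below the user limit?
 * No page tables are consulted, so unmapped user addresses still pass.
 */
static bool range_ok(uint64_t ptr, uint64_t size)
{
	return size <= SKETCH_USER_LIMIT && ptr <= SKETCH_USER_LIMIT - size;
}
```

Under this model a null pointer passes the check like any other low
address, so a helper built on it returns true for ptr == 0 with or without
an explicit special case. Kernel addresses such as the vsyscall page itself
are rejected.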

--Andy

* [PATCH 0/2] Fix and re-enable vsyscall=emulate
  2011-10-22  9:07                                                 ` Andy Lutomirski
@ 2011-11-08  0:33                                                   ` Andy Lutomirski
  2011-11-08  0:33                                                     ` [PATCH 1/2] x86-64: Set siginfo and context on vsyscall emulation faults Andy Lutomirski
                                                                       ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: Andy Lutomirski @ 2011-11-08  0:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: x86, Ingo Molnar, richard -rw- weinberger, Adrian Bunk,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	Andy Lutomirski

The really nice fix (wiring up access_ok failures to be able to raise
signals) won't be ready on time for 3.2, so let's try the simpler fix
for now.

Changes from the earlier version:
 - Clean up the odd ptr==0 check.
 - Flip the default back to vsyscall=emulate

Andy Lutomirski (2):
  x86-64: Set siginfo and context on vsyscall emulation faults
  x86: Default to vsyscall=emulate

 Documentation/kernel-parameters.txt |    7 +--
 arch/x86/include/asm/thread_info.h  |    3 +-
 arch/x86/include/asm/uaccess.h      |    2 +-
 arch/x86/kernel/vsyscall_64.c       |   77 ++++++++++++++++++++++++++++++----
 arch/x86/mm/extable.c               |    2 +-
 arch/x86/mm/fault.c                 |   22 +++++++---
 6 files changed, 91 insertions(+), 22 deletions(-)

-- 
1.7.6.4


* [PATCH 1/2] x86-64: Set siginfo and context on vsyscall emulation faults
  2011-11-08  0:33                                                   ` [PATCH 0/2] Fix and re-enable vsyscall=emulate Andy Lutomirski
@ 2011-11-08  0:33                                                     ` Andy Lutomirski
  2011-12-05 13:23                                                       ` [tip:x86/asm] " tip-bot for Andy Lutomirski
  2011-11-08  0:33                                                     ` [PATCH 2/2] x86: Default to vsyscall=emulate Andy Lutomirski
  2011-12-02 22:47                                                     ` [PATCH 0/2] Fix and re-enable vsyscall=emulate Andy Lutomirski
  2 siblings, 1 reply; 50+ messages in thread
From: Andy Lutomirski @ 2011-11-08  0:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: x86, Ingo Molnar, richard -rw- weinberger, Adrian Bunk,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	Andy Lutomirski

To make this work, we teach the page fault handler how to send
signals on failed uaccess.  This only works for user addresses
(kernel addresses will never hit the page fault handler in the
first place), so we need to generate signals for those
separately.

This gets the tricky case right: if the user buffer spans
multiple pages and only the second page is invalid, we set
cr2 and si_addr correctly.  UML relies on this behavior to
"fault in" pages as needed.

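For illustration only (not part of the patch): a minimal userspace sketch
of the native behaviour the emulation is meant to match.  It assumes Linux
with POSIX signals and anonymous mmap; the helper names first_fault_addr,
on_segv and map_two_pages_second_invalid are made up for this example.  A
buffer spans two pages and only the second is inaccessible; the address
reported in si_addr should then be the start of the second page, not the
start of the buffer.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_jmp;		/* where to resume after the fault */
static void *volatile fault_addr;	/* si_addr seen by the handler */

static void on_segv(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)ctx;
	fault_addr = info->si_addr;
	siglongjmp(fault_jmp, 1);
}

/*
 * Write len bytes starting at buf, one byte at a time.  Return the address
 * reported in si_addr if SIGSEGV is raised, or NULL if no fault occurs.
 */
static void *first_fault_addr(volatile char *buf, size_t len)
{
	struct sigaction sa, old;
	void *result = NULL;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_segv;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGSEGV, &sa, &old);

	fault_addr = NULL;
	if (sigsetjmp(fault_jmp, 1) == 0) {
		for (size_t i = 0; i < len; i++)
			buf[i] = 0;	/* faults on the first bad byte */
	} else {
		result = fault_addr;
	}

	sigaction(SIGSEGV, &old, NULL);	/* restore the previous handler */
	return result;
}

/* Map two pages and make the second one inaccessible. */
static char *map_two_pages_second_invalid(size_t *page_size)
{
	size_t ps = (size_t)sysconf(_SC_PAGESIZE);
	char *base = mmap(NULL, 2 * ps, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (base == MAP_FAILED)
		return NULL;
	if (mprotect(base + ps, ps, PROT_NONE) != 0)
		return NULL;
	*page_size = ps;
	return base;
}
```

A caller would pass a pointer a few bytes before the page boundary; a
handler that maps the missing page at si_addr and returns (instead of
jumping away) is the "fault in" pattern mentioned above.
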
We steal a bit from thread_info.uaccess_err to enable this.
Before this change, uaccess_err was a 32-bit boolean value.
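
As a side note on the one-bit fields, here is a standalone userspace sketch
(not kernel code; struct flags_sketch, set_uaccess_err and catch_err are
hypothetical names that only mirror the shape of the patch).  Whether a
plain int bit-field of width 1 is signed is implementation-defined, so a
stored 1 may read back as -1.  Testing the field for truth, as the new
uaccess_catch() does, gives the same error code either way.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the two new thread_info fields. */
struct flags_sketch {
	int sig_on_uaccess_error:1;
	int uaccess_err:1;	/* uaccess failed */
};

/* Record whether a user access failed; any nonzero input sets the bit. */
static void set_uaccess_err(struct flags_sketch *f, int failed)
{
	f->uaccess_err = (failed != 0);	/* may read back as -1, still nonzero */
}

/* Same shape as the new uaccess_catch(): turn the flag into an errno. */
static int catch_err(const struct flags_sketch *f, int err)
{
	return err | (f->uaccess_err ? -EFAULT : 0);
}
```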

This fixes issues with UML when vsyscall=emulate.

Reported-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 arch/x86/include/asm/thread_info.h |    3 +-
 arch/x86/include/asm/uaccess.h     |    2 +-
 arch/x86/kernel/vsyscall_64.c      |   75 ++++++++++++++++++++++++++++++++----
 arch/x86/mm/extable.c              |    2 +-
 arch/x86/mm/fault.c                |   22 ++++++++---
 5 files changed, 87 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index a1fe5c1..25ebd79 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -40,7 +40,8 @@ struct thread_info {
 						*/
 	__u8			supervisor_stack[0];
 #endif
-	int			uaccess_err;
+	int			sig_on_uaccess_error:1;
+	int			uaccess_err:1;	/* uaccess failed */
 };
 
 #define INIT_THREAD_INFO(tsk)			\
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 36361bf..8be5f54 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -462,7 +462,7 @@ struct __large_struct { unsigned long buf[100]; };
 	barrier();
 
 #define uaccess_catch(err)						\
-	(err) |= current_thread_info()->uaccess_err;			\
+	(err) |= (current_thread_info()->uaccess_err ? -EFAULT : 0);	\
 	current_thread_info()->uaccess_err = prev_err;			\
 } while (0)
 
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index b56c65de..9b05546 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -139,11 +139,40 @@ static int addr_to_vsyscall_nr(unsigned long addr)
 	return nr;
 }
 
+static bool write_ok_or_segv(unsigned long ptr, size_t size)
+{
+	/*
+	 * XXX: if access_ok, get_user, and put_user handled
+	 * sig_on_uaccess_error, this could go away.
+	 */
+
+	if (!access_ok(VERIFY_WRITE, (void __user *)ptr, size)) {
+		siginfo_t info;
+		struct thread_struct *thread = &current->thread;
+
+		thread->error_code	= 6;  /* user fault, no page, write */
+		thread->cr2		= ptr;
+		thread->trap_no		= 14;
+
+		memset(&info, 0, sizeof(info));
+		info.si_signo		= SIGSEGV;
+		info.si_errno		= 0;
+		info.si_code		= SEGV_MAPERR;
+		info.si_addr		= (void __user *)ptr;
+
+		force_sig_info(SIGSEGV, &info, current);
+		return false;
+	} else {
+		return true;
+	}
+}
+
 bool emulate_vsyscall(struct pt_regs *regs, unsigned long address)
 {
 	struct task_struct *tsk;
 	unsigned long caller;
 	int vsyscall_nr;
+	int prev_sig_on_uaccess_error;
 	long ret;
 
 	/*
@@ -179,35 +208,65 @@ bool emulate_vsyscall(struct pt_regs *regs, unsigned long address)
 	if (seccomp_mode(&tsk->seccomp))
 		do_exit(SIGKILL);
 
+	/*
+	 * With a real vsyscall, page faults cause SIGSEGV.  We want to
+	 * preserve that behavior to make writing exploits harder.
+	 */
+	prev_sig_on_uaccess_error = current_thread_info()->sig_on_uaccess_error;
+	current_thread_info()->sig_on_uaccess_error = 1;
+
+	/*
+	 * 0 is a valid user pointer (in the access_ok sense) on 32-bit and
+	 * 64-bit, so we don't need to special-case it here.  For all the
+	 * vsyscalls, 0 means "don't write anything" not "write it at
+	 * address 0".
+	 */
+	ret = -EFAULT;
 	switch (vsyscall_nr) {
 	case 0:
+		if (!write_ok_or_segv(regs->di, sizeof(struct timeval)) ||
+		    !write_ok_or_segv(regs->si, sizeof(struct timezone)))
+			break;
+
 		ret = sys_gettimeofday(
 			(struct timeval __user *)regs->di,
 			(struct timezone __user *)regs->si);
 		break;
 
 	case 1:
+		if (!write_ok_or_segv(regs->di, sizeof(time_t)))
+			break;
+
 		ret = sys_time((time_t __user *)regs->di);
 		break;
 
 	case 2:
+		if (!write_ok_or_segv(regs->di, sizeof(unsigned)) ||
+		    !write_ok_or_segv(regs->si, sizeof(unsigned)))
+			break;
+
 		ret = sys_getcpu((unsigned __user *)regs->di,
 				 (unsigned __user *)regs->si,
 				 0);
 		break;
 	}
 
+	current_thread_info()->sig_on_uaccess_error = prev_sig_on_uaccess_error;
+
 	if (ret == -EFAULT) {
-		/*
-		 * Bad news -- userspace fed a bad pointer to a vsyscall.
-		 *
-		 * With a real vsyscall, that would have caused SIGSEGV.
-		 * To make writing reliable exploits using the emulated
-		 * vsyscalls harder, generate SIGSEGV here as well.
-		 */
+		/* Bad news -- userspace fed a bad pointer to a vsyscall. */
 		warn_bad_vsyscall(KERN_INFO, regs,
 				  "vsyscall fault (exploit attempt?)");
-		goto sigsegv;
+
+		/*
+		 * If we failed to generate a signal for any reason,
+		 * generate one here.  (This should be impossible.)
+		 */
+		if (WARN_ON_ONCE(!sigismember(&tsk->pending.signal, SIGBUS) &&
+				 !sigismember(&tsk->pending.signal, SIGSEGV)))
+			goto sigsegv;
+
+		return true;  /* Don't emulate the ret. */
 	}
 
 	regs->ax = ret;
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index d0474ad..1fb85db 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -25,7 +25,7 @@ int fixup_exception(struct pt_regs *regs)
 	if (fixup) {
 		/* If fixup is less than 16, it means uaccess error */
 		if (fixup->fixup < 16) {
-			current_thread_info()->uaccess_err = -EFAULT;
+			current_thread_info()->uaccess_err = 1;
 			regs->ip += fixup->fixup;
 			return 1;
 		}
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 0d17c8c..85bec26 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -620,7 +620,7 @@ pgtable_bad(struct pt_regs *regs, unsigned long error_code,
 
 static noinline void
 no_context(struct pt_regs *regs, unsigned long error_code,
-	   unsigned long address)
+	   unsigned long address, int signal, int si_code)
 {
 	struct task_struct *tsk = current;
 	unsigned long *stackend;
@@ -628,8 +628,17 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 	int sig;
 
 	/* Are we prepared to handle this kernel fault? */
-	if (fixup_exception(regs))
+	if (fixup_exception(regs)) {
+		if (current_thread_info()->sig_on_uaccess_error && signal) {
+			tsk->thread.trap_no = 14;
+			tsk->thread.error_code = error_code | PF_USER;
+			tsk->thread.cr2 = address;
+
+			/* XXX: hwpoison faults will set the wrong code. */
+			force_sig_info_fault(signal, si_code, address, tsk, 0);
+		}
 		return;
+	}
 
 	/*
 	 * 32-bit:
@@ -749,7 +758,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 	if (is_f00f_bug(regs, address))
 		return;
 
-	no_context(regs, error_code, address);
+	no_context(regs, error_code, address, SIGSEGV, si_code);
 }
 
 static noinline void
@@ -813,7 +822,7 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
 
 	/* Kernel mode? Handle exceptions or die: */
 	if (!(error_code & PF_USER)) {
-		no_context(regs, error_code, address);
+		no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
 		return;
 	}
 
@@ -848,7 +857,7 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
 		if (!(fault & VM_FAULT_RETRY))
 			up_read(&current->mm->mmap_sem);
 		if (!(error_code & PF_USER))
-			no_context(regs, error_code, address);
+			no_context(regs, error_code, address, 0, 0);
 		return 1;
 	}
 	if (!(fault & VM_FAULT_ERROR))
@@ -858,7 +867,8 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
 		/* Kernel mode? Handle exceptions or die: */
 		if (!(error_code & PF_USER)) {
 			up_read(&current->mm->mmap_sem);
-			no_context(regs, error_code, address);
+			no_context(regs, error_code, address,
+				   SIGSEGV, SEGV_MAPERR);
 			return 1;
 		}
 
-- 
1.7.6.4



* [PATCH 2/2] x86: Default to vsyscall=emulate
  2011-11-08  0:33                                                   ` [PATCH 0/2] Fix and re-enable vsyscall=emulate Andy Lutomirski
  2011-11-08  0:33                                                     ` [PATCH 1/2] x86-64: Set siginfo and context on vsyscall emulation faults Andy Lutomirski
@ 2011-11-08  0:33                                                     ` Andy Lutomirski
  2011-12-05 13:24                                                       ` [tip:x86/asm] " tip-bot for Andy Lutomirski
  2011-12-02 22:47                                                     ` [PATCH 0/2] Fix and re-enable vsyscall=emulate Andy Lutomirski
  2 siblings, 1 reply; 50+ messages in thread
From: Andy Lutomirski @ 2011-11-08  0:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: x86, Ingo Molnar, richard -rw- weinberger, Adrian Bunk,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	Andy Lutomirski

This reverts commit 2b666859ec323403ac9a3a441d16eab30945404b.  The
ABI breakage should be fixed by:

commit 48c4206f5b02f28c4c78a1f5b491d3772fb64fb9
Author: Andy Lutomirski <luto@mit.edu>
Date:   Thu Oct 20 08:48:19 2011 -0700

    x86-64: Set siginfo and context on vsyscall emulation faults

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 Documentation/kernel-parameters.txt |    7 +++----
 arch/x86/kernel/vsyscall_64.c       |    2 +-
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index d6e6724..854ed5ca 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2706,11 +2706,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			functions are at fixed addresses, they make nice
 			targets for exploits that can control RIP.
 
-			emulate     Vsyscalls turn into traps and are emulated
-			            reasonably safely.
+			emulate     [default] Vsyscalls turn into traps and are
+			            emulated reasonably safely.
 
-			native      [default] Vsyscalls are native syscall
-			            instructions.
+			native      Vsyscalls are native syscall instructions.
 			            This is a little bit faster than trapping
 			            and makes a few dynamic recompilers work
 			            better than they would in emulation mode.
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 9b05546..02e980a 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -56,7 +56,7 @@ DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data) =
 	.lock = __SEQLOCK_UNLOCKED(__vsyscall_gtod_data.lock),
 };
 
-static enum { EMULATE, NATIVE, NONE } vsyscall_mode = NATIVE;
+static enum { EMULATE, NATIVE, NONE } vsyscall_mode = EMULATE;
 
 static int __init vsyscall_setup(char *str)
 {
-- 
1.7.6.4



* Re: [PATCH 0/2] Fix and re-enable vsyscall=emulate
  2011-11-08  0:33                                                   ` [PATCH 0/2] Fix and re-enable vsyscall=emulate Andy Lutomirski
  2011-11-08  0:33                                                     ` [PATCH 1/2] x86-64: Set siginfo and context on vsyscall emulation faults Andy Lutomirski
  2011-11-08  0:33                                                     ` [PATCH 2/2] x86: Default to vsyscall=emulate Andy Lutomirski
@ 2011-12-02 22:47                                                     ` Andy Lutomirski
  2011-12-05 11:18                                                       ` H. Peter Anvin
  2 siblings, 1 reply; 50+ messages in thread
From: Andy Lutomirski @ 2011-12-02 22:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: x86, Ingo Molnar, richard -rw- weinberger, Adrian Bunk,
	H. Peter Anvin, Thomas Gleixner, Ingo Molnar, linux-kernel,
	Andy Lutomirski

On Mon, Nov 7, 2011 at 4:33 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> The really nice fix (wiring up access_ok failures to be able to raise
> signals) won't be ready in time for 3.2, so let's try the simpler fix
> for now.

I spoke to hpa about this a couple days ago, and he pointed out a
problem with making access_ok send signals.  Userspace expects signals
that come with full context information to be restartable, and many
system calls are not restartable.  read() and write() are the obvious
examples: once they've processed the beginning of the buffer, unless
they adjust their parameters, they can't safely be restarted.  So
without massive changes, I think allowing access_ok to raise a signal
with full context is asking for trouble.

I can still do the patch with two modes: signals without context via
arch_prctl and signals with context via vsyscall emulation, but that's
probably overkill for fixing this bug.  I'd say just apply these
patches as is (for 3.3).
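
To make the restartability point concrete, a small userspace sketch
(write_all is a hypothetical helper; it assumes only POSIX write()).  Once
part of the buffer has been consumed, a retry has to continue from the
advanced pointer with the reduced length.  Reissuing the call with the
original arguments would send the first bytes twice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write all len bytes of buf to fd.  After a short write, retry with the
 * pointer advanced and the length reduced.
 * Returns 0 on success, -1 on error (errno is left as set by write()).
 */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* nothing consumed: same args are safe */
			return -1;
		}
		buf += n;			/* skip what was already written */
		len -= (size_t)n;
	}
	return 0;
}
```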

--Andy


* Re: [PATCH 0/2] Fix and re-enable vsyscall=emulate
  2011-12-02 22:47                                                     ` [PATCH 0/2] Fix and re-enable vsyscall=emulate Andy Lutomirski
@ 2011-12-05 11:18                                                       ` H. Peter Anvin
  0 siblings, 0 replies; 50+ messages in thread
From: H. Peter Anvin @ 2011-12-05 11:18 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, x86, Ingo Molnar, richard -rw- weinberger,
	Adrian Bunk, H. Peter Anvin, Thomas Gleixner, Ingo Molnar,
	linux-kernel

On 12/02/2011 02:47 PM, Andy Lutomirski wrote:
> On Mon, Nov 7, 2011 at 4:33 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> The really nice fix (wiring up access_ok failures to be able to raise
>> signals) won't be ready in time for 3.2, so let's try the simpler fix
>> for now.
> 
> I spoke to hpa about this a couple days ago, and he pointed out a
> problem with making access_ok send signals.  Userspace expects signals
> that come with full context information to be restartable, and many
> system calls are not restartable.  read() and write() are the obvious
> examples: once they've processed the beginning of the buffer, unless
> they adjust their parameters, they can't safely be restarted.  So
> without massive changes, I think allowing access_ok to raise a signal
> with full context is asking for trouble.
> 
> I can still do the patch with two modes: signals without context via
> arch_prctl and signals with context via vsyscall emulation, but that's
> probably overkill for fixing this bug.  I'd say just apply these
> patches as is (for 3.3).
> 

It's somewhat questionable if the "return -EFAULT and deliver SIGSEGV"
semantic resolves the problem; obviously the signal handler isn't
restartable, but returning from the signal handler will at least cause
the application to see the EFAULT and not try to restart a system call
in a way that is likely to cause massive failure.  If the handler is
aware of what needs to be done then it can correct the situation and
restart the system call -- but it would have to have detailed
information about the state before the system call.

I am also concerned about information leaks from the kernel.  The
existing kernel paths are not necessarily designed to be robust against
giving out additional error information.  This may be a theoretical
concern, but there have been real security holes in the past from these
kinds of changes.

	-hpa


* [tip:x86/asm] x86-64: Set siginfo and context on vsyscall emulation faults
  2011-11-08  0:33                                                     ` [PATCH 1/2] x86-64: Set siginfo and context on vsyscall emulation faults Andy Lutomirski
@ 2011-12-05 13:23                                                       ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 50+ messages in thread
From: tip-bot for Andy Lutomirski @ 2011-12-05 13:23 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, bunk, luto,
	richard.weinberger, tglx, hpa, mingo

Commit-ID:  4fc3490114bb159bd4fff1b3c96f4320fe6fb08f
Gitweb:     http://git.kernel.org/tip/4fc3490114bb159bd4fff1b3c96f4320fe6fb08f
Author:     Andy Lutomirski <luto@amacapital.net>
AuthorDate: Mon, 7 Nov 2011 16:33:40 -0800
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 5 Dec 2011 12:17:27 +0100

x86-64: Set siginfo and context on vsyscall emulation faults

To make this work, we teach the page fault handler how to send
signals on failed uaccess.  This only works for user addresses
(kernel addresses will never hit the page fault handler in the
first place), so we need to generate signals for those
separately.

This gets the tricky case right: if the user buffer spans
multiple pages and only the second page is invalid, we set
cr2 and si_addr correctly.  UML relies on this behavior to
"fault in" pages as needed.

We steal a bit from thread_info.uaccess_err to enable this.
Before this change, uaccess_err was a 32-bit boolean value.

This fixes issues with UML when vsyscall=emulate.

Reported-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/4c8f91de7ec5cd2ef0f59521a04e1015f11e42b4.1320712291.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/include/asm/thread_info.h |    3 +-
 arch/x86/include/asm/uaccess.h     |    2 +-
 arch/x86/kernel/vsyscall_64.c      |   75 ++++++++++++++++++++++++++++++++----
 arch/x86/mm/extable.c              |    2 +-
 arch/x86/mm/fault.c                |   22 ++++++++---
 5 files changed, 87 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index a1fe5c1..25ebd79 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -40,7 +40,8 @@ struct thread_info {
 						*/
 	__u8			supervisor_stack[0];
 #endif
-	int			uaccess_err;
+	int			sig_on_uaccess_error:1;
+	int			uaccess_err:1;	/* uaccess failed */
 };
 
 #define INIT_THREAD_INFO(tsk)			\
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 36361bf..8be5f54 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -462,7 +462,7 @@ struct __large_struct { unsigned long buf[100]; };
 	barrier();
 
 #define uaccess_catch(err)						\
-	(err) |= current_thread_info()->uaccess_err;			\
+	(err) |= (current_thread_info()->uaccess_err ? -EFAULT : 0);	\
 	current_thread_info()->uaccess_err = prev_err;			\
 } while (0)
 
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index e4d4a22..8084bec 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -140,11 +140,40 @@ static int addr_to_vsyscall_nr(unsigned long addr)
 	return nr;
 }
 
+static bool write_ok_or_segv(unsigned long ptr, size_t size)
+{
+	/*
+	 * XXX: if access_ok, get_user, and put_user handled
+	 * sig_on_uaccess_error, this could go away.
+	 */
+
+	if (!access_ok(VERIFY_WRITE, (void __user *)ptr, size)) {
+		siginfo_t info;
+		struct thread_struct *thread = &current->thread;
+
+		thread->error_code	= 6;  /* user fault, no page, write */
+		thread->cr2		= ptr;
+		thread->trap_no		= 14;
+
+		memset(&info, 0, sizeof(info));
+		info.si_signo		= SIGSEGV;
+		info.si_errno		= 0;
+		info.si_code		= SEGV_MAPERR;
+		info.si_addr		= (void __user *)ptr;
+
+		force_sig_info(SIGSEGV, &info, current);
+		return false;
+	} else {
+		return true;
+	}
+}
+
 bool emulate_vsyscall(struct pt_regs *regs, unsigned long address)
 {
 	struct task_struct *tsk;
 	unsigned long caller;
 	int vsyscall_nr;
+	int prev_sig_on_uaccess_error;
 	long ret;
 
 	/*
@@ -180,35 +209,65 @@ bool emulate_vsyscall(struct pt_regs *regs, unsigned long address)
 	if (seccomp_mode(&tsk->seccomp))
 		do_exit(SIGKILL);
 
+	/*
+	 * With a real vsyscall, page faults cause SIGSEGV.  We want to
+	 * preserve that behavior to make writing exploits harder.
+	 */
+	prev_sig_on_uaccess_error = current_thread_info()->sig_on_uaccess_error;
+	current_thread_info()->sig_on_uaccess_error = 1;
+
+	/*
+	 * 0 is a valid user pointer (in the access_ok sense) on 32-bit and
+	 * 64-bit, so we don't need to special-case it here.  For all the
+	 * vsyscalls, 0 means "don't write anything" not "write it at
+	 * address 0".
+	 */
+	ret = -EFAULT;
 	switch (vsyscall_nr) {
 	case 0:
+		if (!write_ok_or_segv(regs->di, sizeof(struct timeval)) ||
+		    !write_ok_or_segv(regs->si, sizeof(struct timezone)))
+			break;
+
 		ret = sys_gettimeofday(
 			(struct timeval __user *)regs->di,
 			(struct timezone __user *)regs->si);
 		break;
 
 	case 1:
+		if (!write_ok_or_segv(regs->di, sizeof(time_t)))
+			break;
+
 		ret = sys_time((time_t __user *)regs->di);
 		break;
 
 	case 2:
+		if (!write_ok_or_segv(regs->di, sizeof(unsigned)) ||
+		    !write_ok_or_segv(regs->si, sizeof(unsigned)))
+			break;
+
 		ret = sys_getcpu((unsigned __user *)regs->di,
 				 (unsigned __user *)regs->si,
 				 0);
 		break;
 	}
 
+	current_thread_info()->sig_on_uaccess_error = prev_sig_on_uaccess_error;
+
 	if (ret == -EFAULT) {
-		/*
-		 * Bad news -- userspace fed a bad pointer to a vsyscall.
-		 *
-		 * With a real vsyscall, that would have caused SIGSEGV.
-		 * To make writing reliable exploits using the emulated
-		 * vsyscalls harder, generate SIGSEGV here as well.
-		 */
+		/* Bad news -- userspace fed a bad pointer to a vsyscall. */
 		warn_bad_vsyscall(KERN_INFO, regs,
 				  "vsyscall fault (exploit attempt?)");
-		goto sigsegv;
+
+		/*
+		 * If we failed to generate a signal for any reason,
+		 * generate one here.  (This should be impossible.)
+		 */
+		if (WARN_ON_ONCE(!sigismember(&tsk->pending.signal, SIGBUS) &&
+				 !sigismember(&tsk->pending.signal, SIGSEGV)))
+			goto sigsegv;
+
+		return true;  /* Don't emulate the ret. */
 	}
 
 	regs->ax = ret;
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index d0474ad..1fb85db 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -25,7 +25,7 @@ int fixup_exception(struct pt_regs *regs)
 	if (fixup) {
 		/* If fixup is less than 16, it means uaccess error */
 		if (fixup->fixup < 16) {
-			current_thread_info()->uaccess_err = -EFAULT;
+			current_thread_info()->uaccess_err = 1;
 			regs->ip += fixup->fixup;
 			return 1;
 		}
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 5db0490..9d74824 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -626,7 +626,7 @@ pgtable_bad(struct pt_regs *regs, unsigned long error_code,
 
 static noinline void
 no_context(struct pt_regs *regs, unsigned long error_code,
-	   unsigned long address)
+	   unsigned long address, int signal, int si_code)
 {
 	struct task_struct *tsk = current;
 	unsigned long *stackend;
@@ -634,8 +634,17 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 	int sig;
 
 	/* Are we prepared to handle this kernel fault? */
-	if (fixup_exception(regs))
+	if (fixup_exception(regs)) {
+		if (current_thread_info()->sig_on_uaccess_error && signal) {
+			tsk->thread.trap_no = 14;
+			tsk->thread.error_code = error_code | PF_USER;
+			tsk->thread.cr2 = address;
+
+			/* XXX: hwpoison faults will set the wrong code. */
+			force_sig_info_fault(signal, si_code, address, tsk, 0);
+		}
 		return;
+	}
 
 	/*
 	 * 32-bit:
@@ -755,7 +764,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 	if (is_f00f_bug(regs, address))
 		return;
 
-	no_context(regs, error_code, address);
+	no_context(regs, error_code, address, SIGSEGV, si_code);
 }
 
 static noinline void
@@ -819,7 +828,7 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
 
 	/* Kernel mode? Handle exceptions or die: */
 	if (!(error_code & PF_USER)) {
-		no_context(regs, error_code, address);
+		no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
 		return;
 	}
 
@@ -854,7 +863,7 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
 		if (!(fault & VM_FAULT_RETRY))
 			up_read(&current->mm->mmap_sem);
 		if (!(error_code & PF_USER))
-			no_context(regs, error_code, address);
+			no_context(regs, error_code, address, 0, 0);
 		return 1;
 	}
 	if (!(fault & VM_FAULT_ERROR))
@@ -864,7 +873,8 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
 		/* Kernel mode? Handle exceptions or die: */
 		if (!(error_code & PF_USER)) {
 			up_read(&current->mm->mmap_sem);
-			no_context(regs, error_code, address);
+			no_context(regs, error_code, address,
+				   SIGSEGV, SEGV_MAPERR);
 			return 1;
 		}
 


* [tip:x86/asm] x86: Default to vsyscall=emulate
  2011-11-08  0:33                                                     ` [PATCH 2/2] x86: Default to vsyscall=emulate Andy Lutomirski
@ 2011-12-05 13:24                                                       ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 50+ messages in thread
From: tip-bot for Andy Lutomirski @ 2011-12-05 13:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, bunk, luto,
	richard.weinberger, tglx, hpa, mingo

Commit-ID:  2e57ae0515124af45dd889bfbd4840fd40fcc07d
Gitweb:     http://git.kernel.org/tip/2e57ae0515124af45dd889bfbd4840fd40fcc07d
Author:     Andy Lutomirski <luto@amacapital.net>
AuthorDate: Mon, 7 Nov 2011 16:33:41 -0800
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 5 Dec 2011 12:17:29 +0100

x86: Default to vsyscall=emulate

This essentially reverts:

  2b666859ec32: x86: Default to vsyscall=native for now

The ABI breakage should now be fixed by:

 commit 48c4206f5b02f28c4c78a1f5b491d3772fb64fb9
 Author: Andy Lutomirski <luto@mit.edu>
 Date:   Thu Oct 20 08:48:19 2011 -0700

    x86-64: Set siginfo and context on vsyscall emulation faults

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Cc: Adrian Bunk <bunk@stusta.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/93154af3b2b6d208906ae02d80d92cf60c6fa94f.1320712291.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 Documentation/kernel-parameters.txt |    7 +++----
 arch/x86/kernel/vsyscall_64.c       |    2 +-
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index a0c5c5f..ce7fc8b 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2750,11 +2750,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			functions are at fixed addresses, they make nice
 			targets for exploits that can control RIP.
 
-			emulate     Vsyscalls turn into traps and are emulated
-			            reasonably safely.
+			emulate     [default] Vsyscalls turn into traps and are
+			            emulated reasonably safely.
 
-			native      [default] Vsyscalls are native syscall
-			            instructions.
+			native      Vsyscalls are native syscall instructions.
 			            This is a little bit faster than trapping
 			            and makes a few dynamic recompilers work
 			            better than they would in emulation mode.
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 8084bec..b07ba93 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -57,7 +57,7 @@ DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data) =
 	.lock = __SEQLOCK_UNLOCKED(__vsyscall_gtod_data.lock),
 };
 
-static enum { EMULATE, NATIVE, NONE } vsyscall_mode = NATIVE;
+static enum { EMULATE, NATIVE, NONE } vsyscall_mode = EMULATE;
 
 static int __init vsyscall_setup(char *str)
 {


* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-05 22:01 ` Thomas Gleixner
@ 2011-10-09 13:45   ` Adrian Bunk
  0 siblings, 0 replies; 50+ messages in thread
From: Adrian Bunk @ 2011-10-09 13:45 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andrew Lutomirski, H. Peter Anvin, Ingo Molnar, x86, LKML,
	Andrew Morton, Linus Torvalds, Arjan van de Ven

On Thu, Oct 06, 2011 at 12:01:44AM +0200, Thomas Gleixner wrote:
>...
> We might need better dmesg output, e.g.
> 
>    printk_once("you might run something which requires
>    		vsyscall=native, but be aware that you are
> 		opening a security hole. See Documentation/....")
> 
> That's fine, but making the defaults insecure is just ass backwards.

Better dmesg output is a good idea in any case; a patch is coming.

I stayed with warn_bad_vsyscall() instead of printk_once() for
the following reasons:
- _once is bad for something that might indicate exploit attempts,
  warn_bad_vsyscall() is already ratelimited
- the name and pid of the process should be shown
- the additional output of warn_bad_vsyscall() can help determine
  what caused it
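
The difference between the two policies can be sketched in plain C.  This
is a userspace illustration only: once_state, rl_state and the two helper
functions are hypothetical names, and the kernel's own ratelimiting code is
not reproduced here.  A once-only gate reports the first event and then
stays silent, while an interval/burst limiter keeps reporting later events
at a bounded rate.

```c
#include <assert.h>
#include <stdbool.h>

/* Once-only gate: true for the first call, false forever after. */
struct once_state {
	bool fired;
};

static bool once_allow(struct once_state *s)
{
	if (s->fired)
		return false;
	s->fired = true;
	return true;
}

/* Allow at most burst events per window of interval time units. */
struct rl_state {
	long interval;
	int burst;
	long window_start;
	int printed;
};

static bool rl_allow(struct rl_state *s, long now)
{
	if (now - s->window_start >= s->interval) {
		s->window_start = now;	/* new window: reset the budget */
		s->printed = 0;
	}
	if (s->printed >= s->burst)
		return false;
	s->printed++;
	return true;
}
```

With the limiter, a second exploit attempt in a later window still shows up
in the log; with the once-only gate it would not.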

> Thanks,
> 
> 	tglx

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: [3.1 patch] x86: default to vsyscall=native
  2011-10-05 21:40 Adrian Bunk
@ 2011-10-05 22:01 ` Thomas Gleixner
  2011-10-09 13:45   ` Adrian Bunk
  0 siblings, 1 reply; 50+ messages in thread
From: Thomas Gleixner @ 2011-10-05 22:01 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Andrew Lutomirski, H. Peter Anvin, Ingo Molnar, x86, LKML,
	Andrew Morton, Linus Torvalds, Arjan van de Ven

On Thu, 6 Oct 2011, Adrian Bunk wrote:

> After upgrading a kernel the existing userspace should just work
> (assuming it did work before ;-) ), but when I upgraded my kernel
> from 3.0.4 to 3.1.0-rc8 a UML instance didn't come up properly.
> 
> dmesg said:
>   linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790
>   linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790
> 
> Looking through the changelog I ended up at commit 3ae36655
> ("x86-64: Rework vsyscall emulation and add vsyscall= parameter").
> 
> Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default to 
> vsyscall=native.
> 
> That sounds reasonable to me, and fixes the problem for me.

NAK.

We have listened for way too long to people who insisted that we keep
all known security holes open by default for the sake of backwards
compatibility.

The default should be restrictive, not the other way round. Forcing
people to loosen restrictions makes them aware of the problem. Not
doing so keeps them in the illusion that stuff is just safe to use.

We might need better dmesg output, e.g.

   printk_once("you might run something which requires
   		vsyscall=native, but be aware that you are
		opening a security hole. See Documentation/....")

That's fine, but making the defaults insecure is just ass backwards.

Thanks,

	tglx


* [3.1 patch] x86: default to vsyscall=native
@ 2011-10-05 21:40 Adrian Bunk
  2011-10-05 22:01 ` Thomas Gleixner
  0 siblings, 1 reply; 50+ messages in thread
From: Adrian Bunk @ 2011-10-05 21:40 UTC (permalink / raw)
  To: Andrew Lutomirski, H. Peter Anvin
  Cc: Thomas Gleixner, Ingo Molnar, x86, linux-kernel, Andrew Morton

After upgrading a kernel the existing userspace should just work
(assuming it did work before ;-) ), but when I upgraded my kernel
from 3.0.4 to 3.1.0-rc8 a UML instance didn't come up properly.

dmesg said:
  linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790
  linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790

Looking through the changelog I ended up at commit 3ae36655
("x86-64: Rework vsyscall emulation and add vsyscall= parameter").

Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default to 
vsyscall=native.

That sounds reasonable to me, and fixes the problem for me.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Andrew Lutomirski <luto@mit.edu>
---
 Documentation/kernel-parameters.txt |    7 ++++---
 arch/x86/kernel/vsyscall_64.c       |    2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 854ed5ca..d6e6724 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2706,10 +2706,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			functions are at fixed addresses, they make nice
 			targets for exploits that can control RIP.
 
-			emulate     [default] Vsyscalls turn into traps and are
-			            emulated reasonably safely.
+			emulate     Vsyscalls turn into traps and are emulated
+			            reasonably safely.
 
-			native      Vsyscalls are native syscall instructions.
+			native      [default] Vsyscalls are native syscall
+			            instructions.
 			            This is a little bit faster than trapping
 			            and makes a few dynamic recompilers work
 			            better than they would in emulation mode.
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index 18ae83d..b56c65de 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -56,7 +56,7 @@ DEFINE_VVAR(struct vsyscall_gtod_data, vsyscall_gtod_data) =
 	.lock = __SEQLOCK_UNLOCKED(__vsyscall_gtod_data.lock),
 };
 
-static enum { EMULATE, NATIVE, NONE } vsyscall_mode = EMULATE;
+static enum { EMULATE, NATIVE, NONE } vsyscall_mode = NATIVE;
 
 static int __init vsyscall_setup(char *str)
 {
-- 
1.7.6.3
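
[Editor's illustration, not part of the original patch.] The one-line change above only matters when no vsyscall= option is given on the kernel command line. A minimal userspace sketch of that "option overrides default" logic might look like the following. The helper parse_vsyscall() and the VSYSCALL_DEFAULT macro are hypothetical names, and accepting the string "none" is an assumption based on the NONE enumerator; this is not the kernel's vsyscall_setup() itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same three modes as the enum touched by the patch. */
enum vsyscall_mode { EMULATE, NATIVE, NONE };

/* Default proposed by the patch; the code it replaces used EMULATE. */
#define VSYSCALL_DEFAULT NATIVE

/*
 * Hypothetical helper: map the value of a "vsyscall=" boot option to a
 * mode. Returns 0 on success and -1 for an unknown value, in which case
 * *mode is left untouched. A NULL str means the option was absent, so
 * the compiled-in default applies.
 */
static int parse_vsyscall(const char *str, enum vsyscall_mode *mode)
{
	assert(mode != NULL);

	if (str == NULL) {
		*mode = VSYSCALL_DEFAULT;
		return 0;
	}
	if (strcmp(str, "emulate") == 0)
		*mode = EMULATE;
	else if (strcmp(str, "native") == 0)
		*mode = NATIVE;
	else if (strcmp(str, "none") == 0)
		*mode = NONE;	/* assumed spelling, taken from the enumerator */
	else
		return -1;
	return 0;
}
```

With this sketch, parse_vsyscall(NULL, &m) yields NATIVE under the proposed default, while an explicit "emulate" still selects emulation, so the disagreement in this thread is purely about which value users get when they do nothing.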


Thread overview: 50+ messages
2011-10-03  9:08 [3.1 patch] x86: default to vsyscall=native Adrian Bunk
2011-10-03 13:04 ` Andrew Lutomirski
2011-10-03 17:33   ` Adrian Bunk
2011-10-03 18:06     ` Andrew Lutomirski
2011-10-03 18:41       ` Adrian Bunk
2011-10-05 22:13     ` Andrew Lutomirski
2011-10-05 22:22       ` richard -rw- weinberger
2011-10-05 22:30         ` Adrian Bunk
2011-10-05 22:41           ` richard -rw- weinberger
2011-10-05 22:46           ` Andrew Lutomirski
2011-10-05 23:36             ` Andrew Lutomirski
2011-10-06  3:06               ` Andrew Lutomirski
2011-10-06 12:12                 ` richard -rw- weinberger
2011-10-06 15:37                 ` richard -rw- weinberger
2011-10-06 18:16                   ` Andrew Lutomirski
2011-10-06 18:34                     ` Linus Torvalds
2011-10-07  0:48                       ` Andrew Lutomirski
2011-10-10 11:19                         ` richard -rw- weinberger
2011-10-10 11:48                           ` Ingo Molnar
2011-10-10 15:31                             ` Andrew Lutomirski
2011-10-11  6:22                               ` Ingo Molnar
2011-10-11 17:24                                 ` [RFC] fixing the UML failure root cause Andrew Lutomirski
2011-10-13  6:19                                   ` Linus Torvalds
2011-10-13  8:40                                     ` Andrew Lutomirski
2011-10-14  4:46                                       ` Linus Torvalds
2011-10-14  6:30                                         ` Andrew Lutomirski
2011-10-14 20:10                                           ` Linus Torvalds
2011-10-21 21:01                                             ` [PATCH] x86-64: Set siginfo and context on vsyscall emulation faults Andy Lutomirski
2011-10-22  4:46                                               ` Linus Torvalds
2011-10-22  9:07                                                 ` Andy Lutomirski
2011-11-08  0:33                                                   ` [PATCH 0/2] Fix and re-enable vsyscall=emulate Andy Lutomirski
2011-11-08  0:33                                                     ` [PATCH 1/2] x86-64: Set siginfo and context on vsyscall emulation faults Andy Lutomirski
2011-12-05 13:23                                                       ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2011-11-08  0:33                                                     ` [PATCH 2/2] x86: Default to vsyscall=emulate Andy Lutomirski
2011-12-05 13:24                                                       ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2011-12-02 22:47                                                     ` [PATCH 0/2] Fix and re-enable vsyscall=emulate Andy Lutomirski
2011-12-05 11:18                                                       ` H. Peter Anvin
2011-10-14 19:53                                   ` [RFC] fixing the UML failure root cause richard -rw- weinberger
2011-10-14 20:17                                     ` Andrew Lutomirski
2011-10-14 20:23                                       ` richard -rw- weinberger
2011-10-14 20:31                                         ` Andrew Lutomirski
2011-10-14 20:39                                           ` richard -rw- weinberger
2011-10-14 22:28                                       ` richard -rw- weinberger
2011-10-15 16:57                                         ` Ingo Molnar
2011-10-05 22:24       ` [3.1 patch] x86: default to vsyscall=native Adrian Bunk
2011-10-03 13:19 ` richard -rw- weinberger
2011-10-03 17:46   ` Adrian Bunk
2011-10-05 21:40 Adrian Bunk
2011-10-05 22:01 ` Thomas Gleixner
2011-10-09 13:45   ` Adrian Bunk
