* [Qemu-devel] [PATCH 0/9] Assembly coroutine backend and x86 CET support
@ 2019-05-04 12:05 ` Paolo Bonzini
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2019-05-04 12:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, peter.maydell, cohuck, richard.henderson

*** BLURB HERE ***

Paolo Bonzini (10):
  qemugdb: allow adding support for other coroutine backends
  qemugdb: allow adding support for other architectures
  coroutine: add host specific coroutine backend for 64-bit x86
  coroutine: add host specific coroutine backend for 64-bit ARM
  coroutine: add host specific coroutine backend for 64-bit s390
  configure: add control-flow protection support
  tcg: add tcg_out_start
  tcg/i386: add support for IBT
  linux-user: add IBT support to x86 safe-syscall.S
  coroutine-asm: add x86 CET shadow stack support

 Makefile.target                           |   5 +
 configure                                 |  62 ++++
 include/qemu/cpuid.h                      |   5 +
 linux-user/host/i386/safe-syscall.inc.S   |  19 ++
 linux-user/host/x86_64/safe-syscall.inc.S |  19 ++
 scripts/qemugdb/coroutine.py              | 107 ++----
 scripts/qemugdb/coroutine_asm.py          |  24 ++
 scripts/qemugdb/coroutine_ucontext.py     |  69 ++++
 tcg/aarch64/tcg-target.inc.c              |   4 +
 tcg/arm/tcg-target.inc.c                  |   4 +
 tcg/i386/tcg-target.inc.c                 |  23 ++
 tcg/mips/tcg-target.inc.c                 |   4 +
 tcg/ppc/tcg-target.inc.c                  |   4 +
 tcg/riscv/tcg-target.inc.c                |   4 +
 tcg/s390/tcg-target.inc.c                 |   4 +
 tcg/sparc/tcg-target.inc.c                |   4 +
 tcg/tcg.c                                 |   2 +
 tcg/tci/tcg-target.inc.c                  |   4 +
 util/Makefile.objs                        |  10 +
 util/coroutine-asm.c                      | 387 ++++++++++++++++++++++
 20 files changed, 689 insertions(+), 75 deletions(-)
 create mode 100644 scripts/qemugdb/coroutine_asm.py
 create mode 100644 scripts/qemugdb/coroutine_ucontext.py
 create mode 100644 util/coroutine-asm.c

-- 
2.21.0


* [Qemu-devel] [PATCH 1/9] qemugdb: allow adding support for other coroutine backends
@ 2019-05-04 12:05   ` Paolo Bonzini
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2019-05-04 12:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, peter.maydell, cohuck, richard.henderson

Split the jmpbuf access out into a separate module and dispatch based
on which CoroutineXYZ type is defined.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 scripts/qemugdb/coroutine.py          | 106 ++++++++------------------
 scripts/qemugdb/coroutine_ucontext.py |  69 +++++++++++++++++
 2 files changed, 100 insertions(+), 75 deletions(-)
 create mode 100644 scripts/qemugdb/coroutine_ucontext.py

diff --git a/scripts/qemugdb/coroutine.py b/scripts/qemugdb/coroutine.py
index 41e079d0e2..db2753d949 100644
--- a/scripts/qemugdb/coroutine.py
+++ b/scripts/qemugdb/coroutine.py
@@ -1,6 +1,6 @@
 #!/usr/bin/python
 
-# GDB debugging support
+# GDB debugging support, coroutine dispatch
 #
 # Copyright 2012 Red Hat, Inc. and/or its affiliates
 #
@@ -10,82 +10,25 @@
 # This work is licensed under the terms of the GNU GPL, version 2
 # or later.  See the COPYING file in the top-level directory.
 
+from . import coroutine_ucontext
 import gdb
 
 VOID_PTR = gdb.lookup_type('void').pointer()
+UINTPTR_T = gdb.lookup_type('uintptr_t')
 
-def get_fs_base():
-    '''Fetch %fs base value using arch_prctl(ARCH_GET_FS).  This is
-       pthread_self().'''
-    # %rsp - 120 is scratch space according to the SystemV ABI
-    old = gdb.parse_and_eval('*(uint64_t*)($rsp - 120)')
-    gdb.execute('call (int)arch_prctl(0x1003, $rsp - 120)', False, True)
-    fs_base = gdb.parse_and_eval('*(uint64_t*)($rsp - 120)')
-    gdb.execute('set *(uint64_t*)($rsp - 120) = %s' % old, False, True)
-    return fs_base
-
-def pthread_self():
-    '''Fetch pthread_self() from the glibc start_thread function.'''
-    f = gdb.newest_frame()
-    while f.name() != 'start_thread':
-        f = f.older()
-        if f is None:
-            return get_fs_base()
-
-    try:
-        return f.read_var("arg")
-    except ValueError:
-        return get_fs_base()
-
-def get_glibc_pointer_guard():
-    '''Fetch glibc pointer guard value'''
-    fs_base = pthread_self()
-    return gdb.parse_and_eval('*(uint64_t*)((uint64_t)%s + 0x30)' % fs_base)
-
-def glibc_ptr_demangle(val, pointer_guard):
-    '''Undo effect of glibc's PTR_MANGLE()'''
-    return gdb.parse_and_eval('(((uint64_t)%s >> 0x11) | ((uint64_t)%s << (64 - 0x11))) ^ (uint64_t)%s' % (val, val, pointer_guard))
-
-def get_jmpbuf_regs(jmpbuf):
-    JB_RBX  = 0
-    JB_RBP  = 1
-    JB_R12  = 2
-    JB_R13  = 3
-    JB_R14  = 4
-    JB_R15  = 5
-    JB_RSP  = 6
-    JB_PC   = 7
-
-    pointer_guard = get_glibc_pointer_guard()
-    return {'rbx': jmpbuf[JB_RBX],
-        'rbp': glibc_ptr_demangle(jmpbuf[JB_RBP], pointer_guard),
-        'rsp': glibc_ptr_demangle(jmpbuf[JB_RSP], pointer_guard),
-        'r12': jmpbuf[JB_R12],
-        'r13': jmpbuf[JB_R13],
-        'r14': jmpbuf[JB_R14],
-        'r15': jmpbuf[JB_R15],
-        'rip': glibc_ptr_demangle(jmpbuf[JB_PC], pointer_guard) }
-
-def bt_jmpbuf(jmpbuf):
-    '''Backtrace a jmpbuf'''
-    regs = get_jmpbuf_regs(jmpbuf)
-    old = dict()
-
-    for i in regs:
-        old[i] = gdb.parse_and_eval('(uint64_t)$%s' % i)
-
-    for i in regs:
-        gdb.execute('set $%s = %s' % (i, regs[i]))
-
-    gdb.execute('bt')
-
-    for i in regs:
-        gdb.execute('set $%s = %s' % (i, old[i]))
-
-def coroutine_to_jmpbuf(co):
-    coroutine_pointer = co.cast(gdb.lookup_type('CoroutineUContext').pointer())
-    return coroutine_pointer['env']['__jmpbuf']
+backends = {
+    'CoroutineUContext': coroutine_ucontext
+}
 
+def coroutine_backend():
+    for k, v in backends.items():
+        try:
+            gdb.lookup_type(k)
+        except:
+            continue
+        return v
+
+    raise Exception('could not find coroutine backend')
 
 class CoroutineCommand(gdb.Command):
     '''Display coroutine backtrace'''
@@ -99,18 +42,31 @@ class CoroutineCommand(gdb.Command):
             gdb.write('usage: qemu coroutine <coroutine-pointer>\n')
             return
 
-        bt_jmpbuf(coroutine_to_jmpbuf(gdb.parse_and_eval(argv[0])))
+        addr = gdb.parse_and_eval(argv[0])
+        regs = coroutine_backend().get_coroutine_regs(addr)
+        old = dict()
+
+        for i in regs:
+            old[i] = gdb.parse_and_eval('(uint64_t)$%s' % i)
+
+        for i in regs:
+            gdb.execute('set $%s = %s' % (i, regs[i].cast(UINTPTR_T)))
+
+        gdb.execute('bt')
+
+        for i in regs:
+            gdb.execute('set $%s = %s' % (i, old[i].cast(UINTPTR_T)))
 
 class CoroutineSPFunction(gdb.Function):
     def __init__(self):
         gdb.Function.__init__(self, 'qemu_coroutine_sp')
 
     def invoke(self, addr):
-        return get_jmpbuf_regs(coroutine_to_jmpbuf(addr))['rsp'].cast(VOID_PTR)
+        return coroutine_backend().get_coroutine_regs(addr)['rsp'].cast(VOID_PTR)
 
 class CoroutinePCFunction(gdb.Function):
     def __init__(self):
         gdb.Function.__init__(self, 'qemu_coroutine_pc')
 
     def invoke(self, addr):
-        return get_jmpbuf_regs(coroutine_to_jmpbuf(addr))['rip'].cast(VOID_PTR)
+        return coroutine_backend().get_coroutine_regs(addr)['rip'].cast(VOID_PTR)
diff --git a/scripts/qemugdb/coroutine_ucontext.py b/scripts/qemugdb/coroutine_ucontext.py
new file mode 100644
index 0000000000..a2f8c1dbbf
--- /dev/null
+++ b/scripts/qemugdb/coroutine_ucontext.py
@@ -0,0 +1,69 @@
+#!/usr/bin/python
+
+# GDB debugging support
+#
+# Copyright 2012 Red Hat, Inc. and/or its affiliates
+#
+# Authors:
+#  Avi Kivity <avi@redhat.com>
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later.  See the COPYING file in the top-level directory.
+
+import gdb
+
+def get_fs_base():
+    '''Fetch %fs base value using arch_prctl(ARCH_GET_FS).  This is
+       pthread_self().'''
+    # %rsp - 120 is scratch space according to the SystemV ABI
+    old = gdb.parse_and_eval('*(uint64_t*)($rsp - 120)')
+    gdb.execute('call (int)arch_prctl(0x1003, $rsp - 120)', False, True)
+    fs_base = gdb.parse_and_eval('*(uint64_t*)($rsp - 120)')
+    gdb.execute('set *(uint64_t*)($rsp - 120) = %s' % old, False, True)
+    return fs_base
+
+def pthread_self():
+    '''Fetch pthread_self() from the glibc start_thread function.'''
+    f = gdb.newest_frame()
+    while f.name() != 'start_thread':
+        f = f.older()
+        if f is None:
+            return get_fs_base()
+
+    try:
+        return f.read_var("arg")
+    except ValueError:
+        return get_fs_base()
+
+def get_glibc_pointer_guard():
+    '''Fetch glibc pointer guard value'''
+    fs_base = pthread_self()
+    return gdb.parse_and_eval('*(uint64_t*)((uint64_t)%s + 0x30)' % fs_base)
+
+def glibc_ptr_demangle(val, pointer_guard):
+    '''Undo effect of glibc's PTR_MANGLE()'''
+    return gdb.parse_and_eval('(((uint64_t)%s >> 0x11) | ((uint64_t)%s << (64 - 0x11))) ^ (uint64_t)%s' % (val, val, pointer_guard))
+
+def get_jmpbuf_regs(jmpbuf):
+    JB_RBX  = 0
+    JB_RBP  = 1
+    JB_R12  = 2
+    JB_R13  = 3
+    JB_R14  = 4
+    JB_R15  = 5
+    JB_RSP  = 6
+    JB_PC   = 7
+
+    pointer_guard = get_glibc_pointer_guard()
+    return {'rbx': jmpbuf[JB_RBX],
+        'rbp': glibc_ptr_demangle(jmpbuf[JB_RBP], pointer_guard),
+        'rsp': glibc_ptr_demangle(jmpbuf[JB_RSP], pointer_guard),
+        'r12': jmpbuf[JB_R12],
+        'r13': jmpbuf[JB_R13],
+        'r14': jmpbuf[JB_R14],
+        'r15': jmpbuf[JB_R15],
+        'rip': glibc_ptr_demangle(jmpbuf[JB_PC], pointer_guard) }
+
+def get_coroutine_regs(addr):
+    co = addr.cast(gdb.lookup_type('CoroutineUContext').pointer())
+    return get_jmpbuf_regs(co['env']['__jmpbuf'])
-- 
2.21.0
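
As an illustration of the dispatch contract this patch introduces (a sketch,
not part of the series): a backend module only needs to expose
get_coroutine_regs() returning a dict of gdb register names, and have its
structure type listed in the backends table of coroutine.py.  The module and
type names below are invented; the 'rsp'/'rip' keys match this patch and are
generalized in the next one.

    # scripts/qemugdb/coroutine_example.py (hypothetical)
    import gdb

    def get_coroutine_regs(addr):
        # Cast the generic Coroutine pointer to this backend's type and
        # return the saved registers needed to unwind it.
        co = addr.cast(gdb.lookup_type('CoroutineExample').pointer())
        return {'rsp': co['sp'], 'rip': co['pc']}

    # ... and in coroutine.py:
    #     from . import coroutine_example
    #     backends['CoroutineExample'] = coroutine_example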


* [Qemu-devel] [PATCH 2/9] qemugdb: allow adding support for other architectures
@ 2019-05-04 12:05   ` Paolo Bonzini
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2019-05-04 12:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, peter.maydell, cohuck, richard.henderson

$sp and $pc are standard register names that are available
on most machines, use them instead of $rsp and $rip so that
other architectures can use qemu_coroutine_sp and
qemu_coroutine_pc.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 scripts/qemugdb/coroutine.py          | 4 ++--
 scripts/qemugdb/coroutine_ucontext.py | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/scripts/qemugdb/coroutine.py b/scripts/qemugdb/coroutine.py
index db2753d949..076f6808ab 100644
--- a/scripts/qemugdb/coroutine.py
+++ b/scripts/qemugdb/coroutine.py
@@ -62,11 +62,11 @@ class CoroutineSPFunction(gdb.Function):
         gdb.Function.__init__(self, 'qemu_coroutine_sp')
 
     def invoke(self, addr):
-        return coroutine_backend().get_coroutine_regs(addr)['rsp'].cast(VOID_PTR)
+        return coroutine_backend().get_coroutine_regs(addr)['sp'].cast(VOID_PTR)
 
 class CoroutinePCFunction(gdb.Function):
     def __init__(self):
         gdb.Function.__init__(self, 'qemu_coroutine_pc')
 
     def invoke(self, addr):
-        return coroutine_backend().get_coroutine_regs(addr)['rip'].cast(VOID_PTR)
+        return coroutine_backend().get_coroutine_regs(addr)['pc'].cast(VOID_PTR)
diff --git a/scripts/qemugdb/coroutine_ucontext.py b/scripts/qemugdb/coroutine_ucontext.py
index a2f8c1dbbf..eed095be22 100644
--- a/scripts/qemugdb/coroutine_ucontext.py
+++ b/scripts/qemugdb/coroutine_ucontext.py
@@ -57,12 +57,12 @@ def get_jmpbuf_regs(jmpbuf):
     pointer_guard = get_glibc_pointer_guard()
     return {'rbx': jmpbuf[JB_RBX],
         'rbp': glibc_ptr_demangle(jmpbuf[JB_RBP], pointer_guard),
-        'rsp': glibc_ptr_demangle(jmpbuf[JB_RSP], pointer_guard),
+        'sp': glibc_ptr_demangle(jmpbuf[JB_RSP], pointer_guard),
         'r12': jmpbuf[JB_R12],
         'r13': jmpbuf[JB_R13],
         'r14': jmpbuf[JB_R14],
         'r15': jmpbuf[JB_R15],
-        'rip': glibc_ptr_demangle(jmpbuf[JB_PC], pointer_guard) }
+        'pc': glibc_ptr_demangle(jmpbuf[JB_PC], pointer_guard) }
 
 def get_coroutine_regs(addr):
     co = addr.cast(gdb.lookup_type('CoroutineUContext').pointer())
-- 
2.21.0
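
With the generic names, the gdb convenience functions behave the same on any
host.  An illustrative session (the coroutine pointer value is made up):

    (gdb) source scripts/qemu-gdb.py
    (gdb) p/x $qemu_coroutine_pc((Coroutine *)0x555556a3e2a0)
    (gdb) p/x $qemu_coroutine_sp((Coroutine *)0x555556a3e2a0)
    (gdb) qemu coroutine 0x555556a3e2a0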


* [Qemu-devel] [PATCH 3/9] coroutine: add host specific coroutine backend for 64-bit x86
@ 2019-05-04 12:05   ` Paolo Bonzini
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2019-05-04 12:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, peter.maydell, cohuck, richard.henderson

This backend is faster (100 ns vs 150 ns per switch on my laptop) and,
more importantly, it will make it possible to add CET support.  Most of
the code is not actually architecture specific.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 configure                        |  10 ++
 scripts/qemugdb/coroutine.py     |   5 +-
 scripts/qemugdb/coroutine_asm.py |  20 +++
 util/Makefile.objs               |   1 +
 util/coroutine-asm.c             | 230 +++++++++++++++++++++++++++++++
 5 files changed, 264 insertions(+), 2 deletions(-)
 create mode 100644 scripts/qemugdb/coroutine_asm.py
 create mode 100644 util/coroutine-asm.c

diff --git a/configure b/configure
index 5b183c2e39..c01f57a3ae 100755
--- a/configure
+++ b/configure
@@ -5200,6 +5200,8 @@ fi
 if test "$coroutine" = ""; then
   if test "$mingw32" = "yes"; then
     coroutine=win32
+  elif test "$cpu" = "x86_64"; then
+     coroutine=asm
   elif test "$ucontext_works" = "yes"; then
     coroutine=ucontext
   else
@@ -5225,6 +5227,14 @@ else
       error_exit "only the 'windows' coroutine backend is valid for Windows"
     fi
     ;;
+  asm)
+    if test "$mingw32" = "yes"; then
+      error_exit "only the 'windows' coroutine backend is valid for Windows"
+    fi
+    if test "$cpu" != "x86_64"; then
+      error_exit "the 'asm' backend is only valid for x86_64 hosts"
+    fi
+    ;;
   *)
     error_exit "unknown coroutine backend $coroutine"
     ;;
diff --git a/scripts/qemugdb/coroutine.py b/scripts/qemugdb/coroutine.py
index 076f6808ab..dc760235e7 100644
--- a/scripts/qemugdb/coroutine.py
+++ b/scripts/qemugdb/coroutine.py
@@ -10,14 +10,15 @@
 # This work is licensed under the terms of the GNU GPL, version 2
 # or later.  See the COPYING file in the top-level directory.
 
-from . import coroutine_ucontext
+from . import coroutine_ucontext, coroutine_asm
 import gdb
 
 VOID_PTR = gdb.lookup_type('void').pointer()
 UINTPTR_T = gdb.lookup_type('uintptr_t')
 
 backends = {
-    'CoroutineUContext': coroutine_ucontext
+    'CoroutineUContext': coroutine_ucontext,
+    'CoroutineAsm': coroutine_asm
 }
 
 def coroutine_backend():
diff --git a/scripts/qemugdb/coroutine_asm.py b/scripts/qemugdb/coroutine_asm.py
new file mode 100644
index 0000000000..b4ac1291db
--- /dev/null
+++ b/scripts/qemugdb/coroutine_asm.py
@@ -0,0 +1,20 @@
+#!/usr/bin/python
+
+# GDB debugging support
+#
+# Copyright 2019 Red Hat, Inc.
+#
+# Authors:
+#  Paolo Bonzini <pbonzini@redhat.com>
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later.  See the COPYING file in the top-level directory.
+
+import gdb
+
+U64_PTR = gdb.lookup_type('uint64_t').pointer()
+
+def get_coroutine_regs(addr):
+    addr = addr.cast(gdb.lookup_type('CoroutineAsm').pointer())
+    rsp = addr['sp'].cast(U64_PTR)
+    return {'sp': rsp, 'pc': rsp.dereference()}
diff --git a/util/Makefile.objs b/util/Makefile.objs
index 9206878dec..41a10539cf 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -39,6 +39,7 @@ util-obj-$(CONFIG_MEMBARRIER) += sys_membarrier.o
 util-obj-y += qemu-coroutine.o qemu-coroutine-lock.o qemu-coroutine-io.o
 util-obj-y += qemu-coroutine-sleep.o
 util-obj-y += coroutine-$(CONFIG_COROUTINE_BACKEND).o
+coroutine-asm.o-cflags := -mno-red-zone
 util-obj-y += buffer.o
 util-obj-y += timed-average.o
 util-obj-y += base64.o
diff --git a/util/coroutine-asm.c b/util/coroutine-asm.c
new file mode 100644
index 0000000000..a06ecbcb0a
--- /dev/null
+++ b/util/coroutine-asm.c
@@ -0,0 +1,230 @@
+/*
+ * Host-specific coroutine initialization code
+ *
+ * Copyright (C) 2006  Anthony Liguori <anthony@codemonkey.ws>
+ * Copyright (C) 2011  Kevin Wolf <kwolf@redhat.com>
+ * Copyright (C) 2019  Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.0 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qemu/coroutine_int.h"
+
+#ifdef CONFIG_VALGRIND_H
+#include <valgrind/valgrind.h>
+#endif
+
+#if defined(__SANITIZE_ADDRESS__) || __has_feature(address_sanitizer)
+#ifdef CONFIG_ASAN_IFACE_FIBER
+#define CONFIG_ASAN 1
+#include <sanitizer/asan_interface.h>
+#endif
+#endif
+
+#define COROUTINE_SHADOW_STACK_SIZE    4096
+
+typedef struct {
+    Coroutine base;
+    void *sp;
+
+    void *stack;
+    size_t stack_size;
+
+#ifdef CONFIG_VALGRIND_H
+    unsigned int valgrind_stack_id;
+#endif
+} CoroutineAsm;
+
+/**
+ * Per-thread coroutine bookkeeping
+ */
+static __thread CoroutineAsm leader;
+static __thread Coroutine *current;
+
+static void finish_switch_fiber(void *fake_stack_save)
+{
+#ifdef CONFIG_ASAN
+    const void *bottom_old;
+    size_t size_old;
+
+    __sanitizer_finish_switch_fiber(fake_stack_save, &bottom_old, &size_old);
+
+    if (!leader.stack) {
+        leader.stack = (void *)bottom_old;
+        leader.stack_size = size_old;
+    }
+#endif
+}
+
+static void start_switch_fiber(void **fake_stack_save,
+                               const void *bottom, size_t size)
+{
+#ifdef CONFIG_ASAN
+    __sanitizer_start_switch_fiber(fake_stack_save, bottom, size);
+#endif
+}
+
+#ifdef __x86_64__
+/*
+ * We hardcode all operands to specific registers so that we can write down all the
+ * others in the clobber list.  Note that action also needs to be hardcoded so that
+ * it is the same register in all expansions of this macro.  Also, we use %rdi
+ * for the coroutine because that is the ABI's first argument register;
+ * coroutine_trampoline can then retrieve the current coroutine from there.
+ *
+ * Note that push and call would clobber the red zone.  Makefile.objs compiles this
+ * file with -mno-red-zone.  The alternative is to subtract/add 128 bytes from rsp
+ * around the switch, with slightly lower cache performance.
+ */
+#define CO_SWITCH(from, to, action, jump) ({                                          \
+    int action_ = action;                                                             \
+    void *from_ = from;                                                               \
+    void *to_ = to;                                                                   \
+    asm volatile(                                                                     \
+        "pushq %%rbp\n"                     /* save frame register on source stack */ \
+        ".cfi_adjust_cfa_offset 8\n"                                                  \
+        "call 1f\n"                         /* switch continues at label 1 */         \
+        "jmp 2f\n"                          /* switch back continues at label 2 */    \
+                                                                                      \
+        "1: .cfi_adjust_cfa_offset 8\n"                                               \
+        "movq %%rsp, %c[SP](%[FROM])\n"     /* save source SP */                      \
+        "movq %c[SP](%[TO]), %%rsp\n"       /* load destination SP */                 \
+        jump "\n"                           /* coroutine switch */                    \
+                                                                                      \
+        "2: .cfi_adjust_cfa_offset -8\n"                                              \
+        "popq %%rbp\n"                                                                \
+        ".cfi_adjust_cfa_offset -8\n"                                                 \
+        : "+a" (action_), [FROM] "+b" (from_), [TO] "+D" (to_)                        \
+        : [SP] "i" (offsetof(CoroutineAsm, sp))                                       \
+        : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",  \
+          "memory");                                                                  \
+    action_;                                                                          \
+})
+/* Use "call" to ensure the stack  is aligned correctly.  */
+#define CO_SWITCH_NEW(from, to) CO_SWITCH(from, to, 0, "call coroutine_trampoline")
+#define CO_SWITCH_RET(from, to, action) CO_SWITCH(from, to, action, "ret")
+#else
+#error coroutine-asm.c not ported to this architecture.
+#endif
+
+static void __attribute__((__used__)) coroutine_trampoline(CoroutineAsm *self)
+{
+    finish_switch_fiber(NULL);
+
+    while (true) {
+        Coroutine *co = &self->base;
+        qemu_coroutine_switch(co, co->caller, COROUTINE_TERMINATE);
+        co->entry(co->entry_arg);
+    }
+}
+
+Coroutine *qemu_coroutine_new(void)
+{
+    CoroutineAsm *co;
+    void *fake_stack_save = NULL;
+
+    co = g_malloc0(sizeof(*co));
+    co->stack_size = COROUTINE_STACK_SIZE;
+    co->stack = qemu_alloc_stack(&co->stack_size);
+    co->sp = co->stack + co->stack_size;
+
+#ifdef CONFIG_VALGRIND_H
+    co->valgrind_stack_id =
+        VALGRIND_STACK_REGISTER(co->stack, co->stack + co->stack_size);
+#endif
+
+    /*
+     * Immediately enter the coroutine once to initialize the stack
+     * and program counter.  We could instead just push the address
+     * of coroutine_trampoline and let qemu_coroutine_switch return
+     * to it, but doing it this way confines the non-portable code
+     * to the CO_SWITCH* macros.
+     */
+    co->base.caller = qemu_coroutine_self();
+    start_switch_fiber(&fake_stack_save, co->stack, co->stack_size);
+    CO_SWITCH_NEW(current, co);
+    finish_switch_fiber(fake_stack_save);
+    co->base.caller = NULL;
+
+    return &co->base;
+}
+
+#ifdef CONFIG_VALGRIND_H
+#if defined(CONFIG_PRAGMA_DIAGNOSTIC_AVAILABLE) && !defined(__clang__)
+/* Work around an unused variable in the valgrind.h macro... */
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wunused-but-set-variable"
+#endif
+static inline void valgrind_stack_deregister(CoroutineAsm *co)
+{
+    VALGRIND_STACK_DEREGISTER(co->valgrind_stack_id);
+}
+#if defined(CONFIG_PRAGMA_DIAGNOSTIC_AVAILABLE) && !defined(__clang__)
+#pragma GCC diagnostic pop
+#endif
+#endif
+
+void qemu_coroutine_delete(Coroutine *co_)
+{
+    CoroutineAsm *co = DO_UPCAST(CoroutineAsm, base, co_);
+
+#ifdef CONFIG_VALGRIND_H
+    valgrind_stack_deregister(co);
+#endif
+
+    qemu_free_stack(co->stack, co->stack_size);
+    g_free(co);
+}
+
+/*
+ * This function is marked noinline to prevent GCC from inlining it
+ * into coroutine_trampoline(). If we allow it to do that then it
+ * hoists the code to get the address of the TLS variable "current"
+ * out of the while() loop. This is an invalid transformation because
+ * qemu_coroutine_switch() may be called when running thread A but
+ * return in thread B, and so we might be in a different thread
+ * context each time round the loop.
+ */
+CoroutineAction __attribute__((noinline))
+qemu_coroutine_switch(Coroutine *from_, Coroutine *to_,
+                      CoroutineAction action)
+{
+    CoroutineAsm *from = DO_UPCAST(CoroutineAsm, base, from_);
+    CoroutineAsm *to = DO_UPCAST(CoroutineAsm, base, to_);
+    void *fake_stack_save = NULL;
+
+    current = to_;
+
+    start_switch_fiber(action == COROUTINE_TERMINATE ?
+                       NULL : &fake_stack_save, to->stack, to->stack_size);
+    action = CO_SWITCH_RET(from, to, action);
+    finish_switch_fiber(fake_stack_save);
+
+    return action;
+}
+
+Coroutine *qemu_coroutine_self(void)
+{
+    if (!current) {
+        current = &leader.base;
+    }
+    return current;
+}
+
+bool qemu_in_coroutine(void)
+{
+    return current && current->caller;
+}
-- 
2.21.0
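
For reference, this is why the gdb helper added here can derive the program
counter by dereferencing the saved stack pointer: CO_SWITCH executes a "call"
just before storing %rsp, so the resume address is the top word of the saved
stack.  An illustrative, step-by-step equivalent of get_coroutine_regs() (the
coroutine address is made up):

    # Only 'sp' is stored in CoroutineAsm; 'pc' is derived from it.
    co = gdb.parse_and_eval('(CoroutineAsm *) 0x555556b10000')
    sp = co['sp'].cast(gdb.lookup_type('uint64_t').pointer())
    pc = sp.dereference()   # resume address pushed by the "call" in CO_SWITCH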


* [Qemu-devel] [PATCH 4/9] coroutine: add host specific coroutine backend for 64-bit ARM
@ 2019-05-04 12:05   ` Paolo Bonzini
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2019-05-04 12:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, peter.maydell, cohuck, richard.henderson

The speedup is similar to x86: 120 ns vs 180 ns on an APM Mustang.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 configure                        |  2 +-
 scripts/qemugdb/coroutine_asm.py |  6 ++++-
 util/Makefile.objs               |  2 ++
 util/coroutine-asm.c             | 45 ++++++++++++++++++++++++++++++++
 4 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index c01f57a3ae..26e62a4ab1 100755
--- a/configure
+++ b/configure
@@ -5200,7 +5200,7 @@ fi
 if test "$coroutine" = ""; then
   if test "$mingw32" = "yes"; then
     coroutine=win32
-  elif test "$cpu" = "x86_64"; then
+  elif test "$cpu" = "x86_64" || test "$cpu" = "aarch64"; then
      coroutine=asm
   elif test "$ucontext_works" = "yes"; then
     coroutine=ucontext
diff --git a/scripts/qemugdb/coroutine_asm.py b/scripts/qemugdb/coroutine_asm.py
index b4ac1291db..181b77287b 100644
--- a/scripts/qemugdb/coroutine_asm.py
+++ b/scripts/qemugdb/coroutine_asm.py
@@ -17,4 +17,8 @@ U64_PTR = gdb.lookup_type('uint64_t').pointer()
 def get_coroutine_regs(addr):
     addr = addr.cast(gdb.lookup_type('CoroutineAsm').pointer())
     rsp = addr['sp'].cast(U64_PTR)
-    return {'sp': rsp, 'pc': rsp.dereference()}
+    arch = gdb.selected_frame().architecture().name().split(':')
+    if arch[0] == 'i386' and arch[1] == 'x86-64':
+        return {'rsp': rsp, 'pc': rsp.dereference()}
+    else:
+        return {'sp': rsp, 'pc': addr['scratch'].cast(U64_PTR) }
diff --git a/util/Makefile.objs b/util/Makefile.objs
index 41a10539cf..2167ffc862 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -39,7 +39,9 @@ util-obj-$(CONFIG_MEMBARRIER) += sys_membarrier.o
 util-obj-y += qemu-coroutine.o qemu-coroutine-lock.o qemu-coroutine-io.o
 util-obj-y += qemu-coroutine-sleep.o
 util-obj-y += coroutine-$(CONFIG_COROUTINE_BACKEND).o
+ifeq ($(ARCH),x86_64)
 coroutine-asm.o-cflags := -mno-red-zone
+endif
 util-obj-y += buffer.o
 util-obj-y += timed-average.o
 util-obj-y += base64.o
diff --git a/util/coroutine-asm.c b/util/coroutine-asm.c
index a06ecbcb0a..de68e98622 100644
--- a/util/coroutine-asm.c
+++ b/util/coroutine-asm.c
@@ -40,6 +40,11 @@ typedef struct {
     Coroutine base;
     void *sp;
 
+    /*
+     * aarch64: instruction pointer
+     */
+    void *scratch;
+
     void *stack;
     size_t stack_size;
 
@@ -116,6 +121,49 @@ static void start_switch_fiber(void **fake_stack_save,
 /* Use "call" to ensure the stack  is aligned correctly.  */
 #define CO_SWITCH_NEW(from, to) CO_SWITCH(from, to, 0, "call coroutine_trampoline")
 #define CO_SWITCH_RET(from, to, action) CO_SWITCH(from, to, action, "ret")
+
+#elif defined __aarch64__
+/*
+ * GCC does not support clobbering the frame pointer, so we save it ourselves.
+ * Saving the link register as well generates slightly better code because then
+ * qemu_coroutine_switch can be treated as a leaf procedure.
+ */
+#define CO_SWITCH_RET(from, to, action) ({                                            \
+    register uintptr_t action_ __asm__("x0") = action;                                \
+    register void *from_ __asm__("x16") = from;                                       \
+    register void *to_ __asm__("x1") = to;                                            \
+    asm volatile(                                                                     \
+        ".cfi_remember_state\n"                                                       \
+        "stp x29, x30, [sp, #-16]!\n"    /* GCC does not save it, do it ourselves */  \
+        ".cfi_adjust_cfa_offset 16\n"                                                 \
+        ".cfi_def_cfa_register sp\n"                                                  \
+        "adr x30, 2f\n"                  /* source PC will be after the BR */         \
+        "str x30, [x16, %[SCRATCH]]\n"   /* save it */                                \
+        "mov x30, sp\n"                  /* save source SP */                         \
+        "str x30, [x16, %[SP]]\n"                                                     \
+        "ldr x30, [x1, %[SCRATCH]]\n"    /* load destination PC */                    \
+        "ldr x1, [x1, %[SP]]\n"          /* load destination SP */                    \
+        "mov sp, x1\n"                                                                \
+        "br x30\n"                                                                    \
+        "2: \n"                                                                       \
+        "ldp x29, x30, [sp], #16\n"                                                   \
+        ".cfi_restore_state\n"                                                        \
+        : "+r" (action_), "+r" (from_), "+r" (to_)                                    \
+        : [SP] "i" (offsetof(CoroutineAsm, sp)),                                      \
+          [SCRATCH] "i" (offsetof(CoroutineAsm, scratch))                             \
+        : "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9", "x10", "x11", "x12",        \
+          "x13", "x14", "x15", "x17", "x18", "x19", "x20", "x21", "x22", "x23",       \
+          "x24", "x25", "x26", "x27", "x28",                                          \
+          "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9", "v10", "v11",   \
+          "v12", "v13", "v14", "v15", v16", "v17", "v18", "v19", "v20", "v21", "v22", \
+          "v23", "v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31", "memory",    \
+    action_;                                                                          \
+})
+
+#define CO_SWITCH_NEW(from, to) do {                                                  \
+  (to)->scratch = (void *) coroutine_trampoline;                                      \
+  (void) CO_SWITCH_RET(from, to, (uintptr_t) to);                                     \
+} while(0)
 #else
 #error coroutine-asm.c not ported to this architecture.
 #endif
-- 
2.21.0
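
On aarch64 the switch uses adr/br rather than call/ret, so the resume address
never ends up on the coroutine stack; it is kept in the new 'scratch' field,
which is why the gdb helper reads 'pc' from there.  An illustrative equivalent
(the address is made up):

    co = gdb.parse_and_eval('(CoroutineAsm *) 0xaaaaab100000')
    sp = co['sp'].cast(gdb.lookup_type('uint64_t').pointer())
    pc = co['scratch'].cast(gdb.lookup_type('uint64_t').pointer())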

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH 4/9] coroutine: add host specific coroutine backend for 64-bit ARM
@ 2019-05-04 12:05   ` Paolo Bonzini
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2019-05-04 12:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, cohuck, richard.henderson, qemu-block

The speedup is similar to x86, 120 ns vs 180 ns on an APM Mustang.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 configure                        |  2 +-
 scripts/qemugdb/coroutine_asm.py |  6 ++++-
 util/Makefile.objs               |  2 ++
 util/coroutine-asm.c             | 45 ++++++++++++++++++++++++++++++++
 4 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index c01f57a3ae..26e62a4ab1 100755
--- a/configure
+++ b/configure
@@ -5200,7 +5200,7 @@ fi
 if test "$coroutine" = ""; then
   if test "$mingw32" = "yes"; then
     coroutine=win32
-  elif test "$cpu" = "x86_64"; then
+  elif test "$cpu" = "x86_64" || test "$cpu" = "aarch64"; then
      coroutine=asm
   elif test "$ucontext_works" = "yes"; then
     coroutine=ucontext
diff --git a/scripts/qemugdb/coroutine_asm.py b/scripts/qemugdb/coroutine_asm.py
index b4ac1291db..181b77287b 100644
--- a/scripts/qemugdb/coroutine_asm.py
+++ b/scripts/qemugdb/coroutine_asm.py
@@ -17,4 +17,8 @@ U64_PTR = gdb.lookup_type('uint64_t').pointer()
 def get_coroutine_regs(addr):
     addr = addr.cast(gdb.lookup_type('CoroutineAsm').pointer())
     rsp = addr['sp'].cast(U64_PTR)
-    return {'sp': rsp, 'pc': rsp.dereference()}
+    arch = gdb.selected_frame().architecture().name().split(':')
+    if arch[0] == 'i386' and arch[1] == 'x86-64':
+        return {'rsp': rsp, 'pc': rsp.dereference()}
+    else:
+        return {'sp': rsp, 'pc': addr['scratch'].cast(U64_PTR) }
diff --git a/util/Makefile.objs b/util/Makefile.objs
index 41a10539cf..2167ffc862 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -39,7 +39,9 @@ util-obj-$(CONFIG_MEMBARRIER) += sys_membarrier.o
 util-obj-y += qemu-coroutine.o qemu-coroutine-lock.o qemu-coroutine-io.o
 util-obj-y += qemu-coroutine-sleep.o
 util-obj-y += coroutine-$(CONFIG_COROUTINE_BACKEND).o
+ifeq ($(ARCH),x86_64)
 coroutine-asm.o-cflags := -mno-red-zone
+endif
 util-obj-y += buffer.o
 util-obj-y += timed-average.o
 util-obj-y += base64.o
diff --git a/util/coroutine-asm.c b/util/coroutine-asm.c
index a06ecbcb0a..de68e98622 100644
--- a/util/coroutine-asm.c
+++ b/util/coroutine-asm.c
@@ -40,6 +40,11 @@ typedef struct {
     Coroutine base;
     void *sp;
 
+    /*
+     * aarch64: instruction pointer
+     */
+    void *scratch;
+
     void *stack;
     size_t stack_size;
 
@@ -116,6 +121,49 @@ static void start_switch_fiber(void **fake_stack_save,
 /* Use "call" to ensure the stack  is aligned correctly.  */
 #define CO_SWITCH_NEW(from, to) CO_SWITCH(from, to, 0, "call coroutine_trampoline")
 #define CO_SWITCH_RET(from, to, action) CO_SWITCH(from, to, action, "ret")
+
+#elif defined __aarch64__
+/*
+ * GCC does not support clobbering the frame pointer, so we save it ourselves.
+ * Saving the link register as well generates slightly better code because then
+ * qemu_coroutine_switch can be treated as a leaf procedure.
+ */
+#define CO_SWITCH_RET(from, to, action) ({                                            \
+    register uintptr_t action_ __asm__("x0") = action;                                \
+    register void *from_ __asm__("x16") = from;                                       \
+    register void *to_ __asm__("x1") = to;                                            \
+    asm volatile(                                                                     \
+        ".cfi_remember_state\n"                                                       \
+        "stp x29, x30, [sp, #-16]!\n"    /* GCC does not save it, do it ourselves */  \
+        ".cfi_adjust_cfa_offset 16\n"                                                 \
+        ".cfi_def_cfa_register sp\n"                                                  \
+        "adr x30, 2f\n"                  /* source PC will be after the BR */         \
+        "str x30, [x16, %[SCRATCH]]\n"   /* save it */                                \
+        "mov x30, sp\n"                  /* save source SP */                         \
+        "str x30, [x16, %[SP]]\n"                                                     \
+        "ldr x30, [x1, %[SCRATCH]]\n"    /* load destination PC */                    \
+        "ldr x1, [x1, %[SP]]\n"          /* load destination SP */                    \
+        "mov sp, x1\n"                                                                \
+        "br x30\n"                                                                    \
+        "2: \n"                                                                       \
+        "ldp x29, x30, [sp], #16\n"                                                   \
+        ".cfi_restore_state\n"                                                        \
+        : "+r" (action_), "+r" (from_), "+r" (to_)                                    \
+        : [SP] "i" (offsetof(CoroutineAsm, sp)),                                      \
+          [SCRATCH] "i" (offsetof(CoroutineAsm, scratch))                             \
+        : "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9", "x10", "x11", "x12",        \
+          "x13", "x14", "x15", "x17", "x18", "x19", "x20", "x21", "x22", "x23",       \
+          "x24", "x25", "x26", "x27", "x28",                                          \
+          "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9", "v10", "v11",   \
+          "v12", "v13", "v14", "v15", v16", "v17", "v18", "v19", "v20", "v21", "v22", \
+          "v23", "v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31", "memory",    \
+    action_;                                                                          \
+})
+
+#define CO_SWITCH_NEW(from, to) do {                                                  \
+  (to)->scratch = (void *) coroutine_trampoline;                                      \
+  (void) CO_SWITCH_RET(from, to, (uintptr_t) to);                                     \
+} while(0)
 #else
 #error coroutine-asm.c not ported to this architecture.
 #endif
-- 
2.21.0




^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH 5/9] coroutine: add host specific coroutine backend for 64-bit s390
@ 2019-05-04 12:05   ` Paolo Bonzini
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2019-05-04 12:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, peter.maydell, cohuck, richard.henderson

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 util/coroutine-asm.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/util/coroutine-asm.c b/util/coroutine-asm.c
index de68e98622..a9a80e9c71 100644
--- a/util/coroutine-asm.c
+++ b/util/coroutine-asm.c
@@ -41,7 +41,7 @@ typedef struct {
     void *sp;
 
     /*
-     * aarch64: instruction pointer
+     * aarch64, s390x: instruction pointer
      */
     void *scratch;
 
@@ -161,6 +161,40 @@ static void start_switch_fiber(void **fake_stack_save,
   (to)->scratch = (void *) coroutine_trampoline;                                      \
   (void) CO_SWITCH_RET(from, to, (uintptr_t) to);                                     \
 } while(0)
+
+#elif defined __s390x__
+#define CO_SWITCH_RET(from, to, action) ({                                            \
+    register uintptr_t action_ __asm__("r2") = action;                                \
+    register void *from_ __asm__("r1") = from;                                        \
+    register void *to_ __asm__("r3") = to;                                            \
+    register void *pc_ __asm__("r4") = to->scratch;                                   \
+    void *save_r13;                                                                   \
+    asm volatile(                                                                     \
+        "stg %%r13, %[SAVE_R13]\n"                                                    \
+        "stg %%r15, %[SP](%%r1)\n"       /* save source SP */                         \
+        "lg %%r15, %[SP](%%r3)\n"        /* load destination SP */                    \
+        "bras %%r3, 1f\n"                /* source PC will be after the BR */         \
+        "1: aghi %%r3, 12\n"             /* 4 */                                      \
+        "stg %%r3, %[SCRATCH](%%r1)\n"   /* 6 save switch-back PC */                  \
+        "br %%r4\n"                      /* 2 jump to destination PC */               \
+        "lg %%r13, %[SAVE_R13]\n"                                                     \
+        : "+r" (action_), "+r" (from_), "+r" (to_), "+r" (pc_),                       \
+          [SAVE_R13] "+m" (r13)                                                       \
+        : [SP] "i" (offsetof(CoroutineAsm, sp)),                                      \
+          [SCRATCH] "i" (offsetof(CoroutineAsm, scratch))                             \
+        : "r0", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "r12", "r14",             \
+          "a2", "a3", "a4", "a5", "a6", "a7",                                         \
+          "a8", "a9", "a10", "a11", "a12", "a13", "a14", "a15",                       \
+          "f0", "f1", "f2", "f3", "f4", "f5", "f6", "f7",                             \
+          "f8", "f9", "f10", "f11", "f12", "f13", "f14", "f15", "memory");            \
+    action_;                                                                          \
+})
+
+#define CO_SWITCH_NEW(from, to) do {                                                  \
+  (to)->scratch = (void *) coroutine_trampoline;                                      \
+  (to)->sp -= 160;               /* reserve the s390x 160-byte register save area */ \
+  (void) CO_SWITCH_RET(from, to, (uintptr_t) to);                                     \
+} while(0)
 #else
 #error coroutine-asm.c not ported to this architecture.
 #endif
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH 5/9] coroutine: add host specific coroutine backend for 64-bit s390
@ 2019-05-04 12:05   ` Paolo Bonzini
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2019-05-04 12:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, cohuck, richard.henderson, qemu-block

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 util/coroutine-asm.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/util/coroutine-asm.c b/util/coroutine-asm.c
index de68e98622..a9a80e9c71 100644
--- a/util/coroutine-asm.c
+++ b/util/coroutine-asm.c
@@ -41,7 +41,7 @@ typedef struct {
     void *sp;
 
     /*
-     * aarch64: instruction pointer
+     * aarch64, s390x: instruction pointer
      */
     void *scratch;
 
@@ -161,6 +161,40 @@ static void start_switch_fiber(void **fake_stack_save,
   (to)->scratch = (void *) coroutine_trampoline;                                      \
   (void) CO_SWITCH_RET(from, to, (uintptr_t) to);                                     \
 } while(0)
+
+#elif defined __s390x__
+#define CO_SWITCH_RET(from, to, action) ({                                            \
+    register uintptr_t action_ __asm__("r2") = action;                                \
+    register void *from_ __asm__("r1") = from;                                        \
+    register void *to_ __asm__("r3") = to;                                            \
+    register void *pc_ __asm__("r4") = to->scratch;                                   \
+    void *save_r13;                                                                   \
+    asm volatile(                                                                     \
+        "stg %%r13, %[SAVE_R13]\n"                                                    \
+        "stg %%r15, %[SP](%%r1)\n"       /* save source SP */                         \
+        "lg %%r15, %[SP](%%r3)\n"        /* load destination SP */                    \
+        "bras %%r3, 1f\n"                /* source PC will be after the BR */         \
+        "1: aghi %%r3, 12\n"             /* 4 */                                      \
+        "stg %%r3, %[SCRATCH](%%r1)\n"   /* 6 save switch-back PC */                  \
+        "br %%r4\n"                      /* 2 jump to destination PC */               \
+        "lg %%r13, %[SAVE_R13]\n"                                                     \
+        : "+r" (action_), "+r" (from_), "+r" (to_), "+r" (pc_),                       \
+          [SAVE_R13] "+m" (r13)                                                       \
+        : [SP] "i" (offsetof(CoroutineAsm, sp)),                                      \
+          [SCRATCH] "i" (offsetof(CoroutineAsm, scratch))                             \
+        : "r0", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "r12", "r14",             \
+          "a2", "a3", "a4", "a5", "a6", "a7",                                         \
+          "a8", "a9", "a10", "a11", "a12", "a13", "a14", "a15",                       \
+          "f0", "f1", "f2", "f3", "f4", "f5", "f6", "f7",                             \
+          "f8", "f9", "f10", "f11", "f12", "f13", "f14", "f15", "memory");            \
+    action_;                                                                          \
+})
+
+#define CO_SWITCH_NEW(from, to) do {                                                  \
+  (to)->scratch = (void *) coroutine_trampoline;                                      \
+  (to)->sp -= 160;               /* reserve the s390x 160-byte register save area */ \
+  (void) CO_SWITCH_RET(from, to, (uintptr_t) to);                                     \
+} while(0)
 #else
 #error coroutine-asm.c not ported to this architecture.
 #endif
-- 
2.21.0




^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH 6/9] configure: add control-flow protection support
@ 2019-05-04 12:05   ` Paolo Bonzini
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2019-05-04 12:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, peter.maydell, cohuck, richard.henderson

Control-flow protection requires object files to note which features
are supported.  The linker will merge them to the set of features that
are supported by all object files.  The compiler creates these notes
when the -fcf-protection option is passed, but we have to blacklist
some object files that only support a subset of the full control-flow
protection feature set.

Even without any further host-specific patches, user-mode emulation
binaries can already use shadow stacks, because they don't need
coroutines and don't include the problematic util/coroutine-*.o
object files.  Likewise, system-mode emulation binaries will enable
indirect branch tracking if built without TCG support.

The next patches will improve the situation so that QEMU can be built
with full protection on x86 hosts.
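
For reference, the note emitted by -fcf-protection (and written by hand in
assembly for safe-syscall.S later in this series) has the following layout
on x86-64.  The struct is only an illustration of that layout; it is not
used anywhere in QEMU:

    #include <stdint.h>

    struct gnu_property_note_x86_64 {
        uint32_t namesz;      /* 4: strlen("GNU") + 1 */
        uint32_t descsz;      /* 16: size of the payload below */
        uint32_t type;        /* 5: NT_GNU_PROPERTY_TYPE_0 */
        char     name[4];     /* "GNU" */
        /* payload, 8-byte aligned in 64-bit ELF objects */
        uint32_t pr_type;     /* 0xc0000002: GNU_PROPERTY_X86_FEATURE_1_AND */
        uint32_t pr_datasz;   /* 4 */
        uint32_t pr_data;     /* bit 0 = IBT, bit 1 = SHSTK */
        uint32_t pad;         /* pad the payload to 8 bytes */
    };

The "merge" mentioned above is an AND of the pr_data bits across all object
files, so a single object built without the matching protection clears the
corresponding bits in the final binary.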

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Makefile.target    |  3 +++
 configure          | 29 +++++++++++++++++++++++++++++
 util/Makefile.objs |  5 +++++
 3 files changed, 37 insertions(+)

diff --git a/Makefile.target b/Makefile.target
index ae02495951..667682779b 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -111,6 +111,9 @@ obj-y += exec.o
 obj-y += accel/
 obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-op.o tcg/tcg-op-vec.o tcg/tcg-op-gvec.o
 obj-$(CONFIG_TCG) += tcg/tcg-common.o tcg/optimize.o
+ifeq ($(CONFIG_CF_PROTECTION),y)
+tcg/tcg.o-cflags := -fcf-protection=return
+endif
 obj-$(CONFIG_TCG_INTERPRETER) += tcg/tci.o
 obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o
 obj-$(CONFIG_TCG) += fpu/softfloat.o
diff --git a/configure b/configure
index 26e62a4ab1..946ff7825a 100755
--- a/configure
+++ b/configure
@@ -449,6 +449,7 @@ win_sdk="no"
 want_tools="yes"
 libiscsi=""
 libnfs=""
+cf_protection="no"      # leave it disabled until we can test performance
 coroutine=""
 coroutine_pool=""
 debug_stack_usage="no"
@@ -1267,6 +1268,10 @@ for opt do
   ;;
   --with-pkgversion=*) pkgversion="$optarg"
   ;;
+  --enable-cf-protection) cf_protection="yes"
+  ;;
+  --disable-cf-protection) cf_protection="no"
+  ;;
   --with-coroutine=*) coroutine="$optarg"
   ;;
   --disable-coroutine-pool) coroutine_pool="no"
@@ -1796,6 +1801,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   lzfse           support of lzfse compression library
                   (for reading lzfse-compressed dmg images)
   seccomp         seccomp support
+  cf-protection   Control-flow protection
   coroutine-pool  coroutine freelist (better performance)
   glusterfs       GlusterFS backend
   tpm             TPM support
@@ -5176,6 +5182,25 @@ if have_backend "dtrace"; then
   fi
 fi
 
+##########################################
+# detect Control-flow protection support in the toolchain
+
+if test "$cf_protection" != no; then
+  write_c_skeleton;
+  if ! compile_prog "-fcf-protection" "" ; then
+    if test "$cf_protection" = yes; then
+      feature_not_found "cf_protection" 'Control-flow protection is not supported by your toolchain'
+    fi
+    cf_protection=no
+  fi
+fi
+if test "$cf_protection" = ""; then
+  cf_protection=yes
+fi
+if test "$cf_protection" = "yes"; then
+  QEMU_CFLAGS="-fcf-protection $QEMU_CFLAGS"
+fi
+
 ##########################################
 # check and set a backend for coroutine
 
@@ -6361,6 +6386,7 @@ echo "netmap support    $netmap"
 echo "Linux AIO support $linux_aio"
 echo "ATTR/XATTR support $attr"
 echo "Install blobs     $blobs"
+echo "CF protection     $cf_protection"
 echo "KVM support       $kvm"
 echo "HAX support       $hax"
 echo "HVF support       $hvf"
@@ -6571,6 +6597,9 @@ fi
 if test "$profiler" = "yes" ; then
   echo "CONFIG_PROFILER=y" >> $config_host_mak
 fi
+if test "$cf_protection" = "yes" ; then
+  echo "CONFIG_CF_PROTECTION=y" >> $config_host_mak
+fi
 if test "$slirp" != "no"; then
   echo "CONFIG_SLIRP=y" >> $config_host_mak
   echo "CONFIG_SMBD_COMMAND=\"$smbd\"" >> $config_host_mak
diff --git a/util/Makefile.objs b/util/Makefile.objs
index 2167ffc862..d7add70b63 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -42,6 +42,11 @@ util-obj-y += coroutine-$(CONFIG_COROUTINE_BACKEND).o
 ifeq ($(ARCH),x86_64)
 coroutine-asm.o-cflags := -mno-red-zone
 endif
+ifeq ($(CONFIG_CF_PROTECTION),y)
+coroutine-sigaltstack.o-cflags := -fcf-protection=branch
+coroutine-ucontext.o-cflags := -fcf-protection=branch
+coroutine-asm.o-cflags += -fcf-protection=branch
+endif
 util-obj-y += buffer.o
 util-obj-y += timed-average.o
 util-obj-y += base64.o
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH 6/9] configure: add control-flow protection support
@ 2019-05-04 12:05   ` Paolo Bonzini
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2019-05-04 12:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, cohuck, richard.henderson, qemu-block

Control-flow protection requires object files to note which features
are supported.  The linker will merge them to the set of features that
are supported by all object files.  The compiler creates these notes
when the -fcf-protection option is passed, but we have to blacklist
some object files that only support a subset of the full control-flow
protection feature set.

Even without any further host-specific patches, user-mode emulation
binaries can already use shadow stacks, because they don't need
coroutines and don't include the problematic util/coroutine-*.o
object files.  Likewise, system-mode emulation binaries will enable
indirect branch tracking if built without TCG support.

The next patches will improve the situation so that QEMU can be built
with full protection on x86 hosts.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Makefile.target    |  3 +++
 configure          | 29 +++++++++++++++++++++++++++++
 util/Makefile.objs |  5 +++++
 3 files changed, 37 insertions(+)

diff --git a/Makefile.target b/Makefile.target
index ae02495951..667682779b 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -111,6 +111,9 @@ obj-y += exec.o
 obj-y += accel/
 obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-op.o tcg/tcg-op-vec.o tcg/tcg-op-gvec.o
 obj-$(CONFIG_TCG) += tcg/tcg-common.o tcg/optimize.o
+ifeq ($(CONFIG_CF_PROTECTION),y)
+tcg/tcg.o-cflags := -fcf-protection=return
+endif
 obj-$(CONFIG_TCG_INTERPRETER) += tcg/tci.o
 obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o
 obj-$(CONFIG_TCG) += fpu/softfloat.o
diff --git a/configure b/configure
index 26e62a4ab1..946ff7825a 100755
--- a/configure
+++ b/configure
@@ -449,6 +449,7 @@ win_sdk="no"
 want_tools="yes"
 libiscsi=""
 libnfs=""
+cf_protection="no"      # leave it disabled until we can test performance
 coroutine=""
 coroutine_pool=""
 debug_stack_usage="no"
@@ -1267,6 +1268,10 @@ for opt do
   ;;
   --with-pkgversion=*) pkgversion="$optarg"
   ;;
+  --enable-cf-protection) cf_protection="yes"
+  ;;
+  --disable-cf-protection) cf_protection="no"
+  ;;
   --with-coroutine=*) coroutine="$optarg"
   ;;
   --disable-coroutine-pool) coroutine_pool="no"
@@ -1796,6 +1801,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   lzfse           support of lzfse compression library
                   (for reading lzfse-compressed dmg images)
   seccomp         seccomp support
+  cf-protection   Control-flow protection
   coroutine-pool  coroutine freelist (better performance)
   glusterfs       GlusterFS backend
   tpm             TPM support
@@ -5176,6 +5182,25 @@ if have_backend "dtrace"; then
   fi
 fi
 
+##########################################
+# detect Control-flow protection support in the toolchain
+
+if test "$cf_protection" != no; then
+  write_c_skeleton;
+  if ! compile_prog "-fcf-protection" "" ; then
+    if test "$cf_protection" = yes; then
+      feature_not_found "cf_protection" 'Control-flow protection is not supported by your toolchain'
+    fi
+    cf_protection=no
+  fi
+fi
+if test "$cf_protection" = ""; then
+  cf_protection=yes
+fi
+if test "$cf_protection" = "yes"; then
+  QEMU_CFLAGS="-fcf-protection $QEMU_CFLAGS"
+fi
+
 ##########################################
 # check and set a backend for coroutine
 
@@ -6361,6 +6386,7 @@ echo "netmap support    $netmap"
 echo "Linux AIO support $linux_aio"
 echo "ATTR/XATTR support $attr"
 echo "Install blobs     $blobs"
+echo "CF protection     $cf_protection"
 echo "KVM support       $kvm"
 echo "HAX support       $hax"
 echo "HVF support       $hvf"
@@ -6571,6 +6597,9 @@ fi
 if test "$profiler" = "yes" ; then
   echo "CONFIG_PROFILER=y" >> $config_host_mak
 fi
+if test "$cf_protection" = "yes" ; then
+  echo "CONFIG_CF_PROTECTION=y" >> $config_host_mak
+fi
 if test "$slirp" != "no"; then
   echo "CONFIG_SLIRP=y" >> $config_host_mak
   echo "CONFIG_SMBD_COMMAND=\"$smbd\"" >> $config_host_mak
diff --git a/util/Makefile.objs b/util/Makefile.objs
index 2167ffc862..d7add70b63 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -42,6 +42,11 @@ util-obj-y += coroutine-$(CONFIG_COROUTINE_BACKEND).o
 ifeq ($(ARCH),x86_64)
 coroutine-asm.o-cflags := -mno-red-zone
 endif
+ifeq ($(CONFIG_CF_PROTECTION),y)
+coroutine-sigaltstack.o-cflags := -fcf-protection=branch
+coroutine-ucontext.o-cflags := -fcf-protection=branch
+coroutine-asm.o-cflags += -fcf-protection=branch
+endif
 util-obj-y += buffer.o
 util-obj-y += timed-average.o
 util-obj-y += base64.o
-- 
2.21.0




^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH 7/9] tcg: add tcg_out_start
@ 2019-05-04 12:05   ` Paolo Bonzini
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2019-05-04 12:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, peter.maydell, cohuck, richard.henderson

This function is called at the beginning of any translation block.  We will
use it to emit ENDBR32 or ENDBR64 annotations for x86 CET.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tcg/aarch64/tcg-target.inc.c | 4 ++++
 tcg/arm/tcg-target.inc.c     | 4 ++++
 tcg/i386/tcg-target.inc.c    | 4 ++++
 tcg/mips/tcg-target.inc.c    | 4 ++++
 tcg/ppc/tcg-target.inc.c     | 4 ++++
 tcg/riscv/tcg-target.inc.c   | 4 ++++
 tcg/s390/tcg-target.inc.c    | 4 ++++
 tcg/sparc/tcg-target.inc.c   | 4 ++++
 tcg/tcg.c                    | 2 ++
 tcg/tci/tcg-target.inc.c     | 4 ++++
 10 files changed, 38 insertions(+)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index eefa929948..c66f3cb857 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -2539,6 +2539,10 @@ QEMU_BUILD_BUG_ON(FRAME_SIZE >= (1 << 14));
 /* We're expecting to use a single ADDI insn.  */
 QEMU_BUILD_BUG_ON(FRAME_SIZE - PUSH_SIZE > 0xfff);
 
+static inline void tcg_out_start(TCGContext *s)
+{
+}
+
 static void tcg_target_qemu_prologue(TCGContext *s)
 {
     TCGReg r;
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index abf0c444b4..8f919c7641 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -2284,6 +2284,10 @@ static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
     }
 }
 
+static inline void tcg_out_start(TCGContext *s)
+{
+}
+
 /* Compute frame size via macros, to share between tcg_target_qemu_prologue
    and tcg_register_jit.  */
 
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index d5ed9f1ffd..b210977800 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -3510,6 +3510,10 @@ static const int tcg_target_callee_save_regs[] = {
       + TCG_TARGET_STACK_ALIGN - 1) \
      & ~(TCG_TARGET_STACK_ALIGN - 1))
 
+static inline void tcg_out_start(TCGContext *s)
+{
+}
+
 /* Generate global QEMU prologue and epilogue code */
 static void tcg_target_qemu_prologue(TCGContext *s)
 {
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 412cacdcb9..2bb976a9a5 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -2471,6 +2471,10 @@ static tcg_insn_unit *align_code_ptr(TCGContext *s)
     return s->code_ptr;
 }
 
+static inline void tcg_out_start(TCGContext *s)
+{
+}
+
 /* Stack frame parameters.  */
 #define REG_SIZE   (TCG_TARGET_REG_BITS / 8)
 #define SAVE_SIZE  ((int)ARRAY_SIZE(tcg_target_callee_save_regs) * REG_SIZE)
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 36b4791707..f4efca8f7b 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -1904,6 +1904,10 @@ static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
     }
 }
 
+static inline void tcg_out_start(TCGContext *s)
+{
+}
+
 /* Parameters for function call generation, used in tcg.c.  */
 #define TCG_TARGET_STACK_ALIGN       16
 #define TCG_TARGET_EXTEND_ARGS       1
diff --git a/tcg/riscv/tcg-target.inc.c b/tcg/riscv/tcg-target.inc.c
index 2932505094..5780537b73 100644
--- a/tcg/riscv/tcg-target.inc.c
+++ b/tcg/riscv/tcg-target.inc.c
@@ -1798,6 +1798,10 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     }
 }
 
+static inline void tcg_out_start(TCGContext *s)
+{
+}
+
 static const int tcg_target_callee_save_regs[] = {
     TCG_REG_S0,       /* used for the global env (TCG_AREG0) */
     TCG_REG_S1,
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 3d6150b10e..924bd01afd 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -2499,6 +2499,10 @@ static void query_s390_facilities(void)
     }
 }
 
+static inline void tcg_out_start(TCGContext *s)
+{
+}
+
 static void tcg_target_init(TCGContext *s)
 {
     query_s390_facilities();
diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
index 7a61839dc1..f795e78153 100644
--- a/tcg/sparc/tcg-target.inc.c
+++ b/tcg/sparc/tcg-target.inc.c
@@ -1004,6 +1004,10 @@ static void build_trampolines(TCGContext *s)
 }
 #endif
 
+static inline void tcg_out_start(TCGContext *s)
+{
+}
+
 /* Generate global QEMU prologue and epilogue code */
 static void tcg_target_qemu_prologue(TCGContext *s)
 {
diff --git a/tcg/tcg.c b/tcg/tcg.c
index f7bef51de8..c8832c3ccf 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -101,6 +101,7 @@ static void tcg_register_jit_int(void *buf, size_t size,
 /* Forward declarations for functions declared and used in tcg-target.inc.c. */
 static const char *target_parse_constraint(TCGArgConstraint *ct,
                                            const char *ct_str, TCGType type);
+static void tcg_out_start(TCGContext *s);
 static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
                        intptr_t arg2);
 static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
@@ -3925,6 +3926,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
 #endif
 
     num_insns = -1;
+    tcg_out_start(s);
     QTAILQ_FOREACH(op, &s->ops, link) {
         TCGOpcode opc = op->opc;
 
diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
index 0015a98485..cb90012999 100644
--- a/tcg/tci/tcg-target.inc.c
+++ b/tcg/tci/tcg-target.inc.c
@@ -889,6 +889,10 @@ static void tcg_target_init(TCGContext *s)
                   CPU_TEMP_BUF_NLONGS * sizeof(long));
 }
 
+static inline void tcg_out_start(TCGContext *s)
+{
+}
+
 /* Generate global QEMU prologue and epilogue code. */
 static inline void tcg_target_qemu_prologue(TCGContext *s)
 {
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH 7/9] tcg: add tcg_out_start
@ 2019-05-04 12:05   ` Paolo Bonzini
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2019-05-04 12:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, cohuck, richard.henderson, qemu-block

This function is called at the beginning of any translation block.  We will
use it to emit ENDBR32 or ENDBR64 annotations for x86 CET.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 tcg/aarch64/tcg-target.inc.c | 4 ++++
 tcg/arm/tcg-target.inc.c     | 4 ++++
 tcg/i386/tcg-target.inc.c    | 4 ++++
 tcg/mips/tcg-target.inc.c    | 4 ++++
 tcg/ppc/tcg-target.inc.c     | 4 ++++
 tcg/riscv/tcg-target.inc.c   | 4 ++++
 tcg/s390/tcg-target.inc.c    | 4 ++++
 tcg/sparc/tcg-target.inc.c   | 4 ++++
 tcg/tcg.c                    | 2 ++
 tcg/tci/tcg-target.inc.c     | 4 ++++
 10 files changed, 38 insertions(+)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index eefa929948..c66f3cb857 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -2539,6 +2539,10 @@ QEMU_BUILD_BUG_ON(FRAME_SIZE >= (1 << 14));
 /* We're expecting to use a single ADDI insn.  */
 QEMU_BUILD_BUG_ON(FRAME_SIZE - PUSH_SIZE > 0xfff);
 
+static inline void tcg_out_start(TCGContext *s)
+{
+}
+
 static void tcg_target_qemu_prologue(TCGContext *s)
 {
     TCGReg r;
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index abf0c444b4..8f919c7641 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -2284,6 +2284,10 @@ static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
     }
 }
 
+static inline void tcg_out_start(TCGContext *s)
+{
+}
+
 /* Compute frame size via macros, to share between tcg_target_qemu_prologue
    and tcg_register_jit.  */
 
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index d5ed9f1ffd..b210977800 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -3510,6 +3510,10 @@ static const int tcg_target_callee_save_regs[] = {
       + TCG_TARGET_STACK_ALIGN - 1) \
      & ~(TCG_TARGET_STACK_ALIGN - 1))
 
+static inline void tcg_out_start(TCGContext *s)
+{
+}
+
 /* Generate global QEMU prologue and epilogue code */
 static void tcg_target_qemu_prologue(TCGContext *s)
 {
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 412cacdcb9..2bb976a9a5 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -2471,6 +2471,10 @@ static tcg_insn_unit *align_code_ptr(TCGContext *s)
     return s->code_ptr;
 }
 
+static inline void tcg_out_start(TCGContext *s)
+{
+}
+
 /* Stack frame parameters.  */
 #define REG_SIZE   (TCG_TARGET_REG_BITS / 8)
 #define SAVE_SIZE  ((int)ARRAY_SIZE(tcg_target_callee_save_regs) * REG_SIZE)
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 36b4791707..f4efca8f7b 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -1904,6 +1904,10 @@ static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
     }
 }
 
+static inline void tcg_out_start(TCGContext *s)
+{
+}
+
 /* Parameters for function call generation, used in tcg.c.  */
 #define TCG_TARGET_STACK_ALIGN       16
 #define TCG_TARGET_EXTEND_ARGS       1
diff --git a/tcg/riscv/tcg-target.inc.c b/tcg/riscv/tcg-target.inc.c
index 2932505094..5780537b73 100644
--- a/tcg/riscv/tcg-target.inc.c
+++ b/tcg/riscv/tcg-target.inc.c
@@ -1798,6 +1798,10 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     }
 }
 
+static inline void tcg_out_start(TCGContext *s)
+{
+}
+
 static const int tcg_target_callee_save_regs[] = {
     TCG_REG_S0,       /* used for the global env (TCG_AREG0) */
     TCG_REG_S1,
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 3d6150b10e..924bd01afd 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -2499,6 +2499,10 @@ static void query_s390_facilities(void)
     }
 }
 
+static inline void tcg_out_start(TCGContext *s)
+{
+}
+
 static void tcg_target_init(TCGContext *s)
 {
     query_s390_facilities();
diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
index 7a61839dc1..f795e78153 100644
--- a/tcg/sparc/tcg-target.inc.c
+++ b/tcg/sparc/tcg-target.inc.c
@@ -1004,6 +1004,10 @@ static void build_trampolines(TCGContext *s)
 }
 #endif
 
+static inline void tcg_out_start(TCGContext *s)
+{
+}
+
 /* Generate global QEMU prologue and epilogue code */
 static void tcg_target_qemu_prologue(TCGContext *s)
 {
diff --git a/tcg/tcg.c b/tcg/tcg.c
index f7bef51de8..c8832c3ccf 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -101,6 +101,7 @@ static void tcg_register_jit_int(void *buf, size_t size,
 /* Forward declarations for functions declared and used in tcg-target.inc.c. */
 static const char *target_parse_constraint(TCGArgConstraint *ct,
                                            const char *ct_str, TCGType type);
+static void tcg_out_start(TCGContext *s);
 static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
                        intptr_t arg2);
 static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
@@ -3925,6 +3926,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
 #endif
 
     num_insns = -1;
+    tcg_out_start(s);
     QTAILQ_FOREACH(op, &s->ops, link) {
         TCGOpcode opc = op->opc;
 
diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
index 0015a98485..cb90012999 100644
--- a/tcg/tci/tcg-target.inc.c
+++ b/tcg/tci/tcg-target.inc.c
@@ -889,6 +889,10 @@ static void tcg_target_init(TCGContext *s)
                   CPU_TEMP_BUF_NLONGS * sizeof(long));
 }
 
+static inline void tcg_out_start(TCGContext *s)
+{
+}
+
 /* Generate global QEMU prologue and epilogue code. */
 static inline void tcg_target_qemu_prologue(TCGContext *s)
 {
-- 
2.21.0




^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH 8/9] tcg/i386: add support for IBT
@ 2019-05-04 12:05   ` Paolo Bonzini
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2019-05-04 12:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, peter.maydell, cohuck, richard.henderson

Add endbr annotations before indirect branch targets.  This lets QEMU enable
IBT even for TCG-enabled builds.
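
The 32-bit constants passed to tcg_out32() below are just the ENDBR64 and
ENDBR32 encodings stored little-endian.  Spelled out as bytes (illustration
only, these arrays are not part of the patch):

    #include <stdint.h>

    static const uint8_t endbr64[4] = { 0xf3, 0x0f, 0x1e, 0xfa };  /* 0xfa1e0ff3 */
    static const uint8_t endbr32[4] = { 0xf3, 0x0f, 0x1e, 0xfb };  /* 0xfb1e0ff3 */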

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Makefile.target           |  2 ++
 configure                 |  9 +++++++++
 include/qemu/cpuid.h      |  5 +++++
 tcg/i386/tcg-target.inc.c | 19 +++++++++++++++++++
 4 files changed, 35 insertions(+)

diff --git a/Makefile.target b/Makefile.target
index 667682779b..d168ee7555 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -112,8 +112,10 @@ obj-y += accel/
 obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-op.o tcg/tcg-op-vec.o tcg/tcg-op-gvec.o
 obj-$(CONFIG_TCG) += tcg/tcg-common.o tcg/optimize.o
 ifeq ($(CONFIG_CF_PROTECTION),y)
+ifneq ($(CONFIG_CF_PROTECTION_TCG),y)
 tcg/tcg.o-cflags := -fcf-protection=return
 endif
+endif
 obj-$(CONFIG_TCG_INTERPRETER) += tcg/tci.o
 obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o
 obj-$(CONFIG_TCG) += fpu/softfloat.o
diff --git a/configure b/configure
index 946ff7825a..c02a5f4b79 100755
--- a/configure
+++ b/configure
@@ -5200,6 +5200,11 @@ fi
 if test "$cf_protection" = "yes"; then
   QEMU_CFLAGS="-fcf-protection $QEMU_CFLAGS"
 fi
+if test "$cpu" = "x86_64"; then
+  cf_protection_tcg=yes
+else
+  cf_protection_tcg=no
+fi
 
 ##########################################
 # check and set a backend for coroutine
@@ -6395,6 +6400,7 @@ echo "TCG support       $tcg"
 if test "$tcg" = "yes" ; then
     echo "TCG debug enabled $debug_tcg"
     echo "TCG interpreter   $tcg_interpreter"
+    echo "TCG CF protection $cf_protection_tcg"
 fi
 echo "malloc trim support $malloc_trim"
 echo "RDMA support      $rdma"
@@ -6600,6 +6606,9 @@ fi
 if test "$cf_protection" = "yes" ; then
   echo "CONFIG_CF_PROTECTION=y" >> $config_host_mak
 fi
+if test "$cf_protection_tcg" = "yes" ; then
+  echo "CONFIG_CF_PROTECTION_TCG=y" >> $config_host_mak
+fi
 if test "$slirp" != "no"; then
   echo "CONFIG_SLIRP=y" >> $config_host_mak
   echo "CONFIG_SMBD_COMMAND=\"$smbd\"" >> $config_host_mak
diff --git a/include/qemu/cpuid.h b/include/qemu/cpuid.h
index 69301700bd..e32fb209f5 100644
--- a/include/qemu/cpuid.h
+++ b/include/qemu/cpuid.h
@@ -49,6 +49,11 @@
 #define bit_BMI2        (1 << 8)
 #endif
 
+/* Leaf 7, %edx */
+#ifndef bit_IBT
+#define bit_IBT         (1 << 20)
+#endif
+
 /* Leaf 0x80000001, %ecx */
 #ifndef bit_LZCNT
 #define bit_LZCNT       (1 << 5)
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index b210977800..cb3de2f7ac 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -159,10 +159,12 @@ bool have_avx2;
 static bool have_movbe;
 static bool have_bmi2;
 static bool have_lzcnt;
+static bool have_ibt;
 #else
 # define have_movbe 0
 # define have_bmi2 0
 # define have_lzcnt 0
+# define have_ibt 1
 #endif
 
 static tcg_insn_unit *tb_ret_addr;
@@ -809,6 +811,19 @@ static inline void tgen_arithr(TCGContext *s, int subop, int dest, int src)
     tcg_out_modrm(s, OPC_ARITH_GvEv + (subop << 3) + ext, dest, src);
 }
 
+static void tcg_out_endbr(TCGContext *s)
+{
+    if (have_ibt) {
+#if defined __CET__ && (__CET__ & 1)
+#ifdef __x86_64__
+        tcg_out32(s, 0xfa1e0ff3);
+#else
+        tcg_out32(s, 0xfb1e0ff3);
+#endif
+#endif
+    }
+}
+
 static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
 {
     int rexw = 0;
@@ -3512,6 +3527,7 @@ static const int tcg_target_callee_save_regs[] = {
 
 static inline void tcg_out_start(TCGContext *s)
 {
+    tcg_out_endbr(s);
 }
 
 /* Generate global QEMU prologue and epilogue code */
@@ -3520,6 +3536,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
     int i, stack_addend;
 
     /* TB prologue */
+    tcg_out_endbr(s);
 
     /* Reserve some stack space, also for TCG temps.  */
     stack_addend = FRAME_SIZE - PUSH_SIZE;
@@ -3566,6 +3583,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
      * and fall through to the rest of the epilogue.
      */
     s->code_gen_epilogue = s->code_ptr;
+    tcg_out_endbr(s);
     tcg_out_movi(s, TCG_TYPE_REG, TCG_REG_EAX, 0);
 
     /* TB epilogue */
@@ -3598,6 +3616,7 @@ static void tcg_target_init(TCGContext *s)
         __cpuid_count(7, 0, a, b7, c, d);
         have_bmi1 = (b7 & bit_BMI) != 0;
         have_bmi2 = (b7 & bit_BMI2) != 0;
+        have_ibt = (d & bit_IBT) != 0;
     }
 
     if (max >= 1) {
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH 8/9] tcg/i386: add support for IBT
@ 2019-05-04 12:05   ` Paolo Bonzini
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2019-05-04 12:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, cohuck, richard.henderson, qemu-block

Add endbr annotations before indirect branch targets.  This lets QEMU enable
IBT even for TCG-enabled builds.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Makefile.target           |  2 ++
 configure                 |  9 +++++++++
 include/qemu/cpuid.h      |  5 +++++
 tcg/i386/tcg-target.inc.c | 19 +++++++++++++++++++
 4 files changed, 35 insertions(+)

diff --git a/Makefile.target b/Makefile.target
index 667682779b..d168ee7555 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -112,8 +112,10 @@ obj-y += accel/
 obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-op.o tcg/tcg-op-vec.o tcg/tcg-op-gvec.o
 obj-$(CONFIG_TCG) += tcg/tcg-common.o tcg/optimize.o
 ifeq ($(CONFIG_CF_PROTECTION),y)
+ifneq ($(CONFIG_CF_PROTECTION_TCG),y)
 tcg/tcg.o-cflags := -fcf-protection=return
 endif
+endif
 obj-$(CONFIG_TCG_INTERPRETER) += tcg/tci.o
 obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o
 obj-$(CONFIG_TCG) += fpu/softfloat.o
diff --git a/configure b/configure
index 946ff7825a..c02a5f4b79 100755
--- a/configure
+++ b/configure
@@ -5200,6 +5200,11 @@ fi
 if test "$cf_protection" = "yes"; then
   QEMU_CFLAGS="-fcf-protection $QEMU_CFLAGS"
 fi
+if test "$cpu" = "x86_64"; then
+  cf_protection_tcg=yes
+else
+  cf_protection_tcg=no
+fi
 
 ##########################################
 # check and set a backend for coroutine
@@ -6395,6 +6400,7 @@ echo "TCG support       $tcg"
 if test "$tcg" = "yes" ; then
     echo "TCG debug enabled $debug_tcg"
     echo "TCG interpreter   $tcg_interpreter"
+    echo "TCG CF protection $cf_protection_tcg"
 fi
 echo "malloc trim support $malloc_trim"
 echo "RDMA support      $rdma"
@@ -6600,6 +6606,9 @@ fi
 if test "$cf_protection" = "yes" ; then
   echo "CONFIG_CF_PROTECTION=y" >> $config_host_mak
 fi
+if test "$cf_protection_tcg" = "yes" ; then
+  echo "CONFIG_CF_PROTECTION_TCG=y" >> $config_host_mak
+fi
 if test "$slirp" != "no"; then
   echo "CONFIG_SLIRP=y" >> $config_host_mak
   echo "CONFIG_SMBD_COMMAND=\"$smbd\"" >> $config_host_mak
diff --git a/include/qemu/cpuid.h b/include/qemu/cpuid.h
index 69301700bd..e32fb209f5 100644
--- a/include/qemu/cpuid.h
+++ b/include/qemu/cpuid.h
@@ -49,6 +49,11 @@
 #define bit_BMI2        (1 << 8)
 #endif
 
+/* Leaf 7, %edx */
+#ifndef bit_IBT
+#define bit_IBT         (1 << 20)
+#endif
+
 /* Leaf 0x80000001, %ecx */
 #ifndef bit_LZCNT
 #define bit_LZCNT       (1 << 5)
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index b210977800..cb3de2f7ac 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -159,10 +159,12 @@ bool have_avx2;
 static bool have_movbe;
 static bool have_bmi2;
 static bool have_lzcnt;
+static bool have_ibt;
 #else
 # define have_movbe 0
 # define have_bmi2 0
 # define have_lzcnt 0
+# define have_ibt 1
 #endif
 
 static tcg_insn_unit *tb_ret_addr;
@@ -809,6 +811,19 @@ static inline void tgen_arithr(TCGContext *s, int subop, int dest, int src)
     tcg_out_modrm(s, OPC_ARITH_GvEv + (subop << 3) + ext, dest, src);
 }
 
+static void tcg_out_endbr(TCGContext *s)
+{
+    if (have_ibt) {
+#if defined __CET__ && (__CET__ & 1)
+#ifdef __x86_64__
+        tcg_out32(s, 0xfa1e0ff3);
+#else
+        tcg_out32(s, 0xfb1e0ff3);
+#endif
+#endif
+    }
+}
+
 static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
 {
     int rexw = 0;
@@ -3512,6 +3527,7 @@ static const int tcg_target_callee_save_regs[] = {
 
 static inline void tcg_out_start(TCGContext *s)
 {
+    tcg_out_endbr(s);
 }
 
 /* Generate global QEMU prologue and epilogue code */
@@ -3520,6 +3536,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
     int i, stack_addend;
 
     /* TB prologue */
+    tcg_out_endbr(s);
 
     /* Reserve some stack space, also for TCG temps.  */
     stack_addend = FRAME_SIZE - PUSH_SIZE;
@@ -3566,6 +3583,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
      * and fall through to the rest of the epilogue.
      */
     s->code_gen_epilogue = s->code_ptr;
+    tcg_out_endbr(s);
     tcg_out_movi(s, TCG_TYPE_REG, TCG_REG_EAX, 0);
 
     /* TB epilogue */
@@ -3598,6 +3616,7 @@ static void tcg_target_init(TCGContext *s)
         __cpuid_count(7, 0, a, b7, c, d);
         have_bmi1 = (b7 & bit_BMI) != 0;
         have_bmi2 = (b7 & bit_BMI2) != 0;
+        have_ibt = (d & bit_IBT) != 0;
     }
 
     if (max >= 1) {
-- 
2.21.0




^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH 9/9] linux-user: add IBT support to x86 safe-syscall.S
@ 2019-05-04 12:05   ` Paolo Bonzini
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2019-05-04 12:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, peter.maydell, cohuck, richard.henderson

Because safe-syscall.S does not go through the C compiler, the
.note.gnu.property note has to be added manually.  Safe syscalls do not
involve any indirect branch or stack unwinding, so they are trivially
safe for IBT or shadow stacks.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 linux-user/host/i386/safe-syscall.inc.S   | 19 +++++++++++++++++++
 linux-user/host/x86_64/safe-syscall.inc.S | 19 +++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/linux-user/host/i386/safe-syscall.inc.S b/linux-user/host/i386/safe-syscall.inc.S
index 9e58fc6504..6c6d568d62 100644
--- a/linux-user/host/i386/safe-syscall.inc.S
+++ b/linux-user/host/i386/safe-syscall.inc.S
@@ -98,3 +98,22 @@ safe_syscall_end:
 	.cfi_endproc
 
 	.size	safe_syscall_base, .-safe_syscall_base
+
+	.pushsection ".note.gnu.property", "a"
+	.p2align 2
+	.long 1f - 0f          /* name length.  */
+	.long 4f - 1f          /* data length.  */
+	.long 5                /* NT_GNU_PROPERTY_TYPE_0.  */
+0:
+	.asciz "GNU"           /* vendor name.  */
+	.p2align 2
+1:
+        /* GNU_PROPERTY_X86_FEATURE_1_AND.  */
+	.long 0xc0000002       /* pr_type.  */
+	.long 3f - 2f          /* pr_datasz.  */
+2:
+	.long 0x3              /* IBT, SHSTK */
+3:
+	.p2align 2
+4:
+	.popsection
diff --git a/linux-user/host/x86_64/safe-syscall.inc.S b/linux-user/host/x86_64/safe-syscall.inc.S
index f36992daa3..e1a57db338 100644
--- a/linux-user/host/x86_64/safe-syscall.inc.S
+++ b/linux-user/host/x86_64/safe-syscall.inc.S
@@ -89,3 +89,22 @@ safe_syscall_end:
         .cfi_endproc
 
         .size   safe_syscall_base, .-safe_syscall_base
+
+	.pushsection ".note.gnu.property", "a"
+	.p2align 3
+	.long 1f - 0f          /* name length.  */
+	.long 4f - 1f          /* data length.  */
+	.long 5                /* NT_GNU_PROPERTY_TYPE_0.  */
+0:
+	.asciz "GNU"           /* vendor name.  */
+	.p2align 3
+1:
+        /* GNU_PROPERTY_X86_FEATURE_1_AND.  */
+	.long 0xc0000002       /* pr_type.  */
+	.long 3f - 2f          /* pr_datasz.  */
+2:
+	.long 0x3              /* IBT, SHSTK */
+3:
+	.p2align 3
+4:
+	.popsection
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH 9/9] linux-user: add IBT support to x86 safe-syscall.S
@ 2019-05-04 12:05   ` Paolo Bonzini
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2019-05-04 12:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, cohuck, richard.henderson, qemu-block

Because safe-syscall.S does not go through the C compiler, the
.note.gnu.property note has to be added manually.  Safe syscalls do not
involve any indirect branch or stack unwinding, so they are trivially
safe for IBT or shadow stacks.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 linux-user/host/i386/safe-syscall.inc.S   | 19 +++++++++++++++++++
 linux-user/host/x86_64/safe-syscall.inc.S | 19 +++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/linux-user/host/i386/safe-syscall.inc.S b/linux-user/host/i386/safe-syscall.inc.S
index 9e58fc6504..6c6d568d62 100644
--- a/linux-user/host/i386/safe-syscall.inc.S
+++ b/linux-user/host/i386/safe-syscall.inc.S
@@ -98,3 +98,22 @@ safe_syscall_end:
 	.cfi_endproc
 
 	.size	safe_syscall_base, .-safe_syscall_base
+
+	.pushsection ".note.gnu.property", "a"
+	.p2align 2
+	.long 1f - 0f          /* name length.  */
+	.long 4f - 1f          /* data length.  */
+	.long 5                /* NT_GNU_PROPERTY_TYPE_0.  */
+0:
+	.asciz "GNU"           /* vendor name.  */
+	.p2align 2
+1:
+        /* GNU_PROPERTY_X86_FEATURE_1_AND.  */
+	.long 0xc0000002       /* pr_type.  */
+	.long 3f - 2f          /* pr_datasz.  */
+2:
+	.long 0x3              /* IBT, SHSTK */
+3:
+	.p2align 2
+4:
+	.popsection
diff --git a/linux-user/host/x86_64/safe-syscall.inc.S b/linux-user/host/x86_64/safe-syscall.inc.S
index f36992daa3..e1a57db338 100644
--- a/linux-user/host/x86_64/safe-syscall.inc.S
+++ b/linux-user/host/x86_64/safe-syscall.inc.S
@@ -89,3 +89,22 @@ safe_syscall_end:
         .cfi_endproc
 
         .size   safe_syscall_base, .-safe_syscall_base
+
+	.pushsection ".note.gnu.property", "a"
+	.p2align 3
+	.long 1f - 0f          /* name length.  */
+	.long 4f - 1f          /* data length.  */
+	.long 5                /* NT_GNU_PROPERTY_TYPE_0.  */
+0:
+	.asciz "GNU"           /* vendor name.  */
+	.p2align 3
+1:
+        /* GNU_PROPERTY_X86_FEATURE_1_AND.  */
+	.long 0xc0000002       /* pr_type.  */
+	.long 3f - 2f          /* pr_datasz.  */
+2:
+	.long 0x3              /* IBT, SHSTK */
+3:
+	.p2align 3
+4:
+	.popsection
-- 
2.21.0




^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH 10/9] coroutine-asm: add x86 CET shadow stack support
@ 2019-05-04 12:05   ` Paolo Bonzini
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2019-05-04 12:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, peter.maydell, cohuck, richard.henderson

Note that the ABI is not yet part of Linux; this patch is
not intended to be committed until that is approved.
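
As a reading aid, the simplified RSTORSSP/SAVEPREVSSP semantics documented
in the block comment added to coroutine-asm.c can be written as a small C
model.  It is purely illustrative, not real CET code; "ssp" stands for the
shadow stack pointer register:

    #include <stdint.h>

    static uint64_t *ssp;                        /* models the SSP register */

    static uint64_t shadow_stack_pop(void)
    {
        return *ssp++;                           /* read the top entry, pop it */
    }

    static void rstorssp(uint64_t *mem)          /* RSTORSSP(mem) */
    {
        uint64_t *old_ssp = ssp;
        ssp = mem;                               /* switch to the new shadow stack */
        *ssp = (uint64_t)old_ssp;                /* old SSP replaces the restore token */
    }

    static void saveprevssp(void)                /* SAVEPREVSSP */
    {
        uint64_t old_ssp = shadow_stack_pop();
        *(uint64_t *)(old_ssp - 8) = old_ssp;    /* "push" to the old shadow stack */
    }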

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 configure            | 14 ++++++++
 util/Makefile.objs   |  2 ++
 util/coroutine-asm.c | 82 ++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index c02a5f4b79..8e81d08ef1 100755
--- a/configure
+++ b/configure
@@ -5192,6 +5192,20 @@ if test "$cf_protection" != no; then
       feature_not_found "cf_protection" 'Control-flow protection is not supported by your toolchain'
     fi
     cf_protection=no
+  else
+    if test $cpu = x86_64; then
+      # only needed by coroutine-asm.c, however it should be rare to have
+      # CET support in the compiler but not in binutils
+      cat > $TMPC << EOF
+int main(void) { asm("rdsspq %%rax" : : : "rax"); }
+EOF
+      if ! compile_prog "" "" ; then
+        if test "$cf_protection" = yes; then
+          feature_not_found "cf_protection" 'CET is not supported by your toolchain'
+        fi
+        cf_protection=no
+      fi
+    fi
   fi
 fi
 if test "$cf_protection" = ""; then
diff --git a/util/Makefile.objs b/util/Makefile.objs
index d7add70b63..cf08b4d1c4 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -45,8 +45,10 @@ endif
 ifeq ($(CONFIG_CF_PROTECTION),y)
 coroutine-sigaltstack.o-cflags := -fcf-protection=branch
 coroutine-ucontext.o-cflags := -fcf-protection=branch
+ifneq ($(ARCH),x86_64)
 coroutine-asm.o-cflags += -fcf-protection=branch
 endif
+endif
 util-obj-y += buffer.o
 util-obj-y += timed-average.o
 util-obj-y += base64.o
diff --git a/util/coroutine-asm.c b/util/coroutine-asm.c
index a9a80e9c71..01875acfc4 100644
--- a/util/coroutine-asm.c
+++ b/util/coroutine-asm.c
@@ -22,6 +22,13 @@
 #include "qemu/osdep.h"
 #include "qemu-common.h"
 #include "qemu/coroutine_int.h"
+#include "qemu/error-report.h"
+
+#ifdef CONFIG_CF_PROTECTION
+#include <asm/prctl.h>
+#include <sys/prctl.h>
+int arch_prctl(int code, unsigned long addr);
+#endif
 
 #ifdef CONFIG_VALGRIND_H
 #include <valgrind/valgrind.h>
@@ -42,12 +49,16 @@ typedef struct {
 
     /*
      * aarch64, s390x: instruction pointer
+     * x86: shadow stack pointer
      */
     void *scratch;
 
     void *stack;
     size_t stack_size;
 
+    /* x86: CET shadow stack */
+    void *sstack;
+    size_t sstack_size;
 #ifdef CONFIG_VALGRIND_H
     unsigned int valgrind_stack_id;
 #endif
@@ -82,6 +93,35 @@ static void start_switch_fiber(void **fake_stack_save,
 #endif
 }
 
+static bool have_sstack(void)
+{
+#if defined CONFIG_CF_PROTECTION && defined __x86_64__
+    uint64_t ssp;
+    asm ("xor %0, %0; rdsspq %0\n" : "=r" (ssp));
+    return !!ssp;
+#else
+    return 0;
+#endif
+}
+
+static void *alloc_sstack(size_t sz)
+{
+#if defined CONFIG_CF_PROTECTION && defined __x86_64__
+#ifndef ARCH_X86_CET_ALLOC_SHSTK
+#define ARCH_X86_CET_ALLOC_SHSTK 0x3004
+#endif
+
+    uint64_t arg = sz;
+    if (arch_prctl(ARCH_X86_CET_ALLOC_SHSTK, (unsigned long) &arg) < 0) {
+        abort();
+    }
+
+    return (void *)arg;
+#else
+    abort();
+#endif
+}
+
 #ifdef __x86_64__
 /*
  * We hardcode all operands to specific registers so that we can write down all the
@@ -93,6 +133,26 @@ static void start_switch_fiber(void **fake_stack_save,
  * Note that push and call would clobber the red zone.  Makefile.objs compiles this
  * file with -mno-red-zone.  The alternative is to subtract/add 128 bytes from rsp
  * around the switch, with slightly lower cache performance.
+ *
+ * The RSTORSSP and SAVEPREVSSP instructions are intricate.  In a nutshell they are:
+ *
+ *      RSTORSSP(mem):    oldSSP = SSP
+ *                        SSP = mem
+ *                        *SSP = oldSSP
+ *
+ *      SAVEPREVSSP:      oldSSP = shadow_stack_pop()
+ *                        *(oldSSP - 8) = oldSSP       # "push" to old shadow stack
+ *
+ * Therefore, RSTORSSP(mem) followed by SAVEPREVSSP is the same as
+ *
+ *     shadow_stack_push(SSP)
+ *     SSP = mem
+ *     shadow_stack_pop()
+ *
+ * From the simplified description you can see that co->ssp, being stored before
+ * the RSTORSSP+SAVEPREVSSP sequence, points to the top actual entry of the shadow
+ * stack, not to the restore token.  Hence we use an offset of -8 in the operand
+ * of rstorssp.
  */
 #define CO_SWITCH(from, to, action, jump) ({                                          \
     int action_ = action;                                                             \
@@ -105,7 +165,15 @@ static void start_switch_fiber(void **fake_stack_save,
         "jmp 2f\n"                          /* switch back continues at label 2 */    \
                                                                                       \
         "1: .cfi_adjust_cfa_offset 8\n"                                               \
-        "movq %%rsp, %c[SP](%[FROM])\n"     /* save source SP */                      \
+        "xor %%rbp, %%rbp\n"                /* use old frame pointer as scratch reg */ \
+        "rdsspq %%rbp\n"                                                              \
+        "test %%rbp, %%rbp\n"               /* if CET is enabled... */                \
+        "jz 9f\n"                                                                     \
+        "movq %%rbp, %c[SCRATCH](%[FROM])\n" /* ... save source shadow SP, */         \
+        "movq %c[SCRATCH](%[TO]), %%rbp\n"   /* restore destination shadow stack, */  \
+        "rstorssp -8(%%rbp)\n"                                                        \
+        "saveprevssp\n"                     /* and save source shadow SP token */     \
+        "9: movq %%rsp, %c[SP](%[FROM])\n"  /* save source SP */                      \
         "movq %c[SP](%[TO]), %%rsp\n"       /* load destination SP */                 \
         jump "\n"                           /* coroutine switch */                    \
                                                                                       \
@@ -113,7 +181,8 @@ static void start_switch_fiber(void **fake_stack_save,
         "popq %%rbp\n"                                                                \
         ".cfi_adjust_cfa_offset -8\n"                                                 \
         : "+a" (action_), [FROM] "+b" (from_), [TO] "+D" (to_)                        \
-        : [SP] "i" (offsetof(CoroutineAsm, sp))                                       \
+        : [SP] "i" (offsetof(CoroutineAsm, sp)),                                      \
+          [SCRATCH] "i" (offsetof(CoroutineAsm, scratch))                             \
         : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",  \
           "memory");                                                                  \
     action_;                                                                          \
@@ -220,6 +289,12 @@ Coroutine *qemu_coroutine_new(void)
     co->stack = qemu_alloc_stack(&co->stack_size);
     co->sp = co->stack + co->stack_size;
 
+    if (have_sstack()) {
+        co->sstack_size = COROUTINE_SHADOW_STACK_SIZE;
+        co->sstack = alloc_sstack(co->sstack_size);
+        co->scratch = co->sstack + co->sstack_size;
+    }
+
 #ifdef CONFIG_VALGRIND_H
     co->valgrind_stack_id =
         VALGRIND_STACK_REGISTER(co->stack, co->stack + co->stack_size);
@@ -265,6 +340,9 @@ void qemu_coroutine_delete(Coroutine *co_)
 #endif
 
     qemu_free_stack(co->stack, co->stack_size);
+    if (co->sstack) {
+        munmap(co->sstack, co->sstack_size);
+    }
     g_free(co);
 }
 
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 37+ messages in thread
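
To make the -8 offset above concrete, the simplified RSTORSSP/SAVEPREVSSP
semantics from the comment can be transliterated into a small C model (a
sketch only: the shadow stack here is ordinary memory and the helpers are
illustrative, not the real CET instructions):

    #include <stdint.h>

    /* Simulated shadow stack: grows down, ssp points at the top entry. */
    static uint64_t *ssp;

    static void shadow_stack_push(uint64_t val) { *--ssp = val; }
    static uint64_t shadow_stack_pop(void)      { return *ssp++; }

    /* RSTORSSP mem: switch SSP to mem, leaving the old SSP there as a token. */
    static void rstorssp(uint64_t *mem)
    {
        uint64_t old = (uint64_t)ssp;
        ssp = mem;
        *ssp = old;
    }

    /* SAVEPREVSSP: consume the token, then "push" it onto the old stack. */
    static void saveprevssp(void)
    {
        uint64_t old = shadow_stack_pop();
        *(uint64_t *)(old - 8) = old;
    }

In CO_SWITCH the value read by rdsspq (the address of the top live entry of
the source shadow stack) is what ends up in co->scratch; the restore token
left behind by saveprevssp sits one 8-byte slot below that address, which is
why the switch-back passes -8(%rbp) to rstorssp.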

* Re: [Qemu-devel] [PATCH 0/9] Assembly coroutine backend and x86 CET support
  2019-05-04 12:05 ` Paolo Bonzini
                   ` (10 preceding siblings ...)
  (?)
@ 2019-05-05 15:41 ` Alex Bennée
  2019-05-09 13:44   ` Peter Maydell
  -1 siblings, 1 reply; 37+ messages in thread
From: Alex Bennée @ 2019-05-05 15:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, cohuck, richard.henderson, qemu-block


Paolo Bonzini <pbonzini@redhat.com> writes:

> *** BLURB HERE ***

I assume there was going to be a bit more background here?

--
Alex Bennée

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 3/9] coroutine: add host specific coroutine backend for 64-bit x86
@ 2019-05-05 16:52     ` Richard Henderson
  0 siblings, 0 replies; 37+ messages in thread
From: Richard Henderson @ 2019-05-05 16:52 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: qemu-block, peter.maydell, cohuck

On 5/4/19 5:05 AM, Paolo Bonzini wrote:
> This backend is faster (100ns vs 150ns per switch on my laptop), but
> more importantly it makes it possible to add CET support to it.  Most of the
> code is actually not architecture specific.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  configure                        |  10 ++
>  scripts/qemugdb/coroutine.py     |   5 +-
>  scripts/qemugdb/coroutine_asm.py |  20 +++
>  util/Makefile.objs               |   1 +
>  util/coroutine-asm.c             | 230 +++++++++++++++++++++++++++++++
>  5 files changed, 264 insertions(+), 2 deletions(-)
>  create mode 100644 scripts/qemugdb/coroutine_asm.py
>  create mode 100644 util/coroutine-asm.c

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 4/9] coroutine: add host specific coroutine backend for 64-bit ARM
@ 2019-05-05 17:00     ` Richard Henderson
  0 siblings, 0 replies; 37+ messages in thread
From: Richard Henderson @ 2019-05-05 17:00 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: qemu-block, peter.maydell, cohuck

On 5/4/19 5:05 AM, Paolo Bonzini wrote:
> The speedup is similar to x86, 120 ns vs 180 ns on an APM Mustang.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  configure                        |  2 +-
>  scripts/qemugdb/coroutine_asm.py |  6 ++++-
>  util/Makefile.objs               |  2 ++
>  util/coroutine-asm.c             | 45 ++++++++++++++++++++++++++++++++
>  4 files changed, 53 insertions(+), 2 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

> +        "ldr x30, [x1, %[SCRATCH]]\n"    /* load destination PC */   \
> +        "ldr x1, [x1, %[SP]]\n"          /* load destination SP */   \
> +        "mov sp, x1\n"                                               \
> +        "br x30\n"                                                   \
> +        "2: \n"                                                      \

For future reference, "bti j" (aka hint #36) goes here,
for the aarch64 branch target identification extension.


r~

^ permalink raw reply	[flat|nested] 37+ messages in thread
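
In the macro's notation the landing pad would sit right after the label that
the br transfers to; an illustrative sketch of the amended tail (not part of
the posted patch; toolchains without BTI support could use the equivalent
"hint #36" encoding mentioned above):

            "ldr x30, [x1, %[SCRATCH]]\n"    /* load destination PC */   \
            "ldr x1, [x1, %[SP]]\n"          /* load destination SP */   \
            "mov sp, x1\n"                                               \
            "br x30\n"                                                   \
            "2: bti j\n"          /* landing pad: label 2 is reached via br */ \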

* Re: [Qemu-devel] [PATCH 5/9] coroutine: add host specific coroutine backend for 64-bit s390
@ 2019-05-05 17:10     ` Richard Henderson
  0 siblings, 0 replies; 37+ messages in thread
From: Richard Henderson @ 2019-05-05 17:10 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: qemu-block, peter.maydell, cohuck

On 5/4/19 5:05 AM, Paolo Bonzini wrote:
> +  "bras %%r3, 1f\n"                /* source PC will be after the BR */ \
> +  "1: aghi %%r3, 12\n"             /* 4 */                              \
> +  "stg %%r3, %[SCRATCH](%%r1)\n"   /* 6 save switch-back PC */          \
> +  "br %%r4\n"                      /* 2 jump to destination PC */       \

Better as

	larl	%r3, 2f
	stg	%r3, SCRATCH(%r1)
	br	%r4
2:


r~

^ permalink raw reply	[flat|nested] 37+ messages in thread
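
The suggested form is simpler because larl materializes the switch-back PC
directly as the PC-relative address of label 2, whereas the posted sequence
derives it from the bras return address plus a hand-counted 12 bytes
(aghi 4 + stg 6 + br 2).  In the patch's notation the two variants compare as
follows (an illustrative sketch, not the posted code verbatim):

      /* posted: switch-back PC computed from the bras return address */
      "bras %%r3, 1f\n"                /* r3 = address of the aghi below */   \
      "1: aghi %%r3, 12\n"             /* skip aghi(4) + stg(6) + br(2) */    \
      "stg %%r3, %[SCRATCH](%%r1)\n"   /* save switch-back PC */              \
      "br %%r4\n"                                                             \

      /* suggested: switch-back PC loaded PC-relative */
      "larl %%r3, 2f\n"                                                       \
      "stg %%r3, %[SCRATCH](%%r1)\n"                                          \
      "br %%r4\n"                                                             \
      "2:\n"                                                                  \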

* Re: [Qemu-devel] [PATCH 8/9] tcg/i386: add support for IBT
@ 2019-05-05 17:14     ` Richard Henderson
  0 siblings, 0 replies; 37+ messages in thread
From: Richard Henderson @ 2019-05-05 17:14 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: qemu-block, peter.maydell, cohuck

On 5/4/19 5:05 AM, Paolo Bonzini wrote:
> Add endbr annotations before indirect branch targets.  This lets QEMU enable
> IBT even for TCG-enabled builds.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  Makefile.target           |  2 ++
>  configure                 |  9 +++++++++
>  include/qemu/cpuid.h      |  5 +++++
>  tcg/i386/tcg-target.inc.c | 19 +++++++++++++++++++
>  4 files changed, 35 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 37+ messages in thread
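
For context, patch 7/9 adds a tcg_out_start hook that the series uses to emit
code at the start of each generated translation block; a rough sketch of how
the i386 backend might emit the IBT landing pad there (illustrative only:
tcg_out_endbr64 is a made-up helper name, tcg_out8 is TCG's existing byte
emitter, and the real patch presumably emits this only when CET/IBT is
enabled for the build):

    /* ENDBR64 (F3 0F 1E FA): IBT landing pad, executes as a NOP on
     * processors without CET. */
    static void tcg_out_endbr64(TCGContext *s)
    {
        tcg_out8(s, 0xf3);
        tcg_out8(s, 0x0f);
        tcg_out8(s, 0x1e);
        tcg_out8(s, 0xfa);
    }

    static void tcg_out_start(TCGContext *s)
    {
        /* Generated code can be entered via indirect jumps (TB lookup,
         * goto_ptr), so each entry point needs a landing pad. */
        tcg_out_endbr64(s);
    }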

* Re: [Qemu-devel] [PATCH 4/9] coroutine: add host specific coroutine backend for 64-bit ARM
  2019-05-04 12:05   ` Paolo Bonzini
  (?)
  (?)
@ 2019-05-09 13:15   ` Stefan Hajnoczi
  -1 siblings, 0 replies; 37+ messages in thread
From: Stefan Hajnoczi @ 2019-05-09 13:15 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: peter.maydell, cohuck, richard.henderson, qemu-devel, qemu-block


On Sat, May 04, 2019 at 06:05:22AM -0600, Paolo Bonzini wrote:
> diff --git a/configure b/configure
> index c01f57a3ae..26e62a4ab1 100755
> --- a/configure
> +++ b/configure
> @@ -5200,7 +5200,7 @@ fi
>  if test "$coroutine" = ""; then
>    if test "$mingw32" = "yes"; then
>      coroutine=win32
> -  elif test "$cpu" = "x86_64"; then
> +  elif test "$cpu" = "x86_64" || test "$cpu" = "aarch64"; then
>       coroutine=asm
>    elif test "$ucontext_works" = "yes"; then
>      coroutine=ucontext
> diff --git a/scripts/qemugdb/coroutine_asm.py b/scripts/qemugdb/coroutine_asm.py
> index b4ac1291db..181b77287b 100644
> --- a/scripts/qemugdb/coroutine_asm.py
> +++ b/scripts/qemugdb/coroutine_asm.py
> @@ -17,4 +17,8 @@ U64_PTR = gdb.lookup_type('uint64_t').pointer()
>  def get_coroutine_regs(addr):
>      addr = addr.cast(gdb.lookup_type('CoroutineAsm').pointer())
>      rsp = addr['sp'].cast(U64_PTR)
> -    return {'sp': rsp, 'pc': rsp.dereference()}
> +    arch = gdb.selected_frame().architecture().name().split(':')
> +    if arch[0] == 'i386' and arch[1] == 'x86-64':
> +        return {'rsp': rsp, 'pc': rsp.dereference()}

Before: sp
After: rsp

Is this a typo?  I thought we were using sp everywhere now.

> +    else:
> +        return {'sp': rsp, 'pc': addr['scratch'].cast(U64_PTR) }
> diff --git a/util/Makefile.objs b/util/Makefile.objs
> index 41a10539cf..2167ffc862 100644
> --- a/util/Makefile.objs
> +++ b/util/Makefile.objs
> @@ -39,7 +39,9 @@ util-obj-$(CONFIG_MEMBARRIER) += sys_membarrier.o
>  util-obj-y += qemu-coroutine.o qemu-coroutine-lock.o qemu-coroutine-io.o
>  util-obj-y += qemu-coroutine-sleep.o
>  util-obj-y += coroutine-$(CONFIG_COROUTINE_BACKEND).o
> +ifeq ($(ARCH),x86_64)
>  coroutine-asm.o-cflags := -mno-red-zone
> +endif

-mno-red-zone was mentioned in the previous patch.  Should this hunk be
moved there?


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 0/9] Assembly coroutine backend and x86 CET support
  2019-05-04 12:05 ` Paolo Bonzini
                   ` (11 preceding siblings ...)
  (?)
@ 2019-05-09 13:29 ` Stefan Hajnoczi
  -1 siblings, 0 replies; 37+ messages in thread
From: Stefan Hajnoczi @ 2019-05-09 13:29 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: peter.maydell, cohuck, richard.henderson, qemu-devel, qemu-block


On Sat, May 04, 2019 at 06:05:18AM -0600, Paolo Bonzini wrote:
> *** BLURB HERE ***
> 
> Paolo Bonzini (10):
>   qemugdb: allow adding support for other coroutine backends
>   qemugdb: allow adding support for other architectures
>   coroutine: add host specific coroutine backend for 64-bit x86
>   coroutine: add host specific coroutine backend for 64-bit ARM
>   coroutine: add host specific coroutine backend for 64-bit s390
>   configure: add control-flow protection support
>   tcg: add tcg_out_start
>   tcg/i386: add support for IBT
>   linux-user: add IBT support to x86 safe-syscall.S
>   coroutine-asm: add x86 CET shadow stack support
> 
>  Makefile.target                           |   5 +
>  configure                                 |  62 ++++
>  include/qemu/cpuid.h                      |   5 +
>  linux-user/host/i386/safe-syscall.inc.S   |  19 ++
>  linux-user/host/x86_64/safe-syscall.inc.S |  19 ++
>  scripts/qemugdb/coroutine.py              | 107 ++----
>  scripts/qemugdb/coroutine_asm.py          |  24 ++
>  scripts/qemugdb/coroutine_ucontext.py     |  69 ++++
>  tcg/aarch64/tcg-target.inc.c              |   4 +
>  tcg/arm/tcg-target.inc.c                  |   4 +
>  tcg/i386/tcg-target.inc.c                 |  23 ++
>  tcg/mips/tcg-target.inc.c                 |   4 +
>  tcg/ppc/tcg-target.inc.c                  |   4 +
>  tcg/riscv/tcg-target.inc.c                |   4 +
>  tcg/s390/tcg-target.inc.c                 |   4 +
>  tcg/sparc/tcg-target.inc.c                |   4 +
>  tcg/tcg.c                                 |   2 +
>  tcg/tci/tcg-target.inc.c                  |   4 +
>  util/Makefile.objs                        |  10 +
>  util/coroutine-asm.c                      | 387 ++++++++++++++++++++++
>  20 files changed, 689 insertions(+), 75 deletions(-)
>  create mode 100644 scripts/qemugdb/coroutine_asm.py
>  create mode 100644 scripts/qemugdb/coroutine_ucontext.py
>  create mode 100644 util/coroutine-asm.c
> 
> -- 
> 2.21.0
> 
> 

Aside from the comments I posted:
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 0/9] Assembly coroutine backend and x86 CET support
  2019-05-05 15:41 ` [Qemu-devel] [PATCH 0/9] Assembly coroutine backend and x86 CET support Alex Bennée
@ 2019-05-09 13:44   ` Peter Maydell
  2019-05-15  9:48     ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Maydell @ 2019-05-09 13:44 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Cornelia Huck, Richard Henderson, QEMU Developers, Qemu-block

On Sun, 5 May 2019 at 16:41, Alex Bennée <alex.bennee@linaro.org> wrote:
>
>
> Paolo Bonzini <pbonzini@redhat.com> writes:
>
> > *** BLURB HERE ***
>
> I assume there was going to be a bit more background here?

Mmm, could we have the rationale, please?

thanks
-- PMM


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 0/9] Assembly coroutine backend and x86 CET support
  2019-05-09 13:44   ` Peter Maydell
@ 2019-05-15  9:48     ` Stefan Hajnoczi
  2019-05-16 12:50       ` Peter Maydell
  0 siblings, 1 reply; 37+ messages in thread
From: Stefan Hajnoczi @ 2019-05-15  9:48 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Richard Henderson, Cornelia Huck, Alex Bennée,
	QEMU Developers, Qemu-block


On Thu, May 09, 2019 at 02:44:39PM +0100, Peter Maydell wrote:
> On Sun, 5 May 2019 at 16:41, Alex Bennée <alex.bennee@linaro.org> wrote:
> >
> >
> > Paolo Bonzini <pbonzini@redhat.com> writes:
> >
> > > *** BLURB HERE ***
> >
> > I assume there was going to be a bit more background here?
> 
> Mmm, could we have the rationale, please ?

Paolo can add more if necessary, but my understanding is:

1. It's required for Intel Control-flow Enforcement Technology (CET).
   The existing ucontext backend doesn't work with CET.
2. It's faster than the existing ucontext implementation.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 0/9] Assembly coroutine backend and x86 CET support
  2019-05-15  9:48     ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2019-05-16 12:50       ` Peter Maydell
  2019-05-22 10:02         ` Paolo Bonzini
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Maydell @ 2019-05-16 12:50 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Richard Henderson, Cornelia Huck, Alex Bennée,
	QEMU Developers, Qemu-block

On Wed, 15 May 2019 at 10:48, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>
> On Thu, May 09, 2019 at 02:44:39PM +0100, Peter Maydell wrote:
> > On Sun, 5 May 2019 at 16:41, Alex Bennée <alex.bennee@linaro.org> wrote:
> > >
> > >
> > > Paolo Bonzini <pbonzini@redhat.com> writes:
> > >
> > > > *** BLURB HERE ***
> > >
> > > I assume there was going to be a bit more background here?
> >
> > Mmm, could we have the rationale, please ?
>
> Paolo can add more if necessary, but my understanding is:
>
> 1. It's required for Intel Control-flow Enforcement Technology (CET).
>    The existing ucontext backend doesn't work with CET.
> 2. It's faster than the existing ucontext implementation.

Mmm, I think we've talked about 1 before, but I think it would
be useful to clearly state why we need to do things here.
It's also useful for identifying whether we need an asm
backend for every host, or only some hosts (and if so which).

I'm unconvinced by 2 as a rationale for adding more host asm.
Coroutines were already bad enough when they were at least
vaguely portable C code.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 0/9] Assembly coroutine backend and x86 CET support
  2019-05-16 12:50       ` Peter Maydell
@ 2019-05-22 10:02         ` Paolo Bonzini
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2019-05-22 10:02 UTC (permalink / raw)
  To: Peter Maydell, Stefan Hajnoczi
  Cc: Alex Bennée, Cornelia Huck, Richard Henderson,
	QEMU Developers, Qemu-block

On 16/05/19 14:50, Peter Maydell wrote:
> On Wed, 15 May 2019 at 10:48, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> 1. It's required for Intel Control-flow Enforcement Technology (CET).
>>    The existing ucontext backend doesn't work with CET.
>> 2. It's faster than the existing ucontext implementation.
> 
> Mmm, I think we've talked about 1 before, but I think it would
> be useful to clearly state why we need to do things here.

The reason is that, with CET enabled, setjmp and longjmp assume that
they are used only to unwind the stack and not to switch to a completely
different one.  You are supposed to use swapcontext for that, but it
doesn't work for QEMU coroutines because it saves/restores the signal
mask; that is not only slower, it's also incorrect: we want the signal mask
to be a property of the thread, not of the coroutine.

> It's also useful for identifying whether we need an asm
> backend for every host, or only some hosts (and if so which).

It's not needed for _any_ host (except x86 if you want CET support).  I
wrote these three backends to ensure that it could be ported without
much effort on any host.  If you prefer not having an aarch64 backend,
for example, I can leave it out.

Paolo


^ permalink raw reply	[flat|nested] 37+ messages in thread
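
The signal-mask point can be seen with a small standalone program (a sketch,
not QEMU code): whatever mask getcontext() captured when the coroutine was
created is re-imposed on every swapcontext() into it, and every switch pays
for a sigprocmask system call on top of the register save/restore.

    #include <signal.h>
    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, co_ctx;
    static char co_stack[64 * 1024];

    static void co_fn(void)
    {
        sigset_t cur;
        sigprocmask(SIG_BLOCK, NULL, &cur);
        /* Prints 0: the empty mask captured by getcontext() was restored,
         * silently dropping the SIGUSR1 block set up by the thread later. */
        printf("SIGUSR1 blocked inside coroutine: %d\n",
               sigismember(&cur, SIGUSR1));
    }

    int main(void)
    {
        getcontext(&co_ctx);                /* captures the current mask */
        co_ctx.uc_stack.ss_sp = co_stack;
        co_ctx.uc_stack.ss_size = sizeof(co_stack);
        co_ctx.uc_link = &main_ctx;         /* resume main when co_fn returns */
        makecontext(&co_ctx, co_fn, 0);

        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, SIGUSR1);
        sigprocmask(SIG_BLOCK, &set, NULL); /* thread-level change, made later */

        swapcontext(&main_ctx, &co_ctx);    /* costs a sigprocmask syscall */
        return 0;
    }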

end of thread, other threads:[~2019-05-22 10:08 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-04 12:05 [Qemu-devel] [PATCH 0/9] Assembly coroutine backend and x86 CET support Paolo Bonzini
2019-05-04 12:05 ` Paolo Bonzini
2019-05-04 12:05 ` [Qemu-devel] [PATCH 1/9] qemugdb: allow adding support for other coroutine backends Paolo Bonzini
2019-05-04 12:05   ` Paolo Bonzini
2019-05-04 12:05 ` [Qemu-devel] [PATCH 2/9] qemugdb: allow adding support for other architectures Paolo Bonzini
2019-05-04 12:05   ` Paolo Bonzini
2019-05-04 12:05 ` [Qemu-devel] [PATCH 3/9] coroutine: add host specific coroutine backend for 64-bit x86 Paolo Bonzini
2019-05-04 12:05   ` Paolo Bonzini
2019-05-05 16:52   ` Richard Henderson
2019-05-05 16:52     ` Richard Henderson
2019-05-04 12:05 ` [Qemu-devel] [PATCH 4/9] coroutine: add host specific coroutine backend for 64-bit ARM Paolo Bonzini
2019-05-04 12:05   ` Paolo Bonzini
2019-05-05 17:00   ` Richard Henderson
2019-05-05 17:00     ` Richard Henderson
2019-05-09 13:15   ` Stefan Hajnoczi
2019-05-04 12:05 ` [Qemu-devel] [PATCH 5/9] coroutine: add host specific coroutine backend for 64-bit s390 Paolo Bonzini
2019-05-04 12:05   ` Paolo Bonzini
2019-05-05 17:10   ` Richard Henderson
2019-05-05 17:10     ` Richard Henderson
2019-05-04 12:05 ` [Qemu-devel] [PATCH 6/9] configure: add control-flow protection support Paolo Bonzini
2019-05-04 12:05   ` Paolo Bonzini
2019-05-04 12:05 ` [Qemu-devel] [PATCH 7/9] tcg: add tcg_out_start Paolo Bonzini
2019-05-04 12:05   ` Paolo Bonzini
2019-05-04 12:05 ` [Qemu-devel] [PATCH 8/9] tcg/i386: add support for IBT Paolo Bonzini
2019-05-04 12:05   ` Paolo Bonzini
2019-05-05 17:14   ` Richard Henderson
2019-05-05 17:14     ` Richard Henderson
2019-05-04 12:05 ` [Qemu-devel] [PATCH 9/9] linux-user: add IBT support to x86 safe-syscall.S Paolo Bonzini
2019-05-04 12:05   ` Paolo Bonzini
2019-05-04 12:05 ` [Qemu-devel] [PATCH 10/9] coroutine-asm: add x86 CET shadow stack support Paolo Bonzini
2019-05-04 12:05   ` Paolo Bonzini
2019-05-05 15:41 ` [Qemu-devel] [PATCH 0/9] Assembly coroutine backend and x86 CET support Alex Bennée
2019-05-09 13:44   ` Peter Maydell
2019-05-15  9:48     ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2019-05-16 12:50       ` Peter Maydell
2019-05-22 10:02         ` Paolo Bonzini
2019-05-09 13:29 ` [Qemu-devel] " Stefan Hajnoczi
