* [Qemu-devel] [PATCH v2] security.rst: add Security Guide to developer docs
From: Stefan Hajnoczi @ 2019-04-25 13:35 UTC (permalink / raw)
To: qemu-devel
Cc: Philippe Mathieu-Daudé,
Daniel Berrange, Peter Maydell, Markus Armbruster, Paolo Bonzini,
Eduardo Otubo, Stefan Hajnoczi

At KVM Forum 2018 I gave a presentation on security in QEMU:
https://www.youtube.com/watch?v=YAdRf_hwxU8 (video)
https://vmsplice.net/~stefan/stefanha-kvm-forum-2018.pdf (slides)

This patch adds a security guide to the developer docs. This document
covers things that developers should know about security in QEMU. It is
just a starting point that we can expand on later. I hope it will be
useful as a resource for new contributors and will save code reviewers
from explaining the same concepts many times.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
v2:
* Added mention of passthrough USB and PCI devices [philmd]
* Reworded resource limits [philmd]
* Added qemu_log_mask(LOG_GUEST_ERROR) [philmd]
---
docs/devel/index.rst | 1 +
docs/devel/security.rst | 225 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 226 insertions(+)
create mode 100644 docs/devel/security.rst
diff --git a/docs/devel/index.rst b/docs/devel/index.rst
index ebbab636ce..fd0b5fa387 100644
--- a/docs/devel/index.rst
+++ b/docs/devel/index.rst
@@ -20,3 +20,4 @@ Contents:
stable-process
testing
decodetree
+ security
diff --git a/docs/devel/security.rst b/docs/devel/security.rst
new file mode 100644
index 0000000000..83c6fb2231
--- /dev/null
+++ b/docs/devel/security.rst
@@ -0,0 +1,225 @@
+==============
+Security Guide
+==============
+Overview
+--------
+This guide covers security topics relevant to developers working on QEMU. It
+explains the security requirements that QEMU upholds for its users, the
+architecture of the code, and secure coding practices.
+
+Security Requirements
+---------------------
+QEMU supports many different use cases, some of which have stricter security
+requirements than others. The community has agreed on the overall security
+requirements that users may depend on. These requirements define what is
+considered supported from a security perspective.
+
+Virtualization Use Case
+~~~~~~~~~~~~~~~~~~~~~~~
+The virtualization use case covers cloud and virtual private server (VPS)
+hosting, as well as traditional data center and desktop virtualization. These
+use cases rely on hardware virtualization extensions to execute guest code
+safely on the physical CPU at close-to-native speed.
+
+The following entities are **untrusted**, meaning that they may be buggy or
+malicious:
+
+* Guest
+* User-facing interfaces (e.g. VNC, SPICE, WebSocket)
+* Network protocols (e.g. NBD, live migration)
+* User-supplied files (e.g. disk images, kernels, device trees)
+* Passthrough devices (e.g. PCI, USB)
+
+Bugs affecting these entities are evaluated on whether they can cause damage in
+real-world use cases and treated as security bugs if this is the case.
+
+Non-virtualization Use Case
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The non-virtualization use case covers emulation using the Tiny Code Generator
+(TCG). In principle the TCG and device emulation code used in conjunction with
+the non-virtualization use case should meet the same security requirements as
+the virtualization use case. However, for historical reasons much of the
+non-virtualization use case code was not written with these security
+requirements in mind.
+
+Bugs affecting the non-virtualization use case are not considered security
+bugs at this time. Users with non-virtualization use cases must not rely on
+QEMU to provide guest isolation or any security guarantees.
+
+Architecture
+------------
+This section describes the design principles that ensure the security
+requirements are met.
+
+Guest Isolation
+~~~~~~~~~~~~~~~
+Guest isolation is the confinement of guest code to the virtual machine. When
+guest code gains control of execution on the host this is called escaping the
+virtual machine. Isolation also includes resource limits such as throttling of
+CPU, memory, disk, or network. Guests must be unable to exceed their resource
+limits.
+
+QEMU presents an attack surface to the guest in the form of emulated devices.
+The guest must not be able to gain control of QEMU. Bugs in emulated devices
+could allow malicious guests to gain code execution in QEMU. At this point the
+guest has escaped the virtual machine and is able to act in the context of the
+QEMU process on the host.
+
+Guests often interact with other guests and share resources with them. A
+malicious guest must not gain control of other guests or access their data.
+Disk image files and network traffic must be protected from other guests unless
+explicitly shared between them by the user.
+
+Principle of Least Privilege
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The principle of least privilege states that each component only has access to
+the privileges necessary for its function. In the case of QEMU this means that
+each process only has access to resources belonging to the guest.
+
+The QEMU process should not have access to any resources that are inaccessible
+to the guest. This way the guest does not gain anything by escaping into the
+QEMU process since it already has access to those same resources from within
+the guest.
+
+Following the principle of least privilege immediately fulfills guest isolation
+requirements. For example, guest A only has access to its own disk image file
+``a.img`` and not guest B's disk image file ``b.img``.
+
+In reality certain resources are inaccessible to the guest but must be
+available to QEMU to perform its function. For example, host system calls are
+necessary for QEMU but are not exposed to guests. A guest that escapes into
+the QEMU process can then begin invoking host system calls.
+
+New features must be designed to follow the principle of least privilege.
+Should this not be possible for technical reasons, the security risk must be
+clearly documented so users are aware of the trade-off of enabling the feature.
+
+Isolation Mechanisms
+~~~~~~~~~~~~~~~~~~~~
+Several isolation mechanisms are available to realize this architecture of
+guest isolation and the principle of least privilege. With the exception of
+Linux seccomp, these mechanisms are all deployed by management tools that
+launch QEMU, such as libvirt. They are also platform-specific so they are only
+described briefly for Linux here.
+
+The fundamental isolation mechanism is that QEMU processes must run as
+**unprivileged users**. Sometimes it seems more convenient to launch QEMU as
+root to give it access to host devices (e.g. ``/dev/net/tun``) but this poses a
+huge security risk. File descriptor passing can be used to give an otherwise
+unprivileged QEMU process access to host devices without running QEMU as root.
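+
+As a rough illustration of file descriptor passing on Linux, the sketch below
+sends an already-open descriptor over a UNIX domain socket with ``SCM_RIGHTS``.
+The helper names ``send_fd`` and ``recv_fd`` are hypothetical, and how a
+particular management tool hands descriptors to QEMU is not shown here:
+

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send descriptor fd over the UNIX socket sock. */
static int send_fd(int sock, int fd)
{
    char byte = 0; /* at least one byte of payload must accompany the fd */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align; /* forces correct alignment of buf */
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = u.buf,
        .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor from sock, or -1 on error. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = u.buf,
        .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg;
    int fd;

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1; /* no descriptor was attached to the message */
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

+
+A privileged launcher would open the host device, call ``send_fd()``, and close
+its own copy. The unprivileged receiver then uses the descriptor returned by
+``recv_fd()`` without ever needing permission to open the device itself.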
+
+**SELinux** and **AppArmor** make it possible to confine processes beyond the
+traditional UNIX process and file permissions model. They restrict the QEMU
+process from accessing processes and files on the host system that are not
+needed by QEMU.
+
+**Resource limits** and **cgroup controllers** provide throughput and utilization
+limits on key resources such as CPU time, memory, and I/O bandwidth.
+
+**Linux namespaces** can be used to make process, file system, and other system
+resources unavailable to QEMU. A namespaced QEMU process is restricted to only
+those resources that were granted to it.
+
+**Linux seccomp** is available via the QEMU ``--sandbox`` option. It disables
+system calls that are not needed by QEMU, thereby reducing the host kernel
+attack surface.
+
+Secure Coding Practices
+-----------------------
+At the source code level there are several points to keep in mind. Both
+developers and security researchers must be aware of them so that they can
+develop safe code and audit existing code properly.
+
+General Secure C Coding Practices
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Most CVEs (security bugs) reported against QEMU are not specific to
+virtualization or emulation. They are simply C programming bugs. Therefore
+it's critical to be aware of common classes of security bugs.
+
+There is a wide selection of resources available covering secure C coding. For
+example, the `CERT C Coding Standard
+<https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard>`_
+covers the most important classes of security bugs.
+
+Rather than describing them in detail, this guide only names the most
+important classes of security bugs:
+
+* Buffer overflows
+* Use-after-free and double-free
+* Integer overflows
+* Format string vulnerabilities
+
+Some of these classes of bugs can be detected by analyzers. Static analysis is
+performed regularly by Coverity, and compilers report the most obvious of these
+bugs. Dynamic analysis is possible with Valgrind, ThreadSanitizer (TSan), and
+AddressSanitizer (ASan).
+
+Input Validation
+~~~~~~~~~~~~~~~~
+Inputs from the guest or external sources (e.g. network, files) cannot be
+trusted and may be invalid. Inputs must be checked before using them in a way
+that could crash the program, expose host memory to the guest, or otherwise be
+exploitable by an attacker.
+
+The most sensitive attack surface is device emulation. All hardware register
+accesses and data read from guest memory must be validated. A typical example
+is a device that contains multiple units that are selectable by the guest via
+an index register::
+
+    typedef struct {
+        ProcessingUnit unit[2];
+        ...
+    } MyDeviceState;
+
+    static void mydev_writel(void *opaque, uint32_t addr, uint32_t val)
+    {
+        MyDeviceState *mydev = opaque;
+        ProcessingUnit *unit;
+
+        switch (addr) {
+        case MYDEV_SELECT_UNIT:
+            unit = &mydev->unit[val]; /* <-- this input wasn't validated! */
+            ...
+        }
+    }
+
+If ``val`` is not in range [0, 1] then an out-of-bounds memory access will take
+place when ``unit`` is dereferenced. The code must check that ``val`` is 0 or
+1 and handle the case where it is invalid.
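+
+A minimal sketch of the missing check is shown below. The ``ARRAY_SIZE`` macro,
+the ``MYDEV_SELECT_UNIT`` offset, the contents of ``ProcessingUnit``, and the
+``selected`` field are defined locally for illustration and are assumptions,
+not part of any real device model:
+

```c
#include <stddef.h>
#include <stdint.h>

/* Defined locally so the sketch builds outside the QEMU tree. */
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
#define MYDEV_SELECT_UNIT 0x0 /* hypothetical register offset */

typedef struct {
    uint32_t ctrl; /* placeholder field */
} ProcessingUnit;

typedef struct {
    ProcessingUnit unit[2];
    ProcessingUnit *selected; /* unit last selected by a valid write */
} MyDeviceState;

static void mydev_writel(void *opaque, uint32_t addr, uint32_t val)
{
    MyDeviceState *mydev = opaque;

    switch (addr) {
    case MYDEV_SELECT_UNIT:
        if (val >= ARRAY_SIZE(mydev->unit)) {
            /* Out-of-range index from the guest: ignore the write. */
            return;
        }
        mydev->selected = &mydev->unit[val];
        break;
    default:
        break;
    }
}
```

+
+Ignoring the write is only one possible policy. Real hardware may define a
+different behavior for invalid indices, and the emulation should follow the
+device specification where it says something about this case.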
+
+Unexpected Device Accesses
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+The guest may access device registers in unusual orders or at unexpected
+moments. Device emulation code must not assume that the guest follows the
+typical "theory of operation" presented in driver writer manuals. The guest
+may make nonsense accesses to device registers such as starting operations
+before the device has been fully initialized.
+
+A related issue is that device emulation code must be prepared for unexpected
+device register accesses while asynchronous operations are in progress. A
+well-behaved guest might wait for a completion interrupt before accessing
+certain device registers. Device emulation code must handle the case where the
+guest overwrites registers or submits further requests before an ongoing
+request completes. Unexpected accesses must not cause memory corruption or
+leaks in QEMU.
+
+Invalid device register accesses can be reported with
+``qemu_log_mask(LOG_GUEST_ERROR, ...)``. The ``-d guest_errors`` command-line
+option enables these log messages.
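+
+The sketch below shows one way to guard a "start" operation against these
+out-of-order accesses. The device state and function names are hypothetical, and
+``log_guest_error()`` is a local stand-in so the example builds outside the
+QEMU tree. In QEMU the report would go through
+``qemu_log_mask(LOG_GUEST_ERROR, ...)`` instead:
+

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    bool initialized;      /* guest has completed device setup */
    bool busy;             /* an asynchronous request is in flight */
    unsigned guest_errors; /* count of rejected accesses, for the sketch */
} MyDeviceState;

/* Local stand-in for QEMU's guest error logging. */
static void log_guest_error(MyDeviceState *s, const char *msg)
{
    s->guest_errors++;
    fprintf(stderr, "mydev: guest error: %s\n", msg);
}

/* Called when the guest writes the (hypothetical) start register. */
static void mydev_start(MyDeviceState *s)
{
    if (!s->initialized) {
        log_guest_error(s, "start before initialization");
        return;
    }
    if (s->busy) {
        /* Do not overwrite the state of the request already in flight. */
        log_guest_error(s, "start while a request is in progress");
        return;
    }
    s->busy = true;
}

/* Called when the asynchronous request finishes. */
static void mydev_complete(MyDeviceState *s)
{
    s->busy = false;
}
```

+
+Rejecting the second request is an assumption made for brevity. What a real
+device does when it is started twice is device-specific. The point is that the
+handler checks its own state rather than trusting the guest's ordering.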
+
+Live Migration
+~~~~~~~~~~~~~~
+Device state can be saved to disk image files and shared with other users.
+Live migration code must validate inputs when loading device state so an
+attacker cannot gain control by crafting invalid device states. Device state
+is therefore considered untrusted even though it is typically generated by QEMU
+itself.
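+
+As a sketch of what such validation can look like, the function below has the
+shape of a VMState ``post_load`` callback and rejects loaded state whose fields
+are out of range. The device structure, its fields, and the two limits are
+hypothetical:
+

```c
#include <errno.h>
#include <stdint.h>

#define MYDEV_NUM_UNITS 2  /* hypothetical number of units */
#define MYDEV_FIFO_SIZE 16 /* hypothetical FIFO capacity */

typedef struct {
    uint32_t selected_unit;        /* index into the unit array */
    uint32_t fifo_len;             /* number of valid bytes in fifo[] */
    uint8_t fifo[MYDEV_FIFO_SIZE];
} MyDeviceState;

/* Runs after the fields were loaded from the (untrusted) migration stream. */
static int mydev_post_load(void *opaque, int version_id)
{
    MyDeviceState *s = opaque;

    (void)version_id; /* unused in this sketch */

    if (s->selected_unit >= MYDEV_NUM_UNITS) {
        return -EINVAL; /* would index past the end of the unit array */
    }
    if (s->fifo_len > MYDEV_FIFO_SIZE) {
        return -EINVAL; /* would read or write beyond fifo[] */
    }
    return 0;
}
```

+
+Returning a negative value makes the load fail instead of letting the device
+run with indices or lengths that later code assumes were already checked.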
+
+Guest Memory Access Races
+~~~~~~~~~~~~~~~~~~~~~~~~~
+Guests with multiple vCPUs may modify guest RAM while device emulation code is
+running. Device emulation code must copy in descriptors and other guest RAM
+structures and only process the local copy. This prevents
+time-of-check-to-time-of-use (TOCTOU) race conditions that could cause QEMU to
+crash when a vCPU thread modifies guest RAM while device emulation is
+processing it.
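+
+The copy-in pattern can be sketched as follows. The ``Descriptor`` layout and
+``process_descriptor()`` are hypothetical, and plain ``memcpy()`` stands in for
+QEMU's guest memory access helpers so that the example is self-contained:
+

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 64 /* hypothetical size of the device's local buffer */

typedef struct {
    uint32_t offset; /* start of the data within src */
    uint32_t len;    /* number of bytes to transfer */
} Descriptor;

/*
 * guest_desc points at memory that another vCPU may modify at any time.
 * The descriptor is copied once and only the local copy is validated and
 * used, so the checked values are the values that get used.
 */
static bool process_descriptor(const void *guest_desc,
                               const uint8_t *src, size_t src_size,
                               uint8_t dst[BUF_SIZE])
{
    Descriptor d;

    /* Copy in once; guest_desc is never read again after this point. */
    memcpy(&d, guest_desc, sizeof(d));

    if (d.len > BUF_SIZE ||
        d.offset > src_size ||
        d.len > src_size - d.offset) {
        return false; /* descriptor does not fit the buffers */
    }

    memcpy(dst, src + d.offset, d.len);
    return true;
}
```

+
+Had the code checked ``len`` through the guest pointer and then read it again
+for the copy, a second vCPU could enlarge ``len`` between the two reads and
+defeat the check.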
--
2.20.1
* Re: [Qemu-devel] [PATCH v2] security.rst: add Security Guide to developer docs
From: Stefan Hajnoczi @ 2019-05-01 16:20 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Eduardo Otubo, Peter Maydell, Markus Armbruster,
Paolo Bonzini, Philippe Mathieu-Daudé
On Thu, Apr 25, 2019 at 02:35:03PM +0100, Stefan Hajnoczi wrote:
> At KVM Forum 2018 I gave a presentation on security in QEMU:
> https://www.youtube.com/watch?v=YAdRf_hwxU8 (video)
> https://vmsplice.net/~stefan/stefanha-kvm-forum-2018.pdf (slides)
>
> This patch adds a security guide to the developer docs. This document
> covers things that developers should know about security in QEMU. It is
> just a starting point that we can expand on later. I hope it will be
> useful as a resource for new contributors and will save code reviewers
> from explaining the same concepts many times.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> v2:
> * Added mention of passthrough USB and PCI devices [philmd]
> * Reworded resource limits [philmd]
> * Added qemu_log_mask(LOG_GUEST_ERROR) [philmd]
> ---
> docs/devel/index.rst | 1 +
> docs/devel/security.rst | 225 ++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 226 insertions(+)
> create mode 100644 docs/devel/security.rst
Ping?
> diff --git a/docs/devel/index.rst b/docs/devel/index.rst
> index ebbab636ce..fd0b5fa387 100644
> --- a/docs/devel/index.rst
> +++ b/docs/devel/index.rst
> @@ -20,3 +20,4 @@ Contents:
> stable-process
> testing
> decodetree
> + security
> diff --git a/docs/devel/security.rst b/docs/devel/security.rst
> new file mode 100644
> index 0000000000..83c6fb2231
> --- /dev/null
> +++ b/docs/devel/security.rst
> @@ -0,0 +1,225 @@
> +==============
> +Security Guide
> +==============
> +Overview
> +--------
> +This guide covers security topics relevant to developers working on QEMU. It
> +includes an explanation of the security requirements that QEMU gives its users,
> +the architecture of the code, and secure coding practices.
> +
> +Security Requirements
> +---------------------
> +QEMU supports many different use cases, some of which have stricter security
> +requirements than others. The community has agreed on the overall security
> +requirements that users may depend on. These requirements define what is
> +considered supported from a security perspective.
> +
> +Virtualization Use Case
> +~~~~~~~~~~~~~~~~~~~~~~~
> +The virtualization use case covers cloud and virtual private server (VPS)
> +hosting, as well as traditional data center and desktop virtualization. These
> +use cases rely on hardware virtualization extensions to execute guest code
> +safely on the physical CPU at close-to-native speed.
> +
> +The following entities are **untrusted**, meaning that they may be buggy or
> +malicious:
> +
> +* Guest
> +* User-facing interfaces (e.g. VNC, SPICE, WebSocket)
> +* Network protocols (e.g. NBD, live migration)
> +* User-supplied files (e.g. disk images, kernels, device trees)
> +* Passthrough devices (e.g. PCI, USB)
> +
> +Bugs affecting these entities are evaluated on whether they can cause damage in
> +real-world use cases and treated as security bugs if this is the case.
> +
> +Non-virtualization Use Case
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +The non-virtualization use case covers emulation using the Tiny Code Generator
> +(TCG). In principle the TCG and device emulation code used in conjunction with
> +the non-virtualization use case should meet the same security requirements as
> +the virtualization use case. However, for historical reasons much of the
> +non-virtualization use case code was not written with these security
> +requirements in mind.
> +
> +Bugs affecting the non-virtualization use case are not considered security
> +bugs at this time. Users with non-virtualization use cases must not rely on
> +QEMU to provide guest isolation or any security guarantees.
> +
> +Architecture
> +------------
> +This section describes the design principles that ensure the security
> +requirements are met.
> +
> +Guest Isolation
> +~~~~~~~~~~~~~~~
> +Guest isolation is the confinement of guest code to the virtual machine. When
> +guest code gains control of execution on the host this is called escaping the
> +virtual machine. Isolation also includes resource limits such as throttling of
> +CPU, memory, disk, or network. Guests must be unable to exceed their resource
> +limits.
> +
> +QEMU presents an attack surface to the guest in the form of emulated devices.
> +The guest must not be able to gain control of QEMU. Bugs in emulated devices
> +could allow malicious guests to gain code execution in QEMU. At this point the
> +guest has escaped the virtual machine and is able to act in the context of the
> +QEMU process on the host.
> +
> +Guests often interact with other guests and share resources with them. A
> +malicious guest must not gain control of other guests or access their data.
> +Disk image files and network traffic must be protected from other guests unless
> +explicitly shared between them by the user.
> +
> +Principle of Least Privilege
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +The principle of least privilege states that each component only has access to
> +the privileges necessary for its function. In the case of QEMU this means that
> +each process only has access to resources belonging to the guest.
> +
> +The QEMU process should not have access to any resources that are inaccessible
> +to the guest. This way the guest does not gain anything by escaping into the
> +QEMU process since it already has access to those same resources from within
> +the guest.
> +
> +Following the principle of least privilege immediately fulfills guest isolation
> +requirements. For example, guest A only has access to its own disk image file
> +``a.img`` and not guest B's disk image file ``b.img``.
> +
> +In reality certain resources are inaccessible to the guest but must be
> +available to QEMU to perform its function. For example, host system calls are
> +necessary for QEMU but are not exposed to guests. A guest that escapes into
> +the QEMU process can then begin invoking host system calls.
> +
> +New features must be designed to follow the principle of least privilege.
> +Should this not be possible for technical reasons, the security risk must be
> +clearly documented so users are aware of the trade-off of enabling the feature.
> +
> +Isolation mechanisms
> +~~~~~~~~~~~~~~~~~~~~
> +Several isolation mechanisms are available to realize this architecture of
> +guest isolation and the principle of least privilege. With the exception of
> +Linux seccomp, these mechanisms are all deployed by management tools that
> +launch QEMU, such as libvirt. They are also platform-specific so they are only
> +described briefly for Linux here.
> +
> +The fundamental isolation mechanism is that QEMU processes must run as
> +**unprivileged users**. Sometimes it seems more convenient to launch QEMU as
> +root to give it access to host devices (e.g. ``/dev/net/tun``) but this poses a
> +huge security risk. File descriptor passing can be used to give an otherwise
> +unprivileged QEMU process access to host devices without running QEMU as root.
> +
> +**SELinux** and **AppArmor** make it possible to confine processes beyond the
> +traditional UNIX process and file permissions model. They restrict the QEMU
> +process from accessing processes and files on the host system that are not
> +needed by QEMU.
> +
> +**Resource limits** and **cgroup controllers** provide throughput and utilization
> +limits on key resources such as CPU time, memory, and I/O bandwidth.
> +
> +**Linux namespaces** can be used to make process, file system, and other system
> +resources unavailable to QEMU. A namespaced QEMU process is restricted to only
> +those resources that were granted to it.
> +
> +**Linux seccomp** is available via the QEMU ``--sandbox`` option. It disables
> +system calls that are not needed by QEMU, thereby reducing the host kernel
> +attack surface.
> +
> +Secure coding practices
> +-----------------------
> +At the source code level there are several points to keep in mind. Both
> +developers and security researchers must be aware of them so that they can
> +develop safe code and audit existing code properly.
> +
> +General Secure C Coding Practices
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +Most CVEs (security bugs) reported against QEMU are not specific to
> +virtualization or emulation. They are simply C programming bugs. Therefore
> +it's critical to be aware of common classes of security bugs.
> +
> +There is a wide selection of resources available covering secure C coding. For
> +example, the `CERT C Coding Standard
> +<https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard>`_
> +covers the most important classes of security bugs.
> +
> +Instead of describing them in detail here, only the names of the most important
> +classes of security bugs are mentioned:
> +
> +* Buffer overflows
> +* Use-after-free and double-free
> +* Integer overflows
> +* Format string vulnerabilities
> +
> +Some of these classes of bugs can be detected by analyzers. Static analysis is
> +performed regularly by Coverity and the most obvious of these bugs are even
> +reported by compilers. Dynamic analysis is possible with valgrind, tsan, and
> +asan.
> +
> +Input Validation
> +~~~~~~~~~~~~~~~~
> +Inputs from the guest or external sources (e.g. network, files) cannot be
> +trusted and may be invalid. Inputs must be checked before using them in a way
> +that could crash the program, expose host memory to the guest, or otherwise be
> +exploitable by an attacker.
> +
> +The most sensitive attack surface is device emulation. All hardware register
> +accesses and data read from guest memory must be validated. A typical example
> +is a device that contains multiple units that are selectable by the guest via
> +an index register::
> +
> + typedef struct {
> + ProcessingUnit unit[2];
> + ...
> + } MyDeviceState;
> +
> + static void mydev_writel(void *opaque, uint32_t addr, uint32_t val)
> + {
> + MyDeviceState *mydev = opaque;
> + ProcessingUnit *unit;
> +
> + switch (addr) {
> + case MYDEV_SELECT_UNIT:
> + unit = &mydev->unit[val]; <-- this input wasn't validated!
> + ...
> + }
> + }
> +
> +If ``val`` is not in range [0, 1] then an out-of-bounds memory access will take
> +place when ``unit`` is dereferenced. The code must check that ``val`` is 0 or
> +1 and handle the case where it is invalid.
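A minimal sketch of the check described above, under stated assumptions: the register offset, the `MYDEV_NUM_UNITS` constant, the `selected` pointer, and the `guest_errors` counter are hypothetical names added for illustration and are not part of the patch. The counter stands in for the report that real device code would make with `qemu_log_mask(LOG_GUEST_ERROR, ...)`.

```c
#include <stdint.h>

/* Hypothetical register offset and unit count, for illustration only. */
#define MYDEV_SELECT_UNIT 0x00
#define MYDEV_NUM_UNITS   2

typedef struct {
    uint32_t ctrl;
} ProcessingUnit;

typedef struct {
    ProcessingUnit unit[MYDEV_NUM_UNITS];
    ProcessingUnit *selected;   /* only ever points into unit[] */
    unsigned int guest_errors;  /* stand-in for LOG_GUEST_ERROR reporting */
} MyDeviceState;

static void mydev_writel(void *opaque, uint32_t addr, uint32_t val)
{
    MyDeviceState *mydev = opaque;

    switch (addr) {
    case MYDEV_SELECT_UNIT:
        /* val is unsigned, so one comparison covers the whole invalid range. */
        if (val >= MYDEV_NUM_UNITS) {
            mydev->guest_errors++;
            return; /* leave the previous selection untouched */
        }
        mydev->selected = &mydev->unit[val];
        break;
    default:
        /* Writes to unknown registers are counted and ignored. */
        mydev->guest_errors++;
        break;
    }
}
```

Ignoring the invalid write is only one possible policy; what matters is that the index is validated before it is used to form a pointer.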
> +
> +Unexpected Device Accesses
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +The guest may access device registers in unusual orders or at unexpected
> +moments. Device emulation code must not assume that the guest follows the
> +typical "theory of operation" presented in driver writer manuals. The guest
> +may make nonsense accesses to device registers such as starting operations
> +before the device has been fully initialized.
> +
> +A related issue is that device emulation code must be prepared for unexpected
> +device register accesses while asynchronous operations are in progress. A
> +well-behaved guest might wait for a completion interrupt before accessing
> +certain device registers. Device emulation code must handle the case where the
> +guest overwrites registers or submits further requests before an ongoing
> +request completes. Unexpected accesses must not cause memory corruption or
> +leaks in QEMU.
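One way to cope with register writes that arrive while a request is still in flight is sketched below. Everything here is an assumption made for illustration (the `MyAsyncDevState` type, the register offsets, the `busy` flag, and the policy of ignoring a second start); it is not taken from any real QEMU device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register map, for illustration only. */
#define MYASYNC_REG_DMA_ADDR 0x00
#define MYASYNC_REG_START    0x04

typedef struct {
    uint32_t dma_addr;       /* guest-visible register, may change any time */
    uint32_t inflight_addr;  /* private snapshot used by the running request */
    bool busy;               /* true while an asynchronous request is pending */
    unsigned int guest_errors;
} MyAsyncDevState;

static void myasync_writel(MyAsyncDevState *s, uint32_t addr, uint32_t val)
{
    switch (addr) {
    case MYASYNC_REG_DMA_ADDR:
        /* Harmless while busy: the running request uses inflight_addr. */
        s->dma_addr = val;
        break;
    case MYASYNC_REG_START:
        if (s->busy) {
            /* Start issued before completion: reject it, keep state intact. */
            s->guest_errors++;
            return;
        }
        s->inflight_addr = s->dma_addr; /* snapshot the parameters */
        s->busy = true;
        break;
    default:
        s->guest_errors++;
        break;
    }
}

/* Called by the (hypothetical) completion path of the async operation. */
static void myasync_complete(MyAsyncDevState *s)
{
    s->busy = false;
}
```

The request works on its own snapshot of the parameters, so a guest that rewrites the address register or restarts the operation early cannot corrupt the state of the request that is already running.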
> +
> +Invalid device register accesses can be reported with
> +``qemu_log_mask(LOG_GUEST_ERROR, ...)``. The ``-d guest_errors`` command-line
> +option enables these log messages.
> +
> +Live migration
> +~~~~~~~~~~~~~~
> +Device state can be saved to disk image files and shared with other users.
> +Live migration code must validate inputs when loading device state so an
> +attacker cannot gain control by crafting invalid device states. Device state
> +is therefore considered untrusted even though it is typically generated by QEMU
> +itself.
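As an illustration of validating loaded device state, here is a sketch of a consistency check run after the fields have been read from the migration stream. The struct, its fields, and the limits are hypothetical; the function is shaped like a post-load hook that returns 0 on success and a negative errno value on failure, which is an assumption chosen for the example.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical limits, for illustration only. */
#define MYMIG_NUM_UNITS 2
#define MYMIG_FIFO_SIZE 16

typedef struct {
    uint32_t selected_unit;          /* index into a MYMIG_NUM_UNITS array */
    uint32_t fifo_head;              /* index of the next byte to read */
    uint32_t fifo_len;               /* number of valid bytes in the FIFO */
    uint8_t fifo[MYMIG_FIFO_SIZE];
} MyMigratedState;

/* Reject loaded state that could later index outside the device's arrays. */
static int mydev_post_load(void *opaque, int version_id)
{
    MyMigratedState *s = opaque;

    (void)version_id; /* unused in this sketch */

    if (s->selected_unit >= MYMIG_NUM_UNITS) {
        return -EINVAL;
    }
    if (s->fifo_head >= MYMIG_FIFO_SIZE || s->fifo_len > MYMIG_FIFO_SIZE) {
        return -EINVAL;
    }
    return 0;
}
```

Failing the load is safer than clamping the values, because a crafted state file is then rejected before any device code runs on it.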
> +
> +Guest Memory Access Races
> +~~~~~~~~~~~~~~~~~~~~~~~~~
> +Guests with multiple vCPUs may modify guest RAM while device emulation code is
> +running. Device emulation code must copy in descriptors and other guest RAM
> +structures and only process the local copy. This prevents
> +time-of-check-to-time-of-use (TOCTOU) race conditions that could cause QEMU to
> +crash when a vCPU thread modifies guest RAM while device emulation is
> +processing it.
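The copy-in pattern can be sketched as follows. This is a simplified model: `guest_ram` is an ordinary pointer standing in for guest memory, and `GuestDesc`, `GUESTDESC_BUF_SIZE`, and `process_desc` are hypothetical names. Real device code would go through QEMU's guest memory accessors rather than a plain `memcpy()`.

```c
#include <stdint.h>
#include <string.h>

#define GUESTDESC_BUF_SIZE 64

/* Hypothetical descriptor layout shared with the guest. */
typedef struct {
    uint32_t len;                        /* guest-controlled length */
    uint8_t data[GUESTDESC_BUF_SIZE];
} GuestDesc;

/*
 * Copy the payload of a guest descriptor into out, which must hold
 * GUESTDESC_BUF_SIZE bytes. Returns the number of bytes copied, or -1
 * if the descriptor is invalid.
 */
static int process_desc(const void *guest_ram, uint8_t *out)
{
    GuestDesc local;

    /* Single copy-in: guest RAM is not read again after this point. */
    memcpy(&local, guest_ram, sizeof(local));

    /* The check and the use both operate on the same private copy. */
    if (local.len > GUESTDESC_BUF_SIZE) {
        return -1;
    }
    memcpy(out, local.data, local.len);
    return (int)local.len;
}
```

If the code checked `len` in guest RAM and then read it again for the copy, another vCPU could enlarge it between the two reads. With the local copy, the value that was checked is the value that is used.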
> --
> 2.20.1
>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH v2] security.rst: add Security Guide to developer docs
@ 2019-05-01 16:20 ` Stefan Hajnoczi
0 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2019-05-01 16:20 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Eduardo Otubo, Peter Maydell, qemu-devel, Markus Armbruster,
Paolo Bonzini, Philippe Mathieu-Daudé
On Thu, Apr 25, 2019 at 02:35:03PM +0100, Stefan Hajnoczi wrote:
> At KVM Forum 2018 I gave a presentation on security in QEMU:
> https://www.youtube.com/watch?v=YAdRf_hwxU8 (video)
> https://vmsplice.net/~stefan/stefanha-kvm-forum-2018.pdf (slides)
>
> This patch adds a security guide to the developer docs. This document
> covers things that developers should know about security in QEMU. It is
> just a starting point that we can expand on later. I hope it will be
> useful as a resource for new contributors and will save code reviewers
> from explaining the same concepts many times.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> v2:
> * Added mention of passthrough USB and PCI devices [philmd]
> * Reworded resource limits [philmd]
> * Added qemu_log_mask(LOG_GUEST_ERROR) [philmd]
> ---
> docs/devel/index.rst | 1 +
> docs/devel/security.rst | 225 ++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 226 insertions(+)
> create mode 100644 docs/devel/security.rst
Ping?
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH v2] security.rst: add Security Guide to developer docs
@ 2019-05-03 8:14 ` Stefano Garzarella
0 siblings, 0 replies; 18+ messages in thread
From: Stefano Garzarella @ 2019-05-03 8:14 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Eduardo Otubo, Peter Maydell, Markus Armbruster,
Paolo Bonzini, Philippe Mathieu-Daudé
On Thu, Apr 25, 2019 at 02:35:03PM +0100, Stefan Hajnoczi wrote:
> At KVM Forum 2018 I gave a presentation on security in QEMU:
> https://www.youtube.com/watch?v=YAdRf_hwxU8 (video)
> https://vmsplice.net/~stefan/stefanha-kvm-forum-2018.pdf (slides)
>
> This patch adds a security guide to the developer docs. This document
> covers things that developers should know about security in QEMU. It is
> just a starting point that we can expand on later. I hope it will be
> useful as a resource for new contributors and will save code reviewers
> from explaining the same concepts many times.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> v2:
> * Added mention of passthrough USB and PCI devices [philmd]
> * Reworded resource limits [philmd]
> * Added qemu_log_mask(LOG_GUEST_ERROR) [philmd]
> ---
> docs/devel/index.rst | 1 +
> docs/devel/security.rst | 225 ++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 226 insertions(+)
> create mode 100644 docs/devel/security.rst
>
Very useful docs!
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Thanks,
Stefano
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH v2] security.rst: add Security Guide to developer docs
2019-04-25 13:35 ` Stefan Hajnoczi
` (2 preceding siblings ...)
(?)
@ 2019-05-03 9:04 ` Alex Bennée
2019-05-03 10:10 ` Philippe Mathieu-Daudé
2019-05-03 17:32 ` Stefan Hajnoczi
-1 siblings, 2 replies; 18+ messages in thread
From: Alex Bennée @ 2019-05-03 9:04 UTC (permalink / raw)
To: qemu-devel
Cc: Eduardo Otubo, Peter Maydell, Markus Armbruster, Stefan Hajnoczi,
Paolo Bonzini, Philippe Mathieu-Daudé
Stefan Hajnoczi <stefanha@redhat.com> writes:
> At KVM Forum 2018 I gave a presentation on security in QEMU:
> https://www.youtube.com/watch?v=YAdRf_hwxU8 (video)
> https://vmsplice.net/~stefan/stefanha-kvm-forum-2018.pdf (slides)
>
> This patch adds a security guide to the developer docs. This document
> covers things that developers should know about security in QEMU. It is
> just a starting point that we can expand on later. I hope it will be
> useful as a resource for new contributors and will save code reviewers
> from explaining the same concepts many times.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> v2:
> * Added mention of passthrough USB and PCI devices [philmd]
> * Reworded resource limits [philmd]
> * Added qemu_log_mask(LOG_GUEST_ERROR) [philmd]
> ---
> docs/devel/index.rst | 1 +
> docs/devel/security.rst | 225 ++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 226 insertions(+)
> create mode 100644 docs/devel/security.rst
>
> diff --git a/docs/devel/index.rst b/docs/devel/index.rst
> index ebbab636ce..fd0b5fa387 100644
> --- a/docs/devel/index.rst
> +++ b/docs/devel/index.rst
> @@ -20,3 +20,4 @@ Contents:
> stable-process
> testing
> decodetree
> + security
> diff --git a/docs/devel/security.rst b/docs/devel/security.rst
> new file mode 100644
> index 0000000000..83c6fb2231
> --- /dev/null
> +++ b/docs/devel/security.rst
> @@ -0,0 +1,225 @@
> +==============
> +Security Guide
> +==============
> +Overview
> +--------
> +This guide covers security topics relevant to developers working on QEMU. It
> +includes an explanation of the security requirements that QEMU gives its users,
> +the architecture of the code, and secure coding practices.
> +
> +Security Requirements
> +---------------------
> +QEMU supports many different use cases, some of which have stricter security
> +requirements than others. The community has agreed on the overall security
> +requirements that users may depend on. These requirements define what is
> +considered supported from a security perspective.
> +
> +Virtualization Use Case
> +~~~~~~~~~~~~~~~~~~~~~~~
> +The virtualization use case covers cloud and virtual private server (VPS)
> +hosting, as well as traditional data center and desktop virtualization. These
> +use cases rely on hardware virtualization extensions to execute guest code
> +safely on the physical CPU at close-to-native speed.
> +
> +The following entities are **untrusted**, meaning that they may be buggy or
> +malicious:
> +
> +* Guest
> +* User-facing interfaces (e.g. VNC, SPICE, WebSocket)
> +* Network protocols (e.g. NBD, live migration)
> +* User-supplied files (e.g. disk images, kernels, device trees)
> +* Passthrough devices (e.g. PCI, USB)
> +
> +Bugs affecting these entities are evaluated on whether they can cause damage in
> +real-world use cases and treated as security bugs if this is the case.
> +
> +Non-virtualization Use Case
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +The non-virtualization use case covers emulation using the Tiny Code Generator
> +(TCG). In principle the TCG and device emulation code used in conjunction with
> +the non-virtualization use case should meet the same security requirements as
> +the virtualization use case. However, for historical reasons much of the
> +non-virtualization use case code was not written with these security
> +requirements in mind.
> +
> +Bugs affecting the non-virtualization use case are not considered security
> +bugs at this time. Users with non-virtualization use cases must not rely on
> +QEMU to provide guest isolation or any security guarantees.
> +
> +Architecture
> +------------
> +This section describes the design principles that ensure the security
> +requirements are met.
> +
> +Guest Isolation
> +~~~~~~~~~~~~~~~
> +Guest isolation is the confinement of guest code to the virtual machine. When
> +guest code gains control of execution on the host this is called escaping the
> +virtual machine. Isolation also includes resource limits such as throttling of
> +CPU, memory, disk, or network. Guests must be unable to exceed their resource
> +limits.
> +
> +QEMU presents an attack surface to the guest in the form of emulated devices.
> +The guest must not be able to gain control of QEMU. Bugs in emulated devices
> +could allow malicious guests to gain code execution in QEMU. At this point the
> +guest has escaped the virtual machine and is able to act in the context of the
> +QEMU process on the host.
> +
> +Guests often interact with other guests and share resources with them. A
> +malicious guest must not gain control of other guests or access their data.
> +Disk image files and network traffic must be protected from other guests unless
> +explicitly shared between them by the user.
> +
> +Principle of Least Privilege
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +The principle of least privilege states that each component only has access to
> +the privileges necessary for its function. In the case of QEMU this means that
> +each process only has access to resources belonging to the guest.
> +
> +The QEMU process should not have access to any resources that are inaccessible
> +to the guest. This way the guest does not gain anything by escaping into the
> +QEMU process since it already has access to those same resources from within
> +the guest.
> +
> +Following the principle of least privilege immediately fulfills guest isolation
> +requirements. For example, guest A only has access to its own disk image file
> +``a.img`` and not guest B's disk image file ``b.img``.
> +
> +In reality certain resources are inaccessible to the guest but must be
> +available to QEMU to perform its function. For example, host system calls are
> +necessary for QEMU but are not exposed to guests. A guest that escapes into
> +the QEMU process can then begin invoking host system calls.
> +
> +New features must be designed to follow the principle of least privilege.
> +Should this not be possible for technical reasons, the security risk must be
> +clearly documented so users are aware of the trade-off of enabling the feature.
> +
> +Isolation mechanisms
> +~~~~~~~~~~~~~~~~~~~~
> +Several isolation mechanisms are available to realize this architecture of
> +guest isolation and the principle of least privilege. With the exception of
> +Linux seccomp, these mechanisms are all deployed by management tools that
> +launch QEMU, such as libvirt. They are also platform-specific so they are only
> +described briefly for Linux here.
> +
> +The fundamental isolation mechanism is that QEMU processes must run as
> +**unprivileged users**. Sometimes it seems more convenient to launch QEMU as
> +root to give it access to host devices (e.g. ``/dev/net/tun``) but this poses a
> +huge security risk. File descriptor passing can be used to give an otherwise
> +unprivileged QEMU process access to host devices without running QEMU as root.
Should we mention that you can keep running QEMU as an unprivileged user and
just make the devices you need accessible to that user or group, rather than
becoming root? For example, I generally make /dev/kvm group-accessible to
my user account.
> +
> +**SELinux** and **AppArmor** make it possible to confine processes beyond the
> +traditional UNIX process and file permissions model. They restrict the QEMU
> +process from accessing processes and files on the host system that are not
> +needed by QEMU.
> +
> +**Resource limits** and **cgroup controllers** provide throughput and utilization
> +limits on key resources such as CPU time, memory, and I/O bandwidth.
> +
> +**Linux namespaces** can be used to make process, file system, and other system
> +resources unavailable to QEMU. A namespaced QEMU process is restricted to only
> +those resources that were granted to it.
> +
> +**Linux seccomp** is available via the QEMU ``--sandbox`` option. It disables
> +system calls that are not needed by QEMU, thereby reducing the host kernel
> +attack surface.
> +
> +Secure coding practices
> +-----------------------
> +At the source code level there are several points to keep in mind. Both
> +developers and security researchers must be aware of them so that they can
> +develop safe code and audit existing code properly.
> +
> +General Secure C Coding Practices
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +Most CVEs (security bugs) reported against QEMU are not specific to
> +virtualization or emulation. They are simply C programming bugs. Therefore
> +it's critical to be aware of common classes of security bugs.
> +
> +There is a wide selection of resources available covering secure C coding. For
> +example, the `CERT C Coding Standard
> +<https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard>`_
> +covers the most important classes of security bugs.
> +
> +Instead of describing them in detail here, only the names of the most important
> +classes of security bugs are mentioned:
> +
> +* Buffer overflows
> +* Use-after-free and double-free
> +* Integer overflows
> +* Format string vulnerabilities
> +
> +Some of these classes of bugs can be detected by analyzers. Static analysis is
> +performed regularly by Coverity and the most obvious of these bugs are even
> +reported by compilers. Dynamic analysis is possible with valgrind, tsan, and
> +asan.
> +
> +Input Validation
> +~~~~~~~~~~~~~~~~
> +Inputs from the guest or external sources (e.g. network, files) cannot be
> +trusted and may be invalid. Inputs must be checked before using them in a way
> +that could crash the program, expose host memory to the guest, or otherwise be
> +exploitable by an attacker.
> +
> +The most sensitive attack surface is device emulation. All hardware register
> +accesses and data read from guest memory must be validated. A typical example
> +is a device that contains multiple units that are selectable by the guest via
> +an index register::
> +
> + typedef struct {
> + ProcessingUnit unit[2];
> + ...
> + } MyDeviceState;
> +
> + static void mydev_writel(void *opaque, uint32_t addr, uint32_t val)
> + {
> + MyDeviceState *mydev = opaque;
> + ProcessingUnit *unit;
> +
> + switch (addr) {
> + case MYDEV_SELECT_UNIT:
> + unit = &mydev->unit[val]; <-- this input wasn't validated!
> + ...
> + }
> + }
> +
> +If ``val`` is not in range [0, 1] then an out-of-bounds memory access will take
> +place when ``unit`` is dereferenced. The code must check that ``val`` is 0 or
> +1 and handle the case where it is invalid.
> +
> +Unexpected Device Accesses
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +The guest may access device registers in unusual orders or at unexpected
> +moments. Device emulation code must not assume that the guest follows the
> +typical "theory of operation" presented in driver writer manuals. The guest
> +may make nonsense accesses to device registers such as starting operations
> +before the device has been fully initialized.
> +
> +A related issue is that device emulation code must be prepared for unexpected
> +device register accesses while asynchronous operations are in progress. A
> +well-behaved guest might wait for a completion interrupt before accessing
> +certain device registers. Device emulation code must handle the case where the
> +guest overwrites registers or submits further requests before an ongoing
> +request completes. Unexpected accesses must not cause memory corruption or
> +leaks in QEMU.
> +
> +Invalid device register accesses can be reported with
> +``qemu_log_mask(LOG_GUEST_ERROR, ...)``. The ``-d guest_errors`` command-line
> +option enables these log messages.
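A rough sketch of what handling accesses during an in-flight request could
look like. The device, its register offsets, and the `busy` flag are all
hypothetical; the `guest_errors` counter merely stands in for a
``qemu_log_mask(LOG_GUEST_ERROR, ...)`` call so that the example stays
self-contained. Whether a real device ignores, queues, or aborts on such
writes depends on the hardware being modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register offsets for an imaginary DMA-style device. */
enum {
    MYDEV_REG_ADDR  = 0x0,
    MYDEV_REG_START = 0x4,
};

typedef struct {
    uint32_t dma_addr;      /* parameter used by the request in flight */
    bool busy;              /* true while an asynchronous request runs */
    unsigned guest_errors;  /* stand-in for a LOG_GUEST_ERROR message */
} MyDevState;

static void mydev_reg_write(MyDevState *s, uint32_t addr, uint32_t val)
{
    switch (addr) {
    case MYDEV_REG_ADDR:
        if (s->busy) {
            /* Do not change parameters under a running request. */
            s->guest_errors++;
            return;
        }
        s->dma_addr = val;
        break;
    case MYDEV_REG_START:
        if (s->busy) {
            /* A second start must not clobber or leak the first request. */
            s->guest_errors++;
            return;
        }
        s->busy = true;
        break;
    default:
        /* Unknown register: count it as a guest error and carry on. */
        s->guest_errors++;
        break;
    }
}

/* Called when the asynchronous request finishes. */
static void mydev_request_complete(MyDevState *s)
{
    s->busy = false;
}
```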
> +
> +Live migration
> +~~~~~~~~~~~~~~
> +Device state can be saved to disk image files and shared with other users.
> +Live migration code must validate inputs when loading device state so an
> +attacker cannot gain control by crafting invalid device states. Device state
> +is therefore considered untrusted even though it is typically generated by QEMU
> +itself.
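To make the point about untrusted device state concrete, here is a small
sketch of a consistency check run after loading. The FIFO layout and the
function name `myfifo_post_load` are assumptions for illustration; the sketch
does not use QEMU's actual migration API, where a check like this would
presumably sit in a post-load hook.

```c
#include <stdint.h>

#define MYDEV_FIFO_SIZE 16

/* Hypothetical device state as restored from a migration stream. */
typedef struct {
    uint8_t fifo[MYDEV_FIFO_SIZE];
    uint32_t fifo_head;   /* index of the next byte to read */
    uint32_t fifo_len;    /* number of valid bytes */
} MyFifoState;

/*
 * Returns 0 if the restored state is self-consistent and -1 if the
 * incoming state must be rejected. Both fields are later used to index
 * fifo[], so out-of-range values would lead to out-of-bounds accesses.
 */
static int myfifo_post_load(const MyFifoState *s)
{
    if (s->fifo_head >= MYDEV_FIFO_SIZE) {
        return -1;
    }
    if (s->fifo_len > MYDEV_FIFO_SIZE) {
        return -1;
    }
    return 0;
}
```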
> +
> +Guest Memory Access Races
> +~~~~~~~~~~~~~~~~~~~~~~~~~
> +Guests with multiple vCPUs may modify guest RAM while device emulation code is
> +running. Device emulation code must copy in descriptors and other guest RAM
> +structures and only process the local copy. This prevents
> +time-of-check-to-time-of-use (TOCTOU) race conditions that could cause QEMU to
> +crash when a vCPU thread modifies guest RAM while device emulation is
> +processing it.
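The copy-in pattern from this last section could be sketched roughly as
follows. The descriptor layout, `BUF_SIZE`, and `mydev_process_desc` are made
up for the example, and plain `memcpy()` stands in for whatever guest memory
accessor a real device model would use.

```c
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 64

/* Hypothetical descriptor layout that the guest places in its RAM. */
typedef struct {
    uint32_t offset;
    uint32_t len;
} MyDesc;

/*
 * guest_desc points into memory that a vCPU may rewrite at any time.
 * Copy it once, then validate and use only the local copy.
 * Returns the number of bytes that would be transferred, or -1.
 */
static int64_t mydev_process_desc(const void *guest_desc)
{
    MyDesc desc;

    memcpy(&desc, guest_desc, sizeof(desc));  /* single copy-in */

    /* Written so that the bounds check itself cannot overflow. */
    if (desc.offset > BUF_SIZE || desc.len > BUF_SIZE - desc.offset) {
        return -1;  /* rejected based on the snapshot */
    }

    /* From here on only desc.* is used; guest_desc is never re-read. */
    return desc.len;
}
```

Had the code checked `guest_desc` in place and then read it again for the
transfer, the guest could change `len` between the two reads.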
Anyway:
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
* Re: [Qemu-devel] [PATCH v2] security.rst: add Security Guide to developer docs
2019-05-03 9:04 ` Alex Bennée
@ 2019-05-03 10:10 ` Philippe Mathieu-Daudé
2019-05-03 17:32 ` Stefan Hajnoczi
1 sibling, 0 replies; 18+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-05-03 10:10 UTC (permalink / raw)
To: Alex Bennée, qemu-devel
Cc: Eduardo Otubo, Peter Maydell, Markus Armbruster, Stefan Hajnoczi,
Paolo Bonzini
On 5/3/19 11:04 AM, Alex Bennée wrote:
>
> Stefan Hajnoczi <stefanha@redhat.com> writes:
>
>> At KVM Forum 2018 I gave a presentation on security in QEMU:
>> https://www.youtube.com/watch?v=YAdRf_hwxU8 (video)
>> https://vmsplice.net/~stefan/stefanha-kvm-forum-2018.pdf (slides)
>>
>> This patch adds a security guide to the developer docs. This document
>> covers things that developers should know about security in QEMU. It is
>> just a starting point that we can expand on later. I hope it will be
>> useful as a resource for new contributors and will save code reviewers
>> from explaining the same concepts many times.
>>
>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>> v2:
>> * Added mention of passthrough USB and PCI devices [philmd]
>> * Reworded resource limits [philmd]
>> * Added qemu_log_mask(LOG_GUEST_ERROR) [philmd]
>> ---
>> docs/devel/index.rst | 1 +
>> docs/devel/security.rst | 225 ++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 226 insertions(+)
>> create mode 100644 docs/devel/security.rst
>>
>> diff --git a/docs/devel/index.rst b/docs/devel/index.rst
>> index ebbab636ce..fd0b5fa387 100644
>> --- a/docs/devel/index.rst
>> +++ b/docs/devel/index.rst
>> @@ -20,3 +20,4 @@ Contents:
>> stable-process
>> testing
>> decodetree
>> + security
>> diff --git a/docs/devel/security.rst b/docs/devel/security.rst
>> new file mode 100644
>> index 0000000000..83c6fb2231
>> --- /dev/null
>> +++ b/docs/devel/security.rst
>> @@ -0,0 +1,225 @@
>> +==============
>> +Security Guide
>> +==============
>> +Overview
>> +--------
>> +This guide covers security topics relevant to developers working on QEMU. It
>> +includes an explanation of the security requirements that QEMU gives its users,
>> +the architecture of the code, and secure coding practices.
>> +
>> +Security Requirements
>> +---------------------
>> +QEMU supports many different use cases, some of which have stricter security
>> +requirements than others. The community has agreed on the overall security
>> +requirements that users may depend on. These requirements define what is
>> +considered supported from a security perspective.
>> +
>> +Virtualization Use Case
>> +~~~~~~~~~~~~~~~~~~~~~~~
>> +The virtualization use case covers cloud and virtual private server (VPS)
>> +hosting, as well as traditional data center and desktop virtualization. These
>> +use cases rely on hardware virtualization extensions to execute guest code
>> +safely on the physical CPU at close-to-native speed.
>> +
>> +The following entities are **untrusted**, meaning that they may be buggy or
>> +malicious:
>> +
>> +* Guest
>> +* User-facing interfaces (e.g. VNC, SPICE, WebSocket)
>> +* Network protocols (e.g. NBD, live migration)
>> +* User-supplied files (e.g. disk images, kernels, device trees)
>> +* Passthrough devices (e.g. PCI, USB)
Thanks.
>> +
>> +Bugs affecting these entities are evaluated on whether they can cause damage in
>> +real-world use cases and treated as security bugs if this is the case.
>> +
>> +Non-virtualization Use Case
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +The non-virtualization use case covers emulation using the Tiny Code Generator
>> +(TCG). In principle the TCG and device emulation code used in conjunction with
>> +the non-virtualization use case should meet the same security requirements as
>> +the virtualization use case. However, for historical reasons much of the
>> +non-virtualization use case code was not written with these security
>> +requirements in mind.
>> +
>> +Bugs affecting the non-virtualization use case are not considered security
>> +bugs at this time. Users with non-virtualization use cases must not rely on
>> +QEMU to provide guest isolation or any security guarantees.
>> +
>> +Architecture
>> +------------
>> +This section describes the design principles that ensure the security
>> +requirements are met.
>> +
>> +Guest Isolation
>> +~~~~~~~~~~~~~~~
>> +Guest isolation is the confinement of guest code to the virtual machine. When
>> +guest code gains control of execution on the host this is called escaping the
>> +virtual machine. Isolation also includes resource limits such as throttling of
>> +CPU, memory, disk, or network. Guests must be unable to exceed their resource
>> +limits.
>> +
>> +QEMU presents an attack surface to the guest in the form of emulated devices.
>> +The guest must not be able to gain control of QEMU. Bugs in emulated devices
>> +could allow malicious guests to gain code execution in QEMU. At this point the
>> +guest has escaped the virtual machine and is able to act in the context of the
>> +QEMU process on the host.
>> +
>> +Guests often interact with other guests and share resources with them. A
>> +malicious guest must not gain control of other guests or access their data.
>> +Disk image files and network traffic must be protected from other guests unless
>> +explicitly shared between them by the user.
>> +
>> +Principle of Least Privilege
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +The principle of least privilege states that each component only has access to
>> +the privileges necessary for its function. In the case of QEMU this means that
>> +each process only has access to resources belonging to the guest.
>> +
>> +The QEMU process should not have access to any resources that are inaccessible
>> +to the guest. This way the guest does not gain anything by escaping into the
>> +QEMU process since it already has access to those same resources from within
>> +the guest.
>> +
>> +Following the principle of least privilege immediately fulfills guest isolation
>> +requirements. For example, guest A only has access to its own disk image file
>> +``a.img`` and not guest B's disk image file ``b.img``.
>> +
>> +In reality certain resources are inaccessible to the guest but must be
>> +available to QEMU to perform its function. For example, host system calls are
>> +necessary for QEMU but are not exposed to guests. A guest that escapes into
>> +the QEMU process can then begin invoking host system calls.
>> +
>> +New features must be designed to follow the principle of least privilege.
>> +Should this not be possible for technical reasons, the security risk must be
>> +clearly documented so users are aware of the trade-off of enabling the feature.
>> +
>> +Isolation mechanisms
>> +~~~~~~~~~~~~~~~~~~~~
>> +Several isolation mechanisms are available to realize this architecture of
>> +guest isolation and the principle of least privilege. With the exception of
>> +Linux seccomp, these mechanisms are all deployed by management tools that
>> +launch QEMU, such as libvirt. They are also platform-specific so they are only
>> +described briefly for Linux here.
>> +
>> +The fundamental isolation mechanism is that QEMU processes must run as
>> +**unprivileged users**. Sometimes it seems more convenient to launch QEMU as
>> +root to give it access to host devices (e.g. ``/dev/net/tun``) but this poses a
>> +huge security risk. File descriptor passing can be used to give an otherwise
>> +unprivileged QEMU process access to host devices without running QEMU as root.
>
> Should we mention that you can keep running as an unprivileged user and
> just make the devices you need accessible to that user/group rather than
> becoming root? For example, I generally make /dev/kvm group-accessible to
> my user account.
Good suggestion.
>> +
>> +**SELinux** and **AppArmor** make it possible to confine processes beyond the
>> +traditional UNIX process and file permissions model. They restrict the QEMU
>> +process from accessing processes and files on the host system that are not
>> +needed by QEMU.
>> +
>> +**Resource limits** and **cgroup controllers** provide throughput and utilization
>> +limits on key resources such as CPU time, memory, and I/O bandwidth.
>> +
>> +**Linux namespaces** can be used to make process, file system, and other system
>> +resources unavailable to QEMU. A namespaced QEMU process is restricted to only
>> +those resources that were granted to it.
>> +
>> +**Linux seccomp** is available via the QEMU ``--sandbox`` option. It disables
>> +system calls that are not needed by QEMU, thereby reducing the host kernel
>> +attack surface.
>> +
>> +Secure coding practices
>> +-----------------------
>> +At the source code level there are several points to keep in mind. Both
>> +developers and security researchers must be aware of them so that they can
>> +develop safe code and audit existing code properly.
>> +
>> +General Secure C Coding Practices
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +Most CVEs (security bugs) reported against QEMU are not specific to
>> +virtualization or emulation. They are simply C programming bugs. Therefore
>> +it's critical to be aware of common classes of security bugs.
>> +
>> +There is a wide selection of resources available covering secure C coding. For
>> +example, the `CERT C Coding Standard
>> +<https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard>`_
>> +covers the most important classes of security bugs.
>> +
>> +Instead of describing them in detail here, only the names of the most important
>> +classes of security bugs are mentioned:
>> +
>> +* Buffer overflows
>> +* Use-after-free and double-free
>> +* Integer overflows
>> +* Format string vulnerabilities
>> +
>> +Some of these classes of bugs can be detected by analyzers. Static analysis is
>> +performed regularly by Coverity and the most obvious of these bugs are even
>> +reported by compilers. Dynamic analysis is possible with valgrind, tsan, and
>> +asan.
>> +
>> +Input Validation
>> +~~~~~~~~~~~~~~~~
>> +Inputs from the guest or external sources (e.g. network, files) cannot be
>> +trusted and may be invalid. Inputs must be checked before using them in a way
>> +that could crash the program, expose host memory to the guest, or otherwise be
>> +exploitable by an attacker.
>> +
>> +The most sensitive attack surface is device emulation. All hardware register
>> +accesses and data read from guest memory must be validated. A typical example
>> +is a device that contains multiple units that are selectable by the guest via
>> +an index register::
>> +
>> + typedef struct {
>> + ProcessingUnit unit[2];
>> + ...
>> + } MyDeviceState;
>> +
>> + static void mydev_writel(void *opaque, uint32_t addr, uint32_t val)
>> + {
>> + MyDeviceState *mydev = opaque;
>> + ProcessingUnit *unit;
>> +
>> + switch (addr) {
>> + case MYDEV_SELECT_UNIT:
>> + unit = &mydev->unit[val]; <-- this input wasn't validated!
>> + ...
>> + }
>> + }
>> +
>> +If ``val`` is not in range [0, 1] then an out-of-bounds memory access will take
>> +place when ``unit`` is dereferenced. The code must check that ``val`` is 0 or
>> +1 and handle the case where it is invalid.
>> +
>> +Unexpected Device Accesses
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +The guest may access device registers in unusual orders or at unexpected
>> +moments. Device emulation code must not assume that the guest follows the
>> +typical "theory of operation" presented in driver writer manuals. The guest
>> +may make nonsense accesses to device registers such as starting operations
>> +before the device has been fully initialized.
>> +
>> +A related issue is that device emulation code must be prepared for unexpected
>> +device register accesses while asynchronous operations are in progress. A
>> +well-behaved guest might wait for a completion interrupt before accessing
>> +certain device registers. Device emulation code must handle the case where the
>> +guest overwrites registers or submits further requests before an ongoing
>> +request completes. Unexpected accesses must not cause memory corruption or
>> +leaks in QEMU.
>> +
>> +Invalid device register accesses can be reported with
>> +``qemu_log_mask(LOG_GUEST_ERROR, ...)``. The ``-d guest_errors`` command-line
>> +option enables these log messages.
Thanks for adding this section!
>> +
>> +Live migration
>> +~~~~~~~~~~~~~~
>> +Device state can be saved to disk image files and shared with other users.
>> +Live migration code must validate inputs when loading device state so an
>> +attacker cannot gain control by crafting invalid device states. Device state
>> +is therefore considered untrusted even though it is typically generated by QEMU
>> +itself.
>> +
>> +Guest Memory Access Races
>> +~~~~~~~~~~~~~~~~~~~~~~~~~
>> +Guests with multiple vCPUs may modify guest RAM while device emulation code is
>> +running. Device emulation code must copy in descriptors and other guest RAM
>> +structures and only process the local copy. This prevents
>> +time-of-check-to-time-of-use (TOCTOU) race conditions that could cause QEMU to
>> +crash when a vCPU thread modifies guest RAM while device emulation is
>> +processing it.
>
> Anyway:
>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
* Re: [Qemu-devel] [PATCH v2] security.rst: add Security Guide to developer docs
@ 2019-05-03 10:19 ` Daniel P. Berrangé
0 siblings, 0 replies; 18+ messages in thread
From: Daniel P. Berrangé @ 2019-05-03 10:19 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Philippe Mathieu-Daudé,
Peter Maydell, Markus Armbruster, Paolo Bonzini, Eduardo Otubo
On Thu, Apr 25, 2019 at 02:35:03PM +0100, Stefan Hajnoczi wrote:
> At KVM Forum 2018 I gave a presentation on security in QEMU:
> https://www.youtube.com/watch?v=YAdRf_hwxU8 (video)
> https://vmsplice.net/~stefan/stefanha-kvm-forum-2018.pdf (slides)
>
> This patch adds a security guide to the developer docs. This document
> covers things that developers should know about security in QEMU. It is
> just a starting point that we can expand on later. I hope it will be
> useful as a resource for new contributors and will save code reviewers
> from explaining the same concepts many times.
I'm wondering if we should split this doc into two parts. The first 50%
of it is relevant not only to QEMU developers but also to downstream
developers of mgmt apps and to end users.
The latter half is purely of interest to QEMU developers.
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> v2:
> * Added mention of passthrough USB and PCI devices [philmd]
> * Reworded resource limits [philmd]
> * Added qemu_log_mask(LOG_GUEST_ERROR) [philmd]
> ---
> docs/devel/index.rst | 1 +
> docs/devel/security.rst | 225 ++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 226 insertions(+)
> create mode 100644 docs/devel/security.rst
>
> diff --git a/docs/devel/index.rst b/docs/devel/index.rst
> index ebbab636ce..fd0b5fa387 100644
> --- a/docs/devel/index.rst
> +++ b/docs/devel/index.rst
> @@ -20,3 +20,4 @@ Contents:
> stable-process
> testing
> decodetree
> + security
> diff --git a/docs/devel/security.rst b/docs/devel/security.rst
> new file mode 100644
> index 0000000000..83c6fb2231
> --- /dev/null
> +++ b/docs/devel/security.rst
> @@ -0,0 +1,225 @@
> +==============
> +Security Guide
> +==============
> +Overview
> +--------
> +This guide covers security topics relevant to developers working on QEMU. It
> +includes an explanation of the security requirements that QEMU gives its users,
> +the architecture of the code, and secure coding practices.
> +
> +Security Requirements
> +---------------------
> +QEMU supports many different use cases, some of which have stricter security
> +requirements than others. The community has agreed on the overall security
> +requirements that users may depend on. These requirements define what is
> +considered supported from a security perspective.
> +
> +Virtualization Use Case
> +~~~~~~~~~~~~~~~~~~~~~~~
> +The virtualization use case covers cloud and virtual private server (VPS)
> +hosting, as well as traditional data center and desktop virtualization. These
> +use cases rely on hardware virtualization extensions to execute guest code
> +safely on the physical CPU at close-to-native speed.
> +
> +The following entities are **untrusted**, meaning that they may be buggy or
> +malicious:
> +
> +* Guest
> +* User-facing interfaces (e.g. VNC, SPICE, WebSocket)
> +* Network protocols (e.g. NBD, live migration)
> +* User-supplied files (e.g. disk images, kernels, device trees)
> +* Passthrough devices (e.g. PCI, USB)
> +
> +Bugs affecting these entities are evaluated on whether they can cause damage in
> +real-world use cases and treated as security bugs if this is the case.
> +
> +Non-virtualization Use Case
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +The non-virtualization use case covers emulation using the Tiny Code Generator
> +(TCG). In principle the TCG and device emulation code used in conjunction with
> +the non-virtualization use case should meet the same security requirements as
> +the virtualization use case. However, for historical reasons much of the
> +non-virtualization use case code was not written with these security
> +requirements in mind.
> +
> +Bugs affecting the non-virtualization use case are not considered security
> +bugs at this time. Users with non-virtualization use cases must not rely on
> +QEMU to provide guest isolation or any security guarantees.
> +
> +Architecture
> +------------
> +This section describes the design principles that ensure the security
> +requirements are met.
> +
> +Guest Isolation
> +~~~~~~~~~~~~~~~
> +Guest isolation is the confinement of guest code to the virtual machine. When
> +guest code gains control of execution on the host this is called escaping the
> +virtual machine. Isolation also includes resource limits such as throttling of
> +CPU, memory, disk, or network. Guests must be unable to exceed their resource
> +limits.
> +
> +QEMU presents an attack surface to the guest in the form of emulated devices.
> +The guest must not be able to gain control of QEMU. Bugs in emulated devices
> +could allow malicious guests to gain code execution in QEMU. At this point the
> +guest has escaped the virtual machine and is able to act in the context of the
> +QEMU process on the host.
> +
> +Guests often interact with other guests and share resources with them. A
> +malicious guest must not gain control of other guests or access their data.
> +Disk image files and network traffic must be protected from other guests unless
> +explicitly shared between them by the user.
> +
> +Principle of Least Privilege
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +The principle of least privilege states that each component only has access to
> +the privileges necessary for its function. In the case of QEMU this means that
> +each process only has access to resources belonging to the guest.
> +
> +The QEMU process should not have access to any resources that are inaccessible
> +to the guest. This way the guest does not gain anything by escaping into the
> +QEMU process since it already has access to those same resources from within
> +the guest.
> +
> +Following the principle of least privilege immediately fulfills guest isolation
> +requirements. For example, guest A only has access to its own disk image file
> +``a.img`` and not guest B's disk image file ``b.img``.
> +
> +In reality certain resources are inaccessible to the guest but must be
> +available to QEMU to perform its function. For example, host system calls are
> +necessary for QEMU but are not exposed to guests. A guest that escapes into
> +the QEMU process can then begin invoking host system calls.
> +
> +New features must be designed to follow the principle of least privilege.
> +Should this not be possible for technical reasons, the security risk must be
> +clearly documented so users are aware of the trade-off of enabling the feature.
> +
> +Isolation mechanisms
> +~~~~~~~~~~~~~~~~~~~~
> +Several isolation mechanisms are available to realize this architecture of
> +guest isolation and the principle of least privilege. With the exception of
> +Linux seccomp, these mechanisms are all deployed by management tools that
> +launch QEMU, such as libvirt. They are also platform-specific so they are only
> +described briefly for Linux here.
> +
> +The fundamental isolation mechanism is that QEMU processes must run as
> +**unprivileged users**. Sometimes it seems more convenient to launch QEMU as
> +root to give it access to host devices (e.g. ``/dev/net/tun``) but this poses a
> +huge security risk. File descriptor passing can be used to give an otherwise
> +unprivileged QEMU process access to host devices without running QEMU as root.
> +
> +**SELinux** and **AppArmor** make it possible to confine processes beyond the
> +traditional UNIX process and file permissions model. They restrict the QEMU
> +process from accessing processes and files on the host system that are not
> +needed by QEMU.
> +
> +**Resource limits** and **cgroup controllers** provide throughput and utilization
> +limits on key resources such as CPU time, memory, and I/O bandwidth.
> +
> +**Linux namespaces** can be used to make process, file system, and other system
> +resources unavailable to QEMU. A namespaced QEMU process is restricted to only
> +those resources that were granted to it.
> +
> +**Linux seccomp** is available via the QEMU ``--sandbox`` option. It disables
> +system calls that are not needed by QEMU, thereby reducing the host kernel
> +attack surface.
Break here.
Everything above here is useful to QEMU devs, app devs & end users and
should be made part of the main QEMU doc - convert it to texi and @include
it from qemu-doc.texi, as we do for other stuff under docs/
Everything below here could just be renamed to "secure-coding-practices.rst"
and solely target qemu devs.
> +
> +Secure coding practices
> +-----------------------
> +At the source code level there are several points to keep in mind. Both
> +developers and security researchers must be aware of them so that they can
> +develop safe code and audit existing code properly.
> +
> +General Secure C Coding Practices
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +Most CVEs (security bugs) reported against QEMU are not specific to
> +virtualization or emulation. They are simply C programming bugs. Therefore
> +it's critical to be aware of common classes of security bugs.
> +
> +There is a wide selection of resources available covering secure C coding. For
> +example, the `CERT C Coding Standard
> +<https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard>`_
> +covers the most important classes of security bugs.
> +
> +Instead of describing them in detail here, only the names of the most important
> +classes of security bugs are mentioned:
> +
> +* Buffer overflows
> +* Use-after-free and double-free
> +* Integer overflows
> +* Format string vulnerabilities
> +
> +Some of these classes of bugs can be detected by analyzers. Static analysis is
> +performed regularly by Coverity and the most obvious of these bugs are even
> +reported by compilers. Dynamic analysis is possible with valgrind, tsan, and
> +asan.
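As a concrete instance of the integer overflow class listed above, here is a
minimal sketch of an overflow-checked allocation where the element count
comes from an untrusted source. The helper name `alloc_elems_checked` is
hypothetical and it uses plain `malloc()` rather than any QEMU allocator.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate room for count elements of elem_size bytes each.
 * Returns NULL when either argument is zero, when the multiplication
 * would overflow size_t, or when malloc() fails.
 */
static void *alloc_elems_checked(size_t count, size_t elem_size)
{
    if (count == 0 || elem_size == 0) {
        return NULL;
    }
    if (count > SIZE_MAX / elem_size) {
        /* count * elem_size would wrap and yield a too-small buffer. */
        return NULL;
    }
    return malloc(count * elem_size);
}
```

Without the division check, a wrapped product allocates a short buffer that
later writes overflow, which turns an integer bug into a buffer overflow.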
> +
> +Input Validation
> +~~~~~~~~~~~~~~~~
> +Inputs from the guest or external sources (e.g. network, files) cannot be
> +trusted and may be invalid. Inputs must be checked before using them in a way
> +that could crash the program, expose host memory to the guest, or otherwise be
> +exploitable by an attacker.
> +
> +The most sensitive attack surface is device emulation. All hardware register
> +accesses and data read from guest memory must be validated. A typical example
> +is a device that contains multiple units that are selectable by the guest via
> +an index register::
> +
> + typedef struct {
> + ProcessingUnit unit[2];
> + ...
> + } MyDeviceState;
> +
> + static void mydev_writel(void *opaque, uint32_t addr, uint32_t val)
> + {
> + MyDeviceState *mydev = opaque;
> + ProcessingUnit *unit;
> +
> + switch (addr) {
> + case MYDEV_SELECT_UNIT:
> + unit = &mydev->unit[val]; <-- this input wasn't validated!
> + ...
> + }
> + }
> +
> +If ``val`` is not in range [0, 1] then an out-of-bounds memory access will take
> +place when ``unit`` is dereferenced. The code must check that ``val`` is 0 or
> +1 and handle the case where it is invalid.
> +
> +Unexpected Device Accesses
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +The guest may access device registers in unusual orders or at unexpected
> +moments. Device emulation code must not assume that the guest follows the
> +typical "theory of operation" presented in driver writer manuals. The guest
> +may make nonsense accesses to device registers such as starting operations
> +before the device has been fully initialized.
> +
> +A related issue is that device emulation code must be prepared for unexpected
> +device register accesses while asynchronous operations are in progress. A
> +well-behaved guest might wait for a completion interrupt before accessing
> +certain device registers. Device emulation code must handle the case where the
> +guest overwrites registers or submits further requests before an ongoing
> +request completes. Unexpected accesses must not cause memory corruption or
> +leaks in QEMU.
> +
> +Invalid device register accesses can be reported with
> +``qemu_log_mask(LOG_GUEST_ERROR, ...)``. The ``-d guest_errors`` command-line
> +option enables these log messages.
> +
> +Live migration
> +~~~~~~~~~~~~~~
> +Device state can be saved to disk image files and shared with other users.
> +Live migration code must validate inputs when loading device state so an
> +attacker cannot gain control by crafting invalid device states. Device state
> +is therefore considered untrusted even though it is typically generated by QEMU
> +itself.
> +
> +Guest Memory Access Races
> +~~~~~~~~~~~~~~~~~~~~~~~~~
> +Guests with multiple vCPUs may modify guest RAM while device emulation code is
> +running. Device emulation code must copy in descriptors and other guest RAM
> +structures and only process the local copy. This prevents
> +time-of-check-to-time-of-use (TOCTOU) race conditions that could cause QEMU to
> +crash when a vCPU thread modifies guest RAM while device emulation is
> +processing it.
> --
> 2.20.1
>
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [Qemu-devel] [PATCH v2] security.rst: add Security Guide to developer docs
@ 2019-05-03 10:19 ` Daniel P. Berrangé
0 siblings, 0 replies; 18+ messages in thread
From: Daniel P. Berrangé @ 2019-05-03 10:19 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Eduardo Otubo, Peter Maydell, qemu-devel, Markus Armbruster,
Paolo Bonzini, Philippe Mathieu-Daudé
On Thu, Apr 25, 2019 at 02:35:03PM +0100, Stefan Hajnoczi wrote:
> At KVM Forum 2018 I gave a presentation on security in QEMU:
> https://www.youtube.com/watch?v=YAdRf_hwxU8 (video)
> https://vmsplice.net/~stefan/stefanha-kvm-forum-2018.pdf (slides)
>
> This patch adds a security guide to the developer docs. This document
> covers things that developers should know about security in QEMU. It is
> just a starting point that we can expand on later. I hope it will be
> useful as a resource for new contributors and will save code reviewers
> from explaining the same concepts many times.
I'm wondering if we should split this doc into two parts. The first 50%
of it is relevant not only to QEMU developers but also to downstream
developers of mgmt apps and to end users.
The latter half is purely of interest to QEMU developers.
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> v2:
> * Added mention of passthrough USB and PCI devices [philmd]
> * Reworded resource limits [philmd]
> * Added qemu_log_mask(LOG_GUEST_ERROR) [philmd]
> ---
> docs/devel/index.rst | 1 +
> docs/devel/security.rst | 225 ++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 226 insertions(+)
> create mode 100644 docs/devel/security.rst
>
> diff --git a/docs/devel/index.rst b/docs/devel/index.rst
> index ebbab636ce..fd0b5fa387 100644
> --- a/docs/devel/index.rst
> +++ b/docs/devel/index.rst
> @@ -20,3 +20,4 @@ Contents:
> stable-process
> testing
> decodetree
> + security
> diff --git a/docs/devel/security.rst b/docs/devel/security.rst
> new file mode 100644
> index 0000000000..83c6fb2231
> --- /dev/null
> +++ b/docs/devel/security.rst
> @@ -0,0 +1,225 @@
> +==============
> +Security Guide
> +==============
> +Overview
> +--------
> +This guide covers security topics relevant to developers working on QEMU. It
> +includes an explanation of the security requirements that QEMU gives its users,
> +the architecture of the code, and secure coding practices.
> +
> +Security Requirements
> +---------------------
> +QEMU supports many different use cases, some of which have stricter security
> +requirements than others. The community has agreed on the overall security
> +requirements that users may depend on. These requirements define what is
> +considered supported from a security perspective.
> +
> +Virtualization Use Case
> +~~~~~~~~~~~~~~~~~~~~~~~
> +The virtualization use case covers cloud and virtual private server (VPS)
> +hosting, as well as traditional data center and desktop virtualization. These
> +use cases rely on hardware virtualization extensions to execute guest code
> +safely on the physical CPU at close-to-native speed.
> +
> +The following entities are **untrusted**, meaning that they may be buggy or
> +malicious:
> +
> +* Guest
> +* User-facing interfaces (e.g. VNC, SPICE, WebSocket)
> +* Network protocols (e.g. NBD, live migration)
> +* User-supplied files (e.g. disk images, kernels, device trees)
> +* Passthrough devices (e.g. PCI, USB)
> +
> +Bugs affecting these entities are evaluated on whether they can cause damage in
> +real-world use cases and treated as security bugs if this is the case.
> +
> +Non-virtualization Use Case
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +The non-virtualization use case covers emulation using the Tiny Code Generator
> +(TCG). In principle the TCG and device emulation code used in conjunction with
> +the non-virtualization use case should meet the same security requirements as
> +the virtualization use case. However, for historical reasons much of the
> +non-virtualization use case code was not written with these security
> +requirements in mind.
> +
> +Bugs affecting the non-virtualization use case are not considered security
> +bugs at this time. Users with non-virtualization use cases must not rely on
> +QEMU to provide guest isolation or any security guarantees.
> +
> +Architecture
> +------------
> +This section describes the design principles that ensure the security
> +requirements are met.
> +
> +Guest Isolation
> +~~~~~~~~~~~~~~~
> +Guest isolation is the confinement of guest code to the virtual machine. When
> +guest code gains control of execution on the host this is called escaping the
> +virtual machine. Isolation also includes resource limits such as throttling of
> +CPU, memory, disk, or network. Guests must be unable to exceed their resource
> +limits.
> +
> +QEMU presents an attack surface to the guest in the form of emulated devices.
> +The guest must not be able to gain control of QEMU. Bugs in emulated devices
> +could allow malicious guests to gain code execution in QEMU. At this point the
> +guest has escaped the virtual machine and is able to act in the context of the
> +QEMU process on the host.
> +
> +Guests often interact with other guests and share resources with them. A
> +malicious guest must not gain control of other guests or access their data.
> +Disk image files and network traffic must be protected from other guests unless
> +explicitly shared between them by the user.
> +
> +Principle of Least Privilege
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +The principle of least privilege states that each component only has access to
> +the privileges necessary for its function. In the case of QEMU this means that
> +each process only has access to resources belonging to the guest.
> +
> +The QEMU process should not have access to any resources that are inaccessible
> +to the guest. This way the guest does not gain anything by escaping into the
> +QEMU process since it already has access to those same resources from within
> +the guest.
> +
> +Following the principle of least privilege immediately fulfills guest isolation
> +requirements. For example, guest A only has access to its own disk image file
> +``a.img`` and not guest B's disk image file ``b.img``.
> +
> +In reality certain resources are inaccessible to the guest but must be
> +available to QEMU to perform its function. For example, host system calls are
> +necessary for QEMU but are not exposed to guests. A guest that escapes into
> +the QEMU process can then begin invoking host system calls.
> +
> +New features must be designed to follow the principle of least privilege.
> +Should this not be possible for technical reasons, the security risk must be
> +clearly documented so users are aware of the trade-off of enabling the feature.
> +
> +Isolation mechanisms
> +~~~~~~~~~~~~~~~~~~~~
> +Several isolation mechanisms are available to realize this architecture of
> +guest isolation and the principle of least privilege. With the exception of
> +Linux seccomp, these mechanisms are all deployed by management tools that
> +launch QEMU, such as libvirt. They are also platform-specific so they are only
> +described briefly for Linux here.
> +
> +The fundamental isolation mechanism is that QEMU processes must run as
> +**unprivileged users**. Sometimes it seems more convenient to launch QEMU as
> +root to give it access to host devices (e.g. ``/dev/net/tun``) but this poses a
> +huge security risk. File descriptor passing can be used to give an otherwise
> +unprivileged QEMU process access to host devices without running QEMU as root.
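For readers unfamiliar with file descriptor passing, here is a minimal sketch
of the idea, assuming a privileged launcher and an unprivileged process that
are connected by a UNIX domain socket. The helper names send_fd() and
recv_fd() are invented for illustration and are not taken from QEMU or
libvirt.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: send one open file descriptor over a UNIX socket. */
int send_fd(int sock, int fd)
{
    char dummy = 'F';   /* SCM_RIGHTS needs at least one byte of payload */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        struct cmsghdr align;               /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one file descriptor, or -1 on failure. */
int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }

    /* Do not trust the peer: check the control message before using it. */
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver ends up with its own descriptor for the same open file, so it
never needs permission to open the device node itself. On the QEMU side an
already-open descriptor can then be handed over with options such as
-netdev tap,fd=N.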
> +
> +**SELinux** and **AppArmor** make it possible to confine processes beyond the
> +traditional UNIX process and file permissions model. They restrict the QEMU
> +process from accessing processes and files on the host system that are not
> +needed by QEMU.
> +
> +**Resource limits** and **cgroup controllers** provide throughput and utilization
> +limits on key resources such as CPU time, memory, and I/O bandwidth.
> +
> +**Linux namespaces** can be used to make process, file system, and other system
> +resources unavailable to QEMU. A namespaced QEMU process is restricted to only
> +those resources that were granted to it.
> +
> +**Linux seccomp** is available via the QEMU ``--sandbox`` option. It disables
> +system calls that are not needed by QEMU, thereby reducing the host kernel
> +attack surface.
Break here.
Everything above here is useful to QEMU devs, app devs & end users and
should be made part of the main QEMU doc - convert it to texi and @include
it from qemu-doc.texi, as we do for other stuff under docs/
Everything below here could just be renamed to "secure-coding-practices.rst"
and solely target qemu devs.
> +
> +Secure coding practices
> +-----------------------
> +At the source code level there are several points to keep in mind. Both
> +developers and security researchers must be aware of them so that they can
> +develop safe code and audit existing code properly.
> +
> +General Secure C Coding Practices
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +Most CVEs (security bugs) reported against QEMU are not specific to
> +virtualization or emulation. They are simply C programming bugs. Therefore
> +it's critical to be aware of common classes of security bugs.
> +
> +There is a wide selection of resources available covering secure C coding. For
> +example, the `CERT C Coding Standard
> +<https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard>`_
> +covers the most important classes of security bugs.
> +
> +Instead of describing them in detail here, only the names of the most important
> +classes of security bugs are mentioned:
> +
> +* Buffer overflows
> +* Use-after-free and double-free
> +* Integer overflows
> +* Format string vulnerabilities
> +
> +Some of these classes of bugs can be detected by analyzers. Static analysis is
> +performed regularly by Coverity and the most obvious of these bugs are even
> +reported by compilers. Dynamic analysis is possible with Valgrind,
> +ThreadSanitizer (TSan), and AddressSanitizer (ASan).
> +
> +Input Validation
> +~~~~~~~~~~~~~~~~
> +Inputs from the guest or external sources (e.g. network, files) cannot be
> +trusted and may be invalid. Inputs must be checked before using them in a way
> +that could crash the program, expose host memory to the guest, or otherwise be
> +exploitable by an attacker.
> +
> +The most sensitive attack surface is device emulation. All hardware register
> +accesses and data read from guest memory must be validated. A typical example
> +is a device that contains multiple units that are selectable by the guest via
> +an index register::
> +
> +    typedef struct {
> +        ProcessingUnit unit[2];
> +        ...
> +    } MyDeviceState;
> +
> +    static void mydev_writel(void *opaque, uint32_t addr, uint32_t val)
> +    {
> +        MyDeviceState *mydev = opaque;
> +        ProcessingUnit *unit;
> +
> +        switch (addr) {
> +        case MYDEV_SELECT_UNIT:
> +            unit = &mydev->unit[val];   <-- this input wasn't validated!
> +            ...
> +        }
> +    }
> +
> +If ``val`` is not in range [0, 1] then an out-of-bounds memory access will take
> +place when ``unit`` is dereferenced. The code must check that ``val`` is 0 or
> +1 and handle the case where it is invalid.
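A checked version of the example might look like the sketch below. It is
only an illustration: the register offset, the MYDEV_NUM_UNITS constant, the
contents of ProcessingUnit and the "selected" field are assumptions added so
that the snippet builds on its own.

```c
#include <stddef.h>
#include <stdint.h>

#define MYDEV_SELECT_UNIT 0x00  /* register offset assumed for this sketch */
#define MYDEV_NUM_UNITS   2

typedef struct {
    uint32_t ctrl;              /* placeholder field */
} ProcessingUnit;

typedef struct {
    ProcessingUnit unit[MYDEV_NUM_UNITS];
    ProcessingUnit *selected;   /* unit chosen by the guest, NULL if none */
} MyDeviceState;

void mydev_writel(void *opaque, uint32_t addr, uint32_t val)
{
    MyDeviceState *mydev = opaque;

    switch (addr) {
    case MYDEV_SELECT_UNIT:
        if (val >= MYDEV_NUM_UNITS) {
            /* Invalid index from the guest: ignore the write. */
            return;
        }
        mydev->selected = &mydev->unit[val];
        break;
    default:
        break;
    }
}
```

Comparing against the array size rather than a literal 1 keeps the check
correct if the number of units changes later. Whether an invalid write is
ignored, logged, or latched as an error bit depends on what the real
hardware does.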
> +
> +Unexpected Device Accesses
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +The guest may access device registers in unusual orders or at unexpected
> +moments. Device emulation code must not assume that the guest follows the
> +typical "theory of operation" presented in driver writer manuals. The guest
> +may make nonsense accesses to device registers such as starting operations
> +before the device has been fully initialized.
> +
> +A related issue is that device emulation code must be prepared for unexpected
> +device register accesses while asynchronous operations are in progress. A
> +well-behaved guest might wait for a completion interrupt before accessing
> +certain device registers. Device emulation code must handle the case where the
> +guest overwrites registers or submits further requests before an ongoing
> +request completes. Unexpected accesses must not cause memory corruption or
> +leaks in QEMU.
> +
> +Invalid device register accesses can be reported with
> +``qemu_log_mask(LOG_GUEST_ERROR, ...)``. The ``-d guest_errors`` command-line
> +option enables these log messages.
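One way to picture this is a device that latches request parameters and
refuses to change them while a request is in flight. The sketch below is
hypothetical: the register layout and field names are invented, and
guest_error() is a stand-in for qemu_log_mask(LOG_GUEST_ERROR, ...) so that
the snippet compiles outside the QEMU tree.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Register offsets invented for this sketch. */
#define MYDEV_REG_ADDR  0x00
#define MYDEV_REG_START 0x04

typedef struct {
    bool busy;              /* a request is currently in flight */
    uint32_t req_addr;      /* parameter latched for the request */
    unsigned guest_errors;  /* number of rejected accesses */
} MyDeviceState;

/* Stand-in for qemu_log_mask(LOG_GUEST_ERROR, ...). */
static void guest_error(MyDeviceState *s, const char *msg)
{
    s->guest_errors++;
    fprintf(stderr, "mydev: guest error: %s\n", msg);
}

void mydev_writel(MyDeviceState *s, uint32_t addr, uint32_t val)
{
    switch (addr) {
    case MYDEV_REG_ADDR:
        if (s->busy) {
            /* The in-flight request still depends on req_addr. */
            guest_error(s, "ADDR written while request in flight");
            return;
        }
        s->req_addr = val;
        break;
    case MYDEV_REG_START:
        if (s->busy) {
            guest_error(s, "START written while request in flight");
            return;
        }
        s->busy = true;
        break;
    default:
        guest_error(s, "write to unknown register");
        break;
    }
}

/* Called from the (not shown) completion path of the request. */
void mydev_complete(MyDeviceState *s)
{
    s->busy = false;
}
```

The point is that every register handler checks the device state it depends
on instead of assuming the driver waited for the completion interrupt.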
> +
> +Live migration
> +~~~~~~~~~~~~~~
> +Device state can be saved to disk image files and shared with other users.
> +Live migration code must validate inputs when loading device state so an
> +attacker cannot gain control by crafting invalid device states. Device state
> +is therefore considered untrusted even though it is typically generated by QEMU
> +itself.
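As a sketch of what validating loaded state can mean in practice, the
function below has the shape of a post-load hook that rejects a migration
stream whose fields are out of range. The device, its fields and its limits
are assumptions made up for the example.

```c
#include <errno.h>
#include <stdint.h>

#define MYDEV_NUM_UNITS 2
#define MYDEV_FIFO_SIZE 16

typedef struct {
    uint32_t selected_unit;         /* index loaded from the stream */
    uint32_t fifo_len;              /* number of valid bytes in fifo[] */
    uint8_t fifo[MYDEV_FIFO_SIZE];
} MyDeviceState;

/* Returns 0 if the loaded state is usable, a negative errno otherwise. */
int mydev_post_load(void *opaque, int version_id)
{
    MyDeviceState *s = opaque;

    (void)version_id;               /* unused in this sketch */

    if (s->selected_unit >= MYDEV_NUM_UNITS) {
        return -EINVAL;             /* would index past unit[] later */
    }
    if (s->fifo_len > MYDEV_FIFO_SIZE) {
        return -EINVAL;             /* would read past fifo[] later */
    }
    return 0;
}
```

These are the same checks the register write handlers apply to guest input.
The incoming stream just offers a second path by which bad values can reach
the device state.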
> +
> +Guest Memory Access Races
> +~~~~~~~~~~~~~~~~~~~~~~~~~
> +Guests with multiple vCPUs may modify guest RAM while device emulation code is
> +running. Device emulation code must copy in descriptors and other guest RAM
> +structures and only process the local copy. This prevents
> +time-of-check-to-time-of-use (TOCTOU) race conditions that could cause QEMU to
> +crash when a vCPU thread modifies guest RAM while device emulation is
> +processing it.
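The copy-in pattern can be sketched as follows. This is an illustration
under assumptions: MyDesc and mydev_fetch_desc() are invented names, and a
plain memcpy() from a pointer stands in for whichever guest memory accessor
the real device code would use.

```c
#include <stdint.h>
#include <string.h>

#define MYDEV_BUF_SIZE 64

typedef struct {
    uint32_t len;       /* payload length chosen by the guest */
    uint32_t flags;
} MyDesc;

/*
 * Copy a descriptor out of guest RAM once, then validate the copy.
 * Returns 0 and fills *out on success, -1 if the descriptor is invalid.
 * guest_desc stands for memory that other vCPUs may rewrite at any time.
 */
int mydev_fetch_desc(const void *guest_desc, MyDesc *out)
{
    MyDesc local;

    memcpy(&local, guest_desc, sizeof(local));  /* the only read of guest RAM */

    if (local.len > MYDEV_BUF_SIZE) {
        return -1;
    }

    /* From here on, callers must use *out and never re-read guest RAM. */
    *out = local;
    return 0;
}
```

If the code validated guest_desc->len and then read it again when copying
the payload, the guest could enlarge the value between the two reads and
defeat the check.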
> --
> 2.20.1
>
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH v2] security.rst: add Security Guide to developer docs
@ 2019-05-03 10:28 ` Peter Maydell
0 siblings, 0 replies; 18+ messages in thread
From: Peter Maydell @ 2019-05-03 10:28 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Stefan Hajnoczi, QEMU Developers, Philippe Mathieu-Daudé,
Markus Armbruster, Paolo Bonzini, Eduardo Otubo
On Fri, 3 May 2019 at 11:19, Daniel P. Berrangé <berrange@redhat.com> wrote:
> Everything above here is useful to QEMU devs, app devs & end users and
> should be made part of the main QEMU doc - convert it to texi and @include
> it from qemu-doc.texi, as we do for other stuff under docs/
If we convert it to texi we'll have to convert it back again
as/when we migrate properly from texi to sphinx... (I would
like to make further moves in that direction during this
release cycle -- just need to find the time to work on it.)
thanks
-- PMM
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH v2] security.rst: add Security Guide to developer docs
@ 2019-05-03 10:35 ` Daniel P. Berrangé
0 siblings, 0 replies; 18+ messages in thread
From: Daniel P. Berrangé @ 2019-05-03 10:35 UTC (permalink / raw)
To: Peter Maydell
Cc: Stefan Hajnoczi, QEMU Developers, Philippe Mathieu-Daudé,
Markus Armbruster, Paolo Bonzini, Eduardo Otubo
On Fri, May 03, 2019 at 11:28:53AM +0100, Peter Maydell wrote:
> On Fri, 3 May 2019 at 11:19, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > Everything above here is useful to QEMU devs, app devs & end users and
> > should be made part of the main QEMU doc - convert it to texi and @include
> > it from qemu-doc.texi, as we do for other stuff under docs/
>
> If we convert it to texi we'll have to convert it back again
> as/when we migrate properly from texi to sphinx... (I would
> like to make further moves in that direction during this
> release cycle -- just need to find the time to work on it.)
Yes, but we're only talking about 100-150 lines of simple text with
minimal markup needs. Won't be a noticeable extra burden compared to
the pre-existing 4700 lines of texi markup for qemu-doc.texi and its
includes.
Regards,
Daniel
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH v2] security.rst: add Security Guide to developer docs
@ 2019-05-03 17:30 ` Stefan Hajnoczi
0 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2019-05-03 17:30 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Peter Maydell, QEMU Developers, Philippe Mathieu-Daudé,
Markus Armbruster, Paolo Bonzini, Eduardo Otubo
On Fri, May 03, 2019 at 11:35:29AM +0100, Daniel P. Berrangé wrote:
> On Fri, May 03, 2019 at 11:28:53AM +0100, Peter Maydell wrote:
> > On Fri, 3 May 2019 at 11:19, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > Everything above here is useful to QEMU devs, app devs & end users and
> > > should be made part of the main QEMU doc - convert it to texi and @include
> > > it from qemu-doc.texi, as we do for other stuff under docs/
> >
> > If we convert it to texi we'll have to convert it back again
> > as/when we migrate properly from texi to sphinx... (I would
> > like to make further moves in that direction during this
> > release cycle -- just need to find the time to work on it.)
>
> Yes, but we're only talking about 100-150 lines of simple text with
> > minimal markup needs. Won't be a noticeable extra burden compared to
> the pre-existing 4700 lines of texi markup for qemu-doc.texi and its
> includes.
I'm happy to split as suggested and do it in texi for now.
I am also happy to convert the file back to rst again later.
Stefan
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Qemu-devel] [PATCH v2] security.rst: add Security Guide to developer docs
@ 2019-05-03 17:32 ` Stefan Hajnoczi
0 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2019-05-03 17:32 UTC (permalink / raw)
To: Alex Bennée
Cc: qemu-devel, Eduardo Otubo, Peter Maydell, Markus Armbruster,
Paolo Bonzini, Philippe Mathieu-Daudé
On Fri, May 03, 2019 at 10:04:10AM +0100, Alex Bennée wrote:
> Stefan Hajnoczi <stefanha@redhat.com> writes:
> > +Isolation mechanisms
> > +~~~~~~~~~~~~~~~~~~~~
> > +Several isolation mechanisms are available to realize this architecture of
> > +guest isolation and the principle of least privilege. With the exception of
> > +Linux seccomp, these mechanisms are all deployed by management tools that
> > +launch QEMU, such as libvirt. They are also platform-specific so they are only
> > +described briefly for Linux here.
> > +
> > +The fundamental isolation mechanism is that QEMU processes must run as
> > +**unprivileged users**. Sometimes it seems more convenient to launch QEMU as
> > +root to give it access to host devices (e.g. ``/dev/net/tun``) but this poses a
> > +huge security risk. File descriptor passing can be used to give an otherwise
> > +unprivileged QEMU process access to host devices without running QEMU
> > as root.
>
> Should we mention that you can still maintain running as a user and just
> make the devices you need available to the user/group rather than
> becoming root? For example I generally make /dev/kvm group accessible to
> my user account.
Sure. I checked that /dev/vhost-* device nodes are root:root on Fedora
so at least the distro doesn't expect you to do that. The /dev/kvm
device node is root:kvm so it's easy to do it by joining the kvm group
there.
Stefan
^ permalink raw reply [flat|nested] 18+ messages in thread
Thread overview: 18+ messages
2019-04-25 13:35 [Qemu-devel] [PATCH v2] security.rst: add Security Guide to developer docs Stefan Hajnoczi
2019-04-25 13:35 ` Stefan Hajnoczi
2019-05-01 16:20 ` Stefan Hajnoczi
2019-05-01 16:20 ` Stefan Hajnoczi
2019-05-03 8:14 ` Stefano Garzarella
2019-05-03 8:14 ` Stefano Garzarella
2019-05-03 9:04 ` Alex Bennée
2019-05-03 10:10 ` Philippe Mathieu-Daudé
2019-05-03 17:32 ` Stefan Hajnoczi
2019-05-03 17:32 ` Stefan Hajnoczi
2019-05-03 10:19 ` Daniel P. Berrangé
2019-05-03 10:19 ` Daniel P. Berrangé
2019-05-03 10:28 ` Peter Maydell
2019-05-03 10:28 ` Peter Maydell
2019-05-03 10:35 ` Daniel P. Berrangé
2019-05-03 10:35 ` Daniel P. Berrangé
2019-05-03 17:30 ` Stefan Hajnoczi
2019-05-03 17:30 ` Stefan Hajnoczi