* [RFC PATCH v2 00/19] Hypervisor-Enforced Kernel Integrity
@ 2023-11-13  2:23 Mickaël Salaün
  2023-11-13  2:23 ` [RFC PATCH v2 01/19] virt: Introduce Hypervisor Enforced Kernel Integrity (Heki) Mickaël Salaün
                   ` (18 more replies)
  0 siblings, 19 replies; 44+ messages in thread
From: Mickaël Salaün @ 2023-11-13  2:23 UTC (permalink / raw)
  To: Borislav Petkov, Dave Hansen, H . Peter Anvin, Ingo Molnar,
	Kees Cook, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Vitaly Kuznetsov, Wanpeng Li
  Cc: Mickaël Salaün, Alexander Graf, Chao Peng, Edgecombe,
	Rick P, Forrest Yuan Yu, James Gowans, James Morris,
	John Andersen, Madhavan T . Venkataraman, Marian Rotariu,
	Mihai Donțu, Nicușor Cîțu, Thara Gopinath,
	Trilok Soni, Wei Liu, Will Deacon, Yu Zhang, Zahra Tarkhani,
	Ștefan Șicleru, dev, kvm, linux-hardening,
	linux-hyperv, linux-kernel, linux-security-module, qemu-devel,
	virtualization, x86, xen-devel

Hi,

This patch series is a proof-of-concept that implements new KVM features
(guest memory attributes, MBEC support, CR pinning) and defines a new
API to protect guest VMs. You can find related resources, including the
related commits here: https://github.com/heki-linux
We'll talk about this work and the related LVBS project at LPC:
* https://lpc.events/event/17/contributions/1486/
* https://lpc.events/event/17/contributions/1515/

The main idea is that kernel self-protection mechanisms should be
delegated to a more privileged part of the system, that is the
hypervisor.  It is still the role of the guest kernel to request such
restrictions according to its configuration. The high-level security
guarantees provided by the hypervisor are semantically the same as a
subset of those the kernel already enforces on itself (CR pinning
hardening and memory protections), but with a much higher level of
assurance.

We'd like the mainline kernel to support such hardening features
leveraging virtualization. We're looking for reviews and comments that
can help mainline these two parts: the KVM implementation and the guest
kernel API layer designed to support different hypervisors. The guest
kernel API layer contains a global struct heki_hypervisor to share data
and functions between the common code and the hypervisor support code.
The struct heki_hypervisor makes it possible to plug in different
backend implementations that are initialized with the heki_early_init()
and heki_late_init() calls. This RFC is a call for collaboration. There
is a lot to do on the hypervisor, guest kernel, and VMM sides.
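
To illustrate the backend plug-in idea described above, here is a
minimal user-space sketch. It is not the code from this series: the
struct layout, the field names, and every sk_* and demo_* identifier are
assumptions made for illustration only. It only shows how common code
can call into whichever hypervisor backend registered itself.

```c
#include <stddef.h>

/* Hypothetical backend description; field names are illustrative only. */
struct sk_heki_hypervisor {
	const char *name;
	/* Ask the hypervisor to pin control registers. */
	int (*lock_crs)(void);
	/* Ask the hypervisor to apply attributes to one guest page. */
	int (*protect_memory)(unsigned long pfn, unsigned long attrs);
};

/* Global pointer shared by common code and hypervisor support code. */
static const struct sk_heki_hypervisor *sk_backend;

/* Hypervisor support code would call this once it detects its host. */
static void sk_register_backend(const struct sk_heki_hypervisor *hyp)
{
	sk_backend = hyp;
}

/* Common code: forward to the backend, or fail if none is available. */
static int sk_common_lock_crs(void)
{
	if (!sk_backend || !sk_backend->lock_crs)
		return -1;
	return sk_backend->lock_crs();
}

/* A stand-in backend that always succeeds. */
static int demo_lock_crs(void)
{
	return 0;
}

static const struct sk_heki_hypervisor demo_backend = {
	.name = "demo",
	.lock_crs = demo_lock_crs,
};
```

With no backend registered, sk_common_lock_crs() fails; after
sk_register_backend(&demo_backend) it forwards to the backend.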

We took inspiration from previous patches, mainly the KVMI [1] [2] and
KVM CR-pinning [3] series, revamped and simplified relevant parts to fit
well with our goal, added support for MBEC, added two hypercalls, and
created a kernel API for VMs to request protection in a generic way that
can be leveraged by any hypervisor.

This patch series is based on the kvm-x86's guest_memfd branch [4] [5],
and requires the host to support MBEC. This can easily be checked with:

  grep ept_mode_based_exec /proc/cpuinfo

You can test it by enabling CONFIG_HEKI, CONFIG_HEKI_TEST,
CONFIG_KUNIT_DEFAULT_ENABLED, and adding the heki_test=N boot argument
to the guest as explained in the last patch.

# Main changes since v1

We replaced the KVM's page tracking mechanism with the new per-page
attributes patch series [5]. The main difference is that the Heki
per-page attributes should be set by the guest instead of the host.
Indeed, the security policy is defined by the guest and the host should
only be able to enforce it on its side (e.g. device drivers).
Furthermore, the host may not be trusted by the guest
(i.e., confidential computing).

The main previous limitation was the statically enforced permissions.
Mechanisms that dynamically impact kernel executable memory are now
handled (e.g., kernel modules, tracepoints, eBPF JIT) but not
authenticated yet (see Current limitations).

This version supports dynamic kernel memory permissions.  However, the
kernel does not know about all the pages that have been assigned to the
guest. It knows only about the memory that has been passed to it. The
VMM is the one that knows about all the pages. In a future version, the
VMM will be enhanced to set the memory attributes for the pages that are
not known to the kernel. The kernel will then set permissions for the
pages that are actually mapped in its address space. Other pages are
left alone. In order to set EPT permissions for a page, KVM can look up
the memory attributes for the page. If the attributes are not present,
KVM can use a default of read-write. This will make it efficient as
memory attributes need to be set only for the pages that the kernel
actually maps. Also, this will serve to implement a deny-by-default
policy for execute permissions.
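
The lookup-with-default behavior described above can be sketched as
follows. This is a user-space illustration under stated assumptions, not
KVM code: the table is a plain array, and the sk_* names and bit values
are hypothetical. It only shows that a page without recorded attributes
falls back to read-write, so execute is denied unless explicitly
granted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute bits; the values are not taken from the series. */
#define SK_ATTR_READ  (1u << 0)
#define SK_ATTR_WRITE (1u << 1)
#define SK_ATTR_EXEC  (1u << 2)

/* One recorded entry: guest frame number and its requested attributes. */
struct sk_attr_entry {
	uint64_t gfn;
	uint32_t attrs;
};

/*
 * Return the permissions to apply for a guest frame. Frames without a
 * recorded entry get the read-write default, which never includes
 * execute (deny-by-default for execution).
 */
static uint32_t sk_lookup_perms(const struct sk_attr_entry *tbl, size_t n,
				uint64_t gfn)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].gfn == gfn)
			return tbl[i].attrs;
	}
	return SK_ATTR_READ | SK_ATTR_WRITE;
}
```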

We implemented a mechanism to dynamically synchronize the guest's memory
permissions with KVM. The original KVM_HC_LOCK_MEM_PAGE_RANGES hypercall
contained support for statically defined sections (text, rodata, etc).
It has been redesigned like this:

- The previous version accepted an array of physically contiguous
  ranges. This is appropriate for statically defined sections which are
  loaded in contiguous memory.  But, for other cases like module
  loading, the pages would be discontinuous. The current version of the
  hypercall accepts a page list to fix this.

- The previous version passed permission combinations. E.g.,
  HEKI_MEM_ATTR_EXEC would imply R_X. The current version passes
  permissions as memory attributes and each of the permissions must be
  separately specified. E.g., for text, (MEM_ATTR_READ | MEM_ATTR_EXEC)
  must be passed.

- The previous version locked down the permissions for guest pages so
  that once the permissions are set, they cannot be changed. In this
  version, permissions can be either immutable (MEM_ATTR_IMMUTABLE) or
  can be changed dynamically.  So, the hypercall has been renamed to
  KVM_HC_PROTECT_MEMORY. The dynamic setting of permissions is needed
  by the following features (probably not a complete list):
  - Kprobes and Optprobes
  - Static call optimization
  - Jump Label optimization
  - Ftrace and Livepatch
  - Module loading and unloading
  - eBPF JIT
  - Kexec
  - Kgdb

Examples:
- A text page can be made writable very briefly to install a probe or a
  trace.
- eBPF JIT can populate a writable page with code and make it
  read-execute.
- Module load can load read-only data into a writable page and make the
  page read-only.
- When pages are unmapped, their permissions in the EPT must revert to
  read-write.
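
The difference between immutable and dynamically changeable permissions
can be sketched as below. This is a simplified model, not the hypercall
implementation: the SK_MEM_ATTR_* values and sk_protect_page() are
hypothetical, and a single integer stands in for the per-page state
that the hypervisor would keep.

```c
#include <stdint.h>

/* Hypothetical attribute bits, each permission specified separately. */
#define SK_MEM_ATTR_READ      (1u << 0)
#define SK_MEM_ATTR_WRITE     (1u << 1)
#define SK_MEM_ATTR_EXEC      (1u << 2)
#define SK_MEM_ATTR_IMMUTABLE (1u << 3)

/*
 * Apply a new attribute set to a page. Returns 0 and updates *cur on
 * success, or -1 and leaves *cur untouched if the page was previously
 * marked immutable.
 */
static int sk_protect_page(uint32_t *cur, uint32_t requested)
{
	if (*cur & SK_MEM_ATTR_IMMUTABLE)
		return -1;
	*cur = requested;
	return 0;
}
```

In this model, an eBPF JIT page can go from read-write to read-execute,
while a text page marked immutable rejects any later change.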

KVM now sends a GP fault if a guest attempts to change its pinned CRs.

Because the VMM needs to be involved (e.g. device driver mapping memory)
and to know the guests' requested memory permissions, we implemented two
new kinds of VM exits to notify the VMM about guests' Heki
configurations and policy violations. Indeed, forwarding such signals to
the VMM could help improve attack detection and make it possible to
react to such attempts (e.g. log events, stop the VM).  Giving
visibility to the VMM would also make it possible to migrate VMs.

# Threat model

The main threat model is a malicious user space process exploiting a
kernel vulnerability to gain more privileges or to bypass the
access-control system.  This threat also covers attacks coming from
network or storage data (e.g., malformed network packet, inconsistent
drive content).

An extended threat model, following pKVM (and partially confidential
computing) efforts, is to protect as much as possible against the VMM.
This means that the security policy should mainly be defined and
requested by the guest to the hypervisor.  The limit of this approach is
the VMM's resources (e.g. exposed memory), which should also be
protected from the guest.

Considering all potential ways to compromise a kernel, Heki's goal is to
harden a sane kernel before a runtime attack to make it more difficult,
and potentially to cause such an attack to fail. We consider the kernel
itself to be partially malicious during its lifetime, e.g., because of a
ROP attack that could disable kernel self-protection mechanisms and make
kernel exploitation much easier. Indeed, an exploit is often split into
several stages, each bypassing some security measures. Getting the
guarantee that new kernel executable code is not possible increases the
cost of an attack, hopefully to the point that it is not worth it.

To protect against persistent attacks, complementary security mechanisms
should be used (e.g., kernel module signing, IMA, IPE, Lockdown).

# Prerequisites

For this set of features to be useful, guest kernels must be trusted by
the VM owners at boot time, before launching any user space processes
or receiving potentially malicious network packets. It is then required
to have a security mechanism to provide or check this initial trust
(e.g., secure boot, kernel module signing).

# How does it work?

This implementation mainly leverages KVM capabilities to control the
Second Level Address Translation (or Two Dimensional Paging, e.g.,
Intel's EPT or AMD's RVI/NPT) and Mode Based Execution Control (Intel's
MBEC) introduced with the Kaby Lake (7th generation) architecture. This
makes it possible to set permissions on memory pages in a complementary
way to the guest kernel's managed memory permissions. If any permissions
are set as immutable, they are locked and there is no way back.

The KVM_HC_PROTECT_MEMORY hypercall enables the guest kernel to enforce
boot time mapped pages with the MEM_ATTR_{READ,WRITE,EXEC} attributes.

The current implementation walks the kernel address space and requests
permission enforcement according to the mappings present.  For instance,
it sets its .rodata (i.e., any const or __ro_after_init variables, which
includes critical security data such as LSM parameters) and .text
sections as non-writable, and the .text section is the only one where
kernel execution is initially allowed.  This is possible thanks to the
new MBEC support implemented by this series (otherwise the vDSO would
have to be executable). Thanks to this hardware support (VT-x, EPT and
MBEC), the performance impact of such guest protection is negligible at
run time.

A page can be mapped to multiple VAs. Each mapping can have different
permissions. Also each mapping may have a different page size. The
collective permissions across all the mappings must be applied in the
EPT. To make this possible, the implementation maintains permissions
counters for each 4K page (one counter each for read, write and
execute). The kernel address space is walked, the mappings are
identified, the counters are updated and the EPT permissions are set
based on the counters. Currently, the overhead from this whole process
can be visible during boot, especially with debugging features such as
KASAN.  We have some ideas that we will try out for a next series. We
welcome any suggestions to improve this part.
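
The per-page permission counters can be sketched as follows. This is a
user-space model with hypothetical sk_* names, not the code from
virt/heki/counters.c. It assumes that the collective permission is the
union of the permissions of all current mappings, and that a page with
no mapping left reverts to read-write as stated earlier.

```c
#include <stdint.h>

#define SK_PERM_R (1u << 0)
#define SK_PERM_W (1u << 1)
#define SK_PERM_X (1u << 2)

/* One counter per permission for a single 4K page. */
struct sk_page_counters {
	uint16_t read;
	uint16_t write;
	uint16_t exec;
};

/* Account for a new mapping of the page with the given permissions. */
static void sk_map(struct sk_page_counters *c, uint32_t perms)
{
	if (perms & SK_PERM_R)
		c->read++;
	if (perms & SK_PERM_W)
		c->write++;
	if (perms & SK_PERM_X)
		c->exec++;
}

/* Account for the removal of a mapping with the given permissions. */
static void sk_unmap(struct sk_page_counters *c, uint32_t perms)
{
	if ((perms & SK_PERM_R) && c->read)
		c->read--;
	if ((perms & SK_PERM_W) && c->write)
		c->write--;
	if ((perms & SK_PERM_X) && c->exec)
		c->exec--;
}

/* Derive the permissions to request from the counters. */
static uint32_t sk_collective_perms(const struct sk_page_counters *c)
{
	uint32_t perms = 0;

	if (c->read)
		perms |= SK_PERM_R;
	if (c->write)
		perms |= SK_PERM_W;
	if (c->exec)
		perms |= SK_PERM_X;
	/* No mapping left: revert to the read-write default. */
	if (!perms)
		perms = SK_PERM_R | SK_PERM_W;
	return perms;
}
```

In this model, a page mapped once as read-execute and once as
read-write through an alias ends up with all three permissions until
the writable alias is unmapped.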

The KVM_HC_LOCK_CR_UPDATE hypercall enables guests to pin some of their
CPU control register flags (e.g., X86_CR0_WP, X86_CR4_SMEP,
X86_CR4_SMAP), which is another complementary hardening mechanism.
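
The pinning check can be modeled as below. This is an illustrative
sketch, not the KVM implementation: the sk_* helpers are hypothetical,
and it assumes that a pinned bit is read-only, so any write that flips
one is refused (in KVM this would result in a GP fault). The bit
positions are the architectural ones for CR0.WP, CR4.SMEP and CR4.SMAP.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_CR0_WP   (1ull << 16)
#define SK_CR4_SMEP (1ull << 20)
#define SK_CR4_SMAP (1ull << 21)

/* Pinning is one-way in this model: bits can be added, never removed. */
static void sk_lock_cr(uint64_t *pinned, uint64_t mask)
{
	*pinned |= mask;
}

/*
 * Return true if writing val over old leaves every pinned bit
 * unchanged, false if the write should be rejected.
 */
static bool sk_cr_write_allowed(uint64_t old, uint64_t val, uint64_t pinned)
{
	return ((old ^ val) & pinned) == 0;
}
```

Once SMEP and SMAP are pinned, clearing either of them is refused while
changes to unpinned bits still go through.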

Two new kinds of VM exits are implemented: one for a guest Heki request
(i.e. hypercall), and another for a guest attempt to change its pinned
CRs. We haven't implemented such a VM exit for memory-related events
yet; that will be part of a next series if the design is approved.

When the guest attempts to update pinned CRs or to access memory in a
way that is not allowed, the VMM can then be notified and react to such
an attack attempt. After that, if the VM is still running, KVM sends either
a GP fault or a page fault to the guest. The guest could then send a
signal to the user space process that triggered this policy violation
(not implemented).

Heki can be enabled with the heki=1 boot command argument.

# Similar implementations

Here is a non-exhaustive list of similar implementations that we looked
at and took some ideas from. Linux mainline doesn't support such
security features, let's change that!

Windows's Virtualization-Based Security is a proprietary technology
that provides a superset of this kind of security mechanism, relying on
Hyper-V and Virtual Trust Levels, which make it possible to have a light
and secure VM enforcing restrictions on a full guest VM. This includes
several components such as HVCI for code authenticity, or HyperGuard for
monitoring and protecting kernel code and data.

Samsung's Real-time Kernel Protection (RKP) and Huawei Hypervisor
Execution Environment (HHEE) rely on proprietary hypervisors to protect
some Android devices. They monitor critical kernel data (e.g., page
tables, credentials, selinux_enforcing).

The iOS Kernel Patch Protection (KPP/Watchtower) is a proprietary
solution running in EL3 that monitors and protects critical parts of the
kernel. It is now replaced with a hardware-based mechanism: KTRR/RoRgn.

Bitdefender's Hypervisor Memory Introspection (HVMI) is an open-source
(but out of tree) set of components leveraging virtualization. HVMI
implementation is very complex, and this approach implies potential
semantic gap issues (i.e., kernel data structures may change from one
version to another).

Linux Kernel Runtime Guard is an open-source kernel module that can
detect some illegitimate modifications of kernel data. Because it runs
in the same kernel as the compromised one, an attacker could also bypass
or disable these checks.

Intel's Virtualization Based Hardening [6] [7] is an open-source
proof-of-concept of a thin hypervisor dedicated to guest protection. As
such, it cannot be used to manage several VMs.

# Similar Linux patches

The VM introspection [1] [2] patch series proposed a set of features to
put probes and introspect VMs for debugging and security reasons. We
changed and included the prewrite page tracking and the fault_gva parts.
Heki is much simpler because it focuses on guest hardening, not
introspection.

Paravirtualized Control Register pinning [3] added a set of KVM IOCTLs
to restrict some flags to be set. Heki doesn't implement such user space
interface, but only a dedicated hypercall to lock such registers. A
superset of these flags is configurable with Heki.

The Hypervisor Based Integrity patches [8] [9] only contain a generic
IPC mechanism (KVM_HC_UCALL hypercall) to request protection from the
VMM. The idea was to extend the KVM_SET_USER_MEMORY_REGION IOCTL to
support more permissions than read-only.

# Current limitations

The main limitation of this patch series is that the executable kernel
data (e.g. kernel module) is only authenticated by the guest, not by the
hypervisor nor the VMM.  Because the hypervisor is highly privileged and
critical to the security of all the VMs, we don't want to implement a
code authentication mechanism in the hypervisor itself but delegate this
verification to something much less privileged. We are thinking of two
ways to solve this: implement this verification in the VMM or spawn a
dedicated special VM (similar to Windows's VBS). There are pros and cons
to each approach: complexity, verification code ownership (guest's or
VMM's), access to guest memory (i.e., confidential computing).

In this version, the immutable attribute is set on kernel text. So, it
is not possible to use ftrace, Kprobes, etc. on kernel text (these are
still possible on module text). The immutable attribute is needed
because we do not have authentication in place yet.

All permissions changes must be authenticated. For changes that happen
during module loading, the module signature can be used for
authentication. But for intrinsic kernel features such as ftrace and
Kprobes, what do we use to authenticate a request? How do we make sure
that it is legitimate? We are looking for ideas in this area.

Also, each authentication involves a round trip from the guest to the
VMM and the hypervisor. This overhead could be significant. We welcome
ideas on improving this as well.

We currently use static address ranges to configure protections at boot
(see heki_arch_early_init). This is not compatible with KASLR yet, but
this will be handled in a next patch series.

Because the guest's virtual address translation is not protected by the
hypervisor, a compromised kernel could map existing physical pages into
arbitrary virtual addresses. Intel's new Hypervisor-Managed Linear
Address Translation [10] (HLAT) could be used to extend the current
protection and cover this case.

ROP is not covered by this patch series. Guest kernels can still jump to
arbitrary executable pages according to their control-flow integrity
protection.

# Future work

New dynamic restrictions could extend the protected data to
security-sensitive data such as LSM states, seccomp filters,
keyrings... This requires support outside of the hypervisor.

An execute-only mode could also be useful (cf. XOM for KVM [11] [12]).

Register pinning could be extended (e.g., to MSRs).

For now, MBEC is only supported on a bare metal machine as KVM host;
nested virtualization is not supported yet.  Being able to protect
nested guests might be possible but we need to figure out the potential
security implications.

Protecting the host would be useful, but that doesn't really fit with
the KVM model. The Protected KVM project is a first step to help in this
direction [13].

We only tested this with an Intel CPU, but this approach should work the
same with an AMD CPU starting with the Zen 2 generation and their Guest
Mode Execute Trap (GMET) capability.

We also kept some TODOs to highlight missing checks and code sharing
issues, and some pr_warn() calls to help understand how it works. Tests
need to be improved (e.g., invalid hypercall arguments).

We'll present this work at the Linux Plumbers Conference next week.

[1] https://lore.kernel.org/all/20211006173113.26445-1-alazar@bitdefender.com/
[2] https://www.linux-kvm.org/images/7/72/KVMForum2017_Introspection.pdf
[3] https://lore.kernel.org/all/20200617190757.27081-1-john.s.andersen@intel.com/
[4] https://github.com/kvm-x86/linux
[5] https://lore.kernel.org/all/20231027182217.3615211-1-seanjc@google.com/
[6] https://github.com/intel/vbh
[7] https://sched.co/TmwN
[8] https://sched.co/eE3f
[9] https://lore.kernel.org/all/20200501185147.208192-1-yuanyu@google.com/
[10] https://sched.co/eE4F
[11] https://lore.kernel.org/kvm/20191003212400.31130-1-rick.p.edgecombe@intel.com/
[12] https://lpc.events/event/4/contributions/283/
[13] https://sched.co/eE24

Please reach out to us by replying to this thread, we're looking for
people to join and collaborate on this project!

Previous version:
v1: https://lore.kernel.org/r/20230505152046.6575-1-mic@digikod.net

Regards,

Madhavan T. Venkataraman (9):
  virt: Introduce Hypervisor Enforced Kernel Integrity (Heki)
  KVM: x86: Add new hypercall to set EPT permissions
  x86: Implement the Memory Table feature to store arbitrary per-page
    data
  heki: Implement a kernel page table walker
  heki: x86: Initialize permissions counters for pages mapped into KVA
  heki: x86: Initialize permissions counters for pages in
    vmap()/vunmap()
  heki: x86: Update permissions counters when guest page permissions
    change
  heki: x86: Update permissions counters during text patching
  heki: x86: Protect guest kernel memory using the KVM hypervisor

Mickaël Salaün (10):
  KVM: x86: Add new hypercall to lock control registers
  KVM: x86: Add notifications for Heki policy configuration and
    violation
  heki: Lock guest control registers at the end of guest kernel init
  KVM: VMX: Add MBEC support
  KVM: x86: Add kvm_x86_ops.fault_gva()
  KVM: x86: Make memory attribute helpers more generic
  KVM: x86: Extend kvm_vm_set_mem_attributes() with a mask
  KVM: x86: Extend kvm_range_has_memory_attributes() with match_all
  KVM: x86: Implement per-guest-page permissions
  virt: Add Heki KUnit tests

 Documentation/virt/kvm/x86/hypercalls.rst |  31 +++
 Kconfig                                   |   2 +
 arch/x86/Kconfig                          |   1 +
 arch/x86/include/asm/kvm-x86-ops.h        |   1 +
 arch/x86/include/asm/kvm_host.h           |   2 +
 arch/x86/include/asm/vmx.h                |  11 +-
 arch/x86/include/asm/x86_init.h           |   1 +
 arch/x86/include/uapi/asm/kvm_para.h      |   2 +
 arch/x86/kernel/alternative.c             |   5 +
 arch/x86/kernel/cpu/common.c              |   4 +-
 arch/x86/kernel/cpu/hypervisor.c          |   1 +
 arch/x86/kernel/kvm.c                     |  67 +++++
 arch/x86/kernel/setup.c                   |   2 +
 arch/x86/kvm/Kconfig                      |   2 +
 arch/x86/kvm/Makefile                     |   4 +-
 arch/x86/kvm/mmu.h                        |   3 +-
 arch/x86/kvm/mmu/mmu.c                    | 114 ++++++--
 arch/x86/kvm/mmu/mmutrace.h               |  11 +-
 arch/x86/kvm/mmu/paging_tmpl.h            |  19 +-
 arch/x86/kvm/mmu/spte.c                   |  19 +-
 arch/x86/kvm/mmu/spte.h                   |  15 +-
 arch/x86/kvm/svm/svm.c                    |   9 +
 arch/x86/kvm/vmx/capabilities.h           |   7 +
 arch/x86/kvm/vmx/nested.c                 |   7 +
 arch/x86/kvm/vmx/vmx.c                    |  45 +++-
 arch/x86/kvm/vmx/vmx.h                    |   1 +
 arch/x86/kvm/x86.c                        | 310 ++++++++++++++++++++++
 arch/x86/kvm/x86.h                        |  23 ++
 arch/x86/mm/Makefile                      |   2 +
 arch/x86/mm/heki.c                        | 135 ++++++++++
 arch/x86/mm/pat/set_memory.c              |  51 ++++
 include/linux/heki.h                      | 195 ++++++++++++++
 include/linux/kvm_host.h                  |  11 +-
 include/linux/kvm_mem_attr.h              |  32 +++
 include/linux/mem_table.h                 |  55 ++++
 include/uapi/linux/kvm.h                  |  27 ++
 include/uapi/linux/kvm_para.h             |   2 +
 init/main.c                               |   3 +
 kernel/Makefile                           |   2 +
 kernel/mem_table.c                        | 219 +++++++++++++++
 mm/mm_init.c                              |   1 +
 mm/vmalloc.c                              |   7 +
 virt/Makefile                             |   1 +
 virt/heki/Kconfig                         |  42 +++
 virt/heki/Makefile                        |   6 +
 virt/heki/common.h                        |  16 ++
 virt/heki/counters.c                      | 274 +++++++++++++++++++
 virt/heki/main.c                          | 155 +++++++++++
 virt/heki/tests.c                         | 207 +++++++++++++++
 virt/heki/walk.c                          | 140 ++++++++++
 virt/kvm/kvm_main.c                       |  65 +++--
 virt/lib/kvm_permissions.c                | 104 ++++++++
 52 files changed, 2401 insertions(+), 70 deletions(-)
 create mode 100644 arch/x86/mm/heki.c
 create mode 100644 include/linux/heki.h
 create mode 100644 include/linux/kvm_mem_attr.h
 create mode 100644 include/linux/mem_table.h
 create mode 100644 kernel/mem_table.c
 create mode 100644 virt/heki/Kconfig
 create mode 100644 virt/heki/Makefile
 create mode 100644 virt/heki/common.h
 create mode 100644 virt/heki/counters.c
 create mode 100644 virt/heki/main.c
 create mode 100644 virt/heki/tests.c
 create mode 100644 virt/heki/walk.c
 create mode 100644 virt/lib/kvm_permissions.c


base-commit: 881375a408c0f4ea451ff14545b59216d2923881
-- 
2.42.1



* [RFC PATCH v2 01/19] virt: Introduce Hypervisor Enforced Kernel Integrity (Heki)
  2023-11-13  2:23 [RFC PATCH v2 00/19] Hypervisor-Enforced Kernel Integrity Mickaël Salaün
@ 2023-11-13  2:23 ` Mickaël Salaün
  2023-11-13  2:23 ` [RFC PATCH v2 02/19] KVM: x86: Add new hypercall to lock control registers Mickaël Salaün
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 44+ messages in thread
From: Mickaël Salaün @ 2023-11-13  2:23 UTC (permalink / raw)
  To: Borislav Petkov, Dave Hansen, H . Peter Anvin, Ingo Molnar,
	Kees Cook, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Vitaly Kuznetsov, Wanpeng Li
  Cc: Mickaël Salaün, Alexander Graf, Chao Peng, Edgecombe,
	Rick P, Forrest Yuan Yu, James Gowans, James Morris,
	John Andersen, Madhavan T . Venkataraman, Marian Rotariu,
	Mihai Donțu, Nicușor Cîțu, Thara Gopinath,
	Trilok Soni, Wei Liu, Will Deacon, Yu Zhang, Zahra Tarkhani,
	Ștefan Șicleru, dev, kvm, linux-hardening,
	linux-hyperv, linux-kernel, linux-security-module, qemu-devel,
	virtualization, x86, xen-devel

From: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>

Hypervisor Enforced Kernel Integrity (Heki) is a feature that will use
the hypervisor to enhance guest virtual machine security.

Implement minimal code to introduce Heki:

- Define the config variables.

- Define a kernel command line parameter "heki" to turn the feature
  on or off. By default, Heki is on.

- Define heki_early_init() and call it in start_kernel(). Currently,
  this function only prints the value of the "heki" command
  line parameter.

Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Co-developed-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---

Changes since v1:
* Shrunk this patch to only contain the minimal common parts.
* Moved heki_early_init() to start_kernel().
---
 Kconfig              |  2 ++
 arch/x86/Kconfig     |  1 +
 include/linux/heki.h | 31 +++++++++++++++++++++++++++++++
 init/main.c          |  2 ++
 mm/mm_init.c         |  1 +
 virt/Makefile        |  1 +
 virt/heki/Kconfig    | 19 +++++++++++++++++++
 virt/heki/Makefile   |  3 +++
 virt/heki/common.h   | 16 ++++++++++++++++
 virt/heki/main.c     | 32 ++++++++++++++++++++++++++++++++
 10 files changed, 108 insertions(+)
 create mode 100644 include/linux/heki.h
 create mode 100644 virt/heki/Kconfig
 create mode 100644 virt/heki/Makefile
 create mode 100644 virt/heki/common.h
 create mode 100644 virt/heki/main.c

diff --git a/Kconfig b/Kconfig
index 745bc773f567..0c844d9bcb03 100644
--- a/Kconfig
+++ b/Kconfig
@@ -29,4 +29,6 @@ source "lib/Kconfig"
 
 source "lib/Kconfig.debug"
 
+source "virt/heki/Kconfig"
+
 source "Documentation/Kconfig"
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 66bfabae8814..424f949442bd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -35,6 +35,7 @@ config X86_64
 	select SWIOTLB
 	select ARCH_HAS_ELFCORE_COMPAT
 	select ZONE_DMA32
+	select ARCH_SUPPORTS_HEKI
 
 config FORCE_DYNAMIC_FTRACE
 	def_bool y
diff --git a/include/linux/heki.h b/include/linux/heki.h
new file mode 100644
index 000000000000..4c18d2283392
--- /dev/null
+++ b/include/linux/heki.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Hypervisor Enforced Kernel Integrity (Heki) - Definitions
+ *
+ * Copyright © 2023 Microsoft Corporation
+ */
+
+#ifndef __HEKI_H__
+#define __HEKI_H__
+
+#include <linux/types.h>
+#include <linux/cache.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/printk.h>
+
+#ifdef CONFIG_HEKI
+
+extern bool heki_enabled;
+
+void heki_early_init(void);
+
+#else /* !CONFIG_HEKI */
+
+static inline void heki_early_init(void)
+{
+}
+
+#endif /* CONFIG_HEKI */
+
+#endif /* __HEKI_H__ */
diff --git a/init/main.c b/init/main.c
index 436d73261810..0d28301c5402 100644
--- a/init/main.c
+++ b/init/main.c
@@ -99,6 +99,7 @@
 #include <linux/init_syscalls.h>
 #include <linux/stackdepot.h>
 #include <linux/randomize_kstack.h>
+#include <linux/heki.h>
 #include <net/net_namespace.h>
 
 #include <asm/io.h>
@@ -1047,6 +1048,7 @@ void start_kernel(void)
 	uts_ns_init();
 	key_init();
 	security_init();
+	heki_early_init();
 	dbg_late_init();
 	net_ns_init();
 	vfs_caches_init();
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 50f2f34745af..896977383cc3 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -26,6 +26,7 @@
 #include <linux/pgtable.h>
 #include <linux/swap.h>
 #include <linux/cma.h>
+#include <linux/heki.h>
 #include "internal.h"
 #include "slab.h"
 #include "shuffle.h"
diff --git a/virt/Makefile b/virt/Makefile
index 1cfea9436af9..4550dc624466 100644
--- a/virt/Makefile
+++ b/virt/Makefile
@@ -1,2 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-y	+= lib/
+obj-$(CONFIG_HEKI) += heki/
diff --git a/virt/heki/Kconfig b/virt/heki/Kconfig
new file mode 100644
index 000000000000..49695fff6d21
--- /dev/null
+++ b/virt/heki/Kconfig
@@ -0,0 +1,19 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Hypervisor Enforced Kernel Integrity (Heki)
+
+config HEKI
+	bool "Hypervisor Enforced Kernel Integrity (Heki)"
+	depends on ARCH_SUPPORTS_HEKI
+	help
+	  This feature enhances guest virtual machine security by taking
+	  advantage of security features provided by the hypervisor for guests.
+	  This feature is helpful in maintaining guest virtual machine security
+	  even after the guest kernel has been compromised.
+
+config ARCH_SUPPORTS_HEKI
+	bool "Architecture support for Heki"
+	help
+	  An architecture should select this when it can successfully build
+	  and run with CONFIG_HEKI. That is, it should provide all of the
+	  architecture support required for the HEKI feature.
diff --git a/virt/heki/Makefile b/virt/heki/Makefile
new file mode 100644
index 000000000000..354e567df71c
--- /dev/null
+++ b/virt/heki/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-y += main.o
diff --git a/virt/heki/common.h b/virt/heki/common.h
new file mode 100644
index 000000000000..edd98fc650a8
--- /dev/null
+++ b/virt/heki/common.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Hypervisor Enforced Kernel Integrity (Heki) - Common header
+ *
+ * Copyright © 2023 Microsoft Corporation
+ */
+
+#ifndef _HEKI_COMMON_H
+
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+
+#define pr_fmt(fmt) "heki-guest: " fmt
+
+#endif /* _HEKI_COMMON_H */
diff --git a/virt/heki/main.c b/virt/heki/main.c
new file mode 100644
index 000000000000..f005dd74d586
--- /dev/null
+++ b/virt/heki/main.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Hypervisor Enforced Kernel Integrity (Heki) - Common code
+ *
+ * Copyright © 2023 Microsoft Corporation
+ */
+
+#include <linux/heki.h>
+
+#include "common.h"
+
+bool heki_enabled __ro_after_init = true;
+
+/*
+ * Must be called after kmem_cache_init().
+ */
+__init void heki_early_init(void)
+{
+	if (!heki_enabled) {
+		pr_warn("Heki is not enabled\n");
+		return;
+	}
+	pr_warn("Heki is enabled\n");
+}
+
+static int __init heki_parse_config(char *str)
+{
+	if (strtobool(str, &heki_enabled))
+		pr_warn("Invalid option string for heki: '%s'\n", str);
+	return 1;
+}
+__setup("heki=", heki_parse_config);
-- 
2.42.1



* [RFC PATCH v2 02/19] KVM: x86: Add new hypercall to lock control registers
  2023-11-13  2:23 [RFC PATCH v2 00/19] Hypervisor-Enforced Kernel Integrity Mickaël Salaün
  2023-11-13  2:23 ` [RFC PATCH v2 01/19] virt: Introduce Hypervisor Enforced Kernel Integrity (Heki) Mickaël Salaün
@ 2023-11-13  2:23 ` Mickaël Salaün
  2023-11-13  2:23 ` [RFC PATCH v2 03/19] KVM: x86: Add notifications for Heki policy configuration and violation Mickaël Salaün
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 44+ messages in thread
From: Mickaël Salaün @ 2023-11-13  2:23 UTC (permalink / raw)
  To: Borislav Petkov, Dave Hansen, H . Peter Anvin, Ingo Molnar,
	Kees Cook, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Vitaly Kuznetsov, Wanpeng Li
  Cc: Mickaël Salaün, Alexander Graf, Chao Peng, Edgecombe,
	Rick P, Forrest Yuan Yu, James Gowans, James Morris,
	John Andersen, Madhavan T . Venkataraman, Marian Rotariu,
	Mihai Donțu, Nicușor Cîțu, Thara Gopinath,
	Trilok Soni, Wei Liu, Will Deacon, Yu Zhang, Zahra Tarkhani,
	Ștefan Șicleru, dev, kvm, linux-hardening,
	linux-hyperv, linux-kernel, linux-security-module, qemu-devel,
	virtualization, x86, xen-devel

This enables guests to lock their CR0 and CR4 registers with a subset of
X86_CR0_WP, X86_CR4_SMEP, X86_CR4_SMAP, X86_CR4_UMIP, X86_CR4_FSGSBASE
and X86_CR4_CET flags.

The new KVM_HC_LOCK_CR_UPDATE hypercall takes three arguments.  The
first is to identify the control register, the second is a bit mask to
pin (i.e. mark as read-only), and the third is for optional flags.

These register flags should already be pinned by Linux guests, but once
a guest is compromised, this self-protection mechanism can be disabled.
That is not the case with this dedicated hypercall.

Once the CRs are pinned by the guest, if it attempts to change them,
then a general protection fault is sent to the guest.
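
The rule behind this behavior is small: a write is rejected when it would
clear any pinned bit.  The following is a minimal user-space sketch of that
rule, assuming the "(val & pinned) != pinned" test used by heki_check_cr()
in this patch.  cr_write_violates_pin() is a hypothetical helper name, and
the two CR0 constants are the architectural bit positions, redefined here
so that the example builds outside the kernel:

```c
#include <stdbool.h>

#define X86_CR0_PE (1UL << 0)  /* Protection Enable */
#define X86_CR0_WP (1UL << 16) /* Write Protect */

/*
 * Hypothetical helper: returns true if writing @val to a control register
 * would clear at least one bit of @pinned.
 */
static bool cr_write_violates_pin(unsigned long val, unsigned long pinned)
{
	return (val & pinned) != pinned;
}
```

With this rule, setting additional bits is always allowed; only clearing a
pinned bit is treated as a violation.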

This hypercall may evolve and support new kinds of registers or pinning.
The optional KVM_LOCK_CR_UPDATE_VERSION flag enables guests to know the
supported abilities by mapping the returned version with the related
features.
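
As an illustration of the version query, here is a user-space model of how
the flags argument is interpreted, assuming the behavior of heki_lock_cr()
in this patch.  MODEL_EINVAL and MODEL_ABI_VERSION are stand-ins for
KVM_EINVAL and HEKI_ABI_VERSION, and both function names are hypothetical;
the per-register validation is left out on purpose:

```c
#define KVM_LOCK_CR_UPDATE_VERSION (1UL << 0) /* value from this patch */
#define MODEL_EINVAL 22      /* stand-in for KVM_EINVAL */
#define MODEL_ABI_VERSION 1  /* stand-in for HEKI_ABI_VERSION */

/* Models only the argument screening done before any register is pinned. */
static long model_lock_cr(unsigned long cr, unsigned long pin,
			  unsigned long flags)
{
	if (flags) {
		/* A version query must not carry a register or a mask. */
		if (flags == KVM_LOCK_CR_UPDATE_VERSION && !cr && !pin)
			return MODEL_ABI_VERSION;
		return -MODEL_EINVAL;
	}

	/* A pin request needs a non-empty mask. */
	if (!pin)
		return -MODEL_EINVAL;

	return 0; /* Per-register checks are omitted in this sketch. */
}

/* Hypothetical guest-side mapping from a returned version to a feature. */
static int version_supports_cr_pinning(long version)
{
	return version >= 1; /* Version 1 supports CR0 and CR4 pinning. */
}
```

A guest would first issue the query and only rely on pinning when the
returned value is at least 1; any negative value means the host does not
support the request.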

Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v1:
* Guard KVM_HC_LOCK_CR_UPDATE hypercall with CONFIG_HEKI.
* Move extern cr4_pinned_mask to x86.h (suggested by Kees Cook).
* Move VMX CR checks from vmx_set_cr*() to handle_cr() to make it
  possible to return to user space (see next commit).
* Change the heki_check_cr()'s first argument to vcpu.
* Don't use -KVM_EPERM in heki_check_cr().
* Generate a fault when the guest requests a denied CR update.
* Add a flags argument to get the version of this hypercall. Being able
  to do a proper version check was suggested by Wei Liu.
---
 Documentation/virt/kvm/x86/hypercalls.rst | 17 +++++
 arch/x86/include/uapi/asm/kvm_para.h      |  2 +
 arch/x86/kernel/cpu/common.c              |  4 +-
 arch/x86/kvm/vmx/vmx.c                    |  5 ++
 arch/x86/kvm/x86.c                        | 84 +++++++++++++++++++++++
 arch/x86/kvm/x86.h                        | 22 ++++++
 include/linux/kvm_host.h                  |  5 ++
 include/uapi/linux/kvm_para.h             |  1 +
 8 files changed, 139 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/x86/hypercalls.rst b/Documentation/virt/kvm/x86/hypercalls.rst
index 10db7924720f..3178576f4c47 100644
--- a/Documentation/virt/kvm/x86/hypercalls.rst
+++ b/Documentation/virt/kvm/x86/hypercalls.rst
@@ -190,3 +190,20 @@ the KVM_CAP_EXIT_HYPERCALL capability. Userspace must enable that capability
 before advertising KVM_FEATURE_HC_MAP_GPA_RANGE in the guest CPUID.  In
 addition, if the guest supports KVM_FEATURE_MIGRATION_CONTROL, userspace
 must also set up an MSR filter to process writes to MSR_KVM_MIGRATION_CONTROL.
+
+9. KVM_HC_LOCK_CR_UPDATE
+------------------------
+
+:Architecture: x86
+:Status: active
+:Purpose: Request some control registers to be restricted.
+
+- a0: identify a control register
+- a1: bit mask to make some flags read-only
+- a2: optional KVM_LOCK_CR_UPDATE_VERSION flag that will return the version of
+      this hypercall. Version 1 supports CR0 and CR4 pinning.
+
+The hypercall lets a guest request control register flags to be pinned for
+itself.
+
+Returns 0 on success or a KVM error code otherwise.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 6e64b27b2c1e..efc5ccc0060f 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -150,4 +150,6 @@ struct kvm_vcpu_pv_apf_data {
 #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
 #define KVM_PV_EOI_DISABLED 0x0
 
+#define KVM_LOCK_CR_UPDATE_VERSION (1 << 0)
+
 #endif /* _UAPI_ASM_X86_KVM_PARA_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 4e5ffc8b0e46..f18ee7ce0496 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -400,9 +400,11 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
 }
 
 /* These bits should not change their value after CPU init is finished. */
-static const unsigned long cr4_pinned_mask =
+const unsigned long cr4_pinned_mask =
 	X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
 	X86_CR4_FSGSBASE | X86_CR4_CET;
+EXPORT_SYMBOL_GPL(cr4_pinned_mask);
+
 static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);
 static unsigned long cr4_pinned_bits __ro_after_init;
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6e502ba93141..f487bf16dd96 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5452,6 +5452,11 @@ static int handle_cr(struct kvm_vcpu *vcpu)
 	case 0: /* mov to cr */
 		val = kvm_register_read(vcpu, reg);
 		trace_kvm_cr_write(cr, val);
+
+		ret = heki_check_cr(vcpu, cr, val);
+		if (ret)
+			return ret;
+
 		switch (cr) {
 		case 0:
 			err = handle_set_cr0(vcpu, val);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e3eb608b6692..4e6c4c21f12c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8054,11 +8054,86 @@ static unsigned long emulator_get_cr(struct x86_emulate_ctxt *ctxt, int cr)
 	return value;
 }
 
+#ifdef CONFIG_HEKI
+
+#define HEKI_ABI_VERSION 1
+
+static int heki_lock_cr(struct kvm_vcpu *const vcpu, const unsigned long cr,
+			unsigned long pin, unsigned long flags)
+{
+	if (flags) {
+		if ((flags == KVM_LOCK_CR_UPDATE_VERSION) && !cr && !pin)
+			return HEKI_ABI_VERSION;
+		return -KVM_EINVAL;
+	}
+
+	if (!pin)
+		return -KVM_EINVAL;
+
+	switch (cr) {
+	case 0:
+		/* Cf. arch/x86/kernel/cpu/common.c */
+		if (!(pin & X86_CR0_WP))
+			return -KVM_EINVAL;
+
+		if ((pin & read_cr0()) != pin)
+			return -KVM_EINVAL;
+
+		atomic_long_or(pin, &vcpu->kvm->heki_pinned_cr0);
+		return 0;
+	case 4:
+		/* Checks for irrelevant bits. */
+		if ((pin & cr4_pinned_mask) != pin)
+			return -KVM_EINVAL;
+
+		/* Ignores bits not present in host. */
+		pin &= __read_cr4();
+		atomic_long_or(pin, &vcpu->kvm->heki_pinned_cr4);
+		return 0;
+	}
+	return -KVM_EINVAL;
+}
+
+int heki_check_cr(struct kvm_vcpu *const vcpu, const unsigned long cr,
+		  const unsigned long val)
+{
+	unsigned long pinned;
+
+	switch (cr) {
+	case 0:
+		pinned = atomic_long_read(&vcpu->kvm->heki_pinned_cr0);
+		if ((val & pinned) != pinned) {
+			pr_warn_ratelimited(
+				"heki: Blocked CR0 update: 0x%lx\n", val);
+			kvm_inject_gp(vcpu, 0);
+			return 1;
+		}
+		return 0;
+	case 4:
+		pinned = atomic_long_read(&vcpu->kvm->heki_pinned_cr4);
+		if ((val & pinned) != pinned) {
+			pr_warn_ratelimited(
+				"heki: Blocked CR4 update: 0x%lx\n", val);
+			kvm_inject_gp(vcpu, 0);
+			return 1;
+		}
+		return 0;
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(heki_check_cr);
+
+#endif /* CONFIG_HEKI */
+
 static int emulator_set_cr(struct x86_emulate_ctxt *ctxt, int cr, ulong val)
 {
 	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
 	int res = 0;
 
+	res = heki_check_cr(vcpu, cr, val);
+	if (res)
+		return res;
+
 	switch (cr) {
 	case 0:
 		res = kvm_set_cr0(vcpu, mk_cr_64(kvm_read_cr0(vcpu), val));
@@ -9918,6 +9993,15 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 		vcpu->arch.complete_userspace_io = complete_hypercall_exit;
 		return 0;
 	}
+#ifdef CONFIG_HEKI
+	case KVM_HC_LOCK_CR_UPDATE:
+		if (a0 > U32_MAX) {
+			ret = -KVM_EINVAL;
+		} else {
+			ret = heki_lock_cr(vcpu, a0, a1, a2);
+		}
+		break;
+#endif /* CONFIG_HEKI */
 	default:
 		ret = -KVM_ENOSYS;
 		break;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 1e7be1f6ab29..193093112b55 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -290,6 +290,26 @@ static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
 	return !(kvm->arch.disabled_quirks & quirk);
 }
 
+#ifdef CONFIG_HEKI
+
+int heki_check_cr(struct kvm_vcpu *vcpu, unsigned long cr, unsigned long val);
+
+#else /* CONFIG_HEKI */
+
+static inline int heki_check_cr(struct kvm_vcpu *vcpu, unsigned long cr,
+				unsigned long val)
+{
+	return 0;
+}
+
+static inline int heki_lock_cr(struct kvm_vcpu *const vcpu, unsigned long cr,
+			       unsigned long pin)
+{
+	return 0;
+}
+
+#endif /* CONFIG_HEKI */
+
 void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
 
 u64 get_kvmclock_ns(struct kvm *kvm);
@@ -325,6 +345,8 @@ extern u64 host_xcr0;
 extern u64 host_xss;
 extern u64 host_arch_capabilities;
 
+extern const unsigned long cr4_pinned_mask;
+
 extern struct kvm_caps kvm_caps;
 
 extern bool enable_pmu;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 687589ce9f63..6864c80ff936 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -835,6 +835,11 @@ struct kvm {
 	bool vm_bugged;
 	bool vm_dead;
 
+#ifdef CONFIG_HEKI
+	atomic_long_t heki_pinned_cr0;
+	atomic_long_t heki_pinned_cr4;
+#endif /* CONFIG_HEKI */
+
 #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
 	struct notifier_block pm_notifier;
 #endif
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index 960c7e93d1a9..2ed418704603 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -30,6 +30,7 @@
 #define KVM_HC_SEND_IPI		10
 #define KVM_HC_SCHED_YIELD		11
 #define KVM_HC_MAP_GPA_RANGE		12
+#define KVM_HC_LOCK_CR_UPDATE		13
 
 /*
  * hypercalls use architecture specific
-- 
2.42.1



* [RFC PATCH v2 03/19] KVM: x86: Add notifications for Heki policy configuration and violation
  2023-11-13  2:23 [RFC PATCH v2 00/19] Hypervisor-Enforced Kernel Integrity Mickaël Salaün
  2023-11-13  2:23 ` [RFC PATCH v2 01/19] virt: Introduce Hypervisor Enforced Kernel Integrity (Heki) Mickaël Salaün
  2023-11-13  2:23 ` [RFC PATCH v2 02/19] KVM: x86: Add new hypercall to lock control registers Mickaël Salaün
@ 2023-11-13  2:23 ` Mickaël Salaün
  2023-11-13  2:23 ` [RFC PATCH v2 04/19] heki: Lock guest control registers at the end of guest kernel init Mickaël Salaün
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 44+ messages in thread
From: Mickaël Salaün @ 2023-11-13  2:23 UTC (permalink / raw)
  To: Borislav Petkov, Dave Hansen, H . Peter Anvin, Ingo Molnar,
	Kees Cook, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Vitaly Kuznetsov, Wanpeng Li
  Cc: Mickaël Salaün, Alexander Graf, Chao Peng, Edgecombe,
	Rick P, Forrest Yuan Yu, James Gowans, James Morris,
	John Andersen, Madhavan T . Venkataraman, Marian Rotariu,
	Mihai Donțu, Nicușor Cîțu, Thara Gopinath,
	Trilok Soni, Wei Liu, Will Deacon, Yu Zhang, Zahra Tarkhani,
	Ștefan Șicleru, dev, kvm, linux-hardening,
	linux-hyperv, linux-kernel, linux-security-module, qemu-devel,
	virtualization, x86, xen-devel

Add an interface for user space to be notified about guests' Heki policy
and related violations.

Extend the KVM_ENABLE_CAP IOCTL with KVM_CAP_HEKI_CONFIGURE and
KVM_CAP_HEKI_DENIAL. Each one takes a bitmask as its first argument,
which can contain KVM_HEKI_EXIT_REASON_CR0 and KVM_HEKI_EXIT_REASON_CR4.
The returned value is the bitmask of known Heki exit reasons, for now:
KVM_HEKI_EXIT_REASON_CR0 and KVM_HEKI_EXIT_REASON_CR4.

If KVM_CAP_HEKI_CONFIGURE is set, a VM exit will be triggered for each
KVM_HC_LOCK_CR_UPDATE hypercall according to the requested control
register. This lets the VMM learn about the restrictions the guest
imposes on itself.

If KVM_CAP_HEKI_DENIAL is set, a VM exit will be triggered for each
pinned CR violation. This enables the VMM to react to a policy
violation.
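
The bitmask handling described above can be modeled in user space as
follows.  This is only a sketch: the two reason values are copied from this
patch's include/uapi/linux/kvm.h hunk, while struct heki_model and the two
functions are hypothetical names standing in for the per-VM state and the
checks done in kvm_vm_ioctl_enable_cap() and heki_exit_cr():

```c
#include <stdint.h>

/* Values as defined by this patch. */
#define KVM_HEKI_EXIT_REASON_CR0 (1ULL << 0)
#define KVM_HEKI_EXIT_REASON_CR4 (1ULL << 1)
#define MODEL_VALID_MASK \
	(KVM_HEKI_EXIT_REASON_CR0 | KVM_HEKI_EXIT_REASON_CR4)

/* Hypothetical model of the per-VM state set through KVM_ENABLE_CAP. */
struct heki_model {
	uint64_t configure_mask;
	uint64_t denial_mask;
};

/* Returns 0 on success, or -1 if @mask contains an unknown reason. */
static int model_enable_denial(struct heki_model *m, uint64_t mask)
{
	if (mask & ~MODEL_VALID_MASK)
		return -1;
	m->denial_mask = mask;
	return 0;
}

/* Returns nonzero if a violation with @reason exits to user space. */
static int model_denial_exits(const struct heki_model *m, uint64_t reason)
{
	return (m->denial_mask & reason) != 0;
}
```

When the reason is not part of the mask, this patch injects the general
protection fault directly, without involving user space.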

Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v1:
* New patch. Making user space aware of Heki properties was requested by
  Sean Christopherson.
---
 arch/x86/kvm/vmx/vmx.c   |   5 +-
 arch/x86/kvm/x86.c       | 114 +++++++++++++++++++++++++++++++++++----
 arch/x86/kvm/x86.h       |   7 +--
 include/linux/kvm_host.h |   2 +
 include/uapi/linux/kvm.h |  22 ++++++++
 5 files changed, 136 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f487bf16dd96..b631b1d7ba30 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5444,6 +5444,7 @@ static int handle_cr(struct kvm_vcpu *vcpu)
 	int reg;
 	int err;
 	int ret;
+	bool exit = false;
 
 	exit_qualification = vmx_get_exit_qual(vcpu);
 	cr = exit_qualification & 15;
@@ -5453,8 +5454,8 @@ static int handle_cr(struct kvm_vcpu *vcpu)
 		val = kvm_register_read(vcpu, reg);
 		trace_kvm_cr_write(cr, val);
 
-		ret = heki_check_cr(vcpu, cr, val);
-		if (ret)
+		ret = heki_check_cr(vcpu, cr, val, &exit);
+		if (exit)
 			return ret;
 
 		switch (cr) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4e6c4c21f12c..43c28a6953bf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -119,6 +119,10 @@ static u64 __read_mostly cr4_reserved_bits = CR4_RESERVED_BITS;
 
 #define KVM_CAP_PMU_VALID_MASK KVM_PMU_CAP_DISABLE
 
+#define KVM_HEKI_EXIT_REASON_VALID_MASK ( \
+	KVM_HEKI_EXIT_REASON_CR0 | \
+	KVM_HEKI_EXIT_REASON_CR4)
+
 #define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \
                                     KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
 
@@ -4644,6 +4648,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		if (kvm_is_vm_type_supported(KVM_X86_SW_PROTECTED_VM))
 			r |= BIT(KVM_X86_SW_PROTECTED_VM);
 		break;
+	case KVM_CAP_HEKI_CONFIGURE:
+	case KVM_CAP_HEKI_DENIAL:
+		r = KVM_HEKI_EXIT_REASON_VALID_MASK;
+		break;
 	default:
 		break;
 	}
@@ -6518,6 +6526,22 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		}
 		mutex_unlock(&kvm->lock);
 		break;
+#ifdef CONFIG_HEKI
+	case KVM_CAP_HEKI_CONFIGURE:
+		r = -EINVAL;
+		if (cap->args[0] & ~KVM_HEKI_EXIT_REASON_VALID_MASK)
+			break;
+		kvm->heki_configure_exit_reason = cap->args[0];
+		r = 0;
+		break;
+	case KVM_CAP_HEKI_DENIAL:
+		r = -EINVAL;
+		if (cap->args[0] & ~KVM_HEKI_EXIT_REASON_VALID_MASK)
+			break;
+		kvm->heki_denial_exit_reason = cap->args[0];
+		r = 0;
+		break;
+#endif
 	default:
 		r = -EINVAL;
 		break;
@@ -8056,11 +8080,60 @@ static unsigned long emulator_get_cr(struct x86_emulate_ctxt *ctxt, int cr)
 
 #ifdef CONFIG_HEKI
 
+static int complete_heki_configure_exit(struct kvm_vcpu *const vcpu)
+{
+	kvm_rax_write(vcpu, 0);
+	++vcpu->stat.hypercalls;
+	return kvm_skip_emulated_instruction(vcpu);
+}
+
+static int complete_heki_denial_exit(struct kvm_vcpu *const vcpu)
+{
+	kvm_inject_gp(vcpu, 0);
+	return 1;
+}
+
+/* Returns true if the @exit_reason is handled by @vcpu->kvm. */
+static bool heki_exit_cr(struct kvm_vcpu *const vcpu, const __u32 exit_reason,
+			 const u64 heki_reason, unsigned long value)
+{
+	switch (exit_reason) {
+	case KVM_EXIT_HEKI_CONFIGURE:
+		if (!(vcpu->kvm->heki_configure_exit_reason & heki_reason))
+			return false;
+
+		vcpu->run->heki_configure.reason = heki_reason;
+		memset(vcpu->run->heki_configure.reserved, 0,
+		       sizeof(vcpu->run->heki_configure.reserved));
+		vcpu->run->heki_configure.cr_pinned = value;
+		vcpu->arch.complete_userspace_io = complete_heki_configure_exit;
+		break;
+	case KVM_EXIT_HEKI_DENIAL:
+		if (!(vcpu->kvm->heki_denial_exit_reason & heki_reason))
+			return false;
+
+		vcpu->run->heki_denial.reason = heki_reason;
+		memset(vcpu->run->heki_denial.reserved, 0,
+		       sizeof(vcpu->run->heki_denial.reserved));
+		vcpu->run->heki_denial.cr_value = value;
+		vcpu->arch.complete_userspace_io = complete_heki_denial_exit;
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return false;
+	}
+
+	vcpu->run->exit_reason = exit_reason;
+	return true;
+}
+
 #define HEKI_ABI_VERSION 1
 
 static int heki_lock_cr(struct kvm_vcpu *const vcpu, const unsigned long cr,
-			unsigned long pin, unsigned long flags)
+			unsigned long pin, unsigned long flags, bool *exit)
 {
+	*exit = false;
+
 	if (flags) {
 		if ((flags == KVM_LOCK_CR_UPDATE_VERSION) && !cr && !pin)
 			return HEKI_ABI_VERSION;
@@ -8080,6 +8153,8 @@ static int heki_lock_cr(struct kvm_vcpu *const vcpu, const unsigned long cr,
 			return -KVM_EINVAL;
 
 		atomic_long_or(pin, &vcpu->kvm->heki_pinned_cr0);
+		*exit = heki_exit_cr(vcpu, KVM_EXIT_HEKI_CONFIGURE,
+				     KVM_HEKI_EXIT_REASON_CR0, pin);
 		return 0;
 	case 4:
 		/* Checks for irrelevant bits. */
@@ -8089,24 +8164,37 @@ static int heki_lock_cr(struct kvm_vcpu *const vcpu, const unsigned long cr,
 		/* Ignores bits not present in host. */
 		pin &= __read_cr4();
 		atomic_long_or(pin, &vcpu->kvm->heki_pinned_cr4);
+		*exit = heki_exit_cr(vcpu, KVM_EXIT_HEKI_CONFIGURE,
+				     KVM_HEKI_EXIT_REASON_CR4, pin);
 		return 0;
 	}
 	return -KVM_EINVAL;
 }
 
+/*
+ * Sets @exit to true if the caller must exit (i.e. denied access) with the
+ * returned value:
+ * - 0 when kvm_run is configured;
+ * - 1 when there is no user space handler.
+ */
 int heki_check_cr(struct kvm_vcpu *const vcpu, const unsigned long cr,
-		  const unsigned long val)
+		  const unsigned long val, bool *exit)
 {
 	unsigned long pinned;
 
+	*exit = false;
+
 	switch (cr) {
 	case 0:
 		pinned = atomic_long_read(&vcpu->kvm->heki_pinned_cr0);
 		if ((val & pinned) != pinned) {
 			pr_warn_ratelimited(
 				"heki: Blocked CR0 update: 0x%lx\n", val);
-			kvm_inject_gp(vcpu, 0);
-			return 1;
+			*exit = true;
+			if (heki_exit_cr(vcpu, KVM_EXIT_HEKI_DENIAL,
+					 KVM_HEKI_EXIT_REASON_CR0, val))
+				return 0;
+			return complete_heki_denial_exit(vcpu);
 		}
 		return 0;
 	case 4:
@@ -8114,8 +8202,11 @@ int heki_check_cr(struct kvm_vcpu *const vcpu, const unsigned long cr,
 		if ((val & pinned) != pinned) {
 			pr_warn_ratelimited(
 				"heki: Blocked CR4 update: 0x%lx\n", val);
-			kvm_inject_gp(vcpu, 0);
-			return 1;
+			*exit = true;
+			if (heki_exit_cr(vcpu, KVM_EXIT_HEKI_DENIAL,
+					 KVM_HEKI_EXIT_REASON_CR4, val))
+				return 0;
+			return complete_heki_denial_exit(vcpu);
 		}
 		return 0;
 	}
@@ -8129,9 +8220,10 @@ static int emulator_set_cr(struct x86_emulate_ctxt *ctxt, int cr, ulong val)
 {
 	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
 	int res = 0;
+	bool exit = false;
 
-	res = heki_check_cr(vcpu, cr, val);
-	if (res)
+	res = heki_check_cr(vcpu, cr, val, &exit);
+	if (exit)
 		return res;
 
 	switch (cr) {
@@ -9998,7 +10090,11 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 		if (a0 > U32_MAX) {
 			ret = -KVM_EINVAL;
 		} else {
-			ret = heki_lock_cr(vcpu, a0, a1, a2);
+			bool exit = false;
+
+			ret = heki_lock_cr(vcpu, a0, a1, a2, &exit);
+			if (exit)
+				return ret;
 		}
 		break;
 #endif /* CONFIG_HEKI */
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 193093112b55..f8f5c32bedd9 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -292,18 +292,19 @@ static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
 
 #ifdef CONFIG_HEKI
 
-int heki_check_cr(struct kvm_vcpu *vcpu, unsigned long cr, unsigned long val);
+int heki_check_cr(struct kvm_vcpu *vcpu, unsigned long cr, unsigned long val,
+		  bool *exit);
 
 #else /* CONFIG_HEKI */
 
 static inline int heki_check_cr(struct kvm_vcpu *vcpu, unsigned long cr,
-				unsigned long val)
+				unsigned long val, bool *exit)
 {
 	return 0;
 }
 
 static inline int heki_lock_cr(struct kvm_vcpu *const vcpu, unsigned long cr,
-			       unsigned long pin)
+			       unsigned long pin, bool *exit)
 {
 	return 0;
 }
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6864c80ff936..ec32af17add8 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -838,6 +838,8 @@ struct kvm {
 #ifdef CONFIG_HEKI
 	atomic_long_t heki_pinned_cr0;
 	atomic_long_t heki_pinned_cr4;
+	u64 heki_configure_exit_reason;
+	u64 heki_denial_exit_reason;
 #endif /* CONFIG_HEKI */
 
 #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 5b5820d19e71..2477b4a16126 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -279,6 +279,8 @@ struct kvm_xen_exit {
 #define KVM_EXIT_RISCV_CSR        36
 #define KVM_EXIT_NOTIFY           37
 #define KVM_EXIT_MEMORY_FAULT     38
+#define KVM_EXIT_HEKI_CONFIGURE   39
+#define KVM_EXIT_HEKI_DENIAL      40
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -532,6 +534,24 @@ struct kvm_run {
 			__u64 gpa;
 			__u64 size;
 		} memory_fault;
+		/* KVM_EXIT_HEKI_CONFIGURE */
+		struct {
+#define KVM_HEKI_EXIT_REASON_CR0	(1ULL << 0)
+#define KVM_HEKI_EXIT_REASON_CR4	(1ULL << 1)
+			__u64 reason;
+			union {
+				__u64 cr_pinned;
+				__u64 reserved[7]; /* ignored */
+			};
+		} heki_configure;
+		/* KVM_EXIT_HEKI_DENIAL */
+		struct {
+			__u64 reason;
+			union {
+				__u64 cr_value;
+				__u64 reserved[7]; /* ignored */
+			};
+		} heki_denial;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -1219,6 +1239,8 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_MEMORY_ATTRIBUTES 232
 #define KVM_CAP_GUEST_MEMFD 233
 #define KVM_CAP_VM_TYPES 234
+#define KVM_CAP_HEKI_CONFIGURE 235
+#define KVM_CAP_HEKI_DENIAL 236
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.42.1



* [RFC PATCH v2 04/19] heki: Lock guest control registers at the end of guest kernel init
  2023-11-13  2:23 [RFC PATCH v2 00/19] Hypervisor-Enforced Kernel Integrity Mickaël Salaün
                   ` (2 preceding siblings ...)
  2023-11-13  2:23 ` [RFC PATCH v2 03/19] KVM: x86: Add notifications for Heki policy configuration and violation Mickaël Salaün
@ 2023-11-13  2:23 ` Mickaël Salaün
  2023-11-13  2:23 ` [RFC PATCH v2 05/19] KVM: VMX: Add MBEC support Mickaël Salaün
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 44+ messages in thread
From: Mickaël Salaün @ 2023-11-13  2:23 UTC (permalink / raw)
  To: Borislav Petkov, Dave Hansen, H . Peter Anvin, Ingo Molnar,
	Kees Cook, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Vitaly Kuznetsov, Wanpeng Li
  Cc: Mickaël Salaün, Alexander Graf, Chao Peng, Edgecombe,
	Rick P, Forrest Yuan Yu, James Gowans, James Morris,
	John Andersen, Madhavan T . Venkataraman, Marian Rotariu,
	Mihai Donțu, Nicușor Cîțu, Thara Gopinath,
	Trilok Soni, Wei Liu, Will Deacon, Yu Zhang, Zahra Tarkhani,
	Ștefan Șicleru, dev, kvm, linux-hardening,
	linux-hyperv, linux-kernel, linux-security-module, qemu-devel,
	virtualization, x86, xen-devel

The hypervisor needs to provide some functions to support Heki. These
form the Heki-Hypervisor API.

Define a heki_hypervisor structure to house the API functions. A
hypervisor that supports Heki must instantiate a heki_hypervisor
structure and pass it to the Heki common code. This allows the common
code to access these functions in a hypervisor-agnostic way.

The first function that is implemented is lock_crs() (lock control
registers). That is, certain flags in the control registers are pinned
so that they can never be changed for the lifetime of the guest.

Implement Heki support in the guest:

- Each supported hypervisor in x86 implements a set of functions for the
  guest kernel. Add an init_heki() function to that set.  This function
  initializes Heki-related stuff. Call init_heki() for the detected
  hypervisor in init_hypervisor_platform().

- Implement init_heki() for the guest.

- Implement kvm_lock_crs() in the guest to lock down control registers.
  This function calls a KVM hypercall to do the job.

- Instantiate a heki_hypervisor structure that contains a pointer to
  kvm_lock_crs().

- Pass the heki_hypervisor structure to Heki common code in init_heki().

Implement a heki_late_init() function and call it at the end of kernel
init. This function calls lock_crs(). In other words, control registers
of a guest are locked down at the end of guest kernel init.
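
The registration pattern described above can be sketched in user space as
follows.  The two structures mirror the ones this patch adds to
include/linux/heki.h; fake_lock_crs(), fake_hypervisor, lock_calls and
model_late_init() are hypothetical names used only for this illustration,
with fake_lock_crs() standing in for kvm_lock_crs():

```c
#include <stddef.h>

struct heki_hypervisor {
	int (*lock_crs)(void); /* Lock control registers, 0 on success. */
};

struct heki {
	struct heki_hypervisor *hypervisor;
};

static struct heki heki; /* Models the global in virt/heki/main.c. */
static int lock_calls;   /* Counts backend calls, for illustration only. */

/* Hypothetical backend standing in for kvm_lock_crs(). */
static int fake_lock_crs(void)
{
	lock_calls++;
	return 0;
}

static struct heki_hypervisor fake_hypervisor = {
	.lock_crs = fake_lock_crs,
};

/* Models heki_late_init(): returns 1 if the registers were locked. */
static int model_late_init(int enabled)
{
	if (!enabled || !heki.hypervisor)
		return 0;
	return heki.hypervisor->lock_crs() == 0;
}
```

Because the common code only goes through the function pointer, it does
nothing on bare metal or on a hypervisor that never registers itself.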

Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Co-developed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v1:
* Shrunk the patch to only manage CR pinning.
---
 arch/x86/include/asm/x86_init.h  |  1 +
 arch/x86/kernel/cpu/hypervisor.c |  1 +
 arch/x86/kernel/kvm.c            | 56 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/Kconfig             |  1 +
 include/linux/heki.h             | 22 +++++++++++++
 init/main.c                      |  1 +
 virt/heki/Kconfig                |  9 ++++-
 virt/heki/main.c                 | 25 ++++++++++++++
 8 files changed, 115 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 5240d88db52a..ff4dfd2f615e 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -127,6 +127,7 @@ struct x86_hyper_init {
 	bool (*msi_ext_dest_id)(void);
 	void (*init_mem_mapping)(void);
 	void (*init_after_bootmem)(void);
+	void (*init_heki)(void);
 };
 
 /**
diff --git a/arch/x86/kernel/cpu/hypervisor.c b/arch/x86/kernel/cpu/hypervisor.c
index 553bfbfc3a1b..6085c8129e0c 100644
--- a/arch/x86/kernel/cpu/hypervisor.c
+++ b/arch/x86/kernel/cpu/hypervisor.c
@@ -106,4 +106,5 @@ void __init init_hypervisor_platform(void)
 
 	x86_hyper_type = h->type;
 	x86_init.hyper.init_platform();
+	x86_init.hyper.init_heki();
 }
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index b8ab9ee5896c..8349f4ad3bbd 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -29,6 +29,7 @@
 #include <linux/syscore_ops.h>
 #include <linux/cc_platform.h>
 #include <linux/efi.h>
+#include <linux/heki.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
 #include <asm/traps.h>
@@ -997,6 +998,60 @@ static bool kvm_sev_es_hcall_finish(struct ghcb *ghcb, struct pt_regs *regs)
 }
 #endif
 
+#ifdef CONFIG_HEKI
+
+extern unsigned long cr4_pinned_mask;
+
+/*
+ * TODO: Check SMP policy consistency, e.g. with
+ * this_cpu_read(cpu_tlbstate.cr4)
+ */
+static int kvm_lock_crs(void)
+{
+	unsigned long cr4;
+	int err;
+
+	err = kvm_hypercall3(KVM_HC_LOCK_CR_UPDATE, 0, X86_CR0_WP, 0);
+	if (err)
+		return err;
+
+	cr4 = __read_cr4();
+	err = kvm_hypercall3(KVM_HC_LOCK_CR_UPDATE, 4, cr4 & cr4_pinned_mask,
+			     0);
+	return err;
+}
+
+static struct heki_hypervisor kvm_heki_hypervisor = {
+	.lock_crs = kvm_lock_crs,
+};
+
+static void kvm_init_heki(void)
+{
+	long err;
+
+	if (!kvm_para_available()) {
+		/* Cannot make KVM hypercalls. */
+		return;
+	}
+
+	err = kvm_hypercall3(KVM_HC_LOCK_CR_UPDATE, 0, 0,
+			     KVM_LOCK_CR_UPDATE_VERSION);
+	if (err < 1) {
+		/* Ignores host not supporting at least the first version. */
+		return;
+	}
+
+	heki.hypervisor = &kvm_heki_hypervisor;
+}
+
+#else /* CONFIG_HEKI */
+
+static void kvm_init_heki(void)
+{
+}
+
+#endif /* CONFIG_HEKI */
+
 const __initconst struct hypervisor_x86 x86_hyper_kvm = {
 	.name				= "KVM",
 	.detect				= kvm_detect,
@@ -1005,6 +1060,7 @@ const __initconst struct hypervisor_x86 x86_hyper_kvm = {
 	.init.x2apic_available		= kvm_para_available,
 	.init.msi_ext_dest_id		= kvm_msi_ext_dest_id,
 	.init.init_platform		= kvm_init_platform,
+	.init.init_heki			= kvm_init_heki,
 #if defined(CONFIG_AMD_MEM_ENCRYPT)
 	.runtime.sev_es_hcall_prepare	= kvm_sev_es_hcall_prepare,
 	.runtime.sev_es_hcall_finish	= kvm_sev_es_hcall_finish,
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 8452ed0228cb..7a3b52b7e456 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -49,6 +49,7 @@ config KVM
 	select INTERVAL_TREE
 	select HAVE_KVM_PM_NOTIFIER if PM
 	select KVM_GENERIC_HARDWARE_ENABLING
+	select HYPERVISOR_SUPPORTS_HEKI
 	help
 	  Support hosting fully virtualized guest machines using hardware
 	  virtualization extensions.  You will need a fairly recent
diff --git a/include/linux/heki.h b/include/linux/heki.h
index 4c18d2283392..96ccb17657e5 100644
--- a/include/linux/heki.h
+++ b/include/linux/heki.h
@@ -9,6 +9,7 @@
 #define __HEKI_H__
 
 #include <linux/types.h>
+#include <linux/bug.h>
 #include <linux/cache.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
@@ -16,15 +17,36 @@
 
 #ifdef CONFIG_HEKI
 
+/*
+ * A hypervisor that supports Heki will instantiate this structure to
+ * provide hypervisor specific functions for Heki.
+ */
+struct heki_hypervisor {
+	int (*lock_crs)(void); /* Lock control registers. */
+};
+
+/*
+ * If the active hypervisor supports Heki, it will plug its heki_hypervisor
+ * pointer into this heki structure.
+ */
+struct heki {
+	struct heki_hypervisor *hypervisor;
+};
+
+extern struct heki heki;
 extern bool heki_enabled;
 
 void heki_early_init(void);
+void heki_late_init(void);
 
 #else /* !CONFIG_HEKI */
 
 static inline void heki_early_init(void)
 {
 }
+static inline void heki_late_init(void)
+{
+}
 
 #endif /* CONFIG_HEKI */
 
diff --git a/init/main.c b/init/main.c
index 0d28301c5402..f1c998bbb370 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1447,6 +1447,7 @@ static int __ref kernel_init(void *unused)
 	exit_boot_config();
 	free_initmem();
 	mark_readonly();
+	heki_late_init();
 
 	/*
 	 * Kernel mappings are now finalized - update the userspace page-table
diff --git a/virt/heki/Kconfig b/virt/heki/Kconfig
index 49695fff6d21..5ea75b595667 100644
--- a/virt/heki/Kconfig
+++ b/virt/heki/Kconfig
@@ -4,7 +4,7 @@
 
 config HEKI
 	bool "Hypervisor Enforced Kernel Integrity (Heki)"
-	depends on ARCH_SUPPORTS_HEKI
+	depends on ARCH_SUPPORTS_HEKI && HYPERVISOR_SUPPORTS_HEKI
 	help
 	  This feature enhances guest virtual machine security by taking
 	  advantage of security features provided by the hypervisor for guests.
@@ -17,3 +17,10 @@ config ARCH_SUPPORTS_HEKI
 	  An architecture should select this when it can successfully build
 	  and run with CONFIG_HEKI. That is, it should provide all of the
 	  architecture support required for the HEKI feature.
+
+config HYPERVISOR_SUPPORTS_HEKI
+	bool "Hypervisor support for Heki"
+	help
+	  A hypervisor should select this when it can successfully build
+	  and run with CONFIG_HEKI. That is, it should provide all of the
+	  hypervisor support required for the Heki feature.
diff --git a/virt/heki/main.c b/virt/heki/main.c
index f005dd74d586..ff1937e1c946 100644
--- a/virt/heki/main.c
+++ b/virt/heki/main.c
@@ -10,6 +10,7 @@
 #include "common.h"
 
 bool heki_enabled __ro_after_init = true;
+struct heki heki;
 
 /*
  * Must be called after kmem_cache_init().
@@ -21,6 +22,30 @@ __init void heki_early_init(void)
 		return;
 	}
 	pr_warn("Heki is enabled\n");
+
+	if (!heki.hypervisor) {
+		/* This happens for kernels running on bare metal as well. */
+		pr_warn("No support for Heki in the active hypervisor\n");
+		return;
+	}
+	pr_warn("Heki is supported by the active Hypervisor\n");
+}
+
+/*
+ * Must be called after mark_readonly().
+ */
+void heki_late_init(void)
+{
+	struct heki_hypervisor *hypervisor = heki.hypervisor;
+
+	if (!heki_enabled || !heki.hypervisor)
+		return;
+
+	/* Locks control registers so a compromised guest cannot change them. */
+	if (WARN_ON(hypervisor->lock_crs()))
+		return;
+
+	pr_warn("Control registers locked\n");
 }
 
 static int __init heki_parse_config(char *str)
-- 
2.42.1



* [RFC PATCH v2 05/19] KVM: VMX: Add MBEC support
  2023-11-13  2:23 [RFC PATCH v2 00/19] Hypervisor-Enforced Kernel Integrity Mickaël Salaün
                   ` (3 preceding siblings ...)
  2023-11-13  2:23 ` [RFC PATCH v2 04/19] heki: Lock guest control registers at the end of guest kernel init Mickaël Salaün
@ 2023-11-13  2:23 ` Mickaël Salaün
  2023-11-13  2:23 ` [RFC PATCH v2 06/19] KVM: x86: Add kvm_x86_ops.fault_gva() Mickaël Salaün
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 44+ messages in thread
From: Mickaël Salaün @ 2023-11-13  2:23 UTC (permalink / raw)
  To: Borislav Petkov, Dave Hansen, H . Peter Anvin, Ingo Molnar,
	Kees Cook, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Vitaly Kuznetsov, Wanpeng Li
  Cc: Mickaël Salaün, Alexander Graf, Chao Peng, Edgecombe,
	Rick P, Forrest Yuan Yu, James Gowans, James Morris,
	John Andersen, Madhavan T . Venkataraman, Marian Rotariu,
	Mihai Donțu, Nicușor Cîțu, Thara Gopinath,
	Trilok Soni, Wei Liu, Will Deacon, Yu Zhang, Zahra Tarkhani,
	Ștefan Șicleru, dev, kvm, linux-hardening,
	linux-hyperv, linux-kernel, linux-security-module, qemu-devel,
	virtualization, x86, xen-devel

This change adds support for VMX_FEATURE_MODE_BASED_EPT_EXEC (named
ept_mode_based_exec in /proc/cpuinfo and MBEC elsewhere), which makes it
possible to separate EPT execution bits for supervisor vs. user.  It
transforms the semantics of VMX_EPT_EXECUTABLE_MASK from a global
execution to a kernel execution, and uses the
VMX_EPT_USER_EXECUTABLE_MASK bit to identify user execution.
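
As a rough illustration only (not part of the patch), the split between
kernel and user execution can be sketched in stand-alone C.  The helper
name and the SKETCH_* macros are hypothetical; only the bit positions
(bit 2 and bit 10) come from this patch and the SDM excerpts in this
message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as described in this patch; the names are hypothetical. */
#define SKETCH_EPT_EXECUTABLE_MASK      (1ull << 2)  /* kernel execute when MBEC is on */
#define SKETCH_EPT_USER_EXECUTABLE_MASK (1ull << 10) /* user execute, MBEC only */

/*
 * Hypothetical user-space model of the check: without MBEC, bit 2 alone
 * decides executability for every mode.  With MBEC, bit 2 covers
 * supervisor-mode fetches and bit 10 covers user-mode fetches.
 */
static bool sketch_epte_is_executable(uint64_t epte, bool mbec,
				      bool for_kernel_mode)
{
	if (!mbec || for_kernel_mode)
		return (epte & SKETCH_EPT_EXECUTABLE_MASK) != 0;

	return (epte & SKETCH_EPT_USER_EXECUTABLE_MASK) != 0;
}
```

With MBEC enabled, an entry that only has bit 2 set is executable for
the kernel but not for user space, which is the property the rest of
the series relies on.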

The main use case is to be able to restrict kernel execution while
ignoring user space execution from the hypervisor point of view.
Indeed, user space execution can already be restricted by the guest
kernel.

This change enables MBEC but doesn't change the default configuration,
which is to allow execution for all guest memory.  However, the next
commit leverages MBEC to restrict kernel memory pages.

MBEC can be configured with the new "enable_mbec" module parameter, set
to true by default.  However, MBEC is disabled for L1 and L2 for now.

The MMU tracepoints are updated to reflect the difference between kernel
and user space executions, see is_executable_pte().

Replace EPT_VIOLATION_RWX_MASK (3 bits) with 4 dedicated
EPT_VIOLATION_READ, EPT_VIOLATION_WRITE, EPT_VIOLATION_KERNEL_INSTR, and
EPT_VIOLATION_USER_INSTR bits.
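
For illustration only, the "is the EPT entry present?" test built from
these four bits can be modelled in stand-alone C as below.  The helper
name and the SKETCH_* macros are hypothetical; the bit positions (3 to
6) are the ones defined by this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Exit qualification permission bits; positions follow this patch. */
#define SKETCH_EPT_VIOLATION_READ         (1ull << 3)
#define SKETCH_EPT_VIOLATION_WRITE        (1ull << 4)
#define SKETCH_EPT_VIOLATION_KERNEL_INSTR (1ull << 5)
#define SKETCH_EPT_VIOLATION_USER_INSTR   (1ull << 6)

/*
 * Hypothetical model: the entry counts as present if any permission bit
 * is reported.  Bit 6 is only taken into account when MBEC is enabled,
 * because its value is undefined otherwise.
 */
static bool sketch_ept_entry_present(uint64_t exit_qualification, bool mbec)
{
	uint64_t rwx_mask = SKETCH_EPT_VIOLATION_READ |
			    SKETCH_EPT_VIOLATION_WRITE |
			    SKETCH_EPT_VIOLATION_KERNEL_INSTR;

	if (mbec)
		rwx_mask |= SKETCH_EPT_VIOLATION_USER_INSTR;

	return (exit_qualification & rwx_mask) != 0;
}
```

Bits 0 to 2 of the exit qualification describe the access type, not the
permissions of the entry, so they do not contribute to this test.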

From the Intel 64 and IA-32 Architectures Software Developer's Manual,
Volume 3C (System Programming Guide), Part 3:

SECONDARY_EXEC_MODE_BASED_EPT_EXEC (bit 22):
If either the "unrestricted guest" VM-execution control or the
"mode-based execute control for EPT" VM-execution control is 1, the
"enable EPT" VM-execution control must also be 1.

EPT_VIOLATION_KERNEL_INSTR_BIT (bit 5):
The logical-AND of bit 2 in the EPT paging-structure entries used to
translate the guest-physical address of the access causing the EPT
violation.  If the "mode-based execute control for EPT" VM-execution
control is 0, this indicates whether the guest-physical address was
executable. If that control is 1, this indicates whether the
guest-physical address was executable for supervisor-mode linear
addresses.

EPT_VIOLATION_USER_INSTR_BIT (bit 6):
If the "mode-based execute control" VM-execution control is 0, the value
of this bit is undefined. If that control is 1, this bit is the
logical-AND of bit 10 in the EPT paging-structures entries used to
translate the guest-physical address of the access causing the EPT
violation. In this case, it indicates whether the guest-physical address
was executable for user-mode linear addresses.

PT_USER_EXEC_MASK (bit 10):
Execute access for user-mode linear addresses. If the "mode-based
execute control for EPT" VM-execution control is 1, indicates whether
instruction fetches are allowed from user-mode linear addresses in the
512-GByte region controlled by this entry. If that control is 0, this
bit is ignored.

Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v1:
* Import the MMU tracepoint changes from the v1's "Enable guests to lock
  themselves thanks to MBEC" patch.
---
 arch/x86/include/asm/vmx.h      | 11 +++++++++--
 arch/x86/kvm/mmu.h              |  3 ++-
 arch/x86/kvm/mmu/mmu.c          |  8 ++++++--
 arch/x86/kvm/mmu/mmutrace.h     | 11 +++++++----
 arch/x86/kvm/mmu/paging_tmpl.h  | 16 ++++++++++++++--
 arch/x86/kvm/mmu/spte.c         |  4 +++-
 arch/x86/kvm/mmu/spte.h         | 15 +++++++++++++--
 arch/x86/kvm/vmx/capabilities.h |  7 +++++++
 arch/x86/kvm/vmx/nested.c       |  7 +++++++
 arch/x86/kvm/vmx/vmx.c          | 29 ++++++++++++++++++++++++++---
 arch/x86/kvm/vmx/vmx.h          |  1 +
 11 files changed, 95 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 0e73616b82f3..7fd390484b36 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -513,6 +513,7 @@ enum vmcs_field {
 #define VMX_EPT_IPAT_BIT    			(1ull << 6)
 #define VMX_EPT_ACCESS_BIT			(1ull << 8)
 #define VMX_EPT_DIRTY_BIT			(1ull << 9)
+#define VMX_EPT_USER_EXECUTABLE_MASK		(1ull << 10)
 #define VMX_EPT_RWX_MASK                        (VMX_EPT_READABLE_MASK |       \
 						 VMX_EPT_WRITABLE_MASK |       \
 						 VMX_EPT_EXECUTABLE_MASK)
@@ -558,13 +559,19 @@ enum vm_entry_failure_code {
 #define EPT_VIOLATION_ACC_READ_BIT	0
 #define EPT_VIOLATION_ACC_WRITE_BIT	1
 #define EPT_VIOLATION_ACC_INSTR_BIT	2
-#define EPT_VIOLATION_RWX_SHIFT		3
+#define EPT_VIOLATION_READ_BIT		3
+#define EPT_VIOLATION_WRITE_BIT		4
+#define EPT_VIOLATION_KERNEL_INSTR_BIT	5
+#define EPT_VIOLATION_USER_INSTR_BIT	6
 #define EPT_VIOLATION_GVA_IS_VALID_BIT	7
 #define EPT_VIOLATION_GVA_TRANSLATED_BIT 8
 #define EPT_VIOLATION_ACC_READ		(1 << EPT_VIOLATION_ACC_READ_BIT)
 #define EPT_VIOLATION_ACC_WRITE		(1 << EPT_VIOLATION_ACC_WRITE_BIT)
 #define EPT_VIOLATION_ACC_INSTR		(1 << EPT_VIOLATION_ACC_INSTR_BIT)
-#define EPT_VIOLATION_RWX_MASK		(VMX_EPT_RWX_MASK << EPT_VIOLATION_RWX_SHIFT)
+#define EPT_VIOLATION_READ		(1 << EPT_VIOLATION_READ_BIT)
+#define EPT_VIOLATION_WRITE		(1 << EPT_VIOLATION_WRITE_BIT)
+#define EPT_VIOLATION_KERNEL_INSTR	(1 << EPT_VIOLATION_KERNEL_INSTR_BIT)
+#define EPT_VIOLATION_USER_INSTR	(1 << EPT_VIOLATION_USER_INSTR_BIT)
 #define EPT_VIOLATION_GVA_IS_VALID	(1 << EPT_VIOLATION_GVA_IS_VALID_BIT)
 #define EPT_VIOLATION_GVA_TRANSLATED	(1 << EPT_VIOLATION_GVA_TRANSLATED_BIT)
 
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 253fb2093d5d..65168d3a4e31 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -24,6 +24,7 @@ extern bool __read_mostly enable_mmio_caching;
 #define PT_PAGE_SIZE_MASK (1ULL << PT_PAGE_SIZE_SHIFT)
 #define PT_PAT_MASK (1ULL << 7)
 #define PT_GLOBAL_MASK (1ULL << 8)
+#define PT_USER_EXEC_MASK (1ULL << 10)
 #define PT64_NX_SHIFT 63
 #define PT64_NX_MASK (1ULL << PT64_NX_SHIFT)
 
@@ -102,7 +103,7 @@ static inline u8 kvm_get_shadow_phys_bits(void)
 
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
 void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask);
-void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only);
+void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only, bool has_mbec);
 
 void kvm_init_mmu(struct kvm_vcpu *vcpu);
 void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index baeba8fc1c38..7e053973125c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -29,6 +29,9 @@
 #include "cpuid.h"
 #include "spte.h"
 
+/* Required by paging_tmpl.h for enable_mbec */
+#include "../vmx/capabilities.h"
+
 #include <linux/kvm_host.h>
 #include <linux/types.h>
 #include <linux/string.h>
@@ -3410,7 +3413,7 @@ static bool fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu,
 static bool is_access_allowed(struct kvm_page_fault *fault, u64 spte)
 {
 	if (fault->exec)
-		return is_executable_pte(spte);
+		return is_executable_pte(spte, !fault->user);
 
 	if (fault->write)
 		return is_writable_pte(spte);
@@ -3852,7 +3855,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 */
 	pm_mask = PT_PRESENT_MASK | shadow_me_value;
 	if (mmu->root_role.level >= PT64_ROOT_4LEVEL) {
-		pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
+		pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK |
+			   PT_USER_EXEC_MASK;
 
 		if (WARN_ON_ONCE(!mmu->pml4_root)) {
 			r = -EIO;
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index ae86820cef69..cb7df95aec25 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -342,7 +342,8 @@ TRACE_EVENT(
 		__field(u8, level)
 		/* These depend on page entry type, so compute them now.  */
 		__field(bool, r)
-		__field(bool, x)
+		__field(bool, kx)
+		__field(bool, ux)
 		__field(signed char, u)
 	),
 
@@ -352,15 +353,17 @@ TRACE_EVENT(
 		__entry->sptep = virt_to_phys(sptep);
 		__entry->level = level;
 		__entry->r = shadow_present_mask || (__entry->spte & PT_PRESENT_MASK);
-		__entry->x = is_executable_pte(__entry->spte);
+		__entry->kx = is_executable_pte(__entry->spte, true);
+		__entry->ux = is_executable_pte(__entry->spte, false);
 		__entry->u = shadow_user_mask ? !!(__entry->spte & shadow_user_mask) : -1;
 	),
 
-	TP_printk("gfn %llx spte %llx (%s%s%s%s) level %d at %llx",
+	TP_printk("gfn %llx spte %llx (%s%s%s%s%s) level %d at %llx",
 		  __entry->gfn, __entry->spte,
 		  __entry->r ? "r" : "-",
 		  __entry->spte & PT_WRITABLE_MASK ? "w" : "-",
-		  __entry->x ? "x" : "-",
+		  __entry->kx ? "X" : "-",
+		  __entry->ux ? "x" : "-",
 		  __entry->u == -1 ? "" : (__entry->u ? "u" : "-"),
 		  __entry->level, __entry->sptep
 	)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index c85255073f67..08f0c8d28245 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -510,8 +510,20 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 		 * Note, pte_access holds the raw RWX bits from the EPTE, not
 		 * ACC_*_MASK flags!
 		 */
-		vcpu->arch.exit_qualification |= (pte_access & VMX_EPT_RWX_MASK) <<
-						 EPT_VIOLATION_RWX_SHIFT;
+		vcpu->arch.exit_qualification |=
+			!!(pte_access & VMX_EPT_READABLE_MASK)
+			<< EPT_VIOLATION_READ_BIT;
+		vcpu->arch.exit_qualification |=
+			!!(pte_access & VMX_EPT_WRITABLE_MASK)
+			<< EPT_VIOLATION_WRITE_BIT;
+		vcpu->arch.exit_qualification |=
+			!!(pte_access & VMX_EPT_EXECUTABLE_MASK)
+			<< EPT_VIOLATION_KERNEL_INSTR_BIT;
+		if (enable_mbec) {
+			vcpu->arch.exit_qualification |=
+				!!(pte_access & VMX_EPT_USER_EXECUTABLE_MASK)
+				<< EPT_VIOLATION_USER_INSTR_BIT;
+		}
 	}
 #endif
 	walker->fault.address = addr;
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 4a599130e9c9..386cc1e8aab9 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -422,13 +422,15 @@ void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask)
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_me_spte_mask);
 
-void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only)
+void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only, bool has_mbec)
 {
 	shadow_user_mask	= VMX_EPT_READABLE_MASK;
 	shadow_accessed_mask	= has_ad_bits ? VMX_EPT_ACCESS_BIT : 0ull;
 	shadow_dirty_mask	= has_ad_bits ? VMX_EPT_DIRTY_BIT : 0ull;
 	shadow_nx_mask		= 0ull;
 	shadow_x_mask		= VMX_EPT_EXECUTABLE_MASK;
+	if (has_mbec)
+		shadow_x_mask |= VMX_EPT_USER_EXECUTABLE_MASK;
 	shadow_present_mask	= has_exec_only ? 0ull : VMX_EPT_READABLE_MASK;
 	/*
 	 * EPT overrides the host MTRRs, and so KVM must program the desired
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index a129951c9a88..2f402f81ee15 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -3,8 +3,11 @@
 #ifndef KVM_X86_MMU_SPTE_H
 #define KVM_X86_MMU_SPTE_H
 
+#include <asm/vmx.h>
+
 #include "mmu.h"
 #include "mmu_internal.h"
+#include "../vmx/vmx.h"
 
 /*
  * A MMU present SPTE is backed by actual memory and may or may not be present
@@ -320,9 +323,17 @@ static inline bool is_last_spte(u64 pte, int level)
 	return (level == PG_LEVEL_4K) || is_large_pte(pte);
 }
 
-static inline bool is_executable_pte(u64 spte)
+static inline bool is_executable_pte(u64 spte, bool for_kernel_mode)
 {
-	return (spte & (shadow_x_mask | shadow_nx_mask)) == shadow_x_mask;
+	u64 x_mask = shadow_x_mask;
+
+	if (enable_mbec) {
+		if (for_kernel_mode)
+			x_mask &= ~VMX_EPT_USER_EXECUTABLE_MASK;
+		else
+			x_mask &= ~VMX_EPT_EXECUTABLE_MASK;
+	}
+	return (spte & (x_mask | shadow_nx_mask)) == x_mask;
 }
 
 static inline kvm_pfn_t spte_to_pfn(u64 pte)
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 41a4533f9989..70bd38645680 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -13,6 +13,7 @@ extern bool __read_mostly enable_vpid;
 extern bool __read_mostly flexpriority_enabled;
 extern bool __read_mostly enable_ept;
 extern bool __read_mostly enable_unrestricted_guest;
+extern bool __read_mostly enable_mbec;
 extern bool __read_mostly enable_ept_ad_bits;
 extern bool __read_mostly enable_pml;
 extern bool __read_mostly enable_ipiv;
@@ -255,6 +256,12 @@ static inline bool cpu_has_vmx_xsaves(void)
 		SECONDARY_EXEC_ENABLE_XSAVES;
 }
 
+static inline bool cpu_has_vmx_mbec(void)
+{
+	return vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_MODE_BASED_EPT_EXEC;
+}
+
 static inline bool cpu_has_vmx_waitpkg(void)
 {
 	return vmcs_config.cpu_based_2nd_exec_ctrl &
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index c5ec0ef51ff7..23a0341729d6 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2324,6 +2324,9 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
 		/* VMCS shadowing for L2 is emulated for now */
 		exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
 
+		/* MBEC is currently only handled for L0. */
+		exec_control &= ~SECONDARY_EXEC_MODE_BASED_EPT_EXEC;
+
 		/*
 		 * Preset *DT exiting when emulating UMIP, so that vmx_set_cr4()
 		 * will not have to rewrite the controls just for this bit.
@@ -6863,6 +6866,10 @@ static void nested_vmx_setup_secondary_ctls(u32 ept_caps,
 {
 	msrs->secondary_ctls_low = 0;
 
+	/*
+	 * Currently, SECONDARY_EXEC_MODE_BASED_EPT_EXEC is only handled for
+	 * L0 and doesn't need to be exposed to L1 nor L2.
+	 */
 	msrs->secondary_ctls_high = vmcs_conf->cpu_based_2nd_exec_ctrl;
 	msrs->secondary_ctls_high &=
 		SECONDARY_EXEC_DESC |
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b631b1d7ba30..1b1581f578b0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -94,6 +94,10 @@ bool __read_mostly enable_unrestricted_guest = 1;
 module_param_named(unrestricted_guest,
 			enable_unrestricted_guest, bool, S_IRUGO);
 
+bool __read_mostly enable_mbec = true;
+EXPORT_SYMBOL_GPL(enable_mbec);
+module_param_named(mbec, enable_mbec, bool, 0444);
+
 bool __read_mostly enable_ept_ad_bits = 1;
 module_param_named(eptad, enable_ept_ad_bits, bool, S_IRUGO);
 
@@ -4596,10 +4600,21 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
 		exec_control &= ~SECONDARY_EXEC_ENABLE_VPID;
 	if (!enable_ept) {
 		exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
+		/*
+		 * From Intel's SDM:
+		 * If either the "unrestricted guest" VM-execution control or
+		 * the "mode-based execute control for EPT" VM-execution
+		 * control is 1, the "enable EPT" VM-execution control must
+		 * also be 1.
+		 */
 		enable_unrestricted_guest = 0;
+		enable_mbec = false;
 	}
 	if (!enable_unrestricted_guest)
 		exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
+	if (!enable_mbec)
+		exec_control &= ~SECONDARY_EXEC_MODE_BASED_EPT_EXEC;
+
 	if (kvm_pause_in_guest(vmx->vcpu.kvm))
 		exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
 	if (!kvm_vcpu_apicv_active(vcpu))
@@ -5742,7 +5757,7 @@ static int handle_task_switch(struct kvm_vcpu *vcpu)
 
 static int handle_ept_violation(struct kvm_vcpu *vcpu)
 {
-	unsigned long exit_qualification;
+	unsigned long exit_qualification, rwx_mask;
 	gpa_t gpa;
 	u64 error_code;
 
@@ -5772,7 +5787,11 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR)
 		      ? PFERR_FETCH_MASK : 0;
 	/* ept page table entry is present? */
-	error_code |= (exit_qualification & EPT_VIOLATION_RWX_MASK)
+	rwx_mask = EPT_VIOLATION_READ | EPT_VIOLATION_WRITE |
+		   EPT_VIOLATION_KERNEL_INSTR;
+	if (enable_mbec)
+		rwx_mask |= EPT_VIOLATION_USER_INSTR;
+	error_code |= (exit_qualification & rwx_mask)
 		      ? PFERR_PRESENT_MASK : 0;
 
 	error_code |= (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) != 0 ?
@@ -8475,6 +8494,9 @@ static __init int hardware_setup(void)
 	if (!cpu_has_vmx_unrestricted_guest() || !enable_ept)
 		enable_unrestricted_guest = 0;
 
+	if (!cpu_has_vmx_mbec() || !enable_ept)
+		enable_mbec = false;
+
 	if (!cpu_has_vmx_flexpriority())
 		flexpriority_enabled = 0;
 
@@ -8533,7 +8555,8 @@ static __init int hardware_setup(void)
 
 	if (enable_ept)
 		kvm_mmu_set_ept_masks(enable_ept_ad_bits,
-				      cpu_has_vmx_ept_execute_only());
+				      cpu_has_vmx_ept_execute_only(),
+				      enable_mbec);
 
 	/*
 	 * Setup shadow_me_value/shadow_me_mask to include MKTME KeyID
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index c2130d2c8e24..882add2412e6 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -572,6 +572,7 @@ static inline u8 vmx_get_rvi(void)
 	 SECONDARY_EXEC_ENABLE_VMFUNC |					\
 	 SECONDARY_EXEC_BUS_LOCK_DETECTION |				\
 	 SECONDARY_EXEC_NOTIFY_VM_EXITING |				\
+	 SECONDARY_EXEC_MODE_BASED_EPT_EXEC |				\
 	 SECONDARY_EXEC_ENCLS_EXITING)
 
 #define KVM_REQUIRED_VMX_TERTIARY_VM_EXEC_CONTROL 0
-- 
2.42.1



* [RFC PATCH v2 06/19] KVM: x86: Add kvm_x86_ops.fault_gva()
  2023-11-13  2:23 [RFC PATCH v2 00/19] Hypervisor-Enforced Kernel Integrity Mickaël Salaün
                   ` (4 preceding siblings ...)
  2023-11-13  2:23 ` [RFC PATCH v2 05/19] KVM: VMX: Add MBEC support Mickaël Salaün
@ 2023-11-13  2:23 ` Mickaël Salaün
  2023-11-13  2:23 ` [RFC PATCH v2 07/19] KVM: x86: Make memory attribute helpers more generic Mickaël Salaün
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 44+ messages in thread
From: Mickaël Salaün @ 2023-11-13  2:23 UTC (permalink / raw)
  To: Borislav Petkov, Dave Hansen, H . Peter Anvin, Ingo Molnar,
	Kees Cook, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Vitaly Kuznetsov, Wanpeng Li
  Cc: Mickaël Salaün, Alexander Graf, Chao Peng, Edgecombe,
	Rick P, Forrest Yuan Yu, James Gowans, James Morris,
	John Andersen, Madhavan T . Venkataraman, Marian Rotariu,
	Mihai Donțu, Nicușor Cîțu, Thara Gopinath,
	Trilok Soni, Wei Liu, Will Deacon, Yu Zhang, Zahra Tarkhani,
	Ștefan Șicleru, dev, kvm, linux-hardening,
	linux-hyperv, linux-kernel, linux-security-module, qemu-devel,
	virtualization, x86, xen-devel

This function is needed for kvm_mmu_page_fault() to create synthetic
page faults.

Code originally written by Mihai Donțu and Nicușor Cîțu:
https://lore.kernel.org/r/20211006173113.26445-18-alazar@bitdefender.com
Rename fault_gla() to fault_gva() and use the new
EPT_VIOLATION_GVA_IS_VALID.
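
A minimal stand-alone sketch of the VMX-side contract, for illustration
only: the function and macro names are hypothetical, and the guest
linear address is passed in as a plain argument instead of being read
from the VMCS.

```c
#include <stdint.h>

/* Bit 7 of the exit qualification, as defined in asm/vmx.h. */
#define SKETCH_EPT_VIOLATION_GVA_IS_VALID (1ull << 7)
/* Hypothetical name for the "no valid GVA" sentinel. */
#define SKETCH_FAULT_GVA_INVALID (~0ull)

/*
 * Hypothetical model: report the guest virtual address only when the
 * exit qualification says it is valid, otherwise return all ones.
 */
static uint64_t sketch_fault_gva(uint64_t exit_qualification,
				 uint64_t guest_linear_address)
{
	if (exit_qualification & SKETCH_EPT_VIOLATION_GVA_IS_VALID)
		return guest_linear_address;

	return SKETCH_FAULT_GVA_INVALID;
}
```

Callers are expected to compare the result against the all-ones value
before using it as an address.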

Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Co-developed-by: Mihai Donțu <mdontu@bitdefender.com>
Signed-off-by: Mihai Donțu <mdontu@bitdefender.com>
Co-developed-by: Nicușor Cîțu <nicu.citu@icloud.com>
Signed-off-by: Nicușor Cîțu <nicu.citu@icloud.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  2 ++
 arch/x86/kvm/svm/svm.c             |  9 +++++++++
 arch/x86/kvm/vmx/vmx.c             | 10 ++++++++++
 4 files changed, 22 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index e3054e3e46d5..ba3db679db2b 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -134,6 +134,7 @@ KVM_X86_OP(msr_filter_changed)
 KVM_X86_OP(complete_emulated_msr)
 KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
+KVM_X86_OP(fault_gva)
 
 #undef KVM_X86_OP
 #undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index dff10051e9b6..0415dacd4b28 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1750,6 +1750,8 @@ struct kvm_x86_ops {
 	 * Returns vCPU specific APICv inhibit reasons
 	 */
 	unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
+
+	u64 (*fault_gva)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index beea99c8e8e0..d32517a2cf9c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4906,6 +4906,13 @@ static int svm_vm_init(struct kvm *kvm)
 	return 0;
 }
 
+static u64 svm_fault_gva(struct kvm_vcpu *vcpu)
+{
+	const struct vcpu_svm *svm = to_svm(vcpu);
+
+	return svm->vcpu.arch.cr2 ? svm->vcpu.arch.cr2 : ~0ull;
+}
+
 static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.name = KBUILD_MODNAME,
 
@@ -5037,6 +5044,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 
 	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
 	.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
+
+	.fault_gva = svm_fault_gva,
 };
 
 /*
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1b1581f578b0..a8158bc1dda9 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8233,6 +8233,14 @@ static void vmx_vm_destroy(struct kvm *kvm)
 	free_pages((unsigned long)kvm_vmx->pid_table, vmx_get_pid_table_order(kvm));
 }
 
+static u64 vmx_fault_gva(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->arch.exit_qualification & EPT_VIOLATION_GVA_IS_VALID)
+		return vmcs_readl(GUEST_LINEAR_ADDRESS);
+
+	return ~0ull;
+}
+
 static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.name = KBUILD_MODNAME,
 
@@ -8373,6 +8381,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.complete_emulated_msr = kvm_complete_insn_gp,
 
 	.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
+
+	.fault_gva = vmx_fault_gva,
 };
 
 static unsigned int vmx_handle_intel_pt_intr(void)
-- 
2.42.1



* [RFC PATCH v2 07/19] KVM: x86: Make memory attribute helpers more generic
  2023-11-13  2:23 [RFC PATCH v2 00/19] Hypervisor-Enforced Kernel Integrity Mickaël Salaün
                   ` (5 preceding siblings ...)
  2023-11-13  2:23 ` [RFC PATCH v2 06/19] KVM: x86: Add kvm_x86_ops.fault_gva() Mickaël Salaün
@ 2023-11-13  2:23 ` Mickaël Salaün
  2023-11-13  2:23 ` [RFC PATCH v2 08/19] KVM: x86: Extend kvm_vm_set_mem_attributes() with a mask Mickaël Salaün
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 44+ messages in thread
From: Mickaël Salaün @ 2023-11-13  2:23 UTC (permalink / raw)
  To: Borislav Petkov, Dave Hansen, H . Peter Anvin, Ingo Molnar,
	Kees Cook, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Vitaly Kuznetsov, Wanpeng Li
  Cc: Mickaël Salaün, Alexander Graf, Chao Peng, Edgecombe,
	Rick P, Forrest Yuan Yu, James Gowans, James Morris,
	John Andersen, Madhavan T . Venkataraman, Marian Rotariu,
	Mihai Donțu, Nicușor Cîțu, Thara Gopinath,
	Trilok Soni, Wei Liu, Will Deacon, Yu Zhang, Zahra Tarkhani,
	Ștefan Șicleru, dev, kvm, linux-hardening,
	linux-hyperv, linux-kernel, linux-security-module, qemu-devel,
	virtualization, x86, xen-devel

To make the memory attribute helpers useful for other use cases such as
Heki, remove the private memory optimizations.

I guess we could try to infer the applied attributes to get back these
optimizations when it makes sense, but let's keep this simple for now.

Main changes:

- Replace slots_lock with slots_arch_lock to make it callable from a KVM
  hypercall.

- Move this mutex lock into kvm_vm_ioctl_set_mem_attributes() to make it
  easier to use with other locks.

- Export kvm_vm_set_mem_attributes().

- Remove the kvm_arch_pre_set_memory_attributes() and
  kvm_arch_post_set_memory_attributes() KVM_MEMORY_ATTRIBUTE_PRIVATE
  optimizations.

Cc: Chao Peng <chao.p.peng@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v1:
* New patch
---
 arch/x86/kvm/mmu/mmu.c   | 23 -----------------------
 include/linux/kvm_host.h |  2 ++
 virt/kvm/kvm_main.c      | 19 ++++++++++---------
 3 files changed, 12 insertions(+), 32 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7e053973125c..4d378d308762 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7251,20 +7251,6 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
 bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
 					struct kvm_gfn_range *range)
 {
-	/*
-	 * Zap SPTEs even if the slot can't be mapped PRIVATE.  KVM x86 only
-	 * supports KVM_MEMORY_ATTRIBUTE_PRIVATE, and so it *seems* like KVM
-	 * can simply ignore such slots.  But if userspace is making memory
-	 * PRIVATE, then KVM must prevent the guest from accessing the memory
-	 * as shared.  And if userspace is making memory SHARED and this point
-	 * is reached, then at least one page within the range was previously
-	 * PRIVATE, i.e. the slot's possible hugepage ranges are changing.
-	 * Zapping SPTEs in this case ensures KVM will reassess whether or not
-	 * a hugepage can be used for affected ranges.
-	 */
-	if (WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm)))
-		return false;
-
 	return kvm_unmap_gfn_range(kvm, range);
 }
 
@@ -7313,15 +7299,6 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
 	lockdep_assert_held_write(&kvm->mmu_lock);
 	lockdep_assert_held(&kvm->slots_lock);
 
-	/*
-	 * Calculate which ranges can be mapped with hugepages even if the slot
-	 * can't map memory PRIVATE.  KVM mustn't create a SHARED hugepage over
-	 * a range that has PRIVATE GFNs, and conversely converting a range to
-	 * SHARED may now allow hugepages.
-	 */
-	if (WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm)))
-		return false;
-
 	/*
 	 * The sequence matters here: upper levels consume the result of lower
 	 * level's scanning.
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ec32af17add8..85b8648fd892 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2396,6 +2396,8 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
 					struct kvm_gfn_range *range);
 bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
 					 struct kvm_gfn_range *range);
+int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
+			      unsigned long attributes);
 
 static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 23633984142f..0096ccfbb609 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2552,7 +2552,7 @@ static bool kvm_pre_set_memory_attributes(struct kvm *kvm,
 }
 
 /* Set @attributes for the gfn range [@start, @end). */
-static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
+int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 				     unsigned long attributes)
 {
 	struct kvm_mmu_notifier_range pre_set_range = {
@@ -2577,11 +2577,11 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 
 	entry = attributes ? xa_mk_value(attributes) : NULL;
 
-	mutex_lock(&kvm->slots_lock);
+	lockdep_assert_held(&kvm->slots_arch_lock);
 
 	/* Nothing to do if the entire range as the desired attributes. */
 	if (kvm_range_has_memory_attributes(kvm, start, end, attributes))
-		goto out_unlock;
+		return r;
 
 	/*
 	 * Reserve memory ahead of time to avoid having to deal with failures
@@ -2590,7 +2590,7 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 	for (i = start; i < end; i++) {
 		r = xa_reserve(&kvm->mem_attr_array, i, GFP_KERNEL_ACCOUNT);
 		if (r)
-			goto out_unlock;
+			return r;
 	}
 
 	kvm_handle_gfn_range(kvm, &pre_set_range);
@@ -2602,15 +2602,13 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 	}
 
 	kvm_handle_gfn_range(kvm, &post_set_range);
-
-out_unlock:
-	mutex_unlock(&kvm->slots_lock);
-
 	return r;
 }
+
 static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
 					   struct kvm_memory_attributes *attrs)
 {
+	int r;
 	gfn_t start, end;
 
 	/* flags is currently not used. */
@@ -2633,7 +2631,10 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
 	 */
 	BUILD_BUG_ON(sizeof(attrs->attributes) != sizeof(unsigned long));
 
-	return kvm_vm_set_mem_attributes(kvm, start, end, attrs->attributes);
+	mutex_lock(&kvm->slots_arch_lock);
+	r = kvm_vm_set_mem_attributes(kvm, start, end, attrs->attributes);
+	mutex_unlock(&kvm->slots_arch_lock);
+	return r;
 }
 #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
 
-- 
2.42.1



* [RFC PATCH v2 08/19] KVM: x86: Extend kvm_vm_set_mem_attributes() with a mask
  2023-11-13  2:23 [RFC PATCH v2 00/19] Hypervisor-Enforced Kernel Integrity Mickaël Salaün
                   ` (6 preceding siblings ...)
  2023-11-13  2:23 ` [RFC PATCH v2 07/19] KVM: x86: Make memory attribute helpers more generic Mickaël Salaün
@ 2023-11-13  2:23 ` Mickaël Salaün
  2023-11-13  2:23 ` [RFC PATCH v2 09/19] KVM: x86: Extend kvm_range_has_memory_attributes() with match_all Mickaël Salaün
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 44+ messages in thread
From: Mickaël Salaün @ 2023-11-13  2:23 UTC (permalink / raw)
  To: Borislav Petkov, Dave Hansen, H . Peter Anvin, Ingo Molnar,
	Kees Cook, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Vitaly Kuznetsov, Wanpeng Li
  Cc: Mickaël Salaün, Alexander Graf, Chao Peng, Edgecombe,
	Rick P, Forrest Yuan Yu, James Gowans, James Morris,
	John Andersen, Madhavan T . Venkataraman, Marian Rotariu,
	Mihai Donțu, Nicușor Cîțu, Thara Gopinath,
	Trilok Soni, Wei Liu, Will Deacon, Yu Zhang, Zahra Tarkhani,
	Ștefan Șicleru, dev, kvm, linux-hardening,
	linux-hyperv, linux-kernel, linux-security-module, qemu-devel,
	virtualization, x86, xen-devel

Make it possible to update only a subset of attributes.

This is needed to be able to use the XArray for different use cases and
make sure they don't interfere (see a following commit).
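
To illustrate the intent (this is an assumption about the semantics,
not necessarily how the patch computes the stored value), a masked
update of a per-gfn attribute word could look like the following
stand-alone C sketch.  The function name is hypothetical.

```c
/*
 * Hypothetical helper: only the bits selected by @mask are taken from
 * @attributes; every other bit of the previous value is preserved, so
 * two users owning disjoint masks cannot clobber each other.
 */
static unsigned long sketch_apply_masked_attributes(unsigned long old,
						    unsigned long attributes,
						    unsigned long mask)
{
	return (old & ~mask) | (attributes & mask);
}
```

For example, with old = 0xA, attributes = 0x5 and mask = 0x3, the two
low bits are replaced and the two high bits are kept, giving 0x9.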

Cc: Chao Peng <chao.p.peng@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v1:
* New patch
---
 arch/x86/kvm/mmu/mmu.c   |  2 +-
 include/linux/kvm_host.h |  2 +-
 virt/kvm/kvm_main.c      | 27 +++++++++++++++++++--------
 3 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4d378d308762..d7010e09440d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7283,7 +7283,7 @@ static bool hugepage_has_attrs(struct kvm *kvm, struct kvm_memory_slot *slot,
 
 	for (gfn = start; gfn < end; gfn += KVM_PAGES_PER_HPAGE(level - 1)) {
 		if (hugepage_test_mixed(slot, gfn, level - 1) ||
-		    attrs != kvm_get_memory_attributes(kvm, gfn))
+		    !(attrs & kvm_get_memory_attributes(kvm, gfn)))
 			return false;
 	}
 	return true;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 85b8648fd892..de68390ab0f2 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2397,7 +2397,7 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
 bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
 					 struct kvm_gfn_range *range);
 int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
-			      unsigned long attributes);
+			      unsigned long attributes, unsigned long mask);
 
 static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0096ccfbb609..e2c178db17d5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2436,7 +2436,7 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
 /*
  * Returns true if _all_ gfns in the range [@start, @end) have attributes
- * matching @attrs.
+ * matching the @attrs bitmask.
  */
 bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 				     unsigned long attrs)
@@ -2459,7 +2459,8 @@ bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 			entry = xas_next(&xas);
 		} while (xas_retry(&xas, entry));
 
-		if (xas.xa_index != index || xa_to_value(entry) != attrs) {
+		if (xas.xa_index != index ||
+		    (xa_to_value(entry) & attrs) != attrs) {
 			has_attrs = false;
 			break;
 		}
@@ -2553,7 +2554,7 @@ static bool kvm_pre_set_memory_attributes(struct kvm *kvm,
 
 /* Set @attributes for the gfn range [@start, @end). */
 int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
-				     unsigned long attributes)
+			      unsigned long attributes, unsigned long mask)
 {
 	struct kvm_mmu_notifier_range pre_set_range = {
 		.start = start,
@@ -2572,11 +2573,8 @@ int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 		.may_block = true,
 	};
 	unsigned long i;
-	void *entry;
 	int r = 0;
 
-	entry = attributes ? xa_mk_value(attributes) : NULL;
-
 	lockdep_assert_held(&kvm->slots_arch_lock);
 
 	/* Nothing to do if the entire range as the desired attributes. */
@@ -2596,6 +2594,16 @@ int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 	kvm_handle_gfn_range(kvm, &pre_set_range);
 
 	for (i = start; i < end; i++) {
+		unsigned long value = 0;
+		void *entry;
+
+		entry = xa_load(&kvm->mem_attr_array, i);
+		if (xa_is_value(entry))
+			value = xa_to_value(entry) & ~mask;
+
+		value |= attributes & mask;
+		entry = value ? xa_mk_value(value) : NULL;
+
 		r = xa_err(xa_store(&kvm->mem_attr_array, i, entry,
 				    GFP_KERNEL_ACCOUNT));
 		KVM_BUG_ON(r, kvm);
@@ -2609,12 +2617,14 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
 					   struct kvm_memory_attributes *attrs)
 {
 	int r;
+	unsigned long attrs_mask;
 	gfn_t start, end;
 
 	/* flags is currently not used. */
 	if (attrs->flags)
 		return -EINVAL;
-	if (attrs->attributes & ~kvm_supported_mem_attributes(kvm))
+	attrs_mask = kvm_supported_mem_attributes(kvm);
+	if (attrs->attributes & ~attrs_mask)
 		return -EINVAL;
 	if (attrs->size == 0 || attrs->address + attrs->size < attrs->address)
 		return -EINVAL;
@@ -2632,7 +2642,8 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
 	BUILD_BUG_ON(sizeof(attrs->attributes) != sizeof(unsigned long));
 
 	mutex_lock(&kvm->slots_arch_lock);
-	r = kvm_vm_set_mem_attributes(kvm, start, end, attrs->attributes);
+	r = kvm_vm_set_mem_attributes(kvm, start, end, attrs->attributes,
+				      attrs_mask);
 	mutex_unlock(&kvm->slots_arch_lock);
 	return r;
 }
-- 
2.42.1



* [RFC PATCH v2 09/19] KVM: x86: Extend kvm_range_has_memory_attributes() with match_all
  2023-11-13  2:23 [RFC PATCH v2 00/19] Hypervisor-Enforced Kernel Integrity Mickaël Salaün
                   ` (7 preceding siblings ...)
  2023-11-13  2:23 ` [RFC PATCH v2 08/19] KVM: x86: Extend kvm_vm_set_mem_attributes() with a mask Mickaël Salaün
@ 2023-11-13  2:23 ` Mickaël Salaün
  2023-11-13  2:23 ` [RFC PATCH v2 10/19] KVM: x86: Implement per-guest-page permissions Mickaël Salaün
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 44+ messages in thread
From: Mickaël Salaün @ 2023-11-13  2:23 UTC (permalink / raw)

This makes it possible to check whether an attribute is set on any memory
page in a range. This will be useful in a following commit to check for
KVM_MEMORY_ATTRIBUTE_HEKI_IMMUTABLE.
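
As a rough sketch of the two modes, assuming the per-page attributes live
in a plain array instead of the XArray (an entry of 0 stands for "no
attribute"), the semantics could look like this. The function name
range_has_attrs() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for kvm_range_has_memory_attributes(), working on
 * a flat array. With @match_all, every page in [start, end) must carry all
 * bits of @attrs; otherwise a single matching page is enough.
 */
static bool range_has_attrs(const unsigned long *attr, size_t start,
			    size_t end, unsigned long attrs, bool match_all)
{
	size_t i;

	assert(start <= end);
	for (i = start; i < end; i++) {
		bool match = (attr[i] & attrs) == attrs;

		if (match_all && !match)
			return false;
		if (!match_all && match)
			return true;
	}
	/* Empty range: "all" is vacuously true, "any" is false. */
	return match_all;
}
```

The "any" mode is what an immutability check needs: a request must be
refused as soon as one page of the range carries the attribute.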

Cc: Chao Peng <chao.p.peng@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v1:
* New patch
---
 arch/x86/kvm/mmu/mmu.c   |  2 +-
 include/linux/kvm_host.h |  2 +-
 virt/kvm/kvm_main.c      | 27 ++++++++++++++++++---------
 3 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d7010e09440d..2024ff21d036 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7279,7 +7279,7 @@ static bool hugepage_has_attrs(struct kvm *kvm, struct kvm_memory_slot *slot,
 	const unsigned long end = start + KVM_PAGES_PER_HPAGE(level);
 
 	if (level == PG_LEVEL_2M)
-		return kvm_range_has_memory_attributes(kvm, start, end, attrs);
+		return kvm_range_has_memory_attributes(kvm, start, end, attrs, true);
 
 	for (gfn = start; gfn < end; gfn += KVM_PAGES_PER_HPAGE(level - 1)) {
 		if (hugepage_test_mixed(slot, gfn, level - 1) ||
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index de68390ab0f2..9ecb016a336f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2391,7 +2391,7 @@ static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn
 }
 
 bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
-				     unsigned long attrs);
+				     unsigned long attrs, bool match_all);
 bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
 					struct kvm_gfn_range *range);
 bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e2c178db17d5..67dbaaf40c1c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2435,11 +2435,11 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
 
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
 /*
- * Returns true if _all_ gfns in the range [@start, @end) have attributes
- * matching the @attrs bitmask.
+ * According to @match_all, returns true if _all_ (respectively _any_) gfns in
+ * the range [@start, @end) have attributes matching the @attrs bitmask.
  */
 bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
-				     unsigned long attrs)
+				     unsigned long attrs, bool match_all)
 {
 	XA_STATE(xas, &kvm->mem_attr_array, start);
 	unsigned long index;
@@ -2453,16 +2453,25 @@ bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 		goto out;
 	}
 
-	has_attrs = true;
+	has_attrs = match_all;
 	for (index = start; index < end; index++) {
 		do {
 			entry = xas_next(&xas);
 		} while (xas_retry(&xas, entry));
 
-		if (xas.xa_index != index ||
-		    (xa_to_value(entry) & attrs) != attrs) {
-			has_attrs = false;
-			break;
+		if (match_all) {
+			if (xas.xa_index != index ||
+			    (xa_to_value(entry) & attrs) != attrs) {
+				has_attrs = false;
+				break;
+			}
+		} else {
+			index = xas.xa_index;
+			if (index < end &&
+			    (xa_to_value(entry) & attrs) == attrs) {
+				has_attrs = true;
+				break;
+			}
 		}
 	}
 
@@ -2578,7 +2587,7 @@ int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
 	lockdep_assert_held(&kvm->slots_arch_lock);
 
 	/* Nothing to do if the entire range as the desired attributes. */
-	if (kvm_range_has_memory_attributes(kvm, start, end, attributes))
+	if (kvm_range_has_memory_attributes(kvm, start, end, attributes, true))
 		return r;
 
 	/*
-- 
2.42.1



* [RFC PATCH v2 10/19] KVM: x86: Implement per-guest-page permissions
  2023-11-13  2:23 [RFC PATCH v2 00/19] Hypervisor-Enforced Kernel Integrity Mickaël Salaün
                   ` (8 preceding siblings ...)
  2023-11-13  2:23 ` [RFC PATCH v2 09/19] KVM: x86: Extend kvm_range_has_memory_attributes() with match_all Mickaël Salaün
@ 2023-11-13  2:23 ` Mickaël Salaün
  2023-11-13  2:23 ` [RFC PATCH v2 11/19] KVM: x86: Add new hypercall to set EPT permissions Mickaël Salaün
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 44+ messages in thread
From: Mickaël Salaün @ 2023-11-13  2:23 UTC (permalink / raw)

Define memory attributes that can be associated with guest physical
pages in KVM. To begin with, define permissions as memory attributes
(READ, WRITE and EXECUTE), and the IMMUTABLE property. In the future,
other attributes could be defined.

Use the memory attribute feature to implement the following functions in
KVM:

- kvm_permissions_set(): Set the permissions for a guest page in the
  memory attribute XArray.

- kvm_permissions_get(): Retrieve the permissions associated with a
  guest page in the same XArray.

These functions will be called in a following commit to associate proper
permissions with guest pages instead of RWX for all the pages.

Add 4 new memory attributes, private to the KVM implementation:
- KVM_MEMORY_ATTRIBUTE_HEKI_READ
- KVM_MEMORY_ATTRIBUTE_HEKI_WRITE
- KVM_MEMORY_ATTRIBUTE_HEKI_EXEC
- KVM_MEMORY_ATTRIBUTE_HEKI_IMMUTABLE
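
The relation between the two sets of bits, and the RWX default, can be
sketched outside the kernel as below. The bit positions are the ones defined
in this patch; the sketch_* function names are hypothetical, and the stored
value is passed directly instead of being looked up in the XArray.

```c
#include <assert.h>

/* Generic permission bits, as in include/linux/kvm_mem_attr.h. */
#define MEM_ATTR_READ		(1UL << 0)
#define MEM_ATTR_WRITE		(1UL << 1)
#define MEM_ATTR_EXEC		(1UL << 2)
#define MEM_ATTR_IMMUTABLE	(1UL << 3)
#define MEM_ATTR_PROT \
	(MEM_ATTR_READ | MEM_ATTR_WRITE | MEM_ATTR_EXEC | MEM_ATTR_IMMUTABLE)

/* KVM attribute bits, as in include/uapi/linux/kvm.h. */
#define KVM_MEMORY_ATTRIBUTE_HEKI_READ		(1UL << 4)
#define KVM_MEMORY_ATTRIBUTE_HEKI_WRITE		(1UL << 5)
#define KVM_MEMORY_ATTRIBUTE_HEKI_EXEC		(1UL << 6)
#define KVM_MEMORY_ATTRIBUTE_HEKI_IMMUTABLE	(1UL << 7)

/* Hypothetical: translate generic bits into KVM attribute bits. */
static unsigned long sketch_heki_to_kvm(unsigned long heki_attr)
{
	unsigned long kvm_attr = 0;

	if (heki_attr & MEM_ATTR_READ)
		kvm_attr |= KVM_MEMORY_ATTRIBUTE_HEKI_READ;
	if (heki_attr & MEM_ATTR_WRITE)
		kvm_attr |= KVM_MEMORY_ATTRIBUTE_HEKI_WRITE;
	if (heki_attr & MEM_ATTR_EXEC)
		kvm_attr |= KVM_MEMORY_ATTRIBUTE_HEKI_EXEC;
	if (heki_attr & MEM_ATTR_IMMUTABLE)
		kvm_attr |= KVM_MEMORY_ATTRIBUTE_HEKI_IMMUTABLE;
	return kvm_attr;
}

/* Hypothetical: translate KVM attribute bits back into generic bits. */
static unsigned long sketch_kvm_to_heki(unsigned long kvm_attr)
{
	unsigned long heki_attr = 0;

	if (kvm_attr & KVM_MEMORY_ATTRIBUTE_HEKI_READ)
		heki_attr |= MEM_ATTR_READ;
	if (kvm_attr & KVM_MEMORY_ATTRIBUTE_HEKI_WRITE)
		heki_attr |= MEM_ATTR_WRITE;
	if (kvm_attr & KVM_MEMORY_ATTRIBUTE_HEKI_EXEC)
		heki_attr |= MEM_ATTR_EXEC;
	if (kvm_attr & KVM_MEMORY_ATTRIBUTE_HEKI_IMMUTABLE)
		heki_attr |= MEM_ATTR_IMMUTABLE;

	/* The result never leaves the generic permission space. */
	assert((heki_attr & ~MEM_ATTR_PROT) == 0);
	return heki_attr;
}

/* Hypothetical: a page with no stored attribute defaults to RWX. */
static unsigned long sketch_permissions_get(unsigned long stored_kvm_attr)
{
	if (stored_kvm_attr)
		return sketch_kvm_to_heki(stored_kvm_attr);
	return MEM_ATTR_READ | MEM_ATTR_WRITE | MEM_ATTR_EXEC;
}
```

One consequence, also noted in the comment of kvm_permissions_get(), is
that a stored value of 0 is indistinguishable from "never set" and reads
back as RWX.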

Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Co-developed-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v1:
* New patch replacing the deprecated page tracking mechanism.
* Add new files: virt/lib/kvm_permissions.c and
  include/linux/kvm_mem_attr.h
* Add new kvm_permissions_get() and kvm_permissions_set() leveraging
  the to-be-upstream memory attributes for KVM.
* Introduce the KVM_MEMORY_ATTRIBUTE_HEKI_* values.
---
 arch/x86/kvm/Kconfig         |   1 +
 arch/x86/kvm/Makefile        |   4 +-
 include/linux/kvm_mem_attr.h |  32 +++++++++++
 include/uapi/linux/kvm.h     |   5 ++
 virt/heki/Kconfig            |   1 +
 virt/lib/kvm_permissions.c   | 104 +++++++++++++++++++++++++++++++++++
 6 files changed, 146 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/kvm_mem_attr.h
 create mode 100644 virt/lib/kvm_permissions.c

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 7a3b52b7e456..ea6d73241632 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -50,6 +50,7 @@ config KVM
 	select HAVE_KVM_PM_NOTIFIER if PM
 	select KVM_GENERIC_HARDWARE_ENABLING
 	select HYPERVISOR_SUPPORTS_HEKI
+	select SPARSEMEM
 	help
 	  Support hosting fully virtualized guest machines using hardware
 	  virtualization extensions.  You will need a fairly recent
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 80e3fe184d17..aac51a5d2cae 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -9,10 +9,12 @@ endif
 
 include $(srctree)/virt/kvm/Makefile.kvm
 
+VIRT_LIB = ../../../virt/lib
+
 kvm-y			+= x86.o emulate.o i8259.o irq.o lapic.o \
 			   i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
 			   hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o \
-			   mmu/spte.o
+			   mmu/spte.o $(VIRT_LIB)/kvm_permissions.o
 
 ifdef CONFIG_HYPERV
 kvm-y			+= kvm_onhyperv.o
diff --git a/include/linux/kvm_mem_attr.h b/include/linux/kvm_mem_attr.h
new file mode 100644
index 000000000000..0a755025e553
--- /dev/null
+++ b/include/linux/kvm_mem_attr.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * KVM guest page permissions - Definitions.
+ *
+ * Copyright © 2023 Microsoft Corporation.
+ */
+#ifndef __KVM_MEM_ATTR_H__
+#define __KVM_MEM_ATTR_H__
+
+#include <linux/kvm_host.h>
+#include <linux/kvm_types.h>
+
+/* clang-format off */
+
+#define MEM_ATTR_READ			BIT(0)
+#define MEM_ATTR_WRITE			BIT(1)
+#define MEM_ATTR_EXEC			BIT(2)
+#define MEM_ATTR_IMMUTABLE		BIT(3)
+
+#define MEM_ATTR_PROT ( \
+	MEM_ATTR_READ | \
+	MEM_ATTR_WRITE | \
+	MEM_ATTR_EXEC | \
+	MEM_ATTR_IMMUTABLE)
+
+/* clang-format on */
+
+int kvm_permissions_set(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end,
+			unsigned long heki_attr);
+unsigned long kvm_permissions_get(struct kvm *kvm, gfn_t gfn);
+
+#endif /* __KVM_MEM_ATTR_H__ */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2477b4a16126..2b5b90216565 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -2319,6 +2319,11 @@ struct kvm_memory_attributes {
 
 #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
 
+#define KVM_MEMORY_ATTRIBUTE_HEKI_READ         (1ULL << 4)
+#define KVM_MEMORY_ATTRIBUTE_HEKI_WRITE        (1ULL << 5)
+#define KVM_MEMORY_ATTRIBUTE_HEKI_EXEC         (1ULL << 6)
+#define KVM_MEMORY_ATTRIBUTE_HEKI_IMMUTABLE    (1ULL << 7)
+
 #define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
 
 struct kvm_create_guest_memfd {
diff --git a/virt/heki/Kconfig b/virt/heki/Kconfig
index 5ea75b595667..75a784653e31 100644
--- a/virt/heki/Kconfig
+++ b/virt/heki/Kconfig
@@ -5,6 +5,7 @@
 config HEKI
 	bool "Hypervisor Enforced Kernel Integrity (Heki)"
 	depends on ARCH_SUPPORTS_HEKI && HYPERVISOR_SUPPORTS_HEKI
+	select KVM_GENERIC_MEMORY_ATTRIBUTES
 	help
 	  This feature enhances guest virtual machine security by taking
 	  advantage of security features provided by the hypervisor for guests.
diff --git a/virt/lib/kvm_permissions.c b/virt/lib/kvm_permissions.c
new file mode 100644
index 000000000000..9f4e8027d21c
--- /dev/null
+++ b/virt/lib/kvm_permissions.c
@@ -0,0 +1,104 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * KVM guest page permissions - functions.
+ *
+ * Copyright © 2023 Microsoft Corporation.
+ */
+#include <linux/kvm_host.h>
+#include <linux/kvm_mem_attr.h>
+
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+
+#define pr_fmt(fmt) "kvm: heki: " fmt
+
+/* clang-format off */
+
+static unsigned long kvm_default_permissions =
+	MEM_ATTR_READ |
+	MEM_ATTR_WRITE |
+	MEM_ATTR_EXEC;
+
+static unsigned long kvm_memory_attributes_heki =
+	KVM_MEMORY_ATTRIBUTE_HEKI_READ |
+	KVM_MEMORY_ATTRIBUTE_HEKI_WRITE |
+	KVM_MEMORY_ATTRIBUTE_HEKI_EXEC |
+	KVM_MEMORY_ATTRIBUTE_HEKI_IMMUTABLE;
+
+/* clang-format on */
+
+static unsigned long heki_attr_to_kvm_attr(unsigned long heki_attr)
+{
+	unsigned long kvm_attr = 0;
+
+	if (WARN_ON_ONCE((heki_attr | MEM_ATTR_PROT) != MEM_ATTR_PROT))
+		return 0;
+
+	if (heki_attr & MEM_ATTR_READ)
+		kvm_attr |= KVM_MEMORY_ATTRIBUTE_HEKI_READ;
+	if (heki_attr & MEM_ATTR_WRITE)
+		kvm_attr |= KVM_MEMORY_ATTRIBUTE_HEKI_WRITE;
+	if (heki_attr & MEM_ATTR_EXEC)
+		kvm_attr |= KVM_MEMORY_ATTRIBUTE_HEKI_EXEC;
+	if (heki_attr & MEM_ATTR_IMMUTABLE)
+		kvm_attr |= KVM_MEMORY_ATTRIBUTE_HEKI_IMMUTABLE;
+	return kvm_attr;
+}
+
+static unsigned long kvm_attr_to_heki_attr(unsigned long kvm_attr)
+{
+	unsigned long heki_attr = 0;
+
+	if (kvm_attr & KVM_MEMORY_ATTRIBUTE_HEKI_READ)
+		heki_attr |= MEM_ATTR_READ;
+	if (kvm_attr & KVM_MEMORY_ATTRIBUTE_HEKI_WRITE)
+		heki_attr |= MEM_ATTR_WRITE;
+	if (kvm_attr & KVM_MEMORY_ATTRIBUTE_HEKI_EXEC)
+		heki_attr |= MEM_ATTR_EXEC;
+	if (kvm_attr & KVM_MEMORY_ATTRIBUTE_HEKI_IMMUTABLE)
+		heki_attr |= MEM_ATTR_IMMUTABLE;
+	return heki_attr;
+}
+
+unsigned long kvm_permissions_get(struct kvm *kvm, gfn_t gfn)
+{
+	unsigned long kvm_attr = 0;
+
+	/*
+	 * Retrieve the permissions for a guest page. If not present (i.e., no
+	 * attribute), then return default permissions (RWX).  This means
+	 * setting permissions to 0 resets them to RWX. We might want to
+	 * revisit that in a future version.
+	 */
+	kvm_attr = kvm_get_memory_attributes(kvm, gfn);
+	if (kvm_attr)
+		return kvm_attr_to_heki_attr(kvm_attr);
+	else
+		return kvm_default_permissions;
+}
+EXPORT_SYMBOL_GPL(kvm_permissions_get);
+
+int kvm_permissions_set(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end,
+			unsigned long heki_attr)
+{
+	if ((heki_attr | MEM_ATTR_PROT) != MEM_ATTR_PROT)
+		return -EINVAL;
+
+	if (gfn_end <= gfn_start)
+		return -EINVAL;
+
+	if (kvm_range_has_memory_attributes(kvm, gfn_start, gfn_end,
+					    KVM_MEMORY_ATTRIBUTE_HEKI_IMMUTABLE,
+					    false)) {
+		pr_warn_ratelimited(
+			"Guest tried to change immutable permission for GFNs %llx-%llx\n",
+			gfn_start, gfn_end);
+		return -EPERM;
+	}
+
+	return kvm_vm_set_mem_attributes(kvm, gfn_start, gfn_end,
+					 heki_attr_to_kvm_attr(heki_attr),
+					 kvm_memory_attributes_heki);
+}
+EXPORT_SYMBOL_GPL(kvm_permissions_set);
-- 
2.42.1



* [RFC PATCH v2 11/19] KVM: x86: Add new hypercall to set EPT permissions
  2023-11-13  2:23 [RFC PATCH v2 00/19] Hypervisor-Enforced Kernel Integrity Mickaël Salaün
                   ` (9 preceding siblings ...)
  2023-11-13  2:23 ` [RFC PATCH v2 10/19] KVM: x86: Implement per-guest-page permissions Mickaël Salaün
@ 2023-11-13  2:23 ` Mickaël Salaün
  2023-11-13  4:45   ` kernel test robot
  2023-11-13  2:23 ` [RFC PATCH v2 12/19] x86: Implement the Memory Table feature to store arbitrary per-page data Mickaël Salaün
                   ` (7 subsequent siblings)
  18 siblings, 1 reply; 44+ messages in thread
From: Mickaël Salaün @ 2023-11-13  2:23 UTC (permalink / raw)

From: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>

Add a new KVM_HC_PROTECT_MEMORY hypercall that enables a guest to set
EPT permissions for guest pages.

Until now, all of the guest pages (except Page Tracked pages) are given
RWX permissions in the EPT. In Heki, we want to restrict the permissions
to what is strictly needed. For instance, a text page only needs R_X. A
read-only data page only needs R__. A normal data page only needs RW_.

The guest will pass a page list to the hypercall. The page list is a
list of one or more physical pages, each of which contains an array of
guest ranges and attributes. Currently, the attributes only contain
permissions. In the future, other attributes may be added.  The
hypervisor will apply the specified permissions in the EPT.
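
The per-range checks applied before permissions are set can be sketched as
below. The field names (pa, epa, permissions) follow the accesses made in
heki_protect_memory(), but the struct layout, the field types, the 4 KiB
page size and the use of plain errno values (the hypercall itself returns
KVM error codes) are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096ULL	/* assumption: 4 KiB guest pages */

#define MEM_ATTR_READ		(1UL << 0)
#define MEM_ATTR_WRITE		(1UL << 1)
#define MEM_ATTR_EXEC		(1UL << 2)
#define MEM_ATTR_IMMUTABLE	(1UL << 3)
#define MEM_ATTR_PROT \
	(MEM_ATTR_READ | MEM_ATTR_WRITE | MEM_ATTR_EXEC | MEM_ATTR_IMMUTABLE)

/* Hypothetical range entry; the real layout is in include/linux/heki.h. */
struct sketch_range {
	uint64_t pa;		/* first guest physical address (inclusive) */
	uint64_t epa;		/* end guest physical address (exclusive) */
	uint64_t permissions;	/* MEM_ATTR_* bits */
};

/* Hypothetical: return 0 if the range is acceptable, -EINVAL otherwise. */
static int sketch_validate_range(const struct sketch_range *r)
{
	assert(r != NULL);

	/* Both bounds must be page-aligned. */
	if (r->pa % SKETCH_PAGE_SIZE || r->epa % SKETCH_PAGE_SIZE)
		return -EINVAL;

	/* At least one permission, and nothing outside the known bits. */
	if (!r->permissions || (r->permissions & ~(uint64_t)MEM_ATTR_PROT))
		return -EINVAL;

	/* The range must not be empty or reversed. */
	if (r->epa <= r->pa)
		return -EINVAL;

	return 0;
}
```

A text range would for instance be described as
{ 0x1000, 0x3000, MEM_ATTR_READ | MEM_ATTR_EXEC }, which passes these
checks.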

When a guest tries to access its memory in a way that is not allowed, KVM
creates a synthetic kernel page fault. This fault should be handled by
the guest, which is not currently the case, so the guest retries the
access again and again.  Handling it will be part of a follow-up patch
series.
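
The decision of whether to inject such a fault can be sketched as a pure
function of the fault type and the page permissions. This mirrors the
checks in mem_attr_fault() from the diff below, but struct sketch_fault
and sketch_should_inject() are hypothetical simplifications; the real code
works on struct kvm_page_fault and injects a #PF into the vCPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MEM_ATTR_READ	(1UL << 0)
#define MEM_ATTR_WRITE	(1UL << 1)
#define MEM_ATTR_EXEC	(1UL << 2)

/* Hypothetical subset of the fault description used by the check. */
struct sketch_fault {
	bool rsvd;	/* reserved-bit fault */
	bool present;	/* the faulting translation was present */
	bool exec;	/* instruction fetch */
	bool write;	/* write access */
};

/* Hypothetical: true if the access violates the page permissions. */
static bool sketch_should_inject(const struct sketch_fault *f,
				 unsigned long perm)
{
	assert(f != NULL);

	/* Reserved-bit and not-present faults take the regular path. */
	if (f->rsvd || !f->present)
		return false;

	if (f->exec && !(perm & MEM_ATTR_EXEC))
		return true;
	if (f->write && !(perm & MEM_ATTR_WRITE))
		return true;
	return false;
}
```

For example, a write to a present R_X page is reported as a violation,
while a write to a present RW_ page is not.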

When enabled, KASAN reveals a bug in the memory attributes patches. We
have not yet found the source of this issue.

Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Co-developed-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---

Changes since v1:

The original hypercall contained support for statically defined sections
(text, rodata, etc). It has been redesigned like this:

- The previous version accepted an array of physically contiguous
  ranges. This is appropriate for statically defined sections which are
  loaded in contiguous memory.  But, for other cases like module
  loading, the pages would be discontiguous. The current version of the
  hypercall accepts a page list to fix this.

- The previous version passed permission combinations. E.g.,
  HEKI_MEM_ATTR_EXEC would imply R_X. The current version passes
  permissions as memory attributes and each of the permissions must be
  separately specified. E.g., for text, (MEM_ATTR_READ | MEM_ATTR_EXEC)
  must be passed.

- The previous version locked down the permissions for guest pages so
  that once the permissions are set, they cannot be changed. In this
  version, permissions can be changed dynamically, except when the
  MEM_ATTR_IMMUTABLE is set.  So, the hypercall has been renamed from
  KVM_HC_LOCK_MEM_PAGE_RANGES to KVM_HC_PROTECT_MEMORY. The dynamic
  setting of permissions is needed by the following features (probably
  not a complete list):
  - Kprobes and Optprobes
  - Static call optimization
  - Jump Label optimization
  - Ftrace and Livepatch
  - Module loading and unloading
  - eBPF JIT
  - Kexec
  - Kgdb

Examples:
- A text page can be made writable very briefly to install a probe or a
  trace.
- eBPF JIT can populate a writable page with code and make it
  read-execute.
- Module load can load read-only data into a writable page and make the
  page read-only.
- When pages are unmapped, their permissions in the EPT must revert to
  read-write.
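
As an illustration of these dynamic transitions, the sketch below models
the permission table as a flat array and applies the same rule as
kvm_permissions_set(): a request is refused with -EPERM if any page in the
range is already immutable. The function name sketch_permissions_set() and
the array-based table are assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MEM_ATTR_READ		(1UL << 0)
#define MEM_ATTR_WRITE		(1UL << 1)
#define MEM_ATTR_EXEC		(1UL << 2)
#define MEM_ATTR_IMMUTABLE	(1UL << 3)
#define MEM_ATTR_PROT \
	(MEM_ATTR_READ | MEM_ATTR_WRITE | MEM_ATTR_EXEC | MEM_ATTR_IMMUTABLE)

/*
 * Hypothetical: set @perm on pages [start, end) of @table unless one of
 * them is immutable. Nothing is modified when the request is refused.
 */
static int sketch_permissions_set(unsigned long *table, size_t start,
				  size_t end, unsigned long perm)
{
	size_t i;

	assert(table != NULL);

	if ((perm | MEM_ATTR_PROT) != MEM_ATTR_PROT)
		return -EINVAL;
	if (end <= start)
		return -EINVAL;

	/* One immutable page is enough to reject the whole range. */
	for (i = start; i < end; i++) {
		if (table[i] & MEM_ATTR_IMMUTABLE)
			return -EPERM;
	}

	for (i = start; i < end; i++)
		table[i] = perm;
	return 0;
}
```

In the eBPF JIT example, the page would first be set to
MEM_ATTR_READ | MEM_ATTR_WRITE while the code is generated, then switched to
MEM_ATTR_READ | MEM_ATTR_EXEC. If MEM_ATTR_IMMUTABLE is added in that second
step, any later request covering the page fails with -EPERM.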
---
 Documentation/virt/kvm/x86/hypercalls.rst |  14 +++
 arch/x86/kvm/mmu/mmu.c                    |  77 +++++++++++++
 arch/x86/kvm/mmu/paging_tmpl.h            |   3 +
 arch/x86/kvm/mmu/spte.c                   |  15 ++-
 arch/x86/kvm/x86.c                        | 130 ++++++++++++++++++++++
 include/linux/heki.h                      |  29 +++++
 include/uapi/linux/kvm_para.h             |   1 +
 7 files changed, 267 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/x86/hypercalls.rst b/Documentation/virt/kvm/x86/hypercalls.rst
index 3178576f4c47..28865d111773 100644
--- a/Documentation/virt/kvm/x86/hypercalls.rst
+++ b/Documentation/virt/kvm/x86/hypercalls.rst
@@ -207,3 +207,17 @@ The hypercall lets a guest request control register flags to be pinned for
 itself.
 
 Returns 0 on success or a KVM error code otherwise.
+
+10. KVM_HC_PROTECT_MEMORY
+-------------------------
+
+:Architecture: x86
+:Status: active
+:Purpose: Request permissions to be set in EPT
+
+- a0: physical address of a struct heki_page_list
+
+The hypercall lets a guest request memory permissions to be set for a list
+of physical pages.
+
+Returns 0 on success or a KVM error code otherwise.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2024ff21d036..2d09bcc35462 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -47,9 +47,11 @@
 #include <linux/sched/signal.h>
 #include <linux/uaccess.h>
 #include <linux/hash.h>
+#include <linux/heki.h>
 #include <linux/kern_levels.h>
 #include <linux/kstrtox.h>
 #include <linux/kthread.h>
+#include <linux/kvm_mem_attr.h>
 
 #include <asm/page.h>
 #include <asm/memtype.h>
@@ -4446,6 +4448,75 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
 	       mmu_invalidate_retry_gfn(vcpu->kvm, fault->mmu_seq, fault->gfn);
 }
 
+static bool mem_attr_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+{
+	unsigned long perm;
+	bool noexec, nowrite;
+
+	if (unlikely(fault->rsvd))
+		return false;
+
+	if (!fault->present)
+		return false;
+
+	perm = kvm_permissions_get(vcpu->kvm, fault->gfn);
+	noexec = !(perm & MEM_ATTR_EXEC);
+	nowrite = !(perm & MEM_ATTR_WRITE);
+
+	if (fault->exec && noexec) {
+		struct x86_exception exception = {
+			.vector = PF_VECTOR,
+			.error_code_valid = true,
+			.error_code = fault->error_code,
+			.nested_page_fault = false,
+			/*
+			 * TODO: This kind of kernel page fault needs to be
+			 * handled by the guest, which is not currently the
+			 * case, making it try again and again.
+			 *
+			 * You may want to test with cr2_or_gva to see the page
+			 * fault caught by the guest kernel (thinking it is a
+			 * user space fault).
+			 */
+			.address = static_call(kvm_x86_fault_gva)(vcpu),
+			.async_page_fault = false,
+		};
+
+		pr_warn_ratelimited(
+			"heki: Creating fetch #PF at 0x%016llx GFN=%llx\n",
+			exception.address, fault->gfn);
+		kvm_inject_page_fault(vcpu, &exception);
+		return true;
+	}
+
+	if (fault->write && nowrite) {
+		struct x86_exception exception = {
+			.vector = PF_VECTOR,
+			.error_code_valid = true,
+			.error_code = fault->error_code,
+			.nested_page_fault = false,
+			/*
+			 * TODO: This kind of kernel page fault needs to be
+			 * handled by the guest, which is not currently the
+			 * case, making it try again and again.
+			 *
+			 * You may want to test with cr2_or_gva to see the page
+			 * fault caught by the guest kernel (thinking it is a
+			 * user space fault).
+			 */
+			.address = static_call(kvm_x86_fault_gva)(vcpu),
+			.async_page_fault = false,
+		};
+
+		pr_warn_ratelimited(
+			"heki: Creating write #PF at 0x%016llx GFN=%llx\n",
+			exception.address, fault->gfn);
+		kvm_inject_page_fault(vcpu, &exception);
+		return true;
+	}
+	return false;
+}
+
 static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	int r;
@@ -4457,6 +4528,9 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (page_fault_handle_page_track(vcpu, fault))
 		return RET_PF_EMULATE;
 
+	if (mem_attr_fault(vcpu, fault))
+		return RET_PF_RETRY;
+
 	r = fast_page_fault(vcpu, fault);
 	if (r != RET_PF_INVALID)
 		return r;
@@ -4537,6 +4611,9 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
 	if (page_fault_handle_page_track(vcpu, fault))
 		return RET_PF_EMULATE;
 
+	if (mem_attr_fault(vcpu, fault))
+		return RET_PF_RETRY;
+
 	r = fast_page_fault(vcpu, fault);
 	if (r != RET_PF_INVALID)
 		return r;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 08f0c8d28245..49e8295d62dd 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -820,6 +820,9 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 		return RET_PF_EMULATE;
 	}
 
+	if (mem_attr_fault(vcpu, fault))
+		return RET_PF_RETRY;
+
 	r = mmu_topup_memory_caches(vcpu, true);
 	if (r)
 		return r;
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 386cc1e8aab9..d72dc149424c 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -10,6 +10,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/kvm_host.h>
+#include <linux/kvm_mem_attr.h>
 #include "mmu.h"
 #include "mmu_internal.h"
 #include "x86.h"
@@ -143,6 +144,11 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	int level = sp->role.level;
 	u64 spte = SPTE_MMU_PRESENT_MASK;
 	bool wrprot = false;
+	unsigned long perm;
+
+	perm = kvm_permissions_get(vcpu->kvm, gfn);
+	if (!(perm & MEM_ATTR_WRITE))
+		pte_access &= ~ACC_WRITE_MASK;
 
 	WARN_ON_ONCE(!pte_access && !shadow_present_mask);
 
@@ -178,10 +184,15 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 		pte_access &= ~ACC_EXEC_MASK;
 	}
 
-	if (pte_access & ACC_EXEC_MASK)
+	if (pte_access & ACC_EXEC_MASK) {
 		spte |= shadow_x_mask;
-	else
+#ifdef CONFIG_HEKI
+		if (enable_mbec && !(perm & MEM_ATTR_EXEC))
+			spte &= ~VMX_EPT_EXECUTABLE_MASK;
+#endif
+	} else {
 		spte |= shadow_nx_mask;
+	}
 
 	if (pte_access & ACC_USER_MASK)
 		spte |= shadow_user_mask;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 43c28a6953bf..44f94b75ff16 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -62,6 +62,8 @@
 #include <linux/entry-kvm.h>
 #include <linux/suspend.h>
 #include <linux/smp.h>
+#include <linux/heki.h>
+#include <linux/kvm_mem_attr.h>
 
 #include <trace/events/ipi.h>
 #include <trace/events/kvm.h>
@@ -9983,6 +9985,131 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
 	return;
 }
 
+#ifdef CONFIG_HEKI
+
+static int heki_protect_memory(struct kvm *const kvm, gpa_t list_pa)
+{
+	struct heki_page_list *list, *head;
+	struct heki_pages *pages;
+	size_t size;
+	int i, npages, err = 0;
+
+	/* Read in the page list. */
+	head = NULL;
+	npages = 0;
+	while (list_pa) {
+		list = kmalloc(PAGE_SIZE, GFP_KERNEL);
+		if (!list) {
+			/* For want of a better error number. */
+			err = -KVM_E2BIG;
+			goto free;
+		}
+
+		err = kvm_read_guest(kvm, list_pa, list, sizeof(*list));
+		if (err) {
+			pr_warn("heki: Can't read list %llx\n", list_pa);
+			err = -KVM_EFAULT;
+			goto free;
+		}
+		list_pa += sizeof(*list);
+
+		size = list->npages * sizeof(*pages);
+		pages = list->pages;
+		err = kvm_read_guest(kvm, list_pa, pages, size);
+		if (err) {
+			pr_warn("heki: Can't read pages %llx\n", list_pa);
+			err = -KVM_EFAULT;
+			goto free;
+		}
+
+		list->next = head;
+		head = list;
+		npages += list->npages;
+		list_pa = list->next_pa;
+	}
+
+	/* For kvm_permissions_set() -> kvm_vm_set_mem_attributes() */
+	mutex_lock(&kvm->slots_arch_lock);
+
+	/*
+	 * Walk the page list, apply the permissions for each guest page and
+	 * zap the EPT entry of each page. The pages will be faulted in on
+	 * demand and the correct permissions will be applied at the correct
+	 * level for the pages.
+	 */
+	for (list = head; list; list = list->next) {
+		pages = list->pages;
+
+		for (i = 0; i < list->npages; i++) {
+			gfn_t gfn_start, gfn_end;
+			unsigned long permissions;
+
+			if (!PAGE_ALIGNED(pages[i].pa)) {
+				pr_warn("heki: GPA not aligned: %llx\n",
+					pages[i].pa);
+				err = -KVM_EINVAL;
+				goto unlock;
+			}
+			if (!PAGE_ALIGNED(pages[i].epa)) {
+				pr_warn("heki: GPA not aligned: %llx\n",
+					pages[i].epa);
+				err = -KVM_EINVAL;
+				goto unlock;
+			}
+
+			gfn_start = gpa_to_gfn(pages[i].pa);
+			gfn_end = gpa_to_gfn(pages[i].epa);
+			permissions = pages[i].permissions;
+
+			if (!permissions || (permissions & ~MEM_ATTR_PROT)) {
+				err = -KVM_EINVAL;
+				goto unlock;
+			}
+
+			if (!(permissions & MEM_ATTR_EXEC) && !enable_mbec) {
+				/*
+				 * Guests can check for MBEC support to avoid
+				 * this error message. We will continue
+				 * applying restrictions partially.
+				 */
+				pr_warn("heki: Clearing kernel exec "
+					"depends on MBEC, which is disabled.\n");
+				permissions |= MEM_ATTR_EXEC;
+			}
+
+			pr_warn("heki: Request to protect GFNs %llx-%llx"
+				" with %s permissions=%s%s%s\n",
+				gfn_start, gfn_end,
+				(permissions & MEM_ATTR_IMMUTABLE) ?
+					"immutable" :
+					"mutable",
+				(permissions & MEM_ATTR_READ) ? "r" : "_",
+				(permissions & MEM_ATTR_WRITE) ? "w" : "_",
+				(permissions & MEM_ATTR_EXEC) ? "x" : "_");
+
+			err = kvm_permissions_set(kvm, gfn_start, gfn_end,
+						  permissions);
+			if (err) {
+				pr_warn("heki: Failed to set permissions\n");
+				goto unlock;
+			}
+		}
+	}
+
+unlock:
+	mutex_unlock(&kvm->slots_arch_lock);
+
+free:
+	while (head) {
+		list = head;
+		head = head->next;
+		kfree(list);
+	}
+	return err;
+}
+
+#endif /* CONFIG_HEKI */
+
 static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
 {
 	u64 ret = vcpu->run->hypercall.ret;
@@ -10097,6 +10224,9 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 				return ret;
 		}
 		break;
+	case KVM_HC_PROTECT_MEMORY:
+		ret = heki_protect_memory(vcpu->kvm, a0);
+		break;
 #endif /* CONFIG_HEKI */
 	default:
 		ret = -KVM_ENOSYS;
diff --git a/include/linux/heki.h b/include/linux/heki.h
index 96ccb17657e5..89cc9273a968 100644
--- a/include/linux/heki.h
+++ b/include/linux/heki.h
@@ -8,6 +8,7 @@
 #ifndef __HEKI_H__
 #define __HEKI_H__
 
+#include <linux/kvm_types.h>
 #include <linux/types.h>
 #include <linux/bug.h>
 #include <linux/cache.h>
@@ -17,6 +18,32 @@
 
 #ifdef CONFIG_HEKI
 
+/*
+ * This structure contains a guest physical range and its permissions (RWX).
+ */
+struct heki_pages {
+	gpa_t pa;
+	gpa_t epa;
+	unsigned long permissions;
+};
+
+/*
+ * Guest ranges are passed to the VMM or hypervisor so they can be authenticated
+ * and their permissions can be set in the host page table. When an array of
+ * these is passed to the Hypervisor or VMM, the array must be in physically
+ * contiguous memory.
+ *
+ * This struct occupies one page. In each page, an array of guest ranges can
+ * be passed. A guest request to the VMM/Hypervisor may contain a list of
+ * these structs (linked by "next_pa").
+ */
+struct heki_page_list {
+	struct heki_page_list *next;
+	gpa_t next_pa;
+	unsigned long npages;
+	struct heki_pages pages[];
+};
+
 /*
  * A hypervisor that supports Heki will instantiate this structure to
  * provide hypervisor specific functions for Heki.
@@ -36,6 +63,8 @@ struct heki {
 extern struct heki heki;
 extern bool heki_enabled;
 
+extern bool __read_mostly enable_mbec;
+
 void heki_early_init(void);
 void heki_late_init(void);
 
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index 2ed418704603..938c9006e354 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -31,6 +31,7 @@
 #define KVM_HC_SCHED_YIELD		11
 #define KVM_HC_MAP_GPA_RANGE		12
 #define KVM_HC_LOCK_CR_UPDATE		13
+#define KVM_HC_PROTECT_MEMORY		14
 
 /*
  * hypercalls use architecture specific
-- 
2.42.1



* [RFC PATCH v2 12/19] x86: Implement the Memory Table feature to store arbitrary per-page data
  2023-11-13  2:23 [RFC PATCH v2 00/19] Hypervisor-Enforced Kernel Integrity Mickaël Salaün
                   ` (10 preceding siblings ...)
  2023-11-13  2:23 ` [RFC PATCH v2 11/19] KVM: x86: Add new hypercall to set EPT permissions Mickaël Salaün
@ 2023-11-13  2:23 ` Mickaël Salaün
  2023-11-22  7:19   ` kernel test robot
  2023-11-13  2:23 ` [RFC PATCH v2 13/19] heki: Implement a kernel page table walker Mickaël Salaün
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 44+ messages in thread
From: Mickaël Salaün @ 2023-11-13  2:23 UTC (permalink / raw)
  To: Borislav Petkov, Dave Hansen, H . Peter Anvin, Ingo Molnar,
	Kees Cook, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Vitaly Kuznetsov, Wanpeng Li
  Cc: Mickaël Salaün, Alexander Graf, Chao Peng, Edgecombe,
	Rick P, Forrest Yuan Yu, James Gowans, James Morris,
	John Andersen, Madhavan T . Venkataraman, Marian Rotariu,
	Mihai Donțu, Nicușor Cîțu, Thara Gopinath,
	Trilok Soni, Wei Liu, Will Deacon, Yu Zhang, Zahra Tarkhani,
	Ștefan Șicleru, dev, kvm, linux-hardening,
	linux-hyperv, linux-kernel, linux-security-module, qemu-devel,
	virtualization, x86, xen-devel

From: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>

This feature can be used by a consumer to associate any arbitrary
pointer with a physical page. The feature implements a page table format
that mirrors the hardware page table. A leaf entry in the table points
to consumer data for that page.

The page table format has these advantages:

- The format allows for a sparse representation. This is useful since
  the physical address space can be large and is typically sparsely
  populated in a system.

- A consumer of this feature can choose to populate data just for the
  pages it is interested in.

- Information can be stored for large pages, if a consumer wishes.
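
As an illustration of the format described above, here is a minimal
user-space sketch of a sparse, page-table-like lookup structure in which
bit 0 of an entry distinguishes a pointer to a lower-level table from a
pointer to consumer data. It is not the code in this patch: the sk_*
names, the three-level geometry and the 9-bit index width are assumptions
made for the example, and leaves are only stored at the lowest level.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy geometry for this sketch only: 3 levels of 9 bits, 4 KiB pages. */
#define SK_PAGE_SHIFT	12
#define SK_LEVEL_BITS	9
#define SK_NENTRIES	(1u << SK_LEVEL_BITS)
#define SK_NLEVELS	3
/* Bit 0 tags an entry that points to a lower-level table. */
#define SK_TABLE_TAG	((uintptr_t)1)

struct sk_table {
	unsigned int level;	/* 0 is the lowest level (one entry per page) */
	void *entries[SK_NENTRIES];
};

static int sk_is_leaf(void *entry)
{
	return !((uintptr_t)entry & SK_TABLE_TAG);
}

static struct sk_table *sk_untag(void *entry)
{
	return (struct sk_table *)((uintptr_t)entry & ~SK_TABLE_TAG);
}

static unsigned int sk_index(const struct sk_table *table, uint64_t pa)
{
	unsigned int shift = SK_PAGE_SHIFT + table->level * SK_LEVEL_BITS;

	return (pa >> shift) & (SK_NENTRIES - 1);
}

static struct sk_table *sk_alloc(unsigned int level)
{
	struct sk_table *table = calloc(1, sizeof(*table));

	if (table)
		table->level = level;
	return table;
}

/* Return the slot for pa, allocating intermediate tables on demand. */
static void **sk_create(struct sk_table *table, uint64_t pa)
{
	while (table->level) {
		unsigned int i = sk_index(table, pa);

		if (!table->entries[i]) {
			struct sk_table *next = sk_alloc(table->level - 1);

			if (!next)
				return NULL;
			table->entries[i] =
				(void *)((uintptr_t)next | SK_TABLE_TAG);
		}
		table = sk_untag(table->entries[i]);
	}
	return &table->entries[sk_index(table, pa)];
}

/* Return the populated slot for pa, or NULL if nothing is stored there. */
static void **sk_find(struct sk_table *table, uint64_t pa)
{
	for (;;) {
		unsigned int i = sk_index(table, pa);

		if (!table->entries[i])
			return NULL;
		if (sk_is_leaf(table->entries[i]))
			return &table->entries[i];
		table = sk_untag(table->entries[i]);
	}
}

static void sk_free(struct sk_table *table)
{
	unsigned int i;

	for (i = 0; i < SK_NENTRIES; i++) {
		/* Leaves belong to the consumer; only free lower tables. */
		if (table->entries[i] && !sk_is_leaf(table->entries[i]))
			sk_free(sk_untag(table->entries[i]));
	}
	free(table);
}

/* Store two pointers 1 GiB apart and look them up; returns 0 on success. */
int sk_demo(void)
{
	static int a, b;	/* int alignment keeps bit 0 clear */
	struct sk_table *root = sk_alloc(SK_NLEVELS - 1);
	void **slot_a, **slot_b;
	int ok;

	assert(root);
	slot_a = sk_create(root, 0x1000);
	slot_b = sk_create(root, 0x40002000ULL);
	assert(slot_a && slot_b);
	*slot_a = &a;
	*slot_b = &b;

	ok = sk_find(root, 0x1000) == slot_a &&
	     sk_find(root, 0x40002000ULL) == slot_b &&
	     !sk_find(root, 0x5000) &&		/* table exists, slot empty */
	     !sk_find(root, 0x80000000ULL);	/* no table for this range */
	sk_free(root);
	return ok ? 0 : -1;
}
```

Only the tables on the path to a populated page are allocated, which is
where the sparseness comes from: the two pages above cost five tables in
total, however far apart they are.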

For instance, for Heki, the guest kernel uses this to create permissions
counters for each guest physical page. The permissions counters reflect
the collective permissions for a guest physical page across all mappings
to that page. This allows the guest to request the hypervisor to set
only the necessary permissions for a guest physical page in the EPT
(instead of RWX).

This feature could also be used to improve KVM's memory attributes and
write page tracking.

A future version will support large page entries in mem_table by adding
merge() and split() operations to mem_table_ops.

Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---

Changes since v1:
* New patch and new file: kernel/mem_table.c
---
 arch/x86/kernel/setup.c   |   2 +
 include/linux/heki.h      |   1 +
 include/linux/mem_table.h |  55 ++++++++++
 kernel/Makefile           |   2 +
 kernel/mem_table.c        | 219 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 279 insertions(+)
 create mode 100644 include/linux/mem_table.h
 create mode 100644 kernel/mem_table.c

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index b098b1fa2470..e7ae46953ae4 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -25,6 +25,7 @@
 #include <linux/static_call.h>
 #include <linux/swiotlb.h>
 #include <linux/random.h>
+#include <linux/mem_table.h>
 
 #include <uapi/linux/mount.h>
 
@@ -1315,6 +1316,7 @@ void __init setup_arch(char **cmdline_p)
 #endif
 
 	unwind_init();
+	mem_table_init(PG_LEVEL_4K);
 }
 
 #ifdef CONFIG_X86_32
diff --git a/include/linux/heki.h b/include/linux/heki.h
index 89cc9273a968..9b0c966c50d1 100644
--- a/include/linux/heki.h
+++ b/include/linux/heki.h
@@ -15,6 +15,7 @@
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/printk.h>
+#include <linux/slab.h>
 
 #ifdef CONFIG_HEKI
 
diff --git a/include/linux/mem_table.h b/include/linux/mem_table.h
new file mode 100644
index 000000000000..738bf12309f3
--- /dev/null
+++ b/include/linux/mem_table.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Memory table feature - Definitions.
+ *
+ * Copyright © 2023 Microsoft Corporation.
+ */
+
+#ifndef __MEM_TABLE_H__
+#define __MEM_TABLE_H__
+
+/* clang-format off */
+
+/*
+ * The MEM_TABLE bit is set on entries that point to an intermediate table.
+ * So, this bit is reserved. This means that pointers to consumer data must
+ * be at least two-byte aligned (so the MEM_TABLE bit is 0).
+ */
+#define MEM_TABLE		BIT(0)
+#define IS_LEAF(entry)		(!((uintptr_t)(entry) & MEM_TABLE))
+
+/* clang-format on */
+
+/*
+ * A memory table is arranged exactly like a page table. The memory table
+ * configuration reflects the hardware page table configuration.
+ */
+
+/* Parameters at each level of the memory table hierarchy. */
+struct mem_table_level {
+	unsigned int number;
+	unsigned int nentries;
+	unsigned int shift;
+	unsigned int mask;
+};
+
+struct mem_table {
+	struct mem_table_level *level;
+	struct mem_table_ops *ops;
+	bool changed;
+	void *entries[];
+};
+
+/* Operations that need to be supplied by a consumer of memory tables. */
+struct mem_table_ops {
+	void (*free)(void *buf);
+};
+
+void mem_table_init(unsigned int base_level);
+struct mem_table *mem_table_alloc(struct mem_table_ops *ops);
+void mem_table_free(struct mem_table *table);
+void **mem_table_create(struct mem_table *table, phys_addr_t pa);
+void **mem_table_find(struct mem_table *table, phys_addr_t pa,
+		      unsigned int *level_num);
+
+#endif /* __MEM_TABLE_H__ */
diff --git a/kernel/Makefile b/kernel/Makefile
index 3947122d618b..dcef03ec5c54 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -131,6 +131,8 @@ obj-$(CONFIG_WATCH_QUEUE) += watch_queue.o
 obj-$(CONFIG_RESOURCE_KUNIT_TEST) += resource_kunit.o
 obj-$(CONFIG_SYSCTL_KUNIT_TEST) += sysctl-test.o
 
+obj-$(CONFIG_SPARSEMEM) += mem_table.o
+
 CFLAGS_stackleak.o += $(DISABLE_STACKLEAK_PLUGIN)
 obj-$(CONFIG_GCC_PLUGIN_STACKLEAK) += stackleak.o
 KASAN_SANITIZE_stackleak.o := n
diff --git a/kernel/mem_table.c b/kernel/mem_table.c
new file mode 100644
index 000000000000..280a1b5ddde0
--- /dev/null
+++ b/kernel/mem_table.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Memory table feature.
+ *
+ * This feature can be used by a consumer to associate any arbitrary pointer
+ * with a physical page. The feature implements a page table format that
+ * mirrors the hardware page table. A leaf entry in the table points to
+ * consumer data for that page.
+ *
+ * The page table format has these advantages:
+ *
+ *	- The format allows for a sparse representation. This is useful since
+ *	  the physical address space can be large and is typically sparsely
+ *	  populated in a system.
+ *
+ *	- A consumer of this feature can choose to populate data just for
+ *	  the pages it is interested in.
+ *
+ *	- Information can be stored for large pages, if a consumer wishes.
+ *
+ * For instance, for Heki, the guest kernel uses this to create permissions
+ * counters for each guest physical page. The permissions counters reflect the
+ * collective permissions for a guest physical page across all mappings to that
+ * page. This allows the guest to request the hypervisor to set only the
+ * necessary permissions for a guest physical page in the EPT (instead of RWX).
+ *
+ * Copyright © 2023 Microsoft Corporation.
+ */
+
+/*
+ * Memory table functions use recursion for simplicity. The recursion is bounded
+ * by the number of hardware page table levels.
+ *
+ * Locking is left to the caller of these functions.
+ */
+#include <linux/heki.h>
+#include <linux/mem_table.h>
+#include <linux/pgtable.h>
+
+#define TABLE(entry) ((void *)((uintptr_t)(entry) & ~MEM_TABLE))
+#define ENTRY(table) ((void *)((uintptr_t)(table) | MEM_TABLE))
+
+/*
+ * Within this feature, the table levels start from 0. On X86, the base level
+ * is not 0.
+ */
+unsigned int mem_table_base_level __ro_after_init;
+unsigned int mem_table_nlevels __ro_after_init;
+struct mem_table_level mem_table_levels[CONFIG_PGTABLE_LEVELS] __ro_after_init;
+
+void __init mem_table_init(unsigned int base_level)
+{
+	struct mem_table_level *level;
+	unsigned long shift, delta_shift;
+	int physmem_bits;
+	int i, max_levels;
+
+	/*
+	 * Compute the actual number of levels present. Compute the parameters
+	 * for each level.
+	 */
+	shift = ilog2(PAGE_SIZE);
+	physmem_bits = PAGE_SHIFT;
+	max_levels = CONFIG_PGTABLE_LEVELS;
+
+	for (i = 0; i < max_levels && physmem_bits < MAX_PHYSMEM_BITS; i++) {
+		level = &mem_table_levels[i];
+
+		switch (i) {
+		case 0:
+			level->nentries = PTRS_PER_PTE;
+			break;
+		case 1:
+			level->nentries = PTRS_PER_PMD;
+			break;
+		case 2:
+			level->nentries = PTRS_PER_PUD;
+			break;
+		case 3:
+			level->nentries = PTRS_PER_P4D;
+			break;
+		case 4:
+			level->nentries = PTRS_PER_PGD;
+			break;
+		}
+		level->number = i;
+		level->shift = shift;
+		level->mask = level->nentries - 1;
+
+		delta_shift = ilog2(level->nentries);
+		shift += delta_shift;
+		physmem_bits += delta_shift;
+	}
+	mem_table_nlevels = i;
+	mem_table_base_level = base_level;
+}
+
+struct mem_table *mem_table_alloc(struct mem_table_ops *ops)
+{
+	struct mem_table_level *level;
+	struct mem_table *table;
+
+	level = &mem_table_levels[mem_table_nlevels - 1];
+
+	table = kzalloc(struct_size(table, entries, level->nentries),
+			GFP_KERNEL);
+	if (table) {
+		table->level = level;
+		table->ops = ops;
+		return table;
+	}
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(mem_table_alloc);
+
+static void _mem_table_free(struct mem_table *table)
+{
+	struct mem_table_level *level = table->level;
+	void **entries = table->entries;
+	struct mem_table_ops *ops = table->ops;
+	int i;
+
+	for (i = 0; i < level->nentries; i++) {
+		if (!entries[i])
+			continue;
+		if (IS_LEAF(entries[i])) {
+			/* The consumer frees the pointer. */
+			ops->free(entries[i]);
+			continue;
+		}
+		_mem_table_free(TABLE(entries[i]));
+	}
+	kfree(table);
+}
+
+void mem_table_free(struct mem_table *table)
+{
+	_mem_table_free(table);
+}
+EXPORT_SYMBOL_GPL(mem_table_free);
+
+static void **_mem_table_find(struct mem_table *table, phys_addr_t pa,
+			      unsigned int *level_number)
+{
+	struct mem_table_level *level = table->level;
+	void **entries = table->entries;
+	unsigned long i;
+
+	i = (pa >> level->shift) & level->mask;
+
+	*level_number = level->number;
+	if (!entries[i])
+		return NULL;
+
+	if (IS_LEAF(entries[i]))
+		return &entries[i];
+
+	return _mem_table_find(TABLE(entries[i]), pa, level_number);
+}
+
+void **mem_table_find(struct mem_table *table, phys_addr_t pa,
+		      unsigned int *level_number)
+{
+	void **entry;
+
+	entry = _mem_table_find(table, pa, level_number);
+	*level_number += mem_table_base_level;
+
+	return entry;
+}
+EXPORT_SYMBOL_GPL(mem_table_find);
+
+static void **_mem_table_create(struct mem_table *table, phys_addr_t pa)
+{
+	struct mem_table_level *level = table->level;
+	void **entries = table->entries;
+	unsigned long i;
+
+	table->changed = true;
+	i = (pa >> level->shift) & level->mask;
+
+	if (!level->number) {
+		/*
+		 * Reached the lowest level. Return a pointer to the entry
+		 * so that the consumer can populate it.
+		 */
+		return &entries[i];
+	}
+
+	/*
+	 * If the entry is NULL, then create a lower level table and make the
+	 * entry point to it. Or, if the entry is a leaf, then we need to
+	 * split the entry. In this case as well, create a lower level table
+	 * to split the entry.
+	 */
+	if (!entries[i] || IS_LEAF(entries[i])) {
+		struct mem_table *next;
+
+		/* Create next level table. */
+		level--;
+		next = kzalloc(struct_size(table, entries, level->nentries),
+			       GFP_KERNEL);
+		if (!next)
+			return NULL;
+
+		next->level = level;
+		next->ops = table->ops;
+		next->changed = true;
+		entries[i] = ENTRY(next);
+	}
+
+	return _mem_table_create(TABLE(entries[i]), pa);
+}
+
+void **mem_table_create(struct mem_table *table, phys_addr_t pa)
+{
+	return _mem_table_create(table, pa);
+}
+EXPORT_SYMBOL_GPL(mem_table_create);
-- 
2.42.1



* [RFC PATCH v2 13/19] heki: Implement a kernel page table walker
  2023-11-13  2:23 [RFC PATCH v2 00/19] Hypervisor-Enforced Kernel Integrity Mickaël Salaün
                   ` (11 preceding siblings ...)
  2023-11-13  2:23 ` [RFC PATCH v2 12/19] x86: Implement the Memory Table feature to store arbitrary per-page data Mickaël Salaün
@ 2023-11-13  2:23 ` Mickaël Salaün
  2023-11-13  2:23 ` [RFC PATCH v2 14/19] heki: x86: Initialize permissions counters for pages mapped into KVA Mickaël Salaün
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 44+ messages in thread
From: Mickaël Salaün @ 2023-11-13  2:23 UTC (permalink / raw)
  To: Borislav Petkov, Dave Hansen, H . Peter Anvin, Ingo Molnar,
	Kees Cook, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Vitaly Kuznetsov, Wanpeng Li
  Cc: Mickaël Salaün, Alexander Graf, Chao Peng, Edgecombe,
	Rick P, Forrest Yuan Yu, James Gowans, James Morris,
	John Andersen, Madhavan T . Venkataraman, Marian Rotariu,
	Mihai Donțu, Nicușor Cîțu, Thara Gopinath,
	Trilok Soni, Wei Liu, Will Deacon, Yu Zhang, Zahra Tarkhani,
	Ștefan Șicleru, dev, kvm, linux-hardening,
	linux-hyperv, linux-kernel, linux-security-module, qemu-devel,
	virtualization, x86, xen-devel

From: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>

The Heki feature needs to do the following:

- Find kernel mappings.

- Determine the permissions associated with each mapping.

- Determine the collective permissions for a guest physical page across
  all of its mappings.

This way, the EPT entry for a guest physical page can carry only the
required permissions, set through the KVM_HC_PROTECT_MEMORY hypercall.

Implement a kernel page table walker that walks all of the kernel
mappings and calls a callback function for each mapping.
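
To make the walker/callback contract concrete, here is a small
user-space sketch of the same pattern over a toy two-level table. It is
only an illustration under stated assumptions: the sk_* types, the
four-entry tables and the flag values are invented for the example, and
the calls/bytes accumulators in the argument structure do not exist in
struct heki_args. As in the real walker, a large mapping produces one
callback for the covered range, while a regular table is descended and
produces one callback per present page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SIZE	4096UL
#define SK_NPTE		4UL	/* entries per lower-level table (toy value) */
#define SK_NPMD		4UL	/* entries in the upper-level table (toy value) */
#define SK_PMD_SIZE	(SK_NPTE * SK_PAGE_SIZE)

#define SK_PRESENT	0x1UL
#define SK_LARGE	0x2UL

struct sk_pte {
	uint64_t pa;
	unsigned long flags;
};

struct sk_pmd {
	uint64_t pa;		/* base PA when SK_LARGE is set */
	unsigned long flags;
	struct sk_pte *ptes;	/* lower-level table otherwise */
};

struct sk_args {
	unsigned long va;
	uint64_t pa;
	size_t size;
	unsigned long flags;
	/* Sketch-only accumulators, not part of struct heki_args. */
	unsigned int calls;
	size_t bytes;
};

typedef void (*sk_func_t)(struct sk_args *args);

static unsigned long sk_addr_end(unsigned long va, unsigned long size,
				 unsigned long va_end)
{
	unsigned long next = (va & ~(size - 1)) + size;

	return next < va_end ? next : va_end;
}

static void sk_walk_pte(struct sk_pmd *pmd, unsigned long va,
			unsigned long va_end, sk_func_t func,
			struct sk_args *args)
{
	for (; va < va_end; va = sk_addr_end(va, SK_PAGE_SIZE, va_end)) {
		struct sk_pte *pte = &pmd->ptes[(va / SK_PAGE_SIZE) % SK_NPTE];

		if (!(pte->flags & SK_PRESENT))
			continue;

		args->va = va;
		args->pa = pte->pa;
		args->size = SK_PAGE_SIZE;
		args->flags = pte->flags;
		func(args);
	}
}

static void sk_walk(struct sk_pmd *pmds, unsigned long va,
		    unsigned long va_end, sk_func_t func,
		    struct sk_args *args)
{
	unsigned long next_va;

	assert(func != NULL);
	for (; va < va_end; va = next_va) {
		struct sk_pmd *pmd = &pmds[(va / SK_PMD_SIZE) % SK_NPMD];

		next_va = sk_addr_end(va, SK_PMD_SIZE, va_end);
		if (!(pmd->flags & SK_PRESENT))
			continue;

		if (pmd->flags & SK_LARGE) {
			/* One callback covers the whole large mapping. */
			args->va = va;
			args->pa = pmd->pa + (va & (SK_PMD_SIZE - 1));
			args->size = next_va - va;
			args->flags = pmd->flags;
			func(args);
		} else {
			sk_walk_pte(pmd, va, next_va, func, args);
		}
	}
}

/* Example callback: count the mappings and the bytes they cover. */
static void sk_count(struct sk_args *args)
{
	args->calls++;
	args->bytes += args->size;
}

/* One large mapping, two 4 KiB pages, one hole. Returns the callback count. */
unsigned int sk_walk_demo(size_t *bytes)
{
	static struct sk_pte ptes[SK_NPTE];
	static struct sk_pmd pmds[SK_NPMD];
	struct sk_args args = { 0 };

	pmds[0] = (struct sk_pmd){ .pa = 0x100000,
				   .flags = SK_PRESENT | SK_LARGE };
	ptes[1] = (struct sk_pte){ .pa = 0x200000, .flags = SK_PRESENT };
	ptes[2] = (struct sk_pte){ .pa = 0x300000, .flags = SK_PRESENT };
	pmds[1] = (struct sk_pmd){ .flags = SK_PRESENT, .ptes = ptes };

	sk_walk(pmds, 0, 3 * SK_PMD_SIZE, sk_count, &args);
	*bytes = args.bytes;
	return args.calls;
}
```

Reusing one argument structure for every callback, as the walker in this
patch does, lets the caller carry its own state alongside the
per-mapping fields without widening the callback signature.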

Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Co-developed-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---

Changes since v1:
* New patch and new file: virt/heki/walk.c
---
 include/linux/heki.h |  16 +++++
 virt/heki/Makefile   |   1 +
 virt/heki/walk.c     | 140 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 157 insertions(+)
 create mode 100644 virt/heki/walk.c

diff --git a/include/linux/heki.h b/include/linux/heki.h
index 9b0c966c50d1..a7ae0b387dfe 100644
--- a/include/linux/heki.h
+++ b/include/linux/heki.h
@@ -61,6 +61,22 @@ struct heki {
 	struct heki_hypervisor *hypervisor;
 };
 
+/*
+ * The kernel page table is walked to locate kernel mappings. For each
+ * mapping, a callback function is called. The table walker passes information
+ * about the mapping to the callback using this structure.
+ */
+struct heki_args {
+	/* Information passed by the table walker to the callback. */
+	unsigned long va;
+	phys_addr_t pa;
+	size_t size;
+	unsigned long flags;
+};
+
+/* Callback function called by the table walker. */
+typedef void (*heki_func_t)(struct heki_args *args);
+
 extern struct heki heki;
 extern bool heki_enabled;
 
diff --git a/virt/heki/Makefile b/virt/heki/Makefile
index 354e567df71c..a5daa4ff7a4f 100644
--- a/virt/heki/Makefile
+++ b/virt/heki/Makefile
@@ -1,3 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
 obj-y += main.o
+obj-y += walk.o
diff --git a/virt/heki/walk.c b/virt/heki/walk.c
new file mode 100644
index 000000000000..e10b54226fcc
--- /dev/null
+++ b/virt/heki/walk.c
@@ -0,0 +1,140 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Hypervisor Enforced Kernel Integrity (Heki) - Kernel page table walker.
+ *
+ * Copyright © 2023 Microsoft Corporation
+ *
+ * Cf. arch/x86/mm/init_64.c
+ */
+
+#include <linux/heki.h>
+#include <linux/pgtable.h>
+
+static void heki_walk_pte(pmd_t *pmd, unsigned long va, unsigned long va_end,
+			  heki_func_t func, struct heki_args *args)
+{
+	pte_t *pte;
+	unsigned long next_va;
+
+	for (pte = pte_offset_kernel(pmd, va); va < va_end;
+	     va = next_va, pte++) {
+		next_va = (va + PAGE_SIZE) & PAGE_MASK;
+
+		if (next_va > va_end)
+			next_va = va_end;
+
+		if (!pte_present(*pte))
+			continue;
+
+		args->va = va;
+		args->pa = pte_pfn(*pte) << PAGE_SHIFT;
+		args->size = PAGE_SIZE;
+		args->flags = pte_flags(*pte);
+
+		func(args);
+	}
+}
+
+static void heki_walk_pmd(pud_t *pud, unsigned long va, unsigned long va_end,
+			  heki_func_t func, struct heki_args *args)
+{
+	pmd_t *pmd;
+	unsigned long next_va;
+
+	for (pmd = pmd_offset(pud, va); va < va_end; va = next_va, pmd++) {
+		next_va = pmd_addr_end(va, va_end);
+
+		if (!pmd_present(*pmd))
+			continue;
+
+		if (pmd_large(*pmd)) {
+			args->va = va;
+			args->pa = pmd_pfn(*pmd) << PAGE_SHIFT;
+			args->pa += va & (PMD_SIZE - 1);
+			args->size = next_va - va;
+			args->flags = pmd_flags(*pmd);
+
+			func(args);
+		} else {
+			heki_walk_pte(pmd, va, next_va, func, args);
+		}
+	}
+}
+
+static void heki_walk_pud(p4d_t *p4d, unsigned long va, unsigned long va_end,
+			  heki_func_t func, struct heki_args *args)
+{
+	pud_t *pud;
+	unsigned long next_va;
+
+	for (pud = pud_offset(p4d, va); va < va_end; va = next_va, pud++) {
+		next_va = pud_addr_end(va, va_end);
+
+		if (!pud_present(*pud))
+			continue;
+
+		if (pud_large(*pud)) {
+			args->va = va;
+			args->pa = pud_pfn(*pud) << PAGE_SHIFT;
+			args->pa += va & (PUD_SIZE - 1);
+			args->size = next_va - va;
+			args->flags = pud_flags(*pud);
+
+			func(args);
+		} else {
+			heki_walk_pmd(pud, va, next_va, func, args);
+		}
+	}
+}
+
+static void heki_walk_p4d(pgd_t *pgd, unsigned long va, unsigned long va_end,
+			  heki_func_t func, struct heki_args *args)
+{
+	p4d_t *p4d;
+	unsigned long next_va;
+
+	for (p4d = p4d_offset(pgd, va); va < va_end; va = next_va, p4d++) {
+		next_va = p4d_addr_end(va, va_end);
+
+		if (!p4d_present(*p4d))
+			continue;
+
+		if (p4d_large(*p4d)) {
+			args->va = va;
+			args->pa = p4d_pfn(*p4d) << PAGE_SHIFT;
+			args->pa += va & (P4D_SIZE - 1);
+			args->size = next_va - va;
+			args->flags = p4d_flags(*p4d);
+
+			func(args);
+		} else {
+			heki_walk_pud(p4d, va, next_va, func, args);
+		}
+	}
+}
+
+void heki_walk(unsigned long va, unsigned long va_end, heki_func_t func,
+	       struct heki_args *args)
+{
+	pgd_t *pgd;
+	unsigned long next_va;
+
+	for (pgd = pgd_offset_k(va); va < va_end; va = next_va, pgd++) {
+		next_va = pgd_addr_end(va, va_end);
+
+		if (!pgd_present(*pgd))
+			continue;
+
+		if (pgd_large(*pgd)) {
+			args->va = va;
+			args->pa = pgd_pfn(*pgd) << PAGE_SHIFT;
+			args->pa += va & (PGDIR_SIZE - 1);
+			args->size = next_va - va;
+			args->flags = pgd_flags(*pgd);
+
+			func(args);
+		} else {
+			heki_walk_p4d(pgd, va, next_va, func, args);
+		}
+	}
+}
-- 
2.42.1



* [RFC PATCH v2 14/19] heki: x86: Initialize permissions counters for pages mapped into KVA
  2023-11-13  2:23 [RFC PATCH v2 00/19] Hypervisor-Enforced Kernel Integrity Mickaël Salaün
                   ` (12 preceding siblings ...)
  2023-11-13  2:23 ` [RFC PATCH v2 13/19] heki: Implement a kernel page table walker Mickaël Salaün
@ 2023-11-13  2:23 ` Mickaël Salaün
  2023-11-13  2:23 ` [RFC PATCH v2 15/19] heki: x86: Initialize permissions counters for pages in vmap()/vunmap() Mickaël Salaün
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 44+ messages in thread
From: Mickaël Salaün @ 2023-11-13  2:23 UTC (permalink / raw)
  To: Borislav Petkov, Dave Hansen, H . Peter Anvin, Ingo Molnar,
	Kees Cook, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Vitaly Kuznetsov, Wanpeng Li
  Cc: Mickaël Salaün, Alexander Graf, Chao Peng, Edgecombe,
	Rick P, Forrest Yuan Yu, James Gowans, James Morris,
	John Andersen, Madhavan T . Venkataraman, Marian Rotariu,
	Mihai Donțu, Nicușor Cîțu, Thara Gopinath,
	Trilok Soni, Wei Liu, Will Deacon, Yu Zhang, Zahra Tarkhani,
	Ștefan Șicleru, dev, kvm, linux-hardening,
	linux-hyperv, linux-kernel, linux-security-module, qemu-devel,
	virtualization, x86, xen-devel

From: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>

Define a permissions counters structure that contains a counter for
read, write and execute. Each mapped guest page will be allocated a
permissions counters structure.

During kernel boot, walk the kernel address space, locate all the
mappings, create permissions counters for each mapped guest page and
update the counters to reflect the collective permissions for each page
across all of its mappings.
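
The counting scheme can be sketched in a few lines of user-space C. This
is an illustration, not the code added by this patch: the sk_* names are
invented, the SK_ATTR_* values are placeholders for the MEM_ATTR_*
constants defined elsewhere in the series, and sk_collective() assumes
that a permission is needed as long as at least one mapping uses it,
which is how the counters are expected to be consumed when the
permissions are applied in the EPT. The flag translation follows
heki_flags_to_permissions(): every present mapping is readable, _PAGE_RW
adds write and _PAGE_NX removes execute.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 page table flag bits (_PAGE_RW and _PAGE_NX). */
#define SK_PAGE_RW	(1ULL << 1)
#define SK_PAGE_NX	(1ULL << 63)

/* Placeholder values standing in for MEM_ATTR_READ/WRITE/EXEC. */
#define SK_ATTR_READ	0x1UL
#define SK_ATTR_WRITE	0x2UL
#define SK_ATTR_EXEC	0x4UL

struct sk_counters {
	int read;
	int write;
	int execute;
};

static unsigned long sk_flags_to_permissions(uint64_t flags)
{
	unsigned long permissions = SK_ATTR_READ | SK_ATTR_EXEC;

	if (flags & SK_PAGE_RW)
		permissions |= SK_ATTR_WRITE;
	if (flags & SK_PAGE_NX)
		permissions &= ~SK_ATTR_EXEC;

	return permissions;
}

/* Account for one new mapping of the page. */
static void sk_map(struct sk_counters *counters, unsigned long permissions)
{
	if (permissions & SK_ATTR_READ)
		counters->read++;
	if (permissions & SK_ATTR_WRITE)
		counters->write++;
	if (permissions & SK_ATTR_EXEC)
		counters->execute++;
}

/* Assumption: a permission is required while any mapping still uses it. */
static unsigned long sk_collective(const struct sk_counters *counters)
{
	unsigned long permissions = 0;

	if (counters->read > 0)
		permissions |= SK_ATTR_READ;
	if (counters->write > 0)
		permissions |= SK_ATTR_WRITE;
	if (counters->execute > 0)
		permissions |= SK_ATTR_EXEC;

	return permissions;
}

/* One page with an executable mapping and a writable alias. */
unsigned long sk_counters_demo(void)
{
	struct sk_counters page = { 0 };

	/* Read-only, executable mapping (neither RW nor NX set). */
	sk_map(&page, sk_flags_to_permissions(0));
	/* Writable, non-executable alias of the same physical page. */
	sk_map(&page, sk_flags_to_permissions(SK_PAGE_RW | SK_PAGE_NX));

	assert(page.read == 2 && page.write == 1 && page.execute == 1);
	return sk_collective(&page);
}
```

In this example the page ends up needing RWX even though neither mapping
is both writable and executable, which is why the counters are kept per
physical page rather than per mapping.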

The collective permissions will be applied in the EPT in a following
commit.

We might want to move these counters to a safer place (e.g., KVM) to
protect them from tampering by the guest kernel itself.

We should note that walking through all mappings might be slow if KASAN
is enabled.

Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Suggested-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---

Changes since v1:
* New patch and new files: arch/x86/mm/heki.c and virt/heki/counters.c
---
 arch/x86/mm/Makefile |   2 +
 arch/x86/mm/heki.c   |  56 +++++++++++++++++
 include/linux/heki.h |  32 ++++++++++
 virt/heki/Kconfig    |   2 +
 virt/heki/Makefile   |   1 +
 virt/heki/counters.c | 147 +++++++++++++++++++++++++++++++++++++++++++
 virt/heki/main.c     |  13 ++++
 7 files changed, 253 insertions(+)
 create mode 100644 arch/x86/mm/heki.c
 create mode 100644 virt/heki/counters.c

diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index c80febc44cd2..2998eaac0dbb 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -67,3 +67,5 @@ obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_amd.o
 
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_identity.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_boot.o
+
+obj-$(CONFIG_HEKI)		+= heki.o
diff --git a/arch/x86/mm/heki.c b/arch/x86/mm/heki.c
new file mode 100644
index 000000000000..c495df0d8772
--- /dev/null
+++ b/arch/x86/mm/heki.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hypervisor Enforced Kernel Integrity (Heki) - Arch specific.
+ *
+ * Copyright © 2023 Microsoft Corporation
+ */
+
+#include <linux/heki.h>
+#include <linux/kvm_mem_attr.h>
+
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+
+#define pr_fmt(fmt) "heki-guest: " fmt
+
+static unsigned long kernel_va;
+static unsigned long kernel_end;
+static unsigned long direct_map_va;
+static unsigned long direct_map_end;
+
+__init void heki_arch_early_init(void)
+{
+	/* Kernel virtual address space range, not yet compatible with KASLR. */
+	if (pgtable_l5_enabled()) {
+		kernel_va = 0xff00000000000000UL;
+		kernel_end = 0xffffffffffe00000UL;
+		direct_map_va = 0xff11000000000000UL;
+		direct_map_end = 0xff91000000000000UL;
+	} else {
+		kernel_va = 0xffff800000000000UL;
+		kernel_end = 0xffffffffffe00000UL;
+		direct_map_va = 0xffff888000000000UL;
+		direct_map_end = 0xffffc88000000000UL;
+	}
+
+	/*
+	 * Initialize the counters for all existing kernel mappings except
+	 * for direct map.
+	 */
+	heki_map(kernel_va, direct_map_va);
+	heki_map(direct_map_end, kernel_end);
+}
+
+unsigned long heki_flags_to_permissions(unsigned long flags)
+{
+	unsigned long permissions;
+
+	permissions = MEM_ATTR_READ | MEM_ATTR_EXEC;
+	if (flags & _PAGE_RW)
+		permissions |= MEM_ATTR_WRITE;
+	if (flags & _PAGE_NX)
+		permissions &= ~MEM_ATTR_EXEC;
+
+	return permissions;
+}
diff --git a/include/linux/heki.h b/include/linux/heki.h
index a7ae0b387dfe..86c787d121e0 100644
--- a/include/linux/heki.h
+++ b/include/linux/heki.h
@@ -19,6 +19,16 @@
 
 #ifdef CONFIG_HEKI
 
+/*
+ * This structure keeps track of the collective permissions for a guest page
+ * across all of its mappings.
+ */
+struct heki_counters {
+	int read;
+	int write;
+	int execute;
+};
+
 /*
  * This structure contains a guest physical range and its permissions (RWX).
  */
@@ -56,9 +66,17 @@ struct heki_hypervisor {
 /*
  * If the active hypervisor supports Heki, it will plug its heki_hypervisor
  * pointer into this heki structure.
+ *
+ * During guest kernel boot, permissions counters for each guest page are
+ * initialized based on the page's current permissions.
  */
 struct heki {
 	struct heki_hypervisor *hypervisor;
+	struct mem_table *counters;
+};
+
+enum heki_cmd {
+	HEKI_MAP,
 };
 
 /*
@@ -72,6 +90,9 @@ struct heki_args {
 	phys_addr_t pa;
 	size_t size;
 	unsigned long flags;
+
+	/* Command passed by caller. */
+	enum heki_cmd cmd;
 };
 
 /* Callback function called by the table walker. */
@@ -84,6 +105,14 @@ extern bool __read_mostly enable_mbec;
 
 void heki_early_init(void);
 void heki_late_init(void);
+void heki_counters_init(void);
+void heki_walk(unsigned long va, unsigned long va_end, heki_func_t func,
+	       struct heki_args *args);
+void heki_map(unsigned long va, unsigned long end);
+
+/* Arch-specific functions. */
+void heki_arch_early_init(void);
+unsigned long heki_flags_to_permissions(unsigned long flags);
 
 #else /* !CONFIG_HEKI */
 
@@ -93,6 +122,9 @@ static inline void heki_early_init(void)
 static inline void heki_late_init(void)
 {
 }
+static inline void heki_map(unsigned long va, unsigned long end)
+{
+}
 
 #endif /* CONFIG_HEKI */
 
diff --git a/virt/heki/Kconfig b/virt/heki/Kconfig
index 75a784653e31..6d956eb9d04b 100644
--- a/virt/heki/Kconfig
+++ b/virt/heki/Kconfig
@@ -6,6 +6,8 @@ config HEKI
 	bool "Hypervisor Enforced Kernel Integrity (Heki)"
 	depends on ARCH_SUPPORTS_HEKI && HYPERVISOR_SUPPORTS_HEKI
 	select KVM_GENERIC_MEMORY_ATTRIBUTES
+	depends on !X86_16BIT
+	select SPARSEMEM
 	help
 	  This feature enhances guest virtual machine security by taking
 	  advantage of security features provided by the hypervisor for guests.
diff --git a/virt/heki/Makefile b/virt/heki/Makefile
index a5daa4ff7a4f..564f92faa9d8 100644
--- a/virt/heki/Makefile
+++ b/virt/heki/Makefile
@@ -2,3 +2,4 @@
 
 obj-y += main.o
 obj-y += walk.o
+obj-y += counters.o
diff --git a/virt/heki/counters.c b/virt/heki/counters.c
new file mode 100644
index 000000000000..7067449cabca
--- /dev/null
+++ b/virt/heki/counters.c
@@ -0,0 +1,147 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Hypervisor Enforced Kernel Integrity (Heki) - Permissions counters.
+ *
+ * Copyright © 2023 Microsoft Corporation
+ */
+
+#include <linux/heki.h>
+#include <linux/kvm_mem_attr.h>
+#include <linux/mem_table.h>
+
+#include "common.h"
+
+DEFINE_MUTEX(heki_lock);
+
+static void heki_update_counters(struct heki_counters *counters,
+				 unsigned long perm, unsigned long set,
+				 unsigned long clear)
+{
+	if (WARN_ON_ONCE(!counters))
+		return;
+
+	if ((clear & MEM_ATTR_READ) && (perm & MEM_ATTR_READ))
+		counters->read--;
+	if ((clear & MEM_ATTR_WRITE) && (perm & MEM_ATTR_WRITE))
+		counters->write--;
+	if ((clear & MEM_ATTR_EXEC) && (perm & MEM_ATTR_EXEC))
+		counters->execute--;
+
+	if ((set & MEM_ATTR_READ) && !(perm & MEM_ATTR_READ))
+		counters->read++;
+	if ((set & MEM_ATTR_WRITE) && !(perm & MEM_ATTR_WRITE))
+		counters->write++;
+	if ((set & MEM_ATTR_EXEC) && !(perm & MEM_ATTR_EXEC))
+		counters->execute++;
+}
+
+static struct heki_counters *heki_create_counters(struct mem_table *table,
+						  phys_addr_t pa)
+{
+	struct heki_counters *counters;
+	void **entry;
+
+	entry = mem_table_create(table, pa);
+	if (WARN_ON(!entry))
+		return NULL;
+
+	counters = kzalloc(sizeof(*counters), GFP_KERNEL);
+	if (WARN_ON(!counters))
+		return NULL;
+
+	*entry = counters;
+	return counters;
+}
+
+void heki_callback(struct heki_args *args)
+{
+	/* The VA is only for debug. It is not really used in this function. */
+	unsigned long va;
+	phys_addr_t pa, pa_end;
+	unsigned long permissions;
+	void **entry;
+	struct heki_counters *counters;
+	unsigned int ignore;
+
+	if (!pfn_valid(args->pa >> PAGE_SHIFT))
+		return;
+
+	permissions = heki_flags_to_permissions(args->flags);
+
+	/*
+	 * Handle counters for a leaf entry in the kernel page table.
+	 */
+	pa_end = args->pa + args->size;
+	for (pa = args->pa, va = args->va; pa < pa_end;
+	     pa += PAGE_SIZE, va += PAGE_SIZE) {
+		entry = mem_table_find(heki.counters, pa, &ignore);
+		if (entry)
+			counters = *entry;
+		else
+			counters = NULL;
+
+		switch (args->cmd) {
+		case HEKI_MAP:
+			if (!counters)
+				counters =
+					heki_create_counters(heki.counters, pa);
+			heki_update_counters(counters, 0, permissions, 0);
+			break;
+
+		default:
+			WARN_ON_ONCE(1);
+			break;
+		}
+	}
+}
+
+static void heki_func(unsigned long va, unsigned long end,
+		      struct heki_args *args)
+{
+	if (!heki.counters || va >= end)
+		return;
+
+	va = ALIGN_DOWN(va, PAGE_SIZE);
+	end = ALIGN(end, PAGE_SIZE);
+
+	mutex_lock(&heki_lock);
+
+	heki_walk(va, end, heki_callback, args);
+
+	mutex_unlock(&heki_lock);
+}
+
+/*
+ * Find the mappings in the given range and initialize permission counters for
+ * them.
+ */
+void heki_map(unsigned long va, unsigned long end)
+{
+	struct heki_args args = {
+		.cmd = HEKI_MAP,
+	};
+
+	heki_func(va, end, &args);
+}
+
+/*
+ * Permissions counters are associated with each guest page using the
+ * Memory Table feature. Initialize the permissions counters here.
+ * Note that we don't support large page entries for counters because
+ * it is difficult to merge/split counters for large pages.
+ */
+
+static void heki_counters_free(void *counters)
+{
+	kfree(counters);
+}
+
+static struct mem_table_ops heki_counters_ops = {
+	.free = heki_counters_free,
+};
+
+__init void heki_counters_init(void)
+{
+	heki.counters = mem_table_alloc(&heki_counters_ops);
+	WARN_ON(!heki.counters);
+}
diff --git a/virt/heki/main.c b/virt/heki/main.c
index ff1937e1c946..0ab7de659e6f 100644
--- a/virt/heki/main.c
+++ b/virt/heki/main.c
@@ -21,6 +21,16 @@ __init void heki_early_init(void)
 		pr_warn("Heki is not enabled\n");
 		return;
 	}
+
+	/*
+	 * Static addresses (see heki_arch_early_init) are not compatible with
+	 * KASLR. This will be handled in a next patch series.
+	 */
+	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
+		pr_warn("Heki is disabled because KASLR is not supported yet\n");
+		return;
+	}
+
 	pr_warn("Heki is enabled\n");
 
 	if (!heki.hypervisor) {
@@ -29,6 +39,9 @@ __init void heki_early_init(void)
 		return;
 	}
 	pr_warn("Heki is supported by the active Hypervisor\n");
+
+	heki_counters_init();
+	heki_arch_early_init();
 }
 
 /*
-- 
2.42.1



* [RFC PATCH v2 15/19] heki: x86: Initialize permissions counters for pages in vmap()/vunmap()
  2023-11-13  2:23 [RFC PATCH v2 00/19] Hypervisor-Enforced Kernel Integrity Mickaël Salaün
                   ` (13 preceding siblings ...)
  2023-11-13  2:23 ` [RFC PATCH v2 14/19] heki: x86: Initialize permissions counters for pages mapped into KVA Mickaël Salaün
@ 2023-11-13  2:23 ` Mickaël Salaün
  2023-11-13  2:23 ` [RFC PATCH v2 16/19] heki: x86: Update permissions counters when guest page permissions change Mickaël Salaün
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 44+ messages in thread
From: Mickaël Salaün @ 2023-11-13  2:23 UTC (permalink / raw)
  To: Borislav Petkov, Dave Hansen, H . Peter Anvin, Ingo Molnar,
	Kees Cook, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Vitaly Kuznetsov, Wanpeng Li
  Cc: Mickaël Salaün, Alexander Graf, Chao Peng, Edgecombe,
	Rick P, Forrest Yuan Yu, James Gowans, James Morris,
	John Andersen, Madhavan T . Venkataraman, Marian Rotariu,
	Mihai Donțu, Nicușor Cîțu, Thara Gopinath,
	Trilok Soni, Wei Liu, Will Deacon, Yu Zhang, Zahra Tarkhani,
	Ștefan Șicleru, dev, kvm, linux-hardening,
	linux-hyperv, linux-kernel, linux-security-module, qemu-devel,
	virtualization, x86, xen-devel

From: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>

When a page gets mapped, create permissions counters for it and
initialize them based on the specified permissions.

When a page gets unmapped, update the counters appropriately.
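
As an illustration of the bookkeeping described above, here is a minimal
user-space sketch, not the kernel code itself. The ATTR_* bit values, the
struct perm_counters type, and the map_page()/unmap_page() helpers are
assumptions made for this example; only the set/clear logic mirrors
heki_update_counters() from the previous patch.

```c
#include <assert.h>

/* Illustrative stand-ins for the MEM_ATTR_* bits; the values are assumed. */
#define ATTR_READ  (1UL << 0)
#define ATTR_WRITE (1UL << 1)
#define ATTR_EXEC  (1UL << 2)

/* One counter per permission, counting the mappings that grant it. */
struct perm_counters {
	int read;
	int write;
	int execute;
};

/*
 * perm is what the mapping currently grants; set and clear are the bits
 * being added and removed. A counter only moves when the bit really changes.
 */
static void update_counters(struct perm_counters *c, unsigned long perm,
			    unsigned long set, unsigned long clear)
{
	assert(c);

	if ((clear & ATTR_READ) && (perm & ATTR_READ))
		c->read--;
	if ((clear & ATTR_WRITE) && (perm & ATTR_WRITE))
		c->write--;
	if ((clear & ATTR_EXEC) && (perm & ATTR_EXEC))
		c->execute--;

	if ((set & ATTR_READ) && !(perm & ATTR_READ))
		c->read++;
	if ((set & ATTR_WRITE) && !(perm & ATTR_WRITE))
		c->write++;
	if ((set & ATTR_EXEC) && !(perm & ATTR_EXEC))
		c->execute++;
}

/* A new mapping starts from no permissions and sets all of its own. */
static void map_page(struct perm_counters *c, unsigned long perm)
{
	update_counters(c, 0, perm, 0);
}

/* Removing a mapping clears exactly the permissions it was granting. */
static void unmap_page(struct perm_counters *c, unsigned long perm)
{
	update_counters(c, perm, 0, perm);
}
```

With this model, a page mapped twice keeps a permission until the last
mapping granting it goes away.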

Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---

Changes since v1:
* New patch
---
 include/linux/heki.h | 11 ++++++++++-
 mm/vmalloc.c         |  7 +++++++
 virt/heki/counters.c | 20 ++++++++++++++++++++
 3 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/include/linux/heki.h b/include/linux/heki.h
index 86c787d121e0..d660994d34d0 100644
--- a/include/linux/heki.h
+++ b/include/linux/heki.h
@@ -68,7 +68,11 @@ struct heki_hypervisor {
  * pointer into this heki structure.
  *
  * During guest kernel boot, permissions counters for each guest page are
- * initialized based on the page's current permissions.
+ * initialized based on the page's current permissions. Beyond this point,
+ * the counters are updated whenever:
+ *
+ *	- a page is mapped into the kernel address space
+ *	- a page is unmapped from the kernel address space
  */
 struct heki {
 	struct heki_hypervisor *hypervisor;
@@ -77,6 +81,7 @@ struct heki {
 
 enum heki_cmd {
 	HEKI_MAP,
+	HEKI_UNMAP,
 };
 
 /*
@@ -109,6 +114,7 @@ void heki_counters_init(void);
 void heki_walk(unsigned long va, unsigned long va_end, heki_func_t func,
 	       struct heki_args *args);
 void heki_map(unsigned long va, unsigned long end);
+void heki_unmap(unsigned long va, unsigned long end);
 
 /* Arch-specific functions. */
 void heki_arch_early_init(void);
@@ -125,6 +131,9 @@ static inline void heki_late_init(void)
 static inline void heki_map(unsigned long va, unsigned long end)
 {
 }
+static inline void heki_unmap(unsigned long va, unsigned long end)
+{
+}
 
 #endif /* CONFIG_HEKI */
 
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a3fedb3ee0db..d9096502e571 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -40,6 +40,7 @@
 #include <linux/pgtable.h>
 #include <linux/hugetlb.h>
 #include <linux/sched/mm.h>
+#include <linux/heki.h>
 #include <asm/tlbflush.h>
 #include <asm/shmparam.h>
 
@@ -301,6 +302,8 @@ static int vmap_range_noflush(unsigned long addr, unsigned long end,
 	if (mask & ARCH_PAGE_TABLE_SYNC_MASK)
 		arch_sync_kernel_mappings(start, end);
 
+	heki_map(start, end);
+
 	return err;
 }
 
@@ -419,6 +422,8 @@ void __vunmap_range_noflush(unsigned long start, unsigned long end)
 	pgtbl_mod_mask mask = 0;
 
 	BUG_ON(addr >= end);
+	heki_unmap(start, end);
+
 	pgd = pgd_offset_k(addr);
 	do {
 		next = pgd_addr_end(addr, end);
@@ -564,6 +569,8 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
 	if (mask & ARCH_PAGE_TABLE_SYNC_MASK)
 		arch_sync_kernel_mappings(start, end);
 
+	heki_map(start, end);
+
 	return 0;
 }
 
diff --git a/virt/heki/counters.c b/virt/heki/counters.c
index 7067449cabca..adc8d566b8a9 100644
--- a/virt/heki/counters.c
+++ b/virt/heki/counters.c
@@ -88,6 +88,13 @@ void heki_callback(struct heki_args *args)
 			heki_update_counters(counters, 0, permissions, 0);
 			break;
 
+		case HEKI_UNMAP:
+			if (WARN_ON_ONCE(!counters))
+				break;
+			heki_update_counters(counters, permissions, 0,
+					     permissions);
+			break;
+
 		default:
 			WARN_ON_ONCE(1);
 			break;
@@ -124,6 +131,19 @@ void heki_map(unsigned long va, unsigned long end)
 	heki_func(va, end, &args);
 }
 
+/*
+ * Find the mappings in the given range and revert the permission counters for
+ * them.
+ */
+void heki_unmap(unsigned long va, unsigned long end)
+{
+	struct heki_args args = {
+		.cmd = HEKI_UNMAP,
+	};
+
+	heki_func(va, end, &args);
+}
+
 /*
  * Permissions counters are associated with each guest page using the
  * Memory Table feature. Initialize the permissions counters here.
-- 
2.42.1



* [RFC PATCH v2 16/19] heki: x86: Update permissions counters when guest page permissions change
  2023-11-13  2:23 [RFC PATCH v2 00/19] Hypervisor-Enforced Kernel Integrity Mickaël Salaün
                   ` (14 preceding siblings ...)
  2023-11-13  2:23 ` [RFC PATCH v2 15/19] heki: x86: Initialize permissions counters for pages in vmap()/vunmap() Mickaël Salaün
@ 2023-11-13  2:23 ` Mickaël Salaün
  2023-11-13  2:23 ` [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching Mickaël Salaün
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 44+ messages in thread
From: Mickaël Salaün @ 2023-11-13  2:23 UTC (permalink / raw)
  To: Borislav Petkov, Dave Hansen, H . Peter Anvin, Ingo Molnar,
	Kees Cook, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Vitaly Kuznetsov, Wanpeng Li
  Cc: Mickaël Salaün, Alexander Graf, Chao Peng, Edgecombe,
	Rick P, Forrest Yuan Yu, James Gowans, James Morris,
	John Andersen, Madhavan T . Venkataraman, Marian Rotariu,
	Mihai Donțu, Nicușor Cîțu, Thara Gopinath,
	Trilok Soni, Wei Liu, Will Deacon, Yu Zhang, Zahra Tarkhani,
	Ștefan Șicleru, dev, kvm, linux-hardening,
	linux-hyperv, linux-kernel, linux-security-module, qemu-devel,
	virtualization, x86, xen-devel

From: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>

When permissions are changed on an existing mapping, update the
permissions counters.
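
The translation from PTE bits to permission changes can be sketched in
user space as follows. This is only an illustration: the PTE_* and ATTR_*
constants and the on_attr_set()/on_attr_clear() helper names are
assumptions for this example. The point it shows is the one the diff below
relies on: setting NX removes execute permission, and clearing a bit is
handled by swapping the set and clear outputs.

```c
#include <assert.h>

/* Stand-ins for _PAGE_RW and _PAGE_NX (bit 1 and bit 63 on x86-64). */
#define PTE_RW (1ULL << 1)
#define PTE_NX (1ULL << 63)

/* Illustrative stand-ins for the MEM_ATTR_* bits; the values are assumed. */
#define ATTR_WRITE (1UL << 1)
#define ATTR_EXEC  (1UL << 2)

/* Translate PTE bits that are being SET into permission changes. */
static void prot_to_permissions(unsigned long long prot, unsigned long *set,
				unsigned long *clear)
{
	assert(set && clear);

	if (prot & PTE_RW)
		*set |= ATTR_WRITE;	/* RW set: write is granted */
	if (prot & PTE_NX)
		*clear |= ATTR_EXEC;	/* NX set: execute is revoked */
}

/* PTE bits are being set: use the translation as-is. */
static void on_attr_set(unsigned long long prot, unsigned long *set,
			unsigned long *clear)
{
	prot_to_permissions(prot, set, clear);
}

/* PTE bits are being cleared: the effect is inverted, so swap outputs. */
static void on_attr_clear(unsigned long long prot, unsigned long *set,
			  unsigned long *clear)
{
	prot_to_permissions(prot, clear, set);
}
```

Under these assumptions, a set_memory_rox()-style call (clearing both RW
and NX) yields set = execute and clear = write.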

Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---

Changes since v1:
* New patch
---
 arch/x86/mm/heki.c           |  9 +++++++
 arch/x86/mm/pat/set_memory.c | 51 ++++++++++++++++++++++++++++++++++++
 include/linux/heki.h         | 14 ++++++++++
 virt/heki/counters.c         | 23 ++++++++++++++++
 4 files changed, 97 insertions(+)

diff --git a/arch/x86/mm/heki.c b/arch/x86/mm/heki.c
index c495df0d8772..c0eace9e343f 100644
--- a/arch/x86/mm/heki.c
+++ b/arch/x86/mm/heki.c
@@ -54,3 +54,12 @@ unsigned long heki_flags_to_permissions(unsigned long flags)
 
 	return permissions;
 }
+
+void heki_pgprot_to_permissions(pgprot_t prot, unsigned long *set,
+				unsigned long *clear)
+{
+	if (pgprot_val(prot) & _PAGE_RW)
+		*set |= MEM_ATTR_WRITE;
+	if (pgprot_val(prot) & _PAGE_NX)
+		*clear |= MEM_ATTR_EXEC;
+}
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index bda9f129835e..6aaa1ce5692c 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -22,6 +22,7 @@
 #include <linux/cc_platform.h>
 #include <linux/set_memory.h>
 #include <linux/memregion.h>
+#include <linux/heki.h>
 
 #include <asm/e820/api.h>
 #include <asm/processor.h>
@@ -2056,11 +2057,56 @@ int clear_mce_nospec(unsigned long pfn)
 EXPORT_SYMBOL_GPL(clear_mce_nospec);
 #endif /* CONFIG_X86_64 */
 
+#ifdef CONFIG_HEKI
+
+static void heki_change_page_attr_set(unsigned long va, int numpages,
+				      pgprot_t set)
+{
+	unsigned long va_end;
+	unsigned long set_permissions = 0, clear_permissions = 0;
+
+	heki_pgprot_to_permissions(set, &set_permissions, &clear_permissions);
+	if (!(set_permissions | clear_permissions))
+		return;
+
+	va_end = va + ((unsigned long)numpages << PAGE_SHIFT);
+	heki_update(va, va_end, set_permissions, clear_permissions);
+}
+
+static void heki_change_page_attr_clear(unsigned long va, int numpages,
+					pgprot_t clear)
+{
+	unsigned long va_end;
+	unsigned long set_permissions = 0, clear_permissions = 0;
+
+	heki_pgprot_to_permissions(clear, &clear_permissions, &set_permissions);
+	if (!(set_permissions | clear_permissions))
+		return;
+
+	va_end = va + ((unsigned long)numpages << PAGE_SHIFT);
+	heki_update(va, va_end, set_permissions, clear_permissions);
+}
+
+#else /* !CONFIG_HEKI */
+
+static void heki_change_page_attr_set(unsigned long va, int numpages,
+				      pgprot_t set)
+{
+}
+
+static void heki_change_page_attr_clear(unsigned long va, int numpages,
+					pgprot_t clear)
+{
+}
+
+#endif /* CONFIG_HEKI */
+
 int set_memory_x(unsigned long addr, int numpages)
 {
 	if (!(__supported_pte_mask & _PAGE_NX))
 		return 0;
 
+	heki_change_page_attr_clear(addr, numpages, __pgprot(_PAGE_NX));
 	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_NX), 0);
 }
 
@@ -2069,11 +2115,14 @@ int set_memory_nx(unsigned long addr, int numpages)
 	if (!(__supported_pte_mask & _PAGE_NX))
 		return 0;
 
+	heki_change_page_attr_set(addr, numpages, __pgprot(_PAGE_NX));
 	return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_NX), 0);
 }
 
 int set_memory_ro(unsigned long addr, int numpages)
 {
+	// TODO: What about _PAGE_DIRTY?
+	heki_change_page_attr_clear(addr, numpages, __pgprot(_PAGE_RW));
 	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW | _PAGE_DIRTY), 0);
 }
 
@@ -2084,11 +2133,13 @@ int set_memory_rox(unsigned long addr, int numpages)
 	if (__supported_pte_mask & _PAGE_NX)
 		clr.pgprot |= _PAGE_NX;
 
+	heki_change_page_attr_clear(addr, numpages, clr);
 	return change_page_attr_clear(&addr, numpages, clr, 0);
 }
 
 int set_memory_rw(unsigned long addr, int numpages)
 {
+	heki_change_page_attr_set(addr, numpages, __pgprot(_PAGE_RW));
 	return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_RW), 0);
 }
 
diff --git a/include/linux/heki.h b/include/linux/heki.h
index d660994d34d0..079b34af07f0 100644
--- a/include/linux/heki.h
+++ b/include/linux/heki.h
@@ -73,6 +73,7 @@ struct heki_hypervisor {
  *
  *	- a page is mapped into the kernel address space
  *	- a page is unmapped from the kernel address space
+ *	- permissions are changed for a mapped page
  */
 struct heki {
 	struct heki_hypervisor *hypervisor;
@@ -81,6 +82,7 @@ struct heki {
 
 enum heki_cmd {
 	HEKI_MAP,
+	HEKI_UPDATE,
 	HEKI_UNMAP,
 };
 
@@ -98,6 +100,10 @@ struct heki_args {
 
 	/* Command passed by caller. */
 	enum heki_cmd cmd;
+
+	/* Permissions passed by heki_update(). */
+	unsigned long set;
+	unsigned long clear;
 };
 
 /* Callback function called by the table walker. */
@@ -114,11 +120,15 @@ void heki_counters_init(void);
 void heki_walk(unsigned long va, unsigned long va_end, heki_func_t func,
 	       struct heki_args *args);
 void heki_map(unsigned long va, unsigned long end);
+void heki_update(unsigned long va, unsigned long end, unsigned long set,
+		 unsigned long clear);
 void heki_unmap(unsigned long va, unsigned long end);
 
 /* Arch-specific functions. */
 void heki_arch_early_init(void);
 unsigned long heki_flags_to_permissions(unsigned long flags);
+void heki_pgprot_to_permissions(pgprot_t prot, unsigned long *set,
+				unsigned long *clear);
 
 #else /* !CONFIG_HEKI */
 
@@ -131,6 +141,10 @@ static inline void heki_late_init(void)
 static inline void heki_map(unsigned long va, unsigned long end)
 {
 }
+static inline void heki_update(unsigned long va, unsigned long end,
+			       unsigned long set, unsigned long clear)
+{
+}
 static inline void heki_unmap(unsigned long va, unsigned long end)
 {
 }
diff --git a/virt/heki/counters.c b/virt/heki/counters.c
index adc8d566b8a9..d0f830b0775a 100644
--- a/virt/heki/counters.c
+++ b/virt/heki/counters.c
@@ -88,6 +88,13 @@ void heki_callback(struct heki_args *args)
 			heki_update_counters(counters, 0, permissions, 0);
 			break;
 
+		case HEKI_UPDATE:
+			if (!counters)
+				continue;
+			heki_update_counters(counters, permissions, args->set,
+					     args->clear);
+			break;
+
 		case HEKI_UNMAP:
 			if (WARN_ON_ONCE(!counters))
 				break;
@@ -131,6 +138,22 @@ void heki_map(unsigned long va, unsigned long end)
 	heki_func(va, end, &args);
 }
 
+/*
+ * Find the mappings in the given range and update permission counters for
+ * them. Apply permissions in the host page table.
+ */
+void heki_update(unsigned long va, unsigned long end, unsigned long set,
+		 unsigned long clear)
+{
+	struct heki_args args = {
+		.cmd = HEKI_UPDATE,
+		.set = set,
+		.clear = clear,
+	};
+
+	heki_func(va, end, &args);
+}
+
 /*
  * Find the mappings in the given range and revert the permission counters for
  * them.
-- 
2.42.1



* [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching
  2023-11-13  2:23 [RFC PATCH v2 00/19] Hypervisor-Enforced Kernel Integrity Mickaël Salaün
                   ` (15 preceding siblings ...)
  2023-11-13  2:23 ` [RFC PATCH v2 16/19] heki: x86: Update permissions counters when guest page permissions change Mickaël Salaün
@ 2023-11-13  2:23 ` Mickaël Salaün
  2023-11-13  8:19   ` Peter Zijlstra
  2023-11-13  2:23 ` [RFC PATCH v2 18/19] heki: x86: Protect guest kernel memory using the KVM hypervisor Mickaël Salaün
  2023-11-13  2:23 ` [RFC PATCH v2 19/19] virt: Add Heki KUnit tests Mickaël Salaün
  18 siblings, 1 reply; 44+ messages in thread
From: Mickaël Salaün @ 2023-11-13  2:23 UTC (permalink / raw)
  To: Borislav Petkov, Dave Hansen, H . Peter Anvin, Ingo Molnar,
	Kees Cook, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Vitaly Kuznetsov, Wanpeng Li
  Cc: Mickaël Salaün, Alexander Graf, Chao Peng, Edgecombe,
	Rick P, Forrest Yuan Yu, James Gowans, James Morris,
	John Andersen, Madhavan T . Venkataraman, Marian Rotariu,
	Mihai Donțu, Nicușor Cîțu, Thara Gopinath,
	Trilok Soni, Wei Liu, Will Deacon, Yu Zhang, Zahra Tarkhani,
	Ștefan Șicleru, dev, kvm, linux-hardening,
	linux-hyperv, linux-kernel, linux-security-module, qemu-devel,
	virtualization, x86, xen-devel

From: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>

x86 uses __text_poke() to modify executable code. This patching function
is used by many features, such as Kprobes and ftrace.

Update the permissions counters for the text page so that write
permissions can be temporarily established in the EPT to modify the
instructions in that page.
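
The bracketing can be pictured with a small user-space model. This is a
sketch under assumptions, not the patch's code: struct text_page,
poke_start(), poke_end() and page_writable() are hypothetical names, and
the temporary poking mapping is assumed to contribute read and write
permissions only.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-page counters: how many mappings grant each permission. */
struct text_page {
	int read;
	int write;
	int execute;
};

/* The temporary poking mapping is writable, so it adds a write reference. */
static void poke_start(struct text_page *p)
{
	assert(p);
	p->read++;
	p->write++;
}

/* Dropping the poking mapping removes the references it added. */
static void poke_end(struct text_page *p)
{
	assert(p && p->read > 0 && p->write > 0);
	p->read--;
	p->write--;
}

/* Write access is needed only while some mapping asks for it. */
static bool page_writable(const struct text_page *p)
{
	return p->write > 0;
}
```

In this model a text page that is normally read and execute only becomes
writable between poke_start() and poke_end(), and returns to its previous
state afterwards.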

Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---

Changes since v1:
* New patch
---
 arch/x86/kernel/alternative.c |  5 ++++
 arch/x86/mm/heki.c            | 49 +++++++++++++++++++++++++++++++++++
 include/linux/heki.h          | 14 ++++++++++
 3 files changed, 68 insertions(+)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 517ee01503be..64fd8757ba5c 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -18,6 +18,7 @@
 #include <linux/mmu_context.h>
 #include <linux/bsearch.h>
 #include <linux/sync_core.h>
+#include <linux/heki.h>
 #include <asm/text-patching.h>
 #include <asm/alternative.h>
 #include <asm/sections.h>
@@ -1801,6 +1802,7 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
 	 */
 	pgprot = __pgprot(pgprot_val(PAGE_KERNEL) & ~_PAGE_GLOBAL);
 
+	heki_text_poke_start(pages, cross_page_boundary ? 2 : 1, pgprot);
 	/*
 	 * The lock is not really needed, but this allows to avoid open-coding.
 	 */
@@ -1865,7 +1867,10 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
 	}
 
 	local_irq_restore(flags);
+
 	pte_unmap_unlock(ptep, ptl);
+	heki_text_poke_end(pages, cross_page_boundary ? 2 : 1, pgprot);
+
 	return addr;
 }
 
diff --git a/arch/x86/mm/heki.c b/arch/x86/mm/heki.c
index c0eace9e343f..e4c60d8b4f2d 100644
--- a/arch/x86/mm/heki.c
+++ b/arch/x86/mm/heki.c
@@ -5,8 +5,11 @@
  * Copyright © 2023 Microsoft Corporation
  */
 
+#include <asm/pgtable.h>
+#include <asm/text-patching.h>
 #include <linux/heki.h>
 #include <linux/kvm_mem_attr.h>
+#include <linux/mm.h>
 
 #ifdef pr_fmt
 #undef pr_fmt
@@ -63,3 +66,49 @@ void heki_pgprot_to_permissions(pgprot_t prot, unsigned long *set,
 	if (pgprot_val(prot) & _PAGE_NX)
 		*clear |= MEM_ATTR_EXEC;
 }
+
+static unsigned long heki_pgprot_to_flags(pgprot_t prot)
+{
+	unsigned long flags = 0;
+
+	if (pgprot_val(prot) & _PAGE_RW)
+		flags |= _PAGE_RW;
+	if (pgprot_val(prot) & _PAGE_NX)
+		flags |= _PAGE_NX;
+	return flags;
+}
+
+static void heki_text_poke_common(struct page **pages, int npages,
+				  pgprot_t prot, enum heki_cmd cmd)
+{
+	struct heki_args args = {
+		.cmd = cmd,
+	};
+	unsigned long va = poking_addr;
+	int i;
+
+	if (!heki.counters)
+		return;
+
+	mutex_lock(&heki_lock);
+
+	for (i = 0; i < npages; i++, va += PAGE_SIZE) {
+		args.va = va;
+		args.pa = page_to_pfn(pages[i]) << PAGE_SHIFT;
+		args.size = PAGE_SIZE;
+		args.flags = heki_pgprot_to_flags(prot);
+		heki_callback(&args);
+	}
+
+	mutex_unlock(&heki_lock);
+}
+
+void heki_text_poke_start(struct page **pages, int npages, pgprot_t prot)
+{
+	heki_text_poke_common(pages, npages, prot, HEKI_MAP);
+}
+
+void heki_text_poke_end(struct page **pages, int npages, pgprot_t prot)
+{
+	heki_text_poke_common(pages, npages, prot, HEKI_UNMAP);
+}
diff --git a/include/linux/heki.h b/include/linux/heki.h
index 079b34af07f0..6f2cfddc6dac 100644
--- a/include/linux/heki.h
+++ b/include/linux/heki.h
@@ -111,6 +111,7 @@ typedef void (*heki_func_t)(struct heki_args *args);
 
 extern struct heki heki;
 extern bool heki_enabled;
+extern struct mutex heki_lock;
 
 extern bool __read_mostly enable_mbec;
 
@@ -123,12 +124,15 @@ void heki_map(unsigned long va, unsigned long end);
 void heki_update(unsigned long va, unsigned long end, unsigned long set,
 		 unsigned long clear);
 void heki_unmap(unsigned long va, unsigned long end);
+void heki_callback(struct heki_args *args);
 
 /* Arch-specific functions. */
 void heki_arch_early_init(void);
 unsigned long heki_flags_to_permissions(unsigned long flags);
 void heki_pgprot_to_permissions(pgprot_t prot, unsigned long *set,
 				unsigned long *clear);
+void heki_text_poke_start(struct page **pages, int npages, pgprot_t prot);
+void heki_text_poke_end(struct page **pages, int npages, pgprot_t prot);
 
 #else /* !CONFIG_HEKI */
 
@@ -149,6 +153,16 @@ static inline void heki_unmap(unsigned long va, unsigned long end)
 {
 }
 
+/* Arch-specific functions. */
+static inline void heki_text_poke_start(struct page **pages, int npages,
+					pgprot_t prot)
+{
+}
+static inline void heki_text_poke_end(struct page **pages, int npages,
+				      pgprot_t prot)
+{
+}
+
 #endif /* CONFIG_HEKI */
 
 #endif /* __HEKI_H__ */
-- 
2.42.1



* [RFC PATCH v2 18/19] heki: x86: Protect guest kernel memory using the KVM hypervisor
  2023-11-13  2:23 [RFC PATCH v2 00/19] Hypervisor-Enforced Kernel Integrity Mickaël Salaün
                   ` (16 preceding siblings ...)
  2023-11-13  2:23 ` [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching Mickaël Salaün
@ 2023-11-13  2:23 ` Mickaël Salaün
  2023-11-13  8:54   ` Peter Zijlstra
  2023-11-13  2:23 ` [RFC PATCH v2 19/19] virt: Add Heki KUnit tests Mickaël Salaün
  18 siblings, 1 reply; 44+ messages in thread
From: Mickaël Salaün @ 2023-11-13  2:23 UTC (permalink / raw)
  To: Borislav Petkov, Dave Hansen, H . Peter Anvin, Ingo Molnar,
	Kees Cook, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Vitaly Kuznetsov, Wanpeng Li
  Cc: Mickaël Salaün, Alexander Graf, Chao Peng, Edgecombe,
	Rick P, Forrest Yuan Yu, James Gowans, James Morris,
	John Andersen, Madhavan T . Venkataraman, Marian Rotariu,
	Mihai Donțu, Nicușor Cîțu, Thara Gopinath,
	Trilok Soni, Wei Liu, Will Deacon, Yu Zhang, Zahra Tarkhani,
	Ștefan Șicleru, dev, kvm, linux-hardening,
	linux-hyperv, linux-kernel, linux-security-module, qemu-devel,
	virtualization, x86, xen-devel

From: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>

Implement a hypervisor function, kvm_protect_memory(), that issues the
KVM_HC_PROTECT_MEMORY hypercall to ask the KVM hypervisor to set the
specified permissions on a list of guest pages.

Using the protect_memory() function, set proper EPT permissions for all
guest pages.
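
How permissions are derived from the counters can be sketched in user
space as below. It follows the shape of heki_permissions() in this patch,
but the ATTR_* values, struct perm_counters and the function names are
assumptions for this example. A page with no counters, or with all
counters at zero, falls back to the default of read plus write.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the MEM_ATTR_* bits; the values are assumed. */
#define ATTR_READ  (1UL << 0)
#define ATTR_WRITE (1UL << 1)
#define ATTR_EXEC  (1UL << 2)

struct perm_counters {
	int read;
	int write;
	int execute;
};

/* Pages that the kernel does not map get read and write by default. */
static unsigned long default_permissions(void)
{
	return ATTR_READ | ATTR_WRITE;
}

/* A permission is granted when at least one mapping asks for it. */
static unsigned long effective_permissions(const struct perm_counters *c)
{
	unsigned long perm = 0;

	if (!c)
		return default_permissions();

	assert(c->read >= 0 && c->write >= 0 && c->execute >= 0);

	if (c->read)
		perm |= ATTR_READ;
	if (c->write)
		perm |= ATTR_WRITE;
	if (c->execute)
		perm |= ATTR_EXEC;
	if (!perm)
		perm = default_permissions();
	return perm;
}
```

A kernel text page with only read and execute references therefore ends up
without write permission in this model.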

Use the MEM_ATTR_IMMUTABLE property to protect the kernel static
sections and the boot-time read-only sections. This ensures that a
compromised guest cannot change the permissions of its main physical
memory pages. However, it also disables any feature that may change the
kernel's text section (e.g., ftrace, Kprobes), although these can still
be used on kernel modules.

Module loading/unloading and eBPF JIT are allowed without restrictions
for now, but we'll need a way to authenticate these code changes to
really improve the guests' security. We plan to use module signatures,
but there is no solution yet to authenticate eBPF programs.

Being able to use ftrace and Kprobes in a secure way is a challenge not
solved yet. We're looking for ideas to make this work.

Likewise, the JUMP_LABEL feature cannot work because the kernel's text
section is read-only.

Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Co-developed-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---

Changes since v1:
* New patch
---
 arch/x86/kernel/kvm.c  | 11 ++++++
 arch/x86/kvm/mmu/mmu.c |  2 +-
 arch/x86/mm/heki.c     | 21 ++++++++++
 include/linux/heki.h   | 26 ++++++++++++
 virt/heki/Kconfig      |  1 +
 virt/heki/counters.c   | 90 ++++++++++++++++++++++++++++++++++++++++--
 virt/heki/main.c       | 83 +++++++++++++++++++++++++++++++++++++-
 7 files changed, 229 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 8349f4ad3bbd..343615b0e3bf 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -1021,8 +1021,19 @@ static int kvm_lock_crs(void)
 	return err;
 }
 
+static int kvm_protect_memory(gpa_t pa)
+{
+	long err;
+
+	WARN_ON_ONCE(in_interrupt());
+
+	err = kvm_hypercall1(KVM_HC_PROTECT_MEMORY, pa);
+	return err;
+}
+
 static struct heki_hypervisor kvm_heki_hypervisor = {
 	.lock_crs = kvm_lock_crs,
+	.protect_memory = kvm_protect_memory,
 };
 
 static void kvm_init_heki(void)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2d09bcc35462..13be05e9ccf1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7374,7 +7374,7 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
 	int level;
 
 	lockdep_assert_held_write(&kvm->mmu_lock);
-	lockdep_assert_held(&kvm->slots_lock);
+	lockdep_assert_held(&kvm->slots_arch_lock);
 
 	/*
 	 * The sequence matters here: upper levels consume the result of lower
diff --git a/arch/x86/mm/heki.c b/arch/x86/mm/heki.c
index e4c60d8b4f2d..6c3fa9defada 100644
--- a/arch/x86/mm/heki.c
+++ b/arch/x86/mm/heki.c
@@ -45,6 +45,19 @@ __init void heki_arch_early_init(void)
 	heki_map(direct_map_end, kernel_end);
 }
 
+void heki_arch_late_init(void)
+{
+	/*
+	 * The permission counters for all existing kernel mappings have
+	 * already been updated. Now, walk all the pages, compute their
+	 * permissions from the counters and apply the permissions in the
+	 * host page table. To accomplish this, we walk the direct map
+	 * range.
+	 */
+	heki_protect(direct_map_va, direct_map_end);
+	pr_warn("Guest memory protected\n");
+}
+
 unsigned long heki_flags_to_permissions(unsigned long flags)
 {
 	unsigned long permissions;
@@ -67,6 +80,11 @@ void heki_pgprot_to_permissions(pgprot_t prot, unsigned long *set,
 		*clear |= MEM_ATTR_EXEC;
 }
 
+unsigned long heki_default_permissions(void)
+{
+	return MEM_ATTR_READ | MEM_ATTR_WRITE;
+}
+
 static unsigned long heki_pgprot_to_flags(pgprot_t prot)
 {
 	unsigned long flags = 0;
@@ -100,6 +118,9 @@ static void heki_text_poke_common(struct page **pages, int npages,
 		heki_callback(&args);
 	}
 
+	if (args.head)
+		heki_apply_permissions(&args);
+
 	mutex_unlock(&heki_lock);
 }
 
diff --git a/include/linux/heki.h b/include/linux/heki.h
index 6f2cfddc6dac..306bcec7ae92 100644
--- a/include/linux/heki.h
+++ b/include/linux/heki.h
@@ -15,6 +15,8 @@
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/printk.h>
+#include <linux/mm.h>
+#include <linux/memblock.h>
 #include <linux/slab.h>
 
 #ifdef CONFIG_HEKI
@@ -61,6 +63,7 @@ struct heki_page_list {
  */
 struct heki_hypervisor {
 	int (*lock_crs)(void); /* Lock control registers. */
+	int (*protect_memory)(gpa_t pa); /* Protect guest memory */
 };
 
 /*
@@ -74,16 +77,28 @@ struct heki_hypervisor {
  *	- a page is mapped into the kernel address space
  *	- a page is unmapped from the kernel address space
  *	- permissions are changed for a mapped page
+ *
+ * At the end of kernel boot (before kicking off the init process), the
+ * permissions for guest pages are applied to the host page table.
+ *
+ * Beyond that point, the counters and host page table permissions are updated
+ * whenever:
+ *
+ *	- a guest page is mapped into the kernel address space
+ *	- a guest page is unmapped from the kernel address space
+ *	- permissions are changed for a mapped guest page
  */
 struct heki {
 	struct heki_hypervisor *hypervisor;
 	struct mem_table *counters;
+	bool protect_memory;
 };
 
 enum heki_cmd {
 	HEKI_MAP,
 	HEKI_UPDATE,
 	HEKI_UNMAP,
+	HEKI_PROTECT_MEMORY,
 };
 
 /*
@@ -103,7 +118,12 @@ struct heki_args {
 
 	/* Permissions passed by heki_update(). */
 	unsigned long set;
+	unsigned long set_global;
 	unsigned long clear;
+
+	/* Page list is built by the callback. */
+	struct heki_page_list *head;
+	phys_addr_t head_pa;
 };
 
 /* Callback function called by the table walker. */
@@ -125,14 +145,20 @@ void heki_update(unsigned long va, unsigned long end, unsigned long set,
 		 unsigned long clear);
 void heki_unmap(unsigned long va, unsigned long end);
 void heki_callback(struct heki_args *args);
+void heki_protect(unsigned long va, unsigned long end);
+void heki_add_pa(struct heki_args *args, phys_addr_t pa,
+		 unsigned long permissions);
+void heki_apply_permissions(struct heki_args *args);
 
 /* Arch-specific functions. */
 void heki_arch_early_init(void);
+void heki_arch_late_init(void);
 unsigned long heki_flags_to_permissions(unsigned long flags);
 void heki_pgprot_to_permissions(pgprot_t prot, unsigned long *set,
 				unsigned long *clear);
 void heki_text_poke_start(struct page **pages, int npages, pgprot_t prot);
 void heki_text_poke_end(struct page **pages, int npages, pgprot_t prot);
+unsigned long heki_default_permissions(void);
 
 #else /* !CONFIG_HEKI */
 
diff --git a/virt/heki/Kconfig b/virt/heki/Kconfig
index 6d956eb9d04b..9bde84cd759e 100644
--- a/virt/heki/Kconfig
+++ b/virt/heki/Kconfig
@@ -8,6 +8,7 @@ config HEKI
 	select KVM_GENERIC_MEMORY_ATTRIBUTES
 	depends on !X86_16BIT
 	select SPARSEMEM
+	depends on !JUMP_LABEL
 	help
 	  This feature enhances guest virtual machine security by taking
 	  advantage of security features provided by the hypervisor for guests.
diff --git a/virt/heki/counters.c b/virt/heki/counters.c
index d0f830b0775a..302113bee6ff 100644
--- a/virt/heki/counters.c
+++ b/virt/heki/counters.c
@@ -13,6 +13,25 @@
 
 DEFINE_MUTEX(heki_lock);
 
+static inline unsigned long heki_permissions(struct heki_counters *counters)
+{
+	unsigned long permissions;
+
+	if (!counters)
+		return heki_default_permissions();
+
+	permissions = 0;
+	if (counters->read)
+		permissions |= MEM_ATTR_READ;
+	if (counters->write)
+		permissions |= MEM_ATTR_WRITE;
+	if (counters->execute)
+		permissions |= MEM_ATTR_EXEC;
+	if (!permissions)
+		permissions = heki_default_permissions();
+	return permissions;
+}
+
 static void heki_update_counters(struct heki_counters *counters,
 				 unsigned long perm, unsigned long set,
 				 unsigned long clear)
@@ -53,20 +72,38 @@ static struct heki_counters *heki_create_counters(struct mem_table *table,
 	return counters;
 }
 
+static void heki_check_counters(struct heki_counters *counters,
+				unsigned long permissions)
+{
+	/*
+	 * If a permission has been added to a PTE directly, it will not be
+	 * reflected in the counters. Adjust for that. This is a bit of a
+	 * hack, really.
+	 */
+	if ((permissions & MEM_ATTR_READ) && !counters->read)
+		counters->read++;
+	if ((permissions & MEM_ATTR_WRITE) && !counters->write)
+		counters->write++;
+	if ((permissions & MEM_ATTR_EXEC) && !counters->execute)
+		counters->execute++;
+}
+
 void heki_callback(struct heki_args *args)
 {
 	/* The VA is only for debug. It is not really used in this function. */
 	unsigned long va;
 	phys_addr_t pa, pa_end;
-	unsigned long permissions;
+	unsigned long permissions, existing, new;
 	void **entry;
 	struct heki_counters *counters;
 	unsigned int ignore;
+	bool protect_memory;
 
 	if (!pfn_valid(args->pa >> PAGE_SHIFT))
 		return;
 
 	permissions = heki_flags_to_permissions(args->flags);
+	protect_memory = heki.protect_memory;
 
 	/*
 	 * Handle counters for a leaf entry in the kernel page table.
@@ -80,6 +117,8 @@ void heki_callback(struct heki_args *args)
 		else
 			counters = NULL;
 
+		existing = heki_permissions(counters);
+
 		switch (args->cmd) {
 		case HEKI_MAP:
 			if (!counters)
@@ -102,10 +141,30 @@ void heki_callback(struct heki_args *args)
 					     permissions);
 			break;
 
+		case HEKI_PROTECT_MEMORY:
+			if (counters)
+				heki_check_counters(counters, permissions);
+			existing = 0;
+			break;
+
 		default:
 			WARN_ON_ONCE(1);
 			break;
 		}
+
+		new = heki_permissions(counters) | args->set_global;
+
+		/*
+		 * To be able to use a pool of allocated memory for new
+		 * executable or read-only mappings (e.g., kernel module
+		 * loading), ignore the immutable attribute if the memory
+		 * is writable.
+		 */
+		if (new & MEM_ATTR_WRITE)
+			new &= ~MEM_ATTR_IMMUTABLE;
+
+		if (protect_memory && existing != new)
+			heki_add_pa(args, pa, new);
 	}
 }
 
@@ -120,14 +179,20 @@ static void heki_func(unsigned long va, unsigned long end,
 
 	mutex_lock(&heki_lock);
 
+	if (args->cmd == HEKI_PROTECT_MEMORY)
+		heki.protect_memory = true;
+
 	heki_walk(va, end, heki_callback, args);
 
+	if (args->head)
+		heki_apply_permissions(args);
+
 	mutex_unlock(&heki_lock);
 }
 
 /*
  * Find the mappings in the given range and initialize permission counters for
- * them.
+ * them. Apply permissions in the host page table.
  */
 void heki_map(unsigned long va, unsigned long end)
 {
@@ -138,6 +203,25 @@ void heki_map(unsigned long va, unsigned long end)
 	heki_func(va, end, &args);
 }
 
+/*
+ * The architecture calls this to protect all guest pages at the end of
+ * kernel init. Up to this point, only the counters for guest pages have been
+ * updated. No permissions have been applied on the host page table.
+ * Now, the permissions will be applied.
+ *
+ * Beyond this point, the host page table permissions will always be updated
+ * whenever the counters are updated.
+ */
+void heki_protect(unsigned long va, unsigned long end)
+{
+	struct heki_args args = {
+		.cmd = HEKI_PROTECT_MEMORY,
+		.set_global = MEM_ATTR_IMMUTABLE,
+	};
+
+	heki_func(va, end, &args);
+}
+
 /*
  * Find the mappings in the given range and update permission counters for
  * them. Apply permissions in the host page table.
@@ -156,7 +240,7 @@ void heki_update(unsigned long va, unsigned long end, unsigned long set,
 
 /*
  * Find the mappings in the given range and revert the permission counters for
- * them.
+ * them. Apply permissions in the host page table.
  */
 void heki_unmap(unsigned long va, unsigned long end)
 {
diff --git a/virt/heki/main.c b/virt/heki/main.c
index 0ab7de659e6f..5629334112e7 100644
--- a/virt/heki/main.c
+++ b/virt/heki/main.c
@@ -51,7 +51,7 @@ void heki_late_init(void)
 {
 	struct heki_hypervisor *hypervisor = heki.hypervisor;
 
-	if (!heki_enabled || !heki.hypervisor)
+	if (!heki.counters)
 		return;
 
 	/* Locks control registers so a compromised guest cannot change them. */
@@ -59,6 +59,87 @@ void heki_late_init(void)
 		return;
 
 	pr_warn("Control registers locked\n");
+
+	heki_arch_late_init();
+}
+
+/*
+ * Build a list of guest pages with their permissions. This list will be
+ * passed to the VMM/Hypervisor to set these permissions in the host page
+ * table.
+ */
+void heki_add_pa(struct heki_args *args, phys_addr_t pa,
+		 unsigned long permissions)
+{
+	struct heki_page_list *list = args->head;
+	struct heki_pages *hpage;
+	u64 max_pages;
+	struct page *page;
+	bool new = false;
+
+	max_pages = (PAGE_SIZE - sizeof(*list)) / sizeof(*hpage);
+again:
+	if (!list || list->npages == max_pages) {
+		page = alloc_page(GFP_KERNEL);
+		if (WARN_ON_ONCE(!page))
+			return;
+
+		list = page_address(page);
+		list->npages = 0;
+		list->next_pa = args->head_pa;
+		list->next = args->head;
+
+		args->head = list;
+		args->head_pa = page_to_pfn(page) << PAGE_SHIFT;
+		new = true;
+	}
+
+	hpage = &list->pages[list->npages];
+	if (new) {
+		hpage->pa = pa;
+		hpage->epa = pa + PAGE_SIZE;
+		hpage->permissions = permissions;
+		return;
+	}
+
+	if (pa == hpage->epa && permissions == hpage->permissions) {
+		hpage->epa += PAGE_SIZE;
+		return;
+	}
+
+	list->npages++;
+	new = true;
+	goto again;
+}
+
+void heki_apply_permissions(struct heki_args *args)
+{
+	struct heki_hypervisor *hypervisor = heki.hypervisor;
+	struct heki_page_list *list = args->head;
+	phys_addr_t list_pa = args->head_pa;
+	struct page *page;
+	int ret;
+
+	if (!list)
+		return;
+
+	/* The very last one must be included. */
+	list->npages++;
+
+	/* Protect guest memory in the host page table. */
+	ret = hypervisor->protect_memory(list_pa);
+	if (ret) {
+		pr_warn("Failed to set memory permission\n");
+		return;
+	}
+
+	/* Free all the pages in the page list. */
+	while (list) {
+		page = pfn_to_page(list_pa >> PAGE_SHIFT);
+		list_pa = list->next_pa;
+		list = list->next;
+		__free_pages(page, 0);
+	}
 }
 
 static int __init heki_parse_config(char *str)
-- 
2.42.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [RFC PATCH v2 19/19] virt: Add Heki KUnit tests
  2023-11-13  2:23 [RFC PATCH v2 00/19] Hypervisor-Enforced Kernel Integrity Mickaël Salaün
                   ` (17 preceding siblings ...)
  2023-11-13  2:23 ` [RFC PATCH v2 18/19] heki: x86: Protect guest kernel memory using the KVM hypervisor Mickaël Salaün
@ 2023-11-13  2:23 ` Mickaël Salaün
  18 siblings, 0 replies; 44+ messages in thread
From: Mickaël Salaün @ 2023-11-13  2:23 UTC (permalink / raw)
  To: Borislav Petkov, Dave Hansen, H . Peter Anvin, Ingo Molnar,
	Kees Cook, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Vitaly Kuznetsov, Wanpeng Li
  Cc: Mickaël Salaün, Alexander Graf, Chao Peng, Edgecombe,
	Rick P, Forrest Yuan Yu, James Gowans, James Morris,
	John Andersen, Madhavan T . Venkataraman, Marian Rotariu,
	Mihai Donțu, Nicușor Cîțu, Thara Gopinath,
	Trilok Soni, Wei Liu, Will Deacon, Yu Zhang, Zahra Tarkhani,
	Ștefan Șicleru, dev, kvm, linux-hardening,
	linux-hyperv, linux-kernel, linux-security-module, qemu-devel,
	virtualization, x86, xen-devel

This adds a new CONFIG_HEKI_TEST option to run tests at boot. Because the
tests use symbols that are not exported to modules (e.g.,
kernel_set_to_readonly), they cannot be built as a module.

To run a test, boot the kernel with the heki_test=N boot argument, where
N selects a specific test:
1. heki_test_cr_disable_smep: Check CR pinning and try to disable SMEP.
2. heki_test_write_to_const: Check .rodata (const) protection.
3. heki_test_write_to_ro_after_init: Check __ro_after_init protection.
4. heki_test_exec: Check non-executable kernel memory.

Selecting tests this way should no longer be required once the kernel
properly handles the synthetic page faults that the tests trigger.  For
now, these page faults make the kernel loop.
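
The heki_test=N selection described above can be sketched in plain
userspace C. This is only an illustration, not the code from this patch:
the sketch_* names are hypothetical, and the table holds the test names
from the list above instead of real KUnit cases. It shows the intended
mapping: N is a 1-based index, 0 means "run nothing", and anything out of
range or non-numeric is rejected.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the four KUnit cases listed above. */
static const char *const sketch_cases[] = {
	"heki_test_cr_disable_smep",
	"heki_test_write_to_const",
	"heki_test_write_to_ro_after_init",
	"heki_test_exec",
};

#define SKETCH_NCASES (sizeof(sketch_cases) / sizeof(sketch_cases[0]))

/*
 * Map the string following "heki_test=" to a test name.
 * Returns NULL for 0 (no test selected), for out-of-range values, and
 * for input that is not a plain decimal number.
 */
static const char *sketch_select_case(const char *arg)
{
	char *end;
	unsigned long n;

	errno = 0;
	n = strtoul(arg, &end, 10);
	if (errno || end == arg || *end != '\0')
		return NULL;
	if (n == 0 || n > SKETCH_NCASES)
		return NULL;

	/* N is 1-based on the command line, the table is 0-based. */
	return sketch_cases[n - 1];
}
```

With this mapping, "heki_test=2" selects heki_test_write_to_const, while
"heki_test=0" and "heki_test=5" select nothing.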

All these tests temporarily disable the related kernel self-protections
and should then fail if Heki doesn't protect the kernel.  They are
verbose to make it easier to understand what is going on.
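
As a rough userspace analogy only (not the kernel tests themselves), the
pass condition of the write tests can be sketched as: attempt a write to
memory that an outer layer keeps read-only, and check that the write
faults and the content is unchanged. Here mprotect-style page protection
(via a PROT_READ mapping) stands in for the hypervisor, a signal handler
stands in for the fault handling the kernel does not have yet, and
sketch_write_to_ro() is a hypothetical name.

```c
#define _DEFAULT_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf sketch_env;

/* Jump back to the test instead of retrying the faulting write forever. */
static void sketch_on_fault(int sig)
{
	(void)sig;
	siglongjmp(sketch_env, 1);
}

/*
 * Returns 1 if the write faulted and the byte is still 0 (protected),
 * 0 if the write went through (not protected), -1 on setup failure.
 */
static int sketch_write_to_ro(void)
{
	struct sigaction sa, old_segv, old_bus;
	long page_size = sysconf(_SC_PAGESIZE);
	volatile char *buf;
	int result;

	if (page_size <= 0)
		return -1;

	/* Anonymous mappings are zero-filled; map it read-only. */
	buf = mmap(NULL, (size_t)page_size, PROT_READ,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sketch_on_fault;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGSEGV, &sa, &old_segv);
	sigaction(SIGBUS, &sa, &old_bus); /* some systems raise SIGBUS */

	if (sigsetjmp(sketch_env, 1) == 0) {
		*buf = 0x11; /* expected to fault */
		result = 0;
	} else {
		result = (*buf == 0) ? 1 : 0;
	}

	sigaction(SIGSEGV, &old_segv, NULL);
	sigaction(SIGBUS, &old_bus, NULL);
	munmap((void *)buf, (size_t)page_size);
	return result;
}
```

The kernel tests differ in one important way: they first make the guest
page tables allow the access, so only the hypervisor-side permissions can
block it.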

Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
---

Changes since v1:
* Move all tests to virt/heki/tests.c
---
 include/linux/heki.h |   1 +
 virt/heki/Kconfig    |  12 +++
 virt/heki/Makefile   |   1 +
 virt/heki/main.c     |   6 +-
 virt/heki/tests.c    | 207 +++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 226 insertions(+), 1 deletion(-)
 create mode 100644 virt/heki/tests.c

diff --git a/include/linux/heki.h b/include/linux/heki.h
index 306bcec7ae92..9e2cf0051ab0 100644
--- a/include/linux/heki.h
+++ b/include/linux/heki.h
@@ -149,6 +149,7 @@ void heki_protect(unsigned long va, unsigned long end);
 void heki_add_pa(struct heki_args *args, phys_addr_t pa,
 		 unsigned long permissions);
 void heki_apply_permissions(struct heki_args *args);
+void heki_run_test(void);
 
 /* Arch-specific functions. */
 void heki_arch_early_init(void);
diff --git a/virt/heki/Kconfig b/virt/heki/Kconfig
index 9bde84cd759e..fa814a921bb0 100644
--- a/virt/heki/Kconfig
+++ b/virt/heki/Kconfig
@@ -28,3 +28,15 @@ config HYPERVISOR_SUPPORTS_HEKI
 	  A hypervisor should select this when it can successfully build
 	  and run with CONFIG_HEKI. That is, it should provide all of the
 	  hypervisor support required for the Heki feature.
+
+config HEKI_TEST
+	bool "Tests for Heki" if !KUNIT_ALL_TESTS
+	depends on HEKI && KUNIT=y
+	default KUNIT_ALL_TESTS
+	help
+	  Run Heki tests at runtime according to the heki_test=N boot
+	  parameter, with N identifying the test to run (between 1 and 4).
+
+	  Before launching the init process, the system might not respond
+	  because of an unhandled kernel page fault.  This will be fixed in
+	  a future patch series.
diff --git a/virt/heki/Makefile b/virt/heki/Makefile
index 564f92faa9d8..a66cd0ba140b 100644
--- a/virt/heki/Makefile
+++ b/virt/heki/Makefile
@@ -3,3 +3,4 @@
 obj-y += main.o
 obj-y += walk.o
 obj-y += counters.o
+obj-y += tests.o
diff --git a/virt/heki/main.c b/virt/heki/main.c
index 5629334112e7..ce9984231996 100644
--- a/virt/heki/main.c
+++ b/virt/heki/main.c
@@ -51,8 +51,10 @@ void heki_late_init(void)
 {
 	struct heki_hypervisor *hypervisor = heki.hypervisor;
 
-	if (!heki.counters)
+	if (!heki.counters) {
+		heki_run_test();
 		return;
+	}
 
 	/* Locks control registers so a compromised guest cannot change them. */
 	if (WARN_ON(hypervisor->lock_crs()))
@@ -61,6 +63,8 @@ void heki_late_init(void)
 	pr_warn("Control registers locked\n");
 
 	heki_arch_late_init();
+
+	heki_run_test();
 }
 
 /*
diff --git a/virt/heki/tests.c b/virt/heki/tests.c
new file mode 100644
index 000000000000..6e6542b257f1
--- /dev/null
+++ b/virt/heki/tests.c
@@ -0,0 +1,207 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Hypervisor Enforced Kernel Integrity (Heki) - Tests
+ *
+ * Copyright © 2023 Microsoft Corporation
+ */
+
+#include <linux/kvm_host.h>
+#include <kunit/test.h>
+#include <linux/cache.h>
+#include <linux/heki.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/printk.h>
+#include <linux/set_memory.h>
+#include <linux/types.h>
+#include <linux/vmalloc.h>
+
+#include "common.h"
+
+#ifdef CONFIG_HEKI_TEST
+
+/* Heki test data */
+
+/* Takes two pages to not change permission of other read-only pages. */
+const char heki_test_const_buf[PAGE_SIZE * 2] = {};
+char heki_test_ro_after_init_buf[PAGE_SIZE * 2] __ro_after_init = {};
+
+long heki_test_exec_data(long);
+void _test_exec_data_end(void);
+
+/* Used to test ROP execution against the .rodata section. */
+/* clang-format off */
+asm(
+".pushsection .rodata;" // NOT .text section
+".global heki_test_exec_data;"
+".type heki_test_exec_data, @function;"
+"heki_test_exec_data:"
+ASM_ENDBR
+"movq %rdi, %rax;"
+"inc %rax;"
+ASM_RET
+".size heki_test_exec_data, .-heki_test_exec_data;"
+"_test_exec_data_end:"
+".popsection");
+/* clang-format on */
+
+static void heki_test_cr_disable_smep(struct kunit *test)
+{
+	unsigned long cr4;
+
+	/* SMEP should be initially enabled. */
+	KUNIT_ASSERT_TRUE(test, __read_cr4() & X86_CR4_SMEP);
+
+	kunit_warn(test,
+		   "Starting control register pinning tests with SMEP check\n");
+
+	/*
+	 * Trying to disable SMEP, bypassing kernel self-protection by not
+	 * using cr4_clear_bits(X86_CR4_SMEP).
+	 */
+	cr4 = __read_cr4() & ~X86_CR4_SMEP;
+	asm volatile("mov %0,%%cr4" : "+r"(cr4) : : "memory");
+
+	/* SMEP should still be enabled. */
+	KUNIT_ASSERT_TRUE(test, __read_cr4() & X86_CR4_SMEP);
+}
+
+static inline void print_addr(struct kunit *test, const char *const buf_name,
+			      void *const buf)
+{
+	const pte_t pte = *virt_to_kpte((unsigned long)buf);
+	const phys_addr_t paddr = slow_virt_to_phys(buf);
+	bool present = pte_flags(pte) & (_PAGE_PRESENT);
+	bool accessible = pte_accessible(&init_mm, pte);
+
+	kunit_warn(
+		test,
+		"%s vaddr:%llx paddr:%llx exec:%d write:%d present:%d accessible:%d\n",
+		buf_name, (unsigned long long)buf, paddr, !!pte_exec(pte),
+		!!pte_write(pte), present, accessible);
+}
+
+extern int kernel_set_to_readonly;
+
+static void heki_test_write_to_rodata(struct kunit *test,
+				      const char *const buf_name,
+				      char *const ro_buf)
+{
+	print_addr(test, buf_name, (void *)ro_buf);
+	KUNIT_EXPECT_EQ(test, 0, *ro_buf);
+
+	kunit_warn(
+		test,
+		"Bypassing kernel self-protection: mark memory as writable\n");
+	kernel_set_to_readonly = 0;
+	/*
+	 * Removes execute permission that might be set by bugdoor-exec,
+	 * because change_page_attr_clear() is not used by set_memory_rw().
+	 * This is required since commit 652c5bf380ad ("x86/mm: Refuse W^X
+	 * violations").
+	 */
+	KUNIT_ASSERT_FALSE(test, set_memory_nx((unsigned long)PTR_ALIGN_DOWN(
+						       ro_buf, PAGE_SIZE),
+					       1));
+	KUNIT_ASSERT_FALSE(test, set_memory_rw((unsigned long)PTR_ALIGN_DOWN(
+						       ro_buf, PAGE_SIZE),
+					       1));
+	kernel_set_to_readonly = 1;
+
+	kunit_warn(test, "Trying memory write\n");
+	*ro_buf = 0x11;
+	KUNIT_EXPECT_EQ(test, 0, *ro_buf);
+	kunit_warn(test, "New content: 0x%02x\n", *ro_buf);
+}
+
+static void heki_test_write_to_const(struct kunit *test)
+{
+	heki_test_write_to_rodata(test, "const_buf",
+				  (void *)heki_test_const_buf);
+}
+
+static void heki_test_write_to_ro_after_init(struct kunit *test)
+{
+	heki_test_write_to_rodata(test, "ro_after_init_buf",
+				  (void *)heki_test_ro_after_init_buf);
+}
+
+typedef long test_exec_t(long);
+
+static void heki_test_exec(struct kunit *test)
+{
+	const size_t exec_size = 7;
+	unsigned long nx_page_start = (unsigned long)PTR_ALIGN_DOWN(
+		(const void *const)heki_test_exec_data, PAGE_SIZE);
+	unsigned long nx_page_end = (unsigned long)PTR_ALIGN(
+		(const void *const)heki_test_exec_data + exec_size, PAGE_SIZE);
+	test_exec_t *exec = (test_exec_t *)heki_test_exec_data;
+	long ret;
+
+	/* Starting non-executable memory tests. */
+	print_addr(test, "test_exec_data", heki_test_exec_data);
+
+	kunit_warn(
+		test,
+		"Bypassing kernel self-protection: mark memory as executable\n");
+	kernel_set_to_readonly = 0;
+	KUNIT_ASSERT_FALSE(test,
+			   set_memory_rox(nx_page_start,
+					  PFN_UP(nx_page_end - nx_page_start)));
+	kernel_set_to_readonly = 1;
+
+	kunit_warn(
+		test,
+		"Trying to execute data (ROP) in (initially) non-executable memory\n");
+	ret = exec(3);
+
+	/* This should not be reached because of the uncaught page fault. */
+	KUNIT_EXPECT_EQ(test, 3, ret);
+	kunit_warn(test, "Result of execution: 3 + 1 = %ld\n", ret);
+}
+
+const struct kunit_case heki_test_cases[] = {
+	KUNIT_CASE(heki_test_cr_disable_smep),
+	KUNIT_CASE(heki_test_write_to_const),
+	KUNIT_CASE(heki_test_write_to_ro_after_init),
+	KUNIT_CASE(heki_test_exec),
+	{}
+};
+
+static unsigned long heki_test __ro_after_init;
+
+static int __init parse_heki_test_config(char *str)
+{
+	if (kstrtoul(str, 10, &heki_test) ||
+	    heki_test > (ARRAY_SIZE(heki_test_cases) - 1))
+		pr_warn("Invalid option string for heki_test: '%s'\n", str);
+	return 1;
+}
+
+__setup("heki_test=", parse_heki_test_config);
+
+void heki_run_test(void)
+{
+	struct kunit_case heki_test_case[2] = {};
+	struct kunit_suite heki_test_suite = {
+		.name = "heki",
+		.test_cases = heki_test_case,
+	};
+	struct kunit_suite *const test_suite = &heki_test_suite;
+
+	if (!kunit_enabled() || heki_test == 0 ||
+	    heki_test >= ARRAY_SIZE(heki_test_cases))
+		return;
+
+	pr_warn("Running test #%lu\n", heki_test);
+	heki_test_case[0] = heki_test_cases[heki_test - 1];
+	__kunit_test_suites_init(&test_suite, 1);
+}
+
+#else /* CONFIG_HEKI_TEST */
+
+void heki_run_test(void)
+{
+}
+
+#endif /* CONFIG_HEKI_TEST */
-- 
2.42.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v2 11/19] KVM: x86: Add new hypercall to set EPT permissions
  2023-11-13  2:23 ` [RFC PATCH v2 11/19] KVM: x86: Add new hypercall to set EPT permissions Mickaël Salaün
@ 2023-11-13  4:45   ` kernel test robot
  0 siblings, 0 replies; 44+ messages in thread
From: kernel test robot @ 2023-11-13  4:45 UTC (permalink / raw)
  To: Mickaël Salaün; +Cc: oe-kbuild-all

Hi Mickaël,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on 881375a408c0f4ea451ff14545b59216d2923881]

url:    https://github.com/intel-lab-lkp/linux/commits/Micka-l-Sala-n/virt-Introduce-Hypervisor-Enforced-Kernel-Integrity-Heki/20231113-102847
base:   881375a408c0f4ea451ff14545b59216d2923881
patch link:    https://lore.kernel.org/r/20231113022326.24388-12-mic%40digikod.net
patch subject: [RFC PATCH v2 11/19] KVM: x86: Add new hypercall to set EPT permissions
config: openrisc-allnoconfig (https://download.01.org/0day-ci/archive/20231113/202311131256.g3KYfiJx-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231113/202311131256.g3KYfiJx-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311131256.g3KYfiJx-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/linux/heki.h:11,
                    from init/main.c:102:
>> include/linux/kvm_types.h:27:10: fatal error: asm/kvm_types.h: No such file or directory
      27 | #include <asm/kvm_types.h>
         |          ^~~~~~~~~~~~~~~~~
   compilation terminated.


vim +27 include/linux/kvm_types.h

d77a39d982431e drivers/kvm/types.h       Hollis Blanchard    2007-12-03  26  
2aa9c199cf8151 include/linux/kvm_types.h Sean Christopherson 2020-07-02 @27  #include <asm/kvm_types.h>
2aa9c199cf8151 include/linux/kvm_types.h Sean Christopherson 2020-07-02  28  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v2 14/19] heki: x86: Initialize permissions counters for pages mapped into KVA
  2023-11-13  2:23 ` [RFC PATCH v2 14/19] heki: x86: Initialize permissions counters for pages mapped into KVA Mickaël Salaün
@ 2023-11-14  1:22 ` kernel test robot
  -1 siblings, 0 replies; 44+ messages in thread
From: kernel test robot @ 2023-11-13  5:18 UTC (permalink / raw)
  To: oe-kbuild; +Cc: lkp

:::::: 
:::::: Manual check reason: "fbc 5dc2a03f8c532db0c352fd8cc08cefd8cbae43e6 and parent 44ac2f6f2e17897d27676ba4e80879929a1af167 have different kconfig"
:::::: Manual check reason: "has kconfig file changed"
:::::: 

BCC: lkp@intel.com
CC: oe-kbuild-all@lists.linux.dev
In-Reply-To: <20231113022326.24388-15-mic@digikod.net>
References: <20231113022326.24388-15-mic@digikod.net>
TO: "Mickaël Salaün" <mic@digikod.net>

Hi Mickaël,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on 881375a408c0f4ea451ff14545b59216d2923881]

url:    https://github.com/intel-lab-lkp/linux/commits/Micka-l-Sala-n/virt-Introduce-Hypervisor-Enforced-Kernel-Integrity-Heki/20231113-102847
base:   881375a408c0f4ea451ff14545b59216d2923881
patch link:    https://lore.kernel.org/r/20231113022326.24388-15-mic%40digikod.net
patch subject: [RFC PATCH v2 14/19] heki: x86: Initialize permissions counters for pages mapped into KVA
:::::: branch date: 3 hours ago
:::::: commit date: 3 hours ago
config: openrisc-allmodconfig (https://download.01.org/0day-ci/archive/20231113/202311131254.C8WZC0U9-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231113/202311131254.C8WZC0U9-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/r/202311131254.C8WZC0U9-lkp@intel.com/

All errors (new ones prefixed by >>):

   WARNING: unmet direct dependencies detected for SPARSEMEM
   Depends on [n]: !SELECT_MEMORY_MODEL [=n] && ARCH_SPARSEMEM_ENABLE || SPARSEMEM_MANUAL [=n]
   Selected by [y]:
   - HEKI [=y] && ARCH_SUPPORTS_HEKI [=y] && HYPERVISOR_SUPPORTS_HEKI [=y] && !X86_16BIT
   In file included from include/linux/mm_types.h:18,
   from include/linux/sched/signal.h:13,
   from include/linux/ptrace.h:7,
   from arch/openrisc/kernel/asm-offsets.c:28:
>> include/linux/page-flags-layout.h:30:10: fatal error: asm/sparsemem.h: No such file or directory
   30 | #include <asm/sparsemem.h>
   |          ^~~~~~~~~~~~~~~~~
   compilation terminated.
   make[3]: *** [scripts/Makefile.build:116: arch/openrisc/kernel/asm-offsets.s] Error 1
   make[3]: Target 'prepare' not remade because of errors.
   make[2]: *** [Makefile:1202: prepare0] Error 2
   make[2]: Target 'prepare' not remade because of errors.
   make[1]: *** [Makefile:234: __sub-make] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [Makefile:234: __sub-make] Error 2
   make: Target 'prepare' not remade because of errors.

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for SPARSEMEM
   Depends on [n]: !SELECT_MEMORY_MODEL [=n] && ARCH_SPARSEMEM_ENABLE || SPARSEMEM_MANUAL [=n]
   Selected by [y]:
   - HEKI [=y] && ARCH_SUPPORTS_HEKI [=y] && HYPERVISOR_SUPPORTS_HEKI [=y] && !X86_16BIT


vim +30 include/linux/page-flags-layout.h

1587db62d8c0db Yu Zhao        2021-04-29  28  
bbeae5b05ef6e4 Peter Zijlstra 2013-02-22  29  #ifdef CONFIG_SPARSEMEM
bbeae5b05ef6e4 Peter Zijlstra 2013-02-22 @30  #include <asm/sparsemem.h>
bbeae5b05ef6e4 Peter Zijlstra 2013-02-22  31  #define SECTIONS_SHIFT	(MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
1587db62d8c0db Yu Zhao        2021-04-29  32  #else
1587db62d8c0db Yu Zhao        2021-04-29  33  #define SECTIONS_SHIFT	0
1587db62d8c0db Yu Zhao        2021-04-29  34  #endif
bbeae5b05ef6e4 Peter Zijlstra 2013-02-22  35  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v2 10/19] KVM: x86: Implement per-guest-page permissions
  2023-11-13  2:23 ` [RFC PATCH v2 10/19] KVM: x86: Implement per-guest-page permissions Mickaël Salaün
@ 2023-11-14  1:27 ` kernel test robot
  -1 siblings, 0 replies; 44+ messages in thread
From: kernel test robot @ 2023-11-13  7:42 UTC (permalink / raw)
  To: oe-kbuild; +Cc: lkp

:::::: 
:::::: Manual check reason: "fbc e677181e18929a5506e046fcb60f660060216505 and parent dd628ea6472ca6fefaaa3d84baab96270832e7e8 have different kconfig"
:::::: Manual check reason: "has kconfig file changed"
:::::: 

BCC: lkp@intel.com
CC: oe-kbuild-all@lists.linux.dev
In-Reply-To: <20231113022326.24388-11-mic@digikod.net>
References: <20231113022326.24388-11-mic@digikod.net>
TO: "Mickaël Salaün" <mic@digikod.net>

Hi Mickaël,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on 881375a408c0f4ea451ff14545b59216d2923881]

url:    https://github.com/intel-lab-lkp/linux/commits/Micka-l-Sala-n/virt-Introduce-Hypervisor-Enforced-Kernel-Integrity-Heki/20231113-102847
base:   881375a408c0f4ea451ff14545b59216d2923881
patch link:    https://lore.kernel.org/r/20231113022326.24388-11-mic%40digikod.net
patch subject: [RFC PATCH v2 10/19] KVM: x86: Implement per-guest-page permissions
:::::: branch date: 5 hours ago
:::::: commit date: 5 hours ago
config: i386-randconfig-002-20231113 (https://download.01.org/0day-ci/archive/20231113/202311131527.nBtS1Ewn-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231113/202311131527.nBtS1Ewn-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/r/202311131527.nBtS1Ewn-lkp@intel.com/

All errors (new ones prefixed by >>):

   WARNING: unmet direct dependencies detected for SPARSEMEM
   Depends on [n]: !SELECT_MEMORY_MODEL [=y] && ARCH_SPARSEMEM_ENABLE [=y] || SPARSEMEM_MANUAL [=n]
   Selected by [y]:
   - KVM [=y] && VIRTUALIZATION [=y] && HAVE_KVM [=y] && HIGH_RES_TIMERS [=y] && X86_LOCAL_APIC [=y]
   In file included from arch/x86/include/asm/page.h:85,
   from arch/x86/include/asm/thread_info.h:12,
   from include/linux/thread_info.h:60,
   from arch/x86/include/asm/preempt.h:9,
   from include/linux/preempt.h:79,
   from include/linux/spinlock.h:56,
   from include/linux/swait.h:7,
   from include/linux/completion.h:12,
   from include/linux/crypto.h:15,
   from arch/x86/kernel/asm-offsets.c:9:
>> include/asm-generic/memory_model.h:31:19: error: redefinition of 'pfn_valid'
   31 | #define pfn_valid pfn_valid
   |                   ^~~~~~~~~
   include/linux/mmzone.h:1987:19: note: in expansion of macro 'pfn_valid'
   1987 | static inline int pfn_valid(unsigned long pfn)
   |                   ^~~~~~~~~
   include/asm-generic/memory_model.h:23:19: note: previous definition of 'pfn_valid' with type 'int(long unsigned int)'
   23 | static inline int pfn_valid(unsigned long pfn)
   |                   ^~~~~~~~~
   make[3]: *** [scripts/Makefile.build:116: arch/x86/kernel/asm-offsets.s] Error 1 shuffle=912718013
   make[3]: Target 'prepare' not remade because of errors.
   make[2]: *** [Makefile:1202: prepare0] Error 2 shuffle=912718013
   make[2]: Target 'prepare' not remade because of errors.
   make[1]: *** [Makefile:234: __sub-make] Error 2 shuffle=912718013
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [Makefile:234: __sub-make] Error 2 shuffle=912718013
   make: Target 'prepare' not remade because of errors.

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for SPARSEMEM
   Depends on [n]: !SELECT_MEMORY_MODEL [=y] && ARCH_SPARSEMEM_ENABLE [=y] || SPARSEMEM_MANUAL [=n]
   Selected by [y]:
   - KVM [=y] && VIRTUALIZATION [=y] && HAVE_KVM [=y] && HIGH_RES_TIMERS [=y] && X86_LOCAL_APIC [=y]


vim +/pfn_valid +31 include/asm-generic/memory_model.h

a117e66ed45ac0 KAMEZAWA Hiroyuki   2006-03-27  17  
67de648211fa04 Andy Whitcroft      2006-06-23  18  #define __pfn_to_page(pfn)	(mem_map + ((pfn) - ARCH_PFN_OFFSET))
67de648211fa04 Andy Whitcroft      2006-06-23  19  #define __page_to_pfn(page)	((unsigned long)((page) - mem_map) + \
a117e66ed45ac0 KAMEZAWA Hiroyuki   2006-03-27  20  				 ARCH_PFN_OFFSET)
a117e66ed45ac0 KAMEZAWA Hiroyuki   2006-03-27  21  
e5080a9677854b Mike Rapoport (IBM  2023-01-29  22) #ifndef pfn_valid
e5080a9677854b Mike Rapoport (IBM  2023-01-29  23) static inline int pfn_valid(unsigned long pfn)
e5080a9677854b Mike Rapoport (IBM  2023-01-29  24) {
e5080a9677854b Mike Rapoport (IBM  2023-01-29  25) 	/* avoid <linux/mm.h> include hell */
e5080a9677854b Mike Rapoport (IBM  2023-01-29  26) 	extern unsigned long max_mapnr;
e5080a9677854b Mike Rapoport (IBM  2023-01-29  27) 	unsigned long pfn_offset = ARCH_PFN_OFFSET;
e5080a9677854b Mike Rapoport (IBM  2023-01-29  28) 
e5080a9677854b Mike Rapoport (IBM  2023-01-29  29) 	return pfn >= pfn_offset && (pfn - pfn_offset) < max_mapnr;
e5080a9677854b Mike Rapoport (IBM  2023-01-29  30) }
e5080a9677854b Mike Rapoport (IBM  2023-01-29 @31) #define pfn_valid pfn_valid
e5080a9677854b Mike Rapoport (IBM  2023-01-29  32) #endif
e5080a9677854b Mike Rapoport (IBM  2023-01-29  33) 

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v2 18/19] heki: x86: Protect guest kernel memory using the KVM hypervisor
  2023-11-13  2:23 ` [RFC PATCH v2 18/19] heki: x86: Protect guest kernel memory using the KVM hypervisor Mickaël Salaün
@ 2023-11-14  1:30 ` kernel test robot
  0 siblings, 0 replies; 44+ messages in thread
From: kernel test robot @ 2023-11-13  8:14 UTC (permalink / raw)
  To: oe-kbuild; +Cc: lkp

:::::: 
:::::: Manual check reason: "has kconfig file changed"
:::::: 

BCC: lkp@intel.com
CC: oe-kbuild-all@lists.linux.dev
In-Reply-To: <20231113022326.24388-19-mic@digikod.net>
References: <20231113022326.24388-19-mic@digikod.net>
TO: "Mickaël Salaün" <mic@digikod.net>

Hi Mickaël,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on 881375a408c0f4ea451ff14545b59216d2923881]

url:    https://github.com/intel-lab-lkp/linux/commits/Micka-l-Sala-n/virt-Introduce-Hypervisor-Enforced-Kernel-Integrity-Heki/20231113-102847
base:   881375a408c0f4ea451ff14545b59216d2923881
patch link:    https://lore.kernel.org/r/20231113022326.24388-19-mic%40digikod.net
patch subject: [RFC PATCH v2 18/19] heki: x86: Protect guest kernel memory using the KVM hypervisor
:::::: branch date: 6 hours ago
:::::: commit date: 6 hours ago
config: riscv-randconfig-001-20231113 (attached as .config)
compiler: riscv64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231113/202311131528.mJpddiq3-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/r/202311131528.mJpddiq3-lkp@intel.com/

All errors (new ones prefixed by >>):

>> arch/riscv/Kconfig:833:error: recursive dependency detected!
   arch/riscv/Kconfig:833:	symbol XIP_KERNEL depends on SPARSEMEM
   mm/Kconfig:461:	symbol SPARSEMEM is selected by HEKI
   virt/heki/Kconfig:5:	symbol HEKI depends on JUMP_LABEL
   arch/Kconfig:66:	symbol JUMP_LABEL depends on HAVE_ARCH_JUMP_LABEL
   arch/Kconfig:441:	symbol HAVE_ARCH_JUMP_LABEL is selected by XIP_KERNEL
   For a resolution refer to Documentation/kbuild/kconfig-language.rst
   subsection "Kconfig recursive dependency limitations"
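
The chain printed above closes on itself: XIP_KERNEL -> SPARSEMEM -> HEKI ->
JUMP_LABEL -> HAVE_ARCH_JUMP_LABEL -> XIP_KERNEL. As a minimal sketch, assuming
a simplified model in which every symbol has exactly one outgoing edge (the
real kconfig solver works on full expressions), the following C code walks
those edges and counts the steps needed to return to the start. The enum, the
table and cycle_length() are illustrative names, not part of kconfig.

```c
#include <assert.h>
#include <stddef.h>

/* Symbols from the report; the enum and table are illustrative only. */
enum sym { XIP_KERNEL, SPARSEMEM, HEKI, JUMP_LABEL, HAVE_ARCH_JUMP_LABEL, NSYMS };

/*
 * next_sym[s] is the symbol that s leads to in the report:
 * "A depends on B" gives A -> B, "B is selected by C" gives B -> C.
 */
static const enum sym next_sym[NSYMS] = {
	[XIP_KERNEL]           = SPARSEMEM,            /* depends on */
	[SPARSEMEM]            = HEKI,                 /* selected by */
	[HEKI]                 = JUMP_LABEL,           /* depends on */
	[JUMP_LABEL]           = HAVE_ARCH_JUMP_LABEL, /* depends on */
	[HAVE_ARCH_JUMP_LABEL] = XIP_KERNEL,           /* selected by */
};

/*
 * Follow edges from start; return the number of steps needed to come back
 * to start, or 0 if start is not reached within NSYMS steps.
 */
static size_t cycle_length(enum sym start)
{
	enum sym cur = start;
	size_t steps;

	assert(start < NSYMS);
	for (steps = 1; steps <= NSYMS; steps++) {
		cur = next_sym[cur];
		if (cur == start)
			return steps;
	}
	return 0;
}
```

In this model cycle_length(XIP_KERNEL) returns 5, one step per line of the
report. Removing any single edge breaks the loop; the kconfig documentation
referenced above discusses swapping a "select" for a "depends on" as one way
to do that.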


vim +833 arch/riscv/Kconfig

aef53f97b505ff Nick Kossifidis 2018-09-20  787  
d7071743db31b4 Atish Patra     2020-09-17  788  config EFI_STUB
d7071743db31b4 Atish Patra     2020-09-17  789  	bool
d7071743db31b4 Atish Patra     2020-09-17  790  
d7071743db31b4 Atish Patra     2020-09-17  791  config EFI
d7071743db31b4 Atish Patra     2020-09-17  792  	bool "UEFI runtime support"
44c922572952d8 Vitaly Wool     2021-04-13  793  	depends on OF && !XIP_KERNEL
5f365c133b83a5 Conor Dooley    2022-12-19  794  	depends on MMU
5f365c133b83a5 Conor Dooley    2022-12-19  795  	default y
a91a9ffbd3a55a Sunil V L       2023-05-15  796  	select ARCH_SUPPORTS_ACPI if 64BIT
d7071743db31b4 Atish Patra     2020-09-17  797  	select EFI_GENERIC_STUB
5f365c133b83a5 Conor Dooley    2022-12-19  798  	select EFI_PARAMS_FROM_FDT
b91540d52a08b6 Atish Patra     2020-09-17  799  	select EFI_RUNTIME_WRAPPERS
5f365c133b83a5 Conor Dooley    2022-12-19  800  	select EFI_STUB
5f365c133b83a5 Conor Dooley    2022-12-19  801  	select LIBFDT
d7071743db31b4 Atish Patra     2020-09-17  802  	select RISCV_ISA_C
5f365c133b83a5 Conor Dooley    2022-12-19  803  	select UCS2_STRING
d7071743db31b4 Atish Patra     2020-09-17  804  	help
d7071743db31b4 Atish Patra     2020-09-17  805  	  This option provides support for runtime services provided
d7071743db31b4 Atish Patra     2020-09-17  806  	  by UEFI firmware (such as non-volatile variables, realtime
d7071743db31b4 Atish Patra     2020-09-17  807  	  clock, and platform reset). A UEFI stub is also provided to
d7071743db31b4 Atish Patra     2020-09-17  808  	  allow the kernel to be booted as an EFI application. This
d7071743db31b4 Atish Patra     2020-09-17  809  	  is only useful on systems that have UEFI firmware.
d7071743db31b4 Atish Patra     2020-09-17  810  
fea2fed201ee56 Guo Ren         2020-12-17  811  config CC_HAVE_STACKPROTECTOR_TLS
fea2fed201ee56 Guo Ren         2020-12-17  812  	def_bool $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=tp -mstack-protector-guard-offset=0)
fea2fed201ee56 Guo Ren         2020-12-17  813  
fea2fed201ee56 Guo Ren         2020-12-17  814  config STACKPROTECTOR_PER_TASK
fea2fed201ee56 Guo Ren         2020-12-17  815  	def_bool y
595b893e2087de Kees Cook       2022-05-03  816  	depends on !RANDSTRUCT
fea2fed201ee56 Guo Ren         2020-12-17  817  	depends on STACKPROTECTOR && CC_HAVE_STACKPROTECTOR_TLS
fea2fed201ee56 Guo Ren         2020-12-17  818  
867432bec1c6e7 Alexandre Ghiti 2021-07-21  819  config PHYS_RAM_BASE_FIXED
867432bec1c6e7 Alexandre Ghiti 2021-07-21  820  	bool "Explicitly specified physical RAM address"
44c1e84a38a031 Palmer Dabbelt  2022-05-21  821  	depends on NONPORTABLE
867432bec1c6e7 Alexandre Ghiti 2021-07-21  822  	default n
867432bec1c6e7 Alexandre Ghiti 2021-07-21  823  
44c922572952d8 Vitaly Wool     2021-04-13  824  config PHYS_RAM_BASE
44c922572952d8 Vitaly Wool     2021-04-13  825  	hex "Platform Physical RAM address"
867432bec1c6e7 Alexandre Ghiti 2021-07-21  826  	depends on PHYS_RAM_BASE_FIXED
44c922572952d8 Vitaly Wool     2021-04-13  827  	default "0x80000000"
44c922572952d8 Vitaly Wool     2021-04-13  828  	help
44c922572952d8 Vitaly Wool     2021-04-13  829  	  This is the physical address of RAM in the system. It has to be
44c922572952d8 Vitaly Wool     2021-04-13  830  	  explicitly specified to run early relocations of read-write data
44c922572952d8 Vitaly Wool     2021-04-13  831  	  from flash to RAM.
44c922572952d8 Vitaly Wool     2021-04-13  832  
44c922572952d8 Vitaly Wool     2021-04-13 @833  config XIP_KERNEL
44c922572952d8 Vitaly Wool     2021-04-13  834  	bool "Kernel Execute-In-Place from ROM"
44c1e84a38a031 Palmer Dabbelt  2022-05-21  835  	depends on MMU && SPARSEMEM && NONPORTABLE
44c922572952d8 Vitaly Wool     2021-04-13  836  	# This prevents XIP from being enabled by all{yes,mod}config, which
44c922572952d8 Vitaly Wool     2021-04-13  837  	# fail to build since XIP doesn't support large kernels.
44c922572952d8 Vitaly Wool     2021-04-13  838  	depends on !COMPILE_TEST
867432bec1c6e7 Alexandre Ghiti 2021-07-21  839  	select PHYS_RAM_BASE_FIXED
44c922572952d8 Vitaly Wool     2021-04-13  840  	help
44c922572952d8 Vitaly Wool     2021-04-13  841  	  Execute-In-Place allows the kernel to run from non-volatile storage
44c922572952d8 Vitaly Wool     2021-04-13  842  	  directly addressable by the CPU, such as NOR flash. This saves RAM
44c922572952d8 Vitaly Wool     2021-04-13  843  	  space since the text section of the kernel is not loaded from flash
44c922572952d8 Vitaly Wool     2021-04-13  844  	  to RAM.  Read-write sections, such as the data section and stack,
44c922572952d8 Vitaly Wool     2021-04-13  845  	  are still copied to RAM.  The XIP kernel is not compressed since
44c922572952d8 Vitaly Wool     2021-04-13  846  	  it has to run directly from flash, so it will take more space to
44c922572952d8 Vitaly Wool     2021-04-13  847  	  store it.  The flash address used to link the kernel object files,
44c922572952d8 Vitaly Wool     2021-04-13  848  	  and for storing it, is configuration dependent. Therefore, if you
44c922572952d8 Vitaly Wool     2021-04-13  849  	  say Y here, you must know the proper physical address where to
44c922572952d8 Vitaly Wool     2021-04-13  850  	  store the kernel image depending on your own flash memory usage.
44c922572952d8 Vitaly Wool     2021-04-13  851  
44c922572952d8 Vitaly Wool     2021-04-13  852  	  Also note that the make target becomes "make xipImage" rather than
44c922572952d8 Vitaly Wool     2021-04-13  853  	  "make zImage" or "make Image".  The final kernel binary to put in
44c922572952d8 Vitaly Wool     2021-04-13  854  	  ROM memory will be arch/riscv/boot/xipImage.
44c922572952d8 Vitaly Wool     2021-04-13  855  
44c922572952d8 Vitaly Wool     2021-04-13  856  	  SPARSEMEM is required because the kernel text and rodata that are
44c922572952d8 Vitaly Wool     2021-04-13  857  	  flash resident are not backed by memmap, then any attempt to get
44c922572952d8 Vitaly Wool     2021-04-13  858  	  a struct page on those regions will trigger a fault.
44c922572952d8 Vitaly Wool     2021-04-13  859  
44c922572952d8 Vitaly Wool     2021-04-13  860  	  If unsure, say N.
44c922572952d8 Vitaly Wool     2021-04-13  861  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching
  2023-11-13  2:23 ` [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching Mickaël Salaün
@ 2023-11-13  8:19   ` Peter Zijlstra
  2023-11-27 16:48     ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2023-11-13  8:19 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Borislav Petkov, Dave Hansen, H . Peter Anvin, Ingo Molnar,
	Kees Cook, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Vitaly Kuznetsov, Wanpeng Li, Alexander Graf, Chao Peng,
	Edgecombe, Rick P, Forrest Yuan Yu, James Gowans, James Morris,
	John Andersen, Madhavan T . Venkataraman, Marian Rotariu,
	Mihai Donțu, Nicușor Cîțu, Thara Gopinath,
	Trilok Soni, Wei Liu, Will Deacon, Yu Zhang, Zahra Tarkhani,
	Ștefan Șicleru, dev, kvm, linux-hardening,
	linux-hyperv, linux-kernel, linux-security-module, qemu-devel,
	virtualization, x86, xen-devel

On Sun, Nov 12, 2023 at 09:23:24PM -0500, Mickaël Salaün wrote:
> From: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
> 
> X86 uses a function called __text_poke() to modify executable code. This
> patching function is used by many features such as KProbes and FTrace.
> 
> Update the permissions counters for the text page so that write
> permissions can be temporarily established in the EPT to modify the
> instructions in that page.
> 
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
> Cc: Mickaël Salaün <mic@digikod.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: Wanpeng Li <wanpengli@tencent.com>
> Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
> ---
> 
> Changes since v1:
> * New patch
> ---
>  arch/x86/kernel/alternative.c |  5 ++++
>  arch/x86/mm/heki.c            | 49 +++++++++++++++++++++++++++++++++++
>  include/linux/heki.h          | 14 ++++++++++
>  3 files changed, 68 insertions(+)
> 
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 517ee01503be..64fd8757ba5c 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -18,6 +18,7 @@
>  #include <linux/mmu_context.h>
>  #include <linux/bsearch.h>
>  #include <linux/sync_core.h>
> +#include <linux/heki.h>
>  #include <asm/text-patching.h>
>  #include <asm/alternative.h>
>  #include <asm/sections.h>
> @@ -1801,6 +1802,7 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
>  	 */
>  	pgprot = __pgprot(pgprot_val(PAGE_KERNEL) & ~_PAGE_GLOBAL);
>  
> +	heki_text_poke_start(pages, cross_page_boundary ? 2 : 1, pgprot);
>  	/*
>  	 * The lock is not really needed, but this allows to avoid open-coding.
>  	 */
> @@ -1865,7 +1867,10 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
>  	}
>  
>  	local_irq_restore(flags);
> +
>  	pte_unmap_unlock(ptep, ptl);
> +	heki_text_poke_end(pages, cross_page_boundary ? 2 : 1, pgprot);
> +
>  	return addr;
>  }

This makes no sense: we already use a custom CR3 with a userspace alias
for the actual pages to write to, so why are you then frobbing permissions
on that *again*?
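
The point about writing through an alias can be illustrated with a userspace
analogy. This is only a sketch, assuming a POSIX system: it is not the kernel's
__text_poke() implementation, and map_with_alias() and poke_via_alias() are
hypothetical names. The same backing page is mapped twice, once read-only and
once writable, and the change is made through the writable alias without ever
touching the permissions of the read-only view.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Userspace analogy only: two mappings of the same backing page, one
 * read-only and one writable. Returns 0 on success, -1 on failure.
 * On success *ro_view and *rw_alias refer to the same underlying page.
 */
static int map_with_alias(size_t len, const char **ro_view, char **rw_alias)
{
	FILE *backing = tmpfile();	/* temporary file used as backing store */
	void *ro, *rw;
	int fd;

	if (!backing)
		return -1;
	fd = fileno(backing);
	if (ftruncate(fd, (off_t)len) != 0)
		goto err;

	ro = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (ro == MAP_FAILED)
		goto err;
	rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (rw == MAP_FAILED) {
		munmap(ro, len);
		goto err;
	}

	fclose(backing);	/* the mappings keep the page alive */
	*ro_view = ro;
	*rw_alias = rw;
	return 0;
err:
	fclose(backing);
	return -1;
}

/* Patch bytes through the writable alias; the read-only view is never remapped. */
static void poke_via_alias(char *rw_alias, size_t off, const void *src, size_t len)
{
	assert(rw_alias && src);
	memcpy(rw_alias + off, src, len);
}
```

After a poke through rw_alias, a read through ro_view observes the new bytes
even though that mapping was never made writable.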

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v2 18/19] heki: x86: Protect guest kernel memory using the KVM hypervisor
  2023-11-13  2:23 ` [RFC PATCH v2 18/19] heki: x86: Protect guest kernel memory using the KVM hypervisor Mickaël Salaün
@ 2023-11-13  8:54   ` Peter Zijlstra
  2023-11-27 17:05     ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2023-11-13  8:54 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Borislav Petkov, Dave Hansen, H . Peter Anvin, Ingo Molnar,
	Kees Cook, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Vitaly Kuznetsov, Wanpeng Li, Alexander Graf, Chao Peng,
	Edgecombe, Rick P, Forrest Yuan Yu, James Gowans, James Morris,
	John Andersen, Madhavan T . Venkataraman, Marian Rotariu,
	Mihai Donțu, Nicușor Cîțu, Thara Gopinath,
	Trilok Soni, Wei Liu, Will Deacon, Yu Zhang, Zahra Tarkhani,
	Ștefan Șicleru, dev, kvm, linux-hardening,
	linux-hyperv, linux-kernel, linux-security-module, qemu-devel,
	virtualization, x86, xen-devel

On Sun, Nov 12, 2023 at 09:23:25PM -0500, Mickaël Salaün wrote:
> From: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
> 
> Implement a hypervisor function, kvm_protect_memory(), that calls the
> KVM_HC_PROTECT_MEMORY hypercall to request the KVM hypervisor to
> set specified permissions on a list of guest pages.
> 
> Using the protect_memory() function, set proper EPT permissions for all
> guest pages.
> 
> Use the MEM_ATTR_IMMUTABLE property to protect the kernel static
> sections and the boot-time read-only sections. This ensures that
> a compromised guest cannot change the permissions of its main physical
> memory pages. However, this also disables any feature that may change
> the kernel's text section (e.g., ftrace, Kprobes), although such
> features can still be used on kernel modules.
> 
> Module loading/unloading and eBPF JIT are allowed without restrictions
> for now, but we'll need a way to authenticate these code changes to
> really improve the guests' security. We plan to use module signatures,
> but there is no solution yet to authenticate eBPF programs.
> 
> Being able to use ftrace and Kprobes in a secure way is a challenge not
> solved yet. We're looking for ideas to make this work.
> 
> Likewise, the JUMP_LABEL feature cannot work because the kernel's text
> section is read-only.

What is the actual problem? As is, the kernel text map is already RO and
never changes.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v2 10/19] KVM: x86: Implement per-guest-page permissions
  2023-11-13  2:23 ` [RFC PATCH v2 10/19] KVM: x86: Implement per-guest-page permissions Mickaël Salaün
@ 2023-11-14  1:29 ` kernel test robot
  -1 siblings, 0 replies; 44+ messages in thread
From: kernel test robot @ 2023-11-13 12:37 UTC (permalink / raw)
  To: oe-kbuild; +Cc: lkp

:::::: 
:::::: Manual check reason: "has kconfig file changed"
:::::: 

BCC: lkp@intel.com
CC: oe-kbuild-all@lists.linux.dev
In-Reply-To: <20231113022326.24388-11-mic@digikod.net>
References: <20231113022326.24388-11-mic@digikod.net>
TO: "Mickaël Salaün" <mic@digikod.net>

Hi Mickaël,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on 881375a408c0f4ea451ff14545b59216d2923881]

url:    https://github.com/intel-lab-lkp/linux/commits/Micka-l-Sala-n/virt-Introduce-Hypervisor-Enforced-Kernel-Integrity-Heki/20231113-102847
base:   881375a408c0f4ea451ff14545b59216d2923881
patch link:    https://lore.kernel.org/r/20231113022326.24388-11-mic%40digikod.net
patch subject: [RFC PATCH v2 10/19] KVM: x86: Implement per-guest-page permissions
:::::: branch date: 10 hours ago
:::::: commit date: 10 hours ago
config: x86_64-randconfig-013-20231113 (https://download.01.org/0day-ci/archive/20231113/202311132003.c5QKfbEI-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231113/202311132003.c5QKfbEI-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/r/202311132003.c5QKfbEI-lkp@intel.com/

All errors (new ones prefixed by >>):

   arch/x86/kvm/../../../virt/lib/kvm_permissions.c: In function 'kvm_permissions_get':
>> arch/x86/kvm/../../../virt/lib/kvm_permissions.c:74:20: error: implicit declaration of function 'kvm_get_memory_attributes' [-Werror=implicit-function-declaration]
      74 |         kvm_attr = kvm_get_memory_attributes(kvm, gfn);
         |                    ^~~~~~~~~~~~~~~~~~~~~~~~~
   arch/x86/kvm/../../../virt/lib/kvm_permissions.c: In function 'kvm_permissions_set':
>> arch/x86/kvm/../../../virt/lib/kvm_permissions.c:91:13: error: implicit declaration of function 'kvm_range_has_memory_attributes'; did you mean 'kvm_mmu_init_memslot_memory_attributes'? [-Werror=implicit-function-declaration]
      91 |         if (kvm_range_has_memory_attributes(kvm, gfn_start, gfn_end,
         |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         |             kvm_mmu_init_memslot_memory_attributes
>> arch/x86/kvm/../../../virt/lib/kvm_permissions.c:100:16: error: implicit declaration of function 'kvm_vm_set_mem_attributes' [-Werror=implicit-function-declaration]
     100 |         return kvm_vm_set_mem_attributes(kvm, gfn_start, gfn_end,
         |                ^~~~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/kvm_get_memory_attributes +74 arch/x86/kvm/../../../virt/lib/kvm_permissions.c

e677181e18929a Mickaël Salaün 2023-11-12   63  
e677181e18929a Mickaël Salaün 2023-11-12   64  unsigned long kvm_permissions_get(struct kvm *kvm, gfn_t gfn)
e677181e18929a Mickaël Salaün 2023-11-12   65  {
e677181e18929a Mickaël Salaün 2023-11-12   66  	unsigned long kvm_attr = 0;
e677181e18929a Mickaël Salaün 2023-11-12   67  
e677181e18929a Mickaël Salaün 2023-11-12   68  	/*
e677181e18929a Mickaël Salaün 2023-11-12   69  	 * Retrieve the permissions for a guest page. If not present (i.e., no
e677181e18929a Mickaël Salaün 2023-11-12   70  	 * attribute), then return default permissions (RWX).  This means
e677181e18929a Mickaël Salaün 2023-11-12   71  	 * setting permissions to 0 resets them to RWX. We might want to
e677181e18929a Mickaël Salaün 2023-11-12   72  	 * revisit that in a future version.
e677181e18929a Mickaël Salaün 2023-11-12   73  	 */
e677181e18929a Mickaël Salaün 2023-11-12  @74  	kvm_attr = kvm_get_memory_attributes(kvm, gfn);
e677181e18929a Mickaël Salaün 2023-11-12   75  	if (kvm_attr)
e677181e18929a Mickaël Salaün 2023-11-12   76  		return kvm_attr_to_heki_attr(kvm_attr);
e677181e18929a Mickaël Salaün 2023-11-12   77  	else
e677181e18929a Mickaël Salaün 2023-11-12   78  		return kvm_default_permissions;
e677181e18929a Mickaël Salaün 2023-11-12   79  }
e677181e18929a Mickaël Salaün 2023-11-12   80  EXPORT_SYMBOL_GPL(kvm_permissions_get);
e677181e18929a Mickaël Salaün 2023-11-12   81  
e677181e18929a Mickaël Salaün 2023-11-12   82  int kvm_permissions_set(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end,
e677181e18929a Mickaël Salaün 2023-11-12   83  			unsigned long heki_attr)
e677181e18929a Mickaël Salaün 2023-11-12   84  {
e677181e18929a Mickaël Salaün 2023-11-12   85  	if ((heki_attr | MEM_ATTR_PROT) != MEM_ATTR_PROT)
e677181e18929a Mickaël Salaün 2023-11-12   86  		return -EINVAL;
e677181e18929a Mickaël Salaün 2023-11-12   87  
e677181e18929a Mickaël Salaün 2023-11-12   88  	if (gfn_end <= gfn_start)
e677181e18929a Mickaël Salaün 2023-11-12   89  		return -EINVAL;
e677181e18929a Mickaël Salaün 2023-11-12   90  
e677181e18929a Mickaël Salaün 2023-11-12  @91  	if (kvm_range_has_memory_attributes(kvm, gfn_start, gfn_end,
e677181e18929a Mickaël Salaün 2023-11-12   92  					    KVM_MEMORY_ATTRIBUTE_HEKI_IMMUTABLE,
e677181e18929a Mickaël Salaün 2023-11-12   93  					    false)) {
e677181e18929a Mickaël Salaün 2023-11-12   94  		pr_warn_ratelimited(
e677181e18929a Mickaël Salaün 2023-11-12   95  			"Guest tried to change immutable permission for GFNs %llx-%llx\n",
e677181e18929a Mickaël Salaün 2023-11-12   96  			gfn_start, gfn_end);
e677181e18929a Mickaël Salaün 2023-11-12   97  		return -EPERM;
e677181e18929a Mickaël Salaün 2023-11-12   98  	}
e677181e18929a Mickaël Salaün 2023-11-12   99  
e677181e18929a Mickaël Salaün 2023-11-12 @100  	return kvm_vm_set_mem_attributes(kvm, gfn_start, gfn_end,
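
The listing above relies on two idioms: a subset test on the attribute mask,
and "zero means default" on lookup. The following is a minimal, self-contained
sketch of both, assuming made-up bit values and a fixed-size table; the DEMO_*
names and demo_* functions are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical bit values for illustration; the patch's MEM_ATTR_* values may differ. */
#define DEMO_ATTR_READ   (1UL << 0)
#define DEMO_ATTR_WRITE  (1UL << 1)
#define DEMO_ATTR_EXEC   (1UL << 2)
#define DEMO_ATTR_PROT   (DEMO_ATTR_READ | DEMO_ATTR_WRITE | DEMO_ATTR_EXEC)

#define DEMO_NPAGES 8

/* One stored attribute word per page; 0 means "no attribute recorded". */
static unsigned long demo_attrs[DEMO_NPAGES];

/* Same subset test as in the listing: reject any bit outside the mask. */
static int demo_attr_valid(unsigned long attr)
{
	return (attr | DEMO_ATTR_PROT) == DEMO_ATTR_PROT;
}

/* Set attributes on [start, end); mirrors the argument checks in the listing. */
static int demo_permissions_set(unsigned long start, unsigned long end,
				unsigned long attr)
{
	unsigned long i;

	if (!demo_attr_valid(attr))
		return -EINVAL;
	if (end <= start || end > DEMO_NPAGES)
		return -EINVAL;
	for (i = start; i < end; i++)
		demo_attrs[i] = attr;
	return 0;
}

/* A stored 0 cannot be told apart from "never set", so it reads back as RWX. */
static unsigned long demo_permissions_get(unsigned long page)
{
	assert(page < DEMO_NPAGES);
	return demo_attrs[page] ? demo_attrs[page] : DEMO_ATTR_PROT;
}
```

In this model, setting a range to 0 succeeds and a later lookup returns the
full RWX mask, which is the behaviour the comment in kvm_permissions_get()
flags as something to revisit.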

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v2 14/19] heki: x86: Initialize permissions counters for pages mapped into KVA
@ 2023-11-14  1:22 ` kernel test robot
  0 siblings, 0 replies; 44+ messages in thread
From: kernel test robot @ 2023-11-14  1:22 UTC (permalink / raw)
  To: Mickaël Salaün; +Cc: oe-kbuild-all

Hi Mickaël,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on 881375a408c0f4ea451ff14545b59216d2923881]

url:    https://github.com/intel-lab-lkp/linux/commits/Micka-l-Sala-n/virt-Introduce-Hypervisor-Enforced-Kernel-Integrity-Heki/20231113-102847
base:   881375a408c0f4ea451ff14545b59216d2923881
patch link:    https://lore.kernel.org/r/20231113022326.24388-15-mic%40digikod.net
patch subject: [RFC PATCH v2 14/19] heki: x86: Initialize permissions counters for pages mapped into KVA
config: openrisc-allmodconfig (https://download.01.org/0day-ci/archive/20231113/202311131254.C8WZC0U9-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231113/202311131254.C8WZC0U9-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <yujie.liu@intel.com>
| Closes: https://lore.kernel.org/r/202311131254.C8WZC0U9-lkp@intel.com/

All errors (new ones prefixed by >>):

   WARNING: unmet direct dependencies detected for SPARSEMEM
   Depends on [n]: !SELECT_MEMORY_MODEL [=n] && ARCH_SPARSEMEM_ENABLE || SPARSEMEM_MANUAL [=n]
   Selected by [y]:
   - HEKI [=y] && ARCH_SUPPORTS_HEKI [=y] && HYPERVISOR_SUPPORTS_HEKI [=y] && !X86_16BIT
   In file included from include/linux/mm_types.h:18,
   from include/linux/sched/signal.h:13,
   from include/linux/ptrace.h:7,
   from arch/openrisc/kernel/asm-offsets.c:28:
>> include/linux/page-flags-layout.h:30:10: fatal error: asm/sparsemem.h: No such file or directory
   30 | #include <asm/sparsemem.h>
   |          ^~~~~~~~~~~~~~~~~
   compilation terminated.
   make[3]: *** [scripts/Makefile.build:116: arch/openrisc/kernel/asm-offsets.s] Error 1
   make[3]: Target 'prepare' not remade because of errors.
   make[2]: *** [Makefile:1202: prepare0] Error 2
   make[2]: Target 'prepare' not remade because of errors.
   make[1]: *** [Makefile:234: __sub-make] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [Makefile:234: __sub-make] Error 2
   make: Target 'prepare' not remade because of errors.

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for SPARSEMEM
   Depends on [n]: !SELECT_MEMORY_MODEL [=n] && ARCH_SPARSEMEM_ENABLE || SPARSEMEM_MANUAL [=n]
   Selected by [y]:
   - HEKI [=y] && ARCH_SUPPORTS_HEKI [=y] && HYPERVISOR_SUPPORTS_HEKI [=y] && !X86_16BIT
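
To see why the warning reports "Depends on [n]", the expression can be
evaluated by hand with the Kconfig tristate rules (n < m < y, "!" is y minus
the value, "&&" is the minimum, "||" is the maximum, and "&&" binds tighter
than "||"). The sketch below does that in C. It assumes that
ARCH_SPARSEMEM_ENABLE, which is printed without a value here, evaluates to n;
the function and enum names are illustrative.

```c
#include <assert.h>

/* Kconfig tristate values: n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

static enum tristate tri_not(enum tristate a)
{
	assert(a <= TRI_Y);
	return (enum tristate)(TRI_Y - a);
}

static enum tristate tri_and(enum tristate a, enum tristate b)
{
	return a < b ? a : b;
}

static enum tristate tri_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/*
 * Direct dependency of SPARSEMEM as printed in the warning:
 *   !SELECT_MEMORY_MODEL && ARCH_SPARSEMEM_ENABLE || SPARSEMEM_MANUAL
 */
static enum tristate sparsemem_depends(enum tristate select_memory_model,
				       enum tristate arch_sparsemem_enable,
				       enum tristate sparsemem_manual)
{
	return tri_or(tri_and(tri_not(select_memory_model), arch_sparsemem_enable),
		      sparsemem_manual);
}
```

With all three inputs at n the result is n, yet HEKI's "select" still forces
SPARSEMEM to y, which is what the warning reports. The expression also
evaluates to n when SELECT_MEMORY_MODEL and ARCH_SPARSEMEM_ENABLE are both y
and SPARSEMEM_MANUAL is n.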


vim +30 include/linux/page-flags-layout.h

1587db62d8c0db Yu Zhao        2021-04-29  28  
bbeae5b05ef6e4 Peter Zijlstra 2013-02-22  29  #ifdef CONFIG_SPARSEMEM
bbeae5b05ef6e4 Peter Zijlstra 2013-02-22 @30  #include <asm/sparsemem.h>
bbeae5b05ef6e4 Peter Zijlstra 2013-02-22  31  #define SECTIONS_SHIFT	(MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
1587db62d8c0db Yu Zhao        2021-04-29  32  #else
1587db62d8c0db Yu Zhao        2021-04-29  33  #define SECTIONS_SHIFT	0
1587db62d8c0db Yu Zhao        2021-04-29  34  #endif
bbeae5b05ef6e4 Peter Zijlstra 2013-02-22  35  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v2 10/19] KVM: x86: Implement per-guest-page permissions
@ 2023-11-14  1:27 ` kernel test robot
  0 siblings, 0 replies; 44+ messages in thread
From: kernel test robot @ 2023-11-14  1:27 UTC (permalink / raw)
  To: Mickaël Salaün; +Cc: oe-kbuild-all

Hi Mickaël,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on 881375a408c0f4ea451ff14545b59216d2923881]

url:    https://github.com/intel-lab-lkp/linux/commits/Micka-l-Sala-n/virt-Introduce-Hypervisor-Enforced-Kernel-Integrity-Heki/20231113-102847
base:   881375a408c0f4ea451ff14545b59216d2923881
patch link:    https://lore.kernel.org/r/20231113022326.24388-11-mic%40digikod.net
patch subject: [RFC PATCH v2 10/19] KVM: x86: Implement per-guest-page permissions
config: i386-randconfig-002-20231113 (https://download.01.org/0day-ci/archive/20231113/202311131527.nBtS1Ewn-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231113/202311131527.nBtS1Ewn-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <yujie.liu@intel.com>
| Closes: https://lore.kernel.org/r/202311131527.nBtS1Ewn-lkp@intel.com/

All errors (new ones prefixed by >>):

   WARNING: unmet direct dependencies detected for SPARSEMEM
   Depends on [n]: !SELECT_MEMORY_MODEL [=y] && ARCH_SPARSEMEM_ENABLE [=y] || SPARSEMEM_MANUAL [=n]
   Selected by [y]:
   - KVM [=y] && VIRTUALIZATION [=y] && HAVE_KVM [=y] && HIGH_RES_TIMERS [=y] && X86_LOCAL_APIC [=y]
   In file included from arch/x86/include/asm/page.h:85,
   from arch/x86/include/asm/thread_info.h:12,
   from include/linux/thread_info.h:60,
   from arch/x86/include/asm/preempt.h:9,
   from include/linux/preempt.h:79,
   from include/linux/spinlock.h:56,
   from include/linux/swait.h:7,
   from include/linux/completion.h:12,
   from include/linux/crypto.h:15,
   from arch/x86/kernel/asm-offsets.c:9:
>> include/asm-generic/memory_model.h:31:19: error: redefinition of 'pfn_valid'
   31 | #define pfn_valid pfn_valid
   |                   ^~~~~~~~~
   include/linux/mmzone.h:1987:19: note: in expansion of macro 'pfn_valid'
   1987 | static inline int pfn_valid(unsigned long pfn)
   |                   ^~~~~~~~~
   include/asm-generic/memory_model.h:23:19: note: previous definition of 'pfn_valid' with type 'int(long unsigned int)'
   23 | static inline int pfn_valid(unsigned long pfn)
   |                   ^~~~~~~~~
   make[3]: *** [scripts/Makefile.build:116: arch/x86/kernel/asm-offsets.s] Error 1 shuffle=912718013
   make[3]: Target 'prepare' not remade because of errors.
   make[2]: *** [Makefile:1202: prepare0] Error 2 shuffle=912718013
   make[2]: Target 'prepare' not remade because of errors.
   make[1]: *** [Makefile:234: __sub-make] Error 2 shuffle=912718013
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [Makefile:234: __sub-make] Error 2 shuffle=912718013
   make: Target 'prepare' not remade because of errors.

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for SPARSEMEM
   Depends on [n]: !SELECT_MEMORY_MODEL [=y] && ARCH_SPARSEMEM_ENABLE [=y] || SPARSEMEM_MANUAL [=n]
   Selected by [y]:
   - KVM [=y] && VIRTUALIZATION [=y] && HAVE_KVM [=y] && HIGH_RES_TIMERS [=y] && X86_LOCAL_APIC [=y]


vim +/pfn_valid +31 include/asm-generic/memory_model.h

a117e66ed45ac0 KAMEZAWA Hiroyuki   2006-03-27  17  
67de648211fa04 Andy Whitcroft      2006-06-23  18  #define __pfn_to_page(pfn)	(mem_map + ((pfn) - ARCH_PFN_OFFSET))
67de648211fa04 Andy Whitcroft      2006-06-23  19  #define __page_to_pfn(page)	((unsigned long)((page) - mem_map) + \
a117e66ed45ac0 KAMEZAWA Hiroyuki   2006-03-27  20  				 ARCH_PFN_OFFSET)
a117e66ed45ac0 KAMEZAWA Hiroyuki   2006-03-27  21  
e5080a9677854b Mike Rapoport (IBM  2023-01-29  22) #ifndef pfn_valid
e5080a9677854b Mike Rapoport (IBM  2023-01-29  23) static inline int pfn_valid(unsigned long pfn)
e5080a9677854b Mike Rapoport (IBM  2023-01-29  24) {
e5080a9677854b Mike Rapoport (IBM  2023-01-29  25) 	/* avoid <linux/mm.h> include hell */
e5080a9677854b Mike Rapoport (IBM  2023-01-29  26) 	extern unsigned long max_mapnr;
e5080a9677854b Mike Rapoport (IBM  2023-01-29  27) 	unsigned long pfn_offset = ARCH_PFN_OFFSET;
e5080a9677854b Mike Rapoport (IBM  2023-01-29  28) 
e5080a9677854b Mike Rapoport (IBM  2023-01-29  29) 	return pfn >= pfn_offset && (pfn - pfn_offset) < max_mapnr;
e5080a9677854b Mike Rapoport (IBM  2023-01-29  30) }
e5080a9677854b Mike Rapoport (IBM  2023-01-29 @31) #define pfn_valid pfn_valid
e5080a9677854b Mike Rapoport (IBM  2023-01-29  32) #endif
e5080a9677854b Mike Rapoport (IBM  2023-01-29  33) 
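
The "#define pfn_valid pfn_valid" line in the listing is a marker: a later
definition can test "#ifndef pfn_valid" and skip itself. The error above
appears consistent with a second definition being compiled without such a
check, presumably because SPARSEMEM was force-selected alongside the model
that was already active. A minimal sketch of the guard idiom, with a
hypothetical demo_pfn_valid() standing in for the kernel function:

```c
#include <assert.h>

/* First provider: defines the function and a same-named macro as a marker. */
#ifndef demo_pfn_valid
static inline int demo_pfn_valid(unsigned long pfn)
{
	return pfn < 16;	/* arbitrary limit for the demo */
}
#define demo_pfn_valid demo_pfn_valid
#endif

/*
 * Second provider: because it checks the marker first, it is skipped.
 * Without the #ifndef this would be a second definition of the same
 * function and the compiler would report a redefinition.
 */
#ifndef demo_pfn_valid
static inline int demo_pfn_valid(unsigned long pfn)
{
	return pfn < 32;
}
#define demo_pfn_valid demo_pfn_valid
#endif
```

Only the first definition is compiled here, so demo_pfn_valid(20) is 0. The
self-referential macro is harmless because the preprocessor does not expand a
macro name again inside its own expansion.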

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH v2 10/19] KVM: x86: Implement per-guest-page permissions
@ 2023-11-14  1:29 ` kernel test robot
  0 siblings, 0 replies; 44+ messages in thread
From: kernel test robot @ 2023-11-14  1:29 UTC (permalink / raw)
  To: Mickaël Salaün; +Cc: oe-kbuild-all

Hi Mickaël,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on 881375a408c0f4ea451ff14545b59216d2923881]

url:    https://github.com/intel-lab-lkp/linux/commits/Micka-l-Sala-n/virt-Introduce-Hypervisor-Enforced-Kernel-Integrity-Heki/20231113-102847
base:   881375a408c0f4ea451ff14545b59216d2923881
patch link:    https://lore.kernel.org/r/20231113022326.24388-11-mic%40digikod.net
patch subject: [RFC PATCH v2 10/19] KVM: x86: Implement per-guest-page permissions
config: x86_64-randconfig-013-20231113 (https://download.01.org/0day-ci/archive/20231113/202311132003.c5QKfbEI-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231113/202311132003.c5QKfbEI-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <yujie.liu@intel.com>
| Closes: https://lore.kernel.org/r/202311132003.c5QKfbEI-lkp@intel.com/

All errors (new ones prefixed by >>):

   arch/x86/kvm/../../../virt/lib/kvm_permissions.c: In function 'kvm_permissions_get':
>> arch/x86/kvm/../../../virt/lib/kvm_permissions.c:74:20: error: implicit declaration of function 'kvm_get_memory_attributes' [-Werror=implicit-function-declaration]
      74 |         kvm_attr = kvm_get_memory_attributes(kvm, gfn);
         |                    ^~~~~~~~~~~~~~~~~~~~~~~~~
   arch/x86/kvm/../../../virt/lib/kvm_permissions.c: In function 'kvm_permissions_set':
>> arch/x86/kvm/../../../virt/lib/kvm_permissions.c:91:13: error: implicit declaration of function 'kvm_range_has_memory_attributes'; did you mean 'kvm_mmu_init_memslot_memory_attributes'? [-Werror=implicit-function-declaration]
      91 |         if (kvm_range_has_memory_attributes(kvm, gfn_start, gfn_end,
         |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         |             kvm_mmu_init_memslot_memory_attributes
>> arch/x86/kvm/../../../virt/lib/kvm_permissions.c:100:16: error: implicit declaration of function 'kvm_vm_set_mem_attributes' [-Werror=implicit-function-declaration]
     100 |         return kvm_vm_set_mem_attributes(kvm, gfn_start, gfn_end,
         |                ^~~~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/kvm_get_memory_attributes +74 arch/x86/kvm/../../../virt/lib/kvm_permissions.c

e677181e18929a Mickaël Salaün 2023-11-12   63  
e677181e18929a Mickaël Salaün 2023-11-12   64  unsigned long kvm_permissions_get(struct kvm *kvm, gfn_t gfn)
e677181e18929a Mickaël Salaün 2023-11-12   65  {
e677181e18929a Mickaël Salaün 2023-11-12   66  	unsigned long kvm_attr = 0;
e677181e18929a Mickaël Salaün 2023-11-12   67  
e677181e18929a Mickaël Salaün 2023-11-12   68  	/*
e677181e18929a Mickaël Salaün 2023-11-12   69  	 * Retrieve the permissions for a guest page. If not present (i.e., no
e677181e18929a Mickaël Salaün 2023-11-12   70  	 * attribute), then return default permissions (RWX).  This means
e677181e18929a Mickaël Salaün 2023-11-12   71  	 * setting permissions to 0 resets them to RWX. We might want to
e677181e18929a Mickaël Salaün 2023-11-12   72  	 * revisit that in a future version.
e677181e18929a Mickaël Salaün 2023-11-12   73  	 */
e677181e18929a Mickaël Salaün 2023-11-12  @74  	kvm_attr = kvm_get_memory_attributes(kvm, gfn);
e677181e18929a Mickaël Salaün 2023-11-12   75  	if (kvm_attr)
e677181e18929a Mickaël Salaün 2023-11-12   76  		return kvm_attr_to_heki_attr(kvm_attr);
e677181e18929a Mickaël Salaün 2023-11-12   77  	else
e677181e18929a Mickaël Salaün 2023-11-12   78  		return kvm_default_permissions;
e677181e18929a Mickaël Salaün 2023-11-12   79  }
e677181e18929a Mickaël Salaün 2023-11-12   80  EXPORT_SYMBOL_GPL(kvm_permissions_get);
e677181e18929a Mickaël Salaün 2023-11-12   81  
e677181e18929a Mickaël Salaün 2023-11-12   82  int kvm_permissions_set(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end,
e677181e18929a Mickaël Salaün 2023-11-12   83  			unsigned long heki_attr)
e677181e18929a Mickaël Salaün 2023-11-12   84  {
e677181e18929a Mickaël Salaün 2023-11-12   85  	if ((heki_attr | MEM_ATTR_PROT) != MEM_ATTR_PROT)
e677181e18929a Mickaël Salaün 2023-11-12   86  		return -EINVAL;
e677181e18929a Mickaël Salaün 2023-11-12   87  
e677181e18929a Mickaël Salaün 2023-11-12   88  	if (gfn_end <= gfn_start)
e677181e18929a Mickaël Salaün 2023-11-12   89  		return -EINVAL;
e677181e18929a Mickaël Salaün 2023-11-12   90  
e677181e18929a Mickaël Salaün 2023-11-12  @91  	if (kvm_range_has_memory_attributes(kvm, gfn_start, gfn_end,
e677181e18929a Mickaël Salaün 2023-11-12   92  					    KVM_MEMORY_ATTRIBUTE_HEKI_IMMUTABLE,
e677181e18929a Mickaël Salaün 2023-11-12   93  					    false)) {
e677181e18929a Mickaël Salaün 2023-11-12   94  		pr_warn_ratelimited(
e677181e18929a Mickaël Salaün 2023-11-12   95  			"Guest tried to change immutable permission for GFNs %llx-%llx\n",
e677181e18929a Mickaël Salaün 2023-11-12   96  			gfn_start, gfn_end);
e677181e18929a Mickaël Salaün 2023-11-12   97  		return -EPERM;
e677181e18929a Mickaël Salaün 2023-11-12   98  	}
e677181e18929a Mickaël Salaün 2023-11-12   99  
e677181e18929a Mickaël Salaün 2023-11-12 @100  	return kvm_vm_set_mem_attributes(kvm, gfn_start, gfn_end,
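
The two argument checks at lines 85-89 of the listing above can be read as a subset test followed by a range test. Below is a minimal userspace sketch of just those checks. The MEM_ATTR_* bit values are assumptions made for illustration (this report does not show the real definitions), and check_permissions_args() is a hypothetical helper, not a function from the series.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define MEM_ATTR_READ	(1UL << 0)
#define MEM_ATTR_WRITE	(1UL << 1)
#define MEM_ATTR_EXEC	(1UL << 2)
#define MEM_ATTR_PROT	(MEM_ATTR_READ | MEM_ATTR_WRITE | MEM_ATTR_EXEC)

/*
 * Same shape as the checks in kvm_permissions_set():
 * (attr | MASK) != MASK holds exactly when attr has a bit outside MASK.
 */
int check_permissions_args(uint64_t gfn_start, uint64_t gfn_end,
			   unsigned long heki_attr)
{
	if ((heki_attr | MEM_ATTR_PROT) != MEM_ATTR_PROT)
		return -EINVAL;	/* unknown attribute bit requested */

	if (gfn_end <= gfn_start)
		return -EINVAL;	/* empty or inverted GFN range */

	return 0;
}
```

Note that an attribute value of 0 passes both checks, which matches the comment in kvm_permissions_get() saying that setting permissions to 0 resets them to the default.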

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [RFC PATCH v2 18/19] heki: x86: Protect guest kernel memory using the KVM hypervisor
@ 2023-11-14  1:30 ` kernel test robot
  0 siblings, 0 replies; 44+ messages in thread
From: kernel test robot @ 2023-11-14  1:30 UTC (permalink / raw)
  To: Mickaël Salaün; +Cc: oe-kbuild-all

Hi Mickaël,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on 881375a408c0f4ea451ff14545b59216d2923881]

url:    https://github.com/intel-lab-lkp/linux/commits/Micka-l-Sala-n/virt-Introduce-Hypervisor-Enforced-Kernel-Integrity-Heki/20231113-102847
base:   881375a408c0f4ea451ff14545b59216d2923881
patch link:    https://lore.kernel.org/r/20231113022326.24388-19-mic%40digikod.net
patch subject: [RFC PATCH v2 18/19] heki: x86: Protect guest kernel memory using the KVM hypervisor
config: riscv-randconfig-001-20231113 (attached as .config)
compiler: riscv64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231113/202311131528.mJpddiq3-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <yujie.liu@intel.com>
| Closes: https://lore.kernel.org/r/202311131528.mJpddiq3-lkp@intel.com/

All errors (new ones prefixed by >>):

>> arch/riscv/Kconfig:833:error: recursive dependency detected!
   arch/riscv/Kconfig:833:	symbol XIP_KERNEL depends on SPARSEMEM
   mm/Kconfig:461:	symbol SPARSEMEM is selected by HEKI
   virt/heki/Kconfig:5:	symbol HEKI depends on JUMP_LABEL
   arch/Kconfig:66:	symbol JUMP_LABEL depends on HAVE_ARCH_JUMP_LABEL
   arch/Kconfig:441:	symbol HAVE_ARCH_JUMP_LABEL is selected by XIP_KERNEL
   For a resolution refer to Documentation/kbuild/kconfig-language.rst
   subsection "Kconfig recursive dependency limitations"
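
The chain reported above forms a closed loop of five symbols. Below is a small C sketch that encodes the five edges exactly as the report lists them and walks them until it returns to the starting symbol. The enum, the table and cycle_length() are illustrative names only. This is a toy model, not how the kconfig tool itself is implemented.

```c
#define NO_EDGE (-1)

enum sym {
	XIP_KERNEL, SPARSEMEM, HEKI, JUMP_LABEL, HAVE_ARCH_JUMP_LABEL, NSYMS
};

/* influenced_by[s] is the symbol that s "depends on" or "is selected by". */
const int influenced_by[NSYMS] = {
	[XIP_KERNEL]		= SPARSEMEM,		/* depends on */
	[SPARSEMEM]		= HEKI,			/* is selected by */
	[HEKI]			= JUMP_LABEL,		/* depends on */
	[JUMP_LABEL]		= HAVE_ARCH_JUMP_LABEL,	/* depends on */
	[HAVE_ARCH_JUMP_LABEL]	= XIP_KERNEL,		/* is selected by */
};

/*
 * Follow the edges from 'start'. Return the number of edges walked before
 * coming back to 'start', or 0 if the walk ends without closing a loop.
 */
int cycle_length(const int *edges, int nsyms, int start)
{
	int cur = start;

	for (int steps = 1; steps <= nsyms; steps++) {
		cur = edges[cur];
		if (cur == NO_EDGE)
			return 0;
		if (cur == start)
			return steps;
	}
	return 0;
}
```

With the table above, cycle_length(influenced_by, NSYMS, XIP_KERNEL) is 5. Removing any single edge, for example the "SPARSEMEM is selected by HEKI" one, makes the walk terminate. Which edge to change, and whether a select should become a depends on, is a decision for the patch author.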


vim +833 arch/riscv/Kconfig

aef53f97b505ff Nick Kossifidis 2018-09-20  787  
d7071743db31b4 Atish Patra     2020-09-17  788  config EFI_STUB
d7071743db31b4 Atish Patra     2020-09-17  789  	bool
d7071743db31b4 Atish Patra     2020-09-17  790  
d7071743db31b4 Atish Patra     2020-09-17  791  config EFI
d7071743db31b4 Atish Patra     2020-09-17  792  	bool "UEFI runtime support"
44c922572952d8 Vitaly Wool     2021-04-13  793  	depends on OF && !XIP_KERNEL
5f365c133b83a5 Conor Dooley    2022-12-19  794  	depends on MMU
5f365c133b83a5 Conor Dooley    2022-12-19  795  	default y
a91a9ffbd3a55a Sunil V L       2023-05-15  796  	select ARCH_SUPPORTS_ACPI if 64BIT
d7071743db31b4 Atish Patra     2020-09-17  797  	select EFI_GENERIC_STUB
5f365c133b83a5 Conor Dooley    2022-12-19  798  	select EFI_PARAMS_FROM_FDT
b91540d52a08b6 Atish Patra     2020-09-17  799  	select EFI_RUNTIME_WRAPPERS
5f365c133b83a5 Conor Dooley    2022-12-19  800  	select EFI_STUB
5f365c133b83a5 Conor Dooley    2022-12-19  801  	select LIBFDT
d7071743db31b4 Atish Patra     2020-09-17  802  	select RISCV_ISA_C
5f365c133b83a5 Conor Dooley    2022-12-19  803  	select UCS2_STRING
d7071743db31b4 Atish Patra     2020-09-17  804  	help
d7071743db31b4 Atish Patra     2020-09-17  805  	  This option provides support for runtime services provided
d7071743db31b4 Atish Patra     2020-09-17  806  	  by UEFI firmware (such as non-volatile variables, realtime
d7071743db31b4 Atish Patra     2020-09-17  807  	  clock, and platform reset). A UEFI stub is also provided to
d7071743db31b4 Atish Patra     2020-09-17  808  	  allow the kernel to be booted as an EFI application. This
d7071743db31b4 Atish Patra     2020-09-17  809  	  is only useful on systems that have UEFI firmware.
d7071743db31b4 Atish Patra     2020-09-17  810  
fea2fed201ee56 Guo Ren         2020-12-17  811  config CC_HAVE_STACKPROTECTOR_TLS
fea2fed201ee56 Guo Ren         2020-12-17  812  	def_bool $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=tp -mstack-protector-guard-offset=0)
fea2fed201ee56 Guo Ren         2020-12-17  813  
fea2fed201ee56 Guo Ren         2020-12-17  814  config STACKPROTECTOR_PER_TASK
fea2fed201ee56 Guo Ren         2020-12-17  815  	def_bool y
595b893e2087de Kees Cook       2022-05-03  816  	depends on !RANDSTRUCT
fea2fed201ee56 Guo Ren         2020-12-17  817  	depends on STACKPROTECTOR && CC_HAVE_STACKPROTECTOR_TLS
fea2fed201ee56 Guo Ren         2020-12-17  818  
867432bec1c6e7 Alexandre Ghiti 2021-07-21  819  config PHYS_RAM_BASE_FIXED
867432bec1c6e7 Alexandre Ghiti 2021-07-21  820  	bool "Explicitly specified physical RAM address"
44c1e84a38a031 Palmer Dabbelt  2022-05-21  821  	depends on NONPORTABLE
867432bec1c6e7 Alexandre Ghiti 2021-07-21  822  	default n
867432bec1c6e7 Alexandre Ghiti 2021-07-21  823  
44c922572952d8 Vitaly Wool     2021-04-13  824  config PHYS_RAM_BASE
44c922572952d8 Vitaly Wool     2021-04-13  825  	hex "Platform Physical RAM address"
867432bec1c6e7 Alexandre Ghiti 2021-07-21  826  	depends on PHYS_RAM_BASE_FIXED
44c922572952d8 Vitaly Wool     2021-04-13  827  	default "0x80000000"
44c922572952d8 Vitaly Wool     2021-04-13  828  	help
44c922572952d8 Vitaly Wool     2021-04-13  829  	  This is the physical address of RAM in the system. It has to be
44c922572952d8 Vitaly Wool     2021-04-13  830  	  explicitly specified to run early relocations of read-write data
44c922572952d8 Vitaly Wool     2021-04-13  831  	  from flash to RAM.
44c922572952d8 Vitaly Wool     2021-04-13  832  
44c922572952d8 Vitaly Wool     2021-04-13 @833  config XIP_KERNEL
44c922572952d8 Vitaly Wool     2021-04-13  834  	bool "Kernel Execute-In-Place from ROM"
44c1e84a38a031 Palmer Dabbelt  2022-05-21  835  	depends on MMU && SPARSEMEM && NONPORTABLE
44c922572952d8 Vitaly Wool     2021-04-13  836  	# This prevents XIP from being enabled by all{yes,mod}config, which
44c922572952d8 Vitaly Wool     2021-04-13  837  	# fail to build since XIP doesn't support large kernels.
44c922572952d8 Vitaly Wool     2021-04-13  838  	depends on !COMPILE_TEST
867432bec1c6e7 Alexandre Ghiti 2021-07-21  839  	select PHYS_RAM_BASE_FIXED
44c922572952d8 Vitaly Wool     2021-04-13  840  	help
44c922572952d8 Vitaly Wool     2021-04-13  841  	  Execute-In-Place allows the kernel to run from non-volatile storage
44c922572952d8 Vitaly Wool     2021-04-13  842  	  directly addressable by the CPU, such as NOR flash. This saves RAM
44c922572952d8 Vitaly Wool     2021-04-13  843  	  space since the text section of the kernel is not loaded from flash
44c922572952d8 Vitaly Wool     2021-04-13  844  	  to RAM.  Read-write sections, such as the data section and stack,
44c922572952d8 Vitaly Wool     2021-04-13  845  	  are still copied to RAM.  The XIP kernel is not compressed since
44c922572952d8 Vitaly Wool     2021-04-13  846  	  it has to run directly from flash, so it will take more space to
44c922572952d8 Vitaly Wool     2021-04-13  847  	  store it.  The flash address used to link the kernel object files,
44c922572952d8 Vitaly Wool     2021-04-13  848  	  and for storing it, is configuration dependent. Therefore, if you
44c922572952d8 Vitaly Wool     2021-04-13  849  	  say Y here, you must know the proper physical address where to
44c922572952d8 Vitaly Wool     2021-04-13  850  	  store the kernel image depending on your own flash memory usage.
44c922572952d8 Vitaly Wool     2021-04-13  851  
44c922572952d8 Vitaly Wool     2021-04-13  852  	  Also note that the make target becomes "make xipImage" rather than
44c922572952d8 Vitaly Wool     2021-04-13  853  	  "make zImage" or "make Image".  The final kernel binary to put in
44c922572952d8 Vitaly Wool     2021-04-13  854  	  ROM memory will be arch/riscv/boot/xipImage.
44c922572952d8 Vitaly Wool     2021-04-13  855  
44c922572952d8 Vitaly Wool     2021-04-13  856  	  SPARSEMEM is required because the kernel text and rodata that are
44c922572952d8 Vitaly Wool     2021-04-13  857  	  flash resident are not backed by memmap, then any attempt to get
44c922572952d8 Vitaly Wool     2021-04-13  858  	  a struct page on those regions will trigger a fault.
44c922572952d8 Vitaly Wool     2021-04-13  859  
44c922572952d8 Vitaly Wool     2021-04-13  860  	  If unsure, say N.
44c922572952d8 Vitaly Wool     2021-04-13  861  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [RFC PATCH v2 12/19] x86: Implement the Memory Table feature to store arbitrary per-page data
  2023-11-13  2:23 ` [RFC PATCH v2 12/19] x86: Implement the Memory Table feature to store arbitrary per-page data Mickaël Salaün
@ 2023-11-22  7:19   ` kernel test robot
  0 siblings, 0 replies; 44+ messages in thread
From: kernel test robot @ 2023-11-22  7:19 UTC (permalink / raw)
  To: Mickaël Salaün; +Cc: oe-kbuild-all

Hi Mickaël,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:

[auto build test WARNING on 881375a408c0f4ea451ff14545b59216d2923881]

url:    https://github.com/intel-lab-lkp/linux/commits/Micka-l-Sala-n/virt-Introduce-Hypervisor-Enforced-Kernel-Integrity-Heki/20231113-102847
base:   881375a408c0f4ea451ff14545b59216d2923881
patch link:    https://lore.kernel.org/r/20231113022326.24388-13-mic%40digikod.net
patch subject: [RFC PATCH v2 12/19] x86: Implement the Memory Table feature to store arbitrary per-page data
config: x86_64-randconfig-123-20231122 (https://download.01.org/0day-ci/archive/20231122/202311221345.9sPNLiAU-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-12) 11.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231122/202311221345.9sPNLiAU-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311221345.9sPNLiAU-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> kernel/mem_table.c:47:14: sparse: sparse: symbol 'mem_table_base_level' was not declared. Should it be static?
>> kernel/mem_table.c:48:14: sparse: sparse: symbol 'mem_table_nlevels' was not declared. Should it be static?
>> kernel/mem_table.c:49:24: sparse: sparse: symbol 'mem_table_levels' was not declared. Should it be static?

vim +/mem_table_base_level +47 kernel/mem_table.c

    42	
    43	/*
    44	 * Within this feature, the table levels start from 0. On X86, the base level
    45	 * is not 0.
    46	 */
  > 47	unsigned int mem_table_base_level __ro_after_init;
  > 48	unsigned int mem_table_nlevels __ro_after_init;
  > 49	struct mem_table_level mem_table_levels[CONFIG_PGTABLE_LEVELS] __ro_after_init;
    50	
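
Sparse emits this warning when a symbol with external linkage is defined without a prior declaration in scope. The two usual ways to address it are sketched below in a single file: declare the symbol in a header that the defining file includes, or make it static if nothing outside the file uses it. The stand-in "header" section, mem_table_set_levels() and mem_table_debug_calls are assumptions for illustration; __ro_after_init is defined away because this is a userspace sketch.

```c
/* Kernel-only section attribute; empty in this userspace sketch. */
#define __ro_after_init

/* --- would live in a shared header (hypothetical) --- */
extern unsigned int mem_table_base_level;
extern unsigned int mem_table_nlevels;
void mem_table_set_levels(unsigned int base, unsigned int nlevels);

/* --- the .c file: definitions now follow a visible declaration --- */
unsigned int mem_table_base_level __ro_after_init;
unsigned int mem_table_nlevels __ro_after_init;

/* Used only in this file, so internal linkage is the right choice. */
static unsigned int mem_table_debug_calls;

void mem_table_set_levels(unsigned int base, unsigned int nlevels)
{
	mem_table_base_level = base;
	mem_table_nlevels = nlevels;
	mem_table_debug_calls++;
}
```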

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

* Re: [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching
  2023-11-13  8:19   ` Peter Zijlstra
@ 2023-11-27 16:48     ` Madhavan T. Venkataraman
  2023-11-27 20:08       ` Peter Zijlstra
  0 siblings, 1 reply; 44+ messages in thread
From: Madhavan T. Venkataraman @ 2023-11-27 16:48 UTC (permalink / raw)
  To: Peter Zijlstra, Mickaël Salaün
  Cc: Borislav Petkov, Dave Hansen, H . Peter Anvin, Ingo Molnar,
	Kees Cook, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Vitaly Kuznetsov, Wanpeng Li, Alexander Graf, Chao Peng,
	Edgecombe, Rick P, Forrest Yuan Yu, James Gowans, James Morris,
	John Andersen, Marian Rotariu, Mihai Donțu,
	Nicușor Cîțu, Thara Gopinath, Trilok Soni,
	Wei Liu, Will Deacon, Yu Zhang, Zahra Tarkhani,
	Ștefan Șicleru, dev, kvm, linux-hardening,
	linux-hyperv, linux-kernel, linux-security-module, qemu-devel,
	virtualization, x86, xen-devel

Apologies for the late reply. I was on vacation. Please see my response below:

On 11/13/23 02:19, Peter Zijlstra wrote:
> On Sun, Nov 12, 2023 at 09:23:24PM -0500, Mickaël Salaün wrote:
>> From: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
>>
>> X86 uses a function called __text_poke() to modify executable code. This
>> patching function is used by many features such as KProbes and FTrace.
>>
>> Update the permissions counters for the text page so that write
>> permissions can be temporarily established in the EPT to modify the
>> instructions in that page.
>>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: H. Peter Anvin <hpa@zytor.com>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Kees Cook <keescook@chromium.org>
>> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
>> Cc: Mickaël Salaün <mic@digikod.net>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Sean Christopherson <seanjc@google.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>> Cc: Wanpeng Li <wanpengli@tencent.com>
>> Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
>> ---
>>
>> Changes since v1:
>> * New patch
>> ---
>>  arch/x86/kernel/alternative.c |  5 ++++
>>  arch/x86/mm/heki.c            | 49 +++++++++++++++++++++++++++++++++++
>>  include/linux/heki.h          | 14 ++++++++++
>>  3 files changed, 68 insertions(+)
>>
>> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
>> index 517ee01503be..64fd8757ba5c 100644
>> --- a/arch/x86/kernel/alternative.c
>> +++ b/arch/x86/kernel/alternative.c
>> @@ -18,6 +18,7 @@
>>  #include <linux/mmu_context.h>
>>  #include <linux/bsearch.h>
>>  #include <linux/sync_core.h>
>> +#include <linux/heki.h>
>>  #include <asm/text-patching.h>
>>  #include <asm/alternative.h>
>>  #include <asm/sections.h>
>> @@ -1801,6 +1802,7 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
>>  	 */
>>  	pgprot = __pgprot(pgprot_val(PAGE_KERNEL) & ~_PAGE_GLOBAL);
>>  
>> +	heki_text_poke_start(pages, cross_page_boundary ? 2 : 1, pgprot);
>>  	/*
>>  	 * The lock is not really needed, but this allows to avoid open-coding.
>>  	 */
>> @@ -1865,7 +1867,10 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
>>  	}
>>  
>>  	local_irq_restore(flags);
>> +
>>  	pte_unmap_unlock(ptep, ptl);
>> +	heki_text_poke_end(pages, cross_page_boundary ? 2 : 1, pgprot);
>> +
>>  	return addr;
>>  }
> 
> This makes no sense, we already use a custom CR3 with userspace alias
> for the actual pages to write to, why are you then frobbing permissions
> on that *again* ?

Today, the permissions for a guest page in the extended page table (EPT) are RWX unless they are
restricted for a specific reason (for example, for shadow page table pages). With this Heki feature, RWX
is no longer the default in the EPT: the EPT grants only the permissions that the guest page actually needs.
For example, a text page is R_X in both the guest page table and the EPT.

For text patching, the above code establishes an alternate mapping in the guest page table that is RW_ so
that the text can be patched. That must be reflected in the EPT, so the EPT permissions change from R_X
to RWX for the duration of the patch. In other words, RWX is allowed only as necessary. At the end of
patching, the EPT permissions are restored to R_X.

Does that address your comment?

Madhavan
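
The "permissions counters" idea described above can be modelled in a few lines of C. This is a sketch under stated assumptions, not the code from arch/x86/mm/heki.c: the structure layout, the PERM_* values and the sketch_* function names are all hypothetical. It assumes one counter per permission bit and that the EPT grants a bit while its counter is non-zero.

```c
#include <assert.h>

#define PERM_R 0x1u
#define PERM_W 0x2u
#define PERM_X 0x4u

/* Hypothetical per-page state: one reference count per permission bit. */
struct page_perm_counters {
	unsigned int r, w, x;
};

/* Permissions the EPT would grant: a bit is set while its count is > 0. */
unsigned int sketch_ept_perms(const struct page_perm_counters *c)
{
	return (c->r ? PERM_R : 0) | (c->w ? PERM_W : 0) | (c->x ? PERM_X : 0);
}

/* A writable alias of the page is being set up: take a write reference. */
void sketch_poke_start(struct page_perm_counters *c)
{
	c->w++;
}

/* The alias is gone: drop the write reference taken at start. */
void sketch_poke_end(struct page_perm_counters *c)
{
	assert(c->w > 0);
	c->w--;
}
```

For a text page that starts with r = 1, w = 0, x = 1, the granted permissions go R_X, then RWX between start and end, then R_X again.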

* Re: [RFC PATCH v2 18/19] heki: x86: Protect guest kernel memory using the KVM hypervisor
  2023-11-13  8:54   ` Peter Zijlstra
@ 2023-11-27 17:05     ` Madhavan T. Venkataraman
  2023-11-27 20:03       ` Peter Zijlstra
  0 siblings, 1 reply; 44+ messages in thread
From: Madhavan T. Venkataraman @ 2023-11-27 17:05 UTC (permalink / raw)
  To: Peter Zijlstra, Mickaël Salaün
  Cc: Borislav Petkov, Dave Hansen, H . Peter Anvin, Ingo Molnar,
	Kees Cook, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Vitaly Kuznetsov, Wanpeng Li, Alexander Graf, Chao Peng,
	Edgecombe, Rick P, Forrest Yuan Yu, James Gowans, James Morris,
	John Andersen, Marian Rotariu, Mihai Donțu,
	Nicușor Cîțu, Thara Gopinath, Trilok Soni,
	Wei Liu, Will Deacon, Yu Zhang, Zahra Tarkhani,
	Ștefan Șicleru, dev, kvm, linux-hardening,
	linux-hyperv, linux-kernel, linux-security-module, qemu-devel,
	virtualization, x86, xen-devel

Apologies for the late reply. I was on vacation. Please see my response below:

On 11/13/23 02:54, Peter Zijlstra wrote:
> On Sun, Nov 12, 2023 at 09:23:25PM -0500, Mickaël Salaün wrote:
>> From: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
>>
>> Implement a hypervisor function, kvm_protect_memory() that calls the
>> KVM_HC_PROTECT_MEMORY hypercall to request the KVM hypervisor to
>> set specified permissions on a list of guest pages.
>>
>> Using the protect_memory() function, set proper EPT permissions for all
>> guest pages.
>>
>> Use the MEM_ATTR_IMMUTABLE property to protect the kernel static
>> sections and the boot-time read-only sections. This ensures that a
>> compromised guest cannot change the permissions of its main physical
>> memory pages. However, this also disables any feature that may change
>> the kernel's text section (e.g., ftrace, Kprobes), although they can
>> still be used on kernel modules.
>>
>> Module loading/unloading and eBPF JIT are allowed without restrictions
>> for now, but we'll need a way to authenticate these code changes to
>> really improve the guests' security. We plan to use module signatures,
>> but there is no solution yet to authenticate eBPF programs.
>>
>> Being able to use ftrace and Kprobes in a secure way is a challenge not
>> solved yet. We're looking for ideas to make this work.
>>
>> Likewise, the JUMP_LABEL feature cannot work because the kernel's text
>> section is read-only.
> 
> What is the actual problem? As is the kernel text map is already RO and
> never changed.

For the JUMP_LABEL optimization, the text needs to be patched at some point.
That patching requires a writable mapping of the text page at the time of
patching.

In this Heki feature, we currently lock down the kernel text at the end of
kernel boot just before kicking off the init process. The lockdown is
implemented by setting the permissions of a text page to R_X in the extended
page table and not allowing write permissions in the EPT after that. So, jump label
patching during kernel boot is not a problem. But doing it after kernel
boot is a problem.

The lockdown is just for the current Heki implementation. In the future, we plan
to have a way of authenticating guest requests to change permissions on a text page.
Once that is in place, permissions on text pages can be changed on the fly to
support features that depend on text patching - FTrace, KProbes, etc.

Madhavan
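
The lockdown behaviour described above (text set to R_X at the end of boot, write permission refused afterwards) can be sketched as a tiny state machine. The names and values below are hypothetical. The sketch also assumes that only requests containing the write bit are refused after lockdown, which this message does not spell out.

```c
#include <errno.h>
#include <stdbool.h>

#define PERM_R 0x1u
#define PERM_W 0x2u
#define PERM_X 0x4u

/* Hypothetical hypervisor-side view of one guest text page. */
struct text_page {
	unsigned int ept_perms;
	bool locked;
};

/* End of kernel boot: pin the page to R_X and refuse writes from now on. */
void sketch_lock_down(struct text_page *p)
{
	p->ept_perms = PERM_R | PERM_X;
	p->locked = true;
}

/* Guest request to change EPT permissions on the page. */
int sketch_request_perms(struct text_page *p, unsigned int perms)
{
	if (p->locked && (perms & PERM_W))
		return -EPERM;	/* post-boot patching is rejected */

	p->ept_perms = perms;
	return 0;
}
```

Before sketch_lock_down() a request for RWX succeeds, which corresponds to jump label patching during boot. After it, the same request returns -EPERM and the page stays R_X.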

* Re: [RFC PATCH v2 18/19] heki: x86: Protect guest kernel memory using the KVM hypervisor
  2023-11-27 17:05     ` Madhavan T. Venkataraman
@ 2023-11-27 20:03       ` Peter Zijlstra
  2023-11-29 19:47         ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2023-11-27 20:03 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: Mickaël Salaün, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Ingo Molnar, Kees Cook, Paolo Bonzini,
	Sean Christopherson, Thomas Gleixner, Vitaly Kuznetsov,
	Wanpeng Li, Alexander Graf, Chao Peng, Edgecombe, Rick P,
	Forrest Yuan Yu, James Gowans, James Morris, John Andersen,
	Marian Rotariu, Mihai Donțu, Nicușor Cîțu,
	Thara Gopinath, Trilok Soni, Wei Liu, Will Deacon, Yu Zhang,
	Zahra Tarkhani, Ștefan Șicleru, dev, kvm,
	linux-hardening, linux-hyperv, linux-kernel,
	linux-security-module, qemu-devel, virtualization, x86,
	xen-devel

On Mon, Nov 27, 2023 at 11:05:23AM -0600, Madhavan T. Venkataraman wrote:
> Apologies for the late reply. I was on vacation. Please see my response below:
> 
> On 11/13/23 02:54, Peter Zijlstra wrote:
> > On Sun, Nov 12, 2023 at 09:23:25PM -0500, Mickaël Salaün wrote:
> >> From: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
> >>
> >> Implement a hypervisor function, kvm_protect_memory() that calls the
> >> KVM_HC_PROTECT_MEMORY hypercall to request the KVM hypervisor to
> >> set specified permissions on a list of guest pages.
> >>
> >> Using the protect_memory() function, set proper EPT permissions for all
> >> guest pages.
> >>
> >> Use the MEM_ATTR_IMMUTABLE property to protect the kernel static
> >> sections and the boot-time read-only sections. This ensures that a
> >> compromised guest cannot change the permissions of its main physical
> >> memory pages. However, this also disables any feature that may change
> >> the kernel's text section (e.g., ftrace, Kprobes), although they can
> >> still be used on kernel modules.
> >>
> >> Module loading/unloading and eBPF JIT are allowed without restrictions
> >> for now, but we'll need a way to authenticate these code changes to
> >> really improve the guests' security. We plan to use module signatures,
> >> but there is no solution yet to authenticate eBPF programs.
> >>
> >> Being able to use ftrace and Kprobes in a secure way is a challenge not
> >> solved yet. We're looking for ideas to make this work.
> >>
> >> Likewise, the JUMP_LABEL feature cannot work because the kernel's text
> >> section is read-only.
> > 
> > What is the actual problem? As is the kernel text map is already RO and
> > never changed.
> 
> For the JUMP_LABEL optimization, the text needs to be patched at some point.
> That patching requires a writable mapping of the text page at the time of
> patching.
> 
> In this Heki feature, we currently lock down the kernel text at the end of
> kernel boot just before kicking off the init process. The lockdown is
> implemented by setting the permissions of a text page to R_X in the extended
> page table and not allowing write permissions in the EPT after that. So, jump label
> patching during kernel boot is not a problem. But doing it after kernel
> boot is a problem.

But you see, that's exactly what the kernel already does with the normal
permissions. They get set to RX after init and are never changed.

See the previous patch, we establish a read-write alias and write there.

You seem to lack basic understanding of how the kernel works in this
regard, which makes me very nervous about you touching any of this.

I must also say I really dislike your extra/random permission calls all
over the place. They don't really get us anything afaict. Why can't you
plumb into the existing set_memory_*() family?
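
The "read-write alias" technique mentioned above, where the original mapping stays read-only and the write goes through a second mapping of the same memory, has a simple userspace analogy. The sketch below maps one temporary file twice with mmap(), once read-only and once read-write. It only illustrates aliasing in general. It does not reproduce how __text_poke() sets up its temporary mapping, and write_through_alias() is a made-up name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if data written through the RW alias is visible through the
 * read-only mapping, 0 if it is not, and -1 on a setup error.
 */
int write_through_alias(void)
{
	long page = sysconf(_SC_PAGESIZE);
	FILE *backing = tmpfile();
	char *ro = MAP_FAILED, *rw = MAP_FAILED;
	int visible = -1;

	if (backing == NULL || page <= 0)
		goto out;
	if (ftruncate(fileno(backing), page) != 0)
		goto out;

	/* The "text" view: read-only, and its protection is never changed. */
	ro = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED,
		  fileno(backing), 0);
	if (ro == MAP_FAILED)
		goto out;

	/* The temporary alias: same backing memory, but writable. */
	rw = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED,
		  fileno(backing), 0);
	if (rw == MAP_FAILED)
		goto out;

	strcpy(rw, "patched");
	visible = (strcmp(ro, "patched") == 0);

out:
	if (rw != MAP_FAILED)
		munmap(rw, (size_t)page);
	if (ro != MAP_FAILED)
		munmap(ro, (size_t)page);
	if (backing != NULL)
		fclose(backing);
	return visible;
}
```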

* Re: [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching
  2023-11-27 16:48     ` Madhavan T. Venkataraman
@ 2023-11-27 20:08       ` Peter Zijlstra
  2023-11-29 21:07         ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2023-11-27 20:08 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: Mickaël Salaün, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Ingo Molnar, Kees Cook, Paolo Bonzini,
	Sean Christopherson, Thomas Gleixner, Vitaly Kuznetsov,
	Wanpeng Li, Alexander Graf, Chao Peng, Edgecombe, Rick P,
	Forrest Yuan Yu, James Gowans, James Morris, John Andersen,
	Marian Rotariu, Mihai Donțu, Nicușor Cîțu,
	Thara Gopinath, Trilok Soni, Wei Liu, Will Deacon, Yu Zhang,
	Zahra Tarkhani, Ștefan Șicleru, dev, kvm,
	linux-hardening, linux-hyperv, linux-kernel,
	linux-security-module, qemu-devel, virtualization, x86,
	xen-devel

On Mon, Nov 27, 2023 at 10:48:29AM -0600, Madhavan T. Venkataraman wrote:
> Apologies for the late reply. I was on vacation. Please see my response below:
> 
> On 11/13/23 02:19, Peter Zijlstra wrote:
> > On Sun, Nov 12, 2023 at 09:23:24PM -0500, Mickaël Salaün wrote:
> >> From: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
> >>
> >> X86 uses a function called __text_poke() to modify executable code. This
> >> patching function is used by many features such as KProbes and FTrace.
> >>
> >> Update the permissions counters for the text page so that write
> >> permissions can be temporarily established in the EPT to modify the
> >> instructions in that page.
> >>
> >> Cc: Borislav Petkov <bp@alien8.de>
> >> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> >> Cc: H. Peter Anvin <hpa@zytor.com>
> >> Cc: Ingo Molnar <mingo@redhat.com>
> >> Cc: Kees Cook <keescook@chromium.org>
> >> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
> >> Cc: Mickaël Salaün <mic@digikod.net>
> >> Cc: Paolo Bonzini <pbonzini@redhat.com>
> >> Cc: Sean Christopherson <seanjc@google.com>
> >> Cc: Thomas Gleixner <tglx@linutronix.de>
> >> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> Cc: Wanpeng Li <wanpengli@tencent.com>
> >> Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
> >> ---
> >>
> >> Changes since v1:
> >> * New patch
> >> ---
> >>  arch/x86/kernel/alternative.c |  5 ++++
> >>  arch/x86/mm/heki.c            | 49 +++++++++++++++++++++++++++++++++++
> >>  include/linux/heki.h          | 14 ++++++++++
> >>  3 files changed, 68 insertions(+)
> >>
> >> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> >> index 517ee01503be..64fd8757ba5c 100644
> >> --- a/arch/x86/kernel/alternative.c
> >> +++ b/arch/x86/kernel/alternative.c
> >> @@ -18,6 +18,7 @@
> >>  #include <linux/mmu_context.h>
> >>  #include <linux/bsearch.h>
> >>  #include <linux/sync_core.h>
> >> +#include <linux/heki.h>
> >>  #include <asm/text-patching.h>
> >>  #include <asm/alternative.h>
> >>  #include <asm/sections.h>
> >> @@ -1801,6 +1802,7 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
> >>  	 */
> >>  	pgprot = __pgprot(pgprot_val(PAGE_KERNEL) & ~_PAGE_GLOBAL);
> >>  
> >> +	heki_text_poke_start(pages, cross_page_boundary ? 2 : 1, pgprot);
> >>  	/*
> >>  	 * The lock is not really needed, but this allows to avoid open-coding.
> >>  	 */
> >> @@ -1865,7 +1867,10 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
> >>  	}
> >>  
> >>  	local_irq_restore(flags);
> >> +
> >>  	pte_unmap_unlock(ptep, ptl);
> >> +	heki_text_poke_end(pages, cross_page_boundary ? 2 : 1, pgprot);
> >> +
> >>  	return addr;
> >>  }
> > 
> > This makes no sense, we already use a custom CR3 with userspace alias
> > for the actual pages to write to, why are you then frobbing permissions
> > on that *again* ?
> 
> Today, the permissions for a guest page in the extended page table
> (EPT) are RWX (unless permissions are restricted for some specific
> reason like for shadow page table pages). In this Heki feature, we
> don't allow RWX by default in the EPT. We only allow those permissions
> in the EPT that the guest page actually needs.  E.g., for a text page,
> it is R_X in both the guest page table and the EPT.

To what end? If you always mirror what the guest does, you've not
actually gained anything.

> For text patching, the above code establishes an alternate mapping in
> the guest page table that is RW_ so that the text can be patched. That
> needs to be reflected in the EPT so that the EPT permissions will
> change from R_X to RWX. In other words, RWX is allowed only as
> necessary. At the end of patching, the EPT permissions are restored to
> R_X.
> 
> Does that address your comment?

No, if you want to mirror the native PTEs why don't you hook into the
paravirt page-table muck and get all that for free?

Also, this is the user range, are you saying you're also playing these
daft games with user maps?
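
Both positions in this exchange can be stated with one expression. Under two-stage translation an access has to be permitted by the guest page table and by the EPT, so the effective permission is the intersection of the two. The sketch below uses hypothetical PERM_* values and a made-up helper name.

```c
#define PERM_R 0x1u
#define PERM_W 0x2u
#define PERM_X 0x4u

/* An access is allowed only if both translation stages allow it. */
unsigned int effective_perms(unsigned int guest_pte, unsigned int ept)
{
	return guest_pte & ept;
}
```

If the EPT always mirrors the guest PTE, effective_perms(g, g) is just g, so the EPT adds no restriction of its own. If the EPT is held at R_X while the guest creates an RW_ alias, the result is PERM_R and the write is denied, which is why the series has to raise the EPT permissions around the patching.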

* Re: [RFC PATCH v2 18/19] heki: x86: Protect guest kernel memory using the KVM hypervisor
  2023-11-27 20:03       ` Peter Zijlstra
@ 2023-11-29 19:47         ` Madhavan T. Venkataraman
  0 siblings, 0 replies; 44+ messages in thread
From: Madhavan T. Venkataraman @ 2023-11-29 19:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mickaël Salaün, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Ingo Molnar, Kees Cook, Paolo Bonzini,
	Sean Christopherson, Thomas Gleixner, Vitaly Kuznetsov,
	Wanpeng Li, Alexander Graf, Chao Peng, Edgecombe, Rick P,
	Forrest Yuan Yu, James Gowans, James Morris, John Andersen,
	Marian Rotariu, Mihai Donțu, Nicușor Cîțu,
	Thara Gopinath, Trilok Soni, Wei Liu, Will Deacon, Yu Zhang,
	Zahra Tarkhani, Ștefan Șicleru, dev, kvm,
	linux-hardening, linux-hyperv, linux-kernel,
	linux-security-module, qemu-devel, virtualization, x86,
	xen-devel



On 11/27/23 14:03, Peter Zijlstra wrote:
> On Mon, Nov 27, 2023 at 11:05:23AM -0600, Madhavan T. Venkataraman wrote:
>> Apologies for the late reply. I was on vacation. Please see my response below:
>>
>> On 11/13/23 02:54, Peter Zijlstra wrote:
>>> On Sun, Nov 12, 2023 at 09:23:25PM -0500, Mickaël Salaün wrote:
>>>> From: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
>>>>
>>>> Implement a hypervisor function, kvm_protect_memory() that calls the
>>>> KVM_HC_PROTECT_MEMORY hypercall to request the KVM hypervisor to
>>>> set specified permissions on a list of guest pages.
>>>>
>>>> Using the protect_memory() function, set proper EPT permissions for all
>>>> guest pages.
>>>>
>>>> Use the MEM_ATTR_IMMUTABLE property to protect the kernel static
>>>> sections and the boot-time read-only sections. This ensures that a
>>>> compromised guest cannot change the permissions of its main physical
>>>> memory pages. However, this also disables any feature that may change
>>>> the kernel's text section (e.g., ftrace, Kprobes), although they can
>>>> still be used on kernel modules.
>>>>
>>>> Module loading/unloading and eBPF JIT are allowed without restrictions
>>>> for now, but we'll need a way to authenticate these code changes to
>>>> really improve the guests' security. We plan to use module signatures,
>>>> but there is no solution yet to authenticate eBPF programs.
>>>>
>>>> Being able to use ftrace and Kprobes in a secure way is a challenge not
>>>> solved yet. We're looking for ideas to make this work.
>>>>
>>>> Likewise, the JUMP_LABEL feature cannot work because the kernel's text
>>>> section is read-only.
>>>
>>> What is the actual problem? As is the kernel text map is already RO and
>>> never changed.
>>
>> For the JUMP_LABEL optimization, the text needs to be patched at some point.
>> That patching requires a writable mapping of the text page at the time of
>> patching.
>>
>> In this Heki feature, we currently lock down the kernel text at the end of
>> kernel boot just before kicking off the init process. The lockdown is
>> implemented by setting the permissions of a text page to R_X in the extended
>> page table and not allowing write permissions in the EPT after that. So, jump label
>> patching during kernel boot is not a problem. But doing it after kernel
>> boot is a problem.
> 
> But you see, that's exactly what the kernel already does with the normal
> permissions. They get set to RX after init and are never changed.
> 
> See the previous patch, we establish a read-write alias and write there.
> 
> You seem to lack basic understanding of how the kernel works in this
> regard, which makes me very nervous about you touching any of this.
> 
> I must also say I really dislike your extra/random permission calls all
> over the place. They don't really get us anything afaict. Why can't you
> plumb into the existing set_memory_*() family?

I have responded to your comments on your other email. Please read my
response there.

Thanks.

Madhavan

* Re: [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching
  2023-11-27 20:08       ` Peter Zijlstra
@ 2023-11-29 21:07         ` Madhavan T. Venkataraman
  2023-11-30 11:33           ` Peter Zijlstra
  2023-12-01  0:45           ` Edgecombe, Rick P
  0 siblings, 2 replies; 44+ messages in thread
From: Madhavan T. Venkataraman @ 2023-11-29 21:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mickaël Salaün, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Ingo Molnar, Kees Cook, Paolo Bonzini,
	Sean Christopherson, Thomas Gleixner, Vitaly Kuznetsov,
	Wanpeng Li, Alexander Graf, Chao Peng, Edgecombe, Rick P,
	Forrest Yuan Yu, James Gowans, James Morris, John Andersen,
	Marian Rotariu, Mihai Donțu, Nicușor Cîțu,
	Thara Gopinath, Trilok Soni, Wei Liu, Will Deacon, Yu Zhang,
	Zahra Tarkhani, Ștefan Șicleru, dev, kvm,
	linux-hardening, linux-hyperv, linux-kernel,
	linux-security-module, qemu-devel, virtualization, x86,
	xen-devel



On 11/27/23 14:08, Peter Zijlstra wrote:
> On Mon, Nov 27, 2023 at 10:48:29AM -0600, Madhavan T. Venkataraman wrote:
>> Apologies for the late reply. I was on vacation. Please see my response below:
>>
>> On 11/13/23 02:19, Peter Zijlstra wrote:
>>> On Sun, Nov 12, 2023 at 09:23:24PM -0500, Mickaël Salaün wrote:
>>>> From: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
>>>>
>>>> X86 uses a function called __text_poke() to modify executable code. This
>>>> patching function is used by many features such as KProbes and FTrace.
>>>>
>>>> Update the permissions counters for the text page so that write
>>>> permissions can be temporarily established in the EPT to modify the
>>>> instructions in that page.
>>>>
>>>> Cc: Borislav Petkov <bp@alien8.de>
>>>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>>>> Cc: H. Peter Anvin <hpa@zytor.com>
>>>> Cc: Ingo Molnar <mingo@redhat.com>
>>>> Cc: Kees Cook <keescook@chromium.org>
>>>> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
>>>> Cc: Mickaël Salaün <mic@digikod.net>
>>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>>> Cc: Sean Christopherson <seanjc@google.com>
>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>>>> Cc: Wanpeng Li <wanpengli@tencent.com>
>>>> Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
>>>> ---
>>>>
>>>> Changes since v1:
>>>> * New patch
>>>> ---
>>>>  arch/x86/kernel/alternative.c |  5 ++++
>>>>  arch/x86/mm/heki.c            | 49 +++++++++++++++++++++++++++++++++++
>>>>  include/linux/heki.h          | 14 ++++++++++
>>>>  3 files changed, 68 insertions(+)
>>>>
>>>> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
>>>> index 517ee01503be..64fd8757ba5c 100644
>>>> --- a/arch/x86/kernel/alternative.c
>>>> +++ b/arch/x86/kernel/alternative.c
>>>> @@ -18,6 +18,7 @@
>>>>  #include <linux/mmu_context.h>
>>>>  #include <linux/bsearch.h>
>>>>  #include <linux/sync_core.h>
>>>> +#include <linux/heki.h>
>>>>  #include <asm/text-patching.h>
>>>>  #include <asm/alternative.h>
>>>>  #include <asm/sections.h>
>>>> @@ -1801,6 +1802,7 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
>>>>  	 */
>>>>  	pgprot = __pgprot(pgprot_val(PAGE_KERNEL) & ~_PAGE_GLOBAL);
>>>>  
>>>> +	heki_text_poke_start(pages, cross_page_boundary ? 2 : 1, pgprot);
>>>>  	/*
>>>>  	 * The lock is not really needed, but this allows to avoid open-coding.
>>>>  	 */
>>>> @@ -1865,7 +1867,10 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
>>>>  	}
>>>>  
>>>>  	local_irq_restore(flags);
>>>> +
>>>>  	pte_unmap_unlock(ptep, ptl);
>>>> +	heki_text_poke_end(pages, cross_page_boundary ? 2 : 1, pgprot);
>>>> +
>>>>  	return addr;
>>>>  }
>>>
>>> This makes no sense, we already use a custom CR3 with userspace alias
>>> for the actual pages to write to, why are you then frobbing permissions
>>> on that *again* ?
>>
>> Today, the permissions for a guest page in the extended page table
>> (EPT) are RWX (unless permissions are restricted for some specific
>> reason like for shadow page table pages). In this Heki feature, we
>> don't allow RWX by default in the EPT. We only allow those permissions
>> in the EPT that the guest page actually needs.  E.g., for a text page,
>> it is R_X in both the guest page table and the EPT.
> 
> To what end? If you always mirror what the guest does, you've not
> actually gained anything.
> 
>> For text patching, the above code establishes an alternate mapping in
>> the guest page table that is RW_ so that the text can be patched. That
>> needs to be reflected in the EPT so that the EPT permissions will
>> change from R_X to RWX. In other words, RWX is allowed only as
>> necessary. At the end of patching, the EPT permissions are restored to
>> R_X.
>>
>> Does that address your comment?
> 
> No, if you want to mirror the native PTEs why don't you hook into the
> paravirt page-table muck and get all that for free?
> 
> Also, this is the user range, are you saying you're also playing these
> daft games with user maps?

I think that we should have done a better job of communicating the threat
model in Heki and how we are trying to address it. I will correct that here.
I think this will help answer many questions. Some of these questions also
came up at LPC when we presented this. Apologies for the slightly long answer.
It is for everyone's benefit. Bear with me.

Threat Model
------------

In the threat model in Heki, the attacker is a user space attacker who exploits
a kernel vulnerability to gain more privileges or bypass the kernel's access
control and self-protection mechanisms. 

In the context of the guest page table, one of the things that the threat model translates
to is a hacker gaining access to a guest page with RWX permissions. E.g., by adding execute
permissions to a writable page or by adding write permissions to an executable page.

Today, the permissions for a guest page in the extended page table are RWX by
default. So, if a hacker manages to establish RWX for a page in the guest page
table, then that is all he needs to do some damage.

How to defeat the threat
------------------------

To defeat this, we need to establish the correct permissions for a guest page
in the extended page table as well. That is, R_X for a text page, R__ for a
read-only page and RW_ for a writable page. The only exception is a guest page
that is mapped via multiple mappings with different permissions in each
mapping. In that case, the collective permissions across all mappings need
to be established in the extended page table so that all mappings can work.
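
The "collective permissions" rule can be sketched in user-space C as follows.
This is only an illustration, assuming one counter per permission per guest
page (the series talks about "permissions counters", but the struct and
function names below are hypothetical and not taken from the patches):

```c
#include <stdint.h>

/* Permission bits used in this sketch (hypothetical names). */
enum { PERM_R = 1u << 0, PERM_W = 1u << 1, PERM_X = 1u << 2 };

/* Hypothetical per-page record: how many live mappings need each permission. */
struct page_perm_counters {
	uint32_t r;
	uint32_t w;
	uint32_t x;
};

/* Account for a new mapping of the page with the given permissions. */
void counters_map(struct page_perm_counters *c, unsigned int perms)
{
	if (perms & PERM_R)
		c->r++;
	if (perms & PERM_W)
		c->w++;
	if (perms & PERM_X)
		c->x++;
}

/* Account for the removal of a mapping; counters never drop below zero. */
void counters_unmap(struct page_perm_counters *c, unsigned int perms)
{
	if ((perms & PERM_R) && c->r)
		c->r--;
	if ((perms & PERM_W) && c->w)
		c->w--;
	if ((perms & PERM_X) && c->x)
		c->x--;
}

/* Union of the permissions required by all current mappings of the page. */
unsigned int counters_effective(const struct page_perm_counters *c)
{
	unsigned int perms = 0;

	if (c->r)
		perms |= PERM_R;
	if (c->w)
		perms |= PERM_W;
	if (c->x)
		perms |= PERM_X;
	return perms;
}
```

In this model, a page mapped once as R_X and once as RW_ ends up RWX in the
extended page table, and drops back to R_X when the RW_ mapping goes away.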

Mechanism
---------

To achieve all this, Heki finds all the kernel mappings at the end of kernel
boot and reflects their permissions in the extended page table before
kicking off the init process.

During runtime, permissions on a guest page can change because of genuine
kernel operations:
	- vmap/vunmap
	- text patching for FTrace, Kprobes, etc
	- set_memory_*()

In each of these cases as well, the permissions need to be reflected in the
extended page table. In summary, the extended page table permissions mirror
the guest page table ones.
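
As a rough illustration of what "mirroring" means, the sketch below derives
R/W/X permissions from a single x86-64 leaf page table entry. The bit
positions are the architectural ones (present = bit 0, writable = bit 1,
no-execute = bit 63); the macro and function names are made up for this
example, and a real walker would also have to combine the bits of the upper
paging levels:

```c
#include <stdint.h>

/* Architectural x86-64 PTE bits; the macro names are local to this sketch. */
#define PTE_PRESENT (1ULL << 0)   /* _PAGE_PRESENT in the kernel */
#define PTE_RW      (1ULL << 1)   /* _PAGE_RW in the kernel */
#define PTE_NX      (1ULL << 63)  /* _PAGE_NX in the kernel */

enum { PERM_R = 1u << 0, PERM_W = 1u << 1, PERM_X = 1u << 2 };

/*
 * Translate one leaf PTE into the R/W/X permissions that would be
 * mirrored in the extended page table. A non-present entry grants nothing.
 */
unsigned int pte_to_perms(uint64_t pte)
{
	unsigned int perms;

	if (!(pte & PTE_PRESENT))
		return 0;

	perms = PERM_R;               /* a present page is always readable */
	if (pte & PTE_RW)
		perms |= PERM_W;
	if (!(pte & PTE_NX))
		perms |= PERM_X;      /* executable unless NX is set */
	return perms;
}
```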

Authentication
--------------

The above approach addresses the case where a hacker tries to directly
modify a guest page table entry. Such a modification gains him nothing, since
the extended page table permissions are not changed.

Now, the question is - what if a hacker manages to use the Heki primitives
and establish the permissions he wants in the extended page table? All of
this work is for nothing!!

The answer is - authentication. If an entity outside the guest can validate
or authenticate each guest request to change extended page table permissions,
then it can tell a genuine request from an attack. We are planning to make the
VMM that entity. In the case of a genuine request, the VMM will call the
hypervisor and establish the correct permissions in the extended page table.
If the VMM thinks that it is an attack, it will have the hypervisor send an
exception to the guest.

In the current version (RFC V2), we don't have authentication in place. The
VMM is not in the picture yet. This is WIP. We are hoping to implement some
authentication in V3 and improve it as we go forward. So, in V2, we only have
the mechanisms in place.

So, I agree that it is kinda hard to see the value of Heki without authentication.

Kernel Lockdown
---------------

But, we must provide at least some security in V2. Otherwise, it is useless.

So, we have implemented what we call a kernel lockdown. At the end of kernel
boot, Heki establishes permissions in the extended page table as mentioned
before. Also, it adds an immutable attribute for kernel text and kernel RO data.
Beyond that point, guest requests that attempt to modify permissions on any of
the immutable pages will be denied.

This means that features like FTrace and KProbes will not work on kernel text
in V2. This is a temporary limitation. Once authentication is in place, the
limitation will go away.
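
The lockdown rule can be modeled as below. This is a simplified sketch of the
behaviour described above, not the KVM code: the struct, the function names
and the choice of -EPERM as the error are assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

enum { PERM_R = 1u << 0, PERM_W = 1u << 1, PERM_X = 1u << 2 };

/* Hypothetical hypervisor-side state for one guest page. */
struct guest_page_state {
	unsigned int ept_perms;  /* permissions currently in the EPT */
	bool immutable;          /* set for kernel text and RO data at lockdown */
};

/* End of kernel boot: fix the permissions and forbid later changes. */
void lockdown_page(struct guest_page_state *pg, unsigned int perms)
{
	pg->ept_perms = perms;
	pg->immutable = true;
}

/*
 * Guest request to change the EPT permissions of a page.
 * Returns 0 on success, -EPERM if the page is immutable.
 */
int handle_protect_request(struct guest_page_state *pg, unsigned int new_perms)
{
	if (pg->immutable)
		return -EPERM;

	pg->ept_perms = new_perms;
	return 0;
}
```

With this model, a request to add write permission to a locked-down text page
is refused and the page stays R_X, which is why text patching of the core
kernel cannot work in V2.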

Additional information
----------------------

The following primitives call Heki functions to reflect guest page table
permissions in the extended page table:

heki_arch_late_init()
	This calls heki_protect().
	
	This is called at the end of kernel boot to protect all guest kernel
	mappings at that point.

vmap_range_noflush()
vmap_small_pages_range_noflush()
	These functions call heki_map().

	These are the lowest level functions called from different places
	to map something in the kernel address space.

set_memory_nx()
set_memory_rw()
	These functions call heki_change_page_attr_set().

set_memory_x()
set_memory_ro()
set_memory_rox()
	These functions call heki_change_page_attr_clear().

__text_poke()
	This function is called by various features to patch text.
	This calls heki_text_poke_start() and heki_text_poke_end().

	heki_text_poke_start() is called to add write permissions to the
	extended page table so that text can be patched. heki_text_poke_end()
	is called to revert write permissions in the extended page table.
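
The start/end bracketing around a text patch can be illustrated with a small
user-space model. Everything here is hypothetical (the struct, the function
names and the error codes); the increments and decrements merely stand in for
what heki_text_poke_start() and heki_text_poke_end() are described as doing:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

enum { PERM_R = 1u << 0, PERM_W = 1u << 1, PERM_X = 1u << 2 };

/* Hypothetical model of one text page as seen through the EPT. */
struct text_page {
	unsigned char bytes[16];
	unsigned int w_count;   /* mappings that currently need write access */
};

/* Text is always R_X; W is added only while some mapping needs it. */
unsigned int text_page_perms(const struct text_page *p)
{
	return PERM_R | PERM_X | (p->w_count ? PERM_W : 0);
}

/* A write fails (modeling an EPT violation) unless W is currently allowed. */
int text_page_write(struct text_page *p, size_t off, const void *src, size_t len)
{
	if (!(text_page_perms(p) & PERM_W))
		return -EACCES;
	if (off > sizeof(p->bytes) || len > sizeof(p->bytes) - off)
		return -EINVAL;
	memcpy(p->bytes + off, src, len);
	return 0;
}

/* Patch text: open the write window, write, then close the window again. */
int text_page_poke(struct text_page *p, size_t off, const void *src, size_t len)
{
	int ret;

	p->w_count++;           /* stands in for heki_text_poke_start() */
	ret = text_page_write(p, off, src, len);
	p->w_count--;           /* stands in for heki_text_poke_end() */
	return ret;
}
```

A direct write to the page fails, while the same write issued through
text_page_poke() succeeds and leaves the page R_X afterwards.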

User mappings
-------------

All of this work is only for protecting guest kernel mappings. So, the idea is
that the VMM+Hypervisor protect the integrity of the kernel and the kernel
protects the integrity of user land. So, user pages continue to have RWX in the
extended page table. The MBEC feature makes it possible to have separate execute
permission bits in the page table entry for user and kernel.
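
For readers not familiar with MBEC, here is a minimal sketch of the execute
check. The bit positions follow the EPT entry layout as I understand it from
the Intel SDM (bit 2 is the execute bit, which covers only supervisor-mode
addresses once mode-based execute control is enabled, and bit 10 is the
user-mode execute bit); the macro and function names are invented for this
example:

```c
#include <stdbool.h>
#include <stdint.h>

/* EPT leaf entry permission bits; names are local to this sketch. */
#define EPT_READ      (1ULL << 0)
#define EPT_WRITE     (1ULL << 1)
#define EPT_EXEC      (1ULL << 2)   /* supervisor-mode execute with MBEC on */
#define EPT_USER_EXEC (1ULL << 10)  /* user-mode execute with MBEC on */

/*
 * Would an instruction fetch be allowed by this EPT entry?
 * Without MBEC, bit 2 covers both user and supervisor fetches.
 */
bool ept_can_execute(uint64_t epte, bool mbec_enabled, bool user_mode_address)
{
	if (mbec_enabled && user_mode_address)
		return (epte & EPT_USER_EXEC) != 0;
	return (epte & EPT_EXEC) != 0;
}
```

So a user page can keep execute permission for user mode while never being
executable from the kernel, and kernel text can be executable for the kernel
only.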

One final thing
---------------

Peter mentioned the following:

"if you want to mirror the native PTEs why don't you hook into the
paravirt page-table muck and get all that for free?"

We did consider using a shadow page table kind of approach so that guest page table
modifications can be intercepted and reflected in the page table entry. We did not
do this for two reasons:

- there are bits in the page table entry that are not permission bits. We would like
  the guest kernel to be able to modify them directly.

- we cannot tell a genuine request from an attack.

That said, we are still considering making the guest page table read only for added
security. This is WIP. I do not have details yet.

If I have not addressed any comment, please let me know.

As always, thanks for your comments.

Madhavan T Venkataraman





* Re: [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching
  2023-11-29 21:07         ` Madhavan T. Venkataraman
@ 2023-11-30 11:33           ` Peter Zijlstra
  2023-12-06 16:37             ` Madhavan T. Venkataraman
  2023-12-01  0:45           ` Edgecombe, Rick P
  1 sibling, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2023-11-30 11:33 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: Mickaël Salaün, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Ingo Molnar, Kees Cook, Paolo Bonzini,
	Sean Christopherson, Thomas Gleixner, Vitaly Kuznetsov,
	Wanpeng Li, Alexander Graf, Chao Peng, Edgecombe, Rick P,
	Forrest Yuan Yu, James Gowans, James Morris, John Andersen,
	Marian Rotariu, Mihai Donțu, Nicușor Cîțu,
	Thara Gopinath, Trilok Soni, Wei Liu, Will Deacon, Yu Zhang,
	Zahra Tarkhani, Ștefan Șicleru, dev, kvm,
	linux-hardening, linux-hyperv, linux-kernel,
	linux-security-module, qemu-devel, virtualization, x86,
	xen-devel

On Wed, Nov 29, 2023 at 03:07:15PM -0600, Madhavan T. Venkataraman wrote:

> Kernel Lockdown
> ---------------
> 
> But, we must provide at least some security in V2. Otherwise, it is useless.
> 
> So, we have implemented what we call a kernel lockdown. At the end of kernel
> boot, Heki establishes permissions in the extended page table as mentioned
> before. Also, it adds an immutable attribute for kernel text and kernel RO data.
> Beyond that point, guest requests that attempt to modify permissions on any of
> the immutable pages will be denied.
> 
> This means that features like FTrace and KProbes will not work on kernel text
> in V2. This is a temporary limitation. Once authentication is in place, the
> limitation will go away.

So either you're saying your patch 17 / text_poke is broken (so why
include it ?!?) or your statement above is incorrect. Pick one.


> __text_poke()
> 	This function is called by various features to patch text.
> 	This calls heki_text_poke_start() and heki_text_poke_end().
> 
> 	heki_text_poke_start() is called to add write permissions to the
> 	extended page table so that text can be patched. heki_text_poke_end()
> 	is called to revert write permissions in the extended page table.

This, if text_poke works, then static_call / jump_label / ftrace and
everything else should work, they all rely on this.


> Peter mentioned the following:
> 
> "if you want to mirror the native PTEs why don't you hook into the
> paravirt page-table muck and get all that for free?"
> 
> We did consider using a shadow page table kind of approach so that guest page table
> modifications can be intercepted and reflected in the page table entry. We did not
> do this for two reasons:
> 
> - there are bits in the page table entry that are not permission bits. We would like
>   the guest kernel to be able to modify them directly.

This statement makes no sense.

> - we cannot tell a genuine request from an attack.

Why not? How is an explicit call different from an explicit call in a
paravirt hook?

From a maintenance pov we already hate paravirt with a passion, but it
is ever so much better than sprinkling yet another pile of crap
(heki_*) around.

* Re: [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching
  2023-11-29 21:07         ` Madhavan T. Venkataraman
  2023-11-30 11:33           ` Peter Zijlstra
@ 2023-12-01  0:45           ` Edgecombe, Rick P
  2023-12-06 16:41             ` Madhavan T. Venkataraman
  1 sibling, 1 reply; 44+ messages in thread
From: Edgecombe, Rick P @ 2023-12-01  0:45 UTC (permalink / raw)
  To: peterz, madvenka
  Cc: ssicleru, tglx, mic, marian.c.rotariu, kvm, wei.liu,
	virtualization, pbonzini, tgopinath, chao.p.peng, qemu-devel,
	linux-kernel, jgowans, ztarkhani, mdontu, x86, bp, xen-devel,
	jamorris, seanjc, vkuznets, Andersen, John S, yu.c.zhang,
	nicu.citu, keescook, Graf, Alexander, wanpengli, dev, will,
	mingo, hpa, linux-security-module, yuanyu, linux-hyperv,
	linux-hardening, quic_tsoni, dave.hansen

On Wed, 2023-11-29 at 15:07 -0600, Madhavan T. Venkataraman wrote:
> Threat Model
> ------------
> 
> In the threat model in Heki, the attacker is a user space attacker
> who exploits
> a kernel vulnerability to gain more privileges or bypass the kernel's
> access
> control and self-protection mechanisms. 
> 
> In the context of the guest page table, one of the things that the
> threat model translates
> to is a hacker gaining access to a guest page with RWX permissions.
> E.g., by adding execute
> permissions to a writable page or by adding write permissions to an
> executable page.
> 
> Today, the permissions for a guest page in the extended page table
> are RWX by
> default. So, if a hacker manages to establish RWX for a page in the
> guest page
> table, then that is all he needs to do some damage.

I had a few random comments from watching the plumbers talk online:

Is there really a big difference between a page that is RWX, and a RW
page that is about to become RX? I realize that there is an addition of
timing, but when executable code is getting loaded it can be written to
then and later executed. I think that gap could be addressed in two
different ways, both pretty difficult:
 1. Verifying the loaded code before it gets marked 
    executable. This is difficult because the kernel does lots of 
    tweaks on the code it is loading (alternatives, etc). It can't 
    just check a signature.
 2. Loading the code in a protected environment. In this model the 
    (for example) module signature would be checked, then the code 
    would be loaded in some sort of protected environment. This way 
    integrity of the loaded code would be enforced. But extracting 
    module loading into a separate domain would be difficult. 
    Various scattered features all have their hands in the loading.

Secondly, I wonder if another way to look at the memory parts of HEKI
could be that this is a way to protect certain page table bits from
stray writes. The RWX bits in the EPT are not directly writable, so more
steps are needed to change things than just a stray write (instead the
helpers involved in the operations need to be called). If that is a
fair way of looking at it, then I wonder how HEKI compares to a
solution like this security-wise:
https://lore.kernel.org/lkml/20210830235927.6443-1-rick.p.edgecombe@intel.com/

Functional-wise it had the benefit of working on bare metal and
supporting the normal kernel features.

* Re: [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching
  2023-11-30 11:33           ` Peter Zijlstra
@ 2023-12-06 16:37             ` Madhavan T. Venkataraman
  2023-12-06 18:51               ` Peter Zijlstra
  0 siblings, 1 reply; 44+ messages in thread
From: Madhavan T. Venkataraman @ 2023-12-06 16:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mickaël Salaün, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Ingo Molnar, Kees Cook, Paolo Bonzini,
	Sean Christopherson, Thomas Gleixner, Vitaly Kuznetsov,
	Wanpeng Li, Alexander Graf, Chao Peng, Edgecombe, Rick P,
	Forrest Yuan Yu, James Gowans, James Morris, John Andersen,
	Marian Rotariu, Mihai Donțu, Nicușor Cîțu,
	Thara Gopinath, Trilok Soni, Wei Liu, Will Deacon, Yu Zhang,
	Zahra Tarkhani, Ștefan Șicleru, dev, kvm,
	linux-hardening, linux-hyperv, linux-kernel,
	linux-security-module, qemu-devel, virtualization, x86,
	xen-devel



On 11/30/23 05:33, Peter Zijlstra wrote:
> On Wed, Nov 29, 2023 at 03:07:15PM -0600, Madhavan T. Venkataraman wrote:
> 
>> Kernel Lockdown
>> ---------------
>>
>> But, we must provide at least some security in V2. Otherwise, it is useless.
>>
>> So, we have implemented what we call a kernel lockdown. At the end of kernel
>> boot, Heki establishes permissions in the extended page table as mentioned
>> before. Also, it adds an immutable attribute for kernel text and kernel RO data.
>> Beyond that point, guest requests that attempt to modify permissions on any of
>> the immutable pages will be denied.
>>
>> This means that features like FTrace and KProbes will not work on kernel text
>> in V2. This is a temporary limitation. Once authentication is in place, the
>> limitation will go away.
> 
> So either you're saying your patch 17 / text_poke is broken (so why
> include it ?!?) or your statement above is incorrect. Pick one.
> 

It has been included so that people can be aware of the changes.

I will remove the text_poke() changes from the patchset and send it later when
I have some authentication in place. It will make sense then.

> 
>> __text_poke()
>> 	This function is called by various features to patch text.
>> 	This calls heki_text_poke_start() and heki_text_poke_end().
>>
>> 	heki_text_poke_start() is called to add write permissions to the
>> 	extended page table so that text can be patched. heki_text_poke_end()
>> 	is called to revert write permissions in the extended page table.
> 
> This, if text_poke works, then static_call / jump_label / ftrace and
> everything else should work, they all rely on this.
> 
> 
>> Peter mentioned the following:
>>
>> "if you want to mirror the native PTEs why don't you hook into the
>> paravirt page-table muck and get all that for free?"
>>
>> We did consider using a shadow page table kind of approach so that guest page table
>> modifications can be intercepted and reflected in the page table entry. We did not
>> do this for two reasons:
>>
>> - there are bits in the page table entry that are not permission bits. We would like
>>   the guest kernel to be able to modify them directly.
> 
> This statement makes no sense.
> 
>> - we cannot tell a genuine request from an attack.
> 
> Why not? How is an explicit call different from an explicit call in a
> paravirt hook?
> 
> From a maintenance pov we already hate paravirt with a passion, but it
> is ever so much better than sprinkling yet another pile of crap
> (heki_*) around.

I only said that the idea was considered.

We can resume the discussion on this topic when I send the text_poke() changes in a later
version of the Heki patchset.

Madhavan

* Re: [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching
  2023-12-01  0:45           ` Edgecombe, Rick P
@ 2023-12-06 16:41             ` Madhavan T. Venkataraman
  0 siblings, 0 replies; 44+ messages in thread
From: Madhavan T. Venkataraman @ 2023-12-06 16:41 UTC (permalink / raw)
  To: Edgecombe, Rick P, peterz
  Cc: ssicleru, tglx, mic, marian.c.rotariu, kvm, wei.liu,
	virtualization, pbonzini, tgopinath, chao.p.peng, qemu-devel,
	linux-kernel, jgowans, ztarkhani, mdontu, x86, bp, xen-devel,
	jamorris, seanjc, vkuznets, Andersen, John S, yu.c.zhang,
	nicu.citu, keescook, Graf, Alexander, wanpengli, dev, will,
	mingo, hpa, linux-security-module, yuanyu, linux-hyperv,
	linux-hardening, quic_tsoni, dave.hansen



On 11/30/23 18:45, Edgecombe, Rick P wrote:
> On Wed, 2023-11-29 at 15:07 -0600, Madhavan T. Venkataraman wrote:
>> Threat Model
>> ------------
>>
>> In the threat model in Heki, the attacker is a user space attacker
>> who exploits
>> a kernel vulnerability to gain more privileges or bypass the kernel's
>> access
>> control and self-protection mechanisms. 
>>
>> In the context of the guest page table, one of the things that the
>> threat model translates
>> to is a hacker gaining access to a guest page with RWX permissions.
>> E.g., by adding execute
>> permissions to a writable page or by adding write permissions to an
>> executable page.
>>
>> Today, the permissions for a guest page in the extended page table
>> are RWX by
>> default. So, if a hacker manages to establish RWX for a page in the
>> guest page
>> table, then that is all he needs to do some damage.
> 
> I had a few random comments from watching the plumbers talk online:
> 
> Is there really a big difference between a page that is RWX, and a RW
> page that is about to become RX? I realize that there is an addition of
> timing, but when executable code is getting loaded it can be written to
> then and later executed. I think that gap could be addressed in two
> different ways, both pretty difficult:
>  1. Verifying the loaded code before it gets marked 
>     executable. This is difficult because the kernel does lots of 
>     tweaks on the code it is loading (alternatives, etc). It can't 
>     just check a signature.
>  2. Loading the code in a protected environment. In this model the 
>     (for example) module signature would be checked, then the code 
>     would be loaded in some sort of protected environment. This way 
>     integrity of the loaded code would be enforced. But extracting 
>     module loading into a separate domain would be difficult. 
>     Various scattered features all have their hands in the loading.
> 
> Secondly, I wonder if another way to look at the memory parts of HEKI
> could be that this is a way to protect certain page table bits from
> stray writes. The RWX bits in the EPT are not directly writable, so more
> steps are needed to change things than just a stray write (instead the
> helpers involved in the operations need to be called). If that is a
> fair way of looking at it, then I wonder how HEKI compares to a
> solution like this security-wise:
> https://lore.kernel.org/lkml/20210830235927.6443-1-rick.p.edgecombe@intel.com/
> 
> Functional-wise it had the benefit of working on bare metal and
> supporting the normal kernel features.

Thanks for the comments. I will think about what you have said and will respond
soon.

Madhavan

* Re: [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching
  2023-12-06 16:37             ` Madhavan T. Venkataraman
@ 2023-12-06 18:51               ` Peter Zijlstra
  2023-12-08 18:41                 ` Madhavan T. Venkataraman
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2023-12-06 18:51 UTC (permalink / raw)
  To: Madhavan T. Venkataraman
  Cc: Mickaël Salaün, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Ingo Molnar, Kees Cook, Paolo Bonzini,
	Sean Christopherson, Thomas Gleixner, Vitaly Kuznetsov,
	Wanpeng Li, Alexander Graf, Chao Peng, Edgecombe, Rick P,
	Forrest Yuan Yu, James Gowans, James Morris, John Andersen,
	Marian Rotariu, Mihai Donțu, Nicușor Cîțu,
	Thara Gopinath, Trilok Soni, Wei Liu, Will Deacon, Yu Zhang,
	Zahra Tarkhani, Ștefan Șicleru, dev, kvm,
	linux-hardening, linux-hyperv, linux-kernel,
	linux-security-module, qemu-devel, virtualization, x86,
	xen-devel

On Wed, Dec 06, 2023 at 10:37:33AM -0600, Madhavan T. Venkataraman wrote:
> 
> 
> On 11/30/23 05:33, Peter Zijlstra wrote:
> > On Wed, Nov 29, 2023 at 03:07:15PM -0600, Madhavan T. Venkataraman wrote:
> > 
> >> Kernel Lockdown
> >> ---------------
> >>
> >> But, we must provide at least some security in V2. Otherwise, it is useless.
> >>
> >> So, we have implemented what we call a kernel lockdown. At the end of kernel
> >> boot, Heki establishes permissions in the extended page table as mentioned
> >> before. Also, it adds an immutable attribute for kernel text and kernel RO data.
> >> Beyond that point, guest requests that attempt to modify permissions on any of
> >> the immutable pages will be denied.
> >>
> >> This means that features like FTrace and KProbes will not work on kernel text
> >> in V2. This is a temporary limitation. Once authentication is in place, the
> >> limitation will go away.
> > 
> > So either you're saying your patch 17 / text_poke is broken (so why
> > include it ?!?) or your statement above is incorrect. Pick one.
> > 
> 
> It has been included so that people can be aware of the changes.
> 
> I will remove the text_poke() changes from the patchset and send it later when
> I have some authentication in place. It will make sense then.

If you know it's broken then fucking say so in the Changelog instead of
wasting everybody's time.. OMG.

* Re: [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching
  2023-12-06 18:51               ` Peter Zijlstra
@ 2023-12-08 18:41                 ` Madhavan T. Venkataraman
  0 siblings, 0 replies; 44+ messages in thread
From: Madhavan T. Venkataraman @ 2023-12-08 18:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mickaël Salaün, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Ingo Molnar, Kees Cook, Paolo Bonzini,
	Sean Christopherson, Thomas Gleixner, Vitaly Kuznetsov,
	Wanpeng Li, Alexander Graf, Chao Peng, Edgecombe, Rick P,
	Forrest Yuan Yu, James Gowans, James Morris, John Andersen,
	Marian Rotariu, Mihai Donțu, Nicușor Cîțu,
	Thara Gopinath, Trilok Soni, Wei Liu, Will Deacon, Yu Zhang,
	Zahra Tarkhani, Ștefan Șicleru, dev, kvm,
	linux-hardening, linux-hyperv, linux-kernel,
	linux-security-module, qemu-devel, virtualization, x86,
	xen-devel



On 12/6/23 12:51, Peter Zijlstra wrote:
> On Wed, Dec 06, 2023 at 10:37:33AM -0600, Madhavan T. Venkataraman wrote:
>>
>>
>> On 11/30/23 05:33, Peter Zijlstra wrote:
>>> On Wed, Nov 29, 2023 at 03:07:15PM -0600, Madhavan T. Venkataraman wrote:
>>>
>>>> Kernel Lockdown
>>>> ---------------
>>>>
>>>> But, we must provide at least some security in V2. Otherwise, it is useless.
>>>>
>>>> So, we have implemented what we call a kernel lockdown. At the end of kernel
>>>> boot, Heki establishes permissions in the extended page table as mentioned
>>>> before. Also, it adds an immutable attribute for kernel text and kernel RO data.
>>>> Beyond that point, guest requests that attempt to modify permissions on any of
>>>> the immutable pages will be denied.
>>>>
>>>> This means that features like FTrace and KProbes will not work on kernel text
>>>> in V2. This is a temporary limitation. Once authentication is in place, the
>>>> limitation will go away.
>>>
>>> So either you're saying your patch 17 / text_poke is broken (so why
>>> include it ?!?) or your statement above is incorrect. Pick one.
>>>
>>
>> It has been included so that people can be aware of the changes.
>>
>> I will remove the text_poke() changes from the patchset and send it later when
>> I have some authentication in place. It will make sense then.
> 
> If you know it's broken then fucking say so in the Changelog instead of
> wasting everybody's time.. OMG.

It is not broken. It addresses one part of the problem. The other part is WIP.

I am preparing a detailed response to your comments. I ask you to be patient
until then. In fact, I would appreciate your input/suggestions on some
problems we are trying to solve in this context. I will mention them in my
response.

Madhavan


Thread overview: 44+ messages
2023-11-13  2:23 [RFC PATCH v2 00/19] Hypervisor-Enforced Kernel Integrity Mickaël Salaün
2023-11-13  2:23 ` [RFC PATCH v2 01/19] virt: Introduce Hypervisor Enforced Kernel Integrity (Heki) Mickaël Salaün
2023-11-13  2:23 ` [RFC PATCH v2 02/19] KVM: x86: Add new hypercall to lock control registers Mickaël Salaün
2023-11-13  2:23 ` [RFC PATCH v2 03/19] KVM: x86: Add notifications for Heki policy configuration and violation Mickaël Salaün
2023-11-13  2:23 ` [RFC PATCH v2 04/19] heki: Lock guest control registers at the end of guest kernel init Mickaël Salaün
2023-11-13  2:23 ` [RFC PATCH v2 05/19] KVM: VMX: Add MBEC support Mickaël Salaün
2023-11-13  2:23 ` [RFC PATCH v2 06/19] KVM: x86: Add kvm_x86_ops.fault_gva() Mickaël Salaün
2023-11-13  2:23 ` [RFC PATCH v2 07/19] KVM: x86: Make memory attribute helpers more generic Mickaël Salaün
2023-11-13  2:23 ` [RFC PATCH v2 08/19] KVM: x86: Extend kvm_vm_set_mem_attributes() with a mask Mickaël Salaün
2023-11-13  2:23 ` [RFC PATCH v2 09/19] KVM: x86: Extend kvm_range_has_memory_attributes() with match_all Mickaël Salaün
2023-11-13  2:23 ` [RFC PATCH v2 10/19] KVM: x86: Implement per-guest-page permissions Mickaël Salaün
2023-11-13  2:23 ` [RFC PATCH v2 11/19] KVM: x86: Add new hypercall to set EPT permissions Mickaël Salaün
2023-11-13  4:45   ` kernel test robot
2023-11-13  2:23 ` [RFC PATCH v2 12/19] x86: Implement the Memory Table feature to store arbitrary per-page data Mickaël Salaün
2023-11-22  7:19   ` kernel test robot
2023-11-13  2:23 ` [RFC PATCH v2 13/19] heki: Implement a kernel page table walker Mickaël Salaün
2023-11-13  2:23 ` [RFC PATCH v2 14/19] heki: x86: Initialize permissions counters for pages mapped into KVA Mickaël Salaün
2023-11-13  2:23 ` [RFC PATCH v2 15/19] heki: x86: Initialize permissions counters for pages in vmap()/vunmap() Mickaël Salaün
2023-11-13  2:23 ` [RFC PATCH v2 16/19] heki: x86: Update permissions counters when guest page permissions change Mickaël Salaün
2023-11-13  2:23 ` [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching Mickaël Salaün
2023-11-13  8:19   ` Peter Zijlstra
2023-11-27 16:48     ` Madhavan T. Venkataraman
2023-11-27 20:08       ` Peter Zijlstra
2023-11-29 21:07         ` Madhavan T. Venkataraman
2023-11-30 11:33           ` Peter Zijlstra
2023-12-06 16:37             ` Madhavan T. Venkataraman
2023-12-06 18:51               ` Peter Zijlstra
2023-12-08 18:41                 ` Madhavan T. Venkataraman
2023-12-01  0:45           ` Edgecombe, Rick P
2023-12-06 16:41             ` Madhavan T. Venkataraman
2023-11-13  2:23 ` [RFC PATCH v2 18/19] heki: x86: Protect guest kernel memory using the KVM hypervisor Mickaël Salaün
2023-11-13  8:54   ` Peter Zijlstra
2023-11-27 17:05     ` Madhavan T. Venkataraman
2023-11-27 20:03       ` Peter Zijlstra
2023-11-29 19:47         ` Madhavan T. Venkataraman
2023-11-13  2:23 ` [RFC PATCH v2 19/19] virt: Add Heki KUnit tests Mickaël Salaün
2023-11-13  5:18 [RFC PATCH v2 14/19] heki: x86: Initialize permissions counters for pages mapped into KVA kernel test robot
2023-11-14  1:22 ` kernel test robot
2023-11-13  7:42 [RFC PATCH v2 10/19] KVM: x86: Implement per-guest-page permissions kernel test robot
2023-11-14  1:27 ` kernel test robot
2023-11-13  8:14 [RFC PATCH v2 18/19] heki: x86: Protect guest kernel memory using the KVM hypervisor kernel test robot
2023-11-14  1:30 ` kernel test robot
2023-11-13 12:37 [RFC PATCH v2 10/19] KVM: x86: Implement per-guest-page permissions kernel test robot
2023-11-14  1:29 ` kernel test robot