From: Julien Grall <julien@xen.org>
To: Jan Beulich <jbeulich@suse.com>
Cc: "Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Roger Pau Monné" <roger.pau@citrix.com>, "Wei Liu" <wl@xen.org>,
	"Julien Grall" <jgrall@amazon.com>,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH 06/22] x86: map/unmap pages in restore_all_guests
Date: Thu, 22 Jun 2023 11:44:12 +0100	[thread overview]
Message-ID: <558e68c4-1a2d-5a9e-4070-5b894e14a3f4@xen.org> (raw)
In-Reply-To: <aa2c8649-4acd-bcf4-d547-e3609bb1a0a2@suse.com>

Hi Jan,

Sorry for the late reply.

On 13/01/2023 09:22, Jan Beulich wrote:
> On 13.01.2023 00:20, Julien Grall wrote:
>> On 04/01/2023 10:27, Jan Beulich wrote:
>>> On 23.12.2022 13:22, Julien Grall wrote:
>>>> On 22/12/2022 11:12, Jan Beulich wrote:
>>>>> On 16.12.2022 12:48, Julien Grall wrote:
>>>>>> --- a/xen/arch/x86/x86_64/entry.S
>>>>>> +++ b/xen/arch/x86/x86_64/entry.S
>>>>>> @@ -165,7 +165,24 @@ restore_all_guest:
>>>>>>             and   %rsi, %rdi
>>>>>>             and   %r9, %rsi
>>>>>>             add   %rcx, %rdi
>>>>>> -        add   %rcx, %rsi
>>>>>> +
>>>>>> +         /*
>>>>>> +          * Without a direct map, we have to map first before copying. We only
>>>>>> +          * need to map the guest root table but not the per-CPU root_pgt,
>>>>>> +          * because the latter is still a xenheap page.
>>>>>> +          */
>>>>>> +        pushq %r9
>>>>>> +        pushq %rdx
>>>>>> +        pushq %rax
>>>>>> +        pushq %rdi
>>>>>> +        mov   %rsi, %rdi
>>>>>> +        shr   $PAGE_SHIFT, %rdi
>>>>>> +        callq map_domain_page
>>>>>> +        mov   %rax, %rsi
>>>>>> +        popq  %rdi
>>>>>> +        /* Stash the pointer for unmapping later. */
>>>>>> +        pushq %rax
>>>>>> +
>>>>>>             mov   $ROOT_PAGETABLE_FIRST_XEN_SLOT, %ecx
>>>>>>             mov   root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rsi), %r8
>>>>>>             mov   %r8, root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rdi)
>>>>>> @@ -177,6 +194,14 @@ restore_all_guest:
>>>>>>             sub   $(ROOT_PAGETABLE_FIRST_XEN_SLOT - \
>>>>>>                     ROOT_PAGETABLE_LAST_XEN_SLOT - 1) * 8, %rdi
>>>>>>             rep movsq
>>>>>> +
>>>>>> +        /* Unmap the page. */
>>>>>> +        popq  %rdi
>>>>>> +        callq unmap_domain_page
>>>>>> +        popq  %rax
>>>>>> +        popq  %rdx
>>>>>> +        popq  %r9
>>>>>
>>>>> While the PUSH/POP are part of what I dislike here, I think this wants
>>>>> doing differently: Establish a mapping when putting in place a new guest
>>>>> page table, and use the pointer here. This could be a new per-domain
>>>>> mapping, to limit its visibility.
>>>>
>>>> I have looked at a per-domain approach and this looks way more complex
>>>> than the few concise lines here (not to mention the extra amount of
>>>> memory).
>>>
>>> Yes, I do understand that would be a more intrusive change.
>>
>> I could be persuaded to look at a more intrusive change if there is a
>> good reason to do it. To me, at the moment, it mostly seems a matter of
>> taste.
>>
>> So what would we gain from a per-domain mapping?
> 
> Rather than mapping/unmapping once per hypervisor entry/exit, we'd
> map just once per context switch. Plus we'd save ugly/fragile assembly
> code (apart from the push/pop I also dislike C functions being called
> from assembly which aren't really meant to be called this way: While
> these two may indeed be unlikely to ever change, any such change comes
> with the risk of the assembly callers being missed - the compiler
> won't tell you that e.g. argument types/count don't match parameters
> anymore).

I think I have managed to write what you suggested. I would like to
share it to get early feedback before resending the series.

There are also a couple of TODOs (XXX) in places where I am not sure
whether this is correct.

diff --git a/xen/arch/x86/include/asm/config.h b/xen/arch/x86/include/asm/config.h
index fbc4bb3416bd..320ddb9e1e77 100644
--- a/xen/arch/x86/include/asm/config.h
+++ b/xen/arch/x86/include/asm/config.h
@@ -202,7 +202,7 @@ extern unsigned char boot_edid_info[128];
  /* Slot 260: per-domain mappings (including map cache). */
  #define PERDOMAIN_VIRT_START    (PML4_ADDR(260))
  #define PERDOMAIN_SLOT_MBYTES   (PML4_ENTRY_BYTES >> (20 + PAGETABLE_ORDER))
-#define PERDOMAIN_SLOTS         3
+#define PERDOMAIN_SLOTS         4
  #define PERDOMAIN_VIRT_SLOT(s)  (PERDOMAIN_VIRT_START + (s) * \
                                   (PERDOMAIN_SLOT_MBYTES << 20))
  /* Slot 4: mirror of per-domain mappings (for compat xlat area accesses). */
@@ -316,6 +316,16 @@ extern unsigned long xen_phys_start;
  #define ARG_XLAT_START(v)        \
      (ARG_XLAT_VIRT_START + ((v)->vcpu_id << ARG_XLAT_VA_SHIFT))

+/* CR3 shadow mapping area. The fourth per-domain-mapping sub-area */
+#define SHADOW_CR3_VIRT_START   PERDOMAIN_VIRT_SLOT(3)
+#define SHADOW_CR3_ENTRIES      MAX_VIRT_CPUS
+#define SHADOW_CR3_VIRT_END     (SHADOW_CR3_VIRT_START +    \
+                                 (MAX_VIRT_CPUS * PAGE_SIZE))
+
+/* The address at which a particular vCPU's cr3 page is mapped */
+#define SHADOW_CR3_VCPU_VIRT_START(v) \
+    (SHADOW_CR3_VIRT_START + ((v)->vcpu_id * PAGE_SIZE))
+
  #define ELFSIZE 64

  #define ARCH_CRASH_SAVE_VMCOREINFO
diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
index c2d9fc333be5..d5989224f4a3 100644
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -273,6 +273,7 @@ struct time_scale {
  struct pv_domain
  {
      l1_pgentry_t **gdt_ldt_l1tab;
+    l1_pgentry_t **shadow_cr3_l1tab;

      atomic_t nr_l4_pages;

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 9741d28cbc96..b64ee1ca47f6 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -509,6 +509,13 @@ void share_xen_page_with_guest(struct page_info *page, struct domain *d,
      spin_unlock(&d->page_alloc_lock);
  }

+#define shadow_cr3_idx(v) \
+    ((v)->vcpu_id >> PAGETABLE_ORDER)
+
+#define pv_shadow_cr3_pte(v) \
+    ((v)->domain->arch.pv.shadow_cr3_l1tab[shadow_cr3_idx(v)] + \
+     ((v)->vcpu_id & (L1_PAGETABLE_ENTRIES - 1)))
+
  void make_cr3(struct vcpu *v, mfn_t mfn)
  {
      struct domain *d = v->domain;
@@ -516,6 +523,18 @@ void make_cr3(struct vcpu *v, mfn_t mfn)
      v->arch.cr3 = mfn_x(mfn) << PAGE_SHIFT;
      if ( is_pv_domain(d) && d->arch.pv.pcid )
          v->arch.cr3 |= get_pcid_bits(v, false);
+
+    /* Update the CR3 mapping */
+    if ( is_pv_domain(d) )
+    {
+        l1_pgentry_t *pte = pv_shadow_cr3_pte(v);
+
+        /* XXX Do we need to call get page first? */
+        l1e_write(pte, l1e_from_mfn(mfn, __PAGE_HYPERVISOR_RW));
+        /* XXX Can the flush be reduced to the page? */
+        /* XXX Do we always call with current? */
+        flush_tlb_local();
+    }
  }

  void write_ptbase(struct vcpu *v)
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index 5c92812dc67a..064645ccc261 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -288,6 +288,19 @@ static void pv_destroy_gdt_ldt_l1tab(struct vcpu *v)
                                1U << GDT_LDT_VCPU_SHIFT);
  }

+static int pv_create_shadow_cr3_l1tab(struct vcpu *v)
+{
+    return create_perdomain_mapping(v->domain, SHADOW_CR3_VCPU_VIRT_START(v),
+                                    1, v->domain->arch.pv.shadow_cr3_l1tab,
+                                    NULL);
+}
+
+static void pv_destroy_shadow_cr3_l1tab(struct vcpu *v)
+{
+    destroy_perdomain_mapping(v->domain, SHADOW_CR3_VCPU_VIRT_START(v), 1);
+}
+
  void pv_vcpu_destroy(struct vcpu *v)
  {
      if ( is_pv_32bit_vcpu(v) )
@@ -297,6 +310,7 @@ void pv_vcpu_destroy(struct vcpu *v)
      }

      pv_destroy_gdt_ldt_l1tab(v);
+    pv_destroy_shadow_cr3_l1tab(v);
      XFREE(v->arch.pv.trap_ctxt);
  }

@@ -311,6 +325,10 @@ int pv_vcpu_initialise(struct vcpu *v)
      if ( rc )
          return rc;

+    rc = pv_create_shadow_cr3_l1tab(v);
+    if ( rc )
+        goto done;
+
      BUILD_BUG_ON(X86_NR_VECTORS * sizeof(*v->arch.pv.trap_ctxt) >
                   PAGE_SIZE);
      v->arch.pv.trap_ctxt = xzalloc_array(struct trap_info, X86_NR_VECTORS);
@@ -346,10 +364,12 @@ void pv_domain_destroy(struct domain *d)

      destroy_perdomain_mapping(d, GDT_LDT_VIRT_START,
                                GDT_LDT_MBYTES << (20 - PAGE_SHIFT));
+    destroy_perdomain_mapping(d, SHADOW_CR3_VIRT_START, SHADOW_CR3_ENTRIES);

      XFREE(d->arch.pv.cpuidmasks);

      FREE_XENHEAP_PAGE(d->arch.pv.gdt_ldt_l1tab);
+    FREE_XENHEAP_PAGE(d->arch.pv.shadow_cr3_l1tab);
  }

  void noreturn cf_check continue_pv_domain(void);
@@ -371,6 +391,12 @@ int pv_domain_initialise(struct domain *d)
          goto fail;
      clear_page(d->arch.pv.gdt_ldt_l1tab);

+    d->arch.pv.shadow_cr3_l1tab =
+        alloc_xenheap_pages(0, MEMF_node(domain_to_node(d)));
+    if ( !d->arch.pv.shadow_cr3_l1tab )
+        goto fail;
+    clear_page(d->arch.pv.shadow_cr3_l1tab);
+
      if ( levelling_caps & ~LCAP_faulting &&
           (d->arch.pv.cpuidmasks = xmemdup(&cpuidmask_defaults)) == NULL )
          goto fail;
@@ -381,6 +407,11 @@ int pv_domain_initialise(struct domain *d)
      if ( rc )
          goto fail;

+    rc = create_perdomain_mapping(d, SHADOW_CR3_VIRT_START,
+                                  SHADOW_CR3_ENTRIES, NULL, NULL);
+    if ( rc )
+        goto fail;
+
      d->arch.ctxt_switch = &pv_csw;

      d->arch.pv.xpti = is_hardware_domain(d) ? opt_xpti_hwdom : opt_xpti_domu;
diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
index 287dac101ad4..ed486607bf15 100644
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -51,6 +51,7 @@ void __dummy__(void)
      OFFSET(UREGS_kernel_sizeof, struct cpu_user_regs, es);
      BLANK();

+    OFFSET(VCPU_id, struct vcpu, vcpu_id);
      OFFSET(VCPU_processor, struct vcpu, processor);
      OFFSET(VCPU_domain, struct vcpu, domain);
      OFFSET(VCPU_vcpu_info, struct vcpu, vcpu_info);
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 8b77d7113bbf..678876a32177 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -165,7 +165,16 @@ restore_all_guest:
          and   %rsi, %rdi
          and   %r9, %rsi
          add   %rcx, %rdi
+
+        /*
+         * The address in the vCPU cr3 is always mapped in the shadow
+         * cr3 virt area.
+         */
+        mov   VCPU_id(%rbx), %rsi
+        shl   $PAGE_SHIFT, %rsi
+        movabs $SHADOW_CR3_VIRT_START, %rcx
          add   %rcx, %rsi
+
          mov   $ROOT_PAGETABLE_FIRST_XEN_SLOT, %ecx
          mov   root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rsi), %r8
          mov   %r8, root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rdi)
> 
> Jan

-- 
Julien Grall

