All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86/pv: Fix construction of 32bit dom0's
@ 2019-02-06 20:41 Andrew Cooper
  2019-02-07 12:58 ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Cooper @ 2019-02-06 20:41 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

dom0_construct_pv() has logic to transition dom0 into a compat domain when
booting an ELF32 image.

One aspect which is missing is the CPUID policy recalculation, meaning that a
32bit dom0 sees a 64bit policy, which differ by the Long Mode feature flag in
particular.  Another missing item is the x87_fip_width initialisation.

Update dom0_construct_pv() to use switch_compat(), rather than retaining the
opencoding.  Position the call to switch_compat() such that the compat32 local
variable can disappear entirely.

The 32bit monitor table is now created by setup_compat_l4(), avoiding the need
to for manual creation later.  Furthermore, the L3 table creation is redundant
with the logic inside the main mapping loop, so can be dropped as well.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>

Slightly RFC:

1) I've not worked out exactly what the

     v->vcpu_info = (void *)&d->shared_info->compat.vcpu_info[0];

   line is supposed to be doing and whether it is needed, but it doesn't
   appear to matter.  It is perhaps another redundant opencoding.

2) The reported

     Dom0 alloc.:   000000003e800000->000000003ec00000 (240470 pages to be allocated)

   line changes by 1 page because of the alloc_domheap_page() moving ahead of
   the printk(), but I'm fairly sure this is benign.  There is a matching
   reduction in the length of the constructed m2p which is perhaps less
   benign.
---
 xen/arch/x86/pv/dom0_build.c | 43 +++++++++++++------------------------------
 xen/arch/x86/pv/domain.c     |  2 +-
 2 files changed, 14 insertions(+), 31 deletions(-)

diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 837ef7b..c3d8ee7 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -285,7 +285,7 @@ int __init dom0_construct_pv(struct domain *d,
                              module_t *initrd,
                              char *cmdline)
 {
-    int i, cpu, rc, compatible, compat32, order, machine;
+    int i, cpu, rc, compatible, order, machine;
     struct cpu_user_regs *regs;
     unsigned long pfn, mfn;
     unsigned long nr_pages;
@@ -354,14 +354,18 @@ int __init dom0_construct_pv(struct domain *d,
 
     /* compatibility check */
     compatible = 0;
-    compat32   = 0;
     machine = elf_uval(&elf, elf.ehdr, e_machine);
     printk(" Xen  kernel: 64-bit, lsb, compat32\n");
     if ( elf_32bit(&elf) && parms.pae == XEN_PAE_BIMODAL )
         parms.pae = XEN_PAE_EXTCR3;
     if ( elf_32bit(&elf) && parms.pae && machine == EM_386 )
     {
-        compat32 = 1;
+        if ( unlikely(rc = switch_compat(d)) )
+        {
+            printk("Dom0 failed to switch to compat: %d\n", rc);
+            return rc;
+        }
+
         compatible = 1;
     }
     if (elf_64bit(&elf) && machine == EM_X86_64)
@@ -392,16 +396,6 @@ int __init dom0_construct_pv(struct domain *d,
         }
     }
 
-    if ( compat32 )
-    {
-        d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 1;
-        d->arch.pv.xpti = false;
-        d->arch.pv.pcid = false;
-        v->vcpu_info = (void *)&d->shared_info->compat.vcpu_info[0];
-        if ( setup_compat_arg_xlat(v) != 0 )
-            BUG();
-    }
-
     nr_pages = dom0_compute_nr_pages(d, &parms, initrd_len);
 
     if ( parms.pae == XEN_PAE_EXTCR3 )
@@ -425,8 +419,6 @@ int __init dom0_construct_pv(struct domain *d,
         parms.p2m_base = UNSET_ADDR;
     }
 
-    domain_set_alloc_bitsize(d);
-
     /*
      * Why do we need this? The number of page-table frames depends on the
      * size of the bootstrap address space. But the size of the address space
@@ -606,23 +598,14 @@ int __init dom0_construct_pv(struct domain *d,
     {
         maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l4_page_table;
         l4start = l4tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+        clear_page(l4tab);
+        init_xen_l4_slots(l4tab, _mfn(virt_to_mfn(l4start)),
+                          d, INVALID_MFN, true);
+        v->arch.guest_table = pagetable_from_paddr(__pa(l4start));
     }
     else
-    {
-        page = alloc_domheap_page(d, MEMF_no_owner | MEMF_no_scrub);
-        if ( !page )
-            panic("Not enough RAM for domain 0 PML4\n");
-        page->u.inuse.type_info = PGT_l4_page_table|PGT_validated|1;
-        l4start = l4tab = page_to_virt(page);
-        maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l3_page_table;
-        l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
-    }
-    clear_page(l4tab);
-    init_xen_l4_slots(l4tab, _mfn(virt_to_mfn(l4start)),
-                      d, INVALID_MFN, true);
-    v->arch.guest_table = pagetable_from_paddr(__pa(l4start));
-    if ( is_pv_32bit_domain(d) )
-        v->arch.guest_table_user = v->arch.guest_table;
+        /* Monitor table already created by switch_compat(). */
+        l4start = l4tab = __va(pagetable_get_paddr(v->arch.guest_table));
 
     l4tab += l4_table_offset(v_start);
     pfn = alloc_spfn;
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index 7e84b04..3457586 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -70,7 +70,7 @@ static int setup_compat_l4(struct vcpu *v)
     l4_pgentry_t *l4tab;
     mfn_t mfn;
 
-    pg = alloc_domheap_page(v->domain, MEMF_no_owner);
+    pg = alloc_domheap_page(v->domain, MEMF_no_owner | MEMF_no_scrub);
     if ( pg == NULL )
         return -ENOMEM;
 
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/pv: Fix construction of 32bit dom0's
  2019-02-06 20:41 [PATCH] x86/pv: Fix construction of 32bit dom0's Andrew Cooper
@ 2019-02-07 12:58 ` Jan Beulich
  2019-02-07 13:29   ` Andrew Cooper
  2019-02-13 18:07   ` Wei Liu
  0 siblings, 2 replies; 11+ messages in thread
From: Jan Beulich @ 2019-02-07 12:58 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monne

>>> On 06.02.19 at 21:41, <andrew.cooper3@citrix.com> wrote:
> Slightly RFC:
> 
> 1) I've not worked out exactly what the
> 
>      v->vcpu_info = (void *)&d->shared_info->compat.vcpu_info[0];
> 
>    line is supposed to be doing and whether it is needed, but it doesn't
>    appear to matter.  It is perhaps another redundant opencoding.

Afaict this is just to be independent of the fact that the vcpu_info
array is first in struct shared_info. I'd be fine with it getting replaced
by a respective BUILD_BUG_ON(), but I'd like to ask that it not be
dropped without replacement.

> 2) The reported
> 
>      Dom0 alloc.:   000000003e800000->000000003ec00000 (240470 pages to be allocated)
> 
>    line changes by 1 page because of the alloc_domheap_page() moving ahead of
>    the printk(), but I'm fairly sure this is benign.  There is a matching
>    reduction in the length of the constructed m2p which is perhaps less
>    benign.

Well, the M2P of course has to be correctly sized. An off-by-one would
likely result in hard to repro bug reports.

> @@ -606,23 +598,14 @@ int __init dom0_construct_pv(struct domain *d,
>      {
>          maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l4_page_table;
>          l4start = l4tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
> +        clear_page(l4tab);
> +        init_xen_l4_slots(l4tab, _mfn(virt_to_mfn(l4start)),
> +                          d, INVALID_MFN, true);
> +        v->arch.guest_table = pagetable_from_paddr(__pa(l4start));
>      }
>      else
> -    {
> -        page = alloc_domheap_page(d, MEMF_no_owner | MEMF_no_scrub);
> -        if ( !page )
> -            panic("Not enough RAM for domain 0 PML4\n");
> -        page->u.inuse.type_info = PGT_l4_page_table|PGT_validated|1;
> -        l4start = l4tab = page_to_virt(page);
> -        maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l3_page_table;
> -        l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;

This one is lost without replacement, but is needed. Commit
7a9d764630 ("x86/32-on-64: adjust Dom0 initial page table layout")
specifically introduced it to make sure the guest-perceived top level
page table is allocated first (and hence marks the beginning of the
boot page tables, so Dom0 can later put all of them into general use).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/pv: Fix construction of 32bit dom0's
  2019-02-07 12:58 ` Jan Beulich
@ 2019-02-07 13:29   ` Andrew Cooper
  2019-02-07 13:38     ` Juergen Gross
  2019-02-07 13:45     ` Jan Beulich
  2019-02-13 18:07   ` Wei Liu
  1 sibling, 2 replies; 11+ messages in thread
From: Andrew Cooper @ 2019-02-07 13:29 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monne


[-- Attachment #1.1: Type: text/plain, Size: 3958 bytes --]

On 07/02/2019 12:58, Jan Beulich wrote:
>>>> On 06.02.19 at 21:41, <andrew.cooper3@citrix.com> wrote:
>> Slightly RFC:
>>
>> 1) I've not worked out exactly what the
>>
>>      v->vcpu_info = (void *)&d->shared_info->compat.vcpu_info[0];
>>
>>    line is supposed to be doing and whether it is needed, but it doesn't
>>    appear to matter.  It is perhaps another redundant opencoding.
> Afaict this is just to be independent of the fact that the vcpu_info
> array is first in struct shared_info. I'd be fine with it getting replaced
> by a respective BUILD_BUG_ON(), but I'd like to ask that it not be
> dropped without replacement.

There is some ancillary logic with vcpu_info_mfn (which looks not to be
initialised), which is why I suspect this is some more incorrect
opencoding to begin with.

>
>> 2) The reported
>>
>>      Dom0 alloc.:   000000003e800000->000000003ec00000 (240470 pages to be allocated)
>>
>>    line changes by 1 page because of the alloc_domheap_page() moving ahead of
>>    the printk(), but I'm fairly sure this is benign.  There is a matching
>>    reduction in the length of the constructed m2p which is perhaps less
>>    benign.
> Well, the M2P of course has to be correctly sized. An off-by-one would
> likely result in hard to repro bug reports.

The delta in output (with some of my own debugging) is:

@@ -22,13 +22,13 @@
 (XEN)     p2m_base         = 0xffffffffffffffff
 (XEN)  Xen  kernel: 64-bit, lsb, compat32
 (XEN)  Dom0 kernel: 32-bit, PAE, lsb, paddr 0x100000 -> 0x112000
-(XEN) ** nr_pages 241494
+(XEN) ** nr_pages 241493
 (XEN) PHYSICAL MEMORY ARRANGEMENT:
-(XEN)  Dom0 alloc.:   000000003e800000->000000003ec00000 (240470 pages to be allocated) (tot 1024, nr 241494)
+(XEN)  Dom0 alloc.:   000000003e800000->000000003ec00000 (240469 pages to be allocated) (tot 1024, nr 241493)
 (XEN) VIRTUAL MEMORY ARRANGEMENT:
 (XEN)  Loaded kernel: 0000000000100000->0000000000112000
 (XEN)  Init. ramdisk: 0000000000112000->0000000000112000
-(XEN)  Phys-Mach map: 0000000000112000->00000000001fdd58
+(XEN)  Phys-Mach map: 0000000000112000->00000000001fdd54
 (XEN)  Start info:    00000000001fe000->00000000001fe4b4
 (XEN)  Xenstore ring: 0000000000000000->0000000000000000
 (XEN)  Console ring:  0000000000000000->0000000000000000

I meant the P2M rather than M2P, and it is different by 1 entry which is
expected, given the change by 1 page.  I've positively identified the
1-page change to be the alloc_domheap_page() for the monitor table moving.

>
>> @@ -606,23 +598,14 @@ int __init dom0_construct_pv(struct domain *d,
>>      {
>>          maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l4_page_table;
>>          l4start = l4tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
>> +        clear_page(l4tab);
>> +        init_xen_l4_slots(l4tab, _mfn(virt_to_mfn(l4start)),
>> +                          d, INVALID_MFN, true);
>> +        v->arch.guest_table = pagetable_from_paddr(__pa(l4start));
>>      }
>>      else
>> -    {
>> -        page = alloc_domheap_page(d, MEMF_no_owner | MEMF_no_scrub);
>> -        if ( !page )
>> -            panic("Not enough RAM for domain 0 PML4\n");
>> -        page->u.inuse.type_info = PGT_l4_page_table|PGT_validated|1;
>> -        l4start = l4tab = page_to_virt(page);
>> -        maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l3_page_table;
>> -        l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
> This one is lost without replacement, but is needed. Commit
> 7a9d764630 ("x86/32-on-64: adjust Dom0 initial page table layout")
> specifically introduced it to make sure the guest-perceived top level
> page table is allocated first (and hence marks the beginning of the
> boot page tables, so Dom0 can later put all of them into general use).

I did call this out specifically in the commit message.  I had no idea
about that commit when editing the code, but I still don't understand
why it is important that the guests top level needs to be first.

~Andrew

[-- Attachment #1.2: Type: text/html, Size: 5378 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/pv: Fix construction of 32bit dom0's
  2019-02-07 13:29   ` Andrew Cooper
@ 2019-02-07 13:38     ` Juergen Gross
  2019-02-07 13:45     ` Jan Beulich
  1 sibling, 0 replies; 11+ messages in thread
From: Juergen Gross @ 2019-02-07 13:38 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich; +Cc: Roger Pau Monne, Wei Liu, Xen-devel

On 07/02/2019 14:29, Andrew Cooper wrote:
> On 07/02/2019 12:58, Jan Beulich wrote:
>>>>> On 06.02.19 at 21:41, <andrew.cooper3@citrix.com> wrote:
>>> Slightly RFC:
>>>
>>> 1) I've not worked out exactly what the
>>>
>>>      v->vcpu_info = (void *)&d->shared_info->compat.vcpu_info[0];
>>>
>>>    line is supposed to be doing and whether it is needed, but it doesn't
>>>    appear to matter.  It is perhaps another redundant opencoding.
>> Afaict this is just to be independent of the fact that the vcpu_info
>> array is first in struct shared_info. I'd be fine with it getting replaced
>> by a respective BUILD_BUG_ON(), but I'd like to ask that it not be
>> dropped without replacement.
> 
> There is some ancillary logic with vcpu_info_mfn (which looks not to be
> initialised), which is why I suspect this is some more incorrect
> opencoding to begin with.
> 
>>> 2) The reported
>>>
>>>      Dom0 alloc.:   000000003e800000->000000003ec00000 (240470 pages to be allocated)
>>>
>>>    line changes by 1 page because of the alloc_domheap_page() moving ahead of
>>>    the printk(), but I'm fairly sure this is benign.  There is a matching
>>>    reduction in the length of the constructed m2p which is perhaps less
>>>    benign.
>> Well, the M2P of course has to be correctly sized. An off-by-one would
>> likely result in hard to repro bug reports.
> 
> The delta in output (with some of my own debugging) is:
> 
> @@ -22,13 +22,13 @@
>  (XEN)     p2m_base         = 0xffffffffffffffff
>  (XEN)  Xen  kernel: 64-bit, lsb, compat32
>  (XEN)  Dom0 kernel: 32-bit, PAE, lsb, paddr 0x100000 -> 0x112000
> -(XEN) ** nr_pages 241494
> +(XEN) ** nr_pages 241493
>  (XEN) PHYSICAL MEMORY ARRANGEMENT:
> -(XEN)  Dom0 alloc.:   000000003e800000->000000003ec00000 (240470 pages to be allocated) (tot 1024, nr 241494)
> +(XEN)  Dom0 alloc.:   000000003e800000->000000003ec00000 (240469 pages to be allocated) (tot 1024, nr 241493)
>  (XEN) VIRTUAL MEMORY ARRANGEMENT:
>  (XEN)  Loaded kernel: 0000000000100000->0000000000112000
>  (XEN)  Init. ramdisk: 0000000000112000->0000000000112000
> -(XEN)  Phys-Mach map: 0000000000112000->00000000001fdd58
> +(XEN)  Phys-Mach map: 0000000000112000->00000000001fdd54
>  (XEN)  Start info:    00000000001fe000->00000000001fe4b4
>  (XEN)  Xenstore ring: 0000000000000000->0000000000000000
>  (XEN)  Console ring:  0000000000000000->0000000000000000
> 
> I meant the P2M rather than M2P, and it is different by 1 entry which is
> expected, given the change by 1 page.  I've positively identified the
> 1-page change to be the alloc_domheap_page() for the monitor table moving.
> 
>>> @@ -606,23 +598,14 @@ int __init dom0_construct_pv(struct domain *d,
>>>      {
>>>          maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l4_page_table;
>>>          l4start = l4tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
>>> +        clear_page(l4tab);
>>> +        init_xen_l4_slots(l4tab, _mfn(virt_to_mfn(l4start)),
>>> +                          d, INVALID_MFN, true);
>>> +        v->arch.guest_table = pagetable_from_paddr(__pa(l4start));
>>>      }
>>>      else
>>> -    {
>>> -        page = alloc_domheap_page(d, MEMF_no_owner | MEMF_no_scrub);
>>> -        if ( !page )
>>> -            panic("Not enough RAM for domain 0 PML4\n");
>>> -        page->u.inuse.type_info = PGT_l4_page_table|PGT_validated|1;
>>> -        l4start = l4tab = page_to_virt(page);
>>> -        maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l3_page_table;
>>> -        l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
>> This one is lost without replacement, but is needed. Commit
>> 7a9d764630 ("x86/32-on-64: adjust Dom0 initial page table layout")
>> specifically introduced it to make sure the guest-perceived top level
>> page table is allocated first (and hence marks the beginning of the
>> boot page tables, so Dom0 can later put all of them into general use).
> 
> I did call this out specifically in the commit message.  I had no idea
> about that commit when editing the code, but I still don't understand
> why it is important that the guests top level needs to be first.

In the Linux kernel I have the following comment for 32-bit PV guests:

/*
 * For 32 bit domains xen_start_info->pt_base is the pgd address which
might be
 * not the first page table in the page table pool.
 * Iterate through the initial page tables to find the real page table base.
 */

Does this help somehow?


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/pv: Fix construction of 32bit dom0's
  2019-02-07 13:29   ` Andrew Cooper
  2019-02-07 13:38     ` Juergen Gross
@ 2019-02-07 13:45     ` Jan Beulich
  2019-02-07 13:56       ` Juergen Gross
  1 sibling, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2019-02-07 13:45 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monne

>>> On 07.02.19 at 14:29, <andrew.cooper3@citrix.com> wrote:
> On 07/02/2019 12:58, Jan Beulich wrote:
>>>>> On 06.02.19 at 21:41, <andrew.cooper3@citrix.com> wrote:
>>> 2) The reported
>>>
>>>      Dom0 alloc.:   000000003e800000->000000003ec00000 (240470 pages to be allocated)
>>>
>>>    line changes by 1 page because of the alloc_domheap_page() moving ahead of
>>>    the printk(), but I'm fairly sure this is benign.  There is a matching
>>>    reduction in the length of the constructed m2p which is perhaps less
>>>    benign.
>> Well, the M2P of course has to be correctly sized. An off-by-one would
>> likely result in hard to repro bug reports.
> 
> The delta in output (with some of my own debugging) is:
> 
> @@ -22,13 +22,13 @@
>  (XEN)     p2m_base         = 0xffffffffffffffff
>  (XEN)  Xen  kernel: 64-bit, lsb, compat32
>  (XEN)  Dom0 kernel: 32-bit, PAE, lsb, paddr 0x100000 -> 0x112000
> -(XEN) ** nr_pages 241494
> +(XEN) ** nr_pages 241493
>  (XEN) PHYSICAL MEMORY ARRANGEMENT:
> -(XEN)  Dom0 alloc.:   000000003e800000->000000003ec00000 (240470 pages to be allocated) (tot 1024, nr 241494)
> +(XEN)  Dom0 alloc.:   000000003e800000->000000003ec00000 (240469 pages to be allocated) (tot 1024, nr 241493)
>  (XEN) VIRTUAL MEMORY ARRANGEMENT:
>  (XEN)  Loaded kernel: 0000000000100000->0000000000112000
>  (XEN)  Init. ramdisk: 0000000000112000->0000000000112000
> -(XEN)  Phys-Mach map: 0000000000112000->00000000001fdd58
> +(XEN)  Phys-Mach map: 0000000000112000->00000000001fdd54
>  (XEN)  Start info:    00000000001fe000->00000000001fe4b4
>  (XEN)  Xenstore ring: 0000000000000000->0000000000000000
>  (XEN)  Console ring:  0000000000000000->0000000000000000
> 
> I meant the P2M rather than M2P, and it is different by 1 entry which is
> expected, given the change by 1 page.  I've positively identified the
> 1-page change to be the alloc_domheap_page() for the monitor table moving.

But the P2M size isn't supposed to change overall - the same number
of pages get added to the domain. IOW I can see why the "Dom0
alloc.:" changes (and without bad side effects), but I'm having trouble
seeing how a P2M size change can be correct (and I suspect there
would be a problem if previously it went just one slot past a page
boundary).

>>> @@ -606,23 +598,14 @@ int __init dom0_construct_pv(struct domain *d,
>>>      {
>>>          maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l4_page_table;
>>>          l4start = l4tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
>>> +        clear_page(l4tab);
>>> +        init_xen_l4_slots(l4tab, _mfn(virt_to_mfn(l4start)),
>>> +                          d, INVALID_MFN, true);
>>> +        v->arch.guest_table = pagetable_from_paddr(__pa(l4start));
>>>      }
>>>      else
>>> -    {
>>> -        page = alloc_domheap_page(d, MEMF_no_owner | MEMF_no_scrub);
>>> -        if ( !page )
>>> -            panic("Not enough RAM for domain 0 PML4\n");
>>> -        page->u.inuse.type_info = PGT_l4_page_table|PGT_validated|1;
>>> -        l4start = l4tab = page_to_virt(page);
>>> -        maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l3_page_table;
>>> -        l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
>> This one is lost without replacement, but is needed. Commit
>> 7a9d764630 ("x86/32-on-64: adjust Dom0 initial page table layout")
>> specifically introduced it to make sure the guest-perceived top level
>> page table is allocated first (and hence marks the beginning of the
>> boot page tables, so Dom0 can later put all of them into general use).
> 
> I did call this out specifically in the commit message.  I had no idea
> about that commit when editing the code, but I still don't understand
> why it is important that the guests top level needs to be first.

The start info field "pt_base" is specified to point at the root table.
If the root table isn't first, it's harder for the kernel to know where
the counting of "nr_pt_frames" actually starts (see Linux'es
xen_find_pt_base(), which tells me that nowadays they do that
extra scanning, but iirc this hadn't been there from the beginning).
Furthermore your change even violates the specification, as
"pt_base" no longer points at the root table; you'd have to undo
the respective adjustment said commit did. I'm having trouble seeing
how it would work, considering e.g.

	xen_setup_kernel_pagetable((pgd_t *)xen_start_info->pt_base,
				   xen_start_info->nr_pages);

in Linux code.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/pv: Fix construction of 32bit dom0's
  2019-02-07 13:45     ` Jan Beulich
@ 2019-02-07 13:56       ` Juergen Gross
  2019-02-07 15:01         ` Andrew Cooper
  0 siblings, 1 reply; 11+ messages in thread
From: Juergen Gross @ 2019-02-07 13:56 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper; +Cc: Roger Pau Monne, Wei Liu, Xen-devel

On 07/02/2019 14:45, Jan Beulich wrote:
>>>> On 07.02.19 at 14:29, <andrew.cooper3@citrix.com> wrote:
>> On 07/02/2019 12:58, Jan Beulich wrote:
>>>>>> On 06.02.19 at 21:41, <andrew.cooper3@citrix.com> wrote:
>>>> 2) The reported
>>>>
>>>>      Dom0 alloc.:   000000003e800000->000000003ec00000 (240470 pages to be allocated)
>>>>
>>>>    line changes by 1 page because of the alloc_domheap_page() moving ahead of
>>>>    the printk(), but I'm fairly sure this is benign.  There is a matching
>>>>    reduction in the length of the constructed m2p which is perhaps less
>>>>    benign.
>>> Well, the M2P of course has to be correctly sized. An off-by-one would
>>> likely result in hard to repro bug reports.
>>
>> The delta in output (with some of my own debugging) is:
>>
>> @@ -22,13 +22,13 @@
>>  (XEN)     p2m_base         = 0xffffffffffffffff
>>  (XEN)  Xen  kernel: 64-bit, lsb, compat32
>>  (XEN)  Dom0 kernel: 32-bit, PAE, lsb, paddr 0x100000 -> 0x112000
>> -(XEN) ** nr_pages 241494
>> +(XEN) ** nr_pages 241493
>>  (XEN) PHYSICAL MEMORY ARRANGEMENT:
>> -(XEN)  Dom0 alloc.:   000000003e800000->000000003ec00000 (240470 pages to be allocated) (tot 1024, nr 241494)
>> +(XEN)  Dom0 alloc.:   000000003e800000->000000003ec00000 (240469 pages to be allocated) (tot 1024, nr 241493)
>>  (XEN) VIRTUAL MEMORY ARRANGEMENT:
>>  (XEN)  Loaded kernel: 0000000000100000->0000000000112000
>>  (XEN)  Init. ramdisk: 0000000000112000->0000000000112000
>> -(XEN)  Phys-Mach map: 0000000000112000->00000000001fdd58
>> +(XEN)  Phys-Mach map: 0000000000112000->00000000001fdd54
>>  (XEN)  Start info:    00000000001fe000->00000000001fe4b4
>>  (XEN)  Xenstore ring: 0000000000000000->0000000000000000
>>  (XEN)  Console ring:  0000000000000000->0000000000000000
>>
>> I meant the P2M rather than M2P, and it is different by 1 entry which is
>> expected, given the change by 1 page.  I've positively identified the
>> 1-page change to be the alloc_domheap_page() for the monitor table moving.
> 
> But the P2M size isn't supposed to change overall - the same number
> of pages get added to the domain. IOW I can see why the "Dom0
> alloc.:" changes (and without bad side effects), but I'm having trouble
> seeing how a P2M size change can be correct (and I suspect there
> would be a problem if previously it went just one slot past a page
> boundary).
> 
>>>> @@ -606,23 +598,14 @@ int __init dom0_construct_pv(struct domain *d,
>>>>      {
>>>>          maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l4_page_table;
>>>>          l4start = l4tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
>>>> +        clear_page(l4tab);
>>>> +        init_xen_l4_slots(l4tab, _mfn(virt_to_mfn(l4start)),
>>>> +                          d, INVALID_MFN, true);
>>>> +        v->arch.guest_table = pagetable_from_paddr(__pa(l4start));
>>>>      }
>>>>      else
>>>> -    {
>>>> -        page = alloc_domheap_page(d, MEMF_no_owner | MEMF_no_scrub);
>>>> -        if ( !page )
>>>> -            panic("Not enough RAM for domain 0 PML4\n");
>>>> -        page->u.inuse.type_info = PGT_l4_page_table|PGT_validated|1;
>>>> -        l4start = l4tab = page_to_virt(page);
>>>> -        maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l3_page_table;
>>>> -        l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
>>> This one is lost without replacement, but is needed. Commit
>>> 7a9d764630 ("x86/32-on-64: adjust Dom0 initial page table layout")
>>> specifically introduced it to make sure the guest-perceived top level
>>> page table is allocated first (and hence marks the beginning of the
>>> boot page tables, so Dom0 can later put all of them into general use).
>>
>> I did call this out specifically in the commit message.  I had no idea
>> about that commit when editing the code, but I still don't understand
>> why it is important that the guests top level needs to be first.
> 
> The start info field "pt_base" is specified to point at the root table.
> If the root table isn't first, it's harder for the kernel to know where
> the counting of "nr_pt_frames" actually starts (see Linux'es
> xen_find_pt_base(), which tells me that nowadays they do that
> extra scanning, but iirc this hadn't been there from the beginning).

Before I introduced xen_find_pt_base() 32-bit pv domains just assumed
there could be 2 page tables located before PGD.

There is an exhaustive comment in Xen's include/public/xen.h in this
regard.

> Furthermore your change even violates the specification, as
> "pt_base" no longer points at the root table; you'd have to undo

This is of course a major problem.

pt_base is similar to "where cr3 is supposed to point at".


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/pv: Fix construction of 32bit dom0's
  2019-02-07 13:56       ` Juergen Gross
@ 2019-02-07 15:01         ` Andrew Cooper
  0 siblings, 0 replies; 11+ messages in thread
From: Andrew Cooper @ 2019-02-07 15:01 UTC (permalink / raw)
  To: Juergen Gross, Jan Beulich; +Cc: Roger Pau Monne, Wei Liu, Xen-devel

On 07/02/2019 13:56, Juergen Gross wrote:
> On 07/02/2019 14:45, Jan Beulich wrote:
>>
>>>>> @@ -606,23 +598,14 @@ int __init dom0_construct_pv(struct domain *d,
>>>>>      {
>>>>>          maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l4_page_table;
>>>>>          l4start = l4tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
>>>>> +        clear_page(l4tab);
>>>>> +        init_xen_l4_slots(l4tab, _mfn(virt_to_mfn(l4start)),
>>>>> +                          d, INVALID_MFN, true);
>>>>> +        v->arch.guest_table = pagetable_from_paddr(__pa(l4start));
>>>>>      }
>>>>>      else
>>>>> -    {
>>>>> -        page = alloc_domheap_page(d, MEMF_no_owner | MEMF_no_scrub);
>>>>> -        if ( !page )
>>>>> -            panic("Not enough RAM for domain 0 PML4\n");
>>>>> -        page->u.inuse.type_info = PGT_l4_page_table|PGT_validated|1;
>>>>> -        l4start = l4tab = page_to_virt(page);
>>>>> -        maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l3_page_table;
>>>>> -        l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
>>>> This one is lost without replacement, but is needed. Commit
>>>> 7a9d764630 ("x86/32-on-64: adjust Dom0 initial page table layout")
>>>> specifically introduced it to make sure the guest-perceived top level
>>>> page table is allocated first (and hence marks the beginning of the
>>>> boot page tables, so Dom0 can later put all of them into general use).
>>> I did call this out specifically in the commit message.  I had no idea
>>> about that commit when editing the code, but I still don't understand
>>> why it is important that the guests top level needs to be first.
>> The start info field "pt_base" is specified to point at the root table.
>> If the root table isn't first, it's harder for the kernel to know where
>> the counting of "nr_pt_frames" actually starts (see Linux'es
>> xen_find_pt_base(), which tells me that nowadays they do that
>> extra scanning, but iirc this hadn't been there from the beginning).
> Before I introduced xen_find_pt_base() 32-bit pv domains just assumed
> there could be 2 page tables located before PGD.
>
> There is an exhaustive comment in Xen's include/public/xen.h in this
> regard.

Oh - so there is.  That is rather more helpful at explaining what is
going on, but there really needs to be a comment in the code.

>
>> Furthermore your change even violates the specification, as
>> "pt_base" no longer points at the root table; you'd have to undo
> This is of course a major problem.
>
> pt_base is similar to "where cr3 is supposed to point at".

Obviously this is a bug needing fixing.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/pv: Fix construction of 32bit dom0's
  2019-02-07 12:58 ` Jan Beulich
  2019-02-07 13:29   ` Andrew Cooper
@ 2019-02-13 18:07   ` Wei Liu
  2019-02-14  8:11     ` Jan Beulich
  1 sibling, 1 reply; 11+ messages in thread
From: Wei Liu @ 2019-02-13 18:07 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Xen-devel, Wei Liu, Roger Pau Monne

On Thu, Feb 07, 2019 at 05:58:56AM -0700, Jan Beulich wrote:
> >>> On 06.02.19 at 21:41, <andrew.cooper3@citrix.com> wrote:
> > Slightly RFC:
> > 
> > 1) I've not worked out exactly what the
> > 
> >      v->vcpu_info = (void *)&d->shared_info->compat.vcpu_info[0];
> > 
> >    line is supposed to be doing and whether it is needed, but it doesn't
> >    appear to matter.  It is perhaps another redundant opencoding.
> 
> Afaict this is just to be independent of the fact that the vcpu_info
> array is first in struct shared_info. I'd be fine with it getting replaced
> by a respective BUILD_BUG_ON(), but I'd like to ask that it not be
> dropped without replacement.

What do you mean by "be independent of" here? Perhaps you meant "be sure
of"? But I still fail to understand how would an assignment makes sure
a member is first in a struct.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/pv: Fix construction of 32bit dom0's
  2019-02-13 18:07   ` Wei Liu
@ 2019-02-14  8:11     ` Jan Beulich
  2019-02-14 10:30       ` Wei Liu
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2019-02-14  8:11 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel, Roger Pau Monne

>>> On 13.02.19 at 19:07, <wei.liu2@citrix.com> wrote:
> On Thu, Feb 07, 2019 at 05:58:56AM -0700, Jan Beulich wrote:
>> >>> On 06.02.19 at 21:41, <andrew.cooper3@citrix.com> wrote:
>> > Slightly RFC:
>> > 
>> > 1) I've not worked out exactly what the
>> > 
>> >      v->vcpu_info = (void *)&d->shared_info->compat.vcpu_info[0];
>> > 
>> >    line is supposed to be doing and whether it is needed, but it doesn't
>> >    appear to matter.  It is perhaps another redundant opencoding.
>> 
>> Afaict this is just to be independent of the fact that the vcpu_info
>> array is first in struct shared_info. I'd be fine with it getting replaced
>> by a respective BUILD_BUG_ON(), but I'd like to ask that it not be
>> dropped without replacement.
> 
> What do you mean by "be independent of" here? Perhaps you meant "be sure
> of"? But I still fail to understand how would an assignment makes sure
> a member is first in a struct.

It's the other way around: We're fine without the assignment when
the field is first in the struct. The assignment would strictly be needed
if it wasn't, because then what's earlier in the struct could have
different sizes between the native and compat layouts.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/pv: Fix construction of 32bit dom0's
  2019-02-14  8:11     ` Jan Beulich
@ 2019-02-14 10:30       ` Wei Liu
  2019-02-14 10:47         ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: Wei Liu @ 2019-02-14 10:30 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Xen-devel, Wei Liu, Roger Pau Monne

On Thu, Feb 14, 2019 at 01:11:49AM -0700, Jan Beulich wrote:
> >>> On 13.02.19 at 19:07, <wei.liu2@citrix.com> wrote:
> > On Thu, Feb 07, 2019 at 05:58:56AM -0700, Jan Beulich wrote:
> >> >>> On 06.02.19 at 21:41, <andrew.cooper3@citrix.com> wrote:
> >> > Slightly RFC:
> >> > 
> >> > 1) I've not worked out exactly what the
> >> > 
> >> >      v->vcpu_info = (void *)&d->shared_info->compat.vcpu_info[0];
> >> > 
> >> >    line is supposed to be doing and whether it is needed, but it doesn't
> >> >    appear to matter.  It is perhaps another redundant opencoding.
> >> 
> >> Afaict this is just to be independent of the fact that the vcpu_info
> >> array is first in struct shared_info. I'd be fine with it getting replaced
> >> by a respective BUILD_BUG_ON(), but I'd like to ask that it not be
> >> dropped without replacement.
> > 
> > What do you mean by "be independent of" here? Perhaps you meant "be sure
> > of"? But I still fail to understand how would an assignment makes sure
> > a member is first in a struct.
> 
> It's the other way around: We're fine without the assignment when
> the field is first in the struct. The assignment would strictly be needed
> if it wasn't, because then what's earlier in the struct could have
> different sizes between the native and compat layouts.

I see. In that case I think replacing it with

   BUILD_BUG_ON(offsetof(struct shared_info, vcpu_info) != 0);

should be OK. Although I don't see how it can change to not be the first
in the shared_info, it being part of the guest visible ABI.

Wei.

> 
> Jan
> 
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/pv: Fix construction of 32bit dom0's
  2019-02-14 10:30       ` Wei Liu
@ 2019-02-14 10:47         ` Jan Beulich
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2019-02-14 10:47 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, Xen-devel, Roger Pau Monne

>>> On 14.02.19 at 11:30, <wei.liu2@citrix.com> wrote:
> On Thu, Feb 14, 2019 at 01:11:49AM -0700, Jan Beulich wrote:
>> >>> On 13.02.19 at 19:07, <wei.liu2@citrix.com> wrote:
>> > On Thu, Feb 07, 2019 at 05:58:56AM -0700, Jan Beulich wrote:
>> >> >>> On 06.02.19 at 21:41, <andrew.cooper3@citrix.com> wrote:
>> >> > Slightly RFC:
>> >> > 
>> >> > 1) I've not worked out exactly what the
>> >> > 
>> >> >      v->vcpu_info = (void *)&d->shared_info->compat.vcpu_info[0];
>> >> > 
>> >> >    line is supposed to be doing and whether it is needed, but it doesn't
>> >> >    appear to matter.  It is perhaps another redundant opencoding.
>> >> 
>> >> Afaict this is just to be independent of the fact that the vcpu_info
>> >> array is first in struct shared_info. I'd be fine with it getting replaced
>> >> by a respective BUILD_BUG_ON(), but I'd like to ask that it not be
>> >> dropped without replacement.
>> > 
>> > What do you mean by "be independent of" here? Perhaps you meant "be sure
>> > of"? But I still fail to understand how would an assignment makes sure
>> > a member is first in a struct.
>> 
>> It's the other way around: We're fine without the assignment when
>> the field is first in the struct. The assignment would strictly be needed
>> if it wasn't, because then what's earlier in the struct could have
>> different sizes between the native and compat layouts.
> 
> I see. In that case I think replacing it with
> 
>    BUILD_BUG_ON(offsetof(struct shared_info, vcpu_info) != 0);
> 
> should be OK. Although I don't see how it can change to not be the first
> in the shared_info, it being part of the guest visible ABI.

Sure - it was added really just as a pre- (and perhaps over-)
cautionary measure, when at the time so many other native
<-> compat transformations were necessary and could be got
wrong.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-02-14 10:47 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-06 20:41 [PATCH] x86/pv: Fix construction of 32bit dom0's Andrew Cooper
2019-02-07 12:58 ` Jan Beulich
2019-02-07 13:29   ` Andrew Cooper
2019-02-07 13:38     ` Juergen Gross
2019-02-07 13:45     ` Jan Beulich
2019-02-07 13:56       ` Juergen Gross
2019-02-07 15:01         ` Andrew Cooper
2019-02-13 18:07   ` Wei Liu
2019-02-14  8:11     ` Jan Beulich
2019-02-14 10:30       ` Wei Liu
2019-02-14 10:47         ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.