* [Xen-devel] PVH dom0 construction timeout
@ 2020-02-28 21:08 Andrew Cooper
  2020-03-02  9:24 ` Jan Beulich
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Andrew Cooper @ 2020-02-28 21:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Jan Beulich, Roger Pau Monne


It turns out that PVH dom0 construction doesn't work so well on a
2-socket Rome system...

(XEN) NX (Execute Disable) protection active
(XEN) *** Building a PVH Dom0 ***
(XEN) Watchdog timer detects that CPU0 is stuck!
(XEN) ----[ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d08029a8fd>] page_get_ram_type+0x58/0xb6
(XEN) RFLAGS: 0000000000000206   CONTEXT: hypervisor
(XEN) rax: ffff82d080948fe0   rbx: 0000000002b73db9   rcx: 0000000000000000
(XEN) rdx: 0000000004000000   rsi: 0000000004000000   rdi: 0000002b73db9000
(XEN) rbp: ffff82d080827be0   rsp: ffff82d080827ba0   r8:  ffff82d080948fcc
(XEN) r9:  0000002b73dba000   r10: ffff82d0809491fc   r11: 8000000000000000
(XEN) r12: 0000000002b73db9   r13: ffff8320341bc000   r14: 000000000404fc00
(XEN) r15: ffff82d08046f209   cr0: 000000008005003b   cr4: 00000000001506e0
(XEN) cr3: 00000000a0414000   cr2: 0000000000000000
(XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen code around <ffff82d08029a8fd> (page_get_ram_type+0x58/0xb6):
(XEN)  4c 39 d0 74 4d 49 39 d1 <76> 0b 89 ca 83 ca 10 48 39 38 0f 47 ca 49 89 c0
(XEN) Xen stack trace from rsp=ffff82d080827ba0:
(XEN)    ffff82d08061ee91 ffff82d080827bb4 00000000000b2403 ffff82d080804340
(XEN)    ffff8320341bc000 ffff82d080804340 ffff83000003df90 ffff8320341bc000
(XEN)    ffff82d080827c08 ffff82d08061c38c ffff8320341bc000 ffff82d080827ca8
(XEN)    ffff82d080648750 ffff82d080827c20 ffff82d08061852c 0000000000200000
(XEN)    ffff82d080827d60 ffff82d080638abe ffff82d080232854 ffff82d080930c60
(XEN)    ffff82d080930280 ffff82d080674800 ffff83000003df90 0000000001a40000
(XEN)    ffff83000003df80 ffff82d080827c80 0000000000000206 ffff8320341bc000
(XEN)    ffff82d080827cb8 ffff82d080827ca8 ffff82d080232854 ffff82d080961780
(XEN)    ffff82d080930280 ffff82d080827c00 0000000000000002 ffff82d08022f9a0
(XEN)    00000000010a4bb0 ffff82d080827ce0 0000000000000206 000000000381b66d
(XEN)    ffff82d080827d00 ffff82d0802b1e87 ffff82d080936900 ffff82d080936900
(XEN)    ffff82d080827d18 ffff82d0802b30d0 ffff82d080936900 ffff82d080827d50
(XEN)    ffff82d08022ef5e ffff8320341bc000 ffff83000003df80 ffff8320341bc000
(XEN)    ffff83000003df80 0000000001a40000 ffff83000003df90 ffff82d080674800
(XEN)    ffff82d080827d98 ffff82d08063cd06 0000000000000001 ffff82d080674800
(XEN)    ffff82d080931050 0000000000000100 ffff82d080950c80 ffff82d080827ee8
(XEN)    ffff82d08062eae7 0000000001a40fff 0000000000000000 000ffff82d080e00
(XEN)    ffffffff00000000 0000000000000005 0000000000000004 0000000000000004
(XEN)    0000000000000003 0000000000000003 0000000000000002 0000000000000002
(XEN)    0000000002050000 0000000000000000 ffff82d080674c20 ffff82d080674ea0
(XEN) Xen call trace:
(XEN)    [<ffff82d08029a8fd>] R page_get_ram_type+0x58/0xb6
(XEN)    [<ffff82d08061ee91>] S arch_iommu_hwdom_init+0x239/0x2b7
(XEN)    [<ffff82d08061c38c>] F drivers/passthrough/amd/pci_amd_iommu.c#amd_iommu_hwdom_init+0x85/0x9f
(XEN)    [<ffff82d08061852c>] F iommu_hwdom_init+0x44/0x4b
(XEN)    [<ffff82d080638abe>] F dom0_construct_pvh+0x160/0x1233
(XEN)    [<ffff82d08063cd06>] F construct_dom0+0x5c/0x280e
(XEN)    [<ffff82d08062eae7>] F __start_xen+0x25db/0x2860
(XEN)    [<ffff82d0802000ec>] F __high_start+0x4c/0x4e
(XEN)
(XEN) CPU1 @ e008:ffff82d0802f203f (arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0xa9/0xbf)
(XEN) CPU31 @ e008:ffff82d0802f203f (arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0xa9/0xbf)
(XEN) CPU30 @ e008:ffff82d0802f203f (arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0xa9/0xbf)
(XEN) CPU27 @ e008:ffff82d08022ad5a (scrub_one_page+0x6d/0x7b)
(XEN) CPU26 @ e008:ffff82d0802f203f (arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0xa9/0xbf)
(XEN) CPU244 @ e008:ffff82d0802f203f (arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0xa9/0xbf)
(XEN) CPU245 @ e008:ffff82d08022ad5a (scrub_one_page+0x6d/0x7b)
(XEN) CPU247 @ e008:ffff82d080256e3f (drivers/char/ns16550.c#ns_read_reg+0x2d/0x35)
(XEN) CPU246 @ e008:ffff82d0802f203f (arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0xa9/0xbf)

<snip rather a large number of cpus, all idle>


This stack trace is the same across several boots, and in particular
page_get_ram_type() is always the %rip at which the timeout fires.  In
an equivalent PV dom0 build, this step takes perceptibly zero time,
judging by how quickly the next line is printed.

I haven't diagnosed the exact issue, but some observations:

The arch_iommu_hwdom_init() loop's positioning of
process_pending_softirqs() looks problematic, because it is
conditionally short-circuited by hwdom_iommu_map().
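
For reference, the loop in question has roughly this shape (a
simplified sketch, not the exact code; the mapping details are elided):

    for ( i = 0; i < top; i++ )
    {
        unsigned long pfn = pdx_to_pfn(i);

        if ( !hwdom_iommu_map(d, pfn, max_pfn) )
            continue;        /* <- also skips the softirq processing below */

        /* ... identity-map pfn into the p2m and/or IOMMU page tables ... */

        if ( !(i & 0xfffff) )
            process_pending_softirqs();
    }

So across a long run of PFNs which hwdom_iommu_map() rejects, softirqs
(presumably including the timer work which keeps the watchdog happy)
never get processed.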

page_get_ram_type() is definitely suboptimal here.  We have a linear
search over a (large-ish) sorted list, and a caller which passes every
MFN in the system into it, which makes the total runtime of
arch_iommu_hwdom_init() quadratic with the size of the system.
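
An obvious improvement would be to binary-search the sorted list.  As a
sketch only (the array and field names are made up, and pages straddling
a range boundary are ignored, which the real function has to cope with):

    static unsigned int page_get_ram_type(mfn_t mfn)
    {
        paddr_t addr = mfn_to_maddr(mfn);
        unsigned int lo = 0, hi = nr_ranges;   /* ranges[] sorted by start */

        while ( lo < hi )
        {
            unsigned int mid = lo + (hi - lo) / 2;

            if ( addr < ranges[mid].start )
                hi = mid;
            else if ( addr >= ranges[mid].end )
                lo = mid + 1;
            else
                return ranges[mid].type;
        }

        return 0; /* no matching range */
    }

That turns the overall O(MFNs * ranges) walk into O(MFNs * log(ranges)).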

~Andrew


* Re: [Xen-devel] PVH dom0 construction timeout
  2020-02-28 21:08 [Xen-devel] PVH dom0 construction timeout Andrew Cooper
@ 2020-03-02  9:24 ` Jan Beulich
  2020-03-02  9:36   ` Jan Beulich
  2020-03-02 10:36 ` Roger Pau Monné
  2020-03-02 11:45 ` Andrew Cooper
  2 siblings, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2020-03-02  9:24 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Wei Liu, Roger Pau Monne

On 28.02.2020 22:08, Andrew Cooper wrote:
> It turns out that PVH dom0 construction doesn't work so well on a
> 2-socket Rome system...
> 
> <snip register state and stack trace>
> 
> This stack trace is the same across several boots, and in particular
> page_get_ram_type() is always the %rip at which the timeout fires.  In
> an equivalent PV dom0 build, this step takes perceptibly zero time,
> judging by how quickly the next line is printed.
> 
> I haven't diagnosed the exact issue, but some observations:
> 
> The arch_iommu_hwdom_init() loop's positioning of
> process_pending_softirqs() looks problematic, because it is
> conditionally short-circuited by hwdom_iommu_map().

Yes, we want to avoid this bypassing. I'll make a patch.

> page_get_ram_type() is definitely suboptimal here.  We have a linear
> search over a (large-ish) sorted list, and a caller which passes every
> MFN in the system into it, which makes the total runtime of
> arch_iommu_hwdom_init() quadratic with the size of the system.

This linear search is the same for PVH and PV, isn't it? In
fact hwdom_iommu_map(), on the average, may do more work for
PV than for PVH, considering the is_hvm_domain()-based return
from the switch()'s default case. So for the moment I could
explain such a huge difference in consumed time only if the
PV case ran with iommu_hwdom_passthrough set to true (which
isn't possible for PVH).
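
For clarity, the shape being referred to (a heavily simplified sketch
from memory; the real function has further checks in each case):

    static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
                                             unsigned long pfn,
                                             unsigned long max_pfn)
    {
        switch ( page_get_ram_type(_mfn(pfn)) )
        {
        case RAM_TYPE_UNUSABLE:
            return false;

        case RAM_TYPE_CONVENTIONAL:
            if ( iommu_hwdom_strict )
                return false;
            break;

        default:
            if ( is_hvm_domain(d) /* i.e. PVH */ )
                return false;
            /* ... PV-only inclusive-mapping checks ... */
        }

        return true;
    }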

Jan


* Re: [Xen-devel] PVH dom0 construction timeout
  2020-03-02  9:24 ` Jan Beulich
@ 2020-03-02  9:36   ` Jan Beulich
  0 siblings, 0 replies; 7+ messages in thread
From: Jan Beulich @ 2020-03-02  9:36 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Wei Liu, Roger Pau Monne

On 02.03.2020 10:24, Jan Beulich wrote:
> On 28.02.2020 22:08, Andrew Cooper wrote:
>> page_get_ram_type() is definitely suboptimal here.  We have a linear
>> search over a (large-ish) sorted list, and a caller which passes every
>> MFN in the system into it, which makes the total runtime of
>> arch_iommu_hwdom_init() quadratic with the size of the system.
> 
> This linear search is the same for PVH and PV, isn't it? In
> fact hwdom_iommu_map(), on the average, may do more work for
> PV than for PVH, considering the is_hvm_domain()-based return
> from the switch()'s default case. So for the moment I could
> explain such a huge difference in consumed time only if the
> PV case ran with iommu_hwdom_passthrough set to true (which
> isn't possible for PVH).

Actually, the differing iommu_hwdom_strict setting may matter here, but
having it clear (possible only in the PV case) would mean more actual
mapping operations get carried out, i.e. it should result in slower
overall execution.  So I'm still puzzled by the observed difference in
consumed time.

Jan


* Re: [Xen-devel] PVH dom0 construction timeout
  2020-02-28 21:08 [Xen-devel] PVH dom0 construction timeout Andrew Cooper
  2020-03-02  9:24 ` Jan Beulich
@ 2020-03-02 10:36 ` Roger Pau Monné
  2020-03-02 11:45 ` Andrew Cooper
  2 siblings, 0 replies; 7+ messages in thread
From: Roger Pau Monné @ 2020-03-02 10:36 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Wei Liu, Jan Beulich

On Fri, Feb 28, 2020 at 09:08:30PM +0000, Andrew Cooper wrote:
> It turns out that PVH dom0 construction doesn't work so well on a
> 2-socket Rome system...
> 
> <snip register state and stack trace>
> 
> This stack trace is the same across several boots, and in particular
> page_get_ram_type() is always the %rip at which the timeout fires.  In
> an equivalent PV dom0 build, this step takes perceptibly zero time,
> judging by how quickly the next line is printed.

set_identity_p2m_entry() on AMD will always take longer, as it needs to
add the mfn to both the p2m and the iommu page tables because of the
lack of page table sharing.

On a PVH dom0, hwdom_iommu_map() will return false more often than for
PV, because RAM regions are already mapped into the p2m and the iommu
page tables if required, and hence process_pending_softirqs() was
likely skipped far more often.  That, together with a big memory map,
could explain the watchdog triggering with %rip pointing at
page_get_ram_type(), I think.
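
I.e. the softirq processing wants to happen regardless of what
hwdom_iommu_map() returns; roughly (a sketch of the rearrangement,
assuming the loop shape sketched in Andrew's mail):

    for ( i = 0; i < top; i++ )
    {
        unsigned long pfn = pdx_to_pfn(i);

        if ( !(i & 0xfffff) )
            process_pending_softirqs();   /* now runs even for skipped pfns */

        if ( !hwdom_iommu_map(d, pfn, max_pfn) )
            continue;

        /* ... map pfn ... */
    }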

> 
> I haven't diagnosed the exact issue, but some observations:
> 
> The arch_iommu_hwdom_init() loop's positioning of
> process_pending_softirqs() looks problematic, because it is
> conditionally short-circuited by hwdom_iommu_map().
> 
> page_get_ram_type() is definitely suboptimal here.  We have a linear
> search over a (large-ish) sorted list, and a caller which passes every
> MFN in the system into it, which makes the total runtime of
> arch_iommu_hwdom_init() quadratic with the size of the system.

This could be improved for PVH dom0, I believe, as we already have an
adjusted e820 we could use instead of having to query the type of every
mfn in the system.  We could just iterate over the holes and reserved
ranges in the adjusted memory map, and avoid querying the type of RAM
regions altogether, as those are already mapped in the p2m and iommu
page tables for a PVH dom0.
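
As a sketch of that idea (illustrative only: map_range() is a
hypothetical helper, and the entry-type handling is approximate):

    /* Identity-map only the holes and reserved areas in dom0's adjusted
     * e820; RAM is already present in the p2m / IOMMU page tables. */
    paddr_t last = 0;
    unsigned int i;

    for ( i = 0; i < d->arch.nr_e820; i++ )
    {
        const struct e820entry *e = &d->arch.e820[i];

        if ( e->addr > last )
            map_range(d, last, e->addr);              /* hole */
        if ( e->type == E820_RESERVED )
            map_range(d, e->addr, e->addr + e->size); /* reserved */

        last = e->addr + e->size;
        process_pending_softirqs();
    }

This is proportional to the number of e820 entries rather than to the
number of MFNs in the system.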

Thanks, Roger.


* Re: [Xen-devel] PVH dom0 construction timeout
  2020-02-28 21:08 [Xen-devel] PVH dom0 construction timeout Andrew Cooper
  2020-03-02  9:24 ` Jan Beulich
  2020-03-02 10:36 ` Roger Pau Monné
@ 2020-03-02 11:45 ` Andrew Cooper
  2020-03-02 12:19   ` Roger Pau Monné
  2 siblings, 1 reply; 7+ messages in thread
From: Andrew Cooper @ 2020-03-02 11:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Jan Beulich, Wei Liu, Roger Pau Monne

On 28/02/2020 21:08, Andrew Cooper wrote:
> It turns out that PVH dom0 construction doesn't work so well on a
> 2-socket Rome system...

With the softirq fix in place, here are the differences in construction
between PV and PVH along with timestamps.

(XEN) [   30.856178] NX (Execute Disable) protection active
(XEN) [   30.906155] *** Building a PV Dom0 ***
(XEN) [   31.153853] ELF: phdr: paddr=0x1000000 memsz=0xeef000

(XEN) [   27.588081] NX (Execute Disable) protection active
(XEN) [   27.633081] *** Building a PVH Dom0 ***
(XEN) [   33.524345] Dom0 memory allocation stats:
(XEN) [   33.568697] order  0 allocations: 2
(XEN) [   33.612341] order  1 allocations: 1
(XEN) [   33.655544] order  2 allocations: 5
(XEN) [   33.698344] order  3 allocations: 5
(XEN) [   33.740650] order  4 allocations: 2
(XEN) [   33.782736] order  5 allocations: 5
(XEN) [   33.824295] order  6 allocations: 4
(XEN) [   33.865423] order  7 allocations: 4
(XEN) [   33.906237] order  8 allocations: 4
(XEN) [   33.946560] order  9 allocations: 4
(XEN) [   33.986465] order 10 allocations: 4
(XEN) [   34.025925] order 11 allocations: 6
(XEN) [   34.065089] order 12 allocations: 5
(XEN) [   34.103750] order 13 allocations: 5
(XEN) [   34.142221] order 14 allocations: 3
(XEN) [   34.180064] order 15 allocations: 2
(XEN) [   34.217557] order 16 allocations: 3
(XEN) [   34.255105] order 17 allocations: 3
(XEN) [   34.292610] order 18 allocations: 5
(XEN) [   34.539002] Unable to copy initrd to guest
(XEN) [   34.576732] Failed to load Dom0 kernel
(XEN) [   34.618554]
(XEN) [   34.656905] ****************************************
(XEN) [   34.698851] Panic on CPU 0:
(XEN) [   34.737640] Could not set up DOM0 guest OS
(XEN) [   34.777939] ****************************************

i.e. PVH doesn't even complete correctly, and takes 6 seconds as opposed
to PV's 0.2s


* Re: [Xen-devel] PVH dom0 construction timeout
  2020-03-02 11:45 ` Andrew Cooper
@ 2020-03-02 12:19   ` Roger Pau Monné
  2020-03-02 12:49     ` Andrew Cooper
  0 siblings, 1 reply; 7+ messages in thread
From: Roger Pau Monné @ 2020-03-02 12:19 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Jan Beulich, Wei Liu

On Mon, Mar 02, 2020 at 11:45:26AM +0000, Andrew Cooper wrote:
> On 28/02/2020 21:08, Andrew Cooper wrote:
> > It turns out that PVH dom0 construction doesn't work so well on a
> > 2-socket Rome system...
> 
> With the softirq fix in place, here are the differences in construction
> between PV and PVH along with timestamps.
> 
> (XEN) [   30.856178] NX (Execute Disable) protection active
> (XEN) [   30.906155] *** Building a PV Dom0 ***
> (XEN) [   31.153853] ELF: phdr: paddr=0x1000000 memsz=0xeef000
> 
> (XEN) [   27.588081] NX (Execute Disable) protection active
> (XEN) [   27.633081] *** Building a PVH Dom0 ***
> <snip per-order allocation stats>
> (XEN) [   34.539002] Unable to copy initrd to guest
> (XEN) [   34.576732] Failed to load Dom0 kernel
> (XEN) [   34.618554]
> (XEN) [   34.656905] ****************************************
> (XEN) [   34.698851] Panic on CPU 0:
> (XEN) [   34.737640] Could not set up DOM0 guest OS
> (XEN) [   34.777939] ****************************************
> 
> i.e. PVH doesn't even complete correctly, and takes 6 seconds as opposed
> to PV's 0.2s

Hm, I guess PVH dom0 construction needs to be more clever about initrd
placement; right now the initrd is just copied after the kernel, without
any check that there's enough space.  Can you paste the output of the
following patch?

Thanks, Roger.
---8<---
diff --git a/xen/arch/x86/e820.c b/xen/arch/x86/e820.c
index aa602773bb..82e9ac46a0 100644
--- a/xen/arch/x86/e820.c
+++ b/xen/arch/x86/e820.c
@@ -88,7 +88,7 @@ static void __init add_memory_region(unsigned long long start,
     e820.nr_map++;
 }
 
-static void __init print_e820_memory_map(struct e820entry *map, unsigned int entries)
+void __init print_e820_memory_map(struct e820entry *map, unsigned int entries)
 {
     unsigned int i;
 
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index eded87eaf5..3ec036678c 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -490,6 +490,8 @@ static int __init pvh_populate_p2m(struct domain *d)
 #undef MB1_PAGES
 }
 
+void print_e820_memory_map(struct e820entry *map, unsigned int entries);
+
 static int __init pvh_load_kernel(struct domain *d, const module_t *image,
                                   unsigned long image_headroom,
                                   module_t *initrd, void *image_base,
@@ -555,6 +557,9 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image,
         if ( rc )
         {
             printk("Unable to copy initrd to guest\n");
+printk("load address: %lx initrd size: %x rc %d\n",
+       last_addr, initrd->mod_end, rc);
+print_e820_memory_map(d->arch.e820, d->arch.nr_e820);
             return rc;
         }
 



* Re: [Xen-devel] PVH dom0 construction timeout
  2020-03-02 12:19   ` Roger Pau Monné
@ 2020-03-02 12:49     ` Andrew Cooper
  0 siblings, 0 replies; 7+ messages in thread
From: Andrew Cooper @ 2020-03-02 12:49 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Jan Beulich, Wei Liu

On 02/03/2020 12:19, Roger Pau Monné wrote:
> On Mon, Mar 02, 2020 at 11:45:26AM +0000, Andrew Cooper wrote:
>> On 28/02/2020 21:08, Andrew Cooper wrote:
>>> It turns out that PVH dom0 construction doesn't work so well on a
>>> 2-socket Rome system...
>> With the softirq fix in place, here are the differences in construction
>> between PV and PVH along with timestamps.
>>
>> <snip PV vs PVH construction logs and allocation stats>
>> (XEN) [   34.539002] Unable to copy initrd to guest
>> (XEN) [   34.576732] Failed to load Dom0 kernel
>> (XEN) [   34.618554]
>> (XEN) [   34.656905] ****************************************
>> (XEN) [   34.698851] Panic on CPU 0:
>> (XEN) [   34.737640] Could not set up DOM0 guest OS
>> (XEN) [   34.777939] ****************************************
>>
>> i.e. PVH doesn't even complete correctly, and takes 6 seconds as opposed
>> to PV's 0.2s
> Hm, I guess PVH dom0 construction needs to be more clever about initrd
> placement; right now the initrd is just copied after the kernel,
> without any check that there's enough space.

Correct.

(XEN) [   34.150042] Unable to copy initrd to guest
(XEN) [   34.186891] load address: 302c000 initrd size: 12ac916 rc 2
(XEN) [   34.224415]  [0000000000000000, 000000000009ffff] (usable)
(XEN) [   34.262722]  [00000000000a0000, 00000000000fffff] (reserved)
(XEN) [   34.300587]  [0000000000100000, 0000000003ffffff] (usable)
(XEN) [   34.338095]  [0000000004000000, 0000000004041fff] (ACPI NVS)

The initrd overlaps this NVS region: it is loaded at 0x302c000 and runs
to 0x302c000 + 0x12ac916 = 0x42d8916, i.e. past the end of the usable
region at 0x3ffffff and clean across the NVS range.

(XEN) [   34.375640]  [0000000004042000, 0000000076cfffff] (usable)
(XEN) [   34.413183]  [0000000076d00000, 0000000076ffffff] (reserved)
(XEN) [   34.450883]  [0000000077000000, 00000000a6e7cfff] (usable)
(XEN) [   34.488777]  [00000000a6e7d000, 00000000a97b7fff] (reserved)
(XEN) [   34.526778]  [00000000a97b8000, 00000000a99e3fff] (usable)
(XEN) [   34.564801]  [00000000a99e4000, 00000000a9e9ffff] (ACPI NVS)
(XEN) [   34.603098]  [00000000a9ea0000, 00000000aa7f3fff] (reserved)
(XEN) [   34.641739]  [00000000aa7f4000, 00000000abffffff] (usable)
(XEN) [   34.680609]  [00000000ac000000, 00000000afffffff] (reserved)
(XEN) [   34.719820]  [00000000b2200000, 00000000b41fffff] (reserved)
(XEN) [   34.759288]  [00000000b8800000, 00000000ba7fffff] (reserved)
(XEN) [   34.798923]  [00000000f2200000, 00000000f41fffff] (reserved)
(XEN) [   34.838674]  [00000000f8c00000, 00000000fabfffff] (reserved)
(XEN) [   34.878809]  [00000000fe000000, 00000000ffffffff] (reserved)
(XEN) [   34.918972]  [0000000100000000, 0000000257aecfff] (usable)
(XEN) [   34.959120]  [0000000257aed000, 000000204effffff] (unusable)
(XEN) [   34.999413]  [000000204f000000, 000000204fffffff] (reserved)
(XEN) [   35.039708]  [0000002050000000, 000000404fbfffff] (unusable)
(XEN) [   35.080142]  [000000404fc00000, 000000404fffffff] (reserved)
(XEN) [   35.120721]  [0000010000000000, 00000100201fffff] (reserved)
(XEN) [   35.160901]  [000001dfa0000000, 000001dfc01fffff] (reserved)
(XEN) [   35.200730]  [000002bf40000000, 000002bf601fffff] (reserved)
(XEN) [   35.240184]  [0000039ee0000000, 0000039f001fffff] (reserved)
(XEN) [   35.279301]  [0000047e80000000, 0000047ea01fffff] (reserved)
(XEN) [   35.318087]  [0000055e20000000, 0000055e401fffff] (reserved)
(XEN) [   35.356586]  [0000063dc0000000, 0000063de01fffff] (reserved)
(XEN) [   35.395183]  [0000071d60000000, 0000071d801fffff] (reserved)
(XEN) [   35.433919] Failed to load Dom0 kernel
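
So pvh_load_kernel() needs to consult dom0's e820 when picking the
initrd location, rather than assuming the space following the kernel is
usable.  Something of this shape, perhaps (a sketch only; the helper and
its name are made up, and alignment / overlap with other modules is
ignored):

    /* Find 'size' bytes of usable RAM at or above 'start' in dom0's
     * adjusted e820.  Returns 0 if nothing fits. */
    static paddr_t __init find_usable_ram(const struct domain *d,
                                          paddr_t start, paddr_t size)
    {
        unsigned int i;

        for ( i = 0; i < d->arch.nr_e820; i++ )
        {
            const struct e820entry *e = &d->arch.e820[i];
            paddr_t s = max(start, (paddr_t)e->addr);

            if ( e->type == E820_RAM && s + size <= e->addr + e->size )
                return s;
        }

        return 0;
    }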

~Andrew

