* [PATCH RFC v1 00/74] Run PV guest in PVH container
@ 2018-01-04 13:05 Wei Liu
  2018-01-04 13:05 ` [PATCH RFC v1 01/74] x86/svm: Offer CPUID Faulting to AMD HVM guests as well Wei Liu
                   ` (75 more replies)
  0 siblings, 76 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC
  To: Xen-devel; +Cc: wei.liu2

Hi all

This is a patch series to run PV guests inside a PVH container. The series is
still very much at the RFC stage. We are aware that some of the code is not very
clean yet, and we are in the process of cleaning it up.

The series can be found at:

    https://xenbits.xen.org/git-http/people/liuw/xen.git wip.pvshim-rfc-v1

The basic idea is described on page 15 of the slides at [0].

This is a mitigation against one of the recently disclosed CPU vulnerabilities.
This series makes it possible to continue running untrusted PV guests. Please
refer to XSA-254 [1] for more information.

Given that the embargo has been lifted and the vulnerabilities disclosed, we
have opted to develop openly on xen-devel. Feedback and testing are very welcome.

The series is split into three parts: the first part is for the host that runs
the shim, the second part is for the shim itself, and the third part contains
toolstack patches (not yet fully working). See the markers in the list of
patches.

Instructions on using the PV shim:

1. Git clone the branch and configure as one normally would.
2. A xen-shim binary will be built and installed into Xen's firmware
   directory, alongside hvmloader and co.
3. Use the hack currently provided in the first part of the series to
   boot a PV guest inside a PVH container:
   a. Append type='pvh' to your PV guest config file;
   b. Export two environment variables so that libxl knows where to find
      the shim and what to pass on the shim's command line.
      # export LIBXL_PVSHIM_PATH=$PATH_TO_XEN_SHIM
      # export LIBXL_PVSHIM_CMDLINE="pv-shim console=xen,pv loglvl=all guest_loglvl=all apic_verbosity=debug e820-verbose sched=null"
4. xl create -c guest.cfg
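Step 3b makes libxl pick up the shim's location and command line from the
environment. Below is a minimal sketch of that lookup. It assumes the hack
simply reads both variables with getenv(3) and treats an unset or empty
LIBXL_PVSHIM_PATH as "no shim". `struct pvshim_settings`,
`pvshim_configure()` and `pvshim_from_env()` are hypothetical names, not
libxl's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical container for the shim settings; not a real libxl type. */
struct pvshim_settings {
    const char *path;    /* value of LIBXL_PVSHIM_PATH */
    const char *cmdline; /* value of LIBXL_PVSHIM_CMDLINE, may be NULL */
};

/*
 * Hypothetical, side-effect-free core: decide from the two raw strings
 * whether a shim should be used. An unset or empty path means "no shim".
 */
static bool pvshim_configure(const char *path, const char *cmdline,
                             struct pvshim_settings *s)
{
    assert(s != NULL);
    s->path = path;
    s->cmdline = cmdline;
    return path != NULL && path[0] != '\0';
}

/* Hypothetical wrapper reading the two variables exported in step 3b. */
static bool pvshim_from_env(struct pvshim_settings *s)
{
    return pvshim_configure(getenv("LIBXL_PVSHIM_PATH"),
                            getenv("LIBXL_PVSHIM_CMDLINE"), s);
}
```

Keeping the decision in a function that takes plain strings means it can be
tested without touching the process environment.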

You should first see some Xen messages and then the guest kernel messages
(the console= shim parameter is required).

Known issues:

1. The ARM build and some Clang builds are broken by this series.
2. The host will see a lot of over-allocation messages. These are not too
   harmful and will be fixed once the toolstack is ready.

Wei.

[0] https://www.slideshare.net/xen_com_mgr/xpdds17-keynote-towards-a-configurable-and-slimmer-x86-hypervisor-wei-liu-citrix
[1] https://xenbits.xen.org/xsa/advisory-254.html

# Patches for the host:

448f56a363 x86/svm: Offer CPUID Faulting to AMD HVM guests as well
6a78c9ae33 x86: Common cpuid faulting support
05844fec44 x86/upcall: inject a spurious event after setting upcall vector
fc7a48dd74 tools/libxc: initialise hvm loader elf log fd to get more logging
522c9cbaf0 tools/libxc: remove extraneous newline in xc_dom_load_acpi
bd6b572b32 tools/libelf: fix elf notes check for PVH guest
449b932b0c tools/libxc: Multi modules support
cc6dbdc0c1 libxl: Introduce hack to allow PVH mode to add a shim

# Patches for the shim:

8ffbad2060 xen/common: Widen the guest logging buffer slightly
7fc883a17e x86/time: Print a more helpful error when a platform timer can't be found
0575f913c1 x86/link: Introduce and use SECTION_ALIGN
62d8196e3b xen/acpi: mark the PM timer FADT field as optional
789db028db xen/domctl: Return arch_config via getdomaininfo
97f38ec4b4 tools/ocaml: Expose arch_config in domaininfo
6da8e1993c tools/ocaml: Extend domain_create() to take arch_domainconfig
0b1f990b03 x86/fixmap: Modify fix_to_virt() to return a void pointer
014520b9d0 ---- x86/Kconfig: Options for Xen and PVH support
62a4c07bfe x86/link: Relocate program headers
3b5a699018 x86: introduce ELFNOTE macro
802fbc7aff x86: produce a binary that can be booted as PVH
782cebfce5 x86/entry: Early PVH boot code
cdc718e5e0 x86/boot: Map more than the first 16MB
025e19e9a5 x86/entry: Probe for Xen early during boot
53aadd1db4 x86/guest: Hypercall support
86aeb49671 x86/shutdown: Support for using SCHEDOP_{shutdown,reboot}
145d616bed x86/pvh: Retrieve memory map from Xen
40bf7f9323 xen/console: Introduce console=xen
c7ad734481 x86: initialise shared_info page
a6940a5516 x86: xen pv clock time source
b69ce8f6eb x86: APIC timer calibration when running as a guest
e2e3665ea3 x86: read wallclock from Xen running in pvh mode
b01d7d338e x86: don't swallow the first command line item in pvh mode
aa1937c569 x86/guest: enable event channels upcalls
9e9c06e556 x86/guest: add PV console code
a87b4fd510 x86/guest: use PV console for Xen/Dom0 I/O
b67edf6cdb --- x86/shim: Kconfig and command line options
ce622d9384 tools/firmware: Build and install xen-shim
c1b1c473b8 x86/pv-shim: Force CPUID faulting in pv-shim mode
576f4be4b9 xen/x86: make VGA support selectable
7dbc3f25f6 xen/x86: report domain id on cpuid
c95b5e63eb xen/pvh: do not mark the low 1MB as IO mem
cc7023e59b sched/null: skip vCPUs on the waitqueue that are blocked
4fcc995c14 xen: introduce rangeset_reserve_hole
22d931454f xen/pvshim: keep track of unused pages
f32f82c439 x86/guest: use unpopulated memory to map the shared_info page
638d3ae921 xen/guest: fetch vCPU ID from Xen
2fb52effec x86/guest: fix upcall vector setup
2ec939a35e x86/guest: unmask console event channel
e57a7c3173 x86/guest: map per-cpu vcpu_info area.
d2a1878ac6 xen/pvshim: remove Dom0 kernel support check
4e898f8c00 xen/pvshim: don't allow access to iomem or ioports
4dff8efebd xen: mark xenstore/console pages as RAM and add them to dom_io
5b6a4b069b xen/pvshim: modify Dom0 builder in order to build a DomU
b865a91f1d xen/pvshim: set correct domid value
f31e0cd535 xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU
97bda4f904 xen/pvshim: add grant table operations
e37cf1367a x86/pv-shim: shadow PV console's page for L2 DomU
8cc21ecc09 xen/pvshim: add migration support
a7ff975bb3 xen/pvshim: add shim_mem cmdline parameter
bcdc208ad2 xen/pvshim: set max_pages to the value of tot_pages
a5b1f98a15 xen/pvshim: support vCPU hotplug
6be1b4e645 xen/pvshim: memory hotplug
7ce9abb399 xen/shim: modify shim_mem parameter behaviour
7952196d0c xen/pvshim: use default position for the m2p mappings
6cbe2150d3 xen/shim: crash instead of reboot in shim mode
8667344fa4 xen/shim: allow DomU to have as many vcpus as available

# Patches for toolstack (not yet fully working):

8faff727c4 libxl: libxl__build_hvm: Introduce separate b_info parameter
4b5a346b96 libxl__domain_build_info_setdefault_pvhhvm: introduce
dfcaf56b00 libxl_bitmap_copy_alloc: copy 0,NULL as 0,NULL
a9b73202d1 libxl: pvshim: Check state->shim_path before domain type
48a6a2217a libxl: pvshim: Provide first-class config settings to enable shim mode
e40e08c4fd libxl: pvshim: Introduce pvhshim_extra
c6bd9ca574 xl: pvshim: Provide and document xl config
d3e5c3a0d6 libxl: pvshim: Set video_memkb to ~0

 .gitignore                            |   5 +
 docs/man/xl.cfg.pod.5.in              |  28 +
 docs/misc/xen-command-line.markdown   |  36 +-
 stubdom/grub/kexec.c                  |   7 +-
 tools/firmware/Makefile               |  11 +
 tools/firmware/xen-dir/Makefile       |  59 +++
 tools/firmware/xen-dir/shim.config    |  86 +++
 tools/helpers/init-xenstore-domain.c  |   4 +-
 tools/libxc/include/xc_dom.h          |  48 +-
 tools/libxc/include/xenctrl.h         |   1 +
 tools/libxc/xc_dom_compat_linux.c     |   2 +-
 tools/libxc/xc_dom_core.c             | 154 ++++--
 tools/libxc/xc_dom_hvmloader.c        |   1 +
 tools/libxc/xc_dom_x86.c              |  65 +--
 tools/libxc/xc_domain.c               |   1 +
 tools/libxl/libxl.h                   |   8 +
 tools/libxl/libxl_create.c            |  65 ++-
 tools/libxl/libxl_dom.c               |  73 ++-
 tools/libxl/libxl_internal.h          |   7 +
 tools/libxl/libxl_types.idl           |   4 +
 tools/libxl/libxl_utils.c             |   8 +-
 tools/ocaml/libs/xc/xenctrl.ml        |  31 +-
 tools/ocaml/libs/xc/xenctrl.mli       |  30 +-
 tools/ocaml/libs/xc/xenctrl_stubs.c   |  48 +-
 tools/xl/xl_parse.c                   |  11 +
 xen/Makefile                          |   8 +-
 xen/arch/x86/Kconfig                  |  40 +-
 xen/arch/x86/Makefile                 |  11 +-
 xen/arch/x86/acpi/lib.c               |   2 +-
 xen/arch/x86/apic.c                   |  38 +-
 xen/arch/x86/boot/build32.mk          |   1 +
 xen/arch/x86/boot/cmdline.c           |   5 +-
 xen/arch/x86/boot/head.S              |  48 ++
 xen/arch/x86/boot/trampoline.S        |   7 +
 xen/arch/x86/boot/x86_64.S            |   5 +-
 xen/arch/x86/cpu/amd.c                |  16 +-
 xen/arch/x86/cpu/common.c             |  78 ++-
 xen/arch/x86/cpu/intel.c              |  81 +--
 xen/arch/x86/dom0_build.c             |  50 +-
 xen/arch/x86/domctl.c                 |   2 +
 xen/arch/x86/e820.c                   |   7 +-
 xen/arch/x86/efi/efi-boot.h           |   4 +
 xen/arch/x86/guest/Makefile           |   4 +
 xen/arch/x86/guest/hypercall_page.S   |  79 +++
 xen/arch/x86/guest/pvh-boot.c         | 140 +++++
 xen/arch/x86/guest/xen.c              | 401 ++++++++++++++
 xen/arch/x86/hvm/hvm.c                |   1 +
 xen/arch/x86/hvm/irq.c                |   4 +
 xen/arch/x86/hvm/svm/svm.c            |   6 +
 xen/arch/x86/mm.c                     |  19 +-
 xen/arch/x86/mpparse.c                |   2 +-
 xen/arch/x86/msi.c                    |   3 +-
 xen/arch/x86/msr.c                    |   3 +-
 xen/arch/x86/platform_hypercall.c     |   2 +
 xen/arch/x86/pv/Makefile              |   1 +
 xen/arch/x86/pv/dom0_build.c          |  49 +-
 xen/arch/x86/pv/shim.c                | 955 ++++++++++++++++++++++++++++++++++
 xen/arch/x86/setup.c                  |  86 ++-
 xen/arch/x86/shutdown.c               |  41 +-
 xen/arch/x86/smpboot.c                |   4 +
 xen/arch/x86/tboot.c                  |   4 +-
 xen/arch/x86/time.c                   | 132 ++++-
 xen/arch/x86/traps.c                  |   1 +
 xen/arch/x86/xen.lds.S                |  82 ++-
 xen/common/compat/grant_table.c       |   5 +
 xen/common/domain.c                   |  54 +-
 xen/common/event_channel.c            | 100 ++--
 xen/common/grant_table.c              |  10 +
 xen/common/libelf/libelf-dominfo.c    |   9 +-
 xen/common/memory.c                   |  14 +
 xen/common/page_alloc.c               |  15 +
 xen/common/rangeset.c                 |  51 ++
 xen/common/sched_null.c               |  11 +-
 xen/common/schedule.c                 |   3 +-
 xen/drivers/acpi/apei/apei-io.c       |   2 +-
 xen/drivers/acpi/tables/tbfadt.c      |   2 +-
 xen/drivers/char/Makefile             |   2 +
 xen/drivers/char/console.c            |  49 +-
 xen/drivers/char/consoled.c           | 145 ++++++
 xen/drivers/char/ehci-dbgp.c          |   2 +-
 xen/drivers/char/ns16550.c            |   2 +-
 xen/drivers/char/xen_pv_console.c     | 205 ++++++++
 xen/drivers/video/Kconfig             |   8 +-
 xen/include/asm-x86/apicdef.h         |   2 +-
 xen/include/asm-x86/asm_defns.h       |  12 +
 xen/include/asm-x86/cpuid.h           |   3 -
 xen/include/asm-x86/dom0_build.h      |   4 +
 xen/include/asm-x86/e820.h            |   1 +
 xen/include/asm-x86/fixmap.h          |   6 +-
 xen/include/asm-x86/guest.h           |  37 ++
 xen/include/asm-x86/guest/hypercall.h | 202 +++++++
 xen/include/asm-x86/guest/pvh-boot.h  |  57 ++
 xen/include/asm-x86/guest/xen.h       | 111 ++++
 xen/include/asm-x86/processor.h       |   4 +-
 xen/include/asm-x86/pv/shim.h         | 124 +++++
 xen/include/asm-x86/setup.h           |   6 +
 xen/include/public/arch-x86/cpuid.h   |   3 +-
 xen/include/public/domctl.h           |   1 +
 xen/include/xen/consoled.h            |  27 +
 xen/include/xen/domain.h              |   1 +
 xen/include/xen/event.h               |  15 +
 xen/include/xen/pv_console.h          |  38 ++
 xen/include/xen/rangeset.h            |   4 +
 xen/include/xen/sched.h               |   6 +-
 104 files changed, 4153 insertions(+), 415 deletions(-)
 create mode 100644 tools/firmware/xen-dir/Makefile
 create mode 100644 tools/firmware/xen-dir/shim.config
 create mode 100644 xen/arch/x86/guest/Makefile
 create mode 100644 xen/arch/x86/guest/hypercall_page.S
 create mode 100644 xen/arch/x86/guest/pvh-boot.c
 create mode 100644 xen/arch/x86/guest/xen.c
 create mode 100644 xen/arch/x86/pv/shim.c
 create mode 100644 xen/drivers/char/consoled.c
 create mode 100644 xen/drivers/char/xen_pv_console.c
 create mode 100644 xen/include/asm-x86/guest.h
 create mode 100644 xen/include/asm-x86/guest/hypercall.h
 create mode 100644 xen/include/asm-x86/guest/pvh-boot.h
 create mode 100644 xen/include/asm-x86/guest/xen.h
 create mode 100644 xen/include/asm-x86/pv/shim.h
 create mode 100644 xen/include/xen/consoled.h
 create mode 100644 xen/include/xen/pv_console.h

-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

* [PATCH RFC v1 01/74] x86/svm: Offer CPUID Faulting to AMD HVM guests as well
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-04 14:00   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 02/74] x86: Common cpuid faulting support Wei Liu
                   ` (74 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC
  To: Xen-devel; +Cc: wei.liu2

From: Andrew Cooper <andrew.cooper3@citrix.com>

CPUID Faulting can be virtualised for HVM guests without hardware support,
meaning it can be offered to SVM guests as well.
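The hunk added to svm_vmexit_do_cpuid() below injects #GP(0) whenever
hvm_check_cpuid_faulting() says so. The following is a simplified model of that
decision. It assumes the check reports a fault only when the guest has enabled
faulting in its virtual MSR_INTEL_MISC_FEATURES_ENABLES and executed CPUID at
CPL > 0, which matches the architectural CPUID-faulting behaviour. The types
and function names are illustrative stand-ins, not Xen's structures:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the per-vCPU state Xen consults; not Xen's real layout. */
struct model_vcpu {
    bool cpuid_faulting_enabled; /* guest set MISC_FEATURES_ENABLES.CPUID_FAULTING */
    unsigned int cpl;            /* privilege level CPUID was executed at, 0..3 */
};

enum cpuid_outcome { CPUID_EMULATE, CPUID_INJECT_GP };

/* Models the intent of hvm_check_cpuid_faulting(): ring 0 never faults. */
static bool model_check_cpuid_faulting(const struct model_vcpu *v)
{
    assert(v->cpl <= 3);
    return v->cpuid_faulting_enabled && v->cpl > 0;
}

/* What the CPUID intercept does: either inject #GP(0) or emulate CPUID. */
static enum cpuid_outcome model_cpuid_intercept(const struct model_vcpu *v)
{
    return model_check_cpuid_faulting(v) ? CPUID_INJECT_GP : CPUID_EMULATE;
}
```

No hardware faulting support is needed here. SVM already intercepts CPUID, so
Xen only has to choose between emulating the instruction and injecting
#GP(0).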

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/svm/svm.c | 6 ++++++
 xen/arch/x86/msr.c         | 3 ++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 2e62b9bb6d..677241be65 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1786,6 +1786,12 @@ static void svm_vmexit_do_cpuid(struct cpu_user_regs *regs)
     if ( (inst_len = __get_instruction_length(curr, INSTR_CPUID)) == 0 )
         return;
 
+    if ( hvm_check_cpuid_faulting(curr) )
+    {
+        hvm_inject_hw_exception(TRAP_gp_fault, 0);
+        return;
+    }
+
     guest_cpuid(curr, regs->eax, regs->ecx, &res);
     HVMTRACE_5D(CPUID, regs->eax, res.a, res.b, res.c, res.d);
 
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 31983edc54..187f8623a5 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -39,7 +39,8 @@ static void __init calculate_hvm_max_policy(void)
         return;
 
     /* 0x000000ce  MSR_INTEL_PLATFORM_INFO */
-    if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
+    if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
+         boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
     {
         dp->plaform_info.available = true;
         dp->plaform_info.cpuid_faulting = true;
-- 
2.11.0



* [PATCH RFC v1 02/74] x86: Common cpuid faulting support
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
  2018-01-04 13:05 ` [PATCH RFC v1 01/74] x86/svm: Offer CPUID Faulting to AMD HVM guests as well Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-04 14:19   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 03/74] x86/upcall: inject a spurious event after setting upcall vector Wei Liu
                   ` (73 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC
  To: Xen-devel; +Cc: wei.liu2

From: Andrew Cooper <andrew.cooper3@citrix.com>

With CPUID Faulting offered to SVM guests, move Xen's faulting code to being
common rather than Intel specific.

This is necessary for nested Xen (including pv-shim mode) to prevent PV guests
from finding the outer HVM Xen's CPUID leaves via native CPUID.
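After this patch, ctxt_switch_levelling() is common code. When CPUID faulting
is available, it leaves the setting untouched when switching to the idle
domain. Otherwise it enables faulting for every non-control PV domain and for
HVM vCPUs that asked for it. set_cpuid_faulting() caches the MSR value per CPU
so that redundant writes are skipped.

The model below captures just that logic. The types, the single global cache
standing in for the per-CPU variable, the wrmsr counter, and the assumption
that the enable is bit 0 of MSR_INTEL_MISC_FEATURES_ENABLES are all
illustrative. The fallback to ctxt_switch_masking() when faulting is
unavailable is omitted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Assumed position of the CPUID-faulting enable bit. */
#define MODEL_CPUID_FAULTING_BIT (1ULL << 0)

struct model_domain { bool idle, control, pv; };
struct model_vcpu { const struct model_domain *domain; bool guest_opted_in; };

/* Stand-in for this_cpu(msr_misc_features), plus a count of MSR writes. */
static uint64_t cached_misc_features;
static unsigned int wrmsr_count;

static void model_set_cpuid_faulting(bool enable)
{
    uint64_t val = cached_misc_features;

    /* Skip the MSR write if the bit already has the requested value. */
    if (!!(val & MODEL_CPUID_FAULTING_BIT) == enable)
        return;

    val ^= MODEL_CPUID_FAULTING_BIT;
    wrmsr_count++;               /* stands in for wrmsrl() */
    cached_misc_features = val;
}

static void model_ctxt_switch_levelling(const struct model_vcpu *next)
{
    const struct model_domain *nextd = next ? next->domain : NULL;

    /* Idle context never runs guest CPUID, so leave the setting alone. */
    if (nextd && nextd->idle)
        return;

    /* NULL (host default) and the control domain run without faulting. */
    model_set_cpuid_faulting(nextd && !nextd->control &&
                             (nextd->pv || next->guest_opted_in));
}
```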

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/cpu/amd.c          | 16 +++++---
 xen/arch/x86/cpu/common.c       | 76 ++++++++++++++++++++++++++++++++++++--
 xen/arch/x86/cpu/intel.c        | 81 +++++++----------------------------------
 xen/include/asm-x86/cpuid.h     |  3 --
 xen/include/asm-x86/processor.h |  4 +-
 5 files changed, 98 insertions(+), 82 deletions(-)

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 5f36ac75a7..2bff3ee377 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -198,11 +198,12 @@ static void __init noinline probe_masking_msrs(void)
 }
 
 /*
- * Context switch levelling state to the next domain.  A parameter of NULL is
- * used to context switch to the default host state (by the cpu bringup-code,
- * crash path, etc).
+ * Context switch CPUID masking state to the next domain.  Only called if
+ * CPUID Faulting isn't available, but masking MSRs have been detected.  A
+ * parameter of NULL is used to context switch to the default host state (by
+ * the cpu bringup-code, crash path, etc).
  */
-static void amd_ctxt_switch_levelling(const struct vcpu *next)
+static void amd_ctxt_switch_masking(const struct vcpu *next)
 {
 	struct cpuidmasks *these_masks = &this_cpu(cpuidmasks);
 	const struct domain *nextd = next ? next->domain : NULL;
@@ -263,6 +264,9 @@ static void __init noinline amd_init_levelling(void)
 {
 	const struct cpuidmask *m = NULL;
 
+	if (probe_cpuid_faulting())
+		return;
+
 	probe_masking_msrs();
 
 	if (*opt_famrev != '\0') {
@@ -352,7 +356,7 @@ static void __init noinline amd_init_levelling(void)
 	}
 
 	if (levelling_caps)
-		ctxt_switch_levelling = amd_ctxt_switch_levelling;
+		ctxt_switch_masking = amd_ctxt_switch_masking;
 }
 
 /*
@@ -518,7 +522,7 @@ static void early_init_amd(struct cpuinfo_x86 *c)
 	if (c == &boot_cpu_data)
 		amd_init_levelling();
 
-	amd_ctxt_switch_levelling(NULL);
+	ctxt_switch_levelling(NULL);
 }
 
 static void init_amd(struct cpuinfo_x86 *c)
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index e9588b3c0d..a1f1a04776 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -113,12 +113,80 @@ static const struct cpu_dev default_cpu = {
 };
 static const struct cpu_dev *this_cpu = &default_cpu;
 
-static void default_ctxt_switch_levelling(const struct vcpu *next)
+static DEFINE_PER_CPU(uint64_t, msr_misc_features);
+void (* __read_mostly ctxt_switch_masking)(const struct vcpu *next);
+
+bool __init probe_cpuid_faulting(void)
+{
+	uint64_t val;
+
+	if (rdmsr_safe(MSR_INTEL_PLATFORM_INFO, val) ||
+	    !(val & MSR_PLATFORM_INFO_CPUID_FAULTING) ||
+	    rdmsr_safe(MSR_INTEL_MISC_FEATURES_ENABLES,
+		       this_cpu(msr_misc_features)))
+	{
+		setup_clear_cpu_cap(X86_FEATURE_CPUID_FAULTING);
+		return false;
+	}
+
+	expected_levelling_cap |= LCAP_faulting;
+	levelling_caps |=  LCAP_faulting;
+	setup_force_cpu_cap(X86_FEATURE_CPUID_FAULTING);
+
+	return true;
+}
+
+static void set_cpuid_faulting(bool enable)
+{
+	uint64_t *this_misc_features = &this_cpu(msr_misc_features);
+	uint64_t val = *this_misc_features;
+
+	if (!!(val & MSR_MISC_FEATURES_CPUID_FAULTING) == enable)
+		return;
+
+	val ^= MSR_MISC_FEATURES_CPUID_FAULTING;
+
+	wrmsrl(MSR_INTEL_MISC_FEATURES_ENABLES, val);
+	*this_misc_features = val;
+}
+
+void ctxt_switch_levelling(const struct vcpu *next)
 {
-	/* Nop */
+	const struct domain *nextd = next ? next->domain : NULL;
+
+	if (cpu_has_cpuid_faulting) {
+		/*
+		 * No need to alter the faulting setting if we are switching
+		 * to idle; it won't affect any code running in idle context.
+		 */
+		if (nextd && is_idle_domain(nextd))
+			return;
+		/*
+		 * We *should* be enabling faulting for the control domain.
+		 *
+		 * Unfortunately, the domain builder (having only ever been a
+		 * PV guest) expects to be able to see host cpuid state in a
+		 * native CPUID instruction, to correctly build a CPUID policy
+		 * for HVM guests (notably the xstate leaves).
+		 *
+		 * This logic is fundimentally broken for HVM toolstack
+		 * domains, and faulting causes PV guests to behave like HVM
+		 * guests from their point of view.
+		 *
+		 * Future development plans will move responsibility for
+		 * generating the maximum full cpuid policy into Xen, at which
+		 * this problem will disappear.
+		 */
+		set_cpuid_faulting(nextd && !is_control_domain(nextd) &&
+				   (is_pv_domain(nextd) ||
+				    next->arch.msr->
+				    misc_features_enables.cpuid_faulting));
+		return;
+	}
+
+	if (ctxt_switch_masking)
+		ctxt_switch_masking(next);
 }
-void (* __read_mostly ctxt_switch_levelling)(const struct vcpu *next) =
-	default_ctxt_switch_levelling;
 
 bool_t opt_cpu_info;
 boolean_param("cpuinfo", opt_cpu_info);
diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
index 8311952f1f..0888f76161 100644
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -15,40 +15,6 @@
 
 #include "cpu.h"
 
-static bool __init probe_intel_cpuid_faulting(void)
-{
-	uint64_t x;
-
-	if (rdmsr_safe(MSR_INTEL_PLATFORM_INFO, x) ||
-	    !(x & MSR_PLATFORM_INFO_CPUID_FAULTING))
-		return 0;
-
-	expected_levelling_cap |= LCAP_faulting;
-	levelling_caps |=  LCAP_faulting;
-	setup_force_cpu_cap(X86_FEATURE_CPUID_FAULTING);
-	return 1;
-}
-
-DEFINE_PER_CPU(bool, cpuid_faulting_enabled);
-
-static void set_cpuid_faulting(bool enable)
-{
-	bool *this_enabled = &this_cpu(cpuid_faulting_enabled);
-	uint32_t hi, lo;
-
-	ASSERT(cpu_has_cpuid_faulting);
-
-	if (*this_enabled == enable)
-		return;
-
-	rdmsr(MSR_INTEL_MISC_FEATURES_ENABLES, lo, hi);
-	lo &= ~MSR_MISC_FEATURES_CPUID_FAULTING;
-	if (enable)
-		lo |= MSR_MISC_FEATURES_CPUID_FAULTING;
-	wrmsr(MSR_INTEL_MISC_FEATURES_ENABLES, lo, hi);
-
-	*this_enabled = enable;
-}
 
 /*
  * Set caps in expected_levelling_cap, probe a specific masking MSR, and set
@@ -145,40 +111,17 @@ static void __init probe_masking_msrs(void)
 }
 
 /*
- * Context switch levelling state to the next domain.  A parameter of NULL is
- * used to context switch to the default host state (by the cpu bringup-code,
- * crash path, etc).
+ * Context switch CPUID masking state to the next domain.  Only called if
+ * CPUID Faulting isn't available, but masking MSRs have been detected.  A
+ * parameter of NULL is used to context switch to the default host state (by
+ * the cpu bringup-code, crash path, etc).
  */
-static void intel_ctxt_switch_levelling(const struct vcpu *next)
+static void intel_ctxt_switch_masking(const struct vcpu *next)
 {
 	struct cpuidmasks *these_masks = &this_cpu(cpuidmasks);
 	const struct domain *nextd = next ? next->domain : NULL;
-	const struct cpuidmasks *masks;
-
-	if (cpu_has_cpuid_faulting) {
-		/*
-		 * We *should* be enabling faulting for the control domain.
-		 *
-		 * Unfortunately, the domain builder (having only ever been a
-		 * PV guest) expects to be able to see host cpuid state in a
-		 * native CPUID instruction, to correctly build a CPUID policy
-		 * for HVM guests (notably the xstate leaves).
-		 *
-		 * This logic is fundimentally broken for HVM toolstack
-		 * domains, and faulting causes PV guests to behave like HVM
-		 * guests from their point of view.
-		 *
-		 * Future development plans will move responsibility for
-		 * generating the maximum full cpuid policy into Xen, at which
-		 * this problem will disappear.
-		 */
-		set_cpuid_faulting(nextd && !is_control_domain(nextd) &&
-				   (is_pv_domain(nextd) ||
-				    next->arch.msr->misc_features_enables.cpuid_faulting));
-		return;
-	}
-
-	masks = (nextd && is_pv_domain(nextd) && nextd->arch.pv_domain.cpuidmasks)
+	const struct cpuidmasks *masks =
+		(nextd && is_pv_domain(nextd) && nextd->arch.pv_domain.cpuidmasks)
 		? nextd->arch.pv_domain.cpuidmasks : &cpuidmask_defaults;
 
         if (msr_basic) {
@@ -223,8 +166,10 @@ static void intel_ctxt_switch_levelling(const struct vcpu *next)
  */
 static void __init noinline intel_init_levelling(void)
 {
-	if (!probe_intel_cpuid_faulting())
-		probe_masking_msrs();
+	if (probe_cpuid_faulting())
+		return;
+
+	probe_masking_msrs();
 
 	if (msr_basic) {
 		uint32_t ecx, edx, tmp;
@@ -278,7 +223,7 @@ static void __init noinline intel_init_levelling(void)
 	}
 
 	if (levelling_caps)
-		ctxt_switch_levelling = intel_ctxt_switch_levelling;
+		ctxt_switch_masking = intel_ctxt_switch_masking;
 }
 
 static void early_init_intel(struct cpuinfo_x86 *c)
@@ -316,7 +261,7 @@ static void early_init_intel(struct cpuinfo_x86 *c)
 	if (c == &boot_cpu_data)
 		intel_init_levelling();
 
-	intel_ctxt_switch_levelling(NULL);
+	ctxt_switch_levelling(NULL);
 }
 
 /*
diff --git a/xen/include/asm-x86/cpuid.h b/xen/include/asm-x86/cpuid.h
index d2dd841e15..74d6f123e5 100644
--- a/xen/include/asm-x86/cpuid.h
+++ b/xen/include/asm-x86/cpuid.h
@@ -58,9 +58,6 @@ DECLARE_PER_CPU(struct cpuidmasks, cpuidmasks);
 /* Default masking MSR values, calculated at boot. */
 extern struct cpuidmasks cpuidmask_defaults;
 
-/* Whether or not cpuid faulting is available for the current domain. */
-DECLARE_PER_CPU(bool, cpuid_faulting_enabled);
-
 #define CPUID_GUEST_NR_BASIC      (0xdu + 1)
 #define CPUID_GUEST_NR_FEAT       (0u + 1)
 #define CPUID_GUEST_NR_CACHE      (5u + 1)
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 41a8d8c32f..c9601b2fb2 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -151,7 +151,9 @@ extern struct cpuinfo_x86 boot_cpu_data;
 extern struct cpuinfo_x86 cpu_data[];
 #define current_cpu_data cpu_data[smp_processor_id()]
 
-extern void (*ctxt_switch_levelling)(const struct vcpu *next);
+extern bool probe_cpuid_faulting(void);
+extern void ctxt_switch_levelling(const struct vcpu *next);
+extern void (*ctxt_switch_masking)(const struct vcpu *next);
 
 extern u64 host_pat;
 extern bool_t opt_cpu_info;
-- 
2.11.0



* [PATCH RFC v1 03/74] x86/upcall: inject a spurious event after setting upcall vector
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
  2018-01-04 13:05 ` [PATCH RFC v1 01/74] x86/svm: Offer CPUID Faulting to AMD HVM guests as well Wei Liu
  2018-01-04 13:05 ` [PATCH RFC v1 02/74] x86: Common cpuid faulting support Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-04 13:05 ` [PATCH RFC v1 04/74] tools/libxc: initialise hvm loader elf log fd to get more logging Wei Liu
                   ` (72 subsequent siblings)
  75 siblings, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

Inject a spurious event after setting the upcall vector, in case the vCPU
has pending events to inject. This fixes a bug that happened if the guest
mapped the vcpu info area using VCPUOP_register_vcpu_info without having
set up the event channel upcall, and then set up the upcall vector.

In this scenario the guest would not receive any upcalls, because the
call to VCPUOP_register_vcpu_info would have marked the vCPU as having
pending events, but the vector could not be injected because it was
not yet set up.

This has not caused issues so far because all the consumers first
set up the vector callback and then map the vcpu info page, but there's
no limitation that prevents doing it in the inverse order.
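The ordering problem can be reproduced with a small model. The model assumes,
as the commit message implies, that a new event only triggers delivery when it
sets the pending flag from 0 to 1. Under that assumption, an event left pending
before the vector exists is stranded. All names are hypothetical stand-ins for
VCPUOP_register_vcpu_info, HVMOP_set_evtchn_upcall_vector and
arch_evtchn_inject():

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified vCPU: only the state relevant to the ordering bug. */
struct model_vcpu {
    bool upcall_pending;   /* evtchn_upcall_pending in vcpu_info */
    unsigned int vector;   /* 0 == upcall vector not set up yet */
    unsigned int injected; /* number of upcalls actually delivered */
};

/* Stand-in for arch_evtchn_inject(): deliver only if a vector exists. */
static void model_evtchn_inject(struct model_vcpu *v)
{
    if (v->upcall_pending && v->vector != 0)
        v->injected++;
}

/* Mapping vcpu_info unconditionally marks events pending, then injects. */
static void model_register_vcpu_info(struct model_vcpu *v)
{
    v->upcall_pending = true;
    model_evtchn_inject(v);
}

/* New events only kick delivery on a 0 -> 1 transition of the flag. */
static void model_raise_event(struct model_vcpu *v)
{
    if (!v->upcall_pending) {
        v->upcall_pending = true;
        model_evtchn_inject(v);
    }
}

/* with_fix models the arch_evtchn_inject() call added by this patch. */
static void model_set_upcall_vector(struct model_vcpu *v, unsigned int vec,
                                    bool with_fix)
{
    v->vector = vec;
    if (with_fix)
        model_evtchn_inject(v);
}
```

The guest clearing the pending flag after handling an upcall is not modelled.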

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/hvm.c | 1 +
 xen/arch/x86/hvm/irq.c | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 28bc7e4252..3dfb3511d9 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4069,6 +4069,7 @@ static int hvmop_set_evtchn_upcall_vector(
     printk(XENLOG_G_INFO "%pv: upcall vector %02x\n", v, op.vector);
 
     v->arch.hvm_vcpu.evtchn_upcall_vector = op.vector;
+    arch_evtchn_inject(v);
     return 0;
 }
 
diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
index 0077f68a83..9427e30806 100644
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -385,6 +385,7 @@ void hvm_set_callback_via(struct domain *d, uint64_t via)
     struct hvm_irq *hvm_irq = hvm_domain_irq(d);
     unsigned int gsi=0, pdev=0, pintx=0;
     uint8_t via_type;
+    struct vcpu *v;
 
     via_type = (uint8_t)MASK_EXTR(via, HVM_PARAM_CALLBACK_IRQ_TYPE_MASK) + 1;
     if ( ((via_type == HVMIRQ_callback_gsi) && (via == 0)) ||
@@ -447,6 +448,9 @@ void hvm_set_callback_via(struct domain *d, uint64_t via)
 
     spin_unlock(&d->arch.hvm_domain.irq_lock);
 
+    for_each_vcpu(d, v)
+        arch_evtchn_inject(v);
+
 #ifndef NDEBUG
     printk(XENLOG_G_INFO "Dom%u callback via changed to ", d->domain_id);
     switch ( via_type )
-- 
2.11.0



* [PATCH RFC v1 04/74] tools/libxc: initialise hvm loader elf log fd to get more logging
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (2 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 03/74] x86/upcall: inject a spurious event after setting upcall vector Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-04 13:05 ` [PATCH RFC v1 05/74] tools/libxc: remove extraneous newline in xc_dom_load_acpi Wei Liu
                   ` (71 subsequent siblings)
  75 siblings, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC
  To: Xen-devel; +Cc: wei.liu2

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 tools/libxc/xc_dom_hvmloader.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/libxc/xc_dom_hvmloader.c b/tools/libxc/xc_dom_hvmloader.c
index 59f94e51e5..02c3eaef38 100644
--- a/tools/libxc/xc_dom_hvmloader.c
+++ b/tools/libxc/xc_dom_hvmloader.c
@@ -66,6 +66,7 @@ static elf_negerrnoval xc_dom_probe_hvm_kernel(struct xc_dom_image *dom)
         return rc;
 
     rc = elf_init(&elf, dom->kernel_blob, dom->kernel_size);
+    xc_elf_set_logfile(dom->xch, &elf, 1);
     if ( rc != 0 )
         return rc;
 
-- 
2.11.0



* [PATCH RFC v1 05/74] tools/libxc: remove extraneous newline in xc_dom_load_acpi
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (3 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 04/74] tools/libxc: initialise hvm loader elf log fd to get more logging Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-04 13:05 ` [PATCH RFC v1 06/74] tools/libelf: fix elf notes check for PVH guest Wei Liu
                   ` (70 subsequent siblings)
  75 siblings, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC
  To: Xen-devel; +Cc: wei.liu2

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 tools/libxc/xc_dom_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
index b5f316a1dc..303cb971e8 100644
--- a/tools/libxc/xc_dom_core.c
+++ b/tools/libxc/xc_dom_core.c
@@ -1078,7 +1078,7 @@ static int xc_dom_load_acpi(struct xc_dom_image *dom)
 
     while ( (i < MAX_ACPI_MODULES) && dom->acpi_modules[i].length )
     {
-        DOMPRINTF("%s: %d bytes at address %" PRIx64 "\n", __FUNCTION__,
+        DOMPRINTF("%s: %d bytes at address %" PRIx64, __FUNCTION__,
                   dom->acpi_modules[i].length,
                   dom->acpi_modules[i].guest_addr_out);
 
-- 
2.11.0



* [PATCH RFC v1 06/74] tools/libelf: fix elf notes check for PVH guest
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (4 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 05/74] tools/libxc: remove extraneous newline in xc_dom_load_acpi Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-04 14:37   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 07/74] tools/libxc: Multi modules support Wei Liu
                   ` (69 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

A PVH image only needs the PHYS32_ENTRY ELF note. If that note is set,
return early and skip the loader / guest_os string checks.
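The acceptance rule after this change can be sketched as a small standalone predicate. This is an illustration, not libelf code: `struct sketch_parms`, `sketch_note_check_ok()` and `SKETCH_UNSET_ADDR32` are hypothetical stand-ins for the relevant fields of libelf's parsed notes and for `UNSET_ADDR32`, which is assumed here to be all-ones in 32 bits. The `guest_os` "linux" condition is inferred from the error message ("Linux images"), because that line of the check falls outside the hunk shown.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed sentinel meaning "PHYS32_ENTRY note not present". */
#define SKETCH_UNSET_ADDR32 ((uint32_t)-1)

/* Hypothetical, trimmed-down stand-in for the parsed Xen ELF notes. */
struct sketch_parms {
    char loader[16];
    char guest_os[16];
    uint32_t phys_entry;
};

/* Returns true if an image with these notes would pass the check. */
static bool sketch_note_check_ok(const struct sketch_parms *p)
{
    /* PVH: the PHYS32_ENTRY note alone is sufficient. */
    if ( p->phys_entry != SKETCH_UNSET_ADDR32 )
        return true;

    /* Otherwise the image must target the generic loader ... */
    if ( strlen(p->loader) != 0 && strncmp(p->loader, "generic", 7) == 0 )
        return true;

    /* ... or be a Linux image (assumed guest_os check). */
    if ( strlen(p->guest_os) != 0 && strncmp(p->guest_os, "linux", 5) == 0 )
        return true;

    return false;
}
```

An image carrying only PHYS32_ENTRY, with empty loader and guest_os strings, is now accepted. Before this change it would have been rejected.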

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/common/libelf/libelf-dominfo.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c
index 829d5176a9..508f08db42 100644
--- a/xen/common/libelf/libelf-dominfo.c
+++ b/xen/common/libelf/libelf-dominfo.c
@@ -381,6 +381,13 @@ static elf_errorstatus elf_xen_note_check(struct elf_binary *elf,
          return 0;
     }
 
+    /* PVH only requires one ELF note to be set */
+    if ( parms->phys_entry != UNSET_ADDR32 )
+    {
+        elf_msg(elf, "ELF: Found PVH image\n");
+        return 0;
+    }
+
     /* Check the contents of the Xen notes or guest string. */
     if ( ((strlen(parms->loader) == 0) ||
           strncmp(parms->loader, "generic", 7)) &&
@@ -389,7 +396,7 @@ static elf_errorstatus elf_xen_note_check(struct elf_binary *elf,
     {
         elf_err(elf,
                 "ERROR: Will only load images built for the generic loader or Linux images"
-                " (Not '%.*s' and '%.*s')\n",
+                " (Not '%.*s' and '%.*s') or with PHYS32_ENTRY set\n",
                 (int)sizeof(parms->loader), parms->loader,
                 (int)sizeof(parms->guest_os), parms->guest_os);
         return -1;
-- 
2.11.0



* [PATCH RFC v1 07/74] tools/libxc: Multi modules support
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (5 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 06/74] tools/libelf: fix elf notes check for PVH guest Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-04 13:05 ` [PATCH RFC v1 08/74] libxl: Introduce hack to allow PVH mode to add a shim Wei Liu
                   ` (68 subsequent siblings)
  75 siblings, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Jonathan Ludlam <jonathan.ludlam@citrix.com>

Signed-off-by: Jonathan Ludlam <jonathan.ludlam@citrix.com>
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 stubdom/grub/kexec.c                 |   7 +-
 tools/helpers/init-xenstore-domain.c |   4 +-
 tools/libxc/include/xc_dom.h         |  48 ++++++-----
 tools/libxc/xc_dom_compat_linux.c    |   2 +-
 tools/libxc/xc_dom_core.c            | 152 +++++++++++++++++++++++------------
 tools/libxc/xc_dom_x86.c             |  65 ++++++++-------
 tools/libxl/libxl_dom.c              |  10 +--
 7 files changed, 175 insertions(+), 113 deletions(-)

diff --git a/stubdom/grub/kexec.c b/stubdom/grub/kexec.c
index 437a0a96e9..61ca082d42 100644
--- a/stubdom/grub/kexec.c
+++ b/stubdom/grub/kexec.c
@@ -202,7 +202,7 @@ static void tpm_hash2pcr(struct xc_dom_image *dom, char *cmdline)
 	ASSERT(rv == 0 && resp->status == 0);
 
 	cmd.pcr = bswap_32(5); // PCR #5 for initrd
-	sha1(dom->ramdisk_blob, dom->ramdisk_size, cmd.hash);
+	sha1(dom->modules[0].blob, dom->modules[0].size, cmd.hash);
 	rv = tpmfront_cmd(tpm, (void*)&cmd, sizeof(cmd), (void*)&resp, &resplen);
 	ASSERT(rv == 0 && resp->status == 0);
 
@@ -231,13 +231,12 @@ void kexec(void *kernel, long kernel_size, void *module, long module_size, char
 
     /* We are using guest owned memory, therefore no limits. */
     xc_dom_kernel_max_size(dom, 0);
-    xc_dom_ramdisk_max_size(dom, 0);
+    xc_dom_module_max_size(dom, 0);
 
     dom->kernel_blob = kernel;
     dom->kernel_size = kernel_size;
 
-    dom->ramdisk_blob = module;
-    dom->ramdisk_size = module_size;
+    xc_dom_module_mem(dom, module, module_size, NULL);
 
     dom->flags = flags;
     dom->console_evtchn = start_info.console.domU.evtchn;
diff --git a/tools/helpers/init-xenstore-domain.c b/tools/helpers/init-xenstore-domain.c
index 047ad0cb1d..8453be283b 100644
--- a/tools/helpers/init-xenstore-domain.c
+++ b/tools/helpers/init-xenstore-domain.c
@@ -145,10 +145,10 @@ static int build(xc_interface *xch)
 
     if ( ramdisk )
     {
-        rv = xc_dom_ramdisk_file(dom, ramdisk);
+        rv = xc_dom_module_file(dom, ramdisk, NULL);
         if ( rv )
         {
-            fprintf(stderr, "xc_dom_ramdisk_file failed\n");
+            fprintf(stderr, "xc_dom_module_file failed\n");
             goto err;
         }
     }
diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index cdcdd07d2b..08be8a8f3f 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -22,6 +22,7 @@
 #define INVALID_PFN ((xen_pfn_t)-1)
 #define X86_HVM_NR_SPECIAL_PAGES    8
 #define X86_HVM_END_SPECIAL_REGION  0xff000u
+#define XG_MAX_MODULES 2
 
 /* --- typedefs and structs ---------------------------------------- */
 
@@ -56,17 +57,32 @@ struct xc_dom_phys {
     xen_pfn_t count;
 };
 
+struct xc_dom_module {
+    void *blob;
+    size_t size;
+    void *cmdline;
+    /* If seg.vstart is non zero then the module will be loaded at that
+     * address, otherwise it will be automatically placed.
+     *
+     * If automatic placement is used and the module is gzip
+     * compressed then it will be decompressed as it is loaded. If the
+     * module has been explicitly placed then it is loaded as is, since
+     * decompressing it risks undoing the manual placement.
+     */
+    struct xc_dom_seg seg;
+};
+
 struct xc_dom_image {
     /* files */
     void *kernel_blob;
     size_t kernel_size;
-    void *ramdisk_blob;
-    size_t ramdisk_size;
+    unsigned int num_modules;
+    struct xc_dom_module modules[XG_MAX_MODULES];
     void *devicetree_blob;
     size_t devicetree_size;
 
     size_t max_kernel_size;
-    size_t max_ramdisk_size;
+    size_t max_module_size;
     size_t max_devicetree_size;
 
     /* arguments and parameters */
@@ -80,15 +96,6 @@ struct xc_dom_image {
 
     /* memory layout */
     struct xc_dom_seg kernel_seg;
-    /* If ramdisk_seg.vstart is non zero then the ramdisk will be
-     * loaded at that address, otherwise it will automatically placed.
-     *
-     * If automatic placement is used and the ramdisk is gzip
-     * compressed then it will be decompressed as it is loaded. If the
-     * ramdisk has been explicitly placed then it is loaded as is
-     * otherwise decompressing risks undoing the manual placement.
-     */
-    struct xc_dom_seg ramdisk_seg;
     struct xc_dom_seg p2m_seg;
     struct xc_dom_seg pgtables_seg;
     struct xc_dom_seg devicetree_seg;
@@ -277,12 +284,12 @@ void xc_dom_release(struct xc_dom_image *dom);
 int xc_dom_rambase_init(struct xc_dom_image *dom, uint64_t rambase);
 int xc_dom_mem_init(struct xc_dom_image *dom, unsigned int mem_mb);
 
-/* Set this larger if you have enormous ramdisks/kernels. Note that
+/* Set this larger if you have enormous modules/kernels. Note that
  * you should trust all kernels not to be maliciously large (e.g. to
  * exhaust all dom0 memory) if you do this (see CVE-2012-4544 /
  * XSA-25). You can also set the default independently for
- * ramdisks/kernels in xc_dom_allocate() or call
- * xc_dom_{kernel,ramdisk}_max_size.
+ * modules/kernels in xc_dom_allocate() or call
+ * xc_dom_{kernel,module}_max_size.
  */
 #ifndef XC_DOM_DECOMPRESS_MAX
 #define XC_DOM_DECOMPRESS_MAX (1024*1024*1024) /* 1GB */
@@ -291,8 +298,8 @@ int xc_dom_mem_init(struct xc_dom_image *dom, unsigned int mem_mb);
 int xc_dom_kernel_check_size(struct xc_dom_image *dom, size_t sz);
 int xc_dom_kernel_max_size(struct xc_dom_image *dom, size_t sz);
 
-int xc_dom_ramdisk_check_size(struct xc_dom_image *dom, size_t sz);
-int xc_dom_ramdisk_max_size(struct xc_dom_image *dom, size_t sz);
+int xc_dom_module_check_size(struct xc_dom_image *dom, size_t sz);
+int xc_dom_module_max_size(struct xc_dom_image *dom, size_t sz);
 
 int xc_dom_devicetree_max_size(struct xc_dom_image *dom, size_t sz);
 
@@ -303,11 +310,12 @@ int xc_dom_do_gunzip(xc_interface *xch,
 int xc_dom_try_gunzip(struct xc_dom_image *dom, void **blob, size_t * size);
 
 int xc_dom_kernel_file(struct xc_dom_image *dom, const char *filename);
-int xc_dom_ramdisk_file(struct xc_dom_image *dom, const char *filename);
+int xc_dom_module_file(struct xc_dom_image *dom, const char *filename,
+                      const char *cmdline);
 int xc_dom_kernel_mem(struct xc_dom_image *dom, const void *mem,
                       size_t memsize);
-int xc_dom_ramdisk_mem(struct xc_dom_image *dom, const void *mem,
-                       size_t memsize);
+int xc_dom_module_mem(struct xc_dom_image *dom, const void *mem,
+                      size_t memsize, const char *cmdline);
 int xc_dom_devicetree_file(struct xc_dom_image *dom, const char *filename);
 int xc_dom_devicetree_mem(struct xc_dom_image *dom, const void *mem,
                           size_t memsize);
diff --git a/tools/libxc/xc_dom_compat_linux.c b/tools/libxc/xc_dom_compat_linux.c
index c922c61e90..b3d43feed9 100644
--- a/tools/libxc/xc_dom_compat_linux.c
+++ b/tools/libxc/xc_dom_compat_linux.c
@@ -56,7 +56,7 @@ int xc_linux_build(xc_interface *xch, uint32_t domid,
     if ( (rc = xc_dom_kernel_file(dom, image_name)) != 0 )
         goto out;
     if ( initrd_name && strlen(initrd_name) &&
-         ((rc = xc_dom_ramdisk_file(dom, initrd_name)) != 0) )
+         ((rc = xc_dom_module_file(dom, initrd_name, NULL)) != 0) )
         goto out;
 
     dom->flags |= flags;
diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
index 303cb971e8..3e65aff22b 100644
--- a/tools/libxc/xc_dom_core.c
+++ b/tools/libxc/xc_dom_core.c
@@ -314,16 +314,16 @@ int xc_dom_kernel_check_size(struct xc_dom_image *dom, size_t sz)
     return 0;
 }
 
-int xc_dom_ramdisk_check_size(struct xc_dom_image *dom, size_t sz)
+int xc_dom_module_check_size(struct xc_dom_image *dom, size_t sz)
 {
     /* No limit */
-    if ( !dom->max_ramdisk_size )
+    if ( !dom->max_module_size )
         return 0;
 
-    if ( sz > dom->max_ramdisk_size )
+    if ( sz > dom->max_module_size )
     {
         xc_dom_panic(dom->xch, XC_INVALID_KERNEL,
-                     "ramdisk image too large");
+                     "module image too large");
         return 1;
     }
 
@@ -764,7 +764,7 @@ struct xc_dom_image *xc_dom_allocate(xc_interface *xch,
     dom->xch = xch;
 
     dom->max_kernel_size = XC_DOM_DECOMPRESS_MAX;
-    dom->max_ramdisk_size = XC_DOM_DECOMPRESS_MAX;
+    dom->max_module_size = XC_DOM_DECOMPRESS_MAX;
     dom->max_devicetree_size = XC_DOM_DECOMPRESS_MAX;
 
     if ( cmdline )
@@ -797,10 +797,10 @@ int xc_dom_kernel_max_size(struct xc_dom_image *dom, size_t sz)
     return 0;
 }
 
-int xc_dom_ramdisk_max_size(struct xc_dom_image *dom, size_t sz)
+int xc_dom_module_max_size(struct xc_dom_image *dom, size_t sz)
 {
-    DOMPRINTF("%s: ramdisk_max_size=%zx", __FUNCTION__, sz);
-    dom->max_ramdisk_size = sz;
+    DOMPRINTF("%s: module_max_size=%zx", __FUNCTION__, sz);
+    dom->max_module_size = sz;
     return 0;
 }
 
@@ -821,16 +821,30 @@ int xc_dom_kernel_file(struct xc_dom_image *dom, const char *filename)
     return xc_dom_try_gunzip(dom, &dom->kernel_blob, &dom->kernel_size);
 }
 
-int xc_dom_ramdisk_file(struct xc_dom_image *dom, const char *filename)
+int xc_dom_module_file(struct xc_dom_image *dom, const char *filename, const char *cmdline)
 {
+    unsigned int mod = dom->num_modules++;
+
     DOMPRINTF("%s: filename=\"%s\"", __FUNCTION__, filename);
-    dom->ramdisk_blob =
-        xc_dom_malloc_filemap(dom, filename, &dom->ramdisk_size,
-                              dom->max_ramdisk_size);
+    dom->modules[mod].blob =
+        xc_dom_malloc_filemap(dom, filename, &dom->modules[mod].size,
+                              dom->max_module_size);
 
-    if ( dom->ramdisk_blob == NULL )
+    if ( dom->modules[mod].blob == NULL )
         return -1;
-//    return xc_dom_try_gunzip(dom, &dom->ramdisk_blob, &dom->ramdisk_size);
+
+    if ( cmdline )
+    {
+        dom->modules[mod].cmdline = xc_dom_strdup(dom, cmdline);
+
+        if ( dom->modules[mod].cmdline == NULL )
+            return -1;
+    }
+    else
+    {
+        dom->modules[mod].cmdline = NULL;
+    }
+
     return 0;
 }
 
@@ -859,13 +873,28 @@ int xc_dom_kernel_mem(struct xc_dom_image *dom, const void *mem, size_t memsize)
     return xc_dom_try_gunzip(dom, &dom->kernel_blob, &dom->kernel_size);
 }
 
-int xc_dom_ramdisk_mem(struct xc_dom_image *dom, const void *mem,
-                       size_t memsize)
+int xc_dom_module_mem(struct xc_dom_image *dom, const void *mem,
+                      size_t memsize, const char *cmdline)
 {
+    unsigned int mod = dom->num_modules++;
+
     DOMPRINTF_CALLED(dom->xch);
-    dom->ramdisk_blob = (void *)mem;
-    dom->ramdisk_size = memsize;
-//    return xc_dom_try_gunzip(dom, &dom->ramdisk_blob, &dom->ramdisk_size);
+
+    dom->modules[mod].blob = (void *)mem;
+    dom->modules[mod].size = memsize;
+
+    if ( cmdline )
+    {
+        dom->modules[mod].cmdline = xc_dom_strdup(dom, cmdline);
+
+        if ( dom->modules[mod].cmdline == NULL )
+            return -1;
+    }
+    else
+    {
+        dom->modules[mod].cmdline = NULL;
+    }
+
     return 0;
 }
 
@@ -990,41 +1019,42 @@ int xc_dom_update_guest_p2m(struct xc_dom_image *dom)
     return 0;
 }
 
-static int xc_dom_build_ramdisk(struct xc_dom_image *dom)
+static int xc_dom_build_module(struct xc_dom_image *dom, unsigned int mod)
 {
-    size_t unziplen, ramdisklen;
-    void *ramdiskmap;
+    size_t unziplen, modulelen;
+    void *modulemap;
+    char name[10];
 
-    if ( !dom->ramdisk_seg.vstart )
+    if ( !dom->modules[mod].seg.vstart )
     {
         unziplen = xc_dom_check_gzip(dom->xch,
-                                     dom->ramdisk_blob, dom->ramdisk_size);
-        if ( xc_dom_ramdisk_check_size(dom, unziplen) != 0 )
+                                     dom->modules[mod].blob, dom->modules[mod].size);
+        if ( xc_dom_module_check_size(dom, unziplen) != 0 )
             unziplen = 0;
     }
     else
         unziplen = 0;
 
-    ramdisklen = unziplen ? unziplen : dom->ramdisk_size;
-
-    if ( xc_dom_alloc_segment(dom, &dom->ramdisk_seg, "ramdisk",
-                              dom->ramdisk_seg.vstart, ramdisklen) != 0 )
+    modulelen = unziplen ? unziplen : dom->modules[mod].size;
+    snprintf(name, sizeof(name), "module%u", mod);
+    if ( xc_dom_alloc_segment(dom, &dom->modules[mod].seg, name,
+                              dom->modules[mod].seg.vstart, modulelen) != 0 )
         goto err;
-    ramdiskmap = xc_dom_seg_to_ptr(dom, &dom->ramdisk_seg);
-    if ( ramdiskmap == NULL )
+    modulemap = xc_dom_seg_to_ptr(dom, &dom->modules[mod].seg);
+    if ( modulemap == NULL )
     {
-        DOMPRINTF("%s: xc_dom_seg_to_ptr(dom, &dom->ramdisk_seg) => NULL",
-                  __FUNCTION__);
+        DOMPRINTF("%s: xc_dom_seg_to_ptr(dom, &dom->modules[%u].seg) => NULL",
+                  __FUNCTION__, mod);
         goto err;
     }
     if ( unziplen )
     {
-        if ( xc_dom_do_gunzip(dom->xch, dom->ramdisk_blob, dom->ramdisk_size,
-                              ramdiskmap, ramdisklen) == -1 )
+        if ( xc_dom_do_gunzip(dom->xch, dom->modules[mod].blob, dom->modules[mod].size,
+                              modulemap, modulelen) == -1 )
             goto err;
     }
     else
-        memcpy(ramdiskmap, dom->ramdisk_blob, dom->ramdisk_size);
+        memcpy(modulemap, dom->modules[mod].blob, dom->modules[mod].size);
 
     return 0;
 
@@ -1131,6 +1161,7 @@ int xc_dom_build_image(struct xc_dom_image *dom)
 {
     unsigned int page_size;
     bool unmapped_initrd;
+    unsigned int mod;
 
     DOMPRINTF_CALLED(dom->xch);
 
@@ -1154,15 +1185,24 @@ int xc_dom_build_image(struct xc_dom_image *dom)
     if ( dom->kernel_loader->loader(dom) != 0 )
         goto err;
 
-    /* Don't load ramdisk now if no initial mapping required. */
-    unmapped_initrd = dom->parms.unmapped_initrd && !dom->ramdisk_seg.vstart;
-
-    if ( dom->ramdisk_blob && !unmapped_initrd )
+    /* Don't load ramdisk / other modules now if no initial mapping required. */
+    for ( mod = 0; mod < dom->num_modules; mod++ )
     {
-        if ( xc_dom_build_ramdisk(dom) != 0 )
-            goto err;
-        dom->initrd_start = dom->ramdisk_seg.vstart;
-        dom->initrd_len = dom->ramdisk_seg.vend - dom->ramdisk_seg.vstart;
+        unmapped_initrd = (dom->parms.unmapped_initrd &&
+                           !dom->modules[mod].seg.vstart);
+
+        if ( dom->modules[mod].blob && !unmapped_initrd )
+        {
+            if ( xc_dom_build_module(dom, mod) != 0 )
+                goto err;
+
+            if ( mod == 0 )
+            {
+                dom->initrd_start = dom->modules[mod].seg.vstart;
+                dom->initrd_len =
+                    dom->modules[mod].seg.vend - dom->modules[mod].seg.vstart;
+            }
+        }
     }
 
     /* load devicetree */
@@ -1216,14 +1256,24 @@ int xc_dom_build_image(struct xc_dom_image *dom)
     if ( dom->virt_pgtab_end && xc_dom_alloc_pad(dom, dom->virt_pgtab_end) )
         return -1;
 
-    /* Load ramdisk if no initial mapping required. */
-    if ( dom->ramdisk_blob && unmapped_initrd )
+    for ( mod = 0; mod < dom->num_modules; mod++ )
     {
-        if ( xc_dom_build_ramdisk(dom) != 0 )
-            goto err;
-        dom->flags |= SIF_MOD_START_PFN;
-        dom->initrd_start = dom->ramdisk_seg.pfn;
-        dom->initrd_len = page_size * dom->ramdisk_seg.pages;
+        unmapped_initrd = (dom->parms.unmapped_initrd &&
+                           !dom->modules[mod].seg.vstart);
+
+        /* Load ramdisk / other modules if no initial mapping required. */
+        if ( dom->modules[mod].blob && unmapped_initrd )
+        {
+            if ( xc_dom_build_module(dom, mod) != 0 )
+                goto err;
+
+            if ( mod == 0 )
+            {
+                dom->flags |= SIF_MOD_START_PFN;
+                dom->initrd_start = dom->modules[mod].seg.pfn;
+                dom->initrd_len = page_size * dom->modules[mod].seg.pages;
+            }
+        }
     }
 
     /* Allocate p2m list if outside of initial kernel mapping. */
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index bff68a011f..0b65dab4bc 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -70,8 +70,8 @@
 #define round_up(addr, mask)     ((addr) | (mask))
 #define round_pg_up(addr)  (((addr) + PAGE_SIZE_X86 - 1) & ~(PAGE_SIZE_X86 - 1))
 
-#define HVMLOADER_MODULE_MAX_COUNT 1
-#define HVMLOADER_MODULE_NAME_SIZE 10
+#define HVMLOADER_MODULE_MAX_COUNT 2
+#define HVMLOADER_MODULE_CMDLINE_SIZE MAX_GUEST_CMDLINE
 
 struct xc_dom_params {
     unsigned levels;
@@ -627,6 +627,12 @@ static int alloc_magic_pages_hvm(struct xc_dom_image *dom)
     xc_hvm_param_set(xch, domid, HVM_PARAM_SHARING_RING_PFN,
                      special_pfn(SPECIALPAGE_SHARING));
 
+    start_info_size +=
+        sizeof(struct hvm_modlist_entry) * HVMLOADER_MODULE_MAX_COUNT;
+
+    start_info_size +=
+        HVMLOADER_MODULE_CMDLINE_SIZE * HVMLOADER_MODULE_MAX_COUNT;
+
     if ( !dom->device_model )
     {
         if ( dom->cmdline )
@@ -634,22 +640,9 @@ static int alloc_magic_pages_hvm(struct xc_dom_image *dom)
             dom->cmdline_size = ROUNDUP(strlen(dom->cmdline) + 1, 8);
             start_info_size += dom->cmdline_size;
         }
-
-        /* Limited to one module. */
-        if ( dom->ramdisk_blob )
-            start_info_size += sizeof(struct hvm_modlist_entry);
     }
     else
     {
-        start_info_size +=
-            sizeof(struct hvm_modlist_entry) * HVMLOADER_MODULE_MAX_COUNT;
-        /*
-         * Add extra space to write modules name.
-         * The HVMLOADER_MODULE_NAME_SIZE accounts for NUL byte.
-         */
-        start_info_size +=
-            HVMLOADER_MODULE_NAME_SIZE * HVMLOADER_MODULE_MAX_COUNT;
-
         /*
          * Allocate and clear additional ioreq server pages. The default
          * server will use the IOREQ and BUFIOREQ special pages above.
@@ -749,7 +742,7 @@ static int start_info_x86_32(struct xc_dom_image *dom)
     start_info->console.domU.mfn = xc_dom_p2m(dom, dom->console_pfn);
     start_info->console.domU.evtchn = dom->console_evtchn;
 
-    if ( dom->ramdisk_blob )
+    if ( dom->modules[0].blob )
     {
         start_info->mod_start = dom->initrd_start;
         start_info->mod_len = dom->initrd_len;
@@ -800,7 +793,7 @@ static int start_info_x86_64(struct xc_dom_image *dom)
     start_info->console.domU.mfn = xc_dom_p2m(dom, dom->console_pfn);
     start_info->console.domU.evtchn = dom->console_evtchn;
 
-    if ( dom->ramdisk_blob )
+    if ( dom->modules[0].blob )
     {
         start_info->mod_start = dom->initrd_start;
         start_info->mod_len = dom->initrd_len;
@@ -1237,7 +1230,7 @@ static int meminit_hvm(struct xc_dom_image *dom)
     unsigned long target_pages = dom->target_pages;
     unsigned long cur_pages, cur_pfn;
     int rc;
-    unsigned long stat_normal_pages = 0, stat_2mb_pages = 0, 
+    unsigned long stat_normal_pages = 0, stat_2mb_pages = 0,
         stat_1gb_pages = 0;
     unsigned int memflags = 0;
     int claim_enabled = dom->claim_enabled;
@@ -1303,6 +1296,8 @@ static int meminit_hvm(struct xc_dom_image *dom)
     p2m_size = 0;
     for ( i = 0; i < nr_vmemranges; i++ )
     {
+        DOMPRINTF("range: start=0x%"PRIx64" end=0x%"PRIx64, vmemranges[i].start, vmemranges[i].end);
+
         total_pages += ((vmemranges[i].end - vmemranges[i].start)
                         >> PAGE_SHIFT);
         p2m_size = p2m_size > (vmemranges[i].end >> PAGE_SHIFT) ?
@@ -1633,7 +1628,7 @@ static int alloc_pgtables_hvm(struct xc_dom_image *dom)
  */
 static void add_module_to_list(struct xc_dom_image *dom,
                                struct xc_hvm_firmware_module *module,
-                               const char *name,
+                               const char *cmdline,
                                struct hvm_modlist_entry *modlist,
                                struct hvm_start_info *start_info)
 {
@@ -1648,16 +1643,20 @@ static void add_module_to_list(struct xc_dom_image *dom,
         return;
 
     assert(start_info->nr_modules < HVMLOADER_MODULE_MAX_COUNT);
-    assert(strnlen(name, HVMLOADER_MODULE_NAME_SIZE)
-           < HVMLOADER_MODULE_NAME_SIZE);
 
     modlist[index].paddr = module->guest_addr_out;
     modlist[index].size = module->length;
 
-    strncpy(modules_cmdline_start + HVMLOADER_MODULE_NAME_SIZE * index,
-            name, HVMLOADER_MODULE_NAME_SIZE);
+    if ( cmdline )
+    {
+        assert(strnlen(cmdline, HVMLOADER_MODULE_CMDLINE_SIZE)
+               < HVMLOADER_MODULE_CMDLINE_SIZE);
+        strncpy(modules_cmdline_start + HVMLOADER_MODULE_CMDLINE_SIZE * index,
+                cmdline, HVMLOADER_MODULE_CMDLINE_SIZE);
+    }
+
     modlist[index].cmdline_paddr =
-        modules_cmdline_paddr + HVMLOADER_MODULE_NAME_SIZE * index;
+        modules_cmdline_paddr + HVMLOADER_MODULE_CMDLINE_SIZE * index;
 
     start_info->nr_modules++;
 }
@@ -1669,10 +1668,10 @@ static int bootlate_hvm(struct xc_dom_image *dom)
     struct hvm_start_info *start_info;
     size_t start_info_size;
     struct hvm_modlist_entry *modlist;
+    unsigned int i;
 
     start_info_size = sizeof(*start_info) + dom->cmdline_size;
-    if ( dom->ramdisk_blob )
-        start_info_size += sizeof(struct hvm_modlist_entry);
+    start_info_size += sizeof(struct hvm_modlist_entry) * dom->num_modules;
 
     if ( start_info_size >
          dom->start_info_seg.pages << XC_DOM_PAGE_SHIFT(dom) )
@@ -1703,12 +1702,18 @@ static int bootlate_hvm(struct xc_dom_image *dom)
                                 ((uintptr_t)cmdline - (uintptr_t)start_info);
         }
 
-        if ( dom->ramdisk_blob )
+        for ( i = 0; i < dom->num_modules; i++ )
         {
+            struct xc_hvm_firmware_module mod;
+
+            DOMPRINTF("Adding module %u", i);
+            mod.guest_addr_out =
+                dom->modules[i].seg.vstart - dom->parms.virt_base;
+            mod.length =
+                dom->modules[i].seg.vend - dom->modules[i].seg.vstart;
 
-            modlist[0].paddr = dom->ramdisk_seg.vstart - dom->parms.virt_base;
-            modlist[0].size = dom->ramdisk_seg.vend - dom->ramdisk_seg.vstart;
-            start_info->nr_modules = 1;
+            add_module_to_list(dom, &mod, dom->modules[i].cmdline,
+                               modlist, start_info);
         }
 
         /* ACPI module 0 is the RSDP */
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index ef834e652d..fbbdb9ec2f 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -796,12 +796,12 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid,
 
     if ( state->pv_ramdisk.path && strlen(state->pv_ramdisk.path) ) {
         if (state->pv_ramdisk.mapped) {
-            if ( (ret = xc_dom_ramdisk_mem(dom, state->pv_ramdisk.data, state->pv_ramdisk.size)) != 0 ) {
+            if ( (ret = xc_dom_module_mem(dom, state->pv_ramdisk.data, state->pv_ramdisk.size, NULL)) != 0 ) {
                 LOGE(ERROR, "xc_dom_ramdisk_mem failed");
                 goto out;
             }
         } else {
-            if ( (ret = xc_dom_ramdisk_file(dom, state->pv_ramdisk.path)) != 0 ) {
+            if ( (ret = xc_dom_module_file(dom, state->pv_ramdisk.path, NULL)) != 0 ) {
                 LOGE(ERROR, "xc_dom_ramdisk_file failed");
                 goto out;
             }
@@ -1043,14 +1043,14 @@ static int libxl__domain_firmware(libxl__gc *gc,
 
         if (state->pv_ramdisk.path && strlen(state->pv_ramdisk.path)) {
             if (state->pv_ramdisk.mapped) {
-                rc = xc_dom_ramdisk_mem(dom, state->pv_ramdisk.data,
-                                        state->pv_ramdisk.size);
+                rc = xc_dom_module_mem(dom, state->pv_ramdisk.data,
+                                       state->pv_ramdisk.size, NULL);
                 if (rc) {
                     LOGE(ERROR, "xc_dom_ramdisk_mem failed");
                     goto out;
                 }
             } else {
-                rc = xc_dom_ramdisk_file(dom, state->pv_ramdisk.path);
+                rc = xc_dom_module_file(dom, state->pv_ramdisk.path, NULL);
                 if (rc) {
                     LOGE(ERROR, "xc_dom_ramdisk_file failed");
                     goto out;
-- 
2.11.0



* [PATCH RFC v1 08/74] libxl: Introduce hack to allow PVH mode to add a shim
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (6 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 07/74] tools/libxc: Multi modules support Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-04 13:05 ` [PATCH RFC v1 09/74] xen/common: Widen the guest logging buffer slightly Wei Liu
                   ` (67 subsequent siblings)
  75 siblings, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: George Dunlap <george.dunlap@citrix.com>

libxl looks for the LIBXL_PVSHIM_PATH and LIBXL_PVSHIM_CMDLINE
environment variables.  If LIBXL_PVSHIM_PATH is set, the guest boots
with the shim as the "kernel" and the existing kernel and ramdisk as
extra modules.  LIBXL_PVSHIM_CMDLINE supplies the shim's command line,
and the guest's own command line is attached to the kernel module.

If LIBXL_PVSHIM_PATH is not set, the kernel and ramdisk boot directly:
the kernel as the "kernel" and the ramdisk as a module.
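The resulting boot layout can be sketched as a pure function. This is a hypothetical illustration only: `struct sketch_boot_layout` and `sketch_pvh_layout()` are not libxl code. In the patch itself the logic is split between `libxl__build_hvm()`, which reads the environment and chooses the `xc_dom_allocate()` command line, and `libxl__domain_firmware()`, which issues the `xc_dom_kernel_*` / `xc_dom_module_*` calls. In this sketch the caller is assumed to pass in the values of `getenv("LIBXL_PVSHIM_PATH")` and `getenv("LIBXL_PVSHIM_CMDLINE")`, so the function stays testable.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of what the hack hands to libxc for a PVH guest. */
struct sketch_boot_layout {
    const char *kernel;             /* image loaded via xc_dom_kernel_* */
    const char *kernel_cmdline;     /* cmdline given to xc_dom_allocate() */
    const char *modules[2];         /* images loaded via xc_dom_module_* */
    const char *module_cmdlines[2]; /* per-module cmdline, or NULL */
    unsigned int nr_modules;
};

/*
 * shim_path / shim_cmdline: values of LIBXL_PVSHIM_PATH and
 * LIBXL_PVSHIM_CMDLINE (NULL if unset). pv_*: from the guest config.
 */
static struct sketch_boot_layout
sketch_pvh_layout(const char *shim_path, const char *shim_cmdline,
                  const char *pv_kernel, const char *pv_ramdisk,
                  const char *pv_cmdline)
{
    struct sketch_boot_layout l = { 0 };

    if ( shim_path )
    {
        /* The shim is the "kernel"; the PV kernel becomes module 0. */
        l.kernel = shim_path;
        l.kernel_cmdline = shim_cmdline;
        l.modules[l.nr_modules] = pv_kernel;
        l.module_cmdlines[l.nr_modules++] = pv_cmdline;
    }
    else
    {
        /* No shim: boot the PV kernel directly. */
        l.kernel = pv_kernel;
        l.kernel_cmdline = pv_cmdline;
    }

    /* The ramdisk, if configured, is always appended as the next module. */
    if ( pv_ramdisk && strlen(pv_ramdisk) )
    {
        l.modules[l.nr_modules] = pv_ramdisk;
        l.module_cmdlines[l.nr_modules++] = NULL;
    }

    return l;
}
```

With the shim in use, the ramdisk shifts from module 0 to module 1. In the libxc multi-module support from the previous patch, module 0 is the one reported as the initrd.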

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
To be replaced with proper toolstack side patches
---
 tools/libxl/libxl_dom.c      | 67 ++++++++++++++++++++++++++++++++++++--------
 tools/libxl/libxl_internal.h |  2 ++
 2 files changed, 58 insertions(+), 11 deletions(-)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index fbbdb9ec2f..f04eec7c79 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1025,22 +1025,51 @@ static int libxl__domain_firmware(libxl__gc *gc,
 
     if (state->pv_kernel.path != NULL &&
         info->type == LIBXL_DOMAIN_TYPE_PVH) {
-        /* Try to load a kernel instead of the firmware. */
-        if (state->pv_kernel.mapped) {
-            rc = xc_dom_kernel_mem(dom, state->pv_kernel.data,
-                                   state->pv_kernel.size);
+
+        if (state->shim_path) {
+            rc = xc_dom_kernel_file(dom, state->shim_path);
             if (rc) {
-                LOGE(ERROR, "xc_dom_kernel_mem failed");
+                LOGE(ERROR, "xc_dom_kernel_file failed");
                 goto out;
             }
+
+            /* We've loaded the shim, so load the kernel as a secondary module */
+            if (state->pv_kernel.mapped) {
+                LOG(WARN, "xc_dom_module_mem, cmdline %s",
+                    state->pv_cmdline);
+                rc = xc_dom_module_mem(dom, state->pv_kernel.data,
+                                       state->pv_kernel.size, state->pv_cmdline);
+                if (rc) {
+                    LOGE(ERROR, "xc_dom_module_mem failed");
+                    goto out;
+                }
+            } else {
+                LOG(WARN, "xc_dom_module_file, path %s cmdline %s",
+                    state->pv_kernel.path, state->pv_cmdline);
+                rc = xc_dom_module_file(dom, state->pv_kernel.path, state->pv_cmdline);
+                if (rc) {
+                    LOGE(ERROR, "xc_dom_module_file failed");
+                    goto out;
+                }
+            }
         } else {
-            rc = xc_dom_kernel_file(dom, state->pv_kernel.path);
-            if (rc) {
-                LOGE(ERROR, "xc_dom_kernel_file failed");
-                goto out;
+            /* No shim, so load the kernel directly */
+            if (state->pv_kernel.mapped) {
+                rc = xc_dom_kernel_mem(dom, state->pv_kernel.data,
+                                       state->pv_kernel.size);
+                if (rc) {
+                    LOGE(ERROR, "xc_dom_kernel_mem failed");
+                    goto out;
+                }
+            } else {
+                rc = xc_dom_kernel_file(dom, state->pv_kernel.path);
+                if (rc) {
+                    LOGE(ERROR, "xc_dom_kernel_file failed");
+                    goto out;
+                }
             }
         }
-
+        
         if (state->pv_ramdisk.path && strlen(state->pv_ramdisk.path)) {
             if (state->pv_ramdisk.mapped) {
                 rc = xc_dom_module_mem(dom, state->pv_ramdisk.data,
@@ -1154,8 +1183,24 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 
     xc_dom_loginit(ctx->xch);
 
+    /* FIXME */
+#define LIBXL_PVSHIM_PATH "LIBXL_PVSHIM_PATH"
+#define LIBXL_PVSHIM_CMDLINE "LIBXL_PVSHIM_CMDLINE"
+    state->shim_path = getenv(LIBXL_PVSHIM_PATH);
+    if (state->shim_path) {
+        state->shim_cmdline = getenv(LIBXL_PVSHIM_CMDLINE);
+        LOG(WARN, "LIBXL_PVSHIM_PATH detected, using pv shim %s cmd %s",
+            state->shim_path, state->shim_cmdline);
+    }
+
+    /*
+     * If PVH and we have a shim override, use the shim cmdline.
+     * If PVH and no shim override, use the pv cmdline.
+     * If not PVH, use info->cmdline.
+     */
     dom = xc_dom_allocate(ctx->xch, info->type == LIBXL_DOMAIN_TYPE_PVH ?
-                          state->pv_cmdline : info->cmdline, NULL);
+                          (state->shim_path ? state->shim_cmdline : state->pv_cmdline) :
+                          info->cmdline, NULL);
     if (!dom) {
         LOGE(ERROR, "xc_dom_allocate failed");
         rc = ERROR_NOMEM;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index bfa95d8619..ef1b2e2ca1 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1136,6 +1136,8 @@ typedef struct {
 
     libxl__file_reference pv_kernel;
     libxl__file_reference pv_ramdisk;
+    const char * shim_path;
+    const char * shim_cmdline;
     const char * pv_cmdline;
 
     xen_vmemrange_t *vmemranges;
-- 
2.11.0
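
The command-line and kernel-placement rules this hack introduces come down to two small decisions. The sketch below is a standalone C model of them, not libxl code. The function and enum names are hypothetical, and the model assumes, based on the comments in the hunk, that the outer branch is taken when a shim path was supplied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the two loading strategies in the hunk. */
enum kernel_placement {
    KERNEL_AS_PRIMARY,   /* no shim: guest kernel loaded directly */
    KERNEL_AS_MODULE,    /* shim is the boot image, guest kernel is a module */
};

/* Models the comment above xc_dom_allocate(): choose the command line
 * handed to the first image that boots. */
static const char *select_cmdline(bool is_pvh, const char *shim_path,
                                  const char *shim_cmdline,
                                  const char *pv_cmdline,
                                  const char *info_cmdline)
{
    if (!is_pvh)
        return info_cmdline;                 /* not PVH: info->cmdline */
    return shim_path ? shim_cmdline          /* PVH with shim override */
                     : pv_cmdline;           /* PVH without shim */
}

/* With a shim present, the guest kernel becomes a secondary module. */
static enum kernel_placement select_placement(const char *shim_path)
{
    return shim_path ? KERNEL_AS_MODULE : KERNEL_AS_PRIMARY;
}
```

When LIBXL_PVSHIM_PATH is set but LIBXL_PVSHIM_CMDLINE is not, getenv() returns NULL. In this model, and as far as the hunk shows in libxl too, the shim is then handed a NULL command line.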


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

* [PATCH RFC v1 09/74] xen/common: Widen the guest logging buffer slightly
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (7 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 08/74] libxl: Introduce hack to allow PVH mode to add a shim Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-04 13:05 ` [PATCH RFC v1 10/74] x86/time: Print a more helpful error when a platform timer can't be found Wei Liu
                   ` (66 subsequent siblings)
  75 siblings, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Andrew Cooper <andrew.cooper3@citrix.com>

This reduces the amount of line wrapping from guests; Xen in particular likes
to print lines longer than 80 characters.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/include/xen/sched.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 002ba29d6d..64abc1df6c 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -427,7 +427,7 @@ struct domain
     xen_domain_handle_t handle;
 
     /* hvm_print_line() and guest_console_write() logging. */
-#define DOMAIN_PBUF_SIZE 80
+#define DOMAIN_PBUF_SIZE 200
     char       *pbuf;
     unsigned    pbuf_idx;
     spinlock_t  pbuf_lock;
-- 
2.11.0
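
A simplified model of per-domain line buffering shows why a wider buffer means less wrapping. The model is an assumption about the general shape of the hvm_print_line()/guest_console_write() logic, not a copy of it. Characters accumulate in a buffer of DOMAIN_PBUF_SIZE bytes, one byte is reserved for the terminating NUL, and a console line is emitted on '\n' or when the buffer fills.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: count how many console lines a guest string
 * produces for a given buffer size. */
static unsigned int count_emitted_lines(const char *s, size_t bufsize)
{
    size_t idx = 0;            /* characters currently buffered */
    unsigned int lines = 0;

    assert(bufsize >= 2);      /* room for one char plus the NUL */
    for (; *s; s++) {
        if (*s != '\n')
            idx++;             /* character stored in the buffer */
        if (idx == bufsize - 1 || *s == '\n') {
            lines++;           /* buffer flushed as one console line */
            idx = 0;
        }
    }
    return lines;
}
```

A 150-character guest line is split in two with an 80-byte buffer but emitted whole with a 200-byte one.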


* [PATCH RFC v1 10/74] x86/time: Print a more helpful error when a platform timer can't be found
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (8 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 09/74] xen/common: Widen the guest logging buffer slightly Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-05 10:37   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 11/74] x86/link: Introduce and use SECTION_ALIGN Wei Liu
                   ` (65 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Andrew Cooper <andrew.cooper3@citrix.com>

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/time.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 2a879502a2..3b654d7b7d 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -708,7 +708,8 @@ static u64 __init init_platform_timer(void)
         }
     }
 
-    BUG_ON(rc <= 0);
+    if ( rc <= 0 )
+        panic("Unable to find usable platform timer");
 
     printk("Platform timer is %s %s\n",
            freq_string(pts->frequency), pts->name);
-- 
2.11.0


* [PATCH RFC v1 11/74] x86/link: Introduce and use SECTION_ALIGN
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (9 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 10/74] x86/time: Print a more helpful error when a platform timer can't be found Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-05 10:38   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 12/74] xen/acpi: mark the PM timer FADT field as optional Wei Liu
                   ` (64 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Andrew Cooper <andrew.cooper3@citrix.com>

... to reduce the quantity of #ifdef EFI.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
---
 xen/arch/x86/xen.lds.S | 50 +++++++++++++-------------------------------------
 1 file changed, 13 insertions(+), 37 deletions(-)

diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index d5e8821d41..6164ad094f 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -12,12 +12,14 @@
 #define FORMAT "pei-x86-64"
 #undef __XEN_VIRT_START
 #define __XEN_VIRT_START __image_base__
+#define SECTION_ALIGN MB(2)
 
 ENTRY(efi_start)
 
 #else /* !EFI */
 
 #define FORMAT "elf64-x86-64"
+#define SECTION_ALIGN PAGE_SIZE
 
 ENTRY(start)
 
@@ -67,11 +69,7 @@ SECTIONS
        _etext = .;             /* End of text section */
   } :text = 0x9090
 
-#ifdef EFI
-  . = ALIGN(MB(2));
-#else
-  . = ALIGN(PAGE_SIZE);
-#endif
+  . = ALIGN(SECTION_ALIGN);
   __2M_text_end = .;
 
   __2M_rodata_start = .;       /* Start of 2M superpages, mapped RO. */
@@ -149,11 +147,7 @@ SECTIONS
 #endif
   _erodata = .;
 
-#ifdef EFI
-  . = ALIGN(MB(2));
-#else
-  . = ALIGN(PAGE_SIZE);
-#endif
+  . = ALIGN(SECTION_ALIGN);
   __2M_rodata_end = .;
 
   __2M_init_start = .;         /* Start of 2M superpages, mapped RWX (boot only). */
@@ -215,11 +209,7 @@ SECTIONS
        __ctors_end = .;
   } :text
 
-#ifdef EFI
-  . = ALIGN(MB(2));
-#else
-  . = ALIGN(PAGE_SIZE);
-#endif
+  . = ALIGN(SECTION_ALIGN);
   __init_end = .;
   __2M_init_end = .;
 
@@ -257,11 +247,7 @@ SECTIONS
   } :text
   _end = . ;
 
-#ifdef EFI
-  . = ALIGN(MB(2));
-#else
-  . = ALIGN(PAGE_SIZE);
-#endif
+  . = ALIGN(SECTION_ALIGN);
   __2M_rwdata_end = .;
 
 #ifdef EFI
@@ -310,23 +296,13 @@ ASSERT(__image_base__ > XEN_VIRT_START ||
 ASSERT(kexec_reloc_size - kexec_reloc <= PAGE_SIZE, "kexec_reloc is too large")
 #endif
 
-#ifdef EFI
-ASSERT(IS_ALIGNED(__2M_text_end,     MB(2)), "__2M_text_end misaligned")
-ASSERT(IS_ALIGNED(__2M_rodata_start, MB(2)), "__2M_rodata_start misaligned")
-ASSERT(IS_ALIGNED(__2M_rodata_end,   MB(2)), "__2M_rodata_end misaligned")
-ASSERT(IS_ALIGNED(__2M_init_start,   MB(2)), "__2M_init_start misaligned")
-ASSERT(IS_ALIGNED(__2M_init_end,     MB(2)), "__2M_init_end misaligned")
-ASSERT(IS_ALIGNED(__2M_rwdata_start, MB(2)), "__2M_rwdata_start misaligned")
-ASSERT(IS_ALIGNED(__2M_rwdata_end,   MB(2)), "__2M_rwdata_end misaligned")
-#else
-ASSERT(IS_ALIGNED(__2M_text_end,     PAGE_SIZE), "__2M_text_end misaligned")
-ASSERT(IS_ALIGNED(__2M_rodata_start, PAGE_SIZE), "__2M_rodata_start misaligned")
-ASSERT(IS_ALIGNED(__2M_rodata_end,   PAGE_SIZE), "__2M_rodata_end misaligned")
-ASSERT(IS_ALIGNED(__2M_init_start,   PAGE_SIZE), "__2M_init_start misaligned")
-ASSERT(IS_ALIGNED(__2M_init_end,     PAGE_SIZE), "__2M_init_end misaligned")
-ASSERT(IS_ALIGNED(__2M_rwdata_start, PAGE_SIZE), "__2M_rwdata_start misaligned")
-ASSERT(IS_ALIGNED(__2M_rwdata_end,   PAGE_SIZE), "__2M_rwdata_end misaligned")
-#endif
+ASSERT(IS_ALIGNED(__2M_text_end,     SECTION_ALIGN), "__2M_text_end misaligned")
+ASSERT(IS_ALIGNED(__2M_rodata_start, SECTION_ALIGN), "__2M_rodata_start misaligned")
+ASSERT(IS_ALIGNED(__2M_rodata_end,   SECTION_ALIGN), "__2M_rodata_end misaligned")
+ASSERT(IS_ALIGNED(__2M_init_start,   SECTION_ALIGN), "__2M_init_start misaligned")
+ASSERT(IS_ALIGNED(__2M_init_end,     SECTION_ALIGN), "__2M_init_end misaligned")
+ASSERT(IS_ALIGNED(__2M_rwdata_start, SECTION_ALIGN), "__2M_rwdata_start misaligned")
+ASSERT(IS_ALIGNED(__2M_rwdata_end,   SECTION_ALIGN), "__2M_rwdata_end misaligned")
 
 ASSERT(IS_ALIGNED(cpu0_stack, STACK_SIZE), "cpu0_stack misaligned")
 
-- 
2.11.0
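
The linker-script change folds both cases into a single constant. A C model of the same arithmetic may help when reading the ASSERT()s. ALIGN() rounds the location counter up to a power-of-two boundary, and IS_ALIGNED() checks that none of the low bits are set. The PAGE_SIZE and MB() definitions here are illustrative assumptions, and EFI is simply left undefined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values: 4 KiB pages, MB(x) = x MiB. */
#define PAGE_SIZE 4096UL
#define MB(x)     ((uint64_t)(x) << 20)

/* Same selection as the patch: 2M superpage alignment for EFI builds. */
#ifdef EFI
#define SECTION_ALIGN MB(2)
#else
#define SECTION_ALIGN PAGE_SIZE
#endif

/* Round addr up to the next multiple of align (a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* True if addr has no bits set below the alignment boundary. */
static bool is_aligned(uint64_t addr, uint64_t align)
{
    return (addr & (align - 1)) == 0;
}
```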


* [PATCH RFC v1 12/74] xen/acpi: mark the PM timer FADT field as optional
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (10 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 11/74] x86/link: Introduce and use SECTION_ALIGN Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-05 10:52   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 13/74] xen/domctl: Return arch_config via getdomaininfo Wei Liu
                   ` (63 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

According to the ACPI 6.1 specification this field is optional, so
mark it as such.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/drivers/acpi/tables/tbfadt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/drivers/acpi/tables/tbfadt.c b/xen/drivers/acpi/tables/tbfadt.c
index d62d8d5cb9..e84a726dfe 100644
--- a/xen/drivers/acpi/tables/tbfadt.c
+++ b/xen/drivers/acpi/tables/tbfadt.c
@@ -95,7 +95,7 @@ static struct acpi_fadt_info __initdata fadt_info_table[] = {
 
 	{"PmTimerBlock", ACPI_FADT_OFFSET(xpm_timer_block),
 	 ACPI_FADT_OFFSET(pm_timer_block),
-	 ACPI_FADT_OFFSET(pm_timer_length), ACPI_FADT_REQUIRED},
+	 ACPI_FADT_OFFSET(pm_timer_length), ACPI_FADT_OPTIONAL},
 
 	{"Gpe0Block", ACPI_FADT_OFFSET(xgpe0_block),
 	 ACPI_FADT_OFFSET(gpe0_block),
-- 
2.11.0
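
The effect of switching PmTimerBlock to ACPI_FADT_OPTIONAL can be modelled as follows. The model assumes ACPICA-style validation, in which a required FADT field with a zero address or zero length triggers a warning while an optional field may simply be absent. The struct and enum names are hypothetical stand-ins for the fadt_info_table entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for ACPI_FADT_REQUIRED / ACPI_FADT_OPTIONAL. */
enum fadt_field_type { FADT_REQUIRED, FADT_OPTIONAL };

struct fadt_field {
    const char *name;          /* e.g. "PmTimerBlock" */
    uint64_t address;          /* register block address from the FADT */
    uint8_t length;            /* e.g. pm_timer_length */
    enum fadt_field_type type;
};

/* True if validation would warn about this field. */
static bool fadt_field_warns(const struct fadt_field *f)
{
    bool absent = f->address == 0 || f->length == 0;

    return f->type == FADT_REQUIRED && absent;
}
```

With the change, firmware that omits the PM timer block no longer produces a warning for it.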


* [PATCH RFC v1 13/74] xen/domctl: Return arch_config via getdomaininfo
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (11 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 12/74] xen/acpi: mark the PM timer FADT field as optional Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-05 10:58   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 14/74] tools/ocaml: Expose arch_config in domaininfo Wei Liu
                   ` (62 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Andrew Cooper <andrew.cooper3@citrix.com>

This allows toolstack software to distinguish HVM from PVH guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 tools/libxc/include/xenctrl.h | 1 +
 tools/libxc/xc_domain.c       | 1 +
 xen/arch/x86/domctl.c         | 2 ++
 xen/include/public/domctl.h   | 1 +
 4 files changed, 5 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 666db0b919..a92a8d7a53 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -456,6 +456,7 @@ typedef struct xc_dominfo {
     unsigned int  max_vcpu_id;
     xen_domain_handle_t handle;
     unsigned int  cpupool;
+    struct xen_arch_domainconfig arch_config;
 } xc_dominfo_t;
 
 typedef xen_domctl_getdomaininfo_t xc_domaininfo_t;
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 3ccd27f101..8169284dc1 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -421,6 +421,7 @@ int xc_domain_getinfo(xc_interface *xch,
         info->nr_online_vcpus = domctl.u.getdomaininfo.nr_online_vcpus;
         info->max_vcpu_id = domctl.u.getdomaininfo.max_vcpu_id;
         info->cpupool = domctl.u.getdomaininfo.cpupool;
+        info->arch_config = domctl.u.getdomaininfo.arch_config;
 
         memcpy(info->handle, domctl.u.getdomaininfo.handle,
                sizeof(xen_domain_handle_t));
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 36ab23577b..0d2d2ec3e6 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -345,6 +345,8 @@ void arch_get_domain_info(const struct domain *d,
 {
     if ( paging_mode_hap(d) )
         info->flags |= XEN_DOMINF_hap;
+
+    info->arch_config.emulation_flags = d->arch.emulation_flags;
 }
 
 #define MAX_IOPORTS 0x10000
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 6d5396fd71..4cd09b44a0 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -116,6 +116,7 @@ struct xen_domctl_getdomaininfo {
     uint32_t ssidref;
     xen_domain_handle_t handle;
     uint32_t cpupool;
+    struct xen_arch_domainconfig arch_config;
 };
 typedef struct xen_domctl_getdomaininfo xen_domctl_getdomaininfo_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_getdomaininfo_t);
-- 
2.11.0
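
Now that emulation_flags is reported, a toolstack can tell HVM guests from PVH guests. The sketch below shows one possible heuristic. This is an assumption, not Xen's definition. The heuristic takes the HVM-container flag as given (in practice it would come from XEN_DOMINF_hvm_guest) and treats a container whose emulated devices go no further than the local APIC as PVH. Placing LAPIC at bit 0 follows the order of the x86_arch_emulation_flags variant in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout assumed from the OCaml variant order: LAPIC = bit 0 ...
 * USE_PIRQ = bit 9. */
#define EMU_LAPIC (1u << 0)
#define EMU_ALL   ((1u << 10) - 1)

enum guest_kind { GUEST_PV, GUEST_PVH, GUEST_HVM };

/* Hypothetical classifier over getdomaininfo results. */
static enum guest_kind classify(bool is_hvm_container, uint32_t emulation_flags)
{
    if (!is_hvm_container)
        return GUEST_PV;
    /* Anything emulated beyond the local APIC is taken to mean HVM. */
    return (emulation_flags & ~EMU_LAPIC) ? GUEST_HVM : GUEST_PVH;
}
```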


* [PATCH RFC v1 14/74] tools/ocaml: Expose arch_config in domaininfo
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (12 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 13/74] xen/domctl: Return arch_config via getdomaininfo Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-04 13:05 ` [PATCH RFC v1 15/74] tools/ocaml: Extend domain_create() to take arch_domainconfig Wei Liu
                   ` (61 subsequent siblings)
  75 siblings, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Andrew Cooper <andrew.cooper3@citrix.com>

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 tools/ocaml/libs/xc/xenctrl.ml      | 29 +++++++++++++++++++++++++++++
 tools/ocaml/libs/xc/xenctrl.mli     | 28 ++++++++++++++++++++++++++++
 tools/ocaml/libs/xc/xenctrl_stubs.c | 26 ++++++++++++++++++++++++--
 3 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/tools/ocaml/libs/xc/xenctrl.ml b/tools/ocaml/libs/xc/xenctrl.ml
index 70a325b0e9..d549068d60 100644
--- a/tools/ocaml/libs/xc/xenctrl.ml
+++ b/tools/ocaml/libs/xc/xenctrl.ml
@@ -28,6 +28,34 @@ type vcpuinfo =
 	cpumap: int32;
 }
 
+type xen_arm_arch_domainconfig =
+{
+	gic_version: int;
+	nr_spis: int;
+	clock_frequency: int32;
+}
+
+type x86_arch_emulation_flags =
+	| X86_EMU_LAPIC
+	| X86_EMU_HPET
+	| X86_EMU_PM
+	| X86_EMU_RTC
+	| X86_EMU_IOAPIC
+	| X86_EMU_PIC
+	| X86_EMU_VGA
+	| X86_EMU_IOMMU
+	| X86_EMU_PIT
+	| X86_EMU_USE_PIRQ
+
+type xen_x86_arch_domainconfig =
+{
+	emulation_flags: x86_arch_emulation_flags list;
+}
+
+type arch_domainconfig =
+	| ARM of xen_arm_arch_domainconfig
+	| X86 of xen_x86_arch_domainconfig
+
 type domaininfo =
 {
 	domid             : domid;
@@ -46,6 +74,7 @@ type domaininfo =
 	max_vcpu_id       : int;
 	ssidref           : int32;
 	handle            : int array;
+	arch_config       : arch_domainconfig;
 }
 
 type sched_control =
diff --git a/tools/ocaml/libs/xc/xenctrl.mli b/tools/ocaml/libs/xc/xenctrl.mli
index 702d8a7ab8..08f1fd26ae 100644
--- a/tools/ocaml/libs/xc/xenctrl.mli
+++ b/tools/ocaml/libs/xc/xenctrl.mli
@@ -22,6 +22,33 @@ type vcpuinfo = {
   cputime : int64;
   cpumap : int32;
 }
+
+type xen_arm_arch_domainconfig = {
+  gic_version: int;
+  nr_spis: int;
+  clock_frequency: int32;
+}
+
+type x86_arch_emulation_flags =
+  | X86_EMU_LAPIC
+  | X86_EMU_HPET
+  | X86_EMU_PM
+  | X86_EMU_RTC
+  | X86_EMU_IOAPIC
+  | X86_EMU_PIC
+  | X86_EMU_VGA
+  | X86_EMU_IOMMU
+  | X86_EMU_PIT
+  | X86_EMU_USE_PIRQ
+
+type xen_x86_arch_domainconfig = {
+  emulation_flags: x86_arch_emulation_flags list;
+}
+
+type arch_domainconfig =
+  | ARM of xen_arm_arch_domainconfig
+  | X86 of xen_x86_arch_domainconfig
+
 type domaininfo = {
   domid : domid;
   dying : bool;
@@ -39,6 +66,7 @@ type domaininfo = {
   max_vcpu_id : int;
   ssidref : int32;
   handle : int array;
+  arch_config : arch_domainconfig;
 }
 type sched_control = { weight : int; cap : int; }
 type physinfo_cap_flag = CAP_HVM | CAP_DirectIO
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
index c66732f67c..124aa34fe8 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -273,10 +273,10 @@ CAMLprim value stub_xc_domain_shutdown(value xch, value domid, value reason)
 static value alloc_domaininfo(xc_domaininfo_t * info)
 {
 	CAMLparam0();
-	CAMLlocal2(result, tmp);
+	CAMLlocal5(result, tmp, arch_config, x86_arch_config, emul_list);
 	int i;
 
-	result = caml_alloc_tuple(16);
+	result = caml_alloc_tuple(17);
 
 	Store_field(result,  0, Val_int(info->domain));
 	Store_field(result,  1, Val_bool(info->flags & XEN_DOMINF_dying));
@@ -302,6 +302,28 @@ static value alloc_domaininfo(xc_domaininfo_t * info)
 
 	Store_field(result, 15, tmp);
 
+	/* emulation_flags: x86_arch_emulation_flags list; */
+	tmp = emul_list = Val_emptylist;
+	for (i = 0; i < 10; i++) {
+		if ((info->arch_config.emulation_flags >> i) & 1) {
+			tmp = caml_alloc_small(2, Tag_cons);
+			Field(tmp, 0) = Val_int(i);
+			Field(tmp, 1) = emul_list;
+			emul_list = tmp;
+		}
+	}
+
+	/* xen_x86_arch_domainconfig */
+	x86_arch_config = caml_alloc_tuple(1);
+	Store_field(x86_arch_config, 0, emul_list);
+
+	/* arch_config: arch_domainconfig */
+	arch_config = caml_alloc(1, 1);
+
+	Store_field(arch_config, 0, x86_arch_config);
+
+	Store_field(result, 16, arch_config);
+
 	CAMLreturn(result);
 }
 
-- 
2.11.0
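
The stub's loop walks bits 0 to 9 and conses each set bit onto the head of the list. As a result, OCaml sees the flags in descending bit order, and any bit above 9 is silently dropped. The plain C model below reproduces that loop shape, using a hypothetical cons-cell struct in place of OCaml heap blocks. The conversion relies on OCaml representing constant constructors as consecutive immediate integers starting from 0, which is what allows the stub to store Val_int(i) directly.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an OCaml list cell. */
struct cons {
    int head;              /* constructor index, i.e. the bit number */
    struct cons *tail;
};

/* Scan bits 0..nbits-1 and push each set bit onto the front, exactly
 * like the stub, so the result is in descending bit order. */
static struct cons *flags_to_list(uint32_t flags, int nbits)
{
    struct cons *list = NULL;

    for (int i = 0; i < nbits; i++) {
        if ((flags >> i) & 1) {
            struct cons *cell = malloc(sizeof(*cell));
            if (!cell)
                abort();
            cell->head = i;
            cell->tail = list;
            list = cell;
        }
    }
    return list;
}

static void free_list(struct cons *list)
{
    while (list) {
        struct cons *next = list->tail;
        free(list);
        list = next;
    }
}
```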


* [PATCH RFC v1 15/74] tools/ocaml: Extend domain_create() to take arch_domainconfig
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (13 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 14/74] tools/ocaml: Expose arch_config in domaininfo Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-04 13:05 ` [PATCH RFC v1 16/74] x86/fixmap: Modify fix_to_virt() to return a void pointer Wei Liu
                   ` (60 subsequent siblings)
  75 siblings, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Jon Ludlam <jonathan.ludlam@citrix.com>

No longer passing NULL into xc_domain_create() allows for the creation
of PVH guests.

Signed-off-by: Jon Ludlam <jonathan.ludlam@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 tools/ocaml/libs/xc/xenctrl.ml      |  2 +-
 tools/ocaml/libs/xc/xenctrl.mli     |  2 +-
 tools/ocaml/libs/xc/xenctrl_stubs.c | 22 ++++++++++++++++++++--
 3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/tools/ocaml/libs/xc/xenctrl.ml b/tools/ocaml/libs/xc/xenctrl.ml
index d549068d60..9116aa222c 100644
--- a/tools/ocaml/libs/xc/xenctrl.ml
+++ b/tools/ocaml/libs/xc/xenctrl.ml
@@ -143,7 +143,7 @@ let with_intf f =
 	interface_close xc;
 	r
 
-external _domain_create: handle -> int32 -> domain_create_flag list -> int array -> domid
+external _domain_create: handle -> int32 -> domain_create_flag list -> int array -> arch_domainconfig -> domid
        = "stub_xc_domain_create"
 
 let int_array_of_uuid_string s =
diff --git a/tools/ocaml/libs/xc/xenctrl.mli b/tools/ocaml/libs/xc/xenctrl.mli
index 08f1fd26ae..54c099c88f 100644
--- a/tools/ocaml/libs/xc/xenctrl.mli
+++ b/tools/ocaml/libs/xc/xenctrl.mli
@@ -102,7 +102,7 @@ external sizeof_xen_pfn : unit -> int = "stub_sizeof_xen_pfn"
 external interface_open : unit -> handle = "stub_xc_interface_open"
 external interface_close : handle -> unit = "stub_xc_interface_close"
 val with_intf : (handle -> 'a) -> 'a
-val domain_create : handle -> int32 -> domain_create_flag list -> string -> domid
+val domain_create : handle -> int32 -> domain_create_flag list -> string -> arch_domainconfig -> domid
 val domain_sethandle : handle -> domid -> string -> unit
 external domain_max_vcpus : handle -> domid -> int -> unit
   = "stub_xc_domain_max_vcpus"
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 124aa34fe8..0b5a2361c0 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -144,7 +144,8 @@ static int domain_create_flag_table[] = {
 };
 
 CAMLprim value stub_xc_domain_create(value xch, value ssidref,
-                                     value flags, value handle)
+                                     value flags, value handle,
+                                     value domconfig)
 {
 	CAMLparam4(xch, ssidref, flags, handle);
 
@@ -155,6 +156,7 @@ CAMLprim value stub_xc_domain_create(value xch, value ssidref,
 	uint32_t c_ssidref = Int32_val(ssidref);
 	unsigned int c_flags = 0;
 	value l;
+	xc_domain_configuration_t config = {};
 
         if (Wosize_val(handle) != 16)
 		caml_invalid_argument("Handle not a 16-integer array");
@@ -168,8 +170,24 @@ CAMLprim value stub_xc_domain_create(value xch, value ssidref,
 		c_flags |= domain_create_flag_table[v];
 	}
 
+	switch(Tag_val(domconfig)) {
+	case 0: /* ARM - nothing to do */
+		caml_failwith("Unhandled: ARM");
+		break;
+
+	case 1: /* X86 - emulation flags in the block */
+		for (l = Field(Field(domconfig, 0), 0);
+		     l != Val_emptylist;
+		     l = Field(l, 1))
+			config.emulation_flags |= 1u << Int_val(Field(l, 0));
+		break;
+
+	default:
+		caml_failwith("Unhandled domconfig type");
+	}
+
 	caml_enter_blocking_section();
-	result = xc_domain_create(_H(xch), c_ssidref, h, c_flags, &domid, NULL);
+	result = xc_domain_create(_H(xch), c_ssidref, h, c_flags, &domid, &config);
 	caml_leave_blocking_section();
 
 	if (result < 0)
-- 
2.11.0
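
The X86 branch of the create stub performs the inverse of the domaininfo conversion: each constructor in the list arrives as Val_int(i) and is OR-ed into the mask as 1u << i. The sketch below is a standalone C version that works over an array of indices. indices_to_mask() and NR_EMU_FLAGS are hypothetical. The range check is a defensive addition that the stub does not have, since the OCaml type system already restricts the values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_EMU_FLAGS 10   /* LAPIC .. USE_PIRQ in the OCaml variant */

/* Fold constructor indices into an emulation_flags bitmask.
 * Returns 0 on success, -1 if an index is out of range (shifting a
 * 32-bit value by 32 or more would be undefined behaviour). */
static int indices_to_mask(const int *idx, size_t n, uint32_t *mask)
{
    uint32_t m = 0;

    for (size_t i = 0; i < n; i++) {
        if (idx[i] < 0 || idx[i] >= NR_EMU_FLAGS)
            return -1;
        m |= UINT32_C(1) << idx[i];
    }
    *mask = m;
    return 0;
}
```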


* [PATCH RFC v1 16/74] x86/fixmap: Modify fix_to_virt() to return a void pointer
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (14 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 15/74] tools/ocaml: Extend domain_create() to take arch_domainconfig Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-05 11:05   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 17/74] ---- x86/Kconfig: Options for Xen and PVH support Wei Liu
                   ` (59 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Andrew Cooper <andrew.cooper3@citrix.com>

Almost all users of fix_to_virt() actually want a pointer.  Include the cast
within the definition, so the callers don't need to.

Two users which need the integer value are switched to using __fix_to_virt()
directly.  A few users stay fully unchanged, due to GCC's void pointer
arithmetic extension causing the same behaviour.  Most users however have
their explicit casting dropped.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/acpi/lib.c         | 2 +-
 xen/arch/x86/mm.c               | 4 ++--
 xen/arch/x86/mpparse.c          | 2 +-
 xen/arch/x86/msi.c              | 3 +--
 xen/arch/x86/tboot.c            | 4 ++--
 xen/drivers/acpi/apei/apei-io.c | 2 +-
 xen/drivers/char/ehci-dbgp.c    | 2 +-
 xen/drivers/char/ns16550.c      | 2 +-
 xen/include/asm-x86/apicdef.h   | 2 +-
 xen/include/asm-x86/fixmap.h    | 2 +-
 10 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/xen/arch/x86/acpi/lib.c b/xen/arch/x86/acpi/lib.c
index 7d7c71848b..265b9ad819 100644
--- a/xen/arch/x86/acpi/lib.c
+++ b/xen/arch/x86/acpi/lib.c
@@ -49,7 +49,7 @@ char *__acpi_map_table(paddr_t phys, unsigned long size)
 	offset = phys & (PAGE_SIZE - 1);
 	mapped_size = PAGE_SIZE - offset;
 	set_fixmap(FIX_ACPI_END, phys);
-	base = fix_to_virt(FIX_ACPI_END);
+	base = __fix_to_virt(FIX_ACPI_END);
 
 	/*
 	 * Most cases can be covered by the below.
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index a56f875d45..f73fee225e 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5227,12 +5227,12 @@ void __set_fixmap(
     enum fixed_addresses idx, unsigned long mfn, unsigned long flags)
 {
     BUG_ON(idx >= __end_of_fixed_addresses);
-    map_pages_to_xen(fix_to_virt(idx), mfn, 1, flags);
+    map_pages_to_xen(__fix_to_virt(idx), mfn, 1, flags);
 }
 
 void *__init arch_vmap_virt_end(void)
 {
-    return (void *)fix_to_virt(__end_of_fixed_addresses);
+    return fix_to_virt(__end_of_fixed_addresses);
 }
 
 void __iomem *ioremap(paddr_t pa, size_t len)
diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c
index a1a0738a19..49140e46f0 100644
--- a/xen/arch/x86/mpparse.c
+++ b/xen/arch/x86/mpparse.c
@@ -703,7 +703,7 @@ static void __init efi_check_config(void)
 		return;
 
 	__set_fixmap(FIX_EFI_MPF, PFN_DOWN(efi.mps), __PAGE_HYPERVISOR);
-	mpf = (void *)fix_to_virt(FIX_EFI_MPF) + ((long)efi.mps & (PAGE_SIZE-1));
+	mpf = fix_to_virt(FIX_EFI_MPF) + ((long)efi.mps & (PAGE_SIZE-1));
 
 	if (memcmp(mpf->mpf_signature, "_MP_", 4) == 0 &&
 	    mpf->mpf_length == 1 &&
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index 095bd3cae7..8c89f072a8 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -960,8 +960,7 @@ static int msix_capability_init(struct pci_dev *dev,
             xfree(entry);
             return idx;
         }
-        base = (void *)(fix_to_virt(idx) +
-                        ((unsigned long)entry_paddr & (PAGE_SIZE - 1)));
+        base = fix_to_virt(idx) + (entry_paddr & (PAGE_SIZE - 1));
 
         /* Mask interrupt here */
         writel(1, base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
diff --git a/xen/arch/x86/tboot.c b/xen/arch/x86/tboot.c
index 59d7c477f4..d36bf33407 100644
--- a/xen/arch/x86/tboot.c
+++ b/xen/arch/x86/tboot.c
@@ -82,7 +82,7 @@ static void __init tboot_copy_memory(unsigned char *va, uint32_t size,
         {
             map_base = PFN_DOWN(pa + i);
             set_fixmap(FIX_TBOOT_MAP_ADDRESS, map_base << PAGE_SHIFT);
-            map_addr = (unsigned char *)fix_to_virt(FIX_TBOOT_MAP_ADDRESS);
+            map_addr = fix_to_virt(FIX_TBOOT_MAP_ADDRESS);
         }
         va[i] = map_addr[pa + i - (map_base << PAGE_SHIFT)];
     }
@@ -98,7 +98,7 @@ void __init tboot_probe(void)
 
     /* Map and check for tboot UUID. */
     set_fixmap(FIX_TBOOT_SHARED_BASE, opt_tboot_pa);
-    tboot_shared = (tboot_shared_t *)fix_to_virt(FIX_TBOOT_SHARED_BASE);
+    tboot_shared = fix_to_virt(FIX_TBOOT_SHARED_BASE);
     if ( tboot_shared == NULL )
         return;
     if ( memcmp(&tboot_shared_uuid, (uuid_t *)tboot_shared, sizeof(uuid_t)) )
diff --git a/xen/drivers/acpi/apei/apei-io.c b/xen/drivers/acpi/apei/apei-io.c
index 8955de935e..89b70f45ef 100644
--- a/xen/drivers/acpi/apei/apei-io.c
+++ b/xen/drivers/acpi/apei/apei-io.c
@@ -92,7 +92,7 @@ static void __iomem *__init apei_range_map(paddr_t paddr, unsigned long size)
 		apei_range_nr++;
 	}
 
-	return (void __iomem *)fix_to_virt(FIX_APEI_RANGE_BASE + start_nr);
+	return fix_to_virt(FIX_APEI_RANGE_BASE + start_nr);
 }
 
 /*
diff --git a/xen/drivers/char/ehci-dbgp.c b/xen/drivers/char/ehci-dbgp.c
index d48e777c34..d0071d3114 100644
--- a/xen/drivers/char/ehci-dbgp.c
+++ b/xen/drivers/char/ehci-dbgp.c
@@ -1327,7 +1327,7 @@ static void __init ehci_dbgp_init_preirq(struct serial_port *port)
      * than enough.  1k is the biggest that was seen.
      */
     set_fixmap_nocache(FIX_EHCI_DBGP, dbgp->bar_val);
-    ehci_bar = (void __iomem *)fix_to_virt(FIX_EHCI_DBGP);
+    ehci_bar = fix_to_virt(FIX_EHCI_DBGP);
     ehci_bar += dbgp->bar_val & ~PAGE_MASK;
     dbgp_printk("ehci_bar: %p\n", ehci_bar);
 
diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
index e0f8199f98..f32dbd3247 100644
--- a/xen/drivers/char/ns16550.c
+++ b/xen/drivers/char/ns16550.c
@@ -697,7 +697,7 @@ static void __init ns16550_init_preirq(struct serial_port *port)
         enum fixed_addresses idx = FIX_COM_BEGIN + (uart - ns16550_com);
 
         set_fixmap_nocache(idx, uart->io_base);
-        uart->remapped_io_base = (void __iomem *)fix_to_virt(idx);
+        uart->remapped_io_base = fix_to_virt(idx);
         uart->remapped_io_base += uart->io_base & ~PAGE_MASK;
 #else
         uart->remapped_io_base = (char *)ioremap(uart->io_base, uart->io_size);
diff --git a/xen/include/asm-x86/apicdef.h b/xen/include/asm-x86/apicdef.h
index eed504a31a..ce50c53f18 100644
--- a/xen/include/asm-x86/apicdef.h
+++ b/xen/include/asm-x86/apicdef.h
@@ -119,7 +119,7 @@
 /* Only available in x2APIC mode */
 #define		APIC_SELF_IPI	0x3F0
 
-#define APIC_BASE (fix_to_virt(FIX_APIC_BASE))
+#define APIC_BASE (__fix_to_virt(FIX_APIC_BASE))
 
 /* It's only used in x2APIC mode of an x2APIC unit. */
 #define APIC_MSR_BASE 0x800
diff --git a/xen/include/asm-x86/fixmap.h b/xen/include/asm-x86/fixmap.h
index 89bf6cb611..51b0e7e945 100644
--- a/xen/include/asm-x86/fixmap.h
+++ b/xen/include/asm-x86/fixmap.h
@@ -79,7 +79,7 @@ extern void __set_fixmap(
 #define __fix_to_virt(x) (FIXADDR_TOP - ((x) << PAGE_SHIFT))
 #define __virt_to_fix(x) ((FIXADDR_TOP - ((x)&PAGE_MASK)) >> PAGE_SHIFT)
 
-#define fix_to_virt(x)   (__fix_to_virt(x))
+#define fix_to_virt(x)   ((void *)__fix_to_virt(x))
 
 static inline unsigned long virt_to_fix(const unsigned long vaddr)
 {
-- 
2.11.0
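
The split between the integer form and the pointer form can be tried outside Xen. In the sketch below, FIXADDR_TOP and PAGE_SHIFT are illustrative values rather than Xen's. fix_offset() is a hypothetical helper that does the page-offset arithmetic through a char pointer, which is the portable equivalent of the GCC void-pointer arithmetic mentioned in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; Xen's real FIXADDR_TOP differs. */
#define PAGE_SHIFT  12
#define PAGE_MASK   (~((1UL << PAGE_SHIFT) - 1))
#define FIXADDR_TOP 0x100000UL

/* Integer forms, as in asm-x86/fixmap.h. */
#define __fix_to_virt(x) (FIXADDR_TOP - ((x) << PAGE_SHIFT))
#define __virt_to_fix(x) ((FIXADDR_TOP - ((x) & PAGE_MASK)) >> PAGE_SHIFT)

/* Pointer form after this patch: callers get a void * directly. */
#define fix_to_virt(x)   ((void *)__fix_to_virt(x))

/* Address of byte (offset within its page) inside fixmap slot idx. */
static void *fix_offset(unsigned int idx, unsigned long offset)
{
    return (char *)fix_to_virt(idx) + (offset & ~PAGE_MASK);
}
```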


* [PATCH RFC v1 17/74] ---- x86/Kconfig: Options for Xen and PVH support
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (15 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 16/74] x86/fixmap: Modify fix_to_virt() to return a void pointer Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-05 11:11   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 18/74] x86/link: Relocate program headers Wei Liu
                   ` (58 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Andrew Cooper <andrew.cooper3@citrix.com>

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/Kconfig | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 7c4582922f..c0b0bcdcb3 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -117,6 +117,23 @@ config TBOOT
 	  Technology (TXT)
 
 	  If unsure, say Y.
+
+config XEN_GUEST
+	def_bool n
+	prompt "Xen Guest"
+	---help---
+	  Support for Xen detecting when it is running under Xen.
+
+	  If unsure, say N.
+
+config PVH_GUEST
+	def_bool n
+	prompt "PVH Guest"
+	depends on XEN_GUEST
+	---help---
+	  Support booting using the PVH ABI.
+
+	  If unsure, say N.
 endmenu
 
 source "common/Kconfig"
-- 
2.11.0

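This sketch shows how these options are meant to be consumed, mirroring the pvh-boot.h and xen.h headers later in this series. A disabled option is turned into a constant, so callers need no #ifdef of their own. CONFIG_PVH_GUEST is left undefined here to show the PVH_GUEST=n case. boot_path() is a hypothetical caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Uncomment to mimic CONFIG_PVH_GUEST=y (normally generated from Kconfig). */
/* #define CONFIG_PVH_GUEST 1 */

#ifdef CONFIG_PVH_GUEST
bool pvh_boot;              /* set by the PVH entry point before C code runs */
#else
#define pvh_boot 0          /* constant: the branch below is compiled away */
#endif

/* Hypothetical caller: compiles unchanged whether the option is on or off. */
static const char *boot_path(void)
{
    if ( pvh_boot )
        return "PVH";
    return "multiboot";
}
```

Because PVH_GUEST depends on XEN_GUEST, the PVH stubs only need to cover the case where PVH_GUEST is off. They do not need a separate XEN_GUEST=n case.
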

* [PATCH RFC v1 18/74] x86/link: Relocate program headers
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (16 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 17/74] ---- x86/Kconfig: Options for Xen and PVH support Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-05 11:20   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 19/74] x86: introduce ELFNOTE macro Wei Liu
                   ` (57 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Andrew Cooper <andrew.cooper3@citrix.com>

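When the Xen binary is loaded by libelf (in future), we rely on the ELF
loader to place it according to its program headers. For the non-EFI
build, give each output section a load address equal to its physical
address (its virtual address minus __XEN_VIRT_START), and make the entry
point physical as well.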

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/xen.lds.S | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 6164ad094f..400d8a56c4 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -13,6 +13,7 @@
 #undef __XEN_VIRT_START
 #define __XEN_VIRT_START __image_base__
 #define SECTION_ALIGN MB(2)
+#define DECL_SECTION(x) x :
 
 ENTRY(efi_start)
 
@@ -20,8 +21,9 @@ ENTRY(efi_start)
 
 #define FORMAT "elf64-x86-64"
 #define SECTION_ALIGN PAGE_SIZE
+#define DECL_SECTION(x) x : AT(ADDR(x) - __XEN_VIRT_START)
 
-ENTRY(start)
+ENTRY(start_pa)
 
 #endif /* EFI */
 
@@ -56,9 +58,11 @@ SECTIONS
   __2M_text_start = .;         /* Start of 2M superpages, mapped RX. */
 #endif
 
+  start_pa = ABSOLUTE(start - __XEN_VIRT_START);
+
   . = __XEN_VIRT_START + XEN_IMG_OFFSET;
   _start = .;
-  .text : {
+  DECL_SECTION(.text) {
         _stext = .;            /* Text and read-only data */
        *(.text)
        *(.text.cold)
@@ -73,7 +77,7 @@ SECTIONS
   __2M_text_end = .;
 
   __2M_rodata_start = .;       /* Start of 2M superpages, mapped RO. */
-  .rodata : {
+  DECL_SECTION(.rodata) {
        _srodata = .;
        /* Bug frames table */
        __start_bug_frames = .;
@@ -132,13 +136,13 @@ SECTIONS
  * compiler may want to inject other things in the .note which we don't care
  * about - hence this unique name.
  */
-  .note.gnu.build-id : {
+  DECL_SECTION(.note.gnu.build-id) {
        __note_gnu_build_id_start = .;
        *(.note.gnu.build-id)
        __note_gnu_build_id_end = .;
   } :note :text
 #elif defined(BUILD_ID_EFI)
-  .buildid : {
+  DECL_SECTION(.buildid) {
        __note_gnu_build_id_start = .;
        *(.buildid)
        __note_gnu_build_id_end = .;
@@ -153,7 +157,7 @@ SECTIONS
   __2M_init_start = .;         /* Start of 2M superpages, mapped RWX (boot only). */
   . = ALIGN(PAGE_SIZE);             /* Init code and data */
   __init_begin = .;
-  .init : {
+  DECL_SECTION(.init) {
        _sinittext = .;
        *(.init.text)
        /*
@@ -215,7 +219,7 @@ SECTIONS
 
   __2M_rwdata_start = .;       /* Start of 2M superpages, mapped RW. */
   . = ALIGN(SMP_CACHE_BYTES);
-  .data.read_mostly : {
+  DECL_SECTION(.data.read_mostly) {
        *(.data.read_mostly)
        . = ALIGN(8);
        __start_schedulers_array = .;
@@ -223,7 +227,7 @@ SECTIONS
        __end_schedulers_array = .;
   } :text
 
-  .data : {                    /* Data */
+  DECL_SECTION(.data) {
        *(.data.page_aligned)
        *(.data)
        *(.data.rel)
@@ -231,7 +235,7 @@ SECTIONS
        CONSTRUCTORS
   } :text
 
-  .bss : {                     /* BSS */
+  DECL_SECTION(.bss) {
        __bss_start = .;
        *(.bss.stack_aligned)
        *(.bss.page_aligned*)
-- 
2.11.0

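The small checker below illustrates what "relocated program headers" should mean for the non-EFI build. It assumes a system <elf.h> (glibc and musl ship one). It tests that each PT_LOAD segment's physical address equals its virtual address minus the image base, which is what AT(ADDR(x) - __XEN_VIRT_START) requests. It also computes the physical entry point the way start_pa does. phdrs_relocated() and entry_pa() are hypothetical helpers, not part of Xen.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Check that every PT_LOAD segment's physical address is its virtual
 * address minus the image base, i.e. what AT(ADDR(x) - __XEN_VIRT_START)
 * asks the linker to record.  Other segment types are ignored.
 */
static int phdrs_relocated(const Elf64_Phdr *ph, size_t n, uint64_t virt_start)
{
    for ( size_t i = 0; i < n; i++ )
        if ( ph[i].p_type == PT_LOAD &&
             ph[i].p_paddr != ph[i].p_vaddr - virt_start )
            return 0;
    return 1;
}

/* Physical entry point, computed like start_pa in the linker script. */
static uint64_t entry_pa(uint64_t entry_va, uint64_t virt_start)
{
    return entry_va - virt_start;
}
```

A loader that honours p_paddr can then place the image without knowing Xen's virtual layout.
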

* [PATCH RFC v1 19/74] x86: introduce ELFNOTE macro
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (17 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 18/74] x86/link: Relocate program headers Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-05 11:27   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 20/74] x86: produce a binary that can be booted as PVH Wei Liu
                   ` (56 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

It will be needed later to introduce the PVH entry point.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/include/asm-x86/asm_defns.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/xen/include/asm-x86/asm_defns.h b/xen/include/asm-x86/asm_defns.h
index 388fc93b9d..2493e97883 100644
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -409,4 +409,16 @@ static always_inline void stac(void)
 #define REX64_PREFIX "rex64/"
 #endif
 
+#define ELFNOTE(name, type, desc)           \
+    .pushsection .note.name               ; \
+    .align 4                              ; \
+    .long 2f - 1f       /* namesz */      ; \
+    .long 4f - 3f       /* descsz */      ; \
+    .long type          /* type   */      ; \
+1:.asciz #name          /* name   */      ; \
+2:.align 4                                ; \
+3:desc                  /* desc   */      ; \
+4:.align 4                                ; \
+    .popsection
+
 #endif /* __X86_ASM_DEFNS_H__ */
-- 
2.11.0

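The byte layout that the ELFNOTE macro emits can be sketched in C as follows. build_note() is a hypothetical helper. It writes the three 32-bit header words (namesz, descsz, type), then the NUL-terminated name, then desc, padding each of the last two to 4 bytes. As with the labels in the macro, namesz and descsz record the unpadded sizes. The header is copied in host byte order, which assumes a little-endian host such as x86.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round up to the 4-byte boundary the macro's ".align 4" produces. */
static size_t align4(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/*
 * Write one note laid out like ELFNOTE(name, type, desc).  Returns the
 * number of bytes used, or 0 if buf is too small.
 */
static size_t build_note(uint8_t *buf, size_t len, const char *name,
                         uint32_t type, const void *desc, uint32_t descsz)
{
    uint32_t namesz = (uint32_t)strlen(name) + 1;   /* .asciz adds the NUL */
    uint32_t hdr[3] = { namesz, descsz, type };     /* unpadded sizes */
    size_t total = sizeof(hdr) + align4(namesz) + align4(descsz);

    if ( total > len )
        return 0;

    memset(buf, 0, total);                          /* zero the padding */
    memcpy(buf, hdr, sizeof(hdr));                  /* host (x86: LE) order */
    memcpy(buf + sizeof(hdr), name, namesz);
    memcpy(buf + sizeof(hdr) + align4(namesz), desc, descsz);
    return total;
}
```

For ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_ENTRY, .long ...) this gives a 20-byte note. That is 12 header bytes, 4 bytes for "Xen\0" and 4 bytes for the entry address.
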

* [PATCH RFC v1 20/74] x86: produce a binary that can be booted as PVH
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (18 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 19/74] x86: introduce ELFNOTE macro Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-05 11:39   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 21/74] x86/entry: Early PVH boot code Wei Liu
                   ` (55 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 .gitignore               |  1 +
 xen/arch/x86/Makefile    | 10 +++++++++-
 xen/arch/x86/boot/head.S | 10 ++++++++++
 xen/arch/x86/xen.lds.S   |  9 ++++++++-
 4 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/.gitignore b/.gitignore
index d64b03d06c..8da67daf31 100644
--- a/.gitignore
+++ b/.gitignore
@@ -323,6 +323,7 @@ xen/xsm/flask/xenpolicy-*
 tools/flask/policy/policy.conf
 tools/flask/policy/xenpolicy-*
 xen/xen
+xen/xen-shim
 xen/xen-syms
 xen/xen-syms.map
 xen/xen.*
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index d5d58a205e..b58141efe2 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -75,6 +75,8 @@ efi-y := $(shell if [ ! -r $(BASEDIR)/include/xen/compile.h -o \
                       -O $(BASEDIR)/include/xen/compile.h ]; then \
                          echo '$(TARGET).efi'; fi)
 
+shim-$(CONFIG_PVH_GUEST) := $(TARGET)-shim
+
 ifneq ($(build_id_linker),)
 notes_phdrs = --notes
 else
@@ -93,7 +95,7 @@ endif
 syms-warn-dup-y := --warn-dup
 syms-warn-dup-$(CONFIG_SUPPRESS_DUPLICATE_SYMBOL_WARNINGS) :=
 
-$(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32
+$(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32 $(shim-y)
 	./boot/mkelf32 $(notes_phdrs) $(TARGET)-syms $(TARGET) $(XEN_IMG_OFFSET) \
 	               `$(NM) $(TARGET)-syms | sed -ne 's/^\([^ ]*\) . __2M_rwdata_end$$/0x\1/p'`
 
@@ -144,6 +146,11 @@ $(TARGET)-syms: prelink.o xen.lds $(BASEDIR)/common/symbols-dummy.o
 		>$(@D)/$(@F).map
 	rm -f $(@D)/.$(@F).[0-9]*
 
+# Use elf32-x86-64 if toolchain support exists, elf32-i386 otherwise.
+$(TARGET)-shim: FORMAT = $(firstword $(filter elf32-x86-64,$(shell $(OBJCOPY) --help)) elf32-i386)
+$(TARGET)-shim: $(TARGET)-syms
+	$(OBJCOPY) -O $(FORMAT) $< $@
+
 note.o: $(TARGET)-syms
 	$(OBJCOPY) -O binary --only-section=.note.gnu.build-id  $(BASEDIR)/xen-syms $@.bin
 	$(OBJCOPY) -I binary -O elf64-x86-64 -B i386:x86-64 \
@@ -224,5 +231,6 @@ clean::
 	rm -f asm-offsets.s *.lds boot/*.o boot/*~ boot/core boot/mkelf32
 	rm -f $(BASEDIR)/.xen-syms.[0-9]* boot/.*.d
 	rm -f $(BASEDIR)/.xen.efi.[0-9]* efi/*.efi efi/disabled efi/mkreloc
+	rm -f $(BASEDIR)/xen-shim
 	rm -f boot/cmdline.S boot/reloc.S boot/*.lnk boot/*.bin
 	rm -f note.o
diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index 475c678f2c..6810422435 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -7,6 +7,7 @@
 #include <asm/page.h>
 #include <asm/msr.h>
 #include <asm/cpufeature.h>
+#include <public/elfnote.h>
 
         .text
         .code32
@@ -374,6 +375,15 @@ cs32_switch:
         /* Jump to earlier loaded address. */
         jmp     *%edi
 
+
+#ifdef CONFIG_PVH_GUEST
+ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_ENTRY, .long sym_offs(__pvh_start))
+
+__pvh_start:
+        ud2a
+
+#endif /* CONFIG_PVH_GUEST */
+
 __start:
         cld
         cli
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 400d8a56c4..d880b0a61a 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -34,7 +34,7 @@ OUTPUT_ARCH(i386:x86-64)
 PHDRS
 {
   text PT_LOAD ;
-#if defined(BUILD_ID) && !defined(EFI)
+#if (defined(BUILD_ID) && !defined(EFI)) || defined(CONFIG_PVH_GUEST)
   note PT_NOTE ;
 #endif
 }
@@ -128,6 +128,12 @@ SECTIONS
        __param_end = .;
   } :text
 
+#if defined(CONFIG_PVH_GUEST) && !defined(EFI)
+  DECL_SECTION(.note.Xen) {
+      *(.note.Xen)
+  } :note :text
+#endif
+
 #if defined(BUILD_ID)
 #if !defined(EFI)
 /*
@@ -279,6 +285,7 @@ SECTIONS
 #ifdef EFI
        *(.comment)
        *(.comment.*)
+       *(.note.Xen)
 #endif
   }
 
-- 
2.11.0

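A PVH loader has to find the XEN_ELFNOTE_PHYS32_ENTRY note that head.S now emits into .note.Xen. The linker script places that section in the PT_NOTE segment. Below is a minimal sketch of such a lookup. It assumes well-formed, 4-byte aligned, little-endian notes. find_xen_note() is hypothetical and is not the libelf API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Scan a run of ELF notes and return a pointer to the desc of the first
 * note named "Xen" with the given type (storing its size), or NULL.
 */
static const void *find_xen_note(const uint8_t *p, size_t len,
                                 uint32_t type, uint32_t *descsz)
{
    size_t off = 0;

    while ( len - off >= 12 )
    {
        uint32_t hdr[3];                       /* namesz, descsz, type */
        size_t name_pad, desc_pad;

        memcpy(hdr, p + off, sizeof(hdr));
        name_pad = ((size_t)hdr[0] + 3) & ~(size_t)3;
        desc_pad = ((size_t)hdr[1] + 3) & ~(size_t)3;
        if ( name_pad + desc_pad > len - off - 12 )
            return NULL;                       /* truncated note */

        if ( hdr[0] == 4 && hdr[2] == type &&
             memcmp(p + off + 12, "Xen", 4) == 0 )
        {
            *descsz = hdr[1];
            return p + off + 12 + name_pad;
        }
        off += 12 + name_pad + desc_pad;       /* skip to the next note */
    }
    return NULL;
}
```

Build-id and other notes sharing the segment are skipped using their padded sizes. This is why the ELFNOTE macro must keep every field 4-byte aligned.
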

* [PATCH RFC v1 21/74] x86/entry: Early PVH boot code
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (19 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 20/74] x86: produce a binary that can be booted as PVH Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-05 13:32   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 22/74] x86/boot: Map more than the first 16MB Wei Liu
                   ` (54 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/Makefile                |   1 +
 xen/arch/x86/boot/head.S             |  40 +++++++++++-
 xen/arch/x86/boot/x86_64.S           |   2 +-
 xen/arch/x86/guest/Makefile          |   1 +
 xen/arch/x86/guest/pvh-boot.c        | 119 +++++++++++++++++++++++++++++++++++
 xen/arch/x86/setup.c                 |  18 +++++-
 xen/include/asm-x86/guest.h          |  34 ++++++++++
 xen/include/asm-x86/guest/pvh-boot.h |  57 +++++++++++++++++
 8 files changed, 268 insertions(+), 4 deletions(-)
 create mode 100644 xen/arch/x86/guest/Makefile
 create mode 100644 xen/arch/x86/guest/pvh-boot.c
 create mode 100644 xen/include/asm-x86/guest.h
 create mode 100644 xen/include/asm-x86/guest/pvh-boot.h

diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index b58141efe2..a8a3686812 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -1,6 +1,7 @@
 subdir-y += acpi
 subdir-y += cpu
 subdir-y += genapic
+subdir-$(CONFIG_XEN_GUEST) += guest
 subdir-$(CONFIG_HVM) += hvm
 subdir-y += mm
 subdir-$(CONFIG_XENOPROF) += oprofile
diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index 6810422435..89a968f704 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -380,7 +380,39 @@ cs32_switch:
 ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_ENTRY, .long sym_offs(__pvh_start))
 
 __pvh_start:
-        ud2a
+        cld
+        cli
+
+        /*
+         * We need one push/pop to determine load address.  Use the same
+         * absolute address as the native path, for lack of a better
+         * alternative.
+         */
+        mov     $0x1000, %esp
+
+        /* Calculate the load base address. */
+        call    1f
+1:      pop     %esi
+        sub     $sym_offs(1b), %esi
+
+        /* Set up stack. */
+        lea     STACK_SIZE + sym_esi(cpu0_stack), %esp
+
+        mov     %ebx, sym_esi(pvh_start_info_pa)
+
+        /* Prepare gdt and segments */
+        add     %esi, sym_esi(gdt_boot_base)
+        lgdt    sym_esi(gdt_boot_descr)
+
+        mov     $BOOT_DS, %ecx
+        mov     %ecx, %ds
+        mov     %ecx, %es
+        mov     %ecx, %ss
+
+        /* Skip bootloader setup and bios setup, go straight to trampoline */
+        movb    $1, sym_esi(pvh_boot)
+        movb    $1, sym_esi(skip_realmode)
+        jmp     trampoline_setup
 
 #endif /* CONFIG_PVH_GUEST */
 
@@ -544,12 +576,18 @@ trampoline_setup:
         /* Get bottom-most low-memory stack address. */
         add     $TRAMPOLINE_SPACE,%ecx
 
+#ifdef CONFIG_PVH_GUEST
+        cmpb    $1, sym_fs(pvh_boot)
+        je      1f
+#endif
+
         /* Save the Multiboot info struct (after relocation) for later use. */
         push    %ecx                /* Bottom-most low-memory stack address. */
         push    %ebx                /* Multiboot information address. */
         push    %eax                /* Multiboot magic. */
         call    reloc
         mov     %eax,sym_fs(multiboot_ptr)
+1:
 
         /*
          * Now trampoline_phys points to the following structure (lowest address
diff --git a/xen/arch/x86/boot/x86_64.S b/xen/arch/x86/boot/x86_64.S
index 925fd4bb0a..cf47e019f5 100644
--- a/xen/arch/x86/boot/x86_64.S
+++ b/xen/arch/x86/boot/x86_64.S
@@ -31,7 +31,7 @@ ENTRY(__high_start)
         test    %ebx,%ebx
         jnz     start_secondary
 
-        /* Pass off the Multiboot info structure to C land. */
+        /* Pass off the Multiboot info structure to C land (if applicable). */
         mov     multiboot_ptr(%rip),%edi
         call    __start_xen
         BUG     /* __start_xen() shouldn't return. */
diff --git a/xen/arch/x86/guest/Makefile b/xen/arch/x86/guest/Makefile
new file mode 100644
index 0000000000..a5f1625ab1
--- /dev/null
+++ b/xen/arch/x86/guest/Makefile
@@ -0,0 +1 @@
+obj-bin-$(CONFIG_PVH_GUEST) += pvh-boot.init.o
diff --git a/xen/arch/x86/guest/pvh-boot.c b/xen/arch/x86/guest/pvh-boot.c
new file mode 100644
index 0000000000..b93761f948
--- /dev/null
+++ b/xen/arch/x86/guest/pvh-boot.c
@@ -0,0 +1,119 @@
+/******************************************************************************
+ * arch/x86/guest/pvh-boot.c
+ *
+ * PVH boot time support
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (c) 2017 Citrix Systems Ltd.
+ */
+#include <xen/init.h>
+#include <xen/lib.h>
+#include <xen/mm.h>
+
+#include <asm/guest.h>
+
+#include <public/arch-x86/hvm/start_info.h>
+
+/* Initialised in head.S, before .bss is zeroed. */
+bool pvh_boot __initdata;
+uint32_t pvh_start_info_pa __initdata;
+
+static multiboot_info_t __initdata pvh_mbi;
+static module_t __initdata pvh_mbi_mods[32];
+static char *__initdata pvh_loader = "PVH Directboot";
+
+static void __init convert_pvh_info(void)
+{
+    struct hvm_start_info *pvh_info = __va(pvh_start_info_pa);
+    struct hvm_modlist_entry *entry;
+    module_t *mod;
+    unsigned int i;
+
+    ASSERT(pvh_info->magic == XEN_HVM_START_MAGIC_VALUE);
+
+    /*
+     * Turn hvm_start_info into mbi. Luckily, all modules are placed below the
+     * 4GB boundary on x86.
+     */
+    pvh_mbi.flags = MBI_CMDLINE | MBI_MODULES | MBI_LOADERNAME;
+
+    ASSERT(!(pvh_info->cmdline_paddr >> 32));
+    pvh_mbi.cmdline = pvh_info->cmdline_paddr;
+    pvh_mbi.boot_loader_name = __pa(pvh_loader);
+
+    ASSERT(pvh_info->nr_modules < 32);
+    pvh_mbi.mods_count = pvh_info->nr_modules;
+    pvh_mbi.mods_addr = __pa(pvh_mbi_mods);
+
+    mod = pvh_mbi_mods;
+    entry = __va(pvh_info->modlist_paddr);
+    for ( i = 0; i < pvh_info->nr_modules; i++ )
+    {
+        ASSERT(!(entry[i].paddr >> 32));
+
+        mod[i].mod_start = entry[i].paddr;
+        mod[i].mod_end   = entry[i].paddr + entry[i].size;
+        mod[i].string    = entry[i].cmdline_paddr;
+    }
+}
+
+multiboot_info_t *__init pvh_init(void)
+{
+    convert_pvh_info();
+
+    return &pvh_mbi;
+}
+
+void __init pvh_print_info(void)
+{
+    struct hvm_start_info *pvh_info = __va(pvh_start_info_pa);
+    struct hvm_modlist_entry *entry;
+    unsigned int i;
+
+    ASSERT(pvh_info->magic == XEN_HVM_START_MAGIC_VALUE);
+
+    printk("PVH start info: (pa %08x)\n", pvh_start_info_pa);
+    printk("  version:    %u\n", pvh_info->version);
+    printk("  flags:      %#"PRIx32"\n", pvh_info->flags);
+    printk("  nr_modules: %u\n", pvh_info->nr_modules);
+    printk("  modlist_pa: %016"PRIx64"\n", pvh_info->modlist_paddr);
+    printk("  cmdline_pa: %016"PRIx64"\n", pvh_info->cmdline_paddr);
+    if ( pvh_info->cmdline_paddr )
+        printk("  cmdline:    '%s'\n",
+               (char *)__va(pvh_info->cmdline_paddr));
+    printk("  rsdp_pa:    %016"PRIx64"\n", pvh_info->rsdp_paddr);
+
+    entry = __va(pvh_info->modlist_paddr);
+    for ( i = 0; i < pvh_info->nr_modules; i++ )
+    {
+        printk("    mod[%u].pa:         %016"PRIx64"\n", i, entry[i].paddr);
+        printk("    mod[%u].size:       %016"PRIu64"\n", i, entry[i].size);
+        printk("    mod[%u].cmdline_pa: %016"PRIx64"\n",
+               i, entry[i].cmdline_paddr);
+        if ( entry[i].cmdline_paddr )
+            printk("    mod[%u].cmdline:    '%s'\n", i,
+                   (char *)__va(entry[i].cmdline_paddr));
+    }
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 2e10c6bdf4..4b8d09b751 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -51,6 +51,7 @@
 #include <asm/alternative.h>
 #include <asm/mc146818rtc.h>
 #include <asm/cpuid.h>
+#include <asm/guest.h>
 
 /* opt_nosmp: If true, secondary processors are ignored. */
 static bool __initdata opt_nosmp;
@@ -649,8 +650,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     char *memmap_type = NULL;
     char *cmdline, *kextra, *loader;
     unsigned int initrdidx, domcr_flags = DOMCRF_s3_integrity;
-    multiboot_info_t *mbi = __va(mbi_p);
-    module_t *mod = (module_t *)__va(mbi->mods_addr);
+    multiboot_info_t *mbi;
+    module_t *mod;
     unsigned long nr_pages, raw_max_page, modules_headroom, *module_map;
     int i, j, e820_warn = 0, bytes = 0;
     bool acpi_boot_table_init_done = false, relocated = false;
@@ -680,6 +681,16 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
     /* Full exception support from here on in. */
 
+    if ( pvh_boot )
+    {
+        ASSERT(mbi_p == 0);
+        mbi = pvh_init();
+    }
+    else
+        mbi = __va(mbi_p);
+
+    mod = __va(mbi->mods_addr);
+
     loader = (mbi->flags & MBI_LOADERNAME)
         ? (char *)__va(mbi->boot_loader_name) : "unknown";
 
@@ -719,6 +730,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     ehci_dbgp_init();
     console_init_preirq();
 
+    if ( pvh_boot )
+        pvh_print_info();
+
     printk("Bootloader: %s\n", loader);
 
     printk("Command line: %s\n", cmdline);
diff --git a/xen/include/asm-x86/guest.h b/xen/include/asm-x86/guest.h
new file mode 100644
index 0000000000..630c092c25
--- /dev/null
+++ b/xen/include/asm-x86/guest.h
@@ -0,0 +1,34 @@
+/******************************************************************************
+ * asm-x86/guest.h
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (c) 2017 Citrix Systems Ltd.
+ */
+
+#ifndef __X86_GUEST_H__
+#define __X86_GUEST_H__
+
+#include <asm/guest/pvh-boot.h>
+
+#endif /* __X86_GUEST_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/guest/pvh-boot.h b/xen/include/asm-x86/guest/pvh-boot.h
new file mode 100644
index 0000000000..1b429f9401
--- /dev/null
+++ b/xen/include/asm-x86/guest/pvh-boot.h
@@ -0,0 +1,57 @@
+/******************************************************************************
+ * asm-x86/guest/pvh-boot.h
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (c) 2017 Citrix Systems Ltd.
+ */
+
+#ifndef __X86_PVH_BOOT_H__
+#define __X86_PVH_BOOT_H__
+
+#include <xen/multiboot.h>
+
+#ifdef CONFIG_PVH_GUEST
+
+extern bool pvh_boot;
+
+multiboot_info_t *pvh_init(void);
+void pvh_print_info(void);
+
+#else
+
+#define pvh_boot 0
+
+static inline multiboot_info_t *pvh_init(void)
+{
+    ASSERT_UNREACHABLE();
+    return NULL;
+}
+
+static inline void pvh_print_info(void)
+{
+    ASSERT_UNREACHABLE();
+}
+
+#endif /* CONFIG_PVH_GUEST */
+#endif /* __X86_PVH_BOOT_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.11.0

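convert_pvh_info() translates the PVH start info into a Multiboot info structure, so the rest of __start_xen() can stay unchanged. The sketch below restates that translation using simplified stand-in structs. These are hypothetical; the real layouts are in public/arch-x86/hvm/start_info.h and xen/multiboot.h. The flag bits follow the Multiboot 1 info layout. The sketch deliberately differs from the patch in three ways:
- it returns an error instead of ASSERTing;
- it accepts up to 32 modules, where the patch asserts nr_modules < 32;
- it checks that each module's end address fits below 4GB, not just its start, because mod_end is also a 32-bit field.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the fields convert_pvh_info() uses. */
struct modlist_entry { uint64_t paddr, size, cmdline_paddr; };
struct start_info    { uint32_t nr_modules; uint64_t modlist_paddr, cmdline_paddr; };
struct mb_module     { uint32_t mod_start, mod_end, string; };
struct mb_info       { uint32_t flags, cmdline, mods_count; };

#define MBI_CMDLINE  (1u << 2)   /* Multiboot 1: cmdline field valid */
#define MBI_MODULES  (1u << 3)   /* Multiboot 1: mods_* fields valid */
#define MAX_MODS     32
#define BELOW_4G(x)  (((uint64_t)(x) >> 32) == 0)

/*
 * ents stands in for __va(si->modlist_paddr).  Returns 0 on success,
 * -1 if the input cannot be expressed in the 32-bit Multiboot fields.
 */
static int convert(const struct start_info *si,
                   const struct modlist_entry *ents,
                   struct mb_info *mbi, struct mb_module mods[MAX_MODS])
{
    if ( !BELOW_4G(si->cmdline_paddr) || si->nr_modules > MAX_MODS )
        return -1;

    mbi->flags = MBI_CMDLINE | MBI_MODULES;
    mbi->cmdline = (uint32_t)si->cmdline_paddr;
    mbi->mods_count = si->nr_modules;

    for ( uint32_t i = 0; i < si->nr_modules; i++ )
    {
        uint64_t end = ents[i].paddr + ents[i].size;

        /* mod_end is 32 bits too, so check the end, not only the start. */
        if ( !BELOW_4G(end) || !BELOW_4G(ents[i].cmdline_paddr) )
            return -1;
        mods[i].mod_start = (uint32_t)ents[i].paddr;
        mods[i].mod_end   = (uint32_t)end;
        mods[i].string    = (uint32_t)ents[i].cmdline_paddr;
    }
    return 0;
}
```
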

* [PATCH RFC v1 22/74] x86/boot: Map more than the first 16MB
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (20 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 21/74] x86/entry: Early PVH boot code Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-04 13:05 ` [PATCH RFC v1 23/74] x86/entry: Probe for Xen early during boot Wei Liu
                   ` (53 subsequent siblings)
  75 siblings, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Andrew Cooper <andrew.cooper3@citrix.com>

TODO: Replace somehow (bootstrap_map() ?)

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/boot/x86_64.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/xen/arch/x86/boot/x86_64.S b/xen/arch/x86/boot/x86_64.S
index cf47e019f5..42636cf334 100644
--- a/xen/arch/x86/boot/x86_64.S
+++ b/xen/arch/x86/boot/x86_64.S
@@ -114,11 +114,10 @@ GLOBAL(__page_tables_start)
 GLOBAL(l2_identmap)
         .quad sym_offs(l1_identmap) + __PAGE_HYPERVISOR
         idx = 1
-        .rept 7
+        .rept 4 * L2_PAGETABLE_ENTRIES - 1
         .quad (idx << L2_PAGETABLE_SHIFT) | PAGE_HYPERVISOR | _PAGE_PSE
         idx = idx + 1
         .endr
-        .fill 4 * L2_PAGETABLE_ENTRIES - 8, 8, 0
         .size l2_identmap, . - l2_identmap
 
 /*
-- 
2.11.0

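Here is the arithmetic behind the new .rept count. l2_identmap spans four L2 tables of L2_PAGETABLE_ENTRIES (512) entries each, and every superpage entry maps 2MB. Previously, entry 0 (the l1_identmap pointer) plus 7 superpages covered 8 x 2MB = 16MB. Now, entry 0 plus 4 * 512 - 1 superpages cover 2048 x 2MB = 4GB. A sketch of the fill loop follows; DEMO_FLAGS is a hypothetical stand-in for PAGE_HYPERVISOR.

```c
#include <assert.h>
#include <stdint.h>

#define L2_PAGETABLE_SHIFT   21      /* each L2 entry maps 2MB on x86-64 */
#define L2_PAGETABLE_ENTRIES 512
#define PSE_BIT              0x80u   /* x86 _PAGE_PSE: entry maps a superpage */
#define DEMO_FLAGS           0x3u    /* hypothetical: present | writable only */

/*
 * Fill entries 1..n-1 the way the .rept loop does; entry 0 is left for the
 * pointer to l1_identmap.  Returns the number of bytes the n entries cover.
 */
static uint64_t fill_identmap(uint64_t *l2, unsigned int n)
{
    for ( unsigned int idx = 1; idx < n; idx++ )
        l2[idx] = ((uint64_t)idx << L2_PAGETABLE_SHIFT) | DEMO_FLAGS | PSE_BIT;
    return (uint64_t)n << L2_PAGETABLE_SHIFT;
}
```
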

* [PATCH RFC v1 23/74] x86/entry: Probe for Xen early during boot
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (21 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 22/74] x86/boot: Map more than the first 16MB Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-05 13:40   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 24/74] x86/guest: Hypercall support Wei Liu
                   ` (52 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Andrew Cooper <andrew.cooper3@citrix.com>

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/guest/Makefile     |  2 ++
 xen/arch/x86/guest/xen.c        | 75 +++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/setup.c            |  2 ++
 xen/include/asm-x86/guest.h     |  1 +
 xen/include/asm-x86/guest/xen.h | 47 ++++++++++++++++++++++++++
 5 files changed, 127 insertions(+)
 create mode 100644 xen/arch/x86/guest/xen.c
 create mode 100644 xen/include/asm-x86/guest/xen.h

diff --git a/xen/arch/x86/guest/Makefile b/xen/arch/x86/guest/Makefile
index a5f1625ab1..1345a60c81 100644
--- a/xen/arch/x86/guest/Makefile
+++ b/xen/arch/x86/guest/Makefile
@@ -1 +1,3 @@
+obj-y += xen.o
+
 obj-bin-$(CONFIG_PVH_GUEST) += pvh-boot.init.o
diff --git a/xen/arch/x86/guest/xen.c b/xen/arch/x86/guest/xen.c
new file mode 100644
index 0000000000..9446a46a94
--- /dev/null
+++ b/xen/arch/x86/guest/xen.c
@@ -0,0 +1,75 @@
+/******************************************************************************
+ * arch/x86/guest/xen.c
+ *
+ * Support for detecting and running under Xen.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (c) 2017 Citrix Systems Ltd.
+ */
+#include <xen/init.h>
+#include <xen/types.h>
+
+#include <asm/guest.h>
+#include <asm/processor.h>
+
+#include <public/arch-x86/cpuid.h>
+
+bool xen_guest;
+
+static uint32_t xen_cpuid_base;
+
+static void __init find_xen_leaves(void)
+{
+    uint32_t eax, ebx, ecx, edx, base;
+
+    for ( base = XEN_CPUID_FIRST_LEAF;
+          base < XEN_CPUID_FIRST_LEAF + 0x10000; base += 0x100 )
+    {
+        cpuid(base, &eax, &ebx, &ecx, &edx);
+
+        if ( (ebx == XEN_CPUID_SIGNATURE_EBX) &&
+             (ecx == XEN_CPUID_SIGNATURE_ECX) &&
+             (edx == XEN_CPUID_SIGNATURE_EDX) &&
+             ((eax - base) >= 2) )
+        {
+            xen_cpuid_base = base;
+            break;
+        }
+    }
+}
+
+void __init probe_hypervisor(void)
+{
+    /* Too early to use cpu_has_hypervisor */
+    if ( !(cpuid_ecx(1) & cpufeat_mask(X86_FEATURE_HYPERVISOR)) )
+        return;
+
+    find_xen_leaves();
+
+    if ( !xen_cpuid_base )
+        return;
+
+    xen_guest = true;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 4b8d09b751..d8059f23b5 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -715,6 +715,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
      * allocing any xenheap structures wanted in lower memory. */
     kexec_early_calculations();
 
+    probe_hypervisor();
+
     parse_video_info();
 
     rdmsrl(MSR_EFER, this_cpu(efer));
diff --git a/xen/include/asm-x86/guest.h b/xen/include/asm-x86/guest.h
index 630c092c25..8d91f81451 100644
--- a/xen/include/asm-x86/guest.h
+++ b/xen/include/asm-x86/guest.h
@@ -20,6 +20,7 @@
 #define __X86_GUEST_H__
 
 #include <asm/guest/pvh-boot.h>
+#include <asm/guest/xen.h>
 
 #endif /* __X86_GUEST_H__ */
 
diff --git a/xen/include/asm-x86/guest/xen.h b/xen/include/asm-x86/guest/xen.h
new file mode 100644
index 0000000000..97a7c8d531
--- /dev/null
+++ b/xen/include/asm-x86/guest/xen.h
@@ -0,0 +1,47 @@
+/******************************************************************************
+ * asm-x86/guest/xen.h
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (c) 2017 Citrix Systems Ltd.
+ */
+
+#ifndef __X86_GUEST_XEN_H__
+#define __X86_GUEST_XEN_H__
+
+#include <xen/types.h>
+
+#ifdef CONFIG_XEN_GUEST
+
+extern bool xen_guest;
+
+void probe_hypervisor(void);
+
+#else
+
+#define xen_guest 0
+
+static inline void probe_hypervisor(void) {}
+
+#endif /* CONFIG_XEN_GUEST */
+#endif /* __X86_GUEST_XEN_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.11.0

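find_xen_leaves() scans the hypervisor CPUID range in steps of 0x100 because Xen's leaves need not start at 0x40000000. For example, when Viridian leaves are also offered, Xen's move up to 0x40000100. The sketch below repeats the scan against a hypothetical mock_cpuid() callback, so it can run anywhere. The signature constants spell "XenVMMXenVMM" in little-endian order and should match public/arch-x86/cpuid.h.

```c
#include <assert.h>
#include <stdint.h>

#define XEN_CPUID_FIRST_LEAF    0x40000000u
#define XEN_CPUID_SIGNATURE_EBX 0x566e6558u   /* "XenV" */
#define XEN_CPUID_SIGNATURE_ECX 0x65584d4du   /* "MMXe" */
#define XEN_CPUID_SIGNATURE_EDX 0x4d4d566eu   /* "nVMM" */

/* Hypothetical CPUID hook so the scan can be exercised off-hardware. */
typedef void (*cpuid_fn)(uint32_t leaf, uint32_t *a, uint32_t *b,
                         uint32_t *c, uint32_t *d);

/* Same scan as find_xen_leaves(); returns the Xen base leaf or 0. */
static uint32_t find_xen_base(cpuid_fn cpuid)
{
    for ( uint32_t base = XEN_CPUID_FIRST_LEAF;
          base < XEN_CPUID_FIRST_LEAF + 0x10000; base += 0x100 )
    {
        uint32_t eax, ebx, ecx, edx;

        cpuid(base, &eax, &ebx, &ecx, &edx);
        if ( ebx == XEN_CPUID_SIGNATURE_EBX &&
             ecx == XEN_CPUID_SIGNATURE_ECX &&
             edx == XEN_CPUID_SIGNATURE_EDX &&
             (eax - base) >= 2 )       /* highest leaf at least base + 2 */
            return base;
    }
    return 0;
}

/* Mock: another hypervisor at 0x40000000, Xen's leaves at 0x40000100. */
static void mock_cpuid(uint32_t leaf, uint32_t *a, uint32_t *b,
                       uint32_t *c, uint32_t *d)
{
    *a = *b = *c = *d = 0;
    if ( leaf == 0x40000000u )
    {
        *a = 0x40000006u;
        *b = 0x7263694du;              /* "Micr", not Xen's signature */
    }
    else if ( leaf == 0x40000100u )
    {
        *a = 0x40000105u;              /* highest Xen leaf */
        *b = XEN_CPUID_SIGNATURE_EBX;
        *c = XEN_CPUID_SIGNATURE_ECX;
        *d = XEN_CPUID_SIGNATURE_EDX;
    }
}
```

With this mock, `find_xen_base(mock_cpuid)` skips the first block and reports 0x40000100.
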

* [PATCH RFC v1 24/74] x86/guest: Hypercall support
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (22 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 23/74] x86/entry: Probe for Xen early during boot Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-05 13:53   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 25/74] x86/shutdown: Support for using SCHEDOP_{shutdown, reboot} Wei Liu
                   ` (51 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Andrew Cooper <andrew.cooper3@citrix.com>

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/guest/Makefile           |  1 +
 xen/arch/x86/guest/hypercall_page.S   | 79 ++++++++++++++++++++++++++++++
 xen/arch/x86/guest/xen.c              |  5 ++
 xen/arch/x86/xen.lds.S                |  1 +
 xen/include/asm-x86/guest.h           |  1 +
 xen/include/asm-x86/guest/hypercall.h | 92 +++++++++++++++++++++++++++++++++++
 6 files changed, 179 insertions(+)
 create mode 100644 xen/arch/x86/guest/hypercall_page.S
 create mode 100644 xen/include/asm-x86/guest/hypercall.h

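In the diff below, `probe_hypervisor()` registers `hypercall_page` by writing its physical address to the MSR whose index Xen reports in EBX of CPUID leaf `xen_cpuid_base + 2`. After that, each hypercall owns a 32-byte stub in the page. Both `DECLARE_HYPERCALL` and the `_hypercall64_*` macros address a stub as `hypercall_page + nr * 32`. The sketch below only illustrates that offset arithmetic. The hypercall numbers are copied by hand from `public/xen.h` as an assumption rather than included, and `stub_offset()` is a hypothetical helper:

```c
#include <assert.h>

#define PAGE_SIZE_4K        4096u
#define HYPERCALL_STUB_SIZE 32u      /* bytes reserved per hypercall stub */

/* Hypercall numbers as assumed from Xen's public/xen.h. */
enum {
    HYPERVISOR_memory_op  = 12,
    HYPERVISOR_console_io = 18,
    HYPERVISOR_sched_op   = 29,
    HYPERVISOR_arch_0     = 48,
};

/* Byte offset of hypercall 'nr' within the hypercall page. */
static unsigned int stub_offset(unsigned int nr)
{
    /* One page has room for PAGE_SIZE / 32 = 128 stubs. */
    assert(nr < PAGE_SIZE_4K / HYPERCALL_STUB_SIZE);
    return nr * HYPERCALL_STUB_SIZE;
}
```

Until the MSR write happens, the page holds nothing but 0xc3 (`ret`) bytes. A premature call through any stub therefore returns at once instead of running uninitialised memory.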
diff --git a/xen/arch/x86/guest/Makefile b/xen/arch/x86/guest/Makefile
index 1345a60c81..26fb4b1007 100644
--- a/xen/arch/x86/guest/Makefile
+++ b/xen/arch/x86/guest/Makefile
@@ -1,3 +1,4 @@
+obj-y += hypercall_page.o
 obj-y += xen.o
 
 obj-bin-$(CONFIG_PVH_GUEST) += pvh-boot.init.o
diff --git a/xen/arch/x86/guest/hypercall_page.S b/xen/arch/x86/guest/hypercall_page.S
new file mode 100644
index 0000000000..fdd2e72272
--- /dev/null
+++ b/xen/arch/x86/guest/hypercall_page.S
@@ -0,0 +1,79 @@
+#include <asm/page.h>
+#include <asm/asm_defns.h>
+#include <public/xen.h>
+
+        .section ".text.page_aligned", "ax", @progbits
+        .p2align PAGE_SHIFT
+
+GLOBAL(hypercall_page)
+        /* Poisoned with `ret` for safety before hypercalls are set up. */
+        .fill PAGE_SIZE, 1, 0xc3
+        .type hypercall_page, STT_OBJECT
+        .size hypercall_page, PAGE_SIZE
+
+/*
+ * Identify a specific hypercall in the hypercall page
+ * @param name Hypercall name.
+ */
+#define DECLARE_HYPERCALL(name)                                                 \
+        .globl HYPERCALL_ ## name;                                              \
+        .set   HYPERCALL_ ## name, hypercall_page + __HYPERVISOR_ ## name * 32; \
+        .type  HYPERCALL_ ## name, STT_FUNC;                                    \
+        .size  HYPERCALL_ ## name, 32
+
+DECLARE_HYPERCALL(set_trap_table)
+DECLARE_HYPERCALL(mmu_update)
+DECLARE_HYPERCALL(set_gdt)
+DECLARE_HYPERCALL(stack_switch)
+DECLARE_HYPERCALL(set_callbacks)
+DECLARE_HYPERCALL(fpu_taskswitch)
+DECLARE_HYPERCALL(sched_op_compat)
+DECLARE_HYPERCALL(platform_op)
+DECLARE_HYPERCALL(set_debugreg)
+DECLARE_HYPERCALL(get_debugreg)
+DECLARE_HYPERCALL(update_descriptor)
+DECLARE_HYPERCALL(memory_op)
+DECLARE_HYPERCALL(multicall)
+DECLARE_HYPERCALL(update_va_mapping)
+DECLARE_HYPERCALL(set_timer_op)
+DECLARE_HYPERCALL(event_channel_op_compat)
+DECLARE_HYPERCALL(xen_version)
+DECLARE_HYPERCALL(console_io)
+DECLARE_HYPERCALL(physdev_op_compat)
+DECLARE_HYPERCALL(grant_table_op)
+DECLARE_HYPERCALL(vm_assist)
+DECLARE_HYPERCALL(update_va_mapping_otherdomain)
+DECLARE_HYPERCALL(iret)
+DECLARE_HYPERCALL(vcpu_op)
+DECLARE_HYPERCALL(set_segment_base)
+DECLARE_HYPERCALL(mmuext_op)
+DECLARE_HYPERCALL(xsm_op)
+DECLARE_HYPERCALL(nmi_op)
+DECLARE_HYPERCALL(sched_op)
+DECLARE_HYPERCALL(callback_op)
+DECLARE_HYPERCALL(xenoprof_op)
+DECLARE_HYPERCALL(event_channel_op)
+DECLARE_HYPERCALL(physdev_op)
+DECLARE_HYPERCALL(hvm_op)
+DECLARE_HYPERCALL(sysctl)
+DECLARE_HYPERCALL(domctl)
+DECLARE_HYPERCALL(kexec_op)
+DECLARE_HYPERCALL(tmem_op)
+DECLARE_HYPERCALL(xc_reserved_op)
+DECLARE_HYPERCALL(xenpmu_op)
+
+DECLARE_HYPERCALL(arch_0)
+DECLARE_HYPERCALL(arch_1)
+DECLARE_HYPERCALL(arch_2)
+DECLARE_HYPERCALL(arch_3)
+DECLARE_HYPERCALL(arch_4)
+DECLARE_HYPERCALL(arch_5)
+DECLARE_HYPERCALL(arch_6)
+DECLARE_HYPERCALL(arch_7)
+
+/*
+ * Local variables:
+ * tab-width: 8
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/guest/xen.c b/xen/arch/x86/guest/xen.c
index 9446a46a94..c5b43414c9 100644
--- a/xen/arch/x86/guest/xen.c
+++ b/xen/arch/x86/guest/xen.c
@@ -22,6 +22,7 @@
 #include <xen/types.h>
 
 #include <asm/guest.h>
+#include <asm/msr.h>
 #include <asm/processor.h>
 
 #include <public/arch-x86/cpuid.h>
@@ -29,6 +30,7 @@
 bool xen_guest;
 
 static uint32_t xen_cpuid_base;
+extern char hypercall_page[];
 
 static void __init find_xen_leaves(void)
 {
@@ -61,6 +63,9 @@ void __init probe_hypervisor(void)
     if ( !xen_cpuid_base )
         return;
 
+    /* Fill the hypercall page. */
+    wrmsrl(cpuid_ebx(xen_cpuid_base + 2), __pa(hypercall_page));
+
     xen_guest = true;
 }
 
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index d880b0a61a..0410d95af2 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -65,6 +65,7 @@ SECTIONS
   DECL_SECTION(.text) {
         _stext = .;            /* Text and read-only data */
        *(.text)
+       *(.text.page_aligned)
        *(.text.cold)
        *(.text.unlikely)
        *(.fixup)
diff --git a/xen/include/asm-x86/guest.h b/xen/include/asm-x86/guest.h
index 8d91f81451..5abdb8c433 100644
--- a/xen/include/asm-x86/guest.h
+++ b/xen/include/asm-x86/guest.h
@@ -19,6 +19,7 @@
 #ifndef __X86_GUEST_H__
 #define __X86_GUEST_H__
 
+#include <asm/guest/hypercall.h>
 #include <asm/guest/pvh-boot.h>
 #include <asm/guest/xen.h>
 
diff --git a/xen/include/asm-x86/guest/hypercall.h b/xen/include/asm-x86/guest/hypercall.h
new file mode 100644
index 0000000000..c460f59c54
--- /dev/null
+++ b/xen/include/asm-x86/guest/hypercall.h
@@ -0,0 +1,92 @@
+/******************************************************************************
+ * asm-x86/guest/hypercall.h
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (c) 2017 Citrix Systems Ltd.
+ */
+
+#ifndef __X86_XEN_HYPERCALL_H__
+#define __X86_XEN_HYPERCALL_H__
+
+#ifdef CONFIG_XEN_GUEST
+
+/*
+ * Hypercall primitives for 64-bit
+ *
+ * Inputs: %rdi, %rsi, %rdx, %r10, %r8, %r9 (arguments 1-6)
+ */
+
+#define _hypercall64_1(type, hcall, a1)                                 \
+    ({                                                                  \
+        long res, tmp;                                                  \
+        asm volatile (                                                  \
+            "call hypercall_page + %c[offset]"                          \
+            : "=a" (res), "=D" (tmp)                                    \
+            : [offset] "i" (hcall * 32),                                \
+              "1" ((long)(a1))                                          \
+            : "memory" );                                               \
+        (type)res;                                                      \
+    })
+
+#define _hypercall64_2(type, hcall, a1, a2)                             \
+    ({                                                                  \
+        long res, tmp;                                                  \
+        asm volatile (                                                  \
+            "call hypercall_page + %c[offset]"                          \
+            : "=a" (res), "=D" (tmp), "=S" (tmp)                        \
+            : [offset] "i" (hcall * 32),                                \
+              "1" ((long)(a1)), "2" ((long)(a2))                        \
+            : "memory" );                                               \
+        (type)res;                                                      \
+    })
+
+#define _hypercall64_3(type, hcall, a1, a2, a3)                         \
+    ({                                                                  \
+        long res, tmp;                                                  \
+        asm volatile (                                                  \
+            "call hypercall_page + %c[offset]"                          \
+            : "=a" (res), "=D" (tmp), "=S" (tmp), "=d" (tmp)            \
+            : [offset] "i" (hcall * 32),                                \
+              "1" ((long)(a1)), "2" ((long)(a2)), "3" ((long)(a3))      \
+            : "memory" );                                               \
+        (type)res;                                                      \
+    })
+
+#define _hypercall64_4(type, hcall, a1, a2, a3, a4)                     \
+    ({                                                                  \
+        long res, tmp;                                                  \
+        register long _a4 asm ("r10") = ((long)(a4));                   \
+        asm volatile (                                                  \
+            "call hypercall_page + %c[offset]"                          \
+            : "=a" (res), "=D" (tmp), "=S" (tmp), "=d" (tmp),           \
+              "+r" (_a4)                                                \
+            : [offset] "i" (hcall * 32),                                \
+              "1" ((long)(a1)), "2" ((long)(a2)), "3" ((long)(a3))      \
+              /* a4 is passed in %r10 via the "+r" (_a4) operand */     \
+            : "memory" );                                               \
+        (type)res;                                                      \
+    })
+
+#endif /* CONFIG_XEN_GUEST */
+#endif /* __X86_XEN_HYPERCALL_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.11.0


* [PATCH RFC v1 25/74] x86/shutdown: Support for using SCHEDOP_{shutdown, reboot}
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (23 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 24/74] x86/guest: Hypercall support Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-05 14:01   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 26/74] x86/pvh: Retrieve memory map from Xen Wei Liu
                   ` (50 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Andrew Cooper <andrew.cooper3@citrix.com>

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 docs/misc/xen-command-line.markdown   |  3 +++
 xen/arch/x86/shutdown.c               | 34 ++++++++++++++++++++++++++++++----
 xen/include/asm-x86/guest/hypercall.h | 29 +++++++++++++++++++++++++++++
 3 files changed, 62 insertions(+), 4 deletions(-)

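After this patch, `default_reboot_type()` only chooses a method when none was given with `reboot=`. Its order of preference is:

- Xen's `SCHEDOP_shutdown` when running as a guest.
- EFI runtime services.
- The keyboard controller if ACPI is disabled.
- ACPI otherwise.

`set_reboot_type()` also discards `reboot=x` when no hypervisor was detected. The sketch below reproduces that decision logic in standalone form. The booleans stand in for `xen_guest`, `efi_enabled(EFI_RS)` and `acpi_disabled`. The enum is simplified (the real one maps each method to its command-line letter), and both function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified: the real enum maps each method to its reboot= letter. */
enum demo_reboot_type {
    DEMO_BOOT_INVALID,
    DEMO_BOOT_KBD,
    DEMO_BOOT_ACPI,
    DEMO_BOOT_EFI,
    DEMO_BOOT_XEN,
};

/* Mirrors the new check in set_reboot_type(): reboot=x needs a hypervisor. */
static enum demo_reboot_type demo_validate(enum demo_reboot_type requested,
                                           bool xen_guest)
{
    if ( requested == DEMO_BOOT_XEN && !xen_guest )
        return DEMO_BOOT_INVALID;      /* fall back to the default choice */
    return requested;
}

/* Mirrors default_reboot_type(): keep an explicit choice, else use priority. */
static enum demo_reboot_type demo_default(enum demo_reboot_type current,
                                          bool xen_guest, bool efi_rs,
                                          bool acpi_disabled)
{
    if ( current != DEMO_BOOT_INVALID )
        return current;
    if ( xen_guest )
        return DEMO_BOOT_XEN;
    if ( efi_rs )
        return DEMO_BOOT_EFI;
    if ( acpi_disabled )
        return DEMO_BOOT_KBD;
    return DEMO_BOOT_ACPI;
}
```

Chaining the two, `demo_default(demo_validate(DEMO_BOOT_XEN, false), false, false, false)` yields `DEMO_BOOT_ACPI`. This is the same fallback that the patch announces with its "Falling back to default" message.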
diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 781110d4b2..e5979bceee 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1478,6 +1478,9 @@ Specify the host reboot method.
 'efi' instructs Xen to reboot using the EFI reboot call (in EFI mode by
  default it will use that method first).
 
+`xen` instructs Xen to reboot using Xen's SCHEDOP hypercall (this is the default
+when running nested Xen).
+
 ### rmrr
 > '= start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]]
 
diff --git a/xen/arch/x86/shutdown.c b/xen/arch/x86/shutdown.c
index a87aa60add..689f6f137d 100644
--- a/xen/arch/x86/shutdown.c
+++ b/xen/arch/x86/shutdown.c
@@ -25,6 +25,7 @@
 #include <asm/mpspec.h>
 #include <asm/tboot.h>
 #include <asm/apic.h>
+#include <asm/guest.h>
 
 enum reboot_type {
         BOOT_INVALID,
@@ -34,6 +35,7 @@ enum reboot_type {
         BOOT_CF9 = 'p',
         BOOT_CF9_PWR = 'P',
         BOOT_EFI = 'e',
+        BOOT_XEN = 'x',
 };
 
 static int reboot_mode;
@@ -49,6 +51,7 @@ static int reboot_mode;
  * pci    Use the so-called "PCI reset register", CF9
  * Power  Like 'pci' but for a full power-cyle reset
  * efi    Use the EFI reboot (if running under EFI)
+ * xen    Use the Xen SCHEDOP hypercall (if running under Xen as a guest)
  */
 static enum reboot_type reboot_type = BOOT_INVALID;
 
@@ -75,6 +78,7 @@ static int __init set_reboot_type(const char *str)
         case 'P':
         case 'p':
         case 't':
+        case 'x':
             reboot_type = *str;
             break;
         default:
@@ -93,6 +97,13 @@ static int __init set_reboot_type(const char *str)
         reboot_type = BOOT_INVALID;
     }
 
+    if ( reboot_type == BOOT_XEN && !xen_guest )
+    {
+        printk("Xen reboot selected, but Xen hypervisor not detected\n"
+               "Falling back to default\n");
+        reboot_type = BOOT_INVALID;
+    }
+
     return rc;
 }
 custom_param("reboot", set_reboot_type);
@@ -109,6 +120,10 @@ static inline void kb_wait(void)
 static void noreturn __machine_halt(void *unused)
 {
     local_irq_disable();
+
+    if ( reboot_type == BOOT_XEN )
+        xen_hypercall_shutdown(SHUTDOWN_poweroff);
+
     for ( ; ; )
         halt();
 }
@@ -129,10 +144,17 @@ void machine_halt(void)
 
 static void default_reboot_type(void)
 {
-    if ( reboot_type == BOOT_INVALID )
-        reboot_type = efi_enabled(EFI_RS) ? BOOT_EFI
-                                  : acpi_disabled ? BOOT_KBD
-                                                  : BOOT_ACPI;
+    if ( reboot_type != BOOT_INVALID )
+        return;
+
+    if ( xen_guest )
+        reboot_type = BOOT_XEN;
+    else if ( efi_enabled(EFI_RS) )
+        reboot_type = BOOT_EFI;
+    else if ( acpi_disabled )
+        reboot_type = BOOT_KBD;
+    else
+        reboot_type = BOOT_ACPI;
 }
 
 static int __init override_reboot(struct dmi_system_id *d)
@@ -618,6 +640,10 @@ void machine_restart(unsigned int delay_millisecs)
             }
             reboot_type = BOOT_ACPI;
             break;
+
+        case BOOT_XEN:
+            xen_hypercall_shutdown(SHUTDOWN_reboot);
+            break;
         }
     }
 }
diff --git a/xen/include/asm-x86/guest/hypercall.h b/xen/include/asm-x86/guest/hypercall.h
index c460f59c54..38791088fb 100644
--- a/xen/include/asm-x86/guest/hypercall.h
+++ b/xen/include/asm-x86/guest/hypercall.h
@@ -19,6 +19,11 @@
 #ifndef __X86_XEN_HYPERCALL_H__
 #define __X86_XEN_HYPERCALL_H__
 
+#include <xen/types.h>
+
+#include <public/xen.h>
+#include <public/sched.h>
+
 #ifdef CONFIG_XEN_GUEST
 
 /*
@@ -78,6 +83,30 @@
         (type)res;                                                      \
     })
 
+/*
+ * Primitive hypercall wrappers
+ */
+static inline long xen_hypercall_sched_op(unsigned int cmd, void *arg)
+{
+    return _hypercall64_2(long, __HYPERVISOR_sched_op, cmd, arg);
+}
+
+/*
+ * Higher level hypercall helpers
+ */
+static inline long xen_hypercall_shutdown(unsigned int reason)
+{
+    return xen_hypercall_sched_op(SCHEDOP_shutdown, &reason);
+}
+
+#else /* CONFIG_XEN_GUEST */
+
+static inline long xen_hypercall_shutdown(unsigned int reason)
+{
+    ASSERT_UNREACHABLE();
+    return 0;
+}
+
 #endif /* CONFIG_XEN_GUEST */
 #endif /* __X86_XEN_HYPERCALL_H__ */
 
-- 
2.11.0


* [PATCH RFC v1 26/74] x86/pvh: Retrieve memory map from Xen
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (24 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 25/74] x86/shutdown: Support for using SCHEDOP_{shutdown, reboot} Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-05 14:05   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 27/74] xen/console: Introduce console=xen Wei Liu
                   ` (49 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/e820.c                   |  3 +--
 xen/arch/x86/guest/pvh-boot.c         | 21 +++++++++++++++++++++
 xen/arch/x86/guest/xen.c              |  3 +++
 xen/arch/x86/setup.c                  |  7 ++++++-
 xen/include/asm-x86/e820.h            |  1 +
 xen/include/asm-x86/guest/hypercall.h |  5 +++++
 6 files changed, 37 insertions(+), 3 deletions(-)

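With this patch, `pvh_init()` fills `e820_raw` from the `XENMEM_memory_map` hypercall. It then calls `sanitize_e820_map()`, which the patch exports, because some toolstacks return the entries unsorted. The real function also resolves overlapping ranges by type. The sketch below shows only the ordering step, using `qsort()` on a hypothetical `struct demo_e820entry` that mirrors the address/size/type fields of an E820 entry:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;   /* e.g. 1 = RAM, 2 = reserved */
};

/* Order entries by start address without risking subtraction overflow. */
static int cmp_e820_addr(const void *a, const void *b)
{
    const struct demo_e820entry *x = a, *y = b;

    return (x->addr > y->addr) - (x->addr < y->addr);
}

/* Sort only: no merging or overlap resolution, unlike sanitize_e820_map(). */
static void sort_e820(struct demo_e820entry *map, unsigned int nr)
{
    assert(map != NULL);
    qsort(map, nr, sizeof(*map), cmp_e820_addr);
}
```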
diff --git a/xen/arch/x86/e820.c b/xen/arch/x86/e820.c
index 7c572bade2..b422a684ee 100644
--- a/xen/arch/x86/e820.c
+++ b/xen/arch/x86/e820.c
@@ -134,8 +134,7 @@ static struct change_member *change_point[2*E820MAX] __initdata;
 static struct e820entry *overlap_list[E820MAX] __initdata;
 static struct e820entry new_bios[E820MAX] __initdata;
 
-static int __init sanitize_e820_map(struct e820entry *biosmap,
-                                    unsigned int *pnr_map)
+int __init sanitize_e820_map(struct e820entry *biosmap, unsigned int *pnr_map)
 {
     struct change_member *change_tmp;
     unsigned long current_type, last_type;
diff --git a/xen/arch/x86/guest/pvh-boot.c b/xen/arch/x86/guest/pvh-boot.c
index b93761f948..da213dfee0 100644
--- a/xen/arch/x86/guest/pvh-boot.c
+++ b/xen/arch/x86/guest/pvh-boot.c
@@ -22,6 +22,7 @@
 #include <xen/lib.h>
 #include <xen/mm.h>
 
+#include <asm/e820.h>
 #include <asm/guest.h>
 
 #include <public/arch-x86/hvm/start_info.h>
@@ -69,10 +70,30 @@ static void __init convert_pvh_info(void)
     }
 }
 
+static void __init get_memory_map(void)
+{
+    struct xen_memory_map memmap = {
+        .nr_entries = E820MAX,
+        .buffer.p = e820_raw.map,
+    };
+    int rc = xen_hypercall_memory_op(XENMEM_memory_map, &memmap);
+
+    ASSERT(rc == 0);
+    e820_raw.nr_map = memmap.nr_entries;
+
+    /* :( Various toolstacks don't sort the memory map. */
+    sanitize_e820_map(e820_raw.map, &e820_raw.nr_map);
+}
+
 multiboot_info_t *__init pvh_init(void)
 {
     convert_pvh_info();
 
+    probe_hypervisor();
+    ASSERT(xen_guest);
+
+    get_memory_map();
+
     return &pvh_mbi;
 }
 
diff --git a/xen/arch/x86/guest/xen.c b/xen/arch/x86/guest/xen.c
index c5b43414c9..152e471c06 100644
--- a/xen/arch/x86/guest/xen.c
+++ b/xen/arch/x86/guest/xen.c
@@ -54,6 +54,9 @@ static void __init find_xen_leaves(void)
 
 void __init probe_hypervisor(void)
 {
+    if ( xen_guest )
+        return;
+
     /* Too early to use cpu_has_hypervisor */
     if ( !(cpuid_ecx(1) & cpufeat_mask(X86_FEATURE_HYPERVISOR)) )
         return;
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index d8059f23b5..edb43bf2cb 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -795,7 +795,12 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     if ( !(mbi->flags & MBI_MODULES) || (mbi->mods_count == 0) )
         panic("dom0 kernel not specified. Check bootloader configuration.");
 
-    if ( efi_enabled(EFI_LOADER) )
+    if ( pvh_boot )
+    {
+        /* pvh_init() already filled in e820_raw */
+        memmap_type = "PVH-e820";
+    }
+    else if ( efi_enabled(EFI_LOADER) )
     {
         set_pdx_range(xen_phys_start >> PAGE_SHIFT,
                       (xen_phys_start + BOOTSTRAP_MAP_BASE) >> PAGE_SHIFT);
diff --git a/xen/include/asm-x86/e820.h b/xen/include/asm-x86/e820.h
index 28defa8545..ee317b17aa 100644
--- a/xen/include/asm-x86/e820.h
+++ b/xen/include/asm-x86/e820.h
@@ -23,6 +23,7 @@ struct e820map {
     struct e820entry map[E820MAX];
 };
 
+extern int sanitize_e820_map(struct e820entry *biosmap, unsigned int *pnr_map);
 extern int e820_all_mapped(u64 start, u64 end, unsigned type);
 extern int reserve_e820_ram(struct e820map *e820, uint64_t s, uint64_t e);
 extern int e820_change_range_type(
diff --git a/xen/include/asm-x86/guest/hypercall.h b/xen/include/asm-x86/guest/hypercall.h
index 38791088fb..4bb749f240 100644
--- a/xen/include/asm-x86/guest/hypercall.h
+++ b/xen/include/asm-x86/guest/hypercall.h
@@ -91,6 +91,11 @@ static inline long xen_hypercall_sched_op(unsigned int cmd, void *arg)
     return _hypercall64_2(long, __HYPERVISOR_sched_op, cmd, arg);
 }
 
+static inline long xen_hypercall_memory_op(unsigned int cmd, void *arg)
+{
+    return _hypercall64_2(long, __HYPERVISOR_memory_op, cmd, arg);
+}
+
 /*
  * Higher level hypercall helpers
  */
-- 
2.11.0


* [PATCH RFC v1 27/74] xen/console: Introduce console=xen
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (25 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 26/74] x86/pvh: Retrieve memory map from Xen Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-05 14:08   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 28/74] x86: initialise shared_info page Wei Liu
                   ` (48 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Andrew Cooper <andrew.cooper3@citrix.com>

Add a console=xen option that selects Xen-specific console output.
There are two variants: the hypervisor console (used when running as a
Xen guest) and the magic debug port 0xe9 (used otherwise).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/drivers/char/console.c            | 33 +++++++++++++++++++++++++++++++++
 xen/include/asm-x86/guest/hypercall.h | 13 +++++++++++++
 2 files changed, 46 insertions(+)

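`console_init_preirq()` walks the comma-separated `console=` value. With this patch, it sets `opt_console_xen` when it meets `xen`. At output time that flag routes text to the `console_io` hypercall when running as a Xen guest, and to I/O port 0xe9 otherwise. The userspace sketch below covers only the parsing step; port I/O and hypercalls are left out. The names are hypothetical, and the loop keeps the original's `strchr()` walk and prefix matching with `strncmp()`:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct demo_console_opts {
    bool vga;
    bool xen;
    unsigned int other;   /* tokens that would go to serial_parse_handle() */
};

/* Walk a comma-separated list such as "com1,vga,xen". */
static void demo_parse_console(const char *list, struct demo_console_opts *o)
{
    const char *p;

    assert(list != NULL && o != NULL);
    memset(o, 0, sizeof(*o));
    for ( p = list; p != NULL; p = strchr(p, ',') )
    {
        if ( *p == ',' )
            p++;
        if ( !strncmp(p, "vga", 3) )
            o->vga = true;
        else if ( !strncmp(p, "xen", 3) )
            o->xen = true;          /* console=xen */
        else if ( !strncmp(p, "none", 4) )
            continue;
        else
            o->other++;
    }
}
```

Because tokens are matched by prefix, as in the patch, a token such as `xenfoo` would also enable the Xen console.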
diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
index 19d0e74f17..51c1454b8e 100644
--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -30,6 +30,7 @@
 #include <xen/hypercall.h> /* for do_console_io */
 #include <xen/early_printk.h>
 #include <xen/warning.h>
+#include <asm/guest.h>
 
 /* console: comma-separated list of console outputs. */
 static char __initdata opt_console[30] = OPT_CONSOLE_STR;
@@ -83,6 +84,8 @@ static uint32_t conringc, conringp;
 
 static int __read_mostly sercon_handle = -1;
 
+static bool __read_mostly opt_console_xen; /* console=xen */
+
 static DEFINE_SPINLOCK(console_lock);
 
 /*
@@ -432,6 +435,14 @@ static void notify_dom0_con_ring(unsigned long unused)
 static DECLARE_SOFTIRQ_TASKLET(notify_dom0_con_ring_tasklet,
                                notify_dom0_con_ring, 0);
 
+static inline void xen_console_write_debug_port(const char *buf, size_t len)
+{
+    unsigned long tmp;
+    asm volatile ( "rep outsb;"
+                   : "=&S" (tmp), "=&c" (tmp)
+                   : "0" (buf), "1" (len), "d" (0xe9) );
+}
+
 static long guest_console_write(XEN_GUEST_HANDLE_PARAM(char) buffer, int count)
 {
     char kbuf[128];
@@ -458,6 +469,16 @@ static long guest_console_write(XEN_GUEST_HANDLE_PARAM(char) buffer, int count)
             sercon_puts(kbuf);
             video_puts(kbuf);
 
+            if ( opt_console_xen )
+            {
+                size_t len = strlen(kbuf);
+
+                if ( xen_guest )
+                    xen_hypercall_console_write(kbuf, len);
+                else
+                    xen_console_write_debug_port(kbuf, len);
+            }
+
             if ( opt_console_to_ring )
             {
                 conring_puts(kbuf);
@@ -567,6 +588,16 @@ static void __putstr(const char *str)
     sercon_puts(str);
     video_puts(str);
 
+    if ( opt_console_xen )
+    {
+        size_t len = strlen(str);
+
+        if ( xen_guest )
+            xen_hypercall_console_write(str, len);
+        else
+            xen_console_write_debug_port(str, len);
+    }
+
     conring_puts(str);
 
     if ( !console_locks_busted )
@@ -762,6 +793,8 @@ void __init console_init_preirq(void)
             p++;
         if ( !strncmp(p, "vga", 3) )
             video_init();
+        else if ( !strncmp(p, "xen", 3) )
+            opt_console_xen = true;
         else if ( !strncmp(p, "none", 4) )
             continue;
         else if ( (sh = serial_parse_handle(p)) >= 0 )
diff --git a/xen/include/asm-x86/guest/hypercall.h b/xen/include/asm-x86/guest/hypercall.h
index 4bb749f240..d5fe535c03 100644
--- a/xen/include/asm-x86/guest/hypercall.h
+++ b/xen/include/asm-x86/guest/hypercall.h
@@ -99,6 +99,13 @@ static inline long xen_hypercall_memory_op(unsigned int cmd, void *arg)
 /*
  * Higher level hypercall helpers
  */
+static inline void xen_hypercall_console_write(
+    const char *buf, unsigned int count)
+{
+    (void)_hypercall64_3(long, __HYPERVISOR_console_io,
+                         CONSOLEIO_write, count, buf);
+}
+
 static inline long xen_hypercall_shutdown(unsigned int reason)
 {
     return xen_hypercall_sched_op(SCHEDOP_shutdown, &reason);
@@ -106,6 +113,12 @@ static inline long xen_hypercall_shutdown(unsigned int reason)
 
 #else /* CONFIG_XEN_GUEST */
 
+static inline void xen_hypercall_console_write(
+    const char *buf, unsigned int count)
+{
+    ASSERT_UNREACHABLE();
+}
+
 static inline long xen_hypercall_shutdown(unsigned int reason)
 {
     ASSERT_UNREACHABLE();
-- 
2.11.0


* [PATCH RFC v1 28/74] x86: initialise shared_info page
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (26 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 27/74] xen/console: Introduce console=xen Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-05 14:11   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 29/74] x86: xen pv clock time source Wei Liu
                   ` (47 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/guest/xen.c        | 24 ++++++++++++++++++++++++
 xen/arch/x86/setup.c            |  3 +++
 xen/include/asm-x86/fixmap.h    |  3 +++
 xen/include/asm-x86/guest/xen.h | 10 ++++++++++
 4 files changed, 40 insertions(+)

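`map_shared_info()` in the diff below picks a fixed guest-physical address, 0xff000000, which the patch marks as a TODO. It reserves one page there in the boot E820 map. It then asks Xen to back that page with shared_info via `XENMEM_add_to_physmap`, whose `gpfn` field takes a frame number rather than a byte address. The page is afterwards reached through the `FIX_XEN_SHARED_INFO` fixmap slot. The sketch below covers the address arithmetic only, assuming 4 KiB pages; `DEMO_PAGE_SHIFT` and both helpers are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12                        /* 4 KiB pages assumed */
#define DEMO_PAGE_SIZE  (UINT64_C(1) << DEMO_PAGE_SHIFT)

/* Guest frame number for a page-aligned guest-physical address. */
static uint64_t demo_addr_to_gpfn(uint64_t paddr)
{
    assert((paddr & (DEMO_PAGE_SIZE - 1)) == 0);  /* must be page aligned */
    return paddr >> DEMO_PAGE_SHIFT;
}

/* End address handed to e820_add_range(): frame + PAGE_SIZE. */
static uint64_t demo_reserved_end(uint64_t paddr)
{
    return paddr + DEMO_PAGE_SIZE;
}
```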
diff --git a/xen/arch/x86/guest/xen.c b/xen/arch/x86/guest/xen.c
index 152e471c06..594eae0828 100644
--- a/xen/arch/x86/guest/xen.c
+++ b/xen/arch/x86/guest/xen.c
@@ -72,6 +72,30 @@ void __init probe_hypervisor(void)
     xen_guest = true;
 }
 
+static void map_shared_info(struct e820map *e820)
+{
+    paddr_t frame = 0xff000000; /* TODO: Hardcoded beside magic frames. */
+    struct xen_add_to_physmap xatp = {
+        .domid = DOMID_SELF,
+        .idx = 0,
+        .space = XENMAPSPACE_shared_info,
+        .gpfn = frame >> PAGE_SHIFT,
+    };
+
+    if ( !e820_add_range(e820, frame, frame + PAGE_SIZE, E820_RESERVED) )
+        panic("Failed to reserve shared_info range");
+
+    if ( xen_hypercall_memory_op(XENMEM_add_to_physmap, &xatp) )
+        panic("Failed to map shared_info page");
+
+    set_fixmap(FIX_XEN_SHARED_INFO, frame);
+}
+
+void __init hypervisor_early_setup(struct e820map *e820)
+{
+    map_shared_info(e820);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index edb43bf2cb..353cdd4337 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -892,6 +892,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     /* Create a temporary copy of the E820 map. */
     memcpy(&boot_e820, &e820, sizeof(e820));
 
+    if ( xen_guest )
+        hypervisor_early_setup(&boot_e820);
+
     /* Early kexec reservation (explicit static start address). */
     nr_pages = 0;
     for ( i = 0; i < e820.nr_map; i++ )
diff --git a/xen/include/asm-x86/fixmap.h b/xen/include/asm-x86/fixmap.h
index 51b0e7e945..ded4ddf21b 100644
--- a/xen/include/asm-x86/fixmap.h
+++ b/xen/include/asm-x86/fixmap.h
@@ -45,6 +45,9 @@ enum fixed_addresses {
     FIX_COM_BEGIN,
     FIX_COM_END,
     FIX_EHCI_DBGP,
+#ifdef CONFIG_XEN_GUEST
+    FIX_XEN_SHARED_INFO,
+#endif /* CONFIG_XEN_GUEST */
     /* Everything else should go further down. */
     FIX_APIC_BASE,
     FIX_IO_APIC_BASE_0,
diff --git a/xen/include/asm-x86/guest/xen.h b/xen/include/asm-x86/guest/xen.h
index 97a7c8d531..2f3bcd2fe4 100644
--- a/xen/include/asm-x86/guest/xen.h
+++ b/xen/include/asm-x86/guest/xen.h
@@ -21,17 +21,27 @@
 
 #include <xen/types.h>
 
+#include <asm/e820.h>
+#include <asm/fixmap.h>
+
+#define XEN_shared_info ((struct shared_info *)fix_to_virt(FIX_XEN_SHARED_INFO))
+
 #ifdef CONFIG_XEN_GUEST
 
 extern bool xen_guest;
 
 void probe_hypervisor(void);
+void hypervisor_early_setup(struct e820map *e820);
 
 #else
 
 #define xen_guest 0
 
 static inline void probe_hypervisor(void) {};
+static inline void hypervisor_early_setup(struct e820map *e820)
+{
+    ASSERT_UNREACHABLE();
+}
 
 #endif /* CONFIG_XEN_GUEST */
 #endif /* __X86_GUEST_XEN_H__ */
-- 
2.11.0


* [PATCH RFC v1 29/74] x86: xen pv clock time source
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (27 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 28/74] x86: initialise shared_info page Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-05 14:17   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 30/74] x86: APIC timer calibration when running as a guest Wei Liu
                   ` (46 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

It is a variant of the TSC clock source.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/time.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)

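The clock source in the diff below turns a TSC reading into nanoseconds using the per-vCPU `vcpu_time_info` published in shared_info:

1. Take the TSC delta since `tsc_timestamp` and pre-shift it by `tsc_shift`.
2. Multiply it by `tsc_to_system_mul`, a fraction scaled by 2^32.
3. Add the result to `system_time`.

`xen_timer_cpu_frequency()` inverts the same scale to report a frequency. `read_xen_timer()` retries while `version` is odd or changes underneath it. It then clamps the result through a global `cmpxchg` loop so the value never goes backwards.

The portable C11 sketch below covers the arithmetic and the clamp. `struct demo_time_info` keeps only the fields used, and the helper names are hypothetical. The split 64x32-bit multiply is an assumed stand-in for the multiply inside `scale_delta()`, so treat the block as an illustration rather than a replacement:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Subset of struct vcpu_time_info used by the clock source. */
struct demo_time_info {
    uint64_t tsc_timestamp;      /* TSC value when system_time was sampled */
    uint64_t system_time;        /* nanoseconds at tsc_timestamp */
    uint32_t tsc_to_system_mul;  /* ns-per-tick fraction, scaled by 2^32 */
    int8_t   tsc_shift;          /* pre-shift applied to the TSC delta */
};

/* ((delta shifted by 'shift') * mul) >> 32, exact using 64-bit pieces. */
static uint64_t demo_scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    if ( shift < 0 )
        delta >>= -shift;
    else
        delta <<= shift;

    return (delta >> 32) * mul + (((delta & 0xffffffffu) * mul) >> 32);
}

/* System time in ns for a given TSC reading (cf. __read_cycle()). */
static uint64_t demo_read_ns(const struct demo_time_info *info, uint64_t tsc)
{
    return info->system_time +
           demo_scale_delta(tsc - info->tsc_timestamp,
                            info->tsc_to_system_mul, info->tsc_shift);
}

/* TSC frequency in Hz implied by the scale (cf. xen_timer_cpu_frequency()). */
static uint64_t demo_tsc_hz(const struct demo_time_info *info)
{
    uint64_t freq;

    assert(info->tsc_to_system_mul != 0);
    freq = (UINT64_C(1000000000) << 32) / info->tsc_to_system_mul;
    if ( info->tsc_shift < 0 )
        freq <<= -info->tsc_shift;
    else
        freq >>= info->tsc_shift;
    return freq;
}

static _Atomic uint64_t demo_last_ns;

/* Never return less than a value already handed out (cf. last_value). */
static uint64_t demo_monotonic(uint64_t now)
{
    uint64_t last = atomic_load(&demo_last_ns);

    do {
        if ( now < last )
            return last;
    } while ( !atomic_compare_exchange_weak(&demo_last_ns, &last, now) );

    return now;
}
```

Like the patch, the sketch ignores the `flags` field of `vcpu_time_info`. Take a 2 GHz TSC (`tsc_shift` 0, `tsc_to_system_mul` 0x80000000): each tick is 0.5 ns, so a reading 2000 ticks after `tsc_timestamp` adds 1000 ns to `system_time`.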
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 3b654d7b7d..ee35ffda6c 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -29,6 +29,7 @@
 #include <asm/mpspec.h>
 #include <asm/processor.h>
 #include <asm/fixmap.h>
+#include <asm/guest.h>
 #include <asm/mc146818rtc.h>
 #include <asm/div64.h>
 #include <asm/acpi.h>
@@ -525,6 +526,96 @@ static struct platform_timesource __initdata plt_tsc =
     .init = init_tsc,
 };
 
+#ifdef CONFIG_XEN_GUEST
+/************************************************************
+ * PLATFORM TIMER 5: XEN PV CLOCK SOURCE
+ *
+ * The Xen clock source is a variant of the TSC source.
+ */
+
+static u64 xen_timer_cpu_frequency(void)
+{
+    struct vcpu_time_info *info = &XEN_shared_info->vcpu_info[0].time;
+    u64 freq;
+
+    freq = 1000000000ULL << 32;
+    do_div(freq, info->tsc_to_system_mul);
+    if ( info->tsc_shift < 0 )
+        freq <<= -info->tsc_shift;
+    else
+        freq >>= info->tsc_shift;
+
+    return freq;
+}
+
+static s64 __init init_xen_timer(struct platform_timesource *pts)
+{
+    if ( !xen_guest )
+        return 0;
+
+    pts->frequency = xen_timer_cpu_frequency();
+
+    return pts->frequency;
+}
+
+static always_inline
+u64 __read_cycle(const struct vcpu_time_info *info, u64 tsc)
+{
+    u64 delta = tsc - info->tsc_timestamp;
+    struct time_scale ts = {
+        .shift    = info->tsc_shift,
+        .mul_frac = info->tsc_to_system_mul,
+    };
+    u64 offset = scale_delta(delta, &ts);
+
+    return info->system_time + offset;
+}
+
+static u64 last_value;
+static u64 read_xen_timer(void)
+{
+    struct vcpu_time_info *info;
+    unsigned int cpu = smp_processor_id();
+    u32 version;
+    u64 ret;
+    u64 last;
+
+    /* TODO: lift this restriction */
+    ASSERT(cpu < XEN_LEGACY_MAX_VCPUS);
+    info = &XEN_shared_info->vcpu_info[cpu].time;
+
+    do {
+        version = info->version & ~1;
+        /* Make sure version is read before the data */
+        smp_rmb();
+
+        ret = __read_cycle(info, rdtsc_ordered());
+        /* Ignore fancy flags for now */
+
+        /* Make sure version is reread after the data */
+        smp_rmb();
+    } while ( unlikely(version != info->version) );
+
+    /* Maintain a monotonic global value */
+    do {
+        last = read_atomic(&last_value);
+        if ( ret < last )
+            return last;
+    } while ( unlikely(cmpxchg(&last_value, last, ret) != last) );
+
+    return ret;
+}
+
+static struct platform_timesource __initdata plt_xen_timer =
+{
+    .id = "xen",
+    .name = "XEN PV CLOCK",
+    .read_counter = read_xen_timer,
+    .init = init_xen_timer,
+    .counter_bits = 63,
+};
+#endif
+
 /************************************************************
  * GENERIC PLATFORM TIMER INFRASTRUCTURE
  */
@@ -672,6 +763,9 @@ static s64 __init try_platform_timer(struct platform_timesource *pts)
 static u64 __init init_platform_timer(void)
 {
     static struct platform_timesource * __initdata plt_timers[] = {
+#ifdef CONFIG_XEN_GUEST
+        &plt_xen_timer,
+#endif
         &plt_hpet, &plt_pmtimer, &plt_pit
     };
 
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

* [PATCH RFC v1 30/74] x86: APIC timer calibration when running as a guest
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (28 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 29/74] x86: xen pv clock time source Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-05 14:35   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 31/74] x86: read wallclock from Xen running in pvh mode Wei Liu
                   ` (45 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

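The APIC timer calibration measures the APIC timer against a number of ticks
of the 8254 PIT. When running as a guest, introduce a variant that waits for
one tick period (1000000000 / HZ ns) using NOW() instead of waiting for a PIT
wraparound.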
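
The tick wait and the final scaling step can be sketched in user space. This
is a hedged illustration only: clock_gettime(CLOCK_MONOTONIC) stands in for
Xen's NOW(), HZ = 100 is an assumed tick rate, the cpu_relax() call from the
patch is omitted, and counts_per_second() is a hypothetical helper, not a
function from apic.c.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define HZ 100 /* assumed tick rate for this sketch */

/* Monotonic nanoseconds, standing in for Xen's NOW(). */
static int64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Busy-wait until one tick period has elapsed; returns the elapsed ns. */
static int64_t wait_one_tick(void)
{
    const int64_t lapse_ns = 1000000000LL / HZ;
    int64_t start = now_ns(), curr;

    do {
        curr = now_ns();
    } while ( curr - start < lapse_ns );

    return curr - start;
}

/* Convert APIC counts elapsed over `loops` ticks into counts per second. */
static uint64_t counts_per_second(uint32_t tt1, uint32_t tt2, unsigned int loops)
{
    assert(loops > 0);
    /* The APIC current-count register counts down, so tt1 >= tt2. */
    return (uint64_t)(tt1 - tt2) * HZ / loops;
}
```

Because the wait is measured against an absolute start time rather than by
counting loop iterations, the result does not depend on how fast the vCPU
happens to spin.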

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/apic.c | 38 ++++++++++++++++++++++++++++++--------
 1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
index ed59440c45..5039173827 100644
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -36,6 +36,8 @@
 #include <mach_apic.h>
 #include <io_ports.h>
 #include <xen/kexec.h>
+#include <asm/guest.h>
+#include <asm/time.h>
 
 static bool __read_mostly tdt_enabled;
 static bool __initdata tdt_enable = true;
@@ -1091,6 +1093,20 @@ static void setup_APIC_timer(void)
     local_irq_restore(flags);
 }
 
+static void wait_tick_pvh(void)
+{
+    u64 lapse_ns = 1000000000ULL / HZ;
+    s_time_t start, curr_time;
+
+    start = NOW();
+
+    /* Won't wrap around */
+    do {
+        cpu_relax();
+        curr_time = NOW();
+    } while ( curr_time - start < lapse_ns );
+}
+
 /*
  * In this function we calibrate APIC bus clocks to the external
  * timer. Unfortunately we cannot use jiffies and the timer irq
@@ -1123,12 +1139,15 @@ static int __init calibrate_APIC_clock(void)
      */
     __setup_APIC_LVTT(1000000000);
 
-    /*
-     * The timer chip counts down to zero. Let's wait
-     * for a wraparound to start exact measurement:
-     * (the current tick might have been already half done)
-     */
-    wait_8254_wraparound();
+    if ( !xen_guest )
+        /*
+         * The timer chip counts down to zero. Let's wait
+         * for a wraparound to start exact measurement:
+         * (the current tick might have been already half done)
+         */
+        wait_8254_wraparound();
+    else
+        wait_tick_pvh();
 
     /*
      * We wrapped around just now. Let's start:
@@ -1137,10 +1156,13 @@ static int __init calibrate_APIC_clock(void)
     tt1 = apic_read(APIC_TMCCT);
 
     /*
-     * Let's wait LOOPS wraprounds:
+     * Let's wait LOOPS ticks:
      */
     for (i = 0; i < LOOPS; i++)
-        wait_8254_wraparound();
+        if ( !xen_guest )
+            wait_8254_wraparound();
+        else
+            wait_tick_pvh();
 
     tt2 = apic_read(APIC_TMCCT);
     t2 = rdtsc_ordered();
-- 
2.11.0

* [PATCH RFC v1 31/74] x86: read wallclock from Xen running in pvh mode
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (29 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 30/74] x86: APIC timer calibration when running as a guest Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-05 14:43   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 32/74] x86: don't swallow the first command line item " Wei Liu
                   ` (44 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/time.c | 38 ++++++++++++++++++++++++++++++++++----
 1 file changed, 34 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index ee35ffda6c..886fc45248 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -969,6 +969,36 @@ static unsigned long get_cmos_time(void)
     return mktime(rtc.year, rtc.mon, rtc.day, rtc.hour, rtc.min, rtc.sec);
 }
 
+static unsigned long noinline get_xen_wallclock_time(void)
+{
+#ifdef CONFIG_XEN_GUEST
+    struct shared_info *sh_info = XEN_shared_info;
+    uint32_t wc_version;
+    uint64_t wc_sec;
+
+    do {
+        wc_version = sh_info->wc_version & ~1;
+        smp_rmb();
+
+        wc_sec  = sh_info->wc_sec;
+        smp_rmb();
+    } while ( wc_version != sh_info->wc_version );
+
+    return wc_sec + read_xen_timer() / 1000000000;
+#else
+    ASSERT_UNREACHABLE();
+    return 0;
+#endif
+}
+
+static unsigned long get_wallclock_time(void)
+{
+    if ( !xen_guest )
+        return get_cmos_time();
+    else
+        return get_xen_wallclock_time();
+}
+
 /***************************************************************************
  * System Time
  ***************************************************************************/
@@ -1764,8 +1794,8 @@ int __init init_xen_time(void)
 
     open_softirq(TIME_CALIBRATE_SOFTIRQ, local_time_calibration);
 
-    /* NB. get_cmos_time() can take over one second to execute. */
-    do_settime(get_cmos_time(), 0, NOW());
+    /* NB. get_wallclock_time() can take over one second to execute. */
+    do_settime(get_wallclock_time(), 0, NOW());
 
     /* Finish platform timer initialization. */
     try_platform_timer_tail(false);
@@ -1875,7 +1905,7 @@ int time_suspend(void)
 {
     if ( smp_processor_id() == 0 )
     {
-        cmos_utc_offset = -get_cmos_time();
+        cmos_utc_offset = -get_wallclock_time();
         cmos_utc_offset += get_sec();
         kill_timer(&calibration_timer);
 
@@ -1902,7 +1932,7 @@ int time_resume(void)
 
     set_timer(&calibration_timer, NOW() + EPOCH);
 
-    do_settime(get_cmos_time() + cmos_utc_offset, 0, NOW());
+    do_settime(get_wallclock_time() + cmos_utc_offset, 0, NOW());
 
     update_vcpu_system_time(current);
 
-- 
2.11.0


* [PATCH RFC v1 32/74] x86: don't swallow the first command line item in pvh mode
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (30 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 31/74] x86: read wallclock from Xen running in pvh mode Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-05 14:49   ` Jan Beulich
  2018-01-09 14:30   ` Roger Pau Monné
  2018-01-04 13:05 ` [PATCH RFC v1 33/74] x86/guest: enable event channels upcalls Wei Liu
                   ` (43 subsequent siblings)
  75 siblings, 2 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

Instead, special-case GRUB1, rather than assuming that all bootloaders except
GRUB2 need the first command line item (the image name) stripped.
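
The new behaviour can be checked with a small stand-alone sketch.
loader_is_grub1() mirrors the patch; cmdline_skip_image() is a hypothetical,
simplified stand-in for the relevant part of cmdline_cook(), and the loader
names in the usage are example strings following the "GNU GRUB 0.xx" /
"GRUB 1.xx" convention from the patch's comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* GRUB1 reports "GNU GRUB 0.xx"; GRUB2 reports "GRUB 1.xx" or later. */
static bool loader_is_grub1(const char *loader_name)
{
    const char *p;

    assert(loader_name != NULL);
    p = strstr(loader_name, "GRUB ");

    return p && p[5] == '0';
}

/* Skip leading blanks and, for GRUB1 only, the image name before the options. */
static const char *cmdline_skip_image(const char *p, const char *loader_name)
{
    while ( *p == ' ' )
        p++;

    if ( !loader_is_grub1(loader_name) )
        return p;

    while ( *p != ' ' && *p != '\0' )
        p++;
    while ( *p == ' ' )
        p++;

    return p;
}
```

With this logic a loader whose name contains no "GRUB " at all keeps its first
item, whereas the old loader_is_grub2() test would have stripped it.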

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/setup.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 353cdd4337..4dff2bca8b 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -617,11 +617,12 @@ static void __init noreturn reinit_bsp_stack(void)
     reset_stack_and_jump(init_done);
 }
 
-static bool __init loader_is_grub2(const char *loader_name)
+static bool __init loader_is_grub1(const char *loader_name)
 {
     /* GRUB1="GNU GRUB 0.xx"; GRUB2="GRUB 1.xx" */
     const char *p = strstr(loader_name, "GRUB ");
-    return (p != NULL) && (p[5] != '0');
+
+    return p && p[5] == '0';
 }
 
 static char * __init cmdline_cook(char *p, const char *loader_name)
@@ -632,11 +633,10 @@ static char * __init cmdline_cook(char *p, const char *loader_name)
     while ( *p == ' ' )
         p++;
 
-    /* GRUB2 does not include image name as first item on command line. */
-    if ( loader_is_grub2(loader_name) )
+    if ( !loader_is_grub1(loader_name) )
         return p;
 
-    /* Strip image name plus whitespace. */
+    /* GRUB1 includes the image name as first item on command line. Strip it. */
     while ( (*p != ' ') && (*p != '\0') )
         p++;
     while ( *p == ' ' )
-- 
2.11.0

* [PATCH RFC v1 33/74] x86/guest: enable event channels upcalls
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (31 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 32/74] x86: don't swallow the first command line item " Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-05 15:07   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 34/74] x86/guest: add PV console code Wei Liu
                   ` (42 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Sergey Dyasli <sergey.dyasli@citrix.com>

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/guest/xen.c              | 74 +++++++++++++++++++++++++++++++++++
 xen/arch/x86/smpboot.c                |  4 ++
 xen/include/asm-x86/guest/hypercall.h | 17 ++++++++
 xen/include/asm-x86/guest/xen.h       |  5 +++
 4 files changed, 100 insertions(+)

diff --git a/xen/arch/x86/guest/xen.c b/xen/arch/x86/guest/xen.c
index 594eae0828..781bfa493b 100644
--- a/xen/arch/x86/guest/xen.c
+++ b/xen/arch/x86/guest/xen.c
@@ -21,6 +21,7 @@
 #include <xen/init.h>
 #include <xen/types.h>
 
+#include <asm/apic.h>
 #include <asm/guest.h>
 #include <asm/msr.h>
 #include <asm/processor.h>
@@ -30,6 +31,7 @@
 bool xen_guest;
 
 static uint32_t xen_cpuid_base;
+static uint8_t evtchn_upcall_vector;
 extern char hypercall_page[];
 
 static void __init find_xen_leaves(void)
@@ -91,9 +93,81 @@ static void map_shared_info(struct e820map *e820)
     set_fixmap(FIX_XEN_SHARED_INFO, frame);
 }
 
+static void xen_evtchn_upcall(struct cpu_user_regs *regs)
+{
+    unsigned int cpu = smp_processor_id();
+    struct vcpu_info *vcpu_info = &XEN_shared_info->vcpu_info[cpu];
+
+    vcpu_info->evtchn_upcall_pending = 0;
+    xchg(&vcpu_info->evtchn_pending_sel, 0);
+
+    ack_APIC_irq();
+}
+
+static void ap_setup_event_channels(bool clear)
+{
+    unsigned int i, cpu = smp_processor_id();
+    struct vcpu_info *vcpu_info = &XEN_shared_info->vcpu_info[cpu];
+    int rc;
+
+    ASSERT(evtchn_upcall_vector);
+    ASSERT(cpu < ARRAY_SIZE(XEN_shared_info->vcpu_info));
+
+    if ( !clear )
+    {
+        /*
+         * This is necessary to ensure that a CPU will be interrupted in case
+         * of an event channel notification.
+         */
+        ASSERT(vcpu_info->evtchn_upcall_pending == 0);
+        ASSERT(vcpu_info->evtchn_pending_sel == 0);
+    }
+
+    rc = xen_hypercall_set_evtchn_upcall_vector(cpu, evtchn_upcall_vector);
+    if ( rc )
+        panic("Unable to set evtchn upcall vector: %d", rc);
+
+    if ( clear )
+    {
+        /*
+         * Clear any pending upcall bits. This makes us effectively ignore any
+         * previous upcalls which might be suboptimal.
+         */
+        vcpu_info->evtchn_upcall_pending = 0;
+        xchg(&vcpu_info->evtchn_pending_sel, 0);
+
+        /*
+         * evtchn_pending can be cleared only on the boot CPU because it's
+         * located in a shared structure.
+         */
+        for ( i = 0; i < ARRAY_SIZE(XEN_shared_info->evtchn_pending); i++ )
+            xchg(&XEN_shared_info->evtchn_pending[i], 0);
+    }
+}
+
+static void __init init_evtchn(void)
+{
+    unsigned int i;
+
+    alloc_direct_apic_vector(&evtchn_upcall_vector, xen_evtchn_upcall);
+
+    /* Mask all upcalls */
+    for ( i = 0; i < ARRAY_SIZE(XEN_shared_info->evtchn_mask); i++ )
+        xchg(&XEN_shared_info->evtchn_mask[i], ~0ul);
+
+    ap_setup_event_channels(true);
+}
+
 void __init hypervisor_early_setup(struct e820map *e820)
 {
     map_shared_info(e820);
+
+    init_evtchn();
+}
+
+void hypervisor_ap_setup(void)
+{
+    ap_setup_event_channels(false);
 }
 
 /*
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 7b97ff86cb..9a9fbc6ee0 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -45,6 +45,7 @@
 #include <mach_apic.h>
 #include <mach_wakecpu.h>
 #include <smpboot_hooks.h>
+#include <asm/guest.h>
 
 /* Override macros from asm/page.h to make them work with mfn_t */
 #undef mfn_to_page
@@ -372,6 +373,9 @@ void start_secondary(void *unused)
     cpumask_set_cpu(cpu, &cpu_online_map);
     unlock_vector_lock();
 
+    if ( xen_guest )
+        hypervisor_ap_setup();
+
     /* We can take interrupts now: we're officially "up". */
     local_irq_enable();
     mtrr_ap_init();
diff --git a/xen/include/asm-x86/guest/hypercall.h b/xen/include/asm-x86/guest/hypercall.h
index d5fe535c03..d6d4d1946b 100644
--- a/xen/include/asm-x86/guest/hypercall.h
+++ b/xen/include/asm-x86/guest/hypercall.h
@@ -23,6 +23,7 @@
 
 #include <public/xen.h>
 #include <public/sched.h>
+#include <public/hvm/hvm_op.h>
 
 #ifdef CONFIG_XEN_GUEST
 
@@ -96,6 +97,11 @@ static inline long xen_hypercall_memory_op(unsigned int cmd, void *arg)
     return _hypercall64_2(long, __HYPERVISOR_memory_op, cmd, arg);
 }
 
+static inline long xen_hypercall_hvm_op(unsigned int op, void *arg)
+{
+    return _hypercall64_2(long, __HYPERVISOR_hvm_op, op, arg);
+}
+
 /*
  * Higher level hypercall helpers
  */
@@ -111,6 +117,17 @@ static inline long xen_hypercall_shutdown(unsigned int reason)
     return xen_hypercall_sched_op(SCHEDOP_shutdown, &reason);
 }
 
+static inline long xen_hypercall_set_evtchn_upcall_vector(
+    unsigned int cpu, unsigned int vector)
+{
+    struct xen_hvm_evtchn_upcall_vector a = {
+        .vcpu = cpu,
+        .vector = vector,
+    };
+
+    return xen_hypercall_hvm_op(HVMOP_set_evtchn_upcall_vector, &a);
+}
+
 #else /* CONFIG_XEN_GUEST */
 
 static inline void xen_hypercall_console_write(
diff --git a/xen/include/asm-x86/guest/xen.h b/xen/include/asm-x86/guest/xen.h
index 2f3bcd2fe4..56cabb1934 100644
--- a/xen/include/asm-x86/guest/xen.h
+++ b/xen/include/asm-x86/guest/xen.h
@@ -32,6 +32,7 @@ extern bool xen_guest;
 
 void probe_hypervisor(void);
 void hypervisor_early_setup(struct e820map *e820);
+void hypervisor_ap_setup(void);
 
 #else
 
@@ -42,6 +43,10 @@ static inline void hypervisor_early_setup(struct e820map *e820)
 {
     ASSERT_UNREACHABLE();
 };
+static inline void hypervisor_ap_setup(void)
+{
+    ASSERT_UNREACHABLE();
+};
 
 #endif /* CONFIG_XEN_GUEST */
 #endif /* __X86_GUEST_XEN_H__ */
-- 
2.11.0

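In the 2-level event channel ABI, shared_info holds evtchn_pending and
evtchn_mask bitmaps of sizeof(xen_ulong_t) * 8 words each (4096 ports on a
64-bit guest), and each vcpu_info has an evtchn_pending_sel word with one bit
per bitmap word. The sketch below illustrates masking and clearing that state;
the mock_* structures and helpers are hypothetical stand-ins for the public
headers, and __atomic_exchange_n stands in for xchg().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD 64
#define EVTCHN_WORDS  BITS_PER_WORD /* sizeof(xen_ulong_t) * 8 on 64-bit */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical mock of the event channel part of shared_info. */
struct mock_shared_info {
    uint64_t evtchn_pending[EVTCHN_WORDS];
    uint64_t evtchn_mask[EVTCHN_WORDS];
};

/* Hypothetical mock of the per-vCPU upcall state. */
struct mock_vcpu_info {
    uint8_t  evtchn_upcall_pending;
    uint64_t evtchn_pending_sel; /* one bit per evtchn_pending word */
};

/* Mask every port, sizing the loop from the array itself. */
static void mask_all_ports(struct mock_shared_info *s)
{
    for ( size_t i = 0; i < ARRAY_SIZE(s->evtchn_mask); i++ )
        __atomic_exchange_n(&s->evtchn_mask[i], ~0ULL, __ATOMIC_SEQ_CST);
}

/* Drop any pending upcall state, at both the vCPU and port level. */
static void clear_pending(struct mock_shared_info *s, struct mock_vcpu_info *v)
{
    v->evtchn_upcall_pending = 0;
    __atomic_exchange_n(&v->evtchn_pending_sel, 0ULL, __ATOMIC_SEQ_CST);
    for ( size_t i = 0; i < ARRAY_SIZE(s->evtchn_pending); i++ )
        __atomic_exchange_n(&s->evtchn_pending[i], 0ULL, __ATOMIC_SEQ_CST);
}

/* Test a port's pending bit, as pv_console_rx() does with test_bit(). */
static int port_is_pending(const struct mock_shared_info *s, unsigned int port)
{
    assert(port < EVTCHN_WORDS * BITS_PER_WORD);
    return (s->evtchn_pending[port / BITS_PER_WORD] >> (port % BITS_PER_WORD)) & 1;
}
```

Sizing the loops with ARRAY_SIZE() means the highest port (4095 here, word 63,
bit 63) is covered as well.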

* [PATCH RFC v1 34/74] x86/guest: add PV console code
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (32 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 33/74] x86/guest: enable event channels upcalls Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-05 15:22   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 35/74] x86/guest: use PV console for Xen/Dom0 I/O Wei Liu
                   ` (41 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Sergey Dyasli <sergey.dyasli@citrix.com>

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/drivers/char/Makefile             |   1 +
 xen/drivers/char/xen_pv_console.c     | 198 ++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/fixmap.h          |   1 +
 xen/include/asm-x86/guest/hypercall.h |  33 ++++++
 xen/include/xen/pv_console.h          |  32 ++++++
 5 files changed, 265 insertions(+)
 create mode 100644 xen/drivers/char/xen_pv_console.c
 create mode 100644 xen/include/xen/pv_console.h

diff --git a/xen/drivers/char/Makefile b/xen/drivers/char/Makefile
index aa169d7961..9d48d0f2dc 100644
--- a/xen/drivers/char/Makefile
+++ b/xen/drivers/char/Makefile
@@ -8,3 +8,4 @@ obj-$(CONFIG_HAS_SCIF) += scif-uart.o
 obj-$(CONFIG_HAS_EHCI) += ehci-dbgp.o
 obj-$(CONFIG_ARM) += arm-uart.o
 obj-y += serial.o
+obj-$(CONFIG_XEN_GUEST) += xen_pv_console.o
diff --git a/xen/drivers/char/xen_pv_console.c b/xen/drivers/char/xen_pv_console.c
new file mode 100644
index 0000000000..5e494bc72a
--- /dev/null
+++ b/xen/drivers/char/xen_pv_console.c
@@ -0,0 +1,198 @@
+/******************************************************************************
+ * drivers/char/xen_pv_console.c
+ *
+ * A frontend driver for Xen's PV console.
+ * Can be used when Xen is running on top of Xen in pv-in-pvh mode.
+ * (Linux's name for this is hvc console)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (c) 2017 Citrix Systems Ltd.
+ */
+
+#include <xen/lib.h>
+#include <xen/hypercall.h>
+#include <xen/pv_console.h>
+
+#include <asm/fixmap.h>
+#include <asm/guest.h>
+
+#include <public/io/console.h>
+
+static struct xencons_interface *cons_ring;
+static evtchn_port_t cons_evtchn;
+static serial_rx_fn cons_rx_handler;
+static DEFINE_SPINLOCK(tx_lock);
+
+void __init pv_console_init(void)
+{
+    long r;
+    uint64_t raw_pfn = 0, raw_evtchn = 0;
+
+    if ( !xen_guest )
+    {
+        printk("PV console init failed: xen_guest mode is not active!\n");
+        return;
+    }
+
+    r = xen_hypercall_hvm_get_param(HVM_PARAM_CONSOLE_PFN, &raw_pfn);
+    if ( r < 0 )
+        goto error;
+
+    r = xen_hypercall_hvm_get_param(HVM_PARAM_CONSOLE_EVTCHN, &raw_evtchn);
+    if ( r < 0 )
+        goto error;
+
+    set_fixmap(FIX_PV_CONSOLE, raw_pfn << PAGE_SHIFT);
+    cons_ring = (struct xencons_interface *)fix_to_virt(FIX_PV_CONSOLE);
+    cons_evtchn = raw_evtchn;
+
+    printk("Initialised PV console at 0x%p with pfn %#lx and evtchn %#x\n",
+            cons_ring, raw_pfn, cons_evtchn);
+    return;
+
+ error:
+    printk("Couldn't initialise PV console\n");
+}
+
+void __init pv_console_set_rx_handler(serial_rx_fn fn)
+{
+    cons_rx_handler = fn;
+}
+
+void __init pv_console_init_postirq(void)
+{
+    if ( !cons_ring )
+        return;
+
+    xen_hypercall_evtchn_unmask(cons_evtchn);
+}
+
+static void notify_daemon(void)
+{
+    xen_hypercall_evtchn_send(cons_evtchn);
+}
+
+size_t pv_console_rx(struct cpu_user_regs *regs)
+{
+    char c;
+    XENCONS_RING_IDX cons, prod;
+    size_t recv = 0;
+
+    if ( !cons_ring )
+        return 0;
+
+    /* TODO: move this somewhere */
+    if ( !test_bit(cons_evtchn, XEN_shared_info->evtchn_pending) )
+        return 0;
+
+    prod = ACCESS_ONCE(cons_ring->in_prod);
+    cons = cons_ring->in_cons;
+    /* Get pointers before reading the ring */
+    smp_rmb();
+
+    ASSERT((prod - cons) <= sizeof(cons_ring->in));
+
+    while ( cons != prod )
+    {
+        c = cons_ring->in[MASK_XENCONS_IDX(cons++, cons_ring->in)];
+        if ( cons_rx_handler )
+            cons_rx_handler(c, regs);
+        recv++;
+    }
+
+    /* No need for a mem barrier because every character was already consumed */
+    barrier();
+    ACCESS_ONCE(cons_ring->in_cons) = cons;
+    notify_daemon();
+
+    clear_bit(cons_evtchn, XEN_shared_info->evtchn_pending);
+
+    return recv;
+}
+
+static size_t pv_ring_puts(const char *buf)
+{
+    XENCONS_RING_IDX cons, prod;
+    size_t sent = 0, avail;
+    bool put_r = false;
+
+    while ( buf[sent] != '\0' || put_r )
+    {
+        cons = ACCESS_ONCE(cons_ring->out_cons);
+        prod = cons_ring->out_prod;
+
+        ASSERT((prod - cons) <= sizeof(cons_ring->out));
+        avail = sizeof(cons_ring->out) - (prod - cons);
+
+        if ( avail == 0 )
+        {
+            /* Wait for xenconsoled to consume our output */
+            xen_hypercall_sched_op(SCHEDOP_yield, NULL);
+            continue;
+        }
+
+        /* Update pointers before accessing the ring */
+        smp_rmb();
+
+        while ( avail && (buf[sent] != '\0' || put_r) )
+        {
+            if ( put_r )
+            {
+                cons_ring->out[MASK_XENCONS_IDX(prod++, cons_ring->out)] = '\r';
+                put_r = false;
+            }
+            else
+            {
+                cons_ring->out[MASK_XENCONS_IDX(prod++, cons_ring->out)] =
+                    buf[sent];
+
+                /* Send '\r' for every '\n' */
+                if ( buf[sent] == '\n' )
+                    put_r = true;
+                sent++;
+            }
+            avail--;
+        }
+
+        /* Write to the ring before updating the pointer */
+        smp_wmb();
+        ACCESS_ONCE(cons_ring->out_prod) = prod;
+        notify_daemon();
+    }
+
+    return sent;
+}
+
+void pv_console_puts(const char *buf)
+{
+    unsigned long flags;
+
+    if ( !cons_ring )
+        return;
+
+    spin_lock_irqsave(&tx_lock, flags);
+    pv_ring_puts(buf);
+    spin_unlock_irqrestore(&tx_lock, flags);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/fixmap.h b/xen/include/asm-x86/fixmap.h
index ded4ddf21b..16ccaa2c77 100644
--- a/xen/include/asm-x86/fixmap.h
+++ b/xen/include/asm-x86/fixmap.h
@@ -46,6 +46,7 @@ enum fixed_addresses {
     FIX_COM_END,
     FIX_EHCI_DBGP,
 #ifdef CONFIG_XEN_GUEST
+    FIX_PV_CONSOLE,
     FIX_XEN_SHARED_INFO,
 #endif /* CONFIG_XEN_GUEST */
     /* Everything else should go further down. */
diff --git a/xen/include/asm-x86/guest/hypercall.h b/xen/include/asm-x86/guest/hypercall.h
index d6d4d1946b..90b4755467 100644
--- a/xen/include/asm-x86/guest/hypercall.h
+++ b/xen/include/asm-x86/guest/hypercall.h
@@ -97,6 +97,11 @@ static inline long xen_hypercall_memory_op(unsigned int cmd, void *arg)
     return _hypercall64_2(long, __HYPERVISOR_memory_op, cmd, arg);
 }
 
+static inline long xen_hypercall_event_channel_op(unsigned int cmd, void *arg)
+{
+    return _hypercall64_2(long, __HYPERVISOR_event_channel_op, cmd, arg);
+}
+
 static inline long xen_hypercall_hvm_op(unsigned int op, void *arg)
 {
     return _hypercall64_2(long, __HYPERVISOR_hvm_op, op, arg);
@@ -117,6 +122,34 @@ static inline long xen_hypercall_shutdown(unsigned int reason)
     return xen_hypercall_sched_op(SCHEDOP_shutdown, &reason);
 }
 
+static inline long xen_hypercall_evtchn_send(evtchn_port_t port)
+{
+    struct evtchn_send send = { .port = port };
+
+    return xen_hypercall_event_channel_op(EVTCHNOP_send, &send);
+}
+
+static inline long xen_hypercall_evtchn_unmask(evtchn_port_t port)
+{
+    struct evtchn_unmask unmask = { .port = port };
+
+    return xen_hypercall_event_channel_op(EVTCHNOP_unmask, &unmask);
+}
+
+static inline long xen_hypercall_hvm_get_param(uint32_t index, uint64_t *value)
+{
+    struct xen_hvm_param xhv = {
+        .domid = DOMID_SELF,
+        .index = index,
+    };
+    long ret = xen_hypercall_hvm_op(HVMOP_get_param, &xhv);
+
+    if ( ret == 0 )
+        *value = xhv.value;
+
+    return ret;
+}
+
 static inline long xen_hypercall_set_evtchn_upcall_vector(
     unsigned int cpu, unsigned int vector)
 {
diff --git a/xen/include/xen/pv_console.h b/xen/include/xen/pv_console.h
new file mode 100644
index 0000000000..e578b56620
--- /dev/null
+++ b/xen/include/xen/pv_console.h
@@ -0,0 +1,32 @@
+#ifndef __XEN_PV_CONSOLE_H__
+#define __XEN_PV_CONSOLE_H__
+
+#include <xen/serial.h>
+
+#ifdef CONFIG_XEN_GUEST
+
+void pv_console_init(void);
+void pv_console_set_rx_handler(serial_rx_fn fn);
+void pv_console_init_postirq(void);
+void pv_console_puts(const char *buf);
+size_t pv_console_rx(struct cpu_user_regs *regs);
+
+#else
+
+static inline void pv_console_init(void) {}
+static inline void pv_console_set_rx_handler(serial_rx_fn fn) { }
+static inline void pv_console_init_postirq(void) { }
+static inline void pv_console_puts(const char *buf) { }
+static inline size_t pv_console_rx(struct cpu_user_regs *regs) { return 0; }
+
+#endif /* !CONFIG_XEN_GUEST */
+#endif /* __XEN_PV_CONSOLE_H__ */
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.11.0

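pv_ring_puts() is a single-producer ring write with free-running indices that
are masked on access, and it emits "\n\r" for every '\n'. The sketch below
shows the same index arithmetic on a hypothetical 16-byte mock ring. As a
simplification, it returns when the ring is full instead of yielding, and only
consumes a '\n' when there is room for both bytes, so no carry-over '\r' state
is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ring_idx_t;

/* Hypothetical stand-in for the output half of struct xencons_interface. */
struct mock_cons_ring {
    char out[16];                  /* size must be a power of two */
    ring_idx_t out_cons, out_prod; /* free-running, masked on access */
};

#define MASK_IDX(idx, ring) ((idx) & (sizeof(ring) - 1))

/* Copy as much of buf as fits; returns the number of bytes of buf consumed. */
static size_t ring_put(struct mock_cons_ring *r, const char *buf)
{
    ring_idx_t cons = r->out_cons, prod = r->out_prod;
    size_t sent = 0;

    /* Unsigned subtraction stays correct across index wraparound. */
    assert(prod - cons <= sizeof(r->out));

    while ( buf[sent] != '\0' )
    {
        size_t avail = sizeof(r->out) - (prod - cons);
        size_t need = (buf[sent] == '\n') ? 2 : 1;

        if ( avail < need )
            break; /* ring full: caller must wait for the consumer */

        r->out[MASK_IDX(prod++, r->out)] = buf[sent];
        if ( buf[sent] == '\n' )
            r->out[MASK_IDX(prod++, r->out)] = '\r';
        sent++;
    }

    /* Publish the data before the new producer index, like smp_wmb(). */
    __atomic_thread_fence(__ATOMIC_RELEASE);
    r->out_prod = prod;

    return sent;
}
```

A caller would retry with buf + ret after the consumer has advanced out_cons.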

* [PATCH RFC v1 35/74] x86/guest: use PV console for Xen/Dom0 I/O
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (33 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 34/74] x86/guest: add PV console code Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-04 13:05 ` [PATCH RFC v1 36/74] --- x86/shim: Kconfig and command line options Wei Liu
                   ` (40 subsequent siblings)
  75 siblings, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Sergey Dyasli <sergey.dyasli@citrix.com>

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
---
 docs/misc/xen-command-line.markdown |  5 ++++-
 xen/arch/x86/guest/xen.c            |  3 +++
 xen/drivers/char/console.c          | 10 +++++++++-
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index e5979bceee..da006dd4f7 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -365,7 +365,7 @@ The following are examples of correct specifications:
 Specify the size of the console ring buffer.
 
 ### console
-> `= List of [ vga | com1[H,L] | com2[H,L] | dbgp | none ]`
+> `= List of [ vga | com1[H,L] | com2[H,L] | pv | dbgp | none ]`
 
 > Default: `console=com1,vga`
 
@@ -381,6 +381,9 @@ the converse; transmitted and received characters will have their MSB
 cleared.  This allows a single port to be shared by two subsystems
 (e.g. console and debugger).
 
+`pv` indicates that Xen should use Xen's PV console. This option is
+only available when used together with `pv-in-pvh`.
+
 `dbgp` indicates that Xen should use a USB debug port.
 
 `none` indicates that Xen should not use a console.  This option only
diff --git a/xen/arch/x86/guest/xen.c b/xen/arch/x86/guest/xen.c
index 781bfa493b..0319a5f9e8 100644
--- a/xen/arch/x86/guest/xen.c
+++ b/xen/arch/x86/guest/xen.c
@@ -20,6 +20,7 @@
  */
 #include <xen/init.h>
 #include <xen/types.h>
+#include <xen/pv_console.h>
 
 #include <asm/apic.h>
 #include <asm/guest.h>
@@ -101,6 +102,8 @@ static void xen_evtchn_upcall(struct cpu_user_regs *regs)
     vcpu_info->evtchn_upcall_pending = 0;
     xchg(&vcpu_info->evtchn_pending_sel, 0);
 
+    pv_console_rx(regs);
+
     ack_APIC_irq();
 }
 
diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
index 51c1454b8e..354e020d19 100644
--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -16,6 +16,7 @@
 #include <xen/event.h>
 #include <xen/console.h>
 #include <xen/serial.h>
+#include <xen/pv_console.h>
 #include <xen/softirq.h>
 #include <xen/keyhandler.h>
 #include <xen/guest_access.h>
@@ -339,6 +340,9 @@ static void sercon_puts(const char *s)
         (*serial_steal_fn)(s);
     else
         serial_puts(sercon_handle, s);
+
+    /* Copy all serial output into PV console */
+    pv_console_puts(s);
 }
 
 static void dump_console_ring_key(unsigned char key)
@@ -791,7 +795,9 @@ void __init console_init_preirq(void)
     {
         if ( *p == ',' )
             p++;
-        if ( !strncmp(p, "vga", 3) )
+        if ( !strncmp(p, "pv", 2) )
+            pv_console_init();
+        else if ( !strncmp(p, "vga", 3) )
             video_init();
         else if ( !strncmp(p, "xen", 3) )
             opt_console_xen = true;
@@ -814,6 +820,7 @@ void __init console_init_preirq(void)
     }
 
     serial_set_rx_handler(sercon_handle, serial_rx);
+    pv_console_set_rx_handler(serial_rx);
 
     /* HELLO WORLD --- start-of-day banner text. */
     spin_lock(&console_lock);
@@ -866,6 +873,7 @@ void __init console_init_ring(void)
 void __init console_init_postirq(void)
 {
     serial_init_postirq();
+    pv_console_init_postirq();
 
     if ( conring != _conring )
         return;
-- 
2.11.0

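console_init_preirq() matches each comma-separated console= item by prefix
with strncmp(). The sketch below reproduces that matching for the pv, vga and
xen items; struct console_opts and parse_console() are hypothetical, and the
loop shape is a simplification of the real parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flags recording which console backends were requested. */
struct console_opts {
    bool pv, vga, xen;
};

/* Walk a comma-separated list, matching each item by its prefix. */
static struct console_opts parse_console(const char *p)
{
    struct console_opts o = { false, false, false };

    assert(p != NULL);
    while ( *p != '\0' )
    {
        if ( *p == ',' )
            p++;
        if ( !strncmp(p, "pv", 2) )
            o.pv = true;
        else if ( !strncmp(p, "vga", 3) )
            o.vga = true;
        else if ( !strncmp(p, "xen", 3) )
            o.xen = true;
        /* Skip the rest of this item up to the next separator. */
        while ( *p != ',' && *p != '\0' )
            p++;
    }

    return o;
}
```

Because matching is by prefix, any item beginning with "pv" selects the PV
console.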

* [PATCH RFC v1 36/74] --- x86/shim: Kconfig and command line options
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (34 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 35/74] x86/guest: use PV console for Xen/Dom0 I/O Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-05 15:26   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 37/74] tools/firmware: Build and install xen-shim Wei Liu
                   ` (39 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Andrew Cooper <andrew.cooper3@citrix.com>

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
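The new asm-x86/pv/shim.h gives `pv_shim` one of three shapes depending on Kconfig: a constant 1, a runtime bool set by the "pv-shim" option, or a constant 0. Below is a standalone sketch of that pattern outside the Xen tree. The `CONFIG_*` symbols are set by hand here (in Xen they come from the generated config), and `shim_mode_desc()` is a hypothetical caller added only for illustration.

```c
#include <stdbool.h>

/*
 * Hand-set stand-ins for the Kconfig symbols.  Only CONFIG_PV_SHIM is
 * defined, modelling a build where shim mode is chosen at boot time.
 */
#define CONFIG_PV_SHIM 1
/* #define CONFIG_PV_SHIM_EXCLUSIVE 1 */

#if defined(CONFIG_PV_SHIM_EXCLUSIVE)
# define pv_shim 1              /* compile-time constant: always shim */
#elif defined(CONFIG_PV_SHIM)
bool pv_shim;                   /* runtime flag, set from the command line */
#else
# define pv_shim 0              /* compile-time constant: never shim */
#endif

/*
 * Hypothetical caller: when pv_shim is a constant the compiler can drop
 * the dead branch, so call sites need no #ifdef of their own.
 */
static const char *shim_mode_desc(void)
{
    return pv_shim ? "pv-shim" : "normal";
}
```

The design keeps `if ( pv_shim )` checks valid in every configuration while letting non-shim and exclusive-shim builds compile them away.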
 docs/misc/xen-command-line.markdown | 11 ++++++++++
 xen/arch/x86/Kconfig                | 22 +++++++++++++++++++
 xen/arch/x86/pv/Makefile            |  1 +
 xen/arch/x86/pv/shim.c              | 39 ++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/guest.h         |  1 +
 xen/include/asm-x86/pv/shim.h       | 42 +++++++++++++++++++++++++++++++++++++
 6 files changed, 116 insertions(+)
 create mode 100644 xen/arch/x86/pv/shim.c
 create mode 100644 xen/include/asm-x86/pv/shim.h

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index da006dd4f7..3a1a9c1fba 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1445,6 +1445,17 @@ do; there may be other custom operating systems which do.  If you're
 certain you don't plan on having PV guests which use this feature,
 turning it off can reduce the attack surface.
 
+### pv-shim (x86)
+> `= <boolean>`
+
+> Default: `false`
+
+This option is intended for use by a toolstack, when choosing to run a PV
+guest compatibly inside an HVM container.
+
+In this mode, the kernel and initrd passed as modules to the hypervisor are
+constructed into a plain unprivileged PV domain.
+
 ### rcu-idle-timer-period-ms
 > `= <integer>`
 
diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index c0b0bcdcb3..4953533f16 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -133,6 +133,28 @@ config PVH_GUEST
 	---help---
 	  Support booting using the PVH ABI.
 
+	  If unsure, say N.
+
+config PV_SHIM
+	def_bool n
+	prompt "PV Shim"
+	depends on PV && XEN_GUEST
+	---help---
+	  Build Xen with a mode which acts as a shim to allow PV guests to run
+	  in an HVM/PVH container. This mode is only enabled when the "pv-shim"
+	  command line option is given.
+
+	  If unsure, say N.
+
+config PV_SHIM_EXCLUSIVE
+	def_bool n
+	prompt "PV Shim Exclusive"
+	depends on PV_SHIM
+	---help---
+	  Build Xen in a way which unconditionally assumes PV_SHIM mode.  This
+	  option is only intended for use when building a dedicated PV Shim
+	  firmware, and will not function correctly in other scenarios.
+
 	  If unsure, say N.
 endmenu
 
diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
index bac2792aa2..65bca04175 100644
--- a/xen/arch/x86/pv/Makefile
+++ b/xen/arch/x86/pv/Makefile
@@ -11,6 +11,7 @@ obj-y += iret.o
 obj-y += misc-hypercalls.o
 obj-y += mm.o
 obj-y += ro-page-fault.o
+obj-$(CONFIG_PV_SHIM) += shim.o
 obj-y += traps.o
 
 obj-bin-y += dom0_build.init.o
diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
new file mode 100644
index 0000000000..4d037355db
--- /dev/null
+++ b/xen/arch/x86/pv/shim.c
@@ -0,0 +1,39 @@
+/******************************************************************************
+ * arch/x86/pv/shim.c
+ *
+ * Functionality for PV Shim mode
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (c) 2017 Citrix Systems Ltd.
+ */
+#include <xen/init.h>
+#include <xen/types.h>
+
+#include <asm/apic.h>
+
+#ifndef CONFIG_PV_SHIM_EXCLUSIVE
+bool pv_shim;
+boolean_param("pv-shim", pv_shim);
+#endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/guest.h b/xen/include/asm-x86/guest.h
index 5abdb8c433..a38c6b5b3f 100644
--- a/xen/include/asm-x86/guest.h
+++ b/xen/include/asm-x86/guest.h
@@ -22,6 +22,7 @@
 #include <asm/guest/hypercall.h>
 #include <asm/guest/pvh-boot.h>
 #include <asm/guest/xen.h>
+#include <asm/pv/shim.h>
 
 #endif /* __X86_GUEST_H__ */
 
diff --git a/xen/include/asm-x86/pv/shim.h b/xen/include/asm-x86/pv/shim.h
new file mode 100644
index 0000000000..1468cfd498
--- /dev/null
+++ b/xen/include/asm-x86/pv/shim.h
@@ -0,0 +1,42 @@
+/******************************************************************************
+ * asm-x86/pv/shim.h
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (c) 2017 Citrix Systems Ltd.
+ */
+
+#ifndef __X86_PV_SHIM_H__
+#define __X86_PV_SHIM_H__
+
+#include <xen/types.h>
+
+#if defined(CONFIG_PV_SHIM_EXCLUSIVE)
+# define pv_shim 1
+#elif defined(CONFIG_PV_SHIM)
+extern bool pv_shim;
+#else
+# define pv_shim 0
+#endif /* CONFIG_PV_SHIM{,_EXCLUSIVE} */
+
+#endif /* __X86_PV_SHIM_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.11.0



* [PATCH RFC v1 37/74] tools/firmware: Build and install xen-shim
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (35 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 36/74] --- x86/shim: Kconfig and command line options Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-04 13:05 ` [PATCH RFC v1 38/74] x86/pv-shim: Force CPUID faulting in pv-shim mode Wei Liu
                   ` (38 subsequent siblings)
  75 siblings, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Andrew Cooper <andrew.cooper3@citrix.com>

Link a minimum set of files to build the shim. The linkfarm rune can
handle creation and deletion of files.

We can do better by properly generating the dependencies from the list of
files, but that's an improvement for later.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
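The linkfarm.stamp rule only replaces the stamp (and re-creates the symlinks) when the list of files has actually changed: `cmp -s` keeps the old stamp, and therefore its timestamp, when nothing differs. A rough C rendering of that compare-then-replace step follows; `files_equal()` and `update_stamp()` are hypothetical names, not part of the patch.

```c
#include <stdio.h>

/* Return 1 if both files exist and have identical contents, else 0. */
static int files_equal(const char *a, const char *b)
{
    FILE *fa = fopen(a, "rb");
    FILE *fb = fopen(b, "rb");
    int equal = fa != NULL && fb != NULL;

    while ( equal )
    {
        int ca = fgetc(fa), cb = fgetc(fb);

        if ( ca != cb )
            equal = 0;          /* contents (or lengths) differ */
        else if ( ca == EOF )
            break;              /* both ended together: identical */
    }

    if ( fa )
        fclose(fa);
    if ( fb )
        fclose(fb);
    return equal;
}

/*
 * Mirror of "cmp -s tmp stamp && rm tmp || mv tmp stamp": leave the old
 * stamp untouched when nothing changed, otherwise move the new list into
 * place.  Returns 1 when the stamp was replaced (the caller would then
 * refresh the symlinks), 0 when it was left alone, -1 on error.
 */
static int update_stamp(const char *tmp, const char *stamp)
{
    if ( files_equal(tmp, stamp) )
        return remove(tmp) == 0 ? 0 : -1;

    return rename(tmp, stamp) == 0 ? 1 : -1;
}
```

Because an unchanged stamp keeps its old mtime, targets depending on it are not needlessly rebuilt.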
 .gitignore                         |  4 ++
 tools/firmware/Makefile            | 11 +++++
 tools/firmware/xen-dir/Makefile    | 59 ++++++++++++++++++++++++++
 tools/firmware/xen-dir/shim.config | 87 ++++++++++++++++++++++++++++++++++++++
 xen/Makefile                       |  8 +++-
 5 files changed, 167 insertions(+), 2 deletions(-)
 create mode 100644 tools/firmware/xen-dir/Makefile
 create mode 100644 tools/firmware/xen-dir/shim.config

diff --git a/.gitignore b/.gitignore
index 8da67daf31..f6cc61a701 100644
--- a/.gitignore
+++ b/.gitignore
@@ -155,6 +155,10 @@ tools/firmware/rombios/rombios[^/]*.s
 tools/firmware/rombios/32bit/32bitbios_flat.h
 tools/firmware/vgabios/vbetables-gen
 tools/firmware/vgabios/vbetables.h
+tools/firmware/xen-dir/*.old
+tools/firmware/xen-dir/linkfarm.stamp*
+tools/firmware/xen-dir/xen-root
+tools/firmware/xen-dir/xen-shim
 tools/flask/utils/flask-getenforce
 tools/flask/utils/flask-get-bool
 tools/flask/utils/flask-loadpolicy
diff --git a/tools/firmware/Makefile b/tools/firmware/Makefile
index 868b506920..b2f011df49 100644
--- a/tools/firmware/Makefile
+++ b/tools/firmware/Makefile
@@ -1,6 +1,10 @@
 XEN_ROOT = $(CURDIR)/../..
 include $(XEN_ROOT)/tools/Rules.mk
 
+ifneq ($(XEN_TARGET_ARCH),x86_32)
+CONFIG_PV_SHIM := y
+endif
+
 # hvmloader is a 32-bit protected mode binary.
 TARGET      := hvmloader/hvmloader
 INST_DIR := $(DESTDIR)$(XENFIRMWAREDIR)
@@ -11,6 +15,7 @@ SUBDIRS-$(CONFIG_SEABIOS) += seabios-dir
 SUBDIRS-$(CONFIG_ROMBIOS) += rombios
 SUBDIRS-$(CONFIG_ROMBIOS) += vgabios
 SUBDIRS-$(CONFIG_ROMBIOS) += etherboot
+SUBDIRS-$(CONFIG_PV_SHIM) += xen-dir
 SUBDIRS-y += hvmloader
 
 LD32BIT-$(CONFIG_FreeBSD) := LD32BIT_FLAG=-melf_i386_fbsd
@@ -48,6 +53,9 @@ endif
 ifeq ($(CONFIG_OVMF),y)
 	$(INSTALL_DATA) ovmf-dir/ovmf.bin $(INST_DIR)/ovmf.bin
 endif
+ifeq ($(CONFIG_PV_SHIM),y)
+	$(INSTALL_DATA) xen-dir/xen-shim $(INST_DIR)/xen-shim
+endif
 
 .PHONY: uninstall
 uninstall:
@@ -58,6 +66,9 @@ endif
 ifeq ($(CONFIG_OVMF),y)
 	rm -f $(INST_DIR)/ovmf.bin
 endif
+ifeq ($(CONFIG_PV_SHIM),y)
+	rm -f $(INST_DIR)/xen-shim
+endif
 
 .PHONY: clean
 clean: subdirs-clean
diff --git a/tools/firmware/xen-dir/Makefile b/tools/firmware/xen-dir/Makefile
new file mode 100644
index 0000000000..adf6c31e8d
--- /dev/null
+++ b/tools/firmware/xen-dir/Makefile
@@ -0,0 +1,59 @@
+XEN_ROOT = $(CURDIR)/../../..
+
+all: xen-shim
+
+.PHONY: FORCE
+FORCE:
+
+D=xen-root
+
+# Minimum set of files / directories needed to build Xen
+LINK_DIRS=config xen
+LINK_FILES=Config.mk
+
+DEP_DIRS=$(foreach i, $(LINK_DIRS), $(XEN_ROOT)/$(i))
+DEP_FILES=$(foreach i, $(LINK_FILES), $(XEN_ROOT)/$(i))
+
+linkfarm.stamp: $(DEP_DIRS) $(DEP_FILES) FORCE
+	mkdir -p $(D)
+	set -e
+	rm -f linkfarm.stamp.tmp
+	$(foreach d, $(LINK_DIRS), \
+		 (mkdir -p $(D)/$(d); \
+		  cd $(D)/$(d); \
+		  find $(XEN_ROOT)/$(d)/ -type d -printf "./%P\n" |  xargs mkdir -p);)
+	$(foreach d, $(LINK_DIRS), \
+		(cd $(XEN_ROOT); \
+		 find $(d) ! -type l -type f \
+		 $(addprefix ! -path , '*.[oda1]' '*.d[12]')) \
+		 >> linkfarm.stamp.tmp ; )
+	$(foreach f, $(LINK_FILES), \
+		echo $(f) >> linkfarm.stamp.tmp ;)
+	cmp -s linkfarm.stamp.tmp linkfarm.stamp && \
+		rm linkfarm.stamp.tmp || { \
+		mv linkfarm.stamp.tmp linkfarm.stamp; \
+		cat linkfarm.stamp | while read f; \
+		  do rm -f "$(D)/$$f"; ln -s "$(XEN_ROOT)/$$f" "$(D)/$$f"; done \
+		}
+
+# Copy enough of the tree to build the shim hypervisor
+$(D): linkfarm.stamp
+	$(MAKE) -C $(D)/xen distclean
+
+.PHONY: shim-%config
+shim-%config: $(D) FORCE
+	$(MAKE) -C $(D)/xen $*config \
+		XEN_CONFIG_EXPERT=y \
+		KCONFIG_CONFIG=$(CURDIR)/shim.config
+
+xen-shim: $(D) shim-olddefconfig
+	$(MAKE) -C $(D)/xen install-shim \
+		XEN_CONFIG_EXPERT=y \
+		KCONFIG_CONFIG=$(CURDIR)/shim.config \
+		DESTDIR=$(CURDIR)
+
+.PHONY: distclean clean
+distclean clean:
+	rm -f xen-shim *.old
+	rm -rf $(D)
+	rm -f linkfarm.stamp*
diff --git a/tools/firmware/xen-dir/shim.config b/tools/firmware/xen-dir/shim.config
new file mode 100644
index 0000000000..227a12fb4c
--- /dev/null
+++ b/tools/firmware/xen-dir/shim.config
@@ -0,0 +1,87 @@
+#
+# Automatically generated file; DO NOT EDIT.
+# Xen/x86 4.11-unstable Configuration
+#
+CONFIG_X86_64=y
+CONFIG_X86=y
+CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
+
+#
+# Architecture Features
+#
+CONFIG_NR_CPUS=32
+CONFIG_PV=y
+CONFIG_PV_LINEAR_PT=y
+CONFIG_HVM=y
+# CONFIG_SHADOW_PAGING is not set
+# CONFIG_BIGMEM is not set
+# CONFIG_HVM_FEP is not set
+# CONFIG_TBOOT is not set
+CONFIG_XEN_GUEST=y
+CONFIG_PVH_GUEST=y
+CONFIG_PV_SHIM=y
+CONFIG_PV_SHIM_EXCLUSIVE=y
+
+#
+# Common Features
+#
+CONFIG_COMPAT=y
+CONFIG_CORE_PARKING=y
+CONFIG_HAS_ALTERNATIVE=y
+CONFIG_HAS_EX_TABLE=y
+CONFIG_HAS_MEM_ACCESS=y
+CONFIG_HAS_MEM_PAGING=y
+CONFIG_HAS_MEM_SHARING=y
+CONFIG_HAS_PDX=y
+CONFIG_HAS_UBSAN=y
+CONFIG_HAS_KEXEC=y
+CONFIG_HAS_GDBSX=y
+CONFIG_HAS_IOPORTS=y
+# CONFIG_KEXEC is not set
+# CONFIG_TMEM is not set
+# CONFIG_XENOPROF is not set
+# CONFIG_XSM is not set
+
+#
+# Schedulers
+#
+CONFIG_SCHED_CREDIT=y
+# CONFIG_SCHED_CREDIT2 is not set
+# CONFIG_SCHED_RTDS is not set
+# CONFIG_SCHED_ARINC653 is not set
+CONFIG_SCHED_NULL=y
+# CONFIG_SCHED_CREDIT_DEFAULT is not set
+CONFIG_SCHED_NULL_DEFAULT=y
+CONFIG_SCHED_DEFAULT="null"
+# CONFIG_LIVEPATCH is not set
+# CONFIG_SUPPRESS_DUPLICATE_SYMBOL_WARNINGS is not set
+CONFIG_CMDLINE=""
+
+#
+# Device Drivers
+#
+CONFIG_ACPI=y
+CONFIG_ACPI_LEGACY_TABLES_LOOKUP=y
+CONFIG_NUMA=y
+CONFIG_HAS_NS16550=y
+CONFIG_HAS_EHCI=y
+CONFIG_HAS_CPUFREQ=y
+CONFIG_HAS_PASSTHROUGH=y
+CONFIG_HAS_PCI=y
+CONFIG_VIDEO=y
+CONFIG_VGA=y
+CONFIG_DEFCONFIG_LIST="$ARCH_DEFCONFIG"
+CONFIG_ARCH_SUPPORTS_INT128=y
+
+#
+# Debugging Options
+#
+# CONFIG_DEBUG is not set
+# CONFIG_CRASH_DEBUG is not set
+# CONFIG_FRAME_POINTER is not set
+# CONFIG_GCOV is not set
+# CONFIG_LOCK_PROFILE is not set
+# CONFIG_PERF_COUNTERS is not set
+# CONFIG_VERBOSE_DEBUG is not set
+# CONFIG_SCRUB_DEBUG is not set
+# CONFIG_UBSAN is not set
diff --git a/xen/Makefile b/xen/Makefile
index 044e7c82a3..49b590187f 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -39,8 +39,8 @@ dist: install
 
 build install:: include/config/auto.conf
 
-.PHONY: build install uninstall clean distclean cscope TAGS tags MAP gtags tests
-build install uninstall debug clean distclean cscope TAGS tags MAP gtags tests::
+.PHONY: build install uninstall clean distclean cscope TAGS tags MAP gtags tests install-shim
+build install uninstall debug clean distclean cscope TAGS tags MAP gtags tests install-shim::
 ifneq ($(XEN_TARGET_ARCH),x86_32)
 	$(MAKE) -f Rules.mk _$@
 else
@@ -80,6 +80,10 @@ _install: $(TARGET)$(CONFIG_XEN_INSTALL_SUFFIX)
 		fi; \
 	fi
 
+.PHONY: _install-shim
+_install-shim: build
+	$(INSTALL_DATA) $(TARGET)-shim $(DESTDIR)
+
 .PHONY: _tests
 _tests:
 	$(MAKE) -f $(BASEDIR)/Rules.mk -C test tests
-- 
2.11.0



* [PATCH RFC v1 38/74] x86/pv-shim: Force CPUID faulting in pv-shim mode
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (36 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 37/74] tools/firmware: Build and install xen-shim Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-08 10:16   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 39/74] xen/x86: make VGA support selectable Wei Liu
                   ` (37 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Andrew Cooper <andrew.cooper3@citrix.com>

This is necessary to prevent the PV guest from seeing the HVM Xen leaves via
native CPUID.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
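The patch changes which domains get CPUID faulting: normally the control domain is exempt, but in shim mode faulting is forced on even for a domain that would otherwise be treated as the control domain. A truth-table sketch of the predicate, assuming the domain properties can be flattened into plain booleans; `struct dom_props` and `want_cpuid_faulting()` are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical, flattened view of what ctxt_switch_levelling() reads. */
struct dom_props {
    bool exists;             /* nextd != NULL                             */
    bool is_control;         /* is_control_domain(nextd)                  */
    bool is_pv;              /* is_pv_domain(nextd)                       */
    bool guest_wants_fault;  /* guest enabled CPUID faulting via its MSR  */
};

/* Predicate after this patch: pv_shim overrides the control-domain exemption. */
static bool want_cpuid_faulting(const struct dom_props *d, bool pv_shim)
{
    return d->exists &&
           (pv_shim || !d->is_control) &&
           (d->is_pv || d->guest_wants_fault);
}
```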
 xen/arch/x86/cpu/common.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index a1f1a04776..6543690988 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -12,6 +12,7 @@
 #include <mach_apic.h>
 #include <asm/setup.h>
 #include <public/sysctl.h> /* for XEN_INVALID_{SOCKET,CORE}_ID */
+#include <asm/guest.h>
 
 #include "cpu.h"
 
@@ -177,7 +178,8 @@ void ctxt_switch_levelling(const struct vcpu *next)
 		 * generating the maximum full cpuid policy into Xen, at which
 		 * this problem will disappear.
 		 */
-		set_cpuid_faulting(nextd && !is_control_domain(nextd) &&
+		set_cpuid_faulting(nextd &&
+				   (pv_shim || !is_control_domain(nextd)) &&
 				   (is_pv_domain(nextd) ||
 				    next->arch.msr->
 				    misc_features_enables.cpuid_faulting));
-- 
2.11.0



* [PATCH RFC v1 39/74] xen/x86: make VGA support selectable
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (37 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 38/74] x86/pv-shim: Force CPUID faulting in pv-shim mode Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-08 10:22   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 40/74] xen/x86: report domain id on cpuid Wei Liu
                   ` (36 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

Make VGA support selectable through a Kconfig option. Enable it by default,
and disable it for the PV-in-PVH shim.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
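With CONFIG_VIDEO unset, asm-x86/setup.h replaces `vesa_init()` and `vesa_mtrr_init()` with empty inline stubs, so their callers compile unchanged. A minimal standalone sketch of that stub pattern follows; `boot_video_setup()` and the `vesa_calls` counter are hypothetical stand-ins, not code from the patch.

```c
/*
 * Leave undefined to model the shim build (CONFIG_VGA=n, so nothing
 * selects CONFIG_VIDEO); define it to model a normal x86 build.
 */
/* #define CONFIG_VIDEO 1 */

static int vesa_calls;   /* counts calls into the "real" implementation */

#ifdef CONFIG_VIDEO
static void vesa_init(void)      { vesa_calls++; }
static void vesa_mtrr_init(void) { vesa_calls++; }
#else
/* No-op stubs: callers need no #ifdef of their own. */
static inline void vesa_init(void) {}
static inline void vesa_mtrr_init(void) {}
#endif

/* Hypothetical caller standing in for the relevant part of __start_xen(). */
static void boot_video_setup(void)
{
    vesa_init();
    vesa_mtrr_init();
}
```

Note that the stubs are written `{}` with no trailing semicolon; a stray `;` after a function body is an empty file-scope declaration that pedantic compilers warn about.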
 tools/firmware/xen-dir/shim.config | 3 +--
 xen/arch/x86/Kconfig               | 1 -
 xen/arch/x86/boot/build32.mk       | 1 +
 xen/arch/x86/boot/cmdline.c        | 5 ++++-
 xen/arch/x86/boot/trampoline.S     | 7 +++++++
 xen/arch/x86/efi/efi-boot.h        | 4 ++++
 xen/arch/x86/platform_hypercall.c  | 2 ++
 xen/arch/x86/pv/dom0_build.c       | 2 ++
 xen/arch/x86/setup.c               | 6 ++++++
 xen/drivers/video/Kconfig          | 8 +++++++-
 xen/include/asm-x86/setup.h        | 6 ++++++
 11 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/tools/firmware/xen-dir/shim.config b/tools/firmware/xen-dir/shim.config
index 227a12fb4c..78b965f4c7 100644
--- a/tools/firmware/xen-dir/shim.config
+++ b/tools/firmware/xen-dir/shim.config
@@ -68,8 +68,7 @@ CONFIG_HAS_EHCI=y
 CONFIG_HAS_CPUFREQ=y
 CONFIG_HAS_PASSTHROUGH=y
 CONFIG_HAS_PCI=y
-CONFIG_VIDEO=y
-CONFIG_VGA=y
+# CONFIG_VGA is not set
 CONFIG_DEFCONFIG_LIST="$ARCH_DEFCONFIG"
 CONFIG_ARCH_SUPPORTS_INT128=y
 
diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 4953533f16..f621e799ed 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -24,7 +24,6 @@ config X86
 	select HAS_PDX
 	select HAS_UBSAN
 	select NUMA
-	select VGA
 
 config ARCH_DEFCONFIG
 	string
diff --git a/xen/arch/x86/boot/build32.mk b/xen/arch/x86/boot/build32.mk
index f7e8ebe67d..48c7407c00 100644
--- a/xen/arch/x86/boot/build32.mk
+++ b/xen/arch/x86/boot/build32.mk
@@ -5,6 +5,7 @@ include $(XEN_ROOT)/Config.mk
 $(call cc-options-add,CFLAGS,CC,$(EMBEDDED_EXTRA_CFLAGS))
 
 CFLAGS += -Werror -fno-asynchronous-unwind-tables -fno-builtin -g0 -msoft-float
+CFLAGS += -I$(XEN_ROOT)/xen/include
 CFLAGS := $(filter-out -flto,$(CFLAGS)) 
 
 # NB. awk invocation is a portable alternative to 'head -n -1'
diff --git a/xen/arch/x86/boot/cmdline.c b/xen/arch/x86/boot/cmdline.c
index 06aa064e72..51b0659a04 100644
--- a/xen/arch/x86/boot/cmdline.c
+++ b/xen/arch/x86/boot/cmdline.c
@@ -30,6 +30,7 @@ asm (
     "    jmp  cmdline_parse_early      \n"
     );
 
+#include <xen/kconfig.h>
 #include "defs.h"
 #include "video.h"
 
@@ -336,5 +337,7 @@ void __stdcall cmdline_parse_early(const char *cmdline, early_boot_opts_t *ebo)
     ebo->skip_realmode = skip_realmode(cmdline);
     ebo->opt_edd = edd_parse(cmdline);
     ebo->opt_edid = edid_parse(cmdline);
-    vga_parse(cmdline, ebo);
+
+    if ( IS_ENABLED(CONFIG_VIDEO) )
+        vga_parse(cmdline, ebo);
 }
diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S
index 4d640f3fcd..a17a90df5e 100644
--- a/xen/arch/x86/boot/trampoline.S
+++ b/xen/arch/x86/boot/trampoline.S
@@ -219,7 +219,9 @@ trampoline_boot_cpu_entry:
          */
         call    get_memory_map
         call    get_edd
+#ifdef CONFIG_VIDEO
         call    video
+#endif
 
         mov     $0x0200,%ax
         int     $0x16
@@ -267,10 +269,13 @@ opt_edid:
         .byte   0                               /* EDID parsing option (force/no/default). */
 /* Padding. */
         .byte   0
+
+#ifdef CONFIG_VIDEO
 GLOBAL(boot_vid_mode)
         .word   VIDEO_80x25                     /* If we don't run at all, assume basic video mode 3 at 80x25. */
 vesa_size:
         .word   0,0,0                           /* width x depth x height */
+#endif
 
 GLOBAL(kbd_shift_flags)
         .byte   0
@@ -279,4 +284,6 @@ rm_idt: .word   256*4-1, 0, 0
 
 #include "mem.S"
 #include "edd.S"
+#ifdef CONFIG_VIDEO
 #include "video.S"
+#endif
diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index d30f688a5a..5789d2cb70 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -479,16 +479,19 @@ static void __init efi_arch_edd(void)
 
 static void __init efi_arch_console_init(UINTN cols, UINTN rows)
 {
+#ifdef CONFIG_VIDEO
     vga_console_info.video_type = XEN_VGATYPE_TEXT_MODE_3;
     vga_console_info.u.text_mode_3.columns = cols;
     vga_console_info.u.text_mode_3.rows = rows;
     vga_console_info.u.text_mode_3.font_height = 16;
+#endif
 }
 
 static void __init efi_arch_video_init(EFI_GRAPHICS_OUTPUT_PROTOCOL *gop,
                                        UINTN info_size,
                                        EFI_GRAPHICS_OUTPUT_MODE_INFORMATION *mode_info)
 {
+#ifdef CONFIG_VIDEO
     int bpp = 0;
 
     switch ( mode_info->PixelFormat )
@@ -550,6 +553,7 @@ static void __init efi_arch_video_init(EFI_GRAPHICS_OUTPUT_PROTOCOL *gop,
         vga_console_info.u.vesa_lfb.lfb_size =
             (gop->Mode->FrameBufferSize + 0xffff) >> 16;
     }
+#endif
 }
 
 static void __init efi_arch_memory_setup(void)
diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index ebc2f394ee..ea18c3215a 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -388,6 +388,7 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
         }
         case XEN_FW_VBEDDC_INFO:
             ret = -ESRCH;
+#ifdef CONFIG_VIDEO
             if ( op->u.firmware_info.index != 0 )
                 break;
             if ( *(u32 *)bootsym(boot_edid_info) == 0x13131313 )
@@ -406,6 +407,7 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
                  copy_to_compat(op->u.firmware_info.u.vbeddc_info.edid,
                                 bootsym(boot_edid_info), 128) )
                 ret = -EFAULT;
+#endif
             break;
         case XEN_FW_EFI_INFO:
             ret = efi_get_info(op->u.firmware_info.index,
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 09c765a06f..23d5993b7e 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -828,11 +828,13 @@ int __init dom0_construct_pv(struct domain *d,
     if ( cmdline != NULL )
         strlcpy((char *)si->cmd_line, cmdline, sizeof(si->cmd_line));
 
+#ifdef CONFIG_VIDEO
     if ( fill_console_start_info((void *)(si + 1)) )
     {
         si->console.dom0.info_off  = sizeof(struct start_info);
         si->console.dom0.info_size = sizeof(struct dom0_vga_console_info);
     }
+#endif
 
     if ( is_pv_32bit_domain(d) )
         xlat_start_info(si, XLAT_start_info_console_dom0);
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 4dff2bca8b..99f5d61eb8 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -456,6 +456,7 @@ static void __init setup_max_pdx(unsigned long top_page)
 /* A temporary copy of the e820 map that we can mess with during bootstrap. */
 static struct e820map __initdata boot_e820;
 
+#ifdef CONFIG_VIDEO
 struct boot_video_info {
     u8  orig_x;             /* 0x00 */
     u8  orig_y;             /* 0x01 */
@@ -486,9 +487,11 @@ struct boot_video_info {
     u16 vesa_attrib;        /* 0x28 */
 };
 extern struct boot_video_info boot_vid_info;
+#endif
 
 static void __init parse_video_info(void)
 {
+#ifdef CONFIG_VIDEO
     struct boot_video_info *bvi = &bootsym(boot_vid_info);
 
     /* vga_console_info is filled directly on EFI platform. */
@@ -524,6 +527,7 @@ static void __init parse_video_info(void)
         vga_console_info.u.vesa_lfb.gbl_caps = bvi->capabilities;
         vga_console_info.u.vesa_lfb.mode_attrs = bvi->vesa_attrib;
     }
+#endif
 }
 
 static void __init kexec_reserve_area(struct e820map *e820)
@@ -741,6 +745,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
     printk("Xen image load base address: %#lx\n", xen_phys_start);
 
+#ifdef CONFIG_VIDEO
     printk("Video information:\n");
 
     /* Print VGA display mode information. */
@@ -784,6 +789,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
                 printk("of reasons unknown\n");
         }
     }
+#endif
 
     printk("Disc information:\n");
     printk(" Found %d MBR signatures\n",
diff --git a/xen/drivers/video/Kconfig b/xen/drivers/video/Kconfig
index 0ffbbd9a88..e668462a94 100644
--- a/xen/drivers/video/Kconfig
+++ b/xen/drivers/video/Kconfig
@@ -3,8 +3,14 @@ config VIDEO
 	bool
 
 config VGA
-	bool
+	bool "VGA support"
 	select VIDEO
+	depends on X86
+	default y
+	---help---
+	  Enable VGA output for the Xen hypervisor.
+
+	  If unsure, say Y.
 
 config HAS_ARM_HDLCD
 	bool
diff --git a/xen/include/asm-x86/setup.h b/xen/include/asm-x86/setup.h
index c5b3d4ef18..b68ec9de4d 100644
--- a/xen/include/asm-x86/setup.h
+++ b/xen/include/asm-x86/setup.h
@@ -31,8 +31,14 @@ void arch_init_memory(void);
 void subarch_init_memory(void);
 
 void init_IRQ(void);
+
+#ifdef CONFIG_VIDEO
 void vesa_init(void);
 void vesa_mtrr_init(void);
+#else
+static inline void vesa_init(void) {}
+static inline void vesa_mtrr_init(void) {}
+#endif
 
 int construct_dom0(
     struct domain *d,
-- 
2.11.0



* [PATCH RFC v1 40/74] xen/x86: report domain id on cpuid
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (38 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 39/74] xen/x86: make VGA support selectable Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-08 10:27   ` Jan Beulich
  2018-01-08 11:29   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 41/74] xen/pvh: do not mark the low 1MB as IO mem Wei Liu
                   ` (35 subsequent siblings)
  75 siblings, 2 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

Use the ebx register of the hypervisor leaf 1. The eax register on
this leaf is already used to report the Xen major and minor versions.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
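A guest reads these values back by executing CPUID on the Xen version leaf. Assuming the Xen leaves sit at the usual base 0x40000000 (they may be offset in steps of 0x100, which this sketch ignores), the version leaf is base + 1. The sketch only decodes register values that have already been read, so it runs on any host; `struct xen_leaf1` and `decode_xen_leaf1()` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical decoded view of the Xen hypervisor leaf at base + 1. */
struct xen_leaf1 {
    unsigned int major;      /* EAX[31:16]                  */
    unsigned int minor;      /* EAX[15:0]                   */
    uint32_t domid;          /* EBX, as added by this patch */
};

static struct xen_leaf1 decode_xen_leaf1(uint32_t eax, uint32_t ebx)
{
    struct xen_leaf1 l = {
        .major = eax >> 16,
        .minor = eax & 0xffff,
        .domid = ebx,
    };

    return l;
}
```

For example, EAX = 0x0004000b decodes to Xen 4.11.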
 xen/arch/x86/traps.c                | 1 +
 xen/include/public/arch-x86/cpuid.h | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index db16a44417..42c50b1cd4 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -853,6 +853,7 @@ void cpuid_hypervisor_leaves(const struct vcpu *v, uint32_t leaf,
 
     case 1:
         res->a = (xen_major_version() << 16) | xen_minor_version();
+        res->b = d->domain_id;
         break;
 
     case 2:
diff --git a/xen/include/public/arch-x86/cpuid.h b/xen/include/public/arch-x86/cpuid.h
index eb76875d0e..a17c8682a0 100644
--- a/xen/include/public/arch-x86/cpuid.h
+++ b/xen/include/public/arch-x86/cpuid.h
@@ -57,7 +57,8 @@
  * Leaf 2 (0x40000x01)
  * EAX[31:16]: Xen major version.
  * EAX[15: 0]: Xen minor version.
- * EBX-EDX: Reserved (currently all zeroes).
+ * EBX: Domain id.
+ * ECX-EDX: Reserved (currently all zeroes).
  */
 
 /*
-- 
2.11.0



* [PATCH RFC v1 41/74] xen/pvh: do not mark the low 1MB as IO mem
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (39 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 40/74] xen/x86: report domain id on cpuid Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-08 10:30   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 42/74] sched/null: skip vCPUs on the waitqueue that are blocked Wei Liu
                   ` (34 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

When Xen is booted via PVH there is nothing special about the low 1MB, so
reclaim it, keeping only MFN 0 marked as I/O.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
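With 4 KiB pages, the original loop bound of 0x100 frames covers exactly the first 1 MiB (0x100 * 4096 = 0x100000 bytes); after this patch a PVH-booted Xen only shares MFN 0 with dom_io. A sketch of the loop bound, where `PAGE_SIZE_4K` and `low_io_frames()` are hypothetical names:

```c
#include <stdbool.h>

#define PAGE_SIZE_4K 4096UL

/*
 * Number of low frames handed to dom_io, mirroring the loop bound in
 * arch_init_memory(): only MFN 0 when booted via PVH, the whole first
 * 1MB otherwise.
 */
static unsigned long low_io_frames(bool pvh_boot)
{
    return pvh_boot ? 1 : 0x100;
}
```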
 xen/arch/x86/mm.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index f73fee225e..355e6747bb 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -122,6 +122,7 @@
 #include <asm/fixmap.h>
 #include <asm/io_apic.h>
 #include <asm/pci.h>
+#include <asm/guest.h>
 
 #include <asm/hvm/grant_table.h>
 #include <asm/pv/grant_table.h>
@@ -288,8 +289,12 @@ void __init arch_init_memory(void)
     dom_cow = domain_create(DOMID_COW, DOMCRF_dummy, 0, NULL);
     BUG_ON(IS_ERR(dom_cow));
 
-    /* First 1MB of RAM is historically marked as I/O. */
-    for ( i = 0; i < 0x100; i++ )
+    /*
+     * First 1MB of RAM is historically marked as I/O.  If we booted PVH,
+     * reclaim the space.  Irrespective, leave MFN 0 as special for the sake
+     * of 0 being a very common default value.
+     */
+    for ( i = 0; i < (pvh_boot ? 1 : 0x100); i++ )
         share_xen_page_with_guest(mfn_to_page(_mfn(i)),
                                   dom_io, XENSHARE_writable);
 
-- 
2.11.0



* [PATCH RFC v1 42/74] sched/null: skip vCPUs on the waitqueue that are blocked
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (40 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 41/74] xen/pvh: do not mark the low 1MB as IO mem Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-08 10:37   ` Jan Beulich
  2018-01-12 10:41   ` Dario Faggioli
  2018-01-04 13:05 ` [PATCH RFC v1 43/74] xen: introduce rangeset_reserve_hole Wei Liu
                   ` (33 subsequent siblings)
  75 siblings, 2 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

Avoid scheduling vCPUs that are blocked: there's no point in assigning
them to a pCPU because they are not going to run anyway.

Since blocked vCPUs are no longer assigned to pCPUs after this change,
force a reschedule when a vCPU on the waitqueue is brought up. Also,
when scheduling, try to pick a vCPU from the waitqueue if the pCPU is
running idle.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Dario Faggioli <raistlin@linux.it>
---
Changes since v1:
 - Force a rescheduling when a vCPU is brought up.
 - Try to pick a vCPU from the runqueue if running the idle vCPU.
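
The core of the change is that the waitqueue scan in null_schedule() now
skips vCPUs whose _VPF_down pause flag is set. Below is a self-contained
model of that scan, using a plain array in place of Xen's list and
balance steps. `struct wq_vcpu` and `pick_from_waitq()` are hypothetical,
and affinity is reduced to a single boolean per vCPU.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified waitqueue entry. */
struct wq_vcpu {
    int id;
    bool down;          /* stands in for test_bit(_VPF_down, ...) */
    bool can_run_here;  /* stands in for the affinity checks      */
};

/*
 * Return the first waiting vCPU that is up and may run on this pCPU,
 * or NULL if none qualifies (the pCPU then keeps running idle).
 */
static const struct wq_vcpu *pick_from_waitq(const struct wq_vcpu *q, size_t n)
{
    for ( size_t i = 0; i < n; i++ )
    {
        if ( q[i].down )
            continue;           /* skip vCPUs that are down */
        if ( !q[i].can_run_here )
            continue;           /* affinity does not allow this pCPU */
        return &q[i];
    }

    return NULL;
}
```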
---
 xen/common/sched_null.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index b4a24baf8e..bacfb31cb3 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -574,6 +574,8 @@ static void null_vcpu_wake(const struct scheduler *ops, struct vcpu *v)
     {
         /* Not exactly "on runq", but close enough for reusing the counter */
         SCHED_STAT_CRANK(vcpu_wake_onrunq);
+        /* Force a rescheduling so that an idle CPU can pick this vCPU. */
+        cpumask_raise_softirq(&cpu_online_map, SCHEDULE_SOFTIRQ);
         return;
     }
 
@@ -761,9 +763,10 @@ static struct task_slice null_schedule(const struct scheduler *ops,
     /*
      * We may be new in the cpupool, or just coming back online. In which
      * case, there may be vCPUs in the waitqueue that we can assign to us
-     * and run.
+     * and run. Also check whether this CPU is running idle, in which case try
+     * to pick a vCPU from the waitqueue.
      */
-    if ( unlikely(ret.task == NULL) )
+    if ( unlikely(ret.task == NULL || ret.task == idle_vcpu[cpu]) )
     {
         spin_lock(&prv->waitq_lock);
 
@@ -781,6 +784,10 @@ static struct task_slice null_schedule(const struct scheduler *ops,
         {
             list_for_each_entry( wvc, &prv->waitq, waitq_elem )
             {
+                if ( test_bit(_VPF_down, &wvc->vcpu->pause_flags) )
+                    /* Skip vCPUs that are down. */
+                    continue;
+
                 if ( bs == BALANCE_SOFT_AFFINITY &&
                      !has_soft_affinity(wvc->vcpu, wvc->vcpu->cpu_hard_affinity) )
                     continue;
-- 
2.11.0


* [PATCH RFC v1 43/74] xen: introduce rangeset_reserve_hole
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (41 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 42/74] sched/null: skip vCPUs on the waitqueue that are blocked Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-08 10:46   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 44/74] xen/pvshim: keep track of unused pages Wei Liu
                   ` (32 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

Find the lowest free region of a given size in a rangeset, mark it as
used, and return its start.
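
The search performed by rangeset_reserve_hole() is a first-fit scan over
sorted, non-overlapping inclusive ranges. A standalone sketch of just
that search, using a hypothetical struct range array instead of Xen's
linked list and leaving out locking and insertion; the final check is
written as ULONG_MAX - s >= size - 1 so that it cannot wrap around when
s is 0:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inclusive range [s, e], mirroring how a rangeset stores ranges. */
struct range {
    unsigned long s, e;
};

/*
 * First-fit search: find the lowest start of a gap of at least `size`
 * values between the sorted, non-overlapping ranges in r[0..n).
 * Returns true and stores the start in *start on success.
 */
static bool find_hole(const struct range *r, size_t n, unsigned long size,
                      unsigned long *start)
{
    unsigned long s = 0;

    assert(size != 0);

    for ( size_t i = 0; i < n; i++ )
    {
        /* The gap before this range is [s, r[i].s - 1]. */
        if ( r[i].s - s >= size )
        {
            *start = s;
            return true;
        }
        if ( r[i].e == ULONG_MAX )
            return false;   /* Range reaches the top: nothing after it. */
        s = r[i].e + 1;
    }

    /* The gap after the last range is [s, ULONG_MAX]. */
    if ( ULONG_MAX - s >= size - 1 )
    {
        *start = s;
        return true;
    }

    return false;
}
```

For ranges {0-9, 20-29}, a request for 5 returns 10, while a request for
11 skips the 10-wide gap and returns 30.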

Signed-off-by: Roger Pau Monne <roger.pau@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/common/rangeset.c      | 51 ++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/rangeset.h |  4 ++++
 2 files changed, 55 insertions(+)

diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
index 6c6293c15c..2633786289 100644
--- a/xen/common/rangeset.c
+++ b/xen/common/rangeset.c
@@ -298,6 +298,57 @@ int rangeset_report_ranges(
     return rc;
 }
 
+int rangeset_reserve_hole(struct rangeset *r, unsigned long size,
+                          unsigned long *s)
+{
+    struct range *prev, *next;
+
+    *s = 0;
+
+    write_lock(&r->lock);
+
+    for ( prev = NULL, next = first_range(r);
+          next;
+          prev = next, next = next_range(r, next) )
+    {
+        if ( (next->s - *s) >= size )
+            goto insert;
+
+        if ( next->e == ~0UL )
+            goto out;
+
+        *s = next->e + 1;
+    }
+
+    if ( (~0UL - *s) + 1 >= size )
+        goto insert;
+
+ out:
+    write_unlock(&r->lock);
+    return -ENOSPC;
+
+ insert:
+    if ( !prev )
+    {
+        next = alloc_range(r);
+        if ( !next )
+        {
+            write_unlock(&r->lock);
+            return -ENOMEM;
+        }
+
+        next->s = *s;
+        next->e = *s + size - 1;
+        insert_range(r, prev, next);
+    }
+    else
+        prev->e += size;
+
+    write_unlock(&r->lock);
+
+    return 0;
+}
+
 int rangeset_add_singleton(
     struct rangeset *r, unsigned long s)
 {
diff --git a/xen/include/xen/rangeset.h b/xen/include/xen/rangeset.h
index aa6408248b..a606fb7793 100644
--- a/xen/include/xen/rangeset.h
+++ b/xen/include/xen/rangeset.h
@@ -76,6 +76,10 @@ int __must_check rangeset_remove_singleton(
 bool_t __must_check rangeset_contains_singleton(
     struct rangeset *r, unsigned long s);
 
+/* Reserve a region of the specified size. */
+int __must_check rangeset_reserve_hole(struct rangeset *r, unsigned long size,
+                                       unsigned long *s);
+
 /* swap contents */
 void rangeset_swap(struct rangeset *a, struct rangeset *b);
 
-- 
2.11.0


* [PATCH RFC v1 44/74] xen/pvshim: keep track of unused pages
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (42 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 43/74] xen: introduce rangeset_reserve_hole Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-08 10:58   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 45/74] x86/guest: use unpopulated memory to map the shared_info page Wei Liu
                   ` (31 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

Add simple infrastructure to keep track of unused pages and to allocate
and free them, so that they can be used to map special pages like the
shared info page and the grant table.

As rangesets depend on malloc being ready, introduce
hypervisor_setup() for things that can be initialised later in the
boot process.
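
The tracking in init_memmap() starts by recording every frame that is
already in use (RAM up to max_page or 4GB, plus each e820 entry), so
that anything left outside the rangeset can be handed out by
hypervisor_alloc_unused_page(). Rangeset bounds are inclusive, so the
last frame of an e820 entry covering bytes [addr, addr + size) is
(addr + size - 1) >> PAGE_SHIFT; using (addr + size) >> PAGE_SHIFT would
also claim the following frame whenever the entry ends on a page
boundary. A minimal sketch of the conversion, assuming 4 KiB pages and
a hypothetical struct pfn_range:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB pages, as on x86 */

/* Hypothetical inclusive frame range, matching rangeset_add_range(r, s, e). */
struct pfn_range {
    unsigned long s, e;
};

/*
 * Convert an e820-style byte region [addr, addr + size) into the
 * inclusive range of frames it touches. size must be non-zero.
 */
static struct pfn_range e820_to_pfns(uint64_t addr, uint64_t size)
{
    struct pfn_range r;

    assert(size != 0);
    r.s = addr >> PAGE_SHIFT;
    /* The last byte is addr + size - 1; its frame is the inclusive end. */
    r.e = (addr + size - 1) >> PAGE_SHIFT;

    return r;
}
```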

Signed-off-by: Roger Pau Monne <roger.pau@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/guest/xen.c        | 48 +++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/setup.c            |  3 +++
 xen/include/asm-x86/guest/xen.h | 22 +++++++++++++++++++
 3 files changed, 73 insertions(+)

diff --git a/xen/arch/x86/guest/xen.c b/xen/arch/x86/guest/xen.c
index 0319a5f9e8..f66c10fbe5 100644
--- a/xen/arch/x86/guest/xen.c
+++ b/xen/arch/x86/guest/xen.c
@@ -21,6 +21,7 @@
 #include <xen/init.h>
 #include <xen/types.h>
 #include <xen/pv_console.h>
+#include <xen/rangeset.h>
 
 #include <asm/apic.h>
 #include <asm/guest.h>
@@ -34,6 +35,7 @@ bool xen_guest;
 static uint32_t xen_cpuid_base;
 static uint8_t evtchn_upcall_vector;
 extern char hypercall_page[];
+static struct rangeset *mem;
 
 static void __init find_xen_leaves(void)
 {
@@ -161,9 +163,38 @@ static void __init init_evtchn(void)
     ap_setup_event_channels(true);
 }
 
+static void __init init_memmap(void)
+{
+    unsigned int i;
+
+    mem = rangeset_new(NULL, "host memory map", 0);
+    if ( !mem )
+        panic("failed to allocate host memory rangeset");
+
+    /* Mark up to the last memory page (or 4GB) as RAM. */
+    if ( rangeset_add_range(mem, 0, max_t(unsigned long, max_page,
+                                          (GB(4) - 1) >> PAGE_SHIFT)) )
+        panic("unable to add RAM to memory rangeset");
+
+    for ( i = 0; i < e820.nr_map; i++ )
+    {
+        struct e820entry *e = &e820.map[i];
+
+        if ( rangeset_add_range(mem, e->addr >> PAGE_SHIFT,
+                                (e->addr + e->size) >> PAGE_SHIFT) )
+            panic("unable to add range %#lx - %#lx to memory rangeset",
+                  e->addr, e->addr + e->size);
+    }
+}
+
 void __init hypervisor_early_setup(struct e820map *e820)
 {
     map_shared_info(e820);
+}
+
+void __init hypervisor_setup(void)
+{
+    init_memmap();
 
     init_evtchn();
 }
@@ -173,6 +204,23 @@ void hypervisor_ap_setup(void)
     ap_setup_event_channels(false);
 }
 
+int hypervisor_alloc_unused_page(mfn_t *mfn)
+{
+    unsigned long m;
+    int rc;
+
+    rc = rangeset_reserve_hole(mem, 1, &m);
+    if ( !rc )
+        *mfn = _mfn(m);
+
+    return rc;
+}
+
+int hypervisor_free_unused_page(mfn_t mfn)
+{
+    return rangeset_remove_range(mem, mfn_x(mfn), mfn_x(mfn));
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 99f5d61eb8..1b3576bc7d 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1481,6 +1481,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         max_cpus = nr_cpu_ids;
     }
 
+    if ( xen_guest )
+        hypervisor_setup();
+
     /* Low mappings were only needed for some BIOS table parsing. */
     zap_low_mappings();
 
diff --git a/xen/include/asm-x86/guest/xen.h b/xen/include/asm-x86/guest/xen.h
index 56cabb1934..ba826d75db 100644
--- a/xen/include/asm-x86/guest/xen.h
+++ b/xen/include/asm-x86/guest/xen.h
@@ -32,7 +32,10 @@ extern bool xen_guest;
 
 void probe_hypervisor(void);
 void hypervisor_early_setup(struct e820map *e820);
+void hypervisor_setup(void);
 void hypervisor_ap_setup(void);
+int hypervisor_alloc_unused_page(mfn_t *mfn);
+int hypervisor_free_unused_page(mfn_t mfn);
 
 #else
 
@@ -43,11 +46,30 @@ static inline void hypervisor_early_setup(struct e820map *e820)
 {
     ASSERT_UNREACHABLE();
 };
+
+static inline void hypervisor_setup(void)
+{
+    ASSERT_UNREACHABLE();
+}
+
 static inline void hypervisor_ap_setup(void)
 {
     ASSERT_UNREACHABLE();
 };
 
+static inline int hypervisor_alloc_unused_page(mfn_t *mfn)
+{
+
+    ASSERT_UNREACHABLE();
+    return 0;
+}
+
+static inline int hypervisor_free_unused_page(mfn_t mfn)
+{
+    ASSERT_UNREACHABLE();
+    return 0;
+}
+
 #endif /* CONFIG_XEN_GUEST */
 #endif /* __X86_GUEST_XEN_H__ */
 
-- 
2.11.0


* [PATCH RFC v1 45/74] x86/guest: use unpopulated memory to map the shared_info page
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (43 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 44/74] xen/pvshim: keep track of unused pages Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-08 11:03   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 46/74] xen/guest: fetch vCPU ID from Xen Wei Liu
                   ` (30 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

Instead of hardcoding a known unpopulated memory page, pick an unused
page from the tracked memory map to map the shared info page. This
fixes a TODO item in a previous patch.

Remove hypervisor_early_setup() as it is no longer required.

Signed-off-by: Roger Pau Monne <roger.pau@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/guest/xen.c        | 20 ++++++++------------
 xen/arch/x86/setup.c            |  3 ---
 xen/include/asm-x86/guest/xen.h |  5 -----
 3 files changed, 8 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/guest/xen.c b/xen/arch/x86/guest/xen.c
index f66c10fbe5..0a4c02a8cd 100644
--- a/xen/arch/x86/guest/xen.c
+++ b/xen/arch/x86/guest/xen.c
@@ -77,23 +77,22 @@ void __init probe_hypervisor(void)
     xen_guest = true;
 }
 
-static void map_shared_info(struct e820map *e820)
+static void map_shared_info(void)
 {
-    paddr_t frame = 0xff000000; /* TODO: Hardcoded beside magic frames. */
+    mfn_t mfn;
     struct xen_add_to_physmap xatp = {
         .domid = DOMID_SELF,
-        .idx = 0,
         .space = XENMAPSPACE_shared_info,
-        .gpfn = frame >> PAGE_SHIFT,
     };
 
-    if ( !e820_add_range(e820, frame, frame + PAGE_SIZE, E820_RESERVED) )
-        panic("Failed to reserve shared_info range");
+    if ( hypervisor_alloc_unused_page(&mfn) )
+        panic("unable to reserve shared info memory page");
 
+    xatp.gpfn = mfn_x(mfn);
     if ( xen_hypercall_memory_op(XENMEM_add_to_physmap, &xatp) )
         panic("Failed to map shared_info page");
 
-    set_fixmap(FIX_XEN_SHARED_INFO, frame);
+    set_fixmap(FIX_XEN_SHARED_INFO, mfn_x(mfn) << PAGE_SHIFT);
 }
 
 static void xen_evtchn_upcall(struct cpu_user_regs *regs)
@@ -187,15 +186,12 @@ static void __init init_memmap(void)
     }
 }
 
-void __init hypervisor_early_setup(struct e820map *e820)
-{
-    map_shared_info(e820);
-}
-
 void __init hypervisor_setup(void)
 {
     init_memmap();
 
+    map_shared_info();
+
     init_evtchn();
 }
 
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 1b3576bc7d..9b45a4fd94 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -898,9 +898,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     /* Create a temporary copy of the E820 map. */
     memcpy(&boot_e820, &e820, sizeof(e820));
 
-    if ( xen_guest )
-        hypervisor_early_setup(&boot_e820);
-
     /* Early kexec reservation (explicit static start address). */
     nr_pages = 0;
     for ( i = 0; i < e820.nr_map; i++ )
diff --git a/xen/include/asm-x86/guest/xen.h b/xen/include/asm-x86/guest/xen.h
index ba826d75db..7a4d734795 100644
--- a/xen/include/asm-x86/guest/xen.h
+++ b/xen/include/asm-x86/guest/xen.h
@@ -31,7 +31,6 @@
 extern bool xen_guest;
 
 void probe_hypervisor(void);
-void hypervisor_early_setup(struct e820map *e820);
 void hypervisor_setup(void);
 void hypervisor_ap_setup(void);
 int hypervisor_alloc_unused_page(mfn_t *mfn);
@@ -42,10 +41,6 @@ int hypervisor_free_unused_page(mfn_t mfn);
 #define xen_guest 0
 
 static inline void probe_hypervisor(void) {};
-static inline void hypervisor_early_setup(struct e820map *e820)
-{
-    ASSERT_UNREACHABLE();
-};
 
 static inline void hypervisor_setup(void)
 {
-- 
2.11.0


* [PATCH RFC v1 46/74] xen/guest: fetch vCPU ID from Xen
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (44 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 45/74] x86/guest: use unpopulated memory to map the shared_info page Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-08 11:04   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 47/74] x86/guest: fix upcall vector setup Wei Liu
                   ` (29 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

Use the vCPU ID reported by Xen through CPUID if available, falling back
to smp_processor_id() otherwise.
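
set_vcpu_id() reads CPUID leaf xen_cpuid_base + 4 and trusts EBX only
when EAX advertises XEN_HVM_CPUID_VCPU_ID_PRESENT. The decision itself
can be sketched as a pure function, which keeps it testable outside a
guest. The flag value below is mirrored from Xen's
public/arch-x86/cpuid.h as we understand it and should be checked
against the headers in use; choose_vcpu_id() is a hypothetical helper:

```c
#include <stdint.h>

/*
 * EAX bit of CPUID leaf (Xen base + 4) signalling that EBX holds the
 * vCPU ID. Assumed value, mirrored from Xen's public/arch-x86/cpuid.h.
 */
#define XEN_HVM_CPUID_VCPU_ID_PRESENT (1u << 3)

/*
 * Pick the vCPU ID the way set_vcpu_id() does: use EBX when Xen reports
 * it, otherwise fall back to the local CPU number.
 */
static unsigned int choose_vcpu_id(uint32_t eax, uint32_t ebx,
                                   unsigned int cpu)
{
    return (eax & XEN_HVM_CPUID_VCPU_ID_PRESENT) ? ebx : cpu;
}
```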

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
To be moved before "x86/guest: enable event channels upcalls"
---
 xen/arch/x86/guest/xen.c | 24 +++++++++++++++++++++---
 xen/arch/x86/time.c      |  4 +++-
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/guest/xen.c b/xen/arch/x86/guest/xen.c
index 0a4c02a8cd..0f2c5d7413 100644
--- a/xen/arch/x86/guest/xen.c
+++ b/xen/arch/x86/guest/xen.c
@@ -37,6 +37,8 @@ static uint8_t evtchn_upcall_vector;
 extern char hypercall_page[];
 static struct rangeset *mem;
 
+DEFINE_PER_CPU(unsigned int, vcpu_id);
+
 static void __init find_xen_leaves(void)
 {
     uint32_t eax, ebx, ecx, edx, base;
@@ -95,10 +97,24 @@ static void map_shared_info(void)
     set_fixmap(FIX_XEN_SHARED_INFO, mfn_x(mfn) << PAGE_SHIFT);
 }
 
+static void set_vcpu_id(void)
+{
+    uint32_t eax, ebx, ecx, edx;
+
+    ASSERT(xen_cpuid_base);
+
+    /* Fetch vcpu id from cpuid. */
+    cpuid(xen_cpuid_base + 4, &eax, &ebx, &ecx, &edx);
+    if ( eax & XEN_HVM_CPUID_VCPU_ID_PRESENT )
+        this_cpu(vcpu_id) = ebx;
+    else
+        this_cpu(vcpu_id) = smp_processor_id();
+}
+
 static void xen_evtchn_upcall(struct cpu_user_regs *regs)
 {
-    unsigned int cpu = smp_processor_id();
-    struct vcpu_info *vcpu_info = &XEN_shared_info->vcpu_info[cpu];
+    struct vcpu_info *vcpu_info =
+        &XEN_shared_info->vcpu_info[this_cpu(vcpu_id)];
 
     vcpu_info->evtchn_upcall_pending = 0;
     xchg(&vcpu_info->evtchn_pending_sel, 0);
@@ -110,7 +126,7 @@ static void xen_evtchn_upcall(struct cpu_user_regs *regs)
 
 static void ap_setup_event_channels(bool clear)
 {
-    unsigned int i, cpu = smp_processor_id();
+    unsigned int i, cpu = this_cpu(vcpu_id);
     struct vcpu_info *vcpu_info = &XEN_shared_info->vcpu_info[cpu];
     int rc;
 
@@ -191,12 +207,14 @@ void __init hypervisor_setup(void)
     init_memmap();
 
     map_shared_info();
+    set_vcpu_id();
 
     init_evtchn();
 }
 
 void hypervisor_ap_setup(void)
 {
+    set_vcpu_id();
     ap_setup_event_channels(false);
 }
 
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 886fc45248..85bcb9b28a 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -533,6 +533,8 @@ static struct platform_timesource __initdata plt_tsc =
  * Xen clock source is a variant of TSC source.
  */
 
+DECLARE_PER_CPU(unsigned int, vcpu_id);
+
 static u64 xen_timer_cpu_frequency(void)
 {
     struct vcpu_time_info *info = &XEN_shared_info->vcpu_info[0].time;
@@ -575,7 +577,7 @@ static u64 last_value;
 static u64 read_xen_timer(void)
 {
     struct vcpu_time_info *info;
-    unsigned int cpu = smp_processor_id();
+    unsigned int cpu = this_cpu(vcpu_id);
     u32 version;
     u64 ret;
     u64 last;
-- 
2.11.0


* [PATCH RFC v1 47/74] x86/guest: fix upcall vector setup
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (45 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 46/74] xen/guest: fetch vCPU ID from Xen Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-08 11:08   ` Jan Beulich
  2018-01-04 13:05 ` [PATCH RFC v1 48/74] x86/guest: unmask console event channel Wei Liu
                   ` (28 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

Instead of forcing no pending event on the vCPU, just mask all event
channels when setting up the BSP; further patches will unmask them as
event channels are set up.
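
Masking every event channel amounts to setting every bit of every word
in shared_info's evtchn_mask[] array, which the patch does with xchg().
Iterating up to ARRAY_SIZE() rather than a literal count covers the
whole array. A minimal sketch using C11 atomic_exchange() as a stand-in
for Xen's xchg(), with a hypothetical 64-word array in place of the real
shared_info field:

```c
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for shared_info's evtchn_mask[]; size is illustrative. */
static _Atomic unsigned long evtchn_mask[64];

/* Mask every event channel by atomically setting all bits of each word. */
static void mask_all_evtchns(void)
{
    for ( size_t i = 0; i < ARRAY_SIZE(evtchn_mask); i++ )
        atomic_exchange(&evtchn_mask[i], ~0UL);
}
```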

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
To be squashed with "x86/guest: enable event channels upcalls"
---
 xen/arch/x86/guest/xen.c | 61 +++++++++++-------------------------------------
 1 file changed, 14 insertions(+), 47 deletions(-)

diff --git a/xen/arch/x86/guest/xen.c b/xen/arch/x86/guest/xen.c
index 0f2c5d7413..a95c36017f 100644
--- a/xen/arch/x86/guest/xen.c
+++ b/xen/arch/x86/guest/xen.c
@@ -33,7 +33,6 @@
 bool xen_guest;
 
 static uint32_t xen_cpuid_base;
-static uint8_t evtchn_upcall_vector;
 extern char hypercall_page[];
 static struct rangeset *mem;
 
@@ -86,6 +85,7 @@ static void map_shared_info(void)
         .domid = DOMID_SELF,
         .space = XENMAPSPACE_shared_info,
     };
+    unsigned int i;
 
     if ( hypervisor_alloc_unused_page(&mfn) )
         panic("unable to reserve shared info memory page");
@@ -95,6 +95,10 @@ static void map_shared_info(void)
         panic("Failed to map shared_info page");
 
     set_fixmap(FIX_XEN_SHARED_INFO, mfn_x(mfn) << PAGE_SHIFT);
+
+    /* Mask all upcalls */
+    for ( i = 0; i < ARRAY_SIZE(XEN_shared_info->evtchn_mask); i++ )
+        xchg(&XEN_shared_info->evtchn_mask[i], ~0ul);
 }
 
 static void set_vcpu_id(void)
@@ -124,58 +128,21 @@ static void xen_evtchn_upcall(struct cpu_user_regs *regs)
     ack_APIC_irq();
 }
 
-static void ap_setup_event_channels(bool clear)
+static void init_evtchn(void)
 {
-    unsigned int i, cpu = this_cpu(vcpu_id);
-    struct vcpu_info *vcpu_info = &XEN_shared_info->vcpu_info[cpu];
+    unsigned int vcpu = this_cpu(vcpu_id);
+    static uint8_t evtchn_upcall_vector;
     int rc;
 
-    ASSERT(evtchn_upcall_vector);
-    ASSERT(cpu < ARRAY_SIZE(XEN_shared_info->vcpu_info));
+    if ( !evtchn_upcall_vector )
+        alloc_direct_apic_vector(&evtchn_upcall_vector, xen_evtchn_upcall);
 
-    if ( !clear )
-    {
-        /*
-         * This is necessary to ensure that a CPU will be interrupted in case
-         * of an event channel notification.
-         */
-        ASSERT(vcpu_info->evtchn_upcall_pending == 0);
-        ASSERT(vcpu_info->evtchn_pending_sel == 0);
-    }
+    ASSERT(evtchn_upcall_vector);
+    ASSERT(vcpu < ARRAY_SIZE(XEN_shared_info->vcpu_info));
 
-    rc = xen_hypercall_set_evtchn_upcall_vector(cpu, evtchn_upcall_vector);
+    rc = xen_hypercall_set_evtchn_upcall_vector(vcpu, evtchn_upcall_vector);
     if ( rc )
         panic("Unable to set evtchn upcall vector: %d", rc);
-
-    if ( clear )
-    {
-        /*
-         * Clear any pending upcall bits. This makes us effectively ignore any
-         * previous upcalls which might be suboptimal.
-         */
-        vcpu_info->evtchn_upcall_pending = 0;
-        xchg(&vcpu_info->evtchn_pending_sel, 0);
-
-        /*
-         * evtchn_pending can be cleared only on the boot CPU because it's
-         * located in a shared structure.
-         */
-        for ( i = 0; i < 8; i++ )
-            xchg(&XEN_shared_info->evtchn_pending[i], 0);
-    }
-}
-
-static void __init init_evtchn(void)
-{
-    unsigned int i;
-
-    alloc_direct_apic_vector(&evtchn_upcall_vector, xen_evtchn_upcall);
-
-    /* Mask all upcalls */
-    for ( i = 0; i < 8; i++ )
-        xchg(&XEN_shared_info->evtchn_mask[i], ~0ul);
-
-    ap_setup_event_channels(true);
 }
 
 static void __init init_memmap(void)
@@ -215,7 +182,7 @@ void __init hypervisor_setup(void)
 void hypervisor_ap_setup(void)
 {
     set_vcpu_id();
-    ap_setup_event_channels(false);
+    init_evtchn();
 }
 
 int hypervisor_alloc_unused_page(mfn_t *mfn)
-- 
2.11.0


* [PATCH RFC v1 48/74] x86/guest: unmask console event channel
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (46 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 47/74] x86/guest: fix upcall vector setup Wei Liu
@ 2018-01-04 13:05 ` Wei Liu
  2018-01-04 13:06 ` [PATCH RFC v1 49/74] x86/guest: map per-cpu vcpu_info area Wei Liu
                   ` (27 subsequent siblings)
  75 siblings, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:05 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
To be squashed with "x86/guest: use PV console for Xen/Dom0 I/O"
---
 xen/drivers/char/xen_pv_console.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/xen/drivers/char/xen_pv_console.c b/xen/drivers/char/xen_pv_console.c
index 5e494bc72a..2df7d982ba 100644
--- a/xen/drivers/char/xen_pv_console.c
+++ b/xen/drivers/char/xen_pv_console.c
@@ -37,6 +37,7 @@ static DEFINE_SPINLOCK(tx_lock);
 
 void __init pv_console_init(void)
 {
+    struct evtchn_unmask unmask;
     long r;
     uint64_t raw_pfn = 0, raw_evtchn = 0;
 
@@ -58,6 +59,9 @@ void __init pv_console_init(void)
     cons_ring = (struct xencons_interface *)fix_to_virt(FIX_PV_CONSOLE);
     cons_evtchn = raw_evtchn;
 
+    unmask.port = raw_evtchn;
+    BUG_ON(xen_hypercall_event_channel_op(EVTCHNOP_unmask, &unmask));
+
     printk("Initialised PV console at 0x%p with pfn %#lx and evtchn %#x\n",
             cons_ring, raw_pfn, cons_evtchn);
     return;
-- 
2.11.0


* [PATCH RFC v1 49/74] x86/guest: map per-cpu vcpu_info area.
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (47 preceding siblings ...)
  2018-01-04 13:05 ` [PATCH RFC v1 48/74] x86/guest: unmask console event channel Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-08 13:21   ` Jan Beulich
  2018-01-04 13:06 ` [PATCH RFC v1 50/74] xen/pvshim: remove Dom0 kernel support check Wei Liu
                   ` (26 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

Map a per-CPU vcpu_info area so that the XEN_LEGACY_MAX_VCPUS limit can
be lifted.
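
VCPUOP_register_vcpu_info takes the frame and the in-page offset of the
new vcpu_info, which map_vcpuinfo() derives with virt_to_mfn() and
`& ~PAGE_MASK`. A minimal sketch of that split, working on a physical
address rather than a Xen virtual address and using a hypothetical
struct reg_info in place of struct vcpu_register_vcpu_info. It also
rejects an area that would cross a page boundary; our understanding is
that Xen refuses such registrations, so treat that check as an
assumption:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12                       /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Hypothetical mirror of the fields filled in for registration. */
struct reg_info {
    uint64_t frame;    /* frame containing the vcpu_info */
    uint32_t offset;   /* byte offset of the vcpu_info within that frame */
};

/*
 * Split the physical address of a vcpu_info of `len` bytes into frame
 * and offset, refusing areas that would straddle a page boundary.
 */
static bool split_vcpu_info(uint64_t paddr, size_t len, struct reg_info *info)
{
    uint64_t offset = paddr & ~PAGE_MASK;

    assert(len != 0 && len <= PAGE_SIZE);
    if ( offset + len > PAGE_SIZE )
        return false;

    info->frame = paddr >> PAGE_SHIFT;
    info->offset = (uint32_t)offset;

    return true;
}
```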

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Maybe this should be moved earlier?
---
 xen/arch/x86/guest/xen.c              | 61 +++++++++++++++++++++++++++++++++--
 xen/arch/x86/time.c                   | 11 ++-----
 xen/include/asm-x86/guest/hypercall.h |  7 ++++
 3 files changed, 69 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/guest/xen.c b/xen/arch/x86/guest/xen.c
index a95c36017f..3fa164aba8 100644
--- a/xen/arch/x86/guest/xen.c
+++ b/xen/arch/x86/guest/xen.c
@@ -38,6 +38,10 @@ static struct rangeset *mem;
 
 DEFINE_PER_CPU(unsigned int, vcpu_id);
 
+static struct vcpu_info *vcpu_info;
+unsigned long vcpu_info_mapped[BITS_TO_LONGS(NR_CPUS)];
+DEFINE_PER_CPU(struct vcpu_info *, vcpu_info);
+
 static void __init find_xen_leaves(void)
 {
     uint32_t eax, ebx, ecx, edx, base;
@@ -101,6 +105,38 @@ static void map_shared_info(void)
         xchg(&XEN_shared_info->evtchn_mask[i], ~0ul);
 }
 
+static int map_vcpuinfo(void)
+{
+    unsigned int vcpu = this_cpu(vcpu_id);
+    struct vcpu_register_vcpu_info info = { };
+    long rc;
+
+    if ( !vcpu_info )
+    {
+        this_cpu(vcpu_info) = &XEN_shared_info->vcpu_info[vcpu];
+        return 0;
+    }
+
+    if ( test_bit(vcpu, vcpu_info_mapped) )
+    {
+        this_cpu(vcpu_info) = &vcpu_info[vcpu];
+        return 0;
+    }
+
+    info.mfn = virt_to_mfn(&vcpu_info[vcpu]);
+    info.offset = (unsigned long)&vcpu_info[vcpu] & ~PAGE_MASK;
+    rc = xen_hypercall_vcpu_op(VCPUOP_register_vcpu_info, vcpu, &info);
+    if ( rc )
+        this_cpu(vcpu_info) = &XEN_shared_info->vcpu_info[vcpu];
+    else
+    {
+        this_cpu(vcpu_info) = &vcpu_info[vcpu];
+        set_bit(vcpu, vcpu_info_mapped);
+    }
+
+    return rc;
+}
+
 static void set_vcpu_id(void)
 {
     uint32_t eax, ebx, ecx, edx;
@@ -117,8 +153,7 @@ static void set_vcpu_id(void)
 
 static void xen_evtchn_upcall(struct cpu_user_regs *regs)
 {
-    struct vcpu_info *vcpu_info =
-        &XEN_shared_info->vcpu_info[this_cpu(vcpu_id)];
+    struct vcpu_info *vcpu_info = this_cpu(vcpu_info);
 
     vcpu_info->evtchn_upcall_pending = 0;
     xchg(&vcpu_info->evtchn_pending_sel, 0);
@@ -176,12 +211,34 @@ void __init hypervisor_setup(void)
     map_shared_info();
     set_vcpu_id();
 
+    vcpu_info = xzalloc_array(struct vcpu_info, nr_cpu_ids);
+    if ( map_vcpuinfo() || !vcpu_info )
+    {
+        if ( vcpu_info )
+        {
+            xfree(vcpu_info);
+            vcpu_info = NULL;
+        }
+        if ( nr_cpu_ids > XEN_LEGACY_MAX_VCPUS )
+        {
+            unsigned int i;
+
+            for ( i = XEN_LEGACY_MAX_VCPUS; i < nr_cpu_ids; i++ )
+                __cpumask_clear_cpu(i, &cpu_present_map);
+            nr_cpu_ids = XEN_LEGACY_MAX_VCPUS;
+            printk(XENLOG_WARNING
+                   "unable to map vCPU info, limiting vCPUs to: %u\n",
+                   XEN_LEGACY_MAX_VCPUS);
+        }
+    }
+
     init_evtchn();
 }
 
 void hypervisor_ap_setup(void)
 {
     set_vcpu_id();
+    map_vcpuinfo();
     init_evtchn();
 }
 
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 85bcb9b28a..1294c88240 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -533,11 +533,11 @@ static struct platform_timesource __initdata plt_tsc =
  * Xen clock source is a variant of TSC source.
  */
 
-DECLARE_PER_CPU(unsigned int, vcpu_id);
+DECLARE_PER_CPU(struct vcpu_info *, vcpu_info);
 
 static u64 xen_timer_cpu_frequency(void)
 {
-    struct vcpu_time_info *info = &XEN_shared_info->vcpu_info[0].time;
+    struct vcpu_time_info *info = &this_cpu(vcpu_info)->time;
     u64 freq;
 
     freq = 1000000000ULL << 32;
@@ -576,16 +576,11 @@ u64 __read_cycle(const struct vcpu_time_info *info, u64 tsc)
 static u64 last_value;
 static u64 read_xen_timer(void)
 {
-    struct vcpu_time_info *info;
-    unsigned int cpu = this_cpu(vcpu_id);
+    struct vcpu_time_info *info = &this_cpu(vcpu_info)->time;
     u32 version;
     u64 ret;
     u64 last;
 
-    /* TODO: lift this restriction */
-    ASSERT(cpu < XEN_LEGACY_MAX_VCPUS);
-    info = &XEN_shared_info->vcpu_info[cpu].time;
-
     do {
         version = info->version & ~1;
         /* Make sure version is read before the data */
diff --git a/xen/include/asm-x86/guest/hypercall.h b/xen/include/asm-x86/guest/hypercall.h
index 90b4755467..7d11df29fa 100644
--- a/xen/include/asm-x86/guest/hypercall.h
+++ b/xen/include/asm-x86/guest/hypercall.h
@@ -23,6 +23,7 @@
 
 #include <public/xen.h>
 #include <public/sched.h>
+#include <public/vcpu.h>
 #include <public/hvm/hvm_op.h>
 
 #ifdef CONFIG_XEN_GUEST
@@ -107,6 +108,12 @@ static inline long xen_hypercall_hvm_op(unsigned int op, void *arg)
     return _hypercall64_2(long, __HYPERVISOR_hvm_op, op, arg);
 }
 
+static inline long xen_hypercall_vcpu_op(unsigned int cmd, unsigned int vcpu,
+                                         void *arg)
+{
+    return _hypercall64_3(long, __HYPERVISOR_vcpu_op, cmd, vcpu, arg);
+}
+
 /*
  * Higher level hypercall helpers
  */
-- 
2.11.0


* [PATCH RFC v1 50/74] xen/pvshim: remove Dom0 kernel support check
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (48 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 49/74] x86/guest: map per-cpu vcpu_info area Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-08 13:28   ` Jan Beulich
  2018-01-04 13:06 ` [PATCH RFC v1 51/74] xen/pvshim: don't allow access to iomem or ioports Wei Liu
                   ` (25 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/pv/dom0_build.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 23d5993b7e..95347c6fd2 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -19,6 +19,7 @@
 #include <asm/dom0_build.h>
 #include <asm/page.h>
 #include <asm/pv/mm.h>
+#include <asm/pv/shim.h>
 #include <asm/setup.h>
 
 /* Allow ring-3 access in long mode as guest cannot use ring 1 ... */
@@ -373,7 +374,7 @@ int __init dom0_construct_pv(struct domain *d,
 
     if ( parms.elf_notes[XEN_ELFNOTE_SUPPORTED_FEATURES].type != XEN_ENT_NONE )
     {
-        if ( !test_bit(XENFEAT_dom0, parms.f_supported) )
+        if ( !pv_shim && !test_bit(XENFEAT_dom0, parms.f_supported) )
         {
             printk("Kernel does not support Dom0 operation\n");
             rc = -EINVAL;
-- 
2.11.0


* [PATCH RFC v1 51/74] xen/pvshim: don't allow access to iomem or ioports
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (49 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 50/74] xen/pvshim: remove Dom0 kernel support check Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-08 13:29   ` Jan Beulich
  2018-01-04 13:06 ` [PATCH RFC v1 52/74] xen: mark xenstore/console pages as RAM and add them to dom_io Wei Liu
                   ` (24 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/dom0_build.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index bf992fef6d..357fd87f39 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -16,6 +16,7 @@
 #include <asm/hpet.h>
 #include <asm/io_apic.h>
 #include <asm/p2m.h>
+#include <asm/pv/shim.h>
 #include <asm/setup.h>
 
 static long __initdata dom0_nrpages;
@@ -385,6 +386,9 @@ int __init dom0_setup_permissions(struct domain *d)
     unsigned int i;
     int rc;
 
+    if ( pv_shim )
+        return 0;
+
     /* The hardware domain is initially permitted full I/O capabilities. */
     rc = ioports_permit_access(d, 0, 0xFFFF);
     rc |= iomem_permit_access(d, 0UL, (1UL << (paddr_bits - PAGE_SHIFT)) - 1);
-- 
2.11.0


* [PATCH RFC v1 52/74] xen: mark xenstore/console pages as RAM and add them to dom_io
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (50 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 51/74] xen/pvshim: don't allow access to iomem or ioports Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-08 13:49   ` Jan Beulich
  2018-01-04 13:06 ` [PATCH RFC v1 53/74] xen/pvshim: modify Dom0 builder in order to build a DomU Wei Liu
                   ` (23 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/e820.c               |  4 +++
 xen/arch/x86/guest/xen.c          | 76 +++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/mm.c                 |  3 ++
 xen/common/page_alloc.c           | 15 ++++++++
 xen/drivers/char/xen_pv_console.c |  4 +++
 xen/include/asm-x86/guest/xen.h   | 21 +++++++++++
 6 files changed, 123 insertions(+)

diff --git a/xen/arch/x86/e820.c b/xen/arch/x86/e820.c
index b422a684ee..590ea985ef 100644
--- a/xen/arch/x86/e820.c
+++ b/xen/arch/x86/e820.c
@@ -9,6 +9,7 @@
 #include <asm/processor.h>
 #include <asm/mtrr.h>
 #include <asm/msr.h>
+#include <asm/guest.h>
 
 /*
  * opt_mem: Limit maximum address of physical RAM.
@@ -699,6 +700,9 @@ unsigned long __init init_e820(const char *str, struct e820map *raw)
 
     machine_specific_memory_setup(raw);
 
+    if ( xen_guest )
+        hypervisor_fixup_e820(&e820);
+
     printk("%s RAM map:\n", str);
     print_e820_memory_map(e820.map, e820.nr_map);
 
diff --git a/xen/arch/x86/guest/xen.c b/xen/arch/x86/guest/xen.c
index 3fa164aba8..b7743e646d 100644
--- a/xen/arch/x86/guest/xen.c
+++ b/xen/arch/x86/guest/xen.c
@@ -29,6 +29,7 @@
 #include <asm/processor.h>
 
 #include <public/arch-x86/cpuid.h>
+#include <public/hvm/params.h>
 
 bool xen_guest;
 
@@ -259,6 +260,81 @@ int hypervisor_free_unused_page(mfn_t mfn)
     return rangeset_remove_range(mem, mfn_x(mfn), mfn_x(mfn));
 }
 
+static void __init mark_pfn_as_ram(struct e820map *e820, uint64_t pfn)
+{
+    if ( !e820_add_range(e820, pfn << PAGE_SHIFT,
+                         (pfn << PAGE_SHIFT) + PAGE_SIZE, E820_RAM) )
+        if ( !e820_change_range_type(e820, pfn << PAGE_SHIFT,
+                                     (pfn << PAGE_SHIFT) + PAGE_SIZE,
+                                     E820_RESERVED, E820_RAM) )
+            panic("Unable to add/change memory type of pfn %#lx to RAM", pfn);
+}
+
+void __init hypervisor_fixup_e820(struct e820map *e820)
+{
+    uint64_t pfn = 0;
+    long rc;
+
+    if ( !xen_guest )
+        return;
+
+#define MARK_PARAM_RAM(p) ({                    \
+    rc = xen_hypercall_hvm_get_param(p, &pfn);  \
+    if ( rc )                                   \
+        panic("Unable to get " #p);             \
+    mark_pfn_as_ram(e820, pfn);                 \
+})
+    MARK_PARAM_RAM(HVM_PARAM_STORE_PFN);
+    if ( !pv_console )
+        MARK_PARAM_RAM(HVM_PARAM_CONSOLE_PFN);
+#undef MARK_PARAM_RAM
+}
+
+void __init hypervisor_init_memory(void)
+{
+    uint64_t pfn = 0;
+    long rc;
+
+    if ( !xen_guest )
+        return;
+
+#define SHARE_PARAM(p) ({                                                   \
+    rc = xen_hypercall_hvm_get_param(p, &pfn);                              \
+    if ( rc )                                                               \
+        panic("Unable to get " #p);                                         \
+    share_xen_page_with_guest(mfn_to_page(pfn), dom_io, XENSHARE_writable); \
+})
+    SHARE_PARAM(HVM_PARAM_STORE_PFN);
+    if ( !pv_console )
+        SHARE_PARAM(HVM_PARAM_CONSOLE_PFN);
+#undef SHARE_PARAM
+}
+
+const unsigned long *__init hypervisor_reserved_pages(unsigned int *size)
+{
+    static unsigned long __initdata reserved_pages[2];
+    uint64_t pfn = 0;
+    long rc;
+
+    if ( !xen_guest )
+        return NULL;
+
+    *size = 0;
+
+#define RESERVE_PARAM(p) ({                             \
+    rc = xen_hypercall_hvm_get_param(p, &pfn);          \
+    if ( rc )                                           \
+        panic("Unable to get " #p);                     \
+    reserved_pages[(*size)++] = pfn << PAGE_SHIFT;      \
+})
+    RESERVE_PARAM(HVM_PARAM_STORE_PFN);
+    if ( !pv_console )
+        RESERVE_PARAM(HVM_PARAM_CONSOLE_PFN);
+#undef RESERVE_PARAM
+
+    return reserved_pages;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 355e6747bb..4332d3bb39 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -298,6 +298,9 @@ void __init arch_init_memory(void)
         share_xen_page_with_guest(mfn_to_page(_mfn(i)),
                                   dom_io, XENSHARE_writable);
 
+    if ( xen_guest )
+        hypervisor_init_memory();
+
     /* Any areas not specified as RAM by the e820 map are considered I/O. */
     for ( i = 0, pfn = 0; pfn < max_page; i++ )
     {
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index c0c2d82906..4de8988bea 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -143,6 +143,7 @@
 #include <asm/numa.h>
 #include <asm/flushtlb.h>
 #ifdef CONFIG_X86
+#include <asm/guest.h>
 #include <asm/p2m.h>
 #include <asm/setup.h> /* for highmem_start only */
 #else
@@ -303,6 +304,20 @@ void __init init_boot_pages(paddr_t ps, paddr_t pe)
             badpage++;
         }
     }
+
+    if ( xen_guest )
+    {
+        badpage = hypervisor_reserved_pages(&array_size);
+        if ( badpage )
+        {
+            for ( i = 0; i < array_size; i++ )
+            {
+                bootmem_region_zap(*badpage >> PAGE_SHIFT,
+                                   (*badpage >> PAGE_SHIFT) + 1);
+                badpage++;
+            }
+        }
+    }
 #endif
 
     /* Check new pages against the bad-page list. */
diff --git a/xen/drivers/char/xen_pv_console.c b/xen/drivers/char/xen_pv_console.c
index 2df7d982ba..6aa694e395 100644
--- a/xen/drivers/char/xen_pv_console.c
+++ b/xen/drivers/char/xen_pv_console.c
@@ -35,6 +35,8 @@ static evtchn_port_t cons_evtchn;
 static serial_rx_fn cons_rx_handler;
 static DEFINE_SPINLOCK(tx_lock);
 
+bool pv_console;
+
 void __init pv_console_init(void)
 {
     struct evtchn_unmask unmask;
@@ -64,6 +66,8 @@ void __init pv_console_init(void)
 
     printk("Initialised PV console at 0x%p with pfn %#lx and evtchn %#x\n",
             cons_ring, raw_pfn, cons_evtchn);
+    pv_console = true;
+
     return;
 
  error:
diff --git a/xen/include/asm-x86/guest/xen.h b/xen/include/asm-x86/guest/xen.h
index 7a4d734795..898156d42e 100644
--- a/xen/include/asm-x86/guest/xen.h
+++ b/xen/include/asm-x86/guest/xen.h
@@ -29,16 +29,21 @@
 #ifdef CONFIG_XEN_GUEST
 
 extern bool xen_guest;
+extern bool pv_console;
 
 void probe_hypervisor(void);
 void hypervisor_setup(void);
 void hypervisor_ap_setup(void);
 int hypervisor_alloc_unused_page(mfn_t *mfn);
 int hypervisor_free_unused_page(mfn_t mfn);
+void hypervisor_fixup_e820(struct e820map *e820);
+void hypervisor_init_memory(void);
+const unsigned long *hypervisor_reserved_pages(unsigned int *size);
 
 #else
 
 #define xen_guest 0
+#define pv_console 0
 
 static inline void probe_hypervisor(void) {};
 
@@ -65,6 +70,22 @@ static inline int hypervisor_free_unused_page(mfn_t mfn)
     return 0;
 }
 
+static inline void hypervisor_fixup_e820(struct e820map *e820)
+{
+    ASSERT_UNREACHABLE();
+}
+
+static inline void hypervisor_init_memory(void)
+{
+    ASSERT_UNREACHABLE();
+}
+
+static inline const unsigned long *hypervisor_reserved_pages(unsigned int *size)
+{
+    ASSERT_UNREACHABLE();
+    return NULL;
+};
+
 #endif /* CONFIG_XEN_GUEST */
 #endif /* __X86_GUEST_XEN_H__ */
 
-- 
2.11.0


* [PATCH RFC v1 53/74] xen/pvshim: modify Dom0 builder in order to build a DomU
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (51 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 52/74] xen: mark xenstore/console pages as RAM and add them to dom_io Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-08 14:06   ` Jan Beulich
  2018-01-09  9:06   ` Jan Beulich
  2018-01-04 13:06 ` [PATCH RFC v1 54/74] xen/pvshim: set correct domid value Wei Liu
                   ` (22 subsequent siblings)
  75 siblings, 2 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

According to the PV ABI, the initial virtual memory layout should
contain the xenstore and console pages immediately after the
start_info page. Fix this, and also add those pages to the p2m/m2p
after the start_info page.

Also set the correct start_info values for DomU operation.
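To illustrate the layout described above, the following sketch computes where the start_info page, the xenstore ring, the console ring and the page tables land, with and without the shim. It is not the code from this patch. `struct pv_layout`, `layout_pv_regions()`, `sketch_round_pgup()` and the fixed 4 KiB page size are hypothetical simplifications of what dom0_construct_pv() does. The start_info and dom0 VGA info sizes are passed in rather than taken from `sizeof(struct start_info)` and `sizeof(struct dom0_vga_console_info)`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for PAGE_SIZE and round_pgup() in dom0_build.c. */
#define SKETCH_PAGE_SIZE 4096UL
#define sketch_round_pgup(x) \
    (((x) + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1))

struct pv_layout {
    unsigned long startinfo_start, startinfo_end;
    unsigned long xenstore_start, xenstore_end;   /* left 0 without shim */
    unsigned long console_start, console_end;     /* left 0 without shim */
    unsigned long pt_start;
};

/*
 * start_info goes on the first page after the physmap.  With the shim,
 * the xenstore and console rings take one page each right after it and
 * the page tables follow them.  Without the shim, the dom0 VGA info is
 * appended to start_info and the page tables follow directly.
 */
static struct pv_layout layout_pv_regions(unsigned long vphysmap_end,
                                          unsigned long startinfo_size,
                                          unsigned long vga_info_size,
                                          bool shim)
{
    struct pv_layout l = { 0 };

    l.startinfo_start = sketch_round_pgup(vphysmap_end);
    l.startinfo_end   = l.startinfo_start + startinfo_size +
                        (shim ? 0 : vga_info_size);

    if ( shim )
    {
        l.xenstore_start = sketch_round_pgup(l.startinfo_end);
        l.xenstore_end   = l.xenstore_start + SKETCH_PAGE_SIZE;
        l.console_start  = l.xenstore_end;
        l.console_end    = l.console_start + SKETCH_PAGE_SIZE;
        l.pt_start       = l.console_end;
    }
    else
        l.pt_start = sketch_round_pgup(l.startinfo_end);

    /* Page tables must start on a page boundary in both cases. */
    assert(l.pt_start % SKETCH_PAGE_SIZE == 0);
    return l;
}
```

The ordering matters because the PV guest locates these pages relative to start_info, so inserting the two ring pages shifts the page-table region by exactly two pages in shim mode.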

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/pv/dom0_build.c     | 38 ++++++++++++++++++-----
 xen/arch/x86/pv/shim.c           | 66 ++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/dom0_build.h |  4 +++
 xen/include/asm-x86/pv/shim.h    | 21 +++++++++++++
 4 files changed, 121 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 95347c6fd2..e152fe3a9e 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -31,9 +31,8 @@
 #define L3_PROT (BASE_PROT|_PAGE_DIRTY)
 #define L4_PROT (BASE_PROT|_PAGE_DIRTY)
 
-static __init void dom0_update_physmap(struct domain *d, unsigned long pfn,
-                                       unsigned long mfn,
-                                       unsigned long vphysmap_s)
+__init void dom0_update_physmap(struct domain *d, unsigned long pfn,
+                                unsigned long mfn, unsigned long vphysmap_s)
 {
     if ( !is_pv_32bit_domain(d) )
         ((unsigned long *)vphysmap_s)[pfn] = mfn;
@@ -316,6 +315,10 @@ int __init dom0_construct_pv(struct domain *d,
     unsigned long vphysmap_end;
     unsigned long vstartinfo_start;
     unsigned long vstartinfo_end;
+    unsigned long vxenstore_start = 0;
+    unsigned long vxenstore_end = 0;
+    unsigned long vconsole_start = 0;
+    unsigned long vconsole_end = 0;
     unsigned long vstack_start;
     unsigned long vstack_end;
     unsigned long vpt_start;
@@ -443,9 +446,18 @@ int __init dom0_construct_pv(struct domain *d,
     vstartinfo_start = round_pgup(vphysmap_end);
     vstartinfo_end   = (vstartinfo_start +
                         sizeof(struct start_info) +
-                        sizeof(struct dom0_vga_console_info));
+                        (pv_shim ? 0 : sizeof(struct dom0_vga_console_info)));
 
-    vpt_start        = round_pgup(vstartinfo_end);
+    if ( pv_shim )
+    {
+        vxenstore_start  = round_pgup(vstartinfo_end);
+        vxenstore_end    = vxenstore_start + PAGE_SIZE;
+        vconsole_start   = vxenstore_end;
+        vconsole_end     = vconsole_start + PAGE_SIZE;
+        vpt_start        = vconsole_end;
+    }
+    else
+        vpt_start        = round_pgup(vstartinfo_end);
     for ( nr_pt_pages = 2; ; nr_pt_pages++ )
     {
         vpt_end          = vpt_start + (nr_pt_pages * PAGE_SIZE);
@@ -538,6 +550,8 @@ int __init dom0_construct_pv(struct domain *d,
            " Init. ramdisk: %p->%p\n"
            " Phys-Mach map: %p->%p\n"
            " Start info:    %p->%p\n"
+           " Xenstore ring: %p->%p\n"
+           " Console ring:  %p->%p\n"
            " Page tables:   %p->%p\n"
            " Boot stack:    %p->%p\n"
            " TOTAL:         %p->%p\n",
@@ -545,6 +559,8 @@ int __init dom0_construct_pv(struct domain *d,
            _p(vinitrd_start), _p(vinitrd_end),
            _p(vphysmap_start), _p(vphysmap_end),
            _p(vstartinfo_start), _p(vstartinfo_end),
+           _p(vxenstore_start), _p(vxenstore_end),
+           _p(vconsole_start), _p(vconsole_end),
            _p(vpt_start), _p(vpt_end),
            _p(vstack_start), _p(vstack_end),
            _p(v_start), _p(v_end));
@@ -738,7 +754,8 @@ int __init dom0_construct_pv(struct domain *d,
 
     si->shared_info = virt_to_maddr(d->shared_info);
 
-    si->flags        = SIF_PRIVILEGED | SIF_INITDOMAIN;
+    if ( !pv_shim )
+        si->flags    = SIF_PRIVILEGED | SIF_INITDOMAIN;
     if ( !vinitrd_start && initrd_len )
         si->flags   |= SIF_MOD_START_PFN;
     si->flags       |= (xen_processor_pmbits << 8) & SIF_PM_MASK;
@@ -830,15 +847,20 @@ int __init dom0_construct_pv(struct domain *d,
         strlcpy((char *)si->cmd_line, cmdline, sizeof(si->cmd_line));
 
 #ifdef CONFIG_VIDEO
-    if ( fill_console_start_info((void *)(si + 1)) )
+    if ( !pv_shim && fill_console_start_info((void *)(si + 1)) )
     {
         si->console.dom0.info_off  = sizeof(struct start_info);
         si->console.dom0.info_size = sizeof(struct dom0_vga_console_info);
     }
 #endif
 
+    if ( pv_shim )
+        pv_shim_setup_dom(d, l4start, v_start, vxenstore_start, vconsole_start,
+                          vphysmap_start, si);
+
     if ( is_pv_32bit_domain(d) )
-        xlat_start_info(si, XLAT_start_info_console_dom0);
+        xlat_start_info(si, pv_shim ? XLAT_start_info_console_domU
+                                    : XLAT_start_info_console_dom0);
 
     /* Return to idle domain's page tables. */
     mapcache_override_current(NULL);
diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
index 4d037355db..5e7e46632b 100644
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -18,16 +18,82 @@
  *
  * Copyright (c) 2017 Citrix Systems Ltd.
  */
+#include <xen/hypercall.h>
 #include <xen/init.h>
 #include <xen/types.h>
 
 #include <asm/apic.h>
+#include <asm/dom0_build.h>
+#include <asm/guest.h>
+#include <asm/pv/mm.h>
 
 #ifndef CONFIG_PV_SHIM_EXCLUSIVE
 bool pv_shim;
 boolean_param("pv-shim", pv_shim);
 #endif
 
+#define L1_PROT (_PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_USER| \
+                 _PAGE_GUEST_KERNEL)
+#define COMPAT_L1_PROT (_PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED)
+
+static void __init replace_va(struct domain *d, l4_pgentry_t *l4start,
+                              unsigned long va, unsigned long mfn)
+{
+    struct page_info *page;
+    l4_pgentry_t *pl4e;
+    l3_pgentry_t *pl3e;
+    l2_pgentry_t *pl2e;
+    l1_pgentry_t *pl1e;
+
+    pl4e = l4start + l4_table_offset(va);
+    pl3e = l4e_to_l3e(*pl4e);
+    pl3e += l3_table_offset(va);
+    pl2e = l3e_to_l2e(*pl3e);
+    pl2e += l2_table_offset(va);
+    pl1e = l2e_to_l1e(*pl2e);
+    pl1e += l1_table_offset(va);
+
+    page = mfn_to_page(l1e_get_pfn(*pl1e));
+    /* Free original page, will be replaced */
+    put_page_and_type(page);
+    free_domheap_pages(page, 0);
+
+    *pl1e = l1e_from_pfn(mfn, (!is_pv_32bit_domain(d) ? L1_PROT
+                                                      : COMPAT_L1_PROT));
+}
+
+void __init pv_shim_setup_dom(struct domain *d, l4_pgentry_t *l4start,
+                              unsigned long va_start, unsigned long store_va,
+                              unsigned long console_va, unsigned long vphysmap,
+                              start_info_t *si)
+{
+    uint64_t param = 0;
+    long rc;
+
+#define SET_AND_MAP_PARAM(p, si, va) ({                                        \
+    rc = xen_hypercall_hvm_get_param(p, &param);                               \
+    if ( rc )                                                                  \
+        panic("Unable to get " #p "\n");                                       \
+    (si) = param;                                                              \
+    if ( va )                                                                  \
+    {                                                                          \
+        BUG_ON(unshare_xen_page_with_guest(mfn_to_page(param), dom_io));       \
+        share_xen_page_with_guest(mfn_to_page(param), d, XENSHARE_writable);   \
+        replace_va(d, l4start, va, param);                                     \
+        dom0_update_physmap(d, (va - va_start) >> PAGE_SHIFT, param, vphysmap);\
+    }                                                                          \
+})
+    SET_AND_MAP_PARAM(HVM_PARAM_STORE_PFN, si->store_mfn, store_va);
+    SET_AND_MAP_PARAM(HVM_PARAM_STORE_EVTCHN, si->store_evtchn, 0);
+    if ( !pv_console )
+    {
+        SET_AND_MAP_PARAM(HVM_PARAM_CONSOLE_PFN, si->console.domU.mfn,
+                          console_va);
+        SET_AND_MAP_PARAM(HVM_PARAM_CONSOLE_EVTCHN, si->console.domU.evtchn, 0);
+    }
+#undef SET_AND_MAP_PARAM
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/dom0_build.h b/xen/include/asm-x86/dom0_build.h
index d83d2b4387..d985406503 100644
--- a/xen/include/asm-x86/dom0_build.h
+++ b/xen/include/asm-x86/dom0_build.h
@@ -1,6 +1,7 @@
 #ifndef _DOM0_BUILD_H_
 #define _DOM0_BUILD_H_
 
+#include <xen/libelf.h>
 #include <xen/sched.h>
 
 #include <asm/setup.h>
@@ -29,6 +30,9 @@ int dom0_construct_pvh(struct domain *d, const module_t *image,
 unsigned long dom0_paging_pages(const struct domain *d,
                                 unsigned long nr_pages);
 
+void dom0_update_physmap(struct domain *d, unsigned long pfn,
+                         unsigned long mfn, unsigned long vphysmap_s);
+
 #endif	/* _DOM0_BUILD_H_ */
 
 /*
diff --git a/xen/include/asm-x86/pv/shim.h b/xen/include/asm-x86/pv/shim.h
index 1468cfd498..b0c361cba1 100644
--- a/xen/include/asm-x86/pv/shim.h
+++ b/xen/include/asm-x86/pv/shim.h
@@ -29,6 +29,27 @@ extern bool pv_shim;
 # define pv_shim 0
 #endif /* CONFIG_PV_SHIM{,_EXCLUSIVE} */
 
+#ifdef CONFIG_PV_SHIM
+
+void pv_shim_setup_dom(struct domain *d, l4_pgentry_t *l4start,
+                       unsigned long va_start, unsigned long store_va,
+                       unsigned long console_va, unsigned long vphysmap,
+                       start_info_t *si);
+
+#else
+
+static inline void pv_shim_setup_dom(struct domain *d, l4_pgentry_t *l4start,
+                                     unsigned long va_start,
+                                     unsigned long store_va,
+                                     unsigned long console_va,
+                                     unsigned long vphysmap,
+                                     start_info_t *si)
+{
+    ASSERT_UNREACHABLE();
+}
+
+#endif
+
 #endif /* __X86_PV_SHIM_H__ */
 
 /*
-- 
2.11.0


* [PATCH RFC v1 54/74] xen/pvshim: set correct domid value
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (52 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 53/74] xen/pvshim: modify Dom0 builder in order to build a DomU Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-08 14:17   ` Jan Beulich
  2018-01-04 13:06 ` [PATCH RFC v1 55/74] xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU Wei Liu
                   ` (21 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

If the domid is not provided by L0, default to domid 1.

Since the domain created is no longer the hardware domain, add a hook
to the domain shutdown path that forwards shutdown operations to the
L0 hypervisor.
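Both decisions can be modelled in isolation. The domid rule is "use the value reported by L0 if it is non-zero, otherwise fall back to 1, and always use 0 outside the shim". Shutdown becomes a forward to L0 instead of local bookkeeping. The sketch below is a minimal model, not the patch's code. `pick_shim_domid()`, `route_shutdown()` and `enum shutdown_route` are hypothetical names, and the raw CPUID register value is passed in as a parameter instead of being read from the hypervisor leaf. The patch uses the GNU `?:` extension (`return ebx ?: 1;`), which the ternary below spells out in portable C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t domid_t;   /* same width as in Xen's public/xen.h */

/*
 * Hypothetical model of get_dom0_domid(): outside the shim the initial
 * domain is always 0.  Inside it, a zero value from L0 means "not
 * provided", so default to 1.
 */
static domid_t pick_shim_domid(bool pv_shim, uint32_t l0_value)
{
    if ( !pv_shim )
        return 0;

    /* Sketch-only sanity check: domid_t is 16 bits wide. */
    assert(l0_value <= UINT16_MAX);
    return l0_value ? (domid_t)l0_value : 1;
}

enum shutdown_route { SHUTDOWN_LOCAL, SHUTDOWN_FORWARD_TO_L0 };

/*
 * Hypothetical model of the hook added to domain_shutdown(): the shim's
 * only domain is not the hardware domain, so its shutdown request is
 * handed to L0 instead of being processed locally.
 */
static enum shutdown_route route_shutdown(bool pv_shim)
{
    return pv_shim ? SHUTDOWN_FORWARD_TO_L0 : SHUTDOWN_LOCAL;
}
```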

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
---
 xen/arch/x86/dom0_build.c       |  2 +-
 xen/arch/x86/guest/xen.c        |  5 +++++
 xen/arch/x86/pv/shim.c          | 19 +++++++++++++++++++
 xen/arch/x86/setup.c            | 12 +++++++-----
 xen/common/domain.c             |  7 +++++++
 xen/include/asm-x86/guest/xen.h |  6 ++++++
 xen/include/asm-x86/pv/shim.h   | 10 ++++++++++
 7 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index 357fd87f39..1c5853690a 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -473,7 +473,7 @@ int __init construct_dom0(struct domain *d, const module_t *image,
     int rc;
 
     /* Sanity! */
-    BUG_ON(d->domain_id != 0);
+    BUG_ON(!pv_shim && d->domain_id != 0);
     BUG_ON(d->vcpu[0] == NULL);
     BUG_ON(d->vcpu[0]->is_initialised);
 
diff --git a/xen/arch/x86/guest/xen.c b/xen/arch/x86/guest/xen.c
index b7743e646d..a9de20708c 100644
--- a/xen/arch/x86/guest/xen.c
+++ b/xen/arch/x86/guest/xen.c
@@ -335,6 +335,11 @@ const unsigned long *__init hypervisor_reserved_pages(unsigned int *size)
     return reserved_pages;
 }
 
+uint32_t hypervisor_cpuid_base(void)
+{
+    return xen_cpuid_base;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
index 5e7e46632b..d318f07d08 100644
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -20,6 +20,7 @@
  */
 #include <xen/hypercall.h>
 #include <xen/init.h>
+#include <xen/shutdown.h>
 #include <xen/types.h>
 
 #include <asm/apic.h>
@@ -94,6 +95,24 @@ void __init pv_shim_setup_dom(struct domain *d, l4_pgentry_t *l4start,
 #undef SET_AND_MAP_PARAM
 }
 
+void pv_shim_shutdown(uint8_t reason)
+{
+    /* XXX: handle suspend */
+    xen_hypercall_shutdown(reason);
+}
+
+domid_t get_dom0_domid(void)
+{
+    uint32_t eax, ebx, ecx, edx;
+
+    if ( !pv_shim )
+        return 0;
+
+    cpuid(hypervisor_cpuid_base() + 1, &eax, &ebx, &ecx, &edx);
+
+    return ebx ?: 1;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 9b45a4fd94..34d746395b 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -104,6 +104,8 @@ unsigned long __read_mostly mmu_cr4_features = XEN_MINIMAL_CR4;
 #define SMEP_HVM_ONLY (-1)
 static s8 __initdata opt_smep = 1;
 
+static struct domain *__initdata dom0;
+
 static int __init parse_smep_param(const char *s)
 {
     if ( !*s )
@@ -576,11 +578,11 @@ static void noinline init_done(void)
 
     system_state = SYS_STATE_active;
 
+    domain_unpause_by_systemcontroller(dom0);
+
     /* MUST be done prior to removing .init data. */
     unregister_init_virtual_region();
 
-    domain_unpause_by_systemcontroller(hardware_domain);
-
     /* Zero the .init code and data. */
     for ( va = __init_begin; va < _p(__init_end); va += PAGE_SIZE )
         clear_page(va);
@@ -659,7 +661,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     unsigned long nr_pages, raw_max_page, modules_headroom, *module_map;
     int i, j, e820_warn = 0, bytes = 0;
     bool acpi_boot_table_init_done = false, relocated = false;
-    struct domain *dom0;
     struct ns16550_defaults ns16550 = {
         .data_bits = 8,
         .parity    = 'n',
@@ -1617,11 +1618,12 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     }
 
     /* Create initial domain 0. */
-    dom0 = domain_create(0, domcr_flags, 0, &config);
+    dom0 = domain_create(get_dom0_domid(), domcr_flags, 0, &config);
     if ( IS_ERR(dom0) || (alloc_dom0_vcpu0(dom0) == NULL) )
         panic("Error creating domain 0");
 
-    dom0->is_privileged = 1;
+    if ( !pv_shim )
+        dom0->is_privileged = 1;
     dom0->target = NULL;
 
     /* Grab the DOM0 command line. */
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 7af8d12512..edbf1a2ba9 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -42,6 +42,7 @@
 #include <xen/trace.h>
 #include <xen/tmem.h>
 #include <asm/setup.h>
+#include <asm/guest.h>
 
 /* Linux config option: propageted to domain0 */
 /* xen_processor_pmbits: xen control Cx, Px, ... */
@@ -697,6 +698,12 @@ void domain_shutdown(struct domain *d, u8 reason)
 {
     struct vcpu *v;
 
+    if ( pv_shim )
+    {
+        pv_shim_shutdown(reason);
+        return;
+    }
+
     spin_lock(&d->shutdown_lock);
 
     if ( d->shutdown_code == SHUTDOWN_CODE_INVALID )
diff --git a/xen/include/asm-x86/guest/xen.h b/xen/include/asm-x86/guest/xen.h
index 898156d42e..94f781c30f 100644
--- a/xen/include/asm-x86/guest/xen.h
+++ b/xen/include/asm-x86/guest/xen.h
@@ -39,6 +39,7 @@ int hypervisor_free_unused_page(mfn_t mfn);
 void hypervisor_fixup_e820(struct e820map *e820);
 void hypervisor_init_memory(void);
 const unsigned long *hypervisor_reserved_pages(unsigned int *size);
+uint32_t hypervisor_cpuid_base(void);
 
 #else
 
@@ -85,6 +86,11 @@ static inline const unsigned long *hypervisor_reserved_pages(unsigned int *size)
     ASSERT_UNREACHABLE();
     return NULL;
 };
+static inline uint32_t hypervisor_cpuid_base(void)
+{
+    ASSERT_UNREACHABLE();
+    return 0;
+};
 
 #endif /* CONFIG_XEN_GUEST */
 #endif /* __X86_GUEST_XEN_H__ */
diff --git a/xen/include/asm-x86/pv/shim.h b/xen/include/asm-x86/pv/shim.h
index b0c361cba1..8d4e8d2ae1 100644
--- a/xen/include/asm-x86/pv/shim.h
+++ b/xen/include/asm-x86/pv/shim.h
@@ -35,6 +35,8 @@ void pv_shim_setup_dom(struct domain *d, l4_pgentry_t *l4start,
                        unsigned long va_start, unsigned long store_va,
                        unsigned long console_va, unsigned long vphysmap,
                        start_info_t *si);
+void pv_shim_shutdown(uint8_t reason);
+domid_t get_dom0_domid(void);
 
 #else
 
@@ -47,6 +49,14 @@ static inline void pv_shim_setup_dom(struct domain *d, l4_pgentry_t *l4start,
 {
     ASSERT_UNREACHABLE();
 }
+static inline void pv_shim_shutdown(uint8_t reason)
+{
+    ASSERT_UNREACHABLE();
+}
+static inline domid_t get_dom0_domid(void)
+{
+    return 0;
+}
 
 #endif
 
-- 
2.11.0


* [PATCH RFC v1 55/74] xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (53 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 54/74] xen/pvshim: set correct domid value Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-08 16:05   ` Jan Beulich
  2018-01-09  7:49   ` Jan Beulich
  2018-01-04 13:06 ` [PATCH RFC v1 56/74] xen/pvshim: add grant table operations Wei Liu
                   ` (20 subsequent siblings)
  75 siblings, 2 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

Note that unmask and VIRQ operations are handled by the shim itself,
and that FIFO event channels are not exposed to the guest.
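The new upcall handler in xen/arch/x86/guest/xen.c walks the 2-level event channel ABI. The per-vCPU selector word indicates which words of the shared pending bitmap may have bits set, and each unmasked pending bit `bit` in word `l1` corresponds to port `l1 * BITS_PER_LONG + bit`. The standalone sketch below reproduces that scan over plain arrays so the port arithmetic can be checked in isolation. Several parts are assumptions. `scan_pending()` and the `SKETCH_*` constants are hypothetical. `__builtin_ctzl()` (GCC/Clang) stands in for `ffsl() - 1`. The atomic `xchg()` on shared memory is replaced by ordinary reads and writes. Ports are collected into an array instead of being passed to pv_console_rx() or pv_shim_inject_evtchn().

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical sizes; the hypervisor uses BITS_PER_LONG. */
#define SKETCH_BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))
#define SKETCH_NR_WORDS SKETCH_BITS_PER_LONG /* one selector bit per word */

/*
 * Consume the selector and every selected pending word, storing each
 * unmasked pending port in ports[] (up to max_ports entries).  Returns
 * the total number of ports found.  As in the patch, each pending word
 * is cleared before the mask is applied, so masked pending bits are
 * discarded rather than left pending.
 */
static unsigned int scan_pending(unsigned long *pending_sel,
                                 unsigned long *pending,
                                 const unsigned long *mask,
                                 unsigned int *ports, unsigned int max_ports)
{
    unsigned long sel = *pending_sel;
    unsigned int found = 0;

    *pending_sel = 0;               /* patch: xchg(&evtchn_pending_sel, 0) */
    while ( sel )
    {
        unsigned int l1 = (unsigned int)__builtin_ctzl(sel);
        unsigned long word;

        assert(l1 < SKETCH_NR_WORDS);
        word = pending[l1];
        pending[l1] = 0;            /* patch: xchg(&evtchn_pending[l1], 0) */
        sel &= sel - 1;             /* clear the lowest set selector bit */
        word &= ~mask[l1];          /* drop masked ports */
        while ( word )
        {
            unsigned int bit = (unsigned int)__builtin_ctzl(word);

            word &= word - 1;       /* clear the lowest set pending bit */
            if ( found < max_ports )
                ports[found] = l1 * SKETCH_BITS_PER_LONG + bit;
            found++;
        }
    }
    return found;
}
```

In the patch, each port found this way is routed either to the PV console (when it matches pv_console_evtchn()) or, in shim mode, injected into the L2 guest.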

Signed-off-by: Anthony Liguori <aliguori@amazon.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
---
 xen/arch/x86/guest/xen.c          |  25 +++-
 xen/arch/x86/pv/shim.c            | 259 ++++++++++++++++++++++++++++++++++++++
 xen/common/domain.c               |   7 ++
 xen/common/event_channel.c        | 100 +++++++++------
 xen/drivers/char/xen_pv_console.c |  11 +-
 xen/include/asm-x86/pv/shim.h     |  12 ++
 xen/include/xen/event.h           |  15 +++
 xen/include/xen/pv_console.h      |   6 +
 xen/include/xen/sched.h           |   2 +
 9 files changed, 394 insertions(+), 43 deletions(-)

diff --git a/xen/arch/x86/guest/xen.c b/xen/arch/x86/guest/xen.c
index a9de20708c..653a7366ab 100644
--- a/xen/arch/x86/guest/xen.c
+++ b/xen/arch/x86/guest/xen.c
@@ -18,6 +18,7 @@
  *
  * Copyright (c) 2017 Citrix Systems Ltd.
  */
+#include <xen/event.h>
 #include <xen/init.h>
 #include <xen/types.h>
 #include <xen/pv_console.h>
@@ -155,11 +156,31 @@ static void set_vcpu_id(void)
 static void xen_evtchn_upcall(struct cpu_user_regs *regs)
 {
     struct vcpu_info *vcpu_info = this_cpu(vcpu_info);
+    unsigned long pending;
 
     vcpu_info->evtchn_upcall_pending = 0;
-    xchg(&vcpu_info->evtchn_pending_sel, 0);
+    pending = xchg(&vcpu_info->evtchn_pending_sel, 0);
 
-    pv_console_rx(regs);
+    while ( pending )
+    {
+        unsigned int l1 = ffsl(pending) - 1;
+        unsigned long evtchn = xchg(&XEN_shared_info->evtchn_pending[l1], 0);
+
+        __clear_bit(l1, &pending);
+        evtchn &= ~XEN_shared_info->evtchn_mask[l1];
+        while ( evtchn )
+        {
+            unsigned int port = ffsl(evtchn) - 1;
+
+            __clear_bit(port, &evtchn);
+            port += l1 * BITS_PER_LONG;
+
+            if ( pv_console && port == pv_console_evtchn() )
+                pv_console_rx(regs);
+            else if ( pv_shim )
+                pv_shim_inject_evtchn(port);
+        }
+    }
 
     ack_APIC_irq();
 }
diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
index d318f07d08..69482993f9 100644
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -18,6 +18,8 @@
  *
  * Copyright (c) 2017 Citrix Systems Ltd.
  */
+#include <xen/event.h>
+#include <xen/guest_access.h>
 #include <xen/hypercall.h>
 #include <xen/init.h>
 #include <xen/shutdown.h>
@@ -63,6 +65,31 @@ static void __init replace_va(struct domain *d, l4_pgentry_t *l4start,
                                                       : COMPAT_L1_PROT));
 }
 
+static void evtchn_reserve(struct domain *d, unsigned int port)
+{
+    struct evtchn_unmask unmask = {
+        .port = port,
+    };
+
+    ASSERT(port_is_valid(d, port));
+    evtchn_from_port(d, port)->state = ECS_RESERVED;
+    BUG_ON(xen_hypercall_event_channel_op(EVTCHNOP_unmask, &unmask));
+}
+
+static bool evtchn_handled(struct domain *d, unsigned int port)
+{
+    ASSERT(port_is_valid(d, port));
+    /* The shim manages VIRQs, the rest is forwarded to L0. */
+    return evtchn_from_port(d, port)->state == ECS_VIRQ;
+}
+
+static void evtchn_assign_vcpu(struct domain *d, unsigned int port,
+                               unsigned int vcpu)
+{
+    ASSERT(port_is_valid(d, port));
+    evtchn_from_port(d, port)->notify_vcpu_id = vcpu;
+}
+
 void __init pv_shim_setup_dom(struct domain *d, l4_pgentry_t *l4start,
                               unsigned long va_start, unsigned long store_va,
                               unsigned long console_va, unsigned long vphysmap,
@@ -83,6 +110,11 @@ void __init pv_shim_setup_dom(struct domain *d, l4_pgentry_t *l4start,
         replace_va(d, l4start, va, param);                                     \
         dom0_update_physmap(d, (va - va_start) >> PAGE_SHIFT, param, vphysmap);\
     }                                                                          \
+    else                                                                       \
+    {                                                                          \
+        BUG_ON(evtchn_allocate_port(d, param));                                \
+        evtchn_reserve(d, param);                                              \
+    }                                                                          \
 })
     SET_AND_MAP_PARAM(HVM_PARAM_STORE_PFN, si->store_mfn, store_va);
     SET_AND_MAP_PARAM(HVM_PARAM_STORE_EVTCHN, si->store_evtchn, 0);
@@ -101,6 +133,233 @@ void pv_shim_shutdown(uint8_t reason)
     xen_hypercall_shutdown(reason);
 }
 
+long pv_shim_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
+{
+    struct domain *d = current->domain;
+    long rc;
+
+    switch ( cmd )
+    {
+#define EVTCHN_FORWARD(cmd, port_field)                                 \
+case EVTCHNOP_##cmd: {                                                  \
+    struct evtchn_##cmd op;                                             \
+                                                                        \
+    if ( copy_from_guest(&op, arg, 1) != 0 )                            \
+        return -EFAULT;                                                 \
+                                                                        \
+    rc = xen_hypercall_event_channel_op(EVTCHNOP_##cmd, &op);           \
+    if ( rc )                                                           \
+        break;                                                          \
+                                                                        \
+    spin_lock(&d->event_lock);                                          \
+    rc = evtchn_allocate_port(d, op.port_field);                        \
+    if ( rc )                                                           \
+    {                                                                   \
+        struct evtchn_close close = {                                   \
+            .port = op.port_field,                                      \
+        };                                                              \
+                                                                        \
+        BUG_ON(xen_hypercall_event_channel_op(EVTCHNOP_close, &close)); \
+    }                                                                   \
+    else                                                                \
+        evtchn_reserve(d, op.port_field);                               \
+    spin_unlock(&d->event_lock);                                        \
+                                                                        \
+    if ( !rc && __copy_to_guest(arg, &op, 1) )                          \
+        rc = -EFAULT;                                                   \
+                                                                        \
+    break;                                                              \
+    }
+    EVTCHN_FORWARD(alloc_unbound, port)
+    EVTCHN_FORWARD(bind_interdomain, local_port)
+#undef EVTCHN_FORWARD
+
+    case EVTCHNOP_bind_virq: {
+        struct evtchn_bind_virq virq;
+        struct evtchn_alloc_unbound alloc = {
+            .dom = DOMID_SELF,
+            .remote_dom = DOMID_SELF,
+        };
+
+        if ( copy_from_guest(&virq, arg, 1) != 0 )
+            return -EFAULT;
+        /*
+         * The event channel space is actually controlled by L0 Xen, so
+         * allocate a port from L0 and then force the VIRQ to be bound to that
+         * specific port.
+         *
+         * This is only required for VIRQ because the rest of the event channel
+         * operations are handled directly by L0.
+         */
+        rc = xen_hypercall_event_channel_op(EVTCHNOP_alloc_unbound, &alloc);
+        if ( rc )
+            break;
+
+        /* Force L1 to use the event channel port allocated on L0. */
+        rc = evtchn_bind_virq(&virq, alloc.port);
+        if ( rc )
+        {
+            struct evtchn_close free = {
+                .port = alloc.port,
+            };
+
+            xen_hypercall_event_channel_op(EVTCHNOP_close, &free);
+        }
+
+        if ( !rc && __copy_to_guest(arg, &virq, 1) )
+            rc = -EFAULT;
+
+        break;
+    }
+    case EVTCHNOP_status: {
+        struct evtchn_status status;
+
+        if ( copy_from_guest(&status, arg, 1) != 0 )
+            return -EFAULT;
+
+        if ( port_is_valid(d, status.port) && evtchn_handled(d, status.port) )
+            rc = evtchn_status(&status);
+        else
+            rc = xen_hypercall_event_channel_op(EVTCHNOP_status, &status);
+
+        break;
+    }
+    case EVTCHNOP_bind_vcpu: {
+        struct evtchn_bind_vcpu vcpu;
+
+        if ( copy_from_guest(&vcpu, arg, 1) != 0 )
+            return -EFAULT;
+
+        if ( !port_is_valid(d, vcpu.port) )
+            return -EINVAL;
+
+        if ( evtchn_handled(d, vcpu.port) )
+            rc = evtchn_bind_vcpu(vcpu.port, vcpu.vcpu);
+        else
+        {
+            rc = xen_hypercall_event_channel_op(EVTCHNOP_bind_vcpu, &vcpu);
+            if ( !rc )
+                evtchn_assign_vcpu(d, vcpu.port, vcpu.vcpu);
+        }
+
+        break;
+    }
+    case EVTCHNOP_close: {
+        struct evtchn_close close;
+
+        if ( copy_from_guest(&close, arg, 1) != 0 )
+            return -EFAULT;
+
+        if ( !port_is_valid(d, close.port) )
+            return -EINVAL;
+
+        if ( evtchn_handled(d, close.port) )
+        {
+            rc = evtchn_close(d, close.port, true);
+            if ( rc )
+                break;
+        }
+        else
+            evtchn_free(d, evtchn_from_port(d, close.port));
+
+        rc = xen_hypercall_event_channel_op(EVTCHNOP_close, &close);
+        if ( rc )
+            /*
+             * If the port cannot be closed on L0, mark it as reserved in
+             * the shim to avoid reusing it.
+             */
+            evtchn_reserve(d, close.port);
+
+        set_bit(close.port, XEN_shared_info->evtchn_mask);
+
+        break;
+    }
+    case EVTCHNOP_bind_ipi: {
+        struct evtchn_bind_ipi ipi;
+
+        if ( copy_from_guest(&ipi, arg, 1) != 0 )
+            return -EFAULT;
+
+        rc = xen_hypercall_event_channel_op(EVTCHNOP_bind_ipi, &ipi);
+        if ( rc )
+            break;
+
+        spin_lock(&d->event_lock);
+        rc = evtchn_allocate_port(d, ipi.port);
+        if ( rc )
+        {
+            struct evtchn_close close = {
+                .port = ipi.port,
+            };
+
+            /*
+             * If closing the event channel port also fails there's not much
+             * the shim can do, since it failed to reserve the port locally.
+             */
+            BUG_ON(xen_hypercall_event_channel_op(EVTCHNOP_close, &close));
+            spin_unlock(&d->event_lock);
+            break;
+        }
+
+        evtchn_assign_vcpu(d, ipi.port, ipi.vcpu);
+        evtchn_reserve(d, ipi.port);
+        spin_unlock(&d->event_lock);
+
+        if ( __copy_to_guest(arg, &ipi, 1) )
+            rc = -EFAULT;
+
+        break;
+    }
+    case EVTCHNOP_unmask: {
+        struct evtchn_unmask unmask;
+
+        if ( copy_from_guest(&unmask, arg, 1) != 0 )
+            return -EFAULT;
+
+        /* Unmask is handled in L1 */
+        rc = evtchn_unmask(unmask.port);
+
+        break;
+    }
+    case EVTCHNOP_send: {
+        struct evtchn_send send;
+
+        if ( copy_from_guest(&send, arg, 1) != 0 )
+            return -EFAULT;
+
+        rc = xen_hypercall_event_channel_op(EVTCHNOP_send, &send);
+
+        break;
+    }
+    case EVTCHNOP_reset: {
+        struct evtchn_reset reset;
+
+        if ( copy_from_guest(&reset, arg, 1) != 0 )
+            return -EFAULT;
+
+        rc = xen_hypercall_event_channel_op(EVTCHNOP_reset, &reset);
+
+        break;
+    }
+    default:
+        /* No FIFO or PIRQ support for now */
+        rc = -ENOSYS;
+        break;
+    }
+
+    return rc;
+}
+
+void pv_shim_inject_evtchn(unsigned int port)
+{
+    if ( port_is_valid(pv_domain, port) )
+    {
+        struct evtchn *chn = evtchn_from_port(pv_domain, port);
+
+        evtchn_port_set_pending(pv_domain, chn->notify_vcpu_id, chn);
+    }
+}
+
 domid_t get_dom0_domid(void)
 {
     uint32_t eax, ebx, ecx, edx;
diff --git a/xen/common/domain.c b/xen/common/domain.c
index edbf1a2ba9..d653a0b0bb 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -63,6 +63,8 @@ struct domain *domain_list;
 
 struct domain *hardware_domain __read_mostly;
 
+struct domain *pv_domain __read_mostly;
+
 #ifdef CONFIG_LATE_HWDOM
 domid_t hardware_domid __read_mostly;
 integer_param("hardware_dom", hardware_domid);
@@ -395,6 +397,11 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
         rcu_assign_pointer(*pd, d);
         rcu_assign_pointer(domain_hash[DOMAIN_HASH(domid)], d);
         spin_unlock(&domlist_update_lock);
+
+#ifdef CONFIG_X86
+        if ( pv_shim )
+            pv_domain = d;
+#endif
     }
 
     return d;
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index c69f9db6db..977a876751 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -31,6 +31,10 @@
 #include <public/event_channel.h>
 #include <xsm/xsm.h>
 
+#ifdef CONFIG_X86
+#include <asm/pv/shim.h>
+#endif
+
 #define ERROR_EXIT(_errno)                                          \
     do {                                                            \
         gdprintk(XENLOG_WARNING,                                    \
@@ -156,46 +160,62 @@ static void free_evtchn_bucket(struct domain *d, struct evtchn *bucket)
     xfree(bucket);
 }
 
+int evtchn_allocate_port(struct domain *d, unsigned int port)
+{
+    if ( port > d->max_evtchn_port || port >= d->max_evtchns )
+        return -ENOSPC;
+
+    if ( port_is_valid(d, port) )
+    {
+        if ( evtchn_from_port(d, port)->state != ECS_FREE ||
+             evtchn_port_is_busy(d, port) )
+            return -EBUSY;
+    }
+    else
+    {
+        struct evtchn *chn;
+        struct evtchn **grp;
+
+        if ( !group_from_port(d, port) )
+        {
+            grp = xzalloc_array(struct evtchn *, BUCKETS_PER_GROUP);
+            if ( !grp )
+                return -ENOMEM;
+            group_from_port(d, port) = grp;
+        }
+
+        chn = alloc_evtchn_bucket(d, port);
+        if ( !chn )
+            return -ENOMEM;
+        bucket_from_port(d, port) = chn;
+
+        write_atomic(&d->valid_evtchns, d->valid_evtchns + EVTCHNS_PER_BUCKET);
+    }
+
+    return 0;
+}
+
 static int get_free_port(struct domain *d)
 {
-    struct evtchn *chn;
-    struct evtchn **grp;
     int            port;
 
     if ( d->is_dying )
         return -EINVAL;
 
-    for ( port = 0; port_is_valid(d, port); port++ )
+    for ( port = 0; port <= d->max_evtchn_port; port++ )
     {
-        if ( port > d->max_evtchn_port )
-            return -ENOSPC;
-        if ( evtchn_from_port(d, port)->state == ECS_FREE
-             && !evtchn_port_is_busy(d, port) )
-            return port;
-    }
+        int rc = evtchn_allocate_port(d, port);
 
-    if ( port == d->max_evtchns || port > d->max_evtchn_port )
-        return -ENOSPC;
+        if ( rc == -EBUSY )
+            continue;
 
-    if ( !group_from_port(d, port) )
-    {
-        grp = xzalloc_array(struct evtchn *, BUCKETS_PER_GROUP);
-        if ( !grp )
-            return -ENOMEM;
-        group_from_port(d, port) = grp;
+        return rc ? rc : port;
     }
 
-    chn = alloc_evtchn_bucket(d, port);
-    if ( !chn )
-        return -ENOMEM;
-    bucket_from_port(d, port) = chn;
-
-    write_atomic(&d->valid_evtchns, d->valid_evtchns + EVTCHNS_PER_BUCKET);
-
-    return port;
+    return -ENOSPC;
 }
 
-static void free_evtchn(struct domain *d, struct evtchn *chn)
+void evtchn_free(struct domain *d, struct evtchn *chn)
 {
     /* Clear pending event to avoid unexpected behavior on re-bind. */
     evtchn_port_clear_pending(d, chn);
@@ -345,13 +365,13 @@ static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind)
 }
 
 
-static long evtchn_bind_virq(evtchn_bind_virq_t *bind)
+int evtchn_bind_virq(evtchn_bind_virq_t *bind, int port)
 {
     struct evtchn *chn;
     struct vcpu   *v;
     struct domain *d = current->domain;
-    int            port, virq = bind->virq, vcpu = bind->vcpu;
-    long           rc = 0;
+    int            virq = bind->virq, vcpu = bind->vcpu;
+    int            rc = 0;
 
     if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) )
         return -EINVAL;
@@ -368,7 +388,12 @@ static long evtchn_bind_virq(evtchn_bind_virq_t *bind)
     if ( v->virq_to_evtchn[virq] != 0 )
         ERROR_EXIT(-EEXIST);
 
-    if ( (port = get_free_port(d)) < 0 )
+    if ( port >= 0 )
+    {
+        if ( (rc = evtchn_allocate_port(d, port)) < 0 )
+            ERROR_EXIT(rc);
+    }
+    else if ( (port = get_free_port(d)) < 0 )
         ERROR_EXIT(port);
 
     chn = evtchn_from_port(d, port);
@@ -511,7 +536,7 @@ static long evtchn_bind_pirq(evtchn_bind_pirq_t *bind)
 }
 
 
-static long evtchn_close(struct domain *d1, int port1, bool_t guest)
+long evtchn_close(struct domain *d1, int port1, bool guest)
 {
     struct domain *d2 = NULL;
     struct vcpu   *v;
@@ -619,7 +644,7 @@ static long evtchn_close(struct domain *d1, int port1, bool_t guest)
 
         double_evtchn_lock(chn1, chn2);
 
-        free_evtchn(d1, chn1);
+        evtchn_free(d1, chn1);
 
         chn2->state = ECS_UNBOUND;
         chn2->u.unbound.remote_domid = d1->domain_id;
@@ -633,7 +658,7 @@ static long evtchn_close(struct domain *d1, int port1, bool_t guest)
     }
 
     spin_lock(&chn1->lock);
-    free_evtchn(d1, chn1);
+    evtchn_free(d1, chn1);
     spin_unlock(&chn1->lock);
 
  out:
@@ -839,7 +864,7 @@ static void clear_global_virq_handlers(struct domain *d)
     }
 }
 
-static long evtchn_status(evtchn_status_t *status)
+long evtchn_status(evtchn_status_t *status)
 {
     struct domain   *d;
     domid_t          dom = status->dom;
@@ -1030,6 +1055,11 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     long rc;
 
+#ifdef CONFIG_X86
+    if ( pv_shim )
+        return pv_shim_event_channel_op(cmd, arg);
+#endif
+
     switch ( cmd )
     {
     case EVTCHNOP_alloc_unbound: {
@@ -1056,7 +1086,7 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         struct evtchn_bind_virq bind_virq;
         if ( copy_from_guest(&bind_virq, arg, 1) != 0 )
             return -EFAULT;
-        rc = evtchn_bind_virq(&bind_virq);
+        rc = evtchn_bind_virq(&bind_virq, -1);
         if ( !rc && __copy_to_guest(arg, &bind_virq, 1) )
             rc = -EFAULT; /* Cleaning up here would be a mess! */
         break;
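The refactor splits "claim this exact port" (evtchn_allocate_port()) from "find the first free port" (get_free_port()), and that split is what lets the shim mirror port numbers chosen by L0. Below is a simplified, self-contained model of it. The sketch_* names, the bucket sizes and the state values are hypothetical. Locking is omitted, and a bucket counts as valid as soon as it is allocated, rather than through a valid_evtchns counter. The scan also shows the intended error handling: -EBUSY moves on to the next port, and any other error is returned to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical sizes chosen for the sketch; Xen's real constants differ. */
#define EVTCHNS_PER_BUCKET 4
#define NR_BUCKETS         8

enum sketch_state { SK_FREE, SK_RESERVED, SK_UNBOUND };

struct sketch_domain {
    enum sketch_state *bucket[NR_BUCKETS]; /* NULL until first use */
    unsigned int max_evtchn_port;          /* highest usable port */
};

/* Claim one specific port: allocate its bucket on demand, fail if busy. */
static int sketch_allocate_port(struct sketch_domain *d, unsigned int port)
{
    unsigned int b = port / EVTCHNS_PER_BUCKET;

    if ( port > d->max_evtchn_port || b >= NR_BUCKETS )
        return -ENOSPC;

    if ( !d->bucket[b] )
    {
        /* calloc() zero-fills, so every slot starts out as SK_FREE. */
        d->bucket[b] = calloc(EVTCHNS_PER_BUCKET, sizeof(*d->bucket[b]));
        if ( !d->bucket[b] )
            return -ENOMEM;
    }

    return d->bucket[b][port % EVTCHNS_PER_BUCKET] == SK_FREE ? 0 : -EBUSY;
}

/* First free port: skip busy ports, but propagate any other error. */
static int sketch_get_free_port(struct sketch_domain *d)
{
    for ( unsigned int port = 0; port <= d->max_evtchn_port; port++ )
    {
        int rc = sketch_allocate_port(d, port);

        if ( rc != -EBUSY )
            return rc ? rc : (int)port;
    }

    return -ENOSPC;
}

/* The caller records the new state, as evtchn_reserve() does in the shim. */
static void sketch_set_state(struct sketch_domain *d, unsigned int port,
                             enum sketch_state s)
{
    assert(d->bucket[port / EVTCHNS_PER_BUCKET] != NULL);
    d->bucket[port / EVTCHNS_PER_BUCKET][port % EVTCHNS_PER_BUCKET] = s;
}
```

Because the scan is written this way, an allocation failure such as -ENOMEM reaches the caller instead of being mistaken for a usable port number. A specific port (9 in a 16-port domain, say) can also be claimed before any lower bucket exists.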
diff --git a/xen/drivers/char/xen_pv_console.c b/xen/drivers/char/xen_pv_console.c
index 6aa694e395..fb5a7893be 100644
--- a/xen/drivers/char/xen_pv_console.c
+++ b/xen/drivers/char/xen_pv_console.c
@@ -92,6 +92,11 @@ static void notify_daemon(void)
     xen_hypercall_evtchn_send(cons_evtchn);
 }
 
+evtchn_port_t pv_console_evtchn(void)
+{
+    return cons_evtchn;
+}
+
 size_t pv_console_rx(struct cpu_user_regs *regs)
 {
     char c;
@@ -101,10 +106,6 @@ size_t pv_console_rx(struct cpu_user_regs *regs)
     if ( !cons_ring )
         return 0;
 
-    /* TODO: move this somewhere */
-    if ( !test_bit(cons_evtchn, XEN_shared_info->evtchn_pending) )
-        return 0;
-
     prod = ACCESS_ONCE(cons_ring->in_prod);
     cons = cons_ring->in_cons;
     /* Get pointers before reading the ring */
@@ -125,8 +126,6 @@ size_t pv_console_rx(struct cpu_user_regs *regs)
     ACCESS_ONCE(cons_ring->in_cons) = cons;
     notify_daemon();
 
-    clear_bit(cons_evtchn, XEN_shared_info->evtchn_pending);
-
     return recv;
 }
 
diff --git a/xen/include/asm-x86/pv/shim.h b/xen/include/asm-x86/pv/shim.h
index 8d4e8d2ae1..6f7b39c3e0 100644
--- a/xen/include/asm-x86/pv/shim.h
+++ b/xen/include/asm-x86/pv/shim.h
@@ -36,6 +36,8 @@ void pv_shim_setup_dom(struct domain *d, l4_pgentry_t *l4start,
                        unsigned long console_va, unsigned long vphysmap,
                        start_info_t *si);
 void pv_shim_shutdown(uint8_t reason);
+long pv_shim_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg);
+void pv_shim_inject_evtchn(unsigned int port);
 domid_t get_dom0_domid(void);
 
 #else
@@ -53,6 +55,16 @@ static inline void pv_shim_shutdown(uint8_t reason)
 {
     ASSERT_UNREACHABLE();
 }
+static inline long pv_shim_event_channel_op(int cmd,
+                                            XEN_GUEST_HANDLE_PARAM(void) arg)
+{
+    ASSERT_UNREACHABLE();
+    return 0;
+}
+static inline void pv_shim_inject_evtchn(unsigned int port)
+{
+    ASSERT_UNREACHABLE();
+}
 static inline domid_t get_dom0_domid(void)
 {
     return 0;
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index 87915ead69..3d202d8172 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -48,6 +48,21 @@ int evtchn_send(struct domain *d, unsigned int lport);
 /* Bind a local event-channel port to the specified VCPU. */
 long evtchn_bind_vcpu(unsigned int port, unsigned int vcpu_id);
 
+/* Bind a VIRQ. */
+int evtchn_bind_virq(evtchn_bind_virq_t *bind, int port);
+
+/* Get the status of an event channel port. */
+long evtchn_status(evtchn_status_t *status);
+
+/* Close an event channel. */
+long evtchn_close(struct domain *d1, int port1, bool guest);
+
+/* Free an event channel. */
+void evtchn_free(struct domain *d, struct evtchn *chn);
+
+/* Allocate a specific event channel port. */
+int evtchn_allocate_port(struct domain *d, unsigned int port);
+
 /* Unmask a local event-channel port. */
 int evtchn_unmask(unsigned int port);
 
diff --git a/xen/include/xen/pv_console.h b/xen/include/xen/pv_console.h
index e578b56620..cb92539666 100644
--- a/xen/include/xen/pv_console.h
+++ b/xen/include/xen/pv_console.h
@@ -10,6 +10,7 @@ void pv_console_set_rx_handler(serial_rx_fn fn);
 void pv_console_init_postirq(void);
 void pv_console_puts(const char *buf);
 size_t pv_console_rx(struct cpu_user_regs *regs);
+evtchn_port_t pv_console_evtchn(void);
 
 #else
 
@@ -18,6 +19,11 @@ static inline void pv_console_set_rx_handler(serial_rx_fn fn) { }
 static inline void pv_console_init_postirq(void) { }
 static inline void pv_console_puts(const char *buf) { }
 static inline size_t pv_console_rx(struct cpu_user_regs *regs) { return 0; }
+static inline evtchn_port_t pv_console_evtchn(void)
+{
+    ASSERT_UNREACHABLE();
+    return 0;
+}
 
 #endif /* !CONFIG_XEN_GUEST */
 #endif /* __XEN_PV_CONSOLE_H__ */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 64abc1df6c..ac65d0c0df 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -48,6 +48,8 @@ DEFINE_XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t);
 /* A global pointer to the hardware domain (usually DOM0). */
 extern struct domain *hardware_domain;
 
+extern struct domain *pv_domain;
+
 #ifdef CONFIG_LATE_HWDOM
 extern domid_t hardware_domid;
 #else
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 206+ messages in thread
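A small model summarises how pv_shim_event_channel_op() divides the work. L0 Xen owns the port namespace. Forwarded operations only mirror L0's choice locally as a reserved port. VIRQs are the one case the shim services itself, on a port it first obtained from L0. In the sketch below, the l0_* helpers are hypothetical stand-ins for the hypercalls and there is no locking. Port 0 is never handed out, because Xen keeps it reserved.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical port space size for the sketch. */
#define NR_PORTS 16

enum shim_port_state { SHIM_FREE, SHIM_RESERVED, SHIM_VIRQ };

static bool l0_in_use[NR_PORTS];                  /* L0's port allocation */
static enum shim_port_state shim_state[NR_PORTS]; /* shim's local mirror  */

/* Stand-in for EVTCHNOP_alloc_unbound (or another bind op) sent to L0. */
static int l0_alloc_port(void)
{
    for ( int p = 1; p < NR_PORTS; p++ )          /* port 0 stays reserved */
    {
        if ( !l0_in_use[p] )
        {
            l0_in_use[p] = true;
            return p;
        }
    }
    return -ENOSPC;
}

/* Stand-in for EVTCHNOP_close sent to L0. */
static void l0_close_port(int port)
{
    l0_in_use[port] = false;
}

/* Forwarded op: L0 picks the port; the shim only marks it reserved. */
static int shim_alloc_unbound(void)
{
    int port = l0_alloc_port();

    if ( port >= 0 )
        shim_state[port] = SHIM_RESERVED;
    return port;
}

/* VIRQ: take a port from L0 so numbers cannot clash, then bind locally. */
static int shim_bind_virq(void)
{
    int port = l0_alloc_port();

    if ( port >= 0 )
        shim_state[port] = SHIM_VIRQ;
    return port;
}

/* Counterpart of evtchn_handled(): only VIRQ ports are serviced locally. */
static bool shim_handles(int port)
{
    return shim_state[port] == SHIM_VIRQ;
}

/* Close drops the local record and always releases the port on L0. */
static void shim_close(int port)
{
    shim_state[port] = SHIM_FREE;
    l0_close_port(port);
}
```

The patch releases a port in two steps. The shim's local record is dropped with evtchn_close() for a VIRQ port or evtchn_free() otherwise, and EVTCHNOP_close is then always issued to L0. The model collapses both steps into shim_close().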

* [PATCH RFC v1 56/74] xen/pvshim: add grant table operations
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (54 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 55/74] xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-08 17:19   ` Jan Beulich
  2018-01-04 13:06 ` [PATCH RFC v1 57/74] x86/pv-shim: shadow PV console's page for L2 DomU Wei Liu
                   ` (19 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/pv/shim.c                | 174 ++++++++++++++++++++++++++++++++++
 xen/common/compat/grant_table.c       |   5 +
 xen/common/grant_table.c              |  10 ++
 xen/include/asm-x86/guest/hypercall.h |   6 ++
 xen/include/asm-x86/pv/shim.h         |   9 ++
 5 files changed, 204 insertions(+)

diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
index 69482993f9..98c1e31e8f 100644
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -22,6 +22,7 @@
 #include <xen/guest_access.h>
 #include <xen/hypercall.h>
 #include <xen/init.h>
+#include <xen/iocap.h>
 #include <xen/shutdown.h>
 #include <xen/types.h>
 
@@ -30,11 +31,17 @@
 #include <asm/guest.h>
 #include <asm/pv/mm.h>
 
+#include <compat/grant_table.h>
+
 #ifndef CONFIG_PV_SHIM_EXCLUSIVE
 bool pv_shim;
 boolean_param("pv-shim", pv_shim);
 #endif
 
+static unsigned int nr_grant_list;
+static unsigned long *grant_frames;
+static DEFINE_SPINLOCK(grant_lock);
+
 #define L1_PROT (_PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_USER| \
                  _PAGE_GUEST_KERNEL)
 #define COMPAT_L1_PROT (_PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED)
@@ -360,6 +367,173 @@ void pv_shim_inject_evtchn(unsigned int port)
     }
 }
 
+long pv_shim_grant_table_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) uop,
+                            unsigned int count, bool compat)
+{
+    struct domain *d = current->domain;
+    long rc = 0;
+
+    if ( count != 1 )
+        return -EINVAL;
+
+    switch ( cmd )
+    {
+    case GNTTABOP_setup_table:
+    {
+        struct gnttab_setup_table nat;
+        struct compat_gnttab_setup_table cmp;
+        unsigned int i;
+
+        if ( unlikely(compat ? copy_from_guest(&cmp, uop, 1)
+                             : copy_from_guest(&nat, uop, 1)) ||
+             unlikely(compat ? !compat_handle_okay(cmp.frame_list,
+                                                   cmp.nr_frames)
+                             : !guest_handle_okay(nat.frame_list,
+                                                  nat.nr_frames)) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+        if ( compat )
+#define XLAT_gnttab_setup_table_HNDL_frame_list(d, s)
+                XLAT_gnttab_setup_table(&nat, &cmp);
+#undef XLAT_gnttab_setup_table_HNDL_frame_list
+
+        nat.status = GNTST_okay;
+
+        spin_lock(&grant_lock);
+        if ( !nr_grant_list )
+        {
+            struct gnttab_query_size query_size = {
+                .dom = DOMID_SELF,
+            };
+
+            rc = xen_hypercall_grant_table_op(GNTTABOP_query_size,
+                                              &query_size, 1);
+            if ( rc )
+            {
+                spin_unlock(&grant_lock);
+                break;
+            }
+
+            ASSERT(!grant_frames);
+            grant_frames = xzalloc_array(unsigned long,
+                                         query_size.max_nr_frames);
+            if ( !grant_frames )
+            {
+                spin_unlock(&grant_lock);
+                rc = -ENOMEM;
+                break;
+            }
+
+            nr_grant_list = query_size.max_nr_frames;
+        }
+
+        if ( nat.nr_frames > nr_grant_list )
+        {
+            spin_unlock(&grant_lock);
+            rc = -EINVAL;
+            break;
+        }
+
+        for ( i = 0; i < nat.nr_frames; i++ )
+        {
+            if ( !grant_frames[i] )
+            {
+                struct xen_add_to_physmap xatp = {
+                    .domid = DOMID_SELF,
+                    .idx = i,
+                    .space = XENMAPSPACE_grant_table,
+                };
+                mfn_t mfn;
+
+                rc = hypervisor_alloc_unused_page(&mfn);
+                if ( rc )
+                {
+                    gprintk(XENLOG_ERR,
+                            "unable to get memory for grant table\n");
+                    break;
+                }
+
+                xatp.gpfn = mfn_x(mfn);
+                rc = xen_hypercall_memory_op(XENMEM_add_to_physmap, &xatp);
+                if ( rc )
+                {
+                    hypervisor_free_unused_page(mfn);
+                    break;
+                }
+
+                BUG_ON(iomem_permit_access(d, mfn_x(mfn), mfn_x(mfn)));
+                grant_frames[i] = mfn_x(mfn);
+            }
+
+            ASSERT(grant_frames[i]);
+            if ( compat )
+            {
+                compat_pfn_t pfn = grant_frames[i];
+
+                if ( __copy_to_compat_offset(cmp.frame_list, i, &pfn, 1) )
+                {
+                    nat.status = GNTST_bad_virt_addr;
+                    rc = -EFAULT;
+                    break;
+                }
+            }
+            else if ( __copy_to_guest_offset(nat.frame_list, i,
+                                             &grant_frames[i], 1) )
+            {
+                nat.status = GNTST_bad_virt_addr;
+                rc = -EFAULT;
+                break;
+            }
+        }
+        spin_unlock(&grant_lock);
+
+        if ( compat )
+#define XLAT_gnttab_setup_table_HNDL_frame_list(d, s)
+                XLAT_gnttab_setup_table(&cmp, &nat);
+#undef XLAT_gnttab_setup_table_HNDL_frame_list
+
+        if ( unlikely(compat ? copy_to_guest(uop, &cmp, 1)
+                             : copy_to_guest(uop, &nat, 1)) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        break;
+    }
+    case GNTTABOP_query_size:
+    {
+        struct gnttab_query_size op;
+
+        /* Errors are reported through the function-scope rc. */
+        if ( unlikely(copy_from_guest(&op, uop, 1)) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        rc = xen_hypercall_grant_table_op(GNTTABOP_query_size, &op, count);
+        if ( rc )
+            break;
+
+        if ( copy_to_guest(uop, &op, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        break;
+    }
+    default:
+        rc = -ENOSYS;
+        break;
+    }
+
+    return rc;
+}
+
 domid_t get_dom0_domid(void)
 {
     uint32_t eax, ebx, ecx, edx;
diff --git a/xen/common/compat/grant_table.c b/xen/common/compat/grant_table.c
index ff1d678f01..88c608b62b 100644
--- a/xen/common/compat/grant_table.c
+++ b/xen/common/compat/grant_table.c
@@ -122,6 +122,11 @@ int compat_grant_table_op(unsigned int cmd,
         return do_grant_table_op(cmd, cmp_uop, count);
     }
 
+#ifdef CONFIG_X86
+    if ( pv_shim )
+        return pv_shim_grant_table_op(cmd, cmp_uop, count, true);
+#endif
+
     if ( (int)count < 0 )
         rc = -EINVAL;
 
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 250450bdda..caf9d2cfae 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -40,6 +40,10 @@
 #include <xsm/xsm.h>
 #include <asm/flushtlb.h>
 
+#ifdef CONFIG_X86
+#include <asm/pv/shim.h>
+#endif
+
 /* Per-domain grant information. */
 struct grant_table {
     /*
@@ -3324,6 +3328,12 @@ do_grant_table_op(
     if ( (cmd &= GNTTABOP_CMD_MASK) != GNTTABOP_cache_flush && opaque_in )
         return -EINVAL;
 
+#ifdef CONFIG_X86
+    if ( pv_shim )
+        /* NB: no continuation support for pv-shim ops. */
+        return pv_shim_grant_table_op(cmd, uop, count, false);
+#endif
+
     rc = -EFAULT;
     switch ( cmd )
     {
diff --git a/xen/include/asm-x86/guest/hypercall.h b/xen/include/asm-x86/guest/hypercall.h
index 7d11df29fa..85985a7d98 100644
--- a/xen/include/asm-x86/guest/hypercall.h
+++ b/xen/include/asm-x86/guest/hypercall.h
@@ -103,6 +103,12 @@ static inline long xen_hypercall_event_channel_op(unsigned int cmd, void *arg)
     return _hypercall64_2(long, __HYPERVISOR_event_channel_op, cmd, arg);
 }
 
+static inline long xen_hypercall_grant_table_op(unsigned int cmd, void *arg,
+                                                unsigned int count)
+{
+    return _hypercall64_3(long, __HYPERVISOR_grant_table_op, cmd, arg, count);
+}
+
 static inline long xen_hypercall_hvm_op(unsigned int op, void *arg)
 {
     return _hypercall64_2(long, __HYPERVISOR_hvm_op, op, arg);
diff --git a/xen/include/asm-x86/pv/shim.h b/xen/include/asm-x86/pv/shim.h
index 6f7b39c3e0..47bc4267af 100644
--- a/xen/include/asm-x86/pv/shim.h
+++ b/xen/include/asm-x86/pv/shim.h
@@ -38,6 +38,8 @@ void pv_shim_setup_dom(struct domain *d, l4_pgentry_t *l4start,
 void pv_shim_shutdown(uint8_t reason);
 long pv_shim_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg);
 void pv_shim_inject_evtchn(unsigned int port);
+long pv_shim_grant_table_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) uop,
+                            unsigned int count, bool compat);
 domid_t get_dom0_domid(void);
 
 #else
@@ -65,6 +67,13 @@ static inline void pv_shim_inject_evtchn(unsigned int port)
 {
     ASSERT_UNREACHABLE();
 }
+static inline long pv_shim_grant_table_op(unsigned int cmd,
+                                          XEN_GUEST_HANDLE_PARAM(void) uop,
+                                          unsigned int count, bool compat)
+{
+    ASSERT_UNREACHABLE();
+    return 0;
+}
 static inline domid_t get_dom0_domid(void)
 {
     return 0;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 206+ messages in thread
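GNTTABOP_setup_table in the shim caches grant frames. The array is sized once from L0's GNTTABOP_query_size answer, each frame is mapped the first time it is requested, and later calls return the same frame numbers. The following is a minimal model of that caching. It assumes hypothetical l0_* helpers in place of the hypercalls and XENMEM_add_to_physmap, and it leaves out grant_lock and the compat translation.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the L0 services the shim relies on. */
#define L0_MAX_GRANT_FRAMES 4
static unsigned long l0_next_frame = 0x1000;   /* fake frame numbers */

static unsigned int l0_query_max_frames(void)
{
    return L0_MAX_GRANT_FRAMES;
}

/* Models XENMEM_add_to_physmap with XENMAPSPACE_grant_table, .idx = idx. */
static unsigned long l0_map_grant_frame(unsigned int idx)
{
    assert(idx < L0_MAX_GRANT_FRAMES);
    return l0_next_frame++;
}

static unsigned int nr_grant_list;   /* capacity, fixed on first call */
static unsigned long *grant_frames;  /* 0 means "not mapped yet"      */

/* Fill out[0..nr_frames) with frame numbers, mapping missing ones lazily. */
static int sketch_setup_table(unsigned int nr_frames, unsigned long *out)
{
    if ( !nr_grant_list )
    {
        unsigned int max = l0_query_max_frames();

        grant_frames = calloc(max, sizeof(*grant_frames));
        if ( !grant_frames )
            return -ENOMEM;
        nr_grant_list = max;
    }

    if ( nr_frames > nr_grant_list )
        return -EINVAL;

    for ( unsigned int i = 0; i < nr_frames; i++ )
    {
        if ( !grant_frames[i] )
            grant_frames[i] = l0_map_grant_frame(i);
        out[i] = grant_frames[i];
    }

    return 0;
}
```

Returning cached frames keeps the guest's existing grant table mappings valid when it calls setup_table again with a larger nr_frames.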

* [PATCH RFC v1 57/74] x86/pv-shim: shadow PV console's page for L2 DomU
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (55 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 56/74] xen/pvshim: add grant table operations Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-09  9:13   ` Jan Beulich
  2018-01-04 13:06 ` [PATCH RFC v1 58/74] xen/pvshim: add migration support Wei Liu
                   ` (18 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Sergey Dyasli <sergey.dyasli@citrix.com>

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
---
 xen/arch/x86/pv/shim.c      |  31 ++++++++--
 xen/drivers/char/Makefile   |   1 +
 xen/drivers/char/console.c  |   4 ++
 xen/drivers/char/consoled.c | 145 ++++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/consoled.h  |  27 +++++++++
 5 files changed, 204 insertions(+), 4 deletions(-)
 create mode 100644 xen/drivers/char/consoled.c
 create mode 100644 xen/include/xen/consoled.h

diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
index 98c1e31e8f..74b24bf950 100644
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -25,6 +25,8 @@
 #include <xen/iocap.h>
 #include <xen/shutdown.h>
 #include <xen/types.h>
+#include <xen/consoled.h>
+#include <xen/pv_console.h>
 
 #include <asm/apic.h>
 #include <asm/dom0_build.h>
@@ -125,13 +127,28 @@ void __init pv_shim_setup_dom(struct domain *d, l4_pgentry_t *l4start,
 })
     SET_AND_MAP_PARAM(HVM_PARAM_STORE_PFN, si->store_mfn, store_va);
     SET_AND_MAP_PARAM(HVM_PARAM_STORE_EVTCHN, si->store_evtchn, 0);
+    SET_AND_MAP_PARAM(HVM_PARAM_CONSOLE_EVTCHN, si->console.domU.evtchn, 0);
     if ( !pv_console )
-    {
         SET_AND_MAP_PARAM(HVM_PARAM_CONSOLE_PFN, si->console.domU.mfn,
                           console_va);
-        SET_AND_MAP_PARAM(HVM_PARAM_CONSOLE_EVTCHN, si->console.domU.evtchn, 0);
-    }
 #undef SET_AND_MAP_PARAM
+    else
+    {
+        /* Allocate a new page for DomU's PV console */
+        void *page = alloc_xenheap_pages(0, MEMF_bits(32));
+        uint64_t console_mfn;
+
+        BUG_ON(!page);
+        clear_page(page);
+        console_mfn = virt_to_mfn(page);
+        si->console.domU.mfn = console_mfn;
+        share_xen_page_with_guest(mfn_to_page(console_mfn), d,
+                                  XENSHARE_writable);
+        replace_va(d, l4start, console_va, console_mfn);
+        dom0_update_physmap(d, (console_va - va_start) >> PAGE_SHIFT,
+                            console_mfn, vphysmap);
+        consoled_set_ring_addr(page);
+    }
 }
 
 void pv_shim_shutdown(uint8_t reason)
@@ -334,7 +351,13 @@ case EVTCHNOP_##cmd: {                                                  \
         if ( copy_from_guest(&send, arg, 1) != 0 )
             return -EFAULT;
 
-        rc = xen_hypercall_event_channel_op(EVTCHNOP_send, &send);
+        if ( pv_console && send.port == pv_console_evtchn() )
+        {
+            consoled_guest_rx();
+            rc = 0;
+        }
+        else
+            rc = xen_hypercall_event_channel_op(EVTCHNOP_send, &send);
 
         break;
     }
diff --git a/xen/drivers/char/Makefile b/xen/drivers/char/Makefile
index 9d48d0f2dc..0d48b16e8d 100644
--- a/xen/drivers/char/Makefile
+++ b/xen/drivers/char/Makefile
@@ -9,3 +9,4 @@ obj-$(CONFIG_HAS_EHCI) += ehci-dbgp.o
 obj-$(CONFIG_ARM) += arm-uart.o
 obj-y += serial.o
 obj-$(CONFIG_XEN_GUEST) += xen_pv_console.o
+obj-$(CONFIG_PV_SHIM) += consoled.o
diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
index 354e020d19..3c615a255c 100644
--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -15,6 +15,7 @@
 #include <xen/init.h>
 #include <xen/event.h>
 #include <xen/console.h>
+#include <xen/consoled.h>
 #include <xen/serial.h>
 #include <xen/pv_console.h>
 #include <xen/softirq.h>
@@ -408,6 +409,9 @@ static void __serial_rx(char c, struct cpu_user_regs *regs)
         serial_rx_ring[SERIAL_RX_MASK(serial_rx_prod++)] = c;
     /* Always notify the guest: prevents receive path from getting stuck. */
     send_global_virq(VIRQ_CONSOLE);
+
+    if ( pv_shim && pv_console )
+        consoled_guest_tx(c);
 }
 
 static void serial_rx(char c, struct cpu_user_regs *regs)
diff --git a/xen/drivers/char/consoled.c b/xen/drivers/char/consoled.c
new file mode 100644
index 0000000000..9a4b504208
--- /dev/null
+++ b/xen/drivers/char/consoled.c
@@ -0,0 +1,145 @@
+/******************************************************************************
+ * drivers/char/consoled.c
+ *
+ * A backend driver for Xen's PV console.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (c) 2017 Citrix Systems Ltd.
+ */
+
+#include <xen/lib.h>
+#include <xen/event.h>
+#include <xen/pv_console.h>
+#include <xen/consoled.h>
+
+static struct xencons_interface *cons_ring;
+static DEFINE_SPINLOCK(rx_lock);
+
+void consoled_set_ring_addr(struct xencons_interface *ring)
+{
+    cons_ring = ring;
+}
+
+struct xencons_interface *consoled_get_ring_addr(void)
+{
+    return cons_ring;
+}
+
+static void notify_guest(void)
+{
+    unsigned int cpu = 0, port = pv_console_evtchn();
+
+    evtchn_port_set_pending(pv_domain, cpu, evtchn_from_port(pv_domain, port));
+}
+
+#define BUF_SZ 255
+static char buf[BUF_SZ + 1];
+
+/* Receives characters from a domain's PV console */
+size_t consoled_guest_rx(void)
+{
+    size_t recv = 0, idx = 0;
+    XENCONS_RING_IDX cons, prod;
+
+    if ( !cons_ring )
+        return 0;
+
+    spin_lock(&rx_lock);
+
+    cons = cons_ring->out_cons;
+    prod = ACCESS_ONCE(cons_ring->out_prod);
+    ASSERT((prod - cons) <= sizeof(cons_ring->out));
+
+    /* Is the ring empty? */
+    if ( cons == prod )
+        goto out;
+
+    /* Update pointers before accessing the ring */
+    smp_rmb();
+
+    while ( cons != prod )
+    {
+        char c = cons_ring->out[MASK_XENCONS_IDX(cons++, cons_ring->out)];
+
+        buf[idx++] = c;
+        recv++;
+
+        if ( idx >= BUF_SZ )
+        {
+            pv_console_puts(buf);
+            idx = 0;
+        }
+    }
+
+    if ( idx )
+    {
+        buf[idx] = '\0';
+        pv_console_puts(buf);
+    }
+
+    /* No need for a mem barrier because every character was already consumed */
+    barrier();
+    ACCESS_ONCE(cons_ring->out_cons) = cons;
+    notify_guest();
+
+ out:
+    spin_unlock(&rx_lock);
+
+    return recv;
+}
+
+/* Sends a character into a domain's PV console */
+size_t consoled_guest_tx(char c)
+{
+    size_t sent = 0;
+    XENCONS_RING_IDX cons, prod;
+
+    if ( !cons_ring )
+        return 0;
+
+    cons = ACCESS_ONCE(cons_ring->in_cons);
+    prod = cons_ring->in_prod;
+    ASSERT((prod - cons) <= sizeof(cons_ring->in));
+
+    /* Is the ring out of space? */
+    if ( sizeof(cons_ring->in) - (prod - cons) == 0 )
+        goto notify;
+
+    /* Update pointers before accessing the ring */
+    smp_rmb();
+
+    cons_ring->in[MASK_XENCONS_IDX(prod++, cons_ring->in)] = c;
+    sent++;
+
+    /* Write to the ring before updating the pointer */
+    smp_wmb();
+    ACCESS_ONCE(cons_ring->in_prod) = prod;
+
+ notify:
+    /* Always notify the guest: prevents receive path from getting stuck. */
+    notify_guest();
+
+    return sent;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/consoled.h b/xen/include/xen/consoled.h
new file mode 100644
index 0000000000..fd5d220a8a
--- /dev/null
+++ b/xen/include/xen/consoled.h
@@ -0,0 +1,27 @@
+#ifndef __XEN_CONSOLED_H__
+#define __XEN_CONSOLED_H__
+
+#include <public/io/console.h>
+
+#ifdef CONFIG_PV_SHIM
+
+void consoled_set_ring_addr(struct xencons_interface *ring);
+struct xencons_interface *consoled_get_ring_addr(void);
+size_t consoled_guest_rx(void);
+size_t consoled_guest_tx(char c);
+
+#else
+
+static inline size_t consoled_guest_tx(char c) { return 0; }
+
+#endif /* !CONFIG_PV_SHIM */
+#endif /* __XEN_CONSOLED_H__ */
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 206+ messages in thread

* [PATCH RFC v1 58/74] xen/pvshim: add migration support
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (56 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 57/74] x86/pv-shim: shadow PV console's page for L2 DomU Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-09  9:38   ` Jan Beulich
  2018-01-04 13:06 ` [PATCH RFC v1 59/74] xen/pvshim: add shim_mem cmdline parameter Wei Liu
                   ` (17 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/guest/xen.c          |  29 +++++++
 xen/arch/x86/mm.c                 |   3 +-
 xen/arch/x86/pv/shim.c            | 163 +++++++++++++++++++++++++++++++++++++-
 xen/common/domain.c               |  11 ++-
 xen/common/schedule.c             |   3 +-
 xen/drivers/char/xen_pv_console.c |   2 +-
 xen/include/asm-x86/guest/xen.h   |   5 ++
 xen/include/asm-x86/pv/shim.h     |   5 +-
 xen/include/xen/sched.h           |   2 +-
 9 files changed, 206 insertions(+), 17 deletions(-)
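
A simplified, single-threaded model (all names hypothetical) of the pause/suspend/rollback bracketing in pv_shim_shutdown(). Every vCPU except the caller is paused, the suspend is attempted, and on failure the pauses are undone before the error is returned. With this patch, domain_shutdown() and do_sched_op() pass that error back to the guest. The stub callbacks stand in for xen_hypercall_shutdown(SHUTDOWN_suspend).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vCPU: pause_count > 0 means paused by the system controller. */
struct demo_vcpu {
    unsigned int id;
    unsigned int pause_count;
};

/* Stand-in for xen_hypercall_shutdown(SHUTDOWN_suspend): 0 or -errno. */
typedef long (*suspend_fn)(void);

static void pause_others(struct demo_vcpu *v, size_t nr, unsigned int self)
{
    for ( size_t i = 0; i < nr; i++ )
        if ( v[i].id != self )
            v[i].pause_count++;
}

static void unpause_others(struct demo_vcpu *v, size_t nr, unsigned int self)
{
    for ( size_t i = 0; i < nr; i++ )
        if ( v[i].id != self )
        {
            assert(v[i].pause_count > 0);
            v[i].pause_count--;
        }
}

static long demo_suspend(struct demo_vcpu *v, size_t nr, unsigned int self,
                         suspend_fn do_suspend)
{
    long rc;

    pause_others(v, nr, self);
    rc = do_suspend();
    if ( rc )
    {
        /* Undo the pauses so a failed suspend leaves the guest running. */
        unpause_others(v, nr, self);
        return rc;                      /* propagated back to the guest */
    }

    /* Resume work (event channels, grant frames, timers, start_info) goes here. */
    unpause_others(v, nr, self);
    return 0;
}

static long suspend_ok(void) { return 0; }
static long suspend_fails(void) { return -22; }  /* illustrative -EINVAL-style value */
```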

diff --git a/xen/arch/x86/guest/xen.c b/xen/arch/x86/guest/xen.c
index 653a7366ab..54c997b9e0 100644
--- a/xen/arch/x86/guest/xen.c
+++ b/xen/arch/x86/guest/xen.c
@@ -361,6 +361,35 @@ uint32_t hypervisor_cpuid_base(void)
     return xen_cpuid_base;
 }
 
+void ap_resume(void *unused)
+{
+    map_vcpuinfo();
+    init_evtchn();
+}
+
+void hypervisor_resume(void)
+{
+    /* Reset shared info page. */
+    map_shared_info();
+
+    /*
+     * Reset vcpu_info. Just clean the mapped bitmap and try to map the vcpu
+     * area again. On failure to map (when it was previously mapped) panic
+     * since it's impossible to safely shut down running guest vCPUs in order
+     * to meet the new XEN_LEGACY_MAX_VCPUS requirement.
+     */
+    memset(vcpu_info_mapped, 0, sizeof(vcpu_info_mapped));
+    if ( map_vcpuinfo() && nr_cpu_ids > XEN_LEGACY_MAX_VCPUS )
+        panic("unable to remap vCPU info and vCPUs > legacy limit");
+
+    /* Set up the event channel upcall vector. */
+    init_evtchn();
+    smp_call_function(ap_resume, NULL, 1);
+
+    if ( pv_console )
+        pv_console_init();
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 4332d3bb39..e726c62064 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -466,8 +466,7 @@ void share_xen_page_with_guest(
     spin_unlock(&d->page_alloc_lock);
 }
 
-int __init unshare_xen_page_with_guest(struct page_info *page,
-                                       struct domain *d)
+int unshare_xen_page_with_guest(struct page_info *page, struct domain *d)
 {
     if ( page_get_owner(page) != d || !is_xen_heap_page(page) )
         return -EINVAL;
diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
index 74b24bf950..56ecaea2d2 100644
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -151,10 +151,167 @@ void __init pv_shim_setup_dom(struct domain *d, l4_pgentry_t *l4start,
     }
 }
 
-void pv_shim_shutdown(uint8_t reason)
+static void write_start_info(struct domain *d)
 {
-    /* XXX: handle suspend */
-    xen_hypercall_shutdown(reason);
+    struct cpu_user_regs *regs = guest_cpu_user_regs();
+    start_info_t *si = map_domain_page(_mfn(is_pv_32bit_domain(d) ? regs->edx
+                                                                  : regs->rdx));
+    uint64_t param;
+
+    BUG_ON(!si);
+
+    snprintf(si->magic, sizeof(si->magic), "xen-3.0-x86_%s",
+             is_pv_32bit_domain(d) ? "32p" : "64");
+    si->nr_pages = d->tot_pages;
+    si->shared_info = virt_to_maddr(d->shared_info);
+    si->flags = (xen_processor_pmbits << 8) & SIF_PM_MASK;
+    BUG_ON(xen_hypercall_hvm_get_param(HVM_PARAM_STORE_PFN, &si->store_mfn));
+    BUG_ON(xen_hypercall_hvm_get_param(HVM_PARAM_STORE_EVTCHN, &param));
+    si->store_evtchn = param;
+    BUG_ON(xen_hypercall_hvm_get_param(HVM_PARAM_CONSOLE_EVTCHN, &param));
+    si->console.domU.evtchn = param;
+    if ( !pv_console )
+        BUG_ON(xen_hypercall_hvm_get_param(HVM_PARAM_CONSOLE_PFN,
+                                           &si->console.domU.mfn));
+    else
+        si->console.domU.mfn = virt_to_mfn(consoled_get_ring_addr());
+
+    if ( is_pv_32bit_domain(d) )
+        xlat_start_info(si, XLAT_start_info_console_domU);
+
+    unmap_domain_page(si);
+}
+
+int pv_shim_shutdown(uint8_t reason)
+{
+    long rc;
+
+    if ( reason == SHUTDOWN_suspend )
+    {
+        struct domain *d = current->domain;
+        struct vcpu *v;
+        unsigned int i;
+        uint64_t old_store_pfn, old_console_pfn = 0, store_pfn, console_pfn;
+        uint64_t store_evtchn, console_evtchn;
+
+        BUG_ON(current->vcpu_id != 0);
+
+        BUG_ON(xen_hypercall_hvm_get_param(HVM_PARAM_STORE_PFN,
+                                           &old_store_pfn));
+        if ( !pv_console )
+            BUG_ON(xen_hypercall_hvm_get_param(HVM_PARAM_CONSOLE_PFN,
+                                               &old_console_pfn));
+
+        /* Pause the other vcpus before starting the migration. */
+        for_each_vcpu(d, v)
+            if ( v != current )
+                vcpu_pause_by_systemcontroller(v);
+
+        rc = xen_hypercall_shutdown(SHUTDOWN_suspend);
+        if ( rc )
+        {
+            for_each_vcpu(d, v)
+                if ( v != current )
+                    vcpu_unpause_by_systemcontroller(v);
+
+            return rc;
+        }
+
+        /* Resume the shim itself first. */
+        hypervisor_resume();
+
+        /*
+         * ATM there's nothing Xen can do if the console/store pfn changes,
+         * because Xen won't have a page_info struct for it.
+         */
+        BUG_ON(xen_hypercall_hvm_get_param(HVM_PARAM_STORE_PFN,
+                                           &store_pfn));
+        BUG_ON(old_store_pfn != store_pfn);
+        if ( !pv_console )
+        {
+            BUG_ON(xen_hypercall_hvm_get_param(HVM_PARAM_CONSOLE_PFN,
+                                               &console_pfn));
+            BUG_ON(old_console_pfn != console_pfn);
+        }
+
+        /* Update domain id. */
+        d->domain_id = get_dom0_domid();
+
+        /* Clean the iomem range. */
+        BUG_ON(iomem_deny_access(d, 0, ~0UL));
+
+        /* Clean grant frames. */
+        xfree(grant_frames);
+        grant_frames = NULL;
+        nr_grant_list = 0;
+
+        /* Clean event channels. */
+        for ( i = 0; i < EVTCHN_2L_NR_CHANNELS; i++ )
+        {
+            if ( !port_is_valid(d, i) )
+                continue;
+
+            if ( evtchn_handled(d, i) )
+                evtchn_close(d, i, false);
+            else
+                evtchn_free(d, evtchn_from_port(d, i));
+        }
+
+        /* Reserve store/console event channel. */
+        BUG_ON(xen_hypercall_hvm_get_param(HVM_PARAM_STORE_EVTCHN,
+                                           &store_evtchn));
+        BUG_ON(evtchn_allocate_port(d, store_evtchn));
+        evtchn_reserve(d, store_evtchn);
+        BUG_ON(xen_hypercall_hvm_get_param(HVM_PARAM_CONSOLE_EVTCHN,
+                                           &console_evtchn));
+        BUG_ON(evtchn_allocate_port(d, console_evtchn));
+        evtchn_reserve(d, console_evtchn);
+
+        /* Clean watchdogs. */
+        watchdog_domain_destroy(d);
+        watchdog_domain_init(d);
+
+        /* Clean the PIRQ EOI page. */
+        if ( d->arch.pirq_eoi_map != NULL )
+        {
+            unmap_domain_page_global(d->arch.pirq_eoi_map);
+            put_page_and_type(mfn_to_page(d->arch.pirq_eoi_map_mfn));
+            d->arch.pirq_eoi_map = NULL;
+            d->arch.pirq_eoi_map_mfn = 0;
+            d->arch.auto_unmask = 0;
+        }
+
+        /*
+         * NB: there's no need to fix up the p2m, since the mfns assigned
+         * to the PV guest have not changed at all. Just re-write the
+         * start_info fields with the appropriate value.
+         */
+        write_start_info(d);
+
+        for_each_vcpu(d, v)
+        {
+            /* Unmap guest vcpu_info pages. */
+            unmap_vcpu_info(v);
+
+            /* Reset the periodic timer to the default value. */
+            v->periodic_period = MILLISECS(10);
+            /* Stop the singleshot timer. */
+            stop_timer(&v->singleshot_timer);
+
+            if ( test_bit(_VPF_down, &v->pause_flags) )
+                BUG_ON(vcpu_reset(v));
+
+            if ( v != current )
+                vcpu_unpause_by_systemcontroller(v);
+            else
+                vcpu_force_reschedule(v);
+        }
+    }
+    else
+        /* Forward to L0. */
+        rc = xen_hypercall_shutdown(reason);
+
+    return rc;
 }
 
 long pv_shim_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
diff --git a/xen/common/domain.c b/xen/common/domain.c
index d653a0b0bb..bc2ceb2d36 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -701,15 +701,12 @@ void __domain_crash_synchronous(void)
 }
 
 
-void domain_shutdown(struct domain *d, u8 reason)
+int domain_shutdown(struct domain *d, u8 reason)
 {
     struct vcpu *v;
 
     if ( pv_shim )
-    {
-        pv_shim_shutdown(reason);
-        return;
-    }
+        return pv_shim_shutdown(reason);
 
     spin_lock(&d->shutdown_lock);
 
@@ -723,7 +720,7 @@ void domain_shutdown(struct domain *d, u8 reason)
     if ( d->is_shutting_down )
     {
         spin_unlock(&d->shutdown_lock);
-        return;
+        return 0;
     }
 
     d->is_shutting_down = 1;
@@ -745,6 +742,8 @@ void domain_shutdown(struct domain *d, u8 reason)
     __domain_finalise_shutdown(d);
 
     spin_unlock(&d->shutdown_lock);
+
+    return 0;
 }
 
 void domain_resume(struct domain *d)
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 88279213e8..b7884263f2 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1149,11 +1149,10 @@ ret_t do_sched_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&sched_shutdown, arg, 1) )
             break;
 
-        ret = 0;
         TRACE_3D(TRC_SCHED_SHUTDOWN,
                  current->domain->domain_id, current->vcpu_id,
                  sched_shutdown.reason);
-        domain_shutdown(current->domain, (u8)sched_shutdown.reason);
+        ret = domain_shutdown(current->domain, (u8)sched_shutdown.reason);
 
         break;
     }
diff --git a/xen/drivers/char/xen_pv_console.c b/xen/drivers/char/xen_pv_console.c
index fb5a7893be..d2a28478ac 100644
--- a/xen/drivers/char/xen_pv_console.c
+++ b/xen/drivers/char/xen_pv_console.c
@@ -37,7 +37,7 @@ static DEFINE_SPINLOCK(tx_lock);
 
 bool pv_console;
 
-void __init pv_console_init(void)
+void pv_console_init(void)
 {
     struct evtchn_unmask unmask;
     long r;
diff --git a/xen/include/asm-x86/guest/xen.h b/xen/include/asm-x86/guest/xen.h
index 94f781c30f..c244569e83 100644
--- a/xen/include/asm-x86/guest/xen.h
+++ b/xen/include/asm-x86/guest/xen.h
@@ -40,6 +40,7 @@ void hypervisor_fixup_e820(struct e820map *e820);
 void hypervisor_init_memory(void);
 const unsigned long *hypervisor_reserved_pages(unsigned int *size);
 uint32_t hypervisor_cpuid_base(void);
+void hypervisor_resume(void);
 
 #else
 
@@ -91,6 +92,10 @@ static inline uint32_t hypervisor_cpuid_base(void)
     ASSERT_UNREACHABLE();
     return 0;
 };
+static inline void hypervisor_resume(void)
+{
+    ASSERT_UNREACHABLE();
+};
 
 #endif /* CONFIG_XEN_GUEST */
 #endif /* __X86_GUEST_XEN_H__ */
diff --git a/xen/include/asm-x86/pv/shim.h b/xen/include/asm-x86/pv/shim.h
index 47bc4267af..0207348a85 100644
--- a/xen/include/asm-x86/pv/shim.h
+++ b/xen/include/asm-x86/pv/shim.h
@@ -35,7 +35,7 @@ void pv_shim_setup_dom(struct domain *d, l4_pgentry_t *l4start,
                        unsigned long va_start, unsigned long store_va,
                        unsigned long console_va, unsigned long vphysmap,
                        start_info_t *si);
-void pv_shim_shutdown(uint8_t reason);
+int pv_shim_shutdown(uint8_t reason);
 long pv_shim_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg);
 void pv_shim_inject_evtchn(unsigned int port);
 long pv_shim_grant_table_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) uop,
@@ -53,9 +53,10 @@ static inline void pv_shim_setup_dom(struct domain *d, l4_pgentry_t *l4start,
 {
     ASSERT_UNREACHABLE();
 }
-static inline void pv_shim_shutdown(uint8_t reason)
+static inline int pv_shim_shutdown(uint8_t reason)
 {
     ASSERT_UNREACHABLE();
+    return 0;
 }
 static inline long pv_shim_event_channel_op(int cmd,
                                             XEN_GUEST_HANDLE_PARAM(void) arg)
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index ac65d0c0df..70db377eae 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -605,7 +605,7 @@ static inline struct domain *rcu_lock_current_domain(void)
 struct domain *get_domain_by_id(domid_t dom);
 void domain_destroy(struct domain *d);
 int domain_kill(struct domain *d);
-void domain_shutdown(struct domain *d, u8 reason);
+int domain_shutdown(struct domain *d, u8 reason);
 void domain_resume(struct domain *d);
 void domain_pause_for_debugger(void);
 
-- 
2.11.0



* [PATCH RFC v1 59/74] xen/pvshim: add shim_mem cmdline parameter
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (57 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 58/74] xen/pvshim: add migration support Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-09  9:47   ` Jan Beulich
  2018-01-04 13:06 ` [PATCH RFC v1 60/74] xen/pvshim: set max_pages to the value of tot_pages Wei Liu
                   ` (16 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Sergey Dyasli <sergey.dyasli@citrix.com>

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
---
 docs/misc/xen-command-line.markdown | 16 +++++++++++++
 xen/arch/x86/dom0_build.c           | 18 ++++++++++++++-
 xen/arch/x86/pv/shim.c              | 46 +++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/pv/shim.h       |  7 ++++++
 4 files changed, 86 insertions(+), 1 deletion(-)
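
The reservation rule documented for shim_mem can be checked with a small standalone function. This is a sketch that mirrors the decision order of pv_shim_mem(): an exact size wins outright; otherwise avail/16 is capped at max and then raised to min, unless min exceeds max. The struct and helper names are hypothetical, and 4KiB pages are assumed.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12              /* assumed 4KiB pages, as on x86 */
#define MB_TO_PAGES(mb) ((uint64_t)(mb) << (20 - DEMO_PAGE_SHIFT))

/* Hypothetical parsed form of shim_mem=[min:<size>,][max:<size>,][<size>]. */
struct shim_mem_cfg {
    uint64_t exact;                     /* <size>; 0 means unset */
    uint64_t min;                       /* min:<size>, default 10M */
    uint64_t max;                       /* max:<size>, default 128M */
};

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* Pages reserved for the shim out of 'avail' pages, following pv_shim_mem(). */
static uint64_t shim_reserved_pages(const struct shim_mem_cfg *cfg,
                                    uint64_t avail)
{
    uint64_t rsvd = min_u64(avail / 16, cfg->max);

    if ( cfg->exact )
        return cfg->exact;              /* exact size overrides min and max */
    if ( cfg->min <= cfg->max )
        rsvd = max_u64(rsvd, cfg->min); /* min is ignored if greater than max */
    return rsvd;
}
```

With the defaults, a 1GiB container reserves 64MB, an 8GiB container hits the 128MB cap, and a 64MB container is raised to the 10MB floor.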

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 3a1a9c1fba..9f51710a46 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -686,6 +686,8 @@ any dom0 autoballooning feature present in your toolstack. See the
 _xl.conf(5)_ man page or [Xen Best
 Practices](http://wiki.xen.org/wiki/Xen_Best_Practices#Xen_dom0_dedicated_memory_and_preventing_dom0_memory_ballooning).
 
+This option has no effect if pv-shim mode is enabled.
+
 ### dom0\_nodes
 
 > `= List of [ <integer> | relaxed | strict ]`
@@ -1456,6 +1458,20 @@ guest compatibly inside an HVM container.
 In this mode, the kernel and initrd passed as modules to the hypervisor are
 constructed into a plain unprivileged PV domain.
 
+### shim\_mem (x86)
+> `= List of ( min:<size> | max:<size> | <size> )`
+
+Set the amount of memory that xen-shim reserves for itself. Only has effect
+if pv-shim mode is enabled.
+
+* `min:<size>` specifies the minimum amount of memory. Ignored if greater
+   than max. Default: 10M.
+* `max:<size>` specifies the maximum amount of memory. Default: 128M.
+* `<size>` specifies the exact amount of memory. Overrides both min and max.
+
+By default, 1/16th of the HVM container's total memory is reserved for
+xen-shim, with a minimum of 10MB and a maximum of 128MB.
+
 ### rcu-idle-timer-period-ms
 > `= <integer>`
 
diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index 1c5853690a..1b0b89fdeb 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -51,6 +51,13 @@ static long __init parse_amt(const char *s, const char **ps)
 
 static int __init parse_dom0_mem(const char *s)
 {
+    /* xen-shim uses shim_mem parameter instead of dom0_mem */
+    if ( pv_shim )
+    {
+        printk("Ignoring dom0_mem param in pv-shim mode\n");
+        return 0;
+    }
+
     do {
         if ( !strncmp(s, "min:", 4) )
             dom0_min_nrpages = parse_amt(s+4, &s);
@@ -284,7 +291,16 @@ unsigned long __init dom0_compute_nr_pages(
          * maximum of 128MB.
          */
         if ( nr_pages == 0 )
-            nr_pages = -min(avail / 16, 128UL << (20 - PAGE_SHIFT));
+        {
+            uint64_t rsvd = min(avail / 16, 128UL << (20 - PAGE_SHIFT));
+            if ( pv_shim )
+            {
+                rsvd = pv_shim_mem(avail);
+                printk("Reserved %lu pages for xen-shim\n", rsvd);
+
+            }
+            nr_pages = -rsvd;
+        }
 
         /* Negative specification means "all memory - specified amount". */
         if ( (long)nr_pages  < 0 ) nr_pages  += avail;
diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
index 56ecaea2d2..c24adacbc7 100644
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -40,6 +40,52 @@ bool pv_shim;
 boolean_param("pv-shim", pv_shim);
 #endif
 
+/*
+ * By default, 1/16th of the HVM container's total memory is reserved for
+ * xen-shim, with a minimum of 10MB and a maximum of 128MB. Some users may
+ * wish to tune these constants for better memory utilization. This can be
+ * achieved using the following xen-shim command line option:
+ *
+ * shim_mem=[min:<min_amt>,][max:<max_amt>,][<amt>]
+ *
+ * <min_amt>: The minimum amount of memory that should be allocated for xen-shim
+ *            (ignored if greater than max)
+ * <max_amt>: The maximum amount of memory that should be allocated for xen-shim
+ * <amt>:     The precise amount of memory to allocate for xen-shim
+ *            (overrides both min and max)
+ */
+static uint64_t __initdata shim_nrpages;
+static uint64_t __initdata shim_min_nrpages = 10UL << (20 - PAGE_SHIFT);
+static uint64_t __initdata shim_max_nrpages = 128UL << (20 - PAGE_SHIFT);
+
+static int __init parse_shim_mem(const char *s)
+{
+    do {
+        if ( !strncmp(s, "min:", 4) )
+            shim_min_nrpages = parse_size_and_unit(s+4, &s) >> PAGE_SHIFT;
+        else if ( !strncmp(s, "max:", 4) )
+            shim_max_nrpages = parse_size_and_unit(s+4, &s) >> PAGE_SHIFT;
+        else
+            shim_nrpages = parse_size_and_unit(s, &s) >> PAGE_SHIFT;
+    } while ( *s++ == ',' );
+
+    return s[-1] ? -EINVAL : 0;
+}
+custom_param("shim_mem", parse_shim_mem);
+
+uint64_t pv_shim_mem(uint64_t avail)
+{
+    uint64_t rsvd = min(avail / 16, shim_max_nrpages);
+
+    if ( shim_nrpages )
+        return shim_nrpages;
+
+    if ( shim_min_nrpages <= shim_max_nrpages )
+        rsvd = max(rsvd, shim_min_nrpages);
+
+    return rsvd;
+}
+
 static unsigned int nr_grant_list;
 static unsigned long *grant_frames;
 static DEFINE_SPINLOCK(grant_lock);
diff --git a/xen/include/asm-x86/pv/shim.h b/xen/include/asm-x86/pv/shim.h
index 0207348a85..00906f884b 100644
--- a/xen/include/asm-x86/pv/shim.h
+++ b/xen/include/asm-x86/pv/shim.h
@@ -41,6 +41,7 @@ void pv_shim_inject_evtchn(unsigned int port);
 long pv_shim_grant_table_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) uop,
                             unsigned int count, bool compat);
 domid_t get_dom0_domid(void);
+uint64_t pv_shim_mem(uint64_t avail);
 
 #else
 
@@ -80,6 +81,12 @@ static inline domid_t get_dom0_domid(void)
     return 0;
 }
 
+static inline uint64_t pv_shim_mem(uint64_t avail)
+{
+    ASSERT_UNREACHABLE();
+    return 0;
+}
+
 #endif
 
 #endif /* __X86_PV_SHIM_H__ */
-- 
2.11.0



* [PATCH RFC v1 60/74] xen/pvshim: set max_pages to the value of tot_pages
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (58 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 59/74] xen/pvshim: add shim_mem cmdline parameter Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-09  9:48   ` Jan Beulich
  2018-01-04 13:06 ` [PATCH RFC v1 61/74] xen/pvshim: support vCPU hotplug Wei Liu
                   ` (15 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

Set max_pages to tot_pages so that the guest cannot deplete the shim's
own memory pool by ballooning up.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/pv/shim.c | 6 ++++++
 1 file changed, 6 insertions(+)
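
A toy model of why capping max_pages at tot_pages stops the guest from growing past its initial allocation while still letting it balloon down and back up again. It assumes a simplified over-allocation check of the form tot_pages + 2^order > max_pages. The structure and function names are hypothetical, and this is not Xen's page allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified per-domain page accounting. */
struct demo_domain {
    uint64_t tot_pages;
    uint64_t max_pages;
};

/* Grant 2^order pages only if the domain stays within max_pages. */
static bool demo_assign_pages(struct demo_domain *d, unsigned int order)
{
    uint64_t n = UINT64_C(1) << order;

    if ( d->tot_pages + n > d->max_pages )
        return false;                   /* balloon-up request refused */
    d->tot_pages += n;
    return true;
}

/* Give 2^order pages back (balloon down). */
static void demo_release_pages(struct demo_domain *d, unsigned int order)
{
    uint64_t n = UINT64_C(1) << order;

    assert(d->tot_pages >= n);
    d->tot_pages -= n;
}
```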

diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
index c24adacbc7..46f77362a7 100644
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -195,6 +195,12 @@ void __init pv_shim_setup_dom(struct domain *d, l4_pgentry_t *l4start,
                             console_mfn, vphysmap);
         consoled_set_ring_addr(page);
     }
+
+    /*
+     * Set the max pages to the current number of pages to prevent the
+     * guest from depleting the shim memory pool.
+     */
+    d->max_pages = d->tot_pages;
 }
 
 static void write_start_info(struct domain *d)
-- 
2.11.0



* [PATCH RFC v1 61/74] xen/pvshim: support vCPU hotplug
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (59 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 60/74] xen/pvshim: set max_pages to the value of tot_pages Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-09 10:16   ` Jan Beulich
  2018-01-04 13:06 ` [PATCH RFC v1 62/74] xen/pvshim: memory hotplug Wei Liu
                   ` (14 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/pv/shim.c        | 48 +++++++++++++++++++++++++++++++++++++++++++
 xen/common/domain.c           | 37 +++++++++++++++++++++++----------
 xen/include/asm-x86/pv/shim.h | 12 +++++++++++
 xen/include/xen/domain.h      |  1 +
 4 files changed, 87 insertions(+), 11 deletions(-)
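
A hypothetical model of the asymmetry in how this patch handles VCPUOP_up and VCPUOP_down under the shim. Bringing a vCPU up first onlines the backing shim CPU and reports any failure. Taking it down marks the vCPU down and deliberately swallows errors from offlining the CPU. The one-to-one vCPU-to-pCPU mapping mirrors the patch's use of v->vcpu_id as the CPU number; the names and the error value are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NR_CPUS 8

/* Hypothetical state: guest vCPU N is backed by shim pCPU N. */
struct demo_cpu_state {
    bool pcpu_online[DEMO_NR_CPUS];
    bool vcpu_down[DEMO_NR_CPUS];
    bool fail_online;                   /* simulate cpu_up_helper() failing */
    bool fail_offline;                  /* simulate cpu_down_helper() failing */
};

/* VCPUOP_up: online the backing pCPU first; a failure is returned. */
static long demo_cpu_up(struct demo_cpu_state *s, unsigned int id)
{
    assert(id < DEMO_NR_CPUS);
    if ( !s->pcpu_online[id] )
    {
        if ( s->fail_online )
            return -16;                 /* illustrative -EBUSY-style error */
        s->pcpu_online[id] = true;
    }
    s->vcpu_down[id] = false;           /* vcpu_up(): clear the down flag and wake */
    return 0;
}

/* VCPUOP_down: mark the vCPU down, then try to offline the pCPU. */
static long demo_cpu_down(struct demo_cpu_state *s, unsigned int id)
{
    assert(id < DEMO_NR_CPUS);
    s->vcpu_down[id] = true;
    if ( s->pcpu_online[id] && !s->fail_offline )
        s->pcpu_online[id] = false;
    /* Offlining errors are deliberately not reported to the guest. */
    return 0;
}
```

Returning 0 from the down path even when the pCPU stays online matches the reasoning in the patch: an extra online shim CPU is harmless, while a guest that sees VCPUOP_down fail may panic.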

diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
index 46f77362a7..29f343b871 100644
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -766,6 +766,54 @@ long pv_shim_grant_table_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) uop,
     return rc;
 }
 
+long pv_shim_cpu_up(void *data)
+{
+    struct vcpu *v = data;
+    long rc;
+
+    BUG_ON(smp_processor_id() != 0);
+
+    if ( !cpu_online(v->vcpu_id) )
+    {
+        rc = cpu_up_helper((void *)(unsigned long)v->vcpu_id);
+        if ( rc )
+        {
+            gprintk(XENLOG_ERR, "Failed to bring up CPU#%u: %ld\n",
+                    v->vcpu_id, rc);
+            return rc;
+        }
+    }
+
+    return vcpu_up(v);
+}
+
+long pv_shim_cpu_down(void *data)
+{
+    struct vcpu *v = data;
+    long rc;
+
+    BUG_ON(smp_processor_id() != 0);
+
+    if ( !test_and_set_bit(_VPF_down, &v->pause_flags) )
+        vcpu_sleep_sync(v);
+
+    if ( cpu_online(v->vcpu_id) )
+    {
+        rc = cpu_down_helper((void *)(unsigned long)v->vcpu_id);
+        if ( rc )
+            gprintk(XENLOG_ERR, "Failed to bring down CPU#%u: %ld\n",
+                    v->vcpu_id, rc);
+        /*
+         * NB: do not propagate errors from cpu_down_helper. The shim will
+         * keep running with extra CPUs online, but that doesn't prevent
+         * normal operation. OTOH most guests are not prepared to handle
+         * VCPUOP_down returning an error, and would likely panic.
+         */
+    }
+
+    return 0;
+}
+
 domid_t get_dom0_domid(void)
 {
     uint32_t eax, ebx, ecx, edx;
diff --git a/xen/common/domain.c b/xen/common/domain.c
index bc2ceb2d36..c30e98b24e 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1281,6 +1281,23 @@ int default_initialise_vcpu(struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
     return rc;
 }
 
+int vcpu_up(struct vcpu *v)
+{
+    bool wake = false;
+    int rc = 0;
+
+    domain_lock(v->domain);
+    if ( !v->is_initialised )
+        rc = -EINVAL;
+    else
+        wake = test_and_clear_bit(_VPF_down, &v->pause_flags);
+    domain_unlock(v->domain);
+    if ( wake )
+        vcpu_wake(v);
+
+    return rc;
+}
+
 long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct domain *d = current->domain;
@@ -1303,22 +1320,20 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
 
         break;
 
-    case VCPUOP_up: {
-        bool_t wake = 0;
-        domain_lock(d);
-        if ( !v->is_initialised )
-            rc = -EINVAL;
+    case VCPUOP_up:
+        if ( pv_shim )
+            rc = continue_hypercall_on_cpu(0, pv_shim_cpu_up, v);
         else
-            wake = test_and_clear_bit(_VPF_down, &v->pause_flags);
-        domain_unlock(d);
-        if ( wake )
-            vcpu_wake(v);
+            rc = vcpu_up(v);
+
         break;
-    }
 
     case VCPUOP_down:
-        if ( !test_and_set_bit(_VPF_down, &v->pause_flags) )
+        if ( pv_shim )
+            rc = continue_hypercall_on_cpu(0, pv_shim_cpu_down, v);
+        else if ( !test_and_set_bit(_VPF_down, &v->pause_flags) )
             vcpu_sleep_nosync(v);
+
         break;
 
     case VCPUOP_is_up:
diff --git a/xen/include/asm-x86/pv/shim.h b/xen/include/asm-x86/pv/shim.h
index 00906f884b..d107a617a7 100644
--- a/xen/include/asm-x86/pv/shim.h
+++ b/xen/include/asm-x86/pv/shim.h
@@ -40,6 +40,8 @@ long pv_shim_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg);
 void pv_shim_inject_evtchn(unsigned int port);
 long pv_shim_grant_table_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) uop,
                             unsigned int count, bool compat);
+long pv_shim_cpu_up(void *data);
+long pv_shim_cpu_down(void *data);
 domid_t get_dom0_domid(void);
 uint64_t pv_shim_mem(uint64_t avail);
 
@@ -76,6 +78,16 @@ static inline long pv_shim_grant_table_op(unsigned int cmd,
     ASSERT_UNREACHABLE();
     return 0;
 }
+static inline long pv_shim_cpu_up(void *data)
+{
+    ASSERT_UNREACHABLE();
+    return 0;
+}
+static inline long pv_shim_cpu_down(void *data)
+{
+    ASSERT_UNREACHABLE();
+    return 0;
+}
 static inline domid_t get_dom0_domid(void)
 {
     return 0;
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index 347f264047..eb62f1dab1 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -17,6 +17,7 @@ struct vcpu *alloc_vcpu(
     struct domain *d, unsigned int vcpu_id, unsigned int cpu_id);
 struct vcpu *alloc_dom0_vcpu0(struct domain *dom0);
 int vcpu_reset(struct vcpu *);
+int vcpu_up(struct vcpu *v);
 
 struct xen_domctl_getdomaininfo;
 void getdomaininfo(struct domain *d, struct xen_domctl_getdomaininfo *info);
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

* [PATCH RFC v1 62/74] xen/pvshim: memory hotplug
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (60 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 61/74] xen/pvshim: support vCPU hotplug Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-09 10:42   ` Jan Beulich
  2018-01-04 13:06 ` [PATCH RFC v1 63/74] xen/shim: modify shim_mem parameter behaviour Wei Liu
                   ` (13 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/pv/shim.c        | 110 ++++++++++++++++++++++++++++++++++++++++++
 xen/common/memory.c           |  14 ++++++
 xen/include/asm-x86/pv/shim.h |  10 ++++
 3 files changed, 134 insertions(+)

diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
index 29f343b871..eb34467833 100644
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -90,6 +90,9 @@ static unsigned int nr_grant_list;
 static unsigned long *grant_frames;
 static DEFINE_SPINLOCK(grant_lock);
 
+static PAGE_LIST_HEAD(balloon);
+static DEFINE_SPINLOCK(balloon_lock);
+
 #define L1_PROT (_PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_USER| \
                  _PAGE_GUEST_KERNEL)
 #define COMPAT_L1_PROT (_PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED)
@@ -814,6 +817,113 @@ long pv_shim_cpu_down(void *data)
     return 0;
 }
 
+static unsigned long batch_memory_op(int cmd, struct page_list_head *list)
+{
+    struct xen_memory_reservation xmr = {
+        .domid = DOMID_SELF,
+    };
+    unsigned long pfns[64];
+    struct page_info *pg;
+    unsigned long done = 0;
+
+    set_xen_guest_handle(xmr.extent_start, pfns);
+    page_list_for_each ( pg, list )
+    {
+        pfns[xmr.nr_extents++] = page_to_mfn(pg);
+        if ( xmr.nr_extents == ARRAY_SIZE(pfns) || !page_list_next(pg, list) )
+        {
+            long nr = xen_hypercall_memory_op(cmd, &xmr);
+
+            done += nr > 0 ? nr : 0;
+            if ( nr != xmr.nr_extents )
+                break;
+            xmr.nr_extents = 0;
+        }
+    }
+
+    return done;
+}
+
+void pv_shim_online_memory(unsigned int nr, unsigned int order)
+{
+    struct page_info *page, *tmp;
+    PAGE_LIST_HEAD(list);
+
+    spin_lock(&balloon_lock);
+    page_list_for_each_safe ( page, tmp, &balloon )
+    {
+            if ( page->v.free.order != order )
+                continue;
+
+            page_list_del(page, &balloon);
+            page_list_add_tail(page, &list);
+            if ( !--nr )
+                break;
+    }
+    spin_unlock(&balloon_lock);
+
+    if ( nr )
+        gprintk(XENLOG_WARNING,
+                "failed to allocate %u extents of order %u for onlining\n",
+                nr, order);
+
+    nr = batch_memory_op(XENMEM_populate_physmap, &list);
+    while ( nr-- )
+    {
+        BUG_ON((page = page_list_remove_head(&list)) == NULL);
+        free_domheap_pages(page, order);
+    }
+
+    if ( !page_list_empty(&list) )
+    {
+        gprintk(XENLOG_WARNING,
+                "failed to online some of the memory regions\n");
+        spin_lock(&balloon_lock);
+        while ( (page = page_list_remove_head(&list)) != NULL )
+            page_list_add_tail(page, &balloon);
+        spin_unlock(&balloon_lock);
+    }
+}
+
+void pv_shim_offline_memory(unsigned int nr, unsigned int order)
+{
+    struct page_info *page;
+    PAGE_LIST_HEAD(list);
+
+    while ( nr-- )
+    {
+        page = alloc_domheap_pages(NULL, order, 0);
+        if ( !page )
+            break;
+
+        page_list_add_tail(page, &list);
+        page->v.free.order = order;
+    }
+
+    if ( nr + 1 )
+        gprintk(XENLOG_WARNING,
+                "failed to reserve %u extents of order %u for offlining\n",
+                nr + 1, order);
+
+
+    nr = batch_memory_op(XENMEM_decrease_reservation, &list);
+    spin_lock(&balloon_lock);
+    while ( nr-- )
+    {
+        BUG_ON((page = page_list_remove_head(&list)) == NULL);
+        page_list_add_tail(page, &balloon);
+    }
+    spin_unlock(&balloon_lock);
+
+    if ( !page_list_empty(&list) )
+    {
+        gprintk(XENLOG_WARNING,
+                "failed to offline some of the memory regions\n");
+        while ( (page = page_list_remove_head(&list)) != NULL )
+            free_domheap_pages(page, order);
+    }
+}
+
 domid_t get_dom0_domid(void)
 {
     uint32_t eax, ebx, ecx, edx;
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 5a1508a292..f06df8c8cf 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -29,6 +29,10 @@
 #include <public/memory.h>
 #include <xsm/xsm.h>
 
+#ifdef CONFIG_X86
+#include <asm/pv/shim.h>
+#endif
+
 struct memop_args {
     /* INPUT */
     struct domain *domain;     /* Domain to be affected. */
@@ -993,6 +997,11 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             return start_extent;
         }
 
+#ifdef CONFIG_X86
+        if ( pv_shim && op != XENMEM_decrease_reservation && !args.nr_done )
+            pv_shim_online_memory(args.nr_extents, args.extent_order);
+#endif
+
         switch ( op )
         {
         case XENMEM_increase_reservation:
@@ -1015,6 +1024,11 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
                 __HYPERVISOR_memory_op, "lh",
                 op | (rc << MEMOP_EXTENT_SHIFT), arg);
 
+#ifdef CONFIG_X86
+        if ( pv_shim && op == XENMEM_decrease_reservation )
+            pv_shim_offline_memory(args.nr_extents, args.extent_order);
+#endif
+
         break;
 
     case XENMEM_exchange:
diff --git a/xen/include/asm-x86/pv/shim.h b/xen/include/asm-x86/pv/shim.h
index d107a617a7..7174f6fc07 100644
--- a/xen/include/asm-x86/pv/shim.h
+++ b/xen/include/asm-x86/pv/shim.h
@@ -42,6 +42,8 @@ long pv_shim_grant_table_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) uop,
                             unsigned int count, bool compat);
 long pv_shim_cpu_up(void *data);
 long pv_shim_cpu_down(void *data);
+void pv_shim_online_memory(unsigned int nr, unsigned int order);
+void pv_shim_offline_memory(unsigned int nr, unsigned int order);
 domid_t get_dom0_domid(void);
 uint64_t pv_shim_mem(uint64_t avail);
 
@@ -88,6 +90,14 @@ static inline long pv_shim_cpu_down(void *data)
     ASSERT_UNREACHABLE();
     return 0;
 }
+static inline void pv_shim_online_memory(unsigned int nr, unsigned int order)
+{
+    ASSERT_UNREACHABLE();
+}
+static inline void pv_shim_offline_memory(unsigned int nr, unsigned int order)
+{
+    ASSERT_UNREACHABLE();
+}
 static inline domid_t get_dom0_domid(void)
 {
     return 0;
-- 
2.11.0


* [PATCH RFC v1 63/74] xen/shim: modify shim_mem parameter behaviour
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (61 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 62/74] xen/pvshim: memory hotplug Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-09 10:48   ` Jan Beulich
  2018-01-04 13:06 ` [PATCH RFC v1 64/74] xen/pvshim: use default position for the m2p mappings Wei Liu
                   ` (12 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

shim_mem will now account for both the memory used by the hypervisor
loaded in memory and the free memory slack given to the shim for
runtime usage.

Experimental testing suggests that the total amount of memory (in MiB)
used by the shim, when giving it ~1MB of free memory for runtime use, is:

memory/113 + 20
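As a rough illustration of the formula above, and of the default reservation pv_shim_mem() computes in the diff below, here is a standalone C sketch. It assumes that `memory` means the PVH container size in MiB, that the division truncates, and that pages are 4KiB; the function and macro names are made up for this example, and the sketch leaves out the explicit `shim_mem=<size>` and `max:` handling.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4KiB pages, as on x86. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_MIB_TO_PAGES(mib) ((uint64_t)(mib) << (20 - SKETCH_PAGE_SHIFT))

/*
 * Empirical estimate from the commit message: MiB used by the shim,
 * including ~1MB of runtime slack. memory_mib is assumed to be the
 * container size in MiB; integer division truncates.
 */
static uint64_t shim_mib_estimate(uint64_t memory_mib)
{
    return memory_mib / 113 + 20;
}

/*
 * Simplified default path of pv_shim_mem(): reserve what the shim already
 * uses plus 1MB of slack (or min_pages if that is larger), then return only
 * the free part, i.e. the reservation minus the pages already in use.
 */
static uint64_t shim_free_slack_pages(uint64_t total_pages, uint64_t avail,
                                      uint64_t min_pages)
{
    uint64_t used = total_pages - avail;
    uint64_t reserve = used + SKETCH_MIB_TO_PAGES(1);

    if ( min_pages > reserve )
        reserve = min_pages;

    return reserve - used;
}
```

For a 4096MiB container the estimate is 4096/113 + 20 = 36 + 20 = 56MiB. With 1000 pages in total, 800 available and no `min:`, the sketch hands back 256 free pages, i.e. the 1MiB of slack.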

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 docs/misc/xen-command-line.markdown | 13 +++++++------
 xen/arch/x86/dom0_build.c           | 14 +++-----------
 xen/arch/x86/pv/shim.c              | 30 +++++++++++++++++++-----------
 3 files changed, 29 insertions(+), 28 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 9f51710a46..68ec52b5c2 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1461,16 +1461,17 @@ constructed into a plain unprivileged PV domain.
 ### shim\_mem (x86)
 > `= List of ( min:<size> | max:<size> | <size> )`
 
-Set the amount of memory that xen-shim reserves for itself. Only has effect
-if pv-shim mode is enabled.
+Set the amount of memory that xen-shim uses. Only has effect if pv-shim mode is
+enabled. Note that this value accounts for the memory used by the shim itself
+plus the free memory slack given to the shim for runtime allocations.
 
 * `min:<size>` specifies the minimum amount of memory. Ignored if greater
-   than max. Default: 10M.
-* `max:<size>` specifies the maximum amount of memory. Default: 128M.
+   than max.
+* `max:<size>` specifies the maximum amount of memory.
 * `<size>` specifies the exact amount of memory. Overrides both min and max.
 
-By default, 1/16th of total HVM container's memory is reserved for xen-shim
-with minimum amount being 10MB and maximum amount 128MB.
+By default, the amount of free memory slack given to the shim for runtime usage
+is 1MB.
 
 ### rcu-idle-timer-period-ms
 > `= <integer>`
diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index 1b0b89fdeb..30347bcc67 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -290,17 +290,9 @@ unsigned long __init dom0_compute_nr_pages(
          * for things like DMA buffers. This reservation is clamped to a
          * maximum of 128MB.
          */
-        if ( nr_pages == 0 )
-        {
-            uint64_t rsvd = min(avail / 16, 128UL << (20 - PAGE_SHIFT));
-            if ( pv_shim )
-            {
-                rsvd = pv_shim_mem(avail);
-                printk("Reserved %lu pages for xen-shim\n", rsvd);
-
-            }
-            nr_pages = -rsvd;
-        }
+        if ( !nr_pages )
+            nr_pages = -(pv_shim ? pv_shim_mem(avail)
+                                 : min(avail / 16, 128UL << (20 - PAGE_SHIFT)));
 
         /* Negative specification means "all memory - specified amount". */
         if ( (long)nr_pages  < 0 ) nr_pages  += avail;
diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
index eb34467833..120c3b55b0 100644
--- a/xen/arch/x86/pv/shim.c
+++ b/xen/arch/x86/pv/shim.c
@@ -41,9 +41,8 @@ boolean_param("pv-shim", pv_shim);
 #endif
 
 /*
- * By default, 1/16th of total HVM container's memory is reserved for xen-shim
- * with minimum amount being 10MB and maximum amount 128MB. Some users may wish
- * to tune this constants for better memory utilization. This can be achieved
+ * By default give the shim 1MB of free memory slack. Some users may wish to
+ * tune these constants for better memory utilization. This can be achieved
  * using the following xen-shim's command line option:
  *
  * shim_mem=[min:<min_amt>,][max:<max_amt>,][<amt>]
@@ -55,8 +54,8 @@ boolean_param("pv-shim", pv_shim);
  *            (overrides both min and max)
  */
 static uint64_t __initdata shim_nrpages;
-static uint64_t __initdata shim_min_nrpages = 10UL << (20 - PAGE_SHIFT);
-static uint64_t __initdata shim_max_nrpages = 128UL << (20 - PAGE_SHIFT);
+static uint64_t __initdata shim_min_nrpages;
+static uint64_t __initdata shim_max_nrpages;
 
 static int __init parse_shim_mem(const char *s)
 {
@@ -75,15 +74,24 @@ custom_param("shim_mem", parse_shim_mem);
 
 uint64_t pv_shim_mem(uint64_t avail)
 {
-    uint64_t rsvd = min(avail / 16, shim_max_nrpages);
+    if ( !shim_nrpages )
+    {
+        shim_nrpages = max(shim_min_nrpages,
+                           total_pages - avail + (1UL << (20 - PAGE_SHIFT)));
+        if ( shim_max_nrpages )
+            shim_max_nrpages = min(shim_nrpages, shim_max_nrpages);
+    }
+
+    if ( total_pages - avail > shim_nrpages )
+        panic("pages used by shim > shim_nrpages (%#lx > %#lx)",
+              total_pages - avail, shim_nrpages);
 
-    if ( shim_nrpages )
-        return shim_nrpages;
+    shim_nrpages -= total_pages - avail;
 
-    if ( shim_min_nrpages <= shim_max_nrpages )
-        rsvd = max(rsvd, shim_min_nrpages);
+    printk("shim used pages %#lx reserving %#lx free pages\n",
+           total_pages - avail, shim_nrpages);
 
-    return rsvd;
+    return shim_nrpages;
 }
 
 static unsigned int nr_grant_list;
-- 
2.11.0

* [PATCH RFC v1 64/74] xen/pvshim: use default position for the m2p mappings
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (62 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 63/74] xen/shim: modify shim_mem parameter behaviour Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-09 10:50   ` Jan Beulich
  2018-01-04 13:06 ` [PATCH RFC v1 65/74] xen/shim: crash instead of reboot in shim mode Wei Liu
                   ` (11 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

When running a 32-bit kernel as Dom0 on a 64-bit hypervisor, the
hypervisor will try to shrink the hypervisor hole to the minimum
needed, and thus requires Dom0 to use XENMEM_machphys_mapping in
order to fetch the start address of the hypervisor virtual
mappings.

Disable this feature when running as a PV shim, since some DomU
kernels don't implement XENMEM_machphys_mapping and break if the m2p
doesn't begin at the default address.

NB: support for XENMEM_machphys_mapping was added to Linux in
commit 7e7750.
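For reference, the "shrinking" skipped in shim mode is the `(virt_hv_start_low + mask) & ~mask` step visible in the diff context below: the guest-requested start of the hypervisor hole is rounded up to an L2 superpage boundary. The following is a simplified sketch, assuming 2MiB L2 pages (shift 21) and an all-ones value standing in for UNSET_ADDR; all names are hypothetical, and the real code does more with the rounded value afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumptions: 2MiB L2 superpages and an all-ones "unset" marker. */
#define SKETCH_L2_SHIFT    21
#define SKETCH_UNSET_ADDR  UINT64_MAX

/* Round addr up to the next L2 boundary (no-op if already aligned). */
static uint64_t round_up_to_l2(uint64_t addr)
{
    uint64_t mask = (UINT64_C(1) << SKETCH_L2_SHIFT) - 1;

    return (addr + mask) & ~mask;
}

/*
 * Simplified start of the hypervisor hole: a 32-bit kernel that asked for a
 * lower start gets a shrunk hole, unless running as a PV shim, in which case
 * the default start (and hence the default m2p position) is kept.
 */
static uint64_t hv_hole_start(bool pv_shim, bool elf_32bit,
                              uint64_t virt_hv_start_low,
                              uint64_t default_start)
{
    if ( !pv_shim && virt_hv_start_low != SKETCH_UNSET_ADDR && elf_32bit )
        return round_up_to_l2(virt_hv_start_low);

    return default_start;
}
```

For example, a requested start of 0xF5812345 rounds up to 0xF5A00000, while an already aligned 0xF5800000 is left unchanged.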

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/pv/dom0_build.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index e152fe3a9e..f5a793b1d2 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -398,7 +398,8 @@ int __init dom0_construct_pv(struct domain *d,
     if ( parms.pae == XEN_PAE_EXTCR3 )
             set_bit(VMASST_TYPE_pae_extended_cr3, &d->vm_assist);
 
-    if ( (parms.virt_hv_start_low != UNSET_ADDR) && elf_32bit(&elf) )
+    if ( !pv_shim && (parms.virt_hv_start_low != UNSET_ADDR) &&
+         elf_32bit(&elf) )
     {
         unsigned long mask = (1UL << L2_PAGETABLE_SHIFT) - 1;
         value = (parms.virt_hv_start_low + mask) & ~mask;
-- 
2.11.0

* [PATCH RFC v1 65/74] xen/shim: crash instead of reboot in shim mode
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (63 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 64/74] xen/pvshim: use default position for the m2p mappings Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-09 10:52   ` Jan Beulich
  2018-01-04 13:06 ` [PATCH RFC v1 66/74] xen/shim: allow DomU to have as many vcpus as available Wei Liu
                   ` (10 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

All guest shutdown operations are forwarded to L0, so the only native
calls to machine_restart come from crash-related paths inside the
hypervisor. Hence, switch the reboot code to issue a crash shutdown
instead when running in shim mode.
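The decision the patch adds can be summarised in a small sketch. The enum below is a hypothetical stand-in for the SHUTDOWN_reboot/SHUTDOWN_crash reason codes, and the function names are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the SHUTDOWN_reboot/SHUTDOWN_crash codes. */
enum sketch_shutdown_reason { SKETCH_SHUTDOWN_REBOOT, SKETCH_SHUTDOWN_CRASH };

/*
 * Reason machine_restart() asks L0 for when booted on Xen: in shim mode
 * every native restart comes from a crash path, so report a crash.
 */
static enum sketch_shutdown_reason restart_reason(bool pv_shim)
{
    return pv_shim ? SKETCH_SHUTDOWN_CRASH : SKETCH_SHUTDOWN_REBOOT;
}

/* Matching word for the panic() banner "... in five seconds...". */
static const char *panic_action(bool pv_shim)
{
    return pv_shim ? "Crash" : "Reboot";
}
```

In the patch itself the reboot hypercall is still issued after the crash one, so it only acts as a fallback should the crash request return.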

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/shutdown.c    | 7 +++++++
 xen/drivers/char/console.c | 2 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/shutdown.c b/xen/arch/x86/shutdown.c
index 689f6f137d..d2d14ae5f8 100644
--- a/xen/arch/x86/shutdown.c
+++ b/xen/arch/x86/shutdown.c
@@ -642,6 +642,13 @@ void machine_restart(unsigned int delay_millisecs)
             break;
 
         case BOOT_XEN:
+            if ( pv_shim )
+                /*
+                 * When running in PV shim mode guest shutdown calls are
+                 * forwarded to L0, hence the only way to get here is if a
+                 * shim crash happens.
+                 */
+                xen_hypercall_shutdown(SHUTDOWN_crash);
             xen_hypercall_shutdown(SHUTDOWN_reboot);
             break;
         }
diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
index 3c615a255c..5c37dfc3f6 100644
--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -1230,7 +1230,7 @@ void panic(const char *fmt, ...)
     if ( opt_noreboot )
         printk("Manual reset required ('noreboot' specified)\n");
     else
-        printk("Reboot in five seconds...\n");
+        printk("%s in five seconds...\n", pv_shim ? "Crash" : "Reboot");
 
     spin_unlock_irqrestore(&lock, flags);
 
-- 
2.11.0

* [PATCH RFC v1 66/74] xen/shim: allow DomU to have as many vcpus as available
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (64 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 65/74] xen/shim: crash instead of reboot in shim mode Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-09 10:59   ` Jan Beulich
  2018-01-04 13:06 ` [PATCH RFC v1 67/74] libxl: libxl__build_hvm: Introduce separate b_info parameter Wei Liu
                   ` (9 subsequent siblings)
  75 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Roger Pau Monne <roger.pau@citrix.com>

Since the shim's VCPUOP_{up,down} hypercalls are wired to plugging and
unplugging CPUs in the shim itself, start the shim DomU with only the
BSP online, and let the guest bring up other CPUs as it needs them.
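A toy model of the resulting behaviour may help; it is not Xen code. The shim boots with only the BSP online. VCPUOP_up plugs the matching physical CPU and then the vCPU (this ordering is an assumption). VCPUOP_down, like pv_shim_cpu_down() in the previous patch, does not report a failure to unplug the physical CPU. All names are hypothetical, vCPU N is assumed to map to physical CPU N, and at most 64 CPUs are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy state: bit N set means physical CPU N / vCPU N is online. */
static uint64_t pcpus_online, vcpus_online;
static unsigned int nr_cpus;

/* Boot: only the BSP (CPU 0) is brought up, and the guest starts on vCPU0. */
static uint64_t shim_boot(unsigned int cpus)
{
    nr_cpus = cpus > 64 ? 64 : cpus;
    pcpus_online = vcpus_online = 1;
    return pcpus_online;
}

/* VCPUOP_up: plug the backing physical CPU, then bring the vCPU up. */
static int shim_vcpu_up(unsigned int id)
{
    if ( id >= nr_cpus )
        return -1;
    pcpus_online |= UINT64_C(1) << id;
    vcpus_online |= UINT64_C(1) << id;
    return 0;
}

/*
 * VCPUOP_down: the vCPU always goes down; a failure to unplug the physical
 * CPU is swallowed, since most guests are not prepared for VCPUOP_down to
 * fail.
 */
static int shim_vcpu_down(unsigned int id, bool pcpu_unplug_fails)
{
    if ( id >= nr_cpus )
        return -1;
    vcpus_online &= ~(UINT64_C(1) << id);
    if ( !pcpu_unplug_fails )
        pcpus_online &= ~(UINT64_C(1) << id);
    return 0;
}

static bool pcpu_is_online(unsigned int id)
{
    return id < 64 && ((pcpus_online >> id) & 1);
}

static bool vcpu_is_online(unsigned int id)
{
    return id < 64 && ((vcpus_online >> id) & 1);
}
```

The shim therefore ends up running with extra physical CPUs when an unplug fails, which the existing comment in pv_shim_cpu_down() accepts as harmless.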

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/dom0_build.c    | 32 +++++++++++++++++++++++++++++---
 xen/arch/x86/pv/dom0_build.c |  3 ++-
 xen/arch/x86/setup.c         | 28 ++++++++++++++++++----------
 3 files changed, 49 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index 30347bcc67..d54155b1a4 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -138,9 +138,18 @@ struct vcpu *__init dom0_setup_vcpu(struct domain *d,
 
     if ( v )
     {
-        if ( !d->is_pinned && !dom0_affinity_relaxed )
-            cpumask_copy(v->cpu_hard_affinity, &dom0_cpus);
-        cpumask_copy(v->cpu_soft_affinity, &dom0_cpus);
+        if ( pv_shim )
+        {
+
+            cpumask_setall(v->cpu_hard_affinity);
+            cpumask_setall(v->cpu_soft_affinity);
+        }
+        else
+        {
+            if ( !d->is_pinned && !dom0_affinity_relaxed )
+                cpumask_copy(v->cpu_hard_affinity, &dom0_cpus);
+            cpumask_copy(v->cpu_soft_affinity, &dom0_cpus);
+        }
     }
 
     return v;
@@ -153,6 +162,23 @@ unsigned int __init dom0_max_vcpus(void)
     unsigned int i, max_vcpus, limit;
     nodeid_t node;
 
+    if ( pv_shim )
+    {
+        nodes_setall(dom0_nodes);
+
+        /*
+         * When booting in shim mode APs are not started until the guest brings
+         * other vCPUs up.
+         */
+        cpumask_set_cpu(0, &dom0_cpus);
+
+        /*
+         * On PV shim mode allow the guest to have as many CPUs as available.
+         */
+        return nr_cpu_ids;
+    }
+
+
     for ( i = 0; i < dom0_nr_pxms; ++i )
         if ( (node = pxm_to_node(dom0_pxms[i])) != NUMA_NO_NODE )
             node_set(node, dom0_nodes);
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index f5a793b1d2..2bee988f54 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -695,7 +695,8 @@ int __init dom0_construct_pv(struct domain *d,
     for ( i = 0; i < XEN_LEGACY_MAX_VCPUS; i++ )
         shared_info(d, vcpu_info[i].evtchn_upcall_mask) = 1;
 
-    printk("Dom0 has maximum %u VCPUs\n", d->max_vcpus);
+    printk("%s has maximum %u VCPUs\n", pv_shim ? "DomU" : "Dom0",
+           d->max_vcpus);
 
     cpu = v->processor;
     for ( i = 1; i < d->max_vcpus; i++ )
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 34d746395b..aa6703d542 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1580,20 +1580,28 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
     do_presmp_initcalls();
 
-    for_each_present_cpu ( i )
+    if ( !pv_shim )
     {
-        /* Set up cpu_to_node[]. */
-        srat_detect_node(i);
-        /* Set up node_to_cpumask based on cpu_to_node[]. */
-        numa_add_cpu(i);        
-
-        if ( (num_online_cpus() < max_cpus) && !cpu_online(i) )
+        for_each_present_cpu ( i )
         {
-            int ret = cpu_up(i);
-            if ( ret != 0 )
-                printk("Failed to bring up CPU %u (error %d)\n", i, ret);
+            /* Set up cpu_to_node[]. */
+            srat_detect_node(i);
+            /* Set up node_to_cpumask based on cpu_to_node[]. */
+            numa_add_cpu(i);
+
+            if ( (num_online_cpus() < max_cpus) && !cpu_online(i) )
+            {
+                int ret = cpu_up(i);
+                if ( ret != 0 )
+                    printk("Failed to bring up CPU %u (error %d)\n", i, ret);
+            }
         }
     }
+    /*
+     * NB: when running as a PV shim VCPUOP_up/down is wired to the shim
+     * physical cpu_add/remove functions, so launch the guest with only
+     * the BSP online and let it bring up the other CPUs as required.
+     */
 
     printk("Brought up %ld CPUs\n", (long)num_online_cpus());
     smp_cpus_done();
-- 
2.11.0

* [PATCH RFC v1 67/74] libxl: libxl__build_hvm: Introduce separate b_info parameter
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (65 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 66/74] xen/shim: allow DomU to have as many vcpus as available Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-04 13:06 ` [PATCH RFC v1 68/74] libxl__domain_build_info_setdefault_pvhhvm: introduce Wei Liu
                   ` (8 subsequent siblings)
  75 siblings, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Ian Jackson <ian.jackson@eu.citrix.com>

When running pv-in-pvh, we are going to want to pass this function an
exciting config which is a mixture of the user's main domain
configuration, and some PVH configuration which we make up.

To this end, have libxl__build_hvm take, and honour, a separate
parameter for config->b_info.  Because it already has a convenience
alias, the change is trivial.

We add the obvious extra parameter at every call site.

No functional change.
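To make the intent concrete, here is a heavily trimmed sketch of the new calling convention. The build function receives the build_info explicitly, so a caller can pass one it has derived from the user's config. The types and functions are hypothetical stand-ins, not the libxl API, and the return value only mirrors the `device_model` computation in the real function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily trimmed stand-ins for the libxl types. */
enum sketch_domain_type { SKETCH_TYPE_PV, SKETCH_TYPE_PVH, SKETCH_TYPE_HVM };
struct sketch_build_info { enum sketch_domain_type type; uint64_t max_memkb; };
struct sketch_domain_config { struct sketch_build_info b_info; };

/*
 * Stand-in for libxl__build_hvm(): info is a parameter rather than always
 * &d_config->b_info. Returns whether a device model would be needed.
 */
static int sketch_build_hvm(const struct sketch_domain_config *d_config,
                            const struct sketch_build_info *info)
{
    (void)d_config; /* other per-domain settings would still come from here */
    return info->type == SKETCH_TYPE_HVM;
}

/* A "mixture" config: the user's settings, but built as a PVH container. */
static struct sketch_build_info
sketch_make_shim_info(const struct sketch_domain_config *user)
{
    struct sketch_build_info info = user->b_info;

    info.type = SKETCH_TYPE_PVH;
    return info;
}

static int sketch_build_shim(const struct sketch_domain_config *user)
{
    struct sketch_build_info shim_info = sketch_make_shim_info(user);

    return sketch_build_hvm(user, &shim_info);
}

/* Example user config: a 1GiB PV guest. */
static const struct sketch_domain_config example_pv_config = {
    .b_info = { .type = SKETCH_TYPE_PV, .max_memkb = 1048576 },
};
```

Existing call sites simply pass `&d_config->b_info`, which is why the change itself has no functional effect.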

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
 tools/libxl/libxl_create.c   | 4 ++--
 tools/libxl/libxl_dom.c      | 2 +-
 tools/libxl/libxl_internal.h | 1 +
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f15fb215c2..a837a7f5be 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -438,7 +438,7 @@ int libxl__domain_build(libxl__gc *gc,
 
     switch (info->type) {
     case LIBXL_DOMAIN_TYPE_HVM:
-        ret = libxl__build_hvm(gc, domid, d_config, state);
+        ret = libxl__build_hvm(gc, domid, d_config, &d_config->b_info, state);
         if (ret)
             goto out;
 
@@ -499,7 +499,7 @@ int libxl__domain_build(libxl__gc *gc,
 
         break;
     case LIBXL_DOMAIN_TYPE_PVH:
-        ret = libxl__build_hvm(gc, domid, d_config, state);
+        ret = libxl__build_hvm(gc, domid, d_config, &d_config->b_info, state);
         if (ret)
             goto out;
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index f04eec7c79..14a9a09958 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1172,12 +1172,12 @@ out:
 
 int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
               libxl_domain_config *d_config,
+              libxl_domain_build_info *info,
               libxl__domain_build_state *state)
 {
     libxl_ctx *ctx = libxl__gc_owner(gc);
     int rc;
     uint64_t mmio_start, lowmem_end, highmem_end, mem_size;
-    libxl_domain_build_info *const info = &d_config->b_info;
     struct xc_dom_image *dom = NULL;
     bool device_model = info->type == LIBXL_DOMAIN_TYPE_HVM ? true : false;
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index ef1b2e2ca1..15a3c33697 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1160,6 +1160,7 @@ _hidden int libxl__build_pv(libxl__gc *gc, uint32_t domid,
              libxl_domain_build_info *info, libxl__domain_build_state *state);
 _hidden int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
               libxl_domain_config *d_config,
+              libxl_domain_build_info *info,
               libxl__domain_build_state *state);
 
 _hidden int libxl__qemu_traditional_cmd(libxl__gc *gc, uint32_t domid,
-- 
2.11.0

* [PATCH RFC v1 68/74] libxl__domain_build_info_setdefault_pvhhvm: introduce
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (66 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 67/74] libxl: libxl__build_hvm: Introduce separate b_info parameter Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-04 13:06 ` [PATCH RFC v1 69/74] libxl_bitmap_copy_alloc: copy 0, NULL as 0, NULL Wei Liu
                   ` (7 subsequent siblings)
  75 siblings, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Ian Jackson <ian.jackson@eu.citrix.com>

We are going to want to construct parts of a PVH config, specifically
the build_info, internally in libxl. The default-setting code for
fields shared between PVH and HVM will need to be called there, so
split it out into libxl__domain_build_info_setdefault_pvhhvm.
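The shape of the split can be sketched as follows. The types, constants and function names are hypothetical stand-ins, not libxl's, and `SKETCH_TIMER_CHOSEN` stands for whatever default libxl would actually pick. The point is that the shared PVH/HVM defaults become callable on their own, for example on a build_info that libxl constructs internally.

```c
#include <assert.h>

/* Hypothetical, trimmed stand-ins for the libxl types and error code. */
enum sketch_domain_type { SKETCH_TYPE_PV, SKETCH_TYPE_PVH, SKETCH_TYPE_HVM };
#define SKETCH_ERROR_INVAL    (-1)
#define SKETCH_TIMER_DEFAULT  0
#define SKETCH_TIMER_CHOSEN   1

struct sketch_build_info { enum sketch_domain_type type; int timer_mode; };

/* Defaults shared between PVH and HVM; usable on its own. */
static int sketch_setdefault_pvhhvm(struct sketch_build_info *b_info)
{
    if (b_info->type != SKETCH_TYPE_PV &&
        b_info->timer_mode == SKETCH_TIMER_DEFAULT)
        b_info->timer_mode = SKETCH_TIMER_CHOSEN;
    return 0;
}

/* Full default-setting path: validate, then delegate the shared part. */
static int sketch_setdefault(struct sketch_build_info *b_info)
{
    int rc;

    if (b_info->type != SKETCH_TYPE_PV && b_info->type != SKETCH_TYPE_PVH &&
        b_info->type != SKETCH_TYPE_HVM)
        return SKETCH_ERROR_INVAL;

    rc = sketch_setdefault_pvhhvm(b_info);
    if (rc) return rc;

    return 0;
}

static struct sketch_build_info example_pv = { SKETCH_TYPE_PV, SKETCH_TIMER_DEFAULT };
static struct sketch_build_info example_pvh = { SKETCH_TYPE_PVH, SKETCH_TIMER_DEFAULT };
static struct sketch_build_info example_shim = { SKETCH_TYPE_PVH, SKETCH_TIMER_DEFAULT };
```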

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
 tools/libxl/libxl_create.c   | 9 +++++++++
 tools/libxl/libxl_internal.h | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index a837a7f5be..6d910e4a09 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -396,6 +396,15 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
         return ERROR_INVAL;
     }
 
+    rc = libxl__domain_build_info_setdefault_pvhhvm(gc, b_info);
+    if (rc) return rc;
+
+    return 0;
+}
+
+int libxl__domain_build_info_setdefault_pvhhvm(libxl__gc *gc,
+                                        libxl_domain_build_info *b_info)
+{
     /* Configuration fields shared between PVH and HVM. */
     if (b_info->type != LIBXL_DOMAIN_TYPE_PV) {
         if (libxl__timer_mode_is_default(&b_info->timer_mode))
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 15a3c33697..174cf35d97 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1254,6 +1254,8 @@ _hidden int libxl__domain_create_info_setdefault(libxl__gc *gc,
                                         libxl_domain_create_info *c_info);
 _hidden int libxl__domain_build_info_setdefault(libxl__gc *gc,
                                         libxl_domain_build_info *b_info);
+_hidden int libxl__domain_build_info_setdefault_pvhhvm(libxl__gc *gc,
+                                        libxl_domain_build_info *b_info);
 _hidden void libxl__rdm_setdefault(libxl__gc *gc,
                                    libxl_domain_build_info *b_info);
 
-- 
2.11.0

* [PATCH RFC v1 69/74] libxl_bitmap_copy_alloc: copy 0, NULL as 0, NULL
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (67 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 68/74] libxl__domain_build_info_setdefault_pvhhvm: introduce Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-04 13:06 ` [PATCH RFC v1 70/74] libxl: pvshim: Check state->shim_path before domain type Wei Liu
                   ` (6 subsequent siblings)
  75 siblings, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Ian Jackson <ian.jackson@eu.citrix.com>

We shouldn't allocate when doing so is both unnecessary and not in
accordance with the thing we're copying.

One effect is to make a copied libxl__domain_build_info more like the
original, which is going to be helpful for the pv shim mode.
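A standalone sketch of the copy semantics this patch aims for. It uses a hypothetical `struct bitmap` in place of `libxl_bitmap`, and plain `calloc()` in place of `libxl__calloc(NOGC, ...)` (which handles allocation failure internally, so libxl's version has no error return). An empty source (size 0, map NULL) is copied as empty rather than as a fresh allocation:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for libxl_bitmap: `size` bytes of bits at `map`. */
struct bitmap {
    uint32_t size;
    uint8_t *map;
};

/*
 * Copy src into dst. A source with map == NULL produces a destination
 * with map == NULL instead of a new (possibly zero-length) allocation.
 * Returns 0 on success, -1 if allocation fails.
 */
static int bitmap_copy_alloc(struct bitmap *dst, const struct bitmap *src)
{
    dst->size = src->size;
    if (!src->map) {
        dst->map = NULL;
        return 0;
    }
    /* Allocate at least one byte so a NULL result always means failure. */
    dst->map = calloc(src->size ? src->size : 1, sizeof(*src->map));
    if (!dst->map)
        return -1;
    memcpy(dst->map, src->map, src->size * sizeof(*src->map));
    return 0;
}
```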

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
 tools/libxl/libxl_utils.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c
index 507ee56c7c..63b1a2cac1 100644
--- a/tools/libxl/libxl_utils.c
+++ b/tools/libxl/libxl_utils.c
@@ -659,9 +659,13 @@ void libxl_bitmap_copy_alloc(libxl_ctx *ctx,
 {
     GC_INIT(ctx);
 
-    dptr->map = libxl__calloc(NOGC, sptr->size, sizeof(*sptr->map));
     dptr->size = sptr->size;
-    memcpy(dptr->map, sptr->map, sptr->size * sizeof(*sptr->map));
+    if (sptr->map) {
+        dptr->map = libxl__calloc(NOGC, sptr->size, sizeof(*sptr->map));
+        memcpy(dptr->map, sptr->map, sptr->size * sizeof(*sptr->map));
+    } else {
+        dptr->map = NULL;
+    }
 
     GC_FREE;
 }
-- 
2.11.0



* [PATCH RFC v1 70/74] libxl: pvshim: Check state->shim_path before domain type
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (68 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 69/74] libxl_bitmap_copy_alloc: copy 0, NULL as 0, NULL Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-04 13:06 ` [PATCH RFC v1 71/74] libxl: pvshim: Provide first-class config settings to enable shim mode Wei Liu
                   ` (5 subsequent siblings)
  75 siblings, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Ian Jackson <ian.jackson@eu.citrix.com>

This will make it possible to use the shim when the actual
application-requested domain type is PV.

Code elsewhere is responsible for setting state->shim_path non-NULL
iff the shim is required.

With this patch, in the current context, setting LIBXL_PVSHIM_PATH
will affect non-PVH guests now.  So we increase the scope of that
bodge (which we are about to abolish).
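The selection order in the new `xc_dom_allocate()` call can be read as a small function. This is a hypothetical helper (`select_cmdline` and `enum guest_type` are illustrative names, not libxl API) that restates the nested conditional in this patch as if/else:

```c
#include <stddef.h>

/* Hypothetical mirror of the domain types relevant here. */
enum guest_type { GUEST_PV, GUEST_PVH, GUEST_HVM };

/*
 * Precedence after this patch:
 *   1. a shim is configured  -> the shim's command line (any domain type)
 *   2. otherwise, PVH guest  -> the PV command line
 *   3. otherwise             -> the guest's own command line
 */
static const char *select_cmdline(const char *shim_path,
                                  const char *shim_cmdline,
                                  enum guest_type type,
                                  const char *pv_cmdline,
                                  const char *cmdline)
{
    if (shim_path)
        return shim_cmdline;
    if (type == GUEST_PVH)
        return pv_cmdline;
    return cmdline;
}
```

Checking `shim_path` first is what makes the result independent of the domain type, which is the point of the patch.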

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
 tools/libxl/libxl_dom.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 14a9a09958..bf509905a1 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1198,9 +1198,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
      * If PVH and no shim override, use the pv cmdline.
      * If not PVH, use info->cmdline.
      */
-    dom = xc_dom_allocate(ctx->xch, info->type == LIBXL_DOMAIN_TYPE_PVH ?
-                          (state->shim_path ? state->shim_cmdline : state->pv_cmdline) :
-                          info->cmdline, NULL);
+    dom = xc_dom_allocate(ctx->xch,
+              state->shim_path                    ? state->shim_cmdline :
+              info->type == LIBXL_DOMAIN_TYPE_PVH ? state->pv_cmdline   :
+              info->cmdline,
+                          NULL);
     if (!dom) {
         LOGE(ERROR, "xc_dom_allocate failed");
         rc = ERROR_NOMEM;
-- 
2.11.0



* [PATCH RFC v1 71/74] libxl: pvshim: Provide first-class config settings to enable shim mode
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (69 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 70/74] libxl: pvshim: Check state->shim_path before domain type Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-04 13:06 ` [PATCH RFC v1 72/74] libxl: pvshim: Introduce pvhshim_extra Wei Liu
                   ` (4 subsequent siblings)
  75 siblings, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Ian Jackson <ian.jackson@eu.citrix.com>

** NOTE: This patch does not currently work! **

** NOTE: I intend to change the config names from "pvhshim" to "pvshim" **

This is API-compatible because old callers are supposed to call
libxl_*_init to initialise the struct; and the updated function clears
these members.

It is ABI-compatible because the new fields make this member of the
guest type union larger but only within the existing size of that
union.
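The ABI argument relies on a general C rule: a union is as large as its largest member (rounded up to its alignment), so growing a smaller member does not change the union's size. A compile-time sketch with hypothetical struct shapes; the real members come from `libxl_types.idl`, `libxl_defbool` is modelled here as a plain `int`, and the hvm member is assumed to be the largest:

```c
#include <stdint.h>

/* Hypothetical shapes: assume the hvm member is the largest in the union. */
struct hvm_info    { uint64_t words[32]; };  /* 256 bytes */
struct pv_info_old { uint64_t slack_memkb; char *kernel; };
struct pv_info_new {
    uint64_t slack_memkb;
    char *kernel;
    int pvhshim;              /* libxl_defbool modelled as int */
    char *pvhshim_path;
    char *pvhshim_cmdline;
};

union u_old { struct hvm_info hvm; struct pv_info_old pv; };
union u_new { struct hvm_info hvm; struct pv_info_new pv; };

/* The enlarged pv member still fits inside the largest member... */
_Static_assert(sizeof(struct pv_info_new) <= sizeof(struct hvm_info),
               "pv member must fit within the largest member");
/* ...so the union keeps its size. */
_Static_assert(sizeof(union u_old) == sizeof(union u_new),
               "union size must be unchanged");
```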

For now, our config defaults are:
 * shim is disabled
 * if enabled, path is "xen-shim" in the xen firmware directory
 * if enabled, cmdline is the one we are currently debugging with

The debugging arguments will be rationalised in a moment.
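The defaulting rules listed above, as a sketch. `shim_setdefault`, `struct pv_shim_cfg` and the `firmware_dir` argument are hypothetical stand-ins for `libxl__domain_build_info_setdefault()`, the PV build info and `libxl__xenfirmwaredir_path()`; the command-line default shown is the trimmed one introduced in patch 72:

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PVSHIM_BASENAME "xen-shim"
#define PVSHIM_CMDLINE  "pv-shim console=xen,pv sched=null"

/* Hypothetical subset of the PV build info; NULL means "not set by caller". */
struct pv_shim_cfg {
    bool  enabled;
    char *path;
    char *cmdline;
};

/* Heap copy of s (plain C stand-in for libxl__strdup). */
static char *dup_str(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = malloc(len);
    if (p)
        memcpy(p, s, len);
    return p;
}

/* Fill in defaults only when the shim is enabled and a field is unset. */
static int shim_setdefault(struct pv_shim_cfg *cfg, const char *firmware_dir)
{
    if (!cfg->enabled)
        return 0;
    if (!cfg->path) {
        /* sizeof() counts the '/', the basename and the terminating NUL. */
        size_t len = strlen(firmware_dir) + sizeof("/" PVSHIM_BASENAME);
        cfg->path = malloc(len);
        if (!cfg->path)
            return -1;
        snprintf(cfg->path, len, "%s/%s", firmware_dir, PVSHIM_BASENAME);
    }
    if (!cfg->cmdline) {
        cfg->cmdline = dup_str(PVSHIM_CMDLINE);
        if (!cfg->cmdline)
            return -1;
    }
    return 0;
}
```

Caller-supplied values are never overwritten, so an explicit `pvhshim_path` or `pvhshim_cmdline` always wins over the default.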

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
 tools/libxl/libxl.h          |  8 ++++++++
 tools/libxl/libxl_create.c   | 48 +++++++++++++++++++++++++++++++++++++++++---
 tools/libxl/libxl_dom.c      | 10 ---------
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_types.idl  |  3 +++
 5 files changed, 58 insertions(+), 13 deletions(-)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 5e9aed739d..81dfcc80ad 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1101,6 +1101,14 @@ void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, const libxl_mac *src);
  */
 #define LIBXL_HAVE_SET_PARAMETERS 1
 
+/*
+ * LIBXL_HAVE_PV_SHIM
+ *
+ * If this is defined, libxl_domain_build_info's pv type information
+ * contains members pvhshim, pvhshim_path, pvhshim_cmdline.
+ */
+#define LIBXL_HAVE_PV_SHIM 1
+
 typedef char **libxl_string_list;
 void libxl_string_list_dispose(libxl_string_list *sl);
 int libxl_string_list_length(const libxl_string_list *sl);
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 6d910e4a09..cd98522b9b 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -369,6 +369,18 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
         if (b_info->u.pv.slack_memkb == LIBXL_MEMKB_DEFAULT)
             b_info->u.pv.slack_memkb = 0;
 
+        libxl_defbool_setdefault(&b_info->u.pv.pvhshim, false);
+        if (libxl_defbool_val(b_info->u.pv.pvhshim)) {
+            if (!b_info->u.pv.pvhshim_path)
+                b_info->u.pv.pvhshim_path =
+                    libxl__sprintf(NOGC, "%s/%s",
+                                   libxl__xenfirmwaredir_path(),
+                                   PVSHIM_BASENAME);
+            if (!b_info->u.pv.pvhshim_cmdline)
+                b_info->u.pv.pvhshim_cmdline =
+                    libxl__strdup(NOGC, PVSHIM_CMDLINE);
+        }
+
         /* For compatibility, fill in b_info->kernel|ramdisk|cmdline
          * with the value in u.pv, later processing will use
          * b_info->kernel|ramdisk|cmdline only.
@@ -438,6 +450,9 @@ int libxl__domain_build(libxl__gc *gc,
     char **vments = NULL, **localents = NULL;
     struct timeval start_time;
     int i, ret;
+    libxl_domain_build_info shim_info;
+
+    libxl_domain_build_info_init(&shim_info);
 
     ret = libxl__build_pre(gc, domid, d_config, state);
     if (ret)
@@ -485,9 +500,35 @@ int libxl__domain_build(libxl__gc *gc,
 
         break;
     case LIBXL_DOMAIN_TYPE_PV:
-        ret = libxl__build_pv(gc, domid, info, state);
-        if (ret)
-            goto out;
+        if (libxl_defbool_val(info->u.pv.pvhshim)) {
+            /*
+             * The next bit seems like it might be thread-unsafe, but
+             * libxl_domain_create can already modify this struct so a
+             * config cannot be passed to libxl on different threads
+             * concurrently.  So we can set this to INVALID, as part
+             * of making a copy with a different type.
+             */
+            libxl_domain_type shim_saved_type = info->type;
+            info->type = LIBXL_DOMAIN_TYPE_INVALID;
+            libxl_domain_build_info_copy(CTX, &shim_info, info);
+            info->type = shim_saved_type;
+
+            libxl_domain_build_info_init_type(&shim_info,
+                                              LIBXL_DOMAIN_TYPE_PVH);
+            ret = libxl__domain_build_info_setdefault_pvhhvm(gc, &shim_info);
+            if (ret) goto out;
+
+            state->shim_path = info->u.pv.pvhshim_path;
+            state->shim_cmdline = info->u.pv.pvhshim_cmdline;
+            ret = libxl__build_hvm(gc, domid,
+                                   d_config, &shim_info,
+                                   state);
+            if (ret) goto out;
+        } else {
+            ret = libxl__build_pv(gc, domid, info, state);
+            if (ret)
+                goto out;
+        }
 
         vments = libxl__calloc(gc, 11, sizeof(char *));
         i = 0;
@@ -525,6 +566,7 @@ int libxl__domain_build(libxl__gc *gc,
     }
     ret = libxl__build_post(gc, domid, info, state, vments, localents);
 out:
+    libxl_domain_build_info_dispose(&shim_info);
     return ret;
 }
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index bf509905a1..3b6c457ec0 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1183,16 +1183,6 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 
     xc_dom_loginit(ctx->xch);
 
-    /* FIXME */
-#define LIBXL_PVSHIM_PATH "LIBXL_PVSHIM_PATH"
-#define LIBXL_PVSHIM_CMDLINE "LIBXL_PVSHIM_CMDLINE"
-    state->shim_path = getenv(LIBXL_PVSHIM_PATH);
-    if (state->shim_path) {
-        state->shim_cmdline = getenv(LIBXL_PVSHIM_CMDLINE);
-        LOG(WARN, "LIBXL_PVSHIM_PATH detected, using pv shim %s cmd %s",
-            state->shim_path, state->shim_cmdline);
-    }
-
     /* 
      * If PVH and we have a shim override, use the shim cmdline.
      * If PVH and no shim override, use the pv cmdline.
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 174cf35d97..2897e7c3bb 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -118,6 +118,8 @@
 #define TAP_DEVICE_SUFFIX "-emu"
 #define DOMID_XS_PATH "domid"
 #define INVALID_DOMID ~0
+#define PVSHIM_BASENAME "xen-shim"
+#define PVSHIM_CMDLINE "pv-shim console=xen,pv sched=null loglvl=all guest_loglvl=all apic_verbosity=debug e820-verbose"
 
 /* Size macros. */
 #define __AC(X,Y)   (X##Y)
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index a239324341..a6ebea0178 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -591,6 +591,9 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                       ("features", string, {'const': True}),
                                       # Use host's E820 for PCI passthrough.
                                       ("e820_host", libxl_defbool),
+                                      ("pvhshim", libxl_defbool),
+                                      ("pvhshim_path", string),
+                                      ("pvhshim_cmdline", string),
                                       ])),
                  ("pvh", None),
                  ("invalid", None),
-- 
2.11.0



* [PATCH RFC v1 72/74] libxl: pvshim: Introduce pvhshim_extra
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (70 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 71/74] libxl: pvshim: Provide first-class config settings to enable shim mode Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-04 13:06 ` [PATCH RFC v1 73/74] xl: pvshim: Provide and document xl config Wei Liu
                   ` (3 subsequent siblings)
  75 siblings, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Ian Jackson <ian.jackson@eu.citrix.com>

And move the debugging options from the default config into a doc
comment in libxl_types.idl.
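How `pvhshim_extra` is combined with `pvhshim_cmdline`, restated as a hypothetical standalone helper. `join_cmdline` is not a libxl function; libxl uses `GCSPRINTF` with gc-managed memory, while this sketch uses `malloc()` and leaves freeing to the caller:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Same result as GCSPRINTF("%s%s%s", cmdline, extra ? " " : "",
 * extra ? extra : ""): base, then a space and extra if extra is set.
 * Returns a malloc()ed string (caller frees) or NULL on failure.
 */
static char *join_cmdline(const char *base, const char *extra)
{
    const char *sep = extra ? " " : "";
    const char *tail = extra ? extra : "";
    size_t len = strlen(base) + strlen(sep) + strlen(tail);
    char *out = malloc(len + 1);
    if (out)
        snprintf(out, len + 1, "%s%s%s", base, sep, tail);
    return out;
}
```

An empty but non-NULL `pvhshim_extra` still adds a trailing space, exactly as the `GCSPRINTF` form does.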

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
 tools/libxl/libxl.h          | 2 +-
 tools/libxl/libxl_create.c   | 5 ++++-
 tools/libxl/libxl_internal.h | 2 +-
 tools/libxl/libxl_types.idl  | 1 +
 4 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 81dfcc80ad..7e40155079 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1105,7 +1105,7 @@ void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, const libxl_mac *src);
  * LIBXL_HAVE_PV_SHIM
  *
  * If this is defined, libxl_domain_build_info's pv type information
- * contains members pvhshim, pvhshim_path, pvhshim_cmdline.
+ * contains members pvhshim, pvhshim_path, pvhshim_cmdline, pvhshim_extra.
  */
 #define LIBXL_HAVE_PV_SHIM 1
 
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index cd98522b9b..94b2c90c5e 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -519,7 +519,10 @@ int libxl__domain_build(libxl__gc *gc,
             if (ret) goto out;
 
             state->shim_path = info->u.pv.pvhshim_path;
-            state->shim_cmdline = info->u.pv.pvhshim_cmdline;
+            state->shim_cmdline = GCSPRINTF("%s%s%s",
+                                            info->u.pv.pvhshim_cmdline,
+                                            info->u.pv.pvhshim_extra ? " " : "",
+                                            info->u.pv.pvhshim_extra ? info->u.pv.pvhshim_extra : "");
             ret = libxl__build_hvm(gc, domid,
                                    d_config, &shim_info,
                                    state);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 2897e7c3bb..040d9c6b30 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -119,7 +119,7 @@
 #define DOMID_XS_PATH "domid"
 #define INVALID_DOMID ~0
 #define PVSHIM_BASENAME "xen-shim"
-#define PVSHIM_CMDLINE "pv-shim console=xen,pv sched=null loglvl=all guest_loglvl=all apic_verbosity=debug e820-verbose"
+#define PVSHIM_CMDLINE "pv-shim console=xen,pv sched=null"
 
 /* Size macros. */
 #define __AC(X,Y)   (X##Y)
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index a6ebea0178..7ff807acf3 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -594,6 +594,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                       ("pvhshim", libxl_defbool),
                                       ("pvhshim_path", string),
                                       ("pvhshim_cmdline", string),
+                                      ("pvhshim_extra", string), # eg "loglvl=all guest_loglvl=all apic_verbosity=debug e820-verbose"
                                       ])),
                  ("pvh", None),
                  ("invalid", None),
-- 
2.11.0



* [PATCH RFC v1 73/74] xl: pvshim: Provide and document xl config
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (71 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 72/74] libxl: pvshim: Introduce pvhshim_extra Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-04 13:06 ` [PATCH RFC v1 74/74] libxl: pvshim: Set video_memkb to ~0 Wei Liu
                   ` (2 subsequent siblings)
  75 siblings, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Ian Jackson <ian.jackson@eu.citrix.com>

** NOTE: I intend to change the config names from "pvhshim" to "pvshim" **

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
 docs/man/xl.cfg.pod.5.in | 28 ++++++++++++++++++++++++++++
 tools/xl/xl_parse.c      | 11 +++++++++++
 2 files changed, 39 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index b7b91d8627..e9f29c2424 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -470,6 +470,34 @@ version of pvgrub can be selected.
 Note that xl expects to find the pvgrub32.bin and pvgrub64.bin binaries in
 F<@XENFIRMWAREDIR@>.
 
+=item B<pvhshim=BOOLEAN>
+
+Whether to boot this guest as a PV guest within a PVH container
+(i.e., using processor hardware extensions to
+separate its address space).
+
+Default is false.
+
+=item B<pvhshim_path="PATH">
+
+The PVH shim is a specially-built firmware-like executable
+constructed from the hypervisor source tree.
+Use this option to specify a non-default shim.
+Ignored if pvhshim is false.
+
+=item B<pvhshim_cmdline="STRING">
+
+Command line for the shim.
+Default is "pv-shim console=xen,pv sched=null".
+Ignored if pvhshim is false.
+
+=item B<pvhshim_extra="STRING">
+
+Extra command line arguments for the shim.
+If supplied, it is appended to pvhshim_cmdline, separated by a space.
+Default is empty.
+Ignored if pvhshim is false.
+
 =back
 
 =head4 HVM guest options
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 9a692d5ae6..bdd3ad8127 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -1402,6 +1402,17 @@ void parse_config_data(const char *config_source,
             exit(1);
         }
 
+        xlu_cfg_get_defbool(config, "pvhshim", &b_info->u.pv.pvhshim, 0);
+        if (!xlu_cfg_get_string(config, "pvhshim_path", &buf, 0))
+            xlu_cfg_replace_string(config, "pvhshim_path",
+                                   &b_info->u.pv.pvhshim_path, 0);
+        if (!xlu_cfg_get_string(config, "pvhshim_cmdline", &buf, 0))
+            xlu_cfg_replace_string(config, "pvhshim_cmdline",
+                                   &b_info->u.pv.pvhshim_cmdline, 0);
+        if (!xlu_cfg_get_string(config, "pvhshim_extra", &buf, 0))
+            xlu_cfg_replace_string(config, "pvhshim_extra",
+                                   &b_info->u.pv.pvhshim_extra, 0);
+
         break;
     }
     default:
-- 
2.11.0



* [PATCH RFC v1 74/74] libxl: pvshim: Set video_memkb to ~0
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (72 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 73/74] xl: pvshim: Provide and document xl config Wei Liu
@ 2018-01-04 13:06 ` Wei Liu
  2018-01-08 16:12 ` [PATCH RFC v1 00/74] Run PV guest in PVH container Ian Jackson
  2018-01-10 16:26 ` George Dunlap
  75 siblings, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-04 13:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: wei.liu2

From: Ian Jackson <ian.jackson@eu.citrix.com>

This is how ordinary pvh guests get created right now.  This is
probably a bug.  But I want this pv shim mode to build the guest like
pvh does.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
 tools/libxl/libxl_create.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 94b2c90c5e..a1f9c9957b 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -517,6 +517,7 @@ int libxl__domain_build(libxl__gc *gc,
                                               LIBXL_DOMAIN_TYPE_PVH);
             ret = libxl__domain_build_info_setdefault_pvhhvm(gc, &shim_info);
             if (ret) goto out;
+            shim_info.video_memkb = ~(uint64_t)0; /* bodge ?! */
 
             state->shim_path = info->u.pv.pvhshim_path;
             state->shim_cmdline = GCSPRINTF("%s%s%s",
-- 
2.11.0



* Re: [PATCH RFC v1 01/74] x86/svm: Offer CPUID Faulting to AMD HVM guests as well
  2018-01-04 13:05 ` [PATCH RFC v1 01/74] x86/svm: Offer CPUID Faulting to AMD HVM guests as well Wei Liu
@ 2018-01-04 14:00   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-04 14:00 UTC (permalink / raw)
  To: Andrew Cooper, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> --- a/xen/arch/x86/msr.c
> +++ b/xen/arch/x86/msr.c
> @@ -39,7 +39,8 @@ static void __init calculate_hvm_max_policy(void)
>          return;
>  
>      /* 0x000000ce  MSR_INTEL_PLATFORM_INFO */
> -    if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
> +    if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
> +         boot_cpu_data.x86_vendor == X86_VENDOR_AMD )

Why not drop the conditional instead? Either way
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan



* Re: [PATCH RFC v1 02/74] x86: Common cpuid faulting support
  2018-01-04 13:05 ` [PATCH RFC v1 02/74] x86: Common cpuid faulting support Wei Liu
@ 2018-01-04 14:19   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-04 14:19 UTC (permalink / raw)
  To: Andrew Cooper, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> @@ -518,7 +522,7 @@ static void early_init_amd(struct cpuinfo_x86 *c)
>  	if (c == &boot_cpu_data)
>  		amd_init_levelling();
>  
> -	amd_ctxt_switch_levelling(NULL);
> +	ctxt_switch_levelling(NULL);
>  }

I don't really understand this change: Why don't you call
amd_ctxt_switch_masking() instead? ctxt_switch_levelling(NULL)
doesn't do anything else (assuming the call here is wrapped by a
"if (!cpu_has_cpuid_faulting)"). Same for the Intel variant then.

> +static void set_cpuid_faulting(bool enable)
> +{
> +	uint64_t *this_misc_features = &this_cpu(msr_misc_features);
> +	uint64_t val = *this_misc_features;
> +
> +	if (!!(val & MSR_MISC_FEATURES_CPUID_FAULTING) == enable)
> +		return;
> +
> +	val ^= MSR_MISC_FEATURES_CPUID_FAULTING;
> +
> +	wrmsrl(MSR_INTEL_MISC_FEATURES_ENABLES, val);
> +	*this_misc_features = val;

If you maintain a cache for the full MSR, then I think you'd better
create wrappers to do reads and writes, just like we have for
EFER.
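One possible shape for such wrappers, sketched in userspace: `hw_wrmsr()` simulates `wrmsrl(MSR_INTEL_MISC_FEATURES_ENABLES, ...)`, and a static variable stands in for the per-CPU `msr_misc_features` cache. All names are hypothetical, and bit 0 as the CPUID-faulting bit is an assumption of this sketch:

```c
#include <stdbool.h>
#include <stdint.h>

#define MISC_FEATURES_CPUID_FAULTING (1ULL << 0)   /* assumed bit position */

/* Simulated hardware register and a counter of (slow) MSR writes. */
static uint64_t hw_msr;
static unsigned int hw_writes;
/* Stand-in for this_cpu(msr_misc_features). */
static uint64_t misc_features_cache;

static void hw_wrmsr(uint64_t val) { hw_msr = val; hw_writes++; }

/* All accesses go through the cache; only real changes reach the MSR. */
static uint64_t read_misc_features(void) { return misc_features_cache; }

static void write_misc_features(uint64_t val)
{
    if (val == misc_features_cache)
        return;
    hw_wrmsr(val);
    misc_features_cache = val;
}

static void set_cpuid_faulting(bool enable)
{
    uint64_t val = read_misc_features();

    if (enable)
        val |= MISC_FEATURES_CPUID_FAULTING;
    else
        val &= ~MISC_FEATURES_CPUID_FAULTING;
    write_misc_features(val);
}
```

With the cache hidden behind the two wrappers, `set_cpuid_faulting()` no longer needs its own early-return and XOR logic, and no caller can update the MSR while leaving the cache stale.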

Jan



* Re: [PATCH RFC v1 06/74] tools/libelf: fix elf notes check for PVH guest
  2018-01-04 13:05 ` [PATCH RFC v1 06/74] tools/libelf: fix elf notes check for PVH guest Wei Liu
@ 2018-01-04 14:37   ` Jan Beulich
  2018-01-08 15:34     ` Wei Liu
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-04 14:37 UTC (permalink / raw)
  To: wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> PVH only requires PHYS32_ENTRY to be set. Return immediately if that's
> the case.

So I guess the bug(?) being fixed is that so far loader or guest_os,
and xen_ver settings are also required. However, you fail to mention
_why_ you think they're not required. I can sort of see this for
loader and maybe guest_os, but for the Xen version this isn't as
obvious, mainly because any arguments I can think of right now
would equally apply to PV.

> --- a/xen/common/libelf/libelf-dominfo.c
> +++ b/xen/common/libelf/libelf-dominfo.c
> @@ -381,6 +381,13 @@ static elf_errorstatus elf_xen_note_check(struct elf_binary *elf,
>           return 0;
>      }
>  
> +    /* PVH only requires one ELF note to be set */
> +    if ( parms->phys_entry != UNSET_ADDR32 )
> +    {
> +        elf_msg(elf, "ELF: Found PVH image\n");
> +        return 0;
> +    }

If the other entries are of no interest for PVH, I think that this
then calls for dropping their logging from pvh_load_kernel().
I'm also surprised that I can't find any use of any of the three
values checked in libxc.

Jan



* Re: [PATCH RFC v1 10/74] x86/time: Print a more helpful error when a platform timer can't be found
  2018-01-04 13:05 ` [PATCH RFC v1 10/74] x86/time: Print a more helpful error when a platform timer can't be found Wei Liu
@ 2018-01-05 10:37   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-05 10:37 UTC (permalink / raw)
  To: Andrew Cooper, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> From: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Wei Liu <wei.liu2@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>




* Re: [PATCH RFC v1 11/74] x86/link: Introduce and use SECTION_ALIGN
  2018-01-04 13:05 ` [PATCH RFC v1 11/74] x86/link: Introduce and use SECTION_ALIGN Wei Liu
@ 2018-01-05 10:38   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-05 10:38 UTC (permalink / raw)
  To: Andrew Cooper, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> From: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> ... to reduce the quantity of #ifdef EFI.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Wei Liu <wei.liu2@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>




* Re: [PATCH RFC v1 12/74] xen/acpi: mark the PM timer FADT field as optional
  2018-01-04 13:05 ` [PATCH RFC v1 12/74] xen/acpi: mark the PM timer FADT field as optional Wei Liu
@ 2018-01-05 10:52   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-05 10:52 UTC (permalink / raw)
  To: Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> From: Roger Pau Monne <roger.pau@citrix.com>
> 
> According to the ACPI 6.1 specification this field is optional, so
> mark it as such.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

This would probably better be a direct port of Linux commit
1d82980c99 (obviously just the tbfadt.c parts of it); perhaps
the other comment in acpi_tb_validate_fadt() would also be
worth updating.

Jan


* Re: [PATCH RFC v1 13/74] xen/domctl: Return arch_config via getdomaininfo
  2018-01-04 13:05 ` [PATCH RFC v1 13/74] xen/domctl: Return arch_config via getdomaininfo Wei Liu
@ 2018-01-05 10:58   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-05 10:58 UTC (permalink / raw)
  To: Andrew Cooper, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -116,6 +116,7 @@ struct xen_domctl_getdomaininfo {
>      uint32_t ssidref;
>      xen_domain_handle_t handle;
>      uint32_t cpupool;
> +    struct xen_arch_domainconfig arch_config;
>  };

Such an addition requires the interface version to be bumped.
As I assume we will want to backport this to 4.10, we should
make sure this (and perhaps others in this series, but none
outside) is the only domctl interface change for this version,
i.e. for any others until 4.11 goes out we'd need to remember
to bump it a second time then.

With that
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan
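Why an added field needs a version bump: callers tag each domctl with the interface version they were compiled against, and the hypervisor refuses mismatches rather than misinterpreting a differently laid-out buffer. A minimal sketch with hypothetical names and values (`DOMCTL_INTERFACE_VERSION`, `struct domctl`, `check_domctl`); the real constant is `XEN_DOMCTL_INTERFACE_VERSION`, and the error code here is illustrative:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical value; bumped whenever any domctl structure changes layout. */
#define DOMCTL_INTERFACE_VERSION 0x0000000eu

/* Hypothetical request header: every call carries the caller's version. */
struct domctl {
    uint32_t cmd;
    uint32_t interface_version;
};

/* Refuse callers built against a different layout. */
static int check_domctl(const struct domctl *op)
{
    if (op->interface_version != DOMCTL_INTERFACE_VERSION)
        return -EACCES;   /* illustrative error code */
    return 0;
}
```

Because the check is an exact match, a value bumped for a 4.10 backport cannot also cover further 4.11 changes, which is why a second bump would be needed.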



* Re: [PATCH RFC v1 16/74] x86/fixmap: Modify fix_to_virt() to return a void pointer
  2018-01-04 13:05 ` [PATCH RFC v1 16/74] x86/fixmap: Modify fix_to_virt() to return a void pointer Wei Liu
@ 2018-01-05 11:05   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-05 11:05 UTC (permalink / raw)
  To: Andrew Cooper, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> --- a/xen/drivers/acpi/apei/apei-io.c
> +++ b/xen/drivers/acpi/apei/apei-io.c
> @@ -92,7 +92,7 @@ static void __iomem *__init apei_range_map(paddr_t paddr, unsigned long size)
>  		apei_range_nr++;
>  	}
>  
> -	return (void __iomem *)fix_to_virt(FIX_APEI_RANGE_BASE + start_nr);
> +	return fix_to_virt(FIX_APEI_RANGE_BASE + start_nr);
>  }

Granted we probably don't use "__iomem" consistently, and we may
hence well want to consider dropping it altogether. But without that
being called out in the description, I don't think it should be dropped
here and further down.

Another option would be to introduce something like fix_to_io_virt(),
with that annotation included in the cast.
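A sketch of the `fix_to_io_virt()` idea: keep the generic helper returning `void *`, and add a variant that applies the `__iomem` annotation in its cast. `FIXADDR_TOP`, the page shift and the index-to-address formula are illustrative assumptions, and `__iomem` expands to nothing, as it does outside of sparse:

```c
#include <stdint.h>

#define __iomem                      /* sparse-only annotation; empty for the compiler */
#define PAGE_SHIFT   12
#define FIXADDR_TOP  ((uintptr_t)0x7ff00000u)   /* hypothetical */

/* Hypothetical index-to-address mapping: slots grow downwards from FIXADDR_TOP. */
#define __fix_to_virt(x) (FIXADDR_TOP - ((uintptr_t)(x) << PAGE_SHIFT))

/* Generic helper returning a plain pointer, as the patch proposes. */
static inline void *fix_to_virt(unsigned int idx)
{
    return (void *)__fix_to_virt(idx);
}

/* I/O variant: callers keep the __iomem annotation without open-coding the cast. */
#define fix_to_io_virt(idx) ((void __iomem *)fix_to_virt(idx))
```

With this, `apei_range_map()` could `return fix_to_io_virt(FIX_APEI_RANGE_BASE + start_nr);` and keep its annotated return type.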

> --- a/xen/include/asm-x86/apicdef.h
> +++ b/xen/include/asm-x86/apicdef.h
> @@ -119,7 +119,7 @@
>  /* Only available in x2APIC mode */
>  #define		APIC_SELF_IPI	0x3F0
>  
> -#define APIC_BASE (fix_to_virt(FIX_APIC_BASE))
> +#define APIC_BASE (__fix_to_virt(FIX_APIC_BASE))

Please take the opportunity to get rid of the unnecessary
parentheses.

Jan



* Re: [PATCH RFC v1 17/74] ---- x86/Kconfig: Options for Xen and PVH support
  2018-01-04 13:05 ` [PATCH RFC v1 17/74] ---- x86/Kconfig: Options for Xen and PVH support Wei Liu
@ 2018-01-05 11:11   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-05 11:11 UTC (permalink / raw)
  To: Andrew Cooper, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:

Please drop the stray ---- from the subject.

> From: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

No description (rationale) at all? But perhaps that's to be attributed
to the RFC nature of the series.

> --- a/xen/arch/x86/Kconfig
> +++ b/xen/arch/x86/Kconfig
> @@ -117,6 +117,23 @@ config TBOOT
>  	  Technology (TXT)
>  
>  	  If unsure, say Y.
> +
> +config XEN_GUEST
> +	def_bool n
> +	prompt "Xen Guest"
> +	---help---
> +	  Support for Xen detecting when it is running under Xen.
> +
> +	  If unsure, say N.
> +
> +config PVH_GUEST
> +	def_bool n
> +	prompt "PVH Guest"
> +	depends on XEN_GUEST
> +	---help---
> +	  Support booting using the PVH ABI.
> +
> +	  If unsure, say N.

The names of the options are ambiguous, yet I can't really think of
nice alternatives. Maybe XEN_AS_GUEST and XEN_AS_PVH_GUEST
or GUEST_OF_XEN and PVH_GUEST_OF_XEN? Same goes for the
prompts and PVH_GUEST's help text.

Jan



* Re: [PATCH RFC v1 18/74] x86/link: Relocate program headers
  2018-01-04 13:05 ` [PATCH RFC v1 18/74] x86/link: Relocate program headers Wei Liu
@ 2018-01-05 11:20   ` Jan Beulich
  2018-01-08 15:43     ` Wei Liu
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-05 11:20 UTC (permalink / raw)
  To: Andrew Cooper, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> From: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> When the xen binary is loaded by libelf (in the future) we rely on the
> elf loader to load the binary accordingly.

It would really help if it was said here what effect this has on the
program headers - I can only guess that it'll make p_vaddr different
from p_paddr. I'm also rather uncertain about the entry point
change wrt various (and especially older) boot loaders.
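To check the guess about p_vaddr and p_paddr on a built image, `readelf -l` prints both the VirtAddr and PhysAddr columns for every program header. The same check can be sketched in hosted C against <elf.h>. The helper name and the sample addresses used with it are hypothetical, not taken from the patch.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Sanity check that <elf.h> has the expected 64-bit program header. */
static_assert(sizeof(Elf64_Phdr) == 56, "unexpected Elf64_Phdr size");

/*
 * Count PT_LOAD segments whose link-time virtual address differs from
 * the physical address a loader honouring p_paddr would place them at.
 */
static size_t count_relocated_loads(const Elf64_Phdr *ph, size_t n)
{
    size_t count = 0;

    for ( size_t i = 0; i < n; i++ )
        if ( ph[i].p_type == PT_LOAD && ph[i].p_vaddr != ph[i].p_paddr )
            count++;

    return count;
}
```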

Jan


* Re: [PATCH RFC v1 19/74] x86: introduce ELFNOTE macro
  2018-01-04 13:05 ` [PATCH RFC v1 19/74] x86: introduce ELFNOTE macro Wei Liu
@ 2018-01-05 11:27   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-05 11:27 UTC (permalink / raw)
  To: wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> It is needed later for introducing PVH entry point.

Perhaps worth moving the addition there, rather than introducing
dead code here?

> --- a/xen/include/asm-x86/asm_defns.h
> +++ b/xen/include/asm-x86/asm_defns.h
> @@ -409,4 +409,16 @@ static always_inline void stac(void)
>  #define REX64_PREFIX "rex64/"
>  #endif
>  
> +#define ELFNOTE(name, type, desc)           \
> +    .pushsection .note.name               ; \

Please also specify section attributes and type.

> +    .align 4                              ; \

I think we should try to avoid the ambiguous .align, and instead
use .balign or .p2align in new code.

> +    .long 2f - 1f       /* namesz */      ; \
> +    .long 4f - 3f       /* descsz */      ; \
> +    .long type          /* type   */      ; \
> +1:.asciz #name          /* name   */      ; \
> +2:.align 4                                ; \
> +3:desc                  /* desc   */      ; \
> +4:.align 4                                ; \

I'd prefer if you used .L-prefixed labels in new macros, to eliminate
the risk of references around the macro use sites becoming broken.
And if you really meant to stick with numeric labels, please add two
padding blanks after each of them, to align the directives.

Considering this is meant to be used by assembly code only, perhaps
it would be better to make this an assembler macro rather than a C
one (eliminating the need for all the "; \")?
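For reference, the layout ELFNOTE() has to emit (a 12-byte header, the NUL-terminated name padded to 4 bytes, then the descriptor padded to 4 bytes) can be written down in C as below. This is only a sketch: it assumes GCC or Clang on an ELF target, takes XEN_ELFNOTE_PHYS32_ENTRY to be 18 as in public/elfnote.h, and uses a made-up entry address. In an assembler version, the section attributes and type asked for above would read `.pushsection .note.Xen, "a", @note`.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>

/* Assumed to match XEN_ELFNOTE_PHYS32_ENTRY in public/elfnote.h. */
#define XEN_ELFNOTE_PHYS32_ENTRY 18

/*
 * Header, name, descriptor.  "Xen" plus its NUL is exactly 4 bytes,
 * so neither the name nor the 4-byte descriptor needs extra padding.
 */
struct xen_phys32_note {
    Elf32_Nhdr hdr;
    char name[4];
    uint32_t desc;
};

static const struct xen_phys32_note phys32_note
    __attribute__((section(".note.Xen"), aligned(4), used)) = {
    .hdr = {
        .n_namesz = sizeof("Xen"),          /* 4, including the NUL */
        .n_descsz = sizeof(uint32_t),
        .n_type   = XEN_ELFNOTE_PHYS32_ENTRY,
    },
    .name = "Xen",
    .desc = 0x100000,                       /* hypothetical entry point */
};

static_assert(sizeof(struct xen_phys32_note) == 20, "12 + 4 + 4 bytes");
```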

Jan


* Re: [PATCH RFC v1 20/74] x86: produce a binary that can be booted as PVH
  2018-01-04 13:05 ` [PATCH RFC v1 20/74] x86: produce a binary that can be booted as PVH Wei Liu
@ 2018-01-05 11:39   ` Jan Beulich
  2018-01-08 15:59     ` Wei Liu
  2018-01-10 19:10     ` Wei Liu
  0 siblings, 2 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-05 11:39 UTC (permalink / raw)
  To: Andrew Cooper, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Again, I assume a description is still intended to be written.

> --- a/xen/arch/x86/Makefile
> +++ b/xen/arch/x86/Makefile
> @@ -75,6 +75,8 @@ efi-y := $(shell if [ ! -r 
> $(BASEDIR)/include/xen/compile.h -o \
>                        -O $(BASEDIR)/include/xen/compile.h ]; then \
>                           echo '$(TARGET).efi'; fi)
>  
> +shim-$(CONFIG_PVH_GUEST) := $(TARGET)-shim
> +
>  ifneq ($(build_id_linker),)
>  notes_phdrs = --notes
>  else
> @@ -93,7 +95,7 @@ endif
>  syms-warn-dup-y := --warn-dup
>  syms-warn-dup-$(CONFIG_SUPPRESS_DUPLICATE_SYMBOL_WARNINGS) :=
>  
> -$(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32
> +$(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32 $(shim-y)
>  	./boot/mkelf32 $(notes_phdrs) $(TARGET)-syms $(TARGET) $(XEN_IMG_OFFSET) \
>  	               `$(NM) $(TARGET)-syms | sed -ne 's/^\([^ ]*\) . __2M_rwdata_end$$/0x\1/p'`

Hmm, so you mean to build shim and "normal" Xen at the same time,
with all the same objects? That's rather unexpected following the
earlier exchange Andrew and I had. I would expect the shim to not
require quite a few bits and pieces, and hence wanting to be built
independently.

> @@ -144,6 +146,11 @@ $(TARGET)-syms: prelink.o xen.lds $(BASEDIR)/common/symbols-dummy.o
>  		>$(@D)/$(@F).map
>  	rm -f $(@D)/.$(@F).[0-9]*
>  
> +# Use elf32-x86-64 if toolchain support exists, elf32-i386 otherwise.
> +$(TARGET)-shim: FORMAT = $(firstword $(filter elf32-x86-64,$(shell $(OBJCOPY) --help)) elf32-i386)

What are the implications of using one vs the other? If elf32-i386
works, why not use it all the time?

> @@ -374,6 +375,15 @@ cs32_switch:
>          /* Jump to earlier loaded address. */
>          jmp     *%edi
>  
> +
> +#ifdef CONFIG_PVH_GUEST

No double blank lines please.

> +ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_ENTRY, .long sym_offs(__pvh_start))
> +
> +__pvh_start:
> +        ud2a
> +
> +#endif /* CONFIG_PVH_GUEST */
> +
>  __start:

Does the new code strictly need to live here? Can't it be kept both
out of the resulting binary sequence currently resulting here and
out of this source file altogether (by introducing a new pvh.S or
shim.S)?

> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -34,7 +34,7 @@ OUTPUT_ARCH(i386:x86-64)
>  PHDRS
>  {
>    text PT_LOAD ;
> -#if defined(BUILD_ID) && !defined(EFI)
> +#if (defined(BUILD_ID) && !defined(EFI)) || defined (CONFIG_PVH_GUEST)

Did you mean

#if (defined(BUILD_ID) || defined(CONFIG_PVH_GUEST)) && !defined(EFI)

? Of course this would be moot if main and shim binary were to
be built independently.

Also - stray blank.
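To make the difference concrete, each #if can be modelled as a boolean function of the three symbols (function names invented for this sketch). Enumerating all eight combinations shows the two forms disagree only when EFI and CONFIG_PVH_GUEST are both defined: the patch's form then still enables the guarded program header for the EFI link, while the suggested form does not.

```c
#include <assert.h>
#include <stdbool.h>

/* #if (defined(BUILD_ID) && !defined(EFI)) || defined(CONFIG_PVH_GUEST) */
static bool cond_patch(bool build_id, bool efi, bool pvh_guest)
{
    return (build_id && !efi) || pvh_guest;
}

/* #if (defined(BUILD_ID) || defined(CONFIG_PVH_GUEST)) && !defined(EFI) */
static bool cond_suggested(bool build_id, bool efi, bool pvh_guest)
{
    return (build_id || pvh_guest) && !efi;
}

/* Count the input combinations on which the two conditions differ. */
static unsigned int count_disagreements(void)
{
    unsigned int n = 0;

    for ( unsigned int bits = 0; bits < 8; bits++ )
    {
        bool b = bits & 1, e = bits & 2, p = bits & 4;

        if ( cond_patch(b, e, p) != cond_suggested(b, e, p) )
            n++;
    }

    return n;
}
```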

> @@ -128,6 +128,12 @@ SECTIONS
>         __param_end = .;
>    } :text
>  
> +#if defined(CONFIG_PVH_GUEST) && !defined(EFI)

The EFI part here then also wouldn't be necessary, afaict.

Jan


* Re: [PATCH RFC v1 21/74] x86/entry: Early PVH boot code
  2018-01-04 13:05 ` [PATCH RFC v1 21/74] x86/entry: Early PVH boot code Wei Liu
@ 2018-01-05 13:32   ` Jan Beulich
  2018-01-09 15:45     ` Wei Liu
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-05 13:32 UTC (permalink / raw)
  To: Andrew Cooper, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> --- a/xen/arch/x86/boot/head.S
> +++ b/xen/arch/x86/boot/head.S
> @@ -380,7 +380,39 @@ cs32_switch:
>  ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_ENTRY, .long sym_offs(__pvh_start))
>  
>  __pvh_start:
> -        ud2a
> +        cld
> +        cli
> +
> +        /*
> +         * We need one push/pop to determine load address.  Use the same
> +         * absolute address as the native path, for lack of a better

... stack address ...

> @@ -544,12 +576,18 @@ trampoline_setup:
>          /* Get bottom-most low-memory stack address. */
>          add     $TRAMPOLINE_SPACE,%ecx
>  
> +#ifdef CONFIG_PVH_GUEST
> +        cmpb    $1, sym_fs(pvh_boot)
> +        je      1f

I'd much prefer

        cmpb    $0, sym_fs(pvh_boot)
        jne     1f

in cases like this one.
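Presumably the point is that comparing against zero treats any non-zero byte as set, whereas comparing against 1 silently depends on the writer storing exactly 1. The C equivalent of the two tests, with hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors "cmpb $1, flag; je": only the exact value 1 counts as set. */
static bool flag_set_eq1(uint8_t flag)
{
    return flag == 1;
}

/* Mirrors "cmpb $0, flag; jne": any non-zero value counts as set. */
static bool flag_set_ne0(uint8_t flag)
{
    return flag != 0;
}
```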

But then I sort of dislike the addition of such random in-memory
flags. Considering ...

> +#endif
> +
>          /* Save the Multiboot info struct (after relocation) for later use. */
>          push    %ecx                /* Bottom-most low-memory stack address. */
>          push    %ebx                /* Multiboot information address. */
>          push    %eax                /* Multiboot magic. */

... the values used here, couldn't the flag be replaced by setting
one or both of %eax and %ebx to zero before jumping to
trampoline_setup? Or wait, further down I see that this flag is
also being used in C code. Perhaps fine then as is. Otoh, keying
this off of one of the register values would allow the #ifdef to
be dropped.

> --- /dev/null
> +++ b/xen/arch/x86/guest/pvh-boot.c
> @@ -0,0 +1,119 @@
> +/******************************************************************************
> + * arch/x86/guest/pvh-boot.c
> + *
> + * PVH boot time support
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Copyright (c) 2017 Citrix Systems Ltd.
> + */
> +#include <xen/init.h>
> +#include <xen/lib.h>
> +#include <xen/mm.h>
> +
> +#include <asm/guest.h>
> +
> +#include <public/arch-x86/hvm/start_info.h>
> +
> +/* Initialised in head.S, before .bss is zeroed. */
> +bool pvh_boot __initdata;
> +uint32_t pvh_start_info_pa __initdata;

Would you mind using the more common placement of __initdata,
like you do ...

> +static multiboot_info_t __initdata pvh_mbi;
> +static module_t __initdata pvh_mbi_mods[32];
> +static char *__initdata pvh_loader = "PVH Directboot";

... here?

For the last item

static const char __initconst pvh_loader[] = "PVH Directboot";

please. For mods[] - isn't 32 overly much?

> +static void __init convert_pvh_info(void)
> +{
> +    struct hvm_start_info *pvh_info = __va(pvh_start_info_pa);
> +    struct hvm_modlist_entry *entry;

const (twice)

> +    module_t *mod;
> +    unsigned int i;
> +
> +    ASSERT(pvh_info->magic == XEN_HVM_START_MAGIC_VALUE);
> +
> +    /*
> +     * Turn hvm_start_info into mbi. Luckily all modules are placed under 4GB
> +     * boundary on x86.

ISTR having that discussion relatively recently in another context:
All the header states is "NB: Xen on x86 will always try to place all
the data below the 4GiB boundary." Note the "try to". Hence I
think ...

> +     */
> +    pvh_mbi.flags = MBI_CMDLINE | MBI_MODULES | MBI_LOADERNAME;
> +
> +    ASSERT(!(pvh_info->cmdline_paddr >> 32));

... this, if we don't want to handle the case, should be BUG_ON() or
panic() (same further down).

> +    pvh_mbi.cmdline = pvh_info->cmdline_paddr;
> +    pvh_mbi.boot_loader_name = __pa(pvh_loader);
> +
> +    ASSERT(pvh_info->nr_modules < 32);

ARRAY_SIZE(pvh_mbi_mods) and perhaps again BUG_ON() or
panic().
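A hosted sketch of the conversion loop with the hard checks suggested here: the count is bounded by ARRAY_SIZE() and every address that has to fit a 32-bit module_t field is checked against 4GiB, in all build types rather than only in debug builds where ASSERT() is active. The structures are simplified stand-ins for hvm_modlist_entry and module_t, fatal() stands in for panic(), and the array size of 8 is an arbitrary example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Simplified stand-ins for hvm_modlist_entry and module_t. */
struct modlist_entry {
    uint64_t paddr, size, cmdline_paddr, reserved;
};
struct module {
    uint32_t mod_start, mod_end, string, reserved;
};

static_assert(sizeof(struct modlist_entry) == 32, "4 x uint64_t");

static struct module mods[8];           /* arbitrary example bound */

/* Hosted stand-in for panic(): report and stop, in every build type. */
static void fatal(const char *what, unsigned long long val)
{
    fprintf(stderr, "PVH: %s %#llx\n", what, val);
    exit(1);
}

/* Convert nr entries, refusing anything a 32-bit field cannot hold. */
static unsigned int convert_mods(const struct modlist_entry *entry,
                                 unsigned int nr)
{
    if ( nr > ARRAY_SIZE(mods) )
        fatal("too many modules:", nr);

    for ( unsigned int i = 0; i < nr; i++ )
    {
        uint64_t end = entry[i].paddr + entry[i].size;

        if ( (end >> 32) || (entry[i].cmdline_paddr >> 32) )
            fatal("module data above 4GiB at", entry[i].paddr);

        mods[i].mod_start = entry[i].paddr;
        mods[i].mod_end   = end;
        mods[i].string    = entry[i].cmdline_paddr;
    }

    return nr;
}
```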

> +    pvh_mbi.mods_count = pvh_info->nr_modules;
> +    pvh_mbi.mods_addr = __pa(pvh_mbi_mods);
> +
> +    mod = pvh_mbi_mods;
> +    entry = __va(pvh_info->modlist_paddr);

How come __va() already works at this point in time? And what about
this address being beyond 4Gb?

> +    for ( i = 0; i < pvh_info->nr_modules; i++ )
> +    {
> +        ASSERT(!(entry[i].paddr >> 32));

To relax this condition (in particular to allow huge initrd), how
about ...

> +        mod[i].mod_start = entry[i].paddr;
> +        mod[i].mod_end   = entry[i].paddr + entry[i].size;

... using the EFI approach here and store the PFN in mod_start
and the size in mod_end?
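The encoding suggested here would look roughly like this: mod_start carries the page frame number and mod_end the size, so a 32-bit mod_start reaches 2^44 bytes (16TiB) of physical address space, while each module must stay below 4GiB in size. The sketch additionally assumes page-aligned modules; the helper names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Simplified stand-in for module_t. */
struct module {
    uint32_t mod_start, mod_end, string, reserved;
};

/* Store the PFN in mod_start and the byte size in mod_end. */
static bool encode_module(struct module *mod, uint64_t paddr, uint64_t size)
{
    if ( paddr & ((1ULL << PAGE_SHIFT) - 1) )
        return false;                   /* not page aligned */
    if ( ((paddr >> PAGE_SHIFT) >> 32) || (size >> 32) )
        return false;                   /* PFN or size does not fit */

    mod->mod_start = paddr >> PAGE_SHIFT;
    mod->mod_end   = size;
    return true;
}

/* Recover the physical start address from the PFN. */
static uint64_t module_paddr(const struct module *mod)
{
    return (uint64_t)mod->mod_start << PAGE_SHIFT;
}
```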

> +        mod[i].string    = entry[i].cmdline_paddr;

No 4Gb check here?

> +void __init pvh_print_info(void)
> +{
> +    struct hvm_start_info *pvh_info = __va(pvh_start_info_pa);
> +    struct hvm_modlist_entry *entry;

const (twice) again

> +    unsigned int i;
> +
> +    ASSERT(pvh_info->magic == XEN_HVM_START_MAGIC_VALUE);
> +
> +    printk("PVH start info: (pa %08x)\n", pvh_start_info_pa);
> +    printk("  version:    %u\n", pvh_info->version);
> +    printk("  flags:      %#"PRIx32"\n", pvh_info->flags);
> +    printk("  nr_modules: %u\n", pvh_info->nr_modules);
> +    printk("  modlist_pa: %016"PRIx64"\n", pvh_info->modlist_paddr);
> +    printk("  cmdline_pa: %016"PRIx64"\n", pvh_info->cmdline_paddr);

Considering you assume these to be below 4Gb anyway, how about
just %08?
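With PRIx64 the width in the format is only a minimum, so switching to %08 shortens the common case without hiding anything should an address end up above 4GiB. A small sketch (helper name hypothetical):

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Print a physical address with a minimum of 8 or 16 hex digits. */
static int format_pa(char *buf, size_t len, uint64_t pa, int wide)
{
    return wide ? snprintf(buf, len, "%016" PRIx64, pa)
                : snprintf(buf, len, "%08" PRIx64, pa);
}
```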

> +    if ( pvh_info->cmdline_paddr )
> +        printk("  cmdline:    '%s'\n",
> +               (char *)__va(pvh_info->cmdline_paddr));

This appears to fit on one line.

> +    printk("  rsdp_pa:    %016"PRIx64"\n", pvh_info->rsdp_paddr);

This one also unlikely needs 16 digits (and there are more
candidates further down).

> +    entry = __va(pvh_info->modlist_paddr);
> +    for ( i = 0; i < pvh_info->nr_modules; i++ )
> +    {
> +        printk("    mod[%u].pa:         %016"PRIx64"\n", i, entry[i].paddr);
> +        printk("    mod[%u].size:       %016"PRIu64"\n", i, entry[i].size);
> +        printk("    mod[%u].cmdline_pa: %016"PRIx64"\n",
> +               i, entry[i].cmdline_paddr);
> +        if ( entry[i].cmdline_paddr )
> +            printk("    mod[%u].cmdline:    '%s'\n", i,
> +                   (char *)__va(entry[i].cmdline_paddr));

[%2u] perhaps in all cases, unless you decide to shrink the array
size to no more than 10?
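As an illustration of the alignment point, %2u pads single-digit indices so that the labels for up to 100 modules line up (helper name hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format a per-module label whose width does not change for i < 100. */
static int format_mod_label(char *buf, size_t len, unsigned int i)
{
    return snprintf(buf, len, "mod[%2u].pa:", i);
}
```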

Jan

* Re: [PATCH RFC v1 23/74] x86/entry: Probe for Xen early during boot
  2018-01-04 13:05 ` [PATCH RFC v1 23/74] x86/entry: Probe for Xen early during boot Wei Liu
@ 2018-01-05 13:40   ` Jan Beulich
  2018-01-10 17:45     ` Wei Liu
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-05 13:40 UTC (permalink / raw)
  To: Andrew Cooper, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> --- /dev/null
> +++ b/xen/arch/x86/guest/xen.c
> @@ -0,0 +1,75 @@
> +/******************************************************************************
> + * arch/x86/guest/xen.c
> + *
> + * Support for detecting and running under Xen.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Copyright (c) 2017 Citrix Systems Ltd.
> + */
> +#include <xen/init.h>
> +#include <xen/types.h>
> +
> +#include <asm/guest.h>
> +#include <asm/processor.h>
> +
> +#include <public/arch-x86/cpuid.h>
> +
> +bool xen_guest;

__read_mostly?

> +static uint32_t xen_cpuid_base;

Depending on future use, __initdata or __read_mostly?

> --- a/xen/include/asm-x86/guest.h
> +++ b/xen/include/asm-x86/guest.h
> @@ -20,6 +20,7 @@
>  #define __X86_GUEST_H__
>  
>  #include <asm/guest/pvh-boot.h>
> +#include <asm/guest/xen.h>
>  
>  #endif /* __X86_GUEST_H__ */

I'm increasingly curious to understand what this header's purpose
is meant to be. It looks as if you mean source files to only ever
include this one, but why? Rather than exposing everything at
once, we should try (unrelated to this series) to limit what each
CU gets to see, speeding up builds (not the least incremental ones
by reducing the dependency trees).

Jan


* Re: [PATCH RFC v1 24/74] x86/guest: Hypercall support
  2018-01-04 13:05 ` [PATCH RFC v1 24/74] x86/guest: Hypercall support Wei Liu
@ 2018-01-05 13:53   ` Jan Beulich
  2018-01-05 14:09     ` Andrew Cooper
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-05 13:53 UTC (permalink / raw)
  To: Andrew Cooper, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> --- /dev/null
> +++ b/xen/arch/x86/guest/hypercall_page.S
> @@ -0,0 +1,79 @@
> +#include <asm/page.h>
> +#include <asm/asm_defns.h>
> +#include <public/xen.h>
> +
> +        .section ".text.page_aligned", "ax", @progbits
> +        .p2align PAGE_SHIFT
> +
> +GLOBAL(hypercall_page)
> +         /* Poisoned with `ret` for safety before hypercalls are set up. */
> +        .fill PAGE_SIZE, 1, 0xc3

How is RET a useful poison value? Why not 0xcc?

> +        .type hypercall_page, STT_OBJECT

I'd rather omit the type altogether - it's not really an object (nor a
function), the more that you produce individual entry symbols
below anyway.

> +        .size hypercall_page, PAGE_SIZE
> +
> +/*
> + * Identify a specific hypercall in the hypercall page
> + * @param name Hypercall name.
> + */
> +#define DECLARE_HYPERCALL(name)                                                 \
> +        .globl HYPERCALL_ ## name;                                              \
> +        .set   HYPERCALL_ ## name, hypercall_page + __HYPERVISOR_ ## name * 32; \
> +        .type  HYPERCALL_ ## name, STT_FUNC;                                    \
> +        .size  HYPERCALL_ ## name, 32

This is certainly fine for now, but going forward wants to be
machine generated directly from the header, so that it won't
need touching when new hypercalls are being added. Until
then I wonder whether you really need all the entries you
enumerate below - some (like iret) are plain invalid for PVH.
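Each stub sits at a fixed 32-byte stride, so an entry's address is just hypercall_page + __HYPERVISOR_<name> * 32, which is what makes generating the list from the header mechanical. A sketch of the arithmetic; the two hypercall numbers are assumed to match public/xen.h and PAGE_SIZE is taken as 4096.

```c
#include <assert.h>

#define PAGE_SIZE        4096
#define HYPERCALL_STRIDE 32     /* bytes per stub, as in DECLARE_HYPERCALL() */

/* Assumed to match the values in public/xen.h. */
#define __HYPERVISOR_memory_op 12
#define __HYPERVISOR_sched_op  29

/* One page holds PAGE_SIZE / 32 = 128 stubs. */
static_assert(PAGE_SIZE / HYPERCALL_STRIDE == 128, "stubs per page");

/* Byte offset of hypercall 'nr' within hypercall_page. */
static unsigned int hypercall_offset(unsigned int nr)
{
    assert(nr < PAGE_SIZE / HYPERCALL_STRIDE);
    return nr * HYPERCALL_STRIDE;
}
```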

> --- /dev/null
> +++ b/xen/include/asm-x86/guest/hypercall.h
> @@ -0,0 +1,92 @@
> +/******************************************************************************
> + * asm-x86/guest/hypercall.h
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms and conditions of the GNU General Public
> + * License, version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Copyright (c) 2017 Citrix Systems Ltd.
> + */
> +
> +#ifndef __X86_XEN_HYPERCALL_H__
> +#define __X86_XEN_HYPERCALL_H__
> +
> +#ifdef CONFIG_XEN_GUEST
> +
> +/*
> + * Hypercall primitives for 64bit
> + *
> + * Inputs: %rdi, %rsi, %rdx, %r10, %r8, %r9 (arguments 1-6)
> + */
> +
> +#define _hypercall64_1(type, hcall, a1)                                 \
> +    ({                                                                  \
> +        long res, tmp;                                                  \

Especially for tmp I think it would be quite a bit more safe if it
had a trailing underscore attached, so that an occasional use
of

    _hypercall64_1(..., tmp);

would work as intended.
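A minimal demonstration of the hazard with a toy macro in the same GNU C statement-expression style (GCC or Clang; macro and function names invented): when the caller's argument mentions tmp, the macro's own local captures the name. Since the quoted macro declares `long res, tmp;` without initialisers, a captured tmp there would presumably be read uninitialised, which is worse than the wrong-but-defined result below.

```c
#include <assert.h>

/* The argument expression is evaluated inside the macro's own scope. */
#define ADD_ONE_SHADOWED(x) ({ long tmp = 1; tmp + (x); })
#define ADD_ONE_SAFE(x)     ({ long tmp_ = 1; tmp_ + (x); })

static long add_one_demo(int safe)
{
    long tmp = 41;                      /* the caller's variable */

    /* Shadowed: expands to tmp + (tmp) with the macro's tmp, i.e. 2. */
    return safe ? ADD_ONE_SAFE(tmp) : ADD_ONE_SHADOWED(tmp);
}
```

Building with -Wshadow flags the first macro at the use site.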

Jan


* Re: [PATCH RFC v1 25/74] x86/shutdown: Support for using SCHEDOP_{shutdown, reboot}
  2018-01-04 13:05 ` [PATCH RFC v1 25/74] x86/shutdown: Support for using SCHEDOP_{shutdown, reboot} Wei Liu
@ 2018-01-05 14:01   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-05 14:01 UTC (permalink / raw)
  To: Andrew Cooper, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> From: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with two remarks:

> --- a/xen/include/asm-x86/guest/hypercall.h
> +++ b/xen/include/asm-x86/guest/hypercall.h
> @@ -19,6 +19,11 @@
>  #ifndef __X86_XEN_HYPERCALL_H__
>  #define __X86_XEN_HYPERCALL_H__
>  
> +#include <xen/types.h>
> +
> +#include <public/xen.h>
> +#include <public/sched.h>
> +
>  #ifdef CONFIG_XEN_GUEST

Why do you #include ahead of the #ifdef?

> @@ -78,6 +83,30 @@
>          (type)res;                                                      \
>      })
>  
> +/*
> + * Primitive Hypercall wrappers
> + */
> +static inline long xen_hypercall_sched_op(unsigned int cmd, void *arg)
> +{
> +    return _hypercall64_2(long, __HYPERVISOR_sched_op, cmd, arg);
> +}
> +
> +/*
> + * Higher level hypercall helpers
> + */
> +static inline long xen_hypercall_shutdown(unsigned int reason)
> +{
> +    return xen_hypercall_sched_op(SCHEDOP_shutdown, &reason);

It would seem more correct if you went through struct
sched_shutdown here.
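struct sched_shutdown currently holds just the reason, which is presumably why passing &reason works; going through the structure keeps the call tied to the ABI type. A sketch, with the struct layout and the SCHEDOP_shutdown/SHUTDOWN_reboot values assumed from public/sched.h, and a stub sched_op that only records its arguments so it can run outside a guest:

```c
#include <assert.h>

/* Assumed to mirror public/sched.h. */
#define SCHEDOP_shutdown 2
#define SHUTDOWN_reboot  1
struct sched_shutdown {
    unsigned int reason;                /* SHUTDOWN_* */
};

/* Hypothetical stub: records what a real hypercall would receive. */
static unsigned int last_cmd, last_reason;

static long xen_hypercall_sched_op(unsigned int cmd, void *arg)
{
    last_cmd = cmd;
    if ( cmd == SCHEDOP_shutdown )
        last_reason = ((const struct sched_shutdown *)arg)->reason;
    return 0;
}

/* Higher level helper passing the ABI structure, not a bare integer. */
static long xen_hypercall_shutdown(unsigned int reason)
{
    struct sched_shutdown s = { .reason = reason };

    return xen_hypercall_sched_op(SCHEDOP_shutdown, &s);
}
```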

Jan


* Re: [PATCH RFC v1 26/74] x86/pvh: Retrieve memory map from Xen
  2018-01-04 13:05 ` [PATCH RFC v1 26/74] x86/pvh: Retrieve memory map from Xen Wei Liu
@ 2018-01-05 14:05   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-05 14:05 UTC (permalink / raw)
  To: Andrew Cooper, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>



* Re: [PATCH RFC v1 27/74] xen/console: Introduce console=xen
  2018-01-04 13:05 ` [PATCH RFC v1 27/74] xen/console: Introduce console=xen Wei Liu
@ 2018-01-05 14:08   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-05 14:08 UTC (permalink / raw)
  To: Andrew Cooper, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> From: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> This specifies whether to use Xen specific console output. There are
> two variants: one is the hypervisor console, the other is the magic
> debug port 0xe9.

With just x86 in mind this is all fine, but for ARM (and for other
reasons even for x86) this surely wants some #ifdef-s added.

Jan


* Re: [PATCH RFC v1 24/74] x86/guest: Hypercall support
  2018-01-05 13:53   ` Jan Beulich
@ 2018-01-05 14:09     ` Andrew Cooper
  0 siblings, 0 replies; 206+ messages in thread
From: Andrew Cooper @ 2018-01-05 14:09 UTC (permalink / raw)
  To: Jan Beulich, wei.liu2; +Cc: Xen-devel

On 05/01/18 13:53, Jan Beulich wrote:
>>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
>> --- /dev/null
>> +++ b/xen/arch/x86/guest/hypercall_page.S
>> @@ -0,0 +1,79 @@
>> +#include <asm/page.h>
>> +#include <asm/asm_defns.h>
>> +#include <public/xen.h>
>> +
>> +        .section ".text.page_aligned", "ax", @progbits
>> +        .p2align PAGE_SHIFT
>> +
>> +GLOBAL(hypercall_page)
>> +         /* Poisoned with `ret` for safety before hypercalls are set up. */
>> +        .fill PAGE_SIZE, 1, 0xc3
> How is RET a useful poison value? Why not 0xcc?

This was all imported basically-verbatim from XTF (which also answers
some of your lower questions).

ret over cc prevents problems when crashing early.  Turning the
preferred schedop_shutdown() into a nop stops you taking a cascade fault,
and lets you try a different shutdown mechanism instead.

Also, before my recent patch to fix int3 behaviour, Xen would happily
execute its way (slowly) through debug traps without printing anything
useful.

>
>> +        .type hypercall_page, STT_OBJECT
> I'd rather omit the type altogether - it's not really an object (nor a
> function), the more that you produce individual entry symbols
> below anyway.
>
>> +        .size hypercall_page, PAGE_SIZE
>> +
>> +/*
>> + * Identify a specific hypercall in the hypercall page
>> + * @param name Hypercall name.
>> + */
>> +#define DECLARE_HYPERCALL(name)                                                 \
>> +        .globl HYPERCALL_ ## name;                                              \
>> +        .set   HYPERCALL_ ## name, hypercall_page + __HYPERVISOR_ ## name * 32; \
>> +        .type  HYPERCALL_ ## name, STT_FUNC;                                    \
>> +        .size  HYPERCALL_ ## name, 32
> This is certainly fine for now, but going forward wants to be
> machine generated directly from the header, so that it won't
> need touching when new hypercalls are being added. Until
> then I wonder whether you really need all the entries you
> enumerate below - some (like iret) are plain invalid for PVH.
>
>> --- /dev/null
>> +++ b/xen/include/asm-x86/guest/hypercall.h
>> @@ -0,0 +1,92 @@
>> +/******************************************************************************
>> + * asm-x86/guest/hypercall.h
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms and conditions of the GNU General Public
>> + * License, version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public
>> + * License along with this program; If not, see <http://www.gnu.org/licenses/>.
>> + *
>> + * Copyright (c) 2017 Citrix Systems Ltd.
>> + */
>> +
>> +#ifndef __X86_XEN_HYPERCALL_H__
>> +#define __X86_XEN_HYPERCALL_H__
>> +
>> +#ifdef CONFIG_XEN_GUEST
>> +
>> +/*
>> + * Hypercall primitives for 64bit
>> + *
>> + * Inputs: %rdi, %rsi, %rdx, %r10, %r8, %r9 (arguments 1-6)
>> + */
>> +
>> +#define _hypercall64_1(type, hcall, a1)                                 \
>> +    ({                                                                  \
>> +        long res, tmp;                                                  \
> Especially for tmp I think it would be quite a bit more safe if it
> had a trailing underscore attached, so that an occasional use
> of
>
>     _hypercall64_1(..., tmp);
>
> would work as intended.

Hmm.  I'd not even considered that issue.  I'll add it to my todo list.

~Andrew

* Re: [PATCH RFC v1 28/74] x86: initialise shared_info page
  2018-01-04 13:05 ` [PATCH RFC v1 28/74] x86: initialise shared_info page Wei Liu
@ 2018-01-05 14:11   ` Jan Beulich
  2018-01-05 14:20     ` Andrew Cooper
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-05 14:11 UTC (permalink / raw)
  To: Andrew Cooper, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> --- a/xen/arch/x86/guest/xen.c
> +++ b/xen/arch/x86/guest/xen.c
> @@ -72,6 +72,30 @@ void __init probe_hypervisor(void)
>      xen_guest = true;
>  }
>  
> +static void map_shared_info(struct e820map *e820)
> +{
> +    paddr_t frame = 0xff000000; /* TODO: Hardcoded beside magic frames. */

What are the plans here?

> +    struct xen_add_to_physmap xatp = {
> +        .domid = DOMID_SELF,
> +        .idx = 0,
> +        .space = XENMAPSPACE_shared_info,
> +        .gpfn = frame >> PAGE_SHIFT,
> +    };
> +
> +    if ( !e820_add_range(e820, frame, frame + PAGE_SIZE, E820_RESERVED) )
> +        panic("Failed to reserve shared_info range");
> +
> +    if ( xen_hypercall_memory_op(XENMEM_add_to_physmap, &xatp) )
> +        panic("Failed to map shared_info page");

Also report the error code?
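Reporting the code means keeping the hypercall's return value instead of testing the call inline, i.e. panic("Failed to map shared_info page: %ld\n", rc) in Xen. A hosted sketch in which the hypercall is a stub that always fails with -EINVAL and the message goes to a buffer rather than to panic() (function names hypothetical):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for xen_hypercall_memory_op(); always fails in this sketch. */
static long fake_add_to_physmap(void)
{
    return -EINVAL;
}

/* Keep rc so that the failure message can include it. */
static int map_shared_info(char *msg, size_t len)
{
    long rc = fake_add_to_physmap();

    if ( rc )
        return snprintf(msg, len, "Failed to map shared_info page: %ld", rc);

    return 0;
}
```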

Jan


* Re: [PATCH RFC v1 29/74] x86: xen pv clock time source
  2018-01-04 13:05 ` [PATCH RFC v1 29/74] x86: xen pv clock time source Wei Liu
@ 2018-01-05 14:17   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-05 14:17 UTC (permalink / raw)
  To: Andrew Cooper, Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> It is a variant of TSC clock source.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Mostly fine, with the TODO addressed, u64 etc replaced by uint64_t
etc, ...

> +static always_inline
> +u64 __read_cycle(const struct vcpu_time_info *info, u64 tsc)

... the double underscores dropped here, and ...

> +static u64 last_value;

... this moved into the only function it's needed in.
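A sketch of these suggestions together: uint64_t throughout, no leading underscores, and last_value as a function-local static. The scaling follows the formula documented for vcpu_time_info in public/xen.h; the structure below is a trimmed stand-in, the code relies on the unsigned __int128 extension of GCC/Clang, and the monotonic clamp is single-threaded (state shared across CPUs would need an atomic compare-and-exchange).

```c
#include <assert.h>
#include <stdint.h>

/* Trimmed stand-in for struct vcpu_time_info. */
struct pvclock_info {
    uint64_t tsc_timestamp;             /* TSC at the last update */
    uint64_t system_time;               /* ns at the last update */
    uint32_t tsc_to_system_mul;         /* fraction scaled by 2^32 */
    int8_t   tsc_shift;
};

/* ns = system_time + (((tsc - tsc_timestamp) shifted) * mul) >> 32 */
static inline uint64_t read_cycle(const struct pvclock_info *info,
                                  uint64_t tsc)
{
    uint64_t delta = tsc - info->tsc_timestamp;

    if ( info->tsc_shift < 0 )
        delta >>= -info->tsc_shift;
    else
        delta <<= info->tsc_shift;

    return info->system_time +
           (uint64_t)(((unsigned __int128)delta *
                       info->tsc_to_system_mul) >> 32);
}

/* Never go backwards; not SMP-safe as written. */
static uint64_t read_monotonic(const struct pvclock_info *info, uint64_t tsc)
{
    static uint64_t last_value;         /* only this function needs it */
    uint64_t now = read_cycle(info, tsc);

    if ( now < last_value )
        return last_value;

    last_value = now;
    return now;
}
```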

Jan

* Re: [PATCH RFC v1 28/74] x86: initialise shared_info page
  2018-01-05 14:11   ` Jan Beulich
@ 2018-01-05 14:20     ` Andrew Cooper
  2018-01-05 14:28       ` Roger Pau Monné
  0 siblings, 1 reply; 206+ messages in thread
From: Andrew Cooper @ 2018-01-05 14:20 UTC (permalink / raw)
  To: Jan Beulich, wei.liu2; +Cc: Xen-devel

On 05/01/18 14:11, Jan Beulich wrote:
>>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
>> --- a/xen/arch/x86/guest/xen.c
>> +++ b/xen/arch/x86/guest/xen.c
>> @@ -72,6 +72,30 @@ void __init probe_hypervisor(void)
>>      xen_guest = true;
>>  }
>>  
>> +static void map_shared_info(struct e820map *e820)
>> +{
>> +    paddr_t frame = 0xff000000; /* TODO: Hardcoded beside magic frames. */
> What are the plans here?

Nothing immediately.  This is compatible with all versions of libxc in
existence, but we need to start a thread discussing HVM guest physical
address space.

We've also just found a passive performance hole, where enabling any
kind of PCI Passthrough causes Windows and Linux's grant table mappings
to turn uncached because they are allocated inside what the OS thinks is
an MMIO BAR.

I'll start a thread when I'm a little less busy.

~Andrew

* Re: [PATCH RFC v1 28/74] x86: initialise shared_info page
  2018-01-05 14:20     ` Andrew Cooper
@ 2018-01-05 14:28       ` Roger Pau Monné
  2018-01-05 14:40         ` Andrew Cooper
  0 siblings, 1 reply; 206+ messages in thread
From: Roger Pau Monné @ 2018-01-05 14:28 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, wei.liu2, Jan Beulich

On Fri, Jan 05, 2018 at 02:20:16PM +0000, Andrew Cooper wrote:
> On 05/01/18 14:11, Jan Beulich wrote:
> >>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> >> --- a/xen/arch/x86/guest/xen.c
> >> +++ b/xen/arch/x86/guest/xen.c
> >> @@ -72,6 +72,30 @@ void __init probe_hypervisor(void)
> >>      xen_guest = true;
> >>  }
> >>  
> >> +static void map_shared_info(struct e820map *e820)
> >> +{
> >> +    paddr_t frame = 0xff000000; /* TODO: Hardcoded beside magic frames. */
> > What are the plans here?
> 
> Nothing immediately.  This is compatible with all versions of libxc in
> existence, but we need to start a thread discussing HVM guest physical
> address space.

Patches 43/44/45 remove this hardcoding.

Roger.

* Re: [PATCH RFC v1 30/74] x86: APIC timer calibration when running as a guest
  2018-01-04 13:05 ` [PATCH RFC v1 30/74] x86: APIC timer calibration when running as a guest Wei Liu
@ 2018-01-05 14:35   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-05 14:35 UTC (permalink / raw)
  To: wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> The timer calibration depends on the number of ticks. Introduce a
> variant to wait for a tick when running as a guest.

The change itself is fine, i.e.
Reviewed-by: Jan Beulich <jbeulich@suse.com>
but the description (to me, but it may be just me) doesn't really
match it. How about

The timer calibration currently depends on PIT. Introduce a variant
to wait for a tick's worth of time to elapse when running as a PVH
guest.

Jan

> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  xen/arch/x86/apic.c | 38 ++++++++++++++++++++++++++++++--------
>  1 file changed, 30 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
> index ed59440c45..5039173827 100644
> --- a/xen/arch/x86/apic.c
> +++ b/xen/arch/x86/apic.c
> @@ -36,6 +36,8 @@
>  #include <mach_apic.h>
>  #include <io_ports.h>
>  #include <xen/kexec.h>
> +#include <asm/guest.h>
> +#include <asm/time.h>
>  
>  static bool __read_mostly tdt_enabled;
>  static bool __initdata tdt_enable = true;
> @@ -1091,6 +1093,20 @@ static void setup_APIC_timer(void)
>      local_irq_restore(flags);
>  }
>  
> +static void wait_tick_pvh(void)
> +{
> +    u64 lapse_ns = 1000000000ULL / HZ;
> +    s_time_t start, curr_time;
> +
> +    start = NOW();
> +
> +    /* Won't wrap around */
> +    do {
> +        cpu_relax();
> +        curr_time = NOW();
> +    } while ( curr_time - start < lapse_ns );
> +}
> +
>  /*
>   * In this function we calibrate APIC bus clocks to the external
>   * timer. Unfortunately we cannot use jiffies and the timer irq
> @@ -1123,12 +1139,15 @@ static int __init calibrate_APIC_clock(void)
>       */
>      __setup_APIC_LVTT(1000000000);
>  
> -    /*
> -     * The timer chip counts down to zero. Let's wait
> -     * for a wraparound to start exact measurement:
> -     * (the current tick might have been already half done)
> -     */
> -    wait_8254_wraparound();
> +    if ( !xen_guest )
> +        /*
> +         * The timer chip counts down to zero. Let's wait
> +         * for a wraparound to start exact measurement:
> +         * (the current tick might have been already half done)
> +         */
> +        wait_8254_wraparound();
> +    else
> +        wait_tick_pvh();
>  
>      /*
>       * We wrapped around just now. Let's start:
> @@ -1137,10 +1156,13 @@ static int __init calibrate_APIC_clock(void)
>      tt1 = apic_read(APIC_TMCCT);
>  
>      /*
> -     * Let's wait LOOPS wraprounds:
> +     * Let's wait LOOPS ticks:
>       */
>      for (i = 0; i < LOOPS; i++)
> -        wait_8254_wraparound();
> +        if ( !xen_guest )
> +            wait_8254_wraparound();
> +        else
> +            wait_tick_pvh();
>  
>      tt2 = apic_read(APIC_TMCCT);
>      t2 = rdtsc_ordered();
> -- 
> 2.11.0
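The tick wait used in place of wait_8254_wraparound() can be sketched in user space as below. The sketch assumes HZ is 100 and uses clock_gettime(CLOCK_MONOTONIC) as a stand-in for Xen's NOW(); cpu_relax() is omitted and wait_tick()/now_ns() are hypothetical names.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime(CLOCK_MONOTONIC) */
#endif
#include <stdint.h>
#include <time.h>

#define HZ 100                    /* assumption: tick rate */

typedef int64_t s_time_t;

/* Stand-in for Xen's NOW(): nanoseconds from a monotonic clock. */
static s_time_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (s_time_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Busy-wait for one tick's worth of time (1e9 / HZ ns); return elapsed ns. */
static s_time_t wait_tick(void)
{
    const s_time_t lapse_ns = 1000000000LL / HZ;
    s_time_t start = now_ns(), curr;

    do {
        curr = now_ns();          /* the patch calls cpu_relax() here */
    } while ( curr - start < lapse_ns );

    return curr - start;
}
```

Because the elapsed time is a signed 64-bit nanosecond count, the subtraction cannot realistically wrap (2^63 ns is roughly 292 years), which is presumably what the "Won't wrap around" comment in the patch relies on.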

* Re: [PATCH RFC v1 28/74] x86: initialise shared_info page
  2018-01-05 14:28       ` Roger Pau Monné
@ 2018-01-05 14:40         ` Andrew Cooper
  0 siblings, 0 replies; 206+ messages in thread
From: Andrew Cooper @ 2018-01-05 14:40 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Xen-devel, wei.liu2, Jan Beulich

On 05/01/18 14:28, Roger Pau Monné wrote:
> On Fri, Jan 05, 2018 at 02:20:16PM +0000, Andrew Cooper wrote:
>> On 05/01/18 14:11, Jan Beulich wrote:
>>>>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
>>>> --- a/xen/arch/x86/guest/xen.c
>>>> +++ b/xen/arch/x86/guest/xen.c
>>>> @@ -72,6 +72,30 @@ void __init probe_hypervisor(void)
>>>>      xen_guest = true;
>>>>  }
>>>>  
>>>> +static void map_shared_info(struct e820map *e820)
>>>> +{
>>>> +    paddr_t frame = 0xff000000; /* TODO: Hardcoded beside magic frames. */
>>> What are the plans here?
>> Nothing immediately.  This is compatible with all versions of libxc in
>> existence, but we need to start a thread discussing HVM guest physical
>> address space.
> Patches 43/44/45 remove this hardcoding.

Oh sorry - I'm even more out of date than I thought I was.

I'll get back to my other work.

~Andrew

* Re: [PATCH RFC v1 31/74] x86: read wallclock from Xen running in pvh mode
  2018-01-04 13:05 ` [PATCH RFC v1 31/74] x86: read wallclock from Xen running in pvh mode Wei Liu
@ 2018-01-05 14:43   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-05 14:43 UTC (permalink / raw)
  To: wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with a suggestion on code structure:

> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -969,6 +969,36 @@ static unsigned long get_cmos_time(void)
>      return mktime(rtc.year, rtc.mon, rtc.day, rtc.hour, rtc.min, rtc.sec);
>  }
>  
> +static unsigned long noinline get_xen_wallclock_time(void)
> +{
> +#ifdef CONFIG_XEN_GUEST
> +    struct shared_info *sh_info = XEN_shared_info;
> +    uint32_t wc_version;
> +    uint64_t wc_sec;
> +
> +    do {
> +        wc_version = sh_info->wc_version & ~1;
> +        smp_rmb();
> +
> +        wc_sec  = sh_info->wc_sec;
> +        smp_rmb();
> +    } while ( wc_version != sh_info->wc_version );
> +
> +    return wc_sec + read_xen_timer() / 1000000000;

Why not move all of this ...

> +#else
> +    ASSERT_UNREACHABLE();
> +    return 0;
> +#endif
> +}
> +
> +static unsigned long get_wallclock_time(void)
> +{

... here:

#ifdef CONFIG_XEN_GUEST
    if ( xen_guest )
    {
        ...
        return wc_sec + read_xen_timer() / 1000000000;
    }
#endif

    return get_cmos_time();
}

avoiding one of these not very nice ASSERT_UNREACHABLE()?
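The suggested structure, fleshed out as a self-contained sketch. struct wc_sketch, xen_guest_sketch, cmos_time_sketch and the uptime_ns parameter are hypothetical stand-ins for XEN_shared_info, xen_guest, get_cmos_time() and read_xen_timer(); __atomic_thread_fence() stands in for smp_rmb(). The loop rereads wc_sec until it sees the same even wc_version before and after, i.e. no update was in progress.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the wallclock fields of shared_info. */
struct wc_sketch {
    volatile uint32_t wc_version;   /* odd while an update is in progress */
    volatile uint64_t wc_sec;
};

static bool xen_guest_sketch;                 /* stand-in for xen_guest */
static const uint64_t cmos_time_sketch = 42;  /* stand-in for get_cmos_time() */

/* Reread until wc_sec was sampled under one stable, even version. */
static uint64_t read_wc_sec(const struct wc_sketch *sh)
{
    uint32_t version;
    uint64_t sec;

    do {
        version = sh->wc_version & ~1u;
        __atomic_thread_fence(__ATOMIC_ACQUIRE);   /* like smp_rmb() */
        sec = sh->wc_sec;
        __atomic_thread_fence(__ATOMIC_ACQUIRE);
    } while ( version != sh->wc_version );

    return sec;
}

/* One function, so no ASSERT_UNREACHABLE() stub is needed. */
static uint64_t get_wallclock_time(const struct wc_sketch *sh,
                                   uint64_t uptime_ns)
{
    /* In Xen this block would sit under #ifdef CONFIG_XEN_GUEST. */
    if ( xen_guest_sketch )
        return read_wc_sec(sh) + uptime_ns / 1000000000;

    return cmos_time_sketch;
}
```

With wc_version 4, wc_sec 1500 and 3 seconds of uptime, the guest path returns 1503 and the native path returns the CMOS stand-in.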

Jan

* Re: [PATCH RFC v1 32/74] x86: don't swallow the first command line item in pvh mode
  2018-01-04 13:05 ` [PATCH RFC v1 32/74] x86: don't swallow the first command line item " Wei Liu
@ 2018-01-05 14:49   ` Jan Beulich
  2018-01-09 14:30   ` Roger Pau Monné
  1 sibling, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-05 14:49 UTC (permalink / raw)
  To: Andrew Cooper, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> @@ -632,11 +633,10 @@ static char * __init cmdline_cook(char *p, const char *loader_name)
>      while ( *p == ' ' )
>          p++;
>  
> -    /* GRUB2 does not include image name as first item on command line. */
> -    if ( loader_is_grub2(loader_name) )
> +    if ( !loader_is_grub1(loader_name) )
>          return p;

Behavior here changes for xen.efi booted without grub2 afaict.
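The behaviour change can be made concrete with a small sketch. The helper bodies are assumptions based on the GRUB version strings (GRUB Legacy reports 0.9x versions such as "GNU GRUB 0.97", GRUB2 reports "GRUB 2.xx"), and "EFI" is only a placeholder for any loader name that is neither. The old rule strips the image name unless the loader is GRUB2; the new rule strips it only for GRUB1, so every other loader flips.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed check: GRUB2 reports "GRUB 1.xx"/"GRUB 2.xx". */
static bool loader_is_grub2(const char *loader_name)
{
    const char *p = strstr(loader_name, "GRUB ");

    return p && p[5] != '0';
}

/* Hypothetical counterpart: GRUB Legacy reports "GNU GRUB 0.xx". */
static bool loader_is_grub1(const char *loader_name)
{
    const char *p = strstr(loader_name, "GRUB ");

    return p && p[5] == '0';
}

/* Old rule: drop the first command line item unless booted by GRUB2. */
static bool old_strips_image_name(const char *loader_name)
{
    return !loader_is_grub2(loader_name);
}

/* New rule from the patch: drop it only when booted by GRUB1. */
static bool new_strips_image_name(const char *loader_name)
{
    return loader_is_grub1(loader_name);
}
```

Both rules agree for the two GRUB generations and disagree for everything else.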

Jan

* Re: [PATCH RFC v1 33/74] x86/guest: enable event channels upcalls
  2018-01-04 13:05 ` [PATCH RFC v1 33/74] x86/guest: enable event channels upcalls Wei Liu
@ 2018-01-05 15:07   ` Jan Beulich
  2018-01-05 15:19     ` Andrew Cooper
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-05 15:07 UTC (permalink / raw)
  To: Andrew Cooper, Sergey Dyasli, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> @@ -30,6 +31,7 @@
>  bool xen_guest;
>  
>  static uint32_t xen_cpuid_base;
> +static uint8_t evtchn_upcall_vector;

There being a single global vector, why do you use
HVMOP_set_evtchn_upcall_vector instead of setting
HVM_PARAM_CALLBACK_IRQ? Aiui this would also make ...

> @@ -91,9 +93,81 @@ static void map_shared_info(struct e820map *e820)
>      set_fixmap(FIX_XEN_SHARED_INFO, frame);
>  }
>  
> +static void xen_evtchn_upcall(struct cpu_user_regs *regs)
> +{
> +    unsigned int cpu = smp_processor_id();
> +    struct vcpu_info *vcpu_info = &XEN_shared_info->vcpu_info[cpu];
> +
> +    vcpu_info->evtchn_upcall_pending = 0;
> +    xchg(&vcpu_info->evtchn_pending_sel, 0);
> +
> +    ack_APIC_irq();

... this call unnecessary.

Also wouldn't it be better to decouple uses of vcpu_info from
XEN_shared_info right away, for the later extension to more
vCPU-s to be less intrusive?

Also - why xchg() rather than write_atomic() (again further down)?

> +static void ap_setup_event_channels(bool clear)
> +{
> +    unsigned int i, cpu = smp_processor_id();
> +    struct vcpu_info *vcpu_info = &XEN_shared_info->vcpu_info[cpu];
> +    int rc;
> +
> +    ASSERT(evtchn_upcall_vector);
> +    ASSERT(cpu < ARRAY_SIZE(XEN_shared_info->vcpu_info));

Strictly speaking this assertion comes too late. But yes, we have
quite a few such examples elsewhere, so I don't really mind.

> +    if ( !clear )
> +    {
> +        /*
> +         * This is necessary to ensure that a CPU will be interrupted in case
> +         * of an event channel notification.
> +         */
> +        ASSERT(vcpu_info->evtchn_upcall_pending == 0);
> +        ASSERT(vcpu_info->evtchn_pending_sel == 0);
> +    }
> +
> +    rc = xen_hypercall_set_evtchn_upcall_vector(cpu, evtchn_upcall_vector);
> +    if ( rc )
> +        panic("Unable to set evtchn upcall vector: %d", rc);
> +
> +    if ( clear )
> +    {
> +        /*
> +         * Clear any pending upcall bits. This makes us effectively ignore any
> +         * previous upcalls which might be suboptimal.
> +         */
> +        vcpu_info->evtchn_upcall_pending = 0;
> +        xchg(&vcpu_info->evtchn_pending_sel, 0);
> +
> +        /*
> +         * evtchn_pending can be cleared only on the boot CPU because it's
> +         * located in a shared structure.
> +         */
> +        for ( i = 0; i < 8; i++ )

ARRAY_SIZE() (also further down)

I also don't really understand the comment - all CPUs can access
shared info. But then again I don't really understand all this
clearing anyway, including the respective ASSERT()s further up.
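For reference, the two-level event channel scan that an upcall handler like this feeds into can be sketched as below. The structures are simplified hypothetical stand-ins for vcpu_info and shared_info (evtchn_pending sized at one word per selector bit, as we understand Xen's public shared_info layout), and the GCC __atomic builtins stand in for Xen's xchg(), write_atomic() and clear_bit(). The selector is exchanged because its old value is used; where the old value is discarded, as with the xchg() calls quoted above, a plain atomic store is the more natural primitive unless the full-barrier side effect of xchg is wanted. The clearing loop is bounded by ARRAY_SIZE() rather than a literal.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a)  (sizeof(a) / sizeof((a)[0]))
#define BITS_PER_WORD  (sizeof(unsigned long) * 8)

/* Hypothetical, simplified stand-ins for vcpu_info and shared_info. */
struct vcpu_info_sketch {
    uint8_t evtchn_upcall_pending;
    unsigned long evtchn_pending_sel;         /* one bit per pending word */
};

struct shared_info_sketch {
    unsigned long evtchn_pending[BITS_PER_WORD];
    unsigned long evtchn_mask[BITS_PER_WORD];
};

/* Handle every pending, unmasked port; record the first 'max' ports. */
static unsigned int scan_pending(struct shared_info_sketch *s,
                                 struct vcpu_info_sketch *v,
                                 unsigned int *ports, unsigned int max)
{
    unsigned int n = 0;
    unsigned long sel;

    v->evtchn_upcall_pending = 0;
    /* Exchange, not store: the old selector value is used below. */
    sel = __atomic_exchange_n(&v->evtchn_pending_sel, 0UL, __ATOMIC_SEQ_CST);

    while ( sel )
    {
        unsigned int word = __builtin_ctzl(sel);
        unsigned long bits = s->evtchn_pending[word] & ~s->evtchn_mask[word];

        sel &= sel - 1;                       /* drop the lowest set bit */
        while ( bits )
        {
            unsigned int bit = __builtin_ctzl(bits);

            bits &= bits - 1;
            /* Atomically clear this port's pending bit, then handle it. */
            __atomic_fetch_and(&s->evtchn_pending[word], ~(1UL << bit),
                               __ATOMIC_SEQ_CST);
            if ( n < max )
                ports[n] = word * BITS_PER_WORD + bit;
            n++;
        }
    }

    return n;
}

/* Old value not needed: a plain atomic store, bounded by ARRAY_SIZE(). */
static void clear_all_pending(struct shared_info_sketch *s)
{
    for ( size_t i = 0; i < ARRAY_SIZE(s->evtchn_pending); i++ )
        __atomic_store_n(&s->evtchn_pending[i], 0UL, __ATOMIC_SEQ_CST);
}
```

Masked ports stay pending in the sketch; they are neither delivered nor cleared by the scan.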

Jan

* Re: [PATCH RFC v1 33/74] x86/guest: enable event channels upcalls
  2018-01-05 15:07   ` Jan Beulich
@ 2018-01-05 15:19     ` Andrew Cooper
  0 siblings, 0 replies; 206+ messages in thread
From: Andrew Cooper @ 2018-01-05 15:19 UTC (permalink / raw)
  To: Jan Beulich, Sergey Dyasli, wei.liu2; +Cc: Xen-devel

On 05/01/18 15:07, Jan Beulich wrote:
>>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
>> @@ -30,6 +31,7 @@
>>  bool xen_guest;
>>  
>>  static uint32_t xen_cpuid_base;
>> +static uint8_t evtchn_upcall_vector;
> There being a single global vector, why do you use
> HVMOP_set_evtchn_upcall_vector instead of setting
> HVM_PARAM_CALLBACK_IRQ?

Because another discovery is that HVM_PARAM_CALLBACK_IRQ is subtly
broken.  It is incompatible with L0 Xen choosing to use hardware APIC
assistance, due to its deliberate (ab)use of the IRR state model.

OTOH, there are patches (perhaps later, perhaps not posted yet) which do
try to make use of CALLBACK_IRQ for compatibility on older L0 hypervisors.

~Andrew

* Re: [PATCH RFC v1 34/74] x86/guest: add PV console code
  2018-01-04 13:05 ` [PATCH RFC v1 34/74] x86/guest: add PV console code Wei Liu
@ 2018-01-05 15:22   ` Jan Beulich
  2018-01-10 15:33     ` Roger Pau Monné
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-05 15:22 UTC (permalink / raw)
  To: Andrew Cooper, Sergey Dyasli, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> --- /dev/null
> +++ b/xen/drivers/char/xen_pv_console.c
> @@ -0,0 +1,198 @@
> +/******************************************************************************
> + * drivers/char/xen_pv_console.c
> + *
> + * A frontend driver for Xen's PV console.
> + * Can be used when Xen is running on top of Xen in pv-in-pvh mode.
> + * (Linux's name for this is hvc console)
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Copyright (c) 2017 Citrix Systems Ltd.
> + */
> +
> +#include <xen/lib.h>
> +#include <xen/hypercall.h>
> +#include <xen/pv_console.h>
> +
> +#include <asm/fixmap.h>
> +#include <asm/guest.h>
> +
> +#include <public/io/console.h>
> +
> +static struct xencons_interface *cons_ring;
> +static evtchn_port_t cons_evtchn;
> +static serial_rx_fn cons_rx_handler;
> +static DEFINE_SPINLOCK(tx_lock);
> +
> +void __init pv_console_init(void)
> +{
> +    long r;
> +    uint64_t raw_pfn = 0, raw_evtchn = 0;
> +
> +    if ( !xen_guest )
> +    {
> +        printk("PV console init failed: xen_guest mode is not active!\n");
> +        return;
> +    }
> +
> +    r = xen_hypercall_hvm_get_param(HVM_PARAM_CONSOLE_PFN, &raw_pfn);
> +    if ( r < 0 )
> +        goto error;
> +
> +    r = xen_hypercall_hvm_get_param(HVM_PARAM_CONSOLE_EVTCHN, &raw_evtchn);
> +    if ( r < 0 )
> +        goto error;
> +
> +    set_fixmap(FIX_PV_CONSOLE, raw_pfn << PAGE_SHIFT);
> +    cons_ring = (struct xencons_interface *)fix_to_virt(FIX_PV_CONSOLE);

Pointless cast with the earlier return type change.

> +    cons_evtchn = raw_evtchn;
> +
> +    printk("Initialised PV console at 0x%p with pfn %#lx and evtchn %#x\n",

Does %#p not work?

> +void __init pv_console_set_rx_handler(serial_rx_fn fn)
> +{
> +    cons_rx_handler = fn;
> +}

Especially this and ...

> +size_t pv_console_rx(struct cpu_user_regs *regs)
> +{
> +    char c;
> +    XENCONS_RING_IDX cons, prod;
> +    size_t recv = 0;
> +
> +    if ( !cons_ring )
> +        return 0;
> +
> +    /* TODO: move this somewhere */
> +    if ( !test_bit(cons_evtchn, XEN_shared_info->evtchn_pending) )
> +        return 0;

... the need for this and ...

> +    prod = ACCESS_ONCE(cons_ring->in_prod);
> +    cons = cons_ring->in_cons;
> +    /* Get pointers before reading the ring */
> +    smp_rmb();
> +
> +    ASSERT((prod - cons) <= sizeof(cons_ring->in));
> +
> +    while ( cons != prod )
> +    {
> +        c = cons_ring->in[MASK_XENCONS_IDX(cons++, cons_ring->in)];
> +        if ( cons_rx_handler )
> +            cons_rx_handler(c, regs);
> +        recv++;
> +    }
> +
> +    /* No need for a mem barrier because every character was already consumed */
> +    barrier();
> +    ACCESS_ONCE(cons_ring->in_cons) = cons;
> +    notify_daemon();
> +
> +    clear_bit(cons_evtchn, XEN_shared_info->evtchn_pending);

... this at this layer are very hard to judge about with all the code
here being dead for the moment. Can't this driver be modeled like
any other of the UART drivers, surfacing the accessors through
struct uart_driver (and making the ad-hoc call sites in the next
patch [mostly] unnecessary)?
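The consumer side of the console input ring, reduced to a self-contained sketch. struct cons_ring_sketch is a simplified hypothetical stand-in for struct xencons_interface, and MASK_IDX() mirrors the power-of-two masking that MASK_XENCONS_IDX() performs. The indices are free-running 32-bit counters, so prod - cons stays correct across wrap-around. The acquire load and release store play the roles of the smp_rmb() and barrier() in the quoted code: on x86 a compiler barrier suffices before the in_cons update because loads are not reordered with later stores, and the release store is the portable equivalent. notify_daemon() and the event channel handling are omitted.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ring_idx_t;      /* free-running, like XENCONS_RING_IDX */

/* Ring size must be a power of two for the mask to work. */
#define MASK_IDX(idx, ring) ((idx) & (sizeof(ring) - 1))

/* Simplified stand-in for the input half of the console ring. */
struct cons_ring_sketch {
    char in[1024];
    ring_idx_t in_cons, in_prod;
};

/* Drain up to 'len' characters into 'buf'; return the number consumed. */
static size_t ring_rx(struct cons_ring_sketch *r, char *buf, size_t len)
{
    /* Acquire: ring contents written before in_prod are visible. */
    ring_idx_t prod = __atomic_load_n(&r->in_prod, __ATOMIC_ACQUIRE);
    ring_idx_t cons = r->in_cons;
    size_t n = 0;

    while ( cons != prod && n < len )
        buf[n++] = r->in[MASK_IDX(cons++, r->in)];

    /* Release: our reads of the slots complete before they are freed. */
    __atomic_store_n(&r->in_cons, cons, __ATOMIC_RELEASE);
    return n;
}
```

For example, with in_cons at 0xFFFFFFFF and in_prod at 1, the two characters in slots 1023 and 0 are read in order and in_cons ends at 1.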

Jan

* Re: [PATCH RFC v1 36/74] --- x86/shim: Kconfig and command line options
  2018-01-04 13:05 ` [PATCH RFC v1 36/74] --- x86/shim: Kconfig and command line options Wei Liu
@ 2018-01-05 15:26   ` Jan Beulich
  2018-01-05 17:51     ` Andrew Cooper
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-05 15:26 UTC (permalink / raw)
  To: Andrew Cooper, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> --- a/xen/arch/x86/Kconfig
> +++ b/xen/arch/x86/Kconfig
> @@ -133,6 +133,28 @@ config PVH_GUEST
>  	---help---
>  	  Support booting using the PVH ABI.
>  
> +	  If unsure, say N.
> +
> +config PV_SHIM
> +	def_bool n
> +	prompt "PV Shim"
> +	depends on PV && XEN_GUEST
> +	---help---
> +	  Build Xen with a mode which acts as a shim to allow PV guests to run
> +	  in an HVM/PVH container. This mode can only be enabled with a command
> +	  line option.
> +
> +	  If unsure, say N.
> +
> +config PV_SHIM_EXCLUSIVE
> +	def_bool n
> +	prompt "PV Shim Exclusive"
> +	depends on PV_SHIM

My expectation so far was that this would be the only mode we
target, hence I think at the very least the default wants to be y
here.

> --- /dev/null
> +++ b/xen/arch/x86/pv/shim.c
> @@ -0,0 +1,39 @@
> +/******************************************************************************
> + * arch/x86/pv/shim.c
> + *
> + * Functionality for PV Shim mode
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Copyright (c) 2017 Citrix Systems Ltd.
> + */
> +#include <xen/init.h>
> +#include <xen/types.h>
> +
> +#include <asm/apic.h>
> +
> +#ifndef CONFIG_PV_SHIM_EXCLUSIVE
> +bool pv_shim;

__read_mostly (if not __initdata)?
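A sketch of how the two Kconfig modes could map onto the pv_shim flag, assuming an exclusive build turns pv_shim into a compile-time constant so native-only paths fold away. __read_mostly is stubbed out here; in Xen it places the variable in a section for rarely written data. __initdata would only be safe if pv_shim were never read after boot, and patch 38 already consults it in ctxt_switch_levelling(). parse_cmdline_sketch() is a hypothetical naive substring match, not Xen's parameter handling.

```c
#include <stdbool.h>
#include <string.h>

/* Stub: in Xen, __read_mostly groups rarely written data together. */
#define __read_mostly

#ifdef CONFIG_PV_SHIM_EXCLUSIVE
/* Shim-only build: a constant, so native-only paths become dead code. */
#define pv_shim true
#else
/* Dual-mode build: a runtime flag, written once at boot, read often. */
static bool __read_mostly pv_shim;
#endif

/* Hypothetical parser; a naive substring match for illustration only. */
static void parse_cmdline_sketch(const char *cmdline)
{
#ifndef CONFIG_PV_SHIM_EXCLUSIVE
    if ( strstr(cmdline, "pv-shim") )
        pv_shim = true;
#else
    (void)cmdline;
#endif
}

static const char *boot_mode(void)
{
    return pv_shim ? "pv-shim" : "native";
}
```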

Jan

* Re: [PATCH RFC v1 36/74] --- x86/shim: Kconfig and command line options
  2018-01-05 15:26   ` Jan Beulich
@ 2018-01-05 17:51     ` Andrew Cooper
  2018-01-08  8:22       ` Jan Beulich
  0 siblings, 1 reply; 206+ messages in thread
From: Andrew Cooper @ 2018-01-05 17:51 UTC (permalink / raw)
  To: Jan Beulich, wei.liu2; +Cc: Xen-devel

On 05/01/18 15:26, Jan Beulich wrote:
>>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
>> --- a/xen/arch/x86/Kconfig
>> +++ b/xen/arch/x86/Kconfig
>> @@ -133,6 +133,28 @@ config PVH_GUEST
>>  	---help---
>>  	  Support booting using the PVH ABI.
>>  
>> +	  If unsure, say N.
>> +
>> +config PV_SHIM
>> +	def_bool n
>> +	prompt "PV Shim"
>> +	depends on PV && XEN_GUEST
>> +	---help---
>> +	  Build Xen with a mode which acts as a shim to allow PV guests to run
>> +	  in an HVM/PVH container. This mode can only be enabled with a command
>> +	  line option.
>> +
>> +	  If unsure, say N.
>> +
>> +config PV_SHIM_EXCLUSIVE
>> +	def_bool n
>> +	prompt "PV Shim Exclusive"
>> +	depends on PV_SHIM
> My expectation so far was that this would be the only mode we
> target, hence I think at the very least the default wants to be y
> here.

Until proper out-of-tree Xen builds work, building the shim binary at
all is a PITA.

These defaults give a developer a single binary which is capable of
running natively or as the shim, which has made development far more
productive.  It's certainly the way I'd expect to do primary future
development of the shim.

~Andrew

* Re: [PATCH RFC v1 36/74] --- x86/shim: Kconfig and command line options
  2018-01-05 17:51     ` Andrew Cooper
@ 2018-01-08  8:22       ` Jan Beulich
  2018-01-08 11:33         ` Andrew Cooper
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-08  8:22 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, wei.liu2

>>> On 05.01.18 at 18:51, <andrew.cooper3@citrix.com> wrote:
> On 05/01/18 15:26, Jan Beulich wrote:
>>>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
>>> --- a/xen/arch/x86/Kconfig
>>> +++ b/xen/arch/x86/Kconfig
>>> @@ -133,6 +133,28 @@ config PVH_GUEST
>>>  	---help---
>>>  	  Support booting using the PVH ABI.
>>>  
>>> +	  If unsure, say N.
>>> +
>>> +config PV_SHIM
>>> +	def_bool n
>>> +	prompt "PV Shim"
>>> +	depends on PV && XEN_GUEST
>>> +	---help---
>>> +	  Build Xen with a mode which acts as a shim to allow PV guests to run
>>> +	  in an HVM/PVH container. This mode can only be enabled with a command
>>> +	  line option.
>>> +
>>> +	  If unsure, say N.
>>> +
>>> +config PV_SHIM_EXCLUSIVE
>>> +	def_bool n
>>> +	prompt "PV Shim Exclusive"
>>> +	depends on PV_SHIM
>> My expectation so far was that this would be the only mode we
>> target, hence I think at the very least the default wants to be y
>> here.
> 
> Until proper out-of-tree Xen builds work, building the shim binary at
> all is a PITA.

Out-of-tree builds would certainly be nice to have, but I don't see
the big issue with building a shim-only binary, and I have been
explaining before how I build multiple distinct configurations from
the same source tree: Rather than building in the actual source
tree, establish a tree of symlinks back to the source tree, and
build in there. You can create any number of such trees. I'd also
expect this would eliminate some or all of the (I'm sorry) crude
build logic you're introducing in one of the patches; at the very
least I'm considering that patch so heavily draft/RFC that I didn't
even mean to reply to it.

> These defaults give a developer a single binary which is capable of
> running natively or as the shim, which has made development far more
> productive.  Its certainly the way I'd expect to do primary future
> development of the shim.

Interesting - the need to have the binary built under tools/firmware/
to me is a clear indication that you'll need a separate .config for it
anyway, so I can't see how building a universal binary will be of long
term help.

Jan

* Re: [PATCH RFC v1 38/74] x86/pv-shim: Force CPUID faulting in pv-shim mode
  2018-01-04 13:05 ` [PATCH RFC v1 38/74] x86/pv-shim: Force CPUID faulting in pv-shim mode Wei Liu
@ 2018-01-08 10:16   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 10:16 UTC (permalink / raw)
  To: Andrew Cooper, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> @@ -177,7 +178,8 @@ void ctxt_switch_levelling(const struct vcpu *next)
>  		 * generating the maximum full cpuid policy into Xen, at which
>  		 * this problem will disappear.
>  		 */
> -		set_cpuid_faulting(nextd && !is_control_domain(nextd) &&
> +		set_cpuid_faulting(nextd &&
> +				   (pv_shim || !is_control_domain(nextd)) &&

Doesn't pv_shim imply !is_control_domain(nextd)?
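The question can be checked mechanically. The sketch below models only the clause the patch touches (the remaining conjuncts of the condition are omitted) and enumerates all inputs: the old and new predicates disagree only when pv_shim is set and nextd is a control domain, so if pv_shim really implies !is_control_domain(nextd), the added term has no effect.

```c
#include <stdbool.h>

/* Only the clause the patch touches; other conjuncts are omitted. */
static bool old_faulting(bool nextd, bool is_ctrl)
{
    return nextd && !is_ctrl;
}

static bool new_faulting(bool nextd, bool pv_shim, bool is_ctrl)
{
    return nextd && (pv_shim || !is_ctrl);
}

/* Count the input combinations on which the two predicates disagree. */
static unsigned int count_differences(void)
{
    unsigned int diffs = 0;

    for ( unsigned int m = 0; m < 8; m++ )
    {
        bool nextd = m & 1, pv_shim = m & 2, is_ctrl = m & 4;

        if ( old_faulting(nextd, is_ctrl) !=
             new_faulting(nextd, pv_shim, is_ctrl) )
            diffs++;
    }

    return diffs;
}
```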

Jan

* Re: [PATCH RFC v1 39/74] xen/x86: make VGA support selectable
  2018-01-04 13:05 ` [PATCH RFC v1 39/74] xen/x86: make VGA support selectable Wei Liu
@ 2018-01-08 10:22   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 10:22 UTC (permalink / raw)
  To: Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> --- a/xen/drivers/video/Kconfig
> +++ b/xen/drivers/video/Kconfig
> @@ -3,8 +3,14 @@ config VIDEO
>  	bool
>  
>  config VGA
> -	bool
> +	bool "VGA support"
>  	select VIDEO
> +	depends on X86
> +	default y

What about

config VGA
	bool "VGA support" if !PV_SHIM_EXCLUSIVE
	select VIDEO
	depends on X86
	default y if !PV_SHIM_EXCLUSIVE

? With that (or a good reason why not to)
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan

* Re: [PATCH RFC v1 40/74] xen/x86: report domain id on cpuid
  2018-01-04 13:05 ` [PATCH RFC v1 40/74] xen/x86: report domain id on cpuid Wei Liu
@ 2018-01-08 10:27   ` Jan Beulich
  2018-01-08 10:34     ` Andrew Cooper
  2018-01-08 11:29   ` Jan Beulich
  1 sibling, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 10:27 UTC (permalink / raw)
  To: Andrew Cooper, Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> From: Roger Pau Monne <roger.pau@citrix.com>
> 
> Use the ebx register of the hypervisor leaf 1. The eax register on
> this leaf is already used to report the Xen major and minor versions.

The rationale for doing this is missing. Iirc in past discussions the
opinion was voiced (more than once, and iirc by Andrew and maybe
others) that a domain in general shouldn't be told about its domain
ID. Otherwise I also can't see why we don't have a hypercall for
this, and e.g. XTF needs to go through hoops to figure it out. Are
those arguments (which I don't recall) not applicable anymore?

In the Amazon shim patches thread handing out the domain ID by
command line option was suggested as an alternative, which then
wouldn't affect other (non-shim) domains, or the client of the shim.
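How a guest would consume the proposed interface can be sketched as below. The sketch assumes GCC/Clang's x86 <cpuid.h> __cpuid() macro, locates the Xen leaves by scanning the hypervisor range for the "XenVMMXenVMM" signature, decodes the version from EAX of leaf base+1, and treats EBX of that leaf as the domain ID purely as this RFC patch proposes, not as settled ABI. The function names are hypothetical; on non-x86 builds the probe simply reports no Xen.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Leaf base+1, EAX: major version in bits 31:16, minor in bits 15:0. */
static unsigned int xen_major(uint32_t eax) { return eax >> 16; }
static unsigned int xen_minor(uint32_t eax) { return eax & 0xffff; }

/* Scan the hypervisor CPUID range for "XenVMMXenVMM"; 0 if not found. */
static uint32_t find_xen_cpuid_base(void)
{
#if defined(__i386__) || defined(__x86_64__)
    for ( uint32_t base = 0x40000000; base < 0x40010000; base += 0x100 )
    {
        unsigned int eax, ebx, ecx, edx;
        char sig[12];

        __cpuid(base, eax, ebx, ecx, edx);
        memcpy(sig, &ebx, 4);
        memcpy(sig + 4, &ecx, 4);
        memcpy(sig + 8, &edx, 4);
        if ( !memcmp(sig, "XenVMMXenVMM", sizeof(sig)) && eax >= base + 1 )
            return base;
    }
#endif
    return 0;
}

/* Domain ID from EBX of leaf base+1, as proposed by this RFC patch. */
static bool read_domid_rfc(uint16_t *domid)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t base = find_xen_cpuid_base();
    unsigned int eax, ebx, ecx, edx;

    if ( !base )
        return false;
    __cpuid(base + 1, eax, ebx, ecx, edx);
    *domid = (uint16_t)ebx;
    return true;
#else
    (void)domid;
    return false;
#endif
}
```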

Jan

* Re: [PATCH RFC v1 41/74] xen/pvh: do not mark the low 1MB as IO mem
  2018-01-04 13:05 ` [PATCH RFC v1 41/74] xen/pvh: do not mark the low 1MB as IO mem Wei Liu
@ 2018-01-08 10:30   ` Jan Beulich
  2018-01-08 10:37     ` Roger Pau Monné
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 10:30 UTC (permalink / raw)
  To: Andrew Cooper, Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> @@ -288,8 +289,12 @@ void __init arch_init_memory(void)
>      dom_cow = domain_create(DOMID_COW, DOMCRF_dummy, 0, NULL);
>      BUG_ON(IS_ERR(dom_cow));
>  
> -    /* First 1MB of RAM is historically marked as I/O. */
> -    for ( i = 0; i < 0x100; i++ )
> +    /*
> +     * First 1MB of RAM is historically marked as I/O.  If we booted PVH,
> +     * reclaim the space.  Irrespective, leave MFN 0 as special for the sake
> +     * of 0 being a very common default value.
> +     */
> +    for ( i = 0; i < (pvh_boot ? 1 : 0x100); i++ )
>          share_xen_page_with_guest(mfn_to_page(_mfn(i)),
>                                    dom_io, XENSHARE_writable);

I can see this being valid as long as there's no firmware. What
doesn't become clear from either the description or the
comment is whether this is a necessary change, or just an
optimization to avoid wasting these 255 pages.
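The arithmetic behind the 255 pages, as a small sketch mirroring the patch's loop bound (4 KiB frames assumed): 0x100 frames cover the first 1 MiB, and the PVH path keeps only MFN 0, so 255 frames (1020 KiB) are no longer shared with dom_io. low_io_frames() and reclaimed_bytes() are hypothetical names.

```c
#include <stdbool.h>

#define PAGE_SIZE_4K 4096u

/* Frames below 1MB shared with dom_io, mirroring the patch's loop bound. */
static unsigned int low_io_frames(bool pvh_boot)
{
    return pvh_boot ? 1 : 0x100;   /* MFN 0 stays special either way */
}

/* Bytes of low memory the PVH path no longer marks as I/O. */
static unsigned int reclaimed_bytes(void)
{
    return (low_io_frames(false) - low_io_frames(true)) * PAGE_SIZE_4K;
}
```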

Jan

* Re: [PATCH RFC v1 40/74] xen/x86: report domain id on cpuid
  2018-01-08 10:27   ` Jan Beulich
@ 2018-01-08 10:34     ` Andrew Cooper
  2018-01-08 11:11       ` Jan Beulich
  0 siblings, 1 reply; 206+ messages in thread
From: Andrew Cooper @ 2018-01-08 10:34 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monne, wei.liu2; +Cc: Xen-devel

On 08/01/18 10:27, Jan Beulich wrote:
>>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
>> From: Roger Pau Monne <roger.pau@citrix.com>
>>
>> Use the ebx register of the hypervisor leaf 1. The eax register on
>> this leaf is already used to report the Xen major and minor versions.
> The rationale for doing this is missing. Iirc in past discussions the
> opinion was voiced (more than once, and iirc by Andrew and maybe
> others) that a domain in general shouldn't be told about its domain
> ID. Otherwise I also can't see why we don't have a hypercall for
> this, and e.g. XTF needs to go through hoops to figure it out. Are
> those arguments (which I don't recall) not applicable anymore?
>
> In the Amazon shim patches thread handing out the domain ID by
> command line option was suggested as an alternative, which then
> wouldn't affect other (non-shim) domains, or the client of the shim.

A guest's domid is unconditionally always available in xenstore, and is a
necessary part of any PV communication.

Like it or not, domid is part of the guest's view of the Xen ABI.
Therefore, making it easily accessible is the best course of action
(especially as pv-shim deliberately doesn't interpose on the xenstore ring).

~Andrew

* Re: [PATCH RFC v1 42/74] sched/null: skip vCPUs on the waitqueue that are blocked
  2018-01-04 13:05 ` [PATCH RFC v1 42/74] sched/null: skip vCPUs on the waitqueue that are blocked Wei Liu
@ 2018-01-08 10:37   ` Jan Beulich
  2018-01-08 11:12     ` George Dunlap
  2018-01-12 10:41   ` Dario Faggioli
  1 sibling, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 10:37 UTC (permalink / raw)
  To: Roger Pau Monne, wei.liu2; +Cc: George Dunlap, Xen-devel, Dario Faggioli

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> From: Roger Pau Monne <roger.pau@citrix.com>
> 
> Avoid scheduling vCPUs that are blocked; there's no point in assigning
> them to a pCPU because they are not going to run anyway.
> 
> Since blocked vCPUs are not assigned to pCPUs after this change, force
> a rescheduling when a vCPU is brought up if it's on the waitqueue.
> Also when scheduling try to pick a vCPU from the runqueue if the pCPU
> is running idle.

I don't think the description adequately describes the changes,
perhaps (in part) because ...

> Changes since v1:
>  - Force a rescheduling when a vCPU is brought up.
>  - Try to pick a vCPU from the runqueue if running the idle vCPU.

... it wasn't updated after making these adjustments.

> --- a/xen/common/sched_null.c
> +++ b/xen/common/sched_null.c
> @@ -574,6 +574,8 @@ static void null_vcpu_wake(const struct scheduler *ops, struct vcpu *v)
>      {
>          /* Not exactly "on runq", but close enough for reusing the counter */
>          SCHED_STAT_CRANK(vcpu_wake_onrunq);
> +        /* Force a rescheduling in case some CPU is idle can pick this vCPU */
> +        cpumask_raise_softirq(&cpu_online_map, SCHEDULE_SOFTIRQ);
>          return;
>      }

I don't understand: doesn't the null scheduler avoid moving
vCPU-s around at all? At least that's what the comment at the top of the
file says, unless I'm misinterpreting it. If so, how can "some CPU
(...) pick this vCPU"?

> @@ -781,6 +784,10 @@ static struct task_slice null_schedule(const struct scheduler *ops,
>          {
>              list_for_each_entry( wvc, &prv->waitq, waitq_elem )
>              {
> +                if ( test_bit(_VPF_down, &wvc->vcpu->pause_flags) )
> +                    /* Skip vCPUs that are down. */
> +                    continue;

"Down" != "blocked" (as per the description).

Overall it's not really being made clear what problem there is that
this patch is intended to solve.

Jan


* Re: [PATCH RFC v1 41/74] xen/pvh: do not mark the low 1MB as IO mem
  2018-01-08 10:30   ` Jan Beulich
@ 2018-01-08 10:37     ` Roger Pau Monné
  2018-01-08 11:11       ` Jan Beulich
  0 siblings, 1 reply; 206+ messages in thread
From: Roger Pau Monné @ 2018-01-08 10:37 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, wei.liu2, Xen-devel

On Mon, Jan 08, 2018 at 03:30:17AM -0700, Jan Beulich wrote:
> >>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> > @@ -288,8 +289,12 @@ void __init arch_init_memory(void)
> >      dom_cow = domain_create(DOMID_COW, DOMCRF_dummy, 0, NULL);
> >      BUG_ON(IS_ERR(dom_cow));
> >  
> > -    /* First 1MB of RAM is historically marked as I/O. */
> > -    for ( i = 0; i < 0x100; i++ )
> > +    /*
> > +     * First 1MB of RAM is historically marked as I/O.  If we booted PVH,
> > +     * reclaim the space.  Irrespective, leave MFN 0 as special for the sake
> > +     * of 0 being a very common default value.
> > +     */
> > +    for ( i = 0; i < (pvh_boot ? 1 : 0x100); i++ )
> >          share_xen_page_with_guest(mfn_to_page(_mfn(i)),
> >                                    dom_io, XENSHARE_writable);
> 
> I can see this being valid as long as there's no firmware. What
doesn't become clear from either the description or the
> comment is whether this is a necessary change, or just an
> optimization to avoid wasting these 255 pages.

It's just an optimization to not waste the low 1MiB. The shim will
work equally well without this.

Thanks, Roger.

* Re: [PATCH RFC v1 43/74] xen: introduce rangeset_reserve_hole
  2018-01-04 13:05 ` [PATCH RFC v1 43/74] xen: introduce rangeset_reserve_hole Wei Liu
@ 2018-01-08 10:46   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 10:46 UTC (permalink / raw)
  To: Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> Reserve a hole in a rangeset.

At the end of this operation the new range isn't distinguishable
from a range added by rangeset_add_range(). Hence I don't think
the term "hole" is really appropriate.

> --- a/xen/common/rangeset.c
> +++ b/xen/common/rangeset.c
> @@ -298,6 +298,57 @@ int rangeset_report_ranges(
>      return rc;
>  }
>  
> +int rangeset_reserve_hole(struct rangeset *r, unsigned long size,
> +                          unsigned long *s)

Therefore, how about "rangeset_claim_range()" or
"rangeset_add_dyn_range()"?

> +{
> +    struct range *prev, *next;
> +
> +    *s = 0;

I think it would be better to use a local variable here, and set *s
only on the success path.

> +    write_lock(&r->lock);
> +
> +    for ( prev = NULL, next = first_range(r);
> +          next;
> +          prev = next, next = next_range(r, next) )
> +    {
> +        if ( (next->s - *s) >= size )
> +            goto insert;
> +
> +        if ( next->e == ~0UL )
> +            goto out;
> +
> +        *s = next->e + 1;
> +    }
> +
> +    if ( (~0UL - *s) + 1 >= size )
> +        goto insert;
> +
> + out:
> +    write_unlock(&r->lock);
> +    return -ENOSPC;
> +
> + insert:
> +    if ( !prev )

unlikely()?
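
A minimal, standalone sketch of the first-fit search rangeset_reserve_hole() performs, assuming a sorted array of non-overlapping ranges that are inclusive of both ends (find_free_gap and this struct range are hypothetical simplifications, with no locking and no insertion). It writes *out only on success, as suggested above, and checks the tail gap as ULONG_MAX - start >= size - 1: the patch's (~0UL - *s) + 1 appears to wrap to 0 when *s is still 0, i.e. for an empty rangeset, which would make that case fail with -ENOSPC.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Simplified stand-in for Xen's struct range: [s, e], both ends inclusive. */
struct range { unsigned long s, e; };

/*
 * First-fit search over a sorted, non-overlapping array of ranges.
 * On success stores the start of a free gap of 'size' units in *out and
 * returns 0; *out is only written on success.  Returns -ENOSPC otherwise.
 */
static int find_free_gap(const struct range *r, size_t nr,
                         unsigned long size, unsigned long *out)
{
    unsigned long start = 0;   /* candidate start of the gap */
    size_t i;

    assert(size != 0);

    for ( i = 0; i < nr; i++ )
    {
        /* Free space between 'start' and the beginning of range i. */
        if ( r[i].s - start >= size )
            goto found;

        /* Range reaches the top of the address space: nothing left. */
        if ( r[i].e == ULONG_MAX )
            return -ENOSPC;

        start = r[i].e + 1;
    }

    /* Tail gap [start, ULONG_MAX] holds ULONG_MAX - start + 1 units. */
    if ( ULONG_MAX - start >= size - 1 )
        goto found;

    return -ENOSPC;

 found:
    *out = start;
    return 0;
}
```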

> --- a/xen/include/xen/rangeset.h
> +++ b/xen/include/xen/rangeset.h
> @@ -76,6 +76,10 @@ int __must_check rangeset_remove_singleton(
>  bool_t __must_check rangeset_contains_singleton(
>      struct rangeset *r, unsigned long s);
>  
> +/* Reserve a region of the specified size. */
> +int __must_check rangeset_reserve_hole(struct rangeset *r, unsigned long size,
> +                                       unsigned long *s);

I think this would better be placed closer to rangeset_add_range().

Jan


* Re: [PATCH RFC v1 44/74] xen/pvshim: keep track of unused pages
  2018-01-04 13:05 ` [PATCH RFC v1 44/74] xen/pvshim: keep track of unused pages Wei Liu
@ 2018-01-08 10:58   ` Jan Beulich
  2018-01-08 11:04     ` Roger Pau Monné
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 10:58 UTC (permalink / raw)
  To: Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> Simple infrastructure to keep track of, allocate and free unused pages,
> so that we can use them to map special pages like shared info and
> grant table.
> 
> As rangeset depends on malloc being ready, we introduce
> hypervisor_setup for things that can be initialised late in the
> process.
> 
> Signed-off-by: Roger Pau Monne <roger.pau@citrix.com>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  xen/arch/x86/guest/xen.c        | 48 
> +++++++++++++++++++++++++++++++++++++++++
>  xen/arch/x86/setup.c            |  3 +++
>  xen/include/asm-x86/guest/xen.h | 22 +++++++++++++++++++
>  3 files changed, 73 insertions(+)
> 
> diff --git a/xen/arch/x86/guest/xen.c b/xen/arch/x86/guest/xen.c
> index 0319a5f9e8..f66c10fbe5 100644
> --- a/xen/arch/x86/guest/xen.c
> +++ b/xen/arch/x86/guest/xen.c
> @@ -21,6 +21,7 @@
>  #include <xen/init.h>
>  #include <xen/types.h>
>  #include <xen/pv_console.h>
> +#include <xen/rangeset.h>
>  
>  #include <asm/apic.h>
>  #include <asm/guest.h>
> @@ -34,6 +35,7 @@ bool xen_guest;
>  static uint32_t xen_cpuid_base;
>  static uint8_t evtchn_upcall_vector;
>  extern char hypercall_page[];
> +static struct rangeset *mem;
>  
>  static void __init find_xen_leaves(void)
>  {
> @@ -161,9 +163,38 @@ static void __init init_evtchn(void)
>      ap_setup_event_channels(true);
>  }
>  
> +static void __init init_memmap(void)
> +{
> +    unsigned int i;
> +
> +    mem = rangeset_new(NULL, "host memory map", 0);
> +    if ( !mem )
> +        panic("failed to allocate host memory rangeset");

"host" is meant from the perspective of the shim on itself here aiui,
not the underlying entity? I find using that term here at least
misleading.

> +    /* Mark up to the last memory page (or 4GB) as RAM. */
> +    if ( rangeset_add_range(mem, 0, max_t(unsigned long, max_page,
> +                                          (GB(4) - 1) >> PAGE_SHIFT)) )

Don't you also need "max_page - 1" then? Also - why the
saturation to 4Gb?

> +        panic("unable to add RAM to memory rangeset");
> +
> +    for ( i = 0; i < e820.nr_map; i++ )
> +    {
> +        struct e820entry *e = &e820.map[i];
> +
> +        if ( rangeset_add_range(mem, e->addr >> PAGE_SHIFT,
> +                                (e->addr + e->size) >> PAGE_SHIFT) )

PFN_DOWN() and PFN_UP() respectively. Plus aren't rangeset
ranges inclusive of their upper ends, making it necessary to
subtract 1 from the upper bound?
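
The two points above can be sketched in isolation. This assumes 4 KiB pages, defines PFN_DOWN()/PFN_UP() and GB() locally with the usual rounding semantics, and uses hypothetical helper names; it is not the shim's code. Because rangeset bounds are inclusive, both the e820 end and max_page (one past the last valid page) need a "- 1".

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12
#define PAGE_SIZE   (1UL << PAGE_SHIFT)
#define GB(x)       ((uint64_t)(x) << 30)
/* Round a byte address down / up to a page frame number. */
#define PFN_DOWN(x) ((unsigned long)((x) >> PAGE_SHIFT))
#define PFN_UP(x)   ((unsigned long)(((x) + PAGE_SIZE - 1) >> PAGE_SHIFT))

/*
 * Convert an e820-style [addr, addr + size) byte region into the inclusive
 * PFN range [*first, *last] covering every page it touches.
 */
static void e820_to_pfn_range(uint64_t addr, uint64_t size,
                              unsigned long *first, unsigned long *last)
{
    assert(size != 0);
    *first = PFN_DOWN(addr);
    *last  = PFN_UP(addr + size) - 1;   /* upper bound is inclusive */
}

/*
 * Last PFN to mark as in use initially: the last RAM page (max_page is one
 * past it), but never below the last page under 4GiB.
 */
static unsigned long initial_last_pfn(unsigned long max_page)
{
    unsigned long below_4g = (unsigned long)((GB(4) - 1) >> PAGE_SHIFT);

    assert(max_page != 0);
    return max_page - 1 > below_4g ? max_page - 1 : below_4g;
}
```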

> @@ -43,11 +46,30 @@ static inline void hypervisor_early_setup(struct e820map *e820)
>  {
>      ASSERT_UNREACHABLE();
>  };
> +
> +static inline void hypervisor_setup(void)
> +{
> +    ASSERT_UNREACHABLE();
> +}
> +
>  static inline void hypervisor_ap_setup(void)
>  {
>      ASSERT_UNREACHABLE();
>  };
>  
> +static inline int hypervisor_alloc_unused_page(mfn_t *mfn)
> +{
> +
> +    ASSERT_UNREACHABLE();
> +    return 0;
> +}
> +
> +static inline int hypervisor_free_unused_page(mfn_t mfn)
> +{
> +    ASSERT_UNREACHABLE();
> +    return 0;
> +}

I can see the value of hypervisor_setup() stub, but are the other
two really going to be needed, i.e. are any such allocations being
placed into non-shim-specific code (doesn't seem very likely)?

Jan


* Re: [PATCH RFC v1 45/74] x86/guest: use unpopulated memory to map the shared_info page
  2018-01-04 13:05 ` [PATCH RFC v1 45/74] x86/guest: use unpopulated memory to map the shared_info page Wei Liu
@ 2018-01-08 11:03   ` Jan Beulich
  2018-01-08 11:06     ` Roger Pau Monné
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 11:03 UTC (permalink / raw)
  To: Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> This prevents hardcoding a known unpopulated memory page to map
> the shared info page. This fixes a TODO item in a previous patch.
> 
> Remove hypervisor_early_setup as now it is not required anymore.
> 
> Signed-off-by: Roger Pau Monne <roger.pau@citrix.com>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Looks good, but one question:

> @@ -187,15 +186,12 @@ static void __init init_memmap(void)
>      }
>  }
>  
> -void __init hypervisor_early_setup(struct e820map *e820)
> -{
> -    map_shared_info(e820);
> -}
> -
>  void __init hypervisor_setup(void)
>  {
>      init_memmap();
>  
> +    map_shared_info();
> +
>      init_evtchn();
>  }

If the shared info page isn't needed as early, why was it set up
that early originally?

Jan


* Re: [PATCH RFC v1 44/74] xen/pvshim: keep track of unused pages
  2018-01-08 10:58   ` Jan Beulich
@ 2018-01-08 11:04     ` Roger Pau Monné
  2018-01-08 11:22       ` Jan Beulich
  0 siblings, 1 reply; 206+ messages in thread
From: Roger Pau Monné @ 2018-01-08 11:04 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, wei.liu2

On Mon, Jan 08, 2018 at 03:58:22AM -0700, Jan Beulich wrote:
> >>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> > +static void __init init_memmap(void)
> > +{
> > +    unsigned int i;
> > +
> > +    mem = rangeset_new(NULL, "host memory map", 0);
> > +    if ( !mem )
> > +        panic("failed to allocate host memory rangeset");
> 
> "host" is meant from the perspective of the shim on itself here aiui,
> not the underlying entity? I find using that term here at least
> misleading.

Does "failed to allocate memory tracking rangeset" seem better?

> > +    /* Mark up to the last memory page (or 4GB) as RAM. */
> > +    if ( rangeset_add_range(mem, 0, max_t(unsigned long, max_page,
> > +                                          (GB(4) - 1) >> PAGE_SHIFT)) )
> 
> Don't you also need "max_page - 1" then? Also - why the
> saturation to 4Gb?

There's the MMIO hole below 4GiB, and I wanted to prevent using memory
from there. I know there can still be MMIO holes above 4GiB, but it's
less likely.

> > +        panic("unable to add RAM to memory rangeset");
> > +
> > +    for ( i = 0; i < e820.nr_map; i++ )
> > +    {
> > +        struct e820entry *e = &e820.map[i];
> > +
> > +        if ( rangeset_add_range(mem, e->addr >> PAGE_SHIFT,
> > +                                (e->addr + e->size) >> PAGE_SHIFT) )
> 
> PFN_DOWN() and PFN_UP() respectively. Plus aren't rangeset
> ranges inclusive of their upper ends, making it necessary to
> subtract 1 from the upper bound?

Right.

> > @@ -43,11 +46,30 @@ static inline void hypervisor_early_setup(struct e820map *e820)
> >  {
> >      ASSERT_UNREACHABLE();
> >  };
> > +
> > +static inline void hypervisor_setup(void)
> > +{
> > +    ASSERT_UNREACHABLE();
> > +}
> > +
> >  static inline void hypervisor_ap_setup(void)
> >  {
> >      ASSERT_UNREACHABLE();
> >  };
> >  
> > +static inline int hypervisor_alloc_unused_page(mfn_t *mfn)
> > +{
> > +
> > +    ASSERT_UNREACHABLE();
> > +    return 0;
> > +}
> > +
> > +static inline int hypervisor_free_unused_page(mfn_t mfn)
> > +{
> > +    ASSERT_UNREACHABLE();
> > +    return 0;
> > +}
> 
> I can see the value of hypervisor_setup() stub, but are the other
> two really going to be needed, i.e. are any such allocations being
> placed into non-shim-specific code (doesn't seem very likely)?

Yes, I think we can get rid of those two helpers.

Thanks, Roger.

* Re: [PATCH RFC v1 46/74] xen/guest: fetch vCPU ID from Xen
  2018-01-04 13:05 ` [PATCH RFC v1 46/74] xen/guest: fetch vCPU ID from Xen Wei Liu
@ 2018-01-08 11:04   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 11:04 UTC (permalink / raw)
  To: Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> From: Roger Pau Monne <roger.pau@citrix.com>
> 
> If available.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>


* Re: [PATCH RFC v1 45/74] x86/guest: use unpopulated memory to map the shared_info page
  2018-01-08 11:03   ` Jan Beulich
@ 2018-01-08 11:06     ` Roger Pau Monné
  2018-01-08 11:25       ` Jan Beulich
  0 siblings, 1 reply; 206+ messages in thread
From: Roger Pau Monné @ 2018-01-08 11:06 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, wei.liu2

On Mon, Jan 08, 2018 at 04:03:50AM -0700, Jan Beulich wrote:
> >>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> > This prevents hardcoding a known unpopulated memory page to map
> > the shared info page. This fixes a TODO item in a previous patch.
> > 
> > Remove hypervisor_early_setup as now it is not required anymore.
> > 
> > Signed-off-by: Roger Pau Monne <roger.pau@citrix.com>
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> 
> Looks good, but one question:
> 
> > @@ -187,15 +186,12 @@ static void __init init_memmap(void)
> >      }
> >  }
> >  
> > -void __init hypervisor_early_setup(struct e820map *e820)
> > -{
> > -    map_shared_info(e820);
> > -}
> > -
> >  void __init hypervisor_setup(void)
> >  {
> >      init_memmap();
> >  
> > +    map_shared_info();
> > +
> >      init_evtchn();
> >  }
> 
> If the shared info page isn't needed as early, why was it set up
> that early originally?

Because, while setting up the shared_info, the address used for it
would also be added to the e820 in order to mark it as RAM. I don't
think that was really required: the shared_info page doesn't need a
page_info, since it's not shared with the guest.

Roger.

* Re: [PATCH RFC v1 47/74] x86/guest: fix upcall vector setup
  2018-01-04 13:05 ` [PATCH RFC v1 47/74] x86/guest: fix upcall vector setup Wei Liu
@ 2018-01-08 11:08   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 11:08 UTC (permalink / raw)
  To: Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> From: Roger Pau Monne <roger.pau@citrix.com>
> 
> Instead of forcing no pending event on the vCPU, just mask all event
> channels when setting up the BSP; further patches will unmask them
> as event channels are set up.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> To be squashed with "x86/guest: enable event channels upcalls"

Yes please.

> @@ -95,6 +95,10 @@ static void map_shared_info(void)
>          panic("Failed to map shared_info page");
>  
>      set_fixmap(FIX_XEN_SHARED_INFO, mfn_x(mfn) << PAGE_SHIFT);
> +
> +    /* Mask all upcalls */
> +    for ( i = 0; i < ARRAY_SIZE(XEN_shared_info->evtchn_mask); i++ )
> +        xchg(&XEN_shared_info->evtchn_mask[i], ~0ul);

Just like there, write_atomic() would likely be better than xchg().
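
Outside Xen, the closest portable analogue of the suggested write_atomic() is probably a C11 relaxed atomic store. The sketch assumes a 64-word mask array (matching sizeof(xen_ulong_t) * 8 on x86-64) and a hypothetical mask_all_evtchns() helper. The motivation: xchg() returns the old value, which this loop discards, and on x86 an xchg with a memory operand is implicitly locked, so it pays for a full barrier per word that a plain atomic store does not need.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumed size: sizeof(xen_ulong_t) * 8 words on x86-64. */
#define NR_MASK_WORDS 64

/* Hypothetical stand-in for the shared_info evtchn_mask array. */
static _Atomic unsigned long evtchn_mask[NR_MASK_WORDS];

/*
 * Mask every event channel with one atomic store per word.  Unlike
 * xchg(), nothing is read back and no ordering is requested.
 */
static void mask_all_evtchns(void)
{
    size_t i;

    for ( i = 0; i < NR_MASK_WORDS; i++ )
        atomic_store_explicit(&evtchn_mask[i], ~0UL, memory_order_relaxed);
}
```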

Jan

* Re: [PATCH RFC v1 40/74] xen/x86: report domain id on cpuid
  2018-01-08 10:34     ` Andrew Cooper
@ 2018-01-08 11:11       ` Jan Beulich
  2018-01-08 11:22         ` Andrew Cooper
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 11:11 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, wei.liu2, Roger Pau Monne

>>> On 08.01.18 at 11:34, <andrew.cooper3@citrix.com> wrote:
> On 08/01/18 10:27, Jan Beulich wrote:
>>>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
>>> From: Roger Pau Monne <roger.pau@citrix.com>
>>>
>>> Use the ebx register of the hypervisor leaf 1. The eax register on
>>> this leaf is already used to report the Xen major and minor versions.
>> The rationale for doing this is missing. Iirc in past discussions the
>> opinion was voiced (more than once, and iirc by Andrew and maybe
>> others) that a domain in general shouldn't be told about its domain
>> ID. Otherwise I also can't see why we don't have a hypercall for
>> this, and e.g. XTF needs to go through hoops to figure it out. Are
>> those arguments (which I don't recall) not applicable anymore?
>>
>> In the Amazon shim patches thread handing out the domain ID by
>> command line option was suggested as an alternative, which then
>> wouldn't affect other (non-shim) domains, or the client of the shim.
> 
> A guest's domid is unconditionally always available in xenstore, and is a
> necessary part of any PV communication.
> 
> Like it or not, domid is part of the guest's view of the Xen ABI.
> Therefore, making it easily accessible is the best course of action
> (especially as pv-shim deliberately doesn't interpose on the xenstore ring).

All understood, yet you don't address the question about the
background of your change of opinion here. Or am I
misremembering that earlier on you were against exposing
the domain ID?

Jan


* Re: [PATCH RFC v1 41/74] xen/pvh: do not mark the low 1MB as IO mem
  2018-01-08 10:37     ` Roger Pau Monné
@ 2018-01-08 11:11       ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 11:11 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, wei.liu2, Xen-devel

>>> On 08.01.18 at 11:37, <roger.pau@citrix.com> wrote:
> On Mon, Jan 08, 2018 at 03:30:17AM -0700, Jan Beulich wrote:
>> >>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
>> > @@ -288,8 +289,12 @@ void __init arch_init_memory(void)
>> >      dom_cow = domain_create(DOMID_COW, DOMCRF_dummy, 0, NULL);
>> >      BUG_ON(IS_ERR(dom_cow));
>> >  
>> > -    /* First 1MB of RAM is historically marked as I/O. */
>> > -    for ( i = 0; i < 0x100; i++ )
>> > +    /*
>> > +     * First 1MB of RAM is historically marked as I/O.  If we booted PVH,
>> > +     * reclaim the space.  Irrespective, leave MFN 0 as special for the sake
>> > +     * of 0 being a very common default value.
>> > +     */
>> > +    for ( i = 0; i < (pvh_boot ? 1 : 0x100); i++ )
>> >          share_xen_page_with_guest(mfn_to_page(_mfn(i)),
>> >                                    dom_io, XENSHARE_writable);
>> 
>> I can see this being valid as long as there's no firmware. What
>> doesn't become clear from either the description or the
>> comment is whether this is a necessary change, or just an
>> optimization to avoid wasting these 255 pages.
> 
> It's just an optimization to not waste the low 1MiB. The shim will
> work equally well without this.

Good. Please say so in the description.

Jan


* Re: [PATCH RFC v1 42/74] sched/null: skip vCPUs on the waitqueue that are blocked
  2018-01-08 10:37   ` Jan Beulich
@ 2018-01-08 11:12     ` George Dunlap
  2018-01-12  9:54       ` Dario Faggioli
  0 siblings, 1 reply; 206+ messages in thread
From: George Dunlap @ 2018-01-08 11:12 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monne, wei.liu2
  Cc: George Dunlap, Xen-devel, Dario Faggioli

On 01/08/2018 10:37 AM, Jan Beulich wrote:
>>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
>> From: Roger Pau Monne <roger.pau@citrix.com>
>>
>> Avoid scheduling vCPUs that are blocked; there's no point in assigning
>> them to a pCPU because they are not going to run anyway.
>>
>> Since blocked vCPUs are not assigned to pCPUs after this change, force
>> a rescheduling when a vCPU is brought up if it's on the waitqueue.
>> Also when scheduling try to pick a vCPU from the runqueue if the pCPU
>> is running idle.
> 
> I don't think the description adequately describes the changes,
> perhaps (in part) because ...
> 
>> Changes since v1:
>>  - Force a rescheduling when a vCPU is brought up.
>>  - Try to pick a vCPU from the runqueue if running the idle vCPU.
> 
> ... it wasn't updated after making these adjustments.
> 
>> --- a/xen/common/sched_null.c
>> +++ b/xen/common/sched_null.c
>> @@ -574,6 +574,8 @@ static void null_vcpu_wake(const struct scheduler *ops, struct vcpu *v)
>>      {
>>          /* Not exactly "on runq", but close enough for reusing the counter */
>>          SCHED_STAT_CRANK(vcpu_wake_onrunq);
>> +        /* Force a rescheduling in case some CPU is idle can pick this vCPU */
>> +        cpumask_raise_softirq(&cpu_online_map, SCHEDULE_SOFTIRQ);
>>          return;
>>      }
> 
> I don't understand: doesn't the null scheduler avoid moving
> vCPU-s around at all? At least that's what the comment at the top of the
> file says, unless I'm misinterpreting it. If so, how can "some CPU
> (...) pick this vCPU"?

There's no current way to prevent a user from adding more vcpus to a
pool than there are pcpus (if nothing else, by creating a new VM in a
given pool), or from taking pcpus from a pool in which #vcpus >= #pcpus.

The null scheduler deals with this by having a queue of "unassigned"
vcpus that are waiting for a free pcpu.  When a pcpu becomes available,
it will do the assignment.  When a pcpu that has a vcpu assigned is
removed from the pool, that vcpu is assigned to a different pcpu if one
is available; if not, it is put on the list.

In the case of shim mode, this also seems to happen whenever curvcpus <
maxvcpus: The L1 hypervisor (shim) only sees curvcpus cpus on which to
schedule L2 vcpus, but the L2 guest has maxvcpus vcpus to schedule, of
which (maxvcpus-curvcpus) are marked 'down'.  In this case, it also
seems that the null scheduler sometimes schedules a "down" vcpu when
there are "up" vcpus on the list; meaning that the "up" vcpus are never
scheduled.

(This is just my understanding from conversations with Roger; I haven't
actually looked at the code to verify a number of the statements in the
previous paragraph.)
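
A toy model of the situation described above, assuming a singly linked waitqueue and a plain "down" flag standing in for _VPF_down in pause_flags (struct toy_vcpu and pick_from_waitq are hypothetical, not Xen's struct null_vcpu or null_schedule()). It shows how a pCPU that takes the head of the queue can end up with a down vCPU while runnable ones behind it starve, and how the skip added in the patch avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of an entry on the null scheduler's waitqueue. */
struct toy_vcpu {
    int id;
    bool down;               /* stands in for _VPF_down in pause_flags */
    struct toy_vcpu *next;   /* next entry on the waitqueue */
};

/*
 * Return the first waiting vCPU to hand to a free pCPU.  With skip_down
 * set, vCPUs that are down are passed over, so a down vCPU at the head of
 * the queue cannot keep runnable vCPUs behind it from being assigned.
 */
static struct toy_vcpu *pick_from_waitq(struct toy_vcpu *head, bool skip_down)
{
    struct toy_vcpu *v;

    for ( v = head; v != NULL; v = v->next )
    {
        if ( skip_down && v->down )
            continue;
        return v;
    }

    return NULL;
}
```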

 -George

* Re: [PATCH RFC v1 44/74] xen/pvshim: keep track of unused pages
  2018-01-08 11:04     ` Roger Pau Monné
@ 2018-01-08 11:22       ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 11:22 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Xen-devel, wei.liu2

>>> On 08.01.18 at 12:04, <roger.pau@citrix.com> wrote:
> On Mon, Jan 08, 2018 at 03:58:22AM -0700, Jan Beulich wrote:
>> >>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
>> > +static void __init init_memmap(void)
>> > +{
>> > +    unsigned int i;
>> > +
>> > +    mem = rangeset_new(NULL, "host memory map", 0);
>> > +    if ( !mem )
>> > +        panic("failed to allocate host memory rangeset");
>> 
>> "host" is meant from the perspective of the shim on itself here aiui,
>> not the underlying entity? I find using that term here at least
>> misleading.
> 
> Does "failed to allocate memory tracking rangeset" seem better?

Thinking of it, even the use of the word "memory" isn't necessarily
appropriate, as what you care about are just (un)used address
ranges. How about "in-use PFNs" and "failed to allocate PFN usage
rangeset" (or some such) respectively?

>> > +    /* Mark up to the last memory page (or 4GB) as RAM. */
>> > +    if ( rangeset_add_range(mem, 0, max_t(unsigned long, max_page,
>> > +                                          (GB(4) - 1) >> PAGE_SHIFT)) )
>> 
>> Don't you also need "max_page - 1" then? Also - why the
>> saturation to 4Gb?
> 
> There's the MMIO hole below 4GiB, and I wanted to prevent using memory
> from there. I know there can still be MMIO holes above 4GiB, but it's
> less likely.

All MMIO holes certainly will need taking care of here anyway,
sooner or later (and ideally without needing to scan PCI config
space). Since there's nothing else that could be there for a PVH
guest, I guess it might be reasonable to call the 4Gb related
logic here temporary then (in the patch description as well as
the comment)? What I'm concerned about here (without having
checked them) are simplistic PV environments like mini-os, which
I wouldn't be surprised to find assuming that all their addresses
fit into the low 4Gb.

Jan


* Re: [PATCH RFC v1 40/74] xen/x86: report domain id on cpuid
  2018-01-08 11:11       ` Jan Beulich
@ 2018-01-08 11:22         ` Andrew Cooper
  2018-01-08 11:27           ` Jan Beulich
  0 siblings, 1 reply; 206+ messages in thread
From: Andrew Cooper @ 2018-01-08 11:22 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, wei.liu2, Roger Pau Monne

On 08/01/18 11:11, Jan Beulich wrote:
>>>> On 08.01.18 at 11:34, <andrew.cooper3@citrix.com> wrote:
>> On 08/01/18 10:27, Jan Beulich wrote:
>>>>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
>>>> From: Roger Pau Monne <roger.pau@citrix.com>
>>>>
>>>> Use the ebx register of the hypervisor leaf 1. The eax register on
>>>> this leaf is already used to report the Xen major and minor versions.
>>> The rationale for doing this is missing. Iirc in past discussions the
>>> opinion was voiced (more than once, and iirc by Andrew and maybe
>>> others) that a domain in general shouldn't be told about its domain
>>> ID. Otherwise I also can't see why we don't have a hypercall for
>>> this, and e.g. XTF needs to go through hoops to figure it out. Are
>>> those arguments (which I don't recall) not applicable anymore?
>>>
>>> In the Amazon shim patches thread handing out the domain ID by
>>> command line option was suggested as an alternative, which then
>>> wouldn't affect other (non-shim) domains, or the client of the shim.
>> A guest's domid is unconditionally always available in xenstore, and is a
>> necessary part of any PV communication.
>>
>> Like it or not, domid is part of the guest's view of the Xen ABI.
>> Therefore, making it easily accessible is the best course of action
>> (especially as pv-shim deliberately doesn't interpose on the xenstore ring).
> All understood, yet you don't address the question about the
> background of your change of opinion here. Or am I
> misremembering that earlier on you were against exposing
> the domain ID?

In the past, I was concerned about how a guest can brute force its own
domid via leaky error conditions in some hypercalls.  I still think
these should be fixed.

Ideally, a guest wouldn't know its own domid, but we're 15 years too
late on that line of thought...

~Andrew

* Re: [PATCH RFC v1 45/74] x86/guest: use unpopulated memory to map the shared_info page
  2018-01-08 11:06     ` Roger Pau Monné
@ 2018-01-08 11:25       ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 11:25 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Xen-devel, wei.liu2

>>> On 08.01.18 at 12:06, <roger.pau@citrix.com> wrote:
> On Mon, Jan 08, 2018 at 04:03:50AM -0700, Jan Beulich wrote:
>> >>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
>> > This prevents hardcoding a known unpopulated memory page to map
>> > the shared info page. This fixes a TODO item in a previous patch.
>> > 
>> > Remove hypervisor_early_setup as now it is not required anymore.
>> > 
>> > Signed-off-by: Roger Pau Monne <roger.pau@citrix.com>
>> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
>> 
>> Looks good, but one question:
>> 
>> > @@ -187,15 +186,12 @@ static void __init init_memmap(void)
>> >      }
>> >  }
>> >  
>> > -void __init hypervisor_early_setup(struct e820map *e820)
>> > -{
>> > -    map_shared_info(e820);
>> > -}
>> > -
>> >  void __init hypervisor_setup(void)
>> >  {
>> >      init_memmap();
>> >  
>> > +    map_shared_info();
>> > +
>> >      init_evtchn();
>> >  }
>> 
>> If the shared info page isn't needed as early, why was it set up
>> that early originally?
> 
> Because during the setup of the shared_info the used memory address
> would also be added to the e820 in order to mark it as RAM. I don't
> think that was really required: there is no need for a page_info
> for the shared_info, because it's not shared with the guest.

Ah, I see.

Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan


* Re: [PATCH RFC v1 40/74] xen/x86: report domain id on cpuid
  2018-01-08 11:22         ` Andrew Cooper
@ 2018-01-08 11:27           ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 11:27 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, wei.liu2, Roger Pau Monne

>>> On 08.01.18 at 12:22, <andrew.cooper3@citrix.com> wrote:
> On 08/01/18 11:11, Jan Beulich wrote:
>>>>> On 08.01.18 at 11:34, <andrew.cooper3@citrix.com> wrote:
>>> On 08/01/18 10:27, Jan Beulich wrote:
>>>>>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
>>>>> From: Roger Pau Monne <roger.pau@citrix.com>
>>>>>
>>>>> Use the ebx register of the hypervisor leaf 1. The eax register on
>>>>> this leaf is already used to report the Xen major and minor versions.
>>>> The rationale for doing this is missing. Iirc in past discussions the
>>>> opinion was voiced (more than once, and iirc by Andrew and maybe
>>>> others) that a domain in general shouldn't be told about its domain
>>>> ID. Otherwise I also can't see why we don't have a hypercall for
>>>> this, and e.g. XTF needs to go through hoops to figure it out. Are
>>>> those arguments (which I don't recall) not applicable anymore?
>>>>
>>>> In the Amazon shim patches thread handing out the domain ID by
>>>> command line option was suggested as an alternative, which then
>>>> wouldn't affect other (non-shim) domains, or the client of the shim.
>>> A guest's domid is unconditionally always available in xenstore, and is a
>>> necessary part of any PV communication.
>>>
>>> Like it or not, domid is part of the guests view of the Xen ABI. 
>>> Therefore, making it easily accessible is the best course of action
>>> (especially as pv-shim deliberately doesn't interpose on the xenstore ring).
>> All understood, yet you don't address the question of the
>> background to your change of opinion here. Or am I
>> misremembering that earlier on you were against exposing
>> the domain ID?
> 
> In the past, I was concerned about how a guest can brute force its own
> domid via leaky error conditions in some hypercalls.  I still think
> these should be fixed.

I agree on that latter part.

Jan


* Re: [PATCH RFC v1 40/74] xen/x86: report domain id on cpuid
  2018-01-04 13:05 ` [PATCH RFC v1 40/74] xen/x86: report domain id on cpuid Wei Liu
  2018-01-08 10:27   ` Jan Beulich
@ 2018-01-08 11:29   ` Jan Beulich
  1 sibling, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 11:29 UTC (permalink / raw)
  To: Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> --- a/xen/include/public/arch-x86/cpuid.h
> +++ b/xen/include/public/arch-x86/cpuid.h
> @@ -57,7 +57,8 @@
>   * Leaf 2 (0x40000x01)
>   * EAX[31:16]: Xen major version.
>   * EAX[15: 0]: Xen minor version.
> - * EBX-EDX: Reserved (currently all zeroes).
> + * EBX: Domain id.
> + * ECX-EDX: Reserved (currently all zeroes).
>   */

There's one issue here, as I've noticed only now: How does a
caller distinguish domid being zero from the information not
being provided by an older hypervisor? I think there needs to
be a qualifying bit elsewhere.
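
A minimal sketch of the caller-side logic, assuming a qualifying bit were added: `HYPOTHETICAL_DOMID_PRESENT` and its position are made up for illustration (the patch defines no such flag), and the register values are passed in rather than read with the CPUID instruction so the decoding can be exercised on any machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical qualifying bit; the real ABI would have to define one. */
#define HYPOTHETICAL_DOMID_PRESENT (1u << 0)

/*
 * Decode the domain ID reported in EBX of leaf 0x40000x01 (as proposed
 * by the patch), using a separate feature word to tell "domid 0" apart
 * from "not reported by this hypervisor".
 * Returns true and stores the ID only when the hypervisor reported it.
 */
static bool decode_domid(uint32_t flags, uint32_t ebx, uint16_t *domid)
{
    if ( !(flags & HYPOTHETICAL_DOMID_PRESENT) )
        return false;          /* older hypervisor: EBX is a reserved zero */

    *domid = (uint16_t)ebx;    /* domid_t is 16 bits wide */
    return true;
}
```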

Jan


* Re: [PATCH RFC v1 36/74] --- x86/shim: Kconfig and command line options
  2018-01-08  8:22       ` Jan Beulich
@ 2018-01-08 11:33         ` Andrew Cooper
  2018-01-08 11:46           ` Jan Beulich
  0 siblings, 1 reply; 206+ messages in thread
From: Andrew Cooper @ 2018-01-08 11:33 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, wei.liu2

On 08/01/18 08:22, Jan Beulich wrote:
>>>> On 05.01.18 at 18:51, <andrew.cooper3@citrix.com> wrote:
>> On 05/01/18 15:26, Jan Beulich wrote:
>>>>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
>>>> --- a/xen/arch/x86/Kconfig
>>>> +++ b/xen/arch/x86/Kconfig
>>>> @@ -133,6 +133,28 @@ config PVH_GUEST
>>>>  	---help---
>>>>  	  Support booting using the PVH ABI.
>>>>  
>>>> +	  If unsure, say N.
>>>> +
>>>> +config PV_SHIM
>>>> +	def_bool n
>>>> +	prompt "PV Shim"
>>>> +	depends on PV && XEN_GUEST
>>>> +	---help---
>>>> +	  Build Xen with a mode which acts as a shim to allow PV guest to run
>>>> +	  in an HVM/PVH container. This mode can only be enabled with command
>>>> +	  line option.
>>>> +
>>>> +	  If unsure, say N.
>>>> +
>>>> +config PV_SHIM_EXCLUSIVE
>>>> +	def_bool n
>>>> +	prompt "PV Shim Exclusive"
>>>> +	depends on PV_SHIM
>>> My expectation so far was that this would be the only mode we
>>> target, hence I think at the very least the default wants to be y
>>> here.
>> Until proper out-of-tree Xen builds work, building the shim binary at
>> all is a PITA.
> Out-of-tree builds would certainly be nice to have, but I don't see
> the big issue with building a shim-only binary, and I have been
> explaining before how I build multiple distinct configurations from
> the same source tree: Rather than building in the actual source
> tree, establish a tree of symlinks back to the source tree, and
> build in there. You can create any number of such trees. I'd also
> expect this would eliminate some or all of the (I'm sorry) crude
> build logic you're introducing in one of the patches; at the very
> least I'm considering that patch so heavily draft/RFC that I didn't
> even mean to reply to it.

How would you suggest we build this then?  There seem to be no good options.

>
>> These defaults give a developer a single binary which is capable of
>> running natively or as the shim, which has made development far more
>> productive.  It's certainly the way I'd expect to do primary future
>> development of the shim.
> Interesting - the need to have the binary built under tools/firmware/
> to me is a clear indication that you'll need a separate .config for it
> anyway, so I can't see how building a universal binary will be of long
> term help.

It's in firmware, because of $(XENFIRMWAREDIR), which will be needed for
anyone packaging this for distros.

The separate .config is simply because we can.  There is no point
wasting RAM in production system by using a fully-fat Xen binary when we
can reasonably compile things out.

~Andrew

* Re: [PATCH RFC v1 36/74] --- x86/shim: Kconfig and command line options
  2018-01-08 11:33         ` Andrew Cooper
@ 2018-01-08 11:46           ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 11:46 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, wei.liu2

>>> On 08.01.18 at 12:33, <andrew.cooper3@citrix.com> wrote:
> On 08/01/18 08:22, Jan Beulich wrote:
>>>>> On 05.01.18 at 18:51, <andrew.cooper3@citrix.com> wrote:
>>> On 05/01/18 15:26, Jan Beulich wrote:
>>>>>>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
>>>>> --- a/xen/arch/x86/Kconfig
>>>>> +++ b/xen/arch/x86/Kconfig
>>>>> @@ -133,6 +133,28 @@ config PVH_GUEST
>>>>>  	---help---
>>>>>  	  Support booting using the PVH ABI.
>>>>>  
>>>>> +	  If unsure, say N.
>>>>> +
>>>>> +config PV_SHIM
>>>>> +	def_bool n
>>>>> +	prompt "PV Shim"
>>>>> +	depends on PV && XEN_GUEST
>>>>> +	---help---
>>>>> +	  Build Xen with a mode which acts as a shim to allow PV guest to run
>>>>> +	  in an HVM/PVH container. This mode can only be enabled with command
>>>>> +	  line option.
>>>>> +
>>>>> +	  If unsure, say N.
>>>>> +
>>>>> +config PV_SHIM_EXCLUSIVE
>>>>> +	def_bool n
>>>>> +	prompt "PV Shim Exclusive"
>>>>> +	depends on PV_SHIM
>>>> My expectation so far was that this would be the only mode we
>>>> target, hence I think at the very least the default wants to be y
>>>> here.
>>> Until proper out-of-tree Xen builds work, building the shim binary at
>>> all is a PITA.
>> Out-of-tree builds would certainly be nice to have, but I don't see
>> the big issue with building a shim-only binary, and I have been
>> explaining before how I build multiple distinct configurations from
>> the same source tree: Rather than building in the actual source
>> tree, establish a tree of symlinks back to the source tree, and
>> build in there. You can create any number of such trees. I'd also
>> expect this would eliminate some or all of the (I'm sorry) crude
>> build logic you're introducing in one of the patches; at the very
>> least I'm considering that patch so heavily draft/RFC that I didn't
>> even mean to reply to it.
> 
> How would you suggest we build this then?  There seem to be no good options.

The answer to this is (I think) implied by the answer below (taken
together with what I've said earlier and is still visible above).

>>> These defaults give a developer a single binary which is capable of
>>> running natively or as the shim, which has made development far more
>>> productive.  It's certainly the way I'd expect to do primary future
>>> development of the shim.
>> Interesting - the need to have the binary built under tools/firmware/
>> to me is a clear indication that you'll need a separate .config for it
>> anyway, so I can't see how building a universal binary will be of long
>> term help.
> 
> It's in firmware, because of $(XENFIRMWAREDIR), which will be needed for
> anyone packaging this for distros.

Yes and no. My general assumption on how to utilize this on older
Xen is to build the shim from 4.10 (or newer), which means the
two trees will be distinct anyway. Therefore I think the better
approach will be to hand a pre-built xen-shim to the tools part of
the build, just like you can hand in pre-built ("system") SeaBIOS
or the like. There is no need to introduce a sub-make whose
dependencies are only incompletely tracked.

> The separate .config is simply because we can.  There is no point
> wasting RAM in production system by using a fully-fat Xen binary when we
> can reasonably compile things out.

Of course, but that doesn't address the question I asked about the
long-term utility of the "all-in-one" binary.

Jan


* Re: [PATCH RFC v1 49/74] x86/guest: map per-cpu vcpu_info area.
  2018-01-04 13:06 ` [PATCH RFC v1 49/74] x86/guest: map per-cpu vcpu_info area Wei Liu
@ 2018-01-08 13:21   ` Jan Beulich
  2018-01-09 12:08     ` Roger Pau Monné
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 13:21 UTC (permalink / raw)
  To: Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> So that the limit of XEN_LEGACY_MAX_VCPUS can be lifted.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Should be moved earlier maybe?

Especially the changes to time.c undoing/redoing earlier changes
suggest so.

> --- a/xen/arch/x86/guest/xen.c
> +++ b/xen/arch/x86/guest/xen.c
> @@ -38,6 +38,10 @@ static struct rangeset *mem;
>  
>  DEFINE_PER_CPU(unsigned int, vcpu_id);
>  
> +static struct vcpu_info *vcpu_info;
> +unsigned long vcpu_info_mapped[BITS_TO_LONGS(NR_CPUS)];

static

> @@ -101,6 +105,38 @@ static void map_shared_info(void)
>          xchg(&XEN_shared_info->evtchn_mask[i], ~0ul);
>  }
>  
> +static int map_vcpuinfo(void)
> +{
> +    unsigned int vcpu = this_cpu(vcpu_id);
> +    struct vcpu_register_vcpu_info info = { };

I doubt you need the initializer here.

> +    long rc;
> +
> +    if ( !vcpu_info )
> +    {
> +        this_cpu(vcpu_info) = &XEN_shared_info->vcpu_info[vcpu];
> +        return 0;
> +    }
> +
> +    if ( test_bit(vcpu, vcpu_info_mapped) )
> +    {
> +        this_cpu(vcpu_info) = &vcpu_info[vcpu];
> +        return 0;
> +    }
> +
> +    info.mfn = virt_to_mfn(&vcpu_info[vcpu]);
> +    info.offset = (unsigned long)&vcpu_info[vcpu] & ~PAGE_MASK;
> +    rc = xen_hypercall_vcpu_op(VCPUOP_register_vcpu_info, vcpu, &info);
> +    if ( rc )
> +        this_cpu(vcpu_info) = &XEN_shared_info->vcpu_info[vcpu];

You need to avoid producing an out of bounds pointer here for
large vcpu values.
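
A minimal sketch of the bounds check, assuming the caller treats a NULL return as fatal (or caps the vCPU count); `legacy_vcpu_info()` is a hypothetical helper and the structure layouts are simplified stand-ins for the real public headers. Only the `XEN_LEGACY_MAX_VCPUS` value of 32 matches x86.

```c
#include <assert.h>
#include <stddef.h>

#define XEN_LEGACY_MAX_VCPUS 32   /* value used by the x86 public headers */

struct vcpu_info { unsigned char pad[64]; };   /* stand-in layout */

struct shared_info {
    struct vcpu_info vcpu_info[XEN_LEGACY_MAX_VCPUS];
    /* ... remaining fields omitted ... */
};

/*
 * Fallback used when VCPUOP_register_vcpu_info fails: the legacy array in
 * shared_info only has XEN_LEGACY_MAX_VCPUS slots, so forming
 * &shared->vcpu_info[vcpu] for a larger vcpu would be out of bounds.
 */
static struct vcpu_info *legacy_vcpu_info(struct shared_info *shared,
                                          unsigned int vcpu)
{
    if ( vcpu >= XEN_LEGACY_MAX_VCPUS )
        return NULL;              /* caller must handle this as an error */
    return &shared->vcpu_info[vcpu];
}
```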

> @@ -176,12 +211,34 @@ void __init hypervisor_setup(void)
>      map_shared_info();
>      set_vcpu_id();
>  
> +    vcpu_info = xzalloc_array(struct vcpu_info, nr_cpu_ids);
> +    if ( map_vcpuinfo() || !vcpu_info )
> +    {
> +        if ( vcpu_info )
> +        {
> +            xfree(vcpu_info);
> +            vcpu_info = NULL;
> +        }
> +        if ( nr_cpu_ids > XEN_LEGACY_MAX_VCPUS )

How about

    if ( map_vcpuinfo() )
    {
        xfree(vcpu_info);
        vcpu_info = NULL;
    }
    if ( !vcpu_info && nr_cpu_ids > XEN_LEGACY_MAX_VCPUS )
    {
        ...

?
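
The suggested flow can be sketched stand-alone as follows; `map_vcpuinfo()` is stubbed by a test knob, `calloc()`/`free()` stand in for `xzalloc_array()`/`xfree()`, and returning false marks where the real code would have to cap the vCPU count or panic. The unconditional free relies on freeing NULL being a no-op, which holds for free() and which the suggestion assumes for xfree() too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define XEN_LEGACY_MAX_VCPUS 32

struct vcpu_info { unsigned char pad[64]; };

static struct vcpu_info *vcpu_info;
static bool registration_fails;   /* test knob standing in for the hypercall */

/* Hypothetical stand-in for map_vcpuinfo(): nonzero on failure. */
static int map_vcpuinfo(void)
{
    return registration_fails ? -1 : 0;
}

/*
 * Free unconditionally on failure, then make a single decision based on
 * whether a per-vCPU area survived (allocation failure and registration
 * failure both end up with vcpu_info == NULL).
 */
static bool setup_vcpu_info(unsigned int nr_cpu_ids)
{
    vcpu_info = calloc(nr_cpu_ids, sizeof(*vcpu_info));

    if ( map_vcpuinfo() )
    {
        free(vcpu_info);
        vcpu_info = NULL;
    }

    if ( !vcpu_info && nr_cpu_ids > XEN_LEGACY_MAX_VCPUS )
        return false;   /* real code: limit CPUs or panic */

    return true;
}
```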

> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -533,11 +533,11 @@ static struct platform_timesource __initdata plt_tsc =
>   * Xen clock source is a variant of TSC source.
>   */
>  
> -DECLARE_PER_CPU(unsigned int, vcpu_id);
> +DECLARE_PER_CPU(struct vcpu_info *, vcpu_info);

I hadn't noticed the one being removed here before: neither should be
declared here; both belong in a header.

> @@ -107,6 +108,12 @@ static inline long xen_hypercall_hvm_op(unsigned int op, void *arg)
>      return _hypercall64_2(long, __HYPERVISOR_hvm_op, op, arg);
>  }
>  
> +static inline long xen_hypercall_vcpu_op(unsigned int cmd, unsigned int vcpu,

I believe "int" is sufficient here (and then also for the variable(s) into
which the return value is/are being latched).

Jan

* Re: [PATCH RFC v1 50/74] xen/pvshim: remove Dom0 kernel support check
  2018-01-04 13:06 ` [PATCH RFC v1 50/74] xen/pvshim: remove Dom0 kernel support check Wei Liu
@ 2018-01-08 13:28   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 13:28 UTC (permalink / raw)
  To: Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> From: Roger Pau Monne <roger.pau@citrix.com>
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>


* Re: [PATCH RFC v1 51/74] xen/pvshim: don't allow access to iomem or ioports
  2018-01-04 13:06 ` [PATCH RFC v1 51/74] xen/pvshim: don't allow access to iomem or ioports Wei Liu
@ 2018-01-08 13:29   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 13:29 UTC (permalink / raw)
  To: Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> From: Roger Pau Monne <roger.pau@citrix.com>
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>

Could perhaps be folded into patch 50, as both relate to not
being Dom0.

Jan

* Re: [PATCH RFC v1 52/74] xen: mark xenstore/console pages as RAM and add them to dom_io
  2018-01-04 13:06 ` [PATCH RFC v1 52/74] xen: mark xenstore/console pages as RAM and add them to dom_io Wei Liu
@ 2018-01-08 13:49   ` Jan Beulich
  2018-01-09  9:25     ` Roger Pau Monné
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 13:49 UTC (permalink / raw)
  To: Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> From: Roger Pau Monne <roger.pau@citrix.com>
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

There being no description at all makes it rather harder to review this
one. I assume that marking the pages as RAM is necessary to make
sure a struct page_info is being created for them, which in turn is a
prereq for sharing the pages.

> +void __init hypervisor_fixup_e820(struct e820map *e820)
> +{
> +    uint64_t pfn = 0;

I don't think initializers of this kind are necessary (there are several
instances of this).

> +    long rc;
> +
> +    if ( !xen_guest )
> +        return;
> +
> +#define MARK_PARAM_RAM(p) ({                    \
> +    rc = xen_hypercall_hvm_get_param(p, &pfn);  \
> +    if ( rc )                                   \
> +        panic("Unable to get " #p);             \

The text here is the same in all three instances - please make it
distinguishable, so one doesn't have to start guessing.
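
One possible way to do that, sketched with hypothetical function names: put `__func__` next to the stringified parameter, so the same `HVM_PARAM_*` failing in different macros produces different text. `snprintf()` into a buffer stands in for `panic()` so the output can be checked.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the failure message with the enclosing function name in front of
 * the stringified parameter.  `buf` must be an array so sizeof works.
 */
#define PARAM_FAIL_MSG(buf, p) \
    snprintf((buf), sizeof(buf), "%s: unable to get %s", __func__, #p)

static char msg[128];

/* Two hypothetical call sites querying the same parameter. */
static void fixup_e820_fails(void)  { PARAM_FAIL_MSG(msg, HVM_PARAM_STORE_PFN); }
static void init_memory_fails(void) { PARAM_FAIL_MSG(msg, HVM_PARAM_STORE_PFN); }
```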

> +void __init hypervisor_init_memory(void)
> +{
> +    uint64_t pfn = 0;
> +    long rc;
> +
> +    if ( !xen_guest )
> +        return;
> +
> +#define SHARE_PARAM(p) ({                                                   \
> +    rc = xen_hypercall_hvm_get_param(p, &pfn);                             \
> +    if ( rc )                                                               \
> +        panic("Unable to get " #p);                                         \
> +    share_xen_page_with_guest(mfn_to_page(pfn), dom_io, XENSHARE_writable); \

Why dom_io rather than the client domain? I ask all the more because
dom_io pages can only be mapped by privileged guests (and hence I
assume you need another tweak somewhere to make this work).

> +const unsigned long *__init hypervisor_reserved_pages(unsigned int *size)
> +{
> +    static unsigned long __initdata reserved_pages[2];
> +    uint64_t pfn = 0;
> +    long rc;
> +
> +    if ( !xen_guest )
> +        return NULL;
> +
> +    *size = 0;
> +
> +#define RESERVE_PARAM(p) ({                             \
> +    rc = xen_hypercall_hvm_get_param(p, &pfn);          \
> +    if ( rc )                                           \
> +        panic("Unable to get " #p);                     \
> +    reserved_pages[(*size)++] = pfn << PAGE_SHIFT;      \
> +})
> +    RESERVE_PARAM(HVM_PARAM_STORE_PFN);
> +    if ( !pv_console )
> +        RESERVE_PARAM(HVM_PARAM_CONSOLE_PFN);
> +#undef RESERVE_PARAM
> +
> +    return reserved_pages;
> +}

Afaict this happens much later than hypervisor_fixup_e820() -
can't you latch the PFNs into a file scope array there, and merely
return the information here, rather than re-invoking the
hypercalls? This would save at least one instance of the wrapper
macros.
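
A rough sketch of the latching idea; the index names, the stub hypercall, and the PFN values are all made up for illustration, and `PAGE_SHIFT` is assumed to be 12 (4 KiB pages).

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

enum { STORE_PFN, CONSOLE_PFN, NR_LATCHED };   /* hypothetical indices */

static uint64_t latched_pfn[NR_LATCHED];       /* filled once, early */
static unsigned int hypercalls;                /* counts stub invocations */

/* Stub standing in for xen_hypercall_hvm_get_param(); 0 on success. */
static int get_param_stub(unsigned int idx, uint64_t *pfn)
{
    hypercalls++;
    *pfn = 0xfeff0 + idx;                       /* made-up PFNs */
    return 0;
}

/* Early boot (cf. hypervisor_fixup_e820): query each param exactly once. */
static int latch_params(void)
{
    for ( unsigned int i = 0; i < NR_LATCHED; i++ )
        if ( get_param_stub(i, &latched_pfn[i]) )
            return -1;
    return 0;
}

/* Later (cf. hypervisor_reserved_pages): report what was latched. */
static unsigned int reserved_pages(unsigned long out[NR_LATCHED],
                                   int pv_console)
{
    unsigned int n = 0;

    out[n++] = latched_pfn[STORE_PFN] << PAGE_SHIFT;
    if ( !pv_console )
        out[n++] = latched_pfn[CONSOLE_PFN] << PAGE_SHIFT;
    return n;
}
```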

> --- a/xen/drivers/char/xen_pv_console.c
> +++ b/xen/drivers/char/xen_pv_console.c
> @@ -35,6 +35,8 @@ static evtchn_port_t cons_evtchn;
>  static serial_rx_fn cons_rx_handler;
>  static DEFINE_SPINLOCK(tx_lock);
>  
> +bool pv_console;

__read_mostly?

Jan

* Re: [PATCH RFC v1 53/74] xen/pvshim: modify Dom0 builder in order to build a DomU
  2018-01-04 13:06 ` [PATCH RFC v1 53/74] xen/pvshim: modify Dom0 builder in order to build a DomU Wei Liu
@ 2018-01-08 14:06   ` Jan Beulich
  2018-01-09 16:09     ` Roger Pau Monné
  2018-01-09  9:06   ` Jan Beulich
  1 sibling, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 14:06 UTC (permalink / raw)
  To: Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> From: Roger Pau Monne <roger.pau@citrix.com>
> 
> According to the PV ABI the initial virtual memory regions should
> contain the xenstore and console pages after the start_info. Fix this
> and add the pages to the p2m/m2p after the start_info page also.

I don't think "fix" is the right term here.

> --- a/xen/arch/x86/pv/dom0_build.c
> +++ b/xen/arch/x86/pv/dom0_build.c
> @@ -31,9 +31,8 @@
>  #define L3_PROT (BASE_PROT|_PAGE_DIRTY)
>  #define L4_PROT (BASE_PROT|_PAGE_DIRTY)
>  
> -static __init void dom0_update_physmap(struct domain *d, unsigned long pfn,
> -                                       unsigned long mfn,
> -                                       unsigned long vphysmap_s)
> +__init void dom0_update_physmap(struct domain *d, unsigned long pfn,

Please don't re-order type and annotation.

> @@ -443,9 +446,18 @@ int __init dom0_construct_pv(struct domain *d,
>      vstartinfo_start = round_pgup(vphysmap_end);
>      vstartinfo_end   = (vstartinfo_start +
>                          sizeof(struct start_info) +
> -                        sizeof(struct dom0_vga_console_info));
> +                        (pv_shim ? 0 : sizeof(struct dom0_vga_console_info)));

Why not move this addition ...

> -    vpt_start        = round_pgup(vstartinfo_end);
> +    if ( pv_shim )
> +    {
> +        vxenstore_start  = round_pgup(vstartinfo_end);
> +        vxenstore_end    = vxenstore_start + PAGE_SIZE;
> +        vconsole_start   = vxenstore_end;
> +        vconsole_end     = vconsole_start + PAGE_SIZE;
> +        vpt_start        = vconsole_end;
> +    }
> +    else
> +        vpt_start        = round_pgup(vstartinfo_end);

... into this "else" block.
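
A small stand-alone model of the suggested layout code; the start_info and VGA info sizes used in the check are arbitrary, and `compute_layout()` is a hypothetical helper rather than part of dom0_build.c. It also shows that in the shim case the two rings sit at fixed offsets (one and two pages) past the page-rounded end of start_info.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE 4096UL
#define round_pgup(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

struct layout {
    unsigned long startinfo_end, xenstore, console, pt_start;
};

/*
 * The VGA console info is only appended in the non-shim case, so its
 * size is added inside the "else" branch instead of via a conditional
 * expression in the start_info size computation.
 */
static struct layout compute_layout(unsigned long startinfo_start,
                                    unsigned long si_size,
                                    unsigned long vga_info_size,
                                    bool pv_shim)
{
    struct layout l = { .startinfo_end = startinfo_start + si_size };

    if ( pv_shim )
    {
        l.xenstore = round_pgup(l.startinfo_end);  /* page after start_info */
        l.console  = l.xenstore + PAGE_SIZE;       /* page after xenstore */
        l.pt_start = l.console + PAGE_SIZE;
    }
    else
    {
        l.startinfo_end += vga_info_size;
        l.pt_start = round_pgup(l.startinfo_end);
    }
    return l;
}
```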

> @@ -538,6 +550,8 @@ int __init dom0_construct_pv(struct domain *d,
>             " Init. ramdisk: %p->%p\n"
>             " Phys-Mach map: %p->%p\n"
>             " Start info:    %p->%p\n"
> +           " Xenstore ring: %p->%p\n"
> +           " Console ring:  %p->%p\n"
>             " Page tables:   %p->%p\n"
>             " Boot stack:    %p->%p\n"
>             " TOTAL:         %p->%p\n",
> @@ -545,6 +559,8 @@ int __init dom0_construct_pv(struct domain *d,
>             _p(vinitrd_start), _p(vinitrd_end),
>             _p(vphysmap_start), _p(vphysmap_end),
>             _p(vstartinfo_start), _p(vstartinfo_end),
> +           _p(vxenstore_start), _p(vxenstore_end),
> +           _p(vconsole_start), _p(vconsole_end),

I'm not convinced the extra verbosity is helpful - the pages are at
fixed offsets from start_info, which already is being logged.

> @@ -830,15 +847,20 @@ int __init dom0_construct_pv(struct domain *d,
>          strlcpy((char *)si->cmd_line, cmdline, sizeof(si->cmd_line));
>  
>  #ifdef CONFIG_VIDEO
> -    if ( fill_console_start_info((void *)(si + 1)) )
> +    if ( !pv_shim && fill_console_start_info((void *)(si + 1)) )
>      {
>          si->console.dom0.info_off  = sizeof(struct start_info);
>          si->console.dom0.info_size = sizeof(struct dom0_vga_console_info);
>      }
>  #endif
>  
> +    if ( pv_shim )
> +        pv_shim_setup_dom(d, l4start, v_start, vxenstore_start, vconsole_start,
> +                          vphysmap_start, si);

With fill_console_start_info() being given a stub in the !CONFIG_VIDEO
case (ideally right in the earlier patch), this could become

    if ( pv_shim )
        pv_shim_setup_dom(d, l4start, v_start, vxenstore_start, vconsole_start,
                          vphysmap_start, si);
    else if ( fill_console_start_info((void *)(si + 1)) )
    {
        si->console.dom0.info_off  = sizeof(struct start_info);
        si->console.dom0.info_size = sizeof(struct dom0_vga_console_info);
    }
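
A sketch of the stub, assuming `fill_console_start_info()` keeps an int-returning signature that takes the console info pointer; the counters in the hypothetical `setup_console()` merely stand in for `pv_shim_setup_dom()` and the `si->console.dom0` assignments.

```c
#include <assert.h>
#include <stdbool.h>

/* #define CONFIG_VIDEO */   /* left undefined here to exercise the stub */

struct dom0_vga_console_info;   /* opaque for this sketch */

#ifdef CONFIG_VIDEO
int fill_console_start_info(struct dom0_vga_console_info *ci);
#else
/* Stub: without video support there is never any console info to report. */
static inline int fill_console_start_info(struct dom0_vga_console_info *ci)
{
    (void)ci;
    return 0;
}
#endif

static int shim_setups, vga_fills;

/* The call site from the suggestion above, now free of #ifdef. */
static void setup_console(bool pv_shim, void *after_si)
{
    if ( pv_shim )
        shim_setups++;          /* stands in for pv_shim_setup_dom() */
    else if ( fill_console_start_info(after_si) )
        vga_fills++;            /* would set si->console.dom0.* */
}
```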

> +static void __init replace_va(struct domain *d, l4_pgentry_t *l4start,
> +                              unsigned long va, unsigned long mfn)

It doesn't look like you're replacing a VA here (and really: how could
you?), so how about "replace_va_mapping()"?

> +{
> +    struct page_info *page;
> +    l4_pgentry_t *pl4e;
> +    l3_pgentry_t *pl3e;
> +    l2_pgentry_t *pl2e;
> +    l1_pgentry_t *pl1e;
> +
> +    pl4e = l4start + l4_table_offset(va);
> +    pl3e = l4e_to_l3e(*pl4e);
> +    pl3e += l3_table_offset(va);
> +    pl2e = l3e_to_l2e(*pl3e);
> +    pl2e += l2_table_offset(va);
> +    pl1e = l2e_to_l1e(*pl2e);
> +    pl1e += l1_table_offset(va);
> +
> +    page = mfn_to_page(l1e_get_pfn(*pl1e));
> +    /* Free original page, will be replaced */
> +    put_page_and_type(page);
> +    free_domheap_pages(page, 0);

This looks bogus - free_domheap_pages() should be called by
the last put_page(), not directly. If that doesn't happen, I
would guess you need the usual PGC_allocated clearing logic
here.
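
To illustrate the idiom with a toy model (not Xen's `struct page_info`, which also tracks type references): the page is freed by whoever drops the last reference, and releasing the domain's ownership means clearing the allocated flag and dropping the reference it stood for. In Xen that is typically written as `test_and_clear_bit(_PGC_allocated, &page->count_info)` followed by `put_page()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page with a reference count and an "allocated" flag (cf. PGC_allocated). */
struct toy_page {
    unsigned long count;   /* references, including the allocation reference */
    bool allocated;
    bool freed;
};

static void toy_free(struct toy_page *pg) { pg->freed = true; }

/* Dropping the last reference is what frees the page; nobody frees directly. */
static void toy_put(struct toy_page *pg)
{
    assert(pg->count > 0);
    if ( --pg->count == 0 )
        toy_free(pg);
}

/*
 * Releasing a page the domain owns: clear the allocated flag and drop the
 * reference it represented, instead of calling the free routine by hand.
 */
static void toy_release(struct toy_page *pg)
{
    if ( pg->allocated )
    {
        pg->allocated = false;
        toy_put(pg);
    }
}
```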

> +void __init pv_shim_setup_dom(struct domain *d, l4_pgentry_t *l4start,
> +                              unsigned long va_start, unsigned long store_va,
> +                              unsigned long console_va, unsigned long vphysmap,
> +                              start_info_t *si)
> +{
> +    uint64_t param = 0;
> +    long rc;
> +
> +#define SET_AND_MAP_PARAM(p, si, va) ({                                        \
> +    rc = xen_hypercall_hvm_get_param(p, &param);                               \
> +    if ( rc )                                                                  \
> +        panic("Unable to get " #p "\n");                                       \
> +    (si) = param;                                                              \
> +    if ( va )                                                                 \
> +    {                                                                          \
> +        BUG_ON(unshare_xen_page_with_guest(mfn_to_page(param), dom_io));       \
> +        share_xen_page_with_guest(mfn_to_page(param), d, XENSHARE_writable);   \
> +        replace_va(d, l4start, va, param);                                     \
> +        dom0_update_physmap(d, (va - va_start) >> PAGE_SHIFT, param, vphysmap);\

PFN_DOWN()

> +    }                                                                          \
> +})
> +    SET_AND_MAP_PARAM(HVM_PARAM_STORE_PFN, si->store_mfn, store_va);
> +    SET_AND_MAP_PARAM(HVM_PARAM_STORE_EVTCHN, si->store_evtchn, 0);
> +    if ( !pv_console )
> +    {
> +        SET_AND_MAP_PARAM(HVM_PARAM_CONSOLE_PFN, si->console.domU.mfn,
> +                          console_va);
> +        SET_AND_MAP_PARAM(HVM_PARAM_CONSOLE_EVTCHN, si->console.domU.evtchn, 0);
> +    }
> +#undef SET_AND_MAP_PARAM

Here, even more than earlier on, it becomes rather desirable to move
the HVM_PARAM_ prefixes into the macro. But yes, I know at least
Andrew won't like it ...
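
For illustration, pasting the prefix back on inside the macro could look like this; `GET_PARAM`, the stub, and the two constant values are assumptions made for the sketch (the authoritative definitions are in Xen's public `hvm/params.h`). `##` rebuilds the full constant name while `#p` keeps a readable name for diagnostics.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values chosen for this sketch; see Xen's public hvm/params.h for the real ones. */
#define HVM_PARAM_STORE_PFN    1
#define HVM_PARAM_STORE_EVTCHN 2

static const char *last_name;

/* Stub for xen_hypercall_hvm_get_param(): records the name, fakes a value. */
static int get_param(unsigned int idx, const char *name, uint64_t *val)
{
    last_name = name;
    *val = 100 + idx;
    return 0;
}

/*
 * The caller writes GET_PARAM(STORE_PFN, &v); the macro pastes the
 * HVM_PARAM_ prefix back on with ## and uses #p for diagnostics.
 */
#define GET_PARAM(p, val) get_param(HVM_PARAM_ ## p, "HVM_PARAM_" #p, (val))
```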

Jan

* Re: [PATCH RFC v1 54/74] xen/pvshim: set correct domid value
  2018-01-04 13:06 ` [PATCH RFC v1 54/74] xen/pvshim: set correct domid value Wei Liu
@ 2018-01-08 14:17   ` Jan Beulich
  2018-01-09 16:27     ` Roger Pau Monné
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 14:17 UTC (permalink / raw)
  To: Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> @@ -94,6 +95,24 @@ void __init pv_shim_setup_dom(struct domain *d, l4_pgentry_t *l4start,
>  #undef SET_AND_MAP_PARAM
>  }
>  
> +void pv_shim_shutdown(uint8_t reason)
> +{
> +    /* XXX: handle suspend */
> +    xen_hypercall_shutdown(reason);
> +}

Does this really need to be an out-of-line function? But yes, the
todo item probably warrants it.

> +domid_t get_dom0_domid(void)

What a strange name - to me Dom0's domain ID can only ever be
zero.

> +{
> +    uint32_t eax, ebx, ecx, edx;
> +
> +    if ( !pv_shim )
> +        return 0;
> +
> +    cpuid(hypervisor_cpuid_base() + 1, &eax, &ebx, &ecx, &edx);
> +
> +    return ebx ?: 1;
> +}

Even though there is no other way to obtain the domain ID, returning 1
here is dangerous in case the client domain actually
means to use its domain ID instead of DOMID_SELF anywhere. At
the very least this should be stated clearly in the description
(serving as a hint that the CPUID change should be backported
by anyone wanting to use the shim on their hypervisors).

> @@ -576,11 +578,11 @@ static void noinline init_done(void)
>  
>      system_state = SYS_STATE_active;
>  
> +    domain_unpause_by_systemcontroller(dom0);
> +
>      /* MUST be done prior to removing .init data. */
>      unregister_init_virtual_region();
>  
> -    domain_unpause_by_systemcontroller(hardware_domain);

Why the re-ordering? Along the lines of the earlier comment,
using "dom0" as replacement (static) variable isn't very nice.
Please at least accompany its declaration by a comment.

Jan


* Re: [PATCH RFC v1 06/74] tools/libelf: fix elf notes check for PVH guest
  2018-01-04 14:37   ` Jan Beulich
@ 2018-01-08 15:34     ` Wei Liu
  2018-01-08 16:02       ` Jan Beulich
  0 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-08 15:34 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, wei.liu2

On Thu, Jan 04, 2018 at 07:37:20AM -0700, Jan Beulich wrote:
> >>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> > PVH only requires PHYS32_ENTRY to be set. Return immediately if that's
> > the case.
> 
> So I guess the bug(?) being fixed is that so far loader or guest_os,
> and xen_ver settings are also required. However, you fail to mention
> _why_ you think they're not required. I can sort of see this for
> loader and maybe guest_os, but for the Xen version this isn't as
> obvious, mainly because any arguments I can think of right now
> would equally apply to PV.

Got it from docs/misc/pvh.markdown. It doesn't state other notes are
required.

I'm not sure if xen_version (always "xen-3.0"?) will be meaningful or
useful. It is not the end of the world if we check it but we do need to
be careful to not break existing OSes (mini-os for one only sets
PHYS32_ENTRY for PVH mode but that's easy to fix).

> 
> > --- a/xen/common/libelf/libelf-dominfo.c
> > +++ b/xen/common/libelf/libelf-dominfo.c
> > @@ -381,6 +381,13 @@ static elf_errorstatus elf_xen_note_check(struct elf_binary *elf,
> >           return 0;
> >      }
> >  
> > +    /* PVH only requires one ELF note to be set */
> > +    if ( parms->phys_entry != UNSET_ADDR32 )
> > +    {
> > +        elf_msg(elf, "ELF: Found PVH image\n");
> > +        return 0;
> > +    }
> 
> If the other entries are of no interest for PVH, I think that this
> then calls for dropping their logging from pvh_load_kernel().

Sure.

> I'm also surprised that I can't find any use of any of the three
> values checked in libxc.

Libxc delegates the work to libelf AIUI.
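
Put as a stand-alone sketch, the check with the patch applied behaves roughly like this; the structure is a simplification of libelf's parsed parameters, and `UNSET_ADDR32` being all-ones is an assumption mirroring libelf rather than a quote of it.

```c
#include <assert.h>
#include <stdint.h>

#define UNSET_ADDR32 ((uint32_t)-1)   /* assumed sentinel, mirroring libelf */

/* Subset of parsed ELF note values relevant to this check (simplified). */
struct note_parms {
    uint32_t phys_entry;   /* XEN_ELFNOTE_PHYS32_ENTRY, if present */
    int has_loader;        /* XEN_ELFNOTE_LOADER seen */
    int has_guest_os;      /* XEN_ELFNOTE_GUEST_OS seen */
    int has_xen_ver;       /* XEN_ELFNOTE_XEN_VERSION seen */
};

/*
 * Returns 0 if the image is acceptable.  With the patch, a PVH image is
 * accepted on PHYS32_ENTRY alone; otherwise the PV requirements
 * (loader or guest_os, plus xen_ver) still apply.
 */
static int note_check(const struct note_parms *p)
{
    if ( p->phys_entry != UNSET_ADDR32 )
        return 0;                                   /* PVH image */

    if ( (!p->has_loader && !p->has_guest_os) || !p->has_xen_ver )
        return -1;                                  /* PV: notes missing */

    return 0;
}
```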

Wei.

* Re: [PATCH RFC v1 18/74] x86/link: Relocate program headers
  2018-01-05 11:20   ` Jan Beulich
@ 2018-01-08 15:43     ` Wei Liu
  2018-01-08 16:26       ` Jan Beulich
  0 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-08 15:43 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, wei.liu2, Xen-devel

On Fri, Jan 05, 2018 at 04:20:36AM -0700, Jan Beulich wrote:
> >>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> > From: Andrew Cooper <andrew.cooper3@citrix.com>
> > 
> > When the xen binary is loaded by libelf (in the future) we rely on the
> > elf loader to load the binary accordingly.
> 
> It would really help if it was said here what effect this has on the
> program headers - I can only guess that it'll make p_vaddr different
> from p_paddr.

The first version of this patch was written quite some time ago. If my
memory doesn't fail me, it is like what you said -- the p_vaddr and
p_paddr need to be different. I will double-check and update the commit
message.
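
One way to double-check that effect is to count the PT_LOAD program headers whose p_vaddr and p_paddr differ. A minimal sketch using the glibc/Linux `<elf.h>` types; `count_relocated_loads()` is a hypothetical helper, and reading the headers out of the binary is omitted.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Count PT_LOAD segments whose virtual (link-time) address differs from
 * the physical address an ELF loader would place them at. */
static size_t count_relocated_loads(const Elf64_Phdr *ph, size_t n)
{
    size_t i, count = 0;

    for ( i = 0; i < n; i++ )
        if ( ph[i].p_type == PT_LOAD && ph[i].p_vaddr != ph[i].p_paddr )
            count++;

    return count;
}
```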

> I'm also rather uncertain about the entry point
> change wrt various (and especially older) boot loaders.
> 

What (older) boot loaders do you have in mind?

Wei.

* Re: [PATCH RFC v1 20/74] x86: produce a binary that can be booted as PVH
  2018-01-05 11:39   ` Jan Beulich
@ 2018-01-08 15:59     ` Wei Liu
  2018-01-08 16:42       ` Jan Beulich
  2018-01-10 19:10     ` Wei Liu
  1 sibling, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-08 15:59 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, wei.liu2, Xen-devel

On Fri, Jan 05, 2018 at 04:39:33AM -0700, Jan Beulich wrote:
> >>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> Again I assume a description is still being intended to be written
> 
> > --- a/xen/arch/x86/Makefile
> > +++ b/xen/arch/x86/Makefile
> > @@ -75,6 +75,8 @@ efi-y := $(shell if [ ! -r $(BASEDIR)/include/xen/compile.h -o \
> >                        -O $(BASEDIR)/include/xen/compile.h ]; then \
> >                           echo '$(TARGET).efi'; fi)
> >  
> > +shim-$(CONFIG_PVH_GUEST) := $(TARGET)-shim
> > +
> >  ifneq ($(build_id_linker),)
> >  notes_phdrs = --notes
> >  else
> > @@ -93,7 +95,7 @@ endif
> >  syms-warn-dup-y := --warn-dup
> >  syms-warn-dup-$(CONFIG_SUPPRESS_DUPLICATE_SYMBOL_WARNINGS) :=
> >  
> > -$(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32
> > +$(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32 $(shim-y)
> >  	./boot/mkelf32 $(notes_phdrs) $(TARGET)-syms $(TARGET) $(XEN_IMG_OFFSET) \
> >  	               `$(NM) $(TARGET)-syms | sed -ne 's/^\([^ ]*\) . __2M_rwdata_end$$/0x\1/p'`
> 
> Hmm, so you mean to build shim and "normal" Xen at the same time,
> with all the same objects? That's rather unexpected following the
> earlier exchange Andrew and I had. I would expect the shim to not
> require quite a few bits and pieces, and hence wanting to be built
> independently.
> 

There is a later patch in this series to link xen under tools/firmware/
to build the shim there, which would need a build system patch like this.

This can be cleaned up somehow. At the time I wasn't sure how best to
proceed (and certainly didn't take part in the discussion between Andrew
and you).

Suggestions welcome.

> > @@ -144,6 +146,11 @@ $(TARGET)-syms: prelink.o xen.lds $(BASEDIR)/common/symbols-dummy.o
> >  		>$(@D)/$(@F).map
> >  	rm -f $(@D)/.$(@F).[0-9]*
> >  
> > +# Use elf32-x86-64 if toolchain support exists, elf32-i386 otherwise.
> > +$(TARGET)-shim: FORMAT = $(firstword $(filter elf32-x86-64,$(shell $(OBJCOPY) --help)) elf32-i386)
> 
> What are the implications of using one vs the other? If elf32-i386
> works, why not use it all the time?
> 

Not sure, Andrew made this change. I will leave this to him.
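
For reference, the FORMAT expression prefers elf32-x86-64 when that exact word appears in the `objcopy --help` output and otherwise falls back to elf32-i386. The same selection restated in C, purely for illustration (`pick_format()` is hypothetical):

```c
#include <assert.h>
#include <string.h>

/* Mirror of $(firstword $(filter elf32-x86-64,$(HELP)) elf32-i386):
 * return "elf32-x86-64" if it appears as a whole whitespace-separated
 * word in the help text, otherwise "elf32-i386". */
static const char *pick_format(const char *help)
{
    static const char want[] = "elf32-x86-64";
    const size_t len = sizeof(want) - 1;
    const char *p = help;

    while ( *p )
    {
        size_t word;

        p += strspn(p, " \t\n");       /* skip separators */
        word = strcspn(p, " \t\n");    /* length of the current word */
        if ( word == len && strncmp(p, want, len) == 0 )
            return want;
        p += word;
    }

    return "elf32-i386";
}
```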

> > @@ -374,6 +375,15 @@ cs32_switch:
> >          /* Jump to earlier loaded address. */
> >          jmp     *%edi
> >  
> > +
> > +#ifdef CONFIG_PVH_GUEST
> 
> No double blank lines please.
> 
> > +ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_ENTRY, .long sym_offs(__pvh_start))
> > +
> > +__pvh_start:
> > +        ud2a
> > +
> > +#endif /* CONFIG_PVH_GUEST */
> > +
> >  __start:
> 
> Does the new code strictly need to live here? Can't it be kept both
> out of the resulting binary sequence currently resulting here and
> out of this source file altogether (by introducing a new pvh.S or
> shim.S)?
> 

We can use a new source file.
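
For reference, a sketch of the note that `ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_ENTRY, .long sym_offs(__pvh_start))` should produce, assuming the usual 4-byte ELF note alignment. The struct and helper names are hypothetical; only the note type value is taken from Xen's public elfnote.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XEN_ELFNOTE_PHYS32_ENTRY 18   /* value from public/elfnote.h */

/* ELF note header, 4-byte padded name "Xen", then a 32-bit descriptor
 * holding the physical entry point. */
struct pvh_entry_note {
    uint32_t namesz;    /* sizeof("Xen") == 4 */
    uint32_t descsz;    /* sizeof(uint32_t) == 4 */
    uint32_t type;      /* XEN_ELFNOTE_PHYS32_ENTRY */
    char     name[4];   /* "Xen\0", already 4-byte aligned */
    uint32_t entry;     /* physical address of __pvh_start */
};

static struct pvh_entry_note make_pvh_note(uint32_t entry)
{
    struct pvh_entry_note n = {
        .namesz = sizeof("Xen"),
        .descsz = sizeof(uint32_t),
        .type   = XEN_ELFNOTE_PHYS32_ENTRY,
        .entry  = entry,
    };

    memcpy(n.name, "Xen", sizeof("Xen"));
    return n;
}
```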

> > --- a/xen/arch/x86/xen.lds.S
> > +++ b/xen/arch/x86/xen.lds.S
> > @@ -34,7 +34,7 @@ OUTPUT_ARCH(i386:x86-64)
> >  PHDRS
> >  {
> >    text PT_LOAD ;
> > -#if defined(BUILD_ID) && !defined(EFI)
> > +#if (defined(BUILD_ID) && !defined(EFI)) || defined (CONFIG_PVH_GUEST)
> 
> Did you mean
> 
> #if (defined(BUILD_ID) || defined(CONFIG_PVH_GUEST)) && !defined(EFI)
> 
> ? Of course this would be moot if main and shim binary were to
> be built independently.
> 
> Also - stray blank.
> 
> > @@ -128,6 +128,12 @@ SECTIONS
> >         __param_end = .;
> >    } :text
> >  
> > +#if defined(CONFIG_PVH_GUEST) && !defined(EFI)
> 
> The EFI part here then also wouldn't be necessary, afaict.
> 

My goal was to include the note section when building the shim. Your
comment looks correct to me. I will clean this up in the next version.
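
The posted and suggested preprocessor conditions differ only when both EFI and CONFIG_PVH_GUEST are defined. A small truth-table check in C (the function names are hypothetical) makes that explicit:

```c
#include <assert.h>
#include <stdbool.h>

/* As posted: the note program header is also emitted for the EFI build
 * whenever CONFIG_PVH_GUEST is set. */
static bool posted(bool build_id, bool efi, bool pvh)
{
    return (build_id && !efi) || pvh;
}

/* As suggested in review: never emitted for the EFI build. */
static bool suggested(bool build_id, bool efi, bool pvh)
{
    return (build_id || pvh) && !efi;
}
```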

Wei.

* Re: [PATCH RFC v1 06/74] tools/libelf: fix elf notes check for PVH guest
  2018-01-08 15:34     ` Wei Liu
@ 2018-01-08 16:02       ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 16:02 UTC (permalink / raw)
  To: wei.liu2; +Cc: Xen-devel

>>> On 08.01.18 at 16:34, <wei.liu2@citrix.com> wrote:
> On Thu, Jan 04, 2018 at 07:37:20AM -0700, Jan Beulich wrote:
>> >>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
>> I'm also surprised that I can't find any use of any of the three
>> values checked in libxc.
> 
> Libxc delegates the work to libelf AIUI.

But libelf doesn't itself do anything with e.g. xen_ver, afaics.

Jan


* Re: [PATCH RFC v1 55/74] xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU
  2018-01-04 13:06 ` [PATCH RFC v1 55/74] xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU Wei Liu
@ 2018-01-08 16:05   ` Jan Beulich
  2018-01-08 16:22     ` Roger Pau Monné
  2018-01-09 17:50     ` Anthony Liguori
  2018-01-09  7:49   ` Jan Beulich
  1 sibling, 2 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 16:05 UTC (permalink / raw)
  To: Anthony Liguori, Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> From: Roger Pau Monne <roger.pau@citrix.com>
> 
> Note that the unmask and the virq operations are handled by the shim
> itself, and that FIFO event channels are not exposed to the guest.
> 
> Signed-off-by: Anthony Liguori <aliguori@amazon.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>

In RFC state this certainly doesn't matter yet, but generally I'd
expect From: to match the first S-o-b.

> @@ -155,11 +156,31 @@ static void set_vcpu_id(void)
>  static void xen_evtchn_upcall(struct cpu_user_regs *regs)
>  {
>      struct vcpu_info *vcpu_info = this_cpu(vcpu_info);
> +    unsigned long pending;
>  
>      vcpu_info->evtchn_upcall_pending = 0;
> -    xchg(&vcpu_info->evtchn_pending_sel, 0);
> +    pending = xchg(&vcpu_info->evtchn_pending_sel, 0);
>  
> -    pv_console_rx(regs);
> +    while ( pending )
> +    {
> +        unsigned int l1 = ffsl(pending) - 1;

find_first_set_bit() would look to be the better match here (and
below), not the least because it translates (on capable hardware)
to TZCNT instead of BSF.

> +        unsigned long evtchn = xchg(&XEN_shared_info->evtchn_pending[l1], 0);
> +
> +        __clear_bit(l1, &pending);
> +        evtchn &= ~XEN_shared_info->evtchn_mask[l1];
> +        while ( evtchn )
> +        {
> +            unsigned int port = ffsl(evtchn) - 1;
> +
> +            __clear_bit(port, &evtchn);
> +            port += l1 * BITS_PER_LONG;

What about a 32-bit client? If that's not intended to be supported,
building of such a guest should be prevented (in dom0_build.c).
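
Leaving the 32-bit question aside, the two-level scan in the hunk above can be sketched in plain C. This is an illustrative model, not the shim code: `__builtin_ctzl()` (GCC/Clang) stands in for find_first_set_bit(), and the hypothetical `scan_pending()` collects port numbers instead of injecting events.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* 'sel' selects words of 'pending'; within each word, masked ports are
 * skipped.  Pending, unmasked port numbers go to 'out'; returns count. */
static size_t scan_pending(unsigned long sel, unsigned long *pending,
                           const unsigned long *mask, unsigned int *out)
{
    size_t n = 0;

    while ( sel )
    {
        unsigned int l1 = __builtin_ctzl(sel);   /* lowest selected word */
        unsigned long bits = pending[l1];

        pending[l1] = 0;       /* the patch uses xchg() for this */
        sel &= sel - 1;        /* clear the lowest set bit */
        bits &= ~mask[l1];

        while ( bits )
        {
            unsigned int l2 = __builtin_ctzl(bits);

            bits &= bits - 1;
            out[n++] = l1 * BITS_PER_LONG + l2;
        }
    }

    return n;
}
```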

> @@ -63,6 +65,31 @@ static void __init replace_va(struct domain *d, l4_pgentry_t *l4start,
>                                                        : COMPAT_L1_PROT));
>  }
>  
> +static void evtchn_reserve(struct domain *d, unsigned int port)

const (perhaps also for other helpers below)?

> @@ -101,6 +133,233 @@ void pv_shim_shutdown(uint8_t reason)
>      xen_hypercall_shutdown(reason);
>  }
>  
> +long pv_shim_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> +{
> +    struct domain *d = current->domain;
> +    long rc;
> +
> +    switch ( cmd )
> +    {
> +#define EVTCHN_FORWARD(cmd, port_field)                                 \
> +case EVTCHNOP_##cmd: {                                                  \
> +    struct evtchn_##cmd op;                                             \

I think this whole macro body would better be indented one more
level, matching up with actual indentation at this point.

> +                                                                        \
> +    if ( copy_from_guest(&op, arg, 1) != 0 )                            \
> +        return -EFAULT;                                                 \
> +                                                                        \
> +    rc = xen_hypercall_event_channel_op(EVTCHNOP_##cmd, &op);           \
> +    if ( rc )                                                           \
> +        break;                                                          \
> +                                                                        \
> +    spin_lock(&d->event_lock);                                          \

Would the lock better be acquired already before the hypercall
above?
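
For illustration, the forward-then-mirror pattern under discussion looks roughly like this. Everything here is a hypothetical model: two arrays stand in for L0's and the shim's port spaces, and `forward_alloc()` mirrors the macro's rollback via EVTCHNOP_close when the local reservation fails.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PORTS 64

static bool l0_used[NR_PORTS];    /* ports allocated in L0 Xen */
static bool shim_used[NR_PORTS];  /* ports reserved in the shim */

/* Stand-in for EVTCHNOP_alloc_unbound forwarded to L0. */
static int l0_alloc(void)
{
    for ( int p = 1; p < NR_PORTS; p++ )   /* port 0 is never handed out */
    {
        if ( !l0_used[p] )
        {
            l0_used[p] = true;
            return p;
        }
    }
    return -1;
}

/* Stand-in for EVTCHNOP_close forwarded to L0. */
static void l0_close(int port)
{
    l0_used[port] = false;
}

/* Stand-in for evtchn_allocate_port() in the shim. */
static int shim_reserve(int port)
{
    if ( shim_used[port] )
        return -1;
    shim_used[port] = true;
    return 0;
}

/* Forward first, then mirror locally; undo in L0 on failure so both
 * views of the port space stay consistent. */
static int forward_alloc(void)
{
    int port = l0_alloc();

    if ( port < 0 )
        return port;

    /* In the shim, d->event_lock is held around this local step. */
    if ( shim_reserve(port) )
    {
        l0_close(port);
        return -1;
    }

    return port;
}
```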

> +    rc = evtchn_allocate_port(d, op.port_field);                        \
> +    if ( rc )                                                           \
> +    {                                                                   \
> +        struct evtchn_close close = {                                   \
> +            .port = op.port_field,                                      \
> +        };                                                              \
> +                                                                        \
> +        BUG_ON(xen_hypercall_event_channel_op(EVTCHNOP_close, &close)); \
> +    }                                                                   \
> +    else                                                                \
> +        evtchn_reserve(d, op.port_field);                               \
> +    spin_unlock(&d->event_lock);                                        \
> +                                                                        \
> +    if ( !rc && __copy_to_guest(arg, &op, 1) )                          \
> +        rc = -EFAULT;                                                   \
> +                                                                        \
> +    break;                                                              \
> +    }
> +    EVTCHN_FORWARD(alloc_unbound, port)
> +    EVTCHN_FORWARD(bind_interdomain, local_port)
> +#undef EVTCHN_FORWARD
> +
> +    case EVTCHNOP_bind_virq: {
> +        struct evtchn_bind_virq virq;
> +        struct evtchn_alloc_unbound alloc = {
> +            .dom = DOMID_SELF,
> +            .remote_dom = DOMID_SELF,
> +        };
> +
> +        if ( copy_from_guest(&virq, arg, 1) != 0 )
> +            return -EFAULT;
> +        /*
> +         * The event channel space is actually controlled by L0 Xen, so
> +         * allocate a port from L0 and then force the VIRQ to be bound to that
> +         * specific port.
> +         *
> +         * This is only required for VIRQ because the rest of the event channel
> +         * operations are handled directly by L0.
> +         */
> +        rc = xen_hypercall_event_channel_op(EVTCHNOP_alloc_unbound, &alloc);
> +        if ( rc )
> +           break;
> +
> +        /* Force L1 to use the event channel port allocated on L0. */
> +        rc = evtchn_bind_virq(&virq, alloc.port);
> +        if ( rc )
> +        {
> +             struct evtchn_close free = {

Why is this not named "close", like the other one? Perhaps a single
function wide instance of this structure would suffice?

> +                .port = alloc.port,
> +             };
> +
> +              xen_hypercall_event_channel_op(EVTCHNOP_close, &free);
> +        }
> +
> +        if ( !rc && __copy_to_guest(arg, &virq, 1) )
> +            rc = -EFAULT;
> +
> +        break;
> +    }
> +    case EVTCHNOP_status: {

Blank lines between non-fall-through case blocks please.

> +        struct evtchn_status status;
> +
> +        if ( copy_from_guest(&status, arg, 1) != 0 )
> +            return -EFAULT;
> +
> +        if ( port_is_valid(d, status.port) && evtchn_handled(d, status.port) )

Please be consistent with the validity checks: Compare this one
with ...

> +            rc = evtchn_status(&status);
> +        else
> +            rc = xen_hypercall_event_channel_op(EVTCHNOP_status, &status);
> +
> +        break;
> +    }
> +    case EVTCHNOP_bind_vcpu: {
> +        struct evtchn_bind_vcpu vcpu;
> +
> +        if ( copy_from_guest(&vcpu, arg, 1) != 0 )
> +            return -EFAULT;
> +
> +        if ( !port_is_valid(d, vcpu.port) )
> +            return -EINVAL;
> +
> +        if ( evtchn_handled(d, vcpu.port) )

... the one here. Or otherwise add a comment clarifying why they
are being done differently.
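
One way to keep the two paths consistent would be to route both through a single predicate. A hypothetical sketch (`shim_handles()` and the single-argument stand-ins are not shim code); note that it treats an invalid port as "forward to L0" rather than returning -EINVAL, which is the policy choice that would need deciding or documenting.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PORTS 64

static bool valid[NR_PORTS];    /* stand-in for port_is_valid() state */
static bool handled[NR_PORTS];  /* stand-in for evtchn_handled() state */

static bool port_is_valid(unsigned int port)
{
    return port < NR_PORTS && valid[port];
}

static bool evtchn_handled(unsigned int port)
{
    return handled[port];
}

/* Shared by EVTCHNOP_status and EVTCHNOP_bind_vcpu: true means the shim
 * handles the port itself, false means the op is forwarded to L0. */
static bool shim_handles(unsigned int port)
{
    return port_is_valid(port) && evtchn_handled(port);
}
```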

> +    case EVTCHNOP_bind_ipi: {
> +        struct evtchn_bind_ipi ipi;
> +
> +        if ( copy_from_guest(&ipi, arg, 1) != 0 )
> +            return -EFAULT;
> +
> +        rc = xen_hypercall_event_channel_op(EVTCHNOP_bind_ipi, &ipi);
> +        if ( rc )
> +            break;
> +
> +        spin_lock(&d->event_lock);
> +        rc = evtchn_allocate_port(d, ipi.port);
> +        if ( rc )
> +        {
> +            struct evtchn_close close = {
> +                .port = ipi.port,
> +            };
> +
> +            /*
> +             * If closing the event channel port also fails there's not
> +             * much the shim can do, since it has been unable to reserve
> +             * the port in its event channel space.
> +             */
> +            BUG_ON(xen_hypercall_event_channel_op(EVTCHNOP_close, &close));

A similar BUG_ON() further up went without comment, which I think
would be fine here too.

> +    case EVTCHNOP_unmask: {
> +        struct evtchn_unmask unmask;
> +
> +        if ( copy_from_guest(&unmask, arg, 1) != 0 )
> +            return -EFAULT;
> +
> +        /* Unmask is handled in L1 */
> +        rc = evtchn_unmask(unmask.port);
> +
> +        break;
> +    }

Is this really sufficient, without handing anything through to L0?
Perhaps it's fine as long as there's no pass-through support here.

> +    default:
> +        /* No FIFO or PIRQ support for now */
> +        rc = -ENOSYS;

-EOPNOTSUPP please.

> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -63,6 +63,8 @@ struct domain *domain_list;
>  
>  struct domain *hardware_domain __read_mostly;
>  
> +struct domain *pv_domain __read_mostly;

This shouldn't really live in common code, and even less so outside
of any #ifdef (same for its declaration being placed in a common
header).

> @@ -395,6 +397,11 @@ struct domain *domain_create(domid_t domid, unsigned int domcr_flags,
>          rcu_assign_pointer(*pd, d);
>          rcu_assign_pointer(domain_hash[DOMAIN_HASH(domid)], d);
>          spin_unlock(&domlist_update_lock);
> +
> +#ifdef CONFIG_X86
> +        if ( pv_shim )
> +            pv_domain = d;
> +#endif

I assume this #ifdef could be more restrictive.

> @@ -345,13 +365,13 @@ static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind)
>  }
>  
>  
> -static long evtchn_bind_virq(evtchn_bind_virq_t *bind)
> +int evtchn_bind_virq(evtchn_bind_virq_t *bind, int port)

evtchn_port_t please (also in evtchn_allocate_port()), and ...

>  {
>      struct evtchn *chn;
>      struct vcpu   *v;
>      struct domain *d = current->domain;
> -    int            port, virq = bind->virq, vcpu = bind->vcpu;
> -    long           rc = 0;
> +    int            virq = bind->virq, vcpu = bind->vcpu;
> +    int            rc = 0;
>  
>      if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) )
>          return -EINVAL;
> @@ -368,7 +388,12 @@ static long evtchn_bind_virq(evtchn_bind_virq_t *bind)
>      if ( v->virq_to_evtchn[virq] != 0 )
>          ERROR_EXIT(-EEXIST);
>  
> -    if ( (port = get_free_port(d)) < 0 )
> +    if ( port >= 0 )

... use zero as the "please allocate" indicator here (and in the
respective caller).
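
A minimal sketch of the suggested convention: 0 means "allocate a port" (port 0 is never a valid bound port), and a non-zero value means "use the port already obtained from L0". `bind_at()` and its port table are hypothetical simplifications of evtchn_bind_virq().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t evtchn_port_t;   /* matches Xen's public typedef */

#define NR_PORTS 32

static bool used[NR_PORTS] = { true };   /* port 0 is reserved */

static int get_free_port(void)
{
    for ( evtchn_port_t p = 1; p < NR_PORTS; p++ )
        if ( !used[p] )
            return p;
    return -1;                            /* -ENOSPC in Xen */
}

/* port == 0: allocate a free port (normal path).
 * port != 0: bind exactly that port (shim path). */
static int bind_at(evtchn_port_t port)
{
    if ( port == 0 )
    {
        int p = get_free_port();

        if ( p < 0 )
            return p;
        port = p;
    }
    else if ( port >= NR_PORTS || used[port] )
        return -1;                        /* -EINVAL / -EEXIST in Xen */

    used[port] = true;
    return port;
}
```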

> @@ -511,7 +536,7 @@ static long evtchn_bind_pirq(evtchn_bind_pirq_t *bind)
>  }
>  
>  
> -static long evtchn_close(struct domain *d1, int port1, bool_t guest)
> +long evtchn_close(struct domain *d1, int port1, bool guest)

Convert return type to "int" at the same time?

> @@ -839,7 +864,7 @@ static void clear_global_virq_handlers(struct domain *d)
>      }
>  }
>  
> -static long evtchn_status(evtchn_status_t *status)
> +long evtchn_status(evtchn_status_t *status)

Same here.

> @@ -1030,6 +1055,11 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>  {
>      long rc;
>  
> +#ifdef CONFIG_X86
> +    if ( pv_shim )
> +        return pv_shim_event_channel_op(cmd, arg);
> +#endif

Patch it right into the hypercall table instead?

Jan

* Re: [PATCH RFC v1 00/74] Run PV guest in PVH container
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (73 preceding siblings ...)
  2018-01-04 13:06 ` [PATCH RFC v1 74/74] libxl: pvshim: Set video_memkb to ~0 Wei Liu
@ 2018-01-08 16:12 ` Ian Jackson
  2018-01-11 15:39   ` Ian Jackson
  2018-01-10 16:26 ` George Dunlap
  75 siblings, 1 reply; 206+ messages in thread
From: Ian Jackson @ 2018-01-08 16:12 UTC (permalink / raw)
  To: Wei Liu; +Cc: Xen-devel, George Dunlap, Andrew Cooper, Roger Pau Monné

Wei Liu writes ("[Xen-devel] [PATCH RFC v1 00/74] Run PV guest in PVH container"):
> 1. ARM build and some Clang build are broken by this series.
> 2. The host will see a lot over-allocation messages, nothing too harmful and
>    will be fixed once toolstack is ready.

My revised toolstack part of this, which I passed to Wei on Friday,
uses a different approach.  The caller must specify type=pvh
and some additional pvshim options (minimally, they must set a new
defbool to true).

Since we are intending to backport something like this, I thought I
would write down my API/ABI compatibility analysis, and our
discussions as I remember them.

I'm talking here about the libxl ABI/API; xl's config file handling is
easier and I won't analyse it here.


Xen 4.10
========

Backport strategy:
------------------

The ABI/API changes at the libxl layer are a few new fields in
b_info.u.pvh.  That substructure is empty in earlier 4.10.  It's part
of a union, so the overall struct size and layout does not increase.
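
To illustrate why this is safe: a union is as large as its largest member (rounded up to the strictest alignment), so adding fields to u.pvh leaves the union's size and layout unchanged as long as u.pvh stays smaller than the largest member. A sketch with hypothetical cut-down structs:

```c
#include <assert.h>

/* Hypothetical, cut-down members of the b_info.u union. */
struct hvm_info { char opaque[256]; };           /* largest member */
struct pv_info  { const char *kernel; unsigned long slack_memkb; };
struct pvh_old  { char unused; };                /* stands in for empty u.pvh */
struct pvh_new  {                                /* u.pvh with shim fields */
    int pvshim;
    const char *pvshim_path;
    const char *pvshim_cmdline;
};

union u_old { struct hvm_info hvm; struct pv_info pv; struct pvh_old pvh; };
union u_new { struct hvm_info hvm; struct pv_info pv; struct pvh_new pvh; };
```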

Old callers with new libxl on 4.10:
-----------------------------------

If the caller is creating a guest other than a PVH one there is no
change to the ABI.

If a caller creating a PVH guest uses the proper
libxl_*init* functions, those new fields will be initialised to the
default values.  The default is to disable the new mode (this is right
because we don't want to turn existing PVH guests into PV-in-PVH
ones).  With the patch as I passed to Wei, the extra string
parameters are initialised unconditionally.  This is not desirable
in the backport.

 => code change: the backport should set the default non-NULL
    values only if the pvhshim boolean is true after defaulting
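
A sketch of that defaulting order, modelled on libxl's tri-state defbool but using hypothetical names throughout. The default path and command line are placeholders; real libxl would allocate the strings rather than point at literals.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DEFBOOL_DEFAULT, DEFBOOL_FALSE, DEFBOOL_TRUE } defbool;

struct pvh_shim_info {
    defbool pvshim;               /* new field; *init* leaves it DEFAULT */
    const char *pvshim_path;      /* NULL unless the shim is in use */
    const char *pvshim_cmdline;
};

static void pvh_setdefault(struct pvh_shim_info *p)
{
    /* Existing PVH guests must not silently become PV-in-PVH. */
    if ( p->pvshim == DEFBOOL_DEFAULT )
        p->pvshim = DEFBOOL_FALSE;

    /* Only fill in the shim strings once the shim is known to be on. */
    if ( p->pvshim == DEFBOOL_TRUE )
    {
        if ( !p->pvshim_path )
            p->pvshim_path = "xen-shim";
        if ( !p->pvshim_cmdline )
            p->pvshim_cmdline = "pv-shim console=xen,pv";
    }
}
```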

When the caller disposes of a domain config, the libxl dispose function
will free any of these fields.

If the caller did not call libxl_*init*, nor initialise the struct
with memset, but _does_ call libxl_*dispose*, the dispose will read
uninitialised memory and crash.  However, that is not a supported
approach and such a caller would have to manually initialise all the
myriad fields instead of calling *init* so it seems unlikely.

If the caller is examining existing guests: The returned struct will
have additional information, for any PVH guests (including PV-in-PVH
ones) which will be ignored by the caller.  The caller may
misunderstand and misreport the guest type.  The additional
information may come from the heap, but *dispose* will free it, so
there should be no leak.

Overall: this is safe, with the code change I propose.

New callers with old libxl on 4.10:
-----------------------------------

If the caller is creating a guest other than a PVH one there is no
change to the ABI.

When a caller creates a PVH non-shim guest, it will probably not set
any of these fields.  Things will work properly.

If a caller tries to create a shim guest, the attempt to do so will be
ignored and the guest will be created as PV.  Probably, the guest will
not boot.  Additionally, if the caller filled in pvshim cmdline or
path information, it will probably expect *dispose* to free those
values - and the result will be a memory leak.

If the caller is examining existing guests, PV and HVM guests will
work fine.  If the caller is examining existing PVH guests, the
library will not initialise the new fields.  The result may include an
uninitialised read by the caller.

Overall: this is not safe and should be prevented.

 => code change: the 4.10 backport should use symbol versioning or
    another technique to prevent callers whose source code
    understands *shim* guests from using libxl versions which don't.



Xen 4.9 and 4.8
===============

Backport strategy:
------------------

These have some PVH toolstack support, but it is only half-baked:
PVH is specified with "device_model_version=none", which breaks the
availability of some PV devices, etc.

We have considered three backport strategies:

 (i) Make the shim run in HVM mode and use that.  My understanding
     from experts in the relevant area is that this is quite awkward:
     a lot of work, and risks producing a program which does not work
     well.

 (ii) Have libxl callers specify the new mode as a variant on the PV
     guest type.  Inside libxl this is a real pain because it involves
     touching every place the domain type is considered, to decide
     whether this new mode should be like PV or like HVM.  This is
     essentially redoing the work done for PVH in 4.10, but afresh and
     therefore with new bugs.

 (iii) Backport the first-class PVH guest type patches to libxl from
     4.10.  This is a large backport but these patches have been
     around for a while and we regard them as stable.  Doing this
     means we provide users of 4.8 and 4.9 with the same bugs as are
     in 4.10 (which we think are fairly few) rather than doing a lot
     of work to write an essentially equivalent amount of new code
     with a lot of new bugs.

From the above it should be evident that we think strategy (iii) is
the best answer.  However, there are some wrinkles.

Some of the libxl pvh patches move various parameters for specifying
guest kernels from the pv-specific part of the domain config struct to
the "build info", ie from d_config.b_info.u.pv to d_config.b_info.

We cannot do that in 4.8 and 4.9 because doing so would increase the
size of d_config.b_info.  b_info is not the last field in d_config, so
adding fields to it would move the fields of d_config that follow it,
which is a total ABI break.

Accordingly, we will have to provide copies of the kernel related
parameters in the pvh struct.
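
The layout problem can be illustrated with offsetof(): growing an embedded struct moves every field after it in the containing struct, so a caller compiled against the old header would read those fields at the wrong offsets. Hypothetical cut-down structs:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down versions of libxl's config structs. */
struct b_info_old { int type; long u[4]; };
struct b_info_new { int type; const char *kernel; long u[4]; };

struct d_config_old { struct b_info_old b_info; int num_disks; };
struct d_config_new { struct b_info_new b_info; int num_disks; };
```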

Updated callers that want to create shim guests (or, indeed PVH
guests) will have to specify the kernel in the new fields.

We should provide a macro to access the "right" field for each of
the kernel fields.  We should provide that macro in 4.10 and unstable
too, so that the API is forward-compatible.

 => code change: define the macro just discussed
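
A sketch of what such an accessor macro could look like. The macro name, the feature-test macro and the struct layout are all hypothetical; the point is that callers write one expression that names the right field on either libxl version.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature macro; a 4.10+ header would define it. */
/* #define HAVE_BUILD_INFO_KERNEL 1 */

struct build_info {
#ifdef HAVE_BUILD_INFO_KERNEL
    const char *kernel;                      /* 4.10+: hoisted into b_info */
#endif
    int type;
    union {
        struct { const char *kernel; } pv;
        struct { const char *kernel; } pvh;  /* 4.8/4.9: backported copy */
    } u;
};

/* Expands to an lvalue naming the "right" kernel field for PVH. */
#ifdef HAVE_BUILD_INFO_KERNEL
#define BUILD_INFO_PVH_KERNEL(bi) ((bi)->kernel)
#else
#define BUILD_INFO_PVH_KERNEL(bi) ((bi)->u.pvh.kernel)
#endif
```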

Old callers with new libxl on <=4.9:
------------------------------------

If the caller is creating a guest there is no change to the ABI.

If the caller is examining existing guests: If any PVH guests have
been created already, the caller will receive a domain type that it is
not expecting.  It may crash or produce an error, but undefined
behaviour seems unlikely.  And this can only happen when old
higher-level programs are operating on a host that already has new
guests.  Furthermore, if the caller properly handles the unexpected
guest type and calls *dispose*, there is not even a memory leak.

New callers with old libxl on <=4.9:
------------------------------------

If the caller is creating a guest other than a PVH one there is no
change to the ABI.

If the caller is creating a PVH guest, it will call *init* but that
call will not initialise the PVH type.  (*init* does not return
errors.)  The caller may attempt boolean setting, string replacement,
and so on, on the undefined data.

 => code change: the <=4.9 backport should use symbol versioning or
    another technique to prevent callers whose source code understands
    *PVH* guests from using libxl versions which don't.

If the caller is examining guests, for non-PVH guests there is no
problem.  For PVH guests, libxl will return wrong information but
there will be no serious problem.


Xen 4.7 and earlier:
====================

These lack appropriate hypervisor support for PVHv2 guests.
It may be possible to make the shim run in HVM - but see above.


Ian.

* Re: [PATCH RFC v1 55/74] xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU
  2018-01-08 16:05   ` Jan Beulich
@ 2018-01-08 16:22     ` Roger Pau Monné
  2018-01-09  8:00       ` Jan Beulich
  2018-01-09 17:50     ` Anthony Liguori
  1 sibling, 1 reply; 206+ messages in thread
From: Roger Pau Monné @ 2018-01-08 16:22 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, wei.liu2, Anthony Liguori

On Mon, Jan 08, 2018 at 09:05:40AM -0700, Jan Beulich wrote:
> >>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> > +        unsigned long evtchn = xchg(&XEN_shared_info->evtchn_pending[l1], 0);
> > +
> > +        __clear_bit(l1, &pending);
> > +        evtchn &= ~XEN_shared_info->evtchn_mask[l1];
> > +        while ( evtchn )
> > +        {
> > +            unsigned int port = ffsl(evtchn) - 1;
> > +
> > +            __clear_bit(port, &evtchn);
> > +            port += l1 * BITS_PER_LONG;
> 
> What about a 32-bit client? If that's not intended to be supported,
> building of such a guest should be prevented (in dom0_build.c).

32bit client? You mean building a shim that runs in 32bit mode? If so,
I haven't really thought of it, but in any case wouldn't BITS_PER_LONG
also be OK in that case?

> > +                                                                        \
> > +    if ( copy_from_guest(&op, arg, 1) != 0 )                            \
> > +        return -EFAULT;                                                 \
> > +                                                                        \
> > +    rc = xen_hypercall_event_channel_op(EVTCHNOP_##cmd, &op);           \
> > +    if ( rc )                                                           \
> > +        break;                                                          \
> > +                                                                        \
> > +    spin_lock(&d->event_lock);                                          \
> 
> Would the lock better be acquired already before the hypercall
> above?

I'm not sure I see your point here; L0 certainly must already have
its own locking. AFAICT the shim just needs to lock event_lock when
fiddling with the guest's event channel data.

> > +    case EVTCHNOP_unmask: {
> > +        struct evtchn_unmask unmask;
> > +
> > +        if ( copy_from_guest(&unmask, arg, 1) != 0 )
> > +            return -EFAULT;
> > +
> > +        /* Unmask is handled in L1 */
> > +        rc = evtchn_unmask(unmask.port);
> > +
> > +        break;
> > +    }
> 
> Is this really sufficient, without handing anything through to L0?
> Perhaps it's fine as long as there's no pass-through support here.

For the unmask operation? I think so, if there was a pending event the
shim will already take care of injecting it to the guest.

> > @@ -1030,6 +1055,11 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >  {
> >      long rc;
> >  
> > +#ifdef CONFIG_X86
> > +    if ( pv_shim )
> > +        return pv_shim_event_channel_op(cmd, arg);
> > +#endif
> 
> Patch it right into the hypercall table instead?

That would only work if the shim were a compile-time option rather
than a run-time one, since the hypercall table is read-only.
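
For illustration, why a const hypercall table pushes the decision into the handler: the table entry cannot be re-pointed at boot, so the handler itself has to branch on a run-time flag. All names here are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

typedef long (*hypercall_fn)(int cmd);

static bool pv_shim;   /* decided once at boot */

static long native_evtchn_op(int cmd) { return cmd; }         /* stand-in */
static long shim_evtchn_op(int cmd)   { return 1000 + cmd; }  /* stand-in */

/* The table below is const, so the run-time choice lives here. */
static long do_evtchn_op(int cmd)
{
    if ( pv_shim )
        return shim_evtchn_op(cmd);
    return native_evtchn_op(cmd);
}

static const hypercall_fn hypercall_table[] = { do_evtchn_op };
```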

Thanks, Roger.

* Re: [PATCH RFC v1 18/74] x86/link: Relocate program headers
  2018-01-08 15:43     ` Wei Liu
@ 2018-01-08 16:26       ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 16:26 UTC (permalink / raw)
  To: wei.liu2; +Cc: Andrew Cooper, Xen-devel

>>> On 08.01.18 at 16:43, <wei.liu2@citrix.com> wrote:
> On Fri, Jan 05, 2018 at 04:20:36AM -0700, Jan Beulich wrote:
>> >>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
>> I'm also rather uncertain about the entry point
>> change wrt various (and especially older) boot loaders.
> 
> What (older) boot loaders do you have in mind?

grub1 in particular, but there may be others.

Jan


* Re: [PATCH RFC v1 20/74] x86: produce a binary that can be booted as PVH
  2018-01-08 15:59     ` Wei Liu
@ 2018-01-08 16:42       ` Jan Beulich
  2018-01-09 13:49         ` Wei Liu
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 16:42 UTC (permalink / raw)
  To: wei.liu2; +Cc: Andrew Cooper, Xen-devel

>>> On 08.01.18 at 16:59, <wei.liu2@citrix.com> wrote:
> On Fri, Jan 05, 2018 at 04:39:33AM -0700, Jan Beulich wrote:
>> >>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
>> > --- a/xen/arch/x86/Makefile
>> > +++ b/xen/arch/x86/Makefile
>> > @@ -75,6 +75,8 @@ efi-y := $(shell if [ ! -r $(BASEDIR)/include/xen/compile.h -o \
>> >                        -O $(BASEDIR)/include/xen/compile.h ]; then \
>> >                           echo '$(TARGET).efi'; fi)
>> >  
>> > +shim-$(CONFIG_PVH_GUEST) := $(TARGET)-shim
>> > +
>> >  ifneq ($(build_id_linker),)
>> >  notes_phdrs = --notes
>> >  else
>> > @@ -93,7 +95,7 @@ endif
>> >  syms-warn-dup-y := --warn-dup
>> >  syms-warn-dup-$(CONFIG_SUPPRESS_DUPLICATE_SYMBOL_WARNINGS) :=
>> >  
>> > -$(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32
>> > +$(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32 $(shim-y)
>> >  	./boot/mkelf32 $(notes_phdrs) $(TARGET)-syms $(TARGET) $(XEN_IMG_OFFSET) \
>> >  	               `$(NM) $(TARGET)-syms | sed -ne 's/^\([^ ]*\) . __2M_rwdata_end$$/0x\1/p'`
>> 
>> Hmm, so you mean to build shim and "normal" Xen at the same time,
>> with all the same objects? That's rather unexpected following the
>> earlier exchange Andrew and I had. I would expect the shim to not
>> require quite a few bits and pieces, and hence wanting to be built
>> independently.
>> 
> 
> There is a later patch in this series to link xen under tools/firmware/
> to build the shim there, which would need a build system patch like this.
> 
> This can be cleaned up somehow. At the time I wasn't sure how best to
> proceed (and certainly didn't take part in the discussion between Andrew
> and you).
> 
> Suggestions welcome.

Well, when I had discussed this with Andrew, my view on the
outcome was that we'd build either xen-shim or the pair of
xen.gz and xen.efi in a single build invocation (hence to build
all three, a second make would be needed, which would seem
to be at least along the lines of what that later patch is doing).

The above dependency, otoh, suggests that you want to
build both xen.gz and xen-shim.

Jan


* Re: [PATCH RFC v1 56/74] xen/pvshim: add grant table operations
  2018-01-04 13:06 ` [PATCH RFC v1 56/74] xen/pvshim: add grant table operations Wei Liu
@ 2018-01-08 17:19   ` Jan Beulich
  2018-01-09 18:34     ` Roger Pau Monné
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-08 17:19 UTC (permalink / raw)
  To: Anthony Liguori, Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> @@ -30,11 +31,17 @@
>  #include <asm/guest.h>
>  #include <asm/pv/mm.h>
>  
> +#include <compat/grant_table.h>

Interesting: The event channel patch gave me the impression that
it is not intended to deal with 32-bit guests.

> @@ -360,6 +367,173 @@ void pv_shim_inject_evtchn(unsigned int port)
>      }
>  }
>  
> +long pv_shim_grant_table_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) uop,
> +                            unsigned int count, bool compat)
> +{
> +    struct domain *d = current->domain;
> +    long rc = 0;
> +
> +    if ( count != 1 )
> +        return -EINVAL;
> +
> +    switch ( cmd )
> +    {
> +    case GNTTABOP_setup_table:
> +    {
> +        struct gnttab_setup_table nat;
> +        struct compat_gnttab_setup_table cmp;
> +        unsigned int i;
> +
> +        if ( unlikely(compat ? copy_from_guest(&cmp, uop, 1)
> +                             : copy_from_guest(&nat, uop, 1)) ||
> +             unlikely(compat ? !compat_handle_okay(cmp.frame_list,
> +                                                   cmp.nr_frames)
> +                             : !guest_handle_okay(nat.frame_list,
> +                                                  nat.nr_frames)) )
> +        {
> +            rc = -EFAULT;
> +            break;
> +        }
> +        if ( compat )
> +#define XLAT_gnttab_setup_table_HNDL_frame_list(d, s)
> +                XLAT_gnttab_setup_table(&nat, &cmp);
> +#undef XLAT_gnttab_setup_table_HNDL_frame_list
> +
> +        nat.status = GNTST_okay;
> +
> +        spin_lock(&grant_lock);
> +        if ( !nr_grant_list )
> +        {
> +            struct gnttab_query_size query_size = {
> +                .dom = DOMID_SELF,
> +            };
> +
> +            rc = xen_hypercall_grant_table_op(GNTTABOP_query_size,
> +                                              &query_size, 1);
> +            if ( rc )
> +            {
> +                spin_unlock(&grant_lock);
> +                break;
> +            }
> +
> +            ASSERT(!grant_frames);
> +            grant_frames = xzalloc_array(unsigned long,
> +                                         query_size.max_nr_frames);

Hmm, such runtime allocations (especially when the amount can
be large) are a fundamental problem. I think this needs setting
up before the guest is started.
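A minimal userspace sketch of sizing the frame list once at setup time, so that GNTTABOP_setup_table only performs a bounds check. query_max_nr_frames() is a hypothetical stand-in for the GNTTABOP_query_size call to L0, and calloc() stands in for xzalloc_array(); this is not the shim's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for GNTTABOP_query_size issued to L0. */
static unsigned int query_max_nr_frames(void)
{
    return 32;
}

static unsigned long *grant_frames;   /* MFN per grant frame, 0 = not yet mapped */
static unsigned int nr_grant_list;

/* Run once during shim setup, before the L2 guest is started. */
int grant_frames_init(void)
{
    unsigned int max = query_max_nr_frames();

    grant_frames = calloc(max, sizeof(*grant_frames));
    if ( !grant_frames )
        return -ENOMEM;
    nr_grant_list = max;

    return 0;
}

/* GNTTABOP_setup_table path: only a bounds check, no allocation at runtime. */
int grant_setup_table(unsigned int nr_frames)
{
    assert(grant_frames);    /* grant_frames_init() must have run */

    return nr_frames > nr_grant_list ? -EINVAL : 0;
}
```

Any allocation failure then surfaces at shim boot rather than in the middle of a guest hypercall.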

> +            if ( !grant_frames )
> +            {
> +                spin_unlock(&grant_lock);
> +                rc = -ENOMEM;
> +                break;
> +            }
> +
> +            nr_grant_list = query_size.max_nr_frames;
> +        }
> +
> +        if ( nat.nr_frames > nr_grant_list )
> +        {
> +            spin_unlock(&grant_lock);
> +            rc = -EINVAL;
> +            break;
> +        }
> +
> +        for ( i = 0; i < nat.nr_frames; i++ )
> +        {
> +            if ( !grant_frames[i] )
> +            {
> +                struct xen_add_to_physmap xatp = {
> +                    .domid = DOMID_SELF,
> +                    .idx = i,
> +                    .space = XENMAPSPACE_grant_table,
> +                };
> +                mfn_t mfn;
> +
> +                rc = hypervisor_alloc_unused_page(&mfn);
> +                if ( rc )
> +                {
> +                    gprintk(XENLOG_ERR,
> +                            "unable to get memory for grant table\n");
> +                    break;
> +                }
> +
> +                xatp.gpfn = mfn_x(mfn);
> +                rc = xen_hypercall_memory_op(XENMEM_add_to_physmap, &xatp);
> +                if ( rc )
> +                {
> +                    hypervisor_free_unused_page(mfn);
> +                    break;
> +                }
> +
> +                BUG_ON(iomem_permit_access(d, mfn_x(mfn), mfn_x(mfn)));
> +                grant_frames[i] = mfn_x(mfn);
> +            }
> +
> +            ASSERT(grant_frames[i]);
> +            if ( compat )
> +            {
> +                compat_pfn_t pfn = grant_frames[i];
> +
> +                if ( __copy_to_compat_offset(cmp.frame_list, i, &pfn, 1) )
> +                {
> +                    nat.status = GNTST_bad_virt_addr;
> +                    rc = -EFAULT;
> +                    break;
> +                }
> +            }
> +            else if ( __copy_to_guest_offset(nat.frame_list, i,
> +                                             &grant_frames[i], 1) )
> +            {
> +                nat.status = GNTST_bad_virt_addr;
> +                rc = -EFAULT;
> +                break;
> +            }
> +        }
> +        spin_unlock(&grant_lock);
> +
> +        if ( compat )
> +#define XLAT_gnttab_setup_table_HNDL_frame_list(d, s)
> +                XLAT_gnttab_setup_table(&cmp, &nat);
> +#undef XLAT_gnttab_setup_table_HNDL_frame_list
> +
> +        if ( unlikely(compat ? copy_to_guest(uop, &cmp, 1)
> +                             : copy_to_guest(uop, &nat, 1)) )

__copy_to_guest()

> +        {
> +            rc = -EFAULT;
> +            break;
> +        }
> +
> +        break;
> +    }
> +    case GNTTABOP_query_size:

Blank line above such "case" please.

> +    {
> +        struct gnttab_query_size op;
> +        int rc;
> +
> +        if ( unlikely(copy_from_guest(&op, uop, 1)) )
> +        {
> +            rc = -EFAULT;
> +            break;
> +        }
> +
> +        rc = xen_hypercall_grant_table_op(GNTTABOP_query_size, &op, count);
> +        if ( rc )
> +            break;
> +
> +        if ( copy_to_guest(uop, &op, 1) )

__copy_to_guest() (assuming this copying in and out is necessary
in the first place).

> +        {
> +            rc = -EFAULT;
> +            break;
> +        }
> +
> +        break;
> +    }
> +    default:
> +        rc = -ENOSYS;

-EOPNOTSUPP again please. Plus - what about other sub-ops?

> @@ -3324,6 +3328,12 @@ do_grant_table_op(
>      if ( (cmd &= GNTTABOP_CMD_MASK) != GNTTABOP_cache_flush && opaque_in )
>          return -EINVAL;
>  
> +#ifdef CONFIG_X86
> +    if ( pv_shim )
> +        /* NB: no continuation support for pv-shim ops. */
> +        return pv_shim_grant_table_op(cmd, uop, count, false);
> +#endif

As for event channels - patch it right into the hypercall table?

Jan

* Re: [PATCH RFC v1 55/74] xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU
  2018-01-04 13:06 ` [PATCH RFC v1 55/74] xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU Wei Liu
  2018-01-08 16:05   ` Jan Beulich
@ 2018-01-09  7:49   ` Jan Beulich
  1 sibling, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-09  7:49 UTC (permalink / raw)
  To: Anthony Liguori, Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> +    case EVTCHNOP_close: {
> +        struct evtchn_close close;
> +
> +        if ( copy_from_guest(&close, arg, 1) != 0 )
> +            return -EFAULT;
> +
> +        if ( !port_is_valid(d, close.port) )
> +            return -EINVAL;
> +
> +        if ( evtchn_handled(d, close.port) )
> +        {
> +            rc = evtchn_close(d, close.port, true);
> +            if ( rc )
> +                break;
> +        }
> +        else
> +            evtchn_free(d, evtchn_from_port(d, close.port));

Judging by other callers of this function you ought to hold the
domain event lock and the event channel lock around it. That's
one of the reasons the original function was static and didn't
follow the evtchn_*() naming scheme of externally usable
functions. Perhaps you want to introduce evtchn_free() as a
wrapper around free_evtchn(), acquiring and releasing the
locks? In this particular case use of evtchn_from_port() may
also be deemed a layering violation.
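A userspace sketch of the suggested wrapper, using pthread mutexes in place of spinlocks. The struct layouts are hypothetical simplifications. The wrapper takes a port number rather than a struct evtchn pointer, so callers no longer need evtchn_from_port(), and it acquires the domain lock before the per-channel lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for struct evtchn / struct domain. */
struct evtchn {
    pthread_mutex_t lock;        /* per-channel lock */
    bool in_use;
};

struct domain {
    pthread_mutex_t event_lock;  /* domain-wide event lock */
    struct evtchn chn[8];
};

/* Internal helper: caller must hold both locks (like the static free_evtchn()). */
static void free_evtchn(struct evtchn *chn)
{
    chn->in_use = false;
}

/* Externally usable wrapper: domain lock first, then the channel lock. */
void evtchn_free(struct domain *d, unsigned int port)
{
    struct evtchn *chn;

    assert(port < sizeof(d->chn) / sizeof(d->chn[0]));
    pthread_mutex_lock(&d->event_lock);
    chn = &d->chn[port];         /* port lookup stays inside the event code */
    pthread_mutex_lock(&chn->lock);
    free_evtchn(chn);
    pthread_mutex_unlock(&chn->lock);
    pthread_mutex_unlock(&d->event_lock);
}
```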

> +        rc = xen_hypercall_event_channel_op(EVTCHNOP_close, &close);
> +        if ( rc )
> +            /*
> +             * If the port cannot be closed on the L0 mark it as reserved
> +             * in the shim to avoid re-using it.
> +             */
> +            evtchn_reserve(d, close.port);
> +
> +        set_bit(close.port, XEN_shared_info->evtchn_mask);

Wouldn't this better be set earlier? The port is certainly available for
re-use prior to making it here. And I wonder whether the bit wouldn't
also want setting on some of the error paths closing ports, just to be
on the safe side.

> +    case EVTCHNOP_bind_ipi: {
> +        struct evtchn_bind_ipi ipi;
> +
> +        if ( copy_from_guest(&ipi, arg, 1) != 0 )
> +            return -EFAULT;
> +
> +        rc = xen_hypercall_event_channel_op(EVTCHNOP_bind_ipi, &ipi);
> +        if ( rc )
> +            break;
> +
> +        spin_lock(&d->event_lock);
> +        rc = evtchn_allocate_port(d, ipi.port);
> +        if ( rc )
> +        {
> +            struct evtchn_close close = {
> +                .port = ipi.port,
> +            };
> +
> +            /*
> +             * If closing the event channel port also fails there's not
> +             * much the shim can do, since it has been unable to reserve
> +             * the port in it's event channel space.
> +             */
> +            BUG_ON(xen_hypercall_event_channel_op(EVTCHNOP_close, &close));
> +            break;

There's a spin_unlock() missing here.
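A userspace sketch of the bind_ipi error path with the lock released on every exit. allocate_port() and close_port_on_l0() are hypothetical stand-ins for evtchn_allocate_port() and the EVTCHNOP_close hypercall to L0. Funnelling success and failure through a single unlock makes this class of bug harder to reintroduce.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t event_lock = PTHREAD_MUTEX_INITIALIZER;
static bool port_used[16];

/* Hypothetical stand-in for evtchn_allocate_port(): fails if the port is taken. */
static int allocate_port(unsigned int port)
{
    if ( port >= 16 )
        return -EINVAL;
    if ( port_used[port] )
        return -EBUSY;
    port_used[port] = true;
    return 0;
}

/* Hypothetical stand-in for closing the port again on L0. */
static void close_port_on_l0(unsigned int port)
{
    (void)port;
}

int bind_ipi_local(unsigned int port)
{
    int rc;

    pthread_mutex_lock(&event_lock);
    rc = allocate_port(port);
    if ( rc )
        close_port_on_l0(port);   /* undo the L0 side */
    /* Success and failure both leave through the same unlock. */
    pthread_mutex_unlock(&event_lock);

    return rc;
}
```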

Jan


* Re: [PATCH RFC v1 55/74] xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU
  2018-01-08 16:22     ` Roger Pau Monné
@ 2018-01-09  8:00       ` Jan Beulich
  2018-01-09 16:45         ` Roger Pau Monné
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-09  8:00 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Xen-devel, wei.liu2, Anthony Liguori

>>> On 08.01.18 at 17:22, <roger.pau@citrix.com> wrote:
> On Mon, Jan 08, 2018 at 09:05:40AM -0700, Jan Beulich wrote:
>> >>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
>> > +        unsigned long evtchn = xchg(&XEN_shared_info->evtchn_pending[l1], 0);
>> > +
>> > +        __clear_bit(l1, &pending);
>> > +        evtchn &= ~XEN_shared_info->evtchn_mask[l1];
>> > +        while ( evtchn )
>> > +        {
>> > +            unsigned int port = ffsl(evtchn) - 1;
>> > +
>> > +            __clear_bit(port, &evtchn);
>> > +            port += l1 * BITS_PER_LONG;
>> 
>> What about a 32-bit client? If that's not intended to be supported,
>> building of such a guest should be prevented (in dom0_build.c).
> 
> 32bit client? You mean building a shim that runs in 32bit mode? If so
> I haven't really thought about it, but in any case BITS_PER_LONG would be
> OK also in that case?

No, by "client" I mean the (sole) guest of the shim, in the 32-bit
case of which you'd need to use BITS_PER_EVTCHN_WORD() here.
But since 32-bit PV guests are not a problem wrt SP3, I can see
why we wouldn't want/need to support that case. Yet if so, I'd
prefer if we did that uniformly, by e.g. also avoiding the compat
complications in the new grant table wrapper.
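A sketch of why the word width matters when turning (word, bit) pairs into port numbers. The bits parameter plays the role of BITS_PER_EVTCHN_WORD(d): 64 for a 64-bit guest, 32 for a compat guest, whose shared-info bitmaps consist of 32-bit words. The function and its signature are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read word i of an event channel bitmap whose word size depends on the guest. */
static uint64_t evtchn_word(const void *bitmap, unsigned int i, unsigned int bits)
{
    return bits == 32 ? ((const uint32_t *)bitmap)[i]
                      : ((const uint64_t *)bitmap)[i];
}

/*
 * Collect pending, unmasked ports. bits is the guest's event word width
 * (64 for a 64-bit PV guest, 32 for a compat guest).
 */
size_t collect_ports(const void *pending, const void *mask, unsigned int nr_words,
                     unsigned int bits, unsigned int *out, size_t max_out)
{
    size_t n = 0;

    assert(bits == 32 || bits == 64);
    for ( unsigned int l1 = 0; l1 < nr_words; l1++ )
    {
        uint64_t w = evtchn_word(pending, l1, bits) & ~evtchn_word(mask, l1, bits);

        for ( unsigned int b = 0; b < bits && n < max_out; b++ )
            if ( w & (UINT64_C(1) << b) )
                out[n++] = l1 * bits + b;   /* not l1 * BITS_PER_LONG */
    }

    return n;
}
```

For a compat guest with bit 2 set in word 1, this yields port 34; scaling by BITS_PER_LONG would wrongly yield 66.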

>> > +                                                                        \
>> > +    if ( copy_from_guest(&op, arg, 1) != 0 )                            \
>> > +        return -EFAULT;                                                 \
>> > +                                                                        \
>> > +    rc = xen_hypercall_event_channel_op(EVTCHNOP_##cmd, &op);           \
>> > +    if ( rc )                                                           \
>> > +        break;                                                          \
>> > +                                                                        \
>> > +    spin_lock(&d->event_lock);                                          \
>> 
>> Would the lock better be acquired already before the hypercall
>> above?
> 
> I'm not sure I see your point here, certainly L0 already must have
> its own locking. AFAICT the shim just needs to lock event_lock when
> fiddling with event channel data of the guest.

The point isn't to guard L0 in any case, but to deal with racing
requests coming from the client. Having looked again, the two
operations the macro is being used for are probably fine with
the locking left as is, but as you can see from the other reply
sent a few minutes ago (to the original patch) there are races
to be considered in general.

>> > +    case EVTCHNOP_unmask: {
>> > +        struct evtchn_unmask unmask;
>> > +
>> > +        if ( copy_from_guest(&unmask, arg, 1) != 0 )
>> > +            return -EFAULT;
>> > +
>> > +        /* Unmask is handled in L1 */
>> > +        rc = evtchn_unmask(unmask.port);
>> > +
>> > +        break;
>> > +    }
>> 
>> Is this really sufficient, without handing anything through to L0?
>> Perhaps it's fine as long as there's no pass-through support here.
> 
> For the unmask operation? I think so, if there was a pending event the
> shim will already take care of injecting it to the guest.

Well, as the Linux code (evtchn_2l_unmask()) tells us certain
unmasks have to go through the hypervisor. I would assume
that in the case of the shim this means that L2 requests need
to also be handed through to L0 whenever they're not being
handled entirely locally to L1.
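A sketch of the split described above: ports emulated entirely inside L1 are unmasked locally, and all others are forwarded to L0 so that it can re-evaluate pending state. It is based only on that description and does not reproduce Linux's evtchn_2l_unmask() logic. struct shim_port and l0_unmask() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one shim-visible port. */
struct shim_port {
    bool handled_in_l1;   /* event fully emulated inside the shim */
    bool masked;
    bool pending;
};

static unsigned int l0_unmask_calls;   /* counts forwarded EVTCHNOP_unmask */

/* Hypothetical stand-in for issuing EVTCHNOP_unmask to L0. */
static void l0_unmask(unsigned int port)
{
    (void)port;
    l0_unmask_calls++;
}

/* Returns true if the shim itself must now inject an event into L2. */
bool shim_unmask(struct shim_port *p, unsigned int port)
{
    p->masked = false;
    if ( !p->handled_in_l1 )
    {
        /* L0 owns the real channel: let it re-check pending state. */
        l0_unmask(port);
        return false;
    }

    return p->pending;
}
```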

>> > @@ -1030,6 +1055,11 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>> >  {
>> >      long rc;
>> >  
>> > +#ifdef CONFIG_X86
>> > +    if ( pv_shim )
>> > +        return pv_shim_event_channel_op(cmd, arg);
>> > +#endif
>> 
>> Patch it right into the hypercall table instead?
> 
> That would only work if the shim is a compile time option, but not a
> run time one, the hypercall table is ro.

Well, yes and no: See nmi_shootdown_cpus() for a precedent
of how to do that without removing the r/o attribute. Not having
the hook sit here would (I assume) allow us to avoid compiling the
entire do_event_channel_op() down the road in the shim-only
case. The compiler may be able to partially do this (omitting the
rest of the function), but my experience is that deferring to the
compiler in this regard often means leaving some traces around.

But anyway - this of course is something which can also be sorted
out later, if it's deemed too complicated for doing right away.

Jan

* Re: [PATCH RFC v1 53/74] xen/pvshim: modify Dom0 builder in order to build a DomU
  2018-01-04 13:06 ` [PATCH RFC v1 53/74] xen/pvshim: modify Dom0 builder in order to build a DomU Wei Liu
  2018-01-08 14:06   ` Jan Beulich
@ 2018-01-09  9:06   ` Jan Beulich
  1 sibling, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-09  9:06 UTC (permalink / raw)
  To: Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> +void __init pv_shim_setup_dom(struct domain *d, l4_pgentry_t *l4start,
> +                              unsigned long va_start, unsigned long store_va,
> +                              unsigned long console_va, unsigned long vphysmap,
> +                              start_info_t *si)
> +{
> +    uint64_t param = 0;
> +    long rc;
> +
> +#define SET_AND_MAP_PARAM(p, si, va) ({                                        \
> +    rc = xen_hypercall_hvm_get_param(p, &param);                               \
> +    if ( rc )                                                                  \
> +        panic("Unable to get " #p "\n");                                       \
> +    (si) = param;                                                              \
> +    if ( va )                                                                  \
> +    {                                                                          \
> +        BUG_ON(unshare_xen_page_with_guest(mfn_to_page(param), dom_io));       \
> +        share_xen_page_with_guest(mfn_to_page(param), d, XENSHARE_writable);   \
> +        replace_va(d, l4start, va, param);                                     \
> +        dom0_update_physmap(d, (va - va_start) >> PAGE_SHIFT, param, vphysmap);\

Cosmetic remark: va wants to be parenthesized here.
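A small illustration of why macro arguments should be parenthesized. The macro names are illustrative only. When the argument is an expression such as `base | 0x3000`, the unparenthesized form pastes it verbatim, and `-` then binds tighter than `|`.

```c
#include <assert.h>

#define PAGE_SHIFT 12

/* Unparenthesized: expands to (base | 0x3000 - start) >> 12. */
#define VA_TO_IDX_BAD(va, start)  ((va - start) >> PAGE_SHIFT)
/* Parenthesized: the argument is evaluated as a unit. */
#define VA_TO_IDX_GOOD(va, start) (((va) - (start)) >> PAGE_SHIFT)
```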

Jan


* Re: [PATCH RFC v1 57/74] x86/pv-shim: shadow PV console's page for L2 DomU
  2018-01-04 13:06 ` [PATCH RFC v1 57/74] x86/pv-shim: shadow PV console's page for L2 DomU Wei Liu
@ 2018-01-09  9:13   ` Jan Beulich
  2018-01-09 15:43     ` Sergey Dyasli
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-09  9:13 UTC (permalink / raw)
  To: Sergey Dyasli, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> @@ -125,13 +127,28 @@ void __init pv_shim_setup_dom(struct domain *d, l4_pgentry_t *l4start,
>  })
>      SET_AND_MAP_PARAM(HVM_PARAM_STORE_PFN, si->store_mfn, store_va);
>      SET_AND_MAP_PARAM(HVM_PARAM_STORE_EVTCHN, si->store_evtchn, 0);
> +    SET_AND_MAP_PARAM(HVM_PARAM_CONSOLE_EVTCHN, si->console.domU.evtchn, 0);
>      if ( !pv_console )
> -    {
>          SET_AND_MAP_PARAM(HVM_PARAM_CONSOLE_PFN, si->console.domU.mfn,
>                            console_va);
> -        SET_AND_MAP_PARAM(HVM_PARAM_CONSOLE_EVTCHN, si->console.domU.evtchn, 0);
> -    }
>  #undef SET_AND_MAP_PARAM
> +    else
> +    {
> +        /* Allocate a new page for DomU's PV console */
> +        void *page = alloc_xenheap_pages(0, MEMF_bits(32));
> +        uint64_t console_mfn;
> +
> +        ASSERT(page);
> +        clear_page(page);
> +        console_mfn = virt_to_mfn(page);
> +        si->console.domU.mfn = console_mfn;
> +        share_xen_page_with_guest(mfn_to_page(console_mfn), d,
> +                                  XENSHARE_writable);
> +        replace_va(d, l4start, console_va, console_mfn);
> +        dom0_update_physmap(d, (console_va - va_start) >> PAGE_SHIFT,
> +                            console_mfn, vphysmap);
> +        consoled_set_ring_addr(page);

This looks to be a fair part of SET_AND_MAP_PARAM(), so I think
this wants breaking out as a separate macro.


> +size_t consoled_guest_rx(void)
> +{
> +    size_t recv = 0, idx = 0;
> +    XENCONS_RING_IDX cons, prod;
> +
> +    if ( !cons_ring )
> +        return 0;
> +
> +    spin_lock(&rx_lock);
> +
> +    cons = cons_ring->out_cons;
> +    prod = ACCESS_ONCE(cons_ring->out_prod);
> +    ASSERT((prod - cons) <= sizeof(cons_ring->out));
> +
> +    /* Is the ring empty? */
> +    if ( cons == prod )
> +        goto out;
> +
> +    /* Update pointers before accessing the ring */
> +    smp_rmb();

I think this needs to move up ahead of the if(). In the comment
perhaps s/Update/Latch/?

> +size_t consoled_guest_tx(char c)
> +{
> +    size_t sent = 0;
> +    XENCONS_RING_IDX cons, prod;
> +
> +    if ( !cons_ring )
> +        return 0;
> +
> +    cons = ACCESS_ONCE(cons_ring->in_cons);
> +    prod = cons_ring->in_prod;
> +    ASSERT((prod - cons) <= sizeof(cons_ring->in));
> +
> +    /* Is the ring out of space? */
> +    if ( sizeof(cons_ring->in) - (prod - cons) == 0 )
> +        goto notify;
> +
> +    /* Update pointers before accessing the ring */
> +    smp_rmb();

Same here.
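A userspace sketch of the barrier placement for both directions, using C11 fences in place of smp_rmb()/smp_wmb()/smp_mb(). The ring is a simplified single-direction model of a xencons ring, not the real struct xencons_interface. In both functions the indexes are latched and fenced before anything else is done, and an index is published only after the data accesses it covers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 16u                        /* must be a power of two */
#define MASK_IDX(i) ((i) & (RING_SIZE - 1))

/* Simplified single-direction ring (hypothetical layout). */
struct ring {
    char buf[RING_SIZE];
    _Atomic uint32_t cons, prod;
};

/* Consumer: drain up to len bytes into dst. */
size_t ring_rx(struct ring *r, char *dst, size_t len)
{
    uint32_t cons = atomic_load_explicit(&r->cons, memory_order_relaxed);
    uint32_t prod = atomic_load_explicit(&r->prod, memory_order_relaxed);
    size_t n = 0;

    /* Latch the indexes before any check or ring access (smp_rmb()). */
    atomic_thread_fence(memory_order_acquire);
    assert(prod - cons <= RING_SIZE);

    while ( cons != prod && n < len )
        dst[n++] = r->buf[MASK_IDX(cons++)];

    /* Finish reading the data before handing the slots back (smp_mb()). */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&r->cons, cons, memory_order_relaxed);

    return n;
}

/* Producer: queue one byte; returns 0 if the ring is full. */
size_t ring_tx(struct ring *r, char c)
{
    uint32_t cons = atomic_load_explicit(&r->cons, memory_order_relaxed);
    uint32_t prod = atomic_load_explicit(&r->prod, memory_order_relaxed);

    /* Latch the indexes before the full check and the ring write. */
    atomic_thread_fence(memory_order_acquire);
    assert(prod - cons <= RING_SIZE);

    if ( prod - cons == RING_SIZE )
        return 0;

    r->buf[MASK_IDX(prod)] = c;
    /* Make the data visible before publishing the new index (smp_wmb()). */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&r->prod, prod + 1, memory_order_relaxed);

    return 1;
}
```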

> --- /dev/null
> +++ b/xen/include/xen/consoled.h
> @@ -0,0 +1,27 @@
> +#ifndef __XEN_CONSOLED_H__
> +#define __XEN_CONSOLED_H__
> +
> +#include <public/io/console.h>
> +
> +#ifdef CONFIG_PV_SHIM
> +
> +void consoled_set_ring_addr(struct xencons_interface *ring);
> +struct xencons_interface *consoled_get_ring_addr(void);
> +size_t consoled_guest_rx(void);
> +size_t consoled_guest_tx(char c);
> +
> +#else
> +
> +size_t consoled_guest_tx(char c) { return 0; }

static inline
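A sketch of the header pattern being asked for. Without `static inline`, a function defined in a header that several .c files include produces multiple definitions at link time. With `static inline`, each translation unit gets its own private, usually inlined, copy. Only the two prototypes from the quoted diff are shown.

```c
#include <assert.h>
#include <stddef.h>

/* CONFIG_PV_SHIM would normally come from Kconfig; left undefined here. */

#ifdef CONFIG_PV_SHIM
size_t consoled_guest_rx(void);
size_t consoled_guest_tx(char c);
#else
/* static inline: no external definition is emitted, so no link clashes. */
static inline size_t consoled_guest_rx(void) { return 0; }
static inline size_t consoled_guest_tx(char c) { (void)c; return 0; }
#endif
```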

Jan


* Re: [PATCH RFC v1 52/74] xen: mark xenstore/console pages as RAM and add them to dom_io
  2018-01-08 13:49   ` Jan Beulich
@ 2018-01-09  9:25     ` Roger Pau Monné
  2018-01-09 11:03       ` Jan Beulich
  0 siblings, 1 reply; 206+ messages in thread
From: Roger Pau Monné @ 2018-01-09  9:25 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, wei.liu2

On Mon, Jan 08, 2018 at 06:49:21AM -0700, Jan Beulich wrote:
> >>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> > From: Roger Pau Monne <roger.pau@citrix.com>
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> 
> There being no description at all makes it rather harder to review this
> one. I assume that marking the pages as RAM is necessary to make
> sure a struct page_info is being created for them, which in turn is a
> prereq for sharing the pages.

Yes, this was sent before we had time to add proper logs to some
commits, sorry.

> > +void __init hypervisor_fixup_e820(struct e820map *e820)
> > +{
> > +    uint64_t pfn = 0;
> 
> I don't think initializers of this kind are necessary (there are several
> instances of this).

gcc complains that the variable "may be used uninitialized" if I don't add
this. I haven't looked at the root cause of this.

> > +    long rc;
> > +
> > +    if ( !xen_guest )
> > +        return;
> > +
> > +#define MARK_PARAM_RAM(p) ({                    \
> > +    rc = xen_hypercall_hvm_get_param(p, &pfn);  \
> > +    if ( rc )                                   \
> > +        panic("Unable to get " #p);             \
> 
> The text here is the same in all three instances - please make it
> distinguishable, so one doesn't have to start guessing.
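One possible way to make the messages distinguishable is to prefix them with `__func__`, so that the same `#p` string emitted from different macros still identifies its caller. FORMAT_PARAM_ERR and the two functions are hypothetical; in the hypervisor the format string would go to panic() instead of snprintf().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* __func__ names the caller, #p names the parameter, rc gives the error. */
#define FORMAT_PARAM_ERR(buf, len, p, rc) \
    snprintf(buf, len, "%s: unable to get " #p " (rc %ld)", __func__, (long)(rc))

static void fixup_e820(char *buf, size_t len, long rc)
{
    FORMAT_PARAM_ERR(buf, len, HVM_PARAM_STORE_PFN, rc);
}

static void init_memory(char *buf, size_t len, long rc)
{
    FORMAT_PARAM_ERR(buf, len, HVM_PARAM_STORE_PFN, rc);
}
```

Because `#p` stringifies the argument tokens without expanding them, HVM_PARAM_STORE_PFN does not need to be defined for this to compile.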
> 
> > +void __init hypervisor_init_memory(void)
> > +{
> > +    uint64_t pfn = 0;
> > +    long rc;
> > +
> > +    if ( !xen_guest )
> > +        return;
> > +
> > +#define SHARE_PARAM(p) ({                                                   \
> > +    rc = xen_hypercall_hvm_get_param(p, &pfn);                             \
> > +    if ( rc )                                                               \
> > +        panic("Unable to get " #p);                                         \
> > +    share_xen_page_with_guest(mfn_to_page(pfn), dom_io, XENSHARE_writable); \
> 
> Why dom_io rather than the client domain?

The client domain is not yet created at this point. This is exactly
the same that Xen does for the low 1MiB for example.

> The more that dom_io
> pages can only be mapped by privileged guests (and hence I
> assume you need another tweak somewhere this way).

I just use unshare_xen_page_with_guest() to take it away from dom_io and
then share it again with the guest.

> > +const unsigned long *__init hypervisor_reserved_pages(unsigned int *size)
> > +{
> > +    static unsigned long __initdata reserved_pages[2];
> > +    uint64_t pfn = 0;
> > +    long rc;
> > +
> > +    if ( !xen_guest )
> > +        return NULL;
> > +
> > +    *size = 0;
> > +
> > +#define RESERVE_PARAM(p) ({                             \
> > +    rc = xen_hypercall_hvm_get_param(p, &pfn);          \
> > +    if ( rc )                                           \
> > +        panic("Unable to get " #p);                     \
> > +    reserved_pages[(*size)++] = pfn << PAGE_SHIFT;      \
> > +})
> > +    RESERVE_PARAM(HVM_PARAM_STORE_PFN);
> > +    if ( !pv_console )
> > +        RESERVE_PARAM(HVM_PARAM_CONSOLE_PFN);
> > +#undef RESERVE_PARAM
> > +
> > +    return reserved_pages;
> > +}
> 
> Afaict this happens much later than hypervisor_fixup_e820() -
> can't you latch the PFNs into a file scope array there, and merely
> return the information here, rather than re-invoking the
> hypercalls? This would save at least one instance of the wrapper
> macros.
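A sketch of the suggested restructuring: the early e820 fixup queries each parameter once and latches the result in a file-scope array, and the later reserved-pages query merely returns that array. get_param() is a hypothetical stand-in for xen_hypercall_hvm_get_param(), with made-up PFN values, and abort() stands in for panic().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT 12

static unsigned int get_param_calls;   /* counts simulated hypercalls */

/* Hypothetical hypercall stand-in: index 0 = store PFN, 1 = console PFN. */
static int get_param(unsigned int idx, uint64_t *val)
{
    static const uint64_t params[] = { 0xfeff0, 0xfeff1 };

    get_param_calls++;
    if ( idx >= 2 )
        return -1;
    *val = params[idx];

    return 0;
}

/* File-scope latch, filled once by fixup_e820(). */
static unsigned long reserved_pages[2];
static unsigned int nr_reserved_pages;

static void latch_param(unsigned int idx)
{
    uint64_t pfn;

    assert(nr_reserved_pages < 2);
    if ( get_param(idx, &pfn) )
        abort();                        /* panic() in the hypervisor */
    reserved_pages[nr_reserved_pages++] = pfn << PAGE_SHIFT;
}

/* Early boot: the only place the parameters are queried. */
void fixup_e820(bool pv_console)
{
    latch_param(0);                     /* xenstore page */
    if ( !pv_console )
        latch_param(1);                 /* console page */
    /* ...mark the latched pages as RAM in the e820 map here... */
}

/* Later: report the latched values without further hypercalls. */
const unsigned long *reserved_pages_get(unsigned int *size)
{
    *size = nr_reserved_pages;
    return reserved_pages;
}
```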

Right, this seems better.

Thanks, Roger.

* Re: [PATCH RFC v1 58/74] xen/pvshim: add migration support
  2018-01-04 13:06 ` [PATCH RFC v1 58/74] xen/pvshim: add migration support Wei Liu
@ 2018-01-09  9:38   ` Jan Beulich
  2018-01-10 12:54     ` Roger Pau Monné
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-09  9:38 UTC (permalink / raw)
  To: Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> +void hypervisor_resume(void)
> +{
> +    /* Reset shared info page. */
> +    map_shared_info();
> +
> +    /*
> +     * Reset vcpu_info. Just clean the mapped bitmap and try to map the vcpu
> +     * area again. On failure to map (when it was previously mapped) panic
> +     * since it's impossible to safely shut down running guest vCPUs in order
> +     * to meet the new XEN_LEGACY_MAX_VCPUS requirement.
> +     */
> +    memset(vcpu_info_mapped, 0, sizeof(vcpu_info_mapped));

bitmap_zero() would seem the more natural function to use here.

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -466,8 +466,7 @@ void share_xen_page_with_guest(
>      spin_unlock(&d->page_alloc_lock);
>  }
>  
> -int __init unshare_xen_page_with_guest(struct page_info *page,
> -                                       struct domain *d)
> +int unshare_xen_page_with_guest(struct page_info *page, struct domain *d)
>  {

The function is - afaict - not generally safe to use in its current
shape; I've recently made an attempt at making it generic, but
iirc I wasn't able to make it fully race free. Therefore, to prevent
use in the wrong context, please retain the __init here when
not building a shim.

> --- a/xen/arch/x86/pv/shim.c
> +++ b/xen/arch/x86/pv/shim.c
> @@ -151,10 +151,167 @@ void __init pv_shim_setup_dom(struct domain *d, l4_pgentry_t *l4start,
>      }
>  }
>  
> -void pv_shim_shutdown(uint8_t reason)
> +static void write_start_info(struct domain *d)
>  {
> -    /* XXX: handle suspend */
> -    xen_hypercall_shutdown(reason);
> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> +    start_info_t *si = map_domain_page(_mfn(is_pv_32bit_domain(d) ? regs->edx
> +                                                                  : regs->rdx));
> +    uint64_t param;
> +
> +    BUG_ON(!si);

map_domain_page() can't fail, so this is pointless.

> +    snprintf(si->magic, sizeof(si->magic), "xen-3.0-x86_%s",
> +             is_pv_32bit_domain(d) ? "32p" : "64");
> +    si->nr_pages = d->tot_pages;
> +    si->shared_info = virt_to_maddr(d->shared_info);
> +    si->flags = (xen_processor_pmbits << 8) & SIF_PM_MASK;

This appears to be pointless (and irritating) in the context of the
shim.

> +    BUG_ON(xen_hypercall_hvm_get_param(HVM_PARAM_STORE_PFN, &si->store_mfn));
> +    BUG_ON(xen_hypercall_hvm_get_param(HVM_PARAM_STORE_EVTCHN, &param));
> +    si->store_evtchn = param;
> +    BUG_ON(xen_hypercall_hvm_get_param(HVM_PARAM_CONSOLE_EVTCHN, &param));
> +    si->console.domU.evtchn = param;
> +    if ( !pv_console )
> +        BUG_ON(xen_hypercall_hvm_get_param(HVM_PARAM_CONSOLE_PFN,
> +                                           &si->console.domU.mfn));
> +    else
> +        si->console.domU.mfn = virt_to_mfn(consoled_get_ring_addr());

Generally we prefer to avoid side effects in BUG_ON()s, so
perhaps better

    if ( pv_console )
        si->console.domU.mfn = virt_to_mfn(consoled_get_ring_addr());
    else if ( xen_hypercall_hvm_get_param(HVM_PARAM_CONSOLE_PFN,
                                           &si->console.domU.mfn) )
        BUG();

Come to think of it, the same (mostly cosmetic) issue exists
earlier in the series.

> +int pv_shim_shutdown(uint8_t reason)
> +{
> +    long rc;
> +
> +    if ( reason == SHUTDOWN_suspend )
> +    {

Reduce indentation of almost the entire function body by one
level by doing

    if ( reason != SHUTDOWN_suspend )
        /* Forward to L0. */
        return xen_hypercall_shutdown(reason);

?

> +        struct domain *d = current->domain;
> +        struct vcpu *v;
> +        unsigned int i;
> +        uint64_t old_store_pfn, old_console_pfn = 0, store_pfn, console_pfn;
> +        uint64_t store_evtchn, console_evtchn;
> +
> +        BUG_ON(current->vcpu_id != 0);
> +
> +        BUG_ON(xen_hypercall_hvm_get_param(HVM_PARAM_STORE_PFN,
> +                                           &old_store_pfn));
> +        if ( !pv_console )
> +            BUG_ON(xen_hypercall_hvm_get_param(HVM_PARAM_CONSOLE_PFN,
> +                                               &old_console_pfn));
> +
> +        /* Pause the other vcpus before starting the migration. */
> +        for_each_vcpu(d, v)
> +            if ( v != current )
> +                vcpu_pause_by_systemcontroller(v);
> +
> +        rc = xen_hypercall_shutdown(SHUTDOWN_suspend);
> +        if ( rc )
> +        {
> +            for_each_vcpu(d, v)
> +                if ( v != current )
> +                    vcpu_unpause_by_systemcontroller(v);
> +
> +            return rc;
> +        }
> +
> +        /* Resume the shim itself first. */
> +        hypervisor_resume();
> +
> +        /*
> +         * ATM there's nothing Xen can do if the console/store pfn changes,
> +         * because Xen won't have a page_info struct for it.
> +         */
> +        BUG_ON(xen_hypercall_hvm_get_param(HVM_PARAM_STORE_PFN,
> +                                           &store_pfn));
> +        BUG_ON(old_store_pfn != store_pfn);
> +        if ( !pv_console )
> +        {
> +            BUG_ON(xen_hypercall_hvm_get_param(HVM_PARAM_CONSOLE_PFN,
> +                                               &console_pfn));
> +            BUG_ON(old_console_pfn != console_pfn);
> +        }
> +
> +        /* Update domain id. */
> +        d->domain_id = get_dom0_domid();
> +
> +        /* Clean the iomem range. */
> +        BUG_ON(iomem_deny_access(d, 0, ~0UL));

Does this rangeset change across migration?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

* Re: [PATCH RFC v1 59/74] xen/pvshim: add shim_mem cmdline parameter
  2018-01-04 13:06 ` [PATCH RFC v1 59/74] xen/pvshim: add shim_mem cmdline parameter Wei Liu
@ 2018-01-09  9:47   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-09  9:47 UTC (permalink / raw)
  To: Sergey Dyasli, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> @@ -284,7 +291,16 @@ unsigned long __init dom0_compute_nr_pages(
>           * maximum of 128MB.
>           */
>          if ( nr_pages == 0 )
> -            nr_pages = -min(avail / 16, 128UL << (20 - PAGE_SHIFT));
> +        {
> +            uint64_t rsvd = min(avail / 16, 128UL << (20 - PAGE_SHIFT));

"unsigned long" and blank line following it.

> --- a/xen/arch/x86/pv/shim.c
> +++ b/xen/arch/x86/pv/shim.c
> @@ -40,6 +40,52 @@ bool pv_shim;
>  boolean_param("pv-shim", pv_shim);
>  #endif
>  
> +/*
> + * By default, 1/16th of total HVM container's memory is reserved for xen-shim
> + * with minimum amount being 10MB and maximum amount 128MB. Some users may wish
> + * to tune this constants for better memory utilization. This can be achieved
> + * using the following xen-shim's command line option:
> + *
> + * shim_mem=[min:<min_amt>,][max:<max_amt>,][<amt>]
> + *
> + * <min_amt>: The minimum amount of memory that should be allocated for xen-shim
> + *            (ignored if greater than max)
> + * <max_amt>: The maximum amount of memory that should be allocated for xen-shim
> + * <amt>:     The precise amount of memory to allocate for xen-shim
> + *            (overrides both min and max)
> + */
> +static uint64_t __initdata shim_nrpages;
> +static uint64_t __initdata shim_min_nrpages = 10UL << (20 - PAGE_SHIFT);
> +static uint64_t __initdata shim_max_nrpages = 128UL << (20 - PAGE_SHIFT);

"unsigned long" at the very least. The Dom0 variant of the code
also allows for negative values (and hence uses "long"), but it
looks like you don't mean to support such here.

> +static int __init parse_shim_mem(const char *s)
> +{
> +    do {
> +        if ( !strncmp(s, "min:", 4) )
> +            shim_min_nrpages = parse_size_and_unit(s+4, &s) >> PAGE_SHIFT;
> +        else if ( !strncmp(s, "max:", 4) )
> +            shim_max_nrpages = parse_size_and_unit(s+4, &s) >> PAGE_SHIFT;

Blanks around + (twice) please.

> +        else
> +            shim_nrpages = parse_size_and_unit(s, &s) >> PAGE_SHIFT;
> +    } while ( *s++ == ',' );
> +
> +    return s[-1] ? -EINVAL : 0;
> +}
> +custom_param("shim_mem", parse_shim_mem);
> +
> +uint64_t pv_shim_mem(uint64_t avail)

unsigned long (twice) and __init.
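For reference, here is a standalone sketch of the option syntax described in the patch comment (shim_mem=[min:<min_amt>,][max:<max_amt>,][<amt>]), with the "unsigned long" and spacing changes requested above. It is a sketch under stated assumptions, not the patch's code. parse_size() is a hypothetical, simplified stand-in for Xen's parse_size_and_unit(), and its treatment of a bare number as bytes is an assumption that may not match Xen. PAGE_SHIFT is assumed to be 12. shim_reserved_pages() is a hypothetical name for the clamping the comment describes: 1/16th of the available memory, clamped to [min, max], with min ignored if it exceeds max and an exact amount overriding both.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SHIFT 12  /* assumption: 4KiB pages */

/* "unsigned long" rather than uint64_t, as asked for in review. */
static unsigned long shim_nrpages;
static unsigned long shim_min_nrpages = 10UL << (20 - PAGE_SHIFT);
static unsigned long shim_max_nrpages = 128UL << (20 - PAGE_SHIFT);

/* Hypothetical stand-in for parse_size_and_unit(): K/M/G suffixes only. */
static unsigned long long parse_size(const char *s, const char **ps)
{
    char *end;
    unsigned long long v = strtoull(s, &end, 0);

    switch ( *end )
    {
    case 'G': case 'g': v <<= 10; /* fallthrough */
    case 'M': case 'm': v <<= 10; /* fallthrough */
    case 'K': case 'k': v <<= 10; end++; break;
    default: break;               /* assumption: bare number = bytes */
    }

    *ps = end;
    return v;
}

static int parse_shim_mem(const char *s)
{
    do {
        if ( !strncmp(s, "min:", 4) )
            shim_min_nrpages = parse_size(s + 4, &s) >> PAGE_SHIFT;
        else if ( !strncmp(s, "max:", 4) )
            shim_max_nrpages = parse_size(s + 4, &s) >> PAGE_SHIFT;
        else
            shim_nrpages = parse_size(s, &s) >> PAGE_SHIFT;
    } while ( *s++ == ',' );

    /* s is one past the character that ended the last item. */
    return s[-1] ? -EINVAL : 0;
}

/* Hypothetical helper: pages to reserve for the shim out of 'avail'. */
static unsigned long shim_reserved_pages(unsigned long avail)
{
    unsigned long rsvd;

    if ( shim_nrpages )
        return shim_nrpages;          /* exact amount overrides min/max */

    rsvd = avail / 16;
    if ( rsvd > shim_max_nrpages )
        rsvd = shim_max_nrpages;
    /* min is ignored if it is greater than max */
    if ( shim_min_nrpages <= shim_max_nrpages && rsvd < shim_min_nrpages )
        rsvd = shim_min_nrpages;

    return rsvd;
}
```

With "min:16M,max:256M", a 4GiB container (1048576 pages) gets 65536 pages (256MiB) reserved, while a 64MiB one gets the 4096-page (16MiB) minimum.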

With that (or the promise to clean it up after initial commit)
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan


* Re: [PATCH RFC v1 60/74] xen/pvshim: set max_pages to the value of tot_pages
  2018-01-04 13:06 ` [PATCH RFC v1 60/74] xen/pvshim: set max_pages to the value of tot_pages Wei Liu
@ 2018-01-09  9:48   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-09  9:48 UTC (permalink / raw)
  To: Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> From: Roger Pau Monne <roger.pau@citrix.com>
> 
> So that the guest is not able to deplete the memory pool of the shim
> itself by trying to balloon up.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>

Could perhaps be considered for folding into the previous patch.

Jan

* Re: [PATCH RFC v1 61/74] xen/pvshim: support vCPU hotplug
  2018-01-04 13:06 ` [PATCH RFC v1 61/74] xen/pvshim: support vCPU hotplug Wei Liu
@ 2018-01-09 10:16   ` Jan Beulich
  2018-01-10 13:07     ` Roger Pau Monné
  2018-01-10 14:40     ` Roger Pau Monné
  0 siblings, 2 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-09 10:16 UTC (permalink / raw)
  To: Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> @@ -1303,22 +1320,20 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
>  
>          break;
>  
> -    case VCPUOP_up: {
> -        bool_t wake = 0;
> -        domain_lock(d);
> -        if ( !v->is_initialised )
> -            rc = -EINVAL;

Shouldn't this check remain here? I realize this will complicate
locking (luckily the domain lock is a recursive one, so it shouldn't
be too bad), but I don't think pv_shim_cpu_up() can tolerate failing
because of vcpu_up() failing.

I also think that the use of "long" for return types and values isn't
really warranted here, and I can see no reason to special-case CPU0
here either. But for simplicity reasons I can see why you've chosen
that option; on the other hand, the locking issue above that you'll
need to solve might be easier to deal with if you didn't switch CPUs
for hypercall processing (without dropping the use of
continue_hypercall_on_cpu()).

Jan


* Re: [PATCH RFC v1 62/74] xen/pvshim: memory hotplug
  2018-01-04 13:06 ` [PATCH RFC v1 62/74] xen/pvshim: memory hotplug Wei Liu
@ 2018-01-09 10:42   ` Jan Beulich
  2018-01-10 13:36     ` Roger Pau Monné
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-09 10:42 UTC (permalink / raw)
  To: Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> @@ -814,6 +817,113 @@ long pv_shim_cpu_down(void *data)
>      return 0;
>  }
>  
> +static unsigned long batch_memory_op(int cmd, struct page_list_head *list)

unsigned int cmd, const struct ...

> +{
> +    struct xen_memory_reservation xmr = {
> +        .domid = DOMID_SELF,
> +    };
> +    unsigned long pfns[64];
> +    struct page_info *pg;

As long as you don't modify the list, this too can be const.

> +void pv_shim_online_memory(unsigned int nr, unsigned int order)
> +{
> +    struct page_info *page, *tmp;
> +    PAGE_LIST_HEAD(list);
> +
> +    spin_lock(&balloon_lock);
> +    page_list_for_each_safe ( page, tmp, &balloon )
> +    {
> +            if ( page->v.free.order != order )
> +                continue;

Since guests (afaik) only ever balloon order-0 pages, this is fine
for now. But it's insufficient in general - there's no point failing
a request when there's no exact match available, but a higher
order one is (which could be split).
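To illustrate the splitting point, here is a sketch of taking an extent of a given order from a pool and splitting the smallest larger free extent when no exact match exists, in the usual buddy-allocator way. It uses a hypothetical array-based model (struct extent, take_extent()) rather than Xen's page_list and struct page_info, and does not cover the matching populate call to L0.

```c
#include <stddef.h>

/* Hypothetical model of one ballooned extent: 2^order pages at pfn. */
struct extent {
    unsigned long pfn;
    unsigned int order;
    int free;                      /* 1 while still in the balloon */
};

/*
 * Take one extent of 'order' from the pool, preferring an exact match and
 * otherwise splitting the smallest larger free extent. Each split keeps
 * the low half and appends the high half to the pool (up to 'cap').
 * Returns the first pfn, or -1UL if nothing suitable is available.
 */
static unsigned long take_extent(struct extent *pool, size_t *nr,
                                 size_t cap, unsigned int order)
{
    size_t i, best = *nr;

    for ( i = 0; i < *nr; i++ )
    {
        if ( !pool[i].free || pool[i].order < order )
            continue;
        if ( best == *nr || pool[i].order < pool[best].order )
            best = i;
        if ( pool[best].order == order )
            break;                 /* exact match: stop searching */
    }
    if ( best == *nr )
        return -1UL;

    while ( pool[best].order > order )
    {
        if ( *nr == cap )
            return -1UL;           /* no room to record the leftover half */
        pool[best].order--;
        pool[*nr].pfn = pool[best].pfn + (1UL << pool[best].order);
        pool[*nr].order = pool[best].order;
        pool[*nr].free = 1;
        (*nr)++;
    }

    pool[best].free = 0;
    return pool[best].pfn;
}
```

Starting from a single order-2 extent at pfn 100, an order-0 request returns pfn 100 and leaves an order-0 extent at 101 and an order-1 extent at 102 in the pool, so no pages are lost by the split.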

> +            page_list_del(page, &balloon);
> +            page_list_add_tail(page, &list);
> +            if ( !--nr )
> +                break;
> +    }
> +    spin_unlock(&balloon_lock);
> +
> +    if ( nr )
> +        gprintk(XENLOG_WARNING,
> +                "failed to allocate %u extents of order %u for onlining\n",
> +                nr, order);
> +
> +    nr = batch_memory_op(XENMEM_populate_physmap, &list);

You need to pass order into the function (and use it there).
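A sketch of a batching helper that takes the order explicitly, along with the unsigned cmd and const input suggested above, might look like this. l0_memory_op() and l0_budget are hypothetical stand-ins for the hypercall to L0 and its ability to fail part-way. In the real code the order would presumably be stored in the extent_order field of struct xen_memory_reservation.

```c
#define BATCH 64                   /* same batch size as the patch's pfns[64] */

/* Hypothetical L0 model: how many more extents it will accept. */
static unsigned long l0_budget = ~0UL;

/* Hypothetical stand-in for the XENMEM_* hypercall issued to L0. */
static unsigned long l0_memory_op(unsigned int cmd, const unsigned long *pfns,
                                  unsigned long nr, unsigned int order)
{
    unsigned long done = nr < l0_budget ? nr : l0_budget;

    (void)cmd;
    (void)pfns;
    (void)order;                   /* would go into xmr.extent_order */
    l0_budget -= done;
    return done;
}

/*
 * Issue 'cmd' for 'nr' extents of 2^order pages each, BATCH at a time.
 * Returns how many extents L0 processed; a short count means the rest
 * were not handled and must be dealt with by the caller.
 */
static unsigned long batch_memory_op(unsigned int cmd, unsigned int order,
                                     const unsigned long *pfn_list,
                                     unsigned long nr)
{
    unsigned long pfns[BATCH], done = 0;

    while ( done < nr )
    {
        unsigned long i, rc, n = nr - done < BATCH ? nr - done : BATCH;

        for ( i = 0; i < n; i++ )
            pfns[i] = pfn_list[done + i];

        rc = l0_memory_op(cmd, pfns, n, order);
        done += rc;
        if ( rc != n )
            break;                 /* partial completion: stop here */
    }

    return done;
}
```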

> +    while ( nr-- )
> +    {
> +        BUG_ON((page = page_list_remove_head(&list)) == NULL);
> +        free_domheap_pages(page, order);
> +    }
> +
> +    if ( !page_list_empty(&list) )
> +    {
> +        gprintk(XENLOG_WARNING,
> +                "failed to online some of the memory regions\n");
> +        spin_lock(&balloon_lock);
> +        while ( (page = page_list_remove_head(&list)) != NULL )
> +            page_list_add_tail(page, &balloon);

page_list_splice()?
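The point of splicing is that a whole list can be moved in constant time, instead of removing and re-adding entries one by one. Here is a minimal sketch on a hypothetical circular, sentinel-headed list (struct node and its helpers are illustrative, not Xen's page_list API). Whether Xen's page_list_splice() inserts at the head or the tail is not shown here, and the order within the balloon list probably does not matter.

```c
/* Minimal circular doubly linked list with a sentinel head (hypothetical). */
struct node {
    struct node *prev, *next;
};

static void list_init(struct node *head)
{
    head->prev = head->next = head;
}

static void list_add_tail(struct node *n, struct node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Move all entries of 'from' to the tail of 'to' in O(1); 'from' ends empty. */
static void list_splice_tail_init(struct node *from, struct node *to)
{
    if ( from->next == from )
        return;                    /* nothing to move */

    from->next->prev = to->prev;   /* first moved entry follows old tail */
    to->prev->next = from->next;
    from->prev->next = to;         /* last moved entry becomes new tail */
    to->prev = from->prev;
    list_init(from);
}
```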

> @@ -993,6 +997,11 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>              return start_extent;
>          }
>  
> +#ifdef CONFIG_X86
> +        if ( pv_shim && op != XENMEM_decrease_reservation && !args.nr_done )
> +            pv_shim_online_memory(args.nr_extents, args.extent_order);
> +#endif
> +
>          switch ( op )
>          {
>          case XENMEM_increase_reservation:
> @@ -1015,6 +1024,11 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>                  __HYPERVISOR_memory_op, "lh",
>                  op | (rc << MEMOP_EXTENT_SHIFT), arg);
>  
> +#ifdef CONFIG_X86
> +        if ( pv_shim && op == XENMEM_decrease_reservation )
> +            pv_shim_offline_memory(args.nr_extents, args.extent_order);
> +#endif

Looking at both of these changes - is it somewhere being made
sure that shim containers won't boot in PoD mode?

For the latter change - is this correct when the operation has been
preempted? I think you want to offline only the delta between
start and args.nr_done.
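The preemption concern can be shown with a small model. Each invocation of a preemptible loop handles only the extents from where it started up to where it stopped, so that delta, not the total nr_extents, is what should be offlined. decrease_reservation(), offline_memory() and the budget parameter are hypothetical simplifications of the do_memory_op() continuation logic.

```c
/* Hypothetical accounting of extents returned to the shim's balloon. */
static unsigned long offlined;

static void offline_memory(unsigned long nr)
{
    offlined += nr;
}

/*
 * One (possibly preempted) invocation: process extents [start, nr_extents)
 * but stop after 'budget' of them, returning where a continuation resumes.
 */
static unsigned long decrease_reservation(unsigned long start,
                                          unsigned long nr_extents,
                                          unsigned long budget)
{
    unsigned long nr_done = start;

    while ( nr_done < nr_extents && budget-- )
        nr_done++;                 /* pretend one extent was released */

    /* Offline only what this invocation handled. */
    offline_memory(nr_done - start);

    return nr_done;
}
```

Releasing 10 extents with a budget of 4 per invocation takes three calls (4, 4 and 2 extents) and offlines exactly 10; passing nr_extents on every call would instead account 30.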

Jan


* Re: [PATCH RFC v1 63/74] xen/shim: modify shim_mem parameter behaviour
  2018-01-04 13:06 ` [PATCH RFC v1 63/74] xen/shim: modify shim_mem parameter behaviour Wei Liu
@ 2018-01-09 10:48   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-09 10:48 UTC (permalink / raw)
  To: Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> From: Roger Pau Monne <roger.pau@citrix.com>
> 
> shim_mem will now account for both the memory used by the hypervisor
> loaded in memory and the free memory slack given to the shim for
> runtime usage.
> 
> From experimental testing it seems like the total amount of MiB used
> by the shim (giving it ~1MB of free memory for runtime) is:
> 
> memory/113 + 20

How can this be independent of e.g. the client's vCPU count? For
the moment we may be better off leaving out this change, with
people caring much about memory utilization making use of the
command line options.
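To put the empirical formula from the commit message in perspective, here is a quick evaluation. It assumes that "memory" means the container's memory in MiB, which the message does not state. The formula has no term for the guest's vCPU count, and that omission is what the question above is getting at.

```latex
\begin{aligned}
S(M) &\approx \frac{M}{113} + 20 \quad \text{(MiB; } M \text{ = container memory in MiB, assumed)}\\
S(1024) &\approx 9.1 + 20 \approx 29\\
S(4096) &\approx 36.2 + 20 \approx 56
\end{aligned}
```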

Jan


* Re: [PATCH RFC v1 64/74] xen/pvshim: use default position for the m2p mappings
  2018-01-04 13:06 ` [PATCH RFC v1 64/74] xen/pvshim: use default position for the m2p mappings Wei Liu
@ 2018-01-09 10:50   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-09 10:50 UTC (permalink / raw)
  To: Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> From: Roger Pau Monne <roger.pau@citrix.com>
> 
> When running a 32bit kernel as Dom0 on a 64bit hypervisor the
> hypervisor will try to shrink the hypervisor hole to the minimum
> needed, and thus requires the Dom0 to use XENMEM_machphys_mapping in
> order to fetch the position of the start of the hypervisor virtual
> mappings.
> 
> Disable this feature when running as a PV shim, since some DomU
> kernels don't implement XENMEM_machphys_mapping and break if the m2p
> doesn't begin at the default address.
> 
> NB: support for the XENMEM_machphys_mapping was added in Linux by
> commit 7e7750.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

If we really mean to support 32-bit clients
Acked-by: Jan Beulich <jbeulich@suse.com>
Otherwise I'd prefer to leave this out.

Jan

* Re: [PATCH RFC v1 65/74] xen/shim: crash instead of reboot in shim mode
  2018-01-04 13:06 ` [PATCH RFC v1 65/74] xen/shim: crash instead of reboot in shim mode Wei Liu
@ 2018-01-09 10:52   ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-09 10:52 UTC (permalink / raw)
  To: Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> From: Roger Pau Monne <roger.pau@citrix.com>
> 
> All guest shutdown operations are forwarded to L0, so the only native
> calls to machine_restart happen from crash related paths inside the
> hypervisor, hence switch the reboot code to instead issue a crash
> shutdown.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>
preferably with ...

> --- a/xen/arch/x86/shutdown.c
> +++ b/xen/arch/x86/shutdown.c
> @@ -642,6 +642,13 @@ void machine_restart(unsigned int delay_millisecs)
>              break;
>  
>          case BOOT_XEN:
> +            if ( pv_shim )
> +                /*
> +                 * When running in PV shim mode guest shutdown calls are
> +                 * forwarded to L0, hence the only way to get here is if a
> +                 * shim crash happens.
> +                 */
> +                xen_hypercall_shutdown(SHUTDOWN_crash);
>              xen_hypercall_shutdown(SHUTDOWN_reboot);

... this made

             xen_hypercall_shutdown(pv_shim ? SHUTDOWN_crash : SHUTDOWN_reboot);

(suitably line wrapped).

Jan


* Re: [PATCH RFC v1 66/74] xen/shim: allow DomU to have as many vcpus as available
  2018-01-04 13:06 ` [PATCH RFC v1 66/74] xen/shim: allow DomU to have as many vcpus as available Wei Liu
@ 2018-01-09 10:59   ` Jan Beulich
  2018-01-10 16:14     ` Roger Pau Monné
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-09 10:59 UTC (permalink / raw)
  To: Roger Pau Monne, wei.liu2; +Cc: Xen-devel

>>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> From: Roger Pau Monne <roger.pau@citrix.com>
> 
> Since the shim VCPUOP_{up/down} hypercall is wired to the plug/unplug
> of CPUs to the shim itself, start the shim DomU with only the BSP
> online, and let the guest bring up other CPUs as it needs them.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

What are the ramifications of not making this change? Shouldn't
the shim's pCPU count (pCPU as viewed from its own perspective)
simply always match its client's vCPU count?

> @@ -153,6 +162,23 @@ unsigned int __init dom0_max_vcpus(void)
>      unsigned int i, max_vcpus, limit;
>      nodeid_t node;
>  
> +    if ( pv_shim )
> +    {
> +        nodes_setall(dom0_nodes);
> +
> +        /*
> +         * When booting in shim mode APs are not started until the guest brings
> +         * other vCPUs up.
> +         */
> +        cpumask_set_cpu(0, &dom0_cpus);
> +
> +        /*
> +         * On PV shim mode allow the guest to have as many CPUs as available.
> +         */

Style (single line comment).

> --- a/xen/arch/x86/pv/dom0_build.c
> +++ b/xen/arch/x86/pv/dom0_build.c
> @@ -695,7 +695,8 @@ int __init dom0_construct_pv(struct domain *d,
>      for ( i = 0; i < XEN_LEGACY_MAX_VCPUS; i++ )
>          shared_info(d, vcpu_info[i].evtchn_upcall_mask) = 1;
>  
> -    printk("Dom0 has maximum %u VCPUs\n", d->max_vcpus);
> +    printk("%s has maximum %u VCPUs\n", pv_shim ? "DomU" : "Dom0",

"Dom%c ..." perhaps?

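The "Dom%c" suggestion for the printk() above amounts to one format string with the domain letter picked at run time. In this standalone sketch, snprintf() stands in for printk(), and format_vcpu_banner() and its pv_shim parameter are hypothetical stand-ins for the hypervisor's code and flag.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical wrapper: formats the banner instead of calling printk(). */
static int format_vcpu_banner(char *buf, size_t len, bool pv_shim,
                              unsigned int max_vcpus)
{
    return snprintf(buf, len, "Dom%c has maximum %u VCPUs\n",
                    pv_shim ? 'U' : '0', max_vcpus);
}
```

This keeps a single format string, so a grep for the message finds both variants.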
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -1580,20 +1580,28 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>  
>      do_presmp_initcalls();
>  
> -    for_each_present_cpu ( i )
> +    if ( !pv_shim )
>      {
> -        /* Set up cpu_to_node[]. */
> -        srat_detect_node(i);
> -        /* Set up node_to_cpumask based on cpu_to_node[]. */
> -        numa_add_cpu(i);        
> -
> -        if ( (num_online_cpus() < max_cpus) && !cpu_online(i) )
> +        for_each_present_cpu ( i )
>          {
> -            int ret = cpu_up(i);
> -            if ( ret != 0 )
> -                printk("Failed to bring up CPU %u (error %d)\n", i, ret);
> +            /* Set up cpu_to_node[]. */
> +            srat_detect_node(i);
> +            /* Set up node_to_cpumask based on cpu_to_node[]. */
> +            numa_add_cpu(i);
> +
> +            if ( (num_online_cpus() < max_cpus) && !cpu_online(i) )
> +            {
> +                int ret = cpu_up(i);
> +                if ( ret != 0 )
> +                    printk("Failed to bring up CPU %u (error %d)\n", i, ret);
> +            }
>          }
>      }
> +    /*
> +     * NB: when running as a PV shim VCPUOP_up/down is wired to the shim
> +     * physical cpu_add/remove functions, so launch the guest with only
> +     * the BSP online and let it bring up the other CPUs as required.
> +     */

I think this comment would better go immediately ahead of the if()
you introduce.

Jan

* Re: [PATCH RFC v1 52/74] xen: mark xenstore/console pages as RAM and add them to dom_io
  2018-01-09  9:25     ` Roger Pau Monné
@ 2018-01-09 11:03       ` Jan Beulich
  2018-01-09 11:26         ` Roger Pau Monné
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-09 11:03 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Xen-devel, wei.liu2

>>> On 09.01.18 at 10:25, <roger.pau@citrix.com> wrote:
> On Mon, Jan 08, 2018 at 06:49:21AM -0700, Jan Beulich wrote:
>> >>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
>> > +void __init hypervisor_init_memory(void)
>> > +{
>> > +    uint64_t pfn = 0;
>> > +    long rc;
>> > +
>> > +    if ( !xen_guest )
>> > +        return;
>> > +
>> > +#define SHARE_PARAM(p) ({                                                   \
>> > +    rc = xen_hypercall_hvm_get_param(p, &pfn);                             \
>> > +    if ( rc )                                                               \
>> > +        panic("Unable to get " #p);                                         \
>> > +    share_xen_page_with_guest(mfn_to_page(pfn), dom_io, XENSHARE_writable); \
>> 
>> Why dom_io rather than the client domain?
> 
> The client domain is not yet created at this point. This is exactly
> the same that Xen does for the low 1MiB for example.

The low 1Mb is being treated as MMIO, hence remains assigned
to dom_io.

>> The more that dom_io
>> pages can only be mapped by privileged guests (and hence I
>> assume you need another tweak somewhere this way).
> 
> I just use unshare_xen_page and share it again with the guest.

And there is no option of simply doing the sharing here later,
when the domain is already in existence?

Jan


* Re: [PATCH RFC v1 52/74] xen: mark xenstore/console pages as RAM and add them to dom_io
  2018-01-09 11:03       ` Jan Beulich
@ 2018-01-09 11:26         ` Roger Pau Monné
  2018-01-09 13:34           ` Jan Beulich
  0 siblings, 1 reply; 206+ messages in thread
From: Roger Pau Monné @ 2018-01-09 11:26 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, wei.liu2

On Tue, Jan 09, 2018 at 04:03:25AM -0700, Jan Beulich wrote:
> >>> On 09.01.18 at 10:25, <roger.pau@citrix.com> wrote:
> > On Mon, Jan 08, 2018 at 06:49:21AM -0700, Jan Beulich wrote:
> >> >>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> >> > +void __init hypervisor_init_memory(void)
> >> > +{
> >> > +    uint64_t pfn = 0;
> >> > +    long rc;
> >> > +
> >> > +    if ( !xen_guest )
> >> > +        return;
> >> > +
> >> > +#define SHARE_PARAM(p) ({                                                   \
> >> > +    rc = xen_hypercall_hvm_get_param(p, &pfn);                             \
> >> > +    if ( rc )                                                               \
> >> > +        panic("Unable to get " #p);                                         \
> >> > +    share_xen_page_with_guest(mfn_to_page(pfn), dom_io, XENSHARE_writable); \
> >> 
> >> Why dom_io rather than the client domain?
> > 
> > The client domain is not yet created at this point. This is exactly
> > the same that Xen does for the low 1MiB for example.
> 
> The low 1Mb is being treated as MMIO, hence remains assigned
> to dom_io.
> 
> >> The more that dom_io
> >> pages can only be mapped by privileged guests (and hence I
> >> assume you need another tweak somewhere this way).
> > 
> > I just use unshare_xen_page and share it again with the guest.
> 
> And there is no option of simply doing the sharing here later,
> when the domain is already in existence?

I'm afraid that if I don't add the pages to dom_io at this point they
would be added to the free memory pool, and thus might be used for
anything. Maybe I'm missing something, but I didn't find any other way
to deal with this given the short time.

Thanks, Roger.

* Re: [PATCH RFC v1 49/74] x86/guest: map per-cpu vcpu_info area.
  2018-01-08 13:21   ` Jan Beulich
@ 2018-01-09 12:08     ` Roger Pau Monné
  0 siblings, 0 replies; 206+ messages in thread
From: Roger Pau Monné @ 2018-01-09 12:08 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, wei.liu2

On Mon, Jan 08, 2018 at 06:21:04AM -0700, Jan Beulich wrote:
> >>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> > +    long rc;
> > +
> > +    if ( !vcpu_info )
> > +    {
> > +        this_cpu(vcpu_info) = &XEN_shared_info->vcpu_info[vcpu];
> > +        return 0;
> > +    }
> > +
> > +    if ( test_bit(vcpu, vcpu_info_mapped) )
> > +    {
> > +        this_cpu(vcpu_info) = &vcpu_info[vcpu];
> > +        return 0;
> > +    }
> > +
> > +    info.mfn = virt_to_mfn(&vcpu_info[vcpu]);
> > +    info.offset = (unsigned long)&vcpu_info[vcpu] & ~PAGE_MASK;
> > +    rc = xen_hypercall_vcpu_op(VCPUOP_register_vcpu_info, vcpu, &info);
> > +    if ( rc )
> > +        this_cpu(vcpu_info) = &XEN_shared_info->vcpu_info[vcpu];
> 
> You need to avoid producing an out of bounds pointer here for
> large vcpu values.

I guess a BUG is the only sensible outcome here in that case. The BSP
should have already limited the number of possible CPUs if mapping the
vcpu_info failed.
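The fallback path could refuse to form an out-of-bounds pointer along these lines. The shared_info page only has slots for the first XEN_LEGACY_MAX_VCPUS vCPUs (32 on x86). The struct layouts here are placeholders, and BUG_ON() is a userspace stand-in for Xen's macro. legacy_vcpu_info() is a hypothetical name for the fallback.

```c
#include <stdio.h>
#include <stdlib.h>

#define XEN_LEGACY_MAX_VCPUS 32    /* value used on x86 */

struct vcpu_info { unsigned char pad[64]; };            /* placeholder */
struct shared_info { struct vcpu_info vcpu_info[XEN_LEGACY_MAX_VCPUS]; };

/* Userspace stand-in for Xen's BUG_ON(): report and stop. */
#define BUG_ON(cond) do {                                          \
    if ( cond ) { fprintf(stderr, "BUG: %s\n", #cond); abort(); }  \
} while ( 0 )

/* Fallback when VCPUOP_register_vcpu_info fails for this vCPU. */
static struct vcpu_info *legacy_vcpu_info(struct shared_info *s,
                                          unsigned int vcpu)
{
    BUG_ON(vcpu >= XEN_LEGACY_MAX_VCPUS);   /* no slot in shared_info */
    return &s->vcpu_info[vcpu];
}
```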

Thanks, Roger.

* Re: [PATCH RFC v1 52/74] xen: mark xenstore/console pages as RAM and add them to dom_io
  2018-01-09 11:26         ` Roger Pau Monné
@ 2018-01-09 13:34           ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-09 13:34 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Xen-devel, wei.liu2

>>> On 09.01.18 at 12:26, <roger.pau@citrix.com> wrote:
> On Tue, Jan 09, 2018 at 04:03:25AM -0700, Jan Beulich wrote:
>> >>> On 09.01.18 at 10:25, <roger.pau@citrix.com> wrote:
>> > On Mon, Jan 08, 2018 at 06:49:21AM -0700, Jan Beulich wrote:
>> >> >>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
>> >> > +void __init hypervisor_init_memory(void)
>> >> > +{
>> >> > +    uint64_t pfn = 0;
>> >> > +    long rc;
>> >> > +
>> >> > +    if ( !xen_guest )
>> >> > +        return;
>> >> > +
>> >> > +#define SHARE_PARAM(p) ({                                                   \
>> >> > +    rc = xen_hypercall_hvm_get_param(p, &pfn);                             \
>> >> > +    if ( rc )                                                               \
>> >> > +        panic("Unable to get " #p);                                         \
>> >> > +    share_xen_page_with_guest(mfn_to_page(pfn), dom_io, XENSHARE_writable); \
>> >> 
>> >> Why dom_io rather than the client domain?
>> > 
>> > The client domain is not yet created at this point. This is exactly
>> > the same that Xen does for the low 1MiB for example.
>> 
>> The low 1Mb is being treated as MMIO, hence remains assigned
>> to dom_io.
>> 
>> >> The more that dom_io
>> >> pages can only be mapped by privileged guests (and hence I
>> >> assume you need another tweak somewhere this way).
>> > 
>> > I just use unshare_xen_page and share it again with the guest.
>> 
>> And there is no option of simply doing the sharing here later,
>> when the domain is already in existence?
> 
> I'm afraid that if I don't add the pages to dom_io at this point they
> would be added to the free memory pool, and thus might be used for
> anything. Maybe I'm missing something, but I didn't find any other way
> to deal with this given the short time.

The first thing you do is mark these pages as E820 RAM. If that
wasn't done, I don't think they'd end up in the allocator, and
hence could be shared later. I guess there may nevertheless be
a reason to do that early E820 manipulation, but with the patch
having no description I cannot guess what that reason might be
(and hence I can't think of alternatives).

Anyway - in the interest of quick progress I'm fine with this being
left as is for now, as long as revisiting it is being put on a todo
list.

Jan


* Re: [PATCH RFC v1 20/74] x86: produce a binary that can be booted as PVH
  2018-01-08 16:42       ` Jan Beulich
@ 2018-01-09 13:49         ` Wei Liu
  0 siblings, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-09 13:49 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, wei.liu2, Xen-devel

On Mon, Jan 08, 2018 at 09:42:54AM -0700, Jan Beulich wrote:
> >>> On 08.01.18 at 16:59, <wei.liu2@citrix.com> wrote:
> > On Fri, Jan 05, 2018 at 04:39:33AM -0700, Jan Beulich wrote:
> >> >>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> >> > --- a/xen/arch/x86/Makefile
> >> > +++ b/xen/arch/x86/Makefile
> >> > @@ -75,6 +75,8 @@ efi-y := $(shell if [ ! -r $(BASEDIR)/include/xen/compile.h -o \
> >> >                        -O $(BASEDIR)/include/xen/compile.h ]; then \
> >> >                           echo '$(TARGET).efi'; fi)
> >> >  
> >> > +shim-$(CONFIG_PVH_GUEST) := $(TARGET)-shim
> >> > +
> >> >  ifneq ($(build_id_linker),)
> >> >  notes_phdrs = --notes
> >> >  else
> >> > @@ -93,7 +95,7 @@ endif
> >> >  syms-warn-dup-y := --warn-dup
> >> >  syms-warn-dup-$(CONFIG_SUPPRESS_DUPLICATE_SYMBOL_WARNINGS) :=
> >> >  
> >> > -$(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32
> >> > +$(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32 $(shim-y)
> >> >  	./boot/mkelf32 $(notes_phdrs) $(TARGET)-syms $(TARGET) $(XEN_IMG_OFFSET) \
> >> >  	               `$(NM) $(TARGET)-syms | sed -ne 's/^\([^ ]*\) . __2M_rwdata_end$$/0x\1/p'`
> >> 
> >> Hmm, so you mean to build shim and "normal" Xen at the same time,
> >> with all the same objects? That's rather unexpected following the
> >> earlier exchange Andrew and I had. I would expect the shim to not
> >> require quite a few bits and pieces, and hence wanting to be built
> >> independently.
> >> 
> > 
> > There is a later patch in this series to link xen under tools/firmware/
> > to build the shim there, which would need build system patch like this.
> > 
> > This can be cleaned up somehow. At the time I wasn't sure how best to
> > proceed (and certainly didn't take part in the discussion between Andrew
> > and you).
> > 
> > Suggestions welcome.
> 
> Well, when I had discussed this with Andrew, my view on the
> outcome was that we'd build either xen-shim or the pair of
> xen.gz and xen.efi in a single build invocation (hence two build
> all three, a second make would be needed, which would seem
> to be at least along the lines of what that later patch is doing).
> 
> The above dependency, otoh, suggests that you want to
> build both xen.gz and xen-shim.

Removing the dependency should be easy. It was added mostly for
convenience of development.

Wei.

* Re: [PATCH RFC v1 32/74] x86: don't swallow the first command line item in pvh mode
  2018-01-04 13:05 ` [PATCH RFC v1 32/74] x86: don't swallow the first command line item " Wei Liu
  2018-01-05 14:49   ` Jan Beulich
@ 2018-01-09 14:30   ` Roger Pau Monné
  1 sibling, 0 replies; 206+ messages in thread
From: Roger Pau Monné @ 2018-01-09 14:30 UTC (permalink / raw)
  To: Wei Liu; +Cc: Xen-devel

On Thu, Jan 04, 2018 at 01:05:43PM +0000, Wei Liu wrote:
> Instead, special case GRUB1 rather than assuming that all bootloaders except GRUB2
> need a parameter stripping.

The FreeBSD loader also prepends "xen.gz" (or the Xen kernel filename)
to the command line. Hence this change will break it.
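What the stripping itself involves is skipping the first whitespace-delimited word when the loader is known to put the image name there. The hard part, deciding which loaders do this (GRUB1 per the patch, the FreeBSD loader per this reply), is left to the caller in this sketch. skip_image_name() is a hypothetical helper and does not represent Xen's actual command line handling.

```c
#include <stdbool.h>

/* Hypothetical: drop a leading image name (e.g. "xen.gz") if present. */
static const char *skip_image_name(const char *cmdline, bool loader_prepends)
{
    if ( !loader_prepends )
        return cmdline;

    while ( *cmdline == ' ' )
        cmdline++;
    while ( *cmdline && *cmdline != ' ' )
        cmdline++;                 /* the image name itself */
    while ( *cmdline == ' ' )
        cmdline++;

    return cmdline;
}
```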

Roger.

* Re: [PATCH RFC v1 57/74] x86/pv-shim: shadow PV console's page for L2 DomU
  2018-01-09  9:13   ` Jan Beulich
@ 2018-01-09 15:43     ` Sergey Dyasli
  2018-01-09 16:28       ` Jan Beulich
  0 siblings, 1 reply; 206+ messages in thread
From: Sergey Dyasli @ 2018-01-09 15:43 UTC (permalink / raw)
  To: JBeulich; +Cc: Sergey Dyasli, Wei Liu, xen-devel

On Tue, 2018-01-09 at 02:13 -0700, Jan Beulich wrote:
> > > > On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> > +size_t consoled_guest_rx(void)
> > +{
> > +    size_t recv = 0, idx = 0;
> > +    XENCONS_RING_IDX cons, prod;
> > +
> > +    if ( !cons_ring )
> > +        return 0;
> > +
> > +    spin_lock(&rx_lock);
> > +
> > +    cons = cons_ring->out_cons;
> > +    prod = ACCESS_ONCE(cons_ring->out_prod);
> > +    ASSERT((prod - cons) <= sizeof(cons_ring->out));
> > +
> > +    /* Is the ring empty? */
> > +    if ( cons == prod )
> > +        goto out;
> > +
> > +    /* Update pointers before accessing the ring */
> > +    smp_rmb();
> 
> I think this needs to move up ahead of the if(). In the comment
> perhaps s/Update/Latch/?

The read/write memory barriers here order the accesses to ring->out_prod
against the accesses to the ring->out array, so there is no need to move
them (the same goes for the input ring).

-- 
Thanks,
Sergey
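The ordering argument above can be expressed with C11 atomics. An acquire load of the producer index plays the role of ACCESS_ONCE() followed by smp_rmb(): it orders that load before the reads of the ring slots it covers. The empty check only compares indices that have already been loaded, so its position relative to the barrier does not matter. This is a sketch of a hypothetical single-producer/single-consumer byte ring, using C11 acquire/release in place of Xen's barrier primitives. The 2048-byte size mirrors the console's out[] array but is an assumption here.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 2048u                   /* power of two (assumed) */
#define MASK(idx) ((idx) & (RING_SIZE - 1))

/* Hypothetical byte ring shaped like the console's output ring. */
struct ring {
    char out[RING_SIZE];
    _Atomic uint32_t out_cons, out_prod;
};

/* Consumer: latch out_prod (acquire), then read the slots it covers. */
static size_t ring_read(struct ring *r, char *buf, size_t len)
{
    uint32_t cons = atomic_load_explicit(&r->out_cons, memory_order_relaxed);
    uint32_t prod = atomic_load_explicit(&r->out_prod, memory_order_acquire);
    size_t n = 0;

    while ( cons != prod && n < len )
        buf[n++] = r->out[MASK(cons++)];

    /* Release: slot reads complete before the producer may reuse them. */
    atomic_store_explicit(&r->out_cons, cons, memory_order_release);
    return n;
}

/* Producer: fill free slots, then publish them with a release store. */
static size_t ring_write(struct ring *r, const char *buf, size_t len)
{
    uint32_t prod = atomic_load_explicit(&r->out_prod, memory_order_relaxed);
    uint32_t cons = atomic_load_explicit(&r->out_cons, memory_order_acquire);
    size_t n = 0;

    while ( prod - cons < RING_SIZE && n < len )
        r->out[MASK(prod++)] = buf[n++];

    atomic_store_explicit(&r->out_prod, prod, memory_order_release);
    return n;
}
```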

* Re: [PATCH RFC v1 21/74] x86/entry: Early PVH boot code
  2018-01-05 13:32   ` Jan Beulich
@ 2018-01-09 15:45     ` Wei Liu
  2018-01-09 16:41       ` Jan Beulich
  0 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-09 15:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, wei.liu2, Xen-devel

On Fri, Jan 05, 2018 at 06:32:56AM -0700, Jan Beulich wrote:
> > +    module_t *mod;
> > +    unsigned int i;
> > +
> > +    ASSERT(pvh_info->magic == XEN_HVM_START_MAGIC_VALUE);
> > +
> > +    /*
> > +     * Turn hvm_start_info into mbi. Luckily all modules are placed under 4GB
> > +     * boundary on x86.
> 
> ISTR having that discussion relatively recently in another context:
> All the header states is "NB: Xen on x86 will always try to place all
> the data below the 4GiB boundary." Note the "try to". Hence I
> think ...
> 
> > +     */
> > +    pvh_mbi.flags = MBI_CMDLINE | MBI_MODULES | MBI_LOADERNAME;
> > +
> > +    ASSERT(!(pvh_info->cmdline_paddr >> 32));
> 
> ... this, if we don't want to handle the case, should be BUG_ON() or
> panic() (same further down).
> 
> > +    pvh_mbi.cmdline = pvh_info->cmdline_paddr;
> > +    pvh_mbi.boot_loader_name = __pa(pvh_loader);
> > +
> > +    ASSERT(pvh_info->nr_modules < 32);
> 
> ARRAY_SIZE(pvh_mbi_mods) and perhaps again BUG_ON() or
> panic().
> 
> > +    pvh_mbi.mods_count = pvh_info->nr_modules;
> > +    pvh_mbi.mods_addr = __pa(pvh_mbi_mods);
> > +
> > +    mod = pvh_mbi_mods;
> > +    entry = __va(pvh_info->modlist_paddr);
> 
> How come __va() already works at this point in time? And what about
> this address being beyond 4Gb?
> 

The original code uses __va at the beginning of __start_xen, so this is
no more erroneous than what we originally had.

We shall BUG_ON() on addresses beyond 4Gb for the time being.

> > +    for ( i = 0; i < pvh_info->nr_modules; i++ )
> > +    {
> > +        ASSERT(!(entry[i].paddr >> 32));
> 
> To relax this condition (in particular to allow huge initrd), how
> about ...
> 
> > +        mod[i].mod_start = entry[i].paddr;
> > +        mod[i].mod_end   = entry[i].paddr + entry[i].size;
> 
> ... using the EFI approach here and store the PFN in mod_start
> and the size in mod_end?


This function turns pvh_info into multiboot info. I'm afraid I don't
follow your suggestion here. The best approach for now is to BUG_ON() here
and consider huge initrds later.

(I will try to fix other comments where I can)

Wei.

* Re: [PATCH RFC v1 53/74] xen/pvshim: modify Dom0 builder in order to build a DomU
  2018-01-08 14:06   ` Jan Beulich
@ 2018-01-09 16:09     ` Roger Pau Monné
  2018-01-09 16:26       ` Jan Beulich
  0 siblings, 1 reply; 206+ messages in thread
From: Roger Pau Monné @ 2018-01-09 16:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, wei.liu2

On Mon, Jan 08, 2018 at 07:06:14AM -0700, Jan Beulich wrote:
> >>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> > From: Roger Pau Monne <roger.pau@citrix.com>
> > --- a/xen/arch/x86/pv/dom0_build.c
> > +++ b/xen/arch/x86/pv/dom0_build.c
> > @@ -31,9 +31,8 @@
> >  #define L3_PROT (BASE_PROT|_PAGE_DIRTY)
> >  #define L4_PROT (BASE_PROT|_PAGE_DIRTY)
> >  
> > -static __init void dom0_update_physmap(struct domain *d, unsigned long pfn,
> > -                                       unsigned long mfn,
> > -                                       unsigned long vphysmap_s)
> > +__init void dom0_update_physmap(struct domain *d, unsigned long pfn,
> 
> Please don't re-order type and annotation.

I'm not re-ordering anything here, just removing "static". Do you mean
that you prefer "void __init ..."?

Thanks, Roger.

* Re: [PATCH RFC v1 53/74] xen/pvshim: modify Dom0 builder in order to build a DomU
  2018-01-09 16:09     ` Roger Pau Monné
@ 2018-01-09 16:26       ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-09 16:26 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Xen-devel, wei.liu2

>>> On 09.01.18 at 17:09, <roger.pau@citrix.com> wrote:
> On Mon, Jan 08, 2018 at 07:06:14AM -0700, Jan Beulich wrote:
>> >>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
>> > From: Roger Pau Monne <roger.pau@citrix.com>
>> > --- a/xen/arch/x86/pv/dom0_build.c
>> > +++ b/xen/arch/x86/pv/dom0_build.c
>> > @@ -31,9 +31,8 @@
>> >  #define L3_PROT (BASE_PROT|_PAGE_DIRTY)
>> >  #define L4_PROT (BASE_PROT|_PAGE_DIRTY)
>> >  
>> > -static __init void dom0_update_physmap(struct domain *d, unsigned long pfn,
>> > -                                       unsigned long mfn,
>> > -                                       unsigned long vphysmap_s)
>> > +__init void dom0_update_physmap(struct domain *d, unsigned long pfn,
>> 
>> Please don't re-order type and annotation.
> 
> I'm not re-ordering anything here, just removing "static".

Oops, I'm sorry. Things being mis-ordered simply becomes more
obvious with "static" gone.

> Do you mean that you prefer "void __init ..."?

Yes.

Jan


* Re: [PATCH RFC v1 54/74] xen/pvshim: set correct domid value
  2018-01-08 14:17   ` Jan Beulich
@ 2018-01-09 16:27     ` Roger Pau Monné
  0 siblings, 0 replies; 206+ messages in thread
From: Roger Pau Monné @ 2018-01-09 16:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, wei.liu2

On Mon, Jan 08, 2018 at 07:17:16AM -0700, Jan Beulich wrote:
> >>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> > @@ -576,11 +578,11 @@ static void noinline init_done(void)
> >  
> >      system_state = SYS_STATE_active;
> >  
> > +    domain_unpause_by_systemcontroller(dom0);
> > +
> >      /* MUST be done prior to removing .init data. */
> >      unregister_init_virtual_region();
> >  
> > -    domain_unpause_by_systemcontroller(hardware_domain);
> 
> Why the re-ordering? Along the lines of the earlier comment,
> using "dom0" as replacement (static) variable isn't very nice.
> Please at least accompany its declaration by a comment.

The 'dom0' variable is in the .init section, so it seems best to do
the unpause first and then remove the init virtual regions.

* Re: [PATCH RFC v1 57/74] x86/pv-shim: shadow PV console's page for L2 DomU
  2018-01-09 15:43     ` Sergey Dyasli
@ 2018-01-09 16:28       ` Jan Beulich
  2018-01-10 16:56         ` Sergey Dyasli
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-09 16:28 UTC (permalink / raw)
  To: Sergey Dyasli; +Cc: xen-devel, Wei Liu

>>> On 09.01.18 at 16:43, <sergey.dyasli@citrix.com> wrote:
> On Tue, 2018-01-09 at 02:13 -0700, Jan Beulich wrote:
>> > > > On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
>> > +size_t consoled_guest_rx(void)
>> > +{
>> > +    size_t recv = 0, idx = 0;
>> > +    XENCONS_RING_IDX cons, prod;
>> > +
>> > +    if ( !cons_ring )
>> > +        return 0;
>> > +
>> > +    spin_lock(&rx_lock);
>> > +
>> > +    cons = cons_ring->out_cons;
>> > +    prod = ACCESS_ONCE(cons_ring->out_prod);
>> > +    ASSERT((prod - cons) <= sizeof(cons_ring->out));
>> > +
>> > +    /* Is the ring empty? */
>> > +    if ( cons == prod )
>> > +        goto out;
>> > +
>> > +    /* Update pointers before accessing the ring */
>> > +    smp_rmb();
>> 
>> I think this needs to move up ahead of the if(). In the comment
>> perhaps s/Update/Latch/?
> 
> The read/write memory barriers here are between read/write accesses to
> ring->out_prod and ring->out array. So there is no need to move them.
> (the same goes for the input ring)

And there is no multiple-read issue here?

Jan


* Re: [PATCH RFC v1 21/74] x86/entry: Early PVH boot code
  2018-01-09 15:45     ` Wei Liu
@ 2018-01-09 16:41       ` Jan Beulich
  2018-01-09 17:10         ` Wei Liu
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-09 16:41 UTC (permalink / raw)
  To: wei.liu2; +Cc: Andrew Cooper, Xen-devel

>>> On 09.01.18 at 16:45, <wei.liu2@citrix.com> wrote:
> On Fri, Jan 05, 2018 at 06:32:56AM -0700, Jan Beulich wrote:
>> > +    pvh_mbi.mods_count = pvh_info->nr_modules;
>> > +    pvh_mbi.mods_addr = __pa(pvh_mbi_mods);
>> > +
>> > +    mod = pvh_mbi_mods;
>> > +    entry = __va(pvh_info->modlist_paddr);
>> 
>> How come __va() already works at this point in time? And what about
>> this address being beyond 4Gb?
>> 
> 
> The original code uses __va at the beginning of __start_xen, so this is
> no more erroneous than what we originally had.

Well, I was assuming that these uses of __va() here are the
reason why you need to extend the initial mapping in another
patch. The original ones early in __start_xen() all deal with the
MBI which we've relocated to a place where __va() can be used.

>> > +    for ( i = 0; i < pvh_info->nr_modules; i++ )
>> > +    {
>> > +        ASSERT(!(entry[i].paddr >> 32));
>> 
>> To relax this condition (in particular to allow huge initrd), how
>> about ...
>> 
>> > +        mod[i].mod_start = entry[i].paddr;
>> > +        mod[i].mod_end   = entry[i].paddr + entry[i].size;
>> 
>> ... using the EFI approach here and store the PFN in mod_start
>> and the size in mod_end?
> 
> 
> This function turns pvh_info into multiboot info. I'm afraid I don't
> follow your suggestion here. The best approach for now is to BUG_ON() here
> and consider huge initrds later.

Doing this later is fine of course; what I'm referring to is that
you store paddr of start and end, whereas the early EFI code
stores PFN and size (and the consumer code in __start_xen
knows to tell the cases apart).
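The two encodings contrasted above can be sketched as follows. This is an illustration under stated assumptions, not the patch: `struct demo_pvh_mod` and `struct demo_mb_mod` are simplified stand-ins mirroring the `paddr`/`size` and `mod_start`/`mod_end` fields used in the quoted code, 4KiB pages are assumed, and `DEMO_MAX_MODS` plays the role of `ARRAY_SIZE(pvh_mbi_mods)`. The range checks return an error rather than relying on an ASSERT()-style check, since Xen's ASSERT() is compiled out in non-debug builds (the reason BUG_ON() or panic() was suggested earlier). Whoever consumes the result has to know which encoding is in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12          /* assumes 4KiB pages */
#define DEMO_MAX_MODS   32u         /* stands in for ARRAY_SIZE(pvh_mbi_mods) */

/* Simplified stand-ins for a PVH module list entry and a multiboot module. */
struct demo_pvh_mod { uint64_t paddr, size; };
struct demo_mb_mod  { uint32_t mod_start, mod_end; };

/* Encoding used by the patch: byte addresses of start and end, below 4GiB. */
static bool convert_paddr(struct demo_mb_mod *mod,
                          const struct demo_pvh_mod *entry, unsigned int nr)
{
    if ( nr > DEMO_MAX_MODS )
        return false;

    for ( unsigned int i = 0; i < nr; i++ )
    {
        uint64_t end = entry[i].paddr + entry[i].size;

        /* Reject ends at or above 4GiB, and wrap-around. */
        if ( (end >> 32) || end < entry[i].paddr )
            return false;
        mod[i].mod_start = (uint32_t)entry[i].paddr;
        mod[i].mod_end   = (uint32_t)end;
    }
    return true;
}

/* EFI-style encoding: page frame number in mod_start, byte size in mod_end. */
static bool convert_pfn(struct demo_mb_mod *mod,
                        const struct demo_pvh_mod *entry, unsigned int nr)
{
    if ( nr > DEMO_MAX_MODS )
        return false;

    for ( unsigned int i = 0; i < nr; i++ )
    {
        uint64_t pfn = entry[i].paddr >> DEMO_PAGE_SHIFT;

        /*
         * Requires a page-aligned start.  A 32-bit PFN reaches 16TiB;
         * the size itself is still limited to 32 bits.
         */
        if ( (entry[i].paddr & ((1u << DEMO_PAGE_SHIFT) - 1)) ||
             (pfn >> 32) || (entry[i].size >> 32) )
            return false;
        mod[i].mod_start = (uint32_t)pfn;
        mod[i].mod_end   = (uint32_t)entry[i].size;
    }
    return true;
}
```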

Jan


* Re: [PATCH RFC v1 55/74] xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU
  2018-01-09  8:00       ` Jan Beulich
@ 2018-01-09 16:45         ` Roger Pau Monné
  2018-01-09 17:42           ` Jan Beulich
  0 siblings, 1 reply; 206+ messages in thread
From: Roger Pau Monné @ 2018-01-09 16:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, wei.liu2, Anthony Liguori

On Tue, Jan 09, 2018 at 01:00:10AM -0700, Jan Beulich wrote:
> >>> On 08.01.18 at 17:22, <roger.pau@citrix.com> wrote:
> > On Mon, Jan 08, 2018 at 09:05:40AM -0700, Jan Beulich wrote:
> >> >>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> >> > +        unsigned long evtchn = xchg(&XEN_shared_info->evtchn_pending[l1], 0);
> >> > +
> >> > +        __clear_bit(l1, &pending);
> >> > +        evtchn &= ~XEN_shared_info->evtchn_mask[l1];
> >> > +        while ( evtchn )
> >> > +        {
> >> > +            unsigned int port = ffsl(evtchn) - 1;
> >> > +
> >> > +            __clear_bit(port, &evtchn);
> >> > +            port += l1 * BITS_PER_LONG;
> >> 
> >> What about a 32-bit client? If that's not intended to be supported,
> >> building of such a guest should be prevented (in dom0_build.c).
> > 
> > 32bit client? You mean building a shim that runs in 32bit mode? If so
> > I haven't really thought of it, but in any case BITS_PER_LONG would be
> > OK also in that case?
> 
> No, by "client" I mean the (sole) guest of the shim, in the 32-bit
> case of which you'd need to use BITS_PER_EVTCHN_WORD() here.
> But since 32-bit PV guests are not a problem wrt SP3, I can see
> why we wouldn't want/need to support that case. Yet if so, I'd
> prefer if we did that uniformly, by e.g. also avoiding the compat
> complications in the new grant table wrapper.

Hm, I'm afraid I'm not following. Xen is 64-bit, and this is the
shared_info page of the shim (Xen), so the word size is BITS_PER_LONG.

32-bit PV guests have been tested and seem to work fine. Whether
someone would want to convert them or not I don't know, but it's
almost no extra effort to provide a shim that works for both 32-bit
and 64-bit guests.
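As a sketch of why the word size follows the shim rather than the guest: the upcall handler walks the 2-level bitmaps of the shim's own `shared_info`, so ports are numbered `l1 * BITS_PER_LONG + bit` using the shim's (64-bit) `unsigned long`. The structure below is a hypothetical stand-in for the relevant `shared_info`/`vcpu_info` fields, and `__builtin_ctzl()` (a GCC/Clang builtin) yields the same 0-based index that `ffsl() - 1` computes in the quoted hunk. Like the hunk, it clears a whole pending word before applying the mask; plain accesses replace `xchg()`, so the sketch is single-threaded only.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_NR_WORDS      DEMO_BITS_PER_LONG  /* one selector bit per word */

/* Hypothetical stand-in for the relevant shared_info/vcpu_info fields. */
struct demo_shared {
    unsigned long pending_sel;
    unsigned long pending[DEMO_NR_WORDS];
    unsigned long mask[DEMO_NR_WORDS];
};

/* Collect pending, unmasked ports into ports[]; extra ports beyond max are dropped. */
static unsigned int demo_scan(struct demo_shared *s,
                              unsigned int *ports, unsigned int max)
{
    unsigned long sel = s->pending_sel;
    unsigned int n = 0;

    s->pending_sel = 0;
    while ( sel )
    {
        unsigned int l1 = __builtin_ctzl(sel);  /* same as ffsl(sel) - 1 */
        unsigned long bits = s->pending[l1];

        sel &= sel - 1;                         /* clear the lowest set bit */
        s->pending[l1] = 0;                     /* as in the hunk: clear, then mask */
        bits &= ~s->mask[l1];
        while ( bits )
        {
            unsigned int bit = __builtin_ctzl(bits);

            bits &= bits - 1;
            if ( n < max )
                ports[n++] = l1 * DEMO_BITS_PER_LONG + bit;
        }
    }
    return n;
}
```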

> >> > +    case EVTCHNOP_unmask: {
> >> > +        struct evtchn_unmask unmask;
> >> > +
> >> > +        if ( copy_from_guest(&unmask, arg, 1) != 0 )
> >> > +            return -EFAULT;
> >> > +
> >> > +        /* Unmask is handled in L1 */
> >> > +        rc = evtchn_unmask(unmask.port);
> >> > +
> >> > +        break;
> >> > +    }
> >> 
> >> Is this really sufficient, without handing anything through to L0?
> >> Perhaps it's fine as long as there's no pass-through support here.
> > 
> > For the unmask operation? I think so, if there was a pending event the
> > shim will already take care of injecting it to the guest.
> 
> Well, as the Linux code (evtchn_2l_unmask()) tells us certain
> unmasks have to go through the hypervisor. I would assume
> that in the case of the shim this means that L2 requests need
> to also be handed through to L0 whenever they're not being
> handled entirely locally to L1.

I'm not sure any L2 unmask needs to go through L0. If we perform the
unmask in L1 and there's an event pending L1 will already inject an
interrupt into L2, and AFAIK that's the point of using EVTCHNOP_unmask
(get an interrupt after unmask if an event is pending).
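For reference, the 2-level unmask semantics this argument relies on can be sketched as below (single vCPU, no atomics, no notification kick; `struct demo_evtchn_state` and `demo_unmask()` are hypothetical). Clearing the mask bit of a port that is already pending re-raises its selector bit and the upcall flag, which is the point at which the shim would inject the upcall into L2. The sketch models only the L1-local handling described above; it does not settle whether any L2 unmask must also reach L0.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define DEMO_BPL (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical single-vCPU view of 2-level event channel state. */
struct demo_evtchn_state {
    unsigned long pending[DEMO_BPL];
    unsigned long mask[DEMO_BPL];
    unsigned long pending_sel;      /* one bit per pending[] word */
    bool upcall_pending;
};

/*
 * Unmask a port.  Returns true if this raised a new upcall, i.e. the port
 * was masked, is pending, and its selector bit was not already set.
 */
static bool demo_unmask(struct demo_evtchn_state *s, unsigned int port)
{
    unsigned int word = port / DEMO_BPL;
    unsigned long bit = 1UL << (port % DEMO_BPL);
    unsigned long sel_bit;
    bool was_masked;

    assert(port < DEMO_BPL * DEMO_BPL);
    sel_bit = 1UL << word;
    was_masked = s->mask[word] & bit;
    s->mask[word] &= ~bit;

    if ( !was_masked || !(s->pending[word] & bit) )
        return false;               /* nothing was held back by the mask */
    if ( s->pending_sel & sel_bit )
        return false;               /* an upcall is already due for this word */

    s->pending_sel |= sel_bit;
    s->upcall_pending = true;       /* the shim would inject the upcall into L2 here */
    return true;
}
```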

> >> > @@ -1030,6 +1055,11 @@ long do_event_channel_op(int cmd, 
> >> > XEN_GUEST_HANDLE_PARAM(void) arg)
> >> >  {
> >> >      long rc;
> >> >  
> >> > +#ifdef CONFIG_X86
> >> > +    if ( pv_shim )
> >> > +        return pv_shim_event_channel_op(cmd, arg);
> >> > +#endif
> >> 
> >> Patch it right into the hypercall table instead?
> > 
> > That would only work if the shim is a compile time option, but not a
> > run time one, the hypercall table is ro.
> 
> Well, yes and no: See nmi_shootdown_cpus() for a precedent
> of how to do that without removing the r/o attribute. Not having
> the hook sit here would (I assume) allow to avoid compiling the
> entire do_event_channel_op() down the road in the shim-only
> case. The compiler may be able to partially do this (omitting the
> rest of the function), but my experience is that deferring to the
> compiler in this regard often means leaving some traces around.

I see, I could use the write_atomic + directmap trick, but I think I
will leave that for later, since it doesn't seem crucial to me.

Thanks, Roger.

* Re: [PATCH RFC v1 21/74] x86/entry: Early PVH boot code
  2018-01-09 16:41       ` Jan Beulich
@ 2018-01-09 17:10         ` Wei Liu
  0 siblings, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-09 17:10 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, wei.liu2, Xen-devel

On Tue, Jan 09, 2018 at 09:41:51AM -0700, Jan Beulich wrote:
> >>> On 09.01.18 at 16:45, <wei.liu2@citrix.com> wrote:
> > On Fri, Jan 05, 2018 at 06:32:56AM -0700, Jan Beulich wrote:
> >> > +    pvh_mbi.mods_count = pvh_info->nr_modules;
> >> > +    pvh_mbi.mods_addr = __pa(pvh_mbi_mods);
> >> > +
> >> > +    mod = pvh_mbi_mods;
> >> > +    entry = __va(pvh_info->modlist_paddr);
> >> 
> >> How come __va() already works at this point in time? And what about
> >> this address being beyond 4Gb?
> >> 
> > 
> > The original code uses __va at the beginning of __start_xen, so this is
> > no more erroneous than what we originally had.
> 
> Well, I was assuming that these uses of __va() here are the
> reason why you need to extend the initial mapping in another
> patch. The original ones early in __start_xen() all deal with the
> MBI which we've relocated to a place where __va() can be used.

I see -- I thought everything was relocated automatically, which in
hindsight looks very stupid. That's probably why Andrew wrote that patch
to extend the mapping. We can certainly relocate pvh info as well, but
then that would delay the work.  We can add that as a blocker for the
proper solution later.

Wei.

* Re: [PATCH RFC v1 55/74] xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU
  2018-01-09 16:45         ` Roger Pau Monné
@ 2018-01-09 17:42           ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-09 17:42 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Xen-devel, wei.liu2, Anthony Liguori

>>> On 09.01.18 at 17:45, <roger.pau@citrix.com> wrote:
> On Tue, Jan 09, 2018 at 01:00:10AM -0700, Jan Beulich wrote:
>> >>> On 08.01.18 at 17:22, <roger.pau@citrix.com> wrote:
>> > On Mon, Jan 08, 2018 at 09:05:40AM -0700, Jan Beulich wrote:
>> >> >>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
>> >> > +        unsigned long evtchn = xchg(&XEN_shared_info->evtchn_pending[l1], 0);
>> >> > +
>> >> > +        __clear_bit(l1, &pending);
>> >> > +        evtchn &= ~XEN_shared_info->evtchn_mask[l1];
>> >> > +        while ( evtchn )
>> >> > +        {
>> >> > +            unsigned int port = ffsl(evtchn) - 1;
>> >> > +
>> >> > +            __clear_bit(port, &evtchn);
>> >> > +            port += l1 * BITS_PER_LONG;
>> >> 
>> >> What about a 32-bit client? If that's not intended to be supported,
>> >> building of such a guest should be prevented (in dom0_build.c).
>> > 
>> > 32bit client? You mean building a shim that runs in 32bit mode? If so
>> > I haven't really thought of it, but in any case BITS_PER_LONG would be
>> > OK also in that case?
>> 
>> No, by "client" I mean the (sole) guest of the shim, in the 32-bit
>> case of which you'd need to use BITS_PER_EVTCHN_WORD() here.
>> But since 32-bit PV guests are not a problem wrt SP3, I can see
>> why we wouldn't want/need to support that case. Yet if so, I'd
>> prefer if we did that uniformly, by e.g. also avoiding the compat
>> complications in the new grant table wrapper.
> 
> Hm, I'm afraid I'm not following. Xen is 64-bit, and this is the
> shared_info page of the shim (Xen), so the word size is BITS_PER_LONG.

Oh, in that case I'm sorry for being the one who was confused here.
I was certainly under the impression that this is the page shared
with the client domain.

>> >> > +    case EVTCHNOP_unmask: {
>> >> > +        struct evtchn_unmask unmask;
>> >> > +
>> >> > +        if ( copy_from_guest(&unmask, arg, 1) != 0 )
>> >> > +            return -EFAULT;
>> >> > +
>> >> > +        /* Unmask is handled in L1 */
>> >> > +        rc = evtchn_unmask(unmask.port);
>> >> > +
>> >> > +        break;
>> >> > +    }
>> >> 
>> >> Is this really sufficient, without handing anything through to L0?
>> >> Perhaps it's fine as long as there's no pass-through support here.
>> > 
>> > For the unmask operation? I think so, if there was a pending event the
>> > shim will already take care of injecting it to the guest.
>> 
>> Well, as the Linux code (evtchn_2l_unmask()) tells us certain
>> unmasks have to go through the hypervisor. I would assume
>> that in the case of the shim this means that L2 requests need
>> to also be handed through to L0 whenever they're not being
>> handled entirely locally to L1.
> 
> I'm not sure any L2 unmask needs to go through L0. If we perform the
> unmask in L1 and there's an event pending L1 will already inject an
> interrupt into L2, and AFAIK that's the point of using EVTCHNOP_unmask
> (get an interrupt after unmask if an event is pending).

Possible, but to be honest I'm not sure: If getting an event was
all that's wanted in Linux, I don't think it would need to be done
by issuing a hypercall. Otoh maybe that code just isn't optimally
written. IOW - as long as things work, I'm fine here.

Jan


* Re: [PATCH RFC v1 55/74] xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU
  2018-01-08 16:05   ` Jan Beulich
  2018-01-08 16:22     ` Roger Pau Monné
@ 2018-01-09 17:50     ` Anthony Liguori
  2018-01-10 12:23       ` Roger Pau Monné
  1 sibling, 1 reply; 206+ messages in thread
From: Anthony Liguori @ 2018-01-09 17:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Anthony Liguori, Roger Pau Monne

On Mon, Jan 8, 2018 at 8:05 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
>> From: Roger Pau Monne <roger.pau@citrix.com>
>>
>> Note that the unmask and the virq operations are handled by the shim
>> itself, and that FIFO event channels are not exposed to the guest.
>>
>> Signed-off-by: Anthony Liguori <aliguori@amazon.com>
>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>> Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
>
> In RFC state this certainly doesn't matter yet, but generally I'd
> expect From: to match the first S-o-b.
>
>> @@ -155,11 +156,31 @@ static void set_vcpu_id(void)
>>  static void xen_evtchn_upcall(struct cpu_user_regs *regs)
>>  {
>>      struct vcpu_info *vcpu_info = this_cpu(vcpu_info);
>> +    unsigned long pending;
>>
>>      vcpu_info->evtchn_upcall_pending = 0;
>> -    xchg(&vcpu_info->evtchn_pending_sel, 0);
>> +    pending = xchg(&vcpu_info->evtchn_pending_sel, 0);
>>
>> -    pv_console_rx(regs);
>> +    while ( pending )
>> +    {
>> +        unsigned int l1 = ffsl(pending) - 1;
>
> find_first_set_bit() would look to be the better match here (and
> below), not the least because it translates (on capable hardware)
> to TZCNT instead of BSF.
>
>> +        unsigned long evtchn = xchg(&XEN_shared_info->evtchn_pending[l1], 0);
>> +
>> +        __clear_bit(l1, &pending);
>> +        evtchn &= ~XEN_shared_info->evtchn_mask[l1];
>> +        while ( evtchn )
>> +        {
>> +            unsigned int port = ffsl(evtchn) - 1;
>> +
>> +            __clear_bit(port, &evtchn);
>> +            port += l1 * BITS_PER_LONG;
>
> What about a 32-bit client? If that's not intended to be supported,
> building of such a guest should be prevented (in dom0_build.c).

Note that we discarded this approach in the Vixen series because it
wasn't working reliably for injecting remote event channel
notifications.

Regards,

Anthony Liguori

* Re: [PATCH RFC v1 56/74] xen/pvshim: add grant table operations
  2018-01-08 17:19   ` Jan Beulich
@ 2018-01-09 18:34     ` Roger Pau Monné
  2018-01-10  7:28       ` Jan Beulich
  0 siblings, 1 reply; 206+ messages in thread
From: Roger Pau Monné @ 2018-01-09 18:34 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, wei.liu2, Anthony Liguori

On Mon, Jan 08, 2018 at 10:19:39AM -0700, Jan Beulich wrote:
> >>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> > @@ -30,11 +31,17 @@
> >  #include <asm/guest.h>
> >  #include <asm/pv/mm.h>
> >  
> > +#include <compat/grant_table.h>
> 
> Interesting: The event channel patch gave me the impression that
> it is not intended to deal with 32-bit guests.

AFAICT the event channel didn't need any explicit compat stuff. That's
not the case with grant tables however...

> > @@ -360,6 +367,173 @@ void pv_shim_inject_evtchn(unsigned int port)
> >      }
> >  }
> >  
> > +long pv_shim_grant_table_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) uop,
> > +                            unsigned int count, bool compat)
> > +{
> > +    struct domain *d = current->domain;
> > +    long rc = 0;
> > +
> > +    if ( count != 1 )
> > +        return -EINVAL;
> > +
> > +    switch ( cmd )
> > +    {
> > +    case GNTTABOP_setup_table:
> > +    {
> > +        struct gnttab_setup_table nat;
> > +        struct compat_gnttab_setup_table cmp;
> > +        unsigned int i;
> > +
> > +        if ( unlikely(compat ? copy_from_guest(&cmp, uop, 1)
> > +                             : copy_from_guest(&nat, uop, 1)) ||
> > +             unlikely(compat ? !compat_handle_okay(cmp.frame_list,
> > +                                                   cmp.nr_frames)
> > +                             : !guest_handle_okay(nat.frame_list,
> > +                                                  nat.nr_frames)) )
> > +        {
> > +            rc = -EFAULT;
> > +            break;
> > +        }
> > +        if ( compat )
> > +#define XLAT_gnttab_setup_table_HNDL_frame_list(d, s)
> > +                XLAT_gnttab_setup_table(&nat, &cmp);
> > +#undef XLAT_gnttab_setup_table_HNDL_frame_list
> > +
> > +        nat.status = GNTST_okay;
> > +
> > +        spin_lock(&grant_lock);
> > +        if ( !nr_grant_list )
> > +        {
> > +            struct gnttab_query_size query_size = {
> > +                .dom = DOMID_SELF,
> > +            };
> > +
> > +            rc = xen_hypercall_grant_table_op(GNTTABOP_query_size,
> > +                                              &query_size, 1);
> > +            if ( rc )
> > +            {
> > +                spin_unlock(&grant_lock);
> > +                break;
> > +            }
> > +
> > +            ASSERT(!grant_frames);
> > +            grant_frames = xzalloc_array(unsigned long,
> > +                                         query_size.max_nr_frames);
> 
> Hmm, such runtime allocations (especially when the amount can
> be large) are a fundamental problem. I think this needs setting
> up before the guest is started.

The shim already sets some memory apart for its own usage. It could
be moved to some shim-start function, but it will likely have to be
freed and allocated again on migration, since the number of grant
table frames can change when migrating from one host to another.
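One way to address the runtime-allocation concern while accommodating the migration case is to size the frame array outside the hypercall path: once at shim start and again on resume after migration, using the `max_nr_frames` value reported by `GNTTABOP_query_size`. A minimal sketch, with `struct demo_gnt` and `demo_gnt_setup()` as hypothetical names and `calloc()` standing in for Xen's `xzalloc_array()`; existing entries are discarded whenever the size changes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bookkeeping for the frames backing the guest's grant table. */
struct demo_gnt {
    unsigned long *frames;
    unsigned int nr;
};

/*
 * Call at shim start and after migration, never from the hypercall path.
 * Reallocates only when L0's limit changed; returns 0 on success or -1 if
 * the allocation failed (the old array is then kept).
 */
static int demo_gnt_setup(struct demo_gnt *g, unsigned int max_nr_frames)
{
    unsigned long *p;

    assert(g);
    if ( g->frames && g->nr == max_nr_frames )
        return 0;

    p = calloc(max_nr_frames, sizeof(*p));
    if ( !p )
        return -1;

    free(g->frames);
    g->frames = p;
    g->nr = max_nr_frames;
    return 0;
}
```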

> > +    {
> > +        struct gnttab_query_size op;
> > +        int rc;
> > +
> > +        if ( unlikely(copy_from_guest(&op, uop, 1)) )
> > +        {
> > +            rc = -EFAULT;
> > +            break;
> > +        }
> > +
> > +        rc = xen_hypercall_grant_table_op(GNTTABOP_query_size, &op, count);
> > +        if ( rc )
> > +            break;
> > +
> > +        if ( copy_to_guest(uop, &op, 1) )
> 
> __copy_to_guest() (assuming this copying in and out is necessary
> in the first place).

I guess this could be bypassed by just using uop instead of op in the
hypercall?

> > +        {
> > +            rc = -EFAULT;
> > +            break;
> > +        }
> > +
> > +        break;
> > +    }
> > +    default:
> > +        rc = -ENOSYS;
> 
> -EOPNOTSUPP again please. Plus - what about other sub-ops?

They are not yet implemented. I think this is the bare minimum needed to
boot a PV DomU; we can expand this later on.
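The error-code convention requested above can be sketched as follows: as commonly asked for in Xen review, -ENOSYS signals that a hypercall as a whole is not implemented, while an unimplemented sub-op of an existing hypercall is reported with -EOPNOTSUPP. The enum values are hypothetical and do not reproduce the real GNTTABOP_* numbering; EOPNOTSUPP comes from the POSIX <errno.h>.

```c
#include <errno.h>

/* Hypothetical sub-op numbers; the real GNTTABOP_* values are not assumed. */
enum demo_gnttabop { DEMO_OP_SETUP_TABLE, DEMO_OP_QUERY_SIZE, DEMO_OP_MAP };

static long demo_grant_table_op(unsigned int cmd, unsigned int count)
{
    if ( count != 1 )
        return -EINVAL;

    switch ( cmd )
    {
    case DEMO_OP_SETUP_TABLE:
    case DEMO_OP_QUERY_SIZE:
        return 0;               /* handled by the shim */
    default:
        /* The hypercall exists; only this sub-op is not implemented. */
        return -EOPNOTSUPP;
    }
}
```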

Thanks, Roger.

* Re: [PATCH RFC v1 56/74] xen/pvshim: add grant table operations
  2018-01-09 18:34     ` Roger Pau Monné
@ 2018-01-10  7:28       ` Jan Beulich
  2018-01-10  8:01         ` Roger Pau Monné
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-10  7:28 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Xen-devel, wei.liu2, Anthony Liguori

>>> On 09.01.18 at 19:34, <roger.pau@citrix.com> wrote:
> On Mon, Jan 08, 2018 at 10:19:39AM -0700, Jan Beulich wrote:
>> >>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
>> > +    {
>> > +        struct gnttab_query_size op;
>> > +        int rc;
>> > +
>> > +        if ( unlikely(copy_from_guest(&op, uop, 1)) )
>> > +        {
>> > +            rc = -EFAULT;
>> > +            break;
>> > +        }
>> > +
>> > +        rc = xen_hypercall_grant_table_op(GNTTABOP_query_size, &op, count);
>> > +        if ( rc )
>> > +            break;
>> > +
>> > +        if ( copy_to_guest(uop, &op, 1) )
>> 
>> __copy_to_guest() (assuming this copying in and out is necessary
>> in the first place).
> 
> I guess this could be bypassed by just using uop instead of op in the
> hypercall?

That's my impression, but you doing the copying made me assume
you might have found a case where things don't work without
copying.

Jan


* Re: [PATCH RFC v1 56/74] xen/pvshim: add grant table operations
  2018-01-10  7:28       ` Jan Beulich
@ 2018-01-10  8:01         ` Roger Pau Monné
  0 siblings, 0 replies; 206+ messages in thread
From: Roger Pau Monné @ 2018-01-10  8:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, wei.liu2, Anthony Liguori

On Wed, Jan 10, 2018 at 12:28:27AM -0700, Jan Beulich wrote:
> >>> On 09.01.18 at 19:34, <roger.pau@citrix.com> wrote:
> > On Mon, Jan 08, 2018 at 10:19:39AM -0700, Jan Beulich wrote:
> >> >>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> >> > +    {
> >> > +        struct gnttab_query_size op;
> >> > +        int rc;
> >> > +
> >> > +        if ( unlikely(copy_from_guest(&op, uop, 1)) )
> >> > +        {
> >> > +            rc = -EFAULT;
> >> > +            break;
> >> > +        }
> >> > +
> >> > +        rc = xen_hypercall_grant_table_op(GNTTABOP_query_size, &op, count);
> >> > +        if ( rc )
> >> > +            break;
> >> > +
> >> > +        if ( copy_to_guest(uop, &op, 1) )
> >> 
> >> __copy_to_guest() (assuming this copying in and out is necessary
> >> in the first place).
> > 
> > I guess this could be bypassed by just using uop instead of op in the
> > hypercall?
> 
> That's my impression, but you doing the copying made me assume
> you might have found a case where things don't work without
> copying.

No, just didn't realize I could do it that way. gnttab_query_size
doesn't require any compat code AFAICT, so it should work fine.

Thanks, Roger.

* Re: [PATCH RFC v1 55/74] xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU
  2018-01-09 17:50     ` Anthony Liguori
@ 2018-01-10 12:23       ` Roger Pau Monné
  0 siblings, 0 replies; 206+ messages in thread
From: Roger Pau Monné @ 2018-01-10 12:23 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Xen-devel, Wei Liu, Anthony Liguori, Jan Beulich

On Tue, Jan 09, 2018 at 09:50:14AM -0800, Anthony Liguori wrote:
> On Mon, Jan 8, 2018 at 8:05 AM, Jan Beulich <JBeulich@suse.com> wrote:
> >>>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> >> From: Roger Pau Monne <roger.pau@citrix.com>
> >>
> >> Note that the unmask and the virq operations are handled by the shim
> >> itself, and that FIFO event channels are not exposed to the guest.
> >>
> >> Signed-off-by: Anthony Liguori <aliguori@amazon.com>
> >> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >> Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
> >
> > In RFC state this certainly doesn't matter yet, but generally I'd
> > expect From: to match the first S-o-b.
> >
> >> @@ -155,11 +156,31 @@ static void set_vcpu_id(void)
> >>  static void xen_evtchn_upcall(struct cpu_user_regs *regs)
> >>  {
> >>      struct vcpu_info *vcpu_info = this_cpu(vcpu_info);
> >> +    unsigned long pending;
> >>
> >>      vcpu_info->evtchn_upcall_pending = 0;
> >> -    xchg(&vcpu_info->evtchn_pending_sel, 0);
> >> +    pending = xchg(&vcpu_info->evtchn_pending_sel, 0);
> >>
> >> -    pv_console_rx(regs);
> >> +    while ( pending )
> >> +    {
> >> +        unsigned int l1 = ffsl(pending) - 1;
> >
> > find_first_set_bit() would look to be the better match here (and
> > below), not the least because it translates (on capable hardware)
> > to TZCNT instead of BSF.
> >
> >> +        unsigned long evtchn = xchg(&XEN_shared_info->evtchn_pending[l1], 0);
> >> +
> >> +        __clear_bit(l1, &pending);
> >> +        evtchn &= ~XEN_shared_info->evtchn_mask[l1];
> >> +        while ( evtchn )
> >> +        {
> >> +            unsigned int port = ffsl(evtchn) - 1;
> >> +
> >> +            __clear_bit(port, &evtchn);
> >> +            port += l1 * BITS_PER_LONG;
> >
> > What about a 32-bit client? If that's not intended to be supported,
> > building of such a guest should be prevented (in dom0_build.c).
> 
> Note that we discarded this approach in the Vixen series because it
> wasn't working reliably for injecting remote event channel
> notifications.

I have to admit I haven't found issues with this implementation, and
AFAICT it looks correct albeit simplistic.

The lack of issues might be because the shim only supports
HVMOP_set_evtchn_upcall_vector as the event channel injection
mechanism ATM. I guess we will revise this if required once more
event channel injection mechanisms are added.
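The two-level scan performed by the quoted xen_evtchn_upcall() can be tried out in isolation with the minimal user-space sketch below. Everything in it is an assumption made for illustration: the `evtchn_state` struct stands in for the shared_info bitmaps, plain loads and stores replace xchg(), and GCC/Clang's `__builtin_ctzl()` plays the role of find_first_set_bit() (both give the index of the lowest set bit, i.e. ffsl(x) - 1). This sketch clears only the unmasked bits, leaving masked ports pending; the quoted patch instead clears the whole word with xchg().

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define NR_SEL_WORDS 4   /* hypothetical size of the second-level bitmap */

/* Hypothetical stand-in for the shared_info / vcpu_info bitmaps. */
struct evtchn_state {
    unsigned long pending_sel;             /* level 1: one bit per word */
    unsigned long pending[NR_SEL_WORDS];   /* level 2: one bit per port */
    unsigned long mask[NR_SEL_WORDS];
};

/* Index of the lowest set bit; undefined for 0, like find_first_set_bit(). */
static unsigned int first_set_bit(unsigned long x)
{
    assert(x != 0);
    return (unsigned int)__builtin_ctzl(x);
}

/*
 * Store up to max_out unmasked pending ports in out[] and return the
 * total number of unmasked pending ports found.
 */
static unsigned int evtchn_scan(struct evtchn_state *s,
                                unsigned int *out, unsigned int max_out)
{
    unsigned long sel = s->pending_sel;   /* real code: xchg(..., 0) */
    unsigned int found = 0;

    s->pending_sel = 0;
    while ( sel )
    {
        unsigned int l1 = first_set_bit(sel);
        unsigned long bits;

        sel &= sel - 1;                   /* clear the lowest set bit */
        if ( l1 >= NR_SEL_WORDS )
            continue;

        bits = s->pending[l1] & ~s->mask[l1];
        s->pending[l1] &= ~bits;          /* masked ports stay pending */
        while ( bits )
        {
            unsigned int port =
                first_set_bit(bits) + l1 * (unsigned int)BITS_PER_LONG;

            bits &= bits - 1;
            if ( found < max_out )
                out[found] = port;
            found++;
        }
    }
    return found;
}
```

The `x &= x - 1` idiom clears the lowest set bit, which is equivalent to the patch's `__clear_bit(first_set_bit(x), &x)`.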

Thanks, Roger.

^ permalink raw reply	[flat|nested] 206+ messages in thread

* Re: [PATCH RFC v1 58/74] xen/pvshim: add migration support
  2018-01-09  9:38   ` Jan Beulich
@ 2018-01-10 12:54     ` Roger Pau Monné
  0 siblings, 0 replies; 206+ messages in thread
From: Roger Pau Monné @ 2018-01-10 12:54 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, wei.liu2

On Tue, Jan 09, 2018 at 02:38:21AM -0700, Jan Beulich wrote:
> >>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> > +        struct domain *d = current->domain;
> > +        struct vcpu *v;
> > +        unsigned int i;
> > +        uint64_t old_store_pfn, old_console_pfn = 0, store_pfn, console_pfn;
> > +        uint64_t store_evtchn, console_evtchn;
> > +
> > +        BUG_ON(current->vcpu_id != 0);
> > +
> > +        BUG_ON(xen_hypercall_hvm_get_param(HVM_PARAM_STORE_PFN,
> > +                                           &old_store_pfn));
> > +        if ( !pv_console )
> > +            BUG_ON(xen_hypercall_hvm_get_param(HVM_PARAM_CONSOLE_PFN,
> > +                                               &old_console_pfn));
> > +
> > +        /* Pause the other vcpus before starting the migration. */
> > +        for_each_vcpu(d, v)
> > +            if ( v != current )
> > +                vcpu_pause_by_systemcontroller(v);
> > +
> > +        rc = xen_hypercall_shutdown(SHUTDOWN_suspend);
> > +        if ( rc )
> > +        {
> > +            for_each_vcpu(d, v)
> > +                if ( v != current )
> > +                    vcpu_unpause_by_systemcontroller(v);
> > +
> > +            return rc;
> > +        }
> > +
> > +        /* Resume the shim itself first. */
> > +        hypervisor_resume();
> > +
> > +        /*
> > +         * ATM there's nothing Xen can do if the console/store pfn changes,
> > +         * because Xen won't have a page_info struct for it.
> > +         */
> > +        BUG_ON(xen_hypercall_hvm_get_param(HVM_PARAM_STORE_PFN,
> > +                                           &store_pfn));
> > +        BUG_ON(old_store_pfn != store_pfn);
> > +        if ( !pv_console )
> > +        {
> > +            BUG_ON(xen_hypercall_hvm_get_param(HVM_PARAM_CONSOLE_PFN,
> > +                                               &console_pfn));
> > +            BUG_ON(old_console_pfn != console_pfn);
> > +        }
> > +
> > +        /* Update domain id. */
> > +        d->domain_id = get_dom0_domid();
> > +
> > +        /* Clean the iomem range. */
> > +        BUG_ON(iomem_deny_access(d, 0, ~0UL));
> 
> Does this rangeset change across migration?

Likely, the allowed iomem ranges for the DomU change depending on what
hypervisor_alloc_unused_page returns. Those are mainly used to map
grant table frames.

Thanks, Roger.

^ permalink raw reply	[flat|nested] 206+ messages in thread

* Re: [PATCH RFC v1 61/74] xen/pvshim: support vCPU hotplug
  2018-01-09 10:16   ` Jan Beulich
@ 2018-01-10 13:07     ` Roger Pau Monné
  2018-01-10 13:33       ` Jan Beulich
  2018-01-10 14:40     ` Roger Pau Monné
  1 sibling, 1 reply; 206+ messages in thread
From: Roger Pau Monné @ 2018-01-10 13:07 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, wei.liu2

On Tue, Jan 09, 2018 at 03:16:38AM -0700, Jan Beulich wrote:
> >>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> > @@ -1303,22 +1320,20 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
> >  
> >          break;
> >  
> > -    case VCPUOP_up: {
> > -        bool_t wake = 0;
> > -        domain_lock(d);
> > -        if ( !v->is_initialised )
> > -            rc = -EINVAL;
> 
> Shouldn't this check remain here? I realize this will complicate
> locking (luckily the domain lock is a recursive one, so it shouldn't
> be too bad), but I don't think pv_shim_cpu_up() can tolerate failing
> because of vcpu_up() failing.
> 
> I also think that the use of "long" for return types and values isn't
> really warranted here, and there's also no visible to me reason to
> special case CPU0 here. But for simplicity reasons I can see why
> you've chosen that option; otoh the locking issue above that you'll
> need to solve might be easier to deal with if you didn't switch CPUs
> for hypercall processing (without dropping the use of
> continue_hypercall_on_cpu()).

Right, I'm not sure why bringing a CPU up is required to happen on
CPU0, but that's what the current code in arch_do_sysctl does.

I'm not sure I follow the last part of your reply: if there's no need
to switch to CPU0 for CPU bringup, why would I want to keep
continue_hypercall_on_cpu in that case?

Thanks, Roger.

^ permalink raw reply	[flat|nested] 206+ messages in thread

* Re: [PATCH RFC v1 61/74] xen/pvshim: support vCPU hotplug
  2018-01-10 13:07     ` Roger Pau Monné
@ 2018-01-10 13:33       ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-10 13:33 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Xen-devel, wei.liu2

>>> On 10.01.18 at 14:07, <roger.pau@citrix.com> wrote:
> On Tue, Jan 09, 2018 at 03:16:38AM -0700, Jan Beulich wrote:
>> >>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
>> > @@ -1303,22 +1320,20 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
>> >  
>> >          break;
>> >  
>> > -    case VCPUOP_up: {
>> > -        bool_t wake = 0;
>> > -        domain_lock(d);
>> > -        if ( !v->is_initialised )
>> > -            rc = -EINVAL;
>> 
>> Shouldn't this check remain here? I realize this will complicate
>> locking (luckily the domain lock is a recursive one, so it shouldn't
>> be too bad), but I don't think pv_shim_cpu_up() can tolerate failing
>> because of vcpu_up() failing.
>> 
>> I also think that the use of "long" for return types and values isn't
>> really warranted here, and there's also no visible to me reason to
>> special case CPU0 here. But for simplicity reasons I can see why
>> you've chosen that option; otoh the locking issue above that you'll
>> need to solve might be easier to deal with if you didn't switch CPUs
>> for hypercall processing (without dropping the use of
>> continue_hypercall_on_cpu()).
> 
> Right, I'm not sure why bringing a CPU up is required to happen on
> CPU0, but that's what the current code in arch_do_sysctl does.

In the bare metal hypervisor we likely will need to do some work to
remove this CPU0 restriction.

> I'm not sure I follow the last part of your reply: if there's no need
> to switch to CPU0 for CPU bringup, why would I want to keep
> continue_hypercall_on_cpu in that case?

For onlining you may indeed get away without. But in the
offlining case you don't want to offline the CPU you're
running on.

Jan


^ permalink raw reply	[flat|nested] 206+ messages in thread

* Re: [PATCH RFC v1 62/74] xen/pvshim: memory hotplug
  2018-01-09 10:42   ` Jan Beulich
@ 2018-01-10 13:36     ` Roger Pau Monné
  2018-01-10 13:42       ` Jan Beulich
  0 siblings, 1 reply; 206+ messages in thread
From: Roger Pau Monné @ 2018-01-10 13:36 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, wei.liu2

On Tue, Jan 09, 2018 at 03:42:01AM -0700, Jan Beulich wrote:
> >>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> > +void pv_shim_online_memory(unsigned int nr, unsigned int order)
> > +{
> > +    struct page_info *page, *tmp;
> > +    PAGE_LIST_HEAD(list);
> > +
> > +    spin_lock(&balloon_lock);
> > +    page_list_for_each_safe ( page, tmp, &balloon )
> > +    {
> > +            if ( page->v.free.order != order )
> > +                continue;
> 
> Since guests (afaik) only ever balloon order-0 pages, this is fine
> for now. But it's insufficient in general - there's no point failing
> a request when there's no exact match available, but a higher
> order one is (which could be split).

Yes, that's right. Using order != 0 is unlikely to work properly
given the lack of support for splitting higher order chunks.

I haven't implemented this because, as you say, no guest makes
use of it anyway. Let me add a TODO here.
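To illustrate Jan's point about serving a request from a larger chunk, here is a minimal sketch that reduces the balloon list to hypothetical per-order counters (no page addresses, no locking). When no chunk of the requested order is free, the smallest larger one is taken and split in halves, with each unused half going back to the pool, in the style of a buddy allocator. This is an assumption-laden model, not a proposal for pv_shim_online_memory() itself.

```c
#include <assert.h>

#define MAX_ORDER 10   /* hypothetical largest chunk order tracked */

/* free_chunks[o] counts free chunks of 2^o pages. */
struct balloon_pool {
    unsigned int free_chunks[MAX_ORDER + 1];
};

/*
 * Take one chunk of exactly 2^order pages.  Returns the order of the chunk
 * removed from the pool (equal to 'order' on an exact match, larger when a
 * chunk had to be split), or -1 if nothing big enough is free.
 */
static int balloon_take(struct balloon_pool *p, unsigned int order)
{
    unsigned int j;
    int src;

    if ( order > MAX_ORDER )
        return -1;

    /* Find the smallest free chunk that is at least as large as requested. */
    for ( j = order; j <= MAX_ORDER && !p->free_chunks[j]; j++ )
        ;
    if ( j > MAX_ORDER )
        return -1;

    src = (int)j;
    p->free_chunks[j]--;
    /* Each split of an order-j chunk leaves one unused order-(j-1) half. */
    while ( j > order )
        p->free_chunks[--j]++;

    assert(j == order);
    return src;
}
```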

> > @@ -993,6 +997,11 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >              return start_extent;
> >          }
> >  
> > +#ifdef CONFIG_X86
> > +        if ( pv_shim && op != XENMEM_decrease_reservation && !args.nr_done )
> > +            pv_shim_online_memory(args.nr_extents, args.extent_order);
> > +#endif
> > +
> >          switch ( op )
> >          {
> >          case XENMEM_increase_reservation:
> > @@ -1015,6 +1024,11 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >                  __HYPERVISOR_memory_op, "lh",
> >                  op | (rc << MEMOP_EXTENT_SHIFT), arg);
> >  
> > +#ifdef CONFIG_X86
> > +        if ( pv_shim && op == XENMEM_decrease_reservation )
> > +            pv_shim_offline_memory(args.nr_extents, args.extent_order);
> > +#endif
> 
> Looking at both of these changes - is it somewhere being made
> sure that shim containers won't boot in PoD mode?
> 
> For the latter change - is this correct when the operation has been
> preempted? I think you want to offline only the delta between
> start and args.nr_done.

AFAICT this function will only be called once, even when preempted.
In the online case it's only called when args.nr_done == 0, and in the
offline case it's only called after the work has been completely done.

Roger.

^ permalink raw reply	[flat|nested] 206+ messages in thread

* Re: [PATCH RFC v1 62/74] xen/pvshim: memory hotplug
  2018-01-10 13:36     ` Roger Pau Monné
@ 2018-01-10 13:42       ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-10 13:42 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Xen-devel, wei.liu2

>>> On 10.01.18 at 14:36, <roger.pau@citrix.com> wrote:
> On Tue, Jan 09, 2018 at 03:42:01AM -0700, Jan Beulich wrote:
>> >>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
>> > @@ -1015,6 +1024,11 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>> >                  __HYPERVISOR_memory_op, "lh",
>> >                  op | (rc << MEMOP_EXTENT_SHIFT), arg);
>> >  
>> > +#ifdef CONFIG_X86
>> > +        if ( pv_shim && op == XENMEM_decrease_reservation )
>> > +            pv_shim_offline_memory(args.nr_extents, args.extent_order);
>> > +#endif
>> 
>> Looking at both of these changes - is it somewhere being made
>> sure that shim containers won't boot in PoD mode?
>> 
>> For the latter change - is this correct when the operation has been
>> preempted? I think you want to offline only the delta between
>> start and args.nr_done.
> 
> AFAICT this function will only be called once, even when preempted.
> In the online case it's only called when args.nr_done == 0, and in the
> offline case it's only called after the work has been completely done.

No, and that's the point of my earlier comment: You call the function
solely based on the value of op, not considering at all whether you
were preempted. Or am I overlooking anything?
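Jan's remark about preemption can be made concrete with a small model of a preemptible operation that is re-entered with the progress made so far. The names are hypothetical stand-ins: `memop_args` mirrors the role of `args.nr_extents`/`args.nr_done`, `budget` replaces the hypercall preemption check, and `shim_offline()` replaces pv_shim_offline_memory(). Each invocation reports only `nr_done - start`, so the hook sees the total exactly once no matter how often the operation is preempted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the progress fields kept across continuations. */
struct memop_args {
    unsigned int nr_extents;   /* extents requested by the guest */
    unsigned int nr_done;      /* extents completed so far */
};

static unsigned int offlined;  /* total extents passed to the hook */

/* Stand-in for pv_shim_offline_memory(). */
static void shim_offline(unsigned int nr)
{
    offlined += nr;
}

/*
 * Release at most 'budget' extents, then return true if a continuation is
 * needed (the real code would create one and return to the guest).
 */
static bool decrease_reservation_step(struct memop_args *a, unsigned int budget)
{
    unsigned int start = a->nr_done;

    while ( a->nr_done < a->nr_extents && budget > 0 )
    {
        a->nr_done++;          /* stand-in for releasing one extent */
        budget--;
    }
    assert(a->nr_done <= a->nr_extents);

    /* Report only the delta handled by this invocation. */
    shim_offline(a->nr_done - start);

    return a->nr_done < a->nr_extents;
}
```

Calling the hook with `nr_extents` on every invocation instead would, for 10 extents processed 4 at a time, report 30 extents across the three invocations.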

Jan


^ permalink raw reply	[flat|nested] 206+ messages in thread

* Re: [PATCH RFC v1 61/74] xen/pvshim: support vCPU hotplug
  2018-01-09 10:16   ` Jan Beulich
  2018-01-10 13:07     ` Roger Pau Monné
@ 2018-01-10 14:40     ` Roger Pau Monné
  1 sibling, 0 replies; 206+ messages in thread
From: Roger Pau Monné @ 2018-01-10 14:40 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, wei.liu2

On Tue, Jan 09, 2018 at 03:16:38AM -0700, Jan Beulich wrote:
> >>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> > @@ -1303,22 +1320,20 @@ long do_vcpu_op(int cmd, unsigned int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
> >  
> >          break;
> >  
> > -    case VCPUOP_up: {
> > -        bool_t wake = 0;
> > -        domain_lock(d);
> > -        if ( !v->is_initialised )
> > -            rc = -EINVAL;
> 
> Shouldn't this check remain here? I realize this will complicate
> locking (luckily the domain lock is a recursive one, so it shouldn't
> be too bad), but I don't think pv_shim_cpu_up() can tolerate failing
> because of vcpu_up() failing.

I guess you mean that it's unfortunate to fail in pv_shim_cpu_up after
having brought up the physical CPU if it turns out that
!v->is_initialised. I can add a check at the top of pv_shim_cpu_up for
!v->is_initialised and change vcpu_up slightly.

Regarding the usage of long, continue_hypercall_on_cpu requires a
function that returns long, so I would rather keep things as they are
now for simplicity.
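The ordering proposed above (reject an uninitialised vCPU before any physical CPU is brought up) can be sketched as follows. The `vcpu_model` struct, the `pcpu_online[]` array, and `bring_up_pcpu()` are hypothetical stand-ins for the shim's real state and CPU bring-up path; the `long` return type follows the continue_hypercall_on_cpu() constraint mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_PCPUS 4   /* hypothetical number of possible pCPUs in the shim */

/* Hypothetical model; not Xen's struct vcpu. */
struct vcpu_model {
    unsigned int id;          /* maps 1:1 onto a shim pCPU */
    bool is_initialised;      /* set once the vCPU context has been set up */
    bool up;
};

static bool pcpu_online[NR_PCPUS] = { true };   /* only the BSP starts online */

static int bring_up_pcpu(unsigned int cpu)
{
    assert(cpu < NR_PCPUS && !pcpu_online[cpu]);
    pcpu_online[cpu] = true;  /* stand-in for the real CPU bring-up */
    return 0;
}

static long shim_cpu_up(struct vcpu_model *v)
{
    int rc;

    /* Validate first, so a failure leaves no physical CPU brought up. */
    if ( !v->is_initialised || v->id >= NR_PCPUS )
        return -EINVAL;

    if ( !pcpu_online[v->id] )
    {
        rc = bring_up_pcpu(v->id);
        if ( rc )
            return rc;
    }
    v->up = true;
    return 0;
}
```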

Thanks, Roger.

^ permalink raw reply	[flat|nested] 206+ messages in thread

* Re: [PATCH RFC v1 34/74] x86/guest: add PV console code
  2018-01-05 15:22   ` Jan Beulich
@ 2018-01-10 15:33     ` Roger Pau Monné
  2018-01-10 15:55       ` Jan Beulich
  0 siblings, 1 reply; 206+ messages in thread
From: Roger Pau Monné @ 2018-01-10 15:33 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, wei.liu2, Xen-devel, Sergey Dyasli

On Fri, Jan 05, 2018 at 08:22:46AM -0700, Jan Beulich wrote:
> >>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> > +void __init pv_console_set_rx_handler(serial_rx_fn fn)
> > +{
> > +    cons_rx_handler = fn;
> > +}
> 
> Especially this and ...
> 
> > +size_t pv_console_rx(struct cpu_user_regs *regs)
> > +{
> > +    char c;
> > +    XENCONS_RING_IDX cons, prod;
> > +    size_t recv = 0;
> > +
> > +    if ( !cons_ring )
> > +        return 0;
> > +
> > +    /* TODO: move this somewhere */
> > +    if ( !test_bit(cons_evtchn, XEN_shared_info->evtchn_pending) )
> > +        return 0;
> 
> ... the need for this and ...
> 
> > +    prod = ACCESS_ONCE(cons_ring->in_prod);
> > +    cons = cons_ring->in_cons;
> > +    /* Get pointers before reading the ring */
> > +    smp_rmb();
> > +
> > +    ASSERT((prod - cons) <= sizeof(cons_ring->in));
> > +
> > +    while ( cons != prod )
> > +    {
> > +        c = cons_ring->in[MASK_XENCONS_IDX(cons++, cons_ring->in)];
> > +        if ( cons_rx_handler )
> > +            cons_rx_handler(c, regs);
> > +        recv++;
> > +    }
> > +
> > +    /* No need for a mem barrier because every character was already consumed */
> > +    barrier();
> > +    ACCESS_ONCE(cons_ring->in_cons) = cons;
> > +    notify_daemon();
> > +
> > +    clear_bit(cons_evtchn, XEN_shared_info->evtchn_pending);
> 
> ... this at this layer are very hard to judge about with all the code
> here being dead for the moment. Can't this driver be modeled like
> any other of the UART drivers, surfacing the accessors through
> struct uart_driver (and making the ad-hoc call sites in the next
> patch [mostly] unnecessary)?

I've spoken to Sergey and he agrees that this should be solved and
that using uart_driver seems like the right approach.

However given that we would like to merge this ASAP, do you consider
this a blocker? I haven't really looked at how much effort it would
take to model this code as a proper uart driver.

Thanks, Roger.

^ permalink raw reply	[flat|nested] 206+ messages in thread

* Re: [PATCH RFC v1 34/74] x86/guest: add PV console code
  2018-01-10 15:33     ` Roger Pau Monné
@ 2018-01-10 15:55       ` Jan Beulich
  0 siblings, 0 replies; 206+ messages in thread
From: Jan Beulich @ 2018-01-10 15:55 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, wei.liu2, Xen-devel, Sergey Dyasli

>>> On 10.01.18 at 16:33, <roger.pau@citrix.com> wrote:
> I've spoken to Sergey and he agrees that this should be solved and
> that using uart_driver seems like the right approach.
> 
> However given that we would like to merge this ASAP, do you consider
> this a blocker?

No.

Jan


^ permalink raw reply	[flat|nested] 206+ messages in thread

* Re: [PATCH RFC v1 66/74] xen/shim: allow DomU to have as many vcpus as available
  2018-01-09 10:59   ` Jan Beulich
@ 2018-01-10 16:14     ` Roger Pau Monné
  0 siblings, 0 replies; 206+ messages in thread
From: Roger Pau Monné @ 2018-01-10 16:14 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, wei.liu2

On Tue, Jan 09, 2018 at 03:59:33AM -0700, Jan Beulich wrote:
> >>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> > From: Roger Pau Monne <roger.pau@citrix.com>
> > 
> > Since the shim VCPUOP_{up/down} hypercall is wired to the plug/unplug
> > of CPUs to the shim itself, start the shim DomU with only the BSP
> > online, and let the guest bring up other CPUs as it needs them.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> What are the ramifications of not making this change? Shouldn't
> the shim's pCPU count (pCPU as viewed from its own perspective)
> simply always match its client's vCPU count?

Yes, that's the point of this change. By default Dom0 will get as many
vCPUs as there are online pCPUs. In the shim case the number of online
pCPUs is always 1, and thus we need to set max_vcpus to match the number
of possible pCPUs.

> > --- a/xen/arch/x86/pv/dom0_build.c
> > +++ b/xen/arch/x86/pv/dom0_build.c
> > @@ -695,7 +695,8 @@ int __init dom0_construct_pv(struct domain *d,
> >      for ( i = 0; i < XEN_LEGACY_MAX_VCPUS; i++ )
> >          shared_info(d, vcpu_info[i].evtchn_upcall_mask) = 1;
> >  
> > -    printk("Dom0 has maximum %u VCPUs\n", d->max_vcpus);
> > +    printk("%s has maximum %u VCPUs\n", pv_shim ? "DomU" : "Dom0",
> 
> "Dom%c ..." perhaps?

We could even use Dom%u and d->domain_id. That would remove the
dependency on pv_shim.
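A minimal sketch of the `Dom%u` variant, using snprintf() as a user-space stand-in for Xen's printk(); the helper name is hypothetical. The message is derived from the domain id alone, so no pv_shim check is needed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format the boot message from the domain id. */
static int format_max_vcpus_msg(char *buf, size_t len,
                                unsigned int domain_id, unsigned int max_vcpus)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "Dom%u has maximum %u VCPUs\n",
                    domain_id, max_vcpus);
}
```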

Thanks, Roger.

^ permalink raw reply	[flat|nested] 206+ messages in thread

* Re: [PATCH RFC v1 00/74] Run PV guest in PVH container
  2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
                   ` (74 preceding siblings ...)
  2018-01-08 16:12 ` [PATCH RFC v1 00/74] Run PV guest in PVH container Ian Jackson
@ 2018-01-10 16:26 ` George Dunlap
  2018-01-10 16:28   ` Wei Liu
  75 siblings, 1 reply; 206+ messages in thread
From: George Dunlap @ 2018-01-10 16:26 UTC (permalink / raw)
  To: Wei Liu; +Cc: Xen-devel

On Thu, Jan 4, 2018 at 1:05 PM, Wei Liu <wei.liu2@citrix.com> wrote:
> Hi all
>
> This is a patch series to run PV guest inside a PVH container. The series is
> still in a very RFC state. We're aware that some code is not very clean yet and
> in the process of cleaning things up.
>
> The series can be found at:
>
>     https://xenbits.xen.org/git-http/people/liuw/xen.git wip.pvshim-rfc-v1
>
> The basic idea can be found at page 15 of the slides at [0].
>
> This is a mitigation against one of the CPU vulnerabilities disclosed recently.
> This series makes it possible to continue running untrusted PV guests.  Please
> refer to XSA-254 [1] for more information.
>
> Given the embargo lifted and vulnerabilities disclosed we opt to develop openly
> on xen-devel. Feedback and testing is very welcome.
>
> The series is split into three parts: The first part is for the host that runs
> the shim, the second part is for the shim itself, the third part is for
> toolstack patches (not yet fully working). See the markers in the list of
> patches.
>
> Instructions on using the PV shim:
>
> 1. Git clone the branch and configure as one normally would.
> 2. A xen-shim binary would be built and installed into Xen's firmware
>    directory, alongside hvmloader and co.
> 3. Use the hacky way currently provided in the first part of the series to
>    boot a PV guest inside a PVH container:
>    a. Append type='pvh' in your PV guest config file;
>    b. Export two environment variables so that libxl knows where to find
>       the shim and what to add to the shim's command line option.
>       # export LIBXL_PVSHIM_PATH=$PATH_TO_XEN_SHIM
>       # export LIBXL_PVSHIM_CMDLINE="pv-shim console=xen,pv loglvl=all guest_loglvl=all apic_verbosity=debug e820-verbose sched=null"
> 4. xl create -c guest.cfg
>
> You should be able to see some Xen messages first and then guest kernel
> messages (the console= shim parameter is required).
>
> Known issues:
>
> 1. ARM build and some Clang builds are broken by this series.
> 2. The host will see a lot of over-allocation messages, nothing too harmful and
>    will be fixed once toolstack is ready.
>
> Wei.
>
> [0] https://www.slideshare.net/xen_com_mgr/xpdds17-keynote-towards-a-configurable-and-slimmer-x86-hypervisor-wei-liu-citrix
> [1] https://xenbits.xen.org/xsa/advisory-254.html
>
> # Patches for the host:
>
> 448f56a363 x86/svm: Offer CPUID Faulting to AMD HVM guests as well
> 6a78c9ae33 x86: Common cpuid faulting support
> 05844fec44 x86/upcall: inject a spurious event after setting upcall vector
> fc7a48dd74 tools/libxc: initialise hvm loader elf log fd to get more logging
> 522c9cbaf0 tools/libxc: remove extraneous newline in xc_dom_load_acpi
> bd6b572b32 tools/libelf: fix elf notes check for PVH guest
> 449b932b0c tools/libxc: Multi modules support
> cc6dbdc0c1 libxl: Introduce hack to allow PVH mode to add a shim
>
> # Patches for the shim:
>
[snip]
> 7dbc3f25f6 xen/x86: report domain id on cpuid

This is a host (L0) patch, isn't it?

 -George

^ permalink raw reply	[flat|nested] 206+ messages in thread

* Re: [PATCH RFC v1 00/74] Run PV guest in PVH container
  2018-01-10 16:26 ` George Dunlap
@ 2018-01-10 16:28   ` Wei Liu
  0 siblings, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-10 16:28 UTC (permalink / raw)
  To: George Dunlap; +Cc: Xen-devel, Wei Liu

On Wed, Jan 10, 2018 at 04:26:07PM +0000, George Dunlap wrote:
> On Thu, Jan 4, 2018 at 1:05 PM, Wei Liu <wei.liu2@citrix.com> wrote:
> > Hi all
> >
> > This is a patch series to run PV guest inside a PVH container. The series is
> > still in a very RFC state. We're aware that some code is not very clean yet and
> > in the process of cleaning things up.
> >
> > The series can be found at:
> >
> >     https://xenbits.xen.org/git-http/people/liuw/xen.git wip.pvshim-rfc-v1
> >
> > The basic idea can be found at page 15 of the slides at [0].
> >
> > This is a mitigation against one of the CPU vulnerabilities disclosed recently.
> > This series makes it possible to continue running untrusted PV guests.  Please
> > refer to XSA-254 [1] for more information.
> >
> > Given the embargo lifted and vulnerabilities disclosed we opt to develop openly
> > on xen-devel. Feedback and testing is very welcome.
> >
> > The series is split into three parts: The first part is for the host that runs
> > the shim, the second part is for the shim itself, the third part is for
> > toolstack patches (not yet fully working). See the markers in the list of
> > patches.
> >
> > Instructions on using the PV shim:
> >
> > 1. Git clone the branch and configure as one normally would.
> > 2. A xen-shim binary would be built and installed into Xen's firmware
> >    directory, along side hvmloader and co.
> > 3. Use the hacky way currently provided in the first part of the series to
> >    boot a PV guest inside a PVH container:
> >    a. Append type='pvh' in your PV guest config file;
> >    b. Export two environment variables so that libxl knows where to find
> >       the shim and what to add to the shim's command line option.
> >       # export LIBXL_PVSHIM_PATH=$PATH_TO_XEN_SHIM
> >       # export LIBXL_PVSHIM_CMDLINE="pv-shim console=xen,pv loglvl=all guest_loglvl=all apic_verbosity=debug e820-verbose sched=null"
> > 4. xl create -c guest.cfg
> >
> > You should be able to see some Xen messages first and then guest kernel
> > messages (the console= shim paramter is required).
> >
> > Known issues:
> >
> > 1. ARM build and some Clang build are broken by this series.
> > 2. The host will see a lot over-allocation messages, nothing too harmful and
> >    will be fixed once toolstack is ready.
> >
> > Wei.
> >
> > [0] https://www.slideshare.net/xen_com_mgr/xpdds17-keynote-towards-a-configurable-and-slimmer-x86-hypervisor-wei-liu-citrix
> > [1] https://xenbits.xen.org/xsa/advisory-254.html
> >
> > # Patches for the host:
> >
> > 448f56a363 x86/svm: Offer CPUID Faulting to AMD HVM guests as well
> > 6a78c9ae33 x86: Common cpuid faulting support
> > 05844fec44 x86/upcall: inject a spurious event after setting upcall vector
> > fc7a48dd74 tools/libxc: initialise hvm loader elf log fd to get more logging
> > 522c9cbaf0 tools/libxc: remove extraneous newline in xc_dom_load_acpi
> > bd6b572b32 tools/libelf: fix elf notes check for PVH guest
> > 449b932b0c tools/libxc: Multi modules support
> > cc6dbdc0c1 libxl: Introduce hack to allow PVH mode to add a shim
> >
> > # Patches for the shim:
> >
> [snip]
> > 7dbc3f25f6 xen/x86: report domain id on cpuid
> 
> This is a host (L0) patch, isn't it?

Yes it is.

^ permalink raw reply	[flat|nested] 206+ messages in thread

* Re: [PATCH RFC v1 57/74] x86/pv-shim: shadow PV console's page for L2 DomU
  2018-01-09 16:28       ` Jan Beulich
@ 2018-01-10 16:56         ` Sergey Dyasli
  2018-01-12  7:03           ` Sarah Newman
  0 siblings, 1 reply; 206+ messages in thread
From: Sergey Dyasli @ 2018-01-10 16:56 UTC (permalink / raw)
  To: JBeulich
  Cc: Andrew Cooper, Sergey Dyasli, Wei Liu, xen-devel, Roger Pau Monne

On Tue, 2018-01-09 at 09:28 -0700, Jan Beulich wrote:
> > > > On 09.01.18 at 16:43, <sergey.dyasli@citrix.com> wrote:
> > 
> > On Tue, 2018-01-09 at 02:13 -0700, Jan Beulich wrote:
> > > > > > On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
> > > > 
> > > > +size_t consoled_guest_rx(void)
> > > > +{
> > > > +    size_t recv = 0, idx = 0;
> > > > +    XENCONS_RING_IDX cons, prod;
> > > > +
> > > > +    if ( !cons_ring )
> > > > +        return 0;
> > > > +
> > > > +    spin_lock(&rx_lock);
> > > > +
> > > > +    cons = cons_ring->out_cons;
> > > > +    prod = ACCESS_ONCE(cons_ring->out_prod);
> > > > +    ASSERT((prod - cons) <= sizeof(cons_ring->out));
> > > > +
> > > > +    /* Is the ring empty? */
> > > > +    if ( cons == prod )
> > > > +        goto out;
> > > > +
> > > > +    /* Update pointers before accessing the ring */
> > > > +    smp_rmb();
> > > 
> > > I think this needs to move up ahead of the if(). In the comment
> > > perhaps s/Update/Latch/?
> > 
> > The read/write memory barriers here are between read/write accesses to
> > ring->out_prod and ring->out array. So there is no need to move them.
> > (the same goes for the input ring)
> 
> And there is no multiple-read issue here?

As Andrew has kindly explained to me, there is an issue indeed.
So I moved smp_rmb() to be right after cons and prod read, and updated
the comment to say:

"Latch pointers before accessing the ring. The included compiler barrier also
ensures that pointers are really read only once into local variables."
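The ordering described above can be modelled in user space with C11 atomics, as in the minimal sketch below. It is an illustration under assumptions, not the shim's code: `cons_ring`, `RING_SIZE` and `MASK_IDX` are hypothetical stand-ins for struct xencons_interface and MASK_XENCONS_IDX(), and an acquire load of the producer index replaces ACCESS_ONCE() plus smp_rmb(). Both indexes are read once into locals before any data is touched; the indexes run freely and wrap, with masking applied only when indexing the array.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ring_idx_t;              /* plays the role of XENCONS_RING_IDX */
#define RING_SIZE 16                      /* hypothetical; must be a power of two */
#define MASK_IDX(i) ((i) & (RING_SIZE - 1))

struct cons_ring {
    char out[RING_SIZE];
    _Atomic ring_idx_t out_cons, out_prod;
};

/* Copy up to len pending bytes into buf; returns the number copied. */
static size_t ring_consume(struct cons_ring *r, char *buf, size_t len)
{
    /*
     * Latch both indexes exactly once.  The acquire load of out_prod orders
     * it before the reads of r->out below, which is the job of the smp_rmb()
     * placed right after the index reads.
     */
    ring_idx_t cons = atomic_load_explicit(&r->out_cons, memory_order_relaxed);
    ring_idx_t prod = atomic_load_explicit(&r->out_prod, memory_order_acquire);
    size_t n = 0;

    assert((ring_idx_t)(prod - cons) <= RING_SIZE);

    while ( cons != prod && n < len )
        buf[n++] = r->out[MASK_IDX(cons++)];

    /* Publish the new consumer index only after the data has been read. */
    atomic_store_explicit(&r->out_cons, cons, memory_order_release);
    return n;
}
```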

-- 
Thanks,
Sergey

^ permalink raw reply	[flat|nested] 206+ messages in thread

* Re: [PATCH RFC v1 23/74] x86/entry: Probe for Xen early during boot
  2018-01-05 13:40   ` Jan Beulich
@ 2018-01-10 17:45     ` Wei Liu
  2018-01-11  7:55       ` Jan Beulich
  0 siblings, 1 reply; 206+ messages in thread
From: Wei Liu @ 2018-01-10 17:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, wei.liu2, Xen-devel

On Fri, Jan 05, 2018 at 06:40:29AM -0700, Jan Beulich wrote:
> >>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> > --- /dev/null
> > +++ b/xen/arch/x86/guest/xen.c
> > @@ -0,0 +1,75 @@
> > +/******************************************************************************
> > + * arch/x86/guest/xen.c
> > + *
> > + * Support for detecting and running under Xen.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; If not, see <http://www.gnu.org/licenses/>.
> > + *
> > + * Copyright (c) 2017 Citrix Systems Ltd.
> > + */
> > +#include <xen/init.h>
> > +#include <xen/types.h>
> > +
> > +#include <asm/guest.h>
> > +#include <asm/processor.h>
> > +
> > +#include <public/arch-x86/cpuid.h>
> > +
> > +bool xen_guest;
> 
> __read_mostly?
> 
> > +static uint32_t xen_cpuid_base;
> 
> Depending on future use, __initdata or __read_mostly?
> 
> > --- a/xen/include/asm-x86/guest.h
> > +++ b/xen/include/asm-x86/guest.h
> > @@ -20,6 +20,7 @@
> >  #define __X86_GUEST_H__
> >  
> >  #include <asm/guest/pvh-boot.h>
> > +#include <asm/guest/xen.h>
> >  
> >  #endif /* __X86_GUEST_H__ */
> 
> I'm increasingly curious to understand what this header's purpose
> is meant to be. It looks as if you mean source files to only ever
> include this one, but why? Rather than exposing everything at

Yes, there will be files that only include this header -- the PV-in-HVM
work doesn't need the PVH bits.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 206+ messages in thread

* Re: [PATCH RFC v1 20/74] x86: produce a binary that can be booted as PVH
  2018-01-05 11:39   ` Jan Beulich
  2018-01-08 15:59     ` Wei Liu
@ 2018-01-10 19:10     ` Wei Liu
  1 sibling, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-10 19:10 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, wei.liu2, Xen-devel

On Fri, Jan 05, 2018 at 04:39:33AM -0700, Jan Beulich wrote:
> > +#if defined(CONFIG_PVH_GUEST) && !defined(EFI)
> 
> The EFI part here then also wouldn't be necessary, afaict.

It is necessary; otherwise efi.lds will contain .Xen.note directives,
which break the build.

Wei.

* Re: [PATCH RFC v1 23/74] x86/entry: Probe for Xen early during boot
  2018-01-10 17:45     ` Wei Liu
@ 2018-01-11  7:55       ` Jan Beulich
  2018-01-11  9:43         ` Wei Liu
  0 siblings, 1 reply; 206+ messages in thread
From: Jan Beulich @ 2018-01-11  7:55 UTC (permalink / raw)
  To: wei.liu2; +Cc: Andrew Cooper, Xen-devel

>>> On 10.01.18 at 18:45, <wei.liu2@citrix.com> wrote:
> On Fri, Jan 05, 2018 at 06:40:29AM -0700, Jan Beulich wrote:
>> >>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
>> > --- a/xen/include/asm-x86/guest.h
>> > +++ b/xen/include/asm-x86/guest.h
>> > @@ -20,6 +20,7 @@
>> >  #define __X86_GUEST_H__
>> >  
>> >  #include <asm/guest/pvh-boot.h>
>> > +#include <asm/guest/xen.h>
>> >  
>> >  #endif /* __X86_GUEST_H__ */
>> 
>> I'm increasingly curious to understand what this header's purpose
>> is meant to be. It looks as if you mean source files to only ever
>> include this one, but why? Rather than exposing everything at
> 
> Yes, there will be files that only include this header -- the PV-in-HVM
> work doesn't need the PVH bits.

Either I'm not understanding your reply or you didn't understand
mine: I was trying to understand the purpose of guest.h if it
consists of only #include-s. In such a case, why can't the sources
needing certain bits simply include the headers they need, instead
of this container one?

Jan

* Re: [PATCH RFC v1 23/74] x86/entry: Probe for Xen early during boot
  2018-01-11  7:55       ` Jan Beulich
@ 2018-01-11  9:43         ` Wei Liu
  0 siblings, 0 replies; 206+ messages in thread
From: Wei Liu @ 2018-01-11  9:43 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, wei.liu2, Xen-devel

On Thu, Jan 11, 2018 at 12:55:48AM -0700, Jan Beulich wrote:
> >>> On 10.01.18 at 18:45, <wei.liu2@citrix.com> wrote:
> > On Fri, Jan 05, 2018 at 06:40:29AM -0700, Jan Beulich wrote:
> >> >>> On 04.01.18 at 14:05, <wei.liu2@citrix.com> wrote:
> >> > --- a/xen/include/asm-x86/guest.h
> >> > +++ b/xen/include/asm-x86/guest.h
> >> > @@ -20,6 +20,7 @@
> >> >  #define __X86_GUEST_H__
> >> >  
> >> >  #include <asm/guest/pvh-boot.h>
> >> > +#include <asm/guest/xen.h>
> >> >  
> >> >  #endif /* __X86_GUEST_H__ */
> >> 
> >> I'm increasingly curious to understand what this header's purpose
> >> is meant to be. It looks as if you mean source files to only ever
> >> include this one, but why? Rather than exposing everything at
> > 
> > Yes, there will be files that only include this header -- the PV-in-HVM
> > work doesn't need the PVH bits.
> 
> Either I'm not understanding your reply or you didn't understand
> mine: I was trying to understand the purpose of guest.h if it
> consists of only #include-s. In such a case, why can't the sources
> needing certain bits simply include the headers they need, instead
> of this container one?

I see. Yes, this header can go away. I consider that a nice-to-have,
i.e. I will do it later.

Wei.

* Re: [PATCH RFC v1 00/74] Run PV guest in PVH container
  2018-01-08 16:12 ` [PATCH RFC v1 00/74] Run PV guest in PVH container Ian Jackson
@ 2018-01-11 15:39   ` Ian Jackson
  0 siblings, 0 replies; 206+ messages in thread
From: Ian Jackson @ 2018-01-11 15:39 UTC (permalink / raw)
  To: Wei Liu, Xen-devel, George Dunlap, Roger Pau Monné, Andrew Cooper

Ian Jackson writes ("Re: [Xen-devel] [PATCH RFC v1 00/74] Run PV guest in PVH container"):
> Xen 4.10
> ========
...
>  => code change: the backport should set the default non-NULL
>     values only if the pvhshim boolean is true after defaulting

I discover, looking at the code, that this is already true.

> New callers with old libxl on 4.10:
> -----------------------------------
> 
> If the caller is creating a guest other than a PVH one there is no
> change to the ABI.
> 
> When a caller creates a PVH non-shim guest, it will probably not set
> any of these fields.  Things will work properly.
> 
> If a caller tries to create a shim guest, the attempt to do so will be
> ignored and the guest will be created as PV.  Probably, the guest will
> not boot.  Additionally, if the caller filled in pvshim cmdline or
> path information, it will probably expect *dispose* to free those
> values - and the result will be a memory leak.
> 
> If the caller is examining existing guests, PV and HVM guests will
> work fine.  If the caller is examining existing PVH guests, the
> library will not initialise the new fields.  The result may include an
> uninitialised read by the caller.
> 
> Overall: this is not safe and should be prevented.
> 
>  => code change: the 4.10 backport should use symbol versioning or
>     another technique to prevent callers whose source code
>     understands *shim* guests from using libxl versions which don't.

This is only _really_ needed to avoid trouble when:
  * New tool has been installed
  * New libxl has not been installed
  * PVH guests (including shim guests) have been created by tool A
  * Tool B has not been updated, and is used to examine guests

We do not have any symbol versioning in libxl in 4.10 and it is hard
to think of another way to make this work without changes to the tool
source code.  I don't want to invent something ad-hoc.

So I propose to skip this.

Wei: My conclusion is that the libxl tools branch (including George's
"libxl: introduce hack" patch, as I sent you previously) is suitable
for a simple rebase to 4.10.

Ian.

* Re: [PATCH RFC v1 57/74] x86/pv-shim: shadow PV console's page for L2 DomU
  2018-01-10 16:56         ` Sergey Dyasli
@ 2018-01-12  7:03           ` Sarah Newman
  0 siblings, 0 replies; 206+ messages in thread
From: Sarah Newman @ 2018-01-12  7:03 UTC (permalink / raw)
  To: Sergey Dyasli, JBeulich
  Cc: Andrew Cooper, Roger Pau Monne, Wei Liu, xen-devel

On 01/10/2018 08:56 AM, Sergey Dyasli wrote:
> On Tue, 2018-01-09 at 09:28 -0700, Jan Beulich wrote:
>>>>> On 09.01.18 at 16:43, <sergey.dyasli@citrix.com> wrote:
>>>
>>> On Tue, 2018-01-09 at 02:13 -0700, Jan Beulich wrote:
>>>>>>> On 04.01.18 at 14:06, <wei.liu2@citrix.com> wrote:
>>>>>
>>>>> +size_t consoled_guest_rx(void)
>>>>> +{
>>>>> +    size_t recv = 0, idx = 0;
>>>>> +    XENCONS_RING_IDX cons, prod;
>>>>> +
>>>>> +    if ( !cons_ring )
>>>>> +        return 0;
>>>>> +
>>>>> +    spin_lock(&rx_lock);
>>>>> +
>>>>> +    cons = cons_ring->out_cons;
>>>>> +    prod = ACCESS_ONCE(cons_ring->out_prod);
>>>>> +    ASSERT((prod - cons) <= sizeof(cons_ring->out));
>>>>> +
>>>>> +    /* Is the ring empty? */
>>>>> +    if ( cons == prod )
>>>>> +        goto out;
>>>>> +
>>>>> +    /* Update pointers before accessing the ring */
>>>>> +    smp_rmb();
>>>>
>>>> I think this needs to move up ahead of the if(). In the comment
>>>> perhaps s/Update/Latch/?
>>>
>>> The read/write memory barriers here are between read/write accesses to
>>> ring->out_prod and ring->out array. So there is no need to move them.
>>> (the same goes for the input ring)
>>
>> And there is no multiple-read issue here?
> 
> As Andrew has kindly explained to me, there is an issue indeed.
> So I moved smp_rmb() to be right after cons and prod read, and updated
> the comment to say:
> 
> "Latch pointers before accessing the ring. Included compiler barrier also
> ensures that pointers are really read only once into local variables."
> 
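The barrier placement agreed above (latch `cons` and `prod` once into locals, then issue the read barrier before touching the ring data) can be sketched in portable C11. This is a minimal userspace model, not the shim's `consoled_guest_rx()`: `struct toy_ring`, `RING_SIZE` and `toy_ring_rx()` are hypothetical names, relaxed atomic loads stand in for `ACCESS_ONCE()`, and an acquire fence stands in for `smp_rmb()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 16u /* hypothetical; must be a power of two */

/* Hypothetical stand-in for the console "out" ring: free-running
 * indices, masked on every access. */
struct toy_ring {
    char out[RING_SIZE];
    _Atomic uint32_t out_cons;
    _Atomic uint32_t out_prod;
};

/* Consume up to len bytes from the ring into buf; return bytes copied. */
size_t toy_ring_rx(struct toy_ring *r, char *buf, size_t len)
{
    /* Latch both indices exactly once into locals (ACCESS_ONCE analogue). */
    uint32_t cons = atomic_load_explicit(&r->out_cons, memory_order_relaxed);
    uint32_t prod = atomic_load_explicit(&r->out_prod, memory_order_relaxed);

    /* smp_rmb() analogue: order the index reads before the data reads. */
    atomic_thread_fence(memory_order_acquire);
    assert((uint32_t)(prod - cons) <= RING_SIZE);

    size_t n = 0;
    while (cons != prod && n < len)
        buf[n++] = r->out[cons++ & (RING_SIZE - 1)];

    /* Publish the new consumer index only after the data has been read. */
    atomic_store_explicit(&r->out_cons, cons, memory_order_release);
    return n;
}
```

After a producer copies bytes into `out` and then stores `out_prod`, a single call drains them, and a second call on the now-empty ring returns 0.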

There is an incompatibility in this (and also vixen's) serial output. The NetBSD installer includes NUL characters in its output, which does not display
properly because of the call to puts. putc demonstrably works, though it would be better to add a length-oriented function to serial.c.
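To illustrate the NUL problem: a puts-style writer stops at the first NUL byte, so everything after an embedded NUL in the guest's output is lost, while a length-oriented writer forwards every byte. The sketch below is a self-contained model; `sink_putc()`, `write_str()` and `write_buf()` are hypothetical names standing in for the serial driver, and `write_buf()` only suggests what a length-oriented `serial.c` helper could look like.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte sink standing in for the serial port. */
static char sink[64];
static size_t sink_len;

static void sink_putc(char c)
{
    assert(sink_len < sizeof(sink));
    sink[sink_len++] = c;
}

/* puts-style: stops at the first NUL, silently dropping the rest. */
static void write_str(const char *s)
{
    while (*s != '\0')
        sink_putc(*s++);
}

/* Length-oriented: forwards exactly n bytes, embedded NULs included. */
static void write_buf(const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        sink_putc(s[i]);
}
```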

--Sarah

* Re: [PATCH RFC v1 42/74] sched/null: skip vCPUs on the waitqueue that are blocked
  2018-01-08 11:12     ` George Dunlap
@ 2018-01-12  9:54       ` Dario Faggioli
  2018-01-12 10:45         ` Roger Pau Monné
  0 siblings, 1 reply; 206+ messages in thread
From: Dario Faggioli @ 2018-01-12  9:54 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich, Roger Pau Monne, wei.liu2
  Cc: George Dunlap, Xen-devel

Hi!

First of all, my filters somehow failed to highlight this for me, so
sorry if I did not notice it earlier (and now, I need new filters
anyway, as the email I'm using is different :-D).

I'll have a look at the patch ASAP.

On Mon, 2018-01-08 at 11:12 +0000, George Dunlap wrote:
> On 01/08/2018 10:37 AM, Jan Beulich wrote:
>
> > I don't understand: Isn't the null scheduler not moving around
> > vCPU-s at all? At least that's what the comment at the top of the
> > file says, unless I'm mis-interpreting it. If so, how can "some CPU
> > (...) pick this vCPU"?
> 
> There's no current way to prevent a user from adding more vcpus to a
> pool than there are pcpus (if nothing else, by creating a new VM in a
> given pool), or from taking pcpus from a pool in which #vcpus >=
> #pcpus.
> 
Exactly. And something that checks for that is anything but easy to
introduce (let's just avoid even mentioning enforcing!).

> The null scheduler deals with this by having a queue of "unassigned"
> vcpus that are waiting for a free pcpu.  When a pcpu becomes available,
> it will do the assignment.  When a pcpu that has a vcpu is assigned is
> removed from the pool, that vcpu is assigned to a different pcpu if one
> is available; if not, it is put on the list.
> 
Err... yes. BTW, either there are a couple of typos in the above
paragraph, or it's me who can't read it well. Anyway, just to be
clear, if we have 4 pCPUs and 6 VMs with 1 vCPU each, this might be
the situation:

CPU0 <-- d1v0
CPU1 <-- d2v0
CPU2 <-- d3v0
CPU3 <-- d4v0

Waitqueue: d5v0,d6v0

Then, if d2 leaves/dies/etc, leaving CPU1 idle, d5v0 is picked up from
the waitqueue and assigned to CPU1.
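George's description and the example above can be modelled in a few lines of C. This is a deliberately simplified sketch, not `sched_null.c`: it ignores affinity, locking and cpupools, and `insert_vcpu()`, `remove_vcpu()` and the fixed-size arrays are hypothetical. A vCPU takes a free pCPU if there is one, otherwise it joins a FIFO waitqueue; when an assigned vCPU goes away, its pCPU is handed to the head of the waitqueue.

```c
#include <assert.h>

#define NR_PCPUS  4
#define MAX_VCPUS 8
#define NONE      (-1)

/* pcpu_owner[c]: vCPU on pCPU c (dNv0 is represented as N), or NONE. */
static int pcpu_owner[NR_PCPUS] = { NONE, NONE, NONE, NONE };
/* FIFO of vCPUs waiting for a free pCPU. */
static int waitq[MAX_VCPUS];
static int waitq_len;

static int waitq_pop(void)
{
    assert(waitq_len > 0);
    int v = waitq[0];
    for (int i = 1; i < waitq_len; i++)
        waitq[i - 1] = waitq[i];
    waitq_len--;
    return v;
}

/* New vCPU: take a free pCPU if there is one, otherwise wait. */
static void insert_vcpu(int v)
{
    for (int c = 0; c < NR_PCPUS; c++) {
        if (pcpu_owner[c] == NONE) {
            pcpu_owner[c] = v;
            return;
        }
    }
    assert(waitq_len < MAX_VCPUS);
    waitq[waitq_len++] = v;
}

/* vCPU goes away: its pCPU passes to the waitqueue head, if any. */
static void remove_vcpu(int v)
{
    for (int c = 0; c < NR_PCPUS; c++) {
        if (pcpu_owner[c] == v) {
            pcpu_owner[c] = waitq_len > 0 ? waitq_pop() : NONE;
            return;
        }
    }
    /* Not on a pCPU: just drop it from the waitqueue. */
    for (int i = 0; i < waitq_len; i++) {
        if (waitq[i] == v) {
            for (int j = i + 1; j < waitq_len; j++)
                waitq[j - 1] = waitq[j];
            waitq_len--;
            return;
        }
    }
}
```

Inserting d1v0 through d6v0 in order reproduces the picture above (d5v0 and d6v0 waiting), and removing d2v0 hands CPU1 to d5v0.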


> In the case of shim mode, this also seems to happen whenever curvcpus <
> maxvcpus: The L1 hypervisor (shim) only sees curvcpus cpus on which to
> schedule L2 vcpus, but the L2 guest has maxvcpus vcpus to schedule, of
> which (maxvcpus-curvcpus) are marked 'down'.
>
Mmm, wait. In the case of a domain which specifies both maxvcpus and
curvcpus, how many vCPUs does the domain in which the shim runs have?

> In this case, it also seems that the null scheduler sometimes schedules
> a "down" vcpu when there are "up" vcpus on the list; meaning that the
> "up" vcpus are never scheduled.
> 
I'm not sure how an offline vCPU can end up there... but maybe I need
to look at the code better, with the shim use case in mind.

Anyway, I'm fine with checks that prevent offline vCPUs from being assigned
to either pCPUs (like, the CPUs of L0 Xen) or shim's vCPUs (so, the
CPUs of L1 Xen). I'm less fine with rescheduling everyone at every
wakeup.

Roger, Wei, if/when you want to talk about this and explain the
situation a bit better, so that I can help, feel free to ping me
(email or IRC). :-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

* Re: [PATCH RFC v1 42/74] sched/null: skip vCPUs on the waitqueue that are blocked
  2018-01-04 13:05 ` [PATCH RFC v1 42/74] sched/null: skip vCPUs on the waitqueue that are blocked Wei Liu
  2018-01-08 10:37   ` Jan Beulich
@ 2018-01-12 10:41   ` Dario Faggioli
  1 sibling, 0 replies; 206+ messages in thread
From: Dario Faggioli @ 2018-01-12 10:41 UTC (permalink / raw)
  To: Wei Liu, Xen-devel; +Cc: George Dunlap, Jan Beulich, Roger Pau Monne

On Thu, 2018-01-04 at 13:05 +0000, Wei Liu wrote:
> From: Roger Pau Monne <roger.pau@citrix.com>
> 
> Avoid scheduling vCPUs that are blocked, there's no point in assigning
> them to a pCPU because they are not going to run anyway.
> 
> Since blocked vCPUs are not assigned to pCPUs after this change, force
> a rescheduling when a vCPU is brought up if it's on the waitqueue.
> Also when scheduling try to pick a vCPU from the runqueue if the pCPU
> is running idle.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Dario Faggioli <raistlin@linux.it>
> ---
> Changes since v1:
>  - Force a rescheduling when a vCPU is brought up.
>  - Try to pick a vCPU from the runqueue if running the idle vCPU.
>
As Jan already noted, "blocked" and "down" (or offline) are being
mixed up here.

In the null scheduler, a vCPU that is assigned to a pCPU is free to
block and wake up as many times as it wants (quite obviously). And when
it blocks, the pCPU will just stay idle.

There's no such thing as pulling another vCPU onto the CPU, either from
the waitqueue or from anywhere else. That's the whole point of the
scheduler, actually.

Now, I'm not quite sure whether or not this can be a problem in the
"shim scenario". If it is, we have to think of a solution that does not
totally defeat the purpose of the scheduler when used on bare metal.

Or use another scheduler, perhaps configuring static 1:1 pinning. Null
seems a great fit for this use case to me, so, I'd say, let's try to
find a nice and cool way to use it. :-)

> ---
>  xen/common/sched_null.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
> index b4a24baf8e..bacfb31cb3 100644
> --- a/xen/common/sched_null.c
> +++ b/xen/common/sched_null.c
> @@ -574,6 +574,8 @@ static void null_vcpu_wake(const struct scheduler *ops, struct vcpu *v)
>      {
>          /* Not exactly "on runq", but close enough for reusing the counter */
>          SCHED_STAT_CRANK(vcpu_wake_onrunq);
> +        /* Force a rescheduling in case some CPU is idle can pick this vCPU */
> +        cpumask_raise_softirq(&cpu_online_map, SCHEDULE_SOFTIRQ);
>          return;
>
This needs to become 'the CPUs of vcpu->domain's cpupool'.

I appreciate that this is fine when running as the shim, where you
certainly don't use cpupools. But when running on bare metal, if we use
cpu_online_map, basically _all_ the online CPUs --even the ones that
are in another pool, under a different scheduler-- will be forced to
reschedule. And we don't want that.
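The difference can be shown with toy bitmasks: raising `SCHEDULE_SOFTIRQ` on `cpu_online_map` reaches every online CPU, including those in other pools, whereas intersecting it with the woken vCPU's pool restricts the kick to CPUs that could actually pick the vCPU up. `toy_cpumask_t`, `kick_mask()` and `nr_kicked()` are hypothetical; Xen's real `cpumask_t` and the way a domain's pool CPUs are obtained are not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* Toy cpumask: bit N set means CPU N is in the set (up to 64 CPUs). */
typedef uint64_t toy_cpumask_t;

/* CPUs to poke when a waitqueue vCPU wakes up: only the online CPUs
 * of the vCPU's own cpupool, not every online CPU. */
static toy_cpumask_t kick_mask(toy_cpumask_t online, toy_cpumask_t pool)
{
    return online & pool;
}

/* Number of CPUs that would be forced through the scheduler. */
static int nr_kicked(toy_cpumask_t mask)
{
    int n = 0;
    for (; mask != 0; mask &= mask - 1) /* clear the lowest set bit */
        n++;
    return n;
}
```

With eight online CPUs split 4/4 across two pools, kicking `cpu_online_map` disturbs all eight, while the pool-restricted mask touches only the four CPUs of the woken vCPU's pool.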

I'm also not 100% convinced that this must/can live here. Basically,
you're saying that if vcpu_wake() is called on a vCPU that happens to be
in the waitqueue, we should reschedule. And, AFAIUI, this is to cover
the case of a vCPU of the L2 guest coming online.

Well, it may even be technically fine. Still, if what we want to deal
with is vCPU onlining, I would prefer to at least try to find a place
which is more related to the onlining path than to the wakeup path.

If you confirm your intent, I can have a look at the code and try to
identify such a better place...

> @@ -761,9 +763,10 @@ static struct task_slice null_schedule(const struct scheduler *ops,
>      /*
>       * We may be new in the cpupool, or just coming back online. In which
>       * case, there may be vCPUs in the waitqueue that we can assign to us
> -     * and run.
> +     * and run. Also check whether this CPU is running idle, in which case try
> +     * to pick a vCPU from the waitqueue.
>       */
> -    if ( unlikely(ret.task == NULL) )
> +    if ( unlikely(ret.task == NULL || ret.task == idle_vcpu[cpu]) )
>
I don't think I understand this. I may be a bit rusty, but are you sure
that, on an idle pCPU, ret.task is idle_vcpu at this point in this
function? I don't think it is.

Also, I'm quite sure this may mess things up for tasklets. In fact, one
case when ret.task is idle_vcpu here is when I have just forced it to be
so, in order to run a tasklet. But with this, we scan the waitqueue
instead, and may end up running something else.

> @@ -781,6 +784,10 @@ static struct task_slice null_schedule(const struct scheduler *ops,
>          {
>              list_for_each_entry( wvc, &prv->waitq, waitq_elem )
>              {
> +                if ( test_bit(_VPF_down, &wvc->vcpu->pause_flags) )
> +                    /* Skip vCPUs that are down. */
> +                    continue;
> +
So, yes, I think things like this are what we want. As said above for
the wakeup case, though, I'd prefer to find a way to prevent offline
vCPUs from ending up in the waitqueue, rather than having to skip them.

Side note: is_vcpu_online() can be used for the test.
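The check added by the patch (testing `_VPF_down` in `pause_flags`, or `is_vcpu_online()` as suggested) amounts to skipping offline entries while walking the waitqueue. A minimal model, assuming a plain `bool down` in place of the `_VPF_down` pause flag; `struct toy_vcpu`, `toy_is_vcpu_online()` and `pick_from_waitq()` are hypothetical names, and a plain array stands in for Xen's list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy vCPU: only the state the waitqueue scan needs. A bool stands in
 * for the _VPF_down bit in v->pause_flags. */
struct toy_vcpu {
    int id;
    bool down;
};

/* Stand-in for is_vcpu_online(). */
static bool toy_is_vcpu_online(const struct toy_vcpu *v)
{
    return !v->down;
}

/* Return the first waitqueue entry that is online, or NULL if none. */
static struct toy_vcpu *pick_from_waitq(struct toy_vcpu *waitq[], size_t n)
{
    assert(waitq != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!toy_is_vcpu_online(waitq[i]))
            continue; /* Skip vCPUs that are down. */
        return waitq[i];
    }
    return NULL;
}
```

With the shim picture of a waitqueue holding d1v2 (down) and d1v3 (up), the scan returns d1v3 instead of assigning the offline d1v2.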

Regards,
Dario

* Re: [PATCH RFC v1 42/74] sched/null: skip vCPUs on the waitqueue that are blocked
  2018-01-12  9:54       ` Dario Faggioli
@ 2018-01-12 10:45         ` Roger Pau Monné
  2018-01-12 11:16           ` Dario Faggioli
  0 siblings, 1 reply; 206+ messages in thread
From: Roger Pau Monné @ 2018-01-12 10:45 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: George Dunlap, Xen-devel, wei.liu2, George Dunlap, Jan Beulich

On Fri, Jan 12, 2018 at 10:54:03AM +0100, Dario Faggioli wrote:
> Hi!
> 
> First of all, my filters somehow failed to highlight this for me, so
> sorry if I did not notice it earlier (and now, I need new filters
> anyway, as the email I'm using is different :-D).
> 
> I'll have a look at the patch ASAP.
> 
> On Mon, 2018-01-08 at 11:12 +0000, George Dunlap wrote:
> > On 01/08/2018 10:37 AM, Jan Beulich wrote:
> >
> > > I don't understand: Isn't the null scheduler not moving around
> > > vCPU-s at all? At least that's what the comment at the top of the
> > > file says, unless I'm mis-interpreting it. If so, how can "some CPU
> > > (...) pick this vCPU"?
> > 
> > There's no current way to prevent a user from adding more vcpus to a
> > pool than there are pcpus (if nothing else, by creating a new VM in a
> > given pool), or from taking pcpus from a pool in which #vcpus >=
> > #pcpus.
> > 
> Exactly. And something that checks for that is all but easy to
> introduce (let's just avoid even mentioning enforcing!).
> 
> > The null scheduler deals with this by having a queue of "unassigned"
> > vcpus that are waiting for a free pcpu.  When a pcpu becomes available,
> > it will do the assignment.  When a pcpu that has a vcpu is assigned is
> > removed from the pool, that vcpu is assigned to a different pcpu if one
> > is available; if not, it is put on the list.
> > 
> Err... yes. BTW, either there are a couple of typos in the above
> paragraph, or it's me that can't read it well. Anyway, just to be
> clear, if we have 4 pCPUs, and 6 VMs, with 1 vCPU each, this might be
> the situation:
> 
> CPU0 <-- d1v0
> CPU1 <-- d2v0
> CPU2 <-- d3v0
> CPU3 <-- d4v0
> 
> Waitqueue: d5v0,d6v0
> 
> Then, if d2 leaves/dies/etc, leaving CPU1 idle, d5v0 is picked up from
> the waitqueue and assigned to CPU1.

I think the above example is not representative of what happens inside
the shim, since only one domain runs on the shim, so the picture is
something like:

CPU0 <-- d1v0
CPU1 <-- d1v1

waitqueue: d1v2 (down), d1v3 (down)

Then, if the guest brings up another vCPU (let's assume it's vCPU#3),
pCPU#3 will be brought up from the shim's PoV, and the null scheduler will
assign the first vCPU on the waitqueue:

CPU0 <-- d1v0
CPU1 <-- d1v1
CPU3 <-- d1v2 (down)
NULL <-- d1v3 (up)

Hence d1v2, which is still down, will get assigned to CPU#3, and d1v3,
which is up, won't get assigned to any pCPU, and hence won't run.

> > In the case of shim mode, this also seems to happen whenever curvcpus <
> > maxvcpus: The L1 hypervisor (shim) only sees curvcpus cpus on which to
> > schedule L2 vcpus, but the L2 guest has maxvcpus vcpus to schedule, of
> > which (maxvcpus-curvcpus) are marked 'down'.
> >
> Mmm, wait. In case of a domain which specifies both maxvcpus and
> curvcpus, how many vCPUs does the domain in which the shim run?

Regardless of the values of maxvcpus and curvcpus, PV guests are always
started with only the BSP online, and then the guest itself brings up
other vCPUs.

In the shim case vCPU hotplug is tied to pCPU hotplug, so every time
the guest hotplugs or unplugs a vCPU the shim does exactly the same
with its CPUs.

> > In this case, it also seems that the null scheduler sometimes schedules
> > a "down" vcpu when there are "up" vcpus on the list; meaning that the
> > "up" vcpus are never scheduled.
> > 
> I'm not sure how an offline vCPU can end up there... but maybe I need
> to look at the code better, with the shim use case in mind.
> 
> Anyway, I'm fine with checks that prevent offline vCPUs to be assigned
> to either pCPUs (like, the CPUs of L0 Xen) or shim's vCPUs (so, the
> CPUs of L1 Xen). I'm less fine with rescheduling everyone at every
> wakeup.

So using the scenario from before:

CPU0 <-- d1v0
CPU1 <-- d1v1

waitqueue: d1v2 (down), d1v3 (down)

The guest decides to hotplug vCPU#2, and hence the shim first hotplugs
CPU#2; but at the point CPU2 is added to the pool of CPUs, vCPU2 is
still not up, hence we get the following:

CPU0 <-- d1v0
CPU1 <-- d1v1
CPU2 <-- NULL

waitqueue: d1v2 (down), d1v3 (down)

Then d1v2 is brought up, but since the null scheduler doesn't react to
wakeups, the picture stays the same:

CPU0 <-- d1v0
CPU1 <-- d1v1
CPU2 <-- NULL

waitqueue: d1v2 (up), d1v3 (down)

And d1v2 doesn't get scheduled.

Hope this makes sense :)

Thanks, Roger.

* Re: [PATCH RFC v1 42/74] sched/null: skip vCPUs on the waitqueue that are blocked
  2018-01-12 10:45         ` Roger Pau Monné
@ 2018-01-12 11:16           ` Dario Faggioli
  2018-01-12 11:22             ` Roger Pau Monné
  0 siblings, 1 reply; 206+ messages in thread
From: Dario Faggioli @ 2018-01-12 11:16 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: George Dunlap, Xen-devel, wei.liu2, George Dunlap, Jan Beulich

On Fri, 2018-01-12 at 10:45 +0000, Roger Pau Monné wrote:
> On Fri, Jan 12, 2018 at 10:54:03AM +0100, Dario Faggioli wrote:
>
> > Err... yes. BTW, either there are a couple of typos in the above
> > paragraph, or it's me that can't read it well. Anyway, just to be
> > clear, if we have 4 pCPUs, and 6 VMs, with 1 vCPU each, this might be
> > the situation:
> > 
> > CPU0 <-- d1v0
> > CPU1 <-- d2v0
> > CPU2 <-- d3v0
> > CPU3 <-- d4v0
> > 
> > Waitqueue: d5v0,d6v0
> > 
> > Then, if d2 leaves/dies/etc, leaving CPU1 idle, d5v0 is picked up from
> > the waitqueue and assigned to CPU1.
> 
> I think the above example is not representative of what happens inside
> of the shim,
>
Indeed it's not. I was just trying to clarify, via an example, George's
explanation of how null works in general.

> since there's only one domain that runs on the shim, so
> the picture is something like:
> 
> CPU0 <-- d1v0
> CPU1 <-- d1v1
> 
> waitqueue: d1v2 (down), d1v3 (down)
> 
Right. So, how about we change this in such a way that d1v2 and d1v3,
since they're offline, won't end up in the waitqueue?

> Then if the guest brings up another vCPU, let's assume it's vCPU#3
> pCPU#3 will be bring up form the shim PoV, and the null scheduler will
> assign the first vCPU on the waitqueue:
> 
> CPU0 <-- d1v0
> CPU1 <-- d1v1
> CPU3 <-- d1v2 (down)
> NULL <-- d1v3 (up)
> 
> Hence d1v2 which is still down will get assigned to CPU#3, and d1v3
> which is up won't get assigned to any pCPU, and hence won't run.
> 
Exactly. Whereas, if d1v2 and d1v3 were not in the waitqueue at all
while offline, what would (should) happen is:

- CPU3 comes online ("in" the shim)
- CPU3 stays idle, as there's nothing in the waitqueue
- d1v3 comes online and is added to the shim's null scheduler
- as CPU3 does not have any vCPU assigned, d1v3 is assigned to it
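The sequence above, where offline vCPUs never enter the waitqueue and a vCPU is only inserted when it comes online, can be sketched as a small state model of the shim. Everything in it (`cpu_up()`, `vcpu_online()`, the fixed array sizes) is hypothetical and ignores locking; it only checks that CPU3 stays idle until d1v3 comes online and is then assigned to it.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS  4
#define NR_VCPUS 4
#define NO_VCPU  (-1)

static bool cpu_is_up[NR_CPUS];
/* owner[c]: vCPU assigned to shim pCPU c, or NO_VCPU. */
static int owner[NR_CPUS] = { NO_VCPU, NO_VCPU, NO_VCPU, NO_VCPU };
/* Only *online* vCPUs without a pCPU ever sit here. */
static int waitq[NR_VCPUS];
static int waitq_len;

/* A shim pCPU comes online: take the waitqueue head, or stay idle. */
static void cpu_up(int c)
{
    cpu_is_up[c] = true;
    owner[c] = NO_VCPU;
    if (waitq_len > 0) {
        owner[c] = waitq[0];
        for (int i = 1; i < waitq_len; i++)
            waitq[i - 1] = waitq[i];
        waitq_len--;
    }
}

/* A guest vCPU comes online: the only point where it is inserted. */
static void vcpu_online(int v)
{
    for (int c = 0; c < NR_CPUS; c++) {
        if (cpu_is_up[c] && owner[c] == NO_VCPU) {
            owner[c] = v;
            return;
        }
    }
    assert(waitq_len < NR_VCPUS);
    waitq[waitq_len++] = v;
}
```

Because the pCPU is brought up before the vCPU (as described earlier in the thread), the idle pCPU is simply waiting when the vCPU's insertion arrives, so no reaction to wakeups is needed.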

> > Mmm, wait. In case of a domain which specifies both maxvcpus and
> > curvcpus, how many vCPUs does the domain in which the shim run?
> 
> Regardless of the values of maxvcpus and curvcpus PV guests are always
> started with only the BSP online, and then the guest itself brings up
> other vCPUs.
> 
> In the shim case vCPU hotplug is tied to pCPU hotplug, so everytime
> the guest hotplugs or unplugs a vCPU the shim does exactly the same
> with it's CPUs.
> 
Sure, what I was asking was rather this: if the guest config file
has "maxvcpus=4;vcpus=1", at the end of domain creation, and before any
`xl vcpu-set' or anything else that would bring other guest vCPUs online,
what's the output of `xl vcpu-list'? :-)

Anyway, I think you've answered this below.

> > I'm not sure how an offline vCPU can end up there... but maybe I need
> > to look at the code better, with the shim use case in mind.
> > 
> > Anyway, I'm fine with checks that prevent offline vCPUs to be assigned
> > to either pCPUs (like, the CPUs of L0 Xen) or shim's vCPUs (so, the
> > CPUs of L1 Xen). I'm less fine with rescheduling everyone at every
> > wakeup.
> 
> So using the scenario from before:
> 
> CPU0 <-- d1v0
> CPU1 <-- d1v1
> 
> waitqueue: d1v2 (down), d1v3 (down)
> 
> Guest decided to hotplug vCPU#2, and hence the shim first hotplugs
> CPU#2, but at the point CPU2 is added to the pool of CPUs vCPU2 is
> still not up, hence we get the following:
> 
> CPU0 <-- d1v0
> CPU1 <-- d1v1
> CPU2 <-- NULL
> 
> waitqueue: d1v2 (down), d1v3 (down)
> 
> Then d1v2 is brought up, but since the null scheduler doesn't react to
> wakeup the picture stays the same:
> 
> CPU0 <-- d1v0
> CPU1 <-- d1v1
> CPU2 <-- NULL
> 
> waitqueue: d1v2 (up), d1v3 (down)
> 
> And d1v2 doesn't get scheduled.
> 
> Hope this makes sense :)
> 
Yeah, and I see that it works.

What I'm saying is that, rather than having the null scheduler react
to wakeups of vCPUs in the waitqueue, I'd prefer to avoid having the
offline vCPUs in the waitqueue altogether.

At that point, when the d1v2 hotplug happens, there has to be a
null_vcpu_insert() (or something equivalent), to which the null
scheduler should already react.

Regards,
Dario

* Re: [PATCH RFC v1 42/74] sched/null: skip vCPUs on the waitqueue that are blocked
  2018-01-12 11:16           ` Dario Faggioli
@ 2018-01-12 11:22             ` Roger Pau Monné
  0 siblings, 0 replies; 206+ messages in thread
From: Roger Pau Monné @ 2018-01-12 11:22 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: George Dunlap, Xen-devel, wei.liu2, George Dunlap, Jan Beulich

On Fri, Jan 12, 2018 at 12:16:47PM +0100, Dario Faggioli wrote:
> On Fri, 2018-01-12 at 10:45 +0000, Roger Pau Monné wrote:
> > On Fri, Jan 12, 2018 at 10:54:03AM +0100, Dario Faggioli wrote:
> >
> > > Err... yes. BTW, either there are a couple of typos in the above
> > > paragraph, or it's me that can't read it well. Anyway, just to be
> > > clear, if we have 4 pCPUs, and 6 VMs, with 1 vCPU each, this might be
> > > the situation:
> > > 
> > > CPU0 <-- d1v0
> > > CPU1 <-- d2v0
> > > CPU2 <-- d3v0
> > > CPU3 <-- d4v0
> > > 
> > > Waitqueue: d5v0,d6v0
> > > 
> > > Then, if d2 leaves/dies/etc, leaving CPU1 idle, d5v0 is picked up from
> > > the waitqueue and assigned to CPU1.
> > 
> > I think the above example is not representative of what happens inside
> > of the shim,
> >
> Indeed it's not. I was just trying to clarify, via an example, George's
> explanation of how null works in general.
> 
> > since there's only one domain that runs on the shim, so
> > the picture is something like:
> > 
> > CPU0 <-- d1v0
> > CPU1 <-- d1v1
> > 
> > waitqueue: d1v2 (down), d1v3 (down)
> > 
> Right. So, how about we change this in such a way that d1v2 and d1v3,
> since they're offline, won't end up in the waitqueue?

Sounds fine. I have to admit this is the first time I've played with the
scheduler code, so it's quite likely that whatever you say will seem
OK to me :).

> > Then if the guest brings up another vCPU, let's assume it's vCPU#3
> > pCPU#3 will be bring up form the shim PoV, and the null scheduler will
> > assign the first vCPU on the waitqueue:
> > 
> > CPU0 <-- d1v0
> > CPU1 <-- d1v1
> > CPU3 <-- d1v2 (down)
> > NULL <-- d1v3 (up)
> > 
> > Hence d1v2 which is still down will get assigned to CPU#3, and d1v3
> > which is up won't get assigned to any pCPU, and hence won't run.
> > 
> Exactly. While, if d1v2 and d1v3 were not in the waitqueue, while
> offline, at all, whould would (should) happen is:
> 
> - CPU3 comes online ("in" the shim)
> - CPU3 stays idle, as there's nothing in the waitqueue
> - d1v3 comes online and is added to the shim's null scheduler
> - as CPU3 does not have any vCPU assigned, d1v3 is assigned to it

Yes, that's what I'm aiming for :).

> > So using the scenario from before:
> > 
> > CPU0 <-- d1v0
> > CPU1 <-- d1v1
> > 
> > waitqueue: d1v2 (down), d1v3 (down)
> > 
> > The guest decides to hotplug vCPU#2, and hence the shim first hotplugs
> > CPU#2, but at the point CPU#2 is added to the pool of CPUs, vCPU#2 is
> > still not up, hence we get the following:
> > 
> > CPU0 <-- d1v0
> > CPU1 <-- d1v1
> > CPU2 <-- NULL
> > 
> > waitqueue: d1v2 (down), d1v3 (down)
> > 
> > Then d1v2 is brought up, but since the null scheduler doesn't react
> > to wakeups, the picture stays the same:
> > 
> > CPU0 <-- d1v0
> > CPU1 <-- d1v1
> > CPU2 <-- NULL
> > 
> > waitqueue: d1v2 (up), d1v3 (down)
> > 
> > And d1v2 doesn't get scheduled.
> > 
> > Hope this makes sense :)
> > 
> Yeah, and I see that it works.
> 
> What I'm saying is that, rather than having the null scheduler react
> to wakeups of vCPUs in the waitqueue, I'd prefer to avoid having the
> offline vCPUs in the waitqueue altogether.
> 
> At that point, when the d1v2 hotplug happens, there has to be a
> null_vcpu_insert() (or something equivalent), to which the null
> scheduler should already react.

That seems fine to me, I will try to take a look at implementing this.

Thread overview: 206+ messages
2018-01-04 13:05 [PATCH RFC v1 00/74] Run PV guest in PVH container Wei Liu
2018-01-04 13:05 ` [PATCH RFC v1 01/74] x86/svm: Offer CPUID Faulting to AMD HVM guests as well Wei Liu
2018-01-04 14:00   ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 02/74] x86: Common cpuid faulting support Wei Liu
2018-01-04 14:19   ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 03/74] x86/upcall: inject a spurious event after setting upcall vector Wei Liu
2018-01-04 13:05 ` [PATCH RFC v1 04/74] tools/libxc: initialise hvm loader elf log fd to get more logging Wei Liu
2018-01-04 13:05 ` [PATCH RFC v1 05/74] tools/libxc: remove extraneous newline in xc_dom_load_acpi Wei Liu
2018-01-04 13:05 ` [PATCH RFC v1 06/74] tools/libelf: fix elf notes check for PVH guest Wei Liu
2018-01-04 14:37   ` Jan Beulich
2018-01-08 15:34     ` Wei Liu
2018-01-08 16:02       ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 07/74] tools/libxc: Multi modules support Wei Liu
2018-01-04 13:05 ` [PATCH RFC v1 08/74] libxl: Introduce hack to allow PVH mode to add a shim Wei Liu
2018-01-04 13:05 ` [PATCH RFC v1 09/74] xen/common: Widen the guest logging buffer slightly Wei Liu
2018-01-04 13:05 ` [PATCH RFC v1 10/74] x86/time: Print a more helpful error when a platform timer can't be found Wei Liu
2018-01-05 10:37   ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 11/74] x86/link: Introduce and use SECTION_ALIGN Wei Liu
2018-01-05 10:38   ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 12/74] xen/acpi: mark the PM timer FADT field as optional Wei Liu
2018-01-05 10:52   ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 13/74] xen/domctl: Return arch_config via getdomaininfo Wei Liu
2018-01-05 10:58   ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 14/74] tools/ocaml: Expose arch_config in domaininfo Wei Liu
2018-01-04 13:05 ` [PATCH RFC v1 15/74] tools/ocaml: Extend domain_create() to take arch_domainconfig Wei Liu
2018-01-04 13:05 ` [PATCH RFC v1 16/74] x86/fixmap: Modify fix_to_virt() to return a void pointer Wei Liu
2018-01-05 11:05   ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 17/74] ---- x86/Kconfig: Options for Xen and PVH support Wei Liu
2018-01-05 11:11   ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 18/74] x86/link: Relocate program headers Wei Liu
2018-01-05 11:20   ` Jan Beulich
2018-01-08 15:43     ` Wei Liu
2018-01-08 16:26       ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 19/74] x86: introduce ELFNOTE macro Wei Liu
2018-01-05 11:27   ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 20/74] x86: produce a binary that can be booted as PVH Wei Liu
2018-01-05 11:39   ` Jan Beulich
2018-01-08 15:59     ` Wei Liu
2018-01-08 16:42       ` Jan Beulich
2018-01-09 13:49         ` Wei Liu
2018-01-10 19:10     ` Wei Liu
2018-01-04 13:05 ` [PATCH RFC v1 21/74] x86/entry: Early PVH boot code Wei Liu
2018-01-05 13:32   ` Jan Beulich
2018-01-09 15:45     ` Wei Liu
2018-01-09 16:41       ` Jan Beulich
2018-01-09 17:10         ` Wei Liu
2018-01-04 13:05 ` [PATCH RFC v1 22/74] x86/boot: Map more than the first 16MB Wei Liu
2018-01-04 13:05 ` [PATCH RFC v1 23/74] x86/entry: Probe for Xen early during boot Wei Liu
2018-01-05 13:40   ` Jan Beulich
2018-01-10 17:45     ` Wei Liu
2018-01-11  7:55       ` Jan Beulich
2018-01-11  9:43         ` Wei Liu
2018-01-04 13:05 ` [PATCH RFC v1 24/74] x86/guest: Hypercall support Wei Liu
2018-01-05 13:53   ` Jan Beulich
2018-01-05 14:09     ` Andrew Cooper
2018-01-04 13:05 ` [PATCH RFC v1 25/74] x86/shutdown: Support for using SCHEDOP_{shutdown, reboot} Wei Liu
2018-01-05 14:01   ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 26/74] x86/pvh: Retrieve memory map from Xen Wei Liu
2018-01-05 14:05   ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 27/74] xen/console: Introduce console=xen Wei Liu
2018-01-05 14:08   ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 28/74] x86: initialise shared_info page Wei Liu
2018-01-05 14:11   ` Jan Beulich
2018-01-05 14:20     ` Andrew Cooper
2018-01-05 14:28       ` Roger Pau Monné
2018-01-05 14:40         ` Andrew Cooper
2018-01-04 13:05 ` [PATCH RFC v1 29/74] x86: xen pv clock time source Wei Liu
2018-01-05 14:17   ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 30/74] x86: APIC timer calibration when running as a guest Wei Liu
2018-01-05 14:35   ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 31/74] x86: read wallclock from Xen running in pvh mode Wei Liu
2018-01-05 14:43   ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 32/74] x86: don't swallow the first command line item " Wei Liu
2018-01-05 14:49   ` Jan Beulich
2018-01-09 14:30   ` Roger Pau Monné
2018-01-04 13:05 ` [PATCH RFC v1 33/74] x86/guest: enable event channels upcalls Wei Liu
2018-01-05 15:07   ` Jan Beulich
2018-01-05 15:19     ` Andrew Cooper
2018-01-04 13:05 ` [PATCH RFC v1 34/74] x86/guest: add PV console code Wei Liu
2018-01-05 15:22   ` Jan Beulich
2018-01-10 15:33     ` Roger Pau Monné
2018-01-10 15:55       ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 35/74] x86/guest: use PV console for Xen/Dom0 I/O Wei Liu
2018-01-04 13:05 ` [PATCH RFC v1 36/74] --- x86/shim: Kconfig and command line options Wei Liu
2018-01-05 15:26   ` Jan Beulich
2018-01-05 17:51     ` Andrew Cooper
2018-01-08  8:22       ` Jan Beulich
2018-01-08 11:33         ` Andrew Cooper
2018-01-08 11:46           ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 37/74] tools/firmware: Build and install xen-shim Wei Liu
2018-01-04 13:05 ` [PATCH RFC v1 38/74] x86/pv-shim: Force CPUID faulting in pv-shim mode Wei Liu
2018-01-08 10:16   ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 39/74] xen/x86: make VGA support selectable Wei Liu
2018-01-08 10:22   ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 40/74] xen/x86: report domain id on cpuid Wei Liu
2018-01-08 10:27   ` Jan Beulich
2018-01-08 10:34     ` Andrew Cooper
2018-01-08 11:11       ` Jan Beulich
2018-01-08 11:22         ` Andrew Cooper
2018-01-08 11:27           ` Jan Beulich
2018-01-08 11:29   ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 41/74] xen/pvh: do not mark the low 1MB as IO mem Wei Liu
2018-01-08 10:30   ` Jan Beulich
2018-01-08 10:37     ` Roger Pau Monné
2018-01-08 11:11       ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 42/74] sched/null: skip vCPUs on the waitqueue that are blocked Wei Liu
2018-01-08 10:37   ` Jan Beulich
2018-01-08 11:12     ` George Dunlap
2018-01-12  9:54       ` Dario Faggioli
2018-01-12 10:45         ` Roger Pau Monné
2018-01-12 11:16           ` Dario Faggioli
2018-01-12 11:22             ` Roger Pau Monné
2018-01-12 10:41   ` Dario Faggioli
2018-01-04 13:05 ` [PATCH RFC v1 43/74] xen: introduce rangeset_reserve_hole Wei Liu
2018-01-08 10:46   ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 44/74] xen/pvshim: keep track of unused pages Wei Liu
2018-01-08 10:58   ` Jan Beulich
2018-01-08 11:04     ` Roger Pau Monné
2018-01-08 11:22       ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 45/74] x86/guest: use unpopulated memory to map the shared_info page Wei Liu
2018-01-08 11:03   ` Jan Beulich
2018-01-08 11:06     ` Roger Pau Monné
2018-01-08 11:25       ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 46/74] xen/guest: fetch vCPU ID from Xen Wei Liu
2018-01-08 11:04   ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 47/74] x86/guest: fix upcall vector setup Wei Liu
2018-01-08 11:08   ` Jan Beulich
2018-01-04 13:05 ` [PATCH RFC v1 48/74] x86/guest: unmask console event channel Wei Liu
2018-01-04 13:06 ` [PATCH RFC v1 49/74] x86/guest: map per-cpu vcpu_info area Wei Liu
2018-01-08 13:21   ` Jan Beulich
2018-01-09 12:08     ` Roger Pau Monné
2018-01-04 13:06 ` [PATCH RFC v1 50/74] xen/pvshim: remove Dom0 kernel support check Wei Liu
2018-01-08 13:28   ` Jan Beulich
2018-01-04 13:06 ` [PATCH RFC v1 51/74] xen/pvshim: don't allow access to iomem or ioports Wei Liu
2018-01-08 13:29   ` Jan Beulich
2018-01-04 13:06 ` [PATCH RFC v1 52/74] xen: mark xenstore/console pages as RAM and add them to dom_io Wei Liu
2018-01-08 13:49   ` Jan Beulich
2018-01-09  9:25     ` Roger Pau Monné
2018-01-09 11:03       ` Jan Beulich
2018-01-09 11:26         ` Roger Pau Monné
2018-01-09 13:34           ` Jan Beulich
2018-01-04 13:06 ` [PATCH RFC v1 53/74] xen/pvshim: modify Dom0 builder in order to build a DomU Wei Liu
2018-01-08 14:06   ` Jan Beulich
2018-01-09 16:09     ` Roger Pau Monné
2018-01-09 16:26       ` Jan Beulich
2018-01-09  9:06   ` Jan Beulich
2018-01-04 13:06 ` [PATCH RFC v1 54/74] xen/pvshim: set correct domid value Wei Liu
2018-01-08 14:17   ` Jan Beulich
2018-01-09 16:27     ` Roger Pau Monné
2018-01-04 13:06 ` [PATCH RFC v1 55/74] xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU Wei Liu
2018-01-08 16:05   ` Jan Beulich
2018-01-08 16:22     ` Roger Pau Monné
2018-01-09  8:00       ` Jan Beulich
2018-01-09 16:45         ` Roger Pau Monné
2018-01-09 17:42           ` Jan Beulich
2018-01-09 17:50     ` Anthony Liguori
2018-01-10 12:23       ` Roger Pau Monné
2018-01-09  7:49   ` Jan Beulich
2018-01-04 13:06 ` [PATCH RFC v1 56/74] xen/pvshim: add grant table operations Wei Liu
2018-01-08 17:19   ` Jan Beulich
2018-01-09 18:34     ` Roger Pau Monné
2018-01-10  7:28       ` Jan Beulich
2018-01-10  8:01         ` Roger Pau Monné
2018-01-04 13:06 ` [PATCH RFC v1 57/74] x86/pv-shim: shadow PV console's page for L2 DomU Wei Liu
2018-01-09  9:13   ` Jan Beulich
2018-01-09 15:43     ` Sergey Dyasli
2018-01-09 16:28       ` Jan Beulich
2018-01-10 16:56         ` Sergey Dyasli
2018-01-12  7:03           ` Sarah Newman
2018-01-04 13:06 ` [PATCH RFC v1 58/74] xen/pvshim: add migration support Wei Liu
2018-01-09  9:38   ` Jan Beulich
2018-01-10 12:54     ` Roger Pau Monné
2018-01-04 13:06 ` [PATCH RFC v1 59/74] xen/pvshim: add shim_mem cmdline parameter Wei Liu
2018-01-09  9:47   ` Jan Beulich
2018-01-04 13:06 ` [PATCH RFC v1 60/74] xen/pvshim: set max_pages to the value of tot_pages Wei Liu
2018-01-09  9:48   ` Jan Beulich
2018-01-04 13:06 ` [PATCH RFC v1 61/74] xen/pvshim: support vCPU hotplug Wei Liu
2018-01-09 10:16   ` Jan Beulich
2018-01-10 13:07     ` Roger Pau Monné
2018-01-10 13:33       ` Jan Beulich
2018-01-10 14:40     ` Roger Pau Monné
2018-01-04 13:06 ` [PATCH RFC v1 62/74] xen/pvshim: memory hotplug Wei Liu
2018-01-09 10:42   ` Jan Beulich
2018-01-10 13:36     ` Roger Pau Monné
2018-01-10 13:42       ` Jan Beulich
2018-01-04 13:06 ` [PATCH RFC v1 63/74] xen/shim: modify shim_mem parameter behaviour Wei Liu
2018-01-09 10:48   ` Jan Beulich
2018-01-04 13:06 ` [PATCH RFC v1 64/74] xen/pvshim: use default position for the m2p mappings Wei Liu
2018-01-09 10:50   ` Jan Beulich
2018-01-04 13:06 ` [PATCH RFC v1 65/74] xen/shim: crash instead of reboot in shim mode Wei Liu
2018-01-09 10:52   ` Jan Beulich
2018-01-04 13:06 ` [PATCH RFC v1 66/74] xen/shim: allow DomU to have as many vcpus as available Wei Liu
2018-01-09 10:59   ` Jan Beulich
2018-01-10 16:14     ` Roger Pau Monné
2018-01-04 13:06 ` [PATCH RFC v1 67/74] libxl: libxl__build_hvm: Introduce separate b_info parameter Wei Liu
2018-01-04 13:06 ` [PATCH RFC v1 68/74] libxl__domain_build_info_setdefault_pvhhvm: introduce Wei Liu
2018-01-04 13:06 ` [PATCH RFC v1 69/74] libxl_bitmap_copy_alloc: copy 0, NULL as 0, NULL Wei Liu
2018-01-04 13:06 ` [PATCH RFC v1 70/74] libxl: pvshim: Check state->shim_path before domain type Wei Liu
2018-01-04 13:06 ` [PATCH RFC v1 71/74] libxl: pvshim: Provide first-class config settings to enable shim mode Wei Liu
2018-01-04 13:06 ` [PATCH RFC v1 72/74] libxl: pvshim: Introduce pvhshim_extra Wei Liu
2018-01-04 13:06 ` [PATCH RFC v1 73/74] xl: pvshim: Provide and document xl config Wei Liu
2018-01-04 13:06 ` [PATCH RFC v1 74/74] libxl: pvshim: Set video_memkb to ~0 Wei Liu
2018-01-08 16:12 ` [PATCH RFC v1 00/74] Run PV guest in PVH container Ian Jackson
2018-01-11 15:39   ` Ian Jackson
2018-01-10 16:26 ` George Dunlap
2018-01-10 16:28   ` Wei Liu