linux-kernel.vger.kernel.org archive mirror
* [PATCH v13] Linux Xen PVH support (v13)
@ 2014-01-03 19:38 Konrad Rzeszutek Wilk
  2014-01-03 19:38 ` [PATCH v13 01/19] xen/p2m: Check for auto-xlat when doing mfn_to_local_pfn Konrad Rzeszutek Wilk
                   ` (19 more replies)
  0 siblings, 20 replies; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 19:38 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel
  Cc: hpa

The patches, also available at

git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git devel/pvh.v13

implement the necessary functionality to boot a PV guest in PVH mode.

This blog has a great description of what PVH is:
http://blog.xen.org/index.php/2012/10/31/the-paravirtualization-spectrum-part-2-from-poles-to-a-spectrum/

These patches are based on v3.13-rc6. If I failed to
address your review comments, I am terribly sorry - it was an oversight.
Please poke at the patches again.

Changes since v13: [http://mid.gmane.org/1388550945-25499-1-git-send-email-konrad.wilk@oracle.com]
 - Rework per David's and Stefano's reviews.
 - Fix regression with Xen 4.1.
 - Use native_cpuid instead of xen_cpuid.

v12: [http://mid.gmane.org/1387313503-31362-1-git-send-email-konrad.wilk@oracle.com]
 - Rework per Stefano's review.
 - Split some patches up for easier review.
 - Bugs fixed.

v11 as compared to v10: [https://lkml.org/lkml/2013/12/12/625]:
 - Split patches in a more logical sense, squash some
 - Dropped Acked-by's from folks
 - Fleshed out descriptions


Regression-wise: there are no bugs with Xen 4.1, 4.2, 4.3, or 4.4.

That is, if you compile/boot it with
CONFIG_XEN_PVH=y or "# CONFIG_XEN_PVH is not set" - in both cases, as
either dom0 or domU, there are no bugs. I also launched it as a 32/64-bit
dom0 with 32/64-bit domUs as PV or PVHVM, along with SLES11, SLES12,
F15->F19 (32/64-bit), OL5, OL6, RHEL5 (32/64-bit), FreeBSD HVM, and
NetBSD PV guests, without issues.

With Xen 4.1 there was a regression (see
http://mid.gmane.org/20131220175735.GA619@phenom.dumpdata.com),
and this patchset includes the fix for it.


-------------------------
PARAVIRT OPS / x86_init /apic /smp ops
------------------------

The paravirt ops in use are:

	pv_mmu_ops.flush_tlb_others = xen_flush_tlb_others;

These are still used:

        pv_info = xen_info;
        pv_init_ops = xen_init_ops;
        pv_apic_ops = xen_apic_ops;
        pv_time_ops = xen_time_ops;

And the x86_init, apic, and smp ops are still in force.

This is just the first step, so there might be some other ops
that are needed that I failed to enumerate.

The pv_cpu_ops structure is not used at all. From pv_mmu_ops only one
op is used.

-----------------------------
HOW TO USE IT
-----------------------------

The only things needed to make this work as PVH are:

 0) Get the latest version of Xen and compile/install it.
    See http://wiki.xen.org/wiki/Compiling_Xen_From_Source for details

 1) Clone above mentioned tree

    See http://wiki.xenproject.org/wiki/Mainline_Linux_Kernel_Configs#Configuring_the_Kernel
    for details. The steps are:

	cd $HOME
	git clone  git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git linux
	cd linux
	git checkout origin/stable/pvh.v11

 2) Compile with CONFIG_XEN_PVH=y

    a) From scratch:

	make defconfig
	make menuconfig
	Processor type and features  --->  Linux guest support  --->
		 Paravirtualization layer for spinlocks
		 Xen guest support	(which will now show you:)
		 Support for running as a PVH guest (NEW)

	In case you prefer to edit .config directly, the options are:

	CONFIG_HYPERVISOR_GUEST=y
	CONFIG_PARAVIRT=y
	CONFIG_PARAVIRT_GUEST=y
	CONFIG_PARAVIRT_SPINLOCKS=y
	CONFIG_XEN=y
	CONFIG_XEN_PVH=y

	You will also have to enable the block and network drivers, console,
	etc., which are in different submenus.

    b) Based on your current distro:

	cp /boot/config-`uname -r` $HOME/linux/.config
	make menuconfig
	Processor type and features  --->  Linux guest support  --->
		 Support for running as a PVH guest (NEW)

 3) Launch it with 'pvh=1' in your guest config (for example):

	extra="console=hvc0 debug  kgdboc=hvc0 nokgdbroundup  initcall_debug debug"
	kernel="/mnt/lab/latest/vmlinuz"
	ramdisk="/mnt/lab/latest/initramfs.cpio.gz"
	memory=1024
	vcpus=4
	name="pvh"
	vif = [ 'mac=00:0F:4B:00:00:68, bridge=switch' ]
	vfb = [ 'vnc=1, vnclisten=0.0.0.0,vncunused=1']
	disk=['phy:/dev/sdb1,xvda,w']
	pvh=1
	on_reboot="preserve"
	on_crash="preserve"
	on_poweroff="preserve"

    using 'xl'. Xend 'xm' does not have PVH support.

It will boot up as a normal PV guest, but 'xen-detect' will report it as an HVM
guest.

Items that have not been tested extensively or at all:
  - Migration (xl save && xl restore for example).

  - 32-bit guests (won't even present you with a CONFIG_XEN_PVH option)

  - PCI passthrough

  - Running it in dom0 mode (as the patches for that are not yet in Xen upstream).
    If you want to try that, you can merge/pull Mukesh's branch:

	cd $HOME/xen
	git pull git://oss.oracle.com/git/mrathor/xen.git dom0pvh-v6

    .. and use the "dom0pvh=1" boot parameter. Remember to recompile
    and install the new version of Xen. This patchset
    does not contain the patches necessary to set up guests - but I can
    create one easily enough.

  - Memory ballooning
  - Multiple VBDs, NICs, etc.

Things that are broken:
 - CPUID filtering. No filtering is done at all, which means that
   certain cpuid flags are exposed to the guest. The x2apic flag will
   cause a crash if the NMI handler is invoked. The APERF flag will
   cause inferior scheduling decisions.
 
If you encounter errors, please email me the following (note that the
guest config above has 'on_reboot="preserve"' and 'on_crash="preserve"',
which you should have in your guest config so that the memory of the
guest is retained):

 a) xl dmesg
 b) xl list
 c) xenctx -s $HOME/linux/System.map -f -a -C <domain id>
    [xenctx is sometimes found in  /usr/lib/xen/bin/xenctx ]
 d) the console output from the guest
 e) Anything else you can think of.

Stash away your vmlinux file (it is too big to send via email), as I might
need it later on.


That is it!

Thank you!

 arch/arm/include/asm/xen/page.h    |   1 +
 arch/arm/xen/enlighten.c           |   9 +-
 arch/x86/include/asm/xen/page.h    |   8 +-
 arch/x86/xen/Kconfig               |   5 ++
 arch/x86/xen/enlighten.c           | 100 +++++++++++++++++-----
 arch/x86/xen/grant-table.c         |  62 ++++++++++++++
 arch/x86/xen/irq.c                 |   5 +-
 arch/x86/xen/mmu.c                 | 166 +++++++++++++++++++++----------------
 arch/x86/xen/p2m.c                 |  15 +++-
 arch/x86/xen/setup.c               |  40 +++++++--
 arch/x86/xen/smp.c                 |  49 +++++++----
 arch/x86/xen/xen-head.S            |  25 +++++-
 arch/x86/xen/xen-ops.h             |   1 +
 drivers/xen/events.c               |  14 ++--
 drivers/xen/gntdev.c               |   2 +-
 drivers/xen/grant-table.c          |  87 ++++++++++++++-----
 drivers/xen/platform-pci.c         |  10 ++-
 drivers/xen/xenbus/xenbus_client.c |   3 +-
 include/xen/grant_table.h          |   9 +-
 include/xen/interface/elfnote.h    |  13 +++
 include/xen/xen.h                  |  14 ++++
 21 files changed, 483 insertions(+), 155 deletions(-)

Konrad Rzeszutek Wilk (7):
      xen/pvh: Don't setup P2M tree.
      xen/mmu/p2m: Refactor the xen_pagetable_init code (v2).
      xen/mmu: Cleanup xen_pagetable_p2m_copy a bit.
      xen/grants: Remove gnttab_max_grant_frames dependency on gnttab_init.
      xen/grant-table: Refactor gnttab_init
      xen/grant: Implement an grant frame array struct (v2).
      xen/pvh: Piggyback on PVHVM for grant driver (v4)

Mukesh Rathor (12):
      xen/p2m: Check for auto-xlat when doing mfn_to_local_pfn.
      xen/pvh/x86: Define what an PVH guest is (v3).
      xen/pvh: Early bootup changes in PV code (v4).
      xen/pvh: MMU changes for PVH (v2)
      xen/pvh/mmu: Use PV TLB instead of native.
      xen/pvh: Setup up shared_info.
      xen/pvh: Load GDT/GS in early PV bootup code for BSP.
      xen/pvh: Secondary VCPU bringup (non-bootup CPUs)
      xen/pvh: Update E820 to work with PVH (v2)
      xen/pvh: Piggyback on PVHVM for event channels (v2)
      xen/pvh: Piggyback on PVHVM XenBus.
      xen/pvh: Support ParaVirtualized Hardware extensions (v3).


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v13 01/19] xen/p2m: Check for auto-xlat when doing mfn_to_local_pfn.
  2014-01-03 19:38 [PATCH v13] Linux Xen PVH support (v13) Konrad Rzeszutek Wilk
@ 2014-01-03 19:38 ` Konrad Rzeszutek Wilk
  2014-01-03 19:38 ` [PATCH v13 02/19] xen/pvh/x86: Define what an PVH guest is (v3) Konrad Rzeszutek Wilk
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 19:38 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel
  Cc: hpa, Mukesh Rathor, Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

Most of the functions in page.h are prefaced with
	if (xen_feature(XENFEAT_auto_translated_physmap))
		return mfn;

except mfn_to_local_pfn. At first sight, the function should
work without this patch, as 'mfn_to_pfn' has a similar check. But
there is no such check in the 'get_phys_to_machine' function, so we
would crash in there.

This patch fixes it by following the convention of having the
check for auto-xlat in these static functions.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 arch/x86/include/asm/xen/page.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index b913915..4a092cc 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -167,7 +167,12 @@ static inline xpaddr_t machine_to_phys(xmaddr_t machine)
  */
 static inline unsigned long mfn_to_local_pfn(unsigned long mfn)
 {
-	unsigned long pfn = mfn_to_pfn(mfn);
+	unsigned long pfn;
+
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return mfn;
+
+	pfn = mfn_to_pfn(mfn);
 	if (get_phys_to_machine(pfn) != mfn)
 		return -1; /* force !pfn_valid() */
 	return pfn;
-- 
1.8.3.1



* [PATCH v13 02/19] xen/pvh/x86: Define what an PVH guest is (v3).
  2014-01-03 19:38 [PATCH v13] Linux Xen PVH support (v13) Konrad Rzeszutek Wilk
  2014-01-03 19:38 ` [PATCH v13 01/19] xen/p2m: Check for auto-xlat when doing mfn_to_local_pfn Konrad Rzeszutek Wilk
@ 2014-01-03 19:38 ` Konrad Rzeszutek Wilk
  2014-01-03 19:38 ` [PATCH v13 03/19] xen/pvh: Early bootup changes in PV code (v4) Konrad Rzeszutek Wilk
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 19:38 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel
  Cc: hpa, Mukesh Rathor, Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

A PVH guest is a PV guest with auto page translation enabled
and with a vector callback. It is a cross between PVHVM and PV.

The Xen side defines PVH as (from docs/misc/pvh-readme.txt,
with modifications):

"* the guest uses auto translate:
 - p2m is managed by Xen
 - pagetables are owned by the guest
 - mmu_update hypercall not available
* it uses event callback and not vlapic emulation,
* IDT is native, so set_trap_table hcall is also N/A for a PVH guest.

For a full list of hcalls supported for PVH, see pvh_hypercall64_table
in arch/x86/hvm/hvm.c in xen.  From the ABI perspective, it's mostly a
PV guest with auto translate, although it does use hvm_op for setting
callback vector."

Also, we use the PV cpuid, although we could use the HVM (native)
cpuid. However, there is a fair bit of filtering done in xen_cpuid,
and we can piggyback on that until the hypervisor/toolstack filters
the appropriate cpuids. Once that is done we can switch over to
the native one.

We add a Kconfig entry that is disabled by default and
cannot be enabled yet (it depends on BROKEN).

Note that on ARM the concept of PVH is non-existent. As Ian
put it: "an ARM guest is neither PV nor HVM nor PVHVM.
It's a bit like PVH but is different also (it's further towards
the H end of the spectrum than even PVH).". As such these
options (PVHVM, PVH) are never enabled nor seen on ARM
compilations.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/Kconfig |  5 +++++
 include/xen/xen.h    | 14 ++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index 1a3c765..e7d0590 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -51,3 +51,8 @@ config XEN_DEBUG_FS
 	  Enable statistics output and various tuning options in debugfs.
 	  Enabling this option may incur a significant performance overhead.
 
+config XEN_PVH
+	bool "Support for running as a PVH guest"
+	depends on X86_64 && XEN && BROKEN
+	select XEN_PVHVM
+	def_bool n
diff --git a/include/xen/xen.h b/include/xen/xen.h
index a74d436..0c0e3ef 100644
--- a/include/xen/xen.h
+++ b/include/xen/xen.h
@@ -29,4 +29,18 @@ extern enum xen_domain_type xen_domain_type;
 #define xen_initial_domain()	(0)
 #endif	/* CONFIG_XEN_DOM0 */
 
+#ifdef CONFIG_XEN_PVH
+/* This functionality exists only for x86. The XEN_PVHVM support exists
+ * only in x86 world - hence on ARM it will be always disabled.
+ * N.B. ARM guests are neither PV nor HVM nor PVHVM.
+ * It's a bit like PVH but is different also (it's further towards the H
+ * end of the spectrum than even PVH).
+ */
+#include <xen/features.h>
+#define xen_pvh_domain() (xen_pv_domain() && \
+			  xen_feature(XENFEAT_auto_translated_physmap) && \
+			  xen_have_vector_callback)
+#else
+#define xen_pvh_domain()	(0)
+#endif
 #endif	/* _XEN_XEN_H */
-- 
1.8.3.1



* [PATCH v13 03/19] xen/pvh: Early bootup changes in PV code (v4).
  2014-01-03 19:38 [PATCH v13] Linux Xen PVH support (v13) Konrad Rzeszutek Wilk
  2014-01-03 19:38 ` [PATCH v13 01/19] xen/p2m: Check for auto-xlat when doing mfn_to_local_pfn Konrad Rzeszutek Wilk
  2014-01-03 19:38 ` [PATCH v13 02/19] xen/pvh/x86: Define what an PVH guest is (v3) Konrad Rzeszutek Wilk
@ 2014-01-03 19:38 ` Konrad Rzeszutek Wilk
  2014-01-05 17:49   ` Stefano Stabellini
  2014-01-03 19:38 ` [PATCH v13 04/19] xen/pvh: Don't setup P2M tree Konrad Rzeszutek Wilk
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 19:38 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel
  Cc: hpa, Mukesh Rathor, Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

We don't use the filtering that 'xen_cpuid' is doing
because the hypervisor treats 'XEN_EMULATE_PREFIX' as
an invalid instruction. This means that all of the filtering
will have to be done in the hypervisor/toolstack.

Without the filtering we expose the following to the guest:

 - cpu topology (sockets, cores, etc);
 - the APERF (which the generic scheduler likes to
    use), see  5e626254206a709c6e937f3dda69bf26c7344f6f
    "xen/setup: filter APERFMPERF cpuid feature out"
 - and the inability to figure out whether MWAIT_LEAF
   should be exposed or not. See
   df88b2d96e36d9a9e325bfcd12eb45671cbbc937
   "xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won't be loaded."
 - x2apic, see  4ea9b9aca90cfc71e6872ed3522356755162932c
   "xen: mask x2APIC feature in PV"

We also check for vector callback early on, as it is a required
feature. PVH also runs at default kernel IOPL.

Finally, pure PV settings are moved to a separate function that is
only called for pure PV, i.e., PV with pvmmu. They are also #ifdef'ed
with CONFIG_XEN_PVMMU.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/enlighten.c | 48 ++++++++++++++++++++++++++++++++++--------------
 arch/x86/xen/setup.c     | 18 ++++++++++++------
 2 files changed, 46 insertions(+), 20 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index fa6ade7..eb0efc2 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -46,6 +46,7 @@
 #include <xen/hvm.h>
 #include <xen/hvc-console.h>
 #include <xen/acpi.h>
+#include <xen/features.h>
 
 #include <asm/paravirt.h>
 #include <asm/apic.h>
@@ -262,8 +263,9 @@ static void __init xen_banner(void)
 	struct xen_extraversion extra;
 	HYPERVISOR_xen_version(XENVER_extraversion, &extra);
 
-	printk(KERN_INFO "Booting paravirtualized kernel on %s\n",
-	       pv_info.name);
+	pr_info("Booting paravirtualized kernel %son %s\n",
+		xen_feature(XENFEAT_auto_translated_physmap) ?
+			"with PVH extensions " : "", pv_info.name);
 	printk(KERN_INFO "Xen version: %d.%d%s%s\n",
 	       version >> 16, version & 0xffff, extra.extraversion,
 	       xen_feature(XENFEAT_mmu_pt_update_preserve_ad) ? " (preserve-AD)" : "");
@@ -433,7 +435,7 @@ static void __init xen_init_cpuid_mask(void)
 
 	ax = 1;
 	cx = 0;
-	xen_cpuid(&ax, &bx, &cx, &dx);
+	cpuid(1, &ax, &bx, &cx, &dx);
 
 	xsave_mask =
 		(1 << (X86_FEATURE_XSAVE % 32)) |
@@ -1420,6 +1422,19 @@ static void __init xen_setup_stackprotector(void)
 	pv_cpu_ops.load_gdt = xen_load_gdt;
 }
 
+static void __init xen_pvh_early_guest_init(void)
+{
+	if (!xen_feature(XENFEAT_auto_translated_physmap))
+		return;
+
+	if (xen_feature(XENFEAT_hvm_callback_vector))
+		xen_have_vector_callback = 1;
+
+#ifdef CONFIG_X86_32
+	BUG(); /* PVH: Implement proper support. */
+#endif
+}
+
 /* First C function to be called on Xen boot */
 asmlinkage void __init xen_start_kernel(void)
 {
@@ -1431,13 +1446,16 @@ asmlinkage void __init xen_start_kernel(void)
 
 	xen_domain_type = XEN_PV_DOMAIN;
 
+	xen_setup_features();
+	xen_pvh_early_guest_init();
 	xen_setup_machphys_mapping();
 
 	/* Install Xen paravirt ops */
 	pv_info = xen_info;
 	pv_init_ops = xen_init_ops;
-	pv_cpu_ops = xen_cpu_ops;
 	pv_apic_ops = xen_apic_ops;
+	if (!xen_pvh_domain())
+		pv_cpu_ops = xen_cpu_ops;
 
 	x86_init.resources.memory_setup = xen_memory_setup;
 	x86_init.oem.arch_setup = xen_arch_setup;
@@ -1469,8 +1487,6 @@ asmlinkage void __init xen_start_kernel(void)
 	/* Work out if we support NX */
 	x86_configure_nx();
 
-	xen_setup_features();
-
 	/* Get mfn list */
 	if (!xen_feature(XENFEAT_auto_translated_physmap))
 		xen_build_dynamic_phys_to_machine();
@@ -1548,14 +1564,18 @@ asmlinkage void __init xen_start_kernel(void)
 	/* set the limit of our address space */
 	xen_reserve_top();
 
-	/* We used to do this in xen_arch_setup, but that is too late on AMD
-	 * were early_cpu_init (run before ->arch_setup()) calls early_amd_init
-	 * which pokes 0xcf8 port.
-	 */
-	set_iopl.iopl = 1;
-	rc = HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl);
-	if (rc != 0)
-		xen_raw_printk("physdev_op failed %d\n", rc);
+	/* PVH: runs at default kernel iopl of 0 */
+	if (!xen_pvh_domain()) {
+		/*
+		 * We used to do this in xen_arch_setup, but that is too late
+		 * on AMD were early_cpu_init (run before ->arch_setup()) calls
+		 * early_amd_init which pokes 0xcf8 port.
+		 */
+		set_iopl.iopl = 1;
+		rc = HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl);
+		if (rc != 0)
+			xen_raw_printk("physdev_op failed %d\n", rc);
+	}
 
 #ifdef CONFIG_X86_32
 	/* set up basic CPUID stuff */
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 68c054f..2137c51 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -563,16 +563,13 @@ void xen_enable_nmi(void)
 		BUG();
 #endif
 }
-void __init xen_arch_setup(void)
+void __init xen_pvmmu_arch_setup(void)
 {
-	xen_panic_handler_init();
-
 	HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_4gb_segments);
 	HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_writable_pagetables);
 
-	if (!xen_feature(XENFEAT_auto_translated_physmap))
-		HYPERVISOR_vm_assist(VMASST_CMD_enable,
-				     VMASST_TYPE_pae_extended_cr3);
+	HYPERVISOR_vm_assist(VMASST_CMD_enable,
+			     VMASST_TYPE_pae_extended_cr3);
 
 	if (register_callback(CALLBACKTYPE_event, xen_hypervisor_callback) ||
 	    register_callback(CALLBACKTYPE_failsafe, xen_failsafe_callback))
@@ -581,6 +578,15 @@ void __init xen_arch_setup(void)
 	xen_enable_sysenter();
 	xen_enable_syscall();
 	xen_enable_nmi();
+}
+
+/* This function is not called for HVM domains */
+void __init xen_arch_setup(void)
+{
+	xen_panic_handler_init();
+	if (!xen_feature(XENFEAT_auto_translated_physmap))
+		xen_pvmmu_arch_setup();
+
 #ifdef CONFIG_ACPI
 	if (!(xen_start_info->flags & SIF_INITDOMAIN)) {
 		printk(KERN_INFO "ACPI in unprivileged domain disabled\n");
-- 
1.8.3.1



* [PATCH v13 04/19] xen/pvh: Don't setup P2M tree.
  2014-01-03 19:38 [PATCH v13] Linux Xen PVH support (v13) Konrad Rzeszutek Wilk
                   ` (2 preceding siblings ...)
  2014-01-03 19:38 ` [PATCH v13 03/19] xen/pvh: Early bootup changes in PV code (v4) Konrad Rzeszutek Wilk
@ 2014-01-03 19:38 ` Konrad Rzeszutek Wilk
  2014-01-03 19:38 ` [PATCH v13 05/19] xen/mmu/p2m: Refactor the xen_pagetable_init code (v2) Konrad Rzeszutek Wilk
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 19:38 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel
  Cc: hpa, Konrad Rzeszutek Wilk

The P2M is not available for PVH. Fortunately for us the P2M code
already has most of the support for auto-xlat guests, thanks to commit
3d24bbd7dddbea54358a9795abaf051b0f18973c
"grant-table: call set_phys_to_machine after mapping grant refs",
which "introduces set_phys_to_machine calls for auto_translated guests
(even on x86) in gnttab_map_refs and gnttab_unmap_refs", so we don't
need to change much.

With the above mentioned commit "you'll get set_phys_to_machine calls
from gnttab_map_refs and gnttab_unmap_refs but PVH guests won't do
anything with them" (Stefano Stabellini), which is OK - we want
them to be NOPs.

This is because we assume that an "IOMMU is always present on the
platform and Xen is going to make the appropriate IOMMU pagetable
changes in the hypercall implementation of GNTTABOP_map_grant_ref
and GNTTABOP_unmap_grant_ref; then everything should be transparent
from the PVH privileged point of view, and DMA transfers involving
foreign pages keep working with no issues.

Otherwise we would need a P2M (and an M2P) for the PVH privileged
domain to track these foreign pages .. (see arch/arm/xen/p2m.c)."
(Stefano Stabellini).

We still have to inhibit the building of the P2M tree.
That had been done in the past by not calling
xen_build_dynamic_phys_to_machine (which sets up the P2M tree
and gives us virtual addresses to access it). But we are missing
a check in xen_build_mfn_list_list - which was continuing to set up
the P2M tree and would blow up trying to get the virtual
address of p2m_missing (which would have been set up by
xen_build_dynamic_phys_to_machine).

Hence a check is needed to not call xen_build_mfn_list_list when
running in auto-xlat mode.

Instead of replicating the check for auto-xlat in enlighten.c,
do it in the p2m.c code. The reason is that xen_build_mfn_list_list
is also called in xen_arch_post_suspend without any check for
auto-xlat. So for PVH, or PV with auto-xlat, we would needlessly
allocate space for a P2M tree.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 arch/x86/xen/enlighten.c |  3 +--
 arch/x86/xen/p2m.c       | 12 ++++++++++--
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index eb0efc2..23ead29 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1488,8 +1488,7 @@ asmlinkage void __init xen_start_kernel(void)
 	x86_configure_nx();
 
 	/* Get mfn list */
-	if (!xen_feature(XENFEAT_auto_translated_physmap))
-		xen_build_dynamic_phys_to_machine();
+	xen_build_dynamic_phys_to_machine();
 
 	/*
 	 * Set up kernel GDT and segment registers, mainly so that
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 2ae8699..fb7ee0a 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -280,6 +280,9 @@ void __ref xen_build_mfn_list_list(void)
 {
 	unsigned long pfn;
 
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return;
+
 	/* Pre-initialize p2m_top_mfn to be completely missing */
 	if (p2m_top_mfn == NULL) {
 		p2m_mid_missing_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
@@ -346,10 +349,15 @@ void xen_setup_mfn_list_list(void)
 /* Set up p2m_top to point to the domain-builder provided p2m pages */
 void __init xen_build_dynamic_phys_to_machine(void)
 {
-	unsigned long *mfn_list = (unsigned long *)xen_start_info->mfn_list;
-	unsigned long max_pfn = min(MAX_DOMAIN_PAGES, xen_start_info->nr_pages);
+	unsigned long *mfn_list;
+	unsigned long max_pfn;
 	unsigned long pfn;
 
+	 if (xen_feature(XENFEAT_auto_translated_physmap))
+		return;
+
+	mfn_list = (unsigned long *)xen_start_info->mfn_list;
+	max_pfn = min(MAX_DOMAIN_PAGES, xen_start_info->nr_pages);
 	xen_max_p2m_pfn = max_pfn;
 
 	p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
-- 
1.8.3.1



* [PATCH v13 05/19] xen/mmu/p2m: Refactor the xen_pagetable_init code (v2).
  2014-01-03 19:38 [PATCH v13] Linux Xen PVH support (v13) Konrad Rzeszutek Wilk
                   ` (3 preceding siblings ...)
  2014-01-03 19:38 ` [PATCH v13 04/19] xen/pvh: Don't setup P2M tree Konrad Rzeszutek Wilk
@ 2014-01-03 19:38 ` Konrad Rzeszutek Wilk
  2014-01-05 17:51   ` Stefano Stabellini
  2014-01-03 19:38 ` [PATCH v13 06/19] xen/mmu: Cleanup xen_pagetable_p2m_copy a bit Konrad Rzeszutek Wilk
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 19:38 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel
  Cc: hpa, Konrad Rzeszutek Wilk

The revectoring and copying of the P2M only happen when
!auto-xlat and on 64-bit builds. That is not obvious from
the code, so let's have separate 32-bit and 64-bit functions.

We also invert the check for auto-xlat to make the code
flow simpler.

Suggested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/mmu.c | 70 +++++++++++++++++++++++++++++-------------------------
 1 file changed, 37 insertions(+), 33 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index ce563be..c140eff 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1198,44 +1198,40 @@ static void __init xen_cleanhighmap(unsigned long vaddr,
 	 * instead of somewhere later and be confusing. */
 	xen_mc_flush();
 }
-#endif
-static void __init xen_pagetable_init(void)
+static void __init xen_pagetable_p2m_copy(void)
 {
-#ifdef CONFIG_X86_64
 	unsigned long size;
 	unsigned long addr;
-#endif
-	paging_init();
-	xen_setup_shared_info();
-#ifdef CONFIG_X86_64
-	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-		unsigned long new_mfn_list;
+	unsigned long new_mfn_list;
+
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return;
+
+	size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
+
+	/* On 32-bit, we get zero so this never gets executed. */
+	new_mfn_list = xen_revector_p2m_tree();
+	if (new_mfn_list && new_mfn_list != xen_start_info->mfn_list) {
+		/* using __ka address and sticking INVALID_P2M_ENTRY! */
+		memset((void *)xen_start_info->mfn_list, 0xff, size);
+
+		/* We should be in __ka space. */
+		BUG_ON(xen_start_info->mfn_list < __START_KERNEL_map);
+		addr = xen_start_info->mfn_list;
+		/* We roundup to the PMD, which means that if anybody at this stage is
+		 * using the __ka address of xen_start_info or xen_start_info->shared_info
+		 * they are in going to crash. Fortunatly we have already revectored
+		 * in xen_setup_kernel_pagetable and in xen_setup_shared_info. */
+		size = roundup(size, PMD_SIZE);
+		xen_cleanhighmap(addr, addr + size);
 
 		size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
+		memblock_free(__pa(xen_start_info->mfn_list), size);
+		/* And revector! Bye bye old array */
+		xen_start_info->mfn_list = new_mfn_list;
+	} else
+		return;
 
-		/* On 32-bit, we get zero so this never gets executed. */
-		new_mfn_list = xen_revector_p2m_tree();
-		if (new_mfn_list && new_mfn_list != xen_start_info->mfn_list) {
-			/* using __ka address and sticking INVALID_P2M_ENTRY! */
-			memset((void *)xen_start_info->mfn_list, 0xff, size);
-
-			/* We should be in __ka space. */
-			BUG_ON(xen_start_info->mfn_list < __START_KERNEL_map);
-			addr = xen_start_info->mfn_list;
-			/* We roundup to the PMD, which means that if anybody at this stage is
-			 * using the __ka address of xen_start_info or xen_start_info->shared_info
-			 * they are in going to crash. Fortunatly we have already revectored
-			 * in xen_setup_kernel_pagetable and in xen_setup_shared_info. */
-			size = roundup(size, PMD_SIZE);
-			xen_cleanhighmap(addr, addr + size);
-
-			size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
-			memblock_free(__pa(xen_start_info->mfn_list), size);
-			/* And revector! Bye bye old array */
-			xen_start_info->mfn_list = new_mfn_list;
-		} else
-			goto skip;
-	}
 	/* At this stage, cleanup_highmap has already cleaned __ka space
 	 * from _brk_limit way up to the max_pfn_mapped (which is the end of
 	 * the ramdisk). We continue on, erasing PMD entries that point to page
@@ -1255,7 +1251,15 @@ static void __init xen_pagetable_init(void)
 	 * anything at this stage. */
 	xen_cleanhighmap(MODULES_VADDR, roundup(MODULES_VADDR, PUD_SIZE) - 1);
 #endif
-skip:
+}
+#endif
+
+static void __init xen_pagetable_init(void)
+{
+	paging_init();
+	xen_setup_shared_info();
+#ifdef CONFIG_X86_64
+	xen_pagetable_p2m_copy();
 #endif
 	xen_post_allocator_init();
 }
-- 
1.8.3.1



* [PATCH v13 06/19] xen/mmu: Cleanup xen_pagetable_p2m_copy a bit.
  2014-01-03 19:38 [PATCH v13] Linux Xen PVH support (v13) Konrad Rzeszutek Wilk
                   ` (4 preceding siblings ...)
  2014-01-03 19:38 ` [PATCH v13 05/19] xen/mmu/p2m: Refactor the xen_pagetable_init code (v2) Konrad Rzeszutek Wilk
@ 2014-01-03 19:38 ` Konrad Rzeszutek Wilk
  2014-01-05 17:56   ` Stefano Stabellini
  2014-01-03 19:38 ` [PATCH v13 07/19] xen/pvh: MMU changes for PVH (v2) Konrad Rzeszutek Wilk
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 19:38 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel
  Cc: hpa, Konrad Rzeszutek Wilk

Stefano noticed that the code runs only under 64-bit, so
the comments about 32-bit are pointless.

Also, we change the handling of the case where xen_revector_p2m_tree
returns the same value, either because it could not allocate
a swath of space to put the new P2M in or because it had already
been called once. In such cases we now return early from the
function.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/mmu.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index c140eff..9d74249 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1209,29 +1209,29 @@ static void __init xen_pagetable_p2m_copy(void)
 
 	size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
 
-	/* On 32-bit, we get zero so this never gets executed. */
 	new_mfn_list = xen_revector_p2m_tree();
-	if (new_mfn_list && new_mfn_list != xen_start_info->mfn_list) {
-		/* using __ka address and sticking INVALID_P2M_ENTRY! */
-		memset((void *)xen_start_info->mfn_list, 0xff, size);
-
-		/* We should be in __ka space. */
-		BUG_ON(xen_start_info->mfn_list < __START_KERNEL_map);
-		addr = xen_start_info->mfn_list;
-		/* We roundup to the PMD, which means that if anybody at this stage is
-		 * using the __ka address of xen_start_info or xen_start_info->shared_info
-		 * they are in going to crash. Fortunatly we have already revectored
-		 * in xen_setup_kernel_pagetable and in xen_setup_shared_info. */
-		size = roundup(size, PMD_SIZE);
-		xen_cleanhighmap(addr, addr + size);
-
-		size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
-		memblock_free(__pa(xen_start_info->mfn_list), size);
-		/* And revector! Bye bye old array */
-		xen_start_info->mfn_list = new_mfn_list;
-	} else
+	/* No memory or already called. */
+	if (!new_mfn_list || new_mfn_list == xen_start_info->mfn_list)
 		return;
 
+	/* using __ka address and sticking INVALID_P2M_ENTRY! */
+	memset((void *)xen_start_info->mfn_list, 0xff, size);
+
+	/* We should be in __ka space. */
+	BUG_ON(xen_start_info->mfn_list < __START_KERNEL_map);
+	addr = xen_start_info->mfn_list;
+	/* We roundup to the PMD, which means that if anybody at this stage is
+	 * using the __ka address of xen_start_info or xen_start_info->shared_info
+	 * they are in going to crash. Fortunatly we have already revectored
+	 * in xen_setup_kernel_pagetable and in xen_setup_shared_info. */
+	size = roundup(size, PMD_SIZE);
+	xen_cleanhighmap(addr, addr + size);
+
+	size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
+	memblock_free(__pa(xen_start_info->mfn_list), size);
+	/* And revector! Bye bye old array */
+	xen_start_info->mfn_list = new_mfn_list;
+
 	/* At this stage, cleanup_highmap has already cleaned __ka space
 	 * from _brk_limit way up to the max_pfn_mapped (which is the end of
 	 * the ramdisk). We continue on, erasing PMD entries that point to page
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v13 07/19] xen/pvh: MMU changes for PVH (v2)
  2014-01-03 19:38 [PATCH v13] Linux Xen PVH support (v13) Konrad Rzeszutek Wilk
                   ` (5 preceding siblings ...)
  2014-01-03 19:38 ` [PATCH v13 06/19] xen/mmu: Cleanup xen_pagetable_p2m_copy a bit Konrad Rzeszutek Wilk
@ 2014-01-03 19:38 ` Konrad Rzeszutek Wilk
  2014-01-03 19:38 ` [PATCH v13 08/19] xen/pvh/mmu: Use PV TLB instead of native Konrad Rzeszutek Wilk
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 19:38 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel
  Cc: hpa, Mukesh Rathor, Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

.. which are surprisingly small compared to the amount of PV code.

PVH uses mostly native MMU ops: we keep the generic (native_*) ones
for the majority and override only the bare-metal ones we need.

At startup we are running with pre-allocated page tables,
courtesy of the toolstack. But we still need to graft them
into the Linux initial page tables. However, there is no need to
pin/unpin them or change them to R/O or R/W.

Note that xen_pagetable_init, thanks to commit 7836fec9d0994cc9c9150c5a33f0eb0eb08a335a
("xen/mmu/p2m: Refactor the xen_pagetable_init code."), does not
need any changes - we just need to make sure that xen_post_allocator_init
does not alter the pvops from the default native ones.
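The xen_post_allocator_init change boils down to feature-gating an ops table: when the guest is auto-translated (PVH), the native defaults are left in place and the PV-specific overrides are skipped. A minimal sketch with hypothetical names (`mmu_ops`, `post_allocator_init`) standing in for the real pvops structures:

```c
#include <assert.h>

/* Toy ops table: one pointer, initialized to the bare-metal path. */
struct mmu_ops {
	int (*set_pte)(int);
};

static int native_set_pte(int v) { return v; }        /* bare-metal path */
static int xen_set_pte(int v)    { return v | 0x80; } /* PV hypercall path */

static struct mmu_ops mmu_ops = { .set_pte = native_set_pte };

/* Sketch of xen_post_allocator_init(): on auto-translated (PVH)
 * guests, return before touching the table so the native defaults
 * survive; otherwise install the PV variants. */
static void post_allocator_init(int auto_translated)
{
	if (auto_translated)
		return;
	mmu_ops.set_pte = xen_set_pte;
}
```

Calling it with the feature set leaves `mmu_ops` untouched; without it, the PV override is installed.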

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 arch/x86/xen/mmu.c | 81 +++++++++++++++++++++++++++++++-----------------------
 1 file changed, 46 insertions(+), 35 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 9d74249..490ddb3 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1757,6 +1757,10 @@ static void set_page_prot_flags(void *addr, pgprot_t prot, unsigned long flags)
 	unsigned long pfn = __pa(addr) >> PAGE_SHIFT;
 	pte_t pte = pfn_pte(pfn, prot);
 
+	/* For PVH no need to set R/O or R/W to pin them or unpin them. */
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return;
+
 	if (HYPERVISOR_update_va_mapping((unsigned long)addr, pte, flags))
 		BUG();
 }
@@ -1867,6 +1871,7 @@ static void __init check_pt_base(unsigned long *pt_base, unsigned long *pt_end,
  * but that's enough to get __va working.  We need to fill in the rest
  * of the physical mapping once some sort of allocator has been set
  * up.
+ * NOTE: for PVH, the page tables are native.
  */
 void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 {
@@ -1888,17 +1893,18 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 	/* Zap identity mapping */
 	init_level4_pgt[0] = __pgd(0);
 
-	/* Pre-constructed entries are in pfn, so convert to mfn */
-	/* L4[272] -> level3_ident_pgt
-	 * L4[511] -> level3_kernel_pgt */
-	convert_pfn_mfn(init_level4_pgt);
-
-	/* L3_i[0] -> level2_ident_pgt */
-	convert_pfn_mfn(level3_ident_pgt);
-	/* L3_k[510] -> level2_kernel_pgt
-	 * L3_i[511] -> level2_fixmap_pgt */
-	convert_pfn_mfn(level3_kernel_pgt);
-
+	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+		/* Pre-constructed entries are in pfn, so convert to mfn */
+		/* L4[272] -> level3_ident_pgt
+		 * L4[511] -> level3_kernel_pgt */
+		convert_pfn_mfn(init_level4_pgt);
+
+		/* L3_i[0] -> level2_ident_pgt */
+		convert_pfn_mfn(level3_ident_pgt);
+		/* L3_k[510] -> level2_kernel_pgt
+		 * L3_i[511] -> level2_fixmap_pgt */
+		convert_pfn_mfn(level3_kernel_pgt);
+	}
 	/* We get [511][511] and have Xen's version of level2_kernel_pgt */
 	l3 = m2v(pgd[pgd_index(__START_KERNEL_map)].pgd);
 	l2 = m2v(l3[pud_index(__START_KERNEL_map)].pud);
@@ -1922,31 +1928,33 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 	copy_page(level2_fixmap_pgt, l2);
 	/* Note that we don't do anything with level1_fixmap_pgt which
 	 * we don't need. */
+	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+		/* Make pagetable pieces RO */
+		set_page_prot(init_level4_pgt, PAGE_KERNEL_RO);
+		set_page_prot(level3_ident_pgt, PAGE_KERNEL_RO);
+		set_page_prot(level3_kernel_pgt, PAGE_KERNEL_RO);
+		set_page_prot(level3_user_vsyscall, PAGE_KERNEL_RO);
+		set_page_prot(level2_ident_pgt, PAGE_KERNEL_RO);
+		set_page_prot(level2_kernel_pgt, PAGE_KERNEL_RO);
+		set_page_prot(level2_fixmap_pgt, PAGE_KERNEL_RO);
+
+		/* Pin down new L4 */
+		pin_pagetable_pfn(MMUEXT_PIN_L4_TABLE,
+				  PFN_DOWN(__pa_symbol(init_level4_pgt)));
+
+		/* Unpin Xen-provided one */
+		pin_pagetable_pfn(MMUEXT_UNPIN_TABLE, PFN_DOWN(__pa(pgd)));
 
-	/* Make pagetable pieces RO */
-	set_page_prot(init_level4_pgt, PAGE_KERNEL_RO);
-	set_page_prot(level3_ident_pgt, PAGE_KERNEL_RO);
-	set_page_prot(level3_kernel_pgt, PAGE_KERNEL_RO);
-	set_page_prot(level3_user_vsyscall, PAGE_KERNEL_RO);
-	set_page_prot(level2_ident_pgt, PAGE_KERNEL_RO);
-	set_page_prot(level2_kernel_pgt, PAGE_KERNEL_RO);
-	set_page_prot(level2_fixmap_pgt, PAGE_KERNEL_RO);
-
-	/* Pin down new L4 */
-	pin_pagetable_pfn(MMUEXT_PIN_L4_TABLE,
-			  PFN_DOWN(__pa_symbol(init_level4_pgt)));
-
-	/* Unpin Xen-provided one */
-	pin_pagetable_pfn(MMUEXT_UNPIN_TABLE, PFN_DOWN(__pa(pgd)));
-
-	/*
-	 * At this stage there can be no user pgd, and no page
-	 * structure to attach it to, so make sure we just set kernel
-	 * pgd.
-	 */
-	xen_mc_batch();
-	__xen_write_cr3(true, __pa(init_level4_pgt));
-	xen_mc_issue(PARAVIRT_LAZY_CPU);
+		/*
+		 * At this stage there can be no user pgd, and no page
+		 * structure to attach it to, so make sure we just set kernel
+		 * pgd.
+		 */
+		xen_mc_batch();
+		__xen_write_cr3(true, __pa(init_level4_pgt));
+		xen_mc_issue(PARAVIRT_LAZY_CPU);
+	} else
+		native_write_cr3(__pa(init_level4_pgt));
 
 	/* We can't that easily rip out L3 and L2, as the Xen pagetables are
 	 * set out this way: [L4], [L1], [L2], [L3], [L1], [L1] ...  for
@@ -2107,6 +2115,9 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
 
 static void __init xen_post_allocator_init(void)
 {
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return;
+
 	pv_mmu_ops.set_pte = xen_set_pte;
 	pv_mmu_ops.set_pmd = xen_set_pmd;
 	pv_mmu_ops.set_pud = xen_set_pud;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v13 08/19] xen/pvh/mmu: Use PV TLB instead of native.
  2014-01-03 19:38 [PATCH v13] Linux Xen PVH support (v13) Konrad Rzeszutek Wilk
                   ` (6 preceding siblings ...)
  2014-01-03 19:38 ` [PATCH v13 07/19] xen/pvh: MMU changes for PVH (v2) Konrad Rzeszutek Wilk
@ 2014-01-03 19:38 ` Konrad Rzeszutek Wilk
  2014-01-05 18:11   ` Stefano Stabellini
  2014-01-03 19:38 ` [PATCH v13 09/19] xen/pvh: Setup up shared_info Konrad Rzeszutek Wilk
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 19:38 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel
  Cc: hpa, Mukesh Rathor, Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

We also optimize one of them - the TLB flush. The native operation
would needlessly IPI offline VCPUs, causing extra wakeups. Using the
Xen one avoids that and lets the hypervisor determine which
VCPUs need the TLB flush.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/mmu.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 490ddb3..c1d406f 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2222,6 +2222,15 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
 void __init xen_init_mmu_ops(void)
 {
 	x86_init.paging.pagetable_init = xen_pagetable_init;
+
+	/* Optimization - we can use the HVM one but it has no idea which
+	 * VCPUs are descheduled - which means that it will needlessly IPI
+	 * them. Xen knows so let it do the job.
+	 */
+	if (xen_feature(XENFEAT_auto_translated_physmap)) {
+		pv_mmu_ops.flush_tlb_others = xen_flush_tlb_others;
+		return;
+	}
 	pv_mmu_ops = xen_mmu_ops;
 
 	memset(dummy_mapping, 0xff, PAGE_SIZE);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v13 09/19] xen/pvh: Setup up shared_info.
  2014-01-03 19:38 [PATCH v13] Linux Xen PVH support (v13) Konrad Rzeszutek Wilk
                   ` (7 preceding siblings ...)
  2014-01-03 19:38 ` [PATCH v13 08/19] xen/pvh/mmu: Use PV TLB instead of native Konrad Rzeszutek Wilk
@ 2014-01-03 19:38 ` Konrad Rzeszutek Wilk
  2014-01-03 19:38 ` [PATCH v13 10/19] xen/pvh: Load GDT/GS in early PV bootup code for BSP Konrad Rzeszutek Wilk
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 19:38 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel
  Cc: hpa, Mukesh Rathor, Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

For PVH the shared_info structure is provided in the same way
as for normal PV guests (see include/xen/interface/xen.h).

That is, during bootup we get 'xen_start_info' via the %esi register
in startup_xen. Later we extract the 'shared_info' from said
structure (in xen_setup_shared_info) and start using it.

'xen_setup_shared_info' is already set up to work with auto-xlat
guests, but two functions it calls are not:
xen_setup_mfn_list_list and xen_setup_vcpu_info_placement.
This patch modifies the P2M code (xen_setup_mfn_list_list),
while the "Piggyback on PVHVM for event channels" patch modifies
xen_setup_vcpu_info_placement.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/p2m.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index fb7ee0a..696c694 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -339,6 +339,9 @@ void __ref xen_build_mfn_list_list(void)
 
 void xen_setup_mfn_list_list(void)
 {
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return;
+
 	BUG_ON(HYPERVISOR_shared_info == &xen_dummy_shared_info);
 
 	HYPERVISOR_shared_info->arch.pfn_to_mfn_frame_list_list =
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v13 10/19] xen/pvh: Load GDT/GS in early PV bootup code for BSP.
  2014-01-03 19:38 [PATCH v13] Linux Xen PVH support (v13) Konrad Rzeszutek Wilk
                   ` (8 preceding siblings ...)
  2014-01-03 19:38 ` [PATCH v13 09/19] xen/pvh: Setup up shared_info Konrad Rzeszutek Wilk
@ 2014-01-03 19:38 ` Konrad Rzeszutek Wilk
  2014-01-03 19:38 ` [PATCH v13 11/19] xen/pvh: Secondary VCPU bringup (non-bootup CPUs) Konrad Rzeszutek Wilk
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 19:38 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel
  Cc: hpa, Mukesh Rathor, Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

During early bootup we start life using the Xen provided
GDT, which means that we are running with %cs segment set
to FLAT_KERNEL_CS (FLAT_RING3_CS64 0xe033, GDT index 261).

But for PVH we want to use the HVM-type mechanism for
segment operations. As such we need to switch to the HVM
one and also reload ourselves with __KERNEL_CS:eip
to run with the proper GDT and segment.

For HVM this is usually done in 'secondary_startup_64'
(head_64.S), but since we are not taking that bootup
path (we start in PV - xen_start_kernel) we need to do
it in the early PV bootup paths.

For good measure we also zero out the %fs, %ds, and %es
(not strictly needed as Xen has already cleared them
for us). The %gs is loaded by 'switch_to_new_gdt'.
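The %cs reload in the patch works by pushing the target selector and a label address, then executing a far return (`lretq`), which pops both back into %rip and %cs. The same trick can be demonstrated in user space by reusing the current selector instead of __KERNEL_CS - a sketch, assuming an x86-64 Linux build (the kernel version in the diff below is the authoritative form):

```c
#include <assert.h>

/* Far-return %cs reload, as in xen_setup_gdt(): stack ends up as
 * [RIP][CS], and 'lretq' pops both, atomically reloading %cs while
 * jumping to the label. Here we reload the *current* selector so the
 * demo is safe to run at CPL 3; the kernel pushes __KERNEL_CS. */
static unsigned long reload_cs_demo(void)
{
	unsigned long dummy, cs;

	__asm__ volatile ("movq %%cs, %0" : "=r" (cs));
	__asm__ volatile ("pushq %0\n\t"         /* push target CS */
			  "leaq 1f(%%rip), %0\n\t"
			  "pushq %0\n\t"         /* push return RIP */
			  "lretq\n"
			  "1:\n"
			  : "=&r" (dummy) : "0" (cs));
	return cs;
}
```

Note the push order: the selector goes on the stack first, then the return address, because `lretq` pops RIP before CS.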

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/enlighten.c | 39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 23ead29..1170d00 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1410,8 +1410,43 @@ static void __init xen_boot_params_init_edd(void)
  * we do this, we have to be careful not to call any stack-protected
  * function, which is most of the kernel.
  */
-static void __init xen_setup_stackprotector(void)
+static void __init xen_setup_gdt(void)
 {
+	if (xen_feature(XENFEAT_auto_translated_physmap)) {
+#ifdef CONFIG_X86_64
+		unsigned long dummy;
+
+		switch_to_new_gdt(0); /* GDT and GS set */
+
+		/* We are switching of the Xen provided GDT to our HVM mode
+		 * GDT. The new GDT has  __KERNEL_CS with CS.L = 1
+		 * and we are jumping to reload it.
+		 */
+		asm volatile ("pushq %0\n"
+			      "leaq 1f(%%rip),%0\n"
+			      "pushq %0\n"
+			      "lretq\n"
+			      "1:\n"
+			      : "=&r" (dummy) : "0" (__KERNEL_CS));
+
+		/*
+		 * While not needed, we also set the %es, %ds, and %fs
+		 * to zero. We don't care about %ss as it is NULL.
+		 * Strictly speaking this is not needed as Xen zeros those
+		 * out (and also MSR_FS_BASE, MSR_GS_BASE, MSR_KERNEL_GS_BASE)
+		 *
+		 * Linux zeros them in cpu_init() and in secondary_startup_64
+		 * (for BSP).
+		 */
+		loadsegment(es, 0);
+		loadsegment(ds, 0);
+		loadsegment(fs, 0);
+#else
+		/* PVH: TODO Implement. */
+		BUG();
+#endif
+		return; /* PVH does not need any PV GDT ops. */
+	}
 	pv_cpu_ops.write_gdt_entry = xen_write_gdt_entry_boot;
 	pv_cpu_ops.load_gdt = xen_load_gdt_boot;
 
@@ -1494,7 +1529,7 @@ asmlinkage void __init xen_start_kernel(void)
 	 * Set up kernel GDT and segment registers, mainly so that
 	 * -fstack-protector code can be executed.
 	 */
-	xen_setup_stackprotector();
+	xen_setup_gdt();
 
 	xen_init_irq_ops();
 	xen_init_cpuid_mask();
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v13 11/19] xen/pvh: Secondary VCPU bringup (non-bootup CPUs)
  2014-01-03 19:38 [PATCH v13] Linux Xen PVH support (v13) Konrad Rzeszutek Wilk
                   ` (9 preceding siblings ...)
  2014-01-03 19:38 ` [PATCH v13 10/19] xen/pvh: Load GDT/GS in early PV bootup code for BSP Konrad Rzeszutek Wilk
@ 2014-01-03 19:38 ` Konrad Rzeszutek Wilk
  2014-01-06 10:52   ` David Vrabel
  2014-01-03 19:38 ` [PATCH v13 12/19] xen/pvh: Update E820 to work with PVH (v2) Konrad Rzeszutek Wilk
                   ` (8 subsequent siblings)
  19 siblings, 1 reply; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 19:38 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel
  Cc: hpa, Mukesh Rathor, Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

The VCPU bringup protocol follows the PV protocol with certain twists.
From xen/include/public/arch-x86/xen.h:

Also note that when calling DOMCTL_setvcpucontext and VCPU_initialise
for HVM and PVH guests, not all information in this structure is updated:

 - For HVM guests, the structures read include: fpu_ctxt (if
 VGCT_I387_VALID is set), flags, user_regs, debugreg[*]

 - PVH guests are the same as HVM guests, but additionally use ctrlreg[3] to
 set cr3. All other fields not used should be set to 0.

This is what we do. We piggyback on 'xen_setup_gdt' - but modify
it a bit - we need to call 'load_percpu_segment' so that 'switch_to_new_gdt'
can load the per-cpu data structures. This has no effect on VCPU0.

We also piggyback on the %rdi register to pass in the CPU number - so
that when we boot up a new CPU, cpu_bringup_and_idle receives
the CPU number as its first parameter (via %rdi for 64-bit).
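The %rdi trick relies on the SysV AMD64 calling convention: the first integer argument lives in %rdi, so whatever the hypervisor loads into that register before jumping to `user_regs.eip` arrives as the entry point's first C parameter. A minimal sketch with hypothetical names (`vcpu_ctxt`, `start_vcpu`) modeling that contract:

```c
#include <assert.h>

/* Toy version of the vcpu_guest_context fields involved. */
struct vcpu_ctxt {
	unsigned long eip; /* entry point (cpu_bringup_and_idle) */
	unsigned long rdi; /* first-argument register: the CPU number */
};

static int last_cpu = -1;

static void bringup_and_idle(int cpu)
{
	last_cpu = cpu;
}

/* A real hypervisor loads ctxt->rdi into %rdi and jumps to eip;
 * calling through a function pointer models the same ABI contract. */
static void start_vcpu(const struct vcpu_ctxt *ctxt)
{
	void (*entry)(int) = (void (*)(int))ctxt->eip;

	entry((int)ctxt->rdi);
}
```

Setting `ctxt->rdi = cpu` before "starting" the VCPU makes the CPU number appear as the entry point's parameter, exactly as the patch does with `ctxt->user_regs.rdi`.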

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/enlighten.c | 11 ++++++++---
 arch/x86/xen/smp.c       | 49 ++++++++++++++++++++++++++++++++----------------
 arch/x86/xen/xen-ops.h   |  1 +
 3 files changed, 42 insertions(+), 19 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 1170d00..fde62c4 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1409,14 +1409,19 @@ static void __init xen_boot_params_init_edd(void)
  * Set up the GDT and segment registers for -fstack-protector.  Until
  * we do this, we have to be careful not to call any stack-protected
  * function, which is most of the kernel.
+ *
+ * Note, that it is refok - because the only caller of this after init
+ * is PVH which is not going to use xen_load_gdt_boot or other
+ * __init functions.
  */
-static void __init xen_setup_gdt(void)
+void __ref xen_setup_gdt(int cpu)
 {
 	if (xen_feature(XENFEAT_auto_translated_physmap)) {
 #ifdef CONFIG_X86_64
 		unsigned long dummy;
 
-		switch_to_new_gdt(0); /* GDT and GS set */
+		load_percpu_segment(cpu); /* We need to access per-cpu area */
+		switch_to_new_gdt(cpu); /* GDT and GS set */
 
 		/* We are switching of the Xen provided GDT to our HVM mode
 		 * GDT. The new GDT has  __KERNEL_CS with CS.L = 1
@@ -1529,7 +1534,7 @@ asmlinkage void __init xen_start_kernel(void)
 	 * Set up kernel GDT and segment registers, mainly so that
 	 * -fstack-protector code can be executed.
 	 */
-	xen_setup_gdt();
+	xen_setup_gdt(0);
 
 	xen_init_irq_ops();
 	xen_init_cpuid_mask();
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index c36b325..5e46190 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -73,9 +73,11 @@ static void cpu_bringup(void)
 	touch_softlockup_watchdog();
 	preempt_disable();
 
-	xen_enable_sysenter();
-	xen_enable_syscall();
-
+	/* PVH runs in ring 0 and allows us to do native syscalls. Yay! */
+	if (!xen_feature(XENFEAT_supervisor_mode_kernel)) {
+		xen_enable_sysenter();
+		xen_enable_syscall();
+	}
 	cpu = smp_processor_id();
 	smp_store_cpu_info(cpu);
 	cpu_data(cpu).x86_max_cores = 1;
@@ -97,8 +99,14 @@ static void cpu_bringup(void)
 	wmb();			/* make sure everything is out */
 }
 
-static void cpu_bringup_and_idle(void)
+/* Note: cpu parameter is only relevant for PVH */
+static void cpu_bringup_and_idle(int cpu)
 {
+#ifdef CONFIG_X86_64
+	if (xen_feature(XENFEAT_auto_translated_physmap) &&
+	    xen_feature(XENFEAT_supervisor_mode_kernel))
+		xen_setup_gdt(cpu);
+#endif
 	cpu_bringup();
 	cpu_startup_entry(CPUHP_ONLINE);
 }
@@ -274,9 +282,10 @@ static void __init xen_smp_prepare_boot_cpu(void)
 	native_smp_prepare_boot_cpu();
 
 	if (xen_pv_domain()) {
-		/* We've switched to the "real" per-cpu gdt, so make sure the
-		   old memory can be recycled */
-		make_lowmem_page_readwrite(xen_initial_gdt);
+		if (!xen_feature(XENFEAT_writable_page_tables))
+			/* We've switched to the "real" per-cpu gdt, so make
+			 * sure the old memory can be recycled. */
+			make_lowmem_page_readwrite(xen_initial_gdt);
 
 #ifdef CONFIG_X86_32
 		/*
@@ -360,22 +369,21 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 
 	gdt = get_cpu_gdt_table(cpu);
 
-	ctxt->flags = VGCF_IN_KERNEL;
-	ctxt->user_regs.ss = __KERNEL_DS;
 #ifdef CONFIG_X86_32
+	/* Note: PVH is not yet supported on x86_32. */
 	ctxt->user_regs.fs = __KERNEL_PERCPU;
 	ctxt->user_regs.gs = __KERNEL_STACK_CANARY;
-#else
-	ctxt->gs_base_kernel = per_cpu_offset(cpu);
 #endif
 	ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
 
 	memset(&ctxt->fpu_ctxt, 0, sizeof(ctxt->fpu_ctxt));
 
-	{
+	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+		ctxt->flags = VGCF_IN_KERNEL;
 		ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
 		ctxt->user_regs.ds = __USER_DS;
 		ctxt->user_regs.es = __USER_DS;
+		ctxt->user_regs.ss = __KERNEL_DS;
 
 		xen_copy_trap_info(ctxt->trap_ctxt);
 
@@ -396,18 +404,27 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 #ifdef CONFIG_X86_32
 		ctxt->event_callback_cs     = __KERNEL_CS;
 		ctxt->failsafe_callback_cs  = __KERNEL_CS;
+#else
+		ctxt->gs_base_kernel = per_cpu_offset(cpu);
 #endif
 		ctxt->event_callback_eip    =
 					(unsigned long)xen_hypervisor_callback;
 		ctxt->failsafe_callback_eip =
 					(unsigned long)xen_failsafe_callback;
+		ctxt->user_regs.cs = __KERNEL_CS;
+		per_cpu(xen_cr3, cpu) = __pa(swapper_pg_dir);
+#ifdef CONFIG_X86_32
 	}
-	ctxt->user_regs.cs = __KERNEL_CS;
+#else
+	} else
+		/* N.B. The user_regs.eip (cpu_bringup_and_idle) is called with
+		 * %rdi having the cpu number - which means are passing in
+		 * as the first parameter the cpu. Subtle!
+		 */
+		ctxt->user_regs.rdi = cpu;
+#endif
 	ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs);
-
-	per_cpu(xen_cr3, cpu) = __pa(swapper_pg_dir);
 	ctxt->ctrlreg[3] = xen_pfn_to_cr3(virt_to_mfn(swapper_pg_dir));
-
 	if (HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, ctxt))
 		BUG();
 
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 95f8c61..9059c24 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -123,4 +123,5 @@ __visible void xen_adjust_exception_frame(void);
 
 extern int xen_panic_handler_init(void);
 
+void xen_setup_gdt(int cpu);
 #endif /* XEN_OPS_H */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v13 12/19] xen/pvh: Update E820 to work with PVH (v2)
  2014-01-03 19:38 [PATCH v13] Linux Xen PVH support (v13) Konrad Rzeszutek Wilk
                   ` (10 preceding siblings ...)
  2014-01-03 19:38 ` [PATCH v13 11/19] xen/pvh: Secondary VCPU bringup (non-bootup CPUs) Konrad Rzeszutek Wilk
@ 2014-01-03 19:38 ` Konrad Rzeszutek Wilk
  2014-01-03 19:38 ` [PATCH v13 13/19] xen/pvh: Piggyback on PVHVM for event channels (v2) Konrad Rzeszutek Wilk
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 19:38 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel
  Cc: hpa, Mukesh Rathor, Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

In xen_add_extra_mem() we can skip updating P2M as it's managed
by Xen. PVH maps the entire IO space, but only RAM pages need
to be repopulated.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 arch/x86/xen/setup.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 2137c51..dd5f905 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -27,6 +27,7 @@
 #include <xen/interface/memory.h>
 #include <xen/interface/physdev.h>
 #include <xen/features.h>
+#include "mmu.h"
 #include "xen-ops.h"
 #include "vdso.h"
 
@@ -81,6 +82,9 @@ static void __init xen_add_extra_mem(u64 start, u64 size)
 
 	memblock_reserve(start, size);
 
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return;
+
 	xen_max_p2m_pfn = PFN_DOWN(start + size);
 	for (pfn = PFN_DOWN(start); pfn < xen_max_p2m_pfn; pfn++) {
 		unsigned long mfn = pfn_to_mfn(pfn);
@@ -103,6 +107,7 @@ static unsigned long __init xen_do_chunk(unsigned long start,
 		.domid        = DOMID_SELF
 	};
 	unsigned long len = 0;
+	int xlated_phys = xen_feature(XENFEAT_auto_translated_physmap);
 	unsigned long pfn;
 	int ret;
 
@@ -116,7 +121,7 @@ static unsigned long __init xen_do_chunk(unsigned long start,
 				continue;
 			frame = mfn;
 		} else {
-			if (mfn != INVALID_P2M_ENTRY)
+			if (!xlated_phys && mfn != INVALID_P2M_ENTRY)
 				continue;
 			frame = pfn;
 		}
@@ -154,6 +159,13 @@ static unsigned long __init xen_do_chunk(unsigned long start,
 static unsigned long __init xen_release_chunk(unsigned long start,
 					      unsigned long end)
 {
+	/*
+	 * Xen already ballooned out the E820 non RAM regions for us
+	 * and set them up properly in EPT.
+	 */
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return end - start;
+
 	return xen_do_chunk(start, end, true);
 }
 
@@ -222,7 +234,13 @@ static void __init xen_set_identity_and_release_chunk(
 	 * (except for the ISA region which must be 1:1 mapped) to
 	 * release the refcounts (in Xen) on the original frames.
 	 */
-	for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++) {
+
+	/*
+	 * PVH E820 matches the hypervisor's P2M which means we need to
+	 * account for the proper values of *release and *identity.
+	 */
+	for (pfn = start_pfn; !xen_feature(XENFEAT_auto_translated_physmap) &&
+	     pfn <= max_pfn_mapped && pfn < end_pfn; pfn++) {
 		pte_t pte = __pte_ma(0);
 
 		if (pfn < PFN_UP(ISA_END_ADDRESS))
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v13 13/19] xen/pvh: Piggyback on PVHVM for event channels (v2)
  2014-01-03 19:38 [PATCH v13] Linux Xen PVH support (v13) Konrad Rzeszutek Wilk
                   ` (11 preceding siblings ...)
  2014-01-03 19:38 ` [PATCH v13 12/19] xen/pvh: Update E820 to work with PVH (v2) Konrad Rzeszutek Wilk
@ 2014-01-03 19:38 ` Konrad Rzeszutek Wilk
  2014-01-05 18:15   ` Stefano Stabellini
  2014-01-03 19:38 ` [PATCH v13 14/19] xen/grants: Remove gnttab_max_grant_frames dependency on gnttab_init Konrad Rzeszutek Wilk
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 19:38 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel
  Cc: hpa, Mukesh Rathor, Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

PVH is a PV guest with a twist - certain things work in it
as in HVM and some as in PV. There is a similar mode -
PVHVM, where we run in HVM mode with PV code enabled -
and this patch builds on that.

The most notable PV interfaces are the XenBus and event channels.

We will piggyback on how the event channel mechanism is
used in PVHVM - that is, we want the normal native IRQ mechanism
and we will install a vector (the HVM callback) through which
we will invoke the event channel mechanism.

This means that from a pvops perspective we can use
native_irq_ops instead of the Xen PV-specific ones. In the
future we could also support pirq_eoi_map, but that is
a feature request that can be shared with PVHVM.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/enlighten.c |  5 +++--
 arch/x86/xen/irq.c       |  5 ++++-
 drivers/xen/events.c     | 14 +++++++++-----
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index fde62c4..628099a 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1144,8 +1144,9 @@ void xen_setup_vcpu_info_placement(void)
 		xen_vcpu_setup(cpu);
 
 	/* xen_vcpu_setup managed to place the vcpu_info within the
-	   percpu area for all cpus, so make use of it */
-	if (have_vcpu_info_placement) {
+	 * percpu area for all cpus, so make use of it. Note that for
+	 * PVH we want to use native IRQ mechanism. */
+	if (have_vcpu_info_placement && !xen_pvh_domain()) {
 		pv_irq_ops.save_fl = __PV_IS_CALLEE_SAVE(xen_save_fl_direct);
 		pv_irq_ops.restore_fl = __PV_IS_CALLEE_SAVE(xen_restore_fl_direct);
 		pv_irq_ops.irq_disable = __PV_IS_CALLEE_SAVE(xen_irq_disable_direct);
diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
index 0da7f86..76ca326 100644
--- a/arch/x86/xen/irq.c
+++ b/arch/x86/xen/irq.c
@@ -5,6 +5,7 @@
 #include <xen/interface/xen.h>
 #include <xen/interface/sched.h>
 #include <xen/interface/vcpu.h>
+#include <xen/features.h>
 #include <xen/events.h>
 
 #include <asm/xen/hypercall.h>
@@ -128,6 +129,8 @@ static const struct pv_irq_ops xen_irq_ops __initconst = {
 
 void __init xen_init_irq_ops(void)
 {
-	pv_irq_ops = xen_irq_ops;
+	/* For PVH we use default pv_irq_ops settings. */
+	if (!xen_feature(XENFEAT_hvm_callback_vector))
+		pv_irq_ops = xen_irq_ops;
 	x86_init.irqs.intr_init = xen_init_IRQ;
 }
diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 4035e83..783b972 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -1908,8 +1908,15 @@ void __init xen_init_IRQ(void)
 	pirq_needs_eoi = pirq_needs_eoi_flag;
 
 #ifdef CONFIG_X86
-	if (xen_hvm_domain()) {
+	if (xen_pv_domain()) {
+		irq_ctx_init(smp_processor_id());
+		if (xen_initial_domain())
+			pci_xen_initial_domain();
+	}
+	if (xen_feature(XENFEAT_hvm_callback_vector))
 		xen_callback_vector();
+
+	if (xen_hvm_domain()) {
 		native_init_IRQ();
 		/* pci_xen_hvm_init must be called after native_init_IRQ so that
 		 * __acpi_register_gsi can point at the right function */
@@ -1918,13 +1925,10 @@ void __init xen_init_IRQ(void)
 		int rc;
 		struct physdev_pirq_eoi_gmfn eoi_gmfn;
 
-		irq_ctx_init(smp_processor_id());
-		if (xen_initial_domain())
-			pci_xen_initial_domain();
-
 		pirq_eoi_map = (void *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
 		eoi_gmfn.gmfn = virt_to_mfn(pirq_eoi_map);
 		rc = HYPERVISOR_physdev_op(PHYSDEVOP_pirq_eoi_gmfn_v2, &eoi_gmfn);
+		/* TODO: No PVH support for PIRQ EOI */
 		if (rc != 0) {
 			free_page((unsigned long) pirq_eoi_map);
 			pirq_eoi_map = NULL;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v13 14/19] xen/grants: Remove gnttab_max_grant_frames dependency on gnttab_init.
  2014-01-03 19:38 [PATCH v13] Linux Xen PVH support (v13) Konrad Rzeszutek Wilk
                   ` (12 preceding siblings ...)
  2014-01-03 19:38 ` [PATCH v13 13/19] xen/pvh: Piggyback on PVHVM for event channels (v2) Konrad Rzeszutek Wilk
@ 2014-01-03 19:38 ` Konrad Rzeszutek Wilk
  2014-01-05 18:16   ` Stefano Stabellini
  2014-01-03 19:38 ` [PATCH v13 15/19] xen/grant-table: Refactor gnttab_init Konrad Rzeszutek Wilk
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 19:38 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel
  Cc: hpa, Konrad Rzeszutek Wilk

The function gnttab_max_grant_frames() returns the maximum number
of grant frames (pages) we can have. Unfortunately, it depended on
gnttab_init() having run beforehand to initialize the boot-time
maximum (boot_max_nr_grant_frames).

This meant that callers of gnttab_max_grant_frames that ran before
gnttab_init() - such as 'platform_pci_init' (drivers/xen/platform-pci.c) -
would always get a zero value.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/xen/grant-table.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index aa846a4..99399cb 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -62,7 +62,6 @@
 
 static grant_ref_t **gnttab_list;
 static unsigned int nr_grant_frames;
-static unsigned int boot_max_nr_grant_frames;
 static int gnttab_free_count;
 static grant_ref_t gnttab_free_head;
 static DEFINE_SPINLOCK(gnttab_list_lock);
@@ -827,6 +826,11 @@ static unsigned int __max_nr_grant_frames(void)
 unsigned int gnttab_max_grant_frames(void)
 {
 	unsigned int xen_max = __max_nr_grant_frames();
+	static unsigned int boot_max_nr_grant_frames;
+
+	/* First time, initialize it properly. */
+	if (!boot_max_nr_grant_frames)
+		boot_max_nr_grant_frames = __max_nr_grant_frames();
 
 	if (xen_max > boot_max_nr_grant_frames)
 		return boot_max_nr_grant_frames;
@@ -1227,13 +1231,12 @@ int gnttab_init(void)
 
 	gnttab_request_version();
 	nr_grant_frames = 1;
-	boot_max_nr_grant_frames = __max_nr_grant_frames();
 
 	/* Determine the maximum number of frames required for the
 	 * grant reference free list on the current hypervisor.
 	 */
 	BUG_ON(grefs_per_grant_frame == 0);
-	max_nr_glist_frames = (boot_max_nr_grant_frames *
+	max_nr_glist_frames = (gnttab_max_grant_frames() *
 			       grefs_per_grant_frame / RPP);
 
 	gnttab_list = kmalloc(max_nr_glist_frames * sizeof(grant_ref_t *),
-- 
1.8.3.1



* [PATCH v13 15/19] xen/grant-table: Refactor gnttab_init
  2014-01-03 19:38 [PATCH v13] Linux Xen PVH support (v13) Konrad Rzeszutek Wilk
                   ` (13 preceding siblings ...)
  2014-01-03 19:38 ` [PATCH v13 14/19] xen/grants: Remove gnttab_max_grant_frames dependency on gnttab_init Konrad Rzeszutek Wilk
@ 2014-01-03 19:38 ` Konrad Rzeszutek Wilk
  2014-01-05 18:18   ` Stefano Stabellini
  2014-01-03 19:38 ` [PATCH v13 16/19] xen/grant: Implement an grant frame array struct (v2) Konrad Rzeszutek Wilk
                   ` (4 subsequent siblings)
  19 siblings, 1 reply; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 19:38 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel
  Cc: hpa, Konrad Rzeszutek Wilk

We have an odd scenario where the PV path takes a shortcut but
the HVM path first ioremaps xen_hvm_resume_frames and then
assigns it to gnttab_shared.addr. This is needed because gnttab_map
uses gnttab_shared.addr.

Instead of having:
	if (pv)
		return gnttab_map
	if (hvm)
		...

	gnttab_map

let's move the HVM part before gnttab_map and remove the
first call to gnttab_map.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/xen/grant-table.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 99399cb..cc1b4fa 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -1173,22 +1173,17 @@ static int gnttab_setup(void)
 	if (max_nr_gframes < nr_grant_frames)
 		return -ENOSYS;
 
-	if (xen_pv_domain())
-		return gnttab_map(0, nr_grant_frames - 1);
-
-	if (gnttab_shared.addr == NULL) {
+	if (xen_feature(XENFEAT_auto_translated_physmap) && gnttab_shared.addr == NULL)
+	{
 		gnttab_shared.addr = xen_remap(xen_hvm_resume_frames,
-						PAGE_SIZE * max_nr_gframes);
+					       PAGE_SIZE * max_nr_gframes);
 		if (gnttab_shared.addr == NULL) {
 			pr_warn("Failed to ioremap gnttab share frames (addr=0x%08lx)!\n",
 					xen_hvm_resume_frames);
 			return -ENOMEM;
 		}
 	}
-
-	gnttab_map(0, nr_grant_frames - 1);
-
-	return 0;
+	return gnttab_map(0, nr_grant_frames - 1);
 }
 
 int gnttab_resume(void)
-- 
1.8.3.1



* [PATCH v13 16/19] xen/grant: Implement an grant frame array struct (v2).
  2014-01-03 19:38 [PATCH v13] Linux Xen PVH support (v13) Konrad Rzeszutek Wilk
                   ` (14 preceding siblings ...)
  2014-01-03 19:38 ` [PATCH v13 15/19] xen/grant-table: Refactor gnttab_init Konrad Rzeszutek Wilk
@ 2014-01-03 19:38 ` Konrad Rzeszutek Wilk
  2014-01-05 18:38   ` Stefano Stabellini
  2014-01-03 19:38 ` [PATCH v13 17/19] xen/pvh: Piggyback on PVHVM for grant driver (v4) Konrad Rzeszutek Wilk
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 19:38 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel
  Cc: hpa, Konrad Rzeszutek Wilk

The 'xen_hvm_resume_frames' variable used to be an 'unsigned long'
containing the virtual address of the grants. That was OK
for most architectures (PVHVM, ARM) where the grants are contiguous
in memory. That, however, is not the case for PVH, where
we have to do a lookup for each virtual address to find the PFN.

Instead of doing that, let's make it a structure which contains
the array of PFNs, the virtual address, and the count of said PFNs.

Also provide generic functions, gnttab_setup_auto_xlat_frames and
gnttab_free_auto_xlat_frames, to populate said structure with
appropriate values for PVHVM and ARM.

To round it off, change the name from 'xen_hvm_resume_frames' to
a more descriptive one - 'xen_auto_xlat_grant_frames'.

For PVH, in the patch "xen/pvh: Piggyback on PVHVM for grant driver"
we will populate 'xen_auto_xlat_grant_frames' ourselves.

v2 moves the xen_remap into gnttab_setup_auto_xlat_frames
and also introduces xen_unmap for gnttab_free_auto_xlat_frames.

Suggested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/arm/include/asm/xen/page.h |  1 +
 arch/arm/xen/enlighten.c        |  9 +++++--
 arch/x86/include/asm/xen/page.h |  1 +
 drivers/xen/grant-table.c       | 58 ++++++++++++++++++++++++++++++++++++-----
 drivers/xen/platform-pci.c      | 10 ++++---
 include/xen/grant_table.h       |  9 ++++++-
 6 files changed, 75 insertions(+), 13 deletions(-)

diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h
index 75579a9d..5af8fb3 100644
--- a/arch/arm/include/asm/xen/page.h
+++ b/arch/arm/include/asm/xen/page.h
@@ -118,5 +118,6 @@ static inline bool set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 }
 
 #define xen_remap(cookie, size) ioremap_cached((cookie), (size));
+#define xen_unmap(cookie) iounmap((cookie))
 
 #endif /* _ASM_ARM_XEN_PAGE_H */
diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 8550123..2162172 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -208,6 +208,7 @@ static int __init xen_guest_init(void)
 	const char *version = NULL;
 	const char *xen_prefix = "xen,xen-";
 	struct resource res;
+	unsigned long grant_frames;
 
 	node = of_find_compatible_node(NULL, NULL, "xen,xen");
 	if (!node) {
@@ -224,10 +225,10 @@ static int __init xen_guest_init(void)
 	}
 	if (of_address_to_resource(node, GRANT_TABLE_PHYSADDR, &res))
 		return 0;
-	xen_hvm_resume_frames = res.start;
+	grant_frames = res.start;
 	xen_events_irq = irq_of_parse_and_map(node, 0);
 	pr_info("Xen %s support found, events_irq=%d gnttab_frame_pfn=%lx\n",
-			version, xen_events_irq, (xen_hvm_resume_frames >> PAGE_SHIFT));
+			version, xen_events_irq, (grant_frames >> PAGE_SHIFT));
 	xen_domain_type = XEN_HVM_DOMAIN;
 
 	xen_setup_features();
@@ -265,6 +266,10 @@ static int __init xen_guest_init(void)
 	if (xen_vcpu_info == NULL)
 		return -ENOMEM;
 
+	if (gnttab_setup_auto_xlat_frames(grant_frames)) {
+		free_percpu(xen_vcpu_info);
+		return -ENOMEM;
+	}
 	gnttab_init();
 	if (!xen_initial_domain())
 		xenbus_probe(NULL);
diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 4a092cc..3e276eb 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -227,5 +227,6 @@ void make_lowmem_page_readonly(void *vaddr);
 void make_lowmem_page_readwrite(void *vaddr);
 
 #define xen_remap(cookie, size) ioremap((cookie), (size));
+#define xen_unmap(cookie) iounmap((cookie))
 
 #endif /* _ASM_X86_XEN_PAGE_H */
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index cc1b4fa..6c78fd21 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -65,8 +65,7 @@ static unsigned int nr_grant_frames;
 static int gnttab_free_count;
 static grant_ref_t gnttab_free_head;
 static DEFINE_SPINLOCK(gnttab_list_lock);
-unsigned long xen_hvm_resume_frames;
-EXPORT_SYMBOL_GPL(xen_hvm_resume_frames);
+struct grant_frames xen_auto_xlat_grant_frames;
 
 static union {
 	struct grant_entry_v1 *v1;
@@ -838,6 +837,51 @@ unsigned int gnttab_max_grant_frames(void)
 }
 EXPORT_SYMBOL_GPL(gnttab_max_grant_frames);
 
+int gnttab_setup_auto_xlat_frames(unsigned long addr)
+{
+	xen_pfn_t *pfn;
+	unsigned int max_nr_gframes = __max_nr_grant_frames();
+	unsigned int i;
+	void *vaddr;
+
+	if (xen_auto_xlat_grant_frames.count)
+		return -EINVAL;
+
+	vaddr = xen_remap(addr, PAGE_SIZE * max_nr_gframes);
+	if (vaddr == NULL) {
+		pr_warn("Failed to ioremap gnttab share frames (addr=0x%08lx)!\n",
+			addr);
+		return -ENOMEM;
+	}
+	pfn = kcalloc(max_nr_gframes, sizeof(pfn[0]), GFP_KERNEL);
+	if (!pfn) {
+		xen_unmap(vaddr);
+		return -ENOMEM;
+	}
+	for (i = 0; i < max_nr_gframes; i++)
+		pfn[i] = PFN_DOWN(addr) + i;
+
+	xen_auto_xlat_grant_frames.vaddr = vaddr;
+	xen_auto_xlat_grant_frames.pfn = pfn;
+	xen_auto_xlat_grant_frames.count = max_nr_gframes;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(gnttab_setup_auto_xlat_frames);
+
+void gnttab_free_auto_xlat_frames(void)
+{
+	if (!xen_auto_xlat_grant_frames.count)
+		return;
+	kfree(xen_auto_xlat_grant_frames.pfn);
+	xen_unmap(xen_auto_xlat_grant_frames.vaddr);
+
+	xen_auto_xlat_grant_frames.pfn = NULL;
+	xen_auto_xlat_grant_frames.count = 0;
+	xen_auto_xlat_grant_frames.vaddr = NULL;
+}
+EXPORT_SYMBOL_GPL(gnttab_free_auto_xlat_frames);
+
 /* Handling of paged out grant targets (GNTST_eagain) */
 #define MAX_DELAY 256
 static inline void
@@ -1068,6 +1112,7 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
 		struct xen_add_to_physmap xatp;
 		unsigned int i = end_idx;
 		rc = 0;
+		BUG_ON(xen_auto_xlat_grant_frames.count < nr_gframes);
 		/*
 		 * Loop backwards, so that the first hypercall has the largest
 		 * index, ensuring that the table will grow only once.
@@ -1076,7 +1121,7 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
 			xatp.domid = DOMID_SELF;
 			xatp.idx = i;
 			xatp.space = XENMAPSPACE_grant_table;
-			xatp.gpfn = (xen_hvm_resume_frames >> PAGE_SHIFT) + i;
+			xatp.gpfn = xen_auto_xlat_grant_frames.pfn[i];
 			rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
 			if (rc != 0) {
 				pr_warn("grant table add_to_physmap failed, err=%d\n",
@@ -1175,11 +1220,10 @@ static int gnttab_setup(void)
 
 	if (xen_feature(XENFEAT_auto_translated_physmap) && gnttab_shared.addr == NULL)
 	{
-		gnttab_shared.addr = xen_remap(xen_hvm_resume_frames,
-					       PAGE_SIZE * max_nr_gframes);
+		gnttab_shared.addr = xen_auto_xlat_grant_frames.vaddr;
 		if (gnttab_shared.addr == NULL) {
-			pr_warn("Failed to ioremap gnttab share frames (addr=0x%08lx)!\n",
-					xen_hvm_resume_frames);
+			pr_warn("gnttab share frames (addr=0x%08lx) is not mapped!\n",
+				(unsigned long)xen_auto_xlat_grant_frames.vaddr);
 			return -ENOMEM;
 		}
 	}
diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
index 2f3528e..f1947ac 100644
--- a/drivers/xen/platform-pci.c
+++ b/drivers/xen/platform-pci.c
@@ -108,6 +108,7 @@ static int platform_pci_init(struct pci_dev *pdev,
 	long ioaddr;
 	long mmio_addr, mmio_len;
 	unsigned int max_nr_gframes;
+	unsigned long grant_frames;
 
 	if (!xen_domain())
 		return -ENODEV;
@@ -154,13 +155,16 @@ static int platform_pci_init(struct pci_dev *pdev,
 	}
 
 	max_nr_gframes = gnttab_max_grant_frames();
-	xen_hvm_resume_frames = alloc_xen_mmio(PAGE_SIZE * max_nr_gframes);
+	grant_frames = alloc_xen_mmio(PAGE_SIZE * max_nr_gframes);
+	if (gnttab_setup_auto_xlat_frames(grant_frames))
+		goto out;
 	ret = gnttab_init();
 	if (ret)
-		goto out;
+		goto grant_out;
 	xenbus_probe(NULL);
 	return 0;
-
+grant_out:
+	gnttab_free_auto_xlat_frames();
 out:
 	pci_release_region(pdev, 0);
 mem_out:
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 694dcaf..5acb1e4 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -178,8 +178,15 @@ int arch_gnttab_map_status(uint64_t *frames, unsigned long nr_gframes,
 			   grant_status_t **__shared);
 void arch_gnttab_unmap(void *shared, unsigned long nr_gframes);
 
-extern unsigned long xen_hvm_resume_frames;
+struct grant_frames {
+	xen_pfn_t *pfn;
+	unsigned int count;
+	void *vaddr;
+};
+extern struct grant_frames xen_auto_xlat_grant_frames;
 unsigned int gnttab_max_grant_frames(void);
+int gnttab_setup_auto_xlat_frames(unsigned long addr);
+void gnttab_free_auto_xlat_frames(void);
 
 #define gnttab_map_vaddr(map) ((void *)(map.host_virt_addr))
 
-- 
1.8.3.1



* [PATCH v13 17/19] xen/pvh: Piggyback on PVHVM for grant driver (v4)
  2014-01-03 19:38 [PATCH v13] Linux Xen PVH support (v13) Konrad Rzeszutek Wilk
                   ` (15 preceding siblings ...)
  2014-01-03 19:38 ` [PATCH v13 16/19] xen/grant: Implement an grant frame array struct (v2) Konrad Rzeszutek Wilk
@ 2014-01-03 19:38 ` Konrad Rzeszutek Wilk
  2014-01-05 18:20   ` Stefano Stabellini
  2014-01-03 19:38 ` [PATCH v13 18/19] xen/pvh: Piggyback on PVHVM XenBus Konrad Rzeszutek Wilk
                   ` (2 subsequent siblings)
  19 siblings, 1 reply; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 19:38 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel
  Cc: hpa, Konrad Rzeszutek Wilk

In PVH the shared grant frame is the PFN and not the MFN,
hence it is mapped via the same code path as HVM.

The allocation of the grant frame is done differently - we
do not use the early platform-pci driver and an ioremap
area - instead we use balloon memory and stitch all of
the non-contiguous pages into a virtually contiguous area.

That means when we call the hypervisor to replace the GMFN
with a XENMAPSPACE_grant_table type, we need to look up the
old PFN on every iteration instead of assuming a flat
contiguous PFN allocation.

Lastly, we only use v1 grants. This is because PVHVM
cannot use v2, as there are no XENMEM_add_to_physmap
calls for the error status page (see commit
69e8f430e243d657c2053f097efebc2e2cd559f0
 "xen/granttable: Disable grant v2 for HVM domains.")

Until that is implemented, this workaround has to
be in place.

Also, per suggestions by Stefano, utilize the PVHVM paths
as they share common functionality.

v2 of this patch moves most of the PVH code out into the
arch/x86/xen/grant-table driver and touches the generic
driver only minimally.

v3, v4: fix up some of the code due to earlier patches.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/grant-table.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/xen/gntdev.c       |  2 +-
 drivers/xen/grant-table.c  |  9 ++++---
 3 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
index 3a5f55d..2d71979 100644
--- a/arch/x86/xen/grant-table.c
+++ b/arch/x86/xen/grant-table.c
@@ -125,3 +125,65 @@ void arch_gnttab_unmap(void *shared, unsigned long nr_gframes)
 	apply_to_page_range(&init_mm, (unsigned long)shared,
 			    PAGE_SIZE * nr_gframes, unmap_pte_fn, NULL);
 }
+#ifdef CONFIG_XEN_PVH
+#include <xen/balloon.h>
+#include <xen/events.h>
+#include <linux/slab.h>
+static int __init xlated_setup_gnttab_pages(void)
+{
+	struct page **pages;
+	xen_pfn_t *pfns;
+	int rc;
+	unsigned int i;
+	unsigned long nr_grant_frames = gnttab_max_grant_frames();
+
+	BUG_ON(nr_grant_frames == 0);
+	pages = kcalloc(nr_grant_frames, sizeof(pages[0]), GFP_KERNEL);
+	if (!pages)
+		return -ENOMEM;
+
+	pfns = kcalloc(nr_grant_frames, sizeof(pfns[0]), GFP_KERNEL);
+	if (!pfns) {
+		kfree(pages);
+		return -ENOMEM;
+	}
+	rc = alloc_xenballooned_pages(nr_grant_frames, pages, 0 /* lowmem */);
+	if (rc) {
+		pr_warn("%s Couldn't balloon alloc %ld pfns rc:%d\n", __func__,
+			nr_grant_frames, rc);
+		kfree(pages);
+		kfree(pfns);
+		return rc;
+	}
+	for (i = 0; i < nr_grant_frames; i++)
+		pfns[i] = page_to_pfn(pages[i]);
+
+	rc = arch_gnttab_map_shared(pfns, nr_grant_frames, nr_grant_frames,
+				    &xen_auto_xlat_grant_frames.vaddr);
+
+	kfree(pages);
+	if (rc) {
+		pr_warn("%s Couldn't map %ld pfns rc:%d\n", __func__,
+			nr_grant_frames, rc);
+		free_xenballooned_pages(nr_grant_frames, pages);
+		kfree(pfns);
+		return rc;
+	}
+
+	xen_auto_xlat_grant_frames.pfn = pfns;
+	xen_auto_xlat_grant_frames.count = nr_grant_frames;
+
+	return 0;
+}
+
+static int __init xen_pvh_gnttab_setup(void)
+{
+	if (!xen_pvh_domain())
+		return -ENODEV;
+
+	return xlated_setup_gnttab_pages();
+}
+/* Call it _before_ __gnttab_init as we need to initialize the
+ * xen_auto_xlat_grant_frames first. */
+core_initcall(xen_pvh_gnttab_setup);
+#endif
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index e41c79c..073b4a1 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -846,7 +846,7 @@ static int __init gntdev_init(void)
 	if (!xen_domain())
 		return -ENODEV;
 
-	use_ptemod = xen_pv_domain();
+	use_ptemod = !xen_feature(XENFEAT_auto_translated_physmap);
 
 	err = misc_register(&gntdev_miscdev);
 	if (err != 0) {
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 6c78fd21..3d04c1c 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -1108,7 +1108,7 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
 	unsigned int nr_gframes = end_idx + 1;
 	int rc;
 
-	if (xen_hvm_domain()) {
+	if (xen_feature(XENFEAT_auto_translated_physmap)) {
 		struct xen_add_to_physmap xatp;
 		unsigned int i = end_idx;
 		rc = 0;
@@ -1184,7 +1184,7 @@ static void gnttab_request_version(void)
 	int rc;
 	struct gnttab_set_version gsv;
 
-	if (xen_hvm_domain())
+	if (xen_feature(XENFEAT_auto_translated_physmap))
 		gsv.version = 1;
 	else
 		gsv.version = 2;
@@ -1328,5 +1328,6 @@ static int __gnttab_init(void)
 
 	return gnttab_init();
 }
-
-core_initcall(__gnttab_init);
+/* Starts after core_initcall so that xen_pvh_gnttab_setup can be called
+ * beforehand to initialize xen_auto_xlat_grant_frames. */
+core_initcall_sync(__gnttab_init);
-- 
1.8.3.1



* [PATCH v13 18/19] xen/pvh: Piggyback on PVHVM XenBus.
  2014-01-03 19:38 [PATCH v13] Linux Xen PVH support (v13) Konrad Rzeszutek Wilk
                   ` (16 preceding siblings ...)
  2014-01-03 19:38 ` [PATCH v13 17/19] xen/pvh: Piggyback on PVHVM for grant driver (v4) Konrad Rzeszutek Wilk
@ 2014-01-03 19:38 ` Konrad Rzeszutek Wilk
  2014-01-05 17:54   ` Stefano Stabellini
  2014-01-03 19:38 ` [PATCH v13 19/19] xen/pvh: Support ParaVirtualized Hardware extensions (v3) Konrad Rzeszutek Wilk
  2014-01-06 10:55 ` [PATCH v13] Linux Xen PVH support (v13) David Vrabel
  19 siblings, 1 reply; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 19:38 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel
  Cc: hpa, Mukesh Rathor, Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

PVH is a PV guest with a twist - certain things work in it
like HVM and others like PV. For the XenBus mechanism we
want to use the PVHVM mechanism.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/xen/xenbus/xenbus_client.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index ec097d6..01d59e6 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -45,6 +45,7 @@
 #include <xen/grant_table.h>
 #include <xen/xenbus.h>
 #include <xen/xen.h>
+#include <xen/features.h>
 
 #include "xenbus_probe.h"
 
@@ -743,7 +744,7 @@ static const struct xenbus_ring_ops ring_ops_hvm = {
 
 void __init xenbus_ring_ops_init(void)
 {
-	if (xen_pv_domain())
+	if (!xen_feature(XENFEAT_auto_translated_physmap))
 		ring_ops = &ring_ops_pv;
 	else
 		ring_ops = &ring_ops_hvm;
-- 
1.8.3.1



* [PATCH v13 19/19] xen/pvh: Support ParaVirtualized Hardware extensions (v3).
  2014-01-03 19:38 [PATCH v13] Linux Xen PVH support (v13) Konrad Rzeszutek Wilk
                   ` (17 preceding siblings ...)
  2014-01-03 19:38 ` [PATCH v13 18/19] xen/pvh: Piggyback on PVHVM XenBus Konrad Rzeszutek Wilk
@ 2014-01-03 19:38 ` Konrad Rzeszutek Wilk
  2014-01-06 10:55 ` [PATCH v13] Linux Xen PVH support (v13) David Vrabel
  19 siblings, 0 replies; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-03 19:38 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel
  Cc: hpa, Mukesh Rathor, Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

PVH allows a PV Linux guest to utilize hardware extended capabilities,
such as running MMU updates in an HVM container.

The Xen side defines PVH as (from docs/misc/pvh-readme.txt,
with modifications):

"* the guest uses auto translate:
 - p2m is managed by Xen
 - pagetables are owned by the guest
 - mmu_update hypercall not available
* it uses event callback and not vlapic emulation,
* IDT is native, so set_trap_table hcall is also N/A for a PVH guest.

For a full list of hcalls supported for PVH, see pvh_hypercall64_table
in arch/x86/hvm/hvm.c in xen.  From the ABI perspective, it's mostly a
PV guest with auto translate, although it does use hvm_op for setting
callback vector."

Use .ascii and .asciz to define the Xen feature string. Note, the PVH
string must be on a single line (not multiple lines with \) to keep the
assembler from putting a null char after each string before the \.
This patch allows PVH to be configured and enabled.

We also introduce the 'XEN_ELFNOTE_SUPPORTED_FEATURES' ELF note to
tell the hypervisor that 'hvm_callback_vector' is what the kernel
needs. We cannot put it in 'XEN_ELFNOTE_FEATURES' as older hypervisors
parse fields they don't understand as errors and refuse to load
the kernel. This work-around fixes the problem.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 arch/x86/xen/Kconfig            |  2 +-
 arch/x86/xen/xen-head.S         | 25 ++++++++++++++++++++++++-
 include/xen/interface/elfnote.h | 13 +++++++++++++
 3 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index e7d0590..d88bfd6 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -53,6 +53,6 @@ config XEN_DEBUG_FS
 
 config XEN_PVH
 	bool "Support for running as a PVH guest"
-	depends on X86_64 && XEN && BROKEN
+	depends on X86_64 && XEN
 	select XEN_PVHVM
 	def_bool n
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 7faed58..485b695 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -11,8 +11,28 @@
 #include <asm/page_types.h>
 
 #include <xen/interface/elfnote.h>
+#include <xen/interface/features.h>
 #include <asm/xen/interface.h>
 
+#ifdef CONFIG_XEN_PVH
+#define PVH_FEATURES_STR  "|writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel"
+/* Note the lack of 'hvm_callback_vector'. Older hypervisor will
+ * balk at this being part of XEN_ELFNOTE_FEATURES, so we put it in
+ * XEN_ELFNOTE_SUPPORTED_FEATURES which older hypervisors will ignore.
+ */
+#define PVH_FEATURES ((1 << XENFEAT_writable_page_tables) | \
+		      (1 << XENFEAT_auto_translated_physmap) | \
+		      (1 << XENFEAT_supervisor_mode_kernel) | \
+		      (1 << XENFEAT_hvm_callback_vector))
+/* The XENFEAT_writable_page_tables is not strictly necessary as we set that
+ * up regardless of whether this CONFIG option is enabled or not, but it
+ * clarifies what the right flags need to be.
+ */
+#else
+#define PVH_FEATURES_STR  ""
+#define PVH_FEATURES (0)
+#endif
+
 	__INIT
 ENTRY(startup_xen)
 	cld
@@ -95,7 +115,10 @@ NEXT_HYPERCALL(arch_6)
 #endif
 	ELFNOTE(Xen, XEN_ELFNOTE_ENTRY,          _ASM_PTR startup_xen)
 	ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, _ASM_PTR hypercall_page)
-	ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,       .asciz "!writable_page_tables|pae_pgdir_above_4gb")
+	ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,       .ascii "!writable_page_tables|pae_pgdir_above_4gb"; .asciz PVH_FEATURES_STR)
+	ELFNOTE(Xen, XEN_ELFNOTE_SUPPORTED_FEATURES, .long (PVH_FEATURES) |
+						(1 << XENFEAT_writable_page_tables) |
+						(1 << XENFEAT_dom0))
 	ELFNOTE(Xen, XEN_ELFNOTE_PAE_MODE,       .asciz "yes")
 	ELFNOTE(Xen, XEN_ELFNOTE_LOADER,         .asciz "generic")
 	ELFNOTE(Xen, XEN_ELFNOTE_L1_MFN_VALID,
diff --git a/include/xen/interface/elfnote.h b/include/xen/interface/elfnote.h
index 0360b15..6f4eae3 100644
--- a/include/xen/interface/elfnote.h
+++ b/include/xen/interface/elfnote.h
@@ -140,6 +140,19 @@
  */
 #define XEN_ELFNOTE_SUSPEND_CANCEL 14
 
+/*
+ * The features supported by this kernel (numeric).
+ *
+ * Other than XEN_ELFNOTE_FEATURES on pre-4.2 Xen, this note allows a
+ * kernel to specify support for features that older hypervisors don't
+ * know about. The set of features 4.2 and newer hypervisors will
+ * consider supported by the kernel is the combination of the sets
+ * specified through this and the string note.
+ *
+ * LEGACY: FEATURES
+ */
+#define XEN_ELFNOTE_SUPPORTED_FEATURES 17
+
 #endif /* __XEN_PUBLIC_ELFNOTE_H__ */
 
 /*
-- 
1.8.3.1



* Re: [PATCH v13 03/19] xen/pvh: Early bootup changes in PV code (v4).
  2014-01-03 19:38 ` [PATCH v13 03/19] xen/pvh: Early bootup changes in PV code (v4) Konrad Rzeszutek Wilk
@ 2014-01-05 17:49   ` Stefano Stabellini
  0 siblings, 0 replies; 38+ messages in thread
From: Stefano Stabellini @ 2014-01-05 17:49 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel, hpa, Mukesh Rathor

On Fri, 3 Jan 2014, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> We don't use the filtering that 'xen_cpuid' is doing
> because the hypervisor treats 'XEN_EMULATE_PREFIX' as
> an invalid instruction. This means that all of the filtering
> will have to be done in the hypervisor/toolstack.
> 
> Without the filtering we expose to the guest the:
> 
>  - cpu topology (sockets, cores, etc);
>  - the APERF (which the generic scheduler likes to
>     use), see  5e626254206a709c6e937f3dda69bf26c7344f6f
>     "xen/setup: filter APERFMPERF cpuid feature out"
>  - and the inability to figure out whether MWAIT_LEAF
>    should be exposed or not. See
>    df88b2d96e36d9a9e325bfcd12eb45671cbbc937
>    "xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won't be loaded."
>  - x2apic, see  4ea9b9aca90cfc71e6872ed3522356755162932c
>    "xen: mask x2APIC feature in PV"
> 
> We also check for vector callback early on, as it is a required
> feature. PVH also runs at default kernel IOPL.
> 
> Finally, pure PV settings are moved to a separate function that are
> only called for pure PV, ie, pv with pvmmu. They are also #ifdef
> with CONFIG_XEN_PVMMU.
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>


>  arch/x86/xen/enlighten.c | 48 ++++++++++++++++++++++++++++++++++--------------
>  arch/x86/xen/setup.c     | 18 ++++++++++++------
>  2 files changed, 46 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index fa6ade7..eb0efc2 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -46,6 +46,7 @@
>  #include <xen/hvm.h>
>  #include <xen/hvc-console.h>
>  #include <xen/acpi.h>
> +#include <xen/features.h>
>  
>  #include <asm/paravirt.h>
>  #include <asm/apic.h>
> @@ -262,8 +263,9 @@ static void __init xen_banner(void)
>  	struct xen_extraversion extra;
>  	HYPERVISOR_xen_version(XENVER_extraversion, &extra);
>  
> -	printk(KERN_INFO "Booting paravirtualized kernel on %s\n",
> -	       pv_info.name);
> +	pr_info("Booting paravirtualized kernel %son %s\n",
> +		xen_feature(XENFEAT_auto_translated_physmap) ?
> +			"with PVH extensions " : "", pv_info.name);
>  	printk(KERN_INFO "Xen version: %d.%d%s%s\n",
>  	       version >> 16, version & 0xffff, extra.extraversion,
>  	       xen_feature(XENFEAT_mmu_pt_update_preserve_ad) ? " (preserve-AD)" : "");
> @@ -433,7 +435,7 @@ static void __init xen_init_cpuid_mask(void)
>  
>  	ax = 1;
>  	cx = 0;
> -	xen_cpuid(&ax, &bx, &cx, &dx);
> +	cpuid(1, &ax, &bx, &cx, &dx);
>  
>  	xsave_mask =
>  		(1 << (X86_FEATURE_XSAVE % 32)) |
> @@ -1420,6 +1422,19 @@ static void __init xen_setup_stackprotector(void)
>  	pv_cpu_ops.load_gdt = xen_load_gdt;
>  }
>  
> +static void __init xen_pvh_early_guest_init(void)
> +{
> +	if (!xen_feature(XENFEAT_auto_translated_physmap))
> +		return;
> +
> +	if (xen_feature(XENFEAT_hvm_callback_vector))
> +		xen_have_vector_callback = 1;
> +
> +#ifdef CONFIG_X86_32
> +	BUG(); /* PVH: Implement proper support. */
> +#endif
> +}
> +
>  /* First C function to be called on Xen boot */
>  asmlinkage void __init xen_start_kernel(void)
>  {
> @@ -1431,13 +1446,16 @@ asmlinkage void __init xen_start_kernel(void)
>  
>  	xen_domain_type = XEN_PV_DOMAIN;
>  
> +	xen_setup_features();
> +	xen_pvh_early_guest_init();
>  	xen_setup_machphys_mapping();
>  
>  	/* Install Xen paravirt ops */
>  	pv_info = xen_info;
>  	pv_init_ops = xen_init_ops;
> -	pv_cpu_ops = xen_cpu_ops;
>  	pv_apic_ops = xen_apic_ops;
> +	if (!xen_pvh_domain())
> +		pv_cpu_ops = xen_cpu_ops;
>  
>  	x86_init.resources.memory_setup = xen_memory_setup;
>  	x86_init.oem.arch_setup = xen_arch_setup;
> @@ -1469,8 +1487,6 @@ asmlinkage void __init xen_start_kernel(void)
>  	/* Work out if we support NX */
>  	x86_configure_nx();
>  
> -	xen_setup_features();
> -
>  	/* Get mfn list */
>  	if (!xen_feature(XENFEAT_auto_translated_physmap))
>  		xen_build_dynamic_phys_to_machine();
> @@ -1548,14 +1564,18 @@ asmlinkage void __init xen_start_kernel(void)
>  	/* set the limit of our address space */
>  	xen_reserve_top();
>  
> -	/* We used to do this in xen_arch_setup, but that is too late on AMD
> -	 * were early_cpu_init (run before ->arch_setup()) calls early_amd_init
> -	 * which pokes 0xcf8 port.
> -	 */
> -	set_iopl.iopl = 1;
> -	rc = HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl);
> -	if (rc != 0)
> -		xen_raw_printk("physdev_op failed %d\n", rc);
> +	/* PVH: runs at default kernel iopl of 0 */
> +	if (!xen_pvh_domain()) {
> +		/*
> +		 * We used to do this in xen_arch_setup, but that is too late
> +		 * on AMD were early_cpu_init (run before ->arch_setup()) calls
> +		 * early_amd_init which pokes 0xcf8 port.
> +		 */
> +		set_iopl.iopl = 1;
> +		rc = HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl);
> +		if (rc != 0)
> +			xen_raw_printk("physdev_op failed %d\n", rc);
> +	}
>  
>  #ifdef CONFIG_X86_32
>  	/* set up basic CPUID stuff */
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index 68c054f..2137c51 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -563,16 +563,13 @@ void xen_enable_nmi(void)
>  		BUG();
>  #endif
>  }
> -void __init xen_arch_setup(void)
> +void __init xen_pvmmu_arch_setup(void)
>  {
> -	xen_panic_handler_init();
> -
>  	HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_4gb_segments);
>  	HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_writable_pagetables);
>  
> -	if (!xen_feature(XENFEAT_auto_translated_physmap))
> -		HYPERVISOR_vm_assist(VMASST_CMD_enable,
> -				     VMASST_TYPE_pae_extended_cr3);
> +	HYPERVISOR_vm_assist(VMASST_CMD_enable,
> +			     VMASST_TYPE_pae_extended_cr3);
>  
>  	if (register_callback(CALLBACKTYPE_event, xen_hypervisor_callback) ||
>  	    register_callback(CALLBACKTYPE_failsafe, xen_failsafe_callback))
> @@ -581,6 +578,15 @@ void __init xen_arch_setup(void)
>  	xen_enable_sysenter();
>  	xen_enable_syscall();
>  	xen_enable_nmi();
> +}
> +
> +/* This function is not called for HVM domains */
> +void __init xen_arch_setup(void)
> +{
> +	xen_panic_handler_init();
> +	if (!xen_feature(XENFEAT_auto_translated_physmap))
> +		xen_pvmmu_arch_setup();
> +
>  #ifdef CONFIG_ACPI
>  	if (!(xen_start_info->flags & SIF_INITDOMAIN)) {
>  		printk(KERN_INFO "ACPI in unprivileged domain disabled\n");
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 05/19] xen/mmu/p2m: Refactor the xen_pagetable_init code (v2).
  2014-01-03 19:38 ` [PATCH v13 05/19] xen/mmu/p2m: Refactor the xen_pagetable_init code (v2) Konrad Rzeszutek Wilk
@ 2014-01-05 17:51   ` Stefano Stabellini
  0 siblings, 0 replies; 38+ messages in thread
From: Stefano Stabellini @ 2014-01-05 17:51 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel, hpa

On Fri, 3 Jan 2014, Konrad Rzeszutek Wilk wrote:
> The revectoring and copying of the P2M only happen when
> !auto-xlat and on 64-bit builds. This is not obvious from
> the code, so let's have separate 32 and 64-bit functions.
> 
> We also invert the check for auto-xlat to make the code
> flow simpler.
> 
> Suggested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>


>  arch/x86/xen/mmu.c | 70 +++++++++++++++++++++++++++++-------------------------
>  1 file changed, 37 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index ce563be..c140eff 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1198,44 +1198,40 @@ static void __init xen_cleanhighmap(unsigned long vaddr,
>  	 * instead of somewhere later and be confusing. */
>  	xen_mc_flush();
>  }
> -#endif
> -static void __init xen_pagetable_init(void)
> +static void __init xen_pagetable_p2m_copy(void)
>  {
> -#ifdef CONFIG_X86_64
>  	unsigned long size;
>  	unsigned long addr;
> -#endif
> -	paging_init();
> -	xen_setup_shared_info();
> -#ifdef CONFIG_X86_64
> -	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> -		unsigned long new_mfn_list;
> +	unsigned long new_mfn_list;
> +
> +	if (xen_feature(XENFEAT_auto_translated_physmap))
> +		return;
> +
> +	size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
> +
> +	/* On 32-bit, we get zero so this never gets executed. */
> +	new_mfn_list = xen_revector_p2m_tree();
> +	if (new_mfn_list && new_mfn_list != xen_start_info->mfn_list) {
> +		/* using __ka address and sticking INVALID_P2M_ENTRY! */
> +		memset((void *)xen_start_info->mfn_list, 0xff, size);
> +
> +		/* We should be in __ka space. */
> +		BUG_ON(xen_start_info->mfn_list < __START_KERNEL_map);
> +		addr = xen_start_info->mfn_list;
> +		/* We roundup to the PMD, which means that if anybody at this stage is
> +		 * using the __ka address of xen_start_info or xen_start_info->shared_info
> +		 * they are in going to crash. Fortunatly we have already revectored
> +		 * in xen_setup_kernel_pagetable and in xen_setup_shared_info. */
> +		size = roundup(size, PMD_SIZE);
> +		xen_cleanhighmap(addr, addr + size);
>  
>  		size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
> +		memblock_free(__pa(xen_start_info->mfn_list), size);
> +		/* And revector! Bye bye old array */
> +		xen_start_info->mfn_list = new_mfn_list;
> +	} else
> +		return;
>  
> -		/* On 32-bit, we get zero so this never gets executed. */
> -		new_mfn_list = xen_revector_p2m_tree();
> -		if (new_mfn_list && new_mfn_list != xen_start_info->mfn_list) {
> -			/* using __ka address and sticking INVALID_P2M_ENTRY! */
> -			memset((void *)xen_start_info->mfn_list, 0xff, size);
> -
> -			/* We should be in __ka space. */
> -			BUG_ON(xen_start_info->mfn_list < __START_KERNEL_map);
> -			addr = xen_start_info->mfn_list;
> -			/* We roundup to the PMD, which means that if anybody at this stage is
> -			 * using the __ka address of xen_start_info or xen_start_info->shared_info
> -			 * they are in going to crash. Fortunatly we have already revectored
> -			 * in xen_setup_kernel_pagetable and in xen_setup_shared_info. */
> -			size = roundup(size, PMD_SIZE);
> -			xen_cleanhighmap(addr, addr + size);
> -
> -			size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
> -			memblock_free(__pa(xen_start_info->mfn_list), size);
> -			/* And revector! Bye bye old array */
> -			xen_start_info->mfn_list = new_mfn_list;
> -		} else
> -			goto skip;
> -	}
>  	/* At this stage, cleanup_highmap has already cleaned __ka space
>  	 * from _brk_limit way up to the max_pfn_mapped (which is the end of
>  	 * the ramdisk). We continue on, erasing PMD entries that point to page
> @@ -1255,7 +1251,15 @@ static void __init xen_pagetable_init(void)
>  	 * anything at this stage. */
>  	xen_cleanhighmap(MODULES_VADDR, roundup(MODULES_VADDR, PUD_SIZE) - 1);
>  #endif
> -skip:
> +}
> +#endif
> +
> +static void __init xen_pagetable_init(void)
> +{
> +	paging_init();
> +	xen_setup_shared_info();
> +#ifdef CONFIG_X86_64
> +	xen_pagetable_p2m_copy();
>  #endif
>  	xen_post_allocator_init();
>  }
> -- 
> 1.8.3.1
> 


* Re: [PATCH v13 18/19] xen/pvh: Piggyback on PVHVM XenBus.
  2014-01-03 19:38 ` [PATCH v13 18/19] xen/pvh: Piggyback on PVHVM XenBus Konrad Rzeszutek Wilk
@ 2014-01-05 17:54   ` Stefano Stabellini
  0 siblings, 0 replies; 38+ messages in thread
From: Stefano Stabellini @ 2014-01-05 17:54 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel, hpa, Mukesh Rathor

On Fri, 3 Jan 2014, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> PVH is a PV guest with a twist - certain things in it work
> like HVM and some work like PV. For the XenBus
> mechanism we want to use the PVHVM mechanism.
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Reviewed-by: David Vrabel <david.vrabel@citrix.com>

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

>  drivers/xen/xenbus/xenbus_client.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
> index ec097d6..01d59e6 100644
> --- a/drivers/xen/xenbus/xenbus_client.c
> +++ b/drivers/xen/xenbus/xenbus_client.c
> @@ -45,6 +45,7 @@
>  #include <xen/grant_table.h>
>  #include <xen/xenbus.h>
>  #include <xen/xen.h>
> +#include <xen/features.h>
>  
>  #include "xenbus_probe.h"
>  
> @@ -743,7 +744,7 @@ static const struct xenbus_ring_ops ring_ops_hvm = {
>  
>  void __init xenbus_ring_ops_init(void)
>  {
> -	if (xen_pv_domain())
> +	if (!xen_feature(XENFEAT_auto_translated_physmap))
>  		ring_ops = &ring_ops_pv;
>  	else
>  		ring_ops = &ring_ops_hvm;
> -- 
> 1.8.3.1
> 


* Re: [PATCH v13 06/19] xen/mmu: Cleanup xen_pagetable_p2m_copy a bit.
  2014-01-03 19:38 ` [PATCH v13 06/19] xen/mmu: Cleanup xen_pagetable_p2m_copy a bit Konrad Rzeszutek Wilk
@ 2014-01-05 17:56   ` Stefano Stabellini
  0 siblings, 0 replies; 38+ messages in thread
From: Stefano Stabellini @ 2014-01-05 17:56 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel, hpa

On Fri, 3 Jan 2014, Konrad Rzeszutek Wilk wrote:
> Stefano noticed that the code runs only under 64-bit so
> the comments about 32-bit are pointless.
> 
> We also change the check for xen_revector_p2m_tree
> returning the same value (because it could not allocate
> a swath of space to put the new P2M in, or because it had
> been called once already). In such cases we return early
> from the function.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>


>  arch/x86/xen/mmu.c | 40 ++++++++++++++++++++--------------------
>  1 file changed, 20 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index c140eff..9d74249 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1209,29 +1209,29 @@ static void __init xen_pagetable_p2m_copy(void)
>  
>  	size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
>  
> -	/* On 32-bit, we get zero so this never gets executed. */
>  	new_mfn_list = xen_revector_p2m_tree();
> -	if (new_mfn_list && new_mfn_list != xen_start_info->mfn_list) {
> -		/* using __ka address and sticking INVALID_P2M_ENTRY! */
> -		memset((void *)xen_start_info->mfn_list, 0xff, size);
> -
> -		/* We should be in __ka space. */
> -		BUG_ON(xen_start_info->mfn_list < __START_KERNEL_map);
> -		addr = xen_start_info->mfn_list;
> -		/* We roundup to the PMD, which means that if anybody at this stage is
> -		 * using the __ka address of xen_start_info or xen_start_info->shared_info
> -		 * they are in going to crash. Fortunatly we have already revectored
> -		 * in xen_setup_kernel_pagetable and in xen_setup_shared_info. */
> -		size = roundup(size, PMD_SIZE);
> -		xen_cleanhighmap(addr, addr + size);
> -
> -		size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
> -		memblock_free(__pa(xen_start_info->mfn_list), size);
> -		/* And revector! Bye bye old array */
> -		xen_start_info->mfn_list = new_mfn_list;
> -	} else
> +	/* No memory or already called. */
> +	if (!new_mfn_list || new_mfn_list == xen_start_info->mfn_list)
>  		return;
>  
> +	/* using __ka address and sticking INVALID_P2M_ENTRY! */
> +	memset((void *)xen_start_info->mfn_list, 0xff, size);
> +
> +	/* We should be in __ka space. */
> +	BUG_ON(xen_start_info->mfn_list < __START_KERNEL_map);
> +	addr = xen_start_info->mfn_list;
> +	/* We roundup to the PMD, which means that if anybody at this stage is
> +	 * using the __ka address of xen_start_info or xen_start_info->shared_info
> +	 * they are in going to crash. Fortunatly we have already revectored
> +	 * in xen_setup_kernel_pagetable and in xen_setup_shared_info. */
> +	size = roundup(size, PMD_SIZE);
> +	xen_cleanhighmap(addr, addr + size);
> +
> +	size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
> +	memblock_free(__pa(xen_start_info->mfn_list), size);
> +	/* And revector! Bye bye old array */
> +	xen_start_info->mfn_list = new_mfn_list;
> +
>  	/* At this stage, cleanup_highmap has already cleaned __ka space
>  	 * from _brk_limit way up to the max_pfn_mapped (which is the end of
>  	 * the ramdisk). We continue on, erasing PMD entries that point to page
> -- 
> 1.8.3.1
> 


* Re: [PATCH v13 08/19] xen/pvh/mmu: Use PV TLB instead of native.
  2014-01-03 19:38 ` [PATCH v13 08/19] xen/pvh/mmu: Use PV TLB instead of native Konrad Rzeszutek Wilk
@ 2014-01-05 18:11   ` Stefano Stabellini
  2014-01-05 19:41     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 38+ messages in thread
From: Stefano Stabellini @ 2014-01-05 18:11 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel, hpa, Mukesh Rathor

On Fri, 3 Jan 2014, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> We also optimize one of the ops - the TLB flush. The native
> operation would needlessly IPI offline VCPUs, causing extra
> wakeups. Using the Xen one avoids that and lets the hypervisor
> determine which VCPUs need the TLB flush.
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/xen/mmu.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 490ddb3..c1d406f 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -2222,6 +2222,15 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
>  void __init xen_init_mmu_ops(void)
>  {
>  	x86_init.paging.pagetable_init = xen_pagetable_init;
> +
> +	/* Optimization - we can use the HVM one but it has no idea which
> +	 * VCPUs are descheduled - which means that it will needlessly IPI
> +	 * them. Xen knows so let it do the job.
> +	 */
> +	if (xen_feature(XENFEAT_auto_translated_physmap)) {
> +		pv_mmu_ops.flush_tlb_others = xen_flush_tlb_others;
> +		return;
> +	}
>  	pv_mmu_ops = xen_mmu_ops;
>  
>  	memset(dummy_mapping, 0xff, PAGE_SIZE);

Regarding this patch, the next one and the other changes to
xen_setup_shared_info, xen_setup_mfn_list_list,
xen_setup_vcpu_info_placement, etc: considering that the mmu related
stuff is very different between PV and PVH guests, I wonder if it makes
any sense to keep calling xen_init_mmu_ops on PVH.

I would introduce a new function, xen_init_pvh_mmu_ops, that sets
pv_mmu_ops.flush_tlb_others and only calls whatever is needed for PVH
under a new xen_pvh_pagetable_init.
Just to give you an idea, not even compiled tested:



diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 23ead29..4e53fa3 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1117,15 +1117,12 @@ static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high)
 
 void xen_setup_shared_info(void)
 {
-	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-		set_fixmap(FIX_PARAVIRT_BOOTMAP,
-			   xen_start_info->shared_info);
+	BUG_ON(xen_feature(XENFEAT_auto_translated_physmap));
+	set_fixmap(FIX_PARAVIRT_BOOTMAP,
+			xen_start_info->shared_info);
 
-		HYPERVISOR_shared_info =
-			(struct shared_info *)fix_to_virt(FIX_PARAVIRT_BOOTMAP);
-	} else
-		HYPERVISOR_shared_info =
-			(struct shared_info *)__va(xen_start_info->shared_info);
+	HYPERVISOR_shared_info =
+		(struct shared_info *)fix_to_virt(FIX_PARAVIRT_BOOTMAP);
 
 #ifndef CONFIG_SMP
 	/* In UP this is as good a place as any to set up shared info */
@@ -1467,7 +1464,10 @@ asmlinkage void __init xen_start_kernel(void)
 	 * Set up some pagetable state before starting to set any ptes.
 	 */
 
-	xen_init_mmu_ops();
+	if (xen_pvh_domain())
+		xen_init_pvh_mmu_ops();
+	else
+		xen_init_mmu_ops();
 
 	/* Prevent unwanted bits from being set in PTEs. */
 	__supported_pte_mask &= ~_PAGE_GLOBAL;
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 490ddb3..04405bc 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1254,6 +1254,15 @@ static void __init xen_pagetable_p2m_copy(void)
 }
 #endif
 
+static void __init xen_pvh_pagetable_init(void)
+{
+	paging_init();
+	HYPERVISOR_shared_info =
+		(struct shared_info *)__va(xen_start_info->shared_info);
+	for_each_possible_cpu(cpu)
+		xen_vcpu_setup(cpu);
+}
+
 static void __init xen_pagetable_init(void)
 {
 	paging_init();
@@ -2219,6 +2228,20 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
 	.set_fixmap = xen_set_fixmap,
 };
 
+void __init xen_init_pvh_mmu_ops(void)
+{
+	x86_init.paging.pagetable_init = xen_pvh_pagetable_init;
+
+	/* Optimization - we can use the HVM one but it has no idea which
+	 * VCPUs are descheduled - which means that it will needlessly IPI
+	 * them. Xen knows so let it do the job.
+	 */
+	if (xen_feature(XENFEAT_auto_translated_physmap)) {
+		pv_mmu_ops.flush_tlb_others = xen_flush_tlb_others;
+		return;
+	}
+}
+
 void __init xen_init_mmu_ops(void)
 {
 	x86_init.paging.pagetable_init = xen_pagetable_init;


* Re: [PATCH v13 13/19] xen/pvh: Piggyback on PVHVM for event channels (v2)
  2014-01-03 19:38 ` [PATCH v13 13/19] xen/pvh: Piggyback on PVHVM for event channels (v2) Konrad Rzeszutek Wilk
@ 2014-01-05 18:15   ` Stefano Stabellini
  0 siblings, 0 replies; 38+ messages in thread
From: Stefano Stabellini @ 2014-01-05 18:15 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel, hpa, Mukesh Rathor

On Fri, 3 Jan 2014, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> PVH is a PV guest with a twist - certain things in it work
> like HVM and some work like PV. There is a similar mode -
> PVHVM, where we run in HVM mode with PV code enabled - and
> this patch explores that.
> 
> The most notable PV interfaces are the XenBus and event channels.
> 
> We will piggyback on how the event channel mechanism is
> used in PVHVM - that is, we want the normal native IRQ
> mechanism, and we will install a vector (the HVM callback)
> from which we call into the event channel machinery.
> 
> This means that from a pvops perspective, we can use
> native_irq_ops instead of the Xen PV specific. Albeit in the
> future we could support pirq_eoi_map. But that is
> a feature request that can be shared with PVHVM.
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Reviewed-by: David Vrabel <david.vrabel@citrix.com>

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>


>  arch/x86/xen/enlighten.c |  5 +++--
>  arch/x86/xen/irq.c       |  5 ++++-
>  drivers/xen/events.c     | 14 +++++++++-----
>  3 files changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index fde62c4..628099a 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1144,8 +1144,9 @@ void xen_setup_vcpu_info_placement(void)
>  		xen_vcpu_setup(cpu);
>  
>  	/* xen_vcpu_setup managed to place the vcpu_info within the
> -	   percpu area for all cpus, so make use of it */
> -	if (have_vcpu_info_placement) {
> +	 * percpu area for all cpus, so make use of it. Note that for
> +	 * PVH we want to use native IRQ mechanism. */
> +	if (have_vcpu_info_placement && !xen_pvh_domain()) {
>  		pv_irq_ops.save_fl = __PV_IS_CALLEE_SAVE(xen_save_fl_direct);
>  		pv_irq_ops.restore_fl = __PV_IS_CALLEE_SAVE(xen_restore_fl_direct);
>  		pv_irq_ops.irq_disable = __PV_IS_CALLEE_SAVE(xen_irq_disable_direct);
> diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
> index 0da7f86..76ca326 100644
> --- a/arch/x86/xen/irq.c
> +++ b/arch/x86/xen/irq.c
> @@ -5,6 +5,7 @@
>  #include <xen/interface/xen.h>
>  #include <xen/interface/sched.h>
>  #include <xen/interface/vcpu.h>
> +#include <xen/features.h>
>  #include <xen/events.h>
>  
>  #include <asm/xen/hypercall.h>
> @@ -128,6 +129,8 @@ static const struct pv_irq_ops xen_irq_ops __initconst = {
>  
>  void __init xen_init_irq_ops(void)
>  {
> -	pv_irq_ops = xen_irq_ops;
> +	/* For PVH we use default pv_irq_ops settings. */
> +	if (!xen_feature(XENFEAT_hvm_callback_vector))
> +		pv_irq_ops = xen_irq_ops;
>  	x86_init.irqs.intr_init = xen_init_IRQ;
>  }
> diff --git a/drivers/xen/events.c b/drivers/xen/events.c
> index 4035e83..783b972 100644
> --- a/drivers/xen/events.c
> +++ b/drivers/xen/events.c
> @@ -1908,8 +1908,15 @@ void __init xen_init_IRQ(void)
>  	pirq_needs_eoi = pirq_needs_eoi_flag;
>  
>  #ifdef CONFIG_X86
> -	if (xen_hvm_domain()) {
> +	if (xen_pv_domain()) {
> +		irq_ctx_init(smp_processor_id());
> +		if (xen_initial_domain())
> +			pci_xen_initial_domain();
> +	}
> +	if (xen_feature(XENFEAT_hvm_callback_vector))
>  		xen_callback_vector();
> +
> +	if (xen_hvm_domain()) {
>  		native_init_IRQ();
>  		/* pci_xen_hvm_init must be called after native_init_IRQ so that
>  		 * __acpi_register_gsi can point at the right function */
> @@ -1918,13 +1925,10 @@ void __init xen_init_IRQ(void)
>  		int rc;
>  		struct physdev_pirq_eoi_gmfn eoi_gmfn;
>  
> -		irq_ctx_init(smp_processor_id());
> -		if (xen_initial_domain())
> -			pci_xen_initial_domain();
> -
>  		pirq_eoi_map = (void *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
>  		eoi_gmfn.gmfn = virt_to_mfn(pirq_eoi_map);
>  		rc = HYPERVISOR_physdev_op(PHYSDEVOP_pirq_eoi_gmfn_v2, &eoi_gmfn);
> +		/* TODO: No PVH support for PIRQ EOI */
>  		if (rc != 0) {
>  			free_page((unsigned long) pirq_eoi_map);
>  			pirq_eoi_map = NULL;
> -- 
> 1.8.3.1
> 


* Re: [PATCH v13 14/19] xen/grants: Remove gnttab_max_grant_frames dependency on gnttab_init.
  2014-01-03 19:38 ` [PATCH v13 14/19] xen/grants: Remove gnttab_max_grant_frames dependency on gnttab_init Konrad Rzeszutek Wilk
@ 2014-01-05 18:16   ` Stefano Stabellini
  0 siblings, 0 replies; 38+ messages in thread
From: Stefano Stabellini @ 2014-01-05 18:16 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel, hpa

On Fri, 3 Jan 2014, Konrad Rzeszutek Wilk wrote:
> The function gnttab_max_grant_frames() returns the maximum number
> of grant frames (pages) we can have. Unfortunately it was
> dependent on gnttab_init() having run first to initialize
> the boot max value (boot_max_nr_grant_frames).
> 
> This meant that users of gnttab_max_grant_frames would always
> get a zero value if they called before gnttab_init() - such as
> 'platform_pci_init' (drivers/xen/platform-pci.c).
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Reviewed-by: David Vrabel <david.vrabel@citrix.com>

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>


>  drivers/xen/grant-table.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index aa846a4..99399cb 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -62,7 +62,6 @@
>  
>  static grant_ref_t **gnttab_list;
>  static unsigned int nr_grant_frames;
> -static unsigned int boot_max_nr_grant_frames;
>  static int gnttab_free_count;
>  static grant_ref_t gnttab_free_head;
>  static DEFINE_SPINLOCK(gnttab_list_lock);
> @@ -827,6 +826,11 @@ static unsigned int __max_nr_grant_frames(void)
>  unsigned int gnttab_max_grant_frames(void)
>  {
>  	unsigned int xen_max = __max_nr_grant_frames();
> +	static unsigned int boot_max_nr_grant_frames;
> +
> +	/* First time, initialize it properly. */
> +	if (!boot_max_nr_grant_frames)
> +		boot_max_nr_grant_frames = __max_nr_grant_frames();
>  
>  	if (xen_max > boot_max_nr_grant_frames)
>  		return boot_max_nr_grant_frames;
> @@ -1227,13 +1231,12 @@ int gnttab_init(void)
>  
>  	gnttab_request_version();
>  	nr_grant_frames = 1;
> -	boot_max_nr_grant_frames = __max_nr_grant_frames();
>  
>  	/* Determine the maximum number of frames required for the
>  	 * grant reference free list on the current hypervisor.
>  	 */
>  	BUG_ON(grefs_per_grant_frame == 0);
> -	max_nr_glist_frames = (boot_max_nr_grant_frames *
> +	max_nr_glist_frames = (gnttab_max_grant_frames() *
>  			       grefs_per_grant_frame / RPP);
>  
>  	gnttab_list = kmalloc(max_nr_glist_frames * sizeof(grant_ref_t *),
> -- 
> 1.8.3.1
> 


* Re: [PATCH v13 15/19] xen/grant-table: Refactor gnttab_init
  2014-01-03 19:38 ` [PATCH v13 15/19] xen/grant-table: Refactor gnttab_init Konrad Rzeszutek Wilk
@ 2014-01-05 18:18   ` Stefano Stabellini
  2014-01-05 19:33     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 38+ messages in thread
From: Stefano Stabellini @ 2014-01-05 18:18 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel, hpa

On Fri, 3 Jan 2014, Konrad Rzeszutek Wilk wrote:
> We have this odd scenario where for the PV paths we take a shortcut
> but for the HVM paths we first ioremap xen_hvm_resume_frames, then
> assign it to gnttab_shared.addr. This is needed because gnttab_map
> uses gnttab_shared.addr.
> 
> Instead of having:
> 	if (pv)
> 		return gnttab_map
> 	if (hvm)
> 		...
> 
> 	gnttab_map
> 
> Let's move the HVM part before the gnttab_map and remove the
> first call to gnttab_map.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Reviewed-by: David Vrabel <david.vrabel@citrix.com>

As I wrote in my reply to the previous version of the patch, you can
have my acked-by, except for the spurious code style fix mixed-up with
the other changes.


>  drivers/xen/grant-table.c | 13 ++++---------
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index 99399cb..cc1b4fa 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -1173,22 +1173,17 @@ static int gnttab_setup(void)
>  	if (max_nr_gframes < nr_grant_frames)
>  		return -ENOSYS;
>  
> -	if (xen_pv_domain())
> -		return gnttab_map(0, nr_grant_frames - 1);
> -
> -	if (gnttab_shared.addr == NULL) {
> +	if (xen_feature(XENFEAT_auto_translated_physmap) && gnttab_shared.addr == NULL)
> +	{
>  		gnttab_shared.addr = xen_remap(xen_hvm_resume_frames,
> -						PAGE_SIZE * max_nr_gframes);
> +					       PAGE_SIZE * max_nr_gframes);
>  		if (gnttab_shared.addr == NULL) {
>  			pr_warn("Failed to ioremap gnttab share frames (addr=0x%08lx)!\n",
>  					xen_hvm_resume_frames);
>  			return -ENOMEM;
>  		}
>  	}
> -
> -	gnttab_map(0, nr_grant_frames - 1);
> -
> -	return 0;
> +	return gnttab_map(0, nr_grant_frames - 1);
>  }
>  
>  int gnttab_resume(void)
> -- 
> 1.8.3.1
> 


* Re: [PATCH v13 17/19] xen/pvh: Piggyback on PVHVM for grant driver (v4)
  2014-01-03 19:38 ` [PATCH v13 17/19] xen/pvh: Piggyback on PVHVM for grant driver (v4) Konrad Rzeszutek Wilk
@ 2014-01-05 18:20   ` Stefano Stabellini
  0 siblings, 0 replies; 38+ messages in thread
From: Stefano Stabellini @ 2014-01-05 18:20 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel, hpa

On Fri, 3 Jan 2014, Konrad Rzeszutek Wilk wrote:
> In PVH the shared grant frame is a PFN and not an MFN,
> hence it is mapped via the same code path as HVM.
> 
> The allocation of the grant frame is done differently - we
> do not use the early platform-pci driver and have an
> ioremap area - instead we use balloon memory and stitch
> all of the non-contiguous pages into one virtual address area.
> 
> That means when we call the hypervisor to replace the GMFN
> with a XENMAPSPACE_grant_table type, we need to look up the
> old PFN for every iteration instead of assuming a flat
> contiguous PFN allocation.
> 
> Lastly, we only use grant table v1. This is because PVHVM
> cannot use v2 due to the lack of XENMEM_add_to_physmap
> calls for the error status page (see commit
> 69e8f430e243d657c2053f097efebc2e2cd559f0
>  xen/granttable: Disable grant v2 for HVM domains.)
> 
> Until that is implemented this workaround has to
> be in place.
> 
> Also per suggestions by Stefano utilize the PVHVM paths
> as they share common functionality.
> 
> v2 of this patch moves most of the PVH code out in the
> arch/x86/xen/grant-table driver and touches only minimally
> the generic driver.
> 
> v3, v4: fix some of the code affected by earlier patches.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>


>  arch/x86/xen/grant-table.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++
>  drivers/xen/gntdev.c       |  2 +-
>  drivers/xen/grant-table.c  |  9 ++++---
>  3 files changed, 68 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
> index 3a5f55d..2d71979 100644
> --- a/arch/x86/xen/grant-table.c
> +++ b/arch/x86/xen/grant-table.c
> @@ -125,3 +125,65 @@ void arch_gnttab_unmap(void *shared, unsigned long nr_gframes)
>  	apply_to_page_range(&init_mm, (unsigned long)shared,
>  			    PAGE_SIZE * nr_gframes, unmap_pte_fn, NULL);
>  }
> +#ifdef CONFIG_XEN_PVH
> +#include <xen/balloon.h>
> +#include <xen/events.h>
> +#include <linux/slab.h>
> +static int __init xlated_setup_gnttab_pages(void)
> +{
> +	struct page **pages;
> +	xen_pfn_t *pfns;
> +	int rc;
> +	unsigned int i;
> +	unsigned long nr_grant_frames = gnttab_max_grant_frames();
> +
> +	BUG_ON(nr_grant_frames == 0);
> +	pages = kcalloc(nr_grant_frames, sizeof(pages[0]), GFP_KERNEL);
> +	if (!pages)
> +		return -ENOMEM;
> +
> +	pfns = kcalloc(nr_grant_frames, sizeof(pfns[0]), GFP_KERNEL);
> +	if (!pfns) {
> +		kfree(pages);
> +		return -ENOMEM;
> +	}
> +	rc = alloc_xenballooned_pages(nr_grant_frames, pages, 0 /* lowmem */);
> +	if (rc) {
> +		pr_warn("%s Couldn't balloon alloc %ld pfns rc:%d\n", __func__,
> +			nr_grant_frames, rc);
> +		kfree(pages);
> +		kfree(pfns);
> +		return rc;
> +	}
> +	for (i = 0; i < nr_grant_frames; i++)
> +		pfns[i] = page_to_pfn(pages[i]);
> +
> +	rc = arch_gnttab_map_shared(pfns, nr_grant_frames, nr_grant_frames,
> +				    &xen_auto_xlat_grant_frames.vaddr);
> +
> +	kfree(pages);
> +	if (rc) {
> +		pr_warn("%s Couldn't map %ld pfns rc:%d\n", __func__,
> +			nr_grant_frames, rc);
> +		free_xenballooned_pages(nr_grant_frames, pages);
> +		kfree(pfns);
> +		return rc;
> +	}
> +
> +	xen_auto_xlat_grant_frames.pfn = pfns;
> +	xen_auto_xlat_grant_frames.count = nr_grant_frames;
> +
> +	return 0;
> +}
> +
> +static int __init xen_pvh_gnttab_setup(void)
> +{
> +	if (!xen_pvh_domain())
> +		return -ENODEV;
> +
> +	return xlated_setup_gnttab_pages();
> +}
> +/* Call it _before_ __gnttab_init as we need to initialize the
> + * xen_auto_xlat_grant_frames first. */
> +core_initcall(xen_pvh_gnttab_setup);
> +#endif
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index e41c79c..073b4a1 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -846,7 +846,7 @@ static int __init gntdev_init(void)
>  	if (!xen_domain())
>  		return -ENODEV;
>  
> -	use_ptemod = xen_pv_domain();
> +	use_ptemod = !xen_feature(XENFEAT_auto_translated_physmap);
>  
>  	err = misc_register(&gntdev_miscdev);
>  	if (err != 0) {
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index 6c78fd21..3d04c1c 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -1108,7 +1108,7 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
>  	unsigned int nr_gframes = end_idx + 1;
>  	int rc;
>  
> -	if (xen_hvm_domain()) {
> +	if (xen_feature(XENFEAT_auto_translated_physmap)) {
>  		struct xen_add_to_physmap xatp;
>  		unsigned int i = end_idx;
>  		rc = 0;
> @@ -1184,7 +1184,7 @@ static void gnttab_request_version(void)
>  	int rc;
>  	struct gnttab_set_version gsv;
>  
> -	if (xen_hvm_domain())
> +	if (xen_feature(XENFEAT_auto_translated_physmap))
>  		gsv.version = 1;
>  	else
>  		gsv.version = 2;
> @@ -1328,5 +1328,6 @@ static int __gnttab_init(void)
>  
>  	return gnttab_init();
>  }
> -
> -core_initcall(__gnttab_init);
> +/* Starts after core_initcall so that xen_pvh_gnttab_setup can be called
> + * beforehand to initialize xen_auto_xlat_grant_frames. */
> +core_initcall_sync(__gnttab_init);
> -- 
> 1.8.3.1
> 

* Re: [PATCH v13 16/19] xen/grant: Implement a grant frame array struct (v2).
  2014-01-03 19:38 ` [PATCH v13 16/19] xen/grant: Implement a grant frame array struct (v2) Konrad Rzeszutek Wilk
@ 2014-01-05 18:38   ` Stefano Stabellini
  0 siblings, 0 replies; 38+ messages in thread
From: Stefano Stabellini @ 2014-01-05 18:38 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	david.vrabel, hpa

On Fri, 3 Jan 2014, Konrad Rzeszutek Wilk wrote:
> The 'xen_hvm_resume_frames' used to be an 'unsigned long'
> and contained the virtual address of the grants. That was OK
> for most architectures (PVHVM, ARM) where the grants are contiguous
> in memory. That however is not the case for PVH - there we
> have to look up the PFN for each virtual address.
> 
> Instead of doing that, let's make it a structure which will contain
> the array of PFNs, the virtual address and the count of said PFNs.
> 
> Also provide generic functions: gnttab_setup_auto_xlat_frames and
> gnttab_free_auto_xlat_frames to populate said structure with
> appropriate values for PVHVM and ARM.
> 
> To round it off, change the name from 'xen_hvm_resume_frames' to
> a more descriptive one - 'xen_auto_xlat_grant_frames'.
> 
> For PVH, in patch "xen/pvh: Piggyback on PVHVM for grant driver"
> we will populate the 'xen_auto_xlat_grant_frames' by ourselves.
> 
> v2 moves the xen_remap in the gnttab_setup_auto_xlat_frames
> and also introduces xen_unmap for gnttab_free_auto_xlat_frames.
> 
> Suggested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>


>  arch/arm/include/asm/xen/page.h |  1 +
>  arch/arm/xen/enlighten.c        |  9 +++++--
>  arch/x86/include/asm/xen/page.h |  1 +
>  drivers/xen/grant-table.c       | 58 ++++++++++++++++++++++++++++++++++++-----
>  drivers/xen/platform-pci.c      | 10 ++++---
>  include/xen/grant_table.h       |  9 ++++++-
>  6 files changed, 75 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h
> index 75579a9d..5af8fb3 100644
> --- a/arch/arm/include/asm/xen/page.h
> +++ b/arch/arm/include/asm/xen/page.h
> @@ -118,5 +118,6 @@ static inline bool set_phys_to_machine(unsigned long pfn, unsigned long mfn)
>  }
>  
>  #define xen_remap(cookie, size) ioremap_cached((cookie), (size));
> +#define xen_unmap(cookie) iounmap((cookie))
>  
>  #endif /* _ASM_ARM_XEN_PAGE_H */
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index 8550123..2162172 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -208,6 +208,7 @@ static int __init xen_guest_init(void)
>  	const char *version = NULL;
>  	const char *xen_prefix = "xen,xen-";
>  	struct resource res;
> +	unsigned long grant_frames;
>  
>  	node = of_find_compatible_node(NULL, NULL, "xen,xen");
>  	if (!node) {
> @@ -224,10 +225,10 @@ static int __init xen_guest_init(void)
>  	}
>  	if (of_address_to_resource(node, GRANT_TABLE_PHYSADDR, &res))
>  		return 0;
> -	xen_hvm_resume_frames = res.start;
> +	grant_frames = res.start;
>  	xen_events_irq = irq_of_parse_and_map(node, 0);
>  	pr_info("Xen %s support found, events_irq=%d gnttab_frame_pfn=%lx\n",
> -			version, xen_events_irq, (xen_hvm_resume_frames >> PAGE_SHIFT));
> +			version, xen_events_irq, (grant_frames >> PAGE_SHIFT));
>  	xen_domain_type = XEN_HVM_DOMAIN;
>  
>  	xen_setup_features();
> @@ -265,6 +266,10 @@ static int __init xen_guest_init(void)
>  	if (xen_vcpu_info == NULL)
>  		return -ENOMEM;
>  
> +	if (gnttab_setup_auto_xlat_frames(grant_frames)) {
> +		free_percpu(xen_vcpu_info);
> +		return -ENOMEM;
> +	}
>  	gnttab_init();
>  	if (!xen_initial_domain())
>  		xenbus_probe(NULL);
> diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
> index 4a092cc..3e276eb 100644
> --- a/arch/x86/include/asm/xen/page.h
> +++ b/arch/x86/include/asm/xen/page.h
> @@ -227,5 +227,6 @@ void make_lowmem_page_readonly(void *vaddr);
>  void make_lowmem_page_readwrite(void *vaddr);
>  
>  #define xen_remap(cookie, size) ioremap((cookie), (size));
> +#define xen_unmap(cookie) iounmap((cookie))
>  
>  #endif /* _ASM_X86_XEN_PAGE_H */
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index cc1b4fa..6c78fd21 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -65,8 +65,7 @@ static unsigned int nr_grant_frames;
>  static int gnttab_free_count;
>  static grant_ref_t gnttab_free_head;
>  static DEFINE_SPINLOCK(gnttab_list_lock);
> -unsigned long xen_hvm_resume_frames;
> -EXPORT_SYMBOL_GPL(xen_hvm_resume_frames);
> +struct grant_frames xen_auto_xlat_grant_frames;
>  
>  static union {
>  	struct grant_entry_v1 *v1;
> @@ -838,6 +837,51 @@ unsigned int gnttab_max_grant_frames(void)
>  }
>  EXPORT_SYMBOL_GPL(gnttab_max_grant_frames);
>  
> +int gnttab_setup_auto_xlat_frames(unsigned long addr)
> +{
> +	xen_pfn_t *pfn;
> +	unsigned int max_nr_gframes = __max_nr_grant_frames();
> +	unsigned int i;
> +	void *vaddr;
> +
> +	if (xen_auto_xlat_grant_frames.count)
> +		return -EINVAL;
> +
> +	vaddr = xen_remap(addr, PAGE_SIZE * max_nr_gframes);
> +	if (vaddr == NULL) {
> +		pr_warn("Failed to ioremap gnttab share frames (addr=0x%08lx)!\n",
> +			addr);
> +		return -ENOMEM;
> +	}
> +	pfn = kcalloc(max_nr_gframes, sizeof(pfn[0]), GFP_KERNEL);
> +	if (!pfn) {
> +		xen_unmap(vaddr);
> +		return -ENOMEM;
> +	}
> +	for (i = 0; i < max_nr_gframes; i++)
> +		pfn[i] = PFN_DOWN(addr) + i;
> +
> +	xen_auto_xlat_grant_frames.vaddr = vaddr;
> +	xen_auto_xlat_grant_frames.pfn = pfn;
> +	xen_auto_xlat_grant_frames.count = max_nr_gframes;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(gnttab_setup_auto_xlat_frames);
> +
> +void gnttab_free_auto_xlat_frames(void)
> +{
> +	if (!xen_auto_xlat_grant_frames.count)
> +		return;
> +	kfree(xen_auto_xlat_grant_frames.pfn);
> +	xen_unmap(xen_auto_xlat_grant_frames.vaddr);
> +
> +	xen_auto_xlat_grant_frames.pfn = NULL;
> +	xen_auto_xlat_grant_frames.count = 0;
> +	xen_auto_xlat_grant_frames.vaddr = NULL;
> +}
> +EXPORT_SYMBOL_GPL(gnttab_free_auto_xlat_frames);
> +
>  /* Handling of paged out grant targets (GNTST_eagain) */
>  #define MAX_DELAY 256
>  static inline void
> @@ -1068,6 +1112,7 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
>  		struct xen_add_to_physmap xatp;
>  		unsigned int i = end_idx;
>  		rc = 0;
> +		BUG_ON(xen_auto_xlat_grant_frames.count < nr_gframes);
>  		/*
>  		 * Loop backwards, so that the first hypercall has the largest
>  		 * index, ensuring that the table will grow only once.
> @@ -1076,7 +1121,7 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
>  			xatp.domid = DOMID_SELF;
>  			xatp.idx = i;
>  			xatp.space = XENMAPSPACE_grant_table;
> -			xatp.gpfn = (xen_hvm_resume_frames >> PAGE_SHIFT) + i;
> +			xatp.gpfn = xen_auto_xlat_grant_frames.pfn[i];
>  			rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
>  			if (rc != 0) {
>  				pr_warn("grant table add_to_physmap failed, err=%d\n",
> @@ -1175,11 +1220,10 @@ static int gnttab_setup(void)
>  
>  	if (xen_feature(XENFEAT_auto_translated_physmap) && gnttab_shared.addr == NULL)
>  	{
> -		gnttab_shared.addr = xen_remap(xen_hvm_resume_frames,
> -					       PAGE_SIZE * max_nr_gframes);
> +		gnttab_shared.addr = xen_auto_xlat_grant_frames.vaddr;
>  		if (gnttab_shared.addr == NULL) {
> -			pr_warn("Failed to ioremap gnttab share frames (addr=0x%08lx)!\n",
> -					xen_hvm_resume_frames);
> +			pr_warn("gnttab share frames (addr=0x%08lx) is not mapped!\n",
> +				(unsigned long)xen_auto_xlat_grant_frames.vaddr);
>  			return -ENOMEM;
>  		}
>  	}
> diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
> index 2f3528e..f1947ac 100644
> --- a/drivers/xen/platform-pci.c
> +++ b/drivers/xen/platform-pci.c
> @@ -108,6 +108,7 @@ static int platform_pci_init(struct pci_dev *pdev,
>  	long ioaddr;
>  	long mmio_addr, mmio_len;
>  	unsigned int max_nr_gframes;
> +	unsigned long grant_frames;
>  
>  	if (!xen_domain())
>  		return -ENODEV;
> @@ -154,13 +155,16 @@ static int platform_pci_init(struct pci_dev *pdev,
>  	}
>  
>  	max_nr_gframes = gnttab_max_grant_frames();
> -	xen_hvm_resume_frames = alloc_xen_mmio(PAGE_SIZE * max_nr_gframes);
> +	grant_frames = alloc_xen_mmio(PAGE_SIZE * max_nr_gframes);
> +	if (gnttab_setup_auto_xlat_frames(grant_frames))
> +		goto out;
>  	ret = gnttab_init();
>  	if (ret)
> -		goto out;
> +		goto grant_out;
>  	xenbus_probe(NULL);
>  	return 0;
> -
> +grant_out:
> +	gnttab_free_auto_xlat_frames();
>  out:
>  	pci_release_region(pdev, 0);
>  mem_out:
> diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
> index 694dcaf..5acb1e4 100644
> --- a/include/xen/grant_table.h
> +++ b/include/xen/grant_table.h
> @@ -178,8 +178,15 @@ int arch_gnttab_map_status(uint64_t *frames, unsigned long nr_gframes,
>  			   grant_status_t **__shared);
>  void arch_gnttab_unmap(void *shared, unsigned long nr_gframes);
>  
> -extern unsigned long xen_hvm_resume_frames;
> +struct grant_frames {
> +	xen_pfn_t *pfn;
> +	unsigned int count;
> +	void *vaddr;
> +};
> +extern struct grant_frames xen_auto_xlat_grant_frames;
>  unsigned int gnttab_max_grant_frames(void);
> +int gnttab_setup_auto_xlat_frames(unsigned long addr);
> +void gnttab_free_auto_xlat_frames(void);
>  
>  #define gnttab_map_vaddr(map) ((void *)(map.host_virt_addr))
>  
> -- 
> 1.8.3.1
> 

* Re: [PATCH v13 15/19] xen/grant-table: Refactor gnttab_init
  2014-01-05 18:18   ` Stefano Stabellini
@ 2014-01-05 19:33     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-05 19:33 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel, hpa

On Sun, Jan 05, 2014 at 06:18:03PM +0000, Stefano Stabellini wrote:
> On Fri, 3 Jan 2014, Konrad Rzeszutek Wilk wrote:
> > We have this odd scenario of where for PV paths we take a shortcut
> > but for the HVM paths we first ioremap xen_hvm_resume_frames, then
> > assign it to gnttab_shared.addr. This is needed because gnttab_map
> > uses gnttab_shared.addr.
> > 
> > Instead of having:
> > 	if (pv)
> > 		return gnttab_map
> > 	if (hvm)
> > 		...
> > 
> > 	gnttab_map
> > 
> > Lets move the HVM part before the gnttab_map and remove the
> > first call to gnttab_map.
> > 
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > Reviewed-by: David Vrabel <david.vrabel@citrix.com>
> 
> As I wrote in my reply to the previous version of the patch, you can
> have my acked-by, except for the spurious code style fix mixed-up with
> the other changes.

Thanks. I fixed it up (removed the code style fix).

> 
> 
> >  drivers/xen/grant-table.c | 13 ++++---------
> >  1 file changed, 4 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> > index 99399cb..cc1b4fa 100644
> > --- a/drivers/xen/grant-table.c
> > +++ b/drivers/xen/grant-table.c
> > @@ -1173,22 +1173,17 @@ static int gnttab_setup(void)
> >  	if (max_nr_gframes < nr_grant_frames)
> >  		return -ENOSYS;
> >  
> > -	if (xen_pv_domain())
> > -		return gnttab_map(0, nr_grant_frames - 1);
> > -
> > -	if (gnttab_shared.addr == NULL) {
> > +	if (xen_feature(XENFEAT_auto_translated_physmap) && gnttab_shared.addr == NULL)
> > +	{
> >  		gnttab_shared.addr = xen_remap(xen_hvm_resume_frames,
> > -						PAGE_SIZE * max_nr_gframes);
> > +					       PAGE_SIZE * max_nr_gframes);
> >  		if (gnttab_shared.addr == NULL) {
> >  			pr_warn("Failed to ioremap gnttab share frames (addr=0x%08lx)!\n",
> >  					xen_hvm_resume_frames);
> >  			return -ENOMEM;
> >  		}
> >  	}
> > -
> > -	gnttab_map(0, nr_grant_frames - 1);
> > -
> > -	return 0;
> > +	return gnttab_map(0, nr_grant_frames - 1);
> >  }
> >  
> >  int gnttab_resume(void)
> > -- 
> > 1.8.3.1
> > 

* Re: [PATCH v13 08/19] xen/pvh/mmu: Use PV TLB instead of native.
  2014-01-05 18:11   ` Stefano Stabellini
@ 2014-01-05 19:41     ` Konrad Rzeszutek Wilk
  2014-01-06 11:33       ` Stefano Stabellini
  0 siblings, 1 reply; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-05 19:41 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel, hpa,
	Mukesh Rathor

On Sun, Jan 05, 2014 at 06:11:39PM +0000, Stefano Stabellini wrote:
> On Fri, 3 Jan 2014, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > 
> > We also optimize one - the TLB flush. The native operation would
> > needlessly IPI offline VCPUs causing extra wakeups. Using the
> > Xen one avoids that and lets the hypervisor determine which
> > VCPU needs the TLB flush.
> > 
> > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > ---
> >  arch/x86/xen/mmu.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> > index 490ddb3..c1d406f 100644
> > --- a/arch/x86/xen/mmu.c
> > +++ b/arch/x86/xen/mmu.c
> > @@ -2222,6 +2222,15 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
> >  void __init xen_init_mmu_ops(void)
> >  {
> >  	x86_init.paging.pagetable_init = xen_pagetable_init;
> > +
> > +	/* Optimization - we can use the HVM one but it has no idea which
> > +	 * VCPUs are descheduled - which means that it will needlessly IPI
> > +	 * them. Xen knows so let it do the job.
> > +	 */
> > +	if (xen_feature(XENFEAT_auto_translated_physmap)) {
> > +		pv_mmu_ops.flush_tlb_others = xen_flush_tlb_others;
> > +		return;
> > +	}
> >  	pv_mmu_ops = xen_mmu_ops;
> >  
> >  	memset(dummy_mapping, 0xff, PAGE_SIZE);
> 
> Regarding this patch, the next one and the other changes to
> xen_setup_shared_info, xen_setup_mfn_list_list,
> xen_setup_vcpu_info_placement, etc: considering that the mmu related
> stuff is very different between PV and PVH guests, I wonder if it makes
> any sense to keep calling xen_init_mmu_ops on PVH.
> 
> I would introduce a new function, xen_init_pvh_mmu_ops, that sets
> pv_mmu_ops.flush_tlb_others and only calls whatever is needed for PVH
> under a new xen_pvh_pagetable_init.
> Just to give you an idea, not even compiled tested:

There is something to be said for sharing the same code path
between "old-style" PV and the new style - code coverage.

That is, the code gets tested on both platforms, and if I (or
anybody else) introduce a bug in the common PV paths it will
be immediately obvious, as the regression tests will hopefully
pick it up.

It is not nice - the low-level code ends up sprinkled with
one-offs for PVH - which mostly does _less_.

What I was thinking is to flip this around. Make the PVH paths
the default and then have something like 'if (!xen_pvh_domain())'
... the big code.

Would you be OK with this line of thinking going forward, say,
after this patchset?

Thanks!

* Re: [PATCH v13 11/19] xen/pvh: Secondary VCPU bringup (non-bootup CPUs)
  2014-01-03 19:38 ` [PATCH v13 11/19] xen/pvh: Secondary VCPU bringup (non-bootup CPUs) Konrad Rzeszutek Wilk
@ 2014-01-06 10:52   ` David Vrabel
  2014-01-06 15:03     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 38+ messages in thread
From: David Vrabel @ 2014-01-06 10:52 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	hpa, Mukesh Rathor

On 03/01/14 19:38, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> The VCPU bringup protocol follows the PV with certain twists.
> From xen/include/public/arch-x86/xen.h:
> 
> Also note that when calling DOMCTL_setvcpucontext and VCPU_initialise
> for HVM and PVH guests, not all information in this structure is updated:
> 
>  - For HVM guests, the structures read include: fpu_ctxt (if
>  VGCT_I387_VALID is set), flags, user_regs, debugreg[*]
> 
>  - PVH guests are the same as HVM guests, but additionally use ctrlreg[3] to
>  set cr3. All other fields not used should be set to 0.
> 
> This is what we do. We piggyback on the 'xen_setup_gdt' - but modify
> a bit - we need to call 'load_percpu_segment' so that 'switch_to_new_gdt'
> can load per-cpu data-structures. It has no effect on the VCPU0.
> 
> We also piggyback on the %rdi register to pass in the CPU number - so
> that when we bootup a new CPU, the cpu_bringup_and_idle will have
> passed as the first parameter the CPU number (via %rdi for 64-bit).
[...]
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1409,14 +1409,19 @@ static void __init xen_boot_params_init_edd(void)
>   * Set up the GDT and segment registers for -fstack-protector.  Until
>   * we do this, we have to be careful not to call any stack-protected
>   * function, which is most of the kernel.
> + *
> + * Note, that it is refok - because the only caller of this after init

"Note, this is __ref because..."

David

* Re: [PATCH v13] Linux Xen PVH support (v13)
  2014-01-03 19:38 [PATCH v13] Linux Xen PVH support (v13) Konrad Rzeszutek Wilk
                   ` (18 preceding siblings ...)
  2014-01-03 19:38 ` [PATCH v13 19/19] xen/pvh: Support ParaVirtualized Hardware extensions (v3) Konrad Rzeszutek Wilk
@ 2014-01-06 10:55 ` David Vrabel
  2014-01-06 14:53   ` Konrad Rzeszutek Wilk
  19 siblings, 1 reply; 38+ messages in thread
From: David Vrabel @ 2014-01-06 10:55 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini, hpa

On 03/01/14 19:38, Konrad Rzeszutek Wilk wrote:
> The patches, also available at
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git devel/pvh.v13

A minor nit with a comment but consider the complete series:

Reviewed-by: David Vrabel <david.vrabel@citrix.com>

I'm happy for this to go into 3.14.

David

* Re: [PATCH v13 08/19] xen/pvh/mmu: Use PV TLB instead of native.
  2014-01-05 19:41     ` Konrad Rzeszutek Wilk
@ 2014-01-06 11:33       ` Stefano Stabellini
  2014-01-06 14:59         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 38+ messages in thread
From: Stefano Stabellini @ 2014-01-06 11:33 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, xen-devel, linux-kernel, boris.ostrovsky,
	david.vrabel, hpa, Mukesh Rathor

On Sun, 5 Jan 2014, Konrad Rzeszutek Wilk wrote:
> On Sun, Jan 05, 2014 at 06:11:39PM +0000, Stefano Stabellini wrote:
> > On Fri, 3 Jan 2014, Konrad Rzeszutek Wilk wrote:
> > > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > 
> > > We also optimize one - the TLB flush. The native operation would
> > > needlessly IPI offline VCPUs causing extra wakeups. Using the
> > > Xen one avoids that and lets the hypervisor determine which
> > > VCPU needs the TLB flush.
> > > 
> > > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > ---
> > >  arch/x86/xen/mmu.c | 9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > > 
> > > diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> > > index 490ddb3..c1d406f 100644
> > > --- a/arch/x86/xen/mmu.c
> > > +++ b/arch/x86/xen/mmu.c
> > > @@ -2222,6 +2222,15 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
> > >  void __init xen_init_mmu_ops(void)
> > >  {
> > >  	x86_init.paging.pagetable_init = xen_pagetable_init;
> > > +
> > > +	/* Optimization - we can use the HVM one but it has no idea which
> > > +	 * VCPUs are descheduled - which means that it will needlessly IPI
> > > +	 * them. Xen knows so let it do the job.
> > > +	 */
> > > +	if (xen_feature(XENFEAT_auto_translated_physmap)) {
> > > +		pv_mmu_ops.flush_tlb_others = xen_flush_tlb_others;
> > > +		return;
> > > +	}
> > >  	pv_mmu_ops = xen_mmu_ops;
> > >  
> > >  	memset(dummy_mapping, 0xff, PAGE_SIZE);
> > 
> > Regarding this patch, the next one and the other changes to
> > xen_setup_shared_info, xen_setup_mfn_list_list,
> > xen_setup_vcpu_info_placement, etc: considering that the mmu related
> > stuff is very different between PV and PVH guests, I wonder if it makes
> > any sense to keep calling xen_init_mmu_ops on PVH.
> > 
> > I would introduce a new function, xen_init_pvh_mmu_ops, that sets
> > pv_mmu_ops.flush_tlb_others and only calls whatever is needed for PVH
> > under a new xen_pvh_pagetable_init.
> > Just to give you an idea, not even compiled tested:
> 
> There is something to be said about sharing the same code path
> that "old-style" PV is using with the new-style - code coverage.
> 
> That is the code gets tested under both platforms and if I (or
> anybody else) introduce a bug in the "common-PV-paths" it will
> be immediately obvious as hopefully the regression tests
> will pick it up.
> 
> It is not nice - as low-level code is sprinkled with the one-offs
> for the PVH - which mostly is doing _less_.

I thought you would say that. However, in this specific case the costs
exceed the benefits. Think of all the times we'll have to debug
something: we'll be staring at the code, and dozens of minutes
later we'll realize that the code we have been looking at all along is
not actually executed in PVH mode.


> What I was thinking is to flip this around. Make the PVH paths
> the default and then have something like 'if (!xen_pvh_domain())'
> ... the big code.
> 
> Would you be OK with this line of thinking going forward say
> after this patchset?
 
I am not opposed to it in principle but I don't expect that you'll be
able to improve things significantly.

* Re: [PATCH v13] Linux Xen PVH support (v13)
  2014-01-06 10:55 ` [PATCH v13] Linux Xen PVH support (v13) David Vrabel
@ 2014-01-06 14:53   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-06 14:53 UTC (permalink / raw)
  To: David Vrabel
  Cc: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini, hpa

On Mon, Jan 06, 2014 at 10:55:34AM +0000, David Vrabel wrote:
> On 03/01/14 19:38, Konrad Rzeszutek Wilk wrote:
> > The patches, also available at
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git devel/pvh.v13
> 
> A minor nit with a comment but consider the complete series:
> 
> Reviewed-by: David Vrabel <david.vrabel@citrix.com>
> 
> I'm happy for this to go into 3.14.

Woot! Thank you.
> 
> David

* Re: [PATCH v13 08/19] xen/pvh/mmu: Use PV TLB instead of native.
  2014-01-06 11:33       ` Stefano Stabellini
@ 2014-01-06 14:59         ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-06 14:59 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel, hpa,
	Mukesh Rathor

On Mon, Jan 06, 2014 at 11:33:00AM +0000, Stefano Stabellini wrote:
> On Sun, 5 Jan 2014, Konrad Rzeszutek Wilk wrote:
> > On Sun, Jan 05, 2014 at 06:11:39PM +0000, Stefano Stabellini wrote:
> > > On Fri, 3 Jan 2014, Konrad Rzeszutek Wilk wrote:
> > > > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > > 
> > > > We also optimize one - the TLB flush. The native operation would
> > > > needlessly IPI offline VCPUs causing extra wakeups. Using the
> > > > Xen one avoids that and lets the hypervisor determine which
> > > > VCPU needs the TLB flush.
> > > > 
> > > > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > > ---
> > > >  arch/x86/xen/mmu.c | 9 +++++++++
> > > >  1 file changed, 9 insertions(+)
> > > > 
> > > > diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> > > > index 490ddb3..c1d406f 100644
> > > > --- a/arch/x86/xen/mmu.c
> > > > +++ b/arch/x86/xen/mmu.c
> > > > @@ -2222,6 +2222,15 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
> > > >  void __init xen_init_mmu_ops(void)
> > > >  {
> > > >  	x86_init.paging.pagetable_init = xen_pagetable_init;
> > > > +
> > > > +	/* Optimization - we can use the HVM one but it has no idea which
> > > > +	 * VCPUs are descheduled - which means that it will needlessly IPI
> > > > +	 * them. Xen knows so let it do the job.
> > > > +	 */
> > > > +	if (xen_feature(XENFEAT_auto_translated_physmap)) {
> > > > +		pv_mmu_ops.flush_tlb_others = xen_flush_tlb_others;
> > > > +		return;
> > > > +	}
> > > >  	pv_mmu_ops = xen_mmu_ops;
> > > >  
> > > >  	memset(dummy_mapping, 0xff, PAGE_SIZE);
> > > 
> > > Regarding this patch, the next one and the other changes to
> > > xen_setup_shared_info, xen_setup_mfn_list_list,
> > > xen_setup_vcpu_info_placement, etc: considering that the mmu related
> > > stuff is very different between PV and PVH guests, I wonder if it makes
> > > any sense to keep calling xen_init_mmu_ops on PVH.
> > > 
> > > I would introduce a new function, xen_init_pvh_mmu_ops, that sets
> > > pv_mmu_ops.flush_tlb_others and only calls whatever is needed for PVH
> > > under a new xen_pvh_pagetable_init.
> > > Just to give you an idea, not even compiled tested:
> > 
> > There is something to be said about sharing the same code path
> > that "old-style" PV is using with the new-style - code coverage.
> > 
> > That is the code gets tested under both platforms and if I (or
> > anybody else) introduce a bug in the "common-PV-paths" it will
> > be immediately obvious as hopefully the regression tests
> > will pick it up.
> > 
> > It is not nice - as low-level code is sprinkled with the one-offs
> > for the PVH - which mostly is doing _less_.
> 
> I thought you would say that. However in this specific case the costs

You know me too well :-)

> exceed the benefits. Think of all the times we'll have to debug
> something, we'll be staring at the code, and several dozens of minutes
> later we'll realize that the code we have been looking at all along is
> not actually executed in PVH mode.
> 

For this specific code - that is, the shared grants and the hypercalls -
I think it needs a bit more testing to make sure suspend/resume works
well. Then this segregation can be done.

My reasoning is that there might be more code that could benefit
from this, so I could do it in one nice big patchset.

The other reason for delaying your suggestion is so that this
patchset goes into Linux without accumulating 20+ patches on top,
which would make the review more daunting.
> 
> > What I was thinking is to flip this around. Make the PVH paths
> > the default and then have something like 'if (!xen_pvh_domain())'
> > ... the big code.
> > 
> > Would you be OK with this line of thinking going forward say
> > after this patchset?
>  
> I am not opposed to it in principle but I don't expect that you'll be
> able to improve things significantly.

The end goal is to take a chainsaw to the code and cut out the
old-PV-specific parts. But that is not going to happen now - rather
in 5 years, when we are comfortable with it.

And perhaps we can even wrap some #ifdef CONFIG_XEN_PVMMU around
the old code to identify it even further.

* Re: [PATCH v13 11/19] xen/pvh: Secondary VCPU bringup (non-bootup CPUs)
  2014-01-06 10:52   ` David Vrabel
@ 2014-01-06 15:03     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 38+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-06 15:03 UTC (permalink / raw)
  To: David Vrabel
  Cc: xen-devel, linux-kernel, boris.ostrovsky, stefano.stabellini,
	hpa, Mukesh Rathor

On Mon, Jan 06, 2014 at 10:52:39AM +0000, David Vrabel wrote:
> On 03/01/14 19:38, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > 
> > The VCPU bringup protocol follows the PV with certain twists.
> > From xen/include/public/arch-x86/xen.h:
> > 
> > Also note that when calling DOMCTL_setvcpucontext and VCPU_initialise
> > for HVM and PVH guests, not all information in this structure is updated:
> > 
> >  - For HVM guests, the structures read include: fpu_ctxt (if
> >  VGCT_I387_VALID is set), flags, user_regs, debugreg[*]
> > 
> >  - PVH guests are the same as HVM guests, but additionally use ctrlreg[3] to
> >  set cr3. All other fields not used should be set to 0.
> > 
> > This is what we do. We piggyback on the 'xen_setup_gdt' - but modify
> > a bit - we need to call 'load_percpu_segment' so that 'switch_to_new_gdt'
> > can load per-cpu data-structures. It has no effect on the VCPU0.
> > 
> > We also piggyback on the %rdi register to pass in the CPU number - so
> > that when we bootup a new CPU, the cpu_bringup_and_idle will have
> > passed as the first parameter the CPU number (via %rdi for 64-bit).
> [...]
> > --- a/arch/x86/xen/enlighten.c
> > +++ b/arch/x86/xen/enlighten.c
> > @@ -1409,14 +1409,19 @@ static void __init xen_boot_params_init_edd(void)
> >   * Set up the GDT and segment registers for -fstack-protector.  Until
> >   * we do this, we have to be careful not to call any stack-protected
> >   * function, which is most of the kernel.
> > + *
> > + * Note, that it is refok - because the only caller of this after init
> 
> "Note, this is __ref because..."

Fixed.

Thank you.
> 
> David

end of thread [~2014-01-06 15:04 UTC]

Thread overview: 38+ messages
2014-01-03 19:38 [PATCH v13] Linux Xen PVH support (v13) Konrad Rzeszutek Wilk
2014-01-03 19:38 ` [PATCH v13 01/19] xen/p2m: Check for auto-xlat when doing mfn_to_local_pfn Konrad Rzeszutek Wilk
2014-01-03 19:38 ` [PATCH v13 02/19] xen/pvh/x86: Define what an PVH guest is (v3) Konrad Rzeszutek Wilk
2014-01-03 19:38 ` [PATCH v13 03/19] xen/pvh: Early bootup changes in PV code (v4) Konrad Rzeszutek Wilk
2014-01-05 17:49   ` Stefano Stabellini
2014-01-03 19:38 ` [PATCH v13 04/19] xen/pvh: Don't setup P2M tree Konrad Rzeszutek Wilk
2014-01-03 19:38 ` [PATCH v13 05/19] xen/mmu/p2m: Refactor the xen_pagetable_init code (v2) Konrad Rzeszutek Wilk
2014-01-05 17:51   ` Stefano Stabellini
2014-01-03 19:38 ` [PATCH v13 06/19] xen/mmu: Cleanup xen_pagetable_p2m_copy a bit Konrad Rzeszutek Wilk
2014-01-05 17:56   ` Stefano Stabellini
2014-01-03 19:38 ` [PATCH v13 07/19] xen/pvh: MMU changes for PVH (v2) Konrad Rzeszutek Wilk
2014-01-03 19:38 ` [PATCH v13 08/19] xen/pvh/mmu: Use PV TLB instead of native Konrad Rzeszutek Wilk
2014-01-05 18:11   ` Stefano Stabellini
2014-01-05 19:41     ` Konrad Rzeszutek Wilk
2014-01-06 11:33       ` Stefano Stabellini
2014-01-06 14:59         ` Konrad Rzeszutek Wilk
2014-01-03 19:38 ` [PATCH v13 09/19] xen/pvh: Setup up shared_info Konrad Rzeszutek Wilk
2014-01-03 19:38 ` [PATCH v13 10/19] xen/pvh: Load GDT/GS in early PV bootup code for BSP Konrad Rzeszutek Wilk
2014-01-03 19:38 ` [PATCH v13 11/19] xen/pvh: Secondary VCPU bringup (non-bootup CPUs) Konrad Rzeszutek Wilk
2014-01-06 10:52   ` David Vrabel
2014-01-06 15:03     ` Konrad Rzeszutek Wilk
2014-01-03 19:38 ` [PATCH v13 12/19] xen/pvh: Update E820 to work with PVH (v2) Konrad Rzeszutek Wilk
2014-01-03 19:38 ` [PATCH v13 13/19] xen/pvh: Piggyback on PVHVM for event channels (v2) Konrad Rzeszutek Wilk
2014-01-05 18:15   ` Stefano Stabellini
2014-01-03 19:38 ` [PATCH v13 14/19] xen/grants: Remove gnttab_max_grant_frames dependency on gnttab_init Konrad Rzeszutek Wilk
2014-01-05 18:16   ` Stefano Stabellini
2014-01-03 19:38 ` [PATCH v13 15/19] xen/grant-table: Refactor gnttab_init Konrad Rzeszutek Wilk
2014-01-05 18:18   ` Stefano Stabellini
2014-01-05 19:33     ` Konrad Rzeszutek Wilk
2014-01-03 19:38 ` [PATCH v13 16/19] xen/grant: Implement a grant frame array struct (v2) Konrad Rzeszutek Wilk
2014-01-05 18:38   ` Stefano Stabellini
2014-01-03 19:38 ` [PATCH v13 17/19] xen/pvh: Piggyback on PVHVM for grant driver (v4) Konrad Rzeszutek Wilk
2014-01-05 18:20   ` Stefano Stabellini
2014-01-03 19:38 ` [PATCH v13 18/19] xen/pvh: Piggyback on PVHVM XenBus Konrad Rzeszutek Wilk
2014-01-05 17:54   ` Stefano Stabellini
2014-01-03 19:38 ` [PATCH v13 19/19] xen/pvh: Support ParaVirtualized Hardware extensions (v3) Konrad Rzeszutek Wilk
2014-01-06 10:55 ` [PATCH v13] Linux Xen PVH support (v13) David Vrabel
2014-01-06 14:53   ` Konrad Rzeszutek Wilk
