linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v11] PVH support for Linux kernel.
@ 2013-12-17 20:51 Konrad Rzeszutek Wilk
  2013-12-17 20:51 ` [PATCH v11 01/12] xen/p2m: Check for auto-xlat when doing mfn_to_local_pfn Konrad Rzeszutek Wilk
                   ` (11 more replies)
  0 siblings, 12 replies; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-17 20:51 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich

The patches, also available at
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/pvh.v11

implements the neccessary functionality to boot a PV guest in PVH mode.

This blog has a great description of what PVH is:
http://blog.xen.org/index.php/2012/10/31/the-paravirtualization-spectrum-part-2-from-poles-to-a-spectrum/

These patches are based on v3.13-rc4 + stable/for-linus-3.13 (which should
go out Wednesday).

Changelog of v11 as compared to v10: [https://lkml.org/lkml/2013/12/12/625]:
 - Split patches in a more logical sense, squash some
 - Dropped Acked-by's from folks
 - Fleshed out descriptions

Regression wise - there are no bugs. That is if you compile/boot it with
CONFIG_XEN_PVH=y or "# CONFIG_XEN_PVH is not set" - in both cases as
either dom0 or domU there are no bugs. Also launched it as 32/64 bit
dom0 with 32/64 domU as PV or PVHVM, and along with SLES11, SLES12,
F15->F19 (32/64), OL5, OL6, RHEL5 (32/64) FreeBSD HVM, NetBSD PV without issues.

The only things needed to make this work as PVH are:

 0) Get the latest version of Xen and compile/install it.
    See http://wiki.xen.org/wiki/Compiling_Xen_From_Source for details

 1) Clone above mentioned tree

    See http://wiki.xenproject.org/wiki/Mainline_Linux_Kernel_Configs#Configuring_the_Kernel
    for details. The steps are:

	cd $HOME
	git clone  git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git linux
	cd linux
	git checkout origin/stable/pvh.v11

 2) Compile with CONFIG_XEN_PVH=y

    a) From scratch:

	make defconfig
	make menuconfig
	Processor type and features  --->  Linux guest support  --->
		 Paravirtualization layer for spinlocks
		 Xen guest support	(which will now show you:)
		 Support for running as a PVH guest (NEW)

	in case you like to edit .config, it is:

	CONFIG_HYPERVISOR_GUEST=y
	CONFIG_PARAVIRT=y
	CONFIG_PARAVIRT_GUEST=y
	CONFIG_PARAVIRT_SPINLOCKS=y
	CONFIG_XEN=y
	CONFIG_XEN_PVH=y

	You will also have to enable the block, network drivers, console, etc
	which are in different submenus.

    b). Based on your current distro.

	cp /boot/config-`uname -r` $HOME/linux/.config
	make menuconfig
	Processor type and features  --->  Linux guest support  --->
		 Support for running as a PVH guest (NEW)

 3) Launch it with 'pvh=1' in your guest config (for example):

	extra="console=hvc0 debug  kgdboc=hvc0 nokgdbroundup  initcall_debug debug"
	kernel="/mnt/lab/latest/vmlinuz"
	ramdisk="/mnt/lab/latest/initramfs.cpio.gz"
	memory=1024
	vcpus=4
	name="pvh"
	vif = [ 'mac=00:0F:4B:00:00:68, bridge=switch' ]
	vfb = [ 'vnc=1, vnclisten=0.0.0.0,vncunused=1']
	disk=['phy:/dev/sdb1,xvda,w']
	pvh=1
	on_reboot="preserve"
	on_crash="preserve"
	on_poweroff="preserve"

    using 'xl'. Xend 'xm' does not have PVH support.

It will bootup as a normal PV guest, but 'xen-detect' will report it as an HVM
guest.

The functionality that is turned off is:
 - VCPU hotplug. You can try it but it should not allow you to do it.
   So 'echo 0 > /sys/bus/cpu/devices/cpu4/online' will error out.


Items that have not been tested extensively or at all:
  - Migration (xl save && xl restore for example).

  - 32-bit guests (won't even present you with a CONFIG_XEN_PVH option)

  - PCI passthrough

  - Running it in dom0 mode (as the patches for that are not yet in Xen upstream).
    If you want to try that, you can merge/pull Mukesh's branch:

	cd $HOME/xen
	git pull git://oss.oracle.com/git/mrathor/xen.git dom0pvh-v6

    .. and use this bootup parameter ("dom0pvh=1"). Remember to recompile
    and install the new version of Xen. This patchset
    does not contain the patches neccessary to setup guests - but I can
    create one easily enough. 

  - Memory ballooning
.
  - Multiple VBDs, NICs, etc.

If you encounter errors, please email with the following (pls note that the
guest config has 'on_reboot="preserve", on_crash="preserve" - which you should
have in your guest config to contain the memory of the guest):

 a) xl dmesg
 b) xl list
 c) xenctx -s $HOME/linux/System.map -f -a -C <domain id>
    [xenctx is sometimes found in  /usr/lib/xen/bin/xenctx ]
 d) the console output from the guest
 e) Anything else you can think off.

Stash away your vmlinux file (it is too big to send via email) - as I might
need it later on.

That is it!

Thank you!

 arch/x86/include/asm/xen/page.h    |   7 ++-
 arch/x86/xen/Kconfig               |   8 +++
 arch/x86/xen/enlighten.c           | 112 +++++++++++++++++++++++++++++--------
 arch/x86/xen/irq.c                 |   5 +-
 arch/x86/xen/mmu.c                 |  35 ++++++++++--
 arch/x86/xen/p2m.c                 |  12 +++-
 arch/x86/xen/setup.c               |  37 +++++++++---
 arch/x86/xen/smp.c                 |  49 ++++++++++------
 arch/x86/xen/xen-head.S            |   8 ++-
 arch/x86/xen/xen-ops.h             |   1 +
 drivers/xen/cpu_hotplug.c          |   4 +-
 drivers/xen/events.c               |   5 ++
 drivers/xen/gntdev.c               |   2 +-
 drivers/xen/grant-table.c          |  80 +++++++++++++++++++++++---
 drivers/xen/platform-pci.c         |   2 +-
 drivers/xen/xenbus/xenbus_client.c |   3 +-
 include/xen/grant_table.h          |   2 +-
 include/xen/xen.h                  |   8 +++
 18 files changed, 312 insertions(+), 68 deletions(-)


Konrad Rzeszutek Wilk (2):
      xen/pvh: Don't setup P2M tree.
      xen/pvh: Piggyback on PVHVM for grant driver.

Mukesh Rathor (10):
      xen/p2m: Check for auto-xlat when doing mfn_to_local_pfn.
      xen/pvh: Define what an PVH guest is.
      xen/pvh: Early bootup changes in PV code.
      xen/pvh: Update E820 to work with PVH
      xen/pvh: Load GDT/GS in early PV bootup code for BSP.
      xen/pvh: Secondary VCPU bringup (non-bootup CPUs)
      xen/pvh: MMU changes for PVH
      xen/pvh: Piggyback on PVHVM XenBus and event channels for PVH.
      xen/pvh: Disable PV code that does not work with PVH.
      xen/pvh: Support ParaVirtualized Hardware extensions.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v11 01/12] xen/p2m: Check for auto-xlat when doing mfn_to_local_pfn.
  2013-12-17 20:51 [PATCH v11] PVH support for Linux kernel Konrad Rzeszutek Wilk
@ 2013-12-17 20:51 ` Konrad Rzeszutek Wilk
  2013-12-18 14:10   ` [Xen-devel] " Stefano Stabellini
  2013-12-17 20:51 ` [PATCH v11 02/12] xen/pvh: Define what an PVH guest is Konrad Rzeszutek Wilk
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-17 20:51 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich
  Cc: Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

Most of the functions in page.h are prefaced with
	if (xen_feature(XENFEAT_auto_translated_physmap))
		return mfn;

Except the mfn_to_local_pfn. At a first sight, the function
should work without this patch - as the 'mfn_to_mfn' has
a similar check. But there are no such check in the
'get_phys_to_machine' function - so we would crash in there.

This fixes it by following the convention of having the
check for auto-xlat in these static functions.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/include/asm/xen/page.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index b913915..4a092cc 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -167,7 +167,12 @@ static inline xpaddr_t machine_to_phys(xmaddr_t machine)
  */
 static inline unsigned long mfn_to_local_pfn(unsigned long mfn)
 {
-	unsigned long pfn = mfn_to_pfn(mfn);
+	unsigned long pfn;
+
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return mfn;
+
+	pfn = mfn_to_pfn(mfn);
 	if (get_phys_to_machine(pfn) != mfn)
 		return -1; /* force !pfn_valid() */
 	return pfn;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v11 02/12] xen/pvh: Define what an PVH guest is.
  2013-12-17 20:51 [PATCH v11] PVH support for Linux kernel Konrad Rzeszutek Wilk
  2013-12-17 20:51 ` [PATCH v11 01/12] xen/p2m: Check for auto-xlat when doing mfn_to_local_pfn Konrad Rzeszutek Wilk
@ 2013-12-17 20:51 ` Konrad Rzeszutek Wilk
  2013-12-18 14:22   ` [Xen-devel] " Stefano Stabellini
  2013-12-17 20:51 ` [PATCH v11 03/12] xen/pvh: Early bootup changes in PV code Konrad Rzeszutek Wilk
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-17 20:51 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich
  Cc: Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

Which is a PV guest with auto page translation enabled
and with vector callback. It is a cross between PVHVM and PV.

The Xen side defines PVH as (from docs/misc/pvh-readme.txt,
with modifications):

"* the guest uses auto translate:
 - p2m is managed by Xen
 - pagetables are owned by the guest
 - mmu_update hypercall not available
* it uses event callback and not vlapic emulation,
* IDT is native, so set_trap_table hcall is also N/A for a PVH guest.

For a full list of hcalls supported for PVH, see pvh_hypercall64_table
in arch/x86/hvm/hvm.c in xen.  From the ABI prespective, it's mostly a
PV guest with auto translate, although it does use hvm_op for setting
callback vector."

We don't have yet a Kconfig entry setup as we do not
have all the parts ready for it - so we piggyback
on the PVHVM config option. This scaffolding will
be removed later.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 include/xen/xen.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/xen/xen.h b/include/xen/xen.h
index a74d436..1d6a237 100644
--- a/include/xen/xen.h
+++ b/include/xen/xen.h
@@ -29,4 +29,13 @@ extern enum xen_domain_type xen_domain_type;
 #define xen_initial_domain()	(0)
 #endif	/* CONFIG_XEN_DOM0 */
 
+#ifdef CONFIG_XEN_PVHVM
+/* Temporarily under XEN_PVHVM, but will be under CONFIG_XEN_PVH */
+#include <xen/features.h>
+#define xen_pvh_domain() (xen_pv_domain() && \
+			  xen_feature(XENFEAT_auto_translated_physmap) && \
+			  xen_have_vector_callback)
+#else
+#define xen_pvh_domain()	(0)
+#endif
 #endif	/* _XEN_XEN_H */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v11 03/12] xen/pvh: Early bootup changes in PV code.
  2013-12-17 20:51 [PATCH v11] PVH support for Linux kernel Konrad Rzeszutek Wilk
  2013-12-17 20:51 ` [PATCH v11 01/12] xen/p2m: Check for auto-xlat when doing mfn_to_local_pfn Konrad Rzeszutek Wilk
  2013-12-17 20:51 ` [PATCH v11 02/12] xen/pvh: Define what an PVH guest is Konrad Rzeszutek Wilk
@ 2013-12-17 20:51 ` Konrad Rzeszutek Wilk
  2013-12-18 14:27   ` [Xen-devel] " Stefano Stabellini
  2013-12-17 20:51 ` [PATCH v11 04/12] xen/pvh: Don't setup P2M tree Konrad Rzeszutek Wilk
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-17 20:51 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich
  Cc: Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

In the bootup code for PVH we can trap cpuid via vmexit, so don't
need to use emulated prefix call. We also check for vector callback
early on, as it is a required feature. PVH also runs at default kernel
IOPL.

Finally, pure PV settings are moved to a separate function that are
only called for pure PV, ie, pv with pvmmu. They are also #ifdef
with CONFIG_XEN_PVMMU.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/enlighten.c | 63 +++++++++++++++++++++++++++++++++---------------
 arch/x86/xen/setup.c     | 18 +++++++++-----
 2 files changed, 56 insertions(+), 25 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index fa6ade7..3b21c82 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -46,6 +46,7 @@
 #include <xen/hvm.h>
 #include <xen/hvc-console.h>
 #include <xen/acpi.h>
+#include <xen/features.h>
 
 #include <asm/paravirt.h>
 #include <asm/apic.h>
@@ -262,8 +263,9 @@ static void __init xen_banner(void)
 	struct xen_extraversion extra;
 	HYPERVISOR_xen_version(XENVER_extraversion, &extra);
 
-	printk(KERN_INFO "Booting paravirtualized kernel on %s\n",
-	       pv_info.name);
+	pr_info("Booting paravirtualized kernel %son %s\n",
+		xen_feature(XENFEAT_auto_translated_physmap) ?
+			"with PVH extensions " : "", pv_info.name);
 	printk(KERN_INFO "Xen version: %d.%d%s%s\n",
 	       version >> 16, version & 0xffff, extra.extraversion,
 	       xen_feature(XENFEAT_mmu_pt_update_preserve_ad) ? " (preserve-AD)" : "");
@@ -331,12 +333,15 @@ static void xen_cpuid(unsigned int *ax, unsigned int *bx,
 		break;
 	}
 
-	asm(XEN_EMULATE_PREFIX "cpuid"
-		: "=a" (*ax),
-		  "=b" (*bx),
-		  "=c" (*cx),
-		  "=d" (*dx)
-		: "0" (*ax), "2" (*cx));
+	if (xen_pvh_domain())
+		native_cpuid(ax, bx, cx, dx);
+	else
+		asm(XEN_EMULATE_PREFIX "cpuid"
+			: "=a" (*ax),
+			"=b" (*bx),
+			"=c" (*cx),
+			"=d" (*dx)
+			: "0" (*ax), "2" (*cx));
 
 	*bx &= maskebx;
 	*cx &= maskecx;
@@ -1420,6 +1425,19 @@ static void __init xen_setup_stackprotector(void)
 	pv_cpu_ops.load_gdt = xen_load_gdt;
 }
 
+static void __init xen_pvh_early_guest_init(void)
+{
+	if (xen_feature(XENFEAT_hvm_callback_vector))
+		xen_have_vector_callback = 1;
+
+#ifdef CONFIG_X86_32
+	if (xen_feature(XENFEAT_auto_translated_physmap)) {
+		xen_raw_printk("ERROR: 32bit PVH guests are not supported\n");
+		BUG();
+	}
+#endif
+}
+
 /* First C function to be called on Xen boot */
 asmlinkage void __init xen_start_kernel(void)
 {
@@ -1431,13 +1449,18 @@ asmlinkage void __init xen_start_kernel(void)
 
 	xen_domain_type = XEN_PV_DOMAIN;
 
+	xen_setup_features();
+	xen_pvh_early_guest_init();
 	xen_setup_machphys_mapping();
 
 	/* Install Xen paravirt ops */
 	pv_info = xen_info;
 	pv_init_ops = xen_init_ops;
-	pv_cpu_ops = xen_cpu_ops;
 	pv_apic_ops = xen_apic_ops;
+	if (xen_pvh_domain())
+		pv_cpu_ops.cpuid = xen_cpuid;
+	else
+		pv_cpu_ops = xen_cpu_ops;
 
 	x86_init.resources.memory_setup = xen_memory_setup;
 	x86_init.oem.arch_setup = xen_arch_setup;
@@ -1469,8 +1492,6 @@ asmlinkage void __init xen_start_kernel(void)
 	/* Work out if we support NX */
 	x86_configure_nx();
 
-	xen_setup_features();
-
 	/* Get mfn list */
 	if (!xen_feature(XENFEAT_auto_translated_physmap))
 		xen_build_dynamic_phys_to_machine();
@@ -1548,14 +1569,18 @@ asmlinkage void __init xen_start_kernel(void)
 	/* set the limit of our address space */
 	xen_reserve_top();
 
-	/* We used to do this in xen_arch_setup, but that is too late on AMD
-	 * were early_cpu_init (run before ->arch_setup()) calls early_amd_init
-	 * which pokes 0xcf8 port.
-	 */
-	set_iopl.iopl = 1;
-	rc = HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl);
-	if (rc != 0)
-		xen_raw_printk("physdev_op failed %d\n", rc);
+	/* PVH: runs at default kernel iopl of 0 */
+	if (!xen_pvh_domain()) {
+		/*
+		 * We used to do this in xen_arch_setup, but that is too late
+		 * on AMD were early_cpu_init (run before ->arch_setup()) calls
+		 * early_amd_init which pokes 0xcf8 port.
+		 */
+		set_iopl.iopl = 1;
+		rc = HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl);
+		if (rc != 0)
+			xen_raw_printk("physdev_op failed %d\n", rc);
+	}
 
 #ifdef CONFIG_X86_32
 	/* set up basic CPUID stuff */
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 68c054f..2137c51 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -563,16 +563,13 @@ void xen_enable_nmi(void)
 		BUG();
 #endif
 }
-void __init xen_arch_setup(void)
+void __init xen_pvmmu_arch_setup(void)
 {
-	xen_panic_handler_init();
-
 	HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_4gb_segments);
 	HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_writable_pagetables);
 
-	if (!xen_feature(XENFEAT_auto_translated_physmap))
-		HYPERVISOR_vm_assist(VMASST_CMD_enable,
-				     VMASST_TYPE_pae_extended_cr3);
+	HYPERVISOR_vm_assist(VMASST_CMD_enable,
+			     VMASST_TYPE_pae_extended_cr3);
 
 	if (register_callback(CALLBACKTYPE_event, xen_hypervisor_callback) ||
 	    register_callback(CALLBACKTYPE_failsafe, xen_failsafe_callback))
@@ -581,6 +578,15 @@ void __init xen_arch_setup(void)
 	xen_enable_sysenter();
 	xen_enable_syscall();
 	xen_enable_nmi();
+}
+
+/* This function is not called for HVM domains */
+void __init xen_arch_setup(void)
+{
+	xen_panic_handler_init();
+	if (!xen_feature(XENFEAT_auto_translated_physmap))
+		xen_pvmmu_arch_setup();
+
 #ifdef CONFIG_ACPI
 	if (!(xen_start_info->flags & SIF_INITDOMAIN)) {
 		printk(KERN_INFO "ACPI in unprivileged domain disabled\n");
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v11 04/12] xen/pvh: Don't setup P2M tree.
  2013-12-17 20:51 [PATCH v11] PVH support for Linux kernel Konrad Rzeszutek Wilk
                   ` (2 preceding siblings ...)
  2013-12-17 20:51 ` [PATCH v11 03/12] xen/pvh: Early bootup changes in PV code Konrad Rzeszutek Wilk
@ 2013-12-17 20:51 ` Konrad Rzeszutek Wilk
  2013-12-18 14:39   ` [Xen-devel] " Stefano Stabellini
  2013-12-17 20:51 ` [PATCH v11 05/12] xen/pvh: Update E820 to work with PVH Konrad Rzeszutek Wilk
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-17 20:51 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich
  Cc: Konrad Rzeszutek Wilk

P2M is not available for PVH. Fortunatly for us the
P2M code already has mostly the support for auto-xlat guest thanks to
commit 3d24bbd7dddbea54358a9795abaf051b0f18973c
"grant-table: call set_phys_to_machine after mapping grant refs"
which: "
introduces set_phys_to_machine calls for auto_translated guests
(even on x86) in gnttab_map_refs and gnttab_unmap_refs.
translated by swiotlb-xen... " so we don't need to muck much.

But we still have to inhibit the building of the P2M tree.
That had been done in the past by not calling
xen_build_dynamic_phys_to_machine (which setups the P2M tree
and gives us virtual address to access them). But we are missing
a check for xen_build_mfn_list_list - which was continuing to setup
the P2M tree and would blow up at trying to get the virtual
address of p2m_missing (which would have been setup by
xen_build_dynamic_phys_to_machine).

Hence a check is needed to not call xen_build_mfn_list_list when
running in auto-xlat mode.

Instead of replicating the check for auto-xlat in enlighten.c
do it in the p2m.c code. The reason is that the xen_build_mfn_list_list
is called also in xen_arch_post_suspend without any checks for
auto-xlat. So for PVH or PV with auto-xlat - we would needlessly
allocate space for an P2M tree.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/enlighten.c |  3 +--
 arch/x86/xen/p2m.c       | 12 ++++++++++--
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 3b21c82..c7341d0 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1493,8 +1493,7 @@ asmlinkage void __init xen_start_kernel(void)
 	x86_configure_nx();
 
 	/* Get mfn list */
-	if (!xen_feature(XENFEAT_auto_translated_physmap))
-		xen_build_dynamic_phys_to_machine();
+	xen_build_dynamic_phys_to_machine();
 
 	/*
 	 * Set up kernel GDT and segment registers, mainly so that
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 2ae8699..fb7ee0a 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -280,6 +280,9 @@ void __ref xen_build_mfn_list_list(void)
 {
 	unsigned long pfn;
 
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return;
+
 	/* Pre-initialize p2m_top_mfn to be completely missing */
 	if (p2m_top_mfn == NULL) {
 		p2m_mid_missing_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
@@ -346,10 +349,15 @@ void xen_setup_mfn_list_list(void)
 /* Set up p2m_top to point to the domain-builder provided p2m pages */
 void __init xen_build_dynamic_phys_to_machine(void)
 {
-	unsigned long *mfn_list = (unsigned long *)xen_start_info->mfn_list;
-	unsigned long max_pfn = min(MAX_DOMAIN_PAGES, xen_start_info->nr_pages);
+	unsigned long *mfn_list;
+	unsigned long max_pfn;
 	unsigned long pfn;
 
+	 if (xen_feature(XENFEAT_auto_translated_physmap))
+		return;
+
+	mfn_list = (unsigned long *)xen_start_info->mfn_list;
+	max_pfn = min(MAX_DOMAIN_PAGES, xen_start_info->nr_pages);
 	xen_max_p2m_pfn = max_pfn;
 
 	p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v11 05/12] xen/pvh: Update E820 to work with PVH
  2013-12-17 20:51 [PATCH v11] PVH support for Linux kernel Konrad Rzeszutek Wilk
                   ` (3 preceding siblings ...)
  2013-12-17 20:51 ` [PATCH v11 04/12] xen/pvh: Don't setup P2M tree Konrad Rzeszutek Wilk
@ 2013-12-17 20:51 ` Konrad Rzeszutek Wilk
  2013-12-18 18:25   ` [Xen-devel] " Stefano Stabellini
  2013-12-17 20:51 ` [PATCH v11 06/12] xen/pvh: Load GDT/GS in early PV bootup code for BSP Konrad Rzeszutek Wilk
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-17 20:51 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich
  Cc: Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

In xen_add_extra_mem() we can skip updating P2M as it's managed
by Xen. PVH maps the entire IO space, but only RAM pages need
to be repopulated.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/setup.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 2137c51..f93bca1 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -27,6 +27,7 @@
 #include <xen/interface/memory.h>
 #include <xen/interface/physdev.h>
 #include <xen/features.h>
+#include "mmu.h"
 #include "xen-ops.h"
 #include "vdso.h"
 
@@ -81,6 +82,9 @@ static void __init xen_add_extra_mem(u64 start, u64 size)
 
 	memblock_reserve(start, size);
 
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return;
+
 	xen_max_p2m_pfn = PFN_DOWN(start + size);
 	for (pfn = PFN_DOWN(start); pfn < xen_max_p2m_pfn; pfn++) {
 		unsigned long mfn = pfn_to_mfn(pfn);
@@ -103,6 +107,7 @@ static unsigned long __init xen_do_chunk(unsigned long start,
 		.domid        = DOMID_SELF
 	};
 	unsigned long len = 0;
+	int xlated_phys = xen_feature(XENFEAT_auto_translated_physmap);
 	unsigned long pfn;
 	int ret;
 
@@ -116,7 +121,7 @@ static unsigned long __init xen_do_chunk(unsigned long start,
 				continue;
 			frame = mfn;
 		} else {
-			if (mfn != INVALID_P2M_ENTRY)
+			if (!xlated_phys && mfn != INVALID_P2M_ENTRY)
 				continue;
 			frame = pfn;
 		}
@@ -154,6 +159,13 @@ static unsigned long __init xen_do_chunk(unsigned long start,
 static unsigned long __init xen_release_chunk(unsigned long start,
 					      unsigned long end)
 {
+	/*
+	 * Xen already ballooned out the E820 non RAM regions for us
+	 * and set them up properly in EPT.
+	 */
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return end - start;
+
 	return xen_do_chunk(start, end, true);
 }
 
@@ -222,6 +234,9 @@ static void __init xen_set_identity_and_release_chunk(
 	 * (except for the ISA region which must be 1:1 mapped) to
 	 * release the refcounts (in Xen) on the original frames.
 	 */
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		goto skip;
+
 	for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++) {
 		pte_t pte = __pte_ma(0);
 
@@ -231,7 +246,7 @@ static void __init xen_set_identity_and_release_chunk(
 		(void)HYPERVISOR_update_va_mapping(
 			(unsigned long)__va(pfn << PAGE_SHIFT), pte, 0);
 	}
-
+skip:
 	if (start_pfn < nr_pages)
 		*released += xen_release_chunk(
 			start_pfn, min(end_pfn, nr_pages));
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v11 06/12] xen/pvh: Load GDT/GS in early PV bootup code for BSP.
  2013-12-17 20:51 [PATCH v11] PVH support for Linux kernel Konrad Rzeszutek Wilk
                   ` (4 preceding siblings ...)
  2013-12-17 20:51 ` [PATCH v11 05/12] xen/pvh: Update E820 to work with PVH Konrad Rzeszutek Wilk
@ 2013-12-17 20:51 ` Konrad Rzeszutek Wilk
  2013-12-17 20:51 ` [PATCH v11 07/12] xen/pvh: Secondary VCPU bringup (non-bootup CPUs) Konrad Rzeszutek Wilk
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-17 20:51 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich
  Cc: Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

During early bootup we start life using the Xen provided
GDT, which means that we are running with %cs segment set
to FLAT_KERNEL_CS (FLAT_RING3_CS64 0xe033, GDT index 261).

But for PVH we want to be use HVM type mechanism for
segment operations. As such we need to switch to the HVM
one and also reload ourselves with the __KERNEL_CS:eip
to run in the proper GDT and segment.

For HVM this is usually done in 'secondary_startup_64' in
(head_64.S) but since we are not taking that bootup
path (we start in PV - xen_start_kernel) we need to do
that in the early PV bootup paths.

For good measure we also zero out the %fs, %ds, and %es
(not strictly needed as Xen has already cleared them
for us). The %gs is loaded by 'switch_to_new_gdt'.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/enlighten.c | 39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index c7341d0..0d13652 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1413,8 +1413,43 @@ static void __init xen_boot_params_init_edd(void)
  * we do this, we have to be careful not to call any stack-protected
  * function, which is most of the kernel.
  */
-static void __init xen_setup_stackprotector(void)
+static void __init xen_setup_gdt(void)
 {
+	if (xen_feature(XENFEAT_auto_translated_physmap)) {
+#ifdef CONFIG_X86_64
+		unsigned long dummy;
+
+		switch_to_new_gdt(0); /* GDT and GS set */
+
+		/* We are switching of the Xen provided GDT to our HVM mode
+		 * GDT. The new GDT has  __KERNEL_CS with CS.L = 1
+		 * and we are jumping to reload it.
+		 */
+		asm volatile ("pushq %0\n"
+			      "leaq 1f(%%rip),%0\n"
+			      "pushq %0\n"
+			      "lretq\n"
+			      "1:\n"
+			      : "=&r" (dummy) : "0" (__KERNEL_CS));
+
+		/*
+		 * While not needed, we also set the %es, %ds, and %fs
+		 * to zero. We don't care about %ss as it is NULL.
+		 * Strictly speaking this is not needed as Xen zeros those
+		 * out (and also MSR_FS_BASE, MSR_GS_BASE, MSR_KERNEL_GS_BASE)
+		 *
+		 * Linux zeros them in cpu_init() and in secondary_startup_64
+		 * (for BSP).
+		 */
+		loadsegment(es, 0);
+		loadsegment(ds, 0);
+		loadsegment(fs, 0);
+#else
+		/* PVH: TODO Implement. */
+		BUG();
+#endif
+		return;
+	}
 	pv_cpu_ops.write_gdt_entry = xen_write_gdt_entry_boot;
 	pv_cpu_ops.load_gdt = xen_load_gdt_boot;
 
@@ -1499,7 +1534,7 @@ asmlinkage void __init xen_start_kernel(void)
 	 * Set up kernel GDT and segment registers, mainly so that
 	 * -fstack-protector code can be executed.
 	 */
-	xen_setup_stackprotector();
+	xen_setup_gdt();
 
 	xen_init_irq_ops();
 	xen_init_cpuid_mask();
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v11 07/12] xen/pvh: Secondary VCPU bringup (non-bootup CPUs)
  2013-12-17 20:51 [PATCH v11] PVH support for Linux kernel Konrad Rzeszutek Wilk
                   ` (5 preceding siblings ...)
  2013-12-17 20:51 ` [PATCH v11 06/12] xen/pvh: Load GDT/GS in early PV bootup code for BSP Konrad Rzeszutek Wilk
@ 2013-12-17 20:51 ` Konrad Rzeszutek Wilk
  2013-12-17 20:51 ` [PATCH v11 08/12] xen/pvh: MMU changes for PVH Konrad Rzeszutek Wilk
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-17 20:51 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich
  Cc: Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

The VCPU bringup protocol follows the PV with certain twists.
>From xen/include/public/arch-x86/xen.h:

Also note that when calling DOMCTL_setvcpucontext and VCPU_initialise
for HVM and PVH guests, not all information in this structure is updated:

 - For HVM guests, the structures read include: fpu_ctxt (if
 VGCT_I387_VALID is set), flags, user_regs, debugreg[*]

 - PVH guests are the same as HVM guests, but additionally use ctrlreg[3] to
 set cr3. All other fields not used should be set to 0.

This is what we do. We piggyback on the 'xen_setup_gdt' - but modify
a bit - we need to call 'load_percpu_segment' so that 'switch_to_new_gdt'
can load per-cpu data-structures. It has no effect on the VCPU0.

We also piggyback on the %rdi register to pass in the CPU number - so
that when we bootup a new CPU, the cpu_bringup_and_idle will have
passed as the first parameter the CPU number (via %rdi for 64-bit).

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/enlighten.c |  7 ++++---
 arch/x86/xen/smp.c       | 49 ++++++++++++++++++++++++++++++++----------------
 arch/x86/xen/xen-ops.h   |  1 +
 3 files changed, 38 insertions(+), 19 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 0d13652..e420613 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1413,13 +1413,14 @@ static void __init xen_boot_params_init_edd(void)
  * we do this, we have to be careful not to call any stack-protected
  * function, which is most of the kernel.
  */
-static void __init xen_setup_gdt(void)
+void xen_setup_gdt(int cpu)
 {
 	if (xen_feature(XENFEAT_auto_translated_physmap)) {
 #ifdef CONFIG_X86_64
 		unsigned long dummy;
 
-		switch_to_new_gdt(0); /* GDT and GS set */
+		load_percpu_segment(cpu); /* We need to access per-cpu area */
+		switch_to_new_gdt(cpu); /* GDT and GS set */
 
 		/* We are switching of the Xen provided GDT to our HVM mode
 		 * GDT. The new GDT has  __KERNEL_CS with CS.L = 1
@@ -1534,7 +1535,7 @@ asmlinkage void __init xen_start_kernel(void)
 	 * Set up kernel GDT and segment registers, mainly so that
 	 * -fstack-protector code can be executed.
 	 */
-	xen_setup_gdt();
+	xen_setup_gdt(0);
 
 	xen_init_irq_ops();
 	xen_init_cpuid_mask();
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index c36b325..5e46190 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -73,9 +73,11 @@ static void cpu_bringup(void)
 	touch_softlockup_watchdog();
 	preempt_disable();
 
-	xen_enable_sysenter();
-	xen_enable_syscall();
-
+	/* PVH runs in ring 0 and allows us to do native syscalls. Yay! */
+	if (!xen_feature(XENFEAT_supervisor_mode_kernel)) {
+		xen_enable_sysenter();
+		xen_enable_syscall();
+	}
 	cpu = smp_processor_id();
 	smp_store_cpu_info(cpu);
 	cpu_data(cpu).x86_max_cores = 1;
@@ -97,8 +99,14 @@ static void cpu_bringup(void)
 	wmb();			/* make sure everything is out */
 }
 
-static void cpu_bringup_and_idle(void)
+/* Note: cpu parameter is only relevant for PVH */
+static void cpu_bringup_and_idle(int cpu)
 {
+#ifdef CONFIG_X86_64
+	if (xen_feature(XENFEAT_auto_translated_physmap) &&
+	    xen_feature(XENFEAT_supervisor_mode_kernel))
+		xen_setup_gdt(cpu);
+#endif
 	cpu_bringup();
 	cpu_startup_entry(CPUHP_ONLINE);
 }
@@ -274,9 +282,10 @@ static void __init xen_smp_prepare_boot_cpu(void)
 	native_smp_prepare_boot_cpu();
 
 	if (xen_pv_domain()) {
-		/* We've switched to the "real" per-cpu gdt, so make sure the
-		   old memory can be recycled */
-		make_lowmem_page_readwrite(xen_initial_gdt);
+		if (!xen_feature(XENFEAT_writable_page_tables))
+			/* We've switched to the "real" per-cpu gdt, so make
+			 * sure the old memory can be recycled. */
+			make_lowmem_page_readwrite(xen_initial_gdt);
 
 #ifdef CONFIG_X86_32
 		/*
@@ -360,22 +369,21 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 
 	gdt = get_cpu_gdt_table(cpu);
 
-	ctxt->flags = VGCF_IN_KERNEL;
-	ctxt->user_regs.ss = __KERNEL_DS;
 #ifdef CONFIG_X86_32
+	/* Note: PVH is not yet supported on x86_32. */
 	ctxt->user_regs.fs = __KERNEL_PERCPU;
 	ctxt->user_regs.gs = __KERNEL_STACK_CANARY;
-#else
-	ctxt->gs_base_kernel = per_cpu_offset(cpu);
 #endif
 	ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
 
 	memset(&ctxt->fpu_ctxt, 0, sizeof(ctxt->fpu_ctxt));
 
-	{
+	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+		ctxt->flags = VGCF_IN_KERNEL;
 		ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
 		ctxt->user_regs.ds = __USER_DS;
 		ctxt->user_regs.es = __USER_DS;
+		ctxt->user_regs.ss = __KERNEL_DS;
 
 		xen_copy_trap_info(ctxt->trap_ctxt);
 
@@ -396,18 +404,27 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle)
 #ifdef CONFIG_X86_32
 		ctxt->event_callback_cs     = __KERNEL_CS;
 		ctxt->failsafe_callback_cs  = __KERNEL_CS;
+#else
+		ctxt->gs_base_kernel = per_cpu_offset(cpu);
 #endif
 		ctxt->event_callback_eip    =
 					(unsigned long)xen_hypervisor_callback;
 		ctxt->failsafe_callback_eip =
 					(unsigned long)xen_failsafe_callback;
+		ctxt->user_regs.cs = __KERNEL_CS;
+		per_cpu(xen_cr3, cpu) = __pa(swapper_pg_dir);
+#ifdef CONFIG_X86_32
 	}
-	ctxt->user_regs.cs = __KERNEL_CS;
+#else
+	} else
+		/* N.B. The user_regs.eip (cpu_bringup_and_idle) is called with
+		 * %rdi having the cpu number - which means are passing in
+		 * as the first parameter the cpu. Subtle!
+		 */
+		ctxt->user_regs.rdi = cpu;
+#endif
 	ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs);
-
-	per_cpu(xen_cr3, cpu) = __pa(swapper_pg_dir);
 	ctxt->ctrlreg[3] = xen_pfn_to_cr3(virt_to_mfn(swapper_pg_dir));
-
 	if (HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, ctxt))
 		BUG();
 
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 95f8c61..9059c24 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -123,4 +123,5 @@ __visible void xen_adjust_exception_frame(void);
 
 extern int xen_panic_handler_init(void);
 
+void xen_setup_gdt(int cpu);
 #endif /* XEN_OPS_H */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v11 08/12] xen/pvh: MMU changes for PVH
  2013-12-17 20:51 [PATCH v11] PVH support for Linux kernel Konrad Rzeszutek Wilk
                   ` (6 preceding siblings ...)
  2013-12-17 20:51 ` [PATCH v11 07/12] xen/pvh: Secondary VCPU bringup (non-bootup CPUs) Konrad Rzeszutek Wilk
@ 2013-12-17 20:51 ` Konrad Rzeszutek Wilk
  2013-12-18 14:48   ` [Xen-devel] " Stefano Stabellini
  2013-12-17 20:51 ` [PATCH v11 09/12] xen/pvh: Piggyback on PVHVM XenBus and event channels " Konrad Rzeszutek Wilk
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-17 20:51 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich
  Cc: Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

.. which are surprinsingly small compared to the amount for PV code.

PVH uses mostly native mmu ops, we leave the generic (native_*) for
the majority and just overwrite the baremetal with the ones we need.

We also optimize one - the TLB flush. The native operation would
needlessly IPI offline VCPUs causing extra wakeups. Using the
Xen one avoids that and lets the hypervisor determine which
VCPU needs the TLB flush.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/mmu.c | 35 +++++++++++++++++++++++++++++++----
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index ce563be..77b7622 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -74,6 +74,7 @@
 #include <xen/interface/version.h>
 #include <xen/interface/memory.h>
 #include <xen/hvc-console.h>
+#include <xen/balloon.h>
 
 #include "multicalls.h"
 #include "mmu.h"
@@ -1207,6 +1208,8 @@ static void __init xen_pagetable_init(void)
 #endif
 	paging_init();
 	xen_setup_shared_info();
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return;
 #ifdef CONFIG_X86_64
 	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
 		unsigned long new_mfn_list;
@@ -1556,6 +1559,10 @@ static void __init xen_set_pte_init(pte_t *ptep, pte_t pte)
 static void pin_pagetable_pfn(unsigned cmd, unsigned long pfn)
 {
 	struct mmuext_op op;
+
+	if (xen_feature(XENFEAT_writable_page_tables))
+		return;
+
 	op.cmd = cmd;
 	op.arg1.mfn = pfn_to_mfn(pfn);
 	if (HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF))
@@ -1753,6 +1760,10 @@ static void set_page_prot_flags(void *addr, pgprot_t prot, unsigned long flags)
 	unsigned long pfn = __pa(addr) >> PAGE_SHIFT;
 	pte_t pte = pfn_pte(pfn, prot);
 
+	/* recall for PVH, page tables are native. */
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return;
+
 	if (HYPERVISOR_update_va_mapping((unsigned long)addr, pte, flags))
 		BUG();
 }
@@ -1834,6 +1845,9 @@ static void convert_pfn_mfn(void *v)
 	pte_t *pte = v;
 	int i;
 
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return;
+
 	/* All levels are converted the same way, so just treat them
 	   as ptes. */
 	for (i = 0; i < PTRS_PER_PTE; i++)
@@ -1863,6 +1877,7 @@ static void __init check_pt_base(unsigned long *pt_base, unsigned long *pt_end,
  * but that's enough to get __va working.  We need to fill in the rest
  * of the physical mapping once some sort of allocator has been set
  * up.
+ * NOTE: for PVH, the page tables are native.
  */
 void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 {
@@ -1940,10 +1955,13 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 	 * structure to attach it to, so make sure we just set kernel
 	 * pgd.
 	 */
-	xen_mc_batch();
-	__xen_write_cr3(true, __pa(init_level4_pgt));
-	xen_mc_issue(PARAVIRT_LAZY_CPU);
-
+	if (xen_feature(XENFEAT_writable_page_tables)) {
+		native_write_cr3(__pa(init_level4_pgt));
+	} else {
+		xen_mc_batch();
+		__xen_write_cr3(true, __pa(init_level4_pgt));
+		xen_mc_issue(PARAVIRT_LAZY_CPU);
+	}
 	/* We can't that easily rip out L3 and L2, as the Xen pagetables are
 	 * set out this way: [L4], [L1], [L2], [L3], [L1], [L1] ...  for
 	 * the initial domain. For guests using the toolstack, they are in:
@@ -2207,6 +2225,15 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
 void __init xen_init_mmu_ops(void)
 {
 	x86_init.paging.pagetable_init = xen_pagetable_init;
+
+	/* Optimization - we can use the HVM one but it has no idea which
+	 * VCPUs are descheduled - which means that it will needlessly IPI
+	 * them. Xen knows so let it do the job.
+	 */
+	if (xen_feature(XENFEAT_auto_translated_physmap)) {
+		pv_mmu_ops.flush_tlb_others = xen_flush_tlb_others;
+		return;
+	}
 	pv_mmu_ops = xen_mmu_ops;
 
 	memset(dummy_mapping, 0xff, PAGE_SIZE);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v11 09/12] xen/pvh: Piggyback on PVHVM XenBus and event channels for PVH.
  2013-12-17 20:51 [PATCH v11] PVH support for Linux kernel Konrad Rzeszutek Wilk
                   ` (7 preceding siblings ...)
  2013-12-17 20:51 ` [PATCH v11 08/12] xen/pvh: MMU changes for PVH Konrad Rzeszutek Wilk
@ 2013-12-17 20:51 ` Konrad Rzeszutek Wilk
  2013-12-18 18:31   ` [Xen-devel] " Stefano Stabellini
  2013-12-17 20:51 ` [PATCH v11 10/12] xen/pvh: Piggyback on PVHVM for grant driver Konrad Rzeszutek Wilk
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-17 20:51 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich
  Cc: Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

PVH is a PV guest with a twist - there are certain things
that work in it like HVM and some like PV. There is
a similar mode - PVHVM where we run in HVM mode with
PV code enabled - and this patch explores that.

The most notable PV interfaces are the XenBus and event channels.
For PVH, we will use XenBus and event channels.

For the XenBus mechanism we piggyback on how it is done for
PVHVM guests.

Ditto for the event channel mechanism - we piggyback on PVHVM -
by setting up a specific vector callback and that
vector ends up calling the event channel mechanism to
dispatch the events as needed.

This means that from a pvops perspective, we can use
native_irq_ops instead of the Xen PV specific. Albeit in the
future we could support pirq_eoi_map. But that is
a feature request that can be shared with PVHVM.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/enlighten.c           | 6 ++++++
 arch/x86/xen/irq.c                 | 5 ++++-
 drivers/xen/events.c               | 5 +++++
 drivers/xen/xenbus/xenbus_client.c | 3 ++-
 4 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index e420613..7fceb51 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1134,6 +1134,8 @@ void xen_setup_shared_info(void)
 	/* In UP this is as good a place as any to set up shared info */
 	xen_setup_vcpu_info_placement();
 #endif
+	if (xen_pvh_domain())
+		return;
 
 	xen_setup_mfn_list_list();
 }
@@ -1146,6 +1148,10 @@ void xen_setup_vcpu_info_placement(void)
 	for_each_possible_cpu(cpu)
 		xen_vcpu_setup(cpu);
 
+	/* PVH always uses native IRQ ops */
+	if (xen_pvh_domain())
+		return;
+
 	/* xen_vcpu_setup managed to place the vcpu_info within the
 	   percpu area for all cpus, so make use of it */
 	if (have_vcpu_info_placement) {
diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
index 0da7f86..4f7f351 100644
--- a/arch/x86/xen/irq.c
+++ b/arch/x86/xen/irq.c
@@ -5,6 +5,7 @@
 #include <xen/interface/xen.h>
 #include <xen/interface/sched.h>
 #include <xen/interface/vcpu.h>
+#include <xen/features.h>
 #include <xen/events.h>
 
 #include <asm/xen/hypercall.h>
@@ -128,6 +129,8 @@ static const struct pv_irq_ops xen_irq_ops __initconst = {
 
 void __init xen_init_irq_ops(void)
 {
-	pv_irq_ops = xen_irq_ops;
+	/* For PVH we use default pv_irq_ops settings */
+	if (!xen_feature(XENFEAT_hvm_callback_vector))
+		pv_irq_ops = xen_irq_ops;
 	x86_init.irqs.intr_init = xen_init_IRQ;
 }
diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 4035e83..627a16a 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -1922,6 +1922,11 @@ void __init xen_init_IRQ(void)
 		if (xen_initial_domain())
 			pci_xen_initial_domain();
 
+		if (xen_feature(XENFEAT_hvm_callback_vector)) {
+			xen_callback_vector();
+			return;
+		}
+
 		pirq_eoi_map = (void *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
 		eoi_gmfn.gmfn = virt_to_mfn(pirq_eoi_map);
 		rc = HYPERVISOR_physdev_op(PHYSDEVOP_pirq_eoi_gmfn_v2, &eoi_gmfn);
diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index ec097d6..7f7c454 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -45,6 +45,7 @@
 #include <xen/grant_table.h>
 #include <xen/xenbus.h>
 #include <xen/xen.h>
+#include <xen/features.h>
 
 #include "xenbus_probe.h"
 
@@ -743,7 +744,7 @@ static const struct xenbus_ring_ops ring_ops_hvm = {
 
 void __init xenbus_ring_ops_init(void)
 {
-	if (xen_pv_domain())
+	if (xen_pv_domain() && !xen_feature(XENFEAT_auto_translated_physmap))
 		ring_ops = &ring_ops_pv;
 	else
 		ring_ops = &ring_ops_hvm;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v11 10/12] xen/pvh: Piggyback on PVHVM for grant driver.
  2013-12-17 20:51 [PATCH v11] PVH support for Linux kernel Konrad Rzeszutek Wilk
                   ` (8 preceding siblings ...)
  2013-12-17 20:51 ` [PATCH v11 09/12] xen/pvh: Piggyback on PVHVM XenBus and event channels " Konrad Rzeszutek Wilk
@ 2013-12-17 20:51 ` Konrad Rzeszutek Wilk
  2013-12-18 18:46   ` [Xen-devel] " Stefano Stabellini
  2013-12-17 20:51 ` [PATCH v11 11/12] xen/pvh: Disable PV code that does not work with PVH Konrad Rzeszutek Wilk
  2013-12-17 20:51 ` [PATCH v11 12/12] xen/pvh: Support ParaVirtualized Hardware extensions Konrad Rzeszutek Wilk
  11 siblings, 1 reply; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-17 20:51 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich
  Cc: Konrad Rzeszutek Wilk

In PVH the shared grant frame is the PFN and not MFN,
hence its mapped via the same code path as HVM.

The allocation of the grant frame is done differently - we
do not use the early platform-pci driver and have an
ioremap area - instead we use balloon memory and stitch
all of the non-contingous pages in a virtualized area.

That means when we call the hypervisor to replace the GMFN
with a XENMAPSPACE_grant_table type, we need to lookup the
old PFN for every iteration instead of assuming a flat
contingous PFN allocation.

Lastly, we only use v1 for grants. This is because PVHVM
is not able to use v2 due to no XENMEM_add_to_physmap
calls on the error status page (see commit
69e8f430e243d657c2053f097efebc2e2cd559f0
 xen/granttable: Disable grant v2 for HVM domains.)

Until that is implemented this workaround has to
be in place.

Also per suggestions by Stefano utilize the PVHVM paths
as they share common functionality.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/xen/gntdev.c       |  2 +-
 drivers/xen/grant-table.c  | 80 ++++++++++++++++++++++++++++++++++++++++++----
 drivers/xen/platform-pci.c |  2 +-
 include/xen/grant_table.h  |  2 +-
 4 files changed, 76 insertions(+), 10 deletions(-)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index e41c79c..073b4a1 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -846,7 +846,7 @@ static int __init gntdev_init(void)
 	if (!xen_domain())
 		return -ENODEV;
 
-	use_ptemod = xen_pv_domain();
+	use_ptemod = !xen_feature(XENFEAT_auto_translated_physmap);
 
 	err = misc_register(&gntdev_miscdev);
 	if (err != 0) {
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index aa846a4..c0ded9f 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -47,6 +47,7 @@
 #include <xen/interface/xen.h>
 #include <xen/page.h>
 #include <xen/grant_table.h>
+#include <xen/balloon.h>
 #include <xen/interface/memory.h>
 #include <xen/hvc-console.h>
 #include <xen/swiotlb-xen.h>
@@ -66,8 +67,8 @@ static unsigned int boot_max_nr_grant_frames;
 static int gnttab_free_count;
 static grant_ref_t gnttab_free_head;
 static DEFINE_SPINLOCK(gnttab_list_lock);
-unsigned long xen_hvm_resume_frames;
-EXPORT_SYMBOL_GPL(xen_hvm_resume_frames);
+unsigned long xen_auto_xlat_grant_frames;
+EXPORT_SYMBOL_GPL(xen_auto_xlat_grant_frames);
 
 static union {
 	struct grant_entry_v1 *v1;
@@ -1060,7 +1061,7 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
 	unsigned int nr_gframes = end_idx + 1;
 	int rc;
 
-	if (xen_hvm_domain()) {
+	if (xen_feature(XENFEAT_auto_translated_physmap)) {
 		struct xen_add_to_physmap xatp;
 		unsigned int i = end_idx;
 		rc = 0;
@@ -1069,10 +1070,24 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
 		 * index, ensuring that the table will grow only once.
 		 */
 		do {
+			unsigned long vaddr;
+			unsigned int level;
+			pte_t *pte;
+
 			xatp.domid = DOMID_SELF;
 			xatp.idx = i;
 			xatp.space = XENMAPSPACE_grant_table;
-			xatp.gpfn = (xen_hvm_resume_frames >> PAGE_SHIFT) + i;
+
+			/*
+			 * Don't assume the memory is contingous. Lookup each.
+			 */
+			vaddr = xen_auto_xlat_grant_frames + (i * PAGE_SIZE);
+			if (xen_hvm_domain())
+				xatp.gpfn = vaddr >> PAGE_SHIFT;
+			else {
+				pte = lookup_address(vaddr, &level);
+				xatp.gpfn = pte_mfn(*pte);
+			}
 			rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
 			if (rc != 0) {
 				pr_warn("grant table add_to_physmap failed, err=%d\n",
@@ -1135,7 +1150,7 @@ static void gnttab_request_version(void)
 	int rc;
 	struct gnttab_set_version gsv;
 
-	if (xen_hvm_domain())
+	if (xen_hvm_domain() || xen_feature(XENFEAT_auto_translated_physmap))
 		gsv.version = 1;
 	else
 		gsv.version = 2;
@@ -1161,6 +1176,46 @@ static void gnttab_request_version(void)
 	pr_info("Grant tables using version %d layout\n", grant_table_version);
 }
 
+static int xlated_setup_gnttab_pages(unsigned long nr_grant_frames,
+				     unsigned long max, void *addr)
+{
+	struct page **pages;
+	unsigned long *pfns;
+	int rc, i;
+
+	pages = kcalloc(nr_grant_frames, sizeof(pages[0]), GFP_KERNEL);
+	if (!pages)
+		return -ENOMEM;
+
+	pfns = kcalloc(nr_grant_frames, sizeof(pfns[0]), GFP_KERNEL);
+	if (!pfns) {
+		kfree(pages);
+		return -ENOMEM;
+	}
+	rc = alloc_xenballooned_pages(nr_grant_frames, pages, 0 /* lowmem */);
+	if (rc) {
+		pr_warn("%s Couldn't balloon alloc %ld pfns rc:%d\n", __func__,
+			nr_grant_frames, rc);
+		kfree(pages);
+		kfree(pfns);
+		return rc;
+	}
+	for (i = 0; i < nr_grant_frames; i++)
+		pfns[i] = page_to_pfn(pages[i]);
+
+	rc = arch_gnttab_map_shared(pfns, nr_grant_frames, max, addr);
+
+	kfree(pages);
+	kfree(pfns);
+	if (rc) {
+		pr_warn("%s Couldn't map %ld pfns rc:%d\n", __func__,
+			nr_grant_frames, rc);
+		free_xenballooned_pages(nr_grant_frames, pages);
+		return rc;
+	}
+	return rc;
+}
+
 static int gnttab_setup(void)
 {
 	unsigned int max_nr_gframes;
@@ -1169,15 +1224,26 @@ static int gnttab_setup(void)
 	if (max_nr_gframes < nr_grant_frames)
 		return -ENOSYS;
 
+	if (xen_feature(XENFEAT_auto_translated_physmap) && !xen_auto_xlat_grant_frames) {
+		/*
+		 * xen_auto_xlat_grant_frames is setup for PVHVM by
+		 * alloc_xen_mmio by the time this is called.
+		 */
+		int rc = xlated_setup_gnttab_pages(max_nr_gframes, max_nr_gframes,
+						   &gnttab_shared.addr);
+		if (rc)
+			return rc;
+		xen_auto_xlat_grant_frames = (unsigned long)gnttab_shared.addr;
+	}
 	if (xen_pv_domain())
 		return gnttab_map(0, nr_grant_frames - 1);
 
 	if (gnttab_shared.addr == NULL) {
-		gnttab_shared.addr = xen_remap(xen_hvm_resume_frames,
+		gnttab_shared.addr = xen_remap(xen_auto_xlat_grant_frames,
 						PAGE_SIZE * max_nr_gframes);
 		if (gnttab_shared.addr == NULL) {
 			pr_warn("Failed to ioremap gnttab share frames (addr=0x%08lx)!\n",
-					xen_hvm_resume_frames);
+					xen_auto_xlat_grant_frames);
 			return -ENOMEM;
 		}
 	}
diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
index 2f3528e..44bc5a6 100644
--- a/drivers/xen/platform-pci.c
+++ b/drivers/xen/platform-pci.c
@@ -154,7 +154,7 @@ static int platform_pci_init(struct pci_dev *pdev,
 	}
 
 	max_nr_gframes = gnttab_max_grant_frames();
-	xen_hvm_resume_frames = alloc_xen_mmio(PAGE_SIZE * max_nr_gframes);
+	xen_auto_xlat_grant_frames = alloc_xen_mmio(PAGE_SIZE * max_nr_gframes);
 	ret = gnttab_init();
 	if (ret)
 		goto out;
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 694dcaf..24280ac 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -178,7 +178,7 @@ int arch_gnttab_map_status(uint64_t *frames, unsigned long nr_gframes,
 			   grant_status_t **__shared);
 void arch_gnttab_unmap(void *shared, unsigned long nr_gframes);
 
-extern unsigned long xen_hvm_resume_frames;
+extern unsigned long xen_auto_xlat_grant_frames;
 unsigned int gnttab_max_grant_frames(void);
 
 #define gnttab_map_vaddr(map) ((void *)(map.host_virt_addr))
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v11 11/12] xen/pvh: Disable PV code that does not work with PVH.
  2013-12-17 20:51 [PATCH v11] PVH support for Linux kernel Konrad Rzeszutek Wilk
                   ` (9 preceding siblings ...)
  2013-12-17 20:51 ` [PATCH v11 10/12] xen/pvh: Piggyback on PVHVM for grant driver Konrad Rzeszutek Wilk
@ 2013-12-17 20:51 ` Konrad Rzeszutek Wilk
  2013-12-18 14:19   ` [Xen-devel] " Stefano Stabellini
  2013-12-17 20:51 ` [PATCH v11 12/12] xen/pvh: Support ParaVirtualized Hardware extensions Konrad Rzeszutek Wilk
  11 siblings, 1 reply; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-17 20:51 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich
  Cc: Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

As we do not have yet a mechanism for that.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/xen/cpu_hotplug.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/cpu_hotplug.c b/drivers/xen/cpu_hotplug.c
index cc6513a..cbb02af 100644
--- a/drivers/xen/cpu_hotplug.c
+++ b/drivers/xen/cpu_hotplug.c
@@ -4,6 +4,7 @@
 
 #include <xen/xen.h>
 #include <xen/xenbus.h>
+#include <xen/features.h>
 
 #include <asm/xen/hypervisor.h>
 #include <asm/cpu.h>
@@ -102,7 +103,8 @@ static int __init setup_vcpu_hotplug_event(void)
 	static struct notifier_block xsn_cpu = {
 		.notifier_call = setup_cpu_watcher };
 
-	if (!xen_pv_domain())
+	/* PVH TBD/FIXME: future work */
+	if (!xen_pv_domain() || xen_feature(XENFEAT_auto_translated_physmap))
 		return -ENODEV;
 
 	register_xenstore_notifier(&xsn_cpu);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v11 12/12] xen/pvh: Support ParaVirtualized Hardware extensions.
  2013-12-17 20:51 [PATCH v11] PVH support for Linux kernel Konrad Rzeszutek Wilk
                   ` (10 preceding siblings ...)
  2013-12-17 20:51 ` [PATCH v11 11/12] xen/pvh: Disable PV code that does not work with PVH Konrad Rzeszutek Wilk
@ 2013-12-17 20:51 ` Konrad Rzeszutek Wilk
  2013-12-18 14:52   ` [Xen-devel] " Stefano Stabellini
  11 siblings, 1 reply; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-17 20:51 UTC (permalink / raw)
  To: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich
  Cc: Konrad Rzeszutek Wilk

From: Mukesh Rathor <mukesh.rathor@oracle.com>

PVH allows PV linux guest to utilize hardware extended capabilities,
such as running MMU updates in a HVM container.

The Xen side defines PVH as (from docs/misc/pvh-readme.txt,
with modifications):

"* the guest uses auto translate:
 - p2m is managed by Xen
 - pagetables are owned by the guest
 - mmu_update hypercall not available
* it uses event callback and not vlapic emulation,
* IDT is native, so set_trap_table hcall is also N/A for a PVH guest.

For a full list of hcalls supported for PVH, see pvh_hypercall64_table
in arch/x86/hvm/hvm.c in xen.  From the ABI prespective, it's mostly a
PV guest with auto translate, although it does use hvm_op for setting
callback vector."

Use .ascii and .asciz to define xen feature string. Note, the PVH
string must be in a single line (not multiple lines with \) to keep the
assembler from putting null char after each string before \.
This patch allows it to be configured and enabled.

Lastly remove some of the scaffolding.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/Kconfig    | 8 ++++++++
 arch/x86/xen/xen-head.S | 8 +++++++-
 include/xen/xen.h       | 3 +--
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index 1a3c765..161cc34 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -51,3 +51,11 @@ config XEN_DEBUG_FS
 	  Enable statistics output and various tuning options in debugfs.
 	  Enabling this option may incur a significant performance overhead.
 
+config XEN_PVH
+	bool "Support for running as a PVH guest"
+	depends on X86_64 && XEN && XEN_PVHVM
+	default n
+	help
+	   This option enables support for running as a PVH guest (PV guest
+	   using hardware extensions) under a suitably capable hypervisor.
+	   If unsure, say N.
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 7faed58..56f42c0 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -13,6 +13,12 @@
 #include <xen/interface/elfnote.h>
 #include <asm/xen/interface.h>
 
+#ifdef CONFIG_XEN_PVH
+#define PVH_FEATURES_STR  "|writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel|hvm_callback_vector"
+#else
+#define PVH_FEATURES_STR  ""
+#endif
+
 	__INIT
 ENTRY(startup_xen)
 	cld
@@ -95,7 +101,7 @@ NEXT_HYPERCALL(arch_6)
 #endif
 	ELFNOTE(Xen, XEN_ELFNOTE_ENTRY,          _ASM_PTR startup_xen)
 	ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, _ASM_PTR hypercall_page)
-	ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,       .asciz "!writable_page_tables|pae_pgdir_above_4gb")
+	ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,       .ascii "!writable_page_tables|pae_pgdir_above_4gb"; .asciz PVH_FEATURES_STR)
 	ELFNOTE(Xen, XEN_ELFNOTE_PAE_MODE,       .asciz "yes")
 	ELFNOTE(Xen, XEN_ELFNOTE_LOADER,         .asciz "generic")
 	ELFNOTE(Xen, XEN_ELFNOTE_L1_MFN_VALID,
diff --git a/include/xen/xen.h b/include/xen/xen.h
index 1d6a237..a248002 100644
--- a/include/xen/xen.h
+++ b/include/xen/xen.h
@@ -29,8 +29,7 @@ extern enum xen_domain_type xen_domain_type;
 #define xen_initial_domain()	(0)
 #endif	/* CONFIG_XEN_DOM0 */
 
-#ifdef CONFIG_XEN_PVHVM
-/* Temporarily under XEN_PVHVM, but will be under CONFIG_XEN_PVH */
+#ifdef CONFIG_XEN_PVH
 #include <xen/features.h>
 #define xen_pvh_domain() (xen_pv_domain() && \
 			  xen_feature(XENFEAT_auto_translated_physmap) && \
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 01/12] xen/p2m: Check for auto-xlat when doing mfn_to_local_pfn.
  2013-12-17 20:51 ` [PATCH v11 01/12] xen/p2m: Check for auto-xlat when doing mfn_to_local_pfn Konrad Rzeszutek Wilk
@ 2013-12-18 14:10   ` Stefano Stabellini
  0 siblings, 0 replies; 46+ messages in thread
From: Stefano Stabellini @ 2013-12-18 14:10 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich

On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> Most of the functions in page.h are prefaced with
> 	if (xen_feature(XENFEAT_auto_translated_physmap))
> 		return mfn;
> 
> Except the mfn_to_local_pfn. At a first sight, the function
> should work without this patch - as the 'mfn_to_mfn' has
> a similar check. But there are no such check in the
> 'get_phys_to_machine' function - so we would crash in there.
> 
> This fixes it by following the convention of having the
> check for auto-xlat in these static functions.
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>


>  arch/x86/include/asm/xen/page.h | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
> index b913915..4a092cc 100644
> --- a/arch/x86/include/asm/xen/page.h
> +++ b/arch/x86/include/asm/xen/page.h
> @@ -167,7 +167,12 @@ static inline xpaddr_t machine_to_phys(xmaddr_t machine)
>   */
>  static inline unsigned long mfn_to_local_pfn(unsigned long mfn)
>  {
> -	unsigned long pfn = mfn_to_pfn(mfn);
> +	unsigned long pfn;
> +
> +	if (xen_feature(XENFEAT_auto_translated_physmap))
> +		return mfn;
> +
> +	pfn = mfn_to_pfn(mfn);
>  	if (get_phys_to_machine(pfn) != mfn)
>  		return -1; /* force !pfn_valid() */
>  	return pfn;
> -- 
> 1.8.3.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 11/12] xen/pvh: Disable PV code that does not work with PVH.
  2013-12-17 20:51 ` [PATCH v11 11/12] xen/pvh: Disable PV code that does not work with PVH Konrad Rzeszutek Wilk
@ 2013-12-18 14:19   ` Stefano Stabellini
  2013-12-18 14:56     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 46+ messages in thread
From: Stefano Stabellini @ 2013-12-18 14:19 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich

On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> As we do not have yet a mechanism for that.
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Even though we don't even compile cpu_hotplug on ARM yet, I don't like
this change because it affects ARM guests too.

Please add a note about ARM guests in the FIXME line.


>  drivers/xen/cpu_hotplug.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/xen/cpu_hotplug.c b/drivers/xen/cpu_hotplug.c
> index cc6513a..cbb02af 100644
> --- a/drivers/xen/cpu_hotplug.c
> +++ b/drivers/xen/cpu_hotplug.c
> @@ -4,6 +4,7 @@
>  
>  #include <xen/xen.h>
>  #include <xen/xenbus.h>
> +#include <xen/features.h>
>  
>  #include <asm/xen/hypervisor.h>
>  #include <asm/cpu.h>
> @@ -102,7 +103,8 @@ static int __init setup_vcpu_hotplug_event(void)
>  	static struct notifier_block xsn_cpu = {
>  		.notifier_call = setup_cpu_watcher };
>  
> -	if (!xen_pv_domain())
> +	/* PVH TBD/FIXME: future work */
> +	if (!xen_pv_domain() || xen_feature(XENFEAT_auto_translated_physmap))
>  		return -ENODEV;
>  
>  	register_xenstore_notifier(&xsn_cpu);


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 02/12] xen/pvh: Define what an PVH guest is.
  2013-12-17 20:51 ` [PATCH v11 02/12] xen/pvh: Define what an PVH guest is Konrad Rzeszutek Wilk
@ 2013-12-18 14:22   ` Stefano Stabellini
  2013-12-18 14:55     ` Stefano Stabellini
  2013-12-18 14:57     ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 46+ messages in thread
From: Stefano Stabellini @ 2013-12-18 14:22 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich

On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> Which is a PV guest with auto page translation enabled
> and with vector callback. It is a cross between PVHVM and PV.
> 
> The Xen side defines PVH as (from docs/misc/pvh-readme.txt,
> with modifications):
> 
> "* the guest uses auto translate:
>  - p2m is managed by Xen
>  - pagetables are owned by the guest
>  - mmu_update hypercall not available
> * it uses event callback and not vlapic emulation,
> * IDT is native, so set_trap_table hcall is also N/A for a PVH guest.
> 
> For a full list of hcalls supported for PVH, see pvh_hypercall64_table
> in arch/x86/hvm/hvm.c in xen.  From the ABI prespective, it's mostly a
> PV guest with auto translate, although it does use hvm_op for setting
> callback vector."
> 
> We don't have yet a Kconfig entry setup as we do not
> have all the parts ready for it - so we piggyback
> on the PVHVM config option. This scaffolding will
> be removed later.
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Could you please add an "&& CONFIG_X86"?


>  include/xen/xen.h | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/include/xen/xen.h b/include/xen/xen.h
> index a74d436..1d6a237 100644
> --- a/include/xen/xen.h
> +++ b/include/xen/xen.h
> @@ -29,4 +29,13 @@ extern enum xen_domain_type xen_domain_type;
>  #define xen_initial_domain()	(0)
>  #endif	/* CONFIG_XEN_DOM0 */
>  
> +#ifdef CONFIG_XEN_PVHVM
> +/* Temporarily under XEN_PVHVM, but will be under CONFIG_XEN_PVH */
> +#include <xen/features.h>
> +#define xen_pvh_domain() (xen_pv_domain() && \
> +			  xen_feature(XENFEAT_auto_translated_physmap) && \
> +			  xen_have_vector_callback)
> +#else
> +#define xen_pvh_domain()	(0)
> +#endif
>  #endif	/* _XEN_XEN_H */
> -- 
> 1.8.3.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 03/12] xen/pvh: Early bootup changes in PV code.
  2013-12-17 20:51 ` [PATCH v11 03/12] xen/pvh: Early bootup changes in PV code Konrad Rzeszutek Wilk
@ 2013-12-18 14:27   ` Stefano Stabellini
  2013-12-18 14:58     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 46+ messages in thread
From: Stefano Stabellini @ 2013-12-18 14:27 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich

On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> In the bootup code for PVH we can trap cpuid via vmexit, so don't
> need to use emulated prefix call. We also check for vector callback
> early on, as it is a required feature. PVH also runs at default kernel
> IOPL.
> 
> Finally, pure PV settings are moved to a separate function that are
> only called for pure PV, ie, pv with pvmmu. They are also #ifdef
> with CONFIG_XEN_PVMMU.
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/xen/enlighten.c | 63 +++++++++++++++++++++++++++++++++---------------
>  arch/x86/xen/setup.c     | 18 +++++++++-----
>  2 files changed, 56 insertions(+), 25 deletions(-)
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index fa6ade7..3b21c82 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -46,6 +46,7 @@
>  #include <xen/hvm.h>
>  #include <xen/hvc-console.h>
>  #include <xen/acpi.h>
> +#include <xen/features.h>
>  
>  #include <asm/paravirt.h>
>  #include <asm/apic.h>
> @@ -262,8 +263,9 @@ static void __init xen_banner(void)
>  	struct xen_extraversion extra;
>  	HYPERVISOR_xen_version(XENVER_extraversion, &extra);
>  
> -	printk(KERN_INFO "Booting paravirtualized kernel on %s\n",
> -	       pv_info.name);
> +	pr_info("Booting paravirtualized kernel %son %s\n",
> +		xen_feature(XENFEAT_auto_translated_physmap) ?
> +			"with PVH extensions " : "", pv_info.name);
>  	printk(KERN_INFO "Xen version: %d.%d%s%s\n",
>  	       version >> 16, version & 0xffff, extra.extraversion,
>  	       xen_feature(XENFEAT_mmu_pt_update_preserve_ad) ? " (preserve-AD)" : "");
> @@ -331,12 +333,15 @@ static void xen_cpuid(unsigned int *ax, unsigned int *bx,
>  		break;
>  	}
>  
> -	asm(XEN_EMULATE_PREFIX "cpuid"
> -		: "=a" (*ax),
> -		  "=b" (*bx),
> -		  "=c" (*cx),
> -		  "=d" (*dx)
> -		: "0" (*ax), "2" (*cx));
> +	if (xen_pvh_domain())
> +		native_cpuid(ax, bx, cx, dx);
> +	else
> +		asm(XEN_EMULATE_PREFIX "cpuid"
> +			: "=a" (*ax),
> +			"=b" (*bx),
> +			"=c" (*cx),
> +			"=d" (*dx)
> +			: "0" (*ax), "2" (*cx));
>  
>  	*bx &= maskebx;
>  	*cx &= maskecx;
> @@ -1420,6 +1425,19 @@ static void __init xen_setup_stackprotector(void)
>  	pv_cpu_ops.load_gdt = xen_load_gdt;
>  }
>  
> +static void __init xen_pvh_early_guest_init(void)
> +{
> +	if (xen_feature(XENFEAT_hvm_callback_vector))
> +		xen_have_vector_callback = 1;
> +
> +#ifdef CONFIG_X86_32
> +	if (xen_feature(XENFEAT_auto_translated_physmap)) {
> +		xen_raw_printk("ERROR: 32bit PVH guests are not supported\n");
> +		BUG();
> +	}
> +#endif

Shouldn't we detect this error at build time?
If so, does it make sense to check for it at run time too?

The rest looks OK to me.

> +}
> +
>  /* First C function to be called on Xen boot */
>  asmlinkage void __init xen_start_kernel(void)
>  {
> @@ -1431,13 +1449,18 @@ asmlinkage void __init xen_start_kernel(void)
>  
>  	xen_domain_type = XEN_PV_DOMAIN;
>  
> +	xen_setup_features();
> +	xen_pvh_early_guest_init();
>  	xen_setup_machphys_mapping();
>  
>  	/* Install Xen paravirt ops */
>  	pv_info = xen_info;
>  	pv_init_ops = xen_init_ops;
> -	pv_cpu_ops = xen_cpu_ops;
>  	pv_apic_ops = xen_apic_ops;
> +	if (xen_pvh_domain())
> +		pv_cpu_ops.cpuid = xen_cpuid;
> +	else
> +		pv_cpu_ops = xen_cpu_ops;
>  
>  	x86_init.resources.memory_setup = xen_memory_setup;
>  	x86_init.oem.arch_setup = xen_arch_setup;
> @@ -1469,8 +1492,6 @@ asmlinkage void __init xen_start_kernel(void)
>  	/* Work out if we support NX */
>  	x86_configure_nx();
>  
> -	xen_setup_features();
> -
>  	/* Get mfn list */
>  	if (!xen_feature(XENFEAT_auto_translated_physmap))
>  		xen_build_dynamic_phys_to_machine();
> @@ -1548,14 +1569,18 @@ asmlinkage void __init xen_start_kernel(void)
>  	/* set the limit of our address space */
>  	xen_reserve_top();
>  
> -	/* We used to do this in xen_arch_setup, but that is too late on AMD
> -	 * were early_cpu_init (run before ->arch_setup()) calls early_amd_init
> -	 * which pokes 0xcf8 port.
> -	 */
> -	set_iopl.iopl = 1;
> -	rc = HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl);
> -	if (rc != 0)
> -		xen_raw_printk("physdev_op failed %d\n", rc);
> +	/* PVH: runs at default kernel iopl of 0 */
> +	if (!xen_pvh_domain()) {
> +		/*
> +		 * We used to do this in xen_arch_setup, but that is too late
> +		 * on AMD were early_cpu_init (run before ->arch_setup()) calls
> +		 * early_amd_init which pokes 0xcf8 port.
> +		 */
> +		set_iopl.iopl = 1;
> +		rc = HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl);
> +		if (rc != 0)
> +			xen_raw_printk("physdev_op failed %d\n", rc);
> +	}
>  
>  #ifdef CONFIG_X86_32
>  	/* set up basic CPUID stuff */
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index 68c054f..2137c51 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -563,16 +563,13 @@ void xen_enable_nmi(void)
>  		BUG();
>  #endif
>  }
> -void __init xen_arch_setup(void)
> +void __init xen_pvmmu_arch_setup(void)
>  {
> -	xen_panic_handler_init();
> -
>  	HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_4gb_segments);
>  	HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_writable_pagetables);
>  
> -	if (!xen_feature(XENFEAT_auto_translated_physmap))
> -		HYPERVISOR_vm_assist(VMASST_CMD_enable,
> -				     VMASST_TYPE_pae_extended_cr3);
> +	HYPERVISOR_vm_assist(VMASST_CMD_enable,
> +			     VMASST_TYPE_pae_extended_cr3);
>  
>  	if (register_callback(CALLBACKTYPE_event, xen_hypervisor_callback) ||
>  	    register_callback(CALLBACKTYPE_failsafe, xen_failsafe_callback))
> @@ -581,6 +578,15 @@ void __init xen_arch_setup(void)
>  	xen_enable_sysenter();
>  	xen_enable_syscall();
>  	xen_enable_nmi();
> +}
> +
> +/* This function is not called for HVM domains */
> +void __init xen_arch_setup(void)
> +{
> +	xen_panic_handler_init();
> +	if (!xen_feature(XENFEAT_auto_translated_physmap))
> +		xen_pvmmu_arch_setup();
> +
>  #ifdef CONFIG_ACPI
>  	if (!(xen_start_info->flags & SIF_INITDOMAIN)) {
>  		printk(KERN_INFO "ACPI in unprivileged domain disabled\n");
> -- 
> 1.8.3.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 04/12] xen/pvh: Don't setup P2M tree.
  2013-12-17 20:51 ` [PATCH v11 04/12] xen/pvh: Don't setup P2M tree Konrad Rzeszutek Wilk
@ 2013-12-18 14:39   ` Stefano Stabellini
  2013-12-18 15:05     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 46+ messages in thread
From: Stefano Stabellini @ 2013-12-18 14:39 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich

On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> P2M is not available for PVH. Fortunatly for us the
> P2M code already has mostly the support for auto-xlat guest thanks to
> commit 3d24bbd7dddbea54358a9795abaf051b0f18973c
> "grant-table: call set_phys_to_machine after mapping grant refs"
> which: "
> introduces set_phys_to_machine calls for auto_translated guests
> (even on x86) in gnttab_map_refs and gnttab_unmap_refs.
> translated by swiotlb-xen... " so we don't need to muck much.

Just a note: with 3d24bbd7dddbea54358a9795abaf051b0f18973c you'll get
set_phys_to_machine calls from gnttab_map_refs and gnttab_unmap_refs but
PVH guests won't do anything with them.

If we assume that an IOMMU is always present on the plaform and Xen is
going to make the appropriate IOMMU pagetable changes in the hypercall
implementation of GNTTABOP_map_grant_ref and GNTTABOP_unmap_grant_ref,
then eveything should be transparent from PVH Dom0 point of view and DMA
transfers involving foreign pages keep working with no issies. 

Otherwise we do need a P2M (and an M2P) for PVH Dom0 to track these
foreign pages: on ARM I introduced two redblack trees to do it (see
arch/arm/xen/p2m.c).



> But we still have to inhibit the building of the P2M tree.
> That had been done in the past by not calling
> xen_build_dynamic_phys_to_machine (which setups the P2M tree
> and gives us virtual address to access them). But we are missing
> a check for xen_build_mfn_list_list - which was continuing to setup
> the P2M tree and would blow up at trying to get the virtual
> address of p2m_missing (which would have been setup by
> xen_build_dynamic_phys_to_machine).
> 
> Hence a check is needed to not call xen_build_mfn_list_list when
> running in auto-xlat mode.
> 
> Instead of replicating the check for auto-xlat in enlighten.c
> do it in the p2m.c code. The reason is that the xen_build_mfn_list_list
> is called also in xen_arch_post_suspend without any checks for
> auto-xlat. So for PVH or PV with auto-xlat - we would needlessly
> allocate space for an P2M tree.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/xen/enlighten.c |  3 +--
>  arch/x86/xen/p2m.c       | 12 ++++++++++--
>  2 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index 3b21c82..c7341d0 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1493,8 +1493,7 @@ asmlinkage void __init xen_start_kernel(void)
>  	x86_configure_nx();
>  
>  	/* Get mfn list */
> -	if (!xen_feature(XENFEAT_auto_translated_physmap))
> -		xen_build_dynamic_phys_to_machine();
> +	xen_build_dynamic_phys_to_machine();
>  
>  	/*
>  	 * Set up kernel GDT and segment registers, mainly so that
> diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
> index 2ae8699..fb7ee0a 100644
> --- a/arch/x86/xen/p2m.c
> +++ b/arch/x86/xen/p2m.c
> @@ -280,6 +280,9 @@ void __ref xen_build_mfn_list_list(void)
>  {
>  	unsigned long pfn;
>  
> +	if (xen_feature(XENFEAT_auto_translated_physmap))
> +		return;
> +
>  	/* Pre-initialize p2m_top_mfn to be completely missing */
>  	if (p2m_top_mfn == NULL) {
>  		p2m_mid_missing_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
> @@ -346,10 +349,15 @@ void xen_setup_mfn_list_list(void)
>  /* Set up p2m_top to point to the domain-builder provided p2m pages */
>  void __init xen_build_dynamic_phys_to_machine(void)
>  {
> -	unsigned long *mfn_list = (unsigned long *)xen_start_info->mfn_list;
> -	unsigned long max_pfn = min(MAX_DOMAIN_PAGES, xen_start_info->nr_pages);
> +	unsigned long *mfn_list;
> +	unsigned long max_pfn;
>  	unsigned long pfn;
>  
> +	 if (xen_feature(XENFEAT_auto_translated_physmap))
> +		return;
> +
> +	mfn_list = (unsigned long *)xen_start_info->mfn_list;
> +	max_pfn = min(MAX_DOMAIN_PAGES, xen_start_info->nr_pages);
>  	xen_max_p2m_pfn = max_pfn;
>  
>  	p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
> -- 
> 1.8.3.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 08/12] xen/pvh: MMU changes for PVH
  2013-12-17 20:51 ` [PATCH v11 08/12] xen/pvh: MMU changes for PVH Konrad Rzeszutek Wilk
@ 2013-12-18 14:48   ` Stefano Stabellini
  2013-12-18 15:10     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 46+ messages in thread
From: Stefano Stabellini @ 2013-12-18 14:48 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich

On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> .. which are surprinsingly small compared to the amount for PV code.
> 
> PVH uses mostly native mmu ops, we leave the generic (native_*) for
> the majority and just overwrite the baremetal with the ones we need.
> 
> We also optimize one - the TLB flush. The native operation would
> needlessly IPI offline VCPUs causing extra wakeups. Using the
> Xen one avoids that and lets the hypervisor determine which
> VCPU needs the TLB flush.
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/xen/mmu.c | 35 +++++++++++++++++++++++++++++++----
>  1 file changed, 31 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index ce563be..77b7622 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -74,6 +74,7 @@
>  #include <xen/interface/version.h>
>  #include <xen/interface/memory.h>
>  #include <xen/hvc-console.h>
> +#include <xen/balloon.h>
>  
>  #include "multicalls.h"
>  #include "mmu.h"
> @@ -1207,6 +1208,8 @@ static void __init xen_pagetable_init(void)
>  #endif
>  	paging_init();
>  	xen_setup_shared_info();
> +	if (xen_feature(XENFEAT_auto_translated_physmap))
> +		return;
>  #ifdef CONFIG_X86_64
>  	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
>  		unsigned long new_mfn_list;

At the very least you should remove the second
XENFEAT_auto_translated_physmap check.  Also xen_setup_shared_info
contains yet another XENFEAT_auto_translated_physmap check. Maybe we
could refactor the code a bit to look nicer. Having a separate
xen_pagetable_init function for PVH could help.


> @@ -1556,6 +1559,10 @@ static void __init xen_set_pte_init(pte_t *ptep, pte_t pte)
>  static void pin_pagetable_pfn(unsigned cmd, unsigned long pfn)
>  {
>  	struct mmuext_op op;
> +
> +	if (xen_feature(XENFEAT_writable_page_tables))
> +		return;
> +
>  	op.cmd = cmd;
>  	op.arg1.mfn = pfn_to_mfn(pfn);
>  	if (HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF))

Why do we need this? I thought that all the callers of pin_pagetable_pfn
are not actually enabled on PVH.


> @@ -1753,6 +1760,10 @@ static void set_page_prot_flags(void *addr, pgprot_t prot, unsigned long flags)
>  	unsigned long pfn = __pa(addr) >> PAGE_SHIFT;
>  	pte_t pte = pfn_pte(pfn, prot);
>  
> +	/* recall for PVH, page tables are native. */
> +	if (xen_feature(XENFEAT_auto_translated_physmap))
> +		return;
> +
>  	if (HYPERVISOR_update_va_mapping((unsigned long)addr, pte, flags))
>  		BUG();
>  }

This one too. Is it because we are reusing xen_setup_kernel_pagetable on
PVH?


> @@ -1834,6 +1845,9 @@ static void convert_pfn_mfn(void *v)
>  	pte_t *pte = v;
>  	int i;
>  
> +	if (xen_feature(XENFEAT_auto_translated_physmap))
> +		return;
> +
>  	/* All levels are converted the same way, so just treat them
>  	   as ptes. */
>  	for (i = 0; i < PTRS_PER_PTE; i++)

This is getting pretty bad.
Can we find a way to refactor xen_setup_kernel_pagetable so that we
don't need all this? Maybe we need a new function?


> @@ -1863,6 +1877,7 @@ static void __init check_pt_base(unsigned long *pt_base, unsigned long *pt_end,
>   * but that's enough to get __va working.  We need to fill in the rest
>   * of the physical mapping once some sort of allocator has been set
>   * up.
> + * NOTE: for PVH, the page tables are native.
>   */
>  void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
>  {
> @@ -1940,10 +1955,13 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
>  	 * structure to attach it to, so make sure we just set kernel
>  	 * pgd.
>  	 */
> -	xen_mc_batch();
> -	__xen_write_cr3(true, __pa(init_level4_pgt));
> -	xen_mc_issue(PARAVIRT_LAZY_CPU);
> -
> +	if (xen_feature(XENFEAT_writable_page_tables)) {
> +		native_write_cr3(__pa(init_level4_pgt));
> +	} else {
> +		xen_mc_batch();
> +		__xen_write_cr3(true, __pa(init_level4_pgt));
> +		xen_mc_issue(PARAVIRT_LAZY_CPU);
> +	}
>  	/* We can't that easily rip out L3 and L2, as the Xen pagetables are
>  	 * set out this way: [L4], [L1], [L2], [L3], [L1], [L1] ...  for
>  	 * the initial domain. For guests using the toolstack, they are in:
> @@ -2207,6 +2225,15 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
>  void __init xen_init_mmu_ops(void)
>  {
>  	x86_init.paging.pagetable_init = xen_pagetable_init;
> +
> +	/* Optimization - we can use the HVM one but it has no idea which
> +	 * VCPUs are descheduled - which means that it will needlessly IPI
> +	 * them. Xen knows so let it do the job.
> +	 */
> +	if (xen_feature(XENFEAT_auto_translated_physmap)) {
> +		pv_mmu_ops.flush_tlb_others = xen_flush_tlb_others;
> +		return;
> +	}
>  	pv_mmu_ops = xen_mmu_ops;
>  
>  	memset(dummy_mapping, 0xff, PAGE_SIZE);
> -- 
> 1.8.3.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 12/12] xen/pvh: Support ParaVirtualized Hardware extensions.
  2013-12-17 20:51 ` [PATCH v11 12/12] xen/pvh: Support ParaVirtualized Hardware extensions Konrad Rzeszutek Wilk
@ 2013-12-18 14:52   ` Stefano Stabellini
  0 siblings, 0 replies; 46+ messages in thread
From: Stefano Stabellini @ 2013-12-18 14:52 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich

On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> PVH allows PV linux guest to utilize hardware extended capabilities,
> such as running MMU updates in a HVM container.
> 
> The Xen side defines PVH as (from docs/misc/pvh-readme.txt,
> with modifications):
> 
> "* the guest uses auto translate:
>  - p2m is managed by Xen
>  - pagetables are owned by the guest
>  - mmu_update hypercall not available
> * it uses event callback and not vlapic emulation,
> * IDT is native, so set_trap_table hcall is also N/A for a PVH guest.
> 
> For a full list of hcalls supported for PVH, see pvh_hypercall64_table
> in arch/x86/hvm/hvm.c in xen.  From the ABI prespective, it's mostly a
> PV guest with auto translate, although it does use hvm_op for setting
> callback vector."
> 
> Use .ascii and .asciz to define xen feature string. Note, the PVH
> string must be in a single line (not multiple lines with \) to keep the
> assembler from putting null char after each string before \.
> This patch allows it to be configured and enabled.
> 
> Lastly remove some of the scaffolding.
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>


>  arch/x86/xen/Kconfig    | 8 ++++++++
>  arch/x86/xen/xen-head.S | 8 +++++++-
>  include/xen/xen.h       | 3 +--
>  3 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
> index 1a3c765..161cc34 100644
> --- a/arch/x86/xen/Kconfig
> +++ b/arch/x86/xen/Kconfig
> @@ -51,3 +51,11 @@ config XEN_DEBUG_FS
>  	  Enable statistics output and various tuning options in debugfs.
>  	  Enabling this option may incur a significant performance overhead.
>  
> +config XEN_PVH
> +	bool "Support for running as a PVH guest"
> +	depends on X86_64 && XEN && XEN_PVHVM
> +	default n
> +	help
> +	   This option enables support for running as a PVH guest (PV guest
> +	   using hardware extensions) under a suitably capable hypervisor.
> +	   If unsure, say N.
> diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
> index 7faed58..56f42c0 100644
> --- a/arch/x86/xen/xen-head.S
> +++ b/arch/x86/xen/xen-head.S
> @@ -13,6 +13,12 @@
>  #include <xen/interface/elfnote.h>
>  #include <asm/xen/interface.h>
>  
> +#ifdef CONFIG_XEN_PVH
> +#define PVH_FEATURES_STR  "|writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel|hvm_callback_vector"
> +#else
> +#define PVH_FEATURES_STR  ""
> +#endif
> +
>  	__INIT
>  ENTRY(startup_xen)
>  	cld
> @@ -95,7 +101,7 @@ NEXT_HYPERCALL(arch_6)
>  #endif
>  	ELFNOTE(Xen, XEN_ELFNOTE_ENTRY,          _ASM_PTR startup_xen)
>  	ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, _ASM_PTR hypercall_page)
> -	ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,       .asciz "!writable_page_tables|pae_pgdir_above_4gb")
> +	ELFNOTE(Xen, XEN_ELFNOTE_FEATURES,       .ascii "!writable_page_tables|pae_pgdir_above_4gb"; .asciz PVH_FEATURES_STR)
>  	ELFNOTE(Xen, XEN_ELFNOTE_PAE_MODE,       .asciz "yes")
>  	ELFNOTE(Xen, XEN_ELFNOTE_LOADER,         .asciz "generic")
>  	ELFNOTE(Xen, XEN_ELFNOTE_L1_MFN_VALID,
> diff --git a/include/xen/xen.h b/include/xen/xen.h
> index 1d6a237..a248002 100644
> --- a/include/xen/xen.h
> +++ b/include/xen/xen.h
> @@ -29,8 +29,7 @@ extern enum xen_domain_type xen_domain_type;
>  #define xen_initial_domain()	(0)
>  #endif	/* CONFIG_XEN_DOM0 */
>  
> -#ifdef CONFIG_XEN_PVHVM
> -/* Temporarily under XEN_PVHVM, but will be under CONFIG_XEN_PVH */
> +#ifdef CONFIG_XEN_PVH
>  #include <xen/features.h>
>  #define xen_pvh_domain() (xen_pv_domain() && \
>  			  xen_feature(XENFEAT_auto_translated_physmap) && \
> -- 
> 1.8.3.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 02/12] xen/pvh: Define what an PVH guest is.
  2013-12-18 14:22   ` [Xen-devel] " Stefano Stabellini
@ 2013-12-18 14:55     ` Stefano Stabellini
  2013-12-18 16:01       ` Ian Campbell
  2013-12-18 14:57     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 46+ messages in thread
From: Stefano Stabellini @ 2013-12-18 14:55 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Konrad Rzeszutek Wilk, linux-kernel, david.vrabel, jbeulich,
	xen-devel, boris.ostrovsky

On Wed, 18 Dec 2013, Stefano Stabellini wrote:
> On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > 
> > Which is a PV guest with auto page translation enabled
> > and with vector callback. It is a cross between PVHVM and PV.
> > 
> > The Xen side defines PVH as (from docs/misc/pvh-readme.txt,
> > with modifications):
> > 
> > "* the guest uses auto translate:
> >  - p2m is managed by Xen
> >  - pagetables are owned by the guest
> >  - mmu_update hypercall not available
> > * it uses event callback and not vlapic emulation,
> > * IDT is native, so set_trap_table hcall is also N/A for a PVH guest.
> > 
> > For a full list of hcalls supported for PVH, see pvh_hypercall64_table
> > in arch/x86/hvm/hvm.c in xen.  From the ABI prespective, it's mostly a
> > PV guest with auto translate, although it does use hvm_op for setting
> > callback vector."
> > 
> > We don't have yet a Kconfig entry setup as we do not
> > have all the parts ready for it - so we piggyback
> > on the PVHVM config option. This scaffolding will
> > be removed later.
> > 
> > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> 
> Could you please add an "&& CONFIG_X86"?

On second thought, given that it is just temporary and that PVHVM is not
defined on ARM, it could be OK. But maybe it is worth adding a small
comment on the fact that this is an x86-only option.


> >  include/xen/xen.h | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/include/xen/xen.h b/include/xen/xen.h
> > index a74d436..1d6a237 100644
> > --- a/include/xen/xen.h
> > +++ b/include/xen/xen.h
> > @@ -29,4 +29,13 @@ extern enum xen_domain_type xen_domain_type;
> >  #define xen_initial_domain()	(0)
> >  #endif	/* CONFIG_XEN_DOM0 */
> >  
> > +#ifdef CONFIG_XEN_PVHVM
> > +/* Temporarily under XEN_PVHVM, but will be under CONFIG_XEN_PVH */
> > +#include <xen/features.h>
> > +#define xen_pvh_domain() (xen_pv_domain() && \
> > +			  xen_feature(XENFEAT_auto_translated_physmap) && \
> > +			  xen_have_vector_callback)
> > +#else
> > +#define xen_pvh_domain()	(0)
> > +#endif
> >  #endif	/* _XEN_XEN_H */
> > -- 
> > 1.8.3.1
> > 
> > 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
> > 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 11/12] xen/pvh: Disable PV code that does not work with PVH.
  2013-12-18 14:19   ` [Xen-devel] " Stefano Stabellini
@ 2013-12-18 14:56     ` Konrad Rzeszutek Wilk
  2013-12-18 15:22       ` Stefano Stabellini
  0 siblings, 1 reply; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-18 14:56 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich

On Wed, Dec 18, 2013 at 02:19:28PM +0000, Stefano Stabellini wrote:
> On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > 
> > As we do not have yet a mechanism for that.
> > 
> > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> 
> Even though we don't even compile cpu_hotplug on ARM yet, I don't like
> this change because it affects ARM guests too.

Perhaps then just #ifdef it with CONFIG_X86 for right now?

> 
> Please add a note about ARM guests in the FIXME line.

Good idea. Is the offline/online part working in ARM? Or is that it
has not been tested?
> 
> 
> >  drivers/xen/cpu_hotplug.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/xen/cpu_hotplug.c b/drivers/xen/cpu_hotplug.c
> > index cc6513a..cbb02af 100644
> > --- a/drivers/xen/cpu_hotplug.c
> > +++ b/drivers/xen/cpu_hotplug.c
> > @@ -4,6 +4,7 @@
> >  
> >  #include <xen/xen.h>
> >  #include <xen/xenbus.h>
> > +#include <xen/features.h>
> >  
> >  #include <asm/xen/hypervisor.h>
> >  #include <asm/cpu.h>
> > @@ -102,7 +103,8 @@ static int __init setup_vcpu_hotplug_event(void)
> >  	static struct notifier_block xsn_cpu = {
> >  		.notifier_call = setup_cpu_watcher };
> >  
> > -	if (!xen_pv_domain())
> > +	/* PVH TBD/FIXME: future work */
> > +	if (!xen_pv_domain() || xen_feature(XENFEAT_auto_translated_physmap))
> >  		return -ENODEV;
> >  
> >  	register_xenstore_notifier(&xsn_cpu);
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 02/12] xen/pvh: Define what an PVH guest is.
  2013-12-18 14:22   ` [Xen-devel] " Stefano Stabellini
  2013-12-18 14:55     ` Stefano Stabellini
@ 2013-12-18 14:57     ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-18 14:57 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich

On Wed, Dec 18, 2013 at 02:22:35PM +0000, Stefano Stabellini wrote:
> On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > 
> > Which is a PV guest with auto page translation enabled
> > and with vector callback. It is a cross between PVHVM and PV.
> > 
> > The Xen side defines PVH as (from docs/misc/pvh-readme.txt,
> > with modifications):
> > 
> > "* the guest uses auto translate:
> >  - p2m is managed by Xen
> >  - pagetables are owned by the guest
> >  - mmu_update hypercall not available
> > * it uses event callback and not vlapic emulation,
> > * IDT is native, so set_trap_table hcall is also N/A for a PVH guest.
> > 
> > For a full list of hcalls supported for PVH, see pvh_hypercall64_table
> > in arch/x86/hvm/hvm.c in xen.  From the ABI prespective, it's mostly a
> > PV guest with auto translate, although it does use hvm_op for setting
> > callback vector."
> > 
> > We don't have yet a Kconfig entry setup as we do not
> > have all the parts ready for it - so we piggyback
> > on the PVHVM config option. This scaffolding will
> > be removed later.
> > 
> > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> 
> Could you please add an "&& CONFIG_X86"?

I think that CONFIG_XEN_PVHVM is only X86 anyhow.

And the XEN_PVH is going to be X86 as well - so it would always
be turned off on ARM.

I can of course add it in - but it might not neccessary?
> 
> 
> >  include/xen/xen.h | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/include/xen/xen.h b/include/xen/xen.h
> > index a74d436..1d6a237 100644
> > --- a/include/xen/xen.h
> > +++ b/include/xen/xen.h
> > @@ -29,4 +29,13 @@ extern enum xen_domain_type xen_domain_type;
> >  #define xen_initial_domain()	(0)
> >  #endif	/* CONFIG_XEN_DOM0 */
> >  
> > +#ifdef CONFIG_XEN_PVHVM
> > +/* Temporarily under XEN_PVHVM, but will be under CONFIG_XEN_PVH */
> > +#include <xen/features.h>
> > +#define xen_pvh_domain() (xen_pv_domain() && \
> > +			  xen_feature(XENFEAT_auto_translated_physmap) && \
> > +			  xen_have_vector_callback)
> > +#else
> > +#define xen_pvh_domain()	(0)
> > +#endif
> >  #endif	/* _XEN_XEN_H */
> > -- 
> > 1.8.3.1
> > 
> > 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
> > 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 03/12] xen/pvh: Early bootup changes in PV code.
  2013-12-18 14:27   ` [Xen-devel] " Stefano Stabellini
@ 2013-12-18 14:58     ` Konrad Rzeszutek Wilk
  2013-12-18 15:05       ` Stefano Stabellini
  0 siblings, 1 reply; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-18 14:58 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich

On Wed, Dec 18, 2013 at 02:27:08PM +0000, Stefano Stabellini wrote:
> On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > 
> > In the bootup code for PVH we can trap cpuid via vmexit, so don't
> > need to use emulated prefix call. We also check for vector callback
> > early on, as it is a required feature. PVH also runs at default kernel
> > IOPL.
> > 
> > Finally, pure PV settings are moved to a separate function that are
> > only called for pure PV, ie, pv with pvmmu. They are also #ifdef
> > with CONFIG_XEN_PVMMU.
> > 
> > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > ---
> >  arch/x86/xen/enlighten.c | 63 +++++++++++++++++++++++++++++++++---------------
> >  arch/x86/xen/setup.c     | 18 +++++++++-----
> >  2 files changed, 56 insertions(+), 25 deletions(-)
> > 
> > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> > index fa6ade7..3b21c82 100644
> > --- a/arch/x86/xen/enlighten.c
> > +++ b/arch/x86/xen/enlighten.c
> > @@ -46,6 +46,7 @@
> >  #include <xen/hvm.h>
> >  #include <xen/hvc-console.h>
> >  #include <xen/acpi.h>
> > +#include <xen/features.h>
> >  
> >  #include <asm/paravirt.h>
> >  #include <asm/apic.h>
> > @@ -262,8 +263,9 @@ static void __init xen_banner(void)
> >  	struct xen_extraversion extra;
> >  	HYPERVISOR_xen_version(XENVER_extraversion, &extra);
> >  
> > -	printk(KERN_INFO "Booting paravirtualized kernel on %s\n",
> > -	       pv_info.name);
> > +	pr_info("Booting paravirtualized kernel %son %s\n",
> > +		xen_feature(XENFEAT_auto_translated_physmap) ?
> > +			"with PVH extensions " : "", pv_info.name);
> >  	printk(KERN_INFO "Xen version: %d.%d%s%s\n",
> >  	       version >> 16, version & 0xffff, extra.extraversion,
> >  	       xen_feature(XENFEAT_mmu_pt_update_preserve_ad) ? " (preserve-AD)" : "");
> > @@ -331,12 +333,15 @@ static void xen_cpuid(unsigned int *ax, unsigned int *bx,
> >  		break;
> >  	}
> >  
> > -	asm(XEN_EMULATE_PREFIX "cpuid"
> > -		: "=a" (*ax),
> > -		  "=b" (*bx),
> > -		  "=c" (*cx),
> > -		  "=d" (*dx)
> > -		: "0" (*ax), "2" (*cx));
> > +	if (xen_pvh_domain())
> > +		native_cpuid(ax, bx, cx, dx);
> > +	else
> > +		asm(XEN_EMULATE_PREFIX "cpuid"
> > +			: "=a" (*ax),
> > +			"=b" (*bx),
> > +			"=c" (*cx),
> > +			"=d" (*dx)
> > +			: "0" (*ax), "2" (*cx));
> >  
> >  	*bx &= maskebx;
> >  	*cx &= maskecx;
> > @@ -1420,6 +1425,19 @@ static void __init xen_setup_stackprotector(void)
> >  	pv_cpu_ops.load_gdt = xen_load_gdt;
> >  }
> >  
> > +static void __init xen_pvh_early_guest_init(void)
> > +{
> > +	if (xen_feature(XENFEAT_hvm_callback_vector))
> > +		xen_have_vector_callback = 1;
> > +
> > +#ifdef CONFIG_X86_32
> > +	if (xen_feature(XENFEAT_auto_translated_physmap)) {
> > +		xen_raw_printk("ERROR: 32bit PVH guests are not supported\n");
> > +		BUG();
> > +	}
> > +#endif
> 
> Shouldn't we detect this error at build time?

You can't even set the XEN_PVH for 32-bit.
> If so, does it make sense to check for it at run time too?
> 

I think I might as well rip that out - as that code won't be ever
touched. And perhaps just stash in a BUG with /* PVH: FIXME */?


> The rest looks OK to me.

Thank you!
> 
> > +}
> > +
> >  /* First C function to be called on Xen boot */
> >  asmlinkage void __init xen_start_kernel(void)
> >  {
> > @@ -1431,13 +1449,18 @@ asmlinkage void __init xen_start_kernel(void)
> >  
> >  	xen_domain_type = XEN_PV_DOMAIN;
> >  
> > +	xen_setup_features();
> > +	xen_pvh_early_guest_init();
> >  	xen_setup_machphys_mapping();
> >  
> >  	/* Install Xen paravirt ops */
> >  	pv_info = xen_info;
> >  	pv_init_ops = xen_init_ops;
> > -	pv_cpu_ops = xen_cpu_ops;
> >  	pv_apic_ops = xen_apic_ops;
> > +	if (xen_pvh_domain())
> > +		pv_cpu_ops.cpuid = xen_cpuid;
> > +	else
> > +		pv_cpu_ops = xen_cpu_ops;
> >  
> >  	x86_init.resources.memory_setup = xen_memory_setup;
> >  	x86_init.oem.arch_setup = xen_arch_setup;
> > @@ -1469,8 +1492,6 @@ asmlinkage void __init xen_start_kernel(void)
> >  	/* Work out if we support NX */
> >  	x86_configure_nx();
> >  
> > -	xen_setup_features();
> > -
> >  	/* Get mfn list */
> >  	if (!xen_feature(XENFEAT_auto_translated_physmap))
> >  		xen_build_dynamic_phys_to_machine();
> > @@ -1548,14 +1569,18 @@ asmlinkage void __init xen_start_kernel(void)
> >  	/* set the limit of our address space */
> >  	xen_reserve_top();
> >  
> > -	/* We used to do this in xen_arch_setup, but that is too late on AMD
> > -	 * were early_cpu_init (run before ->arch_setup()) calls early_amd_init
> > -	 * which pokes 0xcf8 port.
> > -	 */
> > -	set_iopl.iopl = 1;
> > -	rc = HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl);
> > -	if (rc != 0)
> > -		xen_raw_printk("physdev_op failed %d\n", rc);
> > +	/* PVH: runs at default kernel iopl of 0 */
> > +	if (!xen_pvh_domain()) {
> > +		/*
> > +		 * We used to do this in xen_arch_setup, but that is too late
> > +		 * on AMD were early_cpu_init (run before ->arch_setup()) calls
> > +		 * early_amd_init which pokes 0xcf8 port.
> > +		 */
> > +		set_iopl.iopl = 1;
> > +		rc = HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl);
> > +		if (rc != 0)
> > +			xen_raw_printk("physdev_op failed %d\n", rc);
> > +	}
> >  
> >  #ifdef CONFIG_X86_32
> >  	/* set up basic CPUID stuff */
> > diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> > index 68c054f..2137c51 100644
> > --- a/arch/x86/xen/setup.c
> > +++ b/arch/x86/xen/setup.c
> > @@ -563,16 +563,13 @@ void xen_enable_nmi(void)
> >  		BUG();
> >  #endif
> >  }
> > -void __init xen_arch_setup(void)
> > +void __init xen_pvmmu_arch_setup(void)
> >  {
> > -	xen_panic_handler_init();
> > -
> >  	HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_4gb_segments);
> >  	HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_writable_pagetables);
> >  
> > -	if (!xen_feature(XENFEAT_auto_translated_physmap))
> > -		HYPERVISOR_vm_assist(VMASST_CMD_enable,
> > -				     VMASST_TYPE_pae_extended_cr3);
> > +	HYPERVISOR_vm_assist(VMASST_CMD_enable,
> > +			     VMASST_TYPE_pae_extended_cr3);
> >  
> >  	if (register_callback(CALLBACKTYPE_event, xen_hypervisor_callback) ||
> >  	    register_callback(CALLBACKTYPE_failsafe, xen_failsafe_callback))
> > @@ -581,6 +578,15 @@ void __init xen_arch_setup(void)
> >  	xen_enable_sysenter();
> >  	xen_enable_syscall();
> >  	xen_enable_nmi();
> > +}
> > +
> > +/* This function is not called for HVM domains */
> > +void __init xen_arch_setup(void)
> > +{
> > +	xen_panic_handler_init();
> > +	if (!xen_feature(XENFEAT_auto_translated_physmap))
> > +		xen_pvmmu_arch_setup();
> > +
> >  #ifdef CONFIG_ACPI
> >  	if (!(xen_start_info->flags & SIF_INITDOMAIN)) {
> >  		printk(KERN_INFO "ACPI in unprivileged domain disabled\n");
> > -- 
> > 1.8.3.1
> > 
> > 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
> > 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 04/12] xen/pvh: Don't setup P2M tree.
  2013-12-18 14:39   ` [Xen-devel] " Stefano Stabellini
@ 2013-12-18 15:05     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-18 15:05 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich

On Wed, Dec 18, 2013 at 02:39:35PM +0000, Stefano Stabellini wrote:
> On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > P2M is not available for PVH. Fortunatly for us the
> > P2M code already has mostly the support for auto-xlat guest thanks to
> > commit 3d24bbd7dddbea54358a9795abaf051b0f18973c
> > "grant-table: call set_phys_to_machine after mapping grant refs"
> > which: "
> > introduces set_phys_to_machine calls for auto_translated guests
> > (even on x86) in gnttab_map_refs and gnttab_unmap_refs.
> > translated by swiotlb-xen... " so we don't need to muck much.
> 
> Just a note: with 3d24bbd7dddbea54358a9795abaf051b0f18973c you'll get
> set_phys_to_machine calls from gnttab_map_refs and gnttab_unmap_refs but
> PVH guests won't do anything with them.
> 
> If we assume that an IOMMU is always present on the plaform and Xen is
> going to make the appropriate IOMMU pagetable changes in the hypercall
> implementation of GNTTABOP_map_grant_ref and GNTTABOP_unmap_grant_ref,
> then eveything should be transparent from PVH Dom0 point of view and DMA
> transfers involving foreign pages keep working with no issies. 

I think this Xen upstream git commit:

commit 0e624511f228aef57a3c17520c466c1986e68f62
Author: Mukesh Rathor <mukesh.rathor@oracle.com>
Date:   Wed Nov 27 15:17:02 2013 +0100

    PVH dom0: iommu related changes
    
has this:
static __init void check_dom0_pvh_reqs(struct domain *d)
+{
+    if ( !iommu_enabled )
+        panic("Presently, iommu must be enabled for pvh dom0\n");
+
+    if ( iommu_passthrough )
+        panic("For pvh dom0, dom0-passthrough must not be enabled\n");
+
+    iommu_dom0_strict = 1;
+}
+


so we can assume that. I should include your description and also point
out to the Xen commit to explain why we are ignoring the non-IOMMU case.

This presents an interesting issue - we can boot PVH domU on
machines without IOMMU. We should also be able to do PCI passthrough
to those - and ... BOOM, unless the toolstack checks that the
hypervisor has IOMMU enabled. Actually this check should exist for HVM
guests too - hopefully I can piggyback on that.

<files away a bug for libxl>
> 
> Otherwise we do need a P2M (and an M2P) for PVH Dom0 to track these
> foreign pages: on ARM I introduced two redblack trees to do it (see
> arch/arm/xen/p2m.c).
> 

<nods> There is that. I am hoping to not have to do that. Granted
it is mostly all in the Xen SWIOTLB so not soo bad and nicely contained.



> 
> 
> > But we still have to inhibit the building of the P2M tree.
> > That had been done in the past by not calling
> > xen_build_dynamic_phys_to_machine (which setups the P2M tree
> > and gives us virtual address to access them). But we are missing
> > a check for xen_build_mfn_list_list - which was continuing to setup
> > the P2M tree and would blow up at trying to get the virtual
> > address of p2m_missing (which would have been setup by
> > xen_build_dynamic_phys_to_machine).
> > 
> > Hence a check is needed to not call xen_build_mfn_list_list when
> > running in auto-xlat mode.
> > 
> > Instead of replicating the check for auto-xlat in enlighten.c
> > do it in the p2m.c code. The reason is that the xen_build_mfn_list_list
> > is called also in xen_arch_post_suspend without any checks for
> > auto-xlat. So for PVH or PV with auto-xlat - we would needlessly
> > allocate space for an P2M tree.
> > 
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > ---
> >  arch/x86/xen/enlighten.c |  3 +--
> >  arch/x86/xen/p2m.c       | 12 ++++++++++--
> >  2 files changed, 11 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> > index 3b21c82..c7341d0 100644
> > --- a/arch/x86/xen/enlighten.c
> > +++ b/arch/x86/xen/enlighten.c
> > @@ -1493,8 +1493,7 @@ asmlinkage void __init xen_start_kernel(void)
> >  	x86_configure_nx();
> >  
> >  	/* Get mfn list */
> > -	if (!xen_feature(XENFEAT_auto_translated_physmap))
> > -		xen_build_dynamic_phys_to_machine();
> > +	xen_build_dynamic_phys_to_machine();
> >  
> >  	/*
> >  	 * Set up kernel GDT and segment registers, mainly so that
> > diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
> > index 2ae8699..fb7ee0a 100644
> > --- a/arch/x86/xen/p2m.c
> > +++ b/arch/x86/xen/p2m.c
> > @@ -280,6 +280,9 @@ void __ref xen_build_mfn_list_list(void)
> >  {
> >  	unsigned long pfn;
> >  
> > +	if (xen_feature(XENFEAT_auto_translated_physmap))
> > +		return;
> > +
> >  	/* Pre-initialize p2m_top_mfn to be completely missing */
> >  	if (p2m_top_mfn == NULL) {
> >  		p2m_mid_missing_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
> > @@ -346,10 +349,15 @@ void xen_setup_mfn_list_list(void)
> >  /* Set up p2m_top to point to the domain-builder provided p2m pages */
> >  void __init xen_build_dynamic_phys_to_machine(void)
> >  {
> > -	unsigned long *mfn_list = (unsigned long *)xen_start_info->mfn_list;
> > -	unsigned long max_pfn = min(MAX_DOMAIN_PAGES, xen_start_info->nr_pages);
> > +	unsigned long *mfn_list;
> > +	unsigned long max_pfn;
> >  	unsigned long pfn;
> >  
> > +	 if (xen_feature(XENFEAT_auto_translated_physmap))
> > +		return;
> > +
> > +	mfn_list = (unsigned long *)xen_start_info->mfn_list;
> > +	max_pfn = min(MAX_DOMAIN_PAGES, xen_start_info->nr_pages);
> >  	xen_max_p2m_pfn = max_pfn;
> >  
> >  	p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
> > -- 
> > 1.8.3.1
> > 
> > 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
> > 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 03/12] xen/pvh: Early bootup changes in PV code.
  2013-12-18 14:58     ` Konrad Rzeszutek Wilk
@ 2013-12-18 15:05       ` Stefano Stabellini
  0 siblings, 0 replies; 46+ messages in thread
From: Stefano Stabellini @ 2013-12-18 15:05 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, xen-devel, linux-kernel, boris.ostrovsky,
	david.vrabel, mukesh.rathor, jbeulich

On Wed, 18 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > If so, does it make sense to check for it at run time too?
> > 
> 
> I think I might as well rip that out - as that code won't be ever
> touched. And perhaps just stash in a BUG with /* PVH: FIXME */?

That would be fine.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 08/12] xen/pvh: MMU changes for PVH
  2013-12-18 14:48   ` [Xen-devel] " Stefano Stabellini
@ 2013-12-18 15:10     ` Konrad Rzeszutek Wilk
  2013-12-18 15:15       ` Stefano Stabellini
  0 siblings, 1 reply; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-18 15:10 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich

On Wed, Dec 18, 2013 at 02:48:38PM +0000, Stefano Stabellini wrote:
> On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > 
> > .. which are surprinsingly small compared to the amount for PV code.
> > 
> > PVH uses mostly native mmu ops, we leave the generic (native_*) for
> > the majority and just overwrite the baremetal with the ones we need.
> > 
> > We also optimize one - the TLB flush. The native operation would
> > needlessly IPI offline VCPUs causing extra wakeups. Using the
> > Xen one avoids that and lets the hypervisor determine which
> > VCPU needs the TLB flush.
> > 
> > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > ---
> >  arch/x86/xen/mmu.c | 35 +++++++++++++++++++++++++++++++----
> >  1 file changed, 31 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> > index ce563be..77b7622 100644
> > --- a/arch/x86/xen/mmu.c
> > +++ b/arch/x86/xen/mmu.c
> > @@ -74,6 +74,7 @@
> >  #include <xen/interface/version.h>
> >  #include <xen/interface/memory.h>
> >  #include <xen/hvc-console.h>
> > +#include <xen/balloon.h>
> >  
> >  #include "multicalls.h"
> >  #include "mmu.h"
> > @@ -1207,6 +1208,8 @@ static void __init xen_pagetable_init(void)
> >  #endif
> >  	paging_init();
> >  	xen_setup_shared_info();
> > +	if (xen_feature(XENFEAT_auto_translated_physmap))
> > +		return;
> >  #ifdef CONFIG_X86_64
> >  	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> >  		unsigned long new_mfn_list;
> 
> At the very least you should remove the second
> XENFEAT_auto_translated_physmap check.  Also xen_setup_shared_info
> contains yet another XENFEAT_auto_translated_physmap check. Maybe we
> could refactor the code a bit to look nicer. Having a separate
> xen_pagetable_init function for PVH could help.

Right, that would be much nicer.
> 
> 
> > @@ -1556,6 +1559,10 @@ static void __init xen_set_pte_init(pte_t *ptep, pte_t pte)
> >  static void pin_pagetable_pfn(unsigned cmd, unsigned long pfn)
> >  {
> >  	struct mmuext_op op;
> > +
> > +	if (xen_feature(XENFEAT_writable_page_tables))
> > +		return;
> > +
> >  	op.cmd = cmd;
> >  	op.arg1.mfn = pfn_to_mfn(pfn);
> >  	if (HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF))
> 
> Why do we need this? I thought that all the callers of pin_pagetable_pfn
> are not actually enabled on PVH.

We still call xen_setup_kernel_pagetable.
> 
> 
> > @@ -1753,6 +1760,10 @@ static void set_page_prot_flags(void *addr, pgprot_t prot, unsigned long flags)
> >  	unsigned long pfn = __pa(addr) >> PAGE_SHIFT;
> >  	pte_t pte = pfn_pte(pfn, prot);
> >  
> > +	/* recall for PVH, page tables are native. */
> > +	if (xen_feature(XENFEAT_auto_translated_physmap))
> > +		return;
> > +
> >  	if (HYPERVISOR_update_va_mapping((unsigned long)addr, pte, flags))
> >  		BUG();
> >  }
> 
> This one too. Is it because we are reusing xen_setup_kernel_pagetable on
> PVH?

Yup.
> 
> 
> > @@ -1834,6 +1845,9 @@ static void convert_pfn_mfn(void *v)
> >  	pte_t *pte = v;
> >  	int i;
> >  
> > +	if (xen_feature(XENFEAT_auto_translated_physmap))
> > +		return;
> > +
> >  	/* All levels are converted the same way, so just treat them
> >  	   as ptes. */
> >  	for (i = 0; i < PTRS_PER_PTE; i++)
> 
> This is getting pretty bad.
> Can we find a way to refactor xen_setup_kernel_pagetable so that we
> don't need all this? Maybe we need a new function?

Is it that bad? Doing a copy of xen_setup_kernel_pagetable just
for PVH strikes me as error prone. Having the checks in the
functions that xen_setup_kernel_pagetable is much easier and
nicer I think.

Maybe if we did a big 'if (xen_feature(XEN..)' inside of
xen_setup_kernel_pagetable that would be easier?

> 
> 
> > @@ -1863,6 +1877,7 @@ static void __init check_pt_base(unsigned long *pt_base, unsigned long *pt_end,
> >   * but that's enough to get __va working.  We need to fill in the rest
> >   * of the physical mapping once some sort of allocator has been set
> >   * up.
> > + * NOTE: for PVH, the page tables are native.
> >   */
> >  void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
> >  {
> > @@ -1940,10 +1955,13 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
> >  	 * structure to attach it to, so make sure we just set kernel
> >  	 * pgd.
> >  	 */
> > -	xen_mc_batch();
> > -	__xen_write_cr3(true, __pa(init_level4_pgt));
> > -	xen_mc_issue(PARAVIRT_LAZY_CPU);
> > -
> > +	if (xen_feature(XENFEAT_writable_page_tables)) {
> > +		native_write_cr3(__pa(init_level4_pgt));
> > +	} else {
> > +		xen_mc_batch();
> > +		__xen_write_cr3(true, __pa(init_level4_pgt));
> > +		xen_mc_issue(PARAVIRT_LAZY_CPU);
> > +	}
> >  	/* We can't that easily rip out L3 and L2, as the Xen pagetables are
> >  	 * set out this way: [L4], [L1], [L2], [L3], [L1], [L1] ...  for
> >  	 * the initial domain. For guests using the toolstack, they are in:
> > @@ -2207,6 +2225,15 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
> >  void __init xen_init_mmu_ops(void)
> >  {
> >  	x86_init.paging.pagetable_init = xen_pagetable_init;
> > +
> > +	/* Optimization - we can use the HVM one but it has no idea which
> > +	 * VCPUs are descheduled - which means that it will needlessly IPI
> > +	 * them. Xen knows so let it do the job.
> > +	 */
> > +	if (xen_feature(XENFEAT_auto_translated_physmap)) {
> > +		pv_mmu_ops.flush_tlb_others = xen_flush_tlb_others;
> > +		return;
> > +	}
> >  	pv_mmu_ops = xen_mmu_ops;
> >  
> >  	memset(dummy_mapping, 0xff, PAGE_SIZE);
> > -- 
> > 1.8.3.1
> > 
> > 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
> > 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 08/12] xen/pvh: MMU changes for PVH
  2013-12-18 15:10     ` Konrad Rzeszutek Wilk
@ 2013-12-18 15:15       ` Stefano Stabellini
  0 siblings, 0 replies; 46+ messages in thread
From: Stefano Stabellini @ 2013-12-18 15:15 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, xen-devel, linux-kernel, boris.ostrovsky,
	david.vrabel, mukesh.rathor, jbeulich

On Wed, 18 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > > @@ -1834,6 +1845,9 @@ static void convert_pfn_mfn(void *v)
> > >  	pte_t *pte = v;
> > >  	int i;
> > >  
> > > +	if (xen_feature(XENFEAT_auto_translated_physmap))
> > > +		return;
> > > +
> > >  	/* All levels are converted the same way, so just treat them
> > >  	   as ptes. */
> > >  	for (i = 0; i < PTRS_PER_PTE; i++)
> > 
> > This is getting pretty bad.
> > Can we find a way to refactor xen_setup_kernel_pagetable so that we
> > don't need all this? Maybe we need a new function?
> 
> Is it that bad? Doing a copy of xen_setup_kernel_pagetable just
> for PVH strikes me as error prone. Having the checks in the
> functions that xen_setup_kernel_pagetable is much easier and
> nicer I think.
> 
> Maybe if we did a big 'if (xen_feature(XEN..)' inside of
> xen_setup_kernel_pagetable that would be easier?

Yes, that should make things clearer.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 11/12] xen/pvh: Disable PV code that does not work with PVH.
  2013-12-18 14:56     ` Konrad Rzeszutek Wilk
@ 2013-12-18 15:22       ` Stefano Stabellini
  0 siblings, 0 replies; 46+ messages in thread
From: Stefano Stabellini @ 2013-12-18 15:22 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, xen-devel, linux-kernel, boris.ostrovsky,
	david.vrabel, mukesh.rathor, jbeulich

On Wed, 18 Dec 2013, Konrad Rzeszutek Wilk wrote:
> On Wed, Dec 18, 2013 at 02:19:28PM +0000, Stefano Stabellini wrote:
> > On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > 
> > > As we do not have yet a mechanism for that.
> > > 
> > > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > 
> > Even though we don't even compile cpu_hotplug on ARM yet, I don't like
> > this change because it affects ARM guests too.
> 
> Perhaps then just #ifdef it with CONFIG_X86 for right now?

That or a comment.


> > Please add a note about ARM guests in the FIXME line.
> 
> Good idea. Is the offline/online part working in ARM? Or is that it
> has not been tested?

At the moment we are using PSCI only on ARM (that is a firmware
interface that we are "emulating" in the hypervisor), but it is not
exactly the same: PSCI is about enabling secondary cpus or
disabling/suspending cpus. It is not about cpu-hotplug. So one day we
might want both.



> > >  drivers/xen/cpu_hotplug.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/xen/cpu_hotplug.c b/drivers/xen/cpu_hotplug.c
> > > index cc6513a..cbb02af 100644
> > > --- a/drivers/xen/cpu_hotplug.c
> > > +++ b/drivers/xen/cpu_hotplug.c
> > > @@ -4,6 +4,7 @@
> > >  
> > >  #include <xen/xen.h>
> > >  #include <xen/xenbus.h>
> > > +#include <xen/features.h>
> > >  
> > >  #include <asm/xen/hypervisor.h>
> > >  #include <asm/cpu.h>
> > > @@ -102,7 +103,8 @@ static int __init setup_vcpu_hotplug_event(void)
> > >  	static struct notifier_block xsn_cpu = {
> > >  		.notifier_call = setup_cpu_watcher };
> > >  
> > > -	if (!xen_pv_domain())
> > > +	/* PVH TBD/FIXME: future work */
> > > +	if (!xen_pv_domain() || xen_feature(XENFEAT_auto_translated_physmap))
> > >  		return -ENODEV;
> > >  
> > >  	register_xenstore_notifier(&xsn_cpu);
> > 
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 02/12] xen/pvh: Define what an PVH guest is.
  2013-12-18 14:55     ` Stefano Stabellini
@ 2013-12-18 16:01       ` Ian Campbell
  2013-12-18 16:58         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 46+ messages in thread
From: Ian Campbell @ 2013-12-18 16:01 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: linux-kernel, david.vrabel, jbeulich, xen-devel, boris.ostrovsky

On Wed, 2013-12-18 at 14:55 +0000, Stefano Stabellini wrote:
> On Wed, 18 Dec 2013, Stefano Stabellini wrote:
> > On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > 
> > > Which is a PV guest with auto page translation enabled
> > > and with vector callback. It is a cross between PVHVM and PV.
> > > 
> > > The Xen side defines PVH as (from docs/misc/pvh-readme.txt,
> > > with modifications):
> > > 
> > > "* the guest uses auto translate:
> > >  - p2m is managed by Xen
> > >  - pagetables are owned by the guest
> > >  - mmu_update hypercall not available
> > > * it uses event callback and not vlapic emulation,
> > > * IDT is native, so set_trap_table hcall is also N/A for a PVH guest.
> > > 
> > > For a full list of hcalls supported for PVH, see pvh_hypercall64_table
> > > in arch/x86/hvm/hvm.c in xen.  From the ABI prespective, it's mostly a
> > > PV guest with auto translate, although it does use hvm_op for setting
> > > callback vector."
> > > 
> > > We don't have yet a Kconfig entry setup as we do not
> > > have all the parts ready for it - so we piggyback
> > > on the PVHVM config option. This scaffolding will
> > > be removed later.
> > > 
> > > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > 
> > Could you please add an "&& CONFIG_X86"?
> 
> On second thought, given that it is just temporary and that PVHVM is not
> defined on ARM, it could be OK. But maybe it is worth adding a small
> comment on the fact that this is an x86-only option.

I wonder if it should be CONFIG_XEN_X86_{PVH,PVHVM} instead?



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 02/12] xen/pvh: Define what an PVH guest is.
  2013-12-18 16:01       ` Ian Campbell
@ 2013-12-18 16:58         ` Konrad Rzeszutek Wilk
  2013-12-18 17:03           ` Ian Campbell
  0 siblings, 1 reply; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-18 16:58 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Stefano Stabellini, xen-devel, boris.ostrovsky, linux-kernel,
	jbeulich, david.vrabel

On Wed, Dec 18, 2013 at 04:01:03PM +0000, Ian Campbell wrote:
> On Wed, 2013-12-18 at 14:55 +0000, Stefano Stabellini wrote:
> > On Wed, 18 Dec 2013, Stefano Stabellini wrote:
> > > On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > > > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > > 
> > > > Which is a PV guest with auto page translation enabled
> > > > and with vector callback. It is a cross between PVHVM and PV.
> > > > 
> > > > The Xen side defines PVH as (from docs/misc/pvh-readme.txt,
> > > > with modifications):
> > > > 
> > > > "* the guest uses auto translate:
> > > >  - p2m is managed by Xen
> > > >  - pagetables are owned by the guest
> > > >  - mmu_update hypercall not available
> > > > * it uses event callback and not vlapic emulation,
> > > > * IDT is native, so set_trap_table hcall is also N/A for a PVH guest.
> > > > 
> > > > For a full list of hcalls supported for PVH, see pvh_hypercall64_table
> > > > in arch/x86/hvm/hvm.c in xen.  From the ABI prespective, it's mostly a
> > > > PV guest with auto translate, although it does use hvm_op for setting
> > > > callback vector."
> > > > 
> > > > We don't have yet a Kconfig entry setup as we do not
> > > > have all the parts ready for it - so we piggyback
> > > > on the PVHVM config option. This scaffolding will
> > > > be removed later.
> > > > 
> > > > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > 
> > > Could you please add an "&& CONFIG_X86"?
> > 
> > On second thought, given that it is just temporary and that PVHVM is not
> > defined on ARM, it could be OK. But maybe it is worth adding a small
> > comment on the fact that this is an x86-only option.
> 
> I wonder if it should be CONFIG_XEN_X86_{PVH,PVHVM} instead?

Originally it was CONFIG_XEN_X86_PVH but I figured it would be pointless
as most of the changes were in arch/x86 and that is by default x86.

And then once that work is stabilized, ARM can kind of do the same thing - have
an CONFIG_XEN_PVH that would (hopefully) have the same ABI as x86 PVH?

Thought, you kind of already do PVH in spirit. Is that what you were
alluding too? As ARM already boots in PV and the page table manipulations
are done by the hardware.

?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 02/12] xen/pvh: Define what an PVH guest is.
  2013-12-18 16:58         ` Konrad Rzeszutek Wilk
@ 2013-12-18 17:03           ` Ian Campbell
  0 siblings, 0 replies; 46+ messages in thread
From: Ian Campbell @ 2013-12-18 17:03 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, xen-devel, boris.ostrovsky, linux-kernel,
	jbeulich, david.vrabel

On Wed, 2013-12-18 at 11:58 -0500, Konrad Rzeszutek Wilk wrote:
> On Wed, Dec 18, 2013 at 04:01:03PM +0000, Ian Campbell wrote:
> > On Wed, 2013-12-18 at 14:55 +0000, Stefano Stabellini wrote:
> > > On Wed, 18 Dec 2013, Stefano Stabellini wrote:
> > > > On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > > > > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > > > 
> > > > > Which is a PV guest with auto page translation enabled
> > > > > and with vector callback. It is a cross between PVHVM and PV.
> > > > > 
> > > > > The Xen side defines PVH as (from docs/misc/pvh-readme.txt,
> > > > > with modifications):
> > > > > 
> > > > > "* the guest uses auto translate:
> > > > >  - p2m is managed by Xen
> > > > >  - pagetables are owned by the guest
> > > > >  - mmu_update hypercall not available
> > > > > * it uses event callback and not vlapic emulation,
> > > > > * IDT is native, so set_trap_table hcall is also N/A for a PVH guest.
> > > > > 
> > > > > For a full list of hcalls supported for PVH, see pvh_hypercall64_table
> > > > > in arch/x86/hvm/hvm.c in xen.  From the ABI prespective, it's mostly a
> > > > > PV guest with auto translate, although it does use hvm_op for setting
> > > > > callback vector."
> > > > > 
> > > > > We don't have yet a Kconfig entry setup as we do not
> > > > > have all the parts ready for it - so we piggyback
> > > > > on the PVHVM config option. This scaffolding will
> > > > > be removed later.
> > > > > 
> > > > > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > > 
> > > > Could you please add an "&& CONFIG_X86"?
> > > 
> > > On second thought, given that it is just temporary and that PVHVM is not
> > > defined on ARM, it could be OK. But maybe it is worth adding a small
> > > comment on the fact that this is an x86-only option.
> > 
> > I wonder if it should be CONFIG_XEN_X86_{PVH,PVHVM} instead?
> 
> Originally it was CONFIG_XEN_X86_PVH but I figured it would be pointless
> as most of the changes were in arch/x86 and that is by default x86.
> 
> And then once that work is stabilized, ARM can kind of do the same thing - have
> an CONFIG_XEN_PVH that would (hopefully) have the same ABI as x86 PVH?
> 
> Thought, you kind of already do PVH in spirit. Is that what you were
> alluding too? As ARM already boots in PV and the page table manipulations
> are done by the hardware.
> 
> ?

The Zen answer is that an ARM guest is neither PV nor HVM nor PVHVM.
It's a bit like PVH but is different also (it's further towards the H
end of the spectrum than even PVH).

I'm keen to avoid using these x86 specific terms to widely to refer to
ARM guests, because it leads to confusion.

Ian.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 05/12] xen/pvh: Update E820 to work with PVH
  2013-12-17 20:51 ` [PATCH v11 05/12] xen/pvh: Update E820 to work with PVH Konrad Rzeszutek Wilk
@ 2013-12-18 18:25   ` Stefano Stabellini
  2013-12-18 20:30     ` Konrad Rzeszutek Wilk
  2013-12-18 23:44     ` Mukesh Rathor
  0 siblings, 2 replies; 46+ messages in thread
From: Stefano Stabellini @ 2013-12-18 18:25 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich

On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> In xen_add_extra_mem() we can skip updating P2M as it's managed
> by Xen. PVH maps the entire IO space, but only RAM pages need
> to be repopulated.
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/xen/setup.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index 2137c51..f93bca1 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -27,6 +27,7 @@
>  #include <xen/interface/memory.h>
>  #include <xen/interface/physdev.h>
>  #include <xen/features.h>
> +#include "mmu.h"
>  #include "xen-ops.h"
>  #include "vdso.h"
>  
> @@ -81,6 +82,9 @@ static void __init xen_add_extra_mem(u64 start, u64 size)
>  
>  	memblock_reserve(start, size);
>  
> +	if (xen_feature(XENFEAT_auto_translated_physmap))
> +		return;
> +
>  	xen_max_p2m_pfn = PFN_DOWN(start + size);
>  	for (pfn = PFN_DOWN(start); pfn < xen_max_p2m_pfn; pfn++) {
>  		unsigned long mfn = pfn_to_mfn(pfn);
> @@ -103,6 +107,7 @@ static unsigned long __init xen_do_chunk(unsigned long start,
>  		.domid        = DOMID_SELF
>  	};
>  	unsigned long len = 0;
> +	int xlated_phys = xen_feature(XENFEAT_auto_translated_physmap);
>  	unsigned long pfn;
>  	int ret;
>  
> @@ -116,7 +121,7 @@ static unsigned long __init xen_do_chunk(unsigned long start,
>  				continue;
>  			frame = mfn;
>  		} else {
> -			if (mfn != INVALID_P2M_ENTRY)
> +			if (!xlated_phys && mfn != INVALID_P2M_ENTRY)
>  				continue;
>  			frame = pfn;
>  		}
> @@ -154,6 +159,13 @@ static unsigned long __init xen_do_chunk(unsigned long start,
>  static unsigned long __init xen_release_chunk(unsigned long start,
>  					      unsigned long end)
>  {
> +	/*
> +	 * Xen already ballooned out the E820 non RAM regions for us
> +	 * and set them up properly in EPT.
> +	 */
> +	if (xen_feature(XENFEAT_auto_translated_physmap))
> +		return end - start;
> +
>  	return xen_do_chunk(start, end, true);
>  }
>  
> @@ -222,6 +234,9 @@ static void __init xen_set_identity_and_release_chunk(
>  	 * (except for the ISA region which must be 1:1 mapped) to
>  	 * release the refcounts (in Xen) on the original frames.
>  	 */
> +	if (xen_feature(XENFEAT_auto_translated_physmap))
> +		goto skip;
> +
>  	for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++) {
>  		pte_t pte = __pte_ma(0);
>  
> @@ -231,7 +246,7 @@ static void __init xen_set_identity_and_release_chunk(
>  		(void)HYPERVISOR_update_va_mapping(
>  			(unsigned long)__va(pfn << PAGE_SHIFT), pte, 0);
>  	}
> -
> +skip:
>  	if (start_pfn < nr_pages)
>  		*released += xen_release_chunk(
>  			start_pfn, min(end_pfn, nr_pages));

A goto? Really? What's wrong with an if?
Also considering that you are turning xen_release_chunk into a nop, the
only purpose of this function on PVH is to call set_phys_range_identity.
Can't we just do that?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 09/12] xen/pvh: Piggyback on PVHVM XenBus and event channels for PVH.
  2013-12-17 20:51 ` [PATCH v11 09/12] xen/pvh: Piggyback on PVHVM XenBus and event channels " Konrad Rzeszutek Wilk
@ 2013-12-18 18:31   ` Stefano Stabellini
  2013-12-18 21:17     ` Konrad Rzeszutek Wilk
  2013-12-31 18:56     ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 46+ messages in thread
From: Stefano Stabellini @ 2013-12-18 18:31 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich

On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
> 
> PVH is a PV guest with a twist - there are certain things
> that work in it like HVM and some like PV. There is
> a similar mode - PVHVM where we run in HVM mode with
> PV code enabled - and this patch explores that.
> 
> The most notable PV interfaces are the XenBus and event channels.
> For PVH, we will use XenBus and event channels.
> 
> For the XenBus mechanism we piggyback on how it is done for
> PVHVM guests.
> 
> Ditto for the event channel mechanism - we piggyback on PVHVM -
> by setting up a specific vector callback and that
> vector ends up calling the event channel mechanism to
> dispatch the events as needed.
> 
> This means that from a pvops perspective, we can use
> native_irq_ops instead of the Xen PV specific. Albeit in the
> future we could support pirq_eoi_map. But that is
> a feature request that can be shared with PVHVM.
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/xen/enlighten.c           | 6 ++++++
>  arch/x86/xen/irq.c                 | 5 ++++-
>  drivers/xen/events.c               | 5 +++++
>  drivers/xen/xenbus/xenbus_client.c | 3 ++-
>  4 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index e420613..7fceb51 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1134,6 +1134,8 @@ void xen_setup_shared_info(void)
>  	/* In UP this is as good a place as any to set up shared info */
>  	xen_setup_vcpu_info_placement();
>  #endif
> +	if (xen_pvh_domain())
> +		return;
>  
>  	xen_setup_mfn_list_list();
>  }

This is another one of those cases where I think we would benefit from
introducing xen_setup_shared_info_pvh instead of adding more ifs here.


> @@ -1146,6 +1148,10 @@ void xen_setup_vcpu_info_placement(void)
>  	for_each_possible_cpu(cpu)
>  		xen_vcpu_setup(cpu);
>  
> +	/* PVH always uses native IRQ ops */
> +	if (xen_pvh_domain())
> +		return;
> +
>  	/* xen_vcpu_setup managed to place the vcpu_info within the
>  	   percpu area for all cpus, so make use of it */
>  	if (have_vcpu_info_placement) {

Same here?


> diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
> index 0da7f86..4f7f351 100644
> --- a/arch/x86/xen/irq.c
> +++ b/arch/x86/xen/irq.c
> @@ -5,6 +5,7 @@
>  #include <xen/interface/xen.h>
>  #include <xen/interface/sched.h>
>  #include <xen/interface/vcpu.h>
> +#include <xen/features.h>
>  #include <xen/events.h>
>  
>  #include <asm/xen/hypercall.h>
> @@ -128,6 +129,8 @@ static const struct pv_irq_ops xen_irq_ops __initconst = {
>  
>  void __init xen_init_irq_ops(void)
>  {
> -	pv_irq_ops = xen_irq_ops;
> +	/* For PVH we use default pv_irq_ops settings */
> +	if (!xen_feature(XENFEAT_hvm_callback_vector))
> +		pv_irq_ops = xen_irq_ops;
>  	x86_init.irqs.intr_init = xen_init_IRQ;
>  }
> diff --git a/drivers/xen/events.c b/drivers/xen/events.c
> index 4035e83..627a16a 100644
> --- a/drivers/xen/events.c
> +++ b/drivers/xen/events.c
> @@ -1922,6 +1922,11 @@ void __init xen_init_IRQ(void)
>  		if (xen_initial_domain())
>  			pci_xen_initial_domain();
>  
> +		if (xen_feature(XENFEAT_hvm_callback_vector)) {
> +			xen_callback_vector();
> +			return;
> +		}

There is another call to xen_callback_vector in the if
(xen_hvm_domain()) path. Could we just move it out and have a single one
if (xen_feature(XENFEAT_hvm_callback_vector))?


>  		pirq_eoi_map = (void *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
>  		eoi_gmfn.gmfn = virt_to_mfn(pirq_eoi_map);
>  		rc = HYPERVISOR_physdev_op(PHYSDEVOP_pirq_eoi_gmfn_v2, &eoi_gmfn);
> diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
> index ec097d6..7f7c454 100644
> --- a/drivers/xen/xenbus/xenbus_client.c
> +++ b/drivers/xen/xenbus/xenbus_client.c
> @@ -45,6 +45,7 @@
>  #include <xen/grant_table.h>
>  #include <xen/xenbus.h>
>  #include <xen/xen.h>
> +#include <xen/features.h>
>  
>  #include "xenbus_probe.h"
>  
> @@ -743,7 +744,7 @@ static const struct xenbus_ring_ops ring_ops_hvm = {
>  
>  void __init xenbus_ring_ops_init(void)
>  {
> -	if (xen_pv_domain())
> +	if (xen_pv_domain() && !xen_feature(XENFEAT_auto_translated_physmap))

Can we just change this test to

if (!xen_feature(XENFEAT_auto_translated_physmap))

?


>  		ring_ops = &ring_ops_pv;
>  	else
>  		ring_ops = &ring_ops_hvm;
> -- 
> 1.8.3.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 10/12] xen/pvh: Piggyback on PVHVM for grant driver.
  2013-12-17 20:51 ` [PATCH v11 10/12] xen/pvh: Piggyback on PVHVM for grant driver Konrad Rzeszutek Wilk
@ 2013-12-18 18:46   ` Stefano Stabellini
  2013-12-18 21:21     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 46+ messages in thread
From: Stefano Stabellini @ 2013-12-18 18:46 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich

On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> In PVH the shared grant frame is the PFN and not MFN,
> hence its mapped via the same code path as HVM.
> 
> The allocation of the grant frame is done differently - we
> do not use the early platform-pci driver and have an
> ioremap area - instead we use balloon memory and stitch
> all of the non-contingous pages in a virtualized area.
> 
> That means when we call the hypervisor to replace the GMFN
> with a XENMAPSPACE_grant_table type, we need to lookup the
> old PFN for every iteration instead of assuming a flat
> contingous PFN allocation.
> 
> Lastly, we only use v1 for grants. This is because PVHVM
> is not able to use v2 due to no XENMEM_add_to_physmap
> calls on the error status page (see commit
> 69e8f430e243d657c2053f097efebc2e2cd559f0
>  xen/granttable: Disable grant v2 for HVM domains.)
> 
> Until that is implemented this workaround has to
> be in place.
> 
> Also per suggestions by Stefano utilize the PVHVM paths
> as they share common functionality.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  drivers/xen/gntdev.c       |  2 +-
>  drivers/xen/grant-table.c  | 80 ++++++++++++++++++++++++++++++++++++++++++----
>  drivers/xen/platform-pci.c |  2 +-
>  include/xen/grant_table.h  |  2 +-
>  4 files changed, 76 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index e41c79c..073b4a1 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -846,7 +846,7 @@ static int __init gntdev_init(void)
>  	if (!xen_domain())
>  		return -ENODEV;
>  
> -	use_ptemod = xen_pv_domain();
> +	use_ptemod = !xen_feature(XENFEAT_auto_translated_physmap);
>  
>  	err = misc_register(&gntdev_miscdev);
>  	if (err != 0) {
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index aa846a4..c0ded9f 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -47,6 +47,7 @@
>  #include <xen/interface/xen.h>
>  #include <xen/page.h>
>  #include <xen/grant_table.h>
> +#include <xen/balloon.h>
>  #include <xen/interface/memory.h>
>  #include <xen/hvc-console.h>
>  #include <xen/swiotlb-xen.h>
> @@ -66,8 +67,8 @@ static unsigned int boot_max_nr_grant_frames;
>  static int gnttab_free_count;
>  static grant_ref_t gnttab_free_head;
>  static DEFINE_SPINLOCK(gnttab_list_lock);
> -unsigned long xen_hvm_resume_frames;
> -EXPORT_SYMBOL_GPL(xen_hvm_resume_frames);
> +unsigned long xen_auto_xlat_grant_frames;
> +EXPORT_SYMBOL_GPL(xen_auto_xlat_grant_frames);
>  
>  static union {
>  	struct grant_entry_v1 *v1;
> @@ -1060,7 +1061,7 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
>  	unsigned int nr_gframes = end_idx + 1;
>  	int rc;
>  
> -	if (xen_hvm_domain()) {
> +	if (xen_feature(XENFEAT_auto_translated_physmap)) {
>  		struct xen_add_to_physmap xatp;
>  		unsigned int i = end_idx;
>  		rc = 0;
> @@ -1069,10 +1070,24 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
>  		 * index, ensuring that the table will grow only once.
>  		 */
>  		do {
> +			unsigned long vaddr;
> +			unsigned int level;
> +			pte_t *pte;
> +
>  			xatp.domid = DOMID_SELF;
>  			xatp.idx = i;
>  			xatp.space = XENMAPSPACE_grant_table;
> -			xatp.gpfn = (xen_hvm_resume_frames >> PAGE_SHIFT) + i;
> +
> +			/*
> +			 * Don't assume the memory is contingous. Lookup each.
> +			 */
> +			vaddr = xen_auto_xlat_grant_frames + (i * PAGE_SIZE);
> +			if (xen_hvm_domain())
> +				xatp.gpfn = vaddr >> PAGE_SHIFT;
> +			else {
> +				pte = lookup_address(vaddr, &level);

lookup_address is x86 only. I added an empty implementation under
arch/arm to be able to compile stuff under drivers/xen but I would like
to get rid of it.
If you can't find a way to replace lookup_address with something that
works on arm and arm64 too, you could always turn
xen_auto_xlat_grant_frames into an array and store the pfns there.


> +				xatp.gpfn = pte_mfn(*pte);

Do you actually need the mfn here? Don't you just need the pfn?



> +			}
>  			rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
>  			if (rc != 0) {
>  				pr_warn("grant table add_to_physmap failed, err=%d\n",
> @@ -1135,7 +1150,7 @@ static void gnttab_request_version(void)
>  	int rc;
>  	struct gnttab_set_version gsv;
>  
> -	if (xen_hvm_domain())
> +	if (xen_hvm_domain() || xen_feature(XENFEAT_auto_translated_physmap))

just turn this to

if (xen_feature(XENFEAT_auto_translated_physmap))


>  		gsv.version = 1;
>  	else
>  		gsv.version = 2;
> @@ -1161,6 +1176,46 @@ static void gnttab_request_version(void)
>  	pr_info("Grant tables using version %d layout\n", grant_table_version);
>  }
>  
> +static int xlated_setup_gnttab_pages(unsigned long nr_grant_frames,
> +				     unsigned long max, void *addr)
> +{
> +	struct page **pages;
> +	unsigned long *pfns;
> +	int rc, i;
> +
> +	pages = kcalloc(nr_grant_frames, sizeof(pages[0]), GFP_KERNEL);
> +	if (!pages)
> +		return -ENOMEM;
> +
> +	pfns = kcalloc(nr_grant_frames, sizeof(pfns[0]), GFP_KERNEL);
> +	if (!pfns) {
> +		kfree(pages);
> +		return -ENOMEM;
> +	}
> +	rc = alloc_xenballooned_pages(nr_grant_frames, pages, 0 /* lowmem */);
> +	if (rc) {
> +		pr_warn("%s Couldn't balloon alloc %ld pfns rc:%d\n", __func__,
> +			nr_grant_frames, rc);
> +		kfree(pages);
> +		kfree(pfns);
> +		return rc;
> +	}
> +	for (i = 0; i < nr_grant_frames; i++)
> +		pfns[i] = page_to_pfn(pages[i]);
> +
> +	rc = arch_gnttab_map_shared(pfns, nr_grant_frames, max, addr);
> +
> +	kfree(pages);
> +	kfree(pfns);
> +	if (rc) {
> +		pr_warn("%s Couldn't map %ld pfns rc:%d\n", __func__,
> +			nr_grant_frames, rc);
> +		free_xenballooned_pages(nr_grant_frames, pages);
> +		return rc;
> +	}
> +	return rc;
> +}

Please move this function somewhere under arch/x86/xen and call it from
xen_start_kernel, it doesn't belong to common code.


>  static int gnttab_setup(void)
>  {
>  	unsigned int max_nr_gframes;
> @@ -1169,15 +1224,26 @@ static int gnttab_setup(void)
>  	if (max_nr_gframes < nr_grant_frames)
>  		return -ENOSYS;
>  
> +	if (xen_feature(XENFEAT_auto_translated_physmap) && !xen_auto_xlat_grant_frames) {
> +		/*
> +		 * xen_auto_xlat_grant_frames is setup for PVHVM by
> +		 * alloc_xen_mmio by the time this is called.
> +		 */
> +		int rc = xlated_setup_gnttab_pages(max_nr_gframes, max_nr_gframes,
> +						   &gnttab_shared.addr);
> +		if (rc)
> +			return rc;
> +		xen_auto_xlat_grant_frames = (unsigned long)gnttab_shared.addr;
> +	}

I think you should do this from arch/x86/xen. When gnttab_setup is
called in your guest, xen_auto_xlat_grant_frames should have already
been setup correctly.


>  	if (xen_pv_domain())
>  		return gnttab_map(0, nr_grant_frames - 1);
>  
>  	if (gnttab_shared.addr == NULL) {
> -		gnttab_shared.addr = xen_remap(xen_hvm_resume_frames,
> +		gnttab_shared.addr = xen_remap(xen_auto_xlat_grant_frames,
>  						PAGE_SIZE * max_nr_gframes);
>  		if (gnttab_shared.addr == NULL) {
>  			pr_warn("Failed to ioremap gnttab share frames (addr=0x%08lx)!\n",
> -					xen_hvm_resume_frames);
> +					xen_auto_xlat_grant_frames);
>  			return -ENOMEM;
>  		}
>  	}
> diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
> index 2f3528e..44bc5a6 100644
> --- a/drivers/xen/platform-pci.c
> +++ b/drivers/xen/platform-pci.c
> @@ -154,7 +154,7 @@ static int platform_pci_init(struct pci_dev *pdev,
>  	}
>  
>  	max_nr_gframes = gnttab_max_grant_frames();
> -	xen_hvm_resume_frames = alloc_xen_mmio(PAGE_SIZE * max_nr_gframes);
> +	xen_auto_xlat_grant_frames = alloc_xen_mmio(PAGE_SIZE * max_nr_gframes);
>  	ret = gnttab_init();
>  	if (ret)
>  		goto out;
> diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
> index 694dcaf..24280ac 100644
> --- a/include/xen/grant_table.h
> +++ b/include/xen/grant_table.h
> @@ -178,7 +178,7 @@ int arch_gnttab_map_status(uint64_t *frames, unsigned long nr_gframes,
>  			   grant_status_t **__shared);
>  void arch_gnttab_unmap(void *shared, unsigned long nr_gframes);
>  
> -extern unsigned long xen_hvm_resume_frames;
> +extern unsigned long xen_auto_xlat_grant_frames;
>  unsigned int gnttab_max_grant_frames(void);
>  
>  #define gnttab_map_vaddr(map) ((void *)(map.host_virt_addr))

You need to s/xen_hvm_resume_frames/xen_auto_xlat_grant_frames under
arch/arm/xen too.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 05/12] xen/pvh: Update E820 to work with PVH
  2013-12-18 18:25   ` [Xen-devel] " Stefano Stabellini
@ 2013-12-18 20:30     ` Konrad Rzeszutek Wilk
  2013-12-18 23:44     ` Mukesh Rathor
  1 sibling, 0 replies; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-18 20:30 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich

On Wed, Dec 18, 2013 at 06:25:15PM +0000, Stefano Stabellini wrote:
> On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > 
> > In xen_add_extra_mem() we can skip updating P2M as it's managed
> > by Xen. PVH maps the entire IO space, but only RAM pages need
> > to be repopulated.
> > 
> > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > ---
> >  arch/x86/xen/setup.c | 19 +++++++++++++++++--
> >  1 file changed, 17 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> > index 2137c51..f93bca1 100644
> > --- a/arch/x86/xen/setup.c
> > +++ b/arch/x86/xen/setup.c
> > @@ -27,6 +27,7 @@
> >  #include <xen/interface/memory.h>
> >  #include <xen/interface/physdev.h>
> >  #include <xen/features.h>
> > +#include "mmu.h"
> >  #include "xen-ops.h"
> >  #include "vdso.h"
> >  
> > @@ -81,6 +82,9 @@ static void __init xen_add_extra_mem(u64 start, u64 size)
> >  
> >  	memblock_reserve(start, size);
> >  
> > +	if (xen_feature(XENFEAT_auto_translated_physmap))
> > +		return;
> > +
> >  	xen_max_p2m_pfn = PFN_DOWN(start + size);
> >  	for (pfn = PFN_DOWN(start); pfn < xen_max_p2m_pfn; pfn++) {
> >  		unsigned long mfn = pfn_to_mfn(pfn);
> > @@ -103,6 +107,7 @@ static unsigned long __init xen_do_chunk(unsigned long start,
> >  		.domid        = DOMID_SELF
> >  	};
> >  	unsigned long len = 0;
> > +	int xlated_phys = xen_feature(XENFEAT_auto_translated_physmap);
> >  	unsigned long pfn;
> >  	int ret;
> >  
> > @@ -116,7 +121,7 @@ static unsigned long __init xen_do_chunk(unsigned long start,
> >  				continue;
> >  			frame = mfn;
> >  		} else {
> > -			if (mfn != INVALID_P2M_ENTRY)
> > +			if (!xlated_phys && mfn != INVALID_P2M_ENTRY)
> >  				continue;
> >  			frame = pfn;
> >  		}
> > @@ -154,6 +159,13 @@ static unsigned long __init xen_do_chunk(unsigned long start,
> >  static unsigned long __init xen_release_chunk(unsigned long start,
> >  					      unsigned long end)
> >  {
> > +	/*
> > +	 * Xen already ballooned out the E820 non RAM regions for us
> > +	 * and set them up properly in EPT.
> > +	 */
> > +	if (xen_feature(XENFEAT_auto_translated_physmap))
> > +		return end - start;
> > +
> >  	return xen_do_chunk(start, end, true);
> >  }
> >  
> > @@ -222,6 +234,9 @@ static void __init xen_set_identity_and_release_chunk(
> >  	 * (except for the ISA region which must be 1:1 mapped) to
> >  	 * release the refcounts (in Xen) on the original frames.
> >  	 */
> > +	if (xen_feature(XENFEAT_auto_translated_physmap))
> > +		goto skip;
> > +
> >  	for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++) {
> >  		pte_t pte = __pte_ma(0);
> >  
> > @@ -231,7 +246,7 @@ static void __init xen_set_identity_and_release_chunk(
> >  		(void)HYPERVISOR_update_va_mapping(
> >  			(unsigned long)__va(pfn << PAGE_SHIFT), pte, 0);
> >  	}
> > -
> > +skip:
> >  	if (start_pfn < nr_pages)
> >  		*released += xen_release_chunk(
> >  			start_pfn, min(end_pfn, nr_pages));
> 
> A goto? Really? What's wrong with an if?

Moves too much code.
> Also considering that you are turning xen_release_chunk into a nop, the
> only purpose of this function on PVH is to call set_phys_range_identity.
> Can't we just do that?

And set_phys_range_identity ends up just doing:

 769         if (unlikely(xen_feature(XENFEAT_auto_translated_physmap)))             
 770                 return pfn_e - pfn_s;   

So what we really could do (which is what Mukesh's patch had):

if (xen_feature(XENFEAT..) {
	if (start_pfn < nr_pages)
		*released += min(end_pfn, nr_pages) - start_pfn;
	*identity += end_pfn - start_pfn;
}

in its own function. We could do that.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 09/12] xen/pvh: Piggyback on PVHVM XenBus and event channels for PVH.
  2013-12-18 18:31   ` [Xen-devel] " Stefano Stabellini
@ 2013-12-18 21:17     ` Konrad Rzeszutek Wilk
  2014-01-04  0:48       ` Mukesh Rathor
  2013-12-31 18:56     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-18 21:17 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich

On Wed, Dec 18, 2013 at 06:31:43PM +0000, Stefano Stabellini wrote:
> On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > 
> > PVH is a PV guest with a twist - there are certain things
> > that work in it like HVM and some like PV. There is
> > a similar mode - PVHVM where we run in HVM mode with
> > PV code enabled - and this patch explores that.
> > 
> > The most notable PV interfaces are the XenBus and event channels.
> > For PVH, we will use XenBus and event channels.
> > 
> > For the XenBus mechanism we piggyback on how it is done for
> > PVHVM guests.
> > 
> > Ditto for the event channel mechanism - we piggyback on PVHVM -
> > by setting up a specific vector callback and that
> > vector ends up calling the event channel mechanism to
> > dispatch the events as needed.
> > 
> > This means that from a pvops perspective, we can use
> > native_irq_ops instead of the Xen PV specific. Albeit in the
> > future we could support pirq_eoi_map. But that is
> > a feature request that can be shared with PVHVM.
> > 
> > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > ---
> >  arch/x86/xen/enlighten.c           | 6 ++++++
> >  arch/x86/xen/irq.c                 | 5 ++++-
> >  drivers/xen/events.c               | 5 +++++
> >  drivers/xen/xenbus/xenbus_client.c | 3 ++-
> >  4 files changed, 17 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> > index e420613..7fceb51 100644
> > --- a/arch/x86/xen/enlighten.c
> > +++ b/arch/x86/xen/enlighten.c
> > @@ -1134,6 +1134,8 @@ void xen_setup_shared_info(void)
> >  	/* In UP this is as good a place as any to set up shared info */
> >  	xen_setup_vcpu_info_placement();
> >  #endif
> > +	if (xen_pvh_domain())
> > +		return;
> >  
> >  	xen_setup_mfn_list_list();
> >  }
> 
> This is another one of those cases where I think we would benefit from
> introducing xen_setup_shared_info_pvh instead of adding more ifs here.

Actually this one can be removed.

> 
> 
> > @@ -1146,6 +1148,10 @@ void xen_setup_vcpu_info_placement(void)
> >  	for_each_possible_cpu(cpu)
> >  		xen_vcpu_setup(cpu);
> >  
> > +	/* PVH always uses native IRQ ops */
> > +	if (xen_pvh_domain())
> > +		return;
> > +
> >  	/* xen_vcpu_setup managed to place the vcpu_info within the
> >  	   percpu area for all cpus, so make use of it */
> >  	if (have_vcpu_info_placement) {
> 
> Same here?

Hmmm, I wonder if the vcpu info placement could work with PVH.
> 
> 
> > diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
> > index 0da7f86..4f7f351 100644
> > --- a/arch/x86/xen/irq.c
> > +++ b/arch/x86/xen/irq.c
> > @@ -5,6 +5,7 @@
> >  #include <xen/interface/xen.h>
> >  #include <xen/interface/sched.h>
> >  #include <xen/interface/vcpu.h>
> > +#include <xen/features.h>
> >  #include <xen/events.h>
> >  
> >  #include <asm/xen/hypercall.h>
> > @@ -128,6 +129,8 @@ static const struct pv_irq_ops xen_irq_ops __initconst = {
> >  
> >  void __init xen_init_irq_ops(void)
> >  {
> > -	pv_irq_ops = xen_irq_ops;
> > +	/* For PVH we use default pv_irq_ops settings */
> > +	if (!xen_feature(XENFEAT_hvm_callback_vector))
> > +		pv_irq_ops = xen_irq_ops;
> >  	x86_init.irqs.intr_init = xen_init_IRQ;
> >  }
> > diff --git a/drivers/xen/events.c b/drivers/xen/events.c
> > index 4035e83..627a16a 100644
> > --- a/drivers/xen/events.c
> > +++ b/drivers/xen/events.c
> > @@ -1922,6 +1922,11 @@ void __init xen_init_IRQ(void)
> >  		if (xen_initial_domain())
> >  			pci_xen_initial_domain();
> >  
> > +		if (xen_feature(XENFEAT_hvm_callback_vector)) {
> > +			xen_callback_vector();
> > +			return;
> > +		}
> 
> There is another call to xen_callback_vector in the if
> (xen_hvm_domain()) path. Could we just move it out and have a single one
> if (xen_feature(XENFEAT_hvm_callback_vector))?

Sure. Good idea.
> 
> 
> >  		pirq_eoi_map = (void *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
> >  		eoi_gmfn.gmfn = virt_to_mfn(pirq_eoi_map);
> >  		rc = HYPERVISOR_physdev_op(PHYSDEVOP_pirq_eoi_gmfn_v2, &eoi_gmfn);
> > diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
> > index ec097d6..7f7c454 100644
> > --- a/drivers/xen/xenbus/xenbus_client.c
> > +++ b/drivers/xen/xenbus/xenbus_client.c
> > @@ -45,6 +45,7 @@
> >  #include <xen/grant_table.h>
> >  #include <xen/xenbus.h>
> >  #include <xen/xen.h>
> > +#include <xen/features.h>
> >  
> >  #include "xenbus_probe.h"
> >  
> > @@ -743,7 +744,7 @@ static const struct xenbus_ring_ops ring_ops_hvm = {
> >  
> >  void __init xenbus_ring_ops_init(void)
> >  {
> > -	if (xen_pv_domain())
> > +	if (xen_pv_domain() && !xen_feature(XENFEAT_auto_translated_physmap))
> 
> Can we just change this test to
> 
> if (!xen_feature(XENFEAT_auto_translated_physmap))
> 
> ?

I think it can. I will try it out.
> 
> 
> >  		ring_ops = &ring_ops_pv;
> >  	else
> >  		ring_ops = &ring_ops_hvm;
> > -- 
> > 1.8.3.1
> > 
> > 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
> > 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 10/12] xen/pvh: Piggyback on PVHVM for grant driver.
  2013-12-18 18:46   ` [Xen-devel] " Stefano Stabellini
@ 2013-12-18 21:21     ` Konrad Rzeszutek Wilk
  2014-01-03 15:10       ` Stefano Stabellini
  0 siblings, 1 reply; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-18 21:21 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich

On Wed, Dec 18, 2013 at 06:46:56PM +0000, Stefano Stabellini wrote:
> On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > In PVH the shared grant frame is the PFN and not MFN,
> > hence its mapped via the same code path as HVM.
> > 
> > The allocation of the grant frame is done differently - we
> > do not use the early platform-pci driver and have an
> > ioremap area - instead we use balloon memory and stitch
> > all of the non-contingous pages in a virtualized area.
> > 
> > That means when we call the hypervisor to replace the GMFN
> > with a XENMAPSPACE_grant_table type, we need to lookup the
> > old PFN for every iteration instead of assuming a flat
> > contingous PFN allocation.
> > 
> > Lastly, we only use v1 for grants. This is because PVHVM
> > is not able to use v2 due to no XENMEM_add_to_physmap
> > calls on the error status page (see commit
> > 69e8f430e243d657c2053f097efebc2e2cd559f0
> >  xen/granttable: Disable grant v2 for HVM domains.)
> > 
> > Until that is implemented this workaround has to
> > be in place.
> > 
> > Also per suggestions by Stefano utilize the PVHVM paths
> > as they share common functionality.
> > 
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > ---
> >  drivers/xen/gntdev.c       |  2 +-
> >  drivers/xen/grant-table.c  | 80 ++++++++++++++++++++++++++++++++++++++++++----
> >  drivers/xen/platform-pci.c |  2 +-
> >  include/xen/grant_table.h  |  2 +-
> >  4 files changed, 76 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> > index e41c79c..073b4a1 100644
> > --- a/drivers/xen/gntdev.c
> > +++ b/drivers/xen/gntdev.c
> > @@ -846,7 +846,7 @@ static int __init gntdev_init(void)
> >  	if (!xen_domain())
> >  		return -ENODEV;
> >  
> > -	use_ptemod = xen_pv_domain();
> > +	use_ptemod = !xen_feature(XENFEAT_auto_translated_physmap);
> >  
> >  	err = misc_register(&gntdev_miscdev);
> >  	if (err != 0) {
> > diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> > index aa846a4..c0ded9f 100644
> > --- a/drivers/xen/grant-table.c
> > +++ b/drivers/xen/grant-table.c
> > @@ -47,6 +47,7 @@
> >  #include <xen/interface/xen.h>
> >  #include <xen/page.h>
> >  #include <xen/grant_table.h>
> > +#include <xen/balloon.h>
> >  #include <xen/interface/memory.h>
> >  #include <xen/hvc-console.h>
> >  #include <xen/swiotlb-xen.h>
> > @@ -66,8 +67,8 @@ static unsigned int boot_max_nr_grant_frames;
> >  static int gnttab_free_count;
> >  static grant_ref_t gnttab_free_head;
> >  static DEFINE_SPINLOCK(gnttab_list_lock);
> > -unsigned long xen_hvm_resume_frames;
> > -EXPORT_SYMBOL_GPL(xen_hvm_resume_frames);
> > +unsigned long xen_auto_xlat_grant_frames;
> > +EXPORT_SYMBOL_GPL(xen_auto_xlat_grant_frames);
> >  
> >  static union {
> >  	struct grant_entry_v1 *v1;
> > @@ -1060,7 +1061,7 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
> >  	unsigned int nr_gframes = end_idx + 1;
> >  	int rc;
> >  
> > -	if (xen_hvm_domain()) {
> > +	if (xen_feature(XENFEAT_auto_translated_physmap)) {
> >  		struct xen_add_to_physmap xatp;
> >  		unsigned int i = end_idx;
> >  		rc = 0;
> > @@ -1069,10 +1070,24 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
> >  		 * index, ensuring that the table will grow only once.
> >  		 */
> >  		do {
> > +			unsigned long vaddr;
> > +			unsigned int level;
> > +			pte_t *pte;
> > +
> >  			xatp.domid = DOMID_SELF;
> >  			xatp.idx = i;
> >  			xatp.space = XENMAPSPACE_grant_table;
> > -			xatp.gpfn = (xen_hvm_resume_frames >> PAGE_SHIFT) + i;
> > +
> > +			/*
> > +			 * Don't assume the memory is contingous. Lookup each.
> > +			 */
> > +			vaddr = xen_auto_xlat_grant_frames + (i * PAGE_SIZE);
> > +			if (xen_hvm_domain())
> > +				xatp.gpfn = vaddr >> PAGE_SHIFT;
> > +			else {
> > +				pte = lookup_address(vaddr, &level);
> 
> lookup_address is x86 only. I added an empty implementation under
> arch/arm to be able to compile stuff under drivers/xen but I would like
> to get rid of it.
> If you can't find a way to replace lookup_address with something that
> works on arm and arm64 too, you could always turn
> xen_auto_xlat_grant_frames into an array and store the pfns there.

Yeah, that would work nicely too. Thought we would have to store the
size somewhere. Maybe a struct then.

struct xlat_grant_frames {
	int	nr;
	xen_pfn_t	*array;
	struct page	*page;
};

> 
> 
> > +				xatp.gpfn = pte_mfn(*pte);
> 
> Do you actually need the mfn here? Don't you just need the pfn?

Just (pte.pte & ~PAGE_FLAGS) >> PAGE_SHIFT.

> 
> 
> 
> > +			}
> >  			rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
> >  			if (rc != 0) {
> >  				pr_warn("grant table add_to_physmap failed, err=%d\n",
> > @@ -1135,7 +1150,7 @@ static void gnttab_request_version(void)
> >  	int rc;
> >  	struct gnttab_set_version gsv;
> >  
> > -	if (xen_hvm_domain())
> > +	if (xen_hvm_domain() || xen_feature(XENFEAT_auto_translated_physmap))
> 
> just turn this to
> 
> if (xen_feature(XENFEAT_auto_translated_physmap))

Hmm, I would rather actually keep that until I fix the underlaying issue
of why v2 can't be used. And then we can get rid of this workaround.

> 
> 
> >  		gsv.version = 1;
> >  	else
> >  		gsv.version = 2;
> > @@ -1161,6 +1176,46 @@ static void gnttab_request_version(void)
> >  	pr_info("Grant tables using version %d layout\n", grant_table_version);
> >  }
> >  
> > +static int xlated_setup_gnttab_pages(unsigned long nr_grant_frames,
> > +				     unsigned long max, void *addr)
> > +{
> > +	struct page **pages;
> > +	unsigned long *pfns;
> > +	int rc, i;
> > +
> > +	pages = kcalloc(nr_grant_frames, sizeof(pages[0]), GFP_KERNEL);
> > +	if (!pages)
> > +		return -ENOMEM;
> > +
> > +	pfns = kcalloc(nr_grant_frames, sizeof(pfns[0]), GFP_KERNEL);
> > +	if (!pfns) {
> > +		kfree(pages);
> > +		return -ENOMEM;
> > +	}
> > +	rc = alloc_xenballooned_pages(nr_grant_frames, pages, 0 /* lowmem */);
> > +	if (rc) {
> > +		pr_warn("%s Couldn't balloon alloc %ld pfns rc:%d\n", __func__,
> > +			nr_grant_frames, rc);
> > +		kfree(pages);
> > +		kfree(pfns);
> > +		return rc;
> > +	}
> > +	for (i = 0; i < nr_grant_frames; i++)
> > +		pfns[i] = page_to_pfn(pages[i]);
> > +
> > +	rc = arch_gnttab_map_shared(pfns, nr_grant_frames, max, addr);
> > +
> > +	kfree(pages);
> > +	kfree(pfns);
> > +	if (rc) {
> > +		pr_warn("%s Couldn't map %ld pfns rc:%d\n", __func__,
> > +			nr_grant_frames, rc);
> > +		free_xenballooned_pages(nr_grant_frames, pages);
> > +		return rc;
> > +	}
> > +	return rc;
> > +}
> 
> Please move this function somewhere under arch/x86/xen and call it from
> xen_start_kernel, it doesn't belong to common code.

<nods>

Actually, looking at how the xen-platform-pci does it. I am wondering
if that can be just made in one function that will do the right
thing.

Can the xen-platform-pci use balloon pages instead of the MMIO BARs?
Or would that be a waste of memory?

> 
> 
> >  static int gnttab_setup(void)
> >  {
> >  	unsigned int max_nr_gframes;
> > @@ -1169,15 +1224,26 @@ static int gnttab_setup(void)
> >  	if (max_nr_gframes < nr_grant_frames)
> >  		return -ENOSYS;
> >  
> > +	if (xen_feature(XENFEAT_auto_translated_physmap) && !xen_auto_xlat_grant_frames) {
> > +		/*
> > +		 * xen_auto_xlat_grant_frames is setup for PVHVM by
> > +		 * alloc_xen_mmio by the time this is called.
> > +		 */
> > +		int rc = xlated_setup_gnttab_pages(max_nr_gframes, max_nr_gframes,
> > +						   &gnttab_shared.addr);
> > +		if (rc)
> > +			return rc;
> > +		xen_auto_xlat_grant_frames = (unsigned long)gnttab_shared.addr;
> > +	}
> 
> I think you should do this from arch/x86/xen. When gnttab_setup is
> called in your guest, xen_auto_xlat_grant_frames should have already
> been setup correctly.

<nods> And have some early init that sets this up.

> 
> 
> >  	if (xen_pv_domain())
> >  		return gnttab_map(0, nr_grant_frames - 1);
> >  
> >  	if (gnttab_shared.addr == NULL) {
> > -		gnttab_shared.addr = xen_remap(xen_hvm_resume_frames,
> > +		gnttab_shared.addr = xen_remap(xen_auto_xlat_grant_frames,
> >  						PAGE_SIZE * max_nr_gframes);
> >  		if (gnttab_shared.addr == NULL) {
> >  			pr_warn("Failed to ioremap gnttab share frames (addr=0x%08lx)!\n",
> > -					xen_hvm_resume_frames);
> > +					xen_auto_xlat_grant_frames);
> >  			return -ENOMEM;
> >  		}
> >  	}
> > diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
> > index 2f3528e..44bc5a6 100644
> > --- a/drivers/xen/platform-pci.c
> > +++ b/drivers/xen/platform-pci.c
> > @@ -154,7 +154,7 @@ static int platform_pci_init(struct pci_dev *pdev,
> >  	}
> >  
> >  	max_nr_gframes = gnttab_max_grant_frames();
> > -	xen_hvm_resume_frames = alloc_xen_mmio(PAGE_SIZE * max_nr_gframes);
> > +	xen_auto_xlat_grant_frames = alloc_xen_mmio(PAGE_SIZE * max_nr_gframes);
> >  	ret = gnttab_init();
> >  	if (ret)
> >  		goto out;
> > diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
> > index 694dcaf..24280ac 100644
> > --- a/include/xen/grant_table.h
> > +++ b/include/xen/grant_table.h
> > @@ -178,7 +178,7 @@ int arch_gnttab_map_status(uint64_t *frames, unsigned long nr_gframes,
> >  			   grant_status_t **__shared);
> >  void arch_gnttab_unmap(void *shared, unsigned long nr_gframes);
> >  
> > -extern unsigned long xen_hvm_resume_frames;
> > +extern unsigned long xen_auto_xlat_grant_frames;
> >  unsigned int gnttab_max_grant_frames(void);
> >  
> >  #define gnttab_map_vaddr(map) ((void *)(map.host_virt_addr))
> 
> You need to s/xen_hvm_resume_frames/xen_auto_xlat_grant_frames under
> arch/arm/xen too.

Right. Thanks for reminding me!

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 05/12] xen/pvh: Update E820 to work with PVH
  2013-12-18 18:25   ` [Xen-devel] " Stefano Stabellini
  2013-12-18 20:30     ` Konrad Rzeszutek Wilk
@ 2013-12-18 23:44     ` Mukesh Rathor
  2013-12-19 11:25       ` Stefano Stabellini
  1 sibling, 1 reply; 46+ messages in thread
From: Mukesh Rathor @ 2013-12-18 23:44 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Konrad Rzeszutek Wilk, xen-devel, linux-kernel, boris.ostrovsky,
	david.vrabel, jbeulich

On Wed, 18 Dec 2013 18:25:15 +0000
Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:

> On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > 
> > In xen_add_extra_mem() we can skip updating P2M as it's managed
> > by Xen. PVH maps the entire IO space, but only RAM pages need
> > to be repopulated.
> > 
> > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > ---
> >  arch/x86/xen/setup.c | 19 +++++++++++++++++--
> >  1 file changed, 17 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
........
> > @@ -231,7 +246,7 @@ static void __init
> > xen_set_identity_and_release_chunk( (void)HYPERVISOR_update_va_mapping(
> >  			(unsigned long)__va(pfn << PAGE_SHIFT),
> > pte, 0); }
> > -
> > +skip:
> >  	if (start_pfn < nr_pages)
> >  		*released += xen_release_chunk(
> >  			start_pfn, min(end_pfn, nr_pages));
... 
> Also considering that you are turning xen_release_chunk into a nop,
> the only purpose of this function on PVH is to call
> set_phys_range_identity. Can't we just do that?

xen_release_chunk() is called for PVH to give us the count of released,
altho we don't need to release anything for pvh as it was already done in
xen. The released count is then used later to add memory.

I had separate function to just adjust the stats, which is all we need
to do for pvh, konrad just merged it with pv functions.

thanks
mukesh

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 05/12] xen/pvh: Update E820 to work with PVH
  2013-12-18 23:44     ` Mukesh Rathor
@ 2013-12-19 11:25       ` Stefano Stabellini
  0 siblings, 0 replies; 46+ messages in thread
From: Stefano Stabellini @ 2013-12-19 11:25 UTC (permalink / raw)
  To: Mukesh Rathor
  Cc: Stefano Stabellini, Konrad Rzeszutek Wilk, xen-devel,
	linux-kernel, boris.ostrovsky, david.vrabel, jbeulich

On Wed, 18 Dec 2013, Mukesh Rathor wrote:
> On Wed, 18 Dec 2013 18:25:15 +0000
> Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> 
> > On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > 
> > > In xen_add_extra_mem() we can skip updating P2M as it's managed
> > > by Xen. PVH maps the entire IO space, but only RAM pages need
> > > to be repopulated.
> > > 
> > > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > ---
> > >  arch/x86/xen/setup.c | 19 +++++++++++++++++--
> > >  1 file changed, 17 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> ........
> > > @@ -231,7 +246,7 @@ static void __init
> > > xen_set_identity_and_release_chunk( (void)HYPERVISOR_update_va_mapping(
> > >  			(unsigned long)__va(pfn << PAGE_SHIFT),
> > > pte, 0); }
> > > -
> > > +skip:
> > >  	if (start_pfn < nr_pages)
> > >  		*released += xen_release_chunk(
> > >  			start_pfn, min(end_pfn, nr_pages));
> ... 
> > Also considering that you are turning xen_release_chunk into a nop,
> > the only purpose of this function on PVH is to call
> > set_phys_range_identity. Can't we just do that?
> 
> xen_release_chunk() is called for PVH to give us the count of released,
> altho we don't need to release anything for pvh as it was already done in
> xen. The released count is then used later to add memory.
> 
> I had separate function to just adjust the stats, which is all we need
> to do for pvh, konrad just merged it with pv functions.

I see. I'll let Konrad decide which way is nicer.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 09/12] xen/pvh: Piggyback on PVHVM XenBus and event channels for PVH.
  2013-12-18 18:31   ` [Xen-devel] " Stefano Stabellini
  2013-12-18 21:17     ` Konrad Rzeszutek Wilk
@ 2013-12-31 18:56     ` Konrad Rzeszutek Wilk
  2014-01-03 15:04       ` Stefano Stabellini
  1 sibling, 1 reply; 46+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-31 18:56 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, linux-kernel, boris.ostrovsky, david.vrabel,
	mukesh.rathor, jbeulich

> > --- a/drivers/xen/xenbus/xenbus_client.c
> > +++ b/drivers/xen/xenbus/xenbus_client.c
> > @@ -45,6 +45,7 @@
> >  #include <xen/grant_table.h>
> >  #include <xen/xenbus.h>
> >  #include <xen/xen.h>
> > +#include <xen/features.h>
> >  
> >  #include "xenbus_probe.h"
> >  
> > @@ -743,7 +744,7 @@ static const struct xenbus_ring_ops ring_ops_hvm = {
> >  
> >  void __init xenbus_ring_ops_init(void)
> >  {
> > -	if (xen_pv_domain())
> > +	if (xen_pv_domain() && !xen_feature(XENFEAT_auto_translated_physmap))
> 
> Can we just change this test to
> 
> if (!xen_feature(XENFEAT_auto_translated_physmap))
> 
> ?

No. If we do then the HVM domains (which are also !auto-xlat)
will end up using the PV version of ring_ops.
> 
> 
> >  		ring_ops = &ring_ops_pv;
> >  	else
> >  		ring_ops = &ring_ops_hvm;
> > -- 
> > 1.8.3.1
> > 
> > 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
> > 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 09/12] xen/pvh: Piggyback on PVHVM XenBus and event channels for PVH.
  2013-12-31 18:56     ` Konrad Rzeszutek Wilk
@ 2014-01-03 15:04       ` Stefano Stabellini
  2014-01-04  0:29         ` Mukesh Rathor
  0 siblings, 1 reply; 46+ messages in thread
From: Stefano Stabellini @ 2014-01-03 15:04 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, xen-devel, linux-kernel, boris.ostrovsky,
	david.vrabel, mukesh.rathor, jbeulich

On Tue, 31 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > > --- a/drivers/xen/xenbus/xenbus_client.c
> > > +++ b/drivers/xen/xenbus/xenbus_client.c
> > > @@ -45,6 +45,7 @@
> > >  #include <xen/grant_table.h>
> > >  #include <xen/xenbus.h>
> > >  #include <xen/xen.h>
> > > +#include <xen/features.h>
> > >  
> > >  #include "xenbus_probe.h"
> > >  
> > > @@ -743,7 +744,7 @@ static const struct xenbus_ring_ops ring_ops_hvm = {
> > >  
> > >  void __init xenbus_ring_ops_init(void)
> > >  {
> > > -	if (xen_pv_domain())
> > > +	if (xen_pv_domain() && !xen_feature(XENFEAT_auto_translated_physmap))
> > 
> > Can we just change this test to
> > 
> > if (!xen_feature(XENFEAT_auto_translated_physmap))
> > 
> > ?
> 
> No. If we do then the HVM domains (which are also !auto-xlat)
> will end up using the PV version of ring_ops.

Actually HVM guests have XENFEAT_auto_translated_physmap, so in this
case they would get &ring_ops_hvm.


> > >  		ring_ops = &ring_ops_pv;
> > >  	else
> > >  		ring_ops = &ring_ops_hvm;
> > > -- 
> > > 1.8.3.1
> > > 
> > > 
> > > _______________________________________________
> > > Xen-devel mailing list
> > > Xen-devel@lists.xen.org
> > > http://lists.xen.org/xen-devel
> > > 
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 10/12] xen/pvh: Piggyback on PVHVM for grant driver.
  2013-12-18 21:21     ` Konrad Rzeszutek Wilk
@ 2014-01-03 15:10       ` Stefano Stabellini
  0 siblings, 0 replies; 46+ messages in thread
From: Stefano Stabellini @ 2014-01-03 15:10 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, xen-devel, linux-kernel, boris.ostrovsky,
	david.vrabel, mukesh.rathor, jbeulich

On Wed, 18 Dec 2013, Konrad Rzeszutek Wilk wrote:
> On Wed, Dec 18, 2013 at 06:46:56PM +0000, Stefano Stabellini wrote:
> > On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > > In PVH the shared grant frame is the PFN and not MFN,
> > > hence its mapped via the same code path as HVM.
> > > 
> > > The allocation of the grant frame is done differently - we
> > > do not use the early platform-pci driver and have an
> > > ioremap area - instead we use balloon memory and stitch
> > > all of the non-contingous pages in a virtualized area.
> > > 
> > > That means when we call the hypervisor to replace the GMFN
> > > with a XENMAPSPACE_grant_table type, we need to lookup the
> > > old PFN for every iteration instead of assuming a flat
> > > contingous PFN allocation.
> > > 
> > > Lastly, we only use v1 for grants. This is because PVHVM
> > > is not able to use v2 due to no XENMEM_add_to_physmap
> > > calls on the error status page (see commit
> > > 69e8f430e243d657c2053f097efebc2e2cd559f0
> > >  xen/granttable: Disable grant v2 for HVM domains.)
> > > 
> > > Until that is implemented this workaround has to
> > > be in place.
> > > 
> > > Also per suggestions by Stefano utilize the PVHVM paths
> > > as they share common functionality.
> > > 
> > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > ---
> > >  drivers/xen/gntdev.c       |  2 +-
> > >  drivers/xen/grant-table.c  | 80 ++++++++++++++++++++++++++++++++++++++++++----
> > >  drivers/xen/platform-pci.c |  2 +-
> > >  include/xen/grant_table.h  |  2 +-
> > >  4 files changed, 76 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> > > index e41c79c..073b4a1 100644
> > > --- a/drivers/xen/gntdev.c
> > > +++ b/drivers/xen/gntdev.c
> > > @@ -846,7 +846,7 @@ static int __init gntdev_init(void)
> > >  	if (!xen_domain())
> > >  		return -ENODEV;
> > >  
> > > -	use_ptemod = xen_pv_domain();
> > > +	use_ptemod = !xen_feature(XENFEAT_auto_translated_physmap);
> > >  
> > >  	err = misc_register(&gntdev_miscdev);
> > >  	if (err != 0) {
> > > diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> > > index aa846a4..c0ded9f 100644
> > > --- a/drivers/xen/grant-table.c
> > > +++ b/drivers/xen/grant-table.c
> > > @@ -47,6 +47,7 @@
> > >  #include <xen/interface/xen.h>
> > >  #include <xen/page.h>
> > >  #include <xen/grant_table.h>
> > > +#include <xen/balloon.h>
> > >  #include <xen/interface/memory.h>
> > >  #include <xen/hvc-console.h>
> > >  #include <xen/swiotlb-xen.h>
> > > @@ -66,8 +67,8 @@ static unsigned int boot_max_nr_grant_frames;
> > >  static int gnttab_free_count;
> > >  static grant_ref_t gnttab_free_head;
> > >  static DEFINE_SPINLOCK(gnttab_list_lock);
> > > -unsigned long xen_hvm_resume_frames;
> > > -EXPORT_SYMBOL_GPL(xen_hvm_resume_frames);
> > > +unsigned long xen_auto_xlat_grant_frames;
> > > +EXPORT_SYMBOL_GPL(xen_auto_xlat_grant_frames);
> > >  
> > >  static union {
> > >  	struct grant_entry_v1 *v1;
> > > @@ -1060,7 +1061,7 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
> > >  	unsigned int nr_gframes = end_idx + 1;
> > >  	int rc;
> > >  
> > > -	if (xen_hvm_domain()) {
> > > +	if (xen_feature(XENFEAT_auto_translated_physmap)) {
> > >  		struct xen_add_to_physmap xatp;
> > >  		unsigned int i = end_idx;
> > >  		rc = 0;
> > > @@ -1069,10 +1070,24 @@ static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
> > >  		 * index, ensuring that the table will grow only once.
> > >  		 */
> > >  		do {
> > > +			unsigned long vaddr;
> > > +			unsigned int level;
> > > +			pte_t *pte;
> > > +
> > >  			xatp.domid = DOMID_SELF;
> > >  			xatp.idx = i;
> > >  			xatp.space = XENMAPSPACE_grant_table;
> > > -			xatp.gpfn = (xen_hvm_resume_frames >> PAGE_SHIFT) + i;
> > > +
> > > +			/*
> > > +			 * Don't assume the memory is contingous. Lookup each.
> > > +			 */
> > > +			vaddr = xen_auto_xlat_grant_frames + (i * PAGE_SIZE);
> > > +			if (xen_hvm_domain())
> > > +				xatp.gpfn = vaddr >> PAGE_SHIFT;
> > > +			else {
> > > +				pte = lookup_address(vaddr, &level);
> > 
> > lookup_address is x86 only. I added an empty implementation under
> > arch/arm to be able to compile stuff under drivers/xen but I would like
> > to get rid of it.
> > If you can't find a way to replace lookup_address with something that
> > works on arm and arm64 too, you could always turn
> > xen_auto_xlat_grant_frames into an array and store the pfns there.
> 
> Yeah, that would work nicely too. Thought we would have to store the
> size somewhere. Maybe a struct then.
> 
> struct xlat_grant_frames {
> 	int	nr;
> 	xen_pfn_t	*array;
> 	struct page	*page;
> };

sounds good


> > > +			}
> > >  			rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
> > >  			if (rc != 0) {
> > >  				pr_warn("grant table add_to_physmap failed, err=%d\n",
> > > @@ -1135,7 +1150,7 @@ static void gnttab_request_version(void)
> > >  	int rc;
> > >  	struct gnttab_set_version gsv;
> > >  
> > > -	if (xen_hvm_domain())
> > > +	if (xen_hvm_domain() || xen_feature(XENFEAT_auto_translated_physmap))
> > 
> > just turn this to
> > 
> > if (xen_feature(XENFEAT_auto_translated_physmap))
> 
> Hmm, I would rather actually keep that until I fix the underlaying issue
> of why v2 can't be used. And then we can get rid of this workaround.

Given that HVM guests have XENFEAT_auto_translated_physmap, the two are
equivalent.


> > >  		gsv.version = 1;
> > >  	else
> > >  		gsv.version = 2;
> > > @@ -1161,6 +1176,46 @@ static void gnttab_request_version(void)
> > >  	pr_info("Grant tables using version %d layout\n", grant_table_version);
> > >  }
> > >  
> > > +static int xlated_setup_gnttab_pages(unsigned long nr_grant_frames,
> > > +				     unsigned long max, void *addr)
> > > +{
> > > +	struct page **pages;
> > > +	unsigned long *pfns;
> > > +	int rc, i;
> > > +
> > > +	pages = kcalloc(nr_grant_frames, sizeof(pages[0]), GFP_KERNEL);
> > > +	if (!pages)
> > > +		return -ENOMEM;
> > > +
> > > +	pfns = kcalloc(nr_grant_frames, sizeof(pfns[0]), GFP_KERNEL);
> > > +	if (!pfns) {
> > > +		kfree(pages);
> > > +		return -ENOMEM;
> > > +	}
> > > +	rc = alloc_xenballooned_pages(nr_grant_frames, pages, 0 /* lowmem */);
> > > +	if (rc) {
> > > +		pr_warn("%s Couldn't balloon alloc %ld pfns rc:%d\n", __func__,
> > > +			nr_grant_frames, rc);
> > > +		kfree(pages);
> > > +		kfree(pfns);
> > > +		return rc;
> > > +	}
> > > +	for (i = 0; i < nr_grant_frames; i++)
> > > +		pfns[i] = page_to_pfn(pages[i]);
> > > +
> > > +	rc = arch_gnttab_map_shared(pfns, nr_grant_frames, max, addr);
> > > +
> > > +	kfree(pages);
> > > +	kfree(pfns);
> > > +	if (rc) {
> > > +		pr_warn("%s Couldn't map %ld pfns rc:%d\n", __func__,
> > > +			nr_grant_frames, rc);
> > > +		free_xenballooned_pages(nr_grant_frames, pages);
> > > +		return rc;
> > > +	}
> > > +	return rc;
> > > +}
> > 
> > Please move this function somewhere under arch/x86/xen and call it from
> > xen_start_kernel, it doesn't belong to common code.
> 
> <nods>
> 
> Actually, looking at how the xen-platform-pci does it. I am wondering
> if that can be just made in one function that will do the right
> thing.
> 
> Can the xen-platform-pci use balloon pages instead of the MMIO BARs?
> Or would that be a waste of memory?

I think they could be used but it would be a waste and they would need
to be contiguous in gpfn space.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 09/12] xen/pvh: Piggyback on PVHVM XenBus and event channels for PVH.
  2014-01-03 15:04       ` Stefano Stabellini
@ 2014-01-04  0:29         ` Mukesh Rathor
  0 siblings, 0 replies; 46+ messages in thread
From: Mukesh Rathor @ 2014-01-04  0:29 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Konrad Rzeszutek Wilk, xen-devel, linux-kernel, boris.ostrovsky,
	david.vrabel, jbeulich

On Fri, 3 Jan 2014 15:04:27 +0000
Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:

> On Tue, 31 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > > > --- a/drivers/xen/xenbus/xenbus_client.c
> > > > +++ b/drivers/xen/xenbus/xenbus_client.c
> > > > @@ -45,6 +45,7 @@
> > > >  #include <xen/grant_table.h>
> > > >  #include <xen/xenbus.h>
> > > >  #include <xen/xen.h>
> > > > +#include <xen/features.h>
> > > >  
> > > >  #include "xenbus_probe.h"
> > > >  
> > > > @@ -743,7 +744,7 @@ static const struct xenbus_ring_ops
> > > > ring_ops_hvm = { 
> > > >  void __init xenbus_ring_ops_init(void)
> > > >  {
> > > > -	if (xen_pv_domain())
> > > > +	if (xen_pv_domain()
> > > > && !xen_feature(XENFEAT_auto_translated_physmap))
> > > 
> > > Can we just change this test to
> > > 
> > > if (!xen_feature(XENFEAT_auto_translated_physmap))
> > > 
> > > ?
> > 
> > No. If we do then the HVM domains (which are also !auto-xlat)
> > will end up using the PV version of ring_ops.
> 
> Actually HVM guests have XENFEAT_auto_translated_physmap, so in this
> case they would get &ring_ops_hvm.

Right. Back then I was confused about all the other PV modes, like
shadow, supervisor, ... but looks like they are all obsolete. It could 
just be:

        if (!xen_feature(XENFEAT_auto_translated_physmap))
                ring_ops = &ring_ops_pv;
        else
                ring_ops = &ring_ops_hvm;

thanks,
Mukesh

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 09/12] xen/pvh: Piggyback on PVHVM XenBus and event channels for PVH.
  2013-12-18 21:17     ` Konrad Rzeszutek Wilk
@ 2014-01-04  0:48       ` Mukesh Rathor
  2014-01-05 17:18         ` Stefano Stabellini
  0 siblings, 1 reply; 46+ messages in thread
From: Mukesh Rathor @ 2014-01-04  0:48 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, xen-devel, linux-kernel, boris.ostrovsky,
	david.vrabel, jbeulich

On Wed, 18 Dec 2013 16:17:39 -0500
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

> On Wed, Dec 18, 2013 at 06:31:43PM +0000, Stefano Stabellini wrote:
> > On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > 
> > > PVH is a PV guest with a twist - there are certain things
> > > that work in it like HVM and some like PV. There is
> > > a similar mode - PVHVM where we run in HVM mode with
> > > PV code enabled - and this patch explores that.
> > > 
> > > The most notable PV interfaces are the XenBus and event channels.
> > > For PVH, we will use XenBus and event channels.
> > > 
> > > For the XenBus mechanism we piggyback on how it is done for
> > > PVHVM guests.
> > > 
> > > Ditto for the event channel mechanism - we piggyback on PVHVM -
> > > by setting up a specific vector callback and that
> > > vector ends up calling the event channel mechanism to
> > > dispatch the events as needed.
> > > 
> > > This means that from a pvops perspective, we can use
> > > native_irq_ops instead of the Xen PV specific. Albeit in the
> > > future we could support pirq_eoi_map. But that is
> > > a feature request that can be shared with PVHVM.
> > > 
> > > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > ---
> > >  arch/x86/xen/enlighten.c           | 6 ++++++
> > >  arch/x86/xen/irq.c                 | 5 ++++-
> > >  drivers/xen/events.c               | 5 +++++
> > >  drivers/xen/xenbus/xenbus_client.c | 3 ++-
> > >  4 files changed, 17 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> > > index e420613..7fceb51 100644
> > > --- a/arch/x86/xen/enlighten.c
> > > +++ b/arch/x86/xen/enlighten.c
> > > @@ -1134,6 +1134,8 @@ void xen_setup_shared_info(void)
> > >  	/* In UP this is as good a place as any to set up shared
> > > info */ xen_setup_vcpu_info_placement();
> > >  #endif
> > > +	if (xen_pvh_domain())
> > > +		return;
> > >  
> > >  	xen_setup_mfn_list_list();
> > >  }
> > 
> > This is another one of those cases where I think we would benefit
> > from introducing xen_setup_shared_info_pvh instead of adding more
> > ifs here.
> 
> Actually this one can be removed.
> 
> > 
> > 
> > > @@ -1146,6 +1148,10 @@ void xen_setup_vcpu_info_placement(void)
> > >  	for_each_possible_cpu(cpu)
> > >  		xen_vcpu_setup(cpu);
> > >  
> > > +	/* PVH always uses native IRQ ops */
> > > +	if (xen_pvh_domain())
> > > +		return;
> > > +
> > >  	/* xen_vcpu_setup managed to place the vcpu_info within
> > > the percpu area for all cpus, so make use of it */
> > >  	if (have_vcpu_info_placement) {
> > 
> > Same here?
> 
> Hmmm, I wonder if the vcpu info placement could work with PVH.

It should now (after a patch I sent while ago)... the comment implies
that PVH uses native IRQs even case of vcpu info placlement...

perhaps it would be more clear to do:

        for_each_possible_cpu(cpu)
                xen_vcpu_setup(cpu);
        /* PVH always uses native IRQ ops */
        if (have_vcpu_info_placement && !xen_pvh_domain) {
            pv_irq_ops.save_fl = __PV_IS_CALLEE_SAVE(xen_save_fl_direct);
            .........


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xen-devel] [PATCH v11 09/12] xen/pvh: Piggyback on PVHVM XenBus and event channels for PVH.
  2014-01-04  0:48       ` Mukesh Rathor
@ 2014-01-05 17:18         ` Stefano Stabellini
  0 siblings, 0 replies; 46+ messages in thread
From: Stefano Stabellini @ 2014-01-05 17:18 UTC (permalink / raw)
  To: Mukesh Rathor
  Cc: Konrad Rzeszutek Wilk, Stefano Stabellini, xen-devel,
	linux-kernel, boris.ostrovsky, david.vrabel, jbeulich

On Fri, 3 Jan 2014, Mukesh Rathor wrote:
> On Wed, 18 Dec 2013 16:17:39 -0500
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> 
> > On Wed, Dec 18, 2013 at 06:31:43PM +0000, Stefano Stabellini wrote:
> > > On Tue, 17 Dec 2013, Konrad Rzeszutek Wilk wrote:
> > > > From: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > > 
> > > > PVH is a PV guest with a twist - there are certain things
> > > > that work in it like HVM and some like PV. There is
> > > > a similar mode - PVHVM where we run in HVM mode with
> > > > PV code enabled - and this patch explores that.
> > > > 
> > > > The most notable PV interfaces are the XenBus and event channels.
> > > > For PVH, we will use XenBus and event channels.
> > > > 
> > > > For the XenBus mechanism we piggyback on how it is done for
> > > > PVHVM guests.
> > > > 
> > > > Ditto for the event channel mechanism - we piggyback on PVHVM -
> > > > by setting up a specific vector callback and that
> > > > vector ends up calling the event channel mechanism to
> > > > dispatch the events as needed.
> > > > 
> > > > This means that from a pvops perspective, we can use
> > > > native_irq_ops instead of the Xen PV specific. Albeit in the
> > > > future we could support pirq_eoi_map. But that is
> > > > a feature request that can be shared with PVHVM.
> > > > 
> > > > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > > ---
> > > >  arch/x86/xen/enlighten.c           | 6 ++++++
> > > >  arch/x86/xen/irq.c                 | 5 ++++-
> > > >  drivers/xen/events.c               | 5 +++++
> > > >  drivers/xen/xenbus/xenbus_client.c | 3 ++-
> > > >  4 files changed, 17 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> > > > index e420613..7fceb51 100644
> > > > --- a/arch/x86/xen/enlighten.c
> > > > +++ b/arch/x86/xen/enlighten.c
> > > > @@ -1134,6 +1134,8 @@ void xen_setup_shared_info(void)
> > > >  	/* In UP this is as good a place as any to set up shared
> > > > info */ xen_setup_vcpu_info_placement();
> > > >  #endif
> > > > +	if (xen_pvh_domain())
> > > > +		return;
> > > >  
> > > >  	xen_setup_mfn_list_list();
> > > >  }
> > > 
> > > This is another one of those cases where I think we would benefit
> > > from introducing xen_setup_shared_info_pvh instead of adding more
> > > ifs here.
> > 
> > Actually this one can be removed.
> > 
> > > 
> > > 
> > > > @@ -1146,6 +1148,10 @@ void xen_setup_vcpu_info_placement(void)
> > > >  	for_each_possible_cpu(cpu)
> > > >  		xen_vcpu_setup(cpu);
> > > >  
> > > > +	/* PVH always uses native IRQ ops */
> > > > +	if (xen_pvh_domain())
> > > > +		return;
> > > > +
> > > >  	/* xen_vcpu_setup managed to place the vcpu_info within
> > > > the percpu area for all cpus, so make use of it */
> > > >  	if (have_vcpu_info_placement) {
> > > 
> > > Same here?
> > 
> > Hmmm, I wonder if the vcpu info placement could work with PVH.
> 
> It should now (after a patch I sent while ago)... the comment implies
> that PVH uses native IRQs even case of vcpu info placlement...
> 
> perhaps it would be more clear to do:
> 
>         for_each_possible_cpu(cpu)
>                 xen_vcpu_setup(cpu);
>         /* PVH always uses native IRQ ops */
>         if (have_vcpu_info_placement && !xen_pvh_domain) {
>             pv_irq_ops.save_fl = __PV_IS_CALLEE_SAVE(xen_save_fl_direct);
>             .........

Yeah, this looks better

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2014-01-05 17:19 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-17 20:51 [PATCH v11] PVH support for Linux kernel Konrad Rzeszutek Wilk
2013-12-17 20:51 ` [PATCH v11 01/12] xen/p2m: Check for auto-xlat when doing mfn_to_local_pfn Konrad Rzeszutek Wilk
2013-12-18 14:10   ` [Xen-devel] " Stefano Stabellini
2013-12-17 20:51 ` [PATCH v11 02/12] xen/pvh: Define what an PVH guest is Konrad Rzeszutek Wilk
2013-12-18 14:22   ` [Xen-devel] " Stefano Stabellini
2013-12-18 14:55     ` Stefano Stabellini
2013-12-18 16:01       ` Ian Campbell
2013-12-18 16:58         ` Konrad Rzeszutek Wilk
2013-12-18 17:03           ` Ian Campbell
2013-12-18 14:57     ` Konrad Rzeszutek Wilk
2013-12-17 20:51 ` [PATCH v11 03/12] xen/pvh: Early bootup changes in PV code Konrad Rzeszutek Wilk
2013-12-18 14:27   ` [Xen-devel] " Stefano Stabellini
2013-12-18 14:58     ` Konrad Rzeszutek Wilk
2013-12-18 15:05       ` Stefano Stabellini
2013-12-17 20:51 ` [PATCH v11 04/12] xen/pvh: Don't setup P2M tree Konrad Rzeszutek Wilk
2013-12-18 14:39   ` [Xen-devel] " Stefano Stabellini
2013-12-18 15:05     ` Konrad Rzeszutek Wilk
2013-12-17 20:51 ` [PATCH v11 05/12] xen/pvh: Update E820 to work with PVH Konrad Rzeszutek Wilk
2013-12-18 18:25   ` [Xen-devel] " Stefano Stabellini
2013-12-18 20:30     ` Konrad Rzeszutek Wilk
2013-12-18 23:44     ` Mukesh Rathor
2013-12-19 11:25       ` Stefano Stabellini
2013-12-17 20:51 ` [PATCH v11 06/12] xen/pvh: Load GDT/GS in early PV bootup code for BSP Konrad Rzeszutek Wilk
2013-12-17 20:51 ` [PATCH v11 07/12] xen/pvh: Secondary VCPU bringup (non-bootup CPUs) Konrad Rzeszutek Wilk
2013-12-17 20:51 ` [PATCH v11 08/12] xen/pvh: MMU changes for PVH Konrad Rzeszutek Wilk
2013-12-18 14:48   ` [Xen-devel] " Stefano Stabellini
2013-12-18 15:10     ` Konrad Rzeszutek Wilk
2013-12-18 15:15       ` Stefano Stabellini
2013-12-17 20:51 ` [PATCH v11 09/12] xen/pvh: Piggyback on PVHVM XenBus and event channels " Konrad Rzeszutek Wilk
2013-12-18 18:31   ` [Xen-devel] " Stefano Stabellini
2013-12-18 21:17     ` Konrad Rzeszutek Wilk
2014-01-04  0:48       ` Mukesh Rathor
2014-01-05 17:18         ` Stefano Stabellini
2013-12-31 18:56     ` Konrad Rzeszutek Wilk
2014-01-03 15:04       ` Stefano Stabellini
2014-01-04  0:29         ` Mukesh Rathor
2013-12-17 20:51 ` [PATCH v11 10/12] xen/pvh: Piggyback on PVHVM for grant driver Konrad Rzeszutek Wilk
2013-12-18 18:46   ` [Xen-devel] " Stefano Stabellini
2013-12-18 21:21     ` Konrad Rzeszutek Wilk
2014-01-03 15:10       ` Stefano Stabellini
2013-12-17 20:51 ` [PATCH v11 11/12] xen/pvh: Disable PV code that does not work with PVH Konrad Rzeszutek Wilk
2013-12-18 14:19   ` [Xen-devel] " Stefano Stabellini
2013-12-18 14:56     ` Konrad Rzeszutek Wilk
2013-12-18 15:22       ` Stefano Stabellini
2013-12-17 20:51 ` [PATCH v11 12/12] xen/pvh: Support ParaVirtualized Hardware extensions Konrad Rzeszutek Wilk
2013-12-18 14:52   ` [Xen-devel] " Stefano Stabellini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).