* [GIT PULL] Xen APIC hooks (with io_apic_ops)
@ 2009-05-12 23:25 Jeremy Fitzhardinge
  2009-05-12 23:25 ` [PATCH 01/17] xen/dom0: handle acpi lapic parsing in Xen dom0 Jeremy Fitzhardinge
                   ` (17 more replies)
  0 siblings, 18 replies; 183+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel

Hi Ingo,

Here's a revised set of the Xen APIC changes which adds io_apic_ops
to allow Xen to intercept IO APIC access operations.

Thanks,
	J

The following changes since commit ce791368bb4a53d05e78e1588bac0aacde8db84c:
  Jeremy Fitzhardinge (1):
        xen/i386: make sure initial VGA/ISA mappings are not overridden

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git for-ingo/xen/dom0/apic-ops

Gerd Hoffmann (2):
      xen: set pirq name to something useful.
      xen: fix legacy irq setup, make ioapic-less machines work.

Ian Campbell (1):
      xen: pre-initialize legacy irqs early

Jeremy Fitzhardinge (14):
      xen/dom0: handle acpi lapic parsing in Xen dom0
      x86: add io_apic_ops to allow interception
      xen: implement io_apic_ops
      xen: create dummy ioapic mapping
      xen: implement pirq type event channels
      x86/io_apic: add get_nr_irqs_gsi()
      xen/apic: identity map gsi->irqs
      xen: direct irq registration to pirq event channels
      xen: bind pirq to vector and event channel
      xen: don't setup acpi interrupt unless there is one
      xen: use acpi_get_override_irq() to get triggering for legacy irqs
      xen: initialize irq 0 too
      xen: dynamically allocate irq & event structures
      xen: disable MSI

 arch/x86/include/asm/io_apic.h |   10 ++
 arch/x86/include/asm/xen/pci.h |   13 ++
 arch/x86/kernel/acpi/boot.c    |   18 +++-
 arch/x86/kernel/apic/io_apic.c |   55 ++++++++-
 arch/x86/xen/Kconfig           |   11 ++
 arch/x86/xen/Makefile          |    3 +-
 arch/x86/xen/apic.c            |   69 ++++++++++
 arch/x86/xen/enlighten.c       |    2 +
 arch/x86/xen/mmu.c             |   10 ++
 arch/x86/xen/pci.c             |   86 +++++++++++++
 arch/x86/xen/xen-ops.h         |    6 +
 drivers/pci/pci.h              |    2 -
 drivers/xen/events.c           |  273 ++++++++++++++++++++++++++++++++++++++--
 include/linux/pci.h            |    6 +
 include/xen/events.h           |   19 +++
 15 files changed, 568 insertions(+), 15 deletions(-)
 create mode 100644 arch/x86/include/asm/xen/pci.h
 create mode 100644 arch/x86/xen/apic.c
 create mode 100644 arch/x86/xen/pci.c



* [PATCH 01/17] xen/dom0: handle acpi lapic parsing in Xen dom0
  2009-05-12 23:25 [GIT PULL] Xen APIC hooks (with io_apic_ops) Jeremy Fitzhardinge
@ 2009-05-12 23:25 ` Jeremy Fitzhardinge
  2009-05-12 23:25 ` [PATCH 02/17] x86: add io_apic_ops to allow interception Jeremy Fitzhardinge
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 183+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge, Jeremy Fitzhardinge

When running in Xen dom0, we still want to parse the ACPI tables to
find out about local and IO apics, but we don't want to actually use
the lapics.

Add a couple of Xen dom0 checks to prevent lapics from being mapped or
accessed.  This is very Xen-specific behaviour, so there didn't seem to
be any point in adding more indirection.

[ Impact: ignore local apics, which are not usable under Xen ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/kernel/acpi/boot.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 723989d..4147e0c 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -41,6 +41,8 @@
 #include <asm/mpspec.h>
 #include <asm/smp.h>
 
+#include <asm/xen/hypervisor.h>
+
 static int __initdata acpi_force = 0;
 u32 acpi_rsdt_forced;
 #ifdef	CONFIG_ACPI
@@ -218,6 +220,10 @@ static void __cpuinit acpi_register_lapic(int id, u8 enabled)
 {
 	unsigned int ver = 0;
 
+	/* We don't want to register lapics when in Xen dom0 */
+	if (xen_initial_domain())
+		return;
+
 	if (!enabled) {
 		++disabled_cpus;
 		return;
@@ -802,6 +808,10 @@ static int __init acpi_parse_fadt(struct acpi_table_header *table)
 
 static void __init acpi_register_lapic_address(unsigned long address)
 {
+	/* Xen dom0 doesn't have usable lapics */
+	if (xen_initial_domain())
+		return;
+
 	mp_lapic_addr = address;
 
 	set_fixmap_nocache(FIX_APIC_BASE, address);
-- 
1.6.0.6



* [PATCH 02/17] x86: add io_apic_ops to allow interception
  2009-05-12 23:25 [GIT PULL] Xen APIC hooks (with io_apic_ops) Jeremy Fitzhardinge
  2009-05-12 23:25 ` [PATCH 01/17] xen/dom0: handle acpi lapic parsing in Xen dom0 Jeremy Fitzhardinge
@ 2009-05-12 23:25 ` Jeremy Fitzhardinge
  2009-05-25  3:54     ` Ingo Molnar
  2009-05-12 23:25 ` [PATCH 03/17] xen: implement io_apic_ops Jeremy Fitzhardinge
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 183+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Xen dom0 needs to paravirtualize IO operations to the IO APIC, so add
an io_apic_ops structure that lets it intercept them.  Do this as an
ops structure because there's at least some chance that another
paravirtualized environment may want to intercept these.

[Impact: indirect IO APIC access via io_apic_ops]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/include/asm/io_apic.h |    9 +++++++
 arch/x86/kernel/apic/io_apic.c |   50 +++++++++++++++++++++++++++++++++++++--
 2 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 9d826e4..8cbfe73 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -21,6 +21,15 @@
 #define IO_APIC_REDIR_LEVEL_TRIGGER	(1 << 15)
 #define IO_APIC_REDIR_MASKED		(1 << 16)
 
+struct io_apic_ops {
+	void (*init)(void);
+	unsigned int (*read)(unsigned int apic, unsigned int reg);
+	void (*write)(unsigned int apic, unsigned int reg, unsigned int value);
+	void (*modify)(unsigned int apic, unsigned int reg, unsigned int value);
+};
+
+void __init set_io_apic_ops(const struct io_apic_ops *);
+
 /*
  * The structure of the IO-APIC:
  */
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 30da617..c24f116 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -66,6 +66,25 @@
 
 #define __apicdebuginit(type) static type __init
 
+static void __init __ioapic_init_mappings(void);
+static unsigned int __io_apic_read(unsigned int apic, unsigned int reg);
+static void __io_apic_write(unsigned int apic, unsigned int reg,
+			    unsigned int val);
+static void __io_apic_modify(unsigned int apic, unsigned int reg,
+			     unsigned int val);
+
+static struct io_apic_ops io_apic_ops = {
+	.init = __ioapic_init_mappings,
+	.read = __io_apic_read,
+	.write = __io_apic_write,
+	.modify = __io_apic_modify,
+};
+
+void __init set_io_apic_ops(const struct io_apic_ops *ops)
+{
+	io_apic_ops = *ops;
+}
+
 /*
  *      Is the SiS APIC rmw bug present ?
  *      -1 = don't know, 0 = no, 1 = yes
@@ -385,6 +404,24 @@ set_extra_move_desc(struct irq_desc *desc, const struct cpumask *mask)
 }
 #endif
 
+static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
+{
+	return io_apic_ops.read(apic, reg);
+}
+
+static inline void io_apic_write(unsigned int apic, unsigned int reg,
+				 unsigned int value)
+{
+	io_apic_ops.write(apic, reg, value);
+}
+
+static inline void io_apic_modify(unsigned int apic, unsigned int reg,
+				  unsigned int value)
+{
+	io_apic_ops.modify(apic, reg, value);
+}
+
+
 struct io_apic {
 	unsigned int index;
 	unsigned int unused[3];
@@ -405,14 +442,15 @@ static inline void io_apic_eoi(unsigned int apic, unsigned int vector)
 	writel(vector, &io_apic->eoi);
 }
 
-static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
+static unsigned int __io_apic_read(unsigned int apic, unsigned int reg)
 {
 	struct io_apic __iomem *io_apic = io_apic_base(apic);
 	writel(reg, &io_apic->index);
 	return readl(&io_apic->data);
 }
 
-static inline void io_apic_write(unsigned int apic, unsigned int reg, unsigned int value)
+static void __io_apic_write(unsigned int apic, unsigned int reg,
+			    unsigned int value)
 {
 	struct io_apic __iomem *io_apic = io_apic_base(apic);
 	writel(reg, &io_apic->index);
@@ -425,7 +463,8 @@ static inline void io_apic_write(unsigned int apic, unsigned int reg, unsigned i
  *
  * Older SiS APIC requires we rewrite the index register
  */
-static inline void io_apic_modify(unsigned int apic, unsigned int reg, unsigned int value)
+static void __io_apic_modify(unsigned int apic, unsigned int reg,
+			     unsigned int value)
 {
 	struct io_apic __iomem *io_apic = io_apic_base(apic);
 
@@ -4141,6 +4180,11 @@ static struct resource * __init ioapic_setup_resources(void)
 
 void __init ioapic_init_mappings(void)
 {
+	io_apic_ops.init();
+}
+
+static void __init __ioapic_init_mappings(void)
+{
 	unsigned long ioapic_phys, idx = FIX_IO_APIC_BASE_0;
 	struct resource *ioapic_res;
 	int i;
-- 
1.6.0.6
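
For readers following along, here is a rough user-space sketch of the
indirection this patch introduces.  It is not kernel code: the demo_*
names, the fake register array and the access counter are all
hypothetical stand-ins.  It only shows the shape of the mechanism: a
statically initialized default ops table, thin wrappers that dispatch
through it, and a setter that replaces the table by value.

```c
#include <assert.h>

/* Hypothetical stand-in for struct io_apic_ops; not the kernel definition. */
struct demo_ioapic_ops {
	unsigned int (*read)(unsigned int apic, unsigned int reg);
	void (*write)(unsigned int apic, unsigned int reg, unsigned int value);
};

#define DEMO_NR_APICS	2
#define DEMO_NR_REGS	8

/* Fake register file standing in for the memory-mapped hardware. */
unsigned int demo_regs[DEMO_NR_APICS][DEMO_NR_REGS];

/* Number of accesses that went through the intercepting backend. */
unsigned int demo_intercepted;

static unsigned int demo_native_read(unsigned int apic, unsigned int reg)
{
	assert(apic < DEMO_NR_APICS && reg < DEMO_NR_REGS);
	return demo_regs[apic][reg];
}

static void demo_native_write(unsigned int apic, unsigned int reg,
			      unsigned int value)
{
	assert(apic < DEMO_NR_APICS && reg < DEMO_NR_REGS);
	demo_regs[apic][reg] = value;
}

/* Intercepting backend: counts each access, then does the same work. */
static unsigned int demo_hooked_read(unsigned int apic, unsigned int reg)
{
	demo_intercepted++;
	return demo_native_read(apic, reg);
}

static void demo_hooked_write(unsigned int apic, unsigned int reg,
			      unsigned int value)
{
	demo_intercepted++;
	demo_native_write(apic, reg, value);
}

const struct demo_ioapic_ops demo_hooked_ops = {
	.read = demo_hooked_read,
	.write = demo_hooked_write,
};

/* Default table: the native accessors are used unless someone overrides. */
static struct demo_ioapic_ops demo_ops = {
	.read = demo_native_read,
	.write = demo_native_write,
};

/* Copy the caller's table by value, as set_io_apic_ops() does in the patch. */
void demo_set_ops(const struct demo_ioapic_ops *ops)
{
	demo_ops = *ops;
}

/* Callers only ever see these wrappers. */
unsigned int demo_io_apic_read(unsigned int apic, unsigned int reg)
{
	return demo_ops.read(apic, reg);
}

void demo_io_apic_write(unsigned int apic, unsigned int reg,
			unsigned int value)
{
	demo_ops.write(apic, reg, value);
}
```

Because the table is copied by value, the caller's table does not have
to outlive the call, which is presumably why the Xen table in the next
patch can be marked __initdata.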



* [PATCH 03/17] xen: implement io_apic_ops
  2009-05-12 23:25 [GIT PULL] Xen APIC hooks (with io_apic_ops) Jeremy Fitzhardinge
  2009-05-12 23:25 ` [PATCH 01/17] xen/dom0: handle acpi lapic parsing in Xen dom0 Jeremy Fitzhardinge
  2009-05-12 23:25 ` [PATCH 02/17] x86: add io_apic_ops to allow interception Jeremy Fitzhardinge
@ 2009-05-12 23:25 ` Jeremy Fitzhardinge
  2009-05-12 23:25 ` [PATCH 04/17] xen: create dummy ioapic mapping Jeremy Fitzhardinge
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 183+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge, Jeremy Fitzhardinge

Reads and writes to the IO APIC are paravirtualized via hypercalls, so
implement the appropriate operations.

[ Impact: implement Xen interface for io_apic_ops ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/Makefile    |    2 +-
 arch/x86/xen/apic.c      |   64 ++++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/xen/enlighten.c |    2 +
 arch/x86/xen/xen-ops.h   |    6 ++++
 4 files changed, 73 insertions(+), 1 deletions(-)
 create mode 100644 arch/x86/xen/apic.c

diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index c4cda96..73ecb74 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -11,4 +11,4 @@ obj-y		:= enlighten.o setup.o multicalls.o mmu.o irq.o \
 
 obj-$(CONFIG_SMP)		+= smp.o spinlock.o
 obj-$(CONFIG_XEN_DEBUG_FS)	+= debugfs.o
-obj-$(CONFIG_XEN_DOM0)		+= vga.o
+obj-$(CONFIG_XEN_DOM0)		+= vga.o apic.o
diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c
new file mode 100644
index 0000000..8ae563c
--- /dev/null
+++ b/arch/x86/xen/apic.c
@@ -0,0 +1,64 @@
+#include <linux/kernel.h>
+#include <linux/threads.h>
+#include <linux/bitmap.h>
+
+#include <asm/io_apic.h>
+#include <asm/acpi.h>
+
+#include <asm/xen/hypervisor.h>
+#include <asm/xen/hypercall.h>
+
+#include <xen/interface/xen.h>
+#include <xen/interface/physdev.h>
+
+static void __init xen_io_apic_init(void)
+{
+}
+
+static unsigned int xen_io_apic_read(unsigned apic, unsigned reg)
+{
+	struct physdev_apic apic_op;
+	int ret;
+
+	apic_op.apic_physbase = mp_ioapics[apic].apicaddr;
+	apic_op.reg = reg;
+	ret = HYPERVISOR_physdev_op(PHYSDEVOP_apic_read, &apic_op);
+	if (ret)
+		BUG();
+	return apic_op.value;
+}
+
+
+static void xen_io_apic_write(unsigned int apic, unsigned int reg, unsigned int value)
+{
+	struct physdev_apic apic_op;
+
+	apic_op.apic_physbase = mp_ioapics[apic].apicaddr;
+	apic_op.reg = reg;
+	apic_op.value = value;
+	if (HYPERVISOR_physdev_op(PHYSDEVOP_apic_write, &apic_op))
+		BUG();
+}
+
+static struct io_apic_ops __initdata xen_ioapic_ops = {
+	.init = xen_io_apic_init,
+	.read = xen_io_apic_read,
+	.write = xen_io_apic_write,
+	.modify = xen_io_apic_write,
+};
+
+void xen_init_apic(void)
+{
+	if (!xen_initial_domain())
+		return;
+
+	set_io_apic_ops(&xen_ioapic_ops);
+
+#ifdef CONFIG_ACPI
+	/*
+	 * Pretend ACPI found our lapic even though we've disabled it,
+ 	 * to prevent MP tables from setting up lapics.
+ 	 */
+	acpi_lapic = 1;
+#endif
+}
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 12e4d9c..3a4932a 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1085,6 +1085,8 @@ asmlinkage void __init xen_start_kernel(void)
 		set_iopl.iopl = 1;
 		if (HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl) == -1)
 			BUG();
+
+		xen_init_apic();
 	}
 
 	/* set the limit of our address space */
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 40abcef..0853949 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -76,13 +76,19 @@ struct dom0_vga_console_info;
 
 #ifdef CONFIG_XEN_DOM0
 void xen_init_vga(const struct dom0_vga_console_info *, size_t size);
+void xen_init_apic(void);
 #else
 static inline void xen_init_vga(const struct dom0_vga_console_info *info,
 				size_t size)
 {
 }
+
+static inline void xen_init_apic(void)
+{
+}
 #endif
 
+
 /* Declare an asm function, along with symbols needed to make it
    inlineable */
 #define DECL_ASM(ret, name, ...)		\
-- 
1.6.0.6
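
As an illustration only, the request/response pattern used by
xen_io_apic_read() and xen_io_apic_write() can be sketched in user
space as follows.  Everything here is a mock: mock_physdev_op(), the
MOCK_* opcodes, struct mock_apic_req and the base addresses are
assumptions made for the example, not the Xen ABI.  The point is that
the caller identifies the IO APIC by its physical base address, fills
in a small request structure, and treats any failure as fatal.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Mock opcodes; the real ones are PHYSDEVOP_apic_read/_write. */
enum mock_op { MOCK_APIC_READ, MOCK_APIC_WRITE };

/* Mock request layout, loosely modelled on struct physdev_apic. */
struct mock_apic_req {
	unsigned long physbase;	/* which IO APIC, by physical address */
	uint32_t reg;		/* register index */
	uint32_t value;		/* in for writes, out for reads */
};

#define MOCK_NR_APICS	2
#define MOCK_NR_REGS	16

/* Assumed base addresses, playing the role of mp_ioapics[].apicaddr. */
static const unsigned long mock_apic_base[MOCK_NR_APICS] = {
	0xfec00000UL, 0xfec01000UL,
};

/* Register state owned by the mock "hypervisor". */
static uint32_t mock_hv_regs[MOCK_NR_APICS][MOCK_NR_REGS];

/* Mock hypercall: returns 0 on success, -1 on a bad request. */
int mock_physdev_op(enum mock_op op, struct mock_apic_req *req)
{
	unsigned int i;

	if (req->reg >= MOCK_NR_REGS)
		return -1;

	for (i = 0; i < MOCK_NR_APICS; i++) {
		if (mock_apic_base[i] != req->physbase)
			continue;
		if (op == MOCK_APIC_READ)
			req->value = mock_hv_regs[i][req->reg];
		else
			mock_hv_regs[i][req->reg] = req->value;
		return 0;
	}

	return -1;	/* no IO APIC at that address */
}

uint32_t mock_io_apic_read(unsigned int apic, unsigned int reg)
{
	struct mock_apic_req req;

	assert(apic < MOCK_NR_APICS);
	req.physbase = mock_apic_base[apic];
	req.reg = reg;
	req.value = 0;
	if (mock_physdev_op(MOCK_APIC_READ, &req))
		abort();	/* user-space analogue of BUG() */
	return req.value;
}

void mock_io_apic_write(unsigned int apic, unsigned int reg, uint32_t value)
{
	struct mock_apic_req req;

	assert(apic < MOCK_NR_APICS);
	req.physbase = mock_apic_base[apic];
	req.reg = reg;
	req.value = value;
	if (mock_physdev_op(MOCK_APIC_WRITE, &req))
		abort();
}
```

In the patch, .modify points at the same function as .write; the sketch
leaves it out.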



* [PATCH 04/17] xen: create dummy ioapic mapping
  2009-05-12 23:25 [GIT PULL] Xen APIC hooks (with io_apic_ops) Jeremy Fitzhardinge
                   ` (2 preceding siblings ...)
  2009-05-12 23:25 ` [PATCH 03/17] xen: implement io_apic_ops Jeremy Fitzhardinge
@ 2009-05-12 23:25 ` Jeremy Fitzhardinge
  2009-05-12 23:25 ` [PATCH 05/17] xen: implement pirq type event channels Jeremy Fitzhardinge
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 183+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge, Jeremy Fitzhardinge

We don't allow direct access to the IO apic, so make sure that any
request to map it just "maps" non-present pages.  We should see any
attempts at direct access explode nicely.

[ Impact: debuggability (make failures obvious) ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/mmu.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 331e52d..139c8de 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1919,6 +1919,16 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
 		pte = pfn_pte(phys, prot);
 		break;
 
+#ifdef CONFIG_X86_IO_APIC
+	case FIX_IO_APIC_BASE_0 ... FIX_IO_APIC_BASE_END:
+		/*
+		 * We just don't map the IO APIC - all access is via
+		 * hypercalls.  Keep the address in the pte for reference.
+		 */
+		pte = pfn_pte(phys, PAGE_NONE);
+		break;
+#endif
+
 	case FIX_PARAVIRT_BOOTMAP:
 		/* This is an MFN, but it isn't an IO mapping from the
 		   IO domain */
-- 
1.6.0.6



* [PATCH 05/17] xen: implement pirq type event channels
  2009-05-12 23:25 [GIT PULL] Xen APIC hooks (with io_apic_ops) Jeremy Fitzhardinge
                   ` (3 preceding siblings ...)
  2009-05-12 23:25 ` [PATCH 04/17] xen: create dummy ioapic mapping Jeremy Fitzhardinge
@ 2009-05-12 23:25 ` Jeremy Fitzhardinge
  2009-05-12 23:25   ` Jeremy Fitzhardinge
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 183+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge, Jeremy Fitzhardinge

A privileged PV Xen domain can get direct access to hardware.  In
order for this to be useful, it must be able to get hardware
interrupts.

Because this is a PV Xen domain, all interrupts are delivered as event
channels.  PIRQ event channels are bound to a pirq number and an
interrupt vector.  When an IO APIC raises a hardware interrupt on that
vector, it is delivered as an event channel, which we can deliver to
the appropriate device driver(s).

This patch simply implements the infrastructure for dealing with pirq
event channels.

[ Impact: integrate hardware interrupts into Xen's event scheme ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 drivers/xen/events.c |  245 +++++++++++++++++++++++++++++++++++++++++++++++++-
 include/xen/events.h |   11 +++
 2 files changed, 253 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 97f4b39..fd98c19 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -16,7 +16,7 @@
  *    (typically dom0).
  * 2. VIRQs, typically used for timers.  These are per-cpu events.
  * 3. IPIs.
- * 4. Hardware interrupts. Not supported at present.
+ * 4. PIRQs - Hardware interrupts.
  *
  * Jeremy Fitzhardinge <jeremy@xensource.com>, XenSource Inc, 2007
  */
@@ -40,6 +40,9 @@
 #include <xen/interface/xen.h>
 #include <xen/interface/event_channel.h>
 
+/* Leave low irqs free for identity mapping */
+#define LEGACY_IRQS	16
+
 /*
  * This lock protects updates to the following mapping and reference-count
  * arrays. The lock does not need to be acquired to read the mapping tables.
@@ -83,10 +86,12 @@ struct irq_info
 		enum ipi_vector ipi;
 		struct {
 			unsigned short gsi;
-			unsigned short vector;
+			unsigned char vector;
+			unsigned char flags;
 		} pirq;
 	} u;
 };
+#define PIRQ_NEEDS_EOI	(1 << 0)
 
 static struct irq_info irq_info[NR_IRQS];
 
@@ -106,6 +111,7 @@ static inline unsigned long *cpu_evtchn_mask(int cpu)
 #define VALID_EVTCHN(chn)	((chn) != 0)
 
 static struct irq_chip xen_dynamic_chip;
+static struct irq_chip xen_pirq_chip;
 
 /* Constructor for packed IRQ information. */
 static struct irq_info mk_unbound_info(void)
@@ -218,6 +224,15 @@ static unsigned int cpu_from_evtchn(unsigned int evtchn)
 	return ret;
 }
 
+static bool pirq_needs_eoi(unsigned irq)
+{
+	struct irq_info *info = info_for_irq(irq);
+
+	BUG_ON(info->type != IRQT_PIRQ);
+
+	return info->u.pirq.flags & PIRQ_NEEDS_EOI;
+}
+
 static inline unsigned long active_evtchns(unsigned int cpu,
 					   struct shared_info *sh,
 					   unsigned int idx)
@@ -334,7 +349,7 @@ static int find_unbound_irq(void)
 	int irq;
 	struct irq_desc *desc;
 
-	for (irq = 0; irq < nr_irqs; irq++)
+	for (irq = LEGACY_IRQS; irq < nr_irqs; irq++)
 		if (irq_info[irq].type == IRQT_UNBOUND)
 			break;
 
@@ -350,6 +365,210 @@ static int find_unbound_irq(void)
 	return irq;
 }
 
+static bool identity_mapped_irq(unsigned irq)
+{
+	/* only identity map legacy irqs */
+	return irq < LEGACY_IRQS;
+}
+
+static void pirq_unmask_notify(int irq)
+{
+	struct physdev_eoi eoi = { .irq = irq };
+
+	if (unlikely(pirq_needs_eoi(irq))) {
+		int rc = HYPERVISOR_physdev_op(PHYSDEVOP_eoi, &eoi);
+		WARN_ON(rc);
+	}
+}
+
+static void pirq_query_unmask(int irq)
+{
+	struct physdev_irq_status_query irq_status;
+	struct irq_info *info = info_for_irq(irq);
+
+	BUG_ON(info->type != IRQT_PIRQ);
+
+	irq_status.irq = irq;
+	if (HYPERVISOR_physdev_op(PHYSDEVOP_irq_status_query, &irq_status))
+		irq_status.flags = 0;
+
+	info->u.pirq.flags &= ~PIRQ_NEEDS_EOI;
+	if (irq_status.flags & XENIRQSTAT_needs_eoi)
+		info->u.pirq.flags |= PIRQ_NEEDS_EOI;
+}
+
+static bool probing_irq(int irq)
+{
+	struct irq_desc *desc = irq_to_desc(irq);
+
+	return desc && desc->action == NULL;
+}
+
+static unsigned int startup_pirq(unsigned int irq)
+{
+	struct evtchn_bind_pirq bind_pirq;
+	struct irq_info *info = info_for_irq(irq);
+	int evtchn = evtchn_from_irq(irq);
+
+	BUG_ON(info->type != IRQT_PIRQ);
+
+	if (VALID_EVTCHN(evtchn))
+		goto out;
+
+	bind_pirq.pirq = irq;
+	/* NB. We are happy to share unless we are probing. */
+	bind_pirq.flags = probing_irq(irq) ? 0 : BIND_PIRQ__WILL_SHARE;
+	if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_pirq, &bind_pirq) != 0) {
+		if (!probing_irq(irq))
+			printk(KERN_INFO "Failed to obtain physical IRQ %d\n",
+			       irq);
+		return 0;
+	}
+	evtchn = bind_pirq.port;
+
+	pirq_query_unmask(irq);
+
+	evtchn_to_irq[evtchn] = irq;
+	bind_evtchn_to_cpu(evtchn, 0);
+	info->evtchn = evtchn;
+
+ out:
+	unmask_evtchn(evtchn);
+	pirq_unmask_notify(irq);
+
+	return 0;
+}
+
+static void shutdown_pirq(unsigned int irq)
+{
+	struct evtchn_close close;
+	struct irq_info *info = info_for_irq(irq);
+	int evtchn = evtchn_from_irq(irq);
+
+	BUG_ON(info->type != IRQT_PIRQ);
+
+	if (!VALID_EVTCHN(evtchn))
+		return;
+
+	mask_evtchn(evtchn);
+
+	close.port = evtchn;
+	if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0)
+		BUG();
+
+	bind_evtchn_to_cpu(evtchn, 0);
+	evtchn_to_irq[evtchn] = -1;
+	info->evtchn = 0;
+}
+
+static void enable_pirq(unsigned int irq)
+{
+	startup_pirq(irq);
+}
+
+static void disable_pirq(unsigned int irq)
+{
+}
+
+static void ack_pirq(unsigned int irq)
+{
+	int evtchn = evtchn_from_irq(irq);
+
+	move_native_irq(irq);
+
+	if (VALID_EVTCHN(evtchn)) {
+		mask_evtchn(evtchn);
+		clear_evtchn(evtchn);
+	}
+}
+
+static void end_pirq(unsigned int irq)
+{
+	int evtchn = evtchn_from_irq(irq);
+	struct irq_desc *desc = irq_to_desc(irq);
+
+	if (WARN_ON(!desc))
+		return;
+
+	if ((desc->status & (IRQ_DISABLED|IRQ_PENDING)) ==
+	    (IRQ_DISABLED|IRQ_PENDING)) {
+		shutdown_pirq(irq);
+	} else if (VALID_EVTCHN(evtchn)) {
+		unmask_evtchn(evtchn);
+		pirq_unmask_notify(irq);
+	}
+}
+
+static int find_irq_by_gsi(unsigned gsi)
+{
+	int irq;
+
+	for (irq = 0; irq < NR_IRQS; irq++) {
+		struct irq_info *info = info_for_irq(irq);
+
+		if (info == NULL || info->type != IRQT_PIRQ)
+			continue;
+
+		if (gsi_from_irq(irq) == gsi)
+			return irq;
+	}
+
+	return -1;
+}
+
+/*
+ * Allocate a physical irq, along with a vector.  We don't assign an
+ * event channel until the irq is actually started up.  Return an
+ * existing irq if we've already got one for the gsi.
+ */
+int xen_allocate_pirq(unsigned gsi)
+{
+	int irq;
+	struct physdev_irq irq_op;
+
+	spin_lock(&irq_mapping_update_lock);
+
+	irq = find_irq_by_gsi(gsi);
+	if (irq != -1) {
+		printk(KERN_INFO "xen_allocate_pirq: returning irq %d for gsi %u\n",
+		       irq, gsi);
+		goto out;	/* XXX need refcount? */
+	}
+
+	if (identity_mapped_irq(gsi)) {
+		irq = gsi;
+		dynamic_irq_init(irq);
+	} else
+		irq = find_unbound_irq();
+
+	set_irq_chip_and_handler_name(irq, &xen_pirq_chip,
+				      handle_level_irq, "pirq");
+
+	irq_op.irq = irq;
+	if (HYPERVISOR_physdev_op(PHYSDEVOP_alloc_irq_vector, &irq_op)) {
+		dynamic_irq_cleanup(irq);
+		irq = -ENOSPC;
+		goto out;
+	}
+
+	irq_info[irq] = mk_pirq_info(0, gsi, irq_op.vector);
+
+out:
+	spin_unlock(&irq_mapping_update_lock);
+
+	return irq;
+}
+
+int xen_vector_from_irq(unsigned irq)
+{
+	return vector_from_irq(irq);
+}
+
+int xen_gsi_from_irq(unsigned irq)
+{
+	return gsi_from_irq(irq);
+}
+
 int bind_evtchn_to_irq(unsigned int evtchn)
 {
 	int irq;
@@ -922,6 +1141,26 @@ static struct irq_chip xen_dynamic_chip __read_mostly = {
 	.retrigger	= retrigger_dynirq,
 };
 
+static struct irq_chip xen_pirq_chip __read_mostly = {
+	.name		= "xen-pirq",
+
+	.startup	= startup_pirq,
+	.shutdown	= shutdown_pirq,
+
+	.enable		= enable_pirq,
+	.unmask		= enable_pirq,
+
+	.disable	= disable_pirq,
+	.mask		= disable_pirq,
+
+	.ack		= ack_pirq,
+	.end		= end_pirq,
+
+	.set_affinity	= set_affinity_irq,
+
+	.retrigger	= retrigger_dynirq,
+};
+
 void __init xen_init_IRQ(void)
 {
 	int i;
diff --git a/include/xen/events.h b/include/xen/events.h
index 9f24b64..e5b541d 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -58,4 +58,15 @@ void xen_poll_irq(int irq);
 /* Determine the IRQ which is bound to an event channel */
 unsigned irq_from_evtchn(unsigned int evtchn);
 
+/* Allocate an irq for a physical interrupt, given a gsi.  "Legacy"
+   GSIs are identity mapped; others are dynamically allocated as
+   usual. */
+int xen_allocate_pirq(unsigned gsi);
+
+/* Return vector allocated to pirq */
+int xen_vector_from_irq(unsigned pirq);
+
+/* Return gsi allocated to pirq */
+int xen_gsi_from_irq(unsigned pirq);
+
 #endif	/* _XEN_EVENTS_H */
-- 
1.6.0.6
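
A small user-space sketch of the needs-EOI bookkeeping in this patch,
under stated assumptions: the eoi_* names and the mock hypervisor state
are hypothetical, and the two hypercalls (PHYSDEVOP_irq_status_query
and PHYSDEVOP_eoi) are replaced by an array lookup and a counter.  It
shows the idea behind pirq_query_unmask() and pirq_unmask_notify(): the
status is queried once, cached in a per-irq flag bit, and an EOI is
only issued on unmask for irqs that have the bit set.

```c
#include <assert.h>
#include <stdbool.h>

#define EOI_NR_IRQS		32
#define EOI_PIRQ_NEEDS_EOI	(1u << 0)

/* Per-irq state; only the flags byte matters for this sketch. */
struct eoi_pirq_info {
	unsigned char flags;
};

static struct eoi_pirq_info eoi_info[EOI_NR_IRQS];

/* Mock hypervisor state: which irqs want an EOI, and how many it saw. */
bool eoi_hv_needs_eoi[EOI_NR_IRQS];
unsigned int eoi_hv_eoi_count;

/* Refresh the cached flag from the (mock) hypervisor's answer. */
void eoi_query_unmask(unsigned int irq)
{
	assert(irq < EOI_NR_IRQS);

	eoi_info[irq].flags &= ~EOI_PIRQ_NEEDS_EOI;
	if (eoi_hv_needs_eoi[irq])
		eoi_info[irq].flags |= EOI_PIRQ_NEEDS_EOI;
}

bool eoi_needs_eoi(unsigned int irq)
{
	assert(irq < EOI_NR_IRQS);
	return (eoi_info[irq].flags & EOI_PIRQ_NEEDS_EOI) != 0;
}

/* Called after unmasking: only talk to the hypervisor when required. */
void eoi_unmask_notify(unsigned int irq)
{
	if (eoi_needs_eoi(irq))
		eoi_hv_eoi_count++;	/* stands in for the EOI hypercall */
}
```

Caching the answer keeps the common unmask path free of an extra status
query; only irqs that were flagged at startup pay for the EOI call.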



* [PATCH 06/17] x86/io_apic: add get_nr_irqs_gsi()
  2009-05-12 23:25 [GIT PULL] Xen APIC hooks (with io_apic_ops) Jeremy Fitzhardinge
@ 2009-05-12 23:25   ` Jeremy Fitzhardinge
  2009-05-12 23:25 ` [PATCH 02/17] x86: add io_apic_ops to allow interception Jeremy Fitzhardinge
                     ` (16 subsequent siblings)
  17 siblings, 0 replies; 183+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge, Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy@f9-builder.(none)>

Add get_nr_irqs_gsi() to return nr_irqs_gsi.  Xen will use this to
determine how many irqs it needs to reserve for hardware irqs.

[ Impact: new interface to get max GSI ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/include/asm/io_apic.h |    1 +
 arch/x86/kernel/apic/io_apic.c |    5 +++++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 8cbfe73..e33ccb7 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -181,6 +181,7 @@ extern void reinit_intr_remapped_IO_APIC(int intr_remapping,
 #endif
 
 extern void probe_nr_irqs_gsi(void);
+extern int get_nr_irqs_gsi(void);
 
 extern int setup_ioapic_entry(int apic, int irq,
 			      struct IO_APIC_route_entry *entry,
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index c24f116..07dc530 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3917,6 +3917,11 @@ void __init probe_nr_irqs_gsi(void)
 	printk(KERN_DEBUG "nr_irqs_gsi: %d\n", nr_irqs_gsi);
 }
 
+int get_nr_irqs_gsi(void)
+{
+	return nr_irqs_gsi;
+}
+
 #ifdef CONFIG_SPARSE_IRQ
 int __init arch_probe_nr_irqs(void)
 {
-- 
1.6.0.6



* [PATCH 07/17] xen/apic: identity map gsi->irqs
  2009-05-12 23:25 [GIT PULL] Xen APIC hooks (with io_apic_ops) Jeremy Fitzhardinge
@ 2009-05-12 23:25   ` Jeremy Fitzhardinge
  2009-05-12 23:25 ` [PATCH 02/17] x86: add io_apic_ops to allow interception Jeremy Fitzhardinge
                     ` (16 subsequent siblings)
  17 siblings, 0 replies; 183+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge, Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy@f9-builder.(none)>

Reserve the lower irq range for hardware interrupts so we can
identity-map them.

[ Impact: preserve compat with native ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 drivers/xen/events.c |   23 +++++++++++++++++------
 1 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index fd98c19..88395bb 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -31,6 +31,7 @@
 #include <asm/ptrace.h>
 #include <asm/irq.h>
 #include <asm/idle.h>
+#include <asm/io_apic.h>
 #include <asm/sync_bitops.h>
 #include <asm/xen/hypercall.h>
 #include <asm/xen/hypervisor.h>
@@ -40,9 +41,6 @@
 #include <xen/interface/xen.h>
 #include <xen/interface/event_channel.h>
 
-/* Leave low irqs free for identity mapping */
-#define LEGACY_IRQS	16
-
 /*
  * This lock protects updates to the following mapping and reference-count
  * arrays. The lock does not need to be acquired to read the mapping tables.
@@ -344,12 +342,24 @@ static void unmask_evtchn(int port)
 	put_cpu();
 }
 
+static int get_nr_hw_irqs(void)
+{
+	int ret = 1;
+
+#ifdef CONFIG_X86_IO_APIC
+	ret = get_nr_irqs_gsi();
+#endif
+
+	return ret;
+}
+
 static int find_unbound_irq(void)
 {
 	int irq;
 	struct irq_desc *desc;
+	int start = get_nr_hw_irqs();
 
-	for (irq = LEGACY_IRQS; irq < nr_irqs; irq++)
+	for (irq = start; irq < nr_irqs; irq++)
 		if (irq_info[irq].type == IRQT_UNBOUND)
 			break;
 
@@ -367,8 +377,8 @@ static int find_unbound_irq(void)
 
 static bool identity_mapped_irq(unsigned irq)
 {
-	/* only identity map legacy irqs */
-	return irq < LEGACY_IRQS;
+	/* identity map all the hardware irqs */
+	return irq < get_nr_hw_irqs();
 }
 
 static void pirq_unmask_notify(int irq)
@@ -537,6 +547,7 @@ int xen_allocate_pirq(unsigned gsi)
 
 	if (identity_mapped_irq(gsi)) {
 		irq = gsi;
+		irq_to_desc_alloc_cpu(irq, 0);
 		dynamic_irq_init(irq);
 	} else
 		irq = find_unbound_irq();
-- 
1.6.0.6
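
To make the allocation policy concrete, here is a simplified user-space
sketch.  The idm_* names, the table size and the fixed hardware irq
count of 24 are assumptions for the example; in the patch the count
comes from get_nr_irqs_gsi().  A GSI that already has an irq gets the
same irq back, a GSI below the hardware count maps to the irq with the
same number, and anything else takes the first free irq at or above
that count.

```c
#define IDM_NR_IRQS	64

enum idm_irq_type { IDM_UNBOUND = 0, IDM_PIRQ };

struct idm_irq_info {
	enum idm_irq_type type;
	unsigned int gsi;
};

static struct idm_irq_info idm_irq_info[IDM_NR_IRQS];

/* Assumed count of hardware irqs; stands in for get_nr_irqs_gsi(). */
unsigned int idm_nr_hw_irqs = 24;

/* Return the irq already bound to this gsi, or -1 if there is none. */
static int idm_find_irq_by_gsi(unsigned int gsi)
{
	unsigned int irq;

	for (irq = 0; irq < IDM_NR_IRQS; irq++)
		if (idm_irq_info[irq].type == IDM_PIRQ &&
		    idm_irq_info[irq].gsi == gsi)
			return (int)irq;

	return -1;
}

/* Search only above the reserved range, so it stays identity-mappable. */
static int idm_find_unbound_irq(void)
{
	unsigned int irq;

	for (irq = idm_nr_hw_irqs; irq < IDM_NR_IRQS; irq++)
		if (idm_irq_info[irq].type == IDM_UNBOUND)
			return (int)irq;

	return -1;
}

/* Returns the irq for a gsi, or -1 if no irq is available. */
int idm_allocate_pirq(unsigned int gsi)
{
	int irq = idm_find_irq_by_gsi(gsi);

	if (irq != -1)
		return irq;		/* existing binding */

	if (gsi < idm_nr_hw_irqs)
		irq = (int)gsi;		/* identity map */
	else
		irq = idm_find_unbound_irq();

	if (irq < 0)
		return -1;

	idm_irq_info[irq].type = IDM_PIRQ;
	idm_irq_info[irq].gsi = gsi;
	return irq;
}
```

Starting the dynamic search at the hardware count is what keeps other
event channel bindings from taking an irq number that a GSI will later
need.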



* [PATCH 08/17] xen: direct irq registration to pirq event channels
  2009-05-12 23:25 [GIT PULL] Xen APIC hooks (with io_apic_ops) Jeremy Fitzhardinge
                   ` (6 preceding siblings ...)
  2009-05-12 23:25   ` Jeremy Fitzhardinge
@ 2009-05-12 23:25 ` Jeremy Fitzhardinge
  2009-05-12 23:25 ` [PATCH 09/17] xen: bind pirq to vector and event channel Jeremy Fitzhardinge
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 183+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge, Jeremy Fitzhardinge

This patch puts the hooks into place so that when the interrupt
subsystem registers an irq, it gets routed via Xen (if we're running
under Xen).

The first step is to get a gsi for a particular device+pin.  We use
the normal acpi interrupt routing to do the mapping.

We reserve enough irq space to fit the hardware interrupt sources in,
so we can allocate the irq == gsi, as we do in the native case;
software events will get allocated irqs above that.

Having allocated an irq, we ask Xen to allocate a vector, and then
bind that pirq/vector to an event channel.  When the hardware raises
an interrupt on a vector, Xen signals us on the corresponding event
channel, which gets routed to the irq and delivered to the appropriate
device driver.

This patch does everything except set up the IO APIC pin routing to
the vector.

[ Impact: route hardware interrupts via Xen ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
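A minimal user-space sketch of the allocation policy described above, not the kernel code: GSIs below a reserved boundary keep irq == gsi, and anything else (including software events) takes the first unbound slot above that boundary. All `ex_`/`EX_` names and the fixed boundary of 24 are assumptions for illustration; in the series the boundary comes from get_nr_irqs_gsi(), and the real xen_allocate_pirq() also reuses an existing irq for a gsi, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sizes, for illustration only. */
#define EX_NR_IRQS    64	/* total irq slots */
#define EX_NR_HW_IRQS 24	/* stand-in for get_nr_irqs_gsi() */

enum ex_irq_type { EX_IRQT_UNBOUND = 0, EX_IRQT_PIRQ, EX_IRQT_EVTCHN };

static enum ex_irq_type ex_irq_types[EX_NR_IRQS];

/* Record what an irq slot is used for. */
static void ex_bind(int irq, enum ex_irq_type type)
{
	assert(irq >= 0 && irq < EX_NR_IRQS);
	ex_irq_types[irq] = type;
}

/* Hardware irqs are identity mapped: irq == gsi. */
static bool ex_identity_mapped(unsigned gsi)
{
	return gsi < EX_NR_HW_IRQS;
}

/* Dynamic allocations only search above the hardware range. */
static int ex_find_unbound_irq(void)
{
	int irq;

	for (irq = EX_NR_HW_IRQS; irq < EX_NR_IRQS; irq++)
		if (ex_irq_types[irq] == EX_IRQT_UNBOUND)
			return irq;

	return -1;	/* out of irqs */
}

/* Allocate an irq for a gsi; returns -1 when no slot is free. */
static int ex_alloc_pirq(unsigned gsi)
{
	int irq;

	if (ex_identity_mapped(gsi))
		irq = (int)gsi;
	else
		irq = ex_find_unbound_irq();

	if (irq >= 0)
		ex_bind(irq, EX_IRQT_PIRQ);
	return irq;
}

/* Allocate an irq for a software event channel. */
static int ex_alloc_event_irq(void)
{
	int irq = ex_find_unbound_irq();

	if (irq >= 0)
		ex_bind(irq, EX_IRQT_EVTCHN);
	return irq;
}
```

With these assumed sizes, ex_alloc_pirq(9) yields irq 9, while the first software event lands on irq 24, the first slot above the hardware range.
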
 arch/x86/include/asm/xen/pci.h |   13 +++++++++++
 arch/x86/kernel/acpi/boot.c    |    8 ++++++-
 arch/x86/xen/Kconfig           |   11 +++++++++
 arch/x86/xen/Makefile          |    1 +
 arch/x86/xen/pci.c             |   47 ++++++++++++++++++++++++++++++++++++++++
 drivers/xen/events.c           |    6 ++++-
 include/xen/events.h           |    8 ++++++
 7 files changed, 92 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/include/asm/xen/pci.h
 create mode 100644 arch/x86/xen/pci.c

diff --git a/arch/x86/include/asm/xen/pci.h b/arch/x86/include/asm/xen/pci.h
new file mode 100644
index 0000000..0563fc6
--- /dev/null
+++ b/arch/x86/include/asm/xen/pci.h
@@ -0,0 +1,13 @@
+#ifndef _ASM_X86_XEN_PCI_H
+#define _ASM_X86_XEN_PCI_H
+
+#ifdef CONFIG_XEN_DOM0_PCI
+int xen_register_gsi(u32 gsi, int triggering, int polarity);
+#else
+static inline int xen_register_gsi(u32 gsi, int triggering, int polarity)
+{
+	return -1;
+}
+#endif
+
+#endif	/* _ASM_X86_XEN_PCI_H */
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 4147e0c..d4de1c2 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -41,6 +41,8 @@
 #include <asm/mpspec.h>
 #include <asm/smp.h>
 
+#include <asm/xen/pci.h>
+
 #include <asm/xen/hypervisor.h>
 
 static int __initdata acpi_force = 0;
@@ -530,9 +532,13 @@ int acpi_gsi_to_irq(u32 gsi, unsigned int *irq)
  */
 int acpi_register_gsi(u32 gsi, int triggering, int polarity)
 {
-	unsigned int irq;
+	int irq;
 	unsigned int plat_gsi = gsi;
 
+	irq = xen_register_gsi(gsi, triggering, polarity);
+	if (irq >= 0)
+		return irq;
+
 #ifdef CONFIG_PCI
 	/*
 	 * Make sure all (legacy) PCI IRQs are set as level-triggered.
diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index fe69286..42e9f0a 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -37,6 +37,17 @@ config XEN_DEBUG_FS
 	  Enable statistics output and various tuning options in debugfs.
 	  Enabling this option may incur a significant performance overhead.
 
+config XEN_PCI_PASSTHROUGH
+       bool #"Enable support for Xen PCI passthrough devices"
+       depends on XEN && PCI
+       help
+         Enable support for passing PCI devices through to
+	 unprivileged domains. (COMPLETELY UNTESTED)
+
+config XEN_DOM0_PCI
+       def_bool y
+       depends on XEN_DOM0 && PCI
+
 config XEN_DOM0
 	bool "Enable Xen privileged domain support"
 	depends on XEN && X86_IO_APIC && ACPI
diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index 73ecb74..639965a 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -12,3 +12,4 @@ obj-y		:= enlighten.o setup.o multicalls.o mmu.o irq.o \
 obj-$(CONFIG_SMP)		+= smp.o spinlock.o
 obj-$(CONFIG_XEN_DEBUG_FS)	+= debugfs.o
 obj-$(CONFIG_XEN_DOM0)		+= vga.o apic.o
+obj-$(CONFIG_XEN_DOM0_PCI)	+= pci.o
\ No newline at end of file
diff --git a/arch/x86/xen/pci.c b/arch/x86/xen/pci.c
new file mode 100644
index 0000000..f450007
--- /dev/null
+++ b/arch/x86/xen/pci.c
@@ -0,0 +1,47 @@
+#include <linux/kernel.h>
+#include <linux/acpi.h>
+#include <linux/pci.h>
+
+#include <asm/pci_x86.h>
+
+#include <asm/xen/hypervisor.h>
+
+#include <xen/interface/xen.h>
+#include <xen/events.h>
+
+#include "xen-ops.h"
+
+int xen_register_gsi(u32 gsi, int triggering, int polarity)
+{
+	int irq;
+
+	if (!xen_domain())
+		return -1;
+
+	printk(KERN_DEBUG "xen: registering gsi %u triggering %d polarity %d\n",
+	       gsi, triggering, polarity);
+
+	irq = xen_allocate_pirq(gsi);
+
+	printk(KERN_DEBUG "xen: --> irq=%d\n", irq);
+
+	return irq;
+}
+
+void __init xen_setup_pirqs(void)
+{
+#ifdef CONFIG_ACPI
+	int irq;
+
+	/*
+	 * Set up acpi interrupt in acpi_gbl_FADT.sci_interrupt.
+	 */
+	irq = xen_allocate_pirq(acpi_gbl_FADT.sci_interrupt);
+
+	printk(KERN_INFO "xen: allocated irq %d for acpi %d\n",
+	       irq, acpi_gbl_FADT.sci_interrupt);
+
+	/* Blerk. */
+	acpi_gbl_FADT.sci_interrupt = irq;
+#endif
+}
diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 88395bb..968e927 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -419,6 +419,7 @@ static unsigned int startup_pirq(unsigned int irq)
 	struct evtchn_bind_pirq bind_pirq;
 	struct irq_info *info = info_for_irq(irq);
 	int evtchn = evtchn_from_irq(irq);
+	int rc;
 
 	BUG_ON(info->type != IRQT_PIRQ);
 
@@ -428,7 +429,8 @@ static unsigned int startup_pirq(unsigned int irq)
 	bind_pirq.pirq = irq;
 	/* NB. We are happy to share unless we are probing. */
 	bind_pirq.flags = probing_irq(irq) ? 0 : BIND_PIRQ__WILL_SHARE;
-	if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_pirq, &bind_pirq) != 0) {
+	rc = HYPERVISOR_event_channel_op(EVTCHNOP_bind_pirq, &bind_pirq);
+	if (rc != 0) {
 		if (!probing_irq(irq))
 			printk(KERN_INFO "Failed to obtain physical IRQ %d\n",
 			       irq);
@@ -1187,4 +1189,6 @@ void __init xen_init_IRQ(void)
 		mask_evtchn(i);
 
 	irq_ctx_init(smp_processor_id());
+
+	xen_setup_pirqs();
 }
diff --git a/include/xen/events.h b/include/xen/events.h
index e5b541d..6fe4863 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -69,4 +69,12 @@ int xen_vector_from_irq(unsigned pirq);
 /* Return gsi allocated to pirq */
 int xen_gsi_from_irq(unsigned pirq);
 
+#ifdef CONFIG_XEN_DOM0_PCI
+void xen_setup_pirqs(void);
+#else
+static inline void xen_setup_pirqs(void)
+{
+}
+#endif
+
 #endif	/* _XEN_EVENTS_H */
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 09/17] xen: bind pirq to vector and event channel
  2009-05-12 23:25 [GIT PULL] Xen APIC hooks (with io_apic_ops) Jeremy Fitzhardinge
                   ` (7 preceding siblings ...)
  2009-05-12 23:25 ` [PATCH 08/17] xen: direct irq registration to pirq event channels Jeremy Fitzhardinge
@ 2009-05-12 23:25 ` Jeremy Fitzhardinge
  2009-05-12 23:25 ` [PATCH 10/17] xen: pre-initialize legacy irqs early Jeremy Fitzhardinge
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 183+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge, Jeremy Fitzhardinge

Having converted a dev+pin to a gsi, and that gsi to an irq, and
allocated a vector for the irq, we must program the IO APIC to deliver
an interrupt on a pin to the vector, so Xen can deliver it as an event
channel.

Given the pirq, we can get the gsi and vector.  We map the gsi to a
specific IO APIC's pin, and set the routing entry.

(We were passing the ACPI triggering and polarity levels directly into
the apic - but they have reversed values.  The result was that
all the level-triggered interrupts were edge, and vice-versa.
It's surprising that anything worked at all, but now AHCI works
for me.

Thanks to Gerd Hoffmann for noticing this.)

[ Impact: program IO APICs under Xen ]

Diagnosed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
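A small sketch of the encoding mismatch mentioned in the parenthetical above. The `EX_ACPI_*` values are assumed to mirror the ACPICA definitions (level = 0, edge = 1, active high = 0, active low = 1), and the `EX_APIC_*` values are assumed to follow the IO APIC redirection-entry encoding (edge = 0, level = 1, active high = 0, active low = 1). Under those assumptions the triggering value has to be flipped; the helper names are hypothetical.

```c
#include <assert.h>

/* Assumed to mirror ACPICA's constants (renamed to avoid clashes). */
#define EX_ACPI_LEVEL_SENSITIVE 0
#define EX_ACPI_EDGE_SENSITIVE  1
#define EX_ACPI_ACTIVE_HIGH     0
#define EX_ACPI_ACTIVE_LOW      1

/* Assumed IO APIC redirection-entry encoding. */
#define EX_APIC_TRIGGER_EDGE    0
#define EX_APIC_TRIGGER_LEVEL   1
#define EX_APIC_POLARITY_HIGH   0
#define EX_APIC_POLARITY_LOW    1

/* Translate an ACPI triggering value into the IO APIC trigger bit. */
static int ex_acpi_to_apic_trigger(int acpi_triggering)
{
	assert(acpi_triggering == EX_ACPI_LEVEL_SENSITIVE ||
	       acpi_triggering == EX_ACPI_EDGE_SENSITIVE);

	return acpi_triggering == EX_ACPI_EDGE_SENSITIVE ?
		EX_APIC_TRIGGER_EDGE : EX_APIC_TRIGGER_LEVEL;
}

/* Translate an ACPI polarity value into the IO APIC polarity bit. */
static int ex_acpi_to_apic_polarity(int acpi_polarity)
{
	assert(acpi_polarity == EX_ACPI_ACTIVE_HIGH ||
	       acpi_polarity == EX_ACPI_ACTIVE_LOW);

	return acpi_polarity == EX_ACPI_ACTIVE_HIGH ?
		EX_APIC_POLARITY_HIGH : EX_APIC_POLARITY_LOW;
}
```

Comparing against the named constants, as xen_register_gsi() does in the hunk below, keeps the translation correct even where the two numeric encodings happen to agree.
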
 arch/x86/xen/apic.c |    2 ++
 arch/x86/xen/pci.c  |   33 +++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c
index 8ae563c..35a8af7 100644
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -4,6 +4,7 @@
 
 #include <asm/io_apic.h>
 #include <asm/acpi.h>
+#include <asm/hw_irq.h>
 
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>
@@ -13,6 +14,7 @@
 
 static void __init xen_io_apic_init(void)
 {
+	enable_IO_APIC();
 }
 
 static unsigned int xen_io_apic_read(unsigned apic, unsigned reg)
diff --git a/arch/x86/xen/pci.c b/arch/x86/xen/pci.c
index f450007..af4e898 100644
--- a/arch/x86/xen/pci.c
+++ b/arch/x86/xen/pci.c
@@ -2,6 +2,8 @@
 #include <linux/acpi.h>
 #include <linux/pci.h>
 
+#include <asm/mpspec.h>
+#include <asm/io_apic.h>
 #include <asm/pci_x86.h>
 
 #include <asm/xen/hypervisor.h>
@@ -11,6 +13,32 @@
 
 #include "xen-ops.h"
 
+static void xen_set_io_apic_routing(int irq, int trigger, int polarity)
+{
+	int ioapic, ioapic_pin;
+	int vector, gsi;
+	struct IO_APIC_route_entry entry;
+
+	gsi = xen_gsi_from_irq(irq);
+	vector = xen_vector_from_irq(irq);
+
+	ioapic = mp_find_ioapic(gsi);
+	if (ioapic == -1) {
+		printk(KERN_WARNING "xen_set_ioapic_routing: irq %d gsi %d ioapic %d\n",
+			irq, gsi, ioapic);
+		return;
+	}
+
+	ioapic_pin = mp_find_ioapic_pin(ioapic, gsi);
+
+	printk(KERN_INFO "xen_set_ioapic_routing: irq %d gsi %d vector %d ioapic %d pin %d triggering %d polarity %d\n",
+		irq, gsi, vector, ioapic, ioapic_pin, trigger, polarity);
+
+	setup_ioapic_entry(ioapic, -1, &entry, ~0, trigger, polarity, vector,
+			   ioapic_pin);
+	ioapic_write_entry(ioapic, ioapic_pin, entry);
+}
+
 int xen_register_gsi(u32 gsi, int triggering, int polarity)
 {
 	int irq;
@@ -25,6 +53,11 @@ int xen_register_gsi(u32 gsi, int triggering, int polarity)
 
 	printk(KERN_DEBUG "xen: --> irq=%d\n", irq);
 
+	if (irq > 0)
+		xen_set_io_apic_routing(irq,
+					triggering == ACPI_EDGE_SENSITIVE ? 0 : 1,
+					polarity == ACPI_ACTIVE_HIGH ? 0 : 1);
+
 	return irq;
 }
 
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 10/17] xen: pre-initialize legacy irqs early
  2009-05-12 23:25 [GIT PULL] Xen APIC hooks (with io_apic_ops) Jeremy Fitzhardinge
                   ` (8 preceding siblings ...)
  2009-05-12 23:25 ` [PATCH 09/17] xen: bind pirq to vector and event channel Jeremy Fitzhardinge
@ 2009-05-12 23:25 ` Jeremy Fitzhardinge
  2009-05-12 23:25 ` [PATCH 11/17] xen: don't setup acpi interrupt unless there is one Jeremy Fitzhardinge
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 183+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Ian Campbell, Jeremy Fitzhardinge

From: Ian Campbell <ian.campbell@citrix.com>

Various legacy devices, such as IDE, assume their legacy interrupts are
already initialized and are immediately usable.  Pre-initialize all the
legacy interrupts.

[ Impact: ISA/legacy device compat ]

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/pci.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/x86/xen/pci.c b/arch/x86/xen/pci.c
index af4e898..402a5bd 100644
--- a/arch/x86/xen/pci.c
+++ b/arch/x86/xen/pci.c
@@ -63,9 +63,9 @@ int xen_register_gsi(u32 gsi, int triggering, int polarity)
 
 void __init xen_setup_pirqs(void)
 {
-#ifdef CONFIG_ACPI
 	int irq;
 
+#ifdef CONFIG_ACPI
 	/*
 	 * Set up acpi interrupt in acpi_gbl_FADT.sci_interrupt.
 	 */
@@ -77,4 +77,8 @@ void __init xen_setup_pirqs(void)
 	/* Blerk. */
 	acpi_gbl_FADT.sci_interrupt = irq;
 #endif
+
+	/* Pre-allocate legacy irqs */
+	for (irq = 0; irq < NR_IRQS_LEGACY; irq++)
+		xen_allocate_pirq(irq);
 }
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 11/17] xen: don't setup acpi interrupt unless there is one
  2009-05-12 23:25 [GIT PULL] Xen APIC hooks (with io_apic_ops) Jeremy Fitzhardinge
                   ` (9 preceding siblings ...)
  2009-05-12 23:25 ` [PATCH 10/17] xen: pre-initialize legacy irqs early Jeremy Fitzhardinge
@ 2009-05-12 23:25 ` Jeremy Fitzhardinge
  2009-05-12 23:25 ` [PATCH 12/17] xen: use acpi_get_override_irq() to get triggering for legacy irqs Jeremy Fitzhardinge
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 183+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

If the SCI hasn't been set, then presumably we're not running
with ACPI, so don't bother setting up the interrupt.

[ Impact: compatibility with pre-ACPI machines ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/pci.c |   11 +++++------
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/pci.c b/arch/x86/xen/pci.c
index 402a5bd..00ad6df 100644
--- a/arch/x86/xen/pci.c
+++ b/arch/x86/xen/pci.c
@@ -69,13 +69,12 @@ void __init xen_setup_pirqs(void)
 	/*
 	 * Set up acpi interrupt in acpi_gbl_FADT.sci_interrupt.
 	 */
-	irq = xen_allocate_pirq(acpi_gbl_FADT.sci_interrupt);
+	if (acpi_gbl_FADT.sci_interrupt > 0) {
+		irq = xen_allocate_pirq(acpi_gbl_FADT.sci_interrupt);
 
-	printk(KERN_INFO "xen: allocated irq %d for acpi %d\n",
-	       irq, acpi_gbl_FADT.sci_interrupt);
-
-	/* Blerk. */
-	acpi_gbl_FADT.sci_interrupt = irq;
+		printk(KERN_INFO "xen: allocated irq %d for acpi %d\n",
+		       irq, acpi_gbl_FADT.sci_interrupt);
+	}
 #endif
 
 	/* Pre-allocate legacy irqs */
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 12/17] xen: use acpi_get_override_irq() to get triggering for legacy irqs
  2009-05-12 23:25 [GIT PULL] Xen APIC hooks (with io_apic_ops) Jeremy Fitzhardinge
                   ` (10 preceding siblings ...)
  2009-05-12 23:25 ` [PATCH 11/17] xen: don't setup acpi interrupt unless there is one Jeremy Fitzhardinge
@ 2009-05-12 23:25 ` Jeremy Fitzhardinge
  2009-05-12 23:25 ` [PATCH 13/17] xen: initialize irq 0 too Jeremy Fitzhardinge
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 183+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

We need to set up proper IO apic entries for legacy irqs, which are
not normally configured by either normal acpi interrupt routing or
PNP.

This also generalizes the acpi interrupt setup, so we can remove it
as a special case.

[ Impact: compatibility with legacy/ISA hardware ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
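A rough user-space sketch of the loop this patch introduces: walk the legacy irqs, skip any for which no routing information is available, and register the rest. The table-driven `ex_get_override()` is a hypothetical stand-in for acpi_get_override_irq(), and the returned bitmask exists only so the behaviour can be checked; the patch itself calls xen_register_gsi() at that point.

```c
#include <assert.h>

#define EX_NR_IRQS_LEGACY 16

/* Hypothetical per-irq routing info; valid == 0 means "no entry". */
struct ex_override {
	int valid;
	int trigger;	/* 0 = edge, 1 = level */
	int polarity;	/* 0 = active high, 1 = active low */
};

/* Stand-in for acpi_get_override_irq(): returns -1 when nothing is known. */
static int ex_get_override(const struct ex_override *tbl, int irq,
			   int *trigger, int *polarity)
{
	assert(irq >= 0 && irq < EX_NR_IRQS_LEGACY);

	if (!tbl[irq].valid)
		return -1;

	*trigger = tbl[irq].trigger;
	*polarity = tbl[irq].polarity;
	return 0;
}

/*
 * Walk the legacy irqs and skip those without routing info.
 * Returns a bitmask of the irqs that would have been registered.
 */
static unsigned ex_setup_legacy(const struct ex_override *tbl)
{
	unsigned registered = 0;
	int irq;

	for (irq = 0; irq < EX_NR_IRQS_LEGACY; irq++) {
		int trigger, polarity;

		if (ex_get_override(tbl, irq, &trigger, &polarity) == -1)
			continue;

		/* The patch calls xen_register_gsi() here. */
		registered |= 1u << irq;
	}

	return registered;
}
```
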
 arch/x86/xen/pci.c |   24 ++++++++++--------------
 1 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/arch/x86/xen/pci.c b/arch/x86/xen/pci.c
index 00ad6df..db0c74c 100644
--- a/arch/x86/xen/pci.c
+++ b/arch/x86/xen/pci.c
@@ -65,19 +65,15 @@ void __init xen_setup_pirqs(void)
 {
 	int irq;
 
-#ifdef CONFIG_ACPI
-	/*
-	 * Set up acpi interrupt in acpi_gbl_FADT.sci_interrupt.
-	 */
-	if (acpi_gbl_FADT.sci_interrupt > 0) {
-		irq = xen_allocate_pirq(acpi_gbl_FADT.sci_interrupt);
-
-		printk(KERN_INFO "xen: allocated irq %d for acpi %d\n",
-		       irq, acpi_gbl_FADT.sci_interrupt);
-	}
-#endif
-
 	/* Pre-allocate legacy irqs */
-	for (irq = 0; irq < NR_IRQS_LEGACY; irq++)
-		xen_allocate_pirq(irq);
+	for (irq = 0; irq < NR_IRQS_LEGACY; irq++) {
+		int trigger, polarity;
+
+		if (acpi_get_override_irq(irq, &trigger, &polarity) == -1)
+			continue;
+
+		xen_register_gsi(irq,
+			trigger ? ACPI_LEVEL_SENSITIVE : ACPI_EDGE_SENSITIVE,
+			polarity ? ACPI_ACTIVE_LOW : ACPI_ACTIVE_HIGH);
+	}
 }
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 13/17] xen: initialize irq 0 too
  2009-05-12 23:25 [GIT PULL] Xen APIC hooks (with io_apic_ops) Jeremy Fitzhardinge
                   ` (11 preceding siblings ...)
  2009-05-12 23:25 ` [PATCH 12/17] xen: use acpi_get_override_irq() to get triggering for legacy irqs Jeremy Fitzhardinge
@ 2009-05-12 23:25 ` Jeremy Fitzhardinge
  2009-05-12 23:25 ` [PATCH 14/17] xen: dynamically allocate irq & event structures Jeremy Fitzhardinge
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 183+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

IRQ 0 is valid, so make sure it gets initialized properly too.
(Though in practice it doesn't matter, because it's the timer
interrupt, which we don't use under Xen.)

[ Impact: theoretical bugfix, cleanup ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/pci.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/xen/pci.c b/arch/x86/xen/pci.c
index db0c74c..381b7ab 100644
--- a/arch/x86/xen/pci.c
+++ b/arch/x86/xen/pci.c
@@ -53,7 +53,7 @@ int xen_register_gsi(u32 gsi, int triggering, int polarity)
 
 	printk(KERN_DEBUG "xen: --> irq=%d\n", irq);
 
-	if (irq > 0)
+	if (irq >= 0)
 		xen_set_io_apic_routing(irq,
 					triggering == ACPI_EDGE_SENSITIVE ? 0 : 1,
 					polarity == ACPI_ACTIVE_HIGH ? 0 : 1);
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 14/17] xen: dynamically allocate irq & event structures
  2009-05-12 23:25 [GIT PULL] Xen APIC hooks (with io_apic_ops) Jeremy Fitzhardinge
                   ` (12 preceding siblings ...)
  2009-05-12 23:25 ` [PATCH 13/17] xen: initialize irq 0 too Jeremy Fitzhardinge
@ 2009-05-12 23:25 ` Jeremy Fitzhardinge
  2009-05-12 23:25 ` [PATCH 15/17] xen: set pirq name to something useful Jeremy Fitzhardinge
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 183+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Dynamically allocate the irq_info and evtchn_to_irq arrays, so that
1) the irq_info array scales to the actual number of possible irqs,
and 2) we don't needlessly increase the static size of the kernel
when we aren't running under Xen.

Derived from a patch by Mike Travis <travis@sgi.com>.

[ Impact: reduce memory usage ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
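A user-space analogue of the change, as a sketch only: once the arrays move from static storage to run-time allocation, the "every event channel maps to -1" initializer has to become an explicit loop. The `ex_` names are hypothetical, and calloc()/malloc() stand in for the boot-time allocator used in the patch.

```c
#include <assert.h>
#include <stdlib.h>

struct ex_irq_info {
	int type;
	int evtchn;
};

struct ex_tables {
	struct ex_irq_info *irq_info;	/* nr_irqs entries, zeroed */
	int *evtchn_to_irq;		/* nr_evtchns entries, all -1 */
};

/* Returns 0 on success, -1 on allocation failure (nothing is leaked). */
static int ex_tables_init(struct ex_tables *t, size_t nr_irqs,
			  size_t nr_evtchns)
{
	size_t i;

	assert(nr_irqs > 0 && nr_evtchns > 0);

	t->irq_info = calloc(nr_irqs, sizeof(*t->irq_info));
	t->evtchn_to_irq = malloc(nr_evtchns * sizeof(*t->evtchn_to_irq));
	if (t->irq_info == NULL || t->evtchn_to_irq == NULL) {
		free(t->irq_info);
		free(t->evtchn_to_irq);
		t->irq_info = NULL;
		t->evtchn_to_irq = NULL;
		return -1;
	}

	/* A heap array has no static initializer, so fill it by hand. */
	for (i = 0; i < nr_evtchns; i++)
		t->evtchn_to_irq[i] = -1;

	return 0;
}

/* Release both tables and reset the pointers. */
static void ex_tables_free(struct ex_tables *t)
{
	free(t->irq_info);
	free(t->evtchn_to_irq);
	t->irq_info = NULL;
	t->evtchn_to_irq = NULL;
}
```
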
 drivers/xen/events.c |   15 +++++++++------
 1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 968e927..e6ddf78 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -27,6 +27,7 @@
 #include <linux/module.h>
 #include <linux/string.h>
 #include <linux/bootmem.h>
+#include <linux/irqnr.h>
 
 #include <asm/ptrace.h>
 #include <asm/irq.h>
@@ -91,11 +92,9 @@ struct irq_info
 };
 #define PIRQ_NEEDS_EOI	(1 << 0)
 
-static struct irq_info irq_info[NR_IRQS];
+static struct irq_info *irq_info;
 
-static int evtchn_to_irq[NR_EVENT_CHANNELS] = {
-	[0 ... NR_EVENT_CHANNELS-1] = -1
-};
+static int *evtchn_to_irq;
 struct cpu_evtchn_s {
 	unsigned long bits[NR_EVENT_CHANNELS/BITS_PER_LONG];
 };
@@ -515,7 +514,7 @@ static int find_irq_by_gsi(unsigned gsi)
 {
 	int irq;
 
-	for (irq = 0; irq < NR_IRQS; irq++) {
+	for (irq = 0; irq < nr_irqs; irq++) {
 		struct irq_info *info = info_for_irq(irq);
 
 		if (info == NULL || info->type != IRQT_PIRQ)
@@ -1180,7 +1179,11 @@ void __init xen_init_IRQ(void)
 	size_t size = nr_cpu_ids * sizeof(struct cpu_evtchn_s);
 
 	cpu_evtchn_mask_p = alloc_bootmem(size);
-	BUG_ON(cpu_evtchn_mask_p == NULL);
+	irq_info = alloc_bootmem(nr_irqs * sizeof(*irq_info));
+
+	evtchn_to_irq = alloc_bootmem(NR_EVENT_CHANNELS * sizeof(*evtchn_to_irq));
+	for (i = 0; i < NR_EVENT_CHANNELS; i++)
+		evtchn_to_irq[i] = -1;
 
 	init_evtchn_cpu_bindings();
 
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 15/17] xen: set pirq name to something useful.
  2009-05-12 23:25 [GIT PULL] Xen APIC hooks (with io_apic_ops) Jeremy Fitzhardinge
                   ` (13 preceding siblings ...)
  2009-05-12 23:25 ` [PATCH 14/17] xen: dynamically allocate irq & event structures Jeremy Fitzhardinge
@ 2009-05-12 23:25 ` Jeremy Fitzhardinge
  2009-05-12 23:25 ` [PATCH 16/17] xen: fix legacy irq setup, make ioapic-less machines work Jeremy Fitzhardinge
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 183+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Gerd Hoffmann, Jeremy Fitzhardinge

From: Gerd Hoffmann <kraxel@xeni.home.kraxel.org>

Make pirq show useful information in /proc/interrupts

[ Impact: better output in /proc/interrupts ]

Signed-off-by: Gerd Hoffmann <kraxel@xeni.home.kraxel.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/pci.c   |    3 ++-
 drivers/xen/events.c |    4 ++--
 include/xen/events.h |    2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/xen/pci.c b/arch/x86/xen/pci.c
index 381b7ab..4b286f1 100644
--- a/arch/x86/xen/pci.c
+++ b/arch/x86/xen/pci.c
@@ -49,7 +49,8 @@ int xen_register_gsi(u32 gsi, int triggering, int polarity)
 	printk(KERN_DEBUG "xen: registering gsi %u triggering %d polarity %d\n",
 	       gsi, triggering, polarity);
 
-	irq = xen_allocate_pirq(gsi);
+	irq = xen_allocate_pirq(gsi, (triggering == ACPI_EDGE_SENSITIVE)
+				     ? "ioapic-edge" : "ioapic-level");
 
 	printk(KERN_DEBUG "xen: --> irq=%d\n", irq);
 
diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index e6ddf78..f84d13b 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -532,7 +532,7 @@ static int find_irq_by_gsi(unsigned gsi)
  * event channel until the irq actually started up.  Return an
  * existing irq if we've already got one for the gsi.
  */
-int xen_allocate_pirq(unsigned gsi)
+int xen_allocate_pirq(unsigned gsi, char *name)
 {
 	int irq;
 	struct physdev_irq irq_op;
@@ -554,7 +554,7 @@ int xen_allocate_pirq(unsigned gsi)
 		irq = find_unbound_irq();
 
 	set_irq_chip_and_handler_name(irq, &xen_pirq_chip,
-				      handle_level_irq, "pirq");
+				      handle_level_irq, name);
 
 	irq_op.irq = irq;
 	if (HYPERVISOR_physdev_op(PHYSDEVOP_alloc_irq_vector, &irq_op)) {
diff --git a/include/xen/events.h b/include/xen/events.h
index 6fe4863..4b19b9c 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -61,7 +61,7 @@ unsigned irq_from_evtchn(unsigned int evtchn);
 /* Allocate an irq for a physical interrupt, given a gsi.  "Legacy"
    GSIs are identity mapped; others are dynamically allocated as
    usual. */
-int xen_allocate_pirq(unsigned gsi);
+int xen_allocate_pirq(unsigned gsi, char *name);
 
 /* Return vector allocated to pirq */
 int xen_vector_from_irq(unsigned pirq);
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 16/17] xen: fix legacy irq setup, make ioapic-less machines work.
  2009-05-12 23:25 [GIT PULL] Xen APIC hooks (with io_apic_ops) Jeremy Fitzhardinge
                   ` (14 preceding siblings ...)
  2009-05-12 23:25 ` [PATCH 15/17] xen: set pirq name to something useful Jeremy Fitzhardinge
@ 2009-05-12 23:25 ` Jeremy Fitzhardinge
  2009-05-12 23:25 ` [PATCH 17/17] xen: disable MSI Jeremy Fitzhardinge
  2009-05-19 12:35   ` Ingo Molnar
  17 siblings, 0 replies; 183+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Gerd Hoffmann, Jeremy Fitzhardinge

From: Gerd Hoffmann <kraxel@xeni.home.kraxel.org>

If the machine has no IO APICs, then just allocate a set of legacy
interrupts.

[ Impact: fix Xen compatibility with old machines ]

Signed-off-by: Gerd Hoffmann <kraxel@xeni.home.kraxel.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/pci.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/pci.c b/arch/x86/xen/pci.c
index 4b286f1..07b59fe 100644
--- a/arch/x86/xen/pci.c
+++ b/arch/x86/xen/pci.c
@@ -66,6 +66,12 @@ void __init xen_setup_pirqs(void)
 {
 	int irq;
 
+	if (0 == nr_ioapics) {
+		for (irq = 0; irq < NR_IRQS_LEGACY; irq++)
+			xen_allocate_pirq(irq, "xt-pic");
+		return;
+	}
+
 	/* Pre-allocate legacy irqs */
 	for (irq = 0; irq < NR_IRQS_LEGACY; irq++) {
 		int trigger, polarity;
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 17/17] xen: disable MSI
  2009-05-12 23:25 [GIT PULL] Xen APIC hooks (with io_apic_ops) Jeremy Fitzhardinge
                   ` (15 preceding siblings ...)
  2009-05-12 23:25 ` [PATCH 16/17] xen: fix legacy irq setup, make ioapic-less machines work Jeremy Fitzhardinge
@ 2009-05-12 23:25 ` Jeremy Fitzhardinge
  2009-05-19 12:35   ` Ingo Molnar
  17 siblings, 0 replies; 183+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-12 23:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Disable MSI until we support it properly.

[ Impact: prevent MSI subsystem from crashing ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/apic.c |    3 +++
 drivers/pci/pci.h   |    2 --
 include/linux/pci.h |    6 ++++++
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c
index 35a8af7..fece57a 100644
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -1,6 +1,7 @@
 #include <linux/kernel.h>
 #include <linux/threads.h>
 #include <linux/bitmap.h>
+#include <linux/pci.h>
 
 #include <asm/io_apic.h>
 #include <asm/acpi.h>
@@ -54,6 +55,8 @@ void xen_init_apic(void)
 	if (!xen_initial_domain())
 		return;
 
+	pci_no_msi();
+
 	set_io_apic_ops(&xen_ioapic_ops);
 
 #ifdef CONFIG_ACPI
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index d03f6b9..79ada7b 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -111,10 +111,8 @@ extern struct rw_semaphore pci_bus_sem;
 extern unsigned int pci_pm_d3_delay;
 
 #ifdef CONFIG_PCI_MSI
-void pci_no_msi(void);
 extern void pci_msi_init_pci_dev(struct pci_dev *dev);
 #else
-static inline void pci_no_msi(void) { }
 static inline void pci_msi_init_pci_dev(struct pci_dev *dev) { }
 #endif
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 72698d8..724d030 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1253,5 +1253,11 @@ static inline irqreturn_t pci_sriov_migration(struct pci_dev *dev)
 }
 #endif
 
+#ifdef CONFIG_PCI_MSI
+void pci_no_msi(void);
+#else
+static inline void pci_no_msi(void) { }
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* LINUX_PCI_H */
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 183+ messages in thread

* Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
  2009-05-12 23:25 [GIT PULL] Xen APIC hooks (with io_apic_ops) Jeremy Fitzhardinge
@ 2009-05-19 12:35   ` Ingo Molnar
  2009-05-12 23:25 ` [PATCH 02/17] x86: add io_apic_ops to allow interception Jeremy Fitzhardinge
                     ` (16 subsequent siblings)
  17 siblings, 0 replies; 183+ messages in thread
From: Ingo Molnar @ 2009-05-19 12:35 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Hi Ingo,
> 
> Here's a revised set of the Xen APIC changes which adds 
> io_apic_ops to allow Xen to intercept IO APIC access operations.

In a previous discussion you said:

> IO APIC operations are not even slightly performance critical? Are 
> they ever used on the interrupt delivery path?

Since they are not performance critical, then why doesn't Xen catch 
the IO-APIC accesses and virtualize the device?

If you want to hook into the IO-APIC code at such a low level, why 
don't you hook into the _hardware_ API - i.e. catch those 
setup/routing modifications to the IO-APIC space. No Linux changes 
are needed in that case.

	Ingo

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
  2009-05-19 12:35   ` Ingo Molnar
@ 2009-05-20 17:57     ` Jeremy Fitzhardinge
  -1 siblings, 0 replies; 183+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-20 17:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Keir Fraser

Ingo Molnar wrote:
> Since they are not performance critical, why doesn't Xen catch
> the IO-APIC accesses and virtualize the device?
>
> If you want to hook into the IO-APIC code at such a low level, why
> don't you hook into the _hardware_ API - i.e. catch those
> setup/routing modifications to the IO-APIC space? No Linux changes
> are needed in that case.
>   

Yes, these changes aren't for performance reasons.  It's a case where a
few lines of change in Linux save many hundreds or thousands of lines
of change in Xen.

Xen doesn't have an internal mechanism for emulating devices via 
pagefaults (that's generally handled by a qemu instance running as part 
of a guest domain), so there's no mechanism to map and emulate the 
io-apic.  Putting such support into Xen would mean adding a pile of new 
infrastructure to support this case.

Unlike the mtrr discussion, where the msr read/write ops would allow us 
to emulate the mtrr within the Xen-specific parts of the kernel, the 
io-apic ops are just accessed via normal memory writes which we can't 
hook, so it would have to be done within Xen.
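
To illustrate the kind of indirection being argued about here, the
following is a minimal user-space sketch, not code from the series.  It
assumes the read/write members of struct io_apic_ops as posted in patch
02/17; the fake register array, the hooked_* interceptor, the access
counter and the cur_ops pointer are hypothetical stand-ins.

```c
#include <assert.h>

#define NR_APICS 2
#define NR_REGS  64

/* Two of the four hooks from patch 02/17; init and modify are left out. */
struct io_apic_ops {
	unsigned int (*read)(unsigned int apic, unsigned int reg);
	void (*write)(unsigned int apic, unsigned int reg, unsigned int value);
};

/* Hypothetical stand-in for the memory-mapped IO-APIC register windows. */
static unsigned int fake_regs[NR_APICS][NR_REGS];

static unsigned int native_read(unsigned int apic, unsigned int reg)
{
	assert(apic < NR_APICS && reg < NR_REGS);
	return fake_regs[apic][reg];
}

static void native_write(unsigned int apic, unsigned int reg, unsigned int value)
{
	assert(apic < NR_APICS && reg < NR_REGS);
	fake_regs[apic][reg] = value;
}

/* Hypothetical interceptor: counts accesses where a real one would hypercall. */
static unsigned int intercepted;

static unsigned int hooked_read(unsigned int apic, unsigned int reg)
{
	intercepted++;
	return native_read(apic, reg);
}

static void hooked_write(unsigned int apic, unsigned int reg, unsigned int value)
{
	intercepted++;
	native_write(apic, reg, value);
}

static const struct io_apic_ops native_ops = { native_read, native_write };
static const struct io_apic_ops hooked_ops = { hooked_read, hooked_write };

/* Native access is the default until something replaces the table. */
static const struct io_apic_ops *cur_ops = &native_ops;

static void set_io_apic_ops(const struct io_apic_ops *ops)
{
	cur_ops = ops;
}

/* Callers go through these wrappers and never touch the registers directly. */
static unsigned int io_apic_read(unsigned int apic, unsigned int reg)
{
	return cur_ops->read(apic, reg);
}

static void io_apic_write(unsigned int apic, unsigned int reg, unsigned int value)
{
	cur_ops->write(apic, reg, value);
}
```

With native_ops installed the wrappers cost one indirect call on a cold
path; after set_io_apic_ops(&hooked_ops) every access lands in the
interceptor without the callers changing.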

The other thing I thought about was putting a hook in the Linux 
pagefault handler, so we could emulate the ioapic at that level.  But 
putting a hook in a very hot path to avoid code changes in a cold path 
doesn't make any sense.  (Same applies to doing PF emulation within Xen; 
that's an even hotter path than Linux's.)

    J

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
  2009-05-19 12:35   ` Ingo Molnar
@ 2009-05-24 20:10     ` Avi Kivity
  -1 siblings, 0 replies; 183+ messages in thread
From: Avi Kivity @ 2009-05-24 20:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, the arch/x86 maintainers,
	Linux Kernel Mailing List, Xen-devel

Ingo Molnar wrote:
>> IO APIC operations are not even slightly performance critical? Are 
>> they ever used on the interrupt delivery path?
>>     
>
> Since they are not performance critical, why doesn't Xen catch
> the IO-APIC accesses and virtualize the device?
>
> If you want to hook into the IO-APIC code at such a low level, why
> don't you hook into the _hardware_ API - i.e. catch those
> setup/routing modifications to the IO-APIC space? No Linux changes
> are needed in that case.
>   

When x2apic is enabled, and EOI broadcast is disabled, then the io apic 
does become a hot path - it needs to be written for each level-triggered 
interrupt EOI.  In this case I might want to paravirtualize the EOI
write to exit only if an interrupt is pending; otherwise communicate via 
shared memory.

We do something similar for Windows (by patching it) very successfully; 
Windows likes to touch the APIC TPR ~ 100,000 times per second, usually 
without triggering an interrupt.  We hijack these writes, do the checks 
in guest context, and only exit if the TPR write would trigger an interrupt.
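
The check described above can be sketched roughly as follows.  This is
an illustration under stated assumptions, not kvm's actual patching
code: struct shared_apic, its fields and both function names are
hypothetical, and in-service interrupts are ignored so that only the
TPR and the highest pending vector are compared by priority class
(bits 7:4).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state shared between the guest and the hypervisor. */
struct shared_apic {
	uint8_t tpr;             /* shadow copy of the task priority register */
	uint8_t highest_pending; /* highest pending vector, 0 if none */
	unsigned long exits;     /* number of writes that had to trap */
};

/*
 * A pending vector becomes deliverable when its priority class
 * (bits 7:4) is strictly above the class held in the TPR.
 */
static bool tpr_write_unblocks(const struct shared_apic *s, uint8_t new_tpr)
{
	return s->highest_pending != 0 &&
	       (s->highest_pending >> 4) > (new_tpr >> 4);
}

/* Returns true when the write needed an exit, false on the fast path. */
static bool guest_write_tpr(struct shared_apic *s, uint8_t new_tpr)
{
	s->tpr = new_tpr;
	if (!tpr_write_unblocks(s, new_tpr))
		return false;    /* handled entirely in guest context */
	s->exits++;              /* a real guest would trap to the host here */
	return true;
}
```

In the common case no vector is pending above the new priority, so the
write only updates the shared copy and the exit counter stays put.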

(kvm will likely gain x2apic support in 2.6.32; patches have already 
been posted)

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
  2009-05-24 20:10     ` Avi Kivity
@ 2009-05-25  3:51       ` Ingo Molnar
  -1 siblings, 0 replies; 183+ messages in thread
From: Ingo Molnar @ 2009-05-25  3:51 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Jeremy Fitzhardinge, the arch/x86 maintainers,
	Linux Kernel Mailing List, Xen-devel


* Avi Kivity <avi@redhat.com> wrote:

> Ingo Molnar wrote:
>>> IO APIC operations are not even slightly performance critical? Are  
>>> they ever used on the interrupt delivery path?
>>>     
>>
>> Since they are not performance critical, why doesn't Xen catch the
>> IO-APIC accesses and virtualize the device?
>>
>> If you want to hook into the IO-APIC code at such a low level, why
>> don't you hook into the _hardware_ API - i.e. catch those setup/routing
>> modifications to the IO-APIC space? No Linux changes are needed in that
>> case.
>>   
>
> When x2apic is enabled, and EOI broadcast is disabled, then the io 
> apic does become a hot path - it needs to be written for each 
> level-triggered interrupt EOI.  In this case I might want to 
> paravirtualize the EOI write to exit only if an interrupt is 
> pending; otherwise communicate via shared memory.
>
> We do something similar for Windows (by patching it) very 
> successfully; Windows likes to touch the APIC TPR ~ 100,000 times 
> per second, usually without triggering an interrupt.  We hijack 
> these writes, do the checks in guest context, and only exit if the 
> TPR write would trigger an interrupt.

I suspect you are aware that this is about the io-apic, not the local
APIC. The local apic methods are already driver-ized - and they sit
closer to the CPU so they matter more to performance.

> (kvm will likely gain x2apic support in 2.6.32; patches have 
> already been posted)

ok. This points in the direction of the io-apic driver abstraction 
from Jeremy being the right long-term approach. We already have a 
few quirks that could be cleaned up by using a proper driver 
interface.

	Ingo

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 02/17] x86: add io_apic_ops to allow interception
  2009-05-12 23:25 ` [PATCH 02/17] x86: add io_apic_ops to allow interception Jeremy Fitzhardinge
@ 2009-05-25  3:54     ` Ingo Molnar
  0 siblings, 0 replies; 183+ messages in thread
From: Ingo Molnar @ 2009-05-25  3:54 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> 
> Xen dom0 needs to paravirtualize IO operations to the IO APIC, so add
> a io_apic_ops for it to intercept.  Do this as ops structure because
> there's at least some chance that another paravirtualized environment
> may want to intercept these.
> 
> [Impact: indirect IO APIC access via io_apic_ops]
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> ---
>  arch/x86/include/asm/io_apic.h |    9 +++++++
>  arch/x86/kernel/apic/io_apic.c |   50 +++++++++++++++++++++++++++++++++++++--
>  2 files changed, 56 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
> index 9d826e4..8cbfe73 100644
> --- a/arch/x86/include/asm/io_apic.h
> +++ b/arch/x86/include/asm/io_apic.h
> @@ -21,6 +21,15 @@
>  #define IO_APIC_REDIR_LEVEL_TRIGGER	(1 << 15)
>  #define IO_APIC_REDIR_MASKED		(1 << 16)
>  
> +struct io_apic_ops {
> +	void (*init)(void);
> +	unsigned int (*read)(unsigned int apic, unsigned int reg);
> +	void (*write)(unsigned int apic, unsigned int reg, unsigned int value);
> +	void (*modify)(unsigned int apic, unsigned int reg, unsigned int value);
> +};
> +
> +void __init set_io_apic_ops(const struct io_apic_ops *);

ok, could you please turn the whole IO-APIC code into a driver 
framework? I.e. all IO-APIC calls outside of 
arch/x86/kernel/apic/io_apic.c should be to some io_apic-> method.

The advantage will be a proper abstraction for all IO-APIC details -
not just a minimalistic one for Xen's needs.

Also, please name it 'struct io_apic' - similar to the 'struct apic' 
naming we have for the local APIC driver structure.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
  2009-05-20 17:57     ` Jeremy Fitzhardinge
@ 2009-05-25  4:10       ` Ingo Molnar
  -1 siblings, 0 replies; 183+ messages in thread
From: Ingo Molnar @ 2009-05-25  4:10 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Keir Fraser, Linus Torvalds, Avi Kivity


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Ingo Molnar wrote:
>> Since they are not performance critical, why doesn't Xen catch the
>> IO-APIC accesses and virtualize the device?
>>
>> If you want to hook into the IO-APIC code at such a low level, why
>> don't you hook into the _hardware_ API - i.e. catch those setup/routing
>> modifications to the IO-APIC space? No Linux changes are needed in that
>> case.
>>   
>
> Yes, these changes aren't for a performance reason.  It's a case 
> where a few lines change in Linux saves many hundreds or thousands 
> of lines change in Xen.
>
> Xen doesn't have an internal mechanism for emulating devices via 
> pagefaults (that's generally handled by a qemu instance running as 
> part of a guest domain), so there's no mechanism to map and 
> emulate the io-apic.  Putting such support into Xen would mean 
> adding a pile of new infrastructure to support this case.

Note that this design problem has been created by Xen, 
intentionally, and Xen is now suffering under those bad technical 
choices made years ago. It's not Linux's problem.

The whole Xen design is messed up really: you have taken off bits of 
the Linux kernel you found interesting, turned them into a 
micro-kernel in essence and renamed it to 'Xen'.

But drivers and proper architecture are apparently boring (and
fragile and hard and expensive to write and support in a 
micro-kernel setup) so you came up with this DOM0 piece of cr*p that 
ties Linux to Xen even closer (along an _ABI_), where Linux does 
most of the real work while Xen still stays 'separate' on paper.

Xen isn't actually useful _at all_ without Linux/DOM0. Without Dom0
Xen is slow and native hardware support within Xen is virtually 
non-existent, as you point out above.

This is proof that you should have done all that work within Linux - 
instead of duplicating a lot of code.

> Unlike the mtrr discussion, where the msr read/write ops would 
> allow us to emulate the mtrr within the Xen-specific parts of the 
> kernel, the io-apic ops are just accessed via normal memory writes 
> which we can't hook, so it would have to be done within Xen.
>
> The other thing I thought about was putting a hook in the Linux 
> pagefault handler, so we could emulate the ioapic at that level.  
> But putting a hook in a very hot path to avoid code changes in a 
> cold path doesn't make any sense.  (Same applies to doing PF 
> emulation within Xen; that's an even hotter path than Linux's.)

We already have various page fault notifiers, you could reuse them 
if you wanted to.

Anyway, I'll pull the IO-APIC driver-ization changes if they are
complete, thorough and clean, because that will obviously help Linux
too. But the influx of paravirt overhead slowing down the native
kernel really has to stop.

	Ingo

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
  2009-05-25  3:51       ` Ingo Molnar
@ 2009-05-25  4:55         ` Avi Kivity
  -1 siblings, 0 replies; 183+ messages in thread
From: Avi Kivity @ 2009-05-25  4:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, the arch/x86 maintainers,
	Linux Kernel Mailing List, Xen-devel

Ingo Molnar wrote:
>> We do something similar for Windows (by patching it) very 
>> successfully; Windows likes to touch the APIC TPR ~ 100,000 times 
>> per second, usually without triggering an interrupt.  We hijack 
>> these writes, do the checks in guest context, and only exit if the 
>> TPR write would trigger an interrupt.
>>     
>
> I suspect you are aware that this is about the io-apic, not the local
> APIC. The local apic methods are already driver-ized - and they sit
> closer to the CPU so they matter more to performance.
>   

Yeah, I gave this as an example.  It's very different -- io-apic vs. 
local apic, paravirtualization vs. patching the guest behind its back, 
Linux vs. Windows.

Of course if we hook the io-apic EOI we'll want to hook the local apic 
EOI as well.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
  2009-05-25  4:55         ` Avi Kivity
@ 2009-05-25  5:06           ` Ingo Molnar
  -1 siblings, 0 replies; 183+ messages in thread
From: Ingo Molnar @ 2009-05-25  5:06 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Jeremy Fitzhardinge, the arch/x86 maintainers,
	Linux Kernel Mailing List, Xen-devel


* Avi Kivity <avi@redhat.com> wrote:

> Ingo Molnar wrote:
>>> We do something similar for Windows (by patching it) very  
>>> successfully; Windows likes to touch the APIC TPR ~ 100,000 times  
>>> per second, usually without triggering an interrupt.  We hijack  
>>> these writes, do the checks in guest context, and only exit if the  
>>> TPR write would trigger an interrupt.
>>>     
>>
>> I suspect you are aware that this is about the io-apic, not the local
>> APIC. The local apic methods are already driver-ized - and they sit
>> closer to the CPU so they matter more to performance.
>>   
>
> Yeah, I gave this as an example.  It's very different -- io-apic 
> vs.  local apic, paravirtualization vs. patching the guest behind 
> its back, Linux vs. Windows.
>
> Of course if we hook the io-apic EOI we'll want to hook the local 
> apic EOI as well.

Yeah. Eventually anything that matters to performance will be 
accelerated by hardware (and properly virtualized), which in turn 
will be faster than any hypercall based approach, right?

	Ingo

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
  2009-05-25  5:06           ` Ingo Molnar
@ 2009-05-25  5:12             ` Avi Kivity
  -1 siblings, 0 replies; 183+ messages in thread
From: Avi Kivity @ 2009-05-25  5:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, the arch/x86 maintainers,
	Linux Kernel Mailing List, Xen-devel

Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
>
>   
>> Ingo Molnar wrote:
>>     
>>>> We do something similar for Windows (by patching it) very  
>>>> successfully; Windows likes to touch the APIC TPR ~ 100,000 times  
>>>> per second, usually without triggering an interrupt.  We hijack  
>>>> these writes, do the checks in guest context, and only exit if the  
>>>> TPR write would trigger an interrupt.
>>>>     
>>>>         
>>> I suspect you are aware that this is about the io-apic, not the local
>>> APIC. The local apic methods are already driver-ized - and they sit
>>> closer to the CPU so they matter more to performance.
>>>   
>>>       
>> Yeah, I gave this as an example.  It's very different -- io-apic 
>> vs.  local apic, paravirtualization vs. patching the guest behind 
>> its back, Linux vs. Windows.
>>
>> Of course if we hook the io-apic EOI we'll want to hook the local 
>> apic EOI as well.
>>     
>
> Yeah. Eventually anything that matters to performance will be 
> accelerated by hardware (and properly virtualized), which in turn 
> will be faster than any hypercall based approach, right?
>   

Right.  That's already happened to the TPR (Intel processors accelerate
that 4-bit register but ignore everything else in the local apic).  As
another example, we have mmu paravirtualization in kvm, but 
automatically disable it when the hardware does nested paging.  The 
problem is that hardware support has a long pipeline, and even when 
support does appear, there's a massive installed base to care about.
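
The fallback order described here can be sketched as a small selection
function.  This is only an illustration: the enum, its values and
pick_mmu_mode() are hypothetical names, and treating shadow paging as
the last resort is an assumption of the sketch rather than something
stated in this thread.

```c
#include <stdbool.h>

/* Hypothetical labels for the three ways guest page tables can be handled. */
enum mmu_mode {
	MMU_SHADOW,   /* assumed fallback when nothing better is available */
	MMU_PARAVIRT, /* guest cooperates through paravirtual MMU hooks */
	MMU_NESTED    /* hardware nested paging */
};

/*
 * Prefer hardware support whenever it exists; paravirtualization is
 * only kept for the installed base of hardware that lacks it.
 */
static enum mmu_mode pick_mmu_mode(bool hw_nested_paging, bool guest_has_pv_mmu)
{
	if (hw_nested_paging)
		return MMU_NESTED;   /* paravirt is switched off automatically */
	if (guest_has_pv_mmu)
		return MMU_PARAVIRT;
	return MMU_SHADOW;
}
```

The point of the ordering is that a paravirtual interface does not have
to be removed once hardware catches up; it simply stops being selected.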

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
  2009-05-25  5:12             ` Avi Kivity
  (?)
@ 2009-05-25  5:19             ` Ingo Molnar
  -1 siblings, 0 replies; 183+ messages in thread
From: Ingo Molnar @ 2009-05-25  5:19 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Jeremy Fitzhardinge, the arch/x86 maintainers,
	Linux Kernel Mailing List, Xen-devel


* Avi Kivity <avi@redhat.com> wrote:

> Ingo Molnar wrote:
>> * Avi Kivity <avi@redhat.com> wrote:
>>
>>   
>>> Ingo Molnar wrote:
>>>     
>>>>> We do something similar for Windows (by patching it) very   
>>>>> successfully; Windows likes to touch the APIC TPR ~ 100,000 times 
>>>>>  per second, usually without triggering an interrupt.  We hijack  
>>>>> these writes, do the checks in guest context, and only exit if 
>>>>> the  TPR write would trigger an interrupt.
>>>>>             
>>>> I suspect you are aware that this is about the io-apic, not the local 
>>>>  APIC. The local apic methods are already driver-ized - and they 
>>>> sit  closer to the CPU so they matter more to performance.
>>>>         
>>> Yeah, I gave this as an example.  It's very different -- io-apic vs.  
>>> local apic, paravirtualization vs. patching the guest behind its 
>>> back, Linux vs. Windows.
>>>
>>> Of course if we hook the io-apic EOI we'll want to hook the local  
>>> apic EOI as well.
>>>     
>>
>> Yeah. Eventually anything that matters to performance will be 
>> accelerated by hardware (and properly virtualized), which in turn 
>> will be faster than any hypercall based approach, right?
>
> Right.  That's already happened to the TPR (Intel processors 
> accelerate that 4-bit register but ignore everything else in the 
> local apic).  As another example, we have mmu paravirtualization 
> in kvm, but automatically disable it when the hardware does nested 
> paging.  The problem is that hardware support has a long pipeline, 
> and even when support does appear, there's a massive installed 
> base to care about.

Yeah. Btw., i also think that in-kernel IO-APIC and APIC emulation 
could have uses elsewhere as well - such as in testing. Currently 
you actually have to own a big box to be able to test certain 
hardware limits. This has a negative effect on test coverage and a 
subsequent negative effect on kernel quality. If KVM provided clean 
code to emulate certain hw environments we could check out limits 
(and our bugs) far more effectively.

	Ingo

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
  2009-05-25  4:10       ` Ingo Molnar
@ 2009-05-26 12:46         ` George Dunlap
  -1 siblings, 0 replies; 183+ messages in thread
From: George Dunlap @ 2009-05-26 12:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeremy Fitzhardinge, Xen-devel, the arch/x86 maintainers,
	Linux Kernel Mailing List, Avi Kivity, Linus Torvalds,
	Keir Fraser

On Mon, May 25, 2009 at 5:10 AM, Ingo Molnar <mingo@elte.hu> wrote:
> Note that this design problem has been created by Xen,
> intentionally, and Xen is now suffering under those bad technical
> choices made years ago. It's not Linux's problem.

I'd like to respectfully disagree with this.  I think I can see your
point of view: you're being asked to make changes to accommodate a
project you're not involved in, and whose fundamental design you
disagree with.  And no one disagrees with the stance that changes to
accommodate Xen must not impact native performance.  But I think the
current design (with dom0 running linux-as-hypervisor-component) is
the best one, and it's one we would make over again if we had to start
from scratch.

Basically, there are three ways to approach the hypervisor problem wrt Linux:
1. Make Linux into a hypervisor (linux-as-hypervisor). This is the KVM approach.
2. Fork Linux, stealing all the device drivers, and making a
monolithic hypervisor.
3. Make a small, lean hypervisor, but leverage Linux to run the
devices and control stack (linux-as-hypervisor-component).

I've worked a bit at both kernel and hypervisor level (although
admittedly much more in-depth at the hypervisor level).  It seems to
me that being a hypervisor is a much different thing than being a
kernel.  I don't believe that one piece of software can do both well.
And I believe that, when it begins to mature more, KVM will run into
the very same issue.  KVM developers will really want to start to make
the kernel into a hypervisor, and there will be a disagreement between
those who want the kernel to be just a kernel, and those who want the
kernel also to be a hypervisor.  The result will be either a heavily
modified Linux (much more than linux-as-hypervisor-component) or a
really sucky hypervisor.

As a simple example, take scheduling.  I'm about to re-write the Xen
scheduler, and in the process I took a good look at the scheduler you
wrote.  I think it's got a lot of really good ideas, which I plan to
steal. :-)  However, I'm going to have to make some key changes in
order for it to function well as a hypervisor scheduler.  If KVM is
used on a production server with 20 or 30 multi-vcpu VMs, I predict
the current scheduler will do very poorly, because it wasn't designed
with VMs in mind, but with processes.  Making changes so that VMs run
better will fundamentally make processes run less well.

Forking Linux, drivers and all, is not a good idea; anyone would have
to be a fool to try it.  I think if you think seriously about it,
you'd never do something like that.  I don't believe any such
project would have a snowball's chance in hell of attracting anywhere
near the required number of hardware developers to make it an
enterprise-class system.  If, somehow, it did manage to attract a
critical mass to make it viable, then the result would be two much
weaker projects, wasting millions of man-hours of labor doing
unnecessary duplication.

No, I think the best option, and the option the Xen project would take
again if we were to start from scratch, would be what we have done:
To build a hypervisor to be a hypervisor, and let the kernel be a
kernel: but leverage the millions of man-hours still being done in
hardware support for Linux.

Either way, time will tell in the end.  If I'm wrong, and KVM can
become an enterprise-class hypervisor while playing well with
linux-as-kernel, then eventually it will dominate and Xen will die
out.  You can say "I told you so" and remove all the crap you've been
objecting to.  If I'm right, however, then having Xen around will be
critical, not just for open-source virtualization, but for the kernel
as well.  You'll be happy to be able to tell people, "Don't put this
hypervisor crap in here.  If you want a hypervisor, go to Xen." :-)

Until things are shown clearly one way or the other, the best thing to
do is hedge your bets, and allow both projects to develop.

[That's my main point; in-line responses below.]

> The whole Xen design is messed up really: you have taken off bits of
> the Linux kernel you found interesting, turned them into a
> micro-kernel in essence and renamed it to 'Xen'.

That's how Xen started, and that's really the beauty of open-source.
(After all, KVM has stolen some ideas from the Xen shadow code.)  But
since then, basically all of the code has been replaced with
Xen-written code.  I think if you did an SCO-style audit comparing
Linux and Xen 3.4, you'd find a lot less in common than you think.

> But drivers and proper architecture is apparently boring (and
> fragile and hard and expensive to write and support in a
> micro-kernel setup) so you came up with this DOM0 piece of cr*p that
> ties Linux to Xen even closer (along an _ABI_), where Linux does
> most of the real work while Xen still stays 'separate' on paper.

It's not boring, it's just a colossal waste of time and resources to
duplicate all that effort.  "Real work" is done by all of the
components: Xen does the "real work" of scheduling and resource
management; Linux does the "real work" of process-level stuff,
filesystems, and so on and (in the case of dom0) hardware support;
qemu does the "real work" of doing device emulation.  All of them are
unique, difficult, and interesting to somebody.  Reducing duplication
means everyone can work on what interests them the most, and minimizes
the total "busy work" for all involved.

How many KVM developers are working on device drivers?  And how would
Xen duplicating all the driver development help Linux?  Linux would
still have to do everything, there'd just be fewer developers to do it
(since some people would be working on Xen drivers instead).

> Xen isnt actually useful _at all_ without Linux/DOM0. Without Dom0
> Xen is slow and native hardware support within Xen is virtually
> non-existent, as you point out above.

And qemu-kvm isn't useful _at_all_ without Linux either; and Linux-KVM
isn't useful _at_all_ without qemu.  Your point?

Xen will run without dom0?  I wasn't aware of that... ;-)

> This is proof that you should have done all that work within Linux -
> instead of duplicating a lot of code.

See above.

 -George Dunlap

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
  2009-05-26 12:46         ` George Dunlap
@ 2009-05-26 18:26           ` Avi Kivity
  -1 siblings, 0 replies; 183+ messages in thread
From: Avi Kivity @ 2009-05-26 18:26 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ingo Molnar, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Linus Torvalds, Keir Fraser

George Dunlap wrote:
> As a simple example, take scheduling.  I'm about to re-write the Xen
> scheduler, and in the process I took a good look at the scheduler you
> wrote.  I think it's got a lot of really good ideas, which I plan to
> steal. :-)  However, I'm going to have to make some key changes in
> order for it to function well as a hypervisor scheduler.  If KVM is
> used on a production server with 20 or 30 multi-vcpu VMs, I predict
> the current scheduler will do very poorly, because it wasn't designed
> with VMs in mind, but with processes.  Making changes so that VMs run
> better will fundamentally make processes run less well.
>   

The Linux scheduler already supports multiple scheduling classes.  If we 
find that none of them will fit our needs, we'll propose a new one.  
There are also multiple I/O schedulers, multiple allocators (perhaps a 
bad example), and multiple filesystems.

When the need can be demonstrated to be real, and the implementation can 
be clean, Linux can usually be adapted.

I think the Xen design has merit if it can truly make dom0 a guest -- 
that is, if it can survive dom0 failure.  Until then, you're just taking 
a large interdependent codebase and splitting it at some random point, 
but you don't get any stability or security in return.  It will also be 
interesting to see how far Xen can get along without real memory 
management (overcommit).

>> The whole Xen design is messed up really: you have taken off bits of
>> the Linux kernel you found interesting, turned them into a
>> micro-kernel in essence and renamed it to 'Xen'.
>>     
>
> That's how Xen started, and that's really the beauty of open-source.
> (After all, KVM has stolen some ideas from the Xen shadow code.)  But
> since then, basically all of the code has been replaced with
> Xen-written code.  I think if you did an SCO-style audit comparing
> Linux and Xen 3.4, you'd find a lot less in common than you think.
>   

A lot of the arch code is derived from Linux.

>> Xen isnt actually useful _at all_ without Linux/DOM0. Without Dom0
>> Xen is slow and native hardware support within Xen is virtually
>> non-existent, as you point out above.
>>     
>
> And qemu-kvm isn't useful _at_all_ without Linux either; and Linux-KVM
> isn't useful _at_all_ without qemu.  Your point?
>   

kvm is actually being used by other userspaces.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
  2009-05-26 18:26           ` Avi Kivity
@ 2009-05-26 19:18             ` Dan Magenheimer
  -1 siblings, 0 replies; 183+ messages in thread
From: Dan Magenheimer @ 2009-05-26 19:18 UTC (permalink / raw)
  To: Avi Kivity, George Dunlap
  Cc: Jeremy Fitzhardinge, Xen-devel, the arch/x86 maintainers,
	Linux Kernel Mailing List, Keir Fraser, Ingo Molnar,
	Linus Torvalds

> It will also be 
> interesting to see how far Xen can get along without real memory 
> management (overcommit).

Several implementations of "classic" memory overcommit have been
done for Xen, most recently the Difference Engine work at UCSD.
It is true that none have been merged yet, in part because,
in many real world environments, "generalized" overcommit
often leads to hypervisor swapping, and performance becomes
unacceptable.  (In other words, except in certain limited customer
use models, memory overcommit is a "marketing feature".)

There's also a novel approach, Transcendent Memory (aka "tmem"
see http://oss.oracle.com/projects/tmem).  Though tmem requires the
guest to participate in memory management decisions (thus requiring
a Linux patch), system-wide physical memory efficiency may
improve vs memory deduplication, and hypervisor-based swapping
is not necessary.

> The Linux scheduler already supports multiple scheduling 
> classes.  If we 
> find that none of them will fit our needs, we'll propose a new one.  
> When the need can be demonstrated to be real, and the 
> implementation can 
> be clean, Linux can usually be adapted.

But that's exactly George and Jeremy's point.  KVM will
eventually require changes that clutter Linux for purposes
that are relevant only to a hypervisor.

> > I think if you did an SCO-style audit comparing
> > Linux and Xen 3.4, you'd find a lot less in common than you think.  
> 
> A lot of the arch code is derived from Linux.

Indeed it is, but the operative word is "derived".  In
many cases, the code has been modified to be more applicable
to a hypervisor.  For example, in Xen, tmem uses radix trees
in a way that is similar to Linux but different enough that
the changes would not likely be acceptable in Linux.  The
separation between Xen and Linux allows this diversity
without cluttering Linux.

I think we can all agree that drawing boundaries between
"hypervisor" functionality and "operating system"
functionality is a work in progress and may take many
more years to settle.  In the meantime, there should be
room (and support) for different approaches.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
  2009-05-26 19:18             ` Dan Magenheimer
@ 2009-05-26 19:41               ` Avi Kivity
  -1 siblings, 0 replies; 183+ messages in thread
From: Avi Kivity @ 2009-05-26 19:41 UTC (permalink / raw)
  To: Dan Magenheimer
  Cc: George Dunlap, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List, Keir Fraser,
	Ingo Molnar, Linus Torvalds

Dan Magenheimer wrote:
>> It will also be 
>> interesting to see how far Xen can get along without real memory 
>> management (overcommit).
>>     
>
> Several implementations of "classic" memory overcommit have been
> done for Xen, most recently the Difference Engine work at UCSD.
> It is true that none have been merged yet, in part because,
> in many real world environments, "generalized" overcommit
> often leads to hypervisor swapping, and performance becomes
> unacceptable.  (In other words, except in certain limited customer
> use models, memory overcommit is a "marketing feature".)
>   

Swapping indeed drags performance down horribly.  I regard it as a last 
resort solution used when everything else (page sharing, compression, 
ballooning, live migration) has failed.  By having that last resort you 
can actually use the other methods without fearing an out-of-memory 
condition eventually.

Note that with SSDs, disks have started to narrow the gap between memory 
and secondary storage access times, so swapping will actually start 
improving rather than regressing as it has done in recent times.

> There's also a novel approach, Transcendent Memory (aka "tmem"
> see http://oss.oracle.com/projects/tmem).  Though tmem requires the
> guest to participate in memory management decisions (thus requiring
> a Linux patch), system-wide physical memory efficiency may
> improve vs memory deduplication, and hypervisor-based swapping
> is not necessary.
>   

Yes, I've seen that.  Another tool in the memory management arsenal.

>   
>> The Linux scheduler already supports multiple scheduling 
>> classes.  If we 
>> find that none of them will fit our needs, we'll propose a new one.  
>> When the need can be demonstrated to be real, and the 
>> implementation can 
>> be clean, Linux can usually be adapted.
>>     
>
> But that's exactly George and Jeremy's point.  KVM will
> eventually require changes that clutter Linux for purposes
> that are relevant only to a hypervisor.
>   

kvm has already made changes to Linux.  Preemption notifiers allow us to 
have a lightweight exit path, and mmu notifiers allow the Linux mmu to 
control the kvm mmu.  And in fact mmu notifiers have proven useful to 
device drivers.
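
As a rough illustration of the notifier pattern (a userspace sketch, not the
kernel's real preempt notifier interface; every name below is hypothetical):
a client registers callbacks on a task, and the scheduler calls them when
the task is switched out or back in.  That lets a vcpu keep guest state
loaded while it stays on the cpu and drop it only when it is really
preempted, which is one way to read "lightweight exit path".

```c
#include <stddef.h>

struct sched_notifier;

/* Callbacks invoked around a context switch (hypothetical names). */
struct sched_notifier_ops {
	void (*sched_in)(struct sched_notifier *n, int cpu);
	void (*sched_out)(struct sched_notifier *n);
};

struct sched_notifier {
	const struct sched_notifier_ops *ops;
	struct sched_notifier *next;
};

/* Minimal stand-in for a task: just a list of registered notifiers. */
struct task {
	struct sched_notifier *notifiers;
};

static void notifier_register(struct task *t, struct sched_notifier *n)
{
	n->next = t->notifiers;
	t->notifiers = n;
}

/* Called by the (imaginary) scheduler when the task loses the cpu. */
static void fire_sched_out(struct task *t)
{
	for (struct sched_notifier *n = t->notifiers; n; n = n->next)
		n->ops->sched_out(n);
}

/* Called by the (imaginary) scheduler when the task gets a cpu again. */
static void fire_sched_in(struct task *t, int cpu)
{
	for (struct sched_notifier *n = t->notifiers; n; n = n->next)
		n->ops->sched_in(n, cpu);
}

/* Example client: a vcpu whose guest state is loaded lazily. */
struct vcpu {
	struct sched_notifier notifier;	/* first member, so the cast below is valid */
	int guest_state_loaded;
	int cpu;
};

static void vcpu_sched_in(struct sched_notifier *n, int cpu)
{
	struct vcpu *v = (struct vcpu *)n;

	v->cpu = cpu;
	v->guest_state_loaded = 1;	/* reload guest state on this cpu */
}

static void vcpu_sched_out(struct sched_notifier *n)
{
	struct vcpu *v = (struct vcpu *)n;

	v->guest_state_loaded = 0;	/* save guest state only when preempted */
}

static const struct sched_notifier_ops vcpu_ops = {
	.sched_in = vcpu_sched_in,
	.sched_out = vcpu_sched_out,
};
```

The point of the design is that the hook is generic: the scheduler knows
nothing about virtualization, it only walks a list of callbacks.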

It also works the other way around; for example work on cpu controllers 
will benefit kvm, and the real-time scheduler will also apply to kvm 
guests.  In fact many scheduler and memory management features 
immediately apply to kvm, usually without any need for integration.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
@ 2009-05-26 19:41               ` Avi Kivity
  0 siblings, 0 replies; 183+ messages in thread
From: Avi Kivity @ 2009-05-26 19:41 UTC (permalink / raw)
  To: Dan Magenheimer
  Cc: Jeremy Fitzhardinge, Xen-devel, George Dunlap,
	the arch/x86 maintainers, Linux Kernel Mailing List, Keir Fraser,
	Ingo Molnar, Linus Torvalds

Dan Magenheimer wrote:
>> It will also be 
>> interesting to see how far Xen can get along without real memory 
>> management (overcommit).
>>     
>
> Several implementations of "classic" memory overcommit have been
> done for Xen, most recently the Difference Engine work at UCSD.
> It is true that none have been merged yet, in part because,
> in many real world environments, "generalized" overcommit
> often leads to hypervisor swapping, and performance becomes
> unacceptable.  (In other words, except in certain limited customer
> use models, memory overcommit is a "marketing feature".)
>   

Swapping indeed drags performance down horribly.  I regard it as a last 
resort solution used when everything else (page sharing, compression, 
ballooning, live migration) has failed.  By having that last resort you 
can actually use the other methods without fearing an out-of-memory 
condition eventually.

Note that with SSDs disks have started to narrow the gap between memory 
and secondary storage access times, so swapping will actually start 
improving rather than regressing as it has done in recent times.

> There's also a novel approach, Transcendent Memory (aka "tmem"
> see http://oss.oracle.com/projects/tmem).  Though tmem requires the
> guest to participate in memory management decisions (thus requiring
> a Linux patch), system-wide physical memory efficiency may
> improve vs memory deduplication, and hypervisor-based swapping
> is not necessary.
>   

Yes, I've seen that.  Another tool in the memory management arsenal.

>   
>> The Linux scheduler already supports multiple scheduling 
>> classes.  If we 
>> find that none of them will fit our needs, we'll propose a new one.  
>> When the need can be demonstrated to be real, and the 
>> implementation can 
>> be clean, Linux can usually be adapted.
>>     
>
> But that's exactly George and Jeremy's point.  KVM will
> eventually require changes that clutter Linux for purposes
> that are relevant only to a hypervisor.
>   

kvm has already made changes to Linux.  Preemption notifiers allow us to 
have a lightweight exit path, and mmu notifiers allow the Linux mmu to 
control the kvm mmu.  And in fact mmu notifiers have proven useful to 
device drivers.
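
A rough illustration of the notifier pattern mentioned above, as a userspace sketch rather than kernel code.  The names here (struct notifier, struct task, struct vcpu, fire_sched_in, fire_sched_out) are hypothetical stand-ins, and the "guest state" flag is only an assumption about what a subscriber might track.  The idea shown is that a subscriber gets a callback when its task is switched out or back in, so it can leave state loaded until a switch actually happens.

```c
#include <stddef.h>

struct notifier;

/* Callbacks a subscriber wants run around a context switch. */
struct notifier_ops {
	void (*sched_in)(struct notifier *n, int cpu);
	void (*sched_out)(struct notifier *n);
};

struct notifier {
	const struct notifier_ops *ops;
	struct notifier *next;
};

/* Hypothetical stand-in for a task carrying a list of notifiers. */
struct task {
	struct notifier *notifiers;
};

void notifier_register(struct task *t, struct notifier *n)
{
	n->next = t->notifiers;
	t->notifiers = n;
}

/* Called by the (imaginary) scheduler when the task loses the cpu. */
void fire_sched_out(struct task *t)
{
	struct notifier *n;

	for (n = t->notifiers; n; n = n->next)
		n->ops->sched_out(n);
}

/* Called by the (imaginary) scheduler when the task runs again. */
void fire_sched_in(struct task *t, int cpu)
{
	struct notifier *n;

	for (n = t->notifiers; n; n = n->next)
		n->ops->sched_in(n, cpu);
}

/* Example subscriber: a vcpu that keeps guest state loaded while it runs. */
struct vcpu {
	struct notifier pn;
	int guest_state_loaded;
	int last_cpu;
};

static struct vcpu *vcpu_of(struct notifier *n)
{
	return (struct vcpu *)((char *)n - offsetof(struct vcpu, pn));
}

static void vcpu_sched_in(struct notifier *n, int cpu)
{
	struct vcpu *v = vcpu_of(n);

	v->guest_state_loaded = 1;
	v->last_cpu = cpu;
}

static void vcpu_sched_out(struct notifier *n)
{
	vcpu_of(n)->guest_state_loaded = 0;
}

const struct notifier_ops vcpu_notifier_ops = {
	.sched_in  = vcpu_sched_in,
	.sched_out = vcpu_sched_out,
};
```

With this shape the common path never touches the guest state; only an actual switch, signalled through the callbacks, pays for saving or reloading it.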

It also works the other way around; for example work on cpu controllers 
will benefit kvm, and the real-time scheduler will also apply to kvm 
guests.  In fact many scheduler and memory management features 
immediately apply to kvm, usually without any need for integration.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
  2009-05-26 12:46         ` George Dunlap
@ 2009-05-26 21:19           ` Gerd Hoffmann
  -1 siblings, 0 replies; 183+ messages in thread
From: Gerd Hoffmann @ 2009-05-26 21:19 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ingo Molnar, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List, Avi Kivity,
	Linus Torvalds, Keir Fraser

On 05/26/09 14:46, George Dunlap wrote:
> On Mon, May 25, 2009 at 5:10 AM, Ingo Molnar<mingo@elte.hu>  wrote:
>> Note that this design problem has been created by Xen,
>> intentionally, and Xen is now suffering under those bad technical
>> choices made years ago. It's not Linux's problem.
>
> I'd like to respectfully disagree with this.

Well.  Xen *does* suffer from bad technical choices made years ago.  I'm 
pretty sure Xen would look radically different when being rewritten from 
scratch today.

One reason is that Xen predates vt and svm.  With that in mind some of 
the xen interface bits don't look *that* odd any more.  Back then it did 
make sense to handle things that way.  The ioapic hypercalls discussed 
in this thread belong in that group IMHO.

Another reason is that Xen wasn't "designed".  Xen was "hacked up".  As 
far as I know there is no document which describes the overall design of 
the guest/xen ABI.  Also there is no documentation (other than code) 
which describes all details of the guest/xen ABI.  Simple reason:  The 
ABI wasn't designed.  It was hammered into shape until it worked.  On 
x86.  The guys who attempted (and failed) to port xen to ppc had a lot of 
*ahem* fun with that stuff.  For example: Passing guest virtual 
addresses in (some) hypercalls.  Also direct paging mode is very 
x86-ish and is the reason for a number of ia64-ifdefs in places where 
you don't expect them ...

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 02/17] x86: add io_apic_ops to allow interception
  2009-05-25  3:54     ` Ingo Molnar
  (?)
@ 2009-05-27  7:17     ` Jeremy Fitzhardinge
  -1 siblings, 0 replies; 183+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-27  7:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge, Greg KH, Jens Axboe

Ingo Molnar wrote:
> ok, could you please turn the whole IO-APIC code into a driver 
> framework? I.e. all IO-APIC calls outside of 
> arch/x86/kernel/apic/io_apic.c should be to some io_apic-> method.
>
> The advantage will be a proper abstraction for all IO-APIC details - 
> not just a minimalistic one for Xen's need.
>
> Also, please name it 'struct io_apic' - similar to the 'struct apic' 
> naming we have for the local APIC driver structure.

OK, I'll have a look at it.  I think it could turn out quite nicely, and 
possibly remove the need for some other Xen hooks around the 
place, as well as make the path for some other upcoming things 
clearer.

But in the meantime, would you consider taking the minimal ops approach 
for this next merge window, and the full api in the next dev cycle?
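
As an illustration of the request quoted above (a userspace sketch, not code from the patch series): every IO-APIC access outside the driver goes through a method table, and a different backend can be swapped in.  The field names, the fake register array and the "hypercall" counter are all assumptions made for the example; only the name 'struct io_apic' comes from the quoted text.

```c
#include <stdint.h>

/* Hypothetical driver structure, named as suggested in the quoted text. */
struct io_apic {
	const char *name;
	uint32_t (*read)(unsigned int apic, unsigned int reg);
	void (*write)(unsigned int apic, unsigned int reg, uint32_t value);
};

#define NR_REGS 64

/* Stand-in for the memory-mapped register window of a real IO-APIC. */
static uint32_t fake_regs[NR_REGS];

static uint32_t native_read(unsigned int apic, unsigned int reg)
{
	(void)apic;
	return fake_regs[reg % NR_REGS];
}

static void native_write(unsigned int apic, unsigned int reg, uint32_t value)
{
	(void)apic;
	fake_regs[reg % NR_REGS] = value;
}

/* Counts how often the intercepting backend was entered. */
unsigned int hypercalls;

static uint32_t xen_read(unsigned int apic, unsigned int reg)
{
	hypercalls++;	/* a real backend would ask the hypervisor here */
	return native_read(apic, reg);
}

static void xen_write(unsigned int apic, unsigned int reg, uint32_t value)
{
	hypercalls++;
	native_write(apic, reg, value);
}

const struct io_apic native_io_apic = {
	.name = "native", .read = native_read, .write = native_write,
};

const struct io_apic xen_io_apic = {
	.name = "xen", .read = xen_read, .write = xen_write,
};

/* The one pointer all callers outside the driver go through. */
const struct io_apic *io_apic = &native_io_apic;

uint32_t io_apic_set_and_readback(unsigned int apic, unsigned int reg,
				  uint32_t value)
{
	io_apic->write(apic, reg, value);
	return io_apic->read(apic, reg);
}
```

Switching backends is then a single assignment (io_apic = &xen_io_apic) made once at boot, and callers such as io_apic_set_and_readback() need no Xen-specific checks.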

Thanks,
    J


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
  2009-05-26 21:19           ` Gerd Hoffmann
@ 2009-05-27 10:14             ` George Dunlap
  -1 siblings, 0 replies; 183+ messages in thread
From: George Dunlap @ 2009-05-27 10:14 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Jeremy Fitzhardinge, Xen-devel, the arch/x86 maintainers,
	Linux Kernel Mailing List, Avi Kivity, Ingo Molnar,
	Linus Torvalds, Keir Fraser

On Tue, May 26, 2009 at 10:19 PM, Gerd Hoffmann <kraxel@redhat.com> wrote:
> Well.  Xen *does* suffer from bad technical choices made years ago.  I'm
> pretty sure Xen would look radically different when being rewritten from
> scratch today.

That may be.  I don't know enough about the specific issues you raise
below to comment.  But Ingo wasn't bringing up those issues: he was
disagreeing with the whole idea of including dom0 Linux as a key
component of the Xen system.  If the Xen project were to start over
from scratch, we might make a lot of different decisions; but running
Linux as the hypervisor (as KVM does) or forking Linux (as Ingo seemed
to suggest) are not among them.

 -George

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
  2009-05-26 19:18             ` Dan Magenheimer
@ 2009-05-28  0:13               ` Ingo Molnar
  -1 siblings, 0 replies; 183+ messages in thread
From: Ingo Molnar @ 2009-05-28  0:13 UTC (permalink / raw)
  To: Dan Magenheimer
  Cc: Avi Kivity, George Dunlap, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List, Keir Fraser,
	Linus Torvalds


* Dan Magenheimer <dan.magenheimer@oracle.com> wrote:

> > The Linux scheduler already supports multiple scheduling 
> > classes.  If we find that none of them will fit our needs, we'll 
> > propose a new one.  When the need can be demonstrated to be 
> > real, and the implementation can be clean, Linux can usually be 
> > adapted.
> 
> But that's exactly George and Jeremy's point.  KVM will eventually 
> require changes that clutter Linux for purposes that are relevant 
> only to a hypervisor.

That's wrong. Any such scheduler classes would also help: control 
groups, containers, vserver, UML and who knows what other isolation 
project. Many such mechanisms are already implemented as well.

I rarely see any KVM-only feature in generic kernel code, and that's 
good.

Xen changes - especially dom0 - are overwhelmingly not about 
improving Linux, but about having some special hook and extra 
treatment in random places - and that's really bad.

I also find it pretty telling that you cut out the most important 
point of Avi's reply:

> > I think the Xen design has merit if it can truly make dom0 a 
> > guest -- that is, if it can survive dom0 failure.  Until then, 
> > you're just taking a large interdependent codebase and splitting 
> > it at some random point, but you don't get any stability or 
> > security in return.

that crucial question really has to be answered honestly and 
upfront.

	Ingo

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
  2009-05-28  0:13               ` Ingo Molnar
@ 2009-05-28  0:49                 ` Jeremy Fitzhardinge
  -1 siblings, 0 replies; 183+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-28  0:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Dan Magenheimer, Avi Kivity, George Dunlap, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List, Keir Fraser,
	Linus Torvalds

Ingo Molnar wrote:
> I also find it pretty telling that you cut out the most important 
> point of Avi's reply:
>
>   
>>> I think the Xen design has merit if it can truly make dom0 a 
>>> guest -- that is, if it can survive dom0 failure.  Until then, 
>>> you're just taking a large interdependent codebase and splitting 
>>> it at some random point, but you don't get any stability or 
>>> security in return.
>>>       
>
> that crucial question really has to be answered honestly and 
> upfront.

Xen, the hypervisor itself, doesn't require any services from dom0. From 
its perspective, dom0 is just another guest domain, though with enough 
privileges to access hardware.  Dom0's job is to provide device access 
to other less privileged domains.

There is currently some system-wide information which is stored in a 
usermode daemon in dom0. Recovering from its loss is hard, but there is 
a prototype to pull that daemon out into its own special-purpose 
domain.  At that point, dom0 can reboot without affecting any of the 
other domains or Xen itself.

If dom0 goes away, the other domains will get a disconnect and 
temporarily lose access to their devices, but they can cope with that.  
From their perspective, it would look like they'd just been 
save/restored or migrated to another machine.  When dom0 comes back, 
they'll reconnect and carry on.

The disaggregation of dom0's functions is something that the Xen 
development community is actively pursuing.

    J

^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
  2009-05-28  0:13               ` Ingo Molnar
  (?)
  (?)
@ 2009-05-28  3:47               ` Dan Magenheimer
  2009-05-28 12:03                 ` Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant (was: Re: Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)) Luke S Crawford
  -1 siblings, 1 reply; 183+ messages in thread
From: Dan Magenheimer @ 2009-05-28  3:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, George Dunlap, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List, Keir Fraser,
	Linus Torvalds

> * Dan Magenheimer <dan.magenheimer@oracle.com> wrote:
> 
> > > The Linux scheduler already supports multiple scheduling 
> > > classes.  If we find that none of them will fit our needs, we'll 
> > > propose a new one.  When the need can be demonstrated to be 
> > > real, and the implementation can be clean, Linux can usually be 
> > > adapted.
> > 
> > But that's exactly George and Jeremy's point.  KVM will eventually 
> > require changes that clutter Linux for purposes that are relevant 
> > only to a hypervisor.
> 
> That's wrong. Any such scheduler classes would also help: control 
> groups, containers, vserver, UML and who knows what other isolation 
> project. Many such mechanisms are already implemented as well.

I think you are missing the point.  Yes, certainly, generic
scheduler code can be written that applies to all of these
uses.  But will that be the same code that is best for KVM to
succeed in an enterprise-class virtual data center?
I agree with George that it will not; generic code and optimal
code are rarely the same thing.  What's best for an operating
system is not always what's best for a hypervisor.

But we are both speculating.  I guess only time will tell.

> I also find it pretty telling that you cut out the most important 
> point of Avi's reply:
> 
> > > I think the Xen design has merit if it can truly make dom0 a 
> > > guest -- that is, if it can survive dom0 failure.  Until then, 
> > > you're just taking a large interdependent codebase and splitting 
> > > it at some random point, but you don't get any stability or 
> > > security in return.
> 
> that crucial question really has to be answered honestly and 
> upfront.

I cut it out because I thought others would be more qualified
to answer, but since nobody else has, I will.  Absolutely there
is work going on to survive failure of dom0 (or any domain)!
This is a must for enterprise-grade availability and security,
such as is needed for huge corporate data centers and "clouds".
However, the majority of users (individuals and small businesses)
will probably be most happy with their distro (and distro kernel)
as dom0 since it is convenient and familiar.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant (was: Re: Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
  2009-05-28  3:47               ` [Xen-devel] " Dan Magenheimer
@ 2009-05-28 12:03                 ` Luke S Crawford
  2009-05-28 13:39                   ` Tim Post
  0 siblings, 1 reply; 183+ messages in thread
From: Luke S Crawford @ 2009-05-28 12:03 UTC (permalink / raw)
  To: Dan Magenheimer; +Cc: Xen-devel

Dan Magenheimer <dan.magenheimer@oracle.com> writes:
> such as is needed for huge corporate data centers and "clouds".
> However, the majority of users (individuals and small businesses)
> will probably be most happy with their distro (and distro kernel)
> as dom0 since it is convenient and familiar.

Hey, I'm going to hijack this for a rant, if you don't mind.  I've 
pared back the cc list.  

Now, I'm just the janitor; I rightly belong on xen-users.  I lurk in -devel
because it's good to know what the developers are thinking.  I don't want
to come off like I think I'm smarter than anyone here; I don't.  But
I am a heavy user of Xen, and I don't think I'm an unusual user of Xen.

I've been selling VPSs using Xen since 2005.  After the 
marketing people convince the middle managers that virtualization is the 
way to go, someone like me has to actually bang on the thing with a spanner
or rub it with a greasy rag until it works.  

I also do contracting for some of those 'large corporate data centers' of 
which you speak.  (Corporate data centers seem to be the worst in terms of 
operational efficiency.  Do you know how many Linux installations I've seen 
where the customer pays a few hundred extra per box for integrated KVM over 
IP functionality rather than the much cheaper and more useful serial 
consoles?  Oy.  You expect me to tell you why your server crashed when 
you have no console logs of the backtrace?)

But I'm getting sidetracked.   My point is that small companies need good 
tools more than large corporations do.   The big guys can just keep 
throwing money at the problem until their stuff mostly works.  

In the Dom0, I want something as stable, minimal, standard and  
supported as possible (by supported, I really mean standard and widely
used.  The best 'support' is searching mailing lists such as this one.)

The last thing I want is all the cowboy hackery that goes into my favorite
desktop OS to be included in my Xen Dom0.   I moved to Ubuntu on my laptop
last year and I was amazed how easy it was.  Everything just worked.  
Making new hardware work was easier than on Windows.  

But do I want that on my Xen Dom0?  Certainly not until you get that thing
working where I can reboot the Dom0 without killing everything.  

This is what I think is wrong about the default install of Xen;  it is set up 
so that you can run your desktop in the dom0, and spin up DomUs as needed.
It tries to be a virtualization server and a desktop at the same time,
and it gives up stability for this. 

If you've ever run a Xen host and have forgotten to change the default 
dom0-min-mem of 192MiB, you'd know most (especially x86_64) linux 
installations are not stable under load with that much memory.   Even 
if you set dom0-min-mem to a reasonable value, I've seen enough problems 
related to ballooning that I always disable it on all hosts.  (granted, 
most of those problems are on old RHEL installs)   Also, in all 
the environments I work in, both at the 'large corporate data center' 
and my own,  the configuration of the DomUs is fairly static.   Sure, it's 
kind of nice to be able to add resources without rebooting the DomU, but 
to give up stability for that is just crazy.  If your customer says "I 
want more X" it's fine to say "Ok, let me know when I can reboot you"  -  
it's not ok to crash.  

In my work, people mostly use the  'I take this Linux box, I set it up,
and I use it for three years' model.  They don't need any of the fancy 
'computing on demand'  -  they just want to move 16 of those crusty P3
servers that are killing their power bill and crashing due to bad hardware
twice a month on to a nice shiny new 8 core box with 32GiB ram and a
warranty.   I've seen lots of people who buy ec2 instances and do the
same thing; they leave it on all the time. (the basic ec2 instances are
particularly unsuited to this usage, but people do it anyhow.) 

I'm not going to say memory overcommit is never useful for anyone;
but I can say it is never useful for me.  32GiB registered ecc ddr2
is around $600.  That's not very many billable hours.  That's around
half the approximate cost of an unplanned reboot of one of my servers.
(I'm only counting money lost due to SLA and time to clean up;  if you
count loss to reputation, it gets even worse)  

My experience with memory overcommit has been that it makes your 
system either unstable, slow or both.  Now, I don't know if you could
theoretically make a zero cost memory overcommit system;  I'm just saying
that every attempt at overcommitting memory between virtual servers I have
seen ended in tears.   (heck, I've seen quite a lot of tearful endings
due to the memory overcommit linux itself does.)  This is why I ditched
FreeBSD jails and came to Xen in 2005.  

Right now, I'm using CentOS5 with the xen.org kernels, but it sure
would be nice if there was some pared down pre-built dom0 configuration 
available. (I personally give my Dom0 1024MiB out of 32GiB)  It could be
based on centos, or on ttylinux, or whatever.  just something standard, small,
and simple.  Make it good enough that people use it.  When I see a problem,
I want fifty other guys to have seen the problem first.  

I'm thinking about starting such a project myself once I get a few other 
things done.  If nothing else, I can distribute kickstart files of a minimal
dom0. Going forward, now that NetBSD 5 is out, perhaps I will switch back 
to NetBSD as my Dom0  (I switched from FreeBSD jails to NetBSD/Xen2 in 
2005.  I switched away for pae/x86_64 support.  I mean, no pae sucked, 
but the os was solid)  Unfortunately, that means I would have less 
'support' in the form of other people doing the same thing and talking 
about it in public.

RedHat is talking about doing it with KVM  -  see the Red Hat Enterprise 
Virtualization hypervisor  - they claim you will have a KVM 'dom0'  that 
uses only 64M ram- which seems funny to me, as my perception of KVM  has 
always been that it was optimized to run virtual instances as needed on 
a box that usually ran applications on the bare metal, like a desktop.  


-- 
Luke S. Crawford
http://prgmr.com/xen/  -     Hosting for the technically adept
http://nostarch.com/xen.htm  We don't assume you are stupid.  

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant (was: Re: Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
  2009-05-28 12:03                 ` Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant (was: Re: Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)) Luke S Crawford
@ 2009-05-28 13:39                   ` Tim Post
  2009-05-28 22:23                     ` Luke S Crawford
  2009-05-30  1:10                     ` Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant Michael David Crawford
  0 siblings, 2 replies; 183+ messages in thread
From: Tim Post @ 2009-05-28 13:39 UTC (permalink / raw)
  To: Luke S Crawford; +Cc: Dan Magenheimer, Xen-devel

On Thu, 2009-05-28 at 08:03 -0400, Luke S Crawford wrote:
> Dan Magenheimer <dan.magenheimer@oracle.com> writes:
> > such as is needed for huge corporate data centers and "clouds".
> > However, the majority of users (individuals and small businesses)
> > will probably be most happy with their distro (and distro kernel)
> > as dom0 since it is convenient and familiar.

> I've been selling VPSs using Xen since 2005.  After the 
> marketing people convince the middle managers that virtualization is the 
> way to go, someone like me has to actually bang on the thing with a spanner
> or rub it with a greasy rag until it works.  

So have I, since (pre) 2.0.7. I was one of the first (and only) to offer
paravirtualized OpenSSI.

> I also do contracting for some of those 'large corporate data centers' of 
> which you speak.  (Corporate data centers seem to be the worst in terms of 
> operational efficiency.  Do you know how many Linux installations I've seen 
> where the customer pays a few hundred extra per box for integrated KVM over 
> IP functionality rather than the much cheaper and more useful serial 
> consoles?  Oy.  You expect me to tell you why your server crashed when 
> you have no console logs of the backtrace?)

They are in business to make money, which is why real system integrators
flourish and stand out from the crowd who read "linux for dummies
version (x), now including KVM!!"

You either know how Xen and Linux work or you don't. Most DC "hands and
eyes" just follow a pre-set procedure and can't be bothered to deviate
from it or handle special cases. Again, that's why we have jobs.

> But I'm getting sidetracked.   My point is that small companies need good 
> tools more than large corporations do.   The big guys can just keep 
> throwing money at the problem until their stuff mostly works.

Here we go again. Writing your own tools is not too difficult, and it makes
you money, using LGPL libraries that are (reasonably) self-explanatory.
Xen is a tool in your toolbox. All too often many fail to realize the
difference between Xen the hypervisor and the tools provided.

> The last thing I want is all the cowboy hackery that goes into my favorite
> desktop OS to be included in my Xen Dom0.   I moved to Ubuntu on my laptop
> last year and I was amazed how easy it was.  Everything just worked.  
> Making new hardware work was easier than on Windows.  

Desktop OS? We have to draw a line here. There is desktop and server
virtualization. If you want to try xyz-distro on your desktop, use
Virtualbox. If you want to put virtual machines to work, use Xen.

What, exactly, is cowboy hackery? A dom-0 that might be a little slower
if you boot it without Xen? 

> But do I want that on my Xen Dom0?  Certainly not until you get that thing
> working where I can reboot the Dom0 without killing everything.

Mmmm, then work on getting xenstored into a stub domain.

> This is what I think is wrong about the default install of Xen;  it is set up 
> so that you can run your desktop in the dom0, and spin up DomUs as needed.
> It tries to be a virtualization server and a desktop at the same time,
> and it gives up stability for this. 

The only reason that you should be using Xen on a desktop is to test
stuff that you want to propagate to servers. You've already said that
you make your living as an integrator selling the use of computers that
use Xen.

Xen is meant for production; it can be used on a desktop.

> If you've ever run a Xen host and have forgotten to change the default 
> dom0-min-mem of 192MiB, you'd know most (especially x86_64) linux 
> installations are not stable under load with that much memory.

I have, and I don't forget to change it.

> In my work, people mostly use the  'I take this Linux box, I set it up,
> and I use it for three years' model.  They don't need any of the fancy 
> 'computing on demand'  -  they just want to move 16 of those crusty P3
> servers that are killing their power bill and crashing due to bad hardware
> twice a month on to a nice shiny new 8 core box with 32GiB ram and a
> warranty.   I've seen lots of people who buy ec2 instances and do the
> same thing; they leave it on all the time. (the basic ec2 instances are
> particularly unsuited to this usage, but people do it anyhow.) 

Have you even looked at / tried Eucalyptus?

> I'm not going to say memory overcommit is never useful for anyone;
> but I can say it is never useful for me.  32GiB registered ecc ddr2
> is around $600.  That's not very many billable hours.  That's around
> half the approximate cost of an unplanned reboot of one of my servers.
> (I'm only counting money lost due to SLA and time to clean up;  if you
> count loss to reputation, it gets even worse)

I don't have this problem. I export PV guest vitals over xenbus and set
up watches on them. As for overcommitment, the first step is knowing how
much memory each domain's kernel has actually promised to running
processes. That much is already in the tree. 
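
To make that first step concrete, here is a minimal sketch.  It assumes the figure meant is the Committed_AS line of a guest's /proc/meminfo (the message does not say which counter is used), and it leaves out the xenbus export and watches entirely.  The function names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan /proc/meminfo-style text and return the Committed_AS value in kB,
 * or -1 if the line is not present.
 */
long committed_as_kb(const char *meminfo)
{
	const char *line = meminfo;

	while (line && *line) {
		long kb;

		/* Only a line starting with "Committed_AS:" converts a value. */
		if (sscanf(line, "Committed_AS: %ld kB", &kb) == 1)
			return kb;

		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline to the next line */
	}
	return -1;
}

/*
 * Given what each domain reports as committed, say whether the total
 * exceeds the memory the host actually has (all values in kB).
 */
int host_overcommitted(const long *domain_kb, size_t n, long host_kb)
{
	long total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += domain_kb[i];
	return total > host_kb;
}
```

Each guest would compute committed_as_kb() on its own /proc/meminfo and publish the number; the host side only needs the sum.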

> Right now, I'm using CentOS5 with the xen.org kernels, but it sure
> would be nice if there was some pared down pre-built dom0 configuration 
> available. (I personally give my Dom0 1024MiB out of 32GiB)  It could be
> based on centos, or on ttylinux, or whatever.  just something standard, small,
> and simple.  Make it good enough that people use it.  When I see a problem,
> I want fifty other guys to have seen the problem first.  

I don't want to seem combative or antagonistic .. however, if I give you
a screwdriver and a wrench, I'd expect that you'd use them in your own
way. Xen is no different.

> I'm thinking about starting such a project myself once I get a few other 
> things done.  If nothing else, I can distribute kickstart files of a minimal
> dom0.

Just as many others have done with debootstrap. I know you're frustrated
with dom-0 not being in mainline, we all are. However, it seems the
tools frustrate you the most. Xen gives us a solid hypervisor, solid low
level libraries and some examples on how to use them. I can't see (at
this point) why you are so seemingly disgruntled?

> RedHat is talking about doing it with KVM  -  see the Red Hat Enterprise 
> Virtualization hypervisor  - they claim you will have a KVM 'dom0'  that 
> uses only 64M ram- which seems funny to me, as my perception of KVM  has 
> always been that it was optimized to run virtual instances as needed on 
> a box that usually ran applications on the bare metal, like a desktop.

Eh, that funny thing we call "market research" influences that. People
want easy desktop virtualization. Desktop virtualization is
_most_decidedly_not_ IAAS. There will _always_ be a market for people
who can make tools (or modify the existing ones) to suit some need.

I agree with some of what you have to say, I always appreciate a rant
and I do not mean to seem unfriendly .. however, I also fail to see the
basis?

Maybe I missed something, entirely possible.

Cheers,
--Tim

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
  2009-05-28  0:13               ` Ingo Molnar
@ 2009-05-28 14:26                 ` George Dunlap
  -1 siblings, 0 replies; 183+ messages in thread
From: George Dunlap @ 2009-05-28 14:26 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Dan Magenheimer, Jeremy Fitzhardinge, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List, Avi Kivity,
	Linus Torvalds, Keir Fraser

On Thu, May 28, 2009 at 1:13 AM, Ingo Molnar <mingo@elte.hu> wrote:
>> > I think the Xen design has merit if it can truly make dom0 a
>> > guest -- that is, if it can survive dom0 failure.  Until then,
>> > you're just taking a large interdependent codebase and splitting
>> > it at some random point, but you don't get any stability or
>> > security in return.

Let me turn this around: are you (Ingo) saying that if a Xen system
could successfully survive a dom0 failure, then you would consider
that a valid reason for this design choice, and would be willing to
support and pursue changes required to allow mainline linux to run as
dom0?  If not then this line of discussion is just a distraction.

I personally think the strongest argument for an interdependent
codebase is the ability to have a separate piece of software as a
dedicated hypervisor. I also think Xen provides extra security and
stability as it is right now.  The code is much smaller and simpler
than the kernel.  The number of hypercalls is smaller than the number
of system calls, and the complexity of hypercalls is much lower than
the complexity of system calls in general.  Driver domains, in which a
driver runs in a domain other than dom0 and can fail and reboot, have
been supported in Xen for years.  The ability to survive dom0 failure
is just an added benefit.

As Dan and Jeremy said, the Xen community is actively pursuing the
changes required to allow dom0 to panic / reboot without requiring a
reboot of Xen and other guests.  I'm sure if that would make members
of the linux community actively support inclusion of dom0 support, we
could make that work a priority.

 -George

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant (was: Re: Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
  2009-05-28 13:39                   ` Tim Post
@ 2009-05-28 22:23                     ` Luke S Crawford
  2009-05-29  1:00                       ` Tim Post
  2009-05-29 13:42                       ` Dan Magenheimer
  2009-05-30  1:10                     ` Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant Michael David Crawford
  1 sibling, 2 replies; 183+ messages in thread
From: Luke S Crawford @ 2009-05-28 22:23 UTC (permalink / raw)
  To: echo; +Cc: Dan Magenheimer, Xen-devel

Tim Post <echo@echoreply.us> writes:

> What, exactly is cowboy hackery? A dom-0 that might be a little slower
> if you boot it without Xen? 

No, I mean like Debian's 2.6.27 Dom0.  As far as I can tell, 
they imported the SUSE Xen patch once, and have not pulled any of SUSE's 
bugfixes since.   By all reports, it works fine, and is excellent as a 
desktop OS.   However, it's not something I want on the server where 
reboots cost me money.

(I'm further arguing that even in the case of a small office, you want your
'dedicated virtualization server' to be just that;  and rock solid.)  

> > If you've ever run a Xen host and have forgotten to change the default 
> > dom0-min-mem of 192MiB, you'd know most (especially x86_64) linux 
> > installations are not stable under load with that much memory.
> 
> I have, and I don't forget to change it.

But why make the default something that will crash the server?

> Have you even looked at / tried Eucalyptus?

I've looked at it, I'm thinking about using it.  The EC2 interface is 
really neat.   I have a lot of admiration and respect for Amazon.  But
the interface is not what makes the small EC2 instances unsuitable as co-lo
replacements; the unmirrored disk and the high price are.  The small
EC2 images are simply not designed to be a replacement for servers in the
usual case.  They are great if you have a nice redundant application, 
designed to let any node fail at any time,  but that's not how most small 
businesses configure their servers.  For most small companies, it's cheaper 
to get reasonably good hardware and redundant disk, take backups, and then 
have a fire drill every time the hardware fails. 

> I don't have this problem. I export PV guest vitals over xenbus and set
> up watches on them. As for overcommitment, the first step is knowing how
> much memory each domain's kernel has actually promised to running
> processes. That much is already in the tree. 

That only solves half of the problem and gets you back to where
you are with FreeBSD jails/unionfs (well, also my users run their own
kernels and have full control over userland.)   Even with that problem 
solved, you still have the problem of disk cache, which is essential for 
acceptable performance.

If you want to buy a small image from me and thrash it, that's fine.
However, I don't want that user who underprovisions his or her
domain to make performance suck for a more responsible user.  This is
why I moved to xen in the first place;  a few heavy users were
trashing the disk cache on my FreeBSD jail system, and it was slow
for everyone. 

With the move to Xen, suddenly the heavy user was the only user
seeing the slowness.    Now the heavy user has the option of paying
me more money for more ram to use as disk cache, or of dealing with it
being slow.  Light users had no more trouble.  Log in once every 3 months?
your /etc/passwd is still cached from last time.  

This is why I'm so uneasy about overcommit.  Ram is not like CPU, which
you can take away at a moment's notice and give back as if (almost) nothing
happened.  (or perhaps new CPUs are just so much more powerful than I
need that I don't notice the degradation.)

> Just as many others have done with debootstrap. I know you're frustrated
> with dom-0 not being in mainline, we all are. However, it seems the
> tools frustrate you the most. Xen gives us a solid hypervisor, solid low
> level libraries and some examples on how to use them. I can't see (at
> this point) why you are so seemingly disgruntled?

hm.  Well, it is bad that I am coming across as disgruntled.  But 
I do think it is bad that the tools that come with xen seem to be focused
on Xen as a desktop OS at the expense of xen as a dedicated 
virtualization server.  I don't think Xen makes a particularly good
desktop virtualization platform, and this setup unnecessarily 
raises the barrier to using Xen as a dedicated virtualization server.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Xen is a feature
  2009-05-28  0:13               ` Ingo Molnar
@ 2009-05-29  0:45                 ` Jeremy Fitzhardinge
  -1 siblings, 0 replies; 183+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-29  0:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Dan Magenheimer, Avi Kivity, George Dunlap, Xen-devel,
	the arch/x86 maintainers, Linux Kernel Mailing List, Keir Fraser,
	Linus Torvalds, Greg KH, Kurt C. Hackel, Ian Pratt, xen-users,
	Ky Srinivasan, Eric Anderson, Wim Coekaerts, Stephen Spector,
	Jens Axboe, Nick Piggin

Ingo Molnar wrote:
> Xen changes - especially dom0 - are overwhelmingly not about 
> improving Linux, but about having some special hook and extra 
> treatment in random places - and that's really bad.
>   

You've made this argument a few times now, and I take exception to it.

It seems to be predicated on the idea that Xen has some kind of niche 
usage, with barely more users than Voyager.  Or that it is a parasite 
sitting on the side of Linux, being a pure drain.

Neither is true.  Xen is very widely used.  There are at least 500k 
servers running Xen in commercial user sites (and untold numbers of 
smaller sites and personal users), running millions of virtual guest 
domains.  If you browse the net at all widely, you're likely to be using 
a Xen-based server; all of Amazon runs on Xen, for example.  Mozilla and 
Debian are hosted on Xen systems.

Hardware vendors like Dell and HP are shipping servers with Xen built 
into the firmware, and increasingly, desktops and laptops.  Many laptop 
"instant-on/instant-access" features are based on a combination of Xen 
and Linux.

All major Linux distributions support running as a Xen guest, and many 
support running as a Xen host.

For these users, Xen support is an active feature of Linux; Linux 
without Xen support would be much less useful to them, and better Xen 
support would be more useful.  For them, Xen support is no different 
from any other kind of platform support.  They are being actively 
hampered by the fact that the only dom0 support is available in the form 
of either ancient or very patched kernels. 

To them, improved Xen support *is* "improving Linux".

Your view appears to be that virtualization is either useless, or a neat 
trick useful for doing a quick kernel test (which is why kvm got early 
traction in this community; it is well suited to this use-case).  But 
that is a very parochial kernel-dev view.  For many users, 
virtualization (in general, but commonly on Xen) has become an 
absolutely essential part of their computing infrastructure, and they 
would no more go without it than they would go without ethernet.

We're taking your technical critiques very seriously, of course, and I 
appreciate any constructive comment.  But your baseline position of 
animosity towards Xen is unreasonable, unfair and unnecessary.

    J

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant (was: Re: Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
  2009-05-28 22:23                     ` Luke S Crawford
@ 2009-05-29  1:00                       ` Tim Post
  2009-05-29  8:31                         ` Tim Post
  2009-05-29 13:42                       ` Dan Magenheimer
  1 sibling, 1 reply; 183+ messages in thread
From: Tim Post @ 2009-05-29  1:00 UTC (permalink / raw)
  To: Luke S Crawford; +Cc: Dan Magenheimer, Xen-devel

On Thu, 2009-05-28 at 18:23 -0400, Luke S Crawford wrote:
> Tim Post <echo@echoreply.us> writes:
> 
> > What, exactly is cowboy hackery? A dom-0 that might be a little slower
> > if you boot it without Xen? 
> 
> No, I mean like Debian's 2.6.27 Dom0.  As far as I can tell, 
> they imported the SUSE Xen patch once, and have not pulled any of SUSE's 
> bugfixes since.   By all reports, it works fine, and is excellent as a 
> desktop OS.   However, it's not something I want on the server where 
> reboots cost me money.

One of the biggest problems is having to go out of tree to get a usable
dom-0, then you deploy it .. then you find interesting bugs a week
later. I think everyone right now is just crossing their fingers.

> But why make the default something that will crash the server?

That's really something for the various distros to address too.

> > I don't have this problem. I export PV guest vitals over xenbus and set
> > up watches on them. As for overcommitment, the first step is knowing how
> > much memory each domain's kernel has actually promised to running
> > processes. That much is already in the tree. 
> 
> That only solves half of the problem and gets you back to where
> you are with FreeBSD jails/unionfs (well, also my users run their own
> kernels and have full control over userland.)   Even with that problem 
> solved, you still have the problem of disk cache, which is essential for 
> acceptable performance.

Right now, what we're doing is not quite overcommitment, it's more like
accounting. By placing the output of sysinfo() and more (bits
of /proc/meminfo) on Xenbus, it's easy to get a bird's eye view of which
domains are under- or over-utilizing their given RAM. If a domain has
1GB, yet its kernel is consistently committing only 384MB (actual size),
there's a good chance that the guest would do just as well with 512MB,
depending on its buffer use. The reverse is also true. It's looking at
the whole VM big picture, including buffers, swap, etc.

It's not an automatic process, but it does allow an administrator to
better organize domains and allocate resources. In the case where you've
sold someone 1GB it's not applicable .. but in an office / enterprise
setting it does make things easier.
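
A rough sketch of that accounting step, in Python, assuming each guest
publishes /proc/meminfo-style text somewhere dom0 can read it. The function
names, the sample values, the 256MB step and the 25% headroom are all
hypothetical and only there for illustration; Committed_AS is the standard
/proc/meminfo field for memory the kernel has promised to processes. How the
text gets onto Xenbus is deliberately left out.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Key: value"
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def suggest_allocation_mib(committed_kib, step_mib=256, headroom=0.25):
    """Round committed memory plus headroom up to the next step.

    The step size and headroom are assumed policy knobs, not Xen settings.
    """
    needed_mib = committed_kib / 1024 * (1 + headroom)
    steps = -(-needed_mib // step_mib)  # ceiling division
    return int(steps * step_mib)


# Invented sample: a 1GB guest whose kernel has committed 384MB.
SAMPLE = """\
MemTotal:        1048576 kB
Buffers:           20480 kB
Cached:           102400 kB
Committed_AS:     393216 kB
"""

info = parse_meminfo(SAMPLE)
suggestion = suggest_allocation_mib(info["Committed_AS"])
print(f"allocated {info['MemTotal'] // 1024}MB, suggested {suggestion}MB")
```

With these assumed knobs the 384MB case above lands on a 512MB suggestion;
the number is only a starting point for the administrator, since buffer and
cache use still have to be weighed by hand.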

> This is why I'm so uneasy about overcommit.  Ram is not like CPU, which
> you can take away at a moment's notice and give back as if (almost) nothing
> happened.  (or perhaps new CPUs are just so much more powerful than I
> need that I don't notice the degradation.)

I'm the same way, I look forward to seeing the balloon driver advance,
however I'd never flip a switch to 'auto'. 

> hm.  Well, it is bad that I am coming across as disgruntled.

Frustrated is probably a better word. 

> But 
> I do think it is bad that the tools that come with xen seem to be focused
> on Xen as a desktop OS at the expense of xen as a dedicated 
> virtualization server.  I don't think Xen makes a particularly good
> desktop virtualization platform, and this setup unnecessarily 
> raises the barrier to using Xen as a dedicated virtualization server.

The tools are the minimum needed to control and manage domains, plus an
API for those who don't want to get intimate with the lower level
libraries.

I know they're basic, but they also present good examples and a great
opportunity to make tools that suit your exact need.

I don't quite understand why you feel they are better suited to desktop
virtualization (taking the API into consideration for multi server
setups)?

Cheers,
--Tim

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-05-29  0:45                 ` Jeremy Fitzhardinge
  (?)
@ 2009-05-29  1:27                 ` Greg KH
  -1 siblings, 0 replies; 183+ messages in thread
From: Greg KH @ 2009-05-29  1:27 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Ingo Molnar, Dan Magenheimer, Avi Kivity, George Dunlap,
	Xen-devel, the arch/x86 maintainers, Linux Kernel Mailing List,
	Keir Fraser, Linus Torvalds, Kurt C. Hackel, Ian Pratt,
	xen-users, Ky Srinivasan, Eric Anderson, Wim Coekaerts,
	Stephen Spector, Jens Axboe, Nick Piggin

On Thu, May 28, 2009 at 05:45:34PM -0700, Jeremy Fitzhardinge wrote:
> Mozilla and Debian are hosted on Xen systems.

A tiny data point about these domains.  They are hosted by osuosl.org,
which uses xen systems running with the current dom0 patch set.  Because
those patches are out-of-tree, they have a hard time updating kernel
versions, and generally lag kernel.org releases by a lot, which is not
always a good thing.

So getting the dom0 patches into mainline will make their lives much
easier, and more secure.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-05-29  0:45                 ` Jeremy Fitzhardinge
  (?)
  (?)
@ 2009-05-29  4:05                 ` David Miller
  2009-05-29  6:37                   ` Jaswinder Singh Rajput
  2009-05-29 12:01                     ` George Dunlap
  -1 siblings, 2 replies; 183+ messages in thread
From: David Miller @ 2009-05-29  4:05 UTC (permalink / raw)
  To: jeremy
  Cc: mingo, dan.magenheimer, avi, George.Dunlap, xen-devel, x86,
	linux-kernel, keir.fraser, torvalds, gregkh, kurt.hackel,
	Ian.Pratt, xen-users, ksrinivasan, EAnderson, wimcoekaerts,
	stephen.spector, jens.axboe, npiggin

From: Jeremy Fitzhardinge <jeremy@goop.org>
Date: Thu, 28 May 2009 17:45:34 -0700

> Ingo Molnar wrote:
>> Xen changes - especially dom0 - are overwhelmingly not about improving
>> Linux, but about having some special hook and extra treatment in
>> random places - and that's really bad.
>>   
> 
> You've made this argument a few times now, and I take exception to it.
> 
> It seems to be predicated on the idea that Xen has some kind of niche
> usage, with barely more users than Voyager.  Or that it is a parasite
> sitting on the side of Linux, being a pure drain.

I don't see Ingo's comments, whether I agree with them or not, as
an implication of Xen being niche.  Rather I see his comments as
an opposition to how Xen is implemented.

> We're taking your technical critiques very seriously, of course, and I
> appreciate any constructive comment.  But your baseline position of
> animosity towards Xen is unreasonable, unfair and unnecessary.

I don't see any animosity at all in what Ingo has said.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-05-29  4:05                 ` David Miller
@ 2009-05-29  6:37                   ` Jaswinder Singh Rajput
  2009-05-29  6:51                     ` David Miller
  2009-05-29 12:01                     ` George Dunlap
  1 sibling, 1 reply; 183+ messages in thread
From: Jaswinder Singh Rajput @ 2009-05-29  6:37 UTC (permalink / raw)
  To: David Miller
  Cc: jeremy, mingo, dan.magenheimer, avi, George.Dunlap, xen-devel,
	x86, linux-kernel, keir.fraser, torvalds, gregkh, kurt.hackel,
	Ian.Pratt, xen-users, ksrinivasan, EAnderson, wimcoekaerts,
	stephen.spector, jens.axboe, npiggin

Hi Dave,

On Thu, 2009-05-28 at 21:05 -0700, David Miller wrote:
> From: Jeremy Fitzhardinge <jeremy@goop.org>
> Date: Thu, 28 May 2009 17:45:34 -0700
> 
> > Ingo Molnar wrote:
> >> Xen changes - especially dom0 - are overwhelmingly not about improving
> >> Linux, but about having some special hook and extra treatment in
> >> random places - and that's really bad.
> >>   
> > 
> > You've made this argument a few times now, and I take exception to it.
> > 
> > It seems to be predicated on the idea that Xen has some kind of niche
> > usage, with barely more users than Voyager.  Or that it is a parasite
> > sitting on the side of Linux, being a pure drain.
> 
> I don't see Ingo's comments, whether I agree with them or not, as
> an implication of Xen being niche.  Rather I see his comments as
> an opposition to how Xen is implemented.
> 

You can see Ingo's comments and whole thread under subject :

Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)

http://lkml.org/lkml/2009/5/27/758

--
JSR


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-05-29  6:37                   ` Jaswinder Singh Rajput
@ 2009-05-29  6:51                     ` David Miller
  0 siblings, 0 replies; 183+ messages in thread
From: David Miller @ 2009-05-29  6:51 UTC (permalink / raw)
  To: jaswinder
  Cc: jeremy, mingo, dan.magenheimer, avi, George.Dunlap, xen-devel,
	x86, linux-kernel, keir.fraser, torvalds, gregkh, kurt.hackel,
	Ian.Pratt, xen-users, ksrinivasan, EAnderson, wimcoekaerts,
	stephen.spector, jens.axboe, npiggin

From: Jaswinder Singh Rajput <jaswinder@kernel.org>
Date: Fri, 29 May 2009 12:07:32 +0530

> Hi Dave,
> 
> On Thu, 2009-05-28 at 21:05 -0700, David Miller wrote:
>> From: Jeremy Fitzhardinge <jeremy@goop.org>
>> Date: Thu, 28 May 2009 17:45:34 -0700
>> 
>> > Ingo Molnar wrote:
>> >> Xen changes - especially dom0 - are overwhelmingly not about improving
>> >> Linux, but about having some special hook and extra treatment in
>> >> random places - and that's really bad.
>> >>   
>> > 
>> > You've made this argument a few times now, and I take exception to it.
>> > 
>> > It seems to be predicated on the idea that Xen has some kind of niche
>> > usage, with barely more users than Voyager.  Or that it is a parasite
>> > sitting on the side of Linux, being a pure drain.
>> 
>> I don't see Ingo's comments, whether I agree with them or not, as
>> an implication of Xen being niche.  Rather I see his comments as
>> an opposition to how Xen is implemented.
>> 
> 
> You can see Ingo's comments and whole thread under subject :
> 
> Re: [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)
> 
> http://lkml.org/lkml/2009/5/27/758

Jeremy is specifically commenting on Ingo's quoted "argument".
And that "argument" is what he takes "exception to".

And that's the scope of what I'm commenting on too.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant (was: Re: Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
  2009-05-29  1:00                       ` Tim Post
@ 2009-05-29  8:31                         ` Tim Post
  2009-05-29  9:49                           ` George Dunlap
  0 siblings, 1 reply; 183+ messages in thread
From: Tim Post @ 2009-05-29  8:31 UTC (permalink / raw)
  To: Luke S Crawford; +Cc: Dan Magenheimer, Xen-devel

On Fri, 2009-05-29 at 09:00 +0800, Tim Post wrote:
> Right now, what we're doing is not quite overcommitment, it's more like
> accounting. By placing the output of sysinfo() and more (bits
> of /proc/meminfo) on Xenbus, it's easy to get a bird's eye view of which
> domains are under- or over-utilizing their given RAM. If a domain has
> 1GB, yet its kernel is consistently committing only 384MB (actual size),
> there's a good chance that the guest would do just as well with 512MB,
> depending on its buffer use. The reverse is also true. It's looking at
> the whole VM big picture, including buffers, swap, etc.

Sorry, forgot to mention, average (aggregate) IOWAIT is also a key
factor. Users can do odd things like bypass buffers with relational
databases. So, when we see the kernel overselling, next to nil buffers
and a very high aggregate average IOWAIT across all vcpus, we have a
pretty good idea of what's going on.
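
As a sketch of that check, in Python: the aggregate "cpu" line of /proc/stat
carries cumulative counters in the order user, nice, system, idle, iowait,
irq, softirq, steal, so the iowait share over an interval is the iowait delta
divided by the total delta. The function names and both thresholds below are
hypothetical, chosen only for illustration, and the sample counters are
invented.

```python
def cpu_times(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat-style text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate cpu line found")


def iowait_fraction(before, after):
    """Share of elapsed CPU time spent in iowait between two samples.

    Only the first eight counters are summed; the later guest counters
    are already included in user and nice.
    """
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    return deltas[4] / total if total else 0.0


def looks_io_starved(committed_kib, allocated_kib, buffers_kib, iowait,
                     buffer_floor=0.02, iowait_ceiling=0.30):
    """Flag a guest that oversells memory, has few buffers, and waits on I/O.

    buffer_floor and iowait_ceiling are assumed thresholds, not measured ones.
    """
    overselling = committed_kib > allocated_kib
    few_buffers = buffers_kib < buffer_floor * allocated_kib
    return overselling and few_buffers and iowait > iowait_ceiling


# Two invented samples of the aggregate line, taken some seconds apart.
BEFORE = "cpu  100 0 50 800 50 0 0 0 0 0"
AFTER = "cpu  150 0 70 900 880 0 0 0 0 0"

wait = iowait_fraction(cpu_times(BEFORE), cpu_times(AFTER))
print(looks_io_starved(1200000, 1048576, 4096, wait))
```

Note that iowait counted this way does not say what kind of I/O the guest
was waiting on.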

Xenbus/Xenstore exists, the combined size of these vitals is small ..
until admin friendly introspection surfaces, it's really the best way to
put any given host under a stereo microscope.

The problem is differentiating disk I/O from network I/O.

Cheers,
--Tim

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant (was: Re: Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
  2009-05-29  8:31                         ` Tim Post
@ 2009-05-29  9:49                           ` George Dunlap
  0 siblings, 0 replies; 183+ messages in thread
From: George Dunlap @ 2009-05-29  9:49 UTC (permalink / raw)
  To: echo; +Cc: Luke S Crawford, Xen-devel, Dan Magenheimer

Luke,

I hope this doesn't come off as a shameless plug, but Citrix XenServer
is exactly what you describe: dom0 is used only as a utility domain to
control other VMs.  And the basic version, which now includes (if I
recall our marketing blah blah blah correctly) support for server
pools, migration, and remote storage, is available for free
(as-in-beer, and with some registration so we can figure out who's
using it).  It's honestly what I would use if I were running a server
in a small business.

If you're commenting on the lack of free-as-in-speech distro that
looks like XenServer, Xen as a project doesn't have much say in how
distros integrate Xen.  I don't see any technical reason why someone
couldn't take a Debian base and set up something like XenServer; or
any technical reason why someone couldn't do like CentOS has done, and
clone our entire open-source tree as a starting point.  (Obviously it
would take a little bit of additional work, since the control stack on
XenServer isn't open-source.)

If you're up for starting a distro based on Xen, that would be great.
I think it would probably get a lot of traction with server admins,
and if you make good design choices to minimize the amount of work you
have to do as things move forward, and can get a good community around
it, it's got a chance to have a big impact on OSS virtualization.

And any technical feedback, such as suggesting a better dom0_min_mem
size, can be submitted to the list, or put in a bugzilla note, even if
you don't have a patch to change it.

 -George

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-05-29  4:05                 ` David Miller
@ 2009-05-29 12:01                     ` George Dunlap
  2009-05-29 12:01                     ` George Dunlap
  1 sibling, 0 replies; 183+ messages in thread
From: George Dunlap @ 2009-05-29 12:01 UTC (permalink / raw)
  To: David Miller
  Cc: jeremy, mingo, Dan Magenheimer, avi, xen-devel, x86,
	linux-kernel, Keir Fraser, torvalds, gregkh, kurt.hackel,
	Ian Pratt, xen-users, ksrinivasan, EAnderson, wimcoekaerts,
	Stephen Spector, jens.axboe, npiggin

David Miller wrote:
> I don't see Ingo's comments, whether I agree with them or not, as
> an implication of Xen being niche.  Rather I see his comments as
> an opposition to how Xen is implemented.
>   
It's in his definition of "improving Linux".  Jeremy is saying that 
allowing Linux to run as dom0 *is* improving Linux.  The lack of dom0 
support is at this moment making life more difficult for a huge number 
of Linux users who use Xen, including Mozilla, Debian, and Amazon.    
Adding dom0 support would make Linux even more useful to a wide variety 
of people not using Xen at the moment. 

Saying that dom0 support is "not about improving Linux" completely 
ignores the cost people are paying right now, and the benefits people 
could have.  That is (if I understand him) what Jeremy meant by saying it
was treating it as if it was some kind of "niche usage, with barely more 
users than Voyager", and "being a pure drain".
> I don't see any animosity at all in what Ingo has said.
>   
The last few paragraphs of the e-mail weren't about that particular 
argument, but about the sum of the interaction with Ingo over dom0 
support for the last 6 months.  If you read the various threads, it's 
pretty clear that Ingo is resistant to accepting dom0 changes, for 
whatever reason, and has been looking for reasons not to include it. 

If we take him at his word, that the root issue is that he fundamentally 
dislikes the design choice of running Linux-as-hypervisor-component, 
then we have a difference of opinion and we're just going to have to 
agree to disagree.  But there are reasons to include it anyway, 
including benefits to existing Xen users and potential Xen users (who 
have decided not to use KVM for whatever reason), and the idea of 
survival-of-the-fittest: Xen and KVM have made different design choices, 
let's let them both grow and see which one thrives.  If KVM's design is 
uniformly superior, eventually Xen will die off.  But I suspect that 
there's significant demand in the OSS virtualization ecology for both 
approaches, and the world will be the worse for dom0 support being 
out-of-tree.

In any case, making unreasonable or inconsistent technical objections, 
when the root issue is actually something else, is a waste of time 
and energy for everyone involved.

 -George

^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant (was: Re: Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
  2009-05-28 22:23                     ` Luke S Crawford
  2009-05-29  1:00                       ` Tim Post
@ 2009-05-29 13:42                       ` Dan Magenheimer
  2009-05-30 21:02                         ` Luke S Crawford
  1 sibling, 1 reply; 183+ messages in thread
From: Dan Magenheimer @ 2009-05-29 13:42 UTC (permalink / raw)
  To: Luke S Crawford, echo; +Cc: Xen-devel

> With the move to Xen, suddenly the heavy user was the only user
> seeing the slowness.    Now the heavy user has the option of paying
> me more money for more ram to use as disk cache, or of dealing with it
> being slow.  Light users had no more trouble.  Log in once 
> every 3 months?
> your /etc/passwd is still cached from last time.  

Am I understanding this correctly that you are "renting" a
fixed partition of physical RAM that (assuming the physical
server never reboots) persistently holds one VSP customer's
VM's memory forever, never saved to disk?

Although I can see this being advantageous for some users,
no matter how cheap RAM is, having RAM sit "idle" for months
(or even minutes) seems a dreadful waste of resources,
which is either increasing the price of the service or the
cost to the provider for a very small benefit for a
small number of users.  I see it as akin to every VM
computing pi in a background process because, after all,
the CPU has nothing better to do if it was going to be
idle anyway.

While I can see how the current sorry state of memory management
by OS's and hypervisors might lead to this business decision,
my goal is to make RAM a much more "renewable" resource.
The same way CPU's are adding power management so that
they can be shut down when idle even for extremely small
periods of time to conserve resources, I'd like to see
"idle memory" dramatically reduced.  Self-ballooning and
tmem are admittedly only a step in that direction, but
at least it is (I hope) the right direction.

Dan

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-05-29 12:01                     ` George Dunlap
@ 2009-05-29 14:14                       ` Pasi Kärkkäinen
  -1 siblings, 0 replies; 183+ messages in thread
From: Pasi Kärkkäinen @ 2009-05-29 14:14 UTC (permalink / raw)
  To: George Dunlap
  Cc: David Miller, jeremy, mingo, Dan Magenheimer, avi, xen-devel,
	x86, linux-kernel, Keir Fraser, torvalds, gregkh, kurt.hackel,
	Ian Pratt, xen-users, ksrinivasan, EAnderson, wimcoekaerts,
	Stephen Spector, jens.axboe, npiggin

On Fri, May 29, 2009 at 01:01:18PM +0100, George Dunlap wrote:
> David Miller wrote:
> >I don't see Ingo's comments, whether I agree with them or not, as
> >an implication of Xen being niche.  Rather I see his comments as
> >an opposition to how Xen is implemented.
> >  
> It's in his definition of "improving Linux".  Jeremy is saying that 
> allowing Linux to run as dom0 *is* improving Linux.  The lack of dom0 
> support is at this moment making life more difficult for a huge number 
> of Linux users who use Xen, including Mozilla, Debian, and Amazon.    
> Adding dom0 support would make Linux even more useful to a wide variety 
> of people not using Xen at the moment. 
> 

As stated earlier, there is a huge amount of Xen in use all around
the globe for server/datacenter virtualization. Personally I know of many Xen
installations in production, but not a single KVM installation (I'm sure those
exist as well, but personally I haven't seen any).

At the moment it's pretty painful for distro developers to ship dom0-enabled
kernels (most distros either ship one or are waiting for an upstream
dom0-enabled kernel), and also for the many advanced users who build their own
custom Xen-based solutions.

The current situation is not good for anyone. We really need Xen dom0
support in mainline Linux.

Just my 2 eurocents.

-- Pasi

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-05-29 12:01                     ` George Dunlap
  (?)
  (?)
@ 2009-05-29 18:34                     ` Andi Kleen
  2009-05-29 21:31                         ` Jeremy Fitzhardinge
  2009-05-29 23:09                         ` Nakajima, Jun
  -1 siblings, 2 replies; 183+ messages in thread
From: Andi Kleen @ 2009-05-29 18:34 UTC (permalink / raw)
  To: George Dunlap; +Cc: jeremy, xen-devel, Keir Fraser, x86, linux-kernel

George Dunlap <george.dunlap@eu.citrix.com> writes:

cc list from hell trimmed. 

> allowing Linux to run as dom0 *is* improving Linux.  The lack of dom0
> support is at this moment making life more difficult for a huge number
> of Linux users who use Xen, including Mozilla, Debian, and Amazon.
> Adding dom0 support would make Linux even more useful to a wide
> variety of people not using Xen at the moment.

Perhaps one way to address this problem would be to make the Dom0
interface less intrusive for the host OS?

My impression last time I looked was that there was huge potential
for improvement in this area. For example the PAT issue
recently discussed was completely unnecessary.  Or if you
added a "VT/SVM only" Dom0 mode I'm sure the interface
would be significantly cleaner too. If you can come up
with a slim clean interface the chances for actual integration
would likely be much higher.

And if people want to update the Dom0 they surely could update the
hypervisor to one with cleaner interfaces too.

I understand that the DomU Xen ABI is becoming a kind of standard
and should be supported, but that's far from true for Dom0.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-05-29 14:14                       ` Pasi Kärkkäinen
  (?)
@ 2009-05-29 21:29                       ` David Miller
  -1 siblings, 0 replies; 183+ messages in thread
From: David Miller @ 2009-05-29 21:29 UTC (permalink / raw)
  To: pasik
  Cc: george.dunlap, jeremy, mingo, dan.magenheimer, avi, xen-devel,
	x86, linux-kernel, Keir.Fraser, torvalds, gregkh, kurt.hackel,
	Ian.Pratt, xen-users, ksrinivasan, EAnderson, wimcoekaerts,
	stephen.spector, jens.axboe, npiggin

From: Pasi Kärkkäinen <pasik@iki.fi>
Date: Fri, 29 May 2009 17:14:39 +0300

> We really need Xen dom0 support in mainline Linux.

Whether we want a feature is separate from making sure its
implementation is up to snuff and doesn't suck.

But the concentration of the talk seems to be on wanting the feature,
and that's only half the story.

I'm getting sick of hearing over and over how many people use Xen;
that point has been made succinctly, so let's move on, ok?


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Xen-devel] Re: Xen is a feature
  2009-05-29 18:34                     ` Andi Kleen
@ 2009-05-29 21:31                         ` Jeremy Fitzhardinge
  2009-05-29 23:09                         ` Nakajima, Jun
  1 sibling, 0 replies; 183+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-29 21:31 UTC (permalink / raw)
  To: Andi Kleen; +Cc: George Dunlap, xen-devel, Keir Fraser, x86, linux-kernel

Andi Kleen wrote:
> George Dunlap <george.dunlap@eu.citrix.com> writes:
>
> cc list from hell trimmed. 
>
>   
>> allowing Linux to run as dom0 *is* improving Linux.  The lack of dom0
>> support is at this moment making life more difficult for a huge number
>> of Linux users who use Xen, including Mozilla, Debian, and Amazon.
>> Adding dom0 support would make Linux even more useful to a wide
>> variety of people not using Xen at the moment.
>>     
>
> Perhaps one way to address this problem would be to make the Dom0
> interface less intrusive for the host OS?
>   

I'm certainly not deaf to criticism along those lines, and I'm looking 
at ways of cleaning up/decoupling those interactions.

But my frustration arises from the fact that there's been a total stall 
on merging any of the pieces, even the ones which are either 
uncontroversial, or purely xen-internal changes.

    J

^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [Xen-devel] Re: Xen is a feature
  2009-05-29 18:34                     ` Andi Kleen
@ 2009-05-29 23:09                         ` Nakajima, Jun
  2009-05-29 23:09                         ` Nakajima, Jun
  1 sibling, 0 replies; 183+ messages in thread
From: Nakajima, Jun @ 2009-05-29 23:09 UTC (permalink / raw)
  To: Andi Kleen, George Dunlap
  Cc: jeremy, xen-devel, Keir Fraser, x86, linux-kernel

On 5/29/2009 11:34:40 AM, Andi Kleen wrote:
> George Dunlap <george.dunlap@eu.citrix.com> writes:
>
> cc list from hell trimmed.
>
> > allowing Linux to run as dom0 *is* improving Linux.  The lack of
> > dom0 support is at this moment making life more difficult for a huge
> > number of Linux users who use Xen, including Mozilla, Debian, and Amazon.
> > Adding dom0 support would make Linux even more useful to a wide
> > variety of people not using Xen at the moment.
>
> Perhaps one way to address this problem would be to make the Dom0
> interface less intrusive for the host OS?
>
> My impression last time I looked was that there was huge potential
> for improvement in this area. For example the PAT issue recently
> discussed was completely unnecessary.  Or if you added a "VT/SVM only"
> Dom0 mode I'm sure the interface would be significantly cleaner too.
> If you can come up with a slim clean interface the chances for actual
> integration would likely be much higher.

I think we still need some (or all?) of the additional dom0 PV ops even
for an HVM (Hardware-based VM) dom0. Hardware-based virtualization can
significantly clean up the CPU-related PV ops (including some for the
local APIC), but those have nothing to do with dom0.

Some hooks in the host could be removed by reusing the HVM-specific code
with modifications to the virtualization logic, but to be fair, I think
people need to say which specific ones are intrusive.

             .
Jun Nakajima | Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Xen-devel] Re: Xen is a feature
  2009-05-29 23:09                         ` Nakajima, Jun
@ 2009-05-29 23:26                           ` Jeremy Fitzhardinge
  -1 siblings, 0 replies; 183+ messages in thread
From: Jeremy Fitzhardinge @ 2009-05-29 23:26 UTC (permalink / raw)
  To: Nakajima, Jun
  Cc: Andi Kleen, George Dunlap, x86, xen-devel, Keir Fraser,
	linux-kernel, Ingo Molnar, Jiang, Yunhong

Nakajima, Jun wrote:
> I think we still need some (or all?) of additional dom0 PV ops even for HVM (Hardware-based VM) dom0. Hardware-based virtualization can significantly clean up the CPU-related PV ops (including some for local APIC), but they have nothing to do with dom0.
>
> Some hooks in the host could be removed by reusing the HVM-specific code with modifications to the virtualization logic, but I think people need to tell which specific ones are intrusive, to be fair.
>   

I think two things will significantly clean up the dom0 apic patches:

    One is to adjust the LAPIC and IOAPIC probing code so that it
    behaves correctly if the APIC cpuid flag is clear.  That would
    remove a lot of the init-time ad-hoc Xen changes I made.

    The other is to implement Ingo's suggestion of a proper ioapic
    driver layer.  I think that would not only resolve the low-level
    IO-APIC register access issue, but probably clean up a lot of the
    vector allocation/handling, and make a clear path for MSI support. 
    With luck it will also clean up things like x2apic support

I'm planning on putting some time into investigating these next week.

Once we've nailed down the details of how to make PAT work for PV guests 
on the Xen side, we should be able to implement that fairly easily in 
Linux with no core x86 changes.

I really don't think emulating MTRR register writes is the right way to 
implement Xen MTRR support, given that a much more semantically 
appropriate interface already exists, but we can do that if nothing else 
gets merged.

IanC is restructuring the swiotlb changes in a way that I hope will be 
acceptable to all.

At that point, I think we really will have resolved all the high-level 
concerns expressed about the overall architecture of the patches, and 
maybe we can finally see some progress.

    J

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant
  2009-05-28 13:39                   ` Tim Post
  2009-05-28 22:23                     ` Luke S Crawford
@ 2009-05-30  1:10                     ` Michael David Crawford
  1 sibling, 0 replies; 183+ messages in thread
From: Michael David Crawford @ 2009-05-30  1:10 UTC (permalink / raw)
  To: Xen-devel

Tim Post wrote:
> What, exactly is cowboy hackery? A dom-0 that might be a little slower
> if you boot it without Xen? 

I unpacked the source RPM of some kernel (I don't recall whether it was 
CentOS's or Fedora's), and there were something like a hundred patches 
applied to the original kernel.org sources.

I don't doubt that many of those were a good idea, but it would have 
been a great deal of work simply to determine which ones were both 
important and properly implemented.

Mike
-- 
Michael David Crawford
mdc@prgmr.com

    prgmr.com - We Don't Assume You Are Stupid.

       Xen-Powered Virtual Private Servers: http://prgmr.com/xen

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Xen-devel] Xen is a feature
  2009-05-29  0:45                 ` Jeremy Fitzhardinge
                                   ` (2 preceding siblings ...)
  (?)
@ 2009-05-30  2:19                 ` Andy Burns
  -1 siblings, 0 replies; 183+ messages in thread
From: Andy Burns @ 2009-05-30  2:19 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Ingo Molnar, Nick Piggin, Dan Magenheimer, Xen-devel,
	Wim Coekaerts, Ian Pratt, Stephen Spector, George Dunlap,
	Kurt C. Hackel, the arch/x86 maintainers,
	Linux Kernel Mailing List, xen-users, Avi Kivity, Eric Anderson,
	Jens Axboe, Ky Srinivasan, Linus Torvalds, Greg KH, Keir Fraser

2009/5/29 Jeremy Fitzhardinge <jeremy@goop.org>:

> Ingo Molnar wrote:
>>
>> Xen changes - especially dom0 - are overwhelmingly not about improving
>> Linux, but about having some special hook and extra treatment in random
>> places - and that's really bad.
>>
>
> You've made this argument a few times now, and I take exception to it.
>
> There are at least 500k servers
> running Xen in commercial user sites (and untold numbers of smaller sites
> and personal users), running millions of virtual guest domains.
> To them, improved Xen support *is* "improving Linux".

Well said. I use Xen both personally and in my business, accounting for
a dozen or so of those unseen millions of domUs. I've bitten my tongue for
months while watching Xen developers jump through hoops to get pv_ops
dom0 into the mainstream, only to be knocked back or left until the next
merge window, and the next, and the next.

Sure, there were "the bad old days" of Xen's history, but with the
developers having been asked to go the pv_ops route, I feel that keeping
dom0 out of mainstream is not just failing to improve Linux, but actually
hurting users and trapping them on ancient kernels which are missing newer
hardware support.

Sure, I wouldn't like to see any old rubbish merged into the kernel,
but I'm amazed at Jeremy's patience over this.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant (was: Re: Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
  2009-05-29 13:42                       ` Dan Magenheimer
@ 2009-05-30 21:02                         ` Luke S Crawford
  2009-05-31 16:44                           ` Tim Post
  2009-06-01 18:04                           ` Dan Magenheimer
  0 siblings, 2 replies; 183+ messages in thread
From: Luke S Crawford @ 2009-05-30 21:02 UTC (permalink / raw)
  To: Dan Magenheimer; +Cc: Xen-devel, echo

Dan Magenheimer <dan.magenheimer@oracle.com> writes:

> > With the move to Xen, suddenly the heavy user was the only user
> > seeing the slowness.    Now the heavy user has the option of paying
> > me more money for more ram to use as disk cache, or of dealing with it
> > being slow.  Light users had no more trouble.  Log in once 
> > every 3 months?
> > your /etc/passwd is still cached from last time.  
> 
> Am I understanding this correctly that you are "renting" a
> fixed partition of physical RAM that (assuming the physical
> server never reboots) persistently holds one VSP customer's
> VM's memory forever, never saved to disk?

Yes.   Exactly.  If you rent a 1 GB VPS from me, the way I see it,
you are renting 1/32nd of one of my 32GiB servers (and paying a
premium for the privilege).  Because giving you extra CPU when nobody
else wants it costs me very little, I'll give you up to a full core
if nobody else needs it, so that's a small bonus.

> Although I can see this being advantageous for some users,
> no matter how cheap RAM is, having RAM sit "idle" for months
> (or even minutes) seems a dreadful waste of resources,
> which is either increasing the price of the service or the
> cost to the provider for a very small benefit for a
> small number of users.  I see it as akin to every VM
> computing pi in a background process because, after all,
> the CPU has nothing better to do if it was going to be
> idle anyway.

Wait, what?  The difference is that if you aren't using the CPU, I can take
it away and then give it back to you almost immediately when you want it,
at a small cost (flushing the CPU cache, but that is fast enough
that while it's a big deal for scientific-type applications, it doesn't
really make the perceived responsiveness of the box worse, unless you
do it a bunch of times in a small period of time).

Ram is different.  If I take away your pagecache, either I save it to 
disk (slow) and restore it (slow) when I return it, or I take it from you
without saving to disk, and return clean pages when you want it back,
meaning if you want that data you've got to re-read from disk. (slow)

By slow, I mean slow enough that you notice.  You type a command and sit,
wondering what the problem is with this cheap piece of crap you rented from
me, while the disk seeks.

Hitting disk brings the performance of nearly anything well into
'unacceptable' even when you use the expensive 10K disks, especially
when you have a bunch of people hitting those same disks.
(And I and all competitors I know of within an order of magnitude of
my pricing use 7500rpm SATA, exacerbating the problem, but the difference
between 10K SAS and 7.5K SATA is not many orders of magnitude, the way the
difference between RAM and disk is.)

This does not help 'a few users'; it massively increases the perceived
responsiveness of nearly all VPSs.  What if you only get a website hit
every 10 minutes?  Would you be satisfied if that hit took north of a second
to return because it had to hit disk every time?   I wouldn't.

Would you complain if there was often north of a 1500ms delay between
when you typed a command and when you got a response?  I can tell you
that my customers did, when I used a shared pagecache.  (And yeah,
that was on 10K fibre disks in RAID 1+0.)

Solving these problems is what pagecache is for.

> While I can see how the current sorry state of memory management
> by OS's and hypervisors might lead to this business decision,
> my goal is to make RAM a much more "renewable" resource.
> The same way CPU's are adding power management so that
> they can be shut down when idle even for extremely small
> periods of time to conserve resources, I'd like to see
> "idle memory" dramatically reduced.  Self-ballooning and
> tmem are admittedly only a step in that direction, but
> at least it is (I hope) the right direction.

I keep saying, Pagecache is not idle ram.   Pagecache is essential to the
perception of acceptable system performance.  I've tried selling service
(on 10K fibre disk, no less) with shared pagecache, and by all reasonable
standards, performance was unacceptable.  

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant (was: Re: Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
  2009-05-30 21:02                         ` Luke S Crawford
@ 2009-05-31 16:44                           ` Tim Post
  2009-05-31 17:00                             ` Tim Post
  2009-06-01 18:04                           ` Dan Magenheimer
  1 sibling, 1 reply; 183+ messages in thread
From: Tim Post @ 2009-05-31 16:44 UTC (permalink / raw)
  To: Luke S Crawford; +Cc: Dan Magenheimer, Xen-devel

On Sat, 2009-05-30 at 17:02 -0400, Luke S Crawford wrote:

> I keep saying, Pagecache is not idle ram.   Pagecache is essential to the
> perception of acceptable system performance.  I've tried selling service
> (on 10K fibre disk, no less) with shared pagecache, and by all reasonable
> standards, performance was unacceptable.

I've never seen automatic overcommitment work out in a way that everyone
was happy in the hosting industry. You are 100% correct, by default
Linux is like pac man gobbling up blocks for cache.

However, this is partly because even most well written services and
applications neglect to advise the kernel to do anything different.
posix_madvise() and posix_fadvise() do not see the light of day nearly
as often as they should. Are you parsing some m4 generated configuration
file that's just under or north of the system page size? You'd then want
to tell the kernel "Hey, I only need this once .. " prior to even
talking to read(). Yet I see people going hog wild with O_DIRECT because
they think it's supposed to make things faster.
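
A minimal sketch of the kind of hint described above, assuming a small
config file that is read exactly once. The helper name read_once() is
made up for illustration; posix_fadvise() and the POSIX_FADV_* flags are
the real POSIX interface. The calls are advisory only, so how much the
kernel actually does with them depends on the kernel version.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a file once into buf (at most cap bytes),
 * then tell the kernel the cached pages will not be needed again.
 * Returns the number of bytes read, or -1 on error.
 */
static ssize_t read_once(const char *path, char *buf, size_t cap)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Hints before reading: sequential access, data used only once.
     * A length of 0 means "to the end of the file". */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE);

    size_t total = 0;
    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)
            break;              /* end of file */
        total += (size_t)n;
    }

    /* Hint after reading: the cached pages may be dropped. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return (ssize_t)total;
}
```

Unlike O_DIRECT, this still goes through the page cache for the read
itself; it only tells the kernel not to hold on to the pages afterwards.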

On enterprise systems (i.e. not hosting web sites and databases that are
created by others and uploaded), this is less of a hassle and a bit
easier to manage. You _know_ better than to make 1500 static HTML pages
360K long each and put them where Google can access them. You _know_
better than to mix services that allocate 20x more than they actually
need on the same host. You're able to adjust your swappiness on a whole
group of domains instantly from a central place. Finally, you're able to
patch your services so they better suit your goals.

What Dan is describing is very useful, but not to IAAS providers. Like I
said before, I would not flip a switch to AUTO on any server that is
providing the use of a VM to a customer. However, customers do get
e-mails saying "You bought 1 GB, on average this month you've used only
xxx (detail averages sampled through /proc and sysinfo()) you may wish
to switch to a cheaper plan". Sound nuts? It actually makes more money,
because our density per server goes up quite a bit.

So in a large way, I think Dan is correct. If a client bought the use of
memory and barely uses it, I'd rather give them a discount for giving
some back, enabling me to set up another domain on that node. But don't
get me wrong, I'd never dream of doing that 'automagically' :)

Cheers,
--Tim

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant (was: Re: Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
  2009-05-31 16:44                           ` Tim Post
@ 2009-05-31 17:00                             ` Tim Post
  2009-05-31 19:48                               ` Dan Magenheimer
  0 siblings, 1 reply; 183+ messages in thread
From: Tim Post @ 2009-05-31 17:00 UTC (permalink / raw)
  To: Luke S Crawford; +Cc: Dan Magenheimer, Xen-devel

Sorry, hit send too quickly:

On Mon, 2009-06-01 at 00:44 +0800, Tim Post wrote:

> So in a large way, I think Dan is correct. If a client bought the use of
> memory and barely uses it, I'd rather give them a discount for giving
> some back, enabling me to set up another domain on that node. But don't
> get me wrong, I'd never dream of doing that 'automagically' :)

I meant to add, if an overcommit feature could just make and log
suggestions, it would eliminate a ton of userspace hackery. Thus, it
would be very useful to hosts (albeit in a neutered form).

Most hosts would gladly deal with sed, grep and awk vs libxc and
libxs :)

Cheers,
--Tim

^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant (was: Re: Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
  2009-05-31 17:00                             ` Tim Post
@ 2009-05-31 19:48                               ` Dan Magenheimer
  2009-06-02  0:15                                 ` Luke S Crawford
  0 siblings, 1 reply; 183+ messages in thread
From: Dan Magenheimer @ 2009-05-31 19:48 UTC (permalink / raw)
  To: echo, Luke S Crawford; +Cc: Xen-devel

> > So in a large way, I think Dan is correct. If a client bought the use of
> > memory and barely uses it, I'd rather give them a discount for giving
> > some back, enabling me to set up another domain on that node. But don't
> > get me wrong, I'd never dream of doing that 'automagically' :)
> 
> I meant to add, if an overcommit feature could just make and log
> suggestions, it would eliminate a ton of userspace hackery. Thus, it
> would be very useful to hosts (albeit in a neutered form).
> 
> Most hosts would gladly deal with sed, grep and awk vs libxc and
> libxs :)

Tmem with self-ballooning can be controlled on a guest-by-guest
basis, dynamically and with fairly good granularity.  So
you need not turn overcommit "on" or "off".  And there is no
hypervisor-based swapping which is invisible to the guest;
overcommit requires guests to provide swap space and
if they don't balloon down (voluntarily) and don't exceed
their RAM, they don't use it.

Picture this (and assume tools exist to help you measure
and manage it):  Each user is billed only for the resources
they use, including RAM.  RAM "optimization" can be controlled
by the user via a menu (or slider bar for more granularity);
at one extreme, RAM (and more specifically page cache) is
aggressively reduced... but only if another VM is demanding
it.  On the other extreme, fixed maximum RAM is fully owned
by the user, and it sits idle if not in use.  The user
can choose dynamically whether to pay more for fast responsiveness,
or to pay less and surrender RAM if needed elsewhere, with
some probability for slower responsiveness.

In other words, this is like the option that some power
utilities are providing to give you a discount if you are
willing to let them shut off your air conditioning or
water heater at peak load.

Note that these tools DON'T exist today... and I don't plan
on writing them.  I'm just working at the hypervisor level
to ensure that memory utilization can be more effective and
flexible (and measurable when the flexibility is used).

Does that sound more attractive to an IAAS provider?

^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant (was: Re: Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
  2009-05-30 21:02                         ` Luke S Crawford
  2009-05-31 16:44                           ` Tim Post
@ 2009-06-01 18:04                           ` Dan Magenheimer
  1 sibling, 0 replies; 183+ messages in thread
From: Dan Magenheimer @ 2009-06-01 18:04 UTC (permalink / raw)
  To: Luke S Crawford; +Cc: Xen-devel, echo

Not to beat this to death, but one more comment:

> wait what?  the difference is if you aren't using the CPU, I can take
> it away, and then give it back to you when you want it almost immediately,
> with a small cost (of flushing the cpu cache, but that is fast enough
> that while it's a big deal for scientific type applications, it doesn't
> really make the perceived responsiveness of the box worse, unless you
> do it a bunch of times in a small period of time.)
>
> Ram is different.  If I take away your pagecache, either I save it to
> disk (slow) and restore it (slow) when I return it, or I take it from you
> without saving to disk, and return clean pages when you want it back,
> meaning if you want that data you've got to re-read from disk. (slow)

You are technically correct, but I'm not talking about taking
away ALL of the pagecache.  Pagecache is a guess as to what
pages might be used in the future.  A large percentage of those
guesses are wrong and the page will never be used again
and will eventually be evicted. This is what I call "idle memory"
but I love the way Tim Post put it: "Linux is like pac man gobbling
up blocks for cache."

The right long-term answer is for Linux and OS's in general to
get smarter about giving up memory that they know is not going to
be used again, but even if they get smarter, they will never be
omniscient.

So self-ballooning creates pressure on the page cache, making
the OS evict pages that it's not so sure about.  Then tmem acts
as a backup for those pages; if the OS was wrong and the page
is needed again (soon), it can get it right back without a disk
read.
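
As a toy model only (this is not the tmem interface; every name below is
made up, and pages are plain integers), the idea can be sketched as a
small guest cache whose evicted pages land in a second-chance pool
instead of being lost. Ballooning down shrinks the guest cache; a later
access to an evicted page is served from the pool rather than from disk,
as long as the pool has not dropped it in the meantime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SLOTS 4   /* toy guest page cache capacity */
#define POOL_SLOTS  4   /* toy second-chance pool capacity */

enum source { FROM_CACHE, FROM_POOL, FROM_DISK };

struct toy {
    size_t limit;                           /* current cache size, 1..CACHE_SLOTS */
    int cache[CACHE_SLOTS]; size_t ncache;  /* oldest entry first */
    int pool[POOL_SLOTS];   size_t npool;   /* oldest entry first */
};

/* Remove page from a[0..*n); returns true if it was present. */
static bool take(int *a, size_t *n, int page)
{
    for (size_t i = 0; i < *n; i++) {
        if (a[i] == page) {
            for (size_t j = i + 1; j < *n; j++)
                a[j - 1] = a[j];
            (*n)--;
            return true;
        }
    }
    return false;
}

/* Append page; if full, drop the oldest entry first and report it. */
static bool push(int *a, size_t *n, size_t cap, int page, int *dropped)
{
    bool did_drop = false;

    if (*n == cap) {
        *dropped = a[0];
        take(a, n, a[0]);
        did_drop = true;
    }
    a[(*n)++] = page;
    return did_drop;
}

/* Insert into the guest cache; an evicted page goes to the pool. */
static void cache_insert(struct toy *t, int page)
{
    int victim = 0, lost = 0;

    if (push(t->cache, &t->ncache, t->limit, page, &victim))
        (void)push(t->pool, &t->npool, POOL_SLOTS, victim, &lost);
}

/* "Balloon down": shrink the cache, moving the oldest pages to the pool. */
static void balloon_down(struct toy *t, size_t target)
{
    int lost = 0;

    assert(target >= 1 && target <= CACHE_SLOTS);
    t->limit = target;
    while (t->ncache > target) {
        int victim = t->cache[0];
        take(t->cache, &t->ncache, victim);
        (void)push(t->pool, &t->npool, POOL_SLOTS, victim, &lost);
    }
}

/* Access a page and report where it had to come from. */
static enum source access_page(struct toy *t, int page)
{
    assert(t != NULL && t->limit >= 1 && t->limit <= CACHE_SLOTS);
    for (size_t i = 0; i < t->ncache; i++)
        if (t->cache[i] == page)
            return FROM_CACHE;

    enum source src = take(t->pool, &t->npool, page) ? FROM_POOL : FROM_DISK;
    cache_insert(t, page);
    return src;
}
```

The pool is allowed to drop its oldest entry when it fills up, which
mirrors the point that the backup is best-effort: a page that is no
longer in the pool still costs a disk read.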

Clearly this won't help users who leave their VM idle for three
months and then expect instantaneous response, but that's what
I meant by your memory partitioning helping only a few users.

Does that make sense?  Is it at least a step in the right direction?

Dan

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant (was: Re: Re: [GIT PULL] Xen APIC hooks (with io_apic_ops))
  2009-05-31 19:48                               ` Dan Magenheimer
@ 2009-06-02  0:15                                 ` Luke S Crawford
  0 siblings, 0 replies; 183+ messages in thread
From: Luke S Crawford @ 2009-06-02  0:15 UTC (permalink / raw)
  To: Dan Magenheimer; +Cc: Xen-devel, echo

Dan Magenheimer <dan.magenheimer@oracle.com> writes:

> Picture this (and assume tools exist to help you measure
> and manage it):  Each user is billed only for the resources
> they use, including RAM.  RAM "optimization" can be controlled
> by the user via a menu (or slider bar for more granularity);
> at one extreme, RAM (and more specifically page cache) is
> aggressively reduced... but only if another VM is demanding
> it.  On the other extreme, fixed maximum RAM is fully owned
> by the user, and it sits idle if not in use.  The user
> can choose dynamically whether to pay more for fast responsiveness,
> or to pay less and surrender RAM if needed elsewhere, with
> some probability for slower responsiveness.

That sounds excellent for situations where I can quickly and cheaply
move a guest from one piece of physical hardware to another.

> Does that sound more attractive to an IAAS provider?

This is useful in some cases.  Still not in mine;  see, I can't afford
shared storage, so giving me free ram that may only be free for a 
few minutes is of limited utility.   Yeah, I can use it as shared disk
cache for extra heavy disk users, but it's still a more complex model
for the customer to understand, and I can't bring up more guests on that
host.   I could give it to other people on the same host, but 
I think that might be of limited utility, as I don't know how many
customers will be willing to pay for extra capacity if that extra
capacity is only sometimes available.  

But then, I am experimenting with low-cost homebrew OpenSolaris NAS setups,
so if that works out, and I get a working live migration system together,
then this could be useful.  Not as useful as, say, some mechanism for live
or nearly live migration with local storage, but still useful.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-05-29 12:01                     ` George Dunlap
@ 2009-06-02 15:23                       ` Thomas Gleixner
  -1 siblings, 0 replies; 183+ messages in thread
From: Thomas Gleixner @ 2009-06-02 15:23 UTC (permalink / raw)
  To: George Dunlap
  Cc: David Miller, jeremy, mingo, Dan Magenheimer, avi, xen-devel,
	x86, linux-kernel, Keir Fraser, torvalds, gregkh, kurt.hackel,
	Ian Pratt, xen-users, ksrinivasan, EAnderson, wimcoekaerts,
	Stephen Spector, jens.axboe, npiggin

On Fri, 29 May 2009, George Dunlap wrote:
> David Miller wrote:
> > I don't see Ingo's comments, whether I agree with them or not, as
> > an implication of Xen being niche.  Rather I see his comments as
> > an opposition to how Xen is implemented.
> >   
> It's in his definition of "improving Linux".  Jeremy is saying that allowing
> Linux to run as dom0 *is* improving Linux.  The lack of dom0 support is at
> this moment making life more difficult for a huge number of Linux users who

That's exactly the point. Adding dom0 makes life easier for a group of
users who decided to use Xen some time ago, but what Ingo wants is
technical improvement of the kernel.

There are many features which have been widely used in the distro
world where developers tried to push support into the kernel with the
same line of arguments.

The kernel policy always was and still is to accept only those
features which have a technical benefit to the code base.

I'm just picking a few examples:

Aside from paravirt, which seems to expand through arch/x86 like a
hydra, the new patches sprinkle "if (xen_...)" all over the
place. These extra xen dependencies are no improvement, they are a
royal pain in the ... They are sticky once they are merged, simply
because the hypervisor relies on them and we need to provide
compatibility for a long time.
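
To make the two styles concrete, here is a self-contained toy, not
kernel code: all names are invented and the register reads just return
made-up values. Style A is the sprinkled "if (...)" check at the call
site; style B routes the access through an ops table chosen once, which
is the general shape of an ops-based interception such as the
io_apic_ops in this series. Neither is free: every check in style A is a
dependency at a call site, and style B adds an indirect call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table; illustrative only, not the kernel's layout. */
struct toy_apic_ops {
    unsigned int (*read)(unsigned int reg);
};

/* Stand-ins for "real hardware" and "ask the hypervisor" accessors. */
static unsigned int toy_native_read(unsigned int reg) { return reg + 0x100; }
static unsigned int toy_hv_read(unsigned int reg)     { return reg + 0x200; }

static const struct toy_apic_ops toy_native_ops = { .read = toy_native_read };
static const struct toy_apic_ops toy_hv_ops     = { .read = toy_hv_read };

/* Style A: every call site tests the platform itself. */
static int toy_on_hv;

static unsigned int read_with_check(unsigned int reg)
{
    if (toy_on_hv)
        return toy_hv_read(reg);
    return toy_native_read(reg);
}

/* Style B: the platform is chosen once; call sites stay unconditional. */
static const struct toy_apic_ops *toy_active_ops = &toy_native_ops;

static void toy_select_ops(int running_on_hypervisor)
{
    toy_active_ops = running_on_hypervisor ? &toy_hv_ops : &toy_native_ops;
}

static unsigned int read_via_ops(unsigned int reg)
{
    assert(toy_active_ops != NULL && toy_active_ops->read != NULL);
    return toy_active_ops->read(reg);
}
```
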

Aside from that, it grows interfaces like pat_disable() just because the
CPU model of Xen is obviously not able to kill the PAT flags in the
CPUID emulation. Why for heaven's sake do we have a cpuid paravirt op
when we need to separately disable stuff which can be disabled by
paravirt functionality already? I don't see this as an improvement
either, it's simply sloppy hackery.

The changelogs of the patches are partially confusing as hell:

commit 7d2b03ff4ae27b7c9e99a421a5b965f20e4bfaab

    x86: fix up flush_tlb_all
    
    - initialize the locks before the first use
    - make sure preemption is disabled
    
    [ Impact: Bug fixes: boot time warning, and crash ]

This patch is in the Xen queue and I assume it's Xen related, as we
have not seen a boot time warning and crash anywhere with the current
code AFAICT, but the changelog reads like this is some generic bug in
the SMP boot code. There is no hint of Xen, nor of another patch
which caused that problem. While the patch itself is harmless I do not
see what is improved and why the change was necessary in the first
place.

That's what maintainers have to look at, not who is using the code
already and wants to see it merged.

> use Xen, including Mozilla, Debian, and Amazon. Adding dom0 support would
> make Linux even more useful to a wide variety of people not using Xen at the
> moment. 

I really have a hard time seeing why dom0 support makes Linux more
useful to people who do not use it. It does not improve the Linux
experience of Joe User at all.

In fact it could be harmful to the average user, if it's merged in a
crappy way that increases overhead, has a performance cost and draws
away development and maintenance resources from other areas of the
kernel.

Aside from that, it can also hinder the development of a properly
designed hypervisor in Linux: 'why bother with that new stuff, it
might be cleaner and nicer, but we have this Xen dom0 stuff
already?'.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-06-02 15:23                       ` Thomas Gleixner
@ 2009-06-02 16:41                         ` George Dunlap
  -1 siblings, 0 replies; 183+ messages in thread
From: George Dunlap @ 2009-06-02 16:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: David Miller, jeremy, mingo, Dan Magenheimer, avi, xen-devel,
	x86, linux-kernel, Keir Fraser, torvalds, gregkh, kurt.hackel,
	Ian Pratt, xen-users, ksrinivasan, EAnderson, wimcoekaerts,
	Stephen Spector, jens.axboe, npiggin

Thomas Gleixner wrote:
> That's exactly the point. Adding dom0 makes life easier for a group of
> users who decided to use Xen some time ago, but what Ingo wants is
> technical improvement of the kernel.
>
> There are many features which have been widely used in the distro
> world where developers tried to push support into the kernel with the
> same line of arguments.
>
> The kernel policy always was and still is to accept only those
> features which have a technical benefit to the code base.
>   
I can appreciate the idea of resisting the pushing of random features.
Still, your definition of "improving Linux" is lacking.  Obviously
a new scheduler is taking something that's existing and improving it.  
But adding a new filesystem, a new driver, or adding a new feature, such 
as notifications, AIO, a new hardware architecture, or even KVM: How do 
those classify as "technical improvement to the kernel" or "features 
which have technical benefit to the code base" in a way that Xen does not?

If you mean "increases Linux's technical capability", and define Xen as 
outside of Linux, then I think the definition is too small.  After all, 
allowing Linux to run on an ARM processor isn't increasing Linux's
technical capability, it's just allowing a new group of people (people 
with ARM chips) to use Linux.  It's the same with Xen.

No one disputes the idea that changes shouldn't be ugly; no one disputes 
the idea that changes shouldn't introduce performance regressions.  But 
there are patchqueues that are ready, signed-off by other maintainers, 
and which Ingo admits that he has no technical objections to, but 
refuses to merge. 

(His most recent "objection" is his claim that the existing pv_ops
infrastructure (which KVM and others benefit from as well as Xen)
introduces almost a 1% overhead on native in an mm-heavy
microbenchmark.  So he refuses to merge feature Y (dom0 support) until
the Xen community helps technically unrelated existing feature X
(pv_ops) meet some criteria.  So it has nothing to do with the quality
of the patches themselves.)

[Not qualified to speak to the specific technical objections.]

> I really have a hard time to see why dom0 support makes Linux more
> useful to people who do not use it. It does not improve the Linux
> experience of Joe User at all.
>   
If Joe User uses Amazon, he benefits.  If Joe User downloads an Ubuntu 
or Debian distro, and the hosting providers were more secure and had to 
do less work because dom0 was inlined, then he benefits because of the 
lower cost / resources freed to do other things.

But what I was actually talking about is the number of people who don't
use it now but would use it if it were merged in.  There are hundreds of
thousands of instances running now, and more people are choosing to use
it at the moment, even though those who use it have the devil's choice
between doing patching or using a 3-year old kernel.  How many more
would use it if it were in mainline?

> In fact it could be harmful to the average user, if it's merged in a
> crappy way that increases overhead, has a performance cost and draws
> away development and maintenance resources from other areas of the
> kernel.
>   
No one is asking for something to be merged in a crappy way, or with 
unacceptable performance cost.  There are a number of patchqueues that 
Ingo has no technical objections to, but which he still refuses to merge.

"Drawing away development and maintenance resources" is a cost/benefits 
question, and Jeremy's main point was that there is a *high* benefit for 
dom0 being merged into mainline.  The same could be said of almost 
anything: are you suggesting not accepting any more KVM code because it 
might "draw away development and maintenance resources from other areas 
of the kernel"?

> Aside of that it can also hinder the development of a properly
> designed hypervisor in Linux: 'why bother with that new stuff, it
> might be cleaner and nicer, but we have this Xen dom0 stuff
> already?'.
>   
This argument doesn't make any sense.  Would you advocate only having 
one filesystem for fear that people would somehow be discouraged from 
working on a new filesystem?

Even if that were a valid argument, it wouldn't apply in this situation. 
KVM has plenty of mind-share, and the support of RedHat.  Also, I'd 
wager that it's a lot easier for a Linux kernel developer to get 
involved in KVM than in Xen, because they're already familiar with 
Linux.  I don't think anyone working on KVM will be tempted to give up 
just because Xen is also available, unless it becomes clear that 
linux-as-hypervisor isn't the best technical solution; in which case, 
moving to Xen would be the right thing to do anyway.  Merging dom0 Xen 
will in no way interfere with the development of KVM or other 
linux-as-hypervisor projects.

The main point of Jeremy's e-mail was NOT to say, "Lots of people use
this so you should merge it."  He was responding to Xen being treated
like it had no benefit.  It does have a benefit; it is a feature.

 -George

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-06-02 16:41                         ` George Dunlap
@ 2009-06-02 17:28                           ` Chris Friesen
  -1 siblings, 0 replies; 183+ messages in thread
From: Chris Friesen @ 2009-06-02 17:28 UTC (permalink / raw)
  To: George Dunlap
  Cc: Thomas Gleixner, David Miller, jeremy, mingo, Dan Magenheimer,
	avi, xen-devel, x86, linux-kernel, Keir Fraser, torvalds, gregkh,
	kurt.hackel, Ian Pratt, xen-users, ksrinivasan, EAnderson,
	wimcoekaerts, Stephen Spector, jens.axboe, npiggin

George Dunlap wrote:
> Thomas Gleixner wrote:

> No one disputes the idea that changes shouldn't be ugly; no one disputes 
> the idea that changes shouldn't introduce performance regressions.  But 
> there are patchqueues that are ready, signed-off by other maintainers, 
> and which Ingo admits that he has no technical objections to, but 
> refuses to merge.

I can't comment on this part, but if so that seems unfortunate.

> The main point of Jeremy's e-mail was NOT to say, "Lots of people use
> this so you should merge it."  He was responding to Xen being treated
> like it had no benefit.  It does have a benefit; it is a feature.

I don't know about others, but I certainly interpreted a number of posts
as saying exactly that--that it's useful so it should be included.

I don't think anyone is arguing that Xen is not useful or that it should
not ever be included, rather the question is whether the current set of
patches is suitable for addition or whether they are too messy and
should be cleaned up first.

Chris

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-06-02 16:41                         ` George Dunlap
@ 2009-06-02 17:46                           ` Linus Torvalds
  -1 siblings, 0 replies; 183+ messages in thread
From: Linus Torvalds @ 2009-06-02 17:46 UTC (permalink / raw)
  To: George Dunlap
  Cc: Thomas Gleixner, David Miller, jeremy, mingo, Dan Magenheimer,
	avi, xen-devel, x86, linux-kernel, Keir Fraser, gregkh,
	kurt.hackel, Ian Pratt, xen-users, ksrinivasan, EAnderson,
	wimcoekaerts, Stephen Spector, jens.axboe, npiggin



On Tue, 2 Jun 2009, George Dunlap wrote:
>
> idea that changes shouldn't introduce performance regressions.  But there are
> patchqueues that are ready, signed-off by other maintainers, and which Ingo
> admits that he has no technical objections to, but refuses to merge. 

I've seen technical objections in this thread. The whole thing _started_ with 
one, and Thomas brought up others.

As a top-level maintainer, I can also very much sympathise with the "don't 
merge new stuff if there are known problems and no known solutions to 
those issues". Is Ingo supposed to just continue to merge crap, when it's 
admitted that it has problems and pollutes code that he has to maintain?

The fact is (and this is a _fact_): Xen is a total mess from a development 
standpoint. I talked about this in private with Jeremy. Xen pollutes the 
architecture code in ways that NO OTHER subsystem does. And I have never 
EVER seen the Xen developers really acknowledge that and try to fix it.

Thomas pointed to patches that add _explicitly_ Xen-related special cases 
that aren't even trying to make sense. See the local apic thing. 

So quite frankly, I wish some of the Xen people looked themselves in the 
mirror, and then asked themselves "would _I_ merge something ugly like 
that, if it was filling my subsystem with totally unrelated hacks for some 
other crap"?

Seriously.

If it was just the local APIC, fine. But it may be just the local APIC 
code this time around, next time it will be something else. It's been TLB, 
it's been entry_*.S, it's been all over. Some of them are performance 
issues.

I dunno. I just do know that I pointed out the statistics for how 
mindlessly incestuous the Xen patches have historically been to Jeremy. He 
admitted it. I've not seen _anybody_ say that things will improve. 

Xen has been painful. If you give maintainers pain, don't expect them to 
love you or respect you.

So I would really suggest that Xen people should look at _why_ they are 
giving maintainers so much pain.

		Linus

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-06-02 17:46                           ` Linus Torvalds
@ 2009-06-02 18:02                             ` Linus Torvalds
  -1 siblings, 0 replies; 183+ messages in thread
From: Linus Torvalds @ 2009-06-02 18:02 UTC (permalink / raw)
  To: George Dunlap
  Cc: Thomas Gleixner, David Miller, jeremy, mingo, Dan Magenheimer,
	avi, xen-devel, x86, linux-kernel, Keir Fraser, gregkh,
	kurt.hackel, Ian Pratt, xen-users, ksrinivasan, EAnderson,
	wimcoekaerts, Stephen Spector, jens.axboe, npiggin



On Tue, 2 Jun 2009, Linus Torvalds wrote:
> 
> I dunno. I just do know that I pointed out the statistics for how 
> mindlessly incestuous the Xen patches have historically been to Jeremy. He 
> admitted it. I've not seen _anybody_ say that things will improve. 

In case people want to look at this on their own, get a git tree, and run 
the examples I asked Jeremy to run:

        git log --pretty=oneline --full-diff --stat arch/x86/kvm/ |
                grep -v '/kvm' |
                less -S

and then go ahead and do the same except with "xen" instead of "kvm".

Now, once you've done that, ask yourself which one is going to be merged 
easily and without any pushback.

Btw, this is NOT meant to be a "xen vs kvm" thing. Before you react to the 
"kvm" part, replace "arch/x86/kvm" above with "drivers/scsi" or something.

The point? Xen really is horribly badly separated out. It gets way more 
incestuous with other systems than it should. It's entirely possible that 
this is very fundamental to both paravirtualization and to hypervisor 
behavior, but it doesn't matter - it just means that I can well see that 
Xen is a f*cking pain to merge.

So please, Xen people, look at your track record, and look at the issues 
from the standpoint of somebody merging your code, rather than just from 
the standpoint of somebody who whines "I want my code to be merged".

IOW, if you have trouble getting your code merged, ask yourself what _you_ 
are doing wrong.

			Linus

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-06-02 16:41                         ` George Dunlap
@ 2009-06-02 18:59                           ` Thomas Gleixner
  -1 siblings, 0 replies; 183+ messages in thread
From: Thomas Gleixner @ 2009-06-02 18:59 UTC (permalink / raw)
  To: George Dunlap
  Cc: David Miller, jeremy, mingo, Dan Magenheimer, avi, xen-devel,
	x86, linux-kernel, Keir Fraser, torvalds, gregkh, kurt.hackel,
	Ian Pratt, xen-users, ksrinivasan, EAnderson, wimcoekaerts,
	Stephen Spector, jens.axboe, npiggin

On Tue, 2 Jun 2009, George Dunlap wrote:

> Thomas Gleixner wrote:
> > Exactly that's the point. Adding dom0 makes life easier for a group of
> > users who decided to use Xen some time ago, but what Ingo wants is
> > technical improvement of the kernel.
> > 
> > There are many features which have been widely used in the distro
> > world where developers tried to push support into the kernel with the
> > same line of arguments.
> > 
> > The kernel policy always was and still is to accept only those
> > features which have a technical benefit to the code base.
> >   
> I can appreciate the idea of resisting the pushing of random features.  Still,
> your definition of "improving Linux" is still lacking.  Obviously a new
> scheduler is taking something that's existing and improving it.  But adding a
> new filesystem, a new driver, or adding a new feature, such as notifications,
> AIO, a new hardware architecture, or even KVM: How do those classify as
> "technical improvement to the kernel" or "features which have technical
> benefit to the code base" in a way that Xen does not?

There is a huge difference between new filesystems, drivers,
architectures and Xen.

A new filesystem is not intrusive to the filesystem layers, it's not
adding its special cases all over the place. There is not a single "if
(fs_whatever)" hack in the code base. Neither a driver nor a new
architecture adds one either.

If the new functionality needs some extension to the generic code base
then this is carefully added with the maintainers of that code and the
extension is usually useful to other (filesystems, drivers,
architectures) as well. If it's necessary to add some special case for
one architecture then this is done by proper abstraction to keep the
burden and the maintenance cost down.

There is no #ifdef ARCH_ARM in mm/ fs/ kernel/ block/ .....

Talking about KVM, there is not a single "if (kvm)" line in the
arch/x86 code base. There is _ONE_ lonely #ifdef CONFIG_KVM_CLOCK
(which could be eliminated) in the whole x86 codebase, but at least 10
CONFIG_XEN* ones all over the place. The KVM developers went to great
lengths to avoid adding restrictions to the existing code base.
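
To make the contrast concrete, here is a minimal sketch in C of what
"proper abstraction" can look like, as opposed to scattering "if (xen)"
checks through generic code. It is not code from this patch series or from
the kernel: the names ioapic_ops_sketch, native_ops, hv_ops, sketch_set_ops
and sketch_set_bits are hypothetical, and the register arrays are only
stand-ins for real hardware or hypervisor state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ops table: the single indirection point for register access. */
struct ioapic_ops_sketch {
	uint32_t (*read)(unsigned int apic, unsigned int reg);
	void (*write)(unsigned int apic, unsigned int reg, uint32_t val);
};

#define SKETCH_NR_APICS 2
#define SKETCH_NR_REGS  4

/* Stand-in for memory-mapped registers on bare hardware. */
static uint32_t native_regs[SKETCH_NR_APICS][SKETCH_NR_REGS];

/* Stand-in for state owned by a hypervisor, plus a call counter. */
static uint32_t hv_regs[SKETCH_NR_APICS][SKETCH_NR_REGS];
static unsigned int hv_calls;

static uint32_t native_read(unsigned int apic, unsigned int reg)
{
	return native_regs[apic][reg];
}

static void native_write(unsigned int apic, unsigned int reg, uint32_t val)
{
	native_regs[apic][reg] = val;
}

/* The "hypervisor" variants go through their own backing store. */
static uint32_t hv_read(unsigned int apic, unsigned int reg)
{
	hv_calls++;
	return hv_regs[apic][reg];
}

static void hv_write(unsigned int apic, unsigned int reg, uint32_t val)
{
	hv_calls++;
	hv_regs[apic][reg] = val;
}

static const struct ioapic_ops_sketch native_ops = {
	.read = native_read,
	.write = native_write,
};

static const struct ioapic_ops_sketch hv_ops = {
	.read = hv_read,
	.write = hv_write,
};

/* Chosen once during setup; defaults to the native implementation. */
static const struct ioapic_ops_sketch *active_ops = &native_ops;

static void sketch_set_ops(const struct ioapic_ops_sketch *ops)
{
	assert(ops != NULL);
	active_ops = ops;
}

/* Generic caller: read-modify-write with no platform-specific branch. */
static uint32_t sketch_set_bits(unsigned int apic, unsigned int reg,
				uint32_t bits)
{
	uint32_t val = active_ops->read(apic, reg) | bits;

	active_ops->write(apic, reg, val);
	return val;
}
```

In this sketch the generic caller never asks which platform it is running
on; the platform-specific choice is made once, when the ops table is
installed.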

I'm not saying that the Xen folks did not listen to us, they improved
lots of their code base and Jeremy was particularly helpful to unify
the 32/64bit code.

But right now I see a big code dump with subtle details where some of
them are just not acceptable to me.

> If you mean "increases Linux's technical capability", and define Xen as
> outside of Linux, then I think the definition is too small.  After all,
> allowing Linux to run on an ARM processor isn't increasing Linux' technical
> capability, it's just allowing a new group of people (people with ARM chips)
> to use Linux.  It's the same with Xen.

No, it's not. ARM does not interfere with anything and it keeps its
architecture specific limitations confined in arch/arm.

Xen injects its design limitation workarounds into the arch/x86
codebase and burdens developers and maintainers with it.

> No one disputes the idea that changes shouldn't be ugly; no one disputes the
> idea that changes shouldn't introduce performance regressions.  But there are
> patchqueues that are ready, signed-off by other maintainers, and which Ingo
> admits that he has no technical objections to, but refuses to merge. 
> (His most recent "objection" is that he claims the currently existing pv_ops
> infrastructure (which KVM and others benefit from as well as Xen) introduces
> almost a 1% overhead on native in an mm-heavy microbenchmark.  So he refuses
> to merge feature Y (dom0 support) until the Xen community helps technically
> unrelated existing feature X (pv_ops) meets some criteria.  So it has nothing
> to do with the quality of the patches themselves.)

Oh well. It has a lot to do with the quality of the patches. The
design is part of the quality and right now the shortcomings of the
design are papered over by adding Xen restrictions into the x86 code
base.

> [Not qualified to speak to the specific technical objections.]
> > I really have a hard time to see why dom0 support makes Linux more
> > useful to people who do not use it. It does not improve the Linux
> > experience of Joe User at all.
> >   
> If Joe User uses Amazon, he benefits.  If Joe User downloads an Ubuntu or
> Debian distro, and the hosting providers were more secure and had to do less
> work because dom0 was inlined, then he benefits because of the lower cost /
> resources freed to do other things.

Right, then they can concentrate on adding another bunch of out-of-tree
patches to their kernels. Next time you will stand up and make the same
argument for AppArmor, ndiswrapper or whatever people like to use.

> But what I was actually talking about is the number of people who don't use it
> now but would use it if it were merged in.  There are hundreds of thousands of
> instances running now, and more people are choosing to use it at the moment,
> even though those who use it have the devil's choice between doing patching or
> using a 3-year old kernel.  How many more would use it if it were in mainline?

How many more would use ndiswrapper if it were in mainline ?

> > In fact it could be harmful to the average user, if it's merged in a
> > crappy way that increases overhead, has a performance cost and draws
> > away development and maintenance resources from other areas of the
> > kernel.
> >   
> No one is asking for something to be merged in a crappy way, or with
> unacceptable performance cost.  There are a number of patchqueues that Ingo
> has no technical objections to, but which he still refuses to merge.

Right, because the lineup of patches is not completely untangled and
we still have objections against the overall outcome and design of the
Dom0 integration into the kernel proper.

It's not our fault that the Dom0 design decisions were made in total
disconnect to the kernel community and now a "swallow them as is"
policy is imposed on us with the argument that the newer kernels need
to run on ancient hypervisors as well.

You whine about users having to use 3-year-old kernels, but 3-year-old
hypervisors are fine, right ?

I'm not against merging dom0 in general; I object to having to
buy inferior technical solutions which we cannot change for a long
time. Once we have merged them, the "you cannot break existing hypervisors"
argument will be used to prevent any design change and cleanup.

> The main point of Jeremy's e-mail was NOT to say, "Lots of people use this so
> you should merge it."  He was responding to Xen being treated like it had no
> benefit.  It does have a benefit; it is a feature.

Right, a feature which comes with a cost. The cost is the de facto
injection of a dom0 ABI into the arch/x86 code base. A new driver is
a feature as well, but it just adds the feature w/o impact to the
general system.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-06-02 18:02                             ` Linus Torvalds
@ 2009-06-02 18:59                               ` Avi Kivity
  -1 siblings, 0 replies; 183+ messages in thread
From: Avi Kivity @ 2009-06-02 18:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: George Dunlap, Thomas Gleixner, David Miller, jeremy, mingo,
	Dan Magenheimer, xen-devel, x86, linux-kernel, Keir Fraser,
	gregkh, kurt.hackel, Ian Pratt, xen-users, ksrinivasan,
	EAnderson, wimcoekaerts, Stephen Spector, jens.axboe, npiggin

Linus Torvalds wrote:
> The point? Xen really is horribly badly separated out. It gets way more 
> incestuous with other systems than it should. It's entirely possible that 
> this is very fundamental to both paravirtualization and to hypervisor 
> behavior, but it doesn't matter - it just means that I can well see that 
> Xen is a f*cking pain to merge.
>
> So please, Xen people, look at your track record, and look at the issues 
> from the standpoint of somebody merging your code, rather than just from 
> the standpoint of somebody who whines "I want my code to be merged".
>
> IOW, if you have trouble getting your code merged, ask yourself what _you_ 
> are doing wrong.
>   

There is in fact a way to get dom0 support with nearly no changes to 
Linux, but it involves massive changes to Xen itself and requires 
hardware support: run dom0 as a fully virtualized guest, and assign it 
all the resources dom0 can access.  It's probably a massive effort though.

I've considered it for kvm when faced with the "I want a thin 
hypervisor" question: compile the hypervisor kernel with PCI support but 
nothing else (no CONFIG_BLOCK or CONFIG_NET, no device drivers), load 
userspace from initramfs, and assign host devices to one or more 
privileged guests.  You could probably run the host with a heavily 
stripped configuration, and enjoy the slimness while every interrupt 
invokes the scheduler, a context switch, and maybe an IPI for good measure.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-05-29 12:01                     ` George Dunlap
@ 2009-06-02 22:40                       ` Steven Rostedt
  -1 siblings, 0 replies; 183+ messages in thread
From: Steven Rostedt @ 2009-06-02 22:40 UTC (permalink / raw)
  To: George Dunlap
  Cc: David Miller, jeremy, mingo, Dan Magenheimer, avi, xen-devel,
	x86, linux-kernel, Keir Fraser, torvalds, gregkh, kurt.hackel,
	Ian Pratt, xen-users, ksrinivasan, EAnderson, wimcoekaerts,
	Stephen Spector, jens.axboe, npiggin

On Fri, May 29, 2009 at 01:01:18PM +0100, George Dunlap wrote:
>
> If we take him at his word, that the root issue is that he fundamentally  
> dislikes the design choice of running Linux-as-hypervisor-component,  
> then we have a difference of opinion and we're just going to have to  
> agree to disagree.  But there are reasons to include it anyway,  
> including benefits to existing Xen users and potential Xen users (who  
> have decided not to use KVM for whatever reason), and the idea of  
> survival-of-the-fittest: Xen and KVM have made different design choices,  
> let's let them both grow and see which one thrives.  If KVM's design is  
> unilaterally superior, eventually Xen will die off.  But I suspect that  
> there's significant demand in the OSS virtualization ecology for both  
> approaches, and the world will be the worse for dom0 support being  
> out-of-tree.
>

Three years ago, when I was hired by Red Hat, I was put on the Virt team,
and I had to work on Xen. I found it an awkward community to say the least.
But I'll refrain from talking about that experience.

Before I was hired, I was full time developing the -rt patch. I was accustomed
to the way Linux development worked, and felt comfortable with it. I was
very pleased when I left the virt team to go back to work on the -rt patch.
Just before I left, KVM came out. I started playing with it and I once again
felt comfortable in that development. I probably would not have minded working
in the virt team if it was KVM that I was working on. I guess the point I'm
trying to make here is that KVM is developed in a Linux community, Xen is not.

The major difference between KVM and Xen is that KVM _is_ part of Linux. Xen
is not. The reason that this matters is that if we need to make a change to
the way Linux works we can simply make KVM handle the change. That is, you
could think of it as Dom0 and the hypervisor would always be in sync.

If we were to break an interface with Dom0 for Xen then we would have a bunch
of people crying foul about us breaking a defined API. One of Thomas's complaints
(and a valid one) is that once Linux supports an external API it must always
keep it compatible. This will hamper new development in Linux if the APIs are
scattered throughout the kernel without much thought.

Now here's a crazy solution. Merge the Xen hypervisor into Linux ;-)

Give full ownership of Xen to the Linux community. One of your people could be
a maintainer. This way the API between Dom0 and the hypervisor would be an internal
one. If you needed to upgrade Dom0, you also must upgrade the hypervisor, but that
would be fine since the hypervisor would also be in the Kernel proper.

This may not solve all the issues that the x86 maintainers have with the Dom0
patches, but it may help solve the API one.

Yeah, I know, I'll be having snowball fights with Saddam before that happens.

-- Steve


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Merge Xen (the hypervisor) into Linux
  2009-06-02 22:40                       ` Steven Rostedt
@ 2009-06-02 23:28                         ` Ingo Molnar
  -1 siblings, 0 replies; 183+ messages in thread
From: Ingo Molnar @ 2009-06-02 23:28 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: George Dunlap, David Miller, jeremy, Dan Magenheimer, avi,
	xen-devel, x86, linux-kernel, Keir Fraser, torvalds, gregkh,
	kurt.hackel, Ian Pratt, xen-users, ksrinivasan, EAnderson,
	wimcoekaerts, Stephen Spector, jens.axboe, npiggin


* Steven Rostedt <rostedt@goodmis.org> wrote:

> Now here's a crazy solution. Merge the Xen hypervisor into Linux 
> ;-)

That's not that crazy - it's the right technical solution if DOM0 is 
desired for upstream. From what i've seen in DOM0 land the incestuous 
dependencies are really only long-term manageable if the whole thing 
is in a single tree.

A lot of Xen legacies could be dropped: the crazy ring1 hack on 
32-bit, the various wide interfaces to make pure-software 
virtualization limp along. All major CPUs shipped with hardware 
virtualization support in the past 2-3 years, so the availability of 
VMX and SVM can be taken for granted for such a project.

That cuts down on a fair amount of crap. A lot of code on the Linux 
side could be reused, and a pure CONFIG_PCI=y (all other things 
disabled) would provide a "slim hypervisor" instance with a very 
small and concentrated code base. (That 'slim hypervisor' might even 
be built with CONFIG_NOMMU.)

That way dom0 would be a natural extension: a minimal interface 
between Linux-Xen-minimal and the dom0 guest instance.

It's a sane technical model IMO, and makes dom0 a lot more 
palatable. Having in-tree competition to KVM would also obviously be 
good to Linux in general.

	Ingo

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-06-02 22:40                       ` Steven Rostedt
@ 2009-06-02 23:41                         ` Thomas Gleixner
  -1 siblings, 0 replies; 183+ messages in thread
From: Thomas Gleixner @ 2009-06-02 23:41 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: George Dunlap, David Miller, jeremy, mingo, Dan Magenheimer, avi,
	xen-devel, x86, linux-kernel, Keir Fraser, torvalds, gregkh,
	kurt.hackel, Ian Pratt, xen-users, ksrinivasan, EAnderson,
	wimcoekaerts, Stephen Spector, jens.axboe, npiggin

On Tue, 2 Jun 2009, Steven Rostedt wrote:
> If we were to break an interface with Dom0 for Xen then we would have a bunch
> of people crying foul about us breaking a defined API. One of Thomas's complaints
> (and a valid one) is that once Linux supports an external API it must always
> keep it compatible. This will hamper new development in Linux if the APIs are
> scattered throughout the kernel without much thought.
> 
> Now here's a crazy solution. Merge the Xen hypervisor into Linux ;-)

Not as crazy as you might think.
 
> Give full ownership of Xen to the Linux community. One of your people could be
> a maintainer. This way the API between Dom0 and the hypervisor would be an internal

s/API/ABI/ :) 

> one. If you needed to upgrade Dom0, you also must upgrade the hypervisor, but that
> would be fine since the hypervisor would also be in the Kernel proper.
> 
> This may not solve all the issues that the x86 maintainers have with the Dom0
> patches, but it may help solve the API one.

In fact it would resolve the ABI problem once and forever as we could
fix hypervisor / dom0 in sync. hypervisor and dom0 need to run in
lock-step anyway if you want to make useful progress aside from
maintaining versioned interfaces which are known to bloat rapidly.

It's not a big deal to set a flag day which says: update hypervisor
and (dom0) kernel in one go.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: Merge Xen (the hypervisor) into Linux
  2009-06-02 23:28                         ` Ingo Molnar
  (?)
@ 2009-06-03  0:00                         ` Dan Magenheimer
  2009-06-03  0:32                           ` Thomas Gleixner
  2009-06-03  2:43                           ` Theodore Tso
  -1 siblings, 2 replies; 183+ messages in thread
From: Dan Magenheimer @ 2009-06-03  0:00 UTC (permalink / raw)
  To: Ingo Molnar, Steven Rostedt
  Cc: George Dunlap, David Miller, jeremy, avi, xen-devel, x86,
	linux-kernel, Keir Fraser, torvalds, gregkh, kurt.hackel,
	Ian Pratt, xen-users, ksrinivasan, EAnderson, wimcoekaerts,
	Stephen Spector, jens.axboe, npiggin

That sound you heard was 10000 xen-users@lists.xensource.com
all having heart attacks at once.

Need I say more.

> -----Original Message-----
> From: Ingo Molnar [mailto:mingo@elte.hu]
> Sent: Tuesday, June 02, 2009 5:29 PM
> To: Steven Rostedt
> Cc: George Dunlap; David Miller; jeremy@goop.org; Dan Magenheimer;
> avi@redhat.com; xen-devel@lists.xensource.com; x86@kernel.org;
> linux-kernel@vger.kernel.org; Keir Fraser;
> torvalds@linux-foundation.org; gregkh@suse.de; Kurt Hackel; Ian Pratt;
> xen-users@lists.xensource.com; ksrinivasan; EAnderson@novell.com;
> wimcoekaerts@wimmekes.net; Stephen Spector; Jens Axboe; 
> npiggin@suse.de
> Subject: Merge Xen (the hypervisor) into Linux
> 
> 
> 
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > Now here's a crazy solution. Merge the Xen hypervisor into Linux 
> > ;-)
> 
> That's not that crazy - it's the right technical solution if DOM0 is 
> desired for upstream. From what i've seen in DOM0 land the incestuous 
> dependencies are really only long-term manageable if the whole thing 
> is in a single tree.
> 
> A lot of Xen legacies could be dropped: the crazy ring1 hack on 
> 32-bit, the various wide interfaces to make pure-software 
> virtualization limp along. All major CPUs shipped with hardware 
> virtualization support in the past 2-3 years, so the availability of 
> VMX and SVM can be taken for granted for such a project.
> 
> That cuts down on a fair amount of crap. A lot of code on the Linux 
> side could be reused, and a pure CONFIG_PCI=y (all other things 
> disabled) would provide a "slim hypervisor" instance with a very 
> small and concentrated code base. (That 'slim hypervisor' might even 
> be built with CONFIG_NOMMU.)
> 
> That way dom0 would be a natural extension: a minimal interface 
> between Linux-Xen-minimal and the dom0 guest instance.
> 
> It's a sane technical model IMO, and makes dom0 a lot more 
> palatable. Having in-tree competition to KVM would also obviously be 
> good to Linux in general.
> 
> 	Ingo
>

^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: Merge Xen (the hypervisor) into Linux
  2009-06-03  0:00                         ` Dan Magenheimer
@ 2009-06-03  0:32                           ` Thomas Gleixner
  2009-06-03  2:43                           ` Theodore Tso
  1 sibling, 0 replies; 183+ messages in thread
From: Thomas Gleixner @ 2009-06-03  0:32 UTC (permalink / raw)
  To: Dan Magenheimer
  Cc: Ingo Molnar, Steven Rostedt, George Dunlap, David Miller, jeremy,
	avi, xen-devel, x86, linux-kernel, Keir Fraser, torvalds, gregkh,
	kurt.hackel, Ian Pratt, xen-users, ksrinivasan, EAnderson,
	wimcoekaerts, Stephen Spector, jens.axboe, npiggin

On Tue, 2 Jun 2009, Dan Magenheimer wrote:

> That sound you heard was 10000 xen-users@lists.xensource.com
> all having heart attacks at once.
> 
> Need I say more.

Well, you might answer the question whether you are the only survivor
of that mass heart attack. In case you are the only one we can simply
assume that 99.99% of the user base is gone and we can stop the merge
discussion completely. Otherwise we try to find the survivors who
have more to contribute than the tabloid pattern.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Merge Xen (the hypervisor) into Linux
  2009-06-02 23:28                         ` Ingo Molnar
@ 2009-06-03  1:00                           ` Joel Becker
  -1 siblings, 0 replies; 183+ messages in thread
From: Joel Becker @ 2009-06-03  1:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, George Dunlap, David Miller, jeremy,
	Dan Magenheimer, avi, xen-devel, x86, linux-kernel, Keir Fraser,
	torvalds, gregkh, kurt.hackel, Ian Pratt, xen-users, ksrinivasan,
	EAnderson, wimcoekaerts, Stephen Spector, jens.axboe, npiggin

[ Speaking as me, no regard to $EMPLOYER ]

On Wed, Jun 03, 2009 at 01:28:43AM +0200, Ingo Molnar wrote:
> A lot of Xen legacies could be dropped: the crazy ring1 hack on 
> 32-bit, the various wide interfaces to make pure-software 
> virtualization limp along. All major CPUs shipped with hardware 
> virtualization support in the past 2-3 years, so the availability of 
> VMX and SVM can be taken for granted for such a project.

	The biggest reason I personally want Xen to be in mainline is
PVM.  Dropping PVM is, to me, pretty much saying "let's merge Xen
without taking the useful parts."
	I have only two large machines I control.  They're too big to
run as single hosts - it's a waste - but I can leverage cluster testing
by virtualizing them.
	The first machine has HVM support.  The early kind.  It's about
2 years old.  It's so dreadfully slow that I had to go to PVM.  That
runs at very good speeds and I've stopped noticing the virtualization.
The only problem I have is managing the hypervisor bits, because they're
out of tree.
	Now, perhaps that could be fixed.  Someone told me that older
HVM boxen can't be fixed; you need a very recent VMX/SVM to perform
well.  But if it is fixable, then perhaps future plans shouldn't worry
about it.
	The second machine is pre-HVM by a short period.  It is not even
three years old.  I can't run HVM on it, at all.  I can either run PVM
or I can't virtualize.  It has fast CPUs and many GB of RAM.  I can do
an entire four node cluster test on it, with serious (read, memory
intensive) software.  In a PVM-less world, this machine becomes a
single cluster node, and I have to go find three more machines.  Of
course, if I had infinite machines, I wouldn't be worrying about this at
all.
	So I want to see PVM continue for a long time.  I'd like it to
be something I can get with mainline Linux.  I don't care if it is dom0,
dom0 and the hypervisor, whatever.  I just don't want to have to be
patching out-of-tree patches for a pretty basic functionality.
	I don't see 2-3 years as a time frame to assume "everyone has
one."  Otherwise, why does Linux have code for x86_32?  Everyone's had a
64bit system for at least that long.  Sure, that's a straw man.  It goes
both ways.
	Like Chris said, if we have technical hurdles for Xen to cross,
let's get them out in the open and fixed.  If previous Xen developer
interaction has left a bad taste in people's mouths, then the current
crew has to make it up to us.  But we have to be willing to notice
they're doing so.
	At the end of the day, I want to use Linux on my systems.

Joel

-- 

"I almost ran over an angel
 He had a nice big fat cigar.
 'In a sense,' he said, 'You're alone here
 So if you jump, you'd best jump far.'"

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Merge Xen (the hypervisor) into Linux
  2009-06-03  1:00                           ` Joel Becker
@ 2009-06-03  2:00                             ` david
  -1 siblings, 0 replies; 183+ messages in thread
From: david @ 2009-06-03  2:00 UTC (permalink / raw)
  To: Joel Becker
  Cc: Ingo Molnar, Steven Rostedt, George Dunlap, David Miller, jeremy,
	Dan Magenheimer, avi, xen-devel, x86, linux-kernel, Keir Fraser,
	torvalds, gregkh, kurt.hackel, Ian Pratt, xen-users, ksrinivasan,
	EAnderson, wimcoekaerts, Stephen Spector, jens.axboe, npiggin

On Tue, 2 Jun 2009, Joel Becker wrote:

> [ Speaking as me, no regard to $EMPLOYER ]
>
> On Wed, Jun 03, 2009 at 01:28:43AM +0200, Ingo Molnar wrote:
>> A lot of Xen legacies could be dropped: the crazy ring1 hack on
>> 32-bit, the various wide interfaces to make pure-software
>> virtualization limp along. All major CPUs shipped with hardware
>> virtualization support in the past 2-3 years, so the availability of
>> VMX and SVM can be taken for granted for such a project.
>
> 	The biggest reason I personally want Xen to be in mainline is
> PVM.  Dropping PVM is, to me, pretty much saying "let's merge Xen
> without taking the useful parts."


> 	So I want to see PVM continue for a long time.  I'd like it to
> be something I can get with mainline Linux.  I don't care if it is dom0,
> dom0 and the hypervisor, whatever.  I just don't want to have to be
> patching out-of-tree patches for a pretty basic functionality.
> 	I don't see 2-3 years as a time frame to assume "everyone has
> one."  Otherwise, why does Linux have code for x86_32?  Everyone's had a
> 64bit system for at least that long.  Sure, that's a straw man.  It goes
> both ways.

it's always easier to continue to support stuff that you already have in 
place than it is to add new things.

if the non PVM stuff could be added to the kernel, how much would that 
simplify the code needed to support PVM? would that reduce the amount of 
effort that the Xen people need to spend to something that would mean that 
they would be able to keep up with fairly recent kernels?

or what about getting the non PVM version in, and then making the separate 
argument to add PVM support with a different config option ('xen support 
for older CPUs, note there is a performance degradation if this option is 
selected')? distros could support Xen in their main kernel package on new 
hardware, and users like you could enable the slower version.

David Lang

note: I am not an approver in this process, just an interested observer 
(who doesn't use Xen)

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Merge Xen (the hypervisor) into Linux
  2009-06-03  0:00                         ` Dan Magenheimer
  2009-06-03  0:32                           ` Thomas Gleixner
@ 2009-06-03  2:43                           ` Theodore Tso
  2009-06-03  3:42                             ` Steven Rostedt
                                               ` (2 more replies)
  1 sibling, 3 replies; 183+ messages in thread
From: Theodore Tso @ 2009-06-03  2:43 UTC (permalink / raw)
  To: Dan Magenheimer
  Cc: Ingo Molnar, Steven Rostedt, George Dunlap, David Miller, jeremy,
	avi, xen-devel, x86, linux-kernel, Keir Fraser, torvalds, gregkh,
	kurt.hackel, Ian Pratt, xen-users, ksrinivasan, EAnderson,
	wimcoekaerts, Stephen Spector, jens.axboe, npiggin

On Tue, Jun 02, 2009 at 05:00:21PM -0700, Dan Magenheimer wrote:
> That sound you heard was 10000 xen-users@lists.xensource.com
> all having heart attacks at once.
> 
> Need I say more.

So maybe I'm stupid, but why would they be having heart attacks?

It seems like a decent solution to me.  What's being proposed would
make the dom0/hypervisor interface an internal one, always subject to
change.  What's wrong with that?  Presumably the domU/hypervisor
interface would have to remain stable, but why does the
dom0/hypervisor interface have to be sacred and unchanging?  I don't
understand the concern.

			       	     	    - Ted

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Merge Xen (the hypervisor) into Linux
  2009-06-03  2:43                           ` Theodore Tso
@ 2009-06-03  3:42                             ` Steven Rostedt
  2009-06-03  4:49                               ` Dan Magenheimer
  2009-06-03  7:28                             ` Gerd Hoffmann
  2009-06-03  7:28                             ` Gerd Hoffmann
  2 siblings, 1 reply; 183+ messages in thread
From: Steven Rostedt @ 2009-06-03  3:42 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Dan Magenheimer, Ingo Molnar, George Dunlap, David Miller,
	jeremy, avi, xen-devel, x86, linux-kernel, Keir Fraser, torvalds,
	gregkh, kurt.hackel, Ian Pratt, xen-users, ksrinivasan,
	EAnderson, wimcoekaerts, Stephen Spector, jens.axboe, npiggin


On Tue, 2 Jun 2009, Theodore Tso wrote:

> On Tue, Jun 02, 2009 at 05:00:21PM -0700, Dan Magenheimer wrote:
> > That sound you heard was 10000 xen-users@lists.xensource.com
> > all having heart attacks at once.
> > 
> > Need I say more.
> 
> So maybe I'm stupid, but why would they be having heart attacks?

Maybe because they asked for an apple and got an apple pie?

That is, they are pushing hard for an interface for Dom0, and Ingo just 
agreed to take it along with the entire Xen hypervisor ;-)

> 
> It seems like a decent solution to me.  What's being proposed would
> make the dom0/hypervisor interface an internal one, always subject to
> change.  What's wrong with that?  Presumably the domU/hypervisor
> interface would have to remain stable, but why does the
> dom0/hypervisor interface have to be sacred and unchanging?  I don't
> understand the concern.

I know I said it was a crazy idea, but the craziness was not with the 
technical side, or even if it is the correct thing to do. I just don't see 
the Xen team cooperating with the Linux team. But maybe those are the old 
days. Perhaps the rightful place for the Xen hypervisor is in Linux. Xen 
is GPL, right? Thus we could do this even without the permission from 
Citrix.

The Dom0 push of Xen just seems too much like Linux being Xen's sex 
slave, when it should be the other way around. With Linux acquiring the Xen 
hypervisor, I can imagine much more progress in the area of Xen. KVM 
may be a competitor, but the two may also be able to share code, so both 
could benefit.

I'm not as turned off by Paravirt as others (although I've done my share of 
cursing at it), but with Xen inside Linux, we can tame the damage. Progress of 
Xen would speed up since there would be no barrier between the changes in Linux 
and the changes in Xen. That is, they will always be compatible.

-- Steve


^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: Merge Xen (the hypervisor) into Linux
  2009-06-03  3:42                             ` Steven Rostedt
@ 2009-06-03  4:49                               ` Dan Magenheimer
  2009-06-03  4:58                                 ` David Miller
  2009-06-03  5:22                                 ` Steven Rostedt
  0 siblings, 2 replies; 183+ messages in thread
From: Dan Magenheimer @ 2009-06-03  4:49 UTC (permalink / raw)
  To: Steven Rostedt, Theodore Tso
  Cc: Ingo Molnar, George Dunlap, David Miller, jeremy, avi, xen-devel,
	x86, linux-kernel, Keir Fraser, torvalds, gregkh, kurt.hackel,
	Ian Pratt, xen-users, ksrinivasan, EAnderson, wimcoekaerts,
	Stephen Spector, jens.axboe, npiggin

> > On Tue, Jun 02, 2009 at 05:00:21PM -0700, Dan Magenheimer wrote:
> > > That sound you heard was 10000 xen-users@lists.xensource.com
> > > all having heart attacks at once.
> > > 
> > > Need I say more.
> > 
> > So maybe I'm stupid, but why would they be having heart attacks?
> 
> Maybe because they asked for an apple and got an apple pie?
> 
> That is, they are pushing hard for an interface for Dom0, and 
> Ingo just 
> agreed to take it along with the entire Xen hypervisor ;-)

Um, no, he did not.  He and Avi suggested that Xen be completely
rearchitected to suit Linux's preferences. 

A hypervisor is not an operating system.  Yes there is
similarity in a number of pieces of code.  But there's
some similarity between Java and Linux too...

> Perhaps the rightful place for the Xen hypervisor is in 
> Linux. Xen 
> is GPL, right? Thus we could do this even without the permission from 
> Citrix.

(tongue firmly in cheek in case you might assume otherwise)
Linux is GPL right?  Perhaps the rightful place for the Linux
operating system is part of Java.  Thus we could do this even
without the permission from Ingo.

> I just don't see 
> the Xen team cooperating with the Linux team.  But maybe those 
> are the old days. 

Yes, let's fix that.  Let's start turning this discussion towards
how we can cooperate better.

> The Dom0 push of Xen just seems too much like Linux being Xen's sex 
> slave, when it should be the other way around.

I can certainly see how it might feel that way, but it needn't
be... nor the other way around.  But in the end, only the end users
matter.  If we can't cooperate, we simply cede the war to Windows
and Hyper-V.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Merge Xen (the hypervisor) into Linux
  2009-06-03  4:49                               ` Dan Magenheimer
@ 2009-06-03  4:58                                 ` David Miller
  2009-06-03  5:07                                   ` Steven Rostedt
  2009-06-03  5:22                                 ` Steven Rostedt
  1 sibling, 1 reply; 183+ messages in thread
From: David Miller @ 2009-06-03  4:58 UTC (permalink / raw)
  To: dan.magenheimer
  Cc: rostedt, tytso, mingo, george.dunlap, jeremy, avi, xen-devel,
	x86, linux-kernel, Keir.Fraser, torvalds, gregkh, kurt.hackel,
	Ian.Pratt, xen-users, ksrinivasan, EAnderson, wimcoekaerts,
	stephen.spector, jens.axboe, npiggin

From: Dan Magenheimer <dan.magenheimer@oracle.com>
Date: Tue, 2 Jun 2009 21:49:58 -0700 (PDT)

> A hypervisor is not an operating system.

This is a pretty bogus statement if you ask me.

A hypervisor is a software system that provides separation between
protection realms.

It also handles exceptions and "system calls" on behalf of the other
protection realms.

I personally don't see the difference at all.  And since many
hypervisors even do cpu scheduling, the fundamental differences
converge to almost nothing.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Merge Xen (the hypervisor) into Linux
  2009-06-03  4:58                                 ` David Miller
@ 2009-06-03  5:07                                   ` Steven Rostedt
  0 siblings, 0 replies; 183+ messages in thread
From: Steven Rostedt @ 2009-06-03  5:07 UTC (permalink / raw)
  To: David Miller
  Cc: dan.magenheimer, tytso, mingo, george.dunlap, jeremy, avi,
	xen-devel, x86, linux-kernel, Keir.Fraser, torvalds, gregkh,
	kurt.hackel, Ian.Pratt, xen-users, ksrinivasan, EAnderson,
	wimcoekaerts, stephen.spector, jens.axboe, npiggin


On Tue, 2 Jun 2009, David Miller wrote:

> From: Dan Magenheimer <dan.magenheimer@oracle.com>
> Date: Tue, 2 Jun 2009 21:49:58 -0700 (PDT)
> 
> > A hypervisor is not an operating system.
> 
> This is a pretty bogus statement if you ask me.
> 
> A hypervisor is a software system that provides separation between
> protection realms.
> 
> It also handles exceptions and "system calls" on behalf of the other
> protection realms.
> 
> I personally don't see the difference at all.  And since many
> hypervisors even do cpu scheduling, the fundamental differences
> converge to almost nothing.

I recently sat in an Operating Systems class where the Professor was an 
old IBM retiree who worked on the 390 system way back when. He would 
argue the point that an Operating System must do at least two things, 
schedule tasks and manage paging. The Xen hypervisor does both, thus in 
his eyes, it is indeed an Operating System.

-- Steve

P.S. he also thought that filesystem management does not have to be a 
duty of the OS and he hated the fact he had to teach it ;-)


^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: Merge Xen (the hypervisor) into Linux
  2009-06-03  4:49                               ` Dan Magenheimer
  2009-06-03  4:58                                 ` David Miller
@ 2009-06-03  5:22                                 ` Steven Rostedt
  2009-06-03 12:03                                     ` George Dunlap
  1 sibling, 1 reply; 183+ messages in thread
From: Steven Rostedt @ 2009-06-03  5:22 UTC (permalink / raw)
  To: Dan Magenheimer
  Cc: Theodore Tso, Ingo Molnar, George Dunlap, David Miller, jeremy,
	avi, xen-devel, x86, linux-kernel, Keir Fraser, torvalds, gregkh,
	kurt.hackel, Ian Pratt, xen-users, ksrinivasan, EAnderson,
	wimcoekaerts, Stephen Spector, jens.axboe, npiggin


On Tue, 2 Jun 2009, Dan Magenheimer wrote:

> > > On Tue, Jun 02, 2009 at 05:00:21PM -0700, Dan Magenheimer wrote:
> > > > That sound you heard was 10000 xen-users@lists.xensource.com
> > > > all having heart attacks at once.
> > > > 
> > > > Need I say more.
> > > 
> > > So maybe I'm stupid, but why would they be having heart attacks?
> > 
> > Maybe because they asked for an apple and got an apple pie?
> > 
> > That is, they are pushing hard for an interface for Dom0, and 
> > Ingo just 
> > agreed to take it along with the entire Xen hypervisor ;-)
> 
> Um, no, he did not.  He and Avi suggested that Xen be completely
> rearchitected to suit Linux's preferences. 

I was being a bit tongue in cheek with that comment too.

> 
> A hypervisor is not an operating system.

You say potato I say potato (Hmm, that doesn't work in text)

>  Yes there is
> similarity in a number of pieces of code.  But there's
> some similarity between Java and Linux too...

Java can run on hardware?

> 
> > Perhaps the rightful place for the Xen hypervisor is in 
> > Linux. Xen 
> > is GPL, right? Thus we could do this even without the permission from 
> > Citrix.
> 
> (tongue firmly in cheek in case you might assume otherwise)
> Linux is GPL right?  Perhaps the rightful place for the Linux
> operating system is part of Java.  Thus we could do this even
> without the permission from Ingo.

If Java became GPL it could very well do that.

> 
> > I just don't see 
> > the Xen team cooperating with the Linux team.  But maybe those 
> > are the old days. 
> 
> Yes, let's fix that.  Let's start turning this discussion towards
> how we can cooperate better.

Sure.

> 
> > The Dom0 push of Xen just seems too much like Linux being Xen's sex 
> > slave, when it should be the other way around.
> 
> I can certainly see how it might feel that way, but it needn't
> be... nor the other way around.  But in the end, only the end users
> matter.  If we can't cooperate, we simply cede the war to Windows
> and Hyper-V.

When I suggested that Xen be merged into Linux, I did not mean it had to be 
like KVM or lguest where Linux would boot up and run Xen. I mean that 
Xen could still be a micro kernel. The difference would be that its source 
would live in the kernel proper. linux.git/xen?   This way the ABI between 
Xen and Dom0 would always be in sync.

We could even link it in to the vmlinuz, instead of needing the separate 
xen.gz to load first. The vmlinuz could then expand into a Xen 
hypervisor, and also load the Dom0 with it. One image for both entities.

If you want Dom0 ABI in, you have to expect it to change without notice. 
If this breaks Xen, then we don't want to hear any complaints. This means 
that users of Xen would need to make sure that they have the most 
recent hypervisor and kernel and hope that they match.

With the combined image we then get the two to always be together, and no 
problems with the users.

What's the issue with this? You get to keep your "micro hypervisor" design 
that has been stated to be the superior method.

-- Steve


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Merge Xen (the hypervisor) into Linux
  2009-06-03  2:43                           ` Theodore Tso
  2009-06-03  3:42                             ` Steven Rostedt
@ 2009-06-03  7:28                             ` Gerd Hoffmann
  2009-06-03  8:47                               ` Alan Cox
  2009-06-03  7:28                             ` Gerd Hoffmann
  2 siblings, 1 reply; 183+ messages in thread
From: Gerd Hoffmann @ 2009-06-03  7:28 UTC (permalink / raw)
  To: Theodore Tso, Dan Magenheimer, Ingo Molnar, Steven Rostedt,
	George Dunlap, David Miller, jeremy, avi, xen-devel, x86,
	linux-kernel, Keir Fraser, torvalds, gregkh, kurt.hackel,
	Ian Pratt, xen-users, ksrinivasan, EAnderson, wimcoekaerts,
	Stephen Spector, jens.axboe, npiggin

   Hi,

> It seems like a decent solution to me.  What's being proposed would
> make the dom0/hypervisor interface an internal one, always subject to
> change.  What's wrong with that?

Linux is not the only player here.  NetBSD can run as dom0 guest. 
Solaris can run as dom0 guest too.  Thus making the dom0/xen interface 
private to linux and xen isn't going to fly.

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Merge Xen (the hypervisor) into Linux
  2009-06-03  1:00                           ` Joel Becker
@ 2009-06-03  7:59                             ` Alan Cox
  -1 siblings, 0 replies; 183+ messages in thread
From: Alan Cox @ 2009-06-03  7:59 UTC (permalink / raw)
  To: Joel Becker
  Cc: Ingo Molnar, Steven Rostedt, George Dunlap, David Miller, jeremy,
	Dan Magenheimer, avi, xen-devel, x86, linux-kernel, Keir Fraser,
	torvalds, gregkh, kurt.hackel, Ian Pratt, xen-users, ksrinivasan,
	EAnderson, wimcoekaerts, Stephen Spector, jens.axboe, npiggin

> 	The biggest reason I personally want Xen to be in mainline is
> PVM.  Dropping PVM is, to me, pretty much saying "let's merge Xen
> without taking the useful parts."

PVM is and has been for a long time a messaging parallel machine. Can you
not misuse the abbreviation in confusing ways (especially in email I read
in the morning ;))

Merging just hardware assisted vm support initially might be a perfectly
sensible path.

> 	Like Chris said, if we have technical hurdles for Xen to cross,
> let's get them out in the open and fixed.  If previous Xen developer
> interaction has left a bad taste in people's mouths, then the current
> crew has to make it up to us.  But we have to be willing to notice
> they're doing so.

Start by changing the mentality. Right now much of the patched code looks
like  "We made a decision years ago when creating Xen. Now we need to
force that code we wrote into Linux somehow".

Stuff gets merged a lot better if the thinking is "how do we make the
minimal changes to the existing kernel, cleanly and with minimal
inter-relationships". Only after that do you worry about whether
the existing in kernel interfaces are right.

There is a simple reason for this: Changing an interface in the kernel is
a consensus finding process around all visible users of the interface.
It's much easier to do that as a follow up. That way you can bench
alternatives, test if it harms any of the users and merge change sets
that span all the various users of the interface in one go.

It's also frequently the case that when you have a simple clean interface
that doesn't fit some in tree users it becomes blindingly obvious what it
should look like.

So I would suggest the path is
- Use existing interfaces
- Merge chunks of the Xen code without worrying too much about performance
  in Xen but worry in detail about bare metal performance
- Don't worry about "hard" problems initially - eg with PAE just use the
  paravirt CPUID hook and deny having PAE to begin with
- Where there isn't a clean simple interface try as hard as possible to
  build some glue code using existing interfaces in the kernel

When it works, doesn't harm bare metal performance and is merged then go
back and worry about the harder stuff, optimisation and fine tuning. It
doesn't even need to be able to run all guests or all configurations
initially.

Also please can folks get out of the "how do we merge Xen" mentality into
the "How do we create dom0 functionality for Xen in Linux" - don't
pre-suppose the existing implementation is right.

Alan
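
The "deny having PAE" step in the list above can be pictured as a filter
sitting in front of CPUID. What follows is only a minimal sketch in Python,
not the kernel's paravirt code: filtered_cpuid, fake_raw_cpuid and the
register values are hypothetical stand-ins invented for illustration. The
one hardware fact it relies on is that CPUID leaf 1 reports PAE in EDX
bit 6 and PAT in EDX bit 16.

```python
# Hypothetical illustration of hiding a CPU feature behind a CPUID hook.
# Only the bit positions are real: CPUID leaf 1, EDX bit 6 = PAE, bit 16 = PAT.
PAE = 1 << 6
PAT = 1 << 16


def fake_raw_cpuid(leaf):
    """Stand-in for the real CPUID instruction; returns (eax, ebx, ecx, edx)."""
    if leaf == 1:
        return (0x000106A5, 0, 0, PAE | PAT | 0x1)  # made-up register values
    return (0, 0, 0, 0)


def filtered_cpuid(leaf, hidden_edx_bits, raw_cpuid=fake_raw_cpuid):
    """Return CPUID results with the given leaf-1 EDX feature bits cleared."""
    eax, ebx, ecx, edx = raw_cpuid(leaf)
    if leaf == 1:
        # Clear only the requested bits; everything else passes through.
        edx &= ~hidden_edx_bits & 0xFFFFFFFF
    return eax, ebx, ecx, edx


if __name__ == "__main__":
    edx = filtered_cpuid(1, PAE)[3]
    print("PAE visible:", bool(edx & PAE))
    print("PAT visible:", bool(edx & PAT))
```

Code that only ever looks at the filtered result never learns the feature
exists, so nothing else in the kernel has to change to get an initial merge
working. The same filter would hide PAT by passing PAT instead of PAE.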

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Merge Xen (the hypervisor) into Linux
  2009-06-02 23:28                         ` Ingo Molnar
                                           ` (2 preceding siblings ...)
  (?)
@ 2009-06-03  8:07                         ` Christian Tramnitz
  2009-06-04 18:53                           ` Linus Torvalds
  -1 siblings, 1 reply; 183+ messages in thread
From: Christian Tramnitz @ 2009-06-03  8:07 UTC (permalink / raw)
  To: linux-kernel; +Cc: xen-devel, xen-users

Ingo Molnar wrote:
> A lot of Xen legacies could be dropped: the crazy ring1 hack on 
> 32-bit, the various wide interfaces to make pure-software 
> virtualization limp along. All major CPUs shipped with hardware 
> virtualization support in the past 2-3 years, so the availability of 
> VMX and SVM can be taken for granted for such a project.

What a great idea, and while we're doing this let's also drop support
for legacy stuff like PATA and i8042 in mainline. No one will need it
anyway because their successors are on the market for years... let's
just take it for granted that everyone is using SATA and USB nowadays!


Best regards,
   Christian


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Merge Xen (the hypervisor) into Linux
  2009-06-03  7:28                             ` Gerd Hoffmann
@ 2009-06-03  8:47                               ` Alan Cox
  2009-06-03  9:09                                 ` Gerd Hoffmann
  0 siblings, 1 reply; 183+ messages in thread
From: Alan Cox @ 2009-06-03  8:47 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Theodore Tso, Dan Magenheimer, Ingo Molnar, Steven Rostedt,
	George Dunlap, David Miller, jeremy, avi, xen-devel, x86,
	linux-kernel, Keir Fraser, torvalds, gregkh, kurt.hackel,
	Ian Pratt, xen-users, ksrinivasan, EAnderson, wimcoekaerts,
	Stephen Spector, jens.axboe, npiggin

> Linux is not the only player here.  NetBSD can run as dom0 guest. 
> Solaris can run as dom0 guest too.  Thus making the dom0/xen interface 
> private to linux and xen isn't going to fly.

It does not however preclude fixing the dom0 interface.

Anyway we deal with unfixable interfaces on a regular basis with device
hardware. What we don't do is screw up the kernel handling garbage
hardware. We dump the adaption on the driver.

Same with Xen, impedance matching Xen's interface with the kernel is (at
least initially) something that belongs entirely in the Xen glue, or to
get started initially by just turning off stuff.

MTRR, PAE etc can all be turned off for the purpose of an initial merge.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Merge Xen (the hypervisor) into Linux
  2009-06-03  8:47                               ` Alan Cox
@ 2009-06-03  9:09                                 ` Gerd Hoffmann
  2009-06-03  9:20                                     ` Keir Fraser
  2009-06-03 11:15                                   ` Theodore Tso
  0 siblings, 2 replies; 183+ messages in thread
From: Gerd Hoffmann @ 2009-06-03  9:09 UTC (permalink / raw)
  To: Alan Cox
  Cc: Theodore Tso, Dan Magenheimer, Ingo Molnar, Steven Rostedt,
	George Dunlap, David Miller, jeremy, avi, xen-devel, x86,
	linux-kernel, Keir Fraser, torvalds, gregkh, kurt.hackel,
	Ian Pratt, xen-users, ksrinivasan, EAnderson, wimcoekaerts,
	Stephen Spector, jens.axboe, npiggin

On 06/03/09 10:47, Alan Cox wrote:
>> Linux is not the only player here.  NetBSD can run as dom0 guest.
>> Solaris can run as dom0 guest too.  Thus making the dom0/xen interface
>> private to linux and xen isn't going to fly.
>
> It does not however preclude fixing the dom0 interface.

It wasn't my intention to imply that.  The interface can be extended 
when needed.  PAT support will probably be such a case.  Changing it in 
incompatible ways isn't going to work though.

> MTRR, PAE etc can all be turned off for the purpose of an initial merge.

s/PAE/PAT/?  PAE is mandatory ...

Having not-yet supported stuff disabled initially is sensible IMHO.  Can 
be done for MTRR and PAT.  Is already done for MSI ;)

The lapic/ioapic stuff must be sorted though because otherwise you can't 
boot the box at all.  I think the same is true for the swiotlb bits.

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Merge Xen (the hypervisor) into Linux
  2009-06-03  9:09                                 ` Gerd Hoffmann
@ 2009-06-03  9:20                                     ` Keir Fraser
  2009-06-03 11:15                                   ` Theodore Tso
  1 sibling, 0 replies; 183+ messages in thread
From: Keir Fraser @ 2009-06-03  9:20 UTC (permalink / raw)
  To: Gerd Hoffmann, Alan Cox
  Cc: Theodore Tso, Dan Magenheimer, Ingo Molnar, Steven Rostedt,
	George Dunlap, David Miller, jeremy, avi, xen-devel, x86,
	linux-kernel, torvalds, gregkh, kurt.hackel, Ian Pratt,
	xen-users, ksrinivasan, EAnderson, wimcoekaerts, Stephen Spector,
	jens.axboe, npiggin

On 03/06/2009 10:09, "Gerd Hoffmann" <kraxel@redhat.com> wrote:

> On 06/03/09 10:47, Alan Cox wrote:
>>> Linux is not the only player here.  NetBSD can run as dom0 guest.
>>> Solaris can run as dom0 guest too.  Thus making the dom0/xen interface
>>> private to linux and xen isn't going to fly.
>> 
>> It does not however preclude fixing the dom0 interface.
> 
> It wasn't my intention to imply that.  The interface can be extended
> when needed.  PAT support will probably be such a case.  Changing it in
> incompatible ways isn't going to work though.

We're happy to change interfaces where we agree that makes sense.
Compatibility is our own (Xen's) problem of course, and it's generally not
an insurmountable problem -- worst case we can launch dom0 in a varying
environment dependent on a Xen-specific elf note, for example.

 -- Keir



^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Merge Xen (the hypervisor) into Linux
  2009-06-03  9:09                                 ` Gerd Hoffmann
  2009-06-03  9:20                                     ` Keir Fraser
@ 2009-06-03 11:15                                   ` Theodore Tso
  2009-06-03 11:39                                       ` Keir Fraser
                                                       ` (2 more replies)
  1 sibling, 3 replies; 183+ messages in thread
From: Theodore Tso @ 2009-06-03 11:15 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Alan Cox, Dan Magenheimer, Ingo Molnar, Steven Rostedt,
	George Dunlap, David Miller, jeremy, avi, xen-devel, x86,
	linux-kernel, Keir Fraser, torvalds, gregkh, kurt.hackel,
	Ian Pratt, xen-users, ksrinivasan, EAnderson, wimcoekaerts,
	Stephen Spector, jens.axboe, npiggin

On Wed, Jun 03, 2009 at 11:09:39AM +0200, Gerd Hoffmann wrote:
> On 06/03/09 10:47, Alan Cox wrote:
>>> Linux is not the only player here.  NetBSD can run as dom0 guest.
>>> Solaris can run as dom0 guest too.  Thus making the dom0/xen interface
>>> private to linux and xen isn't going to fly.
>>
>> It does not however preclude fixing the dom0 interface.
>
> It wasn't my intention to imply that.  The interface can be extended  
> when needed.  PAT support will probably be such a case.  Changing it in  
> incompatible ways isn't going to work though.

But that means that if there is some fundamentally broken piece of
dom0 design, that the Linux kernel will be stuck with it ***forever***
and it will contaminate code paths and make the code harder to
maintain ***forever*** if we consent to the Xen merge?  Is that really
what you are saying?   Be careful how you answer that....

     	     	       	  	      	  - Ted

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Merge Xen (the hypervisor) into Linux
  2009-06-03 11:15                                   ` Theodore Tso
@ 2009-06-03 11:39                                       ` Keir Fraser
  2009-06-03 11:41                                     ` Gerd Hoffmann
  2009-06-03 11:41                                     ` Gerd Hoffmann
  2 siblings, 0 replies; 183+ messages in thread
From: Keir Fraser @ 2009-06-03 11:39 UTC (permalink / raw)
  To: Theodore Tso, Gerd Hoffmann
  Cc: Alan Cox, Dan Magenheimer, Ingo Molnar, Steven Rostedt,
	George Dunlap, David Miller, jeremy, avi, xen-devel, x86,
	linux-kernel, torvalds, gregkh, kurt.hackel, Ian Pratt,
	xen-users, ksrinivasan, EAnderson, wimcoekaerts, Stephen Spector,
	jens.axboe, npiggin

On 03/06/2009 12:15, "Theodore Tso" <tytso@mit.edu> wrote:

>>> It does not however preclude fixing the dom0 interface.
>> 
>> It wasn't my intention to imply that.  The interface can be extended
>> when needed.  PAT support will probably be such a case.  Changing it in
>> incompatible ways isn't going to work though.
> 
> But that means that if there is some fundamentally broken piece of
> dom0 design, that the Linux kernel will be stuck with it ***forever***
> and it will contaminate code paths and make the code harder to
> maintain ***forever*** if we consent to the Xen merge?  Is that really
> what you are saying?   Be careful how you answer that....

It's not true, if you are prepared for a new dom0 kernel to require a new
version of Xen (which seems not unreasonable). We're happy to make
reasonable interface changes, and deal with compatibility issues as
necessary within Xen.

 -- Keir



^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Merge Xen (the hypervisor) into Linux
  2009-06-03 11:15                                   ` Theodore Tso
  2009-06-03 11:39                                       ` Keir Fraser
  2009-06-03 11:41                                     ` Gerd Hoffmann
@ 2009-06-03 11:41                                     ` Gerd Hoffmann
  2 siblings, 0 replies; 183+ messages in thread
From: Gerd Hoffmann @ 2009-06-03 11:41 UTC (permalink / raw)
  To: Theodore Tso, Alan Cox, Dan Magenheimer, Ingo Molnar,
	Steven Rostedt, George Dunlap, David Miller, jeremy, avi,
	xen-devel, x86, linux-kernel, Keir Fraser, torvalds, gregkh,
	kurt.hackel, Ian Pratt, xen-users, ksrinivasan, EAnderson,
	wimcoekaerts, Stephen Spector, jens.axboe, npiggin

On 06/03/09 13:15, Theodore Tso wrote:
> On Wed, Jun 03, 2009 at 11:09:39AM +0200, Gerd Hoffmann wrote:
>> It wasn't my intention to imply that.  The interface can be extended
>> when needed.  PAT support will probably be such a case.  Changing it in
>> incompatible ways isn't going to work though.
>
> But that means that if there is some fundamentally broken piece of
> dom0 design, that the Linux kernel will be stuck with it ***forever***
> and it will contaminate code paths and make the code harder to
> maintain ***forever*** if we consent to the Xen merge?

No.  Xen is stuck with it forever (or at least for a few releases). 
Even when adding new & better dom0/xen interfaces in the merge process 
Xen has to keep the old ones to handle the other dom0 guests (NetBSD, 
Solaris, old 2.6.18 out-of-tree linux kernel).  Pretty much like the 
linux kernel has to keep old syscalls to not break the ABI for the 
applications, xen has to maintain old hypercalls[1].

Other way around:  Apps can use new system calls only when running on 
recent kernels, and they have to deal with -ENOSYS.  Likewise it might 
be that the pv_ops-based dom0 kernel can provide some features only when 
running on a recent hypervisor.  That will likely be the case for PAT.

cheers,
   Gerd

[1] and other interfaces like trap'n'emulate certain instructions.
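
The -ENOSYS analogy above amounts to "try the new interface, fall back to
the old one if the host does not implement it". A minimal sketch of that
pattern in Python follows. The names call_with_fallback, new_interface and
old_interface are hypothetical, and the two interface functions merely
simulate an old and a new host; none of this is taken from Xen or the
kernel.

```python
import errno


def call_with_fallback(new_call, old_call, *args):
    """Try the newer interface first; fall back only on ENOSYS."""
    try:
        return new_call(*args)
    except OSError as exc:
        if exc.errno != errno.ENOSYS:
            raise  # a real failure, not "interface missing"
        return old_call(*args)


def new_interface(value):
    """Hypothetical newer call, simulated here as missing on an old host."""
    raise OSError(errno.ENOSYS, "Function not implemented")


def old_interface(value):
    """Hypothetical older call that every host is assumed to provide."""
    return ("old", value)


if __name__ == "__main__":
    print(call_with_fallback(new_interface, old_interface, 42))
```

Only ENOSYS triggers the fallback. Any other error is re-raised, so a
genuine failure in the new interface is not silently masked by the old one.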

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Merge Xen (the hypervisor) into Linux
  2009-06-03  5:22                                 ` Steven Rostedt
@ 2009-06-03 12:03                                     ` George Dunlap
  0 siblings, 0 replies; 183+ messages in thread
From: George Dunlap @ 2009-06-03 12:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Dan Magenheimer, Theodore Tso, Ingo Molnar, David Miller, jeremy,
	avi, xen-devel, x86, linux-kernel, Keir Fraser, torvalds, gregkh,
	kurt.hackel, Ian Pratt, xen-users, ksrinivasan, EAnderson,
	wimcoekaerts, Stephen Spector, jens.axboe, npiggin

Steven Rostedt wrote:
> What's the issue with this? You get to keep your "micro hypervisor" design 
> that has been stated to be the superior method.
>   
It is a very interesting idea, but it would still be basically a 
completely new project.  If someone started such a project, they could 
probably cannibalize a lot of Xen's existing code (a funny boomerang, 
since Xen cannibalized Linux's code when it started), but it would still 
require a lot of work and re-writing, and the result would be a lot 
different than Xen is now.  It would be years before it was ready to be 
used in a production system.  It's not really realistic to expect all 
the Xen developers and users to drop Xen development, shift gears into 
this new project, and wait until it's ready to be used.  (That's not to 
say that the idea has no merit, just that Xen as it is wouldn't go away 
until this hypothetical linux hypervisor component was mature enough 
for users and developers to jump onto.)

Yeah, lots of interesting implications for such a project.

Having a separate component to be a hypervisor, even if in the same 
tree, would mean we could have dedicated hypervisor schedulers, &c.  
They could (conceivably) work more closely with the dom0 scheduler to 
make things more efficient.

As others have said, it would limit the ability of such a hypervisor to 
be used with other dom0 operating systems.  Fixing the ABI sufficiently 
so that others can use it might be possible, but it seems to me unlikely 
to meet with much success without a lot of commitment on both sides 
(i.e., w/in Linux and within other OS communities).

I'm not sure that it would turn out quite the way some people expect, 
though.  From a technical perspective, I'm not sure getting rid of the 
"ring 1 hack" or requiring HVM support would be the best design choice 
for such a project.  And it's hard to predict what kinds of technical, 
political, or cultural issues, directions, or potential dead-ends a 
project might take. 

From all angles, it's too risky to just abandon the current Xen 
codebase until this hypothetical linux hypervisor component has shown 
itself to be viable.

-George

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Merge Xen (the hypervisor) into Linux
  2009-06-02 23:28                         ` Ingo Molnar
@ 2009-06-03 17:31                           ` Chris Friesen
  -1 siblings, 0 replies; 183+ messages in thread
From: Chris Friesen @ 2009-06-03 17:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, George Dunlap, David Miller, jeremy,
	Dan Magenheimer, avi, xen-devel, x86, linux-kernel, Keir Fraser,
	torvalds, gregkh, kurt.hackel, Ian Pratt, xen-users, ksrinivasan,
	EAnderson, wimcoekaerts, Stephen Spector, jens.axboe, npiggin

Ingo Molnar wrote:

> A lot of Xen legacies could be dropped: the crazy ring1 hack on 
> 32-bit, the various wide interfaces to make pure-software 
> virtualization limp along. All major CPUs shipped with hardware 
> virtualization support in the past 2-3 years, so the availability of 
> VMX and SVM can be taken for granted for such a project.

That's a pretty bold statement.  I have five x86 machines in my house
currently being used, and none of them support VMX/SVM.

At least some Lenovo laptops disable VMX in the BIOS with no way to
enable it.  Some of the Core2Duo chips don't support VMX at all.

I think Xen without paravirtualization would be a serious degradation of
usefulness.

Chris

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Merge Xen (the hypervisor) into Linux
  2009-06-03 17:31                           ` Chris Friesen
@ 2009-06-03 17:36                             ` Alan Cox
  -1 siblings, 0 replies; 183+ messages in thread
From: Alan Cox @ 2009-06-03 17:36 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Ingo Molnar, Steven Rostedt, George Dunlap, David Miller, jeremy,
	Dan Magenheimer, avi, xen-devel, x86, linux-kernel, Keir Fraser,
	torvalds, gregkh, kurt.hackel, Ian Pratt, xen-users, ksrinivasan,
	EAnderson, wimcoekaerts, Stephen Spector, jens.axboe, npiggin

On Wed, 03 Jun 2009 11:31:13 -0600
"Chris Friesen" <cfriesen@nortel.com> wrote:

> Ingo Molnar wrote:
> 
> > A lot of Xen legacies could be dropped: the crazy ring1 hack on 
> > 32-bit, the various wide interfaces to make pure-software 
> > virtualization limp along. All major CPUs shipped with hardware 
> > virtualization support in the past 2-3 years, so the availability of 
> > VMX and SVM can be taken for granted for such a project.
> 
> That's a pretty bold statement.  I have five x86 machines in my house
> currently being used, and none of them support VMX/SVM.
> 
> At least some Lenovo laptops disable VMX in the BIOS with no way to
> enable it.  Some of the Core2Duo chips don't support VMX at all.

Ditto some Atom cpus which in turn means you can't run kvm on all the
netbooks right now - which is one place it's very useful.

> I think Xen without paravirtualization would be a serious degradation of
> usefulness.

At that point you can just use kvm anyway.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Merge Xen (the hypervisor) into Linux
  2009-06-03 12:03                                     ` George Dunlap
@ 2009-06-03 19:05                                       ` Theodore Tso
  -1 siblings, 0 replies; 183+ messages in thread
From: Theodore Tso @ 2009-06-03 19:05 UTC (permalink / raw)
  To: George Dunlap
  Cc: Steven Rostedt, Dan Magenheimer, Ingo Molnar, David Miller,
	jeremy, avi, xen-devel, x86, linux-kernel, Keir Fraser, torvalds,
	gregkh, kurt.hackel, Ian Pratt, xen-users, ksrinivasan,
	EAnderson, wimcoekaerts, Stephen Spector, jens.axboe, npiggin

On Wed, Jun 03, 2009 at 01:03:51PM +0100, George Dunlap wrote:
> It is a very interesting idea, but it would still be basically a  
> completely new project.  If someone started such a project, they could  
> probably cannibalize a lot of Xen's existing code (a funny boomerang,  
> since Xen cannibalized Linux's code when it started), but it would still  
> require a lot of work and re-writing, and the result would be a lot  
> different than Xen is now.  It would be years before it was ready to be  
> used in a production system.  

You might be surprised; if we started with a working dom0/xen pair,
and there were people working on it to clean up the dom0/xen interface,
treating it as an internal Linux interface with an eye towards
minimizing contamination of core kernel code, the Linux model of
development can go pretty fast.  Compare and contrast it with the
***years*** of calendar time and decades of wasted man-years of
engineering effort needed to port and backport and maintain dom0
support with Linux.  Given that experience, I could easily see how
some might assume that it would take years to significantly improve
things, but if xen were merged into mainline with the
assumption that it could be arbitrarily changed to make things sane,
with the primary interface that needed backwards compatibility care
being the xen/domU interface, I expect things would go pretty fast.

What would be lost is dom0 support for other OS's, but really, is that
such a major loss?  Linux has far better device driver support than
Solaris or FreeBSD, so is there really that much gain in using some
other OS for dom0?

						- Ted

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-06-02 15:23                       ` Thomas Gleixner
@ 2009-06-03 19:49                         ` Bill Davidsen
  -1 siblings, 0 replies; 183+ messages in thread
From: Bill Davidsen @ 2009-06-03 19:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: George Dunlap, David Miller, jeremy, mingo, Dan Magenheimer, avi,
	xen-devel, x86, linux-kernel, Keir Fraser, torvalds, gregkh,
	kurt.hackel, Ian Pratt, xen-users, ksrinivasan, EAnderson,
	wimcoekaerts, Stephen Spector, jens.axboe

Thomas Gleixner wrote:
> On Fri, 29 May 2009, George Dunlap wrote:
>> David Miller wrote:
>>> I don't see Ingo's comments, whether I agree with them or not, as
>>> an implication of Xen being niche.  Rather I see his comments as
>>> an opposition to how Xen is implemented.
>>>   
>> It's in his definition of "improving Linux".  Jeremy is saying that allowing
>> Linux to run as dom0 *is* improving Linux.  The lack of dom0 support is at
>> this moment making life more difficult for a huge number of Linux users who
> 
> Exactly that's the point. Adding dom0 makes life easier for a group of
> users who decided to use Xen some time ago, but what Ingo wants is
> technical improvement of the kernel.
> 
> There are many features which have been wildly used in the distro
> world where developers tried to push support into the kernel with the
> same line of arguments.
> 
> The kernel policy always was and still is to accept only those
> features which have a technical benefit to the code base.
> 
> I'm just picking a few examples:
> 
> Aside of the paravirt, which seems to expand through arch/x86 like a
> hydra, the new patches sprinkle "if (xen_...)" all over the
> place. These extra xen dependencies are no improvement, they are a
> royal pain in the ... They are sticky once they got merged simply
> because the hypervisor relies on them and we need to provide
> compatibility for a long time.
> 
Wait, let's not classify something as "no improvement" when you mean "I don't 
need it." The fact that processors without hardware VM can run virtual machines 
is a non-trivial benefit for many users, and for future embedded applications 
where hvm and 64-bit capability may not justify their power requirements. 
And the improved PV performance over full virtualization is an improvement, even 
though it certainly isn't night and day.

Having replaced some systems with new hardware just so I could use KVM does not 
make me forget that I used xen for some time, and that PV is still a savings, 
even with the latest hardware.

Let's stick to technical issues, and not deny that there are a number of users 
who really will have expanded capability. The technical points are valid, but as 
a former and probable future xen (CentOS) user, so are the benefits.

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-06-03 19:49                         ` Bill Davidsen
@ 2009-06-03 20:20                           ` Thomas Gleixner
  -1 siblings, 0 replies; 183+ messages in thread
From: Thomas Gleixner @ 2009-06-03 20:20 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: George Dunlap, David Miller, jeremy, mingo, Dan Magenheimer, avi,
	xen-devel, x86, linux-kernel, Keir Fraser, torvalds, gregkh,
	kurt.hackel, Ian Pratt, xen-users, ksrinivasan, EAnderson,
	wimcoekaerts, Stephen Spector, jens.axboe

On Wed, 3 Jun 2009, Bill Davidsen wrote:
> Thomas Gleixner wrote:
> > Aside of the paravirt, which seems to expand through arch/x86 like a
> > hydra, the new patches sprinkle "if (xen_...)" all over the
> > place. These extra xen dependencies are no improvement, they are a
> > royal pain in the ... They are sticky once they got merged simply
> > because the hypervisor relies on them and we need to provide
> > compatibility for a long time.
> > 
> Wait, let's not classify something as "no improvement" when you mean "I don't
> need it."

It's not about "I don't need it.". It's about having Xen dependencies
in the code all over the place which make maintenance harder. I have
to balance the users' benefit (xen dom0 support) vs. the impact on
maintainability and the restrictions which are going to be set almost
in stone by merging it.

> Let's stick to technical issues, and not deny that there are a number of users
> who really will have expanded capability. The technical points are valid, but
> as a former and probable future xen (CentOS) user, so are the benefits.

Refusing random "if (xen...)" dependencies is a purely technical
decision. I have said more than once that I'm not against merging dom0
in general, I'm just frightened by the technical impact of a de facto
ABI which we swallow with it.

We have enough problems with real silicon and BIOS/ACPI already, why
should we add artificial and _avoidable_ virtual silicon horror?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Merge Xen (the hypervisor) into Linux
  2009-06-03 19:05                                       ` Theodore Tso
  (?)
@ 2009-06-03 21:49                                       ` Samuel Thibault
  -1 siblings, 0 replies; 183+ messages in thread
From: Samuel Thibault @ 2009-06-03 21:49 UTC (permalink / raw)
  To: Theodore Tso, George Dunlap, Steven Rostedt, Dan Magenheimer,
	Ingo Molnar

Theodore Tso, le Wed 03 Jun 2009 15:05:21 -0400, a écrit :
> What would be lost is dom0 support for other OS's, but really, is that
> such a major loss?

Yes.

> Linux has far better device driver support than Solaris or FreeBSD, so
> there is really that much gain in using some other OS for dom0?

Yes.

Thanks for taking that into account.
Samuel

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-06-03 20:20                           ` Thomas Gleixner
@ 2009-06-03 22:37                             ` Bill Davidsen
  -1 siblings, 0 replies; 183+ messages in thread
From: Bill Davidsen @ 2009-06-03 22:37 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: George Dunlap, David Miller, jeremy, mingo, Dan Magenheimer, avi,
	xen-devel, x86, linux-kernel, Keir Fraser, torvalds, gregkh,
	kurt.hackel, Ian Pratt, xen-users, ksrinivasan, EAnderson,
	wimcoekaerts, Stephen Spector, jens.axboe

Thomas Gleixner wrote:
> On Wed, 3 Jun 2009, Bill Davidsen wrote:
>   
>> Thomas Gleixner wrote:
>>     
>>> Aside of the paravirt, which seems to expand through arch/x86 like a
>>> hydra, the new patches sprinkle "if (xen_...)" all over the
>>> place. These extra xen dependencies are no improvement, they are a
>>> royal pain in the ... They are sticky once they got merged simply
>>> because the hypervisor relies on them and we need to provide
>>> compatibility for a long time.
>>>
>>>       
>> Wait, let's not classify something as "no improvement" when you mean "I don't
>> need it."
>>     
>
> It's not about "I don't need it.". It's about having Xen dependencies
> in the code all over the place which make mainatainence harder. I have
> to balance the users benefit (xen dom0 support) vs. the impact on
> maintainability and the restrictions which are going to be set almost
> in stone by merging it.
>
>   
>> Let's stick to technical issues, and not deny that there are a number of users
>> who really will have expanded capability. The technical points are valid, but
>> as a former and probable future xen (CentOS) user, so are the benefits.
>>     
>
> Refusing random "if (xen...)" dependencies is a purely technical
> decision. I have said more than once that I'm not against merging dom0
> in general, I'm just frightened by the technical impact of a defacto
> ABI which we swallow with it.
>
>   
I was referring to your "no benefit" comment; I don't dispute the 
technical issues. I think the idea of moving the hypervisor into the 
kernel and letting xen folks do the external parts as they please has merit.

> We have enough problems with real silicon and BIOS/ACPI already, why
> should we add artifical and _avoidable_ virtual silicon horror ?
>   

I guess my point wasn't clear, sorry.  I felt as though the features 
KVM lacks (support for old/small/BIOS-limited CPUs) might be hidden 
in the smoke due to the technical issues.

-- 
Bill Davidsen <davidsen@tmr.com>
  Even purely technical things can appear to be magic, if the documentation is
obscure enough. For example, PulseAudio is configured by dancing naked around a
fire at midnight, shaking a rattle with one hand and a LISP manual with the
other, while reciting the GNU manifesto in hexadecimal. The documentation fails
to note that you must circle the fire counter-clockwise in the southern
hemisphere.



^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-06-03 22:37                             ` Bill Davidsen
  (?)
@ 2009-06-03 23:29                             ` Frans Pop
  2009-06-04 13:21                                 ` George Dunlap
  2009-06-05  4:14                               ` Bill Davidsen
  -1 siblings, 2 replies; 183+ messages in thread
From: Frans Pop @ 2009-06-03 23:29 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: tglx, george.dunlap, davem, jeremy, mingo, dan.magenheimer, avi,
	xen-devel, x86, linux-kernel, Keir.Fraser, torvalds, gregkh,
	kurt.hackel, Ian.Pratt, xen-users, ksrinivasan, EAnderson,
	wimcoekaerts, stephen.spector, jens.axboe

Bill Davidsen wrote:
> I was referring to your "no benefit" comment, I don't dispute the
> technical issues. I think the idea of moving the hypervisor into the
> kernel and letting xen folks do the external parts as they please.

Where does that come from? AFAICT Thomas never made a "no benefit" comment 
other than limited to the context of the technical implementation.
I've always understood his meaning in this thread to be: "the proposed 
patch set does not improve the technical standard of the linux kernel, 
but would instead lower it considerably".
Thomas has been extremely correct in this thread and IMO does not deserve 
this attack.

Let's look at his exact comments (emphasis mine).

! The kernel policy always was and still is to accept only those
! features which have a technical benefit **to the code base**.

and

! Aside of the paravirt, which seems to expand through arch/x86 like a
! hydra, the new patches sprinkle "if (xen_...)" all over the
! place. These extra xen dependencies are no improvement, they are a
! royal pain in the ...

Also clearly limited to technical implementation.

! I really have a hard time to see why dom0 support makes Linux more
! useful **to people who do not use it**. It does not improve the Linux
! experience **of Joe User** at all.

Or has Thomas made some "no benefit" comment I've missed?

Cheers,
FJP

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-06-03 23:29                             ` Frans Pop
@ 2009-06-04 13:21                                 ` George Dunlap
  2009-06-05  4:14                               ` Bill Davidsen
  1 sibling, 0 replies; 183+ messages in thread
From: George Dunlap @ 2009-06-04 13:21 UTC (permalink / raw)
  To: Frans Pop
  Cc: Bill Davidsen, tglx, davem, jeremy, mingo, Dan Magenheimer, avi,
	xen-devel, x86, linux-kernel, Keir Fraser, torvalds, gregkh,
	kurt.hackel, Ian Pratt, xen-users, ksrinivasan, EAnderson,
	wimcoekaerts, Stephen Spector, jens.axboe

Frans Pop wrote:
> ! The kernel policy always was and still is to accept only those
> ! features which have a technical benefit **to the code base**.
>   
Yes, I think I understood him better after I responded to his e-mail 
(unfortunately).  When people say things like "dom0 adds all these hooks 
but doesn't add anything to Linux", they mean something like this 
(please correct me anyone, if I'm wrong).

Kernel developers want Linux, as a project, to have cool things in it.  
They want it to be cool.  Adding new features, new capabilities, new 
technical code, makes it cooler.  Sometimes adding new features to make 
it cooler has some cost in terms of adding things to other parts of the 
code, possibly making it a little less clean or a little more 
convoluted.  But if the coolness is cool enough, it's worth the cost.

The feeling is that adding a bunch of these dom0 hooks (especially of 
the type "if (xen) { foo; }") is a cost to Linux.  They make the code 
ugly.  They do allow a new kind of coolness, a (linux-dom0 + Xen) 
coolness.  But none of the coolness actually happens in Linux; it all 
happens in Xen.  So coolness may happen, and world happiness might 
increase marginally, but Linux itself doesn't seem any cooler, it just 
has the cost of all these ugly hooks.  Thus the "Linux is Xen's sex 
slave" analogy. :-)

If (hypothetically) we merged Xen into Linux, then (people are 
suggesting) the coolness of Xen would actually contribute to the 
coolness of Linux ("add technical benefit to the code base").  People 
would feel like working on the interface between linux-xen and the rest 
of linux would be making their own piece of software, Linux, work 
better, rather than feeling like they have to work with some foreign 
project that doesn't make their code any cooler.

Is that a pretty accurate representation of the "adding features which 
have a technical benefit to the code base" argument?

 -George

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Xen-users] Re: Merge Xen (the hypervisor) into Linux
  2009-06-03 19:05                                       ` Theodore Tso
  (?)
  (?)
@ 2009-06-04 13:43                                       ` Florian Manschwetus
  2009-06-04 14:03                                           ` Steven Rostedt
  -1 siblings, 1 reply; 183+ messages in thread
From: Florian Manschwetus @ 2009-06-04 13:43 UTC (permalink / raw)
  To: Theodore Tso, George Dunlap, Steven Rostedt, Dan Magenheimer,
	Ingo Molnar


In short: merging Xen into Linux negates one of its central ideas, 
which is to focus the developer effort of different OSes on one 
common hypervisor.

On 03.06.2009 21:05, Theodore Tso wrote:
>
> What would be lost is dom0 support for other OS's, but really, is that
> such a major loss?

(all is based on my personal feelings and information!)
So my two cents here:

Yes. Sun, for example, is one of the most active parties in the Xen 
community. They are going to make xVM (the name of Xen in Solaris) a 
real competitor to VMware ESX.
Moreover, Sun is responsible for a lot of cleanup and improvement in the 
hypervisor over the last years.
So kicking Solaris out might be a lot more than just the first nail...

On the other hand, why use Linux as dom0?
Just take a second to think about OpenSolaris as dom0 (its release state 
should soon catch up with the current state of Xen): it gives you ZFS.

Florian

> Linux has far better device driver support than
> Solaris or FreeBSD, so there is really that much gain in using some
> other OS for dom0?
>
> 						- Ted

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Xen-users] Re: Xen is a feature
  2009-06-02 17:46                           ` Linus Torvalds
@ 2009-06-04 14:02                             ` Thomas Goirand
  -1 siblings, 0 replies; 183+ messages in thread
From: Thomas Goirand @ 2009-06-04 14:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: George Dunlap, jens.axboe, npiggin, Dan Magenheimer, xen-devel,
	wimcoekaerts, gregkh, ksrinivasan, linux-kernel, x86, jeremy,
	David Miller, Ian Pratt, Stephen Spector, avi, EAnderson,
	kurt.hackel, Thomas Gleixner, xen-users, mingo, Keir Fraser

Linus Torvalds wrote:
> Seriously.
> 
> If it was just the local APIC, fine. But it may be just the local APIC 
> code this time around, next time it will be something else. It's been TLB, 
> it's been entry_*.S, it's been all over. Some of them are performance 
> issues.
> 
> I dunno. I just do know that I pointed out the statistics for how 
> mindlessly incestuous the Xen patches have historically been to Jeremy. He 
> admitted it. I've not seen _anybody_ say that things will improve. 
> 
> Xen has been painful. If you give maintainers pain, don't expect them to 
> love you or respect you.
> 
> So I would really suggest that Xen people should look at _why_ they are 
> giving maintainers so much pain.
> 
> 		Linus

Seriously, reading this is discouraging. I had to stop myself from
criticizing this opinion too much here, but it's kind of hard to read
"mindless", "painful" and such considering the consequences of the
current state.

As time passes, it's becoming more and more unmaintainable to manage the
dom0 patch on one side, and the mainline kernel on the other, even from a
user/admin point of view. THIS is years of mindless and painful
administration/patching tasks. We've all been waiting too long already.
We need the Xen dom0 "feature" NOW! Not tomorrow, not in one week, not
in 10 years...

As a developer myself (not on the kernel though), I can perfectly
understand the standpoint about ugliness of the code. However, refusing
to merge gives bad headaches to hundreds of people trying to maintain
production systems and deal with the issues it creates.

I stand on Steven Rostedt's side (and many others too). Merging WILL
make it possible to have Xen going the way you wish. Otherwise, it's
again a cathedral type of development. Keir Fraser and others seem to
be willing to make the changes in the API if needed. It's just not right
to say they don't want to. And if there is such a need for ABI/API
compatibility, why not just add a config option "compatibility to old
style Xen (dirty ugly slow feature)" if there are some issues?

Now, about merging the Xen hypervisor, that's another discussion that
can happen later on, IMHO. What's URGENT (I insist here) is dom0 support
(including with 64 bits).

Thomas

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [Xen-users] Re: Merge Xen (the hypervisor) into Linux
  2009-06-04 13:43                                       ` [Xen-users] " Florian Manschwetus
@ 2009-06-04 14:03                                           ` Steven Rostedt
  0 siblings, 0 replies; 183+ messages in thread
From: Steven Rostedt @ 2009-06-04 14:03 UTC (permalink / raw)
  To: Florian Manschwetus
  Cc: Theodore Tso, George Dunlap, Dan Magenheimer, Ingo Molnar,
	David Miller, jeremy, avi, xen-devel, x86, linux-kernel,
	Keir Fraser, torvalds, gregkh, kurt.hackel, Ian Pratt, xen-users,
	ksrinivasan, EAnderson, wimcoekaerts, Stephen Spector,
	jens.axboe, npiggin


On Thu, 4 Jun 2009, Florian Manschwetus wrote:
> 
> On the other side why use linux as dom0?
> just take a second to mind about OpenSolaris as dom0 (release state would
> close up soon to current state of xen), it gifts you zfs.

Let's turn this around a bit. Can we get Xen to keep a rock solid stable 
ABI?  Where the interface to Xen from Dom0 is never expected to break? All 
old Dom0's will always work on Xen?

Document this interface, and that it will always work. If it is a clean 
interface, then perhaps Linux could work with it. But it would need to be 
non-intrusive. I'll have to take some time to look at the Dom0 patches to 
see what exactly they require. Perhaps there are better ways to accomplish 
what is being asked for.

-- Steve


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-06-04 13:21                                 ` George Dunlap
@ 2009-06-04 15:10                                   ` Theodore Tso
  -1 siblings, 0 replies; 183+ messages in thread
From: Theodore Tso @ 2009-06-04 15:10 UTC (permalink / raw)
  To: George Dunlap
  Cc: Frans Pop, Bill Davidsen, tglx, davem, jeremy, mingo,
	Dan Magenheimer, avi, xen-devel, x86, linux-kernel, Keir Fraser,
	torvalds, gregkh, kurt.hackel, Ian Pratt, xen-users, ksrinivasan,
	EAnderson, wimcoekaerts, Stephen Spector, jens.axboe

On Thu, Jun 04, 2009 at 02:21:08PM +0100, George Dunlap wrote:
> If (hypothetically) we merged Xen into Linux, then (people are  
> suggesting) the coolness of Xen would actually contribute to the  
> coolness of Linux ("add technical benefit to the code base").  People  
> would feel like working on the interface between linux-xen and the rest  
> of linux would be making their own piece of software, Linux, work  
> better, rather than feeling like they have to work with some foreign  
> project that doesn't make their code any cooler.
>
> Is that a pretty accurate representation of the "adding features which  
> have a technical benefit to the code base" argument?

The other argument is that by merging Xen into Linux, it becomes
easier for kernel developers to understand *why* "if (xen) ..." shows
up in random places in core kernel code, and it becomes easier to
clean that up.

If Xen isn't merged, it becomes much harder to believe that those
cleanups will occur, since the Xen developers might stonewall such
cleanups for reasons that Linux developers might not consider valid.
So the threshold for accepting patches might be much higher, since the
subsystem maintainers involved might decide to NAK patches as
uglifying the Linux kernel codebase with no real benefit to the Linux
codebase --- and not much hope that said ugly hacks will get cleaned
up later.  Historically, once code with warts gets merged, we lose all
leverage towards fixing those warts afterwards; this is true in
general, and not a statement of a lack of trust of Xen developers
specifically.

This doesn't make merging Xen *impossible*, but probably makes it
harder, since each of those objections will have to be cleared,
possibly by refactoring the code so that it adds benefits not just for
Xen, but some other in-kernel user of that abstraction (e.g., KVM,
lguest, etc.) or by cleaning up the code in general, in order to
clear NAK's by the relevant developers.  

If Xen is merged, then ultimately Linus gets to make the call about
whether something gets fixed, even at the cost of making a change to
the hypervisor/dom0 interface.  So this would likely decrease the
threshold of what has to be fixed before people are willing to ACK a
Xen merge, since there's better confidence that these warts will be
cleaned up.  An example of that might be XFS, which had all sorts of
Irix warts which have been gradually cleaned up over the years.  Of
course, there might still be some hideous abstraction violations that
would have to be cleaned up first; but that's up to the relevant
subsystem maintainers.

	       	   		  	     - Ted

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-06-04 13:21                                 ` George Dunlap
@ 2009-06-04 15:31                                   ` Chris Friesen
  -1 siblings, 0 replies; 183+ messages in thread
From: Chris Friesen @ 2009-06-04 15:31 UTC (permalink / raw)
  To: George Dunlap
  Cc: Frans Pop, Bill Davidsen, tglx, davem, jeremy, mingo,
	Dan Magenheimer, avi, xen-devel, x86, linux-kernel, Keir Fraser,
	torvalds, gregkh, kurt.hackel, Ian Pratt, xen-users, ksrinivasan,
	EAnderson, wimcoekaerts, Stephen Spector, jens.axboe

George Dunlap wrote:
> Frans Pop wrote:
> 
>>! The kernel policy always was and still is to accept only those
>>! features which have a technical benefit **to the code base**.

> If (hypothetically) we merged Xen into Linux, then (people are 
> suggesting) the coolness of Xen would actually contribute to the 
> coolness of Linux ("add technical benefit to the code base").  People 
> would feel like working on the interface between linux-xen and the rest 
> of linux would be making their own piece of software, Linux, work 
> better, rather than feeling like they have to work with some foreign 
> project that doesn't make their code any cooler.

I suspect that there is an element of this.

There is also the factor that if Xen was merged into linux, we would
then be able to work towards a sane(r) virtualization layer that would
be useful for KVM, Xen, and possibly others.  This provides a technical
benefit to the code base by introducing a more logical organization
rather than having ad-hoc changes sprinkled all over.

Chris


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Merge Xen (the hypervisor) into Linux
  2009-06-03  8:07                         ` Christian Tramnitz
@ 2009-06-04 18:53                           ` Linus Torvalds
  2009-06-05  0:09                               ` Samuel Thibault
  0 siblings, 1 reply; 183+ messages in thread
From: Linus Torvalds @ 2009-06-04 18:53 UTC (permalink / raw)
  To: Christian Tramnitz
  Cc: Ingo Molnar, Steven Rostedt, George Dunlap, David Miller, jeremy,
	Dan Magenheimer, avi, xen-devel, x86, linux-kernel, Keir Fraser,
	gregkh, kurt.hackel, Ian Pratt, xen-users, ksrinivasan,
	EAnderson, wimcoekaerts, Stephen Spector, jens.axboe



On Wed, 3 Jun 2009, Christian Tramnitz wrote:
>
> What a great idea, and while we're doing this let's also drop support
> for legacy stuff like PATA and i8042 in mainline. Noone will need it
> anyway because their successors are on the market for years... let's
> just take it for granted that everyone is using SATA and USB nowadays!

Have you noticed how PATA and i8042 don't screw up anything else? 

You're totally missing the problem. If Xen was a single driver thing, we 
wouldn't have this discussion. But as is, Xen craps all over OTHER PEOPLES 
CODE. When those people then aren't interested in Xen, why is anybody 
surprised that people aren't excited?

			Linus

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Merge Xen (the hypervisor) into Linux
  2009-06-04 18:53                           ` Linus Torvalds
@ 2009-06-05  0:09                               ` Samuel Thibault
  0 siblings, 0 replies; 183+ messages in thread
From: Samuel Thibault @ 2009-06-05  0:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christian Tramnitz, Ingo Molnar, Steven Rostedt, George Dunlap,
	David Miller, jeremy, Dan Magenheimer, avi, xen-devel, x86,
	linux-kernel, Keir Fraser, gregkh, kurt.hackel, Ian Pratt,
	xen-users, ksrinivasan, EAnderson, wimcoekaerts, Stephen Spector,
	jens.axboe

Linus Torvalds wrote on Thu 04 Jun 2009 11:53:45 -0700:
> On Wed, 3 Jun 2009, Christian Tramnitz wrote:
> >
> > What a great idea, and while we're doing this let's also drop support
> > for legacy stuff like PATA and i8042 in mainline. Noone will need it
> > anyway because their successors are on the market for years... let's
> > just take it for granted that everyone is using SATA and USB nowadays!
> 
> Have you noticed how PATA and i8042 don't screw up anything else? 

Right.  We should get rid of all the HIGHMEM kmap crap that cripples all
the code.

Samuel

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Merge Xen (the hypervisor) into Linux
  2009-06-05  0:09                               ` Samuel Thibault
  (?)
@ 2009-06-05  0:18                               ` David Miller
  -1 siblings, 0 replies; 183+ messages in thread
From: David Miller @ 2009-06-05  0:18 UTC (permalink / raw)
  To: samuel.thibault
  Cc: torvalds, christian, mingo, rostedt, george.dunlap, jeremy,
	dan.magenheimer, avi, xen-devel, x86, linux-kernel, Keir.Fraser,
	gregkh, kurt.hackel, Ian.Pratt, xen-users, ksrinivasan,
	EAnderson, wimcoekaerts, stephen.spector, jens.axboe

From: Samuel Thibault <samuel.thibault@ens-lyon.org>
Date: Fri, 5 Jun 2009 02:09:10 +0200

> Linus Torvalds, le Thu 04 Jun 2009 11:53:45 -0700, a écrit :
>> On Wed, 3 Jun 2009, Christian Tramnitz wrote:
>> >
>> > What a great idea, and while we're doing this let's also drop support
>> > for legacy stuff like PATA and i8042 in mainline. No one will need it
>> > anyway because their successors are on the market for years... let's
>> > just take it for granted that everyone is using SATA and USB nowadays!
>> 
>> Have you noticed how PATA and i8042 don't screw up anything else? 
> 
> Right.  We should get rid of all the HIGHMEM kmap crap that cripples all
> the code.

The kmap interfaces are pretty damn clean if you ask me.  Especially
compared to the abortion Xen plops into the x86 platform code.

So, keep searching for an argument where none exists.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Merge Xen (the hypervisor) into Linux
  2009-06-05  0:09                               ` Samuel Thibault
@ 2009-06-05  0:54                                 ` Linus Torvalds
  -1 siblings, 0 replies; 183+ messages in thread
From: Linus Torvalds @ 2009-06-05  0:54 UTC (permalink / raw)
  To: Samuel Thibault
  Cc: Christian Tramnitz, Ingo Molnar, Steven Rostedt, George Dunlap,
	David Miller, jeremy, Dan Magenheimer, avi, xen-devel, x86,
	linux-kernel, Keir Fraser, gregkh, kurt.hackel, Ian Pratt,
	xen-users, ksrinivasan, EAnderson, wimcoekaerts, Stephen Spector,
	jens.axboe



On Fri, 5 Jun 2009, Samuel Thibault wrote:
> 
> Right.  We should get rid of all the HIGHMEM kmap crap that cripples all
> the code.

Now you're starting to understand.

However, the difference between Xen and highmem (which I do hate, and 
which took a long time and lots of effort to get done) is how many people 
care. And in particular how many kernel developers do.

Until you can face these obvious facts, please just shut up. Ok?

			Linus

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-06-03 23:29                             ` Frans Pop
  2009-06-04 13:21                                 ` George Dunlap
@ 2009-06-05  4:14                               ` Bill Davidsen
  2009-06-05  4:55                                 ` Chris Friesen
  1 sibling, 1 reply; 183+ messages in thread
From: Bill Davidsen @ 2009-06-05  4:14 UTC (permalink / raw)
  To: Frans Pop
  Cc: tglx, george.dunlap, davem, jeremy, mingo, dan.magenheimer, avi,
	xen-devel, x86, linux-kernel, Keir.Fraser, torvalds, gregkh,
	kurt.hackel, Ian.Pratt, xen-users, ksrinivasan, EAnderson,
	wimcoekaerts, stephen.spector, jens.axboe

Frans Pop wrote:
> Bill Davidsen wrote:
>   
>> I was referring to your "no benefit" comment, I don't dispute the
>> technical issues. I think the idea of moving the hypervisor into the
>> kernel and letting xen folks do the external parts as they please.
>>     
>
> Where does that come from? AFAICT Thomas never made a "no benefit" comment 
> other than limited to the context of the technical implementation.
>
>   
Where it comes from is his very recent statement, which contains those 
very words. You may interpret what he said in any way you choose, but 
denying that he said it shows that you didn't follow the link back. I 
never denied the ugliness of the code, nor does the author, but it adds 
a great deal of value for many people, and that's the point I was making.

-- 
Bill Davidsen <davidsen@tmr.com>
  Even purely technical things can appear to be magic, if the documentation is
obscure enough. For example, PulseAudio is configured by dancing naked around a
fire at midnight, shaking a rattle with one hand and a LISP manual with the
other, while reciting the GNU manifesto in hexadecimal. The documentation fails
to note that you must circle the fire counter-clockwise in the southern
hemisphere.



^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-06-05  4:14                               ` Bill Davidsen
@ 2009-06-05  4:55                                 ` Chris Friesen
  0 siblings, 0 replies; 183+ messages in thread
From: Chris Friesen @ 2009-06-05  4:55 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Frans Pop, tglx, george.dunlap, davem, jeremy, mingo,
	dan.magenheimer, avi, xen-devel, x86, linux-kernel, Keir.Fraser,
	torvalds, gregkh, kurt.hackel, Ian.Pratt, xen-users, ksrinivasan,
	EAnderson, wimcoekaerts, stephen.spector, jens.axboe

Bill Davidsen wrote:

> Where it comes from is his very recent statement, which contains those 
> very words. You may interpret what he said in any way you choose, but 
> denying that he said it shows that you didn't follow the link back. I 
> never denied the ugliness of the code, nor does the author, but it adds 
> a great deal of value for many people, and that's the point I was making.

Lots of code could be said to add a great deal of value for many people
(semi-closed video card drivers, ndiswrapper, etc.), but it's never
going to be accepted into the kernel.

The maintainers get to decide whether the perceived benefit outweighs
the perceived cost.  So far, they've decided that Xen isn't worth it.

The most likely way to get Xen merged is to lower the cost (reduce the
churn and ugliness), increase the benefit (improve the virtualization
layer, thus cleaning up other code as well), or both.

Chris

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-06-02 18:59                               ` Avi Kivity
@ 2009-06-07  9:13                                 ` Ingo Molnar
  -1 siblings, 0 replies; 183+ messages in thread
From: Ingo Molnar @ 2009-06-07  9:13 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Linus Torvalds, George Dunlap, Thomas Gleixner, David Miller,
	jeremy, Dan Magenheimer, xen-devel, x86, linux-kernel,
	Keir Fraser, gregkh, kurt.hackel, Ian Pratt, xen-users,
	ksrinivasan, EAnderson, wimcoekaerts, Stephen Spector,
	jens.axboe, npiggin


* Avi Kivity <avi@redhat.com> wrote:

> Linus Torvalds wrote:
>> The point? Xen really is horribly badly separated out. It gets way more 
>> incestuous with other systems than it should. It's entirely possible 
>> that this is very fundamental to both paravirtualization and to 
>> hypervisor behavior, but it doesn't matter - it just means that I can 
>> well see that Xen is a f*cking pain to merge.
>>
>> So please, Xen people, look at your track record, and look at the 
>> issues from the standpoint of somebody merging your code, rather 
>> than just from the standpoint of somebody who whines "I want my 
>> code to be merged".
>>
>> IOW, if you have trouble getting your code merged, ask yourself 
>> what _you_ are doing wrong.
>
> There is in fact a way to get dom0 support with nearly no changes 
> to Linux, but it involves massive changes to Xen itself and 
> requires hardware support: run dom0 as a fully virtualized guest, 
> and assign it all the resources dom0 can access.  It's probably a 
> massive effort though.
>
> I've considered it for kvm when faced with the "I want a thin 
> hypervisor" question: compile the hypervisor kernel with PCI 
> support but nothing else (no CONFIG_BLOCK or CONFIG_NET, no device 
> drivers), load userspace from initramfs, and assign host devices 
> to one or more privileged guests.  You could probably run the host 
> with a heavily stripped configuration, and enjoy the slimness 
> while every interrupt invokes the scheduler, a context switch, and 
> maybe an IPI for good measure.

This would be an acceptable model i suspect, if someone wants a 
'slim hypervisor'.

We can context switch way faster than we handle IRQs. Plus in a 
slimmed-down config we could intentionally slim down aspects of the 
scheduler as well, if it ever became a measurable performance issue. 
The hypervisor would run a minimal user-space and most of the 
context-switching overhead relates to having a full-fledged 
user-space with rich requirements. So there's no real conceptual 
friction between a 'lean and mean' hypervisor and a full-featured 
native kernel.

This would certainly be an utterly clean design, and it would be 
interesting to see a Linux/Xen + Linux/Dom0 combo engineered in such 
a way - if people really find this layered kernel approach 
interesting. So the door is not closed to dom0 at all - but it has 
to be designed cleanly without messing up the native kernel.

	Ingo

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-06-07  9:13                                 ` Ingo Molnar
@ 2009-06-07 10:01                                   ` Avi Kivity
  -1 siblings, 0 replies; 183+ messages in thread
From: Avi Kivity @ 2009-06-07 10:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, George Dunlap, Thomas Gleixner, David Miller,
	jeremy, Dan Magenheimer, xen-devel, x86, linux-kernel,
	Keir Fraser, gregkh, kurt.hackel, Ian Pratt, xen-users,
	ksrinivasan, EAnderson, wimcoekaerts, Stephen Spector,
	jens.axboe, npiggin

Ingo Molnar wrote:
>> There is in fact a way to get dom0 support with nearly no changes 
>> to Linux, but it involves massive changes to Xen itself and 
>> requires hardware support: run dom0 as a fully virtualized guest, 
>> and assign it all the resources dom0 can access.  It's probably a 
>> massive effort though.
>>
>> I've considered it for kvm when faced with the "I want a thin 
>> hypervisor" question: compile the hypervisor kernel with PCI 
>> support but nothing else (no CONFIG_BLOCK or CONFIG_NET, no device 
>> drivers), load userspace from initramfs, and assign host devices 
>> to one or more privileged guests.  You could probably run the host 
>> with a heavily stripped configuration, and enjoy the slimness 
>> while every interrupt invokes the scheduler, a context switch, and 
>> maybe an IPI for good measure.
>>     
>
> This would be an acceptable model i suspect, if someone wants a 
> 'slim hypervisor'.
>
> We can context switch way faster than we handle IRQs. Plus in a 
> slimmed-down config we could intentionally slim down aspects of the 
> scheduler as well, if it ever became a measurable performance issue. 
> The hypervisor would run a minimal user-space and most of the 
> context-switching overhead relates to having a full-fledged 
> user-space with rich requirements. So there's no real conceptual 
> friction between a 'lean and mean' hypervisor and a full-featured 
> native kernel.
>   

The context switch would be taken by the Xen scheduler, not the Linux 
scheduler.  It's how interrupts work under Xen: an interrupt is taken, 
Xen schedules the domain that owns the interrupts (dom0 usually), which 
then handles the interrupt.  The Linux scheduler would only be involved 
if you thread your interrupt handlers.

This context switch is necessary regardless of how dom0 is integrated 
into Linux; it's simply a side effect of implementing device drivers 
outside the kernel (in this context, the kernel is Xen, and dom0 is just 
another userspace, albeit with elevated privileges).  The Linux 
equivalent to dom0 is a process that uses uio.
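
As a rough illustration of the uio comparison: a userspace driver built on
the Linux UIO framework blocks in read() on its device node until an
interrupt arrives, and each successful read returns a 32-bit running
interrupt count. The sketch below assumes that behaviour. The helper names
wait_for_irq() and service_irqs() are made up for illustration, and it is
not code from Xen or from any dom0 patch.

```c
/* Sketch only: the wait loop of a UIO-style userspace driver.
 * Function names are hypothetical; only read() is a real interface. */
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Block until the next interrupt event is reported on fd.
 * On success, *count holds the 32-bit running interrupt count.
 * Returns 0 on success, -1 on error, EOF, or a short read. */
int wait_for_irq(int fd, uint32_t *count)
{
    assert(count != NULL);
    ssize_t n = read(fd, count, sizeof(*count));
    return n == (ssize_t)sizeof(*count) ? 0 : -1;
}

/* Service at most max_events interrupts, remembering the last count seen.
 * Returns the number of events handled before the limit or a failed read. */
int service_irqs(int fd, int max_events, uint32_t *last_count)
{
    int handled = 0;
    uint32_t count;

    assert(last_count != NULL);
    while (handled < max_events && wait_for_irq(fd, &count) == 0) {
        /* A real driver would touch the device here, typically through
         * registers it has mmap()ed from the same device node. */
        *last_count = count;
        handled++;
    }
    return handled;
}
```

A caller would open the device node (a hypothetical /dev/uio0, say) and pass
the descriptor to service_irqs(). Each wakeup goes through the scheduler
before the handler runs, which is the per-interrupt cost under discussion.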

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-06-07 10:01                                   ` Avi Kivity
@ 2009-06-07 10:35                                     ` Ingo Molnar
  -1 siblings, 0 replies; 183+ messages in thread
From: Ingo Molnar @ 2009-06-07 10:35 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Linus Torvalds, George Dunlap, Thomas Gleixner, David Miller,
	jeremy, Dan Magenheimer, xen-devel, x86, linux-kernel,
	Keir Fraser, gregkh, kurt.hackel, Ian Pratt, xen-users,
	ksrinivasan, EAnderson, wimcoekaerts, Stephen Spector,
	jens.axboe, npiggin


* Avi Kivity <avi@redhat.com> wrote:

> Ingo Molnar wrote:
>>> There is in fact a way to get dom0 support with nearly no changes to 
>>> Linux, but it involves massive changes to Xen itself and requires 
>>> hardware support: run dom0 as a fully virtualized guest, and assign 
>>> it all the resources dom0 can access.  It's probably a massive effort 
>>> though.
>>>
>>> I've considered it for kvm when faced with the "I want a thin  
>>> hypervisor" question: compile the hypervisor kernel with PCI support 
>>> but nothing else (no CONFIG_BLOCK or CONFIG_NET, no device drivers), 
>>> load userspace from initramfs, and assign host devices to one or more 
>>> privileged guests.  You could probably run the host with a heavily 
>>> stripped configuration, and enjoy the slimness while every interrupt 
>>> invokes the scheduler, a context switch, and maybe an IPI for good 
>>> measure.
>>>     
>>
>> This would be an acceptable model i suspect, if someone wants a 'slim 
>> hypervisor'.
>>
>> We can context switch way faster than we handle IRQs. Plus in a  
>> slimmed-down config we could intentionally slim down aspects of the  
>> scheduler as well, if it ever became a measurable performance issue.  
>> The hypervisor would run a minimal user-space and most of the  
>> context-switching overhead relates to having a full-fledged user-space 
>> with rich requirements. So there's no real conceptual friction between 
>> a 'lean and mean' hypervisor and a full-featured native kernel.
>>   
>
> The context switch would be taken by the Xen scheduler, not the Linux  
> scheduler. [...]

The 'slim hypervisor' model i was suggesting was a slimmed down 
_Linux_ kernel.

	Ingo

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-06-07 10:35                                     ` Ingo Molnar
@ 2009-06-07 12:46                                       ` Avi Kivity
  -1 siblings, 0 replies; 183+ messages in thread
From: Avi Kivity @ 2009-06-07 12:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, George Dunlap, Thomas Gleixner, David Miller,
	jeremy, Dan Magenheimer, xen-devel, x86, linux-kernel,
	Keir Fraser, gregkh, kurt.hackel, Ian Pratt, xen-users,
	ksrinivasan, EAnderson, wimcoekaerts, Stephen Spector,
	jens.axboe, npiggin

Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
>
>   
>> Ingo Molnar wrote:
>>     
>>>> There is in fact a way to get dom0 support with nearly no changes to 
>>>> Linux, but it involves massive changes to Xen itself and requires 
>>>> hardware support: run dom0 as a fully virtualized guest, and assign 
>>>> it all the resources dom0 can access.  It's probably a massive effort 
>>>> though.
>>>>
>>>> I've considered it for kvm when faced with the "I want a thin  
>>>> hypervisor" question: compile the hypervisor kernel with PCI support 
>>>> but nothing else (no CONFIG_BLOCK or CONFIG_NET, no device drivers), 
>>>> load userspace from initramfs, and assign host devices to one or more 
>>>> privileged guests.  You could probably run the host with a heavily 
>>>> stripped configuration, and enjoy the slimness while every interrupt 
>>>> invokes the scheduler, a context switch, and maybe an IPI for good 
>>>> measure.
>>>>     
>>>>         
>>> This would be an acceptable model i suspect, if someone wants a 'slim 
>>> hypervisor'.
>>>
>>> We can context switch way faster than we handle IRQs. Plus in a  
>>> slimmed-down config we could intentionally slim down aspects of the  
>>> scheduler as well, if it ever became a measurable performance issue.  
>>> The hypervisor would run a minimal user-space and most of the  
>>> context-switching overhead relates to having a full-fledged user-space 
>>> with rich requirements. So there's no real conceptual friction between 
>>> a 'lean and mean' hypervisor and a full-featured native kernel.
>>>   
>>>       
>> The context switch would be taken by the Xen scheduler, not the Linux  
>> scheduler. [...]
>>     
>
> The 'slim hypervisor' model i was suggesting was a slimmed down 
> _Linux_ kernel.
>   

Yeah, I lost the context.  I should reduce my own context switching.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: Xen is a feature
  2009-06-07 12:46                                       ` Avi Kivity
@ 2009-06-07 13:02                                         ` Jaswinder Singh Rajput
  -1 siblings, 0 replies; 183+ messages in thread
From: Jaswinder Singh Rajput @ 2009-06-07 13:02 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Ingo Molnar, Linus Torvalds, George Dunlap, Thomas Gleixner,
	David Miller, jeremy, Dan Magenheimer, xen-devel, x86,
	linux-kernel, Keir Fraser, gregkh, kurt.hackel, Ian Pratt,
	xen-users, ksrinivasan, EAnderson, wimcoekaerts, Stephen Spector,
	jens.axboe, npiggin

On Sun, 2009-06-07 at 15:46 +0300, Avi Kivity wrote:
> Ingo Molnar wrote:
> >
> > The 'slim hypervisor' model i was suggesting was a slimmed down 
> > _Linux_ kernel.
> >   
> 
> Yeah, I lost the context.  I should reduce my own context switching.
> 

It would be better to monitor context switches, entries/exits and other
useful parameters (both counts and frequencies) through debugfs, so that
the data can be used to improve performance.

Thanks,
--
JSR


^ permalink raw reply	[flat|nested] 183+ messages in thread

end of thread, other threads:[~2009-06-07 13:03 UTC | newest]

Thread overview: 183+ messages
2009-05-12 23:25 [GIT PULL] Xen APIC hooks (with io_apic_ops) Jeremy Fitzhardinge
2009-05-12 23:25 ` [PATCH 01/17] xen/dom0: handle acpi lapic parsing in Xen dom0 Jeremy Fitzhardinge
2009-05-12 23:25 ` [PATCH 02/17] x86: add io_apic_ops to allow interception Jeremy Fitzhardinge
2009-05-25  3:54   ` Ingo Molnar
2009-05-25  3:54     ` Ingo Molnar
2009-05-27  7:17     ` Jeremy Fitzhardinge
2009-05-12 23:25 ` [PATCH 03/17] xen: implement io_apic_ops Jeremy Fitzhardinge
2009-05-12 23:25 ` [PATCH 04/17] xen: create dummy ioapic mapping Jeremy Fitzhardinge
2009-05-12 23:25 ` [PATCH 05/17] xen: implement pirq type event channels Jeremy Fitzhardinge
2009-05-12 23:25 ` [PATCH 06/17] x86/io_apic: add get_nr_irqs_gsi() Jeremy Fitzhardinge
2009-05-12 23:25   ` Jeremy Fitzhardinge
2009-05-12 23:25 ` [PATCH 07/17] xen/apic: identity map gsi->irqs Jeremy Fitzhardinge
2009-05-12 23:25   ` Jeremy Fitzhardinge
2009-05-12 23:25 ` [PATCH 08/17] xen: direct irq registration to pirq event channels Jeremy Fitzhardinge
2009-05-12 23:25 ` [PATCH 09/17] xen: bind pirq to vector and event channel Jeremy Fitzhardinge
2009-05-12 23:25 ` [PATCH 10/17] xen: pre-initialize legacy irqs early Jeremy Fitzhardinge
2009-05-12 23:25 ` [PATCH 11/17] xen: don't setup acpi interrupt unless there is one Jeremy Fitzhardinge
2009-05-12 23:25 ` [PATCH 12/17] xen: use acpi_get_override_irq() to get triggering for legacy irqs Jeremy Fitzhardinge
2009-05-12 23:25 ` [PATCH 13/17] xen: initialize irq 0 too Jeremy Fitzhardinge
2009-05-12 23:25 ` [PATCH 14/17] xen: dynamically allocate irq & event structures Jeremy Fitzhardinge
2009-05-12 23:25 ` [PATCH 15/17] xen: set pirq name to something useful Jeremy Fitzhardinge
2009-05-12 23:25 ` [PATCH 16/17] xen: fix legacy irq setup, make ioapic-less machines work Jeremy Fitzhardinge
2009-05-12 23:25 ` [PATCH 17/17] xen: disable MSI Jeremy Fitzhardinge
2009-05-19 12:35 ` [GIT PULL] Xen APIC hooks (with io_apic_ops) Ingo Molnar
2009-05-19 12:35   ` Ingo Molnar
2009-05-20 17:57   ` Jeremy Fitzhardinge
2009-05-20 17:57     ` Jeremy Fitzhardinge
2009-05-25  4:10     ` Ingo Molnar
2009-05-25  4:10       ` Ingo Molnar
2009-05-26 12:46       ` [Xen-devel] " George Dunlap
2009-05-26 12:46         ` George Dunlap
2009-05-26 18:26         ` [Xen-devel] " Avi Kivity
2009-05-26 18:26           ` Avi Kivity
2009-05-26 19:18           ` [Xen-devel] " Dan Magenheimer
2009-05-26 19:18             ` Dan Magenheimer
2009-05-26 19:41             ` [Xen-devel] " Avi Kivity
2009-05-26 19:41               ` Avi Kivity
2009-05-28  0:13             ` [Xen-devel] " Ingo Molnar
2009-05-28  0:13               ` Ingo Molnar
2009-05-28  0:49               ` [Xen-devel] " Jeremy Fitzhardinge
2009-05-28  0:49                 ` Jeremy Fitzhardinge
2009-05-28  3:47               ` [Xen-devel] " Dan Magenheimer
2009-05-28 12:03                 ` Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant (was: Re: Re: [GIT PULL] Xen APIC hooks (with io_apic_ops)) Luke S Crawford
2009-05-28 13:39                   ` Tim Post
2009-05-28 22:23                     ` Luke S Crawford
2009-05-29  1:00                       ` Tim Post
2009-05-29  8:31                         ` Tim Post
2009-05-29  9:49                           ` George Dunlap
2009-05-29 13:42                       ` Dan Magenheimer
2009-05-30 21:02                         ` Luke S Crawford
2009-05-31 16:44                           ` Tim Post
2009-05-31 17:00                             ` Tim Post
2009-05-31 19:48                               ` Dan Magenheimer
2009-06-02  0:15                                 ` Luke S Crawford
2009-06-01 18:04                           ` Dan Magenheimer
2009-05-30  1:10                     ` Distro kernel and 'virtualization server' vs. 'server that sometimes runs virtual instances' rant Michael David Crawford
2009-05-28 14:26               ` [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops) George Dunlap
2009-05-28 14:26                 ` George Dunlap
2009-05-29  0:45               ` Xen is a feature Jeremy Fitzhardinge
2009-05-29  0:45                 ` Jeremy Fitzhardinge
2009-05-29  1:27                 ` Greg KH
2009-05-29  4:05                 ` David Miller
2009-05-29  6:37                   ` Jaswinder Singh Rajput
2009-05-29  6:51                     ` David Miller
2009-05-29 12:01                   ` George Dunlap
2009-05-29 12:01                     ` George Dunlap
2009-05-29 14:14                     ` Pasi Kärkkäinen
2009-05-29 14:14                       ` Pasi Kärkkäinen
2009-05-29 21:29                       ` David Miller
2009-05-29 18:34                     ` Andi Kleen
2009-05-29 21:31                       ` [Xen-devel] " Jeremy Fitzhardinge
2009-05-29 21:31                         ` Jeremy Fitzhardinge
2009-05-29 23:09                       ` [Xen-devel] " Nakajima, Jun
2009-05-29 23:09                         ` Nakajima, Jun
2009-05-29 23:26                         ` [Xen-devel] " Jeremy Fitzhardinge
2009-05-29 23:26                           ` Jeremy Fitzhardinge
2009-06-02 15:23                     ` Thomas Gleixner
2009-06-02 15:23                       ` Thomas Gleixner
2009-06-02 16:41                       ` George Dunlap
2009-06-02 16:41                         ` George Dunlap
2009-06-02 17:28                         ` Chris Friesen
2009-06-02 17:28                           ` Chris Friesen
2009-06-02 17:46                         ` Linus Torvalds
2009-06-02 17:46                           ` Linus Torvalds
2009-06-02 18:02                           ` Linus Torvalds
2009-06-02 18:02                             ` Linus Torvalds
2009-06-02 18:59                             ` Avi Kivity
2009-06-02 18:59                               ` Avi Kivity
2009-06-07  9:13                               ` Ingo Molnar
2009-06-07  9:13                                 ` Ingo Molnar
2009-06-07 10:01                                 ` Avi Kivity
2009-06-07 10:01                                   ` Avi Kivity
2009-06-07 10:35                                   ` Ingo Molnar
2009-06-07 10:35                                     ` Ingo Molnar
2009-06-07 12:46                                     ` Avi Kivity
2009-06-07 12:46                                       ` Avi Kivity
2009-06-07 13:02                                       ` Jaswinder Singh Rajput
2009-06-07 13:02                                         ` Jaswinder Singh Rajput
2009-06-04 14:02                           ` [Xen-users] " Thomas Goirand
2009-06-04 14:02                             ` Thomas Goirand
2009-06-02 18:59                         ` Thomas Gleixner
2009-06-02 18:59                           ` Thomas Gleixner
2009-06-03 19:49                       ` Bill Davidsen
2009-06-03 19:49                         ` Bill Davidsen
2009-06-03 20:20                         ` Thomas Gleixner
2009-06-03 20:20                           ` Thomas Gleixner
2009-06-03 22:37                           ` Bill Davidsen
2009-06-03 22:37                             ` Bill Davidsen
2009-06-03 23:29                             ` Frans Pop
2009-06-04 13:21                               ` George Dunlap
2009-06-04 13:21                                 ` George Dunlap
2009-06-04 15:10                                 ` Theodore Tso
2009-06-04 15:10                                   ` Theodore Tso
2009-06-04 15:31                                 ` Chris Friesen
2009-06-04 15:31                                   ` Chris Friesen
2009-06-05  4:14                               ` Bill Davidsen
2009-06-05  4:55                                 ` Chris Friesen
2009-06-02 22:40                     ` Steven Rostedt
2009-06-02 22:40                       ` Steven Rostedt
2009-06-02 23:28                       ` Merge Xen (the hypervisor) into Linux Ingo Molnar
2009-06-02 23:28                         ` Ingo Molnar
2009-06-03  0:00                         ` Dan Magenheimer
2009-06-03  0:32                           ` Thomas Gleixner
2009-06-03  2:43                           ` Theodore Tso
2009-06-03  3:42                             ` Steven Rostedt
2009-06-03  4:49                               ` Dan Magenheimer
2009-06-03  4:58                                 ` David Miller
2009-06-03  5:07                                   ` Steven Rostedt
2009-06-03  5:22                                 ` Steven Rostedt
2009-06-03 12:03                                   ` George Dunlap
2009-06-03 12:03                                     ` George Dunlap
2009-06-03 19:05                                     ` Theodore Tso
2009-06-03 19:05                                       ` Theodore Tso
2009-06-03 21:49                                       ` Samuel Thibault
2009-06-04 13:43                                       ` [Xen-users] " Florian Manschwetus
2009-06-04 14:03                                         ` Steven Rostedt
2009-06-04 14:03                                           ` Steven Rostedt
2009-06-03  7:28                             ` Gerd Hoffmann
2009-06-03  8:47                               ` Alan Cox
2009-06-03  9:09                                 ` Gerd Hoffmann
2009-06-03  9:20                                   ` Keir Fraser
2009-06-03  9:20                                     ` Keir Fraser
2009-06-03 11:15                                   ` Theodore Tso
2009-06-03 11:39                                     ` Keir Fraser
2009-06-03 11:39                                       ` Keir Fraser
2009-06-03 11:41                                     ` Gerd Hoffmann
2009-06-03 11:41                                     ` Gerd Hoffmann
2009-06-03  7:28                             ` Gerd Hoffmann
2009-06-03  1:00                         ` Joel Becker
2009-06-03  1:00                           ` Joel Becker
2009-06-03  2:00                           ` david
2009-06-03  2:00                             ` david
2009-06-03  7:59                           ` Alan Cox
2009-06-03  7:59                             ` Alan Cox
2009-06-03  8:07                         ` Christian Tramnitz
2009-06-04 18:53                           ` Linus Torvalds
2009-06-05  0:09                             ` Samuel Thibault
2009-06-05  0:09                               ` Samuel Thibault
2009-06-05  0:18                               ` David Miller
2009-06-05  0:54                               ` Linus Torvalds
2009-06-05  0:54                                 ` Linus Torvalds
2009-06-03 17:31                         ` Chris Friesen
2009-06-03 17:31                           ` Chris Friesen
2009-06-03 17:36                           ` Alan Cox
2009-06-03 17:36                             ` Alan Cox
2009-06-02 23:41                       ` Xen is a feature Thomas Gleixner
2009-06-02 23:41                         ` Thomas Gleixner
2009-05-30  2:19                 ` [Xen-devel] " Andy Burns
2009-05-26 21:19         ` [Xen-devel] Re: [GIT PULL] Xen APIC hooks (with io_apic_ops) Gerd Hoffmann
2009-05-26 21:19           ` Gerd Hoffmann
2009-05-27 10:14           ` [Xen-devel] " George Dunlap
2009-05-27 10:14             ` George Dunlap
2009-05-24 20:10   ` Avi Kivity
2009-05-24 20:10     ` Avi Kivity
2009-05-25  3:51     ` Ingo Molnar
2009-05-25  3:51       ` Ingo Molnar
2009-05-25  4:55       ` Avi Kivity
2009-05-25  4:55         ` Avi Kivity
2009-05-25  5:06         ` Ingo Molnar
2009-05-25  5:06           ` Ingo Molnar
2009-05-25  5:12           ` Avi Kivity
2009-05-25  5:12             ` Avi Kivity
2009-05-25  5:19             ` Ingo Molnar
