* [GIT PULL] 2.6.30 Xen core updates
@ 2009-03-13  8:11 Jeremy Fitzhardinge
  2009-03-13  8:11   ` Jeremy Fitzhardinge
                   ` (23 more replies)
  0 siblings, 24 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel


This series updates the kernel's baseline domU Xen functionality.
It's mostly bugfixes, but there are a couple of new Xen-specific drivers.

The series depends on the earlier x86/brk and x86/paravirt patches I
posted a couple of days ago.

Thanks,
	J

The following changes since commit 6b3933081104945c557d8fe678301cc1bdefdcc8:
  Jeremy Fitzhardinge (1):
        Merge branch 'push/x86/brk' into HEAD

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git push/xen/master

Alex Nixon (1):
      Xen: Add virt_to_pfn helper function

Hannes Eder (1):
      NULL noise: arch/x86/xen/smp.c

Ian Campbell (6):
      xen: add irq_from_evtchn
      xen: add /dev/xen/evtchn driver
      xen: export ioctl headers to userspace
      xen: drop kexec bits from /sys/hypervisor since kexec isn't implemented yet
      xen: remove suspend_cancel hook
      xen: use device model for suspending xenbus devices

Jeremy Fitzhardinge (18):
      xen: disable preempt for leave_lazy_mmu
      xen: separate p2m allocation from setting
      xen: dynamically allocate p2m tables
      xen: split construction of p2m mfn tables from registration
      xen: clean up xen_load_gdt
      xen: make xen_load_gdt simpler
      xen: remove xen_load_gdt debug
      xen: reserve i386 Xen pagetables
      xen: mask XSAVE from cpuid
      xen: add FIX_TEXT_POKE to fixmap
      x86-64: remove PGE from must-have feature list
      xen/dev-evtchn: clean up locking in evtchn
      xen: add "capabilities" file
      xen: add /sys/hypervisor support
      xen/sys/hypervisor: change writable_pt to features
      xen/xenbus: export xenbus_dev_changed
      Merge branches 'push/xen/dev-evtchn', 'push/xen/xenfs' and 'push/xen/sys-hypervisor' into push/xen/control
      Merge branches 'push/xen/control' and 'push/xen/xenbus' into push/xen/master

 arch/x86/include/asm/required-features.h |    2 +-
 arch/x86/include/asm/xen/page.h          |    3 +-
 arch/x86/xen/enlighten.c                 |   76 ++++-
 arch/x86/xen/mmu.c                       |  116 ++++++--
 arch/x86/xen/mmu.h                       |    3 +
 arch/x86/xen/smp.c                       |    4 +-
 drivers/xen/Kconfig                      |   20 ++
 drivers/xen/Makefile                     |    4 +-
 drivers/xen/events.c                     |    6 +
 drivers/xen/evtchn.c                     |  507 ++++++++++++++++++++++++++++++
 drivers/xen/manage.c                     |    9 +-
 drivers/xen/sys-hypervisor.c             |  445 ++++++++++++++++++++++++++
 drivers/xen/xenbus/xenbus_probe.c        |   61 +---
 drivers/xen/xenbus/xenbus_xs.c           |    2 +
 drivers/xen/xenfs/super.c                |   19 +-
 include/Kbuild                           |    1 +
 include/xen/Kbuild                       |    1 +
 include/xen/events.h                     |    3 +
 include/xen/evtchn.h                     |   88 +++++
 include/xen/interface/version.h          |    3 +
 include/xen/xenbus.h                     |    3 +-
 21 files changed, 1269 insertions(+), 107 deletions(-)
 create mode 100644 drivers/xen/evtchn.c
 create mode 100644 drivers/xen/sys-hypervisor.c
 create mode 100644 include/xen/Kbuild
 create mode 100644 include/xen/evtchn.h


* [PATCH 01/24] xen: disable preempt for leave_lazy_mmu
  2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
@ 2009-03-13  8:11   ` Jeremy Fitzhardinge
  2009-03-13  8:11   ` Jeremy Fitzhardinge
                     ` (22 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

xen_mc_flush() requires preemption to be disabled for its own sanity,
so disable it while we're flushing.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
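Note (not part of the commit): the ordering requirement described above
can be pictured with a small user-space model.  This is only a sketch
under stated assumptions: every model_* name and both counters are
hypothetical stand-ins, not the kernel's preempt_disable(),
preempt_enable() or xen_mc_flush().

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's preempt count and flush. */
static int preempt_depth;	/* > 0 models "preemption disabled" */
static int flushes_done;

static void model_preempt_disable(void)
{
	preempt_depth++;
}

static void model_preempt_enable(void)
{
	assert(preempt_depth > 0);
	preempt_depth--;
}

/* Models a flush that insists on running with preemption off. */
static void model_mc_flush(void)
{
	assert(preempt_depth > 0);
	flushes_done++;
}

/* Same shape as the patched function: disable, flush, enable. */
static void model_leave_lazy_mmu(void)
{
	model_preempt_disable();
	model_mc_flush();
	model_preempt_enable();
}
```

Because the model uses a nesting counter, calling model_leave_lazy_mmu()
from a context that has already disabled preemption leaves the outer
state untouched.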
 arch/x86/xen/mmu.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 185b547..eceff87 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1812,8 +1812,10 @@ __init void xen_post_allocator_init(void)
 
 static void xen_leave_lazy_mmu(void)
 {
+	preempt_disable();
 	xen_mc_flush();
 	paravirt_leave_lazy_mmu();
+	preempt_enable();
 }
 
 const struct pv_mmu_ops xen_mmu_ops __initdata = {
-- 
1.6.0.6


* [PATCH 02/24] xen: separate p2m allocation from setting
  2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
@ 2009-03-13  8:11   ` Jeremy Fitzhardinge
  2009-03-13  8:11   ` Jeremy Fitzhardinge
                     ` (22 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

When doing very early p2m setting, we need to separate setting
from allocation, so split things up accordingly.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
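Note (not part of the commit): the split between "try to set" and
"allocate, install, then retry" can be sketched in user space as below.
This is a simplified, single-threaded model under stated assumptions:
the table sizes are tiny, malloc() stands in for the page allocator, a
plain pointer comparison stands in for cmpxchg(), and all model_* names
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define ENTRIES_PER_PAGE 4		/* tiny, for illustration only */
#define TOP_ENTRIES      4
#define MAX_PAGES        (ENTRIES_PER_PAGE * TOP_ENTRIES)
#define INVALID_ENTRY    (~0UL)

static unsigned long missing[ENTRIES_PER_PAGE];
static unsigned long *top[TOP_ENTRIES];

static void model_init(void)
{
	for (unsigned i = 0; i < ENTRIES_PER_PAGE; i++)
		missing[i] = INVALID_ENTRY;
	for (unsigned i = 0; i < TOP_ENTRIES; i++)
		top[i] = missing;
}

/* Pure setter: never allocates; false means "leaf page is missing". */
static bool model_try_set(unsigned long pfn, unsigned long mfn)
{
	if (pfn >= MAX_PAGES) {
		assert(mfn == INVALID_ENTRY);
		return true;
	}

	unsigned topidx = pfn / ENTRIES_PER_PAGE;
	if (top[topidx] == missing)
		return mfn == INVALID_ENTRY;	/* nothing to store is fine */

	top[topidx][pfn % ENTRIES_PER_PAGE] = mfn;
	return true;
}

/* Installer: publishes a caller-supplied leaf if the slot is still empty. */
static bool model_install_leaf(unsigned long pfn, unsigned long *page)
{
	unsigned topidx = pfn / ENTRIES_PER_PAGE;

	for (unsigned i = 0; i < ENTRIES_PER_PAGE; i++)
		page[i] = INVALID_ENTRY;

	if (top[topidx] != missing)
		return false;
	top[topidx] = page;
	return true;
}

/* Wrapper: try, allocate and install on failure, then retry once. */
static void model_set(unsigned long pfn, unsigned long mfn)
{
	if (model_try_set(pfn, mfn))
		return;

	unsigned long *page = malloc(sizeof(*page) * ENTRIES_PER_PAGE);
	if (page == NULL)
		abort();
	if (!model_install_leaf(pfn, page))
		free(page);
	if (!model_try_set(pfn, mfn))
		abort();
}

static unsigned long model_get(unsigned long pfn)
{
	if (pfn >= MAX_PAGES)
		return INVALID_ENTRY;
	return top[pfn / ENTRIES_PER_PAGE][pfn % ENTRIES_PER_PAGE];
}
```

The point of the split is that an early caller with its own page can
call the installer directly, while the wrapper keeps the convenient
allocate-on-demand behaviour for everyone else.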
 arch/x86/xen/mmu.c |   61 +++++++++++++++++++++++++++++++++++++--------------
 arch/x86/xen/mmu.h |    3 ++
 2 files changed, 47 insertions(+), 17 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index eceff87..d534986 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -233,47 +233,74 @@ unsigned long get_phys_to_machine(unsigned long pfn)
 }
 EXPORT_SYMBOL_GPL(get_phys_to_machine);
 
-static void alloc_p2m(unsigned long **pp, unsigned long *mfnp)
+/* install a  new p2m_top page */
+bool install_p2mtop_page(unsigned long pfn, unsigned long *p)
 {
-	unsigned long *p;
+	unsigned topidx = p2m_top_index(pfn);
+	unsigned long **pfnp, *mfnp;
 	unsigned i;
 
-	p = (void *)__get_free_page(GFP_KERNEL | __GFP_NOFAIL);
-	BUG_ON(p == NULL);
+	pfnp = &p2m_top[topidx];
+	mfnp = &p2m_top_mfn[topidx];
 
 	for (i = 0; i < P2M_ENTRIES_PER_PAGE; i++)
 		p[i] = INVALID_P2M_ENTRY;
 
-	if (cmpxchg(pp, p2m_missing, p) != p2m_missing)
-		free_page((unsigned long)p);
-	else
+	if (cmpxchg(pfnp, p2m_missing, p) == p2m_missing) {
 		*mfnp = virt_to_mfn(p);
+		return true;
+	}
+
+	return false;
 }
 
-void set_phys_to_machine(unsigned long pfn, unsigned long mfn)
+static void alloc_p2m(unsigned long pfn)
 {
-	unsigned topidx, idx;
+	unsigned long *p;
 
-	if (unlikely(xen_feature(XENFEAT_auto_translated_physmap))) {
-		BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
-		return;
-	}
+	p = (void *)__get_free_page(GFP_KERNEL | __GFP_NOFAIL);
+	BUG_ON(p == NULL);
+
+	if (!install_p2mtop_page(pfn, p))
+		free_page((unsigned long)p);
+}
+
+/* Try to install p2m mapping; fail if intermediate bits missing */
+bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
+{
+	unsigned topidx, idx;
 
 	if (unlikely(pfn >= MAX_DOMAIN_PAGES)) {
 		BUG_ON(mfn != INVALID_P2M_ENTRY);
-		return;
+		return true;
 	}
 
 	topidx = p2m_top_index(pfn);
 	if (p2m_top[topidx] == p2m_missing) {
-		/* no need to allocate a page to store an invalid entry */
 		if (mfn == INVALID_P2M_ENTRY)
-			return;
-		alloc_p2m(&p2m_top[topidx], &p2m_top_mfn[topidx]);
+			return true;
+		return false;
 	}
 
 	idx = p2m_index(pfn);
 	p2m_top[topidx][idx] = mfn;
+
+	return true;
+}
+
+void set_phys_to_machine(unsigned long pfn, unsigned long mfn)
+{
+	if (unlikely(xen_feature(XENFEAT_auto_translated_physmap))) {
+		BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
+		return;
+	}
+
+	if (unlikely(!__set_phys_to_machine(pfn, mfn)))  {
+		alloc_p2m(pfn);
+
+		if (!__set_phys_to_machine(pfn, mfn))
+			BUG();
+	}
 }
 
 unsigned long arbitrary_virt_to_mfn(void *vaddr)
diff --git a/arch/x86/xen/mmu.h b/arch/x86/xen/mmu.h
index 24d1b44..da73026 100644
--- a/arch/x86/xen/mmu.h
+++ b/arch/x86/xen/mmu.h
@@ -11,6 +11,9 @@ enum pt_level {
 };
 
 
+bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn);
+bool install_p2mtop_page(unsigned long pfn, unsigned long *p);
+
 void set_pte_mfn(unsigned long vaddr, unsigned long pfn, pgprot_t flags);
 
 
-- 
1.6.0.6


* [PATCH 03/24] xen: dynamically allocate p2m tables
  2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
  2009-03-13  8:11   ` Jeremy Fitzhardinge
  2009-03-13  8:11   ` Jeremy Fitzhardinge
@ 2009-03-13  8:11 ` Jeremy Fitzhardinge
  2009-03-13  8:11 ` [PATCH 04/24] xen: split construction of p2m mfn tables from registration Jeremy Fitzhardinge
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Saves about 128k of static object size.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
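Note (not part of the commit): the tables move from static arrays to
space carved out of an early reserved region at boot.  The general
technique is a bump allocator with alignment, sketched below in user
space.  This is an illustration only: the arena, its size and the
model_extend() name are assumptions, and alignment here is relative to
the start of the arena rather than to a real address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 4096

static unsigned char arena[ARENA_SIZE];	/* models the reserved region */
static size_t arena_used;			/* current end of allocations */

/*
 * Round the current end up to 'align' (a power of two), hand out
 * 'size' zeroed bytes from there, and advance the end.
 */
static void *model_extend(size_t size, size_t align)
{
	size_t mask = align - 1;
	size_t start;

	assert(align != 0 && (align & mask) == 0);

	start = (arena_used + mask) & ~mask;
	assert(start <= ARENA_SIZE && size <= ARENA_SIZE - start);

	arena_used = start + size;
	memset(arena + start, 0, size);
	return arena + start;
}
```

Nothing is ever freed in this scheme, which suits tables that are set
up once during boot and then live for the lifetime of the system.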
 arch/x86/xen/mmu.c |   38 +++++++++++++++++++++++++++++---------
 1 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index d534986..05280b4 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -159,18 +159,14 @@ DEFINE_PER_CPU(unsigned long, xen_current_cr3);	 /* actual vcpu cr3 */
 #define TOP_ENTRIES		(MAX_DOMAIN_PAGES / P2M_ENTRIES_PER_PAGE)
 
 /* Placeholder for holes in the address space */
-static unsigned long p2m_missing[P2M_ENTRIES_PER_PAGE] __page_aligned_data =
-		{ [ 0 ... P2M_ENTRIES_PER_PAGE-1 ] = ~0UL };
+static unsigned long *p2m_missing;
 
  /* Array of pointers to pages containing p2m entries */
-static unsigned long *p2m_top[TOP_ENTRIES] __page_aligned_data =
-		{ [ 0 ... TOP_ENTRIES - 1] = &p2m_missing[0] };
+static unsigned long **p2m_top;
 
 /* Arrays of p2m arrays expressed in mfns used for save/restore */
-static unsigned long p2m_top_mfn[TOP_ENTRIES] __page_aligned_bss;
-
-static unsigned long p2m_top_mfn_list[TOP_ENTRIES / P2M_ENTRIES_PER_PAGE]
-	__page_aligned_bss;
+static unsigned long *p2m_top_mfn;
+static unsigned long *p2m_top_mfn_list;
 
 static inline unsigned p2m_top_index(unsigned long pfn)
 {
@@ -183,18 +179,28 @@ static inline unsigned p2m_index(unsigned long pfn)
 	return pfn % P2M_ENTRIES_PER_PAGE;
 }
 
+#define SIZE_TOP_MFN sizeof(*p2m_top_mfn) * TOP_ENTRIES
+#define SIZE_TOP_MFN_LIST sizeof(*p2m_top_mfn_list) *			\
+	(TOP_ENTRIES / P2M_ENTRIES_PER_PAGE)
+
+RESERVE_BRK(xen_top_mfn, SIZE_TOP_MFN);
+RESERVE_BRK(xen_top_mfn_list, SIZE_TOP_MFN_LIST);
+
 /* Build the parallel p2m_top_mfn structures */
 void xen_setup_mfn_list_list(void)
 {
 	unsigned pfn, idx;
 
+	p2m_top_mfn = extend_brk(SIZE_TOP_MFN, PAGE_SIZE);
+	p2m_top_mfn_list = extend_brk(SIZE_TOP_MFN_LIST, PAGE_SIZE);
+
 	for (pfn = 0; pfn < MAX_DOMAIN_PAGES; pfn += P2M_ENTRIES_PER_PAGE) {
 		unsigned topidx = p2m_top_index(pfn);
 
 		p2m_top_mfn[topidx] = virt_to_mfn(p2m_top[topidx]);
 	}
 
-	for (idx = 0; idx < ARRAY_SIZE(p2m_top_mfn_list); idx++) {
+	for (idx = 0; idx < (TOP_ENTRIES / P2M_ENTRIES_PER_PAGE); idx++) {
 		unsigned topidx = idx * P2M_ENTRIES_PER_PAGE;
 		p2m_top_mfn_list[idx] = virt_to_mfn(&p2m_top_mfn[topidx]);
 	}
@@ -206,12 +212,26 @@ void xen_setup_mfn_list_list(void)
 	HYPERVISOR_shared_info->arch.max_pfn = xen_start_info->nr_pages;
 }
 
+#define SIZE_P2M_MISSING	sizeof(*p2m_missing) * P2M_ENTRIES_PER_PAGE
+#define SIZE_P2M_TOP		sizeof(*p2m_top) * TOP_ENTRIES
+RESERVE_BRK(xen_p2m_missing, SIZE_P2M_MISSING);
+RESERVE_BRK(xen_p2m_top, SIZE_P2M_TOP);
+
 /* Set up p2m_top to point to the domain-builder provided p2m pages */
 void __init xen_build_dynamic_phys_to_machine(void)
 {
 	unsigned long *mfn_list = (unsigned long *)xen_start_info->mfn_list;
 	unsigned long max_pfn = min(MAX_DOMAIN_PAGES, xen_start_info->nr_pages);
 	unsigned pfn;
+	unsigned i;
+
+	p2m_missing = extend_brk(SIZE_P2M_MISSING, PAGE_SIZE);
+	for(i = 0; i < P2M_ENTRIES_PER_PAGE; i++)
+		p2m_missing[i] = ~0ul;
+
+	p2m_top = extend_brk(SIZE_P2M_TOP, PAGE_SIZE);
+	for(i = 0; i < TOP_ENTRIES; i++)
+		p2m_top[i] = p2m_missing;
 
 	for (pfn = 0; pfn < max_pfn; pfn += P2M_ENTRIES_PER_PAGE) {
 		unsigned topidx = p2m_top_index(pfn);
-- 
1.6.0.6


* [PATCH 04/24] xen: split construction of p2m mfn tables from registration
  2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
                   ` (2 preceding siblings ...)
  2009-03-13  8:11 ` [PATCH 03/24] xen: dynamically allocate p2m tables Jeremy Fitzhardinge
@ 2009-03-13  8:11 ` Jeremy Fitzhardinge
  2009-03-13  8:11   ` Jeremy Fitzhardinge
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Build the p2m_mfn_list_list early with the rest of the p2m table, but
register it later when the real shared_info structure is in place.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/mmu.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 05280b4..2d30b74 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -187,7 +187,7 @@ RESERVE_BRK(xen_top_mfn, SIZE_TOP_MFN);
 RESERVE_BRK(xen_top_mfn_list, SIZE_TOP_MFN_LIST);
 
 /* Build the parallel p2m_top_mfn structures */
-void xen_setup_mfn_list_list(void)
+static void __init xen_build_mfn_list_list(void)
 {
 	unsigned pfn, idx;
 
@@ -204,7 +204,10 @@ void xen_setup_mfn_list_list(void)
 		unsigned topidx = idx * P2M_ENTRIES_PER_PAGE;
 		p2m_top_mfn_list[idx] = virt_to_mfn(&p2m_top_mfn[topidx]);
 	}
+}
 
+void xen_setup_mfn_list_list(void)
+{
 	BUG_ON(HYPERVISOR_shared_info == &xen_dummy_shared_info);
 
 	HYPERVISOR_shared_info->arch.pfn_to_mfn_frame_list_list =
@@ -238,6 +241,8 @@ void __init xen_build_dynamic_phys_to_machine(void)
 
 		p2m_top[topidx] = &mfn_list[pfn];
 	}
+
+	xen_build_mfn_list_list();
 }
 
 unsigned long get_phys_to_machine(unsigned long pfn)
-- 
1.6.0.6


* [PATCH 05/24] xen: clean up xen_load_gdt
  2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
@ 2009-03-13  8:11   ` Jeremy Fitzhardinge
  2009-03-13  8:11   ` Jeremy Fitzhardinge
                     ` (22 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Makes the logic a bit clearer.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/enlighten.c |   15 +++++++++++++--
 1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 70b355d..5776dc2 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -301,10 +301,21 @@ static void xen_load_gdt(const struct desc_ptr *dtr)
 	frames = mcs.args;
 
 	for (f = 0; va < dtr->address + size; va += PAGE_SIZE, f++) {
-		frames[f] = arbitrary_virt_to_mfn((void *)va);
+		int level;
+		pte_t *ptep = lookup_address(va, &level);
+		unsigned long pfn, mfn;
+		void *virt;
+
+		BUG_ON(ptep == NULL);
+
+		pfn = pte_pfn(*ptep);
+		mfn = pfn_to_mfn(pfn);
+		virt = __va(PFN_PHYS(pfn));
+
+		frames[f] = mfn;
 
 		make_lowmem_page_readonly((void *)va);
-		make_lowmem_page_readonly(mfn_to_virt(frames[f]));
+		make_lowmem_page_readonly(virt);
 	}
 
 	MULTI_set_gdt(mcs.mc, frames, size / sizeof(struct desc_struct));
-- 
1.6.0.6


* [PATCH 06/24] xen: make xen_load_gdt simpler
  2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
@ 2009-03-13  8:11   ` Jeremy Fitzhardinge
  2009-03-13  8:11   ` Jeremy Fitzhardinge
                     ` (22 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Remove the use of the multicall machinery, which is unnecessary here
(gdt loading is never performance critical).  This removes the implicit
use of percpu variables, which makes it easier to understand how
the percpu code's use of load_gdt interacts with this code.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
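Note (not part of the commit): the on-stack frames[] array is sized
from the descriptor-table limit.  The arithmetic can be checked with
the small sketch below, which assumes 4k pages and 8-byte descriptors
as stated in the function's own comment; the struct and the model_*
names are hypothetical.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u
#define MODEL_DESC_SIZE 8u

struct model_gdt_extent {
	unsigned bytes;		/* limit + 1 */
	unsigned pages;		/* frames needed, rounded up */
	unsigned entries;	/* descriptors passed to the hypervisor */
};

/* 'limit' plays the role of dtr->size: the last valid byte offset. */
static struct model_gdt_extent model_gdt_extent(unsigned short limit)
{
	struct model_gdt_extent e;

	e.bytes = (unsigned)limit + 1;
	assert(e.bytes <= 65536);	/* a GDT is at most 64k */
	e.pages = (e.bytes + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
	e.entries = e.bytes / MODEL_DESC_SIZE;
	return e;
}
```

With the largest possible limit this gives 16 pages, so the
variable-length array stays small enough to live on the stack.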
 arch/x86/xen/enlighten.c |   14 ++++++--------
 1 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 5776dc2..48b399b 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -284,12 +284,11 @@ static void xen_set_ldt(const void *addr, unsigned entries)
 
 static void xen_load_gdt(const struct desc_ptr *dtr)
 {
-	unsigned long *frames;
 	unsigned long va = dtr->address;
 	unsigned int size = dtr->size + 1;
 	unsigned pages = (size + PAGE_SIZE - 1) / PAGE_SIZE;
+	unsigned long frames[pages];
 	int f;
-	struct multicall_space mcs;
 
 	/* A GDT can be up to 64k in size, which corresponds to 8192
 	   8-byte entries, or 16 4k pages.. */
@@ -297,9 +296,6 @@ static void xen_load_gdt(const struct desc_ptr *dtr)
 	BUG_ON(size > 65536);
 	BUG_ON(va & ~PAGE_MASK);
 
-	mcs = xen_mc_entry(sizeof(*frames) * pages);
-	frames = mcs.args;
-
 	for (f = 0; va < dtr->address + size; va += PAGE_SIZE, f++) {
 		int level;
 		pte_t *ptep = lookup_address(va, &level);
@@ -314,13 +310,15 @@ static void xen_load_gdt(const struct desc_ptr *dtr)
 
 		frames[f] = mfn;
 
+		printk("xen_load_gdt: %d va=%p mfn=%lx pfn=%lx va'=%p\n",
+		       f, (void *)va, mfn, pfn, virt);
+
 		make_lowmem_page_readonly((void *)va);
 		make_lowmem_page_readonly(virt);
 	}
 
-	MULTI_set_gdt(mcs.mc, frames, size / sizeof(struct desc_struct));
-
-	xen_mc_issue(PARAVIRT_LAZY_CPU);
+	if (HYPERVISOR_set_gdt(frames, size / sizeof(struct desc_struct)))
+		BUG();
 }
 
 static void load_TLS_descriptor(struct thread_struct *t,
-- 
1.6.0.6


* [PATCH 07/24] xen: remove xen_load_gdt debug
  2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
                   ` (5 preceding siblings ...)
  2009-03-13  8:11   ` Jeremy Fitzhardinge
@ 2009-03-13  8:11 ` Jeremy Fitzhardinge
  2009-03-13  8:11 ` [PATCH 08/24] xen: reserve i386 Xen pagetables Jeremy Fitzhardinge
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Don't need the noise.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/enlighten.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 48b399b..75b7a0f 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -310,9 +310,6 @@ static void xen_load_gdt(const struct desc_ptr *dtr)
 
 		frames[f] = mfn;
 
-		printk("xen_load_gdt: %d va=%p mfn=%lx pfn=%lx va'=%p\n",
-		       f, (void *)va, mfn, pfn, virt);
-
 		make_lowmem_page_readonly((void *)va);
 		make_lowmem_page_readonly(virt);
 	}
-- 
1.6.0.6


* [PATCH 08/24] xen: reserve i386 Xen pagetables
  2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
                   ` (6 preceding siblings ...)
  2009-03-13  8:11 ` [PATCH 07/24] xen: remove xen_load_gdt debug Jeremy Fitzhardinge
@ 2009-03-13  8:11 ` Jeremy Fitzhardinge
  2009-03-13  8:11 ` [PATCH 09/24] NULL noise: arch/x86/xen/smp.c Jeremy Fitzhardinge
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Make sure the Xen-provided pagetables are reserved on x86-32.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/mmu.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 2d30b74..065fe8d 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1789,6 +1789,11 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
 
 	pin_pagetable_pfn(MMUEXT_PIN_L3_TABLE, PFN_DOWN(__pa(swapper_pg_dir)));
 
+	reserve_early(__pa(xen_start_info->pt_base),
+		      __pa(xen_start_info->pt_base +
+			   xen_start_info->nr_pt_frames * PAGE_SIZE),
+		      "XEN PAGETABLES");
+
 	return swapper_pg_dir;
 }
 #endif	/* CONFIG_X86_64 */
-- 
1.6.0.6


* [PATCH 09/24] NULL noise: arch/x86/xen/smp.c
  2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
                   ` (7 preceding siblings ...)
  2009-03-13  8:11 ` [PATCH 08/24] xen: reserve i386 Xen pagetables Jeremy Fitzhardinge
@ 2009-03-13  8:11 ` Jeremy Fitzhardinge
  2009-03-13  8:11 ` [PATCH 10/24] xen: mask XSAVE from cpuid Jeremy Fitzhardinge
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Hannes Eder, Jeremy Fitzhardinge

From: Hannes Eder <hannes@hanneseder.net>

Fix these sparse warnings:
  arch/x86/xen/smp.c:316:52: warning: Using plain integer as NULL pointer
  arch/x86/xen/smp.c:421:60: warning: Using plain integer as NULL pointer

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 arch/x86/xen/smp.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 8d47056..304d832 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -317,7 +317,7 @@ static int __cpuinit xen_cpu_up(unsigned int cpu)
 	BUG_ON(rc);
 
 	while(per_cpu(cpu_state, cpu) != CPU_ONLINE) {
-		HYPERVISOR_sched_op(SCHEDOP_yield, 0);
+		HYPERVISOR_sched_op(SCHEDOP_yield, NULL);
 		barrier();
 	}
 
@@ -422,7 +422,7 @@ static void xen_smp_send_call_function_ipi(const struct cpumask *mask)
 	/* Make sure other vcpus get a chance to run if they need to. */
 	for_each_cpu(cpu, mask) {
 		if (xen_vcpu_stolen(cpu)) {
-			HYPERVISOR_sched_op(SCHEDOP_yield, 0);
+			HYPERVISOR_sched_op(SCHEDOP_yield, NULL);
 			break;
 		}
 	}
-- 
1.6.0.6



* [PATCH 10/24] xen: mask XSAVE from cpuid
  2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
                   ` (8 preceding siblings ...)
  2009-03-13  8:11 ` [PATCH 09/24] NULL noise: arch/x86/xen/smp.c Jeremy Fitzhardinge
@ 2009-03-13  8:11 ` Jeremy Fitzhardinge
  2009-03-13  9:50     ` Jan Beulich
  2009-03-13  8:11 ` [PATCH 11/24] xen: add FIX_TEXT_POKE to fixmap Jeremy Fitzhardinge
                   ` (13 subsequent siblings)
  23 siblings, 1 reply; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Xen leaves XSAVE set in cpuid, but doesn't allow cr4.OSXSAVE
to be set.  This confuses the kernel and it ends up crashing on
an xsetbv instruction.

At boot time, try to set cr4.OSXSAVE, and mask XSAVE out of
cpuid if we can't.  This will produce a spurious error from Xen,
but allows us to support XSAVE if/when Xen does.

This also factors out the cpuid mask decisions to boot time.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
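Note: the probe described above can be illustrated with a minimal
userspace sketch. It is not the kernel code: struct fake_cpu,
fake_write_cr4() and probe_ecx_mask() are hypothetical stand-ins that
model a hypervisor silently filtering CR4 writes. Only the bit positions
(CPUID.1:ECX bit 26 for XSAVE, CR4 bit 18 for OSXSAVE) are taken from
the architecture.

```c
#include <assert.h>
#include <stdint.h>

#define CPUID1_ECX_XSAVE (1u << 26)   /* CPUID leaf 1, ECX bit 26 */
#define CR4_OSXSAVE      (1ul << 18)  /* CR4 bit 18 */

/* Hypothetical model of a CPU whose CR4 writes are filtered. */
struct fake_cpu {
	unsigned long cr4;
	unsigned long cr4_allowed;  /* bits the "hypervisor" lets through */
};

static void fake_write_cr4(struct fake_cpu *cpu, unsigned long val)
{
	cpu->cr4 = val & cpu->cr4_allowed;
}

/*
 * Return the mask to AND into CPUID.1:ECX.  If cpuid advertises XSAVE
 * but OSXSAVE does not stick in CR4, hide XSAVE from the rest of the
 * kernel.  CR4 is put back to its previous value afterwards.
 */
static uint32_t probe_ecx_mask(struct fake_cpu *cpu, uint32_t cpuid1_ecx)
{
	uint32_t mask = ~0u;
	unsigned long saved = cpu->cr4;

	if (cpuid1_ecx & CPUID1_ECX_XSAVE) {
		fake_write_cr4(cpu, saved | CR4_OSXSAVE);
		if ((cpu->cr4 & CR4_OSXSAVE) == 0)
			mask &= ~CPUID1_ECX_XSAVE;
		fake_write_cr4(cpu, saved);
	}
	return mask;
}
```

The patch itself clears OSXSAVE with clear_in_cr4() rather than
restoring a saved value. The sketch restores the saved value only to
keep the model small.
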
 arch/x86/xen/enlighten.c |   50 ++++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 75b7a0f..da33e0c 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -168,21 +168,23 @@ static void __init xen_banner(void)
 	       xen_feature(XENFEAT_mmu_pt_update_preserve_ad) ? " (preserve-AD)" : "");
 }
 
+static __read_mostly unsigned int cpuid_leaf1_edx_mask = ~0;
+static __read_mostly unsigned int cpuid_leaf1_ecx_mask = ~0;
+
 static void xen_cpuid(unsigned int *ax, unsigned int *bx,
 		      unsigned int *cx, unsigned int *dx)
 {
+	unsigned maskecx = ~0;
 	unsigned maskedx = ~0;
 
 	/*
 	 * Mask out inconvenient features, to try and disable as many
 	 * unsupported kernel subsystems as possible.
 	 */
-	if (*ax == 1)
-		maskedx = ~((1 << X86_FEATURE_APIC) |  /* disable APIC */
-			    (1 << X86_FEATURE_ACPI) |  /* disable ACPI */
-			    (1 << X86_FEATURE_MCE)  |  /* disable MCE */
-			    (1 << X86_FEATURE_MCA)  |  /* disable MCA */
-			    (1 << X86_FEATURE_ACC));   /* thermal monitoring */
+	if (*ax == 1) {
+		maskecx = cpuid_leaf1_ecx_mask;
+		maskedx = cpuid_leaf1_edx_mask;
+	}
 
 	asm(XEN_EMULATE_PREFIX "cpuid"
 		: "=a" (*ax),
@@ -190,9 +192,43 @@ static void xen_cpuid(unsigned int *ax, unsigned int *bx,
 		  "=c" (*cx),
 		  "=d" (*dx)
 		: "0" (*ax), "2" (*cx));
+
+	*cx &= maskecx;
 	*dx &= maskedx;
 }
 
+static __init void xen_init_cpuid_mask(void)
+{
+	unsigned int ax, bx, cx, dx;
+
+	cpuid_leaf1_edx_mask =
+		~((1 << X86_FEATURE_MCE)  |  /* disable MCE */
+		  (1 << X86_FEATURE_MCA)  |  /* disable MCA */
+		  (1 << X86_FEATURE_ACC));   /* thermal monitoring */
+
+	if (!xen_initial_domain())
+		cpuid_leaf1_edx_mask &=
+			~((1 << X86_FEATURE_APIC) |  /* disable local APIC */
+			  (1 << X86_FEATURE_ACPI));  /* disable ACPI */
+
+	ax = 1;
+	xen_cpuid(&ax, &bx, &cx, &dx);
+
+	/* cpuid claims we support xsave; try enabling it to see what happens */
+	if (cx & (1 << (X86_FEATURE_XSAVE % 32))) {
+		unsigned long cr4;
+
+		set_in_cr4(X86_CR4_OSXSAVE);
+
+		cr4 = read_cr4();
+
+		if ((cr4 & X86_CR4_OSXSAVE) == 0)
+			cpuid_leaf1_ecx_mask &= ~(1 << (X86_FEATURE_XSAVE % 32));
+
+		clear_in_cr4(X86_CR4_OSXSAVE);
+	}
+}
+
 static void xen_set_debugreg(int reg, unsigned long val)
 {
 	HYPERVISOR_set_debugreg(reg, val);
@@ -901,6 +937,8 @@ asmlinkage void __init xen_start_kernel(void)
 
 	xen_init_irq_ops();
 
+	xen_init_cpuid_mask();
+
 #ifdef CONFIG_X86_LOCAL_APIC
 	/*
 	 * set up the basic apic ops.
-- 
1.6.0.6



* [PATCH 11/24] xen: add FIX_TEXT_POKE to fixmap
  2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
                   ` (9 preceding siblings ...)
  2009-03-13  8:11 ` [PATCH 10/24] xen: mask XSAVE from cpuid Jeremy Fitzhardinge
@ 2009-03-13  8:11 ` Jeremy Fitzhardinge
  2009-03-13  8:11 ` [PATCH 12/24] x86-64: remove PGE from must-have feature list Jeremy Fitzhardinge
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

FIX_TEXT_POKE[01] are used to map kernel addresses, so they're mapping
pfns, not mfns.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
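Note: a toy model of the pfn/mfn distinction, as a sketch only. The
toy_* names and the four-entry p2m table are made up for illustration.
The assumption is that, under Xen PV, a pte built from a pfn is
translated through the p2m table, while a pte built from an mfn is used
as-is.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12

/* Hypothetical pseudo-physical -> machine frame table for one guest. */
static const unsigned long toy_p2m[] = { 0x8000, 0x8123, 0x40ff, 0x9001 };

static unsigned long toy_pfn_to_mfn(unsigned long pfn)
{
	return toy_p2m[pfn];
}

/* Local (guest) page: the frame number must be translated first. */
static uint64_t toy_pfn_pte(unsigned long pfn)
{
	return (uint64_t)toy_pfn_to_mfn(pfn) << TOY_PAGE_SHIFT;
}

/* Machine page: the frame number is already what the hardware sees. */
static uint64_t toy_mfn_pte(unsigned long mfn)
{
	return (uint64_t)mfn << TOY_PAGE_SHIFT;
}
```

Treating a kernel text page as an mfn would skip the translation and
map an unrelated machine frame, which is why the text-poke slots belong
with the pfn cases.
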
 arch/x86/xen/mmu.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 065fe8d..8969353 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1821,6 +1821,9 @@ static void xen_set_fixmap(unsigned idx, unsigned long phys, pgprot_t prot)
 #ifdef CONFIG_X86_LOCAL_APIC
 	case FIX_APIC_BASE:	/* maps dummy local APIC */
 #endif
+	case FIX_TEXT_POKE0:
+	case FIX_TEXT_POKE1:
+		/* All local page mappings */
 		pte = pfn_pte(phys, prot);
 		break;
 
-- 
1.6.0.6



* [PATCH 12/24] x86-64: remove PGE from must-have feature list
  2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
                   ` (10 preceding siblings ...)
  2009-03-13  8:11 ` [PATCH 11/24] xen: add FIX_TEXT_POKE to fixmap Jeremy Fitzhardinge
@ 2009-03-13  8:11 ` Jeremy Fitzhardinge
  2009-03-15 21:18   ` H. Peter Anvin
  2009-03-13  8:11 ` [PATCH 13/24] Xen: Add virt_to_pfn helper function Jeremy Fitzhardinge
                   ` (11 subsequent siblings)
  23 siblings, 1 reply; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

PGE may not be available when running paravirtualized, so test the cpuid
bit before using it.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
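Note: a minimal sketch of the difference between a build-time required
feature and a runtime test. The toy_* names are hypothetical; the
word*32+bit encoding and the "& 31" masking follow the macros in
required-features.h, and the bit positions (PGE is CPUID.1:EDX bit 13,
FXSR is bit 24) are architectural.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_FEATURE(word, bit)	((word) * 32 + (bit))
#define TOY_FEATURE_PGE		TOY_FEATURE(0, 13)
#define TOY_FEATURE_FXSR	TOY_FEATURE(0, 24)

/* Runtime test of a single feature bit in the capability words. */
static int toy_cpu_has(const uint32_t *caps, unsigned int feature)
{
	return (caps[feature / 32] >> (feature % 32)) & 1;
}

/* Boot-time check: every bit in the required mask must be present. */
static int toy_boot_ok(const uint32_t *caps, uint32_t required_word0)
{
	return (caps[0] & required_word0) == required_word0;
}
```

With PGE in the required mask, a guest whose cpuid hides PGE fails the
boot-time check outright. With NEED_PGE set to 0 the kernel boots and
callers fall back to the runtime test.
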
 arch/x86/include/asm/required-features.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/required-features.h b/arch/x86/include/asm/required-features.h
index d5cd6c5..a4737dd 100644
--- a/arch/x86/include/asm/required-features.h
+++ b/arch/x86/include/asm/required-features.h
@@ -50,7 +50,7 @@
 #ifdef CONFIG_X86_64
 #define NEED_PSE	0
 #define NEED_MSR	(1<<(X86_FEATURE_MSR & 31))
-#define NEED_PGE	(1<<(X86_FEATURE_PGE & 31))
+#define NEED_PGE	0
 #define NEED_FXSR	(1<<(X86_FEATURE_FXSR & 31))
 #define NEED_XMM	(1<<(X86_FEATURE_XMM & 31))
 #define NEED_XMM2	(1<<(X86_FEATURE_XMM2 & 31))
-- 
1.6.0.6



* [PATCH 13/24] Xen: Add virt_to_pfn helper function
  2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
                   ` (11 preceding siblings ...)
  2009-03-13  8:11 ` [PATCH 12/24] x86-64: remove PGE from must-have feature list Jeremy Fitzhardinge
@ 2009-03-13  8:11 ` Jeremy Fitzhardinge
  2009-03-13  8:11 ` [PATCH 14/24] xen: add irq_from_evtchn Jeremy Fitzhardinge
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Alex Nixon

From: Alex Nixon <alex.nixon@citrix.com>

Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
---
 arch/x86/include/asm/xen/page.h |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 1a918dd..018a0a4 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -124,7 +124,8 @@ static inline unsigned long mfn_to_local_pfn(unsigned long mfn)
 
 /* VIRT <-> MACHINE conversion */
 #define virt_to_machine(v)	(phys_to_machine(XPADDR(__pa(v))))
-#define virt_to_mfn(v)		(pfn_to_mfn(PFN_DOWN(__pa(v))))
+#define virt_to_pfn(v)		(PFN_DOWN(__pa(v)))
+#define virt_to_mfn(v)		(pfn_to_mfn(virt_to_pfn(v)))
 #define mfn_to_virt(m)		(__va(mfn_to_pfn(m) << PAGE_SHIFT))
 
 static inline unsigned long pte_mfn(pte_t pte)
-- 
1.6.0.6



* [PATCH 14/24] xen: add irq_from_evtchn
  2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
                   ` (12 preceding siblings ...)
  2009-03-13  8:11 ` [PATCH 13/24] Xen: Add virt_to_pfn helper function Jeremy Fitzhardinge
@ 2009-03-13  8:11 ` Jeremy Fitzhardinge
  2009-03-13  8:11   ` Jeremy Fitzhardinge
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Ian Campbell, Jeremy Fitzhardinge

From: Ian Campbell <ian.campbell@citrix.com>

Given an evtchn, return the corresponding irq.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
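Note: a small sketch of the reverse lookup, under stated assumptions.
The toy_* names and the table size are hypothetical, and the use of -1
for "no irq bound" is an assumption of this sketch. Unlike the helper
added by the patch, the sketch also bounds-checks the channel number.

```c
#include <assert.h>

#define TOY_NR_EVENT_CHANNELS 1024

static int toy_evtchn_to_irq[TOY_NR_EVENT_CHANNELS];

/* Mark every channel as unbound. */
static void toy_init(void)
{
	unsigned int i;

	for (i = 0; i < TOY_NR_EVENT_CHANNELS; i++)
		toy_evtchn_to_irq[i] = -1;
}

static void toy_bind(unsigned int evtchn, int irq)
{
	toy_evtchn_to_irq[evtchn] = irq;
}

/* Return the irq bound to evtchn, or -1 if none or out of range. */
static int toy_irq_from_evtchn(unsigned int evtchn)
{
	if (evtchn >= TOY_NR_EVENT_CHANNELS)
		return -1;
	return toy_evtchn_to_irq[evtchn];
}
```

Callers of the real helper are expected to validate the channel number
themselves before indexing.
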
 drivers/xen/events.c |    6 ++++++
 include/xen/events.h |    3 +++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 30963af..1cd2a0e 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -151,6 +151,12 @@ static unsigned int evtchn_from_irq(unsigned irq)
 	return info_for_irq(irq)->evtchn;
 }
 
+unsigned irq_from_evtchn(unsigned int evtchn)
+{
+	return evtchn_to_irq[evtchn];
+}
+EXPORT_SYMBOL_GPL(irq_from_evtchn);
+
 static enum ipi_vector ipi_from_irq(unsigned irq)
 {
 	struct irq_info *info = info_for_irq(irq);
diff --git a/include/xen/events.h b/include/xen/events.h
index 0d5f1ad..e68d59a 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -53,4 +53,7 @@ bool xen_test_irq_pending(int irq);
    irq will be disabled so it won't deliver an interrupt. */
 void xen_poll_irq(int irq);
 
+/* Determine the IRQ which is bound to an event channel */
+unsigned irq_from_evtchn(unsigned int evtchn);
+
 #endif	/* _XEN_EVENTS_H */
-- 
1.6.0.6



* [PATCH 15/24] xen: add /dev/xen/evtchn driver
  2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
@ 2009-03-13  8:11   ` Jeremy Fitzhardinge
  2009-03-13  8:11   ` Jeremy Fitzhardinge
                     ` (22 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Ian Campbell, Jeremy Fitzhardinge

From: Ian Campbell <ian.campbell@citrix.com>

This driver is used by applications which wish to receive notifications
from the hypervisor or other guests via Xen's event channel
mechanism. In particular it is used by the xenstore daemon in domain
0.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
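Note: a hedged sketch of how a userspace consumer might drive the
device, based on the read/write behaviour in this patch: read() returns
an array of pending port numbers, and writing a port number back
re-enables delivery for that port. The helper names are hypothetical,
port_t assumes evtchn_port_t is a 32-bit unsigned integer, and fd is
assumed to be an open descriptor for the device with ports already
bound through the BIND ioctls.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

typedef uint32_t port_t;	/* assumption: matches evtchn_port_t */

/* Read up to max pending ports; return the count, or -1 on error. */
static int read_ports(int fd, port_t *ports, int max)
{
	ssize_t n = read(fd, ports, (size_t)max * sizeof(*ports));

	if (n < 0)
		return -1;
	return (int)(n / (ssize_t)sizeof(*ports));
}

/* Write a port back so the driver re-enables its interrupt. */
static int unmask_port(int fd, port_t port)
{
	ssize_t n = write(fd, &port, sizeof(port));

	return n == (ssize_t)sizeof(port) ? 0 : -1;
}

/* Handle each pending port once, then unmask it. */
static int handle_pending(int fd, void (*handler)(port_t))
{
	port_t ports[64];
	int i, n = read_ports(fd, ports, 64);

	for (i = 0; i < n; i++) {
		handler(ports[i]);
		if (unmask_port(fd, ports[i]) < 0)
			return -1;
	}
	return n;
}
```

A port that is read but never written back stays disabled, so a
consumer that forgets the unmask step stops receiving events on that
port.
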
 drivers/xen/Kconfig  |   10 +
 drivers/xen/Makefile |    3 +-
 drivers/xen/evtchn.c |  494 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/xen/evtchn.h |   88 +++++++++
 4 files changed, 594 insertions(+), 1 deletions(-)
 create mode 100644 drivers/xen/evtchn.c
 create mode 100644 include/xen/evtchn.h

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 526187c..1bbb910 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -18,6 +18,16 @@ config XEN_SCRUB_PAGES
 	  secure, but slightly less efficient.
 	  If in doubt, say yes.
 
+config XEN_DEV_EVTCHN
+	tristate "Xen /dev/xen/evtchn device"
+	depends on XEN
+	default y
+	help
+	  The evtchn driver allows a userspace process to trigger event
+	  channels and to receive notification of an event channel
+	  firing.
+	  If in doubt, say yes.
+
 config XENFS
 	tristate "Xen filesystem"
 	depends on XEN
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index ff8accc..1567639 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -4,4 +4,5 @@ obj-y	+= xenbus/
 obj-$(CONFIG_HOTPLUG_CPU)	+= cpu_hotplug.o
 obj-$(CONFIG_XEN_XENCOMM)	+= xencomm.o
 obj-$(CONFIG_XEN_BALLOON)	+= balloon.o
-obj-$(CONFIG_XENFS)		+= xenfs/
\ No newline at end of file
+obj-$(CONFIG_XEN_DEV_EVTCHN)	+= evtchn.o
+obj-$(CONFIG_XENFS)		+= xenfs/
diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
new file mode 100644
index 0000000..517b9ee
--- /dev/null
+++ b/drivers/xen/evtchn.c
@@ -0,0 +1,494 @@
+/******************************************************************************
+ * evtchn.c
+ *
+ * Driver for receiving and demuxing event-channel signals.
+ *
+ * Copyright (c) 2004-2005, K A Fraser
+ * Multi-process extensions Copyright (c) 2004, Steven Smith
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/fs.h>
+#include <linux/errno.h>
+#include <linux/miscdevice.h>
+#include <linux/major.h>
+#include <linux/proc_fs.h>
+#include <linux/stat.h>
+#include <linux/poll.h>
+#include <linux/irq.h>
+#include <linux/init.h>
+#include <linux/gfp.h>
+#include <linux/mutex.h>
+#include <linux/cpu.h>
+#include <xen/events.h>
+#include <xen/evtchn.h>
+#include <asm/xen/hypervisor.h>
+
+struct per_user_data {
+	/* Notification ring, accessed via /dev/xen/evtchn. */
+#define EVTCHN_RING_SIZE     (PAGE_SIZE / sizeof(evtchn_port_t))
+#define EVTCHN_RING_MASK(_i) ((_i)&(EVTCHN_RING_SIZE-1))
+	evtchn_port_t *ring;
+	unsigned int ring_cons, ring_prod, ring_overflow;
+	struct mutex ring_cons_mutex; /* protect against concurrent readers */
+
+	/* Processes wait on this queue when ring is empty. */
+	wait_queue_head_t evtchn_wait;
+	struct fasync_struct *evtchn_async_queue;
+	const char *name;
+};
+
+/* Who's bound to each port? */
+static struct per_user_data *port_user[NR_EVENT_CHANNELS];
+static DEFINE_SPINLOCK(port_user_lock);
+
+irqreturn_t evtchn_interrupt(int irq, void *data)
+{
+	unsigned int port = (unsigned long)data;
+	struct per_user_data *u;
+
+	spin_lock(&port_user_lock);
+
+	u = port_user[port];
+
+	disable_irq_nosync(irq);
+
+	if ((u->ring_prod - u->ring_cons) < EVTCHN_RING_SIZE) {
+		u->ring[EVTCHN_RING_MASK(u->ring_prod)] = port;
+		wmb(); /* Ensure ring contents visible */
+		if (u->ring_cons == u->ring_prod++) {
+			wake_up_interruptible(&u->evtchn_wait);
+			kill_fasync(&u->evtchn_async_queue,
+				    SIGIO, POLL_IN);
+		}
+	} else {
+		u->ring_overflow = 1;
+	}
+
+	spin_unlock(&port_user_lock);
+
+	return IRQ_HANDLED;
+}
+
+static ssize_t evtchn_read(struct file *file, char __user *buf,
+			   size_t count, loff_t *ppos)
+{
+	int rc;
+	unsigned int c, p, bytes1 = 0, bytes2 = 0;
+	struct per_user_data *u = file->private_data;
+
+	/* Whole number of ports. */
+	count &= ~(sizeof(evtchn_port_t)-1);
+
+	if (count == 0)
+		return 0;
+
+	if (count > PAGE_SIZE)
+		count = PAGE_SIZE;
+
+	for (;;) {
+		mutex_lock(&u->ring_cons_mutex);
+
+		rc = -EFBIG;
+		if (u->ring_overflow)
+			goto unlock_out;
+
+		c = u->ring_cons;
+		p = u->ring_prod;
+		if (c != p)
+			break;
+
+		mutex_unlock(&u->ring_cons_mutex);
+
+		if (file->f_flags & O_NONBLOCK)
+			return -EAGAIN;
+
+		rc = wait_event_interruptible(u->evtchn_wait,
+					      u->ring_cons != u->ring_prod);
+		if (rc)
+			return rc;
+	}
+
+	/* Byte lengths of two chunks. Chunk split (if any) is at ring wrap. */
+	if (((c ^ p) & EVTCHN_RING_SIZE) != 0) {
+		bytes1 = (EVTCHN_RING_SIZE - EVTCHN_RING_MASK(c)) *
+			sizeof(evtchn_port_t);
+		bytes2 = EVTCHN_RING_MASK(p) * sizeof(evtchn_port_t);
+	} else {
+		bytes1 = (p - c) * sizeof(evtchn_port_t);
+		bytes2 = 0;
+	}
+
+	/* Truncate chunks according to caller's maximum byte count. */
+	if (bytes1 > count) {
+		bytes1 = count;
+		bytes2 = 0;
+	} else if ((bytes1 + bytes2) > count) {
+		bytes2 = count - bytes1;
+	}
+
+	rc = -EFAULT;
+	rmb(); /* Ensure that we see the port before we copy it. */
+	if (copy_to_user(buf, &u->ring[EVTCHN_RING_MASK(c)], bytes1) ||
+	    ((bytes2 != 0) &&
+	     copy_to_user(&buf[bytes1], &u->ring[0], bytes2)))
+		goto unlock_out;
+
+	u->ring_cons += (bytes1 + bytes2) / sizeof(evtchn_port_t);
+	rc = bytes1 + bytes2;
+
+ unlock_out:
+	mutex_unlock(&u->ring_cons_mutex);
+	return rc;
+}
+
+static ssize_t evtchn_write(struct file *file, const char __user *buf,
+			    size_t count, loff_t *ppos)
+{
+	int rc, i;
+	evtchn_port_t *kbuf = (evtchn_port_t *)__get_free_page(GFP_KERNEL);
+	struct per_user_data *u = file->private_data;
+
+	if (kbuf == NULL)
+		return -ENOMEM;
+
+	/* Whole number of ports. */
+	count &= ~(sizeof(evtchn_port_t)-1);
+
+	rc = 0;
+	if (count == 0)
+		goto out;
+
+	if (count > PAGE_SIZE)
+		count = PAGE_SIZE;
+
+	rc = -EFAULT;
+	if (copy_from_user(kbuf, buf, count) != 0)
+		goto out;
+
+	spin_lock_irq(&port_user_lock);
+	for (i = 0; i < (count/sizeof(evtchn_port_t)); i++)
+		if ((kbuf[i] < NR_EVENT_CHANNELS) && (port_user[kbuf[i]] == u))
+			enable_irq(irq_from_evtchn(kbuf[i]));
+	spin_unlock_irq(&port_user_lock);
+
+	rc = count;
+
+ out:
+	free_page((unsigned long)kbuf);
+	return rc;
+}
+
+static int evtchn_bind_to_user(struct per_user_data *u, int port)
+{
+	int rc;
+
+	spin_lock_irq(&port_user_lock);
+
+	BUG_ON(port_user[port] != NULL);
+
+	rc = bind_evtchn_to_irqhandler(port, evtchn_interrupt, IRQF_DISABLED,
+				       u->name, (void *)(unsigned long)port);
+	if (rc < 0)
+		goto fail;
+
+	port_user[port] = u;
+	rc = 0;
+
+fail:
+	spin_unlock_irq(&port_user_lock);
+	return rc;
+}
+
+static void evtchn_unbind_from_user(struct per_user_data *u, int port)
+{
+	int irq = irq_from_evtchn(port);
+
+	unbind_from_irqhandler(irq, (void *)(unsigned long)port);
+	port_user[port] = NULL;
+}
+
+static long evtchn_ioctl(struct file *file,
+			 unsigned int cmd, unsigned long arg)
+{
+	int rc;
+	struct per_user_data *u = file->private_data;
+	void __user *uarg = (void __user *) arg;
+
+	switch (cmd) {
+	case IOCTL_EVTCHN_BIND_VIRQ: {
+		struct ioctl_evtchn_bind_virq bind;
+		struct evtchn_bind_virq bind_virq;
+
+		rc = -EFAULT;
+		if (copy_from_user(&bind, uarg, sizeof(bind)))
+			break;
+
+		bind_virq.virq = bind.virq;
+		bind_virq.vcpu = 0;
+		rc = HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq,
+						 &bind_virq);
+		if (rc != 0)
+			break;
+
+		rc = evtchn_bind_to_user(u, bind_virq.port);
+		if (rc == 0)
+			rc = bind_virq.port;
+		break;
+	}
+
+	case IOCTL_EVTCHN_BIND_INTERDOMAIN: {
+		struct ioctl_evtchn_bind_interdomain bind;
+		struct evtchn_bind_interdomain bind_interdomain;
+
+		rc = -EFAULT;
+		if (copy_from_user(&bind, uarg, sizeof(bind)))
+			break;
+
+		bind_interdomain.remote_dom  = bind.remote_domain;
+		bind_interdomain.remote_port = bind.remote_port;
+		rc = HYPERVISOR_event_channel_op(EVTCHNOP_bind_interdomain,
+						 &bind_interdomain);
+		if (rc != 0)
+			break;
+
+		rc = evtchn_bind_to_user(u, bind_interdomain.local_port);
+		if (rc == 0)
+			rc = bind_interdomain.local_port;
+		break;
+	}
+
+	case IOCTL_EVTCHN_BIND_UNBOUND_PORT: {
+		struct ioctl_evtchn_bind_unbound_port bind;
+		struct evtchn_alloc_unbound alloc_unbound;
+
+		rc = -EFAULT;
+		if (copy_from_user(&bind, uarg, sizeof(bind)))
+			break;
+
+		alloc_unbound.dom        = DOMID_SELF;
+		alloc_unbound.remote_dom = bind.remote_domain;
+		rc = HYPERVISOR_event_channel_op(EVTCHNOP_alloc_unbound,
+						 &alloc_unbound);
+		if (rc != 0)
+			break;
+
+		rc = evtchn_bind_to_user(u, alloc_unbound.port);
+		if (rc == 0)
+			rc = alloc_unbound.port;
+		break;
+	}
+
+	case IOCTL_EVTCHN_UNBIND: {
+		struct ioctl_evtchn_unbind unbind;
+
+		rc = -EFAULT;
+		if (copy_from_user(&unbind, uarg, sizeof(unbind)))
+			break;
+
+		rc = -EINVAL;
+		if (unbind.port >= NR_EVENT_CHANNELS)
+			break;
+
+		spin_lock_irq(&port_user_lock);
+
+		rc = -ENOTCONN;
+		if (port_user[unbind.port] != u) {
+			spin_unlock_irq(&port_user_lock);
+			break;
+		}
+
+		evtchn_unbind_from_user(u, unbind.port);
+
+		spin_unlock_irq(&port_user_lock);
+
+		rc = 0;
+		break;
+	}
+
+	case IOCTL_EVTCHN_NOTIFY: {
+		struct ioctl_evtchn_notify notify;
+
+		rc = -EFAULT;
+		if (copy_from_user(&notify, uarg, sizeof(notify)))
+			break;
+
+		if (notify.port >= NR_EVENT_CHANNELS) {
+			rc = -EINVAL;
+		} else if (port_user[notify.port] != u) {
+			rc = -ENOTCONN;
+		} else {
+			notify_remote_via_evtchn(notify.port);
+			rc = 0;
+		}
+		break;
+	}
+
+	case IOCTL_EVTCHN_RESET: {
+		/* Initialise the ring to empty. Clear errors. */
+		mutex_lock(&u->ring_cons_mutex);
+		spin_lock_irq(&port_user_lock);
+		u->ring_cons = u->ring_prod = u->ring_overflow = 0;
+		spin_unlock_irq(&port_user_lock);
+		mutex_unlock(&u->ring_cons_mutex);
+		rc = 0;
+		break;
+	}
+
+	default:
+		rc = -ENOSYS;
+		break;
+	}
+
+	return rc;
+}
+
+static unsigned int evtchn_poll(struct file *file, poll_table *wait)
+{
+	unsigned int mask = POLLOUT | POLLWRNORM;
+	struct per_user_data *u = file->private_data;
+
+	poll_wait(file, &u->evtchn_wait, wait);
+	if (u->ring_cons != u->ring_prod)
+		mask |= POLLIN | POLLRDNORM;
+	if (u->ring_overflow)
+		mask = POLLERR;
+	return mask;
+}
+
+static int evtchn_fasync(int fd, struct file *filp, int on)
+{
+	struct per_user_data *u = filp->private_data;
+	return fasync_helper(fd, filp, on, &u->evtchn_async_queue);
+}
+
+static int evtchn_open(struct inode *inode, struct file *filp)
+{
+	struct per_user_data *u;
+
+	u = kzalloc(sizeof(*u), GFP_KERNEL);
+	if (u == NULL)
+		return -ENOMEM;
+
+	u->name = kasprintf(GFP_KERNEL, "evtchn:%s", current->comm);
+	if (u->name == NULL) {
+		kfree(u);
+		return -ENOMEM;
+	}
+
+	init_waitqueue_head(&u->evtchn_wait);
+
+	u->ring = (evtchn_port_t *)__get_free_page(GFP_KERNEL);
+	if (u->ring == NULL) {
+		kfree(u->name);
+		kfree(u);
+		return -ENOMEM;
+	}
+
+	mutex_init(&u->ring_cons_mutex);
+
+	filp->private_data = u;
+
+	return 0;
+}
+
+static int evtchn_release(struct inode *inode, struct file *filp)
+{
+	int i;
+	struct per_user_data *u = filp->private_data;
+
+	spin_lock_irq(&port_user_lock);
+
+	free_page((unsigned long)u->ring);
+
+	for (i = 0; i < NR_EVENT_CHANNELS; i++) {
+		if (port_user[i] != u)
+			continue;
+
+		evtchn_unbind_from_user(port_user[i], i);
+	}
+
+	spin_unlock_irq(&port_user_lock);
+
+	kfree(u->name);
+	kfree(u);
+
+	return 0;
+}
+
+static const struct file_operations evtchn_fops = {
+	.owner   = THIS_MODULE,
+	.read    = evtchn_read,
+	.write   = evtchn_write,
+	.unlocked_ioctl = evtchn_ioctl,
+	.poll    = evtchn_poll,
+	.fasync  = evtchn_fasync,
+	.open    = evtchn_open,
+	.release = evtchn_release,
+};
+
+static struct miscdevice evtchn_miscdev = {
+	.minor        = MISC_DYNAMIC_MINOR,
+	.name         = "evtchn",
+	.fops         = &evtchn_fops,
+};
+static int __init evtchn_init(void)
+{
+	int err;
+
+	if (!xen_domain())
+		return -ENODEV;
+
+	spin_lock_init(&port_user_lock);
+	memset(port_user, 0, sizeof(port_user));
+
+	/* Create '/dev/misc/evtchn'. */
+	err = misc_register(&evtchn_miscdev);
+	if (err != 0) {
+		printk(KERN_ALERT "Could not register /dev/misc/evtchn\n");
+		return err;
+	}
+
+	printk(KERN_INFO "Event-channel device installed.\n");
+
+	return 0;
+}
+
+static void __exit evtchn_cleanup(void)
+{
+	misc_deregister(&evtchn_miscdev);
+}
+
+module_init(evtchn_init);
+module_exit(evtchn_cleanup);
+
+MODULE_LICENSE("GPL");
diff --git a/include/xen/evtchn.h b/include/xen/evtchn.h
new file mode 100644
index 0000000..14e833e
--- /dev/null
+++ b/include/xen/evtchn.h
@@ -0,0 +1,88 @@
+/******************************************************************************
+ * evtchn.h
+ *
+ * Interface to /dev/xen/evtchn.
+ *
+ * Copyright (c) 2003-2005, K A Fraser
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef __LINUX_PUBLIC_EVTCHN_H__
+#define __LINUX_PUBLIC_EVTCHN_H__
+
+/*
+ * Bind a fresh port to VIRQ @virq.
+ * Return allocated port.
+ */
+#define IOCTL_EVTCHN_BIND_VIRQ				\
+	_IOC(_IOC_NONE, 'E', 0, sizeof(struct ioctl_evtchn_bind_virq))
+struct ioctl_evtchn_bind_virq {
+	unsigned int virq;
+};
+
+/*
+ * Bind a fresh port to remote <@remote_domain, @remote_port>.
+ * Return allocated port.
+ */
+#define IOCTL_EVTCHN_BIND_INTERDOMAIN			\
+	_IOC(_IOC_NONE, 'E', 1, sizeof(struct ioctl_evtchn_bind_interdomain))
+struct ioctl_evtchn_bind_interdomain {
+	unsigned int remote_domain, remote_port;
+};
+
+/*
+ * Allocate a fresh port for binding to @remote_domain.
+ * Return allocated port.
+ */
+#define IOCTL_EVTCHN_BIND_UNBOUND_PORT			\
+	_IOC(_IOC_NONE, 'E', 2, sizeof(struct ioctl_evtchn_bind_unbound_port))
+struct ioctl_evtchn_bind_unbound_port {
+	unsigned int remote_domain;
+};
+
+/*
+ * Unbind previously allocated @port.
+ */
+#define IOCTL_EVTCHN_UNBIND				\
+	_IOC(_IOC_NONE, 'E', 3, sizeof(struct ioctl_evtchn_unbind))
+struct ioctl_evtchn_unbind {
+	unsigned int port;
+};
+
+/*
+ * Notify the remote end via previously allocated @port.
+ */
+#define IOCTL_EVTCHN_NOTIFY				\
+	_IOC(_IOC_NONE, 'E', 4, sizeof(struct ioctl_evtchn_notify))
+struct ioctl_evtchn_notify {
+	unsigned int port;
+};
+
+/* Clear and reinitialise the event buffer. Clear error condition. */
+#define IOCTL_EVTCHN_RESET				\
+	_IOC(_IOC_NONE, 'E', 5, 0)
+
+#endif /* __LINUX_PUBLIC_EVTCHN_H__ */
-- 
1.6.0.6



* [PATCH 15/24] xen: add /dev/xen/evtchn driver
@ 2009-03-13  8:11   ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Xen-devel, the arch/x86 maintainers, Linux Kernel Mailing List,
	Jeremy Fitzhardinge, Ian Campbell

From: Ian Campbell <ian.campbell@citrix.com>

This driver is used by applications which wish to receive notifications
from the hypervisor or other guests via Xen's event channel
mechanism. In particular it is used by the xenstore daemon in domain
0.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
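Note: the notification ring uses free-running producer and consumer
indices that are masked only when indexing the ring. The following is a
minimal sketch of the two-chunk split that the read path performs at
the wrap point, counted in entries rather than bytes. TOY_RING_SIZE and
split_pending() are hypothetical names; the driver sizes its ring as
PAGE_SIZE / sizeof(evtchn_port_t).

```c
#include <assert.h>
#include <stddef.h>

#define TOY_RING_SIZE	8u	/* must be a power of two */
#define TOY_RING_MASK(i)	((i) & (TOY_RING_SIZE - 1))

struct chunks {
	size_t first;	/* entries from cons up to the end of the ring */
	size_t second;	/* entries from the start of the ring up to prod */
};

/* Split the pending entries at the wrap point, capped at max entries. */
static struct chunks split_pending(unsigned int cons, unsigned int prod,
				   size_t max)
{
	struct chunks ch;

	if (((cons ^ prod) & TOY_RING_SIZE) != 0) {
		/* Indices are in different laps: data wraps around. */
		ch.first = TOY_RING_SIZE - TOY_RING_MASK(cons);
		ch.second = TOY_RING_MASK(prod);
	} else {
		ch.first = prod - cons;
		ch.second = 0;
	}

	if (ch.first > max) {
		ch.first = max;
		ch.second = 0;
	} else if (ch.first + ch.second > max) {
		ch.second = max - ch.first;
	}
	return ch;
}
```

Because the indices are never reduced modulo the ring size, prod - cons
is always the number of pending entries, and a full ring (difference
equal to the ring size) is distinguishable from an empty one.
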
 drivers/xen/Kconfig  |   10 +
 drivers/xen/Makefile |    3 +-
 drivers/xen/evtchn.c |  494 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/xen/evtchn.h |   88 +++++++++
 4 files changed, 594 insertions(+), 1 deletions(-)
 create mode 100644 drivers/xen/evtchn.c
 create mode 100644 include/xen/evtchn.h

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 526187c..1bbb910 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -18,6 +18,16 @@ config XEN_SCRUB_PAGES
 	  secure, but slightly less efficient.
 	  If in doubt, say yes.
 
+config XEN_DEV_EVTCHN
+	tristate "Xen /dev/xen/evtchn device"
+	depends on XEN
+	default y
+	help
+	  The evtchn driver allows a userspace process to trigger event
+	  channels and to receive notification of an event channel
+	  firing.
+	  If in doubt, say yes.
+
 config XENFS
 	tristate "Xen filesystem"
 	depends on XEN
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index ff8accc..1567639 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -4,4 +4,5 @@ obj-y	+= xenbus/
 obj-$(CONFIG_HOTPLUG_CPU)	+= cpu_hotplug.o
 obj-$(CONFIG_XEN_XENCOMM)	+= xencomm.o
 obj-$(CONFIG_XEN_BALLOON)	+= balloon.o
-obj-$(CONFIG_XENFS)		+= xenfs/
\ No newline at end of file
+obj-$(CONFIG_XEN_DEV_EVTCHN)	+= evtchn.o
+obj-$(CONFIG_XENFS)		+= xenfs/
diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
new file mode 100644
index 0000000..517b9ee
--- /dev/null
+++ b/drivers/xen/evtchn.c
@@ -0,0 +1,494 @@
+/******************************************************************************
+ * evtchn.c
+ *
+ * Driver for receiving and demuxing event-channel signals.
+ *
+ * Copyright (c) 2004-2005, K A Fraser
+ * Multi-process extensions Copyright (c) 2004, Steven Smith
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/fs.h>
+#include <linux/errno.h>
+#include <linux/miscdevice.h>
+#include <linux/major.h>
+#include <linux/proc_fs.h>
+#include <linux/stat.h>
+#include <linux/poll.h>
+#include <linux/irq.h>
+#include <linux/init.h>
+#include <linux/gfp.h>
+#include <linux/mutex.h>
+#include <linux/cpu.h>
+#include <xen/events.h>
+#include <xen/evtchn.h>
+#include <asm/xen/hypervisor.h>
+
+struct per_user_data {
+	/* Notification ring, accessed via /dev/xen/evtchn. */
+#define EVTCHN_RING_SIZE     (PAGE_SIZE / sizeof(evtchn_port_t))
+#define EVTCHN_RING_MASK(_i) ((_i)&(EVTCHN_RING_SIZE-1))
+	evtchn_port_t *ring;
+	unsigned int ring_cons, ring_prod, ring_overflow;
+	struct mutex ring_cons_mutex; /* protect against concurrent readers */
+
+	/* Processes wait on this queue when ring is empty. */
+	wait_queue_head_t evtchn_wait;
+	struct fasync_struct *evtchn_async_queue;
+	const char *name;
+};
+
+/* Who's bound to each port? */
+static struct per_user_data *port_user[NR_EVENT_CHANNELS];
+static DEFINE_SPINLOCK(port_user_lock);
+
+irqreturn_t evtchn_interrupt(int irq, void *data)
+{
+	unsigned int port = (unsigned long)data;
+	struct per_user_data *u;
+
+	spin_lock(&port_user_lock);
+
+	u = port_user[port];
+
+	disable_irq_nosync(irq);
+
+	if ((u->ring_prod - u->ring_cons) < EVTCHN_RING_SIZE) {
+		u->ring[EVTCHN_RING_MASK(u->ring_prod)] = port;
+		wmb(); /* Ensure ring contents visible */
+		if (u->ring_cons == u->ring_prod++) {
+			wake_up_interruptible(&u->evtchn_wait);
+			kill_fasync(&u->evtchn_async_queue,
+				    SIGIO, POLL_IN);
+		}
+	} else {
+		u->ring_overflow = 1;
+	}
+
+	spin_unlock(&port_user_lock);
+
+	return IRQ_HANDLED;
+}
+
+static ssize_t evtchn_read(struct file *file, char __user *buf,
+			   size_t count, loff_t *ppos)
+{
+	int rc;
+	unsigned int c, p, bytes1 = 0, bytes2 = 0;
+	struct per_user_data *u = file->private_data;
+
+	/* Whole number of ports. */
+	count &= ~(sizeof(evtchn_port_t)-1);
+
+	if (count == 0)
+		return 0;
+
+	if (count > PAGE_SIZE)
+		count = PAGE_SIZE;
+
+	for (;;) {
+		mutex_lock(&u->ring_cons_mutex);
+
+		rc = -EFBIG;
+		if (u->ring_overflow)
+			goto unlock_out;
+
+		c = u->ring_cons;
+		p = u->ring_prod;
+		if (c != p)
+			break;
+
+		mutex_unlock(&u->ring_cons_mutex);
+
+		if (file->f_flags & O_NONBLOCK)
+			return -EAGAIN;
+
+		rc = wait_event_interruptible(u->evtchn_wait,
+					      u->ring_cons != u->ring_prod);
+		if (rc)
+			return rc;
+	}
+
+	/* Byte lengths of two chunks. Chunk split (if any) is at ring wrap. */
+	if (((c ^ p) & EVTCHN_RING_SIZE) != 0) {
+		bytes1 = (EVTCHN_RING_SIZE - EVTCHN_RING_MASK(c)) *
+			sizeof(evtchn_port_t);
+		bytes2 = EVTCHN_RING_MASK(p) * sizeof(evtchn_port_t);
+	} else {
+		bytes1 = (p - c) * sizeof(evtchn_port_t);
+		bytes2 = 0;
+	}
+
+	/* Truncate chunks according to caller's maximum byte count. */
+	if (bytes1 > count) {
+		bytes1 = count;
+		bytes2 = 0;
+	} else if ((bytes1 + bytes2) > count) {
+		bytes2 = count - bytes1;
+	}
+
+	rc = -EFAULT;
+	rmb(); /* Ensure that we see the port before we copy it. */
+	if (copy_to_user(buf, &u->ring[EVTCHN_RING_MASK(c)], bytes1) ||
+	    ((bytes2 != 0) &&
+	     copy_to_user(&buf[bytes1], &u->ring[0], bytes2)))
+		goto unlock_out;
+
+	u->ring_cons += (bytes1 + bytes2) / sizeof(evtchn_port_t);
+	rc = bytes1 + bytes2;
+
+ unlock_out:
+	mutex_unlock(&u->ring_cons_mutex);
+	return rc;
+}
+
+static ssize_t evtchn_write(struct file *file, const char __user *buf,
+			    size_t count, loff_t *ppos)
+{
+	int rc, i;
+	evtchn_port_t *kbuf = (evtchn_port_t *)__get_free_page(GFP_KERNEL);
+	struct per_user_data *u = file->private_data;
+
+	if (kbuf == NULL)
+		return -ENOMEM;
+
+	/* Whole number of ports. */
+	count &= ~(sizeof(evtchn_port_t)-1);
+
+	rc = 0;
+	if (count == 0)
+		goto out;
+
+	if (count > PAGE_SIZE)
+		count = PAGE_SIZE;
+
+	rc = -EFAULT;
+	if (copy_from_user(kbuf, buf, count) != 0)
+		goto out;
+
+	spin_lock_irq(&port_user_lock);
+	for (i = 0; i < (count/sizeof(evtchn_port_t)); i++)
+		if ((kbuf[i] < NR_EVENT_CHANNELS) && (port_user[kbuf[i]] == u))
+			enable_irq(irq_from_evtchn(kbuf[i]));
+	spin_unlock_irq(&port_user_lock);
+
+	rc = count;
+
+ out:
+	free_page((unsigned long)kbuf);
+	return rc;
+}
+
+static int evtchn_bind_to_user(struct per_user_data *u, int port)
+{
+	int irq;
+	int rc = 0;
+
+	spin_lock_irq(&port_user_lock);
+
+	BUG_ON(port_user[port] != NULL);
+
+	irq = bind_evtchn_to_irqhandler(port, evtchn_interrupt, IRQF_DISABLED,
+					u->name, (void *)(unsigned long)port);
+	if (rc < 0)
+		goto fail;
+
+	port_user[port] = u;
+
+fail:
+	spin_unlock_irq(&port_user_lock);
+	return rc;
+}
+
+static void evtchn_unbind_from_user(struct per_user_data *u, int port)
+{
+	int irq = irq_from_evtchn(port);
+
+	unbind_from_irqhandler(irq, (void *)(unsigned long)port);
+	port_user[port] = NULL;
+}
+
+static long evtchn_ioctl(struct file *file,
+			 unsigned int cmd, unsigned long arg)
+{
+	int rc;
+	struct per_user_data *u = file->private_data;
+	void __user *uarg = (void __user *) arg;
+
+	switch (cmd) {
+	case IOCTL_EVTCHN_BIND_VIRQ: {
+		struct ioctl_evtchn_bind_virq bind;
+		struct evtchn_bind_virq bind_virq;
+
+		rc = -EFAULT;
+		if (copy_from_user(&bind, uarg, sizeof(bind)))
+			break;
+
+		bind_virq.virq = bind.virq;
+		bind_virq.vcpu = 0;
+		rc = HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq,
+						 &bind_virq);
+		if (rc != 0)
+			break;
+
+		rc = evtchn_bind_to_user(u, bind_virq.port);
+		if (rc == 0)
+			rc = bind_virq.port;
+		break;
+	}
+
+	case IOCTL_EVTCHN_BIND_INTERDOMAIN: {
+		struct ioctl_evtchn_bind_interdomain bind;
+		struct evtchn_bind_interdomain bind_interdomain;
+
+		rc = -EFAULT;
+		if (copy_from_user(&bind, uarg, sizeof(bind)))
+			break;
+
+		bind_interdomain.remote_dom  = bind.remote_domain;
+		bind_interdomain.remote_port = bind.remote_port;
+		rc = HYPERVISOR_event_channel_op(EVTCHNOP_bind_interdomain,
+						 &bind_interdomain);
+		if (rc != 0)
+			break;
+
+		rc = evtchn_bind_to_user(u, bind_interdomain.local_port);
+		if (rc == 0)
+			rc = bind_interdomain.local_port;
+		break;
+	}
+
+	case IOCTL_EVTCHN_BIND_UNBOUND_PORT: {
+		struct ioctl_evtchn_bind_unbound_port bind;
+		struct evtchn_alloc_unbound alloc_unbound;
+
+		rc = -EFAULT;
+		if (copy_from_user(&bind, uarg, sizeof(bind)))
+			break;
+
+		alloc_unbound.dom        = DOMID_SELF;
+		alloc_unbound.remote_dom = bind.remote_domain;
+		rc = HYPERVISOR_event_channel_op(EVTCHNOP_alloc_unbound,
+						 &alloc_unbound);
+		if (rc != 0)
+			break;
+
+		rc = evtchn_bind_to_user(u, alloc_unbound.port);
+		if (rc == 0)
+			rc = alloc_unbound.port;
+		break;
+	}
+
+	case IOCTL_EVTCHN_UNBIND: {
+		struct ioctl_evtchn_unbind unbind;
+
+		rc = -EFAULT;
+		if (copy_from_user(&unbind, uarg, sizeof(unbind)))
+			break;
+
+		rc = -EINVAL;
+		if (unbind.port >= NR_EVENT_CHANNELS)
+			break;
+
+		spin_lock_irq(&port_user_lock);
+
+		rc = -ENOTCONN;
+		if (port_user[unbind.port] != u) {
+			spin_unlock_irq(&port_user_lock);
+			break;
+		}
+
+		evtchn_unbind_from_user(u, unbind.port);
+
+		spin_unlock_irq(&port_user_lock);
+
+		rc = 0;
+		break;
+	}
+
+	case IOCTL_EVTCHN_NOTIFY: {
+		struct ioctl_evtchn_notify notify;
+
+		rc = -EFAULT;
+		if (copy_from_user(&notify, uarg, sizeof(notify)))
+			break;
+
+		if (notify.port >= NR_EVENT_CHANNELS) {
+			rc = -EINVAL;
+		} else if (port_user[notify.port] != u) {
+			rc = -ENOTCONN;
+		} else {
+			notify_remote_via_evtchn(notify.port);
+			rc = 0;
+		}
+		break;
+	}
+
+	case IOCTL_EVTCHN_RESET: {
+		/* Initialise the ring to empty. Clear errors. */
+		mutex_lock(&u->ring_cons_mutex);
+		spin_lock_irq(&port_user_lock);
+		u->ring_cons = u->ring_prod = u->ring_overflow = 0;
+		spin_unlock_irq(&port_user_lock);
+		mutex_unlock(&u->ring_cons_mutex);
+		rc = 0;
+		break;
+	}
+
+	default:
+		rc = -ENOSYS;
+		break;
+	}
+
+	return rc;
+}
+
+static unsigned int evtchn_poll(struct file *file, poll_table *wait)
+{
+	unsigned int mask = POLLOUT | POLLWRNORM;
+	struct per_user_data *u = file->private_data;
+
+	poll_wait(file, &u->evtchn_wait, wait);
+	if (u->ring_cons != u->ring_prod)
+		mask |= POLLIN | POLLRDNORM;
+	if (u->ring_overflow)
+		mask = POLLERR;
+	return mask;
+}
+
+static int evtchn_fasync(int fd, struct file *filp, int on)
+{
+	struct per_user_data *u = filp->private_data;
+	return fasync_helper(fd, filp, on, &u->evtchn_async_queue);
+}
+
+static int evtchn_open(struct inode *inode, struct file *filp)
+{
+	struct per_user_data *u;
+
+	u = kzalloc(sizeof(*u), GFP_KERNEL);
+	if (u == NULL)
+		return -ENOMEM;
+
+	u->name = kasprintf(GFP_KERNEL, "evtchn:%s", current->comm);
+	if (u->name == NULL) {
+		kfree(u);
+		return -ENOMEM;
+	}
+
+	init_waitqueue_head(&u->evtchn_wait);
+
+	u->ring = (evtchn_port_t *)__get_free_page(GFP_KERNEL);
+	if (u->ring == NULL) {
+		kfree(u->name);
+		kfree(u);
+		return -ENOMEM;
+	}
+
+	mutex_init(&u->ring_cons_mutex);
+
+	filp->private_data = u;
+
+	return 0;
+}
+
+static int evtchn_release(struct inode *inode, struct file *filp)
+{
+	int i;
+	struct per_user_data *u = filp->private_data;
+
+	spin_lock_irq(&port_user_lock);
+
+	free_page((unsigned long)u->ring);
+
+	for (i = 0; i < NR_EVENT_CHANNELS; i++) {
+		if (port_user[i] != u)
+			continue;
+
+		evtchn_unbind_from_user(port_user[i], i);
+	}
+
+	spin_unlock_irq(&port_user_lock);
+
+	kfree(u->name);
+	kfree(u);
+
+	return 0;
+}
+
+static const struct file_operations evtchn_fops = {
+	.owner   = THIS_MODULE,
+	.read    = evtchn_read,
+	.write   = evtchn_write,
+	.unlocked_ioctl = evtchn_ioctl,
+	.poll    = evtchn_poll,
+	.fasync  = evtchn_fasync,
+	.open    = evtchn_open,
+	.release = evtchn_release,
+};
+
+static struct miscdevice evtchn_miscdev = {
+	.minor        = MISC_DYNAMIC_MINOR,
+	.name         = "evtchn",
+	.fops         = &evtchn_fops,
+};
+static int __init evtchn_init(void)
+{
+	int err;
+
+	if (!xen_domain())
+		return -ENODEV;
+
+	spin_lock_init(&port_user_lock);
+	memset(port_user, 0, sizeof(port_user));
+
+	/* Create '/dev/misc/evtchn'. */
+	err = misc_register(&evtchn_miscdev);
+	if (err != 0) {
+		printk(KERN_ALERT "Could not register /dev/misc/evtchn\n");
+		return err;
+	}
+
+	printk(KERN_INFO "Event-channel device installed.\n");
+
+	return 0;
+}
+
+static void __exit evtchn_cleanup(void)
+{
+	misc_deregister(&evtchn_miscdev);
+}
+
+module_init(evtchn_init);
+module_exit(evtchn_cleanup);
+
+MODULE_LICENSE("GPL");
diff --git a/include/xen/evtchn.h b/include/xen/evtchn.h
new file mode 100644
index 0000000..14e833e
--- /dev/null
+++ b/include/xen/evtchn.h
@@ -0,0 +1,88 @@
+/******************************************************************************
+ * evtchn.h
+ *
+ * Interface to /dev/xen/evtchn.
+ *
+ * Copyright (c) 2003-2005, K A Fraser
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef __LINUX_PUBLIC_EVTCHN_H__
+#define __LINUX_PUBLIC_EVTCHN_H__
+
+/*
+ * Bind a fresh port to VIRQ @virq.
+ * Return allocated port.
+ */
+#define IOCTL_EVTCHN_BIND_VIRQ				\
+	_IOC(_IOC_NONE, 'E', 0, sizeof(struct ioctl_evtchn_bind_virq))
+struct ioctl_evtchn_bind_virq {
+	unsigned int virq;
+};
+
+/*
+ * Bind a fresh port to remote <@remote_domain, @remote_port>.
+ * Return allocated port.
+ */
+#define IOCTL_EVTCHN_BIND_INTERDOMAIN			\
+	_IOC(_IOC_NONE, 'E', 1, sizeof(struct ioctl_evtchn_bind_interdomain))
+struct ioctl_evtchn_bind_interdomain {
+	unsigned int remote_domain, remote_port;
+};
+
+/*
+ * Allocate a fresh port for binding to @remote_domain.
+ * Return allocated port.
+ */
+#define IOCTL_EVTCHN_BIND_UNBOUND_PORT			\
+	_IOC(_IOC_NONE, 'E', 2, sizeof(struct ioctl_evtchn_bind_unbound_port))
+struct ioctl_evtchn_bind_unbound_port {
+	unsigned int remote_domain;
+};
+
+/*
+ * Unbind previously allocated @port.
+ */
+#define IOCTL_EVTCHN_UNBIND				\
+	_IOC(_IOC_NONE, 'E', 3, sizeof(struct ioctl_evtchn_unbind))
+struct ioctl_evtchn_unbind {
+	unsigned int port;
+};
+
+/*
+ * Send a notification via previously allocated @port.
+ */
+#define IOCTL_EVTCHN_NOTIFY				\
+	_IOC(_IOC_NONE, 'E', 4, sizeof(struct ioctl_evtchn_notify))
+struct ioctl_evtchn_notify {
+	unsigned int port;
+};
+
+/* Clear and reinitialise the event buffer. Clear error condition. */
+#define IOCTL_EVTCHN_RESET				\
+	_IOC(_IOC_NONE, 'E', 5, 0)
+
+#endif /* __LINUX_PUBLIC_EVTCHN_H__ */
-- 
1.6.0.6
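
For anyone following the ring handling in evtchn_interrupt() and
evtchn_read() above, here is a minimal userspace sketch of the same
free-running producer/consumer index scheme.  It is an illustration under
stated assumptions, not driver code: RING_SIZE, struct ring, ring_push()
and ring_chunks() are hypothetical names, the ring is shrunk to 8 entries,
and locking and memory barriers are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's ring; size must be a power of two. */
#define RING_SIZE     8u
#define RING_MASK(i)  ((i) & (RING_SIZE - 1))

struct ring {
	uint32_t slot[RING_SIZE];
	unsigned int cons, prod;   /* free-running; masked only when indexing */
	int overflow;
};

/* Producer side, modelled on evtchn_interrupt(): drop and flag when full. */
static int ring_push(struct ring *r, uint32_t port)
{
	if ((r->prod - r->cons) >= RING_SIZE) {
		r->overflow = 1;
		return -1;
	}
	r->slot[RING_MASK(r->prod)] = port;
	r->prod++;
	return 0;
}

/*
 * Consumer side, modelled on evtchn_read(): n1 entries are taken from
 * cons up to the end of the array, n2 entries from the start of the
 * array, both limited to max entries in total.  Advances cons.
 */
static size_t ring_chunks(struct ring *r, size_t max, size_t *n1, size_t *n2)
{
	unsigned int c = r->cons, p = r->prod;

	if (((c ^ p) & RING_SIZE) != 0) {   /* indices are on different laps */
		*n1 = RING_SIZE - RING_MASK(c);
		*n2 = RING_MASK(p);
	} else {
		*n1 = p - c;
		*n2 = 0;
	}

	if (*n1 > max) {
		*n1 = max;
		*n2 = 0;
	} else if (*n1 + *n2 > max) {
		*n2 = max - *n1;
	}

	r->cons += (unsigned int)(*n1 + *n2);
	return *n1 + *n2;
}
```

Because the indices are never masked when stored, "full" and "empty" stay
distinguishable (prod - cons is RING_SIZE or 0) without wasting a slot.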


* [PATCH 16/24] xen: export ioctl headers to userspace
  2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
                   ` (14 preceding siblings ...)
  2009-03-13  8:11   ` Jeremy Fitzhardinge
@ 2009-03-13  8:11 ` Jeremy Fitzhardinge
  2009-03-13  8:11 ` [PATCH 17/24] xen/dev-evtchn: clean up locking in evtchn Jeremy Fitzhardinge
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Ian Campbell, Jeremy Fitzhardinge

From: Ian Campbell <ian.campbell@citrix.com>

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 include/Kbuild     |    1 +
 include/xen/Kbuild |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)
 create mode 100644 include/xen/Kbuild

diff --git a/include/Kbuild b/include/Kbuild
index d8c3e3c..fe36acc 100644
--- a/include/Kbuild
+++ b/include/Kbuild
@@ -8,3 +8,4 @@ header-y += mtd/
 header-y += rdma/
 header-y += video/
 header-y += drm/
+header-y += xen/
diff --git a/include/xen/Kbuild b/include/xen/Kbuild
new file mode 100644
index 0000000..4e65c16
--- /dev/null
+++ b/include/xen/Kbuild
@@ -0,0 +1 @@
+header-y += evtchn.h
-- 
1.6.0.6
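
As a rough illustration of what exporting evtchn.h gives userspace, the
sketch below shows how a tool could inspect the request numbers the header
defines.  To keep it compilable without the exported header it repeats two
definitions from include/xen/evtchn.h locally; evtchn_req_describe() is a
hypothetical helper, and the _IOC_* decoding macros are assumed to come
from <linux/ioctl.h>.

```c
#include <linux/ioctl.h>
#include <stdio.h>

/* Local copies of two definitions from include/xen/evtchn.h. */
struct ioctl_evtchn_bind_interdomain {
	unsigned int remote_domain, remote_port;
};
#define IOCTL_EVTCHN_BIND_INTERDOMAIN			\
	_IOC(_IOC_NONE, 'E', 1, sizeof(struct ioctl_evtchn_bind_interdomain))
#define IOCTL_EVTCHN_RESET				\
	_IOC(_IOC_NONE, 'E', 5, 0)

/* Hypothetical helper: print the fields packed into a request number. */
static int evtchn_req_describe(unsigned int req, char *out, size_t len)
{
	return snprintf(out, len, "type=%c nr=%u size=%u",
			(char)_IOC_TYPE(req),
			(unsigned int)_IOC_NR(req),
			(unsigned int)_IOC_SIZE(req));
}
```

A real client would include <xen/evtchn.h> instead of the local copies and
pass these request numbers to ioctl() on an open /dev/xen/evtchn descriptor.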



* [PATCH 17/24] xen/dev-evtchn: clean up locking in evtchn
  2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
                   ` (15 preceding siblings ...)
  2009-03-13  8:11 ` [PATCH 16/24] xen: export ioctl headers to userspace Jeremy Fitzhardinge
@ 2009-03-13  8:11 ` Jeremy Fitzhardinge
  2009-03-13  8:11 ` [PATCH 18/24] xen: add "capabilities" file Jeremy Fitzhardinge
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Define a new per_user_data mutex to serialize bind/unbind operations
to prevent them from racing with each other.  Fix error returns
and don't do a bind while holding a spinlock.
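
The error-return part can be pictured with a small userspace model.  This
is only a sketch under assumptions: NR_PORTS, struct user,
fake_bind_irqhandler() and bind_to_user() are hypothetical stand-ins, and
the stub merely imitates a binder that returns an irq number (>= 0) on
success or a negative errno on failure.

```c
#include <errno.h>
#include <stddef.h>

#define NR_PORTS 16			/* hypothetical, tiny table */

struct user { int id; };

static struct user *port_user[NR_PORTS];

/* Hypothetical stub: irq number (>= 0) on success, -errno on failure. */
static int fake_bind_irqhandler(int port)
{
	return (port % 2) ? -ENOMEM : 100 + port;
}

/*
 * Models the reworked evtchn_bind_to_user(): record the owner first, so
 * a handler that fires straight away can find it, then fold a successful
 * irq number into 0 and pass a negative errno through unchanged.
 * As in the patch, a failed bind leaves port_user[port] set.
 */
static int bind_to_user(struct user *u, int port)
{
	int rc;

	port_user[port] = u;

	rc = fake_bind_irqhandler(port);
	if (rc >= 0)
		rc = 0;

	return rc;
}
```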

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 drivers/xen/evtchn.c |   37 +++++++++++++++++++++++++------------
 1 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
index 517b9ee..af03195 100644
--- a/drivers/xen/evtchn.c
+++ b/drivers/xen/evtchn.c
@@ -54,6 +54,8 @@
 #include <asm/xen/hypervisor.h>
 
 struct per_user_data {
+	struct mutex bind_mutex; /* serialize bind/unbind operations */
+
 	/* Notification ring, accessed via /dev/xen/evtchn. */
 #define EVTCHN_RING_SIZE     (PAGE_SIZE / sizeof(evtchn_port_t))
 #define EVTCHN_RING_MASK(_i) ((_i)&(EVTCHN_RING_SIZE-1))
@@ -69,7 +71,7 @@ struct per_user_data {
 
 /* Who's bound to each port? */
 static struct per_user_data *port_user[NR_EVENT_CHANNELS];
-static DEFINE_SPINLOCK(port_user_lock);
+static DEFINE_SPINLOCK(port_user_lock); /* protects port_user[] and ring_prod */
 
 irqreturn_t evtchn_interrupt(int irq, void *data)
 {
@@ -210,22 +212,24 @@ static ssize_t evtchn_write(struct file *file, const char __user *buf,
 
 static int evtchn_bind_to_user(struct per_user_data *u, int port)
 {
-	int irq;
 	int rc = 0;
 
-	spin_lock_irq(&port_user_lock);
-
+	/*
+	 * Ports are never reused, so every caller should pass in a
+	 * unique port.
+	 *
+	 * (Locking not necessary because we haven't registered the
+	 * interrupt handler yet, and our caller has already
+	 * serialized bind operations.)
+	 */
 	BUG_ON(port_user[port] != NULL);
-
-	irq = bind_evtchn_to_irqhandler(port, evtchn_interrupt, IRQF_DISABLED,
-					u->name, (void *)(unsigned long)port);
-	if (rc < 0)
-		goto fail;
-
 	port_user[port] = u;
 
-fail:
-	spin_unlock_irq(&port_user_lock);
+	rc = bind_evtchn_to_irqhandler(port, evtchn_interrupt, IRQF_DISABLED,
+				       u->name, (void *)(unsigned long)port);
+	if (rc >= 0)
+		rc = 0;
+
 	return rc;
 }
 
@@ -234,6 +238,10 @@ static void evtchn_unbind_from_user(struct per_user_data *u, int port)
 	int irq = irq_from_evtchn(port);
 
 	unbind_from_irqhandler(irq, (void *)(unsigned long)port);
+
+	/* make sure we unbind the irq handler before clearing the port */
+	barrier();
+
 	port_user[port] = NULL;
 }
 
@@ -244,6 +252,9 @@ static long evtchn_ioctl(struct file *file,
 	struct per_user_data *u = file->private_data;
 	void __user *uarg = (void __user *) arg;
 
+	/* Prevent bind from racing with unbind */
+	mutex_lock(&u->bind_mutex);
+
 	switch (cmd) {
 	case IOCTL_EVTCHN_BIND_VIRQ: {
 		struct ioctl_evtchn_bind_virq bind;
@@ -368,6 +379,7 @@ static long evtchn_ioctl(struct file *file,
 		rc = -ENOSYS;
 		break;
 	}
+	mutex_unlock(&u->bind_mutex);
 
 	return rc;
 }
@@ -414,6 +426,7 @@ static int evtchn_open(struct inode *inode, struct file *filp)
 		return -ENOMEM;
 	}
 
+	mutex_init(&u->bind_mutex);
 	mutex_init(&u->ring_cons_mutex);
 
 	filp->private_data = u;
-- 
1.6.0.6



* [PATCH 18/24] xen: add "capabilities" file
  2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
                   ` (16 preceding siblings ...)
  2009-03-13  8:11 ` [PATCH 17/24] xen/dev-evtchn: clean up locking in evtchn Jeremy Fitzhardinge
@ 2009-03-13  8:11 ` Jeremy Fitzhardinge
  2009-03-13  8:11 ` [PATCH 19/24] xen: add /sys/hypervisor support Jeremy Fitzhardinge
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge, Jeremy Fitzhardinge

The xenfs capabilities file allows usermode to determine what
capabilities the domain has.  The only one at present is "control_d"
in a privileged domain.
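
A usermode consumer could check for that capability roughly as follows.
This is a sketch, not an existing tool: xen_has_control_d() is a
hypothetical helper, and the path to pass in depends on where xenfs is
mounted (commonly /proc/xen, which is an assumption here).

```c
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 if the capabilities file at path lists "control_d", 0 if it
 * does not, and -1 if the file cannot be opened.
 */
static int xen_has_control_d(const char *path)
{
	char line[64];
	int found = 0;
	FILE *f = fopen(path, "r");

	if (f == NULL)
		return -1;

	while (fgets(line, sizeof(line), f) != NULL) {
		line[strcspn(line, "\n")] = '\0';	/* strip the newline */
		if (strcmp(line, "control_d") == 0)
			found = 1;
	}

	fclose(f);
	return found;
}
```

With xenfs on /proc/xen the call would be
xen_has_control_d("/proc/xen/capabilities"); an empty file (unprivileged
domain) yields 0.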

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 drivers/xen/xenfs/super.c |   19 ++++++++++++++++++-
 1 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/drivers/xen/xenfs/super.c b/drivers/xen/xenfs/super.c
index 515741a..6559e0c 100644
--- a/drivers/xen/xenfs/super.c
+++ b/drivers/xen/xenfs/super.c
@@ -20,10 +20,27 @@
 MODULE_DESCRIPTION("Xen filesystem");
 MODULE_LICENSE("GPL");
 
+static ssize_t capabilities_read(struct file *file, char __user *buf,
+				 size_t size, loff_t *off)
+{
+	char *tmp = "";
+
+	if (xen_initial_domain())
+		tmp = "control_d\n";
+
+	return simple_read_from_buffer(buf, size, off, tmp, strlen(tmp));
+}
+
+static const struct file_operations capabilities_file_ops = {
+	.read = capabilities_read,
+};
+
 static int xenfs_fill_super(struct super_block *sb, void *data, int silent)
 {
 	static struct tree_descr xenfs_files[] = {
-		[2] = {"xenbus", &xenbus_file_ops, S_IRUSR|S_IWUSR},
+		[1] = {},
+		{ "xenbus", &xenbus_file_ops, S_IRUSR|S_IWUSR },
+		{ "capabilities", &capabilities_file_ops, S_IRUGO },
 		{""},
 	};
 
-- 
1.6.0.6



* [PATCH 19/24] xen: add /sys/hypervisor support
  2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
                   ` (17 preceding siblings ...)
  2009-03-13  8:11 ` [PATCH 18/24] xen: add "capabilities" file Jeremy Fitzhardinge
@ 2009-03-13  8:11 ` Jeremy Fitzhardinge
  2009-03-13  8:11 ` [PATCH 20/24] xen/sys/hypervisor: change writable_pt to features Jeremy Fitzhardinge
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Add support for exposing Xen information under /sys/hypervisor.  Taken
from the Novell 2.6.27 backport tree.
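
The resulting layout is a "type" and a "uuid" file plus the "version",
"compilation" and "properties" groups, each attribute holding one line of
text.  A userspace reader might look like the sketch below;
hyp_read_attr() is a hypothetical helper, and the base directory is passed
in rather than hard-coded so it can be pointed at /sys/hypervisor.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read one single-line attribute, e.g. base = "/sys/hypervisor" and
 * name = "version/major".  Returns 0 on success, -1 on any failure.
 */
static int hyp_read_attr(const char *base, const char *name,
			 char *out, size_t len)
{
	char path[256];
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path), "%s/%s", base, name);
	if (n < 0 || n >= (int)sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (f == NULL)
		return -1;

	if (fgets(out, (int)len, f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);

	out[strcspn(out, "\n")] = '\0';	/* strip the trailing newline */
	return 0;
}
```

On a Xen guest, hyp_read_attr("/sys/hypervisor", "type", buf, sizeof(buf))
would be expected to leave "xen" in buf, matching type_show() in this patch.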

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 drivers/xen/Kconfig             |   10 +
 drivers/xen/Makefile            |    3 +-
 drivers/xen/sys-hypervisor.c    |  475 +++++++++++++++++++++++++++++++++++++++
 include/xen/interface/version.h |    3 +
 4 files changed, 490 insertions(+), 1 deletions(-)
 create mode 100644 drivers/xen/sys-hypervisor.c

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 526187c..88bca1c 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -41,3 +41,13 @@ config XEN_COMPAT_XENFS
          a xen platform.
          If in doubt, say yes.
 
+config XEN_SYS_HYPERVISOR
+       bool "Create xen entries under /sys/hypervisor"
+       depends on XEN && SYSFS
+       select SYS_HYPERVISOR
+       default y
+       help
+         Create entries under /sys/hypervisor describing the Xen
+	 hypervisor environment.  When running native or in another
+	 virtual environment, /sys/hypervisor will still be present,
+	 but will have no xen contents.
\ No newline at end of file
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index ff8accc..f3603a3 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -4,4 +4,5 @@ obj-y	+= xenbus/
 obj-$(CONFIG_HOTPLUG_CPU)	+= cpu_hotplug.o
 obj-$(CONFIG_XEN_XENCOMM)	+= xencomm.o
 obj-$(CONFIG_XEN_BALLOON)	+= balloon.o
-obj-$(CONFIG_XENFS)		+= xenfs/
\ No newline at end of file
+obj-$(CONFIG_XENFS)		+= xenfs/
+obj-$(CONFIG_XEN_SYS_HYPERVISOR)	+= sys-hypervisor.o
diff --git a/drivers/xen/sys-hypervisor.c b/drivers/xen/sys-hypervisor.c
new file mode 100644
index 0000000..cb29d1c
--- /dev/null
+++ b/drivers/xen/sys-hypervisor.c
@@ -0,0 +1,475 @@
+/*
+ *  copyright (c) 2006 IBM Corporation
+ *  Authored by: Mike D. Day <ncmike@us.ibm.com>
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License version 2 as
+ *  published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/kobject.h>
+
+#include <asm/xen/hypervisor.h>
+#include <asm/xen/hypercall.h>
+
+#include <xen/xenbus.h>
+#include <xen/interface/xen.h>
+#include <xen/interface/version.h>
+
+#define HYPERVISOR_ATTR_RO(_name) \
+static struct hyp_sysfs_attr  _name##_attr = __ATTR_RO(_name)
+
+#define HYPERVISOR_ATTR_RW(_name) \
+static struct hyp_sysfs_attr _name##_attr = \
+	__ATTR(_name, 0644, _name##_show, _name##_store)
+
+struct hyp_sysfs_attr {
+	struct attribute attr;
+	ssize_t (*show)(struct hyp_sysfs_attr *, char *);
+	ssize_t (*store)(struct hyp_sysfs_attr *, const char *, size_t);
+	void *hyp_attr_data;
+};
+
+static ssize_t type_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+	return sprintf(buffer, "xen\n");
+}
+
+HYPERVISOR_ATTR_RO(type);
+
+static int __init xen_sysfs_type_init(void)
+{
+	return sysfs_create_file(hypervisor_kobj, &type_attr.attr);
+}
+
+static void xen_sysfs_type_destroy(void)
+{
+	sysfs_remove_file(hypervisor_kobj, &type_attr.attr);
+}
+
+/* xen version attributes */
+static ssize_t major_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+	int version = HYPERVISOR_xen_version(XENVER_version, NULL);
+	if (version)
+		return sprintf(buffer, "%d\n", version >> 16);
+	return -ENODEV;
+}
+
+HYPERVISOR_ATTR_RO(major);
+
+static ssize_t minor_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+	int version = HYPERVISOR_xen_version(XENVER_version, NULL);
+	if (version)
+		return sprintf(buffer, "%d\n", version & 0xff);
+	return -ENODEV;
+}
+
+HYPERVISOR_ATTR_RO(minor);
+
+static ssize_t extra_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+	int ret = -ENOMEM;
+	char *extra;
+
+	extra = kmalloc(XEN_EXTRAVERSION_LEN, GFP_KERNEL);
+	if (extra) {
+		ret = HYPERVISOR_xen_version(XENVER_extraversion, extra);
+		if (!ret)
+			ret = sprintf(buffer, "%s\n", extra);
+		kfree(extra);
+	}
+
+	return ret;
+}
+
+HYPERVISOR_ATTR_RO(extra);
+
+static struct attribute *version_attrs[] = {
+	&major_attr.attr,
+	&minor_attr.attr,
+	&extra_attr.attr,
+	NULL
+};
+
+static struct attribute_group version_group = {
+	.name = "version",
+	.attrs = version_attrs,
+};
+
+static int __init xen_sysfs_version_init(void)
+{
+	return sysfs_create_group(hypervisor_kobj, &version_group);
+}
+
+static void xen_sysfs_version_destroy(void)
+{
+	sysfs_remove_group(hypervisor_kobj, &version_group);
+}
+
+/* UUID */
+
+static ssize_t uuid_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+	char *vm, *val;
+	int ret;
+	extern int xenstored_ready;
+
+	if (!xenstored_ready)
+		return -EBUSY;
+
+	vm = xenbus_read(XBT_NIL, "vm", "", NULL);
+	if (IS_ERR(vm))
+		return PTR_ERR(vm);
+	val = xenbus_read(XBT_NIL, vm, "uuid", NULL);
+	kfree(vm);
+	if (IS_ERR(val))
+		return PTR_ERR(val);
+	ret = sprintf(buffer, "%s\n", val);
+	kfree(val);
+	return ret;
+}
+
+HYPERVISOR_ATTR_RO(uuid);
+
+static int __init xen_sysfs_uuid_init(void)
+{
+	return sysfs_create_file(hypervisor_kobj, &uuid_attr.attr);
+}
+
+static void xen_sysfs_uuid_destroy(void)
+{
+	sysfs_remove_file(hypervisor_kobj, &uuid_attr.attr);
+}
+
+/* xen compilation attributes */
+
+static ssize_t compiler_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+	int ret = -ENOMEM;
+	struct xen_compile_info *info;
+
+	info = kmalloc(sizeof(struct xen_compile_info), GFP_KERNEL);
+	if (info) {
+		ret = HYPERVISOR_xen_version(XENVER_compile_info, info);
+		if (!ret)
+			ret = sprintf(buffer, "%s\n", info->compiler);
+		kfree(info);
+	}
+
+	return ret;
+}
+
+HYPERVISOR_ATTR_RO(compiler);
+
+static ssize_t compiled_by_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+	int ret = -ENOMEM;
+	struct xen_compile_info *info;
+
+	info = kmalloc(sizeof(struct xen_compile_info), GFP_KERNEL);
+	if (info) {
+		ret = HYPERVISOR_xen_version(XENVER_compile_info, info);
+		if (!ret)
+			ret = sprintf(buffer, "%s\n", info->compile_by);
+		kfree(info);
+	}
+
+	return ret;
+}
+
+HYPERVISOR_ATTR_RO(compiled_by);
+
+static ssize_t compile_date_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+	int ret = -ENOMEM;
+	struct xen_compile_info *info;
+
+	info = kmalloc(sizeof(struct xen_compile_info), GFP_KERNEL);
+	if (info) {
+		ret = HYPERVISOR_xen_version(XENVER_compile_info, info);
+		if (!ret)
+			ret = sprintf(buffer, "%s\n", info->compile_date);
+		kfree(info);
+	}
+
+	return ret;
+}
+
+HYPERVISOR_ATTR_RO(compile_date);
+
+static struct attribute *xen_compile_attrs[] = {
+	&compiler_attr.attr,
+	&compiled_by_attr.attr,
+	&compile_date_attr.attr,
+	NULL
+};
+
+static struct attribute_group xen_compilation_group = {
+	.name = "compilation",
+	.attrs = xen_compile_attrs,
+};
+
+static int __init xen_compilation_init(void)
+{
+	return sysfs_create_group(hypervisor_kobj, &xen_compilation_group);
+}
+
+static void xen_compilation_destroy(void)
+{
+	sysfs_remove_group(hypervisor_kobj, &xen_compilation_group);
+}
+
+/* xen properties info */
+
+static ssize_t capabilities_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+	int ret = -ENOMEM;
+	char *caps;
+
+	caps = kmalloc(XEN_CAPABILITIES_INFO_LEN, GFP_KERNEL);
+	if (caps) {
+		ret = HYPERVISOR_xen_version(XENVER_capabilities, caps);
+		if (!ret)
+			ret = sprintf(buffer, "%s\n", caps);
+		kfree(caps);
+	}
+
+	return ret;
+}
+
+HYPERVISOR_ATTR_RO(capabilities);
+
+static ssize_t changeset_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+	int ret = -ENOMEM;
+	char *cset;
+
+	cset = kmalloc(XEN_CHANGESET_INFO_LEN, GFP_KERNEL);
+	if (cset) {
+		ret = HYPERVISOR_xen_version(XENVER_changeset, cset);
+		if (!ret)
+			ret = sprintf(buffer, "%s\n", cset);
+		kfree(cset);
+	}
+
+	return ret;
+}
+
+HYPERVISOR_ATTR_RO(changeset);
+
+static ssize_t virtual_start_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+	int ret = -ENOMEM;
+	struct xen_platform_parameters *parms;
+
+	parms = kmalloc(sizeof(struct xen_platform_parameters), GFP_KERNEL);
+	if (parms) {
+		ret = HYPERVISOR_xen_version(XENVER_platform_parameters,
+					     parms);
+		if (!ret)
+			ret = sprintf(buffer, "%lx\n", parms->virt_start);
+		kfree(parms);
+	}
+
+	return ret;
+}
+
+HYPERVISOR_ATTR_RO(virtual_start);
+
+static ssize_t pagesize_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+	int ret;
+
+	ret = HYPERVISOR_xen_version(XENVER_pagesize, NULL);
+	if (ret > 0)
+		ret = sprintf(buffer, "%x\n", ret);
+
+	return ret;
+}
+
+HYPERVISOR_ATTR_RO(pagesize);
+
+/* eventually there will be several more features to export */
+static ssize_t xen_feature_show(int index, char *buffer)
+{
+	int ret = -ENOMEM;
+	struct xen_feature_info *info;
+
+	info = kmalloc(sizeof(struct xen_feature_info), GFP_KERNEL);
+	if (info) {
+		info->submap_idx = index;
+		ret = HYPERVISOR_xen_version(XENVER_get_features, info);
+		if (!ret)
+			ret = sprintf(buffer, "%d\n", info->submap);
+		kfree(info);
+	}
+
+	return ret;
+}
+
+static ssize_t writable_pt_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+	return xen_feature_show(XENFEAT_writable_page_tables, buffer);
+}
+
+HYPERVISOR_ATTR_RO(writable_pt);
+
+static struct attribute *xen_properties_attrs[] = {
+	&capabilities_attr.attr,
+	&changeset_attr.attr,
+	&virtual_start_attr.attr,
+	&pagesize_attr.attr,
+	&writable_pt_attr.attr,
+	NULL
+};
+
+static struct attribute_group xen_properties_group = {
+	.name = "properties",
+	.attrs = xen_properties_attrs,
+};
+
+static int __init xen_properties_init(void)
+{
+	return sysfs_create_group(hypervisor_kobj, &xen_properties_group);
+}
+
+static void xen_properties_destroy(void)
+{
+	sysfs_remove_group(hypervisor_kobj, &xen_properties_group);
+}
+
+#ifdef CONFIG_KEXEC
+
+extern size_t vmcoreinfo_size_xen;
+extern unsigned long paddr_vmcoreinfo_xen;
+
+static ssize_t vmcoreinfo_show(struct hyp_sysfs_attr *attr, char *page)
+{
+	return sprintf(page, "%lx %zx\n",
+		paddr_vmcoreinfo_xen, vmcoreinfo_size_xen);
+}
+
+HYPERVISOR_ATTR_RO(vmcoreinfo);
+
+static int __init xen_sysfs_vmcoreinfo_init(void)
+{
+	return sysfs_create_file(hypervisor_kobj,
+				 &vmcoreinfo_attr.attr);
+}
+
+static void xen_sysfs_vmcoreinfo_destroy(void)
+{
+	sysfs_remove_file(hypervisor_kobj, &vmcoreinfo_attr.attr);
+}
+
+#endif
+
+static int __init hyper_sysfs_init(void)
+{
+	int ret;
+
+	if (!xen_domain())
+		return -ENODEV;
+
+	ret = xen_sysfs_type_init();
+	if (ret)
+		goto out;
+	ret = xen_sysfs_version_init();
+	if (ret)
+		goto version_out;
+	ret = xen_compilation_init();
+	if (ret)
+		goto comp_out;
+	ret = xen_sysfs_uuid_init();
+	if (ret)
+		goto uuid_out;
+	ret = xen_properties_init();
+	if (ret)
+		goto prop_out;
+#ifdef CONFIG_KEXEC
+	if (vmcoreinfo_size_xen != 0) {
+		ret = xen_sysfs_vmcoreinfo_init();
+		if (ret)
+			goto vmcoreinfo_out;
+	}
+#endif
+
+	goto out;
+
+#ifdef CONFIG_KEXEC
+vmcoreinfo_out:
+#endif
+	xen_properties_destroy();
+prop_out:
+	xen_sysfs_uuid_destroy();
+uuid_out:
+	xen_compilation_destroy();
+comp_out:
+	xen_sysfs_version_destroy();
+version_out:
+	xen_sysfs_type_destroy();
+out:
+	return ret;
+}
+
+static void __exit hyper_sysfs_exit(void)
+{
+#ifdef CONFIG_KEXEC
+	if (vmcoreinfo_size_xen != 0)
+		xen_sysfs_vmcoreinfo_destroy();
+#endif
+	xen_properties_destroy();
+	xen_compilation_destroy();
+	xen_sysfs_uuid_destroy();
+	xen_sysfs_version_destroy();
+	xen_sysfs_type_destroy();
+
+}
+module_init(hyper_sysfs_init);
+module_exit(hyper_sysfs_exit);
+
+static ssize_t hyp_sysfs_show(struct kobject *kobj,
+			      struct attribute *attr,
+			      char *buffer)
+{
+	struct hyp_sysfs_attr *hyp_attr;
+	hyp_attr = container_of(attr, struct hyp_sysfs_attr, attr);
+	if (hyp_attr->show)
+		return hyp_attr->show(hyp_attr, buffer);
+	return 0;
+}
+
+static ssize_t hyp_sysfs_store(struct kobject *kobj,
+			       struct attribute *attr,
+			       const char *buffer,
+			       size_t len)
+{
+	struct hyp_sysfs_attr *hyp_attr;
+	hyp_attr = container_of(attr, struct hyp_sysfs_attr, attr);
+	if (hyp_attr->store)
+		return hyp_attr->store(hyp_attr, buffer, len);
+	return 0;
+}
+
+static struct sysfs_ops hyp_sysfs_ops = {
+	.show = hyp_sysfs_show,
+	.store = hyp_sysfs_store,
+};
+
+static struct kobj_type hyp_sysfs_kobj_type = {
+	.sysfs_ops = &hyp_sysfs_ops,
+};
+
+static int __init hypervisor_subsys_init(void)
+{
+	if (!xen_domain())
+		return -ENODEV;
+
+	hypervisor_kobj->ktype = &hyp_sysfs_kobj_type;
+	return 0;
+}
+device_initcall(hypervisor_subsys_init);
diff --git a/include/xen/interface/version.h b/include/xen/interface/version.h
index 453235e..e8b6519 100644
--- a/include/xen/interface/version.h
+++ b/include/xen/interface/version.h
@@ -57,4 +57,7 @@ struct xen_feature_info {
 /* Declares the features reported by XENVER_get_features. */
 #include "features.h"
 
+/* arg == NULL; returns host memory page size. */
+#define XENVER_pagesize 7
+
 #endif /* __XEN_PUBLIC_VERSION_H__ */
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 63+ messages in thread
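
The hyper_sysfs_init() routine in the patch above unwinds a partially
completed setup through a chain of goto labels, so that a failure at step N
tears down only steps 0..N-1. A minimal userspace sketch of that pattern
follows. The names step_init(), step_destroy(), step_active and fail_at are
hypothetical stand-ins for illustration and are not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping: which steps are currently set up. */
static int step_active[3];
/* Hypothetical knob: index of the step that should fail, or -1 for none. */
static int fail_at = -1;

static int step_init(int idx)
{
	assert(idx >= 0 && idx < 3);
	if (idx == fail_at)
		return -1;	/* simulate an initialisation error */
	step_active[idx] = 1;
	return 0;
}

static void step_destroy(int idx)
{
	assert(idx >= 0 && idx < 3);
	step_active[idx] = 0;
}

/* Set up steps 0..2; on failure undo only what already succeeded. */
static int init_all(void)
{
	int ret;

	ret = step_init(0);
	if (ret)
		goto out;
	ret = step_init(1);
	if (ret)
		goto undo_0;
	ret = step_init(2);
	if (ret)
		goto undo_1;

	goto out;	/* success: skip the unwind path */

undo_1:
	step_destroy(1);
undo_0:
	step_destroy(0);
out:
	return ret;
}
```

The labels fall through in reverse order of setup, which is why the step that
failed never needs its own destroy call.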

* [PATCH 20/24] xen/sys/hypervisor: change writable_pt to features
  2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
                   ` (18 preceding siblings ...)
  2009-03-13  8:11 ` [PATCH 19/24] xen: add /sys/hypervisor support Jeremy Fitzhardinge
@ 2009-03-13  8:11 ` Jeremy Fitzhardinge
  2009-03-13  8:11 ` [PATCH 21/24] xen: drop kexec bits from /sys/hypervisor since kexec isn't implemented yet Jeremy Fitzhardinge
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

/sys/hypervisor/properties/writable_pt was misnamed.  Rename to features,
expressed as a bit array in hex.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 drivers/xen/sys-hypervisor.c |   41 ++++++++++++++++++++++++++---------------
 1 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/drivers/xen/sys-hypervisor.c b/drivers/xen/sys-hypervisor.c
index cb29d1c..1267d6f 100644
--- a/drivers/xen/sys-hypervisor.c
+++ b/drivers/xen/sys-hypervisor.c
@@ -293,37 +293,48 @@ static ssize_t pagesize_show(struct hyp_sysfs_attr *attr, char *buffer)
 
 HYPERVISOR_ATTR_RO(pagesize);
 
-/* eventually there will be several more features to export */
 static ssize_t xen_feature_show(int index, char *buffer)
 {
-	int ret = -ENOMEM;
-	struct xen_feature_info *info;
+	ssize_t ret;
+	struct xen_feature_info info;
 
-	info = kmalloc(sizeof(struct xen_feature_info), GFP_KERNEL);
-	if (info) {
-		info->submap_idx = index;
-		ret = HYPERVISOR_xen_version(XENVER_get_features, info);
-		if (!ret)
-			ret = sprintf(buffer, "%d\n", info->submap);
-		kfree(info);
-	}
+	info.submap_idx = index;
+	ret = HYPERVISOR_xen_version(XENVER_get_features, &info);
+	if (!ret)
+		ret = sprintf(buffer, "%08x", info.submap);
 
 	return ret;
 }
 
-static ssize_t writable_pt_show(struct hyp_sysfs_attr *attr, char *buffer)
+static ssize_t features_show(struct hyp_sysfs_attr *attr, char *buffer)
 {
-	return xen_feature_show(XENFEAT_writable_page_tables, buffer);
+	ssize_t len;
+	int i;
+
+	len = 0;
+	for (i = XENFEAT_NR_SUBMAPS-1; i >= 0; i--) {
+		int ret = xen_feature_show(i, buffer + len);
+		if (ret < 0) {
+			if (len == 0)
+				len = ret;
+			break;
+		}
+		len += ret;
+	}
+	if (len > 0)
+		buffer[len++] = '\n';
+
+	return len;
 }
 
-HYPERVISOR_ATTR_RO(writable_pt);
+HYPERVISOR_ATTR_RO(features);
 
 static struct attribute *xen_properties_attrs[] = {
 	&capabilities_attr.attr,
 	&changeset_attr.attr,
 	&virtual_start_attr.attr,
 	&pagesize_attr.attr,
-	&writable_pt_attr.attr,
+	&features_attr.attr,
 	NULL
 };
 
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 63+ messages in thread
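
The features attribute in the patch above prints each 32-bit submap as eight
hex digits ("%08x"), highest submap index first, followed by a newline. Read
as one large hex number, feature bit N is therefore the Nth bit counting from
the right-hand end. The following is a rough userspace sketch of decoding such
a string, assuming that layout. The helper names hex_digit_value() and
feature_bit_set() are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Return the value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit_value(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Test feature bit `bit` in a hex string such as "00000005\n".
 * Bit 0 is the least significant bit of the last hex digit.
 * Bits beyond the end of the string are reported as clear.
 */
static int feature_bit_set(const char *hex, unsigned int bit)
{
	size_t ndigits = 0;
	int nibble;

	assert(hex != NULL);

	/* Count leading hex digits; stops at '\n' or the terminating NUL. */
	while (hex_digit_value(hex[ndigits]) >= 0)
		ndigits++;

	if (bit / 4 >= ndigits)
		return 0;

	nibble = hex_digit_value(hex[ndigits - 1 - bit / 4]);
	return (nibble >> (bit % 4)) & 1;
}
```

A caller would read /sys/hypervisor/properties/features into a buffer and pass
it to feature_bit_set() together with the feature number of interest.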

* [PATCH 21/24] xen: drop kexec bits from /sys/hypervisor since kexec isn't implemented yet
  2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
                   ` (19 preceding siblings ...)
  2009-03-13  8:11 ` [PATCH 20/24] xen/sys/hypervisor: change writable_pt to features Jeremy Fitzhardinge
@ 2009-03-13  8:11 ` Jeremy Fitzhardinge
  2009-03-13  8:11 ` [PATCH 22/24] xen: remove suspend_cancel hook Jeremy Fitzhardinge
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Ian Campbell, Ian Campbell

From: Ian Campbell <Ian.Campbell@citrix.com>

I needed this to compile, since there is no kexec yet in the pvops kernel:
  CC      drivers/xen/sys-hypervisor.o
drivers/xen/sys-hypervisor.c: In function 'hyper_sysfs_init':
drivers/xen/sys-hypervisor.c:405: error: 'vmcoreinfo_size_xen' undeclared (first use in this function)
drivers/xen/sys-hypervisor.c:405: error: (Each undeclared identifier is reported only once
drivers/xen/sys-hypervisor.c:405: error: for each function it appears in.)
drivers/xen/sys-hypervisor.c:406: error: implicit declaration of function 'xen_sysfs_vmcoreinfo_init'
drivers/xen/sys-hypervisor.c: In function 'hyper_sysfs_exit':
drivers/xen/sys-hypervisor.c:433: error: 'vmcoreinfo_size_xen' undeclared (first use in this function)
drivers/xen/sys-hypervisor.c:434: error: implicit declaration of function 'xen_sysfs_vmcoreinfo_destroy'

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/xen/sys-hypervisor.c |   41 -----------------------------------------
 1 files changed, 0 insertions(+), 41 deletions(-)

diff --git a/drivers/xen/sys-hypervisor.c b/drivers/xen/sys-hypervisor.c
index 1267d6f..88a60e0 100644
--- a/drivers/xen/sys-hypervisor.c
+++ b/drivers/xen/sys-hypervisor.c
@@ -353,32 +353,6 @@ static void xen_properties_destroy(void)
 	sysfs_remove_group(hypervisor_kobj, &xen_properties_group);
 }
 
-#ifdef CONFIG_KEXEC
-
-extern size_t vmcoreinfo_size_xen;
-extern unsigned long paddr_vmcoreinfo_xen;
-
-static ssize_t vmcoreinfo_show(struct hyp_sysfs_attr *attr, char *page)
-{
-	return sprintf(page, "%lx %zx\n",
-		paddr_vmcoreinfo_xen, vmcoreinfo_size_xen);
-}
-
-HYPERVISOR_ATTR_RO(vmcoreinfo);
-
-static int __init xen_sysfs_vmcoreinfo_init(void)
-{
-	return sysfs_create_file(hypervisor_kobj,
-				 &vmcoreinfo_attr.attr);
-}
-
-static void xen_sysfs_vmcoreinfo_destroy(void)
-{
-	sysfs_remove_file(hypervisor_kobj, &vmcoreinfo_attr.attr);
-}
-
-#endif
-
 static int __init hyper_sysfs_init(void)
 {
 	int ret;
@@ -401,20 +375,9 @@ static int __init hyper_sysfs_init(void)
 	ret = xen_properties_init();
 	if (ret)
 		goto prop_out;
-#ifdef CONFIG_KEXEC
-	if (vmcoreinfo_size_xen != 0) {
-		ret = xen_sysfs_vmcoreinfo_init();
-		if (ret)
-			goto vmcoreinfo_out;
-	}
-#endif
 
 	goto out;
 
-#ifdef CONFIG_KEXEC
-vmcoreinfo_out:
-#endif
-	xen_properties_destroy();
 prop_out:
 	xen_sysfs_uuid_destroy();
 uuid_out:
@@ -429,10 +392,6 @@ out:
 
 static void __exit hyper_sysfs_exit(void)
 {
-#ifdef CONFIG_KEXEC
-	if (vmcoreinfo_size_xen != 0)
-		xen_sysfs_vmcoreinfo_destroy();
-#endif
 	xen_properties_destroy();
 	xen_compilation_destroy();
 	xen_sysfs_uuid_destroy();
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 22/24] xen: remove suspend_cancel hook
  2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
                   ` (20 preceding siblings ...)
  2009-03-13  8:11 ` [PATCH 21/24] xen: drop kexec bits from /sys/hypervisor since kexec isn't implemented yet Jeremy Fitzhardinge
@ 2009-03-13  8:11 ` Jeremy Fitzhardinge
  2009-03-13 10:08     ` Jan Beulich
  2009-03-13  8:11 ` [PATCH 23/24] xen: use device model for suspending xenbus devices Jeremy Fitzhardinge
  2009-03-13  8:12 ` [PATCH 24/24] xen/xenbus: export xenbus_dev_changed Jeremy Fitzhardinge
  23 siblings, 1 reply; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Ian Campbell, Jeremy Fitzhardinge

From: Ian Campbell <ian.campbell@citrix.com>

Remove suspend_cancel hook from xenbus_driver, in preparation for using
the device model for suspending.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 drivers/xen/xenbus/xenbus_probe.c |   23 -----------------------
 include/xen/xenbus.h              |    1 -
 2 files changed, 0 insertions(+), 24 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 773d1cf..bd20361 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -689,27 +689,6 @@ static int suspend_dev(struct device *dev, void *data)
 	return 0;
 }
 
-static int suspend_cancel_dev(struct device *dev, void *data)
-{
-	int err = 0;
-	struct xenbus_driver *drv;
-	struct xenbus_device *xdev;
-
-	DPRINTK("");
-
-	if (dev->driver == NULL)
-		return 0;
-	drv = to_xenbus_driver(dev->driver);
-	xdev = container_of(dev, struct xenbus_device, dev);
-	if (drv->suspend_cancel)
-		err = drv->suspend_cancel(xdev);
-	if (err)
-		printk(KERN_WARNING
-		       "xenbus: suspend_cancel %s failed: %i\n",
-		       dev_name(dev), err);
-	return 0;
-}
-
 static int resume_dev(struct device *dev, void *data)
 {
 	int err;
@@ -777,8 +756,6 @@ EXPORT_SYMBOL_GPL(xenbus_resume);
 void xenbus_suspend_cancel(void)
 {
 	xs_suspend_cancel();
-	bus_for_each_dev(&xenbus_frontend.bus, NULL, NULL, suspend_cancel_dev);
-	xenbus_backend_resume(suspend_cancel_dev);
 }
 EXPORT_SYMBOL_GPL(xenbus_suspend_cancel);
 
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index f87f961..0836772 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -92,7 +92,6 @@ struct xenbus_driver {
 				 enum xenbus_state backend_state);
 	int (*remove)(struct xenbus_device *dev);
 	int (*suspend)(struct xenbus_device *dev);
-	int (*suspend_cancel)(struct xenbus_device *dev);
 	int (*resume)(struct xenbus_device *dev);
 	int (*uevent)(struct xenbus_device *, char **, int, char *, int);
 	struct device_driver driver;
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 23/24] xen: use device model for suspending xenbus devices
  2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
                   ` (21 preceding siblings ...)
  2009-03-13  8:11 ` [PATCH 22/24] xen: remove suspend_cancel hook Jeremy Fitzhardinge
@ 2009-03-13  8:11 ` Jeremy Fitzhardinge
  2009-03-13 10:09   ` [Xen-devel] [PATCH 23/24] xen: use device model for suspending xenbus devices Jan Beulich
  2009-03-13  8:12 ` [PATCH 24/24] xen/xenbus: export xenbus_dev_changed Jeremy Fitzhardinge
  23 siblings, 1 reply; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Ian Campbell, Jeremy Fitzhardinge

From: Ian Campbell <ian.campbell@citrix.com>

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 drivers/xen/manage.c              |    9 ++++-----
 drivers/xen/xenbus/xenbus_probe.c |   37 +++++++++----------------------------
 drivers/xen/xenbus/xenbus_xs.c    |    2 ++
 include/xen/xenbus.h              |    2 +-
 4 files changed, 16 insertions(+), 34 deletions(-)

diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 3ccd348..0489ea2 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -104,9 +104,8 @@ static void do_suspend(void)
 		goto out;
 	}
 
-	printk("suspending xenbus...\n");
-	/* XXX use normal device tree? */
-	xenbus_suspend();
+	printk(KERN_DEBUG "suspending xenstore...\n");
+	xs_suspend();
 
 	err = stop_machine(xen_suspend, &cancelled, cpumask_of(0));
 	if (err) {
@@ -116,9 +115,9 @@ static void do_suspend(void)
 
 	if (!cancelled) {
 		xen_arch_resume();
-		xenbus_resume();
+		xs_resume();
 	} else
-		xenbus_suspend_cancel();
+		xs_suspend_cancel();
 
 	device_resume(PMSG_RESUME);
 
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index bd20361..4649213 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -71,6 +71,9 @@ static int xenbus_probe_frontend(const char *type, const char *name);
 
 static void xenbus_dev_shutdown(struct device *_dev);
 
+static int xenbus_dev_suspend(struct device *dev, pm_message_t state);
+static int xenbus_dev_resume(struct device *dev);
+
 /* If something in array of ids matches this device, return it. */
 static const struct xenbus_device_id *
 match_device(const struct xenbus_device_id *arr, struct xenbus_device *dev)
@@ -188,6 +191,9 @@ static struct xen_bus_type xenbus_frontend = {
 		.remove    = xenbus_dev_remove,
 		.shutdown  = xenbus_dev_shutdown,
 		.dev_attrs = xenbus_dev_attrs,
+
+		.suspend   = xenbus_dev_suspend,
+		.resume    = xenbus_dev_resume,
 	},
 };
 
@@ -669,7 +675,7 @@ static struct xenbus_watch fe_watch = {
 	.callback = frontend_changed,
 };
 
-static int suspend_dev(struct device *dev, void *data)
+static int xenbus_dev_suspend(struct device *dev, pm_message_t state)
 {
 	int err = 0;
 	struct xenbus_driver *drv;
@@ -682,14 +688,14 @@ static int suspend_dev(struct device *dev, void *data)
 	drv = to_xenbus_driver(dev->driver);
 	xdev = container_of(dev, struct xenbus_device, dev);
 	if (drv->suspend)
-		err = drv->suspend(xdev);
+		err = drv->suspend(xdev, state);
 	if (err)
 		printk(KERN_WARNING
 		       "xenbus: suspend %s failed: %i\n", dev_name(dev), err);
 	return 0;
 }
 
-static int resume_dev(struct device *dev, void *data)
+static int xenbus_dev_resume(struct device *dev)
 {
 	int err;
 	struct xenbus_driver *drv;
@@ -734,31 +740,6 @@ static int resume_dev(struct device *dev, void *data)
 	return 0;
 }
 
-void xenbus_suspend(void)
-{
-	DPRINTK("");
-
-	bus_for_each_dev(&xenbus_frontend.bus, NULL, NULL, suspend_dev);
-	xenbus_backend_suspend(suspend_dev);
-	xs_suspend();
-}
-EXPORT_SYMBOL_GPL(xenbus_suspend);
-
-void xenbus_resume(void)
-{
-	xb_init_comms();
-	xs_resume();
-	bus_for_each_dev(&xenbus_frontend.bus, NULL, NULL, resume_dev);
-	xenbus_backend_resume(resume_dev);
-}
-EXPORT_SYMBOL_GPL(xenbus_resume);
-
-void xenbus_suspend_cancel(void)
-{
-	xs_suspend_cancel();
-}
-EXPORT_SYMBOL_GPL(xenbus_suspend_cancel);
-
 /* A flag to determine if xenstored is 'ready' (i.e. has started) */
 int xenstored_ready = 0;
 
diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
index e325eab..eab33f1 100644
--- a/drivers/xen/xenbus/xenbus_xs.c
+++ b/drivers/xen/xenbus/xenbus_xs.c
@@ -673,6 +673,8 @@ void xs_resume(void)
 	struct xenbus_watch *watch;
 	char token[sizeof(watch) * 2 + 1];
 
+	xb_init_comms();
+
 	mutex_unlock(&xs_state.response_mutex);
 	mutex_unlock(&xs_state.request_mutex);
 	up_write(&xs_state.transaction_mutex);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 0836772..b9763ba 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -91,7 +91,7 @@ struct xenbus_driver {
 	void (*otherend_changed)(struct xenbus_device *dev,
 				 enum xenbus_state backend_state);
 	int (*remove)(struct xenbus_device *dev);
-	int (*suspend)(struct xenbus_device *dev);
+	int (*suspend)(struct xenbus_device *dev, pm_message_t state);
 	int (*resume)(struct xenbus_device *dev);
 	int (*uevent)(struct xenbus_device *, char **, int, char *, int);
 	struct device_driver driver;
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 63+ messages in thread
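
The patch above replaces the hand-rolled bus_for_each_dev() walk with a
bus-level suspend callback that the driver core invokes per device. The
callback maps the generic device back to its bus-specific wrapper and forwards
to the bound driver's hook, if there is one. A simplified userspace sketch of
that dispatch follows. The struct and function names are hypothetical
stand-ins for the kernel types, and container_of is defined locally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Local stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct driver {
	const char *name;
};

struct device {
	const char *name;
	struct driver *driver;	/* NULL while no driver is bound */
};

/* Hypothetical bus-specific wrappers, in the role of xenbus_device/driver. */
struct bus_device {
	int suspended;
	struct device dev;
};

struct bus_driver {
	int (*suspend)(struct bus_device *bdev, int state);
	struct driver driver;
};

/* Bus-level callback: called once per device by the (imagined) core. */
static int bus_dev_suspend(struct device *dev, int state)
{
	int err = 0;
	struct bus_driver *drv;
	struct bus_device *bdev;

	assert(dev != NULL);
	if (dev->driver == NULL)
		return 0;	/* unbound device: nothing to suspend */

	drv = container_of(dev->driver, struct bus_driver, driver);
	bdev = container_of(dev, struct bus_device, dev);
	if (drv->suspend)
		err = drv->suspend(bdev, state);
	if (err)
		fprintf(stderr, "suspend %s failed: %i\n", dev->name, err);
	return 0;	/* as in the patch, failure is logged but not propagated */
}

/* Example driver hook that just records that it ran. */
static int demo_suspend(struct bus_device *bdev, int state)
{
	(void)state;
	bdev->suspended = 1;
	return 0;
}
```

Returning 0 even when the driver hook fails mirrors the behaviour visible in
the diff. Whether that is desirable once the PM core drives the suspend is a
separate question that this sketch does not try to answer.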

* [PATCH 24/24] xen/xenbus: export xenbus_dev_changed
  2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
                   ` (22 preceding siblings ...)
  2009-03-13  8:11 ` [PATCH 23/24] xen: use device model for suspending xenbus devices Jeremy Fitzhardinge
@ 2009-03-13  8:12 ` Jeremy Fitzhardinge
  23 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13  8:12 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge, Jeremy Fitzhardinge

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 drivers/xen/xenbus/xenbus_probe.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 4649213..d42e25d 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -660,6 +660,7 @@ void xenbus_dev_changed(const char *node, struct xen_bus_type *bus)
 
 	kfree(root);
 }
+EXPORT_SYMBOL_GPL(xenbus_dev_changed);
 
 static void frontend_changed(struct xenbus_watch *watch,
 			     const char **vec, unsigned int len)
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid
  2009-03-13  8:11 ` [PATCH 10/24] xen: mask XSAVE from cpuid Jeremy Fitzhardinge
@ 2009-03-13  9:50     ` Jan Beulich
  0 siblings, 0 replies; 63+ messages in thread
From: Jan Beulich @ 2009-03-13  9:50 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, H. Peter Anvin
  Cc: Jeremy Fitzhardinge, the arch/x86 maintainers, Xen-devel,
	Linux Kernel Mailing List

>>> Jeremy Fitzhardinge <jeremy@goop.org> 13.03.09 09:11 >>>
>From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>
>Xen leaves XSAVE set in cpuid, but doesn't allow cr4.OSXSAVE
>to be set.  This confuses the kernel and it ends up crashing on
>an xsetbv instruction.
>
>At boot time, try to set cr4.OSXSAVE, and mask XSAVE out of
>cpuid if we can't.  This will produce a spurious error from Xen,
>but allows us to support XSAVE if/when Xen does.

As pointed out on an earlier thread, it seems inappropriate to do probing
like this when there is a cpuid feature flag (osxsave) that can be used to
determine whether XSAVE can be used. And even without that flag,
simply reading CR4 and checking whether osxsave is set there would
suffice. This is under the assumption that Xen's to-be-done implementation
of XSAVE support would match that of FXSAVE (Xen turns its support on
unconditionally and for all [pv] guests).

Jan


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 10/24] xen: mask XSAVE from cpuid
@ 2009-03-13  9:50     ` Jan Beulich
  0 siblings, 0 replies; 63+ messages in thread
From: Jan Beulich @ 2009-03-13  9:50 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, H. Peter Anvin
  Cc: Xen-devel, Jeremy Fitzhardinge, the arch/x86 maintainers,
	Linux Kernel Mailing List

>>> Jeremy Fitzhardinge <jeremy@goop.org> 13.03.09 09:11 >>>
>From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>
>Xen leaves XSAVE set in cpuid, but doesn't allow cr4.OSXSAVE
>to be set.  This confuses the kernel and it ends up crashing on
>an xsetbv instruction.
>
>At boot time, try to set cr4.OSXSAVE, and mask XSAVE out of
>cpuid if we can't.  This will produce a spurious error from Xen,
>but allows us to support XSAVE if/when Xen does.

As pointed out on an earlier thread, it seems inappropriate to do probing
like this when there is a cpuid feature flag (osxsave) that can be used to
determine whether XSAVE can be used. And even without that flag,
simply reading CR4 and checking whether osxsave is set there would
suffice. This is under the assumption that Xen's to-be-done implementation
of XSAVE support would match that of FXSAVE (Xen turns its support on
unconditionally and for all [pv] guests).

Jan

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Xen-devel] [PATCH 22/24] xen: remove suspend_cancel hook
  2009-03-13  8:11 ` [PATCH 22/24] xen: remove suspend_cancel hook Jeremy Fitzhardinge
@ 2009-03-13 10:08     ` Jan Beulich
  0 siblings, 0 replies; 63+ messages in thread
From: Jan Beulich @ 2009-03-13 10:08 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, H. Peter Anvin
  Cc: Ian Campbell, Jeremy Fitzhardinge, the arch/x86 maintainers,
	Xen-devel, Linux Kernel Mailing List

>>> Jeremy Fitzhardinge <jeremy@goop.org> 13.03.09 09:11 >>>
>From: Ian Campbell <ian.campbell@citrix.com>
>
>Remove suspend_cancel hook from xenbus_driver, in preparation for using
>the device model for suspending.
>
>Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Does that mean that there are no intentions to ever support the accelerator
stuff found in the 2.6.18-based Xenified Linux tree? That was the apparent
only user of the cancel hook...

Jan


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 22/24] xen: remove suspend_cancel hook
@ 2009-03-13 10:08     ` Jan Beulich
  0 siblings, 0 replies; 63+ messages in thread
From: Jan Beulich @ 2009-03-13 10:08 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, H. Peter Anvin
  Cc: Xen-devel, the arch/x86 maintainers, Jeremy Fitzhardinge,
	Ian Campbell, Linux Kernel Mailing List

>>> Jeremy Fitzhardinge <jeremy@goop.org> 13.03.09 09:11 >>>
>From: Ian Campbell <ian.campbell@citrix.com>
>
>Remove suspend_cancel hook from xenbus_driver, in preparation for using
>the device model for suspending.
>
>Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

Does that mean that there are no intentions to ever support the accelerator
stuff found in the 2.6.18-based Xenified Linux tree? That was the apparent
only user of the cancel hook...

Jan

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Xen-devel] [PATCH 23/24] xen: use device model for suspending xenbus devices
  2009-03-13  8:11 ` [PATCH 23/24] xen: use device model for suspending xenbus devices Jeremy Fitzhardinge
@ 2009-03-13 10:09   ` Jan Beulich
  0 siblings, 0 replies; 63+ messages in thread
From: Jan Beulich @ 2009-03-13 10:09 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, H. Peter Anvin
  Cc: Ian Campbell, Jeremy Fitzhardinge, the arch/x86 maintainers,
	Xen-devel, Linux Kernel Mailing List

>>> Jeremy Fitzhardinge <jeremy@goop.org> 13.03.09 09:11 >>>

Shouldn't this also include removing the explicit calls to
gnttab_{suspend,resume}() in favor of making gnttab a sysdev?

Jan


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid
  2009-03-13  9:50     ` Jan Beulich
@ 2009-03-13 15:13       ` Jeremy Fitzhardinge
  -1 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13 15:13 UTC (permalink / raw)
  To: Jan Beulich
  Cc: H. Peter Anvin, Xen-devel, Jeremy Fitzhardinge,
	the arch/x86 maintainers, Linux Kernel Mailing List

Jan Beulich wrote:
> As pointed out on an earlier thread, it seems inappropriate to do probing
> like this when there is a cpuid feature flag (osxsave) that can be used to
> determine whether XSAVE can be used. And even without that flag,
> simply reading CR4 and checking whether osxsave is set there would
> suffice. This is under the assumption that Xen's to-be-done implementation
> of XSAVE support would match that of FXSAVE (Xen turns its support on
> unconditionally and for all [pv] guests).

I didn't want to make too many assumptions about how Xen's XSAVE support 
would look.  In particular, I thought it might virtualize the state of 
OSXSAVE to give the guest the honour of appearing to enable it.  A guest 
kernel may get confused if it starts with OSXSAVE set, as it may use it 
to control its own init logic.

    J

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 10/24] xen: mask XSAVE from cpuid
@ 2009-03-13 15:13       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13 15:13 UTC (permalink / raw)
  To: Jan Beulich
  Cc: the arch/x86 maintainers, Xen-devel, Jeremy Fitzhardinge,
	Linux Kernel Mailing List, H. Peter Anvin

Jan Beulich wrote:
> As pointed out on an earlier thread, it seems inappropriate to do probing
> like this when there is a cpuid feature flag (osxsave) that can be used to
> determine whether XSAVE can be used. And even without that flag,
> simply reading CR4 and checking whether osxsave is set there would
> suffice. This is under the assumption that Xen's to-be-done implementation
> of XSAVE support would match that of FXSAVE (Xen turns its support on
> unconditionally and for all [pv] guests).

I didn't want to make too many assumptions about how Xen's XSAVE support 
would look.  In particular, I thought it might virtualize the state of 
OSXSAVE to give the guest the honour of appearing to enable it.  A guest 
kernel may get confused if it starts with OSXSAVE set, as it may use it 
to control its own init logic.

    J

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Xen-devel] [PATCH 22/24] xen: remove suspend_cancel hook
  2009-03-13 10:08     ` Jan Beulich
@ 2009-03-13 15:17       ` Jeremy Fitzhardinge
  -1 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13 15:17 UTC (permalink / raw)
  To: Jan Beulich
  Cc: H. Peter Anvin, Xen-devel, the arch/x86 maintainers,
	Jeremy Fitzhardinge, Ian Campbell, Linux Kernel Mailing List

Jan Beulich wrote:
> Does that mean that there are no intentions to ever support the accelerator
> stuff found in the 2.6.18-based Xenified Linux tree? That was the apparent
> only user of the cancel hook...
>   

I don't have any immediate plan to merge it myself, but I'm sure we can 
either add this back or otherwise work out how to implement it in the 
pvops kernel.

    J

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 22/24] xen: remove suspend_cancel hook
@ 2009-03-13 15:17       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-13 15:17 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Xen-devel, Ian Campbell, the arch/x86 maintainers,
	Linux Kernel Mailing List, Jeremy Fitzhardinge, H. Peter Anvin

Jan Beulich wrote:
> Does that mean that there are no intentions to ever support the accelerator
> stuff found in the 2.6.18-based Xenified Linux tree? That was the apparent
> only user of the cancel hook...
>   

I don't have any immediate plan to merge it myself, but I'm sure we can 
either add this back or otherwise work out how to implement it in the 
pvops kernel.

    J

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid
  2009-03-13 15:13       ` Jeremy Fitzhardinge
  (?)
@ 2009-03-15 18:45       ` H. Peter Anvin
  2009-03-15 21:03           ` Jeremy Fitzhardinge
  -1 siblings, 1 reply; 63+ messages in thread
From: H. Peter Anvin @ 2009-03-15 18:45 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Jan Beulich, Xen-devel, Jeremy Fitzhardinge,
	the arch/x86 maintainers, Linux Kernel Mailing List

Jeremy Fitzhardinge wrote:
> Jan Beulich wrote:
>> As pointed out on an earlier thread, it seems inappropriate to do probing
>> like this when there is a cpuid feature flag (osxsave) that can be
>> used to
>> determine whether XSAVE can be used. And even without that flag,
>> simply reading CR4 and checking whether osxsave is set there would
>> suffice. This is under the assumption that Xen's to-be-done
>> implementation
>> of XSAVE support would match that of FXSAVE (Xen turns its support on
>> unconditionally and for all [pv] guests).
> 
> I didn't want to make too many assumptions about how Xen's XSAVE support
> would look.  In particular, I thought it might virtualize the state of
> OSXSAVE to give the guest the honour of appearing to enable it.  A guest
> kernel may get confused if it starts with OSXSAVE set, as it may use it
> to control its own init logic.

That wouldn't be an issue if you use the *native* CPUID to look for
OSXSAVE early on, since such virtualization would only be visible through
the PV interface, right?

It seems cleaner than probing, to be sure...

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid
  2009-03-15 18:45       ` [Xen-devel] " H. Peter Anvin
@ 2009-03-15 21:03           ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-15 21:03 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jan Beulich, Xen-devel, Jeremy Fitzhardinge,
	the arch/x86 maintainers, Linux Kernel Mailing List

H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
>   
>> Jan Beulich wrote:
>>     
>>> As pointed out on an earlier thread, it seems inappropriate to do probing
>>> like this when there is a cpuid feature flag (osxsave) that can be
>>> used to
>>> determine whether XSAVE can be used. And even without that flag,
>>> simply reading CR4 and checking whether osxsave is set there would
>>> suffice. This is under the assumption that Xen's to-be-done
>>> implementation
>>> of XSAVE support would match that of FXSAVE (Xen turns its support on
>>> unconditionally and for all [pv] guests).
>>>       
>> I didn't want to make too many assumptions about how Xen's XSAVE support
>> would look.  In particular, I thought it might virtualize the state of
>> OSXSAVE to give the guest the honour of appearing to enable it.  A guest
>> kernel may get confused if it starts with OSXSAVE set, as it may use it
>> to control its own init logic.
>>     
>
> That wouldn't be an issue if you use the *native* CPUID to look for
> OSXSAVE early on, since such virtualization would only be visible through
> the PV interface, right?
>
> It seems cleaner than probing, to be sure...
>   

Well, at the moment the problem is that cpuid (both PV and native) shows
XSAVE, but Xen prevents cr4.OSXSAVE from being set, crashing the 
kernel.  There's now a patch in Xen to mask XSAVE in CPUID, so that 
guests don't try to use it; the patch in the kernel is just to support 
non-bleeding-edge versions of Xen.
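
Roughly, the masking amounts to something like the following.  This is
a sketch under the assumption that the guest filters CPUID results in
one place; mask_xsave_from_cpuid() is a made-up name and this is not
the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID leaf 1, ECX feature bits (architectural positions). */
#define CPUID1_ECX_XSAVE   (UINT32_C(1) << 26)
#define CPUID1_ECX_OSXSAVE (UINT32_C(1) << 27)

/*
 * Hypothetical filter: hide XSAVE (and OSXSAVE) from the guest kernel
 * so that it never tries to set CR4.OSXSAVE.  Only leaf 1 is touched;
 * ECX from every other leaf passes through unchanged.
 */
static uint32_t mask_xsave_from_cpuid(uint32_t leaf, uint32_t ecx)
{
	if (leaf == 1)
		ecx &= ~(CPUID1_ECX_XSAVE | CPUID1_ECX_OSXSAVE);

	return ecx;
}

/* Usage example: the bit disappears from leaf 1 only. */
static void mask_xsave_example(void)
{
	assert((mask_xsave_from_cpuid(1, 0xffffffffu) & CPUID1_ECX_XSAVE) == 0);
	assert(mask_xsave_from_cpuid(7, 0xffffffffu) == 0xffffffffu);
}
```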

There have been some patches floating around for Xen support of XSAVE, 
but I think there are some issues with the variable-sized CPU context 
and save/restore/migrate, so they've been put on the backburner until 
there's a real need for them.  I haven't looked at them, but I wouldn't 
have assumed that Xen would necessarily set OSXSAVE for itself, or 
require guests to do so (if a guest can make do with a simpler CPU 
context structure, then that might be simpler for things like 
cross-architecture migration, etc).  I think that the only safe 
assumption is that XSAVE is available iff cpuid.XSAVE is set, modulo the 
bug mentioned above.

I guess if we support XSAVE for any vcpu, all the pcpus must have 
OSXSAVE set, and we rely on the fact that the XSAVE format is compatible 
with FXSAVE where they overlap.  But I really don't know what happens 
when guests use xsetbv and how that might be virtualized/paravirtualized.

    J

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 12/24] x86-64: remove PGE from must-have feature list
  2009-03-13  8:11 ` [PATCH 12/24] x86-64: remove PGE from must-have feature list Jeremy Fitzhardinge
@ 2009-03-15 21:18   ` H. Peter Anvin
  2009-03-15 21:25       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 63+ messages in thread
From: H. Peter Anvin @ 2009-03-15 21:18 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge

Jeremy Fitzhardinge wrote:
> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> 
> PGE may not be available when running paravirtualized, so test the cpuid
> bit before using it.
> 
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> ---
>  arch/x86/include/asm/required-features.h |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/include/asm/required-features.h b/arch/x86/include/asm/required-features.h
> index d5cd6c5..a4737dd 100644
> --- a/arch/x86/include/asm/required-features.h
> +++ b/arch/x86/include/asm/required-features.h
> @@ -50,7 +50,7 @@
>  #ifdef CONFIG_X86_64
>  #define NEED_PSE	0
>  #define NEED_MSR	(1<<(X86_FEATURE_MSR & 31))
> -#define NEED_PGE	(1<<(X86_FEATURE_PGE & 31))
> +#define NEED_PGE	0
>  #define NEED_FXSR	(1<<(X86_FEATURE_FXSR & 31))
>  #define NEED_XMM	(1<<(X86_FEATURE_XMM & 31))
>  #define NEED_XMM2	(1<<(X86_FEATURE_XMM2 & 31))

This should be conditionalized on CONFIG_PARAVIRT, since doing this
removes real-hardware optimizations.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 12/24] x86-64: remove PGE from must-have feature list
  2009-03-15 21:18   ` H. Peter Anvin
@ 2009-03-15 21:25       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-15 21:25 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List, Xen-devel,
	Jeremy Fitzhardinge

H. Peter Anvin wrote:
>> --- a/arch/x86/include/asm/required-features.h
>> +++ b/arch/x86/include/asm/required-features.h
>> @@ -50,7 +50,7 @@
>>  #ifdef CONFIG_X86_64
>>  #define NEED_PSE	0
>>  #define NEED_MSR	(1<<(X86_FEATURE_MSR & 31))
>> -#define NEED_PGE	(1<<(X86_FEATURE_PGE & 31))
>> +#define NEED_PGE	0
>>  #define NEED_FXSR	(1<<(X86_FEATURE_FXSR & 31))
>>  #define NEED_XMM	(1<<(X86_FEATURE_XMM & 31))
>>  #define NEED_XMM2	(1<<(X86_FEATURE_XMM2 & 31))
>>     
>
> This should be conditionalized on CONFIG_PARAVIRT, since doing this
> removes real-hardware optimizations.
>   

OK.  Can do the same for PSE.
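
Something along these lines, as a standalone sketch rather than the
final patch.  The feature numbers are repeated here so the fragment
compiles on its own, and has_required_features() is a hypothetical
helper added only to show how the masks would be consumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature numbers are word*32 + bit; PSE and PGE are CPUID.1:EDX bits 3 and 13. */
#define X86_FEATURE_PSE	(0*32 + 3)
#define X86_FEATURE_PGE	(0*32 + 13)

#ifdef CONFIG_PARAVIRT
/* A paravirtualized guest may not be offered PSE or PGE, so do not require them. */
#define NEED_PSE	0
#define NEED_PGE	0
#else
/* Every x86-64 CPU has both, so they can be required at build time. */
#define NEED_PSE	(1<<(X86_FEATURE_PSE & 31))
#define NEED_PGE	(1<<(X86_FEATURE_PGE & 31))
#endif

/* Hypothetical check: does this CPUID.1:EDX value satisfy the required mask? */
static bool has_required_features(uint32_t cpuid1_edx)
{
	const uint32_t need = NEED_PSE | NEED_PGE;

	return (cpuid1_edx & need) == need;
}
```

With CONFIG_PARAVIRT unset the compiler can treat the PGE and PSE tests
as constant, which preserves the real-hardware optimization.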

    J

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid
  2009-03-15 21:03           ` Jeremy Fitzhardinge
  (?)
@ 2009-03-15 22:47           ` Arjan van de Ven
  2009-03-16  0:05             ` Jeremy Fitzhardinge
  -1 siblings, 1 reply; 63+ messages in thread
From: Arjan van de Ven @ 2009-03-15 22:47 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: H. Peter Anvin, Jan Beulich, Xen-devel, Jeremy Fitzhardinge,
	the arch/x86 maintainers, Linux Kernel Mailing List

On Sun, 15 Mar 2009 14:03:10 -0700
Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> H. Peter Anvin wrote:
> > Jeremy Fitzhardinge wrote:
> >   
> >> Jan Beulich wrote:
> >>     
> >>> As pointed out on an earlier thread, it seems inappropriate to do
> >>> probing like this when there is a cpuid feature flag (osxsave)
> >>> that can be used to
> >>> determine whether XSAVE can be used. And even without that flag,
> >>> simply reading CR4 and checking whether osxsave is set there would
> >>> suffice. This is under the assumption that Xen's to-be-done
> >>> implementation
> >>> of XSAVE support would match that of FXSAVE (Xen turns its
> >>> support on unconditionally and for all [pv] guests).
> >>>       
> >> I didn't want to make too many assumptions about how Xen's XSAVE
> >> support would look.  In particular, I thought it might virtualize
> >> the state of OSXSAVE to give the guest the honour of appearing to
> >> enable it.  A guest kernel may get confused if it starts with
> >> OSXSAVE set, as it may use it to control its own init logic.
> >>     
> >
> > That wouldn't be an issue if you use the *native* CPUID to look for
> > OSXSAVE early on, since such virtualization would only be visible
> > through the PV interface, right?
> >
> > It seems cleaner than probing, to be sure...
> >   
> 
> Well, at the moment the problem is that cpuid (both PV and native)
> shows XSAVE, but Xen prevents cr4.OSXSAVE from being set, crashing the
> kernel.  There's now a patch in Xen to mask XSAVE in CPUID, so that 
> guests don't try to use it; the patch in the kernel is just to
> support non-bleeding-edge versions of Xen.

This is indicative of something that might be a huge bug in Xen:
Xen should never ever pass through CPUID bits it does not know.
If Xen does not honor that, there is a fundamental and eternally
recurring problem.... every time something new gets introduced Xen
likely breaks.


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid
  2009-03-15 22:47           ` [Xen-devel] " Arjan van de Ven
@ 2009-03-16  0:05             ` Jeremy Fitzhardinge
  2009-03-16  0:09               ` Arjan van de Ven
  0 siblings, 1 reply; 63+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-16  0:05 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: H. Peter Anvin, Jan Beulich, Xen-devel, Jeremy Fitzhardinge,
	the arch/x86 maintainers, Linux Kernel Mailing List

Arjan van de Ven wrote:
> This is indicative of something that might be a huge bug in Xen:
> Xen should never ever pass through CPUID bits it does not know.
> If Xen does not honor that, there is a fundamental and eternally
> recurring problem.... every time something new gets introduced Xen
> likely breaks.

Yes, I'd agree; Xen should whitelist cpu capabilities rather than 
blacklist them.  Jan expressed the opposite opinion (on the grounds that 
it precludes using features which don't require special OS or hypervisor 
support without Xen modifications).  But if it's just a matter of
sticking a bit into a mask, it's easy and quick to roll a new version of
Xen (esp since it can generally be done before CPUs with the new feature 
get into people's hands).
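
To make the difference concrete, here is a small sketch of the two
policies applied to one CPUID output register.  The function names and
the example bit values are hypothetical and are not Xen's
implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Blacklist: pass everything the hardware reports except known-bad bits. */
static uint32_t cpuid_filter_blacklist(uint32_t hw, uint32_t blocked)
{
	return hw & ~blocked;
}

/* Whitelist: pass only the bits the hypervisor knows how to support. */
static uint32_t cpuid_filter_whitelist(uint32_t hw, uint32_t known)
{
	return hw & known;
}

/* Usage example: bit 30 stands in for a feature newer than the hypervisor. */
static void cpuid_filter_example(void)
{
	const uint32_t new_bit = UINT32_C(1) << 30;
	const uint32_t hw = new_bit | 0x5u;

	/* The unknown bit leaks through a blacklist... */
	assert(cpuid_filter_blacklist(hw, 0x4u) & new_bit);
	/* ...but not through a whitelist. */
	assert(!(cpuid_filter_whitelist(hw, 0x7u) & new_bit));
}
```

Enabling a new feature under the whitelist scheme is then a one-bit
change to the "known" mask.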

    J


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid
  2009-03-16  0:05             ` Jeremy Fitzhardinge
@ 2009-03-16  0:09               ` Arjan van de Ven
  2009-03-16  0:57                 ` H. Peter Anvin
  2009-03-16 14:16                 ` Jan Beulich
  0 siblings, 2 replies; 63+ messages in thread
From: Arjan van de Ven @ 2009-03-16  0:09 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: H. Peter Anvin, Jan Beulich, Xen-devel, Jeremy Fitzhardinge,
	the arch/x86 maintainers, Linux Kernel Mailing List

On Sun, 15 Mar 2009 17:05:26 -0700
Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Arjan van de Ven wrote:
> > This is indicative of something that might be a huge bug in Xen:
> > Xen should never ever pass through CPUID bits it does not know.
> > If Xen does not honor that, there is a fundamental and eternally
> > recurring problem.... every time something new gets introduced Xen
> > likely breaks.
> 
> Yes, I'd agree; Xen should whitelist cpu capabilities rather than 
> blacklist them.  Jan expressed the opposite opinion (on the grounds
> that it precludes using features which don't require special OS or
> hypervisor support without Xen modifications).

Well.. pretty much all new instructions need Xen modifications due to
the need to be emulated to deal with traps/vmexits/etc right?
So I don't quite see many cpuid bits that would NOT involve some Xen
modification or another ;)



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid
  2009-03-16  0:09               ` Arjan van de Ven
@ 2009-03-16  0:57                 ` H. Peter Anvin
  2009-03-16 14:16                 ` Jan Beulich
  1 sibling, 0 replies; 63+ messages in thread
From: H. Peter Anvin @ 2009-03-16  0:57 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Jeremy Fitzhardinge, Jan Beulich, Xen-devel, Jeremy Fitzhardinge,
	the arch/x86 maintainers, Linux Kernel Mailing List

Arjan van de Ven wrote:
> 
> Well.. pretty much all new instructions need Xen modifications due to
> the need to be emulated to deal with traps/vmexits/etc right?
> So I don't quite see many cpuid bits that would NOT involve some Xen
> modification or another ;)
> 

There are going to be a very small number which need only userspace
support.  However, there will be absolutely no way for Xen to know this
a priori.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid
  2009-03-16  0:09               ` Arjan van de Ven
  2009-03-16  0:57                 ` H. Peter Anvin
@ 2009-03-16 14:16                 ` Jan Beulich
  2009-03-16 14:29                   ` Arjan van de Ven
  2009-03-16 23:59                   ` Andi Kleen
  1 sibling, 2 replies; 63+ messages in thread
From: Jan Beulich @ 2009-03-16 14:16 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Jeremy Fitzhardinge, Jeremy Fitzhardinge,
	the arch/x86 maintainers, Xen-devel, Linux Kernel Mailing List,
	H. Peter Anvin

>>> Arjan van de Ven <arjan@infradead.org> 16.03.09 01:09 >>>
>Well.. pretty much all new instructions need Xen modifications due to
>the need to be emulated to deal with traps/vmexits/etc right?
>So I don't quite see many cpuid bits that would NOT involve some Xen
>modification or another ;)

No, new (user-mode accessible) instructions represent precisely the kind
of extension that does not require hypervisor (or OS) awareness (see SSE2
etc, AES, FMA). New registers otoh are examples of where awareness is
needed (SSE, AVX), as would be new privileged instructions.

Jan


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid
  2009-03-16 14:16                 ` Jan Beulich
@ 2009-03-16 14:29                   ` Arjan van de Ven
  2009-03-16 23:59                   ` Andi Kleen
  1 sibling, 0 replies; 63+ messages in thread
From: Arjan van de Ven @ 2009-03-16 14:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Jeremy Fitzhardinge, Jeremy Fitzhardinge,
	the arch/x86 maintainers, Xen-devel, Linux Kernel Mailing List,
	H. Peter Anvin

On Mon, 16 Mar 2009 14:16:32 +0000
"Jan Beulich" <jbeulich@novell.com> wrote:

> >>> Arjan van de Ven <arjan@infradead.org> 16.03.09 01:09 >>>
> >Well.. pretty much all new instructions need Xen modifications due to
> >the need to be emulated to deal with traps/vmexits/etc right?
> >So I don't quite see many cpuid bits that would NOT involve some Xen
> >modification or another ;)
> 
> No, new (user-mode accessible) instructions represent precisely the
> kind of extension that does not require hypervisor (or OS) awareness
> (see SSE2 etc, AES, FMA). 

so Xen doesn't need to handle a case where the kernel does AES on
uncached IO memory ?


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid
  2009-03-16 14:16                 ` Jan Beulich
  2009-03-16 14:29                   ` Arjan van de Ven
@ 2009-03-16 23:59                   ` Andi Kleen
  2009-03-17  1:33                     ` H. Peter Anvin
  2009-03-17  7:53                       ` Jan Beulich
  1 sibling, 2 replies; 63+ messages in thread
From: Andi Kleen @ 2009-03-16 23:59 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Arjan van de Ven, Jeremy Fitzhardinge, Jeremy Fitzhardinge,
	the arch/x86 maintainers, Xen-devel, Linux Kernel Mailing List,
	H. Peter Anvin

"Jan Beulich" <jbeulich@novell.com> writes:

>>>> Arjan van de Ven <arjan@infradead.org> 16.03.09 01:09 >>>
>>Well.. pretty much all new instructions need Xen modifications due to
>>the need to be emulated to deal with traps/vmexits/etc right?
>>So I don't quite see many cpuid bits that would NOT involve some Xen
>>modification or another ;)
>
> No, new (user-mode accessible) instructions represent precisely the kind
> of extension that does not require hypervisor (or OS) awareness (see SSE2
> etc, AES, FMA). New registers otoh are examples of where awareness is
> needed (SSE, AVX), as would be new privileged instructions.

Why would another hypothetical FP register extension need Xen support
once it gets proper XSAVE support? I can't think of a reason why
(assuming XSAVE support) it would need to know of a new kind of
FP register or similar. They very likely won't appear in any 
instructions that need mmio. Or are you worried about the real
mode emulator?

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid
  2009-03-16 23:59                   ` Andi Kleen
@ 2009-03-17  1:33                     ` H. Peter Anvin
  2009-03-17 11:56                       ` Andi Kleen
  2009-03-17  7:53                       ` Jan Beulich
  1 sibling, 1 reply; 63+ messages in thread
From: H. Peter Anvin @ 2009-03-17  1:33 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jan Beulich, Arjan van de Ven, Jeremy Fitzhardinge,
	Jeremy Fitzhardinge, the arch/x86 maintainers, Xen-devel,
	Linux Kernel Mailing List

Andi Kleen wrote:
> "Jan Beulich" <jbeulich@novell.com> writes:
> 
>>>>> Arjan van de Ven <arjan@infradead.org> 16.03.09 01:09 >>>
>>> Well.. pretty much all new instructions need Xen modifications due to
>>> the need to be emulated to deal with traps/vmexits/etc right?
>>> So I don't quite see many cpuid bits that would NOT involve some Xen
>>> modification or another ;)
>> No, new (user-mode accessible) instructions represent precisely the kind
>> of extension that does not require hypervisor (or OS) awareness (see SSE2
>> etc, AES, FMA). New registers otoh are examples of where awareness is
>> needed (SSE, AVX), as would be new privileged instructions.
> 
> Why would another hypothetical FP register extension need Xen support
> once it gets proper XSAVE support? I can't think of a reason why
> (assuming XSAVE support) it would need to know of a new kind of
> FP register or similar. They very likely won't appear in any 
> instructions that need mmio. Or are you worried about the real
> mode emulator?
> 

The point is YOU DON'T KNOW.  In particular, there might be new traps,
there might be new state, there might be new MSRs, there might be new
control bits... anything.  Therefore, you cannot blindly pass the bit
on, even though XSAVE solves one part of the problem.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid
  2009-03-16 23:59                   ` Andi Kleen
@ 2009-03-17  7:53                       ` Jan Beulich
  2009-03-17  7:53                       ` Jan Beulich
  1 sibling, 0 replies; 63+ messages in thread
From: Jan Beulich @ 2009-03-17  7:53 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jeremy Fitzhardinge, Jeremy Fitzhardinge, Arjan van de Ven,
	the arch/x86 maintainers, Xen-devel, Linux Kernel Mailing List,
	H. Peter Anvin

>>> Andi Kleen <andi@firstfloor.org> 17.03.09 00:59 >>>
>"Jan Beulich" <jbeulich@novell.com> writes:
>
>>>>> Arjan van de Ven <arjan@infradead.org> 16.03.09 01:09 >>>
>>>Well.. pretty much all new instructions need Xen modifications due to
>>>the need to be emulated to deal with traps/vmexits/etc right?
>>>So I don't quite see many cpuid bits that would NOT involve some Xen
>>>modification or another ;)
>>
>> No, new (user-mode accessible) instructions represent precisely the kind
>> of extension that does not require hypervisor (or OS) awareness (see SSE2
>> etc, AES, FMA). New registers otoh are examples of where awareness is
>> needed (SSE, AVX), as would be new privileged instructions.
>
>Why would another hypothetical FP register extension need Xen support
>once it gets proper XSAVE support? I can't think of a reason why
>(assuming XSAVE support) it would need to know of a new kind of
>FP register or similar. They very likely won't appear in any 
>instructions that need mmio. Or are you worried about the real
>mode emulator?

No, properly coded xsave support will (hopefully) make user-visible context
extensions transparent to hypervisor and OS. But I was giving a general
example here, and the change from xmm to ymm registers is one that does
need hypervisor (and OS) changes.
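
As an illustration of what "properly coded" could mean: size the save
area from what the CPU reports instead of hard-coding a layout.  The
sketch below assumes the size was read from CPUID leaf 0xD, sub-leaf 0
(EBX for the currently enabled state, ECX for the maximum);
xsave_area_alloc() is a hypothetical userspace-style helper, not
hypervisor code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define XSAVE_ALIGN		64u	/* XSAVE needs a 64-byte aligned area */
#define XSAVE_LEGACY_SIZE	512u	/* FXSAVE-compatible legacy region */
#define XSAVE_HEADER_SIZE	64u	/* XSAVE header after the legacy region */

/*
 * Allocate a zeroed save area of the size the CPU reported.  Nothing
 * here knows which state components live beyond the header, so a new
 * register set only changes reported_size.  Release with free().
 */
static void *xsave_area_alloc(uint32_t reported_size)
{
	size_t size = reported_size;
	void *area;

	/* Anything smaller than legacy region + header cannot be valid. */
	if (size < XSAVE_LEGACY_SIZE + XSAVE_HEADER_SIZE)
		return NULL;

	/* aligned_alloc() wants the size to be a multiple of the alignment. */
	size = (size + XSAVE_ALIGN - 1) & ~(size_t)(XSAVE_ALIGN - 1);

	area = aligned_alloc(XSAVE_ALIGN, size);
	if (area)
		memset(area, 0, size);

	return area;
}

/* Usage example: 832 bytes is legacy region + header + 256 bytes of ymm state. */
static void xsave_area_example(void)
{
	void *area = xsave_area_alloc(832);

	assert(area != NULL);
	assert((uintptr_t)area % XSAVE_ALIGN == 0);
	free(area);
}
```

This only covers the context size; new traps, MSRs or control bits
would still need explicit support.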

Jan


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid
  2009-03-17  7:53                       ` Jan Beulich
  (?)
@ 2009-03-17 10:48                       ` Andi Kleen
  2009-03-17 10:55                           ` Jan Beulich
  -1 siblings, 1 reply; 63+ messages in thread
From: Andi Kleen @ 2009-03-17 10:48 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andi Kleen, Jeremy Fitzhardinge, Jeremy Fitzhardinge,
	Arjan van de Ven, the arch/x86 maintainers, Xen-devel,
	Linux Kernel Mailing List, H. Peter Anvin

> No, properly coded xsave support will (hopefully) make user-visible context
> extensions transparent to hypervisor and OS. But I was giving a general
> example here, and the change from xmm to ymm registers is one that does
> need hypervisor (and OS) changes.

Again except for XSAVE support it doesn't, does it?

-Andi

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid
  2009-03-17 10:48                       ` [Xen-devel] " Andi Kleen
@ 2009-03-17 10:55                           ` Jan Beulich
  0 siblings, 0 replies; 63+ messages in thread
From: Jan Beulich @ 2009-03-17 10:55 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jeremy Fitzhardinge, Jeremy Fitzhardinge, Arjan van de Ven,
	the arch/x86 maintainers, Xen-devel, Linux Kernel Mailing List,
	H. Peter Anvin

>>> Andi Kleen <andi@firstfloor.org> 17.03.09 11:48 >>>
>> No, properly coded xsave support will (hopefully) make user-visible context
>> extensions transparent to hypervisor and OS. But I was giving a general
>> example here, and the change from xmm to ymm registers is one that does
>> need hypervisor (and OS) changes.
>
>Again except for XSAVE support it doesn't, does it?

No, it shouldn't (minus the valid comments hpa added in another reply).

Jan


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid
  2009-03-17  1:33                     ` H. Peter Anvin
@ 2009-03-17 11:56                       ` Andi Kleen
  2009-03-17 15:48                         ` H. Peter Anvin
  2009-03-17 15:49                           ` Arjan van de Ven
  0 siblings, 2 replies; 63+ messages in thread
From: Andi Kleen @ 2009-03-17 11:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andi Kleen, Jan Beulich, Arjan van de Ven, Jeremy Fitzhardinge,
	Jeremy Fitzhardinge, the arch/x86 maintainers, Xen-devel,
	Linux Kernel Mailing List

> The point is YOU DON'T KNOW.  In particular, there might be new traps,
> there might be new state, there might be new MSRs, there might be new
> control bits... anything.  Therefore, you cannot blindly pass the bit
> on, even though XSAVE solves one part of the problem.

I think what will happen if you don't expose it is that there will
always be hypervisors which are behind, and applications/OS will end up
doing probing for opcodes instead of trusting CPUID bits.

Probably not what you intended.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid
  2009-03-17 11:56                       ` Andi Kleen
@ 2009-03-17 15:48                         ` H. Peter Anvin
  2009-03-17 15:49                           ` Arjan van de Ven
  1 sibling, 0 replies; 63+ messages in thread
From: H. Peter Anvin @ 2009-03-17 15:48 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jan Beulich, Arjan van de Ven, Jeremy Fitzhardinge,
	Jeremy Fitzhardinge, the arch/x86 maintainers, Xen-devel,
	Linux Kernel Mailing List

Andi Kleen wrote:
>> The point is YOU DON'T KNOW.  In particular, there might be new traps,
>> there might be new state, there might be new MSRs, there might be new
>> control bits... anything.  Therefore, you cannot blindly pass the bit
>> on, even though XSAVE solves one part of the problem.
> 
> I think what will happen if you don't expose it is that there will
> always be hypervisors which are behind, and applications/OS will end up
> doing probing for opcodes instead of trusting CPUID bits.
> 
> Probably not what you intended.
> 

Probing for opcodes is even more harmful, though.  But yes, we don't
have a good answer to this, and I believe we *can't* have a good answer
to this either -- we could architect the CPUID instruction a bit
differently, but that doesn't account for the various needs of
different hypervisors.

Hypervisor vendors can of course make this easier by making their CPUID
code pluggable so the end user can "hotfix" upgrade it without upgrading
the hypervisor (which makes a lot of them nervous.)

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid
  2009-03-17 11:56                       ` Andi Kleen
@ 2009-03-17 15:49                           ` Arjan van de Ven
  2009-03-17 15:49                           ` Arjan van de Ven
  1 sibling, 0 replies; 63+ messages in thread
From: Arjan van de Ven @ 2009-03-17 15:49 UTC (permalink / raw)
  To: Andi Kleen
  Cc: H. Peter Anvin, Andi Kleen, Jan Beulich, Jeremy Fitzhardinge,
	Jeremy Fitzhardinge, the arch/x86 maintainers, Xen-devel,
	Linux Kernel Mailing List

On Tue, 17 Mar 2009 12:56:21 +0100
Andi Kleen <andi@firstfloor.org> wrote:

> > The point is YOU DON'T KNOW.  In particular, there might be new
> > traps, there might be new state, there might be new MSRs, there
> > might be new control bits... anything.  Therefore, you cannot
> > blindly pass the bit on, even though XSAVE solves one part of the
> > problem.
> 
> I think what will happen if you don't expose it is that there will
> always be hypervisors which are behind, and applications/OS will end up
> doing probing for opcodes instead of trusting CPUID bits.
> 
> Probably not what you intended.
> 

Well, the choice fundamentally is:
1) Have correct applications work, even though you might not always get
   all new features that the hardware could have done.. at the expense
   that someone who wants to do horrible things can
2) Have all latest features always there, but break correctly written
   apps/oses every 2 years.

I'd go for option 1 any day of the week, hands down.
Esp if the "cpu cloaking" kind of things really disable the
instructions... but even without.

-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Xen-devel] [PATCH 10/24] xen: mask XSAVE from cpuid
  2009-03-17 15:49                           ` Arjan van de Ven
  (?)
@ 2009-03-17 15:55                           ` Andi Kleen
  -1 siblings, 0 replies; 63+ messages in thread
From: Andi Kleen @ 2009-03-17 15:55 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andi Kleen, H. Peter Anvin, Jan Beulich, Jeremy Fitzhardinge,
	Jeremy Fitzhardinge, the arch/x86 maintainers, Xen-devel,
	Linux Kernel Mailing List

> well the choice fundamentally is
> 1) Have correct applications work, even though you might not always get
>    all new features that the hardware could have done.. at the expense
>    that someone who wants to do horrible things can
> 2) Have all latest features always there, but break correctly written
>    apps/oses every 2 years.

I'm not sure there will be that much breakage. It seems more like
a theoretical danger.

> 
> I'd go for option 1 any day of the week, hands down.
> Esp if the "cpu cloaking" kind of things really disable the
> instructions... but even without.

With cpu cloaking and disabling unknown instructions it would be fine
to go conservative.  But that's not what is being proposed.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


Thread overview: 63+ messages
2009-03-13  8:11 [GIT PULL] 2.6.30 Xen core updates Jeremy Fitzhardinge
2009-03-13  8:11 ` [PATCH 01/24] xen: disable preempt for leave_lazy_mmu Jeremy Fitzhardinge
2009-03-13  8:11   ` Jeremy Fitzhardinge
2009-03-13  8:11 ` [PATCH 02/24] xen: separate p2m allocation from setting Jeremy Fitzhardinge
2009-03-13  8:11   ` Jeremy Fitzhardinge
2009-03-13  8:11 ` [PATCH 03/24] xen: dynamically allocate p2m tables Jeremy Fitzhardinge
2009-03-13  8:11 ` [PATCH 04/24] xen: split construction of p2m mfn tables from registration Jeremy Fitzhardinge
2009-03-13  8:11 ` [PATCH 05/24] xen: clean up xen_load_gdt Jeremy Fitzhardinge
2009-03-13  8:11   ` Jeremy Fitzhardinge
2009-03-13  8:11 ` [PATCH 06/24] xen: make xen_load_gdt simpler Jeremy Fitzhardinge
2009-03-13  8:11   ` Jeremy Fitzhardinge
2009-03-13  8:11 ` [PATCH 07/24] xen: remove xen_load_gdt debug Jeremy Fitzhardinge
2009-03-13  8:11 ` [PATCH 08/24] xen: reserve i386 Xen pagetables Jeremy Fitzhardinge
2009-03-13  8:11 ` [PATCH 09/24] NULL noise: arch/x86/xen/smp.c Jeremy Fitzhardinge
2009-03-13  8:11 ` [PATCH 10/24] xen: mask XSAVE from cpuid Jeremy Fitzhardinge
2009-03-13  9:50   ` [Xen-devel] " Jan Beulich
2009-03-13  9:50     ` Jan Beulich
2009-03-13 15:13     ` [Xen-devel] " Jeremy Fitzhardinge
2009-03-13 15:13       ` Jeremy Fitzhardinge
2009-03-15 18:45       ` [Xen-devel] " H. Peter Anvin
2009-03-15 21:03         ` Jeremy Fitzhardinge
2009-03-15 21:03           ` Jeremy Fitzhardinge
2009-03-15 22:47           ` [Xen-devel] " Arjan van de Ven
2009-03-16  0:05             ` Jeremy Fitzhardinge
2009-03-16  0:09               ` Arjan van de Ven
2009-03-16  0:57                 ` H. Peter Anvin
2009-03-16 14:16                 ` Jan Beulich
2009-03-16 14:29                   ` Arjan van de Ven
2009-03-16 23:59                   ` Andi Kleen
2009-03-17  1:33                     ` H. Peter Anvin
2009-03-17 11:56                       ` Andi Kleen
2009-03-17 15:48                         ` H. Peter Anvin
2009-03-17 15:49                         ` Arjan van de Ven
2009-03-17 15:49                           ` Arjan van de Ven
2009-03-17 15:55                           ` Andi Kleen
2009-03-17  7:53                     ` Jan Beulich
2009-03-17  7:53                       ` Jan Beulich
2009-03-17 10:48                       ` [Xen-devel] " Andi Kleen
2009-03-17 10:55                         ` Jan Beulich
2009-03-17 10:55                           ` Jan Beulich
2009-03-13  8:11 ` [PATCH 11/24] xen: add FIX_TEXT_POKE to fixmap Jeremy Fitzhardinge
2009-03-13  8:11 ` [PATCH 12/24] x86-64: remove PGE from must-have feature list Jeremy Fitzhardinge
2009-03-15 21:18   ` H. Peter Anvin
2009-03-15 21:25     ` Jeremy Fitzhardinge
2009-03-15 21:25       ` Jeremy Fitzhardinge
2009-03-13  8:11 ` [PATCH 13/24] Xen: Add virt_to_pfn helper function Jeremy Fitzhardinge
2009-03-13  8:11 ` [PATCH 14/24] xen: add irq_from_evtchn Jeremy Fitzhardinge
2009-03-13  8:11 ` [PATCH 15/24] xen: add /dev/xen/evtchn driver Jeremy Fitzhardinge
2009-03-13  8:11   ` Jeremy Fitzhardinge
2009-03-13  8:11 ` [PATCH 16/24] xen: export ioctl headers to userspace Jeremy Fitzhardinge
2009-03-13  8:11 ` [PATCH 17/24] xen/dev-evtchn: clean up locking in evtchn Jeremy Fitzhardinge
2009-03-13  8:11 ` [PATCH 18/24] xen: add "capabilities" file Jeremy Fitzhardinge
2009-03-13  8:11 ` [PATCH 19/24] xen: add /sys/hypervisor support Jeremy Fitzhardinge
2009-03-13  8:11 ` [PATCH 20/24] xen/sys/hypervisor: change writable_pt to features Jeremy Fitzhardinge
2009-03-13  8:11 ` [PATCH 21/24] xen: drop kexec bits from /sys/hypervisor since kexec isn't implemented yet Jeremy Fitzhardinge
2009-03-13  8:11 ` [PATCH 22/24] xen: remove suspend_cancel hook Jeremy Fitzhardinge
2009-03-13 10:08   ` [Xen-devel] " Jan Beulich
2009-03-13 10:08     ` Jan Beulich
2009-03-13 15:17     ` [Xen-devel] " Jeremy Fitzhardinge
2009-03-13 15:17       ` Jeremy Fitzhardinge
2009-03-13  8:11 ` [PATCH 23/24] xen: use device model for suspending xenbus devices Jeremy Fitzhardinge
2009-03-13 10:09   ` [Xen-devel] [PATCH 23/24] xen: use device model for suspendingxenbus devices Jan Beulich
2009-03-13  8:12 ` [PATCH 24/24] xen/xenbus: export xenbus_dev_changed Jeremy Fitzhardinge
