All of lore.kernel.org
 help / color / mirror / Atom feed
* [GIT PULL] Small Xen bugfixes
@ 2010-10-29 18:57 ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-29 18:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Xen-devel, Linux Kernel Mailing List, Ian Campbell, Vasiliy G Tolstov

 Hi Linus,

Here are some small Xen bugfixes:

    * fix dom0 boot on systems whose E820 doesn't completely cover the
      ISA address space.  This fixes a crash on a Dell R310.
    * fix misallocation of initial pagetables on 32-bit systems
      (sizeof(pmd_t) != sizeof(pmd_t *)).  In practice this didn't cause
      a problem because the following allocation was page-aligned so the
      padding was big enough to make up the difference.
    * in xenfs, properly report put_user failures to write back error status

The changes are on two branches:

  git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git upstream/core

Ian Campbell (2):
      xen: do not release any memory under 1M in domain 0
      xen: correct size of level2_kernel_pgt

 arch/x86/xen/mmu.c   |    2 +-
 arch/x86/xen/setup.c |   31 ++++++++++++++++++++++++++-----
 2 files changed, 27 insertions(+), 6 deletions(-)

  git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git upstream/xenfs

Vasiliy Kulikov (1):
      xen: xenfs: privcmd: check put_user() return code

 drivers/xen/xenfs/privcmd.c |    8 ++------
 1 files changed, 2 insertions(+), 6 deletions(-)

Thanks,
	J


diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index e41683c..13cd4eb 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2104,7 +2104,7 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
 {
 	pmd_t *kernel_pmd;
 
-	level2_kernel_pgt = extend_brk(sizeof(pmd_t *) * PTRS_PER_PMD, PAGE_SIZE);
+	level2_kernel_pgt = extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
 
 	max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->pt_base) +
 				  xen_start_info->nr_pt_frames * PAGE_SIZE +
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 8e2c9f21..1f49951 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -84,6 +84,22 @@ static unsigned long __init xen_release_chunk(phys_addr_t start_addr,
 	start = PFN_UP(start_addr);
 	end = PFN_DOWN(end_addr);
 
+	/*
+	 * Domain 0 maintains a 1-1 P2M mapping for the first megabyte
+	 * so do not return such memory to the hypervisor.
+	 *
+	 * This region can contain various firmware tables and the
+	 * like which are often assumed to be always mapped and
+	 * available via phys_to_virt.
+	 */
+	if (xen_initial_domain()) {
+		if (end < PFN_DOWN(ISA_END_ADDRESS))
+			return 0;
+
+		if (start < PFN_DOWN(ISA_END_ADDRESS))
+			start = PFN_DOWN(ISA_END_ADDRESS);
+	}
+
 	if (end <= start)
 		return 0;
 
@@ -163,6 +179,7 @@ char * __init xen_memory_setup(void)
 		XENMEM_memory_map;
 	rc = HYPERVISOR_memory_op(op, &memmap);
 	if (rc == -ENOSYS) {
+		BUG_ON(xen_initial_domain());
 		memmap.nr_entries = 1;
 		map[0].addr = 0ULL;
 		map[0].size = mem_end;
@@ -200,12 +217,16 @@ char * __init xen_memory_setup(void)
 	}
 
 	/*
-	 * Even though this is normal, usable memory under Xen, reserve
-	 * ISA memory anyway because too many things think they can poke
-	 * about in there.
+	 * Even though this is normal, usable memory in a Xen domU,
+	 * reserve ISA memory anyway because too many things think
+	 * they can poke about in there.
+	 *
+	 * In dom0 we use the host e820 and therefore do not need to
+	 * specially reserved anything.
 	 */
-	e820_add_region(ISA_START_ADDRESS, ISA_END_ADDRESS - ISA_START_ADDRESS,
-			E820_RESERVED);
+	if (!xen_initial_domain())
+		e820_add_region(ISA_START_ADDRESS, ISA_END_ADDRESS - ISA_START_ADDRESS,
+				E820_RESERVED);
 
 	/*
 	 * Reserve Xen bits:
diff --git a/drivers/xen/xenfs/privcmd.c b/drivers/xen/xenfs/privcmd.c
index f80be7f..2eb04c8 100644
--- a/drivers/xen/xenfs/privcmd.c
+++ b/drivers/xen/xenfs/privcmd.c
@@ -266,9 +266,7 @@ static int mmap_return_errors(void *data, void *state)
 	xen_pfn_t *mfnp = data;
 	struct mmap_batch_state *st = state;
 
-	put_user(*mfnp, st->user++);
-
-	return 0;
+	return put_user(*mfnp, st->user++);
 }
 
 static struct vm_operations_struct privcmd_vm_ops;
@@ -323,10 +321,8 @@ static long privcmd_ioctl_mmap_batch(void __user *udata)
 	up_write(&mm->mmap_sem);
 
 	if (state.err > 0) {
-		ret = 0;
-
 		state.user = m.arr;
-		traverse_pages(m.num, sizeof(xen_pfn_t),
+		ret = traverse_pages(m.num, sizeof(xen_pfn_t),
 			       &pagelist,
 			       mmap_return_errors, &state);
 	}



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [GIT PULL] Small Xen bugfixes
@ 2010-10-29 18:57 ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-29 18:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Xen-devel, Linux Kernel Mailing List, Vasiliy G Tolstov, Ian Campbell

 Hi Linus,

Here are some small Xen bugfixes:

    * fix dom0 boot on systems whose E820 doesn't completely cover the
      ISA address space.  This fixes a crash on a Dell R310.
    * fix misallocation of initial pagetables on 32-bit systems
      (sizeof(pmd_t) != sizeof(pmd_t *)).  In practice this didn't cause
      a problem because the following allocation was page-aligned so the
      padding was big enough to make up the difference.
    * in xenfs, properly report put_user failures to write back error status

The changes are on two branches:

  git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git upstream/core

Ian Campbell (2):
      xen: do not release any memory under 1M in domain 0
      xen: correct size of level2_kernel_pgt

 arch/x86/xen/mmu.c   |    2 +-
 arch/x86/xen/setup.c |   31 ++++++++++++++++++++++++++-----
 2 files changed, 27 insertions(+), 6 deletions(-)

  git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git upstream/xenfs

Vasiliy Kulikov (1):
      xen: xenfs: privcmd: check put_user() return code

 drivers/xen/xenfs/privcmd.c |    8 ++------
 1 files changed, 2 insertions(+), 6 deletions(-)

Thanks,
	J


diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index e41683c..13cd4eb 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2104,7 +2104,7 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
 {
 	pmd_t *kernel_pmd;
 
-	level2_kernel_pgt = extend_brk(sizeof(pmd_t *) * PTRS_PER_PMD, PAGE_SIZE);
+	level2_kernel_pgt = extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
 
 	max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->pt_base) +
 				  xen_start_info->nr_pt_frames * PAGE_SIZE +
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 8e2c9f21..1f49951 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -84,6 +84,22 @@ static unsigned long __init xen_release_chunk(phys_addr_t start_addr,
 	start = PFN_UP(start_addr);
 	end = PFN_DOWN(end_addr);
 
+	/*
+	 * Domain 0 maintains a 1-1 P2M mapping for the first megabyte
+	 * so do not return such memory to the hypervisor.
+	 *
+	 * This region can contain various firmware tables and the
+	 * like which are often assumed to be always mapped and
+	 * available via phys_to_virt.
+	 */
+	if (xen_initial_domain()) {
+		if (end < PFN_DOWN(ISA_END_ADDRESS))
+			return 0;
+
+		if (start < PFN_DOWN(ISA_END_ADDRESS))
+			start = PFN_DOWN(ISA_END_ADDRESS);
+	}
+
 	if (end <= start)
 		return 0;
 
@@ -163,6 +179,7 @@ char * __init xen_memory_setup(void)
 		XENMEM_memory_map;
 	rc = HYPERVISOR_memory_op(op, &memmap);
 	if (rc == -ENOSYS) {
+		BUG_ON(xen_initial_domain());
 		memmap.nr_entries = 1;
 		map[0].addr = 0ULL;
 		map[0].size = mem_end;
@@ -200,12 +217,16 @@ char * __init xen_memory_setup(void)
 	}
 
 	/*
-	 * Even though this is normal, usable memory under Xen, reserve
-	 * ISA memory anyway because too many things think they can poke
-	 * about in there.
+	 * Even though this is normal, usable memory in a Xen domU,
+	 * reserve ISA memory anyway because too many things think
+	 * they can poke about in there.
+	 *
+	 * In dom0 we use the host e820 and therefore do not need to
+	 * specially reserved anything.
 	 */
-	e820_add_region(ISA_START_ADDRESS, ISA_END_ADDRESS - ISA_START_ADDRESS,
-			E820_RESERVED);
+	if (!xen_initial_domain())
+		e820_add_region(ISA_START_ADDRESS, ISA_END_ADDRESS - ISA_START_ADDRESS,
+				E820_RESERVED);
 
 	/*
 	 * Reserve Xen bits:
diff --git a/drivers/xen/xenfs/privcmd.c b/drivers/xen/xenfs/privcmd.c
index f80be7f..2eb04c8 100644
--- a/drivers/xen/xenfs/privcmd.c
+++ b/drivers/xen/xenfs/privcmd.c
@@ -266,9 +266,7 @@ static int mmap_return_errors(void *data, void *state)
 	xen_pfn_t *mfnp = data;
 	struct mmap_batch_state *st = state;
 
-	put_user(*mfnp, st->user++);
-
-	return 0;
+	return put_user(*mfnp, st->user++);
 }
 
 static struct vm_operations_struct privcmd_vm_ops;
@@ -323,10 +321,8 @@ static long privcmd_ioctl_mmap_batch(void __user *udata)
 	up_write(&mm->mmap_sem);
 
 	if (state.err > 0) {
-		ret = 0;
-
 		state.user = m.arr;
-		traverse_pages(m.num, sizeof(xen_pfn_t),
+		ret = traverse_pages(m.num, sizeof(xen_pfn_t),
 			       &pagelist,
 			       mmap_return_errors, &state);
 	}

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] Small Xen bugfixes
  2010-10-29 18:57 ` Jeremy Fitzhardinge
  (?)
@ 2010-10-29 19:08 ` Linus Torvalds
  2010-10-29 19:20     ` Jeremy Fitzhardinge
  -1 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2010-10-29 19:08 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Xen-devel, Linux Kernel Mailing List, Ian Campbell, Vasiliy G Tolstov

On Fri, Oct 29, 2010 at 11:57 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>    * fix dom0 boot on systems whose E820 doesn't completely cover the
>      ISA address space.  This fixes a crash on a Dell R310.

Hmm. This clashes with my current tree.

And that conflict is trivial to fix up, but the thing is, I think the
patch that comes from your tree is worse than what is already there.

Why is that simple unconditional

    e820_add_region(ISA_START_ADDRESS, ISA_END_ADDRESS - ISA_START_ADDRESS,
           E820_RESERVED);

not just always the right thing? Why do you have a separate hack for
dom0 in xen_release_chunk() instead? That just looks bogus.

The normal logic we use on PC's is to just always reserve the low 64kB
of memory, and the whole ISA space. Why doesn't Xen just do the same?

                          Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] Small Xen bugfixes
  2010-10-29 19:08 ` Linus Torvalds
@ 2010-10-29 19:20     ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-29 19:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Xen-devel, Linux Kernel Mailing List, Ian Campbell, Vasiliy G Tolstov

 On 10/29/2010 12:08 PM, Linus Torvalds wrote:
> On Fri, Oct 29, 2010 at 11:57 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>>    * fix dom0 boot on systems whose E820 doesn't completely cover the
>>      ISA address space.  This fixes a crash on a Dell R310.
> Hmm. This clashes with my current tree.

Bugger, so it does.  I just did a test merge with no complaint though;
what happened?

I'll redo the patch anyway to fix the below.

> And that conflict is trivial to fix up, but the thing is, I think the
> patch that comes from your tree is worse than what is already there.
>
> Why is that simple unconditional
>
>     e820_add_region(ISA_START_ADDRESS, ISA_END_ADDRESS - ISA_START_ADDRESS,
>            E820_RESERVED);
>
> not just always the right thing? Why do you have a separate hack for
> dom0 in xen_release_chunk() instead? That just looks bogus.

Yes, we actually had this discussion.  I was for making the
e820_add_region unconditional, and Ian's counter was that it could be
done in the common code rather than Xen-specific.

> The normal logic we use on PC's is to just always reserve the low 64kB
> of memory, and the whole ISA space. Why doesn't Xen just do the same?

The specific issue is that the Xen domain returns any memory that's not
covered by an E820 entry back to Xen, mostly to make sure that memory
isn't wasted by being shadowed by PCI devices.  But it was also doing
this in the sub-1M region, which on all the machines I've tested on is
completely covered.  But on a Dell R310 there's a little 2-page gap
where some ACPI stuff is lurking, that was being released back to Xen so
it couldn't be accessed from Linux any more.

The fix is to just make sure the whole low region is covered (or at
least the 640k-1M space).

I'll rework the patch.

Thanks,
    J


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] Small Xen bugfixes
@ 2010-10-29 19:20     ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-29 19:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Xen-devel, Linux Kernel Mailing List, Vasiliy G Tolstov, Ian Campbell

 On 10/29/2010 12:08 PM, Linus Torvalds wrote:
> On Fri, Oct 29, 2010 at 11:57 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>>    * fix dom0 boot on systems whose E820 doesn't completely cover the
>>      ISA address space.  This fixes a crash on a Dell R310.
> Hmm. This clashes with my current tree.

Bugger, so it does.  I just did a test merge with no complaint though;
what happened?

I'll redo the patch anyway to fix the below.

> And that conflict is trivial to fix up, but the thing is, I think the
> patch that comes from your tree is worse than what is already there.
>
> Why is that simple unconditional
>
>     e820_add_region(ISA_START_ADDRESS, ISA_END_ADDRESS - ISA_START_ADDRESS,
>            E820_RESERVED);
>
> not just always the right thing? Why do you have a separate hack for
> dom0 in xen_release_chunk() instead? That just looks bogus.

Yes, we actually had this discussion.  I was for making the
e820_add_region unconditional, and Ian's counter was that it could be
done in the common code rather than Xen-specific.

> The normal logic we use on PC's is to just always reserve the low 64kB
> of memory, and the whole ISA space. Why doesn't Xen just do the same?

The specific issue is that the Xen domain returns any memory that's not
covered by an E820 entry back to Xen, mostly to make sure that memory
isn't wasted by being shadowed by PCI devices.  But it was also doing
this in the sub-1M region, which on all the machines I've tested on is
completely covered.  But on a Dell R310 there's a little 2-page gap
where some ACPI stuff is lurking, that was being released back to Xen so
it couldn't be accessed from Linux any more.

The fix is to just make sure the whole low region is covered (or at
least the 640k-1M space).

I'll rework the patch.

Thanks,
    J

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] Small Xen bugfixes
  2010-10-29 19:20     ` Jeremy Fitzhardinge
  (?)
@ 2010-10-29 20:06     ` Jeremy Fitzhardinge
  2010-10-31  9:13         ` Ian Campbell
  -1 siblings, 1 reply; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-29 20:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Xen-devel, Linux Kernel Mailing List, Ian Campbell, Vasiliy G Tolstov

 On 10/29/2010 12:20 PM, Jeremy Fitzhardinge wrote:
>  On 10/29/2010 12:08 PM, Linus Torvalds wrote:
>> On Fri, Oct 29, 2010 at 11:57 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>>>    * fix dom0 boot on systems whose E820 doesn't completely cover the
>>>      ISA address space.  This fixes a crash on a Dell R310.
>> Hmm. This clashes with my current tree.
> Bugger, so it does.  I just did a test merge with no complaint though;
> what happened?
>
> I'll redo the patch anyway to fix the below.
>
>> And that conflict is trivial to fix up, but the thing is, I think the
>> patch that comes from your tree is worse than what is already there.
>>
>> Why is that simple unconditional
>>
>>     e820_add_region(ISA_START_ADDRESS, ISA_END_ADDRESS - ISA_START_ADDRESS,
>>            E820_RESERVED);
>>
>> not just always the right thing? Why do you have a separate hack for
>> dom0 in xen_release_chunk() instead? That just looks bogus.
> Yes, we actually had this discussion.  I was for making the
> e820_add_region unconditional, and Ian's counter was that it could be
> done in the common code rather than Xen-specific.
>
>> The normal logic we use on PC's is to just always reserve the low 64kB
>> of memory, and the whole ISA space. Why doesn't Xen just do the same?
> The specific issue is that the Xen domain returns any memory that's not
> covered by an E820 entry back to Xen, mostly to make sure that memory
> isn't wasted by being shadowed by PCI devices.  But it was also doing
> this in the sub-1M region, which on all the machines I've tested on is
> completely covered.  But on a Dell R310 there's a little 2-page gap
> where some ACPI stuff is lurking, that was being released back to Xen so
> it couldn't be accessed from Linux any more.
>
> The fix is to just make sure the whole low region is covered (or at
> least the 640k-1M space).

Hm, I see.  This Dell machine stashes the MPS table in 2 pages just
*below* 640k, so the ISA_START_ADDRESS-ISA_END_ADDRESS reserved range
doesn't cover it.

There's three ways to fix this:

    * not free memory below 1M (Ian's current patch)
    * fill any E820 gaps below 1M
    * reserve all memory below 1M

The 3rd is certainly simplest, at the cost of wasting a trivial amount
of memory.  Unfortunately it crashes early.  Sigh, will try and sort it
out this afternoon.

    J

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] Small Xen bugfixes
  2010-10-29 20:06     ` Jeremy Fitzhardinge
@ 2010-10-31  9:13         ` Ian Campbell
  0 siblings, 0 replies; 12+ messages in thread
From: Ian Campbell @ 2010-10-31  9:13 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Linus Torvalds, Xen-devel, Linux Kernel Mailing List, Vasiliy G Tolstov

On Fri, 2010-10-29 at 21:06 +0100, Jeremy Fitzhardinge wrote:
> On 10/29/2010 12:20 PM, Jeremy Fitzhardinge wrote:
> >  On 10/29/2010 12:08 PM, Linus Torvalds wrote:
> >> On Fri, Oct 29, 2010 at 11:57 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> >>>    * fix dom0 boot on systems whose E820 doesn't completely cover the
> >>>      ISA address space.  This fixes a crash on a Dell R310.
> >> Hmm. This clashes with my current tree.
> > Bugger, so it does.  I just did a test merge with no complaint though;
> > what happened?
> >
> > I'll redo the patch anyway to fix the below.
> >
> >> And that conflict is trivial to fix up, but the thing is, I think the
> >> patch that comes from your tree is worse than what is already there.
> >>
> >> Why is that simple unconditional
> >>
> >>     e820_add_region(ISA_START_ADDRESS, ISA_END_ADDRESS - ISA_START_ADDRESS,
> >>            E820_RESERVED);
> >>
> >> not just always the right thing? Why do you have a separate hack for
> >> dom0 in xen_release_chunk() instead? That just looks bogus.
> > Yes, we actually had this discussion.  I was for making the
> > e820_add_region unconditional, and Ian's counter was that it could be
> > done in the common code rather than Xen-specific.
> >
> >> The normal logic we use on PC's is to just always reserve the low 64kB
> >> of memory, and the whole ISA space. Why doesn't Xen just do the same?
> > The specific issue is that the Xen domain returns any memory that's not
> > covered by an E820 entry back to Xen, mostly to make sure that memory
> > isn't wasted by being shadowed by PCI devices.  But it was also doing
> > this in the sub-1M region, which on all the machines I've tested on is
> > completely covered.  But on a Dell R310 there's a little 2-page gap
> > where some ACPI stuff is lurking, that was being released back to Xen so
> > it couldn't be accessed from Linux any more.
> >
> > The fix is to just make sure the whole low region is covered (or at
> > least the 640k-1M space).
> 
> Hm, I see.  This Dell machine stashes the MPS table in 2 pages just
> *below* 640k, so the ISA_START_ADDRESS-ISA_END_ADDRESS reserved range
> doesn't cover it.

Yes, what the machine has is:
        (XEN)  0000000000000000 - 000000000009e000 (usable)
        (XEN)  0000000000100000 - 00000000bf699000 (usable)
which after reserving the 640k-1M range shows up in dom0 as:
        BIOS-provided physical RAM map:
         Xen: 0000000000000000 - 000000000009e000 (usable)
         Xen: 00000000000a0000 - 0000000000100000 (reserved)
         Xen: 0000000000100000 - 0000000020000000 (usable)
which has a little 2 page hole between 9e000-a0000 which
xen_release_chunk dutifully punches out as a hole in both the virtual
and physical address space.

> There's three ways to fix this:
> 
>     * not free memory below 1M (Ian's current patch)
>     * fill any E820 gaps below 1M
>     * reserve all memory below 1M

The second two basically indirectly implement the first under Xen. They
also seem like things which if they are correct to do in Xen dom0 they
would also be correct on native too and therefore belong in
sanitize_e820 or somewhere like that.

The actual memory which is marked reserved (or just unmentioned) doesn't
really matter much here since the code which goes poking around in this
stuff doesn't check if it is reserved or not (and it shouldn't since we
know the BIOSes can/will get this stuff wrong).

> The 3rd is certainly simplest, at the cost of wasting a trivial amount
> of memory.

Doesn't Linux avoid using the lowest 1M anyway? (obviously apart from
the start of day probing for firmware tables etc).

>   Unfortunately it crashes early.  Sigh, will try and sort it
> out this afternoon.

Strange!



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] Small Xen bugfixes
@ 2010-10-31  9:13         ` Ian Campbell
  0 siblings, 0 replies; 12+ messages in thread
From: Ian Campbell @ 2010-10-31  9:13 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Xen-devel, Linus Torvalds, Linux Kernel Mailing List, Vasiliy G Tolstov

On Fri, 2010-10-29 at 21:06 +0100, Jeremy Fitzhardinge wrote:
> On 10/29/2010 12:20 PM, Jeremy Fitzhardinge wrote:
> >  On 10/29/2010 12:08 PM, Linus Torvalds wrote:
> >> On Fri, Oct 29, 2010 at 11:57 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> >>>    * fix dom0 boot on systems whose E820 doesn't completely cover the
> >>>      ISA address space.  This fixes a crash on a Dell R310.
> >> Hmm. This clashes with my current tree.
> > Bugger, so it does.  I just did a test merge with no complaint though;
> > what happened?
> >
> > I'll redo the patch anyway to fix the below.
> >
> >> And that conflict is trivial to fix up, but the thing is, I think the
> >> patch that comes from your tree is worse than what is already there.
> >>
> >> Why is that simple unconditional
> >>
> >>     e820_add_region(ISA_START_ADDRESS, ISA_END_ADDRESS - ISA_START_ADDRESS,
> >>            E820_RESERVED);
> >>
> >> not just always the right thing? Why do you have a separate hack for
> >> dom0 in xen_release_chunk() instead? That just looks bogus.
> > Yes, we actually had this discussion.  I was for making the
> > e820_add_region unconditional, and Ian's counter was that it could be
> > done in the common code rather than Xen-specific.
> >
> >> The normal logic we use on PC's is to just always reserve the low 64kB
> >> of memory, and the whole ISA space. Why doesn't Xen just do the same?
> > The specific issue is that the Xen domain returns any memory that's not
> > covered by an E820 entry back to Xen, mostly to make sure that memory
> > isn't wasted by being shadowed by PCI devices.  But it was also doing
> > this in the sub-1M region, which on all the machines I've tested on is
> > completely covered.  But on a Dell R310 there's a little 2-page gap
> > where some ACPI stuff is lurking, that was being released back to Xen so
> > it couldn't be accessed from Linux any more.
> >
> > The fix is to just make sure the whole low region is covered (or at
> > least the 640k-1M space).
> 
> Hm, I see.  This Dell machine stashes the MPS table in 2 pages just
> *below* 640k, so the ISA_START_ADDRESS-ISA_END_ADDRESS reserved range
> doesn't cover it.

Yes, what the machine has is:
        (XEN)  0000000000000000 - 000000000009e000 (usable)
        (XEN)  0000000000100000 - 00000000bf699000 (usable)
which after reserving the 640k-1M range shows up in dom0 as:
        BIOS-provided physical RAM map:
         Xen: 0000000000000000 - 000000000009e000 (usable)
         Xen: 00000000000a0000 - 0000000000100000 (reserved)
         Xen: 0000000000100000 - 0000000020000000 (usable)
which has a little 2 page hole between 9e000-a0000 which
xen_release_chunk dutifully punches out as a hole in both the virtual
and physical address space.

> There's three ways to fix this:
> 
>     * not free memory below 1M (Ian's current patch)
>     * fill any E820 gaps below 1M
>     * reserve all memory below 1M

The second two basically indirectly implement the first under Xen. They
also seem like things which if they are correct to do in Xen dom0 they
would also be correct on native too and therefore belong in
sanitize_e820 or somewhere like that.

The actual memory which is marked reserved (or just unmentioned) doesn't
really matter much here since the code which goes poking around in this
stuff doesn't check if it is reserved or not (and it shouldn't since we
know the BIOSes can/will get this stuff wrong).

> The 3rd is certainly simplest, at the cost of wasting a trivial amount
> of memory.

Doesn't Linux avoid using the lowest 1M anyway? (obviously apart from
the start of day probing for firmware tables etc).

>   Unfortunately it crashes early.  Sigh, will try and sort it
> out this afternoon.

Strange!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] Small Xen bugfixes
  2010-10-31  9:13         ` Ian Campbell
@ 2010-10-31 16:28           ` Jeremy Fitzhardinge
  -1 siblings, 0 replies; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-31 16:28 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Linus Torvalds, Xen-devel, Linux Kernel Mailing List, Vasiliy G Tolstov

 On 10/31/2010 02:13 AM, Ian Campbell wrote:
>> The 3rd is certainly simplest, at the cost of wasting a trivial amount
>> of memory.
> Doesn't Linux avoid using the lowest 1M anyway? (obviously apart from
> the start of day probing for firmware tables etc).

No, it tries to use most of it I think.  It will tend to avoid the low
64k (maybe more) to avoid BIOS bugs.

>>   Unfortunately it crashes early.  Sigh, will try and sort it
>> out this afternoon.
> Strange!

I didn't get a chance to poke at it again, but in retrospect, I think
there are various "must succeed" allocations in low memory.  We don't
need those allocations (things like AP boot trampoline, etc), but we
don't bother to stub them out or prevent them from happening.  Reducing
the system to one with *no* allocatable memory below 1M is just too
strange, and would be a continuous source of problems in the future.

Of the other two options, I think your original approach is going to be
simplest.  E820 gap filling wouldn't be too bad, but we'd end up having
to add a bit of gap-tracking logic to the E820 loop which isn't
currently there.  Ignoring sub-1M gaps is simpler (and it needn't be
conditional on xen_initial_domain(), because we would never expect to
see anything strange sub-1M in a domU, and if there is, we should still
be careful of it in case something odd is going on).

    J

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] Small Xen bugfixes
@ 2010-10-31 16:28           ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2010-10-31 16:28 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Xen-devel, Linus Torvalds, Linux Kernel Mailing List, Vasiliy G Tolstov

 On 10/31/2010 02:13 AM, Ian Campbell wrote:
>> The 3rd is certainly simplest, at the cost of wasting a trivial amount
>> of memory.
> Doesn't Linux avoid using the lowest 1M anyway? (obviously apart from
> the start of day probing for firmware tables etc).

No, it tries to use most of it I think.  It will tend to avoid the low
64k (maybe more) to avoid BIOS bugs.

>>   Unfortunately it crashes early.  Sigh, will try and sort it
>> out this afternoon.
> Strange!

I didn't get a chance to poke at it again, but in retrospect, I think
there are various "must succeed" allocations in low memory.  We don't
need those allocations (things like AP boot trampoline, etc), but we
don't bother to stub them out or prevent them from happening.  Reducing
the system to one with *no* allocatable memory below 1M is just too
strange, and would be a continuous source of problems in the future.

Of the other two options, I think your original approach is going to be
simplest.  E820 gap filling wouldn't be too bad, but we'd end up having
to add a bit of gap-tracking logic to the E820 loop which isn't
currently there.  Ignoring sub-1M gaps is simpler (and it needn't be
conditional on xen_initial_domain(), because we would never expect to
see anything strange sub-1M in a domU, and if there is, we should still
be careful of it in case something odd is going on).

    J

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] Small Xen bugfixes
  2010-10-31 16:28           ` Jeremy Fitzhardinge
@ 2010-11-01 16:36             ` Ian Campbell
  -1 siblings, 0 replies; 12+ messages in thread
From: Ian Campbell @ 2010-11-01 16:36 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Linus Torvalds, Xen-devel, Linux Kernel Mailing List, Vasiliy G Tolstov

On Sun, 2010-10-31 at 16:28 +0000, Jeremy Fitzhardinge wrote:
> On 10/31/2010 02:13 AM, Ian Campbell wrote:
> >> The 3rd is certainly simplest, at the cost of wasting a trivial amount
> >> of memory.
> > Doesn't Linux avoid using the lowest 1M anyway? (obviously apart from
> > the start of day probing for firmware tables etc).
> 
> No, it tries to use most of it I think.  It will tend to avoid the low
> 64k (maybe more) to avoid BIOS bugs.

It'll be interesting to see what effect Vista's avoidance of the whole
region (so I hear) has on BIOS vendors... (I think we can all guess)

> >>   Unfortunately it crashes early.  Sigh, will try and sort it
> >> out this afternoon.
> > Strange!
> 
> I didn't get a chance to poke at it again, but in retrospect, I think
> there are various "must succeed" allocations in low memory.  We don't
> need those allocations (things like AP boot trampoline, etc), but we
> don't bother to stub them out or prevent them from happening.  Reducing
> the system to one with *no* allocatable memory below 1M is just too
> strange, and would be a continuous source of problems in the future.

Agreed, we should try and mimic native as far as possible in this regard
or I fear we will see a never ending stream of little quirks and
oddities related to this sort of thing.

> Of the other two options, I think your original approach is going to be
> simplest.  E820 gap filling wouldn't be too bad, but we'd end up having
> to add a bit of gap-tracking logic to the E820 loop which isn't
> currently there.

It would also make us susceptible to perhaps being a bit fragile in the
face of unexpectedly insane e820s coming from the BIOS.

> Ignoring sub-1M gaps is simpler (and it needn't be
> conditional on xen_initial_domain(), because we would never expect to
> see anything strange sub-1M in a domU, and if there is, we should still
> be careful of it in case something odd is going on).

Absolutely.

I wonder if we shouldn't also do the following (note: untested). Since
Xen avoids using the sub-1M region for anything I think it is reasonable
to give the whole lot over to domain 0 for the purposes of finding
firmware table stashed in odd locations etc.

Ian.

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index ebb74ec..ab086e5 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2321,7 +2321,7 @@ __init void xen_ident_map_ISA(void)
 
 	xen_raw_printk("Xen: setup ISA identity maps\n");
 
-	for (pa = ISA_START_ADDRESS; pa < ISA_END_ADDRESS; pa += PAGE_SIZE) {
+	for (pa = 0; pa < ISA_END_ADDRESS; pa += PAGE_SIZE) {
 		pte_t pte = mfn_pte(PFN_DOWN(pa), PAGE_KERNEL_IO);
 
 		if (HYPERVISOR_update_va_mapping(PAGE_OFFSET + pa, pte, 0))



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] Small Xen bugfixes
@ 2010-11-01 16:36             ` Ian Campbell
  0 siblings, 0 replies; 12+ messages in thread
From: Ian Campbell @ 2010-11-01 16:36 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Xen-devel, Linus Torvalds, Linux Kernel Mailing List, Vasiliy G Tolstov

On Sun, 2010-10-31 at 16:28 +0000, Jeremy Fitzhardinge wrote:
> On 10/31/2010 02:13 AM, Ian Campbell wrote:
> >> The 3rd is certainly simplest, at the cost of wasting a trivial amount
> >> of memory.
> > Doesn't Linux avoid using the lowest 1M anyway? (obviously apart from
> > the start of day probing for firmware tables etc).
> 
> No, it tries to use most of it I think.  It will tend to avoid the low
> 64k (maybe more) to avoid BIOS bugs.

It'll be interesting to see what effect Vista's avoidance of the whole
region (so I hear) has on BIOS vendors... (I think we can all guess)

> >>   Unfortunately it crashes early.  Sigh, will try and sort it
> >> out this afternoon.
> > Strange!
> 
> I didn't get a chance to poke at it again, but in retrospect, I think
> there are various "must succeed" allocations in low memory.  We don't
> need those allocations (things like AP boot trampoline, etc), but we
> don't bother to stub them out or prevent them from happening.  Reducing
> the system to one with *no* allocatable memory below 1M is just too
> strange, and would be a continuous source of problems in the future.

Agreed, we should try and mimic native as far as possible in this regard
or I fear we will see a never ending stream of little quirks and
oddities related to this sort of thing.

> Of the other two options, I think your original approach is going to be
> simplest.  E820 gap filling wouldn't be too bad, but we'd end up having
> to add a bit of gap-tracking logic to the E820 loop which isn't
> currently there.

It would also make us susceptible to perhaps being a bit fragile in the
face of unexpectedly insane e820s coming from the BIOS.

> Ignoring sub-1M gaps is simpler (and it needn't be
> conditional on xen_initial_domain(), because we would never expect to
> see anything strange sub-1M in a domU, and if there is, we should still
> be careful of it in case something odd is going on).

Absolutely.

I wonder if we shouldn't also do the following (note: untested). Since
Xen avoids using the sub-1M region for anything I think it is reasonable
to give the whole lot over to domain 0 for the purposes of finding
firmware table stashed in odd locations etc.

Ian.

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index ebb74ec..ab086e5 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2321,7 +2321,7 @@ __init void xen_ident_map_ISA(void)
 
 	xen_raw_printk("Xen: setup ISA identity maps\n");
 
-	for (pa = ISA_START_ADDRESS; pa < ISA_END_ADDRESS; pa += PAGE_SIZE) {
+	for (pa = 0; pa < ISA_END_ADDRESS; pa += PAGE_SIZE) {
 		pte_t pte = mfn_pte(PFN_DOWN(pa), PAGE_KERNEL_IO);
 
 		if (HYPERVISOR_update_va_mapping(PAGE_OFFSET + pa, pte, 0))

^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2010-11-01 16:36 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-29 18:57 [GIT PULL] Small Xen bugfixes Jeremy Fitzhardinge
2010-10-29 18:57 ` Jeremy Fitzhardinge
2010-10-29 19:08 ` Linus Torvalds
2010-10-29 19:20   ` Jeremy Fitzhardinge
2010-10-29 19:20     ` Jeremy Fitzhardinge
2010-10-29 20:06     ` Jeremy Fitzhardinge
2010-10-31  9:13       ` Ian Campbell
2010-10-31  9:13         ` Ian Campbell
2010-10-31 16:28         ` Jeremy Fitzhardinge
2010-10-31 16:28           ` Jeremy Fitzhardinge
2010-11-01 16:36           ` Ian Campbell
2010-11-01 16:36             ` Ian Campbell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.