* [RFC] fix xen_in_range()
@ 2009-04-22 23:53 Cihula, Joseph
  2009-04-23  7:25 ` Keir Fraser
  0 siblings, 1 reply; 9+ messages in thread
From: Cihula, Joseph @ 2009-04-22 23:53 UTC (permalink / raw)
  To: xen-devel, Keir Fraser
  Cc: Cui, Dexuan, Wang, Shane, Jiang, Yunhong, Yang, Xiaowei, Ke,
	Liping, Li, Xin

The frametable check in xen_in_range() incorrectly compares virtual addresses against the physical addresses passed in as parameters.
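
For reference, xen_in_range() boils down to a range-overlap test of its paddr_t arguments against a small table of Xen-owned regions; the loop below is a rough sketch from memory rather than the exact in-tree code:

/* Rough sketch of the overlap loop; start/end are the paddr_t arguments. */
for ( i = 0; i < ARRAY_SIZE(xen_regions); i++ )
{
    if ( (start < xen_regions[i].e) && (end > xen_regions[i].s) )
        return 1;
}
return 0;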

Unfortunately, the frametable is only contiguous in the virtual address space, so one can't simply take __pa() of its start and end.  And since it is quite large, iterating through each page to get its phys addr adds a perceptible delay when that check has to be done for each page of physical memory (as is the case in the only caller, the VT-d routine that maps memory for dom0).  But it also appears that we can't convert the phys addr arguments into their virt addrs to compare with the contiguous frametable range, because they will convert to the DIRECTMAP va's instead.

While I would prefer to find a way to keep the check against the frametable, I don't see any obvious way to do so.  This RFC, then, is to see if there are any ideas for how the test could be rewritten.

If not, then the below patch should be applied to remove the frametable check altogether.

Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>

diff -r 655dc3bc1d8e xen/arch/x86/setup.c
--- a/xen/arch/x86/setup.c      Thu Apr 16 11:54:06 2009 +0100
+++ b/xen/arch/x86/setup.c      Wed Apr 22 15:23:46 2009 -0700
@@ -1119,7 +1119,7 @@ int xen_in_range(paddr_t start, paddr_t
     int i;
     static struct {
         paddr_t s, e;
-    } xen_regions[5];
+    } xen_regions[4];

     /* initialize first time */
     if ( !xen_regions[0].s )
@@ -1140,10 +1140,6 @@ int xen_in_range(paddr_t start, paddr_t
         /* bss + boot allocator bitmap */
         xen_regions[3].s = __pa(&__bss_start);
         xen_regions[3].e = allocator_bitmap_end;
-        /* frametable */
-        xen_regions[4].s = (unsigned long)frame_table;
-        xen_regions[4].e = (unsigned long)frame_table +
-                           PFN_UP(max_page * sizeof(*frame_table));
     }

     for ( i = 0; i < ARRAY_SIZE(xen_regions); i++ )


* Re: [RFC] fix xen_in_range()
  2009-04-22 23:53 [RFC] fix xen_in_range() Cihula, Joseph
@ 2009-04-23  7:25 ` Keir Fraser
  2009-04-23  7:58   ` Jan Beulich
  2009-04-24  1:26   ` Cihula, Joseph
  0 siblings, 2 replies; 9+ messages in thread
From: Keir Fraser @ 2009-04-23  7:25 UTC (permalink / raw)
  To: Cihula, Joseph, xen-devel
  Cc: Cui, Dexuan, Wang, Shane, Jiang, Yunhong, Yang, Xiaowei, Ke,
	Liping, Li, Xin

On 23/04/2009 00:53, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:

> Unfortunately, the frametable is only contiguous in the virtual address space,
> so one can't simply take __pa() of its start and end.  And since it is quite
> large, iterating through each page to gets its phys addr adds a perceptible
> delay when that check has to be done for each page of physical memory (as is
> the case in the only caller, the VT-d routine that maps memory for dom0).  But
> it also appears that we can't convert the phys addr arguments into their virt
> addrs to compare with the contiguous frametable range because they will
> convert to the DIRECTMAP va's instead.

The frametable is allocated in aligned 2MB chunks. So you can check at that
granularity rather than 4kB.
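
Something along these lines, as an untested illustration (frametable_chunk_maddr() is a made-up helper that would have to return the machine address backing a frametable chunk, e.g. by consulting the page tables, since __pa() does not cover that mapping):

/* Untested sketch: check [start, end) for overlap with the frametable,
 * stepping through it in the 2MB-aligned chunks it is allocated in.
 * frametable_chunk_maddr() is hypothetical. */
static int frametable_in_range(paddr_t start, paddr_t end)
{
    unsigned long va = (unsigned long)frame_table;
    unsigned long va_end = va + max_page * sizeof(*frame_table);

    for ( ; va < va_end; va += (1UL << L2_PAGETABLE_SHIFT) )
    {
        paddr_t s = frametable_chunk_maddr(va);   /* hypothetical helper */
        paddr_t e = s + (1UL << L2_PAGETABLE_SHIFT);

        if ( (start < e) && (end > s) )
            return 1;
    }
    return 0;
}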

 -- Keir


* Re: [RFC] fix xen_in_range()
  2009-04-23  7:25 ` Keir Fraser
@ 2009-04-23  7:58   ` Jan Beulich
  2009-04-24  1:26   ` Cihula, Joseph
  1 sibling, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2009-04-23  7:58 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Yunhong Jiang, Dexuan Cui, Shane Wang, Joseph Cihula,
	Xiaowei Yang, Liping Ke, xen-devel, Xin Li

>>> Keir Fraser <keir.fraser@eu.citrix.com> 23.04.09 09:25 >>>
>On 23/04/2009 00:53, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:
>
>> Unfortunately, the frametable is only contiguous in the virtual address space,
>> so one can't simply take __pa() of its start and end.  And since it is quite
>> large, iterating through each page to gets its phys addr adds a perceptible
>> delay when that check has to be done for each page of physical memory (as is
>> the case in the only caller, the VT-d routine that maps memory for dom0).  But
>> it also appears that we can't convert the phys addr arguments into their virt
>> addrs to compare with the contiguous frametable range because they will
>> convert to the DIRECTMAP va's instead.
>
>The frametable is allocated in aligned 2MB chunks. So you can check at that
>granularity rather than 4kB.

... and perhaps allocation should be attempted in 1Gb chunks when the table
size is getting close to or exceeding 1Gb (and 1Gb-pages are supported). Or,
since the space mapped is larger than the space allocated anyway, the
condition might be just that of 1Gb-pages being supported (provided a 1Gb-
aligned chunk can be allocated).

Jan


* RE: [RFC] fix xen_in_range()
  2009-04-23  7:25 ` Keir Fraser
  2009-04-23  7:58   ` Jan Beulich
@ 2009-04-24  1:26   ` Cihula, Joseph
  2009-04-24  7:04     ` Jan Beulich
  1 sibling, 1 reply; 9+ messages in thread
From: Cihula, Joseph @ 2009-04-24  1:26 UTC (permalink / raw)
  To: Keir Fraser, xen-devel
  Cc: Cui, Dexuan, Wang, Shane, Jiang, Yunhong, Yang, Xiaowei, Ke,
	Liping, Li, Xin

> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Thursday, April 23, 2009 12:25 AM
>
> On 23/04/2009 00:53, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:
>
> > Unfortunately, the frametable is only contiguous in the virtual address space,
> > so one can't simply take __pa() of its start and end.  And since it is quite
> > large, iterating through each page to gets its phys addr adds a perceptible
> > delay when that check has to be done for each page of physical memory (as is
> > the case in the only caller, the VT-d routine that maps memory for dom0).  But
> > it also appears that we can't convert the phys addr arguments into their virt
> > addrs to compare with the contiguous frametable range because they will
> > convert to the DIRECTMAP va's instead.
>
> The frametable is allocated in aligned 2MB chunks. So you can check at that
> granularity rather than 4kB.

That made it just a single iteration on a 2GB system, but what function should be used to convert the va to pa?  __pa() isn't converting this correctly.

Joe


* RE: [RFC] fix xen_in_range()
  2009-04-24  1:26   ` Cihula, Joseph
@ 2009-04-24  7:04     ` Jan Beulich
  2009-04-24  7:16       ` Keir Fraser
  2009-04-24 23:29       ` RE: [RFC] fix xen_in_range() Cihula, Joseph
  0 siblings, 2 replies; 9+ messages in thread
From: Jan Beulich @ 2009-04-24  7:04 UTC (permalink / raw)
  To: Keir Fraser, Joseph Cihula, xen-devel
  Cc: Dexuan Cui, Shane Wang, Yunhong Jiang, Xiaowei Yang, Liping Ke, Xin Li

>>> "Cihula, Joseph" <joseph.cihula@intel.com> 24.04.09 03:26 >>>
>> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com] 
>> Sent: Thursday, April 23, 2009 12:25 AM
>>
>> On 23/04/2009 00:53, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:
>>
>> > Unfortunately, the frametable is only contiguous in the virtual address space,
>> > so one can't simply take __pa() of its start and end.  And since it is quite
>> > large, iterating through each page to gets its phys addr adds a perceptible
>> > delay when that check has to be done for each page of physical memory (as is
>> > the case in the only caller, the VT-d routine that maps memory for dom0).  But
>> > it also appears that we can't convert the phys addr arguments into their virt
>> > addrs to compare with the contiguous frametable range because they will
>> > convert to the DIRECTMAP va's instead.
>>
>> The frametable is allocated in aligned 2MB chunks. So you can check at that
>> granularity rather than 4kB.
>
>That made it just a single iteration on a 2GB system, but what fn should be used
>to convert the va to pa?  __pa() isn't converting this correctly.

I'm afraid you'll have to either create one, or you'll have to consult the page
tables. Also, how can this be a single iteration on a 2Gb system? struct
page_info is 32 bytes, so I'd expect the frame table to be 16Mb in size (i.e.
eight iterations).
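
Spelling out the arithmetic, under the same assumptions of 4kB pages and a 32-byte struct page_info:

/* 2GB of RAM at 4kB per page and 32 bytes per struct page_info: */
unsigned long nr_frames  = (2UL << 30) >> PAGE_SHIFT; /* 524288 frames      */
unsigned long table_size = nr_frames * 32;            /* 16MB of frametable */
unsigned long iterations = table_size >> 21;          /* 8 chunks of 2MB    */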

Also, after suggesting the use of gb-pages when possible here, I realized that
it's probably a latent bug to map more space than was allocated - if the
non-allocated-but-mapped pages happen to later get allocated to a domain,
that domain may change the cacheability attributes of any of these pages,
resulting in aliasing issues. I'll put together a patch for this, but it'll be a
couple of days before I'm able to do so.

Jan


* Re: RE: [RFC] fix xen_in_range()
  2009-04-24  7:04     ` Jan Beulich
@ 2009-04-24  7:16       ` Keir Fraser
  2009-04-24 23:14         ` Qing He
  2009-04-24 23:20         ` [PATCH] iommu: fix unused percpu in xen_in_range() Qing He
  2009-04-24 23:29       ` RE: [RFC] fix xen_in_range() Cihula, Joseph
  1 sibling, 2 replies; 9+ messages in thread
From: Keir Fraser @ 2009-04-24  7:16 UTC (permalink / raw)
  To: Jan Beulich, Joseph Cihula, xen-devel
  Cc: Dexuan Cui, Shane Wang, Yunhong Jiang, Xiaowei Yang, Liping Ke, Xin Li

On 24/04/2009 08:04, "Jan Beulich" <jbeulich@novell.com> wrote:

> Also, after suggesting to use gb-pages when possible here I realized that
> it's probably a latent bug to map more space than was allocated - if the
> non-allocated-but-mapped pages happen to later get allocated to a domain,
> that domain may change the cacheability attributes of any of these pages,
> resulting in aliasing issues. I'll put together a patch for this, but it'll be
> a couple of days until I'll be able to do so.

I think we should shatter the superpage on demand. This would also be
required for superpage mappings of Xen itself: when we free initmem, that
memory can now be allocated to a domain (now that xenheap and domheap are merged
on x86/64).

An alternative might be to mark such partially-freed superpages as
Xenheap-only, and allocate them preferentially for Xenheap callers (i.e.,
alloc those pages first, then from the general heap).

 -- Keir


* Re: RE: [RFC] fix xen_in_range()
  2009-04-24  7:16       ` Keir Fraser
@ 2009-04-24 23:14         ` Qing He
  2009-04-24 23:20         ` [PATCH] iommu: fix unused percpu in xen_in_range() Qing He
  1 sibling, 0 replies; 9+ messages in thread
From: Qing He @ 2009-04-24 23:14 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Cihula, Joseph, Cui, Dexuan, Wang, Shane, Jiang, Yunhong, Yang,
	Xiaowei, Ke, Liping, xen-devel, Li, Xin

On Fri, 2009-04-24 at 15:16 +0800, Keir Fraser wrote:
> On 24/04/2009 08:04, "Jan Beulich" <jbeulich@novell.com> wrote:
> 
> > Also, after suggesting to use gb-pages when possible here I realized that
> > it's probably a latent bug to map more space than was allocated - if the
> > non-allocated-but-mapped pages happen to later get allocated to a domain,
> > that domain may change the cacheability attributes of any of these pages,
> > resulting in aliasing issues. I'll put together a patch for this, but it'll be
> > a couple of days until I'll be able to do so.
> 
> I think we should shatter the superpage on demand. This would also be
> required for superpage mappings of Xen itself: when we free initmem that
> memory can now be allocated to a domain (now xenheap and domheap are merged
> on x86/64).

Actually, we just found an example where problems can arise here.
It's not the superpage shattering, but the unused __per_cpu area: dom0
actually gets these pages and tries to do DMA on them, which causes a VT-d
page fault and a dom0 boot failure. I have a patch for it that adds this
region to the iommu page table.

Thanks,
Qing
> 
> An alternative might be to mark such partially-freed superpages as
> Xenheap-only, and allocate them preferentially for Xenheap callers (i.e.,
> alloc those pages first, then from the general heap).

> 
>  -- Keir


* [PATCH] iommu: fix unused percpu in xen_in_range()
  2009-04-24  7:16       ` Keir Fraser
  2009-04-24 23:14         ` Qing He
@ 2009-04-24 23:20         ` Qing He
  1 sibling, 0 replies; 9+ messages in thread
From: Qing He @ 2009-04-24 23:20 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Cihula, Joseph, Cui, Dexuan, Wang, Shane, Jiang, Yunhong, Yang,
	Xiaowei, Ke, Liping, xen-devel, Li, Xin

On Fri, 2009-04-24 at 15:16 +0800, Keir Fraser wrote:
> On 24/04/2009 08:04, "Jan Beulich" <jbeulich@novell.com> wrote:
> 
> > Also, after suggesting to use gb-pages when possible here I realized that
> > it's probably a latent bug to map more space than was allocated - if the
> > non-allocated-but-mapped pages happen to later get allocated to a domain,
> > that domain may change the cacheability attributes of any of these pages,
> > resulting in aliasing issues. I'll put together a patch for this, but it'll be
> > a couple of days until I'll be able to do so.
> 
> I think we should shatter the superpage on demand. This would also be
> required for superpage mappings of Xen itself: when we free initmem that
> memory can now be allocated to a domain (now xenheap and domheap are merged
> on x86/64).
> 
> An alternative might be to mark such partially-freed superpages as
> Xenheap-only, and allocate them preferentially for Xenheap callers (i.e.,
> alloc those pages first, then from the general heap).
> 

Here is the patch I mentioned above; it fixes dom0 booting on my box:
---

The unused percpu area is reclaimed as xenheap, but since xenheap and
domheap are shared on x86_64, it's possible for dom0 to get these pages and
perform DMA on them. This patch removes this area from xen_in_range(),
so an iommu 1:1 mapping for this area can be added.

Signed-off-by: Qing He <qing.he@intel.com>

---
diff -r 8b152638adaa xen/arch/x86/setup.c
--- a/xen/arch/x86/setup.c	Thu Apr 23 16:22:48 2009 +0100
+++ b/xen/arch/x86/setup.c	Fri Apr 24 15:24:18 2009 +0800
@@ -98,6 +98,7 @@ cpumask_t cpu_present_map;
 
 unsigned long xen_phys_start;
 unsigned long allocator_bitmap_end;
+unsigned long per_cpu_used_end;
 
 #ifdef CONFIG_X86_32
 /* Limits of Xen heap, used to initialise the allocator. */
@@ -223,6 +224,8 @@ static void __init percpu_init_areas(voi
                          (first_unused << PERCPU_SHIFT),
                          (NR_CPUS - first_unused) << PERCPU_SHIFT);
 #endif
+
+    per_cpu_used_end = __pa(__per_cpu_start) + (first_unused << PERCPU_SHIFT);
 }
 
 static void __init init_idle_domain(void)
@@ -1124,9 +1127,9 @@ int xen_in_range(paddr_t start, paddr_t 
     /* initialize first time */
     if ( !xen_regions[0].s )
     {
-        extern char __init_begin[], __per_cpu_start[], __per_cpu_end[],
-                    __bss_start[];
+        extern char __init_begin[], __per_cpu_start[], __bss_start[];
         extern unsigned long allocator_bitmap_end;
+        extern unsigned long per_cpu_used_end;
 
         /* S3 resume code (and other real mode trampoline code) */
         xen_regions[0].s = bootsym_phys(trampoline_start);
@@ -1136,7 +1139,7 @@ int xen_in_range(paddr_t start, paddr_t 
         xen_regions[1].e = __pa(&__init_begin);
         /* per-cpu data */
         xen_regions[2].s = __pa(&__per_cpu_start);
-        xen_regions[2].e = __pa(&__per_cpu_end);
+        xen_regions[2].e = per_cpu_used_end;
         /* bss + boot allocator bitmap */
         xen_regions[3].s = __pa(&__bss_start);
         xen_regions[3].e = allocator_bitmap_end;
diff -r 8b152638adaa xen/arch/x86/tboot.c
--- a/xen/arch/x86/tboot.c	Thu Apr 23 16:22:48 2009 +0100
+++ b/xen/arch/x86/tboot.c	Fri Apr 24 15:24:18 2009 +0800
@@ -48,6 +48,7 @@ static uint64_t sinit_base, sinit_size;
 
 extern char __init_begin[], __per_cpu_start[], __per_cpu_end[], __bss_start[];
 extern unsigned long allocator_bitmap_end;
+extern unsigned long per_cpu_used_end;
 
 #define SHA1_SIZE      20
 typedef uint8_t   sha1_hash_t[SHA1_SIZE];
@@ -310,7 +311,7 @@ void tboot_shutdown(uint32_t shutdown_ty
                                               __pa(&_stext);
         /* per-cpu data */
         g_tboot_shared->mac_regions[2].start = (uint64_t)__pa(&__per_cpu_start);
-        g_tboot_shared->mac_regions[2].size = __pa(&__per_cpu_end) -
+        g_tboot_shared->mac_regions[2].size = per_cpu_used_end -
                                               __pa(&__per_cpu_start);
         /* bss */
         g_tboot_shared->mac_regions[3].start = (uint64_t)__pa(&__bss_start);


* RE: RE: [RFC] fix xen_in_range()
  2009-04-24  7:04     ` Jan Beulich
  2009-04-24  7:16       ` Keir Fraser
@ 2009-04-24 23:29       ` Cihula, Joseph
  1 sibling, 0 replies; 9+ messages in thread
From: Cihula, Joseph @ 2009-04-24 23:29 UTC (permalink / raw)
  To: Jan Beulich, Keir Fraser, xen-devel
  Cc: Cui, Dexuan, Wang, Shane, Jiang, Yunhong, Yang, Xiaowei, Ke,
	Liping, Li, Xin

> From: Jan Beulich [mailto:jbeulich@novell.com]
> Sent: Friday, April 24, 2009 12:05 AM
>
> >>> "Cihula, Joseph" <joseph.cihula@intel.com> 24.04.09 03:26 >>>
> >> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> >> Sent: Thursday, April 23, 2009 12:25 AM
> >>
> >> On 23/04/2009 00:53, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:
> >>
> >> > Unfortunately, the frametable is only contiguous in the virtual address space,
> >> > so one can't simply take __pa() of its start and end.  And since it is quite
> >> > large, iterating through each page to gets its phys addr adds a perceptible
> >> > delay when that check has to be done for each page of physical memory (as is
> >> > the case in the only caller, the VT-d routine that maps memory for dom0).  But
> >> > it also appears that we can't convert the phys addr arguments into their virt
> >> > addrs to compare with the contiguous frametable range because they will
> >> > convert to the DIRECTMAP va's instead.
> >>
> >> The frametable is allocated in aligned 2MB chunks. So you can check at that
> >> granularity rather than 4kB.
> >
> >That made it just a single iteration on a 2GB system, but what fn should be used
> >to convert the va to pa?  __pa() isn't converting this correctly.
>
> I'm afraid you'll have to either create one, or you'll have to consult the page
> tables. Also, how can this be a single iteration on a 2Gb system? struct
> page_info is 32 bytes, so I'd expect the frame table to be 16Mb in size (i.e.
> eight iterations).

It's actually 8 (I had cut off my trace too early before).

I don't think that it is worth the effort of walking the page tables just to exclude the frametable from the VT-d tables, since there are other memory regions (xenheap, etc.) that we're not excluding.  But if there is a list of things to do when someone gets time, I'd suggest adding this to it.

Joe

> Also, after suggesting to use gb-pages when possible here I realized that
> it's probably a latent bug to map more space than was allocated - if the
> non-allocated-but-mapped pages happen to later get allocated to a domain,
> that domain may change the cacheability attributes of any of these pages,
> resulting in aliasing issues. I'll put together a patch for this, but it'll be a
> couple of days until I'll be able to do so.
>
> Jan

