* [RFC] fix xen_in_range()
  From: Cihula, Joseph
  Date: 2009-04-22 23:53 UTC
  To: xen-devel, Keir Fraser
  Cc: Cui, Dexuan; Wang, Shane; Jiang, Yunhong; Yang, Xiaowei; Ke, Liping; Li, Xin

The frametable check in xen_in_range() incorrectly compares virtual
addresses to physical addresses (i.e. the parameters).  Unfortunately, the
frametable is only contiguous in the virtual address space, so one can't
simply take __pa() of its start and end.  And since it is quite large,
iterating through each page to get its phys addr adds a perceptible delay
when that check has to be done for each page of physical memory (as is the
case in the only caller, the VT-d routine that maps memory for dom0).  But
it also appears that we can't convert the phys addr arguments into their
virt addrs to compare with the contiguous frametable range, because they
would convert to the DIRECTMAP VAs instead.

So while I would prefer to find a way to keep the check against the
frametable, I don't see any obvious way to do so.  This RFC is to see if
there are any ideas for how the test could be re-written.  If not, then the
patch below should be applied to remove the frametable check altogether.

Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>

diff -r 655dc3bc1d8e xen/arch/x86/setup.c
--- a/xen/arch/x86/setup.c	Thu Apr 16 11:54:06 2009 +0100
+++ b/xen/arch/x86/setup.c	Wed Apr 22 15:23:46 2009 -0700
@@ -1119,7 +1119,7 @@ int xen_in_range(paddr_t start, paddr_t
     int i;
     static struct {
         paddr_t s, e;
-    } xen_regions[5];
+    } xen_regions[4];
 
     /* initialize first time */
     if ( !xen_regions[0].s )
@@ -1140,10 +1140,6 @@ int xen_in_range(paddr_t start, paddr_t
         /* bss + boot allocator bitmap */
         xen_regions[3].s = __pa(&__bss_start);
         xen_regions[3].e = allocator_bitmap_end;
-        /* frametable */
-        xen_regions[4].s = (unsigned long)frame_table;
-        xen_regions[4].e = (unsigned long)frame_table +
-            PFN_UP(max_page * sizeof(*frame_table));
     }
 
     for ( i = 0; i < ARRAY_SIZE(xen_regions); i++ )
* Re: [RFC] fix xen_in_range()
  From: Keir Fraser
  Date: 2009-04-23 7:25 UTC
  To: Cihula, Joseph; xen-devel
  Cc: Cui, Dexuan; Wang, Shane; Jiang, Yunhong; Yang, Xiaowei; Ke, Liping; Li, Xin

On 23/04/2009 00:53, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:

> Unfortunately, the frametable is only contiguous in the virtual address
> space, so one can't simply take __pa() of its start and end.  And since
> it is quite large, iterating through each page to get its phys addr adds
> a perceptible delay when that check has to be done for each page of
> physical memory (as is the case in the only caller, the VT-d routine
> that maps memory for dom0).  But it also appears that we can't convert
> the phys addr arguments into their virt addrs to compare with the
> contiguous frametable range because they will convert to the DIRECTMAP
> va's instead.

The frametable is allocated in aligned 2MB chunks.  So you can check at
that granularity rather than 4KB.

 -- Keir
* Re: [RFC] fix xen_in_range()
  From: Jan Beulich
  Date: 2009-04-23 7:58 UTC
  To: Keir Fraser
  Cc: Yunhong Jiang, Dexuan Cui, Shane Wang, Joseph Cihula, Xiaowei Yang, Liping Ke, xen-devel, Xin Li

>>> Keir Fraser <keir.fraser@eu.citrix.com> 23.04.09 09:25 >>>
> On 23/04/2009 00:53, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:
>
>> Unfortunately, the frametable is only contiguous in the virtual address
>> space, so one can't simply take __pa() of its start and end. [...]
>
> The frametable is allocated in aligned 2MB chunks.  So you can check at
> that granularity rather than 4KB.

... and perhaps allocation should be attempted in 1GB chunks when the
table size is getting close to or exceeding 1GB (and 1GB pages are
supported).  Or, since the space mapped is larger than the space allocated
anyway, the condition might be just that of 1GB pages being supported
(provided a 1GB-aligned chunk can be allocated).

Jan
* RE: [RFC] fix xen_in_range()
  From: Cihula, Joseph
  Date: 2009-04-24 1:26 UTC
  To: Keir Fraser, xen-devel
  Cc: Cui, Dexuan; Wang, Shane; Jiang, Yunhong; Yang, Xiaowei; Ke, Liping; Li, Xin

> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Thursday, April 23, 2009 12:25 AM
>
> On 23/04/2009 00:53, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:
>
>> Unfortunately, the frametable is only contiguous in the virtual address
>> space, so one can't simply take __pa() of its start and end. [...]
>
> The frametable is allocated in aligned 2MB chunks.  So you can check at
> that granularity rather than 4KB.

That made it just a single iteration on a 2GB system, but what fn should
be used to convert the va to pa?  __pa() isn't converting this correctly.

Joe
* RE: [RFC] fix xen_in_range()
  From: Jan Beulich
  Date: 2009-04-24 7:04 UTC
  To: Keir Fraser, Joseph Cihula, xen-devel
  Cc: Dexuan Cui, Shane Wang, Yunhong Jiang, Xiaowei Yang, Liping Ke, Xin Li

>>> "Cihula, Joseph" <joseph.cihula@intel.com> 24.04.09 03:26 >>>
>> The frametable is allocated in aligned 2MB chunks.  So you can check at
>> that granularity rather than 4KB.
>
> That made it just a single iteration on a 2GB system, but what fn should
> be used to convert the va to pa?  __pa() isn't converting this
> correctly.

I'm afraid you'll have to either create one, or you'll have to consult the
page tables.  Also, how can this be a single iteration on a 2GB system?
struct page_info is 32 bytes, so I'd expect the frame table to be 16MB in
size (i.e. eight iterations).

Also, after suggesting to use GB pages when possible here, I realized that
it's probably a latent bug to map more space than was allocated - if the
non-allocated-but-mapped pages happen to later get allocated to a domain,
that domain may change the cacheability attributes of any of these pages,
resulting in aliasing issues.  I'll put together a patch for this, but
it'll be a couple of days until I'll be able to do so.

Jan
* Re: RE: [RFC] fix xen_in_range()
  From: Keir Fraser
  Date: 2009-04-24 7:16 UTC
  To: Jan Beulich, Joseph Cihula, xen-devel
  Cc: Dexuan Cui, Shane Wang, Yunhong Jiang, Xiaowei Yang, Liping Ke, Xin Li

On 24/04/2009 08:04, "Jan Beulich" <jbeulich@novell.com> wrote:

> Also, after suggesting to use gb-pages when possible here I realized
> that it's probably a latent bug to map more space than was allocated -
> if the non-allocated-but-mapped pages happen to later get allocated to a
> domain, that domain may change the cacheability attributes of any of
> these pages, resulting in aliasing issues.  I'll put together a patch
> for this, but it'll be a couple of days until I'll be able to do so.

I think we should shatter the superpage on demand.  This would also be
required for superpage mappings of Xen itself: when we free initmem, that
memory can now be allocated to a domain (now that xenheap and domheap are
merged on x86/64).

An alternative might be to mark such partially-freed superpages as
Xenheap-only, and allocate them preferentially for Xenheap callers (i.e.,
alloc those pages first, then from the general heap).

 -- Keir
* Re: RE: [RFC] fix xen_in_range()
  From: Qing He
  Date: 2009-04-24 23:14 UTC
  To: Keir Fraser
  Cc: Cihula, Joseph; Cui, Dexuan; Wang, Shane; Jiang, Yunhong; Yang, Xiaowei; Ke, Liping; xen-devel; Li, Xin

On Fri, 2009-04-24 at 15:16 +0800, Keir Fraser wrote:

> I think we should shatter the superpage on demand.  This would also be
> required for superpage mappings of Xen itself: when we free initmem that
> memory can now be allocated to a domain (now xenheap and domheap are
> merged on x86/64).

Actually, we just found an example of a problem arising here, though from
the unused __per_cpu area rather than from superpage shattering.  Dom0
actually gets these pages and tries to do DMA on them, which causes a VT-d
page fault and a dom0 boot failure.  I have a patch for it that adds this
region to the iommu page table.

Thanks,
Qing

> An alternative might be to mark such partially-freed superpages as
> Xenheap-only, and allocate them preferentially for Xenheap callers
> (i.e., alloc those pages first, then from the general heap).
>
> -- Keir
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
* [PATCH] iommu: fix unused percpu in xen_in_range()
  From: Qing He
  Date: 2009-04-24 23:20 UTC
  To: Keir Fraser
  Cc: Cihula, Joseph; Cui, Dexuan; Wang, Shane; Jiang, Yunhong; Yang, Xiaowei; Ke, Liping; xen-devel; Li, Xin

On Fri, 2009-04-24 at 15:16 +0800, Keir Fraser wrote:

> I think we should shatter the superpage on demand. [...]
>
> An alternative might be to mark such partially-freed superpages as
> Xenheap-only, and allocate them preferentially for Xenheap callers
> (i.e., alloc those pages first, then from the general heap).

Here is the patch I mentioned above; it fixes dom0 booting on my box:

---
The unused percpu area is reclaimed as xenheap, but since xenheap and
domheap are shared on x86_64, it's possible dom0 can get these pages and
perform DMA on them.  This patch removes this area from xen_in_range(), so
an iommu 1:1 mapping for it can be added.

Signed-off-by: Qing He <qing.he@intel.com>
---

diff -r 8b152638adaa xen/arch/x86/setup.c
--- a/xen/arch/x86/setup.c	Thu Apr 23 16:22:48 2009 +0100
+++ b/xen/arch/x86/setup.c	Fri Apr 24 15:24:18 2009 +0800
@@ -98,6 +98,7 @@ cpumask_t cpu_present_map;
 unsigned long xen_phys_start;
 
 unsigned long allocator_bitmap_end;
+unsigned long per_cpu_used_end;
 
 #ifdef CONFIG_X86_32
 /* Limits of Xen heap, used to initialise the allocator. */
@@ -223,6 +224,8 @@ static void __init percpu_init_areas(voi
                      (first_unused << PERCPU_SHIFT),
                      (NR_CPUS - first_unused) << PERCPU_SHIFT);
 #endif
+
+    per_cpu_used_end = __pa(__per_cpu_start) + (first_unused << PERCPU_SHIFT);
 }
 
 static void __init init_idle_domain(void)
@@ -1124,9 +1127,9 @@ int xen_in_range(paddr_t start, paddr_t
     /* initialize first time */
     if ( !xen_regions[0].s )
     {
-        extern char __init_begin[], __per_cpu_start[], __per_cpu_end[],
-            __bss_start[];
+        extern char __init_begin[], __per_cpu_start[], __bss_start[];
         extern unsigned long allocator_bitmap_end;
+        extern unsigned long per_cpu_used_end;
 
         /* S3 resume code (and other real mode trampoline code) */
         xen_regions[0].s = bootsym_phys(trampoline_start);
@@ -1136,7 +1139,7 @@ int xen_in_range(paddr_t start, paddr_t
         xen_regions[1].e = __pa(&__init_begin);
         /* per-cpu data */
         xen_regions[2].s = __pa(&__per_cpu_start);
-        xen_regions[2].e = __pa(&__per_cpu_end);
+        xen_regions[2].e = per_cpu_used_end;
         /* bss + boot allocator bitmap */
         xen_regions[3].s = __pa(&__bss_start);
         xen_regions[3].e = allocator_bitmap_end;
diff -r 8b152638adaa xen/arch/x86/tboot.c
--- a/xen/arch/x86/tboot.c	Thu Apr 23 16:22:48 2009 +0100
+++ b/xen/arch/x86/tboot.c	Fri Apr 24 15:24:18 2009 +0800
@@ -48,6 +48,7 @@ static uint64_t sinit_base, sinit_size;
 
 extern char __init_begin[], __per_cpu_start[], __per_cpu_end[], __bss_start[];
 extern unsigned long allocator_bitmap_end;
+extern unsigned long per_cpu_used_end;
 
 #define SHA1_SIZE 20
 typedef uint8_t sha1_hash_t[SHA1_SIZE];
@@ -310,7 +311,7 @@ void tboot_shutdown(uint32_t shutdown_ty
                                              __pa(&_stext);
     /* per-cpu data */
     g_tboot_shared->mac_regions[2].start = (uint64_t)__pa(&__per_cpu_start);
-    g_tboot_shared->mac_regions[2].size = __pa(&__per_cpu_end) -
+    g_tboot_shared->mac_regions[2].size = per_cpu_used_end -
                                           __pa(&__per_cpu_start);
     /* bss */
     g_tboot_shared->mac_regions[3].start = (uint64_t)__pa(&__bss_start);
* RE: RE: [RFC] fix xen_in_range()
  From: Cihula, Joseph
  Date: 2009-04-24 23:29 UTC
  To: Jan Beulich, Keir Fraser, xen-devel
  Cc: Cui, Dexuan; Wang, Shane; Jiang, Yunhong; Yang, Xiaowei; Ke, Liping; Li, Xin

> From: Jan Beulich [mailto:jbeulich@novell.com]
> Sent: Friday, April 24, 2009 12:05 AM
>
>>> That made it just a single iteration on a 2GB system, but what fn
>>> should be used to convert the va to pa?  __pa() isn't converting this
>>> correctly.
>
> I'm afraid you'll have to either create one, or you'll have to consult
> the page tables.  Also, how can this be a single iteration on a 2GB
> system?  struct page_info is 32 bytes, so I'd expect the frame table to
> be 16MB in size (i.e. eight iterations).

It's actually 8 (I cut off my trace early before).

I don't think it is worth the effort of walking the page tables just to
exclude the frametable from the VT-d tables, since there are other memory
regions (xenheap, etc.) that we're not excluding.  But if there is a list
of things to do when someone gets time, I'd suggest adding this to it.

Joe

> Also, after suggesting to use gb-pages when possible here I realized
> that it's probably a latent bug to map more space than was allocated -
> if the non-allocated-but-mapped pages happen to later get allocated to
> a domain, that domain may change the cacheability attributes of any of
> these pages, resulting in aliasing issues.  I'll put together a patch
> for this, but it'll be a couple of days until I'll be able to do so.
>
> Jan