From mboxrd@z Thu Jan 1 00:00:00 1970
From: Hongkaixing
Subject: Some problems about xenpaging
Date: Fri, 23 Sep 2011 09:35:11 +0000
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7BIT
Content-language: zh-CN
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: "olaf@aepfle.de"
Cc: YangXiaowei, "Eric Li(Zhentao)", Yanqiangjun
List-Id: xen-devel@lists.xenproject.org

Hi Olaf,

We have tested the xenpaging feature and found some problems.

(1) The test case is as follows: we start a VM with PoD enabled, and xenpaging is started at the same time. This combination causes many problems. We eventually fixed the bug; the patch is attached below.

(2) There is a very serious problem. We have observed many VM crashes, and the error code is not always the same. We guess that there is a conflict between xenpaging and memory mapping. For instance, if Dom0 maps DomU's memory into its own address space and that memory is then paged out, a panic occurs when Dom0 accesses the area. I also do not know whether the qemu device model can perceive memory that has been modified by xenpaging.

Thanks!

Here is the patch to solve the PoD problem.

1) Fix the p2m_pod_decrease_reservation() function so that it takes care of the paging types.

--- ./origin/xen/arch/x86/mm/p2m.c  2011-09-05 20:39:30.000000000 +0800
+++ ./b/xen/arch/x86/mm/p2m.c  2011-09-23 23:46:19.000000000 +0800
@@ -675,6 +675,23 @@
             BUG_ON(p2md->pod.entry_count < 0);
             pod--;
         }
+        else if ( steal_for_cache && p2m_is_paging(t) )
+        {
+            struct page_info *page;
+            /* alloc a new page to compensate the pod list */
+            page = alloc_domheap_page(d, 0);
+            if ( unlikely(page == NULL) )
+            {
+                goto out_entry_check;
+            }
+            set_p2m_entry(d, gpfn + i, _mfn(INVALID_MFN), 0, p2m_invalid);
+            p2m_mem_paging_drop_page(d, gpfn+i);
+            p2m_pod_cache_add(d, page, 0);
+            steal_for_cache = ( p2md->pod.entry_count > p2md->pod.count );
+            nonpod--;
+            ram--;
+        }
+        /* for other ram types */
         else if ( steal_for_cache && p2m_is_ram(t) )
         {
             struct page_info *page;

2) Fix the race between PoD and xenpaging.

The situation can be described as follows:

    xenpaging                    PoD

    mfn = gfn_to_mfn()           mfn = gfn_to_mfn()
    check p2m type               check p2mt
                                 p2m_lock
                                 change p2m type
                                 p2m_unlock
                                 add to pod list
    p2m_lock()
    change p2m type
    p2m_unlock
    put_page()

The result is that a page is added to the pod list and is then removed from it again by put_page(). My suggestion is to extend the range of the p2m lock so that it also covers the p2m type check.

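The suggestion to hold the lock across both the type check and the type change can be illustrated outside of Xen. What follows is only a minimal user-space sketch, assuming a pthread mutex as a stand-in for the p2m lock; every demo_* name is hypothetical and none of it is Xen code. Two actors (a "pager" and a "PoD" path) try to claim the same entry, and because the check and the update happen under one lock hold, only one of them can succeed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for p2m types; not Xen's real definitions. */
enum demo_type { DEMO_RAM_RW, DEMO_PAGING_OUT, DEMO_POD };

struct demo_entry {
    pthread_mutex_t lock;   /* plays the role of the p2m lock */
    enum demo_type type;    /* plays the role of the p2m type */
};

/*
 * Claim the entry for new_type only if it is still plain RAM.
 * The type check and the type change share one lock hold, so a
 * second claimant cannot pass the check in between.
 */
static bool demo_claim(struct demo_entry *e, enum demo_type new_type)
{
    bool claimed = false;

    pthread_mutex_lock(&e->lock);
    if ( e->type == DEMO_RAM_RW )
    {
        e->type = new_type;
        claimed = true;
    }
    pthread_mutex_unlock(&e->lock);

    return claimed;
}

struct demo_actor {
    struct demo_entry *entry;
    enum demo_type want;
    bool won;
};

static void *demo_actor_thread(void *arg)
{
    struct demo_actor *a = arg;

    a->won = demo_claim(a->entry, a->want);
    return NULL;
}

/* Race a "pager" and a "PoD" actor on one entry; return how many won. */
static int demo_run_once(void)
{
    struct demo_entry e = { PTHREAD_MUTEX_INITIALIZER, DEMO_RAM_RW };
    struct demo_actor pager = { &e, DEMO_PAGING_OUT, false };
    struct demo_actor pod = { &e, DEMO_POD, false };
    pthread_t t1, t2;
    int rc;

    rc = pthread_create(&t1, NULL, demo_actor_thread, &pager);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, demo_actor_thread, &pod);
    assert(rc == 0);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    return (int)pager.won + (int)pod.won;
}
```

If the check in demo_claim() were moved in front of pthread_mutex_lock(), both actors could see DEMO_RAM_RW and both would go on to change the type, which is the same shape as the interleaving described above.
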
In the p2m_mem_paging_nominate() function:

@@ -2532,7 +2561,8 @@
     mfn_t mfn;
     int ret;
 
-    mfn = gfn_to_mfn(d, gfn, &p2mt);
+    p2m_lock(d->arch.p2m);
+    mfn = gfn_to_mfn_query(d, gfn, &p2mt);
 
     /* Check if mfn is valid */
     ret = -EINVAL;
@@ -2580,13 +2610,12 @@
         goto out;
 
     /* Fix p2m entry */
-    p2m_lock(d->arch.p2m);
     set_p2m_entry(d, gfn, mfn, 0, p2m_ram_paging_out);
-    p2m_unlock(d->arch.p2m);
 
     ret = 0;
 
  out:
+    p2m_unlock(d->arch.p2m);
     return ret;
 }

In the p2m_mem_paging_evict() function:

@@ -2595,34 +2624,39 @@
     struct page_info *page;
     p2m_type_t p2mt;
     mfn_t mfn;
-
+    int ret;
+    p2m_lock(d->arch.p2m);
     /* Get mfn */
-    mfn = gfn_to_mfn(d, gfn, &p2mt);
+    mfn = gfn_to_mfn_query(d, gfn, &p2mt);
+
+    ret = -EINVAL;
     if ( unlikely(!mfn_valid(mfn)) )
-        return -EINVAL;
+        goto out;
 
     if (p2mt != p2m_ram_paging_out)
     {
         printk("p2m_mem_paging_evict type %d\n", p2mt);
-        return -EINVAL;
+        goto out;
     }
 
     /* Get the page so it doesn't get modified under Xen's feet */
     page = mfn_to_page(mfn);
     if ( unlikely(!get_page(page, d)) )
-        return -EINVAL;
+        goto out;
 
     /* Decrement guest domain's ref count of the page */
     if ( test_and_clear_bit(_PGC_allocated, &page->count_info) )
         put_page(page);
 
     /* Remove mapping from p2m table */
-    p2m_lock(d->arch.p2m);
     set_p2m_entry(d, gfn, _mfn(PAGING_MFN), 0, p2m_ram_paged);
-    p2m_unlock(d->arch.p2m);
 
     /* Put the page back so it gets freed */
     put_page(page);
+
+    ret = 0;
+out:
+    p2m_unlock(d->arch.p2m);
-    return 0;
+    return ret;
 }

3) Fix the vmx_load_pdptrs() function in vmx.c.

In this situation the page directory table has been paged out. Although using mdelay() is a bad idea, it is better than crashing Xen.

 void vmx_load_pdptrs(struct vcpu *v)
 {
     unsigned long cr3 = v->arch.hvm_vcpu.guest_cr[3], mfn;
     uint64_t *guest_pdptrs;
     p2m_type_t p2mt;
     char *p;
     unsigned int try_count = 0;

     /* EPT needs to load PDPTRS into VMCS for PAE. */
     if ( !hvm_pae_enabled(v) || (v->arch.hvm_vcpu.guest_efer & EFER_LMA) )
         return;

     if ( cr3 & 0x1fUL )
         goto crash;

     mfn = mfn_x(gfn_to_mfn(v->domain, cr3 >> PAGE_SHIFT, &p2mt));
     if ( !p2m_is_ram(p2mt) )
         goto crash;
+    if( p2m_is_paging(p2mt))
+    {
+        p2m_mem_paging_populate(v->domain, cr3 >> PAGE_SHIFT);
+        do
+        {
+            mdelay(1);
+            try_count ++;
+            if ( try_count > 65535 )
+            {
+                goto crash;
+            }
+            mfn = mfn_x(gfn_to_mfn(v->domain, cr3 >> PAGE_SHIFT, &p2mt));
+        }while( !mfn_valid(mfn));
+    }
+
     p = map_domain_page(mfn);