From: Konrad Rzeszutek Wilk
Subject: c/s 22402 ("x86 hvm: Refuse to perform __hvm_copy() work in atomic context.") breaks HVM, race possible in other code - any ideas?
Date: Tue, 11 Jan 2011 13:00:32 -0500
Message-ID: <20110111180032.GH14017@dumpdata.com>
References: <1292545063-32107-1-git-send-email-dgdegra@tycho.nsa.gov>
 <1292545063-32107-7-git-send-email-dgdegra@tycho.nsa.gov>
 <20110110224154.GH15016@dumpdata.com>
 <4D2C57DC.3090803@tycho.nsa.gov>
 <4D2C6EA3.8060900@tycho.nsa.gov>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4D2C6EA3.8060900@tycho.nsa.gov>
To: Daniel De Graaf <dgdegra@tycho.nsa.gov>, keir@xen.org
Cc: jeremy@goop.org, xen-devel@lists.xensource.com, Ian.Campbell@citrix.com
List-Id: xen-devel@lists.xenproject.org
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com

On Tue, Jan 11, 2011 at 09:52:19AM -0500, Daniel De Graaf wrote:
> On 01/11/2011 08:15 AM, Daniel De Graaf wrote:
> > On 01/10/2011 05:41 PM, Konrad Rzeszutek Wilk wrote:
> >>> @@ -284,8 +304,25 @@ static void unmap_grant_pages(struct grant_map *map, int offset, int pages)
> >>>                  goto out;
> >>>
> >>>          for (i = 0; i < pages; i++) {
> >>> +                uint32_t check, *tmp;
> >>>                  WARN_ON(unmap_ops[i].status);
> >>> -                __free_page(map->pages[offset+i]);
> >>> +                if (!map->pages[i])
> >>> +                        continue;
> >>> +                /* XXX When unmapping, Xen will sometimes end up mapping the GFN
> >>> +                 * to an invalid MFN. In this case, writes will be discarded and
> >>> +                 * reads will return all 0xFF bytes. Leak these unusable GFNs
> >>> +                 * until a way to restore them is found.
> >>> +                 */
> >>> +                tmp = kmap(map->pages[i]);
> >>> +                tmp[0] = 0xdeaddead;
> >>> +                mb();
> >>> +                check = tmp[0];
> >>> +                kunmap(map->pages[i]);
> >>> +                if (check == 0xdeaddead)
> >>> +                        __free_page(map->pages[i]);
> >>> +                else if (debug)
> >>> +                        printk("%s: Discard page %d=%ld\n", __func__,
> >>> +                               i, page_to_pfn(map->pages[i]));
> >>
> >> Whoa. Any leads to when the "sometimes" happens? Does the status
> >> report an error or is it silent?
> >
> > The status is silent in this case. I can reproduce it quite reliably
> > on my test system, where I am mapping a framebuffer (1280 pages)
> > between two HVM guests - in this case, about 2/3 of the released
> > pages end up being invalid. It doesn't seem to be size-related, as I
> > have also seen it on the small 3-page page-index mapping. There is a
> > message in xm dmesg that may be related:
> >
> > (XEN) sh error: sh_remove_all_mappings(): can't find all mappings of mfn 7cbc6: c=8000000000000004 t=7400000000000002
> >
> > This appears about once per page, with different MFNs but the same
> > c/t values. One of the two HVM guests (the one doing the mapping)
> > has the PCI graphics card forwarded to it.
>
> Just tested on the latest Xen 4.1 (with 22402:7d2fdc083c9c reverted,
> as that breaks HVM grants), which produces different output:

Keir, c/s 22402 has your name on it. Any ideas on the problem Daniel
is hitting when unmapping grants?
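For anyone following along: the probe in the hunk quoted above boils
down to the pattern below. This is a minimal sketch rather than the
exact patch - the helper name is mine - but kmap/kunmap and mb() are
the same kernel calls used there:

#include <linux/types.h>
#include <linux/highmem.h>

/*
 * After GNTTABOP_unmap_grant_ref, a GFN that Xen has left pointing at
 * an invalid MFN discards writes and reads back as all 0xFF bytes, so
 * writing a magic value and reading it back tells the two cases apart.
 */
static bool gfn_backing_is_valid(struct page *page)
{
        uint32_t check, *tmp;

        tmp = kmap(page);       /* temporary kernel mapping of the page */
        tmp[0] = 0xdeaddead;    /* write the magic value...             */
        mb();                   /* ...force the write out...            */
        check = tmp[0];         /* ...and read it back                  */
        kunmap(page);

        return check == 0xdeaddead; /* false => writes were discarded */
}

When the check fails the page cannot be handed back to the allocator
(anything later written to it would be lost), which is why the patch
leaks it instead of calling __free_page().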
>
> ...
> (XEN) mm.c:889:d1 Error getting mfn b803e (pfn 25a3e) from L1 entry 00000000b803e021 for l1e_owner=1, pg_owner=1
> (XEN) mm.c:889:d1 Error getting mfn b8038 (pfn 25a38) from L1 entry 00000000b8038021 for l1e_owner=1, pg_owner=1
> (XEN) mm.c:889:d1 Error getting mfn b803d (pfn 25a3d) from L1 entry 00000000b803d021 for l1e_owner=1, pg_owner=1
> (XEN) mm.c:889:d1 Error getting mfn 10829 (pfn 25a29) from L1 entry 0000000010829021 for l1e_owner=1, pg_owner=1
> (XEN) mm.c:889:d1 Error getting mfn 1081c (pfn 25a1c) from L1 entry 000000001081c021 for l1e_owner=1, pg_owner=1
> (XEN) mm.c:889:d1 Error getting mfn 10816 (pfn 25a16) from L1 entry 0000000010816021 for l1e_owner=1, pg_owner=1
> (XEN) mm.c:889:d1 Error getting mfn 1081a (pfn 25a1a) from L1 entry 000000001081a021 for l1e_owner=1, pg_owner=1
> ...
>
> These errors appear at map time; nothing is printed on the unmap. If
> the unmap happens while the domain is still up, the pages are invalid
> more often; most (perhaps all) of the unmaps that leave the destination
> page valid happen while the domain is being destroyed. Exactly which
> pages end up valid or invalid seems to be mostly random, although
> nearby GFNs tend to share the same validity.
>
> If you have any thoughts as to the cause, I can test patches or
> provide more output as needed; it would be better if this workaround
> weren't necessary.
>
> --
> Daniel De Graaf
> National Security Agency
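P.S. Decoding the first entry in that log by hand (a standalone
userspace sketch, not Xen code - the masks are just the standard x86
PTE bits):

#include <stdint.h>
#include <stdio.h>

#define _PAGE_PRESENT  0x001ULL
#define _PAGE_RW       0x002ULL
#define _PAGE_ACCESSED 0x020ULL

int main(void)
{
        uint64_t l1e   = 0xb803e021ULL; /* first entry in the log above */
        uint64_t mfn   = l1e >> 12;     /* frame number (high bits are zero here) */
        uint64_t flags = l1e & 0xfff;   /* low 12 bits are the PTE flags */

        /* Prints: mfn b803e flags 021: present accessed */
        printf("mfn %llx flags %03llx:%s%s%s\n",
               (unsigned long long)mfn, (unsigned long long)flags,
               (flags & _PAGE_PRESENT)  ? " present"  : "",
               (flags & _PAGE_RW)       ? " rw"       : "",
               (flags & _PAGE_ACCESSED) ? " accessed" : "");
        return 0;
}

So the entry itself decodes to a sane-looking mapping (present and
accessed, not writable), which suggests the complaint is about taking
a reference on the MFN rather than about the entry's format.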