From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: dom0 / hypervisor hang on dom0 boot
Date: Tue, 21 May 2013 10:10:37 -0400
Message-ID: <20130521141037.GN492@phenom.dumpdata.com>
References: <3374329.VOz1gdFjBv@amur.mch.fsc.net> <1630888.LbRauWP15S@amur.mch.fsc.net> <20130517222814.GA3255@localhost.localdomain> <1486614.T9i4H26zXq@amur.mch.fsc.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <1486614.T9i4H26zXq@amur.mch.fsc.net>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Dietmar Hahn
Cc: Konrad Rzeszutek Wilk, Andrew Cooper, Jan Beulich, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On Tue, May 21, 2013 at 09:39:14AM +0200, Dietmar Hahn wrote:
> On Friday, 17 May 2013, 18:28:16, Konrad Rzeszutek Wilk wrote:
> > On Thu, May 16, 2013 at 01:07:05PM +0200, Dietmar Hahn wrote:
> > > On Wednesday, 15 May 2013, 10:42:17, Jan Beulich wrote:
> > > > >>> On 15.05.13 at 11:12, Dietmar Hahn wrote:
> > > > > On Wednesday, 15 May 2013, 09:35:46, Jan Beulich wrote:
> > > > >> >>> On 15.05.13 at 08:53, Dietmar Hahn wrote:
> > > > >> > I tried iommu=debug and I can't see any faulting messages but I am not
> > > > >> > familiar with this code.
> > > > >> > I attached the logging, maybe anyone can have a look on this.
> > > >
> > > > Perhaps only (if at all) by instrumenting the hypervisor. The
> > > > question of course is how easily/quickly you can narrow down the
> > > > code region that it might be dying in. And whether it's a hypervisor
> > > > action at all that causes the hang (as opposed to something the
> > > > DRM code in Dom0 does).
> > >
> > > I added some debug code to the linux kernel and could track down the
> > > point of the hang.
> > > I used openSuSE kernel 3.7.10-1.4 but I looked at newer
> > > kernels and found that the code is similar.
> > >
> > > i915_gem_init_global_gtt(...)
> > >   ...
> > >   intel_gtt_clear_range(start / PAGE_SIZE, (end-start) / PAGE_SIZE);
> > >   ...
> > >
> > > void intel_gtt_clear_range(unsigned int first_entry, unsigned int num_entries)
> > > {
> > >   unsigned int i;
> > >
> > > ---> A printk(...) here is seen on serial line!
> > >
> > >   for (i = first_entry; i < (first_entry + num_entries); i++) {
> > >     intel_private.driver->write_entry(intel_private.base.scratch_page_dma,
> > >                                       i, 0);
> > >   }
> > >
> > > ---> A printk(...) here is never seen!
> > >
> > >   readl(intel_private.gtt+i-1);
> > > }
> > >
> > > The function behind the pointer intel_private.driver->write_entry is
> > > i965_write_entry(). And the interesting instruction seems to be:
> > >   writel(addr | pte_flags, intel_private.gtt + entry);
> > >
> > > I added another printk() on start of the function i965_write_entry().
> > > And surprisingly after printing a lot of messages the kernel came up!!!
> > > But now I had other problems like losing the audio device (maybe timeouts).
> > > So maybe the hang is a timing problem?
> > >
> > > What I wanted to check is, what the hypervisor is doing while the system hangs.
> > > Has anybody an idea maybe a timer and after 30s printing a dump of the stack of
> > > all cpus?
> >
> > Yes. Can you try the two attached patches please.
>
> I tried both but none helped. I think it couldn't be expected as the first
> patch handles an error case and the line with the second patch,
> the call of pci_dma_sync_single_for_device(), gets not reached.

OK, perhaps move the pci_dma_sync_single_for_device in the while loop?

The idea behind that flush code is to kick the GTT to do its job. But if the
SWIOTLB is used and the bounce page is used, then the writes don't end up in
the flush code area at all - until the pci_unmap_page. Or the
pci_dma_sync_single call.
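
To make the bounce-page theory concrete, here is a toy user-space model in C.
It is only a sketch, not kernel code: every sim_* name and the 64-byte "page"
are made up for illustration, and it assumes the device only ever sees the
bounce copy. CPU writes land in the driver's own page and stay invisible to
the "device" until an explicit sync copies them over, which is the role I
expect pci_dma_sync_single_for_device to play.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model of a bounce-buffered mapping (not a kernel API). */
#define SIM_PAGE_SIZE 64

struct sim_bounce_map {
    uint8_t cpu_page[SIM_PAGE_SIZE];    /* page the driver writes to */
    uint8_t bounce_page[SIM_PAGE_SIZE]; /* copy the "device" actually reads */
};

/* Mapping time: the bounce page starts out as a copy of the CPU page. */
void sim_map(struct sim_bounce_map *m)
{
    memcpy(m->bounce_page, m->cpu_page, SIM_PAGE_SIZE);
}

/* A CPU write after mapping only touches the driver's own page. */
void sim_cpu_write(struct sim_bounce_map *m, size_t off, uint8_t val)
{
    assert(off < SIM_PAGE_SIZE);
    m->cpu_page[off] = val;
}

/* Stand-in for a sync-for-device call: push CPU contents to the bounce page. */
void sim_sync_for_device(struct sim_bounce_map *m)
{
    memcpy(m->bounce_page, m->cpu_page, SIM_PAGE_SIZE);
}

/* What the "device" observes at a given offset. */
uint8_t sim_device_read(const struct sim_bounce_map *m, size_t off)
{
    assert(off < SIM_PAGE_SIZE);
    return m->bounce_page[off];
}

/* Returns 1 if the write was stale before the sync and visible after it. */
int sim_demo(void)
{
    struct sim_bounce_map m;
    uint8_t before, after;

    memset(&m, 0, sizeof(m));
    sim_map(&m);
    sim_cpu_write(&m, 3, 0xAB);
    before = sim_device_read(&m, 3); /* device still sees the old byte */
    sim_sync_for_device(&m);
    after = sim_device_read(&m, 3);  /* device now sees the new byte */
    return before == 0x00 && after == 0xAB;
}
```

In this model sim_demo() returns 1. If the real driver behaves like that, a
sync that is never reached would leave the hardware looking at stale data.
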

The other option would be to use pci_alloc_consistent (i.e. dma_alloc_coherent)
so that we would not need the PCI DMA sync calls at all. But first I would
like to verify that the theory is correct.

>
> Dietmar.
>
> --
> Company details: http://ts.fujitsu.com/imprint.html