On Tue, Jun 29, 2010 at 9:17 PM, Dan Williams wrote:
> [ copying David to see if I am barking up the wrong VT-d tree.  This is on a
> MacPro 3,1 according to dmesg so a 5400 series MCH ]
>
> On 6/29/2010 6:07 PM, Chris Li wrote:
>>
>> On Tue, Jun 29, 2010 at 4:57 PM, Dan Williams
>> OK. I can't do this test remotely so I will get back to you tomorrow.

ioatdma: Intel(R) QuickData Technology Driver 4.00
ioatdma 0000:00:0f.0: can't derive routing for PCI INT A
ioatdma 0000:00:0f.0: PCI INT A: no GSI
ioatdma 0000:00:0f.0: setting latency timer to 64
  alloc irq_desc for 57 on node -1
  alloc kstat_irqs on node -1
ioatdma 0000:00:0f.0: irq 57 for MSI/MSI-X
  alloc irq_desc for 58 on node -1
  alloc kstat_irqs on node -1
ioatdma 0000:00:0f.0: irq 58 for MSI/MSI-X
  alloc irq_desc for 59 on node -1
  alloc kstat_irqs on node -1
ioatdma 0000:00:0f.0: irq 59 for MSI/MSI-X
  alloc irq_desc for 60 on node -1
  alloc kstat_irqs on node -1
ioatdma 0000:00:0f.0: irq 60 for MSI/MSI-X
ioatdma 0000:00:0f.0: ioat2_set_chainaddr: chainaddr: ffffe000
------------[ cut here ]------------
WARNING: at drivers/dma/ioat/dma_v2.c:289 ioat2_timer_event+0xbc/0x225 [ioatdma]()
Hardware name: MacPro3,1
0000:00:0f.0: ioat2_timer_event: Channel halted (10)
Modules linked in: ioatdma(+) dca fuse rfcomm sco bridge stp llc bnep l2cap autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 uinput snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device btusb i5400_edac snd_pcm bluetooth shpchp snd_timer snd e1000e soundcore rfkill i2c_i801 edac_core iTCO_wdt snd_page_alloc applesmc i5k_amb iTCO_vendor_support input_polldev firewire_ohci firewire_core crc_itu_t radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.35-rc3+ #41
Call Trace:
 [] warn_slowpath_common+0x85/0x9d
 [] warn_slowpath_fmt+0x46/0x48
 [] ? sched_clock+0x9/0xd
 [] ioat2_timer_event+0xbc/0x225 [ioatdma]
 [] ? sched_clock_cpu+0xc3/0xce
 [] run_timer_softirq+0x1d6/0x2a5
 [] ? ioat2_timer_event+0x0/0x225 [ioatdma]
 [] ? ktime_get+0x65/0xbe
 [] __do_softirq+0xe9/0x1ae
 [] ? tick_program_event+0x2a/0x2c
 [] call_softirq+0x1c/0x30
 [] do_softirq+0x46/0x83
 [] irq_exit+0x3b/0x7d
 [] smp_apic_timer_interrupt+0x8d/0x9b
 [] apic_timer_interrupt+0x13/0x20
 [] ? mwait_idle+0x7a/0x87
 [] ? mwait_idle+0x2c/0x87
 [] cpu_idle+0xaa/0xe4
 [] start_secondary+0x253/0x294
---[ end trace 19d8162e5c74f492 ]---
ioatdma 0000:00:0f.0: Self-test copy timed out, disabling
ioatdma 0000:00:0f.0: Freeing 2 in use descriptors!
ioatdma 0000:00:0f.0: Intel(R) I/OAT DMA Engine init failed
ioatdma 0000:00:0f.0: can't derive routing for PCI INT A

> I was thinking in the BIOS, but appending iommu=off to the kernel
> command-line should also do the trick.

iommu=off causes the kernel to not boot properly. BTW, that is why I
lost my machine remotely last night. Some SATA errors keep printing
on the console. Let me try to collect them once I reboot the machine
again.

> ...but the failure is not intermittent, right?

It happens every time.

>
> Where it fell over is a pretty straightforward usage of the dma engine and
> it is failing on the first transaction that the first channel issues to
> memory.  You should be able to 'modprobe ioatdma' after you boot and watch
> it fail again if my suspicion is correct... if the signature changes that
> would also be good to know.

The delta seems to be this line:

ioatdma 0000:00:0f.0: ioat2_set_chainaddr: chainaddr: ffffe000

Chris