* regression ioatdma 3.3
@ 2012-01-27 13:31 William Dauchy
  2012-01-27 14:47 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 31+ messages in thread
From: William Dauchy @ 2012-01-27 13:31 UTC (permalink / raw)
  To: xen-devel

Hello,

I am having trouble loading the IOATDMA module under Xen 4.1.2 with a
Linux 3.3 dom0.

CONFIG_INTEL_IOATDMA=m
CONFIG_IGB=y

It was working with Linux 3.1.5. The regression seems to date from
Linux 3.2. I tried to do a `git bisect` but I'm facing other
regressions which make debugging harder.

Here is the call trace when loading the module in dom0:

dca service started, version 1.12.1
ioatdma: Intel(R) QuickData Technology Driver 4.00
ioatdma 0000:00:16.0: enabling device (0000 -> 0002)
xen: registering gsi 43 triggering 0 polarity 1
xen: --> pirq=43 -> irq=43 (gsi=43)
------------[ cut here ]------------
kernel BUG at /linux-3.3/drivers/dma/ioat/dma_v2.c:163!
invalid opcode: 0000 [#1] SMP
Modules linked in: ioatdma(+) dca ebt_ip6 ebt_dnat ebt_ip
ebtable_broute ebt_snat ebtable_nat ebtable_filter ebtables bridge stp
llc ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_mod
button

Pid: 0, comm: swapper/0 Not tainted 3.3.0-dom0-6357-i386+ #24 Dell   C6100           /0D61XP
EIP: 0061:[<f512c524>] EFLAGS: 00010246 CPU: 0
EIP is at __cleanup+0x154/0x160 [ioatdma]
EAX: 00000000 EBX: e9dd44c0 ECX: 769ed7af EDX: 00000002
ESI: e90fe48c EDI: 00000002 EBP: eb40bf9c ESP: eb40bf7c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Process swapper/0 (pid: 0, ti=eb40a000 task=c1401ee0 task.ti=c13fc000)
Stack:
 eadc8040 00010002 18ae4040 00000002 0000bf9c e90fe48c e90fe4bc 00000006
 eb40bfb0 f512c770 18ae4040 00000000 e90fe4f8 eb40bfc0 c10347cb 00000001
 00000018 eb40bff8 c10350ab 00000000 00000000 00000000 00000000 00000000
Call Trace:
 [<f512c770>] ioat2_cleanup_event+0x30/0x50 [ioatdma]
 [<c10347cb>] tasklet_action+0x9b/0xb0
 [<c10350ab>] __do_softirq+0x7b/0x110
 [<c1035030>] ? irq_enter+0x70/0x70
 <IRQ>
 [<c1034e7e>] ? irq_exit+0x6e/0xa0
 [<c11bde70>] ? xen_evtchn_do_upcall+0x20/0x30
 [<c1322907>] ? xen_do_upcall+0x7/0xc
 [<c10013a7>] ? hypercall_page+0x3a7/0x1000
 [<c1006172>] ? xen_safe_halt+0x12/0x20
 [<c1010582>] ? default_idle+0x32/0x60
 [<c1008596>] ? cpu_idle+0x66/0xa0
 [<c130bd58>] ? rest_init+0x58/0x60
 [<c14237d2>] ? start_kernel+0x2e4/0x2ea
 [<c142331d>] ? kernel_init+0x11b/0x11b
 [<c14230ba>] ? i386_start_kernel+0xa9/0xb0
 [<c1426abb>] ? xen_start_kernel+0x5a2/0x5aa
Code: 00 e8 41 7a f0 cb 8b 15 40 1a 40 c1 8d 14 10 8d 46 3c e8 60 ea
f0 cb 83 c4 14 5b 5e 5f c9 c3 31 d2 89 df 31 c0 eb a2 84 c0 75 b5 <0f>
0b eb fe 90 8d b4 26 00 00 00 00 55 89 e5 57 56 53 89 c3 83
EIP: [<f512c524>] __cleanup+0x154/0x160 [ioatdma] SS:ESP 0069:eb40bf7c
---[ end trace 902e93593e49fa50 ]---
Kernel panic - not syncing: Fatal exception in interrupt


Does anybody have any clue?

Regards,
-- 
William

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: regression ioatdma 3.3
  2012-01-27 13:31 regression ioatdma 3.3 William Dauchy
@ 2012-01-27 14:47 ` Konrad Rzeszutek Wilk
  2012-01-27 15:02   ` William Dauchy
  2012-02-19 22:31   ` Jonathan Nieder
  0 siblings, 2 replies; 31+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-01-27 14:47 UTC (permalink / raw)
  To: William Dauchy; +Cc: xen-devel

On Fri, Jan 27, 2012 at 02:31:55PM +0100, William Dauchy wrote:
> Hello,
> 
> I am having trouble loading the IOATDMA module under Xen 4.1.2 with a
> Linux 3.3 dom0.

So you are using the rc1 version? What exact git commit are you using?

> 
> CONFIG_INTEL_IOATDMA=m
> CONFIG_IGB=y
> 
> It was working with Linux 3.1.5. The regression seems to date from
> Linux 3.2. I tried to do a `git bisect` but I'm facing other

3.2 you say? This below is 3.3?

> regressions which make debugging harder.

Such as?

> 
> Here is the call trace when loading the module in dom0:

Is the problem present with baremetal (same exact kernel?)
Do you see this if you run a 64-bit dom0?

> 
> dca service started, version 1.12.1
> ioatdma: Intel(R) QuickData Technology Driver 4.00
> ioatdma 0000:00:16.0: enabling device (0000 -> 0002)
> xen: registering gsi 43 triggering 0 polarity 1
> xen: --> pirq=43 -> irq=43 (gsi=43)
> ------------[ cut here ]------------
> kernel BUG at /linux-3.3/drivers/dma/ioat/dma_v2.c:163!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: ioatdma(+) dca ebt_ip6 ebt_dnat ebt_ip
> ebtable_broute ebt_snat ebtable_nat ebtable_filter ebtables bridge stp
> llc ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_mod
> button
> 
> Pid: 0, comm: swapper/0 Not tainted 3.3.0-dom0-6357-i386+ #24 Dell   C6100           /0D61XP
> EIP: 0061:[<f512c524>] EFLAGS: 00010246 CPU: 0
> EIP is at __cleanup+0x154/0x160 [ioatdma]
> EAX: 00000000 EBX: e9dd44c0 ECX: 769ed7af EDX: 00000002
> ESI: e90fe48c EDI: 00000002 EBP: eb40bf9c ESP: eb40bf7c
>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
> Process swapper/0 (pid: 0, ti=eb40a000 task=c1401ee0 task.ti=c13fc000)
> Stack:
>  eadc8040 00010002 18ae4040 00000002 0000bf9c e90fe48c e90fe4bc 00000006
>  eb40bfb0 f512c770 18ae4040 00000000 e90fe4f8 eb40bfc0 c10347cb 00000001
>  00000018 eb40bff8 c10350ab 00000000 00000000 00000000 00000000 00000000
> Call Trace:
>  [<f512c770>] ioat2_cleanup_event+0x30/0x50 [ioatdma]
>  [<c10347cb>] tasklet_action+0x9b/0xb0
>  [<c10350ab>] __do_softirq+0x7b/0x110
>  [<c1035030>] ? irq_enter+0x70/0x70
>  <IRQ>
>  [<c1034e7e>] ? irq_exit+0x6e/0xa0
>  [<c11bde70>] ? xen_evtchn_do_upcall+0x20/0x30
>  [<c1322907>] ? xen_do_upcall+0x7/0xc
>  [<c10013a7>] ? hypercall_page+0x3a7/0x1000
>  [<c1006172>] ? xen_safe_halt+0x12/0x20
>  [<c1010582>] ? default_idle+0x32/0x60
>  [<c1008596>] ? cpu_idle+0x66/0xa0
>  [<c130bd58>] ? rest_init+0x58/0x60
>  [<c14237d2>] ? start_kernel+0x2e4/0x2ea
>  [<c142331d>] ? kernel_init+0x11b/0x11b
>  [<c14230ba>] ? i386_start_kernel+0xa9/0xb0
>  [<c1426abb>] ? xen_start_kernel+0x5a2/0x5aa
> Code: 00 e8 41 7a f0 cb 8b 15 40 1a 40 c1 8d 14 10 8d 46 3c e8 60 ea
> f0 cb 83 c4 14 5b 5e 5f c9 c3 31 d2 89 df 31 c0 eb a2 84 c0 75 b5 <0f>
> 0b eb fe 90 8d b4 26 00 00 00 00 55 89 e5 57 56 53 89 c3 83
> EIP: [<f512c524>] __cleanup+0x154/0x160 [ioatdma] SS:ESP 0069:eb40bf7c
> ---[ end trace 902e93593e49fa50 ]---
> Kernel panic - not syncing: Fatal exception in interrupt
> 
> 
> Does anybody have any clue?
> 
> Regards,
> -- 
> William
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

* Re: regression ioatdma 3.3
  2012-01-27 14:47 ` Konrad Rzeszutek Wilk
@ 2012-01-27 15:02   ` William Dauchy
  2012-02-19 22:31   ` Jonathan Nieder
  1 sibling, 0 replies; 31+ messages in thread
From: William Dauchy @ 2012-01-27 15:02 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel

On Fri, Jan 27, 2012 at 3:47 PM, Konrad Rzeszutek Wilk
<konrad@darnok.org> wrote:
> So you are using the rc1 version? What exact git commit are you using?

I pulled the latest revision, 74ea15d.

> 3.2 you say? This below is 3.3?

Yes. I was using a 3.1 kernel. After upgrading to 3.2 I hit the problem,
and thought it best to report it against the latest 3.3-rc kernel.

> Is the problem present with baremetal (same exact kernel?)

I did test with a bare-metal kernel and didn't get any problem. So
it seems to be a Xen problem.

> Do you see this if you run a 64-bit dom0?

I didn't test this.

-- 
William

* Re: regression ioatdma 3.3
  2012-01-27 14:47 ` Konrad Rzeszutek Wilk
  2012-01-27 15:02   ` William Dauchy
@ 2012-02-19 22:31   ` Jonathan Nieder
  2012-02-20 18:16     ` Jonathan Nieder
  1 sibling, 1 reply; 31+ messages in thread
From: Jonathan Nieder @ 2012-02-19 22:31 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Thomas Goirand, xen-devel, William Dauchy

forwarded 660554 http://thread.gmane.org/gmane.comp.emulators.xen.devel/121604
quit
(cc-ing Thomas, since he ran into the same bug)
Hi,

Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 27, 2012 at 02:31:55PM +0100, William Dauchy wrote:

>> I am having trouble loading the IOATDMA module under Xen 4.1.2 with a
>> Linux 3.3 dom0.
>
> So you are using the rc1 version? What exact git commit are you using?

Broken:

 v3.2.6 + Debian patches (zigo)
 v3.3-rc2~22 (William)

Not broken:

 v3.1.8 + Debian patches, presumably (zigo)
 v3.1.5 (William)

[...]
>> Here is the call trace when loading the module in dom0:
>
> Is the problem present with baremetal (same exact kernel?)

No.

> Do you see this if you run a 64-bit dom0?

I'm guessing not, just based on the crazy coincidence that both
reports were with 32-bit kernels.  But who knows. ;-)

[...]
>> kernel BUG at /linux-3.3/drivers/dma/ioat/dma_v2.c:163!
>> invalid opcode: 0000 [#1] SMP
>> Modules linked in: ioatdma(+) dca ebt_ip6 ebt_dnat ebt_ip
>> ebtable_broute ebt_snat ebtable_nat ebtable_filter ebtables bridge stp
>> llc ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_mod
>> button
>> 
>> Pid: 0, comm: swapper/0 Not tainted 3.3.0-dom0-6357-i386+ #24 Dell   C6100           /0D61XP

This is

	active = ioat2_ring_active(ioat);
	for (i = 0; i < active && !seen_current; i++) {
		...
		if (tx->phys == phys_complete)
			seen_current = true;
	}
	...
	BUG_ON(active && !seen_current); /* no active descs have written a completion? */

Any hints for tracking it down?

Thanks,
Jonathan

* Re: regression ioatdma 3.3
  2012-02-19 22:31   ` Jonathan Nieder
@ 2012-02-20 18:16     ` Jonathan Nieder
  2012-02-25  7:46       ` Thomas Goirand
  0 siblings, 1 reply; 31+ messages in thread
From: Jonathan Nieder @ 2012-02-20 18:16 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Thomas Goirand, xen-devel, William Dauchy

> Konrad Rzeszutek Wilk wrote:

>> Do you see this if you run a 64-bit dom0?

Looks like no.  Thomas reports[1]:

> I just tried with the amd64 kernel and Xen, and I didn't see any issue.
>
> However, it is important that Xen 4.1 + Linux 3.2 works with a 32 bits
> kernel, because that is the most optimized configuration (eg: 64 bits
> hypervisor, 32 bits kernel and 32 bits userland).

Maybe Andres's patches are relevant.

Hope that helps,
Jonathan

[1] http://bugs.debian.org/660554#25

* Re: regression ioatdma 3.3
  2012-02-20 18:16     ` Jonathan Nieder
@ 2012-02-25  7:46       ` Thomas Goirand
  2012-02-25 21:13         ` William Dauchy
  2012-03-02  5:57         ` ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2 Jonathan Nieder
  0 siblings, 2 replies; 31+ messages in thread
From: Thomas Goirand @ 2012-02-25  7:46 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Konrad Rzeszutek Wilk, xen-devel, William Dauchy

On 02/21/2012 02:16 AM, Jonathan Nieder wrote:
>> I just tried with the amd64 kernel and Xen, and I didn't see any issue.
>>
>> However, it is important that Xen 4.1 + Linux 3.2 works with a 32 bits
>> kernel, because that is the most optimized configuration (eg: 64 bits
>> hypervisor, 32 bits kernel and 32 bits userland).
>>     
> Maybe Andres's patches are relevant.
>
> Hope that helps,
> Jonathan
>
> [1] http://bugs.debian.org/660554#25
>   
Hi,

Which patch are you referring to? Is there anything I can do to help
testing/investigating this? Should this be reported in the LKML? How
can I find who's the author of this driver?

Thomas Goirand (zigo)

* Re: regression ioatdma 3.3
  2012-02-25  7:46       ` Thomas Goirand
@ 2012-02-25 21:13         ` William Dauchy
  2012-03-02  5:57         ` ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2 Jonathan Nieder
  1 sibling, 0 replies; 31+ messages in thread
From: William Dauchy @ 2012-02-25 21:13 UTC (permalink / raw)
  To: Thomas Goirand; +Cc: Jonathan Nieder, Konrad Rzeszutek Wilk, xen-devel

Hi Thomas,

On Sat, Feb 25, 2012 at 8:46 AM, Thomas Goirand <thomas@goirand.fr> wrote:
> How can I find who's the author of this driver?

I don't think the problem is related to the driver itself, because it
works without Xen.
I'm also looking for hints to fix the problem.

-- 
William

* Re: ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-02-25  7:46       ` Thomas Goirand
  2012-02-25 21:13         ` William Dauchy
@ 2012-03-02  5:57         ` Jonathan Nieder
  2012-03-02  6:42           ` Dan Williams
  1 sibling, 1 reply; 31+ messages in thread
From: Jonathan Nieder @ 2012-03-02  5:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: Thomas Goirand, Konrad Rzeszutek Wilk, xen-devel, William Dauchy,
	Maciej Sosnowski, pkg-xen-devel, linux-kernel

Hi Dan,

Thomas and William (cc-ed) have been having trouble loading the
ioatdma driver on a 32-bit Xen dom0.  The module loads automatically
at boot time and trips

	BUG_ON(active && !seen_current); /* no active descs have written a completion? */

from drivers/dma/ioat/dma_v2.c.  That check has been present since
5cbafa65b92e (ioat2,3: convert to a true ring buffer, 2009-08-26).
The bug is probably in Xen code and seems to be a regression (the bug
is present in 3.2 but not 3.1.8).

Thomas Goirand wrote:
> On 03/01/2012 11:53 PM, Bastian Blank wrote:
>> On Thu, Mar 01, 2012 at 06:02:15PM +0800, Thomas Goirand wrote:

>>>                     Any clue why I don't see crashes without Xen, with a
>>> 64 bits kernel, or with earlier versions of Linux (eg: 3.1 for example)?
>>
>> xen/i386 uses a different memory model to anything else, this may be a
>> problem.
[...]
> Replacing BUG_ON by a WARN_ON, and adding #define DEBUG 1 on top of
> dma_v2.c, my kernel booted, and I had the attached dmesg output.
>
> Blacklisting the ioatdma kernel module of course, solved the issue.
>
> I hope that helps, please let me know if I should do more to help. If
> you need access to my server, that's possible (I use it only for
> packaging XCP and some tests...).

I don't expect you to debug this Xen-specific bug, but I'm wondering:
is there any reason this check has to be a BUG_ON instead of a
WARN_ON?  If there is some way to recover when the impossible happens,
that would make using and debugging the kernel a little easier.

Curious,
Jonathan
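
A user-space sketch of the BUG_ON/WARN_ON distinction asked about above. In the kernel, BUG_ON(cond) treats a true condition as fatal (here it fired in softirq context, hence the panic), while WARN_ON(cond) prints a warning and evaluates to cond so the caller can attempt a recovery. The MOCK_* macros, mock_warn() and both check functions are hypothetical stand-ins written for this illustration, not kernel code.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for BUG_ON(): report and stop hard. */
#define MOCK_BUG_ON(cond)                                          \
    do {                                                           \
        if (cond) {                                                \
            fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); \
            abort();                                               \
        }                                                          \
    } while (0)

/* Hypothetical stand-in for WARN_ON(): report, then return the
 * condition so the caller can choose a recovery path. */
static bool mock_warn(bool cond, const char *file, int line)
{
    if (cond)
        fprintf(stderr, "WARNING at %s:%d\n", file, line);
    return cond;
}
#define MOCK_WARN_ON(cond) mock_warn(!!(cond), __FILE__, __LINE__)

/* BUG_ON-style check: does not return if the invariant is violated. */
static void strict_check(int active, bool seen_current)
{
    MOCK_BUG_ON(active && !seen_current);
}

/* WARN_ON-style check: returns false instead of stopping. */
static bool lenient_check(int active, bool seen_current)
{
    if (MOCK_WARN_ON(active && !seen_current))
        return false; /* hypothetical recovery: skip this cleanup pass */
    return true;
}
```

Dan's reply explains why the lenient path is not obviously safe for this driver: by the time the check fails, every pending descriptor has already been completed.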

* Re: ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-02  5:57         ` ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2 Jonathan Nieder
@ 2012-03-02  6:42           ` Dan Williams
  2012-03-02 16:21               ` Bastian Blank
  0 siblings, 1 reply; 31+ messages in thread
From: Dan Williams @ 2012-03-02  6:42 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: xen-devel, pkg-xen-devel, Maciej Sosnowski, linux-kernel,
	William Dauchy, Konrad Rzeszutek Wilk, Thomas Goirand


[replying from phone]

WARN_ON may work, but then the kernel may be subject to random hangs from missed
i/o completions.  Is xen32 using vt-d?  Just wondering if writes from ioat
device are getting misdirected.

--
Dan
On Mar 1, 2012 9:57 PM, "Jonathan Nieder" <jrnieder@gmail.com> wrote:

> Hi Dan,
>
> Thomas and William (cc-ed) have been having trouble loading the
> ioatdma driver on a 32-bit Xen dom0.  The module loads automatically
> at boot time and trips
>
>        BUG_ON(active && !seen_current); /* no active descs have written a completion? */
>
> from drivers/dma/ioat/dma_v2.c.  That check has been present since
> 5cbafa65b92e (ioat2,3: convert to a true ring buffer, 2009-08-26).
> The bug is probably in Xen code and seems to be a regression (the bug
> is present in 3.2 but not 3.1.8).
>
> Thomas Goirand wrote:
> > On 03/01/2012 11:53 PM, Bastian Blank wrote:
> >> On Thu, Mar 01, 2012 at 06:02:15PM +0800, Thomas Goirand wrote:
>
> >>>                     Any clue why I don't see crashes without Xen, with a
> >>> 64 bits kernel, or with earlier versions of Linux (eg: 3.1 for example)?
> >>
> >> xen/i386 uses a different memory model to anything else, this may be a
> >> problem.
> [...]
> > Replacing BUG_ON by a WARN_ON, and adding #define DEBUG 1 on top of
> > dma_v2.c, my kernel booted, and I had the attached dmesg output.
> >
> > Blacklisting the ioatdma kernel module of course, solved the issue.
> >
> > I hope that helps, please let me know if I should do more to help. If
> > you need access to my server, that's possible (I use it only for
> > packaging XCP and some tests...).
>
> I don't expect you to debug this Xen-specific bug, but I'm wondering:
> is there any reason this check has to be a BUG_ON instead of a
> WARN_ON?  If there is some way to recover when the impossible happens,
> that would make using and debugging the kernel a little easier.
>
> Curious,
> Jonathan
>

* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-02  6:42           ` Dan Williams
@ 2012-03-02 16:21               ` Bastian Blank
  0 siblings, 0 replies; 31+ messages in thread
From: Bastian Blank @ 2012-03-02 16:21 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jonathan Nieder, xen-devel, pkg-xen-devel, Maciej Sosnowski,
	linux-kernel, William Dauchy, Konrad Rzeszutek Wilk

On Thu, Mar 01, 2012 at 10:42:44PM -0800, Dan Williams wrote:
> WARN_ON may work, but then the kernel may be subject to random hangs from missed
> i/o completions.

Why is that? Currently it just dies if it was triggered via interrupt and
for some reason no active descriptor was found.

>                   Is xen32 using vt-d?

Yes. Xen can use VT-D.

>                                         Just wondering if writes from ioat
> device are getting misdirected.

How do VT-D and ioat interact?

Bastian

-- 
Too much of anything, even love, isn't necessarily a good thing.
		-- Kirk, "The Trouble with Tribbles", stardate 4525.6

* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-02 16:21               ` Bastian Blank
@ 2012-03-02 16:44               ` Dan Williams
  2012-03-02 17:57                 ` Bastian Blank
  -1 siblings, 1 reply; 31+ messages in thread
From: Dan Williams @ 2012-03-02 16:44 UTC (permalink / raw)
  To: Dan Williams, Jonathan Nieder, xen-devel, pkg-xen-devel,
	Maciej Sosnowski, linux-kernel, William Dauchy,
	Konrad Rzeszutek Wilk, Dave Jiang

On Fri, Mar 2, 2012 at 8:21 AM, Bastian Blank <waldi@debian.org> wrote:
> On Thu, Mar 01, 2012 at 10:42:44PM -0800, Dan Williams wrote:
>> WARN_ON may work, but then the kernel may be subject to random hangs from missed
>> i/o completions.

...actually descriptors completing too early.

>
> Why is that? Currently it just dies if it was triggered via interrupt and
> for some reason no active descriptor was found.

No, it's not the case that "no active descriptor was found".

The channel is walking through the submitted descriptor chain to catch
up with what was last posted to 'phys_complete'.  It expects to stop
when seeing phys_complete, but if it never finds it the driver ends up
completing the entire pending ring.  The BUG_ON is there because the
driver has just completed every descriptor in the chain, and if the
kernel was depending on proper descriptor ordering it may have just
violated it.

So I take it back, we can't go to WARN_ON, because the state of the
system is compromised and we need to bring it to a halt.

That said, the code is likely failing in the self-test, so the system
is probably fine, but if this happened in the network or RAID layer it
is potentially fatal.

>>                   Is xen32 using vt-d?
>
> Yes. Xen can use VT-D.
>
>>                                         Just wondering if writes from ioat
>> device are getting misdirected.
>
> How do VT-D and ioat interact?
>

Same as any other PCI bus-mastering device: via dma_map to get an
I/O-virtual address.

* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-02 16:44               ` Dan Williams
@ 2012-03-02 17:57                 ` Bastian Blank
  2012-03-02 19:31                   ` Dan Williams
  0 siblings, 1 reply; 31+ messages in thread
From: Bastian Blank @ 2012-03-02 17:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jonathan Nieder, xen-devel, pkg-xen-devel, Maciej Sosnowski,
	linux-kernel, William Dauchy, Konrad Rzeszutek Wilk, Dave Jiang

On Fri, Mar 02, 2012 at 08:44:00AM -0800, Dan Williams wrote:
> On Fri, Mar 2, 2012 at 8:21 AM, Bastian Blank <waldi@debian.org> wrote:
> > On Thu, Mar 01, 2012 at 10:42:44PM -0800, Dan Williams wrote:
> >> WARN_ON may work, but then the kernel may be subject to random hangs from missed
> >> i/o completions.
> ...actually descriptors completing too early.

The interrupt happens while the module is still loading, so most likely
directly after enabling them. There should be no request in flight yet.

What puzzles me is the mix of different data types in the ioatdma
driver:

| u64 completion = *chan->completion;
| unsigned long phys_complete = completion & ~0x3f;

The state is 64 bits long, but is silently truncated to a 32-bit
value.

phys_complete (a 32 bit value) gets compared to struct
dma_async_tx_descriptor.phys, which is defined as dma_addr_t, a _64_ bit
value.
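
Bastian's type concern can be checked in user space. The sketch assumes an i386 build where unsigned long is 32 bits (modelled with uint32_t) and dma_addr_t is 64 bits (modelled with uint64_t), as in the HIGHMEM64G case Dan mentions later in the thread; the function names and addresses are made up. The mask ~0x3f clears the low six bits, and the truncation then drops the upper 32 bits, so a completion address above 4 GiB can never compare equal to the descriptor's 64-bit address. Whether that is what actually happens on the reporters' machines is not established in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models the i386 path: mask the low six bits, then truncate to 32 bits
 * (uint32_t stands in for a 32-bit unsigned long). */
static uint32_t model_phys_complete_32(uint64_t completion)
{
    return (uint32_t)(completion & ~(uint64_t)0x3f);
}

/* Same mask, but keeping the full 64-bit width of dma_addr_t. */
static uint64_t model_phys_complete_64(uint64_t completion)
{
    return completion & ~(uint64_t)0x3f;
}

/* Would the cleanup walk recognise a descriptor at desc_phys?
 * The 32-bit value is zero-extended before the comparison, as C does
 * when comparing it with a 64-bit operand. */
static bool model_matches_32(uint64_t completion, uint64_t desc_phys)
{
    return (uint64_t)model_phys_complete_32(completion) == desc_phys;
}
```

For a descriptor at 0x123456780 (above 4 GiB), the 64-bit computation recovers the address, while the 32-bit one yields 0x23456780 and never matches; below 4 GiB both agree.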

Bastian

-- 
... The prejudices people feel about each other disappear when they get
to know each other.
		-- Kirk, "Elaan of Troyius", stardate 4372.5

* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-02 17:57                 ` Bastian Blank
@ 2012-03-02 19:31                   ` Dan Williams
  2012-03-02 20:08                     ` Bastian Blank
  2012-03-05 15:26                     ` Thomas Goirand
  0 siblings, 2 replies; 31+ messages in thread
From: Dan Williams @ 2012-03-02 19:31 UTC (permalink / raw)
  To: Dan Williams, Jonathan Nieder, xen-devel, pkg-xen-devel,
	Maciej Sosnowski, linux-kernel, William Dauchy,
	Konrad Rzeszutek Wilk, Dave Jiang

On Fri, Mar 2, 2012 at 9:57 AM, Bastian Blank <waldi@debian.org> wrote:
> On Fri, Mar 02, 2012 at 08:44:00AM -0800, Dan Williams wrote:
>> On Fri, Mar 2, 2012 at 8:21 AM, Bastian Blank <waldi@debian.org> wrote:
>> > On Thu, Mar 01, 2012 at 10:42:44PM -0800, Dan Williams wrote:
>> >> WARN_ON may work, but then the kernel may be subject to random hangs from missed
>> >> i/o completions.
>> ...actually descriptors completing too early.
>
> The interrupt happens while the module is still loading, so most likely
> directly after enabling them. There should be no request in flight yet.
>
> What puzzles me is the mix of different data types in the ioatdma
> driver:
>
> | u64 completion = *chan->completion;
> | unsigned long phys_complete = completion & ~0x3f;
>
> The state is 64 bits long, but is silently truncated to a 32-bit
> value.
>
> phys_complete (a 32 bit value) gets compared to struct
> dma_async_tx_descriptor.phys, which is defined as dma_addr_t, a _64_ bit
> value.

The assumption is that the driver's control structures are not in high
memory so all address values will only have 32-bits of valid data, but
maybe xen32 changes that assumption?

Can you send the log of the driver load with debug enabled?

diff --git a/drivers/dma/ioat/dma.c b/drivers/dma/ioat/dma.c
index a4d6cb0..82472de 100644
--- a/drivers/dma/ioat/dma.c
+++ b/drivers/dma/ioat/dma.c
@@ -24,7 +24,7 @@
  * This driver supports an Intel I/OAT DMA engine, which does asynchronous
  * copy operations.
  */
-
+#define DEBUG
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/slab.h>
diff --git a/drivers/dma/ioat/dma_v2.c b/drivers/dma/ioat/dma_v2.c
index 5d65f83..da337e7 100644
--- a/drivers/dma/ioat/dma_v2.c
+++ b/drivers/dma/ioat/dma_v2.c
@@ -24,7 +24,7 @@
  * This driver supports an Intel I/OAT DMA engine (versions >= 2), which
  * does asynchronous data movement and checksumming operations.
  */
-
+#define DEBUG
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/slab.h>
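
For context on the patch above: when dynamic debug is not in use, the kernel's pr_debug() and dev_dbg() only produce output if DEBUG is defined, and that test is made when the printk/device headers are processed, which is why the define goes above the #include lines. The sketch below reproduces that compile-time gating in user space with a hypothetical model_dbg() macro; it is an analogy, not the kernel's implementation.

```c
/* Must be defined before the logging macros below are set up,
 * mirroring why the patch places it above the #include lines. */
#define DEBUG

#include <stdio.h>

/* Hypothetical stand-in for pr_debug()/dev_dbg(): real output only
 * when DEBUG was defined at this point; otherwise a no-op. */
#ifdef DEBUG
#define model_dbg(...) fprintf(stderr, __VA_ARGS__)
#define MODEL_DBG_ENABLED 1
#else
#define model_dbg(...) do { } while (0)
#define MODEL_DBG_ENABLED 0
#endif

/* Logs a debug line and reports whether debug output is compiled in. */
static int model_cleanup_logged(int active)
{
    model_dbg("cleanup: %d active descriptors\n", active);
    return MODEL_DBG_ENABLED;
}
```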

* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-02 19:31                   ` Dan Williams
@ 2012-03-02 20:08                     ` Bastian Blank
  2012-03-02 20:16                       ` Dan Williams
  2012-03-05 15:26                     ` Thomas Goirand
  1 sibling, 1 reply; 31+ messages in thread
From: Bastian Blank @ 2012-03-02 20:08 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jonathan Nieder, xen-devel, pkg-xen-devel, Maciej Sosnowski,
	linux-kernel, William Dauchy, Konrad Rzeszutek Wilk, Dave Jiang

On Fri, Mar 02, 2012 at 11:31:56AM -0800, Dan Williams wrote:
> On Fri, Mar 2, 2012 at 9:57 AM, Bastian Blank <waldi@debian.org> wrote:
> > phys_complete (a 32 bit value) gets compared to struct
> > dma_async_tx_descriptor.phys, which is defined as dma_addr_t, a _64_ bit
> > value.
> The assumption is that the driver's control structures are not in high
> memory so all address values will only have 32-bits of valid data,

Can you back that up by some kernel documentation? There is a reason why
pci_pool_alloc uses dma_addr_t to store the address and _not_ unsigned
long. These are physical addresses, nothing the kernel can access
directly without a mapping.

>                                                                    but
> maybe xen32 changes that assumption?

Xen changes a lot of things in memory management. This includes that
physical addresses != machine addresses, which is where i915 failed horribly.
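
To illustrate "physical != machine addresses": a paravirtualised dom0 works with pseudo-physical frame numbers, and Xen translates them to machine frames, which (absent further IOMMU translation) are what a device sees when it performs DMA. The table and function below are a hypothetical toy; real dom0 kernels go through Xen's p2m machinery and the DMA API, not a static array. They only show how a page that looks low to the guest can be backed by a machine frame above 4 GiB, the kind of address the 32-bit truncation discussed above would mangle.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_MASK  ((UINT64_C(1) << MODEL_PAGE_SHIFT) - 1)

/* Hypothetical pseudo-physical frame -> machine frame table. */
static const uint64_t model_p2m[] = {
    0x000100,  /* pfn 0 -> machine frame below 4 GiB */
    0x123456,  /* pfn 1 -> machine frame above 4 GiB */
    0x000200,  /* pfn 2 -> machine frame below 4 GiB */
};

/* Translate a guest pseudo-physical address to a machine address.
 * Returns 0 for frames outside the toy table. */
static uint64_t model_pseudo_to_machine(uint64_t paddr)
{
    uint64_t pfn = paddr >> MODEL_PAGE_SHIFT;

    if (pfn >= sizeof(model_p2m) / sizeof(model_p2m[0]))
        return 0;
    return (model_p2m[pfn] << MODEL_PAGE_SHIFT) | (paddr & MODEL_PAGE_MASK);
}
```

Here guest address 0x1040 (well under 4 GiB) maps to machine address 0x123456040, which no longer fits in 32 bits.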

> Can you send the log of the driver load with debug enabled?

No, I don't have that hardware.

Bastian

-- 
Each kiss is as the first.
		-- Miramanee, Kirk's wife, "The Paradise Syndrome",
		   stardate 4842.6

* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-02 20:08                     ` Bastian Blank
@ 2012-03-02 20:16                       ` Dan Williams
  2012-03-02 20:56                         ` Bastian Blank
  0 siblings, 1 reply; 31+ messages in thread
From: Dan Williams @ 2012-03-02 20:16 UTC (permalink / raw)
  To: Dan Williams, Jonathan Nieder, xen-devel, pkg-xen-devel,
	Maciej Sosnowski, linux-kernel, William Dauchy,
	Konrad Rzeszutek Wilk, Dave Jiang

On Fri, Mar 2, 2012 at 12:08 PM, Bastian Blank <waldi@debian.org> wrote:
> On Fri, Mar 02, 2012 at 11:31:56AM -0800, Dan Williams wrote:
>> On Fri, Mar 2, 2012 at 9:57 AM, Bastian Blank <waldi@debian.org> wrote:
>> > phys_complete (a 32 bit value) gets compared to struct
>> > dma_async_tx_descriptor.phys, which is defined as dma_addr_t, a _64_ bit
>> > value.
>> The assumption is that the driver's control structures are not in high
>> memory so all address values will only have 32-bits of valid data,
>
> Can you back that up by some kernel documentation? There is a reason why
> pci_pool_alloc uses dma_addr_t to store the address and _not_ unsigned
> long. These are physical addresses, nothing the kernel can access
> directly without a mapping.

High memory can only be accessed with kmap(), so the assumption is
that dma_alloc never gives a buffer address above 32-bits on a 32-bit
build.  Yes, if HIGHMEM64G is set dma_addr_t becomes 64-bit, but that
is only to access high memory mapped application buffers via dma_map.
I'm not aware of any documentation in this area.

I don't mind bumping up the size if xen32 is changing the above
assumptions, but I'd want confirmation that this is the failure
scenario.

* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-02 20:16                       ` Dan Williams
@ 2012-03-02 20:56                         ` Bastian Blank
  2012-03-02 21:17                           ` Dan Williams
  0 siblings, 1 reply; 31+ messages in thread
From: Bastian Blank @ 2012-03-02 20:56 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jonathan Nieder, xen-devel, pkg-xen-devel, Maciej Sosnowski,
	linux-kernel, William Dauchy, Konrad Rzeszutek Wilk, Dave Jiang

On Fri, Mar 02, 2012 at 12:16:47PM -0800, Dan Williams wrote:
> On Fri, Mar 2, 2012 at 12:08 PM, Bastian Blank <waldi@debian.org> wrote:
> > On Fri, Mar 02, 2012 at 11:31:56AM -0800, Dan Williams wrote:
> >> On Fri, Mar 2, 2012 at 9:57 AM, Bastian Blank <waldi@debian.org> wrote:
> >> > phys_complete (a 32 bit value) gets compared to struct
> >> > dma_async_tx_descriptor.phys, which is defined as dma_addr_t, a _64_ bit
> >> > value.
> >> The assumption is that the driver's control structures are not in high
> >> memory so all address values will only have 32-bits of valid data,
> > Can you back that up by some kernel documentation? There is a reason why
> > pci_pool_alloc uses dma_addr_t to store the address and _not_ unsigned
> > long. These are physical addresses, nothing the kernel can access
> > directly without a mapping.
> High memory can only be accessed with kmap(), so the assumption is
> that dma_alloc never gives a buffer address above 32-bits on a 32-bit
> build.  Yes, if HIGHMEM64G is set dma_addr_t becomes 64-bit, but that
> is only to access high memory mapped application buffers via dma_map.

All memory needs to be mapped.  Linux just has a default mapping of 1GiB
of the memory handy. However, this is irrelevant for the physical DMA
addresses we are talking about.

I assume these devices have a 64-bit DMA mask, so they can address
memory above 4GiB.  And the kernel will happily assign this memory
if necessary or useful.
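
The DMA-mask argument reduces to a simple range check: a device can only reach a buffer whose last byte fits within its mask. The kernel builds masks with DMA_BIT_MASK(n); the sketch below copies that definition under a hypothetical name and adds a hypothetical helper. That the ioat hardware advertises a 64-bit mask is the assumption stated above, not something confirmed in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK(n); n == 64 is special-cased
 * because shifting a 64-bit value by 64 is undefined. */
#define MODEL_DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: can a device with this mask reach [addr, addr+len)? */
static bool model_dma_reachable(uint64_t mask, uint64_t addr, uint64_t len)
{
    if (len == 0)
        return addr <= mask;
    /* Reject ranges that wrap around the top of the address space. */
    if (addr + len - 1 < addr)
        return false;
    return addr + len - 1 <= mask;
}
```

With a 32-bit mask, a 64-byte buffer at 0x123456000 is out of reach; with a 64-bit mask it is reachable, which is why a 64-bit-capable device may legitimately be handed memory above 4 GiB.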

> I'm not aware of any documentation in this area.

There is; the header files qualify as documentation.

> I don't mind bumping up the size if xen32 is changing the above
> assumptions, but I'd want confirmation that this is the failure
> scenario.

At least it looks pretty wrong to throw away four bytes (the upper 32
bits) of a given address just for fun.

Bastian

-- 
Respect is a rational process
		-- McCoy, "The Galileo Seven", stardate 2822.3

* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-02 20:56                         ` Bastian Blank
@ 2012-03-02 21:17                           ` Dan Williams
  0 siblings, 0 replies; 31+ messages in thread
From: Dan Williams @ 2012-03-02 21:17 UTC (permalink / raw)
  To: Dan Williams, Jonathan Nieder, xen-devel, pkg-xen-devel,
	Maciej Sosnowski, linux-kernel, William Dauchy,
	Konrad Rzeszutek Wilk, Dave Jiang

On Fri, Mar 2, 2012 at 12:56 PM, Bastian Blank <waldi@debian.org> wrote:
> On Fri, Mar 02, 2012 at 12:16:47PM -0800, Dan Williams wrote:
>> On Fri, Mar 2, 2012 at 12:08 PM, Bastian Blank <waldi@debian.org> wrote:
>> > On Fri, Mar 02, 2012 at 11:31:56AM -0800, Dan Williams wrote:
>> >> On Fri, Mar 2, 2012 at 9:57 AM, Bastian Blank <waldi@debian.org> wrote:
>> >> > phys_complete (a 32 bit value) gets compared to struct
>> >> > dma_async_tx_descriptor.phys, which is defined as dma_addr_t, a _64_ bit
>> >> > value.
>> >> The assumption is that the driver's control structures are not in high
>> >> memory so all address values will only have 32-bits of valid data,
>> > Can you back that up by some kernel documentation? There is a reason why
>> > pci_alloc_pool uses dma_addr_t to store the address and _not_ unsigned
>> > long. These are physical addresses, nothing the kernel can access
>> > directly without a mapping.
>> High memory can only be accessed with kmap(), so the assumption is
>> that dma_alloc never gives a buffer address above 32-bits on a 32-bit
>> build.  Yes, if HIGHMEM64G is set dma_addr_t becomes 64-bit, but that
>> is only to access high memory mapped application buffers via dma_map.
>
> All memory needs to be mapped.  Linux just has a default mapping of 1GiB
> of the memory handy. However this is irrelevant for the physical DMA
> addresses we talk about.

I'm not sure if you understand how highmem works, or if we're talking past each other.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-02 19:31                   ` Dan Williams
  2012-03-02 20:08                     ` Bastian Blank
@ 2012-03-05 15:26                     ` Thomas Goirand
  2012-03-05 15:38                       ` Dan Williams
  1 sibling, 1 reply; 31+ messages in thread
From: Thomas Goirand @ 2012-03-05 15:26 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jonathan Nieder, xen-devel, pkg-xen-devel, Maciej Sosnowski,
	linux-kernel, William Dauchy, Konrad Rzeszutek Wilk, Dave Jiang

On 03/03/2012 03:31 AM, Dan Williams wrote:
> Can you send the log of the driver load with debug enabled?
> 
> diff --git a/drivers/dma/ioat/dma.c b/drivers/dma/ioat/dma.c
> index a4d6cb0..82472de 100644
> --- a/drivers/dma/ioat/dma.c
> +++ b/drivers/dma/ioat/dma.c
> @@ -24,7 +24,7 @@
>   * This driver supports an Intel I/OAT DMA engine, which does asynchronous
>   * copy operations.
>   */
> -
> +#define DEBUG
>  #include <linux/init.h>
>  #include <linux/module.h>
>  #include <linux/slab.h>
> diff --git a/drivers/dma/ioat/dma_v2.c b/drivers/dma/ioat/dma_v2.c
> index 5d65f83..da337e7 100644
> --- a/drivers/dma/ioat/dma_v2.c
> +++ b/drivers/dma/ioat/dma_v2.c
> @@ -24,7 +24,7 @@
>   * This driver supports an Intel I/OAT DMA engine (versions >= 2), which
>   * does asynchronous data movement and checksumming operations.
>   */
> -
> +#define DEBUG
>  #include <linux/init.h>
>  #include <linux/module.h>
>  #include <linux/slab.h>

I will do my best to provide it ASAP. Should I compile with BUG_ON so
you see it crashing, as per the original code, or just with WARN_ON, so
you also see further things in dmesg?

Thomas

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-05 15:26                     ` Thomas Goirand
@ 2012-03-05 15:38                       ` Dan Williams
  2012-03-06  9:20                         ` Thomas Goirand
  0 siblings, 1 reply; 31+ messages in thread
From: Dan Williams @ 2012-03-05 15:38 UTC (permalink / raw)
  To: Thomas Goirand
  Cc: Jonathan Nieder, xen-devel, pkg-xen-devel, Maciej Sosnowski,
	linux-kernel, William Dauchy, Konrad Rzeszutek Wilk, Dave Jiang

On Mon, Mar 5, 2012 at 7:26 AM, Thomas Goirand <zigo@debian.org> wrote:
> I will do my best to provide it ASAP. Should I compile with BUG_ON so
> you see it crashing, as per the original code, or just with WARN_ON, so
> you also see further things in dmesg?

Yes, replacing with a WARN_ON might allow it to skid after the crash
and give a bit more information.

Thank you for grabbing this info.

--
Dan

^ permalink raw reply	[flat|nested] 31+ messages in thread
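
The BUG_ON/WARN_ON choice discussed above comes down to control flow: in the kernel, `BUG_ON(cond)` oopses when `cond` is true (the `invalid opcode` trace at the top of this thread), and in softirq context such as `tasklet_action` that brings the machine down, whereas `WARN_ON(cond)` prints a warning and evaluates to the condition, so execution continues and later debug output still reaches dmesg. The sketch below mimics only that difference in user space; `warn_on()`, `bug_on()` and `cleanup_step()` are hypothetical helpers, not kernel APIs.

```c
#include <stdio.h>
#include <stdlib.h>

static unsigned int warnings_seen; /* number of warnings emitted so far */

/* Mimics WARN_ON(): report, then hand the condition back to the caller
 * so it can decide how to carry on. */
static int warn_on(int cond, const char *what)
{
	if (cond) {
		warnings_seen++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Mimics BUG_ON(): nothing after a true condition runs. */
static void bug_on(int cond, const char *what)
{
	if (cond) {
		fprintf(stderr, "BUG: %s\n", what);
		abort();
	}
}

/* Hypothetical cleanup step: with fatal set, a bad state stops the
 * process; otherwise it is reported and the caller keeps running. */
static int cleanup_step(int active, int seen_current, int fatal)
{
	int bad = active && !seen_current;

	if (fatal)
		bug_on(bad, "no active descriptor matched"); /* never returns if bad */
	else if (warn_on(bad, "no active descriptor matched"))
		return -1; /* reported and skipped */
	return 0;
}
```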

* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-05 15:38                       ` Dan Williams
@ 2012-03-06  9:20                         ` Thomas Goirand
  2012-03-06 10:33                           ` Bastian Blank
  2012-03-06 14:14                           ` Dan Williams
  0 siblings, 2 replies; 31+ messages in thread
From: Thomas Goirand @ 2012-03-06  9:20 UTC (permalink / raw)
  To: Dan Williams
  Cc: xen-devel, Dave Jiang, pkg-xen-devel, Maciej Sosnowski,
	linux-kernel, Jonathan Nieder, William Dauchy,
	Konrad Rzeszutek Wilk

[-- Attachment #1: Type: text/plain, Size: 919 bytes --]

On 03/05/2012 11:38 PM, Dan Williams wrote:
> On Mon, Mar 5, 2012 at 7:26 AM, Thomas Goirand <zigo@debian.org> wrote:
>> I will do my best to provide it ASAP. Should I compile with BUG_ON so
>> you see it crashing, as per the original code, or just with WARN_ON, so
>> you also see further things in dmesg?
> 
> Yes, replacing with a WARN_ON might allow it to skid after the crash
> and give a bit more information.
> 
> Thank you for grabbing this info.
> 
> --
> Dan

Hi Dan,

Please find attached the log that you asked me for, with WARN_ON instead of
BUG_ON, and with the 2 #define DEBUG in dma.c and dma_v2.c.

Let me know if you want me to do more, or if you want to have access to
my server (in which case, provide me a public ssh key and sign your
email with PGP).

Thomas

P.S.: I compressed dmesg.txt because on Debian lists a message of
40K or more requires administrator moderation, which I want to avoid.

[-- Attachment #2: dmesg.txt.gz --]
[-- Type: application/x-gzip, Size: 20336 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-06  9:20                         ` Thomas Goirand
@ 2012-03-06 10:33                           ` Bastian Blank
  2012-03-06 14:14                           ` Dan Williams
  1 sibling, 0 replies; 31+ messages in thread
From: Bastian Blank @ 2012-03-06 10:33 UTC (permalink / raw)
  To: Thomas Goirand
  Cc: Dan Williams, xen-devel, Dave Jiang, pkg-xen-devel,
	Maciej Sosnowski, linux-kernel, Jonathan Nieder, William Dauchy,
	Konrad Rzeszutek Wilk

On Tue, Mar 06, 2012 at 05:20:54PM +0800, Thomas Goirand wrote:
> Please find attached the log that you asked me for, with WARN_ON instead of
> BUG_ON, and with the 2 #define DEBUG in dma.c and dma_v2.c.

| ioatdma 0000:00:16.1: desc[1]: (0x27ea6d040->0x27ea6d080) cookie: 0 flags: 0x0 ctl: 0x0 (op: 0 int_en: 0 compl: 0)
| ioatdma 0000:00:16.1: desc[1]: (0x27ea6d040->0x27ea6d080) cookie: 0 flags: 0x31 ctl: 0x9 (op: 0 int_en: 1 compl: 1)

*counting* 9 hex digits, i.e. > 2^32. What did I say?

Bastian

-- 
The joys of love made her human and the agonies of love destroyed her.
		-- Spock, "Requiem for Methuselah", stardate 5842.8

^ permalink raw reply	[flat|nested] 31+ messages in thread
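
Bastian's digit count can be made mechanical: an address needs more than 32 bits exactly when shifting it right by 32 leaves something nonzero (the kernel's `upper_32_bits()` helper expresses the same test), which for a hex value means more than eight significant digits. A small sketch with hypothetical helper names, applied to the descriptor address 0x27ea6d040 from the log above:

```c
#include <stdbool.h>
#include <stdint.h>

/* True when addr cannot be stored in a 32-bit unsigned long. */
static bool needs_more_than_32_bits(uint64_t addr)
{
	return (addr >> 32) != 0;
}

/* Number of significant hex digits in addr (zero counts as one digit). */
static unsigned int hex_digits(uint64_t addr)
{
	unsigned int n = 1;

	while (addr >>= 4)
		n++;
	return n;
}
```

For 0x27ea6d040 this gives nine digits and a nonzero upper half, while 0xffffffff, the largest 32-bit value, has eight.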

* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-06  9:20                         ` Thomas Goirand
  2012-03-06 10:33                           ` Bastian Blank
@ 2012-03-06 14:14                           ` Dan Williams
  2012-03-06 14:39                               ` Ian Campbell
  2012-03-11 22:06                             ` Jonathan Nieder
  1 sibling, 2 replies; 31+ messages in thread
From: Dan Williams @ 2012-03-06 14:14 UTC (permalink / raw)
  To: Thomas Goirand
  Cc: xen-devel, Dave Jiang, pkg-xen-devel, Maciej Sosnowski,
	linux-kernel, Jonathan Nieder, William Dauchy,
	Konrad Rzeszutek Wilk

On Tue, Mar 6, 2012 at 1:20 AM, Thomas Goirand <zigo@debian.org> wrote:
> On 03/05/2012 11:38 PM, Dan Williams wrote:
>> On Mon, Mar 5, 2012 at 7:26 AM, Thomas Goirand <zigo@debian.org> wrote:
>>> I will do my best to provide it ASAP. Should I compile with BUG_ON so
>>> you see it crashing, as per the original code, or just with WARN_ON, so
>>> you also see further things in dmesg?
>>
>> Yes, replacing with a WARN_ON might allow it to skid after the crash
>> and give a bit more information.
>>
>> Thank you for grabbing this info.
>>
>> --
>> Dan
>
> Hi Dan,
>
> Please find attached the log that you asked me for, with WARN_ON instead of
> BUG_ON, and with the 2 #define DEBUG in dma.c and dma_v2.c.
>

[    9.276817] ioatdma 0000:00:16.4: desc[0]: (0x300cc7000->0x300cc7040) cookie: 0 flags: 0x2 ctl: 0x29 (op: 0 int_en: 1 compl: 1)
...
[    9.276832] ioatdma 0000:00:16.4: ioat_get_current_completion: phys_complete: 0xcc7000

Thanks, this clearly shows that our descriptors are above 4GB and that
the driver truncates the completion word.

Is this new behavior for xen?

Before you had mentioned that non-xen 32-bit builds don't fail.  Can
you send me the .config from those two cases (offlist if they are too
large)?  I'm looking for what config option enables this so I can
quote it in the patch to increase the size of phys_complete.
Certainly this changes my assumptions about the address ranges in
which GFP_KERNEL memory will be located.

--
Dan

^ permalink raw reply	[flat|nested] 31+ messages in thread
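
The arithmetic behind 0x300cc7000 turning into 0xcc7000: converting a 64-bit value to a 32-bit unsigned type keeps the value modulo 2^32, i.e. only the low 32 bits. A sketch with `uint32_t` standing in for the 32-bit `unsigned long`; the function name is hypothetical.

```c
#include <stdint.h>

/* What assigning a 64-bit dma_addr_t to a 32-bit unsigned long does:
 * unsigned conversion keeps the value modulo 2^32 (the low 32 bits). */
static uint32_t truncate_to_ulong32(uint64_t dma_addr)
{
	return (uint32_t)dma_addr;
}
```

A side effect worth noting is aliasing: 0x300cc7000 and 0x0cc7000 both truncate to 0xcc7000, so addresses that differ only above bit 31 become indistinguishable.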

* Re: [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-06 14:14                           ` Dan Williams
@ 2012-03-06 14:39                               ` Ian Campbell
  2012-03-11 22:06                             ` Jonathan Nieder
  1 sibling, 0 replies; 31+ messages in thread
From: Ian Campbell @ 2012-03-06 14:39 UTC (permalink / raw)
  To: Dan Williams
  Cc: Thomas Goirand, xen-devel, Dave Jiang, pkg-xen-devel,
	Maciej Sosnowski, linux-kernel, Jonathan Nieder, William Dauchy,
	Konrad Rzeszutek Wilk

On Tue, 2012-03-06 at 06:14 -0800, Dan Williams wrote:
> [    9.276817] ioatdma 0000:00:16.4: desc[0]:
> (0x300cc7000->0x300cc7040) cookie: 0 flags: 0x2 ctl: 0x29 (op: 0
> int_en: 1 compl: 1)
> ...
> [    9.276832] ioatdma 0000:00:16.4: ioat_get_current_completion:
> phys_complete: 0xcc7000
> 
> Thanks, this clearly shows that our descriptors are above 4GB and that
> the driver truncates the completion word.
> 
> Is this new behavior for xen?

Xen makes a distinction between physical addresses and DMA addresses:
the latter can potentially be anywhere in the machine's real address
space, while the former is what GFP_KERNEL etc. controls.

You are using pci_pool_alloc, which is the correct API to use for these
things since its purpose is to handle cases where PHYS != DMA addr by
exposing the DMA address to the caller. As part of that you should also
be using dma_addr_t for DMA addresses since that is the type which is
defined to handle the appropriate DMA address size on the platform.

I think this DMA!=PHYS can also be true of some non-x86 architectures
without Xen too but I guess ioat is quite x86 specific? In any case it
is wrong, or at least non-portable, to use unsigned long for these
addresses even though it happens on x86 that physaddr == dma addr
(usually).

Ian.

-- 
Ian Campbell

Start every day off with a smile and get it over with.
		-- W. C. Fields


^ permalink raw reply	[flat|nested] 31+ messages in thread
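
Why a truncated completion address is fatal: in the 3.2/3.3 source, the v2 `__cleanup()` appears to walk the active ring entries and stop at the one whose `phys` equals the completion address, then assert that one was found, which is consistent with the `kernel BUG at .../dma_v2.c:163` in `__cleanup` reported at the start of the thread. The sketch below models only that walk in user space; `struct model_desc` and `model_cleanup()` are hypothetical simplifications, not the driver's code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor: only the bus address matters here. */
struct model_desc {
	uint64_t phys; /* DMA address of this descriptor, held as a dma_addr_t */
};

/* Walk up to 'active' descriptors starting at 'tail' in a ring of 'size'
 * entries, stopping after the one that matches phys_complete. Writes the
 * number of descriptors retired to *retired and returns whether a match
 * was seen (the driver asserts on this instead of returning it). */
static bool model_cleanup(const struct model_desc *ring, size_t size,
			  size_t tail, size_t active, uint64_t phys_complete,
			  size_t *retired)
{
	bool seen_current = false;
	size_t i;

	for (i = 0; i < active && !seen_current; i++) {
		const struct model_desc *desc = &ring[(tail + i) % size];

		if (desc->phys == phys_complete)
			seen_current = true;
	}
	*retired = i;
	return seen_current;
}
```

With the ring at 0x300cc7000 and the full 64-bit completion 0x300cc7040, the walk stops after two entries. With the completion passed through a 32-bit variable (0xcc7040), no entry ever matches and every active descriptor is consumed without finding the current one.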

* Re: ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-06 14:14                           ` Dan Williams
  2012-03-06 14:39                               ` Ian Campbell
@ 2012-03-11 22:06                             ` Jonathan Nieder
  2012-03-23 23:55                               ` Dan Williams
  1 sibling, 1 reply; 31+ messages in thread
From: Jonathan Nieder @ 2012-03-11 22:06 UTC (permalink / raw)
  To: Dan Williams
  Cc: Thomas Goirand, xen-devel, Dave Jiang, pkg-xen-devel,
	Maciej Sosnowski, linux-kernel, William Dauchy,
	Konrad Rzeszutek Wilk

Hi Dan,

Dan Williams wrote:

> Before you had mentioned that non-xen 32-bit builds don't fail.  Can
> you send me the .config from those two cases (offlist if they are too
> large)?

The failing and non-failing kernels are identical.  It is the
environment in which they are run that is different.

Running the kernel on bare metal works fine, while booting as a dom0
from the xen hypervisor triggers the assertion failure.[1]

.config: [2]

Hope that helps,
Jonathan

[1] http://thread.gmane.org/gmane.comp.emulators.xen.devel/121604/focus=121615
[2] http://alioth.debian.org/~jrnieder-guest/temp/config-3.2.0-2-686-pae

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Xen-devel] [Pkg-xen-devel] ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-06 14:39                               ` Ian Campbell
  (?)
@ 2012-03-13 16:49                               ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 31+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-03-13 16:49 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Dan Williams, xen-devel, Dave Jiang, pkg-xen-devel,
	Thomas Goirand, Maciej Sosnowski, linux-kernel, Jonathan Nieder,
	William Dauchy, Konrad Rzeszutek Wilk

On Tue, Mar 06, 2012 at 06:39:12AM -0800, Ian Campbell wrote:
> On Tue, 2012-03-06 at 06:14 -0800, Dan Williams wrote:
> > [    9.276817] ioatdma 0000:00:16.4: desc[0]:
> > (0x300cc7000->0x300cc7040) cookie: 0 flags: 0x2 ctl: 0x29 (op: 0
> > int_en: 1 compl: 1)
> > ...
> > [    9.276832] ioatdma 0000:00:16.4: ioat_get_current_completion:
> > phys_complete: 0xcc7000
> > 
> > Thanks, this clearly shows that our descriptors are above 4GB and that
> > the driver truncates the completion word.
> > 
> > Is this new behavior for xen?
> 
> Xen makes a distinction between physical addresses and DMA addresses and
> the latter can potentially be anywhere in the machine's real address
> space while the former is what GFP_KERNEL etc controls.
> 
> You are using pci_pool_alloc, which is the correct API to use for these
> things since its purpose is to handle cases where PHYS != DMA addr by
> exposing the DMA address to the caller. As part of that you should also
> be using dma_addr_t for DMA addresses since that is the type which is
> defined to handle the appropriate DMA address size on the platform.
> 
> I think this DMA!=PHYS can also be true of some non-x86 architectures

Especially SPARC.
> without Xen too but I guess ioat is quite x86 specific? In any case it
> is wrong, or at least non-portable, to use unsigned long for these
> addresses even though it happens on x86 that physaddr == dma addr
> (usually).

I think with Intel VT-d they can be different. The bus addresses returned
do seem to differ.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-11 22:06                             ` Jonathan Nieder
@ 2012-03-23 23:55                               ` Dan Williams
  2012-03-24  1:29                                 ` William Dauchy
  2012-03-24  2:25                                 ` William Dauchy
  0 siblings, 2 replies; 31+ messages in thread
From: Dan Williams @ 2012-03-23 23:55 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Thomas Goirand, xen-devel, Dave Jiang, pkg-xen-devel,
	Maciej Sosnowski, linux-kernel, William Dauchy,
	Konrad Rzeszutek Wilk

Subject: ioat: fix size of 'completion' for Xen

From: Dan Williams <dan.j.williams@intel.com>

Starting with v3.2 Jonathan reports that Xen crashes loading the ioatdma
driver.  A debug run shows:

  ioatdma 0000:00:16.4: desc[0]: (0x300cc7000->0x300cc7040) cookie: 0 flags: 0x2 ctl: 0x29 (op: 0 int_en: 1 compl: 1)
  ...
  ioatdma 0000:00:16.4: ioat_get_current_completion: phys_complete: 0xcc7000

...which shows that in this environment GFP_KERNEL memory may be backed
by a 64-bit dma address.  This breaks the driver's assumption that an
unsigned long should be able to contain the physical address for
descriptor memory.  Switch to dma_addr_t, which, beyond being the right
size, is the true type for the data, i.e. an io-virtual address
indicating the engine's last processed descriptor.

[stable: 3.2+]
Cc: <stable@vger.kernel.org>
Reported-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

On Sun, 2012-03-11 at 17:06 -0500, Jonathan Nieder wrote:
> Hi Dan,
> 
> Dan Williams wrote:
> 
> > Before you had mentioned that non-xen 32-bit builds don't fail.  Can
> > you send me the .config from those two cases (offlist if they are too
> > large)?
> 
> The failing and non-failing kernels are identical.  It is the
> environment in which they are run that is different.
> 
> Running the kernel on bare metal works fine, while booting as a dom0
> from the xen hypervisor triggers the assertion failure.[1]
> 
> .config: [2]
> 
> Hope that helps,
> Jonathan

Thanks for the debug help, does this patch fix the issue for you?

 drivers/dma/ioat/dma.c    |   16 ++++++++--------
 drivers/dma/ioat/dma.h    |    6 +++---
 drivers/dma/ioat/dma_v2.c |    8 ++++----
 drivers/dma/ioat/dma_v3.c |    8 ++++----
 4 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/drivers/dma/ioat/dma.c b/drivers/dma/ioat/dma.c
index a4d6cb0..6595180 100644
--- a/drivers/dma/ioat/dma.c
+++ b/drivers/dma/ioat/dma.c
@@ -548,9 +548,9 @@ void ioat_dma_unmap(struct ioat_chan_common *chan, enum dma_ctrl_flags flags,
 			   PCI_DMA_TODEVICE, flags, 0);
 }
 
-unsigned long ioat_get_current_completion(struct ioat_chan_common *chan)
+dma_addr_t ioat_get_current_completion(struct ioat_chan_common *chan)
 {
-	unsigned long phys_complete;
+	dma_addr_t phys_complete;
 	u64 completion;
 
 	completion = *chan->completion;
@@ -571,7 +571,7 @@ unsigned long ioat_get_current_completion(struct ioat_chan_common *chan)
 }
 
 bool ioat_cleanup_preamble(struct ioat_chan_common *chan,
-			   unsigned long *phys_complete)
+			   dma_addr_t *phys_complete)
 {
 	*phys_complete = ioat_get_current_completion(chan);
 	if (*phys_complete == chan->last_completion)
@@ -582,14 +582,14 @@ bool ioat_cleanup_preamble(struct ioat_chan_common *chan,
 	return true;
 }
 
-static void __cleanup(struct ioat_dma_chan *ioat, unsigned long phys_complete)
+static void __cleanup(struct ioat_dma_chan *ioat, dma_addr_t phys_complete)
 {
 	struct ioat_chan_common *chan = &ioat->base;
 	struct list_head *_desc, *n;
 	struct dma_async_tx_descriptor *tx;
 
-	dev_dbg(to_dev(chan), "%s: phys_complete: %lx\n",
-		 __func__, phys_complete);
+	dev_dbg(to_dev(chan), "%s: phys_complete: %llx\n",
+		 __func__, (unsigned long long) phys_complete);
 	list_for_each_safe(_desc, n, &ioat->used_desc) {
 		struct ioat_desc_sw *desc;
 
@@ -655,7 +655,7 @@ static void __cleanup(struct ioat_dma_chan *ioat, unsigned long phys_complete)
 static void ioat1_cleanup(struct ioat_dma_chan *ioat)
 {
 	struct ioat_chan_common *chan = &ioat->base;
-	unsigned long phys_complete;
+	dma_addr_t phys_complete;
 
 	prefetch(chan->completion);
 
@@ -701,7 +701,7 @@ static void ioat1_timer_event(unsigned long data)
 		mod_timer(&chan->timer, jiffies + COMPLETION_TIMEOUT);
 		spin_unlock_bh(&ioat->desc_lock);
 	} else if (test_bit(IOAT_COMPLETION_PENDING, &chan->state)) {
-		unsigned long phys_complete;
+		dma_addr_t phys_complete;
 
 		spin_lock_bh(&ioat->desc_lock);
 		/* if we haven't made progress and we have already
diff --git a/drivers/dma/ioat/dma.h b/drivers/dma/ioat/dma.h
index 5216c8a..8bebddd 100644
--- a/drivers/dma/ioat/dma.h
+++ b/drivers/dma/ioat/dma.h
@@ -88,7 +88,7 @@ struct ioatdma_device {
 struct ioat_chan_common {
 	struct dma_chan common;
 	void __iomem *reg_base;
-	unsigned long last_completion;
+	dma_addr_t last_completion;
 	spinlock_t cleanup_lock;
 	dma_cookie_t completed_cookie;
 	unsigned long state;
@@ -333,7 +333,7 @@ int __devinit ioat_dma_self_test(struct ioatdma_device *device);
 void __devexit ioat_dma_remove(struct ioatdma_device *device);
 struct dca_provider * __devinit ioat_dca_init(struct pci_dev *pdev,
 					      void __iomem *iobase);
-unsigned long ioat_get_current_completion(struct ioat_chan_common *chan);
+dma_addr_t ioat_get_current_completion(struct ioat_chan_common *chan);
 void ioat_init_channel(struct ioatdma_device *device,
 		       struct ioat_chan_common *chan, int idx);
 enum dma_status ioat_dma_tx_status(struct dma_chan *c, dma_cookie_t cookie,
@@ -341,7 +341,7 @@ enum dma_status ioat_dma_tx_status(struct dma_chan *c, dma_cookie_t cookie,
 void ioat_dma_unmap(struct ioat_chan_common *chan, enum dma_ctrl_flags flags,
 		    size_t len, struct ioat_dma_descriptor *hw);
 bool ioat_cleanup_preamble(struct ioat_chan_common *chan,
-			   unsigned long *phys_complete);
+			   dma_addr_t *phys_complete);
 void ioat_kobject_add(struct ioatdma_device *device, struct kobj_type *type);
 void ioat_kobject_del(struct ioatdma_device *device);
 extern const struct sysfs_ops ioat_sysfs_ops;
diff --git a/drivers/dma/ioat/dma_v2.c b/drivers/dma/ioat/dma_v2.c
index 5d65f83..cb8864d 100644
--- a/drivers/dma/ioat/dma_v2.c
+++ b/drivers/dma/ioat/dma_v2.c
@@ -126,7 +126,7 @@ static void ioat2_start_null_desc(struct ioat2_dma_chan *ioat)
 	spin_unlock_bh(&ioat->prep_lock);
 }
 
-static void __cleanup(struct ioat2_dma_chan *ioat, unsigned long phys_complete)
+static void __cleanup(struct ioat2_dma_chan *ioat, dma_addr_t phys_complete)
 {
 	struct ioat_chan_common *chan = &ioat->base;
 	struct dma_async_tx_descriptor *tx;
@@ -178,7 +178,7 @@ static void __cleanup(struct ioat2_dma_chan *ioat, unsigned long phys_complete)
 static void ioat2_cleanup(struct ioat2_dma_chan *ioat)
 {
 	struct ioat_chan_common *chan = &ioat->base;
-	unsigned long phys_complete;
+	dma_addr_t phys_complete;
 
 	spin_lock_bh(&chan->cleanup_lock);
 	if (ioat_cleanup_preamble(chan, &phys_complete))
@@ -259,7 +259,7 @@ int ioat2_reset_sync(struct ioat_chan_common *chan, unsigned long tmo)
 static void ioat2_restart_channel(struct ioat2_dma_chan *ioat)
 {
 	struct ioat_chan_common *chan = &ioat->base;
-	unsigned long phys_complete;
+	dma_addr_t phys_complete;
 
 	ioat2_quiesce(chan, 0);
 	if (ioat_cleanup_preamble(chan, &phys_complete))
@@ -274,7 +274,7 @@ void ioat2_timer_event(unsigned long data)
 	struct ioat_chan_common *chan = &ioat->base;
 
 	if (test_bit(IOAT_COMPLETION_PENDING, &chan->state)) {
-		unsigned long phys_complete;
+		dma_addr_t phys_complete;
 		u64 status;
 
 		status = ioat_chansts(chan);
diff --git a/drivers/dma/ioat/dma_v3.c b/drivers/dma/ioat/dma_v3.c
index f519c93..2dbf32b 100644
--- a/drivers/dma/ioat/dma_v3.c
+++ b/drivers/dma/ioat/dma_v3.c
@@ -256,7 +256,7 @@ static bool desc_has_ext(struct ioat_ring_ent *desc)
  * The difference from the dma_v2.c __cleanup() is that this routine
  * handles extended descriptors and dma-unmapping raid operations.
  */
-static void __cleanup(struct ioat2_dma_chan *ioat, unsigned long phys_complete)
+static void __cleanup(struct ioat2_dma_chan *ioat, dma_addr_t phys_complete)
 {
 	struct ioat_chan_common *chan = &ioat->base;
 	struct ioat_ring_ent *desc;
@@ -314,7 +314,7 @@ static void __cleanup(struct ioat2_dma_chan *ioat, unsigned long phys_complete)
 static void ioat3_cleanup(struct ioat2_dma_chan *ioat)
 {
 	struct ioat_chan_common *chan = &ioat->base;
-	unsigned long phys_complete;
+	dma_addr_t phys_complete;
 
 	spin_lock_bh(&chan->cleanup_lock);
 	if (ioat_cleanup_preamble(chan, &phys_complete))
@@ -333,7 +333,7 @@ static void ioat3_cleanup_event(unsigned long data)
 static void ioat3_restart_channel(struct ioat2_dma_chan *ioat)
 {
 	struct ioat_chan_common *chan = &ioat->base;
-	unsigned long phys_complete;
+	dma_addr_t phys_complete;
 
 	ioat2_quiesce(chan, 0);
 	if (ioat_cleanup_preamble(chan, &phys_complete))
@@ -348,7 +348,7 @@ static void ioat3_timer_event(unsigned long data)
 	struct ioat_chan_common *chan = &ioat->base;
 
 	if (test_bit(IOAT_COMPLETION_PENDING, &chan->state)) {
-		unsigned long phys_complete;
+		dma_addr_t phys_complete;
 		u64 status;
 
 		status = ioat_chansts(chan);




^ permalink raw reply related	[flat|nested] 31+ messages in thread
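
The patch also changes the `dev_dbg()` format from `%lx` to `%llx` with an explicit `(unsigned long long)` cast: `dma_addr_t` can be 32 or 64 bits wide depending on the configuration, while `%llx` always expects an `unsigned long long`, so the cast keeps the format correct either way. (Later kernels added a `%pad` printk specifier that takes a pointer to a `dma_addr_t` for the same purpose.) The same idiom in user space, with `snprintf` standing in for `dev_dbg` and a hypothetical helper name:

```c
#include <stdint.h>
#include <stdio.h>

typedef uint64_t dma_addr_t; /* assumption: 64-bit, as on the failing PAE dom0 */

/* Format an address portably: cast to unsigned long long so that %llx
 * matches the argument type whatever width dma_addr_t has. */
static int format_phys_complete(char *buf, size_t len, dma_addr_t phys_complete)
{
	return snprintf(buf, len, "phys_complete: %llx",
			(unsigned long long)phys_complete);
}
```

Formatting 0x300cc7000 this way yields `phys_complete: 300cc7000`, with all nine hex digits intact.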

* Re: ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-23 23:55                               ` Dan Williams
@ 2012-03-24  1:29                                 ` William Dauchy
  2012-03-24  2:25                                 ` William Dauchy
  1 sibling, 0 replies; 31+ messages in thread
From: William Dauchy @ 2012-03-24  1:29 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jonathan Nieder, Thomas Goirand, xen-devel, Dave Jiang,
	pkg-xen-devel, Maciej Sosnowski, linux-kernel,
	Konrad Rzeszutek Wilk

On Sat, Mar 24, 2012 at 12:55 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> Starting with v3.2 Jonathan reports that Xen crashes loading the ioatdma
> driver.  A debug run shows:

Please note that I reported the crash a bit earlier
http://lists.xen.org/archives/html/xen-devel/2012-01/msg02408.html
I will test this patch as soon as possible. Thanks for your work.

Reported-by: William Dauchy <wdauchy@gmail.com>

Regards,
-- 
William

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-23 23:55                               ` Dan Williams
  2012-03-24  1:29                                 ` William Dauchy
@ 2012-03-24  2:25                                 ` William Dauchy
  2012-03-24  3:34                                   ` Williams, Dan J
  1 sibling, 1 reply; 31+ messages in thread
From: William Dauchy @ 2012-03-24  2:25 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jonathan Nieder, Thomas Goirand, xen-devel, Dave Jiang,
	pkg-xen-devel, Maciej Sosnowski, linux-kernel,
	Konrad Rzeszutek Wilk

Hi Dan,

On Sat, Mar 24, 2012 at 12:55 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> Thanks for the debug help, does this patch fix the issue for you?

I successfully tested your patch and it works fine. Thanks again for your work.

Reported-by: William Dauchy <wdauchy@gmail.com>
Tested-by: William Dauchy <wdauchy@gmail.com>

Best regards,
-- 
William

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2
  2012-03-24  2:25                                 ` William Dauchy
@ 2012-03-24  3:34                                   ` Williams, Dan J
  0 siblings, 0 replies; 31+ messages in thread
From: Williams, Dan J @ 2012-03-24  3:34 UTC (permalink / raw)
  To: William Dauchy
  Cc: Jonathan Nieder, Thomas Goirand, xen-devel, Dave Jiang,
	pkg-xen-devel, Maciej Sosnowski, linux-kernel,
	Konrad Rzeszutek Wilk

On Fri, Mar 23, 2012 at 7:25 PM, William Dauchy <wdauchy@gmail.com> wrote:
> Hi Dan,
>
> On Sat, Mar 24, 2012 at 12:55 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>> Thanks for the debug help, does this patch fix the issue for you?
>
> I successfully tested your patch and it works fine. Thanks again for your work.
>
> Reported-by: William Dauchy <wdauchy@gmail.com>
> Tested-by: William Dauchy <wdauchy@gmail.com>

Great, thanks for the test.

--
Dan

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2012-03-24  3:34 UTC | newest]

Thread overview: 31+ messages
2012-01-27 13:31 regression ioatdma 3.3 William Dauchy
2012-01-27 14:47 ` Konrad Rzeszutek Wilk
2012-01-27 15:02   ` William Dauchy
2012-02-19 22:31   ` Jonathan Nieder
2012-02-20 18:16     ` Jonathan Nieder
2012-02-25  7:46       ` Thomas Goirand
2012-02-25 21:13         ` William Dauchy
2012-03-02  5:57         ` ioatdma: Boot process hangs then reboots when using Xen + Linux 3.2 Jonathan Nieder
2012-03-02  6:42           ` Dan Williams
2012-03-02 16:21             ` [Pkg-xen-devel] " Bastian Blank
2012-03-02 16:21               ` Bastian Blank
2012-03-02 16:44               ` Dan Williams
2012-03-02 17:57                 ` Bastian Blank
2012-03-02 19:31                   ` Dan Williams
2012-03-02 20:08                     ` Bastian Blank
2012-03-02 20:16                       ` Dan Williams
2012-03-02 20:56                         ` Bastian Blank
2012-03-02 21:17                           ` Dan Williams
2012-03-05 15:26                     ` Thomas Goirand
2012-03-05 15:38                       ` Dan Williams
2012-03-06  9:20                         ` Thomas Goirand
2012-03-06 10:33                           ` Bastian Blank
2012-03-06 14:14                           ` Dan Williams
2012-03-06 14:39                             ` Ian Campbell
2012-03-06 14:39                               ` Ian Campbell
2012-03-13 16:49                               ` [Xen-devel] " Konrad Rzeszutek Wilk
2012-03-11 22:06                             ` Jonathan Nieder
2012-03-23 23:55                               ` Dan Williams
2012-03-24  1:29                                 ` William Dauchy
2012-03-24  2:25                                 ` William Dauchy
2012-03-24  3:34                                   ` Williams, Dan J
