* [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync
@ 2010-01-20  9:41 Jarek Poplawski
  2010-01-20 18:03 ` Stephen Hemminger
                   ` (3 more replies)
  0 siblings, 4 replies; 99+ messages in thread
From: Jarek Poplawski @ 2010-01-20  9:41 UTC (permalink / raw)
  To: David Miller
  Cc: Stephen Hemminger, Michael Breuer, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

[ previously: Re: [PATCH] af_packet: Don't use skb after dev_queue_xmit() ]
On Tue, Jan 19, 2010 at 05:10:13PM -0800, Stephen Hemminger wrote:
> On Tue, 19 Jan 2010 20:01:10 -0500
> Michael Breuer <mbreuer@majjas.com> wrote:
> 
> > On 1/19/2010 5:45 PM, Jarek Poplawski wrote:
> > > On Tue, Jan 19, 2010 at 03:06:01PM -0500, Michael Breuer wrote:
> > >    
> > >> On 1/19/2010 2:59 PM, Jarek Poplawski wrote:
> > >>      
> > >>> On Tue, Jan 19, 2010 at 10:47:27AM -0500, Michael Breuer wrote:
> > >>> ...
> > >>>        
> > >>>> Still get the warning... but now 60 bytes.
> > >>>> Jan 19 10:43:50 mail kernel: ------------[ cut here ]------------
> > >>>> Jan 19 10:43:50 mail kernel: WARNING: at lib/dma-debug.c:902
...
> > That not only compiled, but it cleared the error as well. Additionally, 
> > I used to see a bit of a delay receiving the login prompt when first 
> > connecting to the box by ssh. That delay is gone with this patch. I'd 
> > guess that the warning wasn't quite as innocuous as I thought.
> > Note: tested on 2.6.32.4.  I'll leave this up for a bit before 
> > attempting to move back to head.
> 
> Seems like an underlying bug in the DMA api. Maybe it just can't
> handle operations on partial mapping.
> 
> Other drivers with same problem:
>   bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3, 

It seems using the same length (even without pci_unmap_len()) is
crucial here, but I hope maintainers (added to CC) will take care.

Btw, it's not tested yet, but it might affect CONFIG_DMAR problems.

Thanks,
Jarek P.
----------------------->

Using pci_unmap_len(), with the same length as pci_map_single(), with
pci_dma_sync_single_for_cpu()/_device() fixes this warning (2.6.32.4):

> Jan 19 10:43:50 mail kernel: WARNING: at lib/dma-debug.c:902 
> check_sync+0xc1/0x43f()
> Jan 19 10:43:50 mail kernel: Hardware name: System Product Name
> Jan 19 10:43:50 mail kernel: sky2 0000:04:00.0: DMA-API: device driver 
> tries to sync DMA memory it has not allocated [device 
> address=0x0000000320a0b022] [size=60 bytes]

Reported-by: Michael Breuer <mbreuer@majjas.com>
Tested-by: Michael Breuer <mbreuer@majjas.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---

 drivers/net/sky2.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index 7650f73..cdebdd3 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -2252,12 +2252,14 @@ static struct sk_buff *receive_copy(struct sky2_port *sky2,
 	skb = netdev_alloc_skb_ip_align(sky2->netdev, length);
 	if (likely(skb)) {
 		pci_dma_sync_single_for_cpu(sky2->hw->pdev, re->data_addr,
-					    length, PCI_DMA_FROMDEVICE);
+					    pci_unmap_len(re, data_size),
+					    PCI_DMA_FROMDEVICE);
 		skb_copy_from_linear_data(re->skb, skb->data, length);
 		skb->ip_summed = re->skb->ip_summed;
 		skb->csum = re->skb->csum;
 		pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
-					       length, PCI_DMA_FROMDEVICE);
+					       pci_unmap_len(re, data_size),
+					       PCI_DMA_FROMDEVICE);
 		re->skb->ip_summed = CHECKSUM_NONE;
 		skb_put(skb, length);
 	}


* Re: [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync
  2010-01-20  9:41 [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync Jarek Poplawski
@ 2010-01-20 18:03 ` Stephen Hemminger
  2010-01-20 20:11   ` Michael Chan
  2010-01-20 18:09 ` Stephen Hemminger
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 99+ messages in thread
From: Stephen Hemminger @ 2010-01-20 18:03 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: David Miller, Michael Breuer, akpm, flyboy, linux-kernel, netdev,
	Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Wed, 20 Jan 2010 09:41:03 +0000
Jarek Poplawski <jarkao2@gmail.com> wrote:

> [ previously: Re: [PATCH] af_packet: Don't use skb after dev_queue_xmit() ]
> On Tue, Jan 19, 2010 at 05:10:13PM -0800, Stephen Hemminger wrote:
> > On Tue, 19 Jan 2010 20:01:10 -0500
> > Michael Breuer <mbreuer@majjas.com> wrote:
> > 
> > > On 1/19/2010 5:45 PM, Jarek Poplawski wrote:
> > > > On Tue, Jan 19, 2010 at 03:06:01PM -0500, Michael Breuer wrote:
> > > >    
> > > >> On 1/19/2010 2:59 PM, Jarek Poplawski wrote:
> > > >>      
> > > >>> On Tue, Jan 19, 2010 at 10:47:27AM -0500, Michael Breuer wrote:
> > > >>> ...
> > > >>>        
> > > >>>> Still get the warning... but now 60 bytes.
> > > >>>> Jan 19 10:43:50 mail kernel: ------------[ cut here ]------------
> > > >>>> Jan 19 10:43:50 mail kernel: WARNING: at lib/dma-debug.c:902
> ...
> > > That not only compiled, but it cleared the error as well. Additionally, 
> > > I used to see a bit of a delay receiving the login prompt when first 
> > > connecting to the box by ssh. That delay is gone with this patch. I'd 
> > > guess that the warning wasn't quite as innocuous as I thought.
> > > Note: tested on 2.6.32.4.  I'll leave this up for a bit before 
> > > attempting to move back to head.
> > 
> > Seems like an underlying bug in the DMA api. Maybe it just can't
> > handle operations on partial mapping.
> > 
> > Other drivers with same problem:
> >   bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3, 
> 
> It seems using the same length (even without pci_unmap_len()) is
> crucial here, but I hope maintainers (added to CC) will take care.
> 
> Btw, it's not tested yet, but it might affect CONFIG_DMAR problems.

The list of places to inspect is:

$ git grep -l sync_single_for_cpu $(git grep -l skb_copy_from_linear_data)
drivers/infiniband/ulp/ipoib/ipoib_cm.c
drivers/message/fusion/mptlan.c
drivers/net/b44.c
drivers/net/bnx2.c
drivers/net/cassini.c
drivers/net/chelsio/sge.c
drivers/net/cxgb3/sge.c
drivers/net/irda/vlsi_ir.c
drivers/net/myri_sbus.c
drivers/net/r8169.c
drivers/net/rrunner.c
drivers/net/skge.c
drivers/net/sky2.c
drivers/net/sungem.c
drivers/net/sunhme.c
drivers/net/tg3.c
drivers/net/tokenring/3c359.c
drivers/net/tokenring/olympic.c
drivers/net/tulip/de2104x.c
drivers/net/via-velocity.c
drivers/net/wireless/ipw2x00/ipw2100.c
drivers/net/wireless/ipw2x00/ipw2200.c


* Re: [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync
  2010-01-20  9:41 [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync Jarek Poplawski
  2010-01-20 18:03 ` Stephen Hemminger
@ 2010-01-20 18:09 ` Stephen Hemminger
  2010-01-20 22:24 ` Alan Cox
  2010-01-21 19:59 ` Michael Breuer
  3 siblings, 0 replies; 99+ messages in thread
From: Stephen Hemminger @ 2010-01-20 18:09 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: David Miller, Michael Breuer, akpm, flyboy, linux-kernel, netdev,
	Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Wed, 20 Jan 2010 09:41:03 +0000
Jarek Poplawski <jarkao2@gmail.com> wrote:

> It seems using the same length (even without pci_unmap_len()) is
> crucial here, but I hope maintainers (added to CC) will take care.
> 
> Btw, it's not tested yet, but it might affect CONFIG_DMAR problems.
> 
> Thanks,
> Jarek P.

Update Documentation/PCI-DMA-mapping as well please.

-- 


* Re: [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync
  2010-01-20 18:03 ` Stephen Hemminger
@ 2010-01-20 20:11   ` Michael Chan
  2010-01-20 20:30     ` Stephen Hemminger
  0 siblings, 1 reply; 99+ messages in thread
From: Michael Chan @ 2010-01-20 20:11 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jarek Poplawski, David Miller, Michael Breuer, akpm, flyboy,
	linux-kernel, netdev, Don Fry, Francois Romieu, Matthew Carlson


On Wed, 2010-01-20 at 10:03 -0800, Stephen Hemminger wrote:
> On Wed, 20 Jan 2010 09:41:03 +0000
> Jarek Poplawski <jarkao2@gmail.com> wrote:
> > > Seems like an underlying bug in the DMA api. Maybe it just can't
> > > handle operations on partial mapping.
> > > 
> > > Other drivers with same problem:
> > >   bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3, 
> > 
> > It seems using the same length (even without pci_unmap_len()) is
> > crucial here, but I hope maintainers (added to CC) will take care.
> > 

I'm still unsure how to do dma_sync properly in bnx2.  In the current
code, we always dma_sync_for_cpu a small portion of the SKB because the
rx descriptor is at the beginning of the SKB.  We get the packet length,
for example, from the rx descriptor.

If it's a big packet, we'll simply unmap the entire SKB buffer (with the
beginning portion already dma_sync'ed).  If the packet is smaller than
what we dma_sync'ed, we'll just copy the data to a new SKB.  We'll then
dma_sync_for_device the portion of the original buffer and recycle the
whole buffer back to the device for new packets.

So, is it correct to just change the dma_sync length to the full length
of the buffer?  It doesn't sound right to me.

Thanks.
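
For illustration, the receive flow described above can be reduced to a
single decision. This is only a user-space sketch, not bnx2 code:
plan_rx(), struct rx_plan and the parameter names are hypothetical, and
treating a packet exactly as long as the synced prefix as "small" is an
assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this models the decision, not the driver. */
enum rx_action {
	RX_COPY_AND_RECYCLE,	/* small packet: copy out, give buffer back */
	RX_UNMAP_AND_PASS_UP	/* big packet: unmap the whole buffer */
};

struct rx_plan {
	enum rx_action action;
	size_t sync_len;	/* bytes synced for the CPU up front */
	size_t copy_len;	/* bytes copied into a new buffer, 0 if none */
};

static struct rx_plan plan_rx(size_t pkt_len, size_t hdr_sync_len,
			      size_t map_len)
{
	struct rx_plan plan;

	/* Neither the synced prefix nor the packet can exceed the mapping. */
	assert(hdr_sync_len <= map_len);
	assert(pkt_len <= map_len);

	/* The prefix is always synced first, since it holds the descriptor. */
	plan.sync_len = hdr_sync_len;

	if (pkt_len <= hdr_sync_len) {
		/* Everything needed is already visible to the CPU. */
		plan.action = RX_COPY_AND_RECYCLE;
		plan.copy_len = pkt_len;
	} else {
		plan.action = RX_UNMAP_AND_PASS_UP;
		plan.copy_len = 0;
	}
	return plan;
}
```

With a 128-byte synced prefix and a 1536-byte mapping, a 60-byte packet
is copied and the buffer recycled, while a 1000-byte packet leads to a
full unmap. In both cases the sync covers only the prefix, which is the
partial sync the question is about.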




* Re: [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync
  2010-01-20 20:11   ` Michael Chan
@ 2010-01-20 20:30     ` Stephen Hemminger
  2010-01-20 20:58       ` Jarek Poplawski
  2010-01-20 22:45       ` David Miller
  0 siblings, 2 replies; 99+ messages in thread
From: Stephen Hemminger @ 2010-01-20 20:30 UTC (permalink / raw)
  To: Michael Chan
  Cc: Jarek Poplawski, David Miller, Michael Breuer, akpm, flyboy,
	linux-kernel, netdev, Don Fry, Francois Romieu, Matthew Carlson

On Wed, 20 Jan 2010 12:11:52 -0800
"Michael Chan" <mchan@broadcom.com> wrote:

> 
> On Wed, 2010-01-20 at 10:03 -0800, Stephen Hemminger wrote:
> > On Wed, 20 Jan 2010 09:41:03 +0000
> > Jarek Poplawski <jarkao2@gmail.com> wrote:
> > > > Seems like an underlying bug in the DMA api. Maybe it just can't
> > > > handle operations on partial mapping.
> > > > 
> > > > Other drivers with same problem:
> > > >   bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3, 
> > > 
> > > It seems using the same length (even without pci_unmap_len()) is
> > > crucial here, but I hope maintainers (added to CC) will take care.
> > > 
> 
> I'm still unsure how to do dma_sync properly in bnx2.  In the current
> code, we always dma_sync_for_cpu a small portion of the SKB because the
> rx descriptor is at the beginning of the SKB.  We get the packet length,
> for example, from the rx descriptor.
> 
> If it's a big packet, we'll simply unmap the entire SKB buffer (with the
> beginning portion already dma_sync'ed).  If the packet is smaller than
> what we dma_sync'ed, we'll just copy the data to a new SKB.  We'll then
> dma_sync_for_device the portion of the original buffer and recycle the
> whole buffer back to the device for new packets.
> 
> So, is it correct to just change the dma_sync length to the full length
> of the buffer?  It doesn't sound right to me.

It looks like the size passed to sync_single has to match size of original
mapping.


* Re: [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync
  2010-01-20 20:30     ` Stephen Hemminger
@ 2010-01-20 20:58       ` Jarek Poplawski
  2010-01-20 22:50         ` David Miller
  2010-01-20 22:45       ` David Miller
  1 sibling, 1 reply; 99+ messages in thread
From: Jarek Poplawski @ 2010-01-20 20:58 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Michael Chan, David Miller, Michael Breuer, akpm, flyboy,
	linux-kernel, netdev, Don Fry, Francois Romieu, Matthew Carlson

On Wed, Jan 20, 2010 at 12:30:33PM -0800, Stephen Hemminger wrote:
> On Wed, 20 Jan 2010 12:11:52 -0800
> "Michael Chan" <mchan@broadcom.com> wrote:
> 
> > 
> > On Wed, 2010-01-20 at 10:03 -0800, Stephen Hemminger wrote:
> > > On Wed, 20 Jan 2010 09:41:03 +0000
> > > Jarek Poplawski <jarkao2@gmail.com> wrote:
> > > > > Seems like an underlying bug in the DMA api. Maybe it just can't
> > > > > handle operations on partial mapping.
> > > > > 
> > > > > Other drivers with same problem:
> > > > >   bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3, 
> > > > 
> > > > It seems using the same length (even without pci_unmap_len()) is
> > > > crucial here, but I hope maintainers (added to CC) will take care.
> > > > 
> > 
> > I'm still unsure how to do dma_sync properly in bnx2.  In the current
> > code, we always dma_sync_for_cpu a small portion of the SKB because the
> > rx descriptor is at the beginning of the SKB.  We get the packet length,
> > for example, from the rx descriptor.
> > 
> > If it's a big packet, we'll simply unmap the entire SKB buffer (with the
> > beginning portion already dma_sync'ed).  If the packet is smaller than
> > what we dma_sync'ed, we'll just copy the data to a new SKB.  We'll then
> > dma_sync_for_device the portion of the original buffer and recycle the
> > whole buffer back to the device for new packets.
> > 
> > So, is it correct to just change the dma_sync length to the full length
> > of the buffer?  It doesn't sound right to me.
> 
> It looks like the size passed to sync_single has to match size of original
> mapping.

Yes, and it's mainly for lib/dma-debug (as long as it's not verified
that the DMAR errors reported by Michael Breuer are connected). So, I'm
not sure for now how serious this warning could be. On the other hand,
Documentation/(PCI-?)DMA-mapping, mentioned by Stephen, doesn't seem
to allow, or use in its examples, a "size" different from the mapped one.

Jarek P.


* Re: [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync
  2010-01-20  9:41 [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync Jarek Poplawski
  2010-01-20 18:03 ` Stephen Hemminger
  2010-01-20 18:09 ` Stephen Hemminger
@ 2010-01-20 22:24 ` Alan Cox
  2010-01-20 22:53   ` David Miller
  2010-01-20 22:53   ` Jarek Poplawski
  2010-01-21 19:59 ` Michael Breuer
  3 siblings, 2 replies; 99+ messages in thread
From: Alan Cox @ 2010-01-20 22:24 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: David Miller, Stephen Hemminger, Michael Breuer, akpm, flyboy,
	linux-kernel, netdev, Michael Chan, Don Fry, Francois Romieu,
	Matt Carlson

> > Seems like an underlying bug in the DMA api. Maybe it just can't
> > handle operations on partial mapping.
> > 
> > Other drivers with same problem:
> >   bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3, 
> 
> It seems using the same length (even without pci_unmap_len()) is
> crucial here, but I hope maintainers (added to CC) will take care.

The API needs fixing - if you've got a large mapping and you want to sync
part of it then we need to support that. Now it might well be that the
implementation on some braindead platform has to sync the entire thing,
and some implementations entire pages or cache lines.

You can't fix this in the drivers, they requested a service and they
don't have enough information nor is it their job to know about all the
platform specific rules.

Alan
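
To make the cache-line case concrete: a platform that can only clean or
invalidate whole lines could still accept a partial sync by widening the
request inside the platform code, so the driver never needs to know the
line size. A user-space sketch of that rounding follows; widen_to_lines()
and struct span are hypothetical names, not kernel API, and a
power-of-two line size is assumed.

```c
#include <assert.h>
#include <stdint.h>

struct span {
	uint64_t start;	/* inclusive, aligned down to a line boundary */
	uint64_t end;	/* exclusive, aligned up to a line boundary */
};

/* Widen [addr, addr + len) to the cache lines that cover it. */
static struct span widen_to_lines(uint64_t addr, uint64_t len, uint64_t line)
{
	struct span s;

	/* The mask arithmetic below only works for power-of-two sizes. */
	assert(line != 0 && (line & (line - 1)) == 0);
	assert(len != 0);

	s.start = addr & ~(line - 1);
	s.end = (addr + len + line - 1) & ~(line - 1);
	return s;
}
```

A 60-byte sync starting at 0x1002 with 64-byte lines becomes the single
line 0x1000..0x1040, and a 10-byte sync that straddles a line boundary
grows to two lines.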


* Re: [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync
  2010-01-20 20:30     ` Stephen Hemminger
  2010-01-20 20:58       ` Jarek Poplawski
@ 2010-01-20 22:45       ` David Miller
  1 sibling, 0 replies; 99+ messages in thread
From: David Miller @ 2010-01-20 22:45 UTC (permalink / raw)
  To: shemminger
  Cc: mchan, jarkao2, mbreuer, akpm, flyboy, linux-kernel, netdev,
	pcnet32, romieu, mcarlson

From: Stephen Hemminger <shemminger@linux-foundation.org>
Date: Wed, 20 Jan 2010 12:30:33 -0800

> It looks like the size passed to sync_single has to match size of original
> mapping.

I think that's ridiculous, and unreasonable.

All implementations of these APIs where the sync matters have the
ability to handle partial syncs correctly.


* Re: [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync
  2010-01-20 20:58       ` Jarek Poplawski
@ 2010-01-20 22:50         ` David Miller
  0 siblings, 0 replies; 99+ messages in thread
From: David Miller @ 2010-01-20 22:50 UTC (permalink / raw)
  To: jarkao2
  Cc: shemminger, mchan, mbreuer, akpm, flyboy, linux-kernel, netdev,
	pcnet32, romieu, mcarlson

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Wed, 20 Jan 2010 21:58:42 +0100

> Yes, and it's mainly for lib/dma-debug (as long as it's not verified
> that the DMAR errors reported by Michael Breuer are connected). So, I'm
> not sure for now how serious this warning could be. On the other hand,
> Documentation/(PCI-?)DMA-mapping, mentioned by Stephen, doesn't seem
> to allow, or use in its examples, a "size" different from the mapped one.

Someone find me even one actual implementation of the DMA APIs, not
the debugging code, which can't handle a partial sync correctly, then
we can talk.

As the person who originally created the PCI DMA APIs I can tell
you the intention was definitely to allow arbitrary SYNC lengths
and there was no requirement that the length needed to be the
same as that used in the mapping call.

Nothing else makes sense.  People sync when they have a partially used
buffer and want to copy the data out then recycle the big original
buffer back to the chip.  I knew this when creating the PCI DMA APIs
because I first tried to use them in network drivers and that's what
so many of them do.

Currently all we have is an overly anal DMA debugging layer check, and
some speculation that it might or might not fix some real bug.


* Re: [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync
  2010-01-20 22:24 ` Alan Cox
@ 2010-01-20 22:53   ` David Miller
  2010-01-20 22:53   ` Jarek Poplawski
  1 sibling, 0 replies; 99+ messages in thread
From: David Miller @ 2010-01-20 22:53 UTC (permalink / raw)
  To: alan
  Cc: jarkao2, shemminger, mbreuer, akpm, flyboy, linux-kernel, netdev,
	mchan, pcnet32, romieu, mcarlson

From: Alan Cox <alan@lxorguk.ukuu.org.uk>
Date: Wed, 20 Jan 2010 22:24:14 +0000

>> > Seems like an underlying bug in the DMA api. Maybe it just can't
>> > handle operations on partial mapping.
>> > 
>> > Other drivers with same problem:
>> >   bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3, 
>> 
>> It seems using the same length (even without pci_unmap_len()) is
>> crucial here, but I hope maintainers (added to CC) will take care.
> 
> The API needs fixing - if you've got a large mapping and you want to sync
> part of it then we need to support that. Now it might well be that the
> implementation on some braindead platform has to sync the entire thing,
> and some implementations entire pages or cache lines.
> 
> You can't fix this in the drivers, they requested a service and they
> don't have enough information nor is it their job to know about all the
> platform specific rules.

Absolutely and 100% agreed, the DMA debugging layer and documentation
are both buggy.

The intention from the beginning was always to allow partial SYNCs
exactly for the reasons Alan and myself are mentioning here.


* Re: [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync
  2010-01-20 22:24 ` Alan Cox
  2010-01-20 22:53   ` David Miller
@ 2010-01-20 22:53   ` Jarek Poplawski
  2010-01-21 15:22     ` FUJITA Tomonori
  1 sibling, 1 reply; 99+ messages in thread
From: Jarek Poplawski @ 2010-01-20 22:53 UTC (permalink / raw)
  To: Alan Cox
  Cc: David Miller, Stephen Hemminger, Michael Breuer, akpm, flyboy,
	linux-kernel, netdev, Michael Chan, Don Fry, Francois Romieu,
	Matt Carlson

On Wed, Jan 20, 2010 at 10:24:14PM +0000, Alan Cox wrote:
> > > Seems like an underlying bug in the DMA api. Maybe it just can't
> > > handle operations on partial mapping.
> > > 
> > > Other drivers with same problem:
> > >   bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3, 
> > 
> > It seems using the same length (even without pci_unmap_len()) is
> > crucial here, but I hope maintainers (added to CC) will take care.
> 
> The API needs fixing - if you've got a large mapping and you want to sync
> part of it then we need to support that. Now it might well be that the
> implementation on some braindead platform has to sync the entire thing,
> and some implementations entire pages or cache lines.
> 
> You can't fix this in the drivers, they requested a service and they
> don't have enough information nor is it their job to know about all the
> platform specific rules.

Yes, the need to repeat some other values if there is a dedicated
structure/pointer could be misleading. Btw, it seems to be a trivial
oversight, since there is dma_sync_single_range() ready to use.

Jarek P.


* Re: [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync
  2010-01-20 22:53   ` Jarek Poplawski
@ 2010-01-21 15:22     ` FUJITA Tomonori
  2010-01-21 18:41       ` Jarek Poplawski
  0 siblings, 1 reply; 99+ messages in thread
From: FUJITA Tomonori @ 2010-01-21 15:22 UTC (permalink / raw)
  To: jarkao2
  Cc: alan, davem, shemminger, mbreuer, akpm, flyboy, linux-kernel,
	netdev, mchan, pcnet32, romieu, mcarlson

On Wed, 20 Jan 2010 23:53:22 +0100
Jarek Poplawski <jarkao2@gmail.com> wrote:

> On Wed, Jan 20, 2010 at 10:24:14PM +0000, Alan Cox wrote:
> > > > Seems like an underlying bug in the DMA api. Maybe it just can't
> > > > handle operations on partial mapping.
> > > > 
> > > > Other drivers with same problem:
> > > >   bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3, 
> > > 
> > > It seems using the same length (even without pci_unmap_len()) is
> > > crucial here, but I hope maintainers (added to CC) will take care.
> > 
> > The API needs fixing - if you've got a large mapping and you want to sync
> > part of it then we need to support that. Now it might well be that the
> > implementation on some braindead platform has to sync the entire thing,
> > and some implementations entire pages or cache lines.
> > 
> > You can't fix this in the drivers, they requested a service and they
> > don't have enough information nor is it their job to know about all the
> > platform specific rules.
> 
> Yes, the need to repeat some other values if there is a dedicated
> structure/pointer could be misleading. Btw, it seems to be a trivial
> oversight, since there is dma_sync_single_range() ready to use.

Yeah, dma_sync_single_range() enables you to do a partial sync. But
you must be really careful with a partial sync (as DMA-API.txt says).


* Re: [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync
  2010-01-21 15:22     ` FUJITA Tomonori
@ 2010-01-21 18:41       ` Jarek Poplawski
  2010-01-22  5:11         ` FUJITA Tomonori
  0 siblings, 1 reply; 99+ messages in thread
From: Jarek Poplawski @ 2010-01-21 18:41 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: alan, davem, shemminger, mbreuer, akpm, flyboy, linux-kernel,
	netdev, mchan, pcnet32, romieu, mcarlson

On Fri, Jan 22, 2010 at 12:22:10AM +0900, FUJITA Tomonori wrote:
> On Wed, 20 Jan 2010 23:53:22 +0100
> Jarek Poplawski <jarkao2@gmail.com> wrote:
> 
> > On Wed, Jan 20, 2010 at 10:24:14PM +0000, Alan Cox wrote:
> > > > > Seems like an underlying bug in the DMA api. Maybe it just can't
> > > > > handle operations on partial mapping.
> > > > > 
> > > > > Other drivers with same problem:
> > > > >   bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3, 
> > > > 
> > > > It seems using the same length (even without pci_unmap_len()) is
> > > > crucial here, but I hope maintainers (added to CC) will take care.
> > > 
> > > The API needs fixing - if you've got a large mapping and you want to sync
> > > part of it then we need to support that. Now it might well be that the
> > > implementation on some braindead platform has to sync the entire thing,
> > > and some implementations entire pages or cache lines.
> > > 
> > > You can't fix this in the drivers, they requested a service and they
> > > don't have enough information nor is it their job to know about all the
> > > platform specific rules.
> > 
> > Yes, the need to repeat some other values if there is a dedicated
> > structure/pointer could be misleading. Btw, it seems to be a trivial
> > oversight, since there is dma_sync_single_range() ready to use.
> 
> Yeah, dma_sync_single_range() enables you to do a partial sync. But
> you must be really careful with a partial sync (as DMA-API.txt says).

Actually, we are trying to establish here (and in a few other netdev@
threads) what exactly the author was worried about. After looking at
some implementations, it seems to me that this carefulness about the
cache alignment and width is needed only wrt. the 'offset'. The way
the 'size' is used (or rather not used for anything crucial) suggests
that dma_sync_single_range() with zero offset is completely safe. But
then it's equivalent to dma_sync_single() with a 'size' possibly
smaller than the one 'passed into the single mapping'. Which,
according to the other comment, seems wrong...

Jarek P.
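
The two readings of the 'size' rule discussed above can be put side by
side as predicates. This is a user-space toy model and not the
lib/dma-debug.c logic: struct mapping, sync_ok_strict() and
sync_ok_within() are hypothetical names, and the strict variant only
encodes the "size has to match the original mapping" reading.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct mapping {
	uint64_t addr;	/* device address returned by the map call */
	uint64_t len;	/* size passed to the map call */
};

/* Strict reading: a sync must repeat the mapped address and size. */
static bool sync_ok_strict(const struct mapping *m, uint64_t addr,
			   uint64_t len)
{
	assert(m);
	return addr == m->addr && len == m->len;
}

/* Lenient reading: any non-empty range inside the mapping is fine. */
static bool sync_ok_within(const struct mapping *m, uint64_t addr,
			   uint64_t len)
{
	assert(m);
	if (len == 0 || addr < m->addr)
		return false;
	/* Written as subtractions so the sums cannot overflow. */
	if (addr - m->addr > m->len)
		return false;
	return len <= m->len - (addr - m->addr);
}
```

For a 1536-byte mapping, a 60-byte sync at offset zero fails the strict
predicate but passes the lenient one, while a sync that runs past the
end of the mapping fails both.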


* Re: [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync
  2010-01-20  9:41 [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync Jarek Poplawski
                   ` (2 preceding siblings ...)
  2010-01-20 22:24 ` Alan Cox
@ 2010-01-21 19:59 ` Michael Breuer
  2010-01-21 20:41   ` Jarek Poplawski
  3 siblings, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2010-01-21 19:59 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: David Miller, Stephen Hemminger, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 1/20/2010 4:41 AM, Jarek Poplawski wrote:
> [ previously: Re: [PATCH] af_packet: Don't use skb after dev_queue_xmit() ]
> On Tue, Jan 19, 2010 at 05:10:13PM -0800, Stephen Hemminger wrote:
>    
>> On Tue, 19 Jan 2010 20:01:10 -0500
>> Michael Breuer<mbreuer@majjas.com>  wrote:
>>
>>      
>>> On 1/19/2010 5:45 PM, Jarek Poplawski wrote:
>>>        
>>>> On Tue, Jan 19, 2010 at 03:06:01PM -0500, Michael Breuer wrote:
>>>>
>>>>          
>>>>> On 1/19/2010 2:59 PM, Jarek Poplawski wrote:
>>>>>
>>>>>            
>>>>>> On Tue, Jan 19, 2010 at 10:47:27AM -0500, Michael Breuer wrote:
>>>>>> ...
>>>>>>
>>>>>>              
>>>>>>> Still get the warning... but now 60 bytes.
>>>>>>> Jan 19 10:43:50 mail kernel: ------------[ cut here ]------------
>>>>>>> Jan 19 10:43:50 mail kernel: WARNING: at lib/dma-debug.c:902
>>>>>>>                
> ...
>    
>>> That not only compiled, but it cleared the error as well. Additionally,
>>> I used to see a bit of a delay receiving the login prompt when first
>>> connecting to the box by ssh. That delay is gone with this patch. I'd
>>> guess that the warning wasn't quite as innocuous as I thought.
>>> Note: tested on 2.6.32.4.  I'll leave this up for a bit before
>>> attempting to move back to head.
>>>        
>> Seems like an underlying bug in the DMA api. Maybe it just can't
>> handle operations on partial mapping.
>>
>> Other drivers with same problem:
>>    bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3,
>>      
> It seems using the same length (even without pci_unmap_len()) is
> crucial here, but I hope maintainers (added to CC) will take care.
>
> Btw, it's not tested yet, but it might affect CONFIG_DMAR problems.
>
> Thanks,
> Jarek P.
> ----------------------->
>
> Using pci_unmap_len(), with the same length as pci_map_single(), with
> pci_dma_sync_single_for_cpu()/_device() fixes this warning (2.6.32.4):
>
>    
>> Jan 19 10:43:50 mail kernel: WARNING: at lib/dma-debug.c:902
>> check_sync+0xc1/0x43f()
>> Jan 19 10:43:50 mail kernel: Hardware name: System Product Name
>> Jan 19 10:43:50 mail kernel: sky2 0000:04:00.0: DMA-API: device driver
>> tries to sync DMA memory it has not allocated [device
>> address=0x0000000320a0b022] [size=60 bytes]
>>      
> Reported-by: Michael Breuer<mbreuer@majjas.com>
> Tested-by: Michael Breuer<mbreuer@majjas.com>
> Signed-off-by: Jarek Poplawski<jarkao2@gmail.com>
> ---
>
>   drivers/net/sky2.c |    6 ++++--
>   1 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
> index 7650f73..cdebdd3 100644
> --- a/drivers/net/sky2.c
> +++ b/drivers/net/sky2.c
> @@ -2252,12 +2252,14 @@ static struct sk_buff *receive_copy(struct sky2_port *sky2,
>   	skb = netdev_alloc_skb_ip_align(sky2->netdev, length);
>   	if (likely(skb)) {
>   		pci_dma_sync_single_for_cpu(sky2->hw->pdev, re->data_addr,
> -					    length, PCI_DMA_FROMDEVICE);
> +					    pci_unmap_len(re, data_size),
> +					    PCI_DMA_FROMDEVICE);
>   		skb_copy_from_linear_data(re->skb, skb->data, length);
>   		skb->ip_summed = re->skb->ip_summed;
>   		skb->csum = re->skb->csum;
>   		pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
> -					       length, PCI_DMA_FROMDEVICE);
> +					       pci_unmap_len(re, data_size),
> +					       PCI_DMA_FROMDEVICE);
>   		re->skb->ip_summed = CHECKSUM_NONE;
>   		skb_put(skb, length);
>   	}
>    
Just a testing update: I went back to CONFIG_DMAR=Y yesterday and have 
not (yet) encountered the sky2 crash I'd been having prior to this fix. 
I've been pumping traffic through, and based on pre-patch experience, it 
would likely have crashed by now.

Will keep the system up for the next couple of days w/o reboot to 
confirm that the sky2 lockup I'd been seeing has stopped happening with 
this patch.

Test notes:

1) Warning previously apparent on start (dma_debug check_sync) with 
CONFIG_DMAR=n is gone.
2) W/o the above patch, I was getting sky2 DMAR errors and subsequent TX 
hangs requiring reboot to clear. The hangs happened after at least 12 
hours of uptime, and under RX load at the time of the hang.
3) With the above patch (and no other changes) I have not been able to 
recreate the crash - the system is stable.

I have been following the discussion about the DMA API, and I would
suggest that the length issue when DMAR is enabled is less innocuous
than previously believed.


* Re: [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync
  2010-01-21 19:59 ` Michael Breuer
@ 2010-01-21 20:41   ` Jarek Poplawski
  2010-01-21 20:46     ` Michael Breuer
  2010-01-22 18:01     ` Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync) Michael Breuer
  0 siblings, 2 replies; 99+ messages in thread
From: Jarek Poplawski @ 2010-01-21 20:41 UTC (permalink / raw)
  To: Michael Breuer
  Cc: David Miller, Stephen Hemminger, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Thu, Jan 21, 2010 at 02:59:19PM -0500, Michael Breuer wrote:
> On 1/20/2010 4:41 AM, Jarek Poplawski wrote:
> >[ previously: Re: [PATCH] af_packet: Don't use skb after dev_queue_xmit() ]
> >diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
> >index 7650f73..cdebdd3 100644
> >--- a/drivers/net/sky2.c
> >+++ b/drivers/net/sky2.c
> >@@ -2252,12 +2252,14 @@ static struct sk_buff *receive_copy(struct sky2_port *sky2,
> >  	skb = netdev_alloc_skb_ip_align(sky2->netdev, length);
> >  	if (likely(skb)) {
> >  		pci_dma_sync_single_for_cpu(sky2->hw->pdev, re->data_addr,
> >-					    length, PCI_DMA_FROMDEVICE);
> >+					    pci_unmap_len(re, data_size),
> >+					    PCI_DMA_FROMDEVICE);
> >  		skb_copy_from_linear_data(re->skb, skb->data, length);
> >  		skb->ip_summed = re->skb->ip_summed;
> >  		skb->csum = re->skb->csum;
> >  		pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
> >-					       length, PCI_DMA_FROMDEVICE);
> >+					       pci_unmap_len(re, data_size),
> >+					       PCI_DMA_FROMDEVICE);
> >  		re->skb->ip_summed = CHECKSUM_NONE;
> >  		skb_put(skb, length);
> >  	}
> Just a testing update: I went back to CONFIG_DMAR=Y yesterday and
> have not (yet) encountered the sky2 crash I'd been having prior to
> this fix. I've been pumping traffic through, and based on pre-patch
> experience, it would likely have crashed by now.
> 
> Will keep the system up for the next couple of days w/o reboot to
> confirm that the sky2 lockup I'd been seeing has stopped happening
> with this patch.
> 
> Test notes:
> 
> 1) Warning previously apparent on start (dma_debug check_sync) with
> CONFIG_DMAR=n is gone.
> 2) W/o the above patch, I was getting sky2 DMAR errors and
> subsequent TX hangs requiring reboot to clear. The hangs happened
> after at least 12 hours of uptime, and under RX load at the time of
> the hang.
> 3) With the above patch (and no other changes) I have not been able
> to recreate the crash - the system is stable.

Btw, could you remind us whether jumbo frames might have been in use,
or the MTU changed, during the last DMAR bugs, and what the current
test setting is?

> 
> I have been following the discussion about the DMA api would suggest
> that the length issue when DMAR is enabled is less innocuous than
> previously believed.

Actually, the last conclusions are - it's more innocuous than ever
believed, and I completely agree with this (at least until the next
week ;-).

Thanks,
Jarek P.

* Re: [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync
  2010-01-21 20:41   ` Jarek Poplawski
@ 2010-01-21 20:46     ` Michael Breuer
  2010-01-21 21:02       ` Jarek Poplawski
  2010-01-22 18:01     ` Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync) Michael Breuer
  1 sibling, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2010-01-21 20:46 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: David Miller, Stephen Hemminger, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 1/21/2010 3:41 PM, Jarek Poplawski wrote:
> On Thu, Jan 21, 2010 at 02:59:19PM -0500, Michael Breuer wrote:
>    
>> On 1/20/2010 4:41 AM, Jarek Poplawski wrote:
>>      
>>> [ previously: Re: [PATCH] af_packet: Don't use skb after dev_queue_xmit() ]
>>> diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
>>> index 7650f73..cdebdd3 100644
>>> --- a/drivers/net/sky2.c
>>> +++ b/drivers/net/sky2.c
>>> @@ -2252,12 +2252,14 @@ static struct sk_buff *receive_copy(struct sky2_port *sky2,
>>>   	skb = netdev_alloc_skb_ip_align(sky2->netdev, length);
>>>   	if (likely(skb)) {
>>>   		pci_dma_sync_single_for_cpu(sky2->hw->pdev, re->data_addr,
>>> -					    length, PCI_DMA_FROMDEVICE);
>>> +					    pci_unmap_len(re, data_size),
>>> +					    PCI_DMA_FROMDEVICE);
>>>   		skb_copy_from_linear_data(re->skb, skb->data, length);
>>>   		skb->ip_summed = re->skb->ip_summed;
>>>   		skb->csum = re->skb->csum;
>>>   		pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
>>> -					       length, PCI_DMA_FROMDEVICE);
>>> +					       pci_unmap_len(re, data_size),
>>> +					       PCI_DMA_FROMDEVICE);
>>>   		re->skb->ip_summed = CHECKSUM_NONE;
>>>   		skb_put(skb, length);
>>>   	}
>>>        
>> Just a testing update: I went back to CONFIG_DMAR=Y yesterday and
>> have not (yet) encountered the sky2 crash I'd been having prior to
>> this fix. I've been pumping traffic through, and based on pre-patch
>> experience, it would likely have crashed by now.
>>
>> Will keep the system up for the next couple of days w/o reboot to
>> confirm that the sky2 lockup I'd been seeing has stopped happening
>> with this patch.
>>
>> Test notes:
>>
>> 1) Warning previously apparent on start (dma_debug check_sync) with
>> CONFIG_DMAR=n is gone.
>> 2) W/o the above patch, I was getting sky2 DMAR errors and
>> subsequent TX hangs requiring reboot to clear. The hangs happened
>> after at least 12 hours of uptime, and under RX load at the time of
>> the hang.
>> 3) With the above patch (and no other changes) I have not been able
>> to recreate the crash - the system is stable.
>>      
> Btw, could you remind us if during last dmar bugs jumbo frames might
> have been used or maybe mtu was changed, and the current test setting?
>
>    
I've hit this with and without Jumbo frames enabled. Last couple of 
recreations were with mtu = 1500, which is how I'm running now.
>> I have been following the discussion about the DMA api would suggest
>> that the length issue when DMAR is enabled is less innocuous than
>> previously believed.
>>      
> Actually, the last conclusions are - it's more innocuous than ever
> believed, and I completely agree with this (at least until the next
> week ;-).
>    
I stand grammatically corrected.
> Thanks,
> Jarek P.
>    


* Re: [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync
  2010-01-21 20:46     ` Michael Breuer
@ 2010-01-21 21:02       ` Jarek Poplawski
  0 siblings, 0 replies; 99+ messages in thread
From: Jarek Poplawski @ 2010-01-21 21:02 UTC (permalink / raw)
  To: Michael Breuer
  Cc: David Miller, Stephen Hemminger, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Thu, Jan 21, 2010 at 03:46:50PM -0500, Michael Breuer wrote:
> On 1/21/2010 3:41 PM, Jarek Poplawski wrote:
> >On Thu, Jan 21, 2010 at 02:59:19PM -0500, Michael Breuer wrote:
> >>Test notes:
> >>
> >>1) Warning previously apparent on start (dma_debug check_sync) with
> >>CONFIG_DMAR=n is gone.
> >>2) W/o the above patch, I was getting sky2 DMAR errors and
> >>subsequent TX hangs requiring reboot to clear. The hangs happened
> >>after at least 12 hours of uptime, and under RX load at the time of
> >>the hang.
> >>3) With the above patch (and no other changes) I have not been able
> >>to recreate the crash - the system is stable.
> >Btw, could you remind us if during last dmar bugs jumbo frames might
> >have been used or maybe mtu was changed, and the current test setting?
> >
> I've hit this with and without Jumbo frames enabled. Last couple of
> recreations were with mtu = 1500, which is how I'm running now.
> >>I have been following the discussion about the DMA api would suggest
> >>that the length issue when DMAR is enabled is less innocuous than
> >>previously believed.
> >Actually, the last conclusions are - it's more innocuous than ever
> >believed, and I completely agree with this (at least until the next
> >week ;-).
> I stand grammatically corrected.

I didn't mean anything grammatical, sorry! (Except, it's equally
complex ;-)

Jarek P.

* Re: [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync
  2010-01-21 18:41       ` Jarek Poplawski
@ 2010-01-22  5:11         ` FUJITA Tomonori
  2010-01-22  6:38           ` David Miller
  0 siblings, 1 reply; 99+ messages in thread
From: FUJITA Tomonori @ 2010-01-22  5:11 UTC (permalink / raw)
  To: jarkao2
  Cc: fujita.tomonori, alan, davem, shemminger, mbreuer, akpm, flyboy,
	linux-kernel, netdev, mchan, pcnet32, romieu, mcarlson,
	James.Bottomley

On Thu, 21 Jan 2010 19:41:58 +0100
Jarek Poplawski <jarkao2@gmail.com> wrote:

> On Fri, Jan 22, 2010 at 12:22:10AM +0900, FUJITA Tomonori wrote:
> > On Wed, 20 Jan 2010 23:53:22 +0100
> > Jarek Poplawski <jarkao2@gmail.com> wrote:
> > 
> > > On Wed, Jan 20, 2010 at 10:24:14PM +0000, Alan Cox wrote:
> > > > > > Seems like an underlying bug in the DMA api. Maybe it just can't
> > > > > > handle operations on partial mapping.
> > > > > > 
> > > > > > Other drivers with same problem:
> > > > > >   bnx2, cassini, pcnet32, r8169, rrunner, skge, sungem, tg3, 
> > > > > 
> > > > > It seems using the same length (even without pci_unmap_len()) is
> > > > > crucial here, but I hope maintainers (added to CC) will take care.
> > > > 
> > > > The API needs fixing - if you've got a large mapping and you want to sync
> > > > part of it then we need to support that. Now it might well be that the
> > > > implementation on some braindead platform has to sync the entire thing,
> > > > and some implementations entire pages or cache lines.
> > > > 
> > > > You can't fix this in the drivers, they requested a service and they
> > > > don't have enough information nor is it their job to know about all the
> > > > platform specific rules.
> > > 
> > > Yes, the need to repeat some other values if there is a dedicated
> > > structure/pointer could be misleading. Btw, it seems to be a trivial
> > > overlooking since there is dma_sync_single_range() ready to use.
> > 
> > Yeah, dma_sync_single_range() enables you to do a partial sync. But
> > you must be really careful with a partial sync (as DMA-API.txt says).
> 
> Actually, we are trying to establish here (and a few more netdev@
> threads) what exactly the author was worried about. After looking at

James added to Cc,


> some implementations it seems to me this carefulness in observing
> the cache alignment and width is needed only wrt. the 'offset'. But
> then, the way the 'size' is used (or rather not used for anything
> crucial) suggests dma_sync_single_range() with zero offset seems
> completely safe. But then it's equivalent to dma_sync_single() with

Even if 'offset' is zero, 'size' still matters, I think. If 'size' is
not a multiple of the cache line size, it's possible that driver
writers who aren't familiar with cache would be surprised (it depends
on the way their drivers use buffers though).

The easiest way to get a 'completely safe sync for any driver writers'
is to require that all the sync parameters be the same as those passed
into the single mapping API. If writers know what they are doing, they
can do a partial sync with the sync_range API. That's the author's
intention, I guess.
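
As a rough illustration of why 'size' matters even with a zero
'offset': a minimal sketch, assuming a platform that maintains its
cache in whole lines of line_size bytes. The names sync_span and
cacheline_span() are made up for this example and are not part of the
DMA API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: the byte range that a cache-maintenance
 * operation would really touch. Not part of any kernel API. */
struct sync_span {
	size_t start;	/* first byte touched, inclusive */
	size_t end;	/* one past the last byte touched */
};

/* Round a requested partial sync [offset, offset + size) out to whole
 * cache lines. line_size must be a non-zero power of two. */
static struct sync_span cacheline_span(size_t offset, size_t size,
				       size_t line_size)
{
	struct sync_span span;

	assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

	span.start = offset & ~(line_size - 1);
	span.end = (offset + size + line_size - 1) & ~(line_size - 1);
	return span;
}
```

Under these assumptions a 60-byte sync at offset 0 with 64-byte lines
covers bytes 0..63, so the 4 bytes past the requested range are hit by
the same maintenance operation.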

* Re: [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync
  2010-01-22  5:11         ` FUJITA Tomonori
@ 2010-01-22  6:38           ` David Miller
  2010-02-03  1:18             ` FUJITA Tomonori
  0 siblings, 1 reply; 99+ messages in thread
From: David Miller @ 2010-01-22  6:38 UTC (permalink / raw)
  To: fujita.tomonori
  Cc: jarkao2, alan, shemminger, mbreuer, akpm, flyboy, linux-kernel,
	netdev, mchan, pcnet32, romieu, mcarlson, James.Bottomley

From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Date: Fri, 22 Jan 2010 14:11:29 +0900

> Even if 'offset' is zero, 'size' still matters, I think. If 'size' is
> not a multiple of the cache line size, it's possible that driver
> writers who aren't familiar with cache would be surprised (it depends
> on the way their drivers use buffers though).
> 
> The easiest way for 'completely safe sync for any driver writers' is
> asking for all the sync parameters must be the same as those passed
> into the single mapping API. If writes knows what they do, they can do
> a partial sync with sync_range API. That's the author intention, I
> guess.

This is not reasonable.

You have to think about how people actually use these
interfaces.

They have a large buffer, and if they receive a small request they
want to allocate a smaller buffer, copy into that smaller buffer, and
give the larger buffer back to the hardware.

It's an optimization, it performs better this way.

If you make it so that the DMA sync has to cover the entire large
buffer, the whole point of the optimization is taken away.

That makes no sense at all.

I know that when I designed and wrote the first implementation of the
PCI DMA interfaces, I sure as hell meant to allow partial DMA sync
operations.

I know this as a fact, because the first drivers ported over
to these interfaces were network drivers.  And I definitely
knew about the copy-break mechanism I describe above and how
networking drivers use such a scheme pretty much across the
board.

The DMA API documentation is wrong, it must be fixed to allow partial
syncs of arbitrary offsets and sizes.

The issue of cache line boundaries and such is the domain of the DMA
API implementation, and has absolutely no business in the definition
of these interfaces.  Nor should it be something driver authors have
to be knowledgeable about, that would be completely unreasonable.
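
The copy-break scheme described above can be sketched roughly as
follows. This is a userspace illustration only: struct rx_entry,
fake_sync() and receive_copy_sketch() are hypothetical names, and
fake_sync() merely records the length where a real driver would call
the DMA sync functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an rx ring entry; not the sky2 structure. */
struct rx_entry {
	unsigned char *data;	/* large buffer mapped for the device */
	size_t mapped_len;	/* length used when the buffer was mapped */
	size_t last_sync_len;	/* bytes covered by the most recent sync */
};

/* Stub standing in for a DMA sync call: it only records the length. */
static void fake_sync(struct rx_entry *re, size_t len)
{
	assert(len <= re->mapped_len);
	re->last_sync_len = len;
}

/*
 * Copy-break: for a short packet, copy it into a right-sized buffer and
 * leave the large buffer with the hardware. Returns the copy (the caller
 * frees it), or NULL if the packet is too long or allocation fails.
 */
static unsigned char *receive_copy_sketch(struct rx_entry *re, size_t length,
					  size_t copybreak)
{
	unsigned char *copy;

	if (length > copybreak || length > re->mapped_len)
		return NULL;

	copy = malloc(length ? length : 1);
	if (!copy)
		return NULL;

	fake_sync(re, length);		/* for the CPU: only the bytes received */
	memcpy(copy, re->data, length);
	fake_sync(re, length);		/* hand the same range back to the device */
	return copy;
}
```

With a 2048-byte buffer and a 128-byte copybreak threshold, a 60-byte
packet is copied out after syncing only 60 bytes, which is the partial
sync the optimization relies on.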

* Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-21 20:41   ` Jarek Poplawski
  2010-01-21 20:46     ` Michael Breuer
@ 2010-01-22 18:01     ` Michael Breuer
  2010-01-22 21:53       ` Jarek Poplawski
  2012-01-16 16:39       ` Regression: sky2 kernel between 3.1 and 3.2.1 (last known good 3.0.9) Michael Breuer
  1 sibling, 2 replies; 99+ messages in thread
From: Michael Breuer @ 2010-01-22 18:01 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: David Miller, Stephen Hemminger, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

[-- Attachment #1: Type: text/plain, Size: 6135 bytes --]

Kernel 2.6.32.4 (git) with the following patches applied:

af_packet.c (tpacket_snd version 3)
sky2.c pskb_may_pull
sky2 fix WARNING at lib/dma-debug.c check_sync

Running with CONFIG_DMAR=n, system is stable.
Running with the exact same source but CONFIG_DMAR=y I get the WARNING 
(see below) after about 36 hours of uptime (has varied from about 24 to 
about 48):
Smolt profile: 
http://smolt.fedoraproject.org/show?uuid=pub_bb05c701-1e47-4b3c-9fab-54f520f39d79+
I'm also attaching dmesg.old (dmesg from the crash).

Subsequent to this the system watchdog reboots the system (it's hung).

Of interest: each and every time this has happened the system was under 
heavy RX load (win7 backup to a cifs share hosted on this server). Also, 
there is always a dhcp exchange of some sort preceding the event.

It is possible that the event is re-creatable without DMAR enabled, but 
I have been unsuccessful in doing so.


Jan 22 05:38:36 mail dhcpd: DHCPREQUEST for 10.0.0.54 from 
00:1b:78:c8:2b:8e (HPC82B8D) via eth0
Jan 22 05:38:36 mail dhcpd: DHCPACK on 10.0.0.54 to 00:1b:78:c8:2b:8e 
(HPC82B8D) via eth0
Jan 22 05:38:41 mail kernel: DRHD: handling fault status reg 2
Jan 22 05:38:41 mail kernel: DMAR:[DMA Read] Request device [06:00.0] 
fault addr ffdfdd9fe000
Jan 22 05:38:41 mail kernel: DMAR:[fault reason 06] PTE Read access is 
not set
Jan 22 05:38:41 mail kernel: sky2 0000:06:00.0: error interrupt 
status=0xc0000000
Jan 22 05:38:41 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
Jan 22 05:39:18 mail kernel: ------------[ cut here ]------------
Jan 22 05:39:18 mail kernel: WARNING: at net/sched/sch_generic.c:261 
dev_watchdog+0xf3/0x164()
Jan 22 05:39:18 mail kernel: Hardware name: System Product Name
Jan 22 05:39:18 mail kernel: NETDEV WATCHDOG: eth0 (sky2): transmit 
queue 0 timed out
Jan 22 05:39:18 mail kernel: Modules linked in: cpufreq_stats 
ip6table_mangle ip6table_filter ip6_tables ipt_MASQUERADE iptable_nat 
nf_nat iptable_mangle iptable_raw appletalk psnap llc nfsd lockd nfs_acl 
auth_rpcgss exportfs hwmon_vid coretemp sunrpc acpi_cpufreq sit tunnel4 
ipt_LOG nf_conntrack_netbios_ns nf_conntrack_ftp nf_conntrack_ipv6 
xt_multiport xt_DSCP xt_dscp xt_MARK ipv6 dm_multipath kvm_intel kvm 
snd_hda_codec_analog snd_ens1371 gameport snd_rawmidi snd_hda_intel 
snd_ac97_codec snd_hda_codec snd_hwdep ac97_bus snd_seq snd_seq_device 
firewire_ohci snd_pcm firewire_core crc_itu_t snd_timer snd 
gspca_spca505 gspca_main i2c_i801 videodev v4l1_compat sky2 soundcore 
v4l2_compat_ioctl32 wmi snd_page_alloc asus_atk0110 hwmon pcspkr 
iTCO_wdt iTCO_vendor_support fbcon tileblit font bitblit softcursor 
raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy 
async_tx raid1 ata_generic pata_acpi pata_marvell nouveau ttm 
drm_kms_helper drm agpgart fb i2c_algo_bit cfbcopyarea i2c_core 
cfbimgblt cfb
Jan 22 05:39:18 mail kernel: fillrect [last unloaded: microcode]
Jan 22 05:39:18 mail kernel: Pid: 0, comm: swapper Tainted: G        W  
2.6.32.4MMAPDMARAF3SKY2PSKBMAYPULL-00912-g914160d-dirty #6
Jan 22 05:39:18 mail kernel: Call Trace:
Jan 22 05:39:18 mail kernel: <IRQ>  [<ffffffff810536ee>] 
warn_slowpath_common+0x7c/0x94
Jan 22 05:39:18 mail kernel: [<ffffffff8105375d>] 
warn_slowpath_fmt+0x41/0x43
Jan 22 05:39:18 mail kernel: [<ffffffff813e3b6b>] ? netif_tx_lock+0x44/0x6c
Jan 22 05:39:18 mail kernel: [<ffffffff813e3cd3>] dev_watchdog+0xf3/0x164
Jan 22 05:39:18 mail kernel: [<ffffffff8105f3f4>] ? cascade+0x6a/0x84
Jan 22 05:39:18 mail kernel: [<ffffffff8106323f>] 
run_timer_softirq+0x1c8/0x270
Jan 22 05:39:18 mail kernel: [<ffffffff8105af0f>] __do_softirq+0xf8/0x1cd
Jan 22 05:39:18 mail kernel: [<ffffffff8107f0ab>] ? 
tick_program_event+0x2a/0x2c
Jan 22 05:39:18 mail kernel: [<ffffffff81012e1c>] call_softirq+0x1c/0x30
Jan 22 05:39:18 mail kernel: [<ffffffff810143a3>] do_softirq+0x4b/0xa6
Jan 22 05:39:18 mail kernel: [<ffffffff8105aaef>] irq_exit+0x4a/0x8c
Jan 22 05:39:18 mail kernel: [<ffffffff81470612>] 
smp_apic_timer_interrupt+0x86/0x94
Jan 22 05:39:18 mail kernel: [<ffffffff810127e3>] 
apic_timer_interrupt+0x13/0x20
Jan 22 05:39:18 mail kernel: <EOI>  [<ffffffff812c729a>] ? 
acpi_idle_enter_bm+0x256/0x28a
Jan 22 05:39:18 mail kernel: [<ffffffff812c7293>] ? 
acpi_idle_enter_bm+0x24f/0x28a
Jan 22 05:39:18 mail kernel: [<ffffffff813a6c3c>] ? 
cpuidle_idle_call+0x9e/0xfa
Jan 22 05:39:18 mail kernel: [<ffffffff81010c90>] ? cpu_idle+0xb4/0xf6
Jan 22 05:39:18 mail kernel: [<ffffffff81465ba5>] ? 
start_secondary+0x201/0x242
Jan 22 05:39:18 mail kernel: ---[ end trace 57f7151f6a5def07 ]---
Jan 22 05:39:18 mail kernel: sky2 eth0: tx timeout
Jan 22 05:39:18 mail kernel: sky2 eth0: transmit ring 76 .. 35 report=76 
done=76
Jan 22 05:39:18 mail kernel: sky2 eth0: disabling interface
Jan 22 05:39:18 mail kernel: sky2 eth0: enabling interface
Jan 22 05:39:21 mail kernel: sky2 eth0: Link is up at 1000 Mbps, full 
duplex, flow control both
Jan 22 05:40:06 mail kernel: sky2 eth0: tx timeout
Jan 22 05:40:06 mail kernel: sky2 eth0: transmit ring 3 .. 90 report=3 
done=3
Jan 22 05:40:06 mail kernel: sky2 eth0: disabling interface
Jan 22 05:40:06 mail kernel: sky2 eth0: enabling interface
Jan 22 05:40:09 mail kernel: sky2 eth0: Link is up at 1000 Mbps, full 
duplex, flow control both
Jan 22 05:40:30 mail abrt: Kerneloops: Reported 1 kernel oopses to Abrt
Jan 22 05:40:30 mail abrtd: Directory 'kerneloops-1264156830-1' creation 
detected
Jan 22 05:40:30 mail abrtd: Can't open file 
'/var/cache/abrt/kerneloops-1264156830-1/cmdline'
Jan 22 05:40:30 mail abrtd: Corrupted or bad crash, deleting
Jan 22 05:40:33 mail dhcpd: DHCPINFORM from 10.0.0.11 via eth0
Jan 22 05:40:33 mail dhcpd: DHCPACK to 10.0.0.11 (00:1a:92:8d:30:81) via 
eth0
Jan 22 05:40:36 mail dhcpd: DHCPINFORM from 10.0.0.11 via eth0
Jan 22 05:40:36 mail dhcpd: DHCPACK to 10.0.0.11 (00:1a:92:8d:30:81) via 
eth0
Jan 22 05:40:50 mail named[11245]: error (connection refused) resolving 
'173-212-207-86.hostnoc.net/A/IN': 2607:f878:0:3::12#53
Jan 22 05:40:50 mail named[11245]: error (connection refused) resolving 
'NS1.HOSTNOC.NET/AAAA/IN': 2607:f878:0:3::10#53
<watchdog reboot>


[-- Attachment #2: dmesg.old --]
[-- Type: text/plain, Size: 72485 bytes --]

Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.32.4MMAPDMARAF3SKY2PSKBMAYPULL-00912-g914160d-dirty (mbreuer@mail.majjas.com) (gcc version 4.4.2 20091222 (Red Hat 4.4.2-20) (GCC) ) #6 SMP Thu Jan 21 00:09:11 EST 2010
Command line: ro root=/dev/mapper/vg_00-LogVol02_root usbcore.autosuspend=1 crashkernel=128M SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us pcie_aspm=force intel_iommu=igfx_off debug
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  Centaur CentaurHauls
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009cc00 (usable)
 BIOS-e820: 000000000009cc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e4c00 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000bf780000 (usable)
 BIOS-e820: 00000000bf780000 - 00000000bf798000 (ACPI data)
 BIOS-e820: 00000000bf798000 - 00000000bf7dc000 (ACPI NVS)
 BIOS-e820: 00000000bf7dc000 - 00000000c0000000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ffe00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000340000000 (usable)
DMI 2.5 present.
AMI BIOS detected: BIOS may corrupt low RAM, working around it.
e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
last_pfn = 0x340000 max_arch_pfn = 0x400000000
MTRR default type: uncachable
MTRR fixed ranges enabled:
  00000-9FFFF write-back
  A0000-BFFFF uncachable
  C0000-E3FFF write-protect
  E4000-EBFFF write-through
  EC000-FFFFF write-protect
MTRR variable ranges enabled:
  0 base 000000000 mask E00000000 write-back
  1 base 200000000 mask F00000000 write-back
  2 base 300000000 mask FC0000000 write-back
  3 base 0C0000000 mask FC0000000 uncachable
  4 base 0BF800000 mask FFF800000 uncachable
  5 disabled
  6 disabled
  7 disabled
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
original variable MTRRs
reg 0, base: 0GB, range: 8GB, type WB
reg 1, base: 8GB, range: 4GB, type WB
reg 2, base: 12GB, range: 1GB, type WB
reg 3, base: 3GB, range: 1GB, type UC
reg 4, base: 3064MB, range: 8MB, type UC
total RAM covered: 12280M
Found optimal setting for mtrr clean up
 gran_size: 64K 	chunk_size: 16M 	num_reg: 6  	lose cover RAM: 0G
New variable MTRRs
reg 0, base: 0GB, range: 2GB, type WB
reg 1, base: 2GB, range: 1GB, type WB
reg 2, base: 3064MB, range: 8MB, type UC
reg 3, base: 4GB, range: 4GB, type WB
reg 4, base: 8GB, range: 4GB, type WB
reg 5, base: 12GB, range: 1GB, type WB
e820 update range: 00000000bf800000 - 0000000100000000 (usable) ==> (reserved)
last_pfn = 0xbf780 max_arch_pfn = 0x400000000
Scanning 0 areas for low memory corruption
modified physical RAM map:
 modified: 0000000000000000 - 0000000000010000 (reserved)
 modified: 0000000000010000 - 000000000009cc00 (usable)
 modified: 000000000009cc00 - 00000000000a0000 (reserved)
 modified: 00000000000e4c00 - 0000000000100000 (reserved)
 modified: 0000000000100000 - 00000000bf780000 (usable)
 modified: 00000000bf780000 - 00000000bf798000 (ACPI data)
 modified: 00000000bf798000 - 00000000bf7dc000 (ACPI NVS)
 modified: 00000000bf7dc000 - 00000000c0000000 (reserved)
 modified: 00000000fee00000 - 00000000fee01000 (reserved)
 modified: 00000000ffe00000 - 0000000100000000 (reserved)
 modified: 0000000100000000 - 0000000340000000 (usable)
initial memory mapped : 0 - 20000000
init_memory_mapping: 0000000000000000-00000000bf780000
 0000000000 - 00bf600000 page 2M
 00bf600000 - 00bf780000 page 4k
kernel direct mapping tables up to bf780000 @ 10000-15000
init_memory_mapping: 0000000100000000-0000000340000000
 0100000000 - 0340000000 page 2M
kernel direct mapping tables up to 340000000 @ 13000-21000
RAMDISK: 3770e000 - 37fef2d9
ACPI: RSDP 00000000000fb1a0 00024 (v02 ACPIAM)
ACPI: XSDT 00000000bf780100 00064 (v01 111209 XSDT1642 20091112 MSFT 00000097)
ACPI: FACP 00000000bf780290 000F4 (v03 111209 FACP1642 20091112 MSFT 00000097)
ACPI: DSDT 00000000bf7804b0 0B66D (v01  A1265 A1265001 00000001 INTL 20060113)
ACPI: FACS 00000000bf798000 00040
ACPI: APIC 00000000bf780390 000D8 (v01 111209 APIC1642 20091112 MSFT 00000097)
ACPI: MCFG 00000000bf780470 0003C (v01 111209 OEMMCFG  20091112 MSFT 00000097)
ACPI: OEMB 00000000bf798040 00072 (v01 111209 OEMB1642 20091112 MSFT 00000097)
ACPI: HPET 00000000bf78f4b0 00038 (v01 111209 OEMHPET  20091112 MSFT 00000097)
ACPI: DMAR 00000000bf7980c0 00138 (v01    AMI  OEMDMAR 00000001 MSFT 00000097)
ACPI: OSFR 00000000bf78f4f0 000B0 (v01 111209 OEMOSFR  20091112 MSFT 00000097)
ACPI: SSDT 00000000bf79a470 00363 (v01 DpgPmm    CpuPm 00000012 INTL 20060113)
ACPI: Local APIC address 0xfee00000
No NUMA configuration found
Faking a node at 0000000000000000-0000000340000000
Bootmem setup node 0 0000000000000000-0000000340000000
  NODE_DATA [000000000001c000 - 000000000002ffff]
  bootmap [0000000000030000 -  0000000000097fff] pages 68
(8 early reservations) ==> bootmem [0000000000 - 0340000000]
  #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
  #1 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
  #2 [0001000000 - 0001a478b0]    TEXT DATA BSS ==> [0001000000 - 0001a478b0]
  #3 [003770e000 - 0037fef2d9]          RAMDISK ==> [003770e000 - 0037fef2d9]
  #4 [000009cc00 - 0000100000]    BIOS reserved ==> [000009cc00 - 0000100000]
  #5 [0001a48000 - 0001a48294]              BRK ==> [0001a48000 - 0001a48294]
  #6 [0000010000 - 0000013000]          PGTABLE ==> [0000010000 - 0000013000]
  #7 [0000013000 - 000001c000]          PGTABLE ==> [0000013000 - 000001c000]
found SMP MP-table at [ffff8800000ff780] ff780
Reserving 128MB of memory at 32MB for crashkernel (System RAM: 13312MB)
 [ffffea0000000000-ffffea000b5fffff] PMD -> [ffff880028600000-ffff880032dfffff] on node 0
Zone PFN ranges:
  DMA      0x00000010 -> 0x00001000
  DMA32    0x00001000 -> 0x00100000
  Normal   0x00100000 -> 0x00340000
Movable zone start PFN for each node
early_node_map[3] active PFN ranges
    0: 0x00000010 -> 0x0000009c
    0: 0x00000100 -> 0x000bf780
    0: 0x00100000 -> 0x00340000
On node 0 totalpages: 3143436
  DMA zone: 56 pages used for memmap
  DMA zone: 114 pages reserved
  DMA zone: 3810 pages, LIFO batch:0
  DMA32 zone: 14280 pages used for memmap
  DMA32 zone: 765880 pages, LIFO batch:31
  Normal zone: 32256 pages used for memmap
  Normal zone: 2327040 pages, LIFO batch:31
ACPI: PM-Timer IO Port: 0x808
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x04] enabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x06] enabled)
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x01] enabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x03] enabled)
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x05] enabled)
ACPI: LAPIC (acpi_id[0x08] lapic_id[0x07] enabled)
ACPI: LAPIC (acpi_id[0x09] lapic_id[0x88] disabled)
ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x89] disabled)
ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x8a] disabled)
ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x8b] disabled)
ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x8c] disabled)
ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x8d] disabled)
ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x8e] disabled)
ACPI: LAPIC (acpi_id[0x10] lapic_id[0x8f] disabled)
ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x09] address[0xfec8a000] gsi_base[24])
IOAPIC[1]: apic_id 9, version 32, address 0xfec8a000, GSI 24-47
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Using ACPI (MADT) for SMP configuration information
ACPI: HPET id: 0x8086a301 base: 0xfed00000
SMP: Allowing 16 CPUs, 8 hotplug CPUs
nr_irqs_gsi: 48
Allocating PCI resources starting at c0000000 (gap: c0000000:3ee00000)
Booting paravirtualized kernel on bare hardware
NR_CPUS:16 nr_cpumask_bits:16 nr_cpu_ids:16 nr_node_ids:1
PERCPU: Embedded 30 pages/cpu @ffff880028200000 s90264 r8192 d24424 u131072
pcpu-alloc: s90264 r8192 d24424 u131072 alloc=1*2097152
pcpu-alloc: [0] 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 3096730
Policy zone: Normal
Kernel command line: ro root=/dev/mapper/vg_00-LogVol02_root usbcore.autosuspend=1 crashkernel=128M SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us pcie_aspm=force intel_iommu=igfx_off debug
PCIe ASPM is forcedly enabled
Intel-IOMMU: disable GFX device mapping
PID hash table entries: 4096 (order: 3, 32768 bytes)
Initializing CPU#0
Checking aperture...
No AGP bridge found
Calgary: detecting Calgary via BIOS EBDA area
Calgary: Unable to locate Rio Grande table in EBDA - bailing!
Memory: 12248680k/13631488k available (4554k kernel code, 1057744k absent, 325064k reserved, 3163k data, 1524k init)
SLUB: Genslabs=14, HWalign=64, Order=0-3, MinObjects=0, CPUs=16, Nodes=1
Hierarchical RCU implementation.
RCU-based detection of stalled CPUs is enabled.
NR_IRQS:4352 nr_irqs:944
Extended CMOS year: 2000
Console: colour VGA+ 80x25
console [tty0] enabled
allocated 125829120 bytes of page_cgroup
please try 'cgroup_disable=memory' option if you don't want memory cgroups
hpet clockevent registered
HPET: 4 timers in total, 0 timers will be used for per-cpu timer
Fast TSC calibration using PIT
Detected 2672.911 MHz processor.
Calibrating delay loop (skipped), value calculated using timer frequency.. 5345.82 BogoMIPS (lpj=2672911)
Security Framework initialized
SELinux:  Initializing.
SELinux:  Starting in permissive mode
Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes)
Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Mount-cache hash table entries: 256
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
Initializing cgroup subsys net_cls
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 256K
CPU: L3 cache: 8192K
CPU 0/0x0 -> Node 0
mce: CPU supports 9 MCE banks
CPU0: Thermal monitoring enabled (TM1)
CPU 0 MCA banks CMCI:2 CMCI:3 CMCI:5 CMCI:6 SHD:8
using mwait in idle threads.
Performance Events: Nehalem/Corei7 events, Intel PMU driver.
... version:                3
... bit width:              48
... generic registers:      4
... value mask:             0000ffffffffffff
... max period:             000000007fffffff
... fixed-purpose events:   3
... event mask:             000000070000000f
ACPI: Core revision 20090903
ftrace: converting mcount calls to 0f 1f 44 00 00
ftrace: allocating 23777 entries in 94 pages
DMAR: Host address width 39
DMAR: DRHD base: 0x000000fbfff000 flags: 0x0
IOMMU fbfff000: ver 1:0 cap c9008010e60262 ecap f0207a
DMAR: DRHD base: 0x000000fbffe000 flags: 0x1
IOMMU fbffe000: ver 1:0 cap c90780106f0462 ecap f020fe
DMAR: RMRR base: 0x000000000ec000 end: 0x000000000effff
DMAR: RMRR base: 0x000000bf7dc000 end: 0x000000bf7dbfff
DMAR: ATSR flags: 0x0
IOAPIC id 8 under DRHD base 0xfbffe000
IOAPIC id 9 under DRHD base 0xfbffe000
DRHD: handling fault status reg 2
Enabled Interrupt-remapping
Setting APIC routing to flat
  alloc irq_desc for 48 on node 0
  alloc kstat_irqs on node 0
  alloc irq_desc for 49 on node 0
  alloc kstat_irqs on node 0
alloc irq_2_iommu on node 0
alloc irq_2_iommu on node 0
alloc irq_2_iommu on node 0
alloc irq_2_iommu on node 0
alloc irq_2_iommu on node 0
alloc irq_2_iommu on node 0
alloc irq_2_iommu on node 0
alloc irq_2_iommu on node 0
alloc irq_2_iommu on node 0
alloc irq_2_iommu on node 0
alloc irq_2_iommu on node 0
alloc irq_2_iommu on node 0
alloc irq_2_iommu on node 0
alloc irq_2_iommu on node 0
alloc irq_2_iommu on node 0
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz stepping 05
APIC calibration not consistent with PM-Timer: 121ms instead of 100ms
APIC delta adjusted to PM-Timer: 835227 (1018966)
Booting processor 1 APIC 0x2 ip 0x6000
Initializing CPU#1
Calibrating delay using timer specific routine.. 5344.68 BogoMIPS (lpj=2672340)
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 256K
CPU: L3 cache: 8192K
CPU 1/0x2 -> Node 0
CPU1: Thermal monitoring enabled (TM1)
CPU 1 MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
CPU1: Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz stepping 05
Skipping synchronization checks as TSC is reliable.
Booting processor 2 APIC 0x4 ip 0x6000
Initializing CPU#2
Calibrating delay using timer specific routine.. 5344.68 BogoMIPS (lpj=2672340)
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 2
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 256K
CPU: L3 cache: 8192K
CPU 2/0x4 -> Node 0
CPU2: Thermal monitoring enabled (TM1)
CPU 2 MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
CPU2: Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz stepping 05
Booting processor 3 APIC 0x6 ip 0x6000
Initializing CPU#3
Calibrating delay using timer specific routine.. 5558.47 BogoMIPS (lpj=2779235)
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 3
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 256K
CPU: L3 cache: 8192K
CPU 3/0x6 -> Node 0
CPU3: Thermal monitoring enabled (TM1)
CPU 3 MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
CPU3: Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz stepping 05
Booting processor 4 APIC 0x1 ip 0x6000
Initializing CPU#4
Calibrating delay using timer specific routine.. 5344.67 BogoMIPS (lpj=2672339)
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 256K
CPU: L3 cache: 8192K
CPU 4/0x1 -> Node 0
CPU4: Thermal monitoring enabled (TM1)
CPU 4 MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8
CPU4: Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz stepping 05
Booting processor 5 APIC 0x3 ip 0x6000
Initializing CPU#5
Calibrating delay using timer specific routine.. 5344.67 BogoMIPS (lpj=2672339)
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 256K
CPU: L3 cache: 8192K
CPU 5/0x3 -> Node 0
CPU5: Thermal monitoring enabled (TM1)
CPU 5 MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8
CPU5: Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz stepping 05
Booting processor 6 APIC 0x5 ip 0x6000
Initializing CPU#6
Calibrating delay using timer specific routine.. 5558.46 BogoMIPS (lpj=2779231)
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 2
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 256K
CPU: L3 cache: 8192K
CPU 6/0x5 -> Node 0
CPU6: Thermal monitoring enabled (TM1)
CPU 6 MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8
CPU6: Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz stepping 05
Booting processor 7 APIC 0x7 ip 0x6000
Initializing CPU#7
Calibrating delay using timer specific routine.. 5344.67 BogoMIPS (lpj=2672338)
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 3
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 256K
CPU: L3 cache: 8192K
CPU 7/0x7 -> Node 0
CPU7: Thermal monitoring enabled (TM1)
CPU 7 MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8
CPU7: Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz stepping 05
Brought up 8 CPUs
Total of 8 processors activated (43186.14 BogoMIPS).
CPU0 attaching sched-domain:
 domain 0: span 0,4 level SIBLING
  groups: 0 (cpu_power = 589) 4 (cpu_power = 589)
  domain 1: span 0-7 level MC
   groups: 0,4 (cpu_power = 1178) 1,5 (cpu_power = 1178) 2,6 (cpu_power = 1178) 3,7 (cpu_power = 1178)
CPU1 attaching sched-domain:
 domain 0: span 1,5 level SIBLING
  groups: 1 (cpu_power = 589) 5 (cpu_power = 589)
  domain 1: span 0-7 level MC
   groups: 1,5 (cpu_power = 1178) 2,6 (cpu_power = 1178) 3,7 (cpu_power = 1178) 0,4 (cpu_power = 1178)
CPU2 attaching sched-domain:
 domain 0: span 2,6 level SIBLING
  groups: 2 (cpu_power = 589) 6 (cpu_power = 589)
  domain 1: span 0-7 level MC
   groups: 2,6 (cpu_power = 1178) 3,7 (cpu_power = 1178) 0,4 (cpu_power = 1178) 1,5 (cpu_power = 1178)
CPU3 attaching sched-domain:
 domain 0: span 3,7 level SIBLING
  groups: 3 (cpu_power = 589) 7 (cpu_power = 589)
  domain 1: span 0-7 level MC
   groups: 3,7 (cpu_power = 1178) 0,4 (cpu_power = 1178) 1,5 (cpu_power = 1178) 2,6 (cpu_power = 1178)
CPU4 attaching sched-domain:
 domain 0: span 0,4 level SIBLING
  groups: 4 (cpu_power = 589) 0 (cpu_power = 589)
  domain 1: span 0-7 level MC
   groups: 0,4 (cpu_power = 1178) 1,5 (cpu_power = 1178) 2,6 (cpu_power = 1178) 3,7 (cpu_power = 1178)
CPU5 attaching sched-domain:
 domain 0: span 1,5 level SIBLING
  groups: 5 (cpu_power = 589) 1 (cpu_power = 589)
  domain 1: span 0-7 level MC
   groups: 1,5 (cpu_power = 1178) 2,6 (cpu_power = 1178) 3,7 (cpu_power = 1178) 0,4 (cpu_power = 1178)
CPU6 attaching sched-domain:
 domain 0: span 2,6 level SIBLING
  groups: 6 (cpu_power = 589) 2 (cpu_power = 589)
  domain 1: span 0-7 level MC
   groups: 2,6 (cpu_power = 1178) 3,7 (cpu_power = 1178) 0,4 (cpu_power = 1178) 1,5 (cpu_power = 1178)
CPU7 attaching sched-domain:
 domain 0: span 3,7 level SIBLING
  groups: 7 (cpu_power = 589) 3 (cpu_power = 589)
  domain 1: span 0-7 level MC
   groups: 3,7 (cpu_power = 1178) 0,4 (cpu_power = 1178) 1,5 (cpu_power = 1178) 2,6 (cpu_power = 1178)
regulator: core version 0.5
Time:  5:43:57  Date: 01/21/10
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 - 255
PCI: Not using MMCONFIG.
PCI: Using configuration type 1 for base access
bio: create slab <bio-0> at 0
ACPI: EC: Look up EC in DSDT
ACPI: Executed 1 blocks of module-level executable AML code
ACPI: Interpreter enabled
ACPI: (supports S0 S1 S3 S5)
ACPI: Using IOAPIC for interrupt routing
PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 - 255
PCI: MCFG area at e0000000 reserved in ACPI motherboard resources
PCI: Using MMCONFIG at e0000000 - efffffff
ACPI: No dock devices found.
ACPI: PCI Root Bridge [PCI0] (0000:00)
pci 0000:00:00.0: PME# supported from D0 D3hot D3cold
pci 0000:00:00.0: PME# disabled
pci 0000:00:01.0: PME# supported from D0 D3hot D3cold
pci 0000:00:01.0: PME# disabled
pci 0000:00:03.0: PME# supported from D0 D3hot D3cold
pci 0000:00:03.0: PME# disabled
pci 0000:00:07.0: PME# supported from D0 D3hot D3cold
pci 0000:00:07.0: PME# disabled
pci 0000:00:1a.0: reg 20 io port: [0x9800-0x981f]
pci 0000:00:1a.1: reg 20 io port: [0x9880-0x989f]
pci 0000:00:1a.2: reg 20 io port: [0x9c00-0x9c1f]
pci 0000:00:1a.7: reg 10 32bit mmio: [0xf7fff000-0xf7fff3ff]
pci 0000:00:1a.7: PME# supported from D0 D3hot D3cold
pci 0000:00:1a.7: PME# disabled
pci 0000:00:1b.0: reg 10 64bit mmio: [0xf7ff8000-0xf7ffbfff]
pci 0000:00:1b.0: PME# supported from D0 D3hot D3cold
pci 0000:00:1b.0: PME# disabled
pci 0000:00:1c.0: PME# supported from D0 D3hot D3cold
pci 0000:00:1c.0: PME# disabled
pci 0000:00:1c.2: PME# supported from D0 D3hot D3cold
pci 0000:00:1c.2: PME# disabled
pci 0000:00:1c.4: PME# supported from D0 D3hot D3cold
pci 0000:00:1c.4: PME# disabled
pci 0000:00:1c.5: PME# supported from D0 D3hot D3cold
pci 0000:00:1c.5: PME# disabled
pci 0000:00:1d.0: reg 20 io port: [0x9080-0x909f]
pci 0000:00:1d.1: reg 20 io port: [0x9400-0x941f]
pci 0000:00:1d.2: reg 20 io port: [0x9480-0x949f]
pci 0000:00:1d.7: reg 10 32bit mmio: [0xf7ffe000-0xf7ffe3ff]
pci 0000:00:1d.7: PME# supported from D0 D3hot D3cold
pci 0000:00:1d.7: PME# disabled
pci 0000:00:1f.0: quirk: region 0800-087f claimed by ICH6 ACPI/GPIO/TCO
pci 0000:00:1f.0: quirk: region 0500-053f claimed by ICH6 GPIO
pci 0000:00:1f.0: ICH7 LPC Generic IO decode 1 PIO at 0294 (mask 0003)
pci 0000:00:1f.0: ICH7 LPC Generic IO decode 3 PIO at 4700 (mask 001f)
pci 0000:00:1f.2: reg 10 io port: [0x8c00-0x8c07]
pci 0000:00:1f.2: reg 14 io port: [0x8880-0x8883]
pci 0000:00:1f.2: reg 18 io port: [0x8800-0x8807]
pci 0000:00:1f.2: reg 1c io port: [0x8480-0x8483]
pci 0000:00:1f.2: reg 20 io port: [0x8400-0x841f]
pci 0000:00:1f.2: reg 24 32bit mmio: [0xf7ffc000-0xf7ffc7ff]
pci 0000:00:1f.2: PME# supported from D3hot
pci 0000:00:1f.2: PME# disabled
pci 0000:00:1f.3: reg 10 64bit mmio: [0xf7ffd000-0xf7ffd0ff]
pci 0000:00:1f.3: reg 20 io port: [0x400-0x41f]
pci 0000:02:00.0: reg 10 32bit mmio: [0xfa000000-0xfaffffff]
pci 0000:02:00.0: reg 14 64bit mmio pref: [0xd0000000-0xdfffffff]
pci 0000:02:00.0: reg 1c 64bit mmio: [0xf8000000-0xf9ffffff]
pci 0000:02:00.0: reg 24 io port: [0xac00-0xac7f]
pci 0000:02:00.0: reg 30 32bit mmio pref: [0xfbae0000-0xfbafffff]
pci 0000:00:03.0: bridge io port: [0xa000-0xafff]
pci 0000:00:03.0: bridge 32bit mmio: [0xf8000000-0xfbafffff]
pci 0000:00:03.0: bridge 64bit mmio pref: [0xd0000000-0xdfffffff]
pci 0000:00:1c.0: bridge 64bit mmio pref: [0xf6f00000-0xf6ffffff]
pci 0000:06:00.0: reg 10 64bit mmio: [0xfbdfc000-0xfbdfffff]
pci 0000:06:00.0: reg 18 io port: [0xd800-0xd8ff]
pci 0000:06:00.0: reg 30 32bit mmio pref: [0xfbdc0000-0xfbddffff]
pci 0000:06:00.0: supports D1 D2
pci 0000:06:00.0: PME# supported from D0 D1 D2 D3hot D3cold
pci 0000:06:00.0: PME# disabled
pci 0000:00:1c.2: bridge io port: [0xd000-0xdfff]
pci 0000:00:1c.2: bridge 32bit mmio: [0xfbd00000-0xfbdfffff]
pci 0000:05:00.0: reg 10 io port: [0xcc00-0xcc07]
pci 0000:05:00.0: reg 14 io port: [0xc880-0xc883]
pci 0000:05:00.0: reg 18 io port: [0xc800-0xc807]
pci 0000:05:00.0: reg 1c io port: [0xc480-0xc483]
pci 0000:05:00.0: reg 20 io port: [0xc400-0xc40f]
pci 0000:05:00.0: reg 24 32bit mmio: [0xfbcffc00-0xfbcfffff]
pci 0000:05:00.0: supports D1
pci 0000:05:00.0: PME# supported from D0 D1 D3hot
pci 0000:05:00.0: PME# disabled
pci 0000:00:1c.4: bridge io port: [0xc000-0xcfff]
pci 0000:00:1c.4: bridge 32bit mmio: [0xfbc00000-0xfbcfffff]
pci 0000:04:00.0: reg 10 64bit mmio: [0xfbbfc000-0xfbbfffff]
pci 0000:04:00.0: reg 18 io port: [0xb800-0xb8ff]
pci 0000:04:00.0: reg 30 32bit mmio pref: [0xfbbc0000-0xfbbdffff]
pci 0000:04:00.0: supports D1 D2
pci 0000:04:00.0: PME# supported from D0 D1 D2 D3hot D3cold
pci 0000:04:00.0: PME# disabled
pci 0000:00:1c.5: bridge io port: [0xb000-0xbfff]
pci 0000:00:1c.5: bridge 32bit mmio: [0xfbb00000-0xfbbfffff]
pci 0000:08:01.0: reg 10 io port: [0xec00-0xec3f]
pci 0000:08:01.0: supports D2
pci 0000:08:02.0: reg 10 32bit mmio: [0xfbeff000-0xfbeff7ff]
pci 0000:08:02.0: reg 14 io port: [0xe880-0xe8ff]
pci 0000:08:02.0: supports D2
pci 0000:08:02.0: PME# supported from D2 D3hot D3cold
pci 0000:08:02.0: PME# disabled
pci 0000:00:1e.0: transparent bridge
pci 0000:00:1e.0: bridge io port: [0xe000-0xefff]
pci 0000:00:1e.0: bridge 32bit mmio: [0xfbe00000-0xfbefffff]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P4._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P6._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P8._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P9._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.NPE1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.NPE3._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.NPE7._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 6 7 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs *5)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 6 7 10 11 12 *14 15)
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 6 7 10 11 12 14 *15)
ACPI: PCI Interrupt Link [LNKF] (IRQs *3 4 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 6 *7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 *4 6 7 10 11 12 14 15)
vgaarb: device added: PCI:0000:02:00.0,decodes=io+mem,owns=io+mem,locks=none
vgaarb: loaded
SCSI subsystem initialized
libata version 3.00 loaded.
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
NetLabel: Initializing
NetLabel:  domain hash size = 128
NetLabel:  protocols = UNLABELED CIPSOv4
NetLabel:  unlabeled traffic allowed by default
hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0
hpet0: 4 comparators, 64-bit 14.318180 MHz counter
Switching to clocksource tsc
pnp: PnP ACPI init
ACPI: bus type pnp registered
pnp: PnP ACPI: found 15 devices
ACPI: ACPI bus type pnp unregistered
system 00:01: iomem range 0xfbf00000-0xfbffffff has been reserved
system 00:01: iomem range 0xfc000000-0xfcffffff has been reserved
system 00:01: iomem range 0xfd000000-0xfdffffff has been reserved
system 00:01: iomem range 0xfe000000-0xfebfffff has been reserved
system 00:01: iomem range 0xfec8a000-0xfec8afff could not be reserved
system 00:01: iomem range 0xfed10000-0xfed10fff has been reserved
system 00:06: ioport range 0x290-0x29f has been reserved
system 00:07: ioport range 0x4d0-0x4d1 has been reserved
system 00:07: ioport range 0x800-0x87f has been reserved
system 00:07: ioport range 0x500-0x57f could not be reserved
system 00:07: iomem range 0xfed1c000-0xfed1ffff has been reserved
system 00:07: iomem range 0xfed20000-0xfed3ffff has been reserved
system 00:07: iomem range 0xfed40000-0xfed8ffff has been reserved
system 00:0a: iomem range 0xffc00000-0xffdfffff has been reserved
system 00:0b: iomem range 0xfec00000-0xfec00fff could not be reserved
system 00:0b: iomem range 0xfee00000-0xfee00fff has been reserved
system 00:0d: iomem range 0xe0000000-0xefffffff has been reserved
system 00:0e: iomem range 0x0-0x9ffff could not be reserved
system 00:0e: iomem range 0xc0000-0xcffff has been reserved
system 00:0e: iomem range 0xe0000-0xfffff could not be reserved
system 00:0e: iomem range 0x100000-0xbfffffff could not be reserved
system 00:0e: iomem range 0xfed90000-0xffffffff could not be reserved
pci 0000:00:01.0: PCI bridge, secondary bus 0000:01
pci 0000:00:01.0:   IO window: disabled
pci 0000:00:01.0:   MEM window: disabled
pci 0000:00:01.0:   PREFETCH window: disabled
pci 0000:00:03.0: PCI bridge, secondary bus 0000:02
pci 0000:00:03.0:   IO window: 0xa000-0xafff
pci 0000:00:03.0:   MEM window: 0xf8000000-0xfbafffff
pci 0000:00:03.0:   PREFETCH window: 0x000000d0000000-0x000000dfffffff
pci 0000:00:07.0: PCI bridge, secondary bus 0000:03
pci 0000:00:07.0:   IO window: disabled
pci 0000:00:07.0:   MEM window: disabled
pci 0000:00:07.0:   PREFETCH window: disabled
pci 0000:00:1c.0: PCI bridge, secondary bus 0000:07
pci 0000:00:1c.0:   IO window: 0x1000-0x1fff
pci 0000:00:1c.0:   MEM window: 0xc0000000-0xc03fffff
pci 0000:00:1c.0:   PREFETCH window: 0x000000f6f00000-0x000000f6ffffff
pci 0000:00:1c.2: PCI bridge, secondary bus 0000:06
pci 0000:00:1c.2:   IO window: 0xd000-0xdfff
pci 0000:00:1c.2:   MEM window: 0xfbd00000-0xfbdfffff
pci 0000:00:1c.2:   PREFETCH window: 0x000000c0400000-0x000000c05fffff
pci 0000:00:1c.4: PCI bridge, secondary bus 0000:05
pci 0000:00:1c.4:   IO window: 0xc000-0xcfff
pci 0000:00:1c.4:   MEM window: 0xfbc00000-0xfbcfffff
pci 0000:00:1c.4:   PREFETCH window: 0x000000c0600000-0x000000c07fffff
pci 0000:00:1c.5: PCI bridge, secondary bus 0000:04
pci 0000:00:1c.5:   IO window: 0xb000-0xbfff
pci 0000:00:1c.5:   MEM window: 0xfbb00000-0xfbbfffff
pci 0000:00:1c.5:   PREFETCH window: 0x000000c0800000-0x000000c09fffff
pci 0000:00:1e.0: PCI bridge, secondary bus 0000:08
pci 0000:00:1e.0:   IO window: 0xe000-0xefff
pci 0000:00:1e.0:   MEM window: 0xfbe00000-0xfbefffff
pci 0000:00:1e.0:   PREFETCH window: disabled
pci 0000:00:01.0: setting latency timer to 64
pci 0000:00:03.0: setting latency timer to 64
pci 0000:00:07.0: setting latency timer to 64
pci 0000:00:1c.0: enabling device (0106 -> 0107)
  alloc irq_desc for 17 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
pci 0000:00:1c.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
pci 0000:00:1c.0: setting latency timer to 64
  alloc irq_desc for 18 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
pci 0000:00:1c.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18
pci 0000:00:1c.2: setting latency timer to 64
pci 0000:00:1c.4: PCI INT A -> GSI 17 (level, low) -> IRQ 17
pci 0000:00:1c.4: setting latency timer to 64
  alloc irq_desc for 16 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
pci 0000:00:1c.5: PCI INT B -> GSI 16 (level, low) -> IRQ 16
pci 0000:00:1c.5: setting latency timer to 64
pci 0000:00:1e.0: setting latency timer to 64
pci_bus 0000:00: resource 0 io:  [0x00-0xffff]
pci_bus 0000:00: resource 1 mem: [0x000000-0xffffffffffffffff]
pci_bus 0000:02: resource 0 io:  [0xa000-0xafff]
pci_bus 0000:02: resource 1 mem: [0xf8000000-0xfbafffff]
pci_bus 0000:02: resource 2 pref mem [0xd0000000-0xdfffffff]
pci_bus 0000:07: resource 0 io:  [0x1000-0x1fff]
pci_bus 0000:07: resource 1 mem: [0xc0000000-0xc03fffff]
pci_bus 0000:07: resource 2 pref mem [0xf6f00000-0xf6ffffff]
pci_bus 0000:06: resource 0 io:  [0xd000-0xdfff]
pci_bus 0000:06: resource 1 mem: [0xfbd00000-0xfbdfffff]
pci_bus 0000:06: resource 2 pref mem [0xc0400000-0xc05fffff]
pci_bus 0000:05: resource 0 io:  [0xc000-0xcfff]
pci_bus 0000:05: resource 1 mem: [0xfbc00000-0xfbcfffff]
pci_bus 0000:05: resource 2 pref mem [0xc0600000-0xc07fffff]
pci_bus 0000:04: resource 0 io:  [0xb000-0xbfff]
pci_bus 0000:04: resource 1 mem: [0xfbb00000-0xfbbfffff]
pci_bus 0000:04: resource 2 pref mem [0xc0800000-0xc09fffff]
pci_bus 0000:08: resource 0 io:  [0xe000-0xefff]
pci_bus 0000:08: resource 1 mem: [0xfbe00000-0xfbefffff]
pci_bus 0000:08: resource 3 io:  [0x00-0xffff]
pci_bus 0000:08: resource 4 mem: [0x000000-0xffffffffffffffff]
NET: Registered protocol family 2
IP route cache hash table entries: 524288 (order: 10, 4194304 bytes)
TCP established hash table entries: 524288 (order: 11, 8388608 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 524288 bind 65536)
TCP reno registered
NET: Registered protocol family 1
pci 0000:02:00.0: Boot video device
Trying to unpack rootfs image as initramfs...
Freeing initrd memory: 9092k freed
DMA-API: preallocated 32768 debug entries
DMA-API: debugging enabled by kernel config
IOMMU 0xfbfff000: using Queued invalidation
IOMMU 0xfbffe000: using Queued invalidation
------------[ cut here ]------------
WARNING: at drivers/pci/intel-iommu.c:3790 init_dmars+0x32a/0x6ea()
Hardware name: System Product Name
Your BIOS is broken; DMA routed to ISOCH DMAR unit but no TLB space.
BIOS vendor: American Megatrends Inc.; Ver: 0704   ; Product Version: System Version
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.32.4MMAPDMARAF3SKY2PSKBMAYPULL-00912-g914160d-dirty #6
Call Trace:
 [<ffffffff810536ee>] warn_slowpath_common+0x7c/0x94
 [<ffffffff8105375d>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff817c9f73>] init_dmars+0x32a/0x6ea
 [<ffffffff817ca59e>] intel_iommu_init+0x26b/0x340
 [<ffffffff817a98f7>] ? pci_iommu_init+0x0/0x32
 [<ffffffff817a991b>] pci_iommu_init+0x24/0x32
 [<ffffffff8100a07d>] do_one_initcall+0x72/0x18a
 [<ffffffff817a26d8>] kernel_init+0x181/0x1db
 [<ffffffff81012d1a>] child_rip+0xa/0x20
 [<ffffffff8101267d>] ? restore_args+0x0/0x30
 [<ffffffff817a2557>] ? kernel_init+0x0/0x1db
 [<ffffffff81012d10>] ? child_rip+0x0/0x20
---[ end trace 57f7151f6a5def05 ]---
IOMMU: hardware identity mapping for device 0000:00:1b.0
IOMMU: Setting RMRR:
IOMMU: Setting identity map for device 0000:00:1d.0 [0xbf7dc000 - 0xbf7dc000]
IOMMU: Setting identity map for device 0000:00:1d.1 [0xbf7dc000 - 0xbf7dc000]
IOMMU: Setting identity map for device 0000:00:1d.2 [0xbf7dc000 - 0xbf7dc000]
IOMMU: Setting identity map for device 0000:00:1d.7 [0xbf7dc000 - 0xbf7dc000]
IOMMU: Setting identity map for device 0000:00:1a.0 [0xbf7dc000 - 0xbf7dc000]
IOMMU: Setting identity map for device 0000:00:1a.1 [0xbf7dc000 - 0xbf7dc000]
IOMMU: Setting identity map for device 0000:00:1a.2 [0xbf7dc000 - 0xbf7dc000]
IOMMU: Setting identity map for device 0000:00:1a.7 [0xbf7dc000 - 0xbf7dc000]
IOMMU: Setting identity map for device 0000:00:1d.0 [0xec000 - 0xf0000]
IOMMU: Setting identity map for device 0000:00:1d.1 [0xec000 - 0xf0000]
IOMMU: Setting identity map for device 0000:00:1d.2 [0xec000 - 0xf0000]
IOMMU: Setting identity map for device 0000:00:1d.7 [0xec000 - 0xf0000]
IOMMU: Setting identity map for device 0000:00:1a.0 [0xec000 - 0xf0000]
IOMMU: Setting identity map for device 0000:00:1a.1 [0xec000 - 0xf0000]
IOMMU: Setting identity map for device 0000:00:1a.2 [0xec000 - 0xf0000]
IOMMU: Setting identity map for device 0000:00:1a.7 [0xec000 - 0xf0000]
IOMMU: Prepare 0-16MiB unity mapping for LPC
IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0x1000000]
PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
Scanning for low memory corruption every 60 seconds
audit: initializing netlink socket (disabled)
type=2000 audit(1264052636.959:1): initialized
HugeTLB registered 2 MB page size, pre-allocated 0 pages
VFS: Disk quotas dquot_6.5.2
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
Btrfs loaded
msgmni has been set to 23940
SELinux:  Registering netfilter hooks
alg: No test for stdrng (krng)
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
  alloc irq_desc for 50 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
pcieport 0000:00:01.0: irq 50 for MSI/MSI-X
pcieport 0000:00:01.0: setting latency timer to 64
  alloc irq_desc for 51 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
pcieport 0000:00:03.0: irq 51 for MSI/MSI-X
pcieport 0000:00:03.0: setting latency timer to 64
  alloc irq_desc for 52 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
pcieport 0000:00:07.0: irq 52 for MSI/MSI-X
pcieport 0000:00:07.0: setting latency timer to 64
  alloc irq_desc for 53 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
pcieport 0000:00:1c.0: irq 53 for MSI/MSI-X
pcieport 0000:00:1c.0: setting latency timer to 64
  alloc irq_desc for 54 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
pcieport 0000:00:1c.2: irq 54 for MSI/MSI-X
pcieport 0000:00:1c.2: setting latency timer to 64
  alloc irq_desc for 55 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
pcieport 0000:00:1c.4: irq 55 for MSI/MSI-X
pcieport 0000:00:1c.4: setting latency timer to 64
  alloc irq_desc for 56 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
pcieport 0000:00:1c.5: irq 56 for MSI/MSI-X
pcieport 0000:00:1c.5: setting latency timer to 64
aer 0000:00:01.0:pcie02: AER service couldn't init device: no _OSC support
aer 0000:00:03.0:pcie02: AER service couldn't init device: no _OSC support
aer 0000:00:07.0:pcie02: AER service couldn't init device: no _OSC support
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
pciehp: PCI Express Hot Plug Controller Driver version: 0.4
acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
pci-stub: invalid id string ""
input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0
ACPI: Power Button [PWRB]
input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input1
ACPI: Power Button [PWRF]
ACPI: SSDT 00000000bf798200 01C0C (v01 DpgPmm  P001Ist 00000011 INTL 20060113)
ACPI: SSDT 00000000bf799e10 00659 (v01  PmRef  P001Cst 00003001 INTL 20060113)
Monitor-Mwait will be used to enter C-1 state
Monitor-Mwait will be used to enter C-2 state
Monitor-Mwait will be used to enter C-3 state
processor LNXCPU:00: registered as cooling_device0
processor LNXCPU:01: registered as cooling_device1
processor LNXCPU:02: registered as cooling_device2
processor LNXCPU:03: registered as cooling_device3
processor LNXCPU:04: registered as cooling_device4
processor LNXCPU:05: registered as cooling_device5
processor LNXCPU:06: registered as cooling_device6
processor LNXCPU:07: registered as cooling_device7
Non-volatile memory driver v1.3
Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
brd: module loaded
loop: module loaded
ahci 0000:00:1f.2: version 3.0
  alloc irq_desc for 20 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
ahci 0000:00:1f.2: PCI INT B -> GSI 20 (level, low) -> IRQ 20
  alloc irq_desc for 57 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
ahci 0000:00:1f.2: irq 57 for MSI/MSI-X
ahci: SSS flag set, parallel bus scan disabled
ahci 0000:00:1f.2: AHCI 0001.0200 32 slots 6 ports 3 Gbps 0x3f impl SATA mode
ahci 0000:00:1f.2: flags: 64bit ncq sntf stag pm led clo pmp pio slum part ccc ems sxs 
ahci 0000:00:1f.2: setting latency timer to 64
scsi0 : ahci
scsi1 : ahci
scsi2 : ahci
scsi3 : ahci
scsi4 : ahci
scsi5 : ahci
ata1: SATA max UDMA/133 abar m2048@0xf7ffc000 port 0xf7ffc100 irq 57
ata2: SATA max UDMA/133 abar m2048@0xf7ffc000 port 0xf7ffc180 irq 57
ata3: SATA max UDMA/133 abar m2048@0xf7ffc000 port 0xf7ffc200 irq 57
ata4: SATA max UDMA/133 abar m2048@0xf7ffc000 port 0xf7ffc280 irq 57
ata5: SATA max UDMA/133 abar m2048@0xf7ffc000 port 0xf7ffc300 irq 57
ata6: SATA max UDMA/133 abar m2048@0xf7ffc000 port 0xf7ffc380 irq 57
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ehci_hcd 0000:00:1a.7: PCI INT C -> GSI 18 (level, low) -> IRQ 18
ehci_hcd 0000:00:1a.7: setting latency timer to 64
ehci_hcd 0000:00:1a.7: EHCI Host Controller
ehci_hcd 0000:00:1a.7: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:1a.7: debug port 1
ehci_hcd 0000:00:1a.7: cache line size of 32 is not supported
ehci_hcd 0000:00:1a.7: irq 18, io mem 0xf7fff000
ehci_hcd 0000:00:1a.7: USB 2.0 started, EHCI 1.00
usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb1: Product: EHCI Host Controller
usb usb1: Manufacturer: Linux 2.6.32.4MMAPDMARAF3SKY2PSKBMAYPULL-00912-g914160d-dirty ehci_hcd
usb usb1: SerialNumber: 0000:00:1a.7
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 6 ports detected
  alloc irq_desc for 23 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
ehci_hcd 0000:00:1d.7: PCI INT A -> GSI 23 (level, low) -> IRQ 23
ehci_hcd 0000:00:1d.7: setting latency timer to 64
ehci_hcd 0000:00:1d.7: EHCI Host Controller
ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 2
ehci_hcd 0000:00:1d.7: debug port 1
ehci_hcd 0000:00:1d.7: cache line size of 32 is not supported
ehci_hcd 0000:00:1d.7: irq 23, io mem 0xf7ffe000
ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00
usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: EHCI Host Controller
usb usb2: Manufacturer: Linux 2.6.32.4MMAPDMARAF3SKY2PSKBMAYPULL-00912-g914160d-dirty ehci_hcd
usb usb2: SerialNumber: 0000:00:1d.7
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 6 ports detected
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
uhci_hcd: USB Universal Host Controller Interface driver
uhci_hcd 0000:00:1a.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
uhci_hcd 0000:00:1a.0: setting latency timer to 64
uhci_hcd 0000:00:1a.0: UHCI Host Controller
uhci_hcd 0000:00:1a.0: new USB bus registered, assigned bus number 3
uhci_hcd 0000:00:1a.0: irq 16, io base 0x00009800
usb usb3: New USB device found, idVendor=1d6b, idProduct=0001
usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb3: Product: UHCI Host Controller
usb usb3: Manufacturer: Linux 2.6.32.4MMAPDMARAF3SKY2PSKBMAYPULL-00912-g914160d-dirty uhci_hcd
usb usb3: SerialNumber: 0000:00:1a.0
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
  alloc irq_desc for 21 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
uhci_hcd 0000:00:1a.1: PCI INT B -> GSI 21 (level, low) -> IRQ 21
uhci_hcd 0000:00:1a.1: setting latency timer to 64
uhci_hcd 0000:00:1a.1: UHCI Host Controller
uhci_hcd 0000:00:1a.1: new USB bus registered, assigned bus number 4
uhci_hcd 0000:00:1a.1: irq 21, io base 0x00009880
usb usb4: New USB device found, idVendor=1d6b, idProduct=0001
usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb4: Product: UHCI Host Controller
usb usb4: Manufacturer: Linux 2.6.32.4MMAPDMARAF3SKY2PSKBMAYPULL-00912-g914160d-dirty uhci_hcd
usb usb4: SerialNumber: 0000:00:1a.1
usb usb4: configuration #1 chosen from 1 choice
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
  alloc irq_desc for 19 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
uhci_hcd 0000:00:1a.2: PCI INT D -> GSI 19 (level, low) -> IRQ 19
uhci_hcd 0000:00:1a.2: setting latency timer to 64
uhci_hcd 0000:00:1a.2: UHCI Host Controller
uhci_hcd 0000:00:1a.2: new USB bus registered, assigned bus number 5
uhci_hcd 0000:00:1a.2: irq 19, io base 0x00009c00
usb usb5: New USB device found, idVendor=1d6b, idProduct=0001
usb usb5: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb5: Product: UHCI Host Controller
usb usb5: Manufacturer: Linux 2.6.32.4MMAPDMARAF3SKY2PSKBMAYPULL-00912-g914160d-dirty uhci_hcd
usb usb5: SerialNumber: 0000:00:1a.2
usb usb5: configuration #1 chosen from 1 choice
hub 5-0:1.0: USB hub found
hub 5-0:1.0: 2 ports detected
uhci_hcd 0000:00:1d.0: PCI INT A -> GSI 23 (level, low) -> IRQ 23
uhci_hcd 0000:00:1d.0: setting latency timer to 64
uhci_hcd 0000:00:1d.0: UHCI Host Controller
uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 6
uhci_hcd 0000:00:1d.0: irq 23, io base 0x00009080
usb usb6: New USB device found, idVendor=1d6b, idProduct=0001
usb usb6: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb6: Product: UHCI Host Controller
usb usb6: Manufacturer: Linux 2.6.32.4MMAPDMARAF3SKY2PSKBMAYPULL-00912-g914160d-dirty uhci_hcd
usb usb6: SerialNumber: 0000:00:1d.0
usb usb6: configuration #1 chosen from 1 choice
hub 6-0:1.0: USB hub found
hub 6-0:1.0: 2 ports detected
uhci_hcd 0000:00:1d.1: PCI INT B -> GSI 19 (level, low) -> IRQ 19
uhci_hcd 0000:00:1d.1: setting latency timer to 64
uhci_hcd 0000:00:1d.1: UHCI Host Controller
uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 7
uhci_hcd 0000:00:1d.1: irq 19, io base 0x00009400
usb usb7: New USB device found, idVendor=1d6b, idProduct=0001
usb usb7: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb7: Product: UHCI Host Controller
usb usb7: Manufacturer: Linux 2.6.32.4MMAPDMARAF3SKY2PSKBMAYPULL-00912-g914160d-dirty uhci_hcd
usb usb7: SerialNumber: 0000:00:1d.1
usb usb7: configuration #1 chosen from 1 choice
hub 7-0:1.0: USB hub found
hub 7-0:1.0: 2 ports detected
uhci_hcd 0000:00:1d.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18
uhci_hcd 0000:00:1d.2: setting latency timer to 64
uhci_hcd 0000:00:1d.2: UHCI Host Controller
uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 8
uhci_hcd 0000:00:1d.2: irq 18, io base 0x00009480
usb usb8: New USB device found, idVendor=1d6b, idProduct=0001
usb usb8: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb8: Product: UHCI Host Controller
usb usb8: Manufacturer: Linux 2.6.32.4MMAPDMARAF3SKY2PSKBMAYPULL-00912-g914160d-dirty uhci_hcd
usb usb8: SerialNumber: 0000:00:1d.2
usb usb8: configuration #1 chosen from 1 choice
hub 8-0:1.0: USB hub found
hub 8-0:1.0: 2 ports detected
PNP: PS/2 Controller [PNP0303:PS2K] at 0x60,0x64 irq 1
PNP: PS/2 appears to have AUX port disabled, if this is incorrect please boot with i8042.nopnp
serio: i8042 KBD port at 0x60,0x64 irq 1
mice: PS/2 mouse device common for all mice
rtc_cmos 00:03: RTC can wake from S4
rtc_cmos 00:03: rtc core: registered rtc_cmos as rtc0
rtc0: alarms up to one month, y3k, 114 bytes nvram, hpet irqs
device-mapper: uevent: version 1.0.3
device-mapper: ioctl: 4.15.0-ioctl (2009-04-01) initialised: dm-devel@redhat.com
cpuidle: using governor ladder
cpuidle: using governor menu
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
usbhid: v2.6:USB HID core driver
nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Please use
nf_conntrack.acct=1 kernel parameter, acct=1 nf_conntrack module option or
sysctl net.netfilter.nf_conntrack_acct=1 to enable it.
ip_tables: (C) 2000-2006 Netfilter Core Team
TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 17
registered taskstats version 1
No TPM chip found, activating TPM-bypass!
  Magic number: 10:757:719
rtc_cmos 00:03: setting system clock to 2010-01-21 05:43:57 UTC (1264052637)
Initalizing network drop monitor service
input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input2
usb 1-5: new high speed USB device using ehci_hcd and address 2
usb 1-5: New USB device found, idVendor=0409, idProduct=005a
usb 1-5: New USB device strings: Mfr=0, Product=0, SerialNumber=0
usb 1-5: configuration #1 chosen from 1 choice
hub 1-5:1.0: USB hub found
hub 1-5:1.0: 4 ports detected
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-8: WDC WD10EADS-00L5B1, 01.01A01, max UDMA/133
ata1.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
ata1.00: configured for UDMA/133
scsi 0:0:0:0: Direct-Access     ATA      WDC WD10EADS-00L 01.0 PQ: 0 ANSI: 5
sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1 sda2
sd 0:0:0:0: [sda] Attached SCSI disk
usb 7-1: new full speed USB device using uhci_hcd and address 2
usb 7-1: New USB device found, idVendor=0733, idProduct=0430
usb 7-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
usb 7-1: configuration #1 chosen from 1 choice
usb 7-2: new low speed USB device using uhci_hcd and address 3
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: ATA-8: WDC WD10EADS-00L5B1, 01.01A01, max UDMA/133
ata2.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
ata2.00: configured for UDMA/133
scsi 1:0:0:0: Direct-Access     ATA      WDC WD10EADS-00L 01.0 PQ: 0 ANSI: 5
sd 1:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
sd 1:0:0:0: Attached scsi generic sg1 type 0
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdb: sdb1 sdb2
sd 1:0:0:0: [sdb] Attached SCSI disk
usb 7-2: New USB device found, idVendor=051d, idProduct=0002
usb 7-2: New USB device strings: Mfr=3, Product=1, SerialNumber=2
usb 7-2: Product: Back-UPS ES 550 FW:843.K1d.D USB FW:K1d
usb 7-2: Manufacturer: APC
usb 7-2: SerialNumber: 3B1806X13792  
usb 7-2: configuration #1 chosen from 1 choice
generic-usb 0003:051D:0002.0001: hiddev96,hidraw0: USB HID v1.10 Device [APC Back-UPS ES 550 FW:843.K1d.D USB FW:K1d] on usb-0000:00:1d.1-2/input0
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.00: ATA-8: WDC WD10EADS-00L5B1, 01.01A01, max UDMA/133
ata3.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
ata3.00: configured for UDMA/133
scsi 2:0:0:0: Direct-Access     ATA      WDC WD10EADS-00L 01.0 PQ: 0 ANSI: 5
sd 2:0:0:0: Attached scsi generic sg2 type 0
sd 2:0:0:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
sd 2:0:0:0: [sdc] Write Protect is off
sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdc: sdc1
sd 2:0:0:0: [sdc] Attached SCSI disk
ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata4.00: ATA-8: WDC WD10EADS-00L5B1, 01.01A01, max UDMA/133
ata4.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
ata4.00: configured for UDMA/133
scsi 3:0:0:0: Direct-Access     ATA      WDC WD10EADS-00L 01.0 PQ: 0 ANSI: 5
sd 3:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
sd 3:0:0:0: Attached scsi generic sg3 type 0
sd 3:0:0:0: [sdd] Write Protect is off
sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdd: sdd1
sd 3:0:0:0: [sdd] Attached SCSI disk
ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata5.00: ATA-8: WDC WD10EADS-65L5B1, 01.01A01, max UDMA/133
ata5.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
ata5.00: configured for UDMA/133
scsi 4:0:0:0: Direct-Access     ATA      WDC WD10EADS-65L 01.0 PQ: 0 ANSI: 5
sd 4:0:0:0: [sde] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
sd 4:0:0:0: Attached scsi generic sg4 type 0
sd 4:0:0:0: [sde] Write Protect is off
sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00
sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sde: sde1
sd 4:0:0:0: [sde] Attached SCSI disk
ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.00: ATA-8: WDC WD10EADS-65L5B1, 01.01A01, max UDMA/133
ata6.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
ata6.00: configured for UDMA/133
scsi 5:0:0:0: Direct-Access     ATA      WDC WD10EADS-65L 01.0 PQ: 0 ANSI: 5
sd 5:0:0:0: [sdf] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
sd 5:0:0:0: Attached scsi generic sg5 type 0
sd 5:0:0:0: [sdf] Write Protect is off
sd 5:0:0:0: [sdf] Mode Sense: 00 3a 00 00
sd 5:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdf: sdf1
sd 5:0:0:0: [sdf] Attached SCSI disk
Freeing unused kernel memory: 1524k freed
Write protecting the kernel read-only data: 6788k
dracut: dracut-002-13.4.git8f397a9b.fc12
udev: starting version 147
Linux agpgart interface v0.103
[drm] Initialized drm 1.1.0 20060810
  alloc irq_desc for 24 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
nouveau 0000:02:00.0: PCI INT A -> GSI 24 (level, low) -> IRQ 24
nouveau 0000:02:00.0: setting latency timer to 64
[drm] nouveau 0000:02:00.0: Detected an NV50 generation card (0x450300a2)
[drm] nouveau 0000:02:00.0: Attempting to load BIOS image from PRAMIN
[drm] nouveau 0000:02:00.0: ... appears to be valid
[drm] nouveau 0000:02:00.0: BIT BIOS found
[drm] nouveau 0000:02:00.0: Bios version 60.80.18.00
[drm] nouveau 0000:02:00.0: TMDS table revision 2.0 not currently supported
[drm] nouveau 0000:02:00.0: BIT table 'd' not found
[drm] nouveau 0000:02:00.0: Found Display Configuration Block version 4.0
[drm] nouveau 0000:02:00.0: DCB connector table: VHER 0x40 5 14 2
[drm] nouveau 0000:02:00.0:   0: 0x00001030: type 0x30 idx 0 tag 0x07
[drm] nouveau 0000:02:00.0:   1: 0x00002130: type 0x30 idx 1 tag 0x08
[drm] nouveau 0000:02:00.0:   2: 0x00000210: type 0x10 idx 2 tag 0xff
[drm] nouveau 0000:02:00.0:   3: 0x00000211: type 0x11 idx 2 tag 0xff
[drm] nouveau 0000:02:00.0:   4: 0x00000213: type 0x13 idx 2 tag 0xff
[drm] nouveau 0000:02:00.0: Raw DCB entry 0: 04000320 00000028
[drm] nouveau 0000:02:00.0: Raw DCB entry 1: 01000322 00000030
[drm] nouveau 0000:02:00.0: Raw DCB entry 2: 02011310 00000028
[drm] nouveau 0000:02:00.0: Raw DCB entry 3: 02011312 00000030
[drm] nouveau 0000:02:00.0: Raw DCB entry 4: 010223f1 00c1c020
[drm] nouveau 0000:02:00.0: Parsing VBIOS init table 0 at offset 0xBFB4
[drm] nouveau 0000:02:00.0: Parsing VBIOS init table 1 at offset 0xC1BF
[drm] nouveau 0000:02:00.0: Parsing VBIOS init table 2 at offset 0xD37B
[drm] nouveau 0000:02:00.0: Parsing VBIOS init table 3 at offset 0xD46A
[drm] nouveau 0000:02:00.0: Parsing VBIOS init table 4 at offset 0xD621
[drm] nouveau 0000:02:00.0: Parsing VBIOS init table at offset 0xD686
[drm] nouveau 0000:02:00.0: 0xD686: Condition still not met after 20ms, skipping following opcodes
[drm] nouveau 0000:02:00.0: 0xB284: parsing output script 0
[drm] nouveau 0000:02:00.0: 0xB284: parsing output script 0
[drm] nouveau 0000:02:00.0: 0xA8FA: parsing output script 0
[TTM] Zone  kernel: Available graphics memory: 6129650 kiB.
[TTM] Zone   dma32: Available graphics memory: 2097152 kiB.
[drm] nouveau 0000:02:00.0: 320 MiB VRAM
[drm] nouveau 0000:02:00.0: 512 MiB GART (aperture)
nouveau 0000:02:00.0: firmware: requesting nouveau/nv50.ctxprog
dracut: 
dracut: udevadm settle - timeout of 30 seconds reached, the event queue contains:
dracut: /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0 (936)
dracut: /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0/drm/controlD64 (1024)
dracut: /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0/drm/card0 (1025)
dracut: /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0/i2c-0 (1026)
dracut: Starting plymouth daemon
pata_marvell 0000:05:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
pata_marvell 0000:05:00.0: setting latency timer to 64
scsi6 : pata_marvell
scsi7 : pata_marvell
ata7: PATA max UDMA/100 cmd 0xcc00 ctl 0xc880 bmdma 0xc400 irq 16
ata8: PATA max UDMA/133 cmd 0xc800 ctl 0xc480 bmdma 0xc408 irq 16
ata7.01: NODEV after polling detection
irq 16: nobody cared (try booting with the "irqpoll" option)
Pid: 0, comm: swapper Tainted: G        W  2.6.32.4MMAPDMARAF3SKY2PSKBMAYPULL-00912-g914160d-dirty #6
Call Trace:
 <IRQ>  [<ffffffff810a43ff>] __report_bad_irq+0x3d/0x8c
 [<ffffffff810a4566>] note_interrupt+0x118/0x17e
 [<ffffffff810a4c69>] handle_fasteoi_irq+0x9c/0xc1
 [<ffffffff81014422>] handle_irq+0x24/0x2c
 [<ffffffff8147052c>] do_IRQ+0x5c/0xbc
 [<ffffffff81012613>] ret_from_intr+0x0/0x16
 <EOI>  [<ffffffff812c729a>] ? acpi_idle_enter_bm+0x256/0x28a
 [<ffffffff812c7293>] ? acpi_idle_enter_bm+0x24f/0x28a
 [<ffffffff813a6c3c>] ? cpuidle_idle_call+0x9e/0xfa
 [<ffffffff81010c90>] ? cpu_idle+0xb4/0xf6
 [<ffffffff81465ba5>] ? start_secondary+0x201/0x242
handlers:
[<ffffffff8134d15a>] (usb_hcd_irq+0x0/0xa7)
[<ffffffff813395ce>] (ata_sff_interrupt+0x0/0x96)
Disabling IRQ #16
ata7.00: ATAPI: HP  DVD Writer 400, KH27, max UDMA/33
ata7.00: configured for UDMA/33
scsi 6:0:0:0: CD-ROM            HP       DVD Writer 400c  KH27 PQ: 0 ANSI: 5
sr0: scsi3-mmc drive: 40x/40x writer cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.20
sr 6:0:0:0: Attached scsi CD-ROM sr0
sr 6:0:0:0: Attached scsi generic sg6 type 5
[drm] nouveau 0000:02:00.0: No ctxprog for NV50
[drm] nouveau 0000:02:00.0: Detected a DAC output
[drm] nouveau 0000:02:00.0: Detected a TMDS output
[drm] nouveau 0000:02:00.0: Detected a DAC output
[drm] nouveau 0000:02:00.0: Detected a TMDS output
[drm] nouveau 0000:02:00.0: DCB encoder 1 unknown
[drm] nouveau 0000:02:00.0: Detected a DVI-I connector
[drm] nouveau 0000:02:00.0: Detected a DVI-I connector
[drm] nouveau 0000:02:00.0: allocated 1680x1050 fb: 0x401c0000, bo ffff88032dba3400
fb0: nouveaufb frame buffer device
registered panic notifier
[drm] Initialized nouveau 0.0.15 20090420 for 0000:02:00.0 on minor 0
dracut: Autoassembling MD Raid
md: md0 stopped.
md: bind<sdb1>
md: bind<sda1>
md: raid1 personality registered for level 1
raid1: raid set md0 active with 2 out of 2 mirrors
md0: detected capacity change from 0 to 209649664
dracut: mdadm: /dev/md0 has been started with 2 drives.
 md0: unknown partition table
md: md1 stopped.
md: bind<sdb2>
md: bind<sdc1>
md: bind<sdd1>
md: bind<sdf1>
md: bind<sde1>
md: bind<sda2>
async_tx: api initialized (async)
xor: automatically using best checksumming function: generic_sse
   generic_sse:  8908.000 MB/sec
xor: using function: generic_sse (8908.000 MB/sec)
raid6: int64x1   2371 MB/s
raid6: int64x2   2476 MB/s
raid6: int64x4   2039 MB/s
raid6: int64x8   1558 MB/s
raid6: sse2x1    5906 MB/s
raid6: sse2x2    6945 MB/s
raid6: sse2x4    8015 MB/s
raid6: using algorithm sse2x4 (8015 MB/s)
md: raid6 personality registered for level 6
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
raid5: device sda2 operational as raid disk 0
raid5: device sde1 operational as raid disk 5
raid5: device sdf1 operational as raid disk 4
raid5: device sdd1 operational as raid disk 3
raid5: device sdc1 operational as raid disk 2
raid5: device sdb2 operational as raid disk 1
raid5: allocated 6386kB for md1
0: w=1 pa=0 pr=6 m=2 a=2 r=6 op1=0 op2=0
5: w=2 pa=0 pr=6 m=2 a=2 r=6 op1=0 op2=0
4: w=3 pa=0 pr=6 m=2 a=2 r=6 op1=0 op2=0
3: w=4 pa=0 pr=6 m=2 a=2 r=6 op1=0 op2=0
2: w=5 pa=0 pr=6 m=2 a=2 r=6 op1=0 op2=0
1: w=6 pa=0 pr=6 m=2 a=2 r=6 op1=0 op2=0
raid5: raid level 6 set md1 active with 6 out of 6 devices, algorithm 2
RAID5 conf printout:
 --- rd:6 wd:6
 disk 0, o:1, dev:sda2
 disk 1, o:1, dev:sdb2
 disk 2, o:1, dev:sdc1
 disk 3, o:1, dev:sdd1
 disk 4, o:1, dev:sdf1
 disk 5, o:1, dev:sde1
md1: detected capacity change from 0 to 3999969837056
dracut: mdadm: /dev/md1 has been started with 6 drives.
 md1: unknown partition table
dracut: Scanning devices md1  for LVM volume groups 
dracut: Reading all physical volumes. This may take a while...
dracut: Found volume group "vg_00" using metadata type lvm2
dracut: 13 logical volume(s) in volume group "vg_00" now active
dracut: Autoassembling MD Raid
EXT4-fs (dm-3): mounted filesystem with ordered data mode
dracut: Mounted root filesystem /dev/mapper/vg_00-LogVol02_root
dracut: Loading SELinux policy
type=1404 audit(1264052705.420:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295
SELinux: 8192 avtab hash slots, 131435 rules.
SELinux: 8192 avtab hash slots, 131435 rules.
SELinux:  8 users, 12 roles, 2787 types, 129 bools, 1 sens, 1024 cats
SELinux:  74 classes, 131435 rules
SELinux:  class tun_socket not defined in policy
SELinux:  permission module_request in class system not defined in policy
SELinux: the above unknown classes and permissions will be allowed
SELinux:  Completing initialization.
SELinux:  Setting up existing superblocks.
SELinux: initialized (dev dm-3, type ext4), uses xattr
SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
SELinux: initialized (dev securityfs, type securityfs), uses genfs_contexts
SELinux: initialized (dev usbfs, type usbfs), uses genfs_contexts
SELinux: initialized (dev selinuxfs, type selinuxfs), uses genfs_contexts
SELinux: initialized (dev mqueue, type mqueue), uses transition SIDs
SELinux: initialized (dev hugetlbfs, type hugetlbfs), uses genfs_contexts
SELinux: initialized (dev devpts, type devpts), uses transition SIDs
SELinux: initialized (dev inotifyfs, type inotifyfs), uses genfs_contexts
SELinux: initialized (dev anon_inodefs, type anon_inodefs), uses genfs_contexts
SELinux: initialized (dev pipefs, type pipefs), uses task SIDs
SELinux: initialized (dev debugfs, type debugfs), uses genfs_contexts
SELinux: initialized (dev sockfs, type sockfs), uses task SIDs
SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
SELinux: initialized (dev proc, type proc), uses genfs_contexts
SELinux: initialized (dev bdev, type bdev), uses genfs_contexts
SELinux: initialized (dev rootfs, type rootfs), uses genfs_contexts
SELinux: initialized (dev sysfs, type sysfs), uses genfs_contexts
type=1403 audit(1264052705.655:3): policy loaded auid=4294967295 ses=4294967295
dracut: Switching root
[drm] DAC-7: set mode 1680x1050 25
Console: switching to colour frame buffer device 210x65
[drm] nouveau 0000:02:00.0: 0x0FA9: parsing clock script 0
udev: starting version 147
iTCO_vendor_support: vendor-support=0
iTCO_wdt: Intel TCO WatchDog Timer Driver v1.05
iTCO_wdt: Found a ICH10R TCO device (Version=2, TCOBASE=0x0860)
iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
input: PC Speaker as /devices/platform/pcspkr/input/input3
ACPI: WMI: Mapper loaded
sky2 driver version 1.25
sky2 0000:06:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
sky2 0000:06:00.0: setting latency timer to 64
sky2 0000:06:00.0: Yukon-2 EC Ultra chip revision 3
  alloc irq_desc for 58 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
sky2 0000:06:00.0: irq 58 for MSI/MSI-X
sky2 eth0: addr 00:26:18:00:1c:3b
sky2 0000:04:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
sky2 0000:04:00.0: setting latency timer to 64
sky2 0000:04:00.0: Yukon-2 EC Ultra chip revision 3
  alloc irq_desc for 59 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
sky2 0000:04:00.0: irq 59 for MSI/MSI-X
sky2 eth1: addr 00:26:18:00:1c:3a
Linux video capture interface: v2.00
i801_smbus 0000:00:1f.3: PCI INT C -> GSI 18 (level, low) -> IRQ 18
ACPI: I/O resource 0000:00:1f.3 [0x400-0x41f] conflicts with ACPI region SMRG [0x400-0x40f]
ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
gspca: main v2.7.0 registered
gspca: probing 0733:0430
gspca: probe ok
usbcore: registered new interface driver spca505
spca505: registered
firewire_ohci 0000:08:02.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
firewire_ohci: Added fw-ohci device 0000:08:02.0, OHCI version 1.10
  alloc irq_desc for 22 on node -1
  alloc kstat_irqs on node -1
alloc irq_2_iommu on node -1
HDA Intel 0000:00:1b.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
HDA Intel 0000:00:1b.0: setting latency timer to 64
ENS1371 0000:08:01.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
firewire_core: created device fw0: GUID 001e8c0001e5e584, S400
hda_codec: AD1989B: BIOS auto-probing.
ALSA sound/pci/hda/hda_codec.c:3881: autoconfig: line_outs=4 (0x12/0x16/0x24/0x25/0x0)
ALSA sound/pci/hda/hda_codec.c:3885:    speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
ALSA sound/pci/hda/hda_codec.c:3889:    hp_outs=1 (0x11/0x0/0x0/0x0/0x0)
ALSA sound/pci/hda/hda_codec.c:3890:    mono: mono_out=0x0
ALSA sound/pci/hda/hda_codec.c:3893:    dig-out=0x1b/0x1d
ALSA sound/pci/hda/hda_codec.c:3901:    inputs: mic=0x17, fmic=0x14, line=0x15, fline=0x0, cd=0x18, aux=0x0
input: HDA Digital PCBeep as /devices/pci0000:00/0000:00:1b.0/input/input4
ALSA sound/pci/hda/hda_intel.c:698: azx_get_response timeout, switching to polling mode: last cmd=0x002f0d00
gameport: ES137x is pci0000:08:01.0/gameport0, io 0x200, speed 663kHz
name_count maxed, losing inode data: dev=00:06, inode=11914
name_count maxed, losing inode data: dev=00:06, inode=11914
name_count maxed, losing inode data: dev=00:06, inode=11914
name_count maxed, losing inode data: dev=00:06, inode=11906
name_count maxed, losing inode data: dev=00:06, inode=11919
name_count maxed, losing inode data: dev=00:06, inode=11919
name_count maxed, losing inode data: dev=00:06, inode=11919
name_count maxed, losing inode data: dev=00:06, inode=11919
name_count maxed, losing inode data: dev=00:06, inode=11906
name_count maxed, losing inode data: dev=00:06, inode=11924
name_count maxed, losing inode data: dev=00:06, inode=11924
name_count maxed, losing inode data: dev=00:06, inode=11924
name_count maxed, losing inode data: dev=00:06, inode=11924
name_count maxed, losing inode data: dev=00:06, inode=11906
name_count maxed, losing inode data: dev=00:06, inode=11929
name_count maxed, losing inode data: dev=00:06, inode=11929
name_count maxed, losing inode data: dev=00:06, inode=11929
name_count maxed, losing inode data: dev=00:06, inode=11929
name_count maxed, losing inode data: dev=00:06, inode=11906
name_count maxed, losing inode data: dev=00:06, inode=11934
name_count maxed, losing inode data: dev=00:06, inode=11934
name_count maxed, losing inode data: dev=00:06, inode=11934
name_count maxed, losing inode data: dev=00:06, inode=11934
name_count maxed, losing inode data: dev=00:06, inode=11906
name_count maxed, losing inode data: dev=00:06, inode=11939
name_count maxed, losing inode data: dev=00:06, inode=11939
name_count maxed, losing inode data: dev=00:06, inode=11939
name_count maxed, losing inode data: dev=00:06, inode=11939
name_count maxed, losing inode data: dev=00:06, inode=11906
name_count maxed, losing inode data: dev=00:06, inode=11944
name_count maxed, losing inode data: dev=00:06, inode=11944
name_count maxed, losing inode data: dev=00:06, inode=11944
name_count maxed, losing inode data: dev=00:06, inode=11944
name_count maxed, losing inode data: dev=00:06, inode=11906
name_count maxed, losing inode data: dev=00:06, inode=11949
name_count maxed, losing inode data: dev=00:06, inode=11949
name_count maxed, losing inode data: dev=00:06, inode=11949
name_count maxed, losing inode data: dev=00:06, inode=11949
name_count maxed, losing inode data: dev=00:06, inode=11906
name_count maxed, losing inode data: dev=00:06, inode=11954
name_count maxed, losing inode data: dev=00:06, inode=11954
name_count maxed, losing inode data: dev=00:06, inode=11954
name_count maxed, losing inode data: dev=00:06, inode=11954
name_count maxed, losing inode data: dev=00:06, inode=11906
name_count maxed, losing inode data: dev=00:06, inode=11959
name_count maxed, losing inode data: dev=00:06, inode=11959
name_count maxed, losing inode data: dev=00:06, inode=11959
name_count maxed, losing inode data: dev=00:06, inode=11959
name_count maxed, losing inode data: dev=00:06, inode=11906
name_count maxed, losing inode data: dev=00:06, inode=11965
name_count maxed, losing inode data: dev=00:06, inode=11965
name_count maxed, losing inode data: dev=00:06, inode=11965
name_count maxed, losing inode data: dev=00:06, inode=11965
name_count maxed, losing inode data: dev=00:06, inode=11906
name_count maxed, losing inode data: dev=00:06, inode=11970
name_count maxed, losing inode data: dev=00:06, inode=11970
name_count maxed, losing inode data: dev=00:06, inode=11970
name_count maxed, losing inode data: dev=00:06, inode=11970
name_count maxed, losing inode data: dev=00:06, inode=11906
name_count maxed, losing inode data: dev=00:06, inode=11975
name_count maxed, losing inode data: dev=00:06, inode=11975
name_count maxed, losing inode data: dev=00:06, inode=11975
name_count maxed, losing inode data: dev=00:06, inode=11975
name_count maxed, losing inode data: dev=00:06, inode=11906
name_count maxed, losing inode data: dev=00:06, inode=11980
name_count maxed, losing inode data: dev=00:06, inode=11980
name_count maxed, losing inode data: dev=00:06, inode=11980
name_count maxed, losing inode data: dev=00:06, inode=11980
name_count maxed, losing inode data: dev=00:06, inode=11906
name_count maxed, losing inode data: dev=00:06, inode=11985
name_count maxed, losing inode data: dev=00:06, inode=11985
name_count maxed, losing inode data: dev=00:06, inode=11985
name_count maxed, losing inode data: dev=00:06, inode=11985
name_count maxed, losing inode data: dev=00:06, inode=11906
name_count maxed, losing inode data: dev=00:06, inode=11990
name_count maxed, losing inode data: dev=00:06, inode=11990
name_count maxed, losing inode data: dev=00:06, inode=11990
name_count maxed, losing inode data: dev=00:06, inode=11990
name_count maxed, losing inode data: dev=00:06, inode=11906
name_count maxed, losing inode data: dev=00:06, inode=11995
name_count maxed, losing inode data: dev=00:06, inode=11995
name_count maxed, losing inode data: dev=00:06, inode=11995
name_count maxed, losing inode data: dev=00:06, inode=11995
name_count maxed, losing inode data: dev=00:06, inode=201
name_count maxed, losing inode data: dev=00:06, inode=12000
name_count maxed, losing inode data: dev=00:06, inode=12000
name_count maxed, losing inode data: dev=00:06, inode=12000
name_count maxed, losing inode data: dev=00:06, inode=12003
name_count maxed, losing inode data: dev=00:06, inode=12003
name_count maxed, losing inode data: dev=00:06, inode=12003
name_count maxed, losing inode data: dev=00:06, inode=12003
name_count maxed, losing inode data: dev=00:06, inode=12000
name_count maxed, losing inode data: dev=00:06, inode=12008
name_count maxed, losing inode data: dev=00:06, inode=12008
name_count maxed, losing inode data: dev=00:06, inode=12008
name_count maxed, losing inode data: dev=00:06, inode=12008
name_count maxed, losing inode data: dev=00:06, inode=12000
name_count maxed, losing inode data: dev=00:06, inode=12013
name_count maxed, losing inode data: dev=00:06, inode=12013
name_count maxed, losing inode data: dev=00:06, inode=12013
name_count maxed, losing inode data: dev=00:06, inode=12013
name_count maxed, losing inode data: dev=00:06, inode=12000
name_count maxed, losing inode data: dev=00:06, inode=12018
name_count maxed, losing inode data: dev=00:06, inode=12018
name_count maxed, losing inode data: dev=00:06, inode=12018
name_count maxed, losing inode data: dev=00:06, inode=12018
name_count maxed, losing inode data: dev=00:06, inode=12000
name_count maxed, losing inode data: dev=00:06, inode=12023
name_count maxed, losing inode data: dev=00:06, inode=12023
name_count maxed, losing inode data: dev=00:06, inode=12023
name_count maxed, losing inode data: dev=00:06, inode=12023
name_count maxed, losing inode data: dev=00:06, inode=12000
name_count maxed, losing inode data: dev=00:06, inode=12028
name_count maxed, losing inode data: dev=00:06, inode=12028
name_count maxed, losing inode data: dev=00:06, inode=12028
name_count maxed, losing inode data: dev=00:06, inode=12028
name_count maxed, losing inode data: dev=00:06, inode=12000
name_count maxed, losing inode data: dev=00:06, inode=12033
name_count maxed, losing inode data: dev=00:06, inode=12033
name_count maxed, losing inode data: dev=00:06, inode=12033
name_count maxed, losing inode data: dev=00:06, inode=12033
name_count maxed, losing inode data: dev=00:06, inode=12000
name_count maxed, losing inode data: dev=00:06, inode=12038
name_count maxed, losing inode data: dev=00:06, inode=12038
name_count maxed, losing inode data: dev=00:06, inode=12038
name_count maxed, losing inode data: dev=00:06, inode=12038
name_count maxed, losing inode data: dev=00:06, inode=12000
name_count maxed, losing inode data: dev=00:06, inode=12043
name_count maxed, losing inode data: dev=00:06, inode=12043
name_count maxed, losing inode data: dev=00:06, inode=12043
name_count maxed, losing inode data: dev=00:06, inode=12043
name_count maxed, losing inode data: dev=00:06, inode=12050
name_count maxed, losing inode data: dev=00:06, inode=12050
name_count maxed, losing inode data: dev=00:06, inode=12050
name_count maxed, losing inode data: dev=00:06, inode=12050
name_count maxed, losing inode data: dev=00:06, inode=12050
name_count maxed, losing inode data: dev=00:06, inode=12050
name_count maxed, losing inode data: dev=00:06, inode=12050
name_count maxed, losing inode data: dev=00:06, inode=12050
name_count maxed, losing inode data: dev=00:06, inode=12050
name_count maxed, losing inode data: dev=00:06, inode=12050
name_count maxed, losing inode data: dev=00:06, inode=12050
name_count maxed, losing inode data: dev=00:06, inode=12050
name_count maxed, losing inode data: dev=00:06, inode=12050
name_count maxed, losing inode data: dev=00:06, inode=12050
name_count maxed, losing inode data: dev=00:06, inode=12050
name_count maxed, losing inode data: dev=00:06, inode=12050
name_count maxed, losing inode data: dev=00:06, inode=12050
name_count maxed, losing inode data: dev=00:06, inode=12050
name_count maxed, losing inode data: dev=00:06, inode=12050
name_count maxed, losing inode data: dev=00:06, inode=12050
name_count maxed, losing inode data: dev=00:06, inode=12050
name_count maxed, losing inode data: dev=00:06, inode=12050
name_count maxed, losing inode data: dev=00:06, inode=12050
device-mapper: multipath: version 1.1.0 loaded
EXT4-fs (dm-6): mounted filesystem with ordered data mode
SELinux: initialized (dev dm-6, type ext4), uses xattr
EXT4-fs (dm-1): mounted filesystem with ordered data mode
SELinux: initialized (dev dm-1, type ext4), uses xattr
EXT4-fs (dm-5): mounted filesystem with ordered data mode
SELinux: initialized (dev dm-5, type ext4), uses xattr
kjournald starting.  Commit interval 5 seconds
EXT3 FS on md0, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: initialized (dev md0, type ext3), uses xattr
EXT4-fs (dm-7): mounted filesystem with ordered data mode
SELinux: initialized (dev dm-7, type ext4), uses xattr
EXT4-fs (dm-8): mounted filesystem with ordered data mode
SELinux: initialized (dev dm-8, type ext4), uses xattr
EXT4-fs (dm-4): mounted filesystem with ordered data mode
SELinux: initialized (dev dm-4, type ext4), uses xattr
EXT4-fs (dm-2): mounted filesystem with ordered data mode
SELinux: initialized (dev dm-2, type ext4), uses xattr
device fsid 834fc529ea4ea2c5-f9308b94bfd9cac devid 1 transid 24026 /dev/mapper/vg_00-kvm_guest_btrfs_test
SELinux: initialized (dev dm-9, type btrfs), uses xattr
EXT4-fs (dm-10): mounted filesystem with ordered data mode
SELinux: initialized (dev dm-10, type ext4), uses xattr
Adding 17432568k swap on /dev/mapper/vg_00-LogVol00_swap.  Priority:-1 extents:1 across:17432568k 
SELinux: initialized (dev binfmt_misc, type binfmt_misc), uses genfs_contexts

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-22 18:01     ` Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync) Michael Breuer
@ 2010-01-22 21:53       ` Jarek Poplawski
  2010-01-22 22:14         ` Michael Breuer
  2012-01-16 16:39       ` Regression: sky2 kernel between 3.1 and 3.2.1 (last known good 3.0.9) Michael Breuer
  1 sibling, 1 reply; 99+ messages in thread
From: Jarek Poplawski @ 2010-01-22 21:53 UTC (permalink / raw)
  To: Michael Breuer
  Cc: David Miller, Stephen Hemminger, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Fri, Jan 22, 2010 at 01:01:15PM -0500, Michael Breuer wrote:
> Kernel 2.6.32.4 (git) with the following patches applied:
> 
> af_packet.c (tpacket_snd version 3)
> sky2.c pskb_may_pull
> sky2 fix WARNING at lib/dma-debug.c check_sync

I guess you meant the "sky2.c receive_copy" patch, which you tested
earlier, or at least that you managed to crash DMAR with that patch
before crashing it with Stephen's "lib/dma-debug.c check_sync" patch,
right?

> Running with CONFIG_DMAR=n, system is stable.
> Running with the exact same source but CONFIG_DMAR=y I get the
> WARNING (see below) after about 36 hours of uptime (has varied from
> about 24 to about 48):
> Smolt profile: http://smolt.fedoraproject.org/show?uuid=pub_bb05c701-1e47-4b3c-9fab-54f520f39d79+
> I'm also attaching dmesg.old (dmesg from the crash).
> 
> Subsequent to this the system watchdog reboots the system (it's hung).
> 
> Of interest: each and every time this has happened the system was
> under heavy RX load (win7 backup to a cifs share hosted on this
> server). Also, there is always a dhcp exchange of some sort
> preceding the event.
> 
> It is possible that the event is re-creatable without DMAR enabled,
> but I have been unsuccessful in doing so.

It would be nice to check now whether it's also re-creatable without the
dhcp exchange, or at least dhcp through the switch and the router,
because I suspect there might be something more than a simple drop
on the switch that affects sky2 stability.

Jarek P.

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-22 21:53       ` Jarek Poplawski
@ 2010-01-22 22:14         ` Michael Breuer
  2010-01-22 23:06           ` Jarek Poplawski
  0 siblings, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2010-01-22 22:14 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: David Miller, Stephen Hemminger, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 1/22/2010 4:53 PM, Jarek Poplawski wrote:
> On Fri, Jan 22, 2010 at 01:01:15PM -0500, Michael Breuer wrote:
>    
>> Kernel 2.6.32.4 (git) with the following patches applied:
>>
>> af_packet.c (tpacket_snd version 3)
>> sky2.c pskb_may_pull
>> sky2 fix WARNING at lib/dma-debug.c check_sync
>>      
> I guess you meant the "sky2.c receive_copy" patch which you tested
> earlier, or at least you managed to crash DMAR with that patch
> before crashing it with Stephen's "lib/dma-debug.c check_sync" patch,
> right?
>
>    
Yes - sorry, correct - all three patches were in the last run.
Previously, I encountered the crash without these patches.
>> Running with CONFIG_DMAR=n, system is stable.
>> Running with the exact same source but CONFIG_DMAR=y I get the
>> WARNING (see below) after about 36 hours of uptime (has varied from
>> about 24 to about 48):
>> Smolt profile: http://smolt.fedoraproject.org/show?uuid=pub_bb05c701-1e47-4b3c-9fab-54f520f39d79+
>> I'm also attaching dmesg.old (dmesg from the crash).
>>
>> Subsequent to this the system watchdog reboots the system (it's hung).
>>
>> Of interest: each and every time this has happened the system was
>> under heavy RX load (win7 backup to a cifs share hosted on this
>> server). Also, there is always a dhcp exchange of some sort
>> preceding the event.
>>
>> It is possible that the event is re-creatable without DMAR enabled,
>> but I have been unsuccessful in doing so.
>>      
> It would be nice to check now if it's re-creatable without the dhcp
> exchange yet, or at least dhcp through the switch and the router,
> because I suspect there might be something more than a simple drop
> on the switch that affects sky2 stability.
>
> Jarek P.
>    
Not sure I can do that. Note that based on the log messages, there were 
no errors/dropped packets involving dhcp. Moving the dhcp server off of 
the affected machine is not trivial. The dhcp correlation is based on 
logged messages preceding each crash. I cannot confirm that they're 
related, however it's really suspicious. If it helps, HP replaced my 
unmanaged switch with a managed one so I can see whether there were any 
switch events logged the next time I have a crash.

At this point, it seems the following is required to trigger the crash:
1) Uptime of 24-36 hours
2) High RX load on server (cifs traffic is what I've triggered it with).
3) Normal DHCP traffic.

Looks like based on the events I've seen that the high RX load has to be 
sustained for about 15-20 minutes prior to the dhcp traffic. Crash 
follows about a minute later.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-22 22:14         ` Michael Breuer
@ 2010-01-22 23:06           ` Jarek Poplawski
  2010-01-22 23:25             ` Michael Breuer
  0 siblings, 1 reply; 99+ messages in thread
From: Jarek Poplawski @ 2010-01-22 23:06 UTC (permalink / raw)
  To: Michael Breuer
  Cc: David Miller, Stephen Hemminger, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Fri, Jan 22, 2010 at 05:14:58PM -0500, Michael Breuer wrote:
> On 1/22/2010 4:53 PM, Jarek Poplawski wrote:
> >On Fri, Jan 22, 2010 at 01:01:15PM -0500, Michael Breuer wrote:
> >>Kernel 2.6.32.4 (git) with the following patches applied:
> >>
> >>af_packet.c (tpacket_snd version 3)
> >>sky2.c pskb_may_pull
> >>sky2 fix WARNING at lib/dma-debug.c check_sync
> >I guess, you meant the "sky2.c receive_copy" patch which you tested
> >earlier, or at least you managed to crash DMAR with that patch
> >before crashing it with Stephen's "lib/dma-debug.c check_sync" patch,
> >right?
> >
> Yes - sorry, correct - all three patches were in the last run.
> Previously, I've encountered the crash without these patches.

OK, thanks for testing - it's really very helpful, and supports
David's opinion that dmar is a different problem.
...
> Not sure I can do that. Note that based on the log messages, there
> were no errors/dropped packets involving dhcp. Moving the dhcp
> server off of the affected machine is not trivial. The dhcp
> correlation is based on logged messages preceding each crash. I
> cannot confirm that they're related, however it's really suspicious.
> If it helps, HP replaced my unmanaged switch with a managed one so I
> can see whether there were any switch events logged the next time I
> have a crash.
> 
> At this point, it seems the following is required to trigger the crash:
> 1) Uptime of 24-36 hours
> 2) High RX load on server (cifs traffic is what I've triggered it with).
> 3) Normal DHCP traffic.

Do you mean you got these crashes with the new switch too, and this
switch doesn't drop DHCP at all? (Otherwise, let's try this switch
first.)

Jarek P.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-22 23:06           ` Jarek Poplawski
@ 2010-01-22 23:25             ` Michael Breuer
  2010-01-22 23:46               ` Jarek Poplawski
  0 siblings, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2010-01-22 23:25 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: David Miller, Stephen Hemminger, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 1/22/2010 6:06 PM, Jarek Poplawski wrote:
> On Fri, Jan 22, 2010 at 05:14:58PM -0500, Michael Breuer wrote:
>    
>> On 1/22/2010 4:53 PM, Jarek Poplawski wrote:
>>      
>>> On Fri, Jan 22, 2010 at 01:01:15PM -0500, Michael Breuer wrote:
>>>        
>>>> Kernel 2.6.32.4 (git) with the following patches applied:
>>>>
>>>> af_packet.c (tpacket_snd version 3)
>>>> sky2.c pskb_may_pull
>>>> sky2 fix WARNING at lib/dma-debug.c check_sync
>>>>          
>>> I guess, you meant the "sky2.c receive_copy" patch which you tested
>>> earlier, or at least you managed to crash DMAR with that patch
>>> before crashing it with Stephen's "lib/dma-debug.c check_sync" patch,
>>> right?
>>>
>>>        
>> Yes - sorry, correct - all three patches were in the last run.
>> Previously, I've encountered the crash without these patches.
>>      
> OK, thanks for testing - it's really very helpful, and supports
> David's opinion that dmar is a different problem.
> ...
>    
>> Not sure I can do that. Note that based on the log messages, there
>> were no errors/dropped packets involving dhcp. Moving the dhcp
>> server off of the affected machine is not trivial. The dhcp
>> correlation is based on logged messages preceding each crash. I
>> cannot confirm that they're related, however it's really suspicious.
>> If it helps, HP replaced my unmanaged switch with a managed one so I
>> can see whether there were any switch events logged the next time I
>> have a crash.
>>
>> At this point, it seems the following is required to trigger the crash:
>> 1) Uptime of 24-36 hours
>> 2) High RX load on server (cifs traffic is what I've triggered it with).
>> 3) Normal DHCP traffic.
>>      
> Do you mean you got these crashes with the new switch too, and this
> switch doesn't drop DHCP at all? (Otherwise, let's try this switch
> first.)
>
> Jarek P.
>    
Nope - just got the new switch. Crash was old switch. That said, I don't 
think (based on the log messages) that the dhcpoffer packet drop was 
happening prior to the crash. I also can't fathom why a DHCPOFFER packet 
dropped after leaving the server would have any bearing on the issue.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-22 23:25             ` Michael Breuer
@ 2010-01-22 23:46               ` Jarek Poplawski
  2010-01-22 23:50                 ` Michael Breuer
  0 siblings, 1 reply; 99+ messages in thread
From: Jarek Poplawski @ 2010-01-22 23:46 UTC (permalink / raw)
  To: Michael Breuer
  Cc: David Miller, Stephen Hemminger, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Fri, Jan 22, 2010 at 06:25:12PM -0500, Michael Breuer wrote:
> On 1/22/2010 6:06 PM, Jarek Poplawski wrote:
> >On Fri, Jan 22, 2010 at 05:14:58PM -0500, Michael Breuer wrote:
> >>Not sure I can do that. Note that based on the log messages, there
> >>were no errors/dropped packets involving dhcp. Moving the dhcp
> >>server off of the affected machine is not trivial. The dhcp
> >>correlation is based on logged messages preceding each crash. I
> >>cannot confirm that they're related, however it's really suspicious.
> >>If it helps, HP replaced my unmanaged switch with a managed one so I
> >>can see whether there were any switch events logged the next time I
> >>have a crash.
> >>
> >>At this point, it seems the following is required to trigger the crash:
> >>1) Uptime of 24-36 hours
> >>2) High RX load on server (cifs traffic is what I've triggered it with).
> >>3) Normal DHCP traffic.
> >Do you mean you got these crashes with the new switch too, and this
> >switch doesn't drop DHCP at all? (Otherwise, let's try this switch
> >first.)
> >
> >Jarek P.
> Nope - just got the new switch. Crash was old switch. That said, I
> don't think (based on the log messages) that the dhcpoffer packet
> drop was happening prior to the crash. I also can't fathom why a
> DHCPOFFER packet dropped after leaving the server would have any
> bearing on the issue.

You wrote earlier:
> [...] Also, there is always a dhcp exchange of some sort
> preceding the event.

So, I'm not sure there was "3) Normal DHCP traffic." if the switch
could drop DHCP packets in some buggy conditions. Anyway, let's try
the new one with really "3) Normal DHCP traffic.", I hope.

Jarek P.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-22 23:46               ` Jarek Poplawski
@ 2010-01-22 23:50                 ` Michael Breuer
  2010-01-23 23:21                   ` Jarek Poplawski
  0 siblings, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2010-01-22 23:50 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: David Miller, Stephen Hemminger, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 1/22/2010 6:46 PM, Jarek Poplawski wrote:
> On Fri, Jan 22, 2010 at 06:25:12PM -0500, Michael Breuer wrote:
>    
>
> You wrote earlier:
>    
>> [...] Also, there is always a dhcp exchange of some sort
>> preceding the event.
>>      
> So, I'm not sure there was "3) Normal DHCP traffic." if the switch
> could drop DHCP packets in some buggy conditions. Anyway, let's try
> the new one with really "3) Normal DHCP traffic.", I hope.
>
> Jarek P.
>    
When the packets were dropped, there was a different sequence in the log 
- DISCOVER/OFFER repeated. The "normal" is that the sequence appeared 
correct and complete - DISCOVER/OFFER/REQUEST/ACK - or INFORM/ACK (vs. 
INFORM repeatedly sans ACK) as the case may be.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-22 23:50                 ` Michael Breuer
@ 2010-01-23 23:21                   ` Jarek Poplawski
  2010-01-24  1:53                     ` Michael Breuer
  2010-01-27 15:34                     ` Michael Breuer
  0 siblings, 2 replies; 99+ messages in thread
From: Jarek Poplawski @ 2010-01-23 23:21 UTC (permalink / raw)
  To: Michael Breuer
  Cc: David Miller, Stephen Hemminger, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Fri, Jan 22, 2010 at 06:50:21PM -0500, Michael Breuer wrote:
> When the packets were dropped, there was a different sequence in the
> log - DISCOVER/OFFER repeated. The "normal" is that the sequence
> appeared correct and complete - DISCOVER/OFFER/REQUEST/ACK - or
> INFORM/ACK (vs. INFORM repeatedly sans ACK) as the case may be.

Anyway, I'd be interested if the switch matters here.

Plus one more test: could you try to load sky2 with the parameter:
"copybreak=1" (the rest as in any recent test, which gave you dmar
errors; any switch).

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-23 23:21                   ` Jarek Poplawski
@ 2010-01-24  1:53                     ` Michael Breuer
  2010-01-27 15:34                     ` Michael Breuer
  1 sibling, 0 replies; 99+ messages in thread
From: Michael Breuer @ 2010-01-24  1:53 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: David Miller, Stephen Hemminger, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 1/23/2010 6:21 PM, Jarek Poplawski wrote:
> On Fri, Jan 22, 2010 at 06:50:21PM -0500, Michael Breuer wrote:
>    
>> When the packets were dropped, there was a different sequence in the
>> log - DISCOVER/OFFER repeated. The "normal" is that the sequence
>> appeared correct and complete - DISCOVER/OFFER/REQUEST/ACK - or
>> INFORM/ACK (vs. INFORM repeatedly sans ACK) as the case may be.
>>      
> Anyway, I'd be interested if the switch matters here.
>
> Plus one more test: could you try to load sky2 with the parameter:
> "copybreak=1" (the rest as in any recent test, which gave you dmar
> errors; any switch).
>
> Thanks,
> Jarek P.
>    
Ok - will try with and without the copybreak=1 after I'm done bisecting 
2.6.33 rc5 for something unrelated (it'll take a couple of days for each 
unless a crash occurs in less than 48 hours).

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-23 23:21                   ` Jarek Poplawski
  2010-01-24  1:53                     ` Michael Breuer
@ 2010-01-27 15:34                     ` Michael Breuer
  2010-01-27 16:50                       ` Stephen Hemminger
  1 sibling, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2010-01-27 15:34 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: David Miller, Stephen Hemminger, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 01/23/2010 06:21 PM, Jarek Poplawski wrote:
> On Fri, Jan 22, 2010 at 06:50:21PM -0500, Michael Breuer wrote:
>    
>> When the packets were dropped, there was a different sequence in the
>> log - DISCOVER/OFFER repeated. The "normal" is that the sequence
>> appeared correct and complete - DISCOVER/OFFER/REQUEST/ACK - or
>> INFORM/ACK (vs. INFORM repeatedly sans ACK) as the case may be.
>>      
> Anyway, I'd be interested if the switch matters here.
>
> Plus one more test: could you try to load sky2 with the parameter:
> "copybreak=1" (the rest as in any recent test, which gave you dmar
> errors; any switch).
>
> Thanks,
> Jarek P.
>    
Ok - now up 80+ hours with copybreak=1. I'm going to redo w/o copybreak 
to confirm that I haven't inadvertently fixed something. However, given 
that it might be copybreak-related, I looked at sky2.c again and I'm 
wondering about the copybreak max size in sky2_rx_start:

         size = roundup(sky2->netdev->mtu + ETH_HLEN + VLAN_HLEN, 8);

         /* Stopping point for hardware truncation */
         thresh = (size - 8) / sizeof(u32);

         sky2->rx_nfrags = size >> PAGE_SHIFT;
         BUG_ON(sky2->rx_nfrags > ARRAY_SIZE(re->frag_addr));

         /* Compute residue after pages */
         size -= sky2->rx_nfrags << PAGE_SHIFT;

         /* Optimize to handle small packets and headers */
         if (size < copybreak)
                 size = copybreak;
         if (size < ETH_HLEN)
                 size = ETH_HLEN;


Why would increasing size to copybreak be valid here?

Guessing a bit as I'm not sure about rx_nfrags, but if I read this 
correctly, if size is ever less than copybreak it's because there isn't 
enough space left for anything larger. If so, wouldn't increasing size 
potentially corrupt something? I'd further guess that the resulting 
condition manifests sooner (or at least with a more visible effect) when 
using DMAR.

In any event, why "copybreak" as the minimum buffer size? I'd suggest 
that if it isn't possible to allocate at least MTU + overhead that 
sky2_rx_start ought to be delayed until there is room.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-27 15:34                     ` Michael Breuer
@ 2010-01-27 16:50                       ` Stephen Hemminger
  2010-01-27 16:57                         ` Michael Breuer
  0 siblings, 1 reply; 99+ messages in thread
From: Stephen Hemminger @ 2010-01-27 16:50 UTC (permalink / raw)
  To: Michael Breuer
  Cc: Jarek Poplawski, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Wed, 27 Jan 2010 10:34:51 -0500
Michael Breuer <mbreuer@majjas.com> wrote:

> On 01/23/2010 06:21 PM, Jarek Poplawski wrote:
> > On Fri, Jan 22, 2010 at 06:50:21PM -0500, Michael Breuer wrote:
> >      
> >> When the packets were dropped, there was a different sequence in the
> >> log - DISCOVER/OFFER repeated. The "normal" is that the sequence
> >> appeared correct and complete - DISCOVER/OFFER/REQUEST/ACK - or
> >> INFORM/ACK (vs. INFORM repeatedly sans ACK) as the case may be.
> >>        
> > Anyway, I'd be interested if the switch matters here.
> >
> > Plus one more test: could you try to load sky2 with the parameter:
> > "copybreak=1" (the rest as in any recent test, which gave you dmar
> > errors; any switch).
> >
> > Thanks,
> > Jarek P.
> >      
> Ok - now up 80+ hours with copybreak=1. I'm going to redo w/o copybreak 
> to confirm that I haven't inadvertently fixed something. However, given 
> that it might be copybreak-related, I looked at sky2.c again and I'm 
> wondering about the copybreak max size in sky2_rx_start:
> 
>    size = roundup(sky2->netdev->mtu + ETH_HLEN + VLAN_HLEN, 8);
> 
>          /* Stopping point for hardware truncation */
>          thresh = (size - 8) / sizeof(u32);
> 
>          sky2->rx_nfrags = size >> PAGE_SHIFT;
>          BUG_ON(sky2->rx_nfrags > ARRAY_SIZE(re->frag_addr));
> 
>          /* Compute residue after pages */
>          size -= sky2->rx_nfrags << PAGE_SHIFT;
> 
>          /* Optimize to handle small packets and headers */
>          if (size < copybreak)
>                  size = copybreak;
>          if (size < ETH_HLEN)
>                  size = ETH_HLEN;
> 
> 
> Why would increasing size to copybreak be valid here?
> 
> Guessing a bit as I'm not sure about rx_nfrags, but if I read this 
> correctly, if size is ever less than copybreak it's because there isn't 
> enough space left for anything larger. If so, wouldn't increasing size 
> potentially corrupt something? I'd further guess that the resulting 
> condition manifests sooner (or at least with a more visible effect) when 
> using DMAR.
> 
> In any event, why "copybreak" as the minimum buffer size? I'd suggest 
> that if it isn't possible to allocate at least MTU + overhead that 
> sky2_rx_start ought to be delayed until there is room.

This code is where driver decides how much data will be received in skb
data area and the remaining data spills over into skb frags.
Copybreak is the threshold so that packets less than size are copied
to a new skb.  The code doing the copying there assumes the data is
totally contained in the skb (not in frags). The size increase there
is to make sure that assumption is always true.  I suppose you
could do something perverse like setting copybreak really huge 
and confuse driver, but that is a user error.



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-27 16:50                       ` Stephen Hemminger
@ 2010-01-27 16:57                         ` Michael Breuer
  2010-01-27 17:45                           ` Stephen Hemminger
  2010-01-27 17:56                           ` Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync) Stephen Hemminger
  0 siblings, 2 replies; 99+ messages in thread
From: Michael Breuer @ 2010-01-27 16:57 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jarek Poplawski, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 1/27/2010 11:50 AM, Stephen Hemminger wrote:
> On Wed, 27 Jan 2010 10:34:51 -0500
> Michael Breuer <mbreuer@majjas.com> wrote:
>
>    
>> On 01/23/2010 06:21 PM, Jarek Poplawski wrote:
>>      
>>> On Fri, Jan 22, 2010 at 06:50:21PM -0500, Michael Breuer wrote:
>>>
>>>        
>>>> When the packets were dropped, there was a different sequence in the
>>>> log - DISCOVER/OFFER repeated. The "normal" is that the sequence
>>>> appeared correct and complete - DISCOVER/OFFER/REQUEST/ACK - or
>>>> INFORM/ACK (vs. INFORM repeatedly sans ACK) as the case may be.
>>>>
>>>>          
>>> Anyway, I'd be interested if the switch matters here.
>>>
>>> Plus one more test: could you try to load sky2 with the parameter:
>>> "copybreak=1" (the rest as in any recent test, which gave you dmar
>>> errors; any switch).
>>>
>>> Thanks,
>>> Jarek P.
>>>
>>>        
>> Ok - now up 80+ hours with copybreak=1. I'm going to redo w/o copybreak
>> to confirm that I haven't inadvertently fixed something. However, given
>> that it might be copybreak-related, I looked at sky2.c again and I'm
>> wondering about the copybreak max size in sky2_rx_start:
>>
>>     size = roundup(sky2->netdev->mtu + ETH_HLEN + VLAN_HLEN, 8);
>>
>>           /* Stopping point for hardware truncation */
>>           thresh = (size - 8) / sizeof(u32);
>>
>>           sky2->rx_nfrags = size >> PAGE_SHIFT;
>>           BUG_ON(sky2->rx_nfrags > ARRAY_SIZE(re->frag_addr));
>>
>>           /* Compute residue after pages */
>>           size -= sky2->rx_nfrags << PAGE_SHIFT;
>>
>>           /* Optimize to handle small packets and headers */
>>           if (size < copybreak)
>>                   size = copybreak;
>>           if (size < ETH_HLEN)
>>                   size = ETH_HLEN;
>>
>>
>> Why would increasing size to copybreak be valid here?
>>
>> Guessing a bit as I'm not sure about rx_nfrags, but if I read this
>> correctly, if size is ever less than copybreak it's because there isn't
>> enough space left for anything larger. If so, wouldn't increasing size
>> potentially corrupt something? I'd further guess that the resulting
>> condition manifests sooner (or at least with a more visible effect) when
>> using DMAR.
>>
>> In any event, why "copybreak" as the minimum buffer size? I'd suggest
>> that if it isn't possible to allocate at least MTU + overhead that
>> sky2_rx_start ought to be delayed until there is room.
>>      
> This code is where driver decides how much data will be received in skb
> data area and the remaining data spills over into skb frags.
> Copybreak is the threshold so that packets less than size are copied
> to a new skb.  The code doing the copying there assumes the data is
> totally contained in the skb (not in frags). The size increase there
> is to make sure that assumption is always true.  I suppose you
> could do something perverse like setting copybreak really huge
> and confuse driver, but that is a user error.
>
>    
Ok - but I'm wondering under what circumstances size would be < 
copybreak in the first place after computing the residue. If size ends 
up being unreasonably small, is simply increasing the number to whatever 
copybreak is correct? Assuming my testing is correct, then the crash 
I've been experiencing when using dmar (only) seems related to the value 
of copybreak. I don't think the other use (skb reuse) is the issue (but 
hey, I could have missed something). The crash occurs when copybreak is 
the default of 128, didn't happen when I set copybreak to 1.
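
To see when the residue can actually fall below copybreak, the quoted
arithmetic can be run outside the kernel. The sketch below is a user-space
model, not driver code: it assumes 4 KiB pages (PAGE_SHIFT of 12), uses the
usual ETH_HLEN of 14 and VLAN_HLEN of 4, and the names rx_layout and
compute_rx_layout are made up for illustration.

```c
#define ETH_HLEN   14u  /* Ethernet header length */
#define VLAN_HLEN   4u  /* VLAN tag length */
#define PAGE_SHIFT 12u  /* assumption: 4 KiB pages */

/* Hypothetical result type: how one receive buffer is split. */
struct rx_layout {
	unsigned nfrags;  /* whole pages used as skb fragments */
	unsigned size;    /* bytes kept in the linear skb data area */
};

/* Round x up to the next multiple of 8, like roundup(x, 8). */
static unsigned roundup8(unsigned x)
{
	return (x + 7u) & ~7u;
}

/* Mirrors the size computation quoted from sky2_rx_start. */
static struct rx_layout compute_rx_layout(unsigned mtu, unsigned copybreak)
{
	struct rx_layout l;
	unsigned size = roundup8(mtu + ETH_HLEN + VLAN_HLEN);

	l.nfrags = size >> PAGE_SHIFT;

	/* Residue left over after the whole pages */
	size -= l.nfrags << PAGE_SHIFT;

	/* The two clamps under discussion */
	if (size < copybreak)
		size = copybreak;
	if (size < ETH_HLEN)
		size = ETH_HLEN;

	l.size = size;
	return l;
}
```

Under those assumptions, compute_rx_layout(1500, 128) gives 0 fragments and
a 1520-byte data area, so the clamp does nothing at the standard MTU. It
only takes effect when the rounded frame size lands on or just past a page
multiple, for example a hypothetical MTU of 8174, where the residue is 0
and size is raised to copybreak.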


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-27 16:57                         ` Michael Breuer
@ 2010-01-27 17:45                           ` Stephen Hemminger
  2010-01-27 17:57                             ` Michael Breuer
  2010-01-27 23:54                             ` Hang: 2.6.32.4 sky2/DMAR David Miller
  2010-01-27 17:56                           ` Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync) Stephen Hemminger
  1 sibling, 2 replies; 99+ messages in thread
From: Stephen Hemminger @ 2010-01-27 17:45 UTC (permalink / raw)
  To: Michael Breuer
  Cc: Jarek Poplawski, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Wed, 27 Jan 2010 11:57:35 -0500
Michael Breuer <mbreuer@majjas.com> wrote:

> On 1/27/2010 11:50 AM, Stephen Hemminger wrote:
> > On Wed, 27 Jan 2010 10:34:51 -0500
> > Michael Breuer <mbreuer@majjas.com> wrote:
> >
> >    
> >> On 01/23/2010 06:21 PM, Jarek Poplawski wrote:
> >>      
> >>> On Fri, Jan 22, 2010 at 06:50:21PM -0500, Michael Breuer wrote:
> >>>
> >>>        
> >>>> When the packets were dropped, there was a different sequence in the
> >>>> log - DISCOVER/OFFER repeated. The "normal" is that the sequence
> >>>> appeared correct and complete - DISCOVER/OFFER/REQUEST/ACK - or
> >>>> INFORM/ACK (vs. INFORM repeatedly sans ACK) as the case may be.
> >>>>
> >>>>          
> >>> Anyway, I'd be interested if the switch matters here.
> >>>
> >>> Plus one more test: could you try to load sky2 with the parameter:
> >>> "copybreak=1" (the rest as in any recent test, which gave you dmar
> >>> errors; any switch).
> >>>
> >>> Thanks,
> >>> Jarek P.
> >>>
> >>>        
> >> Ok - now up 80+ hours with copybreak=1. I'm going to redo w/o copybreak
> >> to confirm that I haven't inadvertently fixed something. However, given
> >> that it might be copybreak-related, I looked at sky2.c again and I'm
> >> wondering about the copybreak max size in sky2_rx_start:
> >>
> >>     size = roundup(sky2->netdev->mtu + ETH_HLEN + VLAN_HLEN, 8);
> >>
> >>           /* Stopping point for hardware truncation */
> >>           thresh = (size - 8) / sizeof(u32);
> >>
> >>           sky2->rx_nfrags = size >> PAGE_SHIFT;
> >>           BUG_ON(sky2->rx_nfrags > ARRAY_SIZE(re->frag_addr));
> >>
> >>           /* Compute residue after pages */
> >>           size -= sky2->rx_nfrags << PAGE_SHIFT;
> >>
> >>           /* Optimize to handle small packets and headers */
> >>           if (size < copybreak)
> >>                   size = copybreak;
> >>           if (size < ETH_HLEN)
> >>                   size = ETH_HLEN;
> >>
> >>
> >> Why would increasing size to copybreak be valid here?
> >>
> >> Guessing a bit as I'm not sure about rx_nfrags, but if I read this
> >> correctly, if size is ever less than copybreak it's because there isn't
> >> enough space left for anything larger. If so, wouldn't increasing size
> >> potentially corrupt something? I'd further guess that the resulting
> >> condition manifests sooner (or at least with a more visible effect) when
> >> using DMAR.
> >>
> >> In any event, why "copybreak" as the minimum buffer size? I'd suggest
> >> that if it isn't possible to allocate at least MTU + overhead that
> >> sky2_rx_start ought to be delayed until there is room.
> >>      
> > This code is where driver decides how much data will be received in skb
> > data area and the remaining data spills over into skb frags.
> > Copybreak is the threshold so that packets less than size are copied
> > to a new skb.  The code doing the copying there assumes the data is
> > totally contained in the skb (not in frags). The size increase there
> > is to make sure that assumption is always true.  I suppose you
> > could do something perverse like setting copybreak really huge
> > and confuse driver, but that is a user error.
> >
> >    
> Ok - but I'm wondering under what circumstances size would be < 
> copybreak in the first place after computing the residue. If size ends 
> up being unreasonably small, is simply increasing the number to whatever 
> copybreak is correct? Assuming my testing is correct, then the crash 
> I've been experiencing when using dmar (only) seems related to the value 
> of copybreak. I don't think the other use (skb reuse) is the issue (but 
> hey, I could have missed something). The crash occurs when copybreak is 
> the default of 128, didn't happen when I set copybreak to 1.
> 

Setting it to 1 causes driver to never go through the dma_sync_single/memcpy
path.  Perhaps the code for DMAR doesn't do dma_sync_single_for_cpu
properly, or the value passed to sync_single_for_cpu doesn't account for
all the overhead of padding and/or ether header.
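
A minimal model of that choice, assuming the receive path compares the
frame length against copybreak with a plain less-than test, as the
description above implies. The function name is hypothetical and this is
not the driver's code.

```c
#include <stdbool.h>

/*
 * Hypothetical model: true when a received frame of `length` bytes
 * would be copied into a fresh skb (the sync + memcpy path), false
 * when the original buffer would be passed up instead.
 */
static bool takes_copy_path(unsigned length, unsigned copybreak)
{
	return length < copybreak;
}
```

With copybreak=1 no frame of nonzero length takes the copy path, while with
the default of 128 a 60-byte frame does, which is consistent with the sync
calls dropping out of the picture in the copybreak=1 test.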

-- 

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-27 16:57                         ` Michael Breuer
  2010-01-27 17:45                           ` Stephen Hemminger
@ 2010-01-27 17:56                           ` Stephen Hemminger
  2010-01-27 17:58                             ` Michael Breuer
  2010-01-27 18:08                             ` Michael Breuer
  1 sibling, 2 replies; 99+ messages in thread
From: Stephen Hemminger @ 2010-01-27 17:56 UTC (permalink / raw)
  To: Michael Breuer
  Cc: Jarek Poplawski, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Wed, 27 Jan 2010 11:57:35 -0500
Michael Breuer <mbreuer@majjas.com> wrote:

> On 1/27/2010 11:50 AM, Stephen Hemminger wrote:
> > On Wed, 27 Jan 2010 10:34:51 -0500
> > Michael Breuer <mbreuer@majjas.com> wrote:
> >
> >    
> >> On 01/23/2010 06:21 PM, Jarek Poplawski wrote:
> >>      
> >>> On Fri, Jan 22, 2010 at 06:50:21PM -0500, Michael Breuer wrote:
> >>>
> >>>        
> >>>> When the packets were dropped, there was a different sequence in the
> >>>> log - DISCOVER/OFFER repeated. The "normal" is that the sequence
> >>>> appeared correct and complete - DISCOVER/OFFER/REQUEST/ACK - or
> >>>> INFORM/ACK (vs. INFORM repeatedly sans ACK) as the case may be.
> >>>>
> >>>>          
> >>> Anyway, I'd be interested if the switch matters here.
> >>>
> >>> Plus one more test: could you try to load sky2 with the parameter:
> >>> "copybreak=1" (the rest as in any recent test, which gave you dmar
> >>> errors; any switch).
> >>>
> >>> Thanks,
> >>> Jarek P.
> >>>
> >>>        
> >> Ok - now up 80+ hours with copybreak=1. I'm going to redo w/o copybreak
> >> to confirm that I haven't inadvertently fixed something. However, given
> >> that it might be copybreak-related, I looked at sky2.c again and I'm
> >> wondering about the copybreak max size in sky2_rx_start:
> >>
> >>     size = roundup(sky2->netdev->mtu + ETH_HLEN + VLAN_HLEN, 8);
> >>
> >>           /* Stopping point for hardware truncation */
> >>           thresh = (size - 8) / sizeof(u32);
> >>
> >>           sky2->rx_nfrags = size >> PAGE_SHIFT;
> >>           BUG_ON(sky2->rx_nfrags > ARRAY_SIZE(re->frag_addr));
> >>
> >>           /* Compute residue after pages */
> >>           size -= sky2->rx_nfrags << PAGE_SHIFT;
> >>
> >>           /* Optimize to handle small packets and headers */
> >>           if (size < copybreak)
> >>                   size = copybreak;
> >>           if (size < ETH_HLEN)
> >>                   size = ETH_HLEN;
> >>
> >>
> >> Why would increasing size to copybreak be valid here?
> >>
> >> Guessing a bit as I'm not sure about rx_nfrags, but if I read this
> >> correctly, if size is ever less than copybreak it's because there isn't
> >> enough space left for anything larger. If so, wouldn't increasing size
> >> potentially corrupt something? I'd further guess that the resulting
> >> condition manifests sooner (or at least with a more visible effect) when
> >> using DMAR.
> >>
> >> In any event, why "copybreak" as the minimum buffer size? I'd suggest
> >> that if it isn't possible to allocate at least MTU + overhead that
> >> sky2_rx_start ought to be delayed until there is room.
> >>      
> > This code is where driver decides how much data will be received in skb
> > data area and the remaining data spills over into skb frags.
> > Copybreak is the threshold so that packets less than size are copied
> > to a new skb.  The code doing the copying there assumes the data is
> > totally contained in the skb (not in frags). The size increase there
> > is to make sure that assumption is always true.  I suppose you
> > could do something perverse like setting copybreak really huge
> > and confuse driver, but that is a user error.
> >
> >    
> Ok - but I'm wondering under what circumstances size would be < 
> copybreak in the first place after computing the residue. If size ends 
> up being unreasonably small, is simply increasing the number to whatever 
> copybreak is correct? Assuming my testing is correct, then the crash 
> I've been experiencing when using dmar (only) seems related to the value 
> of copybreak. I don't think the other use (skb reuse) is the issue (but 
> hey, I could have missed something). The crash occurs when copybreak is 
> the default of 128, didn't happen when I set copybreak to 1.

Does this change it? If so, the dma code (not sky2) is buggy and not
rounding up properly.

--- a/drivers/net/sky2.c	2010-01-27 09:46:10.940005248 -0800
+++ b/drivers/net/sky2.c	2010-01-27 09:53:47.141267850 -0800
@@ -2257,13 +2257,16 @@ static struct sk_buff *receive_copy(stru
 
 	skb = netdev_alloc_skb_ip_align(sky2->netdev, length);
 	if (likely(skb)) {
+		unsigned dma_align = dma_get_cache_alignment();
+		unsigned dma_size = ALIGN(length+1, dma_align);
+
 		pci_dma_sync_single_for_cpu(sky2->hw->pdev, re->data_addr,
-					    length, PCI_DMA_FROMDEVICE);
+					    dma_size, PCI_DMA_FROMDEVICE);
 		skb_copy_from_linear_data(re->skb, skb->data, length);
 		skb->ip_summed = re->skb->ip_summed;
 		skb->csum = re->skb->csum;
 		pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
-					       length, PCI_DMA_FROMDEVICE);
+					       dma_size, PCI_DMA_FROMDEVICE);
 		re->skb->ip_summed = CHECKSUM_NONE;
 		skb_put(skb, length);
 	}
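
To see what the ALIGN(length+1, dma_align) expression in the patch above
evaluates to, here is a minimal userspace sketch of the same arithmetic.
The names align_up() and sync_len() are illustrative only, not kernel
functions, and the 64-byte alignment used in the note below is an assumed
value; dma_get_cache_alignment() is architecture dependent.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of a; a must be a power of two.
 * Same arithmetic as the kernel's ALIGN() macro, written as a function. */
static size_t align_up(size_t x, size_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Sync length as computed by the test patch: ALIGN(length + 1, dma_align). */
static size_t sync_len(size_t length, size_t dma_align)
{
	return align_up(length + 1, dma_align);
}
```

With an assumed 64-byte alignment, a 60-byte frame would be synced as 64
bytes, while a frame of exactly 64 bytes would be synced as 128 because of
the "+1" before rounding.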

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-27 17:45                           ` Stephen Hemminger
@ 2010-01-27 17:57                             ` Michael Breuer
  2010-01-27 18:33                               ` Michael Breuer
  2010-01-27 23:54                             ` Hang: 2.6.32.4 sky2/DMAR David Miller
  1 sibling, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2010-01-27 17:57 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jarek Poplawski, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 1/27/2010 12:45 PM, Stephen Hemminger wrote:
> On Wed, 27 Jan 2010 11:57:35 -0500
> Michael Breuer<mbreuer@majjas.com>  wrote:
>
>    
>> On 1/27/2010 11:50 AM, Stephen Hemminger wrote:
>>      
>>> On Wed, 27 Jan 2010 10:34:51 -0500
>>> Michael Breuer<mbreuer@majjas.com>   wrote:
>>>
>>>
>>>        
>>>> On 01/23/2010 06:21 PM, Jarek Poplawski wrote:
>>>>
>>>>          
>>>>> On Fri, Jan 22, 2010 at 06:50:21PM -0500, Michael Breuer wrote:
>>>>>
>>>>>
>>>>>            
>>>>>> When the packets were dropped, there was a different sequence in the
>>>>>> log - DISCOVER/OFFER repeated. The "normal" is that the sequence
>>>>>> appeared correct and complete - DISCOVER/OFFER/REQUEST/ACK - or
>>>>>> INFORM/ACK (vs. INFORM repeatedly sans ACK) as the case may be.
>>>>>>
>>>>>>
>>>>>>              
>>>>> Anyway, I'd be interested if the switch matters here.
>>>>>
>>>>> Plus one more test: could you try to load sky2 with the parameter:
>>>>> "copybreak=1" (the rest as in any recent test, which gave you dmar
>>>>> errors; any switch).
>>>>>
>>>>> Thanks,
>>>>> Jarek P.
>>>>>
>>>>>
>>>>>            
>>>> Ok - now up 80+ hours with copybreak=1. I'm going to redo w/o copybreak
>>>> to confirm that I haven't inadvertently fixed something. However, given
>>>> that it might be copybreak-related, I looked at sky2.c again and I'm
>>>> wondering about the copybreak max size in sky2_rx_start:
>>>>
>>>>      size = roundup(sky2->netdev->mtu + ETH_HLEN + VLAN_HLEN, 8);
>>>>
>>>>            /* Stopping point for hardware truncation */
>>>>            thresh = (size - 8) / sizeof(u32);
>>>>
>>>>            sky2->rx_nfrags = size>>   PAGE_SHIFT;
>>>>            BUG_ON(sky2->rx_nfrags>   ARRAY_SIZE(re->frag_addr));
>>>>
>>>>            /* Compute residue after pages */
>>>>            size -= sky2->rx_nfrags<<   PAGE_SHIFT;
>>>>
>>>>            /* Optimize to handle small packets and headers */
>>>>            if (size<   copybreak)
>>>>                    size = copybreak;
>>>>            if (size<   ETH_HLEN)
>>>>                    size = ETH_HLEN;
>>>>
>>>>
>>>> Why would increasing size to copybreak be valid here?
>>>>
>>>> Guessing a bit as I'm not sure about rx_nfrags, but if I read this
>>>> correctly, if size is ever less than copybreak it's because there isn't
>>>> enough space left for anything larger. If so, wouldn't increasing size
>>>> potentially corrupt something? I'd further guess that the resulting
>>>> condition manifests sooner (or at least with a more visible effect) when
>>>> using DMAR.
>>>>
>>>> In any event, why "copybreak" as the minimum buffer size? I'd suggest
>>>> that if it isn't possible to allocate at least MTU + overhead that
>>>> sky2_rx_start ought to be delayed until there is room.
>>>>
>>>>          
>>> This code is where driver decides how much data will be received in skb
>>> data area and the remaining data spills over into skb frags.
>>> Copybreak is the threshold so that packets less than size are copied
>>> to a new skb.  The code doing the copying there assumes the data is
>>> totally contained in the skb (not in frags). The size increase there
>>> is to make sure that assumption is always true.  I suppose you
>>> could do something perverse like setting copybreak really huge
>>> and confuse driver, but that is a user error.
>>>
>>>
>>>        
>> Ok - but I'm wondering under what circumstances size would be<
>> copybreak in the first place after computing the residue. If size ends
>> up being unreasonably small, is simply increasing the number to whatever
>> copybreak is correct? Assuming my testing is correct, then the crash
>> I've been experiencing when using dmar (only) seems related to the value
>> of copybreak. I don't think the other use (skb reuse) is the issue (but
>> hey, I could have missed something). The crash occurs when copybreak is
>> the default of 128, didn't happen when I set copybreak to 1.
>>
>>      
> Setting it to 1 causes driver to never go through the dma_sync_single/memcpy
> path.  Perhaps the code for DMAR doesn't do dma_sync_single_for_cpu
> properly, or the value passed to sync_single_for_cpu doesn't account for
> all the overhead of padding and/or ether header.
>
>    
Ah - ok... will poke around there... if you have any suggestions, 
diagnostics, whatever, let me know. Also, just an FYI - before rebooting 
with copybreak back to defaults, I tried mtu=9000 again. That hung the 
server immediately - no diagnostic output - system froze until watchdog 
rebooted. Don't know right now if the copybreak had anything to do with 
this, but when I've tried in the past I've had errors on sky2, but never 
crashed the system like this. Only two things different were copybreak 
and the length of time the system had been up. I'll try later with 
copybreak default and copybreak=1 to see if that affects mtu behavior.
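
For reference, the sky2_rx_start arithmetic quoted above can be worked
through in userspace. This is only a sketch that mirrors the quoted lines,
not the driver itself: struct rx_sizing and compute_rx_sizing() are made-up
names, and PAGE_SHIFT is assumed to be 12 (4 KiB pages).

```c
#include <assert.h>

#define ETH_HLEN   14	/* Ethernet header length */
#define VLAN_HLEN   4	/* VLAN tag length */
#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

struct rx_sizing {
	unsigned nfrags;	/* whole pages that go into skb frags */
	unsigned residue;	/* bytes left over after those pages */
	unsigned size;		/* final size of the linear data area */
};

/* Mirrors the arithmetic quoted from sky2_rx_start. */
static struct rx_sizing compute_rx_sizing(unsigned mtu, unsigned copybreak)
{
	struct rx_sizing r;
	/* roundup(mtu + ETH_HLEN + VLAN_HLEN, 8) */
	unsigned size = (mtu + ETH_HLEN + VLAN_HLEN + 7) / 8 * 8;

	r.nfrags = size >> PAGE_SHIFT;
	r.residue = size - (r.nfrags << PAGE_SHIFT);

	size = r.residue;
	if (size < copybreak)
		size = copybreak;
	if (size < ETH_HLEN)
		size = ETH_HLEN;
	r.size = size;

	/* The two clamps can only raise the linear size, never lower it. */
	assert(r.size >= r.residue);
	return r;
}
```

Under those assumptions and the default copybreak of 128, mtu=1500 gives no
frags and a residue of 1520, and mtu=9000 gives 2 frags and a residue of
832, so neither is raised. The residue only drops below copybreak when the
rounded frame size lands just past a page boundary, e.g. mtu=8200 gives a
residue of 32, which is then raised to 128.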

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-27 17:56                           ` Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync) Stephen Hemminger
@ 2010-01-27 17:58                             ` Michael Breuer
  2010-01-27 18:08                             ` Michael Breuer
  1 sibling, 0 replies; 99+ messages in thread
From: Michael Breuer @ 2010-01-27 17:58 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jarek Poplawski, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 1/27/2010 12:56 PM, Stephen Hemminger wrote:
> On Wed, 27 Jan 2010 11:57:35 -0500
> Michael Breuer<mbreuer@majjas.com>  wrote:
>
>    
>> On 1/27/2010 11:50 AM, Stephen Hemminger wrote:
>>      
>>> On Wed, 27 Jan 2010 10:34:51 -0500
>>> Michael Breuer<mbreuer@majjas.com>   wrote:
>>>
>>>
>>>        
>>>> On 01/23/2010 06:21 PM, Jarek Poplawski wrote:
>>>>
>>>>          
>>>>> On Fri, Jan 22, 2010 at 06:50:21PM -0500, Michael Breuer wrote:
>>>>>
>>>>>
>>>>>            
>>>>>> When the packets were dropped, there was a different sequence in the
>>>>>> log - DISCOVER/OFFER repeated. The "normal" is that the sequence
>>>>>> appeared correct and complete - DISCOVER/OFFER/REQUEST/ACK - or
>>>>>> INFORM/ACK (vs. INFORM repeatedly sans ACK) as the case may be.
>>>>>>
>>>>>>
>>>>>>              
>>>>> Anyway, I'd be interested if the switch matters here.
>>>>>
>>>>> Plus one more test: could you try to load sky2 with the parameter:
>>>>> "copybreak=1" (the rest as in any recent test, which gave you dmar
>>>>> errors; any switch).
>>>>>
>>>>> Thanks,
>>>>> Jarek P.
>>>>>
>>>>>
>>>>>            
>>>> Ok - now up 80+ hours with copybreak=1. I'm going to redo w/o copybreak
>>>> to confirm that I haven't inadvertently fixed something. However, given
>>>> that it might be copybreak-related, I looked at sky2.c again and I'm
>>>> wondering about the copybreak max size in sky2_rx_start:
>>>>
>>>>      size = roundup(sky2->netdev->mtu + ETH_HLEN + VLAN_HLEN, 8);
>>>>
>>>>            /* Stopping point for hardware truncation */
>>>>            thresh = (size - 8) / sizeof(u32);
>>>>
>>>>            sky2->rx_nfrags = size>>   PAGE_SHIFT;
>>>>            BUG_ON(sky2->rx_nfrags>   ARRAY_SIZE(re->frag_addr));
>>>>
>>>>            /* Compute residue after pages */
>>>>            size -= sky2->rx_nfrags<<   PAGE_SHIFT;
>>>>
>>>>            /* Optimize to handle small packets and headers */
>>>>            if (size<   copybreak)
>>>>                    size = copybreak;
>>>>            if (size<   ETH_HLEN)
>>>>                    size = ETH_HLEN;
>>>>
>>>>
>>>> Why would increasing size to copybreak be valid here?
>>>>
>>>> Guessing a bit as I'm not sure about rx_nfrags, but if I read this
>>>> correctly, if size is ever less than copybreak it's because there isn't
>>>> enough space left for anything larger. If so, wouldn't increasing size
>>>> potentially corrupt something? I'd further guess that the resulting
>>>> condition manifests sooner (or at least with a more visible effect) when
>>>> using DMAR.
>>>>
>>>> In any event, why "copybreak" as the minimum buffer size? I'd suggest
>>>> that if it isn't possible to allocate at least MTU + overhead that
>>>> sky2_rx_start ought to be delayed until there is room.
>>>>
>>>>          
>>> This code is where driver decides how much data will be received in skb
>>> data area and the remaining data spills over into skb frags.
>>> Copybreak is the threshold so that packets less than size are copied
>>> to a new skb.  The code doing the copying there assumes the data is
>>> totally contained in the skb (not in frags). The size increase there
>>> is to make sure that assumption is always true.  I suppose you
>>> could do something perverse like setting copybreak really huge
>>> and confuse driver, but that is a user error.
>>>
>>>
>>>        
>> Ok - but I'm wondering under what circumstances size would be<
>> copybreak in the first place after computing the residue. If size ends
>> up being unreasonably small, is simply increasing the number to whatever
>> copybreak is correct? Assuming my testing is correct, then the crash
>> I've been experiencing when using dmar (only) seems related to the value
>> of copybreak. I don't think the other use (skb reuse) is the issue (but
>> hey, I could have missed something). The crash occurs when copybreak is
>> the default of 128, didn't happen when I set copybreak to 1.
>>      
> Does this change it? If so, the dma code (not sky2) is buggy and not
> rounding up properly.
>
> --- a/drivers/net/sky2.c	2010-01-27 09:46:10.940005248 -0800
> +++ b/drivers/net/sky2.c	2010-01-27 09:53:47.141267850 -0800
> @@ -2257,13 +2257,16 @@ static struct sk_buff *receive_copy(stru
>
>   	skb = netdev_alloc_skb_ip_align(sky2->netdev, length);
>   	if (likely(skb)) {
> +		unsigned dma_align = dma_get_cache_alignment();
> +		unsigned dma_size = ALIGN(length+1, dma_align);
> +
>   		pci_dma_sync_single_for_cpu(sky2->hw->pdev, re->data_addr,
> -					    length, PCI_DMA_FROMDEVICE);
> +					    dma_size, PCI_DMA_FROMDEVICE);
>   		skb_copy_from_linear_data(re->skb, skb->data, length);
>   		skb->ip_summed = re->skb->ip_summed;
>   		skb->csum = re->skb->csum;
>   		pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
> -					       length, PCI_DMA_FROMDEVICE);
> +					       dma_size, PCI_DMA_FROMDEVICE);
>   		re->skb->ip_summed = CHECKSUM_NONE;
>   		skb_put(skb, length);
>   	}
>    
Ok - will queue this - want to reconfirm that the system still crashes 
w/o this (or copybreak). That should take a few days.

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-27 17:56                           ` Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync) Stephen Hemminger
  2010-01-27 17:58                             ` Michael Breuer
@ 2010-01-27 18:08                             ` Michael Breuer
  2010-01-27 18:45                               ` Michael Breuer
  1 sibling, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2010-01-27 18:08 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jarek Poplawski, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 01/27/2010 12:56 PM, Stephen Hemminger wrote:
> --- a/drivers/net/sky2.c	2010-01-27 09:46:10.940005248 -0800
> +++ b/drivers/net/sky2.c	2010-01-27 09:53:47.141267850 -0800
> @@ -2257,13 +2257,16 @@ static struct sk_buff *receive_copy(stru
>
>   	skb = netdev_alloc_skb_ip_align(sky2->netdev, length);
>   	if (likely(skb)) {
> +		unsigned dma_align = dma_get_cache_alignment();
> +		unsigned dma_size = ALIGN(length+1, dma_align);
> +
>   		pci_dma_sync_single_for_cpu(sky2->hw->pdev, re->data_addr,
> -					    length, PCI_DMA_FROMDEVICE);
> +					    dma_size, PCI_DMA_FROMDEVICE);
>   		skb_copy_from_linear_data(re->skb, skb->data, length);
>   		skb->ip_summed = re->skb->ip_summed;
>   		skb->csum = re->skb->csum;
>   		pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
> -					       length, PCI_DMA_FROMDEVICE);
> +					       dma_size, PCI_DMA_FROMDEVICE);
>   		re->skb->ip_summed = CHECKSUM_NONE;
>   		skb_put(skb, length);
>   	}
>    
This doesn't apply - I'm missing some intermediate patch.

I've got (both in 2.6.32.4 and 2.6.33-rc5) pci_unmap_len(re, data_size) 
vs. "length". I assume that I can just replace the pci_unmap_len with 
dma_size... but perhaps the intermediate change may have affected this 
as well?

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-27 17:57                             ` Michael Breuer
@ 2010-01-27 18:33                               ` Michael Breuer
  0 siblings, 0 replies; 99+ messages in thread
From: Michael Breuer @ 2010-01-27 18:33 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jarek Poplawski, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 01/27/2010 12:57 PM, Michael Breuer wrote:
> On 1/27/2010 12:45 PM, Stephen Hemminger wrote:
>> On Wed, 27 Jan 2010 11:57:35 -0500
>> Michael Breuer<mbreuer@majjas.com>  wrote:
>>
>>
> Ah - ok... will poke around there... if you have any suggestions, 
> diagnostics, whatever, let me know. Also, just an FYI - before 
> rebooting with copybreak back to defaults, I tried mtu=9000 again. 
> That hung the server immediately - no diagnostic output - system froze 
> until watchdog rebooted. Don't know right now if the copybreak had 
> anything to do with this, but when I've tried in the past I've had 
> errors on sky2, but never crashed the system like this. Only two 
> things different were copybreak and the length of time the system had 
> been up. I'll try later with copybreak default and copybreak=1 to see 
> if that affects mtu behavior.
>
FYI - just redid this a few times. Looks like it's how long the system 
was up, not copybreak wrt crash on resetting MTU.

That said, while the system seems OK after resetting the MTU, I do get a 
WARNING from netdev watchdog - same warning regardless of copybreak. 
Setting the mtu back to 1500 generates rx errors after which things 
work. Going back to 9000 again does not generate new errors.

Jan 27 13:21:54 mail kernel: ------------[ cut here ]------------
Jan 27 13:21:54 mail kernel: WARNING: at net/sched/sch_generic.c:261 
dev_watchdog+0xf3/0x164()
Jan 27 13:21:54 mail kernel: Hardware name: System Product Name
Jan 27 13:21:54 mail kernel: NETDEV WATCHDOG: eth0 (sky2): transmit 
queue 0 timed out
Jan 27 13:21:54 mail kernel: Modules linked in: microcode(+) 
ip6table_mangle ip6table_filter ip6_tables ipt_MASQUERADE iptable_nat 
nf_nat iptable_mangle iptable_raw bridge stp appletalk psnap llc nfsd 
lockd nfs_acl auth_rpcgss exportfs hwmon_vid coretemp sunrpc 
acpi_cpufreq sit tunnel4 ipt_LOG nf_conntrack_netbios_ns 
nf_conntrack_ftp nf_conntrack_ipv6 xt_multiport xt_DSCP xt_dscp xt_MARK 
ipv6 dm_multipath kvm_intel kvm snd_hda_codec_analog snd_ens1371 
gameport snd_rawmidi snd_ac97_codec snd_hda_intel snd_hda_codec 
snd_hwdep ac97_bus snd_seq gspca_spca505 gspca_main videodev 
snd_seq_device asus_atk0110 v4l1_compat snd_pcm hwmon 
v4l2_compat_ioctl32 pcspkr i2c_i801 firewire_ohci firewire_core 
crc_itu_t snd_timer snd soundcore wmi snd_page_alloc sky2 iTCO_wdt 
iTCO_vendor_support fbcon tileblit font bitblit softcursor raid456 
async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx 
raid1 ata_generic pata_acpi pata_marvell nouveau ttm drm_kms_helper drm 
agpgart fb i2c_algo_bit cfbcopyarea i2c_core cfb
Jan 27 13:21:54 mail kernel: imgblt cfbfillrect [last unloaded: ip6_tables]
Jan 27 13:21:54 mail kernel: Pid: 0, comm: swapper Tainted: G        W  
2.6.32.4MMAPDMARAF3SKY2PSKBMAYPULL-00912-g914160d-dirty #6
Jan 27 13:21:54 mail kernel: Call Trace:
Jan 27 13:21:54 mail kernel: <IRQ>  [<ffffffff810536ee>] 
warn_slowpath_common+0x7c/0x94
Jan 27 13:21:54 mail kernel: [<ffffffff8105375d>] 
warn_slowpath_fmt+0x41/0x43
Jan 27 13:21:54 mail kernel: [<ffffffff813e3b6b>] ? netif_tx_lock+0x44/0x6c
Jan 27 13:21:54 mail kernel: [<ffffffff813e3cd3>] dev_watchdog+0xf3/0x164
Jan 27 13:21:54 mail kernel: [<ffffffff8106e990>] ? __queue_work+0x3a/0x42
Jan 27 13:21:54 mail kernel: [<ffffffff8106323f>] 
run_timer_softirq+0x1c8/0x270
Jan 27 13:21:54 mail kernel: [<ffffffff8105af0f>] __do_softirq+0xf8/0x1cd
Jan 27 13:21:54 mail kernel: [<ffffffff8107f0ab>] ? 
tick_program_event+0x2a/0x2c
Jan 27 13:21:54 mail kernel: [<ffffffff81012e1c>] call_softirq+0x1c/0x30
Jan 27 13:21:54 mail kernel: [<ffffffff810143a3>] do_softirq+0x4b/0xa6
Jan 27 13:21:54 mail kernel: [<ffffffff8105aaef>] irq_exit+0x4a/0x8c
Jan 27 13:21:54 mail kernel: [<ffffffff81470612>] 
smp_apic_timer_interrupt+0x86/0x94
Jan 27 13:21:54 mail kernel: [<ffffffff810127e3>] 
apic_timer_interrupt+0x13/0x20
Jan 27 13:21:54 mail kernel: <EOI>  [<ffffffff812c729a>] ? 
acpi_idle_enter_bm+0x256/0x28a
Jan 27 13:21:54 mail kernel: [<ffffffff812c7293>] ? 
acpi_idle_enter_bm+0x24f/0x28a
Jan 27 13:21:54 mail kernel: [<ffffffff813a6c3c>] ? 
cpuidle_idle_call+0x9e/0xfa
Jan 27 13:21:54 mail kernel: [<ffffffff81010c90>] ? cpu_idle+0xb4/0xf6
Jan 27 13:21:54 mail kernel: [<ffffffff81465ba5>] ? 
start_secondary+0x201/0x242
Jan 27 13:21:54 mail kernel: ---[ end trace 57f7151f6a5def07 ]---
Jan 27 13:21:54 mail kernel: sky2 eth0: tx timeout
Jan 27 13:21:54 mail kernel: sky2 eth0: transmit ring 51 .. 10 report=51 
done=51
Jan 27 13:21:54 mail kernel: sky2 eth0: disabling interface
Jan 27 13:21:54 mail kernel: sky2 eth0: enabling interface



* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-27 18:08                             ` Michael Breuer
@ 2010-01-27 18:45                               ` Michael Breuer
  2010-01-27 19:23                                 ` Jarek Poplawski
  2010-01-28 15:32                                 ` Michael Breuer
  0 siblings, 2 replies; 99+ messages in thread
From: Michael Breuer @ 2010-01-27 18:45 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jarek Poplawski, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 1/27/2010 1:08 PM, Michael Breuer wrote:
> On 01/27/2010 12:56 PM, Stephen Hemminger wrote:
>> --- a/drivers/net/sky2.c    2010-01-27 09:46:10.940005248 -0800
>> +++ b/drivers/net/sky2.c    2010-01-27 09:53:47.141267850 -0800
>> @@ -2257,13 +2257,16 @@ static struct sk_buff *receive_copy(stru
>>
>>       skb = netdev_alloc_skb_ip_align(sky2->netdev, length);
>>       if (likely(skb)) {
>> +        unsigned dma_align = dma_get_cache_alignment();
>> +        unsigned dma_size = ALIGN(length+1, dma_align);
>> +
>>           pci_dma_sync_single_for_cpu(sky2->hw->pdev, re->data_addr,
>> -                        length, PCI_DMA_FROMDEVICE);
>> +                        dma_size, PCI_DMA_FROMDEVICE);
>>           skb_copy_from_linear_data(re->skb, skb->data, length);
>>           skb->ip_summed = re->skb->ip_summed;
>>           skb->csum = re->skb->csum;
>>           pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
>> -                           length, PCI_DMA_FROMDEVICE);
>> +                           dma_size, PCI_DMA_FROMDEVICE);
>>           re->skb->ip_summed = CHECKSUM_NONE;
>>           skb_put(skb, length);
>>       }
> This doesn't apply - I'm missing some intermediate patch.
>
> I've got (both in 2.6.32.4 and 2.6.33-rc5: pci_unmap_len(re, 
> data_size) vs., "length." I assume that I can just replace the 
> pci_unmap_len with dma_size... but perhaps the intermediate change may 
> have affected this as well?
>
Never mind - that was from one of the earlier patches I had been trying 
out. Will try the above patch after re-establishing that the system still 
crashes without copybreak=1.


* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-27 18:45                               ` Michael Breuer
@ 2010-01-27 19:23                                 ` Jarek Poplawski
  2010-01-27 19:32                                   ` Jarek Poplawski
  2010-01-28 15:32                                 ` Michael Breuer
  1 sibling, 1 reply; 99+ messages in thread
From: Jarek Poplawski @ 2010-01-27 19:23 UTC (permalink / raw)
  To: Michael Breuer
  Cc: Stephen Hemminger, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Wed, Jan 27, 2010 at 01:45:27PM -0500, Michael Breuer wrote:
> On 1/27/2010 1:08 PM, Michael Breuer wrote:
> >On 01/27/2010 12:56 PM, Stephen Hemminger wrote:
> >>--- a/drivers/net/sky2.c    2010-01-27 09:46:10.940005248 -0800
> >>+++ b/drivers/net/sky2.c    2010-01-27 09:53:47.141267850 -0800
> >>@@ -2257,13 +2257,16 @@ static struct sk_buff *receive_copy(stru
> >>
> >>      skb = netdev_alloc_skb_ip_align(sky2->netdev, length);
> >>      if (likely(skb)) {
> >>+        unsigned dma_align = dma_get_cache_alignment();
> >>+        unsigned dma_size = ALIGN(length+1, dma_align);
> >>+
> >>          pci_dma_sync_single_for_cpu(sky2->hw->pdev, re->data_addr,
> >>-                        length, PCI_DMA_FROMDEVICE);
> >>+                        dma_size, PCI_DMA_FROMDEVICE);
> >>          skb_copy_from_linear_data(re->skb, skb->data, length);
> >>          skb->ip_summed = re->skb->ip_summed;
> >>          skb->csum = re->skb->csum;
> >>          pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
> >>-                           length, PCI_DMA_FROMDEVICE);
> >>+                           dma_size, PCI_DMA_FROMDEVICE);
> >>          re->skb->ip_summed = CHECKSUM_NONE;
> >>          skb_put(skb, length);
> >>      }
> >This doesn't apply - I'm missing some intermediate patch.
> >
> >I've got (both in 2.6.32.4 and 2.6.33-rc5: pci_unmap_len(re,
> >data_size) vs., "length." I assume that I can just replace the
> >pci_unmap_len with dma_size... but perhaps the intermediate change
> >may have affected this as well?
> >
> Never mind - that was from one of the earlier patches I had been
> trying out. will try the above patch after reestablishing that the
> system still crashes without copybreak=1.
> 

Stephen, I'm not sure this patch can show much after the patch with
"legal" dma_size == re->data_addr didn't help. It looks like David
was right: dma_sync can't affect dmar, because it doesn't use it at
all.

Then I'd rather suggest testing whether using copybreak more often, e.g.
with copybreak=1000 or even more, can trigger these errors faster.

Jarek P.
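
A rough way to estimate how much more often the copy path would run with a
larger copybreak is sketched below. It assumes, as described earlier in the
thread, that a frame takes the copy path when its length is below
copybreak; takes_copy_path() and count_copied() are hypothetical helpers,
not driver functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when a frame of `length` bytes would be copied into a new skb
 * (the dma_sync_single/memcpy path) rather than passed up as-is. */
static bool takes_copy_path(unsigned length, unsigned copybreak)
{
	return length < copybreak;
}

/* Count how many of the n sample frame lengths would take the copy path. */
static unsigned count_copied(const unsigned *lengths, size_t n,
			     unsigned copybreak)
{
	unsigned copied = 0;
	size_t i;

	assert(lengths != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (takes_copy_path(lengths[i], copybreak))
			copied++;
	return copied;
}
```

For a sample of frame lengths {60, 128, 342, 1514}, copybreak=1 copies
none, the default of 128 copies one, and copybreak=1000 copies three.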

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-27 19:23                                 ` Jarek Poplawski
@ 2010-01-27 19:32                                   ` Jarek Poplawski
  0 siblings, 0 replies; 99+ messages in thread
From: Jarek Poplawski @ 2010-01-27 19:32 UTC (permalink / raw)
  To: Michael Breuer
  Cc: Stephen Hemminger, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Wed, Jan 27, 2010 at 08:23:12PM +0100, Jarek Poplawski wrote:
> Stephen, I'm not sure this patch can show much after the patch with
> "legal" dma_size == re->data_addr didn't help. It looks like David

Of course: "dma_size == re->data_size".

Jarek P.

* Re: Hang: 2.6.32.4 sky2/DMAR
  2010-01-27 17:45                           ` Stephen Hemminger
  2010-01-27 17:57                             ` Michael Breuer
@ 2010-01-27 23:54                             ` David Miller
  1 sibling, 0 replies; 99+ messages in thread
From: David Miller @ 2010-01-27 23:54 UTC (permalink / raw)
  To: shemminger
  Cc: mbreuer, jarkao2, akpm, flyboy, linux-kernel, netdev, mchan,
	pcnet32, romieu, mcarlson

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 27 Jan 2010 09:45:31 -0800

> Setting it to 1 causes driver to never go through the dma_sync_single/memcpy
> path.  Perhaps the code for DMAR doesn't do dma_sync_single_for_cpu
> properly, or the value passed to sync_single_for_cpu doesn't account for
> all the overhead of padding and/or ether header.

For the millionth time...

DMAR does not implement any of the DMA SYNC operations, they are set
to NULL in the dma ops vector, and thus are NOPs.
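
The pattern being described can be illustrated with a toy ops vector. This
is a hedged userspace sketch, not the kernel's actual structure or wrapper:
struct toy_dma_ops, toy_sync_for_cpu() and counting_sync() are invented for
the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down ops vector; the real one has many more members. */
struct toy_dma_ops {
	void (*sync_single_for_cpu)(void *buf, size_t len);
};

static unsigned sync_calls;	/* how often a backend hook actually ran */

/* Example backend hook that only records that it was called. */
static void counting_sync(void *buf, size_t len)
{
	(void)buf;
	(void)len;
	sync_calls++;
}

/* Wrapper: call the backend hook if one is set, otherwise do nothing. */
static void toy_sync_for_cpu(const struct toy_dma_ops *ops, void *buf,
			     size_t len)
{
	assert(ops != NULL);
	if (ops->sync_single_for_cpu)
		ops->sync_single_for_cpu(buf, len);
}
```

With a dispatch like this, a backend that leaves the hook NULL never sees
the length argument at all, so changing the length passed to the sync call
cannot change that backend's behaviour.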

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-27 18:45                               ` Michael Breuer
  2010-01-27 19:23                                 ` Jarek Poplawski
@ 2010-01-28 15:32                                 ` Michael Breuer
  2010-01-28 16:43                                   ` Michael Breuer
  1 sibling, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2010-01-28 15:32 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jarek Poplawski, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 01/27/2010 01:45 PM, Michael Breuer wrote:
> On 1/27/2010 1:08 PM, Michael Breuer wrote:
>>
>>
>> I've got (both in 2.6.32.4 and 2.6.33-rc5: pci_unmap_len(re, 
>> data_size) vs., "length." I assume that I can just replace the 
>> pci_unmap_len with dma_size... but perhaps the intermediate change 
>> may have affected this as well?
>>
> Never mind - that was from one of the earlier patches I had been 
> trying out. will try the above patch after reestablishing that the 
> system still crashes without copybreak=1.
Just FYI - still crashes with default copybreak.  Didn't get the netdev 
watchdog this time - just DMAR and then HW watchdog reboot (see below).

So what's known to be required to cause this crash:

1) sky2 @ 1Gb
2) High sustained RX load (> 40MBps)
3) Uptime (I can't cause this to happen just after boot).
4) DMAR enabled (doesn't crash w/o DMAR).
5) copybreak != 1

What might be required but is unproven:
1) cifs traffic (I've only seen this when the high traffic was due to a 
Win7 box doing backup). I've tried but have been unable to recreate by 
just copying large files. Backups done from a Mac OS laptop don't 
trigger the issue even though that machine is also connecting with CIFS 
(TimeMachine works better that way).
2) DHCP traffic. There has always been some sort of DHCP exchange in the 
log before the first indication of a problem (DMAR).
3) Total throughput since boot. Don't know about this - however the uptime 
component before the latest crash was the shortest yet. In preparation I 
moved a bunch of large files around on the Windows box to ensure a 
larger than normal backup run. I also ran manually before going to bed 
(then moved the files around again). Didn't crash when I was watching - 
but did overnight. Total uptime before this crash was only about 6 
hours. Previously (with less backup data) the system didn't crash until 
24-36 hours.

Observations:

Copybreak: I did play for an hour or so yesterday with copybreak=1000. 
Ran traffic, etc. No crash, but throughput was lower and the system was 
clearly working way harder than normal. Given the whine of the fans I'm 
not keen on leaving the system in that state for any extended period of 
time.

MTU: Increasing the MTU to 9000 yesterday after the system had been up 
for some time (copybreak=1) crashed the system immediately. Subsequently 
I have been able to change the mtu without crashes (although the driver 
does end up in some sort of state that requires a restart after lowering 
the mtu). I suspect that over time something is being corrupted, 
resulting in the crash when changing mtu. Whatever is becoming corrupted 
is probably related to the other crash as well. That suggests to me that 
copybreak=1 is preventing or delaying the manifestation of the 
underlying issue but is unrelated to the source of corruption.

[no messages in the prior three minutes - there was a dhcp exchange 
(request/ack) at 06:02:27]
Jan 28 06:05:58 mail kernel: DRHD: handling fault status reg 2
Jan 28 06:05:58 mail kernel: DMAR:[DMA Read] Request device [06:00.0] 
fault addr ffdd06bfe000
Jan 28 06:05:58 mail kernel: DMAR:[fault reason 06] PTE Read access is 
not set
Jan 28 06:05:58 mail kernel: sky2 0000:06:00.0: error interrupt 
status=0x80000000
Jan 28 06:05:58 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
[No further messages until restart at 06:09:46.]


* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-28 15:32                                 ` Michael Breuer
@ 2010-01-28 16:43                                   ` Michael Breuer
  2010-01-28 17:08                                     ` Stephen Hemminger
  0 siblings, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2010-01-28 16:43 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jarek Poplawski, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 01/28/2010 10:32 AM, Michael Breuer wrote:
> On 01/27/2010 01:45 PM, Michael Breuer wrote:
>> On 1/27/2010 1:08 PM, Michael Breuer wrote:
>>>
>>>
>>> I've got (both in 2.6.32.4 and 2.6.33-rc5: pci_unmap_len(re, 
>>> data_size) vs., "length." I assume that I can just replace the 
>>> pci_unmap_len with dma_size... but perhaps the intermediate change 
>>> may have affected this as well?
>>>
>> Never mind - that was from one of the earlier patches I had been 
>> trying out. will try the above patch after reestablishing that the 
>> system still crashes without copybreak=1.
> Just FYI - still crashes with default copybreak.  Didn't get the 
> netdev watchdog this time - just DMAR and then HW watchdog reboot (see 
> below).
>
> So what's known to be required to cause this crash:
>
> 1) sky2 @ 1Gb
> 2) High sustained RX load (> 40MBps)
> 3) Uptime (I can't cause this to happen just after boot).
> 4) DMAR enabled (doesn't crash w/o DMAR).
> 5) copybreak != 1
>
> What might be required but is unproven:
> 1) cifs traffic (I've only seen this when the high traffic was due to 
> a Win7 box doing backup). I've tried but have been unable to recreate 
> by just copying large files. Backups done from a Mac OS laptop don't 
> trigger the issue even though that machine is also connecting with 
> CIFS (TimeMachine works better that way).
> 2) DHCP traffic. There has always been some sort of DHCP exchange in 
> the log before the first indication of a problem (DMAR).
> 3) Total throughput since boot. I don't know about this - however the uptime 
> component before the latest crash was the shortest yet. In preparation 
> I moved a bunch of large files around on the Windows box to ensure a 
> larger than normal backup run. I also ran manually before going to bed 
> (then moved the files around again). Didn't crash when I was watching 
> - but did overnight. Total uptime before this crash was only about 6 
> hours. Previously (with less backup data) the system didn't crash 
> until 24-36 hours.
>
> Observations:
>
> Copybreak: I did play for an hour or so yesterday with copybreak=1000. 
> Ran traffic, etc. No crash, but throughput was lower and the system 
> was clearly working way harder than normal. Given the whine of the 
> fans I'm not keen on leaving the system in that state for any extended 
> period of time.
>
> MTU: Increasing the MTU to 9000 yesterday after the system had been up 
> for some time (copybreak=1) crashed the system immediately. 
> Subsequently I have been able to change the mtu without crashes 
> (although the driver does end up in some sort of state that requires a 
> restart after lowering the mtu). I suspect that over time something is 
> being corrupted, resulting in the crash when changing mtu. Whatever is 
> becoming corrupted is probably related to the other crash as well. 
> That suggests to me that copybreak=1 is preventing or delaying the 
> manifestation of the underlying issue but is unrelated to the source 
> of corruption.
>
> [no messages in the prior three minutes - there was a dhcp exchange 
> (request/ack) at 06:02:27]
> Jan 28 06:05:58 mail kernel: DRHD: handling fault status reg 2
> Jan 28 06:05:58 mail kernel: DMAR:[DMA Read] Request device [06:00.0] 
> fault addr ffdd06bfe000
> Jan 28 06:05:58 mail kernel: DMAR:[fault reason 06] PTE Read access is 
> not set
> Jan 28 06:05:58 mail kernel: sky2 0000:06:00.0: error interrupt 
> status=0x80000000
> Jan 28 06:05:58 mail kernel: sky2 0000:06:00.0: PCI hardware error 
> (0x2010)
> [No further messages until restart at 06:09:46.]
>
Update: I played with dma-debug. Was being disabled due to lack of 
memory. I forced it back on while pumping traffic through and got this:
Jan 28 11:39:30 mail kernel: ------------[ cut here ]------------
Jan 28 11:39:30 mail kernel: WARNING: at lib/dma-debug.c:902 
check_sync+0xc1/0x43f()
Jan 28 11:39:30 mail kernel: Hardware name: System Product Name
Jan 28 11:39:30 mail kernel: sky2 0000:06:00.0: DMA-API: device driver 
tries to sync DMA memory it has not allocated [device 
address=0x0000ffff4fe37022] [size=1520 bytes]
Jan 28 11:39:30 mail kernel: Modules linked in: microcode(+) 
ip6table_filter ip6table_mangle ip6_tables iptable_raw iptable_mangle 
ipt_MASQUERADE iptable_nat nf_nat bridge stp appletalk psnap llc nfsd 
lockd nfs_acl auth_rpcgss exportfs hwmon_vid coretemp sunrpc 
acpi_cpufreq sit tunnel4 ipt_LOG nf_conntrack_netbios_ns 
nf_conntrack_ftp xt_DSCP xt_dscp xt_MARK nf_conntrack_ipv6 xt_multiport 
ipv6 dm_multipath kvm_intel kvm snd_hda_codec_analog snd_ens1371 
gameport snd_rawmidi gspca_spca505 snd_hda_intel snd_ac97_codec 
gspca_main snd_hda_codec videodev snd_hwdep snd_seq v4l1_compat i2c_i801 
pcspkr ac97_bus v4l2_compat_ioctl32 snd_seq_device asus_atk0110 hwmon 
snd_pcm firewire_ohci firewire_core crc_itu_t sky2 snd_timer snd 
iTCO_wdt iTCO_vendor_support wmi soundcore snd_page_alloc fbcon tileblit 
font bitblit softcursor raid456 async_raid6_recov async_pq raid6_pq 
async_xor xor async_memcpy async_tx raid1 ata_generic pata_acpi 
pata_marvell nouveau ttm drm_kms_helper drm agpgart fb i2c_algo_bit 
cfbcopyarea i2c_core cfb
Jan 28 11:39:30 mail kernel: imgblt cfbfillrect [last unloaded: ip6_tables]
Jan 28 11:39:30 mail kernel: Pid: 5327, comm: bash Tainted: G        W  
2.6.32.4MMAPDMARAF3SKY2PSKBMAYPULL-00912-g914160d-dirty #6
Jan 28 11:39:30 mail kernel: Call Trace:
Jan 28 11:39:30 mail kernel: <IRQ>  [<ffffffff810536ee>] 
warn_slowpath_common+0x7c/0x94
Jan 28 11:39:30 mail kernel: [<ffffffff8105375d>] 
warn_slowpath_fmt+0x41/0x43
Jan 28 11:39:30 mail kernel: [<ffffffff8127b891>] check_sync+0xc1/0x43f
Jan 28 11:39:30 mail kernel: [<ffffffff8146c51a>] ? 
_spin_unlock_irqrestore+0x29/0x41
Jan 28 11:39:30 mail kernel: [<ffffffff813cac10>] ? 
__netdev_alloc_skb+0x34/0x50
Jan 28 11:39:30 mail kernel: [<ffffffff8127bf62>] 
debug_dma_sync_single_for_cpu+0x42/0x44
Jan 28 11:39:30 mail kernel: [<ffffffff813cac10>] ? 
__netdev_alloc_skb+0x34/0x50
Jan 28 11:39:30 mail kernel: [<ffffffffa019aee8>] sky2_poll+0x4d5/0xb06 
[sky2]
Jan 28 11:39:30 mail kernel: [<ffffffff81044840>] ? 
enqueue_entity+0x26c/0x279
Jan 28 11:39:30 mail kernel: [<ffffffff8107decf>] ? 
clockevents_program_event+0x7a/0x83
Jan 28 11:39:30 mail kernel: [<ffffffff813d18ae>] net_rx_action+0xb5/0x1f3
Jan 28 11:39:30 mail kernel: [<ffffffff8105af0f>] __do_softirq+0xf8/0x1cd
Jan 28 11:39:30 mail kernel: [<ffffffff810a3006>] ? 
handle_IRQ_event+0x119/0x12b
Jan 28 11:39:30 mail kernel: [<ffffffff81012e1c>] call_softirq+0x1c/0x30
Jan 28 11:39:30 mail kernel: [<ffffffff810143a3>] do_softirq+0x4b/0xa6
Jan 28 11:39:30 mail kernel: [<ffffffff8105aaef>] irq_exit+0x4a/0x8c
Jan 28 11:39:30 mail kernel: [<ffffffff81470575>] do_IRQ+0xa5/0xbc
Jan 28 11:39:30 mail kernel: [<ffffffff81012613>] ret_from_intr+0x0/0x16
Jan 28 11:39:30 mail kernel: <EOI>
Jan 28 11:39:30 mail kernel: ---[ end trace 57f7151f6a5def07 ]---
Jan 28 11:39:30 mail kernel: DMA-API: debugging out of memory - disabling




^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-28 16:43                                   ` Michael Breuer
@ 2010-01-28 17:08                                     ` Stephen Hemminger
  2010-01-28 18:46                                       ` Michael Breuer
  0 siblings, 1 reply; 99+ messages in thread
From: Stephen Hemminger @ 2010-01-28 17:08 UTC (permalink / raw)
  To: Michael Breuer
  Cc: Jarek Poplawski, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Thu, 28 Jan 2010 11:43:16 -0500
Michael Breuer <mbreuer@majjas.com> wrote:

> Update: I played with dma-debug. Was being disabled due to lack of 
> memory. I forced it back on while pumping traffic through and got this:
> Jan 28 11:39:30 mail kernel: ------------[ cut here ]------------
> Jan 28 11:39:30 mail kernel: WARNING: at lib/dma-debug.c:902 
> check_sync+0xc1/0x43f()
> Jan 28 11:39:30 mail kernel: Hardware name: System Product Name
> Jan 28 11:39:30 mail kernel: sky2 0000:06:00.0: DMA-API: device driver 
> tries to sync DMA memory it has not allocated [device 

This test in dma-debug is bogus, because the debug code matches
DMA mappings based on address and size, and it is perfectly valid to
sync a length less than the mapped size. This is the patch I sent
earlier; it isn't 100% correct, but it will let you keep testing.
....................................


This should fix the dma-debug API code (and documentation), to
avoid false positives when sync is done on a partial map.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/Documentation/DMA-API.txt	2010-01-20 15:17:01.390143729 -0800
+++ b/Documentation/DMA-API.txt	2010-01-20 15:18:48.967875255 -0800
@@ -377,9 +377,10 @@ void
 pci_dma_sync_sg(struct pci_dev *hwdev, struct scatterlist *sg,
 		       int nelems, int direction)
 
-Synchronise a single contiguous or scatter/gather mapping.  All the
-parameters must be the same as those passed into the single mapping
-API.
+Synchronise a single contiguous or scatter/gather mapping.  The
+device and handle must be the same as those passed into the single mapping
+API. The size can be less than the original mapping if only part
+of the mapping needs to be accessed.
 
 Notes:  You must do this:
 
--- a/lib/dma-debug.c	2010-01-20 15:22:55.919519883 -0800
+++ b/lib/dma-debug.c	2010-01-20 15:26:31.648895638 -0800
@@ -285,11 +285,9 @@ static struct dma_debug_entry *hash_buck
 	}
 
 	/*
-	 * If we have multiple matches but no perfect-fit, just return
-	 * NULL.
+	 * If we have multiple matches but no perfect-fit
+	 * return best value and let caller deal with it.
 	 */
-	ret = (matches == 1) ? ret : NULL;
-
 	return ret;
 }
 

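[Editorial sketch, not kernel code.] The rule the patch description argues
for (device and handle must match, length may be shorter than the mapping)
can be illustrated with a toy lookup. Everything named toy_* below is
hypothetical and invented for this illustration; it is not the layout or
logic of lib/dma-debug.c, only a minimal model of "a partial sync of a
known mapping is valid".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy record of one active mapping; NOT the kernel's struct dma_debug_entry. */
struct toy_mapping {
	int      dev_id;	/* stands in for the device pointer */
	uint64_t dev_addr;	/* handle returned by the map call */
	size_t   size;		/* length that was originally mapped */
};

/*
 * Return true if a sync of `len` bytes at `dev_addr` is covered by an
 * active mapping. Device and handle must match exactly; the length only
 * has to fit inside the mapped size.
 */
bool toy_sync_is_valid(const struct toy_mapping *maps, size_t nmaps,
		       int dev_id, uint64_t dev_addr, size_t len)
{
	size_t i;

	assert(maps != NULL || nmaps == 0);

	for (i = 0; i < nmaps; i++) {
		if (maps[i].dev_id != dev_id || maps[i].dev_addr != dev_addr)
			continue;
		if (len <= maps[i].size)
			return true;	/* full or partial sync of this mapping */
	}
	return false;	/* unknown handle, or sync longer than the mapping */
}
```

With a 1536-byte mapping recorded, a 1520-byte sync at the same handle
passes this check, while a sync on a handle that was never mapped still
fails.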
^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-28 17:08                                     ` Stephen Hemminger
@ 2010-01-28 18:46                                       ` Michael Breuer
  2010-01-28 22:34                                         ` Jarek Poplawski
  0 siblings, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2010-01-28 18:46 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jarek Poplawski, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 1/28/2010 12:08 PM, Stephen Hemminger wrote:
> On Thu, 28 Jan 2010 11:43:16 -0500
> Michael Breuer<mbreuer@majjas.com>  wrote:
>
>    
>> Update: I played with dma-debug. Was being disabled due to lack of
>> memory. I forced it back on while pumping traffic through and got this:
>> Jan 28 11:39:30 mail kernel: ------------[ cut here ]------------
>> Jan 28 11:39:30 mail kernel: WARNING: at lib/dma-debug.c:902
>> check_sync+0xc1/0x43f()
>> Jan 28 11:39:30 mail kernel: Hardware name: System Product Name
>> Jan 28 11:39:30 mail kernel: sky2 0000:06:00.0: DMA-API: device driver
>> tries to sync DMA memory it has not allocated [device
>>      
> This test in dma-debug is bogus, because the debug code matches
> DMA mappings based on address and size, and it is perfectly valid to
> sync a length less than the mapped size. This is the patch I sent
> earlier; it isn't 100% correct, but it will let you keep testing.
> ....................................
>
>
> This should fix the dma-debug API code (and documentation), to
> avoid false positives when sync is done on a partial map.
>
> Signed-off-by: Stephen Hemminger<shemminger@vyatta.com>
>
> --- a/Documentation/DMA-API.txt	2010-01-20 15:17:01.390143729 -0800
> +++ b/Documentation/DMA-API.txt	2010-01-20 15:18:48.967875255 -0800
> @@ -377,9 +377,10 @@ void
>   pci_dma_sync_sg(struct pci_dev *hwdev, struct scatterlist *sg,
>   		       int nelems, int direction)
>
> -Synchronise a single contiguous or scatter/gather mapping.  All the
> -parameters must be the same as those passed into the single mapping
> -API.
> +Synchronise a single contiguous or scatter/gather mapping.  The
> +device and handle must be the same as those passed into the single mapping
> +API. The size can be less than the original mapping if only part
> +of the mapping needs to be accessed.
>
>   Notes:  You must do this:
>
> --- a/lib/dma-debug.c	2010-01-20 15:22:55.919519883 -0800
> +++ b/lib/dma-debug.c	2010-01-20 15:26:31.648895638 -0800
> @@ -285,11 +285,9 @@ static struct dma_debug_entry *hash_buck
>   	}
>
>   	/*
> -	 * If we have multiple matches but no perfect-fit, just return
> -	 * NULL.
> +	 * If we have multiple matches but no perfect-fit
> +	 * return best value and let caller deal with it.
>   	 */
> -	ret = (matches == 1) ? ret : NULL;
> -
>   	return ret;
>   }
>
>    
Ok - applied. Noise gone... however I'm not sure whether I'll be able to 
keep dma-debug going long enough to catch anything. num_free_entries 
keeps dropping... looks like entries are not freed. I'm running with a 
huge number for now & sky2 as the driver filter. Is there a reason that 
entries wouldn't be unmapped, or is dma-debug.c just not processing the 
unmap correctly?

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-28 18:46                                       ` Michael Breuer
@ 2010-01-28 22:34                                         ` Jarek Poplawski
  2010-01-28 22:43                                           ` Michael Breuer
  0 siblings, 1 reply; 99+ messages in thread
From: Jarek Poplawski @ 2010-01-28 22:34 UTC (permalink / raw)
  To: Michael Breuer
  Cc: Stephen Hemminger, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Thu, Jan 28, 2010 at 01:46:17PM -0500, Michael Breuer wrote:
> On 1/28/2010 12:08 PM, Stephen Hemminger wrote:
> >--- a/lib/dma-debug.c	2010-01-20 15:22:55.919519883 -0800
> >+++ b/lib/dma-debug.c	2010-01-20 15:26:31.648895638 -0800
> >@@ -285,11 +285,9 @@ static struct dma_debug_entry *hash_buck
> >  	}
> >
> >  	/*
> >-	 * If we have multiple matches but no perfect-fit, just return
> >-	 * NULL.
> >+	 * If we have multiple matches but no perfect-fit
> >+	 * return best value and let caller deal with it.
> >  	 */
> >-	ret = (matches == 1) ? ret : NULL;
> >-
> >  	return ret;
> >  }
> >
> Ok - applied. Noise gone... however I'm not sure whether I'll be
> able to keep dma-debug going long enough to catch anything.
> num_free_entries keeps dropping... looks like entries are not freed.
> I'm running with a huge number for now & sky2 as the driver filter.
> Is there a reason that entries wouldn't be unmapped, or is
> dma-debug.c just not processing the unmap correctly?

Do you mean it's after this patch or earlier too? I think you might
use my sky2/receive_copy/pci_unmap_len patch instead to get rid of
this warning.

Btw, since 1000 was too much, maybe you could try copybreak=256 yet,
plus additional ping or some other source of shorter packets. And how
about trying this new switch?

Jarek P.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-28 22:34                                         ` Jarek Poplawski
@ 2010-01-28 22:43                                           ` Michael Breuer
  2010-01-28 22:56                                             ` Jarek Poplawski
  0 siblings, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2010-01-28 22:43 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Stephen Hemminger, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 1/28/2010 5:34 PM, Jarek Poplawski wrote:
> On Thu, Jan 28, 2010 at 01:46:17PM -0500, Michael Breuer wrote:
>    
>> On 1/28/2010 12:08 PM, Stephen Hemminger wrote:
>>      
>>> --- a/lib/dma-debug.c	2010-01-20 15:22:55.919519883 -0800
>>> +++ b/lib/dma-debug.c	2010-01-20 15:26:31.648895638 -0800
>>> @@ -285,11 +285,9 @@ static struct dma_debug_entry *hash_buck
>>>   	}
>>>
>>>   	/*
>>> -	 * If we have multiple matches but no perfect-fit, just return
>>> -	 * NULL.
>>> +	 * If we have multiple matches but no perfect-fit
>>> +	 * return best value and let caller deal with it.
>>>   	 */
>>> -	ret = (matches == 1) ? ret : NULL;
>>> -
>>>   	return ret;
>>>   }
>>>
>>>        
>> Ok - applied. Noise gone... however I'm not sure whether I'll be
>> able to keep dma-debug going long enough to catch anything.
>> num_free_entries keeps dropping... looks like entries are not freed.
>> I'm running with a huge number for now & sky2 as the driver filter.
>> Is there a reason that entries wouldn't be unmapped, or is
>> dma-debug.c just not processing the unmap correctly?
>>      
> Do you mean it's after this patch or earlier too? I think you might
> use my sky2/receive_copy/pci_unmap_len patch instead to get rid of
> this warning.
>
> Btw, since 1000 was too much, maybe you could try copybreak=256 yet,
> plus additional ping or some other source of shorter packets. And how
> about trying this new switch?
>
> Jarek P.
>    
This is with the pci_unmap_len patch as well as the dma-debug patch. I'm 
not getting any warnings - but dma-debug num-free-entries drops until 
zero and then debugging is disabled. I started with 8,000,000 about 
three hours ago - without load I'm already down to less than half that. 
Again - only looking at sky2. Looks like the dma-debug hash table is 
reducing one entry for every packet, but never increasing the 
num_entries (no unmap perhaps).

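[Editorial sketch, not kernel code.] The symptom described above (the free
count only ever going down, then debugging switching itself off) is what a
fixed pool of tracking entries would show if map calls consume an entry but
the matching unmap never finds it. The thread does not establish that this
is the actual cause; the model below only illustrates the "no unmap perhaps"
guess. All toy_* names are hypothetical, and the size mismatch used to make
the lookup fail is an assumption chosen for the demonstration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_POOL_SIZE 8	/* tiny on purpose */

struct toy_entry {
	bool     in_use;
	uint64_t dev_addr;
	size_t   size;
};

struct toy_pool {
	struct toy_entry entries[TOY_POOL_SIZE];
	unsigned int     num_free;	/* plays the role of num_free_entries */
	bool             disabled;	/* set once a map finds no free entry */
};

void toy_pool_init(struct toy_pool *p)
{
	size_t i;

	assert(p != NULL);
	for (i = 0; i < TOY_POOL_SIZE; i++)
		p->entries[i].in_use = false;
	p->num_free = TOY_POOL_SIZE;
	p->disabled = false;
}

/* Record one mapping; with no free entry left, tracking switches itself off. */
void toy_track_map(struct toy_pool *p, uint64_t dev_addr, size_t size)
{
	size_t i;

	if (p->disabled)
		return;
	for (i = 0; i < TOY_POOL_SIZE; i++) {
		if (p->entries[i].in_use)
			continue;
		p->entries[i].in_use = true;
		p->entries[i].dev_addr = dev_addr;
		p->entries[i].size = size;
		p->num_free--;
		return;
	}
	p->disabled = true;
}

/* Give an entry back only when the lookup finds a matching record. */
bool toy_track_unmap(struct toy_pool *p, uint64_t dev_addr, size_t size)
{
	size_t i;

	for (i = 0; i < TOY_POOL_SIZE; i++) {
		struct toy_entry *e = &p->entries[i];

		if (e->in_use && e->dev_addr == dev_addr && e->size == size) {
			e->in_use = false;
			p->num_free++;
			return true;
		}
	}
	return false;	/* record stays allocated: one entry leaked */
}
```

In this model a balanced map/unmap pair leaves num_free unchanged, while
every unmap whose lookup misses costs one entry for good, so the pool drains
at one entry per packet.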
^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-28 22:43                                           ` Michael Breuer
@ 2010-01-28 22:56                                             ` Jarek Poplawski
  2010-01-28 22:59                                               ` Michael Breuer
  0 siblings, 1 reply; 99+ messages in thread
From: Jarek Poplawski @ 2010-01-28 22:56 UTC (permalink / raw)
  To: Michael Breuer
  Cc: Stephen Hemminger, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Thu, Jan 28, 2010 at 05:43:34PM -0500, Michael Breuer wrote:
> This is with the pci_unmap_len patch as well as the dma-debug patch.
> I'm not getting any warnings - but dma-debug num-free-entries drops
> until zero and then debugging is disabled. I started with 8,000,000
> about three hours ago - without load I'm already down to less than
> half that. Again - only looking at sky2. Looks like the dma-debug
> hash table is reducing one entry for every packet, but never
> increasing the num_entries (no unmap perhaps).

OK, then, until there is some new fix you'd better turn it off.

Jarek P.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)
  2010-01-28 22:56                                             ` Jarek Poplawski
@ 2010-01-28 22:59                                               ` Michael Breuer
  2010-01-28 23:36                                                 ` [PATCH] sky2: receive dma mapping error handling Stephen Hemminger
  0 siblings, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2010-01-28 22:59 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Stephen Hemminger, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 1/28/2010 5:56 PM, Jarek Poplawski wrote:
> On Thu, Jan 28, 2010 at 05:43:34PM -0500, Michael Breuer wrote:
>    
>> This is with the pci_unmap_len patch as well as the dma-debug patch.
>> I'm not getting any warnings - but dma-debug num-free-entries drops
>> until zero and then debugging is disabled. I started with 8,000,000
>> about three hours ago - without load I'm already down to less than
>> half that. Again - only looking at sky2. Looks like the dma-debug
>> hash table is reducing one entry for every packet, but never
>> increasing the num_entries (no unmap perhaps).
>>      
> OK, then, until there is some new fix you'd better turn it off.
>
> Jarek P.
>    
:( any further suggestions for helping track this down?

^ permalink raw reply	[flat|nested] 99+ messages in thread

* [PATCH] sky2:  receive dma mapping error handling
  2010-01-28 22:59                                               ` Michael Breuer
@ 2010-01-28 23:36                                                 ` Stephen Hemminger
  2010-01-29  0:05                                                   ` Michael Breuer
                                                                     ` (3 more replies)
  0 siblings, 4 replies; 99+ messages in thread
From: Stephen Hemminger @ 2010-01-28 23:36 UTC (permalink / raw)
  To: Michael Breuer
  Cc: Jarek Poplawski, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

Please try this patch (and only this patch) on 2.6.33-rc5[*], without
any of the other patches that did not make it upstream, because those
confuse things too much.

The code that checks for DMA mapping errors on receive buffers did
not handle errors correctly.  I doubt you have these errors, but if you
did, it would explain the problems.  The code has to be a little
tricky and build the mapping for the new rx buffer before releasing the
old one; that way, if the new mapping fails, the old one can be reused.

If it works for you, I will resubmit with a Signed-off-by.

---
* If you want to use DMA debugging, then you will also need the match patch.


--- a/drivers/net/sky2.c	2010-01-28 15:09:11.387649670 -0800
+++ b/drivers/net/sky2.c	2010-01-28 15:31:26.969181566 -0800
@@ -1106,18 +1106,39 @@ static int sky2_rx_map_skb(struct pci_de
 	int i;
 
 	re->data_addr = pci_map_single(pdev, skb->data, size, PCI_DMA_FROMDEVICE);
-	if (unlikely(pci_dma_mapping_error(pdev, re->data_addr)))
-		return -EIO;
+	if (pci_dma_mapping_error(pdev, re->data_addr))
+		goto mapping_error;
 
 	pci_unmap_len_set(re, data_size, size);
 
-	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
-		re->frag_addr[i] = pci_map_page(pdev,
-						skb_shinfo(skb)->frags[i].page,
-						skb_shinfo(skb)->frags[i].page_offset,
-						skb_shinfo(skb)->frags[i].size,
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+
+		re->frag_addr[i] = pci_map_page(pdev, frag->page,
+						frag->page_offset,
+						frag->size,
 						PCI_DMA_FROMDEVICE);
+
+		if (pci_dma_mapping_error(pdev, re->frag_addr[i]))
+			goto map_page_error;
+	}
 	return 0;
+
+map_page_error:
+	while (--i >= 0) {
+		pci_unmap_page(pdev, re->frag_addr[i],
+			       skb_shinfo(skb)->frags[i].size,
+			       PCI_DMA_FROMDEVICE);
+	}
+
+	pci_unmap_single(pdev, re->data_addr, pci_unmap_len(re, data_size),
+			 PCI_DMA_FROMDEVICE);
+
+mapping_error:
+	if (net_ratelimit())
+		dev_warn(&pdev->dev, "%s: rx mapping error\n",
+			 skb->dev->name);
+	return -EIO;
 }
 
 static void sky2_rx_unmap_skb(struct pci_dev *pdev, struct rx_ring_info *re)
@@ -2308,30 +2329,32 @@ static struct sk_buff *receive_new(struc
 				   struct rx_ring_info *re,
 				   unsigned int length)
 {
-	struct sk_buff *skb, *nskb;
+	struct sk_buff *skb;
+	struct rx_ring_info nre;
 	unsigned hdr_space = sky2->rx_data_size;
 
-	/* Don't be tricky about reusing pages (yet) */
-	nskb = sky2_rx_alloc(sky2);
-	if (unlikely(!nskb))
-		return NULL;
+	nre.skb = sky2_rx_alloc(sky2);
+	if (unlikely(!nre.skb))
+		goto nobuf;
+
+	if (sky2_rx_map_skb(sky2->hw->pdev, &nre, hdr_space))
+		goto nomap;
 
 	skb = re->skb;
 	sky2_rx_unmap_skb(sky2->hw->pdev, re);
-
 	prefetch(skb->data);
-	re->skb = nskb;
-	if (sky2_rx_map_skb(sky2->hw->pdev, re, hdr_space)) {
-		dev_kfree_skb(nskb);
-		re->skb = skb;
-		return NULL;
-	}
+	*re = nre;
 
 	if (skb_shinfo(skb)->nr_frags)
 		skb_put_frags(skb, hdr_space, length);
 	else
 		skb_put(skb, length);
 	return skb;
+
+nomap:
+	dev_kfree_skb(nre.skb);
+nobuf:
+	return NULL;
 }
 
 /*

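[Editorial sketch, not driver code.] The two ideas in this patch (undo any
partial mapping on failure, and prepare the replacement buffer completely
before touching the ring slot) can be modelled outside the kernel. The
toy_* names, the fault-injection counter and the live-mapping counter are
all hypothetical and exist only for this illustration; none of them is
sky2 or PCI DMA API code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_FRAGS 4

/* Toy ring slot, loosely playing the role of a receive ring entry. */
struct toy_slot {
	int  buf_id;			/* buffer currently owned by the slot */
	bool data_mapped;
	bool frag_mapped[TOY_MAX_FRAGS];
	int  nr_frags;
};

int toy_fail_countdown;	/* N > 0: the Nth map call from now fails */
int toy_live_mappings;	/* mappings currently outstanding (leak check) */

bool toy_map_one(void)
{
	if (toy_fail_countdown > 0 && --toy_fail_countdown == 0)
		return false;
	toy_live_mappings++;
	return true;
}

void toy_unmap_one(void)
{
	toy_live_mappings--;
}

/* Map head and fragments; on failure release whatever was mapped so far. */
int toy_map_slot(struct toy_slot *s)
{
	int i;

	assert(s->nr_frags <= TOY_MAX_FRAGS);

	if (!toy_map_one())
		return -1;		/* nothing mapped yet, nothing to undo */
	s->data_mapped = true;

	for (i = 0; i < s->nr_frags; i++) {
		if (!toy_map_one())
			goto unwind;
		s->frag_mapped[i] = true;
	}
	return 0;

unwind:
	while (--i >= 0) {		/* fragments 0..i-1 were mapped */
		toy_unmap_one();
		s->frag_mapped[i] = false;
	}
	toy_unmap_one();		/* then the head mapping */
	s->data_mapped = false;
	return -1;
}

void toy_unmap_slot(struct toy_slot *s)
{
	int i;

	for (i = 0; i < s->nr_frags; i++) {
		toy_unmap_one();
		s->frag_mapped[i] = false;
	}
	toy_unmap_one();
	s->data_mapped = false;
}

/*
 * Build the replacement in a local slot first. Returns the old buffer id
 * on success, or -1 with *ring left untouched if the replacement could
 * not be mapped.
 */
int toy_receive_new(struct toy_slot *ring, int new_buf_id)
{
	struct toy_slot fresh = { .buf_id = new_buf_id,
				  .nr_frags = ring->nr_frags };
	int old_id;

	if (toy_map_slot(&fresh) != 0)
		return -1;

	old_id = ring->buf_id;
	toy_unmap_slot(ring);
	*ring = fresh;
	return old_id;
}
```

The point of the local slot is that a failed mapping never leaves the ring
entry half updated: the old buffer and its mappings stay in place and can
be handed back to the hardware.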



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-01-28 23:36                                                 ` [PATCH] sky2: receive dma mapping error handling Stephen Hemminger
@ 2010-01-29  0:05                                                   ` Michael Breuer
  2010-01-30 16:30                                                   ` Michael Breuer
                                                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 99+ messages in thread
From: Michael Breuer @ 2010-01-29  0:05 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jarek Poplawski, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 1/28/2010 6:36 PM, Stephen Hemminger wrote:
> Please try this patch (and only this patch) on 2.6.33-rc5[*], without
> any of the other patches that did not make it upstream, because those
> confuse things too much.
>
> The code that checks for DMA mapping errors on receive buffers did
> not handle errors correctly.  I doubt you have these errors, but if you
> did, it would explain the problems.  The code has to be a little
> tricky and build the mapping for the new rx buffer before releasing the
> old one; that way, if the new mapping fails, the old one can be reused.
>
> If it works for you, I will resubmit with a Signed-off-by.
>
> ---
> * If you want to use DMA debugging, then you will also need the match patch.
>    
Ok - I'll also be running with the recent sched fork vs. hotplug vs. 
cpuset namespaces patch (commit 
fabf318e5e4bda0aca2b0d617b191884fda62703) from tip. Without that I get 
an rcu hang.

My plan then is to run with your patch, the rcu patch & the dma debug 
patch, but disable dma debug for now and see if the problem recurs. If 
it works, I'll know in a couple of days. If not, perhaps sooner :(.


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-01-28 23:36                                                 ` [PATCH] sky2: receive dma mapping error handling Stephen Hemminger
  2010-01-29  0:05                                                   ` Michael Breuer
@ 2010-01-30 16:30                                                   ` Michael Breuer
  2010-01-30 16:31                                                   ` Michael Breuer
  2010-02-02 22:44                                                   ` Andi Kleen
  3 siblings, 0 replies; 99+ messages in thread
From: Michael Breuer @ 2010-01-30 16:30 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jarek Poplawski, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 1/28/2010 6:36 PM, Stephen Hemminger wrote:
> Please try this patch (and only this patch) on 2.6.33-rc5[*], without
> any of the other patches that did not make it upstream, because those
> confuse things too much.
>
> The code that checks for DMA mapping errors on receive buffers did
> not handle errors correctly.  I doubt you have these errors, but if you
> did, it would explain the problems.  The code has to be a little
> tricky and build the mapping for the new rx buffer before releasing the
> old one; that way, if the new mapping fails, the old one can be reused.
>
> If it works for you, I will resubmit with a Signed-off-by.
>    
Nope - tx crash again. This time the system stayed up (but hosed) for a 
few hours. When I tried to recover eth0, the system crashed.
Brief summary of events (log extract below):

System start Jan 28 19:29
Everything seemed good (load and all) until 17:13:11 the following day 
when I got rx errors:

Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x6230010 
length 1518
Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x7f40010 
length 1518
Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x8180010 
length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x7f40010 
length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x6230010 
length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x8180010 
length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x6230010 
length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x7f40010 
length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x8180010 
length 1518
Jan 29 17:13:14 mail kernel: sky2 eth0: rx error, status 0x5f60010 
length 1518

The system continued running normally after this until this morning (Jan 
30) at 05:44:55:
Jan 30 05:44:55 mail kernel: DRHD: handling fault status reg 2
Jan 30 05:44:55 mail kernel: DMAR:[DMA Read] Request device [06:00.0] 
fault addr ffc4331ff000
Jan 30 05:44:55 mail kernel: DMAR:[fault reason 06] PTE Read access is 
not set
Jan 30 05:44:55 mail kernel: net_ratelimit: 2 callbacks suppressed
Jan 30 05:44:55 mail kernel: sky2 0000:06:00.0: error interrupt 
status=0xc0000000
Jan 30 05:44:55 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
Jan 30 05:45:01 mail kernel: ------------[ cut here ]------------
Jan 30 05:45:01 mail kernel: WARNING: at net/sched/sch_generic.c:255 
dev_watchdog+0xf3/0x161()
Jan 30 05:45:01 mail kernel: Hardware name: System Product Name
Jan 30 05:45:01 mail kernel: NETDEV WATCHDOG: eth0 (sky2): transmit 
queue 0 timed out
Jan 30 05:45:01 mail kernel: Modules linked in: iptable_raw 
iptable_mangle ipt_MASQUERADE iptable_nat nf_nat cpufreq_stats 
ip6table_filter ip6table_mangle ip6_tables bridge stp appletalk psnap 
llc nfsd lockd nfs_acl auth_rpcgss exportfs hwmon_vid coretemp sunrpc 
acpi_cpufreq sit tunnel4 ipt_LOG nf_conntrack_netbios_ns 
nf_conntrack_ftp xt_DSCP xt_dscp xt_MARK nf_conntrack_ipv6 xt_multiport 
ipv6 dm_multipath kvm_intel kvm snd_hda_codec_analog snd_hda_intel 
snd_ens1371 gameport snd_hda_codec snd_rawmidi snd_ac97_codec 
gspca_spca505 ac97_bus gspca_main snd_hwdep videodev snd_seq 
snd_seq_device v4l1_compat snd_pcm v4l2_compat_ioctl32 snd_timer snd 
soundcore snd_page_alloc firewire_ohci pcspkr i2c_i801 firewire_core wmi 
asus_atk0110 crc_itu_t sky2 hwmon iTCO_wdt iTCO_vendor_support fbcon 
tileblit font bitblit softcursor raid456 async_raid6_recov async_pq 
raid6_pq async_xor xor async_memcpy async_tx raid1 ata_generic pata_acpi 
pata_marvell nouveau ttm drm_kms_helper drm agpgart fb i2c_algo_bit 
cfbcopyarea i2c_core cf
Jan 30 05:45:01 mail kernel: bimgblt cfbfillrect [last unloaded: nf_nat]
Jan 30 05:45:01 mail kernel: Pid: 0, comm: swapper Tainted: G        W  
2.6.33-rc5WITHMMAPNODMARFORKTIPSKY2DMAMAP-00283-gd4d37bd-dirty #1
Jan 30 05:45:01 mail kernel: Call Trace:
Jan 30 05:45:01 mail kernel: <IRQ>  [<ffffffff8104a03d>] 
warn_slowpath_common+0x7c/0x94
Jan 30 05:45:01 mail kernel: [<ffffffff8104a0ac>] 
warn_slowpath_fmt+0x41/0x43
Jan 30 05:45:01 mail kernel: [<ffffffff813d2f43>] ? netif_tx_lock+0x44/0x6c
Jan 30 05:45:01 mail kernel: [<ffffffff813d30ab>] dev_watchdog+0xf3/0x161
Jan 30 05:45:01 mail kernel: [<ffffffff8106a31f>] ? 
sched_clock_cpu+0x44/0xce
Jan 30 05:45:01 mail kernel: [<ffffffff8105761a>] 
run_timer_softirq+0x1c3/0x26b
Jan 30 05:45:01 mail kernel: [<ffffffff8105060c>] __do_softirq+0xf8/0x1cd
Jan 30 05:45:01 mail kernel: [<ffffffff8107192b>] ? 
tick_program_event+0x2a/0x2c
Jan 30 05:45:01 mail kernel: [<ffffffff8100ab1c>] call_softirq+0x1c/0x30
Jan 30 05:45:01 mail kernel: [<ffffffff8100c2b3>] do_softirq+0x4b/0xa3
Jan 30 05:45:01 mail kernel: [<ffffffff810501f8>] irq_exit+0x4a/0x8c
Jan 30 05:45:01 mail kernel: [<ffffffff81461859>] 
smp_apic_timer_interrupt+0x86/0x94
Jan 30 05:45:01 mail kernel: [<ffffffff8100a5d3>] 
apic_timer_interrupt+0x13/0x20
Jan 30 05:45:01 mail kernel: <EOI>  [<ffffffff812afbd4>] ? 
acpi_idle_enter_bm+0x256/0x28a
Jan 30 05:45:01 mail kernel: [<ffffffff812afbcd>] ? 
acpi_idle_enter_bm+0x24f/0x28a
Jan 30 05:45:01 mail kernel: [<ffffffff8139574c>] 
cpuidle_idle_call+0x9e/0xfa
Jan 30 05:45:01 mail kernel: [<ffffffff81008c05>] cpu_idle+0xb4/0xf6
Jan 30 05:45:01 mail kernel: [<ffffffff81455d48>] 
start_secondary+0x201/0x242
Jan 30 05:45:01 mail kernel: ---[ end trace 57f7151f6a5def07 ]---
Jan 30 05:45:01 mail kernel: sky2 eth0: tx timeout
Jan 30 05:45:01 mail kernel: sky2 eth0: transmit ring 14 .. 102 
report=14 done=14
Jan 30 05:45:01 mail kernel: sky2 eth0: disabling interface
Jan 30 05:45:01 mail kernel: sky2 eth0: enabling interface

This down/up continued for several hours until I intervened at about 10:05.

I saw that there was no eth0 connectivity, eth1 was ok. It appeared that 
eth0 was receiving traffic but unable to send. arpwatch was reporting 
bogons, DHCP showed many DISCOVER/OFFER pairs, no REQUEST/ACK. Pings to 
any system failed; arp showed incomplete for anything hanging off of 
eth0. arping also failed.
I manually stopped and started eth0 (ifconfig) and reset iptables 
(although eth0 has no filters).

As I started looking at logs, the system hung and rebooted. I'm up now 
with dma debug enabled, however as with 2.6.32.4 num_entries is dropping 
and I don't think that dma debug will remain enabled long enough to 
catch a crash.

So, as I see things, there are two issues here: 1) the TX hang post DMAR 
error and 2) the inability to recover the interface and subsequent 
system instability.





^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-01-28 23:36                                                 ` [PATCH] sky2: receive dma mapping error handling Stephen Hemminger
  2010-01-29  0:05                                                   ` Michael Breuer
  2010-01-30 16:30                                                   ` Michael Breuer
@ 2010-01-30 16:31                                                   ` Michael Breuer
  2010-01-31  0:34                                                     ` Jarek Poplawski
  2010-02-02 22:44                                                   ` Andi Kleen
  3 siblings, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2010-01-30 16:31 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jarek Poplawski, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 01/28/2010 06:36 PM, Stephen Hemminger wrote:
> Please try this patch (and only this patch), on 2.6.33-rc5[*];
> none of the other patches that did not make it upstream because that
> confuses things too much.
>
> The code that checks for DMA mapping errors on receive buffers would
> not handle errors correctly.  I doubt you have these errors, but if you
> did then it would explain the problems.  The code has to be a little
> tricky and build mapping for new rx buffer before releasing old one,
> that way if new mapping fails, the old one can be reused.
>
> If it works for you, I will resubmit with signed-off.
>
> -
>
Nope - tx crash again. This time the system stayed up (but hosed) for a 
few hours. When I tried to recover eth0 the system then crashed.

Brief summary of events (log extract below):

System start Jan 28 19:29
Everything seemed good (load and all) until 17:13:11 the following day 
when I got rx errors:

Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x6230010 length 1518
Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x7f40010 length 1518
Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x8180010 length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x7f40010 length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x6230010 length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x8180010 length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x6230010 length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x7f40010 length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x8180010 length 1518
Jan 29 17:13:14 mail kernel: sky2 eth0: rx error, status 0x5f60010 length 1518

The system continued running normally after this until this morning (Jan 
30) at 05:44:55:
Jan 30 05:44:55 mail kernel: DRHD: handling fault status reg 2
Jan 30 05:44:55 mail kernel: DMAR:[DMA Read] Request device [06:00.0] fault addr ffc4331ff000
Jan 30 05:44:55 mail kernel: DMAR:[fault reason 06] PTE Read access is not set
Jan 30 05:44:55 mail kernel: net_ratelimit: 2 callbacks suppressed
Jan 30 05:44:55 mail kernel: sky2 0000:06:00.0: error interrupt status=0xc0000000
Jan 30 05:44:55 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
Jan 30 05:45:01 mail kernel: ------------[ cut here ]------------
Jan 30 05:45:01 mail kernel: WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xf3/0x161()
Jan 30 05:45:01 mail kernel: Hardware name: System Product Name
Jan 30 05:45:01 mail kernel: NETDEV WATCHDOG: eth0 (sky2): transmit queue 0 timed out
Jan 30 05:45:01 mail kernel: Modules linked in: iptable_raw 
iptable_mangle ipt_MASQUERADE iptable_nat nf_nat cpufreq_stats 
ip6table_filter ip6table_mangle ip6_tables bridge stp appletalk psnap 
llc nfsd lockd nfs_acl auth_rpcgss exportfs hwmon_vid coretemp sunrpc 
acpi_cpufreq sit tunnel4 ipt_LOG nf_conntrack_netbios_ns 
nf_conntrack_ftp xt_DSCP xt_dscp xt_MARK nf_conntrack_ipv6 xt_multiport 
ipv6 dm_multipath kvm_intel kvm snd_hda_codec_analog snd_hda_intel 
snd_ens1371 gameport snd_hda_codec snd_rawmidi snd_ac97_codec 
gspca_spca505 ac97_bus gspca_main snd_hwdep videodev snd_seq 
snd_seq_device v4l1_compat snd_pcm v4l2_compat_ioctl32 snd_timer snd 
soundcore snd_page_alloc firewire_ohci pcspkr i2c_i801 firewire_core wmi 
asus_atk0110 crc_itu_t sky2 hwmon iTCO_wdt iTCO_vendor_support fbcon 
tileblit font bitblit softcursor raid456 async_raid6_recov async_pq 
raid6_pq async_xor xor async_memcpy async_tx raid1 ata_generic pata_acpi 
pata_marvell nouveau ttm drm_kms_helper drm agpgart fb i2c_algo_bit 
cfbcopyarea i2c_core cf
Jan 30 05:45:01 mail kernel: bimgblt cfbfillrect [last unloaded: nf_nat]
Jan 30 05:45:01 mail kernel: Pid: 0, comm: swapper Tainted: G        W 2.6.33-rc5WITHMMAPNODMARFORKTIPSKY2DMAMAP-00283-gd4d37bd-dirty #1
Jan 30 05:45:01 mail kernel: Call Trace:
Jan 30 05:45:01 mail kernel: <IRQ>  [<ffffffff8104a03d>] warn_slowpath_common+0x7c/0x94
Jan 30 05:45:01 mail kernel: [<ffffffff8104a0ac>] warn_slowpath_fmt+0x41/0x43
Jan 30 05:45:01 mail kernel: [<ffffffff813d2f43>] ? netif_tx_lock+0x44/0x6c
Jan 30 05:45:01 mail kernel: [<ffffffff813d30ab>] dev_watchdog+0xf3/0x161
Jan 30 05:45:01 mail kernel: [<ffffffff8106a31f>] ? sched_clock_cpu+0x44/0xce
Jan 30 05:45:01 mail kernel: [<ffffffff8105761a>] run_timer_softirq+0x1c3/0x26b
Jan 30 05:45:01 mail kernel: [<ffffffff8105060c>] __do_softirq+0xf8/0x1cd
Jan 30 05:45:01 mail kernel: [<ffffffff8107192b>] ? tick_program_event+0x2a/0x2c
Jan 30 05:45:01 mail kernel: [<ffffffff8100ab1c>] call_softirq+0x1c/0x30
Jan 30 05:45:01 mail kernel: [<ffffffff8100c2b3>] do_softirq+0x4b/0xa3
Jan 30 05:45:01 mail kernel: [<ffffffff810501f8>] irq_exit+0x4a/0x8c
Jan 30 05:45:01 mail kernel: [<ffffffff81461859>] smp_apic_timer_interrupt+0x86/0x94
Jan 30 05:45:01 mail kernel: [<ffffffff8100a5d3>] apic_timer_interrupt+0x13/0x20
Jan 30 05:45:01 mail kernel: <EOI>  [<ffffffff812afbd4>] ? acpi_idle_enter_bm+0x256/0x28a
Jan 30 05:45:01 mail kernel: [<ffffffff812afbcd>] ? acpi_idle_enter_bm+0x24f/0x28a
Jan 30 05:45:01 mail kernel: [<ffffffff8139574c>] cpuidle_idle_call+0x9e/0xfa
Jan 30 05:45:01 mail kernel: [<ffffffff81008c05>] cpu_idle+0xb4/0xf6
Jan 30 05:45:01 mail kernel: [<ffffffff81455d48>] start_secondary+0x201/0x242
Jan 30 05:45:01 mail kernel: ---[ end trace 57f7151f6a5def07 ]---
Jan 30 05:45:01 mail kernel: sky2 eth0: tx timeout
Jan 30 05:45:01 mail kernel: sky2 eth0: transmit ring 14 .. 102 report=14 done=14
Jan 30 05:45:01 mail kernel: sky2 eth0: disabling interface
Jan 30 05:45:01 mail kernel: sky2 eth0: enabling interface

This down/up continued for several hours until I intervened at about 10:05.

I saw that there was no eth0 connectivity; eth1 was ok. It appeared that
eth0 was receiving traffic but unable to send. arpwatch was reporting 
bogons, DHCP showed many DISCOVER/OFFER pairs, no REQUEST/ACK. Pings to 
any system failed; arp showed incomplete for anything hanging off of 
eth0. arping also failed.
I manually stopped and started eth0 (ifconfig) and reset iptables 
(although eth0 has no filters).

As I started looking at logs, the system hung and rebooted. I'm up now 
with dma debug enabled, however as with 2.6.32.4 num_entries is dropping 
and I don't think that dma debug will remain enabled long enough to 
catch a crash.

So, as I see things, there are two issues here: 1) the TX hang post DMAR 
error and 2) the inability to recover the interface and subsequent 
system instability.




^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-01-30 16:31                                                   ` Michael Breuer
@ 2010-01-31  0:34                                                     ` Jarek Poplawski
  2010-01-31  4:17                                                       ` Michael Breuer
  2010-01-31  4:55                                                       ` Michael Breuer
  0 siblings, 2 replies; 99+ messages in thread
From: Jarek Poplawski @ 2010-01-31  0:34 UTC (permalink / raw)
  To: Michael Breuer
  Cc: Stephen Hemminger, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Sat, Jan 30, 2010 at 11:31:48AM -0500, Michael Breuer wrote:
> On 01/28/2010 06:36 PM, Stephen Hemminger wrote:
> >Please try this patch (and only this patch), on 2.6.33-rc5[*];
> >none of the other patches that did not make it upstream because that
> >confuses things too much.
> >
> >The code that checks for DMA mapping errors on receive buffers would
> >not handle errors correctly.  I doubt you have these errors, but if you
> >did then it would explain the problems.  The code has to be a little
> >tricky and build mapping for new rx buffer before releasing old one,
> >that way if new mapping fails, the old one can be reused.
> >
> >If it works for you, I will resubmit with signed-off.
> >
> >-
> >
> Nope - tx crash again. This time the system stayed up (but hosed)
> for a few hours. When I tried to recover eth0 the system then
> crashed.
> 
> Brief summary of events (log extract below):
> 
> System start Jan 28 19:29
> Everything seemed good (load and all) until 17:13:11 the following
> day when I got rx errors:
> 
> Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x6230010
> length 1518
> Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x7f40010
> length 1518

These are length errors, but status shows more than 1518, e.g. 2036
here, unless I miss something. Please, don't use jumbo frames in your
network until we fully debug it for regular frames (Stephen admitted
sky2 jumbo might be broken).
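
The lengths mentioned above can be read straight out of the logged status
words. The sketch below assumes the GMAC receive status layout as I recall
it from sky2.h (frame length in bits 30..16, "too long packet" flag in
bit 4); the macro and function names here are illustrative, not the
driver's own code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed status word layout (recalled from sky2.h, not taken from this thread). */
#define RX_STATUS_LEN_SHIFT 16
#define RX_STATUS_LEN_MASK  0x7fffu    /* bits 30..16: received frame length */
#define RX_STATUS_LONG_ERR  (1u << 4)  /* "too long packet" error flag */

/* Frame length the MAC reported in the status word. */
unsigned int rx_status_len(uint32_t status)
{
	return (status >> RX_STATUS_LEN_SHIFT) & RX_STATUS_LEN_MASK;
}

/* Non-zero when the "too long" error bit is set. */
int rx_status_too_long(uint32_t status)
{
	return (status & RX_STATUS_LONG_ERR) != 0;
}

/* Print the decoded fields for one status value. */
void rx_status_print(uint32_t status)
{
	printf("status 0x%07x: len=%u too_long=%d\n", (unsigned int)status,
	       rx_status_len(status), rx_status_too_long(status));
}

/* Self-check against the two status values quoted above. */
void rx_status_check_samples(void)
{
	assert(rx_status_len(0x6230010u) == 1571);
	assert(rx_status_len(0x7f40010u) == 2036);
	assert(rx_status_too_long(0x7f40010u));
}
```

Under that assumed layout both quoted values decode to frames longer than
1518 bytes with the "too long" bit set, which is where the 2036 figure
comes from.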

...
> As I started looking at logs, the system hung and rebooted. I'm up
> now with dma debug enabled, however as with 2.6.32.4 num_entries is
> dropping and I don't think that dma debug will remain enabled long
> enough to catch a crash.

Could you try the patch below? It may reveal other users of
dma-debug entries.

Jarek P.
---

 lib/dma-debug.c |   52 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 51 insertions(+), 1 deletions(-)

diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index 7d2f0b3..e2dcc9c 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -310,6 +310,53 @@ static void hash_bucket_del(struct dma_debug_entry *entry)
 	list_del(&entry->list);
 }
 
+struct dma_debug_dev {
+	struct device *dev;
+	unsigned int cnt;
+};
+
+#define DMA_DEBUG_DEVS 100
+static struct dma_debug_dev dma_debug_devs[DMA_DEBUG_DEVS];
+
+static void debug_dma_dump_devs(void)
+{
+	int idx, i;
+
+	memset(dma_debug_devs, 0, sizeof(struct dma_debug_dev) * DMA_DEBUG_DEVS);
+
+	for (idx = 0; idx < HASH_SIZE; idx++) {
+		struct hash_bucket *bucket = &dma_entry_hash[idx];
+		struct dma_debug_entry *entry;
+		unsigned long flags;
+
+		spin_lock_irqsave(&bucket->lock, flags);
+
+		list_for_each_entry(entry, &bucket->list, list) {
+			for (i = 0; i < DMA_DEBUG_DEVS; i++) {
+				struct device *dev = dma_debug_devs[i].dev;
+
+				if (!dev || dev == entry->dev) {
+					dma_debug_devs[i].dev = entry->dev;
+					dma_debug_devs[i].cnt++;
+					break;
+				}
+			}
+		}
+
+		spin_unlock_irqrestore(&bucket->lock, flags);
+	}
+
+	for (i = 0; i < DMA_DEBUG_DEVS; i++) {
+		struct device *dev = dma_debug_devs[i].dev;
+
+		if (!dev)
+			break;
+
+		pr_info("DMA-API: %s: entries: %d\n", dev_name(dev),
+			dma_debug_devs[i].cnt);
+	}
+}
+
 /*
  * Dump mapping entries for debugging purposes
  */
@@ -363,8 +410,11 @@ static struct dma_debug_entry *__dma_entry_alloc(void)
 	memset(entry, 0, sizeof(*entry));
 
 	num_free_entries -= 1;
-	if (num_free_entries < min_free_entries)
+	if (num_free_entries < min_free_entries) {
 		min_free_entries = num_free_entries;
+		if ((min_free_entries & 0xffff) == 0)
+			debug_dma_dump_devs();
+	}
 
 	return entry;
 }

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-01-31  0:34                                                     ` Jarek Poplawski
@ 2010-01-31  4:17                                                       ` Michael Breuer
  2010-01-31 22:25                                                         ` Jarek Poplawski
  2010-01-31  4:55                                                       ` Michael Breuer
  1 sibling, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2010-01-31  4:17 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Stephen Hemminger, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 01/30/2010 07:34 PM, Jarek Poplawski wrote:
> On Sat, Jan 30, 2010 at 11:31:48AM -0500, Michael Breuer wrote:
>    
>> On 01/28/2010 06:36 PM, Stephen Hemminger wrote:
>>      
>>> Please try this patch (and only this patch), on 2.6.33-rc5[*];
>>> none of the other patches that did not make it upstream because that
>>> confuses things too much.
>>>
>>> The code that checks for DMA mapping errors on receive buffers would
>>> not handle errors correctly.  I doubt you have these errors, but if you
>>> did then it would explain the problems.  The code has to be a little
>>> tricky and build mapping for new rx buffer before releasing old one,
>>> that way if new mapping fails, the old one can be reused.
>>>
>>> If it works for you, I will resubmit with signed-off.
>>>
>>> -
>>>
>>>        
>> Nope - tx crash again. This time the system stayed up (but hosed)
>> for a few hours. When I tried to recover eth0 the system then
>> crashed.
>>
>> Brief summary of events (log extract below):
>>
>> System start Jan 28 19:29
>> Everything seemed good (load and all) until 17:13:11 the following
>> day when I got rx errors:
>>
>> Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x6230010
>> length 1518
>> Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x7f40010
>> length 1518
>>      
> These are length errors, but status shows more than 1518, e.g. 2036
> here, unless I miss something. Please, don't use jumbo frames in your
> network until we fully debug it for regular frames (Stephen admitted
> sky2 jumbo might be broken).
>    
MTU was 1500 - not using jumbo frames as they don't work.
> ...
>    
>> As I started looking at logs, the system hung and rebooted. I'm up
>> now with dma debug enabled, however as with 2.6.32.4 num_entries is
>> dropping and I don't think that dma debug will remain enabled long
>> enough to catch a crash.
>>      
> Could you try the patch below to show maybe some other users of
> dma-debug entries?
>
> Jarek P.
> ---
>    
Will do. Note that I'm running with the dma debug filter set to sky2.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-01-31  0:34                                                     ` Jarek Poplawski
  2010-01-31  4:17                                                       ` Michael Breuer
@ 2010-01-31  4:55                                                       ` Michael Breuer
  2010-01-31 18:50                                                         ` Michael Breuer
  1 sibling, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2010-01-31  4:55 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Stephen Hemminger, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 01/30/2010 07:34 PM, Jarek Poplawski wrote:
>
> Could you try the patch below to show maybe some other users of
> dma-debug entries?
>
> Jarek P.
> ---
>
>    
With the default # entries & dma_debug_driver=sky2:

6:00 is eth0 & 4:00 is eth1.

Jan 30 23:53:14 mail kernel: DMA-API: 0000:06:00.0: entries: 31961
Jan 30 23:53:14 mail kernel: DMA-API: 0000:00:1b.0: entries: 15
Jan 30 23:53:14 mail kernel: DMA-API: 0000:04:00.0: entries: 744
Jan 30 23:53:14 mail kernel: DMA-API: 0000:00:1f.2: entries: 6
Jan 30 23:53:14 mail kernel: DMA-API: 0000:08:01.0: entries: 3
Jan 30 23:53:14 mail kernel: DMA-API: 0000:00:1d.1: entries: 6
Jan 30 23:53:14 mail kernel: DMA-API: 0000:08:02.0: entries: 8
Jan 30 23:53:14 mail kernel: DMA-API: 0000:00:1a.7: entries: 3
Jan 30 23:53:14 mail kernel: DMA-API: 0000:00:1d.7: entries: 3
Jan 30 23:53:14 mail kernel: DMA-API: 0000:00:1a.0: entries: 3
Jan 30 23:53:14 mail kernel: DMA-API: 0000:00:1a.1: entries: 3
Jan 30 23:53:14 mail kernel: DMA-API: 0000:00:1a.2: entries: 3
Jan 30 23:53:14 mail kernel: DMA-API: 0000:00:1d.0: entries: 3
Jan 30 23:53:14 mail kernel: DMA-API: 0000:00:1d.2: entries: 3
Jan 30 23:53:14 mail kernel: DMA-API: 0000:02:00.0: entries: 1
Jan 30 23:53:14 mail kernel: DMA-API: 0000:05:00.0: entries: 2
Jan 30 23:53:14 mail kernel: DMA-API: debugging out of memory - disabling
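
For a quick sense of scale, the per-device counts in the dump above can be
summed. This is only a sketch: the array is copied by hand from the log,
the function names are made up, and the remark about the pool size assumes
the default preallocation of 32768 entries, which this thread does not
state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Per-device entry counts, in the order they appear in the log above. */
static const unsigned int dma_entries[] = {
	31961, 15, 744, 6, 3, 6, 8, 3, 3, 3, 3, 3, 3, 3, 1, 2
};

#define N_DEVS (sizeof(dma_entries) / sizeof(dma_entries[0]))

/* Sum of all entries reported in the dump. */
unsigned int dma_entries_total(void)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < N_DEVS; i++)
		sum += dma_entries[i];
	return sum;
}

/* Share of the total held by one device, in percent. */
double dma_entries_share(unsigned int count)
{
	unsigned int total = dma_entries_total();

	assert(total > 0);
	return 100.0 * count / total;
}

/* Print the total and the share held by 0000:06:00.0 (eth0). */
void dma_entries_report(void)
{
	printf("total=%u eth0 share=%.2f%%\n",
	       dma_entries_total(), dma_entries_share(dma_entries[0]));
}
```

The counts add up to 32767, one short of the assumed 32768-entry pool, and
0000:06:00.0 (eth0) accounts for roughly 97.5% of them.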


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-01-31  4:55                                                       ` Michael Breuer
@ 2010-01-31 18:50                                                         ` Michael Breuer
  2010-01-31 21:58                                                           ` Michael Breuer
  0 siblings, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2010-01-31 18:50 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Stephen Hemminger, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 1/30/2010 11:55 PM, Michael Breuer wrote:
> On 01/30/2010 07:34 PM, Jarek Poplawski wrote:
>>
>> Could you try the patch below to show maybe some other users of
>> dma-debug entries?
>>
>> Jarek P.
>> ---
>>
> With the default # entries & dma_debug_driver=sky2:
>
> 6:00 is eth0 & 4:00 is eth1.
>
> Jan 30 23:53:14 mail kernel: DMA-API: 0000:06:00.0: entries: 31961
> ...
>
I put a printk as a third else case in sky2_tx_unmap. Looks like the 
issue is that a large number of (perhaps all) calls to sky2_tx_unmap have 
re->flags set to neither TX_MAP_SINGLE nor TX_MAP_PAGE. Thus the elements 
are never being unmapped.

I suspect that the system collapses when using DMAR sooner than if not 
using DMAR. Probably some hardware limitation on the number of mapped 
elements that is less than the software limitation. I don't see at 
present how a ring element can ever get to this code without re->flags 
being set to one or the other.


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-01-31 18:50                                                         ` Michael Breuer
@ 2010-01-31 21:58                                                           ` Michael Breuer
  2010-01-31 22:18                                                             ` Jarek Poplawski
  0 siblings, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2010-01-31 21:58 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Stephen Hemminger, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 1/31/2010 1:50 PM, Michael Breuer wrote:
> On 1/30/2010 11:55 PM, Michael Breuer wrote:
>> On 01/30/2010 07:34 PM, Jarek Poplawski wrote:
>>>
>>> Could you try the patch below to show maybe some other users of
>>> dma-debug entries?
>>>
>>> Jarek P.
>>> ---
>>>
>> With the default # entries & dma_debug_driver=sky2:
>>
>> 6:00 is eth0 & 4:00 is eth1.
>>
>> Jan 30 23:53:14 mail kernel: DMA-API: 0000:06:00.0: entries: 31961
>> ...
>>
> I put a printk as a third else case in sky2_tx_unmap. Looks like the 
> issue is that a large number (perhaps all) calls to sky2_tx_unmap have 
> re->flags set to neither TX_MAP_SINGLE or TX_MAP_PAGE. Thus the 
> elements are never being unmapped.
>
> I suspect that the system collapses when using DMAR sooner than if not 
> using DMAR. Probably some hardware limitation on the number of mapped 
> elements that is less than the software limitation. I don't see at 
> present how a ring element can ever get to this code without re->flags 
> being set to one or the other.
>
>
Put some more debugging code in... re->flags is always 0 upon entry 
to sky2_tx_unmap.


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-01-31 21:58                                                           ` Michael Breuer
@ 2010-01-31 22:18                                                             ` Jarek Poplawski
  2010-02-01  0:19                                                               ` Michael Breuer
                                                                                 ` (2 more replies)
  0 siblings, 3 replies; 99+ messages in thread
From: Jarek Poplawski @ 2010-01-31 22:18 UTC (permalink / raw)
  To: Michael Breuer
  Cc: Stephen Hemminger, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Sun, Jan 31, 2010 at 04:58:42PM -0500, Michael Breuer wrote:
> On 1/31/2010 1:50 PM, Michael Breuer wrote:
> >On 1/30/2010 11:55 PM, Michael Breuer wrote:
> >>On 01/30/2010 07:34 PM, Jarek Poplawski wrote:
> >>>
> >>>Could you try the patch below to show maybe some other users of
> >>>dma-debug entries?
> >>>
> >>>Jarek P.
> >>>---
> >>>
> >>With the default # entries & dma_debug_driver=sky2:
> >>
> >>6:00 is eth0 & 4:00 is eth1.
> >>
> >>Jan 30 23:53:14 mail kernel: DMA-API: 0000:06:00.0: entries: 31961
> >>...
> >>
> >I put a printk as a third else case in sky2_tx_unmap. Looks like
> >the issue is that a large number (perhaps all) calls to
> >sky2_tx_unmap have re->flags set to neither TX_MAP_SINGLE or
> >TX_MAP_PAGE. Thus the elements are never being unmapped.
> >
> >I suspect that the system collapses when using DMAR sooner than if
> >not using DMAR. Probably some hardware limitation on the number of
> >mapped elements that is less than the software limitation. I don't
> >see at present how a ring element can ever get to this code
> >without re->flags being set to one or the other.
> >
> >
> Put some more debugging code in... re->flags is always NULL upon
> entry to sky2_tx_unmap.
> 

Yes, good point! Could you check whether this patch fixes it? (not compiled)

Thanks,
Jarek P.
---

 drivers/net/sky2.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index d760650..3437917 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -1025,9 +1025,10 @@ static void sky2_prefetch_init(struct sky2_hw *hw, u32 qaddr,
 static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
 {
 	struct sky2_tx_le *le = sky2->tx_le + *slot;
-	struct tx_ring_info *re = sky2->tx_ring + *slot;
+	struct tx_ring_info *re;
 
 	*slot = RING_NEXT(*slot, sky2->tx_ring_size);
+	re = sky2->tx_ring + *slot;
 	re->flags = 0;
 	re->skb = NULL;
 	le->ctrl = 0;
@@ -1036,13 +1037,16 @@ static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
 
 static void tx_init(struct sky2_port *sky2)
 {
-	struct sky2_tx_le *le;
+	struct sky2_tx_le *le = sky2->tx_le;
+	struct tx_ring_info *re = sky2->tx_ring;
 
 	sky2->tx_prod = sky2->tx_cons = 0;
 	sky2->tx_tcpsum = 0;
 	sky2->tx_last_mss = 0;
 
-	le = get_tx_le(sky2, &sky2->tx_prod);
+	re->flags = 0;
+	re->skb = NULL;
+	le->ctrl = 0;
 	le->addr = 0;
 	le->opcode = OP_ADDR64 | HW_OWNER;
 	sky2->tx_last_upper = 0;

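The get_tx_le() hunk above can be modelled in user space. This is a toy
sketch, not driver code: the ring, queue_one() and the get_le_old() /
get_le_new() names are made up for illustration, and it assumes the
transmit path records re->flags for a slot before calling get_tx_le() for
that same slot, which the thread itself does not show. Under that
assumption the pre-patch helper wipes the flags that were just set, so a
later unmap sees 0, while the patched helper clears the following slot
instead.

```c
#include <assert.h>
#include <stdio.h>

#define RING_SIZE 8                              /* power of two, as the mask requires */
#define RING_NEXT(x, s) (((x) + 1) & ((s) - 1))
#define TX_MAP_SINGLE 0x1

/* Minimal stand-in for the per-slot bookkeeping entry. */
struct ring_info {
	unsigned int flags;
};

static struct ring_info ring[RING_SIZE];

/* Pre-patch behaviour: clears the entry of the slot being handed out. */
void get_le_old(unsigned int *slot)
{
	struct ring_info *re = ring + *slot;

	*slot = RING_NEXT(*slot, RING_SIZE);
	re->flags = 0;
}

/* Patched behaviour: advance first, then clear the *next* slot's entry. */
void get_le_new(unsigned int *slot)
{
	struct ring_info *re;

	*slot = RING_NEXT(*slot, RING_SIZE);
	re = ring + *slot;
	re->flags = 0;
}

/*
 * Hypothetical caller: record the mapping type for the current slot, then
 * take a list element. Returns the flags a later unmap of that slot would see.
 */
unsigned int queue_one(void (*get_le)(unsigned int *))
{
	unsigned int slot = 0;
	unsigned int first = slot;

	ring[first].flags = TX_MAP_SINGLE;
	get_le(&slot);
	return ring[first].flags;
}

/* Self-check: old helper loses the flag, patched helper keeps it. */
void check_ring_model(void)
{
	assert(queue_one(get_le_old) == 0);
	assert(queue_one(get_le_new) == TX_MAP_SINGLE);
	printf("old=%u new=%u\n", queue_one(get_le_old), queue_one(get_le_new));
}
```

In this model the old helper leaves the slot with flags 0, matching the
"neither TX_MAP_SINGLE nor TX_MAP_PAGE" observation reported earlier in
the thread.
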
^ permalink raw reply related	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-01-31  4:17                                                       ` Michael Breuer
@ 2010-01-31 22:25                                                         ` Jarek Poplawski
  2010-01-31 23:58                                                           ` Michael Breuer
  0 siblings, 1 reply; 99+ messages in thread
From: Jarek Poplawski @ 2010-01-31 22:25 UTC (permalink / raw)
  To: Michael Breuer
  Cc: Stephen Hemminger, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Sat, Jan 30, 2010 at 11:17:41PM -0500, Michael Breuer wrote:
> On 01/30/2010 07:34 PM, Jarek Poplawski wrote:
> >On Sat, Jan 30, 2010 at 11:31:48AM -0500, Michael Breuer wrote:
> >>Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x6230010
> >>length 1518
> >>Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x7f40010
> >>length 1518
> >These are length errors, but status shows more than 1518, e.g. 2036
> >here, unless I miss something. Please, don't use jumbo frames in your
> >network until we fully debug it for regular frames (Stephen admitted
> >sky2 jumbo might be broken).
> MTU was 1500 - not using jumbo frames as they don't work.

Do you mean no NIC in your network could have sent such frames?

Jarek P.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-01-31 22:25                                                         ` Jarek Poplawski
@ 2010-01-31 23:58                                                           ` Michael Breuer
  0 siblings, 0 replies; 99+ messages in thread
From: Michael Breuer @ 2010-01-31 23:58 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Stephen Hemminger, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 1/31/2010 5:25 PM, Jarek Poplawski wrote:
> On Sat, Jan 30, 2010 at 11:17:41PM -0500, Michael Breuer wrote:
>    
>> On 01/30/2010 07:34 PM, Jarek Poplawski wrote:
>>      
>>> On Sat, Jan 30, 2010 at 11:31:48AM -0500, Michael Breuer wrote:
>>>        
>>>> Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x6230010
>>>> length 1518
>>>> Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x7f40010
>>>> length 1518
>>>>          
>>> These are length errors, but status shows more than 1518, e.g. 2036
>>> here, unless I miss something. Please, don't use jumbo frames in your
>>> network until we fully debug it for regular frames (Stephen admitted
>>> sky2 jumbo might be broken).
>>>        
>> MTU was 1500 - not using jumbo frames as they don't work.
>>      
> Do you mean no NIC in your network could have sent such frames?
>
> Jarek P.
>    
Well... There's only one possible source... and if there were it would 
have been a Win7 bug :) Regardless, sky2 shouldn't be sensitive to rogue 
external network stuff.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-01-31 22:18                                                             ` Jarek Poplawski
@ 2010-02-01  0:19                                                               ` Michael Breuer
  2010-02-01  4:26                                                                 ` Michael Breuer
  2010-02-01  9:17                                                                 ` [PATCH v2] sky2: Fix transmit dma mapping handling Jarek Poplawski
  2010-02-01 18:08                                                               ` [PATCH] sky2: receive dma mapping error handling Stephen Hemminger
  2010-02-01 18:20                                                               ` Stephen Hemminger
  2 siblings, 2 replies; 99+ messages in thread
From: Michael Breuer @ 2010-02-01  0:19 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Stephen Hemminger, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 1/31/2010 5:18 PM, Jarek Poplawski wrote:
> On Sun, Jan 31, 2010 at 04:58:42PM -0500, Michael Breuer wrote:
>    
>> On 1/31/2010 1:50 PM, Michael Breuer wrote:
>>      
>>> On 1/30/2010 11:55 PM, Michael Breuer wrote:
>>>        
>>>> On 01/30/2010 07:34 PM, Jarek Poplawski wrote:
>>>>          
>>>>> Could you try the patch below to show maybe some other users of
>>>>> dma-debug entries?
>>>>>
>>>>> Jarek P.
>>>>> ---
>>>>>
>>>>>            
>>>> With the default # entries & dma_debug_driver=sky2:
>>>>
>>>> 6:00 is eth0 & 4:00 is eth1.
>>>>
>>>> Jan 30 23:53:14 mail kernel: DMA-API: 0000:06:00.0: entries: 31961
>>>> ...
>>>>
>>>>          
>>> I put a printk as a third else case in sky2_tx_unmap. Looks like
>>> the issue is that a large number (perhaps all) calls to
>>> sky2_tx_unmap have re->flags set to neither TX_MAP_SINGLE or
>>> TX_MAP_PAGE. Thus the elements are never being unmapped.
>>>
>>> I suspect that the system collapses when using DMAR sooner than if
>>> not using DMAR. Probably some hardware limitation on the number of
>>> mapped elements that is less than the software limitation. I don't
>>> see at present how a ring element can ever get to this code
>>> without re->flags being set to one or the other.
>>>
>>>
>>>        
>> Put some more debugging code in... re->flags is always NULL upon
>> entry to sky2_tx_unmap.
>>
>>      
> Yes, good point! Could you try if this patch can fix it. (not compiled)
>
> Thanks,
> Jarek P.
> ---
>
>   drivers/net/sky2.c |   10 +++++++---
>   1 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
> index d760650..3437917 100644
> --- a/drivers/net/sky2.c
> +++ b/drivers/net/sky2.c
> @@ -1025,9 +1025,10 @@ static void sky2_prefetch_init(struct sky2_hw *hw, u32 qaddr,
>   static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
>   {
>   	struct sky2_tx_le *le = sky2->tx_le + *slot;
> -	struct tx_ring_info *re = sky2->tx_ring + *slot;
> +	struct tx_ring_info *re;
>
>   	*slot = RING_NEXT(*slot, sky2->tx_ring_size);
> +	re = sky2->tx_ring + *slot;
>   	re->flags = 0;
>   	re->skb = NULL;
>   	le->ctrl = 0;
> @@ -1036,13 +1037,16 @@ static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
>
>   static void tx_init(struct sky2_port *sky2)
>   {
> -	struct sky2_tx_le *le;
> +	struct sky2_tx_le *le = sky2->tx_le;
> +	struct tx_ring_info *re = sky2->tx_ring;
>
>   	sky2->tx_prod = sky2->tx_cons = 0;
>   	sky2->tx_tcpsum = 0;
>   	sky2->tx_last_mss = 0;
>
> -	le = get_tx_le(sky2, &sky2->tx_prod);
> +	re->flags = 0;
> +	re->skb = NULL;
> +	le->ctrl = 0;
>   	le->addr = 0;
>   	le->opcode = OP_ADDR64 | HW_OWNER;
>   	sky2->tx_last_upper = 0;
>    
Ok - this solves the dma-debug issue, i.e., elements are now being unmapped.

Will leave up and hit with traffic unless a crash occurs. If I hit 
something unrelated I'll backport to 2.6.32.7 and try that for a while. 
I do think it's plausible that the dma errors after (during) load were 
due to hardware limitations on the number of mapped entries (haven't 
researched what that limit was). I would also assume that the sw map 
would also have failed eventually.

I'd suggest that regardless of whether this patch solves my crash that 
it ought to be backported as it seems unlikely that any machine would be 
able to survive for long without the tx entries being unmapped.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-02-01  0:19                                                               ` Michael Breuer
@ 2010-02-01  4:26                                                                 ` Michael Breuer
  2010-02-01 10:47                                                                   ` Jarek Poplawski
  2010-02-01  9:17                                                                 ` [PATCH v2] sky2: Fix transmit dma mapping handling Jarek Poplawski
  1 sibling, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2010-02-01  4:26 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Stephen Hemminger, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 1/31/2010 7:19 PM, Michael Breuer wrote:
> On 1/31/2010 5:18 PM, Jarek Poplawski wrote:
> solves the dma-debug issue - i.e., elements are now being unmapped.
>
> Will leave up and hit with traffic unless a crash occurs. If I hit 
> something unrelated I'll backport to 2.6.32.7 and try that for a 
> while. I do think it's plausible that the dma errors after (during) 
> load were due to hardware limitations on the number of mapped entries 
> (haven't researched what that limit was). I would also assume that the 
> sw map would also have failed eventually.
>
> I'd suggest that regardless of whether this patch solves my crash that 
> it ought to be backported as it seems unlikely that any machine would 
> be able to survive for long without the tx entries being unmapped.
>
FYI - tried generating lots of extra tx traffic... found a way to 
generate the rx status messages on demand:
     ping -i .0000001 -s 8000 -t 2 <host> >/dev/null

Yields:
Jan 31 23:08:07 mail kernel: sky2 eth0: rx error, status 0x1f6a0010 length 1518
Jan 31 23:08:07 mail kernel: sky2 eth0: rx error, status 0x1f6a0010 length 1518
Jan 31 23:08:07 mail kernel: sky2 eth0: rx error, status 0x1f6a0010 length 1518
Jan 31 23:08:07 mail kernel: sky2 eth0: rx error, status 0x1f6a0010 length 1518
Jan 31 23:08:07 mail kernel: sky2 eth0: rx error, status 0x1f6a0010 length 1518
Jan 31 23:08:07 mail kernel: sky2 eth0: rx error, status 0x1f6a0010 length 1518
Jan 31 23:08:07 mail kernel: sky2 eth0: rx error, status 0x1f6a0010 length 1518
Jan 31 23:08:07 mail kernel: sky2 eth0: rx error, status 0x1f6a0010 length 1518
Jan 31 23:08:07 mail kernel: sky2 eth0: rx error, status 0x1f6a0010 length 1518
Jan 31 23:08:07 mail kernel: sky2 eth0: rx error, status 0x1f6a0010 length 1518
Jan 31 23:08:12 mail kernel: net_ratelimit: 316 callbacks suppressed
etc.
Looking at the packet trace, it seems that my Windows 7 box is, under 
*some* circumstances, not observing the MTU. In this case, the ICMP reply 
comes back as an unfragmented 8000-byte jumbo frame. It seems that 
the reverse is also true. I don't know why Windows 7 sometimes does this 
and at other times fragments properly.

Oddly, prior to this attempt, if I set the don't-fragment flag on a ping 
from the Windows box back to the Linux box with a size greater than the 
MTU (like 8000), the ping failed. Without the don't-fragment flag, the 
ping was fragmented properly. I am not sure why Windows now thinks the 
MTU is > 1500. I'll look into that when I have some time. Since nothing 
changed on the Windows side, it's possible that path MTU discovery is 
somehow broken with 2.6.33-rc5 plus the patches I've got.

Understanding that the other side is out of spec, I'd still wonder why 
the sky2 driver generates rx errors. Perhaps overruns should be tossed 
silently... by the hardware if possible.




^ permalink raw reply	[flat|nested] 99+ messages in thread

* [PATCH v2] sky2: Fix transmit dma mapping handling
  2010-02-01  0:19                                                               ` Michael Breuer
  2010-02-01  4:26                                                                 ` Michael Breuer
@ 2010-02-01  9:17                                                                 ` Jarek Poplawski
  2010-02-01 17:52                                                                   ` Michael Breuer
  1 sibling, 1 reply; 99+ messages in thread
From: Jarek Poplawski @ 2010-02-01  9:17 UTC (permalink / raw)
  To: David Miller
  Cc: Michael Breuer, Stephen Hemminger, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Sun, Jan 31, 2010 at 07:19:46PM -0500, Michael Breuer wrote:
> On 1/31/2010 5:18 PM, Jarek Poplawski wrote:
>> On Sun, Jan 31, 2010 at 04:58:42PM -0500, Michael Breuer wrote:
>>    
>>> On 1/31/2010 1:50 PM, Michael Breuer wrote:
>>>> I put a printk as a third else case in sky2_tx_unmap. Looks like
>>>> the issue is that a large number (perhaps all) calls to
>>>> sky2_tx_unmap have re->flags set to neither TX_MAP_SINGLE or
>>>> TX_MAP_PAGE. Thus the elements are never being unmapped.
>>>>
>>>> I suspect that the system collapses when using DMAR sooner than if
>>>> not using DMAR. Probably some hardware limitation on the number of
>>>> mapped elements that is less than the software limitation. I don't
>>>> see at present how a ring element can ever get to this code
>>>> without re->flags being set to one or the other.
>>>>
>>>>
>>>>        
>>> Put some more debugging code in... re->flags is always NULL upon
>>> entry to sky2_tx_unmap.
>>>
>>>      
>> Yes, good point! Could you try if this patch can fix it. (not compiled)
...
> Ok- solves the dma-debug issue - i.e., elements are now being unmapped.
>
> Will leave up and hit with traffic unless a crash occurs. If I hit  
> something unrelated I'll backport to 2.6.32.7 and try that for a while.  
> I do think it's plausible that the dma errors after (during) load were  
> due to hardware limitations on the number of mapped entries (haven't  
> researched what that limit was). I would also assume that the sw map  
> would also have failed eventually.
>
> I'd suggest that regardless of whether this patch solves my crash that  
> it ought to be backported as it seems unlikely that any machine would be  
> able to survive for long without the tx entries being unmapped.

Here is a bit improved version (re->flags = 0 in sky2_tx_unmap()) for
merging, or additional testing if David wishes.

Thanks,
Jarek P.
--------------->
Michael Breuer reported that dma-debug entries added by sky2 driver
weren't unmapped, and found out "re->flags is always NULL upon entry
to sky2_tx_unmap". It is overwritten by get_tx_le() after changes
introduced by commit 6b84dacadbdc3dab6a5b313d20d5a93b0d998641.

This patch reorders initializations in get_tx_le() and tx_init(), and
additionally does re->flags zeroing in sky2_tx_unmap() to prevent
possible double unmapping.

With debugging by: Michael Breuer <mbreuer@majjas.com>

Reported-by: Michael Breuer <mbreuer@majjas.com>
Tested-by: Michael Breuer <mbreuer@majjas.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---

 drivers/net/sky2.c |   20 +++++++++++++-------
 1 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index d760650..21bb00a 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -1025,9 +1025,10 @@ static void sky2_prefetch_init(struct sky2_hw *hw, u32 qaddr,
 static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
 {
 	struct sky2_tx_le *le = sky2->tx_le + *slot;
-	struct tx_ring_info *re = sky2->tx_ring + *slot;
+	struct tx_ring_info *re;
 
 	*slot = RING_NEXT(*slot, sky2->tx_ring_size);
+	re = sky2->tx_ring + *slot;
 	re->flags = 0;
 	re->skb = NULL;
 	le->ctrl = 0;
@@ -1036,13 +1037,16 @@ static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
 
 static void tx_init(struct sky2_port *sky2)
 {
-	struct sky2_tx_le *le;
+	struct sky2_tx_le *le = sky2->tx_le;
+	struct tx_ring_info *re = sky2->tx_ring;
 
 	sky2->tx_prod = sky2->tx_cons = 0;
 	sky2->tx_tcpsum = 0;
 	sky2->tx_last_mss = 0;
 
-	le = get_tx_le(sky2, &sky2->tx_prod);
+	re->flags = 0;
+	re->skb = NULL;
+	le->ctrl = 0;
 	le->addr = 0;
 	le->opcode = OP_ADDR64 | HW_OWNER;
 	sky2->tx_last_upper = 0;
@@ -1622,17 +1626,19 @@ static unsigned tx_le_req(const struct sk_buff *skb)
 	return count;
 }
 
-static void sky2_tx_unmap(struct pci_dev *pdev,
-			  const struct tx_ring_info *re)
+static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
 {
-	if (re->flags & TX_MAP_SINGLE)
+	if (re->flags & TX_MAP_SINGLE) {
 		pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
 				 pci_unmap_len(re, maplen),
 				 PCI_DMA_TODEVICE);
-	else if (re->flags & TX_MAP_PAGE)
+		re->flags = 0;
+	} else if (re->flags & TX_MAP_PAGE) {
 		pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
 			       pci_unmap_len(re, maplen),
 			       PCI_DMA_TODEVICE);
+		re->flags = 0;
+	}
 }
 
 /*
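
The double-unmap guard described in the changelog above can be sketched
in user space. This is only a model under stated assumptions, not the
driver code: the flag values, struct model_ring_entry, model_tx_unmap()
and the call counter are all hypothetical stand-ins, and the counter
merely takes the place of the real pci_unmap_single()/pci_unmap_page()
calls.

```c
#include <assert.h>

/* Hypothetical stand-ins for the driver's flag bits. */
#define TX_MAP_SINGLE 0x1u
#define TX_MAP_PAGE   0x2u

struct model_ring_entry {
	unsigned int flags;	/* kind of mapping held; 0 means none */
};

/* Counts simulated pci_unmap_*() calls. */
static int model_unmap_calls;

/* Unmap once, then forget the mapping so a second call is a no-op. */
static void model_tx_unmap(struct model_ring_entry *re)
{
	if (re->flags & TX_MAP_SINGLE) {
		model_unmap_calls++;	/* stands in for pci_unmap_single() */
		re->flags = 0;
	} else if (re->flags & TX_MAP_PAGE) {
		model_unmap_calls++;	/* stands in for pci_unmap_page() */
		re->flags = 0;
	}
}

/* Calls the helper twice on one entry and reports how many unmaps ran. */
static int model_double_unmap(unsigned int flags)
{
	struct model_ring_entry re = { .flags = flags };

	model_unmap_calls = 0;
	model_tx_unmap(&re);
	model_tx_unmap(&re);
	return model_unmap_calls;
}
```

Because the flags are cleared inside the helper, the second call on the
same entry finds nothing to unmap, which is the property the changelog
asks for.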

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-02-01  4:26                                                                 ` Michael Breuer
@ 2010-02-01 10:47                                                                   ` Jarek Poplawski
  0 siblings, 0 replies; 99+ messages in thread
From: Jarek Poplawski @ 2010-02-01 10:47 UTC (permalink / raw)
  To: Michael Breuer
  Cc: Stephen Hemminger, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Sun, Jan 31, 2010 at 11:26:03PM -0500, Michael Breuer wrote:
> FYI - tried generating lots of extra tx traffic... found a way to  
> generate the rx status messages on demand:
>     ping -i .0000001 -s 8000 -t 2 <host> >/dev/null
>
> Yields:
> Jan 31 23:08:07 mail kernel: sky2 eth0: rx error, status 0x1f6a0010  
> length 1518
...
> Jan 31 23:08:12 mail kernel: net_ratelimit: 316 callbacks suppressed
> etc.
...
> Understanding that the other side is out of spec, I'd still wonder why  
> the sky2 driver generates rx errors. Perhaps overruns should be tossed  
> silently... by the hardware if possible.

Of course it's a matter of taste, but it seems such errors shouldn't
be tolerated in a local network. I'd rather make them more explicit
(like e.g. some other kinds of length errors).

Jarek P.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v2] sky2: Fix transmit dma mapping handling
  2010-02-01  9:17                                                                 ` [PATCH v2] sky2: Fix transmit dma mapping handling Jarek Poplawski
@ 2010-02-01 17:52                                                                   ` Michael Breuer
  0 siblings, 0 replies; 99+ messages in thread
From: Michael Breuer @ 2010-02-01 17:52 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: David Miller, Stephen Hemminger, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 2/1/2010 4:17 AM, Jarek Poplawski wrote:
> On Sun, Jan 31, 2010 at 07:19:46PM -0500, Michael Breuer wrote:
>    
>> On 1/31/2010 5:18 PM, Jarek Poplawski wrote:
>>      
>>> On Sun, Jan 31, 2010 at 04:58:42PM -0500, Michael Breuer wrote:
>>>
>>>        
>>>> On 1/31/2010 1:50 PM, Michael Breuer wrote:
>>>>          
>>>>> I put a printk as a third else case in sky2_tx_unmap. Looks like
>>>>> the issue is that a large number (perhaps all) calls to
>>>>> sky2_tx_unmap have re->flags set to neither TX_MAP_SINGLE or
>>>>> TX_MAP_PAGE. Thus the elements are never being unmapped.
>>>>>
>>>>> I suspect that the system collapses when using DMAR sooner than if
>>>>> not using DMAR. Probably some hardware limitation on the number of
>>>>> mapped elements that is less than the software limitation. I don't
>>>>> see at present how a ring element can ever get to this code
>>>>> without re->flags being set to one or the other.
>>>>>
>>>>>
>>>>>
>>>>>            
>>>> Put some more debugging code in... re->flags is always NULL upon
>>>> entry to sky2_tx_unmap.
>>>>
>>>>
>>>>          
>>> Yes, good point! Could you try if this patch can fix it. (not compiled)
>>>        
> ...
>    
>> Ok- solves the dma-debug issue - i.e., elements are now being unmapped.
>>
>> Will leave up and hit with traffic unless a crash occurs. If I hit
>> something unrelated I'll backport to 2.6.32.7 and try that for a while.
>> I do think it's plausible that the dma errors after (during) load were
>> due to hardware limitations on the number of mapped entries (haven't
>> researched what that limit was). I would also assume that the sw map
>> would also have failed eventually.
>>
>> I'd suggest that regardless of whether this patch solves my crash that
>> it ought to be backported as it seems unlikely that any machine would be
>> able to survive for long without the tx entries being unmapped.
>>      
> Here is a bit improved version (re->flags = 0 in sky2_tx_unmap()) for
> merging, or additional testing if David wishes.
>
> Thanks,
> Jarek P.
> --------------->
> Michael Breuer reported that dma-debug entries added by sky2 driver
> weren't unmapped, and found out "re->flags is always NULL upon entry
> to sky2_tx_unmap". It is overwritten by get_tx_le() after changes
> introduced by commit 6b84dacadbdc3dab6a5b313d20d5a93b0d998641.
>
> This patch reorders initializations in get_tx_le() and tx_init(), and
> additionally does re->flags zeroing in sky2_tx_unmap() to prevent
> possible double unmapping.
>
> With debugging by: Michael Breuer<mbreuer@majjas.com>
>
> Reported-by: Michael Breuer<mbreuer@majjas.com>
> Tested-by: Michael Breuer<mbreuer@majjas.com>
> Signed-off-by: Jarek Poplawski<jarkao2@gmail.com>
> ---
>
>   drivers/net/sky2.c |   20 +++++++++++++-------
>   1 files changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
> index d760650..21bb00a 100644
> --- a/drivers/net/sky2.c
> +++ b/drivers/net/sky2.c
> @@ -1025,9 +1025,10 @@ static void sky2_prefetch_init(struct sky2_hw *hw, u32 qaddr,
>   static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
>   {
>   	struct sky2_tx_le *le = sky2->tx_le + *slot;
> -	struct tx_ring_info *re = sky2->tx_ring + *slot;
> +	struct tx_ring_info *re;
>
>   	*slot = RING_NEXT(*slot, sky2->tx_ring_size);
> +	re = sky2->tx_ring + *slot;
>   	re->flags = 0;
>   	re->skb = NULL;
>   	le->ctrl = 0;
> @@ -1036,13 +1037,16 @@ static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
>
>   static void tx_init(struct sky2_port *sky2)
>   {
> -	struct sky2_tx_le *le;
> +	struct sky2_tx_le *le = sky2->tx_le;
> +	struct tx_ring_info *re = sky2->tx_ring;
>
>   	sky2->tx_prod = sky2->tx_cons = 0;
>   	sky2->tx_tcpsum = 0;
>   	sky2->tx_last_mss = 0;
>
> -	le = get_tx_le(sky2,&sky2->tx_prod);
> +	re->flags = 0;
> +	re->skb = NULL;
> +	le->ctrl = 0;
>   	le->addr = 0;
>   	le->opcode = OP_ADDR64 | HW_OWNER;
>   	sky2->tx_last_upper = 0;
> @@ -1622,17 +1626,19 @@ static unsigned tx_le_req(const struct sk_buff *skb)
>   	return count;
>   }
>
> -static void sky2_tx_unmap(struct pci_dev *pdev,
> -			  const struct tx_ring_info *re)
> +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
>   {
> -	if (re->flags&  TX_MAP_SINGLE)
> +	if (re->flags&  TX_MAP_SINGLE) {
>   		pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
>   				 pci_unmap_len(re, maplen),
>   				 PCI_DMA_TODEVICE);
> -	else if (re->flags&  TX_MAP_PAGE)
> +		re->flags = 0;
> +	} else if (re->flags&  TX_MAP_PAGE) {
>   		pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
>   			       pci_unmap_len(re, maplen),
>   			       PCI_DMA_TODEVICE);
> +		re->flags = 0;
> +	}
>   }
>
>   /*
>    
Running with v2 now. The v1 patch significantly improved things. I cannot 
state for certain that the original issue (crash under load) is resolved, 
but so far it appears so. Additionally, some performance issues (RAID 
rebuild times, etc.) seem drastically improved. I'm guessing that this 
might have to do with the excess mapped sky2 tx buffers.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-01-31 22:18                                                             ` Jarek Poplawski
  2010-02-01  0:19                                                               ` Michael Breuer
@ 2010-02-01 18:08                                                               ` Stephen Hemminger
  2010-02-01 18:20                                                               ` Stephen Hemminger
  2 siblings, 0 replies; 99+ messages in thread
From: Stephen Hemminger @ 2010-02-01 18:08 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Michael Breuer, David Miller, akpm, flyboy, linux-kernel, netdev,
	Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Sun, 31 Jan 2010 23:18:35 +0100
Jarek Poplawski <jarkao2@gmail.com> wrote:

> @@ -1025,9 +1025,10 @@ static void sky2_prefetch_init(struct sky2_hw *hw, u32 qaddr,
>  static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
>  {
>  	struct sky2_tx_le *le = sky2->tx_le + *slot;
> -	struct tx_ring_info *re = sky2->tx_ring + *slot;
> +	struct tx_ring_info *re;
>  
>  	*slot = RING_NEXT(*slot, sky2->tx_ring_size);
> +	re = sky2->tx_ring + *slot;
>  	re->flags = 0;

Bogus, le and re are 1-to-1, so the hardware portion and the software
portion should be at the same index.

-- 

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-01-31 22:18                                                             ` Jarek Poplawski
  2010-02-01  0:19                                                               ` Michael Breuer
  2010-02-01 18:08                                                               ` [PATCH] sky2: receive dma mapping error handling Stephen Hemminger
@ 2010-02-01 18:20                                                               ` Stephen Hemminger
  2010-02-01 18:44                                                                 ` Michael Breuer
                                                                                   ` (4 more replies)
  2 siblings, 5 replies; 99+ messages in thread
From: Stephen Hemminger @ 2010-02-01 18:20 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Michael Breuer, David Miller, akpm, flyboy, linux-kernel, netdev,
	Michael Chan, Don Fry, Francois Romieu, Matt Carlson

This fixes the fact that re->flags is always zero without causing
other confusion.

--- a/drivers/net/sky2.c	2010-02-01 10:07:42.676296236 -0800
+++ b/drivers/net/sky2.c	2010-02-01 10:18:12.575044064 -0800
@@ -1025,11 +1025,8 @@ static void sky2_prefetch_init(struct sk
 static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
 {
 	struct sky2_tx_le *le = sky2->tx_le + *slot;
-	struct tx_ring_info *re = sky2->tx_ring + *slot;
 
 	*slot = RING_NEXT(*slot, sky2->tx_ring_size);
-	re->flags = 0;
-	re->skb = NULL;
 	le->ctrl = 0;
 	return le;
 }
@@ -1622,8 +1619,7 @@ static unsigned tx_le_req(const struct s
 	return count;
 }
 
-static void sky2_tx_unmap(struct pci_dev *pdev,
-			  const struct tx_ring_info *re)
+static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
 {
 	if (re->flags & TX_MAP_SINGLE)
 		pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
@@ -1633,6 +1629,7 @@ static void sky2_tx_unmap(struct pci_dev
 		pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
 			       pci_unmap_len(re, maplen),
 			       PCI_DMA_TODEVICE);
+	re->flags = 0;
 }
 
 /*
@@ -1839,6 +1836,7 @@ static void sky2_tx_complete(struct sky2
 			dev->stats.tx_packets++;
 			dev->stats.tx_bytes += skb->len;
 
+			re->skb = NULL;
 			dev_kfree_skb_any(skb);
 
 			sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
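
Why the old get_tx_le() left every tx entry mapped can be illustrated
with a small user-space model. It is a sketch under assumptions rather
than the driver itself: it assumes the transmit path records the mapping
in the slot's flags before taking the slot (which is how the thread
describes the overwrite), and the ring size, the MODEL_* names and the
"outstanding" counter (a rough analogue of the dma-debug entry count)
are hypothetical.

```c
#include <assert.h>

#define MODEL_RING_SIZE 8u	/* hypothetical; must be a power of two */
#define MODEL_RING_NEXT(x) (((x) + 1u) & (MODEL_RING_SIZE - 1u))
#define MODEL_MAPPED 0x1u

struct model_tx_ring {
	unsigned int flags[MODEL_RING_SIZE];	/* software side of each slot */
	int outstanding;			/* mappings not yet unmapped */
};

/* Old behaviour: taking a slot also wipes its software flags. */
static void model_get_slot_old(struct model_tx_ring *r, unsigned int *slot)
{
	r->flags[*slot] = 0;
	*slot = MODEL_RING_NEXT(*slot);
}

/* Behaviour after the change: taking a slot leaves the flags alone. */
static void model_get_slot_new(struct model_tx_ring *r, unsigned int *slot)
{
	(void)r;
	*slot = MODEL_RING_NEXT(*slot);
}

/* Unmap only if the slot still remembers a mapping, then clear it. */
static void model_unmap(struct model_tx_ring *r, unsigned int slot)
{
	if (r->flags[slot] & MODEL_MAPPED)
		r->outstanding--;
	r->flags[slot] = 0;
}

/* Send and complete n packets; return how many mappings were leaked. */
static int model_leaked(int n, int use_old)
{
	struct model_tx_ring r = { {0}, 0 };
	unsigned int prod = 0;
	int i;

	for (i = 0; i < n; i++) {
		unsigned int slot = prod;

		r.flags[slot] = MODEL_MAPPED;	/* record the mapping first... */
		r.outstanding++;
		if (use_old)
			model_get_slot_old(&r, &prod);	/* ...then lose it */
		else
			model_get_slot_new(&r, &prod);
		model_unmap(&r, slot);		/* completion path */
	}
	return r.outstanding;
}
```

In this model the old helper leaks one mapping per packet, while the
variant that clears the flags only in the unmap path leaks none.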

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-02-01 18:20                                                               ` Stephen Hemminger
@ 2010-02-01 18:44                                                                 ` Michael Breuer
  2010-02-01 20:13                                                                 ` Jarek Poplawski
                                                                                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 99+ messages in thread
From: Michael Breuer @ 2010-02-01 18:44 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jarek Poplawski, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 2/1/2010 1:20 PM, Stephen Hemminger wrote:
> This fixes the fact that re->flags is always zero without causing
> other confusion.
>
> --- a/drivers/net/sky2.c	2010-02-01 10:07:42.676296236 -0800
> +++ b/drivers/net/sky2.c	2010-02-01 10:18:12.575044064 -0800
> @@ -1025,11 +1025,8 @@ static void sky2_prefetch_init(struct sk
>   static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
>   {
>   	struct sky2_tx_le *le = sky2->tx_le + *slot;
> -	struct tx_ring_info *re = sky2->tx_ring + *slot;
>
>   	*slot = RING_NEXT(*slot, sky2->tx_ring_size);
> -	re->flags = 0;
> -	re->skb = NULL;
>   	le->ctrl = 0;
>   	return le;
>   }
> @@ -1622,8 +1619,7 @@ static unsigned tx_le_req(const struct s
>   	return count;
>   }
>
> -static void sky2_tx_unmap(struct pci_dev *pdev,
> -			  const struct tx_ring_info *re)
> +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
>   {
>   	if (re->flags&  TX_MAP_SINGLE)
>   		pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
> @@ -1633,6 +1629,7 @@ static void sky2_tx_unmap(struct pci_dev
>   		pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
>   			       pci_unmap_len(re, maplen),
>   			       PCI_DMA_TODEVICE);
> +	re->flags = 0;
>   }
>
>   /*
> @@ -1839,6 +1836,7 @@ static void sky2_tx_complete(struct sky2
>   			dev->stats.tx_packets++;
>   			dev->stats.tx_bytes += skb->len;
>
> +			re->skb = NULL;
>   			dev_kfree_skb_any(skb);
>
>   			sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
>    
Running with this patch now - dma debug num_free_entries stable - so 
far, so good.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-02-01 18:20                                                               ` Stephen Hemminger
  2010-02-01 18:44                                                                 ` Michael Breuer
@ 2010-02-01 20:13                                                                 ` Jarek Poplawski
  2010-02-01 20:41                                                                   ` Jarek Poplawski
  2010-02-01 21:27                                                                 ` [PATCH v3] " Jarek Poplawski
                                                                                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 99+ messages in thread
From: Jarek Poplawski @ 2010-02-01 20:13 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Michael Breuer, David Miller, akpm, flyboy, linux-kernel, netdev,
	Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Mon, Feb 01, 2010 at 10:20:18AM -0800, Stephen Hemminger wrote:
> This fixes the fact that re->flags is always zero without causing
> other confusion.

Actually, there is a slight confusion: after tx_init() slots #0 are
skipped and waiting for tx_complete. Of course, no big deal, but no
problem with fixing it too, so there is the main difference between
these two patches. (Moving re->flags and re->skb initializations is
an optimization, but I tried to change only the buggy parts.)

Jarek P.

> 
> --- a/drivers/net/sky2.c	2010-02-01 10:07:42.676296236 -0800
> +++ b/drivers/net/sky2.c	2010-02-01 10:18:12.575044064 -0800
> @@ -1025,11 +1025,8 @@ static void sky2_prefetch_init(struct sk
>  static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
>  {
>  	struct sky2_tx_le *le = sky2->tx_le + *slot;
> -	struct tx_ring_info *re = sky2->tx_ring + *slot;
>  
>  	*slot = RING_NEXT(*slot, sky2->tx_ring_size);
> -	re->flags = 0;
> -	re->skb = NULL;
>  	le->ctrl = 0;
>  	return le;
>  }
> @@ -1622,8 +1619,7 @@ static unsigned tx_le_req(const struct s
>  	return count;
>  }
>  
> -static void sky2_tx_unmap(struct pci_dev *pdev,
> -			  const struct tx_ring_info *re)
> +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
>  {
>  	if (re->flags & TX_MAP_SINGLE)
>  		pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
> @@ -1633,6 +1629,7 @@ static void sky2_tx_unmap(struct pci_dev
>  		pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
>  			       pci_unmap_len(re, maplen),
>  			       PCI_DMA_TODEVICE);
> +	re->flags = 0;
>  }
>  
>  /*
> @@ -1839,6 +1836,7 @@ static void sky2_tx_complete(struct sky2
>  			dev->stats.tx_packets++;
>  			dev->stats.tx_bytes += skb->len;
>  
> +			re->skb = NULL;
>  			dev_kfree_skb_any(skb);
>  
>  			sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-02-01 20:13                                                                 ` Jarek Poplawski
@ 2010-02-01 20:41                                                                   ` Jarek Poplawski
  0 siblings, 0 replies; 99+ messages in thread
From: Jarek Poplawski @ 2010-02-01 20:41 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Michael Breuer, David Miller, akpm, flyboy, linux-kernel, netdev,
	Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Mon, Feb 01, 2010 at 09:13:23PM +0100, Jarek Poplawski wrote:
> On Mon, Feb 01, 2010 at 10:20:18AM -0800, Stephen Hemminger wrote:
> > This fixes the fact that re->flags is always zero without causing
> > other confusion.
> 
> Actually, there is a slight confusion: after tx_init() slots #0 are
> skipped and waiting for tx_complete. Of course, no big deal, but no
> problem with fixing it too, so there is the main difference between

Hmm... On the other hand it could be fixed simpler by moving
sky2->tx_cons. I'll send v3.

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* [PATCH v3] sky2:  receive dma mapping error handling
  2010-02-01 18:20                                                               ` Stephen Hemminger
  2010-02-01 18:44                                                                 ` Michael Breuer
  2010-02-01 20:13                                                                 ` Jarek Poplawski
@ 2010-02-01 21:27                                                                 ` Jarek Poplawski
  2010-02-01 22:29                                                                   ` Stephen Hemminger
  2010-02-01 21:42                                                                 ` [PATCH v3b resent] sky2: Fix transmit dma mapping handling Jarek Poplawski
  2010-02-03  4:07                                                                 ` [PATCH] sky2: receive dma mapping error handling Michael Breuer
  4 siblings, 1 reply; 99+ messages in thread
From: Jarek Poplawski @ 2010-02-01 21:27 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Michael Breuer, David Miller, akpm, flyboy, linux-kernel, netdev,
	Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Mon, Feb 01, 2010 at 10:20:18AM -0800, Stephen Hemminger wrote:
> This fixes the fact that re->flags is always zero without causing
> other confusion.

Here it is, with only sky2->tx_cons update in tx_init().

Thanks,
Jarek P.
---------------> (take 3)
Michael Breuer reported that dma-debug entries added by sky2 driver
weren't unmapped, and found out "re->flags is always NULL upon entry
to sky2_tx_unmap". It is overwritten by get_tx_le() after changes
introduced by commit 6b84dacadbdc3dab6a5b313d20d5a93b0d998641.

This patch moves re->flags and re->skb initializations from
get_tx_le() to sky2_tx_unmap() and sky2_tx_complete() respectively.
Additionally sky2->tx_cons is updated to sky2->tx_prod in tx_init()
to remove one needless tx completion.

With debugging by: Michael Breuer <mbreuer@majjas.com>
Improved by: Stephen Hemminger <shemminger@vyatta.com>

Reported-by: Michael Breuer <mbreuer@majjas.com>
Tested-by: Michael Breuer <mbreuer@majjas.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
---

 drivers/net/sky2.c |   11 +++++------
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index d760650..08cd65b 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -1025,11 +1025,8 @@ static void sky2_prefetch_init(struct sky2_hw *hw, u32 qaddr,
 static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
 {
 	struct sky2_tx_le *le = sky2->tx_le + *slot;
-	struct tx_ring_info *re = sky2->tx_ring + *slot;
 
 	*slot = RING_NEXT(*slot, sky2->tx_ring_size);
-	re->flags = 0;
-	re->skb = NULL;
 	le->ctrl = 0;
 	return le;
 }
@@ -1038,13 +1035,14 @@ static void tx_init(struct sky2_port *sky2)
 {
 	struct sky2_tx_le *le;
 
-	sky2->tx_prod = sky2->tx_cons = 0;
+	sky2->tx_prod = 0;
 	sky2->tx_tcpsum = 0;
 	sky2->tx_last_mss = 0;
 
 	le = get_tx_le(sky2, &sky2->tx_prod);
 	le->addr = 0;
 	le->opcode = OP_ADDR64 | HW_OWNER;
+	sky2->tx_cons = sky2->tx_prod;
 	sky2->tx_last_upper = 0;
 }
 
@@ -1622,8 +1620,7 @@ static unsigned tx_le_req(const struct sk_buff *skb)
 	return count;
 }
 
-static void sky2_tx_unmap(struct pci_dev *pdev,
-			  const struct tx_ring_info *re)
+static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
 {
 	if (re->flags & TX_MAP_SINGLE)
 		pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
@@ -1633,6 +1630,7 @@ static void sky2_tx_unmap(struct pci_dev *pdev,
 		pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
 			       pci_unmap_len(re, maplen),
 			       PCI_DMA_TODEVICE);
+	re->flags = 0;
 }
 
 /*
@@ -1839,6 +1837,7 @@ static void sky2_tx_complete(struct sky2_port *sky2, u16 done)
 			dev->stats.tx_packets++;
 			dev->stats.tx_bytes += skb->len;
 
+			re->skb = NULL;
 			dev_kfree_skb_any(skb);
 
 			sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
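
The effect of setting tx_cons to tx_prod in tx_init() can be sketched
with plain ring-index arithmetic. This is a hedged model, not the
driver: struct model_port, the ring size and model_tx_inuse() are
hypothetical, and it assumes the number of entries awaiting completion
is the masked difference between producer and consumer indices.

```c
#include <assert.h>

#define MODEL_RING_SIZE 128u	/* hypothetical; must be a power of two */
#define MODEL_RING_NEXT(x) (((x) + 1u) & (MODEL_RING_SIZE - 1u))

struct model_port {
	unsigned int tx_prod;	/* next slot the driver will fill */
	unsigned int tx_cons;	/* next slot the completion path will reap */
};

/* Entries between consumer and producer, i.e. still awaiting completion. */
static unsigned int model_tx_inuse(const struct model_port *p)
{
	return (p->tx_prod - p->tx_cons) & (MODEL_RING_SIZE - 1u);
}

/* Init that consumes slot 0 for a setup element but leaves tx_cons at 0. */
static void model_tx_init_before(struct model_port *p)
{
	p->tx_prod = p->tx_cons = 0;
	p->tx_prod = MODEL_RING_NEXT(p->tx_prod);	/* setup element in slot 0 */
}

/* Init that also moves tx_cons past the setup element. */
static void model_tx_init_after(struct model_port *p)
{
	p->tx_prod = 0;
	p->tx_prod = MODEL_RING_NEXT(p->tx_prod);	/* setup element in slot 0 */
	p->tx_cons = p->tx_prod;
}
```

With the first variant one entry appears in use right after init, which
corresponds to the "one needless tx completion" in the changelog; with
the second the ring starts out empty.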

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v3b resent] sky2: Fix transmit dma mapping handling
  2010-02-01 18:20                                                               ` Stephen Hemminger
                                                                                   ` (2 preceding siblings ...)
  2010-02-01 21:27                                                                 ` [PATCH v3] " Jarek Poplawski
@ 2010-02-01 21:42                                                                 ` Jarek Poplawski
  2010-02-03  4:07                                                                 ` [PATCH] sky2: receive dma mapping error handling Michael Breuer
  4 siblings, 0 replies; 99+ messages in thread
From: Jarek Poplawski @ 2010-02-01 21:42 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Michael Breuer, David Miller, akpm, flyboy, linux-kernel, netdev,
	Michael Chan, Don Fry, Francois Romieu, Matt Carlson

[ Subject was wrong, sorry! ]

On Mon, Feb 01, 2010 at 10:20:18AM -0800, Stephen Hemminger wrote:
> This fixes the fact that re->flags is always zero without causing
> other confusion.

Here it is, with only sky2->tx_cons update in tx_init().

Thanks,
Jarek P.
---------------> (take 3b)
Michael Breuer reported that dma-debug entries added by sky2 driver
weren't unmapped, and found out "re->flags is always NULL upon entry
to sky2_tx_unmap". It is overwritten by get_tx_le() after changes
introduced by commit 6b84dacadbdc3dab6a5b313d20d5a93b0d998641.

This patch moves re->flags and re->skb initializations from
get_tx_le() to sky2_tx_unmap() and sky2_tx_complete() respectively.
Additionally sky2->tx_cons is updated to sky2->tx_prod in tx_init()
to remove one needless tx completion.

With debugging by: Michael Breuer <mbreuer@majjas.com>
Improved by: Stephen Hemminger <shemminger@vyatta.com>

Reported-by: Michael Breuer <mbreuer@majjas.com>
Tested-by: Michael Breuer <mbreuer@majjas.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
---

 drivers/net/sky2.c |   11 +++++------
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index d760650..08cd65b 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -1025,11 +1025,8 @@ static void sky2_prefetch_init(struct sky2_hw *hw, u32 qaddr,
 static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
 {
 	struct sky2_tx_le *le = sky2->tx_le + *slot;
-	struct tx_ring_info *re = sky2->tx_ring + *slot;
 
 	*slot = RING_NEXT(*slot, sky2->tx_ring_size);
-	re->flags = 0;
-	re->skb = NULL;
 	le->ctrl = 0;
 	return le;
 }
@@ -1038,13 +1035,14 @@ static void tx_init(struct sky2_port *sky2)
 {
 	struct sky2_tx_le *le;
 
-	sky2->tx_prod = sky2->tx_cons = 0;
+	sky2->tx_prod = 0;
 	sky2->tx_tcpsum = 0;
 	sky2->tx_last_mss = 0;
 
 	le = get_tx_le(sky2, &sky2->tx_prod);
 	le->addr = 0;
 	le->opcode = OP_ADDR64 | HW_OWNER;
+	sky2->tx_cons = sky2->tx_prod;
 	sky2->tx_last_upper = 0;
 }
 
@@ -1622,8 +1620,7 @@ static unsigned tx_le_req(const struct sk_buff *skb)
 	return count;
 }
 
-static void sky2_tx_unmap(struct pci_dev *pdev,
-			  const struct tx_ring_info *re)
+static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
 {
 	if (re->flags & TX_MAP_SINGLE)
 		pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
@@ -1633,6 +1630,7 @@ static void sky2_tx_unmap(struct pci_dev *pdev,
 		pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
 			       pci_unmap_len(re, maplen),
 			       PCI_DMA_TODEVICE);
+	re->flags = 0;
 }
 
 /*
@@ -1839,6 +1837,7 @@ static void sky2_tx_complete(struct sky2_port *sky2, u16 done)
 			dev->stats.tx_packets++;
 			dev->stats.tx_bytes += skb->len;
 
+			re->skb = NULL;
 			dev_kfree_skb_any(skb);
 
 			sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
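
For readers following the changelog above, here is a minimal userspace
sketch of the failure mode it describes. It assumes, as the changelog
says, that the transmit path records the mapping in a ring slot and that
claiming the slot then wipes the record. The names ring_entry, xmit(),
unmap() and leaked_after() are hypothetical, the flag value is
illustrative, and live_mappings only stands in for the bookkeeping that
dma-debug does. This is not the driver code.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8u                            /* power of two */
#define RING_NEXT(x, s) (((x) + 1) & ((s) - 1))
#define TX_MAP_SINGLE 0x1u                      /* illustrative flag value */

struct ring_entry {
	unsigned flags;         /* records how the slot was mapped */
};

struct ring {
	struct ring_entry e[RING_SIZE];
	unsigned prod;
	int live_mappings;      /* stand-in for dma-debug bookkeeping */
};

/* Record a mapping in the current slot, then claim the slot. */
static void xmit(struct ring *r, int clear_on_claim)
{
	struct ring_entry *re = &r->e[r->prod];

	assert(re->flags == 0);         /* slot must be free */
	re->flags = TX_MAP_SINGLE;
	r->live_mappings++;

	r->prod = RING_NEXT(r->prod, RING_SIZE);
	if (clear_on_claim)
		re->flags = 0;          /* old side effect: record is lost */
}

/* Release a slot; the reset lives here, where the patch moves it. */
static void unmap(struct ring *r, struct ring_entry *re)
{
	if (re->flags & TX_MAP_SINGLE)
		r->live_mappings--;
	re->flags = 0;
}

/* Send and complete n buffers, return the mappings never released. */
static int leaked_after(unsigned n, int clear_on_claim)
{
	struct ring r = {0};
	unsigned i, cons = 0;

	for (i = 0; i < n; i++) {
		xmit(&r, clear_on_claim);
		unmap(&r, &r.e[cons]);
		cons = RING_NEXT(cons, RING_SIZE);
	}
	return r.live_mappings;
}
```

With clear_on_claim set, every buffer sent stays mapped, which matches
the "entries weren't unmapped" report; with the reset done only in
unmap(), nothing is left behind.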

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* Re: [PATCH v3] sky2:  receive dma mapping error handling
  2010-02-01 21:27                                                                 ` [PATCH v3] " Jarek Poplawski
@ 2010-02-01 22:29                                                                   ` Stephen Hemminger
  2010-02-01 22:46                                                                     ` Jarek Poplawski
  0 siblings, 1 reply; 99+ messages in thread
From: Stephen Hemminger @ 2010-02-01 22:29 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Michael Breuer, David Miller, akpm, flyboy, linux-kernel, netdev,
	Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Mon, 1 Feb 2010 22:27:41 +0100
Jarek Poplawski <jarkao2@gmail.com> wrote:

> @@ -1038,13 +1035,14 @@ static void tx_init(struct sky2_port *sky2)
>  {
>  	struct sky2_tx_le *le;
>  
> -	sky2->tx_prod = sky2->tx_cons = 0;
> +	sky2->tx_prod = 0;
>  	sky2->tx_tcpsum = 0;
>  	sky2->tx_last_mss = 0;
>  
>  	le = get_tx_le(sky2, &sky2->tx_prod);
>  	le->addr = 0;
>  	le->opcode = OP_ADDR64 | HW_OWNER;
> +	sky2->tx_cons = sky2->tx_prod;
>  	sky2->tx_last_upper = 0;
>  }

Your change causes the initial element to be skipped. I want
it to go to the hardware. It makes sure the upper bits of the
first request are set (0).

I don't see what was wrong with my fix.

-- 

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3] sky2:  receive dma mapping error handling
  2010-02-01 22:29                                                                   ` Stephen Hemminger
@ 2010-02-01 22:46                                                                     ` Jarek Poplawski
  2010-02-01 22:51                                                                       ` Stephen Hemminger
  0 siblings, 1 reply; 99+ messages in thread
From: Jarek Poplawski @ 2010-02-01 22:46 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Michael Breuer, David Miller, akpm, flyboy, linux-kernel, netdev,
	Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Mon, Feb 01, 2010 at 02:29:42PM -0800, Stephen Hemminger wrote:
> On Mon, 1 Feb 2010 22:27:41 +0100
> Jarek Poplawski <jarkao2@gmail.com> wrote:
> 
> > @@ -1038,13 +1035,14 @@ static void tx_init(struct sky2_port *sky2)
> >  {
> >  	struct sky2_tx_le *le;
> >  
> > -	sky2->tx_prod = sky2->tx_cons = 0;
> > +	sky2->tx_prod = 0;
> >  	sky2->tx_tcpsum = 0;
> >  	sky2->tx_last_mss = 0;
> >  
> >  	le = get_tx_le(sky2, &sky2->tx_prod);
> >  	le->addr = 0;
> >  	le->opcode = OP_ADDR64 | HW_OWNER;
> > +	sky2->tx_cons = sky2->tx_prod;
> >  	sky2->tx_last_upper = 0;
> >  }
> 
> Your change causes the initial element to be skipped. I want
> it to go to the hardware. It makes sure the upper bits of the
> first request are set (0).

I thought the "Send high bits if needed" part in sky2_xmit_frame() was
enough. If it's otherwise, then my patch was wrong.

> 
> I don't see what was wrong with my fix.
> 
It's OK. Please, submit it instead of mine.

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3] sky2:  receive dma mapping error handling
  2010-02-01 22:46                                                                     ` Jarek Poplawski
@ 2010-02-01 22:51                                                                       ` Stephen Hemminger
  0 siblings, 0 replies; 99+ messages in thread
From: Stephen Hemminger @ 2010-02-01 22:51 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Michael Breuer, David Miller, akpm, flyboy, linux-kernel, netdev,
	Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Mon, 1 Feb 2010 23:46:39 +0100
Jarek Poplawski <jarkao2@gmail.com> wrote:

> On Mon, Feb 01, 2010 at 02:29:42PM -0800, Stephen Hemminger wrote:
> > On Mon, 1 Feb 2010 22:27:41 +0100
> > Jarek Poplawski <jarkao2@gmail.com> wrote:
> > 
> > > @@ -1038,13 +1035,14 @@ static void tx_init(struct sky2_port *sky2)
> > >  {
> > >  	struct sky2_tx_le *le;
> > >  
> > > -	sky2->tx_prod = sky2->tx_cons = 0;
> > > +	sky2->tx_prod = 0;
> > >  	sky2->tx_tcpsum = 0;
> > >  	sky2->tx_last_mss = 0;
> > >  
> > >  	le = get_tx_le(sky2, &sky2->tx_prod);
> > >  	le->addr = 0;
> > >  	le->opcode = OP_ADDR64 | HW_OWNER;
> > > +	sky2->tx_cons = sky2->tx_prod;
> > >  	sky2->tx_last_upper = 0;
> > >  }
> > 
> > Your change causes the initial element to be skipped. I want
> > it to go to the hardware. It makes sure the upper bits of the
> > first request are set (0).
> 
> > I thought the "Send high bits if needed" part in sky2_xmit_frame() was
> > enough. If it's otherwise, then my patch was wrong.
> 

The definition is "high bits are different from the last value".
The init code sets the last value to zero, and initializes the hardware
engine as well.
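
The rule above can be sketched in userspace as follows. This is only an
illustration under stated assumptions: tx_state, tx_state_init() and
queue_buffer() are hypothetical names, and the counter merely models
"one address-high element handed to the hardware".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tx_state {
	uint32_t last_upper;    /* upper 32 bits last given to the hardware */
	unsigned addr64_sent;   /* address-high elements queued so far */
};

/* Init remembers zero and queues one element carrying that zero. */
static void tx_state_init(struct tx_state *s)
{
	assert(s != NULL);
	s->last_upper = 0;
	s->addr64_sent = 1;
}

/* Returns 1 if an extra address-high element had to be queued. */
static int queue_buffer(struct tx_state *s, uint64_t dma_addr)
{
	uint32_t upper = (uint32_t)(dma_addr >> 32);

	if (upper == s->last_upper)
		return 0;       /* hardware already holds these bits */
	s->last_upper = upper;
	s->addr64_sent++;
	return 1;
}
```

A first buffer below 4GB queues nothing extra, because the remembered
value already matches what init gave to the hardware. That is why the
initial element has to reach the hardware rather than be skipped.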

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-01-28 23:36                                                 ` [PATCH] sky2: receive dma mapping error handling Stephen Hemminger
                                                                     ` (2 preceding siblings ...)
  2010-01-30 16:31                                                   ` Michael Breuer
@ 2010-02-02 22:44                                                   ` Andi Kleen
  3 siblings, 0 replies; 99+ messages in thread
From: Andi Kleen @ 2010-02-02 22:44 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Michael Breuer, Jarek Poplawski, David Miller, akpm, flyboy,
	linux-kernel, netdev, Michael Chan, Don Fry, Francois Romieu,
	Matt Carlson

Stephen Hemminger <shemminger@linux-foundation.org> writes:

> -	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
> -		re->frag_addr[i] = pci_map_page(pdev,
> +
> +map_page_error:
> +	while (--i >= 0) {
> +		pci_unmap_page(pdev, re->frag_addr[i],
> +			       skb_shinfo(skb)->frags[i].size,
> +			       PCI_DMA_FROMDEVICE);
> +	}
> +
> +	pci_unmap_single(pdev, re->data_addr, pci_unmap_len(re, data_size),
> +			 PCI_DMA_FROMDEVICE);

Better add a helper somewhere to do this, doesn't make sense
to duplicate this in all drivers (lots of drivers have similar
problems)

I remember looking at this some time ago but for some reason
the patches never made it out.
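
A shared helper of the kind suggested here might follow the shape below.
This is a userspace sketch under stated assumptions: struct mapper,
map_one() and unmap_one() are hypothetical stand-ins for the real
map/unmap calls, and the failure injection exists only so the unwind
path can be exercised.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the DMA layer. */
struct mapper {
	int live;       /* mappings currently held */
	int fail_at;    /* attempt number that fails, or -1 for none */
	int attempts;   /* mapping attempts made so far */
};

static int map_one(struct mapper *m)
{
	if (m->attempts++ == m->fail_at)
		return -1;
	m->live++;
	return 0;
}

static void unmap_one(struct mapper *m)
{
	assert(m->live > 0);
	m->live--;
}

/*
 * Map the linear data area plus nfrags fragments. On failure, release
 * everything mapped so far so that the caller sees all or nothing.
 */
static int map_all_or_nothing(struct mapper *m, int nfrags)
{
	int i;

	if (map_one(m))                 /* linear data area */
		return -1;

	for (i = 0; i < nfrags; i++)
		if (map_one(m))
			goto unwind;
	return 0;

unwind:
	while (--i >= 0)
		unmap_one(m);           /* fragments mapped before the failure */
	unmap_one(m);                   /* linear data area */
	return -1;
}
```

Keeping the unwind in one place means each driver only has to check a
single return value instead of repeating the loop.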

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync
  2010-01-22  6:38           ` David Miller
@ 2010-02-03  1:18             ` FUJITA Tomonori
  2010-02-03  1:27               ` David Miller
  0 siblings, 1 reply; 99+ messages in thread
From: FUJITA Tomonori @ 2010-02-03  1:18 UTC (permalink / raw)
  To: davem
  Cc: fujita.tomonori, jarkao2, alan, shemminger, mbreuer, akpm,
	flyboy, linux-kernel, netdev, mchan, pcnet32, romieu, mcarlson,
	James.Bottomley

Sorry for the late reply,

On Thu, 21 Jan 2010 22:38:41 -0800 (PST)
David Miller <davem@davemloft.net> wrote:

> From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> Date: Fri, 22 Jan 2010 14:11:29 +0900
> 
> > Even if 'offset' is zero, 'size' still matters, I think. If 'size' is
> > not a multiple of the cache line size, it's possible that driver
> > writers who aren't familiar with cache would be surprised (it depends
> > on the way their drivers use buffers though).
> > 
> > The easiest way to get a 'completely safe sync for any driver
> > writer' is to require that all the sync parameters be the same as
> > those passed into the single mapping API. If writers know what they
> > are doing, they can do a partial sync with the sync_range API. That's
> > the author's intention, I guess.
> 
> This is not reasonable.
> 
> You have to think about how people actually use these
> interfaces.
> 
> They have a large buffer, and if they receive a small request they
> want to allocate a smaller buffer, copy into that smaller buffer, and
> give the larger buffer back to the hardware.
> 
> It's an optimization, it performs better this way.
> 
> If you make it so that the DMA sync has to cover the entire large
> buffer, the whole point of the optimization is taken away.

I talked with James. He is ok with changing (or fixing) this API to
enable users to do a partial sync (I'm ok with that too. I just
guessed that he designed the API in such a way intentionally, not by
mistake).

Can we safely assume that the arch implementations already round
up/down to the safe boundary internally in this API (they should
already)?
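
The rounding in question can be sketched as below. This is a userspace
illustration only: struct span and round_to_lines() are hypothetical
names, the line size is a parameter rather than anything an architecture
actually reports, and real implementations may do this differently.

```c
#include <assert.h>
#include <stddef.h>

struct span {
	size_t start;   /* first byte covered, a multiple of the line size */
	size_t len;     /* bytes covered, a multiple of the line size */
};

/* Widen [offset, offset + size) outwards to whole cache lines. */
static struct span round_to_lines(size_t offset, size_t size, size_t line)
{
	struct span s;
	size_t end;

	/* line must be a non-zero power of two */
	assert(line != 0 && (line & (line - 1)) == 0);

	s.start = offset & ~(line - 1);                  /* round down */
	end = (offset + size + line - 1) & ~(line - 1);  /* round up */
	s.len = end - s.start;
	return s;
}
```

For example, with 64-byte lines a 60-byte sync starting at offset 10
becomes a 128-byte sync starting at offset 0, so a partial sync never
touches only part of a line.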

As you know, the patch to remove the description of
dma_sync_single/pci_dma_sync_single/dma_sync_sg/pci_dma_sync_sg that
always requires a full sync in DMA-API.txt is already in -mm, so what
we need to do is:

- adding 'a partial sync' description to PCI-DMA-mapping.txt.
- duplicating the similar description to DMA-API.txt.

I don't like having two DMA docs. I'd like to make the pci_dma_* API
obsolete. We have the generic DMA API with generic devices so we are
always able to use the API (as you did with sbus_map_*). The majority of
arch implementations safely call the bus specific DMA functions via the
generic DMA API. So there are not many things to do. We can just
convert pci_dma_* to the dma_* API slowly.

Opinions?

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync
  2010-02-03  1:18             ` FUJITA Tomonori
@ 2010-02-03  1:27               ` David Miller
  0 siblings, 0 replies; 99+ messages in thread
From: David Miller @ 2010-02-03  1:27 UTC (permalink / raw)
  To: fujita.tomonori
  Cc: jarkao2, alan, shemminger, mbreuer, akpm, flyboy, linux-kernel,
	netdev, mchan, pcnet32, romieu, mcarlson, James.Bottomley

From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Date: Wed, 3 Feb 2010 10:18:39 +0900

> Can we safely assume that the arch implementations already round
> up/down to the safe boundary internally in this API (they should
> already)?

I can only speak for sparc64 and x86 directly and those are fine.

Any such improper implementations would fail with many common
ethernet drivers already.

> I don't like having two DMA docs. I'd like to make the pci_dma_* API
> obsolete. We have the generic DMA API with generic devices so we are
> always able to use the API (as you did with sbus_map_*). The majority of
> arch implementations safely call the bus specific DMA functions via the
> generic DMA API. So there are not many things to do. We can just
> convert pci_dma_* to the dma_* API slowly.
> 
> Opinions?

I have no problem with this.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-02-01 18:20                                                               ` Stephen Hemminger
                                                                                   ` (3 preceding siblings ...)
  2010-02-01 21:42                                                                 ` [PATCH v3b resent] sky2: Fix transmit dma mapping handling Jarek Poplawski
@ 2010-02-03  4:07                                                                 ` Michael Breuer
  2010-02-03 16:47                                                                   ` Michael Breuer
  4 siblings, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2010-02-03  4:07 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jarek Poplawski, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 2/1/2010 1:20 PM, Stephen Hemminger wrote:
> This fixes the fact that re->flags is always zero without causing
> other confusion.
>
> --- a/drivers/net/sky2.c	2010-02-01 10:07:42.676296236 -0800
> +++ b/drivers/net/sky2.c	2010-02-01 10:18:12.575044064 -0800
> @@ -1025,11 +1025,8 @@ static void sky2_prefetch_init(struct sk
>   static inline struct sky2_tx_le *get_tx_le(struct sky2_port *sky2, u16 *slot)
>   {
>   	struct sky2_tx_le *le = sky2->tx_le + *slot;
> -	struct tx_ring_info *re = sky2->tx_ring + *slot;
>
>   	*slot = RING_NEXT(*slot, sky2->tx_ring_size);
> -	re->flags = 0;
> -	re->skb = NULL;
>   	le->ctrl = 0;
>   	return le;
>   }
> @@ -1622,8 +1619,7 @@ static unsigned tx_le_req(const struct s
>   	return count;
>   }
>
> -static void sky2_tx_unmap(struct pci_dev *pdev,
> -			  const struct tx_ring_info *re)
> +static void sky2_tx_unmap(struct pci_dev *pdev, struct tx_ring_info *re)
>   {
>   	if (re->flags & TX_MAP_SINGLE)
>   		pci_unmap_single(pdev, pci_unmap_addr(re, mapaddr),
> @@ -1633,6 +1629,7 @@ static void sky2_tx_unmap(struct pci_dev
>   		pci_unmap_page(pdev, pci_unmap_addr(re, mapaddr),
>   			       pci_unmap_len(re, maplen),
>   			       PCI_DMA_TODEVICE);
> +	re->flags = 0;
>   }
>
>   /*
> @@ -1839,6 +1836,7 @@ static void sky2_tx_complete(struct sky2
>   			dev->stats.tx_packets++;
>   			dev->stats.tx_bytes += skb->len;
>
> +			re->skb = NULL;
>   			dev_kfree_skb_any(skb);
>
>   			sky2->tx_next = RING_NEXT(idx, sky2->tx_ring_size);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>    
Just a brief update - this has been up and stable for about 32 hours - 
I've been periodically generating load on the system. No kernel errors 
of any sort so far. Actually, in retrospect, I believe the dma issue was 
triggering other bad things - including an rcu lockup (patched in tip - 
sched.c).

Just as an FYI (and this should probably be in a new thread): I am
seeing a large number (>9,000,000) of dropped rx packets; however, at
this time I see no errors resulting from that (on this or client
machines). As the number of dropped packets hasn't incremented at any
time while I was observing things, I can't say what this is about.
Probably nothing, but I'll see if I can track down what is going on. I
did see some of this earlier on while troubleshooting the sky2 issues
that now seem resolved. Quick crosschecking of other machines does not
show high error or retransmission rates. I'm also not seeing any
evidence of other errors (no errors reported by ifconfig, ethtool, or
printk; debug is enabled).

I'm wondering whether these dropped packets are due mostly to hitting
GMR_FS_RX_OK in sky2_receive. I'm also guessing that the high number of
these that I'm seeing is an artifact of being able to pump more traffic
through with the above patch. Given the description of the status code
in sky2.h (receive ok), I'm wondering a) whether this should be reported
as dropped, b) whether resubmit is necessary, c) whether it's possible
that eth1 events coinciding with eth0 events are the cause, and d)
whether or not there's another issue entirely.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-02-03  4:07                                                                 ` [PATCH] sky2: receive dma mapping error handling Michael Breuer
@ 2010-02-03 16:47                                                                   ` Michael Breuer
  2010-02-03 16:56                                                                     ` Stephen Hemminger
  0 siblings, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2010-02-03 16:47 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jarek Poplawski, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 02/02/2010 11:07 PM, Michael Breuer wrote:
> Just a brief update - this has been up and stable for about 32 hours - 
> I've been periodically generating load on the system. No kernel errors 
> of any sort so far. Actually, in retrospect, I believe the dma issue 
> was triggering other bad things - including an rcu lockup (patched in 
> tip - sched.c).
>
> Just as an FYI (and this should probably be in a new thread): I am
> seeing a large number (>9,000,000) of dropped rx packets; however, at
> this time I see no errors resulting from that (on this or client
> machines). As the number of dropped packets hasn't incremented at any
> time while I was observing things, I can't say what this is about.
> Probably nothing, but I'll see if I can track down what is going on. I
> did see some of this earlier on while troubleshooting the sky2 issues
> that now seem resolved. Quick crosschecking of other machines does not
> show high error or retransmission rates. I'm also not seeing any
> evidence of other errors (no errors reported by ifconfig, ethtool, or
> printk; debug is enabled).
>
> I'm wondering whether these dropped packets are due mostly to hitting
> GMR_FS_RX_OK in sky2_receive. I'm also guessing that the high number
> of these that I'm seeing is an artifact of being able to pump more
> traffic through with the above patch. Given the description of the
> status code in sky2.h (receive ok), I'm wondering a) whether this
> should be reported as dropped, b) whether resubmit is necessary, c)
> whether it's possible that eth1 events coinciding with eth0 events are
> the cause, and d) whether or not there's another issue entirely.
>
Tracked this down. The status being returned is 0x3c0080 - good flow 
control packets. Nothing is actually being dropped (confirmed by packet 
trace on switch compared with packet trace on server).

I whipped up a trivial patch to not count these as dropped packets and 
will post to netdev.

I'm not really sure what the driver should be doing in this case, but 
resubmit seems to work.
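
To make the status value above easier to read, here is a small userspace
decoder. It is a sketch under stated assumptions: the bit positions are
modelled on the GMR_FS_* definitions in sky2.h as I recall them (length
in the upper half, bit 8 receive ok, bit 7 good flow control), the FS_*
names are shortened for the example, and classify() is a hypothetical
policy, not what the driver or the pending patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, modelled on the GMR_FS_* bits in sky2.h. */
#define FS_LEN_SHIFT 16
#define FS_LEN_MASK  0x7fffu
#define FS_RX_OK     (1u << 8)   /* receive ok (good packet) */
#define FS_GOOD_FC   (1u << 7)   /* good flow-control packet */

enum rx_kind { RX_DATA, RX_FLOW_CTRL, RX_ERROR };

static unsigned rx_length(uint32_t status)
{
	return (status >> FS_LEN_SHIFT) & FS_LEN_MASK;
}

/* Flow-control frames are reported separately instead of as errors. */
static enum rx_kind classify(uint32_t status)
{
	if (status & FS_GOOD_FC)
		return RX_FLOW_CTRL;
	if (status & FS_RX_OK)
		return RX_DATA;
	return RX_ERROR;
}

/* The value reported in this thread: a 60-byte good flow-control frame. */
void check_thread_value(void)
{
	assert(rx_length(0x3c0080) == 60);
	assert(classify(0x3c0080) == RX_FLOW_CTRL);
}
```

Under that layout, 0x3c0080 is a 60-byte frame with only the good
flow-control bit set, which agrees with nothing actually being dropped.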

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-02-03 16:47                                                                   ` Michael Breuer
@ 2010-02-03 16:56                                                                     ` Stephen Hemminger
  2010-02-03 17:07                                                                       ` Michael Breuer
  2010-02-03 17:16                                                                       ` [PATCH] sky2: receive dma mapping error handling Justin P. Mattock
  0 siblings, 2 replies; 99+ messages in thread
From: Stephen Hemminger @ 2010-02-03 16:56 UTC (permalink / raw)
  To: Michael Breuer
  Cc: Jarek Poplawski, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On Wed, 03 Feb 2010 11:47:19 -0500
Michael Breuer <mbreuer@majjas.com> wrote:

> Tracked this down. The status being returned is 0x3c0080 - good flow 
> control packets. Nothing is actually being dropped (confirmed by packet 
> trace on switch compared with packet trace on server).
> 
> I whipped up a trivial patch to not count these as dropped packets and 
> will post to netdev.
> 
> I'm not really sure what the driver should be doing in this case, but 
> resubmit seems to work.

Looks like a flow control negotiation issue. You probably turned off
flow control on the Linux side, but the switch is still doing flow
control.

-- 

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-02-03 16:56                                                                     ` Stephen Hemminger
@ 2010-02-03 17:07                                                                       ` Michael Breuer
  2010-02-03 18:23                                                                         ` Justin P. Mattock
  2010-11-06 16:57                                                                         ` Sky2 2.6.36-09934-g2aab243 DMAR error with tcp timestamp enabled Michael Breuer
  2010-02-03 17:16                                                                       ` [PATCH] sky2: receive dma mapping error handling Justin P. Mattock
  1 sibling, 2 replies; 99+ messages in thread
From: Michael Breuer @ 2010-02-03 17:07 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jarek Poplawski, David Miller, akpm, flyboy, linux-kernel,
	netdev, Michael Chan, Don Fry, Francois Romieu, Matt Carlson

On 02/03/2010 11:56 AM, Stephen Hemminger wrote:
> On Wed, 03 Feb 2010 11:47:19 -0500
> Michael Breuer <mbreuer@majjas.com> wrote:
>
>    
>> Tracked this down. The status being returned is 0x3c0080 - good flow
>> control packets. Nothing is actually being dropped (confirmed by packet
>> trace on switch compared with packet trace on server).
>>
>> I whipped up a trivial patch to not count these as dropped packets and
>> will post to netdev.
>>
>> I'm not really sure what the driver should be doing in this case, but
>> resubmit seems to work.
>>      
> Looks like a flow control negotiation issue. You probably turned off
> flow control on the Linux side, but the switch is still doing flow
> control.
>
>    
According to the driver:
Feb  3 12:03:02 mail kernel: sky2 eth0: Link is up at 1000 Mbps, full 
duplex, flow control both

So if the rx flow control packet status is due to flow control being 
disabled, then there's a different issue.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-02-03 16:56                                                                     ` Stephen Hemminger
  2010-02-03 17:07                                                                       ` Michael Breuer
@ 2010-02-03 17:16                                                                       ` Justin P. Mattock
  1 sibling, 0 replies; 99+ messages in thread
From: Justin P. Mattock @ 2010-02-03 17:16 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Michael Breuer, Jarek Poplawski, David Miller, akpm, flyboy,
	linux-kernel, netdev, Michael Chan, Don Fry, Francois Romieu,
	Matt Carlson

On 02/03/10 08:56, Stephen Hemminger wrote:
> On Wed, 03 Feb 2010 11:47:19 -0500
> Michael Breuer <mbreuer@majjas.com> wrote:
>
>> Tracked this down. The status being returned is 0x3c0080 - good flow
>> control packets. Nothing is actually being dropped (confirmed by packet
>> trace on switch compared with packet trace on server).
>>
>> I whipped up a trivial patch to not count these as dropped packets and
>> will post to netdev.
>>
>> I'm not really sure what the driver should be doing in this case, but
>> resubmit seems to work.
>
> Looks like a flow control negotiation issue. You probably turned off
> flow control on the Linux side, but the switch is still doing flow
> control.
>

I noticed this in a hotel last week. I can try
at home to see if this fires off; if so,
I can test any patches you have.

Justin P. Mattock

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-02-03 17:07                                                                       ` Michael Breuer
@ 2010-02-03 18:23                                                                         ` Justin P. Mattock
  2010-02-03 18:25                                                                           ` Stephen Hemminger
  2010-11-06 16:57                                                                         ` Sky2 2.6.36-09934-g2aab243 DMAR error with tcp timestamp enabled Michael Breuer
  1 sibling, 1 reply; 99+ messages in thread
From: Justin P. Mattock @ 2010-02-03 18:23 UTC (permalink / raw)
  To: Michael Breuer
  Cc: Stephen Hemminger, Jarek Poplawski, David Miller, akpm, flyboy,
	linux-kernel, netdev, Michael Chan, Don Fry, Francois Romieu,
	Matt Carlson

On 02/03/10 09:07, Michael Breuer wrote:
> On 02/03/2010 11:56 AM, Stephen Hemminger wrote:
>> On Wed, 03 Feb 2010 11:47:19 -0500
>> Michael Breuer<mbreuer@majjas.com> wrote:
>>
>>> Tracked this down. The status being returned is 0x3c0080 - good flow
>>> control packets. Nothing is actually being dropped (confirmed by packet
>>> trace on switch compared with packet trace on server).
>>>
>>> I whipped up a trivial patch to not count these as dropped packets and
>>> will post to netdev.
>>>
>>> I'm not really sure what the driver should be doing in this case, but
>>> resubmit seems to work.
>> Looks like a flow control negotiation issue. You probably turned off
>> flow control on the Linux side, but the switch is still doing flow
>> control.
>>
> According to the driver:
> Feb 3 12:03:02 mail kernel: sky2 eth0: Link is up at 1000 Mbps, full
> duplex, flow control both
>
> So if the rx flow control packet status is due to flow control being
> disabled, then there's a different issue.
>
>
>


hmm.. after an hour or so I'm not seeing anything.
from what I remember I turned the machine on in the
hotel, then left the system there as I went out for a few hours
(so maybe I need to wait).

Anyways I did keep dmesg of when this occurred, basically
the log was spammed with these:

[  863.294057] sky2 eth0: rx error, status 0x580002 length 88
[  865.646645] sky2 eth0: rx error, status 0x600002 length 96
[ 1286.420471] sky2 eth0: rx error, status 0x600002 length 96
[ 1286.499459] sky2 eth0: rx error, status 0x600002 length 96
[ 1746.903826] sky2 eth0: rx error, status 0x600002 length 96
[ 1754.263692] sky2 eth0: rx error, status 0x600002 length 96
[ 1755.309360] sky2 eth0: rx error, status 0x680002 length 104
[ 2213.256294] sky2 eth0: rx error, status 0x600002 length 96
[ 2219.653342] sky2 eth0: rx error, status 0x580002 length 88
[ 2221.673601] sky2 eth0: rx error, status 0x600002 length 96
[ 2679.654655] sky2 eth0: rx error, status 0x680002 length 104
[ 2692.315058] sky2 eth0: rx error, status 0x500002 length 80
[ 2694.349612] sky2 eth0: rx error, status 0x580002 length 88
[ 2703.676717] sky2 eth0: rx error, status 0x700002 length 112
[ 2703.826375] sky2 eth0: rx error, status 0x600002 length 96
[ 3187.504843] sky2 eth0: rx error, status 0x600002 length 96
[ 3189.560744] sky2 eth0: rx error, status 0x600002 length 96
[ 3672.475719] sky2 eth0: rx error, status 0x680002 length 104
[ 3676.696959] sky2 eth0: rx error, status 0x680002 length 104

but while using the system with this, I didn't notice anything
out of the ordinary.
(if this fires off I can try a bisect for you guys, but right now
since I'm not seeing anything, might be a different story);

Justin P. Mattock

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-02-03 18:23                                                                         ` Justin P. Mattock
@ 2010-02-03 18:25                                                                           ` Stephen Hemminger
  2010-02-03 18:48                                                                             ` Justin P. Mattock
  0 siblings, 1 reply; 99+ messages in thread
From: Stephen Hemminger @ 2010-02-03 18:25 UTC (permalink / raw)
  To: Justin P. Mattock
  Cc: Michael Breuer, Jarek Poplawski, David Miller, akpm, flyboy,
	linux-kernel, netdev, Michael Chan, Don Fry, Francois Romieu,
	Matt Carlson

On Wed, 03 Feb 2010 10:23:01 -0800
"Justin P. Mattock" <justinmattock@gmail.com> wrote:

> On 02/03/10 09:07, Michael Breuer wrote:
> > On 02/03/2010 11:56 AM, Stephen Hemminger wrote:
> >> On Wed, 03 Feb 2010 11:47:19 -0500
> >> Michael Breuer<mbreuer@majjas.com> wrote:
> >>
> >>> Tracked this down. The status being returned is 0x3c0080 - good flow
> >>> control packets. Nothing is actually being dropped (confirmed by packet
> >>> trace on switch compared with packet trace on server).
> >>>
> >>> I whipped up a trivial patch to not count these as dropped packets and
> >>> will post to netdev.
> >>>
> >>> I'm not really sure what the driver should be doing in this case, but
> >>> resubmit seems to work.
> >> Looks like a flow control negotiation issue. You probably turned off
> >> flow control on the Linux side, but the switch is still doing flow
> >> control.
> >>
> > According to the driver:
> > Feb 3 12:03:02 mail kernel: sky2 eth0: Link is up at 1000 Mbps, full
> > duplex, flow control both
> >
> > So if the rx flow control packet status is due to flow control being
> > disabled, then there's a different issue.
> >
> >
> >
> 
> 
> hmm.. after an hour or so I'm not seeing anything.
> from what I remember I turned the machine on in the
> hotel, then left the system there as I went out for a few hours
> (so maybe I need to wait).
> 
> Anyways I did keep dmesg of when this occurred, basically
> the log was spammed with these:
> 
> [  863.294057] sky2 eth0: rx error, status 0x580002 length 88
> [  865.646645] sky2 eth0: rx error, status 0x600002 length 96
> [ 1286.420471] sky2 eth0: rx error, status 0x600002 length 96
> [ 1286.499459] sky2 eth0: rx error, status 0x600002 length 96
> [ 1746.903826] sky2 eth0: rx error, status 0x600002 length 96
> [ 1754.263692] sky2 eth0: rx error, status 0x600002 length 96
> [ 1755.309360] sky2 eth0: rx error, status 0x680002 length 104
> [ 2213.256294] sky2 eth0: rx error, status 0x600002 length 96
> [ 2219.653342] sky2 eth0: rx error, status 0x580002 length 88
> [ 2221.673601] sky2 eth0: rx error, status 0x600002 length 96
> [ 2679.654655] sky2 eth0: rx error, status 0x680002 length 104
> [ 2692.315058] sky2 eth0: rx error, status 0x500002 length 80
> [ 2694.349612] sky2 eth0: rx error, status 0x580002 length 88
> [ 2703.676717] sky2 eth0: rx error, status 0x700002 length 112
> [ 2703.826375] sky2 eth0: rx error, status 0x600002 length 96
> [ 3187.504843] sky2 eth0: rx error, status 0x600002 length 96
> [ 3189.560744] sky2 eth0: rx error, status 0x600002 length 96
> [ 3672.475719] sky2 eth0: rx error, status 0x680002 length 104
> [ 3676.696959] sky2 eth0: rx error, status 0x680002 length 104
> 
> but while using the system with this, I didn't notice anything
> out of the ordinary.
> (if this fires off I can try a bisect for you guys, but right now
> since I'm not seeing anything, might be a different story);
> 

You were on a crappy hotel switch. Those are all CRC errors.
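
The log lines can be checked against that reading with a few lines of
userspace C. This is a sketch, assuming that bit 1 of the status word
flags a CRC error and that the upper half carries the frame length (both
consistent with the lengths printed in the log). FS_CRC_ERR, rx_log and
count_crc_errors() are names made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FS_CRC_ERR (1u << 1)    /* assumed: bit 1 flags a CRC error */

struct rx_log {
	uint32_t status;
	unsigned length;
};

/* Distinct status/length pairs copied from the dmesg excerpt. */
static const struct rx_log samples[] = {
	{ 0x580002,  88 },
	{ 0x600002,  96 },
	{ 0x680002, 104 },
	{ 0x500002,  80 },
	{ 0x700002, 112 },
};

/* Count entries with the CRC bit set, checking the length field too. */
static size_t count_crc_errors(const struct rx_log *log, size_t n)
{
	size_t i, hits = 0;

	for (i = 0; i < n; i++) {
		/* upper half of the status is the printed length */
		assert((log[i].status >> 16) == log[i].length);
		if (log[i].status & FS_CRC_ERR)
			hits++;
	}
	return hits;
}
```

Every sample has the low bits equal to 0x0002 and nothing else set below
the length field, so all five count as CRC errors under this layout.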



-- 

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH] sky2:  receive dma mapping error handling
  2010-02-03 18:25                                                                           ` Stephen Hemminger
@ 2010-02-03 18:48                                                                             ` Justin P. Mattock
  0 siblings, 0 replies; 99+ messages in thread
From: Justin P. Mattock @ 2010-02-03 18:48 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Michael Breuer, Jarek Poplawski, David Miller, akpm, flyboy,
	linux-kernel, netdev, Michael Chan, Don Fry, Francois Romieu,
	Matt Carlson

On 02/03/10 10:25, Stephen Hemminger wrote:
> On Wed, 03 Feb 2010 10:23:01 -0800
> "Justin P. Mattock"<justinmattock@gmail.com>  wrote:
>
>> On 02/03/10 09:07, Michael Breuer wrote:
>>> On 02/03/2010 11:56 AM, Stephen Hemminger wrote:
>>>> On Wed, 03 Feb 2010 11:47:19 -0500
>>>> Michael Breuer<mbreuer@majjas.com>  wrote:
>>>>
>>>>> Tracked this down. The status being returned is 0x3c0080 - good flow
>>>>> control packets. Nothing is actually being dropped (confirmed by packet
>>>>> trace on switch compared with packet trace on server).
>>>>>
>>>>> I whipped up a trivial patch to not count these as dropped packets and
>>>>> will post to netdev.
>>>>>
>>>>> I'm not really sure what the driver should be doing in this case, but
>>>>> resubmit seems to work.
>>>> Looks like a flow control negotiation issue. You probably turned off
>>>> flow control on the Linux side, but the switch is still doing flow
>>>> control.
>>>>
>>> According to the driver:
>>> Feb 3 12:03:02 mail kernel: sky2 eth0: Link is up at 1000 Mbps, full
>>> duplex, flow control both
>>>
>>> So if the rx flow control packet status is due to flow control being
>>> disabled, then there's a different issue.
>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>> Please read the FAQ at http://www.tux.org/lkml/
>>>
>>
>>
>> hmm.. after an hour or so I'm not seeing anything.
>> from what I remember I turned the machine on in the
>> hotel, then left the system there as I went out for a few hours
>> (so maybe I need to wait).
>>
>> Anyways I did keep dmesg of when this occurred, basically
>> the log was spammed with these:
>>
>> [  863.294057] sky2 eth0: rx error, status 0x580002 length 88
>> [  865.646645] sky2 eth0: rx error, status 0x600002 length 96
>> [ 1286.420471] sky2 eth0: rx error, status 0x600002 length 96
>> [ 1286.499459] sky2 eth0: rx error, status 0x600002 length 96
>> [ 1746.903826] sky2 eth0: rx error, status 0x600002 length 96
>> [ 1754.263692] sky2 eth0: rx error, status 0x600002 length 96
>> [ 1755.309360] sky2 eth0: rx error, status 0x680002 length 104
>> [ 2213.256294] sky2 eth0: rx error, status 0x600002 length 96
>> [ 2219.653342] sky2 eth0: rx error, status 0x580002 length 88
>> [ 2221.673601] sky2 eth0: rx error, status 0x600002 length 96
>> [ 2679.654655] sky2 eth0: rx error, status 0x680002 length 104
>> [ 2692.315058] sky2 eth0: rx error, status 0x500002 length 80
>> [ 2694.349612] sky2 eth0: rx error, status 0x580002 length 88
>> [ 2703.676717] sky2 eth0: rx error, status 0x700002 length 112
>> [ 2703.826375] sky2 eth0: rx error, status 0x600002 length 96
>> [ 3187.504843] sky2 eth0: rx error, status 0x600002 length 96
>> [ 3189.560744] sky2 eth0: rx error, status 0x600002 length 96
>> [ 3672.475719] sky2 eth0: rx error, status 0x680002 length 104
>> [ 3676.696959] sky2 eth0: rx error, status 0x680002 length 104
>>
>> but while using the system with this, I didn't notice anything
>> out of the ordinary.
>> (if this fires off I can try a bisect for you guys, but right now
>> since I'm not seeing anything, might be a different story);
>>
>
> You were on a crappy hotel switch. Those are all CRC errors.
>
>
>


alright.. makes sense cause I'm not getting
any of this now at home.

Justin P. Mattock


* Sky2 2.6.36-09934-g2aab243 DMAR error with tcp timestamp enabled
  2010-02-03 17:07                                                                       ` Michael Breuer
  2010-02-03 18:23                                                                         ` Justin P. Mattock
@ 2010-11-06 16:57                                                                         ` Michael Breuer
  2010-11-08  3:13                                                                           ` Stephen Hemminger
  1 sibling, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2010-11-06 16:57 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Jarek Poplawski, David Miller, netdev

Basically, if I enable tcp timestamps (now disabled) I get a sky2 hang. 
As with the earlier issue the effects are not seen until after a couple 
days of uptime and seem exacerbated by load.

I can't 100% confirm that the problem is not occurring without tcp 
timestamps, but will leave the system up for a while to try to confirm. 
This didn't occur previously without tcp timestamps enabled, but I also 
pulled git changes between the two events.

I'm now also on 2.6.37-rc1.... I did a quick scan and didn't see any 
obvious commits between 2.6.36-09934 and -rc1 that would have affected this.

From the log:
Nov  2 05:41:54 mail kernel: DRHD: handling fault status reg 2
Nov  2 05:41:54 mail kernel: DMAR:[DMA Read] Request device [06:00.0] fault addr ffea3000
Nov  2 05:41:54 mail kernel: DMAR:[fault reason 06] PTE Read access is not set
Nov  2 05:41:54 mail kernel: sky2 0000:06:00.0: error interrupt status=0x80000000
Nov  2 05:41:54 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
Nov  2 05:42:01 mail clamd[9755]: SelfCheck: Database status OK.
Nov  2 05:42:11 mail root: ping of potter failed
Nov  2 05:42:16 mail kernel: ------------[ cut here ]------------
Nov  2 05:42:16 mail kernel: WARNING: at net/sched/sch_generic.c:258 dev_watchdog+0x251/0x260()
Nov  2 05:42:16 mail kernel: Hardware name: System Product Name
Nov  2 05:42:16 mail kernel: NETDEV WATCHDOG: eth0 (sky2): transmit queue 0 timed out
Nov  2 05:42:16 mail kernel: Modules linked in: cpufreq_stats 
ip6table_filter ip6table_mangle ip6_tables ipt_MASQUERADE iptable_nat 
nf_nat iptable_mangle iptable_raw ebtable_nat ebtables bridge stp 
appletalk psnap llc nfsd lockd nfs_acl auth_rpcgss exportfs coretemp 
sunrpc acpi_cpufreq mperf sit tunnel4 ipt_LOG nf_conntrack_netbios_ns 
nf_conntrack_ftp xt_DSCP xt_dscp xt_mark nf_conntrack_ipv6 
nf_defrag_ipv6 xt_state xt_multiport ipv6 kvm_intel kvm 
snd_hda_codec_analog snd_ens1371 gameport snd_rawmidi snd_ac97_codec 
snd_hda_intel snd_hda_codec ac97_bus snd_hwdep snd_seq snd_seq_device 
snd_pcm gspca_spca505 gspca_main snd_timer videodev snd v4l1_compat 
i2c_i801 sky2 v4l2_compat_ioctl32 iTCO_wdt pcspkr asus_atk0110 
i7core_edac edac_core soundcore iTCO_vendor_support snd_page_alloc 
microcode raid456 async_raid6_recov async_pq raid6_pq async_xor xor 
async_memcpy async_tx raid1 ata_generic firewire_ohci pata_acpi 
firewire_core crc_itu_t pata_marvell nouveau ttm drm_kms_helper drm 
i2c_algo_bit i2c_core video output [
Nov  2 05:42:16 mail kernel: last unloaded: ip6_tables]
Nov  2 05:42:16 mail kernel: Pid: 0, comm: swapper Tainted: G        W   2.6.36-09934-g2aab243 #44
Nov  2 05:42:16 mail kernel: Call Trace:
Nov  2 05:42:16 mail kernel: <IRQ>  [<ffffffff81058a4f>] warn_slowpath_common+0x7f/0xc0
Nov  2 05:42:16 mail kernel: [<ffffffff81058b46>] warn_slowpath_fmt+0x46/0x50
Nov  2 05:42:16 mail kernel: [<ffffffff814603d1>] dev_watchdog+0x251/0x260
Nov  2 05:42:16 mail kernel: [<ffffffff8108a4a6>] ? tick_program_event+0x26/0x30
Nov  2 05:42:16 mail kernel: [<ffffffff8107eed4>] ? hrtimer_interrupt+0x134/0x240
Nov  2 05:42:16 mail kernel: [<ffffffff81068ab0>] run_timer_softirq+0x160/0x390
Nov  2 05:42:16 mail kernel: [<ffffffff8108a368>] ? tick_dev_program_event+0x48/0x110
Nov  2 05:42:16 mail kernel: [<ffffffff81460180>] ? dev_watchdog+0x0/0x260
Nov  2 05:42:16 mail kernel: [<ffffffff8105f981>] __do_softirq+0xb1/0x220
Nov  2 05:42:16 mail kernel: [<ffffffff8100cfdc>] call_softirq+0x1c/0x30
Nov  2 05:42:16 mail kernel: [<ffffffff8100ea15>] do_softirq+0x65/0xa0
Nov  2 05:42:16 mail kernel: [<ffffffff8105f845>] irq_exit+0x85/0x90
Nov  2 05:42:16 mail kernel: [<ffffffff81511d61>] do_IRQ+0x71/0xf0
Nov  2 05:42:16 mail kernel: [<ffffffff8150a7d3>] ret_from_intr+0x0/0x11
Nov  2 05:42:16 mail kernel: <EOI>  [<ffffffff812e4165>] ? intel_idle+0xd5/0x170
Nov  2 05:42:16 mail kernel: [<ffffffff812e4148>] ? intel_idle+0xb8/0x170
Nov  2 05:42:16 mail kernel: [<ffffffff81425b51>] cpuidle_idle_call+0x91/0x150
Nov  2 05:42:16 mail kernel: [<ffffffff8100aa8b>] cpu_idle+0xbb/0x150
Nov  2 05:42:16 mail kernel: [<ffffffff814f1785>] rest_init+0x75/0x80
Nov  2 05:42:16 mail kernel: [<ffffffff81b4ae9b>] start_kernel+0x3dc/0x3e7
Nov  2 05:42:16 mail kernel: [<ffffffff81b4a346>] x86_64_start_reservations+0x131/0x135
Nov  2 05:42:16 mail kernel: [<ffffffff81b4a450>] x86_64_start_kernel+0x106/0x115
Nov  2 05:42:16 mail kernel: ---[ end trace d9d3a1889f8925bf ]---
Nov  2 05:42:16 mail kernel: sky2 0000:06:00.0: eth0: tx timeout
Nov  2 05:42:16 mail kernel: sky2 0000:06:00.0: eth0: transmit ring 29 .. 117 report=29 done=29



* Re: Sky2 2.6.36-09934-g2aab243 DMAR error with tcp timestamp enabled
  2010-11-06 16:57                                                                         ` Sky2 2.6.36-09934-g2aab243 DMAR error with tcp timestamp enabled Michael Breuer
@ 2010-11-08  3:13                                                                           ` Stephen Hemminger
  2010-11-08  3:38                                                                             ` Michael Breuer
  0 siblings, 1 reply; 99+ messages in thread
From: Stephen Hemminger @ 2010-11-08  3:13 UTC (permalink / raw)
  To: Michael Breuer; +Cc: Stephen Hemminger, Jarek Poplawski, David Miller, netdev

On Sat, 06 Nov 2010 12:57:53 -0400
Michael Breuer <mbreuer@majjas.com> wrote:

> Basically, if I enable tcp timestamps (now disabled) I get a sky2 hang. 
> As with the earlier issue the effects are not seen until after a couple 
> days of uptime and seem exacerbated by load.
> 
> I can't 100% confirm that the problem is not occurring without tcp 
> timestamps, but will leave the system up for a while to try to confirm. 
> This didn't occur previously without tcp timestamps enabled, but I also 
> pulled git changes between the two events.
> 
> I'm now also on 2.6.37-rc1.... I did a quick scan and didn't see any 
> obvious commits between 2.6.36-09934 and -rc1 that would have affected this.
> 
>  From the log:
> Nov  2 05:41:54 mail kernel: DRHD: handling fault status reg 2
> Nov  2 05:41:54 mail kernel: DMAR:[DMA Read] Request device [06:00.0] 
> fault addr ffea3000
> Nov  2 05:41:54 mail kernel: DMAR:[fault reason 06] PTE Read access is 
> not set
> Nov  2 05:41:54 mail kernel: sky2 0000:06:00.0: error interrupt 
> status=0x80000000
> Nov  2 05:41:54 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
> Nov  2 05:42:01 mail clamd[9755]: SelfCheck: Database status OK.
> Nov  2 05:42:11 mail root: ping of potter failed
> Nov  2 05:42:16 mail kernel: ------------[ cut here ]------------
> Nov  2 05:42:16 mail kernel: WARNING: at net/sched/sch_generic.c:258 
> dev_watchdog+0x251/0x260()
> Nov  2 05:42:16 mail kernel: Hardware name: System Product Name
> Nov  2 05:42:16 mail kernel: NETDEV WATCHDOG: eth0 (sky2): transmit 
> queue 0 timed out
> Nov  2 05:42:16 mail kernel: Modules linked in: cpufreq_stats 
> ip6table_filter ip6table_mangle ip6_tables ipt_MASQUERADE iptable_nat 
> nf_nat iptable_mangle iptable_raw ebtable_nat ebtables bridge stp 
> appletalk psnap llc nfsd lockd nfs_acl auth_rpcgss exportfs coretemp 
> sunrpc acpi_cpufreq mperf sit tunnel4 ipt_LOG nf_conntrack_netbios_ns 
> nf_conntrack_ftp xt_DSCP xt_dscp xt_mark nf_conntrack_ipv6 
> nf_defrag_ipv6 xt_state xt_multiport ipv6 kvm_intel kvm 
> snd_hda_codec_analog snd_ens1371 gameport snd_rawmidi snd_ac97_codec 
> snd_hda_intel snd_hda_codec ac97_bus snd_hwdep snd_seq snd_seq_device 
> snd_pcm gspca_spca505 gspca_main snd_timer videodev snd v4l1_compat 
> i2c_i801 sky2 v4l2_compat_ioctl32 iTCO_wdt pcspkr asus_atk0110 
> i7core_edac edac_core soundcore iTCO_vendor_support snd_page_alloc 
> microcode raid456 async_raid6_recov async_pq raid6_pq async_xor xor 
> async_memcpy async_tx raid1 ata_generic firewire_ohci pata_acpi 
> firewire_core crc_itu_t pata_marvell nouveau ttm drm_kms_helper drm 
> i2c_algo_bit i2c_core video output [
> Nov  2 05:42:16 mail kernel: last unloaded: ip6_tables]
> Nov  2 05:42:16 mail kernel: Pid: 0, comm: swapper Tainted: G        W   
> 2.6.36-09934-g2aab243 #44
> Nov  2 05:42:16 mail kernel: Call Trace:
> Nov  2 05:42:16 mail kernel: <IRQ>  [<ffffffff81058a4f>] 
> warn_slowpath_common+0x7f/0xc0
> Nov  2 05:42:16 mail kernel: [<ffffffff81058b46>] 
> warn_slowpath_fmt+0x46/0x50
> Nov  2 05:42:16 mail kernel: [<ffffffff814603d1>] dev_watchdog+0x251/0x260
> Nov  2 05:42:16 mail kernel: [<ffffffff8108a4a6>] ? 
> tick_program_event+0x26/0x30
> Nov  2 05:42:16 mail kernel: [<ffffffff8107eed4>] ? 
> hrtimer_interrupt+0x134/0x240
> Nov  2 05:42:16 mail kernel: [<ffffffff81068ab0>] 
> run_timer_softirq+0x160/0x390
> Nov  2 05:42:16 mail kernel: [<ffffffff8108a368>] ? 
> tick_dev_program_event+0x48/0x110
> Nov  2 05:42:16 mail kernel: [<ffffffff81460180>] ? dev_watchdog+0x0/0x260
> Nov  2 05:42:16 mail kernel: [<ffffffff8105f981>] __do_softirq+0xb1/0x220
> Nov  2 05:42:16 mail kernel: [<ffffffff8100cfdc>] call_softirq+0x1c/0x30
> Nov  2 05:42:16 mail kernel: [<ffffffff8100ea15>] do_softirq+0x65/0xa0
> Nov  2 05:42:16 mail kernel: [<ffffffff8105f845>] irq_exit+0x85/0x90
> Nov  2 05:42:16 mail kernel: [<ffffffff81511d61>] do_IRQ+0x71/0xf0
> Nov  2 05:42:16 mail kernel: [<ffffffff8150a7d3>] ret_from_intr+0x0/0x11
> Nov  2 05:42:16 mail kernel: <EOI>  [<ffffffff812e4165>] ? 
> intel_idle+0xd5/0x170
> Nov  2 05:42:16 mail kernel: [<ffffffff812e4148>] ? intel_idle+0xb8/0x170
> Nov  2 05:42:16 mail kernel: [<ffffffff81425b51>] 
> cpuidle_idle_call+0x91/0x150
> Nov  2 05:42:16 mail kernel: [<ffffffff8100aa8b>] cpu_idle+0xbb/0x150
> Nov  2 05:42:16 mail kernel: [<ffffffff814f1785>] rest_init+0x75/0x80
> Nov  2 05:42:16 mail kernel: [<ffffffff81b4ae9b>] start_kernel+0x3dc/0x3e7
> Nov  2 05:42:16 mail kernel: [<ffffffff81b4a346>] 
> x86_64_start_reservations+0x131/0x135
> Nov  2 05:42:16 mail kernel: [<ffffffff81b4a450>] 
> x86_64_start_kernel+0x106/0x115
> Nov  2 05:42:16 mail kernel: ---[ end trace d9d3a1889f8925bf ]---
> Nov  2 05:42:16 mail kernel: sky2 0000:06:00.0: eth0: tx timeout
> Nov  2 05:42:16 mail kernel: sky2 0000:06:00.0: eth0: transmit ring 29 
> .. 117 report=29 done=29
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Looks like a hardware issue, never saw it before.
Are you running MTU > 1500?
Does turning off TSO help?

One possibility is that NET_IP_ALIGN changed. Now the ethernet header is
aligned and the IP header is not.
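
To make the alignment point concrete, here is a small arithmetic sketch. It assumes the receive buffer itself starts on a 4-byte boundary and that NET_IP_ALIGN is simply the number of pad bytes reserved in front of the Ethernet header; header_offsets and is_aligned are hypothetical helper names used only for illustration.

```python
ETH_HLEN = 14  # Ethernet header: 6 (dst) + 6 (src) + 2 (ethertype) bytes


def header_offsets(net_ip_align):
    """Return (ethernet_offset, ip_offset) relative to an aligned buffer start."""
    eth_offset = net_ip_align          # pad bytes reserved before the frame
    ip_offset = eth_offset + ETH_HLEN  # IP header follows the Ethernet header
    return eth_offset, ip_offset


def is_aligned(offset, boundary=4):
    """True if the offset falls on the given byte boundary."""
    return offset % boundary == 0


if __name__ == "__main__":
    for pad in (2, 0):
        eth, ip = header_offsets(pad)
        print(f"NET_IP_ALIGN={pad}: ethernet at {eth} "
              f"(aligned={is_aligned(eth)}), ip at {ip} (aligned={is_aligned(ip)})")
```

With a 2-byte pad the IP header lands on a 4-byte boundary and the Ethernet header does not; with no pad the situation is reversed, which is the case described above.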




* Re: Sky2 2.6.36-09934-g2aab243 DMAR error with tcp timestamp enabled
  2010-11-08  3:13                                                                           ` Stephen Hemminger
@ 2010-11-08  3:38                                                                             ` Michael Breuer
  2010-11-08 16:46                                                                               ` Stephen Hemminger
  0 siblings, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2010-11-08  3:38 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Stephen Hemminger, Jarek Poplawski, David Miller, netdev

On 11/7/2010 10:13 PM, Stephen Hemminger wrote:
> On Sat, 06 Nov 2010 12:57:53 -0400
> Michael Breuer<mbreuer@majjas.com>  wrote:
>
>> Basically, if I enable tcp timestamps (now disabled) I get a sky2 hang.
>> As with the earlier issue the effects are not seen until after a couple
>> days of uptime and seem exacerbated by load.
>>
>> I can't 100% confirm that the problem is not occurring without tcp
>> timestamps, but will leave the system up for a while to try to confirm.
>> This didn't occur previously without tcp timestamps enabled, but I also
>> pulled git changes between the two events.
>>
>> I'm now also on 2.6.37-rc1.... I did a quick scan and didn't see any
>> obvious commits between 2.6.36-09934 and -rc1 that would have affected this.
>>
>>   From the log:
>> Nov  2 05:41:54 mail kernel: DRHD: handling fault status reg 2
>> Nov  2 05:41:54 mail kernel: DMAR:[DMA Read] Request device [06:00.0]
>> fault addr ffea3000
>> Nov  2 05:41:54 mail kernel: DMAR:[fault reason 06] PTE Read access is
>> not set
>> Nov  2 05:41:54 mail kernel: sky2 0000:06:00.0: error interrupt
>> status=0x80000000
>> Nov  2 05:41:54 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
>> Nov  2 05:42:01 mail clamd[9755]: SelfCheck: Database status OK.
>> Nov  2 05:42:11 mail root: ping of potter failed
>> Nov  2 05:42:16 mail kernel: ------------[ cut here ]------------
>> Nov  2 05:42:16 mail kernel: WARNING: at net/sched/sch_generic.c:258
>> dev_watchdog+0x251/0x260()
>> Nov  2 05:42:16 mail kernel: Hardware name: System Product Name
>> Nov  2 05:42:16 mail kernel: NETDEV WATCHDOG: eth0 (sky2): transmit
>> queue 0 timed out
>> Nov  2 05:42:16 mail kernel: Modules linked in: cpufreq_stats
>> ip6table_filter ip6table_mangle ip6_tables ipt_MASQUERADE iptable_nat
>> nf_nat iptable_mangle iptable_raw ebtable_nat ebtables bridge stp
>> appletalk psnap llc nfsd lockd nfs_acl auth_rpcgss exportfs coretemp
>> sunrpc acpi_cpufreq mperf sit tunnel4 ipt_LOG nf_conntrack_netbios_ns
>> nf_conntrack_ftp xt_DSCP xt_dscp xt_mark nf_conntrack_ipv6
>> nf_defrag_ipv6 xt_state xt_multiport ipv6 kvm_intel kvm
>> snd_hda_codec_analog snd_ens1371 gameport snd_rawmidi snd_ac97_codec
>> snd_hda_intel snd_hda_codec ac97_bus snd_hwdep snd_seq snd_seq_device
>> snd_pcm gspca_spca505 gspca_main snd_timer videodev snd v4l1_compat
>> i2c_i801 sky2 v4l2_compat_ioctl32 iTCO_wdt pcspkr asus_atk0110
>> i7core_edac edac_core soundcore iTCO_vendor_support snd_page_alloc
>> microcode raid456 async_raid6_recov async_pq raid6_pq async_xor xor
>> async_memcpy async_tx raid1 ata_generic firewire_ohci pata_acpi
>> firewire_core crc_itu_t pata_marvell nouveau ttm drm_kms_helper drm
>> i2c_algo_bit i2c_core video output [
>> Nov  2 05:42:16 mail kernel: last unloaded: ip6_tables]
>> Nov  2 05:42:16 mail kernel: Pid: 0, comm: swapper Tainted: G        W
>> 2.6.36-09934-g2aab243 #44
>> Nov  2 05:42:16 mail kernel: Call Trace:
>> Nov  2 05:42:16 mail kernel:<IRQ>   [<ffffffff81058a4f>]
>> warn_slowpath_common+0x7f/0xc0
>> Nov  2 05:42:16 mail kernel: [<ffffffff81058b46>]
>> warn_slowpath_fmt+0x46/0x50
>> Nov  2 05:42:16 mail kernel: [<ffffffff814603d1>] dev_watchdog+0x251/0x260
>> Nov  2 05:42:16 mail kernel: [<ffffffff8108a4a6>] ?
>> tick_program_event+0x26/0x30
>> Nov  2 05:42:16 mail kernel: [<ffffffff8107eed4>] ?
>> hrtimer_interrupt+0x134/0x240
>> Nov  2 05:42:16 mail kernel: [<ffffffff81068ab0>]
>> run_timer_softirq+0x160/0x390
>> Nov  2 05:42:16 mail kernel: [<ffffffff8108a368>] ?
>> tick_dev_program_event+0x48/0x110
>> Nov  2 05:42:16 mail kernel: [<ffffffff81460180>] ? dev_watchdog+0x0/0x260
>> Nov  2 05:42:16 mail kernel: [<ffffffff8105f981>] __do_softirq+0xb1/0x220
>> Nov  2 05:42:16 mail kernel: [<ffffffff8100cfdc>] call_softirq+0x1c/0x30
>> Nov  2 05:42:16 mail kernel: [<ffffffff8100ea15>] do_softirq+0x65/0xa0
>> Nov  2 05:42:16 mail kernel: [<ffffffff8105f845>] irq_exit+0x85/0x90
>> Nov  2 05:42:16 mail kernel: [<ffffffff81511d61>] do_IRQ+0x71/0xf0
>> Nov  2 05:42:16 mail kernel: [<ffffffff8150a7d3>] ret_from_intr+0x0/0x11
>> Nov  2 05:42:16 mail kernel:<EOI>   [<ffffffff812e4165>] ?
>> intel_idle+0xd5/0x170
>> Nov  2 05:42:16 mail kernel: [<ffffffff812e4148>] ? intel_idle+0xb8/0x170
>> Nov  2 05:42:16 mail kernel: [<ffffffff81425b51>]
>> cpuidle_idle_call+0x91/0x150
>> Nov  2 05:42:16 mail kernel: [<ffffffff8100aa8b>] cpu_idle+0xbb/0x150
>> Nov  2 05:42:16 mail kernel: [<ffffffff814f1785>] rest_init+0x75/0x80
>> Nov  2 05:42:16 mail kernel: [<ffffffff81b4ae9b>] start_kernel+0x3dc/0x3e7
>> Nov  2 05:42:16 mail kernel: [<ffffffff81b4a346>]
>> x86_64_start_reservations+0x131/0x135
>> Nov  2 05:42:16 mail kernel: [<ffffffff81b4a450>]
>> x86_64_start_kernel+0x106/0x115
>> Nov  2 05:42:16 mail kernel: ---[ end trace d9d3a1889f8925bf ]---
>> Nov  2 05:42:16 mail kernel: sky2 0000:06:00.0: eth0: tx timeout
>> Nov  2 05:42:16 mail kernel: sky2 0000:06:00.0: eth0: transmit ring 29
>> .. 117 report=29 done=29
>>
> Looks like a hardware issue, never saw it before.
> Are you running MTU>  1500?
> Does turning off TSO help?
>
> One possibility is that NET_IP_ALIGN changed. Now the ethernet header is
> aligned and the IP header is not.
>
MTU=1500
TCP timestamps seem to be the culprit - no issues with them disabled. I 
hit the problem after running about 18 hours with TCP timestamps 
enabled. It has been stable since rebuilding without timestamps... but 
another day would be more telling.

Didn't look into the header alignment - but would that be inconsistent 
with tcp timestamps being involved?


* Re: Sky2 2.6.36-09934-g2aab243 DMAR error with tcp timestamp enabled
  2010-11-08  3:38                                                                             ` Michael Breuer
@ 2010-11-08 16:46                                                                               ` Stephen Hemminger
  0 siblings, 0 replies; 99+ messages in thread
From: Stephen Hemminger @ 2010-11-08 16:46 UTC (permalink / raw)
  To: Michael Breuer; +Cc: Stephen Hemminger, Jarek Poplawski, David Miller, netdev

On Sun, 07 Nov 2010 22:38:19 -0500
Michael Breuer <mbreuer@majjas.com> wrote:

> On 11/7/2010 10:13 PM, Stephen Hemminger wrote:
> > On Sat, 06 Nov 2010 12:57:53 -0400
> > Michael Breuer<mbreuer@majjas.com>  wrote:
> >
> >> Basically, if I enable tcp timestamps (now disabled) I get a sky2 hang.
> >> As with the earlier issue the effects are not seen until after a couple
> >> days of uptime and seem exacerbated by load.
> >>
> >> I can't 100% confirm that the problem is not occurring without tcp
> >> timestamps, but will leave the system up for a while to try to confirm.
> >> This didn't occur previously without tcp timestamps enabled, but I also
> >> pulled git changes between the two events.
> >>
> >> I'm now also on 2.6.37-rc1.... I did a quick scan and didn't see any
> >> obvious commits between 2.6.36-09934 and -rc1 that would have affected this.
> >>
> >>   From the log:
> >> Nov  2 05:41:54 mail kernel: DRHD: handling fault status reg 2
> >> Nov  2 05:41:54 mail kernel: DMAR:[DMA Read] Request device [06:00.0]
> >> fault addr ffea3000
> >> Nov  2 05:41:54 mail kernel: DMAR:[fault reason 06] PTE Read access is
> >> not set
> >> Nov  2 05:41:54 mail kernel: sky2 0000:06:00.0: error interrupt
> >> status=0x80000000
> >> Nov  2 05:41:54 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
> >> Nov  2 05:42:01 mail clamd[9755]: SelfCheck: Database status OK.
> >> Nov  2 05:42:11 mail root: ping of potter failed
> >> Nov  2 05:42:16 mail kernel: ------------[ cut here ]------------
> >> Nov  2 05:42:16 mail kernel: WARNING: at net/sched/sch_generic.c:258
> >> dev_watchdog+0x251/0x260()
> >> Nov  2 05:42:16 mail kernel: Hardware name: System Product Name
> >> Nov  2 05:42:16 mail kernel: NETDEV WATCHDOG: eth0 (sky2): transmit
> >> queue 0 timed out
> >> Nov  2 05:42:16 mail kernel: Modules linked in: cpufreq_stats
> >> ip6table_filter ip6table_mangle ip6_tables ipt_MASQUERADE iptable_nat
> >> nf_nat iptable_mangle iptable_raw ebtable_nat ebtables bridge stp
> >> appletalk psnap llc nfsd lockd nfs_acl auth_rpcgss exportfs coretemp
> >> sunrpc acpi_cpufreq mperf sit tunnel4 ipt_LOG nf_conntrack_netbios_ns
> >> nf_conntrack_ftp xt_DSCP xt_dscp xt_mark nf_conntrack_ipv6
> >> nf_defrag_ipv6 xt_state xt_multiport ipv6 kvm_intel kvm
> >> snd_hda_codec_analog snd_ens1371 gameport snd_rawmidi snd_ac97_codec
> >> snd_hda_intel snd_hda_codec ac97_bus snd_hwdep snd_seq snd_seq_device
> >> snd_pcm gspca_spca505 gspca_main snd_timer videodev snd v4l1_compat
> >> i2c_i801 sky2 v4l2_compat_ioctl32 iTCO_wdt pcspkr asus_atk0110
> >> i7core_edac edac_core soundcore iTCO_vendor_support snd_page_alloc
> >> microcode raid456 async_raid6_recov async_pq raid6_pq async_xor xor
> >> async_memcpy async_tx raid1 ata_generic firewire_ohci pata_acpi
> >> firewire_core crc_itu_t pata_marvell nouveau ttm drm_kms_helper drm
> >> i2c_algo_bit i2c_core video output [
> >> Nov  2 05:42:16 mail kernel: last unloaded: ip6_tables]
> >> Nov  2 05:42:16 mail kernel: Pid: 0, comm: swapper Tainted: G        W
> >> 2.6.36-09934-g2aab243 #44
> >> Nov  2 05:42:16 mail kernel: Call Trace:
> >> Nov  2 05:42:16 mail kernel:<IRQ>   [<ffffffff81058a4f>]
> >> warn_slowpath_common+0x7f/0xc0
> >> Nov  2 05:42:16 mail kernel: [<ffffffff81058b46>]
> >> warn_slowpath_fmt+0x46/0x50
> >> Nov  2 05:42:16 mail kernel: [<ffffffff814603d1>] dev_watchdog+0x251/0x260
> >> Nov  2 05:42:16 mail kernel: [<ffffffff8108a4a6>] ?
> >> tick_program_event+0x26/0x30
> >> Nov  2 05:42:16 mail kernel: [<ffffffff8107eed4>] ?
> >> hrtimer_interrupt+0x134/0x240
> >> Nov  2 05:42:16 mail kernel: [<ffffffff81068ab0>]
> >> run_timer_softirq+0x160/0x390
> >> Nov  2 05:42:16 mail kernel: [<ffffffff8108a368>] ?
> >> tick_dev_program_event+0x48/0x110
> >> Nov  2 05:42:16 mail kernel: [<ffffffff81460180>] ? dev_watchdog+0x0/0x260
> >> Nov  2 05:42:16 mail kernel: [<ffffffff8105f981>] __do_softirq+0xb1/0x220
> >> Nov  2 05:42:16 mail kernel: [<ffffffff8100cfdc>] call_softirq+0x1c/0x30
> >> Nov  2 05:42:16 mail kernel: [<ffffffff8100ea15>] do_softirq+0x65/0xa0
> >> Nov  2 05:42:16 mail kernel: [<ffffffff8105f845>] irq_exit+0x85/0x90
> >> Nov  2 05:42:16 mail kernel: [<ffffffff81511d61>] do_IRQ+0x71/0xf0
> >> Nov  2 05:42:16 mail kernel: [<ffffffff8150a7d3>] ret_from_intr+0x0/0x11
> >> Nov  2 05:42:16 mail kernel:<EOI>   [<ffffffff812e4165>] ?
> >> intel_idle+0xd5/0x170
> >> Nov  2 05:42:16 mail kernel: [<ffffffff812e4148>] ? intel_idle+0xb8/0x170
> >> Nov  2 05:42:16 mail kernel: [<ffffffff81425b51>]
> >> cpuidle_idle_call+0x91/0x150
> >> Nov  2 05:42:16 mail kernel: [<ffffffff8100aa8b>] cpu_idle+0xbb/0x150
> >> Nov  2 05:42:16 mail kernel: [<ffffffff814f1785>] rest_init+0x75/0x80
> >> Nov  2 05:42:16 mail kernel: [<ffffffff81b4ae9b>] start_kernel+0x3dc/0x3e7
> >> Nov  2 05:42:16 mail kernel: [<ffffffff81b4a346>]
> >> x86_64_start_reservations+0x131/0x135
> >> Nov  2 05:42:16 mail kernel: [<ffffffff81b4a450>]
> >> x86_64_start_kernel+0x106/0x115
> >> Nov  2 05:42:16 mail kernel: ---[ end trace d9d3a1889f8925bf ]---
> >> Nov  2 05:42:16 mail kernel: sky2 0000:06:00.0: eth0: tx timeout
> >> Nov  2 05:42:16 mail kernel: sky2 0000:06:00.0: eth0: transmit ring 29
> >> .. 117 report=29 done=29
> >>
> > Looks like a hardware issue, never saw it before.
> > Are you running MTU>  1500?
> > Does turning off TSO help?
> >
> > One possibility is that NET_IP_ALIGN changed. Now the ethernet header is
> > aligned and the IP header is not.
> >
> MTU=1500
> TCP timestamps seems to be the culprit - no issues with it disabled. I 
> hit the problem after running about 18 hours with TCP timestamps 
> enabled. Has been stable since rebuilding without timestamps... but 
> another day would be more telling.
> 
> Didn't look into the header alignment - but would that be inconsistent 
> with tcp timestamps being involved?

TCP timestamps make the header bigger, and that might be causing
the gather code to see a different alignment, which triggers the problem.
Seeing the whole contents of the transmit ring in dmesg might
give a clue.
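
As a rough illustration of the size change, the sketch below adds up the headers of a plain TCP/IPv4 segment with and without the timestamp option. It assumes IPv4 without IP options, timestamps as the only TCP option, and the usual 12-byte padded form of the option (10 bytes plus two NOPs). Which boundary, if any, matters to the hardware is not known; the 4- and 8-byte remainders are shown only as examples.

```python
ETH_HLEN = 14                # Ethernet header
IPV4_HLEN = 20               # IPv4 header without options (assumption)
TCP_HLEN = 20                # TCP header without options
TCPOLEN_TSTAMP_ALIGNED = 12  # 10-byte timestamp option + 2 NOP pad bytes


def header_len(timestamps):
    """Total Ethernet + IPv4 + TCP header length in bytes."""
    tcp = TCP_HLEN + (TCPOLEN_TSTAMP_ALIGNED if timestamps else 0)
    return ETH_HLEN + IPV4_HLEN + tcp


if __name__ == "__main__":
    for ts in (False, True):
        total = header_len(ts)
        print(f"timestamps={ts}: headers={total} bytes, "
              f"payload offset mod 4 = {total % 4}, mod 8 = {total % 8}")
```

Under these assumptions the payload starts 12 bytes later when timestamps are on, so its offset relative to an 8-byte boundary changes while the 4-byte remainder stays the same.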

I don't work for Marvell. The limited documentation does not describe
any restrictions on alignment. But that's not a surprise since they
never tell me about errata. 

Since it is a regression, bisect might help.

-- 


* Regression: sky2 kernel between 3.1 and 3.2.1 (last known good 3.0.9)
  2010-01-22 18:01     ` Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync) Michael Breuer
  2010-01-22 21:53       ` Jarek Poplawski
@ 2012-01-16 16:39       ` Michael Breuer
  2012-01-20 14:24         ` Michael Breuer
  2012-01-20 16:26         ` Stephen Hemminger
  1 sibling, 2 replies; 99+ messages in thread
From: Michael Breuer @ 2012-01-16 16:39 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, Stephen Hemminger, linux-kernel, netdev

Synopsis:

Receiving DMAR and other errors after approximately three days of 
uptime. The symptoms exactly match errors seen and then fixed around 
2.6.32.4.

The system runs for too long before failing for a bisect to be practical, 
but I was able to confirm that the problem exists in the 3.1 stable branch 
(I jumped from 3.0 to 3.2 when 3.2 was released).

For now I reverted to the sky2.c from 3.0.9 and am running the rest of 
the kernel from 3.1.2, but won't be certain that this works until later 
in the week.

Note that DHCP renewal attempts on eth1 occurred 20 seconds before the 
log extract below, while the issue itself was on eth0. I'm not sure that 
is relevant; however, back in 2010 a preceding DHCP event did turn out to 
be relevant to the manifestation of the bug.

The 3.2.1-dirty I'm running is from git with a single local patch - for 
sidewinder force-feedback support (shouldn't be relevant to the sky2 issue).

Log extract:

Jan 16 05:49:46 mail kernel: [198230.628919] DRHD: handling fault status reg 2
Jan 16 05:49:46 mail kernel: [198230.628925] sky2 0000:06:00.0: error interrupt status=0x80000000
Jan 16 05:49:46 mail kernel: [198230.628929] DMAR:[DMA Read] Request device [06:00.0] fault addr fff78000
Jan 16 05:49:46 mail kernel: [198230.628931] DMAR:[fault reason 06] PTE Read access is not set
Jan 16 05:49:46 mail kernel: [198230.628939] sky2 0000:06:00.0: PCI hardware error (0x2010)
Jan 16 05:49:53 mail dhclient[1616]: DHCPREQUEST on eth1 to 10.240.184.29 port 67
Jan 16 05:50:01 mail kernel: [198246.288400] ------------[ cut here ]------------
Jan 16 05:50:01 mail kernel: [198246.288408] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x247/0x250()
Jan 16 05:50:01 mail kernel: [198246.288411] Hardware name: System Product Name
Jan 16 05:50:01 mail kernel: [198246.288413] NETDEV WATCHDOG: eth0 (sky2): transmit queue 0 timed out
Jan 16 05:50:01 mail kernel: [198246.288415] Modules linked in: tcp_lp 
cpufreq_stats ebtable_nat ebtables nf_conntrack_netbios_ns 
nf_conntrack_broadcast ip6table_mangle ip6table_filter ip6_tables 
iptable_mangle ipt_MASQUERADE iptable_nat nf_nat iptable_raw tun bridge 
stp llc lockd sit tunnel4 ipt_LOG nf_conntrack_ftp nf_conntrack_ipv6 
nf_defrag_ipv6 xt_CHECKSUM xt_multiport xt_DSCP w83627ehf xt_mark 
xt_dscp hwmon_vid binfmt_misc raid1 btrfs sunrpc zlib_deflate libcrc32c 
snd_hda_codec_analog snd_ens1371 gameport snd_hda_intel snd_rawmidi 
snd_ac97_codec snd_hda_codec snd_hwdep ac97_bus snd_seq snd_seq_device 
snd_pcm gspca_spca505 snd_timer gspca_main snd videodev media soundcore 
i2c_i801 iTCO_wdt microcode v4l2_compat_ioctl32 snd_page_alloc 
i7core_edac sky2 edac_core pcspkr iTCO_vendor_support virtio_net virtio 
virtio_ring kvm_intel kvm uinput ipv6 raid456 async_raid6_recov async_pq 
raid6_pq async_xor firewire_ohci firewire_core pata_acpi ata_generic xor 
async_memcpy async_tx crc_itu_t pata_marvell nouveau ttm d
Jan 16 05:50:01 mail kernel: rm_kms_helper drm i2c_algo_bit i2c_core 
mxm_wmi video [last unloaded: nf_conntrack_broadcast]
Jan 16 05:50:01 mail kernel: [198246.288487] Pid: 0, comm: swapper/0 
Tainted: G        W    3.2.1-dirty #1
Jan 16 05:50:01 mail kernel: [198246.288489] Call Trace:
Jan 16 05:50:01 mail kernel: [198246.288491] <IRQ>  [<ffffffff81050a4f>] 
warn_slowpath_common+0x7f/0xc0
Jan 16 05:50:01 mail kernel: [198246.288501]  [<ffffffff8101f0bd>] ? 
lapic_next_event+0x1d/0x30
Jan 16 05:50:01 mail kernel: [198246.288504]  [<ffffffff81050b46>] 
warn_slowpath_fmt+0x46/0x50
Jan 16 05:50:01 mail kernel: [198246.288509]  [<ffffffff81009319>] ? 
read_tsc+0x9/0x20
Jan 16 05:50:01 mail kernel: [198246.288513]  [<ffffffff814a81e7>] 
dev_watchdog+0x247/0x250
Jan 16 05:50:01 mail kernel: [198246.288518]  [<ffffffff8105fbbb>] 
run_timer_softirq+0x12b/0x3b0
Jan 16 05:50:01 mail kernel: [198246.288521]  [<ffffffff814a7fa0>] ? 
qdisc_reset+0x50/0x50
Jan 16 05:50:01 mail kernel: [198246.288525]  [<ffffffff81057d18>] 
__do_softirq+0xa8/0x210
Jan 16 05:50:01 mail kernel: [198246.288529]  [<ffffffff8157496c>] 
call_softirq+0x1c/0x30
Jan 16 05:50:01 mail kernel: [198246.288533]  [<ffffffff810041e5>] 
do_softirq+0x65/0xa0
Jan 16 05:50:01 mail kernel: [198246.288536]  [<ffffffff810580fe>] 
irq_exit+0x8e/0xb0
Jan 16 05:50:01 mail kernel: [198246.288539]  [<ffffffff815750a3>] 
do_IRQ+0x63/0xe0
Jan 16 05:50:01 mail kernel: [198246.288543]  [<ffffffff8156ad2e>] 
common_interrupt+0x6e/0x6e
Jan 16 05:50:01 mail kernel: [198246.288545] <EOI>  [<ffffffff81307b6d>] 
? intel_idle+0xed/0x150
Jan 16 05:50:01 mail kernel: [198246.288551]  [<ffffffff81307b4f>] ? 
intel_idle+0xcf/0x150
Jan 16 05:50:01 mail kernel: [198246.288555]  [<ffffffff8144d331>] 
cpuidle_idle_call+0xc1/0x280
Jan 16 05:50:01 mail kernel: [198246.288559]  [<ffffffff8100122a>] 
cpu_idle+0xca/0x120
Jan 16 05:50:01 mail kernel: [198246.288563]  [<ffffffff8154741e>] 
rest_init+0x72/0x74
Jan 16 05:50:01 mail kernel: [198246.288568]  [<ffffffff81b6abdd>] 
start_kernel+0x3b5/0x3c0
Jan 16 05:50:01 mail kernel: [198246.288572]  [<ffffffff81b6a32b>] 
x86_64_start_reservations+0x132/0x136
Jan 16 05:50:01 mail kernel: [198246.288576]  [<ffffffff81b6a140>] ? 
early_idt_handlers+0x140/0x140
Jan 16 05:50:01 mail kernel: [198246.288580]  [<ffffffff81b6a431>] 
x86_64_start_kernel+0x102/0x111
Jan 16 05:50:01 mail kernel: [198246.288583] ---[ end trace 
bb26011d21a2b1d7 ]---
Jan 16 05:50:01 mail kernel: [198246.288586] sky2 0000:06:00.0: eth0: tx 
timeout
Jan 16 05:50:01 mail kernel: [198246.288593] sky2 0000:06:00.0: eth0: 
transmit ring 115 .. 10 report=115 done=115




^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Regression: sky2 kernel between 3.1 and 3.2.1 (last known good 3.0.9)
  2012-01-16 16:39       ` Regression: sky2 kernel between 3.1 and 3.2.1 (last known good 3.0.9) Michael Breuer
@ 2012-01-20 14:24         ` Michael Breuer
  2012-01-20 16:10           ` Stephen Hemminger
  2012-01-20 16:26         ` Stephen Hemminger
  1 sibling, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2012-01-20 14:24 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, Stephen Hemminger, linux-kernel, netdev

On 1/16/2012 11:39 AM, Michael Breuer wrote:
> Synopsis:
>
> Receiving DMAR and other errors after approximately three days of 
> uptime. The symptoms exactly match errors seen and then fixed around 
> 2.6.32.4.
>
> While the system remains unaffected for too long to do a bisect, I was 
> able to confirm that the problem exists in the 3.1 stable branch (I 
> jumped from 3.0 to 3.2 when 3.2 was released).
>
> For now I reverted to the sky2.c from 3.0.9 and am running the rest of 
> the kernel from 3.1.2, but won't be certain that this works until 
> later in the week.
>
> Note that 20 seconds prior to the log extract below were DHCP renewal 
> attempts on eth1, the issue below was on eth0. Not sure it's relevant, 
> however back in 2010 a preceding DHCP event did turn out to be 
> relevant to the manifestation of the bug.
>
> The 3.2.1-dirty I'm running is from git with a single local patch - 
> for sidewinder force-feedback support (shouldn't be relevant to the 
> sky2 issue).
>
> Log extract:
>
> [snip]
>
>
>
FYI - I've been up for four days now without issues running on 3.2.1 + 
sky2.c from 3.0.9. Looks like the issue is in fact in one of the 
modifications made in sky2.c between those two releases.

--
Mike

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Regression: sky2 kernel between 3.1 and 3.2.1 (last known good 3.0.9)
  2012-01-20 14:24         ` Michael Breuer
@ 2012-01-20 16:10           ` Stephen Hemminger
  2012-01-20 16:17             ` Michael Breuer
  0 siblings, 1 reply; 99+ messages in thread
From: Stephen Hemminger @ 2012-01-20 16:10 UTC (permalink / raw)
  To: Michael Breuer
  Cc: Jarek Poplawski, David Miller, Stephen Hemminger, linux-kernel, netdev

On Fri, 20 Jan 2012 09:24:38 -0500
Michael Breuer <mbreuer@majjas.com> wrote:

> On 1/16/2012 11:39 AM, Michael Breuer wrote:
> > Synopsis:
> >
> > Receiving DMAR and other errors after approximately three days of 
> > uptime. The symptoms exactly match errors seen and then fixed around 
> > 2.6.32.4.
> >
> > While the system remains unaffected for too long to do a bisect, I was 
> > able to confirm that the problem exists in the 3.1 stable branch (I 
> > jumped from 3.0 to 3.2 when 3.2 was released).
> >
> > For now I reverted to the sky2.c from 3.0.9 and am running the rest of 
> > the kernel from 3.1.2, but won't be certain that this works until 
> > later in the week.
> >
> > Note that 20 seconds prior to the log extract below were DHCP renewal 
> > attempts on eth1, the issue below was on eth0. Not sure it's relevant, 
> > however back in 2010 a preceding DHCP event did turn out to be 
> > relevant to the manifestation of the bug.
> >
> > The 3.2.1-dirty I'm running is from git with a single local patch - 
> > for sidewinder force-feedback support (shouldn't be relevant to the 
> > sky2 issue).
> >
> > Log extract:
> >
> > [snip]
> >
> >
> >
> FYI - I've been up for four days now without issues running on 3.2.1 + 
> sky2.c from 3.0.9. Looks like the issue is in fact in one of the 
> modifications made in sky2.c between those two releases.

Since only you seem to be able to reproduce it, most likely the
bisect burden will be on you.  If you know it is only one file,
then bisecting that file is fairly quick.
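
Something along these lines, as a rough sketch. The helper name is made
up, and the two paths are an assumption about where sky2.c lives in older
and newer trees, so check them against your checkout:

```shell
# Start a bisect restricted to commits that touch the sky2 driver.
# $1 = known-bad revision, $2 = known-good revision.
# Both paths are assumptions; listing both is meant to cover a tree
# in which the driver file was moved between the two revisions.
bisect_sky2() {
    git bisect start "$1" "$2" -- \
        drivers/net/sky2.c \
        drivers/net/ethernet/marvell/sky2.c
}

# Example (tags are assumptions; use whatever you know to be bad and good):
#   bisect_sky2 v3.1 v3.0
# Then, after each build and soak test, mark the result:
#   git bisect good    # no DMAR fault or tx timeout seen
#   git bisect bad     # fault came back
# And when finished:
#   git bisect reset
```

Limiting the bisect to the driver file means only the commits that changed
it get tested, which is what keeps the number of runs small.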



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Regression: sky2 kernel between 3.1 and 3.2.1 (last known good 3.0.9)
  2012-01-20 16:10           ` Stephen Hemminger
@ 2012-01-20 16:17             ` Michael Breuer
  0 siblings, 0 replies; 99+ messages in thread
From: Michael Breuer @ 2012-01-20 16:17 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jarek Poplawski, David Miller, Stephen Hemminger, linux-kernel, netdev

On 1/20/2012 11:10 AM, Stephen Hemminger wrote:
> On Fri, 20 Jan 2012 09:24:38 -0500
> Michael Breuer<mbreuer@majjas.com>  wrote:
>
>> On 1/16/2012 11:39 AM, Michael Breuer wrote:
>>> Synopsis:
>>>
>>> Receiving DMAR and other errors after approximately three days of
>>> uptime. The symptoms exactly match errors seen and then fixed around
>>> 2.6.32.4.
>>>
>>> While the system remains unaffected for too long to do a bisect, I was
>>> able to confirm that the problem exists in the 3.1 stable branch (I
>>> jumped from 3.0 to 3.2 when 3.2 was released).
>>>
>>> For now I reverted to the sky2.c from 3.0.9 and am running the rest of
>>> the kernel from 3.1.2, but won't be certain that this works until
>>> later in the week.
>>>
>>> Note that 20 seconds prior to the log extract below were DHCP renewal
>>> attempts on eth1, the issue below was on eth0. Not sure it's relevant,
>>> however back in 2010 a preceding DHCP event did turn out to be
>>> relevant to the manifestation of the bug.
>>>
>>> The 3.2.1-dirty I'm running is from git with a single local patch -
>>> for sidewinder force-feedback support (shouldn't be relevant to the
>>> sky2 issue).
>>>
>>> Log extract:
>>>
>>> [snip]
>>>
>>>
>>>
>> FYI - I've been up for four days now without issues running on 3.2.1 +
>> sky2.c from 3.0.9. Looks like the issue is in fact in one of the
>> modifications made in sky2.c between those two releases.
> Since only you seem to be able to reproduce it, most likely the
> bisect burden will be on you.  If you know it is only one file,
> then bisecting that file is fairly quick.
>
As of now, I have no reliable way to reproduce the failure, so this is
likely to take about 3-4 days per bisect run, and longer for any run
that doesn't fail.
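
To put a rough number on that, here is a back-of-the-envelope sketch. The
commit count is a made-up placeholder (I have not counted the sky2.c
changes between the two releases); only the 3-4 days per run comes from
the estimate above:

```shell
# Hypothetical number of commits touching sky2.c between good and bad.
commits=32
# Upper end of the 3-4 days per bisect run.
days_per_step=4

# Bisection halves the candidate range on each run, so it needs
# about ceil(log2(commits)) runs in total.
steps=0
n=$commits
while [ "$n" -gt 1 ]; do
    n=$(( (n + 1) / 2 ))
    steps=$(( steps + 1 ))
done

echo "about $steps runs, up to $(( steps * days_per_step )) days"
```

With the placeholder of 32 commits this comes to 5 runs, so on the order
of weeks rather than days.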

Suggestions for diagnostic code to put in, or a reason to suspect one
version over another, could reduce the time significantly.

I've also got some windows where I have to leave a stable version up.


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Regression: sky2 kernel between 3.1 and 3.2.1 (last known good 3.0.9)
  2012-01-16 16:39       ` Regression: sky2 kernel between 3.1 and 3.2.1 (last known good 3.0.9) Michael Breuer
  2012-01-20 14:24         ` Michael Breuer
@ 2012-01-20 16:26         ` Stephen Hemminger
  2012-01-20 16:44           ` Michael Breuer
  1 sibling, 1 reply; 99+ messages in thread
From: Stephen Hemminger @ 2012-01-20 16:26 UTC (permalink / raw)
  To: Michael Breuer
  Cc: Jarek Poplawski, David Miller, Stephen Hemminger, linux-kernel, netdev

On Mon, 16 Jan 2012 11:39:13 -0500
Michael Breuer <mbreuer@majjas.com> wrote:

> Synopsis:
> 
> Receiving DMAR and other errors after approximately three days of 
> uptime. The symptoms exactly match errors seen and then fixed around 
> 2.6.32.4.
> 
> While the system remains unaffected for too long to do a bisect, I was 
> able to confirm that the problem exists in the 3.1 stable branch (I 
> jumped from 3.0 to 3.2 when 3.2 was released).
> 
> For now I reverted to the sky2.c from 3.0.9 and am running the rest of 
> the kernel from 3.1.2, but won't be certain that this works until later 
> in the week.
> 
> Note that 20 seconds prior to the log extract below were DHCP renewal 
> attempts on eth1, the issue below was on eth0. Not sure it's relevant, 
> however back in 2010 a preceding DHCP event did turn out to be relevant 
> to the manifestation of the bug.
> 
> The 3.2.1-dirty I'm running is from git with a single local patch - for 
> sidewinder force-feedback support (shouldn't be relevant to the sky2 issue).
> 
> Log extract:
> 
> [snip]
> 
> 
> 


Which exact chip version is this?
  dmesg | grep sky2
  lspci

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Regression: sky2 kernel between 3.1 and 3.2.1 (last known good 3.0.9)
  2012-01-20 16:26         ` Stephen Hemminger
@ 2012-01-20 16:44           ` Michael Breuer
  2012-01-21 15:29             ` Michael Breuer
  0 siblings, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2012-01-20 16:44 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jarek Poplawski, David Miller, Stephen Hemminger, linux-kernel, netdev

On 1/20/2012 11:26 AM, Stephen Hemminger wrote:
> On Mon, 16 Jan 2012 11:39:13 -0500
> Michael Breuer<mbreuer@majjas.com>  wrote:
>
>> Synopsis:
>>
>> Receiving DMAR and other errors after approximately three days of
>> uptime. The symptoms exactly match errors seen and then fixed around
>> 2.6.32.4.
>>
>> While the system remains unaffected for too long to do a bisect, I was
>> able to confirm that the problem exists in the 3.1 stable branch (I
>> jumped from 3.0 to 3.2 when 3.2 was released).
>>
>> For now I reverted to the sky2.c from 3.0.9 and am running the rest of
>> the kernel from 3.1.2, but won't be certain that this works until later
>> in the week.
>>
>> Note that 20 seconds prior to the log extract below were DHCP renewal
>> attempts on eth1, the issue below was on eth0. Not sure it's relevant,
>> however back in 2010 a preceding DHCP event did turn out to be relevant
>> to the manifestation of the bug.
>>
>> The 3.2.1-dirty I'm running is from git with a single local patch - for
>> sidewinder force-feedback support (shouldn't be relevant to the sky2 issue).
>>
>> Log extract:
>>
>> Jan 16 05:49:46 mail kernel: [198230.628919] DRHD: handling fault status
>> reg 2
>> [snip]
>>
>>
>>
>
> Which exact chip version is this?
>    dmesg | grep sky2
>    lspci
[    9.927143] sky2: driver version 1.29
[    9.927166] sky2 0000:06:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[    9.927177] sky2 0000:06:00.0: setting latency timer to 64
[    9.927254] sky2 0000:06:00.0: Yukon-2 EC Ultra chip revision 3
[    9.927339] sky2 0000:06:00.0: irq 71 for MSI/MSI-X
[    9.927562] sky2 0000:06:00.0: eth0: addr 00:26:18:00:1c:3b
[    9.927578] sky2 0000:04:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[    9.927586] sky2 0000:04:00.0: setting latency timer to 64
[    9.927640] sky2 0000:04:00.0: Yukon-2 EC Ultra chip revision 3
[    9.927718] sky2 0000:04:00.0: irq 72 for MSI/MSI-X
[    9.927856] sky2 0000:04:00.0: eth1: addr 00:26:18:00:1c:3a
[   23.468135] sky2 0000:06:00.0: eth0: enabling interface
[   25.709668] sky2 0000:04:00.0: eth1: enabling interface
[   25.981841] sky2 0000:06:00.0: eth0: Link is up at 1000 Mbps, full duplex, flow control both
[   27.418742] sky2 0000:04:00.0: eth1: Link is up at 100 Mbps, full duplex, flow control rx

04:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 14)
05:00.0 IDE interface: Marvell Technology Group Ltd. 88SE6121 SATA II Controller (rev b2)
06:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 14)


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Regression: sky2 kernel between 3.1 and 3.2.1 (last known good 3.0.9)
  2012-01-20 16:44           ` Michael Breuer
@ 2012-01-21 15:29             ` Michael Breuer
  2012-01-22 18:03               ` Stephen Hemminger
  0 siblings, 1 reply; 99+ messages in thread
From: Michael Breuer @ 2012-01-21 15:29 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jarek Poplawski, David Miller, Stephen Hemminger, linux-kernel, netdev

On 1/20/2012 11:44 AM, Michael Breuer wrote:
> On 1/20/2012 11:26 AM, Stephen Hemminger wrote:
>> On Mon, 16 Jan 2012 11:39:13 -0500
>> Michael Breuer<mbreuer@majjas.com>  wrote:
>>
>>> Synopsis:
>>>
>>> Receiving DMAR and other errors after approximately three days of
>>> uptime. The symptoms exactly match errors seen and then fixed around
>>> 2.6.32.4.
>>>
>>> While the system remains unaffected for too long to do a bisect, I was
>>> able to confirm that the problem exists in the 3.1 stable branch (I
>>> jumped from 3.0 to 3.2 when 3.2 was released).
>>>
>>> For now I reverted to the sky2.c from 3.0.9 and am running the rest of
>>> the kernel from 3.1.2, but won't be certain that this works until later
>>> in the week.
>>>
>>> Note that 20 seconds prior to the log extract below were DHCP renewal
>>> attempts on eth1, the issue below was on eth0. Not sure it's relevant,
>>> however back in 2010 a preceding DHCP event did turn out to be relevant
>>> to the manifestation of the bug.
>>>
>>> The 3.2.1-dirty I'm running is from git with a single local patch - for
>>> sidewinder force-feedback support (shouldn't be relevant to the sky2 
>>> issue).
>>>
>>> Log extract:
>>>
>>> Jan 16 05:49:46 mail kernel: [198230.628919] DRHD: handling fault status reg 2
>>> [snip]
>>>
>>>
>>>
>>
>> Which exact chip version is this?
>>    dmesg | grep sky2
>>    lspci
> [    9.927143] sky2: driver version 1.29
> [    9.927166] sky2 0000:06:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
> [    9.927177] sky2 0000:06:00.0: setting latency timer to 64
> [    9.927254] sky2 0000:06:00.0: Yukon-2 EC Ultra chip revision 3
> [    9.927339] sky2 0000:06:00.0: irq 71 for MSI/MSI-X
> [    9.927562] sky2 0000:06:00.0: eth0: addr 00:26:18:00:1c:3b
> [    9.927578] sky2 0000:04:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
> [    9.927586] sky2 0000:04:00.0: setting latency timer to 64
> [    9.927640] sky2 0000:04:00.0: Yukon-2 EC Ultra chip revision 3
> [    9.927718] sky2 0000:04:00.0: irq 72 for MSI/MSI-X
> [    9.927856] sky2 0000:04:00.0: eth1: addr 00:26:18:00:1c:3a
> [   23.468135] sky2 0000:06:00.0: eth0: enabling interface
> [   25.709668] sky2 0000:04:00.0: eth1: enabling interface
> [   25.981841] sky2 0000:06:00.0: eth0: Link is up at 1000 Mbps, full duplex, flow control both
> [   27.418742] sky2 0000:04:00.0: eth1: Link is up at 100 Mbps, full duplex, flow control rx
>
> 04:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 14)
> 05:00.0 IDE interface: Marvell Technology Group Ltd. 88SE6121 SATA II Controller (rev b2)
> 06:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 14)
>
Seems I spoke too soon... Got the sky2 crash again early this morning 
after five days up.

Not sure how I can do any sort of bisect without narrowing down the 
possible culprits.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: Regression: sky2 kernel between 3.1 and 3.2.1 (last known good 3.0.9)
  2012-01-21 15:29             ` Michael Breuer
@ 2012-01-22 18:03               ` Stephen Hemminger
  0 siblings, 0 replies; 99+ messages in thread
From: Stephen Hemminger @ 2012-01-22 18:03 UTC (permalink / raw)
  To: Michael Breuer
  Cc: Jarek Poplawski, David Miller, Stephen Hemminger, linux-kernel, netdev


> Not sure how I can do any sort of bisect without narrowing down the 
> possible culprits.


The error results from the hardware trying to access unmapped memory.
You could instrument the dmar_fault() code to print out more fault information
(address, type, etc.) and then dump the request ring in sky2.

Also, since it is rare, it may be related to your hardware not mapping all
the bits of the address.

Jan 16 05:49:46 mail kernel: [198230.628919] DRHD: handling fault status reg 2
Jan 16 05:49:46 mail kernel: [198230.628925] sky2 0000:06:00.0: error interrupt status=0x80000000
Jan 16 05:49:46 mail kernel: [198230.628929] DMAR:[DMA Read] Request device [06:00.0] fault addr fff78000
Jan 16 05:49:46 mail kernel: [198230.628931] DMAR:[fault reason 06] PTE Read access is not set
Jan 16 05:49:46 mail kernel: [198230.628939] sky2 0000:06:00.0: PCI hardware error (0x2010)
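
For triage across several days of logs, the fields already present in the fault messages above (request type, device, address, reason) can be pulled out in userspace. The following is only a rough sketch, not the dmar_fault() instrumentation itself: the function name parse_dmar_faults() and both regular expressions are hypothetical and assume the exact message format shown in this thread, with the mail client's line wraps re-joined by hand. The page-alignment and 32-bit checks are simple arithmetic on the reported address and prove nothing by themselves about how the address was mapped.

```python
import re

# Log lines as posted in this thread, with line wraps re-joined by hand.
LOG = """\
Jan 16 05:49:46 mail kernel: [198230.628919] DRHD: handling fault status reg 2
Jan 16 05:49:46 mail kernel: [198230.628925] sky2 0000:06:00.0: error interrupt status=0x80000000
Jan 16 05:49:46 mail kernel: [198230.628929] DMAR:[DMA Read] Request device [06:00.0] fault addr fff78000
Jan 16 05:49:46 mail kernel: [198230.628931] DMAR:[fault reason 06] PTE Read access is not set
Jan 16 05:49:46 mail kernel: [198230.628939] sky2 0000:06:00.0: PCI hardware error (0x2010)
"""

# Assumed message formats, copied from the log above (not from kernel sources).
FAULT_RE = re.compile(
    r"DMAR:\[DMA (Read|Write)\] Request device \[([0-9a-f:.]+)\] fault addr ([0-9a-f]+)"
)
REASON_RE = re.compile(r"DMAR:\[fault reason (\d+)\] (.+)")

PAGE_SIZE = 4096  # assumption: 4 KiB IOMMU pages


def parse_dmar_faults(text):
    """Return one dict per DMAR fault line, with the following reason line attached."""
    faults = []
    for line in text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            addr = int(match.group(3), 16)
            faults.append({
                "type": match.group(1),
                "device": match.group(2),
                "addr": addr,
                "page_aligned": addr % PAGE_SIZE == 0,
                "fits_32bit": addr < 2**32,
                "reason": None,
                "reason_text": None,
            })
            continue
        match = REASON_RE.search(line)
        if match and faults:
            # The reason line follows the fault line it describes.
            faults[-1]["reason"] = int(match.group(1))
            faults[-1]["reason_text"] = match.group(2).strip()
    return faults


for fault in parse_dmar_faults(LOG):
    print(f"{fault['type']} fault from {fault['device']} at {fault['addr']:#x}: "
          f"{fault['reason_text']}")
```

Collecting the fault addresses from every occurrence this way may show whether they cluster around one value, which could help decide what to dump from the sky2 rings.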

^ permalink raw reply	[flat|nested] 99+ messages in thread

end of thread, other threads:[~2012-01-22 18:03 UTC | newest]

Thread overview: 99+ messages
2010-01-20  9:41 [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync Jarek Poplawski
2010-01-20 18:03 ` Stephen Hemminger
2010-01-20 20:11   ` Michael Chan
2010-01-20 20:30     ` Stephen Hemminger
2010-01-20 20:58       ` Jarek Poplawski
2010-01-20 22:50         ` David Miller
2010-01-20 22:45       ` David Miller
2010-01-20 18:09 ` Stephen Hemminger
2010-01-20 22:24 ` Alan Cox
2010-01-20 22:53   ` David Miller
2010-01-20 22:53   ` Jarek Poplawski
2010-01-21 15:22     ` FUJITA Tomonori
2010-01-21 18:41       ` Jarek Poplawski
2010-01-22  5:11         ` FUJITA Tomonori
2010-01-22  6:38           ` David Miller
2010-02-03  1:18             ` FUJITA Tomonori
2010-02-03  1:27               ` David Miller
2010-01-21 19:59 ` Michael Breuer
2010-01-21 20:41   ` Jarek Poplawski
2010-01-21 20:46     ` Michael Breuer
2010-01-21 21:02       ` Jarek Poplawski
2010-01-22 18:01     ` Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync) Michael Breuer
2010-01-22 21:53       ` Jarek Poplawski
2010-01-22 22:14         ` Michael Breuer
2010-01-22 23:06           ` Jarek Poplawski
2010-01-22 23:25             ` Michael Breuer
2010-01-22 23:46               ` Jarek Poplawski
2010-01-22 23:50                 ` Michael Breuer
2010-01-23 23:21                   ` Jarek Poplawski
2010-01-24  1:53                     ` Michael Breuer
2010-01-27 15:34                     ` Michael Breuer
2010-01-27 16:50                       ` Stephen Hemminger
2010-01-27 16:57                         ` Michael Breuer
2010-01-27 17:45                           ` Stephen Hemminger
2010-01-27 17:57                             ` Michael Breuer
2010-01-27 18:33                               ` Michael Breuer
2010-01-27 23:54                             ` Hang: 2.6.32.4 sky2/DMAR David Miller
2010-01-27 17:56                           ` Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync) Stephen Hemminger
2010-01-27 17:58                             ` Michael Breuer
2010-01-27 18:08                             ` Michael Breuer
2010-01-27 18:45                               ` Michael Breuer
2010-01-27 19:23                                 ` Jarek Poplawski
2010-01-27 19:32                                   ` Jarek Poplawski
2010-01-28 15:32                                 ` Michael Breuer
2010-01-28 16:43                                   ` Michael Breuer
2010-01-28 17:08                                     ` Stephen Hemminger
2010-01-28 18:46                                       ` Michael Breuer
2010-01-28 22:34                                         ` Jarek Poplawski
2010-01-28 22:43                                           ` Michael Breuer
2010-01-28 22:56                                             ` Jarek Poplawski
2010-01-28 22:59                                               ` Michael Breuer
2010-01-28 23:36                                                 ` [PATCH] sky2: receive dma mapping error handling Stephen Hemminger
2010-01-29  0:05                                                   ` Michael Breuer
2010-01-30 16:30                                                   ` Michael Breuer
2010-01-30 16:31                                                   ` Michael Breuer
2010-01-31  0:34                                                     ` Jarek Poplawski
2010-01-31  4:17                                                       ` Michael Breuer
2010-01-31 22:25                                                         ` Jarek Poplawski
2010-01-31 23:58                                                           ` Michael Breuer
2010-01-31  4:55                                                       ` Michael Breuer
2010-01-31 18:50                                                         ` Michael Breuer
2010-01-31 21:58                                                           ` Michael Breuer
2010-01-31 22:18                                                             ` Jarek Poplawski
2010-02-01  0:19                                                               ` Michael Breuer
2010-02-01  4:26                                                                 ` Michael Breuer
2010-02-01 10:47                                                                   ` Jarek Poplawski
2010-02-01  9:17                                                                 ` [PATCH v2] sky2: Fix transmit dma mapping handling Jarek Poplawski
2010-02-01 17:52                                                                   ` Michael Breuer
2010-02-01 18:08                                                               ` [PATCH] sky2: receive dma mapping error handling Stephen Hemminger
2010-02-01 18:20                                                               ` Stephen Hemminger
2010-02-01 18:44                                                                 ` Michael Breuer
2010-02-01 20:13                                                                 ` Jarek Poplawski
2010-02-01 20:41                                                                   ` Jarek Poplawski
2010-02-01 21:27                                                                 ` [PATCH v3] " Jarek Poplawski
2010-02-01 22:29                                                                   ` Stephen Hemminger
2010-02-01 22:46                                                                     ` Jarek Poplawski
2010-02-01 22:51                                                                       ` Stephen Hemminger
2010-02-01 21:42                                                                 ` [PATCH v3b resent] sky2: Fix transmit dma mapping handling Jarek Poplawski
2010-02-03  4:07                                                                 ` [PATCH] sky2: receive dma mapping error handling Michael Breuer
2010-02-03 16:47                                                                   ` Michael Breuer
2010-02-03 16:56                                                                     ` Stephen Hemminger
2010-02-03 17:07                                                                       ` Michael Breuer
2010-02-03 18:23                                                                         ` Justin P. Mattock
2010-02-03 18:25                                                                           ` Stephen Hemminger
2010-02-03 18:48                                                                             ` Justin P. Mattock
2010-11-06 16:57                                                                         ` Sky2 2.6.36-09934-g2aab243 DMAR error with tcp timestamp enabled Michael Breuer
2010-11-08  3:13                                                                           ` Stephen Hemminger
2010-11-08  3:38                                                                             ` Michael Breuer
2010-11-08 16:46                                                                               ` Stephen Hemminger
2010-02-03 17:16                                                                       ` [PATCH] sky2: receive dma mapping error handling Justin P. Mattock
2010-02-02 22:44                                                   ` Andi Kleen
2012-01-16 16:39       ` Regression: sky2 kernel between 3.1 and 3.2.1 (last known good 3.0.9) Michael Breuer
2012-01-20 14:24         ` Michael Breuer
2012-01-20 16:10           ` Stephen Hemminger
2012-01-20 16:17             ` Michael Breuer
2012-01-20 16:26         ` Stephen Hemminger
2012-01-20 16:44           ` Michael Breuer
2012-01-21 15:29             ` Michael Breuer
2012-01-22 18:03               ` Stephen Hemminger
