All of lore.kernel.org
 help / color / mirror / Atom feed
* BUG in netxen_release_tx_buffers when TSO enabled on kernels >= 3.3 and <= 3.6
@ 2013-01-22 10:15 Christoph Paasch
  2013-01-22 13:32 ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Paasch @ 2013-01-22 10:15 UTC (permalink / raw)
  To: Ian Campbell, Eric Dumazet, Sony Chacko, Rajesh Borundia,
	David Miller, netdev

Hello,

I have a scenario where I can trigger a bug on kernels >= 3.3 and <= 3.6. 
Thus, I can produce it with the latest longterm-stable v3.4.26.

The crashdumps/warning can be seen below. Sometimes it is only the warning, 
sometimes it also produces the crash. But, it happens each time I try out my 
scenario.


How to reproduce the bug (I have HP Proliant DL165 machines with HP NC375T 1Gb 
interface):

 * Launch an iperf-session ( -t 10 ) to a server over a 1Gbps interface.

 * After 5 seconds on the client, remove the IP-address from the interface
with ip addr del dev [itf] [ip]

 * Wait 10 more seconds and kill the iperf on the client and the server.

 * Then do: ifconfig down [itf]

Now the crash happens.

What I observe in netxen_release_tx_buffers is that upon the 18th iteration (j 
== 17), buffrag->length == 0. buffrag->frag_count is 18.
Sometimes (much more rare), buffrag->length rather looks like garbage (e.g., > 
2^32)


I bisected this, and it was introduced by commit 9d4dde521577 (net: only use a 
single page of slop in MAX_SKB_FRAGS). 
It was fixed by Eric in commit 5640f7685831 (net: use a per task frag 
allocator) since kernel > 3.6.

As this bug is present in the longterm-stable 3.4, should Eric's patch be 
backported?
If not, does somebody (with more knowledge than I have of this part of the 
code) can have a look at it, or maybe give me a pointer on how I could solve 
this properly?

Reverting commit 9d4dde521577 (net: only use a single page of slop in 
MAX_SKB_FRAGS) fixes it for me on 3.4.26.



Thanks,
Christoph



[  610.315966] ------------[ cut here ]------------
[  610.371099] WARNING: at /home/cpaasch/builder/net-next/lib/dma-debug.c:865 
check_unmap+0x18e/0x61e()
[  610.480197] Hardware name: ProLiant DL165 G7
[  610.531168] netxen_nic 0000:05:00.2: DMA-API: device driver tries to free 
DMA memory it has not allocated [device address=0x0000000000000012] [size=0 
bytes]
[  610.698391] Modules linked in:
[  610.734935] Pid: 3728, comm: ip Not tainted 3.4.26-mptcp #30
[  610.802511] Call Trace:
[  610.831692]  [<ffffffff81025af7>] warn_slowpath_common+0x80/0x98
[  610.903424]  [<ffffffff81025ba3>] warn_slowpath_fmt+0x41/0x43
[  610.972041]  [<ffffffff811c93d1>] check_unmap+0x18e/0x61e
[  611.036505]  [<ffffffff811c99aa>] debug_dma_unmap_page+0x50/0x52
[  611.108236]  [<ffffffff813642f3>] netxen_release_tx_buffers+0x11e/0x175
[  611.187232]  [<ffffffff813621a6>] __netxen_nic_down+0x12c/0x13f
[  611.257922]  [<ffffffff81362290>] netxen_nic_close+0x13/0x17
[  611.325502]  [<ffffffff813fc7dc>] __dev_close_many+0x90/0xbc
[  611.393079]  [<ffffffff813fc839>] __dev_close+0x31/0x42
[  611.455469]  [<ffffffff813fa38d>] __dev_change_flags+0xb9/0x13d
[  611.526160]  [<ffffffff813fd16e>] dev_change_flags+0x1c/0x52
[  611.593740]  [<ffffffff814075ad>] do_setlink+0x2c0/0x7d2
[  611.657167]  [<ffffffff8146fcb5>] ? inet6_fill_ifla6_attrs+0x205/0x219
[  611.735124]  [<ffffffff814085a3>] rtnl_newlink+0x26b/0x4a1
[  611.800626]  [<ffffffff81408400>] ? rtnl_newlink+0xc8/0x4a1
[  611.867166]  [<ffffffff81419ebb>] ? netlink_sendmsg+0x22b/0x2b2
[  611.937859]  [<ffffffff8109fcf3>] ? check_object+0x13b/0x1df
[  612.005437]  [<ffffffff8140831d>] rtnetlink_rcv_msg+0x22c/0x247
[  612.076128]  [<ffffffff814080f1>] ? rtnetlink_rcv+0x28/0x28
[  612.142668]  [<ffffffff81419c3f>] netlink_rcv_skb+0x3e/0x8f
[  612.209209]  [<ffffffff814080ea>] rtnetlink_rcv+0x21/0x28
[  612.273673]  [<ffffffff81419a01>] netlink_unicast+0x134/0x1ab
[  612.342289]  [<ffffffff81419eda>] netlink_sendmsg+0x24a/0x2b2
[  612.410908]  [<ffffffff813ebb1a>] sock_sendmsg+0xb8/0xd1
[  612.474332]  [<ffffffff81073a29>] ? filemap_fault+0x199/0x35d
[  612.542948]  [<ffffffff810732ab>] ? unlock_page+0x2d/0x32
[  612.607412]  [<ffffffff810894c2>] ? __do_fault+0x3ce/0x409
[  612.672916]  [<ffffffff813eb239>] ? move_addr_to_kernel+0x3a/0x51
[  612.745685]  [<ffffffff813f4f76>] ? verify_iovec+0x59/0xaf
[  612.811187]  [<ffffffff813ec675>] __sys_sendmsg+0x1b9/0x23e
[  612.877725]  [<ffffffff8108c2ab>] ? handle_mm_fault+0x1b7/0x1cd
[  612.948420]  [<ffffffff8101f08e>] ? do_page_fault+0x336/0x375
[  613.017035]  [<ffffffff81090681>] ? do_brk+0x2e4/0x346
[  613.078385]  [<ffffffff813ec80d>] sys_sendmsg+0x3d/0x5e
[  613.140775]  [<ffffffff814e62a2>] system_call_fastpath+0x16/0x1b
[  613.212502] ---[ end trace d4f0eb8a4ca35e8a ]---



[  719.276359] BUG: unable to handle kernel paging request at ffffc900103cd000
[  719.359733] IP: [<ffffffff81364295>] netxen_release_tx_buffers+0xc0/0x175
[  719.440926] PGD 42d851067 PUD 42d852067 PMD 42c11b067 PTE 0
[  719.507897] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[  719.562307] CPU 7
[  719.584214] Modules linked in:
[  719.622944]
[  719.640700] Pid: 4563, comm: ip Tainted: G        W    3.4.26-mptcp #30 HP 
ProLiant DL165 G7
[  719.741707] RIP: 0010:[<ffffffff81364295>]  [<ffffffff81364295>] 
netxen_release_tx_buffers+0xc0/0x175
[  719.851952] RSP: 0018:ffff880422c79668  EFLAGS: 00010246
[  719.915380] RAX: ffff88042dbce360 RBX: ffffc900103cced0 RCX: 
0000000000000000
[  720.000603] RDX: 0000000000000011 RSI: 0000000000000282 RDI: 
0000000000000000
[  720.085826] RBP: ffff880422c796b8 R08: 0000000000000000 R09: 
ffff88042dbce3e8
[  720.171049] R10: 0000000000000001 R11: 0000000000000206 R12: 
ffffc900103ccff8
[  720.256273] R13: ffff88042d9aa720 R14: 0000000000000012 R15: 
0000000000000000
[  720.341499] FS:  00007f191d157700(0000) GS:ffff88043fdc0000(0000) 
knlGS:0000000000000000
[  720.438138] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  720.506755] CR2: ffffc900103cd000 CR3: 0000000422cbd000 CR4: 
00000000000007e0
[  720.591979] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[  720.677203] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
0000000000000400
[  720.762427] Process ip (pid: 4563, threadinfo ffff880422c78000, task 
ffff88042d9b4b90)
[  720.856990] Stack:
[  720.880968]  ffff880422c796b8 0000000000000000 ffff88042bc53b28 
00000011000003ff
[  720.969736]  ffff880422c796b8 ffff88042d9aa720 ffff88042bc6c920 
0000000000000001
[  721.058506]  ffff88042d9aa0a0 0000000000000007 ffff880422c796f8 
ffffffff813621a6
[  721.147273] Call Trace:
[  721.176450]  [<ffffffff813621a6>] __netxen_nic_down+0x12c/0x13f
[  721.247140]  [<ffffffff81362290>] netxen_nic_close+0x13/0x17
[  721.314721]  [<ffffffff813fc7dc>] __dev_close_many+0x90/0xbc
[  721.382295]  [<ffffffff813fc839>] __dev_close+0x31/0x42
[  721.444687]  [<ffffffff813fa38d>] __dev_change_flags+0xb9/0x13d
[  721.515378]  [<ffffffff813fd16e>] dev_change_flags+0x1c/0x52
[  721.582958]  [<ffffffff814075ad>] do_setlink+0x2c0/0x7d2
[  721.646384]  [<ffffffff8146fcb5>] ? inet6_fill_ifla6_attrs+0x205/0x219
[  721.724342]  [<ffffffff814085a3>] rtnl_newlink+0x26b/0x4a1
[  721.789842]  [<ffffffff81408400>] ? rtnl_newlink+0xc8/0x4a1
[  721.856384]  [<ffffffff81419ebb>] ? netlink_sendmsg+0x22b/0x2b2
[  721.927079]  [<ffffffff8109fcf3>] ? check_object+0x13b/0x1df
[  721.994659]  [<ffffffff8140831d>] rtnetlink_rcv_msg+0x22c/0x247
[  722.065351]  [<ffffffff814080f1>] ? rtnetlink_rcv+0x28/0x28
[  722.131893]  [<ffffffff81419c3f>] netlink_rcv_skb+0x3e/0x8f
[  722.198433]  [<ffffffff814080ea>] rtnetlink_rcv+0x21/0x28
[  722.262897]  [<ffffffff81419a01>] netlink_unicast+0x134/0x1ab
[  722.331515]  [<ffffffff81419eda>] netlink_sendmsg+0x24a/0x2b2
[  722.400131]  [<ffffffff813ebb1a>] sock_sendmsg+0xb8/0xd1
[  722.463559]  [<ffffffff81073a29>] ? filemap_fault+0x199/0x35d
[  722.532171]  [<ffffffff810732ab>] ? unlock_page+0x2d/0x32
[  722.596636]  [<ffffffff810894c2>] ? __do_fault+0x3ce/0x409
[  722.662140]  [<ffffffff813eb239>] ? move_addr_to_kernel+0x3a/0x51
[  722.734909]  [<ffffffff813f4f76>] ? verify_iovec+0x59/0xaf
[  722.800410]  [<ffffffff813ec675>] __sys_sendmsg+0x1b9/0x23e
[  722.866951]  [<ffffffff8108c2ab>] ? handle_mm_fault+0x1b7/0x1cd
[  722.937643]  [<ffffffff8101f08e>] ? do_page_fault+0x336/0x375
[  723.006260]  [<ffffffff81090681>] ? do_brk+0x2e4/0x346
[  723.067610]  [<ffffffff813ec80d>] sys_sendmsg+0x3d/0x5e
[  723.129998]  [<ffffffff814e62a2>] system_call_fastpath+0x16/0x1b
[  723.201724] Code: e6 ff 48 c7 43 08 00 00 00 00 4c 8d 63 08 c7 45 cc 00 00 
00 00 eb 7d 49 83 c4 10 4d 8b 34 24 4d 85 f6 74 6d 49 8b 45 58 45 31 ff <4d> 
8b 4c 24 08 48 85 c0 74 13 4c 8d b8 88 00 00 00 48 8b 80 b8
[  723.434030] RIP  [<ffffffff81364295>] netxen_release_tx_buffers+0xc0/0x175
[  723.516249]  RSP <ffff880422c79668>
[  723.557878] CR2: ffffc900103cd000
[  723.597832] ---[ end trace d4f0eb8a4ca35e8b ]---


-- 
IP Networking Lab --- http://inl.info.ucl.ac.be
MultiPath TCP in the Linux Kernel --- http://mptcp.info.ucl.ac.be
UCLouvain
--

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: BUG in netxen_release_tx_buffers when TSO enabled on kernels >= 3.3 and <= 3.6
  2013-01-22 10:15 BUG in netxen_release_tx_buffers when TSO enabled on kernels >= 3.3 and <= 3.6 Christoph Paasch
@ 2013-01-22 13:32 ` Eric Dumazet
  2013-01-22 13:56   ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2013-01-22 13:32 UTC (permalink / raw)
  To: christoph.paasch
  Cc: Ian Campbell, Sony Chacko, Rajesh Borundia, David Miller, netdev

On Tue, 2013-01-22 at 11:15 +0100, Christoph Paasch wrote:
> Hello,
> 

Hi Christoph

> I have a scenario where I can trigger a bug on kernels >= 3.3 and <= 3.6. 
> Thus, I can produce it with the latest longterm-stable v3.4.26.
> 
> The crashdumps/warning can be seen below. Sometimes it is only the warning, 
> sometimes it also produces the crash. But, it happens each time I try out my 
> scenario.
> 
> 
> How to reproduce the bug (I have HP Proliant DL165 machines with HP NC375T 1Gb 
> interface):
> 
>  * Launch an iperf-session ( -t 10 ) to a server over a 1Gbps interface.
> 
>  * After 5 seconds on the client, remove the IP-address from the interface
> with ip addr del dev [itf] [ip]
> 
>  * Wait 10 more seconds and kill the iperf on the client and the server.
> 
>  * Then do: ifconfig down [itf]
> 
> Now the crash happens.
> 
> What I observe in netxen_release_tx_buffers is that upon the 18th iteration (j 
> == 17), buffrag->length == 0. buffrag->frag_count is 18.
> Sometimes (much more rare), buffrag->length rather looks like garbage (e.g., > 
> 2^32)
> 
> 
> I bisected this, and it was introduced by commit 9d4dde521577 (net: only use a 
> single page of slop in MAX_SKB_FRAGS). 
> It was fixed by Eric in commit 5640f7685831 (net: use a per task frag 
> allocator) since kernel > 3.6.
> 

Its a side effect of this patch, as it permits to build a full TSO
packet using 2 or 3 frags, instead of 16 to 17 frags.

But you could theoretically still hit the bug if the application uses
several sockets and does short write() on them. Because each short
write() would use a small frag.

> As this bug is present in the longterm-stable 3.4, should Eric's patch be 
> backported?

I don't think so. It had some side effects that we are still sorting
out. (see recent splice() fixes for example)

> If not, does somebody (with more knowledge than I have of this part of the 
> code) can have a look at it, or maybe give me a pointer on how I could solve 
> this properly?
> 
> Reverting commit 9d4dde521577 (net: only use a single page of slop in 
> MAX_SKB_FRAGS) fixes it for me on 3.4.26.
> 

Something doesn't properly test MAX_SKB_FRAGS, we should track it and
fix.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: BUG in netxen_release_tx_buffers when TSO enabled on kernels >= 3.3 and <= 3.6
  2013-01-22 13:32 ` Eric Dumazet
@ 2013-01-22 13:56   ` Eric Dumazet
  2013-01-22 15:03     ` Christoph Paasch
  2013-01-22 19:36     ` Ben Hutchings
  0 siblings, 2 replies; 12+ messages in thread
From: Eric Dumazet @ 2013-01-22 13:56 UTC (permalink / raw)
  To: christoph.paasch
  Cc: Ian Campbell, Sony Chacko, Rajesh Borundia, David Miller, netdev

On Tue, 2013-01-22 at 05:32 -0800, Eric Dumazet wrote:

> 
> Something doesn't properly test MAX_SKB_FRAGS, we should track it and
> fix.

I guess netxen driver has a bug.

Please try the following patch :

diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
index bc165f4..695667d 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
@@ -144,7 +144,7 @@ void netxen_release_tx_buffers(struct netxen_adapter *adapter)
 					 buffrag->length, PCI_DMA_TODEVICE);
 			buffrag->dma = 0ULL;
 		}
-		for (j = 0; j < cmd_buf->frag_count; j++) {
+		for (j = 1; j < cmd_buf->frag_count; j++) {
 			buffrag++;
 			if (buffrag->dma) {
 				pci_unmap_page(adapter->pdev, buffrag->dma,

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: BUG in netxen_release_tx_buffers when TSO enabled on kernels >= 3.3 and <= 3.6
  2013-01-22 13:56   ` Eric Dumazet
@ 2013-01-22 15:03     ` Christoph Paasch
  2013-01-22 15:43       ` Christoph Paasch
  2013-01-22 19:36     ` Ben Hutchings
  1 sibling, 1 reply; 12+ messages in thread
From: Christoph Paasch @ 2013-01-22 15:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ian Campbell, Sony Chacko, Rajesh Borundia, David Miller, netdev

On Tuesday 22 January 2013 05:56:06 Eric Dumazet wrote:
> I guess netxen driver has a bug.
> 
> Please try the following patch :
> 
> diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
> b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c index
> bc165f4..695667d 100644
> --- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
> +++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
> @@ -144,7 +144,7 @@ void netxen_release_tx_buffers(struct netxen_adapter
> *adapter) buffrag->length, PCI_DMA_TODEVICE); buffrag->dma = 0ULL;
>                 }
> -               for (j = 0; j < cmd_buf->frag_count; j++) {
> +               for (j = 1; j < cmd_buf->frag_count; j++) {
>                         buffrag++;
>                         if (buffrag->dma) {
>                                 pci_unmap_page(adapter->pdev, buffrag->dma,

Perfect, I tested it, and this fixes the bug.

I should have found it on my own, I have been starring for too long at this 
function... :)

Feel free to add
Tested-by: Christoph Paasch <christoph.paasch@uclouvain.be>


Thanks, Eric!


Christoph


-- 
IP Networking Lab --- http://inl.info.ucl.ac.be
MultiPath TCP in the Linux Kernel --- http://mptcp.info.ucl.ac.be
UCLouvain
--

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: BUG in netxen_release_tx_buffers when TSO enabled on kernels >= 3.3 and <= 3.6
  2013-01-22 15:03     ` Christoph Paasch
@ 2013-01-22 15:43       ` Christoph Paasch
  2013-01-22 16:33         ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Paasch @ 2013-01-22 15:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ian Campbell, Sony Chacko, Rajesh Borundia, David Miller, netdev

In netxen_map_tx_skb() I think we also have to set nf->dma to 0ULL (like the 
diff below).

Otherwise, netxen_release_tx_buffer() may try to unmap something that has 
already been unmapped.

I'm not sure - I don't feel very comfortable in driver-code...


diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c 
b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
index 342b3a7..a1516a6 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
@@ -1959,10 +1959,12 @@ unwind:
        while (--i >= 0) {
                nf = &pbuf->frag_array[i+1];
                pci_unmap_page(pdev, nf->dma, nf->length, PCI_DMA_TODEVICE);
+               nf->dma = 0ULL;
        }
 
        nf = &pbuf->frag_array[0];
        pci_unmap_single(pdev, nf->dma, skb_headlen(skb), PCI_DMA_TODEVICE);
+       nf->dma = 0ULL;
 
 out_err:
        return -ENOMEM;



On Tuesday 22 January 2013 16:03:59 Christoph Paasch wrote:
> On Tuesday 22 January 2013 05:56:06 Eric Dumazet wrote:
> > I guess netxen driver has a bug.
> > 
> > Please try the following patch :
> > 
> > diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
> > b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c index
> > bc165f4..695667d 100644
> > --- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
> > +++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
> > @@ -144,7 +144,7 @@ void netxen_release_tx_buffers(struct netxen_adapter
> > *adapter) buffrag->length, PCI_DMA_TODEVICE); buffrag->dma = 0ULL;
> > 
> >                 }
> > 
> > -               for (j = 0; j < cmd_buf->frag_count; j++) {
> > +               for (j = 1; j < cmd_buf->frag_count; j++) {
> > 
> >                         buffrag++;
> >                         if (buffrag->dma) {
> >                         
> >                                 pci_unmap_page(adapter->pdev,
> >                                 buffrag->dma,
> 
> Perfect, I tested it, and this fixes the bug.
> 
> I should have found it on my own, I have been starring for too long at this
> function... :)
> 
> Feel free to add
> Tested-by: Christoph Paasch <christoph.paasch@uclouvain.be>
> 
> 
> Thanks, Eric!
> 
> 
> Christoph
-- 
IP Networking Lab --- http://inl.info.ucl.ac.be
MultiPath TCP in the Linux Kernel --- http://mptcp.info.ucl.ac.be
UCLouvain
--

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: BUG in netxen_release_tx_buffers when TSO enabled on kernels >= 3.3 and <= 3.6
  2013-01-22 15:43       ` Christoph Paasch
@ 2013-01-22 16:33         ` Eric Dumazet
  2013-01-22 16:55           ` Christoph Paasch
                             ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Eric Dumazet @ 2013-01-22 16:33 UTC (permalink / raw)
  To: christoph.paasch
  Cc: Ian Campbell, Sony Chacko, Rajesh Borundia, David Miller, netdev

From: Eric Dumazet <edumazet@google.com>

On Tue, 2013-01-22 at 16:43 +0100, Christoph Paasch wrote:
> In netxen_map_tx_skb() I think we also have to set nf->dma to 0ULL (like the 
> diff below).
> 
> Otherwise, netxen_release_tx_buffer() may try to unmap something that has 
> already been unmapped.
> 
> I'm not sure - I don't feel very comfortable in driver-code...

It seems fine to me, here is the official combined patch, feel
free to add your 'Signed-off-by'

Thanks !

[PATCH] netxen: fix off by one bug in netxen_release_tx_buffer()

Christoph Paasch found netxen could trigger a BUG in its dismantle
phase, in netxen_release_tx_buffer(), using full size TSO packets.

cmd_buf->frag_count includes the skb->data part, so the loop must
start at index 1 instead of 0, or else we can make an out
of bound access to cmd_buff->frag_array[MAX_SKB_FRAGS + 2]

Christoph provided the fixes in netxen_map_tx_skb() function.
In case of a dma mapping error, its better to clear the dma fields
so that we don't try to unmap them again in netxen_release_tx_buffer()

Reported-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Cc: Sony Chacko <sony.chacko@qlogic.com>
Cc: Rajesh Borundia <rajesh.borundia@qlogic.com>
---
 drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c |    2 +-
 drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c |    2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
index bc165f4..695667d 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
@@ -144,7 +144,7 @@ void netxen_release_tx_buffers(struct netxen_adapter *adapter)
 					 buffrag->length, PCI_DMA_TODEVICE);
 			buffrag->dma = 0ULL;
 		}
-		for (j = 0; j < cmd_buf->frag_count; j++) {
+		for (j = 1; j < cmd_buf->frag_count; j++) {
 			buffrag++;
 			if (buffrag->dma) {
 				pci_unmap_page(adapter->pdev, buffrag->dma,
diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
index 6098fd4a..69e321a 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
@@ -1963,10 +1963,12 @@ unwind:
 	while (--i >= 0) {
 		nf = &pbuf->frag_array[i+1];
 		pci_unmap_page(pdev, nf->dma, nf->length, PCI_DMA_TODEVICE);
+		nf->dma = 0ULL;
 	}
 
 	nf = &pbuf->frag_array[0];
 	pci_unmap_single(pdev, nf->dma, skb_headlen(skb), PCI_DMA_TODEVICE);
+	nf->dma = 0ULL;
 
 out_err:
 	return -ENOMEM;

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: BUG in netxen_release_tx_buffers when TSO enabled on kernels >= 3.3 and <= 3.6
  2013-01-22 16:33         ` Eric Dumazet
@ 2013-01-22 16:55           ` Christoph Paasch
  2013-01-22 18:46           ` Rajesh Borundia
  2013-01-22 19:15           ` David Miller
  2 siblings, 0 replies; 12+ messages in thread
From: Christoph Paasch @ 2013-01-22 16:55 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ian Campbell, Sony Chacko, Rajesh Borundia, David Miller, netdev

On Tuesday 22 January 2013 08:33:05 Eric Dumazet wrote:
> [PATCH] netxen: fix off by one bug in netxen_release_tx_buffer()
> 
> Christoph Paasch found netxen could trigger a BUG in its dismantle
> phase, in netxen_release_tx_buffer(), using full size TSO packets.
> 
> cmd_buf->frag_count includes the skb->data part, so the loop must
> start at index 1 instead of 0, or else we can make an out
> of bound access to cmd_buff->frag_array[MAX_SKB_FRAGS + 2]
> 
> Christoph provided the fixes in netxen_map_tx_skb() function.
> In case of a dma mapping error, its better to clear the dma fields
> so that we don't try to unmap them again in netxen_release_tx_buffer()
> 
> Reported-by: Christoph Paasch <christoph.paasch@uclouvain.be>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Tested-by: Christoph Paasch <christoph.paasch@uclouvain.be>
> Cc: Sony Chacko <sony.chacko@qlogic.com>
> Cc: Rajesh Borundia <rajesh.borundia@qlogic.com>
> ---
>  drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c |    2 +-
>  drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c |    2 ++
>  2 files changed, 3 insertions(+), 1 deletion(-)

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>


-- 
IP Networking Lab --- http://inl.info.ucl.ac.be
MultiPath TCP in the Linux Kernel --- http://mptcp.info.ucl.ac.be
UCLouvain
--

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: BUG in netxen_release_tx_buffers when TSO enabled on kernels >= 3.3 and <= 3.6
  2013-01-22 16:33         ` Eric Dumazet
  2013-01-22 16:55           ` Christoph Paasch
@ 2013-01-22 18:46           ` Rajesh Borundia
  2013-01-22 19:15           ` David Miller
  2 siblings, 0 replies; 12+ messages in thread
From: Rajesh Borundia @ 2013-01-22 18:46 UTC (permalink / raw)
  To: Eric Dumazet, christoph.paasch
  Cc: Ian Campbell, Sony Chacko, David Miller, netdev

>-----Original Message-----
>From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
>Sent: Tuesday, January 22, 2013 10:03 PM
>To: christoph.paasch@uclouvain.be
>Cc: Ian Campbell; Sony Chacko; Rajesh Borundia; David Miller; netdev
>Subject: Re: BUG in netxen_release_tx_buffers when TSO enabled on
>kernels >= 3.3 and <= 3.6
>
>From: Eric Dumazet <edumazet@google.com>
>
>On Tue, 2013-01-22 at 16:43 +0100, Christoph Paasch wrote:
>> In netxen_map_tx_skb() I think we also have to set nf->dma to 0ULL
>(like the
>> diff below).
>>
>> Otherwise, netxen_release_tx_buffer() may try to unmap something that
>has
>> already been unmapped.
>>
>> I'm not sure - I don't feel very comfortable in driver-code...
>
>It seems fine to me, here is the official combined patch, feel
>free to add your 'Signed-off-by'
>
>Thanks !
>
>[PATCH] netxen: fix off by one bug in netxen_release_tx_buffer()
>
>Christoph Paasch found netxen could trigger a BUG in its dismantle
>phase, in netxen_release_tx_buffer(), using full size TSO packets.
>
>cmd_buf->frag_count includes the skb->data part, so the loop must
>start at index 1 instead of 0, or else we can make an out
>of bound access to cmd_buff->frag_array[MAX_SKB_FRAGS + 2]
>
>Christoph provided the fixes in netxen_map_tx_skb() function.
>In case of a dma mapping error, its better to clear the dma fields
>so that we don't try to unmap them again in netxen_release_tx_buffer()
>
>Reported-by: Christoph Paasch <christoph.paasch@uclouvain.be>
>Signed-off-by: Eric Dumazet <edumazet@google.com>
>Tested-by: Christoph Paasch <christoph.paasch@uclouvain.be>
>Cc: Sony Chacko <sony.chacko@qlogic.com>
>Cc: Rajesh Borundia <rajesh.borundia@qlogic.com>
>---
> drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c |    2 +-
> drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c |    2 ++
> 2 files changed, 3 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
>b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
>index bc165f4..695667d 100644
>--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
>+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
>@@ -144,7 +144,7 @@ void netxen_release_tx_buffers(struct netxen_adapter
>*adapter)
> 					 buffrag->length, PCI_DMA_TODEVICE);
> 			buffrag->dma = 0ULL;
> 		}
>-		for (j = 0; j < cmd_buf->frag_count; j++) {
>+		for (j = 1; j < cmd_buf->frag_count; j++) {
> 			buffrag++;
> 			if (buffrag->dma) {
> 				pci_unmap_page(adapter->pdev, buffrag->dma,
>diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
>b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
>index 6098fd4a..69e321a 100644
>--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
>+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
>@@ -1963,10 +1963,12 @@ unwind:
> 	while (--i >= 0) {
> 		nf = &pbuf->frag_array[i+1];
> 		pci_unmap_page(pdev, nf->dma, nf->length, PCI_DMA_TODEVICE);
>+		nf->dma = 0ULL;
> 	}
>
> 	nf = &pbuf->frag_array[0];
> 	pci_unmap_single(pdev, nf->dma, skb_headlen(skb),
>PCI_DMA_TODEVICE);
>+	nf->dma = 0ULL;
>
> out_err:
> 	return -ENOMEM;
>
>

Acked-by: Rajesh Borundia <rajesh.borundia@qlogic.com


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: BUG in netxen_release_tx_buffers when TSO enabled on kernels >= 3.3 and <= 3.6
  2013-01-22 16:33         ` Eric Dumazet
  2013-01-22 16:55           ` Christoph Paasch
  2013-01-22 18:46           ` Rajesh Borundia
@ 2013-01-22 19:15           ` David Miller
  2 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2013-01-22 19:15 UTC (permalink / raw)
  To: eric.dumazet
  Cc: christoph.paasch, Ian.Campbell, sony.chacko, rajesh.borundia, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 22 Jan 2013 08:33:05 -0800

> From: Eric Dumazet <edumazet@google.com>
 ...
> [PATCH] netxen: fix off by one bug in netxen_release_tx_buffer()
> 
> Christoph Paasch found netxen could trigger a BUG in its dismantle
> phase, in netxen_release_tx_buffer(), using full size TSO packets.
> 
> cmd_buf->frag_count includes the skb->data part, so the loop must
> start at index 1 instead of 0, or else we can make an out
> of bound access to cmd_buff->frag_array[MAX_SKB_FRAGS + 2]
> 
> Christoph provided the fixes in netxen_map_tx_skb() function.
> In case of a dma mapping error, its better to clear the dma fields
> so that we don't try to unmap them again in netxen_release_tx_buffer()
> 
> Reported-by: Christoph Paasch <christoph.paasch@uclouvain.be>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Tested-by: Christoph Paasch <christoph.paasch@uclouvain.be>

Applied and queued up for -stable, thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: BUG in netxen_release_tx_buffers when TSO enabled on kernels >= 3.3 and <= 3.6
  2013-01-22 13:56   ` Eric Dumazet
  2013-01-22 15:03     ` Christoph Paasch
@ 2013-01-22 19:36     ` Ben Hutchings
  2013-01-22 19:59       ` Eric Dumazet
  1 sibling, 1 reply; 12+ messages in thread
From: Ben Hutchings @ 2013-01-22 19:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: christoph.paasch, Ian Campbell, Sony Chacko, Rajesh Borundia,
	David Miller, netdev

On Tue, 2013-01-22 at 05:56 -0800, Eric Dumazet wrote:
> On Tue, 2013-01-22 at 05:32 -0800, Eric Dumazet wrote:
> 
> > 
> > Something doesn't properly test MAX_SKB_FRAGS, we should track it and
> > fix.
> 
> I guess netxen driver has a bug.
> 
> Please try the following patch :
> 
> diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
> index bc165f4..695667d 100644
> --- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
> +++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c
> @@ -144,7 +144,7 @@ void netxen_release_tx_buffers(struct netxen_adapter *adapter)
>  					 buffrag->length, PCI_DMA_TODEVICE);
>  			buffrag->dma = 0ULL;
>  		}
> -		for (j = 0; j < cmd_buf->frag_count; j++) {
> +		for (j = 1; j < cmd_buf->frag_count; j++) {
>  			buffrag++;
>  			if (buffrag->dma) {
>  				pci_unmap_page(adapter->pdev, buffrag->dma,
> 

There's another bug right here, which is that 0 is a valid DMA address
in some systems.  The driver should be calling pci_dma_mapping_error()
to find out whether an address is valid or not.  But it also wants to be
able to assign an invalid address to netxen_skb_frag::dma, and
unfortunately there is no way to do that in the current DMA API.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: BUG in netxen_release_tx_buffers when TSO enabled on kernels >= 3.3 and <= 3.6
  2013-01-22 19:36     ` Ben Hutchings
@ 2013-01-22 19:59       ` Eric Dumazet
  2013-01-23 19:47         ` Rajesh Borundia
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2013-01-22 19:59 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: christoph.paasch, Ian Campbell, Sony Chacko, Rajesh Borundia,
	David Miller, netdev

On Tue, 2013-01-22 at 19:36 +0000, Ben Hutchings wrote:

> There's another bug right here, which is that 0 is a valid DMA address
> in some systems.  The driver should be calling pci_dma_mapping_error()
> to find out whether an address is valid or not.  But it also wants to be
> able to assign an invalid address to netxen_skb_frag::dma, and
> unfortunately there is no way to do that in the current DMA API.
> 

I guess we should only test ->frag_count then, and set it to 0 in TX
completion path.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: BUG in netxen_release_tx_buffers when TSO enabled on kernels >= 3.3 and <= 3.6
  2013-01-22 19:59       ` Eric Dumazet
@ 2013-01-23 19:47         ` Rajesh Borundia
  0 siblings, 0 replies; 12+ messages in thread
From: Rajesh Borundia @ 2013-01-23 19:47 UTC (permalink / raw)
  To: Eric Dumazet, Ben Hutchings
  Cc: christoph.paasch, Ian Campbell, Sony Chacko, David Miller, netdev

>-----Original Message-----
>From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
>Sent: Wednesday, January 23, 2013 1:30 AM
>To: Ben Hutchings
>Cc: christoph.paasch@uclouvain.be; Ian Campbell; Sony Chacko; Rajesh
>Borundia; David Miller; netdev
>Subject: Re: BUG in netxen_release_tx_buffers when TSO enabled on
>kernels >= 3.3 and <= 3.6
>
>On Tue, 2013-01-22 at 19:36 +0000, Ben Hutchings wrote:
>
>> There's another bug right here, which is that 0 is a valid DMA address
>> in some systems.  The driver should be calling pci_dma_mapping_error()
>> to find out whether an address is valid or not.  But it also wants to
>be
>> able to assign an invalid address to netxen_skb_frag::dma, and
>> unfortunately there is no way to do that in the current DMA API.
>>
>
>I guess we should only test ->frag_count then, and set it to 0 in TX
>completion path.
>
We will send a patch with the suggested fix.


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2013-01-23 19:47 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-22 10:15 BUG in netxen_release_tx_buffers when TSO enabled on kernels >= 3.3 and <= 3.6 Christoph Paasch
2013-01-22 13:32 ` Eric Dumazet
2013-01-22 13:56   ` Eric Dumazet
2013-01-22 15:03     ` Christoph Paasch
2013-01-22 15:43       ` Christoph Paasch
2013-01-22 16:33         ` Eric Dumazet
2013-01-22 16:55           ` Christoph Paasch
2013-01-22 18:46           ` Rajesh Borundia
2013-01-22 19:15           ` David Miller
2013-01-22 19:36     ` Ben Hutchings
2013-01-22 19:59       ` Eric Dumazet
2013-01-23 19:47         ` Rajesh Borundia

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.