All of lore.kernel.org
 help / color / mirror / Atom feed
* ipgre rss is broken since gro
@ 2012-12-08 22:35 Dmitry Kravkov
  2012-12-08 23:31 ` Dmitry Kravkov
  0 siblings, 1 reply; 15+ messages in thread
From: Dmitry Kravkov @ 2012-12-08 22:35 UTC (permalink / raw)
  To: Eric Dumazet, netdev

Hi  Eric,

I'm trying to use GRE with RSS, but it looks broken on net-next since:

60769a5dcd8755715c7143b4571d5c44f01796f1 is the first bad commit
commit 60769a5dcd8755715c7143b4571d5c44f01796f1
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Sep 27 02:48:50 2012 +0000

    ipv4: gre: add GRO capability

    Add GRO capability to IPv4 GRE tunnels, using the gro_cells
    infrastructure.

    Tested using IPv4 and IPv6 TCP traffic inside this tunnel, and
    checking GRO is building large packets.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

:040000 040000 8eb4f570181b6d72abe24f8c1123b7e49134e662 fa20194bb14d1745e9271c8a962d0f140801a226 M      include
:040000 040000 6f605ade7fed9fbe5fd57d4a0c3a8dc687e64ed6 4c06b880a6a6068aa791decb900f7b449c6ec7b5 M      net

Multiple TCP streams over the tunnel cause (almost) immediately GRE interface to drop any ingress packet.

Please note that at current net-next head behavior is different - I hit null pointer dereference, I will try to bisect this behavior too.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: ipgre rss is broken since gro
  2012-12-08 22:35 ipgre rss is broken since gro Dmitry Kravkov
@ 2012-12-08 23:31 ` Dmitry Kravkov
  2012-12-09  2:01   ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Dmitry Kravkov @ 2012-12-08 23:31 UTC (permalink / raw)
  To: Dmitry Kravkov, Eric Dumazet, netdev


> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Dmitry Kravkov
> Sent: Sunday, December 09, 2012 12:35 AM
> To: Eric Dumazet; netdev@vger.kernel.org
> Subject: ipgre rss is broken since gro
> 
> Please note that at current net-next head behavior is different - I hit null pointer
> dereference, I will try to bisect this behavior too.

Here is the trace for a while:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff8144f35e>] skb_gro_receive+0xbe/0x5a0
PGD 0
Oops: 0002 [#1] SMP
Modules linked in: ip_gre gre bnx2x(O) netconsole configfs ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support sg coretemp hwmon kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode serio_raw pcspkr snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i7core_edac edac_core i2c_i801 i2c_core lpc_ich mfd_core igb dca ptp pps_core libcrc32c mdio ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 x
 ts gf128mul pata_acpi ata_generic ata_piix [last unloaded: bnx2x]
CPU 0
Pid: 0, comm: swapper/0 Tainted: G           O 3.7.0-rc7+ #38 Supermicro X8QB6/X8QB6
RIP: 0010:[<ffffffff8144f35e>]  [<ffffffff8144f35e>] skb_gro_receive+0xbe/0x5a0
RSP: 0018:ffff88047f803c80  EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff88046cc557c0 RCX: 0000000000001c04
RDX: 0000000000000900 RSI: 0000000000000000 RDI: ffff88046e37d800
RBP: ffff88047f803cf0 R08: ffff88046cc557e8 R09: ffff88046e37dec0
R10: 00000000000005c4 R11: ffff88046d872ec0 R12: ffff88046e013480
R13: 0000000000000034 R14: 0000000000000590 R15: ffff880466b9dc50
FS:  0000000000000000(0000) GS:ffff88047f800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001a0b000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/0 (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a13420)
Stack:
 ffff880469cc79a8 ffff880866296800 0000000000000482 ffff880469cc7980
 ffff88047f803cd0 ffffffff81461558 ffff880866296800 ffff88046d452080
 ffff88086ce601c8 ffff88046e013480 0000000000000590 ffff88046cc557e8
Call Trace:
 <IRQ>
 [<ffffffff81461558>] ? napi_gro_receive+0x238/0x270
 [<ffffffff814a3ef1>] tcp_gro_receive+0x271/0x2d0
 [<ffffffff814b6aa0>] tcp4_gro_receive+0xb0/0x130
 [<ffffffff814cd02a>] inet_gro_receive+0x16a/0x210
 [<ffffffff81460c79>] dev_gro_receive+0x1c9/0x2d0
 [<ffffffff8146144b>] napi_gro_receive+0x12b/0x270
 [<ffffffffa00daace>] gro_cell_poll+0x2e/0x60 [ip_gre]
 [<ffffffff81460f73>] net_rx_action+0x103/0x280
 [<ffffffff8105dfd7>] __do_softirq+0xd7/0x240
 [<ffffffff815362dc>] call_softirq+0x1c/0x30
 [<ffffffff810164a5>] do_softirq+0x65/0xa0
 [<ffffffff8105ddbd>] irq_exit+0xbd/0xe0
 [<ffffffff81536b66>] do_IRQ+0x66/0xe0
 [<ffffffff8152ca2d>] common_interrupt+0x6d/0x6d
 <EOI>
 [<ffffffff8107e4df>] ? __hrtimer_start_range_ns+0x18f/0x420
 [<ffffffff812bccc1>] ? intel_idle+0xe1/0x150
 [<ffffffff812bcca7>] ? intel_idle+0xc7/0x150
 [<ffffffff8141df79>] cpuidle_enter+0x19/0x20
 [<ffffffff8141df97>] cpuidle_enter_state+0x17/0x50
 [<ffffffff8141e8af>] cpuidle_idle_call+0xcf/0x1a0
 [<ffffffff8101cd2f>] cpu_idle+0xcf/0x120
 [<ffffffff815110e5>] rest_init+0x75/0x80
 [<ffffffff81b05f10>] start_kernel+0x3da/0x3e7
 [<ffffffff81b05954>] ? repair_env_string+0x5b/0x5b
 [<ffffffff81b05356>] x86_64_start_reservations+0x131/0x136
 [<ffffffff81b0545e>] x86_64_start_kernel+0x103/0x112
Code: e8 00 00 00 0f 87 8b 00 00 00 8b 43 68 44 29 e8 3b 43 6c 89 43 68 0f 82 c7 04 00 00 45 89 ed 4c 01 ab e0 00 00 00 49 8b 44 24 08 <48> 89 18 49 89 5c 24 08 0f b6 43 7c a8 10 0f 85 a8 04 00 00 83
RIP  [<ffffffff8144f35e>] skb_gro_receive+0xbe/0x5a0
 RSP <ffff88047f803c80>
CR2: 0000000000000000
---[ end trace e828b50927d09339 ]---
Kernel panic - not syncing: Fatal exception in interrupt
 
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: ipgre rss is broken since gro
  2012-12-08 23:31 ` Dmitry Kravkov
@ 2012-12-09  2:01   ` Eric Dumazet
  2012-12-09  2:04     ` Dmitry Kravkov
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2012-12-09  2:01 UTC (permalink / raw)
  To: Dmitry Kravkov; +Cc: netdev

On Sat, Dec 8, 2012 at 3:31 PM, Dmitry Kravkov <dmitry@broadcom.com> wrote:
> Here is the trace for a while:
>
> BUG: unable to handle kernel NULL pointer dereference at           (null)
> IP: [<ffffffff8144f35e>] skb_gro_receive+0xbe/0x5a0

Hi Dmitry

NULL pointer deref probably fixed on net tree (or Linus tree) by

http://git.kernel.org/?p=linux/kernel/git/davem/net.git;a=commit;h=c3c7c254b2e8cd99b0adf288c2a1bddacd7ba255

For the GRO stuff and RSS, I wonder if skbs have a property that makes
them dropped somewhere, you might try drop_monitor / drop_watch

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: ipgre rss is broken since gro
  2012-12-09  2:01   ` Eric Dumazet
@ 2012-12-09  2:04     ` Dmitry Kravkov
  2012-12-09 20:49       ` Dmitry Kravkov
  0 siblings, 1 reply; 15+ messages in thread
From: Dmitry Kravkov @ 2012-12-09  2:04 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On Sat, 2012-12-08 at 18:01 -0800, Eric Dumazet wrote:
> On Sat, Dec 8, 2012 at 3:31 PM, Dmitry Kravkov <dmitry@broadcom.com> wrote:
> > Here is the trace for a while:
> >
> > BUG: unable to handle kernel NULL pointer dereference at           (null)
> > IP: [<ffffffff8144f35e>] skb_gro_receive+0xbe/0x5a0
> 
> Hi Dmitry
> 
> NULL pointer deref probably fixed on net tree (or Linus tree) by
> 
> http://git.kernel.org/?p=linux/kernel/git/davem/net.git;a=commit;h=c3c7c254b2e8cd99b0adf288c2a1bddacd7ba255
> 
> For the GRO stuff and RSS, I wonder if skbs have a property that makes
> them dropped somewhere, you might try drop_monitor / drop_watch
> 

I will try both and update ... 
Thanks

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: ipgre rss is broken since gro
  2012-12-09  2:04     ` Dmitry Kravkov
@ 2012-12-09 20:49       ` Dmitry Kravkov
  2012-12-09 23:27         ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Dmitry Kravkov @ 2012-12-09 20:49 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On Sun, 2012-12-09 at 04:04 +0200, Dmitry Kravkov wrote:
> On Sat, 2012-12-08 at 18:01 -0800, Eric Dumazet wrote:
> > On Sat, Dec 8, 2012 at 3:31 PM, Dmitry Kravkov <dmitry@broadcom.com> wrote:
> > > Here is the trace for a while:
> > >
> > > BUG: unable to handle kernel NULL pointer dereference at           (null)
> > > IP: [<ffffffff8144f35e>] skb_gro_receive+0xbe/0x5a0
> > 
> > Hi Dmitry
> > 
> > NULL pointer deref probably fixed on net tree (or Linus tree) by
> > 
> > http://git.kernel.org/?p=linux/kernel/git/davem/net.git;a=commit;h=c3c7c254b2e8cd99b0adf288c2a1bddacd7ba255

This resolved the deref. Thanks.

> > For the GRO stuff and RSS, I wonder if skbs have a property that makes
> > them dropped somewhere, you might try drop_monitor / drop_watch

for this item: drop_watch does not show any drops (i've disable all
other interfaces for clear env)
I will explain a little bit more the setup:
bnx2x device (under testing) is configured for RSS for IPGRE packets.
Sending multiple (3) TCP_STREAM causes ip_gre interface to disappear
packets (even ICMP).
This is not happening with single TCP_STREAM, or before gro_cell
introduction.

i was searching for the drops by print-out in ip_gre.c but disappeared
packets completes this code:

static int ipgre_rcv(struct sk_buff *skb)
(snip)
	        printk("%s:%d\n", __FUNCTION__, __LINE__);

                tstats = this_cpu_ptr(tunnel->dev->tstats);
                u64_stats_update_begin(&tstats->syncp);
                tstats->rx_packets++;
                tstats->rx_bytes += skb->len;
                u64_stats_update_end(&tstats->syncp);

                gro_cells_receive(&tunnel->gro_cells, skb);
                return 0;

> > 
> 
> I will try both and update ... 
> Thanks

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: ipgre rss is broken since gro
  2012-12-09 20:49       ` Dmitry Kravkov
@ 2012-12-09 23:27         ` Eric Dumazet
  2012-12-10 11:32           ` Dmitry Kravkov
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2012-12-09 23:27 UTC (permalink / raw)
  To: Dmitry Kravkov; +Cc: netdev

On Sun, Dec 9, 2012 at 12:49 PM, Dmitry Kravkov <dmitry@broadcom.com> wrote:

> for this item: drop_watch does not show any drops (i've disable all
> other interfaces for clear env)
> I will explain a little bit more the setup:
> bnx2x device (under testing) is configured for RSS for IPGRE packets.
> Sending multiple (3) TCP_STREAM causes ip_gre interface to disappear
> packets (even ICMP).
> This is not happening with single TCP_STREAM, or before gro_cell
> introduction.
>
> i was searching for the drops by print-out in ip_gre.c but disappeared
> packets completes this code:
>
> static int ipgre_rcv(struct sk_buff *skb)
> (snip)
>                 printk("%s:%d\n", __FUNCTION__, __LINE__);
>
>                 tstats = this_cpu_ptr(tunnel->dev->tstats);
>                 u64_stats_update_begin(&tstats->syncp);
>                 tstats->rx_packets++;
>                 tstats->rx_bytes += skb->len;
>                 u64_stats_update_end(&tstats->syncp);
>
>                 gro_cells_receive(&tunnel->gro_cells, skb);
>                 return 0;

I dont know, I tried a bnx2x setup, and 100 tcp flows, no special problem.

If you receive a lot of packets on a single RX queue, they might be
dropped because cpu cant cope with the load
(This has nothing to do with GRE or GRO )

cat /proc/net/softnet_stat

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: ipgre rss is broken since gro
  2012-12-09 23:27         ` Eric Dumazet
@ 2012-12-10 11:32           ` Dmitry Kravkov
  2012-12-10 15:17             ` Eric Dumazet
  2012-12-10 16:54             ` Eric Dumazet
  0 siblings, 2 replies; 15+ messages in thread
From: Dmitry Kravkov @ 2012-12-10 11:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 2635 bytes --]

> -----Original Message-----
> From: Eric Dumazet [mailto:edumazet@google.com]
> Sent: Monday, December 10, 2012 1:27 AM
> To: Dmitry Kravkov
> Cc: netdev@vger.kernel.org
> Subject: Re: ipgre rss is broken since gro
> 
> On Sun, Dec 9, 2012 at 12:49 PM, Dmitry Kravkov <dmitry@broadcom.com>
> wrote:
> 
> > for this item: drop_watch does not show any drops (i've disable all
> > other interfaces for clear env)
> > I will explain a little bit more the setup:
> > bnx2x device (under testing) is configured for RSS for IPGRE packets.
> > Sending multiple (3) TCP_STREAM causes ip_gre interface to disappear
> > packets (even ICMP).
> > This is not happening with single TCP_STREAM, or before gro_cell
> > introduction.
> >
> I dont know, I tried a bnx2x setup, and 100 tcp flows, no special problem.

Current bnx2x do not apply RSS for GRE, non GRE RSS is working w/o problem.

> 
> If you receive a lot of packets on a single RX queue, they might be
> dropped because cpu cant cope with the load
> (This has nothing to do with GRE or GRO )
> 
CPU is not loaded at all 

> cat /proc/net/softnet_stat
Please find attached.

For gre interface RX and DROP statistics are advancing simultaneously (by one each ICMP request):

[root@ ~]# ifconfig gre
gre       Link encap:UNSPEC  HWaddr C0-A8-0A-40-73-72-83-D2-00-00-00-00-00-00-00-00
          inet addr:8.0.0.1  P-t-P:8.0.0.1  Mask:255.255.255.0
          inet6 addr: fe80::5efe:c0a8:a40/64 Scope:Link
          UP POINTOPOINT RUNNING NOARP  MTU:1476  Metric:1
          RX packets:1646824 errors:0 dropped:51610 overruns:0 frame:0
          TX packets:140519 errors:1 dropped:0 overruns:0 carrier:1
          collisions:0 txqueuelen:0
          RX bytes:2357650904 (2.1 GiB)  TX bytes:7309072 (6.9 MiB)

[root@ ~]# ifconfig gre
gre       Link encap:UNSPEC  HWaddr C0-A8-0A-40-73-72-83-82-00-00-00-00-00-00-00-00
          inet addr:8.0.0.1  P-t-P:8.0.0.1  Mask:255.255.255.0
          inet6 addr: fe80::5efe:c0a8:a40/64 Scope:Link
          UP POINTOPOINT RUNNING NOARP  MTU:1476  Metric:1
          RX packets:1646826 errors:0 dropped:51612 overruns:0 frame:0
          TX packets:140519 errors:1 dropped:0 overruns:0 carrier:1
          collisions:0 txqueuelen:0
          RX bytes:2357651072 (2.1 GiB)  TX bytes:7309072 (6.9 MiB) 

[root@ ~]# tcpdump -i gre
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on gre, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
2 packets dropped by interface

[-- Attachment #2: stat1 --]
[-- Type: application/octet-stream, Size: 3600 bytes --]

001b96a3 00000000 00001cb5 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000036f 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000181 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000001e6 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000012e 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000007b 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000004 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000000a 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000002 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000029e 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000043 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000015 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000000c 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000010 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000008 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000007 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00008192 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00005682 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000066bd 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00002ab5 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00001d11 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000165cf 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000005 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000203 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000009b 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000c914 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000003c 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000000b 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000788 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00001dab 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000972f 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000015 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000000c 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

[-- Attachment #3: stat2 --]
[-- Type: application/octet-stream, Size: 3600 bytes --]

001b96a3 00000000 00001cb5 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000036f 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000181 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000001e6 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000012e 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000007b 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000004 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000000a 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000002 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000029e 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000043 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000015 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000000c 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000010 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000008 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000007 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00008192 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00005682 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000066bd 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00002ab5 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00001d11 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
000165cf 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000005 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000203 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000009b 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000c919 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000003c 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000000b 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000789 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00001dab 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000972f 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000015 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0000000c 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: ipgre rss is broken since gro
  2012-12-10 11:32           ` Dmitry Kravkov
@ 2012-12-10 15:17             ` Eric Dumazet
  2012-12-10 16:54             ` Eric Dumazet
  1 sibling, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2012-12-10 15:17 UTC (permalink / raw)
  To: Dmitry Kravkov; +Cc: Eric Dumazet, netdev

On Mon, 2012-12-10 at 11:32 +0000, Dmitry Kravkov wrote:

> Current bnx2x do not apply RSS for GRE, non GRE RSS is working w/o problem.

What about you post the changes you did ?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: ipgre rss is broken since gro
  2012-12-10 11:32           ` Dmitry Kravkov
  2012-12-10 15:17             ` Eric Dumazet
@ 2012-12-10 16:54             ` Eric Dumazet
  2012-12-10 19:02               ` David Miller
  2012-12-10 19:20               ` Dmitry Kravkov
  1 sibling, 2 replies; 15+ messages in thread
From: Eric Dumazet @ 2012-12-10 16:54 UTC (permalink / raw)
  To: Dmitry Kravkov; +Cc: Eric Dumazet, netdev

On Mon, 2012-12-10 at 11:32 +0000, Dmitry Kravkov wrote:

> CPU is not loaded at all 
> 
> > cat /proc/net/softnet_stat
> Please find attached.
> 
> For gre interface RX and DROP statistics are advancing simultaneously (by one each ICMP request):
> 
> [root@ ~]# ifconfig gre
> gre       Link encap:UNSPEC  HWaddr C0-A8-0A-40-73-72-83-D2-00-00-00-00-00-00-00-00
>           inet addr:8.0.0.1  P-t-P:8.0.0.1  Mask:255.255.255.0
>           inet6 addr: fe80::5efe:c0a8:a40/64 Scope:Link
>           UP POINTOPOINT RUNNING NOARP  MTU:1476  Metric:1
>           RX packets:1646824 errors:0 dropped:51610 overruns:0 frame:0
>           TX packets:140519 errors:1 dropped:0 overruns:0 carrier:1
>           collisions:0 txqueuelen:0
>           RX bytes:2357650904 (2.1 GiB)  TX bytes:7309072 (6.9 MiB)


dropped:51610  so obviously  one cpu is fully loaded.

I believe performance problem might come from the
skb_set_queue_mapping(skb, 0); in __skb_tunnel_rx()

So all packets are queued into a single GRO queue, instead of being
split as intended in multiple queues.

I cant find why we must clear queue_mapping, so could you try :


diff --git a/include/net/dst.h b/include/net/dst.h
index 9a78810..4cb27df 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -329,7 +329,6 @@ static inline void __skb_tunnel_rx(struct sk_buff *skb, struct net_device *dev)
 	 */
 	if (!skb->l4_rxhash)
 		skb->rxhash = 0;
-	skb_set_queue_mapping(skb, 0);
 	skb_dst_drop(skb);
 	nf_reset(skb);
 }

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: ipgre rss is broken since gro
  2012-12-10 16:54             ` Eric Dumazet
@ 2012-12-10 19:02               ` David Miller
  2012-12-10 19:20               ` Dmitry Kravkov
  1 sibling, 0 replies; 15+ messages in thread
From: David Miller @ 2012-12-10 19:02 UTC (permalink / raw)
  To: eric.dumazet; +Cc: dmitry, edumazet, netdev, therbert

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 10 Dec 2012 08:54:31 -0800

> I believe performance problem might come from the
> skb_set_queue_mapping(skb, 0); in __skb_tunnel_rx()
> 
> So all packets are queued into a single GRO queue, instead of being
> split as intended in multiple queues.
> 
> I cant find why we must clear queue_mapping, so could you try :

Tom says:

commit 693019e90ca45d881109d32c0c6d29adf03f6447
Author: Tom Herbert <therbert@google.com>
Date:   Thu Sep 23 11:19:54 2010 +0000

    net: reset skb queue mapping when rx'ing over tunnel
    
    Reset queue mapping when an skb is reentering the stack via a tunnel.
    On second pass, the queue mapping from the original device is no
    longer valid.
    
    Signed-off-by: Tom Herbert <therbert@google.com>
    Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/include/net/dst.h b/include/net/dst.h
index 81d1413..0238650 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -242,6 +242,7 @@ static inline void skb_tunnel_rx(struct sk_buff *skb, struct net_device *dev)
 	dev->stats.rx_packets++;
 	dev->stats.rx_bytes += skb->len;
 	skb->rxhash = 0;
+	skb_set_queue_mapping(skb, 0);
 	skb_dst_drop(skb);
 	nf_reset(skb);
 }

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* RE: ipgre rss is broken since gro
  2012-12-10 16:54             ` Eric Dumazet
  2012-12-10 19:02               ` David Miller
@ 2012-12-10 19:20               ` Dmitry Kravkov
  2012-12-10 21:55                 ` Eric Dumazet
  1 sibling, 1 reply; 15+ messages in thread
From: Dmitry Kravkov @ 2012-12-10 19:20 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, netdev

> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Monday, December 10, 2012 6:55 PM
> To: Dmitry Kravkov
> Cc: Eric Dumazet; netdev@vger.kernel.org
> Subject: RE: ipgre rss is broken since gro
> 
> On Mon, 2012-12-10 at 11:32 +0000, Dmitry Kravkov wrote:
> 
> 
> dropped:51610  so obviously  one cpu is fully loaded.
Link partner continued pushing data overnight

> 
> I believe performance problem might come from the
> skb_set_queue_mapping(skb, 0); in __skb_tunnel_rx()
> 
> So all packets are queued into a single GRO queue, instead of being
> split as intended in multiple queues.
> 
> I cant find why we must clear queue_mapping, so could you try :
> 
> 
> diff --git a/include/net/dst.h b/include/net/dst.h
> index 9a78810..4cb27df 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -329,7 +329,6 @@ static inline void __skb_tunnel_rx(struct sk_buff *skb,
> struct net_device *dev)
>  	 */
>  	if (!skb->l4_rxhash)
>  		skb->rxhash = 0;
> -	skb_set_queue_mapping(skb, 0);
>  	skb_dst_drop(skb);
>  	nf_reset(skb);
>  }
> 
Yep, this resolved the issue - Interface is functional after 3 and 100 TCP connections. Thanks
   


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: ipgre rss is broken since gro
  2012-12-10 19:20               ` Dmitry Kravkov
@ 2012-12-10 21:55                 ` Eric Dumazet
  2012-12-10 22:32                   ` [PATCH] net: fix a race in gro_cell_poll() Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2012-12-10 21:55 UTC (permalink / raw)
  To: Dmitry Kravkov; +Cc: netdev

On Mon, 2012-12-10 at 19:20 +0000, Dmitry Kravkov wrote:

> Yep, this resolved the issue - Interface is functional after 3 and 100 TCP connections. Thanks
>    
> 

I guess its only lowering probability of the race.

Could you try instead the following patch ?

I'll address the queue_mapping separately for net-next

Thanks

diff --git a/include/net/gro_cells.h b/include/net/gro_cells.h
index 4fd8a4b..e5062c9 100644
--- a/include/net/gro_cells.h
+++ b/include/net/gro_cells.h
@@ -17,7 +17,6 @@ struct gro_cells {
 
 static inline void gro_cells_receive(struct gro_cells *gcells, struct sk_buff *skb)
 {
-	unsigned long flags;
 	struct gro_cell *cell = gcells->cells;
 	struct net_device *dev = skb->dev;
 
@@ -35,32 +34,37 @@ static inline void gro_cells_receive(struct gro_cells *gcells, struct sk_buff *s
 		return;
 	}
 
-	spin_lock_irqsave(&cell->napi_skbs.lock, flags);
+	/* We run in BH context */
+	spin_lock(&cell->napi_skbs.lock);
 
 	__skb_queue_tail(&cell->napi_skbs, skb);
 	if (skb_queue_len(&cell->napi_skbs) == 1)
 		napi_schedule(&cell->napi);
 
-	spin_unlock_irqrestore(&cell->napi_skbs.lock, flags);
+	spin_unlock(&cell->napi_skbs.lock);
 }
 
+/* called unser BH context */
 static inline int gro_cell_poll(struct napi_struct *napi, int budget)
 {
 	struct gro_cell *cell = container_of(napi, struct gro_cell, napi);
 	struct sk_buff *skb;
 	int work_done = 0;
 
+	spin_lock(&cell->napi_skbs.lock);
 	while (work_done < budget) {
-		skb = skb_dequeue(&cell->napi_skbs);
+		skb = __skb_dequeue(&cell->napi_skbs);
 		if (!skb)
 			break;
-
+		spin_unlock(&cell->napi_skbs.lock);
 		napi_gro_receive(napi, skb);
 		work_done++;
+		spin_lock(&cell->napi_skbs.lock);
 	}
 
 	if (work_done < budget)
 		napi_complete(napi);
+	spin_unlock(&cell->napi_skbs.lock);
 	return work_done;
 }
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH] net: fix a race in gro_cell_poll()
  2012-12-10 21:55                 ` Eric Dumazet
@ 2012-12-10 22:32                   ` Eric Dumazet
  2012-12-10 22:46                     ` Dmitry Kravkov
  2012-12-11 17:50                     ` David Miller
  0 siblings, 2 replies; 15+ messages in thread
From: Eric Dumazet @ 2012-12-10 22:32 UTC (permalink / raw)
  To: Dmitry Kravkov, David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

Dmitry Kravkov reported packet drops for GRE packets since GRO support
was added.

There is a race in gro_cell_poll() because we call napi_complete()
without any synchronization with a concurrent gro_cells_receive()

Once bug was triggered, we queued packets but did not schedule NAPI
poll.

We can fix this issue using the spinlock protected the napi_skbs queue,
as we have to hold it to perform skb dequeue anyway.

As we open-code skb_dequeue(), we no longer need to mask IRQS, as both
producer and consumer run under BH context.

Bug added in commit c9e6bc644e (net: add gro_cells infrastructure)

Reported-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---

David: I could reproduce the bug Dmitry reported, and have
  verified this patch fixes the issue.

 include/net/gro_cells.h |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/include/net/gro_cells.h b/include/net/gro_cells.h
index 4fd8a4b..e5062c9 100644
--- a/include/net/gro_cells.h
+++ b/include/net/gro_cells.h
@@ -17,7 +17,6 @@ struct gro_cells {
 
 static inline void gro_cells_receive(struct gro_cells *gcells, struct sk_buff *skb)
 {
-	unsigned long flags;
 	struct gro_cell *cell = gcells->cells;
 	struct net_device *dev = skb->dev;
 
@@ -35,32 +34,37 @@ static inline void gro_cells_receive(struct gro_cells *gcells, struct sk_buff *s
 		return;
 	}
 
-	spin_lock_irqsave(&cell->napi_skbs.lock, flags);
+	/* We run in BH context */
+	spin_lock(&cell->napi_skbs.lock);
 
 	__skb_queue_tail(&cell->napi_skbs, skb);
 	if (skb_queue_len(&cell->napi_skbs) == 1)
 		napi_schedule(&cell->napi);
 
-	spin_unlock_irqrestore(&cell->napi_skbs.lock, flags);
+	spin_unlock(&cell->napi_skbs.lock);
 }
 
+/* called unser BH context */
 static inline int gro_cell_poll(struct napi_struct *napi, int budget)
 {
 	struct gro_cell *cell = container_of(napi, struct gro_cell, napi);
 	struct sk_buff *skb;
 	int work_done = 0;
 
+	spin_lock(&cell->napi_skbs.lock);
 	while (work_done < budget) {
-		skb = skb_dequeue(&cell->napi_skbs);
+		skb = __skb_dequeue(&cell->napi_skbs);
 		if (!skb)
 			break;
-
+		spin_unlock(&cell->napi_skbs.lock);
 		napi_gro_receive(napi, skb);
 		work_done++;
+		spin_lock(&cell->napi_skbs.lock);
 	}
 
 	if (work_done < budget)
 		napi_complete(napi);
+	spin_unlock(&cell->napi_skbs.lock);
 	return work_done;
 }
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* RE: [PATCH] net: fix a race in gro_cell_poll()
  2012-12-10 22:32                   ` [PATCH] net: fix a race in gro_cell_poll() Eric Dumazet
@ 2012-12-10 22:46                     ` Dmitry Kravkov
  2012-12-11 17:50                     ` David Miller
  1 sibling, 0 replies; 15+ messages in thread
From: Dmitry Kravkov @ 2012-12-10 22:46 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: netdev


> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Tuesday, December 11, 2012 12:32 AM
> To: Dmitry Kravkov; David Miller
> Cc: netdev@vger.kernel.org
> Subject: [PATCH] net: fix a race in gro_cell_poll()
> 
> From: Eric Dumazet <edumazet@google.com>
> 
> Dmitry Kravkov reported packet drops for GRE packets since GRO support
> was added.
> 
> There is a race in gro_cell_poll() because we call napi_complete()
> without any synchronization with a concurrent gro_cells_receive()
> 
> Once bug was triggered, we queued packets but did not schedule NAPI
> poll.
> 
> We can fix this issue using the spinlock protected the napi_skbs queue,
> as we have to hold it to perform skb dequeue anyway.
> 
> As we open-code skb_dequeue(), we no longer need to mask IRQS, as both
> producer and consumer run under BH context.
> 
> Bug added in commit c9e6bc644e (net: add gro_cells infrastructure)
> 
> Reported-by: Dmitry Kravkov <dmitry@broadcom.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> 
> David: I could reproduce the bug Dmitry reported, and have
>   verified this patch fixes the issue.
> 
>  include/net/gro_cells.h |   14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/include/net/gro_cells.h b/include/net/gro_cells.h
> index 4fd8a4b..e5062c9 100644
> --- a/include/net/gro_cells.h
> +++ b/include/net/gro_cells.h
> @@ -17,7 +17,6 @@ struct gro_cells {
> 
>  static inline void gro_cells_receive(struct gro_cells *gcells, struct sk_buff *skb)
>  {
> -	unsigned long flags;
>  	struct gro_cell *cell = gcells->cells;
>  	struct net_device *dev = skb->dev;
> 
> @@ -35,32 +34,37 @@ static inline void gro_cells_receive(struct gro_cells
> *gcells, struct sk_buff *s
>  		return;
>  	}
> 
> -	spin_lock_irqsave(&cell->napi_skbs.lock, flags);
> +	/* We run in BH context */
> +	spin_lock(&cell->napi_skbs.lock);
> 
>  	__skb_queue_tail(&cell->napi_skbs, skb);
>  	if (skb_queue_len(&cell->napi_skbs) == 1)
>  		napi_schedule(&cell->napi);
> 
> -	spin_unlock_irqrestore(&cell->napi_skbs.lock, flags);
> +	spin_unlock(&cell->napi_skbs.lock);
>  }
> 
> +/* called unser BH context */
>  static inline int gro_cell_poll(struct napi_struct *napi, int budget)
>  {
>  	struct gro_cell *cell = container_of(napi, struct gro_cell, napi);
>  	struct sk_buff *skb;
>  	int work_done = 0;
> 
> +	spin_lock(&cell->napi_skbs.lock);
>  	while (work_done < budget) {
> -		skb = skb_dequeue(&cell->napi_skbs);
> +		skb = __skb_dequeue(&cell->napi_skbs);
>  		if (!skb)
>  			break;
> -
> +		spin_unlock(&cell->napi_skbs.lock);
>  		napi_gro_receive(napi, skb);
>  		work_done++;
> +		spin_lock(&cell->napi_skbs.lock);
>  	}
> 
>  	if (work_done < budget)
>  		napi_complete(napi);
> +	spin_unlock(&cell->napi_skbs.lock);
>  	return work_done;
>  }
> 
> 
My scenario is working, now
Thanks Eric.

Tested-by: Dmitry Kravkov <dmitry@broadcom.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] net: fix a race in gro_cell_poll()
  2012-12-10 22:32                   ` [PATCH] net: fix a race in gro_cell_poll() Eric Dumazet
  2012-12-10 22:46                     ` Dmitry Kravkov
@ 2012-12-11 17:50                     ` David Miller
  1 sibling, 0 replies; 15+ messages in thread
From: David Miller @ 2012-12-11 17:50 UTC (permalink / raw)
  To: eric.dumazet; +Cc: dmitry, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 10 Dec 2012 14:32:03 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> Dmitry Kravkov reported packet drops for GRE packets since GRO support
> was added.
> 
> There is a race in gro_cell_poll() because we call napi_complete()
> without any synchronization with a concurrent gro_cells_receive()
> 
> Once bug was triggered, we queued packets but did not schedule NAPI
> poll.
> 
> We can fix this issue using the spinlock protected the napi_skbs queue,
> as we have to hold it to perform skb dequeue anyway.
> 
> As we open-code skb_dequeue(), we no longer need to mask IRQS, as both
> producer and consumer run under BH context.
> 
> Bug added in commit c9e6bc644e (net: add gro_cells infrastructure)
> 
> Reported-by: Dmitry Kravkov <dmitry@broadcom.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2012-12-11 17:50 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-12-08 22:35 ipgre rss is broken since gro Dmitry Kravkov
2012-12-08 23:31 ` Dmitry Kravkov
2012-12-09  2:01   ` Eric Dumazet
2012-12-09  2:04     ` Dmitry Kravkov
2012-12-09 20:49       ` Dmitry Kravkov
2012-12-09 23:27         ` Eric Dumazet
2012-12-10 11:32           ` Dmitry Kravkov
2012-12-10 15:17             ` Eric Dumazet
2012-12-10 16:54             ` Eric Dumazet
2012-12-10 19:02               ` David Miller
2012-12-10 19:20               ` Dmitry Kravkov
2012-12-10 21:55                 ` Eric Dumazet
2012-12-10 22:32                   ` [PATCH] net: fix a race in gro_cell_poll() Eric Dumazet
2012-12-10 22:46                     ` Dmitry Kravkov
2012-12-11 17:50                     ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.