netdev.vger.kernel.org archive mirror
* IPv6 FIB related crash with MACVLANs in 3.9.11+ kernel.
From: Ben Greear @ 2014-02-03 20:37 UTC (permalink / raw)
  To: netdev

The kernel has some additional patches, but not much to IPv6.

The bug is that when we have lots of mac-vlans on some ixgbe ports
(500 per interface in this case), and boot up the system with the ports unplugged,
we get this crash almost every time.  Boot-up is going to do normal bootup
stuff plus create and configure the 1000 mac-vlans, dump their routing
tables, etc.

We are using one routing table per network device, and some
ip rules.

If we plug in the ixgbe ports, we do not ever see a crash.

We have not yet tried reproducing it on other drivers, but I suspect
the issue is not related to ixgbe.

Any ideas on this one?


Reading symbols from /home/greearb/kernel/2.6/linux-3.9.x64/net/ipv6/ipv6.ko...done.
(gdb) l *(fib6_walk_continue+0xd3)
0x105c0 is in fib6_walk_continue (/home/greearb/git/linux-3.9.dev.y/net/ipv6/ip6_fib.c:1423).
1418				if (fn == w->root)
1419					return 0;
1420				pn = fn->parent;
1421				w->node = pn;
1422	#ifdef CONFIG_IPV6_SUBTREES
1423				if (FIB6_SUBTREE(pn) == fn) {
1424					WARN_ON(!(fn->fn_flags & RTN_ROOT));
1425					w->state = FWS_L;
1426					continue;
1427				}
(gdb)

[root@lanforge-13100125 ~]# BUG: unable to handle kernel NULL pointer
dereference at 0000000000000018
IP: [<ffffffffa00a75c0>] fib6_walk_continue+0xd3/0x13c [ipv6]
PGD 4017c4067 PUD 3f3a94067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: nf_nat_ipv4 nf_nat fuse macvlan wanlink(O) pktgen
ip6table_filter ip6_tables ebtable_nat ebtables coretemp mperf intel_powerclamp
kvm_intel kvm iTCO_wdt iTCO_vendor_support microcode serio_raw joydev pcspkr
i2c_i801 lpc_ich e1000e ixgbe ptp pps_core mdio hwmon dca video shpchp uinput
ipv6 mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core [last unloaded:
iptable_nat]
CPU 7
Pid: 26961, comm: ip Tainted: G         C O 3.9.11+ #134 Supermicro
X9SCI/X9SCA/X9SCI/X9SCA
RIP: 0010:[<ffffffffa00a75c0>]  [<ffffffffa00a75c0>]
fib6_walk_continue+0xd3/0x13c [ipv6]
RSP: 0018:ffff880400677a48  EFLAGS: 00010283
RAX: ffff8803f8b08698 RBX: ffff8803f88ea6c0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff880400677918 RDI: ffff8803f3dde058
RBP: ffff880400677a58 R08: ffff8803f3dde034 R09: ffff8803f3dde000
R10: ffffffff810ca37a R11: ffff88041d5adef8 R12: ffff8803f3a34500
R13: ffffffff81ab5780 R14: ffff8803f88ea6c0 R15: ffff88041bcfc200
FS:  00007f054b30b740(0000) GS:ffff88042fdc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000018 CR3: 00000003f3c8c000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ip (pid: 26961, threadinfo ffff880400676000, task ffff8803ff90aee0)
Stack:
 ffff880400677aa8 ffff8803f248dd00 ffff880400677ad8 ffffffffa00a7815
 ffff8803ff90aee0 ffff88041bcfc214 0000000200000000 0000000200000020
 ffff8803f3a34500 ffff8803f248dd00 ffffffff81ab5780 0000000000000e70
Call Trace:
 [<ffffffffa00a7815>] inet6_dump_fib+0x179/0x211 [ipv6]
 [<ffffffff81535b19>] netlink_dump+0x6b/0x1b2
 [<ffffffff81535e2c>] netlink_recvmsg+0x1cc/0x322
 [<ffffffff815205d9>] ? rtnetlink_rcv+0x2b/0x2d
 [<ffffffff814ff3f5>] __sock_recvmsg+0x6a/0x77
 [<ffffffff814ff473>] sock_recvmsg+0x71/0x8a
 [<ffffffff8150aea1>] ? copy_from_user+0x9/0xb
 [<ffffffff8150b207>] ? verify_iovec+0x54/0xa8
 [<ffffffff81500f59>] ___sys_recvmsg+0x13b/0x20d
 [<ffffffff811602ca>] ? handle_mm_fault+0x536/0x550
 [<ffffffff815ce8a6>] ? __do_page_fault+0x307/0x389
 [<ffffffff81162789>] ? remove_vma+0x5d/0x65
 [<ffffffff8116467d>] ? do_munmap+0x332/0x34c
 [<ffffffff81501323>] __sys_recvmsg+0x42/0x60
 [<ffffffff8150135a>] sys_recvmsg+0x19/0x1b
 [<ffffffff815d1c99>] system_call_fastpath+0x16/0x1b
Code: 89 43 2c e9 61 ff ff ff 48 89 df ff 53 38 85 c0 75 7d ff 43 30 e9 4f ff
ff ff c6 43 28 04 48 3b 43 10 74 69 48 8b 10 48 89 53 18 <48> 39 42 18 75 20 f6
40 2a 02 75 11 be 90 05 00 00 48 c7 c7 2a
RIP  [<ffffffffa00a75c0>] fib6_walk_continue+0xd3/0x13c [ipv6]
 RSP <ffff880400677a48>
CR2: 0000000000000018

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

* Re: IPv6 FIB related crash with MACVLANs in 3.9.11+ kernel.
From: Hannes Frederic Sowa @ 2014-02-03 22:03 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev

Hi Ben,

On Mon, Feb 03, 2014 at 12:37:52PM -0800, Ben Greear wrote:
> The kernel has some additional patches, but not much to IPv6.
> 
> The bug is that when we have lots of mac-vlans on some ixgbe ports
> (500 per interface in this case), and boot up the system with the ports unplugged,
> we get this crash almost every time.  Boot-up is going to do normal bootup
> stuff plus create and configure the 1000 mac-vlans, dump their routing
> tables, etc.
> 
> We are using one routing table per network device, and some
> ip rules.
> 
> If we plug in the ixgbe ports, we do not ever see a crash.
> 
> We have not yet tried reproducing it on other drivers, but I suspect
> the issue is not related to ixgbe.
> 
> Any ideas on this one?

Could you bring the machine to a panic again with RT6_DEBUG enabled at the
top of ip6_fib.c and send a dump of the trace?

Thanks,

  Hannes

* Re: IPv6 FIB related crash with MACVLANs in 3.9.11+ kernel.
From: Ben Greear @ 2014-02-03 22:06 UTC (permalink / raw)
  To: netdev

On 02/03/2014 02:03 PM, Hannes Frederic Sowa wrote:
> Hi Ben,
> 
> Could you bring the machine to a panic again with RT6_DEBUG enabled at the
> top of ip6_fib.c and send a dump of the trace?

Yes, but it will be a bit until we can create a duplicate machine.
We ended up delivering the machine with a note to make sure the
interfaces were plugged in (we found the bug hours before shipping
the system, of course).

Thanks,
Ben


* Re: IPv6 FIB related crash with MACVLANs in 3.9.11+ kernel.
From: Ben Greear @ 2014-02-08 16:43 UTC (permalink / raw)
  To: netdev

On 02/03/2014 02:06 PM, Ben Greear wrote:
> Yes, but it will be a bit until we can create a duplicate machine.
> We ended up delivering the machine with a note to make sure the
> interfaces were plugged in (we found the bug hours before shipping
> the system, of course).

According to my system test guy, it took a lot longer to reproduce
the problem with the debug-enabled kernel, but I do not see any extra
debug messages in the serial console log or in /var/log/messages.

Thanks,
Ben


* Re: IPv6 FIB related crash with MACVLANs in 3.9.11+ kernel.
From: Hannes Frederic Sowa @ 2014-02-08 17:23 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev

On Sat, Feb 08, 2014 at 08:43:32AM -0800, Ben Greear wrote:
> According to my system test guy, it took a lot longer to reproduce
> the problem with the debug-enabled kernel, but I do not see any extra
> debug messages in the serial console log or in /var/log/messages.

Sounds like a race, then, as I thought.

I forgot: those are pr_debug()s. I usually enable them with

$ echo file net/ipv6/ip6_fib.c +p > /sys/kernel/debug/dynamic_debug/control

RT6_TRACE is pretty noisy, so you should see output immediately once there is
IPv6 traffic. Another way is to specify dyndbg="file net/ipv6/ip6_fib.c +p" on
the kernel command line.

Before trying to reproduce the crash, play with that until you can confirm
the output shows up on the console.
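
A minimal sketch of that confirmation step, assuming a kernel built with CONFIG_DYNAMIC_DEBUG=y and debugfs mounted at /sys/kernel/debug; the function names and the DYNDBG_CTL override are hypothetical. The first helper applies the same query as the echo command above and then raises the console log level: pr_debug() prints at KERN_DEBUG (7), and the console only shows messages whose level is below the console log level, which is one possible reason for seeing nothing on a serial console. The second helper counts how many ip6_fib.c call sites currently carry the "p" flag in the control file.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the dynamic debug control file.
# Assumes CONFIG_DYNAMIC_DEBUG=y and debugfs mounted at /sys/kernel/debug.
# DYNDBG_CTL may be overridden, e.g. to point at a saved copy of the file.
DYNDBG_CTL=${DYNDBG_CTL:-/sys/kernel/debug/dynamic_debug/control}

# Enable every pr_debug() call site in net/ipv6/ip6_fib.c (needs root).
enable_ip6_fib_debug() {
    echo 'file net/ipv6/ip6_fib.c +p' > "$DYNDBG_CTL" || return 1
    # pr_debug() logs at KERN_DEBUG (level 7); the console only prints
    # messages below the console log level, so raise it to 8.
    dmesg -n 8
}

# Print how many ip6_fib.c call sites have the "p" (print) flag set.
# Control-file lines look like:
#   <file>:<line> [<module>]<function> =<flags> "<format>"
# where <flags> is "_" when nothing is enabled and starts with "p" when
# printing is enabled.
ip6_fib_enabled_sites() {
    grep 'ip6_fib\.c:' "$DYNDBG_CTL" | grep -c ' =p'
}
```

If `ip6_fib_enabled_sites` reports 0 after enabling, the query did not match anything (for example, the kernel was built without CONFIG_DYNAMIC_DEBUG), and no RT6_TRACE output can be expected.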

Thanks again,

  Hannes

* Re: IPv6 FIB related crash with MACVLANs in 3.9.11+ kernel.
From: Hongmei Li @ 2014-09-25 22:24 UTC (permalink / raw)
  To: netdev

Hi Hannes and Ben,

I encountered the same kernel panic.
Have you guys made any progress on this issue?

Thanks and Best Regards,
Hongmei

* Re: IPv6 FIB related crash with MACVLANs in 3.9.11+ kernel.
From: Hannes Frederic Sowa @ 2014-09-28 12:11 UTC (permalink / raw)
  To: Hongmei Li, netdev

Hi Hongmei,

On Fri, Sep 26, 2014, at 00:24, Hongmei Li wrote:
> Hi Hannes and Ben,
> 
> I encountered the same kernel panic.
> Have you guys made any progress on this issue?

Unfortunately not; I haven't looked into it for a long time.

Can you send me a fast reproducer script? I have better machines to
reproduce this issue nowadays.

Thanks,
Hannes

* Re: IPv6 FIB related crash with MACVLANs in 3.9.11+ kernel.
From: Hongmei Li @ 2014-09-29 18:15 UTC (permalink / raw)
  To: netdev

Thanks, Hannes, for your prompt response!

So far we have encountered this issue twice in our product stability tests,
and my panic backtrace is exactly the same as the one reported here.
I still don't know how to reproduce the issue.
I tried several methods, but unfortunately I cannot reproduce it myself. :(

Thanks,
Hongmei

* Re: IPv6 FIB related crash with MACVLANs in 3.9.11+ kernel.
From: Ben Greear @ 2014-09-29 18:44 UTC (permalink / raw)
  To: Hongmei Li; +Cc: netdev

On 09/29/2014 11:15 AM, Hongmei Li wrote:
> Thanks, Hannes, for your prompt response!
> 
> So far we have encountered this issue twice in our product stability tests,
> and my panic backtrace is exactly the same as the one reported here.
> I still don't know how to reproduce the issue.
> I tried several methods, but unfortunately I cannot reproduce it myself. :(

We could reproduce it easily using our user-space tool that creates 1000
mac-vlans, configures them, dumps routing tables, etc.  But, we could
only reproduce it when we had the ixgbe ports unplugged.  If they
were properly connected to a switch, we did not see any crashes.

My original email on 2/3/2014 has more details, and the thread that
follows has some info on debugging we did at the time.

We will be happy to test patches...

Thanks,
Ben


* Re: IPv6 FIB related crash with MACVLANs in 3.9.11+ kernel.
From: Hannes Frederic Sowa @ 2014-09-29 19:03 UTC (permalink / raw)
  To: Ben Greear; +Cc: Hongmei Li, netdev

On Mo, 2014-09-29 at 11:44 -0700, Ben Greear wrote:
> We could reproduce it easily using our user-space tool that creates 1000
> mac-vlans, configures them, dumps routing tables, etc.  But, we could
> only reproduce it when we had the ixgbe ports unplugged.  If they
> were properly connected to a switch, we did not see any crashes.
> 
> My original email on 2/3/2014 has more details, and the thread that
> follows has some info on debugging we did at the time.

I just tried to reproduce the problem by disabling the port on a switch,
setting up 1000 macvlans on an ixgbe, enabling IPv6 and dumping
routing tables. Unfortunately I had no success. :(

Any more hints?

Bye,
Hannes

* Re: IPv6 FIB related crash with MACVLANs in 3.9.11+ kernel.
From: Ben Greear @ 2014-09-29 19:48 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: Hongmei Li, netdev

On 09/29/2014 12:03 PM, Hannes Frederic Sowa wrote:
> I just tried to reproduce the problem by disabling the port on a switch,
> setting up 1000 macvlans on an ixgbe, enabling IPv6 and dumping
> routing tables. Unfortunately I had no success. :(
> 
> Any more hints?

We are going to be running up to 20 concurrent scripts that
will be dumping routes & ips, and configuring ip addresses
and routes during bringup.

I don't have a simple stand-alone way to reproduce this,
but at least when we reported the problem it was very easy
for us to reproduce with our tool.

Maybe 20 scripts running in parallel that randomly configured and dumped routes
and ip addresses on random interfaces would do the trick?
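
A minimal sketch of such a stress script, for anyone trying to build a stand-alone reproducer; it is not the original tool. Everything specific is an assumption: the macvlans are named mv0, mv1, and so on, device mvN uses routing table 1000+N, and addresses come from the 2001:db8::/32 documentation prefix. WORKERS background loops (20 by default, following the suggestion above) each pick a random device and randomly add or delete an address or subnet route, or dump addresses or the whole IPv6 FIB. Any `ip -6 route show` goes through the netlink route dump (inet6_dump_fib), the path seen in the oops. With DRY_RUN=1 it prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical IPv6 route/address churn + dump stress sketch.
# Assumes macvlans mv0..mv$((NDEVS-1)) already exist and that device mvN
# uses routing table 1000+N. Source this file as root, then call: stress
WORKERS=${WORKERS:-20}          # parallel loops
ITERATIONS=${ITERATIONS:-1000}  # operations per loop
NDEVS=${NDEVS:-1000}            # number of macvlan devices

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        # Errors such as "File exists" are expected from random churn.
        "$@" >/dev/null 2>&1
    fi
}

# One random operation on one random device.
one_step() {
    local i=$((RANDOM % NDEVS))
    local dev="mv$i" tbl=$((1000 + i)) net
    net=$(printf '2001:db8:%x::' "$tbl")   # one /64 per device
    case $((RANDOM % 6)) in
        0) run ip -6 addr add "${net}1/64" dev "$dev" ;;
        1) run ip -6 addr del "${net}1/64" dev "$dev" ;;
        2) run ip -6 route add "${net}/64" dev "$dev" table "$tbl" ;;
        3) run ip -6 route del "${net}/64" dev "$dev" table "$tbl" ;;
        4) run ip -6 route show table all ;;   # full FIB dump over netlink
        5) run ip -6 addr show dev "$dev" ;;
    esac
}

# worker ID: give each loop its own $RANDOM seed, then run ITERATIONS steps.
worker() {
    RANDOM=$(( $$ + $1 ))
    local n
    for ((n = 0; n < ITERATIONS; n++)); do
        one_step
    done
}

# Start WORKERS loops in the background and wait for all of them.
stress() {
    local w
    for ((w = 0; w < WORKERS; w++)); do
        worker "$w" &
    done
    wait
}
```

Running the dumps concurrently with the add/delete churn, rather than one after the other, is the point of the design, since the crash looks like a race between a FIB walk and a tree change.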

Thanks,
Ben

* Re: IPv6 FIB related crash with MACVLANs in 3.9.11+ kernel.
From: Cong Wang @ 2014-09-29 20:39 UTC (permalink / raw)
  To: Ben Greear; +Cc: Hannes Frederic Sowa, Hongmei Li, netdev

On Mon, Sep 29, 2014 at 12:48 PM, Ben Greear <greearb@candelatech.com> wrote:
>
> We are going to be running up to 20 concurrent scripts that
> will be dumping routes & ips, and configuring ip addresses
> and routes during bringup.
>
> I don't have a simple stand-alone way to reproduce this,
> but at least when we reported the problem it was very easy
> for us to reproduce with our tool.
>
> Maybe 20 scripts running in parallel that randomly configured and dumped routes
> and ip addresses on random interfaces would do the trick?
>

I think the most important question is why this is related to macvlan.
IPv6 is L3 while macvlan is pure L2; if this is an IPv6 routing bug, it should
not be limited to macvlan.

What does your IPv6 routing table look like? And how do you configure those
macvlan interfaces?

* Re: IPv6 FIB related crash with MACVLANs in 3.9.11+ kernel.
From: Ben Greear @ 2014-09-29 21:24 UTC (permalink / raw)
  To: Cong Wang; +Cc: Hannes Frederic Sowa, Hongmei Li, netdev

On 09/29/2014 01:39 PM, Cong Wang wrote:
> I think the most important question is why this is related to macvlan.
> IPv6 is L3 while macvlan is pure L2; if this is an IPv6 routing bug, it should
> not be limited to macvlan.

We saw it using mac-vlans on top of ixgbe.  Probably it can be reproduced
elsewhere, but since we had a good test case, we didn't bother trying lots
of other combinations.

Just enabling some debug code caused the problem to be much harder
to reproduce, so it is probably some sort of race.  Maybe lots of mac-vlans
make it easier to hit.

> What does your IPv6 routing table look like? And how do you configure those
> macvlan interfaces?

Each interface would have its own routing table, with at least a subnet
route.  I'm not sure whether we were adding a default gateway.

The main routing table would also have some auto-created routes
I think.

It is configured using 'ip' to add and dump routes.
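
For illustration, a minimal sketch of the per-device setup described above. The specifics are assumptions, since the thread does not give them: macvlan names mv<N>, table ID 1000+N, one /64 per device from the 2001:db8::/32 documentation prefix, and one source-based `ip -6 rule` per table (the actual rules were not described). With DRY_RUN=1 it prints the `ip` commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a "one routing table per device" macvlan setup.
# Assumed (not from the thread): names mv<N>, table 1000+N, addresses from
# 2001:db8::/32, one source-based policy rule per table.

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_dev PARENT INDEX: create one macvlan with its own IPv6 table.
setup_dev() {
    local parent=$1 i=$2
    local dev="mv$i" tbl=$((1000 + i)) net
    net=$(printf '2001:db8:%x::' "$tbl")   # one /64 per device

    run ip link add link "$parent" name "$dev" type macvlan
    run ip link set dev "$dev" up
    run ip -6 addr add "${net}1/64" dev "$dev"
    # Subnet route in the device's private table.
    run ip -6 route add "${net}/64" dev "$dev" table "$tbl"
    # Policy rule sending traffic sourced from this subnet to that table.
    run ip -6 rule add from "${net}/64" table "$tbl"
    # Dump the table, as the boot-up tool does.
    run ip -6 route show table "$tbl"
}

# setup_range PARENT FIRST COUNT: create COUNT devices starting at FIRST.
setup_range() {
    local parent=$1 first=$2 count=$3 i
    for ((i = first; i < first + count; i++)); do
        setup_dev "$parent" "$i"
    done
}
```

For example, `setup_range eth2 0 500; setup_range eth3 500 500` would create 500 macvlans on each of two ports (the parent names are placeholders).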

Thanks,
Ben

* Re: IPv6 FIB related crash with MACVLANs in 3.9.11+ kernel.
From: Vladislav Yasevich @ 2014-10-12 11:42 UTC (permalink / raw)
  To: Ben Greear; +Cc: Cong Wang, Hannes Frederic Sowa, Hongmei Li, netdev

On Mon, Sep 29, 2014 at 5:24 PM, Ben Greear <greearb@candelatech.com> wrote:
> We saw it using mac-vlans on top of ixgbe.  Probably it can be reproduced
> elsewhere, but since we had a good test case, we didn't bother trying lots
> of other combinations.
>
> Just enabling some debug code caused the problem to be much harder
> to reproduce, so it is probably some sort of race.  Maybe lots of mac-vlans
> make it easier to hit.
>
>> What does your IPv6 routing table look like? And how do you configure those
>> macvlan interfaces?
>
> Each interface would have its own routing table, with at least a subnet
> route.  I'm not sure whether we were adding a default gateway.
>
> The main routing table would also have some auto-created routes
> I think.
>
> It is configured using 'ip' to add and dump routes.
>

Ben

Just saw this thread.  Can you check if this commit makes a difference:
commit 40b8fe45d1f094e3babe7b2dc2b71557ab71401d
Author: Vlad Yasevich <vyasevich@gmail.com>
Date:   Mon Sep 22 16:34:17 2014 -0400

    macvtap: Fix race between device delete and open.

Thanks
-vlad

* Re: IPv6 FIB related crash with MACVLANs in 3.9.11+ kernel.
From: Ben Greear @ 2014-10-13 18:06 UTC (permalink / raw)
  To: Vladislav Yasevich; +Cc: Cong Wang, Hannes Frederic Sowa, Hongmei Li, netdev

On 10/12/2014 04:42 AM, Vladislav Yasevich wrote:
> Ben
> 
> Just saw this thread.  Can you check if this commit makes a difference:
> commit 40b8fe45d1f094e3babe7b2dc2b71557ab71401d
> Author: Vlad Yasevich <vyasevich@gmail.com>
> Date:   Mon Sep 22 16:34:17 2014 -0400
> 
>     macvtap: Fix race between device delete and open.

We can retest this, but we do not use macvtap, and we have higher-priority
things to test at the moment, so it might be a while.

Thanks,
Ben
