All of lore.kernel.org
 help / color / mirror / Atom feed
* 3.17-rc1 oops during network interface configuration
@ 2014-08-18 12:18 Bart Van Assche
  2014-08-19 14:56 ` Chuck Lever
       [not found] ` <53F1EF18.7010909-HInyCGIudOg@public.gmane.org>
  0 siblings, 2 replies; 18+ messages in thread
From: Bart Van Assche @ 2014-08-18 12:18 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA; +Cc: linux-rdma

Hello,

Has anyone else already tried to boot kernel 3.17-rc1 on an IB system ? The
following call trace is triggered during boot on a system on which kernel
3.16 runs fine:

BUG: unable to handle kernel paging request at ffff88090000007e
IP: __dev_queue_xmit+0x519
Call Trace:
? __dev_queue_xmit+0x49
dev_queue_xmit+0x10
neigh_connected_output
? ip_finish_output
ip_finish_output
? ip_finish_output
? netif_rx_ni
ip_mc_output
ip_local_out_sk
ip_send_skb
udp_send_skb
udp_sendmsg
? ip_reply_glue_bits
? __lock_is_held
inet_sendmsg
? inet_sendmsg
sock_sendmsg
? might_fault
? might_fault
? move_addr_to_kernel.part.38
SYSC_sendto
? sysret_check
? trace_hardirqs_on_caller
? trace_hardirqs_on_thunk
SyS_sendto
system_call_fastpath

Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
drm_kms_helper: panic occurred, switching back to text console

A screenshot of this kernel oops can be found here:
https://drive.google.com/file/d/0B1YQOreL3_FxVDB5UTNwekF6LVU/

gdb translates the crash address into the following (not sure this makes sense
since offset 0x519 is past the end of __dev_queue_xmit()):

(gdb) list *(__dev_queue_xmit+0x519) 
0xffffffff8136bc89 is in netdev_adjacent_rename_links (net/core/dev.c:5167).
5162    void netdev_adjacent_rename_links(struct net_device *dev, char *oldname)
5163    {
5164            struct netdev_adjacent *iter;
5165
5166            list_for_each_entry(iter, &dev->adj_list.upper, list) {
5167                    netdev_adjacent_sysfs_del(iter->dev, oldname,
5168                                              &iter->dev->adj_list.lower);
5169                    netdev_adjacent_sysfs_add(iter->dev, dev,
5170                                              &iter->dev->adj_list.lower);
5171            }

And the address __dev_queue_xmit+0x49 is translated by gdb into:

(gdb) list *(__dev_queue_xmit+0x49)
0xffffffff8136b7b9 is in __dev_queue_xmit (./arch/x86/include/asm/preempt.h:75).
70       * The various preempt_count add/sub methods
71       */
72
73      static __always_inline void __preempt_count_add(int val)
74      {
75              raw_cpu_add_4(__preempt_count, val);
76      }
77
78      static __always_inline void __preempt_count_sub(int val)
79      {

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.17-rc1 oops during network interface configuration
  2014-08-18 12:18 3.17-rc1 oops during network interface configuration Bart Van Assche
@ 2014-08-19 14:56 ` Chuck Lever
       [not found]   ` <BA3DEE00-C035-41B9-8ECD-614F04483395-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
       [not found] ` <53F1EF18.7010909-HInyCGIudOg@public.gmane.org>
  1 sibling, 1 reply; 18+ messages in thread
From: Chuck Lever @ 2014-08-19 14:56 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: netdev, linux-rdma


On Aug 18, 2014, at 8:18 AM, Bart Van Assche <bvanassche@acm.org> wrote:

> Hello,
> 
> Has anyone else already tried to boot kernel 3.17-rc1 on an IB system ?

After updating to 3.17-rc1 this morning, I hit the same issue.

> The
> following call trace is triggered during boot on a system on which kernel
> 3.16 runs fine:
> 
> BUG: unable to handle kernel paging request at ffff88090000007e
> IP: __dev_queue_xmit+0x519
> Call Trace:
> ? __dev_queue_xmit+0x49
> dev_queue_xmit+0x10
> neigh_connected_output
> ? ip_finish_output
> ip_finish_output
> ? ip_finish_output
> ? netif_rx_ni
> ip_mc_output
> ip_local_out_sk
> ip_send_skb
> udp_send_skb
> udp_sendmsg
> ? ip_reply_glue_bits
> ? __lock_is_held
> inet_sendmsg
> ? inet_sendmsg
> sock_sendmsg
> ? might_fault
> ? might_fault
> ? move_addr_to_kernel.part.38
> SYSC_sendto
> ? sysret_check
> ? trace_hardirqs_on_caller
> ? trace_hardirqs_on_thunk
> SyS_sendto
> system_call_fastpath
> 
> Kernel panic - not syncing: Fatal exception in interrupt
> Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
> drm_kms_helper: panic occurred, switching back to text console
> 
> A screenshot of this kernel oops can be found here:
> https://drive.google.com/file/d/0B1YQOreL3_FxVDB5UTNwekF6LVU/
> 
> gdb translates the crash address into the following (not sure this makes sense
> since offset 0x519 is past the end of __dev_queue_xmit()):
> 
> (gdb) list *(__dev_queue_xmit+0x519) 
> 0xffffffff8136bc89 is in netdev_adjacent_rename_links (net/core/dev.c:5167).
> 5162    void netdev_adjacent_rename_links(struct net_device *dev, char *oldname)
> 5163    {
> 5164            struct netdev_adjacent *iter;
> 5165
> 5166            list_for_each_entry(iter, &dev->adj_list.upper, list) {
> 5167                    netdev_adjacent_sysfs_del(iter->dev, oldname,
> 5168                                              &iter->dev->adj_list.lower);
> 5169                    netdev_adjacent_sysfs_add(iter->dev, dev,
> 5170                                              &iter->dev->adj_list.lower);
> 5171            }
> 
> And the address __dev_queue_xmit+0x49 is translated by gdb into:
> 
> (gdb) list *(__dev_queue_xmit+0x49)
> 0xffffffff8136b7b9 is in __dev_queue_xmit (./arch/x86/include/asm/preempt.h:75).
> 70       * The various preempt_count add/sub methods
> 71       */
> 72
> 73      static __always_inline void __preempt_count_add(int val)
> 74      {
> 75              raw_cpu_add_4(__preempt_count, val);
> 76      }
> 77
> 78      static __always_inline void __preempt_count_sub(int val)
> 79      {
> 
> Bart.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.17-rc1 oops during network interface configuration
       [not found] ` <53F1EF18.7010909-HInyCGIudOg@public.gmane.org>
@ 2014-08-20 10:31     ` Or Gerlitz
  0 siblings, 0 replies; 18+ messages in thread
From: Or Gerlitz @ 2014-08-20 10:31 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma, Saeed Mahameed,
	Tal Alon, Yevgeny Petrilin

On 18/08/2014 15:18, Bart Van Assche wrote:
> Has anyone else already tried to boot kernel 3.17-rc1 on an IB system ? The
> following call trace is triggered during boot on a system on which kernel
> 3.16 runs fine:

Yep, I see it on my systems too.

I narrowed this down a bit to happen only when the port link type (these 
nodes have ConnectX) is IB and IPoIB gets to load.

I reverted (below) all the IPoIB changes since 3.16 (except for the 
trivial commit c835a67) and the crash still exists.

I guess this needs to go through systematic bisection.

Or.

> net.git]# git log --oneline --no-merges v3.16.. drivers/infiniband/ulp/ipoib/
> 8a118a4 Revert "IB/ipoib: Use P_Key change event instead of P_Key polling mechanism"
> 90e6f39 Revert "IB/ipoib: Avoid flushing the workqueue from worker context"
> 030ade7 Revert "IB/ipoib: Avoid multicast join attempts with invalid P_key"
> 97ba2ff Revert "IPoIB: Remove unnecessary test for NULL before debugfs_remove()"
> e42fa20 IPoIB: Remove unnecessary test for NULL before debugfs_remove()
> dd57c93 IB/ipoib: Avoid multicast join attempts with invalid P_key
> 4eae374 IB/ipoib: Avoid flushing the workqueue from worker context
> db84f88 IB/ipoib: Use P_Key change event instead of P_Key polling mechanism
> c835a67 net: set name_assign_type in alloc_netdev()


> BUG: unable to handle kernel paging request at ffff88090000007e
> IP: __dev_queue_xmit+0x519
> Call Trace:
> ? __dev_queue_xmit+0x49
> dev_queue_xmit+0x10
> neigh_connected_output
> ? ip_finish_output
> ip_finish_output
> ? ip_finish_output
> ? netif_rx_ni
> ip_mc_output
> ip_local_out_sk
> ip_send_skb
> udp_send_skb
> udp_sendmsg
> ? ip_reply_glue_bits
> ? __lock_is_held
> inet_sendmsg
> ? inet_sendmsg
> sock_sendmsg
> ? might_fault
> ? might_fault
> ? move_addr_to_kernel.part.38
> SYSC_sendto
> ? sysret_check
> ? trace_hardirqs_on_caller
> ? trace_hardirqs_on_thunk
> SyS_sendto
> system_call_fastpath
>
> Kernel panic - not syncing: Fatal exception in interrupt
> Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
> drm_kms_helper: panic occurred, switching back to text console
>
> A screenshot of this kernel oops can be found here:
> https://drive.google.com/file/d/0B1YQOreL3_FxVDB5UTNwekF6LVU/
>
> gdb translates the crash address into the following (not sure this makes sense
> since offset 0x519 is past the end of __dev_queue_xmit()):
>
> (gdb) list *(__dev_queue_xmit+0x519)
> 0xffffffff8136bc89 is in netdev_adjacent_rename_links (net/core/dev.c:5167).
> 5162    void netdev_adjacent_rename_links(struct net_device *dev, char *oldname)
> 5163    {
> 5164            struct netdev_adjacent *iter;
> 5165
> 5166            list_for_each_entry(iter, &dev->adj_list.upper, list) {
> 5167                    netdev_adjacent_sysfs_del(iter->dev, oldname,
> 5168                                              &iter->dev->adj_list.lower);
> 5169                    netdev_adjacent_sysfs_add(iter->dev, dev,
> 5170                                              &iter->dev->adj_list.lower);
> 5171            }
>
> And the address __dev_queue_xmit+0x49 is translated by gdb into:
>
> (gdb) list *(__dev_queue_xmit+0x49)
> 0xffffffff8136b7b9 is in __dev_queue_xmit (./arch/x86/include/asm/preempt.h:75).
> 70       * The various preempt_count add/sub methods
> 71       */
> 72
> 73      static __always_inline void __preempt_count_add(int val)
> 74      {
> 75              raw_cpu_add_4(__preempt_count, val);
> 76      }
> 77
> 78      static __always_inline void __preempt_count_sub(int val)
> 79      {

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.17-rc1 oops during network interface configuration
@ 2014-08-20 10:31     ` Or Gerlitz
  0 siblings, 0 replies; 18+ messages in thread
From: Or Gerlitz @ 2014-08-20 10:31 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma, Saeed Mahameed,
	Tal Alon, Yevgeny Petrilin

On 18/08/2014 15:18, Bart Van Assche wrote:
> Has anyone else already tried to boot kernel 3.17-rc1 on an IB system ? The
> following call trace is triggered during boot on a system on which kernel
> 3.16 runs fine:

Yep, I see it on my systems too.

I narrowed this down a bit to happen only when the port link type (these 
nodes have ConnectX) is IB and IPoIB gets to load.

I reverted (below) all the IPoIB changes since 3.16 (except for the 
trivial commit c835a67) and the crash still exists.

I guess this needs to go through systematic bisection.

Or.

> net.git]# git log --oneline --no-merges v3.16.. drivers/infiniband/ulp/ipoib/
> 8a118a4 Revert "IB/ipoib: Use P_Key change event instead of P_Key polling mechanism"
> 90e6f39 Revert "IB/ipoib: Avoid flushing the workqueue from worker context"
> 030ade7 Revert "IB/ipoib: Avoid multicast join attempts with invalid P_key"
> 97ba2ff Revert "IPoIB: Remove unnecessary test for NULL before debugfs_remove()"
> e42fa20 IPoIB: Remove unnecessary test for NULL before debugfs_remove()
> dd57c93 IB/ipoib: Avoid multicast join attempts with invalid P_key
> 4eae374 IB/ipoib: Avoid flushing the workqueue from worker context
> db84f88 IB/ipoib: Use P_Key change event instead of P_Key polling mechanism
> c835a67 net: set name_assign_type in alloc_netdev()


> BUG: unable to handle kernel paging request at ffff88090000007e
> IP: __dev_queue_xmit+0x519
> Call Trace:
> ? __dev_queue_xmit+0x49
> dev_queue_xmit+0x10
> neigh_connected_output
> ? ip_finish_output
> ip_finish_output
> ? ip_finish_output
> ? netif_rx_ni
> ip_mc_output
> ip_local_out_sk
> ip_send_skb
> udp_send_skb
> udp_sendmsg
> ? ip_reply_glue_bits
> ? __lock_is_held
> inet_sendmsg
> ? inet_sendmsg
> sock_sendmsg
> ? might_fault
> ? might_fault
> ? move_addr_to_kernel.part.38
> SYSC_sendto
> ? sysret_check
> ? trace_hardirqs_on_caller
> ? trace_hardirqs_on_thunk
> SyS_sendto
> system_call_fastpath
>
> Kernel panic - not syncing: Fatal exception in interrupt
> Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
> drm_kms_helper: panic occurred, switching back to text console
>
> A screenshot of this kernel oops can be found here:
> https://drive.google.com/file/d/0B1YQOreL3_FxVDB5UTNwekF6LVU/
>
> gdb translates the crash address into the following (not sure this makes sense
> since offset 0x519 is past the end of __dev_queue_xmit()):
>
> (gdb) list *(__dev_queue_xmit+0x519)
> 0xffffffff8136bc89 is in netdev_adjacent_rename_links (net/core/dev.c:5167).
> 5162    void netdev_adjacent_rename_links(struct net_device *dev, char *oldname)
> 5163    {
> 5164            struct netdev_adjacent *iter;
> 5165
> 5166            list_for_each_entry(iter, &dev->adj_list.upper, list) {
> 5167                    netdev_adjacent_sysfs_del(iter->dev, oldname,
> 5168                                              &iter->dev->adj_list.lower);
> 5169                    netdev_adjacent_sysfs_add(iter->dev, dev,
> 5170                                              &iter->dev->adj_list.lower);
> 5171            }
>
> And the address __dev_queue_xmit+0x49 is translated by gdb into:
>
> (gdb) list *(__dev_queue_xmit+0x49)
> 0xffffffff8136b7b9 is in __dev_queue_xmit (./arch/x86/include/asm/preempt.h:75).
> 70       * The various preempt_count add/sub methods
> 71       */
> 72
> 73      static __always_inline void __preempt_count_add(int val)
> 74      {
> 75              raw_cpu_add_4(__preempt_count, val);
> 76      }
> 77
> 78      static __always_inline void __preempt_count_sub(int val)
> 79      {

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: 3.17-rc1 oops during network interface configuration
       [not found]   ` <BA3DEE00-C035-41B9-8ECD-614F04483395-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2014-09-09 16:24       ` Steve Wise
  0 siblings, 0 replies; 18+ messages in thread
From: Steve Wise @ 2014-09-09 16:24 UTC (permalink / raw)
  To: 'Chuck Lever', 'Bart Van Assche'
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, 'linux-rdma'



> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Chuck Lever
> Sent: Tuesday, August 19, 2014 9:56 AM
> To: Bart Van Assche
> Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-rdma
> Subject: Re: 3.17-rc1 oops during network interface configuration
> 
> 
> On Aug 18, 2014, at 8:18 AM, Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote:
> 
> > Hello,
> >
> > Has anyone else already tried to boot kernel 3.17-rc1 on an IB system ?
> 
> After updating to 3.17-rc1 this morning, I hit the same issue.
>

I still see this with 3.17-rc4.  Perhaps somebody should fix it. :)


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: 3.17-rc1 oops during network interface configuration
@ 2014-09-09 16:24       ` Steve Wise
  0 siblings, 0 replies; 18+ messages in thread
From: Steve Wise @ 2014-09-09 16:24 UTC (permalink / raw)
  To: 'Chuck Lever', 'Bart Van Assche'
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, 'linux-rdma'



> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Chuck Lever
> Sent: Tuesday, August 19, 2014 9:56 AM
> To: Bart Van Assche
> Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-rdma
> Subject: Re: 3.17-rc1 oops during network interface configuration
> 
> 
> On Aug 18, 2014, at 8:18 AM, Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote:
> 
> > Hello,
> >
> > Has anyone else already tried to boot kernel 3.17-rc1 on an IB system ?
> 
> After updating to 3.17-rc1 this morning, I hit the same issue.
>

I still see this with 3.17-rc4.  Perhaps somebody should fix it. :)


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.17-rc1 oops during network interface configuration
  2014-09-09 16:24       ` Steve Wise
  (?)
@ 2014-09-09 16:41       ` Chuck Lever
  -1 siblings, 0 replies; 18+ messages in thread
From: Chuck Lever @ 2014-09-09 16:41 UTC (permalink / raw)
  To: Steve Wise; +Cc: Bart Van Assche, netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma


On Sep 9, 2014, at 12:24 PM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> wrote:

> 
> 
>> -----Original Message-----
>> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner@vger.kernel.org] On Behalf Of Chuck Lever
>> Sent: Tuesday, August 19, 2014 9:56 AM
>> To: Bart Van Assche
>> Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-rdma
>> Subject: Re: 3.17-rc1 oops during network interface configuration
>> 
>> 
>> On Aug 18, 2014, at 8:18 AM, Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote:
>> 
>>> Hello,
>>> 
>>> Has anyone else already tried to boot kernel 3.17-rc1 on an IB system ?
>> 
>> After updating to 3.17-rc1 this morning, I hit the same issue.
>> 
> 
> I still see this with 3.17-rc4.  Perhaps somebody should fix it. :)

I’m working on it right now.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.17-rc1 oops during network interface configuration
  2014-08-20 10:31     ` Or Gerlitz
  (?)
@ 2014-09-09 19:30     ` Chuck Lever
       [not found]       ` <F580F466-28A5-4BDD-A338-D5C065A760C1-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  -1 siblings, 1 reply; 18+ messages in thread
From: Chuck Lever @ 2014-09-09 19:30 UTC (permalink / raw)
  To: netdev, LKML Kernel
  Cc: linux-rdma, Bart Van Assche, Or Gerlitz, Saeed Mahameed,
	Tal Alon, Yevgeny Petrilin, _govind


On Aug 20, 2014, at 6:31 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:

> On 18/08/2014 15:18, Bart Van Assche wrote:
>> Has anyone else already tried to boot kernel 3.17-rc1 on an IB system ? The
>> following call trace is triggered during boot on a system on which kernel
>> 3.16 runs fine:
> 
> Yep, I see it on my systems too.
> 
> I narrowed this down a bit to happen only when the port link type (these nodes have ConnectX) is IB and IPoIB gets to load.
> 
> I reverted (below) all the IPoIB changes since 3.16 (except for the trivial commit c835a67) and the crash still exists.
> 
> I guess this needs to go through systematic bisection.

This crash happens when booting v3.17-rcN on any of my IB-enabled
systems. I have both ConnectX-2 and mthca systems, all are affected.

I bisected this to:

commit e0f31d8498676fda36289603a054d0d490aa2679
Author:     Govindarajulu Varadarajan <_govind@gmx.com>
AuthorDate: Mon Jun 23 16:07:58 2014 +0530
Commit:     David S. Miller <davem@davemloft.net>
CommitDate: Mon Jun 23 14:32:19 2014 -0700

    flow_keys: Record IP layer protocol in skb_flow_dissect()

    skb_flow_dissect() dissects only transport header type in ip_proto. It dose not
    give any information about IPv4 or IPv6.
    This patch adds new member, n_proto, to struct flow_keys. Which records the
    IP layer type. i.e IPv4 or IPv6.
    This can be used in netdev->ndo_rx_flow_steer driver function to dissect flow.
    Adding new member to flow_keys increases the struct size by around 4 bytes.
    This causes BUILD_BUG_ON(sizeof(qcb->data) < sz); to fail in
    qdisc_cb_private_validate()
    So increase data size by 4

    Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>


This commit includes a hunk that increases the size of struct qdisc_skb_cb
by at least 4 bytes:

> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 624f985..a3cfb8e 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -231,7 +231,7 @@ struct qdisc_skb_cb {
>         unsigned int            pkt_len;
>         u16                     slave_dev_queue_mapping;
>         u16                     _pad;
> -       unsigned char           data[20];
> +       unsigned char           data[24];
>  };
>  
>  static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)



IPoIB defines the following structure in drivers/infiniband/ulp/ipoib/ipoib.h:

> struct ipoib_cb {
>         struct qdisc_skb_cb     qdisc_cb;
>         u8                      hwaddr[INFINIBAND_ALEN];
> };

IPoIB keeps this in the sk_buff:cb field, which is exactly 48 bytes.
After commit e0f31d84, the size of struct ipoib_cb on x86_64 becomes
52 bytes.

Thus IPoIB overruns sk_buff:cb, and trashes the sk_buff::_skb_refdst
field, which contains a pointer. By the time we get into
__dev_queue_xmit() and try to use the result of skb_dst(), that pointer
is garbage, and we oops.

Obviously, cb[] could be increased to 56 bytes to accommodate struct
ipoib_cb. I tried this, and it is effective in preventing the oops on
one of my systems.

But I suspect there is an historical reason I’m not aware of that it
has remained 48 bytes for years.


> Or.
> 
>> net.git]# git log --oneline --no-merges v3.16.. drivers/infiniband/ulp/ipoib/
>> 8a118a4 Revert "IB/ipoib: Use P_Key change event instead of P_Key polling mechanism"
>> 90e6f39 Revert "IB/ipoib: Avoid flushing the workqueue from worker context"
>> 030ade7 Revert "IB/ipoib: Avoid multicast join attempts with invalid P_key"
>> 97ba2ff Revert "IPoIB: Remove unnecessary test for NULL before debugfs_remove()"
>> e42fa20 IPoIB: Remove unnecessary test for NULL before debugfs_remove()
>> dd57c93 IB/ipoib: Avoid multicast join attempts with invalid P_key
>> 4eae374 IB/ipoib: Avoid flushing the workqueue from worker context
>> db84f88 IB/ipoib: Use P_Key change event instead of P_Key polling mechanism
>> c835a67 net: set name_assign_type in alloc_netdev()
> 
> 
>> BUG: unable to handle kernel paging request at ffff88090000007e
>> IP: __dev_queue_xmit+0x519
>> Call Trace:
>> ? __dev_queue_xmit+0x49
>> dev_queue_xmit+0x10
>> neigh_connected_output
>> ? ip_finish_output
>> ip_finish_output
>> ? ip_finish_output
>> ? netif_rx_ni
>> ip_mc_output
>> ip_local_out_sk
>> ip_send_skb
>> udp_send_skb
>> udp_sendmsg
>> ? ip_reply_glue_bits
>> ? __lock_is_held
>> inet_sendmsg
>> ? inet_sendmsg
>> sock_sendmsg
>> ? might_fault
>> ? might_fault
>> ? move_addr_to_kernel.part.38
>> SYSC_sendto
>> ? sysret_check
>> ? trace_hardirqs_on_caller
>> ? trace_hardirqs_on_thunk
>> SyS_sendto
>> system_call_fastpath
>> 
>> Kernel panic - not syncing: Fatal exception in interrupt
>> Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
>> drm_kms_helper: panic occurred, switching back to text console
>> 
>> A screenshot of this kernel oops can be found here:
>> https://drive.google.com/file/d/0B1YQOreL3_FxVDB5UTNwekF6LVU/
>> 
>> gdb translates the crash address into the following (not sure this makes sense
>> since offset 0x519 is past the end of __dev_queue_xmit()):
>> 
>> (gdb) list *(__dev_queue_xmit+0x519)
>> 0xffffffff8136bc89 is in netdev_adjacent_rename_links (net/core/dev.c:5167).
>> 5162    void netdev_adjacent_rename_links(struct net_device *dev, char *oldname)
>> 5163    {
>> 5164            struct netdev_adjacent *iter;
>> 5165
>> 5166            list_for_each_entry(iter, &dev->adj_list.upper, list) {
>> 5167                    netdev_adjacent_sysfs_del(iter->dev, oldname,
>> 5168                                              &iter->dev->adj_list.lower);
>> 5169                    netdev_adjacent_sysfs_add(iter->dev, dev,
>> 5170                                              &iter->dev->adj_list.lower);
>> 5171            }
>> 
>> And the address __dev_queue_xmit+0x49 is translated by gdb into:
>> 
>> (gdb) list *(__dev_queue_xmit+0x49)
>> 0xffffffff8136b7b9 is in __dev_queue_xmit (./arch/x86/include/asm/preempt.h:75).
>> 70       * The various preempt_count add/sub methods
>> 71       */
>> 72
>> 73      static __always_inline void __preempt_count_add(int val)
>> 74      {
>> 75              raw_cpu_add_4(__preempt_count, val);
>> 76      }
>> 77
>> 78      static __always_inline void __preempt_count_sub(int val)
>> 79      {
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.17-rc1 oops during network interface configuration
  2014-09-09 19:30     ` Chuck Lever
       [not found]       ` <F580F466-28A5-4BDD-A338-D5C065A760C1-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2014-09-10  7:42           ` Or Gerlitz
  0 siblings, 0 replies; 18+ messages in thread
From: Or Gerlitz @ 2014-09-10  7:42 UTC (permalink / raw)
  To: Chuck Lever, Govindarajulu Varadarajan, David S. Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, LKML Kernel, linux-rdma,
	Bart Van Assche, Saeed Mahameed, Tal Alon, Yevgeny Petrilin,
	Roland Dreier


On 9/9/2014 10:30 PM, Chuck Lever wrote:
> This crash happens when booting v3.17-rcN on any of my IB-enabled
> systems. I have both ConnectX-2 and mthca systems, all are affected.
>
> I bisected this to:
>
> commit e0f31d8498676fda36289603a054d0d490aa2679
> Author:     Govindarajulu Varadarajan <_govind-KK0ffGbhmjU@public.gmane.org>
> AuthorDate: Mon Jun 23 16:07:58 2014 +0530
> Commit:     David S. Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
> CommitDate: Mon Jun 23 14:32:19 2014 -0700
>
>      flow_keys: Record IP layer protocol in skb_flow_dissect()
>
>      skb_flow_dissect() dissects only transport header type in ip_proto. It dose not
>      give any information about IPv4 or IPv6.
>      This patch adds new member, n_proto, to struct flow_keys. Which records the
>      IP layer type. i.e IPv4 or IPv6.
>      This can be used in netdev->ndo_rx_flow_steer driver function to dissect flow.
>      Adding new member to flow_keys increases the struct size by around 4 bytes.
>      This causes BUILD_BUG_ON(sizeof(qcb->data) < sz); to fail in
>      qdisc_cb_private_validate()
>      So increase data size by 4
>
>      Signed-off-by: Govindarajulu Varadarajan <_govind-KK0ffGbhmjU@public.gmane.org>
>      Signed-off-by: David S. Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
>
>
> This commit includes a hunk that increases the size of struct qdisc_skb_cb
> by at least 4 bytes:
>
>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>> index 624f985..a3cfb8e 100644
>> --- a/include/net/sch_generic.h
>> +++ b/include/net/sch_generic.h
>> @@ -231,7 +231,7 @@ struct qdisc_skb_cb {
>>          unsigned int            pkt_len;
>>          u16                     slave_dev_queue_mapping;
>>          u16                     _pad;
>> -       unsigned char           data[20];
>> +       unsigned char           data[24];
>>   };
>>   
>>   static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
>
>
> IPoIB defines the following structure in drivers/infiniband/ulp/ipoib/ipoib.h:
>
>> struct ipoib_cb {
>>          struct qdisc_skb_cb     qdisc_cb;
>>          u8                      hwaddr[INFINIBAND_ALEN];
>> };
> IPoIB keeps this in the sk_buff:cb field, which is exactly 48 bytes.
> After commit e0f31d84, the size of struct ipoib_cb on x86_64 becomes
> 52 bytes.
>
> Thus IPoIB overruns sk_buff:cb, and trashes the sk_buff::_skb_refdst
> field, which contains a pointer. By the time we get into
> __dev_queue_xmit() and try to use the result of skb_dst(), that pointer
> is garbage, and we oops.
>
> Obviously, cb[] could be increased to 56 bytes to accommodate struct
> ipoib_cb. I tried this, and it is effective in preventing the oops on
> one of my systems.
>
> But I suspect there is an historical reason I’m not aware of that it
> has remained 48 bytes for years.

Hi Chuck, thanks for bisecting this out. Indeed, as of this kernel 3.2 
commit 936d7de "IPoIB: Stop lying about hard_header_len and use skb->cb 
to stash LL addresses" we are using the skb->cb field to enable proper 
work under GRO and avoid another historical quirk we had there... so I 
think we can definetly consider commit e0f31d849 to introduce a severe 
regression... Govindarajulu, Dave - what's your thinking here? any quick 
idea on how to fix?

Also, I was thinking we have the mechanics in the kernel, e.g commit 
a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound explicit.") to 
catch such over-flows?

Or.

>>> BUG: unable to handle kernel paging request at ffff88090000007e
>>> IP: __dev_queue_xmit+0x519
>>> Call Trace:
>>> ? __dev_queue_xmit+0x49
>>> dev_queue_xmit+0x10
>>> neigh_connected_output
>>> ? ip_finish_output
>>> ip_finish_output
>>> ? ip_finish_output
>>> ? netif_rx_ni
>>> ip_mc_output
>>> ip_local_out_sk
>>> ip_send_skb
>>> udp_send_skb
>>> udp_sendmsg
>>> ? ip_reply_glue_bits
>>> ? __lock_is_held
>>> inet_sendmsg
>>> ? inet_sendmsg
>>> sock_sendmsg
>>> ? might_fault
>>> ? might_fault
>>> ? move_addr_to_kernel.part.38
>>> SYSC_sendto
>>> ? sysret_check
>>> ? trace_hardirqs_on_caller
>>> ? trace_hardirqs_on_thunk
>>> SyS_sendto
>>> system_call_fastpath
>>>
>>> Kernel panic - not syncing: Fatal exception in interrupt
>>> Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
>>> drm_kms_helper: panic occurred, switching back to text console
>>>
>>> A screenshot of this kernel oops can be found here:
>>> https://drive.google.com/file/d/0B1YQOreL3_FxVDB5UTNwekF6LVU/
>>>
>>> gdb translates the crash address into the following (not sure this makes sense
>>> since offset 0x519 is past the end of __dev_queue_xmit()):
>>>
>>> (gdb) list *(__dev_queue_xmit+0x519)
>>> 0xffffffff8136bc89 is in netdev_adjacent_rename_links (net/core/dev.c:5167).
>>> 5162    void netdev_adjacent_rename_links(struct net_device *dev, char *oldname)
>>> 5163    {
>>> 5164            struct netdev_adjacent *iter;
>>> 5165
>>> 5166            list_for_each_entry(iter, &dev->adj_list.upper, list) {
>>> 5167                    netdev_adjacent_sysfs_del(iter->dev, oldname,
>>> 5168                                              &iter->dev->adj_list.lower);
>>> 5169                    netdev_adjacent_sysfs_add(iter->dev, dev,
>>> 5170                                              &iter->dev->adj_list.lower);
>>> 5171            }
>>>
>>> And the address __dev_queue_xmit+0x49 is translated by gdb into:
>>>
>>> (gdb) list *(__dev_queue_xmit+0x49)
>>> 0xffffffff8136b7b9 is in __dev_queue_xmit (./arch/x86/include/asm/preempt.h:75).
>>> 70       * The various preempt_count add/sub methods
>>> 71       */
>>> 72
>>> 73      static __always_inline void __preempt_count_add(int val)
>>> 74      {
>>> 75              raw_cpu_add_4(__preempt_count, val);
>>> 76      }
>>> 77
>>> 78      static __always_inline void __preempt_count_sub(int val)
>>> 79      {

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.17-rc1 oops during network interface configuration
@ 2014-09-10  7:42           ` Or Gerlitz
  0 siblings, 0 replies; 18+ messages in thread
From: Or Gerlitz @ 2014-09-10  7:42 UTC (permalink / raw)
  To: Chuck Lever, Govindarajulu Varadarajan, David S. Miller
  Cc: netdev, LKML Kernel, linux-rdma, Bart Van Assche, Saeed Mahameed,
	Tal Alon, Yevgeny Petrilin, Roland Dreier


On 9/9/2014 10:30 PM, Chuck Lever wrote:
> This crash happens when booting v3.17-rcN on any of my IB-enabled
> systems. I have both ConnectX-2 and mthca systems, all are affected.
>
> I bisected this to:
>
> commit e0f31d8498676fda36289603a054d0d490aa2679
> Author:     Govindarajulu Varadarajan <_govind@gmx.com>
> AuthorDate: Mon Jun 23 16:07:58 2014 +0530
> Commit:     David S. Miller <davem@davemloft.net>
> CommitDate: Mon Jun 23 14:32:19 2014 -0700
>
>      flow_keys: Record IP layer protocol in skb_flow_dissect()
>
>      skb_flow_dissect() dissects only transport header type in ip_proto. It dose not
>      give any information about IPv4 or IPv6.
>      This patch adds new member, n_proto, to struct flow_keys. Which records the
>      IP layer type. i.e IPv4 or IPv6.
>      This can be used in netdev->ndo_rx_flow_steer driver function to dissect flow.
>      Adding new member to flow_keys increases the struct size by around 4 bytes.
>      This causes BUILD_BUG_ON(sizeof(qcb->data) < sz); to fail in
>      qdisc_cb_private_validate()
>      So increase data size by 4
>
>      Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
>      Signed-off-by: David S. Miller <davem@davemloft.net>
>
>
> This commit includes a hunk that increases the size of struct qdisc_skb_cb
> by at least 4 bytes:
>
>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>> index 624f985..a3cfb8e 100644
>> --- a/include/net/sch_generic.h
>> +++ b/include/net/sch_generic.h
>> @@ -231,7 +231,7 @@ struct qdisc_skb_cb {
>>          unsigned int            pkt_len;
>>          u16                     slave_dev_queue_mapping;
>>          u16                     _pad;
>> -       unsigned char           data[20];
>> +       unsigned char           data[24];
>>   };
>>   
>>   static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
>
>
> IPoIB defines the following structure in drivers/infiniband/ulp/ipoib/ipoib.h:
>
>> struct ipoib_cb {
>>          struct qdisc_skb_cb     qdisc_cb;
>>          u8                      hwaddr[INFINIBAND_ALEN];
>> };
> IPoIB keeps this in the sk_buff:cb field, which is exactly 48 bytes.
> After commit e0f31d84, the size of struct ipoib_cb on x86_64 becomes
> 52 bytes.
>
> Thus IPoIB overruns sk_buff:cb, and trashes the sk_buff::_skb_refdst
> field, which contains a pointer. By the time we get into
> __dev_queue_xmit() and try to use the result of skb_dst(), that pointer
> is garbage, and we oops.
>
> Obviously, cb[] could be increased to 56 bytes to accommodate struct
> ipoib_cb. I tried this, and it is effective in preventing the oops on
> one of my systems.
>
> But I suspect there is an historical reason I’m not aware of that it
> has remained 48 bytes for years.

Hi Chuck, thanks for bisecting this out. Indeed, as of this kernel 3.2 
commit 936d7de "IPoIB: Stop lying about hard_header_len and use skb->cb 
to stash LL addresses" we are using the skb->cb field to enable proper 
work under GRO and avoid another historical quirk we had there... so I 
think we can definetly consider commit e0f31d849 to introduce a severe 
regression... Govindarajulu, Dave - what's your thinking here? any quick 
idea on how to fix?

Also, I was thinking we have the mechanics in the kernel, e.g commit 
a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound explicit.") to 
catch such over-flows?

Or.

>>> BUG: unable to handle kernel paging request at ffff88090000007e
>>> IP: __dev_queue_xmit+0x519
>>> Call Trace:
>>> ? __dev_queue_xmit+0x49
>>> dev_queue_xmit+0x10
>>> neigh_connected_output
>>> ? ip_finish_output
>>> ip_finish_output
>>> ? ip_finish_output
>>> ? netif_rx_ni
>>> ip_mc_output
>>> ip_local_out_sk
>>> ip_send_skb
>>> udp_send_skb
>>> udp_sendmsg
>>> ? ip_reply_glue_bits
>>> ? __lock_is_held
>>> inet_sendmsg
>>> ? inet_sendmsg
>>> sock_sendmsg
>>> ? might_fault
>>> ? might_fault
>>> ? move_addr_to_kernel.part.38
>>> SYSC_sendto
>>> ? sysret_check
>>> ? trace_hardirqs_on_caller
>>> ? trace_hardirqs_on_thunk
>>> SyS_sendto
>>> system_call_fastpath
>>>
>>> Kernel panic - not syncing: Fatal exception in interrupt
>>> Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
>>> drm_kms_helper: panic occurred, switching back to text console
>>>
>>> A screenshot of this kernel oops can be found here:
>>> https://drive.google.com/file/d/0B1YQOreL3_FxVDB5UTNwekF6LVU/
>>>
>>> gdb translates the crash address into the following (not sure this makes sense
>>> since offset 0x519 is past the end of __dev_queue_xmit()):
>>>
>>> (gdb) list *(__dev_queue_xmit+0x519)
>>> 0xffffffff8136bc89 is in netdev_adjacent_rename_links (net/core/dev.c:5167).
>>> 5162    void netdev_adjacent_rename_links(struct net_device *dev, char *oldname)
>>> 5163    {
>>> 5164            struct netdev_adjacent *iter;
>>> 5165
>>> 5166            list_for_each_entry(iter, &dev->adj_list.upper, list) {
>>> 5167                    netdev_adjacent_sysfs_del(iter->dev, oldname,
>>> 5168                                              &iter->dev->adj_list.lower);
>>> 5169                    netdev_adjacent_sysfs_add(iter->dev, dev,
>>> 5170                                              &iter->dev->adj_list.lower);
>>> 5171            }
>>>
>>> And the address __dev_queue_xmit+0x49 is translated by gdb into:
>>>
>>> (gdb) list *(__dev_queue_xmit+0x49)
>>> 0xffffffff8136b7b9 is in __dev_queue_xmit (./arch/x86/include/asm/preempt.h:75).
>>> 70       * The various preempt_count add/sub methods
>>> 71       */
>>> 72
>>> 73      static __always_inline void __preempt_count_add(int val)
>>> 74      {
>>> 75              raw_cpu_add_4(__preempt_count, val);
>>> 76      }
>>> 77
>>> 78      static __always_inline void __preempt_count_sub(int val)
>>> 79      {


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.17-rc1 oops during network interface configuration
@ 2014-09-10  7:42           ` Or Gerlitz
  0 siblings, 0 replies; 18+ messages in thread
From: Or Gerlitz @ 2014-09-10  7:42 UTC (permalink / raw)
  To: Chuck Lever, Govindarajulu Varadarajan, David S. Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, LKML Kernel, linux-rdma,
	Bart Van Assche, Saeed Mahameed, Tal Alon, Yevgeny Petrilin,
	Roland Dreier


On 9/9/2014 10:30 PM, Chuck Lever wrote:
> This crash happens when booting v3.17-rcN on any of my IB-enabled
> systems. I have both ConnectX-2 and mthca systems, all are affected.
>
> I bisected this to:
>
> commit e0f31d8498676fda36289603a054d0d490aa2679
> Author:     Govindarajulu Varadarajan <_govind-KK0ffGbhmjU@public.gmane.org>
> AuthorDate: Mon Jun 23 16:07:58 2014 +0530
> Commit:     David S. Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
> CommitDate: Mon Jun 23 14:32:19 2014 -0700
>
>      flow_keys: Record IP layer protocol in skb_flow_dissect()
>
>      skb_flow_dissect() dissects only transport header type in ip_proto. It dose not
>      give any information about IPv4 or IPv6.
>      This patch adds new member, n_proto, to struct flow_keys. Which records the
>      IP layer type. i.e IPv4 or IPv6.
>      This can be used in netdev->ndo_rx_flow_steer driver function to dissect flow.
>      Adding new member to flow_keys increases the struct size by around 4 bytes.
>      This causes BUILD_BUG_ON(sizeof(qcb->data) < sz); to fail in
>      qdisc_cb_private_validate()
>      So increase data size by 4
>
>      Signed-off-by: Govindarajulu Varadarajan <_govind-KK0ffGbhmjU@public.gmane.org>
>      Signed-off-by: David S. Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
>
>
> This commit includes a hunk that increases the size of struct qdisc_skb_cb
> by at least 4 bytes:
>
>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>> index 624f985..a3cfb8e 100644
>> --- a/include/net/sch_generic.h
>> +++ b/include/net/sch_generic.h
>> @@ -231,7 +231,7 @@ struct qdisc_skb_cb {
>>          unsigned int            pkt_len;
>>          u16                     slave_dev_queue_mapping;
>>          u16                     _pad;
>> -       unsigned char           data[20];
>> +       unsigned char           data[24];
>>   };
>>   
>>   static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
>
>
> IPoIB defines the following structure in drivers/infiniband/ulp/ipoib/ipoib.h:
>
>> struct ipoib_cb {
>>          struct qdisc_skb_cb     qdisc_cb;
>>          u8                      hwaddr[INFINIBAND_ALEN];
>> };
> IPoIB keeps this in the sk_buff:cb field, which is exactly 48 bytes.
> After commit e0f31d84, the size of struct ipoib_cb on x86_64 becomes
> 52 bytes.
>
> Thus IPoIB overruns sk_buff:cb, and trashes the sk_buff::_skb_refdst
> field, which contains a pointer. By the time we get into
> __dev_queue_xmit() and try to use the result of skb_dst(), that pointer
> is garbage, and we oops.
>
> Obviously, cb[] could be increased to 56 bytes to accommodate struct
> ipoib_cb. I tried this, and it is effective in preventing the oops on
> one of my systems.
>
> But I suspect there is an historical reason I’m not aware of that it
> has remained 48 bytes for years.

Hi Chuck, thanks for bisecting this out. Indeed, as of this kernel 3.2 
commit 936d7de "IPoIB: Stop lying about hard_header_len and use skb->cb 
to stash LL addresses" we are using the skb->cb field to enable proper 
work under GRO and avoid another historical quirk we had there... so I 
think we can definetly consider commit e0f31d849 to introduce a severe 
regression... Govindarajulu, Dave - what's your thinking here? any quick 
idea on how to fix?

Also, I was thinking we have the mechanics in the kernel, e.g commit 
a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound explicit.") to 
catch such over-flows?

Or.

>>> BUG: unable to handle kernel paging request at ffff88090000007e
>>> IP: __dev_queue_xmit+0x519
>>> Call Trace:
>>> ? __dev_queue_xmit+0x49
>>> dev_queue_xmit+0x10
>>> neigh_connected_output
>>> ? ip_finish_output
>>> ip_finish_output
>>> ? ip_finish_output
>>> ? netif_rx_ni
>>> ip_mc_output
>>> ip_local_out_sk
>>> ip_send_skb
>>> udp_send_skb
>>> udp_sendmsg
>>> ? ip_reply_glue_bits
>>> ? __lock_is_held
>>> inet_sendmsg
>>> ? inet_sendmsg
>>> sock_sendmsg
>>> ? might_fault
>>> ? might_fault
>>> ? move_addr_to_kernel.part.38
>>> SYSC_sendto
>>> ? sysret_check
>>> ? trace_hardirqs_on_caller
>>> ? trace_hardirqs_on_thunk
>>> SyS_sendto
>>> system_call_fastpath
>>>
>>> Kernel panic - not syncing: Fatal exception in interrupt
>>> Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
>>> drm_kms_helper: panic occurred, switching back to text console
>>>
>>> A screenshot of this kernel oops can be found here:
>>> https://drive.google.com/file/d/0B1YQOreL3_FxVDB5UTNwekF6LVU/
>>>
>>> gdb translates the crash address into the following (not sure this makes sense
>>> since offset 0x519 is past the end of __dev_queue_xmit()):
>>>
>>> (gdb) list *(__dev_queue_xmit+0x519)
>>> 0xffffffff8136bc89 is in netdev_adjacent_rename_links (net/core/dev.c:5167).
>>> 5162    void netdev_adjacent_rename_links(struct net_device *dev, char *oldname)
>>> 5163    {
>>> 5164            struct netdev_adjacent *iter;
>>> 5165
>>> 5166            list_for_each_entry(iter, &dev->adj_list.upper, list) {
>>> 5167                    netdev_adjacent_sysfs_del(iter->dev, oldname,
>>> 5168                                              &iter->dev->adj_list.lower);
>>> 5169                    netdev_adjacent_sysfs_add(iter->dev, dev,
>>> 5170                                              &iter->dev->adj_list.lower);
>>> 5171            }
>>>
>>> And the address __dev_queue_xmit+0x49 is translated by gdb into:
>>>
>>> (gdb) list *(__dev_queue_xmit+0x49)
>>> 0xffffffff8136b7b9 is in __dev_queue_xmit (./arch/x86/include/asm/preempt.h:75).
>>> 70       * The various preempt_count add/sub methods
>>> 71       */
>>> 72
>>> 73      static __always_inline void __preempt_count_add(int val)
>>> 74      {
>>> 75              raw_cpu_add_4(__preempt_count, val);
>>> 76      }
>>> 77
>>> 78      static __always_inline void __preempt_count_sub(int val)
>>> 79      {

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.17-rc1 oops during network interface configuration
  2014-09-10  7:42           ` Or Gerlitz
  (?)
  (?)
@ 2014-09-10 14:24           ` Eric Dumazet
  -1 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2014-09-10 14:24 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Chuck Lever, Govindarajulu Varadarajan, David S. Miller, netdev,
	LKML Kernel, linux-rdma, Bart Van Assche, Saeed Mahameed,
	Tal Alon, Yevgeny Petrilin, Roland Dreier

On Wed, 2014-09-10 at 10:42 +0300, Or Gerlitz wrote:

> Hi Chuck, thanks for bisecting this out. Indeed, as of this kernel 3.2 
> commit 936d7de "IPoIB: Stop lying about hard_header_len and use skb->cb 
> to stash LL addresses" we are using the skb->cb field to enable proper 
> work under GRO and avoid another historical quirk we had there... so I 
> think we can definetly consider commit e0f31d849 to introduce a severe 
> regression... Govindarajulu, Dave - what's your thinking here? any quick 
> idea on how to fix?

I mentioned the IB stuff when patch was posted, and David replied :

"I think this is fine, IPOIB's control block will need still just 44
bytes after these changes, so there will still be 4 bytes to spare."

http://patchwork.ozlabs.org/patch/357584/

My suggestion was to reduce sch_choke usage and not store the whole
flow_keys.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.17-rc1 oops during network interface configuration
  2014-09-10  7:42           ` Or Gerlitz
@ 2014-09-10 20:04               ` David Miller
  -1 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2014-09-10 20:04 UTC (permalink / raw)
  To: ogerlitz-VPRAkNaXOzVWk0Htik3J/w
  Cc: chuck.lever-QHcLZuEGTsvQT0dZR+AlfA, _govind-KK0ffGbhmjU,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, bvanassche-HInyCGIudOg,
	saeedm-VPRAkNaXOzVWk0Htik3J/w, talal-VPRAkNaXOzVWk0Htik3J/w,
	yevgenyp-VPRAkNaXOzVWk0Htik3J/w, roland-DgEjT+Ai2ygdnm+yROfE0A

From: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Date: Wed, 10 Sep 2014 10:42:41 +0300

> Hi Chuck, thanks for bisecting this out. Indeed, as of this kernel 3.2
> commit 936d7de "IPoIB: Stop lying about hard_header_len and use
> skb->cb to stash LL addresses" we are using the skb->cb field to
> enable proper work under GRO and avoid another historical quirk we had
> there... so I think we can definetly consider commit e0f31d849 to
> introduce a severe regression... Govindarajulu, Dave - what's your
> thinking here? any quick idea on how to fix?

Eric mentioned that we could reduce the amount of flow state stored
in the qdisc cb in order to handle this better.

Making skb->cb[] larger is basically out of the question as far as
I'm concerned.

> Also, I was thinking we have the mechanics in the kernel, e.g commit
> a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound explicit.") to
> catch such over-flows?

Yes we should have added a build-time check so that we would discover
this issue more quickly.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.17-rc1 oops during network interface configuration
@ 2014-09-10 20:04               ` David Miller
  0 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2014-09-10 20:04 UTC (permalink / raw)
  To: ogerlitz
  Cc: chuck.lever, _govind, netdev, linux-kernel, linux-rdma,
	bvanassche, saeedm, talal, yevgenyp, roland

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Wed, 10 Sep 2014 10:42:41 +0300

> Hi Chuck, thanks for bisecting this out. Indeed, as of this kernel 3.2
> commit 936d7de "IPoIB: Stop lying about hard_header_len and use
> skb->cb to stash LL addresses" we are using the skb->cb field to
> enable proper work under GRO and avoid another historical quirk we had
> there... so I think we can definetly consider commit e0f31d849 to
> introduce a severe regression... Govindarajulu, Dave - what's your
> thinking here? any quick idea on how to fix?

Eric mentioned that we could reduce the amount of flow state stored
in the qdisc cb in order to handle this better.

Making skb->cb[] larger is basically out of the question as far as
I'm concerned.

> Also, I was thinking we have the mechanics in the kernel, e.g commit
> a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound explicit.") to
> catch such over-flows?

Yes we should have added a build-time check so that we would discover
this issue more quickly.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.17-rc1 oops during network interface configuration
  2014-09-10  7:42           ` Or Gerlitz
                             ` (3 preceding siblings ...)
  (?)
@ 2014-09-29 18:52           ` Chuck Lever
       [not found]             ` <6C43129E-4365-4BEF-ADAD-963B8F386789-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  -1 siblings, 1 reply; 18+ messages in thread
From: Chuck Lever @ 2014-09-29 18:52 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Govindarajulu Varadarajan, David S. Miller, netdev, LKML Kernel,
	linux-rdma, Bart Van Assche, Saeed Mahameed, Tal Alon,
	Yevgeny Petrilin, Roland Dreier


On Sep 10, 2014, at 3:42 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:

> 
> On 9/9/2014 10:30 PM, Chuck Lever wrote:
>> This crash happens when booting v3.17-rcN on any of my IB-enabled
>> systems. I have both ConnectX-2 and mthca systems, all are affected.
>> 
>> I bisected this to:
>> 
>> commit e0f31d8498676fda36289603a054d0d490aa2679
>> Author:     Govindarajulu Varadarajan <_govind@gmx.com>
>> AuthorDate: Mon Jun 23 16:07:58 2014 +0530
>> Commit:     David S. Miller <davem@davemloft.net>
>> CommitDate: Mon Jun 23 14:32:19 2014 -0700
>> 
>>     flow_keys: Record IP layer protocol in skb_flow_dissect()
>> 
>>     skb_flow_dissect() dissects only transport header type in ip_proto. It dose not
>>     give any information about IPv4 or IPv6.
>>     This patch adds new member, n_proto, to struct flow_keys. Which records the
>>     IP layer type. i.e IPv4 or IPv6.
>>     This can be used in netdev->ndo_rx_flow_steer driver function to dissect flow.
>>     Adding new member to flow_keys increases the struct size by around 4 bytes.
>>     This causes BUILD_BUG_ON(sizeof(qcb->data) < sz); to fail in
>>     qdisc_cb_private_validate()
>>     So increase data size by 4
>> 
>>     Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>> 
>> 
>> This commit includes a hunk that increases the size of struct qdisc_skb_cb
>> by at least 4 bytes:
>> 
>>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>>> index 624f985..a3cfb8e 100644
>>> --- a/include/net/sch_generic.h
>>> +++ b/include/net/sch_generic.h
>>> @@ -231,7 +231,7 @@ struct qdisc_skb_cb {
>>>         unsigned int            pkt_len;
>>>         u16                     slave_dev_queue_mapping;
>>>         u16                     _pad;
>>> -       unsigned char           data[20];
>>> +       unsigned char           data[24];
>>>  };
>>>    static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
>> 
>> 
>> IPoIB defines the following structure in drivers/infiniband/ulp/ipoib/ipoib.h:
>> 
>>> struct ipoib_cb {
>>>         struct qdisc_skb_cb     qdisc_cb;
>>>         u8                      hwaddr[INFINIBAND_ALEN];
>>> };
>> IPoIB keeps this in the sk_buff:cb field, which is exactly 48 bytes.
>> After commit e0f31d84, the size of struct ipoib_cb on x86_64 becomes
>> 52 bytes.
>> 
>> Thus IPoIB overruns sk_buff:cb, and trashes the sk_buff::_skb_refdst
>> field, which contains a pointer. By the time we get into
>> __dev_queue_xmit() and try to use the result of skb_dst(), that pointer
>> is garbage, and we oops.
>> 
>> Obviously, cb[] could be increased to 56 bytes to accommodate struct
>> ipoib_cb. I tried this, and it is effective in preventing the oops on
>> one of my systems.
>> 
>> But I suspect there is an historical reason I’m not aware of that it
>> has remained 48 bytes for years.
> 
> Hi Chuck, thanks for bisecting this out. Indeed, as of this kernel 3.2 commit 936d7de "IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses" we are using the skb->cb field to enable proper work under GRO and avoid another historical quirk we had there... so I think we can definetly consider commit e0f31d849 to introduce a severe regression... Govindarajulu, Dave - what's your thinking here? any quick idea on how to fix?

Hi Or-

Is there a bugzilla report filed for this issue? Has there been any progress
towards a fix?


> Also, I was thinking we have the mechanics in the kernel, e.g commit a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound explicit.") to catch such over-flows?
> 
> Or.
> 
>>>> BUG: unable to handle kernel paging request at ffff88090000007e
>>>> IP: __dev_queue_xmit+0x519
>>>> Call Trace:
>>>> ? __dev_queue_xmit+0x49
>>>> dev_queue_xmit+0x10
>>>> neigh_connected_output
>>>> ? ip_finish_output
>>>> ip_finish_output
>>>> ? ip_finish_output
>>>> ? netif_rx_ni
>>>> ip_mc_output
>>>> ip_local_out_sk
>>>> ip_send_skb
>>>> udp_send_skb
>>>> udp_sendmsg
>>>> ? ip_reply_glue_bits
>>>> ? __lock_is_held
>>>> inet_sendmsg
>>>> ? inet_sendmsg
>>>> sock_sendmsg
>>>> ? might_fault
>>>> ? might_fault
>>>> ? move_addr_to_kernel.part.38
>>>> SYSC_sendto
>>>> ? sysret_check
>>>> ? trace_hardirqs_on_caller
>>>> ? trace_hardirqs_on_thunk
>>>> SyS_sendto
>>>> system_call_fastpath
>>>> 
>>>> Kernel panic - not syncing: Fatal exception in interrupt
>>>> Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
>>>> drm_kms_helper: panic occurred, switching back to text console
>>>> 
>>>> A screenshot of this kernel oops can be found here:
>>>> https://drive.google.com/file/d/0B1YQOreL3_FxVDB5UTNwekF6LVU/
>>>> 
>>>> gdb translates the crash address into the following (not sure this makes sense
>>>> since offset 0x519 is past the end of __dev_queue_xmit()):
>>>> 
>>>> (gdb) list *(__dev_queue_xmit+0x519)
>>>> 0xffffffff8136bc89 is in netdev_adjacent_rename_links (net/core/dev.c:5167).
>>>> 5162    void netdev_adjacent_rename_links(struct net_device *dev, char *oldname)
>>>> 5163    {
>>>> 5164            struct netdev_adjacent *iter;
>>>> 5165
>>>> 5166            list_for_each_entry(iter, &dev->adj_list.upper, list) {
>>>> 5167                    netdev_adjacent_sysfs_del(iter->dev, oldname,
>>>> 5168                                              &iter->dev->adj_list.lower);
>>>> 5169                    netdev_adjacent_sysfs_add(iter->dev, dev,
>>>> 5170                                              &iter->dev->adj_list.lower);
>>>> 5171            }
>>>> 
>>>> And the address __dev_queue_xmit+0x49 is translated by gdb into:
>>>> 
>>>> (gdb) list *(__dev_queue_xmit+0x49)
>>>> 0xffffffff8136b7b9 is in __dev_queue_xmit (./arch/x86/include/asm/preempt.h:75).
>>>> 70       * The various preempt_count add/sub methods
>>>> 71       */
>>>> 72
>>>> 73      static __always_inline void __preempt_count_add(int val)
>>>> 74      {
>>>> 75              raw_cpu_add_4(__preempt_count, val);
>>>> 76      }
>>>> 77
>>>> 78      static __always_inline void __preempt_count_sub(int val)
>>>> 79      {
> 

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.17-rc1 oops during network interface configuration
  2014-09-29 18:52           ` Chuck Lever
@ 2014-09-29 19:00                 ` Eric Dumazet
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2014-09-29 19:00 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Or Gerlitz, Govindarajulu Varadarajan, David S. Miller,
	netdev-u79uwXL29TY76Z2rM5mHXA, LKML Kernel, linux-rdma,
	Bart Van Assche, Saeed Mahameed, Tal Alon, Yevgeny Petrilin,
	Roland Dreier

On Mon, 2014-09-29 at 14:52 -0400, Chuck Lever wrote:

> Is there a bugzilla report filed for this issue? Has there been any progress
> towards a fix?

This is fixed in Linus tree.

http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=257117862634d89de33fec74858b1a0ba5ab444b

http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=b49fe36208b45f76dfbcfcd3afd952a33fa9f5ce


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.17-rc1 oops during network interface configuration
@ 2014-09-29 19:00                 ` Eric Dumazet
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2014-09-29 19:00 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Or Gerlitz, Govindarajulu Varadarajan, David S. Miller, netdev,
	LKML Kernel, linux-rdma, Bart Van Assche, Saeed Mahameed,
	Tal Alon, Yevgeny Petrilin, Roland Dreier

On Mon, 2014-09-29 at 14:52 -0400, Chuck Lever wrote:

> Is there a bugzilla report filed for this issue? Has there been any progress
> towards a fix?

This is fixed in Linus tree.

http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=257117862634d89de33fec74858b1a0ba5ab444b

http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=b49fe36208b45f76dfbcfcd3afd952a33fa9f5ce



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: 3.17-rc1 oops during network interface configuration
  2014-09-29 19:00                 ` Eric Dumazet
  (?)
@ 2014-09-30 14:56                 ` Chuck Lever
  -1 siblings, 0 replies; 18+ messages in thread
From: Chuck Lever @ 2014-09-30 14:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Or Gerlitz, Govindarajulu Varadarajan, David S. Miller, netdev,
	LKML Kernel, linux-rdma, Bart Van Assche, Saeed Mahameed,
	Tal Alon, Yevgeny Petrilin, Roland Dreier


On Sep 29, 2014, at 3:00 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Mon, 2014-09-29 at 14:52 -0400, Chuck Lever wrote:
> 
>> Is there a bugzilla report filed for this issue? Has there been any progress
>> towards a fix?
> 
> This is fixed in Linus tree.
> 
> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=257117862634d89de33fec74858b1a0ba5ab444b
> 
> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=b49fe36208b45f76dfbcfcd3afd952a33fa9f5ce

Tested 3.17-rc7, fix confirmed. Many thanks, Eric!

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2014-09-30 14:56 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-18 12:18 3.17-rc1 oops during network interface configuration Bart Van Assche
2014-08-19 14:56 ` Chuck Lever
     [not found]   ` <BA3DEE00-C035-41B9-8ECD-614F04483395-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2014-09-09 16:24     ` Steve Wise
2014-09-09 16:24       ` Steve Wise
2014-09-09 16:41       ` Chuck Lever
     [not found] ` <53F1EF18.7010909-HInyCGIudOg@public.gmane.org>
2014-08-20 10:31   ` Or Gerlitz
2014-08-20 10:31     ` Or Gerlitz
2014-09-09 19:30     ` Chuck Lever
     [not found]       ` <F580F466-28A5-4BDD-A338-D5C065A760C1-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2014-09-10  7:42         ` Or Gerlitz
2014-09-10  7:42           ` Or Gerlitz
2014-09-10  7:42           ` Or Gerlitz
2014-09-10 14:24           ` Eric Dumazet
     [not found]           ` <541000F1.5000805-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2014-09-10 20:04             ` David Miller
2014-09-10 20:04               ` David Miller
2014-09-29 18:52           ` Chuck Lever
     [not found]             ` <6C43129E-4365-4BEF-ADAD-963B8F386789-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2014-09-29 19:00               ` Eric Dumazet
2014-09-29 19:00                 ` Eric Dumazet
2014-09-30 14:56                 ` Chuck Lever

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.