linux-kernel.vger.kernel.org archive mirror
* openvswitch crash on i386
@ 2019-03-05  9:40 Juerg Haefliger
From: Juerg Haefliger @ 2019-03-05  9:40 UTC (permalink / raw)
  To: pshelar, davem, netdev, dev, linux-kernel

Hi,

Running the following commands in a loop will crash an i386 5.0 kernel
typically within a few iterations:

ovs-vsctl add-br test
ovs-vsctl del-br test
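The reproducer is just these two ovs-vsctl commands run back to back in a loop. The thread does not say how the loop was driven, so the following is only a minimal C sketch under that assumption: the hypothetical helper demo_repro_loop() runs an add/del command pair through system(3) and counts how many iterations completed before a command failed.

```c
#include <stdlib.h>

/* Hypothetical helper (not part of OVS): run add_cmd then del_cmd up to
 * max_iters times via system(3). Returns the number of iterations in which
 * both commands exited with status 0; stops at the first failure. */
static int demo_repro_loop(int max_iters, const char *add_cmd,
                           const char *del_cmd)
{
	int done = 0;

	for (int i = 0; i < max_iters; i++) {
		/* Short-circuit: del_cmd is skipped if add_cmd failed. */
		if (system(add_cmd) != 0 || system(del_cmd) != 0)
			break;
		done++;
	}
	return done;
}
```

Called as demo_repro_loop(100, "ovs-vsctl add-br test", "ovs-vsctl del-br test"), it stops at the first failing command; on an affected i386 kernel the guest may oops before the function returns at all, so the return value is mostly useful on kernels that survive.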

[  106.215748] BUG: unable to handle kernel paging request at e8a35f3b
[  106.216733] #PF error: [normal kernel read fault]
[  106.217464] *pdpt = 0000000019a76001 *pde = 0000000000000000 
[  106.218346] Oops: 0000 [#1] SMP PTI
[  106.218911] CPU: 0 PID: 2050 Comm: systemd-udevd Tainted: G            E     5.0.0 #25
[  106.220103] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1ubuntu1 04/01/2014
[  106.221447] EIP: kmem_cache_alloc_trace+0x7a/0x1b0
[  106.222178] Code: 01 00 00 8b 07 64 8b 50 04 64 03 05 28 61 e8 d2 8b 08 89 4d ec 85 c9 0f 84 03 01 00 00 8b 45 ec 8b 5f 14 8d 4a 01 8b 37 01 c3 <33> 1b 33 9f b4 00 00 00 64 0f c7 0e 75 cb 8b 75 ec 8b 47 14 0f 18
[  106.224752] EAX: e8a35f3b EBX: e8a35f3b ECX: 0000869f EDX: 0000869e
[  106.225683] ESI: d2e96ef0 EDI: da401a00 EBP: d9b85dd0 ESP: d9b85db0
[  106.226662] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010282
[  106.227710] CR0: 80050033 CR2: e8a35f3b CR3: 185b8000 CR4: 000006f0
[  106.228703] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  106.229604] DR6: fffe0ff0 DR7: 00000400
[  106.230114] Call Trace:
[  106.230525]  ? kernfs_fop_open+0xb4/0x390
[  106.231176]  kernfs_fop_open+0xb4/0x390
[  106.231856]  ? security_file_open+0x7c/0xc0
[  106.232562]  do_dentry_open+0x131/0x370
[  106.233229]  ? kernfs_fop_write+0x180/0x180
[  106.233905]  vfs_open+0x25/0x30
[  106.234432]  path_openat+0x2fd/0x1450
[  106.235084]  ? cp_new_stat64+0x115/0x140
[  106.235754]  ? cp_new_stat64+0x115/0x140
[  106.236427]  do_filp_open+0x6a/0xd0
[  106.237026]  ? cp_new_stat64+0x115/0x140
[  106.237748]  ? strncpy_from_user+0x3d/0x180
[  106.238539]  ? __alloc_fd+0x36/0x120
[  106.239256]  do_sys_open+0x175/0x210
[  106.239955]  sys_openat+0x1b/0x20
[  106.240596]  do_fast_syscall_32+0x7f/0x1e0
[  106.241313]  entry_SYSENTER_32+0x6b/0xbe
[  106.242017] EIP: 0xb7fae871
[  106.242559] Code: 8b 98 58 cd ff ff 89 c8 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 14 24 c3 8b 34 24 c3 8b 3c 24 c3 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
[  106.245551] EAX: ffffffda EBX: ffffff9c ECX: bffdcb60 EDX: 00088000
[  106.246651] ESI: 00000000 EDI: b7f9e000 EBP: 00088000 ESP: bffdc970
[  106.247706] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000246
[  106.248851] Modules linked in: openvswitch(E)
[  106.249621] CR2: 00000000e8a35f3b
[  106.250218] ---[ end trace 6a8d05679a59cda7 ]---

I've bisected this down to the following commit that seems to have introduced
the issue:

commit 120645513f55a4ac5543120d9e79925d30a0156f (refs/bisect/bad)
Author: Jarno Rajahalme <jarno@ovn.org>
Date:   Fri Apr 21 16:48:06 2017 -0700

    openvswitch: Add eventmask support to CT action.
    
    Add a new optional conntrack action attribute OVS_CT_ATTR_EVENTMASK,
    which can be used in conjunction with the commit flag
    (OVS_CT_ATTR_COMMIT) to set the mask of bits specifying which
    conntrack events (IPCT_*) should be delivered via the Netfilter
    netlink multicast groups.  Default behavior depends on the system
    configuration, but typically a lot of events are delivered.  This can be
    very chatty for the NFNLGRP_CONNTRACK_UPDATE group, even if only some
    types of events are of interest.
    
    Netfilter core init_conntrack() adds the event cache extension, so we
    only need to set the ctmask value.  However, if the system is
    configured without support for events, the setting will be skipped due
    to extension not being found.
    
    Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
    Reviewed-by: Greg Rose <gvrose8192@gmail.com>
    Acked-by: Joe Stringer <joe@ovn.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
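The commit message describes two mechanics: the eventmask is a bitmask with one bit per IPCT_* event, and it is written into the connection's event cache extension only if that extension exists (in the kernel the lookup is done with nf_ct_ecache_find(), which returns NULL when the extension is absent). A minimal C sketch of those semantics follows. All demo_* types and functions are hypothetical stand-ins, not the kernel structures or the code from the commit, and the event numbers are assumed to mirror the start of the kernel's enum ip_conntrack_events.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the first values of enum ip_conntrack_events;
 * check include/uapi/linux/netfilter/nf_conntrack_common.h. */
enum demo_ct_event {
	DEMO_IPCT_NEW = 0,
	DEMO_IPCT_RELATED = 1,
	DEMO_IPCT_DESTROY = 2,
	DEMO_IPCT_REPLY = 3,
};

/* Hypothetical stand-in for the event cache extension. */
struct demo_ecache {
	uint32_t ctmask; /* bit N set => event N may be delivered */
};

/* Hypothetical connection: ecache is NULL when the system is configured
 * without event support, i.e. the extension is "not found". */
struct demo_conn {
	struct demo_ecache *ecache;
};

/* Build a mask with one bit set per requested event. */
static uint32_t demo_eventmask(const enum demo_ct_event *events, size_t n)
{
	uint32_t mask = 0;

	for (size_t i = 0; i < n; i++)
		mask |= UINT32_C(1) << events[i];
	return mask;
}

/* Set ctmask only if the extension exists; otherwise skip silently.
 * Returns true if the mask was applied. */
static bool demo_apply_eventmask(struct demo_conn *ct, uint32_t eventmask)
{
	if (ct->ecache == NULL)
		return false;
	ct->ecache->ctmask = eventmask;
	return true;
}

/* An event passes the filter only if its bit is set in ctmask. */
static bool demo_event_delivered(const struct demo_ecache *e,
                                 enum demo_ct_event ev)
{
	return (e->ctmask & (UINT32_C(1) << ev)) != 0;
}
```

For example, requesting only DEMO_IPCT_NEW and DEMO_IPCT_DESTROY yields the mask 0x5, which suppresses REPLY and RELATED events for that connection, while on a connection without the extension the call is a no-op.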

Reverting that commit from 5.0 makes the problem go away. I'm not able to
reproduce the crash on x86_64.

...Juerg

* Re: [ovs-dev] openvswitch crash on i386
@ 2019-03-05 10:12 ` Christian Ehrhardt
From: Christian Ehrhardt @ 2019-03-05 10:12 UTC (permalink / raw)
  To: Juerg Haefliger, James Page
  Cc: pshelar, davem, netdev, <dev@openvswitch.org>, linux-kernel

On Tue, Mar 5, 2019 at 10:58 AM Juerg Haefliger
<juerg.haefliger@canonical.com> wrote:
>
> Hi,
>
> Running the following commands in a loop will crash an i386 5.0 kernel
> typically within a few iterations:
>
> ovs-vsctl add-br test
> ovs-vsctl del-br test
>
> I've bisected this down to the following commit that seems to have introduced
> the issue:
>
> commit 120645513f55a4ac5543120d9e79925d30a0156f (refs/bisect/bad)
> Author: Jarno Rajahalme <jarno@ovn.org>
> Date:   Fri Apr 21 16:48:06 2017 -0700
>
>     openvswitch: Add eventmask support to CT action.

Hi Juerg,
the symptom, the identified breaking commit and actually all of it
seems to be [1] which James, Joseph and I worked on already.
I wanted to make you aware of the past context that already exists.

Back then we already reverted the change, found it to be working then.
Afterwards Joseph brought it up with Jarno [2] and got some patch it
seems, but that (whatever change it was - I have never seen it) wasn't
enough and still crashing.
Then we lost traction on the case and now you had to re-debug it I'm afraid :-/

[1]: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1736390
[2]: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1736390/comments/55

> Reverting that commit from 5.0 makes the problem go away. I'm not able to
> reproduce the crash on x86_64.
>
> ...Juerg

-- 
Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

* Re: [ovs-dev] openvswitch crash on i386
@ 2019-03-05 19:58   ` Joe Stringer
From: Joe Stringer @ 2019-03-05 19:58 UTC (permalink / raw)
  To: Christian Ehrhardt
  Cc: Juerg Haefliger, James Page, Pravin Shelar, David Miller, netdev,
	<dev@openvswitch.org>,
	LKML, Jarno Rajahalme

On Tue, Mar 5, 2019 at 2:12 AM Christian Ehrhardt
<christian.ehrhardt@canonical.com> wrote:
> Hi Juerg,
> the symptom, the identified breaking commit and actually all of it
> seems to be [1] which James, Joseph and I worked on already.
> I wanted to make you aware of the past context that already exists.
>
> Back then we already reverted the change, found it to be working then.
> Afterwards Joseph brought it up with Jarno [2] and got some patch it
> seems, but that (whatever change it was - I have never seen it) wasn't
> enough and still crashing.
> Then we lost traction on the case and now you had to re-debug it I'm afraid :-/
>
> [1]: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1736390
> [2]: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1736390/comments/55

Somehow the patch was shared on three different open source lists (the
bug, linux-kernel, and ovs-dev), yet only one of the three actually
retained the message:

https://mail.openvswitch.org/pipermail/ovs-dev/2018-September/352395.html

* Re: [ovs-dev] openvswitch crash on i386
@ 2019-03-05 22:52     ` Gregory Rose
From: Gregory Rose @ 2019-03-05 22:52 UTC (permalink / raw)
  To: Joe Stringer, Christian Ehrhardt
  Cc: <dev@openvswitch.org>,
	James Page, netdev, LKML, Juerg Haefliger, David Miller


On 3/5/2019 11:58 AM, Joe Stringer wrote:
> On Tue, Mar 5, 2019 at 2:12 AM Christian Ehrhardt
> <christian.ehrhardt@canonical.com> wrote:
>> On Tue, Mar 5, 2019 at 10:58 AM Juerg Haefliger
>> <juerg.haefliger@canonical.com> wrote:
>>> Hi,
>>>
>>> Running the following commands in a loop will crash an i386 5.0 kernel
>>> typically within a few iterations:
>>>
>>> ovs-vsctl add-br test
>>> ovs-vsctl del-br test

I have an i386 Ubuntu 18 VM to test this on.  I'll investigate and see 
what I can find.

- Greg

* Re: [ovs-dev] openvswitch crash on i386
@ 2019-03-06  0:21       ` Gregory Rose
From: Gregory Rose @ 2019-03-06  0:21 UTC (permalink / raw)
  To: Joe Stringer, Christian Ehrhardt
  Cc: <dev@openvswitch.org>,
	James Page, netdev, LKML, Juerg Haefliger, David Miller



On 3/5/2019 2:52 PM, Gregory Rose wrote:
>
> I have an i386 Ubuntu 18 VM to test this on.  I'll investigate and see 
> what I can find.
>
> - Greg
>

I have a repro.  It's not the same kernel (4.13 in my case) but looks 
like the same issue.

- Greg


* Re: [ovs-dev] openvswitch crash on i386
@ 2019-03-06 15:31     ` Juerg Haefliger
From: Juerg Haefliger @ 2019-03-06 15:31 UTC (permalink / raw)
  To: Joe Stringer
  Cc: Christian Ehrhardt, Juerg Haefliger, James Page, Pravin Shelar,
	David Miller, netdev, <dev@openvswitch.org>,
	LKML, Jarno Rajahalme

On Tue, 5 Mar 2019 11:58:42 -0800
Joe Stringer <joe@ovn.org> wrote:

> Somehow the patch was shared on three different open source lists (the
> bug, linux-kernel, and ovs-dev), yet only one of the three actually
> retained the message:
> 
> https://mail.openvswitch.org/pipermail/ovs-dev/2018-September/352395.html

Thanks for the link, Jarno, but that patch doesn't help at all.

FWIW, removing the following case statement makes the test pass:

		case OVS_CT_ATTR_EVENTMASK:
			info->have_eventmask = true;
			info->eventmask = nla_get_u32(a);
			break;

But changing it to something like the below also blows up the machine:

		case OVS_CT_ATTR_EVENTMASK:
			info->have_eventmask = false;
			info->eventmask = nla_get_u32(a);
			break;

Also bad:

		case OVS_CT_ATTR_EVENTMASK:
			break;

I'm not saying these tests make any sense, just saying :-)

...Juerg

