All of lore.kernel.org
 help / color / mirror / Atom feed
* 4.19: Traced deadlock during xfrm_user module load
@ 2019-06-25 15:55 Thomas Jarosch
  2019-06-25 16:53 ` Florian Westphal
  0 siblings, 1 reply; 4+ messages in thread
From: Thomas Jarosch @ 2019-06-25 15:55 UTC (permalink / raw)
  To: netdev; +Cc: netfilter-devel, Juliana Rodrigueiro

Hi all,

we're in the process of upgrading to kernel 4.19 and hit
a very rare lockup on boot during "xfrm_user" module load.
The tested kernel was 4.19.55.

When the strongswan IPsec service starts, it loads the xfrm_user module.
-> modprobe hangs forever. 

Also network services like ssh or apache stop responding,
ICMP ping still works.

By chance we had magic sysRq enabled and were able to get some meaningful stack 
traces. We've rebuilt the kernel with LOCKDEP + DEBUG_INFO + DEBUG_INFO_REDUCED, 
but so far failed to reproduce the issue even when hammering the suspected 
deadlock case. Though it's just hammering it for a few hours yet.

Preliminary analysis:

"modprobe xfrm_user":
    xfrm_user_init()
        register_pernet_subsys()
            -> grab pernet_ops_rwsem
                ..
                netlink_table_grab()
                    calls schedule() as "nl_table_users" is non-zero


conntrack netlink related program "info_iponline" does this in parallel:
    netlink_bind()
        netlink_lock_table() -> increases "nl_table_users"
            nfnetlink_bind()
            # does not unlock the table as it's locked by netlink_bind()
                __request_module()
                    call_usermodehelper_exec()
            

"modprobe nf_conntrack_netlink" runs and inits nf_conntrack_netlink:
    ctnetlink_init()
        register_pernet_subsys()
            -> blocks on "pernet_ops_rwsem" thanks to xfrm_user module
                -> schedule()
                    -> deadlock forever


Full stack traces:

strongswan starts on boot and loads "xfrm_user":

kernel: modprobe        D    0  3825   3762 0x80000080
kernel: Call Trace:
kernel: __schedule+0x188/0x4d0
kernel: ? add_wait_queue_exclusive+0x4d/0x60
kernel: schedule+0x21/0x70
kernel: netlink_table_grab+0x9a/0xf0
kernel: ? wake_up_q+0x60/0x60
kernel: __netlink_kernel_create+0x15f/0x1e0
kernel: xfrm_user_net_init+0x45/0x80 [xfrm_user]
kernel: ? xfrm_user_net_exit+0x60/0x60 [xfrm_user]
kernel: ops_init+0x68/0x100
kernel: ? vprintk_emit+0x9e/0x1a0
kernel: ? 0xf826c000
kernel: ? xfrm_dump_sa+0x120/0x120 [xfrm_user]
kernel: register_pernet_operations+0xef/0x1d0
kernel: ? 0xf826c000
kernel: register_pernet_subsys+0x1c/0x30
kernel: xfrm_user_init+0x18/0x1000 [xfrm_user]
kernel: do_one_initcall+0x44/0x190
kernel: ? free_unref_page_commit+0x6a/0xd0
kernel: do_init_module+0x46/0x1c0
kernel: load_module+0x1dc1/0x2400
kernel: sys_init_module+0xed/0x120
kernel: do_fast_syscall_32+0x7a/0x200
kernel: entry_SYSENTER_32+0x6b/0xbe


Another program triggers a conntrack netlink operation in parallel:

kernel: info_iponline   D    0  3787   3684 0x00000084
kernel: Call Trace:
kernel: __schedule+0x188/0x4d0
kernel: schedule+0x21/0x70
kernel: schedule_timeout+0x195/0x260
kernel: ? wake_up_process+0xf/0x20
kernel: ? insert_work+0x86/0xa0
kernel: wait_for_common+0xe2/0x190
kernel: ? wake_up_q+0x60/0x60
kernel: wait_for_completion_killable+0x12/0x30
kernel: call_usermodehelper_exec+0xda/0x170
kernel: __request_module+0x115/0x2e0
kernel: ? __wake_up_common_lock+0x7a/0xa0
kernel: ? __wake_up+0xd/0x20
kernel: nfnetlink_bind+0x28/0x53 [nfnetlink]
kernel: netlink_bind+0x138/0x300
kernel: ? netlink_setsockopt+0x320/0x320
kernel: __sys_bind+0x65/0xb0
kernel: ? __audit_syscall_exit+0x1fb/0x270
kernel: ? __audit_syscall_entry+0xad/0xf0
kernel: sys_socketcall+0x2f0/0x350
kernel: ? _cond_resched+0x12/0x40
kernel: do_fast_syscall_32+0x7a/0x200
kernel: entry_SYSENTER_32+0x6b/0xbe


This triggers a module load by the kernel:

kernel: modprobe        D    0  3827   1578 0x80000080
kernel: Call Trace:
kernel: __schedule+0x188/0x4d0
kernel: schedule+0x21/0x70
kernel: rwsem_down_write_failed+0xf5/0x210
kernel: ? 0xf8283000
kernel: call_rwsem_down_write_failed+0x9/0xc
kernel: down_write+0x18/0x30
kernel: register_pernet_subsys+0x10/0x30
kernel: ctnetlink_init+0x48/0x1000 [nf_conntrack_netlink]
kernel: do_one_initcall+0x44/0x190
kernel: ? free_unref_page_commit+0x6a/0xd0
kernel: do_init_module+0x46/0x1c0
kernel: load_module+0x1dc1/0x2400
kernel: sys_init_module+0xed/0x120
kernel: do_fast_syscall_32+0x7a/0x200
kernel: entry_SYSENTER_32+0x6b/0xbe


Other network related operations are deadlocked in netlink_table_grab():

kernel: ntpd            D    0  3831   3830 0x00000080
kernel: Call Trace:
kernel: __schedule+0x188/0x4d0
kernel: ? add_wait_queue_exclusive+0x4d/0x60
kernel: schedule+0x21/0x70
kernel: netlink_table_grab+0x9a/0xf0
kernel: ? wake_up_q+0x60/0x60
kernel: netlink_release+0x14e/0x460
kernel: __sock_release+0x2d/0xb0
kernel: sock_close+0xd/0x20
kernel: __fput+0x93/0x1c0
kernel: ____fput+0x8/0x10
kernel: task_work_run+0x82/0xa0
kernel: exit_to_usermode_loop+0x8d/0x90
kernel: do_fast_syscall_32+0x1a7/0x200
kernel: entry_SYSENTER_32+0x6b/0xbe


The easy workaround for us is to load the conntrack netlink
modules before starting strongswan. Since we use classic init,
this works. In parallel booting systemd world, people might
still hit the issue, so it's worth fixing.

The related function netlink_create() unlocks the table
before requesting modules and locks it afterwards again.

We guess it's racy to call netlink_unlock_table()
before doing the 
    
    err = nlk->netlink_bind(net, group + 1);
    
call in netlink_bind() ?


What would be the best fix here?

Best regards,
Thomas Jarosch and Juliana Rodrigueiro

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: 4.19: Traced deadlock during xfrm_user module load
  2019-06-25 15:55 4.19: Traced deadlock during xfrm_user module load Thomas Jarosch
@ 2019-06-25 16:53 ` Florian Westphal
  2019-06-27 15:46   ` Thomas Jarosch
  0 siblings, 1 reply; 4+ messages in thread
From: Florian Westphal @ 2019-06-25 16:53 UTC (permalink / raw)
  To: Thomas Jarosch; +Cc: netdev, netfilter-devel, Juliana Rodrigueiro

Thomas Jarosch <thomas.jarosch@intra2net.com> wrote:
> we're in the process of upgrading to kernel 4.19 and hit
> a very rare lockup on boot during "xfrm_user" module load.
> The tested kernel was 4.19.55.
> 
> When the strongswan IPsec service starts, it loads the xfrm_user module.
> -> modprobe hangs forever. 
> 
> Also network services like ssh or apache stop responding,
> ICMP ping still works.
> 
> By chance we had magic sysRq enabled and were able to get some meaningful stack 
> traces. We've rebuilt the kernel with LOCKDEP + DEBUG_INFO + DEBUG_INFO_REDUCED, 
> but so far failed to reproduce the issue even when hammering the suspected 
> deadlock case. Though it's just hammering it for a few hours yet.
> 
> Preliminary analysis:
> 
> "modprobe xfrm_user":
>     xfrm_user_init()
>         register_pernet_subsys()
>             -> grab pernet_ops_rwsem
>                 ..
>                 netlink_table_grab()
>                     calls schedule() as "nl_table_users" is non-zero
> 
> 
> conntrack netlink related program "info_iponline" does this in parallel:
>     netlink_bind()
>         netlink_lock_table() -> increases "nl_table_users"
>             nfnetlink_bind()
>             # does not unlock the table as it's locked by netlink_bind()
>                 __request_module()
>                     call_usermodehelper_exec()
>             
> 
> "modprobe nf_conntrack_netlink" runs and inits nf_conntrack_netlink:
>     ctnetlink_init()
>         register_pernet_subsys()
>             -> blocks on "pernet_ops_rwsem" thanks to xfrm_user module
>                 -> schedule()
>                     -> deadlock forever
> 

Thanks for this detailed analysis.
In this specific case I think this is enough:

diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index 92077d459109..61ba92415480 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -578,7 +578,8 @@ static int nfnetlink_bind(struct net *net, int group)
        ss = nfnetlink_get_subsys(type << 8);
        rcu_read_unlock();
        if (!ss)
-               request_module("nfnetlink-subsys-%d", type);
+               request_module_nowait("nfnetlink-subsys-%d", type);
        return 0;
 }
 #endif



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: 4.19: Traced deadlock during xfrm_user module load
  2019-06-25 16:53 ` Florian Westphal
@ 2019-06-27 15:46   ` Thomas Jarosch
  2019-06-27 15:50     ` Florian Westphal
  0 siblings, 1 reply; 4+ messages in thread
From: Thomas Jarosch @ 2019-06-27 15:46 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, netfilter-devel, Juliana Rodrigueiro

Hi Florian,

You wrote on Tue, Jun 25, 2019 at 06:53:44PM +0200:
> Thanks for this detailed analysis.
> In this specific case I think this is enough:
> 
> diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
> index 92077d459109..61ba92415480 100644
> --- a/net/netfilter/nfnetlink.c
> +++ b/net/netfilter/nfnetlink.c
> @@ -578,7 +578,8 @@ static int nfnetlink_bind(struct net *net, int group)
>         ss = nfnetlink_get_subsys(type << 8);
>         rcu_read_unlock();
>         if (!ss)
> -               request_module("nfnetlink-subsys-%d", type);
> +               request_module_nowait("nfnetlink-subsys-%d", type);
>         return 0;
>  }
>  #endif

thanks for the patch! We finally found an easy way to reproduce the deadlock,
the following commands instantly trigger the problem on our machines:

    rmmod nf_conntrack_netlink
    rmmod xfrm_user
    conntrack -e NEW -E & modprobe -v xfrm_user

Note: the "-e" filter is needed to trigger the problematic
code path in the kernel.

We were worried that using "_nowait" would introduce other race conditions,
since the requested service might not be available by the time it is required.

On the other hand, if we understand correctly, it seems that after
"nfnetlink_bind()", the caller will listen on the socket for messages
regardless whether the needed modules are loaded, loading or unloaded.
To verify this we added a three second sleep during the initialisation of
nf_conntrack_netlink. The events started to appear after
the delayed init was completed.

If this is the case, then using "_nowait" should suffice as a fix
for the problem. Could you please confirm these assumptions
and give us some piece of mind?

Best regards,
Juliana Rodrigueiro and Thomas Jarosch

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: 4.19: Traced deadlock during xfrm_user module load
  2019-06-27 15:46   ` Thomas Jarosch
@ 2019-06-27 15:50     ` Florian Westphal
  0 siblings, 0 replies; 4+ messages in thread
From: Florian Westphal @ 2019-06-27 15:50 UTC (permalink / raw)
  To: Thomas Jarosch
  Cc: Florian Westphal, netdev, netfilter-devel, Juliana Rodrigueiro

Thomas Jarosch <thomas.jarosch@intra2net.com> wrote:
> You wrote on Tue, Jun 25, 2019 at 06:53:44PM +0200:
> > Thanks for this detailed analysis.
> > In this specific case I think this is enough:
> > 
> > diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
> > index 92077d459109..61ba92415480 100644
> > --- a/net/netfilter/nfnetlink.c
> > +++ b/net/netfilter/nfnetlink.c
> > @@ -578,7 +578,8 @@ static int nfnetlink_bind(struct net *net, int group)
> >         ss = nfnetlink_get_subsys(type << 8);
> >         rcu_read_unlock();
> >         if (!ss)
> > -               request_module("nfnetlink-subsys-%d", type);
> > +               request_module_nowait("nfnetlink-subsys-%d", type);
> >         return 0;
> >  }
> >  #endif
> 
> thanks for the patch! We finally found an easy way to reproduce the deadlock,
> the following commands instantly trigger the problem on our machines:
> 
>     rmmod nf_conntrack_netlink
>     rmmod xfrm_user
>     conntrack -e NEW -E & modprobe -v xfrm_user
> 
> Note: the "-e" filter is needed to trigger the problematic
> code path in the kernel.
> 
> We were worried that using "_nowait" would introduce other race conditions,
> since the requested service might not be available by the time it is required.

Then this code would be buggy too, there is no guarantee that a
request_module() succeeds.

> "nfnetlink_bind()", the caller will listen on the socket for messages
> regardless whether the needed modules are loaded, loading or unloaded.
> To verify this we added a three second sleep during the initialisation of
> nf_conntrack_netlink. The events started to appear after
> the delayed init was completed.
> 
> If this is the case, then using "_nowait" should suffice as a fix
> for the problem. Could you please confirm these assumptions
> and give us some piece of mind?

Yes, _nowait is safe here (and needed, as you find out).
I'm away for a few hours but I plan to submit this patch officially
soon.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-06-27 15:50 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-25 15:55 4.19: Traced deadlock during xfrm_user module load Thomas Jarosch
2019-06-25 16:53 ` Florian Westphal
2019-06-27 15:46   ` Thomas Jarosch
2019-06-27 15:50     ` Florian Westphal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.