Re: INFO: rcu detected stall in wg_packet_tx

linux-kselftest.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: INFO: rcu detected stall in wg_packet_tx_worker
       [not found] <00000000000005d1ab05a4351006@google.com>
@ 2020-04-26 17:57 ` syzbot
  2020-04-26 19:40   ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: syzbot @ 2020-04-26 17:57 UTC (permalink / raw)
  To: Jason, davem, f.fainelli, gregkh, jason, jhs, jiri, krzk, kuba,
	kvalo, leon, linux-kernel, linux-kselftest, netdev, shuah,
	syzkaller-bugs, tglx, vivien.didelot, xiyou.wangcong

syzbot has bisected this bug to:

commit e7096c131e5161fa3b8e52a650d7719d2857adfd
Author: Jason A. Donenfeld <Jason@zx2c4.com>
Date:   Sun Dec 8 23:27:34 2019 +0000

    net: WireGuard secure network tunnel

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=15258fcfe00000
start commit:   b2768df2 Merge branch 'for-linus' of git://git.kernel.org/..
git tree:       upstream
final crash:    https://syzkaller.appspot.com/x/report.txt?x=17258fcfe00000
console output: https://syzkaller.appspot.com/x/log.txt?x=13258fcfe00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=b7a70e992f2f9b68
dashboard link: https://syzkaller.appspot.com/bug?extid=0251e883fe39e7a0cb0a
userspace arch: i386
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15f5f47fe00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11e8efb4100000

Reported-by: syzbot+0251e883fe39e7a0cb0a@syzkaller.appspotmail.com
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: INFO: rcu detected stall in wg_packet_tx_worker
  2020-04-26 17:57 ` INFO: rcu detected stall in wg_packet_tx_worker syzbot
@ 2020-04-26 19:40   ` Eric Dumazet
  2020-04-26 19:42     ` Jason A. Donenfeld
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2020-04-26 19:40 UTC (permalink / raw)
  To: syzbot, Jason, davem, f.fainelli, gregkh, jhs, jiri, krzk, kuba,
	kvalo, leon, linux-kernel, linux-kselftest, netdev, shuah,
	syzkaller-bugs, tglx, vivien.didelot, xiyou.wangcong



On 4/26/20 10:57 AM, syzbot wrote:
> syzbot has bisected this bug to:
> 
> commit e7096c131e5161fa3b8e52a650d7719d2857adfd
> Author: Jason A. Donenfeld <Jason@zx2c4.com>
> Date:   Sun Dec 8 23:27:34 2019 +0000
> 
>     net: WireGuard secure network tunnel
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=15258fcfe00000
> start commit:   b2768df2 Merge branch 'for-linus' of git://git.kernel.org/..
> git tree:       upstream
> final crash:    https://syzkaller.appspot.com/x/report.txt?x=17258fcfe00000
> console output: https://syzkaller.appspot.com/x/log.txt?x=13258fcfe00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=b7a70e992f2f9b68
> dashboard link: https://syzkaller.appspot.com/bug?extid=0251e883fe39e7a0cb0a
> userspace arch: i386
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15f5f47fe00000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11e8efb4100000
> 
> Reported-by: syzbot+0251e883fe39e7a0cb0a@syzkaller.appspotmail.com
> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
> 
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> 

I have not looked at the repro closely, but WireGuard has some workers
that might loop forever, cond_resched() might help a bit.

diff --git a/drivers/net/wireguard/receive.c b/drivers/net/wireguard/receive.c
index da3b782ab7d31df11e381529b144bcc494234a38..349a71e1907e081c61967c77c9f25a6ec5e57a24 100644
--- a/drivers/net/wireguard/receive.c
+++ b/drivers/net/wireguard/receive.c
@@ -518,6 +518,7 @@ void wg_packet_decrypt_worker(struct work_struct *work)
                                &PACKET_CB(skb)->keypair->receiving)) ?
                                PACKET_STATE_CRYPTED : PACKET_STATE_DEAD;
                wg_queue_enqueue_per_peer_napi(skb, state);
+               cond_resched();
        }
 }
 
diff --git a/drivers/net/wireguard/send.c b/drivers/net/wireguard/send.c
index 7348c10cbae3db54bfcb31f23c2753185735f876..f5b88693176c84b4bfdf8c4e05071481a3ce45b5 100644
--- a/drivers/net/wireguard/send.c
+++ b/drivers/net/wireguard/send.c
@@ -281,6 +281,7 @@ void wg_packet_tx_worker(struct work_struct *work)
 
                wg_noise_keypair_put(keypair, false);
                wg_peer_put(peer);
+               cond_resched();
        }
 }
 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: INFO: rcu detected stall in wg_packet_tx_worker
  2020-04-26 19:40   ` Eric Dumazet
@ 2020-04-26 19:42     ` Jason A. Donenfeld
  2020-04-26 19:52       ` Jason A. Donenfeld
  2020-04-26 20:26       ` Eric Dumazet
  0 siblings, 2 replies; 10+ messages in thread
From: Jason A. Donenfeld @ 2020-04-26 19:42 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: syzbot, David Miller, Florian Fainelli, Greg Kroah-Hartman, jhs,
	Jiří Pírko, Krzysztof Kozlowski, kuba, kvalo,
	leon, LKML, linux-kselftest, Netdev, Shuah Khan, syzkaller-bugs,
	Thomas Gleixner, vivien.didelot, Cong Wang

On Sun, Apr 26, 2020 at 1:40 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 4/26/20 10:57 AM, syzbot wrote:
> > syzbot has bisected this bug to:
> >
> > commit e7096c131e5161fa3b8e52a650d7719d2857adfd
> > Author: Jason A. Donenfeld <Jason@zx2c4.com>
> > Date:   Sun Dec 8 23:27:34 2019 +0000
> >
> >     net: WireGuard secure network tunnel
> >
> > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=15258fcfe00000
> > start commit:   b2768df2 Merge branch 'for-linus' of git://git.kernel.org/..
> > git tree:       upstream
> > final crash:    https://syzkaller.appspot.com/x/report.txt?x=17258fcfe00000
> > console output: https://syzkaller.appspot.com/x/log.txt?x=13258fcfe00000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=b7a70e992f2f9b68
> > dashboard link: https://syzkaller.appspot.com/bug?extid=0251e883fe39e7a0cb0a
> > userspace arch: i386
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15f5f47fe00000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11e8efb4100000
> >
> > Reported-by: syzbot+0251e883fe39e7a0cb0a@syzkaller.appspotmail.com
> > Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
> >
> > For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> >
>
> I have not looked at the repro closely, but WireGuard has some workers
> that might loop forever, cond_resched() might help a bit.

I'm working on this right now. Having a bit difficult of a time
getting it to reproduce locally...

The reports show the stall happening always at:

static struct sk_buff *
sfq_dequeue(struct Qdisc *sch)
{
       struct sfq_sched_data *q = qdisc_priv(sch);
       struct sk_buff *skb;
       sfq_index a, next_a;
       struct sfq_slot *slot;

       /* No active slots */
       if (q->tail == NULL)
               return NULL;

next_slot:
       a = q->tail->next;
       slot = &q->slots[a];

Which is kind of interesting, because it's not like that should block
or anything, unless there's some kasan faulting happening.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: INFO: rcu detected stall in wg_packet_tx_worker
  2020-04-26 19:42     ` Jason A. Donenfeld
@ 2020-04-26 19:52       ` Jason A. Donenfeld
  2020-04-26 19:58         ` Jason A. Donenfeld
  2020-04-26 20:26       ` Eric Dumazet
  1 sibling, 1 reply; 10+ messages in thread
From: Jason A. Donenfeld @ 2020-04-26 19:52 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: syzbot, David Miller, Florian Fainelli, Greg Kroah-Hartman, jhs,
	Jiří Pírko, Krzysztof Kozlowski, kuba, kvalo,
	leon, LKML, linux-kselftest, Netdev, Shuah Khan, syzkaller-bugs,
	Thomas Gleixner, vivien.didelot, Cong Wang

It looks like part of the issue might be that I call
udp_tunnel6_xmit_skb while holding rcu_read_lock_bh, in
drivers/net/wireguard/socket.c. But I think there's good reason to do
so, and udp_tunnel6_xmit_skb should be rcu safe. In fact,
every.single.other user of udp_tunnel6_xmit_skb in the kernel uses it
with rcu locked. So, hm...

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: INFO: rcu detected stall in wg_packet_tx_worker
  2020-04-26 19:52       ` Jason A. Donenfeld
@ 2020-04-26 19:58         ` Jason A. Donenfeld
  0 siblings, 0 replies; 10+ messages in thread
From: Jason A. Donenfeld @ 2020-04-26 19:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: syzbot, David Miller, Florian Fainelli, Greg Kroah-Hartman, jhs,
	Jiří Pírko, Krzysztof Kozlowski, kuba, kvalo,
	leon, LKML, linux-kselftest, Netdev, Shuah Khan, syzkaller-bugs,
	Thomas Gleixner, vivien.didelot, Cong Wang

On Sun, Apr 26, 2020 at 1:52 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> It looks like part of the issue might be that I call
> udp_tunnel6_xmit_skb while holding rcu_read_lock_bh, in
> drivers/net/wireguard/socket.c. But I think there's good reason to do
> so, and udp_tunnel6_xmit_skb should be rcu safe. In fact,
> every.single.other user of udp_tunnel6_xmit_skb in the kernel uses it
> with rcu locked. So, hm...

In the syzkaller log, it looks like several runs are hitting:

run #0: crashed: INFO: rcu detected stall in netlink_sendmsg

And other runs are hitting yet different functions. So actually, it's
not clear that this is the fault of the call to udp_tunnel6_xmit_skb.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: INFO: rcu detected stall in wg_packet_tx_worker
  2020-04-26 19:42     ` Jason A. Donenfeld
  2020-04-26 19:52       ` Jason A. Donenfeld
@ 2020-04-26 20:26       ` Eric Dumazet
  2020-04-26 20:38         ` Eric Dumazet
  1 sibling, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2020-04-26 20:26 UTC (permalink / raw)
  To: Jason A. Donenfeld, Eric Dumazet
  Cc: syzbot, David Miller, Florian Fainelli, Greg Kroah-Hartman, jhs,
	Jiří Pírko, Krzysztof Kozlowski, kuba, kvalo,
	leon, LKML, linux-kselftest, Netdev, Shuah Khan, syzkaller-bugs,
	Thomas Gleixner, vivien.didelot, Cong Wang



On 4/26/20 12:42 PM, Jason A. Donenfeld wrote:
> On Sun, Apr 26, 2020 at 1:40 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>
>>
>> On 4/26/20 10:57 AM, syzbot wrote:
>>> syzbot has bisected this bug to:
>>>
>>> commit e7096c131e5161fa3b8e52a650d7719d2857adfd
>>> Author: Jason A. Donenfeld <Jason@zx2c4.com>
>>> Date:   Sun Dec 8 23:27:34 2019 +0000
>>>
>>>     net: WireGuard secure network tunnel
>>>
>>> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=15258fcfe00000
>>> start commit:   b2768df2 Merge branch 'for-linus' of git://git.kernel.org/..
>>> git tree:       upstream
>>> final crash:    https://syzkaller.appspot.com/x/report.txt?x=17258fcfe00000
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=13258fcfe00000
>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=b7a70e992f2f9b68
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=0251e883fe39e7a0cb0a
>>> userspace arch: i386
>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15f5f47fe00000
>>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11e8efb4100000
>>>
>>> Reported-by: syzbot+0251e883fe39e7a0cb0a@syzkaller.appspotmail.com
>>> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
>>>
>>> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
>>>
>>
>> I have not looked at the repro closely, but WireGuard has some workers
>> that might loop forever, cond_resched() might help a bit.
> 
> I'm working on this right now. Having a bit difficult of a time
> getting it to reproduce locally...
> 
> The reports show the stall happening always at:
> 
> static struct sk_buff *
> sfq_dequeue(struct Qdisc *sch)
> {
>        struct sfq_sched_data *q = qdisc_priv(sch);
>        struct sk_buff *skb;
>        sfq_index a, next_a;
>        struct sfq_slot *slot;
> 
>        /* No active slots */
>        if (q->tail == NULL)
>                return NULL;
> 
> next_slot:
>        a = q->tail->next;
>        slot = &q->slots[a];
> 
> Which is kind of interesting, because it's not like that should block
> or anything, unless there's some kasan faulting happening.
> 

I am not really sure WireGuard is involved, the repro does not rely on it anyway.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: INFO: rcu detected stall in wg_packet_tx_worker
  2020-04-26 20:26       ` Eric Dumazet
@ 2020-04-26 20:38         ` Eric Dumazet
  2020-04-26 20:46           ` Jason A. Donenfeld
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2020-04-26 20:38 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: syzbot, David Miller, Florian Fainelli, Greg Kroah-Hartman, jhs,
	Jiří Pírko, Krzysztof Kozlowski, kuba, kvalo,
	leon, LKML, linux-kselftest, Netdev, Shuah Khan, syzkaller-bugs,
	Thomas Gleixner, vivien.didelot, Cong Wang



On 4/26/20 1:26 PM, Eric Dumazet wrote:
> 
> 
> On 4/26/20 12:42 PM, Jason A. Donenfeld wrote:
>> On Sun, Apr 26, 2020 at 1:40 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>
>>>
>>>
>>> On 4/26/20 10:57 AM, syzbot wrote:
>>>> syzbot has bisected this bug to:
>>>>
>>>> commit e7096c131e5161fa3b8e52a650d7719d2857adfd
>>>> Author: Jason A. Donenfeld <Jason@zx2c4.com>
>>>> Date:   Sun Dec 8 23:27:34 2019 +0000
>>>>
>>>>     net: WireGuard secure network tunnel
>>>>
>>>> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=15258fcfe00000
>>>> start commit:   b2768df2 Merge branch 'for-linus' of git://git.kernel.org/..
>>>> git tree:       upstream
>>>> final crash:    https://syzkaller.appspot.com/x/report.txt?x=17258fcfe00000
>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=13258fcfe00000
>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=b7a70e992f2f9b68
>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=0251e883fe39e7a0cb0a
>>>> userspace arch: i386
>>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15f5f47fe00000
>>>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11e8efb4100000
>>>>
>>>> Reported-by: syzbot+0251e883fe39e7a0cb0a@syzkaller.appspotmail.com
>>>> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
>>>>
>>>> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
>>>>
>>>
>>> I have not looked at the repro closely, but WireGuard has some workers
>>> that might loop forever, cond_resched() might help a bit.
>>
>> I'm working on this right now. Having a bit difficult of a time
>> getting it to reproduce locally...
>>
>> The reports show the stall happening always at:
>>
>> static struct sk_buff *
>> sfq_dequeue(struct Qdisc *sch)
>> {
>>        struct sfq_sched_data *q = qdisc_priv(sch);
>>        struct sk_buff *skb;
>>        sfq_index a, next_a;
>>        struct sfq_slot *slot;
>>
>>        /* No active slots */
>>        if (q->tail == NULL)
>>                return NULL;
>>
>> next_slot:
>>        a = q->tail->next;
>>        slot = &q->slots[a];
>>
>> Which is kind of interesting, because it's not like that should block
>> or anything, unless there's some kasan faulting happening.
>>
> 
> I am not really sure WireGuard is involved, the repro does not rely on it anyway.
> 

Yes, do not spend too much time on this.

syzbot found its way into crazy qdisc settings these last days.

( I sent a patch yesterday for choke qdisc, it seems similar checks are needed in sfq )


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: INFO: rcu detected stall in wg_packet_tx_worker
  2020-04-26 20:38         ` Eric Dumazet
@ 2020-04-26 20:46           ` Jason A. Donenfeld
  2020-04-26 21:53             ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Jason A. Donenfeld @ 2020-04-26 20:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: syzbot, David Miller, Florian Fainelli, Greg Kroah-Hartman, jhs,
	Jiří Pírko, Krzysztof Kozlowski, kuba, kvalo,
	leon, LKML, linux-kselftest, Netdev, Shuah Khan, syzkaller-bugs,
	Thomas Gleixner, vivien.didelot, Cong Wang

On Sun, Apr 26, 2020 at 2:38 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 4/26/20 1:26 PM, Eric Dumazet wrote:
> >
> >
> > On 4/26/20 12:42 PM, Jason A. Donenfeld wrote:
> >> On Sun, Apr 26, 2020 at 1:40 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >>>
> >>>
> >>>
> >>> On 4/26/20 10:57 AM, syzbot wrote:
> >>>> syzbot has bisected this bug to:
> >>>>
> >>>> commit e7096c131e5161fa3b8e52a650d7719d2857adfd
> >>>> Author: Jason A. Donenfeld <Jason@zx2c4.com>
> >>>> Date:   Sun Dec 8 23:27:34 2019 +0000
> >>>>
> >>>>     net: WireGuard secure network tunnel
> >>>>
> >>>> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=15258fcfe00000
> >>>> start commit:   b2768df2 Merge branch 'for-linus' of git://git.kernel.org/..
> >>>> git tree:       upstream
> >>>> final crash:    https://syzkaller.appspot.com/x/report.txt?x=17258fcfe00000
> >>>> console output: https://syzkaller.appspot.com/x/log.txt?x=13258fcfe00000
> >>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=b7a70e992f2f9b68
> >>>> dashboard link: https://syzkaller.appspot.com/bug?extid=0251e883fe39e7a0cb0a
> >>>> userspace arch: i386
> >>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15f5f47fe00000
> >>>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11e8efb4100000
> >>>>
> >>>> Reported-by: syzbot+0251e883fe39e7a0cb0a@syzkaller.appspotmail.com
> >>>> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
> >>>>
> >>>> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> >>>>
> >>>
> >>> I have not looked at the repro closely, but WireGuard has some workers
> >>> that might loop forever, cond_resched() might help a bit.
> >>
> >> I'm working on this right now. Having a bit difficult of a time
> >> getting it to reproduce locally...
> >>
> >> The reports show the stall happening always at:
> >>
> >> static struct sk_buff *
> >> sfq_dequeue(struct Qdisc *sch)
> >> {
> >>        struct sfq_sched_data *q = qdisc_priv(sch);
> >>        struct sk_buff *skb;
> >>        sfq_index a, next_a;
> >>        struct sfq_slot *slot;
> >>
> >>        /* No active slots */
> >>        if (q->tail == NULL)
> >>                return NULL;
> >>
> >> next_slot:
> >>        a = q->tail->next;
> >>        slot = &q->slots[a];
> >>
> >> Which is kind of interesting, because it's not like that should block
> >> or anything, unless there's some kasan faulting happening.
> >>
> >
> > I am not really sure WireGuard is involved, the repro does not rely on it anyway.
> >
>
> Yes, do not spend too much time on this.
>
> syzbot found its way into crazy qdisc settings these last days.
>
> ( I sent a patch yesterday for choke qdisc, it seems similar checks are needed in sfq )

Ah, whew, okay. I had just begun instrumenting sfq (the highly
technical term for "adding printks everywhere") to figure out what's
going on. Looks like you've got a handle on it, so I'll let you have
at it.

On the brighter side, it seems like Dmitry's and my effort to get full
coverage of WireGuard has paid off in the sense that tons of packets
wind up being shoveled through it in one way or another, which is
good.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: INFO: rcu detected stall in wg_packet_tx_worker
  2020-04-26 20:46           ` Jason A. Donenfeld
@ 2020-04-26 21:53             ` Eric Dumazet
  2020-05-04 11:23               ` Jason A. Donenfeld
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2020-04-26 21:53 UTC (permalink / raw)
  To: Jason A. Donenfeld, Eric Dumazet
  Cc: syzbot, David Miller, Florian Fainelli, Greg Kroah-Hartman, jhs,
	Jiří Pírko, Krzysztof Kozlowski, kuba, kvalo,
	leon, LKML, linux-kselftest, Netdev, Shuah Khan, syzkaller-bugs,
	Thomas Gleixner, vivien.didelot, Cong Wang



On 4/26/20 1:46 PM, Jason A. Donenfeld wrote:

> 
> Ah, whew, okay. I had just begun instrumenting sfq (the highly
> technical term for "adding printks everywhere") to figure out what's
> going on. Looks like you've got a handle on it, so I'll let you have
> at it.

Yes, syzbot manages to put a zero in q->scaled_quantum

I will send a fix.

> 
> On the brighter side, it seems like Dmitry's and my effort to get full
> coverage of WireGuard has paid off in the sense that tons of packets
> wind up being shoveled through it in one way or another, which is
> good.
> 

Sure !

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: INFO: rcu detected stall in wg_packet_tx_worker
  2020-04-26 21:53             ` Eric Dumazet
@ 2020-05-04 11:23               ` Jason A. Donenfeld
  0 siblings, 0 replies; 10+ messages in thread
From: Jason A. Donenfeld @ 2020-05-04 11:23 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: syzbot, David Miller, Florian Fainelli, Greg Kroah-Hartman,
	Jamal Hadi Salim, Jiří Pírko, Krzysztof Kozlowski,
	Jakub Kicinski, kvalo, leon, LKML, linux-kselftest, Netdev,
	Shuah Khan, syzkaller-bugs, Thomas Gleixner, vivien.didelot,
	Cong Wang

So in spite of this Syzkaller bug being unrelated in the end, I've
continued to think about the stacktrace a bit, and combined with some
other [potentially false alarm] bug reports I'm trying to wrap my head
around, I'm a bit a curious about ideal usage for the udp_tunnel API.

All the uses I've seen in the kernel (including wireguard) follow this pattern:

rcu_read_lock_bh();
sock = rcu_dereference(obj->sock);
...
udp_tunnel_xmit_skb(..., sock, ...);
rcu_read_unlock_bh();

udp_tunnel_xmit_skb calls iptunnel_xmit, which winds up in the usual
ip_local_out path, which eventually winds up calling some other
devices' ndo_xmit, or gets queued up in a qdisc. Calls to
udp_tunnel_xmit_skb aren't exactly cheap. So I wonder: is holding the
rcu lock for all that time really a good thing?

A different pattern that avoids holding the rcu lock would be:

rcu_read_lock_bh();
sock = rcu_dereference(obj->sock);
sock_hold(sock);
rcu_read_unlock_bh();
...
udp_tunnel_xmit_skb(..., sock, ...);
sock_put(sock);

This seems better, but I wonder if it has some drawbacks too. For
example, sock_put has some comment that warns against incrementing it
in response to forwarded packets. And if this isn't necessary to do,
it's marginally more costly than the first pattern.

Any opinions about this?

Jason

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-05-04 11:23 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <00000000000005d1ab05a4351006@google.com>
2020-04-26 17:57 ` INFO: rcu detected stall in wg_packet_tx_worker syzbot
2020-04-26 19:40   ` Eric Dumazet
2020-04-26 19:42     ` Jason A. Donenfeld
2020-04-26 19:52       ` Jason A. Donenfeld
2020-04-26 19:58         ` Jason A. Donenfeld
2020-04-26 20:26       ` Eric Dumazet
2020-04-26 20:38         ` Eric Dumazet
2020-04-26 20:46           ` Jason A. Donenfeld
2020-04-26 21:53             ` Eric Dumazet
2020-05-04 11:23               ` Jason A. Donenfeld

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).