netdev.vger.kernel.org archive mirror
* Fw: [Bug 86851] New: Reproducible panic on heavy UDP traffic
@ 2014-10-25  8:40 Stephen Hemminger
  2014-10-25 21:44 ` Florian Westphal
  0 siblings, 1 reply; 15+ messages in thread
From: Stephen Hemminger @ 2014-10-25  8:40 UTC (permalink / raw)
  To: netdev



Begin forwarded message:

Date: Fri, 24 Oct 2014 11:34:08 -0700
From: "bugzilla-daemon@bugzilla.kernel.org" <bugzilla-daemon@bugzilla.kernel.org>
To: "stephen@networkplumber.org" <stephen@networkplumber.org>
Subject: [Bug 86851] New: Reproducible panic on heavy UDP traffic


https://bugzilla.kernel.org/show_bug.cgi?id=86851

            Bug ID: 86851
           Summary: Reproducible panic on heavy UDP traffic
           Product: Networking
           Version: 2.5
    Kernel Version: 3.18-rc1
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: IPV4
          Assignee: shemminger@linux-foundation.org
          Reporter: chutzpah@gentoo.org
        Regression: No

Created attachment 154861
  --> https://bugzilla.kernel.org/attachment.cgi?id=154861&action=edit
Panic message captured over serial console

This bug is being encountered on a machine with 6 ixgbe interfaces configured
as 3 bonded pairs. Currently using netperf to test UDP throughput from ~136
client machines connected via 1G interfaces.

This is the command being run on the clients:
netperf -H XX.XX.XX.XX -t OMNI -l 60 -- -d send -T UDP -O throughput,direction
-m 16384

This is the command being run on the server:
netserver -4

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fw: [Bug 86851] New: Reproducible panic on heavy UDP traffic
  2014-10-25  8:40 Fw: [Bug 86851] New: Reproducible panic on heavy UDP traffic Stephen Hemminger
@ 2014-10-25 21:44 ` Florian Westphal
  2014-10-26 23:28   ` Nikolay Aleksandrov
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2014-10-25 21:44 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, nikolay

Stephen Hemminger <stephen@networkplumber.org> wrote:

[ CC Nik ]

> Date: Fri, 24 Oct 2014 11:34:08 -0700
> From: "bugzilla-daemon@bugzilla.kernel.org" <bugzilla-daemon@bugzilla.kernel.org>
> To: "stephen@networkplumber.org" <stephen@networkplumber.org>
> Subject: [Bug 86851] New: Reproducible panic on heavy UDP traffic
> 
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=86851
> 
>             Bug ID: 86851
>            Summary: Reproducible panic on heavy UDP traffic
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 3.18-rc1
>           Hardware: x86-64
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: IPV4
>           Assignee: shemminger@linux-foundation.org
>           Reporter: chutzpah@gentoo.org
>         Regression: No
> 
> Created attachment 154861
>   --> https://bugzilla.kernel.org/attachment.cgi?id=154861&action=edit
> Panic message captured over serial console

 general protection fault: 0000 [#1] SMP
 Modules linked in: nfs [..]
 CPU: 7 PID: 257 Comm: kworker/7:1 Tainted: G        W      3.18.0-rc1-base-7+ #2

Asked the reporter to check whether there is a warning before the oops.

 Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
 Workqueue: events inet_frag_worker
 task: ffff882fd32e70e0 ti: ffff882fd0adc000 task.ti: ffff882fd0adc000
 RIP: 0010:[<ffffffff81592ab4>]  [<ffffffff81592ab4>] inet_evict_bucket+0xf4/0x180
 RSP: 0018:ffff882fd0adfd58  EFLAGS: 00010286
 RAX: ffff8817c7230701 RBX: dead000000100100 RCX: 0000000180300013

Hello LIST_POISON!

 RDX: 0000000180300014 RSI: 0000000000000001 RDI: dead0000001000c0
 RBP: 0000000000000002 R08: 0000000000000202 R09: ffff88303fc39ab0
 R10: ffffffff81592ac0 R11: ffffea005f1c8c00 R12: ffffffff81aa2820
 R13: ffff882fd0adfd70 R14: ffff8817c72307e0 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff88303fc20000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 device rack0a left promiscuous mode
 CR2: 00007f054c7ba034 CR3: 0000002fc4986000 CR4: 00000000001407e0
 Stack:
  ffffffff81aa3298 ffffffff81aa3290 ffff8817d0820a08 0000000000000000
  0000000000000000 00000000000000a8 0000000000000008 ffff88303fc32780
  ffffffff81aa6820 0000000000000059[ 2415.026338] device rack1a left promiscuous mode

  0000000000000000 ffffffff81592ba2
 Call Trace:
  [<ffffffff81592ba2>] ? inet_frag_worker+0x62/0x210
  [<ffffffff8112c312>] ? process_one_work+0x132/0x360
[..]
The crash is in the hlist_for_each_entry_safe() at the end of inet_evict_bucket();
it looks like we encounter an already-list_del'd element while iterating.
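
For reference, this is roughly what hlist_del() does (simplified from
include/linux/list.h; the exact poison values depend on the arch's
POISON_POINTER_DELTA):

static inline void __hlist_del(struct hlist_node *n)
{
	struct hlist_node *next = n->next;
	struct hlist_node **pprev = n->pprev;

	*pprev = next;			/* previous link now skips us */
	if (next)
		next->pprev = pprev;
}

static inline void hlist_del(struct hlist_node *n)
{
	__hlist_del(n);
	n->next = LIST_POISON1;		/* 0xdead000000100100 on x86-64 */
	n->pprev = LIST_POISON2;	/* 0xdead000000200200 on x86-64 */
}

A second hlist_del() of the same node, or an iteration that walks onto it,
then dereferences the poison value - note RBX == dead000000100100 in the
oops above.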

Will look at this tomorrow.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fw: [Bug 86851] New: Reproducible panic on heavy UDP traffic
  2014-10-25 21:44 ` Florian Westphal
@ 2014-10-26 23:28   ` Nikolay Aleksandrov
  2014-10-27  0:47     ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Nikolay Aleksandrov @ 2014-10-26 23:28 UTC (permalink / raw)
  To: Florian Westphal, Stephen Hemminger; +Cc: netdev

On 10/25/2014 11:44 PM, Florian Westphal wrote:
> Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
> [ CC Nik ]
> 
>> Date: Fri, 24 Oct 2014 11:34:08 -0700
>> From: "bugzilla-daemon@bugzilla.kernel.org" <bugzilla-daemon@bugzilla.kernel.org>
>> To: "stephen@networkplumber.org" <stephen@networkplumber.org>
>> Subject: [Bug 86851] New: Reproducible panic on heavy UDP traffic
>>
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=86851
>>
>>             Bug ID: 86851
>>            Summary: Reproducible panic on heavy UDP traffic
>>            Product: Networking
>>            Version: 2.5
>>     Kernel Version: 3.18-rc1
>>           Hardware: x86-64
>>                 OS: Linux
>>               Tree: Mainline
>>             Status: NEW
>>           Severity: normal
>>           Priority: P1
>>          Component: IPV4
>>           Assignee: shemminger@linux-foundation.org
>>           Reporter: chutzpah@gentoo.org
>>         Regression: No
>>
>> Created attachment 154861
>>   --> https://bugzilla.kernel.org/attachment.cgi?id=154861&action=edit
>> Panic message captured over serial console
> 
>  general protection fault: 0000 [#1] SMP
>  Modules linked in: nfs [..]
>  CPU: 7 PID: 257 Comm: kworker/7:1 Tainted: G        W      3.18.0-rc1-base-7+ #2
> 
> asked reporter to check if there is a warning before the oops.
> 
>  Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
>  Workqueue: events inet_frag_worker
>  task: ffff882fd32e70e0 ti: ffff882fd0adc000 task.ti: ffff882fd0adc000
>  RIP: 0010:[<ffffffff81592ab4>]  [<ffffffff81592ab4>] inet_evict_bucket+0xf4/0x180
>  RSP: 0018:ffff882fd0adfd58  EFLAGS: 00010286
>  RAX: ffff8817c7230701 RBX: dead000000100100 RCX: 0000000180300013
> 
> Hello LIST_POISON!
> 
>  RDX: 0000000180300014 RSI: 0000000000000001 RDI: dead0000001000c0
>  RBP: 0000000000000002 R08: 0000000000000202 R09: ffff88303fc39ab0
>  R10: ffffffff81592ac0 R11: ffffea005f1c8c00 R12: ffffffff81aa2820
>  R13: ffff882fd0adfd70 R14: ffff8817c72307e0 R15: 0000000000000000
>  FS:  0000000000000000(0000) GS:ffff88303fc20000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  device rack0a left promiscuous mode
>  CR2: 00007f054c7ba034 CR3: 0000002fc4986000 CR4: 00000000001407e0
>  Stack:
>   ffffffff81aa3298 ffffffff81aa3290 ffff8817d0820a08 0000000000000000
>   0000000000000000 00000000000000a8 0000000000000008 ffff88303fc32780
>   ffffffff81aa6820 0000000000000059[ 2415.026338] device rack1a left promiscuous mode
> 
>   0000000000000000 ffffffff81592ba2
>  Call Trace:
>   [<ffffffff81592ba2>] ? inet_frag_worker+0x62/0x210
>   [<ffffffff8112c312>] ? process_one_work+0x132/0x360
> [..]
> crash is in hlist_for_each_entry_safe() at the end of inet_evict_bucket(), looks like
> we encounter an already-list_del'd element while iterating.
> 
> Will look at this tomorrow.
> 

Thanks for CCing me.
I'll dig into the code tomorrow, but my first thought when I saw this was:
could we have a race condition between ip_frag_queue() and the evictor
(inet_evict_bucket()), more precisely between the ipq_kill() called from
ip_frag_queue() and the evictor? The frag could be looked up before we have
entered the evictor, which can then add it to its expire list, but the
ipq_kill() from ip_frag_queue() can do a list_del after we release the
chain lock in the evictor, so we may end up like this?
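
To make the suspected interleaving concrete, a rough sketch of the two paths
(paraphrased from 3.18-rc1 net/ipv4/inet_fragment.c, not the literal code):

/* Path A: ip_frag_queue() -> ipq_kill() -> inet_frag_kill() */
void inet_frag_kill(struct inet_frag_queue *fq, struct inet_frags *f)
{
	if (del_timer(&fq->timer))
		atomic_dec(&fq->refcnt);

	if (!(fq->flags & INET_FRAG_COMPLETE)) {
		fq_unlink(fq, f);	/* takes chain_lock, hlist_del(&fq->list) */
		atomic_dec(&fq->refcnt);
		fq->flags |= INET_FRAG_COMPLETE;
	}
}

/* Path B: inet_frag_worker() -> inet_evict_bucket(), first loop */
	spin_lock(&hb->chain_lock);
	hlist_for_each_entry_safe(fq, n, &hb->chain, list) {
		/* ... */
		fq->flags |= INET_FRAG_EVICTED;
		hlist_del(&fq->list);			/* off the chain ...     */
		hlist_add_head(&fq->list, &expired);	/* ... onto a local list */
	}
	spin_unlock(&hb->chain_lock);

	/* if path A's fq_unlink() grabs the chain lock now, its hlist_del()
	 * rips the queue out of the local 'expired' list and poisons it,
	 * and this second loop then walks into LIST_POISON: */
	hlist_for_each_entry_safe(fq, n, &expired, list)
		f->frag_expire((unsigned long) fq);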

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fw: [Bug 86851] New: Reproducible panic on heavy UDP traffic
  2014-10-26 23:28   ` Nikolay Aleksandrov
@ 2014-10-27  0:47     ` Eric Dumazet
  2014-10-27  8:48       ` Nikolay Aleksandrov
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2014-10-27  0:47 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: Florian Westphal, Stephen Hemminger, netdev

On Mon, 2014-10-27 at 00:28 +0100, Nikolay Aleksandrov wrote:

> 
> Thanks for CCing me.
> I'll dig in the code tomorrow but my first thought when I saw this was
> could it be possible that we have a race condition between
> ip_frag_queue() and inet_frag_evict(), more precisely between the
> ipq_kill() calls from ip_frag_queue and inet_frag_evict since the frag
> could be found before we have entered the evictor which then can add it to
> its expire list but the ipq_kill() from ip_frag_queue() can do a list_del
> after we release the chain lock in the evictor so we may end up like this ?

Yes, either we use hlist_del_init() but lose the poison aid, or test if
the frag was evicted:

Not sure about refcount.
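
For completeness, the hlist_del_init() alternative would look roughly like
this (simplified from include/linux/list.h); it re-initializes the node
instead of poisoning it, so a second removal becomes a silent no-op - that
is the poison aid we would lose:

static inline void hlist_del_init(struct hlist_node *n)
{
	if (!hlist_unhashed(n)) {	/* no-op if already removed + re-inited */
		__hlist_del(n);
		INIT_HLIST_NODE(n);	/* n->next = NULL; n->pprev = NULL; */
	}
}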

diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 9eb89f3f0ee4..894ec30c5896 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -285,7 +285,8 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
 	struct inet_frag_bucket *hb;
 
 	hb = get_frag_bucket_locked(fq, f);
-	hlist_del(&fq->list);
+	if (!(fq->flags & INET_FRAG_EVICTED))
+		hlist_del(&fq->list);
 	spin_unlock(&hb->chain_lock);
 }
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: Fw: [Bug 86851] New: Reproducible panic on heavy UDP traffic
  2014-10-27  0:47     ` Eric Dumazet
@ 2014-10-27  8:48       ` Nikolay Aleksandrov
  2014-10-27  9:12         ` Nikolay Aleksandrov
  2014-10-27 22:59         ` Patrick McLean
  0 siblings, 2 replies; 15+ messages in thread
From: Nikolay Aleksandrov @ 2014-10-27  8:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Florian Westphal, Stephen Hemminger, netdev, chutzpah

On 10/27/2014 01:47 AM, Eric Dumazet wrote:
> On Mon, 2014-10-27 at 00:28 +0100, Nikolay Aleksandrov wrote:
> 
>>
>> Thanks for CCing me.
>> I'll dig in the code tomorrow but my first thought when I saw this was
>> could it be possible that we have a race condition between
>> ip_frag_queue() and inet_frag_evict(), more precisely between the
>> ipq_kill() calls from ip_frag_queue and inet_frag_evict since the frag
>> could be found before we have entered the evictor which then can add it to
>> its expire list but the ipq_kill() from ip_frag_queue() can do a list_del
>> after we release the chain lock in the evictor so we may end up like this ?
> 
> Yes, either we use hlist_del_init() but lose the poison aid, or test if
> the frag was evicted:
> 
> Not sure about refcount.
> 
> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> index 9eb89f3f0ee4..894ec30c5896 100644
> --- a/net/ipv4/inet_fragment.c
> +++ b/net/ipv4/inet_fragment.c
> @@ -285,7 +285,8 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
>  	struct inet_frag_bucket *hb;
>  
>  	hb = get_frag_bucket_locked(fq, f);
> -	hlist_del(&fq->list);
> +	if (!(fq->flags & INET_FRAG_EVICTED))
> +		hlist_del(&fq->list);
>  	spin_unlock(&hb->chain_lock);
>  }
>  
> 
> 

Exactly, I was thinking about a similar fix, since the evict flag is only
set with the chain lock held. IMO the refcount should be fine.
CCing the reporter.
Patrick, could you please try Eric's patch?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fw: [Bug 86851] New: Reproducible panic on heavy UDP traffic
  2014-10-27  8:48       ` Nikolay Aleksandrov
@ 2014-10-27  9:12         ` Nikolay Aleksandrov
  2014-10-27  9:32           ` Nikolay Aleksandrov
  2014-10-27 22:59         ` Patrick McLean
  1 sibling, 1 reply; 15+ messages in thread
From: Nikolay Aleksandrov @ 2014-10-27  9:12 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Florian Westphal, Stephen Hemminger, netdev, chutzpah

On 10/27/2014 09:48 AM, Nikolay Aleksandrov wrote:
> On 10/27/2014 01:47 AM, Eric Dumazet wrote:
>> On Mon, 2014-10-27 at 00:28 +0100, Nikolay Aleksandrov wrote:
>>
>>>
>>> Thanks for CCing me.
>>> I'll dig in the code tomorrow but my first thought when I saw this was
>>> could it be possible that we have a race condition between
>>> ip_frag_queue() and inet_frag_evict(), more precisely between the
>>> ipq_kill() calls from ip_frag_queue and inet_frag_evict since the frag
>>> could be found before we have entered the evictor which then can add it to
>>> its expire list but the ipq_kill() from ip_frag_queue() can do a list_del
>>> after we release the chain lock in the evictor so we may end up like this ?
>>
>> Yes, either we use hlist_del_init() but lose the poison aid, or test if
>> the frag was evicted:
>>
>> Not sure about refcount.
>>
>> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
>> index 9eb89f3f0ee4..894ec30c5896 100644
>> --- a/net/ipv4/inet_fragment.c
>> +++ b/net/ipv4/inet_fragment.c
>> @@ -285,7 +285,8 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
>>  	struct inet_frag_bucket *hb;
>>  
>>  	hb = get_frag_bucket_locked(fq, f);
>> -	hlist_del(&fq->list);
>> +	if (!(fq->flags & INET_FRAG_EVICTED))
>> +		hlist_del(&fq->list);
>>  	spin_unlock(&hb->chain_lock);
>>  }
>>  
>>
>>
> 
> Exactly, I was thinking about a similar fix since the evict flag is only
> set with the chain lock. IMO the refcount should be fine.
> CCing the reporter.
> Patrick could you please try Eric's patch ?

Actually there might be a refcnt problem. What if the following happens
(starting refcnt = 2: 1 for the timer, 1 from init)?

	A				B
inet_frag_find() +1 (3)
ipq_kill() -2 (1)		inet_evict_bucket() (sees it before fq_unlink)
spin_unlock(q.lock)
				->frag_expire() -1 (0 -> frag_destroy())
*ipq_put(freed frag)*

Or the other way around: the last ipq_put() is executed before frag_expire(),
so we have a freed frag on the expire list.
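
For reference, the ->frag_expire() step above is ip_expire() for IPv4;
roughly (paraphrased from net/ipv4/ip_fragment.c, details may differ):

static void ip_expire(unsigned long arg)
{
	struct ipq *qp = container_of((struct inet_frag_queue *)arg,
				      struct ipq, q);

	spin_lock(&qp->q.lock);

	if (qp->q.flags & INET_FRAG_COMPLETE)	/* already killed elsewhere */
		goto out;

	ipq_kill(qp);	/* unlink from the chain, drop timer + initial refs */
	/* ... count REASMFAILS, maybe send ICMP time-exceeded ... */
out:
	spin_unlock(&qp->q.lock);
	ipq_put(qp);	/* drop the ref held for this expiry */
}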

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fw: [Bug 86851] New: Reproducible panic on heavy UDP traffic
  2014-10-27  9:12         ` Nikolay Aleksandrov
@ 2014-10-27  9:32           ` Nikolay Aleksandrov
  0 siblings, 0 replies; 15+ messages in thread
From: Nikolay Aleksandrov @ 2014-10-27  9:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Florian Westphal, Stephen Hemminger, netdev, chutzpah

On 10/27/2014 10:12 AM, Nikolay Aleksandrov wrote:
> On 10/27/2014 09:48 AM, Nikolay Aleksandrov wrote:
>> On 10/27/2014 01:47 AM, Eric Dumazet wrote:
>>> On Mon, 2014-10-27 at 00:28 +0100, Nikolay Aleksandrov wrote:
>>>
>>>>
>>>> Thanks for CCing me.
>>>> I'll dig in the code tomorrow but my first thought when I saw this was
>>>> could it be possible that we have a race condition between
>>>> ip_frag_queue() and inet_frag_evict(), more precisely between the
>>>> ipq_kill() calls from ip_frag_queue and inet_frag_evict since the frag
>>>> could be found before we have entered the evictor which then can add it to
>>>> its expire list but the ipq_kill() from ip_frag_queue() can do a list_del
>>>> after we release the chain lock in the evictor so we may end up like this ?
>>>
>>> Yes, either we use hlist_del_init() but lose the poison aid, or test if
>>> the frag was evicted:
>>>
>>> Not sure about refcount.
>>>
>>> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
>>> index 9eb89f3f0ee4..894ec30c5896 100644
>>> --- a/net/ipv4/inet_fragment.c
>>> +++ b/net/ipv4/inet_fragment.c
>>> @@ -285,7 +285,8 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
>>>  	struct inet_frag_bucket *hb;
>>>  
>>>  	hb = get_frag_bucket_locked(fq, f);
>>> -	hlist_del(&fq->list);
>>> +	if (!(fq->flags & INET_FRAG_EVICTED))
>>> +		hlist_del(&fq->list);
>>>  	spin_unlock(&hb->chain_lock);
>>>  }
>>>  
>>>
>>>
>>
>> Exactly, I was thinking about a similar fix since the evict flag is only
>> set with the chain lock. IMO the refcount should be fine.
>> CCing the reporter.
>> Patrick could you please try Eric's patch ?
> 
> Actually there might be a refcnt problem. What if the following happens:
> refcnt = 2 (1 for timer, 1 init)
> 	A			B
> inet_frag_find() +1 (3)
> ipq_kill() -2 (1)          inet_frag_evict() (sees it before fq_unlink)
> spin_unlock(q.lock)
> 			   ->frag_expire() -1 (0 -> frag_destroy())
> *ipq_put(freed frag)*
> 
> Or the other way around, last ipq_put() is executed before frag_expire
> so we have a freed frag on the expire list.
> 
> 
Ah, I forgot about the del_timer(): only one of the two can actually delete
the timer. If inet_frag_kill() deletes it, it drops the timer's refcnt, but
inet_evict_bucket() won't add the frag to its expire list (and we may
actually hit the WARN_ON(refcnt != 1) in inet_evict_bucket()). In the other
case inet_evict_bucket() deletes the timer, but its refcnt stays, so we're
fine w.r.t. inet_frag_kill(). I think we should also remove the WARN_ON()
in inet_evict_bucket(), because a running timer is not the only way to end
up in that if() branch: inet_evict_bucket() could be running while
inet_frag_kill() is called by *frag_queue() with a taken refcount, just
after it has deleted the timer.
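
For reference, the branch in question looks roughly like this (paraphrased
from inet_evict_bucket() in 3.18-rc1, not the literal code):

evict_again:
	spin_lock(&hb->chain_lock);

	hlist_for_each_entry_safe(fq, n, &hb->chain, list) {
		if (!inet_fragq_should_evict(fq))
			continue;

		if (!del_timer(&fq->timer)) {
			/* timer is running, or inet_frag_kill() already
			 * deleted it: wait for it and retry the bucket */
			atomic_inc(&fq->refcnt);
			spin_unlock(&hb->chain_lock);
			del_timer_sync(&fq->timer);
			WARN_ON(atomic_read(&fq->refcnt) != 1);
			inet_frag_put(fq, f);
			goto evict_again;
		}
		/* ... otherwise mark INET_FRAG_EVICTED and move fq to the
		 * local expire list ... */
	}
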
Does this sound sane? :-) I need to get some coffee...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Bug 86851] New: Reproducible panic on heavy UDP traffic
  2014-10-27  8:48       ` Nikolay Aleksandrov
  2014-10-27  9:12         ` Nikolay Aleksandrov
@ 2014-10-27 22:59         ` Patrick McLean
  2014-10-27 23:06           ` Nikolay Aleksandrov
  1 sibling, 1 reply; 15+ messages in thread
From: Patrick McLean @ 2014-10-27 22:59 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Eric Dumazet, Florian Westphal, Stephen Hemminger, netdev

On Mon, 27 Oct 2014 09:48:15 +0100
Nikolay Aleksandrov <nikolay@redhat.com> wrote:

> On 10/27/2014 01:47 AM, Eric Dumazet wrote:
> > On Mon, 2014-10-27 at 00:28 +0100, Nikolay Aleksandrov wrote:
> > 
> >>
> >> Thanks for CCing me.
> >> I'll dig in the code tomorrow but my first thought when I saw this
> >> was could it be possible that we have a race condition between
> >> ip_frag_queue() and inet_frag_evict(), more precisely between the
> >> ipq_kill() calls from ip_frag_queue and inet_frag_evict since the
> >> frag could be found before we have entered the evictor which then
> >> can add it to its expire list but the ipq_kill() from
> >> ip_frag_queue() can do a list_del after we release the chain lock
> >> in the evictor so we may end up like this ?
> > 
> > Yes, either we use hlist_del_init() but lose the poison aid, or test if
> > the frag was evicted:
> > 
> > Not sure about refcount.
> > 
> > diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> > index 9eb89f3f0ee4..894ec30c5896 100644
> > --- a/net/ipv4/inet_fragment.c
> > +++ b/net/ipv4/inet_fragment.c
> > @@ -285,7 +285,8 @@ static inline void fq_unlink(struct
> > inet_frag_queue *fq, struct inet_frags *f) struct inet_frag_bucket
> > *hb; 
> >  	hb = get_frag_bucket_locked(fq, f);
> > -	hlist_del(&fq->list);
> > +	if (!(fq->flags & INET_FRAG_EVICTED))
> > +		hlist_del(&fq->list);
> >  	spin_unlock(&hb->chain_lock);
> >  }
> >  
> > 
> > 
> 
> Exactly, I was thinking about a similar fix since the evict flag is
> only set with the chain lock. IMO the refcount should be fine.
> CCing the reporter.
> Patrick could you please try Eric's patch ?
> 

It no longer panics with that patch, but it does produce a large number of
warnings; here is an example of what I am getting. I will attach the full
log to the bug.

> [  205.042923] ------------[ cut here ]------------
> [  205.042933] WARNING: CPU: 4 PID: 615 at net/ipv4/inet_fragment.c:149 inet_evict_bucket+0x172/0x180()
> [  205.042934] Modules linked in: nfs fscache nfsd auth_rpcgss nfs_acl lockd grace sunrpc 8021q garp mrp bonding x86_pkg_temp_thermal joydev sb_edac edac_core ioatdma tpm_tis ext4 mbcache jbd2 igb ixgbe i2c_algo_bit raid1 mdio crc32c_intel megaraid_sas dca
> [  205.042953] CPU: 4 PID: 615 Comm: kworker/4:2 Not tainted 3.18.0-rc2-base-7+ #3
> [  205.042955] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
> [  205.042957] Workqueue: events inet_frag_worker
> [  205.042958]  0000000000000000 0000000000000009 ffffffff81624cd2 0000000000000000
> [  205.042960]  ffffffff81117b7d ffff8817c83a4740 0000000000000000 ffffffff81aa6820
> [  205.042962]  ffff8817ce073d70 ffff8817c83a4738 ffffffff81597cb2 ffffffff81aa8e28
> [  205.042964] Call Trace:
> [  205.042969]  [<ffffffff81624cd2>] ? dump_stack+0x41/0x51
> [  205.042973]  [<ffffffff81117b7d>] ? warn_slowpath_common+0x6d/0x90
> [  205.042975]  [<ffffffff81597cb2>] ? inet_evict_bucket+0x172/0x180
> [  205.042976]  [<ffffffff81597d22>] ? inet_frag_worker+0x62/0x210
> [  205.042979]  [<ffffffff8112c312>] ? process_one_work+0x132/0x360
> [  205.042981]  [<ffffffff8112ca23>] ? worker_thread+0x113/0x590
> [  205.042983]  [<ffffffff8112c910>] ? rescuer_thread+0x3d0/0x3d0
> [  205.042986]  [<ffffffff8113123c>] ? kthread+0xbc/0xe0
> [  205.042991]  [<ffffffff81040000>] ? xen_teardown_timer+0x10/0x70
> [  205.042993]  [<ffffffff81131180>] ? kthread_create_on_node+0x170/0x170
> [  205.042996]  [<ffffffff8162a9fc>] ? ret_from_fork+0x7c/0xb0
> [  205.042998]  [<ffffffff81131180>] ? kthread_create_on_node+0x170/0x170
> [  205.043000] ---[ end trace ed2bb7d412e082bc ]---
> [  205.752744] ------------[ cut here ]------------
> [  205.752752] WARNING: CPU: 2 PID: 610 at net/ipv4/inet_fragment.c:149 inet_evict_bucket+0x172/0x180()
> [  205.752754] Modules linked in: nfs fscache nfsd auth_rpcgss nfs_acl lockd grace sunrpc 8021q garp mrp bonding x86_pkg_temp_thermal joydev sb_edac edac_core ioatdma tpm_tis ext4 mbcache jbd2 igb ixgbe i2c_algo_bit raid1 mdio crc32c_intel megaraid_sas dca
> [  205.752773] CPU: 2 PID: 610 Comm: kworker/2:2 Tainted: G        W      3.18.0-rc2-base-7+ #3 
> [  205.752774] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
> [  205.752777] Workqueue: events inet_frag_worker
> [  205.752779]  0000000000000000 0000000000000009 ffffffff81624cd2 0000000000000000
> [  205.752780]  ffffffff81117b7d ffff882fc473c740 0000000000000000 ffffffff81aa6820
> [  205.752782]  ffff8817ce7afd70 ffff882fc473c738 ffffffff81597cb2 ffffffff81aa87a8
> [  205.752784] Call Trace:
> [  205.752790]  [<ffffffff81624cd2>] ? dump_stack+0x41/0x51
> [  205.752793]  [<ffffffff81117b7d>] ? warn_slowpath_common+0x6d/0x90
> [  205.752795]  [<ffffffff81597cb2>] ? inet_evict_bucket+0x172/0x180
> [  205.752797]  [<ffffffff81597d22>] ? inet_frag_worker+0x62/0x210
> [  205.752799]  [<ffffffff8112c312>] ? process_one_work+0x132/0x360
> [  205.752801]  [<ffffffff8112ca23>] ? worker_thread+0x113/0x590
> [  205.752803]  [<ffffffff8112c910>] ? rescuer_thread+0x3d0/0x3d0
> [  205.752806]  [<ffffffff8113123c>] ? kthread+0xbc/0xe0
> [  205.752810]  [<ffffffff81040000>] ? xen_teardown_timer+0x10/0x70
> [  205.752812]  [<ffffffff81131180>] ? kthread_create_on_node+0x170/0x170
> [  205.752815]  [<ffffffff8162a9fc>] ? ret_from_fork+0x7c/0xb0
> [  205.752818]  [<ffffffff81131180>] ? kthread_create_on_node+0x170/0x170
> [  205.752820] ---[ end trace ed2bb7d412e082bd ]---
> [  206.737865] ------------[ cut here ]------------

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Bug 86851] New: Reproducible panic on heavy UDP traffic
  2014-10-27 22:59         ` Patrick McLean
@ 2014-10-27 23:06           ` Nikolay Aleksandrov
  2014-10-28  0:16             ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Nikolay Aleksandrov @ 2014-10-27 23:06 UTC (permalink / raw)
  To: Patrick McLean; +Cc: Eric Dumazet, Florian Westphal, Stephen Hemminger, netdev

On 10/27/2014 11:59 PM, Patrick McLean wrote:
> On Mon, 27 Oct 2014 09:48:15 +0100
> Nikolay Aleksandrov <nikolay@redhat.com> wrote:
> 
>> On 10/27/2014 01:47 AM, Eric Dumazet wrote:
>>> On Mon, 2014-10-27 at 00:28 +0100, Nikolay Aleksandrov wrote:
>>>
>>>>
>>>> Thanks for CCing me.
>>>> I'll dig in the code tomorrow but my first thought when I saw this
>>>> was could it be possible that we have a race condition between
>>>> ip_frag_queue() and inet_frag_evict(), more precisely between the
>>>> ipq_kill() calls from ip_frag_queue and inet_frag_evict since the
>>>> frag could be found before we have entered the evictor which then
>>>> can add it to its expire list but the ipq_kill() from
>>>> ip_frag_queue() can do a list_del after we release the chain lock
>>>> in the evictor so we may end up like this ?
>>>
>>> Yes, either we use hlist_del_init() but lose the poison aid, or test if
>>> the frag was evicted:
>>>
>>> Not sure about refcount.
>>>
>>> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
>>> index 9eb89f3f0ee4..894ec30c5896 100644
>>> --- a/net/ipv4/inet_fragment.c
>>> +++ b/net/ipv4/inet_fragment.c
>>> @@ -285,7 +285,8 @@ static inline void fq_unlink(struct
>>> inet_frag_queue *fq, struct inet_frags *f) struct inet_frag_bucket
>>> *hb; 
>>>  	hb = get_frag_bucket_locked(fq, f);
>>> -	hlist_del(&fq->list);
>>> +	if (!(fq->flags & INET_FRAG_EVICTED))
>>> +		hlist_del(&fq->list);
>>>  	spin_unlock(&hb->chain_lock);
>>>  }
>>>  
>>>
>>>
>>
>> Exactly, I was thinking about a similar fix since the evict flag is
>> only set with the chain lock. IMO the refcount should be fine.
>> CCing the reporter.
>> Patrick could you please try Eric's patch ?
>>
> 
> It no longer panics with that patch, but it does produce a large amount
> of warnings, here is an example of what I am getting. I will attach the
> full log to the bug.
> 

Great! Thanks for testing.
As I said earlier, we have a valid case that can hit the WARN_ON in
inet_evict_bucket().
Anyhow, Eric, would you mind posting the patch officially?
If you'd like me to remove the WARN_ON() in a separate patch, just let me
know; otherwise feel free to remove it in the fix for the race.

Cheers,
 Nik

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Bug 86851] New: Reproducible panic on heavy UDP traffic
  2014-10-27 23:06           ` Nikolay Aleksandrov
@ 2014-10-28  0:16             ` Eric Dumazet
  2014-10-28  9:30               ` [PATCH net] inet: frags: fix a race between inet_evict_bucket and inet_frag_kill Nikolay Aleksandrov
  2014-10-28  9:44               ` [PATCH net] inet: frags: remove the WARN_ON from inet_evict_bucket Nikolay Aleksandrov
  0 siblings, 2 replies; 15+ messages in thread
From: Eric Dumazet @ 2014-10-28  0:16 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Patrick McLean, Florian Westphal, Stephen Hemminger, netdev

On Tue, 2014-10-28 at 00:06 +0100, Nikolay Aleksandrov wrote:

> Great! Thanks for testing.
> As I said earlier we have a valid case that can hit the WARN_ON in
> inet_evict_bucket().
> Anyhow, Eric would you mind posting the patch officially ?
> If you'd like me to remove the WARN_ON() in a separate one just let me
> know, otherwise feel free to remove it in the fix for the race.

Please take ownership of this patch, Nikolay; I am busy with other stuff
at the moment. Thanks!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net] inet: frags: fix a race between inet_evict_bucket and inet_frag_kill
  2014-10-28  0:16             ` Eric Dumazet
@ 2014-10-28  9:30               ` Nikolay Aleksandrov
  2014-10-28 10:03                 ` Florian Westphal
  2014-10-29 19:21                 ` David Miller
  2014-10-28  9:44               ` [PATCH net] inet: frags: remove the WARN_ON from inet_evict_bucket Nikolay Aleksandrov
  1 sibling, 2 replies; 15+ messages in thread
From: Nikolay Aleksandrov @ 2014-10-28  9:30 UTC (permalink / raw)
  To: netdev
  Cc: Nikolay Aleksandrov, Florian Westphal, Eric Dumazet, Patrick McLean

When the evictor is running, it adds some chosen frags to a local list to
be evicted once the chain lock has been released, but at the same time the
*frag_queue functions can be running for some of the same queues and may
call inet_frag_kill, which will wait on the chain lock and will then delete
the queue from the wrong list, since it was added to the eviction one. The
fix is simple - check whether the queue has the evict flag set, under the
chain lock, before deleting it. This is safe because the evict flag is set
only under that lock, and having the flag set also means that the queue has
been detached from the chain list, so there is no need to delete it again.
An important note is that we're safe w.r.t. the refcnt because
inet_frag_kill and inet_evict_bucket will sync on the del_timer operation,
where only one of the two can succeed (or, if the timer is executing,
neither of them); the cases are:
1. inet_frag_kill succeeds in del_timer
 - then the timer ref is removed, but inet_evict_bucket will not add
   this queue to its expire list and will instead restart eviction in
   that chain
2. inet_evict_bucket succeeds in del_timer
 - then the timer ref is kept until the evictor "expires" the queue, but
   inet_frag_kill will remove the initial ref and will set
   INET_FRAG_COMPLETE, which will make the frag_expire fn just remove its
   own ref.
In the end, all of the queue users will do an inet_frag_put, and the one
that reaches 0 will free it. The refcount balance should be okay.

CC: Florian Westphal <fw@strlen.de>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McLean <chutzpah@gentoo.org>

Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Patrick McLean <chutzpah@gentoo.org>
Tested-by: Patrick McLean <chutzpah@gentoo.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
A few more eyes to confirm all of this would be much appreciated.

 net/ipv4/inet_fragment.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 9eb89f3f0ee4..894ec30c5896 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -285,7 +285,8 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
 	struct inet_frag_bucket *hb;
 
 	hb = get_frag_bucket_locked(fq, f);
-	hlist_del(&fq->list);
+	if (!(fq->flags & INET_FRAG_EVICTED))
+		hlist_del(&fq->list);
 	spin_unlock(&hb->chain_lock);
 }
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net] inet: frags: remove the WARN_ON from inet_evict_bucket
  2014-10-28  0:16             ` Eric Dumazet
  2014-10-28  9:30               ` [PATCH net] inet: frags: fix a race between inet_evict_bucket and inet_frag_kill Nikolay Aleksandrov
@ 2014-10-28  9:44               ` Nikolay Aleksandrov
  2014-10-29 19:22                 ` David Miller
  1 sibling, 1 reply; 15+ messages in thread
From: Nikolay Aleksandrov @ 2014-10-28  9:44 UTC (permalink / raw)
  To: netdev
  Cc: Nikolay Aleksandrov, Florian Westphal, Eric Dumazet, Patrick McLean

The WARN_ON in inet_evict_bucket can be triggered by a valid case:
inet_frag_kill and inet_evict_bucket can be running in parallel on the
same queue, which means that there has been at least one more ref added
by a previous inet_frag_find call, but inet_frag_kill can delete the
timer before inet_evict_bucket, which will cause the WARN_ON() there to
trigger since we'll have refcnt != 1. Now, this case is valid because the
queue is being "killed" for some reason (removed from the chain list and
its timer deleted), so it will get destroyed in the end by the
inet_frag_put() call that reaches 0, i.e. the refcnt is still valid.

CC: Florian Westphal <fw@strlen.de>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McLean <chutzpah@gentoo.org>

Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue")
Reported-by: Patrick McLean <chutzpah@gentoo.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
I'm sending this as a separate patch so the race fix doesn't get blocked
in case I'm wrong here, and also because it's a different issue.

 net/ipv4/inet_fragment.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 894ec30c5896..19419b60cb37 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -146,7 +146,6 @@ evict_again:
 			atomic_inc(&fq->refcnt);
 			spin_unlock(&hb->chain_lock);
 			del_timer_sync(&fq->timer);
-			WARN_ON(atomic_read(&fq->refcnt) != 1);
 			inet_frag_put(fq, f);
 			goto evict_again;
 		}
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net] inet: frags: fix a race between inet_evict_bucket and inet_frag_kill
  2014-10-28  9:30               ` [PATCH net] inet: frags: fix a race between inet_evict_bucket and inet_frag_kill Nikolay Aleksandrov
@ 2014-10-28 10:03                 ` Florian Westphal
  2014-10-29 19:21                 ` David Miller
  1 sibling, 0 replies; 15+ messages in thread
From: Florian Westphal @ 2014-10-28 10:03 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, Florian Westphal, Eric Dumazet, Patrick McLean

Nikolay Aleksandrov <nikolay@redhat.com> wrote:
> When the evictor is running it adds some chosen frags to a local list to
> be evicted once the chain lock has been released but at the same time
> the *frag_queue can be running for some of the same queues and it
> may call inet_frag_kill which will wait on the chain lock and
> will then delete the queue from the wrong list since it was added in the
> eviction one.

I had to read that twice...

cpu1						cpu2
inet_evict_bucket				inet_frag_kill
  chain_lock()                                    chain_lock() ..
  for_each_frag_queue                               spin
    set fragqueue INET_FRAG_EVICTED flag [A]        .
    hlist_del()                                     spin
    hlist_add (to private list)                     .
                                                    spin
  chain_unlock                                      .
						    chain_lock returns
  for_each_frag_queue_on_private_list		    hlist_del() [B]
     frag_expire(fq) // destroy/free queue

[B] we may delete an entry on the evictor's private list.

Since [A] is only set with the chain lock held, other CPUs
killing an entry can use INET_FRAG_EVICTED to test whether the
entry is about to be removed by the evictor.

> The fix is simple - check if the queue has the evict flag
> set under the chain lock before deleting it, this is safe because the
> evict flag is set only under that lock and having the flag set also means
> that the queue has been detached from the chain list, so no need to delete
> it again.

Right, thanks everyone.
> ---
> A few more eyes to confirm all of this would be much appreciated.

Looks correct,
Reviewed-by: Florian Westphal <fw@strlen.de>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net] inet: frags: fix a race between inet_evict_bucket and inet_frag_kill
  2014-10-28  9:30               ` [PATCH net] inet: frags: fix a race between inet_evict_bucket and inet_frag_kill Nikolay Aleksandrov
  2014-10-28 10:03                 ` Florian Westphal
@ 2014-10-29 19:21                 ` David Miller
  1 sibling, 0 replies; 15+ messages in thread
From: David Miller @ 2014-10-29 19:21 UTC (permalink / raw)
  To: nikolay; +Cc: netdev, fw, eric.dumazet, chutzpah

From: Nikolay Aleksandrov <nikolay@redhat.com>
Date: Tue, 28 Oct 2014 10:30:34 +0100

> When the evictor is running it adds some chosen frags to a local list to
> be evicted once the chain lock has been released but at the same time
> the *frag_queue can be running for some of the same queues and it
> may call inet_frag_kill which will wait on the chain lock and
> will then delete the queue from the wrong list since it was added in the
> eviction one. The fix is simple - check if the queue has the evict flag
> set under the chain lock before deleting it, this is safe because the
> evict flag is set only under that lock and having the flag set also means
> that the queue has been detached from the chain list, so no need to delete
> it again.
> An important note to make is that we're safe w.r.t refcnt because
> inet_frag_kill and inet_evict_bucket will sync on the del_timer operation
> where only one of the two can succeed (or if the timer is executing -
> none of them), the cases are:
> 1. inet_frag_kill succeeds in del_timer
>  - then the timer ref is removed, but inet_evict_bucket will not add
>    this queue to its expire list but will restart eviction in that chain
> 2. inet_evict_bucket succeeds in del_timer
>  - then the timer ref is kept until the evictor "expires" the queue, but
>    inet_frag_kill will remove the initial ref and will set
>    INET_FRAG_COMPLETE which will make the frag_expire fn just to remove
>    its ref.
> In the end all of the queue users will do an inet_frag_put and the one
> that reaches 0 will free it. The refcount balance should be okay.
> 
> CC: Florian Westphal <fw@strlen.de>
> CC: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Patrick McLean <chutzpah@gentoo.org>
> 
> Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue")
> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
> Reported-by: Patrick McLean <chutzpah@gentoo.org>
> Tested-by: Patrick McLean <chutzpah@gentoo.org>
> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
> ---
> A few more eyes to confirm all of this would be much appreciated.

I've applied this and tentatively scheduled it for -stable, thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net] inet: frags: remove the WARN_ON from inet_evict_bucket
  2014-10-28  9:44               ` [PATCH net] inet: frags: remove the WARN_ON from inet_evict_bucket Nikolay Aleksandrov
@ 2014-10-29 19:22                 ` David Miller
  0 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2014-10-29 19:22 UTC (permalink / raw)
  To: nikolay; +Cc: netdev, fw, eric.dumazet, chutzpah

From: Nikolay Aleksandrov <nikolay@redhat.com>
Date: Tue, 28 Oct 2014 10:44:01 +0100

> The WARN_ON in inet_evict_bucket can be triggered by a valid case:
> inet_frag_kill and inet_evict_bucket can be running in parallel on the
> same queue which means that there has been at least one more ref added
> by a previous inet_frag_find call, but inet_frag_kill can delete the
> timer before inet_evict_bucket which will cause the WARN_ON() there to
> trigger since we'll have refcnt!=1. Now, this case is valid because the
> queue is being "killed" for some reason (removed from the chain list and
> its timer deleted) so it will get destroyed in the end by one of the
> inet_frag_put() calls which reaches 0 i.e. refcnt is still valid.
> 
> CC: Florian Westphal <fw@strlen.de>
> CC: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Patrick McLean <chutzpah@gentoo.org>
> 
> Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue")
> Reported-by: Patrick McLean <chutzpah@gentoo.org>
> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
> ---
> I'm sending this as a separate patch so the race fix doesn't get blocked
> in case I'm wrong and also it's a different issue.

Also applied, thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-10-29 19:22 UTC | newest]

Thread overview: 15+ messages
-- links below jump to the message on this page --
2014-10-25  8:40 Fw: [Bug 86851] New: Reproducible panic on heavy UDP traffic Stephen Hemminger
2014-10-25 21:44 ` Florian Westphal
2014-10-26 23:28   ` Nikolay Aleksandrov
2014-10-27  0:47     ` Eric Dumazet
2014-10-27  8:48       ` Nikolay Aleksandrov
2014-10-27  9:12         ` Nikolay Aleksandrov
2014-10-27  9:32           ` Nikolay Aleksandrov
2014-10-27 22:59         ` Patrick McLean
2014-10-27 23:06           ` Nikolay Aleksandrov
2014-10-28  0:16             ` Eric Dumazet
2014-10-28  9:30               ` [PATCH net] inet: frags: fix a race between inet_evict_bucket and inet_frag_kill Nikolay Aleksandrov
2014-10-28 10:03                 ` Florian Westphal
2014-10-29 19:21                 ` David Miller
2014-10-28  9:44               ` [PATCH net] inet: frags: remove the WARN_ON from inet_evict_bucket Nikolay Aleksandrov
2014-10-29 19:22                 ` David Miller
