All of lore.kernel.org
 help / color / mirror / Atom feed
* linux-next: zillions of lockdep whinges in include/net/sock.h:1408
@ 2016-04-21  0:30 Valdis Kletnieks
  2016-04-21  7:42 ` Hannes Frederic Sowa
  0 siblings, 1 reply; 19+ messages in thread
From: Valdis Kletnieks @ 2016-04-21  0:30 UTC (permalink / raw)
  To: Hannes Frederic Sowa, David S. Miller; +Cc: netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2126 bytes --]

linux-next 20160420 is whining at an incredible rate - in 20 minutes of
uptime, I piled up some 41,000 hits from all over the place (cleaned up
to skip the CPU and PID so the list isn't quite so long):

% grep include/net/sock.h /var/log/messages | cut -f5- -d: |  sed -e 's/PID: [0-9]* /PID: (elided) /' -e 's/CPU: [0-3]/CPU: +/' | sort | uniq -c | sort -nr
  13468  CPU: + PID: (elided) at include/net/sock.h:1408 tcp_v6_rcv+0xc20/0xcb0
   9770  CPU: + PID: (elided) at include/net/sock.h:1408 udp_queue_rcv_skb+0x3ca/0x6d0
   7706  CPU: + PID: (elided) at include/net/sock.h:1408 sock_owned_by_user+0x91/0xa0
   2818  CPU: + PID: (elided) at include/net/sock.h:1408 udpv6_queue_rcv_skb+0x3b6/0x6d0
   1981  CPU: + PID: (elided) at include/net/sock.h:1408 tcp_write_timer+0xf2/0x110
   1954  CPU: + PID: (elided) at include/net/sock.h:1408 tcp_delack_timer+0x110/0x130
   1912  CPU: + PID: (elided) at include/net/sock.h:1408 tcp_keepalive_timer+0x136/0x2c0
    882  CPU: + PID: (elided) at include/net/sock.h:1408 tcp_close+0x226/0x4f0
    804  CPU: + PID: (elided) at include/net/sock.h:1408 tcp_tasklet_func+0x192/0x1e0
     28  CPU: + PID: (elided) at include/net/sock.h:1408 tcp_child_process+0x17a/0x350
      2  CPU: + PID: (elided) at include/net/sock.h:1408 tcp_v6_err+0x401/0x660
      2  CPU: + PID: (elided) at include/net/sock.h:1408 tcp_v6_err+0x1fd/0x660

Seems to be from this commit, which is apparently over-stringent or
isn't handling some case correctly:

commit fafc4e1ea1a4c1eb13a30c9426fb799f5efacbc3
Author: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date:   Fri Apr 8 15:11:27 2016 +0200

    sock: tigthen lockdep checks for sock_owned_by_user

    sock_owned_by_user should not be used without socket lock held. It seems
    to be a common practice to check .owned before lock reclassification, so
    provide a little help to abstract this check away.

    Cc: linux-cifs@vger.kernel.org
    Cc: linux-bluetooth@vger.kernel.org
    Cc: linux-nfs@vger.kernel.org
    Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>


[-- Attachment #2: Type: application/pgp-signature, Size: 848 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
  2016-04-21  0:30 linux-next: zillions of lockdep whinges in include/net/sock.h:1408 Valdis Kletnieks
@ 2016-04-21  7:42 ` Hannes Frederic Sowa
  2016-04-21  9:05   ` Valdis.Kletnieks
  0 siblings, 1 reply; 19+ messages in thread
From: Hannes Frederic Sowa @ 2016-04-21  7:42 UTC (permalink / raw)
  To: Valdis Kletnieks, David S. Miller; +Cc: netdev, linux-kernel

Hi,

On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
> linux-next 20160420 is whining at an incredible rate - in 20 minutes of
> uptime, I piled up some 41,000 hits from all over the place (cleaned up
> to skip the CPU and PID so the list isn't quite so long):

Thanks for the report. Can you give me some more details:

Is this an nfs socket? Do you by accident know if this socket went
through xs_reclassify_socket at any point? We do hold the appropriate
locks at that point but I fear that the lockdep reinitialization
confused lockdep.

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
  2016-04-21  7:42 ` Hannes Frederic Sowa
@ 2016-04-21  9:05   ` Valdis.Kletnieks
  2016-04-21 13:31     ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Valdis.Kletnieks @ 2016-04-21  9:05 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: David S. Miller, netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1009 bytes --]

On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:
> Hi,
>
> On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
> > linux-next 20160420 is whining at an incredible rate - in 20 minutes of
> > uptime, I piled up some 41,000 hits from all over the place (cleaned up
> > to skip the CPU and PID so the list isn't quite so long):
>
> Thanks for the report. Can you give me some more details:
>
> Is this an nfs socket? Do you by accident know if this socket went
> through xs_reclassify_socket at any point? We do hold the appropriate
> locks at that point but I fear that the lockdep reinitialization
> confused lockdep.

It wasn't an NFS socket, as NFS wasn't even active at the time.  I'm reasonably
sure that multiple sockets were in play, given that tcp_v6_rcv and
udpv6_queue_rcv_skb were both implicated.  I strongly suspect that pretty much
any IPv6 traffic could do it - the frequency dropped off quite a bit when I
closed firefox, which is usually a heavy network hitter on my laptop.


[-- Attachment #2: Type: application/pgp-signature, Size: 848 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
  2016-04-21  9:05   ` Valdis.Kletnieks
@ 2016-04-21 13:31     ` Eric Dumazet
  2016-04-21 13:49       ` Hannes Frederic Sowa
  2016-04-21 18:10       ` linux-next: zillions of lockdep whinges in include/net/sock.h:1408 Hannes Frederic Sowa
  0 siblings, 2 replies; 19+ messages in thread
From: Eric Dumazet @ 2016-04-21 13:31 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Hannes Frederic Sowa, David S. Miller, netdev, linux-kernel

On Thu, 2016-04-21 at 05:05 -0400, Valdis.Kletnieks@vt.edu wrote:
> On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:
> > Hi,
> >
> > On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
> > > linux-next 20160420 is whining at an incredible rate - in 20 minutes of
> > > uptime, I piled up some 41,000 hits from all over the place (cleaned up
> > > to skip the CPU and PID so the list isn't quite so long):
> >
> > Thanks for the report. Can you give me some more details:
> >
> > Is this an nfs socket? Do you by accident know if this socket went
> > through xs_reclassify_socket at any point? We do hold the appropriate
> > locks at that point but I fear that the lockdep reinitialization
> > confused lockdep.
> 
> It wasn't an NFS socket, as NFS wasn't even active at the time.  I'm reasonably
> sure that multiple sockets were in play, given that tcp_v6_rcv and
> udpv6_queue_rcv_skb were both implicated.  I strongly suspect that pretty much
> any IPv6 traffic could do it - the frequency dropped off quite a bit when I
> closed firefox, which is usually a heavy network hitter on my laptop.


Looks like the following patch is needed, can you try it please ?

Thanks !

diff --git a/include/net/sock.h b/include/net/sock.h
index d997ec13a643..db8301c76d50 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct sock *csk)
 {
 	struct sock *sk = (struct sock *)csk;
 
-	return lockdep_is_held(&sk->sk_lock) ||
+	return !debug_locks ||
+	       lockdep_is_held(&sk->sk_lock) ||
 	       lockdep_is_held(&sk->sk_lock.slock);
 }
 #endif

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
  2016-04-21 13:31     ` Eric Dumazet
@ 2016-04-21 13:49       ` Hannes Frederic Sowa
  2016-04-24 18:38         ` David Miller
  2016-04-21 18:10       ` linux-next: zillions of lockdep whinges in include/net/sock.h:1408 Hannes Frederic Sowa
  1 sibling, 1 reply; 19+ messages in thread
From: Hannes Frederic Sowa @ 2016-04-21 13:49 UTC (permalink / raw)
  To: Eric Dumazet, Valdis.Kletnieks; +Cc: David S. Miller, netdev, linux-kernel

On 21.04.2016 15:31, Eric Dumazet wrote:
> On Thu, 2016-04-21 at 05:05 -0400, Valdis.Kletnieks@vt.edu wrote:
>> On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:
>>> Hi,
>>>
>>> On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
>>>> linux-next 20160420 is whining at an incredible rate - in 20 minutes of
>>>> uptime, I piled up some 41,000 hits from all over the place (cleaned up
>>>> to skip the CPU and PID so the list isn't quite so long):
>>>
>>> Thanks for the report. Can you give me some more details:
>>>
>>> Is this an nfs socket? Do you by accident know if this socket went
>>> through xs_reclassify_socket at any point? We do hold the appropriate
>>> locks at that point but I fear that the lockdep reinitialization
>>> confused lockdep.
>>
>> It wasn't an NFS socket, as NFS wasn't even active at the time.  I'm reasonably
>> sure that multiple sockets were in play, given that tcp_v6_rcv and
>> udpv6_queue_rcv_skb were both implicated.  I strongly suspect that pretty much
>> any IPv6 traffic could do it - the frequency dropped off quite a bit when I
>> closed firefox, which is usually a heavy network hitter on my laptop.
> 
> 
> Looks like the following patch is needed, can you try it please ?
> 
> Thanks !
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index d997ec13a643..db8301c76d50 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct sock *csk)
>  {
>  	struct sock *sk = (struct sock *)csk;
>  
> -	return lockdep_is_held(&sk->sk_lock) ||
> +	return !debug_locks ||
> +	       lockdep_is_held(&sk->sk_lock) ||
>  	       lockdep_is_held(&sk->sk_lock.slock);
>  }
>  #endif

I would prefer to add debug_locks at the WARN_ON level, like
WARN_ON(debug_locks && !lockdep_sock_is_held(sk)), but I am not sure if
this fixes the initial splat.

Thanks Eric!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
  2016-04-21 13:31     ` Eric Dumazet
  2016-04-21 13:49       ` Hannes Frederic Sowa
@ 2016-04-21 18:10       ` Hannes Frederic Sowa
  1 sibling, 0 replies; 19+ messages in thread
From: Hannes Frederic Sowa @ 2016-04-21 18:10 UTC (permalink / raw)
  To: Eric Dumazet, Valdis.Kletnieks; +Cc: David S. Miller, netdev, linux-kernel

On 21.04.2016 15:31, Eric Dumazet wrote:
> On Thu, 2016-04-21 at 05:05 -0400, Valdis.Kletnieks@vt.edu wrote:
>> On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:
>>> Hi,
>>>
>>> On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
>>>> linux-next 20160420 is whining at an incredible rate - in 20 minutes of
>>>> uptime, I piled up some 41,000 hits from all over the place (cleaned up
>>>> to skip the CPU and PID so the list isn't quite so long):
>>>
>>> Thanks for the report. Can you give me some more details:
>>>
>>> Is this an nfs socket? Do you by accident know if this socket went
>>> through xs_reclassify_socket at any point? We do hold the appropriate
>>> locks at that point but I fear that the lockdep reinitialization
>>> confused lockdep.
>>
>> It wasn't an NFS socket, as NFS wasn't even active at the time.  I'm reasonably
>> sure that multiple sockets were in play, given that tcp_v6_rcv and
>> udpv6_queue_rcv_skb were both implicated.  I strongly suspect that pretty much
>> any IPv6 traffic could do it - the frequency dropped off quite a bit when I
>> closed firefox, which is usually a heavy network hitter on my laptop.
> 
> 
> Looks like the following patch is needed, can you try it please ?
> 
> Thanks !
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index d997ec13a643..db8301c76d50 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct sock *csk)
>  {
>  	struct sock *sk = (struct sock *)csk;
>  
> -	return lockdep_is_held(&sk->sk_lock) ||
> +	return !debug_locks ||
> +	       lockdep_is_held(&sk->sk_lock) ||
>  	       lockdep_is_held(&sk->sk_lock.slock);
>  }
>  #endif

I am a little bit lost because I cannot reproduce this bug. I thought
maybe it has something to do with single cpu spin_locks which don't
update the lockdep_maps but I couldn't reproduce it. If debug_locks get
flipped you should see something in dmesg, too. Maybe you have this
handy? Was there another lockdep splat before the networking ones? Also
the config would be helpful.

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
  2016-04-21 13:49       ` Hannes Frederic Sowa
@ 2016-04-24 18:38         ` David Miller
  2016-04-24 18:48           ` Hannes Frederic Sowa
  0 siblings, 1 reply; 19+ messages in thread
From: David Miller @ 2016-04-24 18:38 UTC (permalink / raw)
  To: hannes; +Cc: eric.dumazet, Valdis.Kletnieks, netdev, linux-kernel

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Thu, 21 Apr 2016 15:49:37 +0200

> On 21.04.2016 15:31, Eric Dumazet wrote:
>> On Thu, 2016-04-21 at 05:05 -0400, Valdis.Kletnieks@vt.edu wrote:
>>> On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:
>>>> Hi,
>>>>
>>>> On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
>>>>> linux-next 20160420 is whining at an incredible rate - in 20 minutes of
>>>>> uptime, I piled up some 41,000 hits from all over the place (cleaned up
>>>>> to skip the CPU and PID so the list isn't quite so long):
>>>>
>>>> Thanks for the report. Can you give me some more details:
>>>>
>>>> Is this an nfs socket? Do you by accident know if this socket went
>>>> through xs_reclassify_socket at any point? We do hold the appropriate
>>>> locks at that point but I fear that the lockdep reinitialization
>>>> confused lockdep.
>>>
>>> It wasn't an NFS socket, as NFS wasn't even active at the time.  I'm reasonably
>>> sure that multiple sockets were in play, given that tcp_v6_rcv and
>>> udpv6_queue_rcv_skb were both implicated.  I strongly suspect that pretty much
>>> any IPv6 traffic could do it - the frequency dropped off quite a bit when I
>>> closed firefox, which is usually a heavy network hitter on my laptop.
>> 
>> 
>> Looks like the following patch is needed, can you try it please ?
>> 
>> Thanks !
>> 
>> diff --git a/include/net/sock.h b/include/net/sock.h
>> index d997ec13a643..db8301c76d50 100644
>> --- a/include/net/sock.h
>> +++ b/include/net/sock.h
>> @@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct sock *csk)
>>  {
>>  	struct sock *sk = (struct sock *)csk;
>>  
>> -	return lockdep_is_held(&sk->sk_lock) ||
>> +	return !debug_locks ||
>> +	       lockdep_is_held(&sk->sk_lock) ||
>>  	       lockdep_is_held(&sk->sk_lock.slock);
>>  }
>>  #endif
> 
> I would prefer to add debug_locks at the WARN_ON level, like
> WARN_ON(debug_locks && !lockdep_sock_is_held(sk)), but I am not sure if
> this fixes the initial splat.

Can we finish this conversation out and come up with a final patch
for this soon?

Thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
  2016-04-24 18:38         ` David Miller
@ 2016-04-24 18:48           ` Hannes Frederic Sowa
  2016-04-24 18:54             ` David Miller
  2016-04-24 19:46             ` Eric Dumazet
  0 siblings, 2 replies; 19+ messages in thread
From: Hannes Frederic Sowa @ 2016-04-24 18:48 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, Valdis.Kletnieks, netdev, linux-kernel

On 24.04.2016 20:38, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Thu, 21 Apr 2016 15:49:37 +0200
> 
>> On 21.04.2016 15:31, Eric Dumazet wrote:
>>> On Thu, 2016-04-21 at 05:05 -0400, Valdis.Kletnieks@vt.edu wrote:
>>>> On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:
>>>>> Hi,
>>>>>
>>>>> On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
>>>>>> linux-next 20160420 is whining at an incredible rate - in 20 minutes of
>>>>>> uptime, I piled up some 41,000 hits from all over the place (cleaned up
>>>>>> to skip the CPU and PID so the list isn't quite so long):
>>>>>
>>>>> Thanks for the report. Can you give me some more details:
>>>>>
>>>>> Is this an nfs socket? Do you by accident know if this socket went
>>>>> through xs_reclassify_socket at any point? We do hold the appropriate
>>>>> locks at that point but I fear that the lockdep reinitialization
>>>>> confused lockdep.
>>>>
>>>> It wasn't an NFS socket, as NFS wasn't even active at the time.  I'm reasonably
>>>> sure that multiple sockets were in play, given that tcp_v6_rcv and
>>>> udpv6_queue_rcv_skb were both implicated.  I strongly suspect that pretty much
>>>> any IPv6 traffic could do it - the frequency dropped off quite a bit when I
>>>> closed firefox, which is usually a heavy network hitter on my laptop.
>>>
>>>
>>> Looks like the following patch is needed, can you try it please ?
>>>
>>> Thanks !
>>>
>>> diff --git a/include/net/sock.h b/include/net/sock.h
>>> index d997ec13a643..db8301c76d50 100644
>>> --- a/include/net/sock.h
>>> +++ b/include/net/sock.h
>>> @@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct sock *csk)
>>>  {
>>>  	struct sock *sk = (struct sock *)csk;
>>>  
>>> -	return lockdep_is_held(&sk->sk_lock) ||
>>> +	return !debug_locks ||
>>> +	       lockdep_is_held(&sk->sk_lock) ||
>>>  	       lockdep_is_held(&sk->sk_lock.slock);
>>>  }
>>>  #endif
>>
>> I would prefer to add debug_locks at the WARN_ON level, like
>> WARN_ON(debug_locks && !lockdep_sock_is_held(sk)), but I am not sure if
>> this fixes the initial splat.
> 
> Can we finish this conversation out and come up with a final patch
> for this soon?

Eric's patch is worth to apply anyway, but I am not sure if it solves
the (fundamental) problem. I couldn't reproduce it with the exact next-
tag provided in the initial mail. All other reports also only happend
with linux-next and not net-next.

I hope I Valdis provides his config soon and I will continue my analysis
on this then.

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
  2016-04-24 18:48           ` Hannes Frederic Sowa
@ 2016-04-24 18:54             ` David Miller
  2016-04-24 19:55               ` Eric Dumazet
  2016-04-24 19:46             ` Eric Dumazet
  1 sibling, 1 reply; 19+ messages in thread
From: David Miller @ 2016-04-24 18:54 UTC (permalink / raw)
  To: hannes; +Cc: eric.dumazet, Valdis.Kletnieks, netdev, linux-kernel

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Sun, 24 Apr 2016 20:48:24 +0200

> Eric's patch is worth to apply anyway, but I am not sure if it solves
> the (fundamental) problem. I couldn't reproduce it with the exact next-
> tag provided in the initial mail. All other reports also only happend
> with linux-next and not net-next.

Ok, Eric please submit it formally.

Thanks!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
  2016-04-24 18:48           ` Hannes Frederic Sowa
  2016-04-24 18:54             ` David Miller
@ 2016-04-24 19:46             ` Eric Dumazet
  2016-04-24 19:56               ` Valdis.Kletnieks
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2016-04-24 19:46 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: David Miller, Valdis.Kletnieks, netdev, linux-kernel

On Sun, 2016-04-24 at 20:48 +0200, Hannes Frederic Sowa wrote:
> On 24.04.2016 20:38, David Miller wrote:
> > From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > Date: Thu, 21 Apr 2016 15:49:37 +0200
> > 
> >> On 21.04.2016 15:31, Eric Dumazet wrote:
> >>> On Thu, 2016-04-21 at 05:05 -0400, Valdis.Kletnieks@vt.edu wrote:
> >>>> On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:
> >>>>> Hi,
> >>>>>
> >>>>> On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
> >>>>>> linux-next 20160420 is whining at an incredible rate - in 20 minutes of
> >>>>>> uptime, I piled up some 41,000 hits from all over the place (cleaned up
> >>>>>> to skip the CPU and PID so the list isn't quite so long):
> >>>>>
> >>>>> Thanks for the report. Can you give me some more details:
> >>>>>
> >>>>> Is this an nfs socket? Do you by accident know if this socket went
> >>>>> through xs_reclassify_socket at any point? We do hold the appropriate
> >>>>> locks at that point but I fear that the lockdep reinitialization
> >>>>> confused lockdep.
> >>>>
> >>>> It wasn't an NFS socket, as NFS wasn't even active at the time.  I'm reasonably
> >>>> sure that multiple sockets were in play, given that tcp_v6_rcv and
> >>>> udpv6_queue_rcv_skb were both implicated.  I strongly suspect that pretty much
> >>>> any IPv6 traffic could do it - the frequency dropped off quite a bit when I
> >>>> closed firefox, which is usually a heavy network hitter on my laptop.
> >>>
> >>>
> >>> Looks like the following patch is needed, can you try it please ?
> >>>
> >>> Thanks !
> >>>
> >>> diff --git a/include/net/sock.h b/include/net/sock.h
> >>> index d997ec13a643..db8301c76d50 100644
> >>> --- a/include/net/sock.h
> >>> +++ b/include/net/sock.h
> >>> @@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct sock *csk)
> >>>  {
> >>>  	struct sock *sk = (struct sock *)csk;
> >>>  
> >>> -	return lockdep_is_held(&sk->sk_lock) ||
> >>> +	return !debug_locks ||
> >>> +	       lockdep_is_held(&sk->sk_lock) ||
> >>>  	       lockdep_is_held(&sk->sk_lock.slock);
> >>>  }
> >>>  #endif
> >>
> >> I would prefer to add debug_locks at the WARN_ON level, like
> >> WARN_ON(debug_locks && !lockdep_sock_is_held(sk)), but I am not sure if
> >> this fixes the initial splat.
> > 
> > Can we finish this conversation out and come up with a final patch
> > for this soon?
> 
> Eric's patch is worth to apply anyway, but I am not sure if it solves
> the (fundamental) problem. I couldn't reproduce it with the exact next-
> tag provided in the initial mail. All other reports also only happend
> with linux-next and not net-next.
> 
> I hope I Valdis provides his config soon and I will continue my analysis
> on this then.

Should be easy to force a lockdep splat and check if the patch solves
the issue.

Issue here is that once lockdep detected a problem (not necessarily in
net/ tree btw), your helper always 'detect' a problem, since lockdep
automatically disables itself.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
  2016-04-24 18:54             ` David Miller
@ 2016-04-24 19:55               ` Eric Dumazet
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2016-04-24 19:55 UTC (permalink / raw)
  To: David Miller; +Cc: hannes, Valdis.Kletnieks, netdev, linux-kernel

On Sun, 2016-04-24 at 14:54 -0400, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Sun, 24 Apr 2016 20:48:24 +0200
> 
> > Eric's patch is worth to apply anyway, but I am not sure if it solves
> > the (fundamental) problem. I couldn't reproduce it with the exact next-
> > tag provided in the initial mail. All other reports also only happend
> > with linux-next and not net-next.
> 
> Ok, Eric please submit it formally.
> 
> Thanks!

Yes, I will first test it, and will take Hannes suggestion as well.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
  2016-04-24 19:46             ` Eric Dumazet
@ 2016-04-24 19:56               ` Valdis.Kletnieks
  2016-04-24 21:00                 ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Valdis.Kletnieks @ 2016-04-24 19:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Hannes Frederic Sowa, David Miller, netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 853 bytes --]

On Sun, 24 Apr 2016 12:46:42 -0700, Eric Dumazet said:

> >>> +	return !debug_locks ||
> >>> +	       lockdep_is_held(&sk->sk_lock) ||

> Issue here is that once lockdep detected a problem (not necessarily in
> net/ tree btw), your helper always 'detect' a problem, since lockdep
> automatically disables itself.

"D'Oh!" -- H. Simpson

I thought this patch looked suspect, but couldn't put my finger on it. The
reason why I got like 41,000 of them is because I built a kernel that has
lockdep enabled, but I have an out-of-tree module that doesn't init something,
so I get this:

[   48.898156] INFO: trying to register non-static key.
[   48.898157] the code is fine but needs lockdep annotation.
[   48.898157] turning off the locking correctness validator.

After which point, even with this patch, every time through it's still going to
explode.



[-- Attachment #2: Type: application/pgp-signature, Size: 848 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
  2016-04-24 19:56               ` Valdis.Kletnieks
@ 2016-04-24 21:00                 ` Eric Dumazet
  2016-04-24 21:13                   ` Valdis.Kletnieks
  2016-04-25 13:34                   ` [PATCH v2 net-next] sock: relax WARN_ON() in sock_owned_by_user() Eric Dumazet
  0 siblings, 2 replies; 19+ messages in thread
From: Eric Dumazet @ 2016-04-24 21:00 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Hannes Frederic Sowa, David Miller, netdev, linux-kernel

On Sun, 2016-04-24 at 15:56 -0400, Valdis.Kletnieks@vt.edu wrote:
> On Sun, 24 Apr 2016 12:46:42 -0700, Eric Dumazet said:
> 
> > >>> +	return !debug_locks ||
> > >>> +	       lockdep_is_held(&sk->sk_lock) ||
> 
> > Issue here is that once lockdep detected a problem (not necessarily in
> > net/ tree btw), your helper always 'detect' a problem, since lockdep
> > automatically disables itself.
> 
> "D'Oh!" -- H. Simpson
> 
> I thought this patch looked suspect, but couldn't put my finger on it. The
> reason why I got like 41,000 of them is because I built a kernel that has
> lockdep enabled, but I have an out-of-tree module that doesn't init something,
> so I get this:
> 
> [   48.898156] INFO: trying to register non-static key.
> [   48.898157] the code is fine but needs lockdep annotation.
> [   48.898157] turning off the locking correctness validator.
> 
> After which point, even with this patch, every time through it's still going to
> explode.

Which patch are you talking about ?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
  2016-04-24 21:00                 ` Eric Dumazet
@ 2016-04-24 21:13                   ` Valdis.Kletnieks
  2016-04-24 21:25                     ` Eric Dumazet
  2016-04-25 13:34                   ` [PATCH v2 net-next] sock: relax WARN_ON() in sock_owned_by_user() Eric Dumazet
  1 sibling, 1 reply; 19+ messages in thread
From: Valdis.Kletnieks @ 2016-04-24 21:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Hannes Frederic Sowa, David Miller, netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1362 bytes --]

On Sun, 24 Apr 2016 14:00:17 -0700, Eric Dumazet said:
> On Sun, 2016-04-24 at 15:56 -0400, Valdis.Kletnieks@vt.edu wrote:
> > On Sun, 24 Apr 2016 12:46:42 -0700, Eric Dumazet said:
> >
> > > >>> +	return !debug_locks ||
> > > >>> +	       lockdep_is_held(&sk->sk_lock) ||
> >
> > > Issue here is that once lockdep detected a problem (not necessarily in
> > > net/ tree btw), your helper always 'detect' a problem, since lockdep
> > > automatically disables itself.
> >
> > "D'Oh!" -- H. Simpson
> >
> > I thought this patch looked suspect, but couldn't put my finger on it. The
> > reason why I got like 41,000 of them is because I built a kernel that has
> > lockdep enabled, but I have an out-of-tree module that doesn't init something,
> > so I get this:
> >
> > [   48.898156] INFO: trying to register non-static key.
> > [   48.898157] the code is fine but needs lockdep annotation.
> > [   48.898157] turning off the locking correctness validator.
> >
> > After which point, even with this patch, every time through it's still going to
> > explode.
>
> Which patch are you talking about ?

The one that adds the !debug_locks check - once my out-of-kernel module
hits something that turns off lockdep, it's *still* going to complain on
pretty much all the same packets it complained about earlier.  I thought
it looked suspicious, but you clarified why...

[-- Attachment #2: Type: application/pgp-signature, Size: 848 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
  2016-04-24 21:13                   ` Valdis.Kletnieks
@ 2016-04-24 21:25                     ` Eric Dumazet
  2016-04-24 21:28                       ` Eric Dumazet
  2016-04-25 13:26                       ` Hannes Frederic Sowa
  0 siblings, 2 replies; 19+ messages in thread
From: Eric Dumazet @ 2016-04-24 21:25 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Hannes Frederic Sowa, David Miller, netdev, linux-kernel

On Sun, 2016-04-24 at 17:13 -0400, Valdis.Kletnieks@vt.edu wrote:
> On Sun, 24 Apr 2016 14:00:17 -0700, Eric Dumazet said:
> > On Sun, 2016-04-24 at 15:56 -0400, Valdis.Kletnieks@vt.edu wrote:
> > > On Sun, 24 Apr 2016 12:46:42 -0700, Eric Dumazet said:
> > >
> > > > >>> +	return !debug_locks ||
> > > > >>> +	       lockdep_is_held(&sk->sk_lock) ||
> > >
> > > > Issue here is that once lockdep detected a problem (not necessarily in
> > > > net/ tree btw), your helper always 'detect' a problem, since lockdep
> > > > automatically disables itself.
> > >
> > > "D'Oh!" -- H. Simpson
> > >
> > > I thought this patch looked suspect, but couldn't put my finger on it. The
> > > reason why I got like 41,000 of them is because I built a kernel that has
> > > lockdep enabled, but I have an out-of-tree module that doesn't init something,
> > > so I get this:
> > >
> > > [   48.898156] INFO: trying to register non-static key.
> > > [   48.898157] the code is fine but needs lockdep annotation.
> > > [   48.898157] turning off the locking correctness validator.
> > >
> > > After which point, even with this patch, every time through it's still going to
> > > explode.
> >
> > Which patch are you talking about ?
> 
> The one that adds the !debug_locks check - once my out-of-kernel module
> hits something that turns off lockdep, it's *still* going to complain on
> pretty much all the same packets it complained about earlier.  I thought
> it looked suspicious, but you clarified why...

It does not make sense to me. If lockdep is disabled, then debug_locks
is 0.

So no complain should happen from networking.

I was about to send following patch, please check it solves the issue. ?

(It certainly did for me, once I forced a lockdep splat loading a buggy
module)

Thanks

From: Eric Dumazet <edumazet@google.com>

Valdis reported tons of stack dumps caused by WARN_ON() in sock_owned_by_user()

This test needs to be relaxed if/when lockdep disables itself.

Note that other lockdep_sock_is_held() callers are all from
rcu_dereference_protected() sections which already are disabled
if/when lockdep has been disabled.

Fixes: fafc4e1ea1a4 ("sock: tigthen lockdep checks for sock_owned_by_user")
Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/sock.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 52448baf19d7..f492d01512ed 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1409,7 +1409,7 @@ static inline void unlock_sock_fast(struct sock *sk, bool slow)
 static inline bool sock_owned_by_user(const struct sock *sk)
 {
 #ifdef CONFIG_LOCKDEP
-	WARN_ON(!lockdep_sock_is_held(sk));
+	WARN_ON_ONCE(!lockdep_sock_is_held(sk) && !debug_locks);
 #endif
 	return sk->sk_lock.owned;
 }

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
  2016-04-24 21:25                     ` Eric Dumazet
@ 2016-04-24 21:28                       ` Eric Dumazet
  2016-04-25 13:26                       ` Hannes Frederic Sowa
  1 sibling, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2016-04-24 21:28 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Hannes Frederic Sowa, David Miller, netdev, linux-kernel

On Sun, 2016-04-24 at 14:25 -0700, Eric Dumazet wrote:
> On Sun, 2016-04-24 at 17:13 -0400, Valdis.Kletnieks@vt.edu wrote:
> > On Sun, 24 Apr 2016 14:00:17 -0700, Eric Dumazet said:
> > > On Sun, 2016-04-24 at 15:56 -0400, Valdis.Kletnieks@vt.edu wrote:
> > > > On Sun, 24 Apr 2016 12:46:42 -0700, Eric Dumazet said:
> > > >
> > > > > >>> +	return !debug_locks ||
> > > > > >>> +	       lockdep_is_held(&sk->sk_lock) ||
> > > >
> > > > > Issue here is that once lockdep detected a problem (not necessarily in
> > > > > net/ tree btw), your helper always 'detect' a problem, since lockdep
> > > > > automatically disables itself.
> > > >
> > > > "D'Oh!" -- H. Simpson
> > > >
> > > > I thought this patch looked suspect, but couldn't put my finger on it. The
> > > > reason why I got like 41,000 of them is because I built a kernel that has
> > > > lockdep enabled, but I have an out-of-tree module that doesn't init something,
> > > > so I get this:
> > > >
> > > > [   48.898156] INFO: trying to register non-static key.
> > > > [   48.898157] the code is fine but needs lockdep annotation.
> > > > [   48.898157] turning off the locking correctness validator.
> > > >
> > > > After which point, even with this patch, every time through it's still going to
> > > > explode.
> > >
> > > Which patch are you talking about ?
> > 
> > The one that adds the !debug_locks check - once my out-of-kernel module
> > hits something that turns off lockdep, it's *still* going to complain on
> > pretty much all the same packets it complained about earlier.  I thought
> > it looked suspicious, but you clarified why...
> 
> It does not make sense to me. If lockdep is disabled, then debug_locks
> is 0.
> 
> So no complain should happen from networking.
> 
> I was about to send following patch, please check it solves the issue. ?
> 
> (It certainly did for me, once I forced a lockdep splat loading a buggy
> module)
> 
> Thanks
> 
> From: Eric Dumazet <edumazet@google.com>
> 
> Valdis reported tons of stack dumps caused by WARN_ON() in sock_owned_by_user()
> 
> This test needs to be relaxed if/when lockdep disables itself.
> 
> Note that other lockdep_sock_is_held() callers are all from
> rcu_dereference_protected() sections which already are disabled
> if/when lockdep has been disabled.
> 
> Fixes: fafc4e1ea1a4 ("sock: tigthen lockdep checks for sock_owned_by_user")
> Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  include/net/sock.h |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 52448baf19d7..f492d01512ed 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1409,7 +1409,7 @@ static inline void unlock_sock_fast(struct sock *sk, bool slow)
>  static inline bool sock_owned_by_user(const struct sock *sk)
>  {
>  #ifdef CONFIG_LOCKDEP
> -	WARN_ON(!lockdep_sock_is_held(sk));
> +	WARN_ON_ONCE(!lockdep_sock_is_held(sk) && !debug_locks);


Silly me, I tested the opposite test of course :

WARN_ON_ONCE(!lockdep_sock_is_held(sk) && debug_locks);

>  #endif
>  	return sk->sk_lock.owned;
>  }
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
  2016-04-24 21:25                     ` Eric Dumazet
  2016-04-24 21:28                       ` Eric Dumazet
@ 2016-04-25 13:26                       ` Hannes Frederic Sowa
  1 sibling, 0 replies; 19+ messages in thread
From: Hannes Frederic Sowa @ 2016-04-25 13:26 UTC (permalink / raw)
  To: Eric Dumazet, Valdis.Kletnieks; +Cc: David Miller, netdev, linux-kernel

On Sun, Apr 24, 2016, at 23:25, Eric Dumazet wrote:
>  #ifdef CONFIG_LOCKDEP
> -       WARN_ON(!lockdep_sock_is_held(sk));
> +       WARN_ON_ONCE(!lockdep_sock_is_held(sk) && !debug_locks);
>  #endif

Eric, could you resend this patch without the negation and also add my
acked-by (I came finally around to test it).

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 net-next] sock: relax WARN_ON() in sock_owned_by_user()
  2016-04-24 21:00                 ` Eric Dumazet
  2016-04-24 21:13                   ` Valdis.Kletnieks
@ 2016-04-25 13:34                   ` Eric Dumazet
  2016-04-25 15:50                     ` David Miller
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2016-04-25 13:34 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Hannes Frederic Sowa, David Miller, netdev

From: Eric Dumazet <edumazet@google.com>

Valdis reported tons of stack dumps caused by WARN_ON() in
sock_owned_by_user()

This test needs to be relaxed if/when lockdep disables itself.

Note that other lockdep_sock_is_held() callers are all from
rcu_dereference_protected() sections which already are disabled
if/when lockdep has been disabled.

Fixes: fafc4e1ea1a4 ("sock: tigthen lockdep checks for sock_owned_by_user")
Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/sock.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 52448baf19d7..2fdb87f176cf 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1409,7 +1409,7 @@ static inline void unlock_sock_fast(struct sock *sk, bool slow)
 static inline bool sock_owned_by_user(const struct sock *sk)
 {
 #ifdef CONFIG_LOCKDEP
-	WARN_ON(!lockdep_sock_is_held(sk));
+	WARN_ON_ONCE(!lockdep_sock_is_held(sk) && debug_locks);
 #endif
 	return sk->sk_lock.owned;
 }

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 net-next] sock: relax WARN_ON() in sock_owned_by_user()
  2016-04-25 13:34                   ` [PATCH v2 net-next] sock: relax WARN_ON() in sock_owned_by_user() Eric Dumazet
@ 2016-04-25 15:50                     ` David Miller
  0 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2016-04-25 15:50 UTC (permalink / raw)
  To: eric.dumazet; +Cc: Valdis.Kletnieks, hannes, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 25 Apr 2016 06:34:09 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> Valdis reported tons of stack dumps caused by WARN_ON() in
> sock_owned_by_user()
> 
> This test needs to be relaxed if/when lockdep disables itself.
> 
> Note that other lockdep_sock_is_held() callers are all from
> rcu_dereference_protected() sections which already are disabled
> if/when lockdep has been disabled.
> 
> Fixes: fafc4e1ea1a4 ("sock: tigthen lockdep checks for sock_owned_by_user")
> Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Applied, thanks Eric.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2016-04-25 15:50 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-21  0:30 linux-next: zillions of lockdep whinges in include/net/sock.h:1408 Valdis Kletnieks
2016-04-21  7:42 ` Hannes Frederic Sowa
2016-04-21  9:05   ` Valdis.Kletnieks
2016-04-21 13:31     ` Eric Dumazet
2016-04-21 13:49       ` Hannes Frederic Sowa
2016-04-24 18:38         ` David Miller
2016-04-24 18:48           ` Hannes Frederic Sowa
2016-04-24 18:54             ` David Miller
2016-04-24 19:55               ` Eric Dumazet
2016-04-24 19:46             ` Eric Dumazet
2016-04-24 19:56               ` Valdis.Kletnieks
2016-04-24 21:00                 ` Eric Dumazet
2016-04-24 21:13                   ` Valdis.Kletnieks
2016-04-24 21:25                     ` Eric Dumazet
2016-04-24 21:28                       ` Eric Dumazet
2016-04-25 13:26                       ` Hannes Frederic Sowa
2016-04-25 13:34                   ` [PATCH v2 net-next] sock: relax WARN_ON() in sock_owned_by_user() Eric Dumazet
2016-04-25 15:50                     ` David Miller
2016-04-21 18:10       ` linux-next: zillions of lockdep whinges in include/net/sock.h:1408 Hannes Frederic Sowa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.