From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753933AbcENQq5 (ORCPT ); Sat, 14 May 2016 12:46:57 -0400 Received: from mail-yw0-f170.google.com ([209.85.161.170]:34611 "EHLO mail-yw0-f170.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751955AbcENQqz (ORCPT ); Sat, 14 May 2016 12:46:55 -0400 MIME-Version: 1.0 In-Reply-To: References: Date: Sat, 14 May 2016 09:46:53 -0700 Message-ID: Subject: Re: [v4.6-rc7-183-g1410b74e4061] From: Eric Dumazet To: sedat.dilek@gmail.com Cc: David Miller , Peter Zijlstra , "Paul E. McKenney" , "netdev@vger.kernel.org" , "the arch/x86 maintainers" , LKML , Linus Torvalds , Alfredo Alvarez Fernandez Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, May 14, 2016 at 2:22 AM, Sedat Dilek wrote: > Hi, > > as Linux v4.6 is very near, I decided to write this bug report (only > drunk one coffee). > > First, I am not absolutely sure if this is a real issue as... > #1: This is only a (lockdep) warning. > #2: I have not a "vanilla" Linux v4.6-rc7+ here (see P.S. and attached patch) > > For a more helpful feedback I should test a... > #1: vanilla v4.6-rc7-183-g1410b74e4061 > #2: net.git#master on top of #1 > > What I am seeing is this while surfing with a UMTS/HSPA internet-stick > (using PPP) and running Firefox on Ubuntu/precise AMD64... > > [ 423.484105] ------------[ cut here ]------------ > [ 423.484119] WARNING: CPU: 2 PID: 2392 at > kernel/locking/lockdep.c:2098 __lock_acquire > [ 423.484123] DEBUG_LOCKS_WARN_ON(chain_hlocks[chain->base > [ 423.484125] Modules linked in: btrfs xor raid6_pq ntfs xfs > libcrc32c ppp_deflate bsd_comp ppp_async crc_ccitt option usb_wwan > cdc_ether usbserial usbnet snd_hda_codec_hdmi i915 > snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel > snd_hda_codec arc4 uvcvideo iwldvm snd_hwdep joydev mac80211 > videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core > videodev rfcomm bnep kvm_intel kvm usb_storage btusb i2c_algo_bit > iwlwifi btrtl snd_hda_core drm_kms_helper btbcm snd_pcm parport_pc > btintel syscopyarea irqbypass ppdev snd_seq_midi bluetooth sysfillrect > psmouse snd_seq_midi_event sysimgblt snd_rawmidi fb_sys_fops cfg80211 > snd_seq drm serio_raw snd_timer samsung_laptop snd_seq_device snd > soundcore wmi mac_hid video intel_rst lpc_ich lp parport binfmt_misc > hid_generic usbhid hid r8169 mii > [ 423.484241] CPU: 2 PID: 2392 Comm: firefox Not tainted > 4.6.0-rc7-183.1-iniza-small #1 > [ 423.484244] Hardware name: SAMSUNG ELECTRONICS CO., LTD. > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 > [ 423.484247] 0000000000000000 ffff88011fa83910 ffffffff81413825 > ffff88011fa83960 > [ 423.484253] 0000000000000000 ffff88011fa83950 ffffffff81083ea1 > 0000083282f34ec0 > [ 423.484259] 0000000000000005 ffff880082f34540 0000000000000000 > 0000000000000027 > [ 423.484265] Call Trace: > [ 423.484268] [] dump_stack > [ 423.484280] [] __warn > [ 423.484285] [] warn_slowpath_fmt > [ 423.484289] [] __lock_acquire > [ 423.484294] [] ? __lock_acquire > [ 423.484298] [] ? __lock_acquire > [ 423.484302] [] lock_acquire > [ 423.484307] [] ? __dev_queue_xmit > [ 423.484313] [] _raw_spin_lock > [ 423.484317] [] ? __dev_queue_xmit > [ 423.484321] [] __dev_queue_xmit > [ 423.484326] [] ? __dev_queue_xmit > [ 423.484330] [] dev_queue_xmit > [ 423.484334] [] neigh_direct_output > [ 423.484339] [] ip_finish_output2 > [ 423.484344] [] ? ip_finish_output2 > [ 423.484349] [] ip_finish_output > [ 423.484353] [] ip_output > [ 423.484357] [] ? __lock_is_held > [ 423.484362] [] ip_local_out > [ 423.484366] [] ip_queue_xmit > [ 423.484371] [] ? ip_queue_xmit > [ 423.484376] [] tcp_transmit_skb > [ 423.484380] [] __tcp_retransmit_skb > [ 423.484385] [] tcp_retransmit_skb > [ 423.484389] [] tcp_retransmit_timer > [ 423.484394] [] ? tcp_write_timer_handler > [ 423.484398] [] tcp_write_timer_handler > [ 423.484402] [] tcp_write_timer > [ 423.484407] [] call_timer_fn > [ 423.484411] [] ? call_timer_fn > [ 423.484416] [] ? tcp_write_timer_handler > [ 423.484419] [] run_timer_softirq > [ 423.484424] [] __do_softirq > [ 423.484428] [] irq_exit > [ 423.484432] [] smp_apic_timer_interrupt > [ 423.484437] [] apic_timer_interrupt > [ 423.484439] > [ 423.484443] ---[ end trace a29d8ee0ef420d5c ]--- > [ 423.484446] > [ 423.484447] ====================== > [ 423.484449] [chain_key collision ] > [ 423.484452] 4.6.0-rc7-183.1-iniza-small #1 Tainted: G W > [ 423.484454] ---------------------- > [ 423.484457] firefox/2392: Hash chain already cached but the > contents don't match! > [ 423.484460] Held locks:depth: 6 > [ 423.484463] class_idx:1993 -> chain_key:00000000000007c9 > (((&icsk->icsk_retransmit_timer))){ > [ 423.484473] class_idx:1334 -> chain_key:0000000000f92536 (slock-AF_INET){ > [ 423.484482] class_idx:33 -> chain_key:0000001f24a6c021 > (rcu_read_lock){......}, at: [] ip_queue_xmit > [ 423.484492] class_idx:1005 -> chain_key:0003e494d80423ed > (rcu_read_lock_bh){......}, at: [] ip_finish_output2 > [ 423.484500] class_idx:1005 -> chain_key:7c929b00847da3ed > (rcu_read_lock_bh){......}, at: [] __dev_queue_xmit > [ 423.484509] class_idx:1996 -> chain_key:5360108fb47da85e > (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){ > [ 423.484517] Locks in cached chain:depth: 6 > [ 423.484520] class_idx:1998 -> chain_key:00000000000007ce > (((&sk->sk_timer))){ > [ 423.484525] class_idx:1334 -> chain_key:0000000000f9c536 (slock-AF_INET){ > [ 423.484531] class_idx:33 -> chain_key:0000001f38a6c021 > (rcu_read_lock){......} > [ 423.484536] class_idx:1005 -> chain_key:0003e714d80423ed > (rcu_read_lock_bh){......} > [ 423.484541] class_idx:1005 -> chain_key:7ce29b00847da3ed > (rcu_read_lock_bh){......} > [ 423.484547] class_idx:1986 -> chain_key:5360108fb47da85e (_xmit_PPP#2){ > [ 423.484553] > [ 423.484553] stack backtrace: > [ 423.484558] CPU: 2 PID: 2392 Comm: firefox Tainted: G W > 4.6.0-rc7-183.1-iniza-small #1 > [ 423.484561] Hardware name: SAMSUNG ELECTRONICS CO., LTD. > 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 > [ 423.484563] 0000000000000000 ffff88011fa83968 ffffffff81413825 > 0000000000000006 > [ 423.484570] ffffffff82189c20 ffff88011fa839b0 ffffffff811a8190 > 000000041fa83980 > [ 423.484575] ffff880082f34e48 0000000000000005 ffff880082f34540 > 0000000000000000 > [ 423.484581] Call Trace: > [ 423.484584] [] dump_stack > [ 423.484592] [] print_collision > [ 423.484597] [] __lock_acquire > [ 423.484601] [] ? __lock_acquire > [ 423.484606] [] ? __lock_acquire > [ 423.484610] [] lock_acquire > [ 423.484615] [] ? __dev_queue_xmit > [ 423.484619] [] _raw_spin_lock > [ 423.484624] [] ? __dev_queue_xmit > [ 423.484628] [] __dev_queue_xmit > [ 423.484632] [] ? __dev_queue_xmit > [ 423.484636] [] dev_queue_xmit > [ 423.484640] [] neigh_direct_output > [ 423.484644] [] ip_finish_output2 > [ 423.484648] [] ? ip_finish_output2 > [ 423.484653] [] ip_finish_output > [ 423.484657] [] ip_output > [ 423.484661] [] ? __lock_is_held > [ 423.484666] [] ip_local_out > [ 423.484670] [] ip_queue_xmit > [ 423.484675] [] ? ip_queue_xmit > [ 423.484679] [] tcp_transmit_skb > [ 423.484683] [] __tcp_retransmit_skb > [ 423.484688] [] tcp_retransmit_skb > [ 423.484692] [] tcp_retransmit_timer > [ 423.484697] [] ? tcp_write_timer_handler > [ 423.484701] [] tcp_write_timer_handler > [ 423.484705] [] tcp_write_timer > [ 423.484709] [] call_timer_fn > [ 423.484713] [] ? call_timer_fn > [ 423.484718] [] ? tcp_write_timer_handler > [ 423.484730] [] run_timer_softirq > [ 423.484734] [] __do_softirq > [ 423.484738] [] irq_exit > [ 423.484742] [] smp_apic_timer_interrupt > [ 423.484746] [] apic_timer_interrupt > > NOTE: I realized this call-trace 1st after v4.6-rc7. > > The only commit I see which might have an effect is... > > 10a81980fc47 tcp: refresh skb timestamp at retransmit time > > ...but I may be wrong. > > So help in... > "No more darkness, no more night." ---> " I see the light, I see the light." > > Attached is my linux-config, dmesg output and "customized" patch. > > Pleas give some clear instructions to get more light in debugging this. > > I'm really in a hurry, so sorry if this BR is unsatisfying and incomplete. > > Regards, > - Sedat - > > P.S.: For more details see attached patch on top of Linux v4.6-rc7. > > Sedat Dilek (5): > Merge branch 'wb-buf-throttle' of > git://git.kernel.org/.../axboe/linux-block into > for-4.8/wb-buf-throttle-20160510 > Merge branch 'for-4.6/vfs-fixes' into 4.6.0-rc7-183.1-iniza-small > Merge branch 'for-4.6/net-fixes' into 4.6.0-rc7-183.1-iniza-small > Merge branch 'for-4.7/pm-next-20160513' into 4.6.0-rc7-183.1-iniza-small > Merge branch 'for-4.8/wb-buf-throttle-20160510' into > 4.6.0-rc7-183.1-iniza-small > > P.S.S: Investigating net/ changes since Linux v4.6-rc7. > > $ git log --oneline v4.6-rc7.. net/ > e271c7b4420d gre: do not keep the GRE header around in collect medata mode > 16ec3d4fbb96 openvswitch: Fix cached ct with helper. > 4e8c86155010 net sched: ife action fix late binding > 5e1567aeb7fe net sched: skbedit action fix late binding > 0e5538ab2b59 net sched: simple action fix late binding > 87dfbdc6c747 net sched: mirred action fix late binding > a57f19d30b2d net sched: ipt action fix late binding > 5026c9b1bafc net sched: vlan action fix late binding > 10a81980fc47 tcp: refresh skb timestamp at retransmit time > adc0a8bfdc52 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf > 79e48650320e net: fix a kernel infoleak in x25 module > 229740c63169 udp_offload: Set encapsulation before inner completes. > 43b8448cd7b4 udp_tunnel: Remove redundant udp_tunnel_gro_complete(). > 1d2f7b2d956e net: ipv6: tcp reset, icmp need to consider L3 domain > 856ce5d083e1 bridge: fix igmp / mld query parsing > 31ca0458a61a net: bridge: fix old ioctl unlocked net device walk > dedc58e067d8 VSOCK: do not disconnect socket when peer has shutdown SEND only > eda3fc50daa9 netfilter: nfnetlink_acct: validate NFACCT_QUOTA parameter > 32b583a0cb9b Merge branch 'master' of > git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec > 5f8e44741f9f net: fix infoleak in rtnetlink > b8670c09f37b net: fix infoleak in llc > cec5913c1515 netfilter: IDLETIMER: fix race condition when destroy the target > 70d72b7e060e netfilter: conntrack: init all_locks to avoid debug warning > d6af1a31cc72 vti: Add pmtu handling to vti_xmit. > 215276c0147e xfrm: Reset encapsulation field of the skb before transformation > 6ad3122a08e3 flowcache: Avoid OOM condition under preasure This looks like a hash collision lockdep report, exposed by a recent commit I have no idea how to proceed at this point. commit 39e2e173fb1f900959d3a25c21c65fa88b06c6ee Author: Alfredo Alvarez Fernandez Date: Wed Mar 30 19:03:36 2016 +0200 locking/lockdep: Print chain_key collision information A sequence of pairs [class_idx -> corresponding chain_key iteration] is printed for both the current held_lock chain and the cached chain. That exposes the two different class_idx sequences that led to that particular hash value. This helps with debugging hash chain collision reports. Signed-off-by: Alfredo Alvarez Fernandez Acked-by: Peter Zijlstra (Intel) Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-fsdevel@vger.kernel.org Cc: sedat.dilek@gmail.com Cc: tytso@mit.edu Link: http://lkml.kernel.org/r/1459357416-19190-1-git-send-email-alfredoalvarezernandez@gmail.com Signed-off-by: Ingo Molnar