Subject: Re: [RFC PATCH 2/2] lib: debugobjects: touch watchdog to avoid softlockup when !CONFIG_PREEMPT
To: tglx@linutronix.de, longman@redhat.com
Cc: linux-kernel@vger.kernel.org
References: <1510947833-116482-1-git-send-email-yang.s@alibaba-inc.com> <1510947833-116482-2-git-send-email-yang.s@alibaba-inc.com>
From: "Yang Shi"
Message-ID: <553877ed-843e-a59d-3a76-2e90ce192ab1@alibaba-inc.com>
Date: Tue, 28 Nov 2017 01:54:15 +0800
In-Reply-To: <1510947833-116482-2-git-send-email-yang.s@alibaba-inc.com>

Hi Waiman,

This is the second patch of the series.

Thanks,
Yang

On 11/17/17 11:43 AM, Yang Shi wrote:
> There are nested loops on the debug objects free path. They may sometimes
> take over a hundred thousand iterations, which occasionally causes a soft
> lockup with !CONFIG_PREEMPT, like below:
>
> NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s!
> [stress-ng-getde:110342]
> Modules linked in: binfmt_misc(E) tcp_diag(E)
> inet_diag(E) bonding(E) intel_rapl(E) iosf_mbi(E)
> x86_pkg_temp_thermal(E) coretemp(E) iTCO_wdt(E) iTCO_vendor_support(E)
> kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E)
> dcdbas(E) ghash_clmulni_intel(E) aesni_intel(E) lrw(E) gf128mul(E)
> glue_helper(E) ablk_helper(E) ipmi_devintf(E) sg(E) cryptd(E) pcspkr(E)
> mei_me(E) lpc_ich(E) ipmi_si(E) mfd_core(E) mei(E) shpchp(E) wmi(E)
> ipmi_msghandler(E) acpi_power_meter(E) nfsd(E) auth_rpcgss(E) nfs_acl(E)
> lockd(E) grace(E) sunrpc(E) ip_tables(E) ext4(E) jbd2(E) mbcache(E)
> sd_mod(E) mgag200(E) igb(E) drm_kms_helper(E) ixgbe(E) syscopyarea(E)
> mdio(E) sysfillrect(E) sysimgblt(E) ptp(E) fb_sys_fops(E) pps_core(E)
> ttm(E) drm(E) crc32c_intel(E) i2c_algo_bit(E) i2c_core(E)
> megaraid_sas(E) dca(E)
> irq event stamp: 4340444
> hardirqs last enabled at (4340443): [] _raw_spin_unlock_irqrestore+0x36/0x60
> hardirqs last disabled at (4340444): [] apic_timer_interrupt+0x91/0xa0
> softirqs last enabled at (4340398): [] __do_softirq+0x349/0x50e
> softirqs last disabled at (4340391): [] irq_exit+0xf5/0x110
> CPU: 15 PID: 110342 Comm: stress-ng-getde Tainted: G E 4.9.44-003.ali3000.alios7.x86_64.debug #1
> Hardware name: Dell Inc.
> PowerEdge R720xd/0X6FFV, BIOS 1.6.0 03/07/2013
> task: ffff884cbb0d0000 task.stack: ffff884cabc70000
> RIP: 0010:[] [] _raw_spin_unlock_irqrestore+0x3b/0x60
> RSP: 0018:ffff884cabc77b78 EFLAGS: 00000292
> RAX: ffff884cbb0d0000 RBX: 0000000000000292 RCX: 0000000000000000
> RDX: ffff884cbb0d0000 RSI: 0000000000000001 RDI: 0000000000000292
> RBP: ffff884cabc77b88 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff8357a0d8
> R13: ffff884cabc77bc8 R14: ffffffff8357a0d0 R15: 00000000000000fc
> FS: 00002aee845fd2c0(0000) GS:ffff8852bd400000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000002991808 CR3: 0000005123abf000 CR4: 00000000000406e0
> Stack:
> ffff884ff4fe0000 ffff884ff4fd8000 ffff884cabc77c00 ffffffff8141177e
> 0000000000000202 ffff884cbb0d0000 ffff884cabc77bc8 0000000000000006
> ffff884ff4fda000 ffffffff8357a0d8 0000000000000000 91f5d976f6020b6c
> Call Trace:
> [] debug_check_no_obj_freed+0x13e/0x220
> [] __free_pages_ok+0x1f1/0x5c0
> [] __free_pages+0x25/0x40
> [] __free_slab+0x19b/0x270
> [] discard_slab+0x39/0x50
> [] __slab_free+0x207/0x270
> [] ___cache_free+0xa6/0xb0
> [] qlist_free_all+0x47/0x80
> [] quarantine_reduce+0x159/0x190
> [] kasan_kmalloc+0xaf/0xc0
> [] kasan_slab_alloc+0x12/0x20
> [] kmem_cache_alloc+0xfa/0x360
> [] ? getname_flags+0x4f/0x1f0
> [] getname_flags+0x4f/0x1f0
> [] getname+0x12/0x20
> [] do_sys_open+0xf9/0x210
> [] SyS_open+0x1e/0x20
> [] entry_SYSCALL_64_fastpath+0x1f/0xc2
> Code: 7f 18 53 48 8b 55 08 48 89 f3 be 01 00 00 00 e8 3c
> cd 92 ff 4c 89 e7 e8 f4 0e 93 ff f6 c7 02 74 1b e8 3a ac 92 ff 48 89 df
> 57 9d <66> 66 90 66 90 65 ff 0d d1 ff 83 7e 5b 41 5c 5d c3 48 89 df 57
>
> The code path might be called in either atomic or non-atomic context,
> so touch the softlockup watchdog instead of calling cond_resched(), which
> might sleep.
> However, it is unnecessary to touch the watchdog on every loop, so touch
> it only once every 10000 loops (a best estimate).
>
> Signed-off-by: Yang Shi
> ---
>  lib/debugobjects.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
> index 166488d..6fe1e60 100644
> --- a/lib/debugobjects.c
> +++ b/lib/debugobjects.c
> @@ -19,6 +19,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #define ODEBUG_HASH_BITS 14
>  #define ODEBUG_HASH_SIZE (1 << ODEBUG_HASH_BITS)
> @@ -768,6 +769,9 @@ static void __debug_check_no_obj_freed(const void *address, unsigned long size)
>  			debug_objects_maxchain = cnt;
>
>  		max_loops += cnt;
> +
> +		if (max_loops > 10000 && ((max_loops % 10000) == 0))
> +			touch_softlockup_watchdog();
>  	}
>
>  	if (max_loops > debug_objects_maxloops)
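
For illustration, here is a minimal user-space sketch of the throttling check in the hunk above. It is not kernel code: touch_watchdog_stub() is a hypothetical stand-in for touch_softlockup_watchdog(), and the two counting helpers and their parameters are made up for this example. Because max_loops grows by cnt on each pass rather than by one, the running total may step over multiples of 10000, in which case the modulo test never fires. The second helper shows one possible alternative that counts iterations since the last touch; it is an assumption of this sketch, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for touch_softlockup_watchdog(); only counts calls. */
static unsigned long touches;

static void touch_watchdog_stub(void)
{
	touches++;
}

/*
 * Mirrors the check in the patch: touch only when the running total is
 * above the interval and lands exactly on a multiple of it.
 * chain_len[i] plays the role of cnt for the i-th pass of the outer loop.
 */
static unsigned long count_touches_modulo(const int *chain_len, size_t n,
					  unsigned long interval)
{
	unsigned long max_loops = 0;
	size_t i;

	assert(interval > 0);
	touches = 0;
	for (i = 0; i < n; i++) {
		max_loops += chain_len[i];
		if (max_loops > interval && (max_loops % interval) == 0)
			touch_watchdog_stub();
	}
	return touches;
}

/*
 * Assumed alternative (not from the patch): touch once at least
 * `interval` iterations have accumulated since the previous touch.
 */
static unsigned long count_touches_since_last(const int *chain_len, size_t n,
					      unsigned long interval)
{
	unsigned long since_touch = 0;
	size_t i;

	assert(interval > 0);
	touches = 0;
	for (i = 0; i < n; i++) {
		since_touch += chain_len[i];
		if (since_touch >= interval) {
			touch_watchdog_stub();
			since_touch = 0;
		}
	}
	return touches;
}
```

With 3000 chains of length 7 (21000 iterations in total) and an interval of 10000, the modulo variant never touches the watchdog, while the since-last-touch variant touches it twice.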