Date: Wed, 26 Jan 2022 07:56:39 -0600
From: Corey Minyard
To: Juergen Gross
Cc: Ingo Molnar, "Paul E. McKenney", LKML
Subject: Re: Possible reproduction of CSD locking issue
Message-ID: <20220126135639.GS34919@minyard.net>
Reply-To: minyard@acm.org
References: <20220125182737.GO34919@minyard.net> <4609fe56-7d88-8176-a378-0f465670b37d@suse.com>
In-Reply-To: <4609fe56-7d88-8176-a378-0f465670b37d@suse.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jan 26, 2022 at 07:08:22AM +0100, Juergen Gross wrote:
> On 25.01.22 19:27, Corey Minyard wrote:
> > We have a customer that had been seeing CSD lock issues on a CentOS 7
> > kernel (unfortunately).  I couldn't find anything or any kernel changes
> > that might fix it, so I was considering it was the CSD locking issue you
> > have been chasing for a while.
>
> Is this on bare metal or in a virtualized environment?

This is bare metal.  I do think I know what happened.
Here's my analysis...

csd: Detected non-responsive CSD lock (#1) on CPU#3, waiting 5000000042 ns for CPU#55 flush_tlb_func+0x0/0xb0(0xffff8e0b3e2afbe8).
csd: CSD lock (#1) unresponsive.

csd: cnt(0000000): 0000->0000 queue
csd: cnt(0000001): ffff->0037 idle

The above means that these events weren't seen, I think.  We can ignore
them in any case.

csd: cnt(63d8dd8): 0003->0037 ipi
csd: cnt(63d8dd9): 0003->0037 ping
csd: cnt(63d8dda): 0003->ffff pinged

This is a little confusing.  The first two lines have to be from
__smp_call_single_queue_debug.  The last line has to be from
smp_call_function_many.  But you never see the pinged from
__smp_call_single_queue_debug.

csd: cnt(63d8dea): 0035->0037 pinged

The tail of CPU 53 sending an IPI to CPU 55 in
__smp_call_single_queue_debug.

csd: cnt(63d8deb): ffff->0037 gotipi
csd: cnt(63d8dec): ffff->0037 handle
csd: cnt(63d8ded): ffff->0037 dequeue (src CPU 0 == empty)
csd: cnt(63d8dee): ffff->0037 hdlend (src CPU 0 == early)

CPU 55 is handling the IPI(s) it was sent earlier.

csd: cnt(63d8e1f): 0003->0037 queue
csd: cnt(63d8e20): 0003->0037 ipi
csd: cnt(63d8e21): 0003->0037 ping

In __smp_call_single_queue_debug CPU 3 sends another message to CPU 55
and sends an IPI.  But there should be a pinged entry after this.

csd: cnt(63d8e22): 0003->0037 queue
csd: cnt(63d8e23): 0003->0037 noipi

In __smp_call_single_queue_debug CPU 3 sends another message to CPU 55,
but no IPI is required because one is already pending.

I deleted the timestamps and the rest of the log, but a second IPI is
sent and CPU 55 wakes up immediately.  That tells me that CPU 55 wasn't
stuck for some reason (interrupt storm, long time with interrupts
disabled).  Though maybe not, with your insight below?

Looking at the code, it appears this code runs with interrupts and
preemption enabled, so the threads can be switched out at any time here.
I can only see one possibility from the above sequence:

* CPU 3 queues the list item but is switched out before sending the IPI
  (no pinged in the sequence).
* Another thread on CPU 3 is switched in and finishes a send many.
* CPU 53 sends an IPI to CPU 55 and wakes it up.
* CPU 55 handles the IPI.
* Another thread on CPU 3 then sends another message to CPU 55, enqueues
  the link, but is switched out before actually sending the IPI (again,
  no pinged in the sequence).
* Yet another thread on CPU 3 sends a message, but the item is already
  queued and thus it does not send an IPI.
* Something on CPU 3 prevents the first two threads from sending the IPI.

So assuming it's not a hardware bug, it appears the problem is process
starvation on CPU 3.  Knowing our customer's architecture, this is
actually a possibility.

This could be pretty easily solved by turning off interrupts (or
preemption?) around the code that enqueues the link and sends the IPI.

> The bug I was hunting occurred when running under Xen, and in the end I
> found a scheduling problem in the Xen hypervisor.
>
> > So I backported the debug patches.  And of course, they stopped seeing
> > the issue, at least as much, and they had trouble with the extra CPU
> > time the debug code took.  But they just reproduced it.  Here are the
> > logs:
> >
> > Jan 23 23:39:43 worker0 kernel: [285737.522743] csd: Detected non-responsive CSD lock (#1) on CPU#3, waiting 5000000042 ns for CPU#55 flush_tlb_func+0x0/0xb0(0xffff8e0b3e2afbe8).
> > Jan 23 23:39:43 worker0 kernel: [285737.522744] csd: CSD lock (#1) unresponsive.
> > Jan 23 23:39:43 worker0 kernel: [285737.522747] csd: cnt(0000000): 0000->0000 queue
> > Jan 23 23:39:43 worker0 kernel: [285737.522748] csd: cnt(0000001): ffff->0037 idle
> > Jan 23 23:39:43 worker0 kernel: [285737.522749] csd: cnt(63d8dd8): 0003->0037 ipi
> > Jan 23 23:39:43 worker0 kernel: [285737.522750] csd: cnt(63d8dd9): 0003->0037 ping
> > Jan 23 23:39:43 worker0 kernel: [285737.522750] csd: cnt(63d8dda): 0003->ffff pinged
> > Jan 23 23:39:43 worker0 kernel: [285737.522751] csd: cnt(63d8dea): 0035->0037 pinged
> > Jan 23 23:39:43 worker0 kernel: [285737.522752] csd: cnt(63d8deb): ffff->0037 gotipi
> > Jan 23 23:39:43 worker0 kernel: [285737.522752] csd: cnt(63d8dec): ffff->0037 handle
> > Jan 23 23:39:43 worker0 kernel: [285737.522753] csd: cnt(63d8ded): ffff->0037 dequeue (src CPU 0 == empty)
> > Jan 23 23:39:43 worker0 kernel: [285737.522754] csd: cnt(63d8dee): ffff->0037 hdlend (src CPU 0 == early)
> > Jan 23 23:39:43 worker0 kernel: [285737.522754] csd: cnt(63d8e1f): 0003->0037 queue
> > Jan 23 23:39:43 worker0 kernel: [285737.522755] csd: cnt(63d8e20): 0003->0037 ipi
> > Jan 23 23:39:43 worker0 kernel: [285737.522756] csd: cnt(63d8e21): 0003->0037 ping
> > Jan 23 23:39:43 worker0 kernel: [285737.522756] csd: cnt(63d8e22): 0003->0037 queue
> > Jan 23 23:39:43 worker0 kernel: [285737.522757] csd: cnt(63d8e23): 0003->0037 noipi
> > Jan 23 23:39:43 worker0 kernel: [285737.522757] csd: cnt now: 63fe4cd
> > Jan 23 23:39:43 worker0 kernel: [285737.522758] Task dump for CPU 55:
> > Jan 23 23:39:43 worker0 kernel: [285737.522761] kubelet R running task 0 277695 1 0x00080000
> > Jan 23 23:39:43 worker0 kernel: [285737.522761] Call Trace:
> > Jan 23 23:39:43 worker0 kernel: [285737.522769] [] ? __schedule+0x46a/0x990
> > Jan 23 23:39:43 worker0 kernel: [285737.522774] [] ? context_tracking_user_enter+0x13/0x20
> > Jan 23 23:39:43 worker0 kernel: [285737.522776] [] ? schedule_user+0x45/0x50
> > Jan 23 23:39:43 worker0 kernel: [285737.522779] [] ? retint_careful+0x16/0x34
> > Jan 23 23:39:43 worker0 kernel: [285737.522780] csd: Re-sending CSD lock (#1) IPI from CPU#03 to CPU#55
> > Jan 23 23:39:43 worker0 kernel: [285737.522788] CPU: 3 PID: 54671 Comm: runc:[2:INIT] Kdump: loaded Tainted: G OE ------------ T 3.10.0-1062.12.1.rt56.1042.mvista.test.14.el7.x86_64 #1
> > Jan 23 23:39:43 worker0 kernel: [285737.522789] Hardware name: Dell Inc. PowerEdge R740/0YWR7D, BIOS 2.9.4 11/06/2020
> > Jan 23 23:39:43 worker0 kernel: [285737.522789] Call Trace:
> > Jan 23 23:39:43 worker0 kernel: [285737.522793] [] dump_stack+0x19/0x1b
> > Jan 23 23:39:43 worker0 kernel: [285737.522798] [] __csd_lock_wait+0x1a8/0x2a0
> > Jan 23 23:39:43 worker0 kernel: [285737.522800] [] ? leave_mm+0x120/0x120
> > Jan 23 23:39:43 worker0 kernel: [285737.522802] [] smp_call_function_single+0xc4/0x1b0
> > Jan 23 23:39:43 worker0 kernel: [285737.522804] [] ? leave_mm+0x120/0x120
> > Jan 23 23:39:43 worker0 kernel: [285737.522809] [] ? page_counter_uncharge+0x3b/0x70
> > Jan 23 23:39:43 worker0 kernel: [285737.522811] [] smp_call_function_many+0x344/0x380
> > Jan 23 23:39:43 worker0 kernel: [285737.522813] [] ? leave_mm+0x120/0x120
> > Jan 23 23:39:43 worker0 kernel: [285737.522816] [] native_flush_tlb_others+0xb8/0xc0
> > Jan 23 23:39:43 worker0 kernel: [285737.522818] [] flush_tlb_page+0x65/0xf0
> > Jan 23 23:39:43 worker0 kernel: [285737.522821] [] ptep_clear_flush+0x68/0xa0
> > Jan 23 23:39:43 worker0 kernel: [285737.522825] [] wp_page_copy.isra.83+0x3d6/0x650
> > Jan 23 23:39:43 worker0 kernel: [285737.522828] [] do_wp_page+0xb4/0x710
> > Jan 23 23:39:43 worker0 kernel: [285737.522832] [] handle_mm_fault+0x884/0x1340
> > Jan 23 23:39:43 worker0 kernel: [285737.522835] [] ? update_cfs_shares+0xa9/0xf0
> > Jan 23 23:39:43 worker0 kernel: [285737.522839] [] __do_page_fault+0x213/0x5a0
> > Jan 23 23:39:43 worker0 kernel: [285737.522841] [] do_page_fault+0x35/0x90
> > Jan 23 23:39:43 worker0 kernel: [285737.522842] [] page_fault+0x28/0x30
> > Jan 23 23:39:43 worker0 kernel: [285737.522845] csd: CSD lock (#1) got unstuck on CPU#03, CPU#55 released the lock.
> >
> > Hopefully this is the issue you are chasing and not something else.
> > I've been studying them to see what they mean, but I thought you might
> > be interested to get them asap.
>
> As said before: I've solved my problem.
>
> But I can share some more insight from my experience:
>
> Don't be so sure that resending the IPI woke up the cpu again.  It might
> have been the NMI for getting the target cpu's stacktrace which resulted
> in letting it run again.  I missed this possibility in the beginning and
> this was the reason I dismissed the Xen scheduling issue too early.

Hmm.  That's non-obvious, obviously.  I'm not sure how sending an NMI
would unstick something like this, but I'll take your word for it.  I
think my above analysis is correct based on the sequence, but this gives
me something else to consider.

Thanks for the input.

-corey

>
> Juergen