From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9F8C0C433EF for ; Thu, 2 Jun 2022 12:27:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234957AbiFBM1Q (ORCPT ); Thu, 2 Jun 2022 08:27:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34986 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234995AbiFBM1O (ORCPT ); Thu, 2 Jun 2022 08:27:14 -0400 Received: from mail-yw1-x1130.google.com (mail-yw1-x1130.google.com [IPv6:2607:f8b0:4864:20::1130]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D4F9C113B58; Thu, 2 Jun 2022 05:27:12 -0700 (PDT) Received: by mail-yw1-x1130.google.com with SMTP id 00721157ae682-30c1b401711so49721507b3.2; Thu, 02 Jun 2022 05:27:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=2/ytNBLtaet2Sc9b94adZZY4XFJBgyJMFg+LAlwXlqM=; b=bWn62KJM8+4dCHVGM9/0OyQMyG6NrDOyZl0Z78IdmQcvxm7IwvVeIk/UaYWC1Jj+33 pJQEKlFEL1IGcm5eJ70piiMl0iowRUCAIRaTyvwRQwnscWKZU7sMSL7Fl8nv/9diCAYJ htPp2bN/YGBTf3LTbaHTig9bINfsAk5sfRiVEOe04uJ8K0awQh5sBUdgX459oM0o4b+1 xXS7d02/LMCIGJgJqeNBtmgSa3ykr5Rb/fojoNcUGtI/enk6XVUMRIJWFWAX5BZLtkwN WVday1hBe11fF9D/c8RuKKOQDKhdFRWVy0X+BRgMtfctJ66aSFpLoRsu9uEtqF2EF5fa ky/w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=2/ytNBLtaet2Sc9b94adZZY4XFJBgyJMFg+LAlwXlqM=; b=0/LF0Z6dX9l53FBt8at0uTTjloUnr8AaBPVRfL0XGDcTDh0932cL1fZVvm8NbZAGQW kPGRljwRwuy0DISNO6qVH9Vkno7IvicBPTJPTgX0oCTKFW5DDO2SH2VFrwry/V2gEXKA S1y0DKiMQEjTpxXLx8KN1/XQESRmB3ootlgtQc40WyQ8UIIP1vhACWpQb3Ke9BT/ofgX 4BjeHMG5hxPjA/eHes2nYdDUF1lnVctuvDzTJc7D/xJLkEKvgJ0vfs84k/tufSeqId/N 2errmNSl8ELyHcuW99H2DtVP5ln1Joh1n15pJ0gvXkwJUJM8Q4RP2trGV6gwVpsKV8N4 lWJQ== X-Gm-Message-State: AOAM532nsRMW+C3GtTcWIy+9/jUXKQWiFJ+UAhueLQ6yxfnTpyrwQNn7 Y6Twf31yry5XNR8JXJU6MSdsMlZGtJODleZlcUtIW8T/cCc= X-Google-Smtp-Source: ABdhPJyDl12IoT3ipqY9bph7zY3VDQvVSvH/+M1j4DKSXgUx2V6G0oFpN+8hYgIk1UHAVEej//IKDU5F5PpzvPWxkbo= X-Received: by 2002:a81:15d8:0:b0:2f7:b686:53d9 with SMTP id 207-20020a8115d8000000b002f7b68653d9mr5382385ywv.428.1654172831990; Thu, 02 Jun 2022 05:27:11 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Yegor Yefremov Date: Thu, 2 Jun 2022 14:27:00 +0200 Message-ID: Subject: Re: am335x: 5.18.x: system stalling To: Ard Biesheuvel Cc: Arnd Bergmann , Tony Lindgren , Linux-OMAP , linux-clk , Stephen Boyd , Linux ARM Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-clk@vger.kernel.org On Thu, Jun 2, 2022 at 12:37 PM Ard Biesheuvel wrote: > > On Thu, 2 Jun 2022 at 12:17, Yegor Yefremov wrote: > > > > On Wed, Jun 1, 2022 at 12:50 PM Ard Biesheuvel wrote: > > > > > > On Wed, 1 Jun 2022 at 12:46, Yegor Yefremov wrote: > > > > > > > > On Wed, Jun 1, 2022 at 12:06 PM Ard Biesheuvel wrote: > > > > > > > > > > On Wed, 1 Jun 2022 at 12:04, Yegor Yefremov wrote: > > > > > > > > > > > > On Wed, Jun 1, 2022 at 11:28 AM Ard Biesheuvel wrote: > > > > > > > > > > > > > > On Wed, 1 Jun 2022 at 10:08, Ard Biesheuvel wrote: > > > > > > > > > > > > > > > > On Wed, 1 Jun 2022 at 09:59, Arnd Bergmann wrote: > > > > > > > > > > > > > > > > > > On Wed, Jun 1, 2022 at 9:36 AM Yegor Yefremov > > > > > > > > > wrote: > > > > > > > > > > On Tue, May 31, 2022 at 5:23 PM Arnd Bergmann wrote: > > > > > > > > > > > I've pushed a modified branch now, with that fix on the broken commit, > > > > > > > > > > > and another change to make CONFIG_IRQSTACKS user-selectable rather > > > > > > > > > > > than always enabled. That should tell us if the problem is in the SMP > > > > > > > > > > > patching or in the irqstacks. > > > > > > > > > > > > > > > > > > > > > > Can you test the top of this branch with CONFIG_IRQSTACKS disabled, > > > > > > > > > > > and (if that still stalls) retest the fixed commit f0191ea5c2e5 ("[PART 1] > > > > > > > > > > > ARM: implement THREAD_INFO_IN_TASK for uniprocessor systems")? > > > > > > > > > > > > > > > > > > > > 1. the top of this branch with CONFIG_IRQSTACKS disabled stalls > > > > > > > > > > 2. f0191ea5c2e5 with the same config - not > > > > > > > > > > > > > > > > > > Ok, perfect, that does narrow down the problem quite a bit: The final > > > > > > > > > patch has seven changes, all of which can be done individually because > > > > > > > > > in each case the simplified version in f0191ea5c2e5 is meant to run > > > > > > > > > the exact same instructions as the version after the change, when running > > > > > > > > > on a uniprocessor machine such as your am335x. > > > > > > > > > > > > > > > > > > You have already shown earlier that the get_current() and > > > > > > > > > __my_cpu_offset() functions are not to blame here, as reverting > > > > > > > > > only those does not change the behavior. > > > > > > > > > > > > > > > > > > This leaves the is_smp() check in set_current(), and the > > > > > > > > > four macros in . I don't see anything obviously > > > > > > > > > wrong with any of those five, but I would bet on the macros > > > > > > > > > here. Can you try bisecting into this commit, maybe reverting > > > > > > > > > the changes to set_current and get_current first, and then > > > > > > > > > narrowing it down to (hopefully) a single macro that causes the > > > > > > > > > problem? > > > > > > > > > > > > > > > > > > > > > > > > > set_current() is never called by the primary CPU, which is why the > > > > > > > > is_smp() check was removed from there in 57a420435edcb0b94 ("ARM: drop > > > > > > > > pointless SMP check on secondary startup path"). > > > > > > > > > > > > > > > > So that leaves only the four macros in asm/assembler.h, but I don't > > > > > > > > see anything obviously wrong with those either. > > > > > > > > > > > > > > I pushed a patch on top of Arnd's branch at the link below that gets > > > > > > > rid of the subsections, and uses normal branches (and code patching) > > > > > > > to switch between the thread ID register and the LDR to retrieve the > > > > > > > CPU offset and the current pointer. I have no explanation whether or > > > > > > > why it could make a difference, but I think it's worth a try. > > > > > > > > > > > > The link to your repo is missing. > > > > > > > > > > > > > > > > Oops, sorry :-) > > > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=am335x-stall-test > > > > > > > > I have tested your branch and it stalls: > > > > > > > > > > OK, thanks for verifying. > > > > My bisection results for f0191ea5c2e5aab29484ede0493ca385eec5472f as a base: > > > > percpu.h: sporadic stalls > > current.h: always stalls > > assembler.h: no stalls > > smp.c: no stalls > > > > So you mean that applying the changes to each of those files in > isolation to the baseline in f0191ea5c2e5aab29484ede0493ca385eec5472f > produces those results, right? Right. > That confirms my statement that smp.c cannot be the culprit, and > appears to exonerate the pure asm pieces. I wonder if this is related > to insufficient asm constraints on the C helpers, or just the cost > model taking different decisions because the inline asm string is much > longer. In any case, this opens up a couple of avenues we could > explore to narrow this down further. > > As a quick check, can you try the below snippet applied onto the > broken current.h build? > > --- a/arch/arm/include/asm/current.h > +++ b/arch/arm/include/asm/current.h > @@ -53,7 +53,8 @@ static __always_inline __attribute_const__ struct > task_struct *get_current(void) > " b . + (2b - 0b) \n\t" > " .popsection \n\t" > #endif > - : "=r"(cur)); > + : "=r"(cur) > + : "Q" (*(const unsigned long *)current_stack_pointer)); Where is the current_stack_pointer defined? > #elif __LINUX_ARM_ARCH__>= 7 || \ > !defined(CONFIG_ARM_HAS_GROUP_RELOCS) || \ > (defined(MODULE) && defined(CONFIG_ARM_MODULE_PLTS)) > > Given that the problematic sequence appears to be in C code, could you > please confirm whether or not the stall is reproducible when all the > pieces that are used by the CAN stack (musb, slcan, ftdio-sio, etc) > are built into the kernel rather than built as modules? Also, which > GCC version are you using? For now, the CAN stack parts are built as modules. I'll try to compile them in. I'm using GCC 10.x Yegor From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 0729EC433EF for ; Thu, 2 Jun 2022 12:28:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:Cc:To:Subject:Message-ID:Date:From: In-Reply-To:References:MIME-Version:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=KHFYMVfjWLWdEHs7UPTgS4RIb8Vua4cfS9yhyqkRQpg=; b=KSTl5w3R5bUGH6 aTLs+NAflRfV3jQFpNTJW7MvaGcte3KBKjZ+z6GUbgHdzXcc+BIMv07ovZ7pXQJRG8P+4TZG0Ti8v gZ16pUt55SLyR4p+2CxsNmtzXlAvivG6E5JUR1v7yy/4IKDmZ6LB0ZlxptNif4SoN4Au5/EQD/km0 PHT5G8xV3SKcm8Cd5Le+bkISUTW072f1CSHLlx/RN6IVszPPtFgnoNDH8JGTV8L79yZOaxLpj2raD kJpPDS2vRAjIaJKPVVF8XfXUEic8gW1dlsEFjITQW4cUMXG75hV9OrKMT2ErDLRCTEBtTmkozYVjq djhqwEg4c1prlp/OiW9g==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1nwjv5-003CwV-A1; Thu, 02 Jun 2022 12:27:19 +0000 Received: from mail-yw1-x112f.google.com ([2607:f8b0:4864:20::112f]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1nwjv1-003Cul-33 for linux-arm-kernel@lists.infradead.org; Thu, 02 Jun 2022 12:27:16 +0000 Received: by mail-yw1-x112f.google.com with SMTP id 00721157ae682-300628e76f3so49300597b3.12 for ; Thu, 02 Jun 2022 05:27:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=2/ytNBLtaet2Sc9b94adZZY4XFJBgyJMFg+LAlwXlqM=; b=bWn62KJM8+4dCHVGM9/0OyQMyG6NrDOyZl0Z78IdmQcvxm7IwvVeIk/UaYWC1Jj+33 pJQEKlFEL1IGcm5eJ70piiMl0iowRUCAIRaTyvwRQwnscWKZU7sMSL7Fl8nv/9diCAYJ htPp2bN/YGBTf3LTbaHTig9bINfsAk5sfRiVEOe04uJ8K0awQh5sBUdgX459oM0o4b+1 xXS7d02/LMCIGJgJqeNBtmgSa3ykr5Rb/fojoNcUGtI/enk6XVUMRIJWFWAX5BZLtkwN WVday1hBe11fF9D/c8RuKKOQDKhdFRWVy0X+BRgMtfctJ66aSFpLoRsu9uEtqF2EF5fa ky/w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=2/ytNBLtaet2Sc9b94adZZY4XFJBgyJMFg+LAlwXlqM=; b=K5ZjFlSvwkFz+gtMIv8mWtTzlX8/3ByGLBJuyN9Daa4pRBUT48PnsbnDb1EAeiXOH9 raQhMuWLZ+26eWp3+r5H4eWkQDypY44p0rTTpl3XNHrlp9c4tHIR+ky5qfn6Znx4kz6K miSGq7PKyJVVrftg2i5uo0s3+lQO1IpAN/WcVWZExvmY4f8YbILOu/dhIfs9xAmi+7fm eTYWEUY05GtnbqjYlNHE3R0xrUK/5loopxIo0+soIH7g+DMxPOptlQFQu8d8Xh/ciAse xsVE92wpPRmpsEjyzz5azRW1KLlwo17swtyKMzzU0g7VpWeX//20PYwUjzvyO5Ai6lSK ANHg== X-Gm-Message-State: AOAM532En6Z5xHoNB+TXdMzH52L3HWe9DvZlaOHWI7632Nw6T4tew3xl S+9yo4YX4CLmFaChgtvsRrCNecIV/WbTzeHXkU4= X-Google-Smtp-Source: ABdhPJyDl12IoT3ipqY9bph7zY3VDQvVSvH/+M1j4DKSXgUx2V6G0oFpN+8hYgIk1UHAVEej//IKDU5F5PpzvPWxkbo= X-Received: by 2002:a81:15d8:0:b0:2f7:b686:53d9 with SMTP id 207-20020a8115d8000000b002f7b68653d9mr5382385ywv.428.1654172831990; Thu, 02 Jun 2022 05:27:11 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Yegor Yefremov Date: Thu, 2 Jun 2022 14:27:00 +0200 Message-ID: Subject: Re: am335x: 5.18.x: system stalling To: Ard Biesheuvel Cc: Arnd Bergmann , Tony Lindgren , Linux-OMAP , linux-clk , Stephen Boyd , Linux ARM X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20220602_052715_239397_B8C7730D X-CRM114-Status: GOOD ( 48.41 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Thu, Jun 2, 2022 at 12:37 PM Ard Biesheuvel wrote: > > On Thu, 2 Jun 2022 at 12:17, Yegor Yefremov wrote: > > > > On Wed, Jun 1, 2022 at 12:50 PM Ard Biesheuvel wrote: > > > > > > On Wed, 1 Jun 2022 at 12:46, Yegor Yefremov wrote: > > > > > > > > On Wed, Jun 1, 2022 at 12:06 PM Ard Biesheuvel wrote: > > > > > > > > > > On Wed, 1 Jun 2022 at 12:04, Yegor Yefremov wrote: > > > > > > > > > > > > On Wed, Jun 1, 2022 at 11:28 AM Ard Biesheuvel wrote: > > > > > > > > > > > > > > On Wed, 1 Jun 2022 at 10:08, Ard Biesheuvel wrote: > > > > > > > > > > > > > > > > On Wed, 1 Jun 2022 at 09:59, Arnd Bergmann wrote: > > > > > > > > > > > > > > > > > > On Wed, Jun 1, 2022 at 9:36 AM Yegor Yefremov > > > > > > > > > wrote: > > > > > > > > > > On Tue, May 31, 2022 at 5:23 PM Arnd Bergmann wrote: > > > > > > > > > > > I've pushed a modified branch now, with that fix on the broken commit, > > > > > > > > > > > and another change to make CONFIG_IRQSTACKS user-selectable rather > > > > > > > > > > > than always enabled. That should tell us if the problem is in the SMP > > > > > > > > > > > patching or in the irqstacks. > > > > > > > > > > > > > > > > > > > > > > Can you test the top of this branch with CONFIG_IRQSTACKS disabled, > > > > > > > > > > > and (if that still stalls) retest the fixed commit f0191ea5c2e5 ("[PART 1] > > > > > > > > > > > ARM: implement THREAD_INFO_IN_TASK for uniprocessor systems")? > > > > > > > > > > > > > > > > > > > > 1. the top of this branch with CONFIG_IRQSTACKS disabled stalls > > > > > > > > > > 2. f0191ea5c2e5 with the same config - not > > > > > > > > > > > > > > > > > > Ok, perfect, that does narrow down the problem quite a bit: The final > > > > > > > > > patch has seven changes, all of which can be done individually because > > > > > > > > > in each case the simplified version in f0191ea5c2e5 is meant to run > > > > > > > > > the exact same instructions as the version after the change, when running > > > > > > > > > on a uniprocessor machine such as your am335x. > > > > > > > > > > > > > > > > > > You have already shown earlier that the get_current() and > > > > > > > > > __my_cpu_offset() functions are not to blame here, as reverting > > > > > > > > > only those does not change the behavior. > > > > > > > > > > > > > > > > > > This leaves the is_smp() check in set_current(), and the > > > > > > > > > four macros in . I don't see anything obviously > > > > > > > > > wrong with any of those five, but I would bet on the macros > > > > > > > > > here. Can you try bisecting into this commit, maybe reverting > > > > > > > > > the changes to set_current and get_current first, and then > > > > > > > > > narrowing it down to (hopefully) a single macro that causes the > > > > > > > > > problem? > > > > > > > > > > > > > > > > > > > > > > > > > set_current() is never called by the primary CPU, which is why the > > > > > > > > is_smp() check was removed from there in 57a420435edcb0b94 ("ARM: drop > > > > > > > > pointless SMP check on secondary startup path"). > > > > > > > > > > > > > > > > So that leaves only the four macros in asm/assembler.h, but I don't > > > > > > > > see anything obviously wrong with those either. > > > > > > > > > > > > > > I pushed a patch on top of Arnd's branch at the link below that gets > > > > > > > rid of the subsections, and uses normal branches (and code patching) > > > > > > > to switch between the thread ID register and the LDR to retrieve the > > > > > > > CPU offset and the current pointer. I have no explanation whether or > > > > > > > why it could make a difference, but I think it's worth a try. > > > > > > > > > > > > The link to your repo is missing. > > > > > > > > > > > > > > > > Oops, sorry :-) > > > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=am335x-stall-test > > > > > > > > I have tested your branch and it stalls: > > > > > > > > > > OK, thanks for verifying. > > > > My bisection results for f0191ea5c2e5aab29484ede0493ca385eec5472f as a base: > > > > percpu.h: sporadic stalls > > current.h: always stalls > > assembler.h: no stalls > > smp.c: no stalls > > > > So you mean that applying the changes to each of those files in > isolation to the baseline in f0191ea5c2e5aab29484ede0493ca385eec5472f > produces those results, right? Right. > That confirms my statement that smp.c cannot be the culprit, and > appears to exonerate the pure asm pieces. I wonder if this is related > to insufficient asm constraints on the C helpers, or just the cost > model taking different decisions because the inline asm string is much > longer. In any case, this opens up a couple of avenues we could > explore to narrow this down further. > > As a quick check, can you try the below snippet applied onto the > broken current.h build? > > --- a/arch/arm/include/asm/current.h > +++ b/arch/arm/include/asm/current.h > @@ -53,7 +53,8 @@ static __always_inline __attribute_const__ struct > task_struct *get_current(void) > " b . + (2b - 0b) \n\t" > " .popsection \n\t" > #endif > - : "=r"(cur)); > + : "=r"(cur) > + : "Q" (*(const unsigned long *)current_stack_pointer)); Where is the current_stack_pointer defined? > #elif __LINUX_ARM_ARCH__>= 7 || \ > !defined(CONFIG_ARM_HAS_GROUP_RELOCS) || \ > (defined(MODULE) && defined(CONFIG_ARM_MODULE_PLTS)) > > Given that the problematic sequence appears to be in C code, could you > please confirm whether or not the stall is reproducible when all the > pieces that are used by the CAN stack (musb, slcan, ftdio-sio, etc) > are built into the kernel rather than built as modules? Also, which > GCC version are you using? For now, the CAN stack parts are built as modules. I'll try to compile them in. I'm using GCC 10.x Yegor _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel