From: Arnd Bergmann
Date: Thu, 12 May 2022 10:14:15 +0200
Subject: Re: am335x: 5.18.x: system stalling
To: Tony Lindgren
Cc: Yegor Yefremov, Ard Biesheuvel, Arnd Bergmann, Linux-OMAP, linux-clk, Stephen Boyd, Linux ARM
X-Mailing-List: linux-clk@vger.kernel.org

On Thu, May 12, 2022 at 7:41 AM Tony Lindgren wrote:
> Adding Ard and Arnd for vmap stack.

Thanks!

> * Yegor Yefremov [220511 14:16]:
> > On Thu, May 5, 2022 at 7:08 AM Tony Lindgren wrote:
> > > * Yegor Yefremov [220504 10:35]:
>
> Maybe Ard and Arnd have some ideas what might be going wrong here.
> Basically anything trying to use a physical address on stack will
> fail in weird ways like we've seen for smc and wl1251.

For this, the first step should be to enable CONFIG_DMA_API_DEBUG.
If any device is getting the wrong DMA address for a stack variable,
this should print a helpful debug message to the console.
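
A minimal sketch of that first step, for anyone following along. It
assumes you have the plain-text .config of the kernel under test and a
saved copy of the console log. The helper names has_dma_api_debug and
dma_api_reports are made up for illustration; the only kernel facts
relied on are the CONFIG_DMA_API_DEBUG option itself and the "DMA-API:"
prefix that the DMA debug code puts on its messages.

```shell
# Return success if the given kernel .config enables DMA API debugging.
# $1: path to a plain-text kernel .config (hypothetical helper name).
has_dma_api_debug() {
    grep -q '^CONFIG_DMA_API_DEBUG=y$' "$1"
}

# Print only the DMA API debug reports from a saved console log.
# $1: path to a log file, e.g. the output of dmesg (hypothetical helper name).
dma_api_reports() {
    grep -F 'DMA-API:' "$1"
}
```

Typical use would be "has_dma_api_debug .config || echo 'rebuild with
CONFIG_DMA_API_DEBUG=y'", then after reproducing the stall,
"dmesg > stall.log; dma_api_reports stall.log". An empty result does not
prove the stack is innocent, only that no mapping was flagged.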

> > > > [ 88.408578] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> > > > [ 88.415777] (detected by 0, t=2602 jiffies, g=2529, q=17)
> > > > [ 88.422026] rcu: All QSes seen, last rcu_sched kthread activity
> > > > 2602 (-21160--23762), jiffies_till_next_fqs=1, root ->qsmask 0x0
> > > > [ 88.434445] rcu: rcu_sched kthread starved for 2602 jiffies! g2529
> > > > f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> > > > [ 88.445274] rcu: Unless rcu_sched kthread gets sufficient CPU
> > > > time, OOM is now expected behavior.
> > > > [ 88.454859] rcu: RCU grace-period kthread stack dump:

I looked for a smoking gun in the backtrace but didn't really find
anything, so I'm guessing the problem is something that happened
between the last timer tick and the time it actually ran the
rcu_gp_kthread, maybe some DMA timeout in a device driver running
with interrupts disabled.

> > > > [ 88.807588] omap3_noncore_dpll_program from clk_change_rate+0x23c/0x4f8
> > > > [ 88.815375] clk_change_rate from clk_core_set_rate_nolock+0x1b0/0x29c
> > > > [ 88.822936] clk_core_set_rate_nolock from clk_set_rate+0x30/0x64
> > > > [ 88.830056] clk_set_rate from _set_opp+0x254/0x51c
> > > > [ 88.835835] _set_opp from dev_pm_opp_set_rate+0xec/0x228
> > > > [ 88.842073] dev_pm_opp_set_rate from __cpufreq_driver_target+0x584/0x700
> > > > [ 88.849792] __cpufreq_driver_target from od_dbs_update+0xb4/0x168
> > > > [ 88.856953] od_dbs_update from dbs_work_handler+0x2c/0x60
> > > > [ 88.863441] dbs_work_handler from process_one_work+0x284/0x72c
> > > > [ 88.870411] process_one_work from worker_thread+0x28/0x4b0
> > > > [ 88.876973] worker_thread from kthread+0xe4/0x104
> > > > [ 88.882692] kthread from ret_from_fork+0x14/0x28

The only thing I see that is slightly unusual here is that the timer
tick happened exactly during the cpufreq transition. Is this always
the same backtrace when you run into the bug? What happens when you
disable the omap3 cpufreq driver or set it to run at a fixed frequency?
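
One way to try the fixed-frequency case without rebuilding, as a rough
sketch: switch every cpufreq policy to a governor that does not change
the rate at runtime, such as "performance". The od_dbs_update frame in
the backtrace suggests the ondemand governor is currently active. The
sketch assumes the usual cpufreq sysfs layout under
/sys/devices/system/cpu/cpufreq/policy*/ and that the chosen governor is
built into the kernel. pin_governor and CPUFREQ_ROOT are hypothetical
names, the latter only there so the function can be tried against a
scratch directory.

```shell
# Write the requested governor into every cpufreq policy directory.
# $1: governor name, e.g. "performance" (must be available in the kernel).
# CPUFREQ_ROOT: optional override of the sysfs path (assumption, for testing).
pin_governor() {
    gov="$1"
    root="${CPUFREQ_ROOT:-/sys/devices/system/cpu/cpufreq}"
    for policy in "$root"/policy*; do
        # Skip anything that is not a writable policy (or an unmatched glob).
        [ -w "$policy/scaling_governor" ] || continue
        echo "$gov" > "$policy/scaling_governor"
    done
}
```

Run as root: "pin_governor performance", then check with
"cat /sys/devices/system/cpu/cpufreq/policy0/scaling_governor". If the
stall no longer reproduces with the frequency pinned, that points at the
DPLL reprogramming path rather than at an unrelated driver.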

Arnd