All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yegor Yefremov <yegorslists@googlemail.com>
To: Arnd Bergmann <arnd@arndb.de>
Cc: Ard Biesheuvel <ardb@kernel.org>,
	Tony Lindgren <tony@atomide.com>,
	Linux-OMAP <linux-omap@vger.kernel.org>,
	linux-clk <linux-clk@vger.kernel.org>,
	Stephen Boyd <sboyd@kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>
Subject: Re: am335x: 5.18.x: system stalling
Date: Fri, 3 Jun 2022 21:11:42 +0200	[thread overview]
Message-ID: <CAGm1_kuUVKAxcxENnvsq5AGzeXAeXP6yLmjt1MQSpfjdcvZjng@mail.gmail.com> (raw)
In-Reply-To: <CAK8P3a2113EnZw9BNjCYYmKuNqEk4CtZCC0ydBNNQXetvzTSEg@mail.gmail.com>

On Fri, Jun 3, 2022 at 11:32 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Fri, Jun 3, 2022 at 10:54 AM Yegor Yefremov
> <yegorslists@googlemail.com> wrote:
> > On Thu, Jun 2, 2022 at 2:27 PM Yegor Yefremov <yegorslists@googlemail.com> wrote:
> > > On Thu, Jun 2, 2022 at 12:37 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> > > > That confirms my statement that smp.c cannot be the culprit, and
> > > > appears to exonerate the pure asm pieces. I wonder if this is related
> > > > to insufficient asm constraints on the C helpers, or just the cost
> > > > model taking different decisions because the inline asm string is much
> > > > longer. In any case, this opens up a couple of avenues we could
> > > > explore to narrow this down further.
> > > >
> > > > As a quick check, can you try the below snippet applied onto the
> > > > broken current.h build?
> > > >
> > > > --- a/arch/arm/include/asm/current.h
> > > > +++ b/arch/arm/include/asm/current.h
> > > > @@ -53,7 +53,8 @@ static __always_inline __attribute_const__ struct
> > > > task_struct *get_current(void)
> > > >             "   b       . + (2b - 0b)                           \n\t"
> > > >             "   .popsection                                     \n\t"
> > > >  #endif
> > > > -           : "=r"(cur));
> > > > +           : "=r"(cur)
> > > > +           : "Q" (*(const unsigned long *)current_stack_pointer));
> > >
> > > Where is the current_stack_pointer defined?
> > >
> > > >  #elif __LINUX_ARM_ARCH__>= 7 || \
> > > >        !defined(CONFIG_ARM_HAS_GROUP_RELOCS) || \
> > > >        (defined(MODULE) && defined(CONFIG_ARM_MODULE_PLTS))
> > > >
> > > > Given that the problematic sequence appears to be in C code, could you
> > > > please confirm whether or not the stall is reproducible when all the
> > > > pieces that are used by the CAN stack (musb, slcan, ftdio-sio, etc)
> > > > are built into the kernel rather than built as modules? Also, which
> > > > GCC version are you using?
> > >
> > > For now, the CAN stack parts are built as modules. I'll try to compile them in.
> > >
> > > I'm using GCC 10.x
> >
> > I have tried your patch (see the attachment) and the system stalls.
>
> This is with only get_current() patched on top of the working
> f0191ea5c2e5 ("[PART 1] ARM: implement THREAD_INFO_IN_TASK for
> uniprocessor systems"), right?
>
> My best theory right now is that something in get_currnent() is wrong that
> causes it to return the wrong task pointer, which in turn leads to
> current->preempt_count to get out of sync. This may be related to the cppi41
> dmaengine tasklet and effectively disables further softirqs including the timer
> that triggers the RCU grace period.
>
> When we finally switch tasks to the cpufreq worker thread, softirqs
> can happen again because of the task switch, and at the next IRQ
> the timer detects the stall.
>
> > Will try GCC 11.x and also compiled-in drivers.
>
> Ok. Maybe make sure all drivers are built-in here. I see both the CAN
> layer and the cppi41 driver use softirqs, so to be on the safe side,
> try to get to a running kernel that has no modules loaded at all at
> the time you expect the stall.
>
>
> One thing that could possibly go wrong with get_current() would be that
> it fails to get patched for some reason, or it gets patched only after
> it was already called. Since you run on an ARMv7 CPU as opposed to
> an actual OMAP2410/ARM1136r0, it would then try to load the
> variable from the uninitialized TPIDRURO register. If that happens,
> the one-liner below should tell you exactly where, by triggering an
> Oops. You can apply the patch on top for testing, it should have no
> other effects if the patching part works correctly.
>
>         Anrd
>
> 8<---
>
> diff --git a/arch/arm/include/asm/current.h b/arch/arm/include/asm/current.h
> index 2f9d79214b25..2a15832793c4 100644
> --- a/arch/arm/include/asm/current.h
> +++ b/arch/arm/include/asm/current.h
> @@ -33,7 +33,7 @@ static inline __attribute_const__ struct task_struct
> *get_current(void)
>          */
>         cur = __builtin_thread_pointer();
>  #elif defined(CONFIG_CURRENT_POINTER_IN_TPIDRURO) || defined(CONFIG_SMP)
> -       asm("0: mrc p15, 0, %0, c13, c0, 3                      \n\t"
> +       asm("0: .long 0xe7f001f2                        \n\t" // BUG() trap
>  #ifdef CONFIG_CPU_V6
>             "1:                                                 \n\t"
>             "   .subsection 1                                   \n\t"

With compiled-in drivers the system doesn't stall. All other tests and
related outputs will come next week.

Have a nice weekend.

Yegor

WARNING: multiple messages have this Message-ID (diff)
From: Yegor Yefremov <yegorslists@googlemail.com>
To: Arnd Bergmann <arnd@arndb.de>
Cc: Ard Biesheuvel <ardb@kernel.org>,
	Tony Lindgren <tony@atomide.com>,
	 Linux-OMAP <linux-omap@vger.kernel.org>,
	linux-clk <linux-clk@vger.kernel.org>,
	 Stephen Boyd <sboyd@kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>
Subject: Re: am335x: 5.18.x: system stalling
Date: Fri, 3 Jun 2022 21:11:42 +0200	[thread overview]
Message-ID: <CAGm1_kuUVKAxcxENnvsq5AGzeXAeXP6yLmjt1MQSpfjdcvZjng@mail.gmail.com> (raw)
In-Reply-To: <CAK8P3a2113EnZw9BNjCYYmKuNqEk4CtZCC0ydBNNQXetvzTSEg@mail.gmail.com>

On Fri, Jun 3, 2022 at 11:32 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Fri, Jun 3, 2022 at 10:54 AM Yegor Yefremov
> <yegorslists@googlemail.com> wrote:
> > On Thu, Jun 2, 2022 at 2:27 PM Yegor Yefremov <yegorslists@googlemail.com> wrote:
> > > On Thu, Jun 2, 2022 at 12:37 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> > > > That confirms my statement that smp.c cannot be the culprit, and
> > > > appears to exonerate the pure asm pieces. I wonder if this is related
> > > > to insufficient asm constraints on the C helpers, or just the cost
> > > > model taking different decisions because the inline asm string is much
> > > > longer. In any case, this opens up a couple of avenues we could
> > > > explore to narrow this down further.
> > > >
> > > > As a quick check, can you try the below snippet applied onto the
> > > > broken current.h build?
> > > >
> > > > --- a/arch/arm/include/asm/current.h
> > > > +++ b/arch/arm/include/asm/current.h
> > > > @@ -53,7 +53,8 @@ static __always_inline __attribute_const__ struct
> > > > task_struct *get_current(void)
> > > >             "   b       . + (2b - 0b)                           \n\t"
> > > >             "   .popsection                                     \n\t"
> > > >  #endif
> > > > -           : "=r"(cur));
> > > > +           : "=r"(cur)
> > > > +           : "Q" (*(const unsigned long *)current_stack_pointer));
> > >
> > > Where is the current_stack_pointer defined?
> > >
> > > >  #elif __LINUX_ARM_ARCH__>= 7 || \
> > > >        !defined(CONFIG_ARM_HAS_GROUP_RELOCS) || \
> > > >        (defined(MODULE) && defined(CONFIG_ARM_MODULE_PLTS))
> > > >
> > > > Given that the problematic sequence appears to be in C code, could you
> > > > please confirm whether or not the stall is reproducible when all the
> > > > pieces that are used by the CAN stack (musb, slcan, ftdio-sio, etc)
> > > > are built into the kernel rather than built as modules? Also, which
> > > > GCC version are you using?
> > >
> > > For now, the CAN stack parts are built as modules. I'll try to compile them in.
> > >
> > > I'm using GCC 10.x
> >
> > I have tried your patch (see the attachment) and the system stalls.
>
> This is with only get_current() patched on top of the working
> f0191ea5c2e5 ("[PART 1] ARM: implement THREAD_INFO_IN_TASK for
> uniprocessor systems"), right?
>
> My best theory right now is that something in get_currnent() is wrong that
> causes it to return the wrong task pointer, which in turn leads to
> current->preempt_count to get out of sync. This may be related to the cppi41
> dmaengine tasklet and effectively disables further softirqs including the timer
> that triggers the RCU grace period.
>
> When we finally switch tasks to the cpufreq worker thread, softirqs
> can happen again because of the task switch, and at the next IRQ
> the timer detects the stall.
>
> > Will try GCC 11.x and also compiled-in drivers.
>
> Ok. Maybe make sure all drivers are built-in here. I see both the CAN
> layer and the cppi41 driver use softirqs, so to be on the safe side,
> try to get to a running kernel that has no modules loaded at all at
> the time you expect the stall.
>
>
> One thing that could possibly go wrong with get_current() would be that
> it fails to get patched for some reason, or it gets patched only after
> it was already called. Since you run on an ARMv7 CPU as opposed to
> an actual OMAP2410/ARM1136r0, it would then try to load the
> variable from the uninitialized TPIDRURO register. If that happens,
> the one-liner below should tell you exactly where, by triggering an
> Oops. You can apply the patch on top for testing, it should have no
> other effects if the patching part works correctly.
>
>         Anrd
>
> 8<---
>
> diff --git a/arch/arm/include/asm/current.h b/arch/arm/include/asm/current.h
> index 2f9d79214b25..2a15832793c4 100644
> --- a/arch/arm/include/asm/current.h
> +++ b/arch/arm/include/asm/current.h
> @@ -33,7 +33,7 @@ static inline __attribute_const__ struct task_struct
> *get_current(void)
>          */
>         cur = __builtin_thread_pointer();
>  #elif defined(CONFIG_CURRENT_POINTER_IN_TPIDRURO) || defined(CONFIG_SMP)
> -       asm("0: mrc p15, 0, %0, c13, c0, 3                      \n\t"
> +       asm("0: .long 0xe7f001f2                        \n\t" // BUG() trap
>  #ifdef CONFIG_CPU_V6
>             "1:                                                 \n\t"
>             "   .subsection 1                                   \n\t"

With compiled-in drivers the system doesn't stall. All other tests and
related outputs will come next week.

Have a nice weekend.

Yegor

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2022-06-03 19:12 UTC|newest]

Thread overview: 115+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-04 10:35 am335x: 5.18.x: system stalling Yegor Yefremov
2022-05-05  5:08 ` Tony Lindgren
2022-05-11 14:16   ` Yegor Yefremov
2022-05-12  5:41     ` Tony Lindgren
2022-05-12  5:41       ` Tony Lindgren
2022-05-12  8:14       ` Arnd Bergmann
2022-05-12  8:14         ` Arnd Bergmann
2022-05-12  8:42       ` Arnd Bergmann
2022-05-12  8:42         ` Arnd Bergmann
2022-05-12 10:20         ` Yegor Yefremov
2022-05-12 10:20           ` Yegor Yefremov
2022-05-19 16:52           ` Yegor Yefremov
2022-05-19 16:52             ` Yegor Yefremov
2022-05-21 19:41             ` Arnd Bergmann
2022-05-21 19:41               ` Arnd Bergmann
2022-05-24 13:38               ` Yegor Yefremov
2022-05-24 13:38                 ` Yegor Yefremov
2022-05-24 14:19                 ` Tony Lindgren
2022-05-24 14:19                   ` Tony Lindgren
2022-05-26  5:49                   ` Yegor Yefremov
2022-05-26  5:49                     ` Yegor Yefremov
2022-05-26  6:20                     ` Tony Lindgren
2022-05-26  6:20                       ` Tony Lindgren
2022-05-26  8:19                       ` Ard Biesheuvel
2022-05-26  8:19                         ` Ard Biesheuvel
2022-05-26 12:37                         ` Yegor Yefremov
2022-05-26 12:37                           ` Yegor Yefremov
2022-05-26 14:15                           ` Arnd Bergmann
2022-05-26 14:15                             ` Arnd Bergmann
2022-05-27  4:44                             ` Yegor Yefremov
2022-05-27  4:44                               ` Yegor Yefremov
2022-05-27  6:38                               ` Arnd Bergmann
2022-05-27  6:38                                 ` Arnd Bergmann
2022-05-27  6:50                                 ` Tony Lindgren
2022-05-27  6:50                                   ` Tony Lindgren
2022-05-27  6:57                                   ` Arnd Bergmann
2022-05-27  6:57                                     ` Arnd Bergmann
2022-05-27  8:17                                     ` Yegor Yefremov
2022-05-27  8:17                                       ` Yegor Yefremov
2022-05-27  8:38                                       ` Arnd Bergmann
2022-05-27  8:38                                         ` Arnd Bergmann
2022-05-27  9:50                                         ` Yegor Yefremov
2022-05-27  9:50                                           ` Yegor Yefremov
2022-05-27 12:53                                           ` Arnd Bergmann
2022-05-27 12:53                                             ` Arnd Bergmann
2022-05-27 13:12                                             ` Ard Biesheuvel
2022-05-27 13:12                                               ` Ard Biesheuvel
2022-05-27 14:12                                               ` Arnd Bergmann
2022-05-27 14:12                                                 ` Arnd Bergmann
2022-05-28  5:48                                                 ` Yegor Yefremov
2022-05-28  5:48                                                   ` Yegor Yefremov
2022-05-28  7:53                                                   ` Arnd Bergmann
2022-05-28  7:53                                                     ` Arnd Bergmann
2022-05-28  8:29                                                     ` Yegor Yefremov
2022-05-28  8:29                                                       ` Yegor Yefremov
2022-05-28  9:07                                                       ` Ard Biesheuvel
2022-05-28  9:07                                                         ` Ard Biesheuvel
2022-05-28 13:01                                                         ` Yegor Yefremov
2022-05-28 13:01                                                           ` Yegor Yefremov
2022-05-28 13:13                                                           ` Arnd Bergmann
2022-05-28 13:13                                                             ` Arnd Bergmann
2022-05-28 19:28                                                             ` Yegor Yefremov
2022-05-28 19:28                                                               ` Yegor Yefremov
2022-05-30 10:16                                                               ` Ard Biesheuvel
2022-05-30 10:16                                                                 ` Ard Biesheuvel
2022-05-30 12:09                                                                 ` Yegor Yefremov
2022-05-30 12:09                                                                   ` Yegor Yefremov
2022-05-30 13:54                                                               ` Arnd Bergmann
2022-05-30 13:54                                                                 ` Arnd Bergmann
2022-05-30 15:14                                                                 ` Ard Biesheuvel
2022-05-30 15:14                                                                   ` Ard Biesheuvel
2022-05-31  8:36                                                                   ` Yegor Yefremov
2022-05-31  8:36                                                                     ` Yegor Yefremov
2022-05-31 14:16                                                                     ` Yegor Yefremov
2022-05-31 14:16                                                                       ` Yegor Yefremov
2022-05-31 15:22                                                                       ` Arnd Bergmann
2022-05-31 15:22                                                                         ` Arnd Bergmann
2022-06-01  7:36                                                                         ` Yegor Yefremov
2022-06-01  7:36                                                                           ` Yegor Yefremov
2022-06-01  7:59                                                                           ` Arnd Bergmann
2022-06-01  7:59                                                                             ` Arnd Bergmann
2022-06-01  8:08                                                                             ` Ard Biesheuvel
2022-06-01  8:08                                                                               ` Ard Biesheuvel
2022-06-01  9:27                                                                               ` Ard Biesheuvel
2022-06-01  9:27                                                                                 ` Ard Biesheuvel
2022-06-01 10:03                                                                                 ` Yegor Yefremov
2022-06-01 10:03                                                                                   ` Yegor Yefremov
2022-06-01 10:06                                                                                   ` Ard Biesheuvel
2022-06-01 10:06                                                                                     ` Ard Biesheuvel
2022-06-01 10:46                                                                                     ` Yegor Yefremov
2022-06-01 10:46                                                                                       ` Yegor Yefremov
2022-06-01 10:49                                                                                       ` Ard Biesheuvel
2022-06-01 10:49                                                                                         ` Ard Biesheuvel
2022-06-02 10:17                                                                                         ` Yegor Yefremov
2022-06-02 10:17                                                                                           ` Yegor Yefremov
2022-06-02 10:37                                                                                           ` Ard Biesheuvel
2022-06-02 10:37                                                                                             ` Ard Biesheuvel
2022-06-02 12:27                                                                                             ` Yegor Yefremov
2022-06-02 12:27                                                                                               ` Yegor Yefremov
2022-06-03  8:54                                                                                               ` Yegor Yefremov
2022-06-03  8:54                                                                                                 ` Yegor Yefremov
2022-06-03  9:32                                                                                                 ` Arnd Bergmann
2022-06-03  9:32                                                                                                   ` Arnd Bergmann
2022-06-03 19:11                                                                                                   ` Yegor Yefremov [this message]
2022-06-03 19:11                                                                                                     ` Yegor Yefremov
2022-06-03 20:46                                                                                                     ` Arnd Bergmann
2022-06-03 20:46                                                                                                       ` Arnd Bergmann
2022-06-05 14:59                                                                                                       ` Ard Biesheuvel
2022-06-05 14:59                                                                                                         ` Ard Biesheuvel
2022-06-07  8:55                                                                                                         ` Yegor Yefremov
2022-06-07  8:55                                                                                                           ` Yegor Yefremov
2022-08-12  7:35                                                                                                           ` Arnd Bergmann
2022-08-12  7:35                                                                                                             ` Arnd Bergmann
2022-05-24 14:36                 ` Arnd Bergmann
2022-05-24 14:36                   ` Arnd Bergmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGm1_kuUVKAxcxENnvsq5AGzeXAeXP6yLmjt1MQSpfjdcvZjng@mail.gmail.com \
    --to=yegorslists@googlemail.com \
    --cc=ardb@kernel.org \
    --cc=arnd@arndb.de \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-clk@vger.kernel.org \
    --cc=linux-omap@vger.kernel.org \
    --cc=sboyd@kernel.org \
    --cc=tony@atomide.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.