linux-renesas-soc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
       [not found] ` <20230819004356.1454718-2-Liam.Howlett@oracle.com>
@ 2023-08-29 16:42   ` Geert Uytterhoeven
  2023-08-31  5:39     ` Michael Ellerman
                       ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Geert Uytterhoeven @ 2023-08-29 16:42 UTC (permalink / raw)
  To: Liam R. Howlett
  Cc: Andrew Morton, maple-tree, linux-mm, linux-kernel, stable,
	linux-renesas-soc

 	Hi Liam,

On Fri, 18 Aug 2023, Liam R. Howlett wrote:
> The current implementation of append may cause duplicate data and/or
> incorrect ranges to be returned to a reader during an update.  Although
> this has not been reported or seen, disable the append write operation
> while the tree is in rcu mode out of an abundance of caution.
>
> During the analysis of the mas_next_slot() the following was
> artificially created by separating the writer and reader code:
>
> Writer:                                 reader:
> mas_wr_append
>    set end pivot
>    updates end metata
>    Detects write to last slot
>    last slot write is to start of slot
>    store current contents in slot
>    overwrite old end pivot
>                                        mas_next_slot():
>                                                read end metadata
>                                                read old end pivot
>                                                return with incorrect range
>    store new value
>
> Alternatively:
>
> Writer:                                 reader:
> mas_wr_append
>    set end pivot
>    updates end metata
>    Detects write to last slot
>    last lost write to end of slot
>    store value
>                                        mas_next_slot():
>                                                read end metadata
>                                                read old end pivot
>                                                read new end pivot
>                                                return with incorrect range
>    set old end pivot
>
> There may be other accesses that are not safe since we are now updating
> both metadata and pointers, so disabling append if there could be rcu
> readers is the safest action.
>
> Fixes: 54a611b60590 ("Maple Tree: add new data structure")
> Cc: stable@vger.kernel.org
> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

Thanks for your patch, which is now commit cfeb6ae8bcb96ccf
("maple_tree: disable mas_wr_append() when other readers are
possible") in v6.5, and is being backported to stable.

On Renesas RZ/A1 and RZ/A2 (single-core Cortex-A9), this causes the
following warning:

      clocksource: timer@e803b000: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 28958491609 ns
      sched_clock: 32 bits at 66MHz, resolution 15ns, wraps every 32537631224ns
      /soc/timer@e803b000: used for clocksource
      /soc/timer@e803c000: used for clock events
     +------------[ cut here ]------------
     +WARNING: CPU: 0 PID: 0 at init/main.c:992 start_kernel+0x2f0/0x480
     +Interrupts were enabled early
     +CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0-rza2mevb-10197-g99b80d6b92b5 #237
     +Hardware name: Generic R7S9210 (Flattened Device Tree)
     + unwind_backtrace from show_stack+0x10/0x14
     + show_stack from dump_stack_lvl+0x24/0x3c
     + dump_stack_lvl from __warn+0x74/0xb8
     + __warn from warn_slowpath_fmt+0x78/0xb0
     + warn_slowpath_fmt from start_kernel+0x2f0/0x480
     + start_kernel from 0x0
     +---[ end trace 0000000000000000 ]---
      Console: colour dummy device 80x30
      printk: console [tty0] enabled
      Calibrating delay loop (skipped) preset value.. 1056.00 BogoMIPS (lpj=5280000)

Reverting this commit fixes the issue.

RCU-related configs:

     $ grep RCU .config
     # RCU Subsystem
     CONFIG_TINY_RCU=y
     # CONFIG_RCU_EXPERT is not set
     CONFIG_TINY_SRCU=y
     # end of RCU Subsystem
     # RCU Debugging
     # CONFIG_RCU_SCALE_TEST is not set
     # CONFIG_RCU_TORTURE_TEST is not set
     # CONFIG_RCU_REF_SCALE_TEST is not set
     # CONFIG_RCU_TRACE is not set
     # CONFIG_RCU_EQS_DEBUG is not set
     # end of RCU Debugging

CONFIG_MAPLE_RCU_DISABLED is not defined (and should BTW be renamed,
as CONFIG_* is reserved for kernel configuration options).

I do not see this issue on any other platform
(arm/arm64/risc-v/mips/sh/m68k), several of them use the same
RCU configuration.

Do you have a clue?
Thanks!

> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -4107,6 +4107,10 @@ static inline unsigned char mas_wr_new_end(struct ma_wr_state *wr_mas)
>  * mas_wr_append: Attempt to append
>  * @wr_mas: the maple write state
>  *
> + * This is currently unsafe in rcu mode since the end of the node may be cached
> + * by readers while the node contents may be updated which could result in
> + * inaccurate information.
> + *
>  * Return: True if appended, false otherwise
>  */
> static inline bool mas_wr_append(struct ma_wr_state *wr_mas,
> @@ -4116,6 +4120,9 @@ static inline bool mas_wr_append(struct ma_wr_state *wr_mas,
> 	struct ma_state *mas = wr_mas->mas;
> 	unsigned char node_pivots = mt_pivots[wr_mas->type];
>
> +	if (mt_in_rcu(mas->tree))
> +		return false;
> +
> 	if (mas->offset != wr_mas->node_end)
> 		return false;
>
> -- 
> 2.39.2

Gr{oetje,eeting}s,

 						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
 							    -- Linus Torvalds

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-08-29 16:42   ` [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible Geert Uytterhoeven
@ 2023-08-31  5:39     ` Michael Ellerman
  2023-08-31  8:25       ` Geert Uytterhoeven
       [not found]     ` <20230906152325.dblzauybyoq5kd35@revolver>
  2023-09-11 12:27     ` Linux regression tracking #adding (Thorsten Leemhuis)
  2 siblings, 1 reply; 35+ messages in thread
From: Michael Ellerman @ 2023-08-31  5:39 UTC (permalink / raw)
  To: Geert Uytterhoeven, Liam R. Howlett
  Cc: Andrew Morton, maple-tree, linux-mm, linux-kernel, stable,
	linux-renesas-soc

Geert Uytterhoeven <geert@linux-m68k.org> writes:
>  	Hi Liam,
>
> On Fri, 18 Aug 2023, Liam R. Howlett wrote:
>> The current implementation of append may cause duplicate data and/or
>> incorrect ranges to be returned to a reader during an update.  Although
>> this has not been reported or seen, disable the append write operation
>> while the tree is in rcu mode out of an abundance of caution.
>>
>> During the analysis of the mas_next_slot() the following was
>> artificially created by separating the writer and reader code:
>>
>> Writer:                                 reader:
>> mas_wr_append
>>    set end pivot
>>    updates end metata
>>    Detects write to last slot
>>    last slot write is to start of slot
>>    store current contents in slot
>>    overwrite old end pivot
>>                                        mas_next_slot():
>>                                                read end metadata
>>                                                read old end pivot
>>                                                return with incorrect range
>>    store new value
>>
>> Alternatively:
>>
>> Writer:                                 reader:
>> mas_wr_append
>>    set end pivot
>>    updates end metata
>>    Detects write to last slot
>>    last lost write to end of slot
>>    store value
>>                                        mas_next_slot():
>>                                                read end metadata
>>                                                read old end pivot
>>                                                read new end pivot
>>                                                return with incorrect range
>>    set old end pivot
>>
>> There may be other accesses that are not safe since we are now updating
>> both metadata and pointers, so disabling append if there could be rcu
>> readers is the safest action.
>>
>> Fixes: 54a611b60590 ("Maple Tree: add new data structure")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
>
> Thanks for your patch, which is now commit cfeb6ae8bcb96ccf
> ("maple_tree: disable mas_wr_append() when other readers are
> possible") in v6.5, and is being backported to stable.
>
> On Renesas RZ/A1 and RZ/A2 (single-core Cortex-A9), this causes the
> following warning:
>
>       clocksource: timer@e803b000: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 28958491609 ns
>       sched_clock: 32 bits at 66MHz, resolution 15ns, wraps every 32537631224ns
>       /soc/timer@e803b000: used for clocksource
>       /soc/timer@e803c000: used for clock events
>      +------------[ cut here ]------------
>      +WARNING: CPU: 0 PID: 0 at init/main.c:992 start_kernel+0x2f0/0x480
>      +Interrupts were enabled early
...
>
> I do not see this issue on any other platform
> (arm/arm64/risc-v/mips/sh/m68k), several of them use the same
> RCU configuration.

There's something similar on pmac32 / mac99.

> Do you have a clue?

It seems something in the maple tree code is setting TIF_NEED_RESCHED,
and that causes a subsequent call to cond_resched() to call schedule()
and enable interrupts.

On pmac32 enabling CONFIG_DEBUG_ATOMIC_SLEEP fixes/hides the problem.
But I don't see why.

cheers

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-08-31  5:39     ` Michael Ellerman
@ 2023-08-31  8:25       ` Geert Uytterhoeven
  2023-08-31  8:45         ` Peng Zhang
  0 siblings, 1 reply; 35+ messages in thread
From: Geert Uytterhoeven @ 2023-08-31  8:25 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Liam R. Howlett, Andrew Morton, maple-tree, linux-mm,
	linux-kernel, stable, linux-renesas-soc

Hi Michael,

On Thu, Aug 31, 2023 at 7:39 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
> Geert Uytterhoeven <geert@linux-m68k.org> writes:
> > On Fri, 18 Aug 2023, Liam R. Howlett wrote:
> >> The current implementation of append may cause duplicate data and/or
> >> incorrect ranges to be returned to a reader during an update.  Although
> >> this has not been reported or seen, disable the append write operation
> >> while the tree is in rcu mode out of an abundance of caution.
> >>
> >> During the analysis of the mas_next_slot() the following was
> >> artificially created by separating the writer and reader code:
> >>
> >> Writer:                                 reader:
> >> mas_wr_append
> >>    set end pivot
> >>    updates end metata
> >>    Detects write to last slot
> >>    last slot write is to start of slot
> >>    store current contents in slot
> >>    overwrite old end pivot
> >>                                        mas_next_slot():
> >>                                                read end metadata
> >>                                                read old end pivot
> >>                                                return with incorrect range
> >>    store new value
> >>
> >> Alternatively:
> >>
> >> Writer:                                 reader:
> >> mas_wr_append
> >>    set end pivot
> >>    updates end metata
> >>    Detects write to last slot
> >>    last lost write to end of slot
> >>    store value
> >>                                        mas_next_slot():
> >>                                                read end metadata
> >>                                                read old end pivot
> >>                                                read new end pivot
> >>                                                return with incorrect range
> >>    set old end pivot
> >>
> >> There may be other accesses that are not safe since we are now updating
> >> both metadata and pointers, so disabling append if there could be rcu
> >> readers is the safest action.
> >>
> >> Fixes: 54a611b60590 ("Maple Tree: add new data structure")
> >> Cc: stable@vger.kernel.org
> >> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> >
> > Thanks for your patch, which is now commit cfeb6ae8bcb96ccf
> > ("maple_tree: disable mas_wr_append() when other readers are
> > possible") in v6.5, and is being backported to stable.
> >
> > On Renesas RZ/A1 and RZ/A2 (single-core Cortex-A9), this causes the
> > following warning:
> >
> >       clocksource: timer@e803b000: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 28958491609 ns
> >       sched_clock: 32 bits at 66MHz, resolution 15ns, wraps every 32537631224ns
> >       /soc/timer@e803b000: used for clocksource
> >       /soc/timer@e803c000: used for clock events
> >      +------------[ cut here ]------------
> >      +WARNING: CPU: 0 PID: 0 at init/main.c:992 start_kernel+0x2f0/0x480
> >      +Interrupts were enabled early
> ...
> >
> > I do not see this issue on any other platform
> > (arm/arm64/risc-v/mips/sh/m68k), several of them use the same
> > RCU configuration.
>
> There's something similar on pmac32 / mac99.
>
> > Do you have a clue?
>
> It seems something in the maple tree code is setting TIF_NEED_RESCHED,
> and that causes a subsequent call to cond_resched() to call schedule()
> and enable interrupts.
>
> On pmac32 enabling CONFIG_DEBUG_ATOMIC_SLEEP fixes/hides the problem.
> But I don't see why.

Enabling CONFIG_DEBUG_ATOMIC_SLEEP on RZ/A1 and RZ/A2 does
fix the problem.
But there must be more to it, as some of my test configs had it enabled,
and others hadn't, while only RZ/A showed the issue.
I tried disabling it on R-Car M2-W (arm32) and R-Car H3 (arm64), and
that did not cause the problem to happen...

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-08-31  8:25       ` Geert Uytterhoeven
@ 2023-08-31  8:45         ` Peng Zhang
  2023-08-31  9:43           ` Geert Uytterhoeven
  0 siblings, 1 reply; 35+ messages in thread
From: Peng Zhang @ 2023-08-31  8:45 UTC (permalink / raw)
  To: Geert Uytterhoeven, Michael Ellerman
  Cc: Liam R. Howlett, Andrew Morton, maple-tree, linux-mm,
	linux-kernel, stable, linux-renesas-soc



在 2023/8/31 16:25, Geert Uytterhoeven 写道:
> Hi Michael,
> 
> On Thu, Aug 31, 2023 at 7:39 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
>> Geert Uytterhoeven <geert@linux-m68k.org> writes:
>>> On Fri, 18 Aug 2023, Liam R. Howlett wrote:
>>>> The current implementation of append may cause duplicate data and/or
>>>> incorrect ranges to be returned to a reader during an update.  Although
>>>> this has not been reported or seen, disable the append write operation
>>>> while the tree is in rcu mode out of an abundance of caution.
>>>>
>>>> During the analysis of the mas_next_slot() the following was
>>>> artificially created by separating the writer and reader code:
>>>>
>>>> Writer:                                 reader:
>>>> mas_wr_append
>>>>     set end pivot
>>>>     updates end metata
>>>>     Detects write to last slot
>>>>     last slot write is to start of slot
>>>>     store current contents in slot
>>>>     overwrite old end pivot
>>>>                                         mas_next_slot():
>>>>                                                 read end metadata
>>>>                                                 read old end pivot
>>>>                                                 return with incorrect range
>>>>     store new value
>>>>
>>>> Alternatively:
>>>>
>>>> Writer:                                 reader:
>>>> mas_wr_append
>>>>     set end pivot
>>>>     updates end metata
>>>>     Detects write to last slot
>>>>     last lost write to end of slot
>>>>     store value
>>>>                                         mas_next_slot():
>>>>                                                 read end metadata
>>>>                                                 read old end pivot
>>>>                                                 read new end pivot
>>>>                                                 return with incorrect range
>>>>     set old end pivot
>>>>
>>>> There may be other accesses that are not safe since we are now updating
>>>> both metadata and pointers, so disabling append if there could be rcu
>>>> readers is the safest action.
>>>>
>>>> Fixes: 54a611b60590 ("Maple Tree: add new data structure")
>>>> Cc: stable@vger.kernel.org
>>>> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
>>>
>>> Thanks for your patch, which is now commit cfeb6ae8bcb96ccf
>>> ("maple_tree: disable mas_wr_append() when other readers are
>>> possible") in v6.5, and is being backported to stable.
>>>
>>> On Renesas RZ/A1 and RZ/A2 (single-core Cortex-A9), this causes the
>>> following warning:
>>>
>>>        clocksource: timer@e803b000: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 28958491609 ns
>>>        sched_clock: 32 bits at 66MHz, resolution 15ns, wraps every 32537631224ns
>>>        /soc/timer@e803b000: used for clocksource
>>>        /soc/timer@e803c000: used for clock events
>>>       +------------[ cut here ]------------
>>>       +WARNING: CPU: 0 PID: 0 at init/main.c:992 start_kernel+0x2f0/0x480
>>>       +Interrupts were enabled early
>> ...
>>>
>>> I do not see this issue on any other platform
>>> (arm/arm64/risc-v/mips/sh/m68k), several of them use the same
>>> RCU configuration.
>>
>> There's something similar on pmac32 / mac99.
>>
>>> Do you have a clue?
>>
>> It seems something in the maple tree code is setting TIF_NEED_RESCHED,
>> and that causes a subsequent call to cond_resched() to call schedule()
>> and enable interrupts.
>>
>> On pmac32 enabling CONFIG_DEBUG_ATOMIC_SLEEP fixes/hides the problem.
>> But I don't see why.
> 
> Enabling CONFIG_DEBUG_ATOMIC_SLEEP on RZ/A1 and RZ/A2 does
> fix the problem.
> But there must be more to it, as some of my test configs had it enabled,
> and others hadn't, while only RZ/A showed the issue.
> I tried disabling it on R-Car M2-W (arm32) and R-Car H3 (arm64), and
> that did not cause the problem to happen...

I guess this patch triggers a potential problem with the maple tree.
I don't have the environment to trigger the problem. Can you apply the
following patch to see if the problem still exists? This can help locate
the root cause. At least narrow down the scope of the code that has bug.

Thanks.

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index f723024e1426..1b4b6f6e3095 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -4351,9 +4351,6 @@ static inline void mas_wr_modify(struct 
ma_wr_state *wr_mas)
  	if (new_end == wr_mas->node_end && mas_wr_slot_store(wr_mas))
  		return;

-	if (mas_wr_node_store(wr_mas, new_end))
-		return;
-
  	if (mas_is_err(mas))
  		return;


> 
> Gr{oetje,eeting}s,
> 
>                          Geert
> 

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-08-31  8:45         ` Peng Zhang
@ 2023-08-31  9:43           ` Geert Uytterhoeven
  0 siblings, 0 replies; 35+ messages in thread
From: Geert Uytterhoeven @ 2023-08-31  9:43 UTC (permalink / raw)
  To: Peng Zhang
  Cc: Michael Ellerman, Liam R. Howlett, Andrew Morton, maple-tree,
	linux-mm, linux-kernel, stable, linux-renesas-soc

Hi Peng,

On Thu, Aug 31, 2023 at 10:45 AM Peng Zhang <zhangpeng.00@bytedance.com> wrote:
> 在 2023/8/31 16:25, Geert Uytterhoeven 写道:
> > On Thu, Aug 31, 2023 at 7:39 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
> >> Geert Uytterhoeven <geert@linux-m68k.org> writes:
> >>> On Fri, 18 Aug 2023, Liam R. Howlett wrote:
> >>>> The current implementation of append may cause duplicate data and/or
> >>>> incorrect ranges to be returned to a reader during an update.  Although
> >>>> this has not been reported or seen, disable the append write operation
> >>>> while the tree is in rcu mode out of an abundance of caution.
> >>>>
> >>>> During the analysis of the mas_next_slot() the following was
> >>>> artificially created by separating the writer and reader code:
> >>>>
> >>>> Writer:                                 reader:
> >>>> mas_wr_append
> >>>>     set end pivot
> >>>>     updates end metata
> >>>>     Detects write to last slot
> >>>>     last slot write is to start of slot
> >>>>     store current contents in slot
> >>>>     overwrite old end pivot
> >>>>                                         mas_next_slot():
> >>>>                                                 read end metadata
> >>>>                                                 read old end pivot
> >>>>                                                 return with incorrect range
> >>>>     store new value
> >>>>
> >>>> Alternatively:
> >>>>
> >>>> Writer:                                 reader:
> >>>> mas_wr_append
> >>>>     set end pivot
> >>>>     updates end metata
> >>>>     Detects write to last slot
> >>>>     last lost write to end of slot
> >>>>     store value
> >>>>                                         mas_next_slot():
> >>>>                                                 read end metadata
> >>>>                                                 read old end pivot
> >>>>                                                 read new end pivot
> >>>>                                                 return with incorrect range
> >>>>     set old end pivot
> >>>>
> >>>> There may be other accesses that are not safe since we are now updating
> >>>> both metadata and pointers, so disabling append if there could be rcu
> >>>> readers is the safest action.
> >>>>
> >>>> Fixes: 54a611b60590 ("Maple Tree: add new data structure")
> >>>> Cc: stable@vger.kernel.org
> >>>> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> >>>
> >>> Thanks for your patch, which is now commit cfeb6ae8bcb96ccf
> >>> ("maple_tree: disable mas_wr_append() when other readers are
> >>> possible") in v6.5, and is being backported to stable.
> >>>
> >>> On Renesas RZ/A1 and RZ/A2 (single-core Cortex-A9), this causes the
> >>> following warning:
> >>>
> >>>        clocksource: timer@e803b000: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 28958491609 ns
> >>>        sched_clock: 32 bits at 66MHz, resolution 15ns, wraps every 32537631224ns
> >>>        /soc/timer@e803b000: used for clocksource
> >>>        /soc/timer@e803c000: used for clock events
> >>>       +------------[ cut here ]------------
> >>>       +WARNING: CPU: 0 PID: 0 at init/main.c:992 start_kernel+0x2f0/0x480
> >>>       +Interrupts were enabled early
> >> ...
> >>>
> >>> I do not see this issue on any other platform
> >>> (arm/arm64/risc-v/mips/sh/m68k), several of them use the same
> >>> RCU configuration.
> >>
> >> There's something similar on pmac32 / mac99.
> >>
> >>> Do you have a clue?
> >>
> >> It seems something in the maple tree code is setting TIF_NEED_RESCHED,
> >> and that causes a subsequent call to cond_resched() to call schedule()
> >> and enable interrupts.
> >>
> >> On pmac32 enabling CONFIG_DEBUG_ATOMIC_SLEEP fixes/hides the problem.
> >> But I don't see why.
> >
> > Enabling CONFIG_DEBUG_ATOMIC_SLEEP on RZ/A1 and RZ/A2 does
> > fix the problem.
> > But there must be more to it, as some of my test configs had it enabled,
> > and others hadn't, while only RZ/A showed the issue.
> > I tried disabling it on R-Car M2-W (arm32) and R-Car H3 (arm64), and
> > that did not cause the problem to happen...
>
> I guess this patch triggers a potential problem with the maple tree.
> I don't have the environment to trigger the problem. Can you apply the
> following patch to see if the problem still exists? This can help locate
> the root cause. At least narrow down the scope of the code that has bug.

Thanks for your suggestion!

> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index f723024e1426..1b4b6f6e3095 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -4351,9 +4351,6 @@ static inline void mas_wr_modify(struct
> ma_wr_state *wr_mas)
>         if (new_end == wr_mas->node_end && mas_wr_slot_store(wr_mas))
>                 return;
>
> -       if (mas_wr_node_store(wr_mas, new_end))
> -               return;
> -
>         if (mas_is_err(mas))
>                 return;

Unfortunately removing these lines does not help.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
       [not found]       ` <ad298077-fca8-437e-b9e3-66e31424afb1@paulmck-laptop>
@ 2023-09-06 17:29         ` Liam R. Howlett
  2023-09-06 18:02           ` Paul E. McKenney
  0 siblings, 1 reply; 35+ messages in thread
From: Liam R. Howlett @ 2023-09-06 17:29 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Geert Uytterhoeven, Andrew Morton, maple-tree, linux-mm,
	linux-kernel, stable, linux-renesas-soc, Shanker Donthineni

* Paul E. McKenney <paulmck@kernel.org> [230906 13:24]:
> On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote:
> > (Adding Paul & Shanker to Cc list.. please see below for why)
> > 
> > Apologies on the late response, I was away and have been struggling to
> > get a working PPC32 test environment.
> > 
> > * Geert Uytterhoeven <geert@linux-m68k.org> [230829 12:42]:
> > > 	Hi Liam,
> > > 
> > > On Fri, 18 Aug 2023, Liam R. Howlett wrote:
> > > > The current implementation of append may cause duplicate data and/or
> > > > incorrect ranges to be returned to a reader during an update.  Although
> > > > this has not been reported or seen, disable the append write operation
> > > > while the tree is in rcu mode out of an abundance of caution.
> > 
> > ...
> > > > 
> > > > Fixes: 54a611b60590 ("Maple Tree: add new data structure")
> > > > Cc: stable@vger.kernel.org
> > > > Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> > > 
> > > Thanks for your patch, which is now commit cfeb6ae8bcb96ccf
> > > ("maple_tree: disable mas_wr_append() when other readers are
> > > possible") in v6.5, and is being backported to stable.
> > > 
> > > On Renesas RZ/A1 and RZ/A2 (single-core Cortex-A9), this causes the
> > > following warning:
> > > 
> > >      clocksource: timer@e803b000: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 28958491609 ns
> > >      sched_clock: 32 bits at 66MHz, resolution 15ns, wraps every 32537631224ns
> > >      /soc/timer@e803b000: used for clocksource
> > >      /soc/timer@e803c000: used for clock events
> > >     +------------[ cut here ]------------
> > >     +WARNING: CPU: 0 PID: 0 at init/main.c:992 start_kernel+0x2f0/0x480
> > >     +Interrupts were enabled early
> > 
> > Note that the maple tree is involved in tracking the interrupts, see
> > kernel/irq/irqdesc.c irq_insert_desc(), etc.
> > 
> > >     +CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0-rza2mevb-10197-g99b80d6b92b5 #237
> > 
> > I cannot find commit id 99b80d6b92b5.
> > 
> > >     +Hardware name: Generic R7S9210 (Flattened Device Tree)
> > >     + unwind_backtrace from show_stack+0x10/0x14
> > >     + show_stack from dump_stack_lvl+0x24/0x3c
> > >     + dump_stack_lvl from __warn+0x74/0xb8
> > >     + __warn from warn_slowpath_fmt+0x78/0xb0
> > >     + warn_slowpath_fmt from start_kernel+0x2f0/0x480
> > >     + start_kernel from 0x0
> > >     +---[ end trace 0000000000000000 ]---
> > >      Console: colour dummy device 80x30
> > >      printk: console [tty0] enabled
> > >      Calibrating delay loop (skipped) preset value.. 1056.00 BogoMIPS (lpj=5280000)
> > > 
> > > Reverting this commit fixes the issue.
> > 
> > I have set up testing with qemu for powerpc 32b, and reverting this
> > patch does not fix it for me.  Did you revert the patch or bisect to the
> > issue?
> > 
> > It also happens on 0e0e9bd5f7b9 (I ran git checkout cfeb6ae8bcb96ccf^ to
> > get the commit immediately before cfeb6ae8bcb96ccf).  My qemu command is
> > as follows:
> > 
> > qemu-system-ppc -L pc-bios -boot c -prom-env "boot-device=hd:,\yaboot"
> > -prom-env "boot-args=conf=hd:,\yaboot.conf" -M mac99,via=pmu -m 2048
> > -hda powerpc32.img -nographic -kernel <file>
> > 
> > 
> > > 
> > > RCU-related configs:
> > > 
> > >     $ grep RCU .config
> > >     # RCU Subsystem
> > >     CONFIG_TINY_RCU=y
> > >     # CONFIG_RCU_EXPERT is not set
> > >     CONFIG_TINY_SRCU=y
> > >     # end of RCU Subsystem
> > >     # RCU Debugging
> > >     # CONFIG_RCU_SCALE_TEST is not set
> > >     # CONFIG_RCU_TORTURE_TEST is not set
> > >     # CONFIG_RCU_REF_SCALE_TEST is not set
> > >     # CONFIG_RCU_TRACE is not set
> > >     # CONFIG_RCU_EQS_DEBUG is not set
> > >     # end of RCU Debugging
> > 
> > I used the configuration from debian 8 and ran 'make oldconfig' to build
> > my kernel.  I have attached the configuration.
> > 
> > > 
> > > CONFIG_MAPLE_RCU_DISABLED is not defined (and should BTW be renamed,
> > > as CONFIG_* is reserved for kernel configuration options).
> > 
> > Thanks, I'll add it to my list of work.
> > 
> > > 
> > > I do not see this issue on any other platform
> > > (arm/arm64/risc-v/mips/sh/m68k), several of them use the same
> > > RCU configuration.
> > > 
> > > Do you have a clue?
> > 
> > It appears to be something to do with struct maple_tree sparse_irqs.  If
> > you drop the rcu flag from that maple tree, then my configuration boots
> > without the warning.
> > 
> > I *think* this is because we will reuse a lot more nodes.  And I *think*
> > the rcu flag is not needed, since there is a comment about reading the
> > tree being protected by the mutex sparse_irq_lock within the
> > kernel/irq/irqdesc.c file.  Shanker, can you comment on that?
> > 
> > I wonder if there is a limit to the number of RCU free events before
> > something is triggered to flush them out which could trigger IRQ
> > enabling? Paul, could this be the case?
> 
> Are you asking if call_rcu() will re-enable interrupts in the following
> use case?
> 
> 	local_irq_disable();
> 	call_rcu(&p->rh, my_cb_func);
> 	local_irq_enable();
> 
> If so, the answer is "no" (and yes, there might be a bug, but then again
> I just double-checked).  However, if interrupts are enabled across a
> call_rcu() invocation, it will do some additional work to encourage
> the grace period.  Even if interrupts are disabled across a call_rcu()
> invocation, it will still do either a raise_softirq() or a wakeup,
> depending on configuration, to encourage the grace period.  And in
> the case of call_rcu() from an interrupt handler, that raise_softirq()
> could result in the RCU_SOFTIRQ handler running upon return from that
> interrupt, and if there are a great many callbacks ready to invoke,
> this RCU_SOFTIRQ handler might take quite some time.  Plus that
> handler could itself be interrupted.
> 
> If long-running RCU_SOFTIRQ handlers are a problem, you can boot with
> rcutree.use_softirq=0, which puts that handler into a kthread, which
> in turn give the scheduler more control.  However, this will also turn
> a lightweight raise_softirq() into a rather heavier weight wakeup.
> Choose wisely!  ;-)
> 
> Or am I missing your point?
> 


This is very early in the boot sequence when interrupts have not been
enabled.  What we are seeing is a WARN_ON() that is triggered by
interrupts being enabled before they should be enabled.

I was wondering if, for example, I called call_rcu() a lot *before*
interrupts were enabled, that something could trigger that would either
enable interrupts or indicate the task needs rescheduling?

Specifically the rescheduling part is suspect.  I tracked down the call
to a mutex_lock() which calls cond_resched(), so could rcu be
'encouraging' the rcu window by a reschedule request?

Thanks,
Liam

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-06 17:29         ` Liam R. Howlett
@ 2023-09-06 18:02           ` Paul E. McKenney
  2023-09-11 23:54             ` Liam R. Howlett
  0 siblings, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2023-09-06 18:02 UTC (permalink / raw)
  To: Liam R. Howlett, Geert Uytterhoeven, Andrew Morton, maple-tree,
	linux-mm, linux-kernel, stable, linux-renesas-soc,
	Shanker Donthineni

On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote:
> * Paul E. McKenney <paulmck@kernel.org> [230906 13:24]:
> > On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote:
> > > (Adding Paul & Shanker to Cc list.. please see below for why)
> > > 
> > > Apologies on the late response, I was away and have been struggling to
> > > get a working PPC32 test environment.
> > > 
> > > * Geert Uytterhoeven <geert@linux-m68k.org> [230829 12:42]:
> > > > 	Hi Liam,
> > > > 
> > > > On Fri, 18 Aug 2023, Liam R. Howlett wrote:
> > > > > The current implementation of append may cause duplicate data and/or
> > > > > incorrect ranges to be returned to a reader during an update.  Although
> > > > > this has not been reported or seen, disable the append write operation
> > > > > while the tree is in rcu mode out of an abundance of caution.
> > > 
> > > ...
> > > > > 
> > > > > Fixes: 54a611b60590 ("Maple Tree: add new data structure")
> > > > > Cc: stable@vger.kernel.org
> > > > > Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> > > > 
> > > > Thanks for your patch, which is now commit cfeb6ae8bcb96ccf
> > > > ("maple_tree: disable mas_wr_append() when other readers are
> > > > possible") in v6.5, and is being backported to stable.
> > > > 
> > > > On Renesas RZ/A1 and RZ/A2 (single-core Cortex-A9), this causes the
> > > > following warning:
> > > > 
> > > >      clocksource: timer@e803b000: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 28958491609 ns
> > > >      sched_clock: 32 bits at 66MHz, resolution 15ns, wraps every 32537631224ns
> > > >      /soc/timer@e803b000: used for clocksource
> > > >      /soc/timer@e803c000: used for clock events
> > > >     +------------[ cut here ]------------
> > > >     +WARNING: CPU: 0 PID: 0 at init/main.c:992 start_kernel+0x2f0/0x480
> > > >     +Interrupts were enabled early
> > > 
> > > Note that the maple tree is involved in tracking the interrupts, see
> > > kernel/irq/irqdesc.c irq_insert_desc(), etc.
> > > 
> > > >     +CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0-rza2mevb-10197-g99b80d6b92b5 #237
> > > 
> > > I cannot find commit id 99b80d6b92b5.
> > > 
> > > >     +Hardware name: Generic R7S9210 (Flattened Device Tree)
> > > >     + unwind_backtrace from show_stack+0x10/0x14
> > > >     + show_stack from dump_stack_lvl+0x24/0x3c
> > > >     + dump_stack_lvl from __warn+0x74/0xb8
> > > >     + __warn from warn_slowpath_fmt+0x78/0xb0
> > > >     + warn_slowpath_fmt from start_kernel+0x2f0/0x480
> > > >     + start_kernel from 0x0
> > > >     +---[ end trace 0000000000000000 ]---
> > > >      Console: colour dummy device 80x30
> > > >      printk: console [tty0] enabled
> > > >      Calibrating delay loop (skipped) preset value.. 1056.00 BogoMIPS (lpj=5280000)
> > > > 
> > > > Reverting this commit fixes the issue.
> > > 
> > > I have set up testing with qemu for powerpc 32b, and reverting this
> > > patch does not fix it for me.  Did you revert the patch or bisect to the
> > > issue?
> > > 
> > > It also happens on 0e0e9bd5f7b9 (I ran git checkout cfeb6ae8bcb96ccf^ to
> > > get the commit immediately before cfeb6ae8bcb96ccf).  My qemu command is
> > > as follows:
> > > 
> > > qemu-system-ppc -L pc-bios -boot c -prom-env "boot-device=hd:,\yaboot"
> > > -prom-env "boot-args=conf=hd:,\yaboot.conf" -M mac99,via=pmu -m 2048
> > > -hda powerpc32.img -nographic -kernel <file>
> > > 
> > > 
> > > > 
> > > > RCU-related configs:
> > > > 
> > > >     $ grep RCU .config
> > > >     # RCU Subsystem
> > > >     CONFIG_TINY_RCU=y
> > > >     # CONFIG_RCU_EXPERT is not set
> > > >     CONFIG_TINY_SRCU=y
> > > >     # end of RCU Subsystem
> > > >     # RCU Debugging
> > > >     # CONFIG_RCU_SCALE_TEST is not set
> > > >     # CONFIG_RCU_TORTURE_TEST is not set
> > > >     # CONFIG_RCU_REF_SCALE_TEST is not set
> > > >     # CONFIG_RCU_TRACE is not set
> > > >     # CONFIG_RCU_EQS_DEBUG is not set
> > > >     # end of RCU Debugging
> > > 
> > > I used the configuration from debian 8 and ran 'make oldconfig' to build
> > > my kernel.  I have attached the configuration.
> > > 
> > > > 
> > > > CONFIG_MAPLE_RCU_DISABLED is not defined (and should BTW be renamed,
> > > > as CONFIG_* is reserved for kernel configuration options).
> > > 
> > > Thanks, I'll add it to my list of work.
> > > 
> > > > 
> > > > I do not see this issue on any other platform
> > > > (arm/arm64/risc-v/mips/sh/m68k), several of them use the same
> > > > RCU configuration.
> > > > 
> > > > Do you have a clue?
> > > 
> > > It appears to be something to do with struct maple_tree sparse_irqs.  If
> > > you drop the rcu flag from that maple tree, then my configuration boots
> > > without the warning.
> > > 
> > > I *think* this is because we will reuse a lot more nodes.  And I *think*
> > > the rcu flag is not needed, since there is a comment about reading the
> > > tree being protected by the mutex sparse_irq_lock within the
> > > kernel/irq/irqdesc.c file.  Shanker, can you comment on that?
> > > 
> > > I wonder if there is a limit to the number of RCU free events before
> > > something is triggered to flush them out which could trigger IRQ
> > > enabling? Paul, could this be the case?
> > 
> > Are you asking if call_rcu() will re-enable interrupts in the following
> > use case?
> > 
> > 	local_irq_disable();
> > 	call_rcu(&p->rh, my_cb_func);
> > 	local_irq_enable();
> > 
> > If so, the answer is "no" (and yes, there might be a bug, but then again
> > I just double-checked).  However, if interrupts are enabled across a
> > call_rcu() invocation, it will do some additional work to encourage
> > the grace period.  Even if interrupts are disabled across a call_rcu()
> > invocation, it will still do either a raise_softirq() or a wakeup,
> > depending on configuration, to encourage the grace period.  And in
> > the case of call_rcu() from an interrupt handler, that raise_softirq()
> > could result in the RCU_SOFTIRQ handler running upon return from that
> > interrupt, and if there are a great many callbacks ready to invoke,
> > this RCU_SOFTIRQ handler might take quite some time.  Plus that
> > handler could itself be interrupted.
> > 
> > If long-running RCU_SOFTIRQ handlers are a problem, you can boot with
> > rcutree.use_softirq=0, which puts that handler into a kthread, which
> > in turn give the scheduler more control.  However, this will also turn
> > a lightweight raise_softirq() into a rather heavier weight wakeup.
> > Choose wisely!  ;-)
> > 
> > Or am I missing your point?
> 
> This is very early in the boot sequence when interrupts have not been
> enabled.  What we are seeing is a WARN_ON() that is triggered by
> interrupts being enabled before they should be enabled.
> 
> I was wondering if, for example, I called call_rcu() a lot *before*
> interrupts were enabled, that something could trigger that would either
> enable interrupts or indicate the task needs rescheduling?

You aren't doing call_rcu() enough to hit OOM, are you?  The actual RCU
callback invocations won't happen until some time after the scheduler
starts up.

> Specifically the rescheduling part is suspect.  I tracked down the call
> to a mutex_lock() which calls cond_resched(), so could rcu be
> 'encouraging' the rcu window by a reschedule request?

During boot before interrupts are enabled, RCU has not yet spawned any of
its kthreads.  Therefore, all of its attempts to do wakeups would notice
a NULL task_struct pointer and refrain from actually doing the wakeup.
If it did do the wakeup, you would see a NULL-pointer exception.  See
for example, invoke_rcu_core_kthread(), though that won't happen unless
you booted with rcutree.use_softirq=0.

Besides, since when did doing a wakeup enable interrupts?  That would
make it hard to do wakeups from hardware interrupt handlers, not?

But why not put some WARN_ON_ONCE(!irqs_disabled()) calls in the areas
of greatest suspicion, starting from the stack trace generated by that
mutex_lock()?  A stray interrupt-enable could be pretty much anywhere.

But where are those call_rcu() invocations?  Before rcu_init()?
Presumably before init is spawned and the early_init() calls.

And what is the RCU-related Kconfig and boot-parameter setup?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
       [not found]     ` <20230906152325.dblzauybyoq5kd35@revolver>
       [not found]       ` <ad298077-fca8-437e-b9e3-66e31424afb1@paulmck-laptop>
@ 2023-09-06 19:06       ` Geert Uytterhoeven
  1 sibling, 0 replies; 35+ messages in thread
From: Geert Uytterhoeven @ 2023-09-06 19:06 UTC (permalink / raw)
  To: Liam R. Howlett
  Cc: Andrew Morton, maple-tree, linux-mm, linux-kernel, stable,
	linux-renesas-soc, Shanker Donthineni, Paul E. McKenney

Hi Liam,

On Wed, Sep 6, 2023 at 5:24 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> * Geert Uytterhoeven <geert@linux-m68k.org> [230829 12:42]:
> > On Fri, 18 Aug 2023, Liam R. Howlett wrote:
> > > The current implementation of append may cause duplicate data and/or
> > > incorrect ranges to be returned to a reader during an update.  Although
> > > this has not been reported or seen, disable the append write operation
> > > while the tree is in rcu mode out of an abundance of caution.
>
> ...
> > >
> > > Fixes: 54a611b60590 ("Maple Tree: add new data structure")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> >
> > Thanks for your patch, which is now commit cfeb6ae8bcb96ccf
> > ("maple_tree: disable mas_wr_append() when other readers are
> > possible") in v6.5, and is being backported to stable.
> >
> > On Renesas RZ/A1 and RZ/A2 (single-core Cortex-A9), this causes the
> > following warning:
> >
> >      clocksource: timer@e803b000: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 28958491609 ns
> >      sched_clock: 32 bits at 66MHz, resolution 15ns, wraps every 32537631224ns
> >      /soc/timer@e803b000: used for clocksource
> >      /soc/timer@e803c000: used for clock events
> >     +------------[ cut here ]------------
> >     +WARNING: CPU: 0 PID: 0 at init/main.c:992 start_kernel+0x2f0/0x480
> >     +Interrupts were enabled early
>
> Note that the maple tree is involved in tracking the interrupts, see
> kernel/irq/irqdesc.c irq_insert_desc(), etc.
>
> >     +CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0-rza2mevb-10197-g99b80d6b92b5 #237
>
> I cannot find commit id 99b80d6b92b5.

That's my local tree, based on renesas-drivers-2023-08-29-v6.5.

>
> >     +Hardware name: Generic R7S9210 (Flattened Device Tree)
> >     + unwind_backtrace from show_stack+0x10/0x14
> >     + show_stack from dump_stack_lvl+0x24/0x3c
> >     + dump_stack_lvl from __warn+0x74/0xb8
> >     + __warn from warn_slowpath_fmt+0x78/0xb0
> >     + warn_slowpath_fmt from start_kernel+0x2f0/0x480
> >     + start_kernel from 0x0
> >     +---[ end trace 0000000000000000 ]---
> >      Console: colour dummy device 80x30
> >      printk: console [tty0] enabled
> >      Calibrating delay loop (skipped) preset value.. 1056.00 BogoMIPS (lpj=5280000)
> >
> > Reverting this commit fixes the issue.
>
> I have set up testing with qemu for powerpc 32b, and reverting this
> patch does not fix it for me.  Did you revert the patch or bisect to the
> issue?

I did bisect the issue (on RZ/A) to cfeb6ae8bcb96ccf.
Reverting that commit on top of my local tree fixed the issue.

> It also happens on 0e0e9bd5f7b9 (I ran git checkout cfeb6ae8bcb96ccf^ to
> get the commit immediately before cfeb6ae8bcb96ccf).

That is not the case on Renesas RZ/A (which is arm32).

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-08-29 16:42   ` [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible Geert Uytterhoeven
  2023-08-31  5:39     ` Michael Ellerman
       [not found]     ` <20230906152325.dblzauybyoq5kd35@revolver>
@ 2023-09-11 12:27     ` Linux regression tracking #adding (Thorsten Leemhuis)
  2023-10-16  8:29       ` Linux regression tracking #update (Thorsten Leemhuis)
  2 siblings, 1 reply; 35+ messages in thread
From: Linux regression tracking #adding (Thorsten Leemhuis) @ 2023-09-11 12:27 UTC (permalink / raw)
  To: Geert Uytterhoeven, Liam R. Howlett
  Cc: Andrew Morton, maple-tree, linux-mm, linux-kernel, stable,
	linux-renesas-soc

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few templates
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 29.08.23 18:42, Geert Uytterhoeven wrote:
> 
> On Fri, 18 Aug 2023, Liam R. Howlett wrote:
>> The current implementation of append may cause duplicate data and/or
>> incorrect ranges to be returned to a reader during an update.  Although
>> this has not been reported or seen, disable the append write operation
>> while the tree is in rcu mode out of an abundance of caution.
>>
>> During the analysis of the mas_next_slot() the following was
>> artificially created by separating the writer and reader code:
> [...]
> Thanks for your patch, which is now commit cfeb6ae8bcb96ccf
> ("maple_tree: disable mas_wr_append() when other readers are
> possible") in v6.5, and is being backported to stable.
> 
> On Renesas RZ/A1 and RZ/A2 (single-core Cortex-A9), this causes the
> following warning:
> > […]
> Reverting this commit fixes the issue.

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced cfeb6ae8bcb96ccf
#regzbot title maple_tree: warning on Renesas RZ/A1 and RZ/A2
(single-core Cortex-A9
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-06 18:02           ` Paul E. McKenney
@ 2023-09-11 23:54             ` Liam R. Howlett
  2023-09-12  8:14               ` Paul E. McKenney
  2023-09-12 14:10               ` Matthew Wilcox
  0 siblings, 2 replies; 35+ messages in thread
From: Liam R. Howlett @ 2023-09-11 23:54 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Geert Uytterhoeven, Andrew Morton, maple-tree, linux-mm,
	linux-kernel, stable, linux-renesas-soc, Shanker Donthineni

* Paul E. McKenney <paulmck@kernel.org> [230906 14:03]:
> On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote:
> > * Paul E. McKenney <paulmck@kernel.org> [230906 13:24]:
> > > On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote:
> > > > (Adding Paul & Shanker to Cc list.. please see below for why)
> > > > 
> > > > Apologies on the late response, I was away and have been struggling to
> > > > get a working PPC32 test environment.
> > > > 
> > > > * Geert Uytterhoeven <geert@linux-m68k.org> [230829 12:42]:
> > > > > 	Hi Liam,
> > > > > 
> > > > > On Fri, 18 Aug 2023, Liam R. Howlett wrote:
> > > > > > The current implementation of append may cause duplicate data and/or
> > > > > > incorrect ranges to be returned to a reader during an update.  Although
> > > > > > this has not been reported or seen, disable the append write operation
> > > > > > while the tree is in rcu mode out of an abundance of caution.
> > > > 
> > > > ...
> > > > > > 

...

> > > > > RCU-related configs:
> > > > > 
> > > > >     $ grep RCU .config
> > > > >     # RCU Subsystem
> > > > >     CONFIG_TINY_RCU=y
> > > > >     # CONFIG_RCU_EXPERT is not set
> > > > >     CONFIG_TINY_SRCU=y
> > > > >     # end of RCU Subsystem
> > > > >     # RCU Debugging
> > > > >     # CONFIG_RCU_SCALE_TEST is not set
> > > > >     # CONFIG_RCU_TORTURE_TEST is not set
> > > > >     # CONFIG_RCU_REF_SCALE_TEST is not set
> > > > >     # CONFIG_RCU_TRACE is not set
> > > > >     # CONFIG_RCU_EQS_DEBUG is not set
> > > > >     # end of RCU Debugging
> > > > 
> > > > I used the configuration from debian 8 and ran 'make oldconfig' to build
> > > > my kernel.  I have attached the configuration.

...

> > > > It appears to be something to do with struct maple_tree sparse_irqs.  If
> > > > you drop the rcu flag from that maple tree, then my configuration boots
> > > > without the warning.
> > > > 
> > > > I *think* this is because we will reuse a lot more nodes.  And I *think*
> > > > the rcu flag is not needed, since there is a comment about reading the
> > > > tree being protected by the mutex sparse_irq_lock within the
> > > > kernel/irq/irqdesc.c file.  Shanker, can you comment on that?
> > > > 
> > > > I wonder if there is a limit to the number of RCU free events before
> > > > something is triggered to flush them out which could trigger IRQ
> > > > enabling? Paul, could this be the case?
> > > 
> > > Are you asking if call_rcu() will re-enable interrupts in the following
> > > use case?
> > > 
> > > 	local_irq_disable();
> > > 	call_rcu(&p->rh, my_cb_func);
> > > 	local_irq_enable();

I am not.

...

> > > 
> > > Or am I missing your point?
> > 
> > This is very early in the boot sequence when interrupts have not been
> > enabled.  What we are seeing is a WARN_ON() that is triggered by
> > interrupts being enabled before they should be enabled.
> > 
> > I was wondering if, for example, I called call_rcu() a lot *before*
> > interrupts were enabled, that something could trigger that would either
> > enable interrupts or indicate the task needs rescheduling?
> 
> You aren't doing call_rcu() enough to hit OOM, are you?  The actual RCU
> callback invocations won't happen until some time after the scheduler
> starts up.

I am not, it's just a detection of IRQs being enabled early.

> 
> > Specifically the rescheduling part is suspect.  I tracked down the call
> > to a mutex_lock() which calls cond_resched(), so could rcu be
> > 'encouraging' the rcu window by a reschedule request?
> 
> During boot before interrupts are enabled, RCU has not yet spawned any of
> its kthreads.  Therefore, all of its attempts to do wakeups would notice
> a NULL task_struct pointer and refrain from actually doing the wakeup.
> If it did do the wakeup, you would see a NULL-pointer exception.  See
> for example, invoke_rcu_core_kthread(), though that won't happen unless
> you booted with rcutree.use_softirq=0.
> 
> Besides, since when did doing a wakeup enable interrupts?  That would
> make it hard to do wakeups from hardware interrupt handlers, not?

Taking the mutex lock in kernel/irq/manage.c __setup_irq() is calling a
cond_resched().

From what Michael said [1] in this thread, since something has already
set TIF_NEED_RESCHED, it will eventually enable interrupts on us.

I've traced this to running call_rcu() in kernel/rcu/tiny.c and
is_idle_task(current) is true, which means rcu runs:
		/* force scheduling for rcu_qs() */                                                                     
                resched_cpu(0);

the task is set idle in sched_init() -> init_idle() and never changed,
afaict.

Removing the RCU option from the maple tree in kernel/irq/irqdesc.c
fixes the issue by avoiding the maple tree running call_rcu().  I am not
sure on the locking of the tree so I feel this change may cause other
issues...also it's before lockdep_init(), so any issue I introduce may
not be detected.

When CONFIG_DEBUG_ATOMIC_SLEEP is configured, it seems that rcu does the
same thing, but the IRQs are not enabled on return.  So, resched_cpu(0)
is called, but the IRQs warning of enabled isn't triggered.  I failed to
find a reason why.

I am not entirely sure what makes ppc32 different than other platforms
in that the initial task is configured to an idle task and the first
call to call_rcu (tiny!) would cause the observed behaviour.

Non-tiny rcu calls (as I am sure you know, but others may not)
kernel/rcu/tree.c which in turn calls __call_rcu_common().  That
function is far more complex than the tiny version.  Maybe it's part of
why we see different behaviour based on platforms?  I don't see an idle
check in that version of call_rcu().

Or maybe PPC32 has something set incorrectly to cause this failure in
early boot and I've just found something that needs to be set
differently?

> 
> But why not put some WARN_ON_ONCE(!irqs_disabled()) calls in the areas
> of greatest suspicion, starting from the stack trace generated by that
> mutex_lock()?  A stray interrupt-enable could be pretty much anywhere.
> 
> But where are those call_rcu() invocations?  Before rcu_init()?

During init_IRQ(), which is after rcu_init() but before rcu_init_nohz(),
srcu_init(), and softirq_init() in init/main.c start_kernel().

> Presumably before init is spawned and the early_init() calls.
> 
> And what is the RCU-related Kconfig and boot-parameter setup?

The .config was attached to the email I sent, and it matches what was
quoted above in the "RCU-related configs" section.

[1] https://lore.kernel.org/linux-mm/87v8cv22jh.fsf@mail.lhotse/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-11 23:54             ` Liam R. Howlett
@ 2023-09-12  8:14               ` Paul E. McKenney
  2023-09-12  8:23                 ` Geert Uytterhoeven
  2023-09-12 14:37                 ` Christophe Leroy
  2023-09-12 14:10               ` Matthew Wilcox
  1 sibling, 2 replies; 35+ messages in thread
From: Paul E. McKenney @ 2023-09-12  8:14 UTC (permalink / raw)
  To: Liam R. Howlett, Geert Uytterhoeven, Andrew Morton, maple-tree,
	linux-mm, linux-kernel, stable, linux-renesas-soc,
	Shanker Donthineni

On Mon, Sep 11, 2023 at 07:54:52PM -0400, Liam R. Howlett wrote:
> * Paul E. McKenney <paulmck@kernel.org> [230906 14:03]:
> > On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote:
> > > * Paul E. McKenney <paulmck@kernel.org> [230906 13:24]:
> > > > On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote:
> > > > > (Adding Paul & Shanker to Cc list.. please see below for why)
> > > > > 
> > > > > Apologies on the late response, I was away and have been struggling to
> > > > > get a working PPC32 test environment.
> > > > > 
> > > > > * Geert Uytterhoeven <geert@linux-m68k.org> [230829 12:42]:
> > > > > > 	Hi Liam,
> > > > > > 
> > > > > > On Fri, 18 Aug 2023, Liam R. Howlett wrote:
> > > > > > > The current implementation of append may cause duplicate data and/or
> > > > > > > incorrect ranges to be returned to a reader during an update.  Although
> > > > > > > this has not been reported or seen, disable the append write operation
> > > > > > > while the tree is in rcu mode out of an abundance of caution.
> > > > > 
> > > > > ...
> > > > > > > 
> 
> ...
> 
> > > > > > RCU-related configs:
> > > > > > 
> > > > > >     $ grep RCU .config
> > > > > >     # RCU Subsystem
> > > > > >     CONFIG_TINY_RCU=y

I must have been asleep last time I looked at this.  I was looking at
Tree RCU.  Please accept my apologies for my lapse.  :-/

However, Tiny RCU's call_rcu() also avoids enabling IRQs, so I would
have said the same thing, albeit after looking at a lot less RCU code.

TL;DR:

1.	Try making the __setup_irq() function's call to mutex_lock()
	instead be as follows:

	if (!mutex_trylock(&desc->request_mutex))
		mutex_lock(&desc->request_mutex);

	This might fail if __setup_irq() has other dependencies on a
	fully operational scheduler.

2.	Move that ppc32 call to __setup_irq() much later, most definitely
	after interrupts have been enabled and the scheduler is fully
	operational.  Invoking mutex_lock() before that time is not a
	good idea.  ;-)

For more detail, please read on!

> > > > > >     # CONFIG_RCU_EXPERT is not set
> > > > > >     CONFIG_TINY_SRCU=y
> > > > > >     # end of RCU Subsystem
> > > > > >     # RCU Debugging
> > > > > >     # CONFIG_RCU_SCALE_TEST is not set
> > > > > >     # CONFIG_RCU_TORTURE_TEST is not set
> > > > > >     # CONFIG_RCU_REF_SCALE_TEST is not set
> > > > > >     # CONFIG_RCU_TRACE is not set
> > > > > >     # CONFIG_RCU_EQS_DEBUG is not set
> > > > > >     # end of RCU Debugging
> > > > > 
> > > > > I used the configuration from debian 8 and ran 'make oldconfig' to build
> > > > > my kernel.  I have attached the configuration.
> 
> ...
> 
> > > > > It appears to be something to do with struct maple_tree sparse_irqs.  If
> > > > > you drop the rcu flag from that maple tree, then my configuration boots
> > > > > without the warning.
> > > > > 
> > > > > I *think* this is because we will reuse a lot more nodes.  And I *think*
> > > > > the rcu flag is not needed, since there is a comment about reading the
> > > > > tree being protected by the mutex sparse_irq_lock within the
> > > > > kernel/irq/irqdesc.c file.  Shanker, can you comment on that?
> > > > > 
> > > > > I wonder if there is a limit to the number of RCU free events before
> > > > > something is triggered to flush them out which could trigger IRQ
> > > > > enabling? Paul, could this be the case?
> > > > 
> > > > Are you asking if call_rcu() will re-enable interrupts in the following
> > > > use case?
> > > > 
> > > > 	local_irq_disable();
> > > > 	call_rcu(&p->rh, my_cb_func);
> > > > 	local_irq_enable();
> 
> I am not.
> 
> ...
> 
> > > > 
> > > > Or am I missing your point?
> > > 
> > > This is very early in the boot sequence when interrupts have not been
> > > enabled.  What we are seeing is a WARN_ON() that is triggered by
> > > interrupts being enabled before they should be enabled.
> > > 
> > > I was wondering if, for example, I called call_rcu() a lot *before*
> > > interrupts were enabled, that something could trigger that would either
> > > enable interrupts or indicate the task needs rescheduling?
> > 
> > You aren't doing call_rcu() enough to hit OOM, are you?  The actual RCU
> > callback invocations won't happen until some time after the scheduler
> > starts up.
> 
> I am not, it's just a detection of IRQs being enabled early.
> 
> > 
> > > Specifically the rescheduling part is suspect.  I tracked down the call
> > > to a mutex_lock() which calls cond_resched(), so could rcu be
> > > 'encouraging' the rcu window by a reschedule request?
> > 
> > During boot before interrupts are enabled, RCU has not yet spawned any of
> > its kthreads.  Therefore, all of its attempts to do wakeups would notice
> > a NULL task_struct pointer and refrain from actually doing the wakeup.
> > If it did do the wakeup, you would see a NULL-pointer exception.  See
> > for example, invoke_rcu_core_kthread(), though that won't happen unless
> > you booted with rcutree.use_softirq=0.
> > 
> > Besides, since when did doing a wakeup enable interrupts?  That would
> > make it hard to do wakeups from hardware interrupt handlers, not?
> 
> Taking the mutex lock in kernel/irq/manage.c __setup_irq() is calling a
> cond_resched().
> 
> >From what Michael said [1] in this thread, since something has already
> set TIF_NEED_RESCHED, it will eventually enable interrupts on us.
> 
> I've traced this to running call_rcu() in kernel/rcu/tiny.c and
> is_idle_task(current) is true, which means rcu runs:
> 		/* force scheduling for rcu_qs() */                                                                     
>                 resched_cpu(0);
> 
> the task is set idle in sched_init() -> init_idle() and never changed,
> afaict.

Yes, because RCU eventually needs a context switch in order to make
a grace period happen.  And Maple Tree isn't the only thing invoking
call_rcu() early, so this has been in place for a very long time.

> Removing the RCU option from the maple tree in kernel/irq/irqdesc.c
> fixes the issue by avoiding the maple tree running call_rcu().  I am not
> sure on the locking of the tree so I feel this change may cause other
> issues...also it's before lockdep_init(), so any issue I introduce may
> not be detected.
> 
> When CONFIG_DEBUG_ATOMIC_SLEEP is configured, it seems that rcu does the
> same thing, but the IRQs are not enabled on return.  So, resched_cpu(0)
> is called, but the IRQs warning of enabled isn't triggered.  I failed to
> find a reason why.

Here you mean IRQs being enabled upon return from __setup_irq(), correct?

But yes, __setup_irq() does call mutex_lock().  Which will call
preempt_schedule_common() via might_sleep() and __cond_resched(), even
though that is clearly a very bad thing this early.  And let's face it,
the whole purpose of mutex_lock() is to block when needed.  And a big
purpose of that might_sleep() is to yell at people invoking this with
interrupts disabled.

And most of the wrappers around __setup_irq() look to be intended
for much later, after interrupts have been enabled.  One exception is
setup_percpu_irq(), which says that it is for "the early boot process",
whenever that might be.  But this is only invoked from mips in v6.5.

The __request_percpu_irq() function is wrappered by request_percpu_irq(),
and its header comment suggests that it is to be called after there are
multiple CPUs.  I am not seeing a call that is obviously from ppc32,
but there are a number of drivers using this with which I am unfamiliar.

The request_percpu_nmi() has to be followed up on each CPU using
prepare_percpu_nmi() and enable_percpu_nmi(), so it is not clear that
it is useful to invoke this before interrupts are enabled.  But this is
used by ARM, not ppc32 from what I can see.

So even though I am not seeing how ppc32 invokes __setup_irq() early,
your testing clearly indicates that it is doing so.

> I am not entirely sure what makes ppc32 different than other platforms
> in that the initial task is configured to an idle task and the first
> call to call_rcu (tiny!) would cause the observed behaviour.

Maybe something like this in __setup_irq(), right before the
mutex_lock()?

	WARN_ON_ONCE(irqs_disabled());

This will dump the stack trace showing how __setup_irq() is being invoked
in early boot on ppc32.

Again, given that __setup_irq() invokes mutex_lock(), invoking this
function in its current form before interrupts have been enabled is a
bad idea.

> Non-tiny rcu calls (as I am sure you know, but others may not)
> kernel/rcu/tree.c which in turn calls __call_rcu_common().  That
> function is far more complex than the tiny version.  Maybe it's part of
> why we see different behaviour based on platforms?  I don't see an idle
> check in that version of call_rcu().

Tree RCU can rely on the grace-period kthread starting up later in boot.
This kthread will handle any need for a grace period from early boot
invocation of call_rcu().  Tiny RCU does not have such a kthread, hence
the resched_cpu().  Which is fine, as long as nothing (incorrectly!)
invokes a blocking operation before the scheduler has been initialized.

> Or maybe PPC32 has something set incorrectly to cause this failure in
> early boot and I've just found something that needs to be set
> differently?

This might be a failure of imagination on my part, but I certainly don't
see why ppc32 should expect to get away with invoking __setup_irq()
anywhere near that early!  ;-)

But if ppc32 cannot be adjusted so as to invoke __setup_irq() later,
the following change to __setup_irq() might work:

	if (!mutex_trylock(&desc->request_mutex))
		mutex_lock(&desc->request_mutex);

The reason for "might" rather than "will" is that __setup_irq() might
well have other dependencies on the scheduler being up and running.

It would be cleaner for that ppc32 call to __setup_irq() to be moved
later, after the scheduler is up and running.

> > But why not put some WARN_ON_ONCE(!irqs_disabled()) calls in the areas
> > of greatest suspicion, starting from the stack trace generated by that
> > mutex_lock()?  A stray interrupt-enable could be pretty much anywhere.
> > 
> > But where are those call_rcu() invocations?  Before rcu_init()?
> 
> During init_IRQ(), which is after rcu_init() but before rcu_init_nohz(),
> srcu_init(), and softirq_init() in init/main.c start_kernel().

That is a perfectly legal time and place for a call_rcu() invocation.
It is also legal before rcu_init(), but it has to be late enough that
things like per-CPU variables are accessible.  Which is still very early.

> > Presumably before init is spawned and the early_init() calls.
> > 
> > And what is the RCU-related Kconfig and boot-parameter setup?
> 
> The .config was attached to the email I sent, and it matches what was
> quoted above in the "RCU-related configs" section.
> 
> [1] https://lore.kernel.org/linux-mm/87v8cv22jh.fsf@mail.lhotse/

Again, apologies for having misread that .config snippet!!!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-12  8:14               ` Paul E. McKenney
@ 2023-09-12  8:23                 ` Geert Uytterhoeven
  2023-09-12  8:30                   ` Paul E. McKenney
  2023-09-12 14:37                 ` Christophe Leroy
  1 sibling, 1 reply; 35+ messages in thread
From: Geert Uytterhoeven @ 2023-09-12  8:23 UTC (permalink / raw)
  To: paulmck
  Cc: Liam R. Howlett, Andrew Morton, maple-tree, linux-mm,
	linux-kernel, stable, linux-renesas-soc, Shanker Donthineni

Hi Paul,

On Tue, Sep 12, 2023 at 10:14 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> On Mon, Sep 11, 2023 at 07:54:52PM -0400, Liam R. Howlett wrote:
> > * Paul E. McKenney <paulmck@kernel.org> [230906 14:03]:
> > > On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote:
> > > > * Paul E. McKenney <paulmck@kernel.org> [230906 13:24]:
> > > > > On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote:
> > > > > > (Adding Paul & Shanker to Cc list.. please see below for why)
> > > > > >
> > > > > > Apologies on the late response, I was away and have been struggling to
> > > > > > get a working PPC32 test environment.
> > > > > >
> > > > > > * Geert Uytterhoeven <geert@linux-m68k.org> [230829 12:42]:
> > > > > > >     Hi Liam,
> > > > > > >
> > > > > > > On Fri, 18 Aug 2023, Liam R. Howlett wrote:
> > > > > > > > The current implementation of append may cause duplicate data and/or
> > > > > > > > incorrect ranges to be returned to a reader during an update.  Although
> > > > > > > > this has not been reported or seen, disable the append write operation
> > > > > > > > while the tree is in rcu mode out of an abundance of caution.
> > > > > >
> > > > > > ...
> > > > > > > >
> >
> > ...
> >
> > > > > > > RCU-related configs:
> > > > > > >
> > > > > > >     $ grep RCU .config
> > > > > > >     # RCU Subsystem
> > > > > > >     CONFIG_TINY_RCU=y
>
> I must have been asleep last time I looked at this.  I was looking at
> Tree RCU.  Please accept my apologies for my lapse.  :-/
>
> However, Tiny RCU's call_rcu() also avoids enabling IRQs, so I would
> have said the same thing, albeit after looking at a lot less RCU code.
>
> TL;DR:
>
> 1.      Try making the __setup_irq() function's call to mutex_lock()
>         instead be as follows:
>
>         if (!mutex_trylock(&desc->request_mutex))
>                 mutex_lock(&desc->request_mutex);
>
>         This might fail if __setup_irq() has other dependencies on a
>         fully operational scheduler.
>
> 2.      Move that ppc32 call to __setup_irq() much later, most definitely
>         after interrupts have been enabled and the scheduler is fully
>         operational.  Invoking mutex_lock() before that time is not a
>         good idea.  ;-)

There is no call to __setup_irq() from arch/powerpc/?

Note that there are (possibly different) issues seen on ppc32 and on arm32
(Renesas RZ/A in particular, but not on other Renesas ARM systems).

I saw an issue on arm32 with cfeb6ae8bcb96ccf, but not with cfeb6ae8bcb96ccf^.
Other people saw an issue on ppc32 with both cfeb6ae8bcb96ccf and
cfeb6ae8bcb96ccf^.

Thanks!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-12  8:23                 ` Geert Uytterhoeven
@ 2023-09-12  8:30                   ` Paul E. McKenney
  2023-09-12  8:34                     ` Geert Uytterhoeven
  0 siblings, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2023-09-12  8:30 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Liam R. Howlett, Andrew Morton, maple-tree, linux-mm,
	linux-kernel, stable, linux-renesas-soc, Shanker Donthineni

On Tue, Sep 12, 2023 at 10:23:37AM +0200, Geert Uytterhoeven wrote:
> Hi Paul,
> 
> On Tue, Sep 12, 2023 at 10:14 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > On Mon, Sep 11, 2023 at 07:54:52PM -0400, Liam R. Howlett wrote:
> > > * Paul E. McKenney <paulmck@kernel.org> [230906 14:03]:
> > > > On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote:
> > > > > * Paul E. McKenney <paulmck@kernel.org> [230906 13:24]:
> > > > > > On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote:
> > > > > > > (Adding Paul & Shanker to Cc list.. please see below for why)
> > > > > > >
> > > > > > > Apologies on the late response, I was away and have been struggling to
> > > > > > > get a working PPC32 test environment.
> > > > > > >
> > > > > > > * Geert Uytterhoeven <geert@linux-m68k.org> [230829 12:42]:
> > > > > > > >     Hi Liam,
> > > > > > > >
> > > > > > > > On Fri, 18 Aug 2023, Liam R. Howlett wrote:
> > > > > > > > > The current implementation of append may cause duplicate data and/or
> > > > > > > > > incorrect ranges to be returned to a reader during an update.  Although
> > > > > > > > > this has not been reported or seen, disable the append write operation
> > > > > > > > > while the tree is in rcu mode out of an abundance of caution.
> > > > > > >
> > > > > > > ...
> > > > > > > > >
> > >
> > > ...
> > >
> > > > > > > > RCU-related configs:
> > > > > > > >
> > > > > > > >     $ grep RCU .config
> > > > > > > >     # RCU Subsystem
> > > > > > > >     CONFIG_TINY_RCU=y
> >
> > I must have been asleep last time I looked at this.  I was looking at
> > Tree RCU.  Please accept my apologies for my lapse.  :-/
> >
> > However, Tiny RCU's call_rcu() also avoids enabling IRQs, so I would
> > have said the same thing, albeit after looking at a lot less RCU code.
> >
> > TL;DR:
> >
> > 1.      Try making the __setup_irq() function's call to mutex_lock()
> >         instead be as follows:
> >
> >         if (!mutex_trylock(&desc->request_mutex))
> >                 mutex_lock(&desc->request_mutex);
> >
> >         This might fail if __setup_irq() has other dependencies on a
> >         fully operational scheduler.
> >
> > 2.      Move that ppc32 call to __setup_irq() much later, most definitely
> >         after interrupts have been enabled and the scheduler is fully
> >         operational.  Invoking mutex_lock() before that time is not a
> >         good idea.  ;-)
> 
> There is no call to __setup_irq() from arch/powerpc/?

Glad it is not just me, given that I didn't see a direct call, either.  So
later in this email, I asked Liam to put a WARN_ON_ONCE(irqs_disabled())
just before that mutex_lock() in __setup_irq().

Either way, invoking mutex_lock() early in boot before interrupts have
been enabled is a bad idea.  ;-)

> Note that there are (possibly different) issues seen on ppc32 and on arm32
> (Renesas RZ/A in particular, but not on other Renesas ARM systems).
> 
> I saw an issue on arm32 with cfeb6ae8bcb96ccf, but not with cfeb6ae8bcb96ccf^.
> Other people saw an issue on ppc32 with both cfeb6ae8bcb96ccf and
> cfeb6ae8bcb96ccf^.

I look forward to hearing what is the issue in both cases.

							Thanx, Paul

> Thanks!
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> -- 
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-12  8:30                   ` Paul E. McKenney
@ 2023-09-12  8:34                     ` Geert Uytterhoeven
  2023-09-12 10:00                       ` Paul E. McKenney
  0 siblings, 1 reply; 35+ messages in thread
From: Geert Uytterhoeven @ 2023-09-12  8:34 UTC (permalink / raw)
  To: paulmck
  Cc: Liam R. Howlett, Andrew Morton, maple-tree, linux-mm,
	linux-kernel, stable, linux-renesas-soc, Shanker Donthineni

Hi Paul,

On Tue, Sep 12, 2023 at 10:30 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> On Tue, Sep 12, 2023 at 10:23:37AM +0200, Geert Uytterhoeven wrote:
> > On Tue, Sep 12, 2023 at 10:14 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > On Mon, Sep 11, 2023 at 07:54:52PM -0400, Liam R. Howlett wrote:
> > > > * Paul E. McKenney <paulmck@kernel.org> [230906 14:03]:
> > > > > On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote:
> > > > > > * Paul E. McKenney <paulmck@kernel.org> [230906 13:24]:
> > > > > > > On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote:
> > > > > > > > (Adding Paul & Shanker to Cc list.. please see below for why)
> > > > > > > >
> > > > > > > > Apologies on the late response, I was away and have been struggling to
> > > > > > > > get a working PPC32 test environment.
> > > > > > > >
> > > > > > > > * Geert Uytterhoeven <geert@linux-m68k.org> [230829 12:42]:
> > > > > > > > >     Hi Liam,
> > > > > > > > >
> > > > > > > > > On Fri, 18 Aug 2023, Liam R. Howlett wrote:
> > > > > > > > > > The current implementation of append may cause duplicate data and/or
> > > > > > > > > > incorrect ranges to be returned to a reader during an update.  Although
> > > > > > > > > > this has not been reported or seen, disable the append write operation
> > > > > > > > > > while the tree is in rcu mode out of an abundance of caution.
> > > > > > > >
> > > > > > > > ...
> > > > > > > > > >
> > > >
> > > > ...
> > > >
> > > > > > > > > RCU-related configs:
> > > > > > > > >
> > > > > > > > >     $ grep RCU .config
> > > > > > > > >     # RCU Subsystem
> > > > > > > > >     CONFIG_TINY_RCU=y
> > >
> > > I must have been asleep last time I looked at this.  I was looking at
> > > Tree RCU.  Please accept my apologies for my lapse.  :-/
> > >
> > > However, Tiny RCU's call_rcu() also avoids enabling IRQs, so I would
> > > have said the same thing, albeit after looking at a lot less RCU code.
> > >
> > > TL;DR:
> > >
> > > 1.      Try making the __setup_irq() function's call to mutex_lock()
> > >         instead be as follows:
> > >
> > >         if (!mutex_trylock(&desc->request_mutex))
> > >                 mutex_lock(&desc->request_mutex);
> > >
> > >         This might fail if __setup_irq() has other dependencies on a
> > >         fully operational scheduler.
> > >
> > > 2.      Move that ppc32 call to __setup_irq() much later, most definitely
> > >         after interrupts have been enabled and the scheduler is fully
> > >         operational.  Invoking mutex_lock() before that time is not a
> > >         good idea.  ;-)
> >
> > There is no call to __setup_irq() from arch/powerpc/?
>
> Glad it is not just me, given that I didn't see a direct call, either.  So
> later in this email, I asked Liam to put a WARN_ON_ONCE(irqs_disabled())
> just before that mutex_lock() in __setup_irq().
>
> Either way, invoking mutex_lock() early in boot before interrupts have
> been enabled is a bad idea.  ;-)

I'll add that WARN_ON_ONCE() too, and will report back later today...

> > Note that there are (possibly different) issues seen on ppc32 and on arm32
> > (Renesas RZ/A in particular, but not on other Renesas ARM systems).
> >
> > I saw an issue on arm32 with cfeb6ae8bcb96ccf, but not with cfeb6ae8bcb96ccf^.
> > Other people saw an issue on ppc32 with both cfeb6ae8bcb96ccf and
> > cfeb6ae8bcb96ccf^.
>
> I look forward to hearing what is the issue in both cases.

For RZ/A, my problem report is
https://lore.kernel.org/all/3f86d58e-7f36-c6b4-c43a-2a7bcffd3bd@linux-m68k.org/

Thanks!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-12  8:34                     ` Geert Uytterhoeven
@ 2023-09-12 10:00                       ` Paul E. McKenney
  2023-09-12 13:56                         ` Liam R. Howlett
  2023-09-13 13:14                         ` Geert Uytterhoeven
  0 siblings, 2 replies; 35+ messages in thread
From: Paul E. McKenney @ 2023-09-12 10:00 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Liam R. Howlett, Andrew Morton, maple-tree, linux-mm,
	linux-kernel, stable, linux-renesas-soc, Shanker Donthineni

On Tue, Sep 12, 2023 at 10:34:44AM +0200, Geert Uytterhoeven wrote:
> Hi Paul,
> 
> On Tue, Sep 12, 2023 at 10:30 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > On Tue, Sep 12, 2023 at 10:23:37AM +0200, Geert Uytterhoeven wrote:
> > > On Tue, Sep 12, 2023 at 10:14 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > On Mon, Sep 11, 2023 at 07:54:52PM -0400, Liam R. Howlett wrote:
> > > > > * Paul E. McKenney <paulmck@kernel.org> [230906 14:03]:
> > > > > > On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote:
> > > > > > > * Paul E. McKenney <paulmck@kernel.org> [230906 13:24]:
> > > > > > > > On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote:
> > > > > > > > > (Adding Paul & Shanker to Cc list.. please see below for why)
> > > > > > > > >
> > > > > > > > > Apologies on the late response, I was away and have been struggling to
> > > > > > > > > get a working PPC32 test environment.
> > > > > > > > >
> > > > > > > > > * Geert Uytterhoeven <geert@linux-m68k.org> [230829 12:42]:
> > > > > > > > > >     Hi Liam,
> > > > > > > > > >
> > > > > > > > > > On Fri, 18 Aug 2023, Liam R. Howlett wrote:
> > > > > > > > > > > The current implementation of append may cause duplicate data and/or
> > > > > > > > > > > incorrect ranges to be returned to a reader during an update.  Although
> > > > > > > > > > > this has not been reported or seen, disable the append write operation
> > > > > > > > > > > while the tree is in rcu mode out of an abundance of caution.
> > > > > > > > >
> > > > > > > > > ...
> > > > > > > > > > >
> > > > >
> > > > > ...
> > > > >
> > > > > > > > > > RCU-related configs:
> > > > > > > > > >
> > > > > > > > > >     $ grep RCU .config
> > > > > > > > > >     # RCU Subsystem
> > > > > > > > > >     CONFIG_TINY_RCU=y
> > > >
> > > > I must have been asleep last time I looked at this.  I was looking at
> > > > Tree RCU.  Please accept my apologies for my lapse.  :-/
> > > >
> > > > However, Tiny RCU's call_rcu() also avoids enabling IRQs, so I would
> > > > have said the same thing, albeit after looking at a lot less RCU code.
> > > >
> > > > TL;DR:
> > > >
> > > > 1.      Try making the __setup_irq() function's call to mutex_lock()
> > > >         instead be as follows:
> > > >
> > > >         if (!mutex_trylock(&desc->request_mutex))
> > > >                 mutex_lock(&desc->request_mutex);
> > > >
> > > >         This might fail if __setup_irq() has other dependencies on a
> > > >         fully operational scheduler.
> > > >
> > > > 2.      Move that ppc32 call to __setup_irq() much later, most definitely
> > > >         after interrupts have been enabled and the scheduler is fully
> > > >         operational.  Invoking mutex_lock() before that time is not a
> > > >         good idea.  ;-)
> > >
> > > There is no call to __setup_irq() from arch/powerpc/?
> >
> > Glad it is not just me, given that I didn't see a direct call, either.  So
> > later in this email, I asked Liam to put a WARN_ON_ONCE(irqs_disabled())
> > just before that mutex_lock() in __setup_irq().
> >
> > Either way, invoking mutex_lock() early in boot before interrupts have
> > been enabled is a bad idea.  ;-)
> 
> I'll add that WARN_ON_ONCE() too, and will report back later today...

Thank you, looking forward to hearing the outcome!

> > > Note that there are (possibly different) issues seen on ppc32 and on arm32
> > > (Renesas RZ/A in particular, but not on other Renesas ARM systems).
> > >
> > > I saw an issue on arm32 with cfeb6ae8bcb96ccf, but not with cfeb6ae8bcb96ccf^.
> > > Other people saw an issue on ppc32 with both cfeb6ae8bcb96ccf and
> > > cfeb6ae8bcb96ccf^.
> >
> > I look forward to hearing what is the issue in both cases.
> 
> For RZ/A, my problem report is
> https://lore.kernel.org/all/3f86d58e-7f36-c6b4-c43a-2a7bcffd3bd@linux-m68k.org/

Thank you, Geert!

Huh.  Is that patch you reverted causing Maple Tree or related code
to attempt to acquire mutexes in early boot before interrupts have
been enabled?

If that added WARN_ON_ONCE() doesn't trigger early, another approach
would be to put it at the beginning of mutex_lock().  Or for that matter
at the beginning of might_sleep().

							Thanx, Paul

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-12 10:00                       ` Paul E. McKenney
@ 2023-09-12 13:56                         ` Liam R. Howlett
  2023-09-12 14:29                           ` Liam R. Howlett
  2023-09-12 15:07                           ` Paul E. McKenney
  2023-09-13 13:14                         ` Geert Uytterhoeven
  1 sibling, 2 replies; 35+ messages in thread
From: Liam R. Howlett @ 2023-09-12 13:56 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Geert Uytterhoeven, Andrew Morton, maple-tree, linux-mm,
	linux-kernel, stable, linux-renesas-soc, Shanker Donthineni

* Paul E. McKenney <paulmck@kernel.org> [230912 06:00]:
> On Tue, Sep 12, 2023 at 10:34:44AM +0200, Geert Uytterhoeven wrote:
> > Hi Paul,
> > 
> > On Tue, Sep 12, 2023 at 10:30 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > On Tue, Sep 12, 2023 at 10:23:37AM +0200, Geert Uytterhoeven wrote:
> > > > On Tue, Sep 12, 2023 at 10:14 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > On Mon, Sep 11, 2023 at 07:54:52PM -0400, Liam R. Howlett wrote:
> > > > > > * Paul E. McKenney <paulmck@kernel.org> [230906 14:03]:
> > > > > > > On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote:
> > > > > > > > * Paul E. McKenney <paulmck@kernel.org> [230906 13:24]:
> > > > > > > > > On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote:
> > > > > > > > > > (Adding Paul & Shanker to Cc list.. please see below for why)
> > > > > > > > > >
> > > > > > > > > > Apologies on the late response, I was away and have been struggling to
> > > > > > > > > > get a working PPC32 test environment.
> > > > > > > > > >
> > > > > > > > > > * Geert Uytterhoeven <geert@linux-m68k.org> [230829 12:42]:
> > > > > > > > > > >     Hi Liam,
> > > > > > > > > > >
> > > > > > > > > > > On Fri, 18 Aug 2023, Liam R. Howlett wrote:
> > > > > > > > > > > > The current implementation of append may cause duplicate data and/or
> > > > > > > > > > > > incorrect ranges to be returned to a reader during an update.  Although
> > > > > > > > > > > > this has not been reported or seen, disable the append write operation
> > > > > > > > > > > > while the tree is in rcu mode out of an abundance of caution.
> > > > > > > > > >
> > > > > > > > > > ...
> > > > > > > > > > > >
> > > > > >
> > > > > > ...
> > > > > >
> > > > > > > > > > > RCU-related configs:
> > > > > > > > > > >
> > > > > > > > > > >     $ grep RCU .config
> > > > > > > > > > >     # RCU Subsystem
> > > > > > > > > > >     CONFIG_TINY_RCU=y
> > > > >
> > > > > I must have been asleep last time I looked at this.  I was looking at
> > > > > Tree RCU.  Please accept my apologies for my lapse.  :-/
> > > > >
> > > > > However, Tiny RCU's call_rcu() also avoids enabling IRQs, so I would
> > > > > have said the same thing, albeit after looking at a lot less RCU code.
> > > > >
> > > > > TL;DR:
> > > > >
> > > > > 1.      Try making the __setup_irq() function's call to mutex_lock()
> > > > >         instead be as follows:
> > > > >
> > > > >         if (!mutex_trylock(&desc->request_mutex))
> > > > >                 mutex_lock(&desc->request_mutex);
> > > > >
> > > > >         This might fail if __setup_irq() has other dependencies on a
> > > > >         fully operational scheduler.
> > > > >
> > > > > 2.      Move that ppc32 call to __setup_irq() much later, most definitely
> > > > >         after interrupts have been enabled and the scheduler is fully
> > > > >         operational.  Invoking mutex_lock() before that time is not a
> > > > >         good idea.  ;-)
> > > >
> > > > There is no call to __setup_irq() from arch/powerpc/?
> > >
> > > Glad it is not just me, given that I didn't see a direct call, either.  So
> > > later in this email, I asked Liam to put a WARN_ON_ONCE(irqs_disabled())
> > > just before that mutex_lock() in __setup_irq().

I had already found that this is the mutex lock that is enabling them.
I surrounded the mutex lock to ensure it was not enabled before, but was
after.  Here is the findings:

kernel/irq/manage.c:1587 __setup_irq:
[    0.000000] [c0e65ec0] [c00e9b00] __setup_irq+0x6c4/0x840 (unreliable)
[    0.000000] [c0e65ef0] [c00e9d74] request_threaded_irq+0xf8/0x1f4
[    0.000000] [c0e65f20] [c0c27168] pmac_pic_init+0x204/0x5f8
[    0.000000] [c0e65f80] [c0c1f544] init_IRQ+0xac/0x12c
[    0.000000] [c0e65fa0] [c0c1cad0] start_kernel+0x544/0x6d4

Note your line number will be slightly different due to my debug.  This
is the WARN _after_ the mutex lock.

> > >
> > > Either way, invoking mutex_lock() early in boot before interrupts have
> > > been enabled is a bad idea.  ;-)
> > 
> > I'll add that WARN_ON_ONCE() too, and will report back later today...
> 
> Thank you, looking forward to hearing the outcome!
> 
> > > > Note that there are (possibly different) issues seen on ppc32 and on arm32
> > > > (Renesas RZ/A in particular, but not on other Renesas ARM systems).
> > > >
> > > > I saw an issue on arm32 with cfeb6ae8bcb96ccf, but not with cfeb6ae8bcb96ccf^.
> > > > Other people saw an issue on ppc32 with both cfeb6ae8bcb96ccf and
> > > > cfeb6ae8bcb96ccf^.
> > >
> > > I look forward to hearing what is the issue in both cases.
> > 
> > For RZ/A, my problem report is
> > https://lore.kernel.org/all/3f86d58e-7f36-c6b4-c43a-2a7bcffd3bd@linux-m68k.org/
> 
> Thank you, Geert!
> 
> Huh.  Is that patch you reverted causing Maple Tree or related code
> to attempt to acquire mutexes in early boot before interrupts have
> been enabled?
> 
> If that added WARN_ON_ONCE() doesn't trigger early, another approach
> would be to put it at the beginning of mutex_lock().  Or for that matter
> at the beginning of might_sleep().

Yeah, I put many WARN() calls through the code as well as tracking down
where TIF_NEED_RESCHED was set; the tiny.c call_rcu().


So my findings summarized:

1. My change to the maple tree makes call_rcu() more likely on early boot.
2. The initial thread setup is always set to idle state
3. call_rcu() tiny sets TIF_NEED_RESCHED since is_idle_task(current)
4. init_IRQ() takes a mutex lock which will enable the interrupts since
TIF_NEED_RESCHED is set.

I don't know which of these things is "wrong".

I also looked into the mtmsr register but decided to consult you lot
about my findings in hopes that someone with more knowledge of the
platform or early boot would alleviate the pain so that I could context
switch or sleep :)  I mean, an mtmsr bug seems like a leap even for the
issues I create..

Regards,
Liam


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-11 23:54             ` Liam R. Howlett
  2023-09-12  8:14               ` Paul E. McKenney
@ 2023-09-12 14:10               ` Matthew Wilcox
  2023-09-12 14:17                 ` Liam R. Howlett
  1 sibling, 1 reply; 35+ messages in thread
From: Matthew Wilcox @ 2023-09-12 14:10 UTC (permalink / raw)
  To: Liam R. Howlett, Paul E. McKenney, Geert Uytterhoeven,
	Andrew Morton, maple-tree, linux-mm, linux-kernel, stable,
	linux-renesas-soc, Shanker Donthineni

On Mon, Sep 11, 2023 at 07:54:52PM -0400, Liam R. Howlett wrote:
> Taking the mutex lock in kernel/irq/manage.c __setup_irq() is calling a
> cond_resched().
> 
> >From what Michael said [1] in this thread, since something has already
> set TIF_NEED_RESCHED, it will eventually enable interrupts on us.
> 
> I've traced this to running call_rcu() in kernel/rcu/tiny.c and
> is_idle_task(current) is true, which means rcu runs:
> 		/* force scheduling for rcu_qs() */                                                                     
>                 resched_cpu(0);
> 
> the task is set idle in sched_init() -> init_idle() and never changed,
> afaict.

Should calling init_idle() be deferred until after interrupts are
all set up?


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-12 14:10               ` Matthew Wilcox
@ 2023-09-12 14:17                 ` Liam R. Howlett
  0 siblings, 0 replies; 35+ messages in thread
From: Liam R. Howlett @ 2023-09-12 14:17 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Paul E. McKenney, Geert Uytterhoeven, Andrew Morton, maple-tree,
	linux-mm, linux-kernel, stable, linux-renesas-soc,
	Shanker Donthineni

* Matthew Wilcox <willy@infradead.org> [230912 10:10]:
> On Mon, Sep 11, 2023 at 07:54:52PM -0400, Liam R. Howlett wrote:
> > Taking the mutex lock in kernel/irq/manage.c __setup_irq() is calling a
> > cond_resched().
> > 
> > >From what Michael said [1] in this thread, since something has already
> > set TIF_NEED_RESCHED, it will eventually enable interrupts on us.
> > 
> > I've traced this to running call_rcu() in kernel/rcu/tiny.c and
> > is_idle_task(current) is true, which means rcu runs:
> > 		/* force scheduling for rcu_qs() */                                                                     
> >                 resched_cpu(0);
> > 
> > the task is set idle in sched_init() -> init_idle() and never changed,
> > afaict.
> 
> Should calling init_idle() be deferred until after interrupts are
> all set up?

At this point it is not platform specific code so I don't know what kind
of fallout I'll produce with a change like that, but I was wondering if
the thread running the boot process is really 'idle'?



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-12 13:56                         ` Liam R. Howlett
@ 2023-09-12 14:29                           ` Liam R. Howlett
  2023-09-12 15:08                             ` Paul E. McKenney
  2023-09-12 15:07                           ` Paul E. McKenney
  1 sibling, 1 reply; 35+ messages in thread
From: Liam R. Howlett @ 2023-09-12 14:29 UTC (permalink / raw)
  To: Paul E. McKenney, Geert Uytterhoeven, Andrew Morton, maple-tree,
	linux-mm, linux-kernel, stable, linux-renesas-soc,
	Shanker Donthineni

* Liam R. Howlett <Liam.Howlett@Oracle.com> [230912 09:56]:
> * Paul E. McKenney <paulmck@kernel.org> [230912 06:00]:
> > On Tue, Sep 12, 2023 at 10:34:44AM +0200, Geert Uytterhoeven wrote:
> > > Hi Paul,
> > > 
> > > On Tue, Sep 12, 2023 at 10:30 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > On Tue, Sep 12, 2023 at 10:23:37AM +0200, Geert Uytterhoeven wrote:
> > > > > On Tue, Sep 12, 2023 at 10:14 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > > On Mon, Sep 11, 2023 at 07:54:52PM -0400, Liam R. Howlett wrote:
> > > > > > > * Paul E. McKenney <paulmck@kernel.org> [230906 14:03]:
> > > > > > > > On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote:
> > > > > > > > > * Paul E. McKenney <paulmck@kernel.org> [230906 13:24]:
> > > > > > > > > > On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote:
> > > > > > > > > > > (Adding Paul & Shanker to Cc list.. please see below for why)
> > > > > > > > > > >
> > > > > > > > > > > Apologies on the late response, I was away and have been struggling to
> > > > > > > > > > > get a working PPC32 test environment.
> > > > > > > > > > >
> > > > > > > > > > > * Geert Uytterhoeven <geert@linux-m68k.org> [230829 12:42]:
> > > > > > > > > > > >     Hi Liam,
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, 18 Aug 2023, Liam R. Howlett wrote:
> > > > > > > > > > > > > The current implementation of append may cause duplicate data and/or
> > > > > > > > > > > > > incorrect ranges to be returned to a reader during an update.  Although
> > > > > > > > > > > > > this has not been reported or seen, disable the append write operation
> > > > > > > > > > > > > while the tree is in rcu mode out of an abundance of caution.
> > > > > > > > > > >
> > > > > > > > > > > ...
> > > > > > > > > > > > >
> > > > > > >
> > > > > > > ...
> > > > > > >
> > > > > > > > > > > > RCU-related configs:
> > > > > > > > > > > >
> > > > > > > > > > > >     $ grep RCU .config
> > > > > > > > > > > >     # RCU Subsystem
> > > > > > > > > > > >     CONFIG_TINY_RCU=y
> > > > > >
> > > > > > I must have been asleep last time I looked at this.  I was looking at
> > > > > > Tree RCU.  Please accept my apologies for my lapse.  :-/
> > > > > >
> > > > > > However, Tiny RCU's call_rcu() also avoids enabling IRQs, so I would
> > > > > > have said the same thing, albeit after looking at a lot less RCU code.
> > > > > >
> > > > > > TL;DR:
> > > > > >
> > > > > > 1.      Try making the __setup_irq() function's call to mutex_lock()
> > > > > >         instead be as follows:
> > > > > >
> > > > > >         if (!mutex_trylock(&desc->request_mutex))
> > > > > >                 mutex_lock(&desc->request_mutex);
> > > > > >
> > > > > >         This might fail if __setup_irq() has other dependencies on a
> > > > > >         fully operational scheduler.

This changes where the interrupts become enabled, but doesn't stop it
from happening.  It still throws a WARN after init_IRQ(). I suspect it
is not the way to proceed as there are probably many places that will
need to be changed in both common and arch specific code.  As soon as
TIF_NEED_RESCHED is set, then all the checks will need to be altered.

I think we either need to set the boot thread to !idle, avoid call_rcu()
to set TIF_NEED_RESCHED (how was this working before?  Do we need rcu
for the IRQs?), or alter the boot order (note this is NOT arch or
platform code here).

I don't like any of these.  I'd like another option, please?

> > > > > >
> > > > > > 2.      Move that ppc32 call to __setup_irq() much later, most definitely
> > > > > >         after interrupts have been enabled and the scheduler is fully
> > > > > >         operational.  Invoking mutex_lock() before that time is not a
> > > > > >         good idea.  ;-)
> > > > >
> > > > > There is no call to __setup_irq() from arch/powerpc/?
> > > >
> > > > Glad it is not just me, given that I didn't see a direct call, either.  So
> > > > later in this email, I asked Liam to put a WARN_ON_ONCE(irqs_disabled())
> > > > just before that mutex_lock() in __setup_irq().

Oh, and also:
arch/powerpc/platforms/powermac/setup.c:        .init_IRQ               = pmac_pic_init,

> 
> I had already found that this is the mutex lock that is enabling them.
> I surrounded the mutex lock to ensure it was not enabled before, but was
> after.  Here is the findings:
> 
> kernel/irq/manage.c:1587 __setup_irq:
> [    0.000000] [c0e65ec0] [c00e9b00] __setup_irq+0x6c4/0x840 (unreliable)
> [    0.000000] [c0e65ef0] [c00e9d74] request_threaded_irq+0xf8/0x1f4
> [    0.000000] [c0e65f20] [c0c27168] pmac_pic_init+0x204/0x5f8
> [    0.000000] [c0e65f80] [c0c1f544] init_IRQ+0xac/0x12c
> [    0.000000] [c0e65fa0] [c0c1cad0] start_kernel+0x544/0x6d4
> 
> Note your line number will be slightly different due to my debug.  This
> is the WARN _after_ the mutex lock.
> 
> > > >
> > > > Either way, invoking mutex_lock() early in boot before interrupts have
> > > > been enabled is a bad idea.  ;-)
> > > 
> > > I'll add that WARN_ON_ONCE() too, and will report back later today...
> > 
> > Thank you, looking forward to hearing the outcome!
> > 
> > > > > Note that there are (possibly different) issues seen on ppc32 and on arm32
> > > > > (Renesas RZ/A in particular, but not on other Renesas ARM systems).
> > > > >
> > > > > I saw an issue on arm32 with cfeb6ae8bcb96ccf, but not with cfeb6ae8bcb96ccf^.
> > > > > Other people saw an issue on ppc32 with both cfeb6ae8bcb96ccf and
> > > > > cfeb6ae8bcb96ccf^.
> > > >
> > > > I look forward to hearing what is the issue in both cases.
> > > 
> > > For RZ/A, my problem report is
> > > https://lore.kernel.org/all/3f86d58e-7f36-c6b4-c43a-2a7bcffd3bd@linux-m68k.org/
> > 
> > Thank you, Geert!
> > 
> > Huh.  Is that patch you reverted causing Maple Tree or related code
> > to attempt to acquire mutexes in early boot before interrupts have
> > been enabled?
> > 
> > If that added WARN_ON_ONCE() doesn't trigger early, another approach
> > would be to put it at the beginning of mutex_lock().  Or for that matter
> > at the beginning of might_sleep().
> 
> Yeah, I put many WARN() calls through the code as well as tracking down
> where TIF_NEED_RESCHED was set; the tiny.c call_rcu().
> 
> 
> So my findings summarized:
> 
> 1. My change to the maple tree makes call_rcu() more likely on early boot.
> 2. The initial thread setup is always set to idle state
> 3. call_rcu() tiny sets TIF_NEED_RESCHED since is_idle_task(current)
> 4. init_IRQ() takes a mutex lock which will enable the interrupts since
> TIF_NEED_RESCHED is set.
> 
> I don't know which of these things is "wrong".
> 
> I also looked into the mtmsr register but decided to consult you lot
> about my findings in hopes that someone with more knowledge of the
> platform or early boot would alleviate the pain so that I could context
> switch or sleep :)  I mean, an mtmsr bug seems like a leap even for the
> issues I create..
> 
> Regards,
> Liam
> 
> 
> -- 
> maple-tree mailing list
> maple-tree@lists.infradead.org
> https://lists.infradead.org/mailman/listinfo/maple-tree

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-12  8:14               ` Paul E. McKenney
  2023-09-12  8:23                 ` Geert Uytterhoeven
@ 2023-09-12 14:37                 ` Christophe Leroy
  2023-09-12 15:06                   ` Christophe Leroy
  1 sibling, 1 reply; 35+ messages in thread
From: Christophe Leroy @ 2023-09-12 14:37 UTC (permalink / raw)
  To: paulmck, Liam R. Howlett, Geert Uytterhoeven, Andrew Morton,
	maple-tree, linux-mm, linux-kernel, stable, linux-renesas-soc,
	Shanker Donthineni



Le 12/09/2023 à 10:14, Paul E. McKenney a écrit :
> On Mon, Sep 11, 2023 at 07:54:52PM -0400, Liam R. Howlett wrote:
>> * Paul E. McKenney <paulmck@kernel.org> [230906 14:03]:
>>> On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote:
>>>> * Paul E. McKenney <paulmck@kernel.org> [230906 13:24]:
>>>>> On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote:
>>>>>> (Adding Paul & Shanker to Cc list.. please see below for why)
>>>>>>
>>>>>> Apologies on the late response, I was away and have been struggling to
>>>>>> get a working PPC32 test environment.
>>>>>>
>>>>>> * Geert Uytterhoeven <geert@linux-m68k.org> [230829 12:42]:
>>>>>>> 	Hi Liam,
>>>>>>>
>>>>>>> On Fri, 18 Aug 2023, Liam R. Howlett wrote:
>>>>>>>> The current implementation of append may cause duplicate data and/or
>>>>>>>> incorrect ranges to be returned to a reader during an update.  Although
>>>>>>>> this has not been reported or seen, disable the append write operation
>>>>>>>> while the tree is in rcu mode out of an abundance of caution.
>>>>>>
>>>>>> ...
>>>>>>>>
>>
>> ...
>>
>>>>>>> RCU-related configs:
>>>>>>>
>>>>>>>      $ grep RCU .config
>>>>>>>      # RCU Subsystem
>>>>>>>      CONFIG_TINY_RCU=y
> 
> I must have been asleep last time I looked at this.  I was looking at
> Tree RCU.  Please accept my apologies for my lapse.  :-/
> 
> However, Tiny RCU's call_rcu() also avoids enabling IRQs, so I would
> have said the same thing, albeit after looking at a lot less RCU code.
> 
> TL;DR:
> 
> 1.	Try making the __setup_irq() function's call to mutex_lock()
> 	instead be as follows:
> 
> 	if (!mutex_trylock(&desc->request_mutex))
> 		mutex_lock(&desc->request_mutex);
> 
> 	This might fail if __setup_irq() has other dependencies on a
> 	fully operational scheduler.
> 
> 2.	Move that ppc32 call to __setup_irq() much later, most definitely
> 	after interrupts have been enabled and the scheduler is fully
> 	operational.  Invoking mutex_lock() before that time is not a
> 	good idea.  ;-)
> 
> For more detail, please read on!
> 
>>>>>>>      # CONFIG_RCU_EXPERT is not set
>>>>>>>      CONFIG_TINY_SRCU=y
>>>>>>>      # end of RCU Subsystem
>>>>>>>      # RCU Debugging
>>>>>>>      # CONFIG_RCU_SCALE_TEST is not set
>>>>>>>      # CONFIG_RCU_TORTURE_TEST is not set
>>>>>>>      # CONFIG_RCU_REF_SCALE_TEST is not set
>>>>>>>      # CONFIG_RCU_TRACE is not set
>>>>>>>      # CONFIG_RCU_EQS_DEBUG is not set
>>>>>>>      # end of RCU Debugging
>>>>>>
>>>>>> I used the configuration from debian 8 and ran 'make oldconfig' to build
>>>>>> my kernel.  I have attached the configuration.
>>
>> ...
>>
>>>>>> It appears to be something to do with struct maple_tree sparse_irqs.  If
>>>>>> you drop the rcu flag from that maple tree, then my configuration boots
>>>>>> without the warning.
>>>>>>
>>>>>> I *think* this is because we will reuse a lot more nodes.  And I *think*
>>>>>> the rcu flag is not needed, since there is a comment about reading the
>>>>>> tree being protected by the mutex sparse_irq_lock within the
>>>>>> kernel/irq/irqdesc.c file.  Shanker, can you comment on that?
>>>>>>
>>>>>> I wonder if there is a limit to the number of RCU free events before
>>>>>> something is triggered to flush them out which could trigger IRQ
>>>>>> enabling? Paul, could this be the case?
>>>>>
>>>>> Are you asking if call_rcu() will re-enable interrupts in the following
>>>>> use case?
>>>>>
>>>>> 	local_irq_disable();
>>>>> 	call_rcu(&p->rh, my_cb_func);
>>>>> 	local_irq_enable();
>>
>> I am not.
>>
>> ...
>>
>>>>>
>>>>> Or am I missing your point?
>>>>
>>>> This is very early in the boot sequence when interrupts have not been
>>>> enabled.  What we are seeing is a WARN_ON() that is triggered by
>>>> interrupts being enabled before they should be enabled.
>>>>
>>>> I was wondering if, for example, I called call_rcu() a lot *before*
>>>> interrupts were enabled, that something could trigger that would either
>>>> enable interrupts or indicate the task needs rescheduling?
>>>
>>> You aren't doing call_rcu() enough to hit OOM, are you?  The actual RCU
>>> callback invocations won't happen until some time after the scheduler
>>> starts up.
>>
>> I am not, it's just a detection of IRQs being enabled early.
>>
>>>
>>>> Specifically the rescheduling part is suspect.  I tracked down the call
>>>> to a mutex_lock() which calls cond_resched(), so could rcu be
>>>> 'encouraging' the rcu window by a reschedule request?
>>>
>>> During boot before interrupts are enabled, RCU has not yet spawned any of
>>> its kthreads.  Therefore, all of its attempts to do wakeups would notice
>>> a NULL task_struct pointer and refrain from actually doing the wakeup.
>>> If it did do the wakeup, you would see a NULL-pointer exception.  See
>>> for example, invoke_rcu_core_kthread(), though that won't happen unless
>>> you booted with rcutree.use_softirq=0.
>>>
>>> Besides, since when did doing a wakeup enable interrupts?  That would
>>> make it hard to do wakeups from hardware interrupt handlers, not?
>>
>> Taking the mutex lock in kernel/irq/manage.c __setup_irq() is calling a
>> cond_resched().
>>
>> >From what Michael said [1] in this thread, since something has already
>> set TIF_NEED_RESCHED, it will eventually enable interrupts on us.
>>
>> I've traced this to running call_rcu() in kernel/rcu/tiny.c and
>> is_idle_task(current) is true, which means rcu runs:
>> 		/* force scheduling for rcu_qs() */
>>                  resched_cpu(0);
>>
>> the task is set idle in sched_init() -> init_idle() and never changed,
>> afaict.
> 
> Yes, because RCU eventually needs a context switch in order to make
> a grace period happen.  And Maple Tree isn't the only thing invoking
> call_rcu() early, so this has been in place for a very long time.
> 
>> Removing the RCU option from the maple tree in kernel/irq/irqdesc.c
>> fixes the issue by avoiding the maple tree running call_rcu().  I am not
>> sure on the locking of the tree so I feel this change may cause other
>> issues...also it's before lockdep_init(), so any issue I introduce may
>> not be detected.
>>
>> When CONFIG_DEBUG_ATOMIC_SLEEP is configured, it seems that rcu does the
>> same thing, but the IRQs are not enabled on return.  So, resched_cpu(0)
>> is called, but the IRQs warning of enabled isn't triggered.  I failed to
>> find a reason why.
> 
> Here you mean IRQs being enabled upon return from __setup_irq(), correct?
> 
> But yes, __setup_irq() does call mutex_lock().  Which will call
> preempt_schedule_common() via might_sleep() and __cond_resched(), even
> though that is clearly a very bad thing this early.  And let's face it,
> the whole purpose of mutex_lock() is to block when needed.  And a big
> purpose of that might_sleep() is to yell at people invoking this with
> interrupts disabled.
> 
> And most of the wrappers around __setup_irq() look to be intended
> for much later, after interrupts have been enabled.  One exception is
> setup_percpu_irq(), which says that it is for "the early boot process",
> whenever that might be.  But this is only invoked from mips in v6.5.
> 
> The __request_percpu_irq() function is wrappered by request_percpu_irq(),
> and its header comment suggests that it is to be called after there are
> multiple CPUs.  I am not seeing a call that is obviously from ppc32,
> but there are a number of drivers using this with which I am unfamiliar.
> 
> The request_percpu_nmi() has to be followed up on each CPU using
> prepare_percpu_nmi() and enable_percpu_nmi(), so it is not clear that
> it is useful to invoke this before interrupts are enabled.  But this is
> used by ARM, not ppc32 from what I can see.
> 
> So even though I am not seeing how ppc32 invokes __setup_irq() early,
> your testing clearly indicates that it is doing so.
> 
>> I am not entirely sure what makes ppc32 different than other platforms
>> in that the initial task is configured to an idle task and the first
>> call to call_rcu (tiny!) would cause the observed behaviour.
> 
> Maybe something like this in __setup_irq(), right before the
> mutex_lock()?
> 
> 	WARN_ON_ONCE(irqs_disabled());
> 
> This will dump the stack trace showing how __setup_irq() is being invoked
> in early boot on ppc32.
> 
> Again, given that __setup_irq() invokes mutex_lock(), invoking this
> function in its current form before interrupts have been enabled is a
> bad idea.
> 

No trigger of that WARN_ON() I added in __setup_irq() as instructed 
above, still getting (pmac32_defconfig on MAC99 QEMU)

------------[ cut here ]------------
Interrupts were enabled early
WARNING: CPU: 0 PID: 0 at init/main.c:992 start_kernel+0x4d8/0x5c0
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc1-dirty #481
Hardware name: PowerMac3,1 7400 0xc0209 PowerMac
NIP:  c0a6052c LR: c0a6052c CTR: 00000000
REGS: c0c4dee0 TRAP: 0700   Not tainted  (6.6.0-rc1-dirty)
MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 24000282  XER: 20000000

GPR00: c0a6052c c0c4dfa0 c0b92580 0000001d c0b9d128 00000001 c0b9d148 
3ffffdff
GPR08: c0ba80f0 00000000 00000000 3ffffe00 44000282 00000000 00000000 
0199abfc
GPR16: 0199b0a4 7fde7fa4 7fc5ac0c 000000bb 41000000 01c690c8 c0b92014 
c09b4bdc
GPR24: c0c55220 c0ac0000 00000000 efff9109 efff9100 0000000a c0c6d000 
c0b920a0
NIP [c0a6052c] start_kernel+0x4d8/0x5c0
LR [c0a6052c] start_kernel+0x4d8/0x5c0
Call Trace:
[c0c4dfa0] [c0a6052c] start_kernel+0x4d8/0x5c0 (unreliable)
[c0c4dff0] [00003540] 0x3540
Code: 480037b1 48023c05 4bab88ed 90620260 480139e9 4b657ced 7d2000a6 
71298000 41a20014 3c60c09a 3863b78c 4b5e9ccd <0fe00000> 39200000 
99380008 7d2000a6
---[ end trace 0000000000000000 ]---


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-12 14:37                 ` Christophe Leroy
@ 2023-09-12 15:06                   ` Christophe Leroy
  0 siblings, 0 replies; 35+ messages in thread
From: Christophe Leroy @ 2023-09-12 15:06 UTC (permalink / raw)
  To: paulmck, Liam R. Howlett, Geert Uytterhoeven, Andrew Morton,
	maple-tree, linux-mm, linux-kernel, stable, linux-renesas-soc,
	Shanker Donthineni



Le 12/09/2023 à 16:37, Christophe Leroy a écrit :
> 
> 
>>
>> Maybe something like this in __setup_irq(), right before the
>> mutex_lock()?
>>
>> 	WARN_ON_ONCE(irqs_disabled());
>>
>> This will dump the stack trace showing how __setup_irq() is being invoked
>> in early boot on ppc32.
>>
>> Again, given that __setup_irq() invokes mutex_lock(), invoking this
>> function in its current form before interrupts have been enabled is a
>> bad idea.
>>
> 
> No trigger of that WARN_ON() I added in __setup_irq() as instructed
> above, still getting (pmac32_defconfig on MAC99 QEMU)
> 
> ------------[ cut here ]------------
> Interrupts were enabled early
> WARNING: CPU: 0 PID: 0 at init/main.c:992 start_kernel+0x4d8/0x5c0
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc1-dirty #481
> Hardware name: PowerMac3,1 7400 0xc0209 PowerMac
> NIP:  c0a6052c LR: c0a6052c CTR: 00000000
> REGS: c0c4dee0 TRAP: 0700   Not tainted  (6.6.0-rc1-dirty)
> MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 24000282  XER: 20000000
> 
> GPR00: c0a6052c c0c4dfa0 c0b92580 0000001d c0b9d128 00000001 c0b9d148
> 3ffffdff
> GPR08: c0ba80f0 00000000 00000000 3ffffe00 44000282 00000000 00000000
> 0199abfc
> GPR16: 0199b0a4 7fde7fa4 7fc5ac0c 000000bb 41000000 01c690c8 c0b92014
> c09b4bdc
> GPR24: c0c55220 c0ac0000 00000000 efff9109 efff9100 0000000a c0c6d000
> c0b920a0
> NIP [c0a6052c] start_kernel+0x4d8/0x5c0
> LR [c0a6052c] start_kernel+0x4d8/0x5c0
> Call Trace:
> [c0c4dfa0] [c0a6052c] start_kernel+0x4d8/0x5c0 (unreliable)
> [c0c4dff0] [00003540] 0x3540
> Code: 480037b1 48023c05 4bab88ed 90620260 480139e9 4b657ced 7d2000a6
> 71298000 41a20014 3c60c09a 3863b78c 4b5e9ccd <0fe00000> 39200000
> 99380008 7d2000a6
> ---[ end trace 0000000000000000 ]---
> 

For what it's worth, the interrupts seems to be enabled by the call to 
init_IRQ().

Diging into it that's enabled by the call to __vmalloc_node() in 
alloc_vm_stack()

static void *__init alloc_vm_stack(void)
{
	return __vmalloc_node(THREAD_SIZE, THREAD_ALIGN, THREADINFO_GFP,
			      NUMA_NO_NODE, (void *)_RET_IP_);
}

static void __init vmap_irqstack_init(void)
{
	int i;

	for_each_possible_cpu(i) {
		softirq_ctx[i] = alloc_vm_stack();
		hardirq_ctx[i] = alloc_vm_stack();
	}
}


void __init init_IRQ(void)
{
	if (IS_ENABLED(CONFIG_VMAP_STACK))
		vmap_irqstack_init();

	if (ppc_md.init_IRQ)
		ppc_md.init_IRQ();

	if (!WARN_ON(!ppc_md.get_irq))
		static_call_update(ppc_get_irq, ppc_md.get_irq);
}

Christophe

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-12 13:56                         ` Liam R. Howlett
  2023-09-12 14:29                           ` Liam R. Howlett
@ 2023-09-12 15:07                           ` Paul E. McKenney
  2023-09-12 15:44                             ` Liam R. Howlett
  1 sibling, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2023-09-12 15:07 UTC (permalink / raw)
  To: Liam R. Howlett, Geert Uytterhoeven, Andrew Morton, maple-tree,
	linux-mm, linux-kernel, stable, linux-renesas-soc,
	Shanker Donthineni

On Tue, Sep 12, 2023 at 09:56:17AM -0400, Liam R. Howlett wrote:
> * Paul E. McKenney <paulmck@kernel.org> [230912 06:00]:
> > On Tue, Sep 12, 2023 at 10:34:44AM +0200, Geert Uytterhoeven wrote:
> > > Hi Paul,
> > > 
> > > On Tue, Sep 12, 2023 at 10:30 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > On Tue, Sep 12, 2023 at 10:23:37AM +0200, Geert Uytterhoeven wrote:
> > > > > On Tue, Sep 12, 2023 at 10:14 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > > On Mon, Sep 11, 2023 at 07:54:52PM -0400, Liam R. Howlett wrote:
> > > > > > > * Paul E. McKenney <paulmck@kernel.org> [230906 14:03]:
> > > > > > > > On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote:
> > > > > > > > > * Paul E. McKenney <paulmck@kernel.org> [230906 13:24]:
> > > > > > > > > > On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote:
> > > > > > > > > > > (Adding Paul & Shanker to Cc list.. please see below for why)
> > > > > > > > > > >
> > > > > > > > > > > Apologies on the late response, I was away and have been struggling to
> > > > > > > > > > > get a working PPC32 test environment.
> > > > > > > > > > >
> > > > > > > > > > > * Geert Uytterhoeven <geert@linux-m68k.org> [230829 12:42]:
> > > > > > > > > > > >     Hi Liam,
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, 18 Aug 2023, Liam R. Howlett wrote:
> > > > > > > > > > > > > The current implementation of append may cause duplicate data and/or
> > > > > > > > > > > > > incorrect ranges to be returned to a reader during an update.  Although
> > > > > > > > > > > > > this has not been reported or seen, disable the append write operation
> > > > > > > > > > > > > while the tree is in rcu mode out of an abundance of caution.
> > > > > > > > > > >
> > > > > > > > > > > ...
> > > > > > > > > > > > >
> > > > > > >
> > > > > > > ...
> > > > > > >
> > > > > > > > > > > > RCU-related configs:
> > > > > > > > > > > >
> > > > > > > > > > > >     $ grep RCU .config
> > > > > > > > > > > >     # RCU Subsystem
> > > > > > > > > > > >     CONFIG_TINY_RCU=y
> > > > > >
> > > > > > I must have been asleep last time I looked at this.  I was looking at
> > > > > > Tree RCU.  Please accept my apologies for my lapse.  :-/
> > > > > >
> > > > > > However, Tiny RCU's call_rcu() also avoids enabling IRQs, so I would
> > > > > > have said the same thing, albeit after looking at a lot less RCU code.
> > > > > >
> > > > > > TL;DR:
> > > > > >
> > > > > > 1.      Try making the __setup_irq() function's call to mutex_lock()
> > > > > >         instead be as follows:
> > > > > >
> > > > > >         if (!mutex_trylock(&desc->request_mutex))
> > > > > >                 mutex_lock(&desc->request_mutex);
> > > > > >
> > > > > >         This might fail if __setup_irq() has other dependencies on a
> > > > > >         fully operational scheduler.
> > > > > >
> > > > > > 2.      Move that ppc32 call to __setup_irq() much later, most definitely
> > > > > >         after interrupts have been enabled and the scheduler is fully
> > > > > >         operational.  Invoking mutex_lock() before that time is not a
> > > > > >         good idea.  ;-)
> > > > >
> > > > > There is no call to __setup_irq() from arch/powerpc/?
> > > >
> > > > Glad it is not just me, given that I didn't see a direct call, either.  So
> > > > later in this email, I asked Liam to put a WARN_ON_ONCE(irqs_disabled())
> > > > just before that mutex_lock() in __setup_irq().
> 
> I had already found that this is the mutex lock that is enabling them.
> I surrounded the mutex lock to ensure it was not enabled before, but was
> after.  Here is the findings:
> 
> kernel/irq/manage.c:1587 __setup_irq:
> [    0.000000] [c0e65ec0] [c00e9b00] __setup_irq+0x6c4/0x840 (unreliable)
> [    0.000000] [c0e65ef0] [c00e9d74] request_threaded_irq+0xf8/0x1f4
> [    0.000000] [c0e65f20] [c0c27168] pmac_pic_init+0x204/0x5f8
> [    0.000000] [c0e65f80] [c0c1f544] init_IRQ+0xac/0x12c
> [    0.000000] [c0e65fa0] [c0c1cad0] start_kernel+0x544/0x6d4
> 
> Note your line number will be slightly different due to my debug.  This
> is the WARN _after_ the mutex lock.
> 
> > > >
> > > > Either way, invoking mutex_lock() early in boot before interrupts have
> > > > been enabled is a bad idea.  ;-)
> > > 
> > > I'll add that WARN_ON_ONCE() too, and will report back later today...
> > 
> > Thank you, looking forward to hearing the outcome!
> > 
> > > > > Note that there are (possibly different) issues seen on ppc32 and on arm32
> > > > > (Renesas RZ/A in particular, but not on other Renesas ARM systems).
> > > > >
> > > > > I saw an issue on arm32 with cfeb6ae8bcb96ccf, but not with cfeb6ae8bcb96ccf^.
> > > > > Other people saw an issue on ppc32 with both cfeb6ae8bcb96ccf and
> > > > > cfeb6ae8bcb96ccf^.
> > > >
> > > > I look forward to hearing what is the issue in both cases.
> > > 
> > > For RZ/A, my problem report is
> > > https://lore.kernel.org/all/3f86d58e-7f36-c6b4-c43a-2a7bcffd3bd@linux-m68k.org/
> > 
> > Thank you, Geert!
> > 
> > Huh.  Is that patch you reverted causing Maple Tree or related code
> > to attempt to acquire mutexes in early boot before interrupts have
> > been enabled?
> > 
> > If that added WARN_ON_ONCE() doesn't trigger early, another approach
> > would be to put it at the beginning of mutex_lock().  Or for that matter
> > at the beginning of might_sleep().
> 
> Yeah, I put many WARN() calls through the code as well as tracking down
> where TIF_NEED_RESCHED was set; the tiny.c call_rcu().
> 
> 
> So my findings summarized:
> 
> 1. My change to the maple tree makes call_rcu() more likely on early boot.
> 2. The initial thread setup is always set to idle state
> 3. call_rcu() tiny sets TIF_NEED_RESCHED since is_idle_task(current)
> 4. init_IRQ() takes a mutex lock which will enable the interrupts since
> TIF_NEED_RESCHED is set.
> 
> I don't know which of these things is "wrong".

Doing early-boot call_rcu() is OK.

The initial thread eventually becomes the idle thread for the boot CPU.
See rest_init() in init/main.c.

I can certainly make Tiny call_rcu() refrain from invoking resched_cpu()
during boot, as shown in the (untested) patch below.  This might result in
boot-time hangs, though.

The thought of doing mutex_lock() before interrupts are enabled on the
boot CPU strikes me as very wrong.  Others might argue that the fact
that __might_resched() explicitly avoids complaining when system_state
is equal to SYSTEM_BOOTING constitutes evidence that such calls are OK.
(Which might be why enabling debug suppressed the problem.)  Except that
if you actually try sleeping at that time, nothing good can possibly
happen.

So my question is why is it useful to setup interrupts that early, given
that interrupts cannot possibly happen until the boot CPU enables them?

> I also looked into the mtmsr register but decided to consult you lot
> about my findings in hopes that someone with more knowledge of the
> platform or early boot would alleviate the pain so that I could context
> switch or sleep :)  I mean, an mtmsr bug seems like a leap even for the
> issues I create..

Stranger things have happened, but I agree that this would indeed be
quite strange!  ;-)

						Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index fec804b79080..f00fb0855e4b 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -192,7 +192,7 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
 	rcu_ctrlblk.curtail = &head->next;
 	local_irq_restore(flags);
 
-	if (unlikely(is_idle_task(current))) {
+	if (unlikely(is_idle_task(current)) && system_state > SYSTEM_BOOTING) {
 		/* force scheduling for rcu_qs() */
 		resched_cpu(0);
 	}

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-12 14:29                           ` Liam R. Howlett
@ 2023-09-12 15:08                             ` Paul E. McKenney
  2023-09-12 15:27                               ` Christophe Leroy
  0 siblings, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2023-09-12 15:08 UTC (permalink / raw)
  To: Liam R. Howlett, Geert Uytterhoeven, Andrew Morton, maple-tree,
	linux-mm, linux-kernel, stable, linux-renesas-soc,
	Shanker Donthineni

On Tue, Sep 12, 2023 at 10:29:30AM -0400, Liam R. Howlett wrote:
> * Liam R. Howlett <Liam.Howlett@Oracle.com> [230912 09:56]:
> > * Paul E. McKenney <paulmck@kernel.org> [230912 06:00]:
> > > On Tue, Sep 12, 2023 at 10:34:44AM +0200, Geert Uytterhoeven wrote:
> > > > Hi Paul,
> > > > 
> > > > On Tue, Sep 12, 2023 at 10:30 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > On Tue, Sep 12, 2023 at 10:23:37AM +0200, Geert Uytterhoeven wrote:
> > > > > > On Tue, Sep 12, 2023 at 10:14 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > > > On Mon, Sep 11, 2023 at 07:54:52PM -0400, Liam R. Howlett wrote:
> > > > > > > > * Paul E. McKenney <paulmck@kernel.org> [230906 14:03]:
> > > > > > > > > On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote:
> > > > > > > > > > * Paul E. McKenney <paulmck@kernel.org> [230906 13:24]:
> > > > > > > > > > > On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote:
> > > > > > > > > > > > (Adding Paul & Shanker to Cc list.. please see below for why)
> > > > > > > > > > > >
> > > > > > > > > > > > Apologies on the late response, I was away and have been struggling to
> > > > > > > > > > > > get a working PPC32 test environment.
> > > > > > > > > > > >
> > > > > > > > > > > > * Geert Uytterhoeven <geert@linux-m68k.org> [230829 12:42]:
> > > > > > > > > > > > >     Hi Liam,
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, 18 Aug 2023, Liam R. Howlett wrote:
> > > > > > > > > > > > > > The current implementation of append may cause duplicate data and/or
> > > > > > > > > > > > > > incorrect ranges to be returned to a reader during an update.  Although
> > > > > > > > > > > > > > this has not been reported or seen, disable the append write operation
> > > > > > > > > > > > > > while the tree is in rcu mode out of an abundance of caution.
> > > > > > > > > > > >
> > > > > > > > > > > > ...
> > > > > > > > > > > > > >
> > > > > > > >
> > > > > > > > ...
> > > > > > > >
> > > > > > > > > > > > > RCU-related configs:
> > > > > > > > > > > > >
> > > > > > > > > > > > >     $ grep RCU .config
> > > > > > > > > > > > >     # RCU Subsystem
> > > > > > > > > > > > >     CONFIG_TINY_RCU=y
> > > > > > >
> > > > > > > I must have been asleep last time I looked at this.  I was looking at
> > > > > > > Tree RCU.  Please accept my apologies for my lapse.  :-/
> > > > > > >
> > > > > > > However, Tiny RCU's call_rcu() also avoids enabling IRQs, so I would
> > > > > > > have said the same thing, albeit after looking at a lot less RCU code.
> > > > > > >
> > > > > > > TL;DR:
> > > > > > >
> > > > > > > 1.      Try making the __setup_irq() function's call to mutex_lock()
> > > > > > >         instead be as follows:
> > > > > > >
> > > > > > >         if (!mutex_trylock(&desc->request_mutex))
> > > > > > >                 mutex_lock(&desc->request_mutex);
> > > > > > >
> > > > > > >         This might fail if __setup_irq() has other dependencies on a
> > > > > > >         fully operational scheduler.
> 
> This changes where the interrupts become enabled, but doesn't stop it
> from happening.  It still throws a WARN after init_IRQ(). I suspect it
> is not the way to proceed as there are probably many places that will
> need to be changed in both common and arch specific code.  As soon as
> TIF_NEED_RESCHED is set, then all the checks will need to be altered.

Thank you for trying it!

> I think we either need to set the boot thread to !idle, avoid call_rcu()
> to set TIF_NEED_RESCHED (how was this working before?  Do we need rcu
> for the IRQs?), or alter the boot order (note this is NOT arch or
> platform code here).
> 
> I don't like any of these.  I'd like another option, please?

My favorite is to move the interrupt enabling later, but Michael Ellerman
would know better than would I about the feasibility of this.

							Thanx, Paul

> > > > > > >
> > > > > > > 2.      Move that ppc32 call to __setup_irq() much later, most definitely
> > > > > > >         after interrupts have been enabled and the scheduler is fully
> > > > > > >         operational.  Invoking mutex_lock() before that time is not a
> > > > > > >         good idea.  ;-)
> > > > > >
> > > > > > There is no call to __setup_irq() from arch/powerpc/?
> > > > >
> > > > > Glad it is not just me, given that I didn't see a direct call, either.  So
> > > > > later in this email, I asked Liam to put a WARN_ON_ONCE(irqs_disabled())
> > > > > just before that mutex_lock() in __setup_irq().
> 
> Oh, and also:
> arch/powerpc/platforms/powermac/setup.c:        .init_IRQ               = pmac_pic_init,
> 
> > 
> > I had already found that this is the mutex lock that is enabling them.
> > I surrounded the mutex lock to ensure it was not enabled before, but was
> > after.  Here is the findings:
> > 
> > kernel/irq/manage.c:1587 __setup_irq:
> > [    0.000000] [c0e65ec0] [c00e9b00] __setup_irq+0x6c4/0x840 (unreliable)
> > [    0.000000] [c0e65ef0] [c00e9d74] request_threaded_irq+0xf8/0x1f4
> > [    0.000000] [c0e65f20] [c0c27168] pmac_pic_init+0x204/0x5f8
> > [    0.000000] [c0e65f80] [c0c1f544] init_IRQ+0xac/0x12c
> > [    0.000000] [c0e65fa0] [c0c1cad0] start_kernel+0x544/0x6d4
> > 
> > Note your line number will be slightly different due to my debug.  This
> > is the WARN _after_ the mutex lock.
> > 
> > > > >
> > > > > Either way, invoking mutex_lock() early in boot before interrupts have
> > > > > been enabled is a bad idea.  ;-)
> > > > 
> > > > I'll add that WARN_ON_ONCE() too, and will report back later today...
> > > 
> > > Thank you, looking forward to hearing the outcome!
> > > 
> > > > > > Note that there are (possibly different) issues seen on ppc32 and on arm32
> > > > > > (Renesas RZ/A in particular, but not on other Renesas ARM systems).
> > > > > >
> > > > > > I saw an issue on arm32 with cfeb6ae8bcb96ccf, but not with cfeb6ae8bcb96ccf^.
> > > > > > Other people saw an issue on ppc32 with both cfeb6ae8bcb96ccf and
> > > > > > cfeb6ae8bcb96ccf^.
> > > > >
> > > > > I look forward to hearing what is the issue in both cases.
> > > > 
> > > > For RZ/A, my problem report is
> > > > https://lore.kernel.org/all/3f86d58e-7f36-c6b4-c43a-2a7bcffd3bd@linux-m68k.org/
> > > 
> > > Thank you, Geert!
> > > 
> > > Huh.  Is that patch you reverted causing Maple Tree or related code
> > > to attempt to acquire mutexes in early boot before interrupts have
> > > been enabled?
> > > 
> > > If that added WARN_ON_ONCE() doesn't trigger early, another approach
> > > would be to put it at the beginning of mutex_lock().  Or for that matter
> > > at the beginning of might_sleep().
> > 
> > Yeah, I put many WARN() calls through the code as well as tracking down
> > where TIF_NEED_RESCHED was set; the tiny.c call_rcu().
> > 
> > 
> > So my findings summarized:
> > 
> > 1. My change to the maple tree makes call_rcu() more likely on early boot.
> > 2. The initial thread setup is always set to idle state
> > 3. call_rcu() tiny sets TIF_NEED_RESCHED since is_idle_task(current)
> > 4. init_IRQ() takes a mutex lock which will enable the interrupts since
> > TIF_NEED_RESCHED is set.
> > 
> > I don't know which of these things is "wrong".
> > 
> > I also looked into the mtmsr register but decided to consult you lot
> > about my findings in hopes that someone with more knowledge of the
> > platform or early boot would alleviate the pain so that I could context
> > switch or sleep :)  I mean, an mtmsr bug seems like a leap even for the
> > issues I create..
> > 
> > Regards,
> > Liam
> > 
> > 
> > -- 
> > maple-tree mailing list
> > maple-tree@lists.infradead.org
> > https://lists.infradead.org/mailman/listinfo/maple-tree

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-12 15:08                             ` Paul E. McKenney
@ 2023-09-12 15:27                               ` Christophe Leroy
  2023-09-12 15:49                                 ` Liam R. Howlett
  0 siblings, 1 reply; 35+ messages in thread
From: Christophe Leroy @ 2023-09-12 15:27 UTC (permalink / raw)
  To: paulmck, Liam R. Howlett, Geert Uytterhoeven, Andrew Morton,
	maple-tree, linux-mm, linux-kernel, stable, linux-renesas-soc,
	Shanker Donthineni



Le 12/09/2023 à 17:08, Paul E. McKenney a écrit :
> On Tue, Sep 12, 2023 at 10:29:30AM -0400, Liam R. Howlett wrote:
>> * Liam R. Howlett <Liam.Howlett@Oracle.com> [230912 09:56]:
>>> * Paul E. McKenney <paulmck@kernel.org> [230912 06:00]:
>>>> On Tue, Sep 12, 2023 at 10:34:44AM +0200, Geert Uytterhoeven wrote:
>>>>> Hi Paul,
>>>>>
>>>>> On Tue, Sep 12, 2023 at 10:30 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>>>>>> On Tue, Sep 12, 2023 at 10:23:37AM +0200, Geert Uytterhoeven wrote:
>>>>>>> On Tue, Sep 12, 2023 at 10:14 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>>>>>>>> On Mon, Sep 11, 2023 at 07:54:52PM -0400, Liam R. Howlett wrote:
>>>>>>>>> * Paul E. McKenney <paulmck@kernel.org> [230906 14:03]:
>>>>>>>>>> On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote:
>>>>>>>>>>> * Paul E. McKenney <paulmck@kernel.org> [230906 13:24]:
>>>>>>>>>>>> On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote:
>>>>>>>>>>>>> (Adding Paul & Shanker to Cc list.. please see below for why)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Apologies on the late response, I was away and have been struggling to
>>>>>>>>>>>>> get a working PPC32 test environment.
>>>>>>>>>>>>>
>>>>>>>>>>>>> * Geert Uytterhoeven <geert@linux-m68k.org> [230829 12:42]:
>>>>>>>>>>>>>>      Hi Liam,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, 18 Aug 2023, Liam R. Howlett wrote:
>>>>>>>>>>>>>>> The current implementation of append may cause duplicate data and/or
>>>>>>>>>>>>>>> incorrect ranges to be returned to a reader during an update.  Although
>>>>>>>>>>>>>>> this has not been reported or seen, disable the append write operation
>>>>>>>>>>>>>>> while the tree is in rcu mode out of an abundance of caution.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>>>>>>> RCU-related configs:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>      $ grep RCU .config
>>>>>>>>>>>>>>      # RCU Subsystem
>>>>>>>>>>>>>>      CONFIG_TINY_RCU=y
>>>>>>>>
>>>>>>>> I must have been asleep last time I looked at this.  I was looking at
>>>>>>>> Tree RCU.  Please accept my apologies for my lapse.  :-/
>>>>>>>>
>>>>>>>> However, Tiny RCU's call_rcu() also avoids enabling IRQs, so I would
>>>>>>>> have said the same thing, albeit after looking at a lot less RCU code.
>>>>>>>>
>>>>>>>> TL;DR:
>>>>>>>>
>>>>>>>> 1.      Try making the __setup_irq() function's call to mutex_lock()
>>>>>>>>          instead be as follows:
>>>>>>>>
>>>>>>>>          if (!mutex_trylock(&desc->request_mutex))
>>>>>>>>                  mutex_lock(&desc->request_mutex);
>>>>>>>>
>>>>>>>>          This might fail if __setup_irq() has other dependencies on a
>>>>>>>>          fully operational scheduler.
>>
>> This changes where the interrupts become enabled, but doesn't stop it
>> from happening.  It still throws a WARN after init_IRQ(). I suspect it
>> is not the way to proceed as there are probably many places that will
>> need to be changed in both common and arch specific code.  As soon as
>> TIF_NEED_RESCHED is set, then all the checks will need to be altered.
> 
> Thank you for trying it!
> 
>> I think we either need to set the boot thread to !idle, avoid call_rcu()
>> to set TIF_NEED_RESCHED (how was this working before?  Do we need rcu
>> for the IRQs?), or alter the boot order (note this is NOT arch or
>> platform code here).
>>
>> I don't like any of these.  I'd like another option, please?
> 
> My favorite is to move the interrupt enabling later, but Michael Ellerman
> would know better than would I about the feasibility of this.
> 

I digged into it a bit more, looks like IRQs get enabled by the call to 
cond_resched() in the loop in vm_area_alloc_pages(), which is called 
from powerpc's init_IRQ() fonction when allocating IRQ stacks.

And IRQ stacks definitely need to be enabled before IRQs get enabled, so 
there's something wrong here isn't it ?

Christophe

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-12 15:07                           ` Paul E. McKenney
@ 2023-09-12 15:44                             ` Liam R. Howlett
  2023-09-12 16:49                               ` Paul E. McKenney
  2023-09-12 17:09                               ` Christophe Leroy
  0 siblings, 2 replies; 35+ messages in thread
From: Liam R. Howlett @ 2023-09-12 15:44 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Geert Uytterhoeven, Andrew Morton, maple-tree, linux-mm,
	linux-kernel, stable, linux-renesas-soc, Shanker Donthineni

* Paul E. McKenney <paulmck@kernel.org> [230912 11:07]:
> On Tue, Sep 12, 2023 at 09:56:17AM -0400, Liam R. Howlett wrote:
> > * Paul E. McKenney <paulmck@kernel.org> [230912 06:00]:
> > > On Tue, Sep 12, 2023 at 10:34:44AM +0200, Geert Uytterhoeven wrote:
> > > > Hi Paul,
> > > > 
> > > > On Tue, Sep 12, 2023 at 10:30 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > On Tue, Sep 12, 2023 at 10:23:37AM +0200, Geert Uytterhoeven wrote:
> > > > > > On Tue, Sep 12, 2023 at 10:14 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > > > On Mon, Sep 11, 2023 at 07:54:52PM -0400, Liam R. Howlett wrote:
> > > > > > > > * Paul E. McKenney <paulmck@kernel.org> [230906 14:03]:
> > > > > > > > > On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote:
> > > > > > > > > > * Paul E. McKenney <paulmck@kernel.org> [230906 13:24]:
> > > > > > > > > > > On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote:
> > > > > > > > > > > > (Adding Paul & Shanker to Cc list.. please see below for why)
> > > > > > > > > > > >
> > > > > > > > > > > > Apologies on the late response, I was away and have been struggling to
> > > > > > > > > > > > get a working PPC32 test environment.
> > > > > > > > > > > >
> > > > > > > > > > > > * Geert Uytterhoeven <geert@linux-m68k.org> [230829 12:42]:
> > > > > > > > > > > > >     Hi Liam,
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, 18 Aug 2023, Liam R. Howlett wrote:
> > > > > > > > > > > > > > The current implementation of append may cause duplicate data and/or
> > > > > > > > > > > > > > incorrect ranges to be returned to a reader during an update.  Although
> > > > > > > > > > > > > > this has not been reported or seen, disable the append write operation
> > > > > > > > > > > > > > while the tree is in rcu mode out of an abundance of caution.
> > > > > > > > > > > >
> > > > > > > > > > > > ...
> > > > > > > > > > > > > >
> > > > > > > >
> > > > > > > > ...
> > > > > > > >
> > > > > > > > > > > > > RCU-related configs:
> > > > > > > > > > > > >
> > > > > > > > > > > > >     $ grep RCU .config
> > > > > > > > > > > > >     # RCU Subsystem
> > > > > > > > > > > > >     CONFIG_TINY_RCU=y
> > > > > > >
> > > > > > > I must have been asleep last time I looked at this.  I was looking at
> > > > > > > Tree RCU.  Please accept my apologies for my lapse.  :-/
> > > > > > >
> > > > > > > However, Tiny RCU's call_rcu() also avoids enabling IRQs, so I would
> > > > > > > have said the same thing, albeit after looking at a lot less RCU code.
> > > > > > >
> > > > > > > TL;DR:
> > > > > > >
> > > > > > > 1.      Try making the __setup_irq() function's call to mutex_lock()
> > > > > > >         instead be as follows:
> > > > > > >
> > > > > > >         if (!mutex_trylock(&desc->request_mutex))
> > > > > > >                 mutex_lock(&desc->request_mutex);
> > > > > > >
> > > > > > >         This might fail if __setup_irq() has other dependencies on a
> > > > > > >         fully operational scheduler.
> > > > > > >
> > > > > > > 2.      Move that ppc32 call to __setup_irq() much later, most definitely
> > > > > > >         after interrupts have been enabled and the scheduler is fully
> > > > > > >         operational.  Invoking mutex_lock() before that time is not a
> > > > > > >         good idea.  ;-)
> > > > > >
> > > > > > There is no call to __setup_irq() from arch/powerpc/?
> > > > >
> > > > > Glad it is not just me, given that I didn't see a direct call, either.  So
> > > > > later in this email, I asked Liam to put a WARN_ON_ONCE(irqs_disabled())
> > > > > just before that mutex_lock() in __setup_irq().
> > 
> > I had already found that this is the mutex lock that is enabling them.
> > I surrounded the mutex lock to ensure it was not enabled before, but was
> > after.  Here is the findings:
> > 
> > kernel/irq/manage.c:1587 __setup_irq:
> > [    0.000000] [c0e65ec0] [c00e9b00] __setup_irq+0x6c4/0x840 (unreliable)
> > [    0.000000] [c0e65ef0] [c00e9d74] request_threaded_irq+0xf8/0x1f4
> > [    0.000000] [c0e65f20] [c0c27168] pmac_pic_init+0x204/0x5f8
> > [    0.000000] [c0e65f80] [c0c1f544] init_IRQ+0xac/0x12c
> > [    0.000000] [c0e65fa0] [c0c1cad0] start_kernel+0x544/0x6d4
> > 
> > Note your line number will be slightly different due to my debug.  This
> > is the WARN _after_ the mutex lock.
> > 
> > > > >
> > > > > Either way, invoking mutex_lock() early in boot before interrupts have
> > > > > been enabled is a bad idea.  ;-)
> > > > 
> > > > I'll add that WARN_ON_ONCE() too, and will report back later today...
> > > 
> > > Thank you, looking forward to hearing the outcome!
> > > 
> > > > > > Note that there are (possibly different) issues seen on ppc32 and on arm32
> > > > > > (Renesas RZ/A in particular, but not on other Renesas ARM systems).
> > > > > >
> > > > > > I saw an issue on arm32 with cfeb6ae8bcb96ccf, but not with cfeb6ae8bcb96ccf^.
> > > > > > Other people saw an issue on ppc32 with both cfeb6ae8bcb96ccf and
> > > > > > cfeb6ae8bcb96ccf^.
> > > > >
> > > > > I look forward to hearing what is the issue in both cases.
> > > > 
> > > > For RZ/A, my problem report is
> > > > https://lore.kernel.org/all/3f86d58e-7f36-c6b4-c43a-2a7bcffd3bd@linux-m68k.org/
> > > 
> > > Thank you, Geert!
> > > 
> > > Huh.  Is that patch you reverted causing Maple Tree or related code
> > > to attempt to acquire mutexes in early boot before interrupts have
> > > been enabled?
> > > 
> > > If that added WARN_ON_ONCE() doesn't trigger early, another approach
> > > would be to put it at the beginning of mutex_lock().  Or for that matter
> > > at the beginning of might_sleep().
> > 
> > Yeah, I put many WARN() calls through the code as well as tracking down
> > where TIF_NEED_RESCHED was set; the tiny.c call_rcu().
> > 
> > 
> > So my findings summarized:
> > 
> > 1. My change to the maple tree makes call_rcu() more likely on early boot.
> > 2. The initial thread setup is always set to idle state
> > 3. call_rcu() tiny sets TIF_NEED_RESCHED since is_idle_task(current)
> > 4. init_IRQ() takes a mutex lock which will enable the interrupts since
> > TIF_NEED_RESCHED is set.
> > 
> > I don't know which of these things is "wrong".
> 
> Doing early-boot call_rcu() is OK.
> 
> The initial thread eventually becomes the idle thread for the boot CPU.
> See rest_init() in init/main.c.
> 
> I can certainly make Tiny call_rcu() refrain from invoking resched_cpu()
> during boot, as shown in the (untested) patch below.  This might result in
> boot-time hangs, though.

If we set the current thread as !idle, then we don't need to add
overhead to every call_rcu(), and you've already tracked down where I
need to change the flags back to idle.  Patch below.

> 
> The thought of doing mutex_lock() before interrupts are enabled on the
> boot CPU strikes me as very wrong.  Others might argue that the fact
> that __might_resched() explicitly avoids complaining when system_state
> is equal to SYSTEM_BOOTING constitutes evidence that such calls are OK.
> (Which might be why enabling debug suppressed the problem.)  Except that
> if you actually try sleeping at that time, nothing good can possibly
> happen.

Does lockdep check for SYSTEM_BOOTING as well?  That could be another
reason?

> 
> So my question is why is it useful to setup interrupts that early, given
> that interrupts cannot possibly happen until the boot CPU enables them?

I don't know for sure, but there are 'preallocated IRQs' which end up
grouped 0-15, then I see another one added at 55 after the mpic console
output.  I suspect it's so that they can be added as they are discovered
during early boot?

The below is not fully tested, but qemu stops throwing the warning on
boot and it doesn't add instructions to call_rcu().

------------------------------------------------------------------------
diff --git a/init/main.c b/init/main.c
index dbe1fe76be34..fd4739918a94 100644
--- a/init/main.c
+++ b/init/main.c
@@ -696,7 +696,7 @@ noinline void __ref __noreturn rest_init(void)
 	 */
 	rcu_read_lock();
 	tsk = find_task_by_pid_ns(pid, &init_pid_ns);
-	tsk->flags |= PF_NO_SETAFFINITY;
+	tsk->flags |= PF_NO_SETAFFINITY & PF_IDLE;
 	set_cpus_allowed_ptr(tsk, cpumask_of(smp_processor_id()));
 	rcu_read_unlock();
 
@@ -943,6 +943,7 @@ void start_kernel(void)
 	 * time - but meanwhile we still have a functioning scheduler.
 	 */
 	sched_init();
+	current->flags &= ~PF_IDLE;
 
 	if (WARN(!irqs_disabled(),
 		 "Interrupts were enabled *very* early, fixing it\n"))

> ------------------------------------------------------------------------
> 
> diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
> index fec804b79080..f00fb0855e4b 100644
> --- a/kernel/rcu/tiny.c
> +++ b/kernel/rcu/tiny.c
> @@ -192,7 +192,7 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
>  	rcu_ctrlblk.curtail = &head->next;
>  	local_irq_restore(flags);
>  
> -	if (unlikely(is_idle_task(current))) {
> +	if (unlikely(is_idle_task(current)) && system_state > SYSTEM_BOOTING) {
>  		/* force scheduling for rcu_qs() */
>  		resched_cpu(0);
>  	}

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-12 15:27                               ` Christophe Leroy
@ 2023-09-12 15:49                                 ` Liam R. Howlett
  0 siblings, 0 replies; 35+ messages in thread
From: Liam R. Howlett @ 2023-09-12 15:49 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: paulmck, Geert Uytterhoeven, Andrew Morton, maple-tree, linux-mm,
	linux-kernel, stable, linux-renesas-soc, Shanker Donthineni

* Christophe Leroy <christophe.leroy@csgroup.eu> [230912 11:27]:
> 
> 
> Le 12/09/2023 à 17:08, Paul E. McKenney a écrit :
> > On Tue, Sep 12, 2023 at 10:29:30AM -0400, Liam R. Howlett wrote:
> >> * Liam R. Howlett <Liam.Howlett@Oracle.com> [230912 09:56]:
> >>> * Paul E. McKenney <paulmck@kernel.org> [230912 06:00]:
> >>>> On Tue, Sep 12, 2023 at 10:34:44AM +0200, Geert Uytterhoeven wrote:
> >>>>> Hi Paul,
> >>>>>
> >>>>> On Tue, Sep 12, 2023 at 10:30 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >>>>>> On Tue, Sep 12, 2023 at 10:23:37AM +0200, Geert Uytterhoeven wrote:
> >>>>>>> On Tue, Sep 12, 2023 at 10:14 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >>>>>>>> On Mon, Sep 11, 2023 at 07:54:52PM -0400, Liam R. Howlett wrote:
> >>>>>>>>> * Paul E. McKenney <paulmck@kernel.org> [230906 14:03]:
> >>>>>>>>>> On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote:
> >>>>>>>>>>> * Paul E. McKenney <paulmck@kernel.org> [230906 13:24]:
> >>>>>>>>>>>> On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote:
> >>>>>>>>>>>>> (Adding Paul & Shanker to Cc list.. please see below for why)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Apologies on the late response, I was away and have been struggling to
> >>>>>>>>>>>>> get a working PPC32 test environment.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> * Geert Uytterhoeven <geert@linux-m68k.org> [230829 12:42]:
> >>>>>>>>>>>>>>      Hi Liam,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, 18 Aug 2023, Liam R. Howlett wrote:
> >>>>>>>>>>>>>>> The current implementation of append may cause duplicate data and/or
> >>>>>>>>>>>>>>> incorrect ranges to be returned to a reader during an update.  Although
> >>>>>>>>>>>>>>> this has not been reported or seen, disable the append write operation
> >>>>>>>>>>>>>>> while the tree is in rcu mode out of an abundance of caution.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> ...
> >>>>>>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> ...
> >>>>>>>>>
> >>>>>>>>>>>>>> RCU-related configs:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>      $ grep RCU .config
> >>>>>>>>>>>>>>      # RCU Subsystem
> >>>>>>>>>>>>>>      CONFIG_TINY_RCU=y
> >>>>>>>>
> >>>>>>>> I must have been asleep last time I looked at this.  I was looking at
> >>>>>>>> Tree RCU.  Please accept my apologies for my lapse.  :-/
> >>>>>>>>
> >>>>>>>> However, Tiny RCU's call_rcu() also avoids enabling IRQs, so I would
> >>>>>>>> have said the same thing, albeit after looking at a lot less RCU code.
> >>>>>>>>
> >>>>>>>> TL;DR:
> >>>>>>>>
> >>>>>>>> 1.      Try making the __setup_irq() function's call to mutex_lock()
> >>>>>>>>          instead be as follows:
> >>>>>>>>
> >>>>>>>>          if (!mutex_trylock(&desc->request_mutex))
> >>>>>>>>                  mutex_lock(&desc->request_mutex);
> >>>>>>>>
> >>>>>>>>          This might fail if __setup_irq() has other dependencies on a
> >>>>>>>>          fully operational scheduler.
> >>
> >> This changes where the interrupts become enabled, but doesn't stop it
> >> from happening.  It still throws a WARN after init_IRQ(). I suspect it
> >> is not the way to proceed as there are probably many places that will
> >> need to be changed in both common and arch specific code.  As soon as
> >> TIF_NEED_RESCHED is set, then all the checks will need to be altered.
> > 
> > Thank you for trying it!
> > 
> >> I think we either need to set the boot thread to !idle, avoid call_rcu()
> >> to set TIF_NEED_RESCHED (how was this working before?  Do we need rcu
> >> for the IRQs?), or alter the boot order (note this is NOT arch or
> >> platform code here).
> >>
> >> I don't like any of these.  I'd like another option, please?
> > 
> > My favorite is to move the interrupt enabling later, but Michael Ellerman
> > would know better than would I about the feasibility of this.
> > 
> 
> I digged into it a bit more, looks like IRQs get enabled by the call to 
> cond_resched() in the loop in vm_area_alloc_pages(), which is called 
> from powerpc's init_IRQ() fonction when allocating IRQ stacks.

This is another location where the process will enable IRQs because
TIF_NEED_RESCHED was set.

> 
> And IRQ stacks definitely need to be enabled before IRQs get enabled, so 
> there's something wrong here isn't it ?


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-12 15:44                             ` Liam R. Howlett
@ 2023-09-12 16:49                               ` Paul E. McKenney
  2023-09-12 17:02                                 ` Christophe Leroy
  2023-09-12 17:09                               ` Christophe Leroy
  1 sibling, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2023-09-12 16:49 UTC (permalink / raw)
  To: Liam R. Howlett, Geert Uytterhoeven, Andrew Morton, maple-tree,
	linux-mm, linux-kernel, stable, linux-renesas-soc,
	Shanker Donthineni

On Tue, Sep 12, 2023 at 11:44:23AM -0400, Liam R. Howlett wrote:
> * Paul E. McKenney <paulmck@kernel.org> [230912 11:07]:
> > On Tue, Sep 12, 2023 at 09:56:17AM -0400, Liam R. Howlett wrote:
> > > * Paul E. McKenney <paulmck@kernel.org> [230912 06:00]:
> > > > On Tue, Sep 12, 2023 at 10:34:44AM +0200, Geert Uytterhoeven wrote:
> > > > > Hi Paul,
> > > > > 
> > > > > On Tue, Sep 12, 2023 at 10:30 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > > On Tue, Sep 12, 2023 at 10:23:37AM +0200, Geert Uytterhoeven wrote:
> > > > > > > On Tue, Sep 12, 2023 at 10:14 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > > > > On Mon, Sep 11, 2023 at 07:54:52PM -0400, Liam R. Howlett wrote:
> > > > > > > > > * Paul E. McKenney <paulmck@kernel.org> [230906 14:03]:
> > > > > > > > > > On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote:
> > > > > > > > > > > * Paul E. McKenney <paulmck@kernel.org> [230906 13:24]:
> > > > > > > > > > > > On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote:
> > > > > > > > > > > > > (Adding Paul & Shanker to Cc list.. please see below for why)
> > > > > > > > > > > > >
> > > > > > > > > > > > > Apologies on the late response, I was away and have been struggling to
> > > > > > > > > > > > > get a working PPC32 test environment.
> > > > > > > > > > > > >
> > > > > > > > > > > > > * Geert Uytterhoeven <geert@linux-m68k.org> [230829 12:42]:
> > > > > > > > > > > > > >     Hi Liam,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, 18 Aug 2023, Liam R. Howlett wrote:
> > > > > > > > > > > > > > > The current implementation of append may cause duplicate data and/or
> > > > > > > > > > > > > > > incorrect ranges to be returned to a reader during an update.  Although
> > > > > > > > > > > > > > > this has not been reported or seen, disable the append write operation
> > > > > > > > > > > > > > > while the tree is in rcu mode out of an abundance of caution.
> > > > > > > > > > > > >
> > > > > > > > > > > > > ...
> > > > > > > > > > > > > > >
> > > > > > > > >
> > > > > > > > > ...
> > > > > > > > >
> > > > > > > > > > > > > > RCU-related configs:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >     $ grep RCU .config
> > > > > > > > > > > > > >     # RCU Subsystem
> > > > > > > > > > > > > >     CONFIG_TINY_RCU=y
> > > > > > > >
> > > > > > > > I must have been asleep last time I looked at this.  I was looking at
> > > > > > > > Tree RCU.  Please accept my apologies for my lapse.  :-/
> > > > > > > >
> > > > > > > > However, Tiny RCU's call_rcu() also avoids enabling IRQs, so I would
> > > > > > > > have said the same thing, albeit after looking at a lot less RCU code.
> > > > > > > >
> > > > > > > > TL;DR:
> > > > > > > >
> > > > > > > > 1.      Try making the __setup_irq() function's call to mutex_lock()
> > > > > > > >         instead be as follows:
> > > > > > > >
> > > > > > > >         if (!mutex_trylock(&desc->request_mutex))
> > > > > > > >                 mutex_lock(&desc->request_mutex);
> > > > > > > >
> > > > > > > >         This might fail if __setup_irq() has other dependencies on a
> > > > > > > >         fully operational scheduler.
> > > > > > > >
> > > > > > > > 2.      Move that ppc32 call to __setup_irq() much later, most definitely
> > > > > > > >         after interrupts have been enabled and the scheduler is fully
> > > > > > > >         operational.  Invoking mutex_lock() before that time is not a
> > > > > > > >         good idea.  ;-)
> > > > > > >
> > > > > > > There is no call to __setup_irq() from arch/powerpc/?
> > > > > >
> > > > > > Glad it is not just me, given that I didn't see a direct call, either.  So
> > > > > > later in this email, I asked Liam to put a WARN_ON_ONCE(irqs_disabled())
> > > > > > just before that mutex_lock() in __setup_irq().
> > > 
> > > I had already found that this is the mutex lock that is enabling them.
> > > I surrounded the mutex lock to ensure it was not enabled before, but was
> > > after.  Here is the findings:
> > > 
> > > kernel/irq/manage.c:1587 __setup_irq:
> > > [    0.000000] [c0e65ec0] [c00e9b00] __setup_irq+0x6c4/0x840 (unreliable)
> > > [    0.000000] [c0e65ef0] [c00e9d74] request_threaded_irq+0xf8/0x1f4
> > > [    0.000000] [c0e65f20] [c0c27168] pmac_pic_init+0x204/0x5f8
> > > [    0.000000] [c0e65f80] [c0c1f544] init_IRQ+0xac/0x12c
> > > [    0.000000] [c0e65fa0] [c0c1cad0] start_kernel+0x544/0x6d4
> > > 
> > > Note your line number will be slightly different due to my debug.  This
> > > is the WARN _after_ the mutex lock.
> > > 
> > > > > >
> > > > > > Either way, invoking mutex_lock() early in boot before interrupts have
> > > > > > been enabled is a bad idea.  ;-)
> > > > > 
> > > > > I'll add that WARN_ON_ONCE() too, and will report back later today...
> > > > 
> > > > Thank you, looking forward to hearing the outcome!
> > > > 
> > > > > > > Note that there are (possibly different) issues seen on ppc32 and on arm32
> > > > > > > (Renesas RZ/A in particular, but not on other Renesas ARM systems).
> > > > > > >
> > > > > > > I saw an issue on arm32 with cfeb6ae8bcb96ccf, but not with cfeb6ae8bcb96ccf^.
> > > > > > > Other people saw an issue on ppc32 with both cfeb6ae8bcb96ccf and
> > > > > > > cfeb6ae8bcb96ccf^.
> > > > > >
> > > > > > I look forward to hearing what is the issue in both cases.
> > > > > 
> > > > > For RZ/A, my problem report is
> > > > > https://lore.kernel.org/all/3f86d58e-7f36-c6b4-c43a-2a7bcffd3bd@linux-m68k.org/
> > > > 
> > > > Thank you, Geert!
> > > > 
> > > > Huh.  Is that patch you reverted causing Maple Tree or related code
> > > > to attempt to acquire mutexes in early boot before interrupts have
> > > > been enabled?
> > > > 
> > > > If that added WARN_ON_ONCE() doesn't trigger early, another approach
> > > > would be to put it at the beginning of mutex_lock().  Or for that matter
> > > > at the beginning of might_sleep().
> > > 
> > > Yeah, I put many WARN() calls through the code as well as tracking down
> > > where TIF_NEED_RESCHED was set; the tiny.c call_rcu().
> > > 
> > > 
> > > So my findings summarized:
> > > 
> > > 1. My change to the maple tree makes call_rcu() more likely on early boot.
> > > 2. The initial thread setup is always set to idle state
> > > 3. call_rcu() tiny sets TIF_NEED_RESCHED since is_idle_task(current)
> > > 4. init_IRQ() takes a mutex lock which will enable the interrupts since
> > > TIF_NEED_RESCHED is set.
> > > 
> > > I don't know which of these things is "wrong".
> > 
> > Doing early-boot call_rcu() is OK.
> > 
> > The initial thread eventually becomes the idle thread for the boot CPU.
> > See rest_init() in init/main.c.
> > 
> > I can certainly make Tiny call_rcu() refrain from invoking resched_cpu()
> > during boot, as shown in the (untested) patch below.  This might result in
> > boot-time hangs, though.
> 
> If we set the current thread as !idle, then we don't need to add
> overhead to every call_rcu(), and you've already tracked down where I
> need to change the flags back to idle.  Patch below.

I personally like your patch way better than mine, but it will need both
eyes and time on it.  I wouldn't put it past someone to assume that the
boot CPU is running the idle thread early in boot.  :-/

> > The thought of doing mutex_lock() before interrupts are enabled on the
> > boot CPU strikes me as very wrong.  Others might argue that the fact
> > that __might_resched() explicitly avoids complaining when system_state
> > is equal to SYSTEM_BOOTING constitutes evidence that such calls are OK.
> > (Which might be why enabling debug suppressed the problem.)  Except that
> > if you actually try sleeping at that time, nothing good can possibly
> > happen.
> 
> Does lockdep check for SYSTEM_BOOTING as well?  That could be another
> reason?

Not from what I can see, but I could be missing something.

> > So my question is why is it useful to setup interrupts that early, given
> > that interrupts cannot possibly happen until the boot CPU enables them?
> 
> I don't know for sure, but there are 'preallocated IRQs' which end up
> grouped 0-15, then I see another one added at 55 after the mpic console
> output.  I suspect it's so that they can be added as they are discovered
> during early boot?

Christophe argues that the interrupt stacks must be allocated early
on, and that this acquires a mutex.

> The below is not fully tested, but qemu stops throwing the warning on
> boot and it doesn't add instructions to call_rcu().

Two points in its favor!  ;-)

							Thanx, Paul

> ------------------------------------------------------------------------
> diff --git a/init/main.c b/init/main.c
> index dbe1fe76be34..fd4739918a94 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -696,7 +696,7 @@ noinline void __ref __noreturn rest_init(void)
>  	 */
>  	rcu_read_lock();
>  	tsk = find_task_by_pid_ns(pid, &init_pid_ns);
> -	tsk->flags |= PF_NO_SETAFFINITY;
> +	tsk->flags |= PF_NO_SETAFFINITY & PF_IDLE;
>  	set_cpus_allowed_ptr(tsk, cpumask_of(smp_processor_id()));
>  	rcu_read_unlock();
>  
> @@ -943,6 +943,7 @@ void start_kernel(void)
>  	 * time - but meanwhile we still have a functioning scheduler.
>  	 */
>  	sched_init();
> +	current->flags &= ~PF_IDLE;
>  
>  	if (WARN(!irqs_disabled(),
>  		 "Interrupts were enabled *very* early, fixing it\n"))
> 
> > ------------------------------------------------------------------------
> > 
> > diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
> > index fec804b79080..f00fb0855e4b 100644
> > --- a/kernel/rcu/tiny.c
> > +++ b/kernel/rcu/tiny.c
> > @@ -192,7 +192,7 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
> >  	rcu_ctrlblk.curtail = &head->next;
> >  	local_irq_restore(flags);
> >  
> > -	if (unlikely(is_idle_task(current))) {
> > +	if (unlikely(is_idle_task(current)) && system_state > SYSTEM_BOOTING) {
> >  		/* force scheduling for rcu_qs() */
> >  		resched_cpu(0);
> >  	}

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-12 16:49                               ` Paul E. McKenney
@ 2023-09-12 17:02                                 ` Christophe Leroy
  2023-09-13  6:33                                   ` Christophe Leroy
  0 siblings, 1 reply; 35+ messages in thread
From: Christophe Leroy @ 2023-09-12 17:02 UTC (permalink / raw)
  To: paulmck, Liam R. Howlett, Geert Uytterhoeven, Andrew Morton,
	maple-tree, linux-mm, linux-kernel, stable, linux-renesas-soc,
	Shanker Donthineni



Le 12/09/2023 à 18:49, Paul E. McKenney a écrit :
> On Tue, Sep 12, 2023 at 11:44:23AM -0400, Liam R. Howlett wrote:
>>> So my question is why is it useful to setup interrupts that early, given
>>> that interrupts cannot possibly happen until the boot CPU enables them?
>>
>> I don't know for sure, but there are 'preallocated IRQs' which end up
>> grouped 0-15, then I see another one added at 55 after the mpic console
>> output.  I suspect it's so that they can be added as they are discovered
>> during early boot?
> 
> Christophe argues that the interrupt stacks must be allocated early
> on, and that this acquires a mutex.
> 

Well, we can probably allocate them later than it is today.

In commit 547db12fd8a0 ("powerpc/32: Use vmapped stacks for interrupts") 
I already pushed the allocation at a later stage than it initialy was.

We can probably do it later if it helps, however it definitely needs to 
be done before enabling IRQs for obvious reasons, so it is a problem 
that alloc_vm_stack() calling __vmalloc_node() enables IRQs.

Christophe

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-12 15:44                             ` Liam R. Howlett
  2023-09-12 16:49                               ` Paul E. McKenney
@ 2023-09-12 17:09                               ` Christophe Leroy
  2023-09-12 17:38                                 ` Liam R. Howlett
  1 sibling, 1 reply; 35+ messages in thread
From: Christophe Leroy @ 2023-09-12 17:09 UTC (permalink / raw)
  To: Liam R. Howlett, Paul E. McKenney, Geert Uytterhoeven,
	Andrew Morton, maple-tree, linux-mm, linux-kernel, stable,
	linux-renesas-soc, Shanker Donthineni



Le 12/09/2023 à 17:44, Liam R. Howlett a écrit :
> diff --git a/init/main.c b/init/main.c
> index dbe1fe76be34..fd4739918a94 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -696,7 +696,7 @@ noinline void __ref __noreturn rest_init(void)
>   	 */
>   	rcu_read_lock();
>   	tsk = find_task_by_pid_ns(pid, &init_pid_ns);
> -	tsk->flags |= PF_NO_SETAFFINITY;
> +	tsk->flags |= PF_NO_SETAFFINITY & PF_IDLE;

Is it really what you want to do ?

PF_NO_SETAFFINITY is 0x04000000 and PF_IDLE is 0x00000002 so

PF_NO_SETAFFINITY & PF_IDLE is 0


Didn't you mean to do PF_NO_SETAFFINITY | PF_IDLE  ?


Regardless, with either change I don't get the warning anymore.



>   	set_cpus_allowed_ptr(tsk, cpumask_of(smp_processor_id()));
>   	rcu_read_unlock();
>   
> @@ -943,6 +943,7 @@ void start_kernel(void)
>   	 * time - but meanwhile we still have a functioning scheduler.
>   	 */
>   	sched_init();
> +	current->flags &= ~PF_IDLE;
>   
>   	if (WARN(!irqs_disabled(),
>   		 "Interrupts were enabled *very* early, fixing it\n"))
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-12 17:09                               ` Christophe Leroy
@ 2023-09-12 17:38                                 ` Liam R. Howlett
  0 siblings, 0 replies; 35+ messages in thread
From: Liam R. Howlett @ 2023-09-12 17:38 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Paul E. McKenney, Geert Uytterhoeven, Andrew Morton, maple-tree,
	linux-mm, linux-kernel, stable, linux-renesas-soc,
	Shanker Donthineni

* Christophe Leroy <christophe.leroy@csgroup.eu> [230912 13:09]:
> 
> 
> Le 12/09/2023 à 17:44, Liam R. Howlett a écrit :
> > diff --git a/init/main.c b/init/main.c
> > index dbe1fe76be34..fd4739918a94 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -696,7 +696,7 @@ noinline void __ref __noreturn rest_init(void)
> >   	 */
> >   	rcu_read_lock();
> >   	tsk = find_task_by_pid_ns(pid, &init_pid_ns);
> > -	tsk->flags |= PF_NO_SETAFFINITY;
> > +	tsk->flags |= PF_NO_SETAFFINITY & PF_IDLE;
> 
> Is it really what you want to do ?
> 
> PF_NO_SETAFFINITY is 0x04000000 and PF_IDLE is 0x00000002 so
> 
> PF_NO_SETAFFINITY & PF_IDLE is 0
> 
> 
> Didn't you mean to do PF_NO_SETAFFINITY | PF_IDLE  ?

Yes, certainly.

> 
> 
> Regardless, with either change I don't get the warning anymore.

I don't have it fully tested but we avoid getting the call_rcu() setting
the TIF_... flag by avoiding the task having PF_IDLE set in the flags.
I'm not entirely sure if I have added the set/clear in the best
locations either.

The largest concern I have is that this could potentially change arch or
platfrom code if anything depends on this being idle.

> 
> 
> 
> >   	set_cpus_allowed_ptr(tsk, cpumask_of(smp_processor_id()));
> >   	rcu_read_unlock();
> >   
> > @@ -943,6 +943,7 @@ void start_kernel(void)
> >   	 * time - but meanwhile we still have a functioning scheduler.
> >   	 */
> >   	sched_init();
> > +	current->flags &= ~PF_IDLE;
> >   
> >   	if (WARN(!irqs_disabled(),
> >   		 "Interrupts were enabled *very* early, fixing it\n"))
> > 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-12 17:02                                 ` Christophe Leroy
@ 2023-09-13  6:33                                   ` Christophe Leroy
  0 siblings, 0 replies; 35+ messages in thread
From: Christophe Leroy @ 2023-09-13  6:33 UTC (permalink / raw)
  To: paulmck, Liam R. Howlett, Geert Uytterhoeven, Andrew Morton,
	maple-tree, linux-mm, linux-kernel, stable, linux-renesas-soc,
	Shanker Donthineni, Linux ARM



Le 12/09/2023 à 19:02, Christophe Leroy a écrit :
> 
> 
> Le 12/09/2023 à 18:49, Paul E. McKenney a écrit :
>> On Tue, Sep 12, 2023 at 11:44:23AM -0400, Liam R. Howlett wrote:
>>>> So my question is why is it useful to setup interrupts that early, given
>>>> that interrupts cannot possibly happen until the boot CPU enables them?
>>>
>>> I don't know for sure, but there are 'preallocated IRQs' which end up
>>> grouped 0-15, then I see another one added at 55 after the mpic console
>>> output.  I suspect it's so that they can be added as they are discovered
>>> during early boot?
>>
>> Christophe argues that the interrupt stacks must be allocated early
>> on, and that this acquires a mutex.
>>
> 
> Well, we can probably allocate them later than it is today.
> 
> In commit 547db12fd8a0 ("powerpc/32: Use vmapped stacks for interrupts")
> I already pushed the allocation at a later stage than it initialy was.
> 
> We can probably do it later if it helps, however it definitely needs to
> be done before enabling IRQs for obvious reasons, so it is a problem
> that alloc_vm_stack() calling __vmalloc_node() enables IRQs.

For FWIW, looks like arm32 is doing similar so the problem is likely to 
happen for them too. For that both CONFIG_IRQSTACKS and 
CONFIG_VMAP_STACK must be set it seems.

Christophe

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-12 10:00                       ` Paul E. McKenney
  2023-09-12 13:56                         ` Liam R. Howlett
@ 2023-09-13 13:14                         ` Geert Uytterhoeven
  2023-09-13 13:24                           ` Liam R. Howlett
  1 sibling, 1 reply; 35+ messages in thread
From: Geert Uytterhoeven @ 2023-09-13 13:14 UTC (permalink / raw)
  To: paulmck
  Cc: Liam R. Howlett, Andrew Morton, maple-tree, linux-mm,
	linux-kernel, stable, linux-renesas-soc, Shanker Donthineni

Hi Paul,

On Tue, Sep 12, 2023 at 12:00 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> On Tue, Sep 12, 2023 at 10:34:44AM +0200, Geert Uytterhoeven wrote:
> > On Tue, Sep 12, 2023 at 10:30 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > On Tue, Sep 12, 2023 at 10:23:37AM +0200, Geert Uytterhoeven wrote:
> > > > On Tue, Sep 12, 2023 at 10:14 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > On Mon, Sep 11, 2023 at 07:54:52PM -0400, Liam R. Howlett wrote:
> > > > > > * Paul E. McKenney <paulmck@kernel.org> [230906 14:03]:
> > > > > > > On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote:
> > > > > > > > * Paul E. McKenney <paulmck@kernel.org> [230906 13:24]:
> > > > > > > > > On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote:
> > > > > > > > > > (Adding Paul & Shanker to Cc list.. please see below for why)
> > > > > > > > > >
> > > > > > > > > > Apologies on the late response, I was away and have been struggling to
> > > > > > > > > > get a working PPC32 test environment.
> > > > > > > > > >
> > > > > > > > > > * Geert Uytterhoeven <geert@linux-m68k.org> [230829 12:42]:
> > > > > > > > > > >     Hi Liam,
> > > > > > > > > > >
> > > > > > > > > > > On Fri, 18 Aug 2023, Liam R. Howlett wrote:
> > > > > > > > > > > > The current implementation of append may cause duplicate data and/or
> > > > > > > > > > > > incorrect ranges to be returned to a reader during an update.  Although
> > > > > > > > > > > > this has not been reported or seen, disable the append write operation
> > > > > > > > > > > > while the tree is in rcu mode out of an abundance of caution.
> > > > > > > > > >
> > > > > > > > > > ...
> > > > > > > > > > > >
> > > > > >
> > > > > > ...
> > > > > >
> > > > > > > > > > > RCU-related configs:
> > > > > > > > > > >
> > > > > > > > > > >     $ grep RCU .config
> > > > > > > > > > >     # RCU Subsystem
> > > > > > > > > > >     CONFIG_TINY_RCU=y
> > > > >
> > > > > I must have been asleep last time I looked at this.  I was looking at
> > > > > Tree RCU.  Please accept my apologies for my lapse.  :-/
> > > > >
> > > > > However, Tiny RCU's call_rcu() also avoids enabling IRQs, so I would
> > > > > have said the same thing, albeit after looking at a lot less RCU code.
> > > > >
> > > > > TL;DR:
> > > > >
> > > > > 1.      Try making the __setup_irq() function's call to mutex_lock()
> > > > >         instead be as follows:
> > > > >
> > > > >         if (!mutex_trylock(&desc->request_mutex))
> > > > >                 mutex_lock(&desc->request_mutex);
> > > > >
> > > > >         This might fail if __setup_irq() has other dependencies on a
> > > > >         fully operational scheduler.
> > > > >
> > > > > 2.      Move that ppc32 call to __setup_irq() much later, most definitely
> > > > >         after interrupts have been enabled and the scheduler is fully
> > > > >         operational.  Invoking mutex_lock() before that time is not a
> > > > >         good idea.  ;-)
> > > >
> > > > There is no call to __setup_irq() from arch/powerpc/?
> > >
> > > Glad it is not just me, given that I didn't see a direct call, either.  So
> > > later in this email, I asked Liam to put a WARN_ON_ONCE(irqs_disabled())
> > > just before that mutex_lock() in __setup_irq().
> > >
> > > Either way, invoking mutex_lock() early in boot before interrupts have
> > > been enabled is a bad idea.  ;-)
> >
> > I'll add that WARN_ON_ONCE() too, and will report back later today...
>
> Thank you, looking forward to hearing the outcome!

On a typical arm32 system with global or architectured timer, I get:

WARNING: CPU: 0 PID: 0 at kernel/irq/manage.c:1587 __setup_irq+0xcc/0x6d0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted
6.6.0-rc1-shmobile-02354-g24e058b77f5a #1655
Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
 unwind_backtrace from show_stack+0x10/0x14
 show_stack from dump_stack_lvl+0x40/0x4c
 dump_stack_lvl from __warn+0x78/0x10c
 __warn from warn_slowpath_fmt+0x90/0x11c
 warn_slowpath_fmt from __setup_irq+0xcc/0x6d0
 __setup_irq from __request_percpu_irq+0xb8/0xd0
 __request_percpu_irq from set_smp_ipi_range+0x88/0xdc
 set_smp_ipi_range from gic_of_init+0x1a4/0x4d8
 gic_of_init from of_irq_init+0x1f0/0x320
 of_irq_init from init_IRQ+0x74/0x104
 init_IRQ from start_kernel+0x360/0x5d0
 start_kernel from 0x0

Likewise on arm64 with architectured timer.
But on these systems I do not see the issue I reported.

On RISC-V:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at kernel/irq/manage.c:1587 __setup_irq+0x4e6/0x5ee
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc1-02355-g63165363c6a3 #63
Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
epc : __setup_irq+0x4e6/0x5ee
 ra : __setup_irq+0x38/0x5ee
epc : ffffffff80049472 ra : ffffffff80048fc4 sp : ffffffff81203cd0
 gp : ffffffff812ee760 tp : ffffffff8120d5c0 t0 : ffffffd801854300
 t1 : 0000000000000000 t2 : ffffffff81000ad8 s0 : ffffffff81203d20
 s1 : ffffffd801855000 a0 : 0000000000000001 a1 : ffffffd801855000
 a2 : ffffffd801854280 a3 : 0000000000000001 a4 : 0000000000000000
 a5 : 0000000000000000 a6 : ffffffd801852108 a7 : ffffffd801852100
 s2 : ffffffd801854280 s3 : 0000000000000005 s4 : ffffffff812c54c0
 s5 : 0000000000000005 s6 : ffffffff80dd83a0 s7 : ffffffff805c0cc0
 s8 : ffffffd801855018 s9 : 0000000000000000 s10: 0000000000000000
 s11: 000000007bf638a0 t3 : 0000000000000000 t4 : 0000000000000002
 t5 : ffffffff812882a0 t6 : 0000000000000001
status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
[<ffffffff80049472>] __setup_irq+0x4e6/0x5ee
[<ffffffff800497a8>] __request_percpu_irq+0x98/0xcc
[<ffffffff8082501e>] riscv_timer_init_dt+0x186/0x22e
[<ffffffff80824b62>] timer_probe+0x62/0xd2
[<ffffffff80803c36>] time_init+0x86/0xa6
[<ffffffff80800ae2>] start_kernel+0x436/0x618
---[ end trace 0000000000000000 ]---

Also, no issue here.

On the affected systems (RZ/A1 and RZ/A2), the WARN_ON_ONCE() did
not trigger, until I applied Liam's patch ("init/main: Clear boot task
idle flag"), which got rid of the "Interrupts were enabled early" warning,
and now tells me, as expected:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at kernel/irq/manage.c:1587 __setup_irq+0xc8/0x654
CPU: 0 PID: 0 Comm: swapper Not tainted
6.6.0-rc1-rskrza1-02357-g237e09fd64b8-dirty #829
Hardware name: Generic R7S72100 (Flattened Device Tree)
 unwind_backtrace from show_stack+0x10/0x14
 show_stack from dump_stack_lvl+0x24/0x3c
 dump_stack_lvl from __warn+0x74/0xb8
 __warn from warn_slowpath_fmt+0x78/0xb0
 warn_slowpath_fmt from __setup_irq+0xc8/0x654
 __setup_irq from request_threaded_irq+0xac/0x13c
 request_threaded_irq from timer_of_init+0x238/0x2c8
 timer_of_init from ostm_init+0x98/0x208
 ostm_init from timer_probe+0x90/0xe4
 timer_probe from start_kernel+0x2c0/0x488
 start_kernel from 0x0
---[ end trace 0000000000000000 ]---

However, Liam's patch causes lots of warnings on the other systems...

> > > > Note that there are (possibly different) issues seen on ppc32 and on arm32
> > > > (Renesas RZ/A in particular, but not on other Renesas ARM systems).
> > > >
> > > > I saw an issue on arm32 with cfeb6ae8bcb96ccf, but not with cfeb6ae8bcb96ccf^.
> > > > Other people saw an issue on ppc32 with both cfeb6ae8bcb96ccf and
> > > > cfeb6ae8bcb96ccf^.
> > >
> > > I look forward to hearing what is the issue in both cases.
> >
> > For RZ/A, my problem report is
> > https://lore.kernel.org/all/3f86d58e-7f36-c6b4-c43a-2a7bcffd3bd@linux-m68k.org/
>
> Thank you, Geert!
>
> Huh.  Is that patch you reverted causing Maple Tree or related code
> to attempt to acquire mutexes in early boot before interrupts have
> been enabled?
>
> If that added WARN_ON_ONCE() doesn't trigger early, another approach
> would be to put it at the beginning of mutex_lock().  Or for that matter
> at the beginning of might_sleep().

With the WARN_ON_ONCE() moved from __setup_irq() to mutex_lock(),
it does trigger on RZ/A1:

 Dentry cache hash table entries: 4096 (order: 2, 16384 bytes, linear)
 Inode-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
 Built 1 zonelists, mobility grouping off.  Total pages: 8128
+------------[ cut here ]------------
+WARNING: CPU: 0 PID: 0 at kernel/locking/mutex.c:283 mutex_lock+0x3c/0x68
+CPU: 0 PID: 0 Comm: swapper Not tainted
6.6.0-rc1-rskrza1-02360-gd762084b1737-dirty #831
+Hardware name: Generic R7S72100 (Flattened Device Tree)
+ unwind_backtrace from show_stack+0x10/0x14
+ show_stack from dump_stack_lvl+0x24/0x3c
+ dump_stack_lvl from __warn+0x74/0xb8
+ __warn from warn_slowpath_fmt+0x78/0xb0
+ warn_slowpath_fmt from mutex_lock+0x3c/0x68
+ mutex_lock from __cpuhp_setup_state_cpuslocked+0x44/0x1c0
+ __cpuhp_setup_state_cpuslocked from page_alloc_init_cpuhp+0x28/0x64
+ page_alloc_init_cpuhp from mm_core_init+0x18/0x2a4
+ mm_core_init from start_kernel+0x250/0x47c
+ start_kernel from 0x0
+---[ end trace 0000000000000000 ]---
 mem auto-init: stack:off, heap alloc:off, heap free:off
 Memory: 22876K/32768K available (5120K kernel code, 951K rwdata,
1208K rodata, 1024K init, 321K bss, 9892K reserved, 0K cma-reserved)
 SLUB: HWalign=64, Order=0-1, MinObjects=0, CPUs=1, Nodes=1

Thanks!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-13 13:14                         ` Geert Uytterhoeven
@ 2023-09-13 13:24                           ` Liam R. Howlett
  2023-09-13 13:26                             ` Geert Uytterhoeven
  0 siblings, 1 reply; 35+ messages in thread
From: Liam R. Howlett @ 2023-09-13 13:24 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: paulmck, Andrew Morton, maple-tree, linux-mm, linux-kernel,
	stable, linux-renesas-soc, Shanker Donthineni

* Geert Uytterhoeven <geert@linux-m68k.org> [230913 09:15]:
> Hi Paul,
> 
> On Tue, Sep 12, 2023 at 12:00 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > On Tue, Sep 12, 2023 at 10:34:44AM +0200, Geert Uytterhoeven wrote:
> > > On Tue, Sep 12, 2023 at 10:30 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > On Tue, Sep 12, 2023 at 10:23:37AM +0200, Geert Uytterhoeven wrote:
> > > > > On Tue, Sep 12, 2023 at 10:14 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > > On Mon, Sep 11, 2023 at 07:54:52PM -0400, Liam R. Howlett wrote:
> > > > > > > * Paul E. McKenney <paulmck@kernel.org> [230906 14:03]:
> > > > > > > > On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote:
> > > > > > > > > * Paul E. McKenney <paulmck@kernel.org> [230906 13:24]:
> > > > > > > > > > On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote:
> > > > > > > > > > > (Adding Paul & Shanker to Cc list.. please see below for why)
> > > > > > > > > > >
> > > > > > > > > > > Apologies on the late response, I was away and have been struggling to
> > > > > > > > > > > get a working PPC32 test environment.
> > > > > > > > > > >
> > > > > > > > > > > * Geert Uytterhoeven <geert@linux-m68k.org> [230829 12:42]:
> > > > > > > > > > > >     Hi Liam,
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, 18 Aug 2023, Liam R. Howlett wrote:
> > > > > > > > > > > > > The current implementation of append may cause duplicate data and/or
> > > > > > > > > > > > > incorrect ranges to be returned to a reader during an update.  Although
> > > > > > > > > > > > > this has not been reported or seen, disable the append write operation
> > > > > > > > > > > > > while the tree is in rcu mode out of an abundance of caution.
> > > > > > > > > > >
> > > > > > > > > > > ...
> > > > > > > > > > > > >
> > > > > > >
> > > > > > > ...
> > > > > > >
> > > > > > > > > > > > RCU-related configs:
> > > > > > > > > > > >
> > > > > > > > > > > >     $ grep RCU .config
> > > > > > > > > > > >     # RCU Subsystem
> > > > > > > > > > > >     CONFIG_TINY_RCU=y
> > > > > >
> > > > > > I must have been asleep last time I looked at this.  I was looking at
> > > > > > Tree RCU.  Please accept my apologies for my lapse.  :-/
> > > > > >
> > > > > > However, Tiny RCU's call_rcu() also avoids enabling IRQs, so I would
> > > > > > have said the same thing, albeit after looking at a lot less RCU code.
> > > > > >
> > > > > > TL;DR:
> > > > > >
> > > > > > 1.      Try making the __setup_irq() function's call to mutex_lock()
> > > > > >         instead be as follows:
> > > > > >
> > > > > >         if (!mutex_trylock(&desc->request_mutex))
> > > > > >                 mutex_lock(&desc->request_mutex);
> > > > > >
> > > > > >         This might fail if __setup_irq() has other dependencies on a
> > > > > >         fully operational scheduler.
> > > > > >
> > > > > > 2.      Move that ppc32 call to __setup_irq() much later, most definitely
> > > > > >         after interrupts have been enabled and the scheduler is fully
> > > > > >         operational.  Invoking mutex_lock() before that time is not a
> > > > > >         good idea.  ;-)
> > > > >
> > > > > There is no call to __setup_irq() from arch/powerpc/?
> > > >
> > > > Glad it is not just me, given that I didn't see a direct call, either.  So
> > > > later in this email, I asked Liam to put a WARN_ON_ONCE(irqs_disabled())
> > > > just before that mutex_lock() in __setup_irq().
> > > >
> > > > Either way, invoking mutex_lock() early in boot before interrupts have
> > > > been enabled is a bad idea.  ;-)
> > >
> > > I'll add that WARN_ON_ONCE() too, and will report back later today...
> >
> > Thank you, looking forward to hearing the outcome!
> 
> On a typical arm32 system with global or architectured timer, I get:
> 
> WARNING: CPU: 0 PID: 0 at kernel/irq/manage.c:1587 __setup_irq+0xcc/0x6d0
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> 6.6.0-rc1-shmobile-02354-g24e058b77f5a #1655
> Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
>  unwind_backtrace from show_stack+0x10/0x14
>  show_stack from dump_stack_lvl+0x40/0x4c
>  dump_stack_lvl from __warn+0x78/0x10c
>  __warn from warn_slowpath_fmt+0x90/0x11c
>  warn_slowpath_fmt from __setup_irq+0xcc/0x6d0
>  __setup_irq from __request_percpu_irq+0xb8/0xd0
>  __request_percpu_irq from set_smp_ipi_range+0x88/0xdc
>  set_smp_ipi_range from gic_of_init+0x1a4/0x4d8
>  gic_of_init from of_irq_init+0x1f0/0x320
>  of_irq_init from init_IRQ+0x74/0x104
>  init_IRQ from start_kernel+0x360/0x5d0
>  start_kernel from 0x0
> 
> Likewise on arm64 with architectured timer.
> But on these systems I do not see the issue I reported.
> 
> On RISC-V:
> 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 0 at kernel/irq/manage.c:1587 __setup_irq+0x4e6/0x5ee
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc1-02355-g63165363c6a3 #63
> Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT)
> epc : __setup_irq+0x4e6/0x5ee
>  ra : __setup_irq+0x38/0x5ee
> epc : ffffffff80049472 ra : ffffffff80048fc4 sp : ffffffff81203cd0
>  gp : ffffffff812ee760 tp : ffffffff8120d5c0 t0 : ffffffd801854300
>  t1 : 0000000000000000 t2 : ffffffff81000ad8 s0 : ffffffff81203d20
>  s1 : ffffffd801855000 a0 : 0000000000000001 a1 : ffffffd801855000
>  a2 : ffffffd801854280 a3 : 0000000000000001 a4 : 0000000000000000
>  a5 : 0000000000000000 a6 : ffffffd801852108 a7 : ffffffd801852100
>  s2 : ffffffd801854280 s3 : 0000000000000005 s4 : ffffffff812c54c0
>  s5 : 0000000000000005 s6 : ffffffff80dd83a0 s7 : ffffffff805c0cc0
>  s8 : ffffffd801855018 s9 : 0000000000000000 s10: 0000000000000000
>  s11: 000000007bf638a0 t3 : 0000000000000000 t4 : 0000000000000002
>  t5 : ffffffff812882a0 t6 : 0000000000000001
> status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
> [<ffffffff80049472>] __setup_irq+0x4e6/0x5ee
> [<ffffffff800497a8>] __request_percpu_irq+0x98/0xcc
> [<ffffffff8082501e>] riscv_timer_init_dt+0x186/0x22e
> [<ffffffff80824b62>] timer_probe+0x62/0xd2
> [<ffffffff80803c36>] time_init+0x86/0xa6
> [<ffffffff80800ae2>] start_kernel+0x436/0x618
> ---[ end trace 0000000000000000 ]---
> 
> Also, no issue here.
> 
> On the affected systems (RZ/A1 and RZ/A2), the WARN_ON_ONCE() did
> not trigger, until I applied Liam's patch ("init/main: Clear boot task
> idle flag"), which got rid of the "Interrupts were enabled early" warning,
> and now tells me, as expected:
> 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 0 at kernel/irq/manage.c:1587 __setup_irq+0xc8/0x654
> CPU: 0 PID: 0 Comm: swapper Not tainted
> 6.6.0-rc1-rskrza1-02357-g237e09fd64b8-dirty #829
> Hardware name: Generic R7S72100 (Flattened Device Tree)
>  unwind_backtrace from show_stack+0x10/0x14
>  show_stack from dump_stack_lvl+0x24/0x3c
>  dump_stack_lvl from __warn+0x74/0xb8
>  __warn from warn_slowpath_fmt+0x78/0xb0
>  warn_slowpath_fmt from __setup_irq+0xc8/0x654
>  __setup_irq from request_threaded_irq+0xac/0x13c
>  request_threaded_irq from timer_of_init+0x238/0x2c8
>  timer_of_init from ostm_init+0x98/0x208
>  ostm_init from timer_probe+0x90/0xe4
>  timer_probe from start_kernel+0x2c0/0x488
>  start_kernel from 0x0
> ---[ end trace 0000000000000000 ]---
> 
> However, Liam's patch causes lots of warnings on the other systems...
> 

What patch?  The "init/main" patch?

What systems and what are they?

Thanks,
Liam

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-13 13:24                           ` Liam R. Howlett
@ 2023-09-13 13:26                             ` Geert Uytterhoeven
  0 siblings, 0 replies; 35+ messages in thread
From: Geert Uytterhoeven @ 2023-09-13 13:26 UTC (permalink / raw)
  To: Liam R. Howlett, Geert Uytterhoeven, paulmck, Andrew Morton,
	maple-tree, linux-mm, linux-kernel, stable, linux-renesas-soc,
	Shanker Donthineni

Hi Liam,

On Wed, Sep 13, 2023 at 3:24 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> * Geert Uytterhoeven <geert@linux-m68k.org> [230913 09:15]:
> > On the affected systems (RZ/A1 and RZ/A2), the WARN_ON_ONCE() did
> > not trigger, until I applied Liam's patch ("init/main: Clear boot task
> > idle flag"), which got rid of the "Interrupts were enabled early" warning,
> > and now tells me, as expected:

[...]

> > However, Liam's patch causes lots of warnings on the other systems...
> >
>
> What patch?  The "init/main" patch?

Indeed.

> What systems and what are they?

Reported in the thread:
https://lore.kernel.org/all/CAMuHMdWR68a49=vthdp03stpvaHLS5BRa+rhVdnr7gQDFkNotQ@mail.gmail.com

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible
  2023-09-11 12:27     ` Linux regression tracking #adding (Thorsten Leemhuis)
@ 2023-10-16  8:29       ` Linux regression tracking #update (Thorsten Leemhuis)
  0 siblings, 0 replies; 35+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2023-10-16  8:29 UTC (permalink / raw)
  To: Linux kernel regressions list; +Cc: linux-mm, linux-kernel, linux-renesas-soc



On 11.09.23 14:27, Linux regression tracking #adding (Thorsten Leemhuis)
wrote:
> On 29.08.23 18:42, Geert Uytterhoeven wrote:
>>
>> On Fri, 18 Aug 2023, Liam R. Howlett wrote:
>>> The current implementation of append may cause duplicate data and/or
>>> incorrect ranges to be returned to a reader during an update.  Although
>>> this has not been reported or seen, disable the append write operation
>>> while the tree is in rcu mode out of an abundance of caution.
>>>
>>> During the analysis of the mas_next_slot() the following was
>>> artificially created by separating the writer and reader code:
>> [...]
>> Thanks for your patch, which is now commit cfeb6ae8bcb96ccf
>> ("maple_tree: disable mas_wr_append() when other readers are
>> possible") in v6.5, and is being backported to stable.
>>
>> On Renesas RZ/A1 and RZ/A2 (single-core Cortex-A9), this causes the
>> following warning:
>>> […]
>> Reverting this commit fixes the issue.

#regzbot fix: cff9b2332ab762
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2023-10-16  8:29 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20230819004356.1454718-1-Liam.Howlett@oracle.com>
     [not found] ` <20230819004356.1454718-2-Liam.Howlett@oracle.com>
2023-08-29 16:42   ` [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible Geert Uytterhoeven
2023-08-31  5:39     ` Michael Ellerman
2023-08-31  8:25       ` Geert Uytterhoeven
2023-08-31  8:45         ` Peng Zhang
2023-08-31  9:43           ` Geert Uytterhoeven
     [not found]     ` <20230906152325.dblzauybyoq5kd35@revolver>
     [not found]       ` <ad298077-fca8-437e-b9e3-66e31424afb1@paulmck-laptop>
2023-09-06 17:29         ` Liam R. Howlett
2023-09-06 18:02           ` Paul E. McKenney
2023-09-11 23:54             ` Liam R. Howlett
2023-09-12  8:14               ` Paul E. McKenney
2023-09-12  8:23                 ` Geert Uytterhoeven
2023-09-12  8:30                   ` Paul E. McKenney
2023-09-12  8:34                     ` Geert Uytterhoeven
2023-09-12 10:00                       ` Paul E. McKenney
2023-09-12 13:56                         ` Liam R. Howlett
2023-09-12 14:29                           ` Liam R. Howlett
2023-09-12 15:08                             ` Paul E. McKenney
2023-09-12 15:27                               ` Christophe Leroy
2023-09-12 15:49                                 ` Liam R. Howlett
2023-09-12 15:07                           ` Paul E. McKenney
2023-09-12 15:44                             ` Liam R. Howlett
2023-09-12 16:49                               ` Paul E. McKenney
2023-09-12 17:02                                 ` Christophe Leroy
2023-09-13  6:33                                   ` Christophe Leroy
2023-09-12 17:09                               ` Christophe Leroy
2023-09-12 17:38                                 ` Liam R. Howlett
2023-09-13 13:14                         ` Geert Uytterhoeven
2023-09-13 13:24                           ` Liam R. Howlett
2023-09-13 13:26                             ` Geert Uytterhoeven
2023-09-12 14:37                 ` Christophe Leroy
2023-09-12 15:06                   ` Christophe Leroy
2023-09-12 14:10               ` Matthew Wilcox
2023-09-12 14:17                 ` Liam R. Howlett
2023-09-06 19:06       ` Geert Uytterhoeven
2023-09-11 12:27     ` Linux regression tracking #adding (Thorsten Leemhuis)
2023-10-16  8:29       ` Linux regression tracking #update (Thorsten Leemhuis)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).