linux-sh.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Fix the j-core SOC build.
@ 2021-10-02 19:32 Rob Landley
  2021-10-02 21:39 ` John Paul Adrian Glaubitz
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Rob Landley @ 2021-10-02 19:32 UTC (permalink / raw)
  To: Linux-sh list, Rich Felker

From: Rob Landley <rob@landley.net>

Commit b67177ecd956 broke the j-core SOC build with a link failure, because
mm/percpu.c function pcpu_post_unmap_tlb_flush() calls flush_tlb_kernel_range()
which is defined under #ifdef CONFIG_MMU.

Signed-off-by: Rob Landley <rob@landley.net>
---

 arch/sh/kernel/smp.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c
index 65924d9ec245..3ec8f32aad85 100644
--- a/arch/sh/kernel/smp.c
+++ b/arch/sh/kernel/smp.c
@@ -468,4 +468,9 @@ void flush_tlb_one(unsigned long asid, unsigned long vaddr)
 	local_flush_tlb_one(asid, vaddr);
 }

+#else
+void flush_tlb_kernel_range(unsigned long start, unsigned long end)
+{
+	local_flush_tlb_all();
+}
 #endif

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix the j-core SOC build.
  2021-10-02 19:32 [PATCH] Fix the j-core SOC build Rob Landley
@ 2021-10-02 21:39 ` John Paul Adrian Glaubitz
  2021-10-03  2:18 ` Rich Felker
  2021-10-03  9:10 ` Sergey Shtylyov
  2 siblings, 0 replies; 12+ messages in thread
From: John Paul Adrian Glaubitz @ 2021-10-02 21:39 UTC (permalink / raw)
  To: Rob Landley, Linux-sh list, Rich Felker

Hi Rob!

On 10/2/21 21:32, Rob Landley wrote:
> From: Rob Landley <rob@landley.net>
> 
> Commit b67177ecd956 broke the j-core SOC build with a link failure, because
> mm/percpu.c function pcpu_post_unmap_tlb_flush() calls flush_tlb_kernel_range()
> which is defined under #ifdef CONFIG_MMU.

Looks like you're missing a "Fixes:" tag.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix the j-core SOC build.
  2021-10-02 19:32 [PATCH] Fix the j-core SOC build Rob Landley
  2021-10-02 21:39 ` John Paul Adrian Glaubitz
@ 2021-10-03  2:18 ` Rich Felker
  2021-11-10 21:33   ` Rich Felker
  2021-10-03  9:10 ` Sergey Shtylyov
  2 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2021-10-03  2:18 UTC (permalink / raw)
  To: Rob Landley; +Cc: Linux-sh list

On Sat, Oct 02, 2021 at 02:32:15PM -0500, Rob Landley wrote:
> From: Rob Landley <rob@landley.net>
> 
> Commit b67177ecd956 broke the j-core SOC build with a link failure, because
> mm/percpu.c function pcpu_post_unmap_tlb_flush() calls flush_tlb_kernel_range()
> which is defined under #ifdef CONFIG_MMU.
> 
> Signed-off-by: Rob Landley <rob@landley.net>
> ---
> 
>  arch/sh/kernel/smp.c |    5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c
> index 65924d9ec245..3ec8f32aad85 100644
> --- a/arch/sh/kernel/smp.c
> +++ b/arch/sh/kernel/smp.c
> @@ -468,4 +468,9 @@ void flush_tlb_one(unsigned long asid, unsigned long vaddr)
>  	local_flush_tlb_one(asid, vaddr);
>  }
> 
> +#else
> +void flush_tlb_kernel_range(unsigned long start, unsigned long end)
> +{
> +	local_flush_tlb_all();
> +}
>  #endif

local_flush_tlb_all() is defined in arch/sh/mm/nommu.c as BUG(); so
this is most likely wrong unless it just doesn't get called. I think
there should probably be something at a very general level dummying
out these functions/macros on nommu but I don't know where it should
be.

Rich

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix the j-core SOC build.
  2021-10-02 19:32 [PATCH] Fix the j-core SOC build Rob Landley
  2021-10-02 21:39 ` John Paul Adrian Glaubitz
  2021-10-03  2:18 ` Rich Felker
@ 2021-10-03  9:10 ` Sergey Shtylyov
  2 siblings, 0 replies; 12+ messages in thread
From: Sergey Shtylyov @ 2021-10-03  9:10 UTC (permalink / raw)
  To: Rob Landley, Linux-sh list, Rich Felker

On 02.10.2021 22:32, Rob Landley wrote:

> From: Rob Landley <rob@landley.net>
> 
> Commit b67177ecd956

    You missed the commit summary enclosed on ("").

> broke the j-core SOC build with a link failure, because
> mm/percpu.c function pcpu_post_unmap_tlb_flush() calls flush_tlb_kernel_range()
> which is defined under #ifdef CONFIG_MMU.
> 
> Signed-off-by: Rob Landley <rob@landley.net>
[...]

MBR, Sergey

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix the j-core SOC build.
  2021-10-03  2:18 ` Rich Felker
@ 2021-11-10 21:33   ` Rich Felker
  2021-11-11  0:53     ` Rich Felker
  0 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2021-11-10 21:33 UTC (permalink / raw)
  To: Rob Landley; +Cc: Linux-sh list, Nicholas Piggin

On Sat, Oct 02, 2021 at 10:18:51PM -0400, Rich Felker wrote:
> On Sat, Oct 02, 2021 at 02:32:15PM -0500, Rob Landley wrote:
> > From: Rob Landley <rob@landley.net>
> > 
> > Commit b67177ecd956 broke the j-core SOC build with a link failure, because
> > mm/percpu.c function pcpu_post_unmap_tlb_flush() calls flush_tlb_kernel_range()
> > which is defined under #ifdef CONFIG_MMU.
> > 
> > Signed-off-by: Rob Landley <rob@landley.net>
> > ---
> > 
> >  arch/sh/kernel/smp.c |    5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c
> > index 65924d9ec245..3ec8f32aad85 100644
> > --- a/arch/sh/kernel/smp.c
> > +++ b/arch/sh/kernel/smp.c
> > @@ -468,4 +468,9 @@ void flush_tlb_one(unsigned long asid, unsigned long vaddr)
> >  	local_flush_tlb_one(asid, vaddr);
> >  }
> > 
> > +#else
> > +void flush_tlb_kernel_range(unsigned long start, unsigned long end)
> > +{
> > +	local_flush_tlb_all();
> > +}
> >  #endif
> 
> local_flush_tlb_all() is defined in arch/sh/mm/nommu.c as BUG(); so
> this is most likely wrong unless it just doesn't get called. I think
> there should probably be something at a very general level dummying
> out these functions/macros on nommu but I don't know where it should
> be.

I've looked into this some more, and while arch/arm does dummy its
equivalent functions out on nommu, arch/sh has always had them as
BUG(), and indeed it makes some sense to catch erroneous usage.
pcpu_post_unmap_tlb_flush should probably have the flush under #ifdef
CONFIG_MMU or something.

I've added the author of the commit that broke this to Cc in case he
has any thoughts.

Rich

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix the j-core SOC build.
  2021-11-10 21:33   ` Rich Felker
@ 2021-11-11  0:53     ` Rich Felker
  2021-11-11  2:01       ` Dennis Zhou
  0 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2021-11-11  0:53 UTC (permalink / raw)
  To: Rob Landley
  Cc: Linux-sh list, Nicholas Piggin, Dennis Zhou, Geert Uytterhoeven

On Wed, Nov 10, 2021 at 04:33:46PM -0500, Rich Felker wrote:
> On Sat, Oct 02, 2021 at 10:18:51PM -0400, Rich Felker wrote:
> > On Sat, Oct 02, 2021 at 02:32:15PM -0500, Rob Landley wrote:
> > > From: Rob Landley <rob@landley.net>
> > > 
> > > Commit b67177ecd956 broke the j-core SOC build with a link failure, because
> > > mm/percpu.c function pcpu_post_unmap_tlb_flush() calls flush_tlb_kernel_range()
> > > which is defined under #ifdef CONFIG_MMU.
> > > 
> > > Signed-off-by: Rob Landley <rob@landley.net>
> > > ---
> > > 
> > >  arch/sh/kernel/smp.c |    5 +++++
> > >  1 file changed, 5 insertions(+)
> > > 
> > > diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c
> > > index 65924d9ec245..3ec8f32aad85 100644
> > > --- a/arch/sh/kernel/smp.c
> > > +++ b/arch/sh/kernel/smp.c
> > > @@ -468,4 +468,9 @@ void flush_tlb_one(unsigned long asid, unsigned long vaddr)
> > >  	local_flush_tlb_one(asid, vaddr);
> > >  }
> > > 
> > > +#else
> > > +void flush_tlb_kernel_range(unsigned long start, unsigned long end)
> > > +{
> > > +	local_flush_tlb_all();
> > > +}
> > >  #endif
> > 
> > local_flush_tlb_all() is defined in arch/sh/mm/nommu.c as BUG(); so
> > this is most likely wrong unless it just doesn't get called. I think
> > there should probably be something at a very general level dummying
> > out these functions/macros on nommu but I don't know where it should
> > be.
> 
> I've looked into this some more, and while arch/arm does dummy its
> equivalent functions out on nommu, arch/sh has always had them as
> BUG(), and indeed it makes some sense to catch erroneous usage.
> pcpu_post_unmap_tlb_flush should probably have the flush under #ifdef
> CONFIG_MMU or something.
> 
> I've added the author of the commit that broke this to Cc in case he
> has any thoughts.

I think we actually have the wrong commit to blame; it looks like
93274f1dd6b0 ("percpu: flush tlb in pcpu_reclaim_populated()") added
the TLB flush. However, this rabbit hole seems to go a lot deeper.
According to the comments in the files, percpu-km.c, not percpu-vm.c,
is supposed to be used on nommu archs. However, mm/Kconfig has
NEED_PER_CPU_KM depending on !SMP rather than !SMP || !MMU, which
mismatches the comments in the source. So should we be trying to fix
the Kconfig constraints to use -km? Or fixing -vm to work on nommu
like it used to?

Rich

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix the j-core SOC build.
  2021-11-11  0:53     ` Rich Felker
@ 2021-11-11  2:01       ` Dennis Zhou
  2021-11-11  5:39         ` Rob Landley
  2021-11-11 14:16         ` Rich Felker
  0 siblings, 2 replies; 12+ messages in thread
From: Dennis Zhou @ 2021-11-11  2:01 UTC (permalink / raw)
  To: Rich Felker
  Cc: Rob Landley, Linux-sh list, Nicholas Piggin, Dennis Zhou,
	Geert Uytterhoeven

Hello,

On Wed, Nov 10, 2021 at 07:53:13PM -0500, Rich Felker wrote:
> On Wed, Nov 10, 2021 at 04:33:46PM -0500, Rich Felker wrote:
> > On Sat, Oct 02, 2021 at 10:18:51PM -0400, Rich Felker wrote:
> > > On Sat, Oct 02, 2021 at 02:32:15PM -0500, Rob Landley wrote:
> > > > From: Rob Landley <rob@landley.net>
> > > > 
> > > > Commit b67177ecd956 broke the j-core SOC build with a link failure, because
> > > > mm/percpu.c function pcpu_post_unmap_tlb_flush() calls flush_tlb_kernel_range()
> > > > which is defined under #ifdef CONFIG_MMU.
> > > > 
> > > > Signed-off-by: Rob Landley <rob@landley.net>
> > > > ---
> > > > 
> > > >  arch/sh/kernel/smp.c |    5 +++++
> > > >  1 file changed, 5 insertions(+)
> > > > 
> > > > diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c
> > > > index 65924d9ec245..3ec8f32aad85 100644
> > > > --- a/arch/sh/kernel/smp.c
> > > > +++ b/arch/sh/kernel/smp.c
> > > > @@ -468,4 +468,9 @@ void flush_tlb_one(unsigned long asid, unsigned long vaddr)
> > > >  	local_flush_tlb_one(asid, vaddr);
> > > >  }
> > > > 
> > > > +#else
> > > > +void flush_tlb_kernel_range(unsigned long start, unsigned long end)
> > > > +{
> > > > +	local_flush_tlb_all();
> > > > +}
> > > >  #endif
> > > 
> > > local_flush_tlb_all() is defined in arch/sh/mm/nommu.c as BUG(); so
> > > this is most likely wrong unless it just doesn't get called. I think
> > > there should probably be something at a very general level dummying
> > > out these functions/macros on nommu but I don't know where it should
> > > be.
> > 
> > I've looked into this some more, and while arch/arm does dummy its
> > equivalent functions out on nommu, arch/sh has always had them as
> > BUG(), and indeed it makes some sense to catch erroneous usage.
> > pcpu_post_unmap_tlb_flush should probably have the flush under #ifdef
> > CONFIG_MMU or something.
> > 
> > I've added the author of the commit that broke this to Cc in case he
> > has any thoughts.
> 
> I think we actually have the wrong commit to blame; it looks like
> 93274f1dd6b0 ("percpu: flush tlb in pcpu_reclaim_populated()") added
> the TLB flush. However, this rabbit hole seems to go a lot deeper.
> According to the comments in the files, percpu-km.c, not percpu-vm.c,
> is supposed to be used on nommu archs. However, mm/Kconfig has
> NEED_PER_CPU_KM depending on !SMP rather than !SMP || !MMU, which
> mismatches the comments in the source. So should we be trying to fix
> the Kconfig constraints to use -km? Or fixing -vm to work on nommu
> like it used to?
> 
> Rich

I'm surprised I haven't heard any reports of this until now. Thanks for
ccing me. Fwiw, it's generally good to cc the author / subsystem
maintainer when there is an issue.

I would need to think about it a little bit more, but I think
percpu-km.c was written more in the mindset of !MMU == !SMP. It sounds
like for superh that's not true.

Given that, if sh has no MMU, then it really should be using percpu-km.c
not percpu-vm.c as there is no notion of unpopulated pages for that
configuration. I think we should fix the Kconfig constraints and give
that a shot. I didn't have any known users of !mmu/percpu-km.c, so I
didn't want to make the changes without someone verifying it or needing
it.

Thanks,
Dennis

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix the j-core SOC build.
  2021-11-11  2:01       ` Dennis Zhou
@ 2021-11-11  5:39         ` Rob Landley
  2021-11-11 14:16         ` Rich Felker
  1 sibling, 0 replies; 12+ messages in thread
From: Rob Landley @ 2021-11-11  5:39 UTC (permalink / raw)
  To: Dennis Zhou, Rich Felker
  Cc: Linux-sh list, Nicholas Piggin, Geert Uytterhoeven

On 11/10/21 8:01 PM, Dennis Zhou wrote:
> I'm surprised I haven't heard any reports of this until now. Thanks for
> ccing me. Fwiw, it's generally good to cc the author / subsystem
> maintainer when there is an issue.
> 
> I would need to think about it a little bit more, but I think
> percpu-km.c was written more in the mindset of !MMU == !SMP. It sounds
> like for superh that's not true.
> 
> Given that, if sh has no MMU,

sh1 and sh2 are nommu. sh3 and sh4 have mmu. J2 is an SMP sh2 compatible (with
some backported sh3 instructions plus a cmpxchg variant to make the SMP work).
It is, as far as I know, the only nommu SMP in the tree, but it worked for us
until recently.

Rob

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix the j-core SOC build.
  2021-11-11  2:01       ` Dennis Zhou
  2021-11-11  5:39         ` Rob Landley
@ 2021-11-11 14:16         ` Rich Felker
  2021-11-12 18:34           ` Dennis Zhou
  1 sibling, 1 reply; 12+ messages in thread
From: Rich Felker @ 2021-11-11 14:16 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Rob Landley, Linux-sh list, Nicholas Piggin, Geert Uytterhoeven

On Wed, Nov 10, 2021 at 09:01:10PM -0500, Dennis Zhou wrote:
> Hello,
> 
> On Wed, Nov 10, 2021 at 07:53:13PM -0500, Rich Felker wrote:
> > On Wed, Nov 10, 2021 at 04:33:46PM -0500, Rich Felker wrote:
> > > On Sat, Oct 02, 2021 at 10:18:51PM -0400, Rich Felker wrote:
> > > > On Sat, Oct 02, 2021 at 02:32:15PM -0500, Rob Landley wrote:
> > > > > From: Rob Landley <rob@landley.net>
> > > > > 
> > > > > Commit b67177ecd956 broke the j-core SOC build with a link failure, because
> > > > > mm/percpu.c function pcpu_post_unmap_tlb_flush() calls flush_tlb_kernel_range()
> > > > > which is defined under #ifdef CONFIG_MMU.
> > > > > 
> > > > > Signed-off-by: Rob Landley <rob@landley.net>
> > > > > ---
> > > > > 
> > > > >  arch/sh/kernel/smp.c |    5 +++++
> > > > >  1 file changed, 5 insertions(+)
> > > > > 
> > > > > diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c
> > > > > index 65924d9ec245..3ec8f32aad85 100644
> > > > > --- a/arch/sh/kernel/smp.c
> > > > > +++ b/arch/sh/kernel/smp.c
> > > > > @@ -468,4 +468,9 @@ void flush_tlb_one(unsigned long asid, unsigned long vaddr)
> > > > >  	local_flush_tlb_one(asid, vaddr);
> > > > >  }
> > > > > 
> > > > > +#else
> > > > > +void flush_tlb_kernel_range(unsigned long start, unsigned long end)
> > > > > +{
> > > > > +	local_flush_tlb_all();
> > > > > +}
> > > > >  #endif
> > > > 
> > > > local_flush_tlb_all() is defined in arch/sh/mm/nommu.c as BUG(); so
> > > > this is most likely wrong unless it just doesn't get called. I think
> > > > there should probably be something at a very general level dummying
> > > > out these functions/macros on nommu but I don't know where it should
> > > > be.
> > > 
> > > I've looked into this some more, and while arch/arm does dummy its
> > > equivalent functions out on nommu, arch/sh has always had them as
> > > BUG(), and indeed it makes some sense to catch erroneous usage.
> > > pcpu_post_unmap_tlb_flush should probably have the flush under #ifdef
> > > CONFIG_MMU or something.
> > > 
> > > I've added the author of the commit that broke this to Cc in case he
> > > has any thoughts.
> > 
> > I think we actually have the wrong commit to blame; it looks like
> > 93274f1dd6b0 ("percpu: flush tlb in pcpu_reclaim_populated()") added
> > the TLB flush. However, this rabbit hole seems to go a lot deeper.
> > According to the comments in the files, percpu-km.c, not percpu-vm.c,
> > is supposed to be used on nommu archs. However, mm/Kconfig has
> > NEED_PER_CPU_KM depending on !SMP rather than !SMP || !MMU, which
> > mismatches the comments in the source. So should we be trying to fix
> > the Kconfig constraints to use -km? Or fixing -vm to work on nommu
> > like it used to?
> > 
> > Rich
> 
> I'm surprised I haven't heard any reports of this until now. Thanks for
> ccing me. Fwiw, it's generally good to cc the author / subsystem
> maintainer when there is an issue.

Thanks for the quick reply!

> I would need to think about it a little bit more, but I think
> percpu-km.c was written more in the mindset of !MMU == !SMP. It sounds
> like for superh that's not true.

Indeed, it's possible we have the only NOMMU+SMP system Linux
currently runs on; if so that makes it something of a good smoke test
for corner cases. J2 users just haven't kept up with the last few
kernel releases, so the breakage went unnoticed for a while until
resyncing with current.

> Given that, if sh has no MMU, then it really should be using percpu-km.c
> not percpu-vm.c as there is no notion of unpopulated pages for that
> configuration. I think we should fix the Kconfig constraints and give
> that a shot. I didn't have any known users of !mmu/percpu-km.c, so I
> didn't want to make the changes without someone verifying it or needing
> it.

I'm not familiar with the different percpu implementations, and
apparently up til now we've been using percpu-vm.c without noticing
any problem. In concept, should both be usable, or was it just an
accident that it worked?

Does it suffice to do:

 config NEED_PER_CPU_KM
-       depends on !SMP
+       depends on !SMP || !MMU

or should we also have:

+       select NEED_PER_CPU_KM if !MMU && SMP

for arch/sh/Kconfig to ensure there's a conflict if it can't be
selected? (Or, should this logic be higher level and not
arch-dependent?)

I'm not sure if there are any NOMMU+SMP ARM systems Linux supports,
but FYI ARM has no-op implementations of all the TLB flush functions
(whereas arch/sh is missing some and has BUG() for others), so it's
possible the same thing is happening on ARM and just not exhibiting
any outwardly wrong behaviors.

Rich

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix the j-core SOC build.
  2021-11-11 14:16         ` Rich Felker
@ 2021-11-12 18:34           ` Dennis Zhou
  2021-11-12 18:57             ` Rich Felker
  0 siblings, 1 reply; 12+ messages in thread
From: Dennis Zhou @ 2021-11-12 18:34 UTC (permalink / raw)
  To: Rich Felker
  Cc: Rob Landley, Linux-sh list, Nicholas Piggin, Geert Uytterhoeven

On Thu, Nov 11, 2021 at 09:16:22AM -0500, Rich Felker wrote:
> On Wed, Nov 10, 2021 at 09:01:10PM -0500, Dennis Zhou wrote:
> > Hello,
> > 
> > On Wed, Nov 10, 2021 at 07:53:13PM -0500, Rich Felker wrote:
> > > On Wed, Nov 10, 2021 at 04:33:46PM -0500, Rich Felker wrote:
> > > > On Sat, Oct 02, 2021 at 10:18:51PM -0400, Rich Felker wrote:
> > > > > On Sat, Oct 02, 2021 at 02:32:15PM -0500, Rob Landley wrote:
> > > > > > From: Rob Landley <rob@landley.net>
> > > > > > 
> > > > > > Commit b67177ecd956 broke the j-core SOC build with a link failure, because
> > > > > > mm/percpu.c function pcpu_post_unmap_tlb_flush() calls flush_tlb_kernel_range()
> > > > > > which is defined under #ifdef CONFIG_MMU.
> > > > > > 
> > > > > > Signed-off-by: Rob Landley <rob@landley.net>
> > > > > > ---
> > > > > > 
> > > > > >  arch/sh/kernel/smp.c |    5 +++++
> > > > > >  1 file changed, 5 insertions(+)
> > > > > > 
> > > > > > diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c
> > > > > > index 65924d9ec245..3ec8f32aad85 100644
> > > > > > --- a/arch/sh/kernel/smp.c
> > > > > > +++ b/arch/sh/kernel/smp.c
> > > > > > @@ -468,4 +468,9 @@ void flush_tlb_one(unsigned long asid, unsigned long vaddr)
> > > > > >  	local_flush_tlb_one(asid, vaddr);
> > > > > >  }
> > > > > > 
> > > > > > +#else
> > > > > > +void flush_tlb_kernel_range(unsigned long start, unsigned long end)
> > > > > > +{
> > > > > > +	local_flush_tlb_all();
> > > > > > +}
> > > > > >  #endif
> > > > > 
> > > > > local_flush_tlb_all() is defined in arch/sh/mm/nommu.c as BUG(); so
> > > > > this is most likely wrong unless it just doesn't get called. I think
> > > > > there should probably be something at a very general level dummying
> > > > > out these functions/macros on nommu but I don't know where it should
> > > > > be.
> > > > 
> > > > I've looked into this some more, and while arch/arm does dummy its
> > > > equivalent functions out on nommu, arch/sh has always had them as
> > > > BUG(), and indeed it makes some sense to catch erroneous usage.
> > > > pcpu_post_unmap_tlb_flush should probably have the flush under #ifdef
> > > > CONFIG_MMU or something.
> > > > 
> > > > I've added the author of the commit that broke this to Cc in case he
> > > > has any thoughts.
> > > 
> > > I think we actually have the wrong commit to blame; it looks like
> > > 93274f1dd6b0 ("percpu: flush tlb in pcpu_reclaim_populated()") added
> > > the TLB flush. However, this rabbit hole seems to go a lot deeper.
> > > According to the comments in the files, percpu-km.c, not percpu-vm.c,
> > > is supposed to be used on nommu archs. However, mm/Kconfig has
> > > NEED_PER_CPU_KM depending on !SMP rather than !SMP || !MMU, which
> > > mismatches the comments in the source. So should we be trying to fix
> > > the Kconfig constraints to use -km? Or fixing -vm to work on nommu
> > > like it used to?
> > > 
> > > Rich
> > 
> > I'm surprised I haven't heard any reports of this until now. Thanks for
> > ccing me. Fwiw, it's generally good to cc the author / subsystem
> > maintainer when there is an issue.
> 
> Thanks for the quick reply!
> 

Sorry, I've had to think about this a little bit.

> > I would need to think about it a little bit more, but I think
> > percpu-km.c was written more in the mindset of !MMU == !SMP. It sounds
> > like for superh that's not true.
> 
> Indeed, it's possible we have the only NOMMU+SMP system Linux
> currently runs on; if so that makes it something of a good smoke test
> for corner cases. J2 users just haven't kept up with the last few
> kernel releases, so the breakage went unnoticed for a while until
> resyncing with current.
> 

So NOMMU+SMP is the odd case out. How percpu works is by creating
regions in the VMA that are a fixed offset apart. With NOMMU, this means
if we don't find physical addresses to suffice the initial condition
we're going to fail allocations. My guess is that these machines have a
fairly limited amount of memory so it isn't an issue.

> > Given that, if sh has no MMU, then it really should be using percpu-km.c
> > not percpu-vm.c as there is no notion of unpopulated pages for that
> > configuration. I think we should fix the Kconfig constraints and give
> > that a shot. I didn't have any known users of !mmu/percpu-km.c, so I
> > didn't want to make the changes without someone verifying it or needing
> > it.
> 
> I'm not familiar with the different percpu implementations, and
> apparently up til now we've been using percpu-vm.c without noticing
> any problem. In concept, should both be usable, or was it just an
> accident that it worked?
> 
> Does it suffice to do:
> 
>  config NEED_PER_CPU_KM
> -       depends on !SMP
> +       depends on !SMP || !MMU
> 
> or should we also have:
> 
> +       select NEED_PER_CPU_KM if !MMU && SMP
> 
> for arch/sh/Kconfig to ensure there's a conflict if it can't be
> selected? (Or, should this logic be higher level and not
> arch-dependent?)
> 
> I'm not sure if there are any NOMMU+SMP ARM systems Linux supports,
> but FYI ARM has no-op implementations of all the TLB flush functions
> (whereas arch/sh is missing some and has BUG() for others), so it's
> possible the same thing is happening on ARM and just not exhibiting
> any outwardly wrong behaviors.
> 

Continuing my explanation above, percpu on !SMP is just a pass through
allocation. Percpu can just blindly allocate memory and work as an
identity mapped pass through. So !SMP == NEED_PER_CPU_KM.

The tricky part is we still need to space the allocations which is what
percpu-vm.c kind of conflates with having an MMU and paging. SMP really
needs that part of percpu-vm.c not the vmalloc part.

So really, the appropriate fix would be to split out the address
allocation logic from the paging logic of percpu. This way backing can
occur through percpu-km.c for NOMMU+SMP. Unfortunately that is quite a
bit of work that I don't have the bandwidth for right now.

As a stop gap, I think we can get away with just stubbing out the tlb
flush as you've done and not bugging out on it because in the NOMMU+SMP
land, we need to get that specific address and not anything else. Doing
so should still be fine if my assumption about J2 being a small machine
is correct.

Thanks,
Dennis

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix the j-core SOC build.
  2021-11-12 18:34           ` Dennis Zhou
@ 2021-11-12 18:57             ` Rich Felker
  2021-11-12 22:41               ` Dennis Zhou
  0 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2021-11-12 18:57 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Rob Landley, Linux-sh list, Nicholas Piggin, Geert Uytterhoeven

On Fri, Nov 12, 2021 at 01:34:43PM -0500, Dennis Zhou wrote:
> On Thu, Nov 11, 2021 at 09:16:22AM -0500, Rich Felker wrote:
> > On Wed, Nov 10, 2021 at 09:01:10PM -0500, Dennis Zhou wrote:
> > > Hello,
> > > 
> > > On Wed, Nov 10, 2021 at 07:53:13PM -0500, Rich Felker wrote:
> > > > On Wed, Nov 10, 2021 at 04:33:46PM -0500, Rich Felker wrote:
> > > > > On Sat, Oct 02, 2021 at 10:18:51PM -0400, Rich Felker wrote:
> > > > > > On Sat, Oct 02, 2021 at 02:32:15PM -0500, Rob Landley wrote:
> > > > > > > From: Rob Landley <rob@landley.net>
> > > > > > > 
> > > > > > > Commit b67177ecd956 broke the j-core SOC build with a link failure, because
> > > > > > > mm/percpu.c function pcpu_post_unmap_tlb_flush() calls flush_tlb_kernel_range()
> > > > > > > which is defined under #ifdef CONFIG_MMU.
> > > > > > > 
> > > > > > > Signed-off-by: Rob Landley <rob@landley.net>
> > > > > > > ---
> > > > > > > 
> > > > > > >  arch/sh/kernel/smp.c |    5 +++++
> > > > > > >  1 file changed, 5 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c
> > > > > > > index 65924d9ec245..3ec8f32aad85 100644
> > > > > > > --- a/arch/sh/kernel/smp.c
> > > > > > > +++ b/arch/sh/kernel/smp.c
> > > > > > > @@ -468,4 +468,9 @@ void flush_tlb_one(unsigned long asid, unsigned long vaddr)
> > > > > > >  	local_flush_tlb_one(asid, vaddr);
> > > > > > >  }
> > > > > > > 
> > > > > > > +#else
> > > > > > > +void flush_tlb_kernel_range(unsigned long start, unsigned long end)
> > > > > > > +{
> > > > > > > +	local_flush_tlb_all();
> > > > > > > +}
> > > > > > >  #endif
> > > > > > 
> > > > > > local_flush_tlb_all() is defined in arch/sh/mm/nommu.c as BUG(); so
> > > > > > this is most likely wrong unless it just doesn't get called. I think
> > > > > > there should probably be something at a very general level dummying
> > > > > > out these functions/macros on nommu but I don't know where it should
> > > > > > be.
> > > > > 
> > > > > I've looked into this some more, and while arch/arm does dummy its
> > > > > equivalent functions out on nommu, arch/sh has always had them as
> > > > > BUG(), and indeed it makes some sense to catch erroneous usage.
> > > > > pcpu_post_unmap_tlb_flush should probably have the flush under #ifdef
> > > > > CONFIG_MMU or something.
> > > > > 
> > > > > I've added the author of the commit that broke this to Cc in case he
> > > > > has any thoughts.
> > > > 
> > > > I think we actually have the wrong commit to blame; it looks like
> > > > 93274f1dd6b0 ("percpu: flush tlb in pcpu_reclaim_populated()") added
> > > > the TLB flush. However, this rabbit hole seems to go a lot deeper.
> > > > According to the comments in the files, percpu-km.c, not percpu-vm.c,
> > > > is supposed to be used on nommu archs. However, mm/Kconfig has
> > > > NEED_PER_CPU_KM depending on !SMP rather than !SMP || !MMU, which
> > > > mismatches the comments in the source. So should we be trying to fix
> > > > the Kconfig constraints to use -km? Or fixing -vm to work on nommu
> > > > like it used to?
> > > > 
> > > > Rich
> > > 
> > > I'm surprised I haven't heard any reports of this until now. Thanks for
> > > ccing me. Fwiw, it's generally good to cc the author / subsystem
> > > maintainer when there is an issue.
> > 
> > Thanks for the quick reply!
> > 
> 
> Sorry, I've had to think about this a little bit.
> 
> > > I would need to think about it a little bit more, but I think
> > > percpu-km.c was written more in the mindset of !MMU == !SMP. It sounds
> > > like for superh that's not true.
> > 
> > Indeed, it's possible we have the only NOMMU+SMP system Linux
> > currently runs on; if so that makes it something of a good smoke test
> > for corner cases. J2 users just haven't kept up with the last few
> > kernel releases, so the breakage went unnoticed for a while until
> > resyncing with current.
> > 
> 
> So NOMMU+SMP is the odd case out. How percpu works is by creating
> regions in the VMA that are a fixed offset apart. With NOMMU, this means
> if we don't find physical addresses to suffice the initial condition
> we're going to fail allocations. My guess is that these machines have a
> fairly limited amount of memory so it isn't an issue.
> 
> > > Given that, if sh has no MMU, then it really should be using percpu-km.c
> > > not percpu-vm.c as there is no notion of unpopulated pages for that
> > > configuration. I think we should fix the Kconfig constraints and give
> > > that a shot. I didn't have any known users of !mmu/percpu-km.c, so I
> > > didn't want to make the changes without someone verifying it or needing
> > > it.
> > 
> > I'm not familiar with the different percpu implementations, and
> > apparently up til now we've been using percpu-vm.c without noticing
> > any problem. In concept, should both be usable, or was it just an
> > accident that it worked?
> > 
> > Does it suffice to do:
> > 
> >  config NEED_PER_CPU_KM
> > -       depends on !SMP
> > +       depends on !SMP || !MMU
> > 
> > or should we also have:
> > 
> > +       select NEED_PER_CPU_KM if !MMU && SMP
> > 
> > for arch/sh/Kconfig to ensure there's a conflict if it can't be
> > selected? (Or, should this logic be higher level and not
> > arch-dependent?)
> > 
> > I'm not sure if there are any NOMMU+SMP ARM systems Linux supports,
> > but FYI ARM has no-op implementations of all the TLB flush functions
> > (whereas arch/sh is missing some and has BUG() for others), so it's
> > possible the same thing is happening on ARM and just not exhibiting
> > any outwardly wrong behaviors.
> > 
> 
> Continuing my explanation above, percpu on !SMP is just a pass through
> allocation. Percpu can just blindly allocate memory and work as an
> identity mapped pass through. So !SMP == NEED_PER_CPU_KM.

I'm still not understanding -- is percpu-km.c really incompatible with
SMP? I've successfully booted with the Kconfig changed to allow it,
but it's possible that cores are just clobbering each other's percpu
memory or something, and happening not to blow up yet.

> The tricky part is we still need to space the allocations which is what
> percpu-vm.c kind of conflates with having an MMU and paging. SMP really
> needs that part of percpu-vm.c not the vmalloc part.
> 
> So really, the appropriate fix would be to split out the address
> allocation logic from the paging logic of percpu. This way backing can
> occur through percpu-km.c for NOMMU+SMP. Unfortunately that is quite a
> bit of work that I don't have the bandwidth for right now.
> 
> As a stop gap, I think we can get away with just stubbing out the tlb
> flush as you've done and not bugging out on it because in the NOMMU+SMP
> land, we need to get that specific address and not anything else. Doing

Can #ifdef CONFIG_MMU be put around the code in percpu-vm.c to make it
clear that this case is supported?

> so should still be fine if my assumption about J2 being a small machine
> is correct.

Most devices have around 128 MB of DRAM. Is that within your view of
"small"?

Rich

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix the j-core SOC build.
  2021-11-12 18:57             ` Rich Felker
@ 2021-11-12 22:41               ` Dennis Zhou
  0 siblings, 0 replies; 12+ messages in thread
From: Dennis Zhou @ 2021-11-12 22:41 UTC (permalink / raw)
  To: Rich Felker
  Cc: Rob Landley, Linux-sh list, Nicholas Piggin, Geert Uytterhoeven

On Fri, Nov 12, 2021 at 01:57:53PM -0500, Rich Felker wrote:
> On Fri, Nov 12, 2021 at 01:34:43PM -0500, Dennis Zhou wrote:
> > On Thu, Nov 11, 2021 at 09:16:22AM -0500, Rich Felker wrote:
> > > On Wed, Nov 10, 2021 at 09:01:10PM -0500, Dennis Zhou wrote:
> > > > Hello,
> > > > 
> > > > On Wed, Nov 10, 2021 at 07:53:13PM -0500, Rich Felker wrote:
> > > > > On Wed, Nov 10, 2021 at 04:33:46PM -0500, Rich Felker wrote:
> > > > > > On Sat, Oct 02, 2021 at 10:18:51PM -0400, Rich Felker wrote:
> > > > > > > On Sat, Oct 02, 2021 at 02:32:15PM -0500, Rob Landley wrote:
> > > > > > > > From: Rob Landley <rob@landley.net>
> > > > > > > > 
> > > > > > > > Commit b67177ecd956 broke the j-core SOC build with a link failure, because
> > > > > > > > mm/percpu.c function pcpu_post_unmap_tlb_flush() calls flush_tlb_kernel_range()
> > > > > > > > which is defined under #ifdef CONFIG_MMU.
> > > > > > > > 
> > > > > > > > Signed-off-by: Rob Landley <rob@landley.net>
> > > > > > > > ---
> > > > > > > > 
> > > > > > > >  arch/sh/kernel/smp.c |    5 +++++
> > > > > > > >  1 file changed, 5 insertions(+)
> > > > > > > > 
> > > > > > > > diff --git a/arch/sh/kernel/smp.c b/arch/sh/kernel/smp.c
> > > > > > > > index 65924d9ec245..3ec8f32aad85 100644
> > > > > > > > --- a/arch/sh/kernel/smp.c
> > > > > > > > +++ b/arch/sh/kernel/smp.c
> > > > > > > > @@ -468,4 +468,9 @@ void flush_tlb_one(unsigned long asid, unsigned long vaddr)
> > > > > > > >  	local_flush_tlb_one(asid, vaddr);
> > > > > > > >  }
> > > > > > > > 
> > > > > > > > +#else
> > > > > > > > +void flush_tlb_kernel_range(unsigned long start, unsigned long end)
> > > > > > > > +{
> > > > > > > > +	local_flush_tlb_all();
> > > > > > > > +}
> > > > > > > >  #endif
> > > > > > > 
> > > > > > > local_flush_tlb_all() is defined in arch/sh/mm/nommu.c as BUG(); so
> > > > > > > this is most likely wrong unless it just doesn't get called. I think
> > > > > > > there should probably be something at a very general level dummying
> > > > > > > out these functions/macros on nommu but I don't know where it should
> > > > > > > be.
> > > > > > 
> > > > > > I've looked into this some more, and while arch/arm does dummy its
> > > > > > equivalent functions out on nommu, arch/sh has always had them as
> > > > > > BUG(), and indeed it makes some sense to catch erroneous usage.
> > > > > > pcpu_post_unmap_tlb_flush should probably have the flush under #ifdef
> > > > > > CONFIG_MMU or something.
> > > > > > 
> > > > > > I've added the author of the commit that broke this to Cc in case he
> > > > > > has any thoughts.
> > > > > 
> > > > > I think we actually have the wrong commit to blame; it looks like
> > > > > 93274f1dd6b0 ("percpu: flush tlb in pcpu_reclaim_populated()") added
> > > > > the TLB flush. However, this rabbit hole seems to go a lot deeper.
> > > > > According to the comments in the files, percpu-km.c, not percpu-vm.c,
> > > > > is supposed to be used on nommu archs. However, mm/Kconfig has
> > > > > NEED_PER_CPU_KM depending on !SMP rather than !SMP || !MMU, which
> > > > > mismatches the comments in the source. So should we be trying to fix
> > > > > the Kconfig constraints to use -km? Or fixing -vm to work on nommu
> > > > > like it used to?
> > > > > 
> > > > > Rich
> > > > 
> > > > I'm surprised I haven't heard any reports of this until now. Thanks for
> > > > ccing me. Fwiw, it's generally good to cc the author / subsystem
> > > > maintainer when there is an issue.
> > > 
> > > Thanks for the quick reply!
> > > 
> > 
> > Sorry, I've had to think about this a little bit.
> > 
> > > > I would need to think about it a little bit more, but I think
> > > > percpu-km.c was written more in the mindset of !MMU == !SMP. It sounds
> > > > like for superh that's not true.
> > > 
> > > Indeed, it's possible we have the only NOMMU+SMP system Linux
> > > currently runs on; if so that makes it something of a good smoke test
> > > for corner cases. J2 users just haven't kept up with the last few
> > > kernel releases, so the breakage went unnoticed for a while until
> > > resyncing with current.
> > > 
> > 
> > So NOMMU+SMP is the odd case out. How percpu works is by creating
> > regions in the VMA that are a fixed offset apart. With NOMMU, this means
> > if we don't find physical addresses to suffice the initial condition
> > we're going to fail allocations. My guess is that these machines have a
> > fairly limited amount of memory so it isn't an issue.
> > 
> > > > Given that, if sh has no MMU, then it really should be using percpu-km.c
> > > > not percpu-vm.c as there is no notion of unpopulated pages for that
> > > > configuration. I think we should fix the Kconfig constraints and give
> > > > that a shot. I didn't have any known users of !mmu/percpu-km.c, so I
> > > > didn't want to make the changes without someone verifying it or needing
> > > > it.
> > > 
> > > I'm not familiar with the different percpu implementations, and
> > > apparently up til now we've been using percpu-vm.c without noticing
> > > any problem. In concept, should both be usable, or was it just an
> > > accident that it worked?
> > > 
> > > Does it suffice to do:
> > > 
> > >  config NEED_PER_CPU_KM
> > > -       depends on !SMP
> > > +       depends on !SMP || !MMU
> > > 
> > > or should we also have:
> > > 
> > > +       select NEED_PER_CPU_KM if !MMU && SMP
> > > 
> > > for arch/sh/Kconfig to ensure there's a conflict if it can't be
> > > selected? (Or, should this logic be higher level and not
> > > arch-dependent?)
> > > 
> > > I'm not sure if there are any NOMMU+SMP ARM systems Linux supports,
> > > but FYI ARM has no-op implementations of all the TLB flush functions
> > > (whereas arch/sh is missing some and has BUG() for others), so it's
> > > possible the same thing is happening on ARM and just not exhibiting
> > > any outwardly wrong behaviors.
> > > 
> > 
> > Continuing my explanation above, percpu on !SMP is just a pass through
> > allocation. Percpu can just blindly allocate memory and work as an
> > identity mapped pass through. So !SMP == NEED_PER_CPU_KM.
> 
> I'm still not understanding -- is percpu-km.c really incompatible with
> SMP? I've successfully booted with the Kconfig changed to allow it,
> but it's possible that cores are just clobbering each other's percpu
> memory or something, and happening not to blow up yet.
> 

I'm wrong, I misremembered initialization and pinned the problem on the
wrong part. Let me start again below.

> > The tricky part is we still need to space the allocations which is what
> > percpu-vm.c kind of conflates with having an MMU and paging. SMP really
> > needs that part of percpu-vm.c not the vmalloc part.
> > 
> > So really, the appropriate fix would be to split out the address
> > allocation logic from the paging logic of percpu. This way backing can
> > occur through percpu-km.c for NOMMU+SMP. Unfortunately that is quite a
> > bit of work that I don't have the bandwidth for right now.
> > 
> > As a stop gap, I think we can get away with just stubbing out the tlb
> > flush as you've done and not bugging out on it because in the NOMMU+SMP
> > land, we need to get that specific address and not anything else. Doing
> 
> Can #ifdef CONFIG_MMU be put around the code in percpu-vm.c to make it
> clear that this case is supported?
> 
> > so should still be fine if my assumption about J2 being a small machine
> > is correct.
> 
> Most devices have around 128 MB of DRAM. Is that within your view of
> "small"?
> 

Yeah that should be fine.

I realize the real problem is cpu distance. percpu-km.c assumes that all
cpus are considered in the same group, 0. I assume that isn't an issue
for J2 and really shouldn't be for any NOMMU setup.

So, I think sh should be using NEED_PER_CPU_KM and it should be right to
just switch the Kconfig to !SMP || !MMU. Is that all you had to change?
Honestly, if it boots and hasn't bugged out after a reasonable amount of
time, it's probably doing the right thing.

Sorry for misspeaking before.

Thanks,
Dennis

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-11-12 22:41 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-02 19:32 [PATCH] Fix the j-core SOC build Rob Landley
2021-10-02 21:39 ` John Paul Adrian Glaubitz
2021-10-03  2:18 ` Rich Felker
2021-11-10 21:33   ` Rich Felker
2021-11-11  0:53     ` Rich Felker
2021-11-11  2:01       ` Dennis Zhou
2021-11-11  5:39         ` Rob Landley
2021-11-11 14:16         ` Rich Felker
2021-11-12 18:34           ` Dennis Zhou
2021-11-12 18:57             ` Rich Felker
2021-11-12 22:41               ` Dennis Zhou
2021-10-03  9:10 ` Sergey Shtylyov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).