linux-kernel.vger.kernel.org archive mirror
* Re: [Question] Softlockup when sending IPI to other CPUs
       [not found] ` <20190119235825.GG26876@brain-police>
@ 2019-01-21 14:21   ` Catalin Marinas
  2019-01-22  5:44     ` Will Deacon
  0 siblings, 1 reply; 8+ messages in thread
From: Catalin Marinas @ 2019-01-21 14:21 UTC (permalink / raw)
  To: Will Deacon
  Cc: chenwandun, linux-kernel, linux-arm-kernel, Wangkefeng (Kevin),
	anshuman.khandual

On Sat, Jan 19, 2019 at 11:58:27PM +0000, Will Deacon wrote:
> On Thu, Jan 17, 2019 at 07:42:44AM +0000, chenwandun wrote:
> > Recently, I ran some tests on linux-4.19 and hit a softlockup issue.
> > 
> > I found that some CPUs take the spinlock in the __split_huge_pmd
> > function and then send an IPI to the other CPUs and wait for their
> > response, while several other CPUs have entered __split_huge_pmd
> > wanting the same spinlock and are stuck in queued_spin_lock_slowpath.
> > 
> > Because the IPI gets no response for a long time, the result is a
> > softlockup.
> > 
> > The IPI comes from commit 3b8c9f1cdfc506e94e992ae66b68bbe416f89610,
> > which is meant to send an IPI to each CPU after invalidating the
> > I-cache for kernel mappings.  In this case, after modifying the pmd, it
> > sends an IPI to the other CPUs to sync the memory mappings.
> > 
> > There is no stable test case that reproduces the problem; the test
> > procedure is hard to repeat.
> > 
> > The environment is arm64 with 64 CPUs.  Excluding idle CPUs, there are
> > 6 kinds of call stacks in total.
> 
> This looks like another lockup that would be solved if we deferred our
> I-cache invalidation when mapping user-executable pages, and instead
> performed the invalidation off the back of a UXN permission fault, where we
> could avoid holding any locks.
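
(For context, the IPI in question comes from kick_all_cpus_sync(), which is
roughly the following; simplified from kernel/smp.c, where do_nothing() is
an empty callback:

	void kick_all_cpus_sync(void)
	{
		/* make the preceding changes visible before kicking the CPUs */
		smp_mb();
		/* wait == 1: spin until every online CPU has run the callback */
		smp_call_function(do_nothing, NULL, 1);
	}

so in the reported case the sender spins, with the ptl still held, until
every other CPU has taken the IPI.)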

Looking back at commit 3b8c9f1cdfc5 ("arm64: IPI each CPU after
invalidating the I-cache for kernel mappings"), the text implies that it
should only do this for kernel mappings. I don't think we need this for
user mappings. We have a few scenarios where we invoke set_pte_at() with
exec permission:

1. Page faulted in - the pte was not previously accessible and the CPU
   should not have stale decoded instructions (my interpretation of the
   ARM ARM).

2. huge pmd splitting - there is no change to the underlying data. I
   have a suspicion here that we shouldn't even call
   sync_icache_aliases() but probably the PG_arch_1 isn't carried over
   from the head compound page to the small pages (needs checking).

3. page migration - there is no change to the underlying data and
   instructions, so I don't think we need the all CPUs sync.

4. mprotect() setting exec permission - ensuring cache maintenance here is
   normally the user's responsibility.

(I can add more text to the patch below but need to get to the bottom of
this first)
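
For reference, the path this patch changes is roughly the following
(a simplified sketch of the 4.19 call chain; details elided):

	set_pte_at()
	  -> __sync_icache_dcache(pte)		/* exec pte, PG_dcache_clean was clear */
	       -> sync_icache_aliases()
	            -> flush_icache_range()
	                 -> kick_all_cpus_sync()	/* IPI to all online CPUs */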

---------8<-------------------------------------------------
arm64: Do not issue IPIs for user executable ptes

From: Catalin Marinas <catalin.marinas@arm.com>

Commit 3b8c9f1cdfc5 ("arm64: IPI each CPU after invalidating the I-cache
for kernel mappings") was aimed at fixing the I-cache invalidation for
kernel mappings. However, it inadvertently caused all cache maintenance
for user mappings via set_pte_at() -> __sync_icache_dcache() to call
kick_all_cpus_sync().

Fixes: 3b8c9f1cdfc5 ("arm64: IPI each CPU after invalidating the I-cache for kernel mappings")
Cc: <stable@vger.kernel.org> # 4.19.x-
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/include/asm/cacheflush.h |    2 +-
 arch/arm64/kernel/probes/uprobes.c  |    2 +-
 arch/arm64/mm/flush.c               |   14 ++++++++++----
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
index 19844211a4e6..18e92d9dacd4 100644
--- a/arch/arm64/include/asm/cacheflush.h
+++ b/arch/arm64/include/asm/cacheflush.h
@@ -80,7 +80,7 @@ extern void __clean_dcache_area_poc(void *addr, size_t len);
 extern void __clean_dcache_area_pop(void *addr, size_t len);
 extern void __clean_dcache_area_pou(void *addr, size_t len);
 extern long __flush_cache_user_range(unsigned long start, unsigned long end);
-extern void sync_icache_aliases(void *kaddr, unsigned long len);
+extern void sync_icache_aliases(void *kaddr, unsigned long len, bool sync);
 
 static inline void flush_icache_range(unsigned long start, unsigned long end)
 {
diff --git a/arch/arm64/kernel/probes/uprobes.c b/arch/arm64/kernel/probes/uprobes.c
index 636ca0119c0e..595e8c8f41cd 100644
--- a/arch/arm64/kernel/probes/uprobes.c
+++ b/arch/arm64/kernel/probes/uprobes.c
@@ -24,7 +24,7 @@ void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
 	memcpy(dst, src, len);
 
 	/* flush caches (dcache/icache) */
-	sync_icache_aliases(dst, len);
+	sync_icache_aliases(dst, len, true);
 
 	kunmap_atomic(xol_page_kaddr);
 }
diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index 30695a868107..5c2f23a92d14 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -25,15 +25,17 @@
 #include <asm/cache.h>
 #include <asm/tlbflush.h>
 
-void sync_icache_aliases(void *kaddr, unsigned long len)
+void sync_icache_aliases(void *kaddr, unsigned long len, bool sync)
 {
 	unsigned long addr = (unsigned long)kaddr;
 
 	if (icache_is_aliasing()) {
 		__clean_dcache_area_pou(kaddr, len);
 		__flush_icache_all();
-	} else {
+	} else if (sync) {
 		flush_icache_range(addr, addr + len);
+	} else {
+		__flush_icache_range(addr, addr + len);
 	}
 }
 
@@ -42,7 +44,7 @@ static void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
 				unsigned long len)
 {
 	if (vma->vm_flags & VM_EXEC)
-		sync_icache_aliases(kaddr, len);
+		sync_icache_aliases(kaddr, len, true);
 }
 
 /*
@@ -63,8 +65,12 @@ void __sync_icache_dcache(pte_t pte)
 	struct page *page = pte_page(pte);
 
 	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
+		/*
+		 * Don't issue kick_all_cpus_sync() after I-cache invalidation
+		 * when setting a user executable pte.
+		 */
 		sync_icache_aliases(page_address(page),
-				    PAGE_SIZE << compound_order(page));
+				    PAGE_SIZE << compound_order(page), false);
 }
 EXPORT_SYMBOL_GPL(__sync_icache_dcache);
 

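(For reference, the distinction the patch relies on: since 3b8c9f1cdfc5,
flush_icache_range() is roughly the following; a simplified sketch of the
static inline in arch/arm64/include/asm/cacheflush.h:

	static inline void flush_icache_range(unsigned long start, unsigned long end)
	{
		__flush_icache_range(start, end);	/* D-cache clean + I-cache invalidate */

		/*
		 * IPI all online CPUs so that they undergo a context
		 * synchronization event and re-fetch the new instructions.
		 */
		kick_all_cpus_sync();
	}

so calling __flush_icache_range() directly, as the patch does for the
non-sync case, performs the same maintenance without the IPI.)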

* Re: [Question] Softlockup when sending IPI to other CPUs
  2019-01-21 14:21   ` [Question] Softlockup when sending IPI to other CPUs Catalin Marinas
@ 2019-01-22  5:44     ` Will Deacon
  2019-01-22 14:55       ` Mark Rutland
  2019-01-23 18:15       ` Catalin Marinas
  0 siblings, 2 replies; 8+ messages in thread
From: Will Deacon @ 2019-01-22  5:44 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: chenwandun, linux-kernel, linux-arm-kernel, Wangkefeng (Kevin),
	anshuman.khandual

Hi Catalin,

On Mon, Jan 21, 2019 at 02:21:28PM +0000, Catalin Marinas wrote:
> On Sat, Jan 19, 2019 at 11:58:27PM +0000, Will Deacon wrote:
> > On Thu, Jan 17, 2019 at 07:42:44AM +0000, chenwandun wrote:
> > > [...]
> > 
> > This looks like another lockup that would be solved if we deferred our
> > I-cache invalidation when mapping user-executable pages, and instead
> > performed the invalidation off the back of a UXN permission fault, where we
> > could avoid holding any locks.
> 
> Looking back at commit 3b8c9f1cdfc5 ("arm64: IPI each CPU after
> invalidating the I-cache for kernel mappings"), the text implies that it
> should only do this for kernel mappings. I don't think we need this for
> user mappings. We have a few scenarios where we invoke set_pte_at() with
> exec permission:

Yes, I think you're right. I got confused because in this case we are
invalidating lines written by the kernel, but actually it's not about who
writes the data, but about whether or not the page table is being changed.

> 1. Page faulted in - the pte was not previously accessible and the CPU
>    should not have stale decoded instructions (my interpretation of the
>    ARM ARM).
> 
> 2. huge pmd splitting - there is no change to the underlying data. I
>    have a suspicion here that we shouldn't even call
>    sync_icache_aliases() but probably the PG_arch_1 isn't carried over
>    from the head compound page to the small pages (needs checking).
> 
> 3. page migration - there is no change to the underlying data and
>    instructions, so I don't think we need the all CPUs sync.
> 
> 4. mprotect() setting exec permission - ensuring cache maintenance here is
>    normally the user's responsibility.
> 
> (I can add more text to the patch below but need to get to the bottom of
> this first)
> 
> ---------8<-------------------------------------------------
> arm64: Do not issue IPIs for user executable ptes
> 
> From: Catalin Marinas <catalin.marinas@arm.com>
> 
> Commit 3b8c9f1cdfc5 ("arm64: IPI each CPU after invalidating the I-cache
> for kernel mappings") was aimed at fixing the I-cache invalidation for
> kernel mappings. However, it inadvertently caused all cache maintenance
> for user mappings via set_pte_at() -> __sync_icache_dcache() to call
> kick_all_cpus_sync().
> 
> Fixes: 3b8c9f1cdfc5 ("arm64: IPI each CPU after invalidating the I-cache for kernel mappings")
> Cc: <stable@vger.kernel.org> # 4.19.x-
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> ---
>  arch/arm64/include/asm/cacheflush.h |    2 +-
>  arch/arm64/kernel/probes/uprobes.c  |    2 +-
>  arch/arm64/mm/flush.c               |   14 ++++++++++----
>  3 files changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
> index 19844211a4e6..18e92d9dacd4 100644
> --- a/arch/arm64/include/asm/cacheflush.h
> +++ b/arch/arm64/include/asm/cacheflush.h
> @@ -80,7 +80,7 @@ extern void __clean_dcache_area_poc(void *addr, size_t len);
>  extern void __clean_dcache_area_pop(void *addr, size_t len);
>  extern void __clean_dcache_area_pou(void *addr, size_t len);
>  extern long __flush_cache_user_range(unsigned long start, unsigned long end);
> -extern void sync_icache_aliases(void *kaddr, unsigned long len);
> +extern void sync_icache_aliases(void *kaddr, unsigned long len, bool sync);

I'd much prefer just adding something like sync_user_icache_aliases(), which
would avoid the IPI, since bool arguments tend to make the callsites
unreadable imo.
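
Something like the sketch below, completely untested and only to show the
shape (the helper name is the suggestion above; the body mirrors the
existing sync_icache_aliases()):

	/* As sync_icache_aliases(), but never issue the kick_all_cpus_sync() IPI. */
	void sync_user_icache_aliases(void *kaddr, unsigned long len)
	{
		unsigned long addr = (unsigned long)kaddr;

		if (icache_is_aliasing()) {
			__clean_dcache_area_pou(kaddr, len);
			__flush_icache_all();
		} else {
			/* __flush_icache_range() skips the IPI */
			__flush_icache_range(addr, addr + len);
		}
	}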

With that:

Acked-by: Will Deacon <will.deacon@arm.com>

Will


* Re: [Question] Softlockup when sending IPI to other CPUs
  2019-01-22  5:44     ` Will Deacon
@ 2019-01-22 14:55       ` Mark Rutland
  2019-01-23 10:21         ` Will Deacon
  2019-01-23 18:15       ` Catalin Marinas
  1 sibling, 1 reply; 8+ messages in thread
From: Mark Rutland @ 2019-01-22 14:55 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, chenwandun, linux-kernel, linux-arm-kernel,
	Wangkefeng (Kevin),
	anshuman.khandual

Hi Will,

On Tue, Jan 22, 2019 at 05:44:02AM +0000, Will Deacon wrote:
> On Mon, Jan 21, 2019 at 02:21:28PM +0000, Catalin Marinas wrote:
> > On Sat, Jan 19, 2019 at 11:58:27PM +0000, Will Deacon wrote:
> > > On Thu, Jan 17, 2019 at 07:42:44AM +0000, chenwandun wrote:
> > > > [...]
> > > 
> > > This looks like another lockup that would be solved if we deferred our
> > > I-cache invalidation when mapping user-executable pages, and instead
> > > performed the invalidation off the back of a UXN permission fault, where we
> > > could avoid holding any locks.
> > 
> > Looking back at commit 3b8c9f1cdfc5 ("arm64: IPI each CPU after
> > invalidating the I-cache for kernel mappings"), the text implies that it
> > should only do this for kernel mappings. I don't think we need this for
> > user mappings. We have a few scenarios where we invoke set_pte_at() with
> > exec permission:
> 
> Yes, I think you're right. I got confused because in this case we are
> invalidating lines written by the kernel, but actually it's not about who
> writes the data, but about whether or not the page table is being changed.

IIUC we may have a userspace problem analogous to the kernel modules
problem, if userspace uses dlopen/dlclose to dynamically load/unload
shared objects.

If userspace unloads an object, then loads another, the new object might
get placed at the same VA. A PE could have started speculating
instructions from the old object, and IIUC the TLB invalidation and
I-cache maintenance don't cause those instructions to be re-fetched from
the I-cache unless there's a context synchronization event.

Do we require the use of membarrier when loading or unloading objects?
If so, when does that happen relative to the unmap or map?

Thanks,
Mark.


* Re: [Question] Softlockup when sending IPI to other CPUs
  2019-01-22 14:55       ` Mark Rutland
@ 2019-01-23 10:21         ` Will Deacon
  0 siblings, 0 replies; 8+ messages in thread
From: Will Deacon @ 2019-01-23 10:21 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Catalin Marinas, chenwandun, linux-kernel, linux-arm-kernel,
	Wangkefeng (Kevin),
	anshuman.khandual

On Tue, Jan 22, 2019 at 02:55:22PM +0000, Mark Rutland wrote:
> On Tue, Jan 22, 2019 at 05:44:02AM +0000, Will Deacon wrote:
> > On Mon, Jan 21, 2019 at 02:21:28PM +0000, Catalin Marinas wrote:
> > > On Sat, Jan 19, 2019 at 11:58:27PM +0000, Will Deacon wrote:
> > > > On Thu, Jan 17, 2019 at 07:42:44AM +0000, chenwandun wrote:
> > > > > [...]
> > > > 
> > > > This looks like another lockup that would be solved if we deferred our
> > > > I-cache invalidation when mapping user-executable pages, and instead
> > > > performed the invalidation off the back of a UXN permission fault, where we
> > > > could avoid holding any locks.
> > > 
> > > Looking back at commit 3b8c9f1cdfc5 ("arm64: IPI each CPU after
> > > invalidating the I-cache for kernel mappings"), the text implies that it
> > > should only do this for kernel mappings. I don't think we need this for
> > > user mappings. We have a few scenarios where we invoke set_pte_at() with
> > > exec permission:
> > 
> > Yes, I think you're right. I got confused because in this case we are
> > invalidating lines written by the kernel, but actually it's not about who
> > writes the data, but about whether or not the page table is being changed.
> 
> IIUC we may have a userspace problem analogous to the kernel modules
> problem, if userspace uses dlopen/dlclose to dynamically load/unload
> shared objects.
> 
> If userspace unloads an object, then loads another, the new object might
> get placed at the same VA. A PE could have started speculating
> instructions from the old object, and IIUC the TLB invalidation and
> I-cache maintenance don't cause those instructions to be re-fetched from
> the I-cache unless there's a context synchronization event.
> 
> Do we require the use of membarrier when loading or unloading objects?
> If so, when does that happen relative to the unmap or map?

membarrier seems a bit OTT for this. Presumably userspace already has
to synchronise threads in this case so that (a) nobody is executing the old
object when it is unloaded and (b) nobody tries to execute the new
object until it has been successfully loaded. Squeezing in an ISB shouldn't
be too tricky, although I don't know whether it's actually done (especially
since the chance of this going wrong is so tiny).
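
For illustration, the sort of thing that would be needed (a hypothetical
sketch; fn stands for an entry point into the freshly loaded object):

	/*
	 * Call into newly loaded code. The ISB is a context synchronization
	 * event, discarding any instructions speculated from the old mapping.
	 */
	static void call_new_code(void (*fn)(void))
	{
		asm volatile("isb" : : : "memory");
		fn();
	}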

Will


* Re: [Question] Softlockup when sending IPI to other CPUs
  2019-01-22  5:44     ` Will Deacon
  2019-01-22 14:55       ` Mark Rutland
@ 2019-01-23 18:15       ` Catalin Marinas
  2019-01-24  7:00         ` Shijith Thotton
  1 sibling, 1 reply; 8+ messages in thread
From: Catalin Marinas @ 2019-01-23 18:15 UTC (permalink / raw)
  To: Will Deacon
  Cc: chenwandun, linux-kernel, linux-arm-kernel, Wangkefeng (Kevin),
	anshuman.khandual

On Tue, Jan 22, 2019 at 05:44:02AM +0000, Will Deacon wrote:
> On Mon, Jan 21, 2019 at 02:21:28PM +0000, Catalin Marinas wrote:
> > arm64: Do not issue IPIs for user executable ptes
> > 
> > From: Catalin Marinas <catalin.marinas@arm.com>
> > 
> > Commit 3b8c9f1cdfc5 ("arm64: IPI each CPU after invalidating the I-cache
> > for kernel mappings") was aimed at fixing the I-cache invalidation for
> > kernel mappings. However, it inadvertently caused all cache maintenance
> > for user mappings via set_pte_at() -> __sync_icache_dcache() to call
> > kick_all_cpus_sync().
> > 
> > Fixes: 3b8c9f1cdfc5 ("arm64: IPI each CPU after invalidating the I-cache for kernel mappings")
> > Cc: <stable@vger.kernel.org> # 4.19.x-
> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > ---
> >  arch/arm64/include/asm/cacheflush.h |    2 +-
> >  arch/arm64/kernel/probes/uprobes.c  |    2 +-
> >  arch/arm64/mm/flush.c               |   14 ++++++++++----
> >  3 files changed, 12 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
> > index 19844211a4e6..18e92d9dacd4 100644
> > --- a/arch/arm64/include/asm/cacheflush.h
> > +++ b/arch/arm64/include/asm/cacheflush.h
> > @@ -80,7 +80,7 @@ extern void __clean_dcache_area_poc(void *addr, size_t len);
> >  extern void __clean_dcache_area_pop(void *addr, size_t len);
> >  extern void __clean_dcache_area_pou(void *addr, size_t len);
> >  extern long __flush_cache_user_range(unsigned long start, unsigned long end);
> > -extern void sync_icache_aliases(void *kaddr, unsigned long len);
> > +extern void sync_icache_aliases(void *kaddr, unsigned long len, bool sync);
> 
> I'd much prefer just adding something like sync_user_icache_aliases(), which
> would avoid the IPI, since bool arguments tend to make the callsites
> unreadable imo.

I wonder whether we need the IPI for uprobes and ptrace. I would say we
can avoid it for ptrace since normally the debugged thread should be
stopped. I think it's different for uprobes, but changing the text of a
live thread doesn't come with many guarantees anyway. So I'm tempted to
simply change sync_icache_aliases() to not issue an IPI at all, in which
case we wouldn't even need the bool argument; the function is only used
for user addresses. So the new diff would be as below (the commit text is
the same):

diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index 30695a868107..5c9073bace83 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -33,7 +33,11 @@ void sync_icache_aliases(void *kaddr, unsigned long len)
 		__clean_dcache_area_pou(kaddr, len);
 		__flush_icache_all();
 	} else {
-		flush_icache_range(addr, addr + len);
+		/*
+		 * Don't issue kick_all_cpus_sync() after I-cache invalidation
+		 * for user mappings.
+		 */
+		__flush_icache_range(addr, addr + len);
 	}
 }
 


* Re: [Question] Softlockup when sending IPI to other CPUs
  2019-01-23 18:15       ` Catalin Marinas
@ 2019-01-24  7:00         ` Shijith Thotton
  2019-01-24 16:37           ` Catalin Marinas
  0 siblings, 1 reply; 8+ messages in thread
From: Shijith Thotton @ 2019-01-24  7:00 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: chenwandun, linux-kernel, linux-arm-kernel, Wangkefeng (Kevin),
	anshuman.khandual, Ganapatrao Kulkarni,
	Jayachandran Chandrasekharan Nair

Hi Catalin,

On 01/23/2019 11:45 PM, Catalin Marinas wrote:
> On Tue, Jan 22, 2019 at 05:44:02AM +0000, Will Deacon wrote:
>> On Mon, Jan 21, 2019 at 02:21:28PM +0000, Catalin Marinas wrote:
>>> arm64: Do not issue IPIs for user executable ptes
>>>
>>> From: Catalin Marinas <catalin.marinas@arm.com>
>>>
>>> Commit 3b8c9f1cdfc5 ("arm64: IPI each CPU after invalidating the I-cache
>>> for kernel mappings") was aimed at fixing the I-cache invalidation for
>>> kernel mappings. However, it inadvertently caused all cache maintenance
>>> for user mappings via set_pte_at() -> __sync_icache_dcache() to call
>>> kick_all_cpus_sync().
>>>
>>> Fixes: 3b8c9f1cdfc5 ("arm64: IPI each CPU after invalidating the I-cache for kernel mappings")
>>> Cc: <stable@vger.kernel.org> # 4.19.x-
>>> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>>> ---
>>>   arch/arm64/include/asm/cacheflush.h |    2 +-
>>>   arch/arm64/kernel/probes/uprobes.c  |    2 +-
>>>   arch/arm64/mm/flush.c               |   14 ++++++++++----
>>>   3 files changed, 12 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
>>> index 19844211a4e6..18e92d9dacd4 100644
>>> --- a/arch/arm64/include/asm/cacheflush.h
>>> +++ b/arch/arm64/include/asm/cacheflush.h
>>> @@ -80,7 +80,7 @@ extern void __clean_dcache_area_poc(void *addr, size_t len);
>>>   extern void __clean_dcache_area_pop(void *addr, size_t len);
>>>   extern void __clean_dcache_area_pou(void *addr, size_t len);
>>>   extern long __flush_cache_user_range(unsigned long start, unsigned long end);
>>> -extern void sync_icache_aliases(void *kaddr, unsigned long len);
>>> +extern void sync_icache_aliases(void *kaddr, unsigned long len, bool sync);
>>
>> I'd much prefer just adding something like sync_user_icache_aliases(), which
>> would avoid the IPI, since bool arguments tend to make the callsites
>> unreadable imo.
> 
> I wonder whether we need the IPI for uprobes and ptrace. I would say we
> can avoid it for ptrace since normally the debugged thread should be
> stopped. I think it's different for uprobes, but changing the text of a
> live thread doesn't come with many guarantees anyway. So I'm tempted to
> simply change sync_icache_aliases() to not issue an IPI at all, in which
> case we wouldn't even need the bool argument; the function is only used
> for user addresses. So the new diff would be as below (the commit text is
> the same):
> 
> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
> index 30695a868107..5c9073bace83 100644
> --- a/arch/arm64/mm/flush.c
> +++ b/arch/arm64/mm/flush.c
> @@ -33,7 +33,11 @@ void sync_icache_aliases(void *kaddr, unsigned long len)
>   		__clean_dcache_area_pou(kaddr, len);
>   		__flush_icache_all();
>   	} else {
> -		flush_icache_range(addr, addr + len);
> +		/*
> +		 * Don't issue kick_all_cpus_sync() after I-cache invalidation
> +		 * for user mappings.
> +		 */
> +		__flush_icache_range(addr, addr + len);
>   	}
>   }

We also faced a similar issue with the LTP migrate_pages03 test in the
past, and this patch fixes it:

http://lists.infradead.org/pipermail/linux-arm-kernel/2019-January/623574.html

Thanks,
Shijith


* Re: [Question] Softlockup when sending IPI to other CPUs
  2019-01-24  7:00         ` Shijith Thotton
@ 2019-01-24 16:37           ` Catalin Marinas
  2019-01-25  9:57             ` Shijith Thotton
  0 siblings, 1 reply; 8+ messages in thread
From: Catalin Marinas @ 2019-01-24 16:37 UTC (permalink / raw)
  To: Shijith Thotton
  Cc: Will Deacon, chenwandun, linux-kernel, linux-arm-kernel,
	Wangkefeng (Kevin),
	anshuman.khandual, Ganapatrao Kulkarni,
	Jayachandran Chandrasekharan Nair

Hi Shijith,

On Thu, Jan 24, 2019 at 07:00:42AM +0000, Shijith Thotton wrote:
> On 01/23/2019 11:45 PM, Catalin Marinas wrote:
> > diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
> > index 30695a868107..5c9073bace83 100644
> > --- a/arch/arm64/mm/flush.c
> > +++ b/arch/arm64/mm/flush.c
> > @@ -33,7 +33,11 @@ void sync_icache_aliases(void *kaddr, unsigned long len)
> >   		__clean_dcache_area_pou(kaddr, len);
> >   		__flush_icache_all();
> >   	} else {
> > -		flush_icache_range(addr, addr + len);
> > +		/*
> > +		 * Don't issue kick_all_cpus_sync() after I-cache invalidation
> > +		 * for user mappings.
> > +		 */
> > +		__flush_icache_range(addr, addr + len);
> >   	}
> >   }
> 
> We also faced similar issue with LTP test migrate_pages03 in past and this patch 
> fixes the issue.
> 
> http://lists.infradead.org/pipermail/linux-arm-kernel/2019-January/623574.html

Thanks for confirming. I presume I can add your tested-by.

-- 
Catalin


* Re: [Question] Softlockup when sending IPI to other CPUs
  2019-01-24 16:37           ` Catalin Marinas
@ 2019-01-25  9:57             ` Shijith Thotton
  0 siblings, 0 replies; 8+ messages in thread
From: Shijith Thotton @ 2019-01-25  9:57 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Will Deacon, chenwandun, linux-kernel, linux-arm-kernel,
	Wangkefeng (Kevin),
	anshuman.khandual, Ganapatrao Kulkarni,
	Jayachandran Chandrasekharan Nair

On 01/24/2019 10:07 PM, Catalin Marinas wrote:
> Hi Shijith,
> 
> On Thu, Jan 24, 2019 at 07:00:42AM +0000, Shijith Thotton wrote:
>> On 01/23/2019 11:45 PM, Catalin Marinas wrote:
>>> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
>>> index 30695a868107..5c9073bace83 100644
>>> --- a/arch/arm64/mm/flush.c
>>> +++ b/arch/arm64/mm/flush.c
>>> @@ -33,7 +33,11 @@ void sync_icache_aliases(void *kaddr, unsigned long len)
>>>    		__clean_dcache_area_pou(kaddr, len);
>>>    		__flush_icache_all();
>>>    	} else {
>>> -		flush_icache_range(addr, addr + len);
>>> +		/*
>>> +		 * Don't issue kick_all_cpus_sync() after I-cache invalidation
>>> +		 * for user mappings.
>>> +		 */
>>> +		__flush_icache_range(addr, addr + len);
>>>    	}
>>>    }
>>
>> We also faced a similar issue with the LTP migrate_pages03 test in the
>> past, and this patch fixes it:
>>
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2019-January/623574.html
> 
> Thanks for confirming. I presume I can add your tested-by.
> 

Sure.

Tested-by: Shijith Thotton <sthotton@marvell.com>

Shijith


end of thread

Thread overview: 8+ messages
     [not found] <95C141B25E7AB14BA042DCCC556C0E6501620A47@dggeml529-mbx.china.huawei.com>
     [not found] ` <20190119235825.GG26876@brain-police>
2019-01-21 14:21   ` [Question] Softlockup when sending IPI to other CPUs Catalin Marinas
2019-01-22  5:44     ` Will Deacon
2019-01-22 14:55       ` Mark Rutland
2019-01-23 10:21         ` Will Deacon
2019-01-23 18:15       ` Catalin Marinas
2019-01-24  7:00         ` Shijith Thotton
2019-01-24 16:37           ` Catalin Marinas
2019-01-25  9:57             ` Shijith Thotton
