All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH kernel] powerpc/mm/iommu: Put pages on process exit
@ 2016-07-14  5:16 Alexey Kardashevskiy
  2016-07-14 10:12 ` Balbir Singh
  0 siblings, 1 reply; 4+ messages in thread
From: Alexey Kardashevskiy @ 2016-07-14  5:16 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Benjamin Herrenschmidt, Paul Mackerras,
	linux-kernel, David Gibson, Balbir Singh, Nick Piggin

At the moment VFIO IOMMU SPAPR v2 driver pins all guest RAM pages when
the userspace starts using VFIO. When the userspace process finishes,
all the pinned pages need to be put; this is done as a part of
the userspace memory context (MM) destruction which happens on
the very last mmdrop().

This approach has a problem that a MM of the userspace process
may live longer than the userspace process itself as kernel threads
usually execute on a MM of a userspace process which was runnning
on a CPU where the kernel thread was scheduled to. If this happened,
the MM remains referenced until this exact kernel thread wakes up again
and releases the very last reference to the MM, on an idle system this
can take even hours.

This fixes the issue by moving mm_iommu_cleanup() (the helper which puts
pages) from destroy_context() (called on the last mmdrop()) to
the arch-specific arch_exit_mmap() hook (called on the last mmput()).
mmdrop() decrements the mm->mm_count which is a total reference number;
mmput() decrements the mm->mm_users which is a number of user spaces and
this is actually the counter we want to watch for here.

Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/mmu_context.h | 3 +++
 arch/powerpc/mm/mmu_context_book3s64.c | 4 ----
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 9d2cd0c..24b590d 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -138,6 +138,9 @@ static inline void arch_dup_mmap(struct mm_struct *oldmm,
 
 static inline void arch_exit_mmap(struct mm_struct *mm)
 {
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+	mm_iommu_cleanup(&mm->context);
+#endif
 }
 
 static inline void arch_unmap(struct mm_struct *mm,
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index 19622222..aaeba74 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -159,10 +159,6 @@ static inline void destroy_pagetable_page(struct mm_struct *mm)
 
 void destroy_context(struct mm_struct *mm)
 {
-#ifdef CONFIG_SPAPR_TCE_IOMMU
-	mm_iommu_cleanup(&mm->context);
-#endif
-
 #ifdef CONFIG_PPC_ICSWX
 	drop_cop(mm->context.acop, mm);
 	kfree(mm->context.cop_lockp);
-- 
2.5.0.rc3

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH kernel] powerpc/mm/iommu: Put pages on process exit
  2016-07-14  5:16 [PATCH kernel] powerpc/mm/iommu: Put pages on process exit Alexey Kardashevskiy
@ 2016-07-14 10:12 ` Balbir Singh
  2016-07-14 11:52   ` Alexey Kardashevskiy
  0 siblings, 1 reply; 4+ messages in thread
From: Balbir Singh @ 2016-07-14 10:12 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Benjamin Herrenschmidt, Paul Mackerras,
	linux-kernel, David Gibson, Nick Piggin

On Thu, Jul 14, 2016 at 3:16 PM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> At the moment VFIO IOMMU SPAPR v2 driver pins all guest RAM pages when
> the userspace starts using VFIO. When the userspace process finishes,
> all the pinned pages need to be put; this is done as a part of
> the userspace memory context (MM) destruction which happens on
> the very last mmdrop().
>
> This approach has a problem that a MM of the userspace process
> may live longer than the userspace process itself as kernel threads
> usually execute on a MM of a userspace process which was runnning
> on a CPU where the kernel thread was scheduled to. If this happened,
> the MM remains referenced until this exact kernel thread wakes up again
> and releases the very last reference to the MM, on an idle system this
> can take even hours.
>
> This fixes the issue by moving mm_iommu_cleanup() (the helper which puts
> pages) from destroy_context() (called on the last mmdrop()) to
> the arch-specific arch_exit_mmap() hook (called on the last mmput()).
> mmdrop() decrements the mm->mm_count which is a total reference number;
> mmput() decrements the mm->mm_users which is a number of user spaces and
> this is actually the counter we want to watch for here.
>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Balbir Singh <bsingharora@gmail.com>
> Cc: Nick Piggin <npiggin@kernel.dk>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/include/asm/mmu_context.h | 3 +++
>  arch/powerpc/mm/mmu_context_book3s64.c | 4 ----
>  2 files changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 9d2cd0c..24b590d 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -138,6 +138,9 @@ static inline void arch_dup_mmap(struct mm_struct *oldmm,
>
>  static inline void arch_exit_mmap(struct mm_struct *mm)
>  {
> +#ifdef CONFIG_SPAPR_TCE_IOMMU
> +       mm_iommu_cleanup(&mm->context);
> +#endif
>  }
>

The assumption is that all necessary tear down has happened. Earlier
we were doing the teardown
on the last mm ref drop, do we have any dependence on that? I don't
think so, but just checking

Does qemu dotce_iommu_register_pages on exit()/atexit() or do we rely
on the exit path to do the teardown?

Can we please remove the #ifdef CONFIG_SPAPR_TCE_IOMMU here and have a version
of mm_iommu_cleanup that handles it

Basically do

#ifdef CONFIG_SPAPR_TCE_IOMMU
extern void mm_iommu_cleanup(void *ptr)
#else
static inline mm_iommu_cleanup(void *ptr) {}
#endif

Otherwise looks good

Acked-by: Balbir Singh <bsingharora@gmail.com>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH kernel] powerpc/mm/iommu: Put pages on process exit
  2016-07-14 10:12 ` Balbir Singh
@ 2016-07-14 11:52   ` Alexey Kardashevskiy
  2016-07-15  1:47     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 4+ messages in thread
From: Alexey Kardashevskiy @ 2016-07-14 11:52 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linuxppc-dev, Benjamin Herrenschmidt, Paul Mackerras,
	linux-kernel, David Gibson, Nick Piggin

On 14/07/16 20:12, Balbir Singh wrote:
> On Thu, Jul 14, 2016 at 3:16 PM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>> At the moment VFIO IOMMU SPAPR v2 driver pins all guest RAM pages when
>> the userspace starts using VFIO. When the userspace process finishes,
>> all the pinned pages need to be put; this is done as a part of
>> the userspace memory context (MM) destruction which happens on
>> the very last mmdrop().
>>
>> This approach has a problem that a MM of the userspace process
>> may live longer than the userspace process itself as kernel threads
>> usually execute on a MM of a userspace process which was runnning
>> on a CPU where the kernel thread was scheduled to. If this happened,
>> the MM remains referenced until this exact kernel thread wakes up again
>> and releases the very last reference to the MM, on an idle system this
>> can take even hours.
>>
>> This fixes the issue by moving mm_iommu_cleanup() (the helper which puts
>> pages) from destroy_context() (called on the last mmdrop()) to
>> the arch-specific arch_exit_mmap() hook (called on the last mmput()).
>> mmdrop() decrements the mm->mm_count which is a total reference number;
>> mmput() decrements the mm->mm_users which is a number of user spaces and
>> this is actually the counter we want to watch for here.
>>
>> Cc: David Gibson <david@gibson.dropbear.id.au>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Cc: Paul Mackerras <paulus@samba.org>
>> Cc: Balbir Singh <bsingharora@gmail.com>
>> Cc: Nick Piggin <npiggin@kernel.dk>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  arch/powerpc/include/asm/mmu_context.h | 3 +++
>>  arch/powerpc/mm/mmu_context_book3s64.c | 4 ----
>>  2 files changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
>> index 9d2cd0c..24b590d 100644
>> --- a/arch/powerpc/include/asm/mmu_context.h
>> +++ b/arch/powerpc/include/asm/mmu_context.h
>> @@ -138,6 +138,9 @@ static inline void arch_dup_mmap(struct mm_struct *oldmm,
>>
>>  static inline void arch_exit_mmap(struct mm_struct *mm)
>>  {
>> +#ifdef CONFIG_SPAPR_TCE_IOMMU
>> +       mm_iommu_cleanup(&mm->context);
>> +#endif
>>  }
>>
> 
> The assumption is that all necessary tear down has happened. Earlier
> we were doing the teardown
> on the last mm ref drop, do we have any dependence on that? I don't
> think so, but just checking
> 
> Does qemu dotce_iommu_register_pages on exit()/atexit() or do we rely
> on the exit path to do the teardown?

QEMU does call unregister when a memory region is deleted. Handling it in
arch_exit_mmap() is for the case when a QEMU process just dies for whatever
reason.

> Can we please remove the #ifdef CONFIG_SPAPR_TCE_IOMMU here and have a version
> of mm_iommu_cleanup that handles it


That would be a change unrelated to the fix. And I just do not see the
point in this - mm_iommu_cleanup() is declared in the very same header.


> Basically do
> 
> #ifdef CONFIG_SPAPR_TCE_IOMMU
> extern void mm_iommu_cleanup(void *ptr)
> #else
> static inline mm_iommu_cleanup(void *ptr) {}
> #endif
> 
> Otherwise looks good
> 
> Acked-by: Balbir Singh <bsingharora@gmail.com>

Thanks!


-- 
Alexey

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH kernel] powerpc/mm/iommu: Put pages on process exit
  2016-07-14 11:52   ` Alexey Kardashevskiy
@ 2016-07-15  1:47     ` Alexey Kardashevskiy
  0 siblings, 0 replies; 4+ messages in thread
From: Alexey Kardashevskiy @ 2016-07-15  1:47 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linuxppc-dev, Benjamin Herrenschmidt, Paul Mackerras,
	linux-kernel, David Gibson, npiggin

On 14/07/16 21:52, Alexey Kardashevskiy wrote:
> On 14/07/16 20:12, Balbir Singh wrote:
>> On Thu, Jul 14, 2016 at 3:16 PM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>> At the moment VFIO IOMMU SPAPR v2 driver pins all guest RAM pages when
>>> the userspace starts using VFIO. When the userspace process finishes,
>>> all the pinned pages need to be put; this is done as a part of
>>> the userspace memory context (MM) destruction which happens on
>>> the very last mmdrop().
>>>
>>> This approach has a problem that a MM of the userspace process
>>> may live longer than the userspace process itself as kernel threads
>>> usually execute on a MM of a userspace process which was runnning
>>> on a CPU where the kernel thread was scheduled to. If this happened,
>>> the MM remains referenced until this exact kernel thread wakes up again
>>> and releases the very last reference to the MM, on an idle system this
>>> can take even hours.
>>>
>>> This fixes the issue by moving mm_iommu_cleanup() (the helper which puts
>>> pages) from destroy_context() (called on the last mmdrop()) to
>>> the arch-specific arch_exit_mmap() hook (called on the last mmput()).
>>> mmdrop() decrements the mm->mm_count which is a total reference number;
>>> mmput() decrements the mm->mm_users which is a number of user spaces and
>>> this is actually the counter we want to watch for here.
>>>
>>> Cc: David Gibson <david@gibson.dropbear.id.au>
>>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>> Cc: Paul Mackerras <paulus@samba.org>
>>> Cc: Balbir Singh <bsingharora@gmail.com>
>>> Cc: Nick Piggin <npiggin@kernel.dk>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>  arch/powerpc/include/asm/mmu_context.h | 3 +++
>>>  arch/powerpc/mm/mmu_context_book3s64.c | 4 ----
>>>  2 files changed, 3 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
>>> index 9d2cd0c..24b590d 100644
>>> --- a/arch/powerpc/include/asm/mmu_context.h
>>> +++ b/arch/powerpc/include/asm/mmu_context.h
>>> @@ -138,6 +138,9 @@ static inline void arch_dup_mmap(struct mm_struct *oldmm,
>>>
>>>  static inline void arch_exit_mmap(struct mm_struct *mm)
>>>  {
>>> +#ifdef CONFIG_SPAPR_TCE_IOMMU
>>> +       mm_iommu_cleanup(&mm->context);
>>> +#endif
>>>  }
>>>
>>
>> The assumption is that all necessary tear down has happened. Earlier
>> we were doing the teardown
>> on the last mm ref drop, do we have any dependence on that? I don't
>> think so, but just checking
>>
>> Does qemu dotce_iommu_register_pages on exit()/atexit() or do we rely
>> on the exit path to do the teardown?
> 
> QEMU does call unregister when a memory region is deleted. Handling it in
> arch_exit_mmap() is for the case when a QEMU process just dies for whatever
> reason.

After a second thought the patch is not correct, before it would actually
put pages when all references to MM were dropped and many of them are hold
by file descriptors, with this patch, pages are put before files are closed
which means that devices will not be reset and doing DMA to memory which is
not pinned but there are TCEs pointign to it.

I will make another patch, with additional mmget()/mmput().



>> Can we please remove the #ifdef CONFIG_SPAPR_TCE_IOMMU here and have a version
>> of mm_iommu_cleanup that handles it
> 
> 
> That would be a change unrelated to the fix. And I just do not see the
> point in this - mm_iommu_cleanup() is declared in the very same header.
> 
> 
>> Basically do
>>
>> #ifdef CONFIG_SPAPR_TCE_IOMMU
>> extern void mm_iommu_cleanup(void *ptr)
>> #else
>> static inline mm_iommu_cleanup(void *ptr) {}
>> #endif
>>
>> Otherwise looks good
>>
>> Acked-by: Balbir Singh <bsingharora@gmail.com>
> 
> Thanks!
> 
> 
	

-- 
Alexey

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-07-15  1:47 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-14  5:16 [PATCH kernel] powerpc/mm/iommu: Put pages on process exit Alexey Kardashevskiy
2016-07-14 10:12 ` Balbir Singh
2016-07-14 11:52   ` Alexey Kardashevskiy
2016-07-15  1:47     ` Alexey Kardashevskiy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.