* [PATCH kernel v2 0/2] powerpc/mm_iommu: Fixes
From: Alexey Kardashevskiy @ 2019-04-03 4:12 UTC
To: linuxppc-dev
Cc: Alexey Kardashevskiy, Aneesh Kumar K.V, kvm-ppc, David Gibson
The patches do independent things but touch exactly the same code, so
the order in which they are applied matters.
This supersedes:
[PATCH kernel] powerpc/mm_iommu: Allow pinning large regions
[PATCH kernel 1/2] powerpc/mm_iommu: Prepare for less locking
[PATCH kernel 2/2] powerpc/mm_iommu: Fix potential deadlock
This is based on sha1
5e7a8ca31926 Linus Torvalds "Merge branch 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs".
Please comment. Thanks.
Alexey Kardashevskiy (2):
powerpc/mm_iommu: Fix potential deadlock
powerpc/mm_iommu: Allow pinning large regions
arch/powerpc/mm/mmu_context_iommu.c | 97 +++++++++++++++++------------
1 file changed, 58 insertions(+), 39 deletions(-)
--
2.17.1
* [PATCH kernel v2 1/2] powerpc/mm_iommu: Fix potential deadlock
From: Alexey Kardashevskiy @ 2019-04-03 4:12 UTC
To: linuxppc-dev
Cc: Alexey Kardashevskiy, Aneesh Kumar K.V, kvm-ppc, David Gibson
Currently mm_iommu_do_alloc() is called in 2 cases:
- VFIO_IOMMU_SPAPR_REGISTER_MEMORY ioctl() for normal memory:
this locks &mem_list_mutex and then locks mm::mmap_sem
several times when adjusting locked_vm or pinning pages;
- vfio_pci_nvgpu_regops::mmap() for GPU memory:
this is called with mm::mmap_sem held already and it locks
&mem_list_mutex.
So a userspace program that performs the ioctl and mmap() in two
threads concurrently can trigger a deadlock, which lockdep warns about
(below).
We have not hit this in practice because QEMU constructs the machine
in a single thread.
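In outline, the two conflicting acquisition orders are (a simplified
sketch, not code from the tree; unrelated logic elided):

  /* thread A: VFIO_IOMMU_SPAPR_REGISTER_MEMORY ioctl */
  mutex_lock(&mem_list_mutex);
  down_write(&mm->mmap_sem);    /* mm_iommu_adjust_locked_vm() */

  /* thread B: vfio_pci_nvgpu_regops::mmap */
  down_write(&mm->mmap_sem);    /* taken by the mmap() syscall */
  mutex_lock(&mem_list_mutex);  /* opposite order, hence the deadlock */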
This moves the overlap check next to where the new entry is added and
reduces the amount of time spent with &mem_list_mutex held.
This moves the locked_vm adjustment out from under &mem_list_mutex.
This relies on mm_iommu_adjust_locked_vm() doing nothing when entries==0.
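After the change the ordering in mm_iommu_do_alloc() is, in outline
(again a sketch of the lock ordering only, error paths elided):

  mm_iommu_adjust_locked_vm(mm, entries, true);  /* takes mmap_sem itself */
  down_read(&mm->mmap_sem);
  get_user_pages_longterm(ua, entries, FOLL_WRITE, mem->hpages, NULL);
  up_read(&mm->mmap_sem);
  mutex_lock(&mem_list_mutex);  /* now a short critical section */
  /* overlap check against the existing list, then publish */
  list_add_rcu(&mem->next, &mm->context.iommu_group_mem_list);
  mutex_unlock(&mem_list_mutex);

so &mem_list_mutex is never held while mm::mmap_sem is taken.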
This is one of the lockdep warnings:
======================================================
WARNING: possible circular locking dependency detected
5.1.0-rc2-le_nv2_aikATfstn1-p1 #363 Not tainted
------------------------------------------------------
qemu-system-ppc/8038 is trying to acquire lock:
000000002ec6c453 (mem_list_mutex){+.+.}, at: mm_iommu_do_alloc+0x70/0x490
but task is already holding lock:
00000000fd7da97f (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0xf0/0x160
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&mm->mmap_sem){++++}:
lock_acquire+0xf8/0x260
down_write+0x44/0xa0
mm_iommu_adjust_locked_vm.part.1+0x4c/0x190
mm_iommu_do_alloc+0x310/0x490
tce_iommu_ioctl.part.9+0xb84/0x1150 [vfio_iommu_spapr_tce]
vfio_fops_unl_ioctl+0x94/0x430 [vfio]
do_vfs_ioctl+0xe4/0x930
ksys_ioctl+0xc4/0x110
sys_ioctl+0x28/0x80
system_call+0x5c/0x70
-> #0 (mem_list_mutex){+.+.}:
__lock_acquire+0x1484/0x1900
lock_acquire+0xf8/0x260
__mutex_lock+0x88/0xa70
mm_iommu_do_alloc+0x70/0x490
vfio_pci_nvgpu_mmap+0xc0/0x130 [vfio_pci]
vfio_pci_mmap+0x198/0x2a0 [vfio_pci]
vfio_device_fops_mmap+0x44/0x70 [vfio]
mmap_region+0x5d4/0x770
do_mmap+0x42c/0x650
vm_mmap_pgoff+0x124/0x160
ksys_mmap_pgoff+0xdc/0x2f0
sys_mmap+0x40/0x80
system_call+0x5c/0x70
other info that might help us debug this:
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&mm->mmap_sem);
                               lock(mem_list_mutex);
                               lock(&mm->mmap_sem);
  lock(mem_list_mutex);
*** DEADLOCK ***
1 lock held by qemu-system-ppc/8038:
#0: 00000000fd7da97f (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0xf0/0x160
Fixes: c10c21efa4bc ("powerpc/vfio/iommu/kvm: Do not pin device memory", 2018-12-19)
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
arch/powerpc/mm/mmu_context_iommu.c | 75 +++++++++++++++--------------
1 file changed, 39 insertions(+), 36 deletions(-)
diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index e7a9c4f6bfca..9d9be850f8c2 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -95,28 +95,14 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
unsigned long entries, unsigned long dev_hpa,
struct mm_iommu_table_group_mem_t **pmem)
{
- struct mm_iommu_table_group_mem_t *mem;
- long i, ret, locked_entries = 0;
+ struct mm_iommu_table_group_mem_t *mem, *mem2;
+ long i, ret, locked_entries = 0, pinned = 0;
unsigned int pageshift;
- mutex_lock(&mem_list_mutex);
-
- list_for_each_entry_rcu(mem, &mm->context.iommu_group_mem_list,
- next) {
- /* Overlap? */
- if ((mem->ua < (ua + (entries << PAGE_SHIFT))) &&
- (ua < (mem->ua +
- (mem->entries << PAGE_SHIFT)))) {
- ret = -EINVAL;
- goto unlock_exit;
- }
-
- }
-
if (dev_hpa == MM_IOMMU_TABLE_INVALID_HPA) {
ret = mm_iommu_adjust_locked_vm(mm, entries, true);
if (ret)
- goto unlock_exit;
+ return ret;
locked_entries = entries;
}
@@ -150,15 +136,10 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
down_read(&mm->mmap_sem);
ret = get_user_pages_longterm(ua, entries, FOLL_WRITE, mem->hpages, NULL);
up_read(&mm->mmap_sem);
+ pinned = ret > 0 ? ret : 0;
if (ret != entries) {
- /* free the reference taken */
- for (i = 0; i < ret; i++)
- put_page(mem->hpages[i]);
-
- vfree(mem->hpas);
- kfree(mem);
ret = -EFAULT;
- goto unlock_exit;
+ goto free_exit;
}
pageshift = PAGE_SHIFT;
@@ -183,21 +164,43 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
}
good_exit:
- ret = 0;
atomic64_set(&mem->mapped, 1);
mem->used = 1;
mem->ua = ua;
mem->entries = entries;
- *pmem = mem;
+
+ mutex_lock(&mem_list_mutex);
+
+ list_for_each_entry_rcu(mem2, &mm->context.iommu_group_mem_list, next) {
+ /* Overlap? */
+ if ((mem2->ua < (ua + (entries << PAGE_SHIFT))) &&
+ (ua < (mem2->ua +
+ (mem2->entries << PAGE_SHIFT)))) {
+ ret = -EINVAL;
+ mutex_unlock(&mem_list_mutex);
+ goto free_exit;
+ }
+ }
list_add_rcu(&mem->next, &mm->context.iommu_group_mem_list);
-unlock_exit:
- if (locked_entries && ret)
- mm_iommu_adjust_locked_vm(mm, locked_entries, false);
-
mutex_unlock(&mem_list_mutex);
+ *pmem = mem;
+
+ return 0;
+
+free_exit:
+ /* free the reference taken */
+ for (i = 0; i < pinned; i++)
+ put_page(mem->hpages[i]);
+
+ vfree(mem->hpas);
+ kfree(mem);
+
+unlock_exit:
+ mm_iommu_adjust_locked_vm(mm, locked_entries, false);
+
return ret;
}
@@ -266,7 +269,7 @@ static void mm_iommu_release(struct mm_iommu_table_group_mem_t *mem)
long mm_iommu_put(struct mm_struct *mm, struct mm_iommu_table_group_mem_t *mem)
{
long ret = 0;
- unsigned long entries, dev_hpa;
+ unsigned long unlock_entries = 0;
mutex_lock(&mem_list_mutex);
@@ -287,17 +290,17 @@ long mm_iommu_put(struct mm_struct *mm, struct mm_iommu_table_group_mem_t *mem)
goto unlock_exit;
}
+ if (mem->dev_hpa == MM_IOMMU_TABLE_INVALID_HPA)
+ unlock_entries = mem->entries;
+
/* @mapped became 0 so now mappings are disabled, release the region */
- entries = mem->entries;
- dev_hpa = mem->dev_hpa;
mm_iommu_release(mem);
- if (dev_hpa == MM_IOMMU_TABLE_INVALID_HPA)
- mm_iommu_adjust_locked_vm(mm, entries, false);
-
unlock_exit:
mutex_unlock(&mem_list_mutex);
+ mm_iommu_adjust_locked_vm(mm, unlock_entries, false);
+
return ret;
}
EXPORT_SYMBOL_GPL(mm_iommu_put);
--
2.17.1
* [PATCH kernel v2 2/2] powerpc/mm_iommu: Allow pinning large regions
From: Alexey Kardashevskiy @ 2019-04-03 4:12 UTC
To: linuxppc-dev
Cc: Alexey Kardashevskiy, Aneesh Kumar K.V, kvm-ppc, David Gibson
When called with vmas_arg==NULL, get_user_pages_longterm() allocates
an array of nr_pages * 8 bytes, which can easily exceed the maximum
allocation order; for example, registering memory for a 256GB guest
does this and fails in __alloc_pages_nodemask().
This adds a loop over chunks of entries to fit the max order limit.
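To make the numbers concrete, a worked example assuming 64KiB pages
(PAGE_SHIFT == 16): registering 256GB means 256GB / 64KiB = 4M entries,
so the temporary vmas array alone needs 4M * 8 bytes = 32MB of
contiguous memory, more than a single max-order allocation can provide
on that configuration. The chunk size used by the loop below caps each
call so that its array fits within one maximum-order allocation:

  chunk = (1UL << (PAGE_SHIFT + MAX_ORDER - 1)) /
          sizeof(struct vm_area_struct *);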
Fixes: 678e174c4c16 ("powerpc/mm/iommu: allow migration of cma allocated pages during mm_iommu_do_alloc", 2019-03-05)
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
arch/powerpc/mm/mmu_context_iommu.c | 24 ++++++++++++++++++++----
1 file changed, 20 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index 9d9be850f8c2..8330f135294f 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -98,6 +98,7 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
struct mm_iommu_table_group_mem_t *mem, *mem2;
long i, ret, locked_entries = 0, pinned = 0;
unsigned int pageshift;
+ unsigned long entry, chunk;
if (dev_hpa == MM_IOMMU_TABLE_INVALID_HPA) {
ret = mm_iommu_adjust_locked_vm(mm, entries, true);
@@ -134,11 +135,26 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
}
down_read(&mm->mmap_sem);
- ret = get_user_pages_longterm(ua, entries, FOLL_WRITE, mem->hpages, NULL);
+ chunk = (1UL << (PAGE_SHIFT + MAX_ORDER - 1)) /
+ sizeof(struct vm_area_struct *);
+ chunk = min(chunk, entries);
+ for (entry = 0; entry < entries; entry += chunk) {
+ unsigned long n = min(entries - entry, chunk);
+
+ ret = get_user_pages_longterm(ua + (entry << PAGE_SHIFT), n,
+ FOLL_WRITE, mem->hpages + entry, NULL);
+ if (ret == n) {
+ pinned += n;
+ continue;
+ }
+ if (ret > 0)
+ pinned += ret;
+ break;
+ }
up_read(&mm->mmap_sem);
- pinned = ret > 0 ? ret : 0;
- if (ret != entries) {
- ret = -EFAULT;
+ if (pinned != entries) {
+ if (!ret)
+ ret = -EFAULT;
goto free_exit;
}
--
2.17.1
* Re: [kernel,v2,2/2] powerpc/mm_iommu: Allow pinning large regions
From: Michael Ellerman @ 2019-04-21 14:07 UTC
To: Alexey Kardashevskiy, linuxppc-dev
Cc: Alexey Kardashevskiy, Aneesh Kumar K.V, kvm-ppc, David Gibson
On Wed, 2019-04-03 at 04:12:33 UTC, Alexey Kardashevskiy wrote:
> When called with vmas_arg==NULL, get_user_pages_longterm() allocates
> an array of nr_pages * 8 bytes, which can easily exceed the maximum
> allocation order; for example, registering memory for a 256GB guest
> does this and fails in __alloc_pages_nodemask().
>
> This adds a loop over chunks of entries to fit the max order limit.
>
> Fixes: 678e174c4c16 ("powerpc/mm/iommu: allow migration of cma allocated pages during mm_iommu_do_alloc", 2019-03-05)
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Applied to powerpc fixes, thanks.
https://git.kernel.org/powerpc/c/7a3a4d763837d3aa654cd10590309504
cheers
* Re: [kernel,v2,1/2] powerpc/mm_iommu: Fix potential deadlock
From: Michael Ellerman @ 2019-04-21 14:07 UTC
To: Alexey Kardashevskiy, linuxppc-dev
Cc: Alexey Kardashevskiy, Aneesh Kumar K.V, kvm-ppc, David Gibson
On Wed, 2019-04-03 at 04:12:32 UTC, Alexey Kardashevskiy wrote:
> Currently mm_iommu_do_alloc() is called in 2 cases:
> - VFIO_IOMMU_SPAPR_REGISTER_MEMORY ioctl() for normal memory:
> this locks &mem_list_mutex and then locks mm::mmap_sem
> several times when adjusting locked_vm or pinning pages;
> - vfio_pci_nvgpu_regops::mmap() for GPU memory:
> this is called with mm::mmap_sem held already and it locks
> &mem_list_mutex.
>
> So a userspace program that performs the ioctl and mmap() in two
> threads concurrently can trigger a deadlock, which lockdep warns
> about (below).
>
> We have not hit this in practice because QEMU constructs the machine
> in a single thread.
>
> This moves the overlap check next to where the new entry is added and
> reduces the amount of time spent with &mem_list_mutex held.
>
> This moves the locked_vm adjustment out from under &mem_list_mutex.
>
> This relies on mm_iommu_adjust_locked_vm() doing nothing when entries==0.
>
> This is one of the lockdep warnings:
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.1.0-rc2-le_nv2_aikATfstn1-p1 #363 Not tainted
> ------------------------------------------------------
> qemu-system-ppc/8038 is trying to acquire lock:
> 000000002ec6c453 (mem_list_mutex){+.+.}, at: mm_iommu_do_alloc+0x70/0x490
>
> but task is already holding lock:
> 00000000fd7da97f (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0xf0/0x160
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&mm->mmap_sem){++++}:
> lock_acquire+0xf8/0x260
> down_write+0x44/0xa0
> mm_iommu_adjust_locked_vm.part.1+0x4c/0x190
> mm_iommu_do_alloc+0x310/0x490
> tce_iommu_ioctl.part.9+0xb84/0x1150 [vfio_iommu_spapr_tce]
> vfio_fops_unl_ioctl+0x94/0x430 [vfio]
> do_vfs_ioctl+0xe4/0x930
> ksys_ioctl+0xc4/0x110
> sys_ioctl+0x28/0x80
> system_call+0x5c/0x70
>
> -> #0 (mem_list_mutex){+.+.}:
> __lock_acquire+0x1484/0x1900
> lock_acquire+0xf8/0x260
> __mutex_lock+0x88/0xa70
> mm_iommu_do_alloc+0x70/0x490
> vfio_pci_nvgpu_mmap+0xc0/0x130 [vfio_pci]
> vfio_pci_mmap+0x198/0x2a0 [vfio_pci]
> vfio_device_fops_mmap+0x44/0x70 [vfio]
> mmap_region+0x5d4/0x770
> do_mmap+0x42c/0x650
> vm_mmap_pgoff+0x124/0x160
> ksys_mmap_pgoff+0xdc/0x2f0
> sys_mmap+0x40/0x80
> system_call+0x5c/0x70
>
> other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&mm->mmap_sem);
>                                lock(mem_list_mutex);
>                                lock(&mm->mmap_sem);
>   lock(mem_list_mutex);
>
> *** DEADLOCK ***
>
> 1 lock held by qemu-system-ppc/8038:
> #0: 00000000fd7da97f (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0xf0/0x160
>
> Fixes: c10c21efa4bc ("powerpc/vfio/iommu/kvm: Do not pin device memory", 2018-12-19)
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Applied to powerpc fixes, thanks.
https://git.kernel.org/powerpc/c/eb9d7a62c38628ab0ba6e59d22d7cb79
cheers