linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] mm, thp: remove duplication and fix locking issues in swapin
@ 2016-05-23 17:14 Ebru Akagunduz
  2016-05-23 17:14 ` [PATCH 1/3] mm, thp: remove duplication of included header Ebru Akagunduz
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Ebru Akagunduz @ 2016-05-23 17:14 UTC (permalink / raw)
  To: linux-mm
  Cc: hughd, riel, akpm, kirill.shutemov, n-horiguchi, aarcange,
	iamjoonsoo.kim, gorcunov, linux-kernel, mgorman, rientjes,
	vbabka, aneesh.kumar, hannes, mhocko, boaz, Ebru Akagunduz

This patch series removes duplication of included header
and fixes locking inconsistency in khugepaged swapin

Ebru Akagunduz (3):
  mm, thp: remove duplication of included header
  mm, thp: fix possible circular locking dependency caused by
    sum_vm_event()
  mm, thp: make swapin readahead under down_read of mmap_sem

 mm/huge_memory.c | 39 ++++++++++++++++++++++++++++++---------
 1 file changed, 30 insertions(+), 9 deletions(-)

-- 
1.9.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/3] mm, thp: remove duplication of included header
  2016-05-23 17:14 [PATCH 0/3] mm, thp: remove duplication and fix locking issues in swapin Ebru Akagunduz
@ 2016-05-23 17:14 ` Ebru Akagunduz
  2016-05-23 17:14 ` [PATCH 2/3] mm, thp: fix possible circular locking dependency caused by sum_vm_event() Ebru Akagunduz
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Ebru Akagunduz @ 2016-05-23 17:14 UTC (permalink / raw)
  To: linux-mm
  Cc: hughd, riel, akpm, kirill.shutemov, n-horiguchi, aarcange,
	iamjoonsoo.kim, gorcunov, linux-kernel, mgorman, rientjes,
	vbabka, aneesh.kumar, hannes, mhocko, boaz, Ebru Akagunduz

swapops.h included for a second time in the commit:
http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=639040960a340f6f987065fc52e149f4ea25ce25

This patch removes the duplication.

Signed-off-by: Ebru Akagunduz <ebru.akagunduz@gmail.com>
---
 mm/huge_memory.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9472fed..91442a9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -30,7 +30,6 @@
 #include <linux/hashtable.h>
 #include <linux/userfaultfd_k.h>
 #include <linux/page_idle.h>
-#include <linux/swapops.h>
 
 #include <asm/tlb.h>
 #include <asm/pgalloc.h>
-- 
1.9.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/3] mm, thp: fix possible circular locking dependency caused by sum_vm_event()
  2016-05-23 17:14 [PATCH 0/3] mm, thp: remove duplication and fix locking issues in swapin Ebru Akagunduz
  2016-05-23 17:14 ` [PATCH 1/3] mm, thp: remove duplication of included header Ebru Akagunduz
@ 2016-05-23 17:14 ` Ebru Akagunduz
  2016-05-23 17:14 ` [PATCH 3/3] mm, thp: make swapin readahead under down_read of mmap_sem Ebru Akagunduz
  2016-05-23 17:29 ` [PATCH 0/3] mm, thp: remove duplication and fix locking issues in swapin Ebru Akagunduz
  3 siblings, 0 replies; 17+ messages in thread
From: Ebru Akagunduz @ 2016-05-23 17:14 UTC (permalink / raw)
  To: linux-mm
  Cc: hughd, riel, akpm, kirill.shutemov, n-horiguchi, aarcange,
	iamjoonsoo.kim, gorcunov, linux-kernel, mgorman, rientjes,
	vbabka, aneesh.kumar, hannes, mhocko, boaz, Ebru Akagunduz

Nested circular locking dependency detected by kernel robot (udevadm).

  udevadm/221 is trying to acquire lock:
   (&mm->mmap_sem){++++++}, at: [<ffffffff81262543>] __might_fault+0x83/0x150
  but task is already holding lock:
   (s_active#12){++++.+}, at: [<ffffffff813315ee>] kernfs_fop_write+0x8e/0x250
  which lock already depends on the new lock.

 Possible unsafe locking scenario:

 CPU0                    CPU1
 ----                    ----
 lock(s_active);
                         lock(cpu_hotplug.lock);
                         lock(s_active);
 lock(&mm->mmap_sem);

 the existing dependency chain (in reverse order) is:
 -> #2 (s_active#12){++++.+}:
        [<ffffffff8117da2c>] lock_acquire+0xac/0x180
        [<ffffffff8132f50a>] __kernfs_remove+0x2da/0x410
        [<ffffffff81330630>] kernfs_remove_by_name_ns+0x40/0x90
        [<ffffffff813339fb>] sysfs_remove_file_ns+0x2b/0x70
        [<ffffffff81ba8a16>] device_del+0x166/0x320
        [<ffffffff81ba943c>] device_destroy+0x3c/0x50
        [<ffffffff8105aa61>] cpuid_class_cpu_callback+0x51/0x70
        [<ffffffff81131ce9>] notifier_call_chain+0x59/0x190
        [<ffffffff81132749>] __raw_notifier_call_chain+0x9/0x10
        [<ffffffff810fe6b0>] __cpu_notify+0x40/0x90
        [<ffffffff810fe890>] cpu_notify_nofail+0x10/0x30
        [<ffffffff810fe8d7>] notify_dead+0x27/0x1e0
        [<ffffffff810fe273>] cpuhp_down_callbacks+0x93/0x190
        [<ffffffff82096062>] _cpu_down+0xc2/0x1e0
        [<ffffffff810ff727>] do_cpu_down+0x37/0x50
        [<ffffffff8110003b>] cpu_down+0xb/0x10
        [<ffffffff81038e4d>] _debug_hotplug_cpu+0x7d/0xd0
        [<ffffffff8435d6bb>] debug_hotplug_cpu+0xd/0x11
        [<ffffffff84352426>] do_one_initcall+0x138/0x1cf
        [<ffffffff8435270a>] kernel_init_freeable+0x24d/0x2de
        [<ffffffff8209533a>] kernel_init+0xa/0x120
        [<ffffffff820a7972>] ret_from_fork+0x22/0x50

 -> #1 (cpu_hotplug.lock#2){+.+.+.}:
        [<ffffffff8117da2c>] lock_acquire+0xac/0x180
        [<ffffffff820a20d1>] mutex_lock_nested+0x71/0x4c0
        [<ffffffff810ff526>] get_online_cpus+0x66/0x80
        [<ffffffff81246fb3>] sum_vm_event+0x23/0x1b0
        [<ffffffff81293768>] collapse_huge_page+0x118/0x10b0
        [<ffffffff81294c5d>] khugepaged+0x55d/0xe80
        [<ffffffff81130304>] kthread+0x134/0x1a0
        [<ffffffff820a7972>] ret_from_fork+0x22/0x50

 -> #0 (&mm->mmap_sem){++++++}:
        [<ffffffff8117bf61>] __lock_acquire+0x2861/0x31f0
        [<ffffffff8117da2c>] lock_acquire+0xac/0x180
        [<ffffffff8126257e>] __might_fault+0xbe/0x150
        [<ffffffff8133160f>] kernfs_fop_write+0xaf/0x250
        [<ffffffff812a8933>] __vfs_write+0x43/0x1a0
        [<ffffffff812a8d3a>] vfs_write+0xda/0x240
        [<ffffffff812a8f84>] SyS_write+0x44/0xa0
        [<ffffffff820a773c>] entry_SYSCALL_64_fastpath+0x1f/0xbd

This patch moves sum_vm_event() before taking down_write(&mm->mmap_sem)
to solve dependency lock.

Signed-off-by: Ebru Akagunduz <ebru.akagunduz@gmail.com>
---
 mm/huge_memory.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 91442a9..feee44c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2451,6 +2451,9 @@ static void collapse_huge_page(struct mm_struct *mm,
 		goto out_nolock;
 	}
 
+	swap = get_mm_counter(mm, MM_SWAPENTS);
+	curr_allocstall = sum_vm_event(ALLOCSTALL);
+
 	/*
 	 * Prevent all access to pagetables with the exception of
 	 * gup_fast later hanlded by the ptep_clear_flush and the VM
@@ -2483,8 +2486,6 @@ static void collapse_huge_page(struct mm_struct *mm,
 		goto out;
 	}
 
-	swap = get_mm_counter(mm, MM_SWAPENTS);
-	curr_allocstall = sum_vm_event(ALLOCSTALL);
 	/*
 	 * Don't perform swapin readahead when the system is under pressure,
 	 * to avoid unnecessary resource consumption.
-- 
1.9.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/3] mm, thp: make swapin readahead under down_read of mmap_sem
  2016-05-23 17:14 [PATCH 0/3] mm, thp: remove duplication and fix locking issues in swapin Ebru Akagunduz
  2016-05-23 17:14 ` [PATCH 1/3] mm, thp: remove duplication of included header Ebru Akagunduz
  2016-05-23 17:14 ` [PATCH 2/3] mm, thp: fix possible circular locking dependency caused by sum_vm_event() Ebru Akagunduz
@ 2016-05-23 17:14 ` Ebru Akagunduz
  2016-05-23 17:44   ` Kirill A. Shutemov
  2016-05-23 18:42   ` Michal Hocko
  2016-05-23 17:29 ` [PATCH 0/3] mm, thp: remove duplication and fix locking issues in swapin Ebru Akagunduz
  3 siblings, 2 replies; 17+ messages in thread
From: Ebru Akagunduz @ 2016-05-23 17:14 UTC (permalink / raw)
  To: linux-mm
  Cc: hughd, riel, akpm, kirill.shutemov, n-horiguchi, aarcange,
	iamjoonsoo.kim, gorcunov, linux-kernel, mgorman, rientjes,
	vbabka, aneesh.kumar, hannes, mhocko, boaz, Ebru Akagunduz

Currently khugepaged makes swapin readahead under
down_write. This patch supplies to make swapin
readahead under down_read instead of down_write.

The patch was tested with a test program that allocates
800MB of memory, writes to it, and then sleeps. The system
was forced to swap out all. Afterwards, the test program
touches the area by writing, it skips a page in each
20 pages of the area.

Signed-off-by: Ebru Akagunduz <ebru.akagunduz@gmail.com>
---
 mm/huge_memory.c | 33 +++++++++++++++++++++++++++------
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index feee44c..668bc07 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2386,13 +2386,14 @@ static bool hugepage_vma_check(struct vm_area_struct *vma)
  * but with mmap_sem held to protect against vma changes.
  */
 
-static void __collapse_huge_page_swapin(struct mm_struct *mm,
+static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 					struct vm_area_struct *vma,
 					unsigned long address, pmd_t *pmd)
 {
 	unsigned long _address;
 	pte_t *pte, pteval;
 	int swapped_in = 0, ret = 0;
+	struct vm_area_struct *vma_orig = vma;
 
 	pte = pte_offset_map(pmd, address);
 	for (_address = address; _address < address + HPAGE_PMD_NR*PAGE_SIZE;
@@ -2402,11 +2403,19 @@ static void __collapse_huge_page_swapin(struct mm_struct *mm,
 			continue;
 		swapped_in++;
 		ret = do_swap_page(mm, vma, _address, pte, pmd,
-				   FAULT_FLAG_ALLOW_RETRY|FAULT_FLAG_RETRY_NOWAIT,
+				   FAULT_FLAG_ALLOW_RETRY,
 				   pteval);
+		/* do_swap_page returns VM_FAULT_RETRY with released mmap_sem */
+		if (ret & VM_FAULT_RETRY) {
+			down_read(&mm->mmap_sem);
+			vma = find_vma(mm, address);
+			/* vma is no longer available, don't continue to swapin */
+			if (vma != vma_orig)
+				return false;
+		}
 		if (ret & VM_FAULT_ERROR) {
 			trace_mm_collapse_huge_page_swapin(mm, swapped_in, 0);
-			return;
+			return false;
 		}
 		/* pte is unmapped now, we need to map it */
 		pte = pte_offset_map(pmd, _address);
@@ -2414,6 +2423,7 @@ static void __collapse_huge_page_swapin(struct mm_struct *mm,
 	pte--;
 	pte_unmap(pte);
 	trace_mm_collapse_huge_page_swapin(mm, swapped_in, 1);
+	return true;
 }
 
 static void collapse_huge_page(struct mm_struct *mm,
@@ -2459,7 +2469,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 	 * gup_fast later hanlded by the ptep_clear_flush and the VM
 	 * handled by the anon_vma lock + PG_lock.
 	 */
-	down_write(&mm->mmap_sem);
+	down_read(&mm->mmap_sem);
 	if (unlikely(khugepaged_test_exit(mm))) {
 		result = SCAN_ANY_PROCESS;
 		goto out;
@@ -2490,9 +2500,20 @@ static void collapse_huge_page(struct mm_struct *mm,
 	 * Don't perform swapin readahead when the system is under pressure,
 	 * to avoid unnecessary resource consumption.
 	 */
-	if (allocstall == curr_allocstall && swap != 0)
-		__collapse_huge_page_swapin(mm, vma, address, pmd);
+	if (allocstall == curr_allocstall && swap != 0) {
+		/*
+		 * __collapse_huge_page_swapin always returns with mmap_sem
+		 * locked. If it fails, release mmap_sem and jump directly
+		 * label out. Continuing to collapse causes inconsistency.
+		 */
+		if (!__collapse_huge_page_swapin(mm, vma, address, pmd)) {
+			up_read(&mm->mmap_sem);
+			goto out;
+		}
+	}
 
+	up_read(&mm->mmap_sem);
+	down_write(&mm->mmap_sem);
 	anon_vma_lock_write(vma->anon_vma);
 
 	pte = pte_offset_map(pmd, address);
-- 
1.9.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] mm, thp: remove duplication and fix locking issues in swapin
  2016-05-23 17:14 [PATCH 0/3] mm, thp: remove duplication and fix locking issues in swapin Ebru Akagunduz
                   ` (2 preceding siblings ...)
  2016-05-23 17:14 ` [PATCH 3/3] mm, thp: make swapin readahead under down_read of mmap_sem Ebru Akagunduz
@ 2016-05-23 17:29 ` Ebru Akagunduz
  2016-05-27 13:12   ` Michal Hocko
  3 siblings, 1 reply; 17+ messages in thread
From: Ebru Akagunduz @ 2016-05-23 17:29 UTC (permalink / raw)
  To: linux-mm
  Cc: hughd, riel, akpm, kirill.shutemov, n-horiguchi, aarcange,
	iamjoonsoo.kim, gorcunov, linux-kernel, mgorman, rientjes,
	vbabka, aneesh.kumar, hannes, mhocko, boaz

On Mon, May 23, 2016 at 08:14:08PM +0300, Ebru Akagunduz wrote:
> This patch series removes duplication of included header
> and fixes locking inconsistency in khugepaged swapin
> 
> Ebru Akagunduz (3):
>   mm, thp: remove duplication of included header
>   mm, thp: fix possible circular locking dependency caused by
>     sum_vm_event()
>   mm, thp: make swapin readahead under down_read of mmap_sem
> 
>  mm/huge_memory.c | 39 ++++++++++++++++++++++++++++++---------
>  1 file changed, 30 insertions(+), 9 deletions(-)
> 

Hi Andrew,

I prepared this patch series to solve rest of
problems of khugepaged swapin.

I have seen the discussion:
http://marc.info/?l=linux-mm&m=146373278424897&w=2

In my opinion, checking whether kswapd is wake up
could be good.

It's up to you. I can take an action according to community's decision.

Here is the last status of the discussion:

"
Optimistic swapin collapsing

1. it could be too optimisitic to lose the gain due to evicting workingset
2. let's detect memory pressure
3. current allocstall magic is not a good idea.
4. let's change the design from optimistic to conservative
5. how we can be conservative
6. two things - detect hot pages and threshold of swap pte
7. threhsold of swap pte is already done so remained thing is detect hot page
8. how to detect hot page - let's use young bit
9. Now, we are conservatie so we will swap in when it's worth
10. let's remove allocstall magic

I think it's not off-topic.
Anyway, it's just my thought and don't have any real workload and objection.
Feel free to ignore.
"

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] mm, thp: make swapin readahead under down_read of mmap_sem
  2016-05-23 17:14 ` [PATCH 3/3] mm, thp: make swapin readahead under down_read of mmap_sem Ebru Akagunduz
@ 2016-05-23 17:44   ` Kirill A. Shutemov
  2016-05-23 18:42   ` Michal Hocko
  1 sibling, 0 replies; 17+ messages in thread
From: Kirill A. Shutemov @ 2016-05-23 17:44 UTC (permalink / raw)
  To: Ebru Akagunduz
  Cc: linux-mm, hughd, riel, akpm, kirill.shutemov, n-horiguchi,
	aarcange, iamjoonsoo.kim, gorcunov, linux-kernel, mgorman,
	rientjes, vbabka, aneesh.kumar, hannes, mhocko, boaz

On Mon, May 23, 2016 at 08:14:11PM +0300, Ebru Akagunduz wrote:
> Currently khugepaged makes swapin readahead under
> down_write. This patch supplies to make swapin
> readahead under down_read instead of down_write.
> 
> The patch was tested with a test program that allocates
> 800MB of memory, writes to it, and then sleeps. The system
> was forced to swap out all. Afterwards, the test program
> touches the area by writing, it skips a page in each
> 20 pages of the area.
> 
> Signed-off-by: Ebru Akagunduz <ebru.akagunduz@gmail.com>
> ---
>  mm/huge_memory.c | 33 +++++++++++++++++++++++++++------
>  1 file changed, 27 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index feee44c..668bc07 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2386,13 +2386,14 @@ static bool hugepage_vma_check(struct vm_area_struct *vma)
>   * but with mmap_sem held to protect against vma changes.
>   */
>  
> -static void __collapse_huge_page_swapin(struct mm_struct *mm,
> +static bool __collapse_huge_page_swapin(struct mm_struct *mm,
>  					struct vm_area_struct *vma,
>  					unsigned long address, pmd_t *pmd)
>  {
>  	unsigned long _address;
>  	pte_t *pte, pteval;
>  	int swapped_in = 0, ret = 0;
> +	struct vm_area_struct *vma_orig = vma;
>  
>  	pte = pte_offset_map(pmd, address);
>  	for (_address = address; _address < address + HPAGE_PMD_NR*PAGE_SIZE;
> @@ -2402,11 +2403,19 @@ static void __collapse_huge_page_swapin(struct mm_struct *mm,
>  			continue;
>  		swapped_in++;
>  		ret = do_swap_page(mm, vma, _address, pte, pmd,
> -				   FAULT_FLAG_ALLOW_RETRY|FAULT_FLAG_RETRY_NOWAIT,
> +				   FAULT_FLAG_ALLOW_RETRY,
>  				   pteval);
> +		/* do_swap_page returns VM_FAULT_RETRY with released mmap_sem */
> +		if (ret & VM_FAULT_RETRY) {
> +			down_read(&mm->mmap_sem);
> +			vma = find_vma(mm, address);
> +			/* vma is no longer available, don't continue to swapin */
> +			if (vma != vma_orig)
> +				return false;
> +		}
>  		if (ret & VM_FAULT_ERROR) {
>  			trace_mm_collapse_huge_page_swapin(mm, swapped_in, 0);
> -			return;
> +			return false;
>  		}
>  		/* pte is unmapped now, we need to map it */
>  		pte = pte_offset_map(pmd, _address);
> @@ -2414,6 +2423,7 @@ static void __collapse_huge_page_swapin(struct mm_struct *mm,
>  	pte--;
>  	pte_unmap(pte);
>  	trace_mm_collapse_huge_page_swapin(mm, swapped_in, 1);
> +	return true;
>  }
>  
>  static void collapse_huge_page(struct mm_struct *mm,
> @@ -2459,7 +2469,7 @@ static void collapse_huge_page(struct mm_struct *mm,
>  	 * gup_fast later hanlded by the ptep_clear_flush and the VM
>  	 * handled by the anon_vma lock + PG_lock.
>  	 */
> -	down_write(&mm->mmap_sem);
> +	down_read(&mm->mmap_sem);
>  	if (unlikely(khugepaged_test_exit(mm))) {
>  		result = SCAN_ANY_PROCESS;
>  		goto out;
> @@ -2490,9 +2500,20 @@ static void collapse_huge_page(struct mm_struct *mm,
>  	 * Don't perform swapin readahead when the system is under pressure,
>  	 * to avoid unnecessary resource consumption.
>  	 */
> -	if (allocstall == curr_allocstall && swap != 0)
> -		__collapse_huge_page_swapin(mm, vma, address, pmd);
> +	if (allocstall == curr_allocstall && swap != 0) {
> +		/*
> +		 * __collapse_huge_page_swapin always returns with mmap_sem
> +		 * locked. If it fails, release mmap_sem and jump directly
> +		 * label out. Continuing to collapse causes inconsistency.
> +		 */
> +		if (!__collapse_huge_page_swapin(mm, vma, address, pmd)) {
> +			up_read(&mm->mmap_sem);
> +			goto out;
> +		}
> +	}
>  
> +	up_read(&mm->mmap_sem);
> +	down_write(&mm->mmap_sem);

That's the critical point.

How do you guarantee that the vma will not be destroyed (or changed)
between up_read() and down_write()?

You need at least find_vma() again.

>  	anon_vma_lock_write(vma->anon_vma);
>  
>  	pte = pte_offset_map(pmd, address);
> -- 
> 1.9.1
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] mm, thp: make swapin readahead under down_read of mmap_sem
  2016-05-23 17:14 ` [PATCH 3/3] mm, thp: make swapin readahead under down_read of mmap_sem Ebru Akagunduz
  2016-05-23 17:44   ` Kirill A. Shutemov
@ 2016-05-23 18:42   ` Michal Hocko
  2016-05-23 18:49     ` Rik van Riel
  1 sibling, 1 reply; 17+ messages in thread
From: Michal Hocko @ 2016-05-23 18:42 UTC (permalink / raw)
  To: Ebru Akagunduz
  Cc: linux-mm, hughd, riel, akpm, kirill.shutemov, n-horiguchi,
	aarcange, iamjoonsoo.kim, gorcunov, linux-kernel, mgorman,
	rientjes, vbabka, aneesh.kumar, hannes, boaz

On Mon 23-05-16 20:14:11, Ebru Akagunduz wrote:
> Currently khugepaged makes swapin readahead under
> down_write. This patch supplies to make swapin
> readahead under down_read instead of down_write.

You are still keeping down_write. Can we do without it altogether?
Blocking mmap_sem of a remote proces for write is certainly not nice.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] mm, thp: make swapin readahead under down_read of mmap_sem
  2016-05-23 18:42   ` Michal Hocko
@ 2016-05-23 18:49     ` Rik van Riel
  2016-05-23 19:01       ` Kirill A. Shutemov
  0 siblings, 1 reply; 17+ messages in thread
From: Rik van Riel @ 2016-05-23 18:49 UTC (permalink / raw)
  To: Michal Hocko, Ebru Akagunduz
  Cc: linux-mm, hughd, akpm, kirill.shutemov, n-horiguchi, aarcange,
	iamjoonsoo.kim, gorcunov, linux-kernel, mgorman, rientjes,
	vbabka, aneesh.kumar, hannes, boaz

[-- Attachment #1: Type: text/plain, Size: 614 bytes --]

On Mon, 2016-05-23 at 20:42 +0200, Michal Hocko wrote:
> On Mon 23-05-16 20:14:11, Ebru Akagunduz wrote:
> > 
> > Currently khugepaged makes swapin readahead under
> > down_write. This patch supplies to make swapin
> > readahead under down_read instead of down_write.
> You are still keeping down_write. Can we do without it altogether?
> Blocking mmap_sem of a remote proces for write is certainly not nice.

Maybe Andrea can explain why khugepaged requires
a down_write of mmap_sem?

If it were possible to have just down_read that
would make the code a lot simpler.

-- 
All Rights Reversed.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] mm, thp: make swapin readahead under down_read of mmap_sem
  2016-05-23 18:49     ` Rik van Riel
@ 2016-05-23 19:01       ` Kirill A. Shutemov
  2016-05-23 19:26         ` Rik van Riel
  0 siblings, 1 reply; 17+ messages in thread
From: Kirill A. Shutemov @ 2016-05-23 19:01 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Michal Hocko, Ebru Akagunduz, linux-mm, hughd, akpm, n-horiguchi,
	aarcange, iamjoonsoo.kim, gorcunov, linux-kernel, mgorman,
	rientjes, vbabka, aneesh.kumar, hannes, boaz

On Mon, May 23, 2016 at 02:49:09PM -0400, Rik van Riel wrote:
> On Mon, 2016-05-23 at 20:42 +0200, Michal Hocko wrote:
> > On Mon 23-05-16 20:14:11, Ebru Akagunduz wrote:
> > > 
> > > Currently khugepaged makes swapin readahead under
> > > down_write. This patch supplies to make swapin
> > > readahead under down_read instead of down_write.
> > You are still keeping down_write. Can we do without it altogether?
> > Blocking mmap_sem of a remote proces for write is certainly not nice.
> 
> Maybe Andrea can explain why khugepaged requires
> a down_write of mmap_sem?
> 
> If it were possible to have just down_read that
> would make the code a lot simpler.

You need a down_write() to retract page table. We need to make sure that
nobody sees the page table before we can replace it with huge pmd.

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] mm, thp: make swapin readahead under down_read of mmap_sem
  2016-05-23 19:01       ` Kirill A. Shutemov
@ 2016-05-23 19:26         ` Rik van Riel
  2016-05-23 20:02           ` Kirill A. Shutemov
  0 siblings, 1 reply; 17+ messages in thread
From: Rik van Riel @ 2016-05-23 19:26 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Michal Hocko, Ebru Akagunduz, linux-mm, hughd, akpm, n-horiguchi,
	aarcange, iamjoonsoo.kim, gorcunov, linux-kernel, mgorman,
	rientjes, vbabka, aneesh.kumar, hannes, boaz

[-- Attachment #1: Type: text/plain, Size: 1235 bytes --]

On Mon, 2016-05-23 at 22:01 +0300, Kirill A. Shutemov wrote:
> On Mon, May 23, 2016 at 02:49:09PM -0400, Rik van Riel wrote:
> > 
> > On Mon, 2016-05-23 at 20:42 +0200, Michal Hocko wrote:
> > > 
> > > On Mon 23-05-16 20:14:11, Ebru Akagunduz wrote:
> > > > 
> > > > 
> > > > Currently khugepaged makes swapin readahead under
> > > > down_write. This patch supplies to make swapin
> > > > readahead under down_read instead of down_write.
> > > You are still keeping down_write. Can we do without it
> > > altogether?
> > > Blocking mmap_sem of a remote proces for write is certainly not
> > > nice.
> > Maybe Andrea can explain why khugepaged requires
> > a down_write of mmap_sem?
> > 
> > If it were possible to have just down_read that
> > would make the code a lot simpler.
> You need a down_write() to retract page table. We need to make sure
> that
> nobody sees the page table before we can replace it with huge pmd.

Good point.

I guess the alternative is to have the page_table_lock
taken by a helper function (everywhere) that can return
failure if the page table was changed while the caller
was waiting for the lock.

Doable, but a fair amount of code churn.

-- 
All Rights Reversed.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] mm, thp: make swapin readahead under down_read of mmap_sem
  2016-05-23 19:26         ` Rik van Riel
@ 2016-05-23 20:02           ` Kirill A. Shutemov
  2016-05-23 20:13             ` Rik van Riel
  0 siblings, 1 reply; 17+ messages in thread
From: Kirill A. Shutemov @ 2016-05-23 20:02 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Kirill A. Shutemov, Michal Hocko, Ebru Akagunduz, linux-mm,
	hughd, akpm, n-horiguchi, aarcange, iamjoonsoo.kim, gorcunov,
	linux-kernel, mgorman, rientjes, vbabka, aneesh.kumar, hannes,
	boaz

On Mon, May 23, 2016 at 03:26:47PM -0400, Rik van Riel wrote:
> On Mon, 2016-05-23 at 22:01 +0300, Kirill A. Shutemov wrote:
> > On Mon, May 23, 2016 at 02:49:09PM -0400, Rik van Riel wrote:
> > > 
> > > On Mon, 2016-05-23 at 20:42 +0200, Michal Hocko wrote:
> > > > 
> > > > On Mon 23-05-16 20:14:11, Ebru Akagunduz wrote:
> > > > > 
> > > > > 
> > > > > Currently khugepaged makes swapin readahead under
> > > > > down_write. This patch supplies to make swapin
> > > > > readahead under down_read instead of down_write.
> > > > You are still keeping down_write. Can we do without it
> > > > altogether?
> > > > Blocking mmap_sem of a remote proces for write is certainly not
> > > > nice.
> > > Maybe Andrea can explain why khugepaged requires
> > > a down_write of mmap_sem?
> > > 
> > > If it were possible to have just down_read that
> > > would make the code a lot simpler.
> > You need a down_write() to retract page table. We need to make sure
> > that
> > nobody sees the page table before we can replace it with huge pmd.
> 
> Good point.
> 
> I guess the alternative is to have the page_table_lock
> taken by a helper function (everywhere) that can return
> failure if the page table was changed while the caller
> was waiting for the lock.

Not page table was changed, but pmd is now pointing to something else.
Basically, we would need to nest all pte-ptl's within pmd_lock().
That's not good for scalability.

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] mm, thp: make swapin readahead under down_read of mmap_sem
  2016-05-23 20:02           ` Kirill A. Shutemov
@ 2016-05-23 20:13             ` Rik van Riel
  2016-05-23 21:49               ` Kirill A. Shutemov
  0 siblings, 1 reply; 17+ messages in thread
From: Rik van Riel @ 2016-05-23 20:13 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Michal Hocko, Ebru Akagunduz, linux-mm,
	hughd, akpm, n-horiguchi, aarcange, iamjoonsoo.kim, gorcunov,
	linux-kernel, mgorman, rientjes, vbabka, aneesh.kumar, hannes,
	boaz

[-- Attachment #1: Type: text/plain, Size: 2477 bytes --]

On Mon, 2016-05-23 at 23:02 +0300, Kirill A. Shutemov wrote:
> On Mon, May 23, 2016 at 03:26:47PM -0400, Rik van Riel wrote:
> > 
> > On Mon, 2016-05-23 at 22:01 +0300, Kirill A. Shutemov wrote:
> > > 
> > > On Mon, May 23, 2016 at 02:49:09PM -0400, Rik van Riel wrote:
> > > > 
> > > > 
> > > > On Mon, 2016-05-23 at 20:42 +0200, Michal Hocko wrote:
> > > > > 
> > > > > 
> > > > > On Mon 23-05-16 20:14:11, Ebru Akagunduz wrote:
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Currently khugepaged makes swapin readahead under
> > > > > > down_write. This patch supplies to make swapin
> > > > > > readahead under down_read instead of down_write.
> > > > > You are still keeping down_write. Can we do without it
> > > > > altogether?
> > > > > Blocking mmap_sem of a remote proces for write is certainly
> > > > > not
> > > > > nice.
> > > > Maybe Andrea can explain why khugepaged requires
> > > > a down_write of mmap_sem?
> > > > 
> > > > If it were possible to have just down_read that
> > > > would make the code a lot simpler.
> > > You need a down_write() to retract page table. We need to make
> > > sure
> > > that
> > > nobody sees the page table before we can replace it with huge
> > > pmd.
> > Good point.
> > 
> > I guess the alternative is to have the page_table_lock
> > taken by a helper function (everywhere) that can return
> > failure if the page table was changed while the caller
> > was waiting for the lock.
> Not page table was changed, but pmd is now pointing to something
> else.
> Basically, we would need to nest all pte-ptl's within pmd_lock().
> That's not good for scalability.

I can see a few alternatives here:

1) huge pmd collapsing takes both the pmd lock and the pte lock,
   preventing pte updates from happening simultaneously

2) code that (re-)acquires the pte lock can read a sequence number
   at the pmd level, check that it did not change after the
   pte lock has been acquired, and abort if it has - I believe most
   of the code that re-acquires the pte lock already knows how to
   abort if somebody else touched the pte while it was looking
   elsewhere

That way the (uncommon) thp collapse code should still exclude
pte level operations, at the cost of potentially teaching a few
more pte level operations to abort (chances are most already do,
considering a race with other pte-level manipulations requires that).

-- 
All Rights Reversed.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] mm, thp: make swapin readahead under down_read of mmap_sem
  2016-05-23 20:13             ` Rik van Riel
@ 2016-05-23 21:49               ` Kirill A. Shutemov
  2016-05-23 23:08                 ` Andrea Arcangeli
  0 siblings, 1 reply; 17+ messages in thread
From: Kirill A. Shutemov @ 2016-05-23 21:49 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Kirill A. Shutemov, Michal Hocko, Ebru Akagunduz, linux-mm,
	hughd, akpm, n-horiguchi, aarcange, iamjoonsoo.kim, gorcunov,
	linux-kernel, mgorman, rientjes, vbabka, aneesh.kumar, hannes,
	boaz

On Mon, May 23, 2016 at 04:13:03PM -0400, Rik van Riel wrote:
> On Mon, 2016-05-23 at 23:02 +0300, Kirill A. Shutemov wrote:
> > On Mon, May 23, 2016 at 03:26:47PM -0400, Rik van Riel wrote:
> > > 
> > > On Mon, 2016-05-23 at 22:01 +0300, Kirill A. Shutemov wrote:
> > > > 
> > > > On Mon, May 23, 2016 at 02:49:09PM -0400, Rik van Riel wrote:
> > > > > 
> > > > > 
> > > > > On Mon, 2016-05-23 at 20:42 +0200, Michal Hocko wrote:
> > > > > > 
> > > > > > 
> > > > > > On Mon 23-05-16 20:14:11, Ebru Akagunduz wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > Currently khugepaged makes swapin readahead under
> > > > > > > down_write. This patch supplies to make swapin
> > > > > > > readahead under down_read instead of down_write.
> > > > > > You are still keeping down_write. Can we do without it
> > > > > > altogether?
> > > > > > Blocking mmap_sem of a remote proces for write is certainly
> > > > > > not
> > > > > > nice.
> > > > > Maybe Andrea can explain why khugepaged requires
> > > > > a down_write of mmap_sem?
> > > > > 
> > > > > If it were possible to have just down_read that
> > > > > would make the code a lot simpler.
> > > > You need a down_write() to retract page table. We need to make
> > > > sure
> > > > that
> > > > nobody sees the page table before we can replace it with huge
> > > > pmd.
> > > Good point.
> > > 
> > > I guess the alternative is to have the page_table_lock
> > > taken by a helper function (everywhere) that can return
> > > failure if the page table was changed while the caller
> > > was waiting for the lock.
> > Not page table was changed, but pmd is now pointing to something
> > else.
> > Basically, we would need to nest all pte-ptl's within pmd_lock().
> > That's not good for scalability.
> 
> I can see a few alternatives here:
> 
> 1) huge pmd collapsing takes both the pmd lock and the pte lock,
>    preventing pte updates from happening simultaneously

That's what we do now and that's not enough.

We would need to serialize against pmd_lock() during normal page-fault
path (and other pte manipulation), which we don't do now if pmd points to
page table.

That's huge hit on scalability.

> 
> 2) code that (re-)acquires the pte lock can read a sequence number
>    at the pmd level, check that it did not change after the
>    pte lock has been acquired, and abort if it has - I believe most
>    of the code that re-acquires the pte lock already knows how to
>    abort if somebody else touched the pte while it was looking
>    elsewhere

So, every pmd_lock() (and other means we take the lock) should bump the
sequence number and we need to be able to read stable result  outside
pmd_lock(), meaning it should be atomic_t or something similar.

Not exactly free.

And I'm not convinced the hassle worth the gain.

> That way the (uncommon) thp collapse code should still exclude
> pte level operations, at the cost of potentially teaching a few
> more pte level operations to abort (chances are most already do,
> considering a race with other pte-level manipulations requires that).

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] mm, thp: make swapin readahead under down_read of mmap_sem
  2016-05-23 21:49               ` Kirill A. Shutemov
@ 2016-05-23 23:08                 ` Andrea Arcangeli
  0 siblings, 0 replies; 17+ messages in thread
From: Andrea Arcangeli @ 2016-05-23 23:08 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Rik van Riel, Kirill A. Shutemov, Michal Hocko, Ebru Akagunduz,
	linux-mm, hughd, akpm, n-horiguchi, iamjoonsoo.kim, gorcunov,
	linux-kernel, mgorman, rientjes, vbabka, aneesh.kumar, hannes,
	boaz

On Tue, May 24, 2016 at 12:49:42AM +0300, Kirill A. Shutemov wrote:
> That's what we do now and that's not enough.
> 
> We would need to serialize against pmd_lock() during normal page-fault
> path (and other pte manipulation), which we don't do now if pmd points to
> page table.

Yes, mmap_sem for writing while converting the pmd to a
pmd_trans_huge() in khugepaged, is so that the pagetable walk doesn't
require the pmd_lock after holding the mmap_sem for reading if the pmd
is found !pmd_trans_unstable, i.e. if the pmd points to a pte.

This way the non-THP pte walk retains the identical cost it has with
THP not compiled into the kernel (even when THP is enabled).

khugepaged already starts by doing work with the mmap_sem for reading,
then while holding the mmap_sem for reading if khugepaged_scan_pmd()
finds a candidate pmd to collapse into a pmd_trans_huge(), it calls
collapse_huge_page which at some point releases the mmap_sem for
reading (before the THP memory allocation) and takes it again for
writing if the allocation succeeded and we can go ahead with the
atomic THP collapse (under mmap_sem for writing and under the anon_vma
lock for writing too to serialize against split_huge_page which can be
called on the physical page and doesn't hold any mmap_sem but just
finds the pagetables through a rmap walk). The atomic part is all non
blocking.

The swapin loop can run under the mmap_sem for reading if it does the
proper check to revalidate the vma, but it should move above the below
comment.

	/*
	 * Prevent all access to pagetables with the exception of
	 * gup_fast later hanlded by the ptep_clear_flush and the VM
	 * handled by the anon_vma lock + PG_lock.
	 */
	down_write(&mm->mmap_sem);

Which is more or less what the last patch was doing except by keeping
the comment above the swapin stage, it made the comment wrong, as the
comment was then followed by a down_read.

Aside from the comment being wrong (which is not a kernel crashing
issue), the real problem was lack of revalidates after releasing the
mmap_sem and this revalidate attempt is also not correct:

+                       vma = find_vma(mm, address);
+                       /* vma is no longer available, don't continue to swapin */
+                       if (vma != vma_orig)
+                               return false;

Because the mmap_sem was temporarily dropped, the vma may have been
freed and reallocated at the same address, but it may be a completely
different vma with different vm_start/end values or it may not be
anonymous or mremap may have altered the vm_start/end too or the "mm"
may have exited in the meanwhile.

collapse_huge_page already shows how to correctly to revalidate the
vma after dropping the mmap_sem temporarily:

	down_write(&mm->mmap_sem);
	if (unlikely(khugepaged_test_exit(mm))) {
		result = SCAN_ANY_PROCESS;
		goto out;
	}

	vma = find_vma(mm, address);
	if (!vma) {
		result = SCAN_VMA_NULL;
		goto out;
	}
	hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
	hend = vma->vm_end & HPAGE_PMD_MASK;
	if (address < hstart || address + HPAGE_PMD_SIZE > hend) {
		result = SCAN_ADDRESS_RANGE;
		goto out;
	}
	if (!hugepage_vma_check(vma)) {
		result = SCAN_VMA_CHECK;
		goto out;
	}

All checks above are needed for a correct revalidate, otherwise the
above code could also have been replaced by a vma != vma_orig.

If we move this swapin stage before the comment and after the THP
allocation succeeded, and we do enough revalidates correctly (which
are currently missing or incorrect, and find_vma is not enough for a
revalidate, find_vma only says there's some random vma with vm_end >
address), it should then work ok under only the mmap_sem for
reading.

Overall the last patch goes in the right direction just it needs to do
all revalidates right and to move the swapin stage a bit more up to
avoid invalidating the comment I think.

Thanks,
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] mm, thp: remove duplication and fix locking issues in swapin
  2016-05-23 17:29 ` [PATCH 0/3] mm, thp: remove duplication and fix locking issues in swapin Ebru Akagunduz
@ 2016-05-27 13:12   ` Michal Hocko
  2016-06-11 19:21     ` Ebru Akagunduz
  0 siblings, 1 reply; 17+ messages in thread
From: Michal Hocko @ 2016-05-27 13:12 UTC (permalink / raw)
  To: Ebru Akagunduz
  Cc: linux-mm, hughd, riel, akpm, kirill.shutemov, n-horiguchi,
	aarcange, iamjoonsoo.kim, gorcunov, linux-kernel, mgorman,
	rientjes, vbabka, aneesh.kumar, hannes, boaz

On Mon 23-05-16 20:29:29, Ebru Akagunduz wrote:
> On Mon, May 23, 2016 at 08:14:08PM +0300, Ebru Akagunduz wrote:
> > This patch series removes duplication of included header
> > and fixes locking inconsistency in khugepaged swapin
> > 
> > Ebru Akagunduz (3):
> >   mm, thp: remove duplication of included header
> >   mm, thp: fix possible circular locking dependency caused by
> >     sum_vm_event()
> >   mm, thp: make swapin readahead under down_read of mmap_sem
> > 
> >  mm/huge_memory.c | 39 ++++++++++++++++++++++++++++++---------
> >  1 file changed, 30 insertions(+), 9 deletions(-)
> > 
> 
> Hi Andrew,
> 
> I prepared this patch series to solve rest of
> problems of khugepaged swapin.
> 
> I have seen the discussion:
> http://marc.info/?l=linux-mm&m=146373278424897&w=2
> 
> In my opinion, checking whether kswapd is wake up
> could be good.

This is still not enough because it doesn't help memcg loads. kswapd
might be sleeping but the memcg reclaim can still be active. So I think
we really need to do ~__GFP_DIRECT_RECLAIM thing.

> It's up to you. I can take an action according to community's decision.

IMHO we should drop the current ALLOCSTALL heuristic and replace it with
~__GFP_DIRECT_RECLAIM.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] mm, thp: remove duplication and fix locking issues in swapin
  2016-05-27 13:12   ` Michal Hocko
@ 2016-06-11 19:21     ` Ebru Akagunduz
  2016-06-13 13:55       ` Michal Hocko
  0 siblings, 1 reply; 17+ messages in thread
From: Ebru Akagunduz @ 2016-06-11 19:21 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, hughd, riel, akpm, kirill.shutemov, n-horiguchi,
	aarcange, iamjoonsoo.kim, gorcunov, linux-kernel, mgorman,
	rientjes, vbabka, aneesh.kumar, hannes, boaz

On Fri, May 27, 2016 at 03:12:47PM +0200, Michal Hocko wrote:
> On Mon 23-05-16 20:29:29, Ebru Akagunduz wrote:
> > On Mon, May 23, 2016 at 08:14:08PM +0300, Ebru Akagunduz wrote:
> > > This patch series removes duplication of included header
> > > and fixes locking inconsistency in khugepaged swapin
> > > 
> > > Ebru Akagunduz (3):
> > >   mm, thp: remove duplication of included header
> > >   mm, thp: fix possible circular locking dependency caused by
> > >     sum_vm_event()
> > >   mm, thp: make swapin readahead under down_read of mmap_sem
> > > 
> > >  mm/huge_memory.c | 39 ++++++++++++++++++++++++++++++---------
> > >  1 file changed, 30 insertions(+), 9 deletions(-)
> > > 
> > 
> > Hi Andrew,
> > 
> > I prepared this patch series to solve rest of
> > problems of khugepaged swapin.
> > 
> > I have seen the discussion:
> > http://marc.info/?l=linux-mm&m=146373278424897&w=2
> > 
> > In my opinion, checking whether kswapd is wake up
> > could be good.
> 
> This is still not enough because it doesn't help memcg loads. kswapd
> might be sleeping but the memcg reclaim can still be active. So I think
> we really need to do ~__GFP_DIRECT_RECLAIM thing.
> 
> > It's up to you. I can take an action according to community's decision.
> 
> IMHO we should drop the current ALLOCSTALL heuristic and replace it with
> ~__GFP_DIRECT_RECLAIM.
Actually, I don't lean towards to touch do_swap_page giving gfp parameter.
do_swap_page is also used by do_page_fault, it can cause many side effect
that I can't see. 

I've just sent a patch series for converting from optimistic to conservative and take
back allocstall. Maybe that way can be easier to be approved and less problemitical.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] mm, thp: remove duplication and fix locking issues in swapin
  2016-06-11 19:21     ` Ebru Akagunduz
@ 2016-06-13 13:55       ` Michal Hocko
  0 siblings, 0 replies; 17+ messages in thread
From: Michal Hocko @ 2016-06-13 13:55 UTC (permalink / raw)
  To: Ebru Akagunduz
  Cc: linux-mm, hughd, riel, akpm, kirill.shutemov, n-horiguchi,
	aarcange, iamjoonsoo.kim, gorcunov, linux-kernel, mgorman,
	rientjes, vbabka, aneesh.kumar, hannes, boaz

On Sat 11-06-16 22:21:06, Ebru Akagunduz wrote:
> On Fri, May 27, 2016 at 03:12:47PM +0200, Michal Hocko wrote:
[...]
> > IMHO we should drop the current ALLOCSTALL heuristic and replace it with
> > ~__GFP_DIRECT_RECLAIM.
>
> Actually, I don't lean towards to touch do_swap_page giving gfp parameter.
> do_swap_page is also used by do_page_fault, it can cause many side effect
> that I can't see. 

The point of the patch is to keep the current gfp mask from all places
except for the optimistic swapin. This was what my example patch did
AFAIR.
 
> I've just sent a patch series for converting from optimistic to
> conservative and take back allocstall. Maybe that way can be easier to
> be approved and less problemitical.

I will try to have a look but I will be travelling this week so cannot
promise anything.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-06-13 13:55 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-23 17:14 [PATCH 0/3] mm, thp: remove duplication and fix locking issues in swapin Ebru Akagunduz
2016-05-23 17:14 ` [PATCH 1/3] mm, thp: remove duplication of included header Ebru Akagunduz
2016-05-23 17:14 ` [PATCH 2/3] mm, thp: fix possible circular locking dependency caused by sum_vm_event() Ebru Akagunduz
2016-05-23 17:14 ` [PATCH 3/3] mm, thp: make swapin readahead under down_read of mmap_sem Ebru Akagunduz
2016-05-23 17:44   ` Kirill A. Shutemov
2016-05-23 18:42   ` Michal Hocko
2016-05-23 18:49     ` Rik van Riel
2016-05-23 19:01       ` Kirill A. Shutemov
2016-05-23 19:26         ` Rik van Riel
2016-05-23 20:02           ` Kirill A. Shutemov
2016-05-23 20:13             ` Rik van Riel
2016-05-23 21:49               ` Kirill A. Shutemov
2016-05-23 23:08                 ` Andrea Arcangeli
2016-05-23 17:29 ` [PATCH 0/3] mm, thp: remove duplication and fix locking issues in swapin Ebru Akagunduz
2016-05-27 13:12   ` Michal Hocko
2016-06-11 19:21     ` Ebru Akagunduz
2016-06-13 13:55       ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).