All of lore.kernel.org
 help / color / mirror / Atom feed
* mm: page allocation failures in swap_duplicate -> add_swap_count_continuation
@ 2017-05-12  9:18 ` Christian Borntraeger
  0 siblings, 0 replies; 10+ messages in thread
From: Christian Borntraeger @ 2017-05-12  9:18 UTC (permalink / raw)
  To: linux-mm, Linux Kernel Mailing List

Folks,

recently I have seen page allocation failures during
paging in the paging code:
e.g. 

May 05 21:36:53  kernel: Call Trace:
May 05 21:36:53  kernel: ([<0000000000112f62>] show_trace+0x62/0x78)
May 05 21:36:53  kernel:  [<0000000000113050>] show_stack+0x68/0xe0 
May 05 21:36:53  kernel:  [<00000000004fb97e>] dump_stack+0x7e/0xb0 
May 05 21:36:53  kernel:  [<0000000000299262>] warn_alloc+0xf2/0x190 
May 05 21:36:53  kernel:  [<000000000029a25a>] __alloc_pages_nodemask+0xeda/0xfe0 
May 05 21:36:53  kernel:  [<00000000002fa570>] alloc_pages_current+0xb8/0x170 
May 05 21:36:53  kernel:  [<00000000002f03fc>] add_swap_count_continuation+0x3c/0x280 
May 05 21:36:53  kernel:  [<00000000002f068c>] swap_duplicate+0x4c/0x80 
May 05 21:36:53  kernel:  [<00000000002dfbfa>] try_to_unmap_one+0x372/0x578 
May 05 21:36:53  kernel:  [<000000000030131a>] rmap_walk_ksm+0x14a/0x1d8 
May 05 21:36:53  kernel:  [<00000000002e0d60>] try_to_unmap+0x140/0x170 
May 05 21:36:53  kernel:  [<00000000002abc9c>] shrink_page_list+0x944/0xad8 
May 05 21:36:53  kernel:  [<00000000002ac720>] shrink_inactive_list+0x1e0/0x5b8 
May 05 21:36:53  kernel:  [<00000000002ad642>] shrink_node_memcg+0x5e2/0x800 
May 05 21:36:53  kernel:  [<00000000002ad954>] shrink_node+0xf4/0x360 
May 05 21:36:53  kernel:  [<00000000002aeb00>] kswapd+0x330/0x810 
May 05 21:36:53  kernel:  [<0000000000189f14>] kthread+0x144/0x168 
May 05 21:36:53  kernel:  [<00000000008011ea>] kernel_thread_starter+0x6/0xc 
May 05 21:36:53  kernel:  [<00000000008011e4>] kernel_thread_starter+0x0/0xc 

This seems to be new in 4.11 but the relevant code did not seem to have
changed.

Something like this 

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 1781308..b2dd53e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3039,7 +3039,7 @@ int swap_duplicate(swp_entry_t entry)
        int err = 0;
 
        while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
-               err = add_swap_count_continuation(entry, GFP_ATOMIC);
+               err = add_swap_count_continuation(entry, GFP_ATOMIC | __GFP_NOWARN);
        return err;
 }
 

seems not appropriate, because this code does not know if the caller can
handle returned errors.

Would something like the following (white space damaged cut'n'paste be ok?
(the try_to_unmap_one change looks fine, not sure if copy_one_pte does the
right thing)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 45e91dd..4577494 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -391,7 +391,7 @@ extern swp_entry_t get_swap_page_of_type(int);
 extern int get_swap_pages(int n, swp_entry_t swp_entries[]);
 extern int add_swap_count_continuation(swp_entry_t, gfp_t);
 extern void swap_shmem_alloc(swp_entry_t);
-extern int swap_duplicate(swp_entry_t);
+extern int swap_duplicate(swp_entry_t, gfp_t);
 extern int swapcache_prepare(swp_entry_t);
 extern void swap_free(swp_entry_t);
 extern void swapcache_free(swp_entry_t);
@@ -447,7 +447,7 @@ static inline void swap_shmem_alloc(swp_entry_t swp)
 {
 }
 
-static inline int swap_duplicate(swp_entry_t swp)
+int swap_duplicate(swp_entry_t entry, gfp_t gfp_mask)
 {
        return 0;
 }
diff --git a/mm/memory.c b/mm/memory.c
index 235ba51..3ae6f33 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -898,7 +898,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
                swp_entry_t entry = pte_to_swp_entry(pte);
 
                if (likely(!non_swap_entry(entry))) {
-                       if (swap_duplicate(entry) < 0)
+                       if (swap_duplicate(entry, __GFP_NOWARN) < 0)
                                return entry.val;
 
                        /* make sure dst_mm is on swapoff's mmlist. */
diff --git a/mm/rmap.c b/mm/rmap.c
index f683801..777feb6 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1433,7 +1433,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
                                goto discard;
                        }
 
-                       if (swap_duplicate(entry) < 0) {
+                       if (swap_duplicate(entry, __GFP_NOWARN) < 0) {
                                set_pte_at(mm, address, pvmw.pte, pteval);
                                ret = SWAP_FAIL;
                                page_vma_mapped_walk_done(&pvmw);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 1781308..1f86268 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3034,12 +3034,12 @@ void swap_shmem_alloc(swp_entry_t entry)
  * if __swap_duplicate() fails for another reason (-EINVAL or -ENOENT), which
  * might occur if a page table entry has got corrupted.
  */
-int swap_duplicate(swp_entry_t entry)
+int swap_duplicate(swp_entry_t entry, gfp_t gfp_mask)
 {
        int err = 0;
 
        while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
-               err = add_swap_count_continuation(entry, GFP_ATOMIC);
+               err = add_swap_count_continuation(entry, GFP_ATOMIC | gfp_mask);
        return err;
 }

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* mm: page allocation failures in swap_duplicate -> add_swap_count_continuation
@ 2017-05-12  9:18 ` Christian Borntraeger
  0 siblings, 0 replies; 10+ messages in thread
From: Christian Borntraeger @ 2017-05-12  9:18 UTC (permalink / raw)
  To: linux-mm, Linux Kernel Mailing List

Folks,

recently I have seen page allocation failures during
paging in the paging code:
e.g. 

May 05 21:36:53  kernel: Call Trace:
May 05 21:36:53  kernel: ([<0000000000112f62>] show_trace+0x62/0x78)
May 05 21:36:53  kernel:  [<0000000000113050>] show_stack+0x68/0xe0 
May 05 21:36:53  kernel:  [<00000000004fb97e>] dump_stack+0x7e/0xb0 
May 05 21:36:53  kernel:  [<0000000000299262>] warn_alloc+0xf2/0x190 
May 05 21:36:53  kernel:  [<000000000029a25a>] __alloc_pages_nodemask+0xeda/0xfe0 
May 05 21:36:53  kernel:  [<00000000002fa570>] alloc_pages_current+0xb8/0x170 
May 05 21:36:53  kernel:  [<00000000002f03fc>] add_swap_count_continuation+0x3c/0x280 
May 05 21:36:53  kernel:  [<00000000002f068c>] swap_duplicate+0x4c/0x80 
May 05 21:36:53  kernel:  [<00000000002dfbfa>] try_to_unmap_one+0x372/0x578 
May 05 21:36:53  kernel:  [<000000000030131a>] rmap_walk_ksm+0x14a/0x1d8 
May 05 21:36:53  kernel:  [<00000000002e0d60>] try_to_unmap+0x140/0x170 
May 05 21:36:53  kernel:  [<00000000002abc9c>] shrink_page_list+0x944/0xad8 
May 05 21:36:53  kernel:  [<00000000002ac720>] shrink_inactive_list+0x1e0/0x5b8 
May 05 21:36:53  kernel:  [<00000000002ad642>] shrink_node_memcg+0x5e2/0x800 
May 05 21:36:53  kernel:  [<00000000002ad954>] shrink_node+0xf4/0x360 
May 05 21:36:53  kernel:  [<00000000002aeb00>] kswapd+0x330/0x810 
May 05 21:36:53  kernel:  [<0000000000189f14>] kthread+0x144/0x168 
May 05 21:36:53  kernel:  [<00000000008011ea>] kernel_thread_starter+0x6/0xc 
May 05 21:36:53  kernel:  [<00000000008011e4>] kernel_thread_starter+0x0/0xc 

This seems to be new in 4.11 but the relevant code did not seem to have
changed.

Something like this 

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 1781308..b2dd53e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3039,7 +3039,7 @@ int swap_duplicate(swp_entry_t entry)
        int err = 0;
 
        while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
-               err = add_swap_count_continuation(entry, GFP_ATOMIC);
+               err = add_swap_count_continuation(entry, GFP_ATOMIC | __GFP_NOWARN);
        return err;
 }
 

seems not appropriate, because this code does not know if the caller can
handle returned errors.

Would something like the following (white space damaged cut'n'paste be ok?
(the try_to_unmap_one change looks fine, not sure if copy_one_pte does the
right thing)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 45e91dd..4577494 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -391,7 +391,7 @@ extern swp_entry_t get_swap_page_of_type(int);
 extern int get_swap_pages(int n, swp_entry_t swp_entries[]);
 extern int add_swap_count_continuation(swp_entry_t, gfp_t);
 extern void swap_shmem_alloc(swp_entry_t);
-extern int swap_duplicate(swp_entry_t);
+extern int swap_duplicate(swp_entry_t, gfp_t);
 extern int swapcache_prepare(swp_entry_t);
 extern void swap_free(swp_entry_t);
 extern void swapcache_free(swp_entry_t);
@@ -447,7 +447,7 @@ static inline void swap_shmem_alloc(swp_entry_t swp)
 {
 }
 
-static inline int swap_duplicate(swp_entry_t swp)
+int swap_duplicate(swp_entry_t entry, gfp_t gfp_mask)
 {
        return 0;
 }
diff --git a/mm/memory.c b/mm/memory.c
index 235ba51..3ae6f33 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -898,7 +898,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
                swp_entry_t entry = pte_to_swp_entry(pte);
 
                if (likely(!non_swap_entry(entry))) {
-                       if (swap_duplicate(entry) < 0)
+                       if (swap_duplicate(entry, __GFP_NOWARN) < 0)
                                return entry.val;
 
                        /* make sure dst_mm is on swapoff's mmlist. */
diff --git a/mm/rmap.c b/mm/rmap.c
index f683801..777feb6 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1433,7 +1433,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
                                goto discard;
                        }
 
-                       if (swap_duplicate(entry) < 0) {
+                       if (swap_duplicate(entry, __GFP_NOWARN) < 0) {
                                set_pte_at(mm, address, pvmw.pte, pteval);
                                ret = SWAP_FAIL;
                                page_vma_mapped_walk_done(&pvmw);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 1781308..1f86268 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3034,12 +3034,12 @@ void swap_shmem_alloc(swp_entry_t entry)
  * if __swap_duplicate() fails for another reason (-EINVAL or -ENOENT), which
  * might occur if a page table entry has got corrupted.
  */
-int swap_duplicate(swp_entry_t entry)
+int swap_duplicate(swp_entry_t entry, gfp_t gfp_mask)
 {
        int err = 0;
 
        while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
-               err = add_swap_count_continuation(entry, GFP_ATOMIC);
+               err = add_swap_count_continuation(entry, GFP_ATOMIC | gfp_mask);
        return err;
 }


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: mm: page allocation failures in swap_duplicate -> add_swap_count_continuation
  2017-05-12  9:18 ` Christian Borntraeger
@ 2017-05-15  8:03   ` Michal Hocko
  -1 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2017-05-15  8:03 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: linux-mm, Linux Kernel Mailing List

On Fri 12-05-17 11:18:42, Christian Borntraeger wrote:
> Folks,
> 
> recently I have seen page allocation failures during
> paging in the paging code:
> e.g. 
> 
> May 05 21:36:53  kernel: Call Trace:
> May 05 21:36:53  kernel: ([<0000000000112f62>] show_trace+0x62/0x78)
> May 05 21:36:53  kernel:  [<0000000000113050>] show_stack+0x68/0xe0 
> May 05 21:36:53  kernel:  [<00000000004fb97e>] dump_stack+0x7e/0xb0 
> May 05 21:36:53  kernel:  [<0000000000299262>] warn_alloc+0xf2/0x190 
> May 05 21:36:53  kernel:  [<000000000029a25a>] __alloc_pages_nodemask+0xeda/0xfe0 
> May 05 21:36:53  kernel:  [<00000000002fa570>] alloc_pages_current+0xb8/0x170 
> May 05 21:36:53  kernel:  [<00000000002f03fc>] add_swap_count_continuation+0x3c/0x280 
> May 05 21:36:53  kernel:  [<00000000002f068c>] swap_duplicate+0x4c/0x80 
> May 05 21:36:53  kernel:  [<00000000002dfbfa>] try_to_unmap_one+0x372/0x578 
> May 05 21:36:53  kernel:  [<000000000030131a>] rmap_walk_ksm+0x14a/0x1d8 
> May 05 21:36:53  kernel:  [<00000000002e0d60>] try_to_unmap+0x140/0x170 
> May 05 21:36:53  kernel:  [<00000000002abc9c>] shrink_page_list+0x944/0xad8 
> May 05 21:36:53  kernel:  [<00000000002ac720>] shrink_inactive_list+0x1e0/0x5b8 
> May 05 21:36:53  kernel:  [<00000000002ad642>] shrink_node_memcg+0x5e2/0x800 
> May 05 21:36:53  kernel:  [<00000000002ad954>] shrink_node+0xf4/0x360 
> May 05 21:36:53  kernel:  [<00000000002aeb00>] kswapd+0x330/0x810 
> May 05 21:36:53  kernel:  [<0000000000189f14>] kthread+0x144/0x168 
> May 05 21:36:53  kernel:  [<00000000008011ea>] kernel_thread_starter+0x6/0xc 
> May 05 21:36:53  kernel:  [<00000000008011e4>] kernel_thread_starter+0x0/0xc 
> 
> This seems to be new in 4.11 but the relevant code did not seem to have
> changed.
> 
> Something like this 
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 1781308..b2dd53e 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -3039,7 +3039,7 @@ int swap_duplicate(swp_entry_t entry)
>         int err = 0;
>  
>         while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
> -               err = add_swap_count_continuation(entry, GFP_ATOMIC);
> +               err = add_swap_count_continuation(entry, GFP_ATOMIC | __GFP_NOWARN);
>         return err;
>  }
>  
> 
> seems not appropriate, because this code does not know if the caller can
> handle returned errors.
> 
> Would something like the following (white space damaged cut'n'paste be ok?
> (the try_to_unmap_one change looks fine, not sure if copy_one_pte does the
> right thing)

No, it won't. If you want to silent the warning then explain _why_ it is
a good approach. It is not immediatelly clear to me.

> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 235ba51..3ae6f33 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -898,7 +898,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>                 swp_entry_t entry = pte_to_swp_entry(pte);
>  
>                 if (likely(!non_swap_entry(entry))) {
> -                       if (swap_duplicate(entry) < 0)
> +                       if (swap_duplicate(entry, __GFP_NOWARN) < 0)
>                                 return entry.val;

Moreover if you add a gfp_mask argument then the _full_ mask should be
given rather than just one of the modifiers.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: mm: page allocation failures in swap_duplicate -> add_swap_count_continuation
@ 2017-05-15  8:03   ` Michal Hocko
  0 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2017-05-15  8:03 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: linux-mm, Linux Kernel Mailing List

On Fri 12-05-17 11:18:42, Christian Borntraeger wrote:
> Folks,
> 
> recently I have seen page allocation failures during
> paging in the paging code:
> e.g. 
> 
> May 05 21:36:53  kernel: Call Trace:
> May 05 21:36:53  kernel: ([<0000000000112f62>] show_trace+0x62/0x78)
> May 05 21:36:53  kernel:  [<0000000000113050>] show_stack+0x68/0xe0 
> May 05 21:36:53  kernel:  [<00000000004fb97e>] dump_stack+0x7e/0xb0 
> May 05 21:36:53  kernel:  [<0000000000299262>] warn_alloc+0xf2/0x190 
> May 05 21:36:53  kernel:  [<000000000029a25a>] __alloc_pages_nodemask+0xeda/0xfe0 
> May 05 21:36:53  kernel:  [<00000000002fa570>] alloc_pages_current+0xb8/0x170 
> May 05 21:36:53  kernel:  [<00000000002f03fc>] add_swap_count_continuation+0x3c/0x280 
> May 05 21:36:53  kernel:  [<00000000002f068c>] swap_duplicate+0x4c/0x80 
> May 05 21:36:53  kernel:  [<00000000002dfbfa>] try_to_unmap_one+0x372/0x578 
> May 05 21:36:53  kernel:  [<000000000030131a>] rmap_walk_ksm+0x14a/0x1d8 
> May 05 21:36:53  kernel:  [<00000000002e0d60>] try_to_unmap+0x140/0x170 
> May 05 21:36:53  kernel:  [<00000000002abc9c>] shrink_page_list+0x944/0xad8 
> May 05 21:36:53  kernel:  [<00000000002ac720>] shrink_inactive_list+0x1e0/0x5b8 
> May 05 21:36:53  kernel:  [<00000000002ad642>] shrink_node_memcg+0x5e2/0x800 
> May 05 21:36:53  kernel:  [<00000000002ad954>] shrink_node+0xf4/0x360 
> May 05 21:36:53  kernel:  [<00000000002aeb00>] kswapd+0x330/0x810 
> May 05 21:36:53  kernel:  [<0000000000189f14>] kthread+0x144/0x168 
> May 05 21:36:53  kernel:  [<00000000008011ea>] kernel_thread_starter+0x6/0xc 
> May 05 21:36:53  kernel:  [<00000000008011e4>] kernel_thread_starter+0x0/0xc 
> 
> This seems to be new in 4.11 but the relevant code did not seem to have
> changed.
> 
> Something like this 
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 1781308..b2dd53e 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -3039,7 +3039,7 @@ int swap_duplicate(swp_entry_t entry)
>         int err = 0;
>  
>         while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
> -               err = add_swap_count_continuation(entry, GFP_ATOMIC);
> +               err = add_swap_count_continuation(entry, GFP_ATOMIC | __GFP_NOWARN);
>         return err;
>  }
>  
> 
> seems not appropriate, because this code does not know if the caller can
> handle returned errors.
> 
> Would something like the following (white space damaged cut'n'paste be ok?
> (the try_to_unmap_one change looks fine, not sure if copy_one_pte does the
> right thing)

No, it won't. If you want to silent the warning then explain _why_ it is
a good approach. It is not immediatelly clear to me.

> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 235ba51..3ae6f33 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -898,7 +898,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>                 swp_entry_t entry = pte_to_swp_entry(pte);
>  
>                 if (likely(!non_swap_entry(entry))) {
> -                       if (swap_duplicate(entry) < 0)
> +                       if (swap_duplicate(entry, __GFP_NOWARN) < 0)
>                                 return entry.val;

Moreover if you add a gfp_mask argument then the _full_ mask should be
given rather than just one of the modifiers.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: mm: page allocation failures in swap_duplicate -> add_swap_count_continuation
  2017-05-15  8:03   ` Michal Hocko
@ 2017-05-15  8:10     ` Christian Borntraeger
  -1 siblings, 0 replies; 10+ messages in thread
From: Christian Borntraeger @ 2017-05-15  8:10 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Linux Kernel Mailing List

On 05/15/2017 10:03 AM, Michal Hocko wrote:
> On Fri 12-05-17 11:18:42, Christian Borntraeger wrote:
>> Folks,
>>
>> recently I have seen page allocation failures during
>> paging in the paging code:
>> e.g. 
>>
>> May 05 21:36:53  kernel: Call Trace:
>> May 05 21:36:53  kernel: ([<0000000000112f62>] show_trace+0x62/0x78)
>> May 05 21:36:53  kernel:  [<0000000000113050>] show_stack+0x68/0xe0 
>> May 05 21:36:53  kernel:  [<00000000004fb97e>] dump_stack+0x7e/0xb0 
>> May 05 21:36:53  kernel:  [<0000000000299262>] warn_alloc+0xf2/0x190 
>> May 05 21:36:53  kernel:  [<000000000029a25a>] __alloc_pages_nodemask+0xeda/0xfe0 
>> May 05 21:36:53  kernel:  [<00000000002fa570>] alloc_pages_current+0xb8/0x170 
>> May 05 21:36:53  kernel:  [<00000000002f03fc>] add_swap_count_continuation+0x3c/0x280 
>> May 05 21:36:53  kernel:  [<00000000002f068c>] swap_duplicate+0x4c/0x80 
>> May 05 21:36:53  kernel:  [<00000000002dfbfa>] try_to_unmap_one+0x372/0x578 
>> May 05 21:36:53  kernel:  [<000000000030131a>] rmap_walk_ksm+0x14a/0x1d8 
>> May 05 21:36:53  kernel:  [<00000000002e0d60>] try_to_unmap+0x140/0x170 
>> May 05 21:36:53  kernel:  [<00000000002abc9c>] shrink_page_list+0x944/0xad8 
>> May 05 21:36:53  kernel:  [<00000000002ac720>] shrink_inactive_list+0x1e0/0x5b8 
>> May 05 21:36:53  kernel:  [<00000000002ad642>] shrink_node_memcg+0x5e2/0x800 
>> May 05 21:36:53  kernel:  [<00000000002ad954>] shrink_node+0xf4/0x360 
>> May 05 21:36:53  kernel:  [<00000000002aeb00>] kswapd+0x330/0x810 
>> May 05 21:36:53  kernel:  [<0000000000189f14>] kthread+0x144/0x168 
>> May 05 21:36:53  kernel:  [<00000000008011ea>] kernel_thread_starter+0x6/0xc 
>> May 05 21:36:53  kernel:  [<00000000008011e4>] kernel_thread_starter+0x0/0xc 
>>
>> This seems to be new in 4.11 but the relevant code did not seem to have
>> changed.
>>
>> Something like this 
>>
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index 1781308..b2dd53e 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -3039,7 +3039,7 @@ int swap_duplicate(swp_entry_t entry)
>>         int err = 0;
>>  
>>         while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
>> -               err = add_swap_count_continuation(entry, GFP_ATOMIC);
>> +               err = add_swap_count_continuation(entry, GFP_ATOMIC | __GFP_NOWARN);
>>         return err;
>>  }
>>  
>>
>> seems not appropriate, because this code does not know if the caller can
>> handle returned errors.
>>
>> Would something like the following (white space damaged cut'n'paste be ok?
>> (the try_to_unmap_one change looks fine, not sure if copy_one_pte does the
>> right thing)
> 
> No, it won't. If you want to silent the warning then explain _why_ it is
> a good approach. It is not immediatelly clear to me.

Consider my mail a bug report, not a proper fix. As far as I can tell, try_to_unmap_one
can handle allocation failure gracefully, so not warn here _looks_ fine to me.
> 
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 235ba51..3ae6f33 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -898,7 +898,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>>                 swp_entry_t entry = pte_to_swp_entry(pte);
>>  
>>                 if (likely(!non_swap_entry(entry))) {
>> -                       if (swap_duplicate(entry) < 0)
>> +                       if (swap_duplicate(entry, __GFP_NOWARN) < 0)
>>                                 return entry.val;

This code has special casing for the allocation failure path, but I cannot
decide if it does the right thing here.


> 
> Moreover if you add a gfp_mask argument then the _full_ mask should be
> given rather than just one of the modifiers.
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: mm: page allocation failures in swap_duplicate -> add_swap_count_continuation
@ 2017-05-15  8:10     ` Christian Borntraeger
  0 siblings, 0 replies; 10+ messages in thread
From: Christian Borntraeger @ 2017-05-15  8:10 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Linux Kernel Mailing List

On 05/15/2017 10:03 AM, Michal Hocko wrote:
> On Fri 12-05-17 11:18:42, Christian Borntraeger wrote:
>> Folks,
>>
>> recently I have seen page allocation failures during
>> paging in the paging code:
>> e.g. 
>>
>> May 05 21:36:53  kernel: Call Trace:
>> May 05 21:36:53  kernel: ([<0000000000112f62>] show_trace+0x62/0x78)
>> May 05 21:36:53  kernel:  [<0000000000113050>] show_stack+0x68/0xe0 
>> May 05 21:36:53  kernel:  [<00000000004fb97e>] dump_stack+0x7e/0xb0 
>> May 05 21:36:53  kernel:  [<0000000000299262>] warn_alloc+0xf2/0x190 
>> May 05 21:36:53  kernel:  [<000000000029a25a>] __alloc_pages_nodemask+0xeda/0xfe0 
>> May 05 21:36:53  kernel:  [<00000000002fa570>] alloc_pages_current+0xb8/0x170 
>> May 05 21:36:53  kernel:  [<00000000002f03fc>] add_swap_count_continuation+0x3c/0x280 
>> May 05 21:36:53  kernel:  [<00000000002f068c>] swap_duplicate+0x4c/0x80 
>> May 05 21:36:53  kernel:  [<00000000002dfbfa>] try_to_unmap_one+0x372/0x578 
>> May 05 21:36:53  kernel:  [<000000000030131a>] rmap_walk_ksm+0x14a/0x1d8 
>> May 05 21:36:53  kernel:  [<00000000002e0d60>] try_to_unmap+0x140/0x170 
>> May 05 21:36:53  kernel:  [<00000000002abc9c>] shrink_page_list+0x944/0xad8 
>> May 05 21:36:53  kernel:  [<00000000002ac720>] shrink_inactive_list+0x1e0/0x5b8 
>> May 05 21:36:53  kernel:  [<00000000002ad642>] shrink_node_memcg+0x5e2/0x800 
>> May 05 21:36:53  kernel:  [<00000000002ad954>] shrink_node+0xf4/0x360 
>> May 05 21:36:53  kernel:  [<00000000002aeb00>] kswapd+0x330/0x810 
>> May 05 21:36:53  kernel:  [<0000000000189f14>] kthread+0x144/0x168 
>> May 05 21:36:53  kernel:  [<00000000008011ea>] kernel_thread_starter+0x6/0xc 
>> May 05 21:36:53  kernel:  [<00000000008011e4>] kernel_thread_starter+0x0/0xc 
>>
>> This seems to be new in 4.11 but the relevant code did not seem to have
>> changed.
>>
>> Something like this 
>>
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index 1781308..b2dd53e 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -3039,7 +3039,7 @@ int swap_duplicate(swp_entry_t entry)
>>         int err = 0;
>>  
>>         while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
>> -               err = add_swap_count_continuation(entry, GFP_ATOMIC);
>> +               err = add_swap_count_continuation(entry, GFP_ATOMIC | __GFP_NOWARN);
>>         return err;
>>  }
>>  
>>
>> seems not appropriate, because this code does not know if the caller can
>> handle returned errors.
>>
>> Would something like the following (white space damaged cut'n'paste be ok?
>> (the try_to_unmap_one change looks fine, not sure if copy_one_pte does the
>> right thing)
> 
> No, it won't. If you want to silent the warning then explain _why_ it is
> a good approach. It is not immediatelly clear to me.

Consider my mail a bug report, not a proper fix. As far as I can tell, try_to_unmap_one
can handle allocation failure gracefully, so not warn here _looks_ fine to me.
> 
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 235ba51..3ae6f33 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -898,7 +898,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>>                 swp_entry_t entry = pte_to_swp_entry(pte);
>>  
>>                 if (likely(!non_swap_entry(entry))) {
>> -                       if (swap_duplicate(entry) < 0)
>> +                       if (swap_duplicate(entry, __GFP_NOWARN) < 0)
>>                                 return entry.val;

This code has special casing for the allocation failure path, but I cannot
decide if it does the right thing here.


> 
> Moreover if you add a gfp_mask argument then the _full_ mask should be
> given rather than just one of the modifiers.
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: mm: page allocation failures in swap_duplicate -> add_swap_count_continuation
  2017-05-15  8:10     ` Christian Borntraeger
@ 2017-05-15 12:51       ` Michal Hocko
  -1 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2017-05-15 12:51 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: linux-mm, Linux Kernel Mailing List

On Mon 15-05-17 10:10:17, Christian Borntraeger wrote:
> On 05/15/2017 10:03 AM, Michal Hocko wrote:
> > On Fri 12-05-17 11:18:42, Christian Borntraeger wrote:
> >> Folks,
> >>
> >> recently I have seen page allocation failures during
> >> paging in the paging code:
> >> e.g. 
> >>
> >> May 05 21:36:53  kernel: Call Trace:
> >> May 05 21:36:53  kernel: ([<0000000000112f62>] show_trace+0x62/0x78)
> >> May 05 21:36:53  kernel:  [<0000000000113050>] show_stack+0x68/0xe0 
> >> May 05 21:36:53  kernel:  [<00000000004fb97e>] dump_stack+0x7e/0xb0 
> >> May 05 21:36:53  kernel:  [<0000000000299262>] warn_alloc+0xf2/0x190 
> >> May 05 21:36:53  kernel:  [<000000000029a25a>] __alloc_pages_nodemask+0xeda/0xfe0 
> >> May 05 21:36:53  kernel:  [<00000000002fa570>] alloc_pages_current+0xb8/0x170 
> >> May 05 21:36:53  kernel:  [<00000000002f03fc>] add_swap_count_continuation+0x3c/0x280 
> >> May 05 21:36:53  kernel:  [<00000000002f068c>] swap_duplicate+0x4c/0x80 
> >> May 05 21:36:53  kernel:  [<00000000002dfbfa>] try_to_unmap_one+0x372/0x578 
> >> May 05 21:36:53  kernel:  [<000000000030131a>] rmap_walk_ksm+0x14a/0x1d8 
> >> May 05 21:36:53  kernel:  [<00000000002e0d60>] try_to_unmap+0x140/0x170 
> >> May 05 21:36:53  kernel:  [<00000000002abc9c>] shrink_page_list+0x944/0xad8 
> >> May 05 21:36:53  kernel:  [<00000000002ac720>] shrink_inactive_list+0x1e0/0x5b8 
> >> May 05 21:36:53  kernel:  [<00000000002ad642>] shrink_node_memcg+0x5e2/0x800 
> >> May 05 21:36:53  kernel:  [<00000000002ad954>] shrink_node+0xf4/0x360 
> >> May 05 21:36:53  kernel:  [<00000000002aeb00>] kswapd+0x330/0x810 
> >> May 05 21:36:53  kernel:  [<0000000000189f14>] kthread+0x144/0x168 
> >> May 05 21:36:53  kernel:  [<00000000008011ea>] kernel_thread_starter+0x6/0xc 
> >> May 05 21:36:53  kernel:  [<00000000008011e4>] kernel_thread_starter+0x0/0xc 
> >>
> >> This seems to be new in 4.11 but the relevant code did not seem to have
> >> changed.
> >>
> >> Something like this 
> >>
> >> diff --git a/mm/swapfile.c b/mm/swapfile.c
> >> index 1781308..b2dd53e 100644
> >> --- a/mm/swapfile.c
> >> +++ b/mm/swapfile.c
> >> @@ -3039,7 +3039,7 @@ int swap_duplicate(swp_entry_t entry)
> >>         int err = 0;
> >>  
> >>         while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
> >> -               err = add_swap_count_continuation(entry, GFP_ATOMIC);
> >> +               err = add_swap_count_continuation(entry, GFP_ATOMIC | __GFP_NOWARN);
> >>         return err;
> >>  }
> >>  
> >>
> >> seems not appropriate, because this code does not know if the caller can
> >> handle returned errors.
> >>
> >> Would something like the following (white space damaged cut'n'paste be ok?
> >> (the try_to_unmap_one change looks fine, not sure if copy_one_pte does the
> >> right thing)
> > 
> > No, it won't. If you want to silent the warning then explain _why_ it is
> > a good approach. It is not immediatelly clear to me.
> 
> Consider my mail a bug report, not a proper fix. As far as I can tell, try_to_unmap_one
> can handle allocation failure gracefully, so not warn here _looks_ fine to me.

Could you be more specific about the issue then? I haven't checked very
closely but AFAIR we just keep pages on the LRU if try_to_unmap fails
and keep reclaiming. So we can handle the failure but it would be good
to know that something like that happened because if this is not a
one-off issue then it will help us to see why we see a seemingly
spurious OOM.

> >> diff --git a/mm/memory.c b/mm/memory.c
> >> index 235ba51..3ae6f33 100644
> >> --- a/mm/memory.c
> >> +++ b/mm/memory.c
> >> @@ -898,7 +898,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >>                 swp_entry_t entry = pte_to_swp_entry(pte);
> >>  
> >>                 if (likely(!non_swap_entry(entry))) {
> >> -                       if (swap_duplicate(entry) < 0)
> >> +                       if (swap_duplicate(entry, __GFP_NOWARN) < 0)
> >>                                 return entry.val;
> 
> This code has special casing for the allocation failure path, but I cannot
> decide if it does the right thing here.

My point was that you should _always_ use the full gfp mask when taken
as a parameter so the above should be GFP_ATOMIC | __GFP_NOWARN...

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: mm: page allocation failures in swap_duplicate -> add_swap_count_continuation
@ 2017-05-15 12:51       ` Michal Hocko
  0 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2017-05-15 12:51 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: linux-mm, Linux Kernel Mailing List

On Mon 15-05-17 10:10:17, Christian Borntraeger wrote:
> On 05/15/2017 10:03 AM, Michal Hocko wrote:
> > On Fri 12-05-17 11:18:42, Christian Borntraeger wrote:
> >> Folks,
> >>
> >> recently I have seen page allocation failures during
> >> paging in the paging code:
> >> e.g. 
> >>
> >> May 05 21:36:53  kernel: Call Trace:
> >> May 05 21:36:53  kernel: ([<0000000000112f62>] show_trace+0x62/0x78)
> >> May 05 21:36:53  kernel:  [<0000000000113050>] show_stack+0x68/0xe0 
> >> May 05 21:36:53  kernel:  [<00000000004fb97e>] dump_stack+0x7e/0xb0 
> >> May 05 21:36:53  kernel:  [<0000000000299262>] warn_alloc+0xf2/0x190 
> >> May 05 21:36:53  kernel:  [<000000000029a25a>] __alloc_pages_nodemask+0xeda/0xfe0 
> >> May 05 21:36:53  kernel:  [<00000000002fa570>] alloc_pages_current+0xb8/0x170 
> >> May 05 21:36:53  kernel:  [<00000000002f03fc>] add_swap_count_continuation+0x3c/0x280 
> >> May 05 21:36:53  kernel:  [<00000000002f068c>] swap_duplicate+0x4c/0x80 
> >> May 05 21:36:53  kernel:  [<00000000002dfbfa>] try_to_unmap_one+0x372/0x578 
> >> May 05 21:36:53  kernel:  [<000000000030131a>] rmap_walk_ksm+0x14a/0x1d8 
> >> May 05 21:36:53  kernel:  [<00000000002e0d60>] try_to_unmap+0x140/0x170 
> >> May 05 21:36:53  kernel:  [<00000000002abc9c>] shrink_page_list+0x944/0xad8 
> >> May 05 21:36:53  kernel:  [<00000000002ac720>] shrink_inactive_list+0x1e0/0x5b8 
> >> May 05 21:36:53  kernel:  [<00000000002ad642>] shrink_node_memcg+0x5e2/0x800 
> >> May 05 21:36:53  kernel:  [<00000000002ad954>] shrink_node+0xf4/0x360 
> >> May 05 21:36:53  kernel:  [<00000000002aeb00>] kswapd+0x330/0x810 
> >> May 05 21:36:53  kernel:  [<0000000000189f14>] kthread+0x144/0x168 
> >> May 05 21:36:53  kernel:  [<00000000008011ea>] kernel_thread_starter+0x6/0xc 
> >> May 05 21:36:53  kernel:  [<00000000008011e4>] kernel_thread_starter+0x0/0xc 
> >>
> >> This seems to be new in 4.11 but the relevant code did not seem to have
> >> changed.
> >>
> >> Something like this 
> >>
> >> diff --git a/mm/swapfile.c b/mm/swapfile.c
> >> index 1781308..b2dd53e 100644
> >> --- a/mm/swapfile.c
> >> +++ b/mm/swapfile.c
> >> @@ -3039,7 +3039,7 @@ int swap_duplicate(swp_entry_t entry)
> >>         int err = 0;
> >>  
> >>         while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
> >> -               err = add_swap_count_continuation(entry, GFP_ATOMIC);
> >> +               err = add_swap_count_continuation(entry, GFP_ATOMIC | __GFP_NOWARN);
> >>         return err;
> >>  }
> >>  
> >>
> >> seems not appropriate, because this code does not know if the caller can
> >> handle returned errors.
> >>
> >> Would something like the following (white space damaged cut'n'paste be ok?
> >> (the try_to_unmap_one change looks fine, not sure if copy_one_pte does the
> >> right thing)
> > 
> > No, it won't. If you want to silent the warning then explain _why_ it is
> > a good approach. It is not immediatelly clear to me.
> 
> Consider my mail a bug report, not a proper fix. As far as I can tell, try_to_unmap_one
> can handle allocation failure gracefully, so not warn here _looks_ fine to me.

Could you be more specific about the issue then? I haven't checked very
closely but AFAIR we just keep pages on the LRU if try_to_unmap fails
and keep reclaiming. So we can handle the failure but it would be good
to know that something like that happened because if this is not a
one-off issue then it will help us to see why we see a seemingly
spurious OOM.

> >> diff --git a/mm/memory.c b/mm/memory.c
> >> index 235ba51..3ae6f33 100644
> >> --- a/mm/memory.c
> >> +++ b/mm/memory.c
> >> @@ -898,7 +898,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >>                 swp_entry_t entry = pte_to_swp_entry(pte);
> >>  
> >>                 if (likely(!non_swap_entry(entry))) {
> >> -                       if (swap_duplicate(entry) < 0)
> >> +                       if (swap_duplicate(entry, __GFP_NOWARN) < 0)
> >>                                 return entry.val;
> 
> This code has special casing for the allocation failure path, but I cannot
> decide if it does the right thing here.

My point was that you should _always_ use the full gfp mask when taken
as a parameter so the above should be GFP_ATOMIC | __GFP_NOWARN...

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: mm: page allocation failures in swap_duplicate -> add_swap_count_continuation
  2017-05-15 12:51       ` Michal Hocko
@ 2017-05-15 12:57         ` Christian Borntraeger
  -1 siblings, 0 replies; 10+ messages in thread
From: Christian Borntraeger @ 2017-05-15 12:57 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Linux Kernel Mailing List

On 05/15/2017 02:51 PM, Michal Hocko wrote:
> On Mon 15-05-17 10:10:17, Christian Borntraeger wrote:
>> On 05/15/2017 10:03 AM, Michal Hocko wrote:
>>> On Fri 12-05-17 11:18:42, Christian Borntraeger wrote:
>>>> Folks,
>>>>
>>>> recently I have seen page allocation failures during
>>>> paging in the paging code:
>>>> e.g. 
>>>>
>>>> May 05 21:36:53  kernel: Call Trace:
>>>> May 05 21:36:53  kernel: ([<0000000000112f62>] show_trace+0x62/0x78)
>>>> May 05 21:36:53  kernel:  [<0000000000113050>] show_stack+0x68/0xe0 
>>>> May 05 21:36:53  kernel:  [<00000000004fb97e>] dump_stack+0x7e/0xb0 
>>>> May 05 21:36:53  kernel:  [<0000000000299262>] warn_alloc+0xf2/0x190 
>>>> May 05 21:36:53  kernel:  [<000000000029a25a>] __alloc_pages_nodemask+0xeda/0xfe0 
>>>> May 05 21:36:53  kernel:  [<00000000002fa570>] alloc_pages_current+0xb8/0x170 
>>>> May 05 21:36:53  kernel:  [<00000000002f03fc>] add_swap_count_continuation+0x3c/0x280 
>>>> May 05 21:36:53  kernel:  [<00000000002f068c>] swap_duplicate+0x4c/0x80 
>>>> May 05 21:36:53  kernel:  [<00000000002dfbfa>] try_to_unmap_one+0x372/0x578 
>>>> May 05 21:36:53  kernel:  [<000000000030131a>] rmap_walk_ksm+0x14a/0x1d8 
>>>> May 05 21:36:53  kernel:  [<00000000002e0d60>] try_to_unmap+0x140/0x170 
>>>> May 05 21:36:53  kernel:  [<00000000002abc9c>] shrink_page_list+0x944/0xad8 
>>>> May 05 21:36:53  kernel:  [<00000000002ac720>] shrink_inactive_list+0x1e0/0x5b8 
>>>> May 05 21:36:53  kernel:  [<00000000002ad642>] shrink_node_memcg+0x5e2/0x800 
>>>> May 05 21:36:53  kernel:  [<00000000002ad954>] shrink_node+0xf4/0x360 
>>>> May 05 21:36:53  kernel:  [<00000000002aeb00>] kswapd+0x330/0x810 
>>>> May 05 21:36:53  kernel:  [<0000000000189f14>] kthread+0x144/0x168 
>>>> May 05 21:36:53  kernel:  [<00000000008011ea>] kernel_thread_starter+0x6/0xc 
>>>> May 05 21:36:53  kernel:  [<00000000008011e4>] kernel_thread_starter+0x0/0xc 
>>>>
>>>> This seems to be new in 4.11 but the relevant code did not seem to have
>>>> changed.
>>>>
>>>> Something like this 
>>>>
>>>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>>>> index 1781308..b2dd53e 100644
>>>> --- a/mm/swapfile.c
>>>> +++ b/mm/swapfile.c
>>>> @@ -3039,7 +3039,7 @@ int swap_duplicate(swp_entry_t entry)
>>>>         int err = 0;
>>>>  
>>>>         while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
>>>> -               err = add_swap_count_continuation(entry, GFP_ATOMIC);
>>>> +               err = add_swap_count_continuation(entry, GFP_ATOMIC | __GFP_NOWARN);
>>>>         return err;
>>>>  }
>>>>  
>>>>
>>>> seems not appropriate, because this code does not know if the caller can
>>>> handle returned errors.
>>>>
>>>> Would something like the following (white space damaged cut'n'paste be ok?
>>>> (the try_to_unmap_one change looks fine, not sure if copy_one_pte does the
>>>> right thing)
>>>
>>> No, it won't. If you want to silent the warning then explain _why_ it is
>>> a good approach. It is not immediatelly clear to me.
>>
>> Consider my mail a bug report, not a proper fix. As far as I can tell, try_to_unmap_one
>> can handle allocation failure gracefully, so not warn here _looks_ fine to me.
> 
> Could you be more specific about the issue then? I haven't checked very
> closely but AFAIR we just keep pages on the LRU if try_to_unmap fails
> and keep reclaiming. So we can handle the failure but it would be good
> to know that something like that happened because if this is not a
> one-off issue then it will help us to see why we see a seemingly
> spurious OOM.

My understanding is that we want to suppress these allocation failure messages
when
a: the allocation can fail (e.g. GFP_ATOMIC)
b: the caller can handle the allocation failure
to avoid spamming the logs. 

If you think that seeing this message under memory pressure is ok, because it will help
debugging, then so be it.
You might be actually right, because this message shows that we might do an allocation
to handle memory pressure. 

> 
>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>> index 235ba51..3ae6f33 100644
>>>> --- a/mm/memory.c
>>>> +++ b/mm/memory.c
>>>> @@ -898,7 +898,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>>>>                 swp_entry_t entry = pte_to_swp_entry(pte);
>>>>  
>>>>                 if (likely(!non_swap_entry(entry))) {
>>>> -                       if (swap_duplicate(entry) < 0)
>>>> +                       if (swap_duplicate(entry, __GFP_NOWARN) < 0)
>>>>                                 return entry.val;
>>
>> This code has special casing for the allocation failure path, but I cannot
>> decide if it does the right thing here.
> 
> My point was that you should _always_ use the full gfp mask when taken
> as a parameter so the above should be GFP_ATOMIC | __GFP_NOWARN...
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: mm: page allocation failures in swap_duplicate -> add_swap_count_continuation
@ 2017-05-15 12:57         ` Christian Borntraeger
  0 siblings, 0 replies; 10+ messages in thread
From: Christian Borntraeger @ 2017-05-15 12:57 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Linux Kernel Mailing List

On 05/15/2017 02:51 PM, Michal Hocko wrote:
> On Mon 15-05-17 10:10:17, Christian Borntraeger wrote:
>> On 05/15/2017 10:03 AM, Michal Hocko wrote:
>>> On Fri 12-05-17 11:18:42, Christian Borntraeger wrote:
>>>> Folks,
>>>>
>>>> recently I have seen page allocation failures during
>>>> paging in the paging code:
>>>> e.g. 
>>>>
>>>> May 05 21:36:53  kernel: Call Trace:
>>>> May 05 21:36:53  kernel: ([<0000000000112f62>] show_trace+0x62/0x78)
>>>> May 05 21:36:53  kernel:  [<0000000000113050>] show_stack+0x68/0xe0 
>>>> May 05 21:36:53  kernel:  [<00000000004fb97e>] dump_stack+0x7e/0xb0 
>>>> May 05 21:36:53  kernel:  [<0000000000299262>] warn_alloc+0xf2/0x190 
>>>> May 05 21:36:53  kernel:  [<000000000029a25a>] __alloc_pages_nodemask+0xeda/0xfe0 
>>>> May 05 21:36:53  kernel:  [<00000000002fa570>] alloc_pages_current+0xb8/0x170 
>>>> May 05 21:36:53  kernel:  [<00000000002f03fc>] add_swap_count_continuation+0x3c/0x280 
>>>> May 05 21:36:53  kernel:  [<00000000002f068c>] swap_duplicate+0x4c/0x80 
>>>> May 05 21:36:53  kernel:  [<00000000002dfbfa>] try_to_unmap_one+0x372/0x578 
>>>> May 05 21:36:53  kernel:  [<000000000030131a>] rmap_walk_ksm+0x14a/0x1d8 
>>>> May 05 21:36:53  kernel:  [<00000000002e0d60>] try_to_unmap+0x140/0x170 
>>>> May 05 21:36:53  kernel:  [<00000000002abc9c>] shrink_page_list+0x944/0xad8 
>>>> May 05 21:36:53  kernel:  [<00000000002ac720>] shrink_inactive_list+0x1e0/0x5b8 
>>>> May 05 21:36:53  kernel:  [<00000000002ad642>] shrink_node_memcg+0x5e2/0x800 
>>>> May 05 21:36:53  kernel:  [<00000000002ad954>] shrink_node+0xf4/0x360 
>>>> May 05 21:36:53  kernel:  [<00000000002aeb00>] kswapd+0x330/0x810 
>>>> May 05 21:36:53  kernel:  [<0000000000189f14>] kthread+0x144/0x168 
>>>> May 05 21:36:53  kernel:  [<00000000008011ea>] kernel_thread_starter+0x6/0xc 
>>>> May 05 21:36:53  kernel:  [<00000000008011e4>] kernel_thread_starter+0x0/0xc 
>>>>
>>>> This seems to be new in 4.11 but the relevant code did not seem to have
>>>> changed.
>>>>
>>>> Something like this 
>>>>
>>>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>>>> index 1781308..b2dd53e 100644
>>>> --- a/mm/swapfile.c
>>>> +++ b/mm/swapfile.c
>>>> @@ -3039,7 +3039,7 @@ int swap_duplicate(swp_entry_t entry)
>>>>         int err = 0;
>>>>  
>>>>         while (!err && __swap_duplicate(entry, 1) == -ENOMEM)
>>>> -               err = add_swap_count_continuation(entry, GFP_ATOMIC);
>>>> +               err = add_swap_count_continuation(entry, GFP_ATOMIC | __GFP_NOWARN);
>>>>         return err;
>>>>  }
>>>>  
>>>>
>>>> seems not appropriate, because this code does not know if the caller can
>>>> handle returned errors.
>>>>
>>>> Would something like the following (white space damaged cut'n'paste be ok?
>>>> (the try_to_unmap_one change looks fine, not sure if copy_one_pte does the
>>>> right thing)
>>>
>>> No, it won't. If you want to silent the warning then explain _why_ it is
>>> a good approach. It is not immediatelly clear to me.
>>
>> Consider my mail a bug report, not a proper fix. As far as I can tell, try_to_unmap_one
>> can handle allocation failure gracefully, so not warn here _looks_ fine to me.
> 
> Could you be more specific about the issue then? I haven't checked very
> closely but AFAIR we just keep pages on the LRU if try_to_unmap fails
> and keep reclaiming. So we can handle the failure but it would be good
> to know that something like that happened because if this is not a
> one-off issue then it will help us to see why we see a seemingly
> spurious OOM.

My understanding is that we want to suppress these allocation failure messages
when
a: the allocation can fail (e.g. GFP_ATOMIC)
b: the caller can handle the allocation failure
to avoid spamming the logs. 

If you think that seeing this message under memory pressure is ok, because it will help
debugging, then so be it.
You might be actually right, because this message shows that we might do an allocation
to handle memory pressure. 

> 
>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>> index 235ba51..3ae6f33 100644
>>>> --- a/mm/memory.c
>>>> +++ b/mm/memory.c
>>>> @@ -898,7 +898,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>>>>                 swp_entry_t entry = pte_to_swp_entry(pte);
>>>>  
>>>>                 if (likely(!non_swap_entry(entry))) {
>>>> -                       if (swap_duplicate(entry) < 0)
>>>> +                       if (swap_duplicate(entry, __GFP_NOWARN) < 0)
>>>>                                 return entry.val;
>>
>> This code has special casing for the allocation failure path, but I cannot
>> decide if it does the right thing here.
> 
> My point was that you should _always_ use the full gfp mask when taken
> as a parameter so the above should be GFP_ATOMIC | __GFP_NOWARN...
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-05-15 12:57 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-12  9:18 mm: page allocation failures in swap_duplicate -> add_swap_count_continuation Christian Borntraeger
2017-05-12  9:18 ` Christian Borntraeger
2017-05-15  8:03 ` Michal Hocko
2017-05-15  8:03   ` Michal Hocko
2017-05-15  8:10   ` Christian Borntraeger
2017-05-15  8:10     ` Christian Borntraeger
2017-05-15 12:51     ` Michal Hocko
2017-05-15 12:51       ` Michal Hocko
2017-05-15 12:57       ` Christian Borntraeger
2017-05-15 12:57         ` Christian Borntraeger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.