linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V2] mm/mprotect: Add a cond_resched() inside change_pmd_range()
@ 2017-12-14 11:14 Anshuman Khandual
  2017-12-14 11:29 ` Michal Hocko
  0 siblings, 1 reply; 7+ messages in thread
From: Anshuman Khandual @ 2017-12-14 11:14 UTC (permalink / raw)
  To: linux-mm, linux-kernel; +Cc: akpm, mhocko

While testing on a large CPU system, detected the following RCU
stall many times over the span of the workload. This problem
is solved by adding a cond_resched() in the change_pmd_range()
function.

[  850.962530] INFO: rcu_sched detected stalls on CPUs/tasks:
[  850.962584]  154-....: (670 ticks this GP) idle=022/140000000000000/0 softirq=2825/2825 fqs=612
[  850.962605]  (detected by 955, t=6002 jiffies, g=4486, c=4485, q=90864)
[  850.962895] Sending NMI from CPU 955 to CPUs 154:
[  850.992667] NMI backtrace for cpu 154
[  850.993069] CPU: 154 PID: 147071 Comm: workload Not tainted 4.15.0-rc3+ #3
[  850.993258] NIP:  c0000000000b3f64 LR: c0000000000b33d4 CTR: 000000000000aa18
[  850.993503] REGS: 00000000a4b0fb44 TRAP: 0501   Not tainted  (4.15.0-rc3+)
[  850.993707] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 22422082  XER: 00000000
[  850.994386] CFAR: 00000000006cf8f0 SOFTE: 1
GPR00: 0010000000000000 c00003ef9b1cb8c0 c0000000010cc600 0000000000000000
GPR04: 8e0000018c32b200 40017b3858fd6e00 8e0000018c32b208 40017b3858fd6e00
GPR08: 8e0000018c32b210 40017b3858fd6e00 8e0000018c32b218 40017b3858fd6e00
GPR12: ffffffffffffffff c00000000fb25100
[  850.995976] NIP [c0000000000b3f64] plpar_hcall9+0x44/0x7c
[  850.996174] LR [c0000000000b33d4] pSeries_lpar_flush_hash_range+0x384/0x420
[  850.996401] Call Trace:
[  850.996600] [c00003ef9b1cb8c0] [c00003fa8fff7d40] 0xc00003fa8fff7d40 (unreliable)
[  850.996959] [c00003ef9b1cba40] [c0000000000688a8] flush_hash_range+0x48/0x100
[  850.997261] [c00003ef9b1cba90] [c000000000071b14] __flush_tlb_pending+0x44/0xd0
[  850.997600] [c00003ef9b1cbac0] [c000000000071fa8] hpte_need_flush+0x408/0x470
[  850.997958] [c00003ef9b1cbb30] [c0000000002c646c] change_protection_range+0xaac/0xf10
[  850.998180] [c00003ef9b1cbcb0] [c0000000002f2510] change_prot_numa+0x30/0xb0
[  850.998502] [c00003ef9b1cbce0] [c00000000013a950] task_numa_work+0x2d0/0x3e0
[  850.998816] [c00003ef9b1cbda0] [c00000000011ea30] task_work_run+0x130/0x190
[  850.999121] [c00003ef9b1cbe00] [c00000000001bcd8] do_notify_resume+0x118/0x120
[  850.999421] [c00003ef9b1cbe30] [c00000000000b744] ret_from_except_lite+0x70/0x74
[  850.999716] Instruction dump:
[  850.999959] 60000000 f8810028 7ca42b78 7cc53378 7ce63b78 7d074378 7d284b78 7d495378
[  851.000575] e9410060 e9610068 e9810070 44000022 <7d806378> e9810028 f88c0000 f8ac0008

Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
Changes in V2:

- Moved cond_resched() to change_pmd_range() as per Michal Hocko
- Fixed commit message as appropriate

Changes in V1: (https://patchwork.kernel.org/patch/10111445/)

 mm/mprotect.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index ec39f73..43c29fa 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -196,6 +196,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 		this_pages = change_pte_range(vma, pmd, addr, next, newprot,
 				 dirty_accountable, prot_numa);
 		pages += this_pages;
+		cond_resched();
 	} while (pmd++, addr = next, addr != end);
 
 	if (mni_start)
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH V2] mm/mprotect: Add a cond_resched() inside change_pmd_range()
  2017-12-14 11:14 [PATCH V2] mm/mprotect: Add a cond_resched() inside change_pmd_range() Anshuman Khandual
@ 2017-12-14 11:29 ` Michal Hocko
  2017-12-14 12:55   ` Anshuman Khandual
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2017-12-14 11:29 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: linux-mm, linux-kernel, akpm

On Thu 14-12-17 16:44:26, Anshuman Khandual wrote:
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index ec39f73..43c29fa 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -196,6 +196,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
>  		this_pages = change_pte_range(vma, pmd, addr, next, newprot,
>  				 dirty_accountable, prot_numa);
>  		pages += this_pages;
> +		cond_resched();
>  	} while (pmd++, addr = next, addr != end);
>  
>  	if (mni_start)

this is not exactly what I meant. See how change_huge_pmd does continue.
That's why I mentioned zap_pmd_range which does goto next...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V2] mm/mprotect: Add a cond_resched() inside change_pmd_range()
  2017-12-14 11:29 ` Michal Hocko
@ 2017-12-14 12:55   ` Anshuman Khandual
  2017-12-14 13:04     ` Michal Hocko
  0 siblings, 1 reply; 7+ messages in thread
From: Anshuman Khandual @ 2017-12-14 12:55 UTC (permalink / raw)
  To: Michal Hocko, Anshuman Khandual; +Cc: linux-mm, linux-kernel, akpm

On 12/14/2017 04:59 PM, Michal Hocko wrote:
> On Thu 14-12-17 16:44:26, Anshuman Khandual wrote:
>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>> index ec39f73..43c29fa 100644
>> --- a/mm/mprotect.c
>> +++ b/mm/mprotect.c
>> @@ -196,6 +196,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
>>  		this_pages = change_pte_range(vma, pmd, addr, next, newprot,
>>  				 dirty_accountable, prot_numa);
>>  		pages += this_pages;
>> +		cond_resched();
>>  	} while (pmd++, addr = next, addr != end);
>>  
>>  	if (mni_start)
> 
> this is not exactly what I meant. See how change_huge_pmd does continue.
> That's why I mentioned zap_pmd_range which does goto next...

I might be still missing something but is this what you meant ?
Here we will give cond_resched() cover to the THP backed pages
as well.

diff --git a/mm/mprotect.c b/mm/mprotect.c
index ec39f73..3d445ee 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -188,7 +188,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
                                        }
 
                                        /* huge pmd was handled */
-                                       continue;
+                                       goto next;
                                }
                        }
                        /* fall through, the trans huge pmd just split */
@@ -196,6 +196,8 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
                this_pages = change_pte_range(vma, pmd, addr, next, newprot,
                                 dirty_accountable, prot_numa);
                pages += this_pages;
+next:
+               cond_resched();
        } while (pmd++, addr = next, addr != end);
 
        if (mni_start)

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH V2] mm/mprotect: Add a cond_resched() inside change_pmd_range()
  2017-12-14 12:55   ` Anshuman Khandual
@ 2017-12-14 13:04     ` Michal Hocko
  2017-12-14 13:20       ` Anshuman Khandual
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2017-12-14 13:04 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: linux-mm, linux-kernel, akpm

On Thu 14-12-17 18:25:54, Anshuman Khandual wrote:
> On 12/14/2017 04:59 PM, Michal Hocko wrote:
> > On Thu 14-12-17 16:44:26, Anshuman Khandual wrote:
> >> diff --git a/mm/mprotect.c b/mm/mprotect.c
> >> index ec39f73..43c29fa 100644
> >> --- a/mm/mprotect.c
> >> +++ b/mm/mprotect.c
> >> @@ -196,6 +196,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
> >>  		this_pages = change_pte_range(vma, pmd, addr, next, newprot,
> >>  				 dirty_accountable, prot_numa);
> >>  		pages += this_pages;
> >> +		cond_resched();
> >>  	} while (pmd++, addr = next, addr != end);
> >>  
> >>  	if (mni_start)
> > 
> > this is not exactly what I meant. See how change_huge_pmd does continue.
> > That's why I mentioned zap_pmd_range which does goto next...
> 
> I might be still missing something but is this what you meant ?

yes, except

> Here we will give cond_resched() cover to the THP backed pages
> as well.

but there is still 
		if (!is_swap_pmd(*pmd) && !pmd_trans_huge(*pmd) && !pmd_devmap(*pmd)
				&& pmd_none_or_clear_bad(pmd))
			continue;

so we won't have scheduling point on pmd holes. Maybe this doesn't
matter, I haven't checked but why should we handle those differently?

> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index ec39f73..3d445ee 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -188,7 +188,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
>                                         }
>  
>                                         /* huge pmd was handled */
> -                                       continue;
> +                                       goto next;
>                                 }
>                         }
>                         /* fall through, the trans huge pmd just split */
> @@ -196,6 +196,8 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
>                 this_pages = change_pte_range(vma, pmd, addr, next, newprot,
>                                  dirty_accountable, prot_numa);
>                 pages += this_pages;
> +next:
> +               cond_resched();
>         } while (pmd++, addr = next, addr != end);
>  
>         if (mni_start)

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V2] mm/mprotect: Add a cond_resched() inside change_pmd_range()
  2017-12-14 13:04     ` Michal Hocko
@ 2017-12-14 13:20       ` Anshuman Khandual
  2017-12-14 13:27         ` Michal Hocko
  0 siblings, 1 reply; 7+ messages in thread
From: Anshuman Khandual @ 2017-12-14 13:20 UTC (permalink / raw)
  To: Michal Hocko, Anshuman Khandual; +Cc: linux-mm, linux-kernel, akpm

On 12/14/2017 06:34 PM, Michal Hocko wrote:
> On Thu 14-12-17 18:25:54, Anshuman Khandual wrote:
>> On 12/14/2017 04:59 PM, Michal Hocko wrote:
>>> On Thu 14-12-17 16:44:26, Anshuman Khandual wrote:
>>>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>>>> index ec39f73..43c29fa 100644
>>>> --- a/mm/mprotect.c
>>>> +++ b/mm/mprotect.c
>>>> @@ -196,6 +196,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
>>>>  		this_pages = change_pte_range(vma, pmd, addr, next, newprot,
>>>>  				 dirty_accountable, prot_numa);
>>>>  		pages += this_pages;
>>>> +		cond_resched();
>>>>  	} while (pmd++, addr = next, addr != end);
>>>>  
>>>>  	if (mni_start)
>>> this is not exactly what I meant. See how change_huge_pmd does continue.
>>> That's why I mentioned zap_pmd_range which does goto next...
>> I might be still missing something but is this what you meant ?
> yes, except
> 
>> Here we will give cond_resched() cover to the THP backed pages
>> as well.
> but there is still 
> 		if (!is_swap_pmd(*pmd) && !pmd_trans_huge(*pmd) && !pmd_devmap(*pmd)
> 				&& pmd_none_or_clear_bad(pmd))
> 			continue;
> 
> so we won't have scheduling point on pmd holes. Maybe this doesn't
> matter, I haven't checked but why should we handle those differently?

May be because it is not spending much time for those entries which
can really trigger stalls, hence they dont need scheduling points.
In case of zap_pmd_range(), it was spending time either in
__split_huge_pmd() or zap_huge_pmd() hence deserved a scheduling point.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V2] mm/mprotect: Add a cond_resched() inside change_pmd_range()
  2017-12-14 13:20       ` Anshuman Khandual
@ 2017-12-14 13:27         ` Michal Hocko
  2017-12-14 13:30           ` Anshuman Khandual
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2017-12-14 13:27 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: linux-mm, linux-kernel, akpm

On Thu 14-12-17 18:50:41, Anshuman Khandual wrote:
> On 12/14/2017 06:34 PM, Michal Hocko wrote:
> > On Thu 14-12-17 18:25:54, Anshuman Khandual wrote:
> >> On 12/14/2017 04:59 PM, Michal Hocko wrote:
> >>> On Thu 14-12-17 16:44:26, Anshuman Khandual wrote:
> >>>> diff --git a/mm/mprotect.c b/mm/mprotect.c
> >>>> index ec39f73..43c29fa 100644
> >>>> --- a/mm/mprotect.c
> >>>> +++ b/mm/mprotect.c
> >>>> @@ -196,6 +196,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
> >>>>  		this_pages = change_pte_range(vma, pmd, addr, next, newprot,
> >>>>  				 dirty_accountable, prot_numa);
> >>>>  		pages += this_pages;
> >>>> +		cond_resched();
> >>>>  	} while (pmd++, addr = next, addr != end);
> >>>>  
> >>>>  	if (mni_start)
> >>> this is not exactly what I meant. See how change_huge_pmd does continue.
> >>> That's why I mentioned zap_pmd_range which does goto next...
> >> I might be still missing something but is this what you meant ?
> > yes, except
> > 
> >> Here we will give cond_resched() cover to the THP backed pages
> >> as well.
> > but there is still 
> > 		if (!is_swap_pmd(*pmd) && !pmd_trans_huge(*pmd) && !pmd_devmap(*pmd)
> > 				&& pmd_none_or_clear_bad(pmd))
> > 			continue;
> > 
> > so we won't have scheduling point on pmd holes. Maybe this doesn't
> > matter, I haven't checked but why should we handle those differently?
> 
> May be because it is not spending much time for those entries which
> can really trigger stalls, hence they dont need scheduling points.
> In case of zap_pmd_range(), it was spending time either in
> __split_huge_pmd() or zap_huge_pmd() hence deserved a scheduling point.

As I've said, I haven't thought much about that but the discrepancy just
hit my eyes. So if there is not a really good reason I would rather use
goto next consistently.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V2] mm/mprotect: Add a cond_resched() inside change_pmd_range()
  2017-12-14 13:27         ` Michal Hocko
@ 2017-12-14 13:30           ` Anshuman Khandual
  0 siblings, 0 replies; 7+ messages in thread
From: Anshuman Khandual @ 2017-12-14 13:30 UTC (permalink / raw)
  To: Michal Hocko, Anshuman Khandual; +Cc: linux-mm, linux-kernel, akpm

On 12/14/2017 06:57 PM, Michal Hocko wrote:
> On Thu 14-12-17 18:50:41, Anshuman Khandual wrote:
>> On 12/14/2017 06:34 PM, Michal Hocko wrote:
>>> On Thu 14-12-17 18:25:54, Anshuman Khandual wrote:
>>>> On 12/14/2017 04:59 PM, Michal Hocko wrote:
>>>>> On Thu 14-12-17 16:44:26, Anshuman Khandual wrote:
>>>>>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>>>>>> index ec39f73..43c29fa 100644
>>>>>> --- a/mm/mprotect.c
>>>>>> +++ b/mm/mprotect.c
>>>>>> @@ -196,6 +196,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
>>>>>>  		this_pages = change_pte_range(vma, pmd, addr, next, newprot,
>>>>>>  				 dirty_accountable, prot_numa);
>>>>>>  		pages += this_pages;
>>>>>> +		cond_resched();
>>>>>>  	} while (pmd++, addr = next, addr != end);
>>>>>>  
>>>>>>  	if (mni_start)
>>>>> this is not exactly what I meant. See how change_huge_pmd does continue.
>>>>> That's why I mentioned zap_pmd_range which does goto next...
>>>> I might be still missing something but is this what you meant ?
>>> yes, except
>>>
>>>> Here we will give cond_resched() cover to the THP backed pages
>>>> as well.
>>> but there is still 
>>> 		if (!is_swap_pmd(*pmd) && !pmd_trans_huge(*pmd) && !pmd_devmap(*pmd)
>>> 				&& pmd_none_or_clear_bad(pmd))
>>> 			continue;
>>>
>>> so we won't have scheduling point on pmd holes. Maybe this doesn't
>>> matter, I haven't checked but why should we handle those differently?
>>
>> May be because it is not spending much time for those entries which
>> can really trigger stalls, hence they dont need scheduling points.
>> In case of zap_pmd_range(), it was spending time either in
>> __split_huge_pmd() or zap_huge_pmd() hence deserved a scheduling point.
> 
> As I've said, I haven't thought much about that but the discrepancy just
> hit my eyes. So if there is not a really good reason I would rather use
> goto next consistently.

Sure, will respin with the changes.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-12-14 13:30 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-14 11:14 [PATCH V2] mm/mprotect: Add a cond_resched() inside change_pmd_range() Anshuman Khandual
2017-12-14 11:29 ` Michal Hocko
2017-12-14 12:55   ` Anshuman Khandual
2017-12-14 13:04     ` Michal Hocko
2017-12-14 13:20       ` Anshuman Khandual
2017-12-14 13:27         ` Michal Hocko
2017-12-14 13:30           ` Anshuman Khandual

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).