linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: fix account pmd page to the process
@ 2016-06-16  7:47 zhongjiang
  0 siblings, 0 replies; 22+ messages in thread
From: zhongjiang @ 2016-06-16  7:47 UTC (permalink / raw)
  To: akpm, kirill.shutemov; +Cc: linux-kernel, linux-mm

From: zhong jiang <zhongjiang@huawei.com>

when a process acquire a pmd table shared by other process, we
increase the account to current process. otherwise, a race result
in other tasks have set the pud entry. so it no need to increase it.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 mm/hugetlb.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 19d0d08..3b025c5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4189,10 +4189,9 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 	if (pud_none(*pud)) {
 		pud_populate(mm, pud,
 				(pmd_t *)((unsigned long)spte & PAGE_MASK));
-	} else {
+	} else 
 		put_page(virt_to_page(spte));
-		mm_inc_nr_pmds(mm);
-	}
+
 	spin_unlock(ptl);
 out:
 	pte = (pte_t *)pmd_alloc(mm, pud, addr);
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm: fix account pmd page to the process
  2016-06-17 15:39             ` Mike Kravetz
@ 2016-06-18  5:07               ` zhong jiang
  0 siblings, 0 replies; 22+ messages in thread
From: zhong jiang @ 2016-06-18  5:07 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Kirill A. Shutemov, Michal Hocko, akpm, linux-mm, linux-kernel

On 2016/6/17 23:39, Mike Kravetz wrote:
> On 06/17/2016 05:25 AM, Kirill A. Shutemov wrote:
>> From fd22922e7b4664e83653a84331f0a95b985bff0c Mon Sep 17 00:00:00 2001
>> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>> Date: Fri, 17 Jun 2016 15:07:03 +0300
>> Subject: [PATCH] hugetlb: fix nr_pmds accounting with shared page tables
>>
>> We account HugeTLB's shared page table to all processes who share it.
>> The accounting happens during huge_pmd_share().
>>
>> If somebody populates pud entry under us, we should decrease pagetable's
>> refcount and decrease nr_pmds of the process.
>>
>> By mistake, I increase nr_pmds again in this case. :-/
>> It will lead to "BUG: non-zero nr_pmds on freeing mm: 2" on process'
>> exit.
>>
>> Let's fix this by increasing nr_pmds only when we're sure that the page
>> table will be used.
>>
>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Nice,
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
>
> I agree that we do not necessarily need a back port.  I have not seen
> reports of people experiencing this race and seeing the BUG (on mm
> tear-down).
>
> zhongjiang, did someone actually hit the BUG?  Or, did you find it by
> code examination?
>
  just code examination.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm: fix account pmd page to the process
  2016-06-17 12:25           ` Kirill A. Shutemov
  2016-06-17 13:00             ` Michal Hocko
@ 2016-06-17 15:39             ` Mike Kravetz
  2016-06-18  5:07               ` zhong jiang
  1 sibling, 1 reply; 22+ messages in thread
From: Mike Kravetz @ 2016-06-17 15:39 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: Michal Hocko, zhongjiang, akpm, linux-mm, linux-kernel

On 06/17/2016 05:25 AM, Kirill A. Shutemov wrote:
> 
> From fd22922e7b4664e83653a84331f0a95b985bff0c Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Date: Fri, 17 Jun 2016 15:07:03 +0300
> Subject: [PATCH] hugetlb: fix nr_pmds accounting with shared page tables
> 
> We account HugeTLB's shared page table to all processes who share it.
> The accounting happens during huge_pmd_share().
> 
> If somebody populates pud entry under us, we should decrease pagetable's
> refcount and decrease nr_pmds of the process.
> 
> By mistake, I increase nr_pmds again in this case. :-/
> It will lead to "BUG: non-zero nr_pmds on freeing mm: 2" on process'
> exit.
> 
> Let's fix this by increasing nr_pmds only when we're sure that the page
> table will be used.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Nice,
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

I agree that we do not necessarily need a back port.  I have not seen
reports of people experiencing this race and seeing the BUG (on mm
tear-down).

zhongjiang, did someone actually hit the BUG?  Or, did you find it by
code examination?

-- 
Mike Kravetz

> Reported-by: zhongjiang <zhongjiang@huawei.com>
> Fixes: dc6c9a35b66b ("mm: account pmd page tables to the process")
> Cc: <stable@vger.kernel.org>        [4.0+]
> ---
>  mm/hugetlb.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index e197cd7080e6..ed6a537f0878 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4216,7 +4216,6 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>  		if (saddr) {
>  			spte = huge_pte_offset(svma->vm_mm, saddr);
>  			if (spte) {
> -				mm_inc_nr_pmds(mm);
>  				get_page(virt_to_page(spte));
>  				break;
>  			}
> @@ -4231,9 +4230,9 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>  	if (pud_none(*pud)) {
>  		pud_populate(mm, pud,
>  				(pmd_t *)((unsigned long)spte & PAGE_MASK));
> +		mm_inc_nr_pmds(mm);
>  	} else {
>  		put_page(virt_to_page(spte));
> -		mm_inc_nr_pmds(mm);
>  	}
>  	spin_unlock(ptl);
>  out:
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm: fix account pmd page to the process
  2016-06-17 13:00             ` Michal Hocko
@ 2016-06-17 14:25               ` Kirill A. Shutemov
  0 siblings, 0 replies; 22+ messages in thread
From: Kirill A. Shutemov @ 2016-06-17 14:25 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Mike Kravetz, zhongjiang, akpm, linux-mm, linux-kernel

On Fri, Jun 17, 2016 at 03:00:00PM +0200, Michal Hocko wrote:
> On Fri 17-06-16 15:25:06, Kirill A. Shutemov wrote:
> [...]
> > >From fd22922e7b4664e83653a84331f0a95b985bff0c Mon Sep 17 00:00:00 2001
> > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > Date: Fri, 17 Jun 2016 15:07:03 +0300
> > Subject: [PATCH] hugetlb: fix nr_pmds accounting with shared page tables
> > 
> > We account HugeTLB's shared page table to all processes who share it.
> > The accounting happens during huge_pmd_share().
> > 
> > If somebody populates pud entry under us, we should decrease pagetable's
> > refcount and decrease nr_pmds of the process.
> > 
> > By mistake, I increase nr_pmds again in this case. :-/
> > It will lead to "BUG: non-zero nr_pmds on freeing mm: 2" on process'
> > exit.
> > 
> > Let's fix this by increasing nr_pmds only when we're sure that the page
> > table will be used.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Reported-by: zhongjiang <zhongjiang@huawei.com>
> > Fixes: dc6c9a35b66b ("mm: account pmd page tables to the process")
> > Cc: <stable@vger.kernel.org>        [4.0+]
> 
> Yes this patch is better. Is it worth backporting to stable though?
> BUG message sounds scary but it is not a real BUG().

I guess, we can live without stable backport.
> 
> Acked-by: Michal Hocko <mhocko@suse.com>

Thanks.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm: fix account pmd page to the process
  2016-06-17 12:21 ` Michal Hocko
@ 2016-06-17 13:04   ` zhong jiang
  0 siblings, 0 replies; 22+ messages in thread
From: zhong jiang @ 2016-06-17 13:04 UTC (permalink / raw)
  To: Michal Hocko; +Cc: mike.kravetz, akpm, kirill, linux-mm, linux-kernel

On 2016/6/17 20:21, Michal Hocko wrote:
> On Fri 17-06-16 19:56:15, zhongjiang wrote:
>> From: zhong jiang <zhongjiang@huawei.com>
>>
>> hen a process acquire a pmd table shared by other process, we
>> increase the account to current process. otherwise, a race result
>> in other tasks have set the pud entry. so it no need to increase it.
> I have really hard time to understand (well even to parse) the
> changelog. What do you think about the following?
> "
> huge_pmd_share accounts the number of pmds incorrectly when it races
> with a parallel pud instantiation. vma_interval_tree_foreach will
> increase the counter but then has to recheck the pud with the pte lock
> held and the back off path should drop the increment. The previous
> code would lead to an elevated pmd count which shouldn't be very
> harmful (check_mm() might complain and oom_badness() might be marginally
> confused) but this is worth fixing.
>
> "
  Yes, it is better , thanks.
> But please note that I am still not 100% sure the race is real.
   we can not completely rule out the possibility of  race,  such implementation is common
    in the kernel.   The stability of the kernel will be guaranteed.
  
   Thanks
    zhongjiang
 
>> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
>> ---
>>  mm/hugetlb.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 19d0d08..3072857 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -4191,7 +4191,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>>  				(pmd_t *)((unsigned long)spte & PAGE_MASK));
>>  	} else {
>>  		put_page(virt_to_page(spte));
>> -		mm_inc_nr_pmds(mm);
>> +		mm_dec_nr_pmds(mm);
>>  	}
>>  	spin_unlock(ptl);
>>  out:
>> -- 
>> 1.8.3.1

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm: fix account pmd page to the process
  2016-06-17 12:25           ` Kirill A. Shutemov
@ 2016-06-17 13:00             ` Michal Hocko
  2016-06-17 14:25               ` Kirill A. Shutemov
  2016-06-17 15:39             ` Mike Kravetz
  1 sibling, 1 reply; 22+ messages in thread
From: Michal Hocko @ 2016-06-17 13:00 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: Mike Kravetz, zhongjiang, akpm, linux-mm, linux-kernel

On Fri 17-06-16 15:25:06, Kirill A. Shutemov wrote:
[...]
> >From fd22922e7b4664e83653a84331f0a95b985bff0c Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Date: Fri, 17 Jun 2016 15:07:03 +0300
> Subject: [PATCH] hugetlb: fix nr_pmds accounting with shared page tables
> 
> We account HugeTLB's shared page table to all processes who share it.
> The accounting happens during huge_pmd_share().
> 
> If somebody populates pud entry under us, we should decrease pagetable's
> refcount and decrease nr_pmds of the process.
> 
> By mistake, I increase nr_pmds again in this case. :-/
> It will lead to "BUG: non-zero nr_pmds on freeing mm: 2" on process'
> exit.
> 
> Let's fix this by increasing nr_pmds only when we're sure that the page
> table will be used.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reported-by: zhongjiang <zhongjiang@huawei.com>
> Fixes: dc6c9a35b66b ("mm: account pmd page tables to the process")
> Cc: <stable@vger.kernel.org>        [4.0+]

Yes this patch is better. Is it worth backporting to stable though?
BUG message sounds scary but it is not a real BUG().

Acked-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/hugetlb.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index e197cd7080e6..ed6a537f0878 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4216,7 +4216,6 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>  		if (saddr) {
>  			spte = huge_pte_offset(svma->vm_mm, saddr);
>  			if (spte) {
> -				mm_inc_nr_pmds(mm);
>  				get_page(virt_to_page(spte));
>  				break;
>  			}
> @@ -4231,9 +4230,9 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>  	if (pud_none(*pud)) {
>  		pud_populate(mm, pud,
>  				(pmd_t *)((unsigned long)spte & PAGE_MASK));
> +		mm_inc_nr_pmds(mm);
>  	} else {
>  		put_page(virt_to_page(spte));
> -		mm_inc_nr_pmds(mm);
>  	}
>  	spin_unlock(ptl);
>  out:
> -- 
>  Kirill A. Shutemov

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm: fix account pmd page to the process
  2016-06-16 16:47         ` Mike Kravetz
@ 2016-06-17 12:25           ` Kirill A. Shutemov
  2016-06-17 13:00             ` Michal Hocko
  2016-06-17 15:39             ` Mike Kravetz
  0 siblings, 2 replies; 22+ messages in thread
From: Kirill A. Shutemov @ 2016-06-17 12:25 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: Michal Hocko, zhongjiang, akpm, linux-mm, linux-kernel

On Thu, Jun 16, 2016 at 09:47:46AM -0700, Mike Kravetz wrote:
> On 06/16/2016 09:31 AM, Michal Hocko wrote:
> > On Thu 16-06-16 09:05:23, Mike Kravetz wrote:
> >> On 06/16/2016 08:43 AM, Michal Hocko wrote:
> >>> [It seems that this patch has been sent several times and this
> >>> particular copy didn't add Kirill who has added this code CC him now]
> >>>
> >>> On Thu 16-06-16 17:42:14, Michal Hocko wrote:
> >>>> On Thu 16-06-16 19:36:11, zhongjiang wrote:
> >>>>> From: zhong jiang <zhongjiang@huawei.com>
> >>>>>
> >>>>> when a process acquire a pmd table shared by other process, we
> >>>>> increase the account to current process. otherwise, a race result
> >>>>> in other tasks have set the pud entry. so it no need to increase it.
> >>>>>
> >>>>> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
> >>>>> ---
> >>>>>  mm/hugetlb.c | 5 ++---
> >>>>>  1 file changed, 2 insertions(+), 3 deletions(-)
> >>>>>
> >>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >>>>> index 19d0d08..3b025c5 100644
> >>>>> --- a/mm/hugetlb.c
> >>>>> +++ b/mm/hugetlb.c
> >>>>> @@ -4189,10 +4189,9 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
> >>>>>  	if (pud_none(*pud)) {
> >>>>>  		pud_populate(mm, pud,
> >>>>>  				(pmd_t *)((unsigned long)spte & PAGE_MASK));
> >>>>> -	} else {
> >>>>> +	} else 
> >>>>>  		put_page(virt_to_page(spte));
> >>>>> -		mm_inc_nr_pmds(mm);
> >>>>> -	}
> >>>>
> >>>> The code is quite puzzling but is this correct? Shouldn't we rather do
> >>>> mm_dec_nr_pmds(mm) in that path to undo the previous inc?
> >>
> >> I agree that the code is quite puzzling. :(
> >>
> >> However, if this were an issue I would have expected to see some reports.
> >> Oracle DB makes use of this feature (shared page tables) and if the pmd
> >> count is wrong we would catch it in check_mm() at exit time.
> >>
> >> Upon closer examination, I believe the code in question is never executed.
> >> Note the callers of huge_pmd_share.  The calling code looks like:
> >>
> >>                         if (want_pmd_share() && pud_none(*pud))
> >>                                 pte = huge_pmd_share(mm, addr, pud);
> >>                         else
> >>                                 pte = (pte_t *)pmd_alloc(mm, pud, addr);
> >>
> >> Therefore, we do not call huge_pmd_share unless pud_none(*pud).  The
> >> code in question is only executed when !pud_none(*pud).
> > 
> > My understanding is that the check is needed after we retake page lock
> > because we might have raced with other thread. But it's been quite some
> > time since I've looked at hugetlb locking and page table sharing code.
> 
> That is correct, we could have raced. Duh!
> 
> In the case of a race, the other thread would have incremented the
> PMD count already.  Your suggestion of decrementing pmd count in
> this case seems to be the correct approach.  But, I need to think
> about this some more.

Yes, I made mistake by increasing nr_pmds, not descreasing here.

Testcase:

#include <errno.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/time.h>

#define HPGSZ 2097152UL
int main(int argc, char **argv) {
	char *addr;

	system("echo 1024 > /proc/sys/vm/nr_hugepages");
	addr = mmap(NULL, 1024*HPGSZ, PROT_WRITE | PROT_READ,
			MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB | MAP_POPULATE, -1, 0);
	if (addr == MAP_FAILED) {
		fprintf(stderr, "Failed to alloc hugepage\n");
		return -1;
	}

	addr[0] = 1;
	fork();
	printf("addr[0]: %d\n", addr[0]);

	sleep(1);
	return 0;
}

You can simulate race by replacing 'if (pud_none(*pud))' with "if (0)". It
would produce "BUG: non-zero nr_pmds on freeing mm: 2" on the test-case.

Fix:

>From fd22922e7b4664e83653a84331f0a95b985bff0c Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Date: Fri, 17 Jun 2016 15:07:03 +0300
Subject: [PATCH] hugetlb: fix nr_pmds accounting with shared page tables

We account HugeTLB's shared page table to all processes who share it.
The accounting happens during huge_pmd_share().

If somebody populates pud entry under us, we should decrease pagetable's
refcount and decrease nr_pmds of the process.

By mistake, I increase nr_pmds again in this case. :-/
It will lead to "BUG: non-zero nr_pmds on freeing mm: 2" on process'
exit.

Let's fix this by increasing nr_pmds only when we're sure that the page
table will be used.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: zhongjiang <zhongjiang@huawei.com>
Fixes: dc6c9a35b66b ("mm: account pmd page tables to the process")
Cc: <stable@vger.kernel.org>        [4.0+]
---
 mm/hugetlb.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e197cd7080e6..ed6a537f0878 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4216,7 +4216,6 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 		if (saddr) {
 			spte = huge_pte_offset(svma->vm_mm, saddr);
 			if (spte) {
-				mm_inc_nr_pmds(mm);
 				get_page(virt_to_page(spte));
 				break;
 			}
@@ -4231,9 +4230,9 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 	if (pud_none(*pud)) {
 		pud_populate(mm, pud,
 				(pmd_t *)((unsigned long)spte & PAGE_MASK));
+		mm_inc_nr_pmds(mm);
 	} else {
 		put_page(virt_to_page(spte));
-		mm_inc_nr_pmds(mm);
 	}
 	spin_unlock(ptl);
 out:
-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm: fix account pmd page to the process
  2016-06-17 11:56 zhongjiang
@ 2016-06-17 12:21 ` Michal Hocko
  2016-06-17 13:04   ` zhong jiang
  0 siblings, 1 reply; 22+ messages in thread
From: Michal Hocko @ 2016-06-17 12:21 UTC (permalink / raw)
  To: zhongjiang; +Cc: mike.kravetz, akpm, kirill, linux-mm, linux-kernel

On Fri 17-06-16 19:56:15, zhongjiang wrote:
> From: zhong jiang <zhongjiang@huawei.com>
> 
> hen a process acquire a pmd table shared by other process, we
> increase the account to current process. otherwise, a race result
> in other tasks have set the pud entry. so it no need to increase it.

I have really hard time to understand (well even to parse) the
changelog. What do you think about the following?
"
huge_pmd_share accounts the number of pmds incorrectly when it races
with a parallel pud instantiation. vma_interval_tree_foreach will
increase the counter but then has to recheck the pud with the pte lock
held and the back off path should drop the increment. The previous
code would lead to an elevated pmd count which shouldn't be very
harmful (check_mm() might complain and oom_badness() might be marginally
confused) but this is worth fixing.
"

But please note that I am still not 100% sure the race is real.

> 
> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
> ---
>  mm/hugetlb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 19d0d08..3072857 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4191,7 +4191,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>  				(pmd_t *)((unsigned long)spte & PAGE_MASK));
>  	} else {
>  		put_page(virt_to_page(spte));
> -		mm_inc_nr_pmds(mm);
> +		mm_dec_nr_pmds(mm);
>  	}
>  	spin_unlock(ptl);
>  out:
> -- 
> 1.8.3.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm: fix account pmd page to the process
       [not found] <1466163941-12952-1-git-send-email-zhongjiang@huawei.com>
@ 2016-06-17 12:01 ` zhong jiang
  0 siblings, 0 replies; 22+ messages in thread
From: zhong jiang @ 2016-06-17 12:01 UTC (permalink / raw)
  To: Michal Hocko, Mike Kravetz, kirill.shutemov
  Cc: Linux Memory Management List, LKML

On 2016/6/17 19:45, zhongjiang wrote:
> From: zhong jiang <zhongjiang@huawei.com>
>
> hen a process acquire a pmd table shared by other process, we
> increase the account to current process. otherwise, a race result
> in other tasks have set the pud entry. so it no need to increase it.
>
> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
> ---
>  mm/hugetlb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 19d0d08..3072857 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4191,7 +4191,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>  				(pmd_t *)((unsigned long)spte & PAGE_MASK));
>  	} else {
>  		put_page(virt_to_page(spte));
> -		mm_inc_nr_pmds(mm);
> +		mm_dec_nr_pmds(mm);
>  	}
>  	spin_unlock(ptl);
>  out:

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] mm: fix account pmd page to the process
@ 2016-06-17 11:56 zhongjiang
  2016-06-17 12:21 ` Michal Hocko
  0 siblings, 1 reply; 22+ messages in thread
From: zhongjiang @ 2016-06-17 11:56 UTC (permalink / raw)
  To: mhocko, mike.kravetz, akpm, kirill; +Cc: linux-mm, linux-kernel

From: zhong jiang <zhongjiang@huawei.com>

hen a process acquire a pmd table shared by other process, we
increase the account to current process. otherwise, a race result
in other tasks have set the pud entry. so it no need to increase it.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 mm/hugetlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 19d0d08..3072857 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4191,7 +4191,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 				(pmd_t *)((unsigned long)spte & PAGE_MASK));
 	} else {
 		put_page(virt_to_page(spte));
-		mm_inc_nr_pmds(mm);
+		mm_dec_nr_pmds(mm);
 	}
 	spin_unlock(ptl);
 out:
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm: fix account pmd page to the process
  2016-06-16 15:42 ` Michal Hocko
  2016-06-16 15:43   ` Michal Hocko
@ 2016-06-17 11:18   ` zhong jiang
  1 sibling, 0 replies; 22+ messages in thread
From: zhong jiang @ 2016-06-17 11:18 UTC (permalink / raw)
  To: Michal Hocko; +Cc: akpm, linux-mm, linux-kernel

On 2016/6/16 23:42, Michal Hocko wrote:
> On Thu 16-06-16 19:36:11, zhongjiang wrote:
>> From: zhong jiang <zhongjiang@huawei.com>
>>
>> when a process acquire a pmd table shared by other process, we
>> increase the account to current process. otherwise, a race result
>> in other tasks have set the pud entry. so it no need to increase it.
>>
>> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
>> ---
>>  mm/hugetlb.c | 5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 19d0d08..3b025c5 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -4189,10 +4189,9 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>>  	if (pud_none(*pud)) {
>>  		pud_populate(mm, pud,
>>  				(pmd_t *)((unsigned long)spte & PAGE_MASK));
>> -	} else {
>> +	} else 
>>  		put_page(virt_to_page(spte));
>> -		mm_inc_nr_pmds(mm);
>> -	}
> The code is quite puzzling but is this correct? Shouldn't we rather do
> mm_dec_nr_pmds(mm) in that path to undo the previous inc?
  Yes, you are right. I will modify it in V2.
 
  Thanks
  zhongjiang
>
>> +
>>  	spin_unlock(ptl);
>>  out:
>>  	pte = (pte_t *)pmd_alloc(mm, pud, addr);
>> -- 
>> 1.8.3.1

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm: fix account pmd page to the process
  2016-06-16 16:31       ` Michal Hocko
@ 2016-06-16 16:47         ` Mike Kravetz
  2016-06-17 12:25           ` Kirill A. Shutemov
  0 siblings, 1 reply; 22+ messages in thread
From: Mike Kravetz @ 2016-06-16 16:47 UTC (permalink / raw)
  To: Michal Hocko; +Cc: zhongjiang, akpm, linux-mm, linux-kernel, Kirill A. Shutemov

On 06/16/2016 09:31 AM, Michal Hocko wrote:
> On Thu 16-06-16 09:05:23, Mike Kravetz wrote:
>> On 06/16/2016 08:43 AM, Michal Hocko wrote:
>>> [It seems that this patch has been sent several times and this
>>> particular copy didn't add Kirill who has added this code CC him now]
>>>
>>> On Thu 16-06-16 17:42:14, Michal Hocko wrote:
>>>> On Thu 16-06-16 19:36:11, zhongjiang wrote:
>>>>> From: zhong jiang <zhongjiang@huawei.com>
>>>>>
>>>>> when a process acquire a pmd table shared by other process, we
>>>>> increase the account to current process. otherwise, a race result
>>>>> in other tasks have set the pud entry. so it no need to increase it.
>>>>>
>>>>> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
>>>>> ---
>>>>>  mm/hugetlb.c | 5 ++---
>>>>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>>> index 19d0d08..3b025c5 100644
>>>>> --- a/mm/hugetlb.c
>>>>> +++ b/mm/hugetlb.c
>>>>> @@ -4189,10 +4189,9 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>>>>>  	if (pud_none(*pud)) {
>>>>>  		pud_populate(mm, pud,
>>>>>  				(pmd_t *)((unsigned long)spte & PAGE_MASK));
>>>>> -	} else {
>>>>> +	} else 
>>>>>  		put_page(virt_to_page(spte));
>>>>> -		mm_inc_nr_pmds(mm);
>>>>> -	}
>>>>
>>>> The code is quite puzzling but is this correct? Shouldn't we rather do
>>>> mm_dec_nr_pmds(mm) in that path to undo the previous inc?
>>
>> I agree that the code is quite puzzling. :(
>>
>> However, if this were an issue I would have expected to see some reports.
>> Oracle DB makes use of this feature (shared page tables) and if the pmd
>> count is wrong we would catch it in check_mm() at exit time.
>>
>> Upon closer examination, I believe the code in question is never executed.
>> Note the callers of huge_pmd_share.  The calling code looks like:
>>
>>                         if (want_pmd_share() && pud_none(*pud))
>>                                 pte = huge_pmd_share(mm, addr, pud);
>>                         else
>>                                 pte = (pte_t *)pmd_alloc(mm, pud, addr);
>>
>> Therefore, we do not call huge_pmd_share unless pud_none(*pud).  The
>> code in question is only executed when !pud_none(*pud).
> 
> My understanding is that the check is needed after we retake page lock
> because we might have raced with other thread. But it's been quite some
> time since I've looked at hugetlb locking and page table sharing code.

That is correct, we could have raced. Duh!

In the case of a race, the other thread would have incremented the
PMD count already.  Your suggestion of decrementing pmd count in
this case seems to be the correct approach.  But, I need to think
about this some more.

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm: fix account pmd page to the process
  2016-06-16 16:05     ` Mike Kravetz
@ 2016-06-16 16:31       ` Michal Hocko
  2016-06-16 16:47         ` Mike Kravetz
  0 siblings, 1 reply; 22+ messages in thread
From: Michal Hocko @ 2016-06-16 16:31 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: zhongjiang, akpm, linux-mm, linux-kernel, Kirill A. Shutemov

On Thu 16-06-16 09:05:23, Mike Kravetz wrote:
> On 06/16/2016 08:43 AM, Michal Hocko wrote:
> > [It seems that this patch has been sent several times and this
> > particular copy didn't add Kirill who has added this code CC him now]
> > 
> > On Thu 16-06-16 17:42:14, Michal Hocko wrote:
> >> On Thu 16-06-16 19:36:11, zhongjiang wrote:
> >>> From: zhong jiang <zhongjiang@huawei.com>
> >>>
> >>> when a process acquire a pmd table shared by other process, we
> >>> increase the account to current process. otherwise, a race result
> >>> in other tasks have set the pud entry. so it no need to increase it.
> >>>
> >>> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
> >>> ---
> >>>  mm/hugetlb.c | 5 ++---
> >>>  1 file changed, 2 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >>> index 19d0d08..3b025c5 100644
> >>> --- a/mm/hugetlb.c
> >>> +++ b/mm/hugetlb.c
> >>> @@ -4189,10 +4189,9 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
> >>>  	if (pud_none(*pud)) {
> >>>  		pud_populate(mm, pud,
> >>>  				(pmd_t *)((unsigned long)spte & PAGE_MASK));
> >>> -	} else {
> >>> +	} else 
> >>>  		put_page(virt_to_page(spte));
> >>> -		mm_inc_nr_pmds(mm);
> >>> -	}
> >>
> >> The code is quite puzzling but is this correct? Shouldn't we rather do
> >> mm_dec_nr_pmds(mm) in that path to undo the previous inc?
> 
> I agree that the code is quite puzzling. :(
> 
> However, if this were an issue I would have expected to see some reports.
> Oracle DB makes use of this feature (shared page tables) and if the pmd
> count is wrong we would catch it in check_mm() at exit time.
> 
> Upon closer examination, I believe the code in question is never executed.
> Note the callers of huge_pmd_share.  The calling code looks like:
> 
>                         if (want_pmd_share() && pud_none(*pud))
>                                 pte = huge_pmd_share(mm, addr, pud);
>                         else
>                                 pte = (pte_t *)pmd_alloc(mm, pud, addr);
> 
> Therefore, we do not call huge_pmd_share unless pud_none(*pud).  The
> code in question is only executed when !pud_none(*pud).

My understanding is that the check is needed after we retake page lock
because we might have raced with other thread. But it's been quite some
time since I've looked at hugetlb locking and page table sharing code.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm: fix account pmd page to the process
  2016-06-16 15:43   ` Michal Hocko
@ 2016-06-16 16:05     ` Mike Kravetz
  2016-06-16 16:31       ` Michal Hocko
  0 siblings, 1 reply; 22+ messages in thread
From: Mike Kravetz @ 2016-06-16 16:05 UTC (permalink / raw)
  To: Michal Hocko, zhongjiang; +Cc: akpm, linux-mm, linux-kernel, Kirill A. Shutemov

On 06/16/2016 08:43 AM, Michal Hocko wrote:
> [It seems that this patch has been sent several times and this
> particular copy didn't add Kirill who has added this code CC him now]
> 
> On Thu 16-06-16 17:42:14, Michal Hocko wrote:
>> On Thu 16-06-16 19:36:11, zhongjiang wrote:
>>> From: zhong jiang <zhongjiang@huawei.com>
>>>
>>> when a process acquire a pmd table shared by other process, we
>>> increase the account to current process. otherwise, a race result
>>> in other tasks have set the pud entry. so it no need to increase it.
>>>
>>> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
>>> ---
>>>  mm/hugetlb.c | 5 ++---
>>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index 19d0d08..3b025c5 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -4189,10 +4189,9 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>>>  	if (pud_none(*pud)) {
>>>  		pud_populate(mm, pud,
>>>  				(pmd_t *)((unsigned long)spte & PAGE_MASK));
>>> -	} else {
>>> +	} else 
>>>  		put_page(virt_to_page(spte));
>>> -		mm_inc_nr_pmds(mm);
>>> -	}
>>
>> The code is quite puzzling but is this correct? Shouldn't we rather do
>> mm_dec_nr_pmds(mm) in that path to undo the previous inc?

I agree that the code is quite puzzling. :(

However, if this were an issue I would have expected to see some reports.
Oracle DB makes use of this feature (shared page tables) and if the pmd
count is wrong we would catch it in check_mm() at exit time.

Upon closer examination, I believe the code in question is never executed.
Note the callers of huge_pmd_share.  The calling code looks like:

                        if (want_pmd_share() && pud_none(*pud))
                                pte = huge_pmd_share(mm, addr, pud);
                        else
                                pte = (pte_t *)pmd_alloc(mm, pud, addr);

Therefore, we do not call huge_pmd_share unless pud_none(*pud).  The
code in question is only executed when !pud_none(*pud).

I think that entire if/else statement can be removed.  We know
pud_none(*pud), so just do pud_populate().

-- 
Mike Kravetz

>>
>>> +
>>>  	spin_unlock(ptl);
>>>  out:
>>>  	pte = (pte_t *)pmd_alloc(mm, pud, addr);
>>> -- 
>>> 1.8.3.1
>>
>> -- 
>> Michal Hocko
>> SUSE Labs
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm: fix account pmd page to the process
  2016-06-16 15:42 ` Michal Hocko
@ 2016-06-16 15:43   ` Michal Hocko
  2016-06-16 16:05     ` Mike Kravetz
  2016-06-17 11:18   ` zhong jiang
  1 sibling, 1 reply; 22+ messages in thread
From: Michal Hocko @ 2016-06-16 15:43 UTC (permalink / raw)
  To: zhongjiang; +Cc: akpm, linux-mm, linux-kernel, Kirill A. Shutemov

[It seems that this patch has been sent several times and this
particular copy didn't add Kirill who has added this code CC him now]

On Thu 16-06-16 17:42:14, Michal Hocko wrote:
> On Thu 16-06-16 19:36:11, zhongjiang wrote:
> > From: zhong jiang <zhongjiang@huawei.com>
> > 
> > when a process acquire a pmd table shared by other process, we
> > increase the account to current process. otherwise, a race result
> > in other tasks have set the pud entry. so it no need to increase it.
> > 
> > Signed-off-by: zhong jiang <zhongjiang@huawei.com>
> > ---
> >  mm/hugetlb.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 19d0d08..3b025c5 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -4189,10 +4189,9 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
> >  	if (pud_none(*pud)) {
> >  		pud_populate(mm, pud,
> >  				(pmd_t *)((unsigned long)spte & PAGE_MASK));
> > -	} else {
> > +	} else 
> >  		put_page(virt_to_page(spte));
> > -		mm_inc_nr_pmds(mm);
> > -	}
> 
> The code is quite puzzling but is this correct? Shouldn't we rather do
> mm_dec_nr_pmds(mm) in that path to undo the previous inc?
> 
> > +
> >  	spin_unlock(ptl);
> >  out:
> >  	pte = (pte_t *)pmd_alloc(mm, pud, addr);
> > -- 
> > 1.8.3.1
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm: fix account pmd page to the process
  2016-06-16 11:36 zhongjiang
@ 2016-06-16 15:42 ` Michal Hocko
  2016-06-16 15:43   ` Michal Hocko
  2016-06-17 11:18   ` zhong jiang
  0 siblings, 2 replies; 22+ messages in thread
From: Michal Hocko @ 2016-06-16 15:42 UTC (permalink / raw)
  To: zhongjiang; +Cc: akpm, linux-mm, linux-kernel

On Thu 16-06-16 19:36:11, zhongjiang wrote:
> From: zhong jiang <zhongjiang@huawei.com>
> 
> when a process acquire a pmd table shared by other process, we
> increase the account to current process. otherwise, a race result
> in other tasks have set the pud entry. so it no need to increase it.
> 
> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
> ---
>  mm/hugetlb.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 19d0d08..3b025c5 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4189,10 +4189,9 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>  	if (pud_none(*pud)) {
>  		pud_populate(mm, pud,
>  				(pmd_t *)((unsigned long)spte & PAGE_MASK));
> -	} else {
> +	} else 
>  		put_page(virt_to_page(spte));
> -		mm_inc_nr_pmds(mm);
> -	}

The code is quite puzzling but is this correct? Shouldn't we rather do
mm_dec_nr_pmds(mm) in that path to undo the previous inc?

> +
>  	spin_unlock(ptl);
>  out:
>  	pte = (pte_t *)pmd_alloc(mm, pud, addr);
> -- 
> 1.8.3.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] mm: fix account pmd page to the process
@ 2016-06-16 11:36 zhongjiang
  2016-06-16 15:42 ` Michal Hocko
  0 siblings, 1 reply; 22+ messages in thread
From: zhongjiang @ 2016-06-16 11:36 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel, zhongjiang

From: zhong jiang <zhongjiang@huawei.com>

when a process acquire a pmd table shared by other process, we
increase the account to current process. otherwise, a race result
in other tasks have set the pud entry. so it no need to increase it.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 mm/hugetlb.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 19d0d08..3b025c5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4189,10 +4189,9 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 	if (pud_none(*pud)) {
 		pud_populate(mm, pud,
 				(pmd_t *)((unsigned long)spte & PAGE_MASK));
-	} else {
+	} else 
 		put_page(virt_to_page(spte));
-		mm_inc_nr_pmds(mm);
-	}
+
 	spin_unlock(ptl);
 out:
 	pte = (pte_t *)pmd_alloc(mm, pud, addr);
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] mm: fix account pmd page to the process
       [not found] <1466076175-23444-1-git-send-email-zhongjiang@huawei.com>
@ 2016-06-16 11:30 ` zhong jiang
  0 siblings, 0 replies; 22+ messages in thread
From: zhong jiang @ 2016-06-16 11:30 UTC (permalink / raw)
  To: Andrew Morton, kirill.shutemov; +Cc: Linux Memory Management List, LKML

On 2016/6/16 19:22, zhongjiang wrote:
> From: zhong jiang <zhongjiang@huawei.com>
>
> when a process acquire a pmd table shared by other process, we
> increase the account to current process. otherwise, a race result
> in other tasks have set the pud entry. so it no need to increase it.
>
> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
> ---
>  mm/hugetlb.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 19d0d08..3b025c5 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4189,10 +4189,9 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>  	if (pud_none(*pud)) {
>  		pud_populate(mm, pud,
>  				(pmd_t *)((unsigned long)spte & PAGE_MASK));
> -	} else {
> +	} else 
>  		put_page(virt_to_page(spte));
> -		mm_inc_nr_pmds(mm);
> -	}
> +
>  	spin_unlock(ptl);
>  out:
>  	pte = (pte_t *)pmd_alloc(mm, pud, addr);

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] mm: fix account pmd page to the process
@ 2016-06-16 11:28 zhongjiang
  0 siblings, 0 replies; 22+ messages in thread
From: zhongjiang @ 2016-06-16 11:28 UTC (permalink / raw)
  To: kirill.shutemov, akpm; +Cc: linux-mm, linux-kernel, zhongjiang

From: zhong jiang <zhongjiang@huawei.com>

when a process acquire a pmd table shared by other process, we
increase the account to current process. otherwise, a race result
in other tasks have set the pud entry. so it no need to increase it.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 mm/hugetlb.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 19d0d08..3b025c5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4189,10 +4189,9 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 	if (pud_none(*pud)) {
 		pud_populate(mm, pud,
 				(pmd_t *)((unsigned long)spte & PAGE_MASK));
-	} else {
+	} else 
 		put_page(virt_to_page(spte));
-		mm_inc_nr_pmds(mm);
-	}
+
 	spin_unlock(ptl);
 out:
 	pte = (pte_t *)pmd_alloc(mm, pud, addr);
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] mm: fix account pmd page to the process
@ 2016-06-16 11:17 zhongjiang
  0 siblings, 0 replies; 22+ messages in thread
From: zhongjiang @ 2016-06-16 11:17 UTC (permalink / raw)
  To: kirill.shutemov, akpm; +Cc: linux-mm, linux-kernel

From: zhong jiang <zhongjiang@huawei.com>

when a process acquire a pmd table shared by other process, we
increase the account to current process. otherwise, a race result
in other tasks have set the pud entry. so it no need to increase it.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 mm/hugetlb.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 19d0d08..3b025c5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4189,10 +4189,9 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 	if (pud_none(*pud)) {
 		pud_populate(mm, pud,
 				(pmd_t *)((unsigned long)spte & PAGE_MASK));
-	} else {
+	} else 
 		put_page(virt_to_page(spte));
-		mm_inc_nr_pmds(mm);
-	}
+
 	spin_unlock(ptl);
 out:
 	pte = (pte_t *)pmd_alloc(mm, pud, addr);
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] mm: fix account pmd page to the process
@ 2016-06-16 11:16 zhongjiang
  0 siblings, 0 replies; 22+ messages in thread
From: zhongjiang @ 2016-06-16 11:16 UTC (permalink / raw)
  To: kirill.shutemov, akpm; +Cc: linux-mm, linux-kernel

From: zhong jiang <zhongjiang@huawei.com>

when a process acquire a pmd table shared by other process, we
increase the account to current process. otherwise, a race result
in other tasks have set the pud entry. so it no need to increase it.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 mm/hugetlb.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 19d0d08..3b025c5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4189,10 +4189,9 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 	if (pud_none(*pud)) {
 		pud_populate(mm, pud,
 				(pmd_t *)((unsigned long)spte & PAGE_MASK));
-	} else {
+	} else 
 		put_page(virt_to_page(spte));
-		mm_inc_nr_pmds(mm);
-	}
+
 	spin_unlock(ptl);
 out:
 	pte = (pte_t *)pmd_alloc(mm, pud, addr);
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] mm: fix account pmd page to the process
@ 2016-06-16 11:13 zhongjiang
  0 siblings, 0 replies; 22+ messages in thread
From: zhongjiang @ 2016-06-16 11:13 UTC (permalink / raw)
  To: kirill.shutemov, akpm; +Cc: linux-mm, linux-kernel

From: zhong jiang <zhongjiang@huawei.com>

when a process acquire a pmd table shared by other process, we
increase the account to current process. otherwise, a race result
in other tasks have set the pud entry. so it no need to increase it.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 mm/hugetlb.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 19d0d08..3b025c5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4189,10 +4189,9 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 	if (pud_none(*pud)) {
 		pud_populate(mm, pud,
 				(pmd_t *)((unsigned long)spte & PAGE_MASK));
-	} else {
+	} else 
 		put_page(virt_to_page(spte));
-		mm_inc_nr_pmds(mm);
-	}
+
 	spin_unlock(ptl);
 out:
 	pte = (pte_t *)pmd_alloc(mm, pud, addr);
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-06-18  5:08 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-16  7:47 [PATCH] mm: fix account pmd page to the process zhongjiang
2016-06-16 11:13 zhongjiang
2016-06-16 11:16 zhongjiang
2016-06-16 11:17 zhongjiang
2016-06-16 11:28 zhongjiang
     [not found] <1466076175-23444-1-git-send-email-zhongjiang@huawei.com>
2016-06-16 11:30 ` zhong jiang
2016-06-16 11:36 zhongjiang
2016-06-16 15:42 ` Michal Hocko
2016-06-16 15:43   ` Michal Hocko
2016-06-16 16:05     ` Mike Kravetz
2016-06-16 16:31       ` Michal Hocko
2016-06-16 16:47         ` Mike Kravetz
2016-06-17 12:25           ` Kirill A. Shutemov
2016-06-17 13:00             ` Michal Hocko
2016-06-17 14:25               ` Kirill A. Shutemov
2016-06-17 15:39             ` Mike Kravetz
2016-06-18  5:07               ` zhong jiang
2016-06-17 11:18   ` zhong jiang
2016-06-17 11:56 zhongjiang
2016-06-17 12:21 ` Michal Hocko
2016-06-17 13:04   ` zhong jiang
     [not found] <1466163941-12952-1-git-send-email-zhongjiang@huawei.com>
2016-06-17 12:01 ` zhong jiang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).