* Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat
@ 2014-05-11 12:31 ` Jianyu Zhan
  0 siblings, 0 replies; 22+ messages in thread
From: Jianyu Zhan @ 2014-05-11 12:31 UTC (permalink / raw)
  To: akpm, hughd, riel, nasa4836, mgorman, zhangyanfei, aarcange,
	fabf, sasha.levin, oleg, n-horiguchi, iamjoonsoo.kim,
	kirill.shutemov, gorcunov, cl, dave.hansen, toshi.kani,
	paul.gortmaker, srivatsa.bhat
  Cc: linux-mm, linux-kernel

>Your original __mod_zone_page_state happened to be correct;
>but you have no understanding of why it was correct, so its
>comment was very wrong, even after you changed the irq wording.
>
>This series just propagates your misunderstanding further,
>while providing an object lesson in how not to present a series.
>
>Sorry, you have quickly developed an unenviable reputation for
>patches which waste developers' time: please consider your
>patches much more carefully before posting them.

Hi, Hugh.

I'm sorry for my misunderstanding in the previous patches. I should
have given them more thought. I'm really sorry. {facepalm} x 3

And I really appreciate your detailed comments and your patience with
a noob like me ;-) Today I've spent some time digging into the history
to figure out what is under the hood.

>I'd prefer to let Christoph write the definitive version,
>but my first stab at it would be:
>
>/*
> * For use when we know that interrupts are disabled,
> * or when we know that preemption is disabled and that
> * particular counter cannot be updated from interrupt context.
> */

 Seconded. Christoph, would you please write a comment? I've written
 a new one based on Hugh's, would you please also take a look? Thanks.

>Once upon a time, from 2.6.16 to 2.6.32, there was indeed a relevant
>and helpful comment in __page_set_anon_rmap():
>        /*
>         * nr_mapped state can be updated without turning off
>         * interrupts because it is not modified via interrupt.
>         */
>        __inc_page_state(nr_mapped);
>
>The comment survived the replacement of nr_mapped, but eventually
>it got cleaned away completely.
>
>It is safe to use the irq-unsafe __mod_zone_page_stat on counters
>which are never modified via interrupt.
>You are right that the comment is not good enough, but I don't trust
>your version either.  Since percpu variables are involved, it's important
>that preemption be disabled too (see comment above __inc_zone_state).

Thanks for clarifying this. I checked the history, and here is my
understanding; please correct me if I am wrong, thanks!

__mod_zone_page_stat() should be called with irqs off; that is the
strongest guarantee of safety. The weaker guarantee requires preemption
to be disabled, and in that situation the counter in question must not
be modified from interrupt context.

Such guarantees are essential, because __mod_zone_page_stat() is a
two-step operation: read the percpu counter, then modify it.
(Need comments. Christoph, do I misunderstand it?)
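
For reference, here is a lightly simplified sketch of the current
__mod_zone_page_state() (mm/vmstat.c of this era); the read and the
write-back are two separate steps, with nothing atomic in between:

void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
				int delta)
{
	struct per_cpu_pageset __percpu *pcp = zone->pageset;
	s8 __percpu *p = pcp->vm_stat_diff + item;
	long x = delta + __this_cpu_read(*p);	/* step 1: read the percpu diff */
	long t = __this_cpu_read(pcp->stat_threshold);

	if (unlikely(x > t || x < -t)) {
		/* diff crossed its threshold: fold it into the global counter */
		zone_page_state_add(x, zone, item);
		x = 0;
	}
	__this_cpu_write(*p, x);		/* step 2: write the diff back */
}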

As for all the other call sites of __mod_zone_page_stat() in the patch,
I think your comment is right: those counters are not modified in
interrupt context, so we can safely use __mod_zone_page_stat().
 
But for the call site in page_add_new_anon_rmap(), I think my previous
wording is appropriate: mlocked_vma_newpage() is only called in the fault
path by page_add_new_anon_rmap(), on a *new* page which is initially only
visible via the pagetables, and the pte lock is held while calling
page_add_new_anon_rmap(), so we need not fear others seeing the page at
this point, let alone modifying it. So whether the counter could be
modified in interrupt context does matter, but it is not the key point
here.

I've renewed the patch. In this patch I also change page_add_new_anon_rmap()
to use __mod_zone_page_stat(), since the two changes are closely related
and belong in one patch.

Thanks for your review; I appreciate your comments. Andrew, Christoph,
would you please review it? Thanks.

-----<8-----
mm: use the light version __mod_zone_page_state in mlocked_vma_newpage()

mlocked_vma_newpage() is only called in fault path by
page_add_new_anon_rmap(), which is called on a *new* page.
And such page is initially only visible via the pagetables, and the
pte is locked while calling page_add_new_anon_rmap(), so we need not
use an irq-safe mod_zone_page_state() here, using a light-weight version
__mod_zone_page_state() would be OK.

This patch also documents __mod_zone_page_state() and some of its
callsites.

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Suggested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
---
 mm/internal.h |  9 ++++++++-
 mm/rmap.c     | 11 +++++++++++
 mm/vmstat.c   |  5 ++++-
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 07b6736..7140c9b 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -196,7 +196,14 @@ static inline int mlocked_vma_newpage(struct vm_area_struct *vma,
 		return 0;
 
 	if (!TestSetPageMlocked(page)) {
-		mod_zone_page_state(page_zone(page), NR_MLOCK,
+		/*
+		 * We use the irq-unsafe __mod_zone_page_stat because
+		 * 1. this counter is not modified in interrupt context, and
+		 * 2. pte lock is held, and this is a new page, initially
+		 *    only visible via the pagetables, so this would exclude
+		 *    racy processes from preempting us and modifying it.
+		 */
+		__mod_zone_page_state(page_zone(page), NR_MLOCK,
 				    hpage_nr_pages(page));
 		count_vm_event(UNEVICTABLE_PGMLOCKED);
 	}
diff --git a/mm/rmap.c b/mm/rmap.c
index 9c3e773..0700253 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -986,6 +986,12 @@ void do_page_add_anon_rmap(struct page *page,
 {
 	int first = atomic_inc_and_test(&page->_mapcount);
 	if (first) {
+		/*
+		 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
+		 * 1. these counters are not modified in interrupt context, and
+		 * 2. pte lock is held, which would exclude racy processes from
+		 *    preempting us and modifying these counters.
+		 */
 		if (PageTransHuge(page))
 			__inc_zone_page_state(page,
 					      NR_ANON_TRANSPARENT_HUGEPAGES);
@@ -1077,6 +1083,11 @@ void page_remove_rmap(struct page *page)
 	/*
 	 * Hugepages are not counted in NR_ANON_PAGES nor NR_FILE_MAPPED
 	 * and not charged by memcg for now.
+	 *
+	 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
+	 * 1. these counters are not modified in interrupt context, and
+	 * 2. pte lock is held, which would exclude racy processes from
+	 *    preempting us and modifying these counters.
 	 */
 	if (unlikely(PageHuge(page)))
 		goto out;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 302dd07..e6db98d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -207,7 +207,10 @@ void set_pgdat_percpu_threshold(pg_data_t *pgdat,
 }
 
 /*
- * For use when we know that interrupts are disabled.
+ * For use when we know that either
+ *  1. interrupts are disabled, or
+ *  2. preemption is disabled and that particular counter cannot be
+ *     updated from interrupt context.
  */
 void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
 				int delta)
-- 
2.0.0-rc1


^ permalink raw reply related	[flat|nested] 22+ messages in thread


* Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat
  2014-05-11 12:31 ` Jianyu Zhan
@ 2014-05-12 14:01   ` Christoph Lameter
  -1 siblings, 0 replies; 22+ messages in thread
From: Christoph Lameter @ 2014-05-12 14:01 UTC (permalink / raw)
  To: Jianyu Zhan
  Cc: akpm, hughd, riel, mgorman, zhangyanfei, aarcange, fabf,
	sasha.levin, oleg, n-horiguchi, iamjoonsoo.kim, kirill.shutemov,
	gorcunov, dave.hansen, toshi.kani, paul.gortmaker, srivatsa.bhat,
	linux-mm, linux-kernel

On Sun, 11 May 2014, Jianyu Zhan wrote:

> >
> >/*
> > * For use when we know that interrupts are disabled,
> > * or when we know that preemption is disabled and that
> > * particular counter cannot be updated from interrupt context.
> > */
>
>  Seconded. Christoph, would you please write a comment? I've written
>  a new one based on Hugh's, would you please also take a look? Thanks.

The description above looks ok to me. The problem is that you are
considering the page-related data structures as an issue for races and not
the data structures relevant for vm statistics.

> Such guarantees are essential, because __mod_zone_page_stat() is a
> two-step operation: read the percpu counter, then modify it.
> (Need comments. Christoph, do I misunderstand it?)

Yup.

> mlocked_vma_newpage() is only called in fault path by
> page_add_new_anon_rmap(), which is called on a *new* page.
> And such page is initially only visible via the pagetables, and the
> pte is locked while calling page_add_new_anon_rmap(), so we need not
> use an irq-safe mod_zone_page_state() here, using a light-weight version
> __mod_zone_page_state() would be OK.

This is wrong. What you could say is that preemption is off and that the
counter is never incremented from an interrupt context that could race
with it. If this is the case then it would be safe.
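
A hypothetical trace of that race on one CPU, assuming an interrupt
handler that also updates the same counter (*p is the counter's percpu
diff):

	task:	x = delta + __this_cpu_read(*p);	/* reads diff = 3 */
			<interrupt arrives on this CPU>
	irq:	mod_zone_page_state(zone, item, 1);	/* diff becomes 4 */
			<handler returns>
	task:	__this_cpu_write(*p, x);		/* writes 3 + delta:
							   the irq's +1 is lost */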

^ permalink raw reply	[flat|nested] 22+ messages in thread


* Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat
  2014-05-12 14:01   ` Christoph Lameter
@ 2014-05-12 15:19     ` Jianyu Zhan
  -1 siblings, 0 replies; 22+ messages in thread
From: Jianyu Zhan @ 2014-05-12 15:19 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, Hugh Dickins, Rik van Riel, Mel Gorman,
	Zhang Yanfei, Andrea Arcangeli, fabf, sasha.levin, oleg,
	Naoya Horiguchi, iamjoonsoo.kim, kirill.shutemov,
	Cyrill Gorcunov, dave.hansen, toshi.kani, paul.gortmaker,
	srivatsa.bhat, linux-mm, LKML

On Mon, May 12, 2014 at 10:01 PM, Christoph Lameter <cl@linux.com> wrote:
> >
> >/*
> > * For use when we know that interrupts are disabled,
> > * or when we know that preemption is disabled and that
> > * particular counter cannot be updated from interrupt context.
> > */
>
> The description above looks ok to me. The problem is that you are
> considering the page-related data structures as an issue for races and not
> the data structures relevant for vm statistics.

Hi,  Christoph,

Yep. I did.

Let me restate the point here.

 __mod_zone_page_stat may be used when we know that either
 1. interrupts are disabled, or
 2. preemption is disabled and that particular counter cannot be
    updated from interrupt context.

The call sites currently using __mod_zone_page_stat only guarantee that
the counter is never modified from interrupt context; they do not
disable preemption.

This means they guarantee that even if they are preempted the vm
counter won't be modified incorrectly, because the counter is page-related
(e.g., a new anon page added) and they exclusively hold the pte lock.

This is why I emphasized 'the page-related data structures as an
issue for races'.

So, as you conclude in the other mail, __mod_zone_page_stat couldn't
be used in mlocked_vma_newpage; what, then, qualifies the other call
sites for using it in the same situation?
See:

void page_add_new_anon_rmap(struct page *page,
        struct vm_area_struct *vma, unsigned long address)
{
        VM_BUG_ON(address < vma->vm_start || address >= vma->vm_end);
        SetPageSwapBacked(page);
        atomic_set(&page->_mapcount, 0); /* increment count (starts at -1) */
        if (PageTransHuge(page))
                __inc_zone_page_state(page, NR_ANON_TRANSPARENT_HUGEPAGES);
        __mod_zone_page_state(page_zone(page), NR_ANON_PAGES,
                        hpage_nr_pages(page));         <---  using it.
        __page_set_anon_rmap(page, vma, address, 1);
        if (!mlocked_vma_newpage(vma, page)) {     <--- couldn't use it ?
                SetPageActive(page);
                lru_cache_add(page);
        } else
                add_page_to_unevictable_list(page);
}

I hope I have expressed it clearly enough. Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread


* Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat
  2014-05-12 15:19     ` Jianyu Zhan
@ 2014-05-12 16:05       ` Christoph Lameter
  -1 siblings, 0 replies; 22+ messages in thread
From: Christoph Lameter @ 2014-05-12 16:05 UTC (permalink / raw)
  To: Jianyu Zhan
  Cc: Andrew Morton, Hugh Dickins, Rik van Riel, Mel Gorman,
	Zhang Yanfei, Andrea Arcangeli, fabf, sasha.levin, oleg,
	Naoya Horiguchi, iamjoonsoo.kim, kirill.shutemov,
	Cyrill Gorcunov, dave.hansen, toshi.kani, paul.gortmaker,
	srivatsa.bhat, linux-mm, LKML

On Mon, 12 May 2014, Jianyu Zhan wrote:

> This means they guarantee that even if they are preempted the vm
> counter won't be modified incorrectly, because the counter is page-related
> (e.g., a new anon page added) and they exclusively hold the pte lock.

But there are multiple pte locks for numerous pages. Another process
could modify the counter after taking the pte lock for a different
page, which would cause counter corruption.
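
A hypothetical interleaving, if preemption were not disabled: two tasks
on the same CPU, each holding the pte lock of a different page, both
updating the same percpu diff *p:

	task A:	x = 1 + __this_cpu_read(*p);	/* reads 3 */
			<A is preempted; B runs on the same CPU>
	task B:	x = 1 + __this_cpu_read(*p);	/* reads 3 */
	task B:	__this_cpu_write(*p, x);	/* writes 4 */
			<A resumes>
	task A:	__this_cpu_write(*p, x);	/* writes 4: B's update is lost */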


> So, as you conclude in the other mail, __mod_zone_page_stat couldn't
> be used in mlocked_vma_newpage; what, then, qualifies the other call
> sites for using it in the same situation?

Preemption should be off in those functions because a spinlock is being
held.

^ permalink raw reply	[flat|nested] 22+ messages in thread


* Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat
  2014-05-12 15:19     ` Jianyu Zhan
@ 2014-05-12 16:07       ` Christoph Lameter
  -1 siblings, 0 replies; 22+ messages in thread
From: Christoph Lameter @ 2014-05-12 16:07 UTC (permalink / raw)
  To: Jianyu Zhan
  Cc: Andrew Morton, Hugh Dickins, Rik van Riel, Mel Gorman,
	Zhang Yanfei, Andrea Arcangeli, fabf, sasha.levin, oleg,
	Naoya Horiguchi, iamjoonsoo.kim, kirill.shutemov,
	Cyrill Gorcunov, dave.hansen, toshi.kani, paul.gortmaker,
	srivatsa.bhat, linux-mm, LKML

On Mon, 12 May 2014, Jianyu Zhan wrote:

> This means they guarantee that even if they are preempted the vm
> counter won't be modified incorrectly, because the counter is page-related
> (e.g., a new anon page added) and they exclusively hold the pte lock.

Ok. If these locations hold the pte lock then preemption is disabled and
you are ok to use __mod_zone_page_state. It has nothing to do with the page.
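
That is, on a non-RT kernel any spinlock acquisition disables preemption
on the local CPU; roughly, from the generic implementation in
include/linux/spinlock_api_smp.h:

static inline void __raw_spin_lock(raw_spinlock_t *lock)
{
	preempt_disable();	/* this is what protects the percpu RMW */
	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
	LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
}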

^ permalink raw reply	[flat|nested] 22+ messages in thread


* Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat
  2014-05-12 16:33 ` Jianyu Zhan
@ 2014-05-14  3:59   ` Hugh Dickins
  -1 siblings, 0 replies; 22+ messages in thread
From: Hugh Dickins @ 2014-05-14  3:59 UTC (permalink / raw)
  To: Jianyu Zhan
  Cc: cl, akpm, riel, aarcange, oleg, cldu, fabf, sasha.levin,
	zhangyanfei, iamjoonsoo.kim, n-horiguchi, kirill.shutemov,
	liwanp, gorcunov, minchan, dave.hansen, toshi.kani,
	paul.gortmaker, srivatsa.bhat, linux-mm, linux-kernel,
	Hugh Dickins

On Tue, 13 May 2014, Jianyu Zhan wrote:

> >> This means they guarantee that even if they are preempted the vm
> >> counter won't be modified incorrectly, because the counter is page-related
> >> (e.g., a new anon page added) and they exclusively hold the pte lock.
> >
> >But there are multiple pte locks for numerous pages. Another process
> >could modify the counter after taking the pte lock for a different
> >page, which would cause counter corruption.
> >
> >
> >> So, as you conclude in the other mail, __mod_zone_page_stat couldn't
> >> be used in mlocked_vma_newpage; what, then, qualifies the other call
> >> sites for using it in the same situation?
> 
> Thanks, now everything is clear.
> 
> I've renewed the patch, would you please review it? Thanks!
> 
> ---<8---
> mm: use the light version __mod_zone_page_state in mlocked_vma_newpage()
> 
> mlocked_vma_newpage() is called with pte lock held(a spinlock), which
> implies preemtion disabled, and the vm stat counter is not modified from
> interrupt context, so we need not use an irq-safe mod_zone_page_state() here,
> using a light-weight version __mod_zone_page_state() would be OK.
> 
> This patch also documents __mod_zone_page_state() and some of its
> callsites. The comment above __mod_zone_page_state() is from Hugh
> Dickins, and acked by Christoph.
> 
> Most credits to Hugh and Christoph for the clarification on the usage of
> the __mod_zone_page_state().
> 
> Suggested-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Hugh Dickins <hughd@google.com>

No, I didn't sign it off, just offered some text (which you already
acknowledged graciously).  Andrew, please delete that line, but...

> Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>

Acked-by: Hugh Dickins <hughd@google.com>

Yes, this is good - and even better is the version which Andrew
has slipped into mmotm, removing the duplicated comment line from
page_remove_rmap(), and reformatting the comments to fit better.

It is a little random, in that we can easily see other
__mod_zone_page_states to which the same comment could be applied;
but that's okay, please leave it as is, you've made the point enough
to help other people get it, and it would get boring to repeat it more.

Hugh

> ---
>  mm/internal.h |  7 ++++++-
>  mm/rmap.c     | 10 ++++++++++
>  mm/vmstat.c   |  4 +++-
>  3 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index 07b6736..53d439e 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -196,7 +196,12 @@ static inline int mlocked_vma_newpage(struct vm_area_struct *vma,
>  		return 0;
>  
>  	if (!TestSetPageMlocked(page)) {
> -		mod_zone_page_state(page_zone(page), NR_MLOCK,
> +		/*
> +		 * We use the irq-unsafe __mod_zone_page_stat because
> +		 * this counter is not modified from interrupt context, and the
> +		 * pte lock is held(spinlock), which implies preemtion disabled.
> +		 */
> +		__mod_zone_page_state(page_zone(page), NR_MLOCK,
>  				    hpage_nr_pages(page));
>  		count_vm_event(UNEVICTABLE_PGMLOCKED);
>  	}
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 9c3e773..2fa4375 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -986,6 +986,11 @@ void do_page_add_anon_rmap(struct page *page,
>  {
>  	int first = atomic_inc_and_test(&page->_mapcount);
>  	if (first) {
> +		/*
> +		 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
> +		 * these counters are not modified in interrupt context, and
> +		 * pte lock(a spinlock) is held, which implies preemtion disabled.
> +		 */
>  		if (PageTransHuge(page))
>  			__inc_zone_page_state(page,
>  					      NR_ANON_TRANSPARENT_HUGEPAGES);
> @@ -1077,6 +1082,11 @@ void page_remove_rmap(struct page *page)
>  	/*
>  	 * Hugepages are not counted in NR_ANON_PAGES nor NR_FILE_MAPPED
>  	 * and not charged by memcg for now.
> +	 *
> +	 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
> +	 * these counters are not modified in interrupt context, and
> +	 * these counters are not modified in interrupt context, and
> +	 * pte lock(a spinlock) is held, which implies preemtion disabled.
>  	 */
>  	if (unlikely(PageHuge(page)))
>  		goto out;
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 302dd07..704928e 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -207,7 +207,9 @@ void set_pgdat_percpu_threshold(pg_data_t *pgdat,
>  }
>  
>  /*
> - * For use when we know that interrupts are disabled.
> + * For use when we know that interrupts are disabled,
> + * or when we know that preemption is disabled and that
> + * particular counter cannot be updated from interrupt context.
>   */
>  void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
>  				int delta)
> -- 
> 2.0.0-rc1

^ permalink raw reply	[flat|nested] 22+ messages in thread


* Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat
@ 2014-05-12 17:36 ` Jianyu Zhan
  0 siblings, 0 replies; 22+ messages in thread
From: Jianyu Zhan @ 2014-05-12 17:36 UTC (permalink / raw)
  To: cl, akpm; +Cc: linux-mm, linux-kernel, nasa4836, Hugh Dickins

>Preemption is mis-spelled throughout.
>
>Otherwise
>
>Reviewed-by: Christoph Lameter <cl@linux.com>

 Oops, corrected. Thanks.

-----<8-----
From 1d0a080429542accfeac7de03e159a0bea12abba Mon Sep 17 00:00:00 2001
From: Jianyu Zhan <nasa4836@gmail.com>
Date: Tue, 13 May 2014 00:19:08 +0800
Subject: [PATCH] mm: use the light version __mod_zone_page_state in
 mlocked_vma_newpage()

mlocked_vma_newpage() is called with the pte lock held (a spinlock), which
implies preemption is disabled, and the vm stat counter is not modified from
interrupt context, so we need not use the irq-safe mod_zone_page_state()
here; the light-weight __mod_zone_page_state() is OK.

This patch also documents __mod_zone_page_state() and some of its
callsites. The comment above __mod_zone_page_state() is from Hugh
Dickins, and acked by Christoph.

Most credit goes to Hugh and Christoph for clarifying the usage of
__mod_zone_page_state().

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
---
 mm/internal.h | 7 ++++++-
 mm/rmap.c     | 9 +++++++++
 mm/vmstat.c   | 4 +++-
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 07b6736..d6a4868 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -196,7 +196,12 @@ static inline int mlocked_vma_newpage(struct vm_area_struct *vma,
 		return 0;
 
 	if (!TestSetPageMlocked(page)) {
-		mod_zone_page_state(page_zone(page), NR_MLOCK,
+		/*
+		 * We use the irq-unsafe __mod_zone_page_stat because
+		 * this counter is not modified from interrupt context, and the
+		 * pte lock is held(spinlock), which implies preemption disabled.
+		 */
+		__mod_zone_page_state(page_zone(page), NR_MLOCK,
 				    hpage_nr_pages(page));
 		count_vm_event(UNEVICTABLE_PGMLOCKED);
 	}
diff --git a/mm/rmap.c b/mm/rmap.c
index 9c3e773..fa73194 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -986,6 +986,11 @@ void do_page_add_anon_rmap(struct page *page,
 {
 	int first = atomic_inc_and_test(&page->_mapcount);
 	if (first) {
+		/*
+		 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
+		 * these counters are not modified from interrupt context, and
+		 * pte lock(a spinlock) is held, which implies preemption disabled.
+		 */
 		if (PageTransHuge(page))
 			__inc_zone_page_state(page,
 					      NR_ANON_TRANSPARENT_HUGEPAGES);
@@ -1077,6 +1082,10 @@ void page_remove_rmap(struct page *page)
 	/*
 	 * Hugepages are not counted in NR_ANON_PAGES nor NR_FILE_MAPPED
 	 * and not charged by memcg for now.
+	 *
+	 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
+	 * these counters are not modified from interrupt context, and
+	 * pte lock(a spinlock) is held, which implies preemption disabled.
 	 */
 	if (unlikely(PageHuge(page)))
 		goto out;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 302dd07..704928e 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -207,7 +207,9 @@ void set_pgdat_percpu_threshold(pg_data_t *pgdat,
 }
 
 /*
- * For use when we know that interrupts are disabled.
+ * For use when we know that interrupts are disabled,
+ * or when we know that preemption is disabled and that
+ * particular counter cannot be updated from interrupt context.
  */
 void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
 				int delta)
-- 
2.0.0-rc1


^ permalink raw reply related	[flat|nested] 22+ messages in thread


* Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat
  2014-05-12 16:33 ` Jianyu Zhan
@ 2014-05-12 16:57   ` Christoph Lameter
  -1 siblings, 0 replies; 22+ messages in thread
From: Christoph Lameter @ 2014-05-12 16:57 UTC (permalink / raw)
  To: Jianyu Zhan
  Cc: akpm, riel, aarcange, oleg, cldu, fabf, sasha.levin, zhangyanfei,
	iamjoonsoo.kim, n-horiguchi, kirill.shutemov, liwanp, gorcunov,
	minchan, dave.hansen, toshi.kani, paul.gortmaker, srivatsa.bhat,
	linux-mm, linux-kernel, Hugh Dickins

On Tue, 13 May 2014, Jianyu Zhan wrote:

> diff --git a/mm/internal.h b/mm/internal.h
> index 07b6736..53d439e 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -196,7 +196,12 @@ static inline int mlocked_vma_newpage(struct vm_area_struct *vma,
>  		return 0;
>
>  	if (!TestSetPageMlocked(page)) {
> -		mod_zone_page_state(page_zone(page), NR_MLOCK,
> +		/*
> +		 * We use the irq-unsafe __mod_zone_page_stat because
> +		 * this counter is not modified from interrupt context, and the
> +		 * pte lock is held(spinlock), which implies preemtion disabled.

Preemption is mis-spelled throughout.

Otherwise

Reviewed-by: Christoph Lameter <cl@linux.com>


^ permalink raw reply	[flat|nested] 22+ messages in thread


* Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat
@ 2014-05-12 16:33 ` Jianyu Zhan
  0 siblings, 0 replies; 22+ messages in thread
From: Jianyu Zhan @ 2014-05-12 16:33 UTC (permalink / raw)
  To: cl, akpm, riel, aarcange, oleg, cldu, fabf, nasa4836,
	sasha.levin, zhangyanfei, iamjoonsoo.kim, n-horiguchi,
	kirill.shutemov, liwanp, gorcunov, minchan, dave.hansen,
	toshi.kani, paul.gortmaker, srivatsa.bhat
  Cc: linux-mm, linux-kernel, Hugh Dickins

>> This means they guarantee that even if they are preempted the vm
>> counter won't be modified incorrectly, because the counter is page-related
>> (e.g., a new anon page added) and they exclusively hold the pte lock.
>
>But there are multiple pte locks for numerous pages. Another process
>could modify the counter after taking the pte lock for a different
>page, which would cause counter corruption.
>
>
>> So, as you conclude in the other mail, __mod_zone_page_stat couldn't
>> be used in mlocked_vma_newpage; what, then, qualifies the other call
>> sites for using it in the same situation?

Thanks, now everything is clear.

I've renewed the patch, would you please review it? Thanks!

---<8---
mm: use the light version __mod_zone_page_state in mlocked_vma_newpage()

mlocked_vma_newpage() is called with pte lock held(a spinlock), which
implies preemtion disabled, and the vm stat counter is not modified from
interrupt context, so we need not use an irq-safe mod_zone_page_state() here,
using a light-weight version __mod_zone_page_state() would be OK.

This patch also documents __mod_zone_page_state() and some of its
callsites. The comment above __mod_zone_page_state() is from Hugh
Dickins, and acked by Christoph.

Most credits to Hugh and Christoph for the clarification on the usage of
the __mod_zone_page_state().

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
---
 mm/internal.h |  7 ++++++-
 mm/rmap.c     | 10 ++++++++++
 mm/vmstat.c   |  4 +++-
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 07b6736..53d439e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -196,7 +196,12 @@ static inline int mlocked_vma_newpage(struct vm_area_struct *vma,
 		return 0;
 
 	if (!TestSetPageMlocked(page)) {
-		mod_zone_page_state(page_zone(page), NR_MLOCK,
+		/*
+		 * We use the irq-unsafe __mod_zone_page_stat because
+		 * this counter is not modified from interrupt context, and the
+		 * pte lock is held(spinlock), which implies preemtion disabled.
+		 */
+		__mod_zone_page_state(page_zone(page), NR_MLOCK,
 				    hpage_nr_pages(page));
 		count_vm_event(UNEVICTABLE_PGMLOCKED);
 	}
diff --git a/mm/rmap.c b/mm/rmap.c
index 9c3e773..2fa4375 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -986,6 +986,11 @@ void do_page_add_anon_rmap(struct page *page,
 {
 	int first = atomic_inc_and_test(&page->_mapcount);
 	if (first) {
+		/*
+		 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
+		 * these counters are not modified in interrupt context, and
+		 * pte lock(a spinlock) is held, which implies preemtion disabled.
+		 */
 		if (PageTransHuge(page))
 			__inc_zone_page_state(page,
 					      NR_ANON_TRANSPARENT_HUGEPAGES);
@@ -1077,6 +1082,11 @@ void page_remove_rmap(struct page *page)
 	/*
 	 * Hugepages are not counted in NR_ANON_PAGES nor NR_FILE_MAPPED
 	 * and not charged by memcg for now.
+	 *
+	 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
+	 * these counters are not modified in interrupt context, and
+	 * these counters are not modified in interrupt context, and
+	 * pte lock(a spinlock) is held, which implies preemtion disabled.
 	 */
 	if (unlikely(PageHuge(page)))
 		goto out;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 302dd07..704928e 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -207,7 +207,9 @@ void set_pgdat_percpu_threshold(pg_data_t *pgdat,
 }
 
 /*
- * For use when we know that interrupts are disabled.
+ * For use when we know that interrupts are disabled,
+ * or when we know that preemption is disabled and that
+ * particular counter cannot be updated from interrupt context.
  */
 void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
 				int delta)
-- 
2.0.0-rc1


^ permalink raw reply related	[flat|nested] 22+ messages in thread


* Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat
  2014-05-10  7:15 ` Jianyu Zhan
@ 2014-05-10 19:51   ` Hugh Dickins
  -1 siblings, 0 replies; 22+ messages in thread
From: Hugh Dickins @ 2014-05-10 19:51 UTC (permalink / raw)
  To: Jianyu Zhan
  Cc: akpm, mgorman, cody, liuj97, zhangyanfei, srivatsa.bhat, dave,
	iamjoonsoo.kim, n-horiguchi, kirill.shutemov, schwidefsky,
	gorcunov, riel, cl, toshi.kani, paul.gortmaker, linux-mm,
	linux-kernel

On Sat, 10 May 2014, Jianyu Zhan wrote:

> __mod_zone_page_stat() is not irq-safe, so it should be used carefully.
> And it is not appropriately documented now. This patch adds a comment for
> it, and also documents some of its call sites.
> 
> Suggested-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>

Your original __mod_zone_page_state happened to be correct;
but you have no understanding of why it was correct, so its
comment was very wrong, even after you changed the irq wording.

This series just propagates your misunderstanding further,
while providing an object lesson in how not to present a series.

Sorry, you have quickly developed an unenviable reputation for
patches which waste developers' time: please consider your
patches much more carefully before posting them.

> ---
>  mm/page_alloc.c |  2 ++
>  mm/rmap.c       |  6 ++++++
>  mm/vmstat.c     | 16 +++++++++++++++-
>  3 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5dba293..9d6f474 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -659,6 +659,8 @@ static inline int free_pages_check(struct page *page)
>   *
>   * And clear the zone's pages_scanned counter, to hold off the "all pages are
>   * pinned" detection logic.
> + *
> + * Note: this function should be used with irq disabled.

Correct, but I don't see that that needed saying.  This is a static
function which is being used as intended: just because it matched your
search for "__mod_zone_page_state" is not a reason for you to add that
comment; irq disabled is only one of its prerequisites.
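
(Its callers in mm/page_alloc.c already guarantee that themselves; the
usual shape at a call site, sketching from memory and with illustrative
variable names, is

	local_irq_save(flags);
	free_pcppages_bulk(zone, to_drain, pcp);
	local_irq_restore(flags);

with zone->lock then taken inside free_pcppages_bulk() itself.)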

>   */
>  static void free_pcppages_bulk(struct zone *zone, int count,
>  					struct per_cpu_pages *pcp)
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 9c3e773..6078a30 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -979,6 +979,8 @@ void page_add_anon_rmap(struct page *page,
>  /*
>   * Special version of the above for do_swap_page, which often runs
>   * into pages that are exclusively owned by the current process.
> + * So we could use the irq-unsafe version __{inc|mod}_zone_page_state
> + * here without others racing to change it in between.

And yet you can immediately see them being used without any test
for "exclusive" below: why is that?  Think about it.

>   * Everybody else should continue to use page_add_anon_rmap above.
>   */
>  void do_page_add_anon_rmap(struct page *page,
> @@ -1077,6 +1079,10 @@ void page_remove_rmap(struct page *page)
>  	/*
>  	 * Hugepages are not counted in NR_ANON_PAGES nor NR_FILE_MAPPED
>  	 * and not charged by memcg for now.
> +	 *
> +	 * And we are the last user of this page, so it is safe to use
> +	 * the irq-unsafe version __{mod|dec}_zone_page_state here, since
> +	 * we have no racer.

Again, the code is correct to be using the irq-unsafe version, but your
comment is doubly wrong.

We are not necessarily the last user of this page, merely the one that
just now brought the mapcount down to 0.
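
(That is, we got here through something like the test at the top of
page_remove_rmap():

	/* page still mapped by someone else? */
	if (!atomic_add_negative(-1, &page->_mapcount))
		goto out;

so other references to the page may well remain.)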

But think: what bearing would being the last user of this page have on
the safety of using __mod_zone_page_state to adjust per-zone counters?

None at all.  A page does not move from one zone to another (though its
contents might be migrated from one page to another when safe to do so).

Once upon a time, from 2.6.16 to 2.6.32, there was indeed a relevant
and helpful comment in __page_set_anon_rmap():
	/*
	 * nr_mapped state can be updated without turning off
	 * interrupts because it is not modified via interrupt.
	 */
	__inc_page_state(nr_mapped);

The comment survived the replacement of nr_mapped, but eventually
it got cleaned away completely.

It is safe to use the irq-unsafe __mod_zone_page_state on counters
which are never modified via interrupt.
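
Concretely, if such a counter were also updated from interrupt context,
the unsynchronized read-modify-write could lose an update; a minimal
sketch, not real kernel code, assuming p points at the counter's
per-cpu slot:

	/* process context: preemption off, irqs still enabled */
	x = __this_cpu_read(*p);	/* reads, say, 5 */
					/* irq fires here; its handler does
					 * the same +1 and writes back 6 */
	__this_cpu_write(*p, x + 1);	/* writes 6: handler's update lost */

That is why it matters that these counters are never modified via
interrupt.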

>  	 */
>  	if (unlikely(PageHuge(page)))
>  		goto out;
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 302dd07..778f154 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -207,7 +207,21 @@ void set_pgdat_percpu_threshold(pg_data_t *pgdat,
>  }
>  
>  /*
> - * For use when we know that interrupts are disabled.
> + * Optimized modification function.
> + *
> + * The code basically does the modification in two steps:
> + *
> + *  1. read the current counter based on the processor number
> + *  2. modify the counter and write it back.
> + *
> + * So this function should be used with the guarantee that
> + *
> + *  1. interrupts are disabled, or
> + *  2. interrupts are enabled, but no other sites would race to
> + *     modify this counter in between.
> + *
> + * Otherwise, the irq-safe version mod_zone_page_state() should
> + * be used instead.

You are right that the comment is not good enough, but I don't trust
your version either.  Since percpu variables are involved, it's important
that preemption be disabled too (see comment above __inc_zone_state).
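
To make that read-modify-write concrete, a stripped-down sketch of the
function (simplified from mm/vmstat.c, field names from memory; the
real code also folds the per-cpu differential into the zone-wide
counter once it crosses a per-cpu threshold):

	void __mod_zone_page_state(struct zone *zone,
				   enum zone_stat_item item, int delta)
	{
		struct per_cpu_pageset __percpu *pcp = zone->pageset;
		s8 __percpu *p = pcp->vm_stat_diff + item;
		long x;

		x = delta + __this_cpu_read(*p);	/* step 1: read */
		__this_cpu_write(*p, x);		/* step 2: write back */
	}

If we migrate to another cpu between the read and the write, or an
interrupt handler updates the same per-cpu slot in that window, an
update is silently overwritten.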

I'd prefer to let Christoph write the definitive version,
but my first stab at it would be:

/*
 * For use when we know that interrupts are disabled,
 * or when we know that preemption is disabled and that
 * particular counter cannot be updated from interrupt context.
 */
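
For callers who can guarantee neither condition, the irq-safe wrapper
is the one to use; it is, roughly,

	void mod_zone_page_state(struct zone *zone,
				 enum zone_stat_item item, int delta)
	{
		unsigned long flags;

		local_irq_save(flags);
		__mod_zone_page_state(zone, item, delta);
		local_irq_restore(flags);
	}

trading a little overhead for safety in any context.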

Hugh

>   */
>  void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
>  				int delta)
> -- 
> 2.0.0-rc1

* [PATCH 1/3] mm: add comment for __mod_zone_page_stat
@ 2014-05-10  7:15 ` Jianyu Zhan
  0 siblings, 0 replies; 22+ messages in thread
From: Jianyu Zhan @ 2014-05-10  7:15 UTC (permalink / raw)
  To: akpm, mgorman, cody, liuj97, zhangyanfei, srivatsa.bhat, dave,
	iamjoonsoo.kim, n-horiguchi, kirill.shutemov, schwidefsky,
	nasa4836, gorcunov, riel, cl, toshi.kani, paul.gortmaker
  Cc: linux-mm, linux-kernel

__mod_zone_page_state() is not irq-safe, so it should be used carefully.
And it is not appropriately documented now. This patch adds a comment for
it, and also documents some of its call sites.

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
---
 mm/page_alloc.c |  2 ++
 mm/rmap.c       |  6 ++++++
 mm/vmstat.c     | 16 +++++++++++++++-
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5dba293..9d6f474 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -659,6 +659,8 @@ static inline int free_pages_check(struct page *page)
  *
  * And clear the zone's pages_scanned counter, to hold off the "all pages are
  * pinned" detection logic.
+ *
+ * Note: this function should be used with irq disabled.
  */
 static void free_pcppages_bulk(struct zone *zone, int count,
 					struct per_cpu_pages *pcp)
diff --git a/mm/rmap.c b/mm/rmap.c
index 9c3e773..6078a30 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -979,6 +979,8 @@ void page_add_anon_rmap(struct page *page,
 /*
  * Special version of the above for do_swap_page, which often runs
  * into pages that are exclusively owned by the current process.
+ * So we could use the irq-unsafe version __{inc|mod}_zone_page_state
+ * here without others racing to change it in between.
  * Everybody else should continue to use page_add_anon_rmap above.
  */
 void do_page_add_anon_rmap(struct page *page,
@@ -1077,6 +1079,10 @@ void page_remove_rmap(struct page *page)
 	/*
 	 * Hugepages are not counted in NR_ANON_PAGES nor NR_FILE_MAPPED
 	 * and not charged by memcg for now.
+	 *
+	 * And we are the last user of this page, so it is safe to use
+	 * the irq-unsafe version __{mod|dec}_zone_page_state here, since
+	 * we have no racer.
 	 */
 	if (unlikely(PageHuge(page)))
 		goto out;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 302dd07..778f154 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -207,7 +207,21 @@ void set_pgdat_percpu_threshold(pg_data_t *pgdat,
 }
 
 /*
- * For use when we know that interrupts are disabled.
+ * Optimized modification function.
+ *
+ * The code basically does the modification in two steps:
+ *
+ *  1. read the current counter based on the processor number
+ *  2. modify the counter and write it back.
+ *
+ * So this function should be used with the guarantee that
+ *
+ *  1. interrupts are disabled, or
+ *  2. interrupts are enabled, but no other sites would race to
+ *     modify this counter in between.
+ *
+ * Otherwise, the irq-safe version mod_zone_page_state() should
+ * be used instead.
  */
 void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
 				int delta)
-- 
2.0.0-rc1



end of thread, other threads:[~2014-05-14  4:00 UTC | newest]

Thread overview: 22+ messages
2014-05-11 12:31 [PATCH 1/3] mm: add comment for __mod_zone_page_stat Jianyu Zhan
2014-05-11 12:31 ` Jianyu Zhan
2014-05-12 14:01 ` Christoph Lameter
2014-05-12 14:01   ` Christoph Lameter
2014-05-12 15:19   ` Jianyu Zhan
2014-05-12 15:19     ` Jianyu Zhan
2014-05-12 16:05     ` Christoph Lameter
2014-05-12 16:05       ` Christoph Lameter
2014-05-12 16:07     ` Christoph Lameter
2014-05-12 16:07       ` Christoph Lameter
  -- strict thread matches above, loose matches on Subject: below --
2014-05-12 17:36 Jianyu Zhan
2014-05-12 17:36 ` Jianyu Zhan
2014-05-12 16:33 Jianyu Zhan
2014-05-12 16:33 ` Jianyu Zhan
2014-05-12 16:57 ` Christoph Lameter
2014-05-12 16:57   ` Christoph Lameter
2014-05-14  3:59 ` Hugh Dickins
2014-05-14  3:59   ` Hugh Dickins
2014-05-10  7:15 Jianyu Zhan
2014-05-10  7:15 ` Jianyu Zhan
2014-05-10 19:51 ` Hugh Dickins
2014-05-10 19:51   ` Hugh Dickins
