All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm: sync rss-counters at the end of exit_mm()
@ 2012-04-09 20:03 ` Konstantin Khlebnikov
  0 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-09 20:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, KAMEZAWA Hiroyuki, Hugh Dickins, linux-kernel,
	Markus Trippelsdorf

On task's exit do_exit() calls sync_mm_rss() but this is not enough,
there can be page-faults after this point, for example exit_mm() ->
mm_release() -> put_user() (for processing tsk->clear_child_tid).
Thus there may be some rss-counters delta in current->rss_stat.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 kernel/exit.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/exit.c b/kernel/exit.c
index d8bd3b42..8e09dbe 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -683,6 +683,7 @@ static void exit_mm(struct task_struct * tsk)
 	enter_lazy_tlb(mm, current);
 	task_unlock(tsk);
 	mm_update_next_owner(mm);
+	sync_mm_rss(mm);
 	mmput(mm);
 }
 


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH] mm: sync rss-counters at the end of exit_mm()
@ 2012-04-09 20:03 ` Konstantin Khlebnikov
  0 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-09 20:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, KAMEZAWA Hiroyuki, Hugh Dickins, linux-kernel,
	Markus Trippelsdorf

On task's exit do_exit() calls sync_mm_rss() but this is not enough,
there can be page-faults after this point, for example exit_mm() ->
mm_release() -> put_user() (for processing tsk->clear_child_tid).
Thus there may be some rss-counters delta in current->rss_stat.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 kernel/exit.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/exit.c b/kernel/exit.c
index d8bd3b42..8e09dbe 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -683,6 +683,7 @@ static void exit_mm(struct task_struct * tsk)
 	enter_lazy_tlb(mm, current);
 	task_unlock(tsk);
 	mm_update_next_owner(mm);
+	sync_mm_rss(mm);
 	mmput(mm);
 }
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
  2012-04-09 20:03 ` Konstantin Khlebnikov
@ 2012-04-09 21:04   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 45+ messages in thread
From: KOSAKI Motohiro @ 2012-04-09 21:04 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, linux-mm, KAMEZAWA Hiroyuki, Hugh Dickins,
	linux-kernel, Markus Trippelsdorf

On Mon, Apr 9, 2012 at 4:03 PM, Konstantin Khlebnikov
<khlebnikov@openvz.org> wrote:
> On task's exit do_exit() calls sync_mm_rss() but this is not enough,
> there can be page-faults after this point, for example exit_mm() ->
> mm_release() -> put_user() (for processing tsk->clear_child_tid).
> Thus there may be some rss-counters delta in current->rss_stat.

Seems reasonable. but I have another question. Do we have any reason to
keep sync_mm_rss() in do_exit()? I havn't seen any reason that thread exiting
makes rss consistency.


>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  kernel/exit.c |    1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index d8bd3b42..8e09dbe 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -683,6 +683,7 @@ static void exit_mm(struct task_struct * tsk)
>        enter_lazy_tlb(mm, current);
>        task_unlock(tsk);
>        mm_update_next_owner(mm);
> +       sync_mm_rss(mm);
>        mmput(mm);
>  }
>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
@ 2012-04-09 21:04   ` KOSAKI Motohiro
  0 siblings, 0 replies; 45+ messages in thread
From: KOSAKI Motohiro @ 2012-04-09 21:04 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, linux-mm, KAMEZAWA Hiroyuki, Hugh Dickins,
	linux-kernel, Markus Trippelsdorf

On Mon, Apr 9, 2012 at 4:03 PM, Konstantin Khlebnikov
<khlebnikov@openvz.org> wrote:
> On task's exit do_exit() calls sync_mm_rss() but this is not enough,
> there can be page-faults after this point, for example exit_mm() ->
> mm_release() -> put_user() (for processing tsk->clear_child_tid).
> Thus there may be some rss-counters delta in current->rss_stat.

Seems reasonable. but I have another question. Do we have any reason to
keep sync_mm_rss() in do_exit()? I havn't seen any reason that thread exiting
makes rss consistency.


>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  kernel/exit.c |    1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index d8bd3b42..8e09dbe 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -683,6 +683,7 @@ static void exit_mm(struct task_struct * tsk)
>        enter_lazy_tlb(mm, current);
>        task_unlock(tsk);
>        mm_update_next_owner(mm);
> +       sync_mm_rss(mm);
>        mmput(mm);
>  }
>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
  2012-04-09 21:04   ` KOSAKI Motohiro
  (?)
@ 2012-04-09 22:03   ` Hugh Dickins
  2012-04-10  0:33       ` KAMEZAWA Hiroyuki
  2012-04-10  6:34       ` Konstantin Khlebnikov
  -1 siblings, 2 replies; 45+ messages in thread
From: Hugh Dickins @ 2012-04-09 22:03 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Konstantin Khlebnikov, Andrew Morton, linux-mm,
	KAMEZAWA Hiroyuki, linux-kernel, Markus Trippelsdorf

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2258 bytes --]

On Mon, 9 Apr 2012, KOSAKI Motohiro wrote:
> On Mon, Apr 9, 2012 at 4:03 PM, Konstantin Khlebnikov
> <khlebnikov@openvz.org> wrote:
> > On task's exit do_exit() calls sync_mm_rss() but this is not enough,
> > there can be page-faults after this point, for example exit_mm() ->
> > mm_release() -> put_user() (for processing tsk->clear_child_tid).
> > Thus there may be some rss-counters delta in current->rss_stat.
> 
> Seems reasonable.

Yes, I think Konstantin has probably caught it;
but I'd like to hear confirmation from Markus.

> but I have another question. Do we have any reason to
> keep sync_mm_rss() in do_exit()? I havn't seen any reason that thread exiting
> makes rss consistency.

IIRC it's all about the hiwater_rss/maxrss stuff: we want to sync the
maximum rss into mm->hiwater_rss before it's transferred to signal->maxrss,
and later made visible to the user though getrusage(RUSAGE_CHILDREN,) -
does your reading confirm that?

Konstantin now finds the child_tid and futex stuff can trigger faults
raising rss beyond that point, but usually it won't go higher than when
it was captured for maxrss there.

The sync_mm_rss() added by this patch (after "tsk->mm = NULL" so
*_mm_counter_fast() cannot store any more into the tsk even if there
were more faults) is solely to satisfy Konstantin's check_mm(), and
it is irritating to have that duplicated on the exit path.

I'd be happy to see the new one put under CONFIG_DEBUG_VM along with
check_mm(), once it's had a few -rcs of exposure without.

Hugh

> 
> 
> >
> > Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> > Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
> > Cc: Hugh Dickins <hughd@google.com>
> > Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> >  kernel/exit.c |    1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/kernel/exit.c b/kernel/exit.c
> > index d8bd3b42..8e09dbe 100644
> > --- a/kernel/exit.c
> > +++ b/kernel/exit.c
> > @@ -683,6 +683,7 @@ static void exit_mm(struct task_struct * tsk)
> >        enter_lazy_tlb(mm, current);
> >        task_unlock(tsk);
> >        mm_update_next_owner(mm);
> > +       sync_mm_rss(mm);
> >        mmput(mm);
> >  }
> >

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
  2012-04-09 22:03   ` Hugh Dickins
@ 2012-04-10  0:33       ` KAMEZAWA Hiroyuki
  2012-04-10  6:34       ` Konstantin Khlebnikov
  1 sibling, 0 replies; 45+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-10  0:33 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: KOSAKI Motohiro, Konstantin Khlebnikov, Andrew Morton, linux-mm,
	linux-kernel, Markus Trippelsdorf

(2012/04/10 7:03), Hugh Dickins wrote:

> On Mon, 9 Apr 2012, KOSAKI Motohiro wrote:
>> On Mon, Apr 9, 2012 at 4:03 PM, Konstantin Khlebnikov
>> <khlebnikov@openvz.org> wrote:
>>> On task's exit do_exit() calls sync_mm_rss() but this is not enough,
>>> there can be page-faults after this point, for example exit_mm() ->
>>> mm_release() -> put_user() (for processing tsk->clear_child_tid).
>>> Thus there may be some rss-counters delta in current->rss_stat.
>>
>> Seems reasonable.
> 
> Yes, I think Konstantin has probably caught it;
> but I'd like to hear confirmation from Markus.
> 
>> but I have another question. Do we have any reason to
>> keep sync_mm_rss() in do_exit()? I havn't seen any reason that thread exiting
>> makes rss consistency.
> 
> IIRC it's all about the hiwater_rss/maxrss stuff: we want to sync the
> maximum rss into mm->hiwater_rss before it's transferred to signal->maxrss,
> and later made visible to the user though getrusage(RUSAGE_CHILDREN,) -
> does your reading confirm that?
> 


IIRC, sync_mm_rss() in do_exit() is for synchronizing rsscounter for taskacct.
mm->maxrss is sent to listener by xacct_add_tsk(). It's needed to be
synchronized before taskstat_exit()..

Hm, but, exit_mm() is placed after taskstat_exit().

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
@ 2012-04-10  0:33       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 45+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-10  0:33 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: KOSAKI Motohiro, Konstantin Khlebnikov, Andrew Morton, linux-mm,
	linux-kernel, Markus Trippelsdorf

(2012/04/10 7:03), Hugh Dickins wrote:

> On Mon, 9 Apr 2012, KOSAKI Motohiro wrote:
>> On Mon, Apr 9, 2012 at 4:03 PM, Konstantin Khlebnikov
>> <khlebnikov@openvz.org> wrote:
>>> On task's exit do_exit() calls sync_mm_rss() but this is not enough,
>>> there can be page-faults after this point, for example exit_mm() ->
>>> mm_release() -> put_user() (for processing tsk->clear_child_tid).
>>> Thus there may be some rss-counters delta in current->rss_stat.
>>
>> Seems reasonable.
> 
> Yes, I think Konstantin has probably caught it;
> but I'd like to hear confirmation from Markus.
> 
>> but I have another question. Do we have any reason to
>> keep sync_mm_rss() in do_exit()? I havn't seen any reason that thread exiting
>> makes rss consistency.
> 
> IIRC it's all about the hiwater_rss/maxrss stuff: we want to sync the
> maximum rss into mm->hiwater_rss before it's transferred to signal->maxrss,
> and later made visible to the user though getrusage(RUSAGE_CHILDREN,) -
> does your reading confirm that?
> 


IIRC, sync_mm_rss() in do_exit() is for synchronizing rsscounter for taskacct.
mm->maxrss is sent to listener by xacct_add_tsk(). It's needed to be
synchronized before taskstat_exit()..

Hm, but, exit_mm() is placed after taskstat_exit().

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
  2012-04-09 20:03 ` Konstantin Khlebnikov
@ 2012-04-10  0:35   ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 45+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-10  0:35 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, linux-mm, Hugh Dickins, linux-kernel, Markus Trippelsdorf

(2012/04/10 5:03), Konstantin Khlebnikov wrote:

> On task's exit do_exit() calls sync_mm_rss() but this is not enough,
> there can be page-faults after this point, for example exit_mm() ->
> mm_release() -> put_user() (for processing tsk->clear_child_tid).
> Thus there may be some rss-counters delta in current->rss_stat.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Does this fix recent issue reported ?

 




^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
@ 2012-04-10  0:35   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 45+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-10  0:35 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, linux-mm, Hugh Dickins, linux-kernel, Markus Trippelsdorf

(2012/04/10 5:03), Konstantin Khlebnikov wrote:

> On task's exit do_exit() calls sync_mm_rss() but this is not enough,
> there can be page-faults after this point, for example exit_mm() ->
> mm_release() -> put_user() (for processing tsk->clear_child_tid).
> Thus there may be some rss-counters delta in current->rss_stat.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Does this fix recent issue reported ?

 



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
  2012-04-10  0:35   ` KAMEZAWA Hiroyuki
@ 2012-04-10  5:43     ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-10  5:43 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, linux-mm, Hugh Dickins, linux-kernel, Markus Trippelsdorf

KAMEZAWA Hiroyuki wrote:
> (2012/04/10 5:03), Konstantin Khlebnikov wrote:
>
>> On task's exit do_exit() calls sync_mm_rss() but this is not enough,
>> there can be page-faults after this point, for example exit_mm() ->
>> mm_release() ->  put_user() (for processing tsk->clear_child_tid).
>> Thus there may be some rss-counters delta in current->rss_stat.

I had to mention it in comment:

This should fix warnings:
BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1

>>
>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>> Reported-by: Markus Trippelsdorf<markus@trippelsdorf.de>
>> Cc: Hugh Dickins<hughd@google.com>
>> Cc: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>
> Reviewed-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>
> Does this fix recent issue reported ?
>

I hope so.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
@ 2012-04-10  5:43     ` Konstantin Khlebnikov
  0 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-10  5:43 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, linux-mm, Hugh Dickins, linux-kernel, Markus Trippelsdorf

KAMEZAWA Hiroyuki wrote:
> (2012/04/10 5:03), Konstantin Khlebnikov wrote:
>
>> On task's exit do_exit() calls sync_mm_rss() but this is not enough,
>> there can be page-faults after this point, for example exit_mm() ->
>> mm_release() ->  put_user() (for processing tsk->clear_child_tid).
>> Thus there may be some rss-counters delta in current->rss_stat.

I had to mention it in comment:

This should fix warnings:
BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1

>>
>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>> Reported-by: Markus Trippelsdorf<markus@trippelsdorf.de>
>> Cc: Hugh Dickins<hughd@google.com>
>> Cc: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>
> Reviewed-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>
> Does this fix recent issue reported ?
>

I hope so.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
  2012-04-09 22:03   ` Hugh Dickins
@ 2012-04-10  6:34       ` Konstantin Khlebnikov
  2012-04-10  6:34       ` Konstantin Khlebnikov
  1 sibling, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-10  6:34 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: KOSAKI Motohiro, Andrew Morton, linux-mm, KAMEZAWA Hiroyuki,
	linux-kernel, Markus Trippelsdorf

Hugh Dickins wrote:
> On Mon, 9 Apr 2012, KOSAKI Motohiro wrote:
>> On Mon, Apr 9, 2012 at 4:03 PM, Konstantin Khlebnikov
>> <khlebnikov@openvz.org>  wrote:
>>> On task's exit do_exit() calls sync_mm_rss() but this is not enough,
>>> there can be page-faults after this point, for example exit_mm() ->
>>> mm_release() ->  put_user() (for processing tsk->clear_child_tid).
>>> Thus there may be some rss-counters delta in current->rss_stat.
>>
>> Seems reasonable.
>
> Yes, I think Konstantin has probably caught it;
> but I'd like to hear confirmation from Markus.

There is another bug in exec_mmap()

--- a/fs/exec.c
+++ b/fs/exec.c
@@ -823,8 +823,8 @@ static int exec_mmap(struct mm_struct *mm)
         /* Notify parent that we're no longer interested in the old VM */
         tsk = current;
         old_mm = current->mm;
-       sync_mm_rss(old_mm);
         mm_release(tsk, old_mm);
+       sync_mm_rss(old_mm);

         if (old_mm) {
                 /*

>
>> but I have another question. Do we have any reason to
>> keep sync_mm_rss() in do_exit()? I havn't seen any reason that thread exiting
>> makes rss consistency.
>
> IIRC it's all about the hiwater_rss/maxrss stuff: we want to sync the
> maximum rss into mm->hiwater_rss before it's transferred to signal->maxrss,
> and later made visible to the user though getrusage(RUSAGE_CHILDREN,) -
> does your reading confirm that?
>
> Konstantin now finds the child_tid and futex stuff can trigger faults
> raising rss beyond that point, but usually it won't go higher than when
> it was captured for maxrss there.
>
> The sync_mm_rss() added by this patch (after "tsk->mm = NULL" so
> *_mm_counter_fast() cannot store any more into the tsk even if there
> were more faults) is solely to satisfy Konstantin's check_mm(), and
> it is irritating to have that duplicated on the exit path.

It was quick fix after the midnight. =) Now I think we can move mm_release()
from exit_mm() to do_exit(), and place it before sync_mm_rss(). Other stuff
there shouldn't trigger page-faults. Thus here will be only one sync_mm_rss():
at the end of mm_release()

>
> I'd be happy to see the new one put under CONFIG_DEBUG_VM along with
> check_mm(), once it's had a few -rcs of exposure without.
>
> Hugh
>
>>
>>
>>>
>>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>>> Reported-by: Markus Trippelsdorf<markus@trippelsdorf.de>
>>> Cc: Hugh Dickins<hughd@google.com>
>>> Cc: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>>> ---
>>>   kernel/exit.c |    1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/kernel/exit.c b/kernel/exit.c
>>> index d8bd3b42..8e09dbe 100644
>>> --- a/kernel/exit.c
>>> +++ b/kernel/exit.c
>>> @@ -683,6 +683,7 @@ static void exit_mm(struct task_struct * tsk)
>>>         enter_lazy_tlb(mm, current);
>>>         task_unlock(tsk);
>>>         mm_update_next_owner(mm);
>>> +       sync_mm_rss(mm);
>>>         mmput(mm);
>>>   }
>> >


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
@ 2012-04-10  6:34       ` Konstantin Khlebnikov
  0 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-10  6:34 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: KOSAKI Motohiro, Andrew Morton, linux-mm, KAMEZAWA Hiroyuki,
	linux-kernel, Markus Trippelsdorf

Hugh Dickins wrote:
> On Mon, 9 Apr 2012, KOSAKI Motohiro wrote:
>> On Mon, Apr 9, 2012 at 4:03 PM, Konstantin Khlebnikov
>> <khlebnikov@openvz.org>  wrote:
>>> On task's exit do_exit() calls sync_mm_rss() but this is not enough,
>>> there can be page-faults after this point, for example exit_mm() ->
>>> mm_release() ->  put_user() (for processing tsk->clear_child_tid).
>>> Thus there may be some rss-counters delta in current->rss_stat.
>>
>> Seems reasonable.
>
> Yes, I think Konstantin has probably caught it;
> but I'd like to hear confirmation from Markus.

There is another bug in exec_mmap()

--- a/fs/exec.c
+++ b/fs/exec.c
@@ -823,8 +823,8 @@ static int exec_mmap(struct mm_struct *mm)
         /* Notify parent that we're no longer interested in the old VM */
         tsk = current;
         old_mm = current->mm;
-       sync_mm_rss(old_mm);
         mm_release(tsk, old_mm);
+       sync_mm_rss(old_mm);

         if (old_mm) {
                 /*

>
>> but I have another question. Do we have any reason to
>> keep sync_mm_rss() in do_exit()? I havn't seen any reason that thread exiting
>> makes rss consistency.
>
> IIRC it's all about the hiwater_rss/maxrss stuff: we want to sync the
> maximum rss into mm->hiwater_rss before it's transferred to signal->maxrss,
> and later made visible to the user though getrusage(RUSAGE_CHILDREN,) -
> does your reading confirm that?
>
> Konstantin now finds the child_tid and futex stuff can trigger faults
> raising rss beyond that point, but usually it won't go higher than when
> it was captured for maxrss there.
>
> The sync_mm_rss() added by this patch (after "tsk->mm = NULL" so
> *_mm_counter_fast() cannot store any more into the tsk even if there
> were more faults) is solely to satisfy Konstantin's check_mm(), and
> it is irritating to have that duplicated on the exit path.

It was quick fix after the midnight. =) Now I think we can move mm_release()
from exit_mm() to do_exit(), and place it before sync_mm_rss(). Other stuff
there shouldn't trigger page-faults. Thus here will be only one sync_mm_rss():
at the end of mm_release()

>
> I'd be happy to see the new one put under CONFIG_DEBUG_VM along with
> check_mm(), once it's had a few -rcs of exposure without.
>
> Hugh
>
>>
>>
>>>
>>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>>> Reported-by: Markus Trippelsdorf<markus@trippelsdorf.de>
>>> Cc: Hugh Dickins<hughd@google.com>
>>> Cc: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>>> ---
>>>   kernel/exit.c |    1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/kernel/exit.c b/kernel/exit.c
>>> index d8bd3b42..8e09dbe 100644
>>> --- a/kernel/exit.c
>>> +++ b/kernel/exit.c
>>> @@ -683,6 +683,7 @@ static void exit_mm(struct task_struct * tsk)
>>>         enter_lazy_tlb(mm, current);
>>>         task_unlock(tsk);
>>>         mm_update_next_owner(mm);
>>> +       sync_mm_rss(mm);
>>>         mmput(mm);
>>>   }
>> >

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
  2012-04-10  6:34       ` Konstantin Khlebnikov
@ 2012-04-10 16:04         ` Markus Trippelsdorf
  -1 siblings, 0 replies; 45+ messages in thread
From: Markus Trippelsdorf @ 2012-04-10 16:04 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, KOSAKI Motohiro, Andrew Morton, linux-mm,
	KAMEZAWA Hiroyuki, linux-kernel

On 2012.04.10 at 10:34 +0400, Konstantin Khlebnikov wrote:
> Hugh Dickins wrote:
> > On Mon, 9 Apr 2012, KOSAKI Motohiro wrote:
> >> On Mon, Apr 9, 2012 at 4:03 PM, Konstantin Khlebnikov
> >> <khlebnikov@openvz.org>  wrote:
> >>> On task's exit do_exit() calls sync_mm_rss() but this is not enough,
> >>> there can be page-faults after this point, for example exit_mm() ->
> >>> mm_release() ->  put_user() (for processing tsk->clear_child_tid).
> >>> Thus there may be some rss-counters delta in current->rss_stat.
> >>
> >> Seems reasonable.
> >
> > Yes, I think Konstantin has probably caught it;
> > but I'd like to hear confirmation from Markus.
> 
> There is another bug in exec_mmap()
> 
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -823,8 +823,8 @@ static int exec_mmap(struct mm_struct *mm)
>          /* Notify parent that we're no longer interested in the old VM */
>          tsk = current;
>          old_mm = current->mm;
> -       sync_mm_rss(old_mm);
>          mm_release(tsk, old_mm);
> +       sync_mm_rss(old_mm);
> 
>          if (old_mm) {
>                  /*

FWIW with both patches applied I cannot reproduce the issue anymore.

Thanks.

-- 
Markus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
@ 2012-04-10 16:04         ` Markus Trippelsdorf
  0 siblings, 0 replies; 45+ messages in thread
From: Markus Trippelsdorf @ 2012-04-10 16:04 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, KOSAKI Motohiro, Andrew Morton, linux-mm,
	KAMEZAWA Hiroyuki, linux-kernel

On 2012.04.10 at 10:34 +0400, Konstantin Khlebnikov wrote:
> Hugh Dickins wrote:
> > On Mon, 9 Apr 2012, KOSAKI Motohiro wrote:
> >> On Mon, Apr 9, 2012 at 4:03 PM, Konstantin Khlebnikov
> >> <khlebnikov@openvz.org>  wrote:
> >>> On task's exit do_exit() calls sync_mm_rss() but this is not enough,
> >>> there can be page-faults after this point, for example exit_mm() ->
> >>> mm_release() ->  put_user() (for processing tsk->clear_child_tid).
> >>> Thus there may be some rss-counters delta in current->rss_stat.
> >>
> >> Seems reasonable.
> >
> > Yes, I think Konstantin has probably caught it;
> > but I'd like to hear confirmation from Markus.
> 
> There is another bug in exec_mmap()
> 
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -823,8 +823,8 @@ static int exec_mmap(struct mm_struct *mm)
>          /* Notify parent that we're no longer interested in the old VM */
>          tsk = current;
>          old_mm = current->mm;
> -       sync_mm_rss(old_mm);
>          mm_release(tsk, old_mm);
> +       sync_mm_rss(old_mm);
> 
>          if (old_mm) {
>                  /*

FWIW with both patches applied I cannot reproduce the issue anymore.

Thanks.

-- 
Markus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v2] mm: correctly synchronize rss-counters at exit/exec
  2012-04-09 20:03 ` Konstantin Khlebnikov
@ 2012-04-10 17:07   ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-10 17:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, linux-kernel, Oleg Nesterov, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

mm->rss_stat counters have per-task delta: task->rss_stat, before changing
task->mm pointer kernel must flush this delta with help of sync_mm_rss().

do_exit() already calls sync_mm_rss() to flush rss-counters before commiting
rss-statistics into task->signal->maxrss, taskstats, audit and other stuff.
Unfortunately kernel do this before calling mm_relese(), which can call put_user()
for processing task->clear_child_tid. So at this point we can trigger page-faults
and task->rss_stat becomes non-zero again, as result mm->rss_stat becomes
inconsistent and check_mm() will print something like this:

| BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
| BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1

This patch moves sync_mm_rss() into mm_release(), and moves mm_release() out of
do_exit() and calls it earlier. After mm_release() there should be no page-faults.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
---
 fs/exec.c     |    1 -
 kernel/exit.c |    9 +++++----
 kernel/fork.c |    8 ++++++++
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index b1fd202..5be1d97 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -823,7 +823,6 @@ static int exec_mmap(struct mm_struct *mm)
 	/* Notify parent that we're no longer interested in the old VM */
 	tsk = current;
 	old_mm = current->mm;
-	sync_mm_rss(old_mm);
 	mm_release(tsk, old_mm);
 
 	if (old_mm) {
diff --git a/kernel/exit.c b/kernel/exit.c
index d8bd3b42..eb12719 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -423,6 +423,7 @@ void daemonize(const char *name, ...)
 	 * user space pages.  We don't need them, and if we didn't close them
 	 * they would be locked into memory.
 	 */
+	mm_release(current, current->mm);
 	exit_mm(current);
 	/*
 	 * We don't want to get frozen, in case system-wide hibernation
@@ -640,7 +641,6 @@ static void exit_mm(struct task_struct * tsk)
 	struct mm_struct *mm = tsk->mm;
 	struct core_state *core_state;
 
-	mm_release(tsk, mm);
 	if (!mm)
 		return;
 	/*
@@ -959,9 +959,10 @@ void do_exit(long code)
 				preempt_count());
 
 	acct_update_integrals(tsk);
-	/* sync mm's RSS info before statistics gathering */
-	if (tsk->mm)
-		sync_mm_rss(tsk->mm);
+
+	/* Release mm and sync mm's RSS info before statistics gathering */
+	mm_release(tsk, tsk->mm);
+
 	group_dead = atomic_dec_and_test(&tsk->signal->live);
 	if (group_dead) {
 		hrtimer_cancel(&tsk->signal->real_timer);
diff --git a/kernel/fork.c b/kernel/fork.c
index 54662ed..326bb5b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -751,6 +751,14 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
 		}
 		tsk->clear_child_tid = NULL;
 	}
+
+	/*
+	 * Final rss-counter synchronization. After this point must be
+	 * no page-faults into this mm from current context, otherwise
+	 * mm->rss_stat will be inconsistent.
+	 */
+	if (mm)
+		sync_mm_rss(mm);
 }
 
 /*


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2] mm: correctly synchronize rss-counters at exit/exec
@ 2012-04-10 17:07   ` Konstantin Khlebnikov
  0 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-10 17:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, linux-kernel, Oleg Nesterov, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

mm->rss_stat counters have per-task delta: task->rss_stat, before changing
task->mm pointer kernel must flush this delta with help of sync_mm_rss().

do_exit() already calls sync_mm_rss() to flush rss-counters before commiting
rss-statistics into task->signal->maxrss, taskstats, audit and other stuff.
Unfortunately kernel do this before calling mm_relese(), which can call put_user()
for processing task->clear_child_tid. So at this point we can trigger page-faults
and task->rss_stat becomes non-zero again, as result mm->rss_stat becomes
inconsistent and check_mm() will print something like this:

| BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
| BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1

This patch moves sync_mm_rss() into mm_release(), and moves mm_release() out of
do_exit() and calls it earlier. After mm_release() there should be no page-faults.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
---
 fs/exec.c     |    1 -
 kernel/exit.c |    9 +++++----
 kernel/fork.c |    8 ++++++++
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index b1fd202..5be1d97 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -823,7 +823,6 @@ static int exec_mmap(struct mm_struct *mm)
 	/* Notify parent that we're no longer interested in the old VM */
 	tsk = current;
 	old_mm = current->mm;
-	sync_mm_rss(old_mm);
 	mm_release(tsk, old_mm);
 
 	if (old_mm) {
diff --git a/kernel/exit.c b/kernel/exit.c
index d8bd3b42..eb12719 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -423,6 +423,7 @@ void daemonize(const char *name, ...)
 	 * user space pages.  We don't need them, and if we didn't close them
 	 * they would be locked into memory.
 	 */
+	mm_release(current, current->mm);
 	exit_mm(current);
 	/*
 	 * We don't want to get frozen, in case system-wide hibernation
@@ -640,7 +641,6 @@ static void exit_mm(struct task_struct * tsk)
 	struct mm_struct *mm = tsk->mm;
 	struct core_state *core_state;
 
-	mm_release(tsk, mm);
 	if (!mm)
 		return;
 	/*
@@ -959,9 +959,10 @@ void do_exit(long code)
 				preempt_count());
 
 	acct_update_integrals(tsk);
-	/* sync mm's RSS info before statistics gathering */
-	if (tsk->mm)
-		sync_mm_rss(tsk->mm);
+
+	/* Release mm and sync mm's RSS info before statistics gathering */
+	mm_release(tsk, tsk->mm);
+
 	group_dead = atomic_dec_and_test(&tsk->signal->live);
 	if (group_dead) {
 		hrtimer_cancel(&tsk->signal->real_timer);
diff --git a/kernel/fork.c b/kernel/fork.c
index 54662ed..326bb5b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -751,6 +751,14 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
 		}
 		tsk->clear_child_tid = NULL;
 	}
+
+	/*
+	 * Final rss-counter synchronization. After this point must be
+	 * no page-faults into this mm from current context, otherwise
+	 * mm->rss_stat will be inconsistent.
+	 */
+	if (mm)
+		sync_mm_rss(mm);
 }
 
 /*

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v2] mm: correctly synchronize rss-counters at exit/exec
  2012-04-10 17:07   ` Konstantin Khlebnikov
@ 2012-04-10 18:43     ` Andrew Morton
  -1 siblings, 0 replies; 45+ messages in thread
From: Andrew Morton @ 2012-04-10 18:43 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, Oleg Nesterov, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

On Tue, 10 Apr 2012 21:07:32 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> mm->rss_stat counters have per-task delta: task->rss_stat, before changing
> task->mm pointer kernel must flush this delta with help of sync_mm_rss().
> 
> do_exit() already calls sync_mm_rss() to flush rss-counters before commiting
> rss-statistics into task->signal->maxrss, taskstats, audit and other stuff.
> Unfortunately kernel do this before calling mm_relese(), which can call put_user()
> for processing task->clear_child_tid. So at this point we can trigger page-faults
> and task->rss_stat becomes non-zero again, as result mm->rss_stat becomes
> inconsistent and check_mm() will print something like this:
> 
> | BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
> | BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1
> 
> This patch moves sync_mm_rss() into mm_release(), and moves mm_release() out of
> do_exit() and calls it earlier. After mm_release() there should be no page-faults.
> 
> ...
> 
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -751,6 +751,14 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>  		}
>  		tsk->clear_child_tid = NULL;
>  	}
> +
> +	/*
> +	 * Final rss-counter synchronization. After this point must be
> +	 * no page-faults into this mm from current context, otherwise
> +	 * mm->rss_stat will be inconsistent.
> +	 */
> +	if (mm)
> +		sync_mm_rss(mm);
>  }
>  

Well that's scary.  AFACIT `mm' can indeed be NULL here, when a kernel
thread calls do_exit().  No implementation of deactivate_mm() actually
uses its `mm' arg and I guess that kernel threads never set
tsk->clear_child_tid.  Whee.


Do we think we should backport this into -stable kernels?  How hard is
it to make that warning come out?


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2] mm: correctly synchronize rss-counters at exit/exec
@ 2012-04-10 18:43     ` Andrew Morton
  0 siblings, 0 replies; 45+ messages in thread
From: Andrew Morton @ 2012-04-10 18:43 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, linux-kernel, Oleg Nesterov, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

On Tue, 10 Apr 2012 21:07:32 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> mm->rss_stat counters have per-task delta: task->rss_stat, before changing
> task->mm pointer kernel must flush this delta with help of sync_mm_rss().
> 
> do_exit() already calls sync_mm_rss() to flush rss-counters before commiting
> rss-statistics into task->signal->maxrss, taskstats, audit and other stuff.
> Unfortunately kernel do this before calling mm_relese(), which can call put_user()
> for processing task->clear_child_tid. So at this point we can trigger page-faults
> and task->rss_stat becomes non-zero again, as result mm->rss_stat becomes
> inconsistent and check_mm() will print something like this:
> 
> | BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
> | BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1
> 
> This patch moves sync_mm_rss() into mm_release(), and moves mm_release() out of
> do_exit() and calls it earlier. After mm_release() there should be no page-faults.
> 
> ...
> 
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -751,6 +751,14 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>  		}
>  		tsk->clear_child_tid = NULL;
>  	}
> +
> +	/*
> +	 * Final rss-counter synchronization. After this point must be
> +	 * no page-faults into this mm from current context, otherwise
> +	 * mm->rss_stat will be inconsistent.
> +	 */
> +	if (mm)
> +		sync_mm_rss(mm);
>  }
>  

Well that's scary.  AFACIT `mm' can indeed be NULL here, when a kernel
thread calls do_exit().  No implementation of deactivate_mm() actually
uses its `mm' arg and I guess that kernel threads never set
tsk->clear_child_tid.  Whee.


Do we think we should backport this into -stable kernels?  How hard is
it to make that warning come out?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2] mm: correctly synchronize rss-counters at exit/exec
  2012-04-10 17:07   ` Konstantin Khlebnikov
@ 2012-04-10 19:10     ` Oleg Nesterov
  -1 siblings, 0 replies; 45+ messages in thread
From: Oleg Nesterov @ 2012-04-10 19:10 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Hugh Dickins, linux-kernel, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

On 04/10, Konstantin Khlebnikov wrote:
>
> This patch moves sync_mm_rss() into mm_release(), and moves mm_release() out of
> do_exit() and calls it earlier. After mm_release() there should be no page-faults.

Can't prove, but I feel there should be a simpler fix...

Anyway, this patch is not exactly correct.

> @@ -959,9 +959,10 @@ void do_exit(long code)
>  				preempt_count());
>
>  	acct_update_integrals(tsk);
> -	/* sync mm's RSS info before statistics gathering */
> -	if (tsk->mm)
> -		sync_mm_rss(tsk->mm);
> +
> +	/* Release mm and sync mm's RSS info before statistics gathering */
> +	mm_release(tsk, tsk->mm);

This breaks kthread_stop() at least.

The exiting kthread shouldn't do complete_vfork_done() until it
sets ->exit_code.

Oleg.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2] mm: correctly synchronize rss-counters at exit/exec
@ 2012-04-10 19:10     ` Oleg Nesterov
  0 siblings, 0 replies; 45+ messages in thread
From: Oleg Nesterov @ 2012-04-10 19:10 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Hugh Dickins, linux-kernel, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

On 04/10, Konstantin Khlebnikov wrote:
>
> This patch moves sync_mm_rss() into mm_release(), and moves mm_release() out of
> do_exit() and calls it earlier. After mm_release() there should be no page-faults.

Can't prove, but I feel there should be a simpler fix...

Anyway, this patch is not exactly correct.

> @@ -959,9 +959,10 @@ void do_exit(long code)
>  				preempt_count());
>
>  	acct_update_integrals(tsk);
> -	/* sync mm's RSS info before statistics gathering */
> -	if (tsk->mm)
> -		sync_mm_rss(tsk->mm);
> +
> +	/* Release mm and sync mm's RSS info before statistics gathering */
> +	mm_release(tsk, tsk->mm);

This breaks kthread_stop() at least.

The exiting kthread shouldn't do complete_vfork_done() until it
sets ->exit_code.

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2] mm: correctly synchronize rss-counters at exit/exec
  2012-04-10 18:43     ` Andrew Morton
@ 2012-04-10 19:52       ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-10 19:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, linux-kernel, Oleg Nesterov, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

Andrew Morton wrote:
> On Tue, 10 Apr 2012 21:07:32 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
>
>> mm->rss_stat counters have per-task delta: task->rss_stat, before changing
>> task->mm pointer kernel must flush this delta with help of sync_mm_rss().
>>
>> do_exit() already calls sync_mm_rss() to flush rss-counters before commiting
>> rss-statistics into task->signal->maxrss, taskstats, audit and other stuff.
>> Unfortunately kernel do this before calling mm_relese(), which can call put_user()
>> for processing task->clear_child_tid. So at this point we can trigger page-faults
>> and task->rss_stat becomes non-zero again, as result mm->rss_stat becomes
>> inconsistent and check_mm() will print something like this:
>>
>> | BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
>> | BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1
>>
>> This patch moves sync_mm_rss() into mm_release(), and moves mm_release() out of
>> do_exit() and calls it earlier. After mm_release() there should be no page-faults.
>>
>> ...
>>
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -751,6 +751,14 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>>   		}
>>   		tsk->clear_child_tid = NULL;
>>   	}
>> +
>> +	/*
>> +	 * Final rss-counter synchronization. After this point must be
>> +	 * no page-faults into this mm from current context, otherwise
>> +	 * mm->rss_stat will be inconsistent.
>> +	 */
>> +	if (mm)
>> +		sync_mm_rss(mm);
>>   }
>>
>
> Well that's scary.  AFACIT `mm' can indeed be NULL here, when a kernel
> thread calls do_exit().  No implementation of deactivate_mm() actually
> uses its `mm' arg and I guess that kernel threads never set
> tsk->clear_child_tid.  Whee.

But it works as designed =)

>
>
> Do we think we should backport this into -stable kernels?  How hard is
> it to make that warning come out?
>

Warning was introduced in v3.3-3720-gc3f0327, so no releases with them.
And I think older kernels can live fine with slightly racy rss counters.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2] mm: correctly synchronize rss-counters at exit/exec
@ 2012-04-10 19:52       ` Konstantin Khlebnikov
  0 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-10 19:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, linux-kernel, Oleg Nesterov, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

Andrew Morton wrote:
> On Tue, 10 Apr 2012 21:07:32 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
>
>> mm->rss_stat counters have per-task delta: task->rss_stat, before changing
>> task->mm pointer kernel must flush this delta with help of sync_mm_rss().
>>
>> do_exit() already calls sync_mm_rss() to flush rss-counters before commiting
>> rss-statistics into task->signal->maxrss, taskstats, audit and other stuff.
>> Unfortunately kernel do this before calling mm_relese(), which can call put_user()
>> for processing task->clear_child_tid. So at this point we can trigger page-faults
>> and task->rss_stat becomes non-zero again, as result mm->rss_stat becomes
>> inconsistent and check_mm() will print something like this:
>>
>> | BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
>> | BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1
>>
>> This patch moves sync_mm_rss() into mm_release(), and moves mm_release() out of
>> do_exit() and calls it earlier. After mm_release() there should be no page-faults.
>>
>> ...
>>
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -751,6 +751,14 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>>   		}
>>   		tsk->clear_child_tid = NULL;
>>   	}
>> +
>> +	/*
>> +	 * Final rss-counter synchronization. After this point must be
>> +	 * no page-faults into this mm from current context, otherwise
>> +	 * mm->rss_stat will be inconsistent.
>> +	 */
>> +	if (mm)
>> +		sync_mm_rss(mm);
>>   }
>>
>
> Well that's scary.  AFACIT `mm' can indeed be NULL here, when a kernel
> thread calls do_exit().  No implementation of deactivate_mm() actually
> uses its `mm' arg and I guess that kernel threads never set
> tsk->clear_child_tid.  Whee.

But it works as designed =)

>
>
> Do we think we should backport this into -stable kernels?  How hard is
> it to make that warning come out?
>

Warning was introduced in v3.3-3720-gc3f0327, so no releases with them.
And I think older kernels can live fine with slightly racy rss counters.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2] mm: correctly synchronize rss-counters at exit/exec
  2012-04-10 19:10     ` Oleg Nesterov
@ 2012-04-10 20:09       ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-10 20:09 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Hugh Dickins, linux-kernel, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

Oleg Nesterov wrote:
> On 04/10, Konstantin Khlebnikov wrote:
>>
>> This patch moves sync_mm_rss() into mm_release(), and moves mm_release() out of
>> do_exit() and calls it earlier. After mm_release() there should be no page-faults.
>
> Can't prove, but I feel there should be a simpler fix...
>
> Anyway, this patch is not exactly correct.
>
>> @@ -959,9 +959,10 @@ void do_exit(long code)
>>   				preempt_count());
>>
>>   	acct_update_integrals(tsk);
>> -	/* sync mm's RSS info before statistics gathering */
>> -	if (tsk->mm)
>> -		sync_mm_rss(tsk->mm);
>> +
>> +	/* Release mm and sync mm's RSS info before statistics gathering */
>> +	mm_release(tsk, tsk->mm);
>
> This breaks kthread_stop() at least.
>
> The exiting kthread shouldn't do complete_vfork_done() until it
> sets ->exit_code.

Ouch, I was afraid something like that.

>
> Oleg.
>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2] mm: correctly synchronize rss-counters at exit/exec
@ 2012-04-10 20:09       ` Konstantin Khlebnikov
  0 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-10 20:09 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Hugh Dickins, linux-kernel, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

Oleg Nesterov wrote:
> On 04/10, Konstantin Khlebnikov wrote:
>>
>> This patch moves sync_mm_rss() into mm_release(), and moves mm_release() out of
>> do_exit() and calls it earlier. After mm_release() there should be no page-faults.
>
> Can't prove, but I feel there should be a simpler fix...
>
> Anyway, this patch is not exactly correct.
>
>> @@ -959,9 +959,10 @@ void do_exit(long code)
>>   				preempt_count());
>>
>>   	acct_update_integrals(tsk);
>> -	/* sync mm's RSS info before statistics gathering */
>> -	if (tsk->mm)
>> -		sync_mm_rss(tsk->mm);
>> +
>> +	/* Release mm and sync mm's RSS info before statistics gathering */
>> +	mm_release(tsk, tsk->mm);
>
> This breaks kthread_stop() at least.
>
> The exiting kthread shouldn't do complete_vfork_done() until it
> sets ->exit_code.

Ouch, I was afraid something like that.

>
> Oleg.
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 1/2] mm: set task exit code before complete_vfork_done()
  2012-04-09 20:03 ` Konstantin Khlebnikov
@ 2012-04-12  8:09   ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-12  8:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Oleg Nesterov, linux-kernel, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

kthread_stop() uses task->vfork_done for synchronization. The exiting kthread
shouldn't do complete_vfork_done() until it sets ->exit_code.

fix for mm-correctly-synchronize-rss-counters-at-exit-exec.patch

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
---
 kernel/exit.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index eb12719..70875a6 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -960,6 +960,9 @@ void do_exit(long code)
 
 	acct_update_integrals(tsk);
 
+	/* Set exit_code before complete_vfork_done() in mm_release() */
+	tsk->exit_code = code;
+
 	/* Release mm and sync mm's RSS info before statistics gathering */
 	mm_release(tsk, tsk->mm);
 
@@ -975,7 +978,6 @@ void do_exit(long code)
 		tty_audit_exit();
 	audit_free(tsk);
 
-	tsk->exit_code = code;
 	taskstats_exit(tsk, group_dead);
 
 	exit_mm(tsk);


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 1/2] mm: set task exit code before complete_vfork_done()
@ 2012-04-12  8:09   ` Konstantin Khlebnikov
  0 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-12  8:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Oleg Nesterov, linux-kernel, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

kthread_stop() uses task->vfork_done for synchronization. The exiting kthread
shouldn't do complete_vfork_done() until it sets ->exit_code.

fix for mm-correctly-synchronize-rss-counters-at-exit-exec.patch

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
---
 kernel/exit.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index eb12719..70875a6 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -960,6 +960,9 @@ void do_exit(long code)
 
 	acct_update_integrals(tsk);
 
+	/* Set exit_code before complete_vfork_done() in mm_release() */
+	tsk->exit_code = code;
+
 	/* Release mm and sync mm's RSS info before statistics gathering */
 	mm_release(tsk, tsk->mm);
 
@@ -975,7 +978,6 @@ void do_exit(long code)
 		tty_audit_exit();
 	audit_free(tsk);
 
-	tsk->exit_code = code;
 	taskstats_exit(tsk, group_dead);
 
 	exit_mm(tsk);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 2/2] mm: call complete_vfork_done() after clearing child_tid and flushing rss-counters
  2012-04-09 20:03 ` Konstantin Khlebnikov
@ 2012-04-12  8:09   ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-12  8:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Oleg Nesterov, linux-kernel, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

Child should wake ups parent from vfork() only after finishing all operations with
shared mm. There is no sense to use CLONE_CHILD_CLEARTID together with CLONE_VFORK,
but it looks more accurate now.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
---
 kernel/fork.c |   10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 326bb5b..f10ac1d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -728,9 +728,6 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
 	/* Get rid of any cached register state */
 	deactivate_mm(tsk, mm);
 
-	if (tsk->vfork_done)
-		complete_vfork_done(tsk);
-
 	/*
 	 * If we're exiting normally, clear a user-space tid field if
 	 * requested.  We leave this alone when dying by signal, to leave
@@ -759,6 +756,13 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
 	 */
 	if (mm)
 		sync_mm_rss(mm);
+
+	/*
+	 * All done, finally we can wake up parent and return this mm to him.
+	 * Also kthread_stop() uses this completion for synchronization.
+	 */
+	if (tsk->vfork_done)
+		complete_vfork_done(tsk);
 }
 
 /*


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 2/2] mm: call complete_vfork_done() after clearing child_tid and flushing rss-counters
@ 2012-04-12  8:09   ` Konstantin Khlebnikov
  0 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-12  8:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Oleg Nesterov, linux-kernel, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

Child should wake ups parent from vfork() only after finishing all operations with
shared mm. There is no sense to use CLONE_CHILD_CLEARTID together with CLONE_VFORK,
but it looks more accurate now.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
---
 kernel/fork.c |   10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 326bb5b..f10ac1d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -728,9 +728,6 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
 	/* Get rid of any cached register state */
 	deactivate_mm(tsk, mm);
 
-	if (tsk->vfork_done)
-		complete_vfork_done(tsk);
-
 	/*
 	 * If we're exiting normally, clear a user-space tid field if
 	 * requested.  We leave this alone when dying by signal, to leave
@@ -759,6 +756,13 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
 	 */
 	if (mm)
 		sync_mm_rss(mm);
+
+	/*
+	 * All done, finally we can wake up parent and return this mm to him.
+	 * Also kthread_stop() uses this completion for synchronization.
+	 */
+	if (tsk->vfork_done)
+		complete_vfork_done(tsk);
 }
 
 /*

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()
  2012-04-12  8:09   ` Konstantin Khlebnikov
@ 2012-04-12 23:35     ` Andrew Morton
  -1 siblings, 0 replies; 45+ messages in thread
From: Andrew Morton @ 2012-04-12 23:35 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, Oleg Nesterov, linux-kernel, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

On Thu, 12 Apr 2012 12:09:48 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> kthread_stop() uses task->vfork_done for synchronization. The exiting kthread
> shouldn't do complete_vfork_done() until it sets ->exit_code.
> 
> fix for mm-correctly-synchronize-rss-counters-at-exit-exec.patch
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> ---
>  kernel/exit.c |    4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/exit.c b/kernel/exit.c
> index eb12719..70875a6 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -960,6 +960,9 @@ void do_exit(long code)
>  
>  	acct_update_integrals(tsk);
>  
> +	/* Set exit_code before complete_vfork_done() in mm_release() */
> +	tsk->exit_code = code;
> +
>  	/* Release mm and sync mm's RSS info before statistics gathering */
>  	mm_release(tsk, tsk->mm);
>  
> @@ -975,7 +978,6 @@ void do_exit(long code)
>  		tty_audit_exit();
>  	audit_free(tsk);
>  
> -	tsk->exit_code = code;
>  	taskstats_exit(tsk, group_dead);
>  

So does this patch address Oleg's objection to
mm-correctly-synchronize-rss-counters-at-exit-exec.patch?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()
@ 2012-04-12 23:35     ` Andrew Morton
  0 siblings, 0 replies; 45+ messages in thread
From: Andrew Morton @ 2012-04-12 23:35 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, Oleg Nesterov, linux-kernel, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

On Thu, 12 Apr 2012 12:09:48 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> kthread_stop() uses task->vfork_done for synchronization. The exiting kthread
> shouldn't do complete_vfork_done() until it sets ->exit_code.
> 
> fix for mm-correctly-synchronize-rss-counters-at-exit-exec.patch
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> ---
>  kernel/exit.c |    4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/exit.c b/kernel/exit.c
> index eb12719..70875a6 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -960,6 +960,9 @@ void do_exit(long code)
>  
>  	acct_update_integrals(tsk);
>  
> +	/* Set exit_code before complete_vfork_done() in mm_release() */
> +	tsk->exit_code = code;
> +
>  	/* Release mm and sync mm's RSS info before statistics gathering */
>  	mm_release(tsk, tsk->mm);
>  
> @@ -975,7 +978,6 @@ void do_exit(long code)
>  		tty_audit_exit();
>  	audit_free(tsk);
>  
> -	tsk->exit_code = code;
>  	taskstats_exit(tsk, group_dead);
>  

So does this patch address Oleg's objection to
mm-correctly-synchronize-rss-counters-at-exit-exec.patch?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] mm: call complete_vfork_done() after clearing child_tid and flushing rss-counters
  2012-04-12  8:09   ` Konstantin Khlebnikov
@ 2012-04-12 23:39     ` Andrew Morton
  -1 siblings, 0 replies; 45+ messages in thread
From: Andrew Morton @ 2012-04-12 23:39 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, Oleg Nesterov, linux-kernel, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

On Thu, 12 Apr 2012 12:09:53 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> Child should wake ups parent from vfork() only after finishing all operations with
> shared mm. There is no sense to use CLONE_CHILD_CLEARTID together with CLONE_VFORK,
> but it looks more accurate now.
> 
> ...
>
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -728,9 +728,6 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>  	/* Get rid of any cached register state */
>  	deactivate_mm(tsk, mm);
>  
> -	if (tsk->vfork_done)
> -		complete_vfork_done(tsk);
> -
>  	/*
>  	 * If we're exiting normally, clear a user-space tid field if
>  	 * requested.  We leave this alone when dying by signal, to leave
> @@ -759,6 +756,13 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>  	 */
>  	if (mm)
>  		sync_mm_rss(mm);
> +
> +	/*
> +	 * All done, finally we can wake up parent and return this mm to him.
> +	 * Also kthread_stop() uses this completion for synchronization.
> +	 */
> +	if (tsk->vfork_done)
> +		complete_vfork_done(tsk);
>  }

That does look a bit racy.

But are we really sure that the patch really does fix something? 
Because it does increase vfork() latency a tiny bit.

I'm going to call this a patch against the fork subsystem, not the mm
subsystem.

I believe that this patch is unrelated to "mm: set task exit code
before complete_vfork_done()", yes?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] mm: call complete_vfork_done() after clearing child_tid and flushing rss-counters
@ 2012-04-12 23:39     ` Andrew Morton
  0 siblings, 0 replies; 45+ messages in thread
From: Andrew Morton @ 2012-04-12 23:39 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Hugh Dickins, Oleg Nesterov, linux-kernel, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

On Thu, 12 Apr 2012 12:09:53 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> Child should wake ups parent from vfork() only after finishing all operations with
> shared mm. There is no sense to use CLONE_CHILD_CLEARTID together with CLONE_VFORK,
> but it looks more accurate now.
> 
> ...
>
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -728,9 +728,6 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>  	/* Get rid of any cached register state */
>  	deactivate_mm(tsk, mm);
>  
> -	if (tsk->vfork_done)
> -		complete_vfork_done(tsk);
> -
>  	/*
>  	 * If we're exiting normally, clear a user-space tid field if
>  	 * requested.  We leave this alone when dying by signal, to leave
> @@ -759,6 +756,13 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>  	 */
>  	if (mm)
>  		sync_mm_rss(mm);
> +
> +	/*
> +	 * All done, finally we can wake up parent and return this mm to him.
> +	 * Also kthread_stop() uses this completion for synchronization.
> +	 */
> +	if (tsk->vfork_done)
> +		complete_vfork_done(tsk);
>  }

That does look a bit racy.

But are we really sure that the patch really does fix something? 
Because it does increase vfork() latency a tiny bit.

I'm going to call this a patch against the fork subsystem, not the mm
subsystem.

I believe that this patch is unrelated to "mm: set task exit code
before complete_vfork_done()", yes?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()
  2012-04-12  8:09   ` Konstantin Khlebnikov
@ 2012-04-12 23:54     ` Oleg Nesterov
  -1 siblings, 0 replies; 45+ messages in thread
From: Oleg Nesterov @ 2012-04-12 23:54 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Hugh Dickins, linux-kernel, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

On 04/12, Konstantin Khlebnikov wrote:
>
> kthread_stop() uses task->vfork_done for synchronization. The exiting kthread
> shouldn't do complete_vfork_done() until it sets ->exit_code.
>
> fix for mm-correctly-synchronize-rss-counters-at-exit-exec.patch

Yes, this should fix the problem with kthread_stop()...

Damn, Konstantin I have to admit, I'll try to find another technical
reason against mm-correctly-synchronize-rss-counters-at-exit-exec.patch
even with this fix ;)

Most probably I am wrong, but it looks overcomplicated. Somehow I
dislike irrationally the fact you moved mm_release() from exit_mm().

But, once again, it is not that I see the better solution.

2/2 looks fine at first glance... and afaics it is "off-topic".

Oleg.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()
@ 2012-04-12 23:54     ` Oleg Nesterov
  0 siblings, 0 replies; 45+ messages in thread
From: Oleg Nesterov @ 2012-04-12 23:54 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Hugh Dickins, linux-kernel, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

On 04/12, Konstantin Khlebnikov wrote:
>
> kthread_stop() uses task->vfork_done for synchronization. The exiting kthread
> shouldn't do complete_vfork_done() until it sets ->exit_code.
>
> fix for mm-correctly-synchronize-rss-counters-at-exit-exec.patch

Yes, this should fix the problem with kthread_stop()...

Damn, Konstantin I have to admit, I'll try to find another technical
reason against mm-correctly-synchronize-rss-counters-at-exit-exec.patch
even with this fix ;)

Most probably I am wrong, but it looks overcomplicated. Somehow I
dislike irrationally the fact you moved mm_release() from exit_mm().

But, once again, it is not that I see the better solution.

2/2 looks fine at first glance... and afaics it is "off-topic".

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] mm: call complete_vfork_done() after clearing child_tid and flushing rss-counters
  2012-04-12 23:39     ` Andrew Morton
@ 2012-04-13  6:43       ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-13  6:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Oleg Nesterov, linux-kernel, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

Andrew Morton wrote:
> On Thu, 12 Apr 2012 12:09:53 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
>
>> Child should wake ups parent from vfork() only after finishing all operations with
>> shared mm. There is no sense to use CLONE_CHILD_CLEARTID together with CLONE_VFORK,
>> but it looks more accurate now.
>>
>> ...
>>
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -728,9 +728,6 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>>   	/* Get rid of any cached register state */
>>   	deactivate_mm(tsk, mm);
>>
>> -	if (tsk->vfork_done)
>> -		complete_vfork_done(tsk);
>> -
>>   	/*
>>   	 * If we're exiting normally, clear a user-space tid field if
>>   	 * requested.  We leave this alone when dying by signal, to leave
>> @@ -759,6 +756,13 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>>   	 */
>>   	if (mm)
>>   		sync_mm_rss(mm);
>> +
>> +	/*
>> +	 * All done, finally we can wake up parent and return this mm to him.
>> +	 * Also kthread_stop() uses this completion for synchronization.
>> +	 */
>> +	if (tsk->vfork_done)
>> +		complete_vfork_done(tsk);
>>   }
>
> That does look a bit racy.
>
> But are we really sure that the patch really does fix something?
> Because it does increase vfork() latency a tiny bit.
>
> I'm going to call this a patch against the fork subsystem, not the mm
> subsystem.
>
> I believe that this patch is unrelated to "mm: set task exit code
> before complete_vfork_done()", yes?

Yes, unrelated.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] mm: call complete_vfork_done() after clearing child_tid and flushing rss-counters
@ 2012-04-13  6:43       ` Konstantin Khlebnikov
  0 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-13  6:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Oleg Nesterov, linux-kernel, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

Andrew Morton wrote:
> On Thu, 12 Apr 2012 12:09:53 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
>
>> Child should wake ups parent from vfork() only after finishing all operations with
>> shared mm. There is no sense to use CLONE_CHILD_CLEARTID together with CLONE_VFORK,
>> but it looks more accurate now.
>>
>> ...
>>
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -728,9 +728,6 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>>   	/* Get rid of any cached register state */
>>   	deactivate_mm(tsk, mm);
>>
>> -	if (tsk->vfork_done)
>> -		complete_vfork_done(tsk);
>> -
>>   	/*
>>   	 * If we're exiting normally, clear a user-space tid field if
>>   	 * requested.  We leave this alone when dying by signal, to leave
>> @@ -759,6 +756,13 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>>   	 */
>>   	if (mm)
>>   		sync_mm_rss(mm);
>> +
>> +	/*
>> +	 * All done, finally we can wake up parent and return this mm to him.
>> +	 * Also kthread_stop() uses this completion for synchronization.
>> +	 */
>> +	if (tsk->vfork_done)
>> +		complete_vfork_done(tsk);
>>   }
>
> That does look a bit racy.
>
> But are we really sure that the patch really does fix something?
> Because it does increase vfork() latency a tiny bit.
>
> I'm going to call this a patch against the fork subsystem, not the mm
> subsystem.
>
> I believe that this patch is unrelated to "mm: set task exit code
> before complete_vfork_done()", yes?

Yes, unrelated.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()
  2012-04-12 23:54     ` Oleg Nesterov
@ 2012-04-20 17:59       ` Oleg Nesterov
  -1 siblings, 0 replies; 45+ messages in thread
From: Oleg Nesterov @ 2012-04-20 17:59 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Hugh Dickins, linux-kernel, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

On 04/13, Oleg Nesterov wrote:
>
> Damn, Konstantin I have to admit, I'll try to find another technical
> reason against mm-correctly-synchronize-rss-counters-at-exit-exec.patch
> even with this fix ;)
>
> Most probably I am wrong, but it looks overcomplicated. Somehow I
> dislike irrationally the fact you moved mm_release() from exit_mm().

And perhaps you can help me to discredit your patch?

It turns out, I do not really understand this code in do_exit:

	/* sync mm's RSS info before statistics gathering */
	if (tsk->mm)
		sync_mm_rss(tsk->mm);

Which "statistics gathering" ? Probably I missed something, but
after the quick grep it seems to me that this is only needed for
taskstats_exit()->xacct_add_tsk().

So why we can't simply add sync_mm_rss() into xacct_add_tsk() ?
Yes, this way we do not "account" put_user(clear_child_tid) but
I think we do not care.

IOW, what do you think about the trivial patch below? Uncompiled,
untested, probably incomplete. acct_update_integrals() looks
suspicious too.

Oleg.

--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -91,6 +91,7 @@ void xacct_add_tsk(struct taskstats *sta
 	stats->virtmem = p->acct_vm_mem1 * PAGE_SIZE / MB;
 	mm = get_task_mm(p);
 	if (mm) {
+		sync_mm_rss(mm);
 		/* adjust to KB unit */
 		stats->hiwater_rss   = get_mm_hiwater_rss(mm) * PAGE_SIZE / KB;
 		stats->hiwater_vm    = get_mm_hiwater_vm(mm)  * PAGE_SIZE / KB;
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -643,6 +643,8 @@ static void exit_mm(struct task_struct *
 	mm_release(tsk, mm);
 	if (!mm)
 		return;
+
+	sync_mm_rss(mm);
 	/*
 	 * Serialize with any possible pending coredump.
 	 * We must hold mmap_sem around checking core_state
@@ -960,9 +962,6 @@ void do_exit(long code)
 				preempt_count());
 
 	acct_update_integrals(tsk);
-	/* sync mm's RSS info before statistics gathering */
-	if (tsk->mm)
-		sync_mm_rss(tsk->mm);
 	group_dead = atomic_dec_and_test(&tsk->signal->live);
 	if (group_dead) {
 		hrtimer_cancel(&tsk->signal->real_timer);
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -823,10 +823,10 @@ static int exec_mmap(struct mm_struct *m
 	/* Notify parent that we're no longer interested in the old VM */
 	tsk = current;
 	old_mm = current->mm;
-	sync_mm_rss(old_mm);
 	mm_release(tsk, old_mm);
 
 	if (old_mm) {
+		sync_mm_rss(old_mm);
 		/*
 		 * Make sure that if there is a core dump in progress
 		 * for the old mm, we get out and die instead of going


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()
@ 2012-04-20 17:59       ` Oleg Nesterov
  0 siblings, 0 replies; 45+ messages in thread
From: Oleg Nesterov @ 2012-04-20 17:59 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Hugh Dickins, linux-kernel, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

On 04/13, Oleg Nesterov wrote:
>
> Damn, Konstantin I have to admit, I'll try to find another technical
> reason against mm-correctly-synchronize-rss-counters-at-exit-exec.patch
> even with this fix ;)
>
> Most probably I am wrong, but it looks overcomplicated. Somehow I
> dislike irrationally the fact you moved mm_release() from exit_mm().

And perhaps you can help me to discredit your patch?

It turns out, I do not really understand this code in do_exit:

	/* sync mm's RSS info before statistics gathering */
	if (tsk->mm)
		sync_mm_rss(tsk->mm);

Which "statistics gathering" ? Probably I missed something, but
after the quick grep it seems to me that this is only needed for
taskstats_exit()->xacct_add_tsk().

So why we can't simply add sync_mm_rss() into xacct_add_tsk() ?
Yes, this way we do not "account" put_user(clear_child_tid) but
I think we do not care.

IOW, what do you think about the trivial patch below? Uncompiled,
untested, probably incomplete. acct_update_integrals() looks
suspicious too.

Oleg.

--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -91,6 +91,7 @@ void xacct_add_tsk(struct taskstats *sta
 	stats->virtmem = p->acct_vm_mem1 * PAGE_SIZE / MB;
 	mm = get_task_mm(p);
 	if (mm) {
+		sync_mm_rss(mm);
 		/* adjust to KB unit */
 		stats->hiwater_rss   = get_mm_hiwater_rss(mm) * PAGE_SIZE / KB;
 		stats->hiwater_vm    = get_mm_hiwater_vm(mm)  * PAGE_SIZE / KB;
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -643,6 +643,8 @@ static void exit_mm(struct task_struct *
 	mm_release(tsk, mm);
 	if (!mm)
 		return;
+
+	sync_mm_rss(mm);
 	/*
 	 * Serialize with any possible pending coredump.
 	 * We must hold mmap_sem around checking core_state
@@ -960,9 +962,6 @@ void do_exit(long code)
 				preempt_count());
 
 	acct_update_integrals(tsk);
-	/* sync mm's RSS info before statistics gathering */
-	if (tsk->mm)
-		sync_mm_rss(tsk->mm);
 	group_dead = atomic_dec_and_test(&tsk->signal->live);
 	if (group_dead) {
 		hrtimer_cancel(&tsk->signal->real_timer);
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -823,10 +823,10 @@ static int exec_mmap(struct mm_struct *m
 	/* Notify parent that we're no longer interested in the old VM */
 	tsk = current;
 	old_mm = current->mm;
-	sync_mm_rss(old_mm);
 	mm_release(tsk, old_mm);
 
 	if (old_mm) {
+		sync_mm_rss(old_mm);
 		/*
 		 * Make sure that if there is a core dump in progress
 		 * for the old mm, we get out and die instead of going

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()
  2012-04-20 17:59       ` Oleg Nesterov
@ 2012-04-20 19:23         ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-20 19:23 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Hugh Dickins, linux-kernel, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

Oleg Nesterov wrote:
> On 04/13, Oleg Nesterov wrote:
>>
>> Damn, Konstantin I have to admit, I'll try to find another technical
>> reason against mm-correctly-synchronize-rss-counters-at-exit-exec.patch
>> even with this fix ;)
>>
>> Most probably I am wrong, but it looks overcomplicated. Somehow I
>> dislike irrationally the fact you moved mm_release() from exit_mm().
>
> And perhaps you can help me to discredit your patch?
>
> It turns out, I do not really understand this code in do_exit:
>
> 	/* sync mm's RSS info before statistics gathering */
> 	if (tsk->mm)
> 		sync_mm_rss(tsk->mm);
>
> Which "statistics gathering" ? Probably I missed something, but
> after the quick grep it seems to me that this is only needed for
> taskstats_exit()->xacct_add_tsk().
>
> So why we can't simply add sync_mm_rss() into xacct_add_tsk() ?

> Yes, this way we do not "account" put_user(clear_child_tid) but
> I think we do not care.

Why we don't care? Each thread can corrupt these counters by one.
I do not think that we are satisfied with nearly accurate rss accounting.
+/- one page for each clone()-exit().

Actually I don't really like this per-task rss-delta.
Probably it would be better to use per-cpu counters.

>
> IOW, what do you think about the trivial patch below? Uncompiled,
> untested, probably incomplete. acct_update_integrals() looks
> suspicious too.

what a mess! =)

>
> Oleg.
>
> --- a/kernel/tsacct.c
> +++ b/kernel/tsacct.c
> @@ -91,6 +91,7 @@ void xacct_add_tsk(struct taskstats *sta
>   	stats->virtmem = p->acct_vm_mem1 * PAGE_SIZE / MB;
>   	mm = get_task_mm(p);
>   	if (mm) {
> +		sync_mm_rss(mm);
>   		/* adjust to KB unit */
>   		stats->hiwater_rss   = get_mm_hiwater_rss(mm) * PAGE_SIZE / KB;
>   		stats->hiwater_vm    = get_mm_hiwater_vm(mm)  * PAGE_SIZE / KB;
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -643,6 +643,8 @@ static void exit_mm(struct task_struct *
>   	mm_release(tsk, mm);
>   	if (!mm)
>   		return;
> +
> +	sync_mm_rss(mm);
>   	/*
>   	 * Serialize with any possible pending coredump.
>   	 * We must hold mmap_sem around checking core_state
> @@ -960,9 +962,6 @@ void do_exit(long code)
>   				preempt_count());
>
>   	acct_update_integrals(tsk);
> -	/* sync mm's RSS info before statistics gathering */
> -	if (tsk->mm)
> -		sync_mm_rss(tsk->mm);
>   	group_dead = atomic_dec_and_test(&tsk->signal->live);
>   	if (group_dead) {
>   		hrtimer_cancel(&tsk->signal->real_timer);
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -823,10 +823,10 @@ static int exec_mmap(struct mm_struct *m
>   	/* Notify parent that we're no longer interested in the old VM */
>   	tsk = current;
>   	old_mm = current->mm;
> -	sync_mm_rss(old_mm);
>   	mm_release(tsk, old_mm);
>
>   	if (old_mm) {
> +		sync_mm_rss(old_mm);
>   		/*
>   		 * Make sure that if there is a core dump in progress
>   		 * for the old mm, we get out and die instead of going
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email:<a href=mailto:"dont@kvack.org">  email@kvack.org</a>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()
@ 2012-04-20 19:23         ` Konstantin Khlebnikov
  0 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-20 19:23 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Hugh Dickins, linux-kernel, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

Oleg Nesterov wrote:
> On 04/13, Oleg Nesterov wrote:
>>
>> Damn, Konstantin I have to admit, I'll try to find another technical
>> reason against mm-correctly-synchronize-rss-counters-at-exit-exec.patch
>> even with this fix ;)
>>
>> Most probably I am wrong, but it looks overcomplicated. Somehow I
>> dislike irrationally the fact you moved mm_release() from exit_mm().
>
> And perhaps you can help me to discredit your patch?
>
> It turns out, I do not really understand this code in do_exit:
>
> 	/* sync mm's RSS info before statistics gathering */
> 	if (tsk->mm)
> 		sync_mm_rss(tsk->mm);
>
> Which "statistics gathering" ? Probably I missed something, but
> after the quick grep it seems to me that this is only needed for
> taskstats_exit()->xacct_add_tsk().
>
> So why we can't simply add sync_mm_rss() into xacct_add_tsk() ?

> Yes, this way we do not "account" put_user(clear_child_tid) but
> I think we do not care.

Why we don't care? Each thread can corrupt these counters by one.
I do not think that we are satisfied with nearly accurate rss accounting.
+/- one page for each clone()-exit().

Actually I don't really like this per-task rss-delta.
Probably it would be better to use per-cpu counters.

>
> IOW, what do you think about the trivial patch below? Uncompiled,
> untested, probably incomplete. acct_update_integrals() looks
> suspicious too.

what a mess! =)

>
> Oleg.
>
> --- a/kernel/tsacct.c
> +++ b/kernel/tsacct.c
> @@ -91,6 +91,7 @@ void xacct_add_tsk(struct taskstats *sta
>   	stats->virtmem = p->acct_vm_mem1 * PAGE_SIZE / MB;
>   	mm = get_task_mm(p);
>   	if (mm) {
> +		sync_mm_rss(mm);
>   		/* adjust to KB unit */
>   		stats->hiwater_rss   = get_mm_hiwater_rss(mm) * PAGE_SIZE / KB;
>   		stats->hiwater_vm    = get_mm_hiwater_vm(mm)  * PAGE_SIZE / KB;
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -643,6 +643,8 @@ static void exit_mm(struct task_struct *
>   	mm_release(tsk, mm);
>   	if (!mm)
>   		return;
> +
> +	sync_mm_rss(mm);
>   	/*
>   	 * Serialize with any possible pending coredump.
>   	 * We must hold mmap_sem around checking core_state
> @@ -960,9 +962,6 @@ void do_exit(long code)
>   				preempt_count());
>
>   	acct_update_integrals(tsk);
> -	/* sync mm's RSS info before statistics gathering */
> -	if (tsk->mm)
> -		sync_mm_rss(tsk->mm);
>   	group_dead = atomic_dec_and_test(&tsk->signal->live);
>   	if (group_dead) {
>   		hrtimer_cancel(&tsk->signal->real_timer);
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -823,10 +823,10 @@ static int exec_mmap(struct mm_struct *m
>   	/* Notify parent that we're no longer interested in the old VM */
>   	tsk = current;
>   	old_mm = current->mm;
> -	sync_mm_rss(old_mm);
>   	mm_release(tsk, old_mm);
>
>   	if (old_mm) {
> +		sync_mm_rss(old_mm);
>   		/*
>   		 * Make sure that if there is a core dump in progress
>   		 * for the old mm, we get out and die instead of going
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email:<a href=mailto:"dont@kvack.org">  email@kvack.org</a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()
  2012-04-20 19:23         ` Konstantin Khlebnikov
@ 2012-04-20 20:41           ` Oleg Nesterov
  -1 siblings, 0 replies; 45+ messages in thread
From: Oleg Nesterov @ 2012-04-20 20:41 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Hugh Dickins, linux-kernel, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

On 04/20, Konstantin Khlebnikov wrote:
>
> Oleg Nesterov wrote:
>>
>> 	/* sync mm's RSS info before statistics gathering */
>> 	if (tsk->mm)
>> 		sync_mm_rss(tsk->mm);
>>
>> Which "statistics gathering" ? Probably I missed something, but
>> after the quick grep it seems to me that this is only needed for
>> taskstats_exit()->xacct_add_tsk().
>>
>> So why we can't simply add sync_mm_rss() into xacct_add_tsk() ?
>
>> Yes, this way we do not "account" put_user(clear_child_tid) but
>> I think we do not care.
>
> Why we don't care? Each thread can corrupt these counters by one.
> I do not think that we are satisfied with nearly accurate rss accounting.
> +/- one page for each clone()-exit().

Not actually "for each" in practice. Each exit does sync_ (with
this patch from xacct_add_tsk), the net effect should be small.

And. This is what we do now, nobody ever complained.

>> IOW, what do you think about the trivial patch below? Uncompiled,
>> untested, probably incomplete. acct_update_integrals() looks
>> suspicious too.
>
> what a mess! =)

Thanks ;)

But it is much, much simpler than your patches, don't you agree?

Oleg.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()
@ 2012-04-20 20:41           ` Oleg Nesterov
  0 siblings, 0 replies; 45+ messages in thread
From: Oleg Nesterov @ 2012-04-20 20:41 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Hugh Dickins, linux-kernel, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

On 04/20, Konstantin Khlebnikov wrote:
>
> Oleg Nesterov wrote:
>>
>> 	/* sync mm's RSS info before statistics gathering */
>> 	if (tsk->mm)
>> 		sync_mm_rss(tsk->mm);
>>
>> Which "statistics gathering" ? Probably I missed something, but
>> after the quick grep it seems to me that this is only needed for
>> taskstats_exit()->xacct_add_tsk().
>>
>> So why we can't simply add sync_mm_rss() into xacct_add_tsk() ?
>
>> Yes, this way we do not "account" put_user(clear_child_tid) but
>> I think we do not care.
>
> Why we don't care? Each thread can corrupt these counters by one.
> I do not think that we are satisfied with nearly accurate rss accounting.
> +/- one page for each clone()-exit().

Not actually "for each" in practice. Each exit does sync_ (with
this patch from xacct_add_tsk), the net effect should be small.

And. This is what we do now, nobody ever complained.

>> IOW, what do you think about the trivial patch below? Uncompiled,
>> untested, probably incomplete. acct_update_integrals() looks
>> suspicious too.
>
> what a mess! =)

Thanks ;)

But it is much, much simpler than your patches, don't you agree?

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()
  2012-04-20 20:41           ` Oleg Nesterov
@ 2012-04-25 20:01             ` Andrew Morton
  -1 siblings, 0 replies; 45+ messages in thread
From: Andrew Morton @ 2012-04-25 20:01 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Konstantin Khlebnikov, Hugh Dickins, linux-kernel, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

On Fri, 20 Apr 2012 22:41:29 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> >> IOW, what do you think about the trivial patch below? Uncompiled,
> >> untested, probably incomplete. acct_update_integrals() looks
> >> suspicious too.
> >
> > what a mess! =)
> 
> Thanks ;)
> 
> But it is much, much simpler than your patches, don't you agree?

<taps his foot>

Here's what we currently have:

From: Konstantin Khlebnikov <khlebnikov@openvz.org>
Subject: mm: correctly synchronize rss-counters at exit/exec

mm->rss_stat counters have per-task delta: task->rss_stat.  Before
changing task->mm pointer the kernel must flush this delta with
sync_mm_rss().

do_exit() already calls sync_mm_rss() to flush the rss-counters before
committing the rss statistics into task->signal->maxrss, taskstats, audit
and other stuff.  Unfortunately the kernel does this before calling
mm_release(), which can call put_user() for processing
task->clear_child_tid.  So at this point we can trigger page-faults and
task->rss_stat becomes non-zero again.  As a result mm->rss_stat becomes
inconsistent and check_mm() will print something like this:

| BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
| BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1

This patch moves sync_mm_rss() into mm_release(), and moves mm_release()
out of do_exit() and calls it earlier.  After mm_release() there should be
no pagefaults.

[akpm@linux-foundation.org: tweak comment]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/exec.c     |    1 -
 kernel/exit.c |   13 ++++++++-----
 kernel/fork.c |    8 ++++++++
 3 files changed, 16 insertions(+), 6 deletions(-)

diff -puN fs/exec.c~mm-correctly-synchronize-rss-counters-at-exit-exec fs/exec.c
--- a/fs/exec.c~mm-correctly-synchronize-rss-counters-at-exit-exec
+++ a/fs/exec.c
@@ -823,7 +823,6 @@ static int exec_mmap(struct mm_struct *m
 	/* Notify parent that we're no longer interested in the old VM */
 	tsk = current;
 	old_mm = current->mm;
-	sync_mm_rss(old_mm);
 	mm_release(tsk, old_mm);
 
 	if (old_mm) {
diff -puN kernel/exit.c~mm-correctly-synchronize-rss-counters-at-exit-exec kernel/exit.c
--- a/kernel/exit.c~mm-correctly-synchronize-rss-counters-at-exit-exec
+++ a/kernel/exit.c
@@ -423,6 +423,7 @@ void daemonize(const char *name, ...)
 	 * user space pages.  We don't need them, and if we didn't close them
 	 * they would be locked into memory.
 	 */
+	mm_release(current, current->mm);
 	exit_mm(current);
 	/*
 	 * We don't want to get frozen, in case system-wide hibernation
@@ -640,7 +641,6 @@ static void exit_mm(struct task_struct *
 	struct mm_struct *mm = tsk->mm;
 	struct core_state *core_state;
 
-	mm_release(tsk, mm);
 	if (!mm)
 		return;
 	/*
@@ -959,9 +959,13 @@ void do_exit(long code)
 				preempt_count());
 
 	acct_update_integrals(tsk);
-	/* sync mm's RSS info before statistics gathering */
-	if (tsk->mm)
-		sync_mm_rss(tsk->mm);
+
+	/* Set exit_code before complete_vfork_done() in mm_release() */
+	tsk->exit_code = code;
+
+	/* Release mm and sync mm's RSS info before statistics gathering */
+	mm_release(tsk, tsk->mm);
+
 	group_dead = atomic_dec_and_test(&tsk->signal->live);
 	if (group_dead) {
 		hrtimer_cancel(&tsk->signal->real_timer);
@@ -974,7 +978,6 @@ void do_exit(long code)
 		tty_audit_exit();
 	audit_free(tsk);
 
-	tsk->exit_code = code;
 	taskstats_exit(tsk, group_dead);
 
 	exit_mm(tsk);
diff -puN kernel/fork.c~mm-correctly-synchronize-rss-counters-at-exit-exec kernel/fork.c
--- a/kernel/fork.c~mm-correctly-synchronize-rss-counters-at-exit-exec
+++ a/kernel/fork.c
@@ -782,6 +782,14 @@ void mm_release(struct task_struct *tsk,
 		}
 		tsk->clear_child_tid = NULL;
 	}
+
+	/*
+	 * Final rss-counter synchronization. After this point there must be
+	 * no pagefaults into this mm from the current context.  Otherwise
+	 * mm->rss_stat will be inconsistent.
+	 */
+	if (mm)
+		sync_mm_rss(mm);
 }
 
 /*
_


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()
@ 2012-04-25 20:01             ` Andrew Morton
  0 siblings, 0 replies; 45+ messages in thread
From: Andrew Morton @ 2012-04-25 20:01 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Konstantin Khlebnikov, Hugh Dickins, linux-kernel, linux-mm,
	Markus Trippelsdorf, KAMEZAWA Hiroyuki

On Fri, 20 Apr 2012 22:41:29 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> >> IOW, what do you think about the trivial patch below? Uncompiled,
> >> untested, probably incomplete. acct_update_integrals() looks
> >> suspicious too.
> >
> > what a mess! =)
> 
> Thanks ;)
> 
> But it is much, much simpler than your patches, don't you agree?

<taps his foot>

Here's what we currently have:

From: Konstantin Khlebnikov <khlebnikov@openvz.org>
Subject: mm: correctly synchronize rss-counters at exit/exec

mm->rss_stat counters have per-task delta: task->rss_stat.  Before
changing task->mm pointer the kernel must flush this delta with
sync_mm_rss().

do_exit() already calls sync_mm_rss() to flush the rss-counters before
committing the rss statistics into task->signal->maxrss, taskstats, audit
and other stuff.  Unfortunately the kernel does this before calling
mm_release(), which can call put_user() for processing
task->clear_child_tid.  So at this point we can trigger page-faults and
task->rss_stat becomes non-zero again.  As a result mm->rss_stat becomes
inconsistent and check_mm() will print something like this:

| BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
| BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1

This patch moves sync_mm_rss() into mm_release(), and moves mm_release()
out of do_exit() and calls it earlier.  After mm_release() there should be
no pagefaults.

[akpm@linux-foundation.org: tweak comment]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/exec.c     |    1 -
 kernel/exit.c |   13 ++++++++-----
 kernel/fork.c |    8 ++++++++
 3 files changed, 16 insertions(+), 6 deletions(-)

diff -puN fs/exec.c~mm-correctly-synchronize-rss-counters-at-exit-exec fs/exec.c
--- a/fs/exec.c~mm-correctly-synchronize-rss-counters-at-exit-exec
+++ a/fs/exec.c
@@ -823,7 +823,6 @@ static int exec_mmap(struct mm_struct *m
 	/* Notify parent that we're no longer interested in the old VM */
 	tsk = current;
 	old_mm = current->mm;
-	sync_mm_rss(old_mm);
 	mm_release(tsk, old_mm);
 
 	if (old_mm) {
diff -puN kernel/exit.c~mm-correctly-synchronize-rss-counters-at-exit-exec kernel/exit.c
--- a/kernel/exit.c~mm-correctly-synchronize-rss-counters-at-exit-exec
+++ a/kernel/exit.c
@@ -423,6 +423,7 @@ void daemonize(const char *name, ...)
 	 * user space pages.  We don't need them, and if we didn't close them
 	 * they would be locked into memory.
 	 */
+	mm_release(current, current->mm);
 	exit_mm(current);
 	/*
 	 * We don't want to get frozen, in case system-wide hibernation
@@ -640,7 +641,6 @@ static void exit_mm(struct task_struct *
 	struct mm_struct *mm = tsk->mm;
 	struct core_state *core_state;
 
-	mm_release(tsk, mm);
 	if (!mm)
 		return;
 	/*
@@ -959,9 +959,13 @@ void do_exit(long code)
 				preempt_count());
 
 	acct_update_integrals(tsk);
-	/* sync mm's RSS info before statistics gathering */
-	if (tsk->mm)
-		sync_mm_rss(tsk->mm);
+
+	/* Set exit_code before complete_vfork_done() in mm_release() */
+	tsk->exit_code = code;
+
+	/* Release mm and sync mm's RSS info before statistics gathering */
+	mm_release(tsk, tsk->mm);
+
 	group_dead = atomic_dec_and_test(&tsk->signal->live);
 	if (group_dead) {
 		hrtimer_cancel(&tsk->signal->real_timer);
@@ -974,7 +978,6 @@ void do_exit(long code)
 		tty_audit_exit();
 	audit_free(tsk);
 
-	tsk->exit_code = code;
 	taskstats_exit(tsk, group_dead);
 
 	exit_mm(tsk);
diff -puN kernel/fork.c~mm-correctly-synchronize-rss-counters-at-exit-exec kernel/fork.c
--- a/kernel/fork.c~mm-correctly-synchronize-rss-counters-at-exit-exec
+++ a/kernel/fork.c
@@ -782,6 +782,14 @@ void mm_release(struct task_struct *tsk,
 		}
 		tsk->clear_child_tid = NULL;
 	}
+
+	/*
+	 * Final rss-counter synchronization. After this point there must be
+	 * no pagefaults into this mm from the current context.  Otherwise
+	 * mm->rss_stat will be inconsistent.
+	 */
+	if (mm)
+		sync_mm_rss(mm);
 }
 
 /*
_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2012-04-25 20:01 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-09 20:03 [PATCH] mm: sync rss-counters at the end of exit_mm() Konstantin Khlebnikov
2012-04-09 20:03 ` Konstantin Khlebnikov
2012-04-09 21:04 ` KOSAKI Motohiro
2012-04-09 21:04   ` KOSAKI Motohiro
2012-04-09 22:03   ` Hugh Dickins
2012-04-10  0:33     ` KAMEZAWA Hiroyuki
2012-04-10  0:33       ` KAMEZAWA Hiroyuki
2012-04-10  6:34     ` Konstantin Khlebnikov
2012-04-10  6:34       ` Konstantin Khlebnikov
2012-04-10 16:04       ` Markus Trippelsdorf
2012-04-10 16:04         ` Markus Trippelsdorf
2012-04-10  0:35 ` KAMEZAWA Hiroyuki
2012-04-10  0:35   ` KAMEZAWA Hiroyuki
2012-04-10  5:43   ` Konstantin Khlebnikov
2012-04-10  5:43     ` Konstantin Khlebnikov
2012-04-10 17:07 ` [PATCH v2] mm: correctly synchronize rss-counters at exit/exec Konstantin Khlebnikov
2012-04-10 17:07   ` Konstantin Khlebnikov
2012-04-10 18:43   ` Andrew Morton
2012-04-10 18:43     ` Andrew Morton
2012-04-10 19:52     ` Konstantin Khlebnikov
2012-04-10 19:52       ` Konstantin Khlebnikov
2012-04-10 19:10   ` Oleg Nesterov
2012-04-10 19:10     ` Oleg Nesterov
2012-04-10 20:09     ` Konstantin Khlebnikov
2012-04-10 20:09       ` Konstantin Khlebnikov
2012-04-12  8:09 ` [PATCH 1/2] mm: set task exit code before complete_vfork_done() Konstantin Khlebnikov
2012-04-12  8:09   ` Konstantin Khlebnikov
2012-04-12 23:35   ` Andrew Morton
2012-04-12 23:35     ` Andrew Morton
2012-04-12 23:54   ` Oleg Nesterov
2012-04-12 23:54     ` Oleg Nesterov
2012-04-20 17:59     ` Oleg Nesterov
2012-04-20 17:59       ` Oleg Nesterov
2012-04-20 19:23       ` Konstantin Khlebnikov
2012-04-20 19:23         ` Konstantin Khlebnikov
2012-04-20 20:41         ` Oleg Nesterov
2012-04-20 20:41           ` Oleg Nesterov
2012-04-25 20:01           ` Andrew Morton
2012-04-25 20:01             ` Andrew Morton
2012-04-12  8:09 ` [PATCH 2/2] mm: call complete_vfork_done() after clearing child_tid and flushing rss-counters Konstantin Khlebnikov
2012-04-12  8:09   ` Konstantin Khlebnikov
2012-04-12 23:39   ` Andrew Morton
2012-04-12 23:39     ` Andrew Morton
2012-04-13  6:43     ` Konstantin Khlebnikov
2012-04-13  6:43       ` Konstantin Khlebnikov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.