* [PATCH] mm: sync rss-counters at the end of exit_mm()
@ 2012-04-09 20:03 ` Konstantin Khlebnikov
0 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-09 20:03 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, KAMEZAWA Hiroyuki, Hugh Dickins, linux-kernel,
Markus Trippelsdorf
On task's exit do_exit() calls sync_mm_rss() but this is not enough,
there can be page-faults after this point, for example exit_mm() ->
mm_release() -> put_user() (for processing tsk->clear_child_tid).
Thus there may be some rss-counters delta in current->rss_stat.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
kernel/exit.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/exit.c b/kernel/exit.c
index d8bd3b42..8e09dbe 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -683,6 +683,7 @@ static void exit_mm(struct task_struct * tsk)
enter_lazy_tlb(mm, current);
task_unlock(tsk);
mm_update_next_owner(mm);
+ sync_mm_rss(mm);
mmput(mm);
}
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH] mm: sync rss-counters at the end of exit_mm()
@ 2012-04-09 20:03 ` Konstantin Khlebnikov
0 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-09 20:03 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, KAMEZAWA Hiroyuki, Hugh Dickins, linux-kernel,
Markus Trippelsdorf
On task's exit do_exit() calls sync_mm_rss() but this is not enough,
there can be page-faults after this point, for example exit_mm() ->
mm_release() -> put_user() (for processing tsk->clear_child_tid).
Thus there may be some rss-counters delta in current->rss_stat.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
kernel/exit.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/exit.c b/kernel/exit.c
index d8bd3b42..8e09dbe 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -683,6 +683,7 @@ static void exit_mm(struct task_struct * tsk)
enter_lazy_tlb(mm, current);
task_unlock(tsk);
mm_update_next_owner(mm);
+ sync_mm_rss(mm);
mmput(mm);
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
2012-04-09 20:03 ` Konstantin Khlebnikov
@ 2012-04-09 21:04 ` KOSAKI Motohiro
-1 siblings, 0 replies; 45+ messages in thread
From: KOSAKI Motohiro @ 2012-04-09 21:04 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Andrew Morton, linux-mm, KAMEZAWA Hiroyuki, Hugh Dickins,
linux-kernel, Markus Trippelsdorf
On Mon, Apr 9, 2012 at 4:03 PM, Konstantin Khlebnikov
<khlebnikov@openvz.org> wrote:
> On task's exit do_exit() calls sync_mm_rss() but this is not enough,
> there can be page-faults after this point, for example exit_mm() ->
> mm_release() -> put_user() (for processing tsk->clear_child_tid).
> Thus there may be some rss-counters delta in current->rss_stat.
Seems reasonable. but I have another question. Do we have any reason to
keep sync_mm_rss() in do_exit()? I havn't seen any reason that thread exiting
makes rss consistency.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> kernel/exit.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index d8bd3b42..8e09dbe 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -683,6 +683,7 @@ static void exit_mm(struct task_struct * tsk)
> enter_lazy_tlb(mm, current);
> task_unlock(tsk);
> mm_update_next_owner(mm);
> + sync_mm_rss(mm);
> mmput(mm);
> }
>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
@ 2012-04-09 21:04 ` KOSAKI Motohiro
0 siblings, 0 replies; 45+ messages in thread
From: KOSAKI Motohiro @ 2012-04-09 21:04 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Andrew Morton, linux-mm, KAMEZAWA Hiroyuki, Hugh Dickins,
linux-kernel, Markus Trippelsdorf
On Mon, Apr 9, 2012 at 4:03 PM, Konstantin Khlebnikov
<khlebnikov@openvz.org> wrote:
> On task's exit do_exit() calls sync_mm_rss() but this is not enough,
> there can be page-faults after this point, for example exit_mm() ->
> mm_release() -> put_user() (for processing tsk->clear_child_tid).
> Thus there may be some rss-counters delta in current->rss_stat.
Seems reasonable. but I have another question. Do we have any reason to
keep sync_mm_rss() in do_exit()? I havn't seen any reason that thread exiting
makes rss consistency.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> kernel/exit.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index d8bd3b42..8e09dbe 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -683,6 +683,7 @@ static void exit_mm(struct task_struct * tsk)
> enter_lazy_tlb(mm, current);
> task_unlock(tsk);
> mm_update_next_owner(mm);
> + sync_mm_rss(mm);
> mmput(mm);
> }
>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
2012-04-09 21:04 ` KOSAKI Motohiro
(?)
@ 2012-04-09 22:03 ` Hugh Dickins
2012-04-10 0:33 ` KAMEZAWA Hiroyuki
2012-04-10 6:34 ` Konstantin Khlebnikov
-1 siblings, 2 replies; 45+ messages in thread
From: Hugh Dickins @ 2012-04-09 22:03 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Konstantin Khlebnikov, Andrew Morton, linux-mm,
KAMEZAWA Hiroyuki, linux-kernel, Markus Trippelsdorf
[-- Attachment #1: Type: TEXT/PLAIN, Size: 2258 bytes --]
On Mon, 9 Apr 2012, KOSAKI Motohiro wrote:
> On Mon, Apr 9, 2012 at 4:03 PM, Konstantin Khlebnikov
> <khlebnikov@openvz.org> wrote:
> > On task's exit do_exit() calls sync_mm_rss() but this is not enough,
> > there can be page-faults after this point, for example exit_mm() ->
> > mm_release() -> put_user() (for processing tsk->clear_child_tid).
> > Thus there may be some rss-counters delta in current->rss_stat.
>
> Seems reasonable.
Yes, I think Konstantin has probably caught it;
but I'd like to hear confirmation from Markus.
> but I have another question. Do we have any reason to
> keep sync_mm_rss() in do_exit()? I havn't seen any reason that thread exiting
> makes rss consistency.
IIRC it's all about the hiwater_rss/maxrss stuff: we want to sync the
maximum rss into mm->hiwater_rss before it's transferred to signal->maxrss,
and later made visible to the user though getrusage(RUSAGE_CHILDREN,) -
does your reading confirm that?
Konstantin now finds the child_tid and futex stuff can trigger faults
raising rss beyond that point, but usually it won't go higher than when
it was captured for maxrss there.
The sync_mm_rss() added by this patch (after "tsk->mm = NULL" so
*_mm_counter_fast() cannot store any more into the tsk even if there
were more faults) is solely to satisfy Konstantin's check_mm(), and
it is irritating to have that duplicated on the exit path.
I'd be happy to see the new one put under CONFIG_DEBUG_VM along with
check_mm(), once it's had a few -rcs of exposure without.
Hugh
>
>
> >
> > Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> > Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
> > Cc: Hugh Dickins <hughd@google.com>
> > Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> > kernel/exit.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/kernel/exit.c b/kernel/exit.c
> > index d8bd3b42..8e09dbe 100644
> > --- a/kernel/exit.c
> > +++ b/kernel/exit.c
> > @@ -683,6 +683,7 @@ static void exit_mm(struct task_struct * tsk)
> > enter_lazy_tlb(mm, current);
> > task_unlock(tsk);
> > mm_update_next_owner(mm);
> > + sync_mm_rss(mm);
> > mmput(mm);
> > }
> >
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
2012-04-09 22:03 ` Hugh Dickins
@ 2012-04-10 0:33 ` KAMEZAWA Hiroyuki
2012-04-10 6:34 ` Konstantin Khlebnikov
1 sibling, 0 replies; 45+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-10 0:33 UTC (permalink / raw)
To: Hugh Dickins
Cc: KOSAKI Motohiro, Konstantin Khlebnikov, Andrew Morton, linux-mm,
linux-kernel, Markus Trippelsdorf
(2012/04/10 7:03), Hugh Dickins wrote:
> On Mon, 9 Apr 2012, KOSAKI Motohiro wrote:
>> On Mon, Apr 9, 2012 at 4:03 PM, Konstantin Khlebnikov
>> <khlebnikov@openvz.org> wrote:
>>> On task's exit do_exit() calls sync_mm_rss() but this is not enough,
>>> there can be page-faults after this point, for example exit_mm() ->
>>> mm_release() -> put_user() (for processing tsk->clear_child_tid).
>>> Thus there may be some rss-counters delta in current->rss_stat.
>>
>> Seems reasonable.
>
> Yes, I think Konstantin has probably caught it;
> but I'd like to hear confirmation from Markus.
>
>> but I have another question. Do we have any reason to
>> keep sync_mm_rss() in do_exit()? I havn't seen any reason that thread exiting
>> makes rss consistency.
>
> IIRC it's all about the hiwater_rss/maxrss stuff: we want to sync the
> maximum rss into mm->hiwater_rss before it's transferred to signal->maxrss,
> and later made visible to the user though getrusage(RUSAGE_CHILDREN,) -
> does your reading confirm that?
>
IIRC, sync_mm_rss() in do_exit() is for synchronizing rsscounter for taskacct.
mm->maxrss is sent to listener by xacct_add_tsk(). It's needed to be
synchronized before taskstat_exit()..
Hm, but, exit_mm() is placed after taskstat_exit().
Thanks,
-Kame
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
@ 2012-04-10 0:33 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 45+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-10 0:33 UTC (permalink / raw)
To: Hugh Dickins
Cc: KOSAKI Motohiro, Konstantin Khlebnikov, Andrew Morton, linux-mm,
linux-kernel, Markus Trippelsdorf
(2012/04/10 7:03), Hugh Dickins wrote:
> On Mon, 9 Apr 2012, KOSAKI Motohiro wrote:
>> On Mon, Apr 9, 2012 at 4:03 PM, Konstantin Khlebnikov
>> <khlebnikov@openvz.org> wrote:
>>> On task's exit do_exit() calls sync_mm_rss() but this is not enough,
>>> there can be page-faults after this point, for example exit_mm() ->
>>> mm_release() -> put_user() (for processing tsk->clear_child_tid).
>>> Thus there may be some rss-counters delta in current->rss_stat.
>>
>> Seems reasonable.
>
> Yes, I think Konstantin has probably caught it;
> but I'd like to hear confirmation from Markus.
>
>> but I have another question. Do we have any reason to
>> keep sync_mm_rss() in do_exit()? I havn't seen any reason that thread exiting
>> makes rss consistency.
>
> IIRC it's all about the hiwater_rss/maxrss stuff: we want to sync the
> maximum rss into mm->hiwater_rss before it's transferred to signal->maxrss,
> and later made visible to the user though getrusage(RUSAGE_CHILDREN,) -
> does your reading confirm that?
>
IIRC, sync_mm_rss() in do_exit() is for synchronizing rsscounter for taskacct.
mm->maxrss is sent to listener by xacct_add_tsk(). It's needed to be
synchronized before taskstat_exit()..
Hm, but, exit_mm() is placed after taskstat_exit().
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
2012-04-09 20:03 ` Konstantin Khlebnikov
@ 2012-04-10 0:35 ` KAMEZAWA Hiroyuki
-1 siblings, 0 replies; 45+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-10 0:35 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Andrew Morton, linux-mm, Hugh Dickins, linux-kernel, Markus Trippelsdorf
(2012/04/10 5:03), Konstantin Khlebnikov wrote:
> On task's exit do_exit() calls sync_mm_rss() but this is not enough,
> there can be page-faults after this point, for example exit_mm() ->
> mm_release() -> put_user() (for processing tsk->clear_child_tid).
> Thus there may be some rss-counters delta in current->rss_stat.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Does this fix recent issue reported ?
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
@ 2012-04-10 0:35 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 45+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-10 0:35 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Andrew Morton, linux-mm, Hugh Dickins, linux-kernel, Markus Trippelsdorf
(2012/04/10 5:03), Konstantin Khlebnikov wrote:
> On task's exit do_exit() calls sync_mm_rss() but this is not enough,
> there can be page-faults after this point, for example exit_mm() ->
> mm_release() -> put_user() (for processing tsk->clear_child_tid).
> Thus there may be some rss-counters delta in current->rss_stat.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Does this fix recent issue reported ?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
2012-04-10 0:35 ` KAMEZAWA Hiroyuki
@ 2012-04-10 5:43 ` Konstantin Khlebnikov
-1 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-10 5:43 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Andrew Morton, linux-mm, Hugh Dickins, linux-kernel, Markus Trippelsdorf
KAMEZAWA Hiroyuki wrote:
> (2012/04/10 5:03), Konstantin Khlebnikov wrote:
>
>> On task's exit do_exit() calls sync_mm_rss() but this is not enough,
>> there can be page-faults after this point, for example exit_mm() ->
>> mm_release() -> put_user() (for processing tsk->clear_child_tid).
>> Thus there may be some rss-counters delta in current->rss_stat.
I had to mention it in comment:
This should fix warnings:
BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1
>>
>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>> Reported-by: Markus Trippelsdorf<markus@trippelsdorf.de>
>> Cc: Hugh Dickins<hughd@google.com>
>> Cc: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>
> Reviewed-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>
> Does this fix recent issue reported ?
>
I hope so.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
@ 2012-04-10 5:43 ` Konstantin Khlebnikov
0 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-10 5:43 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Andrew Morton, linux-mm, Hugh Dickins, linux-kernel, Markus Trippelsdorf
KAMEZAWA Hiroyuki wrote:
> (2012/04/10 5:03), Konstantin Khlebnikov wrote:
>
>> On task's exit do_exit() calls sync_mm_rss() but this is not enough,
>> there can be page-faults after this point, for example exit_mm() ->
>> mm_release() -> put_user() (for processing tsk->clear_child_tid).
>> Thus there may be some rss-counters delta in current->rss_stat.
I had to mention it in comment:
This should fix warnings:
BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1
>>
>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>> Reported-by: Markus Trippelsdorf<markus@trippelsdorf.de>
>> Cc: Hugh Dickins<hughd@google.com>
>> Cc: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>
> Reviewed-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>
> Does this fix recent issue reported ?
>
I hope so.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
2012-04-09 22:03 ` Hugh Dickins
@ 2012-04-10 6:34 ` Konstantin Khlebnikov
2012-04-10 6:34 ` Konstantin Khlebnikov
1 sibling, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-10 6:34 UTC (permalink / raw)
To: Hugh Dickins
Cc: KOSAKI Motohiro, Andrew Morton, linux-mm, KAMEZAWA Hiroyuki,
linux-kernel, Markus Trippelsdorf
Hugh Dickins wrote:
> On Mon, 9 Apr 2012, KOSAKI Motohiro wrote:
>> On Mon, Apr 9, 2012 at 4:03 PM, Konstantin Khlebnikov
>> <khlebnikov@openvz.org> wrote:
>>> On task's exit do_exit() calls sync_mm_rss() but this is not enough,
>>> there can be page-faults after this point, for example exit_mm() ->
>>> mm_release() -> put_user() (for processing tsk->clear_child_tid).
>>> Thus there may be some rss-counters delta in current->rss_stat.
>>
>> Seems reasonable.
>
> Yes, I think Konstantin has probably caught it;
> but I'd like to hear confirmation from Markus.
There is another bug in exec_mmap()
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -823,8 +823,8 @@ static int exec_mmap(struct mm_struct *mm)
/* Notify parent that we're no longer interested in the old VM */
tsk = current;
old_mm = current->mm;
- sync_mm_rss(old_mm);
mm_release(tsk, old_mm);
+ sync_mm_rss(old_mm);
if (old_mm) {
/*
>
>> but I have another question. Do we have any reason to
>> keep sync_mm_rss() in do_exit()? I havn't seen any reason that thread exiting
>> makes rss consistency.
>
> IIRC it's all about the hiwater_rss/maxrss stuff: we want to sync the
> maximum rss into mm->hiwater_rss before it's transferred to signal->maxrss,
> and later made visible to the user though getrusage(RUSAGE_CHILDREN,) -
> does your reading confirm that?
>
> Konstantin now finds the child_tid and futex stuff can trigger faults
> raising rss beyond that point, but usually it won't go higher than when
> it was captured for maxrss there.
>
> The sync_mm_rss() added by this patch (after "tsk->mm = NULL" so
> *_mm_counter_fast() cannot store any more into the tsk even if there
> were more faults) is solely to satisfy Konstantin's check_mm(), and
> it is irritating to have that duplicated on the exit path.
It was quick fix after the midnight. =) Now I think we can move mm_release()
from exit_mm() to do_exit(), and place it before sync_mm_rss(). Other stuff
there shouldn't trigger page-faults. Thus here will be only one sync_mm_rss():
at the end of mm_release()
>
> I'd be happy to see the new one put under CONFIG_DEBUG_VM along with
> check_mm(), once it's had a few -rcs of exposure without.
>
> Hugh
>
>>
>>
>>>
>>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>>> Reported-by: Markus Trippelsdorf<markus@trippelsdorf.de>
>>> Cc: Hugh Dickins<hughd@google.com>
>>> Cc: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>>> ---
>>> kernel/exit.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/kernel/exit.c b/kernel/exit.c
>>> index d8bd3b42..8e09dbe 100644
>>> --- a/kernel/exit.c
>>> +++ b/kernel/exit.c
>>> @@ -683,6 +683,7 @@ static void exit_mm(struct task_struct * tsk)
>>> enter_lazy_tlb(mm, current);
>>> task_unlock(tsk);
>>> mm_update_next_owner(mm);
>>> + sync_mm_rss(mm);
>>> mmput(mm);
>>> }
>> >
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
@ 2012-04-10 6:34 ` Konstantin Khlebnikov
0 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-10 6:34 UTC (permalink / raw)
To: Hugh Dickins
Cc: KOSAKI Motohiro, Andrew Morton, linux-mm, KAMEZAWA Hiroyuki,
linux-kernel, Markus Trippelsdorf
Hugh Dickins wrote:
> On Mon, 9 Apr 2012, KOSAKI Motohiro wrote:
>> On Mon, Apr 9, 2012 at 4:03 PM, Konstantin Khlebnikov
>> <khlebnikov@openvz.org> wrote:
>>> On task's exit do_exit() calls sync_mm_rss() but this is not enough,
>>> there can be page-faults after this point, for example exit_mm() ->
>>> mm_release() -> put_user() (for processing tsk->clear_child_tid).
>>> Thus there may be some rss-counters delta in current->rss_stat.
>>
>> Seems reasonable.
>
> Yes, I think Konstantin has probably caught it;
> but I'd like to hear confirmation from Markus.
There is another bug in exec_mmap()
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -823,8 +823,8 @@ static int exec_mmap(struct mm_struct *mm)
/* Notify parent that we're no longer interested in the old VM */
tsk = current;
old_mm = current->mm;
- sync_mm_rss(old_mm);
mm_release(tsk, old_mm);
+ sync_mm_rss(old_mm);
if (old_mm) {
/*
>
>> but I have another question. Do we have any reason to
>> keep sync_mm_rss() in do_exit()? I havn't seen any reason that thread exiting
>> makes rss consistency.
>
> IIRC it's all about the hiwater_rss/maxrss stuff: we want to sync the
> maximum rss into mm->hiwater_rss before it's transferred to signal->maxrss,
> and later made visible to the user though getrusage(RUSAGE_CHILDREN,) -
> does your reading confirm that?
>
> Konstantin now finds the child_tid and futex stuff can trigger faults
> raising rss beyond that point, but usually it won't go higher than when
> it was captured for maxrss there.
>
> The sync_mm_rss() added by this patch (after "tsk->mm = NULL" so
> *_mm_counter_fast() cannot store any more into the tsk even if there
> were more faults) is solely to satisfy Konstantin's check_mm(), and
> it is irritating to have that duplicated on the exit path.
It was quick fix after the midnight. =) Now I think we can move mm_release()
from exit_mm() to do_exit(), and place it before sync_mm_rss(). Other stuff
there shouldn't trigger page-faults. Thus here will be only one sync_mm_rss():
at the end of mm_release()
>
> I'd be happy to see the new one put under CONFIG_DEBUG_VM along with
> check_mm(), once it's had a few -rcs of exposure without.
>
> Hugh
>
>>
>>
>>>
>>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>>> Reported-by: Markus Trippelsdorf<markus@trippelsdorf.de>
>>> Cc: Hugh Dickins<hughd@google.com>
>>> Cc: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>>> ---
>>> kernel/exit.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/kernel/exit.c b/kernel/exit.c
>>> index d8bd3b42..8e09dbe 100644
>>> --- a/kernel/exit.c
>>> +++ b/kernel/exit.c
>>> @@ -683,6 +683,7 @@ static void exit_mm(struct task_struct * tsk)
>>> enter_lazy_tlb(mm, current);
>>> task_unlock(tsk);
>>> mm_update_next_owner(mm);
>>> + sync_mm_rss(mm);
>>> mmput(mm);
>>> }
>> >
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
2012-04-10 6:34 ` Konstantin Khlebnikov
@ 2012-04-10 16:04 ` Markus Trippelsdorf
-1 siblings, 0 replies; 45+ messages in thread
From: Markus Trippelsdorf @ 2012-04-10 16:04 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Hugh Dickins, KOSAKI Motohiro, Andrew Morton, linux-mm,
KAMEZAWA Hiroyuki, linux-kernel
On 2012.04.10 at 10:34 +0400, Konstantin Khlebnikov wrote:
> Hugh Dickins wrote:
> > On Mon, 9 Apr 2012, KOSAKI Motohiro wrote:
> >> On Mon, Apr 9, 2012 at 4:03 PM, Konstantin Khlebnikov
> >> <khlebnikov@openvz.org> wrote:
> >>> On task's exit do_exit() calls sync_mm_rss() but this is not enough,
> >>> there can be page-faults after this point, for example exit_mm() ->
> >>> mm_release() -> put_user() (for processing tsk->clear_child_tid).
> >>> Thus there may be some rss-counters delta in current->rss_stat.
> >>
> >> Seems reasonable.
> >
> > Yes, I think Konstantin has probably caught it;
> > but I'd like to hear confirmation from Markus.
>
> There is another bug in exec_mmap()
>
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -823,8 +823,8 @@ static int exec_mmap(struct mm_struct *mm)
> /* Notify parent that we're no longer interested in the old VM */
> tsk = current;
> old_mm = current->mm;
> - sync_mm_rss(old_mm);
> mm_release(tsk, old_mm);
> + sync_mm_rss(old_mm);
>
> if (old_mm) {
> /*
FWIW with both patches applied I cannot reproduce the issue anymore.
Thanks.
--
Markus
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] mm: sync rss-counters at the end of exit_mm()
@ 2012-04-10 16:04 ` Markus Trippelsdorf
0 siblings, 0 replies; 45+ messages in thread
From: Markus Trippelsdorf @ 2012-04-10 16:04 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Hugh Dickins, KOSAKI Motohiro, Andrew Morton, linux-mm,
KAMEZAWA Hiroyuki, linux-kernel
On 2012.04.10 at 10:34 +0400, Konstantin Khlebnikov wrote:
> Hugh Dickins wrote:
> > On Mon, 9 Apr 2012, KOSAKI Motohiro wrote:
> >> On Mon, Apr 9, 2012 at 4:03 PM, Konstantin Khlebnikov
> >> <khlebnikov@openvz.org> wrote:
> >>> On task's exit do_exit() calls sync_mm_rss() but this is not enough,
> >>> there can be page-faults after this point, for example exit_mm() ->
> >>> mm_release() -> put_user() (for processing tsk->clear_child_tid).
> >>> Thus there may be some rss-counters delta in current->rss_stat.
> >>
> >> Seems reasonable.
> >
> > Yes, I think Konstantin has probably caught it;
> > but I'd like to hear confirmation from Markus.
>
> There is another bug in exec_mmap()
>
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -823,8 +823,8 @@ static int exec_mmap(struct mm_struct *mm)
> /* Notify parent that we're no longer interested in the old VM */
> tsk = current;
> old_mm = current->mm;
> - sync_mm_rss(old_mm);
> mm_release(tsk, old_mm);
> + sync_mm_rss(old_mm);
>
> if (old_mm) {
> /*
FWIW with both patches applied I cannot reproduce the issue anymore.
Thanks.
--
Markus
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH v2] mm: correctly synchronize rss-counters at exit/exec
2012-04-09 20:03 ` Konstantin Khlebnikov
@ 2012-04-10 17:07 ` Konstantin Khlebnikov
-1 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-10 17:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, linux-kernel, Oleg Nesterov, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
mm->rss_stat counters have per-task delta: task->rss_stat, before changing
task->mm pointer kernel must flush this delta with help of sync_mm_rss().
do_exit() already calls sync_mm_rss() to flush rss-counters before commiting
rss-statistics into task->signal->maxrss, taskstats, audit and other stuff.
Unfortunately kernel do this before calling mm_relese(), which can call put_user()
for processing task->clear_child_tid. So at this point we can trigger page-faults
and task->rss_stat becomes non-zero again, as result mm->rss_stat becomes
inconsistent and check_mm() will print something like this:
| BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
| BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1
This patch moves sync_mm_rss() into mm_release(), and moves mm_release() out of
do_exit() and calls it earlier. After mm_release() there should be no page-faults.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
---
fs/exec.c | 1 -
kernel/exit.c | 9 +++++----
kernel/fork.c | 8 ++++++++
3 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index b1fd202..5be1d97 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -823,7 +823,6 @@ static int exec_mmap(struct mm_struct *mm)
/* Notify parent that we're no longer interested in the old VM */
tsk = current;
old_mm = current->mm;
- sync_mm_rss(old_mm);
mm_release(tsk, old_mm);
if (old_mm) {
diff --git a/kernel/exit.c b/kernel/exit.c
index d8bd3b42..eb12719 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -423,6 +423,7 @@ void daemonize(const char *name, ...)
* user space pages. We don't need them, and if we didn't close them
* they would be locked into memory.
*/
+ mm_release(current, current->mm);
exit_mm(current);
/*
* We don't want to get frozen, in case system-wide hibernation
@@ -640,7 +641,6 @@ static void exit_mm(struct task_struct * tsk)
struct mm_struct *mm = tsk->mm;
struct core_state *core_state;
- mm_release(tsk, mm);
if (!mm)
return;
/*
@@ -959,9 +959,10 @@ void do_exit(long code)
preempt_count());
acct_update_integrals(tsk);
- /* sync mm's RSS info before statistics gathering */
- if (tsk->mm)
- sync_mm_rss(tsk->mm);
+
+ /* Release mm and sync mm's RSS info before statistics gathering */
+ mm_release(tsk, tsk->mm);
+
group_dead = atomic_dec_and_test(&tsk->signal->live);
if (group_dead) {
hrtimer_cancel(&tsk->signal->real_timer);
diff --git a/kernel/fork.c b/kernel/fork.c
index 54662ed..326bb5b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -751,6 +751,14 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
}
tsk->clear_child_tid = NULL;
}
+
+ /*
+ * Final rss-counter synchronization. After this point must be
+ * no page-faults into this mm from current context, otherwise
+ * mm->rss_stat will be inconsistent.
+ */
+ if (mm)
+ sync_mm_rss(mm);
}
/*
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v2] mm: correctly synchronize rss-counters at exit/exec
@ 2012-04-10 17:07 ` Konstantin Khlebnikov
0 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-10 17:07 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, linux-kernel, Oleg Nesterov, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
mm->rss_stat counters have per-task delta: task->rss_stat, before changing
task->mm pointer kernel must flush this delta with help of sync_mm_rss().
do_exit() already calls sync_mm_rss() to flush rss-counters before commiting
rss-statistics into task->signal->maxrss, taskstats, audit and other stuff.
Unfortunately kernel do this before calling mm_relese(), which can call put_user()
for processing task->clear_child_tid. So at this point we can trigger page-faults
and task->rss_stat becomes non-zero again, as result mm->rss_stat becomes
inconsistent and check_mm() will print something like this:
| BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
| BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1
This patch moves sync_mm_rss() into mm_release(), and moves mm_release() out of
do_exit() and calls it earlier. After mm_release() there should be no page-faults.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
---
fs/exec.c | 1 -
kernel/exit.c | 9 +++++----
kernel/fork.c | 8 ++++++++
3 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index b1fd202..5be1d97 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -823,7 +823,6 @@ static int exec_mmap(struct mm_struct *mm)
/* Notify parent that we're no longer interested in the old VM */
tsk = current;
old_mm = current->mm;
- sync_mm_rss(old_mm);
mm_release(tsk, old_mm);
if (old_mm) {
diff --git a/kernel/exit.c b/kernel/exit.c
index d8bd3b42..eb12719 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -423,6 +423,7 @@ void daemonize(const char *name, ...)
* user space pages. We don't need them, and if we didn't close them
* they would be locked into memory.
*/
+ mm_release(current, current->mm);
exit_mm(current);
/*
* We don't want to get frozen, in case system-wide hibernation
@@ -640,7 +641,6 @@ static void exit_mm(struct task_struct * tsk)
struct mm_struct *mm = tsk->mm;
struct core_state *core_state;
- mm_release(tsk, mm);
if (!mm)
return;
/*
@@ -959,9 +959,10 @@ void do_exit(long code)
preempt_count());
acct_update_integrals(tsk);
- /* sync mm's RSS info before statistics gathering */
- if (tsk->mm)
- sync_mm_rss(tsk->mm);
+
+ /* Release mm and sync mm's RSS info before statistics gathering */
+ mm_release(tsk, tsk->mm);
+
group_dead = atomic_dec_and_test(&tsk->signal->live);
if (group_dead) {
hrtimer_cancel(&tsk->signal->real_timer);
diff --git a/kernel/fork.c b/kernel/fork.c
index 54662ed..326bb5b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -751,6 +751,14 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
}
tsk->clear_child_tid = NULL;
}
+
+ /*
+ * Final rss-counter synchronization. After this point must be
+ * no page-faults into this mm from current context, otherwise
+ * mm->rss_stat will be inconsistent.
+ */
+ if (mm)
+ sync_mm_rss(mm);
}
/*
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH v2] mm: correctly synchronize rss-counters at exit/exec
2012-04-10 17:07 ` Konstantin Khlebnikov
@ 2012-04-10 18:43 ` Andrew Morton
-1 siblings, 0 replies; 45+ messages in thread
From: Andrew Morton @ 2012-04-10 18:43 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Hugh Dickins, linux-kernel, Oleg Nesterov, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
On Tue, 10 Apr 2012 21:07:32 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:
> mm->rss_stat counters have per-task delta: task->rss_stat, before changing
> task->mm pointer kernel must flush this delta with help of sync_mm_rss().
>
> do_exit() already calls sync_mm_rss() to flush rss-counters before commiting
> rss-statistics into task->signal->maxrss, taskstats, audit and other stuff.
> Unfortunately kernel do this before calling mm_relese(), which can call put_user()
> for processing task->clear_child_tid. So at this point we can trigger page-faults
> and task->rss_stat becomes non-zero again, as result mm->rss_stat becomes
> inconsistent and check_mm() will print something like this:
>
> | BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
> | BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1
>
> This patch moves sync_mm_rss() into mm_release(), and moves mm_release() out of
> do_exit() and calls it earlier. After mm_release() there should be no page-faults.
>
> ...
>
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -751,6 +751,14 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
> }
> tsk->clear_child_tid = NULL;
> }
> +
> + /*
> + * Final rss-counter synchronization. After this point must be
> + * no page-faults into this mm from current context, otherwise
> + * mm->rss_stat will be inconsistent.
> + */
> + if (mm)
> + sync_mm_rss(mm);
> }
>
Well that's scary. AFACIT `mm' can indeed be NULL here, when a kernel
thread calls do_exit(). No implementation of deactivate_mm() actually
uses its `mm' arg and I guess that kernel threads never set
tsk->clear_child_tid. Whee.
Do we think we should backport this into -stable kernels? How hard is
it to make that warning come out?
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2] mm: correctly synchronize rss-counters at exit/exec
@ 2012-04-10 18:43 ` Andrew Morton
0 siblings, 0 replies; 45+ messages in thread
From: Andrew Morton @ 2012-04-10 18:43 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Hugh Dickins, linux-kernel, Oleg Nesterov, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
On Tue, 10 Apr 2012 21:07:32 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:
> mm->rss_stat counters have per-task delta: task->rss_stat, before changing
> task->mm pointer kernel must flush this delta with help of sync_mm_rss().
>
> do_exit() already calls sync_mm_rss() to flush rss-counters before commiting
> rss-statistics into task->signal->maxrss, taskstats, audit and other stuff.
> Unfortunately kernel do this before calling mm_relese(), which can call put_user()
> for processing task->clear_child_tid. So at this point we can trigger page-faults
> and task->rss_stat becomes non-zero again, as result mm->rss_stat becomes
> inconsistent and check_mm() will print something like this:
>
> | BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
> | BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1
>
> This patch moves sync_mm_rss() into mm_release(), and moves mm_release() out of
> do_exit() and calls it earlier. After mm_release() there should be no page-faults.
>
> ...
>
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -751,6 +751,14 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
> }
> tsk->clear_child_tid = NULL;
> }
> +
> + /*
> + * Final rss-counter synchronization. After this point must be
> + * no page-faults into this mm from current context, otherwise
> + * mm->rss_stat will be inconsistent.
> + */
> + if (mm)
> + sync_mm_rss(mm);
> }
>
Well that's scary. AFACIT `mm' can indeed be NULL here, when a kernel
thread calls do_exit(). No implementation of deactivate_mm() actually
uses its `mm' arg and I guess that kernel threads never set
tsk->clear_child_tid. Whee.
Do we think we should backport this into -stable kernels? How hard is
it to make that warning come out?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2] mm: correctly synchronize rss-counters at exit/exec
2012-04-10 17:07 ` Konstantin Khlebnikov
@ 2012-04-10 19:10 ` Oleg Nesterov
-1 siblings, 0 replies; 45+ messages in thread
From: Oleg Nesterov @ 2012-04-10 19:10 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Andrew Morton, Hugh Dickins, linux-kernel, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
On 04/10, Konstantin Khlebnikov wrote:
>
> This patch moves sync_mm_rss() into mm_release(), and moves mm_release() out of
> do_exit() and calls it earlier. After mm_release() there should be no page-faults.
Can't prove, but I feel there should be a simpler fix...
Anyway, this patch is not exactly correct.
> @@ -959,9 +959,10 @@ void do_exit(long code)
> preempt_count());
>
> acct_update_integrals(tsk);
> - /* sync mm's RSS info before statistics gathering */
> - if (tsk->mm)
> - sync_mm_rss(tsk->mm);
> +
> + /* Release mm and sync mm's RSS info before statistics gathering */
> + mm_release(tsk, tsk->mm);
This breaks kthread_stop() at least.
The exiting kthread shouldn't do complete_vfork_done() until it
sets ->exit_code.
Oleg.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2] mm: correctly synchronize rss-counters at exit/exec
@ 2012-04-10 19:10 ` Oleg Nesterov
0 siblings, 0 replies; 45+ messages in thread
From: Oleg Nesterov @ 2012-04-10 19:10 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Andrew Morton, Hugh Dickins, linux-kernel, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
On 04/10, Konstantin Khlebnikov wrote:
>
> This patch moves sync_mm_rss() into mm_release(), and moves mm_release() out of
> do_exit() and calls it earlier. After mm_release() there should be no page-faults.
Can't prove, but I feel there should be a simpler fix...
Anyway, this patch is not exactly correct.
> @@ -959,9 +959,10 @@ void do_exit(long code)
> preempt_count());
>
> acct_update_integrals(tsk);
> - /* sync mm's RSS info before statistics gathering */
> - if (tsk->mm)
> - sync_mm_rss(tsk->mm);
> +
> + /* Release mm and sync mm's RSS info before statistics gathering */
> + mm_release(tsk, tsk->mm);
This breaks kthread_stop() at least.
The exiting kthread shouldn't do complete_vfork_done() until it
sets ->exit_code.
Oleg.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2] mm: correctly synchronize rss-counters at exit/exec
2012-04-10 18:43 ` Andrew Morton
@ 2012-04-10 19:52 ` Konstantin Khlebnikov
-1 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-10 19:52 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, linux-kernel, Oleg Nesterov, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
Andrew Morton wrote:
> On Tue, 10 Apr 2012 21:07:32 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org> wrote:
>
>> mm->rss_stat counters have per-task delta: task->rss_stat, before changing
>> task->mm pointer kernel must flush this delta with help of sync_mm_rss().
>>
>> do_exit() already calls sync_mm_rss() to flush rss-counters before commiting
>> rss-statistics into task->signal->maxrss, taskstats, audit and other stuff.
>> Unfortunately kernel do this before calling mm_relese(), which can call put_user()
>> for processing task->clear_child_tid. So at this point we can trigger page-faults
>> and task->rss_stat becomes non-zero again, as result mm->rss_stat becomes
>> inconsistent and check_mm() will print something like this:
>>
>> | BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
>> | BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1
>>
>> This patch moves sync_mm_rss() into mm_release(), and moves mm_release() out of
>> do_exit() and calls it earlier. After mm_release() there should be no page-faults.
>>
>> ...
>>
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -751,6 +751,14 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>> }
>> tsk->clear_child_tid = NULL;
>> }
>> +
>> + /*
>> + * Final rss-counter synchronization. After this point must be
>> + * no page-faults into this mm from current context, otherwise
>> + * mm->rss_stat will be inconsistent.
>> + */
>> + if (mm)
>> + sync_mm_rss(mm);
>> }
>>
>
> Well that's scary. AFACIT `mm' can indeed be NULL here, when a kernel
> thread calls do_exit(). No implementation of deactivate_mm() actually
> uses its `mm' arg and I guess that kernel threads never set
> tsk->clear_child_tid. Whee.
But it works as designed =)
>
>
> Do we think we should backport this into -stable kernels? How hard is
> it to make that warning come out?
>
Warning was introduced in v3.3-3720-gc3f0327, so no releases with them.
And I think older kernels can live fine with slightly racy rss counters.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2] mm: correctly synchronize rss-counters at exit/exec
@ 2012-04-10 19:52 ` Konstantin Khlebnikov
0 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-10 19:52 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, linux-kernel, Oleg Nesterov, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
Andrew Morton wrote:
> On Tue, 10 Apr 2012 21:07:32 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org> wrote:
>
>> mm->rss_stat counters have per-task delta: task->rss_stat, before changing
>> task->mm pointer kernel must flush this delta with help of sync_mm_rss().
>>
>> do_exit() already calls sync_mm_rss() to flush rss-counters before commiting
>> rss-statistics into task->signal->maxrss, taskstats, audit and other stuff.
>> Unfortunately kernel do this before calling mm_relese(), which can call put_user()
>> for processing task->clear_child_tid. So at this point we can trigger page-faults
>> and task->rss_stat becomes non-zero again, as result mm->rss_stat becomes
>> inconsistent and check_mm() will print something like this:
>>
>> | BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
>> | BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1
>>
>> This patch moves sync_mm_rss() into mm_release(), and moves mm_release() out of
>> do_exit() and calls it earlier. After mm_release() there should be no page-faults.
>>
>> ...
>>
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -751,6 +751,14 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>> }
>> tsk->clear_child_tid = NULL;
>> }
>> +
>> + /*
>> + * Final rss-counter synchronization. After this point must be
>> + * no page-faults into this mm from current context, otherwise
>> + * mm->rss_stat will be inconsistent.
>> + */
>> + if (mm)
>> + sync_mm_rss(mm);
>> }
>>
>
> Well that's scary. AFACIT `mm' can indeed be NULL here, when a kernel
> thread calls do_exit(). No implementation of deactivate_mm() actually
> uses its `mm' arg and I guess that kernel threads never set
> tsk->clear_child_tid. Whee.
But it works as designed =)
>
>
> Do we think we should backport this into -stable kernels? How hard is
> it to make that warning come out?
>
Warning was introduced in v3.3-3720-gc3f0327, so no releases with them.
And I think older kernels can live fine with slightly racy rss counters.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2] mm: correctly synchronize rss-counters at exit/exec
2012-04-10 19:10 ` Oleg Nesterov
@ 2012-04-10 20:09 ` Konstantin Khlebnikov
-1 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-10 20:09 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Hugh Dickins, linux-kernel, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
Oleg Nesterov wrote:
> On 04/10, Konstantin Khlebnikov wrote:
>>
>> This patch moves sync_mm_rss() into mm_release(), and moves mm_release() out of
>> do_exit() and calls it earlier. After mm_release() there should be no page-faults.
>
> Can't prove, but I feel there should be a simpler fix...
>
> Anyway, this patch is not exactly correct.
>
>> @@ -959,9 +959,10 @@ void do_exit(long code)
>> preempt_count());
>>
>> acct_update_integrals(tsk);
>> - /* sync mm's RSS info before statistics gathering */
>> - if (tsk->mm)
>> - sync_mm_rss(tsk->mm);
>> +
>> + /* Release mm and sync mm's RSS info before statistics gathering */
>> + mm_release(tsk, tsk->mm);
>
> This breaks kthread_stop() at least.
>
> The exiting kthread shouldn't do complete_vfork_done() until it
> sets ->exit_code.
Ouch, I was afraid something like that.
>
> Oleg.
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v2] mm: correctly synchronize rss-counters at exit/exec
@ 2012-04-10 20:09 ` Konstantin Khlebnikov
0 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-10 20:09 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Hugh Dickins, linux-kernel, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
Oleg Nesterov wrote:
> On 04/10, Konstantin Khlebnikov wrote:
>>
>> This patch moves sync_mm_rss() into mm_release(), and moves mm_release() out of
>> do_exit() and calls it earlier. After mm_release() there should be no page-faults.
>
> Can't prove, but I feel there should be a simpler fix...
>
> Anyway, this patch is not exactly correct.
>
>> @@ -959,9 +959,10 @@ void do_exit(long code)
>> preempt_count());
>>
>> acct_update_integrals(tsk);
>> - /* sync mm's RSS info before statistics gathering */
>> - if (tsk->mm)
>> - sync_mm_rss(tsk->mm);
>> +
>> + /* Release mm and sync mm's RSS info before statistics gathering */
>> + mm_release(tsk, tsk->mm);
>
> This breaks kthread_stop() at least.
>
> The exiting kthread shouldn't do complete_vfork_done() until it
> sets ->exit_code.
Ouch, I was afraid something like that.
>
> Oleg.
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH 1/2] mm: set task exit code before complete_vfork_done()
2012-04-09 20:03 ` Konstantin Khlebnikov
@ 2012-04-12 8:09 ` Konstantin Khlebnikov
-1 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-12 8:09 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, Oleg Nesterov, linux-kernel, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
kthread_stop() uses task->vfork_done for synchronization. The exiting kthread
shouldn't do complete_vfork_done() until it sets ->exit_code.
fix for mm-correctly-synchronize-rss-counters-at-exit-exec.patch
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
---
kernel/exit.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/exit.c b/kernel/exit.c
index eb12719..70875a6 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -960,6 +960,9 @@ void do_exit(long code)
acct_update_integrals(tsk);
+ /* Set exit_code before complete_vfork_done() in mm_release() */
+ tsk->exit_code = code;
+
/* Release mm and sync mm's RSS info before statistics gathering */
mm_release(tsk, tsk->mm);
@@ -975,7 +978,6 @@ void do_exit(long code)
tty_audit_exit();
audit_free(tsk);
- tsk->exit_code = code;
taskstats_exit(tsk, group_dead);
exit_mm(tsk);
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH 1/2] mm: set task exit code before complete_vfork_done()
@ 2012-04-12 8:09 ` Konstantin Khlebnikov
0 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-12 8:09 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, Oleg Nesterov, linux-kernel, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
kthread_stop() uses task->vfork_done for synchronization. The exiting kthread
shouldn't do complete_vfork_done() until it sets ->exit_code.
fix for mm-correctly-synchronize-rss-counters-at-exit-exec.patch
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
---
kernel/exit.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/exit.c b/kernel/exit.c
index eb12719..70875a6 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -960,6 +960,9 @@ void do_exit(long code)
acct_update_integrals(tsk);
+ /* Set exit_code before complete_vfork_done() in mm_release() */
+ tsk->exit_code = code;
+
/* Release mm and sync mm's RSS info before statistics gathering */
mm_release(tsk, tsk->mm);
@@ -975,7 +978,6 @@ void do_exit(long code)
tty_audit_exit();
audit_free(tsk);
- tsk->exit_code = code;
taskstats_exit(tsk, group_dead);
exit_mm(tsk);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH 2/2] mm: call complete_vfork_done() after clearing child_tid and flushing rss-counters
2012-04-09 20:03 ` Konstantin Khlebnikov
@ 2012-04-12 8:09 ` Konstantin Khlebnikov
-1 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-12 8:09 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, Oleg Nesterov, linux-kernel, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
Child should wake ups parent from vfork() only after finishing all operations with
shared mm. There is no sense to use CLONE_CHILD_CLEARTID together with CLONE_VFORK,
but it looks more accurate now.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
---
kernel/fork.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index 326bb5b..f10ac1d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -728,9 +728,6 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
/* Get rid of any cached register state */
deactivate_mm(tsk, mm);
- if (tsk->vfork_done)
- complete_vfork_done(tsk);
-
/*
* If we're exiting normally, clear a user-space tid field if
* requested. We leave this alone when dying by signal, to leave
@@ -759,6 +756,13 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
*/
if (mm)
sync_mm_rss(mm);
+
+ /*
+ * All done, finally we can wake up parent and return this mm to him.
+ * Also kthread_stop() uses this completion for synchronization.
+ */
+ if (tsk->vfork_done)
+ complete_vfork_done(tsk);
}
/*
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH 2/2] mm: call complete_vfork_done() after clearing child_tid and flushing rss-counters
@ 2012-04-12 8:09 ` Konstantin Khlebnikov
0 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-12 8:09 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, Oleg Nesterov, linux-kernel, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
Child should wake ups parent from vfork() only after finishing all operations with
shared mm. There is no sense to use CLONE_CHILD_CLEARTID together with CLONE_VFORK,
but it looks more accurate now.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
---
kernel/fork.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index 326bb5b..f10ac1d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -728,9 +728,6 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
/* Get rid of any cached register state */
deactivate_mm(tsk, mm);
- if (tsk->vfork_done)
- complete_vfork_done(tsk);
-
/*
* If we're exiting normally, clear a user-space tid field if
* requested. We leave this alone when dying by signal, to leave
@@ -759,6 +756,13 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
*/
if (mm)
sync_mm_rss(mm);
+
+ /*
+ * All done, finally we can wake up parent and return this mm to him.
+ * Also kthread_stop() uses this completion for synchronization.
+ */
+ if (tsk->vfork_done)
+ complete_vfork_done(tsk);
}
/*
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()
2012-04-12 8:09 ` Konstantin Khlebnikov
@ 2012-04-12 23:35 ` Andrew Morton
-1 siblings, 0 replies; 45+ messages in thread
From: Andrew Morton @ 2012-04-12 23:35 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Hugh Dickins, Oleg Nesterov, linux-kernel, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
On Thu, 12 Apr 2012 12:09:48 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:
> kthread_stop() uses task->vfork_done for synchronization. The exiting kthread
> shouldn't do complete_vfork_done() until it sets ->exit_code.
>
> fix for mm-correctly-synchronize-rss-counters-at-exit-exec.patch
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> ---
> kernel/exit.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index eb12719..70875a6 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -960,6 +960,9 @@ void do_exit(long code)
>
> acct_update_integrals(tsk);
>
> + /* Set exit_code before complete_vfork_done() in mm_release() */
> + tsk->exit_code = code;
> +
> /* Release mm and sync mm's RSS info before statistics gathering */
> mm_release(tsk, tsk->mm);
>
> @@ -975,7 +978,6 @@ void do_exit(long code)
> tty_audit_exit();
> audit_free(tsk);
>
> - tsk->exit_code = code;
> taskstats_exit(tsk, group_dead);
>
So does this patch address Oleg's objection to
mm-correctly-synchronize-rss-counters-at-exit-exec.patch?
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()
@ 2012-04-12 23:35 ` Andrew Morton
0 siblings, 0 replies; 45+ messages in thread
From: Andrew Morton @ 2012-04-12 23:35 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Hugh Dickins, Oleg Nesterov, linux-kernel, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
On Thu, 12 Apr 2012 12:09:48 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:
> kthread_stop() uses task->vfork_done for synchronization. The exiting kthread
> shouldn't do complete_vfork_done() until it sets ->exit_code.
>
> fix for mm-correctly-synchronize-rss-counters-at-exit-exec.patch
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> ---
> kernel/exit.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index eb12719..70875a6 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -960,6 +960,9 @@ void do_exit(long code)
>
> acct_update_integrals(tsk);
>
> + /* Set exit_code before complete_vfork_done() in mm_release() */
> + tsk->exit_code = code;
> +
> /* Release mm and sync mm's RSS info before statistics gathering */
> mm_release(tsk, tsk->mm);
>
> @@ -975,7 +978,6 @@ void do_exit(long code)
> tty_audit_exit();
> audit_free(tsk);
>
> - tsk->exit_code = code;
> taskstats_exit(tsk, group_dead);
>
So does this patch address Oleg's objection to
mm-correctly-synchronize-rss-counters-at-exit-exec.patch?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 2/2] mm: call complete_vfork_done() after clearing child_tid and flushing rss-counters
2012-04-12 8:09 ` Konstantin Khlebnikov
@ 2012-04-12 23:39 ` Andrew Morton
-1 siblings, 0 replies; 45+ messages in thread
From: Andrew Morton @ 2012-04-12 23:39 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Hugh Dickins, Oleg Nesterov, linux-kernel, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
On Thu, 12 Apr 2012 12:09:53 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:
> Child should wake ups parent from vfork() only after finishing all operations with
> shared mm. There is no sense to use CLONE_CHILD_CLEARTID together with CLONE_VFORK,
> but it looks more accurate now.
>
> ...
>
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -728,9 +728,6 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
> /* Get rid of any cached register state */
> deactivate_mm(tsk, mm);
>
> - if (tsk->vfork_done)
> - complete_vfork_done(tsk);
> -
> /*
> * If we're exiting normally, clear a user-space tid field if
> * requested. We leave this alone when dying by signal, to leave
> @@ -759,6 +756,13 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
> */
> if (mm)
> sync_mm_rss(mm);
> +
> + /*
> + * All done, finally we can wake up parent and return this mm to him.
> + * Also kthread_stop() uses this completion for synchronization.
> + */
> + if (tsk->vfork_done)
> + complete_vfork_done(tsk);
> }
That does look a bit racy.
But are we really sure that the patch really does fix something?
Because it does increase vfork() latency a tiny bit.
I'm going to call this a patch against the fork subsystem, not the mm
subsystem.
I believe that this patch is unrelated to "mm: set task exit code
before complete_vfork_done()", yes?
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 2/2] mm: call complete_vfork_done() after clearing child_tid and flushing rss-counters
@ 2012-04-12 23:39 ` Andrew Morton
0 siblings, 0 replies; 45+ messages in thread
From: Andrew Morton @ 2012-04-12 23:39 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Hugh Dickins, Oleg Nesterov, linux-kernel, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
On Thu, 12 Apr 2012 12:09:53 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:
> Child should wake ups parent from vfork() only after finishing all operations with
> shared mm. There is no sense to use CLONE_CHILD_CLEARTID together with CLONE_VFORK,
> but it looks more accurate now.
>
> ...
>
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -728,9 +728,6 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
> /* Get rid of any cached register state */
> deactivate_mm(tsk, mm);
>
> - if (tsk->vfork_done)
> - complete_vfork_done(tsk);
> -
> /*
> * If we're exiting normally, clear a user-space tid field if
> * requested. We leave this alone when dying by signal, to leave
> @@ -759,6 +756,13 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
> */
> if (mm)
> sync_mm_rss(mm);
> +
> + /*
> + * All done, finally we can wake up parent and return this mm to him.
> + * Also kthread_stop() uses this completion for synchronization.
> + */
> + if (tsk->vfork_done)
> + complete_vfork_done(tsk);
> }
That does look a bit racy.
But are we really sure that the patch really does fix something?
Because it does increase vfork() latency a tiny bit.
I'm going to call this a patch against the fork subsystem, not the mm
subsystem.
I believe that this patch is unrelated to "mm: set task exit code
before complete_vfork_done()", yes?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()
2012-04-12 8:09 ` Konstantin Khlebnikov
@ 2012-04-12 23:54 ` Oleg Nesterov
-1 siblings, 0 replies; 45+ messages in thread
From: Oleg Nesterov @ 2012-04-12 23:54 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Andrew Morton, Hugh Dickins, linux-kernel, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
On 04/12, Konstantin Khlebnikov wrote:
>
> kthread_stop() uses task->vfork_done for synchronization. The exiting kthread
> shouldn't do complete_vfork_done() until it sets ->exit_code.
>
> fix for mm-correctly-synchronize-rss-counters-at-exit-exec.patch
Yes, this should fix the problem with kthread_stop()...
Damn, Konstantin I have to admit, I'll try to find another technical
reason against mm-correctly-synchronize-rss-counters-at-exit-exec.patch
even with this fix ;)
Most probably I am wrong, but it looks overcomplicated. Somehow I
dislike irrationally the fact you moved mm_release() from exit_mm().
But, once again, it is not that I see the better solution.
2/2 looks fine at first glance... and afaics it is "off-topic".
Oleg.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()
@ 2012-04-12 23:54 ` Oleg Nesterov
0 siblings, 0 replies; 45+ messages in thread
From: Oleg Nesterov @ 2012-04-12 23:54 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Andrew Morton, Hugh Dickins, linux-kernel, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
On 04/12, Konstantin Khlebnikov wrote:
>
> kthread_stop() uses task->vfork_done for synchronization. The exiting kthread
> shouldn't do complete_vfork_done() until it sets ->exit_code.
>
> fix for mm-correctly-synchronize-rss-counters-at-exit-exec.patch
Yes, this should fix the problem with kthread_stop()...
Damn, Konstantin I have to admit, I'll try to find another technical
reason against mm-correctly-synchronize-rss-counters-at-exit-exec.patch
even with this fix ;)
Most probably I am wrong, but it looks overcomplicated. Somehow I
dislike irrationally the fact you moved mm_release() from exit_mm().
But, once again, it is not that I see the better solution.
2/2 looks fine at first glance... and afaics it is "off-topic".
Oleg.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 2/2] mm: call complete_vfork_done() after clearing child_tid and flushing rss-counters
2012-04-12 23:39 ` Andrew Morton
@ 2012-04-13 6:43 ` Konstantin Khlebnikov
-1 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-13 6:43 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, Oleg Nesterov, linux-kernel, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
Andrew Morton wrote:
> On Thu, 12 Apr 2012 12:09:53 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org> wrote:
>
>> Child should wake ups parent from vfork() only after finishing all operations with
>> shared mm. There is no sense to use CLONE_CHILD_CLEARTID together with CLONE_VFORK,
>> but it looks more accurate now.
>>
>> ...
>>
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -728,9 +728,6 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>> /* Get rid of any cached register state */
>> deactivate_mm(tsk, mm);
>>
>> - if (tsk->vfork_done)
>> - complete_vfork_done(tsk);
>> -
>> /*
>> * If we're exiting normally, clear a user-space tid field if
>> * requested. We leave this alone when dying by signal, to leave
>> @@ -759,6 +756,13 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>> */
>> if (mm)
>> sync_mm_rss(mm);
>> +
>> + /*
>> + * All done, finally we can wake up parent and return this mm to him.
>> + * Also kthread_stop() uses this completion for synchronization.
>> + */
>> + if (tsk->vfork_done)
>> + complete_vfork_done(tsk);
>> }
>
> That does look a bit racy.
>
> But are we really sure that the patch really does fix something?
> Because it does increase vfork() latency a tiny bit.
>
> I'm going to call this a patch against the fork subsystem, not the mm
> subsystem.
>
> I believe that this patch is unrelated to "mm: set task exit code
> before complete_vfork_done()", yes?
Yes, unrelated.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 2/2] mm: call complete_vfork_done() after clearing child_tid and flushing rss-counters
@ 2012-04-13 6:43 ` Konstantin Khlebnikov
0 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-13 6:43 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, Oleg Nesterov, linux-kernel, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
Andrew Morton wrote:
> On Thu, 12 Apr 2012 12:09:53 +0400
> Konstantin Khlebnikov<khlebnikov@openvz.org> wrote:
>
>> Child should wake ups parent from vfork() only after finishing all operations with
>> shared mm. There is no sense to use CLONE_CHILD_CLEARTID together with CLONE_VFORK,
>> but it looks more accurate now.
>>
>> ...
>>
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -728,9 +728,6 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>> /* Get rid of any cached register state */
>> deactivate_mm(tsk, mm);
>>
>> - if (tsk->vfork_done)
>> - complete_vfork_done(tsk);
>> -
>> /*
>> * If we're exiting normally, clear a user-space tid field if
>> * requested. We leave this alone when dying by signal, to leave
>> @@ -759,6 +756,13 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>> */
>> if (mm)
>> sync_mm_rss(mm);
>> +
>> + /*
>> + * All done, finally we can wake up parent and return this mm to him.
>> + * Also kthread_stop() uses this completion for synchronization.
>> + */
>> + if (tsk->vfork_done)
>> + complete_vfork_done(tsk);
>> }
>
> That does look a bit racy.
>
> But are we really sure that the patch really does fix something?
> Because it does increase vfork() latency a tiny bit.
>
> I'm going to call this a patch against the fork subsystem, not the mm
> subsystem.
>
> I believe that this patch is unrelated to "mm: set task exit code
> before complete_vfork_done()", yes?
Yes, unrelated.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()
2012-04-12 23:54 ` Oleg Nesterov
@ 2012-04-20 17:59 ` Oleg Nesterov
-1 siblings, 0 replies; 45+ messages in thread
From: Oleg Nesterov @ 2012-04-20 17:59 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Andrew Morton, Hugh Dickins, linux-kernel, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
On 04/13, Oleg Nesterov wrote:
>
> Damn, Konstantin I have to admit, I'll try to find another technical
> reason against mm-correctly-synchronize-rss-counters-at-exit-exec.patch
> even with this fix ;)
>
> Most probably I am wrong, but it looks overcomplicated. Somehow I
> dislike irrationally the fact you moved mm_release() from exit_mm().
And perhaps you can help me to discredit your patch?
It turns out, I do not really understand this code in do_exit:
/* sync mm's RSS info before statistics gathering */
if (tsk->mm)
sync_mm_rss(tsk->mm);
Which "statistics gathering" ? Probably I missed something, but
after the quick grep it seems to me that this is only needed for
taskstats_exit()->xacct_add_tsk().
So why we can't simply add sync_mm_rss() into xacct_add_tsk() ?
Yes, this way we do not "account" put_user(clear_child_tid) but
I think we do not care.
IOW, what do you think about the trivial patch below? Uncompiled,
untested, probably incomplete. acct_update_integrals() looks
suspicious too.
Oleg.
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -91,6 +91,7 @@ void xacct_add_tsk(struct taskstats *sta
stats->virtmem = p->acct_vm_mem1 * PAGE_SIZE / MB;
mm = get_task_mm(p);
if (mm) {
+ sync_mm_rss(mm);
/* adjust to KB unit */
stats->hiwater_rss = get_mm_hiwater_rss(mm) * PAGE_SIZE / KB;
stats->hiwater_vm = get_mm_hiwater_vm(mm) * PAGE_SIZE / KB;
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -643,6 +643,8 @@ static void exit_mm(struct task_struct *
mm_release(tsk, mm);
if (!mm)
return;
+
+ sync_mm_rss(mm);
/*
* Serialize with any possible pending coredump.
* We must hold mmap_sem around checking core_state
@@ -960,9 +962,6 @@ void do_exit(long code)
preempt_count());
acct_update_integrals(tsk);
- /* sync mm's RSS info before statistics gathering */
- if (tsk->mm)
- sync_mm_rss(tsk->mm);
group_dead = atomic_dec_and_test(&tsk->signal->live);
if (group_dead) {
hrtimer_cancel(&tsk->signal->real_timer);
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -823,10 +823,10 @@ static int exec_mmap(struct mm_struct *m
/* Notify parent that we're no longer interested in the old VM */
tsk = current;
old_mm = current->mm;
- sync_mm_rss(old_mm);
mm_release(tsk, old_mm);
if (old_mm) {
+ sync_mm_rss(old_mm);
/*
* Make sure that if there is a core dump in progress
* for the old mm, we get out and die instead of going
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()
@ 2012-04-20 17:59 ` Oleg Nesterov
0 siblings, 0 replies; 45+ messages in thread
From: Oleg Nesterov @ 2012-04-20 17:59 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Andrew Morton, Hugh Dickins, linux-kernel, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
On 04/13, Oleg Nesterov wrote:
>
> Damn, Konstantin I have to admit, I'll try to find another technical
> reason against mm-correctly-synchronize-rss-counters-at-exit-exec.patch
> even with this fix ;)
>
> Most probably I am wrong, but it looks overcomplicated. Somehow I
> dislike irrationally the fact you moved mm_release() from exit_mm().
And perhaps you can help me to discredit your patch?
It turns out, I do not really understand this code in do_exit:
/* sync mm's RSS info before statistics gathering */
if (tsk->mm)
sync_mm_rss(tsk->mm);
Which "statistics gathering" ? Probably I missed something, but
after the quick grep it seems to me that this is only needed for
taskstats_exit()->xacct_add_tsk().
So why we can't simply add sync_mm_rss() into xacct_add_tsk() ?
Yes, this way we do not "account" put_user(clear_child_tid) but
I think we do not care.
IOW, what do you think about the trivial patch below? Uncompiled,
untested, probably incomplete. acct_update_integrals() looks
suspicious too.
Oleg.
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -91,6 +91,7 @@ void xacct_add_tsk(struct taskstats *sta
stats->virtmem = p->acct_vm_mem1 * PAGE_SIZE / MB;
mm = get_task_mm(p);
if (mm) {
+ sync_mm_rss(mm);
/* adjust to KB unit */
stats->hiwater_rss = get_mm_hiwater_rss(mm) * PAGE_SIZE / KB;
stats->hiwater_vm = get_mm_hiwater_vm(mm) * PAGE_SIZE / KB;
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -643,6 +643,8 @@ static void exit_mm(struct task_struct *
mm_release(tsk, mm);
if (!mm)
return;
+
+ sync_mm_rss(mm);
/*
* Serialize with any possible pending coredump.
* We must hold mmap_sem around checking core_state
@@ -960,9 +962,6 @@ void do_exit(long code)
preempt_count());
acct_update_integrals(tsk);
- /* sync mm's RSS info before statistics gathering */
- if (tsk->mm)
- sync_mm_rss(tsk->mm);
group_dead = atomic_dec_and_test(&tsk->signal->live);
if (group_dead) {
hrtimer_cancel(&tsk->signal->real_timer);
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -823,10 +823,10 @@ static int exec_mmap(struct mm_struct *m
/* Notify parent that we're no longer interested in the old VM */
tsk = current;
old_mm = current->mm;
- sync_mm_rss(old_mm);
mm_release(tsk, old_mm);
if (old_mm) {
+ sync_mm_rss(old_mm);
/*
* Make sure that if there is a core dump in progress
* for the old mm, we get out and die instead of going
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()
2012-04-20 17:59 ` Oleg Nesterov
@ 2012-04-20 19:23 ` Konstantin Khlebnikov
-1 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-20 19:23 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Hugh Dickins, linux-kernel, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
Oleg Nesterov wrote:
> On 04/13, Oleg Nesterov wrote:
>>
>> Damn, Konstantin I have to admit, I'll try to find another technical
>> reason against mm-correctly-synchronize-rss-counters-at-exit-exec.patch
>> even with this fix ;)
>>
>> Most probably I am wrong, but it looks overcomplicated. Somehow I
>> dislike irrationally the fact you moved mm_release() from exit_mm().
>
> And perhaps you can help me to discredit your patch?
>
> It turns out, I do not really understand this code in do_exit:
>
> /* sync mm's RSS info before statistics gathering */
> if (tsk->mm)
> sync_mm_rss(tsk->mm);
>
> Which "statistics gathering" ? Probably I missed something, but
> after the quick grep it seems to me that this is only needed for
> taskstats_exit()->xacct_add_tsk().
>
> So why we can't simply add sync_mm_rss() into xacct_add_tsk() ?
> Yes, this way we do not "account" put_user(clear_child_tid) but
> I think we do not care.
Why we don't care? Each thread can corrupt these counters by one.
I do not think that we are satisfied with nearly accurate rss accounting.
+/- one page for each clone()-exit().
Actually I don't really like this per-task rss-delta.
Probably it would be better to use per-cpu counters.
>
> IOW, what do you think about the trivial patch below? Uncompiled,
> untested, probably incomplete. acct_update_integrals() looks
> suspicious too.
what a mess! =)
>
> Oleg.
>
> --- a/kernel/tsacct.c
> +++ b/kernel/tsacct.c
> @@ -91,6 +91,7 @@ void xacct_add_tsk(struct taskstats *sta
> stats->virtmem = p->acct_vm_mem1 * PAGE_SIZE / MB;
> mm = get_task_mm(p);
> if (mm) {
> + sync_mm_rss(mm);
> /* adjust to KB unit */
> stats->hiwater_rss = get_mm_hiwater_rss(mm) * PAGE_SIZE / KB;
> stats->hiwater_vm = get_mm_hiwater_vm(mm) * PAGE_SIZE / KB;
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -643,6 +643,8 @@ static void exit_mm(struct task_struct *
> mm_release(tsk, mm);
> if (!mm)
> return;
> +
> + sync_mm_rss(mm);
> /*
> * Serialize with any possible pending coredump.
> * We must hold mmap_sem around checking core_state
> @@ -960,9 +962,6 @@ void do_exit(long code)
> preempt_count());
>
> acct_update_integrals(tsk);
> - /* sync mm's RSS info before statistics gathering */
> - if (tsk->mm)
> - sync_mm_rss(tsk->mm);
> group_dead = atomic_dec_and_test(&tsk->signal->live);
> if (group_dead) {
> hrtimer_cancel(&tsk->signal->real_timer);
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -823,10 +823,10 @@ static int exec_mmap(struct mm_struct *m
> /* Notify parent that we're no longer interested in the old VM */
> tsk = current;
> old_mm = current->mm;
> - sync_mm_rss(old_mm);
> mm_release(tsk, old_mm);
>
> if (old_mm) {
> + sync_mm_rss(old_mm);
> /*
> * Make sure that if there is a core dump in progress
> * for the old mm, we get out and die instead of going
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email:<a href=mailto:"dont@kvack.org"> email@kvack.org</a>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()
@ 2012-04-20 19:23 ` Konstantin Khlebnikov
0 siblings, 0 replies; 45+ messages in thread
From: Konstantin Khlebnikov @ 2012-04-20 19:23 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Hugh Dickins, linux-kernel, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
Oleg Nesterov wrote:
> On 04/13, Oleg Nesterov wrote:
>>
>> Damn, Konstantin I have to admit, I'll try to find another technical
>> reason against mm-correctly-synchronize-rss-counters-at-exit-exec.patch
>> even with this fix ;)
>>
>> Most probably I am wrong, but it looks overcomplicated. Somehow I
>> dislike irrationally the fact you moved mm_release() from exit_mm().
>
> And perhaps you can help me to discredit your patch?
>
> It turns out, I do not really understand this code in do_exit:
>
> /* sync mm's RSS info before statistics gathering */
> if (tsk->mm)
> sync_mm_rss(tsk->mm);
>
> Which "statistics gathering" ? Probably I missed something, but
> after the quick grep it seems to me that this is only needed for
> taskstats_exit()->xacct_add_tsk().
>
> So why we can't simply add sync_mm_rss() into xacct_add_tsk() ?
> Yes, this way we do not "account" put_user(clear_child_tid) but
> I think we do not care.
Why we don't care? Each thread can corrupt these counters by one.
I do not think that we are satisfied with nearly accurate rss accounting.
+/- one page for each clone()-exit().
Actually I don't really like this per-task rss-delta.
Probably it would be better to use per-cpu counters.
>
> IOW, what do you think about the trivial patch below? Uncompiled,
> untested, probably incomplete. acct_update_integrals() looks
> suspicious too.
what a mess! =)
>
> Oleg.
>
> --- a/kernel/tsacct.c
> +++ b/kernel/tsacct.c
> @@ -91,6 +91,7 @@ void xacct_add_tsk(struct taskstats *sta
> stats->virtmem = p->acct_vm_mem1 * PAGE_SIZE / MB;
> mm = get_task_mm(p);
> if (mm) {
> + sync_mm_rss(mm);
> /* adjust to KB unit */
> stats->hiwater_rss = get_mm_hiwater_rss(mm) * PAGE_SIZE / KB;
> stats->hiwater_vm = get_mm_hiwater_vm(mm) * PAGE_SIZE / KB;
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -643,6 +643,8 @@ static void exit_mm(struct task_struct *
> mm_release(tsk, mm);
> if (!mm)
> return;
> +
> + sync_mm_rss(mm);
> /*
> * Serialize with any possible pending coredump.
> * We must hold mmap_sem around checking core_state
> @@ -960,9 +962,6 @@ void do_exit(long code)
> preempt_count());
>
> acct_update_integrals(tsk);
> - /* sync mm's RSS info before statistics gathering */
> - if (tsk->mm)
> - sync_mm_rss(tsk->mm);
> group_dead = atomic_dec_and_test(&tsk->signal->live);
> if (group_dead) {
> hrtimer_cancel(&tsk->signal->real_timer);
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -823,10 +823,10 @@ static int exec_mmap(struct mm_struct *m
> /* Notify parent that we're no longer interested in the old VM */
> tsk = current;
> old_mm = current->mm;
> - sync_mm_rss(old_mm);
> mm_release(tsk, old_mm);
>
> if (old_mm) {
> + sync_mm_rss(old_mm);
> /*
> * Make sure that if there is a core dump in progress
> * for the old mm, we get out and die instead of going
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email:<a href=mailto:"dont@kvack.org"> email@kvack.org</a>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()
2012-04-20 19:23 ` Konstantin Khlebnikov
@ 2012-04-20 20:41 ` Oleg Nesterov
-1 siblings, 0 replies; 45+ messages in thread
From: Oleg Nesterov @ 2012-04-20 20:41 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Andrew Morton, Hugh Dickins, linux-kernel, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
On 04/20, Konstantin Khlebnikov wrote:
>
> Oleg Nesterov wrote:
>>
>> /* sync mm's RSS info before statistics gathering */
>> if (tsk->mm)
>> sync_mm_rss(tsk->mm);
>>
>> Which "statistics gathering" ? Probably I missed something, but
>> after the quick grep it seems to me that this is only needed for
>> taskstats_exit()->xacct_add_tsk().
>>
>> So why we can't simply add sync_mm_rss() into xacct_add_tsk() ?
>
>> Yes, this way we do not "account" put_user(clear_child_tid) but
>> I think we do not care.
>
> Why we don't care? Each thread can corrupt these counters by one.
> I do not think that we are satisfied with nearly accurate rss accounting.
> +/- one page for each clone()-exit().
Not actually "for each" in practice. Each exit does sync_ (with
this patch from xacct_add_tsk), the net effect should be small.
And. This is what we do now, nobody ever complained.
>> IOW, what do you think about the trivial patch below? Uncompiled,
>> untested, probably incomplete. acct_update_integrals() looks
>> suspicious too.
>
> what a mess! =)
Thanks ;)
But it is much, much simpler than your patches, don't you agree?
Oleg.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()
@ 2012-04-20 20:41 ` Oleg Nesterov
0 siblings, 0 replies; 45+ messages in thread
From: Oleg Nesterov @ 2012-04-20 20:41 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: Andrew Morton, Hugh Dickins, linux-kernel, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
On 04/20, Konstantin Khlebnikov wrote:
>
> Oleg Nesterov wrote:
>>
>> /* sync mm's RSS info before statistics gathering */
>> if (tsk->mm)
>> sync_mm_rss(tsk->mm);
>>
>> Which "statistics gathering" ? Probably I missed something, but
>> after the quick grep it seems to me that this is only needed for
>> taskstats_exit()->xacct_add_tsk().
>>
>> So why we can't simply add sync_mm_rss() into xacct_add_tsk() ?
>
>> Yes, this way we do not "account" put_user(clear_child_tid) but
>> I think we do not care.
>
> Why we don't care? Each thread can corrupt these counters by one.
> I do not think that we are satisfied with nearly accurate rss accounting.
> +/- one page for each clone()-exit().
Not actually "for each" in practice. Each exit does sync_ (with
this patch from xacct_add_tsk), the net effect should be small.
And. This is what we do now, nobody ever complained.
>> IOW, what do you think about the trivial patch below? Uncompiled,
>> untested, probably incomplete. acct_update_integrals() looks
>> suspicious too.
>
> what a mess! =)
Thanks ;)
But it is much, much simpler than your patches, don't you agree?
Oleg.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()
2012-04-20 20:41 ` Oleg Nesterov
@ 2012-04-25 20:01 ` Andrew Morton
-1 siblings, 0 replies; 45+ messages in thread
From: Andrew Morton @ 2012-04-25 20:01 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Konstantin Khlebnikov, Hugh Dickins, linux-kernel, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
On Fri, 20 Apr 2012 22:41:29 +0200
Oleg Nesterov <oleg@redhat.com> wrote:
> >> IOW, what do you think about the trivial patch below? Uncompiled,
> >> untested, probably incomplete. acct_update_integrals() looks
> >> suspicious too.
> >
> > what a mess! =)
>
> Thanks ;)
>
> But it is much, much simpler than your patches, don't you agree?
<taps his foot>
Here's what we currently have:
From: Konstantin Khlebnikov <khlebnikov@openvz.org>
Subject: mm: correctly synchronize rss-counters at exit/exec
mm->rss_stat counters have per-task delta: task->rss_stat. Before
changing task->mm pointer the kernel must flush this delta with
sync_mm_rss().
do_exit() already calls sync_mm_rss() to flush the rss-counters before
committing the rss statistics into task->signal->maxrss, taskstats, audit
and other stuff. Unfortunately the kernel does this before calling
mm_release(), which can call put_user() for processing
task->clear_child_tid. So at this point we can trigger page-faults and
task->rss_stat becomes non-zero again. As a result mm->rss_stat becomes
inconsistent and check_mm() will print something like this:
| BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
| BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1
This patch moves sync_mm_rss() into mm_release(), and moves mm_release()
out of do_exit() and calls it earlier. After mm_release() there should be
no pagefaults.
[akpm@linux-foundation.org: tweak comment]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/exec.c | 1 -
kernel/exit.c | 13 ++++++++-----
kernel/fork.c | 8 ++++++++
3 files changed, 16 insertions(+), 6 deletions(-)
diff -puN fs/exec.c~mm-correctly-synchronize-rss-counters-at-exit-exec fs/exec.c
--- a/fs/exec.c~mm-correctly-synchronize-rss-counters-at-exit-exec
+++ a/fs/exec.c
@@ -823,7 +823,6 @@ static int exec_mmap(struct mm_struct *m
/* Notify parent that we're no longer interested in the old VM */
tsk = current;
old_mm = current->mm;
- sync_mm_rss(old_mm);
mm_release(tsk, old_mm);
if (old_mm) {
diff -puN kernel/exit.c~mm-correctly-synchronize-rss-counters-at-exit-exec kernel/exit.c
--- a/kernel/exit.c~mm-correctly-synchronize-rss-counters-at-exit-exec
+++ a/kernel/exit.c
@@ -423,6 +423,7 @@ void daemonize(const char *name, ...)
* user space pages. We don't need them, and if we didn't close them
* they would be locked into memory.
*/
+ mm_release(current, current->mm);
exit_mm(current);
/*
* We don't want to get frozen, in case system-wide hibernation
@@ -640,7 +641,6 @@ static void exit_mm(struct task_struct *
struct mm_struct *mm = tsk->mm;
struct core_state *core_state;
- mm_release(tsk, mm);
if (!mm)
return;
/*
@@ -959,9 +959,13 @@ void do_exit(long code)
preempt_count());
acct_update_integrals(tsk);
- /* sync mm's RSS info before statistics gathering */
- if (tsk->mm)
- sync_mm_rss(tsk->mm);
+
+ /* Set exit_code before complete_vfork_done() in mm_release() */
+ tsk->exit_code = code;
+
+ /* Release mm and sync mm's RSS info before statistics gathering */
+ mm_release(tsk, tsk->mm);
+
group_dead = atomic_dec_and_test(&tsk->signal->live);
if (group_dead) {
hrtimer_cancel(&tsk->signal->real_timer);
@@ -974,7 +978,6 @@ void do_exit(long code)
tty_audit_exit();
audit_free(tsk);
- tsk->exit_code = code;
taskstats_exit(tsk, group_dead);
exit_mm(tsk);
diff -puN kernel/fork.c~mm-correctly-synchronize-rss-counters-at-exit-exec kernel/fork.c
--- a/kernel/fork.c~mm-correctly-synchronize-rss-counters-at-exit-exec
+++ a/kernel/fork.c
@@ -782,6 +782,14 @@ void mm_release(struct task_struct *tsk,
}
tsk->clear_child_tid = NULL;
}
+
+ /*
+ * Final rss-counter synchronization. After this point there must be
+ * no pagefaults into this mm from the current context. Otherwise
+ * mm->rss_stat will be inconsistent.
+ */
+ if (mm)
+ sync_mm_rss(mm);
}
/*
_
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/2] mm: set task exit code before complete_vfork_done()
@ 2012-04-25 20:01 ` Andrew Morton
0 siblings, 0 replies; 45+ messages in thread
From: Andrew Morton @ 2012-04-25 20:01 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Konstantin Khlebnikov, Hugh Dickins, linux-kernel, linux-mm,
Markus Trippelsdorf, KAMEZAWA Hiroyuki
On Fri, 20 Apr 2012 22:41:29 +0200
Oleg Nesterov <oleg@redhat.com> wrote:
> >> IOW, what do you think about the trivial patch below? Uncompiled,
> >> untested, probably incomplete. acct_update_integrals() looks
> >> suspicious too.
> >
> > what a mess! =)
>
> Thanks ;)
>
> But it is much, much simpler than your patches, don't you agree?
<taps his foot>
Here's what we currently have:
From: Konstantin Khlebnikov <khlebnikov@openvz.org>
Subject: mm: correctly synchronize rss-counters at exit/exec
mm->rss_stat counters have per-task delta: task->rss_stat. Before
changing task->mm pointer the kernel must flush this delta with
sync_mm_rss().
do_exit() already calls sync_mm_rss() to flush the rss-counters before
committing the rss statistics into task->signal->maxrss, taskstats, audit
and other stuff. Unfortunately the kernel does this before calling
mm_release(), which can call put_user() for processing
task->clear_child_tid. So at this point we can trigger page-faults and
task->rss_stat becomes non-zero again. As a result mm->rss_stat becomes
inconsistent and check_mm() will print something like this:
| BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
| BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1
This patch moves sync_mm_rss() into mm_release(), and moves mm_release()
out of do_exit() and calls it earlier. After mm_release() there should be
no pagefaults.
[akpm@linux-foundation.org: tweak comment]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/exec.c | 1 -
kernel/exit.c | 13 ++++++++-----
kernel/fork.c | 8 ++++++++
3 files changed, 16 insertions(+), 6 deletions(-)
diff -puN fs/exec.c~mm-correctly-synchronize-rss-counters-at-exit-exec fs/exec.c
--- a/fs/exec.c~mm-correctly-synchronize-rss-counters-at-exit-exec
+++ a/fs/exec.c
@@ -823,7 +823,6 @@ static int exec_mmap(struct mm_struct *m
/* Notify parent that we're no longer interested in the old VM */
tsk = current;
old_mm = current->mm;
- sync_mm_rss(old_mm);
mm_release(tsk, old_mm);
if (old_mm) {
diff -puN kernel/exit.c~mm-correctly-synchronize-rss-counters-at-exit-exec kernel/exit.c
--- a/kernel/exit.c~mm-correctly-synchronize-rss-counters-at-exit-exec
+++ a/kernel/exit.c
@@ -423,6 +423,7 @@ void daemonize(const char *name, ...)
* user space pages. We don't need them, and if we didn't close them
* they would be locked into memory.
*/
+ mm_release(current, current->mm);
exit_mm(current);
/*
* We don't want to get frozen, in case system-wide hibernation
@@ -640,7 +641,6 @@ static void exit_mm(struct task_struct *
struct mm_struct *mm = tsk->mm;
struct core_state *core_state;
- mm_release(tsk, mm);
if (!mm)
return;
/*
@@ -959,9 +959,13 @@ void do_exit(long code)
preempt_count());
acct_update_integrals(tsk);
- /* sync mm's RSS info before statistics gathering */
- if (tsk->mm)
- sync_mm_rss(tsk->mm);
+
+ /* Set exit_code before complete_vfork_done() in mm_release() */
+ tsk->exit_code = code;
+
+ /* Release mm and sync mm's RSS info before statistics gathering */
+ mm_release(tsk, tsk->mm);
+
group_dead = atomic_dec_and_test(&tsk->signal->live);
if (group_dead) {
hrtimer_cancel(&tsk->signal->real_timer);
@@ -974,7 +978,6 @@ void do_exit(long code)
tty_audit_exit();
audit_free(tsk);
- tsk->exit_code = code;
taskstats_exit(tsk, group_dead);
exit_mm(tsk);
diff -puN kernel/fork.c~mm-correctly-synchronize-rss-counters-at-exit-exec kernel/fork.c
--- a/kernel/fork.c~mm-correctly-synchronize-rss-counters-at-exit-exec
+++ a/kernel/fork.c
@@ -782,6 +782,14 @@ void mm_release(struct task_struct *tsk,
}
tsk->clear_child_tid = NULL;
}
+
+ /*
+ * Final rss-counter synchronization. After this point there must be
+ * no pagefaults into this mm from the current context. Otherwise
+ * mm->rss_stat will be inconsistent.
+ */
+ if (mm)
+ sync_mm_rss(mm);
}
/*
_
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 45+ messages in thread
end of thread, other threads:[~2012-04-25 20:01 UTC | newest]
Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-09 20:03 [PATCH] mm: sync rss-counters at the end of exit_mm() Konstantin Khlebnikov
2012-04-09 20:03 ` Konstantin Khlebnikov
2012-04-09 21:04 ` KOSAKI Motohiro
2012-04-09 21:04 ` KOSAKI Motohiro
2012-04-09 22:03 ` Hugh Dickins
2012-04-10 0:33 ` KAMEZAWA Hiroyuki
2012-04-10 0:33 ` KAMEZAWA Hiroyuki
2012-04-10 6:34 ` Konstantin Khlebnikov
2012-04-10 6:34 ` Konstantin Khlebnikov
2012-04-10 16:04 ` Markus Trippelsdorf
2012-04-10 16:04 ` Markus Trippelsdorf
2012-04-10 0:35 ` KAMEZAWA Hiroyuki
2012-04-10 0:35 ` KAMEZAWA Hiroyuki
2012-04-10 5:43 ` Konstantin Khlebnikov
2012-04-10 5:43 ` Konstantin Khlebnikov
2012-04-10 17:07 ` [PATCH v2] mm: correctly synchronize rss-counters at exit/exec Konstantin Khlebnikov
2012-04-10 17:07 ` Konstantin Khlebnikov
2012-04-10 18:43 ` Andrew Morton
2012-04-10 18:43 ` Andrew Morton
2012-04-10 19:52 ` Konstantin Khlebnikov
2012-04-10 19:52 ` Konstantin Khlebnikov
2012-04-10 19:10 ` Oleg Nesterov
2012-04-10 19:10 ` Oleg Nesterov
2012-04-10 20:09 ` Konstantin Khlebnikov
2012-04-10 20:09 ` Konstantin Khlebnikov
2012-04-12 8:09 ` [PATCH 1/2] mm: set task exit code before complete_vfork_done() Konstantin Khlebnikov
2012-04-12 8:09 ` Konstantin Khlebnikov
2012-04-12 23:35 ` Andrew Morton
2012-04-12 23:35 ` Andrew Morton
2012-04-12 23:54 ` Oleg Nesterov
2012-04-12 23:54 ` Oleg Nesterov
2012-04-20 17:59 ` Oleg Nesterov
2012-04-20 17:59 ` Oleg Nesterov
2012-04-20 19:23 ` Konstantin Khlebnikov
2012-04-20 19:23 ` Konstantin Khlebnikov
2012-04-20 20:41 ` Oleg Nesterov
2012-04-20 20:41 ` Oleg Nesterov
2012-04-25 20:01 ` Andrew Morton
2012-04-25 20:01 ` Andrew Morton
2012-04-12 8:09 ` [PATCH 2/2] mm: call complete_vfork_done() after clearing child_tid and flushing rss-counters Konstantin Khlebnikov
2012-04-12 8:09 ` Konstantin Khlebnikov
2012-04-12 23:39 ` Andrew Morton
2012-04-12 23:39 ` Andrew Morton
2012-04-13 6:43 ` Konstantin Khlebnikov
2012-04-13 6:43 ` Konstantin Khlebnikov
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.