* [PATCH, RESEND] introduce get_mm_hiwater_xxx(), fix taskstats->hiwater_xxx accounting
From: Oleg Nesterov @ 2008-12-12 14:05 UTC
  To: Andrew Morton
  Cc: Balbir Singh, Hugh Dickins, Jay Lan, Jiri Pirko, Jonathan Lim,
	KOSAKI Motohiro, linux-kernel

(changes: update the changelog/comments)

xacct_add_tsk() relies on do_exit()->update_hiwater_xxx() and uses
mm->hiwater_xxx directly. This leads to 2 problems:

	- taskstats_user_cmd() can call fill_pid()->xacct_add_tsk()
	  at any moment before the task exits, so we should check the
	  current values of rss/vm anyway.

	- do_exit()->update_hiwater_xxx() calls are racy. An exiting
	  thread can be preempted right before mm->hiwater_xxx = new_val,
	  and another thread can use A_LOT of memory and exit in between.
	  When the first thread resumes it can be the last thread in the
	  thread group, in that case we report the wrong hiwater_xxx
	  values which do not take A_LOT into account.
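
The second problem can be replayed deterministically in user space. The
following is a minimal sketch, not kernel code: struct mm_model and the
helpers hiwater_check(), hiwater_store() and replay_race() are hypothetical
names, and the update is split into a check step and a store step on the
assumption that update_hiwater_rss() reads the current rss, compares it with
the recorded peak, and then stores.

```c
/* Hypothetical user-space stand-in for the two mm_struct fields involved. */
struct mm_model {
	unsigned long rss;         /* current resident pages */
	unsigned long hiwater_rss; /* recorded peak */
};

/* Step 1 of the update: sample rss and decide whether a store is needed. */
static int hiwater_check(const struct mm_model *mm, unsigned long *new_val)
{
	*new_val = mm->rss;
	return mm->hiwater_rss < *new_val;
}

/* Step 2 of the update: the store, which preemption can delay. */
static void hiwater_store(struct mm_model *mm, unsigned long new_val)
{
	mm->hiwater_rss = new_val;
}

/* Replays the interleaving and returns the peak left behind for reporting. */
static unsigned long replay_race(unsigned long a_lot)
{
	struct mm_model mm = { .rss = 100, .hiwater_rss = 50 };
	unsigned long stale, b_val;
	int need_store;

	/* Thread A begins its exit-time update, then is preempted before the store. */
	need_store = hiwater_check(&mm, &stale);

	/* Thread B maps a_lot pages, records the peak before unmapping, and exits. */
	mm.rss += a_lot;
	if (hiwater_check(&mm, &b_val))
		hiwater_store(&mm, b_val);
	mm.rss -= a_lot;

	/* Thread A resumes and overwrites the peak with its stale sample. */
	if (need_store)
		hiwater_store(&mm, stale);

	return mm.hiwater_rss;
}
```

In this model replay_race(1000) leaves a recorded peak of 100 pages although
1100 pages were resident at one point. With the exit-time update removed,
thread A's stale store never happens.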

Introduce get_mm_hiwater_rss() and get_mm_hiwater_vm() helpers and
change xacct_add_tsk() to use them. The first helper will also be
used by rusage->ru_maxrss accounting.

Kill do_exit()->update_hiwater_xxx() calls. Unless we are going to
decrease rss/vm there is no point in updating mm->hiwater_xxx, and
nobody can look at this mm_struct when exit_mmap() actually unmaps
the memory.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>

--- K-28/include/linux/sched.h~HIWATER	2008-12-02 17:12:40.000000000 +0100
+++ K-28/include/linux/sched.h	2008-12-03 18:17:18.000000000 +0100
@@ -388,6 +388,9 @@ extern void arch_unmap_area_topdown(stru
 		(mm)->hiwater_vm = (mm)->total_vm;	\
 } while (0)
 
+#define get_mm_hiwater_rss(mm)	max((mm)->hiwater_rss, get_mm_rss(mm))
+#define get_mm_hiwater_vm(mm)	max((mm)->hiwater_vm, (mm)->total_vm)
+
 extern void set_dumpable(struct mm_struct *mm, int value);
 extern int get_dumpable(struct mm_struct *mm);
 
--- K-28/kernel/tsacct.c~HIWATER	2008-10-10 00:13:53.000000000 +0200
+++ K-28/kernel/tsacct.c	2008-12-03 18:24:28.000000000 +0100
@@ -90,8 +90,8 @@ void xacct_add_tsk(struct taskstats *sta
 	mm = get_task_mm(p);
 	if (mm) {
 		/* adjust to KB unit */
-		stats->hiwater_rss   = mm->hiwater_rss * PAGE_SIZE / KB;
-		stats->hiwater_vm    = mm->hiwater_vm * PAGE_SIZE / KB;
+		stats->hiwater_rss   = get_mm_hiwater_rss(mm) * PAGE_SIZE / KB;
+		stats->hiwater_vm    = get_mm_hiwater_vm(mm)  * PAGE_SIZE / KB;
 		mmput(mm);
 	}
 	stats->read_char	= p->ioac.rchar;
--- K-28/kernel/exit.c~HIWATER	2008-12-02 17:12:40.000000000 +0100
+++ K-28/kernel/exit.c	2008-12-03 18:21:06.000000000 +0100
@@ -1048,10 +1048,7 @@ NORET_TYPE void do_exit(long code)
 				preempt_count());
 
 	acct_update_integrals(tsk);
-	if (tsk->mm) {
-		update_hiwater_rss(tsk->mm);
-		update_hiwater_vm(tsk->mm);
-	}
+
 	group_dead = atomic_dec_and_test(&tsk->signal->live);
 	if (group_dead) {
 		hrtimer_cancel(&tsk->signal->real_timer);
--- K-28/mm/mmap.c~HIWATER	2008-12-02 17:12:40.000000000 +0100
+++ K-28/mm/mmap.c	2008-12-11 09:13:07.000000000 +0100
@@ -2103,7 +2103,7 @@ void exit_mmap(struct mm_struct *mm)
 	lru_add_drain();
 	flush_cache_mm(mm);
 	tlb = tlb_gather_mmu(mm, 1);
-	/* Don't update_hiwater_rss(mm) here, do_exit already did */
+	/* update_hiwater_rss(mm) here? but nobody should be looking */
 	/* Use -1 here to ensure all VMAs in the mm are unmapped */
 	end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
 	vm_unacct_memory(nr_accounted);
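
What the new helpers report can be sketched outside the kernel. This is a
minimal model under stated assumptions, not the patch itself: struct
mm_sample, max_ul(), sample_hiwater_rss(), sample_hiwater_vm() and
pages_to_kb() are hypothetical names, and the page size is passed in rather
than taken from PAGE_SIZE. The idea is that the stored peak is only refreshed
before rss/vm shrinks, so a reader has to take the larger of the stored peak
and the current value.

```c
/* Hypothetical stand-in for the mm_struct fields the helpers read. */
struct mm_sample {
	unsigned long rss;         /* current resident pages */
	unsigned long total_vm;    /* current mapped pages */
	unsigned long hiwater_rss; /* stored rss peak, may lag behind rss */
	unsigned long hiwater_vm;  /* stored vm peak, may lag behind total_vm */
};

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Peak rss as seen by a reader: stored peak or current value, whichever is larger. */
static unsigned long sample_hiwater_rss(const struct mm_sample *mm)
{
	return max_ul(mm->hiwater_rss, mm->rss);
}

/* Same rule for the virtual memory peak. */
static unsigned long sample_hiwater_vm(const struct mm_sample *mm)
{
	return max_ul(mm->hiwater_vm, mm->total_vm);
}

/* Page count to KB, mirroring the "* PAGE_SIZE / KB" step with KB taken as 1024. */
static unsigned long pages_to_kb(unsigned long pages, unsigned long page_size)
{
	return pages * page_size / 1024;
}
```

For a task whose rss has grown to 300 pages while the stored peak is still
200, the sketch reports 300 pages, which is 1200 KB with an assumed 4096-byte
page.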



* Re: [PATCH, RESEND] introduce get_mm_hiwater_xxx(), fix taskstats->hiwater_xxx accounting
From: Hugh Dickins @ 2008-12-12 15:56 UTC
  To: Oleg Nesterov
  Cc: Andrew Morton, Balbir Singh, Jay Lan, Jiri Pirko, Jonathan Lim,
	KOSAKI Motohiro, linux-kernel

On Fri, 12 Dec 2008, Oleg Nesterov wrote:

> (changes: update the changelog/comments)
> 
> xacct_add_tsk() relies on do_exit()->update_hiwater_xxx() and uses
> mm->hiwater_xxx directly, this leads to 2 problems:
> 
> 	- taskstats_user_cmd() can call fill_pid()->xacct_add_tsk()
> 	  at any moment before the task exits, so we should check the
> 	  current values of rss/vm anyway.
> 
> 	- do_exit()->update_hiwater_xxx() calls are racy. An exiting
> 	  thread can be preempted right before mm->hiwater_xxx = new_val,
> 	  and another thread can use A_LOT of memory and exit in between.
> 	  When the first thread resumes it can be the last thread in the
> 	  thread group, in that case we report the wrong hiwater_xxx
> 	  values which do not take A_LOT into account.
> 
> Introduce get_mm_hiwater_rss() and get_mm_hiwater_vm() helpers and
> change xacct_add_tsk() to use them. The first helper will also be
> used by rusage->ru_maxrss accounting.
> 
> Kill do_exit()->update_hiwater_xxx() calls. Unless we are going to
> decrease rss/vm there is no point to update mm->hiwater_xxx, and
> nobody can look at this mm_struct when exit_mmap() actually unmaps
> the memory.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Acked-by: Hugh Dickins <hugh@veritas.com>

> 
> --- K-28/include/linux/sched.h~HIWATER	2008-12-02 17:12:40.000000000 +0100
> +++ K-28/include/linux/sched.h	2008-12-03 18:17:18.000000000 +0100
> @@ -388,6 +388,9 @@ extern void arch_unmap_area_topdown(stru
>  		(mm)->hiwater_vm = (mm)->total_vm;	\
>  } while (0)
>  
> +#define get_mm_hiwater_rss(mm)	max((mm)->hiwater_rss, get_mm_rss(mm))
> +#define get_mm_hiwater_vm(mm)	max((mm)->hiwater_vm, (mm)->total_vm)
> +
>  extern void set_dumpable(struct mm_struct *mm, int value);
>  extern int get_dumpable(struct mm_struct *mm);
>  
> --- K-28/kernel/tsacct.c~HIWATER	2008-10-10 00:13:53.000000000 +0200
> +++ K-28/kernel/tsacct.c	2008-12-03 18:24:28.000000000 +0100
> @@ -90,8 +90,8 @@ void xacct_add_tsk(struct taskstats *sta
>  	mm = get_task_mm(p);
>  	if (mm) {
>  		/* adjust to KB unit */
> -		stats->hiwater_rss   = mm->hiwater_rss * PAGE_SIZE / KB;
> -		stats->hiwater_vm    = mm->hiwater_vm * PAGE_SIZE / KB;
> +		stats->hiwater_rss   = get_mm_hiwater_rss(mm) * PAGE_SIZE / KB;
> +		stats->hiwater_vm    = get_mm_hiwater_vm(mm)  * PAGE_SIZE / KB;
>  		mmput(mm);
>  	}
>  	stats->read_char	= p->ioac.rchar;
> --- K-28/kernel/exit.c~HIWATER	2008-12-02 17:12:40.000000000 +0100
> +++ K-28/kernel/exit.c	2008-12-03 18:21:06.000000000 +0100
> @@ -1048,10 +1048,7 @@ NORET_TYPE void do_exit(long code)
>  				preempt_count());
>  
>  	acct_update_integrals(tsk);
> -	if (tsk->mm) {
> -		update_hiwater_rss(tsk->mm);
> -		update_hiwater_vm(tsk->mm);
> -	}
> +
>  	group_dead = atomic_dec_and_test(&tsk->signal->live);
>  	if (group_dead) {
>  		hrtimer_cancel(&tsk->signal->real_timer);
> --- K-28/mm/mmap.c~HIWATER	2008-12-02 17:12:40.000000000 +0100
> +++ K-28/mm/mmap.c	2008-12-11 09:13:07.000000000 +0100
> @@ -2103,7 +2103,7 @@ void exit_mmap(struct mm_struct *mm)
>  	lru_add_drain();
>  	flush_cache_mm(mm);
>  	tlb = tlb_gather_mmu(mm, 1);
> -	/* Don't update_hiwater_rss(mm) here, do_exit already did */
> +	/* update_hiwater_rss(mm) here? but nobody should be looking */
>  	/* Use -1 here to ensure all VMAs in the mm are unmapped */
>  	end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
>  	vm_unacct_memory(nr_accounted);


* Re: [PATCH, RESEND] introduce get_mm_hiwater_xxx(), fix taskstats->hiwater_xxx accounting
From: KOSAKI Motohiro @ 2008-12-13  2:34 UTC
  To: Oleg Nesterov
  Cc: Andrew Morton, Balbir Singh, Hugh Dickins, Jay Lan, Jiri Pirko,
	Jonathan Lim, linux-kernel

> (changes: update the changelog/comments)
>
> xacct_add_tsk() relies on do_exit()->update_hiwater_xxx() and uses
> mm->hiwater_xxx directly, this leads to 2 problems:
>
>        - taskstats_user_cmd() can call fill_pid()->xacct_add_tsk()
>          at any moment before the task exits, so we should check the
>          current values of rss/vm anyway.
>
>        - do_exit()->update_hiwater_xxx() calls are racy. An exiting
>          thread can be preempted right before mm->hiwater_xxx = new_val,
>          and another thread can use A_LOT of memory and exit in between.
>          When the first thread resumes it can be the last thread in the
>          thread group, in that case we report the wrong hiwater_xxx
>          values which do not take A_LOT into account.
>
> Introduce get_mm_hiwater_rss() and get_mm_hiwater_vm() helpers and
> change xacct_add_tsk() to use them. The first helper will also be
> used by rusage->ru_maxrss accounting.
>
> Kill do_exit()->update_hiwater_xxx() calls. Unless we are going to
> decrease rss/vm there is no point to update mm->hiwater_xxx, and
> nobody can look at this mm_struct when exit_mmap() actually unmaps
> the memory.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Thanks! looks good to me.
   Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>



> --- K-28/mm/mmap.c~HIWATER      2008-12-02 17:12:40.000000000 +0100
> +++ K-28/mm/mmap.c      2008-12-11 09:13:07.000000000 +0100
> @@ -2103,7 +2103,7 @@ void exit_mmap(struct mm_struct *mm)
>        lru_add_drain();
>        flush_cache_mm(mm);
>        tlb = tlb_gather_mmu(mm, 1);
> -       /* Don't update_hiwater_rss(mm) here, do_exit already did */
> +       /* update_hiwater_rss(mm) here? but nobody should be looking */
>        /* Use -1 here to ensure all VMAs in the mm are unmapped */
>        end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
>        vm_unacct_memory(nr_accounted);

I also think the high-water mark doesn't need updating here.


* Re: [PATCH, RESEND] introduce get_mm_hiwater_xxx(), fix taskstats->hiwater_xxx accounting
From: Balbir Singh @ 2008-12-13  3:48 UTC
  To: KOSAKI Motohiro
  Cc: Oleg Nesterov, Andrew Morton, Hugh Dickins, Jay Lan, Jiri Pirko,
	Jonathan Lim, linux-kernel

* KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> [2008-12-13 11:34:53]:

> > (changes: update the changelog/comments)
> >
> > xacct_add_tsk() relies on do_exit()->update_hiwater_xxx() and uses
> > mm->hiwater_xxx directly, this leads to 2 problems:
> >
> >        - taskstats_user_cmd() can call fill_pid()->xacct_add_tsk()
> >          at any moment before the task exits, so we should check the
> >          current values of rss/vm anyway.
> >
> >        - do_exit()->update_hiwater_xxx() calls are racy. An exiting
> >          thread can be preempted right before mm->hiwater_xxx = new_val,
> >          and another thread can use A_LOT of memory and exit in between.
> >          When the first thread resumes it can be the last thread in the
> >          thread group, in that case we report the wrong hiwater_xxx
> >          values which do not take A_LOT into account.
> >
> > Introduce get_mm_hiwater_rss() and get_mm_hiwater_vm() helpers and
> > change xacct_add_tsk() to use them. The first helper will also be
> > used by rusage->ru_maxrss accounting.
> >
> > Kill do_exit()->update_hiwater_xxx() calls. Unless we are going to
> > decrease rss/vm there is no point to update mm->hiwater_xxx, and
> > nobody can look at this mm_struct when exit_mmap() actually unmaps
> > the memory.
> >
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> 
> Thanks! looks good to me.
>    Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Me too. I am acking it, but you already have all the acks you need :)

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>

-- 
	Balbir


* Re: [PATCH, RESEND] introduce get_mm_hiwater_xxx(), fix taskstats->hiwater_xxx accounting
From: Andrew Morton @ 2008-12-16  0:21 UTC
  To: Oleg Nesterov
  Cc: balbir, hugh, jlan, jpirko, jlim, kosaki.motohiro, linux-kernel

On Fri, 12 Dec 2008 15:05:24 +0100
Oleg Nesterov <oleg@redhat.com> wrote:
>

> --- K-28/include/linux/sched.h~HIWATER	2008-12-02 17:12:40.000000000 +0100
> +++ K-28/include/linux/sched.h	2008-12-03 18:17:18.000000000 +0100

grumble

> +#define get_mm_hiwater_rss(mm)	max((mm)->hiwater_rss, get_mm_rss(mm))

This evaluates its argument thrice.

> +#define get_mm_hiwater_vm(mm)	max((mm)->hiwater_vm, (mm)->total_vm)

This evaluates its argument twice.


Was sched.h the appropriate header in which to implement these?  Maybe...

But they're only ever _used_ in kernel/tsacct.c, so do they actually
need to be implemented in any .h file?



* Re: [PATCH, RESEND] introduce get_mm_hiwater_xxx(), fix taskstats->hiwater_xxx accounting
From: Oleg Nesterov @ 2008-12-16 10:36 UTC
  To: Andrew Morton
  Cc: balbir, hugh, jlan, jpirko, jlim, kosaki.motohiro, linux-kernel

On 12/15, Andrew Morton wrote:
>
> On Fri, 12 Dec 2008 15:05:24 +0100
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > +#define get_mm_hiwater_rss(mm)	max((mm)->hiwater_rss, get_mm_rss(mm))
>
> This evaluates its argument thrice.
>
> > +#define get_mm_hiwater_vm(mm)	max((mm)->hiwater_vm, (mm)->total_vm)
>
> This evaluates its argument twice.

I thought that any user should be careful anyway...

OK, agreed, will send the cleanup.
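
One possible shape for such a cleanup is a static inline function, which
evaluates its argument exactly once. The following user-space sketch is an
illustration only, not the follow-up patch: struct mm_sketch, max_ul(),
lookup_mm(), HIWATER_RSS_MACRO() and hiwater_rss_inline() are hypothetical
names, and the macro form assumes that get_mm_rss(mm) expands to a sum of two
counters each read through mm, which is what makes the argument appear three
times.

```c
/* Hypothetical stand-in for the counters behind get_mm_rss() and the peak. */
struct mm_sketch {
	unsigned long file_rss;
	unsigned long anon_rss;
	unsigned long hiwater_rss;
};

static inline unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Counts how many times a caller's argument expression is evaluated. */
static int lookups;

static struct mm_sketch *lookup_mm(struct mm_sketch *mm)
{
	lookups++;
	return mm;
}

/* Macro form: the argument text is pasted three times, so it runs three times. */
#define HIWATER_RSS_MACRO(mm) \
	max_ul((mm)->hiwater_rss, (mm)->file_rss + (mm)->anon_rss)

/* Inline form: the argument is evaluated once, at the call site. */
static inline unsigned long hiwater_rss_inline(const struct mm_sketch *mm)
{
	return max_ul(mm->hiwater_rss, mm->file_rss + mm->anon_rss);
}
```

Calling HIWATER_RSS_MACRO(lookup_mm(&mm)) bumps the counter three times,
while hiwater_rss_inline(lookup_mm(&mm)) bumps it once; both return the same
value. The inline form also gets type checking on its parameter.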

> was sched.h the appropriate header in which to implement these?  Maybe...

Just because I'd like to put them near update_hiwater_xxx().

> But they're only ever _used_ in kernel/tsacct.c, so do they actually
> need to be implemented in any .h file?

Jiri is preparing a patch that implements rusage->ru_maxrss accounting;
it will use the first helper.

Oleg.



* Re: [PATCH, RESEND] introduce get_mm_hiwater_xxx(), fix taskstats->hiwater_xxx accounting
From: Jiri Pirko @ 2008-12-16 10:43 UTC
  To: Andrew Morton
  Cc: Oleg Nesterov, balbir, hugh, jlan, jlim, kosaki.motohiro, linux-kernel

On Mon, 15 Dec 2008 16:21:48 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> 
> > +#define get_mm_hiwater_rss(mm)	max((mm)->hiwater_rss, get_mm_rss(mm))
> 
> This evaluates its argument thrice.
> 
> > +#define get_mm_hiwater_vm(mm)	max((mm)->hiwater_vm, (mm)->total_vm)
> 
> This evaluates its argument twice.
> 
> 
> was sched.h the appropriate header in which to implement these?  Maybe...
I think it was. There are similar helpers at the same place.
> 
> But they're only ever _used_ in kernel/tsacct.c, so do they actually
> need to be implemented in any .h file?
Yes, because my patch (ru_maxrss filling) will use
get_mm_hiwater_rss() from kernel/exit.c and kernel/sys.c.
> 

