linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Add pacct_struct to fix some pacct bugs.
@ 2006-06-20  6:24 KaiGai Kohei
  2006-06-20  6:42 ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: KaiGai Kohei @ 2006-06-20  6:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm

Hi, I noticed three problems in pacct facility.

1. Pacct facility has a possibility to write incorrect ac_flag
   in multi-threading cases.
2. There is a possibility to be waken up OOM Killer from
   pacct facility. It will cause system stall.
3. If several threads are killed at same time, There is
   a possibility not to pick up a part of those accountings.

The attached patch will resolve those matters.
Any comments please. Thanks,

Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>

[The details of those matters]

(1) about incorrect ac_flag
When pacct facility generate an 'ac_flag' field in accounting record,
it refers a task_struct of the thread which died last in the process.
But any other task_structs are ignored.

Therefore, pacct facility drops ASU flag even if root-privilege
operations are used by any other threads except the last one.
In addition, AFORK flag is always set when the thread of group-leader
didn't die last, although this process has called execve() after fork().

---- in kernel/acct.c : do_acct_process() ----
        ac.ac_flag = 0;
        if (current->flags & PF_FORKNOEXEC)
                ac.ac_flag |= AFORK;
        if (current->flags & PF_SUPERPRIV)
                ac.ac_flag |= ASU;
        if (current->flags & PF_DUMPCORE)
                ac.ac_flag |= ACORE;
        if (current->flags & PF_SIGNALED)
                ac.ac_flag |= AXSIG;
----------------------------------------------
I think ASU, ACORE, AXSIG flag should be set if any task_structs
satisfy the conditions. And, AFORK flag should not be checked
for any threads except the group leader.


(2) about OOM killer
The pacct facility need an i/o operation when an accounting record
is generated. There is a possibility to wake OOM killer up.
If OOM killer is activated, it kills some processes to make them
release process memory regions.
But acct_process() is called in the killed processes context
before calling exit_mm(), so those processes cannot release
own memory. In the results, any processes stop in this point and
it finally cause a system stall.

---- in kernel/exit.c : do_exit() ------------
        group_dead = atomic_dec_and_test(&tsk->signal->live);
        if (group_dead) {
                hrtimer_cancel(&tsk->signal->real_timer);
                exit_itimers(tsk->signal);
                acct_process(code);
        }
           :
       - snip -
           :
        exit_mm(tsk);
----------------------------------------------
I think acct_process() should be called after exit_mm() to avoid
this matter. As mm_struct is necessary to calculate ac_mem field,
it should be kept before exit_mm().


(3) about race condition in pacct
In current 2.6.17 implementation, signal_struct refered from task_struct
is used for per-process data structure. The pacct facility also uses it
as a per-process data structure to save stime, utime, minflt, majflt.
But those members are saved in __exit_signal(). It's too late.

For example, if some threads exits at same time, pacct facility has
a possibility to drop accountings for a part of those threads.
(see, the following 'The results of original 2.6.17 kernel')
I think accounting information should be completely collected into
the per-process data structure before writing out an accounting record.


[The solution for those matters]
a. pacct_struct is newly defined, and it's deployed into signal_struct
   as a per-process accounting storage.
b. acct_collect() is newly added to collect any thread's accountings
   before generating accounting record and calling exit_mm().
c. Calling acct_process() is moved after exit_mm().
d. do_acct_process() becomes to refer the per-process accounting
   structure, not task_strcut of the last thread.


[The results of original 2.6.17 kernel]
* test for ac_flag
  # time -p ./bugacct
  real 10.11
  user 5.94
  sys 0.17
  # touch /tmp/acct-`uname -r`
  # accton /tmp/acct-`uname -r`
  # time -p ./bugacct
  real 10.04
  user 5.86
  sys 0.17
  -- The accounting results -----------------------------------------
  FLAG  BTIME          ETIME UTIME STIME    MEM MINFLT MAJFLT    COMM
  F---  06/20-14:31:47  1003   586    16 140992  32776      0 bugacct

  (*) 'F' means it didn't execve() after fork().
      'P' means it used root-privilege operations.

  => original 2.6.17 drops 'P' flag, and appends undesired 'F' flag.

* test for the race condition
  # time -p ./raceacct 4
  real 2.59
  user 7.56
  sys 0.00
  # time -p ./raceacct 4
  real 2.27
  user 6.62
  sys 0.00
  # time -p ./raceacct 4
  real 2.16
  user 6.16
  sys 0.00
  # time -p ./raceacct 4
  real 2.87
  user 8.36
  sys 0.00
  # time -p ./raceacct 4
  real 2.77
  user 8.11
  sys 0.00

  -- The accounting results -----------------------------------------
  FLAG  BTIME          ETIME UTIME STIME    MEM MINFLT MAJFLT    COMM
  F---  06/20-11:18:57   259   596     0  28528     38      0 raceacct *
  ----  06/20-11:19:14   227   662     0  28528    171      0 raceacct
  F---  06/20-11:19:18   216   509     0  28528     38      0 raceacct *
  F---  06/20-11:19:22   287   651     0  28528     39      0 raceacct *
  ----  06/20-11:19:25   276   811     0  28528    170      0 raceacct
  (*) = UTIME does not match with the result of 'time -p'


[The results of patched 2.6.17-kg kernel]
* test for ac_flag
  # time -p ./bugacct
  real 10.07
  user 5.88
  sys 0.17
  -- The accounting results -----------------------------------------
  FLAG  BTIME          ETIME UTIME STIME    MEM MINFLT MAJFLT    COMM
  -P--  06/20-14:08:06  1007   588    16 140992  32944      0 bugacct

* test for the race condition
  # time -p ./raceacct 4
  real 2.55
  user 7.48
  sys 0.00
  # time -p ./raceacct 4
  real 2.59
  user 7.58
  sys 0.00
  # time -p ./raceacct 4
  real 2.64
  user 7.75
  sys 0.00
  # time -p ./raceacct 4
  real 2.29
  user 6.71
  sys 0.00
  # time -p ./raceacct 4
  real 2.68
  user 7.83
  sys 0.00
  -- The accounting results -----------------------------------------
  FLAG  BTIME          ETIME UTIME STIME    MEM MINFLT MAJFLT     COMM
  ----  06/20-11:09:33   255   748     0  28528    171      0 raceacct
  ----  06/20-11:09:37   259   758     0  28528    171      0 raceacct
  ----  06/20-11:09:41   264   775     0  28528    170      0 raceacct
  ----  06/20-11:09:44   229   671     0  28528    170      0 raceacct
  ----  06/20-11:09:57   268   783     0  28528    170      0 raceacct

[The test program : raceacct.c]
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

void *worker(void *dummy) {
    int i, *flag = dummy;

    for (i=0; i < 100000000 && *flag; i++);
    *flag = 0;
    pthread_exit(NULL);
}

int main(int argc, char *argv[]) {
    pthread_t pthread;
    int i, num = 0, flag = 1;

  if (argc > 2) {
      fprintf(stderr, "argument too large\n");
      return 1;
  } else if (argc == 2) {
      num = atoi(argv[1]);
  }
  for (i=1; i < num; i++)
      pthread_create(&pthread, NULL, worker, (void *)&flag);
  worker((void *)&flag);
}

[The test program : bugacct.c]
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <sys/time.h>

#define BUFFER_SIZE (128 * 1024 * 1024)  /* 128MB */

void *mychild(void *buffer) {
        struct timeval tv;
        u_int64_t t1, t2;

        gettimeofday(&tv, NULL);
        t1 = tv.tv_sec * 1000000 + tv.tv_usec;
        t2 = t1 + 6 * 1000000;
        /* heavy CPU/Mem job */
        srand(tv.tv_usec);
        while(t1 < t2) {
            memset(buffer, rand(), BUFFER_SIZE);
            gettimeofday(&tv, NULL);
            t1 = tv.tv_sec * 1000000 + tv.tv_usec;
        }
        return NULL;
}

int main(int argc, char *argv[]) {
        pthread_t pthread;
        void *buffer = NULL;

        buffer = malloc(BUFFER_SIZE);
        if (!buffer)
            return 1;

        sleep(4);
        pthread_create(&pthread, NULL, mychild, buffer);
        nice(-1);
        sleep(1);
        pthread_exit(0);
}

[The patch against to 2.6.17]
--- linux-2.6.17/include/linux/acct.h	2006-06-18 10:49:35.000000000 +0900
+++ linux-2.6.17.kg/include/linux/acct.h	2006-06-19 17:04:11.000000000 +0900
@@ -123,13 +123,17 @@ struct vfsmount;
 struct super_block;
 extern void acct_auto_close_mnt(struct vfsmount *m);
 extern void acct_auto_close(struct super_block *sb);
-extern void acct_process(long exitcode);
+extern void acct_init_pacct(struct pacct_struct *pacct, struct task_struct *tsk);
+extern void acct_collect(long exitcode, int group_dead);
+extern void acct_process(void);
 extern void acct_update_integrals(struct task_struct *tsk);
 extern void acct_clear_integrals(struct task_struct *tsk);
 #else
 #define acct_auto_close_mnt(x)	do { } while (0)
 #define acct_auto_close(x)	do { } while (0)
-#define acct_process(x)		do { } while (0)
+#define acct_init_pacct(x,y)	do { } while (0)
+#define acct_collect(x,y)	do { } while (0)
+#define acct_process()		do { } while (0)
 #define acct_update_integrals(x)		do { } while (0)
 #define acct_clear_integrals(task)	do { } while (0)
 #endif
--- linux-2.6.17/include/linux/sched.h	2006-06-18 10:49:35.000000000 +0900
+++ linux-2.6.17.kg/include/linux/sched.h	2006-06-19 16:59:16.000000000 +0900
@@ -358,6 +358,15 @@ struct sighand_struct {
 	spinlock_t		siglock;
 };

+struct pacct_struct {
+	int		ac_flag;
+	long		ac_exitcode;
+	cputime_t	ac_utime, ac_stime;
+	unsigned long	ac_mem;
+	unsigned long	ac_minflt, ac_majflt;
+	struct timespec	ac_start_time;
+};
+
 /*
  * NOTE! "signal_struct" does not have it's own
  * locking, because a shared signal_struct always
@@ -449,6 +458,9 @@ struct signal_struct {
 	struct key *session_keyring;	/* keyring inherited over fork */
 	struct key *process_keyring;	/* keyring private to this process */
 #endif
+#ifdef CONFIG_BSD_PROCESS_ACCT
+	struct pacct_struct pacct;	/* per-process accounting information */
+#endif
 };

 /* Context switch must be unlocked if interrupts are to be enabled */
--- linux-2.6.17/kernel/acct.c	2006-06-18 10:49:35.000000000 +0900
+++ linux-2.6.17.kg/kernel/acct.c	2006-06-20 14:55:34.000000000 +0900
@@ -75,7 +75,7 @@ int acct_parm[3] = {4, 2, 30};
 /*
  * External references and all of the globals.
  */
-static void do_acct_process(long, struct file *);
+static void do_acct_process(struct file *);

 /*
  * This structure is used so that all the data protected by lock
@@ -196,7 +196,7 @@ static void acct_file_reopen(struct file
 	if (old_acct) {
 		mnt_unpin(old_acct->f_vfsmnt);
 		spin_unlock(&acct_globals.lock);
-		do_acct_process(0, old_acct);
+		do_acct_process(old_acct);
 		filp_close(old_acct, NULL);
 		spin_lock(&acct_globals.lock);
 	}
@@ -419,16 +419,16 @@ static u32 encode_float(u64 value)
 /*
  *  do_acct_process does all actual work. Caller holds the reference to file.
  */
-static void do_acct_process(long exitcode, struct file *file)
+static void do_acct_process(struct file *file)
 {
+	struct signal_struct *sig = current->signal;
+	struct pacct_struct *pacct = &sig->pacct;
 	acct_t ac;
 	mm_segment_t fs;
-	unsigned long vsize;
 	unsigned long flim;
 	u64 elapsed;
 	u64 run_time;
 	struct timespec uptime;
-	unsigned long jiffies;

 	/*
 	 * First check to see if there is enough free_space to continue
@@ -449,8 +449,8 @@ static void do_acct_process(long exitcod
 	/* calculate run_time in nsec*/
 	do_posix_clock_monotonic_gettime(&uptime);
 	run_time = (u64)uptime.tv_sec*NSEC_PER_SEC + uptime.tv_nsec;
-	run_time -= (u64)current->group_leader->start_time.tv_sec * NSEC_PER_SEC
-		       + current->group_leader->start_time.tv_nsec;
+	run_time -= (u64)pacct->ac_start_time.tv_sec * NSEC_PER_SEC
+			+ pacct->ac_start_time.tv_nsec;
 	/* convert nsec -> AHZ */
 	elapsed = nsec_to_AHZ(run_time);
 #if ACCT_VERSION==3
@@ -469,12 +469,8 @@ static void do_acct_process(long exitcod
 #endif
 	do_div(elapsed, AHZ);
 	ac.ac_btime = xtime.tv_sec - elapsed;
-	jiffies = cputime_to_jiffies(cputime_add(current->utime,
-						 current->signal->utime));
-	ac.ac_utime = encode_comp_t(jiffies_to_AHZ(jiffies));
-	jiffies = cputime_to_jiffies(cputime_add(current->stime,
-						 current->signal->stime));
-	ac.ac_stime = encode_comp_t(jiffies_to_AHZ(jiffies));
+	ac.ac_utime = encode_comp_t(jiffies_to_AHZ(cputime_to_jiffies(pacct->ac_utime)));
+	ac.ac_stime = encode_comp_t(jiffies_to_AHZ(cputime_to_jiffies(pacct->ac_stime)));
 	/* we really need to bite the bullet and change layout */
 	ac.ac_uid = current->uid;
 	ac.ac_gid = current->gid;
@@ -496,37 +492,14 @@ static void do_acct_process(long exitcod
 		old_encode_dev(tty_devnum(current->signal->tty)) : 0;
 	read_unlock(&tasklist_lock);

-	ac.ac_flag = 0;
-	if (current->flags & PF_FORKNOEXEC)
-		ac.ac_flag |= AFORK;
-	if (current->flags & PF_SUPERPRIV)
-		ac.ac_flag |= ASU;
-	if (current->flags & PF_DUMPCORE)
-		ac.ac_flag |= ACORE;
-	if (current->flags & PF_SIGNALED)
-		ac.ac_flag |= AXSIG;
-
-	vsize = 0;
-	if (current->mm) {
-		struct vm_area_struct *vma;
-		down_read(&current->mm->mmap_sem);
-		vma = current->mm->mmap;
-		while (vma) {
-			vsize += vma->vm_end - vma->vm_start;
-			vma = vma->vm_next;
-		}
-		up_read(&current->mm->mmap_sem);
-	}
-	vsize = vsize / 1024;
-	ac.ac_mem = encode_comp_t(vsize);
+	ac.ac_flag = pacct->ac_flag;
+	ac.ac_mem = encode_comp_t(pacct->ac_mem);
 	ac.ac_io = encode_comp_t(0 /* current->io_usage */);	/* %% */
 	ac.ac_rw = encode_comp_t(ac.ac_io / 1024);
-	ac.ac_minflt = encode_comp_t(current->signal->min_flt +
-				     current->min_flt);
-	ac.ac_majflt = encode_comp_t(current->signal->maj_flt +
-				     current->maj_flt);
+	ac.ac_minflt = encode_comp_t(pacct->ac_minflt);
+	ac.ac_majflt = encode_comp_t(pacct->ac_majflt);
 	ac.ac_swaps = encode_comp_t(0);
-	ac.ac_exitcode = exitcode;
+	ac.ac_exitcode = pacct->ac_exitcode;

 	/*
          * Kernel segment override to datasegment and write it
@@ -545,13 +518,56 @@ static void do_acct_process(long exitcod
 	set_fs(fs);
 }

+void acct_init_pacct(struct pacct_struct *pacct, struct task_struct *tsk)
+{
+	memset(pacct, 0, sizeof(struct pacct_struct));
+	pacct->ac_utime = pacct->ac_stime = cputime_zero;
+}
+
+void acct_collect(long exitcode, int group_dead)
+{
+	struct pacct_struct *pacct = &current->signal->pacct;
+	unsigned long vsize = 0;
+
+	if (group_dead && current->mm) {
+		struct vm_area_struct *vma;
+		down_read(&current->mm->mmap_sem);
+		vma = current->mm->mmap;
+		while (vma) {
+			vsize += vma->vm_end - vma->vm_start;
+			vma = vma->vm_next;
+		}
+		up_read(&current->mm->mmap_sem);
+	}
+	spin_lock(&current->sighand->siglock);
+	if (group_dead)
+		pacct->ac_mem = vsize / 1024;
+	if (thread_group_leader(current)) {
+		pacct->ac_start_time = current->start_time;
+		pacct->ac_exitcode = exitcode;
+		if (current->flags & PF_FORKNOEXEC)
+			pacct->ac_flag |= AFORK;
+	}
+	if (current->flags & PF_SUPERPRIV)
+		pacct->ac_flag |= ASU;
+	if (current->flags & PF_DUMPCORE)
+		pacct->ac_flag |= ACORE;
+	if (current->flags & PF_SIGNALED)
+		pacct->ac_flag |= AXSIG;
+	pacct->ac_utime = cputime_add(pacct->ac_utime, current->utime);
+	pacct->ac_stime = cputime_add(pacct->ac_stime, current->stime);
+	pacct->ac_minflt += current->min_flt;
+	pacct->ac_majflt += current->maj_flt;
+	spin_unlock(&current->sighand->siglock);
+}
+
 /**
  * acct_process - now just a wrapper around do_acct_process
  * @exitcode: task exit code
  *
  * handles process accounting for an exiting task
  */
-void acct_process(long exitcode)
+void acct_process()
 {
 	struct file *file = NULL;

@@ -570,11 +586,10 @@ void acct_process(long exitcode)
 	get_file(file);
 	spin_unlock(&acct_globals.lock);

-	do_acct_process(exitcode, file);
+	do_acct_process(file);
 	fput(file);
 }

-
 /**
  * acct_update_integrals - update mm integral fields in task_struct
  * @tsk: task_struct for accounting
--- linux-2.6.17/kernel/fork.c	2006-06-18 10:49:35.000000000 +0900
+++ linux-2.6.17.kg/kernel/fork.c	2006-06-19 16:16:30.000000000 +0900
@@ -871,6 +871,7 @@ static inline int copy_signal(unsigned l
 		tsk->it_prof_expires =
 			secs_to_cputime(sig->rlim[RLIMIT_CPU].rlim_cur);
 	}
+	acct_init_pacct(&sig->pacct, tsk);

 	return 0;
 }
--- linux-2.6.17/kernel/exit.c	2006-06-18 10:49:35.000000000 +0900
+++ linux-2.6.17.kg/kernel/exit.c	2006-06-19 16:14:31.000000000 +0900
@@ -895,8 +895,8 @@ fastcall NORET_TYPE void do_exit(long co
 	if (group_dead) {
  		hrtimer_cancel(&tsk->signal->real_timer);
 		exit_itimers(tsk->signal);
-		acct_process(code);
 	}
+	acct_collect(code, group_dead);
 	if (unlikely(tsk->robust_list))
 		exit_robust_list(tsk);
 #ifdef CONFIG_COMPAT
@@ -907,6 +907,8 @@ fastcall NORET_TYPE void do_exit(long co
 		audit_free(tsk);
 	exit_mm(tsk);

+	if (group_dead)
+		acct_process();
 	exit_sem(tsk);
 	__exit_files(tsk);
 	__exit_fs(tsk);

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Add pacct_struct to fix some pacct bugs.
  2006-06-20  6:24 Add pacct_struct to fix some pacct bugs KaiGai Kohei
@ 2006-06-20  6:42 ` Andrew Morton
  2006-06-20  7:27   ` KaiGai Kohei
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2006-06-20  6:42 UTC (permalink / raw)
  To: KaiGai Kohei; +Cc: linux-kernel

On Tue, 20 Jun 2006 15:24:59 +0900
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

> Hi, I noticed three problems in pacct facility.
> 
> 1. Pacct facility has a possibility to write incorrect ac_flag
>    in multi-threading cases.
> 2. There is a possibility to be waken up OOM Killer from
>    pacct facility. It will cause system stall.
> 3. If several threads are killed at same time, There is
>    a possibility not to pick up a part of those accountings.
> 
> The attached patch will resolve those matters.
> Any comments please. Thanks,

Thanks, but you have three quite distinct bugs here, and three quite
distinct descriptions and, I think, three quite distinct fixes.

Would it be possible for you to prepare three patches?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Add pacct_struct to fix some pacct bugs.
  2006-06-20  6:42 ` Andrew Morton
@ 2006-06-20  7:27   ` KaiGai Kohei
  2006-06-22  3:07     ` KaiGai Kohei
  0 siblings, 1 reply; 10+ messages in thread
From: KaiGai Kohei @ 2006-06-20  7:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

>> Hi, I noticed three problems in pacct facility.
>>
>> 1. Pacct facility has a possibility to write incorrect ac_flag
>>    in multi-threading cases.
>> 2. There is a possibility to be waken up OOM Killer from
>>    pacct facility. It will cause system stall.
>> 3. If several threads are killed at same time, There is
>>    a possibility not to pick up a part of those accountings.
>>
>> The attached patch will resolve those matters.
>> Any comments please. Thanks,
> 
> Thanks, but you have three quite distinct bugs here, and three quite
> distinct descriptions and, I think, three quite distinct fixes.
> 
> Would it be possible for you to prepare three patches?

It may be possible. Please wait for a while to separate it into
three-part and to confirm its correct behavior.

Thanks,
-- 
Open Source Software Promotion Center, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Add pacct_struct to fix some pacct bugs.
  2006-06-20  7:27   ` KaiGai Kohei
@ 2006-06-22  3:07     ` KaiGai Kohei
  2006-06-22  3:08       ` KaiGai Kohei
                         ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: KaiGai Kohei @ 2006-06-22  3:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: KaiGai Kohei, linux-kernel

The seriese of patches fixes some process accounting bugs.

[PATCH 1/3] two phase process accounting
[PATCH 2/3] avoidance to refer the last thread as a representation
             of the process
[PATCH 3/3] none-delayed process accounting accumulation

* background of the patch.1

     The pacct facility need an i/o operation when an accounting record
     is generated. There is a possibility to wake OOM killer up.
     If OOM killer is activated, it kills some processes to make them
     release process memory regions.
     But acct_process() is called in the killed processes context
     before calling exit_mm(), so those processes cannot release
     own memory. In the results, any processes stop in this point and
     it finally cause a system stall.

     ---- in kernel/exit.c : do_exit() ------------
             group_dead = atomic_dec_and_test(&tsk->signal->live);
             if (group_dead) {
                     hrtimer_cancel(&tsk->signal->real_timer);
                     exit_itimers(tsk->signal);
                     acct_process(code);
             }
                :
            - snip -
                :
             exit_mm(tsk);
     ----------------------------------------------

     This patch separates generating an accounting record facility
     into two-phase.
     In the first one, acct_collect() calculate vitual memory size
     of the process and stores it into pacct_struct before exit_mm().
     Then, acct_process() generates an accounting record and write
     it into medium.


* background of the patch.2

     When pacct facility generate an 'ac_flag' field in accounting record,
     it refers a task_struct of the thread which died last in the process.
     But any other task_structs are ignored.

     Therefore, pacct facility drops ASU flag even if root-privilege
     operations are used by any other threads except the last one.
     In addition, AFORK flag is always set when the thread of group-leader
     didn't die last, although this process has called execve() after fork().

     We have a same matter in ac_exitcode. The recorded ac_exitcode is
     an exit code of the last thread in the process. There is a possibility
     this exitcode is not the group leader's one.

     ---- in kernel/acct.c : do_acct_process() ----
             ac.ac_flag = 0;
             if (current->flags & PF_FORKNOEXEC)
                     ac.ac_flag |= AFORK;
             if (current->flags & PF_SUPERPRIV)
                     ac.ac_flag |= ASU;
             if (current->flags & PF_DUMPCORE)
                     ac.ac_flag |= ACORE;
             if (current->flags & PF_SIGNALED)
                     ac.ac_flag |= AXSIG;
                       :
                    - snip -
                       :
             ac.ac_exitcode = exitcode;
     ----------------------------------------------

     This patch fixes those matters.
     - The exit code of group leader is recorded as ac_exitcode.
     - ASU, ACORE, AXSIG flag are marked if any task_struct satisfy
       the conditions.
     - AFORK flag is marked if only group leader thread satisfy
       the condition.


* background of the patch.3

     In current 2.6.17 implementation, signal_struct refered from task_struct
     is used for per-process data structure. The pacct facility also uses it
     as a per-process data structure to store stime, utime, minflt, majflt.
     But those members are saved in __exit_signal(). It's too late.

     For example, if some threads exits at same time, pacct facility has
     a possibility to drop accountings for a part of those threads.
     (see, the following 'The results of original 2.6.17 kernel')
     I think accounting information should be completely collected into
     the per-process data structure before writing out an accounting record.

     This patch fixes this matter. Accumulation of stime, utime, minflt
     and majflt are done before generating accounting record.


* accounting results

[in original 2.6.17 cases]

# accton acct.log
# time -p ./bugacct
real 10.07
user 5.96
sys 0.10
# time -p ./raceacct 4
real 6.92
user 27.22
sys 0.00
# time -p ./raceacct 4
real 7.71
user 30.14
sys 0.00
# time -p ./raceacct 4
real 6.94
user 27.21
sys 0.00
# time -p ./raceacct 4
real 6.25
user 24.42
sys 0.00
# time -p ./raceacct 4
real 6.92
user 27.22
sys 0.00

-- accounting results --------
FLAG    BTIME  ETIME  UTIME  STIME     MEM  MINFLT MAJFLT      COMM
-P-- 13:41:16      5       0     0    3072     110      0    accton
F--- 13:41:35   1006     596     9  143232    8200      0   bugacct *
F--- 13:41:53    692    2032     0   28528      38      0  raceacct *
---- 13:42:10    771    3014     0   28528     170      0  raceacct
F--- 13:42:19    694    2027     0   28528       8      0  raceacct *
F--- 13:42:26    625    1832     0   28528      40      0  raceacct *
---- 13:45:40    692    2722     0   28528     171      0  raceacct

'P' means this process used root privilege operations.
'F' means this process didn't execve() after fork().

=> bugacct used root privilege operation, but pacct facility droped it.
=> In raceacct, some threads exit on same time. pacct facility often drops
    a part of utime, stime, minflt and majflt.
=> When group leader thread didn't die last in raceacct, incorrent flag 'F'
    is set.


[in patched 2.6.17-kg cases]

# touch acct.log
# accton acct.log
# time -p ./bugacct
real 10.07
user 5.97
sys 0.09
# time -p ./raceacct 4
real 7.11
user 27.76
sys 0.00
# time -p ./raceacct 4
real 6.93
user 27.18
sys 0.00
# time -p ./raceacct 4
real 7.11
user 27.76
sys 0.00
# time -p ./raceacct 4
real 7.12
user 27.77
sys 0.00
# time -p ./raceacct 4
real 6.92
user 27.17
sys 0.00

-- accounting results --------
FLAG    BTIME  ETIME  UTIME  STIME     MEM  MINFLT MAJFLT      COMM
-P-- 13:24:01      0      0      0    3072     111      0    accton
-P-- 13:24:05   1007    597      8  143232    8360      0   bugacct
---- 13:24:35    711   2776      0   28528     171      0  raceacct
---- 13:24:44    693   2718      0   28528     172      0  raceacct
---- 13:24:51    711   2776      0   28528     172      0  raceacct
---- 13:25:05    712   2777      0   28528     174      0  raceacct
---- 13:25:14    692   2717      0   28528     171      0  raceacct

I hope your any comments. Thanks,

KaiGai Kohei wrote:
 >>> Hi, I noticed three problems in pacct facility.
 >>>
 >>> 1. Pacct facility has a possibility to write incorrect ac_flag
 >>>    in multi-threading cases.
 >>> 2. There is a possibility to be waken up OOM Killer from
 >>>    pacct facility. It will cause system stall.
 >>> 3. If several threads are killed at same time, There is
 >>>    a possibility not to pick up a part of those accountings.
 >>>
 >>> The attached patch will resolve those matters.
 >>> Any comments please. Thanks,
 >>
 >> Thanks, but you have three quite distinct bugs here, and three quite
 >> distinct descriptions and, I think, three quite distinct fixes.
 >>
 >> Would it be possible for you to prepare three patches?
 >
 > It may be possible. Please wait for a while to separate it into
 > three-part and to confirm its correct behavior.
 >
 > Thanks,
-- 
Open Source Software Promotion Center, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Add pacct_struct to fix some pacct bugs.
  2006-06-22  3:07     ` KaiGai Kohei
@ 2006-06-22  3:08       ` KaiGai Kohei
  2006-06-22  3:09       ` KaiGai Kohei
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: KaiGai Kohei @ 2006-06-22  3:08 UTC (permalink / raw)
  To: Andrew Morton; +Cc: KaiGai Kohei, linux-kernel

     [PACCT] two phase process accounting

     The pacct facility need an i/o operation when an accounting record
     is generated. There is a possibility to wake OOM killer up.
     If OOM killer is activated, it kills some processes to make them
     release process memory regions.
     But acct_process() is called in the killed processes context
     before calling exit_mm(), so those processes cannot release
     own memory. In the results, any processes stop in this point and
     it finally cause a system stall.

     ---- in kernel/exit.c : do_exit() ------------
             group_dead = atomic_dec_and_test(&tsk->signal->live);
             if (group_dead) {
                     hrtimer_cancel(&tsk->signal->real_timer);
                     exit_itimers(tsk->signal);
                     acct_process(code);
             }
                :
            - snip -
                :
             exit_mm(tsk);
     ----------------------------------------------

     This patch separates generating an accounting record facility
     into two-phase.
     In the first one, acct_collect() calculate vitual memory size
     of the process and stores it into pacct_struct before exit_mm().
     Then, acct_process() generates an accounting record and write
     it into medium.

     Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>

diff --git a/include/linux/acct.h b/include/linux/acct.h
index 9a66401..6753687 100644
--- a/include/linux/acct.h
+++ b/include/linux/acct.h
@@ -121,16 +121,20 @@ struct acct_v3
  #ifdef CONFIG_BSD_PROCESS_ACCT
  struct vfsmount;
  struct super_block;
  extern void acct_auto_close_mnt(struct vfsmount *m);
  extern void acct_auto_close(struct super_block *sb);
+extern void acct_init_pacct(struct pacct_struct *pacct);
+extern void acct_collect();
  extern void acct_process(long exitcode);
  extern void acct_update_integrals(struct task_struct *tsk);
  extern void acct_clear_integrals(struct task_struct *tsk);
  #else
  #define acct_auto_close_mnt(x)	do { } while (0)
  #define acct_auto_close(x)	do { } while (0)
+#define acct_init_pacct(x)	do { } while (0)
+#define acct_collect()		do { } while (0)
  #define acct_process(x)		do { } while (0)
  #define acct_update_integrals(x)		do { } while (0)
  #define acct_clear_integrals(task)	do { } while (0)
  #endif

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 29b7d4f..918fdda 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -356,10 +356,14 @@ struct sighand_struct {
  	atomic_t		count;
  	struct k_sigaction	action[_NSIG];
  	spinlock_t		siglock;
  };

+struct pacct_struct {
+	unsigned long		ac_mem;
+};
+
  /*
   * NOTE! "signal_struct" does not have it's own
   * locking, because a shared signal_struct always
   * implies a shared sighand_struct, so locking
   * sighand_struct is always a proper superset of
@@ -447,10 +451,13 @@ struct signal_struct {
  	 * thing in threads created with CLONE_THREAD */
  #ifdef CONFIG_KEYS
  	struct key *session_keyring;	/* keyring inherited over fork */
  	struct key *process_keyring;	/* keyring private to this process */
  #endif
+#ifdef CONFIG_BSD_PROCESS_ACCT
+	struct pacct_struct pacct;	/* per-process accounting information */
+#endif
  };

  /* Context switch must be unlocked if interrupts are to be enabled */
  #ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
  # define __ARCH_WANT_UNLOCKED_CTXSW
diff --git a/kernel/acct.c b/kernel/acct.c
index b327f4d..f1a4e12 100644
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -419,13 +419,13 @@ static u32 encode_float(u64 value)
  /*
   *  do_acct_process does all actual work. Caller holds the reference to file.
   */
  static void do_acct_process(long exitcode, struct file *file)
  {
+	struct pacct_struct *pacct = &current->signal->pacct;
  	acct_t ac;
  	mm_segment_t fs;
-	unsigned long vsize;
  	unsigned long flim;
  	u64 elapsed;
  	u64 run_time;
  	struct timespec uptime;
  	unsigned long jiffies;
@@ -503,24 +503,13 @@ static void do_acct_process(long exitcod
  		ac.ac_flag |= ASU;
  	if (current->flags & PF_DUMPCORE)
  		ac.ac_flag |= ACORE;
  	if (current->flags & PF_SIGNALED)
  		ac.ac_flag |= AXSIG;
-
-	vsize = 0;
-	if (current->mm) {
-		struct vm_area_struct *vma;
-		down_read(&current->mm->mmap_sem);
-		vma = current->mm->mmap;
-		while (vma) {
-			vsize += vma->vm_end - vma->vm_start;
-			vma = vma->vm_next;
-		}
-		up_read(&current->mm->mmap_sem);
-	}
-	vsize = vsize / 1024;
-	ac.ac_mem = encode_comp_t(vsize);
+	spin_lock(&current->sighand->siglock);
+	ac.ac_mem = encode_comp_t(pacct->ac_mem);
+	spin_unlock(&current->sighand->siglock);
  	ac.ac_io = encode_comp_t(0 /* current->io_usage */);	/* %% */
  	ac.ac_rw = encode_comp_t(ac.ac_io / 1024);
  	ac.ac_minflt = encode_comp_t(current->signal->min_flt +
  				     current->min_flt);
  	ac.ac_majflt = encode_comp_t(current->signal->maj_flt +
@@ -544,10 +533,42 @@ static void do_acct_process(long exitcod
  	current->signal->rlim[RLIMIT_FSIZE].rlim_cur = flim;
  	set_fs(fs);
  }

  /**
+ * acct_init_pacct - initialize a new pacct_struct
+ */
+void acct_init_pacct(struct pacct_struct *pacct)
+{
+	memset(pacct, 0, sizeof(struct pacct_struct));
+}
+
+/**
+ * acct_collect - collect accounting information into pacct_struct
+ */
+void acct_collect(void)
+{
+	struct pacct_struct *pacct = &current->signal->pacct;
+	unsigned long vsize = 0;
+
+	if (current->mm) {
+		struct vm_area_struct *vma;
+		down_read(&current->mm->mmap_sem);
+		vma = current->mm->mmap;
+		while (vma) {
+			vsize += vma->vm_end - vma->vm_start;
+			vma = vma->vm_next;
+		}
+		up_read(&current->mm->mmap_sem);
+	}
+
+	spin_lock(&current->sighand->siglock);
+	pacct->ac_mem = vsize / 1024;
+	spin_unlock(&current->sighand->siglock);
+}
+
+/**
   * acct_process - now just a wrapper around do_acct_process
   * @exitcode: task exit code
   *
   * handles process accounting for an exiting task
   */
diff --git a/kernel/exit.c b/kernel/exit.c
index e06d0c1..54bdbd9 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -893,11 +893,11 @@ fastcall NORET_TYPE void do_exit(long co
  	}
  	group_dead = atomic_dec_and_test(&tsk->signal->live);
  	if (group_dead) {
   		hrtimer_cancel(&tsk->signal->real_timer);
  		exit_itimers(tsk->signal);
-		acct_process(code);
+		acct_collect();
  	}
  	if (unlikely(tsk->robust_list))
  		exit_robust_list(tsk);
  #ifdef CONFIG_COMPAT
  	if (unlikely(tsk->compat_robust_list))
@@ -905,10 +905,12 @@ fastcall NORET_TYPE void do_exit(long co
  #endif
  	if (unlikely(tsk->audit_context))
  		audit_free(tsk);
  	exit_mm(tsk);

+	if (group_dead)
+		acct_process(code);
  	exit_sem(tsk);
  	__exit_files(tsk);
  	__exit_fs(tsk);
  	exit_namespace(tsk);
  	exit_thread();
diff --git a/kernel/fork.c b/kernel/fork.c
index ac8100e..d6c812c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -869,10 +869,11 @@ static inline int copy_signal(unsigned l
  		 * of the whole CPU time limit.
  		 */
  		tsk->it_prof_expires =
  			secs_to_cputime(sig->rlim[RLIMIT_CPU].rlim_cur);
  	}
+	acct_init_pacct(&sig->pacct);

  	return 0;
  }

  void __cleanup_signal(struct signal_struct *sig)


-- 
Open Source Software Promotion Center, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Add pacct_struct to fix some pacct bugs.
  2006-06-22  3:07     ` KaiGai Kohei
  2006-06-22  3:08       ` KaiGai Kohei
@ 2006-06-22  3:09       ` KaiGai Kohei
  2006-06-22  3:09       ` KaiGai Kohei
  2006-06-22  3:35       ` Andrew Morton
  3 siblings, 0 replies; 10+ messages in thread
From: KaiGai Kohei @ 2006-06-22  3:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: KaiGai Kohei, linux-kernel

     [PACCT] avoidance to refer the last thread as a representation of the process

     When pacct facility generate an 'ac_flag' field in accounting record,
     it refers a task_struct of the thread which died last in the process.
     But any other task_structs are ignored.

     Therefore, pacct facility drops ASU flag even if root-privilege
     operations are used by any other threads except the last one.
     In addition, AFORK flag is always set when the thread of group-leader
     didn't die last, although this process has called execve() after fork().

     We have a same matter in ac_exitcode. The recorded ac_exitcode is
     an exit code of the last thread in the process. There is a possibility
     this exitcode is not the group leader's one.

     ---- in kernel/acct.c : do_acct_process() ----
             ac.ac_flag = 0;
             if (current->flags & PF_FORKNOEXEC)
                     ac.ac_flag |= AFORK;
             if (current->flags & PF_SUPERPRIV)
                     ac.ac_flag |= ASU;
             if (current->flags & PF_DUMPCORE)
                     ac.ac_flag |= ACORE;
             if (current->flags & PF_SIGNALED)
                     ac.ac_flag |= AXSIG;
                       :
                    - snip -
                       :
             ac.ac_exitcode = exitcode;
     ----------------------------------------------

     This patch fixes those matters.
     - The exit code of group leader is recorded as ac_exitcode.
     - ASU, ACORE, AXSIG flag are marked if any task_struct satisfy
       the conditions.
     - AFORK flag is marked if only group leader thread satisfy
       the condition.

     Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>

diff --git a/include/linux/acct.h b/include/linux/acct.h
index 6753687..879d441 100644
--- a/include/linux/acct.h
+++ b/include/linux/acct.h
@@ -122,20 +122,20 @@ struct acct_v3
  struct vfsmount;
  struct super_block;
  extern void acct_auto_close_mnt(struct vfsmount *m);
  extern void acct_auto_close(struct super_block *sb);
  extern void acct_init_pacct(struct pacct_struct *pacct);
-extern void acct_collect();
-extern void acct_process(long exitcode);
+extern void acct_collect(long exitcode, int group_dead);
+extern void acct_process(void);
  extern void acct_update_integrals(struct task_struct *tsk);
  extern void acct_clear_integrals(struct task_struct *tsk);
  #else
  #define acct_auto_close_mnt(x)	do { } while (0)
  #define acct_auto_close(x)	do { } while (0)
  #define acct_init_pacct(x)	do { } while (0)
-#define acct_collect()		do { } while (0)
-#define acct_process(x)		do { } while (0)
+#define acct_collect(x,y)	do { } while (0)
+#define acct_process()		do { } while (0)
  #define acct_update_integrals(x)		do { } while (0)
  #define acct_clear_integrals(task)	do { } while (0)
  #endif

  /*
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 918fdda..6905ac0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -357,10 +357,12 @@ struct sighand_struct {
  	struct k_sigaction	action[_NSIG];
  	spinlock_t		siglock;
  };

  struct pacct_struct {
+	int			ac_flag;
+	long			ac_exitcode;
  	unsigned long		ac_mem;
  };

  /*
   * NOTE! "signal_struct" does not have it's own
diff --git a/kernel/acct.c b/kernel/acct.c
index f1a4e12..7111fe7 100644
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -73,11 +73,11 @@ int acct_parm[3] = {4, 2, 30};
  #define ACCT_TIMEOUT	(acct_parm[2])	/* foo second timeout between checks */

  /*
   * External references and all of the globals.
   */
-static void do_acct_process(long, struct file *);
+static void do_acct_process(struct file *);

  /*
   * This structure is used so that all the data protected by lock
   * can be placed in the same cache line as the lock.  This primes
   * the cache line to have the data after getting the lock.
@@ -194,11 +194,11 @@ static void acct_file_reopen(struct file
  		add_timer(&acct_globals.timer);
  	}
  	if (old_acct) {
  		mnt_unpin(old_acct->f_vfsmnt);
  		spin_unlock(&acct_globals.lock);
-		do_acct_process(0, old_acct);
+		do_acct_process(old_acct);
  		filp_close(old_acct, NULL);
  		spin_lock(&acct_globals.lock);
  	}
  }

@@ -417,11 +417,11 @@ static u32 encode_float(u64 value)
   */

  /*
   *  do_acct_process does all actual work. Caller holds the reference to file.
   */
-static void do_acct_process(long exitcode, struct file *file)
+static void do_acct_process(struct file *file)
  {
  	struct pacct_struct *pacct = &current->signal->pacct;
  	acct_t ac;
  	mm_segment_t fs;
  	unsigned long flim;
@@ -494,30 +494,22 @@ static void do_acct_process(long exitcod
  	read_lock(&tasklist_lock);	/* pin current->signal */
  	ac.ac_tty = current->signal->tty ?
  		old_encode_dev(tty_devnum(current->signal->tty)) : 0;
  	read_unlock(&tasklist_lock);

-	ac.ac_flag = 0;
-	if (current->flags & PF_FORKNOEXEC)
-		ac.ac_flag |= AFORK;
-	if (current->flags & PF_SUPERPRIV)
-		ac.ac_flag |= ASU;
-	if (current->flags & PF_DUMPCORE)
-		ac.ac_flag |= ACORE;
-	if (current->flags & PF_SIGNALED)
-		ac.ac_flag |= AXSIG;
  	spin_lock(&current->sighand->siglock);
+	ac.ac_flag = pacct->ac_flag;
  	ac.ac_mem = encode_comp_t(pacct->ac_mem);
+	ac.ac_exitcode = pacct->ac_exitcode;
  	spin_unlock(&current->sighand->siglock);
  	ac.ac_io = encode_comp_t(0 /* current->io_usage */);	/* %% */
  	ac.ac_rw = encode_comp_t(ac.ac_io / 1024);
  	ac.ac_minflt = encode_comp_t(current->signal->min_flt +
  				     current->min_flt);
  	ac.ac_majflt = encode_comp_t(current->signal->maj_flt +
  				     current->maj_flt);
  	ac.ac_swaps = encode_comp_t(0);
-	ac.ac_exitcode = exitcode;

  	/*
           * Kernel segment override to datasegment and write it
           * to the accounting file.
           */
@@ -542,17 +534,19 @@ void acct_init_pacct(struct pacct_struct
  	memset(pacct, 0, sizeof(struct pacct_struct));
  }

  /**
   * acct_collect - collect accounting information into pacct_struct
+ * @exitcode: task exit code
+ * @group_dead: not 0, if this thread is the last one in the process.
   */
-void acct_collect(void)
+void acct_collect(long exitcode, int group_dead)
  {
  	struct pacct_struct *pacct = &current->signal->pacct;
  	unsigned long vsize = 0;

-	if (current->mm) {
+	if (group_dead && current->mm) {
  		struct vm_area_struct *vma;
  		down_read(&current->mm->mmap_sem);
  		vma = current->mm->mmap;
  		while (vma) {
  			vsize += vma->vm_end - vma->vm_start;
@@ -560,21 +554,33 @@ void acct_collect(void)
  		}
  		up_read(&current->mm->mmap_sem);
  	}

  	spin_lock(&current->sighand->siglock);
-	pacct->ac_mem = vsize / 1024;
+	if (group_dead)
+		pacct->ac_mem = vsize / 1024;
+	if (thread_group_leader(current)) {
+		pacct->ac_exitcode = exitcode;
+		if (current->flags & PF_FORKNOEXEC)
+			pacct->ac_flag |= AFORK;
+	}
+	if (current->flags & PF_SUPERPRIV)
+		pacct->ac_flag |= ASU;
+	if (current->flags & PF_DUMPCORE)
+		pacct->ac_flag |= ACORE;
+	if (current->flags & PF_SIGNALED)
+		pacct->ac_flag |= AXSIG;
  	spin_unlock(&current->sighand->siglock);
  }

  /**
   * acct_process - now just a wrapper around do_acct_process
   * @exitcode: task exit code
   *
   * handles process accounting for an exiting task
   */
-void acct_process(long exitcode)
+void acct_process()
  {
  	struct file *file = NULL;

  	/*
  	 * accelerate the common fastpath:
@@ -589,11 +595,11 @@ void acct_process(long exitcode)
  		return;
  	}
  	get_file(file);
  	spin_unlock(&acct_globals.lock);

-	do_acct_process(exitcode, file);
+	do_acct_process(file);
  	fput(file);
  }


  /**
diff --git a/kernel/exit.c b/kernel/exit.c
index 54bdbd9..9d395c7 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -893,12 +893,12 @@ fastcall NORET_TYPE void do_exit(long co
  	}
  	group_dead = atomic_dec_and_test(&tsk->signal->live);
  	if (group_dead) {
   		hrtimer_cancel(&tsk->signal->real_timer);
  		exit_itimers(tsk->signal);
-		acct_collect();
  	}
+	acct_collect(code, group_dead);
  	if (unlikely(tsk->robust_list))
  		exit_robust_list(tsk);
  #ifdef CONFIG_COMPAT
  	if (unlikely(tsk->compat_robust_list))
  		compat_exit_robust_list(tsk);
@@ -906,11 +906,11 @@ fastcall NORET_TYPE void do_exit(long co
  	if (unlikely(tsk->audit_context))
  		audit_free(tsk);
  	exit_mm(tsk);

  	if (group_dead)
-		acct_process(code);
+		acct_process();
  	exit_sem(tsk);
  	__exit_files(tsk);
  	__exit_fs(tsk);
  	exit_namespace(tsk);
  	exit_thread();

-- 
Open Source Software Promotion Center, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Add pacct_struct to fix some pacct bugs.
  2006-06-22  3:07     ` KaiGai Kohei
  2006-06-22  3:08       ` KaiGai Kohei
  2006-06-22  3:09       ` KaiGai Kohei
@ 2006-06-22  3:09       ` KaiGai Kohei
  2006-06-22  3:35       ` Andrew Morton
  3 siblings, 0 replies; 10+ messages in thread
From: KaiGai Kohei @ 2006-06-22  3:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: KaiGai Kohei, linux-kernel

     [PACCT] none-delayed process accounting accumulation

     In current 2.6.17 implementation, signal_struct refered from task_struct
     is used for per-process data structure. The pacct facility also uses it
     as a per-process data structure to store stime, utime, minflt, majflt.
     But those members are saved in __exit_signal(). It's too late.

     For example, if some threads exits at same time, pacct facility has
     a possibility to drop accountings for a part of those threads.
     (see, the following 'The results of original 2.6.17 kernel')
     I think accounting information should be completely collected into
     the per-process data structure before writing out an accounting record.

     This patch fixes this matter. Accumulation of stime, utime, minflt
     and majflt are done before generating accounting record.

     Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6905ac0..06c72fb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -360,10 +360,12 @@ struct sighand_struct {

  struct pacct_struct {
  	int			ac_flag;
  	long			ac_exitcode;
  	unsigned long		ac_mem;
+	cputime_t		ac_utime, ac_stime;
+	unsigned long		ac_minflt, ac_majflt;
  };

  /*
   * NOTE! "signal_struct" does not have it's own
   * locking, because a shared signal_struct always
diff --git a/kernel/acct.c b/kernel/acct.c
index 7111fe7..48c701c 100644
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -426,11 +426,10 @@ static void do_acct_process(struct file
  	mm_segment_t fs;
  	unsigned long flim;
  	u64 elapsed;
  	u64 run_time;
  	struct timespec uptime;
-	unsigned long jiffies;

  	/*
  	 * First check to see if there is enough free_space to continue
  	 * the process accounting system.
  	 */
@@ -467,16 +466,10 @@ static void do_acct_process(struct file
  		ac.ac_etime_lo = (u16) etime;
  	}
  #endif
  	do_div(elapsed, AHZ);
  	ac.ac_btime = xtime.tv_sec - elapsed;
-	jiffies = cputime_to_jiffies(cputime_add(current->utime,
-						 current->signal->utime));
-	ac.ac_utime = encode_comp_t(jiffies_to_AHZ(jiffies));
-	jiffies = cputime_to_jiffies(cputime_add(current->stime,
-						 current->signal->stime));
-	ac.ac_stime = encode_comp_t(jiffies_to_AHZ(jiffies));
  	/* we really need to bite the bullet and change layout */
  	ac.ac_uid = current->uid;
  	ac.ac_gid = current->gid;
  #if ACCT_VERSION==2
  	ac.ac_ahz = AHZ;
@@ -495,20 +488,20 @@ static void do_acct_process(struct file
  	ac.ac_tty = current->signal->tty ?
  		old_encode_dev(tty_devnum(current->signal->tty)) : 0;
  	read_unlock(&tasklist_lock);

  	spin_lock(&current->sighand->siglock);
+	ac.ac_utime = encode_comp_t(jiffies_to_AHZ(cputime_to_jiffies(pacct->ac_utime)));
+	ac.ac_stime = encode_comp_t(jiffies_to_AHZ(cputime_to_jiffies(pacct->ac_stime)));
  	ac.ac_flag = pacct->ac_flag;
  	ac.ac_mem = encode_comp_t(pacct->ac_mem);
+	ac.ac_minflt = encode_comp_t(pacct->ac_minflt);
+	ac.ac_majflt = encode_comp_t(pacct->ac_majflt);
  	ac.ac_exitcode = pacct->ac_exitcode;
  	spin_unlock(&current->sighand->siglock);
  	ac.ac_io = encode_comp_t(0 /* current->io_usage */);	/* %% */
  	ac.ac_rw = encode_comp_t(ac.ac_io / 1024);
-	ac.ac_minflt = encode_comp_t(current->signal->min_flt +
-				     current->min_flt);
-	ac.ac_majflt = encode_comp_t(current->signal->maj_flt +
-				     current->maj_flt);
  	ac.ac_swaps = encode_comp_t(0);

  	/*
           * Kernel segment override to datasegment and write it
           * to the accounting file.
@@ -530,10 +523,11 @@ static void do_acct_process(struct file
   * acct_init_pacct - initialize a new pacct_struct
   */
  void acct_init_pacct(struct pacct_struct *pacct)
  {
  	memset(pacct, 0, sizeof(struct pacct_struct));
+	pacct->ac_utime = pacct->ac_stime = cputime_zero;
  }

  /**
   * acct_collect - collect accounting information into pacct_struct
   * @exitcode: task exit code
@@ -567,10 +561,14 @@ void acct_collect(long exitcode, int gro
  		pacct->ac_flag |= ASU;
  	if (current->flags & PF_DUMPCORE)
  		pacct->ac_flag |= ACORE;
  	if (current->flags & PF_SIGNALED)
  		pacct->ac_flag |= AXSIG;
+	pacct->ac_utime = cputime_add(pacct->ac_utime, current->utime);
+	pacct->ac_stime = cputime_add(pacct->ac_stime, current->stime);
+	pacct->ac_minflt += current->min_flt;
+	pacct->ac_majflt += current->maj_flt;
  	spin_unlock(&current->sighand->siglock);
  }

  /**
   * acct_process - now just a wrapper around do_acct_process

-- 
Open Source Software Promotion Center, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Add pacct_struct to fix some pacct bugs.
  2006-06-22  3:07     ` KaiGai Kohei
                         ` (2 preceding siblings ...)
  2006-06-22  3:09       ` KaiGai Kohei
@ 2006-06-22  3:35       ` Andrew Morton
  2006-06-22  4:05         ` KaiGai Kohei
  2006-06-22  4:14         ` Randy.Dunlap
  3 siblings, 2 replies; 10+ messages in thread
From: Andrew Morton @ 2006-06-22  3:35 UTC (permalink / raw)
  To: KaiGai Kohei; +Cc: kaigai, linux-kernel

On Thu, 22 Jun 2006 12:07:19 +0900
KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:

> The seriese of patches fixes some process accounting bugs.

OK, thanks for splitting those up.  A few patch-protocol things:

- Please make the email Subject: reflect the patch contents - all three
  patches here were called "Re: Add pacct_struct to fix some pacct bugs."

- Please don't indent the changlogs by five spaces.  Start in column zero.

- Your email client performs space-stuffing, which corrupts the patches. 
  I fixed them all by hand.

  I don't know if it's possible to prevent thunderbird from doing this
  (my mozilla bugzilla record on this has to be three years old, and is
  still open).  You might have to use text/plain attachments in the future.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Add pacct_struct to fix some pacct bugs.
  2006-06-22  3:35       ` Andrew Morton
@ 2006-06-22  4:05         ` KaiGai Kohei
  2006-06-22  4:14         ` Randy.Dunlap
  1 sibling, 0 replies; 10+ messages in thread
From: KaiGai Kohei @ 2006-06-22  4:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Andrew Morton wrote:
> On Thu, 22 Jun 2006 12:07:19 +0900
> KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
> 
>> The seriese of patches fixes some process accounting bugs.
> 
> OK, thanks for splitting those up.  A few patch-protocol things:
> 
> - Please make the email Subject: reflect the patch contents - all three
>   patches here were called "Re: Add pacct_struct to fix some pacct bugs."
> 
> - Please don't indent the changlogs by five spaces.  Start in column zero.
> 
> - Your email client performs space-stuffing, which corrupts the patches. 
>   I fixed them all by hand.
> 
>   I don't know if it's possible to prevent thunderbird from doing this
>   (my mozilla bugzilla record on this has to be three years old, and is
>   still open).  You might have to use text/plain attachments in the future.

Oops, I'm sorry for sending you a trouble.
I'll pay attention about those manners.

Thanks,
-- 
Open Source Software Promotion Center, NEC
KaiGai Kohei <kaigai@ak.jp.nec.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Add pacct_struct to fix some pacct bugs.
  2006-06-22  3:35       ` Andrew Morton
  2006-06-22  4:05         ` KaiGai Kohei
@ 2006-06-22  4:14         ` Randy.Dunlap
  1 sibling, 0 replies; 10+ messages in thread
From: Randy.Dunlap @ 2006-06-22  4:14 UTC (permalink / raw)
  To: Andrew Morton; +Cc: kaigai, linux-kernel

On Wed, 21 Jun 2006 20:35:50 -0700 Andrew Morton wrote:

> On Thu, 22 Jun 2006 12:07:19 +0900
> KaiGai Kohei <kaigai@ak.jp.nec.com> wrote:
> 
> > The seriese of patches fixes some process accounting bugs.
> 
> OK, thanks for splitting those up.  A few patch-protocol things:
> 
> - Please make the email Subject: reflect the patch contents - all three
>   patches here were called "Re: Add pacct_struct to fix some pacct bugs."
> 
> - Please don't indent the changlogs by five spaces.  Start in column zero.
> 
> - Your email client performs space-stuffing, which corrupts the patches. 
>   I fixed them all by hand.
> 
>   I don't know if it's possible to prevent thunderbird from doing this
>   (my mozilla bugzilla record on this has to be three years old, and is
>   still open).  You might have to use text/plain attachments in the future.


tbird hints (works for me from oracle.com):

http://mbligh.org/linuxdocs/Email/Clients/Thunderbird
and
http://lkml.org/lkml/2005/12/27/191
and
http://lists.osdl.org/pipermail/kernel-janitors/2006-June/006478.html

However, I guess there could be other things between the sender
and receiver that cause problems...

---
~Randy

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2006-06-22  4:12 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-06-20  6:24 Add pacct_struct to fix some pacct bugs KaiGai Kohei
2006-06-20  6:42 ` Andrew Morton
2006-06-20  7:27   ` KaiGai Kohei
2006-06-22  3:07     ` KaiGai Kohei
2006-06-22  3:08       ` KaiGai Kohei
2006-06-22  3:09       ` KaiGai Kohei
2006-06-22  3:09       ` KaiGai Kohei
2006-06-22  3:35       ` Andrew Morton
2006-06-22  4:05         ` KaiGai Kohei
2006-06-22  4:14         ` Randy.Dunlap

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).