From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751335AbdBLPvm (ORCPT ); Sun, 12 Feb 2017 10:51:42 -0500 Received: from forward11m.cmail.yandex.net ([5.255.216.137]:35777 "EHLO forward11m.cmail.yandex.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751226AbdBLPvk (ORCPT ); Sun, 12 Feb 2017 10:51:40 -0500 X-Greylist: delayed 389 seconds by postgrey-1.27 at vger.kernel.org; Sun, 12 Feb 2017 10:51:40 EST Authentication-Results: smtp3m.mail.yandex.net; dkim=pass header.i=@yandex.ru X-Yandex-ForeignMX: US X-Yandex-Suid-Status: 1 0,1 0,1 0,1 0,1 0,1 0,1 0,1 0,1 0,1 0,1 0,1 0,1 0,1 0 From: Dmitry Romanov Subject: Re: [REGRESSION] Two issues that prevent process accounting (taskstats) from working correctly To: LKML References: <78aa4c26-2a31-06e5-d862-6e777d204b2b@yandex.ru> Cc: Peter Zijlstra , Ingo Molnar , Andrew Morton , Ingo Molnar , Nicolas Dichtel , Jay Lan , Martin Steigerwald , Gerlof Langeveld , Marc Haber , Ben Hutchings , Balbir Singh , Tejun Heo , kernel-team@fb.com Message-ID: Date: Sun, 12 Feb 2017 18:44:57 +0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.7.0 MIME-Version: 1.0 In-Reply-To: <78aa4c26-2a31-06e5-d862-6e777d204b2b@yandex.ru> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Dec 19, 2016 at 01:06:00PM +0100, Martin Steigerwald wrote: > > 1) Sometimes process accounting does not work at all. > > The acct() system call (to activate process accounting) return value 0, > which means that process accounting is activated successfully. > However, no process accounting records are written whatsoever. This > situation can be reproduced with the program 'acctdemo.c' > that you can find as attachment. When this program gives the message > "found a process accounting record!", the situation is okay > and process accounting works fine to the file '/tmp/mypacct'. When the > message 'No process accounting record yet....' is repeatedly given, > process accounting does not work and will not work at all. It might be > that you have to start this program several times before you get > this situation (preferably start/finish lots of processes in the mean time). > This problem is probably caused by a new mechanism introduced in the > kernel code (..../linux/kernel/acct.c) that is called 'slow accounting' > and has to be solved in the kernel code. > > I experience this problem on Debian8 with a 4.8 kernel and on CentOS7 > with a 4.8 kernel. I reported same problem on bugzilla as: Bug 180841 - Process accounting sometimes do not append records for terminated processes https://bugzilla.kernel.org/show_bug.cgi?id=180841 I think I found cause of this problem and can suggest patch which correct this problem. Problem arise in such situation: Problem arise if between process accounting activation with call acct(...) and first termination of process pass not less than one jiffy. (Note, acct() return successfully, with zero.) In such situation records for terminated processes do nod append to accounting file, until process accounting is restarted. I wrote test program test.c for illustration described problem for kernel version 3.17-rc1 and later. This program create empty file for accounting, call system call acct() with this file, sleep for not less than one jiffy, create new process and exit this process. Then, program test size of accounting file. If size of file remain zero, it seems problem and program display message "Accounting file size = 0, process accounting did not add record to accounting file!". On my system problem reproduce almost always by this test.c. ---------- test.c ---------- #include #include #include #include #include #include /* Accounting file name : */ #define ACCTFILENAME "/var/log/pacct" int main() { int fd; int pid; struct stat sb; /* Turn off accounting */ if ( acct(NULL) < 0 ) { perror("Process accounting off"); return -1; } /* Create empty accounting file */ fd = creat(ACCTFILENAME, S_IRUSR | S_IWUSR); if ( fd == -1 ) { perror("Accounting file creation"); return -1; } if ( close(fd) == -1) { perror("Accounting file closing"); return -1; } /* Switch accounting on */ if ( acct(ACCTFILENAME) < 0 ) { perror("Process accounting on"); return -1; } /* Delay must be at least 1 jiffy. For reproducing bug, some process exit must not happen during first jiffy. If HZ >= 100, need delay minimum 10 ms. */ usleep(10000); /* Create and exit child process. The record for this process must be append by activated process accounting. */ pid = fork(); if (pid < 0) { perror("Child process creating"); return -1; } /* Exit child process */ if (pid == 0) { exit(0); } /* Wait for child process termination */ wait(NULL); /* Get size of accounting file. */ if ( stat(ACCTFILENAME, &sb) == -1 ) { perror("Getting accounting file status"); return -1; } if ( sb.st_size == 0 ) { printf("Accounting file size = 0, process accounting did not add record to accounting file!\n"); } else { printf("Accounting file size > 0, expected behaviour.\n"); } /* Turn off accounting and remove accounting file*/ if ( acct(NULL) < 0 ) { perror("Process accounting off"); return -1; } if ( remove(ACCTFILENAME) == -1 ) { perror("Removing accounting file"); return -1; } return 0; } ---------- I suppose that this problem may be solve in kernel versions 3.17-rc1 and later by following patch: Signed-off-by: Dmitry Romanov --- kernel/acct.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/acct.c b/kernel/acct.c index 74963d1..37f1dc6 100644 --- a/kernel/acct.c +++ b/kernel/acct.c @@ -99,7 +99,7 @@ static int check_free_space(struct bsd_acct_struct *acct) { struct kstatfs sbuf; - if (time_is_before_jiffies(acct->needcheck)) + if (time_is_after_jiffies(acct->needcheck)) goto out; /* May block */ -- 1.9.1 Line "if (time_is_before_jiffies(acct->needcheck))" use for check whether the time for check free space. It seems function "time_is_after_jiffies" need here. ---------- In kernel versions 3.3-rc1 - 3.16 activation of process accounting implemented differently, so delay between call acct(filename) and first process termination do not produce problem, and program test.c do not detect problem. But, it seems, using function time_is_before_jiffies is not right similarly. Another problem arise, if during work of process accounting happen that current jiffies is greater than acct->needcheck (for example, if between two consecutive process terminations happen interval greater than ACCT_TIMEOUT seconds). Then in lines: if (!file || time_is_before_jiffies(acct->needcheck)) goto out; always will branch with "goto" and acct->needcheck will not change. So free space will not check more, until process accounting is restarted. Note, that in version 3.17-rc1 and later this problem is also present. Therefore I suppose that this problem for kernel version 3.3-rc1 - 3.16 may be solve by following patch: Signed-off-by: Dmitry Romanov --- kernel/acct.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/acct.c b/kernel/acct.c index 808a86f..591bdcd 100644 --- a/kernel/acct.c +++ b/kernel/acct.c @@ -107,7 +107,7 @@ static int check_free_space(struct bsd_acct_struct *acct, struct file *file) spin_lock(&acct_lock); res = acct->active; - if (!file || time_is_before_jiffies(acct->needcheck)) + if (!file || time_is_after_jiffies(acct->needcheck)) goto out; spin_unlock(&acct_lock); -- 1.9.1 In kernel 3.2 another method used for define time to check free space (by timer). So I did not find in this version such problem. Dmitry