linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>,
	Sonny Rao <sonnyrao@chromium.org>, Jann Horn <jann@thejh.net>,
	Robert Foss <robert.foss@collabora.com>,
	corbet@lwn.net, Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Konstantin Khlebnikov <koct9i@gmail.com>,
	Hugh Dickins <hughd@google.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	John Stultz <john.stultz@linaro.org>,
	ross.zwisler@linux.intel.com, jmarchan@redhat.com,
	Johannes Weiner <hannes@cmpxchg.org>,
	Kees Cook <keescook@chromium.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	Robin Humble <plaguedbypenguins@gmail.com>,
	David Rientjes <rientjes@google.com>,
	eric.engestrom@imgtec.com, Janis Danisevskis <jdanis@google.com>,
	calvinowens@fb.com, Alexey Dobriyan <adobriyan@gmail.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	ldufour@linux.vnet.ibm.com, linux-doc@vger.kernel.org,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Ben Zhang <benzh@chromium.org>, Bryan Freed <bfreed@chromium.org>,
	Filipe Brandenburger <filbranden@chromium.org>,
	Mateusz Guzik <mguzik@redhat.com>, Mel Gorman <mgorman@suse.de>,
	Mike Galbraith <mgalbraith@suse.com>
Subject: Re: utime accounting regression since 4.6 (was: Re: [PACTH v2 0/3] Implement /proc/<pid>/totmaps)
Date: Fri, 30 Sep 2016 11:49:05 +0200	[thread overview]
Message-ID: <20160930094905.GA22867@dhcp22.suse.cz> (raw)
In-Reply-To: <1471988771.2751.29.camel@redhat.com>

[CC Mike and Mel as they have seen some accounting oddities
 when doing performance testing. They can share details but
 essentially the system time just gets too high]

For your reference the email thread started
http://lkml.kernel.org/r/20160823143330.GL23577@dhcp22.suse.cz

I suspect this is mainly for short lived processes - like kernel compile
$ /usr/bin/time -v make mm/mmap.o
[...]
        User time (seconds): 0.45
        System time (seconds): 0.82
        Percent of CPU this job got: 111%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.14
$ rm mm/mmap.o
$ /usr/bin/time -v make mm/mmap.o
[...]
        User time (seconds): 0.47
        System time (seconds): 1.55
        Percent of CPU this job got: 107%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.88

This is quite unexpected for a cache hot compile. I would expect most of
the time being spent in userspace.

$ perf report | grep kernel.vmlinux
     2.01%  as        [kernel.vmlinux]         [k] page_fault
     0.59%  cc1       [kernel.vmlinux]         [k] page_fault
     0.15%  git       [kernel.vmlinux]         [k] page_fault
     0.12%  bash      [kernel.vmlinux]         [k] page_fault
     0.11%  sh        [kernel.vmlinux]         [k] page_fault
     0.08%  gcc       [kernel.vmlinux]         [k] page_fault
     0.06%  make      [kernel.vmlinux]         [k] page_fault
     0.04%  rm        [kernel.vmlinux]         [k] page_fault
     0.03%  ld        [kernel.vmlinux]         [k] page_fault
     0.02%  bash      [kernel.vmlinux]         [k] entry_SYSCALL_64
     0.01%  git       [kernel.vmlinux]         [k] entry_SYSCALL_64
     0.01%  cat       [kernel.vmlinux]         [k] page_fault
     0.01%  collect2  [kernel.vmlinux]         [k] page_fault
     0.00%  sh        [kernel.vmlinux]         [k] entry_SYSCALL_64
     0.00%  rm        [kernel.vmlinux]         [k] entry_SYSCALL_64
     0.00%  grep      [kernel.vmlinux]         [k] page_fault

doesn't show anything unexpected.

Original Rik's reply follows:

On Tue 23-08-16 17:46:11, Rik van Riel wrote:
> On Tue, 2016-08-23 at 16:33 +0200, Michal Hocko wrote:
[...]
> > OK, so it seems I found it. I was quite lucky because
> > account_user_time
> > is not all that popular function and there were basically no changes
> > besides Riks ff9a9b4c4334 ("sched, time: Switch
> > VIRT_CPU_ACCOUNTING_GEN
> > to jiffy granularity") and that seems to cause the regression.
> > Reverting
> > the commit on top of the current mmotm seems to fix the issue for me.
> > 
> > And just to give Rik more context. While debugging overhead of the
> > /proc/<pid>/smaps I am getting a misleading output from /usr/bin/time
> > -v
> > (source for ./max_mmap is [1])
> > 
> > root@test1:~# uname -r
> > 4.5.0-rc6-bisect1-00025-gff9a9b4c4334
> > root@test1:~# ./max_map 
> > pid:2990 maps:65515
> > root@test1:~# /usr/bin/time -v awk '/^Rss/{rss+=$2} /^Pss/{pss+=$2}
> > END {printf "rss:%d pss:%d\n", rss, pss}' /proc/2990/smaps
> > rss:263368 pss:262203
> >         Command being timed: "awk /^Rss/{rss+=$2} /^Pss/{pss+=$2} END
> > {printf "rss:%d pss:%d\n", rss, pss} /proc/2990/smaps"
> >         User time (seconds): 0.00
> >         System time (seconds): 0.45
> >         Percent of CPU this job got: 98%
> > 
> 
> > root@test1:~# /usr/bin/time -v awk '/^Rss/{rss+=$2} /^Pss/{pss+=$2}
> > END {printf "rss:%d pss:%d\n", rss, pss}' /proc/3015/smaps
> > rss:263316 pss:262199
> >         Command being timed: "awk /^Rss/{rss+=$2} /^Pss/{pss+=$2} END
> > {printf "rss:%d pss:%d\n", rss, pss} /proc/3015/smaps"
> >         User time (seconds): 0.18
> >         System time (seconds): 0.29
> >         Percent of CPU this job got: 97%
> 
> The patch in question makes user and system
> time accounting essentially tick-based. If
> jiffies changes while the task is in user
> mode, time gets accounted as user time, if
> jiffies changes while the task is in system
> mode, time gets accounted as system time.
> 
> If you get "unlucky", with a job like the
> above, it is possible all time gets accounted
> to system time.
> 
> This would be true both with the system running
> with a periodic timer tick (before and after my
> patch is applied), and in nohz_idle mode (after
> my patch).
> 
> However, it does seem quite unlikely that you
> get zero user time, since you have 125 timer
> ticks in half a second. Furthermore, you do not
> even have NO_HZ_FULL enabled...
> 
> Does the workload consistently get zero user
> time?
> 
> If so, we need to dig further to see under
> what precise circumstances that happens.
> 
> On my laptop, with kernel 4.6.3-300.fc24.x86_64
> I get this:
> 
> $ /usr/bin/time -v awk '/^Rss/{rss+=$2} /^Pss/{pss+=$2} END {printf
> "rss:%d pss:%d\n", rss, pss}' /proc/19825/smaps
> rss:263368 pss:262145
> 	Command being timed: "awk /^Rss/{rss+=$2} /^Pss/{pss+=$2} END
> {printf "rss:%d pss:%d\n", rss, pss} /proc/19825/smaps"
> 	User time (seconds): 0.64
> 	System time (seconds): 0.19
> 	Percent of CPU this job got: 99%
> 	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.83
> 
> The main difference between your and my
> NO_HZ config seems to be that NO_HZ_FULL
> is set here. However, it is not enabled
> at run time, so both of our systems
> should only really get NO_HZ_IDLE
> effectively.
> 
> Running tasks should get sampled with the
> regular timer tick, while they are running.
> 
> In other words, vtime accounting should be
> disabled in both of our tests, for everything
> except the idle task.
> 
> Do I need to do anything special to reproduce
> your bug, besides running the max mmap program
> and the awk script?
> 
> 

-- 
Michal Hocko
SUSE Labs

  parent reply	other threads:[~2016-09-30  9:49 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-12 22:04 [PACTH v2 0/3] Implement /proc/<pid>/totmaps robert.foss
2016-08-12 22:04 ` [PACTH v2 1/3] mm, proc: " robert.foss
2016-08-13 14:39   ` Jann Horn
2016-08-15 13:57     ` Robert Foss
2016-08-15 20:14       ` Robert Foss
2016-08-12 22:04 ` [PACTH v2 2/3] Documentation/filesystems: Fixed typo robert.foss
2016-08-12 22:04 ` [PACTH v2 3/3] Documentation/filesystems: Added /proc/PID/totmaps documentation robert.foss
2016-08-14  9:04 ` [PACTH v2 0/3] Implement /proc/<pid>/totmaps Michal Hocko
2016-08-15 13:00   ` Robert Foss
2016-08-15 13:42     ` Michal Hocko
2016-08-15 16:25       ` Robert Foss
2016-08-16  7:12         ` Michal Hocko
2016-08-16 16:46           ` Robert Foss
2016-08-17  8:22             ` Michal Hocko
2016-08-17  9:31               ` Jann Horn
2016-08-17 13:03                 ` Michal Hocko
2016-08-17 16:48                   ` Robert Foss
2016-08-17 18:57                   ` Sonny Rao
2016-08-18  7:44                     ` Michal Hocko
2016-08-18 17:47                       ` Sonny Rao
2016-08-18 18:01                         ` Michal Hocko
2016-08-18 21:05                           ` Robert Foss
2016-08-19  6:27                             ` Sonny Rao
2016-08-19  2:26                           ` Minchan Kim
2016-08-19  6:47                             ` Sonny Rao
2016-08-19  8:05                             ` Michal Hocko
2016-08-19 18:20                               ` Sonny Rao
2016-08-22  0:07                               ` Minchan Kim
2016-08-22  7:40                                 ` Michal Hocko
2016-08-22 14:12                                   ` Minchan Kim
2016-08-22 14:37                                     ` Robert Foss
2016-08-22 16:45                                     ` Michal Hocko
2016-08-22 17:29                                       ` Michal Hocko
2016-08-22 17:47                                         ` Michal Hocko
2016-08-23  8:26                                           ` Michal Hocko
2016-08-23 14:33                                             ` utime accounting regression since 4.6 (was: Re: [PACTH v2 0/3] Implement /proc/<pid>/totmaps) Michal Hocko
2016-08-23 21:46                                               ` Rik van Riel
2016-08-24 16:56                                                 ` Michal Hocko
2016-09-30  9:49                                                 ` Michal Hocko [this message]
2016-09-30 12:35                                                   ` Rik van Riel
2016-08-19  6:43                           ` [PACTH v2 0/3] Implement /proc/<pid>/totmaps Sonny Rao
2016-08-19  7:59                             ` Michal Hocko
2016-08-19 17:57                               ` Sonny Rao
2016-08-22  7:54                                 ` Michal Hocko
2016-08-22 22:44                                   ` Sonny Rao
2016-08-24 10:14                                     ` Marcin Jabrzyk
2016-08-30  8:20                                       ` Michal Hocko
2016-08-29 14:37                                     ` Michal Hocko
2016-08-30  8:15                                       ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160930094905.GA22867@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=benzh@chromium.org \
    --cc=bfreed@chromium.org \
    --cc=calvinowens@fb.com \
    --cc=corbet@lwn.net \
    --cc=eric.engestrom@imgtec.com \
    --cc=filbranden@chromium.org \
    --cc=gorcunov@openvz.org \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=jann@thejh.net \
    --cc=jdanis@google.com \
    --cc=jmarchan@redhat.com \
    --cc=john.stultz@linaro.org \
    --cc=keescook@chromium.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=koct9i@gmail.com \
    --cc=ldufour@linux.vnet.ibm.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgalbraith@suse.com \
    --cc=mgorman@suse.de \
    --cc=mguzik@redhat.com \
    --cc=minchan@kernel.org \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=plaguedbypenguins@gmail.com \
    --cc=riel@redhat.com \
    --cc=rientjes@google.com \
    --cc=robert.foss@collabora.com \
    --cc=ross.zwisler@linux.intel.com \
    --cc=sonnyrao@chromium.org \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).