Date: Fri, 19 Jun 2009 11:51:15 -0400
From: Mathieu Desnoyers
To: Ingo Molnar
Cc: Linus Torvalds, mingo@redhat.com, hpa@zytor.com, paulus@samba.org,
	acme@redhat.com, linux-kernel@vger.kernel.org, a.p.zijlstra@chello.nl,
	penberg@cs.helsinki.fi, vegard.nossum@gmail.com, efault@gmx.de,
	jeremy@goop.org, npiggin@suse.de, tglx@linutronix.de,
	linux-tip-commits@vger.kernel.org
Subject: Re: [tip:perfcounters/core] perf_counter: x86: Fix call-chain support to use NMI-safe methods
Message-ID: <20090619155115.GA24111@Krystal>
References: <20090615183649.GA16999@elte.hu> <20090615194344.GA12554@elte.hu>
	<20090615200619.GA10632@Krystal> <20090615204715.GA24554@elte.hu>
	<20090615210225.GA12919@Krystal> <20090615211209.GA27100@elte.hu>
	<20090619152029.GA7204@elte.hu>
In-Reply-To: <20090619152029.GA7204@elte.hu>

* Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Linus Torvalds wrote:
> 
> > On Mon, 15 Jun 2009, Ingo Molnar wrote:
> > >
> > > See the numbers in the other mail: about 33 million pagefaults
> > > happen in a typical kernel build - that's ~400K/sec - and that
> > > is not a particularly pagefault-heavy workload.
> > 
> > Did you do any function-level profiles?
> > 
> > Last I looked at it, the real cost of page faults was all in the
> > memory copies and page clearing, and while it would be nice to
> > speed up the kernel entry and exit, the few tens of cycles we
> > might be able to get from there really aren't all that important.
> 
> Yeah.
> 
> Here's the function-level profile of a typical kernel build on a
> Nehalem box:
> 
> $ perf report --sort symbol
> 
> #
> # (14317328 samples)
> #
> # Overhead  Symbol
> # ........  ......
> #
>     44.05%  0x000000001a0b80

It makes me wonder how the following scenario is accounted for:

- Execution of an instruction in a newly forked/exec'd process causes a
  fault. (Traps, faults and interrupts can take roughly 2000 cycles to
  execute.)
- The PC sampling interrupt fires.

Will it account the execution time as part of user-space or kernel-space
execution? Depending on how the sampling mechanism determines whether it
is running in kernel mode or user mode, this might make the user-space PC
appear to be currently running even though the current execution context
is the very beginning of the page fault handler (the first instruction
servicing the fault).
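For concreteness, here is a minimal sketch of the kind of check a sampling
NMI handler can do, assuming it classifies the sample purely from the
register frame saved at interrupt entry. This is illustrative only, not
the actual perf_counter code, and record_user_sample()/record_kernel_sample()
are made-up helpers:

#include <linux/ptrace.h>	/* struct pt_regs, user_mode() */

/* Made-up helpers, for illustration only; not defined here. */
static void record_user_sample(unsigned long ip);
static void record_kernel_sample(unsigned long ip);

static void sample_handler(struct pt_regs *regs)
{
	/*
	 * user_mode() inspects the CS selector saved when the sampling
	 * interrupt fired.  If the interrupt lands on the very first
	 * instruction of the page fault handler, CS is already a kernel
	 * selector and the sample is charged to the kernel; one
	 * instruction earlier, the same moment is charged to the
	 * user-space PC.  The attribution hinges entirely on which saved
	 * frame the mechanism consults.
	 */
	if (user_mode(regs))
		record_user_sample(regs->ip);	/* counted as user space */
	else
		record_kernel_sample(regs->ip);	/* counted as kernel */
}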
Mathieu

>      5.09%  0x0000000001d298
>      3.56%  0x0000000005742c
>      2.48%  0x0000000014026d
>      2.31%  0x00000000007b1a
>      2.06%  0x00000000115ac9
>      1.83%  [.] _int_malloc
>      1.71%  0x00000000064680
>      1.50%  [.] memset
>      1.37%  0x00000000125d88
>      1.28%  0x000000000b7642
>      1.17%  [k] clear_page_c
>      0.87%  [k] page_fault
>      0.78%  [.] is_defined_config
>      0.71%  [.] _int_free
>      0.68%  [.] __GI_strlen
>      0.66%  0x000000000699e8
>      0.54%  [.] __GI_memcpy
> 
> Most of it is dominated by user-space symbols. (There is no proper
> ELF+debuginfo on this box, so they are unnamed.) It also shows that page
> clearing and pagefault handling dominate the kernel overhead - but are
> dwarfed by other overhead. Any page-fault-entry costs are a drop in the
> bucket.
> 
> In fact with call-chain graphs we can get a precise picture, as we can
> do a non-linear 'slice' set operation over the samples and keep only
> the ones that have the 'page_fault' pattern in one of their parent
> functions:
> 
> $ perf report --sort symbol --parent page_fault
> 
> #
> # (14317328 samples)
> #
> # Overhead  Symbol
> # ........  ......
> #
>      1.12%  [k] clear_page_c
>      0.87%  [k] page_fault
>      0.43%  [k] get_page_from_freelist
>      0.25%  [k] _spin_lock
>      0.24%  [k] do_page_fault
>      0.23%  [k] perf_swcounter_ctx_event
>      0.16%  [k] perf_swcounter_event
>      0.15%  [k] handle_mm_fault
>      0.15%  [k] __alloc_pages_nodemask
>      0.14%  [k] __rmqueue
>      0.12%  [k] find_get_page
>      0.11%  [k] copy_page_c
>      0.11%  [k] find_vma
>      0.10%  [k] _spin_lock_irqsave
>      0.10%  [k] __wake_up_bit
>      0.09%  [k] _spin_unlock_irqrestore
>      0.09%  [k] do_anonymous_page
>      0.09%  [k] __inc_zone_state
> 
> This "sub-profile" shows the true summary overhead that 'page_fault' and
> all its child functions have. Note that, for example, clear_page_c
> decreased from 1.17% to 1.12%:
> 
>      1.12%  [k] clear_page_c
>      1.17%  [k] clear_page_c
> 
> because there's 0.05% of other callers to clear_page_c() that do not
> involve page_fault. Those are filtered out via --parent
> filtering/matching.
> 
> 	Ingo

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
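As an illustration of the 'slice' operation described above: conceptually,
--parent filtering keeps a sample only when the requested symbol appears
somewhere in its recorded call chain, and per-symbol overhead is then summed
over the surviving samples. A rough, self-contained sketch follows; the
sample data and type names are made up, a plain substring match stands in for
perf's parent regex, and this is not the perf implementation:

#include <stdio.h>
#include <string.h>

/* Made-up sample record: the symbol the sample hit plus its call chain. */
struct sample {
	const char *leaf;	/* symbol the sample landed in */
	const char *chain[8];	/* callers, innermost first, NULL-terminated */
};

/* Return 1 if any caller in the chain matches the requested parent. */
static int chain_has_parent(const struct sample *s, const char *parent)
{
	for (int i = 0; s->chain[i]; i++)
		if (strstr(s->chain[i], parent))
			return 1;
	return 0;
}

int main(void)
{
	/* Toy data loosely modelled on the profile above. */
	static const struct sample samples[] = {
		{ "clear_page_c", { "do_anonymous_page", "handle_mm_fault",
				    "do_page_fault", "page_fault", NULL } },
		{ "clear_page_c", { "some_other_caller", NULL } },
		{ "memset",       { "main", NULL } },
	};
	const char *parent = "page_fault";
	unsigned int kept = 0, total = sizeof(samples) / sizeof(samples[0]);

	for (unsigned int i = 0; i < total; i++)
		if (chain_has_parent(&samples[i], parent))
			kept++;

	/* perf report --sort symbol --parent page_fault would now aggregate
	 * overhead per symbol over the kept samples only. */
	printf("%u of %u samples have '%s' as a parent\n", kept, total, parent);
	return 0;
}

Run over real data, the 0.05% of clear_page_c samples whose chains never pass
through page_fault would simply fail the chain_has_parent() test and drop out,
which is the 1.17% -> 1.12% difference quoted above.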