From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1423286AbcFMMwy (ORCPT ); Mon, 13 Jun 2016 08:52:54 -0400
Received: from mga14.intel.com ([192.55.52.115]:46414 "EHLO mga14.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1423230AbcFMMwx (ORCPT ); Mon, 13 Jun 2016 08:52:53 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.26,466,1459839600"; d="scan'208";a="120920626"
Date: Mon, 13 Jun 2016 15:52:48 +0300
From: "Kirill A. Shutemov"
To: Linus Torvalds
Cc: "Huang, Ying" , Rik van Riel , Michal Hocko , LKML , Michal Hocko ,
	Minchan Kim , Vinayak Menon , Mel Gorman , Andrew Morton , LKP ,
	Dave Hansen
Subject: Re: [LKP] [lkp] [mm] 5c0a85fad9: unixbench.score -6.3% regression
Message-ID: <20160613125248.GA30109@black.fi.intel.com>
References: <20160606022724.GA26227@yexl-desktop>
	<20160606095136.GA79951@black.fi.intel.com>
	<87a8iw5enf.fsf@yhuang-dev.intel.com>
	<8760tk5aym.fsf@yhuang-dev.intel.com>
	<20160608085811.GB12655@black.fi.intel.com>
	<87porn44fm.fsf@yhuang-dev.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.23.1 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Jun 11, 2016 at 06:02:57PM -0700, Linus Torvalds wrote:
> On Sat, Jun 11, 2016 at 5:49 PM, Huang, Ying wrote:
> >
> > From perf profile, the time spent in page_fault and its children
> > functions are almost same (7.85% vs 7.81%). So the time spent in page
> > fault and page table operation itself doesn't changed much. So, you
> > mean CPU may be slower to load the page table entry to TLB if accessed
> > bit is not set?
>
> So the CPU does take a microfault internally when it needs to set the
> accessed/dirty bit. It's not architecturally visible, but you can see
> it when you do timing loops.
>
> I've timed it at over a thousand cycles on at least some CPU's, but
> that's still peanuts compared to a real page fault. It shouldn't be
> *that* noticeable, ie no way it's a 6% regression on its own.

Looks like setting the accessed bit is the problem. Without mkold:

Score: 1952.9

 Performance counter stats for './Run shell8 -c 1' (3 runs):

   468,562,316,621      cycles:u                                  ( +-  0.02% )
     4,596,299,472      dtlb_load_misses_walk_duration:u          ( +-  0.07% )
     5,245,488,559      itlb_misses_walk_duration:u               ( +-  0.10% )

     189.336404566 seconds time elapsed                           ( +-  0.01% )

With mkold:

Score: 1885.5

 Performance counter stats for './Run shell8 -c 1' (3 runs):

   503,185,676,256      cycles:u                                  ( +-  0.06% )
     8,137,007,894      dtlb_load_misses_walk_duration:u          ( +-  0.85% )
     7,220,632,283      itlb_misses_walk_duration:u               ( +-  1.40% )

     189.363223499 seconds time elapsed                           ( +-  0.01% )

We spend 36% more time in page walk only, about 1% of total userspace
time. Combining this with page walk footprint on caches, I guess we can
get to this 3.5% score difference I see.

I'm not sure if there's anything we can do to solve the issue without
screwing reclaim logic again. :(

-- 
 Kirill A. Shutemov
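
For anyone who wants to re-check the percentages above, here is a rough
sketch of the arithmetic. The counter values are copied from the two perf
runs in this message; the dictionary keys and helper name are made up for
illustration, and the sketch assumes the walk_duration events count cycles
that are directly comparable to cycles:u.

```python
# Counter values copied from the two perf stat runs in the message above.
# Key names are illustrative only, not perf event names.
without_mkold = {
    "score": 1952.9,
    "cycles": 468_562_316_621,
    "dtlb_walk": 4_596_299_472,
    "itlb_walk": 5_245_488_559,
}
with_mkold = {
    "score": 1885.5,
    "cycles": 503_185_676_256,
    "dtlb_walk": 8_137_007_894,
    "itlb_walk": 7_220_632_283,
}


def walk_cycles(run):
    """Total page walk duration (dTLB load misses + iTLB misses)."""
    return run["dtlb_walk"] + run["itlb_walk"]


# Extra page walk cycles caused by mkold.
extra_walk = walk_cycles(with_mkold) - walk_cycles(without_mkold)

# The same delta expressed against three different denominators.
pct_of_mkold_walk = 100 * extra_walk / walk_cycles(with_mkold)
pct_over_baseline = 100 * extra_walk / walk_cycles(without_mkold)
pct_of_user_cycles = 100 * extra_walk / with_mkold["cycles"]

# Relative unixbench score drop.
pct_score_drop = (
    100 * (without_mkold["score"] - with_mkold["score"]) / without_mkold["score"]
)

print(f"extra walk cycles:            {extra_walk:,}")
print(f"as share of mkold walk time:  {pct_of_mkold_walk:.1f}%")
print(f"as increase over baseline:    {pct_over_baseline:.1f}%")
print(f"as share of mkold cycles:u:   {pct_of_user_cycles:.2f}%")
print(f"score drop:                   {pct_score_drop:.2f}%")
```

With these inputs the extra walk time is roughly 36% of the walk time
measured with mkold (about 56% when taken relative to the run without
mkold), about 1.1% of cycles:u, and the score drop is about 3.5%.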