Subject: Re: [LKP] [lkp] [mm] 5c0a85fad9: unixbench.score -6.3% regression
From: Christian Borntraeger <borntraeger@de.ibm.com>
To: Linus Torvalds, "Kirill A. Shutemov"
Cc: "Huang, Ying", Rik van Riel, Michal Hocko, LKML, Michal Hocko, Minchan Kim, Vinayak Menon, Mel Gorman, Andrew Morton, LKP, Dave Hansen, Martin Schwidefsky, linux-s390
Date: Tue, 14 Jun 2016 16:03:23 +0200
Message-Id: <57600EAB.9030000@de.ibm.com>

On 06/14/2016 08:11 AM, Linus Torvalds wrote:
> On Mon, Jun 13, 2016 at 5:52 AM, Kirill A. Shutemov
> wrote:
>> On Sat, Jun 11, 2016 at 06:02:57PM -0700, Linus Torvalds wrote:
>>>
>>> I've timed it at over a thousand cycles on at least some CPUs, but
>>> that's still peanuts compared to a real page fault. It shouldn't be
>>> *that* noticeable, ie no way it's a 6% regression on its own.
>>
>> Looks like setting accessed bit is the problem.
>
> Ok. I've definitely seen it as an issue, but never to the point of
> several percent on a real benchmark that wasn't explicitly testing
> that cost.
>
> I reported the excessive dirty/accessed bit cost to Intel back in the
> P4 days, but it's apparently not been high enough for anybody to care.
>
>> We spend 36% more time in page walk only, about 1% of total userspace time.
>> Combining this with page walk footprint on caches, I guess we can get to
>> this 3.5% score difference I see.
>>
>> I'm not sure if there's anything we can do to solve the issue without
>> screwing up the reclaim logic again. :(
>
> I think we should say "screw the reclaim logic" for now, and revert
> commit 5c0a85fad949 for now.
>
> Considering how much trouble the accessed bit is on some other
> architectures too, I wonder if we should strive to simply not care
> about it, and always leave it set. And then rely entirely on just
> unmapping the pages and making the "we took a page fault after
> unmapping" be the real activity tester.
>
> So get rid of the "if the page is young, mark it old but leave it in
> the page tables" logic entirely.
> When we unmap a page, it will always
> either be in the swap cache or the page cache anyway, so faulting it
> in again should be just a minor fault with no actual IO happening.
>
> That might be less of an impact in the end - yes, the unmap and
> re-fault is much more expensive, but it presumably happens to far
> fewer pages.

FWIW, Martin did something like that for s390 three years ago. We now
use invalidation and page faults to implement the *young functions in
pgtable.h (basically using a software young bit). This helped us get rid
of the storage keys (which contain the hardware reference bit), and the
performance did not seem to suffer. See commit
0944fe3f4a323f436180d39402cae7f9c46ead17 ("s390/mm: implement software
referenced bits").

>
> What do you think?

So your proposal would be to make the software tracking via
invalidation and faults part of the generic mm code, instead of hiding
it in the architecture backend. Correct?

>
> Linus
>