From: Linus Torvalds
Date: Sun, 25 Dec 2016 13:51:17 -0800
Subject: Re: [PATCH 2/2] mm: add PageWaiters indicating tasks are waiting for a page bit
To: Nicholas Piggin
Cc: Dave Hansen, Bob Peterson, Linux Kernel Mailing List, Steven Whitehouse, Andrew Lutomirski, Andreas Gruenbacher, Peter Zijlstra, linux-mm, Mel Gorman
In-Reply-To: <20161225030030.23219-3-npiggin@gmail.com>

On Sat, Dec 24, 2016 at 7:00 PM, Nicholas Piggin wrote:
> Add a new page flag, PageWaiters, to indicate the page waitqueue has
> tasks waiting. This can be tested rather than testing waitqueue_active
> which requires another cacheline load.

Ok, I applied this one too. I think there's room for improvement, but I
don't think it's going to help to just wait another release cycle and
hope something happens.
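[Editorial sketch of the idea in the quoted patch description: the waiter
check reads the same flags word the unlock path already touched, so the
common no-waiters case never loads the wait-queue head's cacheline. Bit
positions and function names here are illustrative, not the kernel's
actual layout, and the flag update is shown non-atomically for brevity.]

```c
/* Illustrative bit positions; not the kernel's real page-flags layout. */
#define PG_locked  0
#define PG_waiters 7

static int wakeups;                     /* counts simulated wakeups */

static void wake_up_waiters(void)
{
	wakeups++;                      /* stand-in for wake_up_page() */
}

/* Sketch of the unlock fast path: test PageWaiters in the flags word
 * (already in cache from the unlock itself) instead of calling
 * waitqueue_active(), which would pull in another cacheline. */
static void unlock_page_sketch(unsigned long *flags)
{
	*flags &= ~(1UL << PG_locked);  /* the kernel does this atomically */
	if (*flags & (1UL << PG_waiters))
		wake_up_waiters();
}
```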
Example room for improvement from a profile of unlock_page():

  46.44 │ lock andb $0xfe,(%rdi)
  34.22 │      mov  (%rdi),%rax

this has the old "do atomic op on a byte, then load the whole word"
issue that we used to have with the nasty zone lookup code too. And it
causes a horrible pipeline hiccup because the load will not forward the
data from the (partial) store.

It's really a misfeature of our asm optimizations of the atomic bit
ops. Using "andb" is slightly smaller, but in this case in particular,
an "andq" would be a ton faster, and the mask still fits in an imm8, so
it's not even hugely larger.

But it might also be a good idea to simply use a "cmpxchg" loop here.
That also gives atomicity guarantees that we don't have with the "clear
bit and then load the value".

Regardless, I think this is worth more people looking at and testing.
And merging it is probably the best way for that to happen.

                 Linus
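[Editorial sketch of the cmpxchg-loop alternative mentioned above: clear
PG_locked and obtain the old flags word in a single atomic step, so the
PG_waiters test needs no separate load that would stall on store
forwarding. This is a user-space illustration using GCC's __atomic
builtins, not the kernel's cmpxchg(); bit positions are illustrative.]

```c
/* Illustrative bit positions; not the kernel's real page-flags layout. */
#define PG_locked  0
#define PG_waiters 7

/* Atomically clear PG_locked and return the *old* flags word.  The
 * caller can then test old & (1UL << PG_waiters) without reloading
 * memory that a partial (byte) store just modified. */
static unsigned long clear_lock_return_flags(unsigned long *flags)
{
	unsigned long old = __atomic_load_n(flags, __ATOMIC_RELAXED);
	unsigned long new;

	do {
		new = old & ~(1UL << PG_locked);
		/* On failure, 'old' is refreshed with the current value. */
	} while (!__atomic_compare_exchange_n(flags, &old, new,
					      0 /* strong */,
					      __ATOMIC_RELEASE,
					      __ATOMIC_RELAXED));
	return old;
}
```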