Date: Mon, 11 Feb 2019 11:09:08 -0800
From: Matthew Wilcox
To: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: [LSF/MM TOPIC] Eliminating tail pages
Message-ID: <20190211190908.GA21683@bombadil.infradead.org>

I can't follow simple instructions.

----- Forwarded message from Matthew Wilcox -----

Date: Mon, 11 Feb 2019 11:07:28 -0800
From: Matthew Wilcox
To: lsf-pc@lists.linux-foundation.org
Subject: [LSF/MM TOPIC] Eliminating tail pages

Tail pages are a pain. All over the kernel, we call compound_head()
(or occasionally forget to ...). So what would it take to eliminate them?

I'm doing my best to eliminate them from being stored in the page cache.
That's a nice first step, but the very first thing that functions like
find_get_entry(), find_get_entries(), et al do is convert any large page
they find to a tail page. So we'll probably need to introduce new
functions which will return head pages and convert users over to them.
I know Kirill has a lot more experience with this.

Another place where we return tail pages is get_user_pages(). Callers
of get_user_pages() expect tail or small pages; they do things like
calculate the offset of the byte within the page by ANDing with
~PAGE_MASK. There'll be a lot of work to check all the users and convert
them to something like

	unsigned int page_offset(struct page *page, unsigned long addr);

Another thing to consider is that some architectures have a third-level
page size of 16GB (looking at you, POWER).
So an unsigned int isn't going to cut it. Do we want to support pages
that large, or do we declare that there will never be any point in
supporting pages larger than 4GB?

There are probably other pitfalls I'm forgetting or have never known.

Something like this will be essential for the glorious future that
Christoph Lameter keeps talking about where we divide the memory up into
parts which are only accessible as 2MB pages and parts which support
legacy 4kB usages.

Useful participants:
Kirill Shutemov
Christoph Lameter
Hugh Dickins

probably also relevant to the DAX crew.

----- End forwarded message -----
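
For readers who want to see the return-type concern from the forwarded message in concrete terms, here is a minimal userspace sketch. It is not kernel code and not the proposed API. The names sketch_page, sketch_page_size(), sketch_page_offset() and sketch_offset_fits_u32() are hypothetical, the 4kB base page size is an assumption, and the page is assumed to be mapped at a naturally aligned address. The offset is computed as a 64-bit value so that the 4GB and 16GB cases can be compared.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4kB base pages. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical stand-in for a head page; the real struct page differs. */
struct sketch_page {
	unsigned int order;	/* log2 of the number of base pages */
};

/* Size in bytes of the (possibly large) page. */
static inline uint64_t sketch_page_size(const struct sketch_page *page)
{
	assert(page->order < 64 - SKETCH_PAGE_SHIFT);
	return (uint64_t)1 << (SKETCH_PAGE_SHIFT + page->order);
}

/*
 * Byte offset of addr within the whole page, assuming the page is
 * mapped at an address aligned to its own size.
 */
static inline uint64_t sketch_page_offset(const struct sketch_page *page,
					  uint64_t addr)
{
	return addr & (sketch_page_size(page) - 1);
}

/* Nonzero if every offset within the page fits in 32 bits. */
static inline int sketch_offset_fits_u32(const struct sketch_page *page)
{
	return sketch_page_size(page) - 1 <= UINT32_MAX;
}
```

Under these assumptions, an order-20 page (4GB) is the largest whose offsets still fit in 32 bits, while an order-22 page (16GB) needs a wider return type.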