Date: Thu, 1 Apr 2021 12:00:08 -0400
From: Johannes Weiner
To: Al Viro
Cc: Matthew Wilcox, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, linux-cachefs@redhat.com,
    linux-afs@lists.infradead.org
Subject: Re: [PATCH v5 00/27] Memory Folios

On Thu, Apr 01, 2021 at 05:05:37AM +0000, Al Viro wrote:
> On Tue, Mar 30, 2021 at 10:09:29PM +0100, Matthew Wilcox wrote:
>
> > That's a very Intel-centric way of looking at it.  Other architectures
> > support a multitude of page sizes, from the insane ia64 (4k, 8k, 16k, then
> > every power of four up to 4GB) to more reasonable options like (4k, 32k,
> > 256k, 2M, 16M, 128M).  But we (in software) shouldn't constrain ourselves
> > to thinking in terms of what the hardware currently supports.  Google
> > have data showing that for their workloads, 32kB is the goldilocks size.
> > I'm sure for some workloads, it's much higher and for others it's lower.
> > But for almost no workload is 4kB the right choice any more, and probably
> > hasn't been since the late 90s.
>
> Out of curiosity I looked at the distribution of file sizes in the
> kernel tree:
> 71455 files total
> 0--4Kb          36702
> 4--8Kb          11820
> 8--16Kb         10066
> 16--32Kb        6984
> 32--64Kb        3804
> 64--128Kb       1498
> 128--256Kb      393
> 256--512Kb      108
> 512Kb--1Mb      35
> 1--2Mb          25
> 2--4Mb          5
> 4--6Mb          7
> 6--8Mb          4
> 12Mb            2
> 14Mb            1
> 16Mb            1
>
> ... incidentally, everything bigger than 1.2Mb lives^Wshambles under
> drivers/gpu/drm/amd/include/asic_reg/
>
> Page size       Footprint
> 4Kb             1128Mb
> 8Kb             1324Mb
> 16Kb            1764Mb
> 32Kb            2739Mb
> 64Kb            4832Mb
> 128Kb           9191Mb
> 256Kb           18062Mb
> 512Kb           35883Mb
> 1Mb             71570Mb
> 2Mb             142958Mb
>
> So for kernel builds (as well as grep over the tree, etc.) uniform 2Mb pages
> would be... interesting.

Right, I don't see us getting rid of 4k cache entries anytime soon.
Even 32k pages would more than double the footprint here.

The issue is just that at the other end of the spectrum we have IO
devices that do 10GB/s, which corresponds to 2.6 million 4k pages per
second. At such data rates we are currently CPU-limited because of the
pure transaction overhead in page reclaim. Workloads like this tend to
use much larger files, and would benefit from a larger paging unit.

Likewise, most production workloads in cloud servers have enormous
anonymous regions and large executables that greatly benefit from
fewer page table levels and bigger TLB entries.

Today, fragmentation prevents the page allocator from producing 2MB
blocks at a satisfactory rate and with acceptable allocation latency.
It's not feasible to allocate 2M inside page faults, for example;
getting huge page coverage for the page cache will be even more
difficult.

I'm not saying we should get rid of 4k cache entries. Rather, I'm
wondering out loud whether longer-term we'd want to change the default
page size to 2M, and implement the 4k cache entries, which we clearly
continue to need, with a slab-style allocator on top.
The idea being that it'll do a better job at grouping cache entries
with other cache entries of a similar lifetime than the untyped page
allocator does naturally, and so make fragmentation a whole lot more
manageable.

(I'm using x86 page sizes as examples because they matter to me. But
there is an architecture-independent discrepancy between the smallest
cache entries we must continue to support, and larger blocks / huge
pages that we increasingly rely on as first-class pages.)
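
For anyone who wants to poke at the two bits of arithmetic in this
thread (pages per second at a given IO rate, and cache footprint when
every file is rounded up to whole pages), here is a rough Python
sketch. The function names and the sample file sizes are made up for
illustration; the sample is not the kernel tree, so it will not
reproduce the footprint table quoted above. It also assumes binary
units (1GB = 2^30 bytes).

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_per_second(bytes_per_second, page_size):
    """Pages that must be handled each second to sustain the given rate."""
    return bytes_per_second / page_size


def cache_footprint(file_sizes, page_size):
    """Bytes of page cache used if each file is rounded up to whole pages."""
    # -(-a // b) is integer ceiling division, so a partially filled
    # last page is charged in full.
    return sum(-(-size // page_size) * page_size for size in file_sizes)


if __name__ == "__main__":
    page_sizes = (4 * KIB, 32 * KIB, 2 * MIB)

    # Transaction rate for a 10GB/s device (binary units assumed).
    for page_size in page_sizes:
        rate = pages_per_second(10 * GIB, page_size)
        print(f"{page_size // KIB:>5}k pages: {rate:>12,.0f} pages/s")

    # Hypothetical file sizes, skewed small like a source tree.
    sample = [1 * KIB, 3 * KIB, 6 * KIB, 20 * KIB, 100 * KIB]
    for page_size in page_sizes:
        used = cache_footprint(sample, page_size)
        print(f"{page_size // KIB:>5}k pages: {used // KIB:>6} KiB footprint")
```

With binary units, 10GB/s at 4k works out to 2,621,440 pages per
second, which is where the 2.6 million figure comes from; at 2M the
same stream is 5,120 pages per second. The footprint function shows
the other side of the trade: small files pay for a full page each, so
the cost grows quickly with page size when most files are tiny.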