From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Matthew Wilcox (Oracle)"
To: linux-kernel@vger.kernel.org
Cc: "Matthew Wilcox (Oracle)", linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH v14 138/138] mm/readahead: Add multi-page folio readahead
Date: Thu, 15 Jul 2021 04:37:04 +0100
Message-Id: <20210715033704.692967-139-willy@infradead.org>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20210715033704.692967-1-willy@infradead.org>
References: <20210715033704.692967-1-willy@infradead.org>
MIME-Version: 1.0

If the filesystem supports multi-page folios, allocate larger pages in
the readahead code when it
seems worth doing. The heuristic for choosing larger page sizes will
surely need some tuning, but this aggressive ramp-up has been good for
testing.

Signed-off-by: Matthew Wilcox (Oracle)
---
 mm/readahead.c | 102 +++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 95 insertions(+), 7 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index e1df44ad57ed..27e76cc2a9ba 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -149,7 +149,7 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
 
 	blk_finish_plug(&plug);
 
-	BUG_ON(!list_empty(pages));
+	BUG_ON(pages && !list_empty(pages));
 	BUG_ON(readahead_count(rac));
 
 out:
@@ -430,11 +430,99 @@ static int try_context_readahead(struct address_space *mapping,
 	return 1;
 }
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static inline int ra_alloc_folio(struct readahead_control *ractl, pgoff_t index,
+		pgoff_t mark, unsigned int order, gfp_t gfp)
+{
+	int err;
+	struct folio *folio = filemap_alloc_folio(gfp, order);
+
+	if (!folio)
+		return -ENOMEM;
+	if (mark - index < (1UL << order))
+		folio_set_readahead(folio);
+	err = filemap_add_folio(ractl->mapping, folio, index, gfp);
+	if (err)
+		folio_put(folio);
+	else
+		ractl->_nr_pages += 1UL << order;
+	return err;
+}
+
+static void page_cache_ra_order(struct readahead_control *ractl,
+		struct file_ra_state *ra, unsigned int new_order)
+{
+	struct address_space *mapping = ractl->mapping;
+	pgoff_t index = readahead_index(ractl);
+	pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
+	pgoff_t mark = index + ra->size - ra->async_size;
+	int err = 0;
+	gfp_t gfp = readahead_gfp_mask(mapping);
+
+	if (!mapping_thp_support(mapping) || ra->size < 4)
+		goto fallback;
+
+	limit = min(limit, index + ra->size - 1);
+
+	/* Grow page size up to PMD size */
+	if (new_order < HPAGE_PMD_ORDER) {
+		new_order += 2;
+		if (new_order > HPAGE_PMD_ORDER)
+			new_order = HPAGE_PMD_ORDER;
+		while ((1 << new_order) > ra->size)
+			new_order--;
+	}
+
+	while (index <= limit) {
+		unsigned int order = new_order;
+
+		/* Align with smaller pages if needed */
+		if (index & ((1UL << order) - 1)) {
+			order = __ffs(index);
+			if (order == 1)
+				order = 0;
+		}
+		/* Don't allocate pages past EOF */
+		while (index + (1UL << order) - 1 > limit) {
+			if (--order == 1)
+				order = 0;
+		}
+		err = ra_alloc_folio(ractl, index, mark, order, gfp);
+		if (err)
+			break;
+		index += 1UL << order;
+	}
+
+	if (index > limit) {
+		ra->size += index - limit - 1;
+		ra->async_size += index - limit - 1;
+	}
+
+	read_pages(ractl, NULL, false);
+
+	/*
+	 * If there were already pages in the page cache, then we may have
+	 * left some gaps. Let the regular readahead code take care of this
+	 * situation.
+	 */
+	if (!err)
+		return;
+fallback:
+	do_page_cache_ra(ractl, ra->size, ra->async_size);
+}
+#else
+static void page_cache_ra_order(struct readahead_control *ractl,
+		struct file_ra_state *ra, unsigned int order)
+{
+	do_page_cache_ra(ractl, ra->size, ra->async_size);
+}
+#endif
+
 /*
  * A minimal readahead algorithm for trivial sequential/random reads.
  */
 static void ondemand_readahead(struct readahead_control *ractl,
-		bool hit_readahead_marker, unsigned long req_size)
+		struct folio *folio, unsigned long req_size)
 {
 	struct backing_dev_info *bdi = inode_to_bdi(ractl->mapping->host);
 	struct file_ra_state *ra = ractl->ra;
@@ -469,12 +557,12 @@ static void ondemand_readahead(struct readahead_control *ractl,
 	}
 
 	/*
-	 * Hit a marked page without valid readahead state.
+	 * Hit a marked folio without valid readahead state.
 	 * E.g. interleaved reads.
 	 * Query the pagecache for async_size, which normally equals to
 	 * readahead size. Ramp it up and use it as the new readahead size.
 	 */
-	if (hit_readahead_marker) {
+	if (folio) {
 		pgoff_t start;
 
 		rcu_read_lock();
@@ -547,7 +635,7 @@ static void ondemand_readahead(struct readahead_control *ractl,
 	}
 
 	ractl->_index = ra->start;
-	do_page_cache_ra(ractl, ra->size, ra->async_size);
+	page_cache_ra_order(ractl, ra, folio ? folio_order(folio) : 0);
 }
 
 void page_cache_sync_ra(struct readahead_control *ractl,
@@ -575,7 +663,7 @@ void page_cache_sync_ra(struct readahead_control *ractl,
 	}
 
 	/* do read-ahead */
-	ondemand_readahead(ractl, false, req_count);
+	ondemand_readahead(ractl, NULL, req_count);
 }
 EXPORT_SYMBOL_GPL(page_cache_sync_ra);
 
@@ -604,7 +692,7 @@ void page_cache_async_ra(struct readahead_control *ractl,
 		return;
 
 	/* do read-ahead */
-	ondemand_readahead(ractl, true, req_count);
+	ondemand_readahead(ractl, folio, req_count);
 }
 EXPORT_SYMBOL_GPL(page_cache_async_ra);
 
-- 
2.30.2
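
For anyone who wants to experiment with the ramp-up heuristic outside the
kernel, here is a rough userspace sketch of the order selection done in
page_cache_ra_order(). It is not part of the patch. The names
sketch_ramp_order(), sketch_plan(), sketch_ffs() and SKETCH_PMD_ORDER are
made up for illustration, SKETCH_PMD_ORDER = 9 assumes 4KiB pages with a
2MiB PMD, and no folios are allocated: the sketch only records which order
would be requested at each index.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: 4KiB pages and a 2MiB PMD, so order 9. */
#define SKETCH_PMD_ORDER 9u

/* Index of the lowest set bit; stand-in for the kernel's __ffs().
 * The caller must pass a nonzero value. */
static unsigned int sketch_ffs(unsigned long x)
{
	unsigned int bit = 0;

	while (!(x & 1UL)) {
		x >>= 1;
		bit++;
	}
	return bit;
}

/* Grow the order by two steps, capped at PMD order and at the readahead
 * window size. The patch falls back when ra->size < 4, so that is
 * asserted here rather than handled. */
static unsigned int sketch_ramp_order(unsigned int order, unsigned long ra_size)
{
	assert(ra_size >= 4);

	if (order < SKETCH_PMD_ORDER) {
		order += 2;
		if (order > SKETCH_PMD_ORDER)
			order = SKETCH_PMD_ORDER;
		while ((1UL << order) > ra_size)
			order--;
	}
	return order;
}

/* Record the order chosen at each step while walking from index to limit
 * (inclusive). Returns the number of entries written to orders[]. */
static size_t sketch_plan(unsigned long index, unsigned long limit,
		unsigned int new_order, unsigned int *orders, size_t max)
{
	size_t n = 0;

	assert(orders != NULL);

	while (index <= limit && n < max) {
		unsigned int order = new_order;

		/* Align with smaller pages if needed; order 1 is skipped */
		if (index & ((1UL << order) - 1)) {
			order = sketch_ffs(index);
			if (order == 1)
				order = 0;
		}
		/* Shrink until the folio no longer crosses the limit */
		while (index + (1UL << order) - 1 > limit) {
			if (--order == 1)
				order = 0;
		}
		orders[n++] = order;
		index += 1UL << order;
	}
	return n;
}
```

With index 6, limit 21 and new_order 3, sketch_plan() records the orders
0, 0, 3, 2, 0, 0: two single pages to reach the 8-page boundary, one
order-3 folio, then progressively smaller folios so that nothing extends
past the limit. Order 1 never shows up because both loops skip it, as in
the patch.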