From: Alexander Duyck
Date: Thu, 16 Jul 2020 14:32:39 -0700
Subject: Re: [PATCH v16 15/22] mm/compaction: do page isolation first in compaction
To: Alex Shi
Cc: Andrew Morton, Mel Gorman, Tejun Heo, Hugh Dickins, Konstantin Khlebnikov,
 Daniel Jordan, Yang Shi, Matthew Wilcox, Johannes Weiner, kbuild test robot,
 linux-mm, LKML, cgroups@vger.kernel.org, Shakeel Butt, Joonsoo Kim, Wei Yang,
 "Kirill A. Shutemov"
In-Reply-To: <1594429136-20002-16-git-send-email-alex.shi@linux.alibaba.com>
References: <1594429136-20002-1-git-send-email-alex.shi@linux.alibaba.com>
 <1594429136-20002-16-git-send-email-alex.shi@linux.alibaba.com>
On Fri, Jul 10, 2020 at 5:59 PM Alex Shi wrote:
>
> Johannes Weiner has suggested:
> "So here is a crazy idea that may be worth exploring:
>
> Right now, pgdat->lru_lock protects both PageLRU *and* the lruvec's
> linked list.
>
> Can we make PageLRU atomic and use it to stabilize the lru_lock
> instead, and then use the lru_lock only to serialize list operations?
> ..."
>
> Yes, this patch does exactly that in __isolate_lru_page, which is the
> core page isolation function on the compaction and shrinking paths.
> With this patch, compaction only deals with pages that have PageLRU set
> and are not already isolated, skipping just-allocated pages that carry
> no LRU bit. The isolation also excludes the other isolation paths in
> memcg move_account, page migration and THP split_huge_page.
>
> As a side effect, PageLRU may be cleared on the shrink_inactive_list
> path because of an isolation elsewhere. If so, we can skip that page.
>
> Hugh Dickins fixed the following bugs in an early version of this
> patch:
>
> Fix lots of crashes under compaction load: isolate_migratepages_block()
> must clean up appropriately when rejecting a page, setting PageLRU again
> if it had been cleared; and a put_page() after get_page_unless_zero()
> cannot safely be done while holding locked_lruvec - it may turn out to
> be the final put_page(), which will take an lruvec lock when PageLRU.
> And move __isolate_lru_page_prepare back after get_page_unless_zero to
> make trylock_page() safe:
> trylock_page() is not safe to use at this time: its setting PG_locked
> can race with the page being freed or allocated ("Bad page"), and can
> also erase flags being set by one of those "sole owners" of a freshly
> allocated page who use non-atomic __SetPageFlag().
>
> Suggested-by: Johannes Weiner
> Signed-off-by: Alex Shi
> Cc: Hugh Dickins
> Cc: Andrew Morton
> Cc: Matthew Wilcox
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-mm@kvack.org
> ---
>  include/linux/swap.h |  2 +-
>  mm/compaction.c      | 42 +++++++++++++++++++++++++++++++++---------
>  mm/vmscan.c          | 38 ++++++++++++++++++++++----------------
>  3 files changed, 56 insertions(+), 26 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 2c29399b29a0..6d23d3beeff7 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -358,7 +358,7 @@ extern void lru_cache_add_active_or_unevictable(struct page *page,
>  extern unsigned long zone_reclaimable_pages(struct zone *zone);
>  extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
>                                         gfp_t gfp_mask, nodemask_t *mask);
> -extern int __isolate_lru_page(struct page *page, isolate_mode_t mode);
> +extern int __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode);
>  extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
>                                                    unsigned long nr_pages,
>                                                    gfp_t gfp_mask,
> diff --git a/mm/compaction.c b/mm/compaction.c
> index f14780fc296a..2da2933fe56b 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -869,6 +869,7 @@ static bool too_many_isolated(pg_data_t *pgdat)
>                 if (!valid_page && IS_ALIGNED(low_pfn, pageblock_nr_pages)) {
>                         if (!cc->ignore_skip_hint && get_pageblock_skip(page)) {
>                                 low_pfn = end_pfn;
> +                               page = NULL;
>                                 goto isolate_abort;
>                         }
>                         valid_page = page;
> @@ -950,6 +951,21 @@ static bool too_many_isolated(pg_data_t *pgdat)
>                 if (!(cc->gfp_mask & __GFP_FS) && page_mapping(page))
>                         goto isolate_fail;
>
> +               /*
> +                * Be careful not to clear PageLRU until after we're
> +                * sure the page is not being freed elsewhere -- the
> +                * page release code relies on it.
> +                */
> +               if (unlikely(!get_page_unless_zero(page)))
> +                       goto isolate_fail;
> +
> +               if (__isolate_lru_page_prepare(page, isolate_mode) != 0)
> +                       goto isolate_fail_put;
> +
> +               /* Try isolate the page */
> +               if (!TestClearPageLRU(page))
> +                       goto isolate_fail_put;
> +
>                 /* If we already hold the lock, we can skip some rechecking */
>                 if (!locked) {
>                         locked = compact_lock_irqsave(&pgdat->lru_lock,

Why not do the __isolate_lru_page_prepare before getting the page? That
way you can avoid performing an extra atomic operation on non-LRU
pages. (A rough, untested sketch of what I mean is at the end of this
mail.)

> @@ -962,10 +978,6 @@ static bool too_many_isolated(pg_data_t *pgdat)
>                         goto isolate_abort;
>                 }
>
> -               /* Recheck PageLRU and PageCompound under lock */
> -               if (!PageLRU(page))
> -                       goto isolate_fail;
> -
>                 /*
>                  * Page become compound since the non-locked check,
>                  * and it's on LRU. It can only be a THP so the order
> @@ -973,16 +985,13 @@ static bool too_many_isolated(pg_data_t *pgdat)
>                  */
>                 if (unlikely(PageCompound(page) && !cc->alloc_contig)) {
>                         low_pfn += compound_nr(page) - 1;
> -                       goto isolate_fail;
> +                       SetPageLRU(page);
> +                       goto isolate_fail_put;
>                 }
>         }
>
>         lruvec = mem_cgroup_page_lruvec(page, pgdat);
>
> -       /* Try isolate the page */
> -       if (__isolate_lru_page(page, isolate_mode) != 0)
> -               goto isolate_fail;
> -
>         /* The whole page is taken off the LRU; skip the tail pages. */
>         if (PageCompound(page))
>                 low_pfn += compound_nr(page) - 1;
> @@ -1011,6 +1020,15 @@ static bool too_many_isolated(pg_data_t *pgdat)
>                 }
>
>                 continue;
> +
> +isolate_fail_put:
> +               /* Avoid potential deadlock in freeing page under lru_lock */
> +               if (locked) {
> +                       spin_unlock_irqrestore(&pgdat->lru_lock, flags);
> +                       locked = false;
> +               }
> +               put_page(page);
> +
> isolate_fail:
>                 if (!skip_on_failure)
>                         continue;
> @@ -1047,9 +1065,15 @@ static bool too_many_isolated(pg_data_t *pgdat)
>         if (unlikely(low_pfn > end_pfn))
>                 low_pfn = end_pfn;
>
> +       page = NULL;
> +
> isolate_abort:
>         if (locked)
>                 spin_unlock_irqrestore(&pgdat->lru_lock, flags);
> +       if (page) {
> +               SetPageLRU(page);
> +               put_page(page);
> +       }
>
>         /*
>          * Updated the cached scanner pfn once the pageblock has been scanned
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 18986fefd49b..f77748adc340 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1544,7 +1544,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>  *
>  * returns 0 on success, -ve errno on failure.
>  */
> -int __isolate_lru_page(struct page *page, isolate_mode_t mode)
> +int __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode)
>  {
>         int ret = -EINVAL;
>
> @@ -1598,20 +1598,9 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode)
>         if ((mode & ISOLATE_UNMAPPED) && page_mapped(page))
>                 return ret;
>
> -       if (likely(get_page_unless_zero(page))) {
> -               /*
> -                * Be careful not to clear PageLRU until after we're
> -                * sure the page is not being freed elsewhere -- the
> -                * page release code relies on it.
> -                */
> -               ClearPageLRU(page);
> -               ret = 0;
> -       }
> -
> -       return ret;
> +       return 0;
>  }
>
> -
>  /*
>   * Update LRU sizes after isolating pages. The LRU size updates must
>   * be complete before mem_cgroup_update_lru_size due to a sanity check.
> @@ -1691,17 +1680,34 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>                  * only when the page is being freed somewhere else.
>                  */
>                 scan += nr_pages;
> -               switch (__isolate_lru_page(page, mode)) {
> +               switch (__isolate_lru_page_prepare(page, mode)) {
>                 case 0:
> +                       /*
> +                        * Be careful not to clear PageLRU until after we're
> +                        * sure the page is not being freed elsewhere -- the
> +                        * page release code relies on it.
> +                        */
> +                       if (unlikely(!get_page_unless_zero(page)))
> +                               goto busy;
> +
> +                       if (!TestClearPageLRU(page)) {
> +                               /*
> +                                * This page may in other isolation path,
> +                                * but we still hold lru_lock.
> +                                */
> +                               put_page(page);
> +                               goto busy;
> +                       }
> +

I wonder if it wouldn't make sense to combine these two atomic ops with
tests and the put_page into a single inline function? Then it could be
possible to just do one check and if succeeds you do the block of code
below, otherwise you just fall-through into the -EBUSY case.
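
Something along these lines is what I have in mind (untested sketch;
the helper name is made up here purely for illustration, it is not
part of the posted patch):

/*
 * Sketch only: take a reference on the page and claim its LRU bit for
 * isolation. Returns true when the caller now owns the isolation;
 * returns false when the page is being freed or is already being
 * isolated elsewhere, leaving the refcount as we found it.
 */
static inline bool lru_page_get_isolate(struct page *page)
{
        /*
         * Be careful not to clear PageLRU until after we're sure the
         * page is not being freed elsewhere -- the page release code
         * relies on it.
         */
        if (unlikely(!get_page_unless_zero(page)))
                return false;

        if (!TestClearPageLRU(page)) {
                /* Another isolation path owns the page; drop our ref. */
                put_page(page);
                return false;
        }

        return true;
}

With that, the case 0 block above collapses to a single
"if (!lru_page_get_isolate(page)) goto busy;" followed by the
list_move block, and the failure case goes to the shared -EBUSY
handling.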
>                         nr_taken += nr_pages;
>                         nr_zone_taken[page_zonenum(page)] += nr_pages;
>                         list_move(&page->lru, dst);
>                         break;
> -
> +busy:
>                 case -EBUSY:
>                         /* else it is being freed elsewhere */
>                         list_move(&page->lru, src);
> -                       continue;
> +                       break;
>
>                 default:
>                         BUG();
> --
> 1.8.3.1
>
>
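
And to illustrate my earlier comment about the ordering in
isolate_migratepages_block(): roughly the sequence below (again an
untested sketch, not the patch as posted; Hugh's note above that
trylock_page() in the prepare step wants the page reference held first
may rule this out for the isolation modes that take the page lock):

                /*
                 * Reject pages that fail the isolation-mode checks
                 * before touching the refcount; no reference is held
                 * on any failure path up to this point.
                 */
                if (__isolate_lru_page_prepare(page, isolate_mode) != 0)
                        goto isolate_fail;

                /*
                 * Be careful not to clear PageLRU until after we're
                 * sure the page is not being freed elsewhere -- the
                 * page release code relies on it.
                 */
                if (unlikely(!get_page_unless_zero(page)))
                        goto isolate_fail;

                /* Try to isolate the page; a reference is held now. */
                if (!TestClearPageLRU(page))
                        goto isolate_fail_put;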