Subject: Re: [PATCH v17 17/21] mm/lru: replace pgdat lru_lock with lruvec lock
From: Alexander Duyck
Date: Tue, 28 Jul 2020 18:27:16 -0700
To: Alex Shi
Cc: Andrew Morton, Mel Gorman, Tejun Heo, Hugh Dickins, Konstantin Khlebnikov, Daniel Jordan, Yang Shi, Matthew Wilcox, Johannes Weiner, kbuild test robot, linux-mm, LKML, cgroups@vger.kernel.org, Shakeel Butt, Joonsoo Kim, Wei Yang, "Kirill A. Shutemov", Rong Chen, Michal Hocko, Vladimir Davydov
References: <1595681998-19193-1-git-send-email-alex.shi@linux.alibaba.com> <1595681998-19193-18-git-send-email-alex.shi@linux.alibaba.com> <09aeced7-cc36-0c9a-d40b-451db9dc54cc@linux.alibaba.com>
In-Reply-To: <09aeced7-cc36-0c9a-d40b-451db9dc54cc@linux.alibaba.com>

On Tue, Jul 28, 2020 at 6:00 PM Alex Shi wrote:
>
>
>
> On 2020/7/28 10:54 PM, Alexander Duyck wrote:
> > On Tue, Jul 28, 2020 at 4:20 AM Alex Shi wrote:
> >>
> >>
> >>
> >> On 2020/7/28 7:34 AM, Alexander Duyck wrote:
> >>>> @@ -1876,6 +1876,12 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,
> >>>>                  *                                        list_add(&page->lru,)
> >>>>                  *                                               list_add(&page->lru,) //corrupt
> >>>>                  */
> >>>> +               new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
> >>>> +               if (new_lruvec != lruvec) {
> >>>> +                       if (lruvec)
> >>>> +                               spin_unlock_irq(&lruvec->lru_lock);
> >>>> +                       lruvec = lock_page_lruvec_irq(page);
> >>>> +               }
> >>>>                 SetPageLRU(page);
> >>>>
> >>>>                 if (unlikely(put_page_testzero(page))) {
> >>> I was going through the code of
the entire patch set and I noticed
> >>> these changes in move_pages_to_lru. What is the reason for adding the
> >>> new_lruvec logic? My understanding is that we are moving the pages to
> >>> the lruvec provided, are we not? If so, why do we need to add code to get
> >>> a new lruvec? The code itself seems to stand out from the rest of the
> >>> patch, as it is introducing new code instead of replacing existing
> >>> locking code, and it doesn't match up with the description of what
> >>> this function is supposed to do, since it changes the lruvec.
> >>
> >> This new_lruvec is the replacement for the removed line, i.e. the following code:
> >>>> -               lruvec = mem_cgroup_page_lruvec(page, pgdat);
> >> The recheck is for a page moved to the root memcg; otherwise it causes the bug:
> >
> > Okay, now I see where the issue is. You moved this code, so now it has
> > a different effect than it did before. You are relocking things before
> > you need to. Don't forget that when you came into this function you
> > already had the lock. In addition, the patch is broken as it currently
> > stands, as you aren't using similar logic in the code just above this
> > addition if you encounter an evictable page. As a result this is
> > really difficult to review, as there are subtle bugs here.
>
> Why do you think it's a bug? The relock only happens if the locked lruvec is
> different, and it unlocks the old one.

The section I am talking about with the bug is this section here:

        while (!list_empty(list)) {
+               struct lruvec *new_lruvec = NULL;
+
                page = lru_to_page(list);
                VM_BUG_ON_PAGE(PageLRU(page), page);
                list_del(&page->lru);
                if (unlikely(!page_evictable(page))) {
-                       spin_unlock_irq(&pgdat->lru_lock);
+                       spin_unlock_irq(&lruvec->lru_lock);
                        putback_lru_page(page);
-                       spin_lock_irq(&pgdat->lru_lock);
+                       spin_lock_irq(&lruvec->lru_lock);
                        continue;
                }

Basically it probably is not advisable to be retaking the
lruvec->lru_lock directly, as the lruvec may have changed so it
wouldn't be correct for the next page.
It would make more sense to be using your API and calling
unlock_page_lruvec_irq and lock_page_lruvec_irq instead of using the
lock directly.

> >
> > I suppose the correct fix is to get rid of this line, but it should
> > be placed everywhere the original function was calling
> > spin_lock_irq().
> >
> > In addition I would consider changing the arguments/documentation for
> > move_pages_to_lru. You aren't moving the pages to the lruvec, so there is
> > probably no need to pass that as an argument. Instead I would pass
> > pgdat, since that isn't going to be moving and is the only thing you
> > actually derive based on the original lruvec.
>
> Yes, the comments should be changed; that line was introduced long ago. :)
> Anyway, I am wondering if it is worth a v18 resend?

So I have been looking over the function itself and I wonder if it
isn't worth rewriting this to optimize the locking behavior to
minimize the number of times we have to take the LRU lock. I have some
code I am working on that I plan to submit as an RFC in the next day
or so, after I can get it smoke tested. The basic idea would be to
defer returning the evictable pages or freeing the compound pages
until after we have processed the pages that can be moved while still
holding the lock. I would think it should reduce the lock contention
significantly while improving the throughput.