Date: Sat, 12 Sep 2020 01:38:56 -0700 (PDT)
From: Hugh Dickins
To: Alex Shi
Cc: Hugh Dickins, Andrew Morton, mgorman@techsingularity.net, tj@kernel.org,
    khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, willy@infradead.org,
    hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com,
    iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name,
    alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com,
    vdavydov.dev@gmail.com, shy828301@gmail.com, vbabka@suse.cz,
    minchan@kernel.org, cai@lca.pw
Subject: Re: [PATCH v18 00/32] per memcg lru_lock: reviews
In-Reply-To: <61a42a87-eec9-e300-f710-992756f70de6@linux.alibaba.com>
References: <1598273705-69124-1-git-send-email-alex.shi@linux.alibaba.com>
    <20200824114204.cc796ca182db95809dd70a47@linux-foundation.org>
    <61a42a87-eec9-e300-f710-992756f70de6@linux.alibaba.com>
On Wed, 9 Sep 2020, Alex Shi wrote:
> On 2020/9/9 7:41 AM, Hugh Dickins wrote:
> >
> > The use of lock_page_memcg() in __munlock_pagevec() in 20/32,
> > introduced in patchset v17, looks good but it isn't: I was lucky that
> > systemd at reboot did some munlocking that exposed the problem to lockdep.
> > The first time into the loop, lock_page_memcg() is done before lru_lock
> > (as 06/32 has allowed); but the second time around the loop, it is done
> > while still holding lru_lock.
>
> I don't know the details of the lockdep report. Just wondering, could it
> be possible to make the move_lock/lru_lock ordering solid?
> Or try the other blocking way mentioned in commit_charge()?
>
> >
> > lock_page_memcg() really needs to be absorbed into (a variant of)
> > relock_page_lruvec(), and I do have that (it's awkward because of
> > the different ways in which the IRQ flags are handled). And out of
> > curiosity, I've also tried using that in mm/swap.c too, instead of the
> > TestClearPageLRU technique: lockdep is happy, but an update_lru_size()
> > warning showed that it cannot safely be mixed with the TestClearPageLRU
> > technique (that I'd left in isolate_lru_page()). So I'll stash away
> > that relock_page_lruvec(), and consider what's best for mm/mlock.c:
> > now that I've posted these comments so far, that's my priority, then
> > to get the result under testing again, before resuming these comments.
>
> No idea of your solution, but looking forward to your good news! :)

Yes, it is good news, and simpler than anything suggested above.
The main difficulties will probably be to look good in the 80 columns
(I know that limit has been lifted recently, but some of us use xterms
side by side), and to explain it.
mm/mlock.c has not been kept up-to-date very well: and in particular,
you have taken too seriously that "Serialize with any parallel
__split_huge_page_refcount()..." comment that you updated to two
comments "Serialize split tail pages in __split_huge_page_tail()...".
Delete them!

The original comment was by Vlastimil for v3.14 in 2014. But Kirill
redesigned THP refcounting for v4.5 in 2016: that's when
__split_huge_page_refcount() went away. And with the new refcounting,
the THP splitting races that lru_lock protected munlock_vma_page()
and __munlock_pagevec() from: those races have become impossible.

Or maybe there never was such a race in __munlock_pagevec(): you have
added the comment there, assuming lru_lock was for that purpose, but
that was probably just the convenient place to take it, to cover all
the del_page_from_lru()s.

Observe how split_huge_page_to_list() uses unmap_page() to remove all
pmds and all ptes for the huge page being split, and remap_page() only
replaces the migration entries (used for anon but not for shmem or
file) after doing all of the __split_huge_page_tail()s, before
unlocking any of the pages. Recall that munlock_vma_page() and
__munlock_pagevec() are being applied to pages found mapped into
userspace, by ptes or pmd: there are none of those while
__split_huge_page_tail() is being used, so no race to protect from.

(Could a newly detached tail be freshly faulted into userspace just
before __split_huge_page() has reached the head? Not quite, the fault
has to wait to get the tail's page lock. But even if it could, how
would that be a problem for __munlock_pagevec()?)

There's lots more that could be said: for example, PageMlocked will
always be clear on the THP head during __split_huge_page_tail(),
because the last unmap of a PageMlocked page does clear_page_mlock().
But that's not required to prove the case, it's just another argument
against the "Serialize" comment you have in __munlock_pagevec().
So, no need for the problematic lock_page_memcg(page) there in
__munlock_pagevec(), nor to lock (or relock) lruvec just below it.
__munlock_pagevec() still needs lru_lock to del_page_from_lru_list(),
of course, but that must be done after your TestClearPageMlocked has
stabilized page->memcg. Use relock_page_lruvec_irq() here? I suppose
that will be easiest, but notice how __munlock_pagevec_fill() has
already made sure that all the pages in the pagevec are from the same
zone (and it cannot do the same for memcg without locking page memcg);
so some of relock's work will be redundant.

Otherwise, I'm much happier with your mm/mlock.c since looking at it
in more detail: a couple of nits though - drop the clear_page_mlock()
hunk from 25/32 - kernel style says do it the way you are undoing by

-	if (!isolate_lru_page(page)) {
+	if (!isolate_lru_page(page))
 		putback_lru_page(page);
-	} else {
+	else {

I don't always follow that over-braced style when making changes, but
you should not touch otherwise untouched code just to make it go
against the approved style. And in munlock_vma_page(),

-	if (!TestClearPageMlocked(page)) {
+	if (!TestClearPageMlocked(page))
 		/* Potentially, PTE-mapped THP: do not skip the rest PTEs */
-		nr_pages = 1;
-		goto unlock_out;
-	}
+		return 0;

please restore the braces: with that comment line in there, the
compiler does not need the braces, but the human eye does.

Hugh