From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3467BC4332F for ; Wed, 9 Feb 2022 22:28:33 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 9A6316B0075; Wed, 9 Feb 2022 17:28:32 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 961706B0078; Wed, 9 Feb 2022 17:28:32 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7F5306B007B; Wed, 9 Feb 2022 17:28:32 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0233.hostedemail.com [216.40.44.233]) by kanga.kvack.org (Postfix) with ESMTP id 6F02C6B0075 for ; Wed, 9 Feb 2022 17:28:32 -0500 (EST) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 1FA9B82655D2 for ; Wed, 9 Feb 2022 22:28:32 +0000 (UTC) X-FDA: 79124681664.13.C3C49DF Received: from mail-ot1-f41.google.com (mail-ot1-f41.google.com [209.85.210.41]) by imf06.hostedemail.com (Postfix) with ESMTP id B0E60180005 for ; Wed, 9 Feb 2022 22:28:31 +0000 (UTC) Received: by mail-ot1-f41.google.com with SMTP id l43-20020a9d1cab000000b005aa50ff5869so2583507ota.0 for ; Wed, 09 Feb 2022 14:28:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:from:to:cc:subject:in-reply-to:message-id:references :mime-version; bh=ow76+gGSkaBY1EZZ1MFaphikeMi7h3bSFQ/+/XQlTq0=; b=lKOXkVrnz3eFiyfyLCV70oVZ2uqhUlhTnU3QFUzperRTSy4QZUoR4uPenX7xHTwNVI XMRpSC15xRvPOm6eJaXes0Krt4wXmHiX+aXgxqpTbkDaVSipu8EARSjH1pk7Xqn7+kR5 ++RcXbkVSGaZv6GLcEoWPNK4+RWJBFTabSUAUrvPFwGUo2rtYXfWqdwZvPyw6PaYbNX5 5LWJ5ebin3BSsLtzCorjoC78oCL1zbHNjlpI2/qZbD/y8i2KiQIaJ8q23t2Gx9DCpxHl LJkwNpbVGdpflZTg9raNp1gOz+c8JSkhI3TMpvKYL6k5H734BWeHbeuSGD8NxnZNEsKB 85cw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id :references:mime-version; bh=ow76+gGSkaBY1EZZ1MFaphikeMi7h3bSFQ/+/XQlTq0=; b=WfAraAb1GU0gmKm6MVUylFeWqF24YGNdgvFUdTB3JJ0WCx1f8rwdIUHlJ7n0FmRGFC m7fzsQ/M6L0rAxiKRW0T3riXbrGqTiAcWzgQ5jR7yBHamgbu6GTq/53vPwT+ypq3283w PSALHX+6yjLwRe0wzaxdPFz02skVhAYJ9Lak6gxTZtHTrjNYSiAHFbCC5bfBItK4dOgR gNK9qd/iUillOqqyxuGpX9VRPpMuhYXbeiOsE4fABKUFd5APkopBz5XPWwRqW93Utm+k JQqrBxV65iWJUQUvej8tAXgg4xQ+KsPgyy6eFp56wejQ0NPNvnlCEcCypN/BQZHh8Djn Oo7A== X-Gm-Message-State: AOAM5328yiWaaKfh+vTpW8sM0/bsSh2nvZl5IJ2QIaeLOEmQokKDsHsd t9YwTe7P4t6fLNTe3l7QRrp+SA== X-Google-Smtp-Source: ABdhPJzmBki7ktBHEFQa+yoDyLpAdLuMiGhYDC2N3odDv4pHXR1wMIGMYfAU6QZon5yqJeyuMqH2nw== X-Received: by 2002:a9d:7856:: with SMTP id c22mr1887179otm.192.1644445710788; Wed, 09 Feb 2022 14:28:30 -0800 (PST) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id e16sm7155029otr.11.2022.02.09.14.28.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 09 Feb 2022 14:28:29 -0800 (PST) Date: Wed, 9 Feb 2022 14:28:08 -0800 (PST) From: Hugh Dickins X-X-Sender: hugh@ripple.anvils To: Vlastimil Babka cc: Hugh Dickins , Andrew Morton , Michal Hocko , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Alistair Popple , Johannes Weiner , Rik van Riel , Suren Baghdasaryan , Yu Zhao , Greg Thelen , Shakeel Butt , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH 01/13] mm/munlock: delete page_mlock() and all its works In-Reply-To: <4a5bc989-e59a-d421-faf4-8156f700ec99@suse.cz> Message-ID: References: <8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com> <5ed1f01-3e7e-7e26-cc1-2b7a574e2147@google.com> <4a5bc989-e59a-d421-faf4-8156f700ec99@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Rspam-User: Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=lKOXkVrn; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf06.hostedemail.com: domain of hughd@google.com designates 209.85.210.41 as permitted sender) smtp.mailfrom=hughd@google.com X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: B0E60180005 X-Stat-Signature: h5zycbxb98t6b31z7abwpgo4igwc6br6 X-HE-Tag: 1644445711-160717 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, 9 Feb 2022, Vlastimil Babka wrote: > On 2/6/22 22:30, Hugh Dickins wrote: > > While I understand the reasons to clear the ground first, wonder what are > the implications for bisectability - is there a risk of surprising failures? > Maybe we should at least explicitly spell out the implications here? > IIUC, pages that once become mlocked, will stay mlocked, implicating the > Mlocked meminfo counter and inability to reclaim them. But if e.g. a process > that did mlockall() exits, its exclusive pages will be freed anyway, so it's > not a catastrophic kind of leak, right? Thanks for taking a look, Vlastimil. You make a good point here. I had satisfied myself that no stage of the series was going to introduce boot failures or BUGs; and if someone is bisecting some mlock/munlock misbehaviour, I would not worry about which commit of the series they alight on, but root cause it keeping all the patches in mind. But we certainly wouldn't want the series split up into separately submitted parts (that is, split anywhere between 01/13 and 07/13: splitting the rest apart wouldn't matter much); and it would be unfortunate if someone were bisecting some reclaim failure OOM problem elsewhere, and their test app happened to be using mlock, and their bisection landed part way between 01 and 07 here - the unimplemented munlock confusing the bisection for OOM. The worst of it would be, I think, landing between 05 and 07: where your mlockall could mlock a variety of shared libraries etc, moving all their pages to unevictable, and those pagecache pages not getting moved back to evictable when unmapped. I forget the current shrinker situation, whether inode pressure could evict that pagecache or not. Two mitigations come to mind, let me think on it some more (and hope nobody's bisecting OOMs meanwhile): one might be to shift 05 (the one which replaces clear_page_inode() on last unmap by clearance when freeing) later in the series - its position was always somewhat arbitrary, but that position is where it's been tested; another might be to put nothing at all on the unevictable list in between 01 and 07. Though taking this apart and putting it back together brings its own dangers. That second suggestion probably won't fly very well, with 06/13 using mlock_count only while on the unevictable. I'm not sure how much rethinking the bisection possibility deserves. > Yet it differs from the existing "failure modes" where pages would be left > as "stranded" due to failure of being isolated, because they would at least > go through TestClearPageMlocked and counters update. > > > > > /* > > @@ -413,75 +136,11 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec, > > * > > * Returns with VM_LOCKED cleared. Callers must be prepared to > > * deal with this. > > - * > > - * We don't save and restore VM_LOCKED here because pages are > > - * still on lru. In unmap path, pages might be scanned by reclaim > > - * and re-mlocked by page_mlock/try_to_unmap before we unmap and > > - * free them. This will result in freeing mlocked pages. > > */ > > void munlock_vma_pages_range(struct vm_area_struct *vma, > > unsigned long start, unsigned long end) > > { > > - vma->vm_flags &= VM_LOCKED_CLEAR_MASK; > > Should we at least keep doing the flags clearing? I haven't check if there > are some VM_BUG_ONs that would trip on not cleared, but wouldn't be entirely > surprised. There are two flags in question here, VM_LOCKED and VM_LOCKONFAULT: I'm not sure which of them you're particularly concerned about. As to VM_LOCKED: yes, you're right, at this stage of the series the munlock really ought to be clearing VM_LOCKED (even while it doesn't go on to do anything about the pages), as it claims in the comment above. I removed this line at a later stage (07/13), when changing it to mlock_vma_pages_range() serving both mlock and munlock according to whether VM_LOCKED is provided - and mistakenly folded back that deletion to this patch. End result the same, but better to restore that maskout in this patch, as you suggest. As to VM_LOCKONFAULT: I had checked the rest of mm/mlock.c, and the rest of the tree, and it only ever reached here along with VM_LOCKED; so when in 07/13 I switched over to "vma->vm_flags = newflags" (or WRITE_ONCE equivalent), I just didn't see the need to mask it out in the munlocking case; but could add a VM_BUG_ON that newflags never has it without VM_LOCKED, if you like. (You'll say VM_WARN_ON_ONCE, I'll say VM_BUG_ON because it never happens, then as soon as I put it in and run LTP or kselftests, I'll be ashamed to discover I've got it wrong, perhaps.) Hugh