From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4D3F1C433EF for ; Fri, 25 Mar 2022 01:14:15 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D25B68D0050; Thu, 24 Mar 2022 21:14:14 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C60108D004D; Thu, 24 Mar 2022 21:14:14 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AB49F8D0050; Thu, 24 Mar 2022 21:14:14 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24]) by kanga.kvack.org (Postfix) with ESMTP id 8C64B8D004D for ; Thu, 24 Mar 2022 21:14:14 -0400 (EDT) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 55AD9250FC for ; Fri, 25 Mar 2022 01:14:14 +0000 (UTC) X-FDA: 79281137628.06.A13E83D Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf12.hostedemail.com (Postfix) with ESMTP id E09F340042 for ; Fri, 25 Mar 2022 01:14:13 +0000 (UTC) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 4EBD2618CB; Fri, 25 Mar 2022 01:14:13 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A9A10C340EC; Fri, 25 Mar 2022 01:14:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1648170852; bh=g0rshfmvBKg3SRoE4e+/iBfZSeKSsH4HTmeZSvqpxo8=; h=Date:To:From:In-Reply-To:Subject:From; b=vSuQCl3z60T5FBoxwcf91KGRKtIkVwHKk8ZxLcfVBP5sUJa4zurpEeslw6/+mfUxm j3X04rNtADpCrwZKK9SXdQenXOYYOZp8y7uvdemFWPzDHFQEg3eoMW5zARmg31SJK6 zsZfMQz7vlmLS0Ra0GmCJGgiTIswJ0c7VHEB61RI= Date: Thu, 24 Mar 2022 18:14:12 -0700 To: vbabka@suse.cz,shakeelb@google.com,nadav.amit@gmail.com,mike.kravetz@oracle.com,mhocko@suse.com,dgilbert@redhat.com,david@redhat.com,hannes@cmpxchg.org,akpm@linux-foundation.org,patches@lists.linux.dev,linux-mm@kvack.org,mm-commits@vger.kernel.org,torvalds@linux-foundation.org,akpm@linux-foundation.org From: Andrew Morton In-Reply-To: <20220324180758.96b1ac7e17675d6bc474485e@linux-foundation.org> Subject: [patch 112/114] mm: madvise: MADV_DONTNEED_LOCKED Message-Id: <20220325011412.A9A10C340EC@smtp.kernel.org> X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: E09F340042 X-Stat-Signature: 17hoi98gfih4qgjg36f1hut4tzgqjznj X-Rspam-User: Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=vSuQCl3z; dmarc=none; spf=pass (imf12.hostedemail.com: domain of akpm@linux-foundation.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-HE-Tag: 1648170853-235957 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Johannes Weiner Subject: mm: madvise: MADV_DONTNEED_LOCKED MADV_DONTNEED historically rejects mlocked ranges, but with MLOCK_ONFAULT and MCL_ONFAULT allowing to mlock without populating, there are valid use cases for depopulating locked ranges as well. Users mlock memory to protect secrets. There are allocators for secure buffers that want in-use memory generally mlocked, but cleared and invalidated memory to give up the physical pages. This could be done with explicit munlock -> mlock calls on free -> alloc of course, but that adds two unnecessary syscalls, heavy mmap_sem write locks, vma splits and re-merges - only to get rid of the backing pages. Users also mlockall(MCL_ONFAULT) to suppress sustained paging, but are okay with on-demand initial population. It seems valid to selectively free some memory during the lifetime of such a process, without having to mess with its overall policy. Why add a separate flag? Isn't this a pretty niche usecase? - MADV_DONTNEED has been bailing on locked vmas forever. It's at least conceivable that someone, somewhere is relying on mlock to protect data from perhaps broader invalidation calls. Changing this behavior now could lead to quiet data corruption. - It also clarifies expectations around MADV_FREE and maybe MADV_REMOVE. It avoids the situation where one quietly behaves different than the others. MADV_FREE_LOCKED can be added later. - The combination of mlock() and madvise() in the first place is probably niche. But where it happens, I'd say that dropping pages from a locked region once they don't contain secrets or won't page anymore is much saner than relying on mlock to protect memory from speculative or errant invalidation calls. It's just that we can't change the default behavior because of the two previous points. Given that, an explicit new flag seems to make the most sense. [hannes@cmpxchg.org: fix mips build] Link: https://lkml.kernel.org/r/20220304171912.305060-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner Acked-by: Michal Hocko Reviewed-by: Mike Kravetz Reviewed-by: Shakeel Butt Acked-by: Vlastimil Babka Cc: Nadav Amit Cc: David Hildenbrand Cc: Dr. David Alan Gilbert Signed-off-by: Andrew Morton --- arch/alpha/include/uapi/asm/mman.h | 2 + arch/mips/include/uapi/asm/mman.h | 2 + arch/parisc/include/uapi/asm/mman.h | 2 + arch/xtensa/include/uapi/asm/mman.h | 2 + include/uapi/asm-generic/mman-common.h | 2 + mm/madvise.c | 24 +++++++++++++---------- 6 files changed, 24 insertions(+), 10 deletions(-) --- a/arch/alpha/include/uapi/asm/mman.h~mm-madvise-madv_dontneed_locked +++ a/arch/alpha/include/uapi/asm/mman.h @@ -74,6 +74,8 @@ #define MADV_POPULATE_READ 22 /* populate (prefault) page tables readable */ #define MADV_POPULATE_WRITE 23 /* populate (prefault) page tables writable */ +#define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */ + /* compatibility flags */ #define MAP_FILE 0 --- a/arch/mips/include/uapi/asm/mman.h~mm-madvise-madv_dontneed_locked +++ a/arch/mips/include/uapi/asm/mman.h @@ -101,6 +101,8 @@ #define MADV_POPULATE_READ 22 /* populate (prefault) page tables readable */ #define MADV_POPULATE_WRITE 23 /* populate (prefault) page tables writable */ +#define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */ + /* compatibility flags */ #define MAP_FILE 0 --- a/arch/parisc/include/uapi/asm/mman.h~mm-madvise-madv_dontneed_locked +++ a/arch/parisc/include/uapi/asm/mman.h @@ -55,6 +55,8 @@ #define MADV_POPULATE_READ 22 /* populate (prefault) page tables readable */ #define MADV_POPULATE_WRITE 23 /* populate (prefault) page tables writable */ +#define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */ + #define MADV_MERGEABLE 65 /* KSM may merge identical pages */ #define MADV_UNMERGEABLE 66 /* KSM may not merge identical pages */ --- a/arch/xtensa/include/uapi/asm/mman.h~mm-madvise-madv_dontneed_locked +++ a/arch/xtensa/include/uapi/asm/mman.h @@ -109,6 +109,8 @@ #define MADV_POPULATE_READ 22 /* populate (prefault) page tables readable */ #define MADV_POPULATE_WRITE 23 /* populate (prefault) page tables writable */ +#define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */ + /* compatibility flags */ #define MAP_FILE 0 --- a/include/uapi/asm-generic/mman-common.h~mm-madvise-madv_dontneed_locked +++ a/include/uapi/asm-generic/mman-common.h @@ -75,6 +75,8 @@ #define MADV_POPULATE_READ 22 /* populate (prefault) page tables readable */ #define MADV_POPULATE_WRITE 23 /* populate (prefault) page tables writable */ +#define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */ + /* compatibility flags */ #define MAP_FILE 0 --- a/mm/madvise.c~mm-madvise-madv_dontneed_locked +++ a/mm/madvise.c @@ -52,6 +52,7 @@ static int madvise_need_mmap_write(int b case MADV_REMOVE: case MADV_WILLNEED: case MADV_DONTNEED: + case MADV_DONTNEED_LOCKED: case MADV_COLD: case MADV_PAGEOUT: case MADV_FREE: @@ -502,14 +503,9 @@ static void madvise_cold_page_range(stru tlb_end_vma(tlb, vma); } -static inline bool can_madv_lru_non_huge_vma(struct vm_area_struct *vma) -{ - return !(vma->vm_flags & (VM_LOCKED|VM_PFNMAP)); -} - static inline bool can_madv_lru_vma(struct vm_area_struct *vma) { - return can_madv_lru_non_huge_vma(vma) && !is_vm_hugetlb_page(vma); + return !(vma->vm_flags & (VM_LOCKED|VM_PFNMAP|VM_HUGETLB)); } static long madvise_cold(struct vm_area_struct *vma, @@ -787,10 +783,16 @@ static bool madvise_dontneed_free_valid_ unsigned long *end, int behavior) { - if (!is_vm_hugetlb_page(vma)) - return can_madv_lru_non_huge_vma(vma); + if (!is_vm_hugetlb_page(vma)) { + unsigned int forbidden = VM_PFNMAP; + + if (behavior != MADV_DONTNEED_LOCKED) + forbidden |= VM_LOCKED; + + return !(vma->vm_flags & forbidden); + } - if (behavior != MADV_DONTNEED) + if (behavior != MADV_DONTNEED && behavior != MADV_DONTNEED_LOCKED) return false; if (start & ~huge_page_mask(hstate_vma(vma))) return false; @@ -854,7 +856,7 @@ static long madvise_dontneed_free(struct VM_WARN_ON(start >= end); } - if (behavior == MADV_DONTNEED) + if (behavior == MADV_DONTNEED || behavior == MADV_DONTNEED_LOCKED) return madvise_dontneed_single_vma(vma, start, end); else if (behavior == MADV_FREE) return madvise_free_single_vma(vma, start, end); @@ -993,6 +995,7 @@ static int madvise_vma_behavior(struct v return madvise_pageout(vma, prev, start, end); case MADV_FREE: case MADV_DONTNEED: + case MADV_DONTNEED_LOCKED: return madvise_dontneed_free(vma, prev, start, end, behavior); case MADV_POPULATE_READ: case MADV_POPULATE_WRITE: @@ -1123,6 +1126,7 @@ madvise_behavior_valid(int behavior) case MADV_REMOVE: case MADV_WILLNEED: case MADV_DONTNEED: + case MADV_DONTNEED_LOCKED: case MADV_FREE: case MADV_COLD: case MADV_PAGEOUT: _