From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,INCLUDES_PATCH,MAILING_LIST_MULTI, NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C8555C47082 for ; Thu, 3 Jun 2021 19:57:17 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 5439D613D7 for ; Thu, 3 Jun 2021 19:57:17 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5439D613D7 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id DCAC96B0036; Thu, 3 Jun 2021 15:57:16 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D79976B006C; Thu, 3 Jun 2021 15:57:16 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C1A946B006E; Thu, 3 Jun 2021 15:57:16 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0067.hostedemail.com [216.40.44.67]) by kanga.kvack.org (Postfix) with ESMTP id 8E1EE6B0036 for ; Thu, 3 Jun 2021 15:57:16 -0400 (EDT) Received: from smtpin05.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 3212A181AC9C6 for ; Thu, 3 Jun 2021 19:57:16 +0000 (UTC) X-FDA: 78213471672.05.7FECDAA Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf22.hostedemail.com (Postfix) with ESMTP id 483BEC0201D3 for ; Thu, 3 Jun 2021 19:56:58 +0000 (UTC) Received: by mail.kernel.org (Postfix) with ESMTPSA id 0AA1E6139A; Thu, 3 Jun 2021 19:57:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1622750224; bh=81DqeO774s33h+YAwWrBAiXQa7KeTEOqSdKa8sghF9M=; h=Subject:To:Cc:References:From:Date:In-Reply-To:From; b=cnjbZym8K9qCgQHxspSMzsHaHSF/RlTIzutUDrU8l5AVE+Y6KL0kum7iau+vYj1Gz x3w9EZwjJdgqclXYwgm8KFsMTLm7iF8wzcgXwYL42JH6Fz7bv1gcAuSMCH89GbDSSv c+xc9tdT04HVkKeKhM+xP2VT1W81C5nlPjmeb1xbGYHJ8L1OCnkrHYYfFgtfHbKZd3 2jBdLE/5WSQan772gkgaQDQaW1qMdngdBZZbzGPoP6mLs08iDhj8wTf7ndC4oJqodB RLJaTrvJz4aCaj82E/lqKjkS4tiw1kHRqDqNNUtgj3RcDpFBaJ2gy8/6sCy9KdkA64 nBRfjO3Mmhb0w== Subject: Re: [PATCH 2/2] mm: adds NOSIGBUS extension for out-of-band shmem read To: Hugh Dickins Cc: Linus Torvalds , Simon Ser , linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org References: <1622589753-9206-1-git-send-email-mlin@kernel.org> <1622589753-9206-3-git-send-email-mlin@kernel.org> <79a27014-5450-1345-9eea-12fc9ae25777@kernel.org> From: Ming Lin Message-ID: Date: Thu, 3 Jun 2021 12:57:01 -0700 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.10.2 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=cnjbZym8; dmarc=pass (policy=none) header.from=kernel.org; spf=pass (imf22.hostedemail.com: domain of mlin@kernel.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=mlin@kernel.org X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 483BEC0201D3 X-Stat-Signature: w34g6k4mo81n5t59k3ox95dihhufbpjg X-HE-Tag: 1622750218-925110 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 6/2/2021 5:46 PM, Hugh Dickins wrote > > It's do_anonymous_page()'s business to map in the zero page on > read fault (see "my_zero_pfn(vmf->address)" in there), or fill > a freshly allocated page with zeroes on write fault - and now > you're sticking to MAP_PRIVATE, write faults in VM_WRITE areas > are okay for VM_NOSIGBUS. > > Ideally you can simply call do_anonymous_page() from __do_fault() > in the VM_FAULT_SIGBUS on VM_NOSIGBUS case. That's what to start > from anyway: but look to see if there's state to be adjusted to > achieve that; and it won't be surprising if somewhere down in > do_anonymous_page() or something it calls, there's a BUG on it > being called when vma->vm_file is set, or something like that. > May need some tweaking. do_anonymous_page() works nicely for read fault and write fault. I didn't see any BUG() thing in my test. But I'm still struggling with how to do "punch hole should remove the mapping of zero page". Here is the hack I have now. diff --git a/mm/memory.c b/mm/memory.c index 46ecda5..6b5a897 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1241,7 +1241,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, struct page *page; page = vm_normal_page(vma, addr, ptent); - if (unlikely(details) && page) { + if (unlikely(details) && page && !(vma->vm_flags & VM_NOSIGBUS)) { /* * unmap_shared_mapping_pages() wants to * invalidate cache without truncating: And other parts of the patch is following, ---- diff --git a/include/linux/mm.h b/include/linux/mm.h index e9d67bc..af9e277 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -373,6 +373,8 @@ int __add_to_page_cache_locked(struct page *page, struct address_space *mapping, # define VM_UFFD_MINOR VM_NONE #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ +#define VM_NOSIGBUS VM_FLAGS_BIT(38) /* Do not SIGBUS on fault */ + /* Bits set in the VMA until the stack is in its final location */ #define VM_STACK_INCOMPLETE_SETUP (VM_RAND_READ | VM_SEQ_READ) diff --git a/include/linux/mman.h b/include/linux/mman.h index b2cbae9..c966b08 100644 --- a/include/linux/mman.h +++ b/include/linux/mman.h @@ -154,6 +154,7 @@ static inline bool arch_validate_flags(unsigned long flags) _calc_vm_trans(flags, MAP_DENYWRITE, VM_DENYWRITE ) | _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) | _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) | + _calc_vm_trans(flags, MAP_NOSIGBUS, VM_NOSIGBUS ) | arch_calc_vm_flag_bits(flags); } diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index f94f65d..a2a5333 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -29,6 +29,7 @@ #define MAP_HUGETLB 0x040000 /* create a huge page mapping */ #define MAP_SYNC 0x080000 /* perform synchronous page faults for the mapping */ #define MAP_FIXED_NOREPLACE 0x100000 /* MAP_FIXED which doesn't unmap underlying mapping */ +#define MAP_NOSIGBUS 0x200000 /* do not SIGBUS on fault */ #define MAP_UNINITIALIZED 0x4000000 /* For anonymous mmap, memory could be * uninitialized */ diff --git a/mm/memory.c b/mm/memory.c index eff2a47..46ecda5 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3676,6 +3676,17 @@ static vm_fault_t __do_fault(struct vm_fault *vmf) } ret = vma->vm_ops->fault(vmf); + if (unlikely(ret & VM_FAULT_SIGBUS) && (vma->vm_flags & VM_NOSIGBUS)) { + /* + * For MAP_NOSIGBUS mapping, map in the zero page on read fault + * or fill a freshly allocated page with zeroes on write fault + */ + ret = do_anonymous_page(vmf); + if (!ret) + ret = VM_FAULT_NOPAGE; + return ret; + } + if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY | VM_FAULT_DONE_COW))) return ret; diff --git a/mm/mmap.c b/mm/mmap.c index 096bba4..74fb49a 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1419,6 +1419,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr, if (!len) return -EINVAL; + /* Restrict MAP_NOSIGBUS to MAP_PRIVATE mapping */ + if ((flags & MAP_NOSIGBUS) && !(flags & MAP_PRIVATE)) + return -EINVAL; + /* * Does the application expect PROT_READ to imply PROT_EXEC? *