Date: Thu, 30 Jul 2020 17:22:10 +0100
From: Catalin Marinas
To: Mike Rapoport
Cc: linux-kernel@vger.kernel.org, Alexander Viro, Andrew Morton,
	Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Christopher Lameter,
	Dan Williams, Dave Hansen, Elena Reshetova, "H. Peter Anvin",
	Idan Yaniv, Ingo Molnar, James Bottomley, "Kirill A. Shutemov",
	Matthew Wilcox, Mike Rapoport, Michael Kerrisk, Palmer Dabbelt,
	Paul Walmsley, Peter Zijlstra, Thomas Gleixner, Tycho Andersen,
	Will Deacon, linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-nvdimm@lists.01.org,
	linux-riscv@lists.infradead.org, x86@kernel.org
Subject: Re: [PATCH v2 3/7] mm: introduce memfd_secret system call to create
	"secret" memory areas
Message-ID: <20200730162209.GB3128@gaia>
References: <20200727162935.31714-1-rppt@kernel.org>
	<20200727162935.31714-4-rppt@kernel.org>
In-Reply-To: <20200727162935.31714-4-rppt@kernel.org>

Hi Mike,

On Mon, Jul 27, 2020 at 07:29:31PM +0300, Mike Rapoport wrote:
> For instance, the following example will create an uncached mapping (error
> handling is omitted):
>
> 	fd = memfd_secret(SECRETMEM_UNCACHED);
> 	ftruncate(fd, MAP_SIZE);
> 	ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
[...]
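(Brief aside before the comments below: for anyone experimenting with this,
an expanded version of the above with error handling might look like the
sketch that follows. Treat it purely as an illustration: the syscall number
and the SECRETMEM_UNCACHED value are assumptions, since neither is in any
released uapi header at this point, and MAP_SIZE is arbitrary.)

	#include <stdio.h>
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	#ifndef __NR_memfd_secret
	#define __NR_memfd_secret	440	/* illustrative only; use whatever number this series assigns */
	#endif
	#define SECRETMEM_UNCACHED	0x2	/* assumed value, per this series */
	#define MAP_SIZE		(16UL * 4096)

	int main(void)
	{
		void *ptr;
		int fd;

		/* no glibc wrapper exists, so go through syscall(2) */
		fd = syscall(__NR_memfd_secret, SECRETMEM_UNCACHED);
		if (fd < 0) {
			perror("memfd_secret");
			return 1;
		}

		/* size the secret area before it can be mapped */
		if (ftruncate(fd, MAP_SIZE) < 0) {
			perror("ftruncate");
			return 1;
		}

		ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
			   MAP_SHARED, fd, 0);
		if (ptr == MAP_FAILED) {
			perror("mmap");
			return 1;
		}

		/* ... use the secret area ... */

		munmap(ptr, MAP_SIZE);
		close(fd);
		return 0;
	}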
> +static struct page *secretmem_alloc_page(gfp_t gfp)
> +{
> +	/*
> +	 * FIXME: use a cache of large pages to reduce the direct map
> +	 * fragmentation
> +	 */
> +	return alloc_page(gfp);
> +}
> +
> +static vm_fault_t secretmem_fault(struct vm_fault *vmf)
> +{
> +	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
> +	struct inode *inode = file_inode(vmf->vma->vm_file);
> +	pgoff_t offset = vmf->pgoff;
> +	unsigned long addr;
> +	struct page *page;
> +	int ret = 0;
> +
> +	if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
> +		return vmf_error(-EINVAL);
> +
> +	page = find_get_entry(mapping, offset);
> +	if (!page) {
> +		page = secretmem_alloc_page(vmf->gfp_mask);
> +		if (!page)
> +			return vmf_error(-ENOMEM);
> +
> +		ret = add_to_page_cache(page, mapping, offset, vmf->gfp_mask);
> +		if (unlikely(ret))
> +			goto err_put_page;
> +
> +		ret = set_direct_map_invalid_noflush(page);
> +		if (ret)
> +			goto err_del_page_cache;
> +
> +		addr = (unsigned long)page_address(page);
> +		flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> +
> +		__SetPageUptodate(page);
> +
> +		ret = VM_FAULT_LOCKED;
> +	}
> +
> +	vmf->page = page;
> +	return ret;
> +
> +err_del_page_cache:
> +	delete_from_page_cache(page);
> +err_put_page:
> +	put_page(page);
> +	return vmf_error(ret);
> +}
> +
> +static const struct vm_operations_struct secretmem_vm_ops = {
> +	.fault = secretmem_fault,
> +};
> +
> +static int secretmem_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	struct secretmem_ctx *ctx = file->private_data;
> +	unsigned long mode = ctx->mode;
> +	unsigned long len = vma->vm_end - vma->vm_start;
> +
> +	if (!mode)
> +		return -EINVAL;
> +
> +	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
> +		return -EINVAL;
> +
> +	if (mlock_future_check(vma->vm_mm, vma->vm_flags | VM_LOCKED, len))
> +		return -EAGAIN;
> +
> +	switch (mode) {
> +	case SECRETMEM_UNCACHED:
> +		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> +		fallthrough;
> +	case SECRETMEM_EXCLUSIVE:
> +		vma->vm_ops = &secretmem_vm_ops;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	vma->vm_flags |= VM_LOCKED;
> +
> +	return 0;
> +}

I think the uncached mapping is not the right thing for arm/arm64. First
of all, pgprot_noncached() gives us Strongly Ordered (Device memory)
semantics, which also means unaligned accesses are not allowed: an
unaligned load such as *(u32 *)(ptr + 1) from such a mapping would
generate an alignment fault on arm64. I suspect the semantics are
different on x86.

The second, more serious, problem is that I can't find any place where
the caches are flushed for the page mapped on fault. When a page is
allocated, assuming GFP_ZERO, the zeroing goes via the cacheable linear
map, so only the caches are guaranteed to hold the zeros; DRAM may
still contain the old data. Exposing such a page to user space as
uncached would allow the user to read whatever stale data was in DRAM
prior to the zeroing. The arm64 set_direct_map_default_noflush()
doesn't do any cache maintenance.
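To make this concrete, the allocation path would need something along
the lines of the sketch below before a page can be handed out for an
uncached mapping. Note that arch_clean_dcache_range() is a made-up
placeholder: arm64 only has internal primitives for cleaning the
D-cache to the Point of Coherency, so an actual fix would probably
need a new arch hook.

	/*
	 * Sketch only, not a tested patch.  arch_clean_dcache_range() is a
	 * hypothetical helper that cleans (writes back) the D-cache to the
	 * Point of Coherency for [start, end); no such generic interface is
	 * currently exported to core mm code.
	 */
	static struct page *secretmem_alloc_page(gfp_t gfp)
	{
		struct page *page = alloc_page(gfp | __GFP_ZERO);
		unsigned long addr;

		if (!page)
			return NULL;

		addr = (unsigned long)page_address(page);

		/*
		 * The zeroing above went through the cacheable linear map,
		 * so the zeros may still be sitting in the caches.  Push
		 * them out to DRAM before the page is mapped uncached in
		 * user space, otherwise the user can read whatever was in
		 * DRAM before.
		 */
		arch_clean_dcache_range(addr, addr + PAGE_SIZE);

		return page;
	}

-- 
Catalin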