Date: Tue, 11 Feb 2020 12:50:37 -0400
From: Jason Gunthorpe
To: Joao Martins
Cc: linux-nvdimm@lists.01.org, Dan Williams, Vishal Verma, Dave Jiang,
    Ira Weiny, Alex Williamson, Cornelia Huck, kvm@vger.kernel.org,
    Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Thomas Gleixner, Ingo Molnar, Borislav Petkov, "H .
    Peter Anvin", x86@kernel.org, Liran Alon, Nikita Leshenko,
    Barret Rhoden, Boris Ostrovsky, Matthew Wilcox, Konrad Rzeszutek Wilk
Subject: Re: [PATCH RFC 09/10] vfio/type1: Use follow_pfn for VM_FPNMAP VMAs
Message-ID: <20200211165037.GA22564@ziepe.ca>
References: <20200110190313.17144-1-joao.m.martins@oracle.com>
    <20200110190313.17144-10-joao.m.martins@oracle.com>
    <20200207210831.GA31015@ziepe.ca>
    <98351044-a710-1d52-f030-022eec89d1d5@oracle.com>
In-Reply-To: <98351044-a710-1d52-f030-022eec89d1d5@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Feb 11, 2020 at 04:23:49PM +0000, Joao Martins wrote:
> On 2/7/20 9:08 PM, Jason Gunthorpe wrote:
> > On Fri, Jan 10, 2020 at 07:03:12PM +0000, Joao Martins wrote:
> >> From: Nikita Leshenko
> >>
> >> Unconditionally interpreting vm_pgoff as a PFN is incorrect.
> >>
> >> VMAs created by /dev/mem do this, but in general VM_PFNMAP just means
> >> that the VMA doesn't have an associated struct page and is being managed
> >> directly by something other than the core mmu.
> >>
> >> Use follow_pfn like KVM does to find the PFN.
> >>
> >> Signed-off-by: Nikita Leshenko
> >> drivers/vfio/vfio_iommu_type1.c | 6 +++---
> >> 1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >> index 2ada8e6cdb88..1e43581f95ea 100644
> >> +++ b/drivers/vfio/vfio_iommu_type1.c
> >> @@ -362,9 +362,9 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
> >>          vma = find_vma_intersection(mm, vaddr, vaddr + 1);
> >>
> >>          if (vma && vma->vm_flags & VM_PFNMAP) {
> >> -                *pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> >> -                if (is_invalid_reserved_pfn(*pfn))
> >> -                        ret = 0;
> >> +                ret = follow_pfn(vma, vaddr, pfn);
> >> +                if (!ret && !is_invalid_reserved_pfn(*pfn))
> >> +                        ret = -EOPNOTSUPP;
> >>          }
> >
> > FWIW this existing code is a huge hack and a security problem.
> >
> > I'm not sure how you could be successfully using this path on actual
> > memory without hitting bad bugs?
> >
> ATM I think this codepath is largelly hit at the moment for MMIO (GPU
> passthrough, or mdev). In the context of this patch, guest memory would be
> treated similarly meaning the device-dax backing memory wouldn't have a 'struct
> page' (as introduced in this series).

I think it is being used specifically to allow two VFIOs to be
inserted into a VM and have the IOMMU set up to allow MMIO access.

> > Fudamentally VFIO can't retain a reference to a page from within a VMA
> > without some kind of recount/locking/etc to allow the thing that put
> > the page there to know it is still being used (ie programmed in a
> > IOMMU) by VFIO.
> >
> > Otherwise it creates use-after-free style security problems on the
> > page.
>
> I take it you're referring to the past problems with long term page pinning +
> fsdax? Or you had something else in mind, perhaps related to your LSFMM topic?

No. I'm referring to retaining access to memory backed by a VMA
without holding any kind of locking on it. This is an access after
free scenario.
It *should* be like a long term page pin so that the VMA owner knows
something is happening.

> Here the memory can't be used by the kernel (and there's no struct page) except
> from device-dax managing/tearing/driving the pfn region (which is static and the
> underlying PFNs won't change throughout device lifetime), and vfio
> pinning/unpinning the pfns (which are refcounted against multiple map/unmaps);

For instance if you tear down the device-dax then VFIO will happily
continue to reference the memory. This is a bug. There are other cases
that escalate to security bugs.

> > This code needs to be deleted, not extended :(
>
> To some extent it isn't really an extension: the patch was just removing the
> assumption @vm_pgoff being the 'start pfn' on PFNMAP vmas. This is also
> similarly done by get_vaddr_frames().

You are extending it in the sense that you plan to use it for more
cases than VMAs created by some other VFIO. That should not be done as
it will only complicate fixing this code.

KVM is allowed to use follow_pfn because it uses MMU notifiers and
does not allow the result of follow_pfn to outlive the VMA (AFAIK at
least). So it should be safe.

Jason
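
Illustrative note: the difference between a follow_pfn user that is told
about teardown (the KVM-style case described above) and one that is not
(the VFIO path under discussion) can be sketched with a small userspace
toy model. Nothing below is kernel code. Every name (toy_vma, toy_user,
toy_follow_pfn, toy_register_notifier, toy_teardown) is hypothetical and
exists only for this sketch, and the single-callback "notifier" is an
assumption that greatly simplifies what real MMU notifiers do.

```c
#include <assert.h>   /* used by the usage snippet in the note below */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: a user holding a cached PFN translation. */
struct toy_user {
	unsigned long cached_pfn;
	bool valid;               /* user believes cached_pfn is still usable */
};

/* Hypothetical: a mapping with one PFN and at most one registered notifier. */
struct toy_vma {
	unsigned long pfn;
	bool mapped;
	struct toy_user *notifier; /* NULL if nobody asked to be told */
};

/* Lookup loosely analogous to follow_pfn(): fails once nothing is mapped. */
static inline int toy_follow_pfn(const struct toy_vma *vma, unsigned long *pfn)
{
	if (!vma->mapped)
		return -1;
	*pfn = vma->pfn;
	return 0;
}

/* Cache the translation, as a user programming an IOMMU or MMU would. */
static inline int toy_user_map(struct toy_user *user, const struct toy_vma *vma)
{
	unsigned long pfn;

	if (toy_follow_pfn(vma, &pfn) != 0)
		return -1;
	user->cached_pfn = pfn;
	user->valid = true;
	return 0;
}

/* Ask to be told before the mapping goes away. */
static inline void toy_register_notifier(struct toy_vma *vma, struct toy_user *user)
{
	vma->notifier = user;
}

/* Owner tears the mapping down; only a registered user is invalidated. */
static inline void toy_teardown(struct toy_vma *vma)
{
	if (vma->notifier != NULL)
		vma->notifier->valid = false;
	vma->mapped = false;
}
```

In this model, a user that called toy_register_notifier() has valid
cleared by toy_teardown(), while a user that only called toy_user_map()
keeps valid set and a stale cached_pfn after teardown, which corresponds
to the access-after-free pattern described in the message.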