Subject: Re: [PATCH 07/13] mm: close race in generic_access_phys
To: Daniel Vetter, DRI Development, LKML
CC: Jason Gunthorpe, Dan Williams, Kees Cook, Rik van Riel, Benjamin Herrenschmidt, Dave Airlie, Hugh Dickins, Andrew Morton, Jérôme Glisse, Jan Kara, Daniel Vetter
From: John Hubbard
Message-ID: <852a74ec-339b-4c7f-9e29-b9736111849a@nvidia.com>
Date: Wed, 7 Oct 2020 17:44:46 -0700
In-Reply-To: <20201007164426.1812530-8-daniel.vetter@ffwll.ch>
References: <20201007164426.1812530-1-daniel.vetter@ffwll.ch> <20201007164426.1812530-8-daniel.vetter@ffwll.ch>
On 10/7/20 9:44 AM, Daniel Vetter wrote:
> Way back it was a reasonable assumptions that iomem mappings never
> change the pfn range they point at. But this has changed:
>
> - gpu drivers dynamically manage their memory nowadays, invalidating
>   ptes with unmap_mapping_range when buffers get moved
>
> - contiguous dma allocations have moved from dedicated carvetouts to

s/carvetouts/carveouts/

>   cma regions. This means if we miss the unmap the pfn might contain
>   pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
>
> - even /dev/mem now invalidates mappings when the kernel requests that
>   iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
>   ("/dev/mem: Revoke mappings when a driver claims the region")

Thanks for putting these references into the log, it's very helpful.

...

> diff --git a/mm/memory.c b/mm/memory.c
> index fcfc4ca36eba..8d467e23b44e 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4873,28 +4873,68 @@ int follow_phys(struct vm_area_struct *vma,
>  	return ret;
>  }
>
> +/**
> + * generic_access_phys - generic implementation for iomem mmap access
> + * @vma: the vma to access
> + * @addr: userspace addres, not relative offset within @vma
> + * @buf: buffer to read/write
> + * @len: length of transfer
> + * @write: set to FOLL_WRITE when writing, otherwise reading
> + *
> + * This is a generic implementation for &vm_operations_struct.access for an
> + * iomem mapping. This callback is used by access_process_vm() when the @vma is
> + * not page based.
> + */
>  int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
>  		void *buf, int len, int write)
>  {
>  	resource_size_t phys_addr;
>  	unsigned long prot = 0;
>  	void __iomem *maddr;
> +	pte_t *ptep, pte;
> +	spinlock_t *ptl;
>  	int offset = addr & (PAGE_SIZE-1);
> +	int ret = -EINVAL;
> +
> +	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> +		return -EINVAL;
> +
> +retry:
> +	if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
> +		return -EINVAL;
> +	pte = *ptep;
> +	pte_unmap_unlock(ptep, ptl);
>
> -	if (follow_phys(vma, addr, write, &prot, &phys_addr))
> +	prot = pgprot_val(pte_pgprot(pte));
> +	phys_addr = (resource_size_t)pte_pfn(pte) << PAGE_SHIFT;
> +
> +	if ((write & FOLL_WRITE) && !pte_write(pte))
>  		return -EINVAL;
>
>  	maddr = ioremap_prot(phys_addr, PAGE_ALIGN(len + offset), prot);
>  	if (!maddr)
>  		return -ENOMEM;
>
> +	if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
> +		goto out_unmap;
> +
> +	if (pte_same(pte, *ptep)) {

The ioremap area is something I'm sorta new to, so a newbie question: is
it possible for the same pte to already be there, ever? If so, we'd be
stuck in an infinite loop here. I'm sure that's not the case, but it's
not yet obvious to me why it's impossible. Resource reservations maybe?
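To make sure I'm reading the control flow right, here is the
snapshot/recheck/retry structure boiled down to a standalone toy model in
plain C. Everything in it (fake_pte, read_pte_locked(), the one-shot
invalidation helper) is invented purely for illustration and is obviously
not the kernel code; it just models my reading that a retry only happens
when the two follow_pte() lookups disagree:

/*
 * Toy model of the snapshot/recheck/retry flow in the new
 * generic_access_phys(), with all the kernel machinery stripped out.
 * fake_pte stands in for *ptep, read_pte_locked() for
 * follow_pte() + pte_unmap_unlock(), and maybe_concurrent_invalidate()
 * simulates a driver zapping and re-inserting the mapping once.
 */
#include <stdbool.h>
#include <stdio.h>

static unsigned long fake_pte = 0x1000;	/* stand-in for *ptep */
static bool zapped_once;		/* one simulated concurrent unmap */

static unsigned long read_pte_locked(void)
{
	return fake_pte;		/* "snapshot taken under the ptl" */
}

static void maybe_concurrent_invalidate(void)
{
	if (!zapped_once) {
		fake_pte = 0x2000;	/* pfn changed underneath us */
		zapped_once = true;
	}
}

static int toy_access_phys(void)
{
	unsigned long snap;

retry:
	snap = read_pte_locked();	/* first follow_pte(): take a snapshot */

	maybe_concurrent_invalidate();	/* window where ioremap_prot() runs */

	/* second follow_pte(): recheck before touching the mapping */
	if (snap != read_pte_locked()) {
		printf("pte changed, retrying\n");
		goto retry;
	}

	printf("pte stable (0x%lx), access goes ahead\n", snap);
	return 0;
}

int main(void)
{
	return toy_access_phys();
}

In that toy the loop only spins while something keeps changing the pte
between the two lookups, so my question above really reduces to whether
anything can legitimately do that forever, or hand back a pte_same() pte
that nevertheless points at a torn-down mapping.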
thanks,
--
John Hubbard
NVIDIA