linux-pci.vger.kernel.org archive mirror
* [PATCH v3 12/16] PCI: Obey iomem restrictions for procfs mmap
       [not found] <20201021085655.1192025-1-daniel.vetter@ffwll.ch>
@ 2020-10-21  8:56 ` Daniel Vetter
  2020-10-21 12:50   ` Jason Gunthorpe
  2020-10-21  8:56 ` [PATCH v3 15/16] sysfs: Support zapping of binary attr mmaps Daniel Vetter
  2020-10-21  8:56 ` [PATCH v3 16/16] PCI: Revoke mappings like devmem Daniel Vetter
  2 siblings, 1 reply; 13+ messages in thread
From: Daniel Vetter @ 2020-10-21  8:56 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Bjorn Helgaas, linux-pci,
	Daniel Vetter

There are three ways to access PCI BARs from userspace: /dev/mem, sysfs
files, and the old proc interface. The first two check against
iomem_is_exclusive(); proc never did. With CONFIG_IO_STRICT_DEVMEM
this starts to matter, since we don't want random userspace having
access to PCI BARs while a driver is loaded and using them.

Fix this by adding the same iomem_is_exclusive() check we already have
on the sysfs side in pci_mmap_resource().

References: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.com>
--
v2: Improve commit message (Bjorn)
---
 drivers/pci/proc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index d35186b01d98..3a2f90beb4cb 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -274,6 +274,11 @@ static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma)
 		else
 			return -EINVAL;
 	}
+
+	if (dev->resource[i].flags & IORESOURCE_MEM &&
+	    iomem_is_exclusive(dev->resource[i].start))
+		return -EINVAL;
+
 	ret = pci_mmap_page_range(dev, i, vma,
 				  fpriv->mmap_state, write_combine);
 	if (ret < 0)
-- 
2.28.0
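
The check added in the hunk above can be modeled in plain userspace C. This
is only a sketch of the decision logic, not kernel code: the model_* names,
the flag value, and the lookup table that stands in for iomem_is_exclusive()
are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag value for this model only. */
#define MODEL_IORESOURCE_MEM 0x200UL

struct model_resource {
	unsigned long start;
	unsigned long flags;
	bool exclusive;	/* what the real iomem_is_exclusive() would report */
};

/* Hypothetical stand-in for iomem_is_exclusive(): look the address up. */
static bool model_iomem_is_exclusive(const struct model_resource *tbl,
				     size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].start == addr)
			return tbl[i].exclusive;
	return false;
}

/* Same shape as the new check: only memory resources can be refused. */
static int model_proc_mmap_check(const struct model_resource *tbl,
				 size_t n, size_t i)
{
	if ((tbl[i].flags & MODEL_IORESOURCE_MEM) &&
	    model_iomem_is_exclusive(tbl, n, tbl[i].start))
		return -EINVAL;
	return 0;
}
```

In this model an exclusive memory BAR is refused with -EINVAL, while a
non-exclusive memory BAR or an I/O port resource passes through to the
actual mapping step.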



* [PATCH v3 15/16] sysfs: Support zapping of binary attr mmaps
       [not found] <20201021085655.1192025-1-daniel.vetter@ffwll.ch>
  2020-10-21  8:56 ` [PATCH v3 12/16] PCI: Obey iomem restrictions for procfs mmap Daniel Vetter
@ 2020-10-21  8:56 ` Daniel Vetter
  2020-10-21  8:56 ` [PATCH v3 16/16] PCI: Revoke mappings like devmem Daniel Vetter
  2 siblings, 0 replies; 13+ messages in thread
From: Daniel Vetter @ 2020-10-21  8:56 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Greg Kroah-Hartman, Daniel Vetter,
	Jason Gunthorpe, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Bjorn Helgaas,
	linux-pci, Rafael J. Wysocki, Christian Brauner, David S. Miller,
	Michael Ellerman, Sourabh Jain, Mauro Carvalho Chehab,
	Nayna Jain, Daniel Vetter

We want to be able to revoke pci mmaps so that the same access rules
apply as for /dev/mem. Revoke support for devmem was added in
3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the
region").

The simplest way to achieve this is by having the same filp->f_mapping
for all mappings, so that unmap_mapping_range can find them all, no
matter through which file they've been created. Since this must be set
at open time we need sysfs support for this.

Add an optional mapping parameter to bin_attr, which is only consulted
when there's also an mmap callback, since without mmap support
adjusting ->f_mapping makes no sense.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.com>
---
 fs/sysfs/file.c       | 11 +++++++++++
 include/linux/sysfs.h |  2 ++
 2 files changed, 13 insertions(+)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index eb6897ab78e7..9d8ccdb000e3 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -169,6 +169,16 @@ static int sysfs_kf_bin_mmap(struct kernfs_open_file *of,
 	return battr->mmap(of->file, kobj, battr, vma);
 }
 
+static int sysfs_kf_bin_open(struct kernfs_open_file *of)
+{
+	struct bin_attribute *battr = of->kn->priv;
+
+	if (battr->mapping)
+		of->file->f_mapping = battr->mapping;
+
+	return 0;
+}
+
 void sysfs_notify(struct kobject *kobj, const char *dir, const char *attr)
 {
 	struct kernfs_node *kn = kobj->sd, *tmp;
@@ -240,6 +250,7 @@ static const struct kernfs_ops sysfs_bin_kfops_mmap = {
 	.read		= sysfs_kf_bin_read,
 	.write		= sysfs_kf_bin_write,
 	.mmap		= sysfs_kf_bin_mmap,
+	.open		= sysfs_kf_bin_open,
 };
 
 int sysfs_add_file_mode_ns(struct kernfs_node *parent,
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index 34e84122f635..a17a474d1601 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -164,11 +164,13 @@ __ATTRIBUTE_GROUPS(_name)
 
 struct file;
 struct vm_area_struct;
+struct address_space;
 
 struct bin_attribute {
 	struct attribute	attr;
 	size_t			size;
 	void			*private;
+	struct address_space	*mapping;
 	ssize_t (*read)(struct file *, struct kobject *, struct bin_attribute *,
 			char *, loff_t, size_t);
 	ssize_t (*write)(struct file *, struct kobject *, struct bin_attribute *,
-- 
2.28.0
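
The open-time override in sysfs_kf_bin_open() can be illustrated with a small
userspace model. The struct and function names below are hypothetical
stand-ins that mirror only the fields the patch touches.

```c
#include <assert.h>
#include <stddef.h>

struct model_address_space {
	int id;
};

struct model_bin_attribute {
	/* Optional: NULL means "keep the file's own mapping". */
	struct model_address_space *mapping;
};

struct model_file {
	struct model_address_space *f_mapping;
	const struct model_bin_attribute *battr;
};

/* Mirrors the shape of the new open hook: override only when requested. */
static int model_bin_open(struct model_file *file)
{
	if (file->battr->mapping)
		file->f_mapping = file->battr->mapping;
	return 0;
}
```

Attributes that leave ->mapping unset behave as before; only those that opt
in end up sharing one address_space across all of their open files.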



* [PATCH v3 16/16] PCI: Revoke mappings like devmem
       [not found] <20201021085655.1192025-1-daniel.vetter@ffwll.ch>
  2020-10-21  8:56 ` [PATCH v3 12/16] PCI: Obey iomem restrictions for procfs mmap Daniel Vetter
  2020-10-21  8:56 ` [PATCH v3 15/16] sysfs: Support zapping of binary attr mmaps Daniel Vetter
@ 2020-10-21  8:56 ` Daniel Vetter
  2 siblings, 0 replies; 13+ messages in thread
From: Daniel Vetter @ 2020-10-21  8:56 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Greg Kroah-Hartman,
	Bjorn Helgaas, linux-pci, Daniel Vetter

Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
the region") /dev/mem zaps ptes when the kernel requests exclusive
access to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
the default for all driver uses.

Except there are two more ways to access PCI BARs: sysfs and proc mmap
support. Let's plug that hole.

For revoke_devmem() to work we need to link our vma into the same
address_space, with consistent vma->vm_pgoff. ->vm_pgoff is already
adjusted, because that's how (io_)remap_pfn_range works, but for the
mapping we need to adjust vma->vm_file->f_mapping. The cleanest way is
to adjust this at ->open time:

- for sysfs this is easy, now that binary attributes support this. We
  just set bin_attr->mapping when mmap is supported
- for procfs it's a bit more tricky, since procfs pci access has only
  one file per device, and access to a specific resource first needs
  to be set up with some ioctl calls. But mmap is only supported for
  the same resources as sysfs exposes with mmap support, and otherwise
  rejected, so we can set the mapping unconditionally at open time
  without harm.

A special consideration is for arch_can_pci_mmap_io() - we need to
make sure that the ->f_mapping doesn't alias between ioport and iomem
space. There are only two ways in-tree to support mmap of ioports: generic
pci mmap (ARCH_GENERIC_PCI_MMAP_RESOURCE), and sparc as the single
architecture hand-rolling it. Both approaches support ioport mmap through a
special pfn range and not through magic pte attributes. Aliasing is
therefore not a problem.

The only difference in access checks left is that sysfs PCI mmap does
not check for CAP_SYS_RAWIO. I'm not really sure whether that should be
added or not.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.com>
--
v2:
- Totally new approach: Adjust filp->f_mapping at open time. Note that
  this now works on all architectures, not just those that support
  ARCH_GENERIC_PCI_MMAP_RESOURCE
---
 drivers/pci/pci-sysfs.c | 4 ++++
 drivers/pci/proc.c      | 1 +
 2 files changed, 5 insertions(+)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 6d78df981d41..cee38fcb4a86 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -928,6 +928,7 @@ void pci_create_legacy_files(struct pci_bus *b)
 	b->legacy_io->read = pci_read_legacy_io;
 	b->legacy_io->write = pci_write_legacy_io;
 	b->legacy_io->mmap = pci_mmap_legacy_io;
+	b->legacy_io->mapping = iomem_get_mapping();
 	pci_adjust_legacy_attr(b, pci_mmap_io);
 	error = device_create_bin_file(&b->dev, b->legacy_io);
 	if (error)
@@ -940,6 +941,7 @@ void pci_create_legacy_files(struct pci_bus *b)
 	b->legacy_mem->size = 1024*1024;
 	b->legacy_mem->attr.mode = 0600;
 	b->legacy_mem->mmap = pci_mmap_legacy_mem;
+	b->legacy_mem->mapping = iomem_get_mapping();
 	pci_adjust_legacy_attr(b, pci_mmap_mem);
 	error = device_create_bin_file(&b->dev, b->legacy_mem);
 	if (error)
@@ -1155,6 +1157,8 @@ static int pci_create_attr(struct pci_dev *pdev, int num, int write_combine)
 			res_attr->mmap = pci_mmap_resource_uc;
 		}
 	}
+	if (res_attr->mmap)
+		res_attr->mapping = iomem_get_mapping();
 	res_attr->attr.name = res_attr_name;
 	res_attr->attr.mode = 0600;
 	res_attr->size = pci_resource_len(pdev, num);
diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index 3a2f90beb4cb..9bab07302bbf 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -298,6 +298,7 @@ static int proc_bus_pci_open(struct inode *inode, struct file *file)
 	fpriv->write_combine = 0;
 
 	file->private_data = fpriv;
+	file->f_mapping = iomem_get_mapping();
 
 	return 0;
 }
-- 
2.28.0
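
Why a shared ->f_mapping matters can be shown with a toy model of an
address_space that tracks its vmas. This is a sketch under stated
assumptions: the real kernel uses an interval tree and page tables, and all
model_* names here are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_VMAS 8

struct model_vma {
	unsigned long pgoff;	/* first page offset, equal to the pfn here */
	unsigned long npages;
	bool ptes_present;
};

struct model_address_space {
	struct model_vma *vmas[MODEL_MAX_VMAS];
	size_t nr;
};

/* Link a vma into the mapping so that a later unmap can find it. */
static int model_vma_link(struct model_address_space *m, struct model_vma *v)
{
	if (m->nr == MODEL_MAX_VMAS)
		return -1;
	m->vmas[m->nr++] = v;
	return 0;
}

/* Zap every linked vma overlapping [first, first + count); count the hits. */
static size_t model_unmap_mapping_range(struct model_address_space *m,
					unsigned long first,
					unsigned long count)
{
	size_t zapped = 0;

	for (size_t i = 0; i < m->nr; i++) {
		struct model_vma *v = m->vmas[i];

		if (v->pgoff < first + count && first < v->pgoff + v->npages) {
			if (v->ptes_present)
				zapped++;
			v->ptes_present = false;
		}
	}
	return zapped;
}
```

Because the sysfs and procfs files both point at the one mapping, a single
range unmap reaches vmas created through either interface and leaves
unrelated BARs alone.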



* Re: [PATCH v3 12/16] PCI: Obey iomem restrictions for procfs mmap
  2020-10-21  8:56 ` [PATCH v3 12/16] PCI: Obey iomem restrictions for procfs mmap Daniel Vetter
@ 2020-10-21 12:50   ` Jason Gunthorpe
  2020-10-21 14:42     ` Daniel Vetter
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2020-10-21 12:50 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, Daniel Vetter,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Bjorn Helgaas, linux-pci,
	Daniel Vetter

On Wed, Oct 21, 2020 at 10:56:51AM +0200, Daniel Vetter wrote:
> There are three ways to access PCI BARs from userspace: /dev/mem, sysfs
> files, and the old proc interface. The first two check against
> iomem_is_exclusive(); proc never did. With CONFIG_IO_STRICT_DEVMEM
> this starts to matter, since we don't want random userspace having
> access to PCI BARs while a driver is loaded and using them.
> 
> Fix this by adding the same iomem_is_exclusive() check we already have
> on the sysfs side in pci_mmap_resource().
> 
> References: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: linux-pci@vger.kernel.org
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.com>

Maybe not for fixing in this series, but this access to
IORESOURCE_BUSY doesn't have any locking.

The write side holds the resource_lock at least..

>  	ret = pci_mmap_page_range(dev, i, vma,
>  				  fpriv->mmap_state, write_combine);

At this point the vma isn't linked into the address space, so doesn't
this happen?

     CPU 0                                  CPU1
 mmap_region()
   vma = vm_area_alloc
   proc_bus_pci_mmap
    iomem_is_exclusive
    pci_mmap_page_range
                                            revoke_devmem
                                             unmap_mapping_range()
     // vma is not linked to the address space here,
     // unmap doesn't find it
  vma_link() 
  !!! The VMA gets mapped with the revoked PTEs

I couldn't find anything that prevents it at least, no mmap_sem on the
unmap side, just the i_mmap_lock

Not seeing how address space and pre-populating during mmap work
together? Did I miss locking someplace?

Not something to be fixed for this series, this is clearly an
improvement, but seems like another problem to tackle?

Jason
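
The interleaving in the diagram above can be replayed sequentially in a
userspace model. This is a sketch only: it uses one region and one vma, no
real concurrency, and every model_* name is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_vma {
	bool linked;
	bool ptes_present;
};

struct model_region {
	bool exclusive;
	struct model_vma *vma;	/* the tracked vma, NULL until linked */
};

/* CPU 0, first half: ->mmap checks exclusivity and pre-populates PTEs. */
static int model_mmap(struct model_region *r, struct model_vma *v)
{
	if (r->exclusive)
		return -1;
	v->ptes_present = true;	/* the vma is not linked yet */
	return 0;
}

/* CPU 1: a driver claims the region; revoke can only zap linked vmas. */
static void model_revoke(struct model_region *r)
{
	r->exclusive = true;
	if (r->vma)
		r->vma->ptes_present = false;
}

/* CPU 0, second half: the vma becomes visible to unmap only now. */
static void model_vma_link(struct model_region *r, struct model_vma *v)
{
	r->vma = v;
	v->linked = true;
}

/* True if userspace still has PTEs for a region a driver owns. */
static bool model_stale_mapping(const struct model_region *r,
				const struct model_vma *v)
{
	return r->exclusive && v->linked && v->ptes_present;
}
```

Calling model_mmap(), model_revoke(), model_vma_link() in that order leaves a
stale mapping behind, while linking before the revoke does not.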


* Re: [PATCH v3 12/16] PCI: Obey iomem restrictions for procfs mmap
  2020-10-21 12:50   ` Jason Gunthorpe
@ 2020-10-21 14:42     ` Daniel Vetter
  2020-10-21 15:13       ` Jason Gunthorpe
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Vetter @ 2020-10-21 14:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: DRI Development, LKML, KVM list, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Kees Cook, Dan Williams,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara,
	Bjorn Helgaas, Linux PCI, Daniel Vetter

On Wed, Oct 21, 2020 at 2:50 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Oct 21, 2020 at 10:56:51AM +0200, Daniel Vetter wrote:
> > There are three ways to access PCI BARs from userspace: /dev/mem, sysfs
> > files, and the old proc interface. The first two check against
> > iomem_is_exclusive(); proc never did. With CONFIG_IO_STRICT_DEVMEM
> > this starts to matter, since we don't want random userspace having
> > access to PCI BARs while a driver is loaded and using them.
> >
> > Fix this by adding the same iomem_is_exclusive() check we already have
> > on the sysfs side in pci_mmap_resource().
> >
> > References: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > Cc: linux-pci@vger.kernel.org
> > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.com>
>
> Maybe not for fixing in this series, but this access to
> IORESOURCE_BUSY doesn't have any locking.
>
> The write side holds the resource_lock at least..
>
> >       ret = pci_mmap_page_range(dev, i, vma,
> >                                 fpriv->mmap_state, write_combine);
>
> At this point the vma isn't linked into the address space, so doesn't
> this happen?
>
>      CPU 0                                  CPU1
>  mmap_region()
>    vma = vm_area_alloc
>    proc_bus_pci_mmap
>     iomem_is_exclusive
>     pci_mmap_page_range
>                                             revoke_devmem
>                                              unmap_mapping_range()
>      // vma is not linked to the address space here,
>      // unmap doesn't find it
>   vma_link()
>   !!! The VMA gets mapped with the revoked PTEs
>
> I couldn't find anything that prevents it at least, no mmap_sem on the
> unmap side, just the i_mmap_lock
>
> Not seeing how address space and pre-populating during mmap work
> together? Did I miss locking someplace?
>
> Not something to be fixed for this series, this is clearly an
> improvement, but seems like another problem to tackle?

Uh yes. In drivers/gpu this isn't a problem because we only install
ptes from the vm_ops->fault handler. So no races. And I don't think
you can fix this otherwise through holding locks: mmap_sem we can't
hold because before vma_link we don't even know which mm_struct is
involved, so can't solve the race. Plus this would be worse than
mm_take_all_locks used by mmu notifier. And the address_space
i_mmap_lock is also no good since it's not held during the ->mmap
callback, when we write the ptes. And the resource lock is even less
useful, since we're not going to hold that at vma_link() time for
sure.

Hence delaying the pte writes after the vma_link, which means ->fault
time, looks like the only way to close this gap.

Trouble is I have no idea how to do this cleanly ...
-Daniel



--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: [PATCH v3 12/16] PCI: Obey iomem restrictions for procfs mmap
  2020-10-21 14:42     ` Daniel Vetter
@ 2020-10-21 15:13       ` Jason Gunthorpe
  2020-10-21 15:54         ` Daniel Vetter
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2020-10-21 15:13 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, KVM list, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Kees Cook, Dan Williams,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara,
	Bjorn Helgaas, Linux PCI, Daniel Vetter

On Wed, Oct 21, 2020 at 04:42:11PM +0200, Daniel Vetter wrote:

> Uh yes. In drivers/gpu this isn't a problem because we only install
> ptes from the vm_ops->fault handler. So no races. And I don't think
> you can fix this otherwise through holding locks: mmap_sem we can't
> hold because before vma_link we don't even know which mm_struct is
> involved, so can't solve the race. Plus this would be worse than
> mm_take_all_locks used by mmu notifier. And the address_space
> i_mmap_lock is also no good since it's not held during the ->mmap
> callback, when we write the ptes. And the resource lock is even less
> useful, since we're not going to hold that at vma_link() time for
> sure.
> 
> Hence delaying the pte writes after the vma_link, which means ->fault
> time, looks like the only way to close this gap.

> Trouble is I have no idea how to do this cleanly ...

How about adding a vm_ops callback 'install_pages'/'prefault_pages'?

Call it after vma_link() - basically just move the remap_pfn_range(), under
some other lock, into there.

Jason


* Re: [PATCH v3 12/16] PCI: Obey iomem restrictions for procfs mmap
  2020-10-21 15:13       ` Jason Gunthorpe
@ 2020-10-21 15:54         ` Daniel Vetter
  2020-10-21 16:37           ` Jason Gunthorpe
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Vetter @ 2020-10-21 15:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: DRI Development, LKML, KVM list, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Kees Cook, Dan Williams,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara,
	Bjorn Helgaas, Linux PCI, Daniel Vetter

On Wed, Oct 21, 2020 at 5:13 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Oct 21, 2020 at 04:42:11PM +0200, Daniel Vetter wrote:
>
> > Uh yes. In drivers/gpu this isn't a problem because we only install
> > ptes from the vm_ops->fault handler. So no races. And I don't think
> > you can fix this otherwise through holding locks: mmap_sem we can't
> > hold because before vma_link we don't even know which mm_struct is
> > involved, so can't solve the race. Plus this would be worse than
> > mm_take_all_locks used by mmu notifier. And the address_space
> > i_mmap_lock is also no good since it's not held during the ->mmap
> > callback, when we write the ptes. And the resource lock is even less
> > useful, since we're not going to hold that at vma_link() time for
> > sure.
> >
> > Hence delaying the pte writes after the vma_link, which means ->fault
> > time, looks like the only way to close this gap.
>
> > Trouble is I have no idea how to do this cleanly ...
>
> How about adding a vm_ops callback 'install_pages'/'prefault_pages'?
>
> Call it after vma_link() - basically just move the remap_pfn_range(), under
> some other lock, into there.

Yeah, I think that would be useful. This might also be useful for
something entirely different: For legacy fbdev emulation on top of drm
kernel modesetting drivers we need to track dirty pages of VM_IO
mmaps. Right now that's a gross hack, and essentially we just pay the
price for entirely separate storage and an additional memcpy when this
is needed to emulate fbdev mmap on top of drm. But if we had an
install_ptes callback or similar we could just wrap the native vm_ops
and add a mkwrite callback on top for that dirty tracking. For that
the hook would need to be after vma_set_page_prot() so that we
write-protect the ptes by default, since that's where we compute
vma_wants_writenotify(). That's also after vma_link, so one hook for
two use-cases.

The trouble is that io_remap_pfn_range adjusts vma->vm_pgoff, so we'd need to
split that. So ideally ->mmap would never set up any ptes.

I guess one option would be if remap_pfn_range would steal the
vma->vm_ops pointer for itself, then it could set up the correct
->install_ptes hook. But there's tons of callers for that, so not sure
that's a bright idea.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: [PATCH v3 12/16] PCI: Obey iomem restrictions for procfs mmap
  2020-10-21 15:54         ` Daniel Vetter
@ 2020-10-21 16:37           ` Jason Gunthorpe
  2020-10-21 19:24             ` Daniel Vetter
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2020-10-21 16:37 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, KVM list, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Kees Cook, Dan Williams,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara,
	Bjorn Helgaas, Linux PCI, Daniel Vetter

On Wed, Oct 21, 2020 at 05:54:54PM +0200, Daniel Vetter wrote:

> The trouble is that io_remap_pfn_range adjusts vma->vm_pgoff, so we'd need to
> split that. So ideally ->mmap would never set up any ptes.

/dev/mem makes pgoff == pfn so it doesn't get changed by remap.

pgoff doesn't get touched for MAP_SHARED either, so there are other
users that could work like this - eg anyone mmaping IO memory is
probably OK.

> I guess one option would be if remap_pfn_range would steal the
> vma->vm_ops pointer for itself, then it could set up the correct
> ->install_ptes hook. But there's tons of callers for that, so not sure
> that's a bright idea.

The caller has to check that the mapping is still live, and I think
hold a lock across the remap? Auto-deferring doesn't seem feasible.

Jason


* Re: [PATCH v3 12/16] PCI: Obey iomem restrictions for procfs mmap
  2020-10-21 16:37           ` Jason Gunthorpe
@ 2020-10-21 19:24             ` Daniel Vetter
  2020-10-21 23:20               ` Jason Gunthorpe
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Vetter @ 2020-10-21 19:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: DRI Development, LKML, KVM list, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Kees Cook, Dan Williams,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara,
	Bjorn Helgaas, Linux PCI, Daniel Vetter

On Wed, Oct 21, 2020 at 6:37 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Oct 21, 2020 at 05:54:54PM +0200, Daniel Vetter wrote:
>
> > The trouble is that io_remap_pfn_range adjusts vma->vm_pgoff, so we'd need to
> > split that. So ideally ->mmap would never set up any ptes.
>
> /dev/mem makes pgoff == pfn so it doesn't get changed by remap.
>
> pgoff doesn't get touched for MAP_SHARED either, so there are other
> users that could work like this - eg anyone mmaping IO memory is
> probably OK.

I was thinking more generally of io_remap_pfn_range users because of the
mkwrite use-case we might have in fbdev emulation in drm.

> > I guess one option would be if remap_pfn_range would steal the
> > vma->vm_ops pointer for itself, then it could set up the correct
> > ->install_ptes hook. But there's tons of callers for that, so not sure
> > that's a bright idea.
>
> The caller has to check that the mapping is still live, and I think
> hold a lock across the remap? Auto-deferring doesn't seem feasible.

Right, auto-deferring reopens the race, so making this work
automatically is a bit much. I guess just splitting this into a
setup/install part and then doing the install of all the ptes at first
fault should be good enough. We don't really need a new install_pages
for that, just an io_remap_pfn_range that's split in two parts.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
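
The setup/install split suggested above might look roughly like this in a
userspace model. It is a sketch, not a proposed kernel API: the function
names are hypothetical, and a real version would still need the liveness
check and the pte install serialized against revoke by some lock, as Jason
notes.

```c
#include <assert.h>
#include <stdbool.h>

struct model_region {
	bool exclusive;
};

struct model_vma {
	unsigned long pfn;	/* recorded by the setup half */
	unsigned long npages;
	bool linked;
	bool ptes_present;
};

/* Setup half, called from ->mmap: remember the target, write no ptes. */
static void model_remap_setup(struct model_vma *v, unsigned long pfn,
			      unsigned long npages)
{
	v->pfn = pfn;
	v->npages = npages;
}

static void model_vma_link(struct model_vma *v)
{
	v->linked = true;
}

/* Revoke marks the region exclusive and zaps the vma if it is linked. */
static void model_revoke(struct model_region *r, struct model_vma *v)
{
	r->exclusive = true;
	if (v->linked)
		v->ptes_present = false;
}

/* Install half, called at first fault, i.e. only after the vma is linked. */
static int model_remap_install(const struct model_region *r,
			       struct model_vma *v)
{
	if (!v->linked || r->exclusive)
		return -1;	/* the mapping is no longer live */
	v->ptes_present = true;
	return 0;
}
```

With ptes written only from the install half, a revoke that lands between
->mmap and the link makes the later fault fail instead of leaving stale
ptes, and a revoke after the install finds the linked vma and zaps it.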


* Re: [PATCH v3 12/16] PCI: Obey iomem restrictions for procfs mmap
  2020-10-21 19:24             ` Daniel Vetter
@ 2020-10-21 23:20               ` Jason Gunthorpe
  2020-10-22  7:00                 ` Daniel Vetter
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2020-10-21 23:20 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, KVM list, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Kees Cook, Dan Williams,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara,
	Bjorn Helgaas, Linux PCI, Daniel Vetter

On Wed, Oct 21, 2020 at 09:24:08PM +0200, Daniel Vetter wrote:
> On Wed, Oct 21, 2020 at 6:37 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Wed, Oct 21, 2020 at 05:54:54PM +0200, Daniel Vetter wrote:
> >
> > > The trouble is that io_remap_pfn_range adjusts vma->vm_pgoff, so we'd need to
> > > split that. So ideally ->mmap would never set up any ptes.
> >
> > /dev/mem makes pgoff == pfn so it doesn't get changed by remap.
> >
> > pgoff doesn't get touched for MAP_SHARED either, so there are other
> > users that could work like this - eg anyone mmaping IO memory is
> > probably OK.
> 
> I was thinking more generally of io_remap_pfn_range users because of the
> mkwrite use-case we might have in fbdev emulation in drm.

You have a use case for MAP_PRIVATE and io_remap_pfn_range()??

Jason


* Re: [PATCH v3 12/16] PCI: Obey iomem restrictions for procfs mmap
  2020-10-21 23:20               ` Jason Gunthorpe
@ 2020-10-22  7:00                 ` Daniel Vetter
  2020-10-22 11:43                   ` Jason Gunthorpe
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Vetter @ 2020-10-22  7:00 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: DRI Development, LKML, KVM list, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Kees Cook, Dan Williams,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara,
	Bjorn Helgaas, Linux PCI, Daniel Vetter

On Thu, Oct 22, 2020 at 1:20 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Oct 21, 2020 at 09:24:08PM +0200, Daniel Vetter wrote:
> > On Wed, Oct 21, 2020 at 6:37 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Wed, Oct 21, 2020 at 05:54:54PM +0200, Daniel Vetter wrote:
> > >
> > > > The trouble is that io_remap_pfn_range adjusts vma->vm_pgoff, so we'd need to
> > > > split that. So ideally ->mmap would never set up any ptes.
> > >
> > > /dev/mem makes pgoff == pfn so it doesn't get changed by remap.
> > >
> > > pgoff doesn't get touched for MAP_SHARED either, so there are other
> > > users that could work like this - eg anyone mmaping IO memory is
> > > probably OK.
> >
> > I was thinking more generally of io_remap_pfn_range users because of the
> > mkwrite use-case we might have in fbdev emulation in drm.
>
> You have a use case for MAP_PRIVATE and io_remap_pfn_range()??

Uh no :-) But we do have one for io remaps where we keep track of which
pages userspace has touched. The problem is that there are many displays
where you need to explicitly upload the data, and in drm we have ioctl
calls for that.
fbdev mmap assumes this just magically happens. So you need to keep
track of write faults, launch a delayed worker which first re-protects
all ptes and then uploads the dirty pages. And ideally we wouldn't
have to implement this everywhere just for fbdev, but could wrap it
around an existing mmap implementation by just intercepting mkwrite.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
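
The dirty-tracking scheme described above (catch write faults, then have a
delayed worker re-protect and upload) can be modeled in userspace. This is a
sketch only: the page count, the model_* names, and the direct function
calls standing in for faults and the worker are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NPAGES 4

struct model_fb {
	bool writable[MODEL_NPAGES];	/* false means write-protected */
	bool dirty[MODEL_NPAGES];
};

/* Plays the role of a mkwrite hook: note the page, then allow writes. */
static void model_mkwrite(struct model_fb *fb, size_t page)
{
	fb->dirty[page] = true;
	fb->writable[page] = true;
}

/* A userspace store: it faults only while the page is write-protected. */
static void model_store(struct model_fb *fb, size_t page)
{
	if (!fb->writable[page])
		model_mkwrite(fb, page);
}

/* Delayed worker: re-protect everything first, then upload dirty pages. */
static size_t model_flush_worker(struct model_fb *fb)
{
	size_t uploaded = 0;

	for (size_t i = 0; i < MODEL_NPAGES; i++)
		fb->writable[i] = false;
	for (size_t i = 0; i < MODEL_NPAGES; i++) {
		if (fb->dirty[i]) {
			fb->dirty[i] = false;
			uploaded++;
		}
	}
	return uploaded;
}
```

Re-protecting before the upload means a store that arrives during the upload
faults again and is picked up by the next worker run.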


* Re: [PATCH v3 12/16] PCI: Obey iomem restrictions for procfs mmap
  2020-10-22  7:00                 ` Daniel Vetter
@ 2020-10-22 11:43                   ` Jason Gunthorpe
  2020-10-22 13:04                     ` Daniel Vetter
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2020-10-22 11:43 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, KVM list, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Kees Cook, Dan Williams,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara,
	Bjorn Helgaas, Linux PCI, Daniel Vetter

On Thu, Oct 22, 2020 at 09:00:44AM +0200, Daniel Vetter wrote:
> On Thu, Oct 22, 2020 at 1:20 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Wed, Oct 21, 2020 at 09:24:08PM +0200, Daniel Vetter wrote:
> > > On Wed, Oct 21, 2020 at 6:37 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > >
> > > > On Wed, Oct 21, 2020 at 05:54:54PM +0200, Daniel Vetter wrote:
> > > >
> > > > > The trouble is that io_remap_pfn_range adjusts vma->vm_pgoff, so we'd need to
> > > > > split that. So ideally ->mmap would never set up any ptes.
> > > >
> > > > /dev/mem makes pgoff == pfn so it doesn't get changed by remap.
> > > >
> > > > pgoff doesn't get touched for MAP_SHARED either, so there are other
> > > > users that could work like this - eg anyone mmaping IO memory is
> > > > probably OK.
> > >
> > > I was thinking more generally of io_remap_pfn_range users because of the
> > > mkwrite use-case we might have in fbdev emulation in drm.
> >
> > You have a use case for MAP_PRIVATE and io_remap_pfn_range()??
> 
> Uh no :-)

So it is fine, the pgoff mangling only happens for MAP_PRIVATE

Jason
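
The point that pgoff is only rewritten for private mappings can be sketched
as follows. This models my understanding of the copy-on-write test in
remap_pfn_range(); the flag values and model_* names are assumptions made
for this example, not taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values for this model only. */
#define MODEL_VM_SHARED   0x08UL
#define MODEL_VM_MAYWRITE 0x20UL

struct model_vma {
	unsigned long vm_flags;
	unsigned long vm_pgoff;
};

/* Private (not shared) and potentially writable means copy-on-write. */
static bool model_is_cow_mapping(unsigned long flags)
{
	return (flags & (MODEL_VM_SHARED | MODEL_VM_MAYWRITE)) ==
	       MODEL_VM_MAYWRITE;
}

/* Only a COW mapping has its pgoff overwritten with the pfn. */
static void model_remap_pfn_range(struct model_vma *v, unsigned long pfn)
{
	if (model_is_cow_mapping(v->vm_flags))
		v->vm_pgoff = pfn;
}
```

A MAP_SHARED vma keeps whatever pgoff the driver set up, which is why
shared mappings of IO memory stay consistent for unmap_mapping_range().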


* Re: [PATCH v3 12/16] PCI: Obey iomem restrictions for procfs mmap
  2020-10-22 11:43                   ` Jason Gunthorpe
@ 2020-10-22 13:04                     ` Daniel Vetter
  0 siblings, 0 replies; 13+ messages in thread
From: Daniel Vetter @ 2020-10-22 13:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: DRI Development, LKML, KVM list, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Kees Cook, Dan Williams,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara,
	Bjorn Helgaas, Linux PCI, Daniel Vetter

On Thu, Oct 22, 2020 at 1:43 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Thu, Oct 22, 2020 at 09:00:44AM +0200, Daniel Vetter wrote:
> > On Thu, Oct 22, 2020 at 1:20 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Wed, Oct 21, 2020 at 09:24:08PM +0200, Daniel Vetter wrote:
> > > > On Wed, Oct 21, 2020 at 6:37 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > > >
> > > > > On Wed, Oct 21, 2020 at 05:54:54PM +0200, Daniel Vetter wrote:
> > > > >
> > > > > > The trouble is that io_remap_pfn_range adjusts vma->vm_pgoff, so we'd need to
> > > > > > split that. So ideally ->mmap would never set up any ptes.
> > > > >
> > > > > /dev/mem makes pgoff == pfn so it doesn't get changed by remap.
> > > > >
> > > > > pgoff doesn't get touched for MAP_SHARED either, so there are other
> > > > > users that could work like this - eg anyone mmaping IO memory is
> > > > > probably OK.
> > > >
> > > > I was thinking more generally of io_remap_pfn_range users because of the
> > > > mkwrite use-case we might have in fbdev emulation in drm.
> > >
> > > You have a use case for MAP_PRIVATE and io_remap_pfn_range()??
> >
> > Uh no :-)
>
> So it is fine, the pgoff mangling only happens for MAP_PRIVATE

Ah right I got confused, thanks for clarifying.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
