There's three ways to access PCI BARs from userspace: /dev/mem, sysfs files, and the old proc interface. Two check against iomem_is_exclusive, proc never did. And with CONFIG_IO_STRICT_DEVMEM, this starts to matter, since we don't want random userspace having access to PCI BARs while a driver is loaded and using it. Fix this by adding the same iomem_is_exclusive() check we already have on the sysfs side in pci_mmap_resource(). Acked-by: Bjorn Helgaas <bhelgaas@google.com> References: 90a545e98126 ("restrict /dev/mem to idle io memory ranges") Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kees Cook <keescook@chromium.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> -- v2: Improve commit message (Bjorn) --- drivers/pci/proc.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c index d35186b01d98..3a2f90beb4cb 100644 --- a/drivers/pci/proc.c +++ b/drivers/pci/proc.c @@ -274,6 +274,11 @@ static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma) else return -EINVAL; } + + if (dev->resource[i].flags & IORESOURCE_MEM && + iomem_is_exclusive(dev->resource[i].start)) + return -EINVAL; + ret = pci_mmap_page_range(dev, i, vma, fpriv->mmap_state, write_combine); if (ret < 0) -- 2.29.2
We want to be able to revoke pci mmaps so that the same access rules applies as for /dev/kmem. Revoke support for devmem was added in 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the region"). The simplest way to achieve this is by having the same filp->f_mapping for all mappings, so that unmap_mapping_range can find them all, no matter through which file they've been created. Since this must be set at open time we need sysfs support for this. Add an optional mapping parameter bin_attr, which is only consulted when there's also an mmap callback, since without mmap support allowing to adjust the ->f_mapping makes no sense. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kees Cook <keescook@chromium.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Sourabh Jain <sourabhjain@linux.ibm.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Nayna Jain <nayna@linux.ibm.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> --- fs/sysfs/file.c | 11 +++++++++++ include/linux/sysfs.h | 2 ++ 2 files changed, 13 insertions(+) diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c index 96d0da65e088..9aefa7779b29 100644 --- a/fs/sysfs/file.c +++ b/fs/sysfs/file.c @@ -170,6 +170,16 @@ static int sysfs_kf_bin_mmap(struct kernfs_open_file *of, return battr->mmap(of->file, kobj, battr, vma); } +static int sysfs_kf_bin_open(struct kernfs_open_file *of) +{ + struct bin_attribute *battr = of->kn->priv; + + if (battr->mapping) + of->file->f_mapping = battr->mapping; + + return 0; +} + void sysfs_notify(struct kobject *kobj, const char *dir, const char *attr) { struct kernfs_node *kn = kobj->sd, *tmp; @@ -241,6 +251,7 @@ static const struct kernfs_ops sysfs_bin_kfops_mmap = { .read = sysfs_kf_bin_read, .write = sysfs_kf_bin_write, .mmap = sysfs_kf_bin_mmap, + .open = sysfs_kf_bin_open, }; int sysfs_add_file_mode_ns(struct kernfs_node *parent, diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h index 2caa34c1ca1a..d76a1ddf83a3 100644 --- a/include/linux/sysfs.h +++ b/include/linux/sysfs.h @@ -164,11 +164,13 @@ __ATTRIBUTE_GROUPS(_name) struct file; struct vm_area_struct; +struct address_space; struct bin_attribute { struct attribute attr; size_t size; void *private; + struct address_space *mapping; ssize_t (*read)(struct file *, struct kobject *, struct bin_attribute *, char *, loff_t, size_t); ssize_t (*write)(struct file *, struct kobject *, struct bin_attribute *, -- 2.29.2
Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the region") /dev/kmem zaps ptes when the kernel requests exclusive acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is the default for all driver uses. Except there's two more ways to access PCI BARs: sysfs and proc mmap support. Let's plug that hole. For revoke_devmem() to work we need to link our vma into the same address_space, with consistent vma->vm_pgoff. ->pgoff is already adjusted, because that's how (io_)remap_pfn_range works, but for the mapping we need to adjust vma->vm_file->f_mapping. The cleanest way is to adjust this at at ->open time: - for sysfs this is easy, now that binary attributes support this. We just set bin_attr->mapping when mmap is supported - for procfs it's a bit more tricky, since procfs pci access has only one file per device, and access to a specific resources first needs to be set up with some ioctl calls. But mmap is only supported for the same resources as sysfs exposes with mmap support, and otherwise rejected, so we can set the mapping unconditionally at open time without harm. A special consideration is for arch_can_pci_mmap_io() - we need to make sure that the ->f_mapping doesn't alias between ioport and iomem space. There's only 2 ways in-tree to support mmap of ioports: generic pci mmap (ARCH_GENERIC_PCI_MMAP_RESOURCE), and sparc as the single architecture hand-rolling. Both approach support ioport mmap through a special pfn range and not through magic pte attributes. Aliasing is therefore not a problem. The only difference in access checks left is that sysfs PCI mmap does not check for CAP_RAWIO. I'm not really sure whether that should be added or not. Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Kees Cook <keescook@chromium.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> -- v2: - Totally new approach: Adjust filp->f_mapping at open time. Note that this now works on all architectures, not just those support ARCH_GENERIC_PCI_MMAP_RESOURCE --- drivers/pci/pci-sysfs.c | 4 ++++ drivers/pci/proc.c | 1 + 2 files changed, 5 insertions(+) diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index d15c881e2e7e..3f1c31bc0b7c 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -929,6 +929,7 @@ void pci_create_legacy_files(struct pci_bus *b) b->legacy_io->read = pci_read_legacy_io; b->legacy_io->write = pci_write_legacy_io; b->legacy_io->mmap = pci_mmap_legacy_io; + b->legacy_io->mapping = iomem_get_mapping(); pci_adjust_legacy_attr(b, pci_mmap_io); error = device_create_bin_file(&b->dev, b->legacy_io); if (error) @@ -941,6 +942,7 @@ void pci_create_legacy_files(struct pci_bus *b) b->legacy_mem->size = 1024*1024; b->legacy_mem->attr.mode = 0600; b->legacy_mem->mmap = pci_mmap_legacy_mem; + b->legacy_io->mapping = iomem_get_mapping(); pci_adjust_legacy_attr(b, pci_mmap_mem); error = device_create_bin_file(&b->dev, b->legacy_mem); if (error) @@ -1156,6 +1158,8 @@ static int pci_create_attr(struct pci_dev *pdev, int num, int write_combine) res_attr->mmap = pci_mmap_resource_uc; } } + if (res_attr->mmap) + res_attr->mapping = iomem_get_mapping(); res_attr->attr.name = res_attr_name; res_attr->attr.mode = 0600; res_attr->size = pci_resource_len(pdev, num); diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c index 3a2f90beb4cb..9bab07302bbf 100644 --- a/drivers/pci/proc.c +++ b/drivers/pci/proc.c @@ -298,6 +298,7 @@ static int proc_bus_pci_open(struct inode *inode, struct file *file) fpriv->write_combine = 0; file->private_data = fpriv; + file->f_mapping = iomem_get_mapping(); return 0; } -- 2.29.2